Class HadoopFormatIO.HadoopInputFormatBoundedSource<K,V>
java.lang.Object
org.apache.beam.sdk.io.Source<KV<K,V>>
org.apache.beam.sdk.io.BoundedSource<KV<K,V>>
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO.HadoopInputFormatBoundedSource<K,V>
- Type Parameters:
K
- Type of keys to be read.V
- Type of values to be read.
- All Implemented Interfaces:
Serializable
,HasDisplayData
- Enclosing class:
HadoopFormatIO
public static class HadoopFormatIO.HadoopInputFormatBoundedSource<K,V>
extends BoundedSource<KV<K,V>>
implements Serializable
Bounded source implementation for
HadoopFormatIO
.- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.BoundedSource
BoundedSource.BoundedReader<T>
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source
Source.Reader<T>
-
Constructor Summary
ConstructorsModifierConstructorDescriptionprotected
HadoopInputFormatBoundedSource
(SerializableConfiguration conf, Coder<K> keyCoder, Coder<V> valueCoder, @Nullable SimpleFunction<?, K> keyTranslationFunction, @Nullable SimpleFunction<?, V> valueTranslationFunction, HadoopFormatIO.SerializableSplit inputSplit, boolean skipKeyClone, boolean skipValueClone) -
Method Summary
Modifier and TypeMethodDescriptionprotected void
Creates instance of InputFormat class.createReader
(PipelineOptions options) Returns a newBoundedSource.BoundedReader
that reads from this source.long
An estimate of the total size (in bytes) of the data that would be read from this source.Returns theCoder
to use for the data read from this source.void
populateDisplayData
(DisplayData.Builder builder) Register display data for the given transform or component.List
<BoundedSource<KV<K, V>>> split
(long desiredBundleSizeBytes, PipelineOptions options) Splits the source into bundles of approximatelydesiredBundleSizeBytes
.void
validate()
Checks that this source is valid, before it can be used in a pipeline.Methods inherited from class org.apache.beam.sdk.io.Source
getDefaultOutputCoder
-
Constructor Details
-
HadoopInputFormatBoundedSource
protected HadoopInputFormatBoundedSource(SerializableConfiguration conf, Coder<K> keyCoder, Coder<V> valueCoder, @Nullable SimpleFunction<?, K> keyTranslationFunction, @Nullable SimpleFunction<?, V> valueTranslationFunction, HadoopFormatIO.SerializableSplit inputSplit, boolean skipKeyClone, boolean skipValueClone)
-
-
Method Details
-
getConfiguration
-
validate
public void validate()Description copied from class:Source
Checks that this source is valid, before it can be used in a pipeline.It is recommended to use
Preconditions
for implementing this method. -
populateDisplayData
Description copied from class:Source
Register display data for the given transform or component.populateDisplayData(DisplayData.Builder)
is invoked by Pipeline runners to collect display data viaDisplayData.from(HasDisplayData)
. Implementations may callsuper.populateDisplayData(builder)
in order to register display data in the current namespace, but should otherwise usesubcomponent.populateDisplayData(builder)
to use the namespace of the subcomponent.By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData
in interfaceHasDisplayData
- Overrides:
populateDisplayData
in classSource<KV<K,
V>> - Parameters:
builder
- The builder to populate with display data.- See Also:
-
split
public List<BoundedSource<KV<K,V>>> split(long desiredBundleSizeBytes, PipelineOptions options) throws Exception Description copied from class:BoundedSource
Splits the source into bundles of approximatelydesiredBundleSizeBytes
. -
getEstimatedSizeBytes
Description copied from class:BoundedSource
An estimate of the total size (in bytes) of the data that would be read from this source. This estimate is in terms of external storage size, before any decompression or other processing done by the reader.If there is no way to estimate the size of the source implementations MAY return 0L.
- Specified by:
getEstimatedSizeBytes
in classBoundedSource<KV<K,
V>> - Throws:
Exception
-
createInputFormatInstance
Creates instance of InputFormat class. The InputFormat class name is specified in the Hadoop configuration.- Throws:
IOException
-
getOutputCoder
Description copied from class:Source
Returns theCoder
to use for the data read from this source.- Overrides:
getOutputCoder
in classSource<KV<K,
V>>
-
createReader
public BoundedSource.BoundedReader<KV<K,V>> createReader(PipelineOptions options) throws IOException Description copied from class:BoundedSource
Returns a newBoundedSource.BoundedReader
that reads from this source.- Specified by:
createReader
in classBoundedSource<KV<K,
V>> - Throws:
IOException
-