Class HadoopFormatIO.HadoopInputFormatBoundedSource<K,V>
java.lang.Object
org.apache.beam.sdk.io.Source<KV<K,V>>
org.apache.beam.sdk.io.BoundedSource<KV<K,V>>
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO.HadoopInputFormatBoundedSource<K,V>
- Type Parameters:
K- Type of keys to be read.V- Type of values to be read.
- All Implemented Interfaces:
Serializable,HasDisplayData
- Enclosing class:
HadoopFormatIO
public static class HadoopFormatIO.HadoopInputFormatBoundedSource<K,V>
extends BoundedSource<KV<K,V>>
implements Serializable
Bounded source implementation for
HadoopFormatIO.- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.BoundedSource
BoundedSource.BoundedReader<T>Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source
Source.Reader<T> -
Constructor Summary
ConstructorsModifierConstructorDescriptionprotectedHadoopInputFormatBoundedSource(SerializableConfiguration conf, Coder<K> keyCoder, Coder<V> valueCoder, @Nullable SimpleFunction<?, K> keyTranslationFunction, @Nullable SimpleFunction<?, V> valueTranslationFunction, HadoopFormatIO.SerializableSplit inputSplit, boolean skipKeyClone, boolean skipValueClone) -
Method Summary
Modifier and TypeMethodDescriptionprotected voidCreates instance of InputFormat class.createReader(PipelineOptions options) Returns a newBoundedSource.BoundedReaderthat reads from this source.longAn estimate of the total size (in bytes) of the data that would be read from this source.Returns theCoderto use for the data read from this source.voidpopulateDisplayData(DisplayData.Builder builder) Register display data for the given transform or component.List<BoundedSource<KV<K, V>>> split(long desiredBundleSizeBytes, PipelineOptions options) Splits the source into bundles of approximatelydesiredBundleSizeBytes.voidvalidate()Checks that this source is valid, before it can be used in a pipeline.Methods inherited from class org.apache.beam.sdk.io.Source
getDefaultOutputCoder
-
Constructor Details
-
HadoopInputFormatBoundedSource
protected HadoopInputFormatBoundedSource(SerializableConfiguration conf, Coder<K> keyCoder, Coder<V> valueCoder, @Nullable SimpleFunction<?, K> keyTranslationFunction, @Nullable SimpleFunction<?, V> valueTranslationFunction, HadoopFormatIO.SerializableSplit inputSplit, boolean skipKeyClone, boolean skipValueClone)
-
-
Method Details
-
getConfiguration
-
validate
public void validate()Description copied from class:SourceChecks that this source is valid, before it can be used in a pipeline.It is recommended to use
Preconditionsfor implementing this method. -
populateDisplayData
Description copied from class:SourceRegister display data for the given transform or component.populateDisplayData(DisplayData.Builder)is invoked by Pipeline runners to collect display data viaDisplayData.from(HasDisplayData). Implementations may callsuper.populateDisplayData(builder)in order to register display data in the current namespace, but should otherwise usesubcomponent.populateDisplayData(builder)to use the namespace of the subcomponent.By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayDatain interfaceHasDisplayData- Overrides:
populateDisplayDatain classSource<KV<K,V>> - Parameters:
builder- The builder to populate with display data.- See Also:
-
split
public List<BoundedSource<KV<K,V>>> split(long desiredBundleSizeBytes, PipelineOptions options) throws Exception Description copied from class:BoundedSourceSplits the source into bundles of approximatelydesiredBundleSizeBytes. -
getEstimatedSizeBytes
Description copied from class:BoundedSourceAn estimate of the total size (in bytes) of the data that would be read from this source. This estimate is in terms of external storage size, before any decompression or other processing done by the reader.If there is no way to estimate the size of the source implementations MAY return 0L.
- Specified by:
getEstimatedSizeBytesin classBoundedSource<KV<K,V>> - Throws:
Exception
-
createInputFormatInstance
Creates instance of InputFormat class. The InputFormat class name is specified in the Hadoop configuration.- Throws:
IOException
-
getOutputCoder
Description copied from class:SourceReturns theCoderto use for the data read from this source.- Overrides:
getOutputCoderin classSource<KV<K,V>>
-
createReader
public BoundedSource.BoundedReader<KV<K,V>> createReader(PipelineOptions options) throws IOException Description copied from class:BoundedSourceReturns a newBoundedSource.BoundedReaderthat reads from this source.- Specified by:
createReaderin classBoundedSource<KV<K,V>> - Throws:
IOException
-