Package org.apache.beam.sdk.io
Class BoundedSource<T>
java.lang.Object
org.apache.beam.sdk.io.Source<T>
org.apache.beam.sdk.io.BoundedSource<T>
- Type Parameters:
T
- Type of records read by the source.
- All Implemented Interfaces:
Serializable
,HasDisplayData
- Direct Known Subclasses:
BeamImpulseSource
,BigQueryStorageTableSource
,CosmosIO.BoundedCosmosBDSource
,ElasticsearchIO.BoundedElasticsearchSource
,HadoopFormatIO.HadoopInputFormatBoundedSource
,MongoDbGridFSIO.Read.BoundedGridFSSource
,OffsetBasedSource
A
Source
that reads a finite amount of input and, because of that, supports some
additional operations.
The operations are:
- Splitting into sources that read bundles of given size:
split(long, org.apache.beam.sdk.options.PipelineOptions)
; - Size estimation:
getEstimatedSizeBytes(org.apache.beam.sdk.options.PipelineOptions)
; - The accompanying
reader
has additional functionality to enable runners to dynamically adapt based on runtime conditions.- Progress estimation (
BoundedSource.BoundedReader.getFractionConsumed()
) - Tracking of parallelism, to determine whether the current source can be split (
BoundedSource.BoundedReader.getSplitPointsConsumed()
andBoundedSource.BoundedReader.getSplitPointsRemaining()
). - Dynamic splitting of the current source (
BoundedSource.BoundedReader.splitAtFraction(double)
).
- Progress estimation (
- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic class
AReader
that reads a bounded amount of input and supports some additional operations, such as progress estimation and dynamic work rebalancing.Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source
Source.Reader<T>
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionabstract BoundedSource.BoundedReader
<T> createReader
(PipelineOptions options) Returns a newBoundedSource.BoundedReader
that reads from this source.abstract long
getEstimatedSizeBytes
(PipelineOptions options) An estimate of the total size (in bytes) of the data that would be read from this source.abstract List
<? extends BoundedSource<T>> split
(long desiredBundleSizeBytes, PipelineOptions options) Splits the source into bundles of approximatelydesiredBundleSizeBytes
.Methods inherited from class org.apache.beam.sdk.io.Source
getDefaultOutputCoder, getOutputCoder, populateDisplayData, validate
-
Constructor Details
-
BoundedSource
public BoundedSource()
-
-
Method Details
-
split
public abstract List<? extends BoundedSource<T>> split(long desiredBundleSizeBytes, PipelineOptions options) throws Exception Splits the source into bundles of approximatelydesiredBundleSizeBytes
.- Throws:
Exception
-
getEstimatedSizeBytes
An estimate of the total size (in bytes) of the data that would be read from this source. This estimate is in terms of external storage size, before any decompression or other processing done by the reader.If there is no way to estimate the size of the source implementations MAY return 0L.
- Throws:
Exception
-
createReader
public abstract BoundedSource.BoundedReader<T> createReader(PipelineOptions options) throws IOException Returns a newBoundedSource.BoundedReader
that reads from this source.- Throws:
IOException
-