Class UnboundedSource<OutputT,CheckpointMarkT extends UnboundedSource.CheckpointMark> 
- Type Parameters:
 OutputT- Type of records output by this source.CheckpointMarkT- Type of checkpoint marks used by the readers of this source.
- All Implemented Interfaces:
 Serializable,HasDisplayData
- Direct Known Subclasses:
 UnboundedSolaceSource,UnboundedSourceImpl
Source that reads an unbounded amount of input and, because of that, supports some
 additional operations such as checkpointing, watermarks, and record ids.
 - Checkpointing allows sources to not re-read the same data again in the case of failures.
 - Watermarks allow for downstream parts of the pipeline to know up to what point in time the data is complete.
 - Record ids allow for efficient deduplication of input records; many streaming sources do not guarantee that a given record will only be read a single time.
 
See Window and Trigger for more information on timestamps and
 watermarks.
- See Also:
 
- 
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic interfaceA marker representing the progress and state of anUnboundedSource.UnboundedReader.static classAReaderthat reads an unbounded amount of input.Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source
Source.Reader<T> - 
Constructor Summary
Constructors - 
Method Summary
Modifier and TypeMethodDescriptionabstract UnboundedSource.UnboundedReader<OutputT> createReader(PipelineOptions options, @Nullable CheckpointMarkT checkpointMark) Create a newUnboundedSource.UnboundedReaderto read from this source, resuming from the given checkpoint if present.abstract Coder<CheckpointMarkT> Returns aCoderfor encoding and decoding the checkpoints for this source.booleanIf offsetBasedDeduplicationSupported returns true, then the UnboundedSource needs to provide the following: UnboundedReader which provides offsets that are unique for each element and lexicographically ordered.booleanReturns whether this source requires explicit deduping.abstract List<? extends UnboundedSource<OutputT, CheckpointMarkT>> split(int desiredNumSplits, PipelineOptions options) Returns a list ofUnboundedSourceobjects representing the instances of this source that should be used when executing the workflow.Methods inherited from class org.apache.beam.sdk.io.Source
getDefaultOutputCoder, getOutputCoder, populateDisplayData, validate 
- 
Constructor Details
- 
UnboundedSource
public UnboundedSource() 
 - 
 - 
Method Details
- 
split
public abstract List<? extends UnboundedSource<OutputT,CheckpointMarkT>> split(int desiredNumSplits, PipelineOptions options) throws Exception Returns a list ofUnboundedSourceobjects representing the instances of this source that should be used when executing the workflow. Each split should return a separate partition of the input data.For example, for a source reading from a growing directory of files, each split could correspond to a prefix of file names.
Some sources are not splittable, such as reading from a single TCP stream. In that case, only a single split should be returned.
Some data sources automatically partition their data among readers. For these types of inputs,
nidentical replicas of the top-level source can be returned.The size of the returned list should be as close to
desiredNumSplitsas possible, but does not have to match exactly. A low number of splits will limit the amount of parallelism in the source.- Throws:
 Exception
 - 
createReader
public abstract UnboundedSource.UnboundedReader<OutputT> createReader(PipelineOptions options, @Nullable CheckpointMarkT checkpointMark) throws IOException Create a newUnboundedSource.UnboundedReaderto read from this source, resuming from the given checkpoint if present.- Throws:
 IOException
 - 
getCheckpointMarkCoder
Returns aCoderfor encoding and decoding the checkpoints for this source. - 
requiresDeduping
public boolean requiresDeduping()Returns whether this source requires explicit deduping.This is needed if the underlying data source can return the same record multiple times, such a queuing system with a pull-ack model. Sources where the records read are uniquely identified by the persisted state in the CheckpointMark do not need this.
Generally, if
UnboundedSource.CheckpointMark.finalizeCheckpoint()is overridden, this method should return true. Checkpoint finalization is best-effort, and readers can be resumed from a checkpoint that has not been finalized. - 
offsetBasedDeduplicationSupported
public boolean offsetBasedDeduplicationSupported()If offsetBasedDeduplicationSupported returns true, then the UnboundedSource needs to provide the following:- UnboundedReader which provides offsets that are unique for each element and lexicographically ordered.
 - CheckpointMark which provides an offset greater than all elements read and less than or equal to the next offset that will be read.
 
 
 -