Package org.apache.beam.sdk.io
Class TextSource
java.lang.Object
- All Implemented Interfaces:
Serializable, HasDisplayData
Implementation detail of TextIO.Read.
A FileBasedSource which can decode records delimited by newline characters.
This source splits the data into records using UTF-8 \n, \r, or
\r\n as the delimiter. This source is not strict and supports decoding the last record even if
it is not delimited. Finally, no records are decoded if the stream is empty.
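The delimiter rules above can be illustrated with a small standalone sketch. It does not use any Beam classes and is not the code `TextSource` runs; the class name `LineSplitSketch` and the method `splitRecords` are hypothetical names chosen for this example. It only assumes the behavior described above: `\n`, `\r`, and `\r\n` each end a record, a trailing record without a delimiter is still emitted, and empty input yields no records.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Beam API.
public class LineSplitSketch {

    /** Splits UTF-8 bytes into records on \n, \r, or \r\n. */
    public static List<String> splitRecords(byte[] data) {
        List<String> records = new ArrayList<>();
        int recordStart = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                records.add(new String(data, recordStart, i - recordStart, StandardCharsets.UTF_8));
                // Treat \r\n as a single delimiter rather than two.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                recordStart = i + 1;
            }
            i++;
        }
        // Non-strict: emit a final record even if no delimiter closes it.
        if (recordStart < data.length) {
            records.add(new String(data, recordStart, data.length - recordStart, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] input = "a\nb\r\nc\rd".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitRecords(input)); // prints [a, b, c, d]
        System.out.println(splitRecords(new byte[0])); // prints []
    }
}
```

Scanning at the byte level is safe for UTF-8 because the bytes for `\n` and `\r` never occur inside a multi-byte character.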
This source supports reading from an arbitrary byte position within the stream. If the
starting position is not 0, bytes are skipped until the first delimiter is found; the byte
after that delimiter marks the beginning of the first record to be decoded.
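The skipping rule for a non-zero starting position can be sketched in plain Java as follows. This is a simplified reading of the paragraph above, not the reader Beam ships: `SkipToRecordSketch` and `firstRecordOffset` are hypothetical names, and the sketch makes no attempt to reproduce how the real reader treats a start offset that falls exactly on a record boundary.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the Beam API.
public class SkipToRecordSketch {

    /**
     * Returns the offset of the first record to decode when reading begins at
     * {@code start}. Offset 0 is always a record start; any other offset is
     * assumed to fall mid-record, so bytes are skipped through the next delimiter.
     */
    public static int firstRecordOffset(byte[] data, int start) {
        if (start == 0) {
            return 0;
        }
        for (int i = start; i < data.length; i++) {
            if (data[i] == '\n') {
                return i + 1;
            }
            if (data[i] == '\r') {
                // \r\n counts as one delimiter.
                boolean crlf = i + 1 < data.length && data[i + 1] == '\n';
                return crlf ? i + 2 : i + 1;
            }
        }
        // No delimiter after start: nothing left to decode from this position.
        return data.length;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstRecordOffset(data, 0)); // prints 0
        System.out.println(firstRecordOffset(data, 2)); // prints 6
        System.out.println(firstRecordOffset(data, 7)); // prints 11
    }
}
```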
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.FileBasedSource
FileBasedSource.FileBasedReader<T>, FileBasedSource.Mode
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.OffsetBasedSource
OffsetBasedSource.OffsetBasedReader<T>
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.BoundedSource
BoundedSource.BoundedReader<T>
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source
Source.Reader<T>
-
Constructor Summary
Constructors
TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter)
TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter, int skipHeaderLines)
TextSource(ValueProvider<String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter)
TextSource(ValueProvider<String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter, int skipHeaderLines)
-
Method Summary
Modifier and Type, Method, Description
protected FileBasedSource<String> createForSubrangeOfFile(MatchResult.Metadata metadata, long start, long end)
Creates and returns a new FileBasedSource of the same type as the current FileBasedSource backed by a given file and an offset range.
protected FileBasedSource.FileBasedReader<String> createSingleFileReader(PipelineOptions options)
Creates and returns an instance of a FileBasedReader implementation for the current source assuming the source represents a single file.
Coder<String> getOutputCoder()
Returns the Coder to use for the data read from this source.
Methods inherited from class org.apache.beam.sdk.io.FileBasedSource
createReader, createSourceForSubrange, getEmptyMatchTreatment, getEstimatedSizeBytes, getFileOrPatternSpec, getFileOrPatternSpecProvider, getMaxEndOffset, getMode, getSingleFileMetadata, isSplittable, populateDisplayData, split, toString, validate
Methods inherited from class org.apache.beam.sdk.io.OffsetBasedSource
getBytesPerOffset, getEndOffset, getMinBundleSize, getStartOffset
Methods inherited from class org.apache.beam.sdk.io.Source
getDefaultOutputCoder
-
Constructor Details
-
TextSource
public TextSource(ValueProvider<String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter, int skipHeaderLines) -
TextSource
public TextSource(ValueProvider<String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter) -
TextSource
public TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter, int skipHeaderLines) -
TextSource
public TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter)
-
-
Method Details
-
createForSubrangeOfFile
protected FileBasedSource<String> createForSubrangeOfFile(MatchResult.Metadata metadata, long start, long end)
Description copied from class: FileBasedSource
Creates and returns a new FileBasedSource of the same type as the current FileBasedSource backed by a given file and an offset range. When the current source is being split, this method is used to generate new sub-sources. When creating the source, subclasses must call the constructor FileBasedSource(Metadata, long, long, long) of FileBasedSource with the corresponding parameter values passed here.
- Specified by:
createForSubrangeOfFile in class FileBasedSource<String>
- Parameters:
metadata - file backing the new FileBasedSource.
start - starting byte offset of the new FileBasedSource.
end - ending byte offset of the new FileBasedSource. May be Long.MAX_VALUE, in which case it will be inferred using FileBasedSource.getMaxEndOffset(org.apache.beam.sdk.options.PipelineOptions).
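To make the role of the start and end offsets concrete, here is a rough sketch of how one file's byte range might be carved into contiguous sub-ranges, each of which would correspond to one (start, end) pair handed to a method like createForSubrangeOfFile. This is an assumption-laden illustration rather than Beam's splitting logic: the class `SubrangeSketch`, the method `subranges`, and the parameter `desiredBundleSize` are hypothetical, and the sketch ignores record boundaries entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Beam API.
public class SubrangeSketch {

    /** Returns contiguous {start, end} pairs (end exclusive) covering [0, fileSize). */
    public static List<long[]> subranges(long fileSize, long desiredBundleSize) {
        if (desiredBundleSize <= 0) {
            throw new IllegalArgumentException("desiredBundleSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < fileSize; start += desiredBundleSize) {
            // The last range is clipped to the end of the file.
            long end = Math.min(start + desiredBundleSize, fileSize);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : subranges(10, 4)) {
            System.out.println(r[0] + ".." + r[1]);
        }
        // prints 0..4, 4..8, and 8..10 on separate lines
    }
}
```

Because the ranges are cut at arbitrary byte offsets, each sub-source still has to locate the first record boundary inside its own range when it starts reading.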
-
createSingleFileReader
protected FileBasedSource.FileBasedReader<String> createSingleFileReader(PipelineOptions options)
Description copied from class: FileBasedSource
Creates and returns an instance of a FileBasedReader implementation for the current source, assuming the source represents a single file. File patterns will be handled by the FileBasedSource implementation automatically.
- Specified by:
createSingleFileReader in class FileBasedSource<String>
-
getOutputCoder
public Coder<String> getOutputCoder()
Description copied from class: Source
Returns the Coder to use for the data read from this source.
- Overrides:
getOutputCoder in class Source<String>
-