Package org.apache.beam.sdk.io
Class TextSource
java.lang.Object
- All Implemented Interfaces:
Serializable, HasDisplayData
Implementation detail of TextIO.Read.
A FileBasedSource which can decode records delimited by newline characters.
This source splits the data into records using UTF-8 \n, \r, or \r\n as the delimiter. This source is not strict and supports decoding the last record even if it is not delimited. Finally, no records are decoded if the stream is empty.
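The delimiter rules above can be illustrated with a minimal, standalone sketch. It is not the Beam implementation: the class `RecordSplitSketch` and the method `splitRecords` are hypothetical names, and the sketch works on an in-memory byte array rather than a file stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RecordSplitSketch {
    /** Splits UTF-8 bytes into records on \n, \r, or \r\n (hypothetical helper, not Beam code). */
    static List<String> splitRecords(byte[] data) {
        List<String> records = new ArrayList<>();
        int recordStart = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                records.add(new String(data, recordStart, i - recordStart, StandardCharsets.UTF_8));
                // Treat \r\n as a single delimiter rather than two.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                recordStart = i + 1;
            }
            i++;
        }
        // Non-strict: a trailing record without a delimiter is still emitted.
        if (recordStart < data.length) {
            records.add(new String(data, recordStart, data.length - recordStart, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] input = "a\nb\r\nc\rd".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitRecords(input));       // four records; the last has no delimiter
        System.out.println(splitRecords(new byte[0])); // empty stream: no records
    }
}
```

Scanning raw bytes is safe here because the byte values for \n and \r never occur inside a multi-byte UTF-8 sequence.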
This source supports reading from any arbitrary byte position within the stream. If the starting position is not 0, then bytes are skipped until the first delimiter is found; the byte after that delimiter marks the beginning of the first record to be decoded.
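A minimal sketch of the skip-to-first-delimiter behavior follows. It assumes an in-memory byte array, and the names `StartOffsetSketch` and `firstRecordStart` are hypothetical. It follows the description literally and does not model edge cases such as a starting position that falls exactly on a record boundary or between \r and \n.

```java
import java.nio.charset.StandardCharsets;

class StartOffsetSketch {
    /**
     * Returns the index at which decoding would begin for a reader that starts at
     * byte position start (hypothetical helper, not Beam code).
     */
    static int firstRecordStart(byte[] data, int start) {
        if (start == 0) {
            return 0; // Position 0 is always the beginning of a record.
        }
        int i = start;
        // Skip the remainder of the record that the starting position landed in.
        while (i < data.length && data[i] != '\n' && data[i] != '\r') {
            i++;
        }
        if (i >= data.length) {
            return data.length; // No delimiter found: nothing left to decode.
        }
        // Consume \r\n as one delimiter.
        if (data[i] == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
            i++;
        }
        return i + 1;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma".getBytes(StandardCharsets.UTF_8);
        int begin = firstRecordStart(data, 2); // starts inside "alpha"
        System.out.println(new String(data, begin, data.length - begin, StandardCharsets.UTF_8));
    }
}
```

With this rule, a reader that begins mid-record leaves that partial record to the reader of the preceding range, so each record is decoded by exactly one reader.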
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.FileBasedSource
FileBasedSource.FileBasedReader<T>, FileBasedSource.Mode
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.OffsetBasedSource
OffsetBasedSource.OffsetBasedReader<T>
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.BoundedSource
BoundedSource.BoundedReader<T>
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source
Source.Reader<T>
-
Constructor Summary
Constructors
TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter)
TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter, int skipHeaderLines)
TextSource(ValueProvider<String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter)
TextSource(ValueProvider<String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter, int skipHeaderLines)
-
Method Summary
protected FileBasedSource<String> createForSubrangeOfFile(MatchResult.Metadata metadata, long start, long end)
Creates and returns a new FileBasedSource of the same type as the current FileBasedSource backed by a given file and an offset range.
protected FileBasedSource.FileBasedReader<String> createSingleFileReader(PipelineOptions options)
Creates and returns an instance of a FileBasedReader implementation for the current source assuming the source represents a single file.
Coder<String> getOutputCoder()
Returns the Coder to use for the data read from this source.
Methods inherited from class org.apache.beam.sdk.io.FileBasedSource
createReader, createSourceForSubrange, getEmptyMatchTreatment, getEstimatedSizeBytes, getFileOrPatternSpec, getFileOrPatternSpecProvider, getMaxEndOffset, getMode, getSingleFileMetadata, isSplittable, populateDisplayData, split, toString, validate
Methods inherited from class org.apache.beam.sdk.io.OffsetBasedSource
getBytesPerOffset, getEndOffset, getMinBundleSize, getStartOffset
Methods inherited from class org.apache.beam.sdk.io.Source
getDefaultOutputCoder
-
Constructor Details
-
TextSource
public TextSource(ValueProvider<String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter, int skipHeaderLines)
-
TextSource
public TextSource(ValueProvider<String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter)
-
TextSource
public TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter, int skipHeaderLines)
-
TextSource
public TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter)
-
Method Details
-
createForSubrangeOfFile
protected FileBasedSource<String> createForSubrangeOfFile(MatchResult.Metadata metadata, long start, long end)
Description copied from class: FileBasedSource
Creates and returns a new FileBasedSource of the same type as the current FileBasedSource backed by a given file and an offset range. When the current source is being split, this method is used to generate new sub-sources. When creating the source, subclasses must call the constructor FileBasedSource(Metadata, long, long, long) of FileBasedSource with corresponding parameter values passed here.
- Specified by:
createForSubrangeOfFile in class FileBasedSource<String>
- Parameters:
metadata - file backing the new FileBasedSource.
start - starting byte offset of the new FileBasedSource.
end - ending byte offset of the new FileBasedSource. May be Long.MAX_VALUE, in which case it will be inferred using FileBasedSource.getMaxEndOffset(org.apache.beam.sdk.options.PipelineOptions).
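To make the role of the start and end offsets concrete, here is a rough standalone sketch of how a byte range might be divided into sub-ranges, each of which would then back one sub-source. It is not Beam's splitting logic. The names `SubrangeSketch`, `splitIntoSubranges`, `fileSizeBytes`, and `desiredBundleSizeBytes` are hypothetical, the end offset is treated as exclusive, and resolving Long.MAX_VALUE to the file size is an assumption standing in for whatever getMaxEndOffset returns.

```java
import java.util.ArrayList;
import java.util.List;

class SubrangeSketch {
    /**
     * Divides [start, end) into consecutive sub-ranges of at most desiredBundleSizeBytes.
     * Each element is a two-item array {subStart, subEnd}. Hypothetical helper, not Beam code.
     */
    static List<long[]> splitIntoSubranges(
            long start, long end, long fileSizeBytes, long desiredBundleSizeBytes) {
        if (desiredBundleSizeBytes <= 0) {
            throw new IllegalArgumentException("desiredBundleSizeBytes must be positive");
        }
        // Assumption: an end of Long.MAX_VALUE means "until the end of the file".
        long resolvedEnd = (end == Long.MAX_VALUE) ? fileSizeBytes : Math.min(end, fileSizeBytes);
        List<long[]> ranges = new ArrayList<>();
        for (long subStart = start; subStart < resolvedEnd; subStart += desiredBundleSizeBytes) {
            long subEnd = Math.min(subStart + desiredBundleSizeBytes, resolvedEnd);
            ranges.add(new long[] {subStart, subEnd});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 250-byte file with an open-ended range, split into chunks of up to 100 bytes.
        for (long[] r : splitIntoSubranges(0, Long.MAX_VALUE, 250, 100)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

In this sketch each {subStart, subEnd} pair corresponds to one call of the form createForSubrangeOfFile(metadata, subStart, subEnd).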
-
createSingleFileReader
protected FileBasedSource.FileBasedReader<String> createSingleFileReader(PipelineOptions options)
Description copied from class: FileBasedSource
Creates and returns an instance of a FileBasedReader implementation for the current source assuming the source represents a single file. File patterns will be handled by the FileBasedSource implementation automatically.
- Specified by:
createSingleFileReader in class FileBasedSource<String>
-
getOutputCoder
public Coder<String> getOutputCoder()
Description copied from class: Source
Returns the Coder to use for the data read from this source.
- Overrides:
getOutputCoder in class Source<String>
-