public class TextSource extends FileBasedSource<java.lang.String>
Implementation detail of TextIO.Read.
A FileBasedSource which can decode records delimited by newline characters.
This source splits the data into records using UTF-8 \n, \r, or \r\n as the delimiter. This source is not strict and supports decoding the last record even if it is not delimited. Finally, no records are decoded if the stream is empty.
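
The delimiter rules above can be illustrated with a small standalone sketch. This is not the TextSource implementation, and the class and method names (NewlineRecordSplitter, splitRecords) are hypothetical. It only demonstrates the described behaviour on an in-memory byte array: \n, \r, and \r\n each end a record, an undelimited final record is still decoded, and empty input yields no records.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NewlineRecordSplitter {
    /** Splits UTF-8 bytes into records on \n, \r, or \r\n (illustrative only). */
    static List<String> splitRecords(byte[] data) {
        List<String> records = new ArrayList<>();
        int recordStart = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                records.add(new String(data, recordStart, i - recordStart, StandardCharsets.UTF_8));
                // Treat \r\n as a single delimiter rather than two.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                recordStart = i + 1;
            }
            i++;
        }
        // Non-strict: decode a trailing record that has no delimiter.
        if (recordStart < data.length) {
            records.add(new String(data, recordStart, data.length - recordStart, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] mixed = "a\nb\r\nc\rd".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitRecords(mixed));       // prints [a, b, c, d]
        System.out.println(splitRecords(new byte[0])); // prints []
    }
}
```

Because \n and \r are single-byte ASCII values that never occur inside a multi-byte UTF-8 sequence, scanning raw bytes for them is safe before decoding each record.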
This source supports reading from any arbitrary byte position within the stream. If the starting position is not 0, then bytes are skipped until the first delimiter is found representing the beginning of the first record to be decoded.
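
To make the skipping rule concrete, here is a minimal sketch over an in-memory byte array. The names (OffsetRangeReader, readRange, delimiterEnd) are hypothetical and this is not the Beam reader. One detail is an assumption of the sketch rather than something stated above: when the start is not 0, the scan begins one byte before it, so that a record beginning exactly at the start offset is kept and adjacent ranges together cover every record exactly once.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OffsetRangeReader {
    /** Returns the index just past a delimiter beginning at pos, or -1 if none begins there. */
    static int delimiterEnd(byte[] data, int pos) {
        if (data[pos] == '\n') {
            return pos + 1;
        }
        if (data[pos] == '\r') {
            boolean crlf = pos + 1 < data.length && data[pos + 1] == '\n';
            return crlf ? pos + 2 : pos + 1;
        }
        return -1;
    }

    /** Reads the records whose first byte lies in [start, end). */
    static List<String> readRange(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        if (start > 0) {
            // Assumption of this sketch: scan from start - 1 so a record that
            // begins exactly at `start` is not skipped.
            pos = start - 1;
            while (pos < data.length && delimiterEnd(data, pos) < 0) {
                pos++;
            }
            if (pos >= data.length) {
                return records; // no delimiter found: no record begins in this range
            }
            pos = delimiterEnd(data, pos); // first record begins after the delimiter
        }
        while (pos < data.length && pos < end) {
            int scan = pos;
            while (scan < data.length && delimiterEnd(data, scan) < 0) {
                scan++;
            }
            records.add(new String(data, pos, scan - pos, StandardCharsets.UTF_8));
            pos = scan < data.length ? delimiterEnd(data, scan) : scan;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\r\ngamma\rdelta".getBytes(StandardCharsets.UTF_8);
        System.out.println(readRange(data, 0, 8));           // prints [alpha, beta]
        System.out.println(readRange(data, 8, data.length)); // prints [gamma, delta]
    }
}
```

In this sketch a record is owned by the range that contains its first byte, so a reader may run past its end offset to finish the last record it started.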
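Nested classes/interfaces inherited from class FileBasedSource: FileBasedSource.FileBasedReader<T>, FileBasedSource.Mode
Nested classes/interfaces inherited from class OffsetBasedSource: OffsetBasedSource.OffsetBasedReader<T>
Nested classes/interfaces inherited from class BoundedSource: BoundedSource.BoundedReader<T>
Nested classes/interfaces inherited from class Source: Source.Reader<T>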
Constructor and Description |
---|
TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter) |
TextSource(ValueProvider<java.lang.String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter) |
Modifier and Type | Method and Description |
---|---|
protected FileBasedSource<java.lang.String> | createForSubrangeOfFile(MatchResult.Metadata metadata, long start, long end) Creates and returns a new FileBasedSource of the same type as the current FileBasedSource backed by a given file and an offset range. |
protected FileBasedSource.FileBasedReader<java.lang.String> | createSingleFileReader(PipelineOptions options) Creates and returns an instance of a FileBasedReader implementation for the current source assuming the source represents a single file. |
Coder<java.lang.String> | getOutputCoder() Returns the Coder to use for the data read from this source. |
Methods inherited from class FileBasedSource: createReader, createSourceForSubrange, getEmptyMatchTreatment, getEstimatedSizeBytes, getFileOrPatternSpec, getFileOrPatternSpecProvider, getMaxEndOffset, getMode, getSingleFileMetadata, isSplittable, populateDisplayData, split, toString, validate
Methods inherited from class OffsetBasedSource: getBytesPerOffset, getEndOffset, getMinBundleSize, getStartOffset
Methods inherited from class Source: getDefaultOutputCoder
public TextSource(ValueProvider<java.lang.String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter)
public TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter)
protected FileBasedSource<java.lang.String> createForSubrangeOfFile(MatchResult.Metadata metadata, long start, long end)
Description copied from class: FileBasedSource
Creates and returns a new FileBasedSource of the same type as the current FileBasedSource backed by a given file and an offset range. When the current source is being split, this method is used to generate new sub-sources. When creating the source, subclasses must call the constructor FileBasedSource(Metadata, long, long, long) of FileBasedSource with the corresponding parameter values passed here.
Specified by: createForSubrangeOfFile in class FileBasedSource<java.lang.String>
Parameters:
metadata - file backing the new FileBasedSource.
start - starting byte offset of the new FileBasedSource.
end - ending byte offset of the new FileBasedSource. May be Long.MAX_VALUE, in which case it will be inferred using FileBasedSource.getMaxEndOffset(org.apache.beam.sdk.options.PipelineOptions).
protected FileBasedSource.FileBasedReader<java.lang.String> createSingleFileReader(PipelineOptions options)
Description copied from class: FileBasedSource
Creates and returns an instance of a FileBasedReader implementation for the current source assuming the source represents a single file. File patterns will be handled by the FileBasedSource implementation automatically.
Specified by: createSingleFileReader in class FileBasedSource<java.lang.String>
public Coder<java.lang.String> getOutputCoder()
Description copied from class: Source
Returns the Coder to use for the data read from this source.
Overrides: getOutputCoder in class Source<java.lang.String>