public class TextSource extends FileBasedSource<java.lang.String>
Implementation detail of TextIO.Read.
A FileBasedSource which can decode records delimited by newline characters.
This source splits the data into records using UTF-8 \n, \r, or \r\n as the delimiter. The source is not strict: the last record is decoded even if it is not followed by a delimiter. No records are decoded if the stream is empty.
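The delimiter rules above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Beam's actual implementation: the class name DefaultDelimiterSketch and the method splitRecords are hypothetical, and the whole input is assumed to fit in memory as a byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the TextSource code.
final class DefaultDelimiterSketch {

  /** Splits UTF-8 bytes into records on \n, \r, or \r\n. */
  static List<String> splitRecords(byte[] data) {
    List<String> records = new ArrayList<>();
    int recordStart = 0;
    int i = 0;
    while (i < data.length) {
      byte b = data[i];
      if (b == '\n' || b == '\r') {
        records.add(new String(data, recordStart, i - recordStart, StandardCharsets.UTF_8));
        // A \r immediately followed by \n counts as one \r\n delimiter.
        if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
          i++;
        }
        recordStart = i + 1;
      }
      i++;
    }
    // Not strict: a trailing record without a closing delimiter is still emitted.
    if (recordStart < data.length) {
      records.add(
          new String(data, recordStart, data.length - recordStart, StandardCharsets.UTF_8));
    }
    return records;
  }

  public static void main(String[] args) {
    byte[] input = "a\nb\r\nc\rd".getBytes(StandardCharsets.UTF_8);
    System.out.println(splitRecords(input)); // prints [a, b, c, d]
    System.out.println(splitRecords(new byte[0])); // prints []
  }
}
```

Scanning bytes rather than decoded characters is safe here because the bytes for \n and \r never occur inside a multi-byte UTF-8 sequence.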
This source supports reading from any byte position within the stream. If the
starting position is not 0, bytes are skipped until the first delimiter is found;
the first record to be decoded begins immediately after that delimiter.
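The effect of starting at a non-zero position can be illustrated with a simplified sketch. Several things here are assumptions made for illustration: the names SubrangeReadSketch and readRange are hypothetical, only the single-byte \n delimiter is handled, the scan for the first delimiter begins one byte before the start offset so that a record beginning exactly at the offset is kept, and a record is taken to belong to the range in which its first byte falls.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the TextSource code.
final class SubrangeReadSketch {

  /** Returns the \n-delimited records whose first byte lies in [start, end). */
  static List<String> readRange(byte[] data, long start, long end) {
    int pos = (int) start;
    if (start > 0) {
      // Assumption: scan from start - 1 so a record that begins exactly at
      // 'start' (right after a delimiter) belongs to this range.
      pos = (int) start - 1;
      while (pos < data.length && data[pos] != '\n') {
        pos++;
      }
      pos++; // step past the delimiter to the first owned record
    }
    List<String> records = new ArrayList<>();
    while (pos < data.length && pos < end) {
      int stop = pos;
      while (stop < data.length && data[stop] != '\n') {
        stop++;
      }
      records.add(new String(data, pos, stop - pos, StandardCharsets.UTF_8));
      pos = stop + 1;
    }
    return records;
  }

  public static void main(String[] args) {
    byte[] data = "alpha\nbeta\ngamma".getBytes(StandardCharsets.UTF_8);
    System.out.println(readRange(data, 0, 8)); // prints [alpha, beta]
    System.out.println(readRange(data, 8, Long.MAX_VALUE)); // prints [gamma]
  }
}
```

With these rules, two adjacent ranges such as [0, 8) and [8, Long.MAX_VALUE) together return every record exactly once, even though offset 8 falls in the middle of "beta".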
Nested classes inherited from class FileBasedSource: FileBasedSource.FileBasedReader<T>, FileBasedSource.Mode
Nested classes inherited from class OffsetBasedSource: OffsetBasedSource.OffsetBasedReader<T>
Nested classes inherited from class BoundedSource: BoundedSource.BoundedReader<T>
Nested classes inherited from class Source: Source.Reader<T>

| Constructor and Description |
|---|
| TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter) |
| TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter, int skipHeaderLines) |
| TextSource(ValueProvider<java.lang.String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter) |
| TextSource(ValueProvider<java.lang.String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter, int skipHeaderLines) |

| Modifier and Type | Method and Description |
|---|---|
| protected FileBasedSource<java.lang.String> | createForSubrangeOfFile(MatchResult.Metadata metadata, long start, long end): Creates and returns a new FileBasedSource of the same type as the current FileBasedSource backed by a given file and an offset range. |
| protected FileBasedSource.FileBasedReader<java.lang.String> | createSingleFileReader(PipelineOptions options): Creates and returns an instance of a FileBasedReader implementation for the current source assuming the source represents a single file. |
| Coder<java.lang.String> | getOutputCoder(): Returns the Coder to use for the data read from this source. |

Methods inherited from class FileBasedSource: createReader, createSourceForSubrange, getEmptyMatchTreatment, getEstimatedSizeBytes, getFileOrPatternSpec, getFileOrPatternSpecProvider, getMaxEndOffset, getMode, getSingleFileMetadata, isSplittable, populateDisplayData, split, toString, validate
Methods inherited from class OffsetBasedSource: getBytesPerOffset, getEndOffset, getMinBundleSize, getStartOffset
Methods inherited from class Source: getDefaultOutputCoder

public TextSource(ValueProvider<java.lang.String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter, int skipHeaderLines)
public TextSource(ValueProvider<java.lang.String> fileSpec, EmptyMatchTreatment emptyMatchTreatment, byte[] delimiter)
public TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter, int skipHeaderLines)
public TextSource(MatchResult.Metadata metadata, long start, long end, byte[] delimiter)
protected FileBasedSource<java.lang.String> createForSubrangeOfFile(MatchResult.Metadata metadata, long start, long end)
Description copied from class: FileBasedSource
Creates and returns a new FileBasedSource of the same type as the current FileBasedSource backed by a given file and an offset range. When the current source is being split, this method is used to generate new sub-sources. When creating the source, subclasses must call the constructor FileBasedSource(Metadata, long, long, long) of FileBasedSource with the corresponding parameter values passed here.
Specified by: createForSubrangeOfFile in class FileBasedSource<java.lang.String>
Parameters:
metadata - file backing the new FileBasedSource.
start - starting byte offset of the new FileBasedSource.
end - ending byte offset of the new FileBasedSource. May be Long.MAX_VALUE, in which case it will be inferred using FileBasedSource.getMaxEndOffset(org.apache.beam.sdk.options.PipelineOptions).

protected FileBasedSource.FileBasedReader<java.lang.String> createSingleFileReader(PipelineOptions options)
Description copied from class: FileBasedSource
Creates and returns an instance of a FileBasedReader implementation for the current source assuming the source represents a single file. File patterns will be handled by the FileBasedSource implementation automatically.
Specified by: createSingleFileReader in class FileBasedSource<java.lang.String>

public Coder<java.lang.String> getOutputCoder()
Description copied from class: Source
Returns the Coder to use for the data read from this source.
Overrides: getOutputCoder in class Source<java.lang.String>