public abstract static class TextIO.Read extends PTransform<PBegin,PCollection<java.lang.String>>
Implementation of TextIO.read().
Fields inherited from class PTransform: name, resourceHints
Constructor and Description |
---|
Read() |
Modifier and Type | Method and Description |
---|---|
PCollection<java.lang.String> | expand(PBegin input): Override this method to specify how this PTransform should be expanded on the given InputT. |
TextIO.Read | from(java.lang.String filepattern): Reads text from the file(s) with the given filename or filename pattern. |
TextIO.Read | from(ValueProvider<java.lang.String> filepattern): Same as from(filepattern), but accepting a ValueProvider. |
protected FileBasedSource<java.lang.String> | getSource() |
void | populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component. |
TextIO.Read | watchForNewFiles(Duration pollInterval, Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition): Same as Read#watchForNewFiles(Duration, TerminationCondition, boolean) with matchUpdatedFiles=false. |
TextIO.Read | watchForNewFiles(Duration pollInterval, Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition, boolean matchUpdatedFiles): See MatchConfiguration#continuously(Duration, TerminationCondition, boolean). |
TextIO.Read | withCompression(Compression compression): Reads from input sources using the specified compression type. |
TextIO.Read | withCompressionType(TextIO.CompressionType compressionType): Deprecated. Use withCompression(org.apache.beam.sdk.io.Compression). |
TextIO.Read | withDelimiter(byte[] delimiter): Set the custom delimiter to be used in place of the default ones ('\r', '\n' or '\r\n'). |
TextIO.Read | withEmptyMatchTreatment(EmptyMatchTreatment treatment) |
TextIO.Read | withHintMatchesManyFiles(): Hints that the filepattern specified in from(String) matches a very large number of files. |
TextIO.Read | withMatchConfiguration(FileIO.MatchConfiguration matchConfiguration): Sets the FileIO.MatchConfiguration. |
Methods inherited from class PTransform: compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setResourceHints, toString, validate
public TextIO.Read from(java.lang.String filepattern)
Reads text from the file(s) with the given filename or filename pattern.
This can be a local path (if running locally), or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>" (if running locally or using remote execution service).
Standard Java Filesystem glob patterns ("*", "?", "[..]") are supported.
If it is known that the filepattern will match a very large number of files (at least tens of thousands), use withHintMatchesManyFiles() for better performance and scalability.
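To make the glob wildcards concrete, the following sketch exercises "*", "?" and "[..]" with the JDK's own java.nio glob matcher. It is only an illustration of standard Java glob syntax: the class name GlobSketch and the sample paths are hypothetical, and the matcher a given Beam filesystem uses (for example for gs:// paths) may differ in edge cases.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

/** Hypothetical helper showing JDK glob semantics; this is not Beam's matcher. */
final class GlobSketch {

    /** Returns true if the path matches the glob under the default filesystem's rules. */
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // "*" matches any run of characters within one directory level.
        System.out.println(matches("logs/part-*.txt", "logs/part-00001.txt"));
        // "*" does not cross a directory separator.
        System.out.println(matches("logs/*.txt", "logs/archive/old.txt"));
        // "?" matches exactly one character.
        System.out.println(matches("part-?.txt", "part-10.txt"));
        // "[..]" matches one character from the given set or range.
        System.out.println(matches("part-[0-9].txt", "part-7.txt"));
    }
}
```

A pattern such as "logs/part-*.txt" is the kind of string that would be passed to from(String).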
public TextIO.Read from(ValueProvider<java.lang.String> filepattern)
Same as from(filepattern), but accepting a ValueProvider.
public TextIO.Read withMatchConfiguration(FileIO.MatchConfiguration matchConfiguration)
Sets the FileIO.MatchConfiguration.
@Deprecated public TextIO.Read withCompressionType(TextIO.CompressionType compressionType)
Deprecated. Use withCompression(org.apache.beam.sdk.io.Compression).
public TextIO.Read withCompression(Compression compression)
Reads from input sources using the specified compression type.
If no compression type is specified, the default is Compression.AUTO.
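As a rough illustration of what an automatic default can mean in practice, the sketch below picks a decoder from the filename extension, which is the behavior assumed here for Compression.AUTO. It uses only the JDK, handles only ".gz", and every name in it (AutoCompressionSketch, open, readLines, gzip) is hypothetical; it is not Beam's implementation.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of extension-based decompression; not Beam code. */
final class AutoCompressionSketch {

    /** Wraps the raw stream in a gzip decoder only when the name ends in ".gz". */
    static InputStream open(String filename, InputStream raw) throws IOException {
        return filename.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }

    /** Reads all lines of the (possibly compressed) content as UTF-8 text. */
    static List<String> readLines(String filename, byte[] content) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                open(filename, new ByteArrayInputStream(content)), StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: gzip-compresses UTF-8 text in memory. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

When the extension does not reflect the real encoding, passing an explicit value to withCompression(Compression) avoids relying on detection.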
public TextIO.Read watchForNewFiles(Duration pollInterval, Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition, boolean matchUpdatedFiles)
See MatchConfiguration#continuously(Duration, TerminationCondition, boolean).
This works only in runners supporting splittable DoFn.
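The effect of the matchUpdatedFiles flag can be sketched as a single polling round over a directory listing. The sketch assumes that, with the flag set, a file already seen is matched again when its last-modified timestamp changes; the class WatchSketch and its poll method are hypothetical, and the poll interval and termination condition are left out on purpose.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of repeated file matching; not Beam's Watch transform. */
final class WatchSketch {

    // Path -> last-modified timestamp recorded when the path was last emitted.
    private final Map<String, Long> seen = new HashMap<>();
    private final boolean matchUpdatedFiles;

    WatchSketch(boolean matchUpdatedFiles) {
        this.matchUpdatedFiles = matchUpdatedFiles;
    }

    /**
     * One polling round. Takes the current listing (path -> last-modified millis)
     * and returns, in path order, the paths that should be emitted this round.
     */
    List<String> poll(Map<String, Long> listing) {
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, Long> entry : new TreeMap<>(listing).entrySet()) {
            Long previous = seen.get(entry.getKey());
            boolean isNew = previous == null;
            boolean isUpdated =
                !isNew && matchUpdatedFiles && !previous.equals(entry.getValue());
            if (isNew || isUpdated) {
                seen.put(entry.getKey(), entry.getValue());
                emitted.add(entry.getKey());
            }
        }
        return emitted;
    }
}
```

With matchUpdatedFiles=false each path is emitted at most once, which corresponds to the two-argument overload.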
public TextIO.Read watchForNewFiles(Duration pollInterval, Watch.Growth.TerminationCondition<java.lang.String,?> terminationCondition)
Same as Read#watchForNewFiles(Duration, TerminationCondition, boolean) with matchUpdatedFiles=false.
public TextIO.Read withHintMatchesManyFiles()
Hints that the filepattern specified in from(String) matches a very large number of files.
This hint may cause a runner to execute the transform differently, in a way that improves performance for this case, but it may worsen performance if the filepattern matches only a small number of files (e.g., in a runner that supports dynamic work rebalancing, rebalancing will happen less efficiently within individual files).
public TextIO.Read withEmptyMatchTreatment(EmptyMatchTreatment treatment)
public TextIO.Read withDelimiter(byte[] delimiter)
Set the custom delimiter to be used in place of the default ones ('\r', '\n' or '\r\n').
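The following sketch shows what replacing the default line terminators ('\r', '\n' or '\r\n') with a custom byte delimiter means for record boundaries. It is a plain JDK illustration under stated assumptions: DelimiterSketch and split are hypothetical names, and how Beam treats trailing delimiters or empty records is not claimed here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record splitter on a custom byte delimiter; not Beam code. */
final class DelimiterSketch {

    /** Convenience overload that encodes both arguments as UTF-8. */
    static List<String> split(String text, String delimiter) {
        return split(text.getBytes(StandardCharsets.UTF_8),
                     delimiter.getBytes(StandardCharsets.UTF_8));
    }

    /** Splits data into UTF-8 records wherever the full delimiter sequence occurs. */
    static List<String> split(byte[] data, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i <= data.length - delimiter.length) {
            if (startsWith(data, i, delimiter)) {
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += delimiter.length;
                start = i;
            } else {
                i++;
            }
        }
        // Emit whatever follows the last delimiter, if anything.
        if (start < data.length) {
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    private static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        for (int j = 0; j < prefix.length; j++) {
            if (data[offset + j] != prefix[j]) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, splitting "a||b\nc||d" on "||" keeps the newline inside the second record, because '\n' is no longer treated as a separator once a custom delimiter is in effect.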
public PCollection<java.lang.String> expand(PBegin input)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Specified by: expand in class PTransform<PBegin,PCollection<java.lang.String>>
protected FileBasedSource<java.lang.String> getSource()
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
Specified by: populateDisplayData in interface HasDisplayData
Overrides: populateDisplayData in class PTransform<PBegin,PCollection<java.lang.String>>
Parameters: builder - The builder to populate with display data.
See Also: HasDisplayData