Class TextIO.Read
- All Implemented Interfaces:
Serializable, HasDisplayData
- Enclosing class:
TextIO
- See Also:
TextIO.read()
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints

Constructor Summary
Constructors
Read()

Method Summary
Modifier and Type | Method | Description
PCollection<String> | expand(PBegin input) | Override this method to specify how this PTransform should be expanded on the given InputT.
TextIO.Read | from(String filepattern) | Reads text files from the file(s) with the given filename or filename pattern.
TextIO.Read | from(ValueProvider<String> filepattern) | Same as from(filepattern), but accepting a ValueProvider.
protected FileBasedSource<String> | getSource() |
void | populateDisplayData(DisplayData.Builder builder) | Register display data for the given transform or component.
TextIO.Read | watchForNewFiles(Duration pollInterval, Watch.Growth.TerminationCondition<String, ?> terminationCondition) | Same as watchForNewFiles(Duration, TerminationCondition, boolean) with matchUpdatedFiles=false.
TextIO.Read | watchForNewFiles(Duration pollInterval, Watch.Growth.TerminationCondition<String, ?> terminationCondition, boolean matchUpdatedFiles) | See FileIO.MatchConfiguration.continuously(Duration, TerminationCondition, boolean).
TextIO.Read | withCompression(Compression compression) | Reads from input sources using the specified compression type.
TextIO.Read | withCompressionType(TextIO.CompressionType compressionType) | Deprecated.
TextIO.Read | withDelimiter(byte[] delimiter) | Set the custom delimiter to be used in place of the default ones ('\r', '\n' or '\r\n').
TextIO.Read | withEmptyMatchTreatment(EmptyMatchTreatment treatment) |
TextIO.Read | withHintMatchesManyFiles() | Hints that the filepattern specified in from(String) matches a very large number of files.
TextIO.Read | withMatchConfiguration(FileIO.MatchConfiguration matchConfiguration) | Sets the FileIO.MatchConfiguration.
TextIO.Read | withSkipHeaderLines(int skipHeaderLines) |

Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate

Constructor Details

Read
public Read()

Method Details

from
Reads text files from the file(s) with the given filename or filename pattern.
This can be a local path (if running locally), or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>" (if running locally or using a remote execution service).
Standard Java Filesystem glob patterns ("*", "?", "[..]") are supported.
If it is known that the filepattern will match a very large number of files (at least tens of thousands), use withHintMatchesManyFiles() for better performance and scalability.
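
To make the three wildcard forms concrete, here is a minimal sketch that uses the JDK's own glob matcher (java.nio.file.PathMatcher). It is an illustration only: Beam resolves filepatterns through its own filesystem layer, so details may differ, especially for "gs://" paths. The class name GlobSketch and all file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Illustration only: uses java.nio glob matching, not Beam's own file matching.
class GlobSketch {
    // Returns true if the candidate path matches the glob pattern
    // under the default filesystem's glob rules.
    public static boolean matches(String glob, String candidate) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(candidate));
    }

    public static void main(String[] args) {
        // "*" matches any run of characters within one path segment.
        System.out.println(matches("logs/part-*.txt", "logs/part-00001.txt")); // true
        // "?" matches exactly one character, so a two-digit suffix does not match.
        System.out.println(matches("data-?.csv", "data-10.csv"));              // false
        // "[..]" matches one character from the bracketed set or range.
        System.out.println(matches("shard-[0-3].txt", "shard-2.txt"));         // true
    }
}
```

Checking a pattern locally like this can help catch a filepattern that is narrower or broader than intended before it is passed to from.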

from
Same as from(filepattern), but accepting a ValueProvider.

withMatchConfiguration
Sets the FileIO.MatchConfiguration.

withCompressionType
Deprecated.

withCompression
Reads from input sources using the specified compression type.
If no compression type is specified, the default is Compression.AUTO.

watchForNewFiles
public TextIO.Read watchForNewFiles(Duration pollInterval, Watch.Growth.TerminationCondition<String, ?> terminationCondition, boolean matchUpdatedFiles)
See FileIO.MatchConfiguration.continuously(Duration, TerminationCondition, boolean).
This works only in runners supporting splittable DoFn.

watchForNewFiles
public TextIO.Read watchForNewFiles(Duration pollInterval, Watch.Growth.TerminationCondition<String, ?> terminationCondition)
Same as watchForNewFiles(Duration, TerminationCondition, boolean) with matchUpdatedFiles=false.

withHintMatchesManyFiles
Hints that the filepattern specified in from(String) matches a very large number of files.
This hint may cause a runner to execute the transform differently, in a way that improves performance for this case, but it may worsen performance if the filepattern matches only a small number of files (e.g., in a runner that supports dynamic work rebalancing, rebalancing will happen less efficiently within individual files).

withEmptyMatchTreatment

withDelimiter
Set the custom delimiter to be used in place of the default ones ('\r', '\n' or '\r\n').
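
The effect of replacing the default delimiters can be sketched with plain JDK code. The example below is not Beam's implementation; it only illustrates the idea that, once a custom delimiter is set, records end at that byte sequence and '\r' or '\n' become ordinary record content. The class DelimiterSketch and its methods are hypothetical, and dropping an empty record after a trailing delimiter is an assumption of this sketch, not something the text above specifies.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hand-written splitter, not Beam's reader.
class DelimiterSketch {
    // Splits data into UTF-8 records at each occurrence of the delimiter bytes.
    public static List<String> split(byte[] data, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0; // start of the current record
        int i = 0;     // scan position
        while (i + delimiter.length <= data.length) {
            if (matchesAt(data, i, delimiter)) {
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += delimiter.length;
                start = i;
            } else {
                i++;
            }
        }
        // Assumption of this sketch: emit remaining bytes, but no empty final record.
        if (start < data.length) {
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    // Returns true if the delimiter bytes occur in data starting at offset.
    private static boolean matchesAt(byte[] data, int offset, byte[] delimiter) {
        for (int j = 0; j < delimiter.length; j++) {
            if (data[offset + j] != delimiter[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] input = "a|b\nc|d".getBytes(StandardCharsets.UTF_8);
        // With '|' as the delimiter, the newline stays inside the second record.
        System.out.println(split(input, new byte[] {'|'}).size()); // 3
    }
}
```

Note that withDelimiter takes a byte[] rather than a String, so a multi-character delimiter is passed as its encoded bytes.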

withSkipHeaderLines

expand
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand in class PTransform<PBegin, PCollection<String>>

getSource

populateDisplayData
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class PTransform<PBegin, PCollection<String>>
- Parameters:
builder - The builder to populate with display data.
- See Also:
withCompression(org.apache.beam.sdk.io.Compression)