Class ContextualTextIO.Read
- All Implemented Interfaces:
Serializable, HasDisplayData
- Enclosing class:
ContextualTextIO
Implementation of ContextualTextIO.read().
-
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
-
Constructor Summary
Constructors
Read()
-
Method Summary
expand(PBegin): Override this method to specify how this PTransform should be expanded on the given InputT.
from(String filepattern): Reads text from the file(s) with the given filename or filename pattern.
from(ValueProvider<String> filepattern): Same as from(filepattern), but accepting a ValueProvider.
protected FileBasedSource<Row> getSource
void populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component.
withCompression(Compression compression): Reads from input sources using the specified compression type.
withDelimiter(byte[] delimiter): Set the custom delimiter to be used in place of the default ones ('\r', '\n' or '\r\n').
withEmptyMatchTreatment(EmptyMatchTreatment treatment)
withHasMultilineCSVRecords(Boolean hasMultilineCSVRecords): When reading RFC4180 CSV files that have values that span multiple lines, set this to true.
withHintMatchesManyFiles(): Hints that the filepattern specified in from(String) matches a very large number of files.
withMatchConfiguration(FileIO.MatchConfiguration matchConfiguration): Sets the FileIO.MatchConfiguration.
withRecordNumMetadata: Allows the user to opt into getting recordNums associated with each record.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
-
Constructor Details
-
Read
public Read()
-
-
Method Details
-
from
Reads text from the file(s) with the given filename or filename pattern.
This can be a local path (if running locally), or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>" (if running locally or using a remote execution service).
Standard Java Filesystem glob patterns ("*", "?", "[..]") are supported.
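As a rough illustration of the glob syntax mentioned above, the following sketch uses the JDK's own java.nio PathMatcher to test a few file names against "*", "?" and "[..]" patterns. It only demonstrates the pattern syntax; it is not how Beam resolves a filepattern, and the class name GlobSketch and the sample file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper, not part of Beam: shows how the three glob
// wildcards behave using the JDK's default file system matcher.
class GlobSketch {

    // Returns true if fileName matches the given glob pattern.
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // "*" matches any run of characters within one name.
        System.out.println(matches("part-*.txt", "part-00001.txt"));
        // "?" matches exactly one character.
        System.out.println(matches("part-?.txt", "part-1.txt"));
        System.out.println(matches("part-?.txt", "part-10.txt"));
        // "[..]" matches one character from the listed set or range.
        System.out.println(matches("part-[0-3].txt", "part-2.txt"));
        System.out.println(matches("part-[0-3].txt", "part-7.txt"));
    }
}
```

A pattern such as "gs://<bucket>/<filepath>" would be resolved by the runner's file system support rather than by this local matcher.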
If it is known that the filepattern will match a very large number of files (at least tens of thousands), use withHintMatchesManyFiles() for better performance and scalability.
-
from
Same as from(filepattern), but accepting a ValueProvider.
-
withMatchConfiguration
Sets the FileIO.MatchConfiguration.
-
withHasMultilineCSVRecords
When reading RFC4180 CSV files that have values that span multiple lines, set this to true. Note: this reduces the read performance (see: ContextualTextIO).
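To make the multiline case for withHasMultilineCSVRecords concrete, here is a minimal sketch of splitting RFC4180-style text into records when a quoted value contains a line break. It is not Beam's reader; the class MultilineCsvSketch and its method are hypothetical. The point it illustrates is that a newline ends a record only when it falls outside double quotes, so the record boundary depends on all the text scanned so far. That dependency may be related to the performance note above, but this page does not confirm the cause.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: splits CSV text into records,
// keeping line breaks that appear inside double-quoted values.
class MultilineCsvSketch {

    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == '\n' && !inQuotes) {
                // Record boundary: drop a trailing '\r' from a "\r\n" terminator.
                int len = current.length();
                if (len > 0 && current.charAt(len - 1) == '\r') {
                    current.setLength(len - 1);
                }
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "id,note\r\n1,\"line one\nline two\"\r\n2,plain\r\n";
        // The second record keeps its embedded line break.
        for (String record : splitRecords(csv)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

A plain line-based reader would instead cut the second record in two at the embedded newline.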
withCompression
Reads from input sources using the specified compression type.
If no compression type is specified, the default is Compression.AUTO.
-
withHintMatchesManyFiles
Hints that the filepattern specified in from(String) matches a very large number of files.
This hint may cause a runner to execute the transform differently, in a way that improves performance for this case, but it may worsen performance if the filepattern matches only a small number of files (e.g., in a runner that supports dynamic work rebalancing, it will happen less efficiently within individual files).
-
withRecordNumMetadata
Allows the user to opt into getting recordNums associated with each record. This option is only supported with default triggers.
When set to true, it will introduce a grouping step to assemble the recordNums for each record, which will increase the resources used by the pipeline.
Use this when you need metadata like fileNames and you need processed position/order information.
-
withEmptyMatchTreatment
-
withDelimiter
Set the custom delimiter to be used in place of the default ones ('\r', '\n' or '\r\n').
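The effect of a custom byte[] delimiter, as taken by withDelimiter, can be sketched with plain Java. The example below splits a byte array on a two-byte delimiter, so that '\n' inside a record is no longer treated as a record boundary. It is an illustration only, not Beam's implementation; the class DelimiterSketch, the "|~" delimiter and the UTF-8 decoding are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: splits raw bytes into records
// on a caller-supplied delimiter instead of '\r', '\n' or "\r\n".
class DelimiterSketch {

    static List<String> split(byte[] data, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i <= data.length - delimiter.length) {
            if (startsWith(data, i, delimiter)) {
                // Emit the bytes seen since the previous delimiter (assumed UTF-8).
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += delimiter.length;
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length) {
            // Trailing record with no closing delimiter.
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    private static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        for (int j = 0; j < prefix.length; j++) {
            if (data[offset + j] != prefix[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "a|~b\nc|~d".getBytes(StandardCharsets.UTF_8);
        byte[] delimiter = "|~".getBytes(StandardCharsets.UTF_8);
        // The newline stays inside the second record.
        System.out.println(split(data, delimiter).size());
    }
}
```

In a pipeline, the same byte sequence would be the argument passed to withDelimiter.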
expand
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand in class PTransform<PBegin, PCollection<Row>>
-
getSource
-
populateDisplayData
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class PTransform<PBegin, PCollection<Row>>
- Parameters:
builder - The builder to populate with display data.