public abstract static class ContextualTextIO.Read
extends PTransform<PBegin, PCollection<Row>>
Implementation of ContextualTextIO.read().
Constructor and Description |
---|
Read() |
Modifier and Type | Method | Description |
---|---|---|
PCollection<Row> | expand(PBegin input) | |
ContextualTextIO.Read | from(java.lang.String filepattern) | Reads text from the file(s) with the given filename or filename pattern. |
ContextualTextIO.Read | from(ValueProvider<java.lang.String> filepattern) | Same as from(filepattern), but accepting a ValueProvider. |
protected FileBasedSource<Row> | getSource() | |
void | populateDisplayData(DisplayData.Builder builder) | |
ContextualTextIO.Read | withCompression(Compression compression) | Reads from input sources using the specified compression type. |
ContextualTextIO.Read | withDelimiter(byte[] delimiter) | Set the custom delimiter to be used in place of the default ones ('\r', '\n' or '\r\n'). |
ContextualTextIO.Read | withEmptyMatchTreatment(EmptyMatchTreatment treatment) | |
ContextualTextIO.Read | withHasMultilineCSVRecords(java.lang.Boolean hasMultilineCSVRecords) | When reading RFC4180 CSV files that have values that span multiple lines, set this to true. |
ContextualTextIO.Read | withHintMatchesManyFiles() | Hints that the filepattern specified in from(String) matches a very large number of files. |
ContextualTextIO.Read | withMatchConfiguration(FileIO.MatchConfiguration matchConfiguration) | Sets the FileIO.MatchConfiguration. |
ContextualTextIO.Read | withRecordNumMetadata() | Allows the user to opt into getting recordNums associated with each record. |
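The `withHasMultilineCSVRecords` entry in the table above concerns RFC 4180 files in which a quoted value contains line breaks. The minimal JDK-only sketch below shows why such files need special handling: splitting on line terminators alone cuts one record in two, while tracking whether the scanner is inside double quotes keeps the record whole. `CsvRecordSplitter` is a hypothetical helper written for illustration; it is not ContextualTextIO's implementation and ignores the range-splitting concerns a parallel file source has to handle.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits CSV text into records, keeping line breaks that sit inside quoted fields. */
class CsvRecordSplitter {

    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // Every quote flips the state; an escaped quote ("") flips it twice, leaving it unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if ((c == '\r' || c == '\n') && !inQuotes) {
                // A terminator outside quotes ends the record; "\r\n" counts as one terminator.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                records.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary characters, including line breaks inside quotes, stay in the record.
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // last record had no trailing terminator
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "id,comment\r\n1,\"first line\r\nsecond line\"\r\n2,plain\r\n";
        System.out.println(csv.split("\r\n").length); // naive split: 4 pieces
        System.out.println(splitRecords(csv).size()); // quote-aware split: 3 records
    }
}
```

The naive split yields four pieces for what is really three records, which is the situation the flag is meant to signal.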
public ContextualTextIO.Read from(java.lang.String filepattern)
Reads text from the file(s) with the given filename or filename pattern.
This can be a local path (if running locally), or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>" (if running locally or using a remote execution service).
Standard Java Filesystem glob patterns ("*", "?", "[..]") are supported.
If it is known that the filepattern will match a very large number of files (at least tens of thousands), use withHintMatchesManyFiles() for better performance and scalability.
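The glob characters listed above can be tried locally with the JDK's own `PathMatcher`. This is a sketch of local glob semantics only, assuming a POSIX-style "/" separator; how a pattern such as "gs://<bucket>/<filepath>" is expanded depends on the filesystem registered for that scheme, and `GlobDemo` is a hypothetical name.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class GlobDemo {
    /** Returns true if the path matches the pattern under the JDK's "glob:" syntax. */
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matches("logs/*.txt", "logs/a.txt"));      // true: '*' matches within one path component
        System.out.println(matches("logs/*.txt", "logs/2024/a.txt")); // false: '*' does not cross '/'
        System.out.println(matches("part-?.csv", "part-7.csv"));      // true: '?' matches exactly one character
        System.out.println(matches("part-[0-3].csv", "part-7.csv"));  // false: '7' is outside [0-3]
    }
}
```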
public ContextualTextIO.Read from(ValueProvider<java.lang.String> filepattern)
Same as from(filepattern), but accepting a ValueProvider.
public ContextualTextIO.Read withMatchConfiguration(FileIO.MatchConfiguration matchConfiguration)
Sets the FileIO.MatchConfiguration.
public ContextualTextIO.Read withHasMultilineCSVRecords(java.lang.Boolean hasMultilineCSVRecords)
When reading RFC4180 CSV files that have values that span multiple lines, set this to true (see ContextualTextIO).
public ContextualTextIO.Read withCompression(Compression compression)
Reads from input sources using the specified compression type. If no compression type is specified, the default is Compression.AUTO.
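`Compression.AUTO` chooses a decompressor from the file name extension. The sketch below imitates that idea for a single case using only the JDK: files ending in ".gz" are wrapped in a `GZIPInputStream` and everything else is read as-is. `AutoCompressionDemo`, `openAuto`, `readLines`, and `writeSample` are hypothetical names, and Beam's AUTO mode recognizes more formats than this sketch does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class AutoCompressionDemo {
    /** Hypothetical helper: picks a decoder from the file extension; only ".gz" is recognized here. */
    static InputStream openAuto(Path file) throws IOException {
        InputStream raw = Files.newInputStream(file);
        if (!file.getFileName().toString().endsWith(".gz")) {
            return raw; // any other name is read as uncompressed
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException e) {
            raw.close(); // do not leak the file handle if the gzip header is invalid
            throw e;
        }
    }

    static List<String> readLines(Path file) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(openAuto(file), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes two lines, a and b, to a temp file with the given suffix, gzip-compressed if it ends in ".gz". */
    static Path writeSample(String suffix) {
        try {
            Path file = Files.createTempFile("sample", suffix);
            byte[] data = "a\nb\n".getBytes(StandardCharsets.UTF_8);
            if (suffix.endsWith(".gz")) {
                try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
                    out.write(data);
                }
            } else {
                Files.write(file, data);
            }
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readLines(writeSample(".txt")));    // [a, b]
        System.out.println(readLines(writeSample(".txt.gz"))); // [a, b]
    }
}
```

Both files yield the same lines even though only one is compressed, which is the convenience AUTO provides when a filepattern mixes compressed and plain files.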
public ContextualTextIO.Read withHintMatchesManyFiles()
Hints that the filepattern specified in from(String) matches a very large number of files.
This hint may cause a runner to execute the transform differently, in a way that improves performance for this case. However, it may worsen performance if the filepattern matches only a small number of files: for example, in a runner that supports dynamic work rebalancing, rebalancing will happen less efficiently within individual files.
public ContextualTextIO.Read withRecordNumMetadata()
Allows the user to opt into getting recordNums associated with each record.
When enabled, this introduces a grouping step to assemble the recordNums for each record, which increases the resources used by the pipeline.
Use this when you need metadata such as file names together with processed position/order information.
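One way to see why record numbers cost an extra grouping step: when a file is split into byte ranges that are read in parallel, each reader only knows a record's position within its own range. File-wide numbers require the per-range results to be brought together and summed in file order. The sketch below shows that prefix-sum idea in plain Java; it is a conceptual illustration with hypothetical names and 0-based numbering chosen for the example, not ContextualTextIO's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RecordNumDemo {
    /**
     * Hypothetical: for ONE file, takes the records read from each byte range (keyed by the
     * range's start offset) and labels each record with a file-wide, 0-based record number.
     */
    static List<String> numberRecords(Map<Long, List<String>> recordsByRangeStart) {
        // All ranges of the file must be gathered and visited in offset order; this is the grouping step.
        TreeMap<Long, List<String>> ordered = new TreeMap<>(recordsByRangeStart);
        List<String> numbered = new ArrayList<>();
        long recordsBefore = 0; // prefix sum: number of records in all earlier ranges
        for (List<String> range : ordered.values()) {
            for (int i = 0; i < range.size(); i++) {
                numbered.add((recordsBefore + i) + ":" + range.get(i));
            }
            recordsBefore += range.size();
        }
        return numbered;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> ranges = new HashMap<>();
        ranges.put(64L, Arrays.asList("c", "d")); // second range: its reader only saw local positions 0 and 1
        ranges.put(0L, Arrays.asList("a", "b"));  // first range
        System.out.println(numberRecords(ranges)); // [0:a, 1:b, 2:c, 3:d]
    }
}
```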
public ContextualTextIO.Read withEmptyMatchTreatment(EmptyMatchTreatment treatment)
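The entry for `withEmptyMatchTreatment` carries no description here. Beam's `EmptyMatchTreatment` enum has the values `ALLOW`, `DISALLOW`, and `ALLOW_IF_WILDCARD`; the sketch below mirrors them in a local enum to show the decision each one implies when a filepattern matches no files. The wildcard check (`*`, `?`, `[`) and all names are assumptions made for illustration, not Beam's code.

```java
class EmptyMatchDemo {
    /** Local mirror of the values of Beam's EmptyMatchTreatment enum (not Beam's class). */
    enum Treatment { ALLOW, DISALLOW, ALLOW_IF_WILDCARD }

    /** Assumed wildcard test for this sketch: '*', '?' or '[' anywhere in the pattern. */
    static boolean hasWildcard(String filepattern) {
        return filepattern.indexOf('*') >= 0
                || filepattern.indexOf('?') >= 0
                || filepattern.indexOf('[') >= 0;
    }

    /** Returns true if reading may proceed, false if the match should be reported as an error. */
    static boolean isAcceptable(String filepattern, int matchedFiles, Treatment treatment) {
        if (matchedFiles > 0) {
            return true; // the treatment only matters when nothing matched
        }
        switch (treatment) {
            case ALLOW:
                return true;
            case DISALLOW:
                return false;
            case ALLOW_IF_WILDCARD:
                return hasWildcard(filepattern);
            default:
                throw new IllegalArgumentException("Unknown treatment: " + treatment);
        }
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("logs/*.txt", 0, Treatment.ALLOW_IF_WILDCARD)); // true
        System.out.println(isAcceptable("logs/a.txt", 0, Treatment.ALLOW_IF_WILDCARD)); // false
        System.out.println(isAcceptable("logs/a.txt", 0, Treatment.DISALLOW));          // false
    }
}
```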
public ContextualTextIO.Read withDelimiter(byte[] delimiter)
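Per the method summary, `withDelimiter` replaces the default terminators ('\r', '\n' or '\r\n') with a custom byte sequence. A minimal JDK-only sketch of splitting a byte array on such a delimiter follows; a newline inside a record is then kept as ordinary data. `DelimiterDemo.splitOn` is a hypothetical helper, and dropping the empty record after a trailing delimiter is an assumption of the sketch, not documented ContextualTextIO behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DelimiterDemo {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Hypothetical helper: splits data at every occurrence of the delimiter byte sequence. */
    static List<String> splitOn(byte[] data, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0; // first byte of the record being scanned
        int i = 0;
        while (i <= data.length - delimiter.length) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + delimiter.length), delimiter)) {
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += delimiter.length; // skip the delimiter itself
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length) {
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        // With "|*|" as the delimiter, the newline is ordinary data inside the first record.
        System.out.println(splitOn(utf8("x\ny|*|z"), utf8("|*|")).size()); // 2
        System.out.println(splitOn(utf8("a|*|b|*|c"), utf8("|*|")));       // [a, b, c]
    }
}
```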
public PCollection<Row> expand(PBegin input)
protected FileBasedSource<Row> getSource()
public void populateDisplayData(DisplayData.Builder builder)