public class TextIO
extends java.lang.Object
PTransforms for reading and writing text files.
To read a PCollection from one or more text files, use TextIO.read() to
instantiate a transform and use TextIO.Read.from(String) to specify the path of the
file(s) to be read.
TextIO.Read returns a PCollection of Strings, each
corresponding to one line of an input UTF-8 text file (split into lines delimited by '\n', '\r',
or '\r\n').
Example:
```java
Pipeline p = ...;

// A simple Read of a local file (only runs locally):
PCollection<String> lines = p.apply(TextIO.read().from("/local/path/to/file.txt"));
```
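The read description says each output element corresponds to one line, split on '\n', '\r', or '\r\n'. The plain-JDK sketch below shows those splitting rules on an in-memory string. It is an illustration, not Beam code. The class and method names are hypothetical. It also assumes that a trailing delimiter does not produce an extra empty element.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only (not Beam code): splits text into lines, treating
 * "\n", "\r", and "\r\n" as line delimiters. Names are hypothetical.
 */
public class LineSplitSketch {

    /** Returns the lines of {@code text}; delimiters are not part of the returned lines. */
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                lines.add(current.toString());
                current.setLength(0);
                // Treat "\r\n" as a single delimiter rather than two.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
            i++;
        }
        // Assumption: a final line without a trailing delimiter is still emitted,
        // while a trailing delimiter does not create an extra empty line.
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(splitLines("a\nb\r\nc\rd"));  // prints [a, b, c, d]
    }
}
```

An empty line in the middle of the input (for example "a\n\nb") yields an empty string element.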
To write a PCollection to one or more text files, use TextIO.write() and specify the
output prefix of the files to write with TextIO.Write.to(String).
By default, all input is put into the global window before writing. If per-window writes are
desired (for example, when using a streaming runner),
TextIO.Write.withWindowedWrites() will cause windowing and triggering to be
preserved. When producing windowed writes, the number of output shards must be set explicitly
using TextIO.Write.withNumShards(int); some runners may set this for you to a
runner-chosen value, so you may not need to set it yourself. A FileBasedSink.FilenamePolicy can also be
set if you need better control over naming the files created for individual windows, since the
DefaultFilenamePolicy for producing unique filenames might not be appropriate
for your use case.
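Shard counts and filename policies meet in the output file names. The plain-JDK sketch below illustrates how such names can be formed. It assumes a shard-name template in the style of "-SSSSS-of-NNNNN": each run of 'S' becomes the zero-padded shard index and each run of 'N' the zero-padded total shard count. The class and methods are hypothetical. Consult the DefaultFilenamePolicy documentation for the actual template and behavior.

```java
/**
 * Illustrative sketch (not Beam's implementation): expands a shard-name template in which
 * each run of 'S' is replaced by the zero-padded shard index and each run of 'N' by the
 * zero-padded shard count. All names here are hypothetical.
 */
public class ShardNameSketch {

    /** Expands S/N runs in {@code template}; the run length sets the zero-padding width. */
    static String expand(String template, int shardIndex, int numShards) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == 'S' || c == 'N') {
                int start = i;
                while (i < template.length() && template.charAt(i) == c) {
                    i++;
                }
                int width = i - start;
                int value = (c == 'S') ? shardIndex : numShards;
                out.append(String.format("%0" + width + "d", value));
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Builds prefix + expanded template + suffix, e.g. "/path/to/file-00000-of-00003.txt". */
    static String fileName(String prefix, String template, String suffix,
                           int shardIndex, int numShards) {
        return prefix + expand(template, shardIndex, numShards) + suffix;
    }

    public static void main(String[] args) {
        int numShards = 3;
        for (int shard = 0; shard < numShards; shard++) {
            System.out.println(fileName("/path/to/file", "-SSSSS-of-NNNNN", ".txt", shard, numShards));
        }
    }
}
```

Because a name of this form embeds the total shard count, every file in one output set carries the same "-of-N" value.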
Any existing files with the same names as generated output files will be overwritten.
For example:
```java
// A simple Write to a local file (only runs locally):
PCollection<String> lines = ...;
lines.apply(TextIO.write().to("/path/to/file.txt"));
```

```java
// Same as above, only with Gzip compression:
PCollection<String> lines = ...;
lines.apply(TextIO.write().to("/path/to/file")
    .withSuffix(".txt")
    .withWritableByteChannelFactory(FileBasedSink.CompressionType.GZIP));
```
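The plain-JDK sketch below shows what a gzip-compressed, line-per-element text file contains. Each element is written on its own line and the stream is gzip-compressed, then read back and decompressed. It is not Beam's sink implementation. It assumes '\n' as the line terminator and UTF-8 encoding. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Illustrative sketch (plain JDK, not Beam): writes each element on its own line into a
 * gzip-compressed byte array, then reads the lines back. Names are hypothetical.
 */
public class GzipLinesSketch {

    /** Encodes each element as one UTF-8 line ending in '\n', gzip-compressed. */
    static byte[] writeGzipLines(List<String> elements) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (String element : elements) {
                writer.write(element);
                writer.write('\n');  // one element per line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer above also finishes the gzip stream.
        return bytes.toByteArray();
    }

    /** Decompresses the bytes and returns one string per line. */
    static List<String> readGzipLines(byte[] compressed) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("first", "second", "third");
        byte[] compressed = writeGzipLines(elements);
        System.out.println(readGzipLines(compressed));  // prints [first, second, third]
    }
}
```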
| Modifier and Type | Class and Description |
|---|---|
| static class | TextIO.CompressionType: Possible text file compression types. |
| static class | TextIO.Read: Implementation of read(). |
| static class | TextIO.Write: Implementation of write(). |
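One common way to choose among text file compression types is to infer the type from the file extension. The plain-JDK sketch below shows that idea. It is hypothetical: the Mode enum and its extension mapping are assumptions for illustration and are not copied from TextIO.CompressionType. Check that enum's documentation for the actual values and matching rules.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: infers a compression mode from a file name's extension.
 * The Mode enum and the extension mapping are assumptions, not TextIO.CompressionType.
 */
public class CompressionByExtensionSketch {

    /** Hypothetical compression modes, named after common formats. */
    enum Mode { UNCOMPRESSED, GZIP, BZIP2, ZIP, DEFLATE }

    /** Returns the mode suggested by the extension, defaulting to UNCOMPRESSED. */
    static Mode detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz")) {
            return Mode.GZIP;
        } else if (lower.endsWith(".bz2")) {
            return Mode.BZIP2;
        } else if (lower.endsWith(".zip")) {
            return Mode.ZIP;
        } else if (lower.endsWith(".deflate")) {
            return Mode.DEFLATE;
        }
        return Mode.UNCOMPRESSED;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"logs-00000-of-00003.gz", "data.bz2", "notes.txt"}) {
            System.out.println(name + " -> " + detect(name));
        }
    }
}
```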
| Modifier and Type | Method and Description |
|---|---|
| static TextIO.Read | read(): A PTransform that reads from one or more text files and returns a bounded PCollection containing one element for each line of the input files. |
| static TextIO.Write | write(): A PTransform that writes a PCollection to a text file (or multiple text files matching a sharding pattern), with each element of the input collection encoded into its own line. |
public static TextIO.Read read()
A PTransform that reads from one or more text files and returns a bounded
PCollection containing one element for each line of the input files.

public static TextIO.Write write()
A PTransform that writes a PCollection to a text file (or multiple text files
matching a sharding pattern), with each element of the input collection encoded into its own
line.