public class TextIO
extends java.lang.Object
PTransforms for reading and writing text files.
To read a PCollection from one or more text files, use TextIO.read() to instantiate a transform and use TextIO.Read.from(String) to specify the path of the file(s) to be read.
TextIO.Read returns a PCollection of Strings, each corresponding to one line of an input UTF-8 text file (split into lines delimited by '\n', '\r', or '\r\n').
Example:
Pipeline p = ...;
// A simple Read of a local file (only runs locally):
PCollection<String> lines = p.apply(TextIO.read().from("/local/path/to/file.txt"));
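The line-delimiting rule described above ('\n', '\r', or '\r\n') can be illustrated with a small plain-JDK sketch. This is not Beam's reader; the class name LineSplitSketch and the method splitLines are hypothetical, and the handling of a final line with no trailing delimiter is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-JDK illustration of the line-delimiting rule; not Beam's implementation. */
class LineSplitSketch {

    /** Splits text on '\n', '\r', or '\r\n', treating "\r\n" as a single delimiter. */
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                // A '\r' immediately followed by '\n' counts as one delimiter.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                lines.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Assumption: a trailing line is emitted only if the text does not end with a delimiter.
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // prints [alpha, beta, gamma, delta]
        System.out.println(splitLines("alpha\nbeta\r\ngamma\rdelta"));
    }
}
```

Each returned string plays the role of one element of the PCollection of Strings; mixed line endings within a single file yield the same elements as uniform ones.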
To write a PCollection to one or more text files, use TextIO.write(), using TextIO.Write.to(String) to specify the output prefix of the files to write.
By default, all input is put into the global window before writing. If per-window writes are desired (for example, when using a streaming runner), TextIO.Write.withWindowedWrites() will cause windowing and triggering to be preserved. When producing windowed writes, the number of output shards must be set explicitly using TextIO.Write.withNumShards(int); some runners may set this for you to a runner-chosen value, so you may not need to set it yourself. A FileBasedSink.FilenamePolicy can also be set if you need better control over the names of files created for unique windows, since the DefaultFilenamePolicy for producing unique filenames might not be appropriate for your use case.
Any existing files with the same names as generated output files will be overwritten.
For example:
// A simple Write to a local file (only runs locally):
PCollection<String> lines = ...;
lines.apply(TextIO.write().to("/path/to/file.txt"));
// Same as above, only with Gzip compression:
PCollection<String> lines = ...;
lines.apply(TextIO.write().to("/path/to/file.txt")
    .withSuffix(".txt")
    .withWritableByteChannelFactory(FileBasedSink.CompressionType.GZIP));
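To make the relationship between the output prefix, the suffix, and the number of shards more concrete, the following plain-JDK sketch builds the kind of file names a sharded write might produce. It assumes a "-SSSSS-of-NNNNN" style shard pattern, which this page does not state; the class name ShardNameSketch and the method shardNames are hypothetical and are not part of TextIO or FileBasedSink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that illustrates sharded output names; not Beam's filename policy. */
class ShardNameSketch {

    /**
     * Builds one name per shard as prefix + "-SSSSS-of-NNNNN" + suffix.
     * The shard pattern is an assumption made for this illustration.
     */
    static List<String> shardNames(String prefix, String suffix, int numShards) {
        List<String> names = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            names.add(String.format(Locale.ROOT, "%s-%05d-of-%05d%s",
                    prefix, shard, numShards, suffix));
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints three names, one per line, starting with /path/to/file-00000-of-00003.txt
        shardNames("/path/to/file", ".txt", 3).forEach(System.out::println);
    }
}
```

Under this assumed pattern, names depend only on the prefix, shard index, and shard count, so output from different windows would collide unless the window is encoded somewhere in the name. That is the situation in which a custom FileBasedSink.FilenamePolicy becomes useful.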
Modifier and Type | Class and Description |
---|---|
static class | TextIO.CompressionType: Possible text file compression types. |
static class | TextIO.Read: Implementation of read(). |
static class | TextIO.Write: Implementation of write(). |
Modifier and Type | Method and Description |
---|---|
static TextIO.Read | read(): A PTransform that reads from one or more text files and returns a bounded PCollection containing one element for each line of the input files. |
static TextIO.Write | write(): A PTransform that writes a PCollection to a text file (or multiple text files matching a sharding pattern), with each element of the input collection encoded into its own line. |
public static TextIO.Read read()
A PTransform that reads from one or more text files and returns a bounded PCollection containing one element for each line of the input files.
public static TextIO.Write write()
A PTransform that writes a PCollection to a text file (or multiple text files matching a sharding pattern), with each element of the input collection encoded into its own line.
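As a rough illustration of "each element encoded into its own line" combined with Gzip compression, the sketch below writes a list of strings as newline-terminated UTF-8 lines into a gzip stream and reads them back, using only java.util.zip. It does not use Beam; the class GzipLinesSketch and its methods are hypothetical names, and the use of '\n' as the line terminator is an assumption made for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Plain-JDK illustration of one-element-per-line output with gzip; not Beam's sink. */
class GzipLinesSketch {

    /** Encodes each element as one UTF-8 line and gzip-compresses the result. */
    static byte[] writeLines(List<String> elements) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (String element : elements) {
                out.write(element);
                out.write('\n'); // assumed line terminator
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed here, so the gzip trailer has been written.
        return bytes.toByteArray();
    }

    /** Decompresses the bytes and returns one string per line. */
    static List<String> readLines(byte[] gzipped) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] gzipped = writeLines(List.of("first", "second", "third"));
        // prints [first, second, third]
        System.out.println(readLines(gzipped));
    }
}
```

One consequence of line-based encoding is visible here: an element that itself contains a line break would come back as more than one line, so the round trip is only lossless for elements without embedded delimiters.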