public class TextIO
extends java.lang.Object
PTransforms for reading and writing text files.
To read a PCollection from one or more text files, use TextIO.read() to instantiate a transform and use TextIO.Read.from(String) to specify the path of the file(s) to be read.
TextIO.Read returns a PCollection of Strings, each corresponding to one line of an input UTF-8 text file (split into lines delimited by '\n', '\r', or '\r\n').
Example:
Pipeline p = ...;
// A simple Read of a local file (only runs locally):
PCollection<String> lines = p.apply(TextIO.read().from("/local/path/to/file.txt"));
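The line-delimiting rule described above ('\n', '\r', or '\r\n') can be tried out without a pipeline. The following is a minimal sketch that uses `java.io.BufferedReader`, which applies the same three terminators. The class and method names (`LineSplitSketch`, `splitLines`) are hypothetical, and this is not how TextIO reads files internally, so edge cases may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineSplitSketch {
    // Hypothetical helper (not part of Beam): splits text into lines
    // terminated by '\n', '\r', or "\r\n", as BufferedReader.readLine() does.
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // A StringReader should not fail, but readLine() declares IOException.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mixed terminators in one input; each piece becomes one element.
        System.out.println(splitLines("alpha\nbeta\rgamma\r\ndelta"));
        // prints [alpha, beta, gamma, delta]
    }
}
```

Each returned string plays the role of one element of the PCollection produced by the read transform.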
To write a PCollection to one or more text files, use TextIO.write(), using TextIO.Write.to(String) to specify the output prefix of the files to write.
By default, all input is put into the global window before writing. If per-window writes are desired (for example, when using a streaming runner), TextIO.Write.withWindowedWrites() will cause windowing and triggering to be preserved. When producing windowed writes, the number of output shards must be set explicitly using TextIO.Write.withNumShards(int); some runners may set this for you to a runner-chosen value, in which case you may not need to set it yourself. A FileBasedSink.FilenamePolicy must also be set, and unique windows and triggers must produce unique filenames.
Any existing files with the same names as generated output files will be overwritten.
For example:
// A simple Write to a local file (only runs locally):
PCollection<String> lines = ...;
lines.apply(TextIO.write().to("/path/to/file.txt"));
// Same as above, only with Gzip compression:
PCollection<String> lines = ...;
lines.apply(TextIO.write().to("/path/to/file.txt")
    .withSuffix(".txt")
    .withWritableByteChannelFactory(FileBasedSink.CompressionType.GZIP));
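The requirement that unique windows and triggers produce unique filenames can be illustrated with a small naming function. This is a sketch under stated assumptions: `WindowedNameSketch` and `windowedName` are hypothetical and do not implement Beam's FileBasedSink.FilenamePolicy. The sketch only shows one way to fold the window bounds, the pane (trigger firing) index, and the shard number into a name so that no two outputs collide.

```java
public class WindowedNameSketch {
    // Hypothetical helper (not Beam API): builds a filename from the window
    // bounds in epoch milliseconds, the pane index, and the shard position.
    static String windowedName(String prefix, long windowStartMillis, long windowEndMillis,
                               long paneIndex, int shard, int numShards, String suffix) {
        return String.format("%s-%d-%d-pane-%d-%05d-of-%05d%s",
                prefix, windowStartMillis, windowEndMillis, paneIndex, shard, numShards, suffix);
    }

    public static void main(String[] args) {
        // Two consecutive one-minute windows, first pane, shard 0 of 3.
        System.out.println(windowedName("out", 0L, 60_000L, 0L, 0, 3, ".txt"));
        System.out.println(windowedName("out", 60_000L, 120_000L, 0L, 0, 3, ".txt"));
        // A later firing of the first window differs only in the pane index.
        System.out.println(windowedName("out", 0L, 60_000L, 1L, 0, 3, ".txt"));
    }
}
```

Because every varying input appears in the name, a later pane or a different window cannot overwrite an earlier file, which matters given that existing files with the same name are overwritten.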
Modifier and Type | Class and Description |
---|---|
static class | TextIO.CompressionType: Possible text file compression types. |
static class | TextIO.Read: Implementation of read(). |
static class | TextIO.Write: Implementation of write(). |
Modifier and Type | Method and Description |
---|---|
static TextIO.Read | read(): A PTransform that reads from one or more text files and returns a bounded PCollection containing one element for each line of the input files. |
static TextIO.Write | write(): A PTransform that writes a PCollection to a text file (or multiple text files matching a sharding pattern), with each element of the input collection encoded into its own line. |
public static TextIO.Read read()
A PTransform that reads from one or more text files and returns a bounded PCollection containing one element for each line of the input files.

public static TextIO.Write write()
A PTransform that writes a PCollection to a text file (or multiple text files matching a sharding pattern), with each element of the input collection encoded into its own line.
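
The write behavior described above (one line per element, spread over files that match a sharding pattern) can be modeled in memory. The sketch below is illustrative only. `ShardedLinesSketch` and `shard` are hypothetical names, the round-robin assignment is an assumption (a runner decides which elements land in which shard), and the `-SSSSS-of-NNNNN` naming is assumed to mirror the default shard template.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardedLinesSketch {
    // Hypothetical helper (not Beam API): maps shard filename to file content,
    // with each element terminated by '\n' so it occupies its own line.
    static Map<String, String> shard(List<String> elements, String prefix,
                                     int numShards, String suffix) {
        List<StringBuilder> buffers = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            buffers.add(new StringBuilder());
        }
        // Assumption: round-robin placement, chosen only to keep the sketch deterministic.
        for (int i = 0; i < elements.size(); i++) {
            buffers.get(i % numShards).append(elements.get(i)).append('\n');
        }
        Map<String, String> files = new LinkedHashMap<>();
        for (int i = 0; i < numShards; i++) {
            String name = String.format("%s-%05d-of-%05d%s", prefix, i, numShards, suffix);
            files.put(name, buffers.get(i).toString());
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> elements = new ArrayList<>();
        elements.add("a");
        elements.add("b");
        elements.add("c");
        // Prints the two shard names and the lines each one holds.
        shard(elements, "out", 2, ".txt")
                .forEach((name, content) -> System.out.println(name + " -> " + content.trim()));
    }
}
```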