public class TextIO
extends java.lang.Object
PTransforms for reading and writing text files.
 To read a PCollection from one or more text files, use TextIO.read() to
 instantiate a transform and use TextIO.Read.from(String) to specify the path of the
 file(s) to be read. Alternatively, if the filenames to be read are themselves in a PCollection you can use FileIO to match them and readFiles() to read them.
 
read() returns a PCollection of Strings, each corresponding to
 one line of an input UTF-8 text file (split into lines delimited by '\n', '\r', or '\r\n', or
 specified delimiter see TextIO.Read.withDelimiter(byte[])).
 
By default, the filepatterns are expanded only once. TextIO.Read.watchForNewFiles(org.joda.time.Duration, org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition<java.lang.String, ?>) or the
 combination of FileIO.Match#continuously(Duration, TerminationCondition) and readFiles() allow streaming of new files matching the filepattern(s).
 
By default, read() prohibits filepatterns that match no files, and readFiles()
 allows them in case the filepattern contains a glob wildcard character. Use TextIO.Read.withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment) or FileIO.Match.withEmptyMatchTreatment(EmptyMatchTreatment) plus readFiles() to configure
 this behavior.
 
Example 1: reading a file or filepattern.
 Pipeline p = ...;
 // A simple Read of a local file (only runs locally):
 PCollection<String> lines = p.apply(TextIO.read().from("/local/path/to/file.txt"));
 Example 2: reading a PCollection of filenames.
 Pipeline p = ...;
 // E.g. the filenames might be computed from other data in the pipeline, or
 // read from a data source.
 PCollection<String> filenames = ...;
 // Read all files in the collection.
 PCollection<String> lines =
     filenames
         .apply(FileIO.matchAll())
         .apply(FileIO.readMatches())
         .apply(TextIO.readFiles());
 Example 3: streaming new files matching a filepattern.
 Pipeline p = ...;
 PCollection<String> lines = p.apply(TextIO.read()
     .from("/local/path/to/files/*")
     .watchForNewFiles(
       // Check for new files every minute
       Duration.standardMinutes(1),
       // Stop watching the filepattern if no new files appear within an hour
       afterTimeSinceNewOutput(Duration.standardHours(1))));
 If it is known that the filepattern will match a very large number of files (e.g. tens of
 thousands or more), use TextIO.Read.withHintMatchesManyFiles() for better performance and
 scalability. Note that it may decrease performance if the filepattern matches only a small number
 of files.
 
To write a PCollection to one or more text files, use TextIO.write(), using
 TextIO.Write.to(String) to specify the output prefix of the files to write.
 
For example:
 // A simple Write to a local file (only runs locally):
 PCollection<String> lines = ...;
 lines.apply(TextIO.write().to("/path/to/file.txt"));
 // Same as above, only with Gzip compression:
 PCollection<String> lines = ...;
 lines.apply(TextIO.write().to("/path/to/file.txt"))
      .withSuffix(".txt")
      .withCompression(Compression.GZIP));
 Any existing files with the same names as generated output files will be overwritten.
If you want better control over how filenames are generated than the default policy allows, a
 custom FileBasedSink.FilenamePolicy can also be set using TextIO.Write#to(FilenamePolicy).
 
TextIO supports all features of FileIO.write() and FileIO.writeDynamic(),
 such as writing windowed/unbounded data, writing data to multiple destinations, and so on, by
 providing a TextIO.Sink via sink().
 
For example, to write events of different type to different filenames:
 PCollection<Event> events = ...;
 events.apply(FileIO.<EventType, Event>writeDynamic()
       .by(Event::getTypeName)
       .via(TextIO.sink(), Event::toString)
       .to(type -> nameFilesUsingWindowPaneAndShard(".../events/" + type + "/data", ".txt")));
 For backwards compatibility, TextIO also supports the legacy FileBasedSink.DynamicDestinations interface for advanced features via Write#to(DynamicDestinations).
| Modifier and Type | Class and Description | 
|---|---|
| static class  | TextIO.CompressionTypeDeprecated. 
 Use  Compression. | 
| static class  | TextIO.ReadImplementation of  read(). | 
| static class  | TextIO.ReadAllDeprecated. 
 See  readAll()for details. | 
| static class  | TextIO.ReadFilesImplementation of  readFiles(). | 
| static class  | TextIO.SinkImplementation of  sink(). | 
| static class  | TextIO.TypedWrite<UserT,DestinationT>Implementation of  write(). | 
| static class  | TextIO.WriteThis class is used as the default return value of  write(). | 
| Modifier and Type | Method and Description | 
|---|---|
| static TextIO.Read | read()A  PTransformthat reads from one or more text files and returns a boundedPCollectioncontaining one element for each line of the input files. | 
| static TextIO.ReadAll | readAll()Deprecated. 
 You can achieve The functionality of  readAll()usingFileIOmatching plusreadFiles(). This is the preferred method to make composition
     explicit.TextIO.ReadAllwill not receive upgrades and will be removed in a future version
     of Beam. | 
| static TextIO.ReadFiles | readFiles()Like  read(), but reads each file in aPCollectionofFileIO.ReadableFile, returned byFileIO.readMatches(). | 
| static TextIO.Sink | sink()Creates a  TextIO.Sinkthat writes newline-delimited strings in UTF-8, for use withFileIO.write(). | 
| static TextIO.Write | write()A  PTransformthat writes aPCollectionto a text file (or multiple text files
 matching a sharding pattern), with each element of the input collection encoded into its own
 line. | 
| static <UserT> TextIO.TypedWrite<UserT,java.lang.Void> | writeCustomType()A  PTransformthat writes aPCollectionto a text file (or multiple text files
 matching a sharding pattern), with each element of the input collection encoded into its own
 line. | 
public static TextIO.Read read()
PTransform that reads from one or more text files and returns a bounded PCollection containing one element for each line of the input files.@Deprecated public static TextIO.ReadAll readAll()
readAll() using FileIO
     matching plus readFiles(). This is the preferred method to make composition
     explicit. TextIO.ReadAll will not receive upgrades and will be removed in a future version
     of Beam.PTransform that works like read(), but reads each file in a PCollection of filepatterns.
 Can be applied to both bounded and unbounded PCollections, so this is
 suitable for reading a PCollection of filepatterns arriving as a stream. However, every
 filepattern is expanded once at the moment it is processed, rather than watched for new files
 matching the filepattern to appear. Likewise, every file is read once, rather than watched for
 new entries.
public static TextIO.ReadFiles readFiles()
read(), but reads each file in a PCollection of FileIO.ReadableFile, returned by FileIO.readMatches().public static TextIO.Write write()
PTransform that writes a PCollection to a text file (or multiple text files
 matching a sharding pattern), with each element of the input collection encoded into its own
 line.public static <UserT> TextIO.TypedWrite<UserT,java.lang.Void> writeCustomType()
PTransform that writes a PCollection to a text file (or multiple text files
 matching a sharding pattern), with each element of the input collection encoded into its own
 line.
 This version allows you to apply TextIO writes to a PCollection of a custom type
 UserT. A format mechanism that converts the input type UserT to the String that
 will be written to the file must be specified. If using a custom FileBasedSink.DynamicDestinations
 object this is done using FileBasedSink.DynamicDestinations.formatRecord(UserT), otherwise the TextIO.TypedWrite.withFormatFunction(org.apache.beam.sdk.transforms.SerializableFunction<UserT, java.lang.String>) can be used to specify a format function.
 
The advantage of using a custom type is that is it allows a user-provided FileBasedSink.DynamicDestinations object, set via Write#to(DynamicDestinations) to examine the
 custom type when choosing a destination.
public static TextIO.Sink sink()
TextIO.Sink that writes newline-delimited strings in UTF-8, for use with FileIO.write().