ContextualTextIO (Apache Beam 2.49.0)

java.lang.Object
- org.apache.beam.sdk.io.contextualtextio.ContextualTextIO

```
public class ContextualTextIO
extends java.lang.Object
```
PTransforms that read text files and collect contextual information of the elements in the input.
Prefer TextIO when the files do not contain multi-line records and additional record metadata is not required.
Reading from text files

To read a PCollection from one or more text files, use ContextualTextIO.read(). To instantiate a transform, use ContextualTextIO.Read.from(String) and specify the path of the file(s) to be read. Alternatively, if the filenames to be read are themselves in a PCollection, you can use FileIO to match them and readFiles() to read them.
read() returns a PCollection of Rows with schema RecordWithMetadata.getSchema(), each corresponding to one line of an input UTF-8 text file (split into lines delimited by '\n', '\r', or '\r\n', or by a custom delimiter specified via ContextualTextIO.Read.withDelimiter(byte[])).
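The default line-splitting rule described above can be illustrated with a small, self-contained sketch. This is not Beam's reader; the class name LineSplitSketch and the method splitLines are hypothetical, and the sketch only assumes that '\n', '\r', and '\r\n' each end exactly one line.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits text on '\n', '\r', or '\r\n'. Not Beam's implementation. */
final class LineSplitSketch {
  static List<String> splitLines(String text) {
    List<String> lines = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (c == '\n' || c == '\r') {
        // Treat "\r\n" as a single delimiter by skipping the '\n' that follows '\r'.
        if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
          i++;
        }
        lines.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    // Emit a trailing line that has no terminating delimiter.
    if (current.length() > 0) {
      lines.add(current.toString());
    }
    return lines;
  }
}
```

Under these assumptions, the input "a\nb\r\nc\rd" splits into four lines regardless of which of the three delimiters separates them.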
Filepattern expansion and watching

By default, the filepatterns are expanded only once. The combination of FileIO.Match#continuously(Duration, TerminationCondition) and readFiles() allows streaming of new files matching the filepattern(s).
By default, read() prohibits filepatterns that match no files, and readFiles() allows them if the filepattern contains a glob wildcard character. Use ContextualTextIO.Read.withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment) or FileIO.Match#withEmptyMatchTreatment(EmptyMatchTreatment) plus readFiles() to configure this behavior.
Example 1: reading a file or filepattern.
```
 Pipeline p = ...;

 // A simple Read of a file:
 PCollection<Row> records = p.apply(ContextualTextIO.read().from("/local/path/to/file.txt"));
 
```
Example 2: reading a PCollection of filenames.
```
 Pipeline p = ...;

 // E.g. the filenames might be computed from other data in the pipeline, or
 // read from a data source.
 PCollection<String> filenames = ...;

 // Read all files in the collection.
 PCollection<Row> records =
     filenames
         .apply(FileIO.matchAll())
         .apply(FileIO.readMatches())
         .apply(ContextualTextIO.readFiles());
 
```
Example 3: streaming new files matching a filepattern.
```
 Pipeline p = ...;

 PCollection<Row> records = p.apply(ContextualTextIO.read()
     .from("/local/path/to/files/*")
     .watchForNewFiles(
       // Check for new files every minute
       Duration.standardMinutes(1),
       // Stop watching the filepattern if no new files appear within an hour
       afterTimeSinceNewOutput(Duration.standardHours(1))));
 
```
Example 4: reading a file or file pattern of RFC4180-compliant CSV files with fields that may contain line breaks.
Such a file could look like this:
"aaa","b CRLF bb","ccc" CRLF zzz,yyy,xxx
```
 Pipeline p = ...;

 PCollection<Row> records = p.apply(ContextualTextIO.read()
     .from("/local/path/to/files/*.csv")
     .withHasMultilineCSVRecords(true));
 
```
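To show why line breaks inside quoted fields must not end a record, here is a minimal, self-contained sketch of RFC4180-style record splitting. It is not the parser Beam uses; the class MultilineCsvSketch and the method splitRecords are hypothetical names, and the sketch assumes that CRLF in the sample above stands for a carriage return followed by a line feed.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits CSV text into records, keeping line breaks inside quotes. */
final class MultilineCsvSketch {
  static List<String> splitRecords(String text) {
    List<String> records = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (c == '"') {
        // Each quote toggles the state; an escaped quote ("") toggles twice, a net no-op.
        inQuotes = !inQuotes;
        current.append(c);
      } else if (!inQuotes && (c == '\r' || c == '\n')) {
        if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
          i++; // Consume the '\n' of a CRLF pair.
        }
        records.add(current.toString());
        current.setLength(0);
      } else {
        // Ordinary characters, and line breaks inside quotes, belong to the current record.
        current.append(c);
      }
    }
    if (current.length() > 0) {
      records.add(current.toString());
    }
    return records;
  }
}
```

Applied to the sample file, the sketch yields two records rather than three lines, because the line break inside the second field stays part of the first record. Whether a given line break ends a record depends on the quote state accumulated from the start of the file, so this kind of parsing has to proceed sequentially.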
Example 5: reading while watching for new files.
```
 Pipeline p = ...;

 PCollection<Row> records = p.apply(FileIO.match()
      .filepattern("filepattern")
      .continuously(
        Duration.millis(100),
        Watch.Growth.afterTimeSinceNewOutput(Duration.standardSeconds(3))))
      .apply(FileIO.readMatches())
      .apply(ContextualTextIO.readFiles());
 
```
Example 6: reading with recordNum metadata.
```
 Pipeline p = ...;

 PCollection<Row> records = p.apply(ContextualTextIO.read()
     .from("/local/path/to/files/*.csv")
     .withRecordNumMetadata(true));
 
```
NOTE: When using ContextualTextIO.Read.withHasMultilineCSVRecords(Boolean), a single reader will be used to process the file, rather than multiple readers which can read from different offsets. For a large file this can result in lower performance.
NOTE: Use ContextualTextIO.Read.withRecordNumMetadata() when recordNum metadata is required. Computing absolute record positions currently introduces a grouping step, which increases the resources used by the pipeline. By default, withRecordNumMetadata is set to false. In this case, record objects will not contain absolute record positions within the entire file, but will still contain positions relative to their respective offsets.
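The relationship between relative and absolute record positions can be sketched as follows. This is a conceptual illustration under stated assumptions, not Beam's implementation: it assumes a file is read in byte ranges identified by their start offsets, and the names RecordNumSketch, recordsBeforeEachRange, and absoluteRecordNum are hypothetical. It suggests why absolute positions need information gathered across all ranges of a file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: derives absolute record numbers from per-range record counts. */
final class RecordNumSketch {
  /**
   * Given the number of records found in each byte range (keyed by the range's start offset),
   * returns how many records precede each range in the file.
   */
  static Map<Long, Long> recordsBeforeEachRange(Map<Long, Long> recordCountByRangeStart) {
    Map<Long, Long> before = new LinkedHashMap<>();
    long runningTotal = 0;
    // A TreeMap iterates the ranges in ascending start-offset order.
    for (Map.Entry<Long, Long> e : new TreeMap<>(recordCountByRangeStart).entrySet()) {
      before.put(e.getKey(), runningTotal);
      runningTotal += e.getValue();
    }
    return before;
  }

  /** Absolute position = records before the range + position relative to the range. */
  static long absoluteRecordNum(Map<Long, Long> before, long rangeStart, long recordNumInRange) {
    return before.get(rangeStart) + recordNumInRange;
  }
}
```

In this sketch, a reader that only sees its own range can produce the relative position on its own, while the absolute position requires the counts from every earlier range to be collected first.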
Reading a very large number of files

If it is known that the filepattern will match a very large number of files (e.g. tens of thousands or more), use ContextualTextIO.Read.withHintMatchesManyFiles() for better performance and scalability. Note that it may decrease performance if the filepattern matches only a small number of files.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`ContextualTextIO.Read` Implementation of `read()`.
`static class`	`ContextualTextIO.ReadFiles` Implementation of `readFiles()`.

Method Summary

Modifier and Type	Method and Description
`static ContextualTextIO.Read`	`read()` A `PTransform` that reads from one or more text files and returns a bounded `PCollection` containing one `element` for each line in the input files.
`static ContextualTextIO.ReadFiles`	`readFiles()` Like `read()`, but reads each file in a `PCollection` of `FileIO.ReadableFile`, returned by `FileIO.readMatches()`.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - read
```
public static ContextualTextIO.Read read()
```
    A PTransform that reads from one or more text files and returns a bounded PCollection containing one element for each line in the input files.
  - readFiles
```
public static ContextualTextIO.ReadFiles readFiles()
```
    Like read(), but reads each file in a PCollection of FileIO.ReadableFile, returned by FileIO.readMatches().
