CsvIO (Apache Beam 2.55.0)

java.lang.Object
- org.apache.beam.sdk.io.csv.CsvIO

```
public class CsvIO
extends java.lang.Object
```
PTransforms for reading and writing CSV files.
Reading CSV files

Reading from CSV files is not yet implemented. Please see https://github.com/apache/beam/issues/24552.
Writing CSV files

To write a PCollection to one or more CSV files, use CsvIO.Write, which you obtain from write(java.lang.String, org.apache.commons.csv.CSVFormat) for custom Java types or writeRows(java.lang.String, org.apache.commons.csv.CSVFormat) for Rows. CsvIO.Write supports writing Rows as well as custom Java types that have an inferred Schema. The examples below show both scenarios. See the Beam Programming Guide section on inferring schemas for how to enable Beam to infer a Schema from a custom Java type.
CsvIO.Write only supports writing the parts of Schema-aware types that do not contain any nested Schema.FieldTypes, such as Schema.TypeName.ROW or repeated Schema.TypeName.ARRAY types. See VALID_FIELD_TYPE_SET for the valid Schema.FieldTypes.
Example usage:

Suppose we have the following Transaction class annotated with @DefaultSchema(JavaBeanSchema.class) so that Beam can infer its Schema:
```
 @DefaultSchema(JavaBeanSchema.class)
 public class Transaction {
   public Transaction() { … }
   public Long getTransactionId() { … }
   public void setTransactionId(Long transactionId) { … }
   public String getBank() { … }
   public void setBank(String bank) { … }
   public double getPurchaseAmount() { … }
   public void setPurchaseAmount(double purchaseAmount) { … }
 }
 
```
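Bean schema inference works from getter/setter pairs like the ones above. As a rough, self-contained illustration (not Beam's `JavaBeanSchema` code), the sketch below uses the JDK's `java.beans.Introspector` to list the read/write properties of a copy of `Transaction` with its method bodies filled in. The class name `BeanFieldNames` and the explicit sorting step are assumptions made for this example.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only, not Beam's JavaBeanSchema: lists the JavaBean properties
// (getter/setter pairs) of a class, sorted by name.
final class BeanFieldNames {
  // Self-contained copy of the Transaction bean above, with method bodies filled in.
  public static class Transaction {
    private Long transactionId;
    private String bank;
    private double purchaseAmount;

    public Transaction() {}
    public Long getTransactionId() { return transactionId; }
    public void setTransactionId(Long transactionId) { this.transactionId = transactionId; }
    public String getBank() { return bank; }
    public void setBank(String bank) { this.bank = bank; }
    public double getPurchaseAmount() { return purchaseAmount; }
    public void setPurchaseAmount(double purchaseAmount) { this.purchaseAmount = purchaseAmount; }
  }

  static List<String> propertyNames(Class<?> beanClass) {
    BeanInfo info;
    try {
      // Stop at Object.class so getClass() is not reported as a "class" property.
      info = Introspector.getBeanInfo(beanClass, Object.class);
    } catch (IntrospectionException e) {
      throw new IllegalStateException(e);
    }
    List<String> names = new ArrayList<>();
    for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
      // Keep only read/write properties, i.e. those with both a getter and a setter.
      if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
        names.add(pd.getName());
      }
    }
    Collections.sort(names);
    return names;
  }

  public static void main(String[] args) {
    System.out.println(propertyNames(Transaction.class)); // [bank, purchaseAmount, transactionId]
  }
}
```

The getter names `getBank`, `getPurchaseAmount`, and `getTransactionId` become the field names `bank`, `purchaseAmount`, and `transactionId`, which are the names that appear in the CSV headers below.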
From a PCollection<Transaction>, CsvIO.Write can write one or many CSV files, automatically creating the header from the inferred Schema.
```
 PCollection<Transaction> transactions = ...
 transactions.apply(CsvIO.<Transaction>write("path/to/folder/prefix", CSVFormat.DEFAULT));
 
```
The resulting CSV files look like the following. The header is repeated in every file, and by default CsvIO.Write writes all fields sorted by field name.
```
 bank,purchaseAmount,transactionId
 A,10.23,12345
 B,54.65,54321
 C,11.76,98765
 
```
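The default header behavior can be modeled with plain Java collections. In this hypothetical sketch, each record is a `Map` from field name to value, the default header is the sorted set of field names, and each data line lists values in header order. Unlike `CSVFormat.DEFAULT`, it does no quoting or escaping, and the names `SortedHeaderCsv`, `defaultHeader`, and `toCsv` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch of the default header behavior; not CsvIO's implementation.
// Values are written with String.valueOf and are NOT quoted or escaped.
final class SortedHeaderCsv {
  // Default header: every field name of the first record, in sorted order.
  static List<String> defaultHeader(List<Map<String, Object>> records) {
    return new ArrayList<>(new TreeSet<>(records.get(0).keySet()));
  }

  // Writes the header line, then one line per record with values in header order.
  static String toCsv(List<String> header, List<Map<String, Object>> records) {
    StringBuilder out = new StringBuilder(String.join(",", header)).append('\n');
    for (Map<String, Object> row : records) {
      out.append(header.stream()
              .map(column -> String.valueOf(row.get(column)))
              .collect(Collectors.joining(",")))
          .append('\n');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    List<Map<String, Object>> records = List.of(
        Map.<String, Object>of("transactionId", 12345L, "bank", "A", "purchaseAmount", 10.23),
        Map.<String, Object>of("transactionId", 54321L, "bank", "B", "purchaseAmount", 54.65),
        Map.<String, Object>of("transactionId", 98765L, "bank", "C", "purchaseAmount", 11.76));
    System.out.print(toCsv(defaultHeader(records), records));
  }
}
```

Passing an explicit header such as `List.of("transactionId", "purchaseAmount")` to `toCsv` instead of `defaultHeader(records)` writes only those columns, in that order.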
To control the order and subset of fields that CsvIO.Write writes, use CSVFormat.withHeader(java.lang.String...). Note, however, the following constraints:
1. Each header column must match a field name in the Schema; matching is case sensitive.
2. Each matching header column must refer to a Schema field whose type is a valid Schema.FieldType; see VALID_FIELD_TYPE_SET.
3. CSVFormat only allows repeated header columns when CSVFormat.withAllowDuplicateHeaderNames() is set.
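The three constraints above can be sketched as a single validation pass. This is a hypothetical model, not CsvIO code: the `Kind` enum is a simplified stand-in for `Schema.FieldType` (the types actually accepted are those in `VALID_FIELD_TYPE_SET`), `lineItems` is an invented nested field, and the `allowDuplicates` flag plays the role of `CSVFormat.withAllowDuplicateHeaderNames()`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the three header constraints; not CsvIO's implementation.
final class HeaderCheck {
  // Simplified stand-ins for schema field types; ROW and ARRAY are the nested ones.
  enum Kind { STRING, INT64, DOUBLE, ROW, ARRAY }

  static void validate(List<String> header, Map<String, Kind> schema, boolean allowDuplicates) {
    Set<String> seen = new HashSet<>();
    for (String column : header) {
      // Constraint 1: exact, case-sensitive match against a schema field name.
      Kind kind = schema.get(column);
      if (kind == null) {
        throw new IllegalArgumentException("no schema field named '" + column + "'");
      }
      // Constraint 2: the matched field must have a flat (non-nested) type.
      if (kind == Kind.ROW || kind == Kind.ARRAY) {
        throw new IllegalArgumentException("field '" + column + "' has nested type " + kind);
      }
      // Constraint 3: a repeated column is an error unless duplicates are allowed.
      if (!seen.add(column) && !allowDuplicates) {
        throw new IllegalArgumentException("duplicate header column '" + column + "'");
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Kind> schema = Map.of(
        "bank", Kind.STRING,
        "purchaseAmount", Kind.DOUBLE,
        "transactionId", Kind.INT64,
        "lineItems", Kind.ARRAY); // hypothetical nested field
    validate(List.of("transactionId", "purchaseAmount"), schema, false); // accepted
    for (List<String> bad : List.of(
        List.of("TransactionId"),        // wrong case
        List.of("bank", "lineItems"),    // nested type
        List.of("bank", "bank"))) {      // duplicate without opt-in
      try {
        validate(bad, schema, false);
      } catch (IllegalArgumentException e) {
        System.out.println(bad + " rejected: " + e.getMessage());
      }
    }
  }
}
```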
The following example uses CSVFormat.withHeader(java.lang.String...) to control the order and subset of Transaction fields.
```
 PCollection<Transaction> transactions = ...
 transactions.apply(
  CsvIO
    .<Transaction>write("path/to/folder/prefix", CSVFormat.DEFAULT.withHeader("transactionId", "purchaseAmount"))
 );
 
```
The resulting CSV files look like the following. The header is again repeated in every file, but it includes only the listed subset of fields, in the listed order.
```
 transactionId,purchaseAmount
 12345,10.23
 54321,54.65
 98765,11.76
 
```
In addition to header customization, CsvIO.Write supports CSVFormat.withHeaderComments(java.lang.Object...) as shown below. Note that CSVFormat.withCommentMarker(char) is required when specifying header comments.
```
 PCollection<Transaction> transactions = ...
 transactions
    .apply(
        CsvIO.<Transaction>write("path/to/folder/prefix",
        CSVFormat.DEFAULT
          .withCommentMarker('#')
          .withHeaderComments("Bank Report", "1970-01-01", "Operator: John Doe"))
    );
 
```
The resulting CSV files look like the following, with the header comments and the header repeated in every shard file.
```
 # Bank Report
 # 1970-01-01
 # Operator: John Doe
 bank,purchaseAmount,transactionId
 A,10.23,12345
 B,54.65,54321
 C,11.76,98765
 
```
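The file layout shown above (comment lines prefixed by the marker and a space, followed by the header line) can be sketched as below. This is a hypothetical model rather than CsvIO or Commons CSV code; in particular, throwing `IllegalArgumentException` when comments are given without a marker is only one way to model the documented requirement, and `CommentedHeader`/`preamble` are invented names.

```java
import java.util.List;

// Hypothetical sketch of a per-file preamble: header comments, then the header line.
// Not CsvIO's implementation; it only mirrors the sample output above.
final class CommentedHeader {
  static String preamble(Character commentMarker, List<String> comments, List<String> header) {
    // Models the documented rule that header comments need a comment marker.
    if (!comments.isEmpty() && commentMarker == null) {
      throw new IllegalArgumentException("header comments require a comment marker");
    }
    StringBuilder out = new StringBuilder();
    for (String comment : comments) {
      out.append(commentMarker.charValue()).append(' ').append(comment).append('\n');
    }
    out.append(String.join(",", header)).append('\n');
    return out.toString();
  }

  public static void main(String[] args) {
    List<String> comments = List.of("Bank Report", "1970-01-01", "Operator: John Doe");
    List<String> header = List.of("bank", "purchaseAmount", "transactionId");
    // A sharded writer would emit this same preamble at the top of every shard.
    for (int shard = 0; shard < 2; shard++) {
      System.out.print(preamble('#', comments, header));
    }
  }
}
```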
A PCollection of Rows works just like the custom Java types illustrated above, except that we use writeRows(java.lang.String, org.apache.commons.csv.CSVFormat), as shown below for the same Transaction class. We derive Transaction's Schema using a DefaultSchema.DefaultSchemaProvider. Note that the Rows below are hard-coded for illustration only; developers are encouraged to use DefaultSchema.DefaultSchemaProvider.toRowFunction(org.apache.beam.sdk.values.TypeDescriptor<T>) instead.
```
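 import org.apache.beam.sdk.io.csv.CsvIO;
 import org.apache.beam.sdk.schemas.Schema;
 import org.apache.beam.sdk.schemas.annotations.DefaultSchema.DefaultSchemaProvider;
 import org.apache.beam.sdk.transforms.Create;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.Row;
 import org.apache.beam.sdk.values.TypeDescriptor;
 import org.apache.commons.csv.CSVFormat;

 DefaultSchemaProvider defaultSchemaProvider = new DefaultSchemaProvider();
 Schema schema = defaultSchemaProvider.schemaFor(TypeDescriptor.of(Transaction.class));
 PCollection<Row> transactions = pipeline.apply(Create.of(
  Row
    .withSchema(schema)
    .withFieldValue("bank", "A")
    .withFieldValue("purchaseAmount", 10.23)
    .withFieldValue("transactionId", 12345L) // Long, matching getTransactionId()
    .build(),
  Row
    .withSchema(schema)
    .withFieldValue("bank", "B")
    .withFieldValue("purchaseAmount", 54.65)
    .withFieldValue("transactionId", 54321L)
    .build(),
  Row
    .withSchema(schema)
    .withFieldValue("bank", "C")
    .withFieldValue("purchaseAmount", 11.76)
    .withFieldValue("transactionId", 98765L)
    .build()
 ).withRowSchema(schema)); // gives the created Rows a schema-aware coder

 transactions.apply(
  CsvIO
    .writeRows("gs://bucket/path/to/folder/prefix", CSVFormat.DEFAULT)
 );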
```
Writing the transactions PCollection of Rows would yield the following CSV file content.
```
 bank,purchaseAmount,transactionId
 A,10.23,12345
 B,54.65,54321
 C,11.76,98765
 
```
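Rather than hard-coding Rows, the text above recommends DefaultSchema.DefaultSchemaProvider.toRowFunction. The JDK-only sketch below is a loose analogue of that idea, not its implementation: it reads every readable bean property of a self-contained, hypothetical copy of `Transaction` into a `TreeMap`, so the values come out keyed and sorted by field name, matching the default header order. `BeanToRow` and `toRow` are invented names.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical analogue of a "to row" function: reads each readable bean property
// into a map keyed by field name. Illustration only, not Beam's toRowFunction.
final class BeanToRow {
  public static class Transaction {
    private Long transactionId;
    private String bank;
    private double purchaseAmount;

    public Transaction() {}
    public Transaction(Long transactionId, String bank, double purchaseAmount) {
      this.transactionId = transactionId;
      this.bank = bank;
      this.purchaseAmount = purchaseAmount;
    }
    public Long getTransactionId() { return transactionId; }
    public void setTransactionId(Long transactionId) { this.transactionId = transactionId; }
    public String getBank() { return bank; }
    public void setBank(String bank) { this.bank = bank; }
    public double getPurchaseAmount() { return purchaseAmount; }
    public void setPurchaseAmount(double purchaseAmount) { this.purchaseAmount = purchaseAmount; }
  }

  // TreeMap keeps the fields sorted by name, matching the default header order.
  static Map<String, Object> toRow(Object bean) {
    Map<String, Object> row = new TreeMap<>();
    try {
      for (PropertyDescriptor pd :
          Introspector.getBeanInfo(bean.getClass(), Object.class).getPropertyDescriptors()) {
        if (pd.getReadMethod() != null) {
          row.put(pd.getName(), pd.getReadMethod().invoke(bean));
        }
      }
    } catch (IntrospectionException | IllegalAccessException | InvocationTargetException e) {
      throw new IllegalStateException(e);
    }
    return row;
  }

  public static void main(String[] args) {
    // Prints {bank=A, purchaseAmount=10.23, transactionId=12345}
    System.out.println(toRow(new Transaction(12345L, "A", 10.23)));
  }
}
```

Deriving values from the getters keeps the field names and types in one place, which avoids mismatches such as passing a String for the Long `transactionId` field.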
CsvIO.Write does not support the following CSVFormat properties and will throw an IllegalArgumentException.

Nested Class Summary

Nested Classes

| Modifier and Type | Class and Description |
|---|---|
| `static class` | `CsvIO.Write<T>` `PTransform` for writing CSV files. |

Field Summary

Fields

| Modifier and Type | Field and Description |
|---|---|
| `static java.util.Set<Schema.FieldType>` | `VALID_FIELD_TYPE_SET` The valid `Schema.FieldType` from which `CsvIO` converts CSV records to the fields. |

Constructor Summary

Constructors

| Constructor and Description |
|---|
| `CsvIO()` |

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `static <T> CsvIO.Write<T>` | `write(java.lang.String to, CSVFormat csvFormat)` Instantiates a `CsvIO.Write` for writing user types in `CSVFormat` format. |
| `static CsvIO.Write<Row>` | `writeRows(java.lang.String to, CSVFormat csvFormat)` Instantiates a `CsvIO.Write` for writing `Row`s in `CSVFormat` format. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - VALID_FIELD_TYPE_SET
```
public static final java.util.Set<Schema.FieldType> VALID_FIELD_TYPE_SET
```
    The valid Schema.FieldType from which CsvIO converts CSV records to the fields.
- Constructor Detail
  - CsvIO
```
public CsvIO()
```
- Method Detail
  - write
```
public static <T> CsvIO.Write<T> write(java.lang.String to,
                                       CSVFormat csvFormat)
```
    Instantiates a CsvIO.Write for writing user types in CSVFormat format.
  - writeRows
```
public static CsvIO.Write<Row> writeRows(java.lang.String to,
                                         CSVFormat csvFormat)
```
    Instantiates a CsvIO.Write for writing Rows in CSVFormat format.
