@Experimental(value=SOURCE_SINK) public class SpannerIO extends java.lang.Object
Transforms for reading from and writing to Google Cloud Spanner.
To read from Cloud Spanner, apply the SpannerIO.Read transform. It returns a PCollection of Structs, where each element represents an individual row returned from the read operation. Both the Query and Read APIs are supported; see the Cloud Spanner documentation for more information about reading from Cloud Spanner.
To execute a query, specify SpannerIO.Read.withQuery(Statement) or SpannerIO.Read.withQuery(String) during the construction of the transform:
PCollection<Struct> rows = p.apply(
    SpannerIO.read()
        .withInstanceId(instanceId)
        .withDatabaseId(dbId)
        .withQuery("SELECT id, name, email FROM users"));
To use the Read API, specify a table name and a list of columns:
PCollection<Struct> rows = p.apply(
    SpannerIO.read()
        .withInstanceId(instanceId)
        .withDatabaseId(dbId)
        .withTable("users")
        .withColumns("id", "name", "email"));
To read optimally using an index, specify the index name using SpannerIO.Read.withIndex(java.lang.String).
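For example, an index-backed table read might look like the following sketch (the index name "users_by_email" is a hypothetical illustration; the columns read must be covered by the index):

PCollection<Struct> rows = p.apply(
    SpannerIO.read()
        .withInstanceId(instanceId)
        .withDatabaseId(dbId)
        .withTable("users")
        .withColumns("id", "email")
        // Hypothetical secondary index on the email column.
        .withIndex("users_by_email"));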
The transform is guaranteed to execute on a consistent snapshot of data, utilizing the power of read-only transactions. Staleness of data can be controlled using the SpannerIO.Read.withTimestampBound(com.google.cloud.spanner.TimestampBound) or SpannerIO.Read.withTimestamp(Timestamp) methods. Read more about transactions in the Cloud Spanner documentation.
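For instance, a stale read can be sketched as follows (the 15-second exact staleness is an arbitrary illustration; TimestampBound is com.google.cloud.spanner.TimestampBound and TimeUnit is java.util.concurrent.TimeUnit):

PCollection<Struct> rows = p.apply(
    SpannerIO.read()
        .withInstanceId(instanceId)
        .withDatabaseId(dbId)
        .withQuery("SELECT id, name FROM users")
        // Read from a consistent snapshot that is exactly 15 seconds old.
        .withTimestampBound(TimestampBound.ofExactStaleness(15, TimeUnit.SECONDS)));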
It is possible to read several PCollections within a single transaction. Apply the createTransaction() transform, which lazily creates a transaction. The result of this transform can be passed to a read operation using SpannerIO.Read.withTransaction(PCollectionView).
SpannerConfig spannerConfig = ...

PCollectionView<Transaction> tx =
    p.apply(
        SpannerIO.createTransaction()
            .withSpannerConfig(spannerConfig)
            .withTimestampBound(TimestampBound.strong()));

PCollection<Struct> users = p.apply(
    SpannerIO.read()
        .withSpannerConfig(spannerConfig)
        .withQuery("SELECT name, email FROM users")
        .withTransaction(tx));

PCollection<Struct> tweets = p.apply(
    SpannerIO.read()
        .withSpannerConfig(spannerConfig)
        .withQuery("SELECT user, tweet, date FROM tweets")
        .withTransaction(tx));
The Cloud Spanner SpannerIO.Write transform writes to Cloud Spanner by executing a collection of input row Mutations. The mutations are grouped into batches for efficiency.
To configure the write transform, create an instance using write() and then specify the destination Cloud Spanner instance (SpannerIO.Write.withInstanceId(String)) and destination database (SpannerIO.Write.withDatabaseId(String)). For example:
// Earlier in the pipeline, create a PCollection of Mutations to be written to Cloud Spanner.
PCollection<Mutation> mutations = ...;

// Write mutations.
SpannerWriteResult result = mutations.apply(
    "Write", SpannerIO.write().withInstanceId("instance").withDatabaseId("database"));
The SpannerWriteResult object contains the results of the transform, including a PCollection of MutationGroups that failed to write, and a PCollection that can be used in batch pipelines as a completion signal to Wait.OnSignal to indicate when all input has been written. Note that in streaming pipelines, this signal will never be triggered, as the input is unbounded and this PCollection uses the GlobalWindow.
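As a sketch, a batch pipeline can hold back a downstream step until the write completes by combining SpannerWriteResult.getOutput() with the Beam Wait transform (otherInput and the post-write step are hypothetical; Wait, MapElements, and TypeDescriptors come from the Beam SDK):

PCollection<String> otherInput = ...;
PCollection<String> afterWrite = otherInput
    // Block this branch until all mutations have been written to Spanner.
    .apply("WaitForSpannerWrite", Wait.on(result.getOutput()))
    // Hypothetical follow-up step that must observe the written data.
    .apply("ProcessAfterWrite", MapElements.into(TypeDescriptors.strings())
        .via((String s) -> s + "/processed"));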
To reduce the number of transactions sent to Spanner, the Mutations are grouped into batches. By default, the maximum batch size is 1MB, 5000 mutated cells, or 500 rows (whichever is reached first). To override these limits, use withBatchSizeBytes(), withMaxNumMutations(), or withMaxNumRows(). Setting any of these to a small value or zero disables batching.
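For example, the batch limits can be lowered as in the following sketch (the values are arbitrary illustrations, not recommendations):

SpannerWriteResult result = mutations.apply(
    "Write", SpannerIO.write()
        .withInstanceId("instance")
        .withDatabaseId("database")
        .withBatchSizeBytes(512 * 1024)  // at most 512KB of mutations per batch
        .withMaxNumMutations(2000)       // at most 2000 mutated cells per batch
        .withMaxNumRows(200));           // at most 200 rows per batch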
Note that the maximum size of a single transaction is 20,000 mutated cells, including cells in indexes. If you have a large number of indexes and are getting exceptions with the message INVALID_ARGUMENT: The transaction contains too many mutations, you will need to specify a smaller value for MaxNumMutations.
The batches written are obtained by grouping enough Mutations from the bundle provided by Beam to form (by default) 1000 batches. This group of Mutations is then sorted by key, and the batches are created from the sorted group, so that each batch has keys that are 'close' to each other to optimise write performance. This grouping factor (number of batches) is controlled by the parameter withGroupingFactor().
Note that each worker will need enough memory to hold GroupingFactor x MaxBatchSizeBytes Mutations, so if you have a large MaxBatchSizeBytes you may need to reduce GroupingFactor.
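For example, a pipeline that hits the 20,000-cell transaction limit or worker memory pressure might shrink both settings, as in this sketch (the values are illustrative only):

SpannerWriteResult result = mutations.apply(
    "Write", SpannerIO.write()
        .withInstanceId("instance")
        .withDatabaseId("database")
        .withMaxNumMutations(1000)  // stay well under the 20,000 mutated-cell limit
        .withGroupingFactor(100));  // sort and hold 100 batches per worker instead of 1000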
The Write transform reads the database schema on pipeline start. If the schema is created as part of the same pipeline, this transform needs to wait until that has happened. Use SpannerIO.Write.withSchemaReadySignal(PCollection) to pass a signal PCollection, which will be used with Wait.OnSignal to prevent the schema from being read until it is ready. The Write transform will be paused until the signal PCollection is closed.
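A sketch of wiring this signal, assuming a hypothetical upstream step that emits an element once the DDL creating the tables has completed:

PCollection<Void> ddlApplied = ...; // hypothetical: completes when the schema exists
SpannerWriteResult result = mutations.apply(
    "Write", SpannerIO.write()
        .withInstanceId("instance")
        .withDatabaseId("database")
        // Delay reading the schema (and writing) until ddlApplied is closed.
        .withSchemaReadySignal(ddlApplied));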
The transform does not provide the same transactional guarantees as Cloud Spanner: individual Mutations are submitted atomically, but different mutations may be submitted in separate transactions and in an arbitrary order. Use MutationGroups with the SpannerIO.WriteGrouped transform to ensure that a small set of mutations is bundled together; it is guaranteed that mutations in a MutationGroup are submitted in the same transaction. Note that a MutationGroup must not exceed the Spanner transaction limits.
// Earlier in the pipeline, create a PCollection of MutationGroups to be written to Cloud Spanner.
PCollection<MutationGroup> mutationGroups = ...;

// Write mutation groups.
SpannerWriteResult result = mutationGroups.apply(
    "Write",
    SpannerIO.write().withInstanceId("instance").withDatabaseId("database").grouped());
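For reference, a MutationGroup is created from a primary mutation plus any number of attached mutations; this sketch assumes hypothetical users and access_log tables:

Mutation user = Mutation.newInsertOrUpdateBuilder("users")
    .set("id").to(1L)
    .set("name").to("alice")
    .build();
Mutation log = Mutation.newInsertBuilder("access_log")
    .set("user_id").to(1L)
    .set("action").to("signup")
    .build();
// Both mutations are guaranteed to be submitted in the same transaction.
MutationGroup group = MutationGroup.create(user, log);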
SpannerIO.Write can be used as a streaming sink; however, as with batch mode, note that the write order of individual Mutation/MutationGroup objects is not guaranteed.
| Modifier and Type | Class and Description |
|---|---|
| static class | SpannerIO.CreateTransaction: A PTransform that creates a transaction. |
| static class | SpannerIO.FailureMode: A failure handling strategy. |
| static class | SpannerIO.Read: Implementation of read(). |
| static class | SpannerIO.ReadAll: Implementation of readAll(). |
| static class | SpannerIO.Write: A PTransform that writes Mutation objects to Google Cloud Spanner. |
| static class | SpannerIO.WriteGrouped: Same as SpannerIO.Write but supports grouped mutations. |
| Modifier and Type | Method and Description |
|---|---|
| static SpannerIO.CreateTransaction | createTransaction(): Returns a transform that creates a batch transaction. |
| static SpannerIO.Read | read(): Creates an uninitialized instance of SpannerIO.Read. |
| static SpannerIO.ReadAll | readAll() |
| static SpannerIO.Write | write(): Creates an uninitialized instance of SpannerIO.Write. |
@Experimental(value=SOURCE_SINK) public static SpannerIO.Read read()

Creates an uninitialized instance of SpannerIO.Read. Before use, the SpannerIO.Read must be configured with a SpannerIO.Read.withInstanceId(java.lang.String) and SpannerIO.Read.withDatabaseId(java.lang.String) that identify the Cloud Spanner database.

@Experimental(value=SOURCE_SINK) public static SpannerIO.ReadAll readAll()
@Experimental public static SpannerIO.CreateTransaction createTransaction()

Returns a transform that creates a batch transaction. By default, a TimestampBound.strong() transaction is created; to override this, use SpannerIO.CreateTransaction.withTimestampBound(TimestampBound).

@Experimental public static SpannerIO.Write write()

Creates an uninitialized instance of SpannerIO.Write. Before use, the SpannerIO.Write must be configured with a SpannerIO.Write.withInstanceId(java.lang.String) and SpannerIO.Write.withDatabaseId(java.lang.String) that identify the Cloud Spanner database being written.