public final class DynamoDBIO extends java.lang.Object
Example of reading a DynamoDB table via a parallel scan:

 PCollection<List<Map<String, AttributeValue>>> output =
     pipeline.apply(
         DynamoDBIO.<List<Map<String, AttributeValue>>>read()
             .withScanRequestFn(
                 in -> ScanRequest.builder().tableName(tableName).totalSegments(1).build())
             .items()); // ScanResponse items mapper
At a minimum you have to provide:

- a scan request function via withScanRequestFn(...) to create the ScanRequest; table name and
  total segments are required. Note: Choose total segments according to the number of workers used.
- a scanResponseMapperFn to map the ScanResponse to the expected output type, such as the
  items() mapper used in the example above.
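To see why total segments should track the worker count: a parallel scan splits the table into segment ids 0 through totalSegments - 1, and each worker scans the segments assigned to it. The sketch below (a hypothetical SegmentPlanner helper, plain Java with no Beam or AWS classes) shows a round-robin assignment; with totalSegments equal to the number of workers, every worker scans exactly one segment and none sits idle.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentPlanner {
    // Distribute scan segment ids [0, totalSegments) round-robin across workers.
    // Each inner list is the set of segments one worker scans.
    static List<List<Integer>> assignSegments(int totalSegments, int workers) {
        List<List<Integer>> plan = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            plan.add(new ArrayList<>());
        }
        for (int segment = 0; segment < totalSegments; segment++) {
            plan.get(segment % workers).add(segment);
        }
        return plan;
    }

    public static void main(String[] args) {
        // With totalSegments == workers, every worker gets exactly one segment.
        System.out.println(assignSegments(4, 4)); // [[0], [1], [2], [3]]
        // With fewer workers than segments, workers scan several segments each.
        System.out.println(assignSegments(5, 2)); // [[0, 2, 4], [1, 3]]
    }
}
```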
Example of writing a PCollection to DynamoDB:

 PCollection<T> data = ...;
 SerializableFunction<T, WriteRequest> requestBuilder = ...;
 data.apply(
     DynamoDBIO.<WriteRequest>write()
         .withWriteRequestMapperFn(t -> KV.of(tableName, requestBuilder.apply(t))));
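The mapper function pairs every element's write request with its destination table. As a plain-Java sketch of that step, using java.util.function.Function and Map.Entry in place of Beam's SerializableFunction and KV so it runs without Beam or the AWS SDK (the table name, elements, and string "requests" are all hypothetical; a real pipeline builds AWS SDK WriteRequest objects):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WriteMapperSketch {
    // Plain-Java analogue of withWriteRequestMapperFn: pair every element's
    // write request with the destination table name, as Beam's KV would.
    static <T> List<Map.Entry<String, String>> mapToRequests(
            String tableName, List<T> data, Function<T, String> requestBuilder) {
        return data.stream()
            .map(t -> Map.entry(tableName, requestBuilder.apply(t)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical request builder; stands in for SerializableFunction<T, WriteRequest>.
        Function<Integer, String> requestBuilder = t -> "PutRequest(item=" + t + ")";
        mapToRequests("my-table", List.of(1, 2), requestBuilder)
            .forEach(kv -> System.out.println(kv.getKey() + " -> " + kv.getValue()));
    }
}
```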
Note: AWS does not allow writing duplicate keys within a single batch operation. If
primary keys possibly repeat in your stream (i.e. an upsert stream), you may encounter a
`ValidationError`. To address this you have to provide the key names corresponding to your
primary key using
DynamoDBIO.Write.withDeduplicateKeys(List). Based on these keys only the last
observed element is kept. Even if no deduplication keys are provided, identical elements are
still deduplicated.
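The last-observed-wins semantics described above can be sketched in plain Java (no Beam classes; items are modeled as string maps for illustration): each item is projected onto the deduplicate-key attributes, and a later item with the same projection replaces an earlier one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DedupSketch {
    // Keep only the last observed item per key projection, mirroring how a
    // batch is deduplicated before writing so it contains no repeated keys.
    static List<Map<String, String>> deduplicate(
            List<Map<String, String>> batch, List<String> dedupKeys) {
        LinkedHashMap<List<String>, Map<String, String>> byKey = new LinkedHashMap<>();
        for (Map<String, String> item : batch) {
            // Project the item onto the key attributes; items with the same
            // projection collide, and the later item wins.
            List<String> key = dedupKeys.stream().map(item::get).collect(Collectors.toList());
            byKey.put(key, item);
        }
        return List.copyOf(byKey.values());
    }

    public static void main(String[] args) {
        List<Map<String, String>> batch = List.of(
            Map.of("id", "1", "v", "old"),
            Map.of("id", "2", "v", "x"),
            Map.of("id", "1", "v", "new")); // same key as the first item: replaces it
        // Two items remain; for id=1 only the last observed value ("new") is kept.
        System.out.println(deduplicate(batch, List.of("id")).size()); // 2
    }
}
```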
Configuration for a specific IO can be overwritten using withClientConfiguration(ClientConfiguration),
which also allows configuring the retry behavior for the respective IO.
Retries for failed requests can be configured using
ClientConfiguration.Builder#retry(Consumer) and are handled by the AWS SDK unless there's a
partial success (batch requests). The SDK uses a backoff strategy with equal jitter for computing
the delay before the next retry.
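Equal jitter means half of the exponential delay is fixed and the other half is drawn uniformly at random, so retries from concurrent workers spread out instead of hitting the table in lockstep. A self-contained sketch of that scheme (the base and cap values here are illustrative; the SDK's actual defaults may differ):

```java
import java.util.Random;

public class EqualJitterBackoff {
    // Equal-jitter backoff: exp = min(cap, base * 2^attempt); the delay is
    // exp/2 (fixed) plus a uniform random draw from [0, exp/2).
    static long delayMillis(int attempt, long baseMillis, long capMillis, Random random) {
        long exp = Math.min(capMillis, baseMillis * (1L << attempt));
        long half = exp / 2;
        return half + (long) (random.nextDouble() * half);
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int attempt = 0; attempt < 4; attempt++) {
            // Delay for attempt n lies in [exp/2, exp) where exp = min(cap, base * 2^n).
            System.out.println("attempt " + attempt + ": "
                + delayMillis(attempt, 100, 20_000, random) + " ms");
        }
    }
}
```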
Note: Once retries are exhausted, the error is surfaced to the runner, which may then opt to retry
the current partition in its entirety or abort if the maximum number of retries of the runner is
reached.