@Experimental(value=SOURCE_SINK) public class BigtableIO extends java.lang.Object
Transforms
for reading from and writing to Google Cloud Bigtable.
For more information about Cloud Bigtable, see the online documentation at Google Cloud Bigtable.
The Bigtable source returns a set of rows from a single table, returning a PCollection<Row>
.
To configure a Cloud Bigtable source, you must supply a table id, a project id, an instance id
and optionally a BigtableOptions
to provide more specific connection configuration. By
default, BigtableIO.Read
will read all rows in the table. The row ranges to be read can
optionally be restricted using BigtableIO.Read.withKeyRanges(java.util.List<org.apache.beam.sdk.io.range.ByteKeyRange>)
, and a RowFilter
can
be specified using BigtableIO.Read.withRowFilter(com.google.bigtable.v2.RowFilter)
. For example:
Pipeline p = ...;
// Scan the entire table.
p.apply("read",
BigtableIO.read()
.withProjectId(projectId)
.withInstanceId(instanceId)
.withTableId("table"));
// Scan a prefix of the table.
ByteKeyRange keyRange = ...;
p.apply("read",
BigtableIO.read()
.withProjectId(projectId)
.withInstanceId(instanceId)
.withTableId("table")
.withKeyRange(keyRange));
// Scan a subset of rows that match the specified row filter.
p.apply("filtered read",
BigtableIO.read()
.withProjectId(projectId)
.withInstanceId(instanceId)
.withTableId("table")
.withRowFilter(filter));
The Bigtable sink executes a set of row mutations on a single table. It takes as input a
PCollection<KV<ByteString, Iterable<Mutation>>>
, where the
ByteString
is the key of the row being mutated, and each Mutation
represents an
idempotent transformation to that row.
To configure a Cloud Bigtable sink, you must supply a table id, a project id, an instance id
and optionally a configuration function for BigtableOptions
to provide more specific
connection configuration, for example:
PCollection<KV<ByteString, Iterable<Mutation>>> data = ...;
data.apply("write",
BigtableIO.write()
.withProjectId("project")
.withInstanceId("instance")
.withTableId("table"));
Optionally, BigtableIO.write() may be configured to emit BigtableWriteResult
elements
after each group of inputs is written to Bigtable. These can be used to then trigger user code
after writes have completed. See Wait
for details on the
windowing requirements of the signal and input PCollections.
// See Wait.on
PCollection<KV<ByteString, Iterable<Mutation>>> data = ...;
PCollection<BigtableWriteResult> writeResults =
data.apply("write",
BigtableIO.write()
.withProjectId("project")
.withInstanceId("instance")
.withTableId("table"))
.withWriteResults();
// The windowing of `moreData` must be compatible with `data`, see {@link org.apache.beam.sdk.transforms.Wait#on}
// for details.
PCollection<...> moreData = ...;
moreData
.apply("wait for writes", Wait.on(writeResults))
.apply("do something", ParDo.of(...))
This connector for Cloud Bigtable is considered experimental and may break or receive backwards-incompatible changes in future versions of the Apache Beam SDK. Cloud Bigtable is in Beta, and thus it may introduce breaking changes in future revisions of its service or APIs.
Permission requirements depend on the PipelineRunner
that is used to execute the
pipeline. Please refer to the documentation of corresponding PipelineRunners
for more details.
Modifier and Type | Class and Description |
---|---|
static class |
BigtableIO.Read
A
PTransform that reads from Google Cloud Bigtable. |
static class |
BigtableIO.Write
A
PTransform that writes to Google Cloud Bigtable. |
static class |
BigtableIO.WriteWithResults
A
PTransform that writes to Google Cloud Bigtable and emits a BigtableWriteResult for each batch written. |
Modifier and Type | Method and Description |
---|---|
static BigtableIO.Read |
read()
Creates an uninitialized
BigtableIO.Read . |
static BigtableIO.Write |
write()
Creates an uninitialized
BigtableIO.Write . |
@Experimental public static BigtableIO.Read read()
BigtableIO.Read
. Before use, the Read
must be
initialized with a BigtableIO.Read.withInstanceId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>)
and BigtableIO.Read.withProjectId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>)
that specifies the source Cloud Bigtable instance, and a BigtableIO.Read.withTableId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>)
that specifies which table to read. A RowFilter
may also
optionally be specified using BigtableIO.Read.withRowFilter(RowFilter)
.@Experimental public static BigtableIO.Write write()
BigtableIO.Write
. Before use, the Write
must be
initialized with a BigtableIO.Write.withProjectId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>)
and BigtableIO.Write.withInstanceId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>)
that specifies the destination Cloud Bigtable instance, and a
BigtableIO.Write.withTableId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>)
that specifies which table to write.