@Experimental(value=SOURCE_SINK) public class BigtableIO extends java.lang.Object
Transforms for reading from and writing to Google Cloud Bigtable. For more information about Cloud Bigtable, see the online documentation at Google Cloud Bigtable.
The Bigtable source returns a set of rows from a single table as a PCollection<Row>.

To configure a Cloud Bigtable source, you must supply a table id, a project id, an instance id, and optionally a BigtableOptions to provide more specific connection configuration. By default, BigtableIO.Read will read all rows in the table. The row ranges to be read can optionally be restricted using BigtableIO.Read.withKeyRanges(java.util.List<org.apache.beam.sdk.io.range.ByteKeyRange>), and a RowFilter can be specified using BigtableIO.Read.withRowFilter(com.google.bigtable.v2.RowFilter). For example:
 Pipeline p = ...;

 // Scan the entire table.
 p.apply("read",
     BigtableIO.read()
         .withProjectId(projectId)
         .withInstanceId(instanceId)
         .withTableId("table"));

 // Scan a prefix of the table.
 ByteKeyRange keyRange = ...;
 p.apply("read",
     BigtableIO.read()
         .withProjectId(projectId)
         .withInstanceId(instanceId)
         .withTableId("table")
         .withKeyRange(keyRange));

 // Scan a subset of rows that match the specified row filter.
 p.apply("filtered read",
     BigtableIO.read()
         .withProjectId(projectId)
         .withInstanceId(instanceId)
         .withTableId("table")
         .withRowFilter(filter));
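In SDK versions that also expose a withBigtableOptions method, finer-grained connection settings can be supplied through a BigtableOptions.Builder. This is only a sketch: the setUserAgent value is a placeholder, and the exact builder methods available depend on the bigtable-client-core version on the classpath.

```java
// A hypothetical BigtableOptions configuration (placeholder values).
BigtableOptions.Builder optionsBuilder =
    new BigtableOptions.Builder()
        .setUserAgent("my-beam-pipeline"); // placeholder user agent string

p.apply("read with options",
    BigtableIO.read()
        .withProjectId(projectId)
        .withInstanceId(instanceId)
        .withTableId("table")
        .withBigtableOptions(optionsBuilder));
```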
The Bigtable sink executes a set of row mutations on a single table. It takes as input a PCollection<KV<ByteString, Iterable<Mutation>>>, where the ByteString is the key of the row being mutated, and each Mutation represents an idempotent transformation of that row.

To configure a Cloud Bigtable sink, you must supply a table id, a project id, an instance id, and optionally a configuration function for BigtableOptions to provide more specific connection configuration, for example:
 PCollection<KV<ByteString, Iterable<Mutation>>> data = ...;

 data.apply("write",
     BigtableIO.write()
         .withProjectId("project")
         .withInstanceId("instance")
         .withTableId("table"));
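The sink's input elements can be built with the com.google.bigtable.v2.Mutation protobuf. A minimal sketch of producing one such element (the row key, column family "cf", and qualifier "name" are placeholder values, not part of this API):

```java
import java.util.Collections;
import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;
import org.apache.beam.sdk.values.KV;

// Build a single SetCell mutation that writes one value into a cell.
Mutation setCell = Mutation.newBuilder()
    .setSetCell(Mutation.SetCell.newBuilder()
        .setFamilyName("cf")                                   // placeholder column family
        .setColumnQualifier(ByteString.copyFromUtf8("name"))   // placeholder qualifier
        .setTimestampMicros(System.currentTimeMillis() * 1000) // cell timestamp in microseconds
        .setValue(ByteString.copyFromUtf8("value")))
    .build();

// Pair the mutation with the key of the row it should be applied to.
KV<ByteString, Iterable<Mutation>> element =
    KV.of(ByteString.copyFromUtf8("row-key"), Collections.singletonList(setCell));
```

A ParDo that emits elements of this shape can feed directly into BigtableIO.write(). Note that only idempotent mutations (such as SetCell with an explicit timestamp) are safe, since the runner may retry bundles.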
This connector for Cloud Bigtable is considered experimental and may break or receive backwards-incompatible changes in future versions of the Apache Beam SDK. Cloud Bigtable is in Beta, and thus it may introduce breaking changes in future revisions of its service or APIs.
Permission requirements depend on the PipelineRunner that is used to execute the pipeline. Please refer to the documentation of the corresponding PipelineRunners for more details.
Modifier and Type | Class and Description
---|---
static class | BigtableIO.Read: A PTransform that reads from Google Cloud Bigtable.
static class | BigtableIO.Write: A PTransform that writes to Google Cloud Bigtable.
Modifier and Type | Method and Description
---|---
static BigtableIO.Read | read(): Creates an uninitialized BigtableIO.Read.
static BigtableIO.Write | write(): Creates an uninitialized BigtableIO.Write.
@Experimental public static BigtableIO.Read read()

Creates an uninitialized BigtableIO.Read. Before use, the Read must be initialized with a BigtableIO.Read.withInstanceId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>) and BigtableIO.Read.withProjectId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>) that specify the source Cloud Bigtable instance, and a BigtableIO.Read.withTableId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>) that specifies which table to read. A RowFilter may also optionally be specified using BigtableIO.Read.withRowFilter(RowFilter).

@Experimental public static BigtableIO.Write write()
Creates an uninitialized BigtableIO.Write. Before use, the Write must be initialized with a BigtableIO.Write.withProjectId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>) and BigtableIO.Write.withInstanceId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>) that specify the destination Cloud Bigtable instance, and a BigtableIO.Write.withTableId(org.apache.beam.sdk.options.ValueProvider<java.lang.String>) that specifies which table to write to.