@Experimental(value=SOURCE_SINK) public class BigtableIO extends java.lang.Object
Transforms
for reading from and writing to Google Cloud Bigtable.
For more information about Cloud Bigtable, see the online documentation at Google Cloud Bigtable.
The Bigtable source returns a set of rows from a single table, returning a
PCollection<Row>
.
To configure a Cloud Bigtable source, you must supply a table id and a BigtableOptions
or builder configured with the project and other information necessary to identify the
Bigtable instance. By default, BigtableIO.Read
will read all rows in the table. The row
range to be read can optionally be restricted using BigtableIO.Read.withKeyRange(org.apache.beam.sdk.io.range.ByteKeyRange)
, and
a RowFilter
can be specified using BigtableIO.Read.withRowFilter(com.google.bigtable.v2.RowFilter)
. For example:
BigtableOptions.Builder optionsBuilder =
new BigtableOptions.Builder()
.setProjectId("project")
.setInstanceId("instance");
Pipeline p = ...;
// Scan the entire table.
p.apply("read",
BigtableIO.read()
.withBigtableOptions(optionsBuilder)
.withTableId("table"));
// Scan a prefix of the table.
ByteKeyRange keyRange = ...;
p.apply("read",
BigtableIO.read()
.withBigtableOptions(optionsBuilder)
.withTableId("table")
.withKeyRange(keyRange));
// Scan a subset of rows that match the specified row filter.
p.apply("filtered read",
BigtableIO.read()
.withBigtableOptions(optionsBuilder)
.withTableId("table")
.withRowFilter(filter));
The Bigtable sink executes a set of row mutations on a single table. It takes as input a
PCollection<KV<ByteString, Iterable<Mutation>>>
, where the
ByteString
is the key of the row being mutated, and each Mutation
represents an
idempotent transformation to that row.
To configure a Cloud Bigtable sink, you must supply a table id and a BigtableOptions
or builder configured with the project and other information necessary to identify the
Bigtable instance, for example:
BigtableOptions.Builder optionsBuilder =
new BigtableOptions.Builder()
.setProjectId("project")
.setInstanceId("instance");
PCollection<KV<ByteString, Iterable<Mutation>>> data = ...;
data.apply("write",
BigtableIO.write()
.withBigtableOptions(optionsBuilder)
.withTableId("table"));
In order to use local emulator for Bigtable you should use:
BigtableOptions.Builder optionsBuilder =
new BigtableOptions.Builder()
.setProjectId("project")
.setInstanceId("instance")
.setUsePlaintextNegotiation(true)
.setCredentialOptions(CredentialOptions.nullCredential())
.setDataHost("127.0.0.1") // network interface where Bigtable emulator is bound
.setInstanceAdminHost("127.0.0.1")
.setTableAdminHost("127.0.0.1")
.setPort(LOCAL_EMULATOR_PORT))
PCollection<KV<ByteString, Iterable<Mutation>>> data = ...;
data.apply("write",
BigtableIO.write()
.withBigtableOptions(optionsBuilder)
.withTableId("table");
This connector for Cloud Bigtable is considered experimental and may break or receive backwards-incompatible changes in future versions of the Apache Beam SDK. Cloud Bigtable is in Beta, and thus it may introduce breaking changes in future revisions of its service or APIs.
Permission requirements depend on the PipelineRunner
that is used to execute the
pipeline. Please refer to the documentation of corresponding
PipelineRunners
for more details.
Modifier and Type | Class and Description |
---|---|
static class |
BigtableIO.Read
A
PTransform that reads from Google Cloud Bigtable. |
static class |
BigtableIO.Write
A
PTransform that writes to Google Cloud Bigtable. |
Modifier and Type | Method and Description |
---|---|
static BigtableIO.Read |
read()
Creates an uninitialized
BigtableIO.Read . |
static BigtableIO.Write |
write()
Creates an uninitialized
BigtableIO.Write . |
@Experimental public static BigtableIO.Read read()
BigtableIO.Read
. Before use, the Read
must be
initialized with a
BigtableOptions
that specifies
the source Cloud Bigtable instance, and a tableId
that
specifies which table to read. A RowFilter
may also optionally be specified using
BigtableIO.Read.withRowFilter(com.google.bigtable.v2.RowFilter)
.@Experimental public static BigtableIO.Write write()
BigtableIO.Write
. Before use, the Write
must be
initialized with a
BigtableOptions
that specifies
the destination Cloud Bigtable instance, and a tableId
that specifies which table to write.