@Experimental public class HBaseIO extends java.lang.Object
For more information, see the online HBase documentation at https://hbase.apache.org/.
The HBase source returns a set of rows from a single table, returning a PCollection<Result>.

To configure an HBase source, you must supply a table id and a Configuration to identify the HBase instance. By default, HBaseIO.Read will read all rows in the table. The row range to be read can optionally be restricted using a Scan object or using HBaseIO.Read.withKeyRange(org.apache.beam.sdk.io.range.ByteKeyRange), and a Filter can be applied using HBaseIO.Read.withFilter(org.apache.hadoop.hbase.filter.Filter), for example:
    // Scan the entire table.
    p.apply("read",
        HBaseIO.read()
            .withConfiguration(configuration)
            .withTableId("table"));

    // Filter data using an HBase Scan.
    Scan scan = ...;
    p.apply("read",
        HBaseIO.read()
            .withConfiguration(configuration)
            .withTableId("table")
            .withScan(scan));

    // Scan a prefix of the table.
    ByteKeyRange keyRange = ...;
    p.apply("read",
        HBaseIO.read()
            .withConfiguration(configuration)
            .withTableId("table")
            .withKeyRange(keyRange));

    // Scan a subset of rows that match the specified row filter.
    p.apply("filtered read",
        HBaseIO.read()
            .withConfiguration(configuration)
            .withTableId("table")
            .withFilter(filter));
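The Scan, ByteKeyRange, and Filter values are elided above. A minimal sketch of how they might be constructed is shown below; the row keys, column family, and key prefix are illustrative assumptions, and the Scan builder calls (withStartRow/withStopRow) assume the HBase 2.x client API (older clients use setStartRow/setStopRow).

    import org.apache.beam.sdk.io.range.ByteKey;
    import org.apache.beam.sdk.io.range.ByteKeyRange;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.Filter;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    // Restrict the scan to a row range and a single column family (illustrative values).
    Scan scan = new Scan()
        .withStartRow(Bytes.toBytes("row-000"))
        .withStopRow(Bytes.toBytes("row-999"))
        .addFamily(Bytes.toBytes("cf"));

    // The same row range expressed as a Beam ByteKeyRange for withKeyRange.
    ByteKeyRange keyRange = ByteKeyRange.of(
        ByteKey.copyFrom(Bytes.toBytes("row-000")),
        ByteKey.copyFrom(Bytes.toBytes("row-999")));

    // Keep only rows whose key starts with the given prefix.
    Filter filter = new PrefixFilter(Bytes.toBytes("row-00"));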
The HBase sink executes a set of row mutations on a single table. It takes as input a PCollection<KV<byte[], Iterable<Mutation>>>, where the byte[] is the key of the row being mutated, and each Mutation represents an idempotent transformation to that row.
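For example, a single input element might pair a row key with one Put for that row. The following is a minimal sketch; the row key, column family, qualifier, and value are illustrative assumptions.

    import java.util.Collections;
    import org.apache.beam.sdk.values.KV;
    import org.apache.hadoop.hbase.client.Mutation;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // One element for the sink: the row key plus the mutations to apply to that row.
    byte[] rowKey = Bytes.toBytes("row-001");
    Mutation put = new Put(rowKey)
        .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
    KV<byte[], Iterable<Mutation>> element =
        KV.of(rowKey, Collections.<Mutation>singletonList(put));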
To configure an HBase sink, you must supply a table id and a Configuration to identify the HBase instance, for example:
    Configuration configuration = ...;
    PCollection<KV<byte[], Iterable<Mutation>>> data = ...;
    data.setCoder(HBaseIO.WRITE_CODER);
    data.apply("write",
        HBaseIO.write()
            .withConfiguration(configuration)
            .withTableId("table"));
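In a pipeline, a PCollection of this shape is typically produced with a ParDo upstream of the write. The following is a minimal sketch that builds on the example above (it reuses the configuration variable); the input of String row keys and the column family, qualifier, and value are illustrative assumptions.

    import java.util.Collections;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.hadoop.hbase.client.Mutation;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    PCollection<String> rowKeys = ...;
    PCollection<KV<byte[], Iterable<Mutation>>> data = rowKeys.apply("ToMutations",
        ParDo.of(new DoFn<String, KV<byte[], Iterable<Mutation>>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            byte[] rowKey = Bytes.toBytes(c.element());
            // One idempotent Put per row; family, qualifier, and value are placeholders.
            Mutation put = new Put(rowKey)
                .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
            Iterable<Mutation> mutations = Collections.<Mutation>singletonList(put);
            c.output(KV.of(rowKey, mutations));
          }
        }));
    data.setCoder(HBaseIO.WRITE_CODER);
    data.apply("write",
        HBaseIO.write()
            .withConfiguration(configuration)
            .withTableId("table"));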
The design of the HBaseIO API is currently modeled on the BigtableIO API; it may evolve or differ in some aspects, but the intent is that users can easily migrate from one to the other.
| Modifier and Type | Class and Description |
|---|---|
| static class | HBaseIO.Read: A PTransform that reads from HBase. |
| static class | HBaseIO.Write: A PTransform that writes to HBase. |

| Modifier and Type | Field and Description |
|---|---|
| static Coder<KV<byte[],java.lang.Iterable<org.apache.hadoop.hbase.client.Mutation>>> | WRITE_CODER |

| Modifier and Type | Method and Description |
|---|---|
| static HBaseIO.Read | read(): Creates an uninitialized HBaseIO.Read. |
| static HBaseIO.Write | write(): Creates an uninitialized HBaseIO.Write. |
@Experimental public static HBaseIO.Read read()

Creates an uninitialized HBaseIO.Read. Before use, the Read must be initialized with a HBaseIO.Read.withConfiguration(Configuration) that specifies the HBase instance, and a tableId that specifies which table to read. A Filter may also optionally be specified using HBaseIO.Read.withFilter(org.apache.hadoop.hbase.filter.Filter).

public static HBaseIO.Write write()
Creates an uninitialized HBaseIO.Write. Before use, the Write must be initialized with a HBaseIO.Write.withConfiguration(Configuration) that specifies the destination HBase instance, and a tableId that specifies which table to write to.