Class HBaseIO
For more information, see the online documentation at HBase.
Reading from HBase
The HBase source returns a set of rows from a single table, returning a
PCollection<Result>
.
To configure a HBase source, you must supply a table id and a Configuration
to
identify the HBase instance. By default, HBaseIO.Read
will read all rows in the table.
The row range to be read can optionally be restricted using with a Scan
object or using
the HBaseIO.Read.withKeyRange(org.apache.beam.sdk.io.range.ByteKeyRange)
, and a Filter
using HBaseIO.Read.withFilter(org.apache.hadoop.hbase.filter.Filter)
, for example:
// Scan the entire table.
p.apply("read",
HBaseIO.read()
.withConfiguration(configuration)
.withTableId("table"));
// Filter data using a HBaseIO Scan
Scan scan = ...
p.apply("read",
HBaseIO.read()
.withConfiguration(configuration)
.withTableId("table"))
.withScan(scan));
// Scan a prefix of the table.
ByteKeyRange keyRange = ...;
p.apply("read",
HBaseIO.read()
.withConfiguration(configuration)
.withTableId("table")
.withKeyRange(keyRange));
// Scan a subset of rows that match the specified row filter.
p.apply("filtered read",
HBaseIO.read()
.withConfiguration(configuration)
.withTableId("table")
.withFilter(filter));
readAll()
allows to execute multiple Scan
s to multiple Table
s.
These queries are encapsulated via an initial PCollection
of HBaseIO.Read
s and can be
used to create advanced compositional patterns like reading from a Source and then based on the
data create new HBase scans.
Note: HBaseIO.ReadAll
only works with runners that support
Splittable DoFn.
PCollection<Read> queries = ...;
queries.apply("readAll", HBaseIO.readAll().withConfiguration(configuration));
Writing to HBase
Writing Mutation
The HBase sink executes a set of row mutations on a single table. It takes as input a PCollection<Mutation>
, where each Mutation
represents an idempotent
transformation on a row.
To configure a HBase sink, you must supply a table id and a Configuration
to identify
the HBase instance, for example:
Configuration configuration = ...;
PCollection<Mutation> data = ...;
data.apply("write",
HBaseIO.write()
.withConfiguration(configuration)
.withTableId("table"));
Writing RowMutations
An alternative way to write to HBase is with writeRowMutations()
, which takes
as input a PCollection<KV<byte[],
, representing KVs of bytes row keys and
RowMutations
.
This implementation is useful for preserving mutation order if the upstream is ordered by row key, as RowMutations will only be applied after previous RowMutations are successful.
To configure the sink, you must supply a table id string and a Configuration
to
identify the HBase instance, for example:
Configuration configuration = ...;
PCollection<KV<byte[], RowMutations>> data = ...;
data.apply("write",
HBaseIO.writeRowMutations()
.withConfiguration(configuration)
.withTableId("table"));
Note that the transformation emits the number of RowMutations written as an integer after successfully writing to HBase.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic class
APTransform
that reads from HBase.static class
Implementation ofreadAll()
.static class
APTransform
that writes to HBase.static class
Transformation that writes RowMutation objects to a Hbase table. -
Method Summary
Modifier and TypeMethodDescriptionstatic HBaseIO.Read
read()
Creates an uninitializedHBaseIO.Read
.static HBaseIO.ReadAll
readAll()
APTransform
that works likeread()
, but executes read operations coming from aPCollection
ofHBaseIO.Read
.static HBaseIO.Write
write()
Creates an uninitializedHBaseIO.Write
.static HBaseIO.WriteRowMutations
-
Method Details
-
read
Creates an uninitializedHBaseIO.Read
. Before use, theRead
must be initialized with aHBaseIO.Read.withConfiguration(Configuration)
that specifies the HBase instance, and atableId
that specifies which table to read. AFilter
may also optionally be specified usingHBaseIO.Read.withFilter(org.apache.hadoop.hbase.filter.Filter)
. -
readAll
APTransform
that works likeread()
, but executes read operations coming from aPCollection
ofHBaseIO.Read
. -
write
Creates an uninitializedHBaseIO.Write
. Before use, theWrite
must be initialized with aHBaseIO.Write.withConfiguration(Configuration)
that specifies the destination HBase instance, and atableId
that specifies which table to write. -
writeRowMutations
-