Class HBaseIO

java.lang.Object
org.apache.beam.sdk.io.hbase.HBaseIO

public class HBaseIO extends Object
A bounded source and sink for HBase.

For more information, see the online HBase documentation.

Reading from HBase

The HBase source reads a set of rows from a single table and returns a PCollection<Result>.

To configure an HBase source, you must supply a table id and a Configuration to identify the HBase instance. By default, HBaseIO.Read will read all rows in the table. The row range to be read can optionally be restricted using a Scan object or using HBaseIO.Read.withKeyRange(org.apache.beam.sdk.io.range.ByteKeyRange), and a Filter can be applied using HBaseIO.Read.withFilter(org.apache.hadoop.hbase.filter.Filter), for example:


 // Scan the entire table.
 p.apply("read",
     HBaseIO.read()
         .withConfiguration(configuration)
         .withTableId("table"));

 // Filter data using an HBase Scan
 Scan scan = ...;
 p.apply("read",
     HBaseIO.read()
         .withConfiguration(configuration)
         .withTableId("table")
         .withScan(scan));

 // Scan a prefix of the table.
 ByteKeyRange keyRange = ...;
 p.apply("read",
     HBaseIO.read()
         .withConfiguration(configuration)
         .withTableId("table")
         .withKeyRange(keyRange));

 // Scan a subset of rows that match the specified row filter.
 p.apply("filtered read",
     HBaseIO.read()
         .withConfiguration(configuration)
         .withTableId("table")
         .withFilter(filter));
 
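The Configuration above is a standard Hadoop Configuration. A minimal sketch of building one, assuming the cluster's ZooKeeper quorum is reachable at "zk.example.com" (a placeholder host):


 // Build a client Configuration; the host and port below are placeholders.
 Configuration configuration = HBaseConfiguration.create();
 configuration.set("hbase.zookeeper.quorum", "zk.example.com");
 configuration.set("hbase.zookeeper.property.clientPort", "2181");
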

readAll() allows executing multiple Scans against multiple tables. These queries are encapsulated in an initial PCollection of HBaseIO.Reads and can be used to build advanced compositional patterns, such as reading from a source and then, based on that data, creating new HBase scans.

Note: HBaseIO.ReadAll only works with runners that support Splittable DoFn.


 PCollection<Read> queries = ...;
 queries.apply("readAll", HBaseIO.readAll().withConfiguration(configuration));
 
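A sketch of the compositional pattern described above, materializing the queries from a fixed list of table ids ("table1" and "table2" are placeholders):


 // One Read per table; the table ids here are placeholders.
 PCollection<Read> queries = p.apply("createReads",
     Create.of(
         HBaseIO.read().withConfiguration(configuration).withTableId("table1"),
         HBaseIO.read().withConfiguration(configuration).withTableId("table2")));
 PCollection<Result> results =
     queries.apply("readAll", HBaseIO.readAll().withConfiguration(configuration));
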

Writing to HBase

Writing Mutation

The HBase sink executes a set of row mutations on a single table. It takes as input a PCollection<Mutation>, where each Mutation represents an idempotent transformation on a row.

To configure an HBase sink, you must supply a table id and a Configuration to identify the HBase instance, for example:


 Configuration configuration = ...;
 PCollection<Mutation> data = ...;

 data.apply("write",
     HBaseIO.write()
         .withConfiguration(configuration)
         .withTableId("table"));
 
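The Mutation elements are typically Puts or Deletes. A minimal sketch of deriving the data input above from upstream key/value pairs; the column family "cf" and qualifier "value" are placeholders:


 PCollection<KV<byte[], byte[]>> kvs = ...;
 PCollection<Mutation> data = kvs.apply("toMutations",
     MapElements.into(TypeDescriptor.of(Mutation.class))
         // Wrap each key/value pair in a Put; "cf" and "value" are placeholders.
         .via(kv -> new Put(kv.getKey())
             .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), kv.getValue())));
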

Writing RowMutations

An alternative way to write to HBase is with writeRowMutations(), which takes as input a PCollection<KV<byte[], org.apache.hadoop.hbase.client.RowMutations>>, representing KVs of byte-array row keys and their RowMutations.

This sink is useful for preserving mutation order when the upstream is ordered by row key, as each RowMutations is applied only after the previous ones have succeeded.

To configure the sink, you must supply a table id string and a Configuration to identify the HBase instance, for example:


 Configuration configuration = ...;
 PCollection<KV<byte[], RowMutations>> data = ...;

 data.apply("write",
     HBaseIO.writeRowMutations()
         .withConfiguration(configuration)
         .withTableId("table"));
 
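A sketch of building one input element; mutations are applied in the order they are added to the RowMutations, and the column names below are placeholders (RowMutations.add declares IOException, which real code must handle):


 byte[] rowKey = ...;
 RowMutations mutations = new RowMutations(rowKey);
 // Entries are applied in insertion order; column names are placeholders.
 mutations.add(new Put(rowKey)
     .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes("v2")));
 mutations.add(new Delete(rowKey)
     .addColumns(Bytes.toBytes("cf"), Bytes.toBytes("stale")));
 KV<byte[], RowMutations> element = KV.of(rowKey, mutations);
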

Note that the transform emits the number of RowMutations written as an integer after successfully writing to HBase.
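Assuming (per the note above) that the output is a PCollection<Integer>, the emitted counts can be consumed downstream; a sketch:


 PCollection<Integer> written = data.apply("write",
     HBaseIO.writeRowMutations()
         .withConfiguration(configuration)
         .withTableId("table"));
 // Aggregate the emitted counts into a single total.
 PCollection<Integer> total = written.apply("sum", Sum.integersGlobally());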