Class KuduIO

java.lang.Object
org.apache.beam.sdk.io.kudu.KuduIO

public class KuduIO extends Object
A bounded source and sink for Kudu.

For more information, see the online documentation at Kudu.

Reading from Kudu

KuduIO provides a source to read and returns a bounded collection of entities as PCollection<T>. An entity is built by parsing a Kudu RowResult using the provided SerializableFunction.

The following example illustrates various options for configuring the IO:


 pipeline.apply(
     KuduIO.<String>read()
         .withMasterAddresses("kudu1:8051,kudu2:8051,kudu3:8051")
         .withTable("table")
         .withParseFn(
             (SerializableFunction<RowResult, String>) input -> input.getString(COL_NAME))
         .withCoder(StringUtf8Coder.of()));
     // above options illustrate a typical minimum set, returns PCollection<String>
 

withCoder(...) may be omitted if it can be inferred from the @{CoderRegistry}. However, when using a Lambda Expression or an anonymous inner class to define the function, type erasure will prohibit this. In such cases you are required to explicitly set the coder as in the above example.

Optionally, you can provide withPredicates(...) to apply a query to filter rows from the kudu table.

Optionally, you can provide withProjectedColumns(...) to limit the columns returned from the Kudu scan to improve performance. The columns required in the ParseFn must be declared in the projected columns.

Optionally, you can provide withBatchSize(...) to set the number of bytes returned from the Kudu scanner in each batch.

Optionally, you can provide withFaultTolerent(...) to enforce the read scan to resume a scan on another tablet server if the current server fails.

Writing to Kudu

The Kudu sink executes a set of operations on a single table. It takes as input a PCollection and a KuduIO.FormatFunction which is responsible for converting the input into an idempotent transformation on a row.

To configure a Kudu sink, you must supply the Kudu master addresses, the table name and a KuduIO.FormatFunction to convert the input records, for example:


 PCollection<MyType> data = ...;
 FormatFunction<MyType> fn = ...;

 data.apply("write",
     KuduIO.write()
         .withMasterAddresses("kudu1:8051,kudu2:8051,kudu3:8051")
         .withTable("table")
         .withFormatFn(fn));
 

KuduIO does not support authentication in this release.

  • Method Details