@Experimental public class BeamSql extends java.lang.Object
BeamSql is the DSL interface of BeamSQL. It translates a SQL query as a PTransform, so developers can use standard SQL queries in a Beam pipeline.
A typical pipeline with Beam SQL DSL is:
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
//create table from TextIO;
PCollection<Row> inputTableA = p.apply(TextIO.read().from("/my/input/patha")).apply(...);
PCollection<Row> inputTableB = p.apply(TextIO.read().from("/my/input/pathb")).apply(...);
//run a simple query, and register the output as a table in BeamSql;
String sql1 = "select MY_FUNC(c1), c2 from PCOLLECTION";
PCollection<Row> outputTableA = inputTableA.apply(
BeamSql
.query(sql1)
.registerUdf("MY_FUNC", MY_FUNC.class, "FUNC");
//run a JOIN with one table from TextIO, and one table from another query
PCollection<Row> outputTableB =
PCollectionTuple
.of(new TupleTag<>("TABLE_O_A"), outputTableA)
.and(new TupleTag<>("TABLE_B"), inputTableB)
.apply(BeamSql.query("select * from TABLE_O_A JOIN TABLE_B where ..."));
//output the final result with TextIO
outputTableB.apply(...).apply(TextIO.write().to("/my/output/path"));
p.run().waitUntilFinish();
| Constructor and Description |
|---|
BeamSql() |
| Modifier and Type | Method and Description |
|---|---|
static QueryTransform |
query(java.lang.String sqlQuery)
Returns a
QueryTransform representing an equivalent execution plan. |
public static QueryTransform query(java.lang.String sqlQuery)
QueryTransform representing an equivalent execution plan.
The QueryTransform can be applied to a PCollection or PCollectionTuple representing all the input tables.
The PTransform outputs a PCollection of Row.
If the PTransform is applied to PCollection then it gets registered with
name PCOLLECTION.
If the PTransform is applied to PCollectionTuple then TupleTag.getId() is used as the corresponding PCollections name.
PCollectionTuple,
this is valid;
PCollectionTuple, an IllegalStateException is thrown during query validation;
PCollectionTuple are only valid in the scope of
the current query call.