@Experimental public class BeamSql extends java.lang.Object
BeamSql is the DSL interface of BeamSQL. It translates a SQL query into a PTransform, so developers can use standard SQL queries in a Beam pipeline.
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
//create table from TextIO;
PCollection<Row> inputTableA = p.apply(TextIO.read().from("/my/input/patha")).apply(...);
PCollection<Row> inputTableB = p.apply(TextIO.read().from("/my/input/pathb")).apply(...);
//run a simple query, and register the output as a table in BeamSql;
String sql1 = "select MY_FUNC(c1), c2 from PCOLLECTION";
PCollection<Row> outputTableA = inputTableA.apply(
    BeamSql
        .query(sql1)
        .registerUdf("MY_FUNC", MY_FUNC.class, "FUNC"));
//run a JOIN with one table from TextIO, and one table from another query
PCollection<Row> outputTableB =
    PCollectionTuple
        .of(new TupleTag<>("TABLE_O_A"), outputTableA)
        .and(new TupleTag<>("TABLE_B"), inputTableB)
        .apply(BeamSql.query("select * from TABLE_O_A JOIN TABLE_B where ..."));
//output the final result with TextIO
outputTableB.apply(...).apply(TextIO.write().to("/my/output/path"));
p.run().waitUntilFinish();
Constructor and Description |
---|
BeamSql() |
Modifier and Type | Method and Description |
---|---|
static QueryTransform | query(java.lang.String sqlQuery) Returns a QueryTransform representing an equivalent execution plan. |
public static QueryTransform query(java.lang.String sqlQuery)
Returns a QueryTransform representing an equivalent execution plan.
The QueryTransform can be applied to a PCollection or PCollectionTuple representing all the input tables.
The PTransform outputs a PCollection of Row.
If the PTransform is applied to a PCollection, the PCollection is registered under the name PCOLLECTION.
If the PTransform is applied to a PCollectionTuple, TupleTag.getId() is used as the name of the corresponding PCollection.
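If the SQL query only uses a subset of the tables from the upstream PCollectionTuple, this is valid;
If the SQL query references a table not included in the upstream PCollectionTuple, an IllegalStateException is thrown during query validation;
Tables from the upstream PCollectionTuple are only valid in the scope of the current query call.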
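The naming and validation rules above can be modelled in a few lines of plain Java. This is a stand-alone sketch, not Beam code: the class TableScopeSketch and its methods are hypothetical names introduced for illustration, tag ids are represented as plain strings standing in for the values of TupleTag.getId(), and exact string matching of table names is an assumption of the sketch. It also assumes that a query may reference any subset of the registered tables and that referencing an unregistered table is what triggers the IllegalStateException.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-alone model of the table naming rules; not part of the Beam API.
final class TableScopeSketch {

  // Name under which a single PCollection input is registered.
  static final String SINGLE_INPUT_NAME = "PCOLLECTION";

  // A lone PCollection input is visible under one fixed table name.
  static Set<String> tablesForSingleInput() {
    return Collections.singleton(SINGLE_INPUT_NAME);
  }

  // A PCollectionTuple input exposes one table per tag id.
  static Set<String> tablesForTuple(Collection<String> tagIds) {
    return new LinkedHashSet<>(tagIds);
  }

  // Referencing a subset of the registered tables is fine;
  // referencing an unregistered table is rejected.
  static void validate(Set<String> registered, Collection<String> referenced) {
    for (String table : referenced) {
      if (!registered.contains(table)) {
        throw new IllegalStateException("Table not found in the input: " + table);
      }
    }
  }

  public static void main(String[] args) {
    Set<String> tables = tablesForTuple(Arrays.asList("TABLE_O_A", "TABLE_B"));

    // Uses only one of the two registered tables: accepted.
    validate(tables, Arrays.asList("TABLE_B"));

    // TABLE_C was never registered for this call: rejected.
    try {
      validate(tables, Arrays.asList("TABLE_C"));
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because the set of registered names is built fresh for each call, the sketch also reflects the scoping rule: a table name supplied to one query call is not visible to the next unless it is passed in again.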