@Experimental public class BeamSql extends java.lang.Object
BeamSql
is the DSL interface of BeamSQL. It translates a SQL query as a
PTransform
, so developers can use standard SQL queries in a Beam pipeline.
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
//create table from TextIO;
PCollection<BeamSqlRow> inputTableA = p.apply(TextIO.read().from("/my/input/patha"))
.apply(...);
PCollection<BeamSqlRow> inputTableB = p.apply(TextIO.read().from("/my/input/pathb"))
.apply(...);
//run a simple query, and register the output as a table in BeamSql;
String sql1 = "select MY_FUNC(c1), c2 from PCOLLECTION";
PCollection<BeamSqlRow> outputTableA = inputTableA.apply(
BeamSql.query(sql1)
.withUdf("MY_FUNC", MY_FUNC.class, "FUNC"));
//run a JOIN with one table from TextIO, and one table from another query
PCollection<BeamSqlRow> outputTableB = PCollectionTuple.of(
new TupleTag<BeamSqlRow>("TABLE_O_A"), outputTableA)
.and(new TupleTag<BeamSqlRow>("TABLE_B"), inputTableB)
.apply(BeamSql.queryMulti("select * from TABLE_O_A JOIN TABLE_B where ..."));
//output the final result with TextIO
outputTableB.apply(...).apply(TextIO.write().to("/my/output/path"));
p.run().waitUntilFinish();
Modifier and Type | Class and Description |
---|---|
static class |
BeamSql.QueryTransform
A
PTransform representing an execution plan for a SQL query. |
static class |
BeamSql.SimpleQueryTransform
A
PTransform representing an execution plan for a SQL query referencing
a single table. |
Constructor and Description |
---|
BeamSql() |
Modifier and Type | Method and Description |
---|---|
static BeamSql.SimpleQueryTransform |
query(java.lang.String sqlQuery)
Transforms a SQL query into a
PTransform representing an equivalent execution plan. |
static BeamSql.QueryTransform |
queryMulti(java.lang.String sqlQuery)
Transforms a SQL query into a
PTransform representing an equivalent execution plan. |
public static BeamSql.QueryTransform queryMulti(java.lang.String sqlQuery)
PTransform
representing an equivalent execution plan.
The returned PTransform
can be applied to a PCollectionTuple
representing
all the input tables and results in a PCollection<BeamSqlRow>
representing the output
table. The PCollectionTuple
contains the mapping from table names
to
PCollection<BeamSqlRow>
, each representing an input table.
PCollectionTuple
,
this is valid;PCollectionTuple
,
an IllegalStateException
is thrown during query validation;PCollectionTuple
are only valid in the scope
of the current query call.public static BeamSql.SimpleQueryTransform query(java.lang.String sqlQuery)
PTransform
representing an equivalent execution plan.
This is a simplified form of queryMulti(String)
where the query must reference
a single input table.
Make sure to query it from a static table name PCOLLECTION.