@Experimental public abstract class SqlTransform extends PTransform<PInput,PCollection<Row>>
SqlTransform
is the DSL interface of Beam SQL. It translates a SQL query as a PTransform
, so developers can use standard SQL queries in a Beam pipeline.
A typical pipeline with Beam SQL DSL is:
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
//create table from TextIO;
PCollection<Row> inputTableA = p.apply(TextIO.read().from("/my/input/patha")).apply(...);
PCollection<Row> inputTableB = p.apply(TextIO.read().from("/my/input/pathb")).apply(...);
//run a simple query, and register the output as a table in BeamSql;
String sql1 = "select MY_FUNC(c1), c2 from PCOLLECTION";
PCollection<Row> outputTableA = inputTableA.apply(
SqlTransform
.query(sql1)
.addUdf("MY_FUNC", MY_FUNC.class, "FUNC");
//run a JOIN with one table from TextIO, and one table from another query
PCollection<Row> outputTableB =
PCollectionTuple
.of(new TupleTag<>("TABLE_O_A"), outputTableA)
.and(new TupleTag<>("TABLE_B"), inputTableB)
.apply(SqlTransform.query("select * from TABLE_O_A JOIN TABLE_B where ..."));
//output the final result with TextIO
outputTableB.apply(...).apply(TextIO.write().to("/my/output/path"));
p.run().waitUntilFinish();
A typical pipeline with Beam SQL DDL and DSL is:
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
String sql1 = "INSERT INTO pubsub_sink SELECT * FROM pubsub_source";
String ddlSource = "CREATE EXTERNAL TABLE pubsub_source(" +
"attributes MAP<VARCHAR, VARCHAR>, payload ROW<name VARCHAR, size INTEGER>)" +
"TYPE pubsub LOCATION 'projects/myproject/topics/topic1'";
String ddlSink = "CREATE EXTERNAL TABLE pubsub_sink(" +
"attributes MAP<VARCHAR, VARCHAR>, payload ROW<name VARCHAR, size INTEGER>)" +
"TYPE pubsub LOCATION 'projects/myproject/topics/mytopic'";
p.apply(SqlTransform.query(sql1).withDdlString(ddlSource).withDdlString(ddlSink))
p.run().waitUntilFinish();
name
Constructor and Description |
---|
SqlTransform() |
Modifier and Type | Method and Description |
---|---|
PCollection<Row> |
expand(PInput input)
Override this method to specify how this
PTransform should be expanded on the given
InputT . |
static SqlTransform |
query(java.lang.String queryString)
Returns a
SqlTransform representing an equivalent execution plan. |
SqlTransform |
registerUdaf(java.lang.String functionName,
Combine.CombineFn combineFn)
register a
Combine.CombineFn as UDAF function used in this query. |
SqlTransform |
registerUdf(java.lang.String functionName,
java.lang.Class<? extends BeamSqlUdf> clazz)
register a UDF function used in this query.
|
SqlTransform |
registerUdf(java.lang.String functionName,
SerializableFunction sfn)
Register
SerializableFunction as a UDF function used in this query. |
SqlTransform |
withAutoLoading(boolean autoLoading) |
SqlTransform |
withDdlString(java.lang.String ddlString) |
SqlTransform |
withDefaultTableProvider(java.lang.String name,
TableProvider tableProvider) |
SqlTransform |
withNamedParameters(java.util.Map<java.lang.String,?> parameters) |
SqlTransform |
withPositionalParameters(java.util.List<?> parameters) |
SqlTransform |
withQueryPlannerClass(java.lang.Class<? extends QueryPlanner> clazz) |
SqlTransform |
withTableProvider(java.lang.String name,
TableProvider tableProvider) |
compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
public PCollection<Row> expand(PInput input)
PTransform
PTransform
should be expanded on the given
InputT
.
NOTE: This method should not be called directly. Instead apply the PTransform
should
be applied to the InputT
using the apply
method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
expand
in class PTransform<PInput,PCollection<Row>>
public static SqlTransform query(java.lang.String queryString)
SqlTransform
representing an equivalent execution plan.
The SqlTransform
can be applied to a PCollection
or PCollectionTuple
representing all the input tables.
The PTransform
outputs a PCollection
of Row
.
If the PTransform
is applied to PCollection
then it gets registered with
name PCOLLECTION.
If the PTransform
is applied to PCollectionTuple
then TupleTag.getId()
is used as the corresponding PCollection
s name.
PCollectionTuple
,
this is valid;
PCollectionTuple
, an IllegalStateException
is thrown during query validati on;
PCollectionTuple
are only valid in the scope of
the current query call.
Any available implementation of QueryPlanner
can be used as the query planner in
SqlTransform
. An implementation can be specified globally for the entire pipeline with
BeamSqlPipelineOptions.getPlannerName()
. The global planner can be overridden
per-transform with withQueryPlannerClass(Class<? extends QueryPlanner>)
.
public SqlTransform withTableProvider(java.lang.String name, TableProvider tableProvider)
public SqlTransform withDefaultTableProvider(java.lang.String name, TableProvider tableProvider)
public SqlTransform withQueryPlannerClass(java.lang.Class<? extends QueryPlanner> clazz)
public SqlTransform withNamedParameters(java.util.Map<java.lang.String,?> parameters)
public SqlTransform withPositionalParameters(java.util.List<?> parameters)
public SqlTransform withDdlString(java.lang.String ddlString)
public SqlTransform withAutoLoading(boolean autoLoading)
public SqlTransform registerUdf(java.lang.String functionName, java.lang.Class<? extends BeamSqlUdf> clazz)
Refer to BeamSqlUdf
for more about how to implement a UDF in BeamSql.
public SqlTransform registerUdf(java.lang.String functionName, SerializableFunction sfn)
SerializableFunction
as a UDF function used in this query. Note, SerializableFunction
must have a constructor without arguments.public SqlTransform registerUdaf(java.lang.String functionName, Combine.CombineFn combineFn)
Combine.CombineFn
as UDAF function used in this query.