org.apache.beam.sdk.transforms.PTransform<PInput,PCollection<Row>>

org.apache.beam.sdk.extensions.sql.SqlTransform

All Implemented Interfaces: Serializable, HasDisplayData

public abstract class SqlTransform extends PTransform<PInput,PCollection<Row>>

SqlTransform is the DSL interface of Beam SQL. It translates a SQL query into a PTransform, so developers can use standard SQL queries in a Beam pipeline.

Beam SQL DSL usage:

A typical pipeline using the Beam SQL DSL looks like this:


 PipelineOptions options = PipelineOptionsFactory.create();
 Pipeline p = Pipeline.create(options);

 // Create input tables from TextIO (the elided steps convert each line to a Row with a schema).
 PCollection<Row> inputTableA = p.apply(TextIO.read().from("/my/input/patha")).apply(...);
 PCollection<Row> inputTableB = p.apply(TextIO.read().from("/my/input/pathb")).apply(...);

 // Run a simple query; a single input PCollection is referenced as the table PCOLLECTION.
 String sql1 = "select MY_FUNC(c1), c2 from PCOLLECTION";
 PCollection<Row> outputTableA = inputTableA.apply(
    SqlTransform
        .query(sql1)
        .registerUdf("MY_FUNC", MY_FUNC.class));

 // Run a JOIN between a table read from TextIO and the output of the previous query.
 PCollection<Row> outputTableB =
     PCollectionTuple
     .of(new TupleTag<>("TABLE_O_A"), outputTableA)
     .and(new TupleTag<>("TABLE_B"), inputTableB)
         .apply(SqlTransform.query("select * from TABLE_O_A JOIN TABLE_B on ..."));

 // Write the final result with TextIO (the elided step converts each Row back to a String).
 outputTableB.apply(...).apply(TextIO.write().to("/my/output/path"));

 p.run().waitUntilFinish();
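The example above elides the TextIO parsing steps. As a self-contained sketch of the same two patterns (a single PCollection queried as PCOLLECTION, then a PCollectionTuple whose TupleTag ids become table names), the following builds in-memory Rows with Create and collects the joined output. It assumes the Beam SQL extension and the DirectRunner are on the classpath; the class, schema, and table names (SqlTransformDemo, fruit, qty, IN_STOCK, PRICES) are hypothetical, and the static result list is a testing shortcut that only works with an in-process runner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TupleTag;

public class SqlTransformDemo {
  // Test-only sink: a static list is shared only when the runner executes
  // in the same JVM (e.g. the DirectRunner, assumed to be on the classpath).
  static final List<String> RESULTS = Collections.synchronizedList(new ArrayList<>());

  // Records each output row as "<field 0>|<field 1>".
  static class CollectFn extends DoFn<Row, Void> {
    @ProcessElement
    public void processElement(@Element Row row) {
      Object first = row.getValue(0);
      Object second = row.getValue(1);
      RESULTS.add(first + "|" + second);
    }
  }

  public static List<String> run() {
    RESULTS.clear();
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    Schema fruitSchema = Schema.builder().addStringField("fruit").addInt32Field("qty").build();
    PCollection<Row> fruits = p.apply("Fruits", Create.of(
            Row.withSchema(fruitSchema).addValues("apple", 1).build(),
            Row.withSchema(fruitSchema).addValues("banana", 5).build(),
            Row.withSchema(fruitSchema).addValues("cherry", 10).build())
        .withRowSchema(fruitSchema));

    Schema priceSchema = Schema.builder().addStringField("fruit").addInt32Field("cents").build();
    PCollection<Row> prices = p.apply("Prices", Create.of(
            Row.withSchema(priceSchema).addValues("banana", 25).build(),
            Row.withSchema(priceSchema).addValues("cherry", 300).build())
        .withRowSchema(priceSchema));

    // A single input PCollection is visible to the query as the table PCOLLECTION.
    PCollection<Row> inStock = fruits.apply("Filter",
        SqlTransform.query("SELECT fruit, qty FROM PCOLLECTION WHERE qty > 2"));

    // For a PCollectionTuple, each TupleTag id becomes a table name.
    PCollection<Row> joined = PCollectionTuple
        .of(new TupleTag<>("IN_STOCK"), inStock)
        .and(new TupleTag<>("PRICES"), prices)
        .apply("Join", SqlTransform.query(
            "SELECT s.fruit, pr.cents FROM IN_STOCK s JOIN PRICES pr ON s.fruit = pr.fruit"));

    joined.apply("Collect", ParDo.of(new CollectFn()));
    p.run().waitUntilFinish();

    List<String> out = new ArrayList<>(RESULTS);
    Collections.sort(out);
    return out;
  }
}
```

Only "banana" and "cherry" pass the filter, so the join yields one row for each of them.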

A typical pipeline combining Beam SQL DDL and DSL looks like this:


 PipelineOptions options = PipelineOptionsFactory.create();
 Pipeline p = Pipeline.create(options);

 String sql1 = "INSERT INTO pubsub_sink SELECT * FROM pubsub_source";

 String ddlSource = "CREATE EXTERNAL TABLE pubsub_source(" +
     "attributes MAP<VARCHAR, VARCHAR>, payload ROW<name VARCHAR, size INTEGER>) " +
     "TYPE pubsub LOCATION 'projects/myproject/topics/topic1'";

 String ddlSink = "CREATE EXTERNAL TABLE pubsub_sink(" +
     "attributes MAP<VARCHAR, VARCHAR>, payload ROW<name VARCHAR, size INTEGER>) " +
     "TYPE pubsub LOCATION 'projects/myproject/topics/mytopic'";

 p.apply(SqlTransform.query(sql1).withDdlString(ddlSource).withDdlString(ddlSink));

 p.run().waitUntilFinish();
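The Pub/Sub example needs a GCP project to run. As a local sketch of the same DDL mechanism, the following declares a table over a temporary CSV file with withDdlString and then queries it by name, applying the SqlTransform directly to the pipeline. It assumes the text table provider (TYPE text) is discovered through auto-loading and reads CSV by default, and that the DirectRunner is on the classpath; the table name orders, its columns, and the DdlDemo class are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.Row;

public class DdlDemo {
  // Test-only sink; shared only with an in-process runner such as the DirectRunner.
  static final List<String> RESULTS = Collections.synchronizedList(new ArrayList<>());

  // Records each output row as "<id>|<price>".
  static class CollectFn extends DoFn<Row, Void> {
    @ProcessElement
    public void processElement(@Element Row row) {
      Object id = row.getValue("id");
      Object price = row.getValue("price");
      RESULTS.add(id + "|" + price);
    }
  }

  public static List<String> run() {
    RESULTS.clear();
    Path csv;
    try {
      // A small CSV file backing the hypothetical "orders" table.
      csv = Files.createTempFile("orders", ".csv");
      Files.write(csv, "1,10\n2,20\n3,30\n".getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }

    // Assumes TYPE text is auto-loaded and parses CSV when no format is given.
    String ddl = "CREATE EXTERNAL TABLE orders (id INTEGER, price INTEGER) "
        + "TYPE text LOCATION '" + csv.toAbsolutePath() + "'";

    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    // Applied to the pipeline itself: the only table comes from the DDL statement.
    p.apply("QueryOrders",
            SqlTransform.query("SELECT id, price FROM orders WHERE price > 15")
                .withDdlString(ddl))
        .apply("Collect", ParDo.of(new CollectFn()));
    p.run().waitUntilFinish();

    List<String> out = new ArrayList<>(RESULTS);
    Collections.sort(out);
    return out;
  }
}
```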


Field Summary

Fields
- static final String PCOLLECTION_NAME

Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints

Constructor Summary

Constructors
- SqlTransform()

Method Summary

- PCollection<Row> expand(PInput input)
  Override this method to specify how this PTransform should be expanded on the given InputT.
- static SqlTransform query(String queryString)
  Returns a SqlTransform representing an equivalent execution plan.
- SqlTransform registerUdaf(String functionName, Combine.CombineFn combineFn)
  Registers a Combine.CombineFn as a UDAF (user-defined aggregate function) used in this query.
- SqlTransform registerUdf(String functionName, Class<? extends BeamSqlUdf> clazz)
  Registers a UDF used in this query.
- SqlTransform registerUdf(String functionName, SerializableFunction sfn)
  Registers a SerializableFunction as a UDF used in this query.
- SqlTransform withAutoLoading(boolean autoLoading)
- SqlTransform withDdlString(String ddlString)
- SqlTransform withDefaultTableProvider(String name, TableProvider tableProvider)
- SqlTransform withErrorsTransformer(PTransform<PCollection<Row>,? extends POutput> errorsTransformer)
- SqlTransform withNamedParameters(Map<String,?> parameters)
- SqlTransform withPositionalParameters(List<?> parameters)
- SqlTransform withQueryPlannerClass(Class<? extends QueryPlanner> clazz)
- SqlTransform withTableProvider(String name, TableProvider tableProvider)

Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- PCOLLECTION_NAME
  
  public static final String PCOLLECTION_NAME
  See Also:
  
  Constant Field Values
Constructor Details
- SqlTransform
  
  public SqlTransform()
Method Details
- expand
  
  public PCollection<Row> expand(PInput input)
  
  Description copied from class: PTransform
  
  Override this method to specify how this PTransform should be expanded on the given InputT.
  NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
  Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
  
  Specified by:
  
  expand in class PTransform<PInput,PCollection<Row>>
- query
  
  public static SqlTransform query(String queryString)
  Returns a SqlTransform representing an equivalent execution plan.
  The SqlTransform can be applied to a PCollection or PCollectionTuple representing all the input tables.
  The PTransform outputs a PCollection of Row.
  If the PTransform is applied to a PCollection, that PCollection is registered as a table named PCOLLECTION.
  If the PTransform is applied to a PCollectionTuple, the TupleTag.getId() of each entry is used as the name of the corresponding PCollection's table.
  
  The query may use only a subset of the tables in the upstream PCollectionTuple.
  If the query references a table not included in the upstream PCollectionTuple, an IllegalStateException is thrown during query validation.
  In all cases, tables from the upstream PCollectionTuple are only valid in the scope of the current query call.
  
  Any available implementation of QueryPlanner can be used as the query planner in SqlTransform. An implementation can be specified globally for the entire pipeline with BeamSqlPipelineOptions.getPlannerName(). The global planner can be overridden per-transform with withQueryPlannerClass(Class).
- withTableProvider
  
  public SqlTransform withTableProvider(String name, TableProvider tableProvider)
- withDefaultTableProvider
  
  public SqlTransform withDefaultTableProvider(String name, TableProvider tableProvider)
- withQueryPlannerClass
  
  public SqlTransform withQueryPlannerClass(Class<? extends QueryPlanner> clazz)
- withNamedParameters
  
  public SqlTransform withNamedParameters(Map<String,?> parameters)
- withPositionalParameters
  
  public SqlTransform withPositionalParameters(List<?> parameters)
- withDdlString
  
  public SqlTransform withDdlString(String ddlString)
- withAutoLoading
  
  public SqlTransform withAutoLoading(boolean autoLoading)
- registerUdf
  
  public SqlTransform registerUdf(String functionName, Class<? extends BeamSqlUdf> clazz)
  
  Registers a UDF used in this query.
  Refer to BeamSqlUdf for details on how to implement a UDF in Beam SQL.
- registerUdf
  
  public SqlTransform registerUdf(String functionName, SerializableFunction sfn)
  
  Registers a SerializableFunction as a UDF used in this query. Note: the SerializableFunction must have a no-argument constructor.
- registerUdaf
  
  public SqlTransform registerUdaf(String functionName, Combine.CombineFn combineFn)
  
  Registers a Combine.CombineFn as a UDAF (user-defined aggregate function) used in this query.
- withErrorsTransformer
  
  public SqlTransform withErrorsTransformer(PTransform<PCollection<Row>,? extends POutput> errorsTransformer)
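The three registration methods can be combined in one pipeline. The sketch below registers a BeamSqlUdf class with a public static eval method, a SerializableFunction that keeps its implicit no-argument constructor, and a Combine.CombineFn used as an aggregate. The function names (CUBED, ADD_ONE, SQUARE_SUM), the classes, and the data are hypothetical; the DirectRunner is assumed to be on the classpath, and the static result list only works with an in-process runner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.BeamSqlUdf;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class UdfDemo {
  // Test-only sink; shared only with an in-process runner such as the DirectRunner.
  static final List<String> RESULTS = Collections.synchronizedList(new ArrayList<>());

  // Records each row as "<label>:<field 0>=<field 1>".
  static class CollectFn extends DoFn<Row, Void> {
    private final String label;
    CollectFn(String label) { this.label = label; }

    @ProcessElement
    public void processElement(@Element Row row) {
      Object key = row.getValue(0);
      Object value = row.getValue(1);
      RESULTS.add(label + ":" + key + "=" + value);
    }
  }

  // UDF via BeamSqlUdf: Beam SQL calls the public static method named eval.
  public static class Cubed implements BeamSqlUdf {
    public static Integer eval(Integer input) { return input * input * input; }
  }

  // UDF via SerializableFunction; relies on the implicit no-argument constructor.
  public static class AddOne implements SerializableFunction<Integer, Integer> {
    @Override public Integer apply(Integer input) { return input + 1; }
  }

  // UDAF via CombineFn: sums the squares of its inputs.
  public static class SquareSum extends Combine.CombineFn<Integer, Integer, Integer> {
    @Override public Integer createAccumulator() { return 0; }
    @Override public Integer addInput(Integer acc, Integer input) { return acc + input * input; }
    @Override public Integer mergeAccumulators(Iterable<Integer> accs) {
      int sum = 0;
      for (Integer a : accs) { sum += a; }
      return sum;
    }
    @Override public Integer extractOutput(Integer acc) { return acc; }
  }

  public static List<String> run() {
    RESULTS.clear();
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    Schema schema = Schema.builder().addStringField("grp").addInt32Field("qty").build();
    PCollection<Row> input = p.apply("Input", Create.of(
            Row.withSchema(schema).addValues("x", 1).build(),
            Row.withSchema(schema).addValues("x", 2).build(),
            Row.withSchema(schema).addValues("y", 3).build())
        .withRowSchema(schema));

    // Scalar UDFs: CUBED(ADD_ONE(qty)) maps 1 -> 8, 2 -> 27, 3 -> 64.
    input.apply("ScalarUdfs", SqlTransform
            .query("SELECT grp, CUBED(ADD_ONE(qty)) AS cubed FROM PCOLLECTION")
            .registerUdf("CUBED", Cubed.class)
            .registerUdf("ADD_ONE", new AddOne()))
        .apply("CollectScalar", ParDo.of(new CollectFn("scalar")));

    // UDAF: per group, x -> 1 + 4 = 5 and y -> 9.
    input.apply("Udaf", SqlTransform
            .query("SELECT grp, SQUARE_SUM(qty) AS sq FROM PCOLLECTION GROUP BY grp")
            .registerUdaf("SQUARE_SUM", new SquareSum()))
        .apply("CollectUdaf", ParDo.of(new CollectFn("udaf")));

    p.run().waitUntilFinish();
    return new ArrayList<>(RESULTS);
  }
}
```

Registering each function on the transform that uses it keeps the function scoped to that query, matching the per-query wording of the method descriptions above.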
