Beam SQL Walkthrough

This page illustrates the usage of Beam SQL with example code.

Row

Before applying a SQL query to a PCollection, the data in the collection must be in Row format. A Row represents a single, immutable record in a Beam SQL PCollection. The names and types of the fields/columns in the row are defined by its associated Schema. You can use the Schema.builder() to create Schemas. See Data Types for more details on supported primitive data types.

A PCollection<Row> can be obtained multiple ways, for example:

Once you have a PCollection<Row> in hand, you may use SqlTransform to apply SQL queries to it.

SqlTransform

SqlTransform.query(queryString) method is the only API to create a PTransform from a string representation of the SQL query. You can apply this PTransform to either a single PCollection or a PCollectionTuple which holds multiple PCollections:

BeamSqlExample in the code repository shows basic usage of both APIs.