Beam SQL overview

Beam SQL allows a Beam user to query bounded and unbounded PCollections with SQL statements. It is currently available only in Beam Java. Your SQL query is translated to a PTransform, an encapsulated segment of a Beam pipeline, so you can freely mix SQL PTransforms and other PTransforms in your pipeline.
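
As a minimal sketch of that mixing, the pipeline below creates a PCollection of Row elements with a regular Create transform, filters it with a SqlTransform, and then formats the SQL output with an ordinary MapElements transform. It assumes the Beam Java SDK, the Beam SQL extension (beam-sdks-java-extensions-sql), and a runner such as the Direct Runner are on the classpath. The class name SqlMixExample, the schema, and the field names item_name and quantity are hypothetical. When a SqlTransform has a single input, the query refers to that input as PCOLLECTION.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TypeDescriptors;

public class SqlMixExample {
  // Hypothetical schema for illustration: each order has an item name and a quantity.
  static final Schema ORDER_SCHEMA =
      Schema.builder().addStringField("item_name").addInt32Field("quantity").build();

  /** Builds a pipeline that mixes regular PTransforms with a SQL PTransform. */
  static Pipeline buildPipeline(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Regular PTransform: an in-memory PCollection<Row> with the schema attached.
    PCollection<Row> orders =
        p.apply(
            "CreateOrders",
            Create.of(
                    Row.withSchema(ORDER_SCHEMA).addValues("apple", 3).build(),
                    Row.withSchema(ORDER_SCHEMA).addValues("pear", 5).build())
                .withRowSchema(ORDER_SCHEMA));

    // SQL PTransform: the single input PCollection is queried as the table PCOLLECTION.
    PCollection<Row> largeOrders =
        orders.apply(
            "FilterLargeOrders",
            SqlTransform.query(
                "SELECT item_name, quantity FROM PCOLLECTION WHERE quantity > 4"));

    // Regular PTransform again: read the result fields by name and format them as strings.
    largeOrders.apply(
        "FormatOrders",
        MapElements.into(TypeDescriptors.strings())
            .via((Row r) -> r.getString("item_name") + ": " + r.getInt32("quantity")));
    return p;
  }

  public static void main(String[] args) {
    buildPipeline(args).run().waitUntilFinish();
  }
}
```

Because the output of SqlTransform is an ordinary PCollection<Row>, any downstream transform can consume it; in this sketch MapElements reads the fields by name with Row.getString and Row.getInt32.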

Apache Calcite is a widely used SQL dialect in big data processing, with some streaming enhancements. Calcite provides the basic dialect underlying Beam SQL.

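There are two additional concepts you need to know to use SQL in your pipeline:

SqlTransform: the interface for creating PTransforms from SQL queries.
Row: the type of elements that Beam SQL operates on. A PCollection<Row> plays the role of a table.
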
The SQL pipeline walkthrough works through how to use Beam SQL with example code.


The Beam SQL shell allows you to write pipelines as SQL queries without using the Java SDK. The Shell page describes how to work with the interactive Beam SQL shell.

Apache Calcite dialect

The Calcite overview summarizes Apache Calcite operators, functions, syntax, and data types supported by Beam SQL.

Beam SQL extensions

Beam SQL has additional extensions to make it easy to leverage Beam’s unified batch/streaming model and support for complex data types.
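
One of these extensions is windowing in SQL: Beam SQL accepts a windowing function such as TUMBLE in the GROUP BY clause, so the same query can aggregate bounded or unbounded input over event-time windows. The minimal sketch below counts clicks per page in one-minute tumbling windows defined on a DATETIME field. It assumes the Beam Java SDK, the Beam SQL extension, and a runner such as the Direct Runner are on the classpath; the class name SqlWindowExample and the fields page_name and event_time are hypothetical.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.joda.time.Instant;

public class SqlWindowExample {
  // Hypothetical schema for illustration: each click has a page name and an event time.
  static final Schema CLICK_SCHEMA =
      Schema.builder().addStringField("page_name").addDateTimeField("event_time").build();

  // Count clicks per page in one-minute tumbling windows keyed on the event_time field.
  static final String QUERY =
      "SELECT page_name, COUNT(*) AS clicks "
          + "FROM PCOLLECTION "
          + "GROUP BY page_name, TUMBLE(event_time, INTERVAL '1' MINUTE)";

  static Pipeline buildPipeline(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Bounded in-memory input; event times are milliseconds since the epoch.
    PCollection<Row> clicks =
        p.apply(
            "CreateClicks",
            Create.of(
                    Row.withSchema(CLICK_SCHEMA).addValues("home", new Instant(0L)).build(),
                    Row.withSchema(CLICK_SCHEMA).addValues("home", new Instant(30_000L)).build(),
                    Row.withSchema(CLICK_SCHEMA).addValues("home", new Instant(90_000L)).build())
                .withRowSchema(CLICK_SCHEMA));

    // The windowed aggregation; a real pipeline would write these rows to a sink.
    clicks.apply("CountPerWindow", SqlTransform.query(QUERY));
    return p;
  }

  public static void main(String[] args) {
    buildPipeline(args).run().waitUntilFinish();
  }
}
```

The window is taken from the event_time column rather than from a separate Window transform, which is what lets the same SQL text describe the aggregation whether the input is bounded, as in this sketch, or unbounded.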