Beam Calcite SQL lexical structure

A Beam Calcite SQL statements are comprised of a series of tokens. Tokens include identifiers, quoted identifiers, literals, keywords, operators, and special characters. Tokens can be separated by whitespace (space, backspace, tab, newline) or comments.

Identifiers

Identifiers are names that are associated with columns, tables, and other database objects.

Identifiers must begin with a letter or an underscore. Subsequent characters can be letters, numbers, or underscores. Quoted identifiers are identifiers enclosed by backtick (`) characters and can contain any character, such as spaces or symbols. However, quoted identifiers cannot be empty. Reserved Keywords can only be used as identifiers if enclosed by backticks.

Syntax (defined here as a regular expression):

[A-Za-z_][A-Za-z_0-9]*

Examples:

Customers5
_dataField1
ADGROUP

Invalid examples:

5Customers
_dataField!
GROUP

5Customers begins with a number, not a letter or underscore. _dataField! contains the special character “!” which is not a letter, number, or underscore. GROUP is a reserved keyword, and therefore cannot be used as an identifier without being enclosed by backtick characters.

Both identifiers and quoted identifiers are case insensitive, with some nuances. See Case Sensitivity for further details.

Quoted identifiers have the same escape sequences as string literals, defined below.

Literals

A literal represents a constant value of a built-in data type. Some, but not all, data types can be expressed as literals.

String Literals

Both string and bytes literals must be quoted with single (') quotation mark.

Quoted literals:

LiteralExamplesDescription
Quoted string
  • 'it''s'
  • 'Title: "Boy"'
Quoted strings enclosed by single (') quotes can contain unescaped double (") quotes.
Two quotation marks ('') is the escape sequence.
Quoted strings can contain newlines.

Integer Literals

Integer literals are either a sequence of decimal digits (0 through 9). Integers can be prefixed by “+” or “-” to represent positive and negative values, respectively.

Examples:

123
-123

An integer literal is interpreted as an BIGINT.

Floating Point Literals

Syntax options:

[+-]DIGITS.[DIGITS][e[+-]DIGITS]
[DIGITS].DIGITS[e[+-]DIGITS]
DIGITSe[+-]DIGITS

DIGITS represents one or more decimal numbers (0 through 9) and e represents the exponent marker (e or E).

Examples:

123.456e-67
.1E4
58.
4e2

Numeric literals that contain either a decimal point or an exponent marker are presumed to be type double.

Implicit coercion of floating point literals to float type is possible if the value is within the valid float range.

There is no literal representation of NaN or infinity.

Array Literals

Array literals are a comma-separated lists of elements enclosed in square brackets prefixed with the ARRAY keyword.

Examples:

ARRAY[1, 2, 3]
ARRAY['x', 'y', 'xy']

Struct Literals

Syntax:

(elem[, elem...])

where elem is an element in the struct. elem must be a literal data type, not an expression or column name.

The output type is an anonymous struct type (structs are not named types) with anonymous fields with types matching the types of the input expressions.

ExampleOutput Type
(1, 2, 3)STRUCT<BIGINT,BIGINT,BIGINT>
(1, 'abc')STRUCT<BIGINT,STRING>

Date Literals

Syntax:

DATE 'YYYY-M[M]-D[D]'

Date literals contain the DATE keyword followed by a string literal that conforms to the canonical date format, enclosed in single quotation marks. Date literals support a range between the years 1 and 9999, inclusive. Dates outside of this range are invalid.

For example, the following date literal represents September 27, 2014:

DATE '2014-09-27'

String literals in canonical date format also implicitly coerce to DATE type when used where a DATE-type expression is expected. For example, in the query

SELECT * FROM foo WHERE date_col = "2014-09-27"

the string literal "2014-09-27" will be coerced to a date literal.

Time Literals

Syntax:

TIME '[H]H:[M]M:[S]S[.DDDDDD]]'

TIME literals contain the TIME keyword and a string literal that conforms to the canonical time format, enclosed in single quotation marks.

For example, the following time represents 12:30 p.m.:

TIME '12:30:00.45'

Timestamp literals

Syntax:

TIMESTAMP 'YYYY-[M]M-[D]D [[H]H:[M]M:[S]S[.DDDDDD]]'

Timestamp literals contain the TIMESTAMP keyword and a string literal that conforms to the canonical timestamp format, enclosed in single quotation marks.

Timestamp literals support a range between the years 1 and 9999, inclusive. Timestamps outside of this range are invalid.

For example, the following timestamp represents 12:30 p.m. on September 27, 2014:

TIMESTAMP '2014-09-27 12:30:00.45'

Case Sensitivity

Beam SQL follows these rules for case sensitivity:

CategoryCase Sensitive?Notes
KeywordsNo
Function namesNo
Table namesYes
Column namesYes
String valuesYes
String comparisonsYes
Aliases within a queryNo
Regular expression matchingSee NotesRegular expression matching is case sensitive by default, unless the expression itself specifies that it should be case insensitive.
LIKE matchingYes 

Reserved Keywords

Keywords are a group of tokens that have special meaning in the Beam SQL language, and have the following characteristics:

Beam SQL has the following reserved keywords.

A
ABS
ABSOLUTE
ACTION
ADA
ADD
ADMIN
AFTER
ALL
ALLOCATE
ALLOW
ALTER
ALWAYS
AND
ANY
APPLY
ARE
ARRAY
ARRAY_MAX_CARDINALITY
AS
ASC
ASENSITIVE
ASSERTION
ASSIGNMENT
ASYMMETRIC
AT
ATOMIC
ATTRIBUTE
ATTRIBUTES
AUTHORIZATION
AVG
BEFORE
BEGIN
BEGIN_FRAME
BEGIN_PARTITION
BERNOULLI
BETWEEN
BIGINT
BINARY
BIT
BLOB
BOOLEAN
BOTH
BREADTH
BY
C
CALL
CALLED
CARDINALITY
CASCADE
CASCADED
CASE
CAST
CATALOG
CATALOG_NAME
CEIL
CEILING
CENTURY
CHAIN
CHAR
CHAR_LENGTH
CHARACTER
CHARACTER_LENGTH
CHARACTER_SET_CATALOG
CHARACTER_SET_NAME
CHARACTER_SET_SCHEMA
CHARACTERISTICS
CHARACTERS
CHECK
CLASSIFIER
CLASS_ORIGIN
CLOB
CLOSE
COALESCE
COBOL
COLLATE
COLLATION
COLLATION_CATALOG
COLLATION_NAME
COLLATION_SCHEMA
COLLECT
COLUMN
COLUMN_NAME
COMMAND_FUNCTION
COMMAND_FUNCTION_CODE
COMMENT
COMMIT
COMMITTED
CONDITION
CONDITION_NUMBER
CONNECT
CONNECTION
CONNECTION_NAME
CONSTRAINT
CONSTRAINT_CATALOG
CONSTRAINT_NAME
CONSTRAINT_SCHEMA
CONSTRAINTS
CONSTRUCTOR
CONTAINS
CONTINUE
CONVERT
CORR
CORRESPONDING
COUNT
COVAR_POP
COVAR_SAMP
CREATE
CROSS
CUBE
CUME_DIST
CURRENT
CURRENT_CATALOG
CURRENT_DATE
CURRENT_DEFAULT_TRANSFORM_GROUP
CURRENT_PATH
CURRENT_ROLE
CURRENT_ROW
CURRENT_SCHEMA
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_TRANSFORM_GROUP_FOR_TYPE
CURRENT_USER
CURSOR
CURSOR_NAME
CYCLE
DATA
DATABASE
DATE
DATETIME_INTERVAL_CODE
DATETIME_INTERVAL_PRECISION
DAY
DEALLOCATE
DEC
DECADE
DECIMAL
DECLARE
DEFAULT
DEFAULTS
DEFERRABLE
DEFERRED
DEFINE
DEFINED
DEFINER
DEGREE
DELETE
DENSE_RANK
DEPTH
DEREF
DERIVED
DESC
DESCRIBE
DESCRIPTION
DESCRIPTOR
DETERMINISTIC
DIAGNOSTICS
DISALLOW
DISCONNECT
DISPATCH
DISTINCT
DOMAIN
DOUBLE
DOW
DOY
DROP
DYNAMIC
DYNAMIC_FUNCTION
DYNAMIC_FUNCTION_CODE
EACH
ELEMENT
ELSE
EMPTY
END
END-EXEC
END_FRAME
END_PARTITION
EPOCH
EQUALS
ESCAPE
EVERY
EXCEPT
EXCEPTION
EXCLUDE
EXCLUDING
EXEC
EXECUTE
EXISTS
EXP
EXPLAIN
EXTEND
EXTERNAL
EXTRACT
FALSE
FETCH
FILTER
FINAL
FIRST
FIRST_VALUE
FLOAT
FLOOR
FOLLOWING
FOR
FOREIGN
FORTRAN
FOUND
FRAC_SECOND
FRAME_ROW
FREE
FROM
FULL
FUNCTION
FUSION
G
GENERAL
GENERATED
GEOMETRY
GET
GLOBAL
GO
GOTO
GRANT
GRANTED
GROUP
GROUPING
GROUPS
HAVING
HIERARCHY
HOLD
HOUR
IDENTITY
IF
IMMEDIATE
IMMEDIATELY
IMPLEMENTATION
IMPORT
IN
INCLUDING
INCREMENT
INDICATOR
INITIAL
INITIALLY
INNER
INOUT
INPUT
INSENSITIVE
INSERT
INSTANCE
INSTANTIABLE
INT
INTEGER
INTERSECT
INTERSECTION
INTERVAL
INTO
INVOKER
IS
ISOLATION
JAVA
JOIN
JSON
K
KEY
KEY_MEMBER
KEY_TYPE
LABEL
LAG
LANGUAGE
LARGE
LAST
LAST_VALUE
LATERAL
LEAD
LEADING
LEFT
LENGTH
LEVEL
LIBRARY
LIKE
LIKE_REGEX
LIMIT
LN
LOCAL
LOCALTIME
LOCALTIMESTAMP
LOCATION
LOCATOR
LOWER
M
MAP
MATCH
MATCHED
MATCHES
MATCH_NUMBER
MATCH_RECOGNIZE
MAX
MAXVALUE
MEASURES
MEMBER
MERGE
MESSAGE_LENGTH
MESSAGE_OCTET_LENGTH
MESSAGE_TEXT
METHOD
MICROSECOND
MILLENNIUM
MIN
MINUTE
MINVALUE
MOD
MODIFIES
MODULE
MONTH
MORE
MULTISET
MUMPS
NAME
NAMES
NATIONAL
NATURAL
NCHAR
NCLOB
NESTING
NEW
NEXT
NO
NONE
NORMALIZE
NORMALIZED
NOT
NTH_VALUE
NTILE
NULL
NULLABLE
NULLIF
NULLS
NUMBER
NUMERIC
OBJECT
OCCURRENCES_REGEX
OCTET_LENGTH
OCTETS
OF
OFFSET
OLD
OMIT
ON
ONE
ONLY
OPEN
OPTION
OPTIONS
OR
ORDER
ORDERING
ORDINALITY
OTHERS
OUT
OUTER
OUTPUT
OVER
OVERLAPS
OVERLAY
OVERRIDING
PAD
PARAMETER
PARAMETER_MODE
PARAMETER_NAME
PARAMETER_ORDINAL_POSITION
PARAMETER_SPECIFIC_CATALOG
PARAMETER_SPECIFIC_NAME
PARAMETER_SPECIFIC_SCHEMA
PARTIAL
PARTITION
PASCAL
PASSTHROUGH
PAST
PATH
PATTERN
PER
PERCENT
PERCENTILE_CONT
PERCENTILE_DISC
PERCENT_RANK
PERIOD
PERMUTE
PLACING
PLAN
PLI
PORTION
POSITION
POSITION_REGEX
POWER
PRECEDES
PRECEDING
PRECISION
PREPARE
PRESERVE
PREV
PRIMARY
PRIOR
PRIVILEGES
PROCEDURE
PUBLIC
QUARTER
RANGE
RANK
READ
READS
REAL
RECURSIVE
REF
REFERENCES
REFERENCING
REGR_AVGX
REGR_AVGY
REGR_COUNT
REGR_INTERCEPT
REGR_R2
REGR_SLOPE
REGR_SXX
REGR_SXY
REGR_SYY
RELATIVE
RELEASE
REPEATABLE
REPLACE
RESET
RESTART
RESTRICT
RESULT
RETURN
RETURNED_CARDINALITY
RETURNED_LENGTH
RETURNED_OCTET_LENGTH
RETURNED_SQLSTATE
RETURNS
REVOKE
RIGHT
ROLE
ROLLBACK
ROLLUP
ROUTINE
ROUTINE_CATALOG
ROUTINE_NAME
ROUTINE_SCHEMA
ROW
ROW_COUNT
ROW_NUMBER
ROWS
RUNNING
SAVEPOINT
SCALE
SCHEMA
SCHEMA_NAME
SCOPE
SCOPE_CATALOGS
SCOPE_NAME
SCOPE_SCHEMA
SCROLL
SEARCH
SECOND
SECTION
SECURITY
SEEK
SELECT
SELF
SENSITIVE
SEQUENCE
SERIALIZABLE
SERVER
SERVER_NAME
SESSION
SESSION_USER
SET
SETS
MINUS
SHOW
SIMILAR
SIMPLE
SIZE
SKIP
SMALLINT
SOME
SOURCE
SPACE
SPECIFIC
SPECIFIC_NAME
SPECIFICTYPE
SQL
SQLEXCEPTION
SQLSTATE
SQLWARNING
SQL_BIGINT
SQL_BINARY
SQL_BIT
SQL_BLOB
SQL_BOOLEAN
SQL_CHAR
SQL_CLOB
SQL_DATE
SQL_DECIMAL
SQL_DOUBLE
SQL_FLOAT
SQL_INTEGER
SQL_INTERVAL_DAY
SQL_INTERVAL_DAY_TO_HOUR
SQL_INTERVAL_DAY_TO_MINUTE
SQL_INTERVAL_DAY_TO_SECOND
SQL_INTERVAL_HOUR
SQL_INTERVAL_HOUR_TO_MINUTE
SQL_INTERVAL_HOUR_TO_SECOND
SQL_INTERVAL_MINUTE
SQL_INTERVAL_MINUTE_TO_SECOND
SQL_INTERVAL_MONTH
SQL_INTERVAL_SECOND
SQL_INTERVAL_YEAR
SQL_INTERVAL_YEAR_TO_MONTH
SQL_LONGVARBINARY
SQL_LONGVARCHAR
SQL_LONGVARNCHAR
SQL_NCHAR
SQL_NCLOB
SQL_NUMERIC
SQL_NVARCHAR
SQL_REAL
SQL_SMALLINT
SQL_TIME
SQL_TIMESTAMP
SQL_TINYINT
SQL_TSI_DAY
SQL_TSI_FRAC_SECOND
SQL_TSI_HOUR
SQL_TSI_MICROSECOND
SQL_TSI_MINUTE
SQL_TSI_MONTH
SQL_TSI_QUARTER
SQL_TSI_SECOND
SQL_TSI_WEEK
SQL_TSI_YEAR
SQL_VARBINARY
SQL_VARCHAR
SQRT
START
STATE
STATEMENT
STATIC
STDDEV_POP
STDDEV_SAMP
STREAM
STRUCTURE
STYLE
SUBCLASS_ORIGIN
SUBMULTISET
SUBSET
SUBSTITUTE
SUBSTRING
SUBSTRING_REGEX
SUCCEEDS
SUM
SYMMETRIC
SYSTEM
SYSTEM_TIME
SYSTEM_USER
TABLE
TABLE_NAME
TABLESAMPLE
TBLPROPERTIES
TEMPORARY
THEN
TIES
TIME
TIMESTAMP
TIMESTAMPADD
TIMESTAMPDIFF
TIMEZONE_HOUR
TIMEZONE_MINUTE
TINYINT
TO
TOP_LEVEL_COUNT
TRAILING
TRANSACTION
TRANSACTIONS_ACTIVE
TRANSACTIONS_COMMITTED
TRANSACTIONS_ROLLED_BACK
TRANSFORM
TRANSFORMS
TRANSLATE
TRANSLATE_REGEX
TRANSLATION
TREAT
TRIGGER
TRIGGER_CATALOG
TRIGGER_NAME
TRIGGER_SCHEMA
TRIM
TRIM_ARRAY
TRUE
TRUNCATE
TYPE
UESCAPE
UNBOUNDED
UNCOMMITTED
UNDER
UNION
UNIQUE
UNKNOWN
UNNAMED
UNNEST
UPDATE
UPPER
UPSERT
USAGE
USER
USER_DEFINED_TYPE_CATALOG
USER_DEFINED_TYPE_CODE
USER_DEFINED_TYPE_NAME
USER_DEFINED_TYPE_SCHEMA
USING
VALUE
VALUES
VALUE_OF
VAR_POP
VAR_SAMP
VARBINARY
VARCHAR
VARYING
VERSION
VERSIONING
VIEW
WEEK
WHEN
WHENEVER
WHERE
WIDTH_BUCKET
WINDOW
WITH
WITHIN
WITHOUT
WORK
WRAPPER
WRITE
XML
YEAR
ZONE

Terminating Semicolons

Statements can optionally use a terminating semicolon (;) in the context of a query string submitted through an Application Programming Interface (API). Some interactive tools require statements to have a terminating semicolon. In a request containing multiple statements, statements must be separated by semicolons, but the semicolon is optional for the final statement.

Comments

Comments are sequences of characters that are ignored by the parser. Beam SQL supports the following types of comments.

Single line comments

Single line comments are supported by prepending -- before the comment.

Examples

SELECT x FROM T; --x is a field and T is a table

Comment includes all characters from the ‘--’ sequence to the end of the line. You can optionally add a space after the ‘--’.

Multiline comments

Multiline comments are supported by enclosing the comment using /* <comment> */.

Example:

SELECT x FROM T /* x is a field and T is a table */
WHERE x = 3;

Invalid example:

SELECT x FROM T /* comment starts here
                /* comment ends on this line */
                this line is not considered a comment */
WHERE x = 3;

Comment includes all characters, including newlines, enclosed by the first occurrence of ‘/*’ and the first subsequent occurrence of ‘*/’. Nested comments are not supported. The second example contains a nested comment that renders the query invalid.

Portions of this page are modifications based on work created and shared by Google and used according to terms described in the Creative Commons 3.0 Attribution License.