apache_beam.dataframe.expressions module

class apache_beam.dataframe.expressions.Session(bindings=None)[source]

Bases: object

A session represents a mapping of expressions to concrete values.

The bindings typically include required placeholders, but may be any intermediate expression as well.

evaluate(expr)[source]
lookup(expr)[source]
class apache_beam.dataframe.expressions.Expression(name, proxy, _id=None)[source]

Bases: object

An expression is an operation bound to a set of arguments.

An expression represents a deferred tree of operations, which can be evaluated at a specific bindings of root expressions to values.

proxy()[source]
evaluate_at(session)[source]

Returns the result of self with the bindings given in session.

requires_partition_by_index()[source]

Whether this expression requires its argument(s) to be partitioned by index.

preserves_partition_by_index()[source]

Whether the result of this expression will be partitioned by index whenever all of its inputs are partitioned by index.

class apache_beam.dataframe.expressions.PlaceholderExpression(proxy)[source]

Bases: apache_beam.dataframe.expressions.Expression

An expression whose value must be explicitly bound in the session.

Initialize a placeholder expression.

Parameters:proxy – A proxy object with the type expected to be bound to this expression. Used for type checking at pipeline construction time.
args()[source]
evaluate_at(session)[source]
requires_partition_by_index()[source]
preserves_partition_by_index()[source]
class apache_beam.dataframe.expressions.ConstantExpression(value, proxy=None)[source]

Bases: apache_beam.dataframe.expressions.Expression

An expression whose value is known at pipeline construction time.

Initialize a constant expression.

Parameters:
  • value – The constant value to be produced by this expression.
  • proxy – (Optional) a proxy object with same type as value to use for rapid type checking at pipeline construction time. If not provided, value will be used directly.
args()[source]
evaluate_at(session)[source]
requires_partition_by_index()[source]
preserves_partition_by_index()[source]
class apache_beam.dataframe.expressions.ComputedExpression(name, func, args, proxy=None, _id=None, requires_partition_by_index=True, preserves_partition_by_index=False)[source]

Bases: apache_beam.dataframe.expressions.Expression

An expression whose value must be computed at pipeline execution time.

Initialize a computed expression.

Parameters:
  • name – The name of this expression.
  • func – The function that will be used to compute the value of this expression. Should accept arguments of the types returned when evaluating the args expressions.
  • args – The list of expressions that will be used to produce inputs to func.
  • proxy – (Optional) a proxy object with same type as the objects that this ComputedExpression will produce at execution time. If not provided, a proxy will be generated using func and the proxies of args.
  • _id – (Optional) a string to uniquely identify this expression.
  • requires_partition_by_index – Whether this expression requires its argument(s) to be partitioned by index.
  • preserves_partition_by_index – Whether the result of this expression will be partitioned by index whenever all of its inputs are partitioned by index.
args()[source]
evaluate_at(session)[source]
requires_partition_by_index()[source]
preserves_partition_by_index()[source]
apache_beam.dataframe.expressions.elementwise_expression(name, func, args)[source]