apache_beam.dataframe.expressions module

class apache_beam.dataframe.expressions.Session(bindings=None)[source]

Bases: object

A session represents a mapping of expressions to concrete values.

The bindings typically include required placeholders, but may be any intermediate expression as well.

evaluate(expr)[source]
lookup(expr)[source]
class apache_beam.dataframe.expressions.Expression(name, proxy, _id=None)[source]

Bases: object

An expression is an operation bound to a set of arguments.

An expression represents a deferred tree of operations, which can be evaluated at a specific bindings of root expressions to values.

proxy()[source]
placeholders()[source]

Returns all the placeholders that self depends on.

evaluate_at(session)[source]

Returns the result of self with the bindings given in session.

requires_partition_by()[source]

Returns the partitioning, if any, require to evaluate this expression.

Returns partitioning.Nothing() to require no partitioning is required.

preserves_partition_by()[source]

Returns the partitioning, if any, preserved by this expression.

This gives an upper bound on the partitioning of its ouput. The actual partitioning of the output may be less strict (e.g. if the input was less partitioned).

class apache_beam.dataframe.expressions.PlaceholderExpression(proxy, reference=None)[source]

Bases: apache_beam.dataframe.expressions.Expression

An expression whose value must be explicitly bound in the session.

Initialize a placeholder expression.

Parameters:proxy – A proxy object with the type expected to be bound to this expression. Used for type checking at pipeline construction time.
placeholders()[source]
args()[source]
evaluate_at(session)[source]
requires_partition_by()[source]
preserves_partition_by()[source]
class apache_beam.dataframe.expressions.ConstantExpression(value, proxy=None)[source]

Bases: apache_beam.dataframe.expressions.Expression

An expression whose value is known at pipeline construction time.

Initialize a constant expression.

Parameters:
  • value – The constant value to be produced by this expression.
  • proxy – (Optional) a proxy object with same type as value to use for rapid type checking at pipeline construction time. If not provided, value will be used directly.
placeholders()[source]
args()[source]
evaluate_at(session)[source]
requires_partition_by()[source]
preserves_partition_by()[source]
class apache_beam.dataframe.expressions.ComputedExpression(name, func, args, proxy=None, _id=None, requires_partition_by=<apache_beam.dataframe.partitionings.Index object>, preserves_partition_by=<apache_beam.dataframe.partitionings.Nothing object>)[source]

Bases: apache_beam.dataframe.expressions.Expression

An expression whose value must be computed at pipeline execution time.

Initialize a computed expression.

Parameters:
  • name – The name of this expression.
  • func – The function that will be used to compute the value of this expression. Should accept arguments of the types returned when evaluating the args expressions.
  • args – The list of expressions that will be used to produce inputs to func.
  • proxy – (Optional) a proxy object with same type as the objects that this ComputedExpression will produce at execution time. If not provided, a proxy will be generated using func and the proxies of args.
  • _id – (Optional) a string to uniquely identify this expression.
  • requires_partition_by_index – Whether this expression requires its argument(s) to be partitioned by index.
  • preserves_partition_by_index – Whether the result of this expression will be partitioned by index whenever all of its inputs are partitioned by index.
placeholders()[source]
args()[source]
evaluate_at(session)[source]
requires_partition_by()[source]
preserves_partition_by()[source]
apache_beam.dataframe.expressions.elementwise_expression(name, func, args)[source]
apache_beam.dataframe.expressions.allow_non_parallel_operations(allow=True)[source]
exception apache_beam.dataframe.expressions.NonParallelOperation[source]

Bases: Exception