apache_beam.dataframe.expressions module

class apache_beam.dataframe.expressions.Session(bindings=None)[source]

Bases: object

A session represents a mapping of expressions to concrete values.

The bindings typically include required placeholders, but may be any intermediate expression as well.

evaluate(expr)[source]
lookup(expr)[source]
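
The relationship between bindings, lookup and evaluate can be sketched with a toy model. The classes below (ToySession, ToyPlaceholder, ToySum) are hypothetical stand-ins written for illustration and are not Beam's implementation. In particular, caching computed values as new bindings is an assumption of this sketch. It shows that binding an intermediate expression makes the placeholders beneath it unnecessary.

```python
class ToyPlaceholder:
    """Hypothetical leaf whose value must be bound in the session."""

    def evaluate_at(self, session):
        # A placeholder cannot compute itself; it must already be bound.
        return session.lookup(self)


class ToySum:
    """Hypothetical computed node: the sum of two sub-expressions."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate_at(self, session):
        return session.evaluate(self.left) + session.evaluate(self.right)


class ToySession:
    """Toy stand-in for a session: a mapping of expressions to values."""

    def __init__(self, bindings=None):
        self._bindings = dict(bindings or {})

    def lookup(self, expr):
        # Return the value already bound to expr (KeyError if unbound).
        return self._bindings[expr]

    def evaluate(self, expr):
        # Assumption of this sketch: compute once, then keep as a binding.
        if expr not in self._bindings:
            self._bindings[expr] = expr.evaluate_at(self)
        return self._bindings[expr]


x, y = ToyPlaceholder(), ToyPlaceholder()
inner = ToySum(x, y)
outer = ToySum(inner, y)

# Binding only the placeholders: every other node is computed.
from_placeholders = ToySession({x: 1, y: 2}).evaluate(outer)

# Binding an intermediate expression: x is never looked up.
from_intermediate = ToySession({inner: 10, y: 2}).evaluate(outer)
```

In the second session, x has no binding, yet evaluation succeeds because the bound value of inner is used directly.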
class apache_beam.dataframe.expressions.PartitioningSession(bindings=None)[source]

Bases: apache_beam.dataframe.expressions.Session

An extension of Session that enforces actual partitioning of inputs.

When evaluating an expression, inputs are partitioned according to its requires_partition_by specifications, the expression is evaluated on each partition separately, and the final results are concatenated, as if this were actually executed in parallel.

For testing only.

evaluate(expr)[source]
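
The partition, evaluate, concatenate cycle described above can be illustrated without Beam. The following is a minimal sketch in plain Python, assuming rows are (key, value) pairs and that partitioning by key plays the role of index partitioning. The helper names (partition_by_key, sum_per_key, evaluate_partitioned) are hypothetical and do not exist in the module.

```python
from collections import defaultdict


def partition_by_key(rows, num_partitions):
    """Split (key, value) rows so that equal keys share a partition."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def sum_per_key(rows):
    """An operation that is only correct if all rows for a key are together."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return sorted(totals.items())


def evaluate_partitioned(func, rows, num_partitions=3):
    """Evaluate func on each partition separately, then concatenate."""
    results = []
    for part in partition_by_key(rows, num_partitions):
        results.extend(func(part))
    # Sort so the comparison does not depend on partition order.
    return sorted(results)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
partitioned = evaluate_partitioned(sum_per_key, rows)
unpartitioned = sum_per_key(rows)
```

Because the partitioning satisfies the operation's requirement (all rows for a key land together), the partitioned and unpartitioned results agree. A partitioning that scattered equal keys across partitions would yield duplicate, partial sums, which is the kind of mistake this style of testing is meant to expose.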
class apache_beam.dataframe.expressions.Expression(name, proxy, _id=None)[source]

Bases: object

An expression is an operation bound to a set of arguments.

An expression represents a deferred tree of operations, which can be evaluated at a specific binding of root expressions to values.

proxy()[source]
placeholders()[source]

Returns all the placeholders that self depends on.

evaluate_at(session)[source]

Returns the result of self with the bindings given in session.

requires_partition_by()[source]

Returns the partitioning, if any, required to evaluate this expression.

Returns partitionings.Nothing() if no partitioning is required.

preserves_partition_by()[source]

Returns the partitioning, if any, preserved by this expression.

This gives an upper bound on the partitioning of its output. The actual partitioning of the output may be less strict (e.g. if the input was less partitioned).
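
As an illustration of how placeholders() can be derived from a deferred tree, the sketch below collects the placeholders that a node depends on by taking the union over its arguments. All class names here (ToyExpression, ToyPlaceholder, ToyConstant, ToyComputed) are hypothetical and only mirror the roles of the classes documented on this page. The traversal shown is an assumption about how such a method might work, not Beam's code.

```python
class ToyExpression:
    """Hypothetical base class for a node in a deferred operation tree."""

    def placeholders(self):
        raise NotImplementedError


class ToyPlaceholder(ToyExpression):
    def __init__(self, name):
        self.name = name

    def placeholders(self):
        # A placeholder depends only on itself.
        return frozenset([self])


class ToyConstant(ToyExpression):
    def __init__(self, value):
        self.value = value

    def placeholders(self):
        # A constant needs no bindings at all.
        return frozenset()


class ToyComputed(ToyExpression):
    def __init__(self, name, args):
        self.name = name
        self.args = list(args)

    def placeholders(self):
        # Union of everything the arguments depend on.
        return frozenset().union(*(arg.placeholders() for arg in self.args))


a = ToyPlaceholder("a")
b = ToyPlaceholder("b")
two = ToyConstant(2)
expr = ToyComputed("add", [ToyComputed("mul", [a, two]), b, a])

# a appears twice in the tree but only once in the result.
names = sorted(p.name for p in expr.placeholders())
```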

class apache_beam.dataframe.expressions.PlaceholderExpression(proxy, reference=None)[source]

Bases: apache_beam.dataframe.expressions.Expression

An expression whose value must be explicitly bound in the session.

Initialize a placeholder expression.

Parameters: proxy – A proxy object with the type expected to be bound to this expression. Used for type checking at pipeline construction time.
placeholders()[source]
args()[source]
evaluate_at(session)[source]
requires_partition_by()[source]
preserves_partition_by()[source]
class apache_beam.dataframe.expressions.ConstantExpression(value, proxy=None)[source]

Bases: apache_beam.dataframe.expressions.Expression

An expression whose value is known at pipeline construction time.

Initialize a constant expression.

Parameters:
  • value – The constant value to be produced by this expression.
  • proxy – (Optional) a proxy object with same type as value to use for rapid type checking at pipeline construction time. If not provided, value will be used directly.
placeholders()[source]
args()[source]
evaluate_at(session)[source]
requires_partition_by()[source]
preserves_partition_by()[source]
class apache_beam.dataframe.expressions.ComputedExpression(name, func, args, proxy=None, _id=None, requires_partition_by=<apache_beam.dataframe.partitionings.Index object>, preserves_partition_by=<apache_beam.dataframe.partitionings.Nothing object>)[source]

Bases: apache_beam.dataframe.expressions.Expression

An expression whose value must be computed at pipeline execution time.

Initialize a computed expression.

Parameters:
  • name – The name of this expression.
  • func – The function that will be used to compute the value of this expression. Should accept arguments of the types returned when evaluating the args expressions.
  • args – The list of expressions that will be used to produce inputs to func.
  • proxy – (Optional) a proxy object with same type as the objects that this ComputedExpression will produce at execution time. If not provided, a proxy will be generated using func and the proxies of args.
  • _id – (Optional) a string to uniquely identify this expression.
  • requires_partition_by – The required (common) partitioning of the args.
  • preserves_partition_by – The level of partitioning preserved.
placeholders()[source]
args()[source]
evaluate_at(session)[source]
requires_partition_by()[source]
preserves_partition_by()[source]
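
The default proxy behaviour described for ConstantExpression and ComputedExpression can be sketched as follows. This is a toy illustration under two assumptions: a proxy is simply a sample value of the expected type, and generating a proxy "using func and the proxies of args" means applying func to those proxies. The helpers infer_proxy and constant_proxy are hypothetical names and are not part of the module.

```python
def infer_proxy(func, arg_proxies):
    """Derive an output proxy by applying func to the input proxies."""
    return func(*arg_proxies)


def constant_proxy(value, proxy=None):
    """Use the explicit proxy if given; otherwise fall back to the value."""
    return value if proxy is None else proxy


# Proxies here are sample values that carry only type information.
int_proxy = 0
float_proxy = 0.0

# Adding an int to a float produces a float, so the inferred proxy is a float.
result_proxy = infer_proxy(lambda a, b: a + b, [int_proxy, float_proxy])
result_type = type(result_proxy)

# With no explicit proxy, the constant itself serves as its own proxy.
default_proxy = constant_proxy(7)
explicit_proxy = constant_proxy(7, proxy=0)
```

Supplying an explicit proxy avoids calling func at construction time, which matters when func is expensive or cannot run on sample inputs.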
apache_beam.dataframe.expressions.elementwise_expression(name, func, args)[source]
apache_beam.dataframe.expressions.allow_non_parallel_operations(allow=True)[source]
exception apache_beam.dataframe.expressions.NonParallelOperation[source]

Bases: Exception