apache_beam.dataframe.expressions module¶
- 
class apache_beam.dataframe.expressions.Session(bindings=None)[source]¶
- Bases: - object- A session represents a mapping of expressions to concrete values. - The bindings typically include required placeholders, but may be any intermediate expression as well. 
- 
class apache_beam.dataframe.expressions.PartitioningSession(bindings=None)[source]¶
- Bases: - apache_beam.dataframe.expressions.Session- An extension of Session that enforces actual partitioning of inputs. - Each expression is evaluated multiple times for various supported partitionings determined by its requires_partition_by specification. For each tested partitioning, the input is partitioned and the expression is evaluated on each partition separately, as if this were actually executed in a parallel manner. - For each input partitioning, the results are verified to be partitioned appropriately according to the expression’s preserves_partition_by specification. - For testing only. 
- 
apache_beam.dataframe.expressions.output_partitioning(expr, input_partitioning)[source]¶
- Return the expected output partitioning for expr when it’s input is partitioned by input_partitioning. - For internal use only; No backward compatibility guarantees 
- 
class apache_beam.dataframe.expressions.Expression(name: str, proxy: T, _id: Optional[str] = None)[source]¶
- Bases: - typing.Generic- An expression is an operation bound to a set of arguments. - An expression represents a deferred tree of operations, which can be evaluated at a specific bindings of root expressions to values. - requires_partition_by indicates the upper bound of a set of partitionings that are acceptable inputs to this expression. The expression should be able to produce the correct result when given input(s) partitioned by its requires_partition_by attribute, or by any partitoning that is _not_ a subpartitioning of it. - preserves_partition_by indicates the upper bound of a set of partitionings that can be preserved by this expression. When the input(s) to this expression are partitioned by preserves_partition_by, or by any partitioning that is _not_ a subpartitioning of it, this expression should produce output(s) partitioned by the same partitioning. - However, if the partitioning of an expression’s input is a subpartitioning of the partitioning that it preserves, the output is presumed to have no particular partitioning (i.e. Arbitrary()). - For example, let’s look at an “element-wise operation”, that has no partitioning requirement, and preserves any partitioning given to it: - requires_partition_by = Arbitrary() -----------------------------+ | +-----------+-------------+---------- ... ----+---------| | | | | | Singleton() < Index([i]) < Index([i, j]) < ... < Index() < Arbitrary() | | | | | +-----------+-------------+---------- ... ----+---------| | preserves_partition_by = Arbitrary() ----------------------------+ - As a more interesting example, consider this expression, which requires Index partitioning, and preserves just Singleton partitioning: - requires_partition_by = Index() -----------------------+ | +-----------+-------------+---------- ... ----| | | | | Singleton() < Index([i]) < Index([i, j]) < ... < Index() < Arbitrary() | | preserves_partition_by = Singleton() - Note that any non-Arbitrary partitioning is an acceptable input for this expression. However, unless the inputs are Singleton-partitioned, the expression makes no guarantees about the partitioning of the output. - 
evaluate_at(session: apache_beam.dataframe.expressions.Session) → T[source]¶
- Returns the result of self with the bindings given in session. 
 - 
requires_partition_by() → apache_beam.dataframe.partitionings.Partitioning[source]¶
- Returns the partitioning, if any, require to evaluate this expression. - Returns partitioning.Arbitrary() to require no partitioning is required. 
 - 
preserves_partition_by() → apache_beam.dataframe.partitionings.Partitioning[source]¶
- Returns the partitioning, if any, preserved by this expression. - This gives an upper bound on the partitioning of its ouput. The actual partitioning of the output may be less strict (e.g. if the input was less partitioned). 
 
- 
- 
class apache_beam.dataframe.expressions.PlaceholderExpression(proxy, reference=None)[source]¶
- Bases: - apache_beam.dataframe.expressions.Expression- An expression whose value must be explicitly bound in the session. - Initialize a placeholder expression. - Parameters: - proxy – A proxy object with the type expected to be bound to this expression. Used for type checking at pipeline construction time. 
- 
class apache_beam.dataframe.expressions.ConstantExpression(value, proxy=None)[source]¶
- Bases: - apache_beam.dataframe.expressions.Expression- An expression whose value is known at pipeline construction time. - Initialize a constant expression. - Parameters: - value – The constant value to be produced by this expression.
- proxy – (Optional) a proxy object with same type as value to use for rapid type checking at pipeline construction time. If not provided, value will be used directly.
 
- 
class apache_beam.dataframe.expressions.ComputedExpression(name, func, args, proxy=None, _id=None, requires_partition_by=Index, preserves_partition_by=Singleton)[source]¶
- Bases: - apache_beam.dataframe.expressions.Expression- An expression whose value must be computed at pipeline execution time. - Initialize a computed expression. - Parameters: - name – The name of this expression.
- func – The function that will be used to compute the value of this expression. Should accept arguments of the types returned when evaluating the args expressions.
- args – The list of expressions that will be used to produce inputs to func.
- proxy – (Optional) a proxy object with same type as the objects that this ComputedExpression will produce at execution time. If not provided, a proxy will be generated using func and the proxies of args.
- _id – (Optional) a string to uniquely identify this expression.
- requires_partition_by – The required (common) partitioning of the args.
- preserves_partition_by – The level of partitioning preserved.