apache_beam.dataframe.partitionings module

class apache_beam.dataframe.partitionings.Partitioning[source]

Bases: object

A class representing a (consistent) partitioning of dataframe objects.

is_subpartitioning_of(other)[source]

Returns whether self is a subpartitioning of other.

Specifically, returns whether something partitioned by self is necessarily also partitioned by other.

partition_fn(df, num_partitions)[source]

A callable that actually performs the partitioning of a Frame df.

This will be invoked via a FlatMap in conjunction with a GroupByKey to achieve the desired partitioning.
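The flat-map-then-group pattern described above can be sketched in plain Python. This is a toy model only, not Beam's implementation: the names toy_partition_fn and toy_group_by_key are hypothetical, a "frame" is stood in for by a list of (index_label, value) tuples, and the assumption that the function yields (partition_id, rows) pairs is made purely for illustration.

```python
from collections import defaultdict


def toy_partition_fn(rows, num_partitions):
    """Yield (partition_id, rows) pairs for one chunk of a toy frame.

    Hypothetical stand-in: `rows` is a list of (index_label, value) tuples.
    """
    buckets = defaultdict(list)
    for index_label, value in rows:
        # Rows with the same index label always map to the same partition id.
        partition_id = hash(index_label) % num_partitions
        buckets[partition_id].append((index_label, value))
    yield from buckets.items()


def toy_group_by_key(keyed_chunks):
    """Merge all chunks that share a partition id, as a group-by-key step would."""
    grouped = defaultdict(list)
    for key, chunk in keyed_chunks:
        grouped[key].extend(chunk)
    return dict(grouped)


# Two input chunks, as if the frame arrived in pieces.
chunks = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("a", 4), ("c", 5)],
]

# Flat-map the partition function over every chunk, then group by partition id.
keyed = [pair for chunk in chunks for pair in toy_partition_fn(chunk, 4)]
partitions = toy_group_by_key(keyed)
```

After grouping, all rows that share an index label sit in the same partition, no matter which input chunk they came from.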

test_partition_fn(df)[source]

class apache_beam.dataframe.partitionings.Index(levels=None)[source]

Bases: apache_beam.dataframe.partitionings.Partitioning

A partitioning by index (either fully or partially).

If the set of “levels” of the index to consider is not specified, the entire index is used.

Partitionings form a partial order, given by

Singleton() < Index([i]) < Index([i, j]) < … < Index() < Arbitrary()

The ordering is implemented via the is_subpartitioning_of method: each partitioning in the chain above is a subpartitioning of every partitioning to its left.
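The chain above can be modelled with a few stand-in classes. This is a minimal sketch under stated assumptions, not the module's code: ToySingleton, ToyIndex, and ToyArbitrary are hypothetical names, and their comparison rules are written only to reproduce the ordering stated in this section.

```python
class ToySingleton:
    """Hypothetical stand-in for the leftmost element of the chain."""

    def is_subpartitioning_of(self, other):
        # Only a subpartitioning of itself.
        return isinstance(other, ToySingleton)


class ToyIndex:
    """Hypothetical stand-in for a partitioning by some index levels."""

    def __init__(self, levels=None):
        # None means "use the entire index".
        self.levels = None if levels is None else frozenset(levels)

    def is_subpartitioning_of(self, other):
        if isinstance(other, ToySingleton):
            return True
        if isinstance(other, ToyIndex):
            if self.levels is None:
                return True  # the full index sits right of every level subset
            if other.levels is None:
                return False
            # More levels means further right in the chain.
            return other.levels <= self.levels
        return False  # ToyArbitrary sits to the right of every ToyIndex


class ToyArbitrary:
    """Hypothetical stand-in for the rightmost element of the chain."""

    def is_subpartitioning_of(self, other):
        return True


chain = [
    ToySingleton(),
    ToyIndex(["i"]),
    ToyIndex(["i", "j"]),
    ToyIndex(),
    ToyArbitrary(),
]

# Every element is a subpartitioning of each element to its left, never the reverse.
for pos, right in enumerate(chain):
    for left in chain[:pos]:
        assert right.is_subpartitioning_of(left)
        assert not left.is_subpartitioning_of(right)
```

The order is only partial: in this model ToyIndex(["i"]) and ToyIndex(["j"]) are incomparable, since neither set of levels contains the other.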

is_subpartitioning_of(other)[source]

partition_fn(df, num_partitions)[source]

check(dfs)[source]

class apache_beam.dataframe.partitionings.Singleton[source]

Bases: apache_beam.dataframe.partitionings.Partitioning

A partitioning of all the data into a single partition.

is_subpartitioning_of(other)[source]

partition_fn(df, num_partitions)[source]

check(dfs)[source]

class apache_beam.dataframe.partitionings.Arbitrary[source]

Bases: apache_beam.dataframe.partitionings.Partitioning

A partitioning imposing no constraints on the actual partitioning.

is_subpartitioning_of(other)[source]

test_partition_fn(df)[source]

check(dfs)[source]