apache_beam.typehints.batch module

Utilities for working with batched types in the Beam SDK.

A batched type is a type B that is logically equivalent to Sequence[E], where E is some other type. Typically B has a different physical representation than Sequence[E] for performance reasons.

A trivial example is B = np.ndarray with dtype=np.int64, E = int.
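As a rough illustration of the idea, the sketch below uses the standard-library array module as a stand-in for the NumPy array mentioned above. That substitution is an assumption made only to keep the example free of dependencies; it is not what Beam uses. The batch holds the same logical sequence as a list of int, but packs each element into a fixed-width 64-bit slot.

```python
from array import array

# Element type E = int; the logical view is Sequence[E].
elements = [1, 2, 3, 4]

# Batched type B: a packed array of signed 64-bit integers (typecode "q"),
# standing in for a NumPy int64 array.
batch = array("q", elements)

# Logically equivalent: iterating the batch yields the same elements in order.
same_contents = list(batch) == elements

# Physically different: every element occupies a fixed number of bytes.
bytes_per_element = batch.itemsize
total_bytes = bytes_per_element * len(batch)
```

The list stores references to separate int objects, while the packed array stores raw 8-byte values contiguously, which is the kind of physical difference that motivates a batched type.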

class apache_beam.typehints.batch.BatchConverter(batch_type, element_type)[source]

Bases: typing.Generic

produce_batch(elements: Sequence[E]) → B[source]

Convert an instance of Sequence[E] to a single instance of B.

explode_batch(batch: B) → Iterator[E][source]

Convert an instance of B to Iterator[E].
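The two conversions above are presumably inverses of one another, since B is described as logically equivalent to Sequence[E]. The following is a minimal sketch of that round trip using a hypothetical Int64ArrayConverter class (not part of Beam) whose batch type is the standard-library array.

```python
from array import array
from typing import Iterator, Sequence


class Int64ArrayConverter:
    """Hypothetical converter with B = array('q') and E = int. Not Beam code."""

    def produce_batch(self, elements: Sequence[int]) -> array:
        # Pack the elements into one contiguous batch.
        return array("q", elements)

    def explode_batch(self, batch: array) -> Iterator[int]:
        # Yield the elements back one at a time, in their original order.
        yield from batch


converter = Int64ArrayConverter()
batch = converter.produce_batch([10, 20, 30])
round_tripped = list(converter.explode_batch(batch))
```

Here round_tripped contains the same elements that were passed to produce_batch, in the same order.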

combine_batches(batches: Sequence[B]) → B[source]
get_length(batch: B) → int[source]
estimate_byte_size(batch)[source]
static register(batch_converter_constructor: Callable[[type, type], BatchConverter])[source]
static from_typehints(*, element_type, batch_type) → apache_beam.typehints.batch.BatchConverter[source]
batch_type
element_type
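The signatures of register and from_typehints suggest a registry pattern: constructors are collected, and from_typehints picks one that can handle the requested pair of types. The page does not describe the mechanism, so the sketch below is a guess at that pattern rather than a description of Beam's code. The names _REGISTRY, list_constructor, and tuple_constructor are hypothetical, and plain strings stand in for converter instances.

```python
from typing import Callable, List, Optional

# Hypothetical registry; these names are illustrative, not Beam's internals.
_REGISTRY: List[Callable[[type, type], Optional[str]]] = []


def register(constructor):
    """Add a constructor to the registry and return it unchanged."""
    _REGISTRY.append(constructor)
    return constructor


def from_typehints(*, element_type, batch_type):
    """Return the first converter that a registered constructor can build."""
    for constructor in _REGISTRY:
        converter = constructor(element_type, batch_type)
        if converter is not None:
            return converter
    raise TypeError(f"no converter for {batch_type!r} of {element_type!r}")


@register
def list_constructor(element_type, batch_type):
    # Accept only list batches; return None so other constructors can try.
    return "list-converter" if batch_type is list else None


@register
def tuple_constructor(element_type, batch_type):
    # Accept only tuple batches.
    return "tuple-converter" if batch_type is tuple else None


chosen = from_typehints(element_type=int, batch_type=tuple)
```

Returning None for unsupported types, rather than raising, lets the lookup fall through to the next registered constructor, which matches the Optional return type shown for NumpyBatchConverter.from_typehints.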
class apache_beam.typehints.batch.ListBatchConverter(batch_type, element_type)[source]

Bases: apache_beam.typehints.batch.BatchConverter

SAMPLE_FRACTION = 0.2
MAX_SAMPLES = 100
SAMPLED_BATCH_SIZE = 500.0
static from_typehints(element_type, batch_type)[source]
produce_batch(elements)[source]
explode_batch(batch)[source]
combine_batches(batches)[source]
get_length(batch)[source]
estimate_byte_size(batch)[source]
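The page lists the three sampling constants but not how they are used. One plausible reading, and it is only an assumption, is that estimate_byte_size measures a random sample of elements and extrapolates: a fixed fraction of small batches, capped at MAX_SAMPLES once the batch reaches SAMPLED_BATCH_SIZE (note that 100 / 0.2 = 500). The sketch below illustrates that reading with hypothetical helper functions and a caller-supplied size_of function; it is not Beam's implementation.

```python
import math
import random

SAMPLE_FRACTION = 0.2
MAX_SAMPLES = 100
SAMPLED_BATCH_SIZE = 500.0  # equals MAX_SAMPLES / SAMPLE_FRACTION


def sample_count(batch_length: int) -> int:
    """How many elements to measure for a batch of the given length."""
    if batch_length < SAMPLED_BATCH_SIZE:
        return math.ceil(batch_length * SAMPLE_FRACTION)
    return MAX_SAMPLES


def estimate_byte_size(batch, size_of=len) -> int:
    """Extrapolate the batch size from the mean size of a random sample."""
    if not batch:
        return 0
    n = sample_count(len(batch))
    mean = sum(size_of(e) for e in random.sample(batch, n)) / n
    return math.ceil(mean * len(batch))


small = sample_count(10)       # 20% of a small batch
large = sample_count(10_000)   # capped for a large batch
estimate = estimate_byte_size([b"abcd"] * 1000)
```

Sampling keeps the cost of the estimate bounded for large batches, at the price of some error when element sizes vary widely.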
class apache_beam.typehints.batch.NumpyBatchConverter(batch_type, element_type, dtype, element_shape=(), partition_dimension=0)[source]

Bases: apache_beam.typehints.batch.BatchConverter

static from_typehints(element_type, batch_type) → Optional[apache_beam.typehints.batch.NumpyBatchConverter][source]
produce_batch(elements)[source]
explode_batch(batch)[source]

Convert an instance of B to Generator[E].

combine_batches(batches)[source]
get_length(batch)[source]
estimate_byte_size(batch)[source]
class apache_beam.typehints.batch.NumpyTypeHint[source]

Bases: object

class NumpyTypeConstraint(dtype, shape=())[source]

Bases: apache_beam.typehints.typehints.TypeConstraint

type_check(batch)[source]