apache_beam.testing.synthetic_pipeline module

A set of utilities to write pipelines for performance tests.

This module offers a way to create pipelines using synthetic sources and steps. The exact shape of the pipeline and the behavior of sources and steps can be controlled through arguments. See the function ‘parse_args()’ for more details about the arguments.

The shape of the pipeline is primarily controlled through two arguments. The argument ‘steps’ defines a list of steps as a JSON string. The argument ‘barrier’ describes how these steps are separated from each other; it can be used to build a pipeline as a series of steps or as a tree of steps with a fan-in or a fan-out of size 2.
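
As a rough illustration, a value for the ‘steps’ argument could be assembled with the standard json module. The key names below are borrowed from the SyntheticStep constructor parameters documented further down and are an assumption; the keys actually accepted are defined by ‘parse_args()’.

```python
import json

# Hypothetical step definitions. The key names are copied from the
# SyntheticStep constructor parameters and are an assumption, not a
# confirmed schema for the 'steps' argument.
step_definitions = [
    {
        "per_element_delay_sec": 0,
        "per_bundle_delay_sec": 0,
        "output_records_per_input_record": 1,
        "output_filter_ratio": 0,
    },
    {
        "per_element_delay_sec": 0,
        "per_bundle_delay_sec": 0,
        "output_records_per_input_record": 2,
        "output_filter_ratio": 0.5,
    },
]

# Serialize the list into the JSON string form the argument expects.
steps_json = json.dumps(step_definitions)

# Round-trip to confirm the string is valid JSON describing two steps.
parsed = json.loads(steps_json)
```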

Other arguments describe what gets generated by synthetic sources that produce data for the pipeline.

apache_beam.testing.synthetic_pipeline.parse_byte_size(s)[source]
apache_beam.testing.synthetic_pipeline.div_round_up(a, b)[source]

Return ceil(a/b).
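
For integer inputs, ceiling division can be done without floating point. The function below is a minimal sketch of that idea under a hypothetical name; it is not necessarily how div_round_up() is implemented.

```python
def div_round_up_sketch(a, b):
    """Return ceil(a / b) for integers, using integer arithmetic only."""
    # Floor division of the negated numerator rounds toward negative
    # infinity; negating the result rounds the original quotient up.
    return -(-a // b)


print(div_round_up_sketch(7, 2))  # 4
print(div_round_up_sketch(8, 2))  # 4
```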

apache_beam.testing.synthetic_pipeline.rotate_key(element)[source]

Returns a new key-value pair of the same size but with a different key.
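
One way to get a different key of the same size is to rotate the key's bytes by one position. The sketch below assumes the element is a (key, value) tuple whose key is a bytes object; both the function name and the rotation scheme are illustrative assumptions rather than the confirmed behavior of rotate_key().

```python
def rotate_key_sketch(element):
    """Return (rotated_key, value), where the key keeps its length."""
    key, value = element
    # Move the last byte to the front. The length is unchanged, so the
    # resulting pair has the same size as the input pair.
    return key[-1:] + key[:-1], value


print(rotate_key_sketch((b"abc", b"payload")))  # (b'cab', b'payload')
```

A key made of identical bytes rotates to itself, so this scheme only yields a different key when the bytes vary.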

class apache_beam.testing.synthetic_pipeline.SyntheticStep(per_element_delay_sec=0, per_bundle_delay_sec=0, output_records_per_input_record=1, output_filter_ratio=0)[source]

Bases: apache_beam.transforms.core.DoFn

A DoFn whose behavior can be controlled through prespecified parameters.

start_bundle()[source]
finish_bundle()[source]
process(element)[source]
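
The effect of the constructor parameters on each element can be pictured with a plain generator that does not depend on Beam. This is one plausible reading of the parameter names, not the actual SyntheticStep code; in particular, treating output_filter_ratio as the probability of dropping an output record is an assumption, and per_bundle_delay_sec is left out because a single function call has no notion of bundles.

```python
import random
import time


def synthetic_step_sketch(element,
                          per_element_delay_sec=0,
                          output_records_per_input_record=1,
                          output_filter_ratio=0):
    """Yield copies of element, optionally delayed and filtered."""
    # Simulate per-element processing cost.
    if per_element_delay_sec:
        time.sleep(per_element_delay_sec)
    for _ in range(output_records_per_input_record):
        # random.random() lies in [0, 1), so a ratio of 0 keeps every
        # record and a ratio of 1 drops every record.
        if random.random() >= output_filter_ratio:
            yield element


outputs = list(
    synthetic_step_sketch(("key", "value"), output_records_per_input_record=3))
```
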
class apache_beam.testing.synthetic_pipeline.SyntheticSource(input_spec)[source]

Bases: apache_beam.io.iobase.BoundedSource

A custom source of a specified size.

Initializes a synthetic source.

Parameters: input_spec – Input specification of the source. See the corresponding option in function ‘parse_args()’ below for more details.
Raises: ValueError – if input parameters are invalid.
element_size
estimate_size()[source]
split(desired_bundle_size, start_position=0, stop_position=None)[source]
get_range_tracker(start_position, stop_position)[source]
read(range_tracker)[source]
default_output_coder()[source]
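
Splitting a bounded source amounts to cutting its position range into consecutive sub-ranges. The sketch below works on record indexes only and uses hypothetical names; the real split() takes a desired_bundle_size and returns Beam source bundles, so treat this as an illustration of the range arithmetic rather than of the method itself.

```python
def split_range_sketch(start, stop, records_per_bundle):
    """Split [start, stop) into consecutive (bundle_start, bundle_stop) pairs."""
    if records_per_bundle <= 0:
        raise ValueError("records_per_bundle must be positive")
    # Ceiling division gives the number of bundles needed to cover the range.
    num_bundles = -(-(stop - start) // records_per_bundle)
    bundles = []
    for i in range(num_bundles):
        bundle_start = start + i * records_per_bundle
        # The last bundle may be shorter than the others.
        bundle_stop = min(bundle_start + records_per_bundle, stop)
        bundles.append((bundle_start, bundle_stop))
    return bundles


print(split_range_sketch(0, 10, 4))  # [(0, 4), (4, 8), (8, 10)]
```
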
class apache_beam.testing.synthetic_pipeline.ShuffleBarrier(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

expand(pc)[source]
class apache_beam.testing.synthetic_pipeline.SideInputBarrier(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

expand(pc)[source]
apache_beam.testing.synthetic_pipeline.merge_using_gbk(name, pc1, pc2)[source]

Merges two given PCollections using a CoGroupByKey.

apache_beam.testing.synthetic_pipeline.merge_using_side_input(name, pc1, pc2)[source]

Merges two given PCollections using side inputs.

apache_beam.testing.synthetic_pipeline.expand_using_gbk(name, pc)[source]

Expands a given PCollection into two copies using GroupByKey.

apache_beam.testing.synthetic_pipeline.expand_using_second_output(name, pc)[source]

Expands a given PCollection into two copies using side outputs.

apache_beam.testing.synthetic_pipeline.parse_args(args)[source]

Parses a given set of arguments.

Parameters: args – set of arguments to be passed.
Returns: a tuple whose first item is the set of arguments defined and parsed within this method and whose second item is the set of unknown arguments.
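
The (known, unknown) return shape matches what argparse's parse_known_args() produces. The sketch below shows that pattern with the two arguments named in the module description; the flag spellings, defaults, and the sample values are assumptions made for illustration, not the module's actual option definitions.

```python
import argparse


def parse_args_sketch(args):
    """Return (known_args, unknown_args) for a list of argument strings."""
    parser = argparse.ArgumentParser()
    # Hypothetical definitions of the 'steps' and 'barrier' arguments.
    parser.add_argument("--steps", default="[]")
    parser.add_argument("--barrier", default="shuffle")
    # parse_known_args() leaves unrecognized arguments untouched so that
    # they can be forwarded elsewhere, for example to pipeline options.
    return parser.parse_known_args(args)


known, unknown = parse_args_sketch(
    ["--barrier", "shuffle", "--runner", "DirectRunner"])
print(known.barrier)  # shuffle
print(unknown)        # ['--runner', 'DirectRunner']
```
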
apache_beam.testing.synthetic_pipeline.run(argv=None)[source]

Runs the workflow.