apache_beam.testing.synthetic_pipeline module¶
A set of utilities to write pipelines for performance tests.
This module offers a way to create pipelines using synthetic sources and steps. Exact shape of the pipeline and the behaviour of sources and steps can be controlled through arguments. Please see function ‘parse_args()’ for more details about the arguments.
Shape of the pipeline is primariy controlled through two arguments. Argument ‘steps’ can be used to define a list of steps as a JSON string. Argument ‘barrier’ describes how these steps are separated from each other. Argument ‘barrier’ can be use to build a pipeline as a a series of steps or a tree of steps with a fanin or a fanout of size 2.
Other arguments describe what gets generated by synthetic sources that produce data for the pipeline.
-
apache_beam.testing.synthetic_pipeline.
rotate_key
(element)[source]¶ Returns a new key-value pair of the same size but with a different key.
-
class
apache_beam.testing.synthetic_pipeline.
SyntheticStep
(per_element_delay_sec=0, per_bundle_delay_sec=0, output_records_per_input_record=1, output_filter_ratio=0)[source]¶ Bases:
apache_beam.transforms.core.DoFn
A DoFn of which behavior can be controlled through prespecified parameters.
-
class
apache_beam.testing.synthetic_pipeline.
SyntheticSource
(input_spec)[source]¶ Bases:
apache_beam.io.iobase.BoundedSource
A custom source of a specified size.
Initiates a synthetic source.
Parameters: input_spec – Input specification of the source. See corresponding option in function ‘parse_args()’ below for more details. Raises: ValueError
– if input parameters are invalid.-
element_size
¶
-
-
apache_beam.testing.synthetic_pipeline.
merge_using_gbk
(name, pc1, pc2)[source]¶ Merges two given PCollections using a CoGroupByKey.
-
apache_beam.testing.synthetic_pipeline.
merge_using_side_input
(name, pc1, pc2)[source]¶ Merges two given PCollections using side inputs.
-
apache_beam.testing.synthetic_pipeline.
expand_using_gbk
(name, pc)[source]¶ Expands a given PCollection into two copies using GroupByKey.
-
apache_beam.testing.synthetic_pipeline.
expand_using_second_output
(name, pc)[source]¶ Expands a given PCollection into two copies using side outputs.