apache_beam.testing package

Submodules

apache_beam.testing.pipeline_verifiers module

End-to-end test result verifiers

A set of verifiers that are used in end-to-end tests to verify state/output of test pipeline job. Customized verifier should extend hamcrest.core.base_matcher.BaseMatcher and override _matches.

class apache_beam.testing.pipeline_verifiers.PipelineStateMatcher(expected_state='DONE')[source]

Bases: hamcrest.core.base_matcher.BaseMatcher

Matcher that verify pipeline job terminated in expected state

Matcher compares the actual pipeline terminate state with expected. By default, PipelineState.DONE is used as expected state.

describe_mismatch(pipeline_result, mismatch_description)[source]
describe_to(description)[source]
class apache_beam.testing.pipeline_verifiers.FileChecksumMatcher(file_path, expected_checksum, sleep_secs=None)[source]

Bases: hamcrest.core.base_matcher.BaseMatcher

Matcher that verifies file(s) content by comparing file checksum.

Use apache_beam.io.filebasedsink to fetch file(s) from given path. File checksum is a hash string computed from content of file(s).

describe_mismatch(pipeline_result, mismatch_description)[source]
describe_to(description)[source]
apache_beam.testing.pipeline_verifiers.retry_on_io_error_and_server_error(exception)[source]

Filter allowing retries on file I/O errors and service error.

apache_beam.testing.test_pipeline module

Test Pipeline, a wrapper of Pipeline for test purpose

class apache_beam.testing.test_pipeline.TestPipeline(runner=None, options=None, argv=None, is_integration_test=False, blocking=True)[source]

Bases: apache_beam.pipeline.Pipeline

TestPipeline class is used inside of Beam tests that can be configured to run against pipeline runner.

It has a functionality to parse arguments from command line and build pipeline options for tests who runs against a pipeline runner and utilizes resources of the pipeline runner. Those test functions are recommended to be tagged by @attr(“ValidatesRunner”) annotation.

In order to configure the test with customized pipeline options from command line, system argument ‘test-pipeline-options’ can be used to obtains a list of pipeline options. If no options specified, default value will be used.

For example, use following command line to execute all ValidatesRunner tests:

python setup.py nosetests -a ValidatesRunner         --test-pipeline-options="--runner=DirectRunner                                  --job_name=myJobName                                  --num_workers=1"

For example, use assert_that for test validation:

pipeline = TestPipeline()
pcoll = ...
assert_that(pcoll, equal_to(...))
pipeline.run()
get_full_options_as_args(**extra_opts)[source]

Get full pipeline options as an argument list.

Append extra pipeline options to existing option list if provided. Test verifier (if contains in extra options) should be pickled before appending, and will be unpickled later in the TestRunner.

get_option(opt_name)[source]

Get a pipeline option value by name

Parameters:opt_name – The name of the pipeline option.
Returns:None if option is not found in existing option list which is generated by parsing value of argument test-pipeline-options.
run()[source]

apache_beam.testing.test_stream module

Provides TestStream for verifying streaming runner semantics.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.testing.test_stream.Event[source]

Bases: object

Test stream event to be emitted during execution of a TestStream.

class apache_beam.testing.test_stream.ElementEvent(timestamped_values)[source]

Bases: apache_beam.testing.test_stream.Event

Element-producing test stream event.

class apache_beam.testing.test_stream.WatermarkEvent(new_watermark)[source]

Bases: apache_beam.testing.test_stream.Event

Watermark-advancing test stream event.

class apache_beam.testing.test_stream.ProcessingTimeEvent(advance_by)[source]

Bases: apache_beam.testing.test_stream.Event

Processing time-advancing test stream event.

class apache_beam.testing.test_stream.TestStream(coder=<class 'apache_beam.coders.coders.FastPrimitivesCoder'>)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Test stream that generates events on an unbounded PCollection of elements.

Each event emits elements, advances the watermark or advances the processing time. After all of the specified elements are emitted, ceases to produce output.

add_elements(elements)[source]

Add elements to the TestStream.

Elements added to the TestStream will be produced during pipeline execution. These elements can be TimestampedValue, WindowedValue or raw unwrapped elements that are serializable using the TestStream’s specified Coder. When a TimestampedValue or a WindowedValue element is used, the timestamp of the TimestampedValue or WindowedValue will be the timestamp of the produced element; otherwise, the current watermark timestamp will be used for that element. The windows of a given WindowedValue are ignored by the TestStream.

advance_processing_time(advance_by)[source]

Advance the current processing time by a given duration in seconds.

The duration must be a positive second duration and should be given as an int, float or utils.timestamp.Duration object.

advance_watermark_to(new_watermark)[source]

Advance the watermark to a given Unix timestamp.

The Unix timestamp value used must be later than the previous watermark value and should be given as an int, float or utils.timestamp.Timestamp object.

advance_watermark_to_infinity()[source]

Advance the watermark to the end of time.

expand(pbegin)[source]
get_windowing(unused_inputs)[source]

apache_beam.testing.test_utils module

Utility methods for testing

For internal use only; no backwards-compatibility guarantees.

apache_beam.testing.test_utils.compute_hash(content, hashing_alg='sha1')[source]

Compute a hash value from a list of string.

apache_beam.testing.test_utils.patch_retry(testcase, module)[source]

A function to patch retry module to use mock clock and logger.

Clock and logger that defined in retry decorator will be replaced in test in order to skip sleep phase when retry happens.

Parameters:
  • testcase – An instance of unittest.TestCase that calls this function to patch retry module.
  • module – The module that uses retry and need to be replaced with mock clock and logger in test.

apache_beam.testing.util module

Utilities for testing Beam pipelines.

apache_beam.testing.util.assert_that(actual, matcher, label='assert_that')[source]

A PTransform that checks a PCollection has an expected value.

Note that assert_that should be used only for testing pipelines since the check relies on materializing the entire PCollection being checked.

Parameters:
  • actual – A PCollection.
  • matcher – A matcher function taking as argument the actual value of a materialized PCollection. The matcher validates this actual value against expectations and raises BeamAssertException if they are not met.
  • label – Optional string label. This is needed in case several assert_that transforms are introduced in the same pipeline.
Returns:

Ignored.

apache_beam.testing.util.equal_to(expected)[source]
apache_beam.testing.util.is_empty()[source]
apache_beam.testing.util.open_shards(*args, **kwargs)[source]

Returns a composite file of all shards matching the given glob pattern.

Module contents