apache_beam.runners.direct package¶
Submodules¶
apache_beam.runners.direct.bundle_factory module¶
A factory that creates UncommittedBundles.
-
class
apache_beam.runners.direct.bundle_factory.
BundleFactory
(stacked)[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
BundleFactory creates output bundles to be used by transform evaluators.
Parameters: stacked – whether or not to stack the WindowedValues within the bundle in case consecutive ones share the same timestamp and windows. DirectRunnerOptions.direct_runner_use_stacked_bundle controls this option.
apache_beam.runners.direct.clock module¶
Clock implementations for real time processing and testing.
apache_beam.runners.direct.consumer_tracking_pipeline_visitor module¶
ConsumerTrackingPipelineVisitor, a PipelineVisitor object.
-
class
apache_beam.runners.direct.consumer_tracking_pipeline_visitor.
ConsumerTrackingPipelineVisitor
[source]¶ Bases:
apache_beam.pipeline.PipelineVisitor
For internal use only; no backwards-compatibility guarantees.
Visitor for extracting value-consumer relations from the graph.
Tracks the AppliedPTransforms that consume each PValue in the Pipeline. This is used to schedule consuming PTransforms to consume input after the upstream transform has produced and committed output.
apache_beam.runners.direct.direct_metrics module¶
DirectRunner implementation of MetricResults. It is in charge not only of responding to queries of current metrics, but also of keeping the common state consistent.
apache_beam.runners.direct.direct_runner module¶
DirectRunner, executing on the local machine.
The DirectRunner is a runner implementation that executes the entire graph of transformations belonging to a pipeline on the local machine.
-
class
apache_beam.runners.direct.direct_runner.
DirectRunner
[source]¶ Bases:
apache_beam.runners.runner.PipelineRunner
Executes a single pipeline on the local machine.
-
cache
¶
-
apache_beam.runners.direct.evaluation_context module¶
EvaluationContext tracks global state, triggers and watermarks.
-
class
apache_beam.runners.direct.evaluation_context.
EvaluationContext
(pipeline_options, bundle_factory, root_transforms, value_to_consumers, step_names, views)[source]¶ Bases:
object
Evaluation context with the global state information of the pipeline.
The evaluation context for a specific pipeline being executed by the DirectRunner. Contains state shared within the execution across all transforms.
EvaluationContext contains shared state for an execution of the DirectRunner that can be used while evaluating a PTransform. This consists of views into underlying state and watermark implementations, access to read and write side inputs, and constructing counter sets and execution contexts. This includes executing callbacks asynchronously when state changes to the appropriate point (e.g. when a side input is requested and known to be empty).
EvaluationContext also handles results by committing finalizing bundles based on the current global state and updating the global state appropriately. This includes updating the per-(step,key) state, updating global watermarks, and executing any callbacks that can be executed.
-
create_bundle
(output_pcollection)[source]¶ Create an uncommitted bundle for the specified PCollection.
-
create_empty_committed_bundle
(output_pcollection)[source]¶ Create empty bundle useful for triggering evaluation.
-
handle_result
(completed_bundle, completed_timers, result)[source]¶ Handle the provided result produced after evaluating the input bundle.
Handle the provided TransformResult, produced after evaluating the provided committed bundle (potentially None, if the result of a root PTransform).
The result is the output of running the transform contained in the TransformResult on the contents of the provided bundle.
Parameters: - completed_bundle – the bundle that was processed to produce the result.
- completed_timers – the timers that were delivered to produce the completed_bundle.
- result – the TransformResult of evaluating the input bundle
Returns: the committed bundles contained within the handled result.
-
has_cache
¶
-
apache_beam.runners.direct.executor module¶
An executor that schedules and executes applied ptransforms.
-
class
apache_beam.runners.direct.executor.
Executor
(*args, **kwargs)[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.runners.direct.executor.
TransformExecutor
(transform_evaluator_registry, evaluation_context, input_bundle, applied_transform, completion_callback, transform_evaluation_state)[source]¶ Bases:
apache_beam.runners.direct.executor.CallableTask
For internal use only; no backwards-compatibility guarantees.
TransformExecutor will evaluate a bundle using an applied ptransform.
A CallableTask responsible for constructing a TransformEvaluator and evaluating it on some bundle of input, and registering the result using the completion callback.
apache_beam.runners.direct.helper_transforms module¶
-
class
apache_beam.runners.direct.helper_transforms.
FinishCombine
(combine_fn)[source]¶ Bases:
apache_beam.transforms.core.DoFn
Merges partially combined results.
-
class
apache_beam.runners.direct.helper_transforms.
LiftedCombinePerKey
(combine_fn, args, kwargs)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
An implementation of CombinePerKey that does mapper-side pre-combining.
-
class
apache_beam.runners.direct.helper_transforms.
PartialGroupByKeyCombiningValues
(combine_fn)[source]¶ Bases:
apache_beam.transforms.core.DoFn
Aggregates values into a per-key-window cache.
As bundles are in-memory-sized, we don’t bother flushing until the very end.
apache_beam.runners.direct.transform_evaluator module¶
An evaluator of a specific application of a transform.
-
class
apache_beam.runners.direct.transform_evaluator.
TransformEvaluatorRegistry
(evaluation_context)[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
Creates instances of TransformEvaluator for the application of a transform.
-
for_application
(applied_ptransform, input_committed_bundle, side_inputs, scoped_metrics_container)[source]¶ Returns a TransformEvaluator suitable for processing given inputs.
-
should_execute_serially
(applied_ptransform)[source]¶ Returns True if this applied_ptransform should run one bundle at a time.
Some TransformEvaluators use a global state object to keep track of their global execution state. For example evaluator for _GroupByKeyOnly uses this state as an in memory dictionary to buffer keys.
Serially executed evaluators will act as syncing point in the graph and execution will not move forward until they receive all of their inputs. Once they receive all of their input, they will release the combined output. Their output may consist of multiple bundles as they may divide their output into pieces before releasing.
Parameters: applied_ptransform – Transform to be used for execution. Returns: True if executor should execute applied_ptransform serially.
-
apache_beam.runners.direct.transform_result module¶
The result of evaluating an AppliedPTransform with a TransformEvaluator.
-
class
apache_beam.runners.direct.transform_result.
TransformResult
(applied_ptransform, uncommitted_output_bundles, state, timer_update, counters, watermark_hold, undeclared_tag_values=None)[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
The result of evaluating an AppliedPTransform with a TransformEvaluator.
apache_beam.runners.direct.watermark_manager module¶
Manages watermarks of PCollections and AppliedPTransforms.
-
class
apache_beam.runners.direct.watermark_manager.
WatermarkManager
(clock, root_transforms, value_to_consumers)[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
Tracks and updates watermarks for all AppliedPTransforms.
-
WATERMARK_NEG_INF
= Timestamp(-9223372036854.775808)¶
-
WATERMARK_POS_INF
= Timestamp(9223372036854.775807)¶
-
get_watermarks
(applied_ptransform)[source]¶ Gets the input and output watermarks for an AppliedPTransform.
If the applied_ptransform has not processed any elements, return a watermark with minimum value.
Parameters: applied_ptransform – AppliedPTransform to get the watermarks for. Returns: A snapshot (TransformWatermarks) of the input watermark and output watermark for the provided transform.
-
Module contents¶
Inprocess runner executes pipelines locally in a single process.
Anything in this package not imported here is an internal implementation detail with no backwards-compatibility guarantees.