apache_beam.runners.dask.dask_runner module
DaskRunner, executing remote jobs on Dask.distributed.
The DaskRunner is a runner implementation that executes a graph of transformations across processes and workers via Dask distributed’s scheduler.
class apache_beam.runners.dask.dask_runner.DaskOptions(flags=None, **kwargs)
Bases: apache_beam.options.pipeline_options.PipelineOptions

Initialize an options class.

The initializer traverses all subclasses, adds all of their argparse arguments, and then parses the command line given by flags or, by default, the one obtained from sys.argv.

Subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
- flags – An iterable of command-line arguments to be used. If not specified, sys.argv is used as input for parsing arguments.
- **kwargs – Overrides for arguments passed in flags. Pass option names rather than flag names. An option name is the dest given to parser.add_argument() for the flag. An override keyed by a flag name whose dest differs from it, such as {no_use_public_ips: True}, is discarded. Pass the dest of the flag instead (the dest of no_use_public_ips is use_public_ips).
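The difference between a flag name and its dest comes from argparse itself, so it can be sketched with the standard library alone. The example below is only an illustration under stated assumptions: the parser and the helper parse_with_overrides are hypothetical stand-ins that mimic the behavior described above, not Beam's implementation.

```python
import argparse

# Hypothetical parser mirroring the flag named in the parameter description.
# The flag is --no_use_public_ips, but its value is stored under the dest
# "use_public_ips".
parser = argparse.ArgumentParser()
parser.add_argument(
    "--no_use_public_ips",
    dest="use_public_ips",
    action="store_false",
    default=None,
)


def parse_with_overrides(flags, **overrides):
    """Illustration only: parse flags, then apply overrides keyed by dest."""
    options = vars(parser.parse_args(flags))
    for name, value in overrides.items():
        if name in options:
            # The name matches a known dest, so the override applies.
            options[name] = value
        # Any other name (for example a flag name) is silently dropped.
    return options


# Override keyed by the dest: takes effect.
by_dest = parse_with_overrides([], use_public_ips=False)

# Override keyed by the flag name: discarded, the option keeps its default.
by_flag = parse_with_overrides([], no_use_public_ips=True)

# The same option set through the command-line flag.
from_flag = parse_with_overrides(["--no_use_public_ips"])

print(by_dest, by_flag, from_flag)
```

In this sketch only the override keyed by use_public_ips changes the parsed value. The one keyed by no_use_public_ips leaves the option at its default of None.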
 
class apache_beam.runners.dask.dask_runner.DaskRunnerResult(client: distributed.client.Client, futures: Sequence[distributed.client.Future])
Bases: apache_beam.runners.runner.PipelineResult
distributed = <module 'dask.distributed'>
 
class apache_beam.runners.dask.dask_runner.DaskRunner
Bases: apache_beam.runners.direct.direct_runner.BundleBasedDirectRunner

Executes a pipeline on a Dask distributed client.