apache_beam.runners.dask.dask_runner module¶
DaskRunner, executing remote jobs on Dask.distributed.
The DaskRunner is a runner implementation that executes a graph of transformations across processes and workers via Dask distributed’s scheduler.
-
class
apache_beam.runners.dask.dask_runner.
DaskOptions
(flags=None, **kwargs)[source]¶ Bases:
apache_beam.options.pipeline_options.PipelineOptions
Initialize an options class.
The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.
The subclasses of PipelineOptions do not need to redefine __init__.
Parameters: - flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
- **kwargs – Add overrides for arguments passed in flags. For overrides of arguments, please pass the option names instead of flag names. Option names: These are defined as dest in the parser.add_argument() for each flag. Passing flags like {no_use_public_ips: True}, for which the dest is defined to a different flag name in the parser, would be discarded. Instead, pass the dest of the flag (dest of no_use_public_ips is use_public_ips).
-
class
apache_beam.runners.dask.dask_runner.
DaskRunnerResult
(client: distributed.client.Client, futures: Sequence[distributed.client.Future])[source]¶ Bases:
apache_beam.runners.runner.PipelineResult
-
distributed
= <module 'dask.distributed' from '/home/runner/work/beam/beam/beam/sdks/python/target/.tox/docs/lib/python3.8/site-packages/dask/distributed.py'>¶
-
-
class
apache_beam.runners.dask.dask_runner.
DaskRunner
[source]¶ Bases:
apache_beam.runners.direct.direct_runner.BundleBasedDirectRunner
Executes a pipeline on a Dask distributed client.