apache_beam.runners.dask.dask_runner module

DaskRunner, executing remote jobs on Dask.distributed.

The DaskRunner is a runner implementation that executes a graph of transformations across processes and workers via Dask distributed’s scheduler.

class apache_beam.runners.dask.dask_runner.DaskOptions(flags: Sequence[str] | None = None, **kwargs)

Bases: PipelineOptions

Initialize an options class.

The initializer traverses all subclasses, adds their argparse arguments, and then parses the command line given by flags or, by default, the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.

  • **kwargs – Overrides for arguments passed in flags. Pass option names, not flag names. An option name is the dest given to parser.add_argument() for that flag. A keyword such as {no_use_public_ips: True}, where the flag's dest differs from the flag name, is discarded. Pass the dest of the flag instead (the dest of no_use_public_ips is use_public_ips).
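The difference between a flag name and its dest can be sketched with plain argparse. This is an illustration only, not Beam's actual parser: the parser below is hypothetical and simply mirrors the no_use_public_ips / use_public_ips pairing named above.

```python
import argparse

# Hypothetical parser: the flag is spelled --no_use_public_ips, but argparse
# stores its value under the dest "use_public_ips".
parser = argparse.ArgumentParser()
parser.add_argument(
    "--no_use_public_ips",
    dest="use_public_ips",
    action="store_false",
    default=None,
)

args = parser.parse_args(["--no_use_public_ips"])

# The parsed namespace is keyed by dest, never by the flag spelling.
option_names = vars(args)
has_dest = "use_public_ips" in option_names
has_flag_name = "no_use_public_ips" in option_names
```

Because the parsed options are keyed by dest, a keyword override has to use use_public_ips to match anything. A keyword spelled after the flag has no matching option.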

class apache_beam.runners.dask.dask_runner.DaskRunnerResult(client: distributed.client.Client, futures: Sequence[distributed.client.Future])

Bases: PipelineResult

distributed = <module 'dask.distributed'>

client: Client

futures: Sequence[Future]

wait_until_finish(duration=None) → str

cancel() → str

metrics()

class apache_beam.runners.dask.dask_runner.DaskRunner

Bases: BundleBasedDirectRunner

Executes a pipeline on a Dask distributed client.

static to_dask_bag_visitor() → PipelineVisitor

static is_fnapi_compatible()

run_pipeline(pipeline, options)
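A minimal sketch of handing a pipeline to this runner, assuming apache_beam and dask distributed are both installed. The helper name run_on_dask and the toy transform are hypothetical. Because no Dask options are passed, the example assumes the runner falls back to a local Dask client.

```python
def run_on_dask(values):
    """Double each value in a small pipeline executed by the DaskRunner."""
    # Imported lazily so that defining this helper does not itself require
    # apache_beam or dask to be installed.
    import apache_beam as beam
    from apache_beam.runners.dask.dask_runner import DaskRunner

    # Leaving the context manager runs the pipeline and waits for it to finish.
    with beam.Pipeline(runner=DaskRunner()) as pipeline:
        _ = (
            pipeline
            | "Create" >> beam.Create(values)
            | "Double" >> beam.Map(lambda x: x * 2)
        )
```

Calling run_on_dask([1, 2, 3]) would build the graph and submit it through the runner. To target an existing cluster, connection settings would be supplied through DaskOptions rather than hard-coded here.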