apache_beam.yaml.yaml_provider module¶
This module defines Providers usable from YAML. A Provider is a specification for where to find and how to invoke services that vend implementations of various PTransforms.
- 
class apache_beam.yaml.yaml_provider.Provider[source]¶
- Bases: - object- Maps transform type names and args to concrete PTransform instances. - 
provided_transforms() → Iterable[str][source]¶
- Returns a list of transform type names this provider can handle. 
 - 
requires_inputs(typ: str, args: Mapping[str, Any]) → bool[source]¶
- Returns whether this transform requires inputs. - Specifically, if this returns True and inputs are not provided, then an error will be thrown. - This check is best-effort and exists primarily to produce better and earlier error messages. 
 - 
create_transform(typ: str, args: Mapping[str, Any], yaml_create_transform: Callable[[Mapping[str, Any], Iterable[apache_beam.pvalue.PCollection]], apache_beam.transforms.ptransform.PTransform]) → apache_beam.transforms.ptransform.PTransform[source]¶
- Creates a PTransform instance for the given transform type and arguments. 
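- The three methods above make up the provider contract. The following is a minimal, Beam-free sketch of that contract and not the library's implementation: `ToyProvider` is a hypothetical stand-in for a `Provider` subclass, its factories return plain strings instead of `PTransform` objects, and its constructor arguments are modeled loosely on the `InlineProvider` signature.

```python
from typing import Any, Callable, Iterable, Mapping


class ToyProvider:
    """Hypothetical stand-in mirroring the three Provider methods."""

    def __init__(self,
                 factories: Mapping[str, Callable[..., Any]],
                 no_input_transforms: Iterable[str] = ()):
        self._factories = dict(factories)
        self._no_input = set(no_input_transforms)

    def provided_transforms(self) -> Iterable[str]:
        # The transform type names this provider can handle.
        return self._factories.keys()

    def requires_inputs(self, typ: str, args: Mapping[str, Any]) -> bool:
        # Best-effort check, used only to report errors earlier.
        return typ not in self._no_input

    def create_transform(self,
                         typ: str,
                         args: Mapping[str, Any],
                         yaml_create_transform: Any = None) -> Any:
        # Look up the factory for this type name and call it with the args.
        return self._factories[typ](**args)


# String-returning lambdas stand in for real transform constructors.
provider = ToyProvider(
    {
        "Create": lambda elements: f"Create({elements})",
        "Upper": lambda: "Upper()",
    },
    no_input_transforms=["Create"])
```

In this sketch, `provider.requires_inputs("Upper", {})` is true while the same call for `"Create"` is false, which is the kind of distinction that lets a caller reject a pipeline with missing inputs before anything is constructed.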
 
- 
- 
class apache_beam.yaml.yaml_provider.ExternalProvider(urns, service)[source]¶
- Bases: - apache_beam.yaml.yaml_provider.Provider- A Provider implemented via the cross language transform service. 
- 
apache_beam.yaml.yaml_provider.maven_jar(urns, *, artifact_id, group_id, version, repository='https://repo.maven.apache.org/maven2', classifier=None, appendix=None)[source]¶
- 
apache_beam.yaml.yaml_provider.beam_jar(urns, *, gradle_target, appendix=None, version='2.54.0', artifact_id=None)[source]¶
- 
class apache_beam.yaml.yaml_provider.InlineProvider(transform_factories, no_input_transforms=())[source]¶
- 
class apache_beam.yaml.yaml_provider.MetaInlineProvider(transform_factories, no_input_transforms=())[source]¶
- 
class apache_beam.yaml.yaml_provider.SqlBackedProvider(transforms: Mapping[str, Callable[[...], apache_beam.transforms.ptransform.PTransform]], sql_provider: Optional[apache_beam.yaml.yaml_provider.Provider] = None)[source]¶
- 
class apache_beam.yaml.yaml_provider.YamlProviders[source]¶
- Bases: - object- 
static create(elements: Iterable[Any], reshuffle: Optional[bool] = True)[source]¶
- Creates a collection containing a specified set of elements. - YAML/JSON-style mappings will be interpreted as Beam rows. For example:
      type: Create
      elements:
        - {first: 0, second: {str: "foo", values: [1, 2, 3]}}
  will result in a schema of the form (int, Row(string, List[int])). - Parameters: - elements – The set of elements that should belong to the PCollection. YAML/JSON-style mappings will be interpreted as Beam rows.
- reshuffle – (optional) Whether to introduce a reshuffle (to possibly redistribute the work) if there is more than one element in the collection. Defaults to True.
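- To make the mapping-to-row interpretation concrete, here is a rough sketch that derives a schema-like description from a nested mapping. It is not Beam's schema inference: `describe_type` is a hypothetical helper, the type names are chosen to match the example above, and lists are assumed to be homogeneous.

```python
from typing import Any


def describe_type(value: Any) -> str:
    """Return a readable type description for a YAML/JSON-style value."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        # Mappings become rows; fields follow the mapping's key order.
        fields = ", ".join(describe_type(v) for v in value.values())
        return "Row(" + fields + ")"
    if isinstance(value, list):
        # Assumption: lists are homogeneous, so the first element decides.
        return "List[" + describe_type(value[0]) + "]" if value else "List[Any]"
    return "Any"


element = {"first": 0, "second": {"str": "foo", "values": [1, 2, 3]}}
description = describe_type(element)
print(description)  # Row(int, Row(string, List[int]))
```

The outer `Row(...)` here plays the role of the outer parentheses in the schema form quoted above.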
 
 - 
static fully_qualified_named_transform(constructor: str, args: Optional[Iterable[Any]] = (), kwargs: Optional[Mapping[str, Any]] = {})[source]¶
- A Python PTransform identified by fully qualified name. - This allows one to import, construct, and apply any Beam Python transform. This can be useful for using transforms that have not yet been exposed via a YAML interface. Note, however, that conversion may be required if this transform does not accept or produce Beam Rows. - For example:
      type: PyTransform
      config:
        constructor: apache_beam.pkg.mod.SomeClass
        args: [1, 'foo']
        kwargs:
          baz: 3
  can be used to access the transform apache_beam.pkg.mod.SomeClass(1, 'foo', baz=3). - Parameters: - constructor – Fully qualified name of a callable used to construct the transform. Often this is a class such as apache_beam.pkg.mod.SomeClass, but it can also be a function or any other callable that returns a PTransform.
- args – A list of parameters to pass to the callable as positional arguments.
- kwargs – A mapping of parameter names to values to pass to the callable as keyword arguments.
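- The general idea of resolving a fully qualified name and calling it with positional and keyword arguments can be sketched with the standard library alone. This is an illustration under stated assumptions, not Beam's implementation: `construct_by_name` is a hypothetical helper, it only handles names of the form `package.module.attribute`, and `fractions.Fraction` stands in for a transform class.

```python
import importlib
from typing import Any, Iterable, Mapping, Optional


def construct_by_name(constructor: str,
                      args: Optional[Iterable[Any]] = (),
                      kwargs: Optional[Mapping[str, Any]] = None) -> Any:
    """Import the callable named by `constructor` and call it."""
    # Split "pkg.mod.Name" into the module path and the attribute name.
    module_name, _, attr = constructor.rpartition(".")
    if not module_name:
        raise ValueError(f"Not a fully qualified name: {constructor!r}")
    module = importlib.import_module(module_name)
    factory = getattr(module, attr)
    # Treat a missing args or kwargs value as empty.
    return factory(*(args or ()), **(kwargs or {}))


# Equivalent to calling fractions.Fraction(3, denominator=4).
value = construct_by_name(
    "fractions.Fraction", args=[3], kwargs={"denominator": 4})
```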
 
 - 
class Flatten[source]¶
- Bases: - apache_beam.transforms.ptransform.PTransform- Flattens multiple PCollections into a single PCollection. - The elements of the resulting PCollection will be the (disjoint) union of all the elements of all the inputs. - Note that in YAML, transforms can always take a list of inputs, which will be implicitly flattened. 
 - 
class WindowInto(windowing)[source]¶
- Bases: - apache_beam.transforms.ptransform.PTransform- A window transform assigning windows to each element of a PCollection. - The assigned windows will affect all downstream aggregating operations, which will aggregate by window as well as by key. - See [the Beam documentation on windowing](https://beam.apache.org/documentation/programming-guide/#windowing) for more details. - Sizes, offsets, periods and gaps (where applicable) must be defined using a time unit suffix ‘ms’, ‘s’, ‘m’, ‘h’ or ‘d’ for milliseconds, seconds, minutes, hours or days, respectively. If a time unit is not specified, it will default to ‘s’. - For example:
      windowing:
        type: fixed
        size: 30s
  Note that any YAML transform can have a [windowing parameter](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/README.md#windowing), which is applied to its inputs (if any) or outputs (if there are no inputs). Explicit WindowInto operations are therefore not typically needed. - Parameters: - windowing – the type and parameters of the windowing to perform 
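- The time unit suffix rule described above can be illustrated with a small parser. This is a sketch of the stated rule only, not the code Beam uses: `parse_duration_seconds` and `_UNIT_SECONDS` are hypothetical names, and the accepted number format (non-negative integers or decimals) is an assumption.

```python
import re

# Seconds per unit, for the suffixes listed in the description above.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration_seconds(value) -> float:
    """Convert a value such as '30s', '500ms', '2h' or 45 to seconds."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*(ms|s|m|h|d)?\s*", str(value))
    if match is None:
        raise ValueError(f"Invalid duration: {value!r}")
    number, unit = match.groups()
    # A missing unit defaults to seconds.
    return float(number) * _UNIT_SECONDS[unit or "s"]


fixed_window_size = parse_duration_seconds("30s")
```

With this helper, `"2m"` and `"120"` both resolve to 120 seconds, which mirrors the default-to-seconds behavior described for unsuffixed values.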
 - 
static log_for_testing(level: Optional[str] = 'INFO', prefix: Optional[str] = '')[source]¶
- Logs each element of its input PCollection. - The output of this transform is a copy of its input for ease of use in chain-style pipelines. - Parameters: - level – one of ERROR, INFO, or DEBUG, mapped to a corresponding language-specific logging level
- prefix – an optional identifier that will get prepended to the element being logged
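- The pass-through behavior and the level mapping can be sketched over a plain Python list. This is an illustration and not the transform itself: `log_and_pass_through`, `_LEVELS` and the logger name are hypothetical, and the mapping onto Python's `logging` levels is an assumption about what "language-specific logging level" means for Python.

```python
import logging
from typing import Any, Iterable, List

# Assumed mapping from the documented level names to Python logging levels.
_LEVELS = {
    "ERROR": logging.ERROR,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}


def log_and_pass_through(elements: Iterable[Any],
                         level: str = "INFO",
                         prefix: str = "") -> List[Any]:
    """Log each element with an optional prefix and return the elements."""
    if level not in _LEVELS:
        raise ValueError(f"level must be one of {sorted(_LEVELS)}")
    logger = logging.getLogger("log_for_testing_sketch")
    output = []
    for element in elements:
        # The prefix is prepended to the element being logged.
        logger.log(_LEVELS[level], "%s%s", prefix, element)
        # The element is passed along unchanged.
        output.append(element)
    return output


passed = log_and_pass_through([1, 2, 3], level="INFO", prefix="element: ")
```

Returning the elements unchanged is what makes a step like this easy to drop into the middle of a chain without altering what downstream steps see.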
 
 
- 
class apache_beam.yaml.yaml_provider.TranslatingProvider(transforms: Mapping[str, Callable[[...], apache_beam.transforms.ptransform.PTransform]], underlying_provider: apache_beam.yaml.yaml_provider.Provider)[source]¶
- 
apache_beam.yaml.yaml_provider.create_java_builtin_provider()[source]¶
- Exposes built-in transforms from Java as well as Python to maximize opportunities for fusion. - The resulting provider holds those transforms that require pre-processing of the configs. For Java transforms that can consume the user-provided configs directly (or only need a simple renaming of parameters), a direct or renaming provider is the simpler choice. 
- 
class apache_beam.yaml.yaml_provider.PypiExpansionService(packages, base_python='/home/runner/work/beam/beam/beam/sdks/python/target/.tox/py38-docs/bin/python')[source]¶
- Bases: - object- Expands transforms by fully qualified name in a virtual environment with the given dependencies. - 
VENV_CACHE= '/home/runner/.apache_beam/cache/venvs'¶
 
- 
- 
class apache_beam.yaml.yaml_provider.RenamingProvider(transforms, mappings, underlying_provider, defaults=None)[source]¶
- Bases: - apache_beam.yaml.yaml_provider.Provider- 
create_transform(typ: str, args: Mapping[str, Any], yaml_create_transform: Callable[[Mapping[str, Any], Iterable[apache_beam.pvalue.PCollection]], apache_beam.transforms.ptransform.PTransform]) → apache_beam.transforms.ptransform.PTransform[source]¶
- Creates a PTransform instance for the given transform type and arguments. 
 
-