apache_beam.transforms.managed module

Managed Transforms.

This module builds and instantiates turnkey transforms that can be managed by the underlying runner. The runner can therefore upgrade a transform to a newer or more efficient version without any action from the user, and may even replace it with an entirely different implementation. By default, however, the specified transform remains unchanged.

Available transforms

Please check the Managed IO configuration page: https://beam.apache.org/documentation/io/managed-io/

Using Managed Transforms

Managed turnkey transforms have a defined configuration and can be built using an inline dict like so:

results = p | beam.managed.Read(
                  beam.managed.ICEBERG,
                  config={"table": "foo",
                          "catalog_name": "bar",
                          "catalog_properties": {
                              "warehouse": "path/to/warehouse",
                              "catalog-impl": "org.apache.my.CatalogImpl"}})

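The inline-dict form can be wrapped in a small helper so the configuration is assembled and inspected before any pipeline is built. The following is a minimal sketch, not part of this module: the helper names iceberg_read_config and read_iceberg are hypothetical, and the values are the placeholder ones from the example above. The Beam import is deferred so the config helper runs without Beam installed. Applying the read is assumed to need the same environment as any managed transform, including access to an expansion service.

```python
from typing import Any


def iceberg_read_config(table: str, catalog_name: str,
                        catalog_properties: dict[str, str]) -> dict[str, Any]:
    """Assemble the inline config dict for a managed Iceberg read (hypothetical helper)."""
    return {
        "table": table,
        "catalog_name": catalog_name,
        # Copy so later changes to the caller's dict do not leak into the config.
        "catalog_properties": dict(catalog_properties),
    }


def read_iceberg(pipeline, config: dict[str, Any]):
    """Apply a managed Iceberg read to `pipeline` (hypothetical helper, needs apache_beam)."""
    # Imported lazily so the config helper above works without Beam installed.
    from apache_beam.transforms import managed
    return pipeline | managed.Read(managed.ICEBERG, config=config)


# Placeholder values taken from the example above.
config = iceberg_read_config(
    table="foo",
    catalog_name="bar",
    catalog_properties={
        "warehouse": "path/to/warehouse",
        "catalog-impl": "org.apache.my.CatalogImpl",
    },
)
```

Inside a pipeline this would be used as results = read_iceberg(p, config), which is equivalent to the inline call shown above.
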
A YAML configuration file can also be used to build a Managed transform. Say we have the following config.yaml file:

topic: "foo"
bootstrap_servers: "localhost:1234"
format: "AVRO"

Simply provide the location of the file like so:

input_rows = p | beam.Create(...)
input_rows | beam.managed.Write(
                  beam.managed.KAFKA,
                  config_url="path/to/config.yaml")
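
To try the file-based form end to end, the configuration file can be generated first and its path passed as config_url. The following is a rough sketch under stated assumptions: write_config_file and write_to_kafka are hypothetical helpers, the YAML text is the placeholder config.yaml from above, and only a local file path is shown. The Beam import is deferred so the file-writing part runs on its own.

```python
import os
import tempfile

# Same placeholder settings as the config.yaml shown above.
KAFKA_CONFIG_YAML = (
    'topic: "foo"\n'
    'bootstrap_servers: "localhost:1234"\n'
    'format: "AVRO"\n'
)


def write_config_file(directory: str, text: str = KAFKA_CONFIG_YAML) -> str:
    """Write the YAML text to <directory>/config.yaml and return the path (hypothetical helper)."""
    path = os.path.join(directory, "config.yaml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def write_to_kafka(input_rows, config_path: str):
    """Apply a managed Kafka write configured from a YAML file (hypothetical helper, needs apache_beam)."""
    # Imported lazily so the file helper above works without Beam installed.
    from apache_beam.transforms import managed
    return input_rows | managed.Write(managed.KAFKA, config_url=config_path)


# Write the file to a temporary directory and read it back to check its contents.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = write_config_file(tmp_dir)
    with open(config_path, encoding="utf-8") as handle:
        written = handle.read()
```

In a real pipeline the file must still exist when the transform is constructed, so a temporary directory like the one above is only suitable for a quick check.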

Note: inputs and outputs must be PCollections of apache_beam.pvalue.Row elements.

Runner specific features

Google Cloud Dataflow supports additional management features for managed transforms, including automatic upgrades to the latest supported version. For more details and examples, see the Dataflow managed I/O documentation: https://cloud.google.com/dataflow/docs/guides/managed-io

class apache_beam.transforms.managed.Read(source: str, config: dict[str, Any] | None = None, config_url: str | None = None, expansion_service=None)

Bases: PTransform

Read using Managed Transforms

expand(input)

default_label() -> str

class apache_beam.transforms.managed.Write(sink: str, config: dict[str, Any] | None = None, config_url: str | None = None, expansion_service=None)

Bases: PTransform

Write using Managed Transforms

expand(input)

default_label() -> str