apache_beam.transforms.external module

Defines Transform whose expansion is implemented elsewhere.

apache_beam.transforms.external.convert_to_typing_type(type_)[source]
apache_beam.transforms.external.iter_urns(coder, context=None)[source]
class apache_beam.transforms.external.PayloadBuilder[source]

Bases: object

Abstract base class for building payloads to pass to ExternalTransform.

build()[source]
Returns:

ExternalConfigurationPayload

payload()[source]

The serialized ExternalConfigurationPayload

Returns:

bytes

class apache_beam.transforms.external.SchemaBasedPayloadBuilder[source]

Bases: PayloadBuilder

Base class for building payloads based on a schema that provides type information for each configuration value to encode.

build()[source]
class apache_beam.transforms.external.ImplicitSchemaPayloadBuilder(values)[source]

Bases: SchemaBasedPayloadBuilder

Build a payload that generates a schema from the provided values.

class apache_beam.transforms.external.NamedTupleBasedPayloadBuilder(tuple_instance)[source]

Bases: SchemaBasedPayloadBuilder

Build a payload based on a NamedTuple schema.

Parameters:

tuple_instance – an instance of a typing.NamedTuple

class apache_beam.transforms.external.SchemaTransformPayloadBuilder(identifier, **kwargs)[source]

Bases: PayloadBuilder

identifier()[source]

The URN referencing this SchemaTransform

Returns:

str

build()[source]
class apache_beam.transforms.external.ExplicitSchemaTransformPayloadBuilder(identifier, schema_proto, **kwargs)[source]

Bases: SchemaTransformPayloadBuilder

build()[source]
class apache_beam.transforms.external.JavaClassLookupPayloadBuilder(class_name)[source]

Bases: PayloadBuilder

Builds a payload for directly instantiating a Java transform using a constructor and builder methods.

Parameters:

class_name – fully qualified name of the transform class.

IGNORED_ARG_FORMAT = 'ignore%d'
build()[source]
with_constructor(*args, **kwargs)[source]

Specifies the Java constructor to use. Arguments provided using args and kwargs will be applied to the Java transform constructor in the specified order.

Parameters:
  • args – parameter values of the constructor.

  • kwargs – parameter names and values of the constructor.

with_constructor_method(method_name, *args, **kwargs)[source]

Specifies the Java constructor method to use. Arguments provided using args and kwargs will be applied to the Java transform constructor method in the specified order.

Parameters:
  • method_name – name of the constructor method.

  • args – parameter values of the constructor method.

  • kwargs – parameter names and values of the constructor method.

add_builder_method(method_name, *args, **kwargs)[source]

Specifies a Java builder method to be invoked after instantiating the Java transform class. Specified builder method will be applied in order. Arguments provided using args and kwargs will be applied to the Java transform builder method in the specified order.

Parameters:
  • method_name – name of the builder method.

  • args – parameter values of the builder method.

  • kwargs – parameter names and values of the builder method.

class apache_beam.transforms.external.SchemaTransformsConfig(identifier, configuration_schema, inputs, outputs, description)

Bases: tuple

Create new instance of SchemaTransformsConfig(identifier, configuration_schema, inputs, outputs, description)

configuration_schema

Alias for field number 1

description

Alias for field number 4

identifier

Alias for field number 0

inputs

Alias for field number 2

outputs

Alias for field number 3

class apache_beam.transforms.external.SchemaAwareExternalTransform(identifier, expansion_service, rearrange_based_on_discovery=False, classpath=None, **kwargs)[source]

Bases: PTransform

A proxy transform for SchemaTransforms implemented in external SDKs.

This allows Python pipelines to directly use existing SchemaTransforms available to the expansion service without adding additional code in external SDKs.

Parameters:
  • identifier – unique identifier of the SchemaTransform.

  • expansion_service – an expansion service to use. This should already be available and the Schema-aware transforms to be used must already be deployed.

  • rearrange_based_on_discovery – if this flag is set, the input kwargs will be rearranged to match the order of fields in the external SchemaTransform configuration. A discovery call will be made to fetch the configuration.

  • classpath – (Optional) A list paths to additional jars to place on the expansion service classpath.

Kwargs:

field name to value mapping for configuring the schema transform. keys map to the field names of the schema of the SchemaTransform (in-order).

expand(pcolls)[source]
classmethod discover(expansion_service, ignore_errors=False)[source]

Discover all SchemaTransforms available to the given expansion service.

Returns:

a list of SchemaTransformsConfigs that represent the discovered SchemaTransforms.

static discover_iter(expansion_service, ignore_errors=True)[source]
static discover_config(expansion_service, name)[source]

Discover one SchemaTransform by name in the given expansion service.

Returns:

one SchemaTransformsConfig that represents the discovered SchemaTransform

Raises:

ValueError: if more than one SchemaTransform is discovered, or if none are discovered

class apache_beam.transforms.external.JavaExternalTransform(class_name, expansion_service=None, classpath=None)[source]

Bases: PTransform

A proxy for Java-implemented external transforms.

One builds these transforms just as one would in Java, e.g.:

transform = JavaExternalTransform('fully.qualified.ClassName'
    )(contructorArg, ... ).builderMethod(...)

or:

JavaExternalTransform('fully.qualified.ClassName').staticConstructor(
    ...).builderMethod1(...).builderMethod2(...)
Parameters:
  • class_name – fully qualified name of the java class

  • expansion_service – (Optional) an expansion service to use. If none is provided, a default expansion service will be started.

  • classpath – (Optional) A list paths to additional jars to place on the expansion service classpath.

expand(pcolls)[source]
class apache_beam.transforms.external.AnnotationBasedPayloadBuilder(transform, **values)[source]

Bases: SchemaBasedPayloadBuilder

Build a payload based on an external transform’s type annotations.

Parameters:
  • transform – a PTransform instance or class. type annotations will be gathered from its __init__ method

  • values – values to encode

class apache_beam.transforms.external.DataclassBasedPayloadBuilder(transform)[source]

Bases: SchemaBasedPayloadBuilder

Build a payload based on an external transform that uses dataclasses.

Parameters:

transform – a dataclass-decorated PTransform instance from which to gather type annotations and values

class apache_beam.transforms.external.ExternalTransform(urn, payload, expansion_service=None)[source]

Bases: PTransform

External provides a cross-language transform via expansion services in foreign SDKs.

Wrapper for an external transform with the given urn and payload.

Parameters:
  • urn – the unique beam identifier for this transform

  • payload – the payload, either as a byte string or a PayloadBuilder

  • expansion_service – an expansion service implementing the beam ExpansionService protocol, either as an object with an Expand method or an address (as a str) to a grpc server that provides this method.

with_output_types(*args, **kwargs)[source]
replace_named_inputs(named_inputs)[source]
replace_named_outputs(named_outputs)[source]
default_label()[source]
classmethod get_local_namespace()[source]
classmethod outer_namespace(namespace)[source]
expand(pvalueish: PCollection) PCollection[source]
static service(expansion_service)[source]
to_runner_api_transform(context, full_label)[source]
class apache_beam.transforms.external.ExpansionAndArtifactRetrievalStub(channel, **kwargs)[source]

Bases: ExpansionServiceStub

artifact_service()[source]
ready(timeout_sec)[source]
class apache_beam.transforms.external.JavaJarExpansionService(path_to_jar, extra_args=None, classpath=None, append_args=None)[source]

Bases: object

An expansion service based on an Java Jar file.

This can be passed into an ExternalTransform as the expansion_service argument which will spawn a subprocess using this jar to expand the transform.

Parameters:
  • path_to_jar – the path to a locally available executable jar file to be used to start up the expansion service.

  • extra_args – arguments to be provided when starting up the expansion service using the jar file. These arguments will replace the default arguments.

  • classpath – Additional dependencies to be added to the classpath.

  • append_args – arguments to be provided when starting up the expansion service using the jar file. These arguments will be appended to the default arguments.

is_existing_service()[source]
class apache_beam.transforms.external.BeamJarExpansionService(gradle_target, extra_args=None, gradle_appendix=None, classpath=None, append_args=None)[source]

Bases: JavaJarExpansionService

An expansion service based on an Beam Java Jar file.

Attempts to use a locally-built copy of the jar based on the gradle target, if it exists, otherwise attempts to download and cache the released artifact corresponding to this version of Beam from the apache maven repository.

Parameters:
  • gradle_target – Beam Gradle target for building an executable jar which will be used to start the expansion service.

  • extra_args – arguments to be provided when starting up the expansion service using the jar file. These arguments will replace the default arguments.

  • gradle_appendix – Gradle appendix of the artifact.

  • classpath – Additional dependencies to be added to the classpath.

  • append_args – arguments to be provided when starting up the expansion service using the jar file. These arguments will be appended to the default arguments.

apache_beam.transforms.external.memoize(func)[source]