apache_beam.transforms.external module
Defines Transform whose expansion is implemented elsewhere.
- class apache_beam.transforms.external.PayloadBuilder[source]
Bases:
object
Abstract base class for building payloads to pass to ExternalTransform.
- class apache_beam.transforms.external.SchemaBasedPayloadBuilder[source]
Bases:
PayloadBuilder
Base class for building payloads based on a schema that provides type information for each configuration value to encode.
- class apache_beam.transforms.external.ImplicitSchemaPayloadBuilder(values)[source]
Bases:
SchemaBasedPayloadBuilder
Build a payload that generates a schema from the provided values.
- class apache_beam.transforms.external.NamedTupleBasedPayloadBuilder(tuple_instance)[source]
Bases:
SchemaBasedPayloadBuilder
Build a payload based on a NamedTuple schema.
- Parameters:
tuple_instance – an instance of a typing.NamedTuple
- class apache_beam.transforms.external.SchemaTransformPayloadBuilder(identifier, **kwargs)[source]
Bases:
PayloadBuilder
- class apache_beam.transforms.external.ExplicitSchemaTransformPayloadBuilder(identifier, schema_proto, **kwargs)[source]
- class apache_beam.transforms.external.JavaClassLookupPayloadBuilder(class_name)[source]
Bases:
PayloadBuilder
Builds a payload for directly instantiating a Java transform using a constructor and builder methods.
- Parameters:
class_name – fully qualified name of the transform class.
- IGNORED_ARG_FORMAT = 'ignore%d'
- with_constructor(*args, **kwargs)[source]
Specifies the Java constructor to use. Arguments provided using args and kwargs will be applied to the Java transform constructor in the specified order.
- Parameters:
args – parameter values of the constructor.
kwargs – parameter names and values of the constructor.
- with_constructor_method(method_name, *args, **kwargs)[source]
Specifies the Java constructor method to use. Arguments provided using args and kwargs will be applied to the Java transform constructor method in the specified order.
- Parameters:
method_name – name of the constructor method.
args – parameter values of the constructor method.
kwargs – parameter names and values of the constructor method.
- add_builder_method(method_name, *args, **kwargs)[source]
Specifies a Java builder method to be invoked after instantiating the Java transform class. Specified builder method will be applied in order. Arguments provided using args and kwargs will be applied to the Java transform builder method in the specified order.
- Parameters:
method_name – name of the builder method.
args – parameter values of the builder method.
kwargs – parameter names and values of the builder method.
- class apache_beam.transforms.external.SchemaTransformsConfig(identifier, configuration_schema, inputs, outputs, description)
Bases:
tuple
Create new instance of SchemaTransformsConfig(identifier, configuration_schema, inputs, outputs, description)
- configuration_schema
Alias for field number 1
- description
Alias for field number 4
- identifier
Alias for field number 0
- inputs
Alias for field number 2
- outputs
Alias for field number 3
- class apache_beam.transforms.external.SchemaAwareExternalTransform(identifier, expansion_service, rearrange_based_on_discovery=False, classpath=None, **kwargs)[source]
Bases:
PTransform
A proxy transform for SchemaTransforms implemented in external SDKs.
This allows Python pipelines to directly use existing SchemaTransforms available to the expansion service without adding additional code in external SDKs.
- Parameters:
identifier – unique identifier of the SchemaTransform.
expansion_service – an expansion service to use. This should already be available and the Schema-aware transforms to be used must already be deployed.
rearrange_based_on_discovery – if this flag is set, the input kwargs will be rearranged to match the order of fields in the external SchemaTransform configuration. A discovery call will be made to fetch the configuration.
classpath – (Optional) A list paths to additional jars to place on the expansion service classpath.
- Kwargs:
field name to value mapping for configuring the schema transform. keys map to the field names of the schema of the SchemaTransform (in-order).
- classmethod discover(expansion_service, ignore_errors=False)[source]
Discover all SchemaTransforms available to the given expansion service.
- Returns:
a list of SchemaTransformsConfigs that represent the discovered SchemaTransforms.
- static discover_config(expansion_service, name)[source]
Discover one SchemaTransform by name in the given expansion service.
- Returns:
one SchemaTransformsConfig that represents the discovered SchemaTransform
- Raises:
ValueError: if more than one SchemaTransform is discovered, or if none are discovered
- class apache_beam.transforms.external.JavaExternalTransform(class_name, expansion_service=None, classpath=None)[source]
Bases:
PTransform
A proxy for Java-implemented external transforms.
One builds these transforms just as one would in Java, e.g.:
transform = JavaExternalTransform('fully.qualified.ClassName' )(contructorArg, ... ).builderMethod(...)
or:
JavaExternalTransform('fully.qualified.ClassName').staticConstructor( ...).builderMethod1(...).builderMethod2(...)
- Parameters:
class_name – fully qualified name of the java class
expansion_service – (Optional) an expansion service to use. If none is provided, a default expansion service will be started.
classpath – (Optional) A list paths to additional jars to place on the expansion service classpath.
- class apache_beam.transforms.external.AnnotationBasedPayloadBuilder(transform, **values)[source]
Bases:
SchemaBasedPayloadBuilder
Build a payload based on an external transform’s type annotations.
- Parameters:
transform – a PTransform instance or class. type annotations will be gathered from its __init__ method
values – values to encode
- class apache_beam.transforms.external.DataclassBasedPayloadBuilder(transform)[source]
Bases:
SchemaBasedPayloadBuilder
Build a payload based on an external transform that uses dataclasses.
- Parameters:
transform – a dataclass-decorated PTransform instance from which to gather type annotations and values
- class apache_beam.transforms.external.ExternalTransform(urn, payload, expansion_service=None)[source]
Bases:
PTransform
External provides a cross-language transform via expansion services in foreign SDKs.
Wrapper for an external transform with the given urn and payload.
- Parameters:
urn – the unique beam identifier for this transform
payload – the payload, either as a byte string or a PayloadBuilder
expansion_service – an expansion service implementing the beam ExpansionService protocol, either as an object with an Expand method or an address (as a str) to a grpc server that provides this method.
- expand(pvalueish: PCollection) PCollection [source]
- class apache_beam.transforms.external.ExpansionAndArtifactRetrievalStub(channel, **kwargs)[source]
Bases:
ExpansionServiceStub
- class apache_beam.transforms.external.JavaJarExpansionService(path_to_jar, extra_args=None, classpath=None, append_args=None)[source]
Bases:
object
An expansion service based on an Java Jar file.
This can be passed into an ExternalTransform as the expansion_service argument which will spawn a subprocess using this jar to expand the transform.
- Parameters:
path_to_jar – the path to a locally available executable jar file to be used to start up the expansion service.
extra_args – arguments to be provided when starting up the expansion service using the jar file. These arguments will replace the default arguments.
classpath – Additional dependencies to be added to the classpath.
append_args – arguments to be provided when starting up the expansion service using the jar file. These arguments will be appended to the default arguments.
- class apache_beam.transforms.external.BeamJarExpansionService(gradle_target, extra_args=None, gradle_appendix=None, classpath=None, append_args=None)[source]
Bases:
JavaJarExpansionService
An expansion service based on an Beam Java Jar file.
Attempts to use a locally-built copy of the jar based on the gradle target, if it exists, otherwise attempts to download and cache the released artifact corresponding to this version of Beam from the apache maven repository.
- Parameters:
gradle_target – Beam Gradle target for building an executable jar which will be used to start the expansion service.
extra_args – arguments to be provided when starting up the expansion service using the jar file. These arguments will replace the default arguments.
gradle_appendix – Gradle appendix of the artifact.
classpath – Additional dependencies to be added to the classpath.
append_args – arguments to be provided when starting up the expansion service using the jar file. These arguments will be appended to the default arguments.