apache_beam.transforms.external_transform_provider module¶
-
class
apache_beam.transforms.external_transform_provider.
ExternalTransform
(expansion_service=None, **kwargs)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
Template for a wrapper class of an external SchemaTransform.
This is a superclass for dynamically generated SchemaTransform wrappers and is not meant to be manually instantiated.
Experimental; no backwards compatibility guarantees.
-
default_expansion_service
= None¶
-
description
= ''¶
-
identifier
= ''¶
-
configuration_schema
= {}¶
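As a rough illustration of how a configuration_schema mapping might be inspected, the sketch below returns one readable line per parameter. It does not import Beam: the `ParamInfo` class here is a hypothetical stand-in that mirrors the three fields shown in this module's documentation (type, description, original_name), and `describe_parameters` and `example_schema` are made-up names used only for this example.

```python
import typing
from typing import Dict, List, NamedTuple


class ParamInfo(NamedTuple):
    """Hypothetical stand-in for the entries of configuration_schema."""
    type: typing.Any
    description: str
    original_name: str


def describe_parameters(schema: Dict[str, ParamInfo]) -> List[str]:
    """Return one 'name (type): description' line per parameter, sorted by name."""
    lines = []
    for name, info in sorted(schema.items()):
        # Fall back to a placeholder when the SchemaTransform gave no docs.
        text = info.description or "(no description provided)"
        lines.append(f"{name} ({info.type}): {text}")
    return lines


# Example data shaped like the documented output; not read from a real
# expansion service. The second description is deliberately left empty.
example_schema = {
    "query": ParamInfo(
        type=typing.Optional[str],
        description="The SQL query to be executed to read from the BigQuery table.",
        original_name="query",
    ),
    "row_restriction": ParamInfo(
        type=typing.Optional[str],
        description="",
        original_name="row_restriction",
    ),
}

for line in describe_parameters(example_schema):
    print(line)
```

With a real provider, the same helper could presumably be pointed at the configuration_schema of a generated wrapper class, assuming its entries expose these same fields.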
-
-
class
apache_beam.transforms.external_transform_provider.
ExternalTransformProvider
(expansion_services, urn_pattern='^beam:schematransform:org.apache.beam:([\w-]+):(\w+)$')[source]¶ Bases:
object
Dynamically discovers Schema-aware external transforms from a given list of expansion services and provides them as ready PTransforms.
An ExternalTransform subclass is generated for each external transform, and is named based on what can be inferred from the URN (see the urn_pattern parameter).
These classes are generated when ExternalTransformProvider is initialized. We need to give it one or more expansion service addresses that are already up and running:
>>> provider = ExternalTransformProvider(["localhost:12345",
...                                       "localhost:12121"])
We can also give it the Gradle target of a standard Beam expansion service:
>>> provider = ExternalTransformProvider(BeamJarExpansionService(
...     "sdks:java:io:google-cloud-platform:expansion-service:shadowJar"))
Let's take a look at the output of get_available() to see which transforms are available in the expansion service(s) we provided:
>>> provider.get_available()
[('JdbcWrite', 'beam:schematransform:org.apache.beam:jdbc_write:v1'),
 ('BigtableRead', 'beam:schematransform:org.apache.beam:bigtable_read:v1'),
 ...]
Then retrieve a transform with get(), get_urn(), or by accessing it directly as an attribute of ExternalTransformProvider. All of the following commands do the same thing:
>>> provider.get('BigqueryStorageRead')
>>> provider.get_urn(
...     'beam:schematransform:org.apache.beam:bigquery_storage_read:v1')
>>> provider.BigqueryStorageRead
To learn more about the usage of a given transform, take a look at the description attribute. This returns some documentation if the underlying SchemaTransform provides any.
>>> provider.BigqueryStorageRead.description
Similarly, the configuration_schema attribute returns information about the parameters, including their names, types, and any documentation that the underlying SchemaTransform may provide:
>>> provider.BigqueryStorageRead.configuration_schema
{'query': ParamInfo(type=typing.Optional[str],
                    description='The SQL query to be executed to read from the BigQuery table.',
                    original_name='query'),
 'row_restriction': ParamInfo(type=typing.Optional[str]...}
The retrieved external transform can be used as a normal PTransform like so:
with Pipeline() as p:
  _ = (p
       | 'Read from BigQuery' >> provider.BigqueryStorageRead(
           query=query,
           row_restriction=restriction)
       | 'Some processing' >> beam.Map(...))
Experimental; no backwards compatibility guarantees.
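To make the link between a URN and a generated class name more concrete, here is a minimal sketch of one way such a name could be inferred with the default urn_pattern. The function `sketch_infer_name` and its camel-casing rule are assumptions chosen to reproduce the names shown above (for example jdbc_write:v1 becoming JdbcWrite); this is not Beam's actual implementation, and how versions other than v1 affect the name is not covered here.

```python
import re
from typing import Optional

# Default urn_pattern, copied from the ExternalTransformProvider signature.
STANDARD_URN_PATTERN = r'^beam:schematransform:org.apache.beam:([\w-]+):(\w+)$'


def sketch_infer_name(urn: str,
                      pattern: str = STANDARD_URN_PATTERN) -> Optional[str]:
    """Guess a class name from a URN (assumed rule, for illustration only)."""
    match = re.match(pattern, urn)
    if match is None:
        # The URN does not follow the pattern, so no name can be inferred.
        return None
    # Assumption: only the first captured group (e.g. 'jdbc_write') feeds the
    # name; the second group (the version, e.g. 'v1') is ignored here.
    snake_name = match.group(1)
    return ''.join(part.capitalize() for part in snake_name.split('_'))


print(sketch_infer_name('beam:schematransform:org.apache.beam:jdbc_write:v1'))
print(sketch_infer_name('some:unrelated:urn'))
```

A custom urn_pattern would change which part of the URN is captured, and therefore which names the sketch produces.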
-
get_available
() → List[Tuple[str, str]][source]¶ Get a list of available ExternalTransform names and identifiers.
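Because get_available() returns plain (name, identifier) tuples, the result can be filtered with ordinary Python. The sketch below works on a hard-coded sample copied from the example output in this module's documentation rather than on a live provider; `find_transforms` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

# Sample data copied from the documented get_available() output; a real
# provider would return whatever its expansion services expose.
available: List[Tuple[str, str]] = [
    ('JdbcWrite', 'beam:schematransform:org.apache.beam:jdbc_write:v1'),
    ('BigtableRead', 'beam:schematransform:org.apache.beam:bigtable_read:v1'),
]


def find_transforms(pairs: List[Tuple[str, str]],
                    keyword: str) -> Dict[str, str]:
    """Return {name: identifier} for names containing keyword (case-insensitive)."""
    keyword = keyword.lower()
    return {name: urn for name, urn in pairs if keyword in name.lower()}


print(find_transforms(available, 'read'))
```

The returned names could then be passed to get(), or the identifiers to get_urn(), as described in the class documentation.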
-
get_all
() → Dict[str, apache_beam.transforms.external_transform_provider.ExternalTransform][source]¶ Get all available ExternalTransforms.
-