apache_beam.transforms.enrichment module
apache_beam.transforms.enrichment.cross_join(left: Dict[str, Any], right: Dict[str, Any]) → apache_beam.pvalue.Row
Performs a cross join on two dict objects. Joins the columns of the right row onto the left row.
Parameters: left, right – the two rows to join, each given as a Dict[str, Any].
Returns: beam.Row containing the merged columns.
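To make the merge concrete, here is a minimal plain-Python sketch of joining the right row's columns onto the left row. It is a stand-in, not Beam's implementation: `merge_columns` is a hypothetical helper, it returns a plain dict rather than a beam.Row, and keeping the left value when both rows share a key is an assumption that this page does not confirm.

```python
from typing import Any, Dict


def merge_columns(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for a join function; not Beam's cross_join."""
    merged = dict(left)  # copy so the caller's dict is not mutated
    for key, value in right.items():
        # Assumption: on a key collision the left value is kept.
        merged.setdefault(key, value)
    return merged


left = {"id": 1, "name": "widget"}
right = {"id": 1, "price": 9.99}
merged = merge_columns(left, right)
print(merged)
```

A real join function passed to the transform would likely wrap the result as `beam.Row(**merged)` so that it matches the documented return type.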
class apache_beam.transforms.enrichment.EnrichmentSourceHandler
Bases: apache_beam.io.requestresponse.Caller
Wrapper class for apache_beam.io.requestresponse.Caller. Ensure that the implementation of the __call__ method returns a tuple of beam.Row objects.
class apache_beam.transforms.enrichment.Enrichment(source_handler: apache_beam.transforms.enrichment.EnrichmentSourceHandler, join_fn: Callable[[Dict[str, Any], Dict[str, Any]], apache_beam.pvalue.Row] = <function cross_join>, timeout: Optional[float] = 30, repeater: apache_beam.io.requestresponse.Repeater = <apache_beam.io.requestresponse.ExponentialBackOffRepeater object>, throttler: apache_beam.io.requestresponse.PreCallThrottler = <apache_beam.io.requestresponse.DefaultThrottler object>)
Bases: apache_beam.transforms.ptransform.PTransform
An apache_beam.transforms.enrichment.Enrichment transform to enrich elements in a PCollection.
NOTE: This transform and its implementation are under development and do not provide backward compatibility guarantees.
Uses the apache_beam.transforms.enrichment.EnrichmentSourceHandler to enrich elements by joining them with metadata from an external source.
Processes an input PCollection of beam.Row by applying an apache_beam.transforms.enrichment.EnrichmentSourceHandler to each element and returning the enriched PCollection.
Parameters:
- source_handler – Handles source lookup and metadata retrieval. Implements the apache_beam.transforms.enrichment.EnrichmentSourceHandler interface.
- join_fn – A function that joins the original element with the lookup metadata. Defaults to cross_join.
- timeout – (Optional) Timeout for source requests, in seconds. Defaults to 30.
- repeater (Repeater) – Provides a method to repeat requests to the API that failed due to service errors. Defaults to apache_beam.io.requestresponse.ExponentialBackOffRepeater, which repeats requests with exponential backoff.
- throttler (PreCallThrottler) – Provides methods to pre-throttle a request. Defaults to apache_beam.io.requestresponse.DefaultThrottler for client-side adaptive throttling using apache_beam.io.components.adaptive_throttler.AdaptiveThrottler.
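The data flow described above (look up each element through a handler, then join the element with the returned metadata) can be sketched in plain Python without a pipeline. Everything here is illustrative and none of it is Beam's API: `InMemoryHandler`, `join`, and `enrich` are hypothetical names, dicts stand in for beam.Row, and the handler returning a (request, response) pair is an assumption about what the documented "tuple of beam.Row objects" contains. Timeouts, retries, and throttling are left out.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


class InMemoryHandler:
    """Hypothetical stand-in for a source handler backed by a dict."""

    def __init__(self, table: Dict[int, Record]) -> None:
        self._table = table

    def __call__(self, request: Record) -> Tuple[Record, Record]:
        # Assumption: the handler returns a (request, response) pair.
        response = self._table.get(request["id"], {})
        return request, response


def join(left: Record, right: Record) -> Record:
    # Columns from the right are added to the left; left wins on collisions
    # (an assumption made for this sketch).
    return {**right, **left}


def enrich(
    elements: Iterable[Record],
    handler: Callable[[Record], Tuple[Record, Record]],
    join_fn: Callable[[Record, Record], Record],
) -> List[Record]:
    """Apply the handler to each element and join the result onto it."""
    enriched = []
    for element in elements:
        request, response = handler(element)
        enriched.append(join_fn(request, response))
    return enriched


handler = InMemoryHandler({1: {"price": 9.99}, 2: {"price": 4.5}})
orders = [{"id": 1, "qty": 3}, {"id": 2, "qty": 1}, {"id": 3, "qty": 7}]
result = enrich(orders, handler, join)
print(result)
```

In this sketch an element with no match in the lookup table passes through unchanged; how the actual transform treats missing lookups is not stated on this page.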