apache_beam.transforms.enrichment module
-
apache_beam.transforms.enrichment.cross_join(left: Dict[str, Any], right: Dict[str, Any]) → apache_beam.pvalue.Row
cross_join performs a cross join on two dict objects.
Joins the columns of the right row onto the left row.
Parameters:
- left (Dict[str, Any]) – The left side of the join.
- right (Dict[str, Any]) – The right side of the join.
Returns: beam.Row containing the merged columns.
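The merge that cross_join describes can be sketched with plain dicts. This is an illustration only, not Beam's implementation: `merge_columns` is a hypothetical name, a dict stands in for the returned beam.Row, and keeping the left value when both sides share a key is an assumption, since the description above does not say how conflicts are resolved.

```python
from typing import Any, Dict


def merge_columns(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for a join function; returns a dict, not a beam.Row."""
    merged = dict(left)  # copy so the caller's dict is not mutated
    for key, value in right.items():
        # Assumption: a column already present on the left is kept as-is.
        merged.setdefault(key, value)
    return merged


element = {"id": 7, "name": "widget"}
lookup = {"id": 7, "price": 2.5}
merged = merge_columns(element, lookup)
```

A function with this two-dict signature that returns a beam.Row instead of a dict is the shape the join_fn parameter of Enrichment expects.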
-
class apache_beam.transforms.enrichment.EnrichmentSourceHandler
Bases: apache_beam.io.requestresponse.Caller
Wrapper class for apache_beam.io.requestresponse.Caller.
Ensure that the implementation of the __call__ method returns a tuple of beam.Row objects.
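A minimal sketch of the shape such a handler might take. To keep it runnable without Beam installed, `Row` below is a hypothetical stand-in for beam.Row and the class does not inherit from EnrichmentSourceHandler. `InMemoryHandler`, its `table` argument, the `id` lookup field, and the (request, response) ordering of the returned tuple are all assumptions made for illustration.

```python
from types import SimpleNamespace
from typing import Any, Dict, Tuple

# Hypothetical stand-in for beam.Row so the sketch runs without Beam.
Row = SimpleNamespace


class InMemoryHandler:
    """Hypothetical handler that looks up metadata in a local dict.

    A real handler would subclass EnrichmentSourceHandler and query an
    external source instead.
    """

    def __init__(self, table: Dict[int, Dict[str, Any]]):
        self._table = table

    def __call__(self, request: Row, *args: Any, **kwargs: Any) -> Tuple[Row, Row]:
        # Assumption: the tuple pairs the request row with the response row.
        metadata = self._table.get(request.id, {})
        return request, Row(**metadata)


handler = InMemoryHandler({1: {"city": "Oslo"}})
request, response = handler(Row(id=1))
```

Returning an empty row for an unknown key, as this sketch does, is one possible design; raising an error is another.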
-
class apache_beam.transforms.enrichment.Enrichment(source_handler: apache_beam.transforms.enrichment.EnrichmentSourceHandler, join_fn: Callable[[Dict[str, Any], Dict[str, Any]], apache_beam.pvalue.Row] = <function cross_join>, timeout: Optional[float] = 30, repeater: apache_beam.io.requestresponse.Repeater = <apache_beam.io.requestresponse.ExponentialBackOffRepeater object>, throttler: apache_beam.io.requestresponse.PreCallThrottler = <apache_beam.io.requestresponse.DefaultThrottler object>)
Bases: apache_beam.transforms.ptransform.PTransform
An apache_beam.transforms.enrichment.Enrichment transform to enrich elements in a PCollection.
NOTE: This transform and its implementation are under development and do not provide backward compatibility guarantees.
Uses the apache_beam.transforms.enrichment.EnrichmentSourceHandler to enrich elements by joining the metadata from an external source.
Processes an input PCollection of beam.Row by applying an apache_beam.transforms.enrichment.EnrichmentSourceHandler to each element and returning the enriched PCollection.
Parameters:
- source_handler – Handles source lookup and metadata retrieval. Implements apache_beam.transforms.enrichment.EnrichmentSourceHandler.
- join_fn – A function (for example, a lambda) that joins the original element with the lookup metadata. Defaults to cross_join.
- timeout – (Optional) Timeout for source requests, in seconds. Defaults to 30.
- repeater (Repeater) – Provides a method to repeat requests to the API that failed due to service errors. Defaults to apache_beam.io.requestresponse.ExponentialBackOffRepeater, which repeats requests with exponential backoff.
- throttler (PreCallThrottler) – Provides methods to pre-throttle a request. Defaults to apache_beam.io.requestresponse.DefaultThrottler for client-side adaptive throttling using apache_beam.io.components.adaptive_throttler.AdaptiveThrottler.
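The repeater's role, retrying a failed request with growing delays, can be illustrated in plain Python. This is a generic exponential-backoff sketch and not the code of ExponentialBackOffRepeater; `call_with_backoff`, its parameters, and the injected `sleep` function are hypothetical.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with delays of base_delay * 2**n."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Usage: a flaky call that fails twice, then succeeds.
delays: List[float] = []
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("service unavailable")
    return "ok"


# Record the delays instead of actually sleeping.
result = call_with_backoff(flaky, sleep=delays.append)
```

Injecting `sleep` keeps the sketch fast to run and makes the delay sequence observable.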