apache_beam.io.requestresponse module¶

PTransform for reading from and writing to Web APIs.

class apache_beam.io.requestresponse.RequestResponseIO(caller: ~apache_beam.io.requestresponse.Caller[~apache_beam.io.requestresponse.RequestT, ~apache_beam.io.requestresponse.ResponseT], timeout: float | None = 30, should_backoff: ~apache_beam.io.requestresponse.ShouldBackOff | None = None, repeater: ~apache_beam.io.requestresponse.Repeater = <apache_beam.io.requestresponse.ExponentialBackOffRepeater object>, cache: ~apache_beam.io.requestresponse.Cache | None = None, throttler: ~apache_beam.io.requestresponse.PreCallThrottler = <apache_beam.io.requestresponse.DefaultThrottler object>)[source]¶

Bases: PTransform[PCollection[RequestT], PCollection[ResponseT]]

A RequestResponseIO transform to read and write to APIs.

Processes an input PCollection of requests by making a call to the API as defined in Caller’s __call__ method and returns a PCollection of responses.

Instantiates a RequestResponseIO transform.

Parameters:

caller – an implementation of Caller object that makes call to the API.
timeout (float) – timeout value in seconds to wait for response from API.
should_backoff – (Optional) provides methods for backoff.
repeater – provides method to repeat failed requests to API due to service errors. Defaults to apache_beam.io.requestresponse.ExponentialBackOffRepeater to repeat requests with exponential backoff.
cache – (Optional) a ~apache_beam.io.requestresponse.Cache object to use the appropriate cache.
throttler – provides methods to pre-throttle a request. Defaults to apache_beam.io.requestresponse.DefaultThrottler for client-side adaptive throttling using apache_beam.io.components.adaptive_throttler.AdaptiveThrottler

expand(requests: PCollection[RequestT]) → PCollection[ResponseT][source]¶

class apache_beam.io.requestresponse.ExponentialBackOffRepeater[source]¶

Bases: Repeater

Configure exponential backoff retry strategy.

It retries for exceptions due to the remote service such as TooManyRequests (HTTP 429), UserCodeTimeoutException, UserCodeQuotaException.

It utilizes the decorator apache_beam.utils.retry.with_exponential_backoff().

repeat(caller: Caller[RequestT, ResponseT], request: RequestT, timeout: float, metrics_collector: _MetricsCollector | None = None) → ResponseT[source]¶

repeat method is called from the RequestResponseIO when a repeater is enabled.

Parameters:

caller – a ~apache_beam.io.requestresponse.Caller object that calls the API.
request – input request to repeat.
timeout – time to wait for the request to complete.
metrics_collector – (Optional) a ~apache_beam.io.requestresponse._MetricsCollector object to collect the metrics for RequestResponseIO.

class apache_beam.io.requestresponse.DefaultThrottler(window_ms: int = 1, bucket_ms: int = 1, overload_ratio: float = 2, delay_secs: int = 5)[source]¶

Bases: PreCallThrottler

Default throttler that uses apache_beam.io.components.adaptive_throttler.AdaptiveThrottler

Parameters:

window_ms (int) – length of history to consider, in ms, to set throttling.
bucket_ms (int) – granularity of time buckets that we store data in, in ms.
overload_ratio (float) – the target ratio between requests sent and successful requests. This is “K” in the formula in https://landing.google.com/sre/book/chapters/handling-overload.html.
delay_secs (int) – minimum number of seconds to throttle a request.

class apache_beam.io.requestresponse.NoOpsRepeater[source]¶

Bases: Repeater

Executes a request just once irrespective of any exception.

repeat(caller: Caller[RequestT, ResponseT], request: RequestT, timeout: float, metrics_collector: _MetricsCollector | None) → ResponseT[source]¶

class apache_beam.io.requestresponse.RedisCache(host: str, port: int, time_to_live: int | timedelta = 86400, *, request_coder: Coder | None = None, response_coder: Coder | None = None, **kwargs)[source]¶

Bases: Cache

Configure cache using Redis for apache_beam.io.requestresponse.RequestResponseIO.

Parameters:

host (str) – The hostname or IP address of the Redis server.
port (int) – The port number of the Redis server.
time_to_live – (Union[int, timedelta]) The time-to-live (TTL) for records stored in Redis. Provide an integer (in seconds) or a datetime.timedelta object.
request_coder – (Optional[coders.Coder]) coder for encoding requests.
response_coder – (Optional[coders.Coder]) coder for decoding responses received from Redis.
kwargs – Optional additional keyword arguments that are required to connect to your redis server. Same as redis.Redis().

get_read()[source]¶: get_read returns a PTransform for reading from the cache.

get_write()[source]¶: returns a PTransform for writing to the cache.

property source_caller¶

property request_coder¶