apache_beam.ml.inference.vertex_ai_inference module

class apache_beam.ml.inference.vertex_ai_inference.VertexAIModelHandlerJSON(endpoint_id: str, project: str, location: str, experiment: Optional[str] = None, network: Optional[str] = None, private: bool = False, *, min_batch_size: Optional[int] = None, max_batch_size: Optional[int] = None, max_batch_duration_secs: Optional[int] = None, **kwargs)[source]

Bases: apache_beam.ml.inference.base.ModelHandler

Implementation of the ModelHandler interface for Vertex AI.

NOTE: This API and its implementation are under development and do not provide backward compatibility guarantees.

Unlike other ModelHandler implementations, this handler does not load the model onto the worker; instead, it makes remote queries to a Vertex AI endpoint, so it functions more like a mid-pipeline IO.

Public Vertex AI endpoints have a maximum request size of 1.5 MB. If you wish to make larger requests, use a private endpoint: provide the Compute Engine network you wish to use and set private=True.

Parameters:
  • endpoint_id – the numerical ID of the Vertex AI endpoint to query
  • project – the GCP project name where the endpoint is deployed
  • location – the GCP location where the endpoint is deployed
  • experiment – optional. experiment label to apply to the queries. See https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments for more information.
  • network – optional. the full name of the Compute Engine network the endpoint is deployed on; used for private endpoints. The network or subnetwork Dataflow pipeline option must be set and match this network for pipeline execution. Ex: “projects/12345/global/networks/myVPC”
  • private – optional. if the deployed Vertex AI endpoint is private, set to True. Requires a network to be provided as well.
  • min_batch_size – optional. the minimum batch size to use when batching inputs.
  • max_batch_size – optional. the maximum batch size to use when batching inputs.
  • max_batch_duration_secs – optional. the maximum amount of time to buffer a batch before emitting; used in streaming contexts.
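
Example (a minimal usage sketch, not part of the reference; the endpoint ID, project, location, and inputs are placeholders, and the input encoding must match what the deployed model expects):

   import apache_beam as beam
   from apache_beam.ml.inference.base import RunInference
   from apache_beam.ml.inference.vertex_ai_inference import VertexAIModelHandlerJSON

   # Placeholder identifiers; substitute a real endpoint, project, and location.
   model_handler = VertexAIModelHandlerJSON(
       endpoint_id="1234567890",
       project="my-gcp-project",
       location="us-central1",
       min_batch_size=1,
       max_batch_size=8)

   with beam.Pipeline() as pipeline:
       _ = (
           pipeline
           | "CreateInputs" >> beam.Create([{"prompt": "hello"}, {"prompt": "world"}])
           | "VertexAIInference" >> RunInference(model_handler)
           | "PrintResults" >> beam.Map(print))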
load_model() → google.cloud.aiplatform.models.Endpoint[source]

Loads the Endpoint object used to build and send prediction requests to Vertex AI.

get_request(batch: Sequence[Any], model: google.cloud.aiplatform.models.Endpoint, throttle_delay_secs: int, inference_args: Optional[Dict[str, Any]])[source]
run_inference(batch: Sequence[Any], model: google.cloud.aiplatform.models.Endpoint, inference_args: Optional[Dict[str, Any]] = None) → Iterable[apache_beam.ml.inference.base.PredictionResult][source]

Sends a prediction request containing a batch of inputs to a Vertex AI endpoint and matches each input with the corresponding prediction from the endpoint's response, returned as an iterable of PredictionResults.

Parameters:
  • batch – a sequence of any values to be passed to the Vertex AI endpoint. Should be encoded as the model expects.
  • model – an aiplatform.Endpoint object configured to access the desired model.
  • inference_args – any additional arguments to send as part of the prediction request.
Returns:

An iterable of PredictionResults.
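
Each PredictionResult pairs the original input with the endpoint's response; for illustration, a downstream step (a sketch, using a hypothetical formatting function) can unpack the two fields:

   from apache_beam.ml.inference.base import PredictionResult

   def format_result(result: PredictionResult) -> str:
       # result.example is the input sent to the endpoint;
       # result.inference is the prediction returned for it.
       return f"input={result.example!r} prediction={result.inference!r}"

   # In a pipeline: ... | RunInference(model_handler) | beam.Map(format_result)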

validate_inference_args(inference_args: Optional[Dict[str, Any]])[source]
batch_elements_kwargs() → Mapping[str, Any][source]