apache_beam.ml.transforms.embeddings.vertex_ai module

class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAITextEmbeddings(model_name: str, columns: list[str], title: str | None = None, task_type: str = 'RETRIEVAL_DOCUMENT', project: str | None = None, location: str | None = None, credentials: Credentials | None = None, **kwargs)[source]

Bases: EmbeddingsManager

Embedding config for Vertex AI Text Embedding models, following https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings

Text embeddings are generated for a batch of text using the Vertex AI SDK. Embeddings are returned in a list for each text in the batch. For more information on model versions and lifecycle, see https://cloud.google.com/vertex-ai/docs/generative-ai/learn/model-versioning#stable-versions-available.md

Parameters:

model_name – The name of the Vertex AI Text Embedding model.
columns – The columns containing the text to be embedded.
task_type – The downstream task for the embeddings. Valid values are RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING. For more information on task types, see https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings
title – Identifier of the text content.
project – The default GCP project for API calls.
location – The default location for API calls.
credentials – Custom credentials for API calls. Defaults to environment credentials.
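
The parameters above can be sketched in use as follows. This is a minimal sketch, not the library's canonical example: check_task_type and embed_text are hypothetical helpers introduced here for illustration, the temporary artifact directory is an assumption, and the sketch assumes the config is applied through MLTransform from apache_beam.ml.transforms.base. The Beam imports are deferred into the function body so the task-type check can be used on its own.

```python
import tempfile

# Task types listed in the task_type parameter description above.
VALID_TASK_TYPES = frozenset({
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
})


def check_task_type(task_type: str) -> str:
    """Hypothetical client-side check; not part of the Beam API."""
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(f"Unsupported task_type: {task_type!r}")
    return task_type


def embed_text(rows, model_name, columns, task_type="RETRIEVAL_DOCUMENT"):
    """Hypothetical helper: embeds the named columns of each dict in rows."""
    # Imported lazily so check_task_type stays usable without Beam installed.
    import apache_beam as beam
    from apache_beam.ml.transforms.base import MLTransform
    from apache_beam.ml.transforms.embeddings.vertex_ai import (
        VertexAITextEmbeddings,
    )

    embedding_config = VertexAITextEmbeddings(
        model_name=model_name,
        columns=columns,
        task_type=check_task_type(task_type),
    )
    # MLTransform writes artifacts somewhere; a temp dir is assumed here.
    artifact_location = tempfile.mkdtemp()
    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | "CreateRows" >> beam.Create(rows)
            | "Embed" >> MLTransform(
                write_artifact_location=artifact_location
            ).with_transform(embedding_config)
            | "Print" >> beam.Map(print)
        )
```

A call might look like embed_text([{"text": "hello world"}], model_name="text-embedding-005", columns=["text"]), where the model name is an assumption to be replaced with a model available in your project. Running it requires Google Cloud credentials and makes remote calls to Vertex AI.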

get_model_handler() → ModelHandler[source]

get_ptransform_for_processing(**kwargs) → PTransform[source]

class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAIImageEmbeddings(model_name: str, columns: list[str], dimension: int | None, project: str | None = None, location: str | None = None, credentials: Credentials | None = None, **kwargs)[source]

Bases: EmbeddingsManager

Embedding config for Vertex AI Image Embedding models, following https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-multimodal-embeddings

Image embeddings are generated for a batch of images using the Vertex AI API. Embeddings are returned in a list for each image in the batch. This transform makes remote calls to the Vertex AI service and may incur costs for use.

Parameters:

model_name – The name of the Vertex AI Multi-Modal Embedding model.
columns – The columns containing the image to be embedded.
dimension – The length of the embedding vector to generate. Must be one of 128, 256, 512, or 1408. If None, Vertex AI uses its default of 1408.
project – The default GCP project for API calls.
location – The default location for API calls.
credentials – Custom credentials for API calls. Defaults to environment credentials.
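
The dimension constraint above can be checked before any remote call is made. The following is a minimal sketch under stated assumptions: check_dimension and make_image_embedding_config are hypothetical helpers, not part of the Beam API. Note that the signature shows no default for dimension, so the sketch always passes it explicitly, using None to request the service default.

```python
from typing import Optional

# Vector lengths listed in the dimension parameter description above.
ALLOWED_DIMENSIONS = (128, 256, 512, 1408)


def check_dimension(dimension: Optional[int]) -> Optional[int]:
    """Hypothetical client-side check; None means the service default."""
    if dimension is not None and dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(
            f"dimension must be one of {ALLOWED_DIMENSIONS}, got {dimension!r}"
        )
    return dimension


def make_image_embedding_config(model_name, columns, dimension=None):
    """Hypothetical helper that builds the embedding config."""
    # Imported lazily so check_dimension stays usable without Beam installed.
    from apache_beam.ml.transforms.embeddings.vertex_ai import (
        VertexAIImageEmbeddings,
    )

    return VertexAIImageEmbeddings(
        model_name=model_name,
        columns=columns,
        dimension=check_dimension(dimension),
    )
```

For example, make_image_embedding_config("multimodalembedding@001", ["image"], dimension=256) would request 256-element vectors; the model name and column name here are assumptions. The returned config is assumed to be applied in a pipeline through MLTransform(...).with_transform(...), in the same way as the text embedding config.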

get_model_handler() → ModelHandler[source]

get_ptransform_for_processing(**kwargs) → PTransform[source]