apache_beam.ml.transforms.embeddings.vertex_ai module

class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAITextEmbeddings(model_name: str, columns: list[str], title: str | None = None, task_type: str = 'RETRIEVAL_DOCUMENT', project: str | None = None, location: str | None = None, credentials: Credentials | None = None, **kwargs)

Bases: EmbeddingsManager

Embedding config for Vertex AI text embedding models, following https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings. Text embeddings are generated for a batch of text using the Vertex AI SDK, and one embedding (a list of floats) is returned for each text in the batch. For more information on model versions and their lifecycle, see https://cloud.google.com/vertex-ai/docs/generative-ai/learn/model-versioning#stable-versions-available.md.
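A minimal sketch of that batch-in, list-out shape follows. It is written in plain Python so that it runs without Beam or Google Cloud credentials. `fake_embed` and `embed_columns` are hypothetical stand-ins, not Beam or Vertex AI APIs, and the stub vectors carry no meaning. The sketch also assumes that each embedding replaces the text in the column it came from, which is how it models the `columns` parameter.

```python
from typing import Callable


def fake_embed(texts: list[str], dim: int = 4) -> list[list[float]]:
    # Hypothetical stand-in for a remote embedding call: one fixed-length
    # list of floats per input text. Only the shape mimics a real model.
    return [[float(len(t))] * dim for t in texts]


def embed_columns(
    rows: list[dict],
    columns: list[str],
    embed: Callable[[list[str]], list[list[float]]],
) -> list[dict]:
    """Replace each listed column's text with its embedding vector."""
    out = [dict(row) for row in rows]  # copy so the input rows stay unchanged
    for col in columns:
        # Send the whole column as one batch, then pair vectors back to rows.
        vectors = embed([row[col] for row in out])
        for row, vec in zip(out, vectors):
            row[col] = vec
    return out


rows = [{"id": 1, "text": "hello"}, {"id": 2, "text": "hi"}]
result = embed_columns(rows, ["text"], fake_embed)
print(result)
# [{'id': 1, 'text': [5.0, 5.0, 5.0, 5.0]}, {'id': 2, 'text': [2.0, 2.0, 2.0, 2.0]}]
```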

Parameters:
  • model_name – The name of the Vertex AI Text Embedding model.

  • columns – The columns containing the text to be embedded.

  • task_type – The downstream task for the embeddings. Valid values are RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT (the default), SEMANTIC_SIMILARITY, CLASSIFICATION, and CLUSTERING. For more information on task types, see https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings

  • title – Optional identifier (title) of the text content. Vertex AI uses it only when task_type is RETRIEVAL_DOCUMENT.

  • project – The default GCP project for API calls.

  • location – The default location for API calls.

  • credentials – Custom credentials for API calls. Defaults to environment credentials.
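Because task_type is a free-form string, a typo only surfaces once the remote call fails. The following sketch is a hypothetical pre-flight check, `check_task_type`, built from the values listed above. It is not part of Beam, and this page does not say whether VertexAITextEmbeddings validates the value itself. Newer Vertex AI models may accept task types beyond this list, so a check like this can be stricter than the service.

```python
# Task types listed in the task_type parameter description.
VALID_TASK_TYPES = frozenset({
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
})


def check_task_type(task_type: str = "RETRIEVAL_DOCUMENT") -> str:
    """Hypothetical helper: return task_type unchanged or raise ValueError."""
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(
            f"Unsupported task_type {task_type!r}; "
            f"expected one of {sorted(VALID_TASK_TYPES)}")
    return task_type
```

In a retrieval setup, stored documents would typically be embedded with RETRIEVAL_DOCUMENT (the default) and incoming search queries with RETRIEVAL_QUERY.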

get_model_handler() → ModelHandler
get_ptransform_for_processing(**kwargs) → PTransform
class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAIImageEmbeddings(model_name: str, columns: list[str], dimension: int | None, project: str | None = None, location: str | None = None, credentials: Credentials | None = None, **kwargs)

Bases: EmbeddingsManager

Embedding config for Vertex AI image embedding models, following https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-multimodal-embeddings. Image embeddings are generated for a batch of images using the Vertex AI API, and one embedding (a list of floats) is returned for each image in the batch. This transform makes remote calls to the Vertex AI service, which may incur costs.

Parameters:
  • model_name – The name of the Vertex AI Multi-Modal Embedding model.

  • columns – The columns containing the images to be embedded.

  • dimension – The length of the embedding vector to generate. Must be one of 128, 256, 512, or 1408. If None, Vertex AI uses its default length of 1408.

  • project – The default GCP project for API calls.

  • location – The default location for API calls.

  • credentials – Custom credentials for API calls. Defaults to environment credentials.
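In the signature above, dimension has no default value, so callers must pass it explicitly. Passing None selects the service default. The following sketch shows the resulting vector length for each choice. `resolve_dimension` is a hypothetical helper that encodes only the rules stated for the dimension parameter, not a Beam or Vertex AI function.

```python
from typing import Optional

# Sizes listed for the dimension parameter.
SUPPORTED_DIMENSIONS = (128, 256, 512, 1408)
DEFAULT_DIMENSION = 1408  # Vertex AI's default when dimension is None


def resolve_dimension(dimension: Optional[int]) -> int:
    """Hypothetical helper: the embedding length a request would produce."""
    if dimension is None:
        return DEFAULT_DIMENSION
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(
            f"dimension must be one of {SUPPORTED_DIMENSIONS}, got {dimension}")
    return dimension
```

Smaller vectors reduce storage and similarity-search cost. The full 1408-dimension vector keeps the most information.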

get_model_handler() → ModelHandler
get_ptransform_for_processing(**kwargs) → PTransform