apache_beam.ml.transforms.embeddings.vertex_ai module

class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAITextEmbeddings(model_name: str, columns: List[str], title: Optional[str] = None, task_type: str = 'RETRIEVAL_DOCUMENT', project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, **kwargs)

Bases: apache_beam.ml.transforms.base.EmbeddingsManager

Embedding config for Vertex AI text embedding models, following https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings. Text embeddings are generated for a batch of text using the Vertex AI SDK, and one embedding (a list of floats) is returned for each text in the batch. See https://cloud.google.com/vertex-ai/docs/generative-ai/learn/model-versioning#stable-versions-available.md for more information on model versions and lifecycle.
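
The following is a minimal, offline sketch of the batch semantics described above. It assumes rows are dicts keyed by column name and, as an assumption of this sketch, that each embedding replaces the text in its column. The hypothetical fake_embed_batch function stands in for the Vertex AI SDK call; it is not part of Beam or Vertex AI and produces meaningless two-element vectors. The point is only the shape of the result: exactly one embedding per text, in batch order.

```python
from typing import Dict, List, Sequence

Embedding = List[float]
Row = Dict[str, object]


def fake_embed_batch(texts: Sequence[str]) -> List[Embedding]:
    """Hypothetical stand-in for the Vertex AI SDK embedding call.

    Returns one tiny deterministic vector per text (character count and
    word count) so the sketch runs offline. Real embeddings are learned,
    much longer vectors.
    """
    return [[float(len(t)), float(len(t.split()))] for t in texts]


def embed_columns(rows: List[Row], columns: List[str], batch_size: int = 2) -> List[Row]:
    """Embed the text in each configured column, one batch at a time.

    Assumption of this sketch: the embedding replaces the column's text.
    """
    out = [dict(row) for row in rows]  # shallow copies; inputs stay unchanged
    for column in columns:
        for start in range(0, len(out), batch_size):
            batch = out[start:start + batch_size]
            embeddings = fake_embed_batch([str(row[column]) for row in batch])
            # Exactly one embedding per text, in the same order as the batch.
            for row, embedding in zip(batch, embeddings):
                row[column] = embedding
    return out


rows = [
    {'id': 1, 'text': 'hello world'},
    {'id': 2, 'text': 'beam'},
    {'id': 3, 'text': 'a b c'},
]
result = embed_columns(rows, columns=['text'])
print(result)
# [{'id': 1, 'text': [11.0, 2.0]}, {'id': 2, 'text': [4.0, 1.0]}, {'id': 3, 'text': [5.0, 3.0]}]
```

With batch_size=2, the three rows are processed as two batches (two rows, then one), yet every row still ends up with its own embedding.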

Parameters:
  • model_name – The name of the Vertex AI Text Embedding model.
  • columns – The columns containing the text to be embedded.
  • task_type – The downstream task for the embeddings. Valid values are RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT (the default), SEMANTIC_SIMILARITY, CLASSIFICATION, and CLUSTERING. For more information on the task types, see https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings.
  • title – Optional identifier (title) of the text content.
  • project – The default GCP project for API calls.
  • location – The default location for API calls.
  • credentials – Custom credentials for API calls. Defaults to environment credentials.
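
The parameter rules above can be summarized in a small sketch. The TextEmbeddingConfig dataclass is hypothetical and not part of Beam: it mirrors the documented constructor arguments (credentials omitted for brevity), the default task_type of RETRIEVAL_DOCUMENT, and the five valid task types. Whether the real class rejects an unknown task_type at construction time is an assumption of this sketch, as is the placeholder model name.

```python
from dataclasses import dataclass
from typing import List, Optional

# The task types listed in the parameter description above.
VALID_TASK_TYPES = frozenset({
    'RETRIEVAL_QUERY',
    'RETRIEVAL_DOCUMENT',
    'SEMANTIC_SIMILARITY',
    'CLASSIFICATION',
    'CLUSTERING',
})


@dataclass
class TextEmbeddingConfig:
    """Hypothetical mirror of the VertexAITextEmbeddings arguments."""
    model_name: str
    columns: List[str]
    title: Optional[str] = None
    task_type: str = 'RETRIEVAL_DOCUMENT'
    project: Optional[str] = None
    location: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed behavior: reject task types outside the documented set.
        if self.task_type not in VALID_TASK_TYPES:
            raise ValueError(
                f'Invalid task_type {self.task_type!r}; expected one of '
                f'{sorted(VALID_TASK_TYPES)}')


# 'my-embedding-model' is a placeholder, not a real Vertex AI model name.
# Documents to be indexed use the default task type...
doc_config = TextEmbeddingConfig(model_name='my-embedding-model', columns=['body'])
# ...while the search queries issued against them use RETRIEVAL_QUERY.
query_config = TextEmbeddingConfig(
    model_name='my-embedding-model', columns=['query'], task_type='RETRIEVAL_QUERY')
```

Pairing RETRIEVAL_DOCUMENT for indexed text with RETRIEVAL_QUERY for search queries follows the Vertex AI guidance on retrieval task types.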

get_model_handler() → apache_beam.ml.inference.base.ModelHandler
get_ptransform_for_processing(**kwargs) → apache_beam.transforms.ptransform.PTransform