apache_beam.ml.transforms.embeddings.vertex_ai module
class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAITextEmbeddings(model_name: str, columns: List[str], title: Optional[str] = None, task_type: str = 'RETRIEVAL_DOCUMENT', project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, **kwargs)
Bases: apache_beam.ml.transforms.base.EmbeddingsManager
Embedding config for Vertex AI text embedding models, following https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings
Text embeddings are generated for a batch of text using the Vertex AI SDK. Embeddings are returned in a list for each text in the batch.
For more information on model versions and lifecycle, see https://cloud.google.com/vertex-ai/docs/generative-ai/learn/model-versioning#stable-versions-available.md
Parameters:
- model_name – The name of the Vertex AI Text Embedding model.
- columns – The columns containing the text to be embedded.
- task_type – The downstream task for the embeddings. Valid values are RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, and CLUSTERING. Defaults to RETRIEVAL_DOCUMENT. For more information on the task type, see https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings
- title – Identifier of the text content.
- project – The default GCP project for API calls.
- location – The default location for API calls.
- credentials – Custom credentials for API calls. Defaults to environment credentials.
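As a rough illustration of how these parameters fit together, the sketch below wraps VertexAITextEmbeddings in an MLTransform. The helper names (text_embedding_kwargs, run_text_embedding_pipeline), the "text" column, and the model name shown in the usage note are assumptions made for the example and are not part of this module. The local task_type check only mirrors the values listed above; the class itself may validate differently.

```python
import tempfile

# Task types listed in the task_type parameter description.
VALID_TASK_TYPES = frozenset({
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
})


def text_embedding_kwargs(
    model_name, columns, task_type="RETRIEVAL_DOCUMENT", **extra):
    """Hypothetical helper: check arguments locally, return constructor kwargs."""
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(
            f"task_type must be one of {sorted(VALID_TASK_TYPES)}, "
            f"got {task_type!r}")
    if not columns:
        raise ValueError("columns must name at least one text column")
    return dict(
        model_name=model_name,
        columns=list(columns),
        task_type=task_type,
        **extra)


def run_text_embedding_pipeline(rows, **kwargs):
    """Hypothetical helper: embed the configured columns of each row dict.

    Running this calls the Vertex AI service, so it needs credentials and
    a GCP project, and may incur costs.
    """
    # Imported lazily so the validation helper works without Beam installed.
    import apache_beam as beam
    from apache_beam.ml.transforms.base import MLTransform
    from apache_beam.ml.transforms.embeddings.vertex_ai import (
        VertexAITextEmbeddings)

    # MLTransform writes its artifacts here; a temp dir is enough for a demo.
    artifact_location = tempfile.mkdtemp()
    embedding_config = VertexAITextEmbeddings(**kwargs)

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | "CreateRows" >> beam.Create(rows)
            | "Embed" >> MLTransform(
                write_artifact_location=artifact_location
            ).with_transform(embedding_config)
            | "Print" >> beam.Map(print))
```

A call might look like run_text_embedding_pipeline([{"text": "hello world"}], **text_embedding_kwargs("textembedding-gecko@002", ["text"], task_type="RETRIEVAL_QUERY")), where the model name is a placeholder to be replaced with a version available in your project.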
class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAIImageEmbeddings(model_name: str, columns: List[str], dimension: Optional[int], project: Optional[str] = None, location: Optional[str] = None, credentials: Optional[google.auth.credentials.Credentials] = None, **kwargs)
Bases: apache_beam.ml.transforms.base.EmbeddingsManager
Embedding config for Vertex AI image embedding models, following https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-multimodal-embeddings
Image embeddings are generated for a batch of images using the Vertex AI API. Embeddings are returned in a list for each image in the batch. This transform makes remote calls to the Vertex AI service and may incur costs.
Parameters:
- model_name – The name of the Vertex AI Multi-Modal Embedding model.
- columns – The columns containing the images to be embedded.
- dimension – The length of the embedding vector to generate. Must be one of 128, 256, 512, or 1408. If not set, Vertex AI uses its default of 1408.
- project – The default GCP project for API calls.
- location – The default location for API calls.
- credentials – Custom credentials for API calls. Defaults to environment credentials.
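The image variant can be sketched the same way. The example below assumes that each configured column holds a vertexai.vision_models.Image object, and it loads those objects from file paths inside the pipeline. The helper names, the "image" column, and the model name in the usage note are illustrative assumptions, and the dimension check simply mirrors the values listed above.

```python
import tempfile

# Embedding lengths listed in the dimension parameter description.
VALID_DIMENSIONS = (128, 256, 512, 1408)


def image_embedding_kwargs(model_name, columns, dimension=None, **extra):
    """Hypothetical helper: check arguments locally, return constructor kwargs."""
    if dimension is not None and dimension not in VALID_DIMENSIONS:
        raise ValueError(
            f"dimension must be one of {VALID_DIMENSIONS} or None, "
            f"got {dimension!r}")
    if not columns:
        raise ValueError("columns must name at least one image column")
    # dimension has no default in the documented signature, so it is always
    # passed explicitly; None leaves the choice to Vertex AI.
    return dict(
        model_name=model_name,
        columns=list(columns),
        dimension=dimension,
        **extra)


def run_image_embedding_pipeline(image_paths, **kwargs):
    """Hypothetical helper: embed images found at the given paths.

    Running this calls the Vertex AI service, so it needs credentials and
    a GCP project, and may incur costs.
    """
    # Imported lazily so the validation helper works without Beam installed.
    import apache_beam as beam
    from apache_beam.ml.transforms.base import MLTransform
    from apache_beam.ml.transforms.embeddings.vertex_ai import (
        VertexAIImageEmbeddings)

    column = kwargs["columns"][0]

    def load_image_row(path):
        # Assumption: the transform expects Vertex AI SDK Image objects.
        from vertexai.vision_models import Image
        return {column: Image.load_from_file(path)}

    artifact_location = tempfile.mkdtemp()
    embedding_config = VertexAIImageEmbeddings(**kwargs)

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | "CreatePaths" >> beam.Create(list(image_paths))
            | "LoadImages" >> beam.Map(load_image_row)
            | "Embed" >> MLTransform(
                write_artifact_location=artifact_location
            ).with_transform(embedding_config)
            | "Print" >> beam.Map(print))
```

For instance, run_image_embedding_pipeline(["cat.png"], **image_embedding_kwargs("multimodalembedding@001", ["image"], dimension=256)) would request 256-element vectors; both the file name and the model name are placeholders.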