apache_beam.ml.rag.ingestion.milvus_search module
- class apache_beam.ml.rag.ingestion.milvus_search.MilvusWriteConfig(collection_name: str, partition_name: str = '', timeout: float | None = None, write_config: ~apache_beam.ml.rag.ingestion.jdbc_common.WriteConfig = <factory>, kwargs: ~typing.Dict[str, ~typing.Any] = <factory>)[source]
Bases:
objectConfiguration parameters for writing data to Milvus collections.
This class defines the parameters needed to write data to a Milvus collection, including collection targeting, batching behavior, and operation timeouts.
- Parameters:
collection_name – Name of the target Milvus collection to write data to. Must be a non-empty string.
partition_name – Name of the specific partition within the collection to write to. If empty, writes to the default partition.
timeout – Maximum time in seconds to wait for write operations to complete. If None, uses the client’s default timeout.
write_config – Configuration for write operations including batch size and other write-specific settings.
kwargs – Additional keyword arguments for write operations. Enables forward compatibility with future Milvus client parameters.
- write_config: WriteConfig
- property write_batch_size
Returns the batch size for write operations.
- Returns:
The configured batch size, or DEFAULT_WRITE_BATCH_SIZE if not specified.
- class apache_beam.ml.rag.ingestion.milvus_search.MilvusVectorWriterConfig(connection_params: ~apache_beam.ml.rag.utils.MilvusConnectionParameters, write_config: ~apache_beam.ml.rag.ingestion.milvus_search.MilvusWriteConfig, column_specs: ~typing.List[~apache_beam.ml.rag.ingestion.postgres_common.ColumnSpec] = <factory>)[source]
Bases:
VectorDatabaseWriteConfigConfiguration for writing vector data to Milvus collections.
This class extends VectorDatabaseWriteConfig to provide Milvus-specific configuration for ingesting vector embeddings and associated metadata. It defines how Apache Beam chunks are converted to Milvus records and handles the write operation parameters.
The configuration includes connection parameters, write settings, and column specifications that determine how chunk data is mapped to Milvus fields.
- Parameters:
connection_params – Configuration for connecting to the Milvus server, including URI, credentials, and connection options.
write_config – Configuration for write operations including collection name, partition, batch size, and timeouts.
column_specs – List of column specifications defining how chunk fields are mapped to Milvus collection fields. Defaults to standard RAG fields (id, embedding, sparse_embedding, content, metadata).
Example
- config = MilvusVectorWriterConfig(
- connection_params=MilvusConnectionParameters(
uri=”http://localhost:19530”),
write_config=MilvusWriteConfig(collection_name=”my_collection”), column_specs=MilvusVectorWriterConfig.default_column_specs())
- connection_params: MilvusConnectionParameters
- write_config: MilvusWriteConfig
- column_specs: List[ColumnSpec]
- create_converter() Callable[[Chunk], Dict[str, Any]][source]
Creates a function to convert Apache Beam Chunks to Milvus records.
- Returns:
A function that takes a Chunk and returns a dictionary representing a Milvus record with fields mapped according to column_specs.
- create_write_transform() PTransform[source]
Creates the Apache Beam transform for writing to Milvus.
- Returns:
A PTransform that can be applied to a PCollection of Chunks to write them to the configured Milvus collection.
- static default_column_specs() List[ColumnSpec][source]
Returns default column specifications for RAG use cases.
Creates column mappings for standard RAG fields: id, dense embedding, sparse embedding, content text, and metadata. These specifications define how Chunk fields are converted to Milvus-compatible formats.
- Returns:
List of ColumnSpec objects defining the default field mappings.