apache_beam.ml.rag.types module
Core types for RAG pipelines. This module contains the core dataclasses used throughout the RAG pipeline implementation, including Chunk and Embedding types that define the data contracts between different stages of the pipeline.
- class apache_beam.ml.rag.types.Content(text: str | None = None)[source]
Bases: object
Container for embeddable content. Add new content types as and when necessary.
- Parameters:
text – Text content to be embedded
- class apache_beam.ml.rag.types.Embedding(dense_embedding: List[float] | None = None, sparse_embedding: Tuple[List[int], List[float]] | None = None)[source]
Bases: object
Represents vector embeddings.
- Parameters:
dense_embedding – Dense vector representation
sparse_embedding – Optional sparse vector representation for hybrid search
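A minimal sketch of constructing both kinds of embedding. It assumes the sparse tuple is laid out as (indices, values), which the type Tuple[List[int], List[float]] suggests but the documentation above does not state. The helper to_sparse is hypothetical and not part of Beam, and the fallback class is only a stand-in mirroring the documented signature so the snippet also runs without Beam installed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

try:
    from apache_beam.ml.rag.types import Embedding
except ImportError:
    # Stand-in mirroring the documented signature; used only if Beam
    # (or its ml.rag package) is not available.
    @dataclass
    class Embedding:
        dense_embedding: Optional[List[float]] = None
        sparse_embedding: Optional[Tuple[List[int], List[float]]] = None


def to_sparse(vector: List[float]) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: keep only non-zero entries.

    Assumption: the sparse form is (indices, values).
    """
    indices = [i for i, v in enumerate(vector) if v != 0.0]
    values = [vector[i] for i in indices]
    return indices, values


# Dense-only embedding; sparse_embedding keeps its default of None.
dense_only = Embedding(dense_embedding=[0.1, 0.2, 0.3])

# Dense plus sparse, as might be used for hybrid search.
hybrid = Embedding(
    dense_embedding=[0.1, 0.2, 0.3],
    sparse_embedding=to_sparse([0.0, 0.5, 0.0, 0.0, 1.5]),
)
```

Both fields default to None, so a stage that produces only one representation can leave the other unset.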
- class apache_beam.ml.rag.types.Chunk(content: Content, id: str = <factory>, index: int = 0, metadata: Dict[str, Any] = <factory>, embedding: Embedding | None = None)[source]
Bases: object
Represents a chunk of embeddable content with metadata.
- Parameters:
content – The actual content of the chunk
id – Unique identifier for the chunk
index – Index of this chunk within the original document
metadata – Additional metadata about the chunk (e.g., document source)
embedding – Vector embeddings of the content
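The following sketch illustrates how the three types might fit together: a document is split into Chunk objects, and a later stage attaches an Embedding. The splitter split_into_chunks, the "source" metadata key, and the "doc-1#0" id scheme are hypothetical choices made for illustration. The fallback classes are stand-ins mirroring the documented signatures so the snippet runs without Beam installed; in particular, modeling the id factory as a random UUID string is an assumption.

```python
import dataclasses
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

try:
    from apache_beam.ml.rag.types import Chunk, Content, Embedding
except ImportError:
    # Stand-ins mirroring the documented signatures; used only if Beam
    # (or its ml.rag package) is not available.
    @dataclass
    class Content:
        text: Optional[str] = None

    @dataclass
    class Embedding:
        dense_embedding: Optional[List[float]] = None
        sparse_embedding: Optional[Tuple[List[int], List[float]]] = None

    @dataclass
    class Chunk:
        content: Content
        # Assumption: the documented "<factory>" default is modeled
        # here as a random UUID string.
        id: str = field(default_factory=lambda: str(uuid.uuid4()))
        index: int = 0
        metadata: Dict[str, Any] = field(default_factory=dict)
        embedding: Optional[Embedding] = None


def split_into_chunks(text: str, source: str, size: int) -> List[Chunk]:
    """Hypothetical fixed-width splitter, for illustration only."""
    pieces = [text[i:i + size] for i in range(0, len(text), size)]
    return [
        Chunk(
            content=Content(text=piece),
            id=f"{source}#{i}",           # explicit id instead of the default
            index=i,                      # position within the document
            metadata={"source": source},  # hypothetical metadata key
        )
        for i, piece in enumerate(pieces)
    ]


chunks = split_into_chunks("Beam pipelines process data.", "doc-1", size=10)

# A later stage attaches an embedding; dataclasses.replace returns a copy
# and leaves the original chunk untouched.
embedded = dataclasses.replace(
    chunks[0], embedding=Embedding(dense_embedding=[0.0, 0.0, 0.0])
)
```

Passing an explicit id keeps chunk identifiers stable across runs, whereas relying on the default factory yields a fresh identifier for every Chunk created.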