apache_beam.ml.rag.types module
Core types for RAG pipelines. This module contains the core dataclasses used throughout the RAG pipeline implementation, including Chunk and Embedding types that define the data contracts between different stages of the pipeline.
- class apache_beam.ml.rag.types.Content(text: str | None = None)
Bases: object
Container for embeddable content. Add new types as and when necessary.
- Parameters:
text – Text content to be embedded
- class apache_beam.ml.rag.types.Embedding(dense_embedding: List[float] | None = None, sparse_embedding: Tuple[List[int], List[float]] | None = None)
Bases: object
Represents vector embeddings.
- Parameters:
dense_embedding – Dense vector representation
sparse_embedding – Optional sparse vector representation for hybrid search
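A minimal sketch of constructing an Embedding with both representations. It assumes the sparse tuple is read as (indices, values), which the type Tuple[List[int], List[float]] suggests but this page does not state. The helper sparse_to_dense is hypothetical and is not part of apache_beam. The fallback dataclass only mirrors the documented signature so that the snippet also runs where this module is not installed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

try:
    from apache_beam.ml.rag.types import Embedding
except ImportError:
    # Stand-in mirroring the documented signature; not the library's code.
    @dataclass
    class Embedding:
        dense_embedding: Optional[List[float]] = None
        sparse_embedding: Optional[Tuple[List[int], List[float]]] = None


def sparse_to_dense(sparse: Tuple[List[int], List[float]], size: int) -> List[float]:
    """Hypothetical helper: expand an (indices, values) pair to a dense list.

    Assumes the first list holds positions and the second the matching weights.
    """
    indices, values = sparse
    dense = [0.0] * size
    for position, weight in zip(indices, values):
        dense[position] = weight
    return dense


# One embedding carrying both a dense vector and a sparse pair.
hybrid = Embedding(
    dense_embedding=[0.1, 0.2, 0.3],
    sparse_embedding=([1, 4], [0.5, 0.25]),
)

expanded = sparse_to_dense(hybrid.sparse_embedding, size=6)
```

Both fields default to None, so an embedding that carries only a dense vector can omit sparse_embedding entirely.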
- class apache_beam.ml.rag.types.Chunk(content: Content, id: str = <factory>, index: int = 0, metadata: Dict[str, Any] = <factory>, embedding: Embedding | None = None)
Bases: object
Represents a chunk of embeddable content with metadata.
- Parameters:
content – The actual content of the chunk
id – Unique identifier for the chunk
index – Index of this chunk within the original document
metadata – Additional metadata about the chunk (e.g., document source)
embedding – Vector embeddings of the content
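The three types can be combined as in the following sketch, which splits a string into fixed-size chunks and then attaches an embedding to one of them. The function split_into_chunks, the "source" metadata key, and the vector values are illustrative assumptions and do not come from apache_beam. The fallback dataclasses mirror the documented signatures so the snippet runs without this module; in that fallback, generating id with uuid4 is also an assumption, since this page shows only a default factory.

```python
import dataclasses
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

try:
    from apache_beam.ml.rag.types import Chunk, Content, Embedding
except ImportError:
    # Stand-ins mirroring the documented signatures; not the library's code.
    @dataclass
    class Content:
        text: Optional[str] = None

    @dataclass
    class Embedding:
        dense_embedding: Optional[List[float]] = None
        sparse_embedding: Optional[Tuple[List[int], List[float]]] = None

    @dataclass
    class Chunk:
        content: Content
        # Assumption: the documented <factory> default is modelled as a UUID.
        id: str = field(default_factory=lambda: str(uuid.uuid4()))
        index: int = 0
        metadata: Dict[str, Any] = field(default_factory=dict)
        embedding: Optional[Embedding] = None


def split_into_chunks(text: str, source: str, size: int) -> List[Chunk]:
    """Hypothetical splitter: cut `text` into pieces of at most `size` chars."""
    return [
        Chunk(
            content=Content(text=text[start:start + size]),
            index=position,  # position of the piece within the document
            metadata={"source": source},
        )
        for position, start in enumerate(range(0, len(text), size))
    ]


chunks = split_into_chunks("Beam pipelines process data.", source="intro.txt", size=10)

# A later stage would compute a real vector; this one is a placeholder.
embedded = dataclasses.replace(
    chunks[0], embedding=Embedding(dense_embedding=[0.0, 1.0])
)
```

Using dataclasses.replace returns a copy that keeps the same id, index, and metadata, so the original chunk is left without an embedding and the embedded copy can still be matched back to it by id.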