Retriever Guide¶
Cogent provides a comprehensive retrieval system with multiple strategies for different use cases. This guide covers all available retrievers and when to use each.
Unified API¶
All retrievers share a unified retrieve() API with optional scoring:
# Get documents only (default)
docs = await retriever.retrieve("query", k=5)
# Get documents with relevance scores
results = await retriever.retrieve("query", k=5, include_scores=True)
for r in results:
    print(f"{r.score:.3f}: {r.document.text[:50]}")
# With metadata filter
results = await retriever.retrieve(
    "query",
    k=10,
    filter={"category": "docs"},
    include_scores=True,
)
# Retriever-specific args (e.g., TimeBasedRetriever)
results = await retriever.retrieve(
    "recent news",
    k=5,
    time_range=TimeRange.last_days(30),
    include_scores=True,
)
Overview¶
| Category | Retriever | Best For |
|---|---|---|
| Core | DenseRetriever | Semantic similarity search |
| | BM25Retriever | Keyword/lexical matching |
| | EnsembleRetriever | Combining multiple retrievers (dense + sparse) |
| | HybridRetriever | Metadata filtering + content search |
| Contextual | ParentDocumentRetriever | Precise chunks → full context |
| | SentenceWindowRetriever | Sentence-level → paragraph context |
| LLM-Powered | SummaryRetriever | Document summaries |
| | TreeRetriever | Hierarchical summary tree |
| | KeywordTableRetriever | Keyword extraction + lookup |
| | KnowledgeGraphRetriever | Entity-based retrieval |
| | SelfQueryRetriever | Natural language → filters |
| Specialized | HierarchicalRetriever | Structured docs (markdown/html) |
| | TimeBasedRetriever | Recency-aware retrieval |
| | MultiRepresentationRetriever | Multiple embeddings per doc |
Core Retrievers¶
DenseRetriever¶
Semantic search using vector embeddings. The most common retriever for general RAG applications.
from cogent.retriever import DenseRetriever
from cogent.vectorstore import VectorStore
# Create vectorstore and retriever
vectorstore = VectorStore(embeddings=embeddings)
await vectorstore.add_texts([
    "Python is a programming language",
    "Machine learning uses algorithms",
    "Neural networks learn from data",
])
retriever = DenseRetriever(vectorstore)
results = await retriever.retrieve("AI and deep learning", k=2)
When to use:
- General semantic search
- Finding conceptually similar content
- When exact keyword matching isn't required
BM25Retriever¶
Lexical retrieval using the BM25 algorithm. Fast, interpretable, and excellent for keyword queries.
from cogent.retriever import BM25Retriever
from cogent.vectorstore import Document
# Create documents
documents = [
    Document(text="Python programming tutorial", metadata={"type": "tutorial"}),
    Document(text="JavaScript web development", metadata={"type": "tutorial"}),
    Document(text="Machine learning with Python", metadata={"type": "guide"}),
]
# Create BM25 retriever with documents
retriever = BM25Retriever(documents, k1=1.5, b=0.75)
# Or add documents later
retriever = BM25Retriever()
retriever.add_documents(documents)
# Keyword-based search
results = await retriever.retrieve("Python tutorial", k=2)
When to use:
- Exact keyword matching is important
- Domain-specific terminology
- Fast, interpretable results needed
- No embedding model available
HybridRetriever¶
Combines metadata search with content search. Wraps any retriever and boosts/filters by metadata fields.
from cogent.retriever import HybridRetriever, DenseRetriever, MetadataMatchMode
# Wrap any content retriever
content_retriever = DenseRetriever(vectorstore)
hybrid = HybridRetriever(
    retriever=content_retriever,
    metadata_fields=["category", "author", "department"],
    metadata_weight=0.3,  # 30% from metadata match
    content_weight=0.7,   # 70% from content match
    mode=MetadataMatchMode.BOOST,  # or ALL, ANY
)
# Query searches both metadata and content
results = await hybrid.retrieve("machine learning best practices", k=5)
# Each result has enriched metadata
for r in results:
    print(f"Content score: {r.metadata['content_score']}")
    print(f"Metadata score: {r.metadata['metadata_score']}")
Matching modes:
- BOOST: Metadata matches increase score (no filtering)
- ALL: Only return docs matching ALL metadata terms
- ANY: Return docs matching ANY metadata term
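To make the modes concrete, here is a minimal sketch of the weighted combination (illustrative only; the hybrid_score helper is not part of the Cogent API, and the real implementation may differ):
def hybrid_score(content_score, metadata_score,
                 content_weight=0.7, metadata_weight=0.3, mode="boost"):
    """Combine content and metadata scores; None means filtered out."""
    if mode == "all" and metadata_score < 1.0:   # ALL: every metadata term must match
        return None
    if mode == "any" and metadata_score == 0.0:  # ANY: at least one term must match
        return None
    # BOOST (and results surviving ALL/ANY): weighted combination
    return content_weight * content_score + metadata_weight * metadata_score

hybrid_score(0.8, 0.5)              # 0.71: metadata match boosts the doc
hybrid_score(0.8, 0.0, mode="any")  # None: filtered, no metadata term matched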
When to use:
- Documents have rich metadata (author, category, date)
- Users search by both content and attributes
- Want to boost relevant metadata matches
EnsembleRetriever¶
Combine any number of retrievers with configurable fusion strategies.
from cogent.retriever import (
    EnsembleRetriever,
    DenseRetriever,
    BM25Retriever,
)
# Combine multiple retrievers
ensemble = EnsembleRetriever(
    retrievers=[
        DenseRetriever(vectorstore_openai),  # OpenAI embeddings
        DenseRetriever(vectorstore_cohere),  # Cohere embeddings
        BM25Retriever(documents),  # Lexical
    ],
    weights=[0.4, 0.4, 0.2],
    fusion="rrf",  # or "linear", "max", "voting"
)
results = await ensemble.retrieve("query", k=10)
Fusion strategies:
- rrf (Reciprocal Rank Fusion): Best for diverse retrievers (default)
- linear: Weighted score combination
- max: Take highest score per document
- voting: Count how many retrievers found each doc
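For intuition: RRF scores each document by summed reciprocal ranks, so documents found by several retrievers rise to the top even when no single retriever ranks them first. A minimal sketch, assuming each retriever returns an ordered list of document IDs (k=60 is the conventional RRF smoothing constant, not a Cogent setting):
from collections import defaultdict

def rrf_fuse(rankings, weights, k=60):
    """Score each doc as the weighted sum of 1 / (k + rank) across retrievers."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

rrf_fuse([["a", "b", "c"], ["b", "c", "d"]], weights=[0.5, 0.5])
# ['b', 'c', 'a', 'd']: "b" appears in both rankings, so it wins
Because RRF works on ranks rather than raw scores, it stays robust when the combined retrievers score on different scales, which is why it suits diverse ensembles.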
Tip: The RAG capability accepts retrievers= directly and creates an EnsembleRetriever internally.
Contextual Retrievers¶
ParentDocumentRetriever¶
Index small chunks for precise matching, but return full parent documents for context. Solves the embedding dilution problem.
The Problem:
- Large chunks → embeddings average across many topics → imprecise matching
- Small chunks → focused embeddings → precise matching but missing context
The Solution: ParentDocumentRetriever indexes small chunks (for precise matching) but returns entire parent documents (for complete context). Best of both worlds.
from cogent.retriever import ParentDocumentRetriever
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    chunk_size=150,  # Small chunks for precise matching
    chunk_overlap=20,
)
# Add full documents (automatically chunked internally)
await retriever.add_documents(documents)
# Search finds chunks, returns parents
results = await retriever.retrieve("specific concept", k=3)
# Each result is a full document, not a chunk
Example:
# Document: 1000-char database performance guide covering:
#   - Connection pooling
#   - Query optimization
#   - Index strategies

# Without ParentDocumentRetriever:
#   Embedding averages ALL topics → diluted, score 0.35

# With ParentDocumentRetriever:
#   Chunk 1: Pure connection pooling content (150 chars)
#   Query: "connection pool timeouts"
#   Match: Focused embedding → score 0.58 ✓
#   Return: Full 1000-char parent document with complete context
When to use:
- Medium-sized documents (1-5 pages, ~1-10KB)
- LLM needs more context than a single chunk
- Documents have interconnected information
- You want precise matching with comprehensive results
When NOT to use:
- ❌ Very long documents (books, 50+ page PDFs)
- ❌ Documents exceeding LLM context window (100K+ tokens)
- ❌ When only specific excerpts are needed
Alternatives for very long documents:
- SentenceWindowRetriever - Returns chunk + configurable surrounding sentences
- DenseRetriever with larger chunks (500-1000 tokens)
- Hierarchical chunking with summaries
SentenceWindowRetriever¶
Index individual sentences, but return with surrounding context.
from cogent.retriever import SentenceWindowRetriever
retriever = SentenceWindowRetriever(
    vectorstore=vectorstore,
    window_size=2,  # 2 sentences before and after
)
await retriever.add_documents(documents)
# Precise sentence match with context
results = await retriever.retrieve("specific fact", k=3, include_scores=True)
for r in results:
    print(f"Matched: {r.metadata['matched_sentence']}")
    print(f"Context: {r.document.text}")  # Full window
When to use:
- Need precise sentence-level matching
- Want to return paragraph-level context
- Fact-checking or citation tasks
LLM-Powered Indexes¶
SummaryRetriever¶
Generate LLM summaries of documents for efficient high-level retrieval.
from cogent.retriever import SummaryRetriever
index = SummaryRetriever(
    llm=model,
    vectorstore=vectorstore,
    extract_entities=True,  # For knowledge graph
    extract_keywords=True,
)
await index.add_documents(long_documents)
# Search by summary
results = await index.retrieve("machine learning concepts", k=3)
# Access extracted entities for KG integration
for doc_id, summary in index.summaries.items():
    print(f"Keywords: {summary.keywords}")
    print(f"Entities: {summary.entities}")
When to use:
- Long documents that don't fit in embeddings well
- Need document-level topics quickly
- Building knowledge graphs from documents
TreeRetriever¶
Hierarchical tree of summaries for very large documents or corpora.
from cogent.retriever import TreeRetriever
index = TreeRetriever(
    llm=model,
    vectorstore=vectorstore,
    max_children=5,  # Children per node
    max_depth=3,     # Tree depth
)
await index.add_documents(very_large_documents)
# Efficient tree traversal
results = await index.retrieve("specific topic", k=5)
When to use:
- Very large documents (books, manuals)
- Corpus-level search across many documents
- When full indexing is too slow/expensive
KeywordTableRetriever¶
Extract keywords with LLM and build inverted index for fast lookup.
from cogent.retriever import KeywordTableRetriever
index = KeywordTableRetriever(
    llm=model,
    max_keywords_per_doc=10,
)
await index.add_documents(documents)
# Fast keyword-based lookup
results = await index.retrieve("Python machine learning", k=5)
# Access keyword table
print(index.keyword_table) # {"python": [doc_ids...], "ml": [...]}
When to use:
- Domain with specific terminology
- Fast keyword lookup needed
- Interpretable retrieval wanted
SelfQueryRetriever¶
LLM parses natural language queries into semantic search + metadata filters.
from cogent.retriever import SelfQueryRetriever, AttributeInfo
retriever = SelfQueryRetriever(
    vectorstore=vectorstore,
    llm=model,
    attribute_info=[
        AttributeInfo("category", "Document category", "string"),
        AttributeInfo("year", "Publication year", "integer"),
        AttributeInfo("author", "Author name", "string"),
    ],
)
# Natural language with implicit filters
results = await retriever.retrieve(
    "research papers about AI from 2024 by OpenAI"
)
# LLM extracts: semantic="AI research papers"
# filter={"year": 2024, "author": "OpenAI"}
When to use:
- Users query in natural language
- Documents have filterable metadata
- Want to combine semantic + structured search
Specialized Indexes¶
HierarchicalRetriever¶
Respect and leverage document structure (headers, sections).
from cogent.retriever import HierarchicalRetriever
index = HierarchicalRetriever(
    vectorstore=vectorstore,
    llm=model,
    structure_type="markdown",  # or "html"
    top_k_sections=3,
    chunks_per_section=3,
)
await index.add_documents(structured_docs)
# Find section first, then relevant chunks
results = await index.retrieve("installation", k=5)
for r in results:
    print(f"Section: {r.metadata['section_title']}")
    print(f"Path: {r.metadata['hierarchy_path']}")
When to use:
- Well-structured documents (docs, manuals, specs)
- Want to respect document organization
- Need section-level context
TimeBasedRetriever¶
Prioritize recent information with time-decay scoring.
from cogent.retriever import TimeBasedRetriever, TimeRange, DecayFunction
index = TimeBasedRetriever(
    vectorstore=vectorstore,
    decay_function=DecayFunction.EXPONENTIAL,
    decay_rate=0.01,  # Halve score every ~70 days
    auto_extract_timestamps=True,
)
await index.add_documents(news_articles)
# Recent docs score higher
results = await index.retrieve("market trends", k=5)
# Filter by time range
results = await index.retrieve(
    "company policy",
    time_range=TimeRange.last_days(30),
)
# Point-in-time query
results = await index.retrieve(
    "regulations",
    time_range=TimeRange.year(2023),
)
Decay functions:
- EXPONENTIAL: Smooth decay over time
- LINEAR: Linear decrease
- STEP: Full score within window, zero outside
- LOGARITHMIC: Slow initial decay
- NONE: No decay, just filtering
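Roughly, each decay function maps a document's age to a weight in [0, 1] that multiplies its relevance score. A hand-written sketch of plausible curves (Cogent's exact formulas may differ):
import math

def decay_weight(age_days, fn, rate=0.01, window_days=30.0):
    """Map document age (in days) to a score multiplier in [0, 1]."""
    if fn == "exponential":
        return math.exp(-rate * age_days)  # half-life = ln(2)/rate, ~69 days at 0.01
    if fn == "linear":
        return max(0.0, 1.0 - rate * age_days)
    if fn == "step":
        return 1.0 if age_days <= window_days else 0.0
    if fn == "logarithmic":
        return 1.0 / (1.0 + rate * math.log1p(age_days))
    return 1.0  # NONE: no decay, time_range filtering only

[round(decay_weight(d, "exponential"), 3) for d in (0, 30, 70, 365)]
# [1.0, 0.741, 0.497, 0.026]
This also explains the decay_rate=0.01 comment above: under exponential decay, a score halves every ln(2)/0.01 ≈ 69 days.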
When to use:
- News, articles, changelogs
- Evolving knowledge bases
- Time-sensitive information
MultiRepresentationRetriever¶
Store multiple embeddings per document for diverse query handling.
from cogent.retriever import MultiRepresentationRetriever, QueryType
index = MultiRepresentationRetriever(
    vectorstore=vectorstore,
    llm=model,
    representations=["original", "summary", "detailed", "questions"],
)
await index.add_documents(documents)
# Auto-detect query type
results = await index.retrieve("What is machine learning?")
# Force specific representation
results = await index.retrieve(
    "backpropagation gradient calculation",
    query_type=QueryType.SPECIFIC,  # Uses detailed representation
)
# Search all and fuse
results = await index.retrieve(
    "AI applications",
    search_all=True,
)
Representations:
- original: Raw document embedding
- summary: Conceptual summary
- detailed: Technical details
- keywords: Key terms
- questions: Hypothetical Q&A
- entities: Named entities
When to use:
- Diverse query styles expected
- Technical/specialized domains
- Want maximum recall
Rerankers¶
Rerankers improve retrieval quality by re-scoring initial results.
from cogent.retriever import (
    DenseRetriever,
    CrossEncoderReranker,
    CohereReranker,
    LLMReranker,
)
# Initial retrieval
retriever = DenseRetriever(vectorstore)
initial_docs = await retriever.retrieve(query, k=20) # Get documents
# Rerank with cross-encoder (local)
reranker = CrossEncoderReranker(model_name="cross-encoder/ms-marco-MiniLM-L-6-v2")
reranked = await reranker.rerank(query, initial_docs, top_n=5)
# Or Cohere Rerank API
reranker = CohereReranker(api_key="...")
reranked = await reranker.rerank(query, initial_docs, top_n=5)
# Or any LLM
reranker = LLMReranker(llm=model)
reranked = await reranker.rerank(query, initial_docs, top_n=5)
Available rerankers:
- CrossEncoderReranker: Local cross-encoder models
- FlashRankReranker: Lightweight, fast local reranker
- CohereReranker: Cohere Rerank API
- LLMReranker: Any LLM for pointwise scoring
- ListwiseLLMReranker: LLM ranks all docs at once
Utilities¶
Fusion Functions¶
from cogent.retriever import fuse_results, FusionStrategy
# Fuse results from multiple retrievers
fused = fuse_results(
    [results_1, results_2, results_3],
    strategy=FusionStrategy.RRF,
    weights=[0.5, 0.3, 0.2],
    k=10,
)
Score Normalization¶
from cogent.retriever import normalize_scores
# Normalize scores to 0-1 range
normalized = normalize_scores(results)
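Assuming min-max scaling (which the 0-1 range suggests; other schemes exist), the operation on raw floats looks like this hand-written sketch:
scores = [2.1, 0.4, 1.3]
lo, hi = min(scores), max(scores)
[(s - lo) / (hi - lo) for s in scores]  # [1.0, 0.0, 0.529...]
Normalizing is most useful before linear fusion, where raw scores from different retrievers would otherwise be incomparable.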
Deduplication¶
from cogent.retriever import deduplicate_results
# Remove duplicate documents
unique = deduplicate_results(results, by="content") # or "id"
Citations and Formatting¶
For RAG applications, use these utilities to prepare results for LLM prompts:
from cogent.retriever import (
    add_citations,
    format_context,
    format_citations_reference,
    filter_by_score,
    top_k,
)
# Retrieve results
results = await retriever.retrieve(query, k=10, include_scores=True)
# Filter low-quality results
results = filter_by_score(results, min_score=0.5)
results = top_k(results, k=5)
# Add citation markers «1», «2», etc.
results = add_citations(results)
# results[0].metadata["citation"] == "«1»"
# Format as context string for LLM prompt
context = format_context(results)
# Output:
# «1» [Source: doc.pdf]
# This is the first chunk of text...
#
# ---
#
# «2» [Source: other.pdf]
# This is the second chunk...
# Generate citations reference section
reference = format_citations_reference(results)
# Output:
# Sources:
# «1» doc.pdf: This is a preview of the first document...
# «2» other.pdf: This is a preview of the second...
Example RAG prompt construction:
query = "What are the key findings?"
results = await retriever.retrieve(query, k=5, include_scores=True)
results = filter_by_score(results, min_score=0.5)
results = add_citations(results)
context = format_context(results)
prompt = f"""Based on the following context, answer the question.
Use citation markers like «1» to reference sources.
Context:
{context}
Question: {query}
Answer:"""
Choosing a Retriever¶
┌─────────────────────────────────────────────────────────────┐
│                    What's your use case?                    │
└─────────────────────────────────────────────────────────────┘
                              │
            ┌─────────────────┼─────────────────┐
            ▼                 ▼                 ▼
       General RAG       Specialized        Advanced
            │                 │                 │
            ▼                 │                 │
   ┌─────────────────┐        │                 │
   │ HybridRetriever │◄───────┤                 │
   │    (default)    │        │                 │
   └─────────────────┘        │                 │
                              ▼                 │
                 ┌─────────────────────────┐    │
                 │ Time-sensitive?         │    │
                 │ → TimeBasedRetriever    │    │
                 ├─────────────────────────┤    │
                 │ Structured docs?        │    │
                 │ → HierarchicalRetriever │    │
                 ├─────────────────────────┤    │
                 │ Need full context?      │    │
                 │ → ParentDocument        │    │
                 └─────────────────────────┘    │
                                                ▼
                             ┌────────────────────────────────────┐
                             │ Multiple embedding models?         │
                             │ → EnsembleRetriever                │
                             ├────────────────────────────────────┤
                             │ Natural language filters?          │
                             │ → SelfQueryRetriever               │
                             ├────────────────────────────────────┤
                             │ Very long documents?               │
                             │ → SummaryRetriever / TreeRetriever │
                             └────────────────────────────────────┘
Performance Tips¶
- Start with EnsembleRetriever (dense + sparse) - Best default for most cases
- Use rerankers - Cheap way to improve quality
- Retrieve more, rerank less - Get top 20-50, rerank to top 5 (see the sketch after this list)
- Cache embeddings - Reuse for similar queries
- Batch operations - Add documents in batches
- Add HybridRetriever for metadata - When you have structured metadata to filter/boost
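Tips 2 and 3 combined into one pipeline, using the classes from the Rerankers section (vectorstore and query are assumed from earlier examples):
from cogent.retriever import DenseRetriever, CrossEncoderReranker

retriever = DenseRetriever(vectorstore)
reranker = CrossEncoderReranker(model_name="cross-encoder/ms-marco-MiniLM-L-6-v2")

# Cast a wide, cheap net first, then spend reranker compute on a small set
candidates = await retriever.retrieve(query, k=30)
final = await reranker.rerank(query, candidates, top_n=5)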