Retriever Guide

Cogent provides a comprehensive retrieval system with multiple strategies for different use cases. This guide covers all available retrievers and when to use each.

Unified API

All retrievers share a unified retrieve() API with optional scoring:

# Get documents only (default)
docs = await retriever.retrieve("query", k=5)

# Get documents with relevance scores
results = await retriever.retrieve("query", k=5, include_scores=True)
for r in results:
    print(f"{r.score:.3f}: {r.document.text[:50]}")

# With metadata filter
results = await retriever.retrieve(
    "query",
    k=10,
    filter={"category": "docs"},
    include_scores=True,
)

# Retriever-specific args (e.g., TimeBasedRetriever)
results = await retriever.retrieve(
    "recent news",
    k=5,
    time_range=TimeRange.last_days(30),
    include_scores=True,
)

Overview

Category        Retriever                      Best For
Core            DenseRetriever                 Semantic similarity search
                BM25Retriever                  Keyword/lexical matching
                EnsembleRetriever              Combining multiple retrievers (dense + sparse)
                HybridRetriever                Metadata filtering + content search
Contextual      ParentDocumentRetriever        Precise chunks → full context
                SentenceWindowRetriever        Sentence-level → paragraph context
LLM-Powered     SummaryRetriever               Document summaries
                TreeRetriever                  Hierarchical summary tree
                KeywordTableRetriever          Keyword extraction + lookup
                KnowledgeGraphRetriever        Entity-based retrieval
                SelfQueryRetriever             Natural language → filters
Specialized     HierarchicalRetriever          Structured docs (markdown/html)
                TimeBasedRetriever             Recency-aware retrieval
                MultiRepresentationRetriever   Multiple embeddings per doc

Core Retrievers

DenseRetriever

Semantic search using vector embeddings. The most common retriever for general RAG applications.

from cogent.retriever import DenseRetriever
from cogent.vectorstore import VectorStore

# Create vectorstore and retriever
vectorstore = VectorStore(embeddings=embeddings)
await vectorstore.add_texts([
    "Python is a programming language",
    "Machine learning uses algorithms",
    "Neural networks learn from data",
])

retriever = DenseRetriever(vectorstore)
results = await retriever.retrieve("AI and deep learning", k=2)

When to use:
- General semantic search
- Finding conceptually similar content
- When exact keyword matching isn't required


BM25Retriever

Lexical retrieval using the BM25 algorithm. Fast, interpretable, and excellent for keyword queries.

from cogent.retriever import BM25Retriever
from cogent.vectorstore import Document

# Create documents
documents = [
    Document(text="Python programming tutorial", metadata={"type": "tutorial"}),
    Document(text="JavaScript web development", metadata={"type": "tutorial"}),
    Document(text="Machine learning with Python", metadata={"type": "guide"}),
]

# Create BM25 retriever with documents
retriever = BM25Retriever(documents, k1=1.5, b=0.75)

# Or add documents later
retriever = BM25Retriever()
retriever.add_documents(documents)

# Keyword-based search
results = await retriever.retrieve("Python tutorial", k=2)

When to use:
- Exact keyword matching is important
- Domain-specific terminology
- Fast, interpretable results needed
- No embedding model available
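
For intuition about the k1 and b parameters, here is a self-contained BM25 sketch over pre-tokenized documents. This is illustrative only, not Cogent's implementation; real BM25 engines add caching, proper tokenization, and incremental indexing.

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """docs: list of token lists. Returns one BM25 score per document.

    k1 controls term-frequency saturation; b controls length normalization.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for doc in docs:
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in docs if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            tf = doc.count(term)
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores


docs = [["python", "tutorial"], ["java", "guide"], ["python", "python", "notes"]]
scores = bm25_scores(["python"], docs)
# Higher term frequency wins: docs[2] outscores docs[0]; docs[1] scores 0
```

Raising b toward 1.0 penalizes long documents more aggressively; raising k1 lets repeated terms keep adding score before saturating.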


HybridRetriever

Combines metadata search with content search. Wraps any retriever and boosts/filters by metadata fields.

from cogent.retriever import HybridRetriever, DenseRetriever, MetadataMatchMode

# Wrap any content retriever
content_retriever = DenseRetriever(vectorstore)

hybrid = HybridRetriever(
    retriever=content_retriever,
    metadata_fields=["category", "author", "department"],
    metadata_weight=0.3,   # 30% from metadata match
    content_weight=0.7,    # 70% from content match
    mode=MetadataMatchMode.BOOST,  # or ALL, ANY
)

# Query searches both metadata and content
results = await hybrid.retrieve("machine learning best practices", k=5)

# Each result has enriched metadata
for r in results:
    print(f"Content score: {r.metadata['content_score']}")
    print(f"Metadata score: {r.metadata['metadata_score']}")

Matching modes:
- BOOST: Metadata matches increase score (no filtering)
- ALL: Only return docs matching ALL metadata terms
- ANY: Return docs matching ANY metadata term
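
To make the three modes concrete, here is a hedged pure-Python sketch of how a content score and a metadata score might be combined. The function name and the normalization (fraction of metadata terms matched) are assumptions for illustration, not Cogent's actual internals:

```python
def combine_scores(content_score, matched_terms, total_terms, mode,
                   content_weight=0.7, metadata_weight=0.3):
    """Hypothetical HybridRetriever-style scoring.

    Returns None when a filtering mode excludes the document.
    """
    # ALL: document must match every metadata term
    if mode == "ALL" and matched_terms < total_terms:
        return None
    # ANY: document must match at least one metadata term
    if mode == "ANY" and matched_terms == 0:
        return None
    # Metadata score = fraction of query terms found in metadata fields
    metadata_score = matched_terms / total_terms if total_terms else 0.0
    # BOOST never filters; it only blends the two scores
    return content_weight * content_score + metadata_weight * metadata_score


# A doc matching 1 of 2 metadata terms survives BOOST but is dropped by ALL
print(round(combine_scores(0.8, 1, 2, "BOOST"), 2))  # 0.71
print(combine_scores(0.8, 1, 2, "ALL"))              # None
```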

When to use:
- Documents have rich metadata (author, category, date)
- Users search by both content and attributes
- Want to boost relevant metadata matches


EnsembleRetriever

Combine any number of retrievers with configurable fusion strategies.

from cogent.retriever import (
    EnsembleRetriever,
    DenseRetriever,
    BM25Retriever,
)

# Combine multiple retrievers
ensemble = EnsembleRetriever(
    retrievers=[
        DenseRetriever(vectorstore_openai),    # OpenAI embeddings
        DenseRetriever(vectorstore_cohere),    # Cohere embeddings
        BM25Retriever(documents),              # Lexical
    ],
    weights=[0.4, 0.4, 0.2],
    fusion="rrf",  # or "linear", "max", "voting"
)

results = await ensemble.retrieve("query", k=10)

Fusion strategies:
- rrf (Reciprocal Rank Fusion): Best for diverse retrievers (default)
- linear: Weighted score combination
- max: Take highest score per document
- voting: Count how many retrievers found each doc
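
To make RRF concrete, here is a minimal standalone sketch of reciprocal rank fusion (unweighted, using the conventional constant k=60); Cogent's internal implementation may differ:

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids via Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists containing it,
    so agreement between retrievers outweighs any single high score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# "a" is ranked first by two of three retrievers, so it wins overall
fused = rrf_fuse([["a", "b"], ["a", "c"], ["b", "a"]])
print(fused)  # ['a', 'b', 'c']
```

Because RRF only uses ranks, it never needs to reconcile incompatible score scales (e.g. cosine similarity vs. BM25), which is why it is the default for diverse retrievers.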

Tip: The RAG capability accepts retrievers= directly and creates an EnsembleRetriever internally.


Contextual Retrievers

ParentDocumentRetriever

Index small chunks for precise matching, but return full parent documents for context. Solves the embedding dilution problem.

The Problem:
- Large chunks → embeddings average across many topics → imprecise matching
- Small chunks → focused embeddings → precise matching but missing context

The Solution: ParentDocumentRetriever indexes small chunks (for precise matching) but returns entire parent documents (for complete context). Best of both worlds.

from cogent.retriever import ParentDocumentRetriever

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    chunk_size=150,     # Small chunks for precise matching
    chunk_overlap=20,
)

# Add full documents (automatically chunked internally)
await retriever.add_documents(documents)

# Search finds chunks, returns parents
results = await retriever.retrieve("specific concept", k=3)
# Each result is a full document, not a chunk

Example:

# Document: 1000-char database performance guide covering:
#   - Connection pooling
#   - Query optimization  
#   - Index strategies

# Without ParentDocumentRetriever:
# Embedding averages ALL topics → diluted, score 0.35

# With ParentDocumentRetriever:
# Chunk 1: Pure connection pooling content (150 chars)
# Query: "connection pool timeouts"
# Match: Focused embedding → score 0.58 ✓
# Return: Full 1000-char parent document with complete context

When to use:
- Medium-sized documents (1-5 pages, ~1-10KB)
- LLM needs more context than a single chunk
- Documents have interconnected information
- You want precise matching with comprehensive results

When NOT to use:
- ❌ Very long documents (books, 50+ page PDFs)
- ❌ Documents exceeding the LLM context window (100K+ tokens)
- ❌ When only specific excerpts are needed

Alternatives for very long documents:
- SentenceWindowRetriever: returns the matched chunk plus configurable surrounding sentences
- DenseRetriever with larger chunks (500-1000 tokens)
- Hierarchical chunking with summaries


SentenceWindowRetriever

Index individual sentences, but return with surrounding context.

from cogent.retriever import SentenceWindowRetriever

retriever = SentenceWindowRetriever(
    vectorstore=vectorstore,
    window_size=2,  # 2 sentences before and after
)

await retriever.add_documents(documents)

# Precise sentence match with context
results = await retriever.retrieve("specific fact", k=3, include_scores=True)
for r in results:
    print(f"Matched: {r.metadata['matched_sentence']}")
    print(f"Context: {r.document.text}")  # Full window

When to use:
- Need precise sentence-level matching
- Want to return paragraph-level context
- Fact-checking or citation tasks
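
The windowing step itself is simple; here is a hedged sketch of what window_size=2 means, assuming the document has already been split into sentences (which in practice requires a sentence tokenizer):

```python
def window_around(sentences, match_index, window_size=2):
    """Return the matched sentence plus window_size neighbors on each side."""
    lo = max(0, match_index - window_size)      # clamp at document start
    hi = match_index + window_size + 1          # slice end is exclusive
    return " ".join(sentences[lo:hi])


sentences = ["S1.", "S2.", "S3.", "S4.", "S5."]
print(window_around(sentences, 2, window_size=1))  # S2. S3. S4.
```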


LLM-Powered Indexes

SummaryRetriever

Generate LLM summaries of documents for efficient high-level retrieval.

from cogent.retriever import SummaryRetriever

index = SummaryRetriever(
    llm=model,
    vectorstore=vectorstore,
    extract_entities=True,   # For knowledge graph
    extract_keywords=True,
)

await index.add_documents(long_documents)

# Search by summary
results = await index.retrieve("machine learning concepts", k=3)

# Access extracted entities for KG integration
for doc_id, summary in index.summaries.items():
    print(f"Keywords: {summary.keywords}")
    print(f"Entities: {summary.entities}")

When to use:
- Long documents that don't fit in embeddings well
- Need document-level topics quickly
- Building knowledge graphs from documents


TreeRetriever

Hierarchical tree of summaries for very large documents or corpora.

from cogent.retriever import TreeRetriever

index = TreeRetriever(
    llm=model,
    vectorstore=vectorstore,
    max_children=5,      # Children per node
    max_depth=3,         # Tree depth
)

await index.add_documents(very_large_documents)

# Efficient tree traversal
results = await index.retrieve("specific topic", k=5)

When to use:
- Very large documents (books, manuals)
- Corpus-level search across many documents
- When full indexing is too slow/expensive


KeywordTableRetriever

Extract keywords with LLM and build inverted index for fast lookup.

from cogent.retriever import KeywordTableRetriever

index = KeywordTableRetriever(
    llm=model,
    max_keywords_per_doc=10,
)

await index.add_documents(documents)

# Fast keyword-based lookup
results = await index.retrieve("Python machine learning", k=5)

# Access keyword table
print(index.keyword_table)  # {"python": [doc_ids...], "ml": [...]}

When to use:
- Domain with specific terminology
- Fast keyword lookup needed
- Interpretable retrieval wanted
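
The inverted index behind this pattern is a plain dict. A minimal sketch, with the keyword extraction (the LLM's job) stubbed out as input data:

```python
def build_keyword_table(doc_keywords):
    """doc_keywords: {doc_id: [keywords]} as an LLM extractor might produce."""
    table = {}
    for doc_id, keywords in doc_keywords.items():
        for kw in keywords:
            # Normalize case so lookups are case-insensitive
            table.setdefault(kw.lower(), []).append(doc_id)
    return table


table = build_keyword_table({"doc1": ["Python", "ML"], "doc2": ["Python"]})
print(table)  # {'python': ['doc1', 'doc2'], 'ml': ['doc1']}

# Lookup is then a set union over the query's keywords
hits = set(table.get("python", [])) | set(table.get("ml", []))
```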


SelfQueryRetriever

LLM parses natural language queries into semantic search + metadata filters.

from cogent.retriever import SelfQueryRetriever, AttributeInfo

retriever = SelfQueryRetriever(
    vectorstore=vectorstore,
    llm=model,
    attribute_info=[
        AttributeInfo("category", "Document category", "string"),
        AttributeInfo("year", "Publication year", "integer"),
        AttributeInfo("author", "Author name", "string"),
    ],
)

# Natural language with implicit filters
results = await retriever.retrieve(
    "research papers about AI from 2024 by OpenAI"
)
# LLM extracts: semantic="AI research papers"
#              filter={"year": 2024, "author": "OpenAI"}

When to use:
- Users query in natural language
- Documents have filterable metadata
- Want to combine semantic + structured search


Specialized Indexes

HierarchicalRetriever

Respect and leverage document structure (headers, sections).

from cogent.retriever import HierarchicalRetriever

index = HierarchicalRetriever(
    vectorstore=vectorstore,
    llm=model,
    structure_type="markdown",  # or "html"
    top_k_sections=3,
    chunks_per_section=3,
)

await index.add_documents(structured_docs)

# Find section first, then relevant chunks
results = await index.retrieve("installation", k=5)
for r in results:
    print(f"Section: {r.metadata['section_title']}")
    print(f"Path: {r.metadata['hierarchy_path']}")

When to use:
- Well-structured documents (docs, manuals, specs)
- Want to respect document organization
- Need section-level context


TimeBasedRetriever

Prioritize recent information with time-decay scoring.

from cogent.retriever import TimeBasedRetriever, TimeRange, DecayFunction

index = TimeBasedRetriever(
    vectorstore=vectorstore,
    decay_function=DecayFunction.EXPONENTIAL,
    decay_rate=0.01,  # Halve score every ~70 days
    auto_extract_timestamps=True,
)

await index.add_documents(news_articles)

# Recent docs score higher
results = await index.retrieve("market trends", k=5)

# Filter by time range
results = await index.retrieve(
    "company policy",
    time_range=TimeRange.last_days(30),
)

# Point-in-time query
results = await index.retrieve(
    "regulations",
    time_range=TimeRange.year(2023),
)

Decay functions:
- EXPONENTIAL: Smooth decay over time
- LINEAR: Linear decrease
- STEP: Full score within window, zero outside
- LOGARITHMIC: Slow initial decay
- NONE: No decay, just filtering
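
Assuming the common form exp(-rate × age), EXPONENTIAL decay with decay_rate=0.01 gives a half-life of ln(2)/0.01 ≈ 69 days, which matches the "~70 days" comment above. The exact formula Cogent uses is an assumption here; this sketch just shows the arithmetic:

```python
import math

def exponential_decay(base_score, age_days, decay_rate=0.01):
    # Score shrinks by exp(-rate * age); half-life = ln(2) / rate days
    return base_score * math.exp(-decay_rate * age_days)


print(round(exponential_decay(1.0, 0), 3))     # 1.0
print(round(exponential_decay(1.0, 69.3), 3))  # 0.5
```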

When to use:
- News, articles, changelogs
- Evolving knowledge bases
- Time-sensitive information


MultiRepresentationRetriever

Store multiple embeddings per document for diverse query handling.

from cogent.retriever import MultiRepresentationRetriever, QueryType

index = MultiRepresentationRetriever(
    vectorstore=vectorstore,
    llm=model,
    representations=["original", "summary", "detailed", "questions"],
)

await index.add_documents(documents)

# Auto-detect query type
results = await index.retrieve("What is machine learning?")

# Force specific representation
results = await index.retrieve(
    "backpropagation gradient calculation",
    query_type=QueryType.SPECIFIC,  # Uses detailed representation
)

# Search all and fuse
results = await index.retrieve(
    "AI applications",
    search_all=True,
)

Representations:
- original: Raw document embedding
- summary: Conceptual summary
- detailed: Technical details
- keywords: Key terms
- questions: Hypothetical Q&A
- entities: Named entities

When to use:
- Diverse query styles expected
- Technical/specialized domains
- Want maximum recall


Rerankers

Rerankers improve retrieval quality by re-scoring initial results.

from cogent.retriever import (
    DenseRetriever,
    CrossEncoderReranker,
    CohereReranker,
    LLMReranker,
)

# Initial retrieval
retriever = DenseRetriever(vectorstore)
initial_docs = await retriever.retrieve(query, k=20)  # Get documents

# Rerank with cross-encoder (local)
reranker = CrossEncoderReranker(model_name="cross-encoder/ms-marco-MiniLM-L-6-v2")
reranked = await reranker.rerank(query, initial_docs, top_n=5)

# Or Cohere Rerank API
reranker = CohereReranker(api_key="...")
reranked = await reranker.rerank(query, initial_docs, top_n=5)

# Or any LLM
reranker = LLMReranker(llm=model)
reranked = await reranker.rerank(query, initial_docs, top_n=5)

Available rerankers:
- CrossEncoderReranker: Local cross-encoder models
- FlashRankReranker: Lightweight, fast local reranker
- CohereReranker: Cohere Rerank API
- LLMReranker: Any LLM for pointwise scoring
- ListwiseLLMReranker: LLM ranks all docs at once
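
Pointwise reranking is retriever-agnostic: score each (query, doc) pair, sort, truncate. A toy sketch, with simple term overlap standing in for the cross-encoder or LLM score:

```python
def pointwise_rerank(query, docs, score_fn, top_n=5):
    """Re-order docs by score_fn(query, doc) and keep the best top_n."""
    return sorted(docs, key=lambda d: score_fn(query, d), reverse=True)[:top_n]


# Toy relevance score: count of shared terms. A real reranker would call
# a cross-encoder model or an LLM here instead.
def overlap(query, doc):
    return len(set(query.split()) & set(doc.split()))


docs = ["python tutorial", "java guide", "python machine learning tutorial"]
print(pointwise_rerank("python machine learning", docs, overlap, top_n=2))
# ['python machine learning tutorial', 'python tutorial']
```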


Utilities

Fusion Functions

from cogent.retriever import fuse_results, FusionStrategy

# Fuse results from multiple retrievers
fused = fuse_results(
    [results_1, results_2, results_3],
    strategy=FusionStrategy.RRF,
    weights=[0.5, 0.3, 0.2],
    k=10,
)

Score Normalization

from cogent.retriever import normalize_scores

# Normalize scores to 0-1 range
normalized = normalize_scores(results)
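
If you need the same behavior outside the library, min-max normalization is a few lines. Note that normalize_scores may use a different scheme internally; this is an illustrative sketch, and the choice to map constant inputs to 1.0 is arbitrary:

```python
def min_max_normalize(scores):
    """Map raw scores linearly onto [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)  # all-equal inputs: arbitrary convention
    return [(s - lo) / (hi - lo) for s in scores]


print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```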

Deduplication

from cogent.retriever import deduplicate_results

# Remove duplicate documents
unique = deduplicate_results(results, by="content")  # or "id"

Citations and Formatting

For RAG applications, use these utilities to prepare results for LLM prompts:

from cogent.retriever import (
    add_citations,
    format_context,
    format_citations_reference,
    filter_by_score,
    top_k,
)

# Retrieve results
results = await retriever.retrieve(query, k=10, include_scores=True)

# Filter low-quality results
results = filter_by_score(results, min_score=0.5)
results = top_k(results, k=5)

# Add citation markers «1», «2», etc.
results = add_citations(results)
# results[0].metadata["citation"] == "«1»"

# Format as context string for LLM prompt
context = format_context(results)
# Output:
# «1» [Source: doc.pdf]
# This is the first chunk of text...
#
# ---
#
# «2» [Source: other.pdf]
# This is the second chunk...

# Generate citations reference section
reference = format_citations_reference(results)
# Output:
# Sources:
# «1» doc.pdf: This is a preview of the first document...
# «2» other.pdf: This is a preview of the second...

Example RAG prompt construction:

query = "What are the key findings?"
results = await retriever.retrieve(query, k=5, include_scores=True)
results = filter_by_score(results, min_score=0.5)
results = add_citations(results)
context = format_context(results)

prompt = f"""Based on the following context, answer the question.
Use citation markers like «1» to reference sources.

Context:
{context}

Question: {query}

Answer:"""

Choosing a Retriever

What's your use case?

General RAG:
- HybridRetriever (default)

Specialized:
- Time-sensitive? → TimeBasedRetriever
- Structured docs? → HierarchicalRetriever
- Need full context? → ParentDocumentRetriever

Advanced:
- Multiple embedding models? → EnsembleRetriever
- Natural language filters? → SelfQueryRetriever
- Very long documents? → SummaryRetriever / TreeRetriever

Performance Tips

  1. Start with EnsembleRetriever (dense + sparse) - Best default for most cases
  2. Use rerankers - Cheap way to improve quality
  3. Retrieve more, rerank less - Get top 20-50, rerank to top 5
  4. Cache embeddings - Reuse for similar queries
  5. Batch operations - Add documents in batches
  6. Add HybridRetriever for metadata - When you have structured metadata to filter/boost