Memory Module

The cogent.memory module provides a memory-first architecture where memory is a first-class citizen that can be wired to any agent.

Overview

Memory enables agents to:

- Persist knowledge across conversations
- Share state between agents
- Perform semantic search over memories
- Scope memories by user, team, or conversation
- Keep context bounded in long conversations via ACC (Agentic Context Compression)

from cogent import Agent
from cogent.memory import Memory

# Basic in-memory storage
memory = Memory()
await memory.remember("user_preference", "dark mode")
value = await memory.recall("user_preference")

# Wire to an agent
agent = Agent(name="assistant", model=model, memory=memory)

# Memory with ACC enabled (prevents drift in long conversations)
memory = Memory(acc=True)

4-Layer Memory Architecture

Cogent provides four distinct memory mechanisms that work together:

| Layer | Parameter | Mechanism | When to Use |
|-------|-----------|-----------|-------------|
| 1 | conversation=True | Automatic message concatenation | Short sessions, full context needed |
| 2 | acc=True | Agentic Context Compression | Long conversations, prevent drift |
| 3 | memory=True | Explicit remember/recall tools | Persistent knowledge, semantic search |
| 4 | cache=True | Semantic tool output cache | Expensive/slow tool calls |

Layer 1: Conversation History (Automatic)

Raw message concatenation: all previous messages are automatically sent to the LLM:

agent = Agent(name="Assistant", model="gpt4")  # conversation=True by default

await agent.run("Hi, I'm Alice", thread_id="session1")
await agent.run("What's my name?", thread_id="session1")

# Internally sends to LLM:
# [
#   {"role": "user", "content": "Hi, I'm Alice"},
#   {"role": "assistant", "content": "Hello Alice!"},
#   {"role": "user", "content": "What's my name?"}  # <-- Full history
# ]

Characteristics:

- ✅ Automatic - no tools needed, no LLM decision required
- ✅ Works immediately - LLM sees full context
- ✅ Perfect recall - nothing lost from conversation
- ❌ Grows unbounded - context window fills up over time
- ❌ No semantic search - just chronological concatenation
- ❌ Session-bound - lost when thread ends

When to use: Short sessions where full context fits in window.

Layer 2: ACC (Agentic Context Compression)

Compresses growing conversation history into structured constraints and entities:

agent = Agent(name="Assistant", model="gpt4", acc=True)

# After many messages, ACC compresses into:
# Constraints: ["User prefers dark mode", "User timezone is EST", "Project deadline: March 1"]
# Entities: ["Alice (user)", "Project Alpha (active)", "Bob (team lead)"]
# Only compressed context sent to LLM, not full 50-message history

Characteristics:

- ✅ Bounded context - prevents window overflow
- ✅ Automatic - no LLM tool calls needed
- ✅ Prevents drift - maintains key facts across long sessions
- ✅ Structured - constraints + entities format
- ❌ Lossy - some details discarded during compression

When to use: Long conversations that exceed context window.

Layer 3: Long-Term Memory (Explicit Tools)

LLM must explicitly call remember(), recall(), forget() tools:

agent = Agent(name="Assistant", model="gpt4", memory=True)

# Agent gets memory tools automatically
# LLM decides when to use them:

await agent.run("Remember that I prefer dark mode")
# LLM calls: remember(key="user_preference", value="dark mode")

await agent.run("What's my UI preference?")  
# LLM calls: recall(query="user preference")
# Returns: "dark mode"

# Next session (different thread_id)
await agent.run("What do I prefer?", thread_id="new_session")
# LLM can still recall: "dark mode" (survives sessions!)

Characteristics:

- ✅ Semantic search - finds relevant memories by meaning
- ✅ Persistent - survives across sessions/threads
- ✅ Scoped - can isolate per-user, per-team, etc.
- ✅ Selective - LLM stores only important info
- ❌ Requires LLM decision - LLM must choose to call tools
- ❌ Not automatic - won't be used unless the LLM decides to
- ❌ Tool call overhead - adds latency when used

When to use: Persistent knowledge across sessions, user preferences, facts that should survive.

Layer 4: Semantic Cache (Tool Outputs)

Caches tool results by semantic similarity to avoid redundant calls:

agent = Agent(name="Assistant", model="gpt4", cache=True)

# First call
await agent.run("Search for Python tutorials")  
# Calls search_tool(), caches result

# Similar query (different wording)
await agent.run("Find Python learning resources")  
# Cache hit! Returns previous result without calling search_tool()

Characteristics:

- ✅ Speeds up repeated queries - avoids slow/expensive tool calls
- ✅ Semantic matching - recognizes similar queries
- ✅ Transparent - LLM doesn't know cache is used
- ❌ Can return stale data - cached results may be outdated
- ❌ Storage overhead - caches all tool outputs

When to use: Expensive API calls, slow database queries, rate-limited services.


Layer Comparison: Automatic vs Explicit

The key distinction is who decides to use memory:

# Layer 1: Conversation history (AUTOMATIC)
agent = Agent(name="Assistant", model="gpt4")  # conversation=True default
await agent.run("I'm Alice", thread_id="s1")
await agent.run("My name?", thread_id="s1")  
# ✅ Works! History automatically sent to LLM
# No tool calls, no LLM decision needed

# Layer 3: Memory tools (EXPLICIT - LLM decides)
agent = Agent(name="Assistant", model="gpt4", memory=True)
await agent.run("Remember I'm Alice")  
# ⚠️ LLM may or may not call remember() - it decides

await agent.run("My name?")            
# ⚠️ LLM may or may not call recall() - it decides
# If LLM doesn't call the tool, memory isn't used!

Recommendation: Use Layer 1 (conversation) for short-term context, Layer 3 (memory tools) for long-term persistent knowledge.


Core Classes

Memory

The main memory interface with simple remember/recall API:

from cogent.memory import Memory

memory = Memory()

# Remember a value
await memory.remember("key", "value")
await memory.remember("user.name", "Alice")
await memory.remember("conversation.topic", "AI research")

# Recall a value
name = await memory.recall("user.name")  # "Alice"
missing = await memory.recall("unknown")  # None
missing = await memory.recall("unknown", default="N/A")  # "N/A"

# Check existence
exists = await memory.exists("user.name")  # True

# Delete a memory
await memory.forget("user.name")

# List all keys
keys = await memory.list_keys()  # ["conversation.topic"]

# Clear all memories
await memory.clear()

Scoped Memory

Create isolated memory views for users, teams, or conversations:

from cogent.memory import Memory

memory = Memory()

# Create scoped views
user_mem = memory.scoped("user:alice")
team_mem = memory.scoped("team:research")
conv_mem = memory.scoped("conv:thread-123")

# Each scope is isolated
await user_mem.remember("preference", "compact")
await team_mem.remember("preference", "detailed")

user_pref = await user_mem.recall("preference")  # "compact"
team_pref = await team_mem.recall("preference")  # "detailed"

# Scopes can be nested
project_mem = team_mem.scoped("project:alpha")
await project_mem.remember("status", "active")

Shared Memory Between Agents

Wire the same memory to multiple agents for shared knowledge:

from cogent import Agent
from cogent.memory import Memory

# Shared memory instance
shared = Memory()

# Both agents share the same memory
researcher = Agent(name="researcher", model=model, memory=shared)
writer = Agent(name="writer", model=model, memory=shared)

# Researcher stores findings
await shared.remember("findings", "Key insight: AI adoption is growing")

# Writer can access them
findings = await shared.recall("findings")

Storage Backends

InMemoryStore (Default)

Fast, no-persistence storage for development and testing:

from cogent.memory import Memory, InMemoryStore

# Default - uses InMemoryStore
memory = Memory()

# Explicit
memory = Memory(store=InMemoryStore())

SQLAlchemyStore

Persistent storage with SQLAlchemy 2.0 async support:

from cogent.memory import Memory, SQLAlchemyStore

# SQLite (local file)
store = SQLAlchemyStore("sqlite+aiosqlite:///./memory.db")
memory = Memory(store=store)

# PostgreSQL
store = SQLAlchemyStore(
    "postgresql+asyncpg://user:pass@localhost/db",
    pool_size=10,
)
memory = Memory(store=store)

# Initialize tables (run once)
await store.initialize()

# Cleanup
await store.close()

Context manager for cleanup:

async with SQLAlchemyStore("sqlite+aiosqlite:///./data.db") as store:
    memory = Memory(store=store)
    await memory.remember("key", "value")

RedisStore

Distributed cache with native TTL support:

from cogent.memory import Memory, RedisStore

store = RedisStore(
    url="redis://localhost:6379",
    prefix="myapp:",  # Key prefix
    default_ttl=3600,  # 1 hour default TTL
)
memory = Memory(store=store)

# With TTL per key
await memory.remember("session", {"user": "alice"}, ttl=1800)
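For intuition, TTL expiry can be sketched with monotonic timestamps (a toy illustration only; RedisStore gets TTL natively from Redis, enforced server-side):

```python
import time

class TTLDict:
    """Toy TTL store: entries expire ttl seconds after being set."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def set(self, key, value, ttl=None):
        expiry = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expiry)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expiry = entry
        if expiry is not None and time.monotonic() >= expiry:
            del self._data[key]  # lazy eviction on read
            return default
        return value

cache = TTLDict()
cache.set("session", {"user": "alice"}, ttl=1800)
print(cache.get("session"))  # {'user': 'alice'}
```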

Memory Key Search

Memory provides intelligent key search with three methods that automatically cascade:

1. Fuzzy Matching (Default - Fast & Free)

The default search method uses fuzzy string matching for instant, offline key discovery:

from cogent import Agent
from cogent.memory import Memory

# No special setup needed - fuzzy matching works out of the box
memory = Memory()

agent = Agent(name="assistant", model=model, memory=memory)

# Save memories
await agent.run("My name is Alice, I prefer dark mode, language is Python")

# Fuzzy matching finds similar keys instantly
await agent.run("What are my preferences?")
# → search_memories("preferences") finds "preferred_mode" and "preferred_language"
# Method: Fuzzy match (0.1ms, free, offline)

Benefits:

- ⚡ 2,800× faster than semantic search (0.1ms vs 280ms)
- 💰 Free - no API calls
- 🔌 Works offline - no network required
- 📊 62.5% accuracy - good enough for most use cases
- 🧹 Smart normalization - handles underscores, hyphens, word order

How it works:

# String normalization helps matching:
"preferred_mode" → "preferred mode"
"user_timezone" → "user timezone"
"notification-settings" → "notification settings"

# Fuzzy matching finds similarity:
Query: "preferences" → Matches: "preferred mode", "preferred language"
Query: "contact" → Matches: "email", "phone number"
Query: "settings" → Matches: "notification settings"
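Normalize-then-score can be approximated with the standard library; here `difflib` stands in for rapidfuzz purely to show the shape of the technique (a sketch, not cogent's actual matcher, and the function names are illustrative):

```python
import difflib

def normalize(key: str) -> str:
    # Underscores and hyphens become spaces so "preferred_mode" ~ "preferred mode"
    return key.replace("_", " ").replace("-", " ").lower()

def fuzzy_search(query: str, keys: list[str], cutoff: float = 0.4) -> list[str]:
    # Map normalized forms back to the original key names
    normalized = {normalize(k): k for k in keys}
    hits = difflib.get_close_matches(normalize(query), list(normalized), n=5, cutoff=cutoff)
    return [normalized[h] for h in hits]

keys = ["preferred_mode", "preferred_language", "email", "phone-number"]
print(fuzzy_search("preferences", keys))  # fuzzy hits, best match first
```

Note that pure string similarity cannot bridge a genuinely semantic gap (e.g. "contact" vs "email"); that is exactly the case the semantic fallback below exists for.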

2. Semantic Search (Optional Fallback)

Enable semantic search by adding a vectorstore (used when fuzzy matching unavailable):

from cogent import Agent
from cogent.memory import Memory
from cogent.vectorstore import VectorStore

# Add vectorstore for semantic fallback
memory = Memory(vectorstore=VectorStore())

agent = Agent(name="assistant", model=model, memory=memory)

When semantic search is used:

- Fuzzy matching library (rapidfuzz) not installed
- Fuzzy matching finds no matches (< 40% similarity)

Trade-offs:

- ✅ 75% accuracy - better than fuzzy (but only 12.5 points better)
- ❌ 280ms average - 2,800× slower than fuzzy
- ❌ Costs money - OpenAI API calls
- ❌ Requires network - API dependency

3. Keyword Search (Final Fallback)

Simple substring matching when all else fails:

# Query: "mode" → Matches keys containing "mode": "preferred_mode", "dark_mode"
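The final fallback is essentially a one-liner; a sketch of the idea (function name illustrative):

```python
def keyword_search(query: str, keys: list[str]) -> list[str]:
    # Case-insensitive substring match over stored key names
    q = query.lower()
    return [k for k in keys if q in k.lower()]

print(keyword_search("mode", ["preferred_mode", "dark_mode", "email"]))
# → ['preferred_mode', 'dark_mode']
```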

Installation

Recommended (fuzzy matching):

uv add rapidfuzz  # For fast, free fuzzy matching

Optional (semantic fallback):

from cogent.memory import Memory
from cogent.vectorstore import VectorStore

memory = Memory(vectorstore=VectorStore())  # Enables semantic fallback

Performance Comparison

| Method | Speed | Accuracy | Cost | Offline |
|--------|-------|----------|------|---------|
| Fuzzy | 0.1ms | 62.5% | Free | ✅ Yes |
| Semantic | 280ms | 75.0% | $$ (API) | ❌ No |
| Keyword | 0.1ms | ~30% | Free | ✅ Yes |

Recommendation: Use fuzzy matching (default) for 99% of use cases.

Example

See examples/basics/memory_semantic_search.py for a complete demo.

from cogent import Agent
from cogent.memory import Memory

memory = Memory()  # Fuzzy matching by default

agent = Agent(name="assistant", model="gpt-5.4", memory=memory)

# Save with specific key names
await memory.remember("preferred_mode", "dark")
await memory.remember("preferred_language", "Python")
await memory.remember("email", "alice@example.com")

# Agent finds them with fuzzy matching (instant!)
await agent.run("What are my preferences?")
# → search_memories("preferences") finds "preferred_mode" and "preferred_language"
# ⚡ 0.1ms, free, offline

await agent.run("How can I contact the user?")
# → search_memories("contact") finds "email"

Memory Tools

Memory automatically exposes tools to agents for autonomous memory management:

from cogent import Agent
from cogent.memory import Memory

# Memory is always agentic - tools auto-added
memory = Memory()

agent = Agent(
    name="assistant",
    model=model,
    memory=memory,
)

# Agent has 5 memory tools available:
# 1. remember(key, value) - Save facts to long-term memory
# 2. recall(key) - Retrieve specific facts
# 3. forget(key) - Remove facts
# 4. search_memories(query) - Search long-term facts (fuzzy matching by default)
# 5. search_conversation(query) - Search conversation history

# Agent can now use memory tools autonomously
result = await agent.run("Remember that my name is Alice")
result = await agent.run("What's my name?")

Available Tools

1. remember(key, value) - Save important facts

# Agent automatically calls when user shares information
await agent.run("My favorite language is Python")
# → Agent calls: remember("favorite_language", "Python")

2. recall(key) - Retrieve specific saved facts

await agent.run("What's my favorite language?")
# → Agent calls: recall("favorite_language")

3. forget(key) - Remove facts (when user requests)

await agent.run("Forget my favorite language")
# → Agent calls: forget("favorite_language")

4. search_memories(query, k=5) - Search long-term facts with intelligent matching

# Default: Fast fuzzy matching (0.1ms, free, offline)
memory = Memory()
await agent.run("What are my preferences?")
# → Agent calls: search_memories("preferences")
# → Finds: "preferred_mode", "preferred_language" via fuzzy matching

# Optional: Add vectorstore for semantic fallback
memory = Memory(vectorstore=VectorStore())
# → Uses fuzzy matching first, falls back to semantic if needed

5. search_conversation(query, max_results=5) - Search conversation history

# Critical for long conversations exceeding context window
await agent.run("What were the three projects I mentioned earlier?")
# → Agent calls: search_conversation("three projects")

When Tools Are Used

The agent's system prompt instructs it to:

  1. At conversation start → search_memories("user") to recall context
  2. When user shares info → remember(key, value) immediately
  3. When asked about something → search before saying "I don't know"
  4. For facts → search_memories(query) or recall(key)
  5. For past conversation → search_conversation(query)
  6. In long conversations → use search_conversation() to find earlier context

Shorthand - memory=True creates a Memory instance:

# Shorthand for Memory()
agent = Agent(name="assistant", model=model, memory=True)

Usage Patterns

Conversation History

from cogent.memory import Memory

memory = Memory()

async def chat(user_id: str, message: str) -> str:
    user_mem = memory.scoped(f"user:{user_id}")

    # Load history
    history = await user_mem.recall("history", default=[])
    history.append({"role": "user", "content": message})

    # Get response (using agent)
    response = await agent.run(message, history=history)

    # Save updated history
    history.append({"role": "assistant", "content": response})
    await user_mem.remember("history", history)

    return response

Team Knowledge Base

from cogent.memory import Memory, SQLAlchemyStore
from cogent.vectorstore import VectorStore

# Persistent team memory with search
team_memory = Memory(
    store=SQLAlchemyStore("sqlite+aiosqlite:///./team.db"),
    vectorstore=VectorStore(),
)

# Store team knowledge
await team_memory.remember("policy:vacation", "Employees get 20 days PTO")
await team_memory.remember("policy:remote", "Remote work allowed 3 days/week")
await team_memory.remember("contact:hr", "hr@company.com")

# Search policies
results = await team_memory.search("time off work", k=3)

Agent with Persistent Context

from cogent import Agent
from cogent.memory import Memory, SQLAlchemyStore

store = SQLAlchemyStore("sqlite+aiosqlite:///./agent.db")
memory = Memory(store=store)

agent = Agent(
    name="assistant",
    model=model,
    memory=memory,
    instructions="""You have access to persistent memory.
    Use it to remember user preferences and context.""",
)

# First conversation
await agent.run("My favorite color is blue")

# Later conversation (same agent)
await agent.run("What's my favorite color?")  # Recalls "blue"

Store Protocol

Implement custom storage backends:

from typing import Protocol, Any

class Store(Protocol):
    """Protocol for memory storage backends."""

    async def get(self, key: str) -> Any | None:
        """Get a value by key."""
        ...

    async def set(self, key: str, value: Any, ttl: int | None = None) -> None:
        """Set a value with optional TTL."""
        ...

    async def delete(self, key: str) -> bool:
        """Delete a key. Returns True if existed."""
        ...

    async def exists(self, key: str) -> bool:
        """Check if key exists."""
        ...

    async def keys(self, pattern: str = "*") -> list[str]:
        """List keys matching pattern."""
        ...

    async def clear(self) -> None:
        """Clear all keys."""
        ...

Custom implementation example (sync boto3 calls are shown for brevity; a real backend would use an async client such as aioboto3):

import time

import boto3

class DynamoDBStore:
    """Custom DynamoDB backend."""

    def __init__(self, table_name: str):
        self.table_name = table_name
        self.client = boto3.resource("dynamodb")
        self.table = self.client.Table(table_name)

    async def get(self, key: str) -> Any | None:
        response = self.table.get_item(Key={"pk": key})
        item = response.get("Item")
        return item["value"] if item else None

    async def set(self, key: str, value: Any, ttl: int | None = None) -> None:
        item = {"pk": key, "value": value}
        if ttl:
            item["ttl"] = int(time.time()) + ttl
        self.table.put_item(Item=item)

    # ... implement other methods

# Use custom store
memory = Memory(store=DynamoDBStore("my-memories"))

API Reference

Memory

| Method | Description |
|--------|-------------|
| remember(key, value, ttl?) | Store a value |
| recall(key, default?) | Retrieve a value |
| forget(key) | Delete a value |
| exists(key) | Check if key exists |
| list_keys(pattern?) | List matching keys |
| clear() | Clear all memories |
| scoped(prefix) | Create scoped view |
| search(query, k?) | Semantic search (requires vectorstore) |

Stores

| Store | Use Case |
|-------|----------|
| InMemoryStore | Development, testing, ephemeral |
| SQLAlchemyStore | Persistent, ACID, SQL databases |
| RedisStore | Distributed, TTL, high-throughput |

Semantic Cache

SemanticCache provides embedding-based caching with configurable similarity thresholds. When a query is "close enough" to a cached entry, return the cached result instead of making an expensive LLM or API call.

Key Benefits:

- 80%+ hit rates - caches similar queries, not just exact matches
- 7-10× speedup - cached responses return instantly
- Cost reduction - fewer API calls means lower costs
- Automatic eviction - LRU policy and TTL expiration

Quick Start

Enable caching with cache=True:

from cogent import Agent

agent = Agent(
    model="gpt-5.4-mini",
    cache=True,  # Enable semantic cache with defaults
)

# First query
await agent.run("What are the best Python frameworks?")

# Similar query hits cache (instant!)
await agent.run("What are the top Python frameworks?")

Custom Configuration

Pass a SemanticCache instance for custom settings:

from cogent import Agent
from cogent.memory import SemanticCache
from cogent.models import create_embedding

# Create embedding model
embed = create_embedding("openai", "text-embedding-3-small")

agent = Agent(
    model="gpt-5.4-mini",
    cache=SemanticCache(
        embedding=embed,            # Embedding model (required for custom)
        similarity_threshold=0.90,  # Stricter matching (default: 0.85)
        max_entries=5000,           # Cache size (default: 10000)
        default_ttl=3600,           # 1 hour TTL (default: 86400)
    ),
)

Similarity Threshold:

| Threshold | Behavior | Use Case |
|-----------|----------|----------|
| 0.95-1.0 | Very strict, near-exact | Deterministic outputs |
| 0.85-0.95 | Balanced, similar intent | General purpose (default) |
| 0.70-0.85 | Loose, broad matching | Exploratory queries |
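The threshold is a cosine-similarity cutoff over query embeddings. A toy sketch of the lookup decision (hand-made vectors and an illustrative `cache_lookup` helper, not real embeddings or cogent's API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cache_lookup(query_vec, cache, threshold=0.85):
    # Return the cached result of the most similar past query, if above threshold
    best = max(cache, key=lambda entry: cosine(query_vec, entry[0]), default=None)
    if best is not None and cosine(query_vec, best[0]) >= threshold:
        return best[1]
    return None  # cache miss: call the model/tool, then store the result

cache = [([1.0, 0.0, 0.1], "cached answer")]
print(cache_lookup([0.98, 0.05, 0.12], cache))  # similar query → cached answer
print(cache_lookup([0.0, 1.0, 0.0], cache))     # dissimilar query → None (miss)
```

Raising the threshold toward 1.0 trades hit rate for safety: fewer cache hits, but less risk of returning an answer for a subtly different question.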

Tool-Level Caching

Use @tool(cache=True) to cache expensive tool calls:

from cogent import Agent, tool

@tool(cache=True)
async def search_products(query: str) -> str:
    """Search products in the catalog."""
    return await product_api.search(query)

agent = Agent(
    model="gpt-5.4-mini",
    tools=[search_products],
    cache=True,  # Required — tools use agent's cache
)

# First call executes the tool
await agent.run("Find running shoes")

# Similar query hits cache
await agent.run("Show me running sneakers")  # Cache hit!

See tool-building.md for more details.

When to Use

| Use Semantic Cache When | Don't Use When |
|-------------------------|----------------|
| User queries with variation | Need exact-match guarantees |
| Similar questions rephrased | Outputs must be deterministic |
| Intent-based matching | Query structure matters |
| High query volume | Low query volume |

ACC (Agentic Context Compression)

ACC provides bounded memory for long conversations, preventing context drift and memory poisoning.

Basic Usage

Enable ACC with acc=True on Memory or Agent:

from cogent import Agent
from cogent.memory import Memory

# Option 1: Enable on Agent
agent = Agent(name="Assistant", model="gpt-5.4", acc=True)

# Option 2: Enable on Memory (then pass to Agent)
memory = Memory(acc=True)
agent = Agent(name="Assistant", model="gpt-5.4", memory=memory)

Custom ACC Bounds

For fine-grained control, pass custom bounds directly:

from cogent import Agent
from cogent.memory import Memory
from cogent.memory.acc import AgentCognitiveCompressor

# Create ACC with custom bounds
acc = AgentCognitiveCompressor(
    max_constraints=10,  # Rules, guidelines
    max_entities=30,     # Facts, knowledge
    max_actions=20,      # Past actions
    max_context=15,      # Relevant context
)

# Pass to Memory
memory = Memory(acc=acc)
agent = Agent(name="Assistant", model="gpt-5.4", memory=memory)

# Or pass directly to Agent
agent = Agent(name="Assistant", model="gpt-5.4", acc=acc)

Thread ID for Context Persistence

ACC requires thread_id to persist context across multiple run() calls:

# Same thread_id = context persists
await agent.run("My name is Alice", thread_id="session-1")
await agent.run("What's my name?", thread_id="session-1")  # Remembers!

# Different thread_id = fresh context
await agent.run("What's my name?", thread_id="session-2")  # Doesn't know

When to Use ACC

| Use ACC When | Don't Use When |
|--------------|----------------|
| Long conversations (>10 turns) | Short, one-off queries |
| Need to prevent context drift | Stateless operations |
| Bounded memory is critical | Need full conversation replay |
| Multi-turn workflows | Simple Q&A |

See acc.md for detailed ACC documentation.