Memory Module

The cogent.memory module provides a memory-first architecture where memory is a first-class citizen that can be wired to any agent.

Overview

Memory enables agents to:

- Persist knowledge across conversations
- Share state between agents
- Perform semantic search over memories
- Scope memories by user, team, or conversation
- Keep context bounded in long conversations via ACC (Agentic Context Compression)

from cogent import Agent
from cogent.memory import Memory

# Basic in-memory storage
memory = Memory()
await memory.remember("user_preference", "dark mode")
value = await memory.recall("user_preference")

# Wire to an agent
agent = Agent(name="assistant", model=model, memory=memory)

# Memory with ACC enabled (prevents drift in long conversations)
memory = Memory(acc=True)

4-Layer Memory Architecture

Cogent provides four distinct memory mechanisms that work together:

| Layer | Parameter | Mechanism | When to Use |
|-------|-----------|-----------|-------------|
| 1 | conversation=True | Automatic message concatenation | Short sessions, full context needed |
| 2 | acc=True | Agentic Context Compression | Long conversations, prevent drift |
| 3 | memory=True | Explicit remember/recall tools | Persistent knowledge, semantic search |
| 4 | cache=True | Semantic tool output cache | Expensive/slow tool calls |

Layer 1: Conversation History (Automatic)

Raw message concatenation: all previous messages are automatically sent to the LLM:

agent = Agent(name="Assistant", model="gpt4")  # conversation=True by default

await agent.run("Hi, I'm Alice", thread_id="session1")
await agent.run("What's my name?", thread_id="session1")

# Internally sends to LLM:
# [
#   {"role": "user", "content": "Hi, I'm Alice"},
#   {"role": "assistant", "content": "Hello Alice!"},
#   {"role": "user", "content": "What's my name?"}  # <-- Full history
# ]

Characteristics:

- ✅ Automatic - no tools needed, no LLM decision required
- ✅ Works immediately - LLM sees full context
- ✅ Perfect recall - nothing lost from conversation
- ❌ Grows unbounded - context window fills up over time
- ❌ No semantic search - just chronological concatenation
- ❌ Session-bound - lost when thread ends

When to use: Short sessions where full context fits in window.

Layer 2: ACC (Agentic Context Compression)

Compresses growing conversation history into structured constraints and entities:

agent = Agent(name="Assistant", model="gpt4", acc=True)

# After many messages, ACC compresses into:
# Constraints: ["User prefers dark mode", "User timezone is EST", "Project deadline: March 1"]
# Entities: ["Alice (user)", "Project Alpha (active)", "Bob (team lead)"]
# Only compressed context sent to LLM, not full 50-message history

Characteristics:

- ✅ Bounded context - prevents window overflow
- ✅ Automatic - no LLM tool calls needed
- ✅ Prevents drift - maintains key facts across long sessions
- ✅ Structured - constraints + entities format
- ❌ Lossy - some details discarded during compression

When to use: Long conversations that exceed context window.

Layer 3: Long-Term Memory (Explicit Tools)

LLM must explicitly call remember(), recall(), forget() tools:

agent = Agent(name="Assistant", model="gpt4", memory=True)

# Agent gets memory tools automatically
# LLM decides when to use them:

await agent.run("Remember that I prefer dark mode")
# LLM calls: remember(key="user_preference", value="dark mode")

await agent.run("What's my UI preference?")  
# LLM calls: recall(query="user preference")
# Returns: "dark mode"

# Next session (different thread_id)
await agent.run("What do I prefer?", thread_id="new_session")
# LLM can still recall: "dark mode" (survives sessions!)

Characteristics:

- ✅ Semantic search - finds relevant memories by meaning
- ✅ Persistent - survives across sessions/threads
- ✅ Scoped - can isolate per-user, per-team, etc.
- ✅ Selective - LLM stores only important info
- ❌ Requires LLM decision - LLM must choose to call tools
- ❌ Not automatic - won't be used unless the LLM decides to
- ❌ Tool call overhead - adds latency when used

When to use: Persistent knowledge across sessions, user preferences, facts that should survive.

Layer 4: Semantic Cache (Tool Outputs)

Caches tool results by semantic similarity to avoid redundant calls:

agent = Agent(name="Assistant", model="gpt4", cache=True)

# First call
await agent.run("Search for Python tutorials")  
# Calls search_tool(), caches result

# Similar query (different wording)
await agent.run("Find Python learning resources")  
# Cache hit! Returns previous result without calling search_tool()

Characteristics:

- ✅ Speeds up repeated queries - avoids slow/expensive tool calls
- ✅ Semantic matching - recognizes similar queries
- ✅ Transparent - LLM doesn't know cache is used
- ❌ Can return stale data - cached results may be outdated
- ❌ Storage overhead - caches all tool outputs

When to use: Expensive API calls, slow database queries, rate-limited services.


Layer Comparison: Automatic vs Explicit

The key distinction is who decides to use memory:

# Layer 1: Conversation history (AUTOMATIC)
agent = Agent(name="Assistant", model="gpt4")  # conversation=True default
await agent.run("I'm Alice", thread_id="s1")
await agent.run("My name?", thread_id="s1")  
# ✅ Works! History automatically sent to LLM
# No tool calls, no LLM decision needed

# Layer 3: Memory tools (EXPLICIT - LLM decides)
agent = Agent(name="Assistant", model="gpt4", memory=True)
await agent.run("Remember I'm Alice")  
# ⚠️ LLM may or may not call remember() - it decides

await agent.run("My name?")            
# ⚠️ LLM may or may not call recall() - it decides
# If LLM doesn't call the tool, memory isn't used!

Recommendation: Use Layer 1 (conversation) for short-term context, Layer 3 (memory tools) for long-term persistent knowledge.


Core Classes

Memory

The main memory interface with simple remember/recall API:

from cogent.memory import Memory

memory = Memory()

# Remember a value
await memory.remember("key", "value")
await memory.remember("user.name", "Alice")
await memory.remember("conversation.topic", "AI research")

# Recall a value
name = await memory.recall("user.name")  # "Alice"
missing = await memory.recall("unknown")  # None
missing = await memory.recall("unknown", default="N/A")  # "N/A"

# Check existence
exists = await memory.exists("user.name")  # True

# Delete a memory
await memory.forget("user.name")

# List all keys
keys = await memory.list_keys()  # ["conversation.topic"]

# Clear all memories
await memory.clear()

Scoped Memory

Create isolated memory views for users, teams, or conversations:

from cogent.memory import Memory

memory = Memory()

# Create scoped views
user_mem = memory.scoped("user:alice")
team_mem = memory.scoped("team:research")
conv_mem = memory.scoped("conv:thread-123")

# Each scope is isolated
await user_mem.remember("preference", "compact")
await team_mem.remember("preference", "detailed")

user_pref = await user_mem.recall("preference")  # "compact"
team_pref = await team_mem.recall("preference")  # "detailed"

# Scopes can be nested
project_mem = team_mem.scoped("project:alpha")
await project_mem.remember("status", "active")

Shared Memory Between Agents

Wire the same memory to multiple agents for shared knowledge:

from cogent import Agent
from cogent.memory import Memory

# Shared memory instance
shared = Memory()

# Both agents share the same memory
researcher = Agent(name="researcher", model=model, memory=shared)
writer = Agent(name="writer", model=model, memory=shared)

# Researcher stores findings
await shared.remember("findings", "Key insight: AI adoption is growing")

# Writer can access them
findings = await shared.recall("findings")

Storage Backends

InMemoryStore (Default)

Fast, no-persistence storage for development and testing:

from cogent.memory import Memory, InMemoryStore

# Default - uses InMemoryStore
memory = Memory()

# Explicit
memory = Memory(store=InMemoryStore())

SQLAlchemyStore

Persistent storage with SQLAlchemy 2.0 async support:

from cogent.memory import Memory, SQLAlchemyStore

# SQLite (local file)
store = SQLAlchemyStore("sqlite+aiosqlite:///./memory.db")
memory = Memory(store=store)

# PostgreSQL
store = SQLAlchemyStore(
    "postgresql+asyncpg://user:pass@localhost/db",
    pool_size=10,
)
memory = Memory(store=store)

# Initialize tables (run once)
await store.initialize()

# Cleanup
await store.close()

Context manager for cleanup:

async with SQLAlchemyStore("sqlite+aiosqlite:///./data.db") as store:
    memory = Memory(store=store)
    await memory.remember("key", "value")

RedisStore

Distributed cache with native TTL support:

from cogent.memory import Memory, RedisStore

store = RedisStore(
    url="redis://localhost:6379",
    prefix="myapp:",  # Key prefix
    default_ttl=3600,  # 1 hour default TTL
)
memory = Memory(store=store)

# With TTL per key
await memory.remember("session", {"user": "alice"}, ttl=1800)
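For intuition, TTL expiry can be sketched with monotonic timestamps (a toy illustration only; RedisStore gets TTL natively from Redis, enforced server-side):

```python
import time

class TTLDict:
    """Toy TTL store: entries expire ttl seconds after being set."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def set(self, key, value, ttl=None):
        expiry = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expiry)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expiry = entry
        if expiry is not None and time.monotonic() >= expiry:
            del self._data[key]  # lazy eviction on read
            return default
        return value

cache = TTLDict()
cache.set("session", {"user": "alice"}, ttl=1800)
print(cache.get("session"))  # {'user': 'alice'}
```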

Memory Key Search

Memory provides intelligent key search with three methods that automatically cascade:

1. Fuzzy Matching (Default - Fast & Free)

The default search method uses fuzzy string matching for instant, offline key discovery:

from cogent import Agent
from cogent.memory import Memory

# No special setup needed - fuzzy matching works out of the box
memory = Memory()

agent = Agent(name="assistant", model=model, memory=memory)

# Save memories
await agent.run("My name is Alice, I prefer dark mode, language is Python")

# Fuzzy matching finds similar keys instantly
await agent.run("What are my preferences?")
# → search_memories("preferences") finds "preferred_mode" and "preferred_language"
# Method: Fuzzy match (0.1ms, free, offline)

Benefits:

- ⚡ 2,800× faster than semantic search (0.1ms vs 280ms)
- 💰 Free - no API calls
- 🔌 Works offline - no network required
- 📊 62.5% accuracy - good enough for most use cases
- 🧹 Smart normalization - handles underscores, hyphens, word order

How it works:

# String normalization helps matching:
"preferred_mode" → "preferred mode"
"user_timezone" → "user timezone"
"notification-settings" → "notification settings"

# Fuzzy matching finds similarity:
Query: "preferences" → Matches: "preferred mode", "preferred language"
Query: "contact" → Matches: "email", "phone number"
Query: "settings" → Matches: "notification settings"
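Normalize-then-score can be approximated with the standard library; here `difflib` stands in for rapidfuzz purely to show the shape of the technique (a sketch, not cogent's actual matcher, and the function names are illustrative):

```python
import difflib

def normalize(key: str) -> str:
    # Underscores and hyphens become spaces so "preferred_mode" ~ "preferred mode"
    return key.replace("_", " ").replace("-", " ").lower()

def fuzzy_search(query: str, keys: list[str], cutoff: float = 0.4) -> list[str]:
    # Map normalized forms back to the original key names
    normalized = {normalize(k): k for k in keys}
    hits = difflib.get_close_matches(normalize(query), list(normalized), n=5, cutoff=cutoff)
    return [normalized[h] for h in hits]

keys = ["preferred_mode", "preferred_language", "email", "phone-number"]
print(fuzzy_search("preferences", keys))  # fuzzy hits, best match first
```

Note that pure string similarity cannot bridge a genuinely semantic gap (e.g. "contact" vs "email"); that is exactly the case the semantic fallback below exists for.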

2. Semantic Search (Optional Fallback)

Enable semantic search by adding a vectorstore (used when fuzzy matching unavailable):

from cogent import Agent
from cogent.memory import Memory
from cogent.vectorstore import VectorStore

# Add vectorstore for semantic fallback
memory = Memory(vectorstore=VectorStore())

agent = Agent(name="assistant", model=model, memory=memory)

When semantic search is used:

- Fuzzy matching library (rapidfuzz) not installed
- Fuzzy matching finds no matches (< 40% similarity)

Trade-offs:

- ✅ 75% accuracy - better than fuzzy (but only 12.5 points better)
- ❌ 280ms average - 2,800× slower than fuzzy
- ❌ Costs money - OpenAI API calls
- ❌ Requires network - API dependency

3. Keyword Search (Final Fallback)

Simple substring matching when all else fails:

# Query: "mode" → Matches keys containing "mode": "preferred_mode", "dark_mode"
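The final fallback is essentially a one-liner; a sketch of the idea (function name illustrative):

```python
def keyword_search(query: str, keys: list[str]) -> list[str]:
    # Case-insensitive substring match over stored key names
    q = query.lower()
    return [k for k in keys if q in k.lower()]

print(keyword_search("mode", ["preferred_mode", "dark_mode", "email"]))
# → ['preferred_mode', 'dark_mode']
```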

Installation

Recommended (fuzzy matching):

uv add rapidfuzz  # For fast, free fuzzy matching

Optional (semantic fallback):

from cogent.memory import Memory
from cogent.vectorstore import VectorStore

memory = Memory(vectorstore=VectorStore())  # Enables semantic fallback

Performance Comparison

| Method | Speed | Accuracy | Cost | Offline |
|--------|-------|----------|------|---------|
| Fuzzy | 0.1ms | 62.5% | Free | ✅ Yes |
| Semantic | 280ms | 75.0% | $$ (API) | ❌ No |
| Keyword | 0.1ms | ~30% | Free | ✅ Yes |

Recommendation: Use fuzzy matching (default) for 99% of use cases.

Example

See examples/basics/memory_semantic_search.py for a complete demo.

from cogent import Agent
from cogent.memory import Memory

memory = Memory()  # Fuzzy matching by default

agent = Agent(name="assistant", model="gpt-5.4", memory=memory)

# Save with specific key names
await memory.remember("preferred_mode", "dark")
await memory.remember("preferred_language", "Python")
await memory.remember("email", "alice@example.com")

# Agent finds them with fuzzy matching (instant!)
await agent.run("What are my preferences?")
# → search_memories("preferences") finds "preferred_mode" and "preferred_language"
# ⚡ 0.1ms, free, offline

await agent.run("How can I contact the user?")
# → search_memories("contact") finds "email"

Memory Tools

Memory automatically exposes tools to agents for autonomous memory management:

from cogent import Agent
from cogent.memory import Memory

# Memory is always agentic - tools auto-added
memory = Memory()

agent = Agent(
    name="assistant",
    model=model,
    memory=memory,
)

# Agent has 5 memory tools available:
# 1. remember(key, value) - Save facts to long-term memory
# 2. recall(key) - Retrieve specific facts
# 3. forget(key) - Remove facts
# 4. search_memories(query) - Search long-term facts (fuzzy matching by default)
# 5. search_conversation(query) - Search conversation history

# Agent can now use memory tools autonomously
result = await agent.run("Remember that my name is Alice")
result = await agent.run("What's my name?")

Available Tools

1. remember(key, value) - Save important facts

# Agent automatically calls when user shares information
await agent.run("My favorite language is Python")
# → Agent calls: remember("favorite_language", "Python")

2. recall(key) - Retrieve specific saved facts

await agent.run("What's my favorite language?")
# → Agent calls: recall("favorite_language")

3. forget(key) - Remove facts (when user requests)

await agent.run("Forget my favorite language")
# → Agent calls: forget("favorite_language")

4. search_memories(query, k=5) - Search long-term facts with intelligent matching

# Default: Fast fuzzy matching (0.1ms, free, offline)
memory = Memory()
await agent.run("What are my preferences?")
# → Agent calls: search_memories("preferences")
# → Finds: "preferred_mode", "preferred_language" via fuzzy matching

# Optional: Add vectorstore for semantic fallback
memory = Memory(vectorstore=VectorStore())
# → Uses fuzzy matching first, falls back to semantic if needed

5. search_conversation(query, max_results=5) - Search conversation history

# Critical for long conversations exceeding context window
await agent.run("What were the three projects I mentioned earlier?")
# → Agent calls: search_conversation("three projects")

When Tools Are Used

The agent's system prompt instructs it to:

  1. At conversation start → search_memories("user") to recall context
  2. When user shares info → remember(key, value) immediately
  3. When asked about something → search before saying "I don't know"
  4. For facts → search_memories(query) or recall(key)
  5. For past conversation → search_conversation(query)
  6. In long conversations → use search_conversation() to find earlier context

Shorthand - memory=True creates a Memory instance:

# Shorthand for Memory()
agent = Agent(name="assistant", model=model, memory=True)

Usage Patterns

Conversation History

from cogent.memory import Memory

memory = Memory()

async def chat(user_id: str, message: str) -> str:
    user_mem = memory.scoped(f"user:{user_id}")

    # Load history
    history = await user_mem.recall("history", default=[])
    history.append({"role": "user", "content": message})

    # Get response (using agent)
    response = await agent.run(message, history=history)

    # Save updated history
    history.append({"role": "assistant", "content": response})
    await user_mem.remember("history", history)

    return response

Team Knowledge Base

from cogent.memory import Memory, SQLAlchemyStore
from cogent.vectorstore import VectorStore

# Persistent team memory with search
team_memory = Memory(
    store=SQLAlchemyStore("sqlite+aiosqlite:///./team.db"),
    vectorstore=VectorStore(),
)

# Store team knowledge
await team_memory.remember("policy:vacation", "Employees get 20 days PTO")
await team_memory.remember("policy:remote", "Remote work allowed 3 days/week")
await team_memory.remember("contact:hr", "hr@company.com")

# Search policies
results = await team_memory.search("time off work", k=3)

Agent with Persistent Context

from cogent import Agent
from cogent.memory import Memory, SQLAlchemyStore

store = SQLAlchemyStore("sqlite+aiosqlite:///./agent.db")
memory = Memory(store=store)

agent = Agent(
    name="assistant",
    model=model,
    memory=memory,
    instructions="""You have access to persistent memory.
    Use it to remember user preferences and context.""",
)

# First conversation
await agent.run("My favorite color is blue")

# Later conversation (same agent)
await agent.run("What's my favorite color?")  # Recalls "blue"

Store Protocol

Implement custom storage backends:

from typing import Protocol, Any

class Store(Protocol):
    """Protocol for memory storage backends."""

    async def get(self, key: str) -> Any | None:
        """Get a value by key."""
        ...

    async def set(self, key: str, value: Any, ttl: int | None = None) -> None:
        """Set a value with optional TTL."""
        ...

    async def delete(self, key: str) -> bool:
        """Delete a key. Returns True if existed."""
        ...

    async def exists(self, key: str) -> bool:
        """Check if key exists."""
        ...

    async def keys(self, pattern: str = "*") -> list[str]:
        """List keys matching pattern."""
        ...

    async def clear(self) -> None:
        """Clear all keys."""
        ...

Custom implementation example (sync boto3 calls are shown for brevity; a real backend would use an async client such as aioboto3):

import time

import boto3

class DynamoDBStore:
    """Custom DynamoDB backend."""

    def __init__(self, table_name: str):
        self.table_name = table_name
        self.client = boto3.resource("dynamodb")
        self.table = self.client.Table(table_name)

    async def get(self, key: str) -> Any | None:
        response = self.table.get_item(Key={"pk": key})
        item = response.get("Item")
        return item["value"] if item else None

    async def set(self, key: str, value: Any, ttl: int | None = None) -> None:
        item = {"pk": key, "value": value}
        if ttl:
            item["ttl"] = int(time.time()) + ttl
        self.table.put_item(Item=item)

    # ... implement other methods

# Use custom store
memory = Memory(store=DynamoDBStore("my-memories"))

API Reference

Memory

| Method | Description |
|--------|-------------|
| remember(key, value, ttl?) | Store a value |
| recall(key, default?) | Retrieve a value |
| forget(key) | Delete a value |
| exists(key) | Check if key exists |
| list_keys(pattern?) | List matching keys |
| clear() | Clear all memories |
| scoped(prefix) | Create scoped view |
| search(query, k?) | Semantic search (requires vectorstore) |

Stores

| Store | Use Case |
|-------|----------|
| InMemoryStore | Development, testing, ephemeral |
| SQLAlchemyStore | Persistent, ACID, SQL databases |
| RedisStore | Distributed, TTL, high-throughput |

Semantic Cache

SemanticCache provides embedding-based caching with configurable similarity thresholds. When a query is "close enough" to a cached entry, return the cached result instead of making an expensive LLM or API call.

Key Benefits:

- 80%+ hit rates - caches similar queries, not just exact matches
- 7-10× speedup - cached responses return instantly
- Cost reduction - fewer API calls means lower costs
- Automatic eviction - LRU policy and TTL expiration

Quick Start

Enable caching with cache=True:

from cogent import Agent

agent = Agent(
    model="gpt-5.4-mini",
    cache=True,  # Enable semantic cache with defaults
)

# First query
await agent.run("What are the best Python frameworks?")

# Similar query hits cache (instant!)
await agent.run("What are the top Python frameworks?")

Custom Configuration

Pass a SemanticCache instance for custom settings:

from cogent import Agent
from cogent.memory import SemanticCache
from cogent.models import create_embedding

# Create embedding model
embed = create_embedding("openai", "text-embedding-3-small")

agent = Agent(
    model="gpt-5.4-mini",
    cache=SemanticCache(
        embedding=embed,            # Embedding model (required for custom)
        similarity_threshold=0.90,  # Stricter matching (default: 0.85)
        max_entries=5000,           # Cache size (default: 10000)
        default_ttl=3600,           # 1 hour TTL (default: 86400)
    ),
)

Similarity Threshold:

| Threshold | Behavior | Use Case |
|-----------|----------|----------|
| 0.95-1.0 | Very strict, near-exact | Deterministic outputs |
| 0.85-0.95 | Balanced, similar intent | General purpose (default) |
| 0.70-0.85 | Loose, broad matching | Exploratory queries |
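The threshold is a cosine-similarity cutoff over query embeddings. A toy sketch of the lookup decision (hand-made vectors and an illustrative `cache_lookup` helper, not real embeddings or cogent's API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cache_lookup(query_vec, cache, threshold=0.85):
    # Return the cached result of the most similar past query, if above threshold
    best = max(cache, key=lambda entry: cosine(query_vec, entry[0]), default=None)
    if best is not None and cosine(query_vec, best[0]) >= threshold:
        return best[1]
    return None  # cache miss: call the model/tool, then store the result

cache = [([1.0, 0.0, 0.1], "cached answer")]
print(cache_lookup([0.98, 0.05, 0.12], cache))  # similar query → cached answer
print(cache_lookup([0.0, 1.0, 0.0], cache))     # dissimilar query → None (miss)
```

Raising the threshold toward 1.0 trades hit rate for safety: fewer cache hits, but less risk of returning an answer for a subtly different question.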

Tool-Level Caching

Use @tool(cache=True) to cache expensive tool calls:

from cogent import Agent, tool

@tool(cache=True)
async def search_products(query: str) -> str:
    """Search products in the catalog."""
    return await product_api.search(query)

agent = Agent(
    model="gpt-5.4-mini",
    tools=[search_products],
    cache=True,  # Required — tools use agent's cache
)

# First call executes the tool
await agent.run("Find running shoes")

# Similar query hits cache
await agent.run("Show me running sneakers")  # Cache hit!

See tool-building.md for more details.

When to Use

| Use Semantic Cache When | Don't Use When |
|-------------------------|----------------|
| User queries with variation | Need exact-match guarantees |
| Similar questions rephrased | Outputs must be deterministic |
| Intent-based matching | Query structure matters |
| High query volume | Low query volume |

ACC (Agentic Context Compression)

ACC provides bounded memory for long conversations, preventing context drift and memory poisoning.

Basic Usage

Enable ACC with acc=True on Memory or Agent:

from cogent import Agent
from cogent.memory import Memory

# Option 1: Enable on Agent
agent = Agent(name="Assistant", model="gpt-5.4", acc=True)

# Option 2: Enable on Memory (then pass to Agent)
memory = Memory(acc=True)
agent = Agent(name="Assistant", model="gpt-5.4", memory=memory)

Custom ACC Bounds

For fine-grained control, pass custom bounds directly:

from cogent import Agent
from cogent.memory import Memory
from cogent.memory.acc import AgentCognitiveCompressor

# Create ACC with custom bounds
acc = AgentCognitiveCompressor(
    max_constraints=10,  # Rules, guidelines
    max_entities=30,     # Facts, knowledge
    max_actions=20,      # Past actions
    max_context=15,      # Relevant context
)

# Pass to Memory
memory = Memory(acc=acc)
agent = Agent(name="Assistant", model="gpt-5.4", memory=memory)

# Or pass directly to Agent
agent = Agent(name="Assistant", model="gpt-5.4", acc=acc)

Thread ID for Context Persistence

ACC requires thread_id to persist context across multiple run() calls:

# Same thread_id = context persists
await agent.run("My name is Alice", thread_id="session-1")
await agent.run("What's my name?", thread_id="session-1")  # Remembers!

# Different thread_id = fresh context
await agent.run("What's my name?", thread_id="session-2")  # Doesn't know

When to Use ACC

| Use ACC When | Don't Use When |
|--------------|----------------|
| Long conversations (>10 turns) | Short, one-off queries |
| Need to prevent context drift | Stateless operations |
| Bounded memory is critical | Need full conversation replay |
| Multi-turn workflows | Simple Q&A |

See acc.md for detailed ACC documentation.