Agentic Context Compression (ACC)¶
Bounded memory for long conversations with drift prevention.
Overview¶
ACC (Agentic Context Compression) maintains a bounded internal state instead of replaying the full, unbounded transcript. Based on arXiv:2601.11653, it prevents:
- Context drift — Maintains constraints and entities across turns
- Memory poisoning — Verifies artifacts before committing
- Context overflow — Bounded state regardless of conversation length
Quick Start¶
Enable ACC with acc=True on Agent or Memory:
```python
from cogent import Agent
from cogent.memory import Memory

# Option 1: Enable on Agent
agent = Agent(name="Assistant", model="gpt-5.4", acc=True)

# Option 2: Enable on Memory
memory = Memory(acc=True)
agent = Agent(name="Assistant", model="gpt-5.4", memory=memory)

# Use thread_id to persist context across turns
await agent.run("My name is Alice", thread_id="session-1")
await agent.run("I prefer dark mode", thread_id="session-1")
await agent.run("What's my name?", thread_id="session-1")  # Remembers!
```
Custom Bounds¶
For fine-grained control, pass custom bounds directly to AgentCognitiveCompressor:
```python
from cogent import Agent
from cogent.memory import Memory
from cogent.memory.acc import AgentCognitiveCompressor

# Create ACC with custom bounds
acc = AgentCognitiveCompressor(
    max_constraints=10,  # Rules, guidelines (default: 10)
    max_entities=30,     # Facts, knowledge (default: 50)
    max_actions=20,      # Past actions (default: 30)
    max_context=15,      # Relevant context (default: 20)
)

# Pass to Agent or Memory
agent = Agent(name="Assistant", model="gpt-5.4", acc=acc)
# OR
memory = Memory(acc=acc)
agent = Agent(name="Assistant", model="gpt-5.4", memory=memory)

# Access state for monitoring
print(f"Entities: {len(acc.state.entities)}/{acc.state.max_entities}")
print(f"Actions: {len(acc.state.actions)}/{acc.state.max_actions}")
```
Extraction Modes¶
ACC supports two extraction modes:
| Mode | Description | Speed | Quality |
|---|---|---|---|
| `heuristic` | Rule-based extraction (default) | ⚡ Fast | Good |
| `model` | LLM-based semantic extraction | Slower | Better |
Heuristic Mode (Default)¶
Fast, rule-based extraction using keyword matching and simple heuristics. This is the default, so it requires no extra configuration and makes no additional LLM calls.
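cogent's actual heuristic rules are internal to the library; purely as an illustration of the idea, a keyword-matching extractor might look like the sketch below. The category names follow ACC's four buckets, but the keyword lists and function name here are hypothetical, not cogent's.

```python
import re

# Hypothetical illustration of keyword-based extraction (NOT cogent's
# actual rules): classify sentences into ACC categories by trigger words.
CATEGORY_KEYWORDS = {
    "constraints": ("must", "never", "always", "require"),
    "entities": ("my name is", "i am", "i prefer", "i use"),
    "actions": ("ran", "failed", "succeeded", "called"),
}

def heuristic_extract(text: str) -> dict[str, list[str]]:
    """Bucket each sentence under the first category whose keyword it contains."""
    extracted: dict[str, list[str]] = {k: [] for k in CATEGORY_KEYWORDS}
    for sentence in re.split(r"[.!?]+", text):
        sentence = sentence.strip()
        lowered = sentence.lower()
        for category, keywords in CATEGORY_KEYWORDS.items():
            if any(kw in lowered for kw in keywords):
                extracted[category].append(sentence)
                break  # one category per sentence
    return extracted

result = heuristic_extract("My name is Alice. You must reply in JSON.")
```

The appeal of this mode is that extraction cost is constant per turn; the trade-off is that phrasing outside the keyword lists is missed, which is what the `model` mode addresses.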
Model Mode¶
Uses an LLM to semantically extract constraints, entities, and actions:
```python
from cogent.memory.acc import AgentCognitiveCompressor
from cogent.models import AnthropicChat

# Option 1: Specify a dedicated efficient model
acc = AgentCognitiveCompressor(
    extraction_mode="model",
    model="gpt-5.4-mini",  # Efficient model for extraction
)

# Option 2: Use the agent's model (no model specified)
acc = AgentCognitiveCompressor(
    extraction_mode="model",
    # model=None → uses the agent's model automatically
)

# Option 3: Pass any BaseChatModel
acc = AgentCognitiveCompressor(
    extraction_mode="model",
    model=AnthropicChat(model="claude-3-haiku-20240307"),
)
```
Recommendation: Use extraction_mode="model" with model="gpt-5.4-mini" or a similarly efficient model for the best quality/cost balance.
When to Use ACC¶
| Use ACC When | Don't Use When |
|---|---|
| Long conversations (>10 turns) | Short, stateless queries |
| Need to prevent drift | Simple Q&A |
| Bounded memory is critical | Need full transcript replay |
| Multi-turn workflows | One-off operations |
How ACC Works¶
ACC maintains bounded internal state with four categories:
| Category | Purpose | Default Max |
|---|---|---|
| Constraints | Rules, guidelines, requirements | 10 |
| Entities | Facts, knowledge, data | 50 |
| Actions | What worked/failed | 30 |
| Context | Relevant snippets | 20 |
Total: at most ~110 items (10 + 50 + 30 + 20), regardless of conversation length.
```python
from cogent.memory.acc import BoundedMemoryState

# View state contents
state = BoundedMemoryState()
print(state.constraints)  # List of constraints
print(state.entities)     # List of entities
print(state.actions)      # List of actions
print(state.context)      # List of context items
```
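The bounding itself can be pictured as fixed-size queues that evict the oldest item once a category is full. The toy class below illustrates that behavior using Python's `collections.deque`; it is a sketch of the concept only, not cogent's internal implementation or eviction policy.

```python
from collections import deque

# Toy illustration (NOT cogent internals): each category is a fixed-size
# queue; adding beyond the bound silently evicts the oldest item.
class ToyBoundedState:
    def __init__(self, max_entities: int = 50):
        self.entities: deque[str] = deque(maxlen=max_entities)

    def add_entity(self, fact: str) -> None:
        self.entities.append(fact)  # oldest entry drops out at capacity

state = ToyBoundedState(max_entities=3)
for fact in ["likes tea", "name is Alice", "uses dark mode", "lives in Oslo"]:
    state.add_entity(fact)

print(list(state.entities))  # only the 3 most recent facts survive
```

This is why total memory stays at ~110 items no matter how long the conversation runs: new items displace old ones rather than growing the state.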
ACC vs SemanticCache¶
| Feature | ACC | SemanticCache |
|---|---|---|
| Purpose | Bounded conversation context | Cache tool outputs |
| Matching | Structured memory extraction | Semantic similarity |
| Use Case | Long conversations | Expensive tool calls |
| Thread-aware | Yes (thread_id) | No |
Use together: ACC for conversation context, SemanticCache for tool output caching.
Best Practices¶
- Always use thread_id — Required for context persistence across turns
- Set appropriate bounds — Smaller bounds = less context but faster
- Scope per user/session — Use unique thread_id per conversation
- Monitor state — Check entity/action counts for debugging
Examples¶
See working examples:
- examples/advanced/acc.py — ACC usage patterns
- examples/advanced/content_review.py — ACC with Memory integration
API Reference¶
BoundedMemoryState¶
```python
class BoundedMemoryState:
    def __init__(
        self,
        max_constraints: int = 10,
        max_entities: int = 50,
        max_actions: int = 30,
        max_context: int = 20,
    ):
        """Initialize bounded state with category limits."""

    @property
    def constraints(self) -> list[str]: ...
    @property
    def entities(self) -> list[str]: ...
    @property
    def actions(self) -> list[str]: ...
    @property
    def context(self) -> list[str]: ...
```
AgentCognitiveCompressor¶
```python
class AgentCognitiveCompressor:
    def __init__(
        self,
        state: BoundedMemoryState,
        forget_gate: SemanticForgetGate | None = None,
    ):
        """Initialize ACC with bounded state."""

    async def update_from_turn(
        self,
        user_message: str,
        assistant_message: str,
        tool_calls: list[dict],
        current_task: str,
    ) -> None:
        """Update memory state from a conversation turn."""
```
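Given the `update_from_turn` signature above, a conversation loop would feed each completed turn into the compressor. The sketch below uses a stub class with the same documented signature so it is self-contained; the real `AgentCognitiveCompressor` performs extraction instead of merely recording the turn.

```python
import asyncio

# Stand-in with the documented update_from_turn signature (the real
# cogent class extracts constraints/entities/actions; this stub only
# records turns so the sketch runs without the library).
class StubCompressor:
    def __init__(self):
        self.turns: list[dict] = []

    async def update_from_turn(
        self,
        user_message: str,
        assistant_message: str,
        tool_calls: list[dict],
        current_task: str,
    ) -> None:
        self.turns.append({
            "user": user_message,
            "assistant": assistant_message,
            "tools": tool_calls,
            "task": current_task,
        })

async def main() -> StubCompressor:
    acc = StubCompressor()
    # One call per completed user/assistant exchange:
    await acc.update_from_turn(
        user_message="My name is Alice",
        assistant_message="Nice to meet you, Alice!",
        tool_calls=[],
        current_task="small talk",
    )
    return acc

acc = asyncio.run(main())
```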
Further Reading¶
- Memory System — Overview of all memory components
- Semantic Cache — Similarity-based caching
- Agent Configuration — Configuring agents with ACC