Subagents: Native Delegation Support¶
Status: Production Ready (v0.x.x+)
Subagents enable true multi-agent coordination where a coordinator agent delegates tasks to specialist agents while preserving full metadata (tokens, duration, delegation chain) for accurate cost tracking and observability.
Overview¶
The Solution¶
The native subagents= parameter preserves Response metadata through executor interception:
# ✅ New approach - preserves metadata
specialist = Agent(name="specialist", model="gpt-5.4")
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents={"specialist": specialist},  # Full Response[T] preserved
)
response = await coordinator.run("Analyze this data")
# Token count includes coordinator + specialist ✅
print(f"Total tokens: {response.metadata.tokens.total_tokens}")
print(f"Subagent calls: {len(response.subagent_responses)}")
How It Works¶
- LLM Perspective: Subagents appear as regular tools with a task parameter
- Executor Interception: The executor detects subagent tools and routes them to the SubagentRegistry
- Metadata Preservation: Full Response[T] objects are cached, not just strings
- Automatic Aggregation: Tokens, duration, and delegation chain are aggregated automatically
Key Principle: Zero LLM behavior changes - uses existing tool calling mechanism.
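To illustrate the first point, a subagent surfaces to the LLM as an ordinary function tool whose only parameter is the task string. The dict below is a sketch in OpenAI-style function-calling format, not the library's exact internal schema:

```python
# Illustrative only: how a subagent named "specialist" could appear
# to the LLM as a regular function tool with a single task parameter.
subagent_tool = {
    "type": "function",
    "function": {
        "name": "specialist",
        "description": "Expert in data analysis and statistics",
        "parameters": {
            "type": "object",
            "properties": {
                "task": {
                    "type": "string",
                    "description": "The task to delegate to this subagent",
                }
            },
            "required": ["task"],
        },
    },
}
```

Because the tool shape is ordinary, any model that supports tool calling can delegate without special prompting.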
Quick Start¶
Basic Example¶
from cogent import Agent

# Create specialist agents
data_analyst = Agent(
    name="data_analyst",
    model="gpt-5.4-mini",
    instructions="Analyze data and provide statistical insights.",
)
market_researcher = Agent(
    name="market_researcher",
    model="gpt-5.4-mini",
    instructions="Research market trends and competitive landscape.",
)

# Create coordinator with subagents
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4-mini",
    instructions="""Coordinate research tasks:
    - Use data_analyst for numerical analysis
    - Use market_researcher for market trends
    Synthesize their findings.""",
    # Simply pass the agents - their names become tool names
    subagents=[data_analyst, market_researcher],
)

# Run task - coordinator will delegate automatically
response = await coordinator.run(
    "Analyze Q4 2025 e-commerce growth: 18% YoY to $1.2T globally, "
    "mobile is 65% of total. What are the key insights?"
)
print(response.content)
print(f"Total tokens: {response.metadata.tokens.total_tokens}")
Note: You can also use a dict to override tool names:
# Dict form: explicit tool names (optional)
subagents={
    "data_analyst": data_analyst,            # Tool name = "data_analyst"
    "market_researcher": market_researcher,  # Tool name = "market_researcher"
}

# List form: uses agent.name (simpler!)
subagents=[data_analyst, market_researcher]  # Uses agent names automatically
Structured Output from Subagents¶
By default, a subagent's response is sent to the parent LLM as a plain string. Use returns= on the subagent to declare a structured output schema — the coordinator's LLM then receives clean JSON it can reason over directly.
How It Works¶
- returns=MySchema is set on the subagent Agent
- When the coordinator delegates to it, agent.run(task, returns=MySchema) is called automatically
- The executor serializes StructuredResult.data as JSON (model_dump_json()) for the parent LLM
- The parent LLM sees clean, typed JSON instead of a Python repr string
This works identically for remote A2A subagents via A2AAgent: set returns= at construction time or pass it per-call. The remote agent's response is parsed on the coordinator side after the wire round-trip.
Round-trip for remote structured output:
- The remote Agent(returns=Schema) produces a StructuredResult - the server serializes it via model_dump_json() before sending over the wire
- A2AAgent receives the JSON text and re-validates it with validate_and_parse(text, schema)
- The coordinator's executor sees a StructuredResult and passes clean JSON to the parent LLM
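The round-trip can be sketched with stdlib pieces. validate_and_parse below is a minimal stand-in for the library's validation step, and a plain dataclass stands in for a Pydantic model:

```python
import json
from dataclasses import dataclass


@dataclass
class ReviewScore:
    score: int
    verdict: str
    feedback: str


# Server side: the structured result is serialized to JSON text for the wire.
wire_text = json.dumps({"score": 8, "verdict": "approved", "feedback": "Clear."})


def validate_and_parse(text: str, schema: type) -> object:
    # Client side (A2AAgent): re-validate received text against the schema.
    # Illustrative stand-in only - the real path also enforces field types.
    data = json.loads(text)
    return schema(**data)


result = validate_and_parse(wire_text, ReviewScore)
print(result.verdict)  # approved
```

The point of re-validating on the coordinator side is that a malformed or schema-violating remote response fails loudly before it reaches the parent LLM.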
Example¶
from pydantic import BaseModel
from typing import Literal

class ReviewScore(BaseModel):
    score: int
    verdict: Literal["approved", "needs_revision"]
    feedback: str

# Reviewer declares what it returns when used as a subagent
reviewer = Agent(
    name="reviewer",
    model="gpt-5.4-mini",
    description="Review copy for quality and compliance",
    returns=ReviewScore,
    instructions="Review content. Score 1-10. Be concise.",
)
writer = Agent(
    name="writer",
    model="gpt-5.4-mini",
    description="Write marketing copy",
    instructions="Write concise, factual copy.",
)
editor = Agent(
    name="editor",
    model="gpt-5.4-mini",
    subagents=[writer, reviewer],
    instructions="Write copy, have it reviewed, return the result.",
)

# When editor delegates to reviewer, the LLM sees:
# {"score": 8, "verdict": "approved", "feedback": "Clear and engaging."} ✅
# Instead of a raw string representation
result = await editor.run("Write a tweet for SmartWatch")
Remote A2A subagent with returns=¶
from cogent import A2AAgent

# Set returns= at construction time - applies to every delegation
remote_reviewer = A2AAgent(
    url="http://review-svc/a2a",
    name="reviewer",
    description="Review copy for quality and compliance",
    returns=ReviewScore,
)

# Or override per-call
response = await remote_reviewer.run("Review this tweet", returns=ReviewScore)
result = response.content    # StructuredResult[ReviewScore]
print(result.data.score)     # 8
print(result.data.verdict)   # "approved"
The remote agent's server serializes its StructuredResult as JSON over the wire. The A2AAgent validates the received JSON against the schema locally, using the same validate_and_parse path as local agents.
When to Use returns=¶
| Situation | Use returns=? |
|---|---|
| Subagent produces a well-defined data structure | ✅ Yes |
| Parent needs to branch on subagent output (score, verdict, etc.) | ✅ Yes |
| Subagent just writes freeform text (e.g., articles, copy) | ❌ No — plain string is fine |
| Multiple coordinators reuse the same subagent | ✅ Yes — schema is defined once |
returns= only affects behavior when the agent is called as a subagent. A standalone agent.run() call without an explicit returns= ignores it.
Accessing Metadata¶
# Token aggregation (coordinator + all subagents)
tokens = response.metadata.tokens
print(f"Total tokens: {tokens.total_tokens}")
print(f"  Prompt: {tokens.prompt_tokens}")
print(f"  Completion: {tokens.completion_tokens}")
if tokens.reasoning_tokens:
    print(f"  Reasoning: {tokens.reasoning_tokens}")

# Individual subagent responses
for sub_resp in response.subagent_responses:
    print(f"{sub_resp.metadata.agent}: {sub_resp.metadata.tokens.total_tokens} tokens")
    if sub_resp.metadata.tokens.reasoning_tokens:
        print(f"  └─ Reasoning: {sub_resp.metadata.tokens.reasoning_tokens}")

# Delegation chain
for delegation in response.metadata.delegation_chain:
    print(f"{delegation['agent']} - {delegation['tokens']} tokens - {delegation['duration']:.2f}s")
API Reference¶
Agent Constructor¶
Agent(
    name: str,
    model: str | BaseChatModel,
    subagents: dict[str, AgentLike] | Sequence[AgentLike] | None = None,
    **kwargs
)
Parameters:
- subagents: Local agents or remote A2A adapters for delegation
  - dict form: explicit tool names, {"tool_name": agent_or_a2a}
  - list/tuple form: uses agent.config.name as the tool name, [agent1, a2a_agent]
Examples:
# Local agents only
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents=[analyst_agent, researcher_agent],
)

# Mix local and remote A2A agents
from cogent import A2AAgent

coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents=[
        local_analyst,
        A2AAgent(
            url="http://remote-svc/a2a",
            name="remote_writer",
            description="Remote writing specialist",
        ),
    ],
)

# Dict form - override tool names if needed
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents={
        "custom_analyst_name": analyst_agent,
        "custom_researcher_name": researcher_agent,
    },
)
A2AAgent¶
A2AAgent(
    url: str,
    *,
    name: str,
    description: str = "",
    timeout: float = 60.0,
    headers: dict[str, str] | None = None,
    returns: type | dict | None = None,
)
Wraps a remote A2A endpoint so it can be passed to subagents= alongside local agents.
returns= accepts the same schemas as a local Agent (Pydantic model, dataclass, TypedDict, JSON Schema dict). When provided, the text response from the remote agent is parsed and validated before being returned to the caller. The per-call returns= kwarg on A2AAgent.run() overrides the instance-level value.
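For reference, the ReviewScore schema from the earlier example could also be supplied as a plain JSON Schema dict, one of the accepted forms listed above. The exact dict below is illustrative, written by hand to mirror the Pydantic model:

```python
# JSON Schema equivalent of the ReviewScore Pydantic model - usable
# wherever returns= accepts a dict schema.
review_score_schema = {
    "type": "object",
    "properties": {
        "score": {"type": "integer"},
        "verdict": {"enum": ["approved", "needs_revision"]},
        "feedback": {"type": "string"},
    },
    "required": ["score", "verdict", "feedback"],
}
```

A dict schema is handy when the coordinator and the remote service cannot share a Python class.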
A2AServer¶
A2AServer(
    agent: Agent,
    *,
    host: str = "0.0.0.0",
    port: int = 10000,
    url: str | None = None,  # public base URL for AgentCard; defaults to http://{host}:{port}
    version: str = "1.0",
    skills: list[AgentSkill] | None = None,
    task_store: str | TaskStore | None = None,  # persistence backend
    streaming: bool = True,  # emit incremental SSE token events
    push_notifications: bool = False,  # enable webhook push-notification callbacks
    security_schemes: dict[str, SecurityScheme] | None = None,  # AgentCard auth schemes
    security: list[dict[str, list[str]]] | None = None,  # AgentCard security requirements
)
| Member | Description |
|---|---|
| .app | FastAPI ASGI application (cached, safe to mount) |
| .agent_card() | Build and return the a2a.types.AgentCard |
| await .start() | Start in background (default); returns self when port is bound. Pass background=False to block until stopped |
| await .stop() | Stop a background server |
| .run() | asyncio.run(self.start(background=False)) - for scripts |
| await A2AServer.start_many(...) | Start multiple servers concurrently; returns a ServerGroup |
| async with A2AServer(...) as srv: | Start in background, stop on exit - context-manager style |
task_store options:
| Value | Behaviour |
|---|---|
| None (default) | InMemoryTaskStore - tasks lost on restart |
| "sqlite+aiosqlite:///tasks.db" | SQLite via aiosqlite; schema auto-created |
| "postgresql+asyncpg://user:pass@host/db" | PostgreSQL via asyncpg |
| TaskStore instance | Bring your own implementation |
Requires aiosqlite or asyncpg for database URLs.
push_notifications — webhook callbacks:
When push_notifications=True, the server enables the A2A push-notification protocol.
Clients may register a callback URL via tasks/pushNotification/set; the server will
POST task-status updates to that URL as the task progresses.
security_schemes and security — AgentCard authentication:
Declare authentication requirements in the AgentCard following OpenAPI 3.0 conventions.
Clients inspect the card to discover how to authenticate before sending tasks.
from a2a.types import APIKeySecurityScheme, SecurityScheme

server = A2AServer(
    agent,
    port=10002,
    security_schemes={
        "api-key": SecurityScheme(root=APIKeySecurityScheme(name="X-API-Key", in_="header"))
    },
    security=[{"api-key": []}],
)
The A2AAgent client side already supports headers= for passing auth tokens.
serve_agent¶
async def serve_agent(
    agent: Agent,
    *,
    host: str = "0.0.0.0",
    port: int = 10000,
    url: str | None = None,
    version: str = "1.0",
    task_store: str | TaskStore | None = None,
    streaming: bool = True,
    push_notifications: bool = False,
    security_schemes: dict[str, SecurityScheme] | None = None,
    security: list[dict[str, list[str]]] | None = None,
) -> None
Async convenience wrapper around A2AServer.start().
Agent.serve¶
agent.serve(
    *,
    host: str = "0.0.0.0",
    port: int = 10000,
    url: str | None = None,
    version: str = "1.0",
) -> None
Blocking one-liner for scripts. Internally creates A2AServer(self, ...).run().
Response Metadata¶
@dataclass
class Response[T]:
    content: T
    metadata: ResponseMetadata
    subagent_responses: list[Response] | None  # NEW: Responses from delegated subagents
    # ... other fields

@dataclass
class ResponseMetadata:
    agent: str
    model: str
    tokens: TokenUsage
    duration: float
    delegation_chain: list[dict] | None  # NEW: Chain of delegations
    # ... other fields

Delegation Chain Structure:

{
    "agent": "analyst",       # Subagent name
    "model": "gpt-5.4-mini",  # Model used
    "tokens": 150,            # Total tokens
    "duration": 2.5,          # Seconds
}
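Because each chain entry carries tokens and model, you can derive an approximate per-run cost. A small self-contained sketch with hypothetical per-1K-token prices (substitute your provider's real rates):

```python
# Hypothetical flat per-1K-token prices - illustrative only.
PRICES = {"gpt-5.4": 0.01, "gpt-5.4-mini": 0.001}


def chain_cost(delegation_chain: list[dict]) -> float:
    """Sum an approximate dollar cost across every delegated call."""
    return sum(
        (step["tokens"] / 1000) * PRICES.get(step["model"], 0.0)
        for step in delegation_chain
    )


chain = [
    {"agent": "analyst", "model": "gpt-5.4-mini", "tokens": 150, "duration": 2.5},
    {"agent": "writer", "model": "gpt-5.4", "tokens": 300, "duration": 4.1},
]
print(f"Approximate subagent cost: ${chain_cost(chain):.4f}")
```

Real pricing usually splits prompt and completion tokens; the chain entries shown here carry only a combined total, so this stays an estimate.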
Serving an Agent over A2A¶
To expose any Agent as a remote A2A endpoint, use one of three entry points depending
on how much control you need.
One-liner (scripts and demos):
agent = Agent(name="analyst", model="gpt-5.4", instructions="...")
agent.serve(port=10002) # blocks until Ctrl+C
Async entrypoint (integrates into an existing async main):
from cogent.agent.a2a_server import serve_agent
async def main():
    agent = Agent(name="analyst", model="gpt-5.4", instructions="...")
    await serve_agent(agent, port=10002)  # blocks until Ctrl+C
Mount into an existing FastAPI app (low-level):
from fastapi import FastAPI
from cogent.agent.a2a_server import A2AServer
import uvicorn

agent = Agent(name="analyst", model="gpt-5.4", instructions="...")
server = A2AServer(agent, port=10002, url="http://localhost:10002/a2a")

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}

app.mount("/a2a", server.app)
uvicorn.run(app, host="0.0.0.0", port=10002)
Once running, call it from anywhere with A2AAgent:
from cogent.agent.a2a import A2AAgent
remote = A2AAgent(url="http://localhost:10002", name="analyst")
coordinator = Agent(name="coordinator", model="gpt-5.4", subagents=[remote])
Best Practices¶
1. Clear Agent Responsibilities¶
# ✅ GOOD: Specific, non-overlapping responsibilities
data_cleaner = Agent(
    name="data_cleaner",
    instructions="Clean and normalize messy data. Fix formatting, handle nulls.",
)
data_validator = Agent(
    name="data_validator",
    instructions="Validate data quality. Check for errors, inconsistencies.",
)

# ❌ BAD: Overlapping, vague responsibilities
helper1 = Agent(name="helper1", instructions="Help with data stuff")
helper2 = Agent(name="helper2", instructions="Also help with data")
2. Descriptive Naming¶
# ✅ GOOD: Names that indicate purpose
subagents={
    "sql_generator": sql_agent,
    "data_visualizer": viz_agent,
    "report_writer": report_agent,
}

# ❌ BAD: Generic names
subagents={
    "agent1": sql_agent,
    "helper": viz_agent,
    "assistant": report_agent,
}
3. Coordinator Instructions¶
# ✅ GOOD: Explicit delegation guidelines
coordinator = Agent(
    instructions="""You coordinate ETL tasks:
    - Use data_analyst to understand CSV structure and issues
    - Use data_cleaner to design transformation rules
    - Use sql_generator to create database schema
    Synthesize their work into a complete ETL plan.""",
    subagents={...},
)

# ❌ BAD: Vague instructions
coordinator = Agent(
    instructions="You have some helpers. Use them if you want.",
    subagents={...},
)
4. Observability¶
Always use an Observer to see delegation flow:
from cogent import Agent, Observer
observer = Observer(level="progress")
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents={...},
    observer=observer,  # Shows [subagent-decision], [subagent-call], etc.
)
5. Context Propagation¶
Context automatically propagates through delegation:
from cogent import RunContext
ctx = RunContext(
    thread_id="session-123",
    user_id="user-456",
    metadata={"department": "analytics"},
)
response = await coordinator.run("Analyze data", context=ctx)
# All subagents receive the same context automatically
Advanced Patterns¶
Nested Subagents¶
Subagents can have their own subagents:
# Specialist with sub-specialists
data_analyst = Agent(
    name="data_analyst",
    model="gpt-5.4",
    subagents={
        "statistician": statistician_agent,
        "visualizer": viz_agent,
    },
)

# Top-level coordinator
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents={
        "data_analyst": data_analyst,  # Has its own subagents
        "report_writer": writer_agent,
    },
)
Remote Agents via A2A¶
A2AAgent wraps any Agent2Agent (A2A) protocol endpoint so it
participates in delegation exactly like a local agent. Requires the a2a extra:
from cogent import Agent, A2AAgent

remote_analyst = A2AAgent(
    url="http://analyst-service/a2a",
    name="analyst",
    description="Remote financial analyst running on a separate service",
)
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    subagents=[
        local_writer,    # in-process Agent
        remote_analyst,  # remote A2A server
    ],
)
response = await coordinator.run("Analyse Q4 results and write a report")
response = await coordinator.run("Analyse Q4 results and write a report")
The LLM sees both as regular tools — no special instructions needed. RunContext is not forwarded over the wire; each remote call carries only the task string.
Serving a Cogent Agent via A2A¶
Any cogent agent can be exposed as an A2A HTTP endpoint so that external A2A clients
(including other A2AAgent instances) can call it. Requires the same a2a extra.
High-level — recommended for scripts and __main__ blocks:
agent = Agent(
    name="analyst",
    model="gpt-5.4",
    instructions="You are a financial analyst.",
    description="Financial data analyst",
)
agent.serve(port=10002)  # blocks until Ctrl+C
Mid-level — for use inside an existing async entrypoint:
import asyncio

from cogent.agent.a2a_server import serve_agent

async def main():
    await serve_agent(agent, port=10002)

asyncio.run(main())
Low-level — mount into an existing FastAPI application:
from fastapi import FastAPI
from cogent.agent.a2a_server import A2AServer
import uvicorn

app = FastAPI()
server = A2AServer(agent, port=10002, url="http://myhost:10002/a2a")
app.mount("/a2a", server.app)  # A2AServer.app is a FastAPI sub-application
uvicorn.run(app, port=10002)
Once served, any A2A client — another cogent system, a different framework, or a raw
HTTP client — can call the agent at http://host:port/ and discover its capabilities
at http://host:port/.well-known/agent.json.
In-process — self-contained scripts and tests:
start() launches the server as a background task and returns as soon as the port is
bound. Call stop() when you are done. async with is also supported and stops the
server automatically on block exit.
from cogent import A2AAgent
from cogent.agent.a2a_server import A2AServer

# Explicit start / stop
analyst_server = await A2AServer(agent, port=10099).start()
remote = A2AAgent(url="http://localhost:10099", name="analyst")
response = await remote.run("What is 18% of 250?")
print(response.content)
await analyst_server.stop()
Multiple servers — flat, no nesting:
from cogent.agent.a2a_server import A2AServer

group = await A2AServer.start_many(
    (analyst, 10001),
    (researcher, 10002),
    (writer, 10003),
)
response = await coordinator.run("...")
await group.stop_all()
Context-manager style — automatic cleanup:
from cogent import A2AAgent
from cogent.agent.a2a_server import A2AServer

async with A2AServer(agent, port=10099) as server:
    remote = A2AAgent(url="http://localhost:10099", name="analyst")
    response = await remote.run("What is 18% of 250?")
    print(response.content)
# Server stopped automatically on exit.
Conditional Delegation¶
The LLM decides when to delegate:
coordinator = Agent(
    instructions="""Analyze requests:
    - For simple questions, answer directly
    - For complex analysis, delegate to data_analyst
    - For market research, delegate to market_researcher
    Use your judgment on which tasks need specialist help.""",
    subagents={
        "data_analyst": analyst,
        "market_researcher": researcher,
    },
)

# LLM may or may not delegate based on complexity
response1 = await coordinator.run("What is 2+2?")  # Answers directly
response2 = await coordinator.run("Analyze Q4 sales trends")  # Delegates to analyst
Mixed Tools and Subagents¶
Subagents and regular tools work together:
from cogent import tool

@tool
def search_database(query: str) -> str:
    """Search internal database."""
    return database.search(query)

coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",
    tools=[search_database],  # Regular tool
    subagents={
        "analyst": analyst_agent,  # Subagent
    },
)

# LLM can use both:
# 1. Call search_database tool → get data
# 2. Delegate to analyst → analyze data
Troubleshooting¶
Subagent not being called¶
Problem: LLM ignores subagent tools
Solutions:
- Make coordinator instructions explicit about delegation
- Use descriptive subagent names (e.g., "data_analyst" not "helper")
- Add descriptions to subagent Agent configs
specialist = Agent(
    name="specialist",
    model="gpt-5.4",
    description="Expert in data analysis and statistics",  # Helps LLM understand when to call
)
Token counts seem wrong¶
Problem: Tokens don't match expectations
Debug:
tokens = response.metadata.tokens
print(f"Coordinator total: {tokens.total_tokens}")
print(f"  Prompt: {tokens.prompt_tokens}")
print(f"  Completion: {tokens.completion_tokens}")
if tokens.reasoning_tokens:
    print(f"  Reasoning: {tokens.reasoning_tokens}")

for sub in response.subagent_responses:
    sub_tokens = sub.metadata.tokens
    print(f"{sub.metadata.agent}: {sub_tokens.total_tokens} tokens")
    if sub_tokens.reasoning_tokens:
        print(f"  └─ Reasoning: {sub_tokens.reasoning_tokens}")

print(f"Delegation chain: {response.metadata.delegation_chain}")
Note: Token counts include prompt + completion + reasoning (when available). All categories are aggregated across coordinator and all subagents.
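The aggregation invariant behind the note above can be stated as a one-line check: the top-level total should equal the coordinator's own usage plus every subagent's total. A sketch with plain integers standing in for TokenUsage fields:

```python
def verify_aggregation(total: int, coordinator_own: int, subagent_totals: list[int]) -> bool:
    """True when the reported total matches coordinator + all subagents."""
    return total == coordinator_own + sum(subagent_totals)


# Illustrative numbers: 420 coordinator tokens plus two subagent calls.
print(verify_aggregation(870, 420, [150, 300]))  # True
```

If this check fails in your own debugging, compare the delegation chain entries against subagent_responses to find the call whose usage was not counted.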
Subagent errors¶
Problem: Subagent fails during execution
Behavior: The error is returned to the coordinator as a tool result, so the LLM can retry or handle it.
# LLM sees error message and can:
# 1. Retry with different parameters
# 2. Try different subagent
# 3. Handle error in response
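The behavior can be sketched in plain Python. run_subagent_safely below is an illustrative stand-in for the executor's error handling, not library code:

```python
def run_subagent_safely(run, task: str) -> str:
    """Stand-in for the executor: a failing subagent surfaces as an error
    string in the tool result rather than an exception that aborts the run."""
    try:
        return run(task)
    except Exception as exc:
        # Returned to the LLM as the tool output, so it can retry or re-route.
        return f"Error from subagent: {exc}"


def flaky(task: str) -> str:
    # Simulated subagent that always fails.
    raise ValueError("upstream service unavailable")


print(run_subagent_safely(flaky, "analyze"))
```

Because the error arrives as ordinary tool output, no special exception handling is needed in the coordinator's instructions.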
Performance Considerations¶
Memory Usage¶
Each subagent maintains its own conversation history if conversation=True:
# ✅ GOOD: Disable conversation for stateless subagents
data_cleaner = Agent(
    name="data_cleaner",
    model="gpt-5.4-mini",
    conversation=False,  # Saves memory
)

# ❌ BAD: Unnecessary conversation history
data_cleaner = Agent(
    name="data_cleaner",
    model="gpt-5.4-mini",
    conversation=True,  # Wastes memory if not needed
)
Parallel Execution¶
Subagents execute in parallel when the LLM calls multiple subagents in a single turn:
# LLM decides to call both in one turn
# → Both execute in parallel automatically
# → Results returned together
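The mechanism is ordinary asyncio concurrency. A self-contained sketch (fake_subagent is a stand-in for a delegated call, not library code) showing that two simultaneous delegations take roughly as long as one:

```python
import asyncio
import time


async def fake_subagent(name: str, seconds: float) -> str:
    # Stand-in for a delegated subagent call that takes `seconds` to finish.
    await asyncio.sleep(seconds)
    return f"{name} done"


async def main() -> None:
    start = time.perf_counter()
    # When the LLM emits two tool calls in one turn, the executor awaits
    # them concurrently, roughly like this:
    results = await asyncio.gather(
        fake_subagent("analyst", 0.2),
        fake_subagent("researcher", 0.2),
    )
    elapsed = time.perf_counter() - start
    print(results)  # ['analyst done', 'researcher done']
    print(f"elapsed: {elapsed:.2f}s")  # about 0.2s, not 0.4s


asyncio.run(main())
```

The per-call durations in the delegation chain are wall-clock times, so overlapping calls can sum to more than the run's total duration.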
Model Selection¶
Use appropriate models for each role:
# ✅ GOOD: Match model to task complexity
coordinator = Agent(
    name="coordinator",
    model="gpt-5.4",  # Complex orchestration
    subagents={
        "summarizer": Agent(model="gpt-5.4-mini"),  # Simple task
        "analyst": Agent(model="gpt-5.4"),  # Complex analysis
    },
)
Examples¶
See:
- examples/basics/simple_delegation.py - Minimal example
- examples/advanced/subagent_coordinator.py - Full-featured coordinator
- examples/advanced/single_vs_multi_etl.py - Before/after comparison
- examples/subagent/a2a_delegation.py - A2AAgent in subagents= alongside local agents
- examples/subagent/a2a_serve.py - Serving a cogent agent as an A2A endpoint (all three API levels)
Comparison: Single Agent vs Multi-Agent¶
| Aspect | Single Agent | Multi-Agent (Subagents) |
|---|---|---|
| Memory | Natural 4-layer memory | Must share via RunContext |
| Complexity | Simple, one config | More setup, multiple configs |
| Specialization | Generalist approach | Focused specialists |
| Token cost | Usually lower | Higher (multiple calls) |
| Observability | One agent trace | Full delegation chain |
| Best for | Linear workflows | Complex coordination |
Rule of Thumb: Start with a single agent. Add subagents when you need:
- Clear specialist roles (SQL expert, data cleaner, etc.)
- Separation of concerns (analysis vs presentation)
- Delegated decision-making (coordinator decides who handles what)
FAQ¶
Q: Can subagents have different models?
A: Yes! Each agent can use a different model.
Q: Do subagents share conversation history?
A: No. Each agent has its own conversation if conversation=True. Use RunContext to share state.
Q: Can I mix subagents= and tools=?
A: Yes! They work together seamlessly.
Q: Are token counts accurate?
A: Yes - coordinator + all subagent tokens are aggregated automatically.
Q: Can subagents call each other?
A: Not directly. But nested subagents work (subagent has its own subagents).
Q: What if a subagent fails?
A: Error is returned to coordinator as a tool result. The LLM can handle it.
Q: How deep can I nest?
A: No hard limit, but 2-3 levels max is recommended for clarity.
Q: Does this work with all models?
A: Yes - any model that supports tool calling (OpenAI, Anthropic, Gemini, etc.)