Sub-Agent Delegation
Sub-agents let a primary Penguin agent break large objectives into delegated tasks that can run with scoped permissions, tailored prompts, and isolated execution state.
Delegation Model
- Primary agent receives a user instruction and determines that a supporting workflow is needed.
- Sub-agent spawn happens via the core orchestration pipeline. Each sub-agent inherits the parent's system prompt, tools, and conversation metadata unless explicitly overridden.
- Scoped execution ensures the sub-agent can only act within its delegated objective. Results are streamed back to the parent for evaluation.
- Merge and respond: The parent agent inspects the sub-agent's output (and optional partial checkpoints) and incorporates it into the final reply.
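The inherit-unless-overridden rule in the delegation model can be pictured as copying the parent's spec and then replacing selected fields. The sketch below is a minimal illustration under stated assumptions: `AgentSpec`, `spawn_spec`, and the tool names are hypothetical and do not mirror Penguin's internal classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical agent spec; field names are illustrative, not Penguin's API."""

    agent_id: str
    system_prompt: str
    tools: tuple[str, ...]
    metadata: dict = field(default_factory=dict)
    objective: str | None = None


def spawn_spec(parent: AgentSpec, child_id: str, objective: str, **overrides) -> AgentSpec:
    """Inherit every field from the parent, scope it to one objective, then apply overrides."""
    # Copy metadata so the child never shares the parent's dict object.
    inherited = replace(parent, agent_id=child_id, objective=objective,
                        metadata=dict(parent.metadata))
    # Explicit overrides win; unknown field names raise TypeError.
    return replace(inherited, **overrides)


parent = AgentSpec("primary", "You are Penguin.", ("edit_file", "shell"), {"session": "s1"})
# A read-only auditor: same prompt and metadata, narrower tool list.
auditor = spawn_spec(parent, "audit", "Audit src/ without writing", tools=("read_file",))
```

Freezing the dataclass makes the parent spec immutable, so a child can only diverge through the explicit `replace` calls.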
Use Cases
- Running long-lived analysis in parallel with a main dialogue
- Executing read-only audits before the primary agent performs mutating actions
- Enlisting specialized prompts (security reviewer, documentation writer) without swapping personas for the entire session
Capabilities
Shared Context
Sub-agents have access to:
- Conversation history provided by the parent at time of spawn
- Registered tools (file editing, shell access, web search, etc.)
- Memory recall, including vector search results and semantic summaries
Scoped State
- Checkpoints: Sub-agents can create checkpoints tagged with their identifier. Parents can choose to adopt or discard them.
- Tokens and budgets: Each sub-agent run maintains its own token accounting, allowing strict budgeting without impacting the parent run.
- Streaming callbacks: Streaming output from sub-agents is surfaced through the same event bus so UIs can display incremental progress.
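The scoped-state rules above (per-run token accounting, identifier-tagged checkpoints that the parent can adopt or discard) can be sketched with a small per-run object. `SubAgentRun`, `BudgetExceeded`, and `merge_checkpoints` are hypothetical names, and adopting a checkpoint is modeled here as a shallow dict merge, which is an assumption rather than Penguin's documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a sub-agent run would exceed its own token budget."""


@dataclass
class SubAgentRun:
    """Hypothetical per-run state for one sub-agent."""

    agent_id: str
    max_tokens: int
    used_tokens: int = 0
    checkpoints: list[dict] = field(default_factory=list)

    def charge(self, tokens: int) -> None:
        # The counter lives on this run only, so the parent's accounting is unaffected.
        if self.used_tokens + tokens > self.max_tokens:
            raise BudgetExceeded(
                f"{self.agent_id}: {self.used_tokens + tokens} > {self.max_tokens} tokens"
            )
        self.used_tokens += tokens

    def checkpoint(self, state: dict) -> dict:
        # Every checkpoint carries the sub-agent identifier that created it.
        cp = {"agent_id": self.agent_id, "seq": len(self.checkpoints), "state": dict(state)}
        self.checkpoints.append(cp)
        return cp


def merge_checkpoints(parent_state: dict, run: SubAgentRun, adopt: bool) -> dict:
    """Adopt the run's latest checkpoint into a copy of the parent state, or discard them all."""
    merged = dict(parent_state)
    if adopt and run.checkpoints:
        merged.update(run.checkpoints[-1]["state"])
    run.checkpoints.clear()  # either way, the child's checkpoints are consumed
    return merged
```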
Working with Sub-Agents
Python API
import asyncio

from penguin.api_client import ChatOptions, PenguinClient


async def research_and_write(prompt: str) -> str:
    async with PenguinClient() as client:
        parent_id = "primary"
        researcher_id = "research"

        # Ensure a base conversation for the parent agent
        cm = client.core.conversation_manager
        cm.create_agent_conversation(parent_id)

        # Create a sub-agent under the parent; shared_cw_max_tokens clamps
        # its context-window budget so the delegated run stays within limits
        cm.create_sub_agent(
            researcher_id,
            parent_agent_id=parent_id,
            shared_cw_max_tokens=512,
        )

        # Let the researcher gather information
        research_notes = await client.chat(
            prompt,
            options=ChatOptions(agent_id=researcher_id),
        )

        # Feed the findings back to the primary agent for synthesis
        return await client.chat(
            f"Summarize and refine: {research_notes}",
            options=ChatOptions(agent_id=parent_id),
        )


asyncio.run(research_and_write("Compile highlights from the latest changelog."))
Under the hood, the conversation manager clones the parent's system and context state, optionally clamping context-window budgets so the delegated run cannot exceed agreed limits. Advanced setups can combine this with `PenguinCore.register_agent` to wire dedicated executors once the engine is running.
Need repeatable personas? Define them in `config.yml` under the `agents:` section with system prompts, default tools, and model overrides (including alternate providers such as OpenRouter). Pass `persona="research"` to `PenguinCore.register_agent` or `client.create_sub_agent` to pull those defaults in without re-specifying each field.
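Persona presets sit between global defaults and whatever you pass explicitly. A minimal sketch of that precedence using `collections.ChainMap` follows; `PERSONAS` stands in for a parsed `agents:` section, and its keys and values, `BASE_DEFAULTS`, and `resolve_agent_config` are illustrative assumptions rather than Penguin's config schema.

```python
from __future__ import annotations

from collections import ChainMap

# Hypothetical parsed form of the `agents:` section in config.yml.
PERSONAS: dict[str, dict] = {
    "research": {
        "system_prompt": "You are a meticulous research assistant.",
        "default_tools": ["web_search", "read_file"],
        "model_overrides": {"provider": "openrouter"},
    },
}

# Hypothetical global fallbacks used when neither a persona nor the caller sets a field.
BASE_DEFAULTS: dict = {
    "system_prompt": "You are Penguin.",
    "default_tools": [],
    "model_overrides": {},
}


def resolve_agent_config(persona: str | None = None, **explicit) -> dict:
    """Look each key up in: explicit kwargs, then persona defaults, then global defaults."""
    persona_defaults = PERSONAS[persona] if persona else {}  # KeyError for unknown personas
    return dict(ChainMap(explicit, persona_defaults, BASE_DEFAULTS))
```

Because `ChainMap` resolves each key independently, an explicit `model_overrides` replaces the persona's whole dict rather than merging into it; a deep merge would need extra code.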
The CLI exposes these persona presets via `penguin agent personas`, and you can register or update agents with `penguin agent spawn` / `penguin agent set-persona`. The TUI mirrors the same affordances through `/agent …` commands so multi-agent rosters stay visible while you experiment.
ActionXML (Agents-as-Tools)
Sub-agents can also be managed directly from model output using ActionXML tags:
- `<spawn_sub_agent>{...}</spawn_sub_agent>`: create a child (defaults to an isolated session and context window). Supports `id`, `parent`, `persona`, `system_prompt`, `share_session`, `share_context_window`, `shared_cw_max_tokens`, `model_config_id` or `model_overrides`, `default_tools`, and an optional `initial_prompt`.
- `<stop_sub_agent>{"id": "child"}</stop_sub_agent>`: pause a child (engine-driven loops should skip work).
- `<resume_sub_agent>{"id": "child"}</resume_sub_agent>`: resume a paused child.
- `<delegate>{"parent": "default", "child": "child", "content": "…", "channel": "dev-room"}</delegate>`: send a message to a child and record a delegation event (includes `channel`).

See `penguin/prompt_actions.py` for the full syntax and examples.
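Because each action is a JSON object wrapped in a named tag, a host can pull actions out of model output with a tag-matching pass followed by `json.loads`. The sketch below assumes exactly that; it is not the parser in `penguin/prompt_actions.py`, and `extract_actions` and `render_action` are hypothetical helpers.

```python
import json
import re

# Tag names are taken from the list above; the parsing strategy is an assumption.
ACTION_TAGS = ("spawn_sub_agent", "stop_sub_agent", "resume_sub_agent", "delegate")

# Match <tag>BODY</tag> where the closing tag must name the same tag as the opening one.
_PATTERN = re.compile(
    r"<(?P<tag>" + "|".join(ACTION_TAGS) + r")>(?P<body>.*?)</(?P=tag)>",
    re.DOTALL,
)


def extract_actions(model_output: str) -> list[tuple[str, dict]]:
    """Return (tag, payload) pairs in the order they appear in the model output."""
    actions = []
    for match in _PATTERN.finditer(model_output):
        payload = json.loads(match.group("body"))  # raises json.JSONDecodeError if malformed
        actions.append((match.group("tag"), payload))
    return actions


def render_action(tag: str, payload: dict) -> str:
    """Build a tag from a payload, e.g. to seed prompts or tests."""
    if tag not in ACTION_TAGS:
        raise ValueError(f"unknown action tag: {tag}")
    return f"<{tag}>{json.dumps(payload)}</{tag}>"
```

Here malformed JSON simply raises; a production parser would more likely report the error back to the model so it can retry.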
CLI Quick Reference
- List agents: `penguin agent list` (table) or `penguin agent list --json` (script-friendly)
- Spawn sub-agent: `penguin agent spawn child --parent default --isolate-session --isolate-context [--persona research] [--model-id kimi-lite]`
- Pause/Resume: `penguin agent pause child` and `penguin agent resume child`
- Agent info: `penguin agent info child --json`
- REST convenience: `POST /api/v1/agents` to spawn (parent optional); `POST /api/v1/agents/{id}/delegate` to route work with channel metadata; `PATCH /api/v1/agents/{id}` with `{ "paused": true|false }`
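The REST endpoints can be exercised with the standard library alone. This sketch only builds `urllib.request.Request` objects; the base URL and the request body fields (`id`, `parent`, `content`, `channel`) are assumptions, and only the `{ "paused": true|false }` payload comes from the reference above.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: adjust to wherever your Penguin web server is listening.
BASE_URL = "http://localhost:8000"


def _json_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the Penguin REST API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def spawn_request(agent_id: str, parent: str | None = None) -> urllib.request.Request:
    # Hypothetical body fields; check the server's schema before relying on them.
    body = {"id": agent_id}
    if parent is not None:
        body["parent"] = parent
    return _json_request("POST", "/api/v1/agents", body)


def delegate_request(agent_id: str, content: str, channel: str | None = None) -> urllib.request.Request:
    # Hypothetical body fields; `channel` carries the routing metadata.
    body = {"content": content}
    if channel is not None:
        body["channel"] = channel
    return _json_request("POST", f"/api/v1/agents/{agent_id}/delegate", body)


def pause_request(agent_id: str, paused: bool) -> urllib.request.Request:
    # Matches the { "paused": true|false } payload from the quick reference.
    return _json_request("PATCH", f"/api/v1/agents/{agent_id}", {"paused": paused})
```

To actually send one, pass it to `urllib.request.urlopen(req)` while the server is running and read the JSON response.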
Live Script Example
The repository includes `scripts/phaseD_live_sub_agent_demo.py`, a Python client demo that spawns two sub-agents, runs focused prompts through the engine, and prints conversation summaries. Run it with:
uv run python scripts/phaseD_live_sub_agent_demo.py
It respects the same model requirements documented above (defaulting to the OpenRouter Moonshot model). Use this as a template for richer experiments or integration tests.
Message Flow & Ordering
- Shared transport: Parent and sub-agents use the same MessageBus fabric as top-level personas. Registering a sub-agent wires an inbox handler for that `agent_id`; `core.send_to_agent(...)` simply enqueues events for that handler.
- Event-driven: Delegates operate asynchronously. Parents send work, then consume events (stream chunks, action results, summaries) as they arrive. There is no blocking RPC; instead, the conversation manager records every message with `agent_id`, `recipient_id`, and timestamps so parents can reconstruct the full timeline.
- Ordering guarantees: Each agent processes its own queue sequentially, so tool output emitted by a delegate arrives in order for that delegate. When multiple agents talk on the same room/channel, interleaving is determined by send time; rely on the recorded timestamps and `channel` metadata to understand the flow.
- Result merging: Parents typically read the delegate's conversation history or listen on the shared channel to decide how to respond. For deterministic handoffs, write shared artifacts (e.g., `context/TASK_CHARTER.md`) so every participant reads the same source of truth before continuing.
- Synchronous needs: When a parent must wait for a specific completion signal, have the delegate post a sentinel message (e.g., `status=ready`) or update the shared charter/status file; parents can watch for that condition before proceeding.
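The ordering and sentinel rules above can be modeled with one `asyncio.Queue` per agent. In this sketch `ToyBus`, `Message`, `wait_for_sentinel`, and the agent ids are hypothetical; it does not reproduce Penguin's MessageBus, but it shows per-agent FIFO processing, a non-blocking `send`, a timestamped log for rebuilding the timeline, and a parent polling for a `status=ready` sentinel.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    recipient: str
    content: str
    channel: str | None = None
    ts: float = field(default_factory=time.monotonic)  # stamped when the message is created


class ToyBus:
    """Hypothetical stand-in for MessageBus: one FIFO queue and one worker per agent."""

    def __init__(self) -> None:
        self.queues: dict[str, asyncio.Queue[Message]] = {}
        self.log: list[Message] = []  # every message, so the timeline can be rebuilt
        self.tasks: list[asyncio.Task[None]] = []

    def register(self, agent_id: str, handler: Callable[[Message], Awaitable[None]]) -> None:
        queue: asyncio.Queue[Message] = asyncio.Queue()
        self.queues[agent_id] = queue

        async def worker() -> None:
            # Sequential: the next message is not taken until the handler finishes.
            while True:
                msg = await queue.get()
                try:
                    await handler(msg)
                finally:
                    queue.task_done()

        self.tasks.append(asyncio.create_task(worker()))

    def send(self, msg: Message) -> None:
        # Fire-and-forget: record, enqueue, return. No blocking RPC.
        self.log.append(msg)
        self.queues[msg.recipient].put_nowait(msg)


async def wait_for_sentinel(bus: ToyBus, sender: str, sentinel: str = "status=ready",
                            timeout: float = 1.0) -> bool:
    """Poll the recorded log until `sender` posts the sentinel or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if any(m.sender == sender and m.content == sentinel for m in bus.log):
            return True
        await asyncio.sleep(0.01)
    return False


async def demo() -> tuple[bool, list[str], list[Message]]:
    bus = ToyBus()
    parent_inbox: list[str] = []

    async def parent_handler(msg: Message) -> None:
        parent_inbox.append(msg.content)

    async def child_handler(msg: Message) -> None:
        # Stream two results, then post the sentinel, all on the caller's channel.
        for i in range(2):
            bus.send(Message("child", "parent", f"result {i}", msg.channel))
            await asyncio.sleep(0)
        bus.send(Message("child", "parent", "status=ready", msg.channel))

    bus.register("parent", parent_handler)
    bus.register("child", child_handler)
    bus.send(Message("parent", "child", "analyze logs", "dev-room"))

    ready = await wait_for_sentinel(bus, "child")
    await bus.queues["parent"].join()  # let the parent drain what has already arrived
    for task in bus.tasks:
        task.cancel()
    await asyncio.gather(*bus.tasks, return_exceptions=True)
    return ready, parent_inbox, sorted(bus.log, key=lambda m: m.ts)
```

Polling the log keeps the sketch short; an `asyncio.Event` set by the parent's handler when the sentinel arrives would avoid the sleep loop.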
REST and WebSocket
Today, REST and WebSocket interfaces expose the `agent_id` routing parameter. Sub-agent orchestration occurs through the core APIs shown above; API-level payloads for sub-agent creation are on the roadmap and will follow the same intent, so treat them as future work.
Best Practices
- Keep scopes tight: Sub-agents should have a singular, well-defined objective. Broad scopes reduce determinism.
- Budget tokens: Supply explicit limits when spawning analysis-heavy sub-agents to avoid runaway costs.
- Audit results: Treat sub-agent output as suggestions; validate before enacting irreversible changes.
- Instrument: Include sub-agent identifiers in your telemetry so you can monitor success rates and latency.
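For the instrumentation advice, a small recorder keyed by sub-agent identifier is enough to derive success rates and latency. `AgentTelemetry` is a hypothetical in-process helper; in practice you would forward the same `agent_id`, outcome, and duration to whatever telemetry backend you already use.

```python
from __future__ import annotations

import time
from collections import defaultdict
from contextlib import contextmanager


class AgentTelemetry:
    """Hypothetical in-process recorder keyed by sub-agent identifier."""

    def __init__(self) -> None:
        # agent_id -> list of (succeeded, latency_seconds)
        self.runs: dict[str, list[tuple[bool, float]]] = defaultdict(list)

    @contextmanager
    def track(self, agent_id: str):
        start = time.perf_counter()
        ok = False
        try:
            yield
            ok = True
        finally:
            # Record the outcome even when the delegated run raises; the error still propagates.
            self.runs[agent_id].append((ok, time.perf_counter() - start))

    def summary(self, agent_id: str) -> dict:
        runs = self.runs.get(agent_id, [])  # .get avoids creating empty entries
        if not runs:
            return {"runs": 0, "success_rate": None, "avg_latency_s": None}
        successes = sum(1 for ok, _ in runs if ok)
        return {
            "runs": len(runs),
            "success_rate": successes / len(runs),
            "avg_latency_s": sum(lat for _, lat in runs) / len(runs),
        }
```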
Roadmap
- First-class CLI commands for configuring sub-agent templates
- Adaptive delegation heuristics that decide when to spawn sub-agents automatically
- Fine-grained permission profiles per sub-agent (read-only vs. write access)
- Visualizations in the web UI showing delegation trees and progress
Need to coordinate multiple top-level personas instead? Check out Multi-Agent Orchestration.