Sub-Agent Delegation

Sub-agents let a primary Penguin agent break large objectives into delegated tasks that can run with scoped permissions, tailored prompts, and isolated execution state.

Delegation Model

  1. Primary agent receives a user instruction and determines that a supporting workflow is needed.
  2. Sub-agent spawn happens via the core orchestration pipeline. Each sub-agent inherits the parent's system prompt, tools, and conversation metadata unless explicitly overridden.
  3. Scoped execution ensures the sub-agent can only act within its delegated objective. Results are streamed back to the parent for evaluation.
  4. Merge and respond: The parent agent inspects the sub-agent's output (and optional partial checkpoints) and incorporates it into the final reply.
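
The four steps above can be modeled in plain Python. This is an illustrative sketch only, not Penguin's orchestration code: AgentState, spawn_sub_agent, run_scoped, and merge are hypothetical names, and the "execution" is a stub that returns a string.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class AgentState:
    """Hypothetical stand-in for the state a sub-agent inherits."""
    agent_id: str
    system_prompt: str
    tools: tuple
    metadata: dict = field(default_factory=dict)


def spawn_sub_agent(parent: AgentState, agent_id: str, **overrides) -> AgentState:
    """Step 2: copy the parent's state; only explicit overrides differ."""
    fields = {"agent_id": agent_id, "metadata": dict(parent.metadata), **overrides}
    return replace(parent, **fields)


def run_scoped(agent: AgentState, objective: str) -> dict:
    """Step 3: stubbed execution limited to the delegated objective."""
    return {
        "agent_id": agent.agent_id,
        "objective": objective,
        "output": f"[{agent.agent_id}] findings for: {objective}",
    }


def merge(parent: AgentState, result: dict) -> str:
    """Step 4: the parent folds the sub-agent's output into its reply."""
    return f"{parent.agent_id} reply, based on {result['output']}"


# Step 1: the primary agent decides a read-only audit is needed.
parent = AgentState("primary", "You are a helpful engineer.", ("shell", "web_search"))
auditor = spawn_sub_agent(parent, "audit", tools=("web_search",))
reply = merge(parent, run_scoped(auditor, "list risky files"))
print(reply)
```

The sub-agent keeps the parent's system prompt because it was not overridden, while its tool list is narrowed for the audit.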

Use Cases

  • Running long-lived analysis in parallel with a main dialogue
  • Executing read-only audits before the primary agent performs mutating actions
  • Enlisting specialized prompts (security reviewer, documentation writer) without swapping personas for the entire session

Capabilities

Shared Context

Sub-agents have access to:

  • Conversation history provided by the parent at time of spawn
  • Registered tools (file editing, shell access, web search, etc.)
  • Memory recall, including vector search results and semantic summaries

Scoped State

  • Checkpoints: Sub-agents can create checkpoints tagged with their identifier. Parents can choose to adopt or discard them.
  • Tokens and budgets: Each sub-agent run maintains its own token accounting, allowing strict budgeting without impacting the parent run.
  • Streaming callbacks: Streaming output from sub-agents is surfaced through the same event bus so UIs can display incremental progress.
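
To illustrate why separate token accounting keeps a delegated run from affecting the parent, here is a minimal sketch. TokenBudget is a hypothetical class written for this example; it is not Penguin's accounting implementation, and the limits are arbitrary.

```python
class TokenBudget:
    """Hypothetical per-run token ledger (illustration only)."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        """Record usage, refusing any charge that would exceed the limit."""
        if self.used + tokens > self.max_tokens:
            raise RuntimeError(
                f"budget exceeded: {self.used + tokens} > {self.max_tokens}"
            )
        self.used += tokens

    @property
    def remaining(self) -> int:
        return self.max_tokens - self.used


parent_budget = TokenBudget(8000)
child_budget = TokenBudget(512)  # each run owns its own ledger

child_budget.charge(300)
try:
    child_budget.charge(300)  # would total 600, above the child's 512 limit
except RuntimeError as exc:
    print(f"sub-agent stopped: {exc}")

print(parent_budget.remaining)  # the parent's ledger was never touched
```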

Working with Sub-Agents

Python API

import asyncio

from penguin.api_client import ChatOptions, PenguinClient


async def research_and_write(prompt: str) -> str:
    async with PenguinClient() as client:
        parent_id = "primary"
        researcher_id = "research"

        # Ensure a base conversation for the parent agent
        cm = client.core.conversation_manager
        cm.create_agent_conversation(parent_id)

        # Create a sub-agent under the parent, clamping its shared
        # context-window budget to 512 tokens
        cm.create_sub_agent(
            researcher_id,
            parent_agent_id=parent_id,
            shared_cw_max_tokens=512,
        )

        # Let the researcher gather information
        research_notes = await client.chat(
            prompt,
            options=ChatOptions(agent_id=researcher_id),
        )

        # Feed the findings back to the primary agent for synthesis
        return await client.chat(
            f"Summarize and refine: {research_notes}",
            options=ChatOptions(agent_id=parent_id),
        )


if __name__ == "__main__":
    # Print the synthesized reply instead of discarding it
    print(asyncio.run(research_and_write("Compile highlights from the latest changelog.")))

Under the hood the conversation manager clones the parent's system and context state, optionally clamping context-window budgets so the delegated run cannot exceed agreed limits. Advanced setups can combine this with PenguinCore.register_agent to wire dedicated executors once the engine is running.

Need repeatable personas? Define them in config.yml under the agents: section with system prompts, default tools, and model overrides (including alternate providers such as OpenRouter). Pass persona="research" to PenguinCore.register_agent or client.create_sub_agent to pull those defaults in without re-specifying each field.

The CLI exposes these persona presets via penguin agent personas, and you can register or update agents with penguin agent spawn / penguin agent set-persona. The TUI mirrors the same affordances through /agent … commands so multi-agent rosters stay visible while you experiment.

ActionXML (Agents-as-Tools)

Sub-agents can also be managed directly from model output using ActionXML tags:

  • <spawn_sub_agent>{...}</spawn_sub_agent> – create a child (defaults to isolated session/CW). Supports id, parent, persona, system_prompt, share_session, share_context_window, shared_cw_max_tokens, model_config_id or model_overrides, default_tools, and an optional initial_prompt.
  • <stop_sub_agent>{"id": "child"}</stop_sub_agent> – pause a child (engine-driven loops should skip work).
  • <resume_sub_agent>{"id": "child"}</resume_sub_agent> – resume a paused child.
  • <delegate>{"parent": "default", "child": "child", "content": "…", "channel": "dev-room"}</delegate> – send a message to a child and record a delegation event (includes channel).

See penguin/prompt_actions.py for the full syntax and examples.
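
As a rough illustration of the tag shape, the sketch below builds and parses ActionXML strings with the standard library. The tag and field names come from the list above; the helper functions action_tag and parse_action are hypothetical, and the value types (for example booleans for share_session) are assumptions. penguin/prompt_actions.py remains the authority on the real syntax.

```python
import json
import re


def action_tag(name: str, payload: dict) -> str:
    """Wrap a JSON payload in an ActionXML-style tag (hypothetical helper)."""
    return f"<{name}>{json.dumps(payload)}</{name}>"


def parse_action(text: str) -> tuple:
    """Split a single tag back into (name, payload)."""
    match = re.fullmatch(r"<(\w+)>(.*)</\1>", text, re.DOTALL)
    if match is None:
        raise ValueError("not a single well-formed action tag")
    return match.group(1), json.loads(match.group(2))


spawn = action_tag(
    "spawn_sub_agent",
    {
        "id": "child",
        "parent": "default",
        "persona": "research",
        "share_session": False,         # assumed boolean
        "share_context_window": False,  # assumed boolean
        "initial_prompt": "Summarize the changelog.",
    },
)
delegate = action_tag(
    "delegate",
    {"parent": "default", "child": "child",
     "content": "Review the release notes.", "channel": "dev-room"},
)

print(spawn)
print(parse_action(delegate))
```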

CLI Quick Reference

  • List agents:
    • penguin agent list (table)
    • penguin agent list --json (script-friendly)
  • Spawn sub-agent:
    • penguin agent spawn child --parent default --isolate-session --isolate-context [--persona research] [--model-id kimi-lite]
  • Pause/Resume:
    • penguin agent pause child
    • penguin agent resume child
  • Agent info:
    • penguin agent info child --json
  • REST convenience:
    • POST /api/v1/agents to spawn (parent optional)
    • POST /api/v1/agents/{id}/delegate to route work with channel metadata
    • PATCH /api/v1/agents/{id} with { "paused": true|false }
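
For scripting, the pause/resume call can be prepared with the standard library. This sketch only constructs the request and does not send it. The path and the "paused" field come from the list above; BASE_URL, the JSON content type, and the pause_request helper are assumptions to adjust for your deployment.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: replace with your server address


def pause_request(agent_id: str, paused: bool) -> Request:
    """Build (but do not send) a PATCH request toggling an agent's paused flag."""
    body = json.dumps({"paused": paused}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/v1/agents/{quote(agent_id, safe='')}",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


req = pause_request("child", True)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it; check the response format against your running server before relying on it.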

Live Script Example

The repository includes scripts/phaseD_live_sub_agent_demo.py, a Python client demo that spawns two sub-agents, runs focused prompts through the engine, and prints conversation summaries. Run it with:

uv run python scripts/phaseD_live_sub_agent_demo.py

It respects the same model requirements documented above (defaulting to the OpenRouter Moonshot model). Use this as a template for richer experiments or integration tests.

Message Flow & Ordering

  • Shared transport: Parent and sub-agents use the same MessageBus fabric as top-level personas. Registering a sub-agent wires an inbox handler for that agent_id; core.send_to_agent(...) simply enqueues events for that handler.
  • Event-driven: Delegates operate asynchronously. Parents send work, then consume events (stream chunks, action results, summaries) as they arrive. There is no blocking RPC; instead, the conversation manager records every message with agent_id, recipient_id, and timestamps so parents can reconstruct the full timeline.
  • Ordering guarantees: Each agent processes its own queue sequentially—tool output emitted by a delegate arrives in-order for that delegate. When multiple agents talk on the same room/channel, interleaving is determined by send time; rely on the recorded timestamps and channel metadata to understand flow.
  • Result merging: Parents typically read the delegate’s conversation history or listen on the shared channel to decide how to respond. For deterministic handoffs, write shared artifacts (e.g., context/TASK_CHARTER.md) so every participant reads the same source of truth before continuing.
  • Synchronous needs: When a parent must wait for a specific completion signal, have the delegate post a sentinel message (e.g., status=ready) or update the shared charter/status file—parents can watch for that condition before proceeding.
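
The ordering and sentinel ideas above can be sketched with asyncio queues. This is a simplified model, not the MessageBus itself: Message, delegate_worker, and parent_run are hypothetical names, and a None item is used as a shutdown marker purely for this example.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class Message:
    agent_id: str
    recipient_id: str
    content: str
    timestamp: float = field(default_factory=time.monotonic)


async def delegate_worker(inbox: asyncio.Queue, outbox: asyncio.Queue, agent_id: str) -> None:
    """Handle the inbox one message at a time, then post a sentinel."""
    while True:
        msg = await inbox.get()
        if msg is None:  # shutdown marker used only in this sketch
            break
        await outbox.put(Message(agent_id, msg.agent_id, f"done: {msg.content}"))
    await outbox.put(Message(agent_id, "primary", "status=ready"))


async def parent_run(tasks: list) -> list:
    """Send work without blocking, then consume events until the sentinel."""
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(delegate_worker(inbox, outbox, "child"))

    for task in tasks:
        await inbox.put(Message("primary", "child", task))
    await inbox.put(None)

    received = []
    while True:
        msg = await outbox.get()
        if msg.content == "status=ready":  # completion signal from the delegate
            break
        received.append(msg)
    await worker
    return received


results = asyncio.run(parent_run(["lint", "test"]))
print([m.content for m in results])
```

Because the delegate drains a single queue sequentially, its results reach the parent in the order the work was sent, and the recorded timestamps allow the timeline to be reconstructed afterwards.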

REST and WebSocket

Today, REST and WebSocket interfaces expose the agent_id routing parameter. Sub-agent orchestration occurs through the core APIs shown above. API-level payloads for sub-agent creation are on the roadmap and are expected to follow the same intent; until they ship, treat them as future work.

Best Practices

  • Keep scopes tight: Sub-agents should have a singular, well-defined objective. Broad scopes reduce determinism.
  • Budget tokens: Supply explicit limits when spawning analysis-heavy sub-agents to avoid runaway costs.
  • Audit results: Treat sub-agent output as suggestions; validate before enacting irreversible changes.
  • Instrument: Include sub-agent identifiers in your telemetry so you can monitor success rates and latency.
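
One way to act on the instrumentation advice is to record the outcome and latency of every run keyed by sub-agent identifier. SubAgentMetrics below is a hypothetical helper written for this example, not part of Penguin; in practice you would forward these values to your existing telemetry system.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SubAgentMetrics:
    """Hypothetical in-memory recorder of per-agent outcomes and latency."""

    def __init__(self) -> None:
        self.runs = defaultdict(list)  # agent_id -> [(succeeded, seconds)]

    @contextmanager
    def track(self, agent_id: str):
        """Time the wrapped block; an exception marks the run as failed."""
        start = time.perf_counter()
        succeeded = False
        try:
            yield
            succeeded = True
        finally:
            self.runs[agent_id].append((succeeded, time.perf_counter() - start))

    def success_rate(self, agent_id: str) -> float:
        runs = self.runs.get(agent_id, [])
        return sum(ok for ok, _ in runs) / len(runs) if runs else 0.0


metrics = SubAgentMetrics()

with metrics.track("research"):
    pass  # a successful sub-agent run would go here

try:
    with metrics.track("research"):
        raise ValueError("simulated sub-agent failure")
except ValueError:
    pass  # the failure is still recorded before the exception propagates

print(metrics.success_rate("research"))
```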

Roadmap

  • First-class CLI commands for configuring sub-agent templates
  • Adaptive delegation heuristics that decide when to spawn sub-agents automatically
  • Fine-grained permission profiles per sub-agent (read-only vs. write access)
  • Visualizations in the web UI showing delegation trees and progress

Need to coordinate multiple top-level personas instead? Check out Multi-Agent Orchestration.