Multi-Agent Configuration

This guide covers the full configuration for multi-agent systems in ByteBrew Engine. Multi-agent orchestration lets a supervisor agent delegate tasks to specialist agents, each with their own model, tools, and constraints.

Orchestrator pattern (hub-and-spoke)

The most common pattern is a single orchestrator (supervisor) agent that delegates to multiple specialist agents:

         ┌─────────────┐
         │  Orchestrator │  (persistent, powerful model)
         │  can_spawn:   │
         │  - researcher │
         │  - writer     │
         │  - reviewer   │
         └──────┬────────┘
                │
     ┌──────────┼──────────┐
     │          │          │
┌────▼───┐ ┌───▼────┐ ┌───▼────┐
│research│ │ writer │ │reviewer│  (spawn, cheaper models)
│  er    │ │        │ │        │
└────────┘ └────────┘ └────────┘

The orchestrator receives user messages and decides which specialist(s) to invoke based on reasoning. Each specialist runs with lifecycle: spawn — fresh context, focused on the subtask, terminates after completion.

can_spawn configuration

The can_spawn field lists which agents a given agent is allowed to create at runtime:

agents:
  orchestrator:
    model: gpt-4o
    can_spawn:
      - researcher     # Engine creates spawn_researcher tool
      - writer         # Engine creates spawn_writer tool
      - reviewer       # Engine creates spawn_reviewer tool

For each entry in can_spawn, the engine auto-generates a tool named spawn_<agent-name>. The orchestrator’s LLM sees these as regular tools and decides when to use them based on the system prompt and conversation context.

The generated spawn tool accepts a single message parameter — the task description for the sub-agent:

# What the LLM sees:
Tool: spawn_researcher
Description: Spawn the 'researcher' agent with a task message
Parameters:
  message (required): The task to assign to the researcher agent

Lifecycle: persistent vs spawn

Setting	When to use	Behavior
`persistent`	Orchestrators, customer-facing agents	Maintains conversation history within a single session. Each new session starts fresh unless the Memory capability is enabled. Never terminates mid-session.
`spawn`	Specialist sub-agents	Fresh context per invocation. Returns a summary when done, then terminates. No memory between invocations.

agents:
  orchestrator:
    lifecycle: persistent     # Keeps conversation history
    can_spawn: [specialist]

  specialist:
    lifecycle: spawn          # Fresh for each delegation

Nested spawn (orchestrator -> agent -> sub-agent)

Specialists can themselves spawn further sub-agents, creating a tree:

agents:
  ceo:
    model: gpt-4o
    lifecycle: persistent
    can_spawn: [sales-lead, engineering-lead]

  sales-lead:
    model: gpt-4o-mini
    lifecycle: spawn
    can_spawn: [market-researcher, proposal-writer]

  engineering-lead:
    model: gpt-4o-mini
    lifecycle: spawn
    can_spawn: [code-reviewer, test-runner]

  market-researcher:
    model: gpt-4o-mini
    lifecycle: spawn
    mcp_servers: [web-search]    # Web search via MCP (Tavily, Brave, etc.)

  proposal-writer:
    model: gpt-4o-mini
    lifecycle: spawn
    tools: [knowledge_search]

  code-reviewer:
    model: qwen-local
    lifecycle: spawn
    tools:
      - knowledge_search

  test-runner:
    model: qwen-local
    lifecycle: spawn
    tools:
      - knowledge_search

Results flow back up the tree: market-researcher returns to sales-lead, which returns to ceo.

Per-agent model assignment

Different agents can use different models. Use expensive models where reasoning quality matters, and cheap/local models for simple tasks:

agents:
  orchestrator:
    model: gpt-4o              # Best reasoning for coordination ($$$)
  researcher:
    model: gpt-4o-mini         # Good enough for web search ($)
  local-analyzer:
    model: qwen-local          # Free, runs on your GPU

models:
  gpt-4o:
    provider: openai
    api_key: ${OPENAI_API_KEY}
  gpt-4o-mini:
    provider: openai
    api_key: ${OPENAI_API_KEY}
  qwen-local:
    provider: ollama
    model: qwen2.5-coder:32b
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"

Tool whitelisting and scope isolation

Each agent only sees the tools in its own configuration. This is a critical security boundary:

agents:
  customer-service:
    tools:
      - knowledge_search         # Can search docs
      - create_ticket            # Can create tickets
      # Cannot: create_order, check_inventory

  devops-bot:
    tools:
      - manage_tasks             # Can track incidents
    mcp_servers: [web-search]   # Can search the web via MCP
      # Cannot: create_ticket, create_order

  order-processor:
    tools:
      - create_order             # Can create orders
      - check_inventory          # Can check stock
      # Cannot: knowledge_search, create_ticket

MCP servers are also isolated per-agent:

agents:
  github-bot:
    mcp_servers: [github]        # Only GitHub tools
  db-bot:
    mcp_servers: [database]      # Only database tools

confirm_before for destructive operations

Require human approval before an agent executes sensitive tools:

agents:
  order-agent:
    tools:
      - search_products          # Safe, no confirmation
      - create_order             # Dangerous, needs approval
      - refund_order             # Dangerous, needs approval
    confirm_before:
      - create_order
      - refund_order

When the agent calls create_order, the stream pauses with a confirmation event. The client must approve or reject before execution continues. See REST API Chat: Handling confirmation events.

Complete multi-agent example (3 agents)

A content creation team with a project manager, researcher, and writer:

agents:
  project-manager:
    model: gpt-4o
    lifecycle: persistent
    system: |
      You are a content project manager. When a user requests
      content, break it into research and writing tasks.
      Delegate research to the researcher and writing to the writer.
      Review the final output before returning it to the user.
    can_spawn:
      - researcher
      - writer
    tools:
      - manage_tasks

  researcher:
    model: gpt-4o-mini
    lifecycle: spawn
    system: |
      You are a research analyst. Given a topic, find relevant
      information, statistics, and examples. Return a structured
      brief with sources that a writer can use.
    tools:
      - knowledge_search
    mcp_servers: [web-search]   # Web search via MCP (Tavily, Brave, etc.)

  writer:
    model: claude-sonnet-4
    lifecycle: spawn
    system: |
      You are a content writer. Given a research brief and
      content requirements, produce polished content.
      Follow the specified format (blog post, email, report).
      Cite sources from the research brief.
    tools:
      - knowledge_search

models:
  gpt-4o:
    provider: openai
    api_key: ${OPENAI_API_KEY}
  gpt-4o-mini:
    provider: openai
    api_key: ${OPENAI_API_KEY}
  claude-sonnet-4:
    provider: anthropic
    api_key: ${ANTHROPIC_API_KEY}

Test it:

curl -N http://localhost:8443/api/v1/schemas/{schema_id}/chat \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"message": "Write a blog post about the future of AI agents in enterprise software"}'