Configuration Reference

ByteBrew Engine is configured through YAML files or the Admin Dashboard. Both methods write to the same PostgreSQL database — YAML is just a convenient bootstrap format. This reference covers every configuration option in detail.

Agents are the core building blocks of ByteBrew. Each agent is an LLM-powered entity with its own identity, behavior, tools, and memory. You define agents under the agents: key, where each key is the agent’s unique name.

| Parameter | Default | Description |
|-----------|---------|-------------|
| model * | | References a model defined in the models: section. Determines which LLM the agent uses for reasoning. |
| system | | Inline system prompt string that defines the agent's personality, role, and behavior rules. |
| lifecycle | persistent | persistent keeps context across sessions. spawn creates a fresh instance per invocation and terminates after. |
| tool_execution | sequential | sequential runs tool calls one at a time. parallel runs independent tool calls concurrently. |
| max_steps | 50 | Maximum number of reasoning iterations (1-500). Prevents infinite loops in complex tasks. |
| max_context_size | 16000 | Maximum context window in tokens (1,000-200,000). Older messages are compressed when exceeded. |
| tools | [] | List of built-in tools and custom tool names available to this agent. |
| knowledge | | Path to a folder of documents for RAG. The engine auto-indexes files at startup. |
| mcp_servers | [] | List of MCP server names (defined in the mcp_servers: section) available to this agent. |
| can_spawn | [] | List of agent names this agent can create at runtime. The engine auto-generates spawn_<name> tools. |
| confirm_before | [] | List of tool names that require user confirmation before execution. |

Parameters marked * are required.

```yaml
agents:
  sales-agent:
    model: glm-5                     # Required: model from models: section
    system: |                        # Multi-line system prompt
      You are a sales consultant for Acme Corp.
      Always be professional and helpful.
      Never discuss competitor products.
    lifecycle: persistent            # Keep conversation history
    tool_execution: parallel         # Run independent tools concurrently
    max_steps: 100                   # Allow complex multi-step tasks
    max_context_size: 32000          # Larger context for long conversations
    tools:
      - knowledge_search             # Search product docs
      - create_order                 # Custom HTTP tool
    knowledge: "./docs/products/"    # Auto-indexed product catalog
    mcp_servers:
      - crm-api                      # CRM integration via MCP
    can_spawn:
      - researcher                   # Can delegate research tasks
    confirm_before:
      - create_order                 # Ask user before placing orders
```

The system prompt is the most important configuration for an agent. It defines personality, capabilities, constraints, and output format. A well-written prompt dramatically improves agent reliability.

- Role definition — who the agent is and what organization it belongs to.
- Capabilities — what tools are available and when to use each one.
- Constraints — what the agent must never do (guardrails).
- Output format — how to structure responses (markdown, JSON, bullet points).
- Escalation rules — when to ask the user vs. act autonomously.

```yaml
# Good: specific role, clear boundaries, actionable instructions
system: |
  You are a customer support agent for ByteStore, an online electronics retailer.

  ## Your capabilities
  - Search the knowledge base for product information and policies
  - Look up order status by order ID
  - Create support tickets for issues you cannot resolve

  ## Rules
  - Always greet the customer by name if available
  - Never share internal pricing or margin data
  - If asked about a competitor, redirect to our product advantages
  - For refund requests over $500, escalate to a human agent

  ## Response format
  - Keep responses concise (2-3 paragraphs max)
  - Use bullet points for lists of options
  - Always end with a follow-up question or next step
```

For long prompts, use YAML’s multi-line block syntax (|). Long system prompts can also be managed through the Admin Dashboard editor, which provides a full-screen text area.

```yaml
agents:
  support-bot:
    model: glm-5
    system: |
      You are a customer support bot for ByteStore.
      Keep responses concise. Escalate refunds over $500.
```

Every tool in ByteBrew is assigned a security zone that indicates its risk level. This helps operators understand what an agent can do and enforce appropriate safeguards.

| Zone | Description |
|------|-------------|
| Safe | Read-only or non-destructive operations. Examples: knowledge_search, show_structured_output, memory_recall. No confirmation needed. |
| Caution | Operations that modify state but are reversible. Examples: custom HTTP tools that update records or send notifications. Consider adding to confirm_before. |
| Dangerous | Operations with irreversible side effects. Examples: custom tools that create orders, delete data, or trigger external processes. Strongly recommended for confirm_before. |

```yaml
agents:
  order-agent:
    model: glm-5
    tools:
      - knowledge_search   # Safe: read-only (built-in)
      - check_inventory    # Caution: reads external system (custom HTTP tool)
      - create_order       # Dangerous: irreversible action (custom HTTP tool)
    confirm_before:
      - create_order       # Require human approval before placing orders
```

The confirm_before list specifies tools that require user approval before execution. When the agent calls one of these tools, the engine pauses execution and sends a confirmation SSE event to the client. The client then approves or rejects the action.

```yaml
agents:
  sales-agent:
    model: glm-5
    tools:
      - knowledge_search
      - create_order
      - send_email
    confirm_before:
      - create_order   # Pause before placing orders
      - send_email     # Pause before sending emails
```

1. Agent decides to call create_order.
2. Engine detects the tool is in confirm_before.
3. Engine sends a confirmation SSE event with the tool name, input, and a call_id.
4. Client displays the pending action to the user.
5. User approves or rejects via POST /api/v1/sessions/{session_id}/respond.
6. If approved, the tool executes and the stream continues.
7. If rejected, the agent receives the rejection reason and adapts.

```
event: confirmation
data: {"call_id":"call_abc","tool":"create_order","input":{"customer_id":"cust_123","items":"ProBook x1"}}
```

```bash
# Approve
curl -X POST http://localhost:8443/api/v1/sessions/sess_123/respond \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"call_id": "call_abc", "answers": ["approve"]}'

# Reject
curl -X POST http://localhost:8443/api/v1/sessions/sess_123/respond \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"call_id": "call_abc", "answers": ["reject: Customer changed their mind"]}'
```

ByteBrew supports ${VAR_NAME} syntax for referencing environment variables anywhere in your YAML configuration. Variables are expanded at engine startup, so the YAML file never contains actual secrets.

- The engine reads the YAML file and replaces every ${VAR_NAME} with the value of that environment variable.
- If a referenced variable is not set, the engine logs a warning and leaves the placeholder empty.
- You can use variables in any string value: URLs, API keys, file paths, even system prompts.
- Variables are expanded once at startup (or on hot-reload). They are not re-evaluated per-request.

```bash
# .env file (loaded by Docker Compose automatically)
OPENAI_API_KEY=sk-proj-abc123
CATALOG_API=https://api.mystore.com/v2
WEBHOOK_SECRET=whsec_xyz789
CRM_API_KEY=crm_live_456
```

```yaml
# agents.yaml — references variables, never contains secrets
models:
  glm-5:
    provider: openai
    api_key: ${OPENAI_API_KEY}

tools:
  search_products:
    type: http
    url: "${CATALOG_API}/products/search"
```

Models define the LLM backends your agents use. ByteBrew supports any OpenAI-compatible API, Anthropic, and local models via Ollama. You can configure multiple models and assign different ones to different agents.

| Parameter | Default | Description |
|-----------|---------|-------------|
| provider * | | LLM provider type: ollama, openai_compatible, anthropic, azure_openai, google, openrouter, deepseek, mistral, xai, or zai. |
| model | | Model name as expected by the provider API (e.g., gpt-4o, claude-sonnet-4-20250514, llama3.2). |
| base_url | Provider default | Custom API endpoint. Required for Ollama and third-party OpenAI-compatible providers. |
| api_key | | API key for the provider. Use ${VAR} syntax. Not required for Ollama. |
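
As a minimal sketch of the multi-model pattern, here are two models assigned to two different agents. The model names, agent names, and prompts are illustrative, not defaults shipped with ByteBrew:

```yaml
models:
  fast-model:                        # illustrative: a cheap model for routing
    provider: openai
    model: gpt-4o-mini
    api_key: ${OPENAI_API_KEY}
  claude-sonnet-4:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: ${ANTHROPIC_API_KEY}

agents:
  triage-agent:
    model: fast-model                # fast, inexpensive model for classification
    system: "Classify each request and hand it to the right team."
  analyst-agent:
    model: claude-sonnet-4           # stronger model for multi-step reasoning
    system: "Write a detailed analysis of each escalated request."
```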

Run models locally with zero API costs. Install Ollama, pull a model, and point ByteBrew at it:

```bash
# 1. Install Ollama (https://ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a model
ollama pull llama3.2
ollama pull qwen2.5-coder:32b
```

```yaml
# 3. Configure in ByteBrew
models:
  llama-local:
    provider: ollama
    model: llama3.2
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"   # Ollama ignores the key, but the field is required
  qwen-coder:
    provider: ollama
    model: qwen2.5-coder:32b
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
```

Any API that follows the OpenAI chat completions format works out of the box. Just change the base_url:

| Provider | base_url |
|----------|----------|
| OpenAI | https://api.openai.com/v1 (default, can be omitted) |
| DeepInfra | https://api.deepinfra.com/v1/openai |
| Together AI | https://api.together.xyz/v1 |
| Groq | https://api.groq.com/openai/v1 |
| vLLM | http://localhost:8000/v1 (self-hosted) |
| LiteLLM | http://localhost:4000/v1 (proxy) |

```yaml
models:
  # DeepInfra — pay-per-token cloud inference
  qwen-3-32b:
    provider: openai
    model: Qwen/Qwen3-32B
    base_url: "https://api.deepinfra.com/v1/openai"
    api_key: ${DEEPINFRA_API_KEY}

  # Groq — ultra-fast inference
  llama-groq:
    provider: openai
    model: llama-3.3-70b-versatile
    base_url: "https://api.groq.com/openai/v1"
    api_key: ${GROQ_API_KEY}

  # Self-hosted vLLM
  local-vllm:
    provider: openai
    model: meta-llama/Llama-3.2-8B-Instruct
    base_url: "http://gpu-server:8000/v1"
    api_key: "not-needed"
```

Native Anthropic API support with automatic message formatting:

```yaml
models:
  claude-sonnet-4:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: ${ANTHROPIC_API_KEY}
```

Azure-hosted OpenAI models use deployment-based URLs and require an api_version field:

```yaml
models:
  gpt4-azure:
    provider: azure_openai
    base_url: "https://my-company.openai.azure.com"
    model: "gpt-4o-deploy"       # Your deployment name
    api_version: "2024-10-21"
    api_key: ${AZURE_OPENAI_KEY}
```

The engine constructs the full Azure URL automatically: {base_url}/openai/deployments/{model}/chat/completions?api-version={api_version}. Authentication uses the api-key header instead of Authorization: Bearer.
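
For the gpt4-azure example above, substituting into that template gives the resolved request URL:

```
https://my-company.openai.azure.com/openai/deployments/gpt-4o-deploy/chat/completions?api-version=2024-10-21
```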

Native Google Gemini API support via the generateContent endpoint:

```yaml
models:
  gemini-pro:
    provider: google
    model: "gemini-3.1-pro"
    api_key: ${GOOGLE_API_KEY}
```

Authentication uses the x-goog-api-key header. No base_url needed — the engine uses the default Google AI API endpoint.

Several providers have preset base_url values, so you only need to specify provider, model, and api_key:

| Provider | Preset base_url |
|----------|-----------------|
| openrouter | https://openrouter.ai/api/v1 |
| deepseek | https://api.deepseek.com/v1 |
| mistral | https://api.mistral.ai/v1 |
| xai | https://api.x.ai/v1 |
| zai | https://open.bigmodel.cn/api/paas/v4 |

```yaml
models:
  # OpenRouter — access 100+ models via one API key
  openrouter-claude:
    provider: openrouter
    model: "anthropic/claude-sonnet-4-20250514"
    api_key: ${OPENROUTER_API_KEY}

  # DeepSeek — cost-effective coding model
  deepseek-v3:
    provider: deepseek
    model: "deepseek-chat"
    api_key: ${DEEPSEEK_API_KEY}

  # Mistral
  mistral-medium:
    provider: mistral
    model: "mistral-medium-3"
    api_key: ${MISTRAL_API_KEY}

  # xAI Grok
  grok:
    provider: xai
    model: "grok-4.1"
    api_key: ${XAI_API_KEY}

  # Z.ai GLM
  glm-5:
    provider: zai
    model: "glm-5"
    api_key: ${ZAI_API_KEY}
```

Declarative HTTP tools let you connect agents to any REST API without writing code. You define the endpoint, parameters, and authentication in YAML — the engine handles the HTTP request and passes the result back to the LLM.

| Parameter | Default | Description |
|-----------|---------|-------------|
| type * | | Tool type. Currently only http is supported for declarative tools. |
| method * | | HTTP method: GET, POST, PUT, PATCH, DELETE. |
| url * | | Endpoint URL. Supports ${VAR} for env vars and {{param}} for LLM-provided values. |
| params | | Query parameters as key-value pairs. Values can use {{param}} placeholders. |
| body | | Request body (POST/PUT/PATCH). Keys and values can use {{param}} placeholders. |
| headers | | Additional HTTP headers as key-value pairs. |
| auth | | Authentication block: type (bearer, basic, or header) plus the matching credential fields (token, username/password, or name/value). |
| confirmation_required | false | When true, pauses execution and asks the user before making the request. |
| description | | Human-readable description shown to the LLM. Helps the model decide when to use this tool. |

```yaml
tools:
  # GET with query parameters
  search_products:
    type: http
    method: GET
    url: "${CATALOG_API}/products/search"
    description: "Search the product catalog by keyword"
    params:
      query: "{{search_term}}"
      limit: "10"
    auth:
      type: bearer
      token: ${API_TOKEN}

  # POST with JSON body
  create_order:
    type: http
    method: POST
    url: "${ORDER_API}/orders"
    description: "Create a new order for a customer"
    body:
      customer_id: "{{customer_id}}"
      items: "{{items}}"
      notes: "{{notes}}"
    confirmation_required: true   # Human approval before execution
    auth:
      type: bearer
      token: ${ORDER_API_TOKEN}

  # Basic auth example
  legacy_erp:
    type: http
    method: GET
    url: "${ERP_URL}/api/inventory/{{sku}}"
    auth:
      type: basic
      username: ${ERP_USER}
      password: ${ERP_PASSWORD}

  # Custom header auth
  internal_api:
    type: http
    method: GET
    url: "http://internal:3000/data"
    auth:
      type: header
      name: "X-Internal-Key"
      value: ${INTERNAL_KEY}
```

Model Context Protocol (MCP) servers extend agent capabilities with external tools. ByteBrew supports two families of transport: stdio (the engine spawns a local process and communicates over stdin/stdout) and HTTP-based transports (sse, http, and streamable-http, where the engine connects to a remote server).

| Parameter | Default | Description |
|-----------|---------|-------------|
| command | | For stdio transport: the command to run (e.g., npx, python, node). |
| args | [] | Command-line arguments for the stdio process. |
| env | {} | Environment variables passed to the stdio process. Supports ${VAR} syntax. |
| type | stdio | Transport type: stdio (default), sse, http, or streamable-http. stdio is blocked in Cloud deployments. |
| url | | For HTTP/SSE transport: the server URL to connect to. |
| forward_headers | [] | List of HTTP header names to forward from the incoming chat request to the MCP server. Useful for passing tenant/user context to multi-tenant MCP backends. |

```yaml
mcp_servers:
  # Stdio: engine spawns the process and communicates over stdin/stdout.
  # Note: blocked in Cloud deployments.
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_TOKEN: ${GITHUB_TOKEN}

  # Stdio: Python-based MCP server
  database:
    command: python
    args: ["-m", "mcp_server_postgres"]
    env:
      DATABASE_URL: ${DATABASE_URL}

  # HTTP: streamable HTTP transport (current MCP standard)
  tavily:
    type: http
    url: "https://mcp.tavily.com/mcp"

  # SSE: engine connects to a running server via Server-Sent Events
  analytics:
    type: sse
    url: "http://analytics-service:3000/mcp"

  # HTTP with forwarded headers — pass tenant context to multi-tenant MCP backends
  my-platform:
    type: http
    url: "http://mcp-server:8087/mcp"
    forward_headers:
      - "X-Org-Id"
      - "X-User-Id"
```

A common pattern: your backend authenticates users with JWT tokens, but the Authorization header in the ByteBrew chat request carries the ByteBrew API key (Bearer bb_...). To pass the user’s JWT to your MCP server, use a separate header.

```
Frontend → Your backend   (Authorization: <user JWT>)
         → ByteBrew       (Authorization: Bearer bb_..., X-Forwarded-Authorization: <user JWT>)
         → MCP server     (X-Forwarded-Authorization: <user JWT>)
```

```yaml
mcp_servers:
  my-platform:
    type: sse
    url: "http://mcp-server:8087/sse"
    forward_headers:
      - "X-Forwarded-Authorization"   # User JWT
      - "X-Org-Id"                    # Tenant context
      - "X-User-Id"                   # User context
```

Your backend must include these headers when calling the ByteBrew chat API:

```bash
curl -X POST http://bytebrew:8443/api/v1/schemas/{schema_id}/chat \
  -H "Authorization: Bearer bb_your_api_key" \
  -H "X-Forwarded-Authorization: eyJhbGciOiJSUzI1NiIs..." \
  -H "X-Org-Id: org-123" \
  -H "X-User-Id: user-456" \
  -d '{"message": "List my devices", "stream": true}'
```

ByteBrew extracts only the headers listed in forward_headers and includes them in every MCP tool call request. The Authorization header is always consumed by ByteBrew for its own authentication and is never forwarded.
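
Given the forward_headers list and the curl request above, the extra headers attached to each MCP tool call would be the following. This sketch shows only the forwarded header set, not the full wire format of the MCP request:

```
X-Forwarded-Authorization: eyJhbGciOiJSUzI1NiIs...
X-Org-Id: org-123
X-User-Id: user-456
```

Note the absence of Authorization: Bearer bb_..., which ByteBrew always consumes for its own authentication.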

Cron and webhook triggers are on the V3 roadmap and not yet available.

Current approach: Enable chat on a schema via Admin Dashboard → Schemas → toggle Chat Enabled. Once enabled, the schema accepts POST /api/v1/schemas/{id}/chat requests from your application or any HTTP client.
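
For example, a minimal chat request might look like this. The message and stream fields follow the chat payload shown earlier; the host, token, and schema ID are placeholders:

```bash
curl -X POST http://localhost:8443/api/v1/schemas/{schema_id}/chat \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the status of order 42?", "stream": true}'
```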

See Concepts: Schemas & Chat for the full schema reference and the V3 trigger roadmap.

Enterprise Edition supports configurable, header-based rate limiting with tiered access levels. Rate limit rules are defined at the top level of the configuration and apply to all chat API endpoints.

Each rule identifies requests by a header value (e.g., X-Org-Id) and assigns a tier based on another header (e.g., X-Subscription-Tier). Each tier defines its own request quota and time window.

```yaml
rate_limits:
  - name: "per-org"
    key_header: "X-Org-Id"
    tier_header: "X-Subscription-Tier"
    tiers:
      free:
        requests: 50
        window: "24h"
      pro:
        requests: 500
        window: "24h"
      enterprise:
        unlimited: true
    default_tier: "free"
```

| Parameter | Description |
|-----------|-------------|
| name | Unique name for this rate limit rule. |
| key_header | HTTP header used to identify the requester (e.g., org ID, user ID). |
| tier_header | HTTP header that specifies the requester's tier (e.g., subscription level). |
| tiers | Map of tier names to rate limit parameters. |
| tiers.<name>.requests | Maximum number of requests allowed within the window. |
| tiers.<name>.window | Time window as a Go duration string (e.g., 1h, 24h, 30m). |
| tiers.<name>.unlimited | Set to true to allow unlimited requests for this tier. |
| default_tier | Tier used when the tier_header is missing or contains an unknown value. |

When rate limiting is active, every response includes X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. Requests that exceed the limit receive HTTP 429.
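
As a sketch, a response for a free-tier requester under the example rule above might carry headers like these. The exact value format of X-RateLimit-Reset (Unix timestamp vs. seconds until reset) is an assumption, not confirmed behavior:

```
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 37
X-RateLimit-Reset: 1735689600
```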