Configuration Reference

ByteBrew Engine is configured through YAML files or the Admin Dashboard. Both methods write to the same PostgreSQL database — YAML is just a convenient bootstrap format. This reference covers every configuration option in detail.

Agents are the core building blocks of ByteBrew. Each agent is an LLM-powered entity with its own identity, behavior, tools, and memory. You define agents under the agents: key, where each key is the agent’s unique name.

| Parameter | Default | Description |
|-----------|---------|-------------|
| model * | | References a model defined in the models: section. Determines which LLM the agent uses for reasoning. |
| system | | Inline system prompt string that defines the agent's personality, role, and behavior rules. |
| lifecycle | persistent | persistent keeps context across sessions. spawn creates a fresh instance per invocation and terminates after. |
| tool_execution | sequential | sequential runs tool calls one at a time. parallel runs independent tool calls concurrently. |
| max_steps | 50 | Maximum number of reasoning iterations (1-500). Prevents infinite loops in complex tasks. |
| max_context_size | 16000 | Maximum context window in tokens (1,000-200,000). Older messages are compressed when exceeded. |
| tools | [] | List of built-in tools and custom tool names available to this agent. |
| knowledge | | Path to a folder of documents for RAG. The engine auto-indexes files at startup. |
| mcp_servers | [] | List of MCP server names (defined in the mcp_servers: section) available to this agent. |
| can_spawn | [] | List of agent names this agent can create at runtime. The engine auto-generates spawn_<name> tools. |
| confirm_before | [] | List of tool names that require user confirmation before execution. |

Parameters marked * are required.

```yaml
agents:
  sales-agent:
    model: glm-5                     # Required: model from models: section
    system: |                        # Multi-line system prompt
      You are a sales consultant for Acme Corp.
      Always be professional and helpful.
      Never discuss competitor products.
    lifecycle: persistent            # Keep conversation history
    tool_execution: parallel         # Run independent tools concurrently
    max_steps: 100                   # Allow complex multi-step tasks
    max_context_size: 32000          # Larger context for long conversations
    tools:
      - knowledge_search             # Search product docs
      - create_order                 # Custom HTTP tool
    knowledge: "./docs/products/"    # Auto-indexed product catalog
    mcp_servers:
      - crm-api                      # CRM integration via MCP
    can_spawn:
      - researcher                   # Can delegate research tasks
    confirm_before:
      - create_order                 # Ask user before placing orders
```

The system prompt is the most important configuration for an agent. It defines personality, capabilities, constraints, and output format. A well-written prompt dramatically improves agent reliability.

- Role definition — who the agent is and what organization it belongs to.
- Capabilities — what tools are available and when to use each one.
- Constraints — what the agent must never do (guardrails).
- Output format — how to structure responses (markdown, JSON, bullet points).
- Escalation rules — when to ask the user vs. act autonomously.

```yaml
# Good: specific role, clear boundaries, actionable instructions
system: |
  You are a customer support agent for ByteStore, an online electronics retailer.

  ## Your capabilities
  - Search the knowledge base for product information and policies
  - Look up order status by order ID
  - Create support tickets for issues you cannot resolve

  ## Rules
  - Always greet the customer by name if available
  - Never share internal pricing or margin data
  - If asked about a competitor, redirect to our product advantages
  - For refund requests over $500, escalate to a human agent

  ## Response format
  - Keep responses concise (2-3 paragraphs max)
  - Use bullet points for lists of options
  - Always end with a follow-up question or next step
```

For long prompts, use YAML’s multi-line block syntax (|). Long system prompts can also be managed through the Admin Dashboard editor, which provides a full-screen text area.

```yaml
agents:
  support-bot:
    model: glm-5
    system: |
      You are a customer support bot for ByteStore.
      Keep responses concise. Escalate refunds over $500.
```

Every tool in ByteBrew is assigned a security zone that indicates its risk level. This helps operators understand what an agent can do and enforce appropriate safeguards.

| Zone | Description |
|------|-------------|
| Safe | Read-only or non-destructive operations. Examples: knowledge_search, show_structured_output, memory_recall. No confirmation needed. |
| Caution | Operations that modify state but are reversible. Examples: custom HTTP tools that update records or send notifications. Consider adding to confirm_before. |
| Dangerous | Operations with irreversible side effects. Examples: custom tools that create orders, delete data, or trigger external processes. Strongly recommended for confirm_before. |

```yaml
agents:
  order-agent:
    model: glm-5
    tools:
      - knowledge_search   # Safe: read-only (built-in)
      - check_inventory    # Caution: reads external system (custom HTTP tool)
      - create_order       # Dangerous: irreversible action (custom HTTP tool)
    confirm_before:
      - create_order       # Require human approval before placing orders
```

The confirm_before list specifies tools that require user approval before execution. When the agent calls one of these tools, the engine pauses execution and sends a confirmation SSE event to the client. The client then approves or rejects the action.

```yaml
agents:
  sales-agent:
    model: glm-5
    tools:
      - knowledge_search
      - create_order
      - send_email
    confirm_before:
      - create_order   # Pause before placing orders
      - send_email     # Pause before sending emails
```

1. Agent decides to call create_order.
2. Engine detects the tool is in confirm_before.
3. Engine sends a confirmation SSE event with the tool name, input, and a call_id.
4. Client displays the pending action to the user.
5. User approves or rejects via POST /api/v1/sessions/{session_id}/respond.
6. If approved, the tool executes and the stream continues.
7. If rejected, the agent receives the rejection reason and adapts.

```
event: confirmation
data: {"call_id":"call_abc","tool":"create_order","input":{"customer_id":"cust_123","items":"ProBook x1"}}
```

```bash
# Approve
curl -X POST http://localhost:8443/api/v1/sessions/sess_123/respond \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"call_id": "call_abc", "answers": ["approve"]}'

# Reject
curl -X POST http://localhost:8443/api/v1/sessions/sess_123/respond \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"call_id": "call_abc", "answers": ["reject: Customer changed their mind"]}'
```

ByteBrew supports ${VAR_NAME} syntax for referencing environment variables anywhere in your YAML configuration. Variables are expanded at engine startup, so the YAML file never contains actual secrets.

- The engine reads the YAML file and replaces every ${VAR_NAME} with the value of that environment variable.
- If a referenced variable is not set, the engine logs a warning and leaves the placeholder empty.
- You can use variables in any string value: URLs, API keys, file paths, even system prompts.
- Variables are expanded once at startup (or on hot-reload). They are not re-evaluated per-request.

```bash
# .env file (loaded by Docker Compose automatically)
OPENAI_API_KEY=sk-proj-abc123
CATALOG_API=https://api.mystore.com/v2
WEBHOOK_SECRET=whsec_xyz789
CRM_API_KEY=crm_live_456
```

```yaml
# agents.yaml — references variables, never contains secrets
models:
  glm-5:
    provider: openai
    api_key: ${OPENAI_API_KEY}

tools:
  search_products:
    type: http
    url: "${CATALOG_API}/products/search"
```

Models define the LLM backends your agents use. ByteBrew supports any OpenAI-compatible API, Anthropic, and local models via Ollama. You can configure multiple models and assign different ones to different agents.

| Parameter | Default | Description |
|-----------|---------|-------------|
| provider * | | LLM provider type: ollama, openai_compatible, anthropic, azure_openai, google, openrouter, deepseek, mistral, xai, or zai. |
| model | | Model name as expected by the provider API (e.g., gpt-4o, claude-sonnet-4-20250514, llama3.2). |
| base_url | Provider default | Custom API endpoint. Required for Ollama and third-party OpenAI-compatible providers. |
| api_key | | API key for the provider. Use ${VAR} syntax. Not required for Ollama. |
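
As a minimal sketch of the multi-model pattern, here are two models assigned to two different agents. The model names, agent names, and prompts are illustrative, not defaults shipped with ByteBrew:

```yaml
models:
  fast-model:                        # illustrative: a cheap model for routing
    provider: openai
    model: gpt-4o-mini
    api_key: ${OPENAI_API_KEY}
  claude-sonnet-4:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: ${ANTHROPIC_API_KEY}

agents:
  triage-agent:
    model: fast-model                # fast, inexpensive model for classification
    system: "Classify each request and hand it to the right team."
  analyst-agent:
    model: claude-sonnet-4           # stronger model for multi-step reasoning
    system: "Write a detailed analysis of each escalated request."
```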

Run models locally with zero API costs. Install Ollama, pull a model, and point ByteBrew at it:

```bash
# 1. Install Ollama (https://ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a model
ollama pull llama3.2
ollama pull qwen2.5-coder:32b
```

```yaml
# 3. Configure in ByteBrew
models:
  llama-local:
    provider: ollama
    model: llama3.2
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"   # Ollama ignores the key, but the field is required
  qwen-coder:
    provider: ollama
    model: qwen2.5-coder:32b
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
```

Any API that follows the OpenAI chat completions format works out of the box. Just change the base_url:

| Provider | base_url |
|----------|----------|
| OpenAI | https://api.openai.com/v1 (default, can be omitted) |
| DeepInfra | https://api.deepinfra.com/v1/openai |
| Together AI | https://api.together.xyz/v1 |
| Groq | https://api.groq.com/openai/v1 |
| vLLM | http://localhost:8000/v1 (self-hosted) |
| LiteLLM | http://localhost:4000/v1 (proxy) |

```yaml
models:
  # DeepInfra — pay-per-token cloud inference
  qwen-3-32b:
    provider: openai
    model: Qwen/Qwen3-32B
    base_url: "https://api.deepinfra.com/v1/openai"
    api_key: ${DEEPINFRA_API_KEY}

  # Groq — ultra-fast inference
  llama-groq:
    provider: openai
    model: llama-3.3-70b-versatile
    base_url: "https://api.groq.com/openai/v1"
    api_key: ${GROQ_API_KEY}

  # Self-hosted vLLM
  local-vllm:
    provider: openai
    model: meta-llama/Llama-3.2-8B-Instruct
    base_url: "http://gpu-server:8000/v1"
    api_key: "not-needed"
```

Native Anthropic API support with automatic message formatting:

```yaml
models:
  claude-sonnet-4:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: ${ANTHROPIC_API_KEY}
```

Azure-hosted OpenAI models use deployment-based URLs and require an api_version field:

```yaml
models:
  gpt4-azure:
    provider: azure_openai
    base_url: "https://my-company.openai.azure.com"
    model: "gpt-4o-deploy"       # Your deployment name
    api_version: "2024-10-21"
    api_key: ${AZURE_OPENAI_KEY}
```

The engine constructs the full Azure URL automatically: {base_url}/openai/deployments/{model}/chat/completions?api-version={api_version}. Authentication uses the api-key header instead of Authorization: Bearer.
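
For the gpt4-azure example above, substituting into that template gives the resolved request URL:

```
https://my-company.openai.azure.com/openai/deployments/gpt-4o-deploy/chat/completions?api-version=2024-10-21
```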

Native Google Gemini API support via the generateContent endpoint:

```yaml
models:
  gemini-pro:
    provider: google
    model: "gemini-3.1-pro"
    api_key: ${GOOGLE_API_KEY}
```

Authentication uses the x-goog-api-key header. No base_url needed — the engine uses the default Google AI API endpoint.

Several providers have preset base_url values, so you only need to specify provider, model, and api_key:

| Provider | Preset base_url |
|----------|-----------------|
| openrouter | https://openrouter.ai/api/v1 |
| deepseek | https://api.deepseek.com/v1 |
| mistral | https://api.mistral.ai/v1 |
| xai | https://api.x.ai/v1 |
| zai | https://open.bigmodel.cn/api/paas/v4 |

```yaml
models:
  # OpenRouter — access 100+ models via one API key
  openrouter-claude:
    provider: openrouter
    model: "anthropic/claude-sonnet-4-20250514"
    api_key: ${OPENROUTER_API_KEY}

  # DeepSeek — cost-effective coding model
  deepseek-v3:
    provider: deepseek
    model: "deepseek-chat"
    api_key: ${DEEPSEEK_API_KEY}

  # Mistral
  mistral-medium:
    provider: mistral
    model: "mistral-medium-3"
    api_key: ${MISTRAL_API_KEY}

  # xAI Grok
  grok:
    provider: xai
    model: "grok-4.1"
    api_key: ${XAI_API_KEY}

  # Z.ai GLM
  glm-5:
    provider: zai
    model: "glm-5"
    api_key: ${ZAI_API_KEY}
```

Declarative HTTP tools let you connect agents to any REST API without writing code. You define the endpoint, parameters, and authentication in YAML — the engine handles the HTTP request and passes the result back to the LLM.

| Parameter | Default | Description |
|-----------|---------|-------------|
| type * | | Tool type. Currently only http is supported for declarative tools. |
| method * | | HTTP method: GET, POST, PUT, PATCH, DELETE. |
| url * | | Endpoint URL. Supports ${VAR} for env vars and {{param}} for LLM-provided values. |
| params | | Query parameters as key-value pairs. Values can use {{param}} placeholders. |
| body | | Request body (POST/PUT/PATCH). Keys and values can use {{param}} placeholders. |
| headers | | Additional HTTP headers as key-value pairs. |
| auth | | Authentication block: type (bearer, basic, or header) plus the matching credential fields (token, username/password, or name/value). |
| confirmation_required | false | When true, pauses execution and asks the user before making the request. |
| description | | Human-readable description shown to the LLM. Helps the model decide when to use this tool. |

```yaml
tools:
  # GET with query parameters
  search_products:
    type: http
    method: GET
    url: "${CATALOG_API}/products/search"
    description: "Search the product catalog by keyword"
    params:
      query: "{{search_term}}"
      limit: "10"
    auth:
      type: bearer
      token: ${API_TOKEN}

  # POST with JSON body
  create_order:
    type: http
    method: POST
    url: "${ORDER_API}/orders"
    description: "Create a new order for a customer"
    body:
      customer_id: "{{customer_id}}"
      items: "{{items}}"
      notes: "{{notes}}"
    confirmation_required: true   # Human approval before execution
    auth:
      type: bearer
      token: ${ORDER_API_TOKEN}

  # Basic auth example
  legacy_erp:
    type: http
    method: GET
    url: "${ERP_URL}/api/inventory/{{sku}}"
    auth:
      type: basic
      username: ${ERP_USER}
      password: ${ERP_PASSWORD}

  # Custom header auth
  internal_api:
    type: http
    method: GET
    url: "http://internal:3000/data"
    auth:
      type: header
      name: "X-Internal-Key"
      value: ${INTERNAL_KEY}
```

Model Context Protocol (MCP) servers extend agent capabilities with external tools. ByteBrew supports two families of transport: stdio (the engine spawns a local process and communicates over stdin/stdout) and HTTP-based transports (sse, http, and streamable-http, where the engine connects to a remote server).

| Parameter | Default | Description |
|-----------|---------|-------------|
| command | | For stdio transport: the command to run (e.g., npx, python, node). |
| args | [] | Command-line arguments for the stdio process. |
| env | {} | Environment variables passed to the stdio process. Supports ${VAR} syntax. |
| type | stdio | Transport type: stdio (default), sse, http, or streamable-http. stdio is blocked in Cloud deployments. |
| url | | For HTTP/SSE transport: the server URL to connect to. |
| forward_headers | [] | List of HTTP header names to forward from the incoming chat request to the MCP server. Useful for passing tenant/user context to multi-tenant MCP backends. |

```yaml
mcp_servers:
  # Stdio: engine spawns the process and communicates over stdin/stdout.
  # Note: blocked in Cloud deployments.
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_TOKEN: ${GITHUB_TOKEN}

  # Stdio: Python-based MCP server
  database:
    command: python
    args: ["-m", "mcp_server_postgres"]
    env:
      DATABASE_URL: ${DATABASE_URL}

  # HTTP: streamable HTTP transport (current MCP standard)
  tavily:
    type: http
    url: "https://mcp.tavily.com/mcp"

  # SSE: engine connects to a running server via Server-Sent Events
  analytics:
    type: sse
    url: "http://analytics-service:3000/mcp"

  # HTTP with forwarded headers — pass tenant context to multi-tenant MCP backends
  my-platform:
    type: http
    url: "http://mcp-server:8087/mcp"
    forward_headers:
      - "X-Org-Id"
      - "X-User-Id"
```

A common pattern: your backend authenticates users with JWT tokens, but the Authorization header in the ByteBrew chat request carries the ByteBrew API key (Bearer bb_...). To pass the user’s JWT to your MCP server, use a separate header.

```
Frontend → Your backend   (Authorization: <user JWT>)
         → ByteBrew       (Authorization: Bearer bb_..., X-Forwarded-Authorization: <user JWT>)
         → MCP server     (X-Forwarded-Authorization: <user JWT>)
```

```yaml
mcp_servers:
  my-platform:
    type: sse
    url: "http://mcp-server:8087/sse"
    forward_headers:
      - "X-Forwarded-Authorization"   # User JWT
      - "X-Org-Id"                    # Tenant context
      - "X-User-Id"                   # User context
```

Your backend must include these headers when calling the ByteBrew chat API:

```bash
curl -X POST http://bytebrew:8443/api/v1/schemas/{schema_id}/chat \
  -H "Authorization: Bearer bb_your_api_key" \
  -H "X-Forwarded-Authorization: eyJhbGciOiJSUzI1NiIs..." \
  -H "X-Org-Id: org-123" \
  -H "X-User-Id: user-456" \
  -d '{"message": "List my devices", "stream": true}'
```

ByteBrew extracts only the headers listed in forward_headers and includes them in every MCP tool call request. The Authorization header is always consumed by ByteBrew for its own authentication and is never forwarded.
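
Given the forward_headers list and the curl request above, the extra headers attached to each MCP tool call would be the following. This sketch shows only the forwarded header set, not the full wire format of the MCP request:

```
X-Forwarded-Authorization: eyJhbGciOiJSUzI1NiIs...
X-Org-Id: org-123
X-User-Id: user-456
```

Note the absence of Authorization: Bearer bb_..., which ByteBrew always consumes for its own authentication.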

Cron and webhook triggers are on the V3 roadmap and not yet available.

Current approach: Enable chat on a schema via Admin Dashboard → Schemas → toggle Chat Enabled. Once enabled, the schema accepts POST /api/v1/schemas/{id}/chat requests from your application or any HTTP client.
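
For example, a minimal chat request might look like this. The message and stream fields follow the chat payload shown earlier; the host, token, and schema ID are placeholders:

```bash
curl -X POST http://localhost:8443/api/v1/schemas/{schema_id}/chat \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the status of order 42?", "stream": true}'
```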

See Concepts: Schemas & Chat for the full schema reference and the V3 trigger roadmap.

Enterprise Edition supports configurable, header-based rate limiting with tiered access levels. Rate limit rules are defined at the top level of the configuration and apply to all chat API endpoints.

Each rule identifies requests by a header value (e.g., X-Org-Id) and assigns a tier based on another header (e.g., X-Subscription-Tier). Each tier defines its own request quota and time window.

```yaml
rate_limits:
  - name: "per-org"
    key_header: "X-Org-Id"
    tier_header: "X-Subscription-Tier"
    tiers:
      free:
        requests: 50
        window: "24h"
      pro:
        requests: 500
        window: "24h"
      enterprise:
        unlimited: true
    default_tier: "free"
```

| Parameter | Description |
|-----------|-------------|
| name | Unique name for this rate limit rule. |
| key_header | HTTP header used to identify the requester (e.g., org ID, user ID). |
| tier_header | HTTP header that specifies the requester's tier (e.g., subscription level). |
| tiers | Map of tier names to rate limit parameters. |
| tiers.<name>.requests | Maximum number of requests allowed within the window. |
| tiers.<name>.window | Time window as a Go duration string (e.g., 1h, 24h, 30m). |
| tiers.<name>.unlimited | Set to true to allow unlimited requests for this tier. |
| default_tier | Tier used when the tier_header is missing or contains an unknown value. |

When rate limiting is active, every response includes X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. Requests that exceed the limit receive HTTP 429.
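
As a sketch, a response for a free-tier requester under the example rule above might carry headers like these. The exact value format of X-RateLimit-Reset (Unix timestamp vs. seconds until reset) is an assumption, not confirmed behavior:

```
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 37
X-RateLimit-Reset: 1735689600
```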