Configuration Reference
ByteBrew Engine is configured through YAML files or the Admin Dashboard. Both methods write to the same PostgreSQL database — YAML is just a convenient bootstrap format. This reference covers every configuration option in detail.
Agent Configuration
Agents are the core building blocks of ByteBrew. Each agent is an LLM-powered entity with its own identity, behavior, tools, and memory. You define agents under the agents: key, where each key is the agent’s unique name.
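Before walking through every option, here is a minimal sketch. Only model is required; everything else falls back to the defaults listed below. The agent name helpdesk and the prompt text are illustrative, not part of any shipped configuration:

```yaml
agents:
  helpdesk:                                   # Unique agent name (hypothetical)
    model: glm-5                              # Required: references a key in models:
    system: "You are a helpful support assistant."
```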
| Parameter | Default | Description |
|---|---|---|
| model * | — | References a model defined in the models: section. Determines which LLM the agent uses for reasoning. |
| system | — | Inline system prompt string that defines the agent’s personality, role, and behavior rules. |
| lifecycle | persistent | persistent keeps context across sessions. spawn creates a fresh instance per invocation and terminates after. |
| tool_execution | sequential | sequential runs tool calls one at a time. parallel runs independent tool calls concurrently. |
| max_steps | 50 | Maximum number of reasoning iterations (1-500). Prevents infinite loops in complex tasks. |
| max_context_size | 16000 | Maximum context window in tokens (1,000-200,000). Older messages are compressed when exceeded. |
| tools | [] | List of built-in tools and custom tool names available to this agent. |
| knowledge | — | Path to a folder of documents for RAG. The engine auto-indexes files at startup. |
| mcp_servers | [] | List of MCP server names (defined in mcp_servers: section) available to this agent. |
| can_spawn | [] | List of agent names this agent can create at runtime. The engine auto-generates spawn_<name> tools. |
| confirm_before | [] | List of tool names that require user confirmation before execution. |
```yaml
agents:
  sales-agent:
    model: glm-5                      # Required: model from models: section
    system: |                         # Multi-line system prompt
      You are a sales consultant for Acme Corp.
      Always be professional and helpful.
      Never discuss competitor products.
    lifecycle: persistent             # Keep conversation history
    tool_execution: parallel          # Run independent tools concurrently
    max_steps: 100                    # Allow complex multi-step tasks
    max_context_size: 32000           # Larger context for long conversations
    tools:
      - knowledge_search              # Search product docs
      - create_order                  # Custom HTTP tool
    knowledge: "./docs/products/"     # Auto-indexed product catalog
    mcp_servers:
      - crm-api                       # CRM integration via MCP
    can_spawn:
      - researcher                    # Can delegate research tasks
    confirm_before:
      - create_order                  # Ask user before placing orders
```

System Prompts: Best Practices
The system prompt is the most important configuration for an agent. It defines personality, capabilities, constraints, and output format. A well-written prompt dramatically improves agent reliability.
Structure of an effective prompt
- Role definition — who the agent is and what organization it belongs to.
- Capabilities — what tools are available and when to use each one.
- Constraints — what the agent must never do (guardrails).
- Output format — how to structure responses (markdown, JSON, bullet points).
- Escalation rules — when to ask the user vs. act autonomously.
```yaml
# Good: specific role, clear boundaries, actionable instructions
system: |
  You are a customer support agent for ByteStore, an online electronics retailer.

  ## Your capabilities
  - Search the knowledge base for product information and policies
  - Look up order status by order ID
  - Create support tickets for issues you cannot resolve

  ## Rules
  - Always greet the customer by name if available
  - Never share internal pricing or margin data
  - If asked about a competitor, redirect to our product advantages
  - For refund requests over $500, escalate to a human agent

  ## Response format
  - Keep responses concise (2-3 paragraphs max)
  - Use bullet points for lists of options
  - Always end with a follow-up question or next step
```

For long prompts, use YAML’s multi-line block syntax (|). Long system prompts can also be managed through the Admin Dashboard editor, which provides a full-screen text area.
```yaml
agents:
  support-bot:
    model: glm-5
    system: |
      You are a customer support bot for ByteStore.
      Keep responses concise. Escalate refunds over $500.
```

Security Zones Explained
Every tool in ByteBrew is assigned a security zone that indicates its risk level. This helps operators understand what an agent can do and enforce appropriate safeguards.
| Zone | Description |
|---|---|
| Safe | Read-only or non-destructive operations. Examples: knowledge_search, show_structured_output, memory_recall. No confirmation needed. |
| Caution | Operations that modify state but are reversible. Examples: custom HTTP tools that update records or send notifications. Consider adding to confirm_before. |
| Dangerous | Operations with irreversible side effects. Examples: custom tools that create orders, delete data, or trigger external processes. Strongly recommended for confirm_before. |
```yaml
agents:
  order-agent:
    model: glm-5
    tools:
      - knowledge_search   # Safe: read-only (built-in)
      - check_inventory    # Caution: reads external system (custom HTTP tool)
      - create_order       # Dangerous: irreversible action (custom HTTP tool)
    confirm_before:
      - create_order       # Require human approval before placing orders
```

Tool Confirmation (confirm_before)
The confirm_before list specifies tools that require user approval before execution. When the agent calls a tool on this list, the engine pauses execution and sends a confirmation SSE event to the client. The client then approves or rejects the action.
```yaml
agents:
  sales-agent:
    model: glm-5
    tools:
      - knowledge_search
      - create_order
      - send_email
    confirm_before:
      - create_order   # Pause before placing orders
      - send_email     # Pause before sending emails
```

SSE event flow
- Agent decides to call create_order.
- Engine detects the tool is in confirm_before.
- Engine sends a confirmation SSE event with the tool name, input, and a confirmation_id.
- Client displays the pending action to the user.
- User approves or rejects via POST /api/v1/sessions/{session_id}/respond.
- If approved, the tool executes and the stream continues.
- If rejected, the agent receives the rejection reason and adapts.
```
event: confirmation
data: {"call_id":"call_abc","tool":"create_order","input":{"customer_id":"cust_123","items":"ProBook x1"}}
```

```bash
# Approve
curl -X POST http://localhost:8443/api/v1/sessions/sess_123/respond \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"call_id": "call_abc", "answers": ["approve"]}'

# Reject
curl -X POST http://localhost:8443/api/v1/sessions/sess_123/respond \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"call_id": "call_abc", "answers": ["reject: Customer changed their mind"]}'
```

Environment Variables
ByteBrew supports ${VAR_NAME} syntax for referencing environment variables anywhere in your YAML configuration. Variables are expanded at engine startup, so the YAML file never contains actual secrets.
How it works
Section titled “How it works”- The engine reads the YAML file and replaces every
${VAR_NAME}with the value of that environment variable. - If a referenced variable is not set, the engine logs a warning and leaves the placeholder empty.
- You can use variables in any string value: URLs, API keys, file paths, even system prompts.
- Variables are expanded once at startup (or on hot-reload). They are not re-evaluated per-request.
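The expansion is plain string substitution, analogous to shell variable expansion. As a rough illustration only (this runs in a shell, not in ByteBrew, and reuses the CATALOG_API value from the example below):

```shell
# ${VAR_NAME} expansion behaves like shell substitution: the placeholder
# is replaced with the variable's literal value at startup.
export CATALOG_API="https://api.mystore.com/v2"

# In agents.yaml you would write:   url: "${CATALOG_API}/products/search"
# After expansion the engine sees the literal value:
url="${CATALOG_API}/products/search"
echo "$url"   # → https://api.mystore.com/v2/products/search
```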
```bash
# .env file (loaded by Docker Compose automatically)
OPENAI_API_KEY=sk-proj-abc123
CATALOG_API=https://api.mystore.com/v2
WEBHOOK_SECRET=whsec_xyz789
CRM_API_KEY=crm_live_456
```

```yaml
# agents.yaml — references variables, never contains secrets
models:
  glm-5:
    provider: openai
    api_key: ${OPENAI_API_KEY}

tools:
  search_products:
    type: http
    url: "${CATALOG_API}/products/search"
```

Model Configuration
Models define the LLM backends your agents use. ByteBrew supports any OpenAI-compatible API, Anthropic, and local models via Ollama. You can configure multiple models and assign different ones to different agents.
| Parameter | Default | Description |
|---|---|---|
| provider * | — | LLM provider type: ollama, openai_compatible, anthropic, azure_openai, google, openrouter, deepseek, mistral, xai, or zai. |
| model | — | Model name as expected by the provider API (e.g., gpt-4o, claude-sonnet-4-20250514, llama3.2). |
| base_url | Provider default | Custom API endpoint. Required for Ollama and third-party OpenAI-compatible providers. |
| api_key | — | API key for the provider. Use ${VAR} syntax. Not required for Ollama. |
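To sketch how models and agents fit together (the model keys and agent names below are illustrative, reusing provider values that appear elsewhere in this reference), each agent's model field points at a key in models::

```yaml
models:
  claude-sonnet-4:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: ${ANTHROPIC_API_KEY}
  llama-local:
    provider: ollama
    model: llama3.2
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"

agents:
  support-bot:
    model: claude-sonnet-4   # Cloud model for customer-facing work
  log-summarizer:
    model: llama-local       # Free local model for internal tasks
```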
Ollama (local models)
Run models locally with zero API costs. Install Ollama, pull a model, and point ByteBrew at it:
```bash
# 1. Install Ollama (https://ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a model
ollama pull llama3.2
ollama pull qwen2.5-coder:32b
```

```yaml
# 3. Configure in ByteBrew
models:
  llama-local:
    provider: ollama
    model: llama3.2
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"   # Ollama ignores the key, but the field is required

  qwen-coder:
    provider: ollama
    model: qwen2.5-coder:32b
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
```

OpenAI-compatible providers
Any API that follows the OpenAI chat completions format works out of the box. Just change the base_url:
| Provider | base_url |
|---|---|
| OpenAI | https://api.openai.com/v1 (default, can be omitted) |
| DeepInfra | https://api.deepinfra.com/v1/openai |
| Together AI | https://api.together.xyz/v1 |
| Groq | https://api.groq.com/openai/v1 |
| vLLM | http://localhost:8000/v1 (self-hosted) |
| LiteLLM | http://localhost:4000/v1 (proxy) |
```yaml
models:
  # DeepInfra — pay-per-token cloud inference
  qwen-3-32b:
    provider: openai
    model: Qwen/Qwen3-32B
    base_url: "https://api.deepinfra.com/v1/openai"
    api_key: ${DEEPINFRA_API_KEY}

  # Groq — ultra-fast inference
  llama-groq:
    provider: openai
    model: llama-3.3-70b-versatile
    base_url: "https://api.groq.com/openai/v1"
    api_key: ${GROQ_API_KEY}

  # Self-hosted vLLM
  local-vllm:
    provider: openai
    model: meta-llama/Llama-3.2-8B-Instruct
    base_url: "http://gpu-server:8000/v1"
    api_key: "not-needed"
```

Anthropic
Native Anthropic API support with automatic message formatting:
```yaml
models:
  claude-sonnet-4:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: ${ANTHROPIC_API_KEY}
```

Azure OpenAI
Azure-hosted OpenAI models use deployment-based URLs and require an api_version field:
```yaml
models:
  gpt4-azure:
    provider: azure_openai
    base_url: "https://my-company.openai.azure.com"
    model: "gpt-4o-deploy"        # Your deployment name
    api_version: "2024-10-21"
    api_key: ${AZURE_OPENAI_KEY}
```

The engine constructs the full Azure URL automatically: {base_url}/openai/deployments/{model}/chat/completions?api-version={api_version}. Authentication uses the api-key header instead of Authorization: Bearer.
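Substituting the example values above into that template gives the URL the engine would call:

```
https://my-company.openai.azure.com/openai/deployments/gpt-4o-deploy/chat/completions?api-version=2024-10-21
```

with the key sent in the api-key header rather than as a bearer token.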
Google Gemini
Native Google Gemini API support via the generateContent endpoint:
```yaml
models:
  gemini-pro:
    provider: google
    model: "gemini-3.1-pro"
    api_key: ${GOOGLE_API_KEY}
```

Authentication uses the x-goog-api-key header. No base_url needed — the engine uses the default Google AI API endpoint.
Preset providers
Several providers have preset base_url values, so you only need to specify provider, model, and api_key:
| Provider | Preset base_url |
|---|---|
| openrouter | https://openrouter.ai/api/v1 |
| deepseek | https://api.deepseek.com/v1 |
| mistral | https://api.mistral.ai/v1 |
| xai | https://api.x.ai/v1 |
| zai | https://open.bigmodel.cn/api/paas/v4 |
```yaml
models:
  # OpenRouter — access 100+ models via one API key
  openrouter-claude:
    provider: openrouter
    model: "anthropic/claude-sonnet-4-20250514"
    api_key: ${OPENROUTER_API_KEY}

  # DeepSeek — cost-effective coding model
  deepseek-v3:
    provider: deepseek
    model: "deepseek-chat"
    api_key: ${DEEPSEEK_API_KEY}

  # Mistral
  mistral-medium:
    provider: mistral
    model: "mistral-medium-3"
    api_key: ${MISTRAL_API_KEY}

  # xAI Grok
  grok:
    provider: xai
    model: "grok-4.1"
    api_key: ${XAI_API_KEY}

  # Z.ai GLM
  glm-5:
    provider: zai
    model: "glm-5"
    api_key: ${ZAI_API_KEY}
```

Tool Configuration (Declarative YAML)
Declarative HTTP tools let you connect agents to any REST API without writing code. You define the endpoint, parameters, and authentication in YAML — the engine handles the HTTP request and passes the result back to the LLM.
| Parameter | Default | Description |
|---|---|---|
| type * | — | Tool type. Currently only http is supported for declarative tools. |
| method * | — | HTTP method: GET, POST, PUT, PATCH, DELETE. |
| url * | — | Endpoint URL. Supports ${VAR} for env vars and {{param}} for LLM-provided values. |
| params | — | Query parameters as key-value pairs. Values can use {{param}} placeholders. |
| body | — | Request body (POST/PUT/PATCH). Keys and values can use {{param}} placeholders. |
| headers | — | Additional HTTP headers as key-value pairs. |
| auth | — | Authentication block: type (bearer, basic, header), token/username/password/name/value. |
| confirmation_required | false | When true, pauses execution and asks the user before making the request. |
| description | — | Human-readable description shown to the LLM. Helps the model decide when to use this tool. |
```yaml
tools:
  # GET with query parameters
  search_products:
    type: http
    method: GET
    url: "${CATALOG_API}/products/search"
    description: "Search the product catalog by keyword"
    params:
      query: "{{search_term}}"
      limit: "10"
    auth:
      type: bearer
      token: ${API_TOKEN}

  # POST with JSON body
  create_order:
    type: http
    method: POST
    url: "${ORDER_API}/orders"
    description: "Create a new order for a customer"
    body:
      customer_id: "{{customer_id}}"
      items: "{{items}}"
      notes: "{{notes}}"
    confirmation_required: true   # Human approval before execution
    auth:
      type: bearer
      token: ${ORDER_API_TOKEN}

  # Basic auth example
  legacy_erp:
    type: http
    method: GET
    url: "${ERP_URL}/api/inventory/{{sku}}"
    auth:
      type: basic
      username: ${ERP_USER}
      password: ${ERP_PASSWORD}

  # Custom header auth
  internal_api:
    type: http
    method: GET
    url: "http://internal:3000/data"
    auth:
      type: header
      name: "X-Internal-Key"
      value: ${INTERNAL_KEY}
```

MCP Server Configuration
Model Context Protocol (MCP) servers extend agent capabilities with external tools. ByteBrew supports two transport types: stdio (the engine spawns a local process) and SSE (the engine connects to a remote HTTP server via Server-Sent Events).
| Parameter | Default | Description |
|---|---|---|
| command | — | For stdio transport: the command to run (e.g., npx, python, node). |
| args | [] | Command-line arguments for the stdio process. |
| env | {} | Environment variables passed to the stdio process. Supports ${VAR} syntax. |
| type | stdio | Transport type: stdio (default), sse, http, or streamable-http. stdio is blocked in Cloud deployments. |
| url | — | For HTTP/SSE transport: the server URL to connect to. |
| forward_headers | [] | List of HTTP header names to forward from the incoming chat request to the MCP server. Useful for passing tenant/user context to multi-tenant MCP backends. |
```yaml
mcp_servers:
  # Stdio: Engine spawns the process and communicates over stdin/stdout
  # Note: blocked in Cloud deployments
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_TOKEN: ${GITHUB_TOKEN}

  # Stdio: Python-based MCP server
  database:
    command: python
    args: ["-m", "mcp_server_postgres"]
    env:
      DATABASE_URL: ${DATABASE_URL}

  # HTTP: Streamable HTTP transport (current MCP standard)
  tavily:
    type: http
    url: "https://mcp.tavily.com/mcp"

  # SSE: Engine connects to a running server via Server-Sent Events
  analytics:
    type: sse
    url: "http://analytics-service:3000/mcp"

  # HTTP with forwarded headers — pass tenant context to multi-tenant MCP backends
  my-platform:
    type: http
    url: "http://mcp-server:8087/mcp"
    forward_headers:
      - "X-Org-Id"
      - "X-User-Id"
```

Forwarding user authentication tokens
A common pattern: your backend authenticates users with JWT tokens, but the Authorization header in the ByteBrew chat request carries the ByteBrew API key (Bearer bb_...). To pass the user’s JWT to your MCP server, use a separate header.
```
Frontend
  → Your backend   (Authorization: <user JWT>)
  → ByteBrew       (Authorization: Bearer bb_..., X-Forwarded-Authorization: <user JWT>)
  → MCP server     (X-Forwarded-Authorization: <user JWT>)
```

```yaml
mcp_servers:
  my-platform:
    type: sse
    url: "http://mcp-server:8087/sse"
    forward_headers:
      - "X-Forwarded-Authorization"   # User JWT
      - "X-Org-Id"                    # Tenant context
      - "X-User-Id"                   # User context
```

Your backend must include these headers when calling the ByteBrew chat API:
```bash
curl -X POST http://bytebrew:8443/api/v1/schemas/{schema_id}/chat \
  -H "Authorization: Bearer bb_your_api_key" \
  -H "X-Forwarded-Authorization: eyJhbGciOiJSUzI1NiIs..." \
  -H "X-Org-Id: org-123" \
  -H "X-User-Id: user-456" \
  -d '{"message": "List my devices", "stream": true}'
```

ByteBrew extracts only the headers listed in forward_headers and includes them in every MCP tool call request. The Authorization header is always consumed by ByteBrew for its own authentication and is never forwarded.
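As an illustration (header values are the hypothetical ones from the curl example above; the exact wire format depends on the MCP transport), the MCP server behind my-platform would see tool-call requests carrying only the whitelisted headers:

```
X-Forwarded-Authorization: eyJhbGciOiJSUzI1NiIs...
X-Org-Id: org-123
X-User-Id: user-456
```

The ByteBrew API key (Authorization: Bearer bb_...) never appears in these requests.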
Trigger Configuration (Planned for V3)
Cron and webhook triggers are on the V3 roadmap and not yet available.
Current approach: Enable chat on a schema via Admin Dashboard → Schemas → toggle Chat Enabled. Once enabled, the schema accepts POST /api/v1/schemas/{id}/chat requests from your application or any HTTP client.
See Concepts: Schemas & Chat for the full schema reference and the V3 trigger roadmap.
Rate Limits (EE)
Enterprise Edition supports configurable, header-based rate limiting with tiered access levels. Rate limit rules are defined at the top level of the configuration and apply to all chat API endpoints.
Each rule identifies requests by a header value (e.g., X-Org-Id) and assigns a tier based on another header (e.g., X-Subscription-Tier). Each tier defines its own request quota and time window.
```yaml
rate_limits:
  - name: "per-org"
    key_header: "X-Org-Id"
    tier_header: "X-Subscription-Tier"
    tiers:
      free:
        requests: 50
        window: "24h"
      pro:
        requests: 500
        window: "24h"
      enterprise:
        unlimited: true
    default_tier: "free"
```

| Parameter | Description |
|---|---|
| name | Unique name for this rate limit rule. |
| key_header | HTTP header used to identify the requester (e.g., org ID, user ID). |
| tier_header | HTTP header that specifies the requester’s tier (e.g., subscription level). |
| tiers | Map of tier names to rate limit parameters. |
| tiers.<name>.requests | Maximum number of requests allowed within the window. |
| tiers.<name>.window | Time window as a Go duration string (e.g., 1h, 24h, 30m). |
| tiers.<name>.unlimited | Set to true to allow unlimited requests for this tier. |
| default_tier | Tier used when the tier_header is missing or contains an unknown value. |
When rate limiting is active, every response includes X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. Requests that exceed the limit receive HTTP 429.
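For illustration, a free-tier requester with one request left in its daily window might see response headers like these (the values, and the reset value being a Unix timestamp, are assumptions for the sketch):

```
HTTP/1.1 200 OK
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 1
X-RateLimit-Reset: 1735689600
```

and once the quota is exhausted, further requests in the same window would be rejected:

```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1735689600
```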