
REST API Chat Integration

This guide covers everything you need to build a client that communicates with ByteBrew Engine over the REST API with SSE streaming.

When you send a message to POST /api/v1/schemas/{name}/chat, the engine responds with a stream of Server-Sent Events. Each event carries its type in the SSE event name and a JSON payload in the data field.

| Event | Data fields | Description |
| --- | --- | --- |
| `message_delta` | `content` | Streaming token. A partial text chunk from the agent. Concatenate all `message_delta` events for the full response. |
| `message` | `content`, `role` | Complete message. Sent when a full message is available (non-streaming mode or final assembly). |
| `thinking` | `content` | Reasoning started. The agent is processing internally. Contains partial reasoning text (if the model supports it). |
| `tool_call` | `tool`, `input` | Tool execution started. Contains the tool name and the input parameters the agent provided. |
| `tool_result` | `tool`, `output`, `error` | Tool execution completed. Contains the tool output or error message. |
| `structured_output` | `output_type`, `title`, `rows`, `actions`, `questions` | Agent emitted structured data (table, info block, or form). The client renders the block; in form mode the user's reply arrives as the next chat message. |
| `confirmation` | `tool`, `input`, `call_id` | Requires user approval. A tool with `confirm_before` is about to execute. Send approval via the confirmation endpoint. |
| `done` | `session_id`, `tokens` | Session completed. Contains the session ID for resuming and the total token count. |
| `error` | `message`, `code` | Error occurred. The stream terminates after this event. |

A typical stream:

event: thinking
data: {"content":"Let me search for that information..."}
event: tool_call
data: {"tool":"search_products","input":{"query":"laptops under 1000"}}
event: tool_result
data: {"tool":"search_products","output":"[{\"name\":\"ProBook 450\",\"price\":849}]"}
event: message_delta
data: {"content":"I found "}
event: message_delta
data: {"content":"several options for you:\n\n"}
event: message_delta
data: {"content":"1. **ProBook 450** — $849"}
event: done
data: {"session_id":"sess_abc123","tokens":156}
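A stream like the one above can be folded into client state with a small reducer. This is a sketch: `applyEvent` and its state shape are illustrative conventions, not part of the API; the event names and fields follow the table above.

```javascript
// Fold parsed SSE events into client state. Unhandled event
// types (thinking, tool_call, ...) pass through unchanged here.
function applyEvent(state, eventName, data) {
  switch (eventName) {
    case 'message_delta':
      // Concatenate partial chunks into the full assistant message.
      return { ...state, text: state.text + data.content };
    case 'done':
      // Persist the session ID so the conversation can be resumed.
      return { ...state, sessionId: data.session_id, tokens: data.tokens };
    case 'error':
      return { ...state, error: data.message };
    default:
      return state;
  }
}

const events = [
  ['message_delta', { content: 'I found ' }],
  ['message_delta', { content: 'several options for you.' }],
  ['done', { session_id: 'sess_abc123', tokens: 156 }],
];
const final = events.reduce((s, [name, data]) => applyEvent(s, name, data), { text: '' });
// final.text === 'I found several options for you.'
```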

When a tool has confirm_before configured, the stream pauses with a confirmation event:

event: confirmation
data: {"tool":"create_order","input":{"customer_id":"cust_123","items":"ProBook 450 x1"},"call_id":"conf_xyz"}

To approve or reject:

# Approve
curl -X POST http://localhost:8443/api/v1/sessions/{session_id}/respond \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"call_id": "conf_xyz", "answers": ["approve"]}'

# Reject
curl -X POST http://localhost:8443/api/v1/sessions/{session_id}/respond \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"call_id": "conf_xyz", "answers": ["reject: Customer changed their mind"]}'
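The same approve/reject calls can be made from JavaScript. A sketch: `confirmationBody` and `respondToConfirmation` are illustrative helper names, and the payload shape mirrors the curl examples above.

```javascript
// Build the payload for the respond endpoint: "approve", or
// "reject: <reason>" to reject with an explanation.
function confirmationBody(callId, approve, reason) {
  const answer = approve ? 'approve' : `reject: ${reason}`;
  return { call_id: callId, answers: [answer] };
}

// Send the approval decision for a pending confirmation event.
async function respondToConfirmation(sessionId, callId, approve, reason, token) {
  const res = await fetch(`http://localhost:8443/api/v1/sessions/${sessionId}/respond`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(confirmationBody(callId, approve, reason)),
  });
  if (!res.ok) throw new Error(`respond failed: ${res.status}`);
}
```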

When an agent calls show_structured_output, the engine emits a structured_output SSE event. The event is non-blocking — the agent’s turn ends immediately after emitting it.

event: structured_output
data: {"output_type":"summary_table","title":"Project Summary","rows":[{"label":"Name","value":"MyApp"},{"label":"Status","value":"Active"},{"label":"Users","value":"1,234"}],"actions":[{"label":"Deploy","type":"primary","value":"deploy"},{"label":"Cancel","type":"secondary","value":"cancel"}]}

| Field | Description |
| --- | --- |
| `output_type` | Type of structured output: `summary_table`, `info`, or `form`. |
| `title` | Optional title for the output block. |
| `description` | Optional description text. |
| `rows` | Array of `{label, value}` pairs for table display (`summary_table` mode). |
| `actions` | Array of `{label, type, value}` action buttons (`type`: `primary` or `secondary`). |
| `questions` | Array of input question objects in form mode (see below). |

In form mode the agent emits a structured form and its turn ends. The client renders the form and the user’s answers arrive as the next chat message — no separate respond endpoint needed.

event: structured_output
data: {"output_type":"form","title":"Leave request","questions":[{"id":"leave_type","label":"What type of leave?","type":"select","options":[{"label":"Vacation","value":"vacation"},{"label":"Sick","value":"sick"}]},{"id":"dates","label":"What dates? (start – end)","type":"text"}]}

Each question object:

| Field | Required | Description |
| --- | --- | --- |
| `id` | Yes | Stable identifier returned with the answer. |
| `label` | Yes | Question text shown to the user. |
| `type` | Yes | `text`, `select`, or `multiselect`. |
| `options` | For `select`/`multiselect` | Array of 2–5 options with `label` (and optional `value`). |
| `default` | No | Default value pre-filled for the user. |

The client submits answers as the next user message. The agent receives the answers in the next turn and continues processing.

For summary_table and info modes, the client renders the block but does not need to respond. If action buttons are present, clicking one sends the button’s value back as a regular chat message.
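Both interaction paths end in a plain chat message, which can be sketched like this. The JSON encoding of form answers keyed by question `id` is one possible client-side convention, not mandated by the API; the helper names are illustrative.

```javascript
// Clicking an action button sends its value back as plain text.
function actionToMessage(action) {
  return action.value;
}

// Serialize collected form answers ({ question_id: value }) into
// the next user message. JSON keyed by question id is one option;
// the engine just receives whatever the client sends.
function formAnswersToMessage(answers) {
  return JSON.stringify(answers);
}

actionToMessage({ label: 'Deploy', type: 'primary', value: 'deploy' }); // 'deploy'
formAnswersToMessage({ leave_type: 'vacation', dates: '2026-04-01 to 2026-04-05' });
```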

Omit session_id to start a new conversation:

curl -N http://localhost:8443/api/v1/schemas/{schema_id}/chat \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello, I need help with my order"}'

The done event returns a session_id. Save it for continuations.

Pass session_id to continue a conversation with full history:

curl -N http://localhost:8443/api/v1/schemas/{schema_id}/chat \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"message": "Can you check order #12345?", "session_id": "sess_abc123"}'

List sessions:

curl "http://localhost:8443/api/v1/sessions?agent=my-agent&limit=20" \
  -H "Authorization: Bearer bb_your_token"

Delete a session:

curl -X DELETE http://localhost:8443/api/v1/sessions/sess_abc123 \
  -H "Authorization: Bearer bb_your_token"

For clients that cannot handle SSE, set stream: false in the request body:

curl http://localhost:8443/api/v1/schemas/{schema_id}/chat \
  -H "Authorization: Bearer bb_your_token" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello", "stream": false}'

Response (standard JSON, not SSE):

{
  "response": "Hello! How can I help you today?",
  "session_id": "sess_abc123",
  "tokens": 42,
  "tool_calls": []
}
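A minimal sketch of the same non-streaming call from JavaScript. `buildChatRequest` is an illustrative helper; splitting request construction from the `fetch` call just keeps the testable part pure.

```javascript
// Assemble the URL and fetch init for a non-streaming chat request.
// Pass sessionId = null to start a new conversation.
function buildChatRequest(schemaId, message, sessionId, token) {
  return {
    url: `http://localhost:8443/api/v1/schemas/${schemaId}/chat`,
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ message, session_id: sessionId, stream: false }),
    },
  };
}

// Usage (inside an async function):
// const { url, init } = buildChatRequest('my-schema', 'Hello', null, 'bb_your_token');
// const body = await (await fetch(url, init)).json();
// body has the shape { response, session_id, tokens, tool_calls } shown above.
```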

All endpoints require a Bearer token in the Authorization header:

Authorization: Bearer bb_your_api_token

Tokens are created in Admin Dashboard -> API Keys. Each token has scopes that limit what it can access. For chat integrations, the chat scope is sufficient.

See API Reference: Authentication for details on scopes and token management.

| Status | Meaning |
| --- | --- |
| 400 | Bad request. Invalid JSON or a missing required field. |
| 401 | Unauthorized. Missing or invalid API token. |
| 403 | Forbidden. Token lacks the required scope. |
| 404 | Agent not found. Check the agent name in the URL. |
| 429 | Rate limited. Too many requests; retry after the `Retry-After` header value. |
| 500 | Internal server error. Check the engine logs. |

Errors during streaming are sent as error events:

event: error
data: {"message":"Model returned an error: context length exceeded","code":"model_error"}

The stream closes after an error event. Your client should reconnect or show the error to the user.

For transient errors (429, 500), implement exponential backoff:

async function chatWithRetry(message, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await sendMessage(message);
    } catch (error) {
      // Give up after the last attempt; otherwise back off exponentially.
      if (attempt === maxRetries - 1) throw error;
      const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, ...
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

Do NOT use EventSource — it only supports GET requests. Use fetch + ReadableStream for POST-based SSE:

const response = await fetch('http://localhost:8443/api/v1/schemas/{schema_id}/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer bb_your_token',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ message: 'Hello', session_id: null }),
});
if (!response.ok) throw new Error(`HTTP ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
// Keep the event name across chunks so an "event:" line and its
// "data:" line still match up when they arrive in different reads.
let currentEvent = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || ''; // keep any incomplete trailing line for the next chunk
  for (const line of lines) {
    if (line === '') currentEvent = ''; // blank line ends an SSE event
    if (line.startsWith('event: ')) currentEvent = line.slice(7);
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (currentEvent === 'message_delta') console.log(data.content);
      if (currentEvent === 'done') console.log('Session:', data.session_id);
    }
  }
}

The engine enforces rate limits per API token:

  • Default: 60 requests per minute per token.
  • Configurable in engine settings.
  • Rate-limited responses return HTTP 429 with a Retry-After header.

Every API response includes rate limit headers when configurable rate limiting is enabled (EE):

| Header | Description |
| --- | --- |
| `X-RateLimit-Limit` | Maximum number of requests allowed in the current window. |
| `X-RateLimit-Remaining` | Number of requests remaining in the current window. |
| `X-RateLimit-Reset` | Unix timestamp (seconds) when the current window resets. |

HTTP/1.1 200 OK
X-RateLimit-Limit: 500
X-RateLimit-Remaining: 487
X-RateLimit-Reset: 1711929600
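A client can combine these headers with Retry-After to decide when to retry. A sketch, assuming lowercase header keys (as fetch's Headers API returns them); `retryDelayMs` is an illustrative helper, and the zero fallback is an assumption.

```javascript
// How long to wait before the next request, in milliseconds.
// Prefer Retry-After on a 429; otherwise, if the window is
// exhausted, wait until X-RateLimit-Reset.
function retryDelayMs(status, headers, nowSec = Math.floor(Date.now() / 1000)) {
  if (status === 429 && headers['retry-after']) {
    return Number(headers['retry-after']) * 1000;
  }
  if (Number(headers['x-ratelimit-remaining']) === 0 && headers['x-ratelimit-reset']) {
    return Math.max(0, (Number(headers['x-ratelimit-reset']) - nowSec) * 1000);
  }
  return 0; // no throttling needed
}

retryDelayMs(429, { 'retry-after': '30' }); // 30000
retryDelayMs(200, { 'x-ratelimit-remaining': '0', 'x-ratelimit-reset': '1711929600' }, 1711929590); // 10000
```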

See Configuration: Rate Limits for setup.

Query tool call history for auditing and debugging. Requires admin scope.

curl "http://localhost:8443/api/v1/audit/tool-calls?agent=sales-agent&tool=create_order&page=1&per_page=20" \
  -H "Authorization: Bearer bb_your_token"

| Parameter | Description |
| --- | --- |
| `session_id` | Filter by session ID. |
| `agent` | Filter by agent name. |
| `tool` | Filter by tool name. |
| `status` | Filter by status: `completed` or `failed`. |
| `user_id` | Filter by user ID. |
| `from` | Start date (RFC3339 or YYYY-MM-DD). |
| `to` | End date (RFC3339 or YYYY-MM-DD). |
| `page` | Page number (default: 1). |
| `per_page` | Results per page (default: 50, max: 100). |

Example response:

{
  "data": [
    {
      "id": 42,
      "session_id": "sess_abc123",
      "agent_name": "sales-agent",
      "tool_name": "create_order",
      "input": "{\"customer_id\":\"cust_123\"}",
      "output": "{\"order_id\":\"ord_456\"}",
      "status": "completed",
      "duration_ms": 340,
      "user_id": "user_789",
      "created_at": "2026-03-20T14:30:00Z"
    }
  ],
  "total": 156,
  "page": 1,
  "per_page": 20,
  "total_pages": 8
}
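Pagination can be driven by the `total_pages` field in the response. A sketch; `auditQuery` and `allToolCalls` are illustrative names, and the parameter names follow the table above.

```javascript
// Build the audit query URL from a filter object plus pagination.
function auditQuery(filters, page, perPage) {
  const params = new URLSearchParams({
    ...filters,
    page: String(page),
    per_page: String(perPage),
  });
  return `http://localhost:8443/api/v1/audit/tool-calls?${params}`;
}

// Walk every page and collect the tool-call records.
async function allToolCalls(filters, token, perPage = 100) {
  const calls = [];
  for (let page = 1; ; page++) {
    const res = await fetch(auditQuery(filters, page, perPage), {
      headers: { 'Authorization': `Bearer ${token}` },
    });
    const body = await res.json();
    calls.push(...body.data);
    if (page >= body.total_pages) break;
  }
  return calls;
}
```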

Browse the built-in catalog of known models and providers. No authentication required.

# List all models
curl http://localhost:8443/api/v1/models/registry
# Filter by provider
curl "http://localhost:8443/api/v1/models/registry?provider=anthropic"
# Filter by tier
curl "http://localhost:8443/api/v1/models/registry?tier=1"
# Filter by tool support
curl "http://localhost:8443/api/v1/models/registry?supports_tools=true"
# List all providers
curl http://localhost:8443/api/v1/models/registry/providers
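The filters above compose into a single query string. A sketch; `registryQuery` is an illustrative helper, and the parameter names follow the curl examples.

```javascript
// Build a registry query URL from an optional filter object,
// e.g. { provider: 'anthropic', supports_tools: true }.
function registryQuery(filters = {}) {
  const params = new URLSearchParams(
    Object.entries(filters).map(([k, v]) => [k, String(v)])
  );
  const qs = params.toString();
  return `http://localhost:8443/api/v1/models/registry${qs ? `?${qs}` : ''}`;
}

registryQuery(); // 'http://localhost:8443/api/v1/models/registry'
registryQuery({ provider: 'anthropic', supports_tools: true });
```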

See Model Registry for full details.

Check current rate limit usage for a specific key. Requires admin scope.

curl "http://localhost:8443/api/v1/rate-limits/usage?key_header=X-Org-Id&key_value=org-123" \
  -H "Authorization: Bearer bb_your_token"
{
  "rule": "per-org",
  "key": "org-123",
  "tier": "pro",
  "used": 42,
  "limit": 500,
  "window": "24h0m0s",
  "resets_at": "2026-03-25T00:00:00Z"
}

The engine exposes Prometheus-compatible metrics at /metrics. No authentication required.

curl http://localhost:8443/metrics

See Production: Prometheus Metrics for available metrics and Kubernetes integration.