
# Model Selection Guide

Choosing the right model is critical for agent reliability. ByteBrew Engine works with any LLM that supports tool calling, but model quality directly affects how well agents use tools and follow instructions.

## Model requirements

| Requirement | Importance | Why |
| --- | --- | --- |
| Tool calling (function calling) | Mandatory | Agents need structured tool calls to interact with APIs, MCP servers, and built-in tools. Models without tool calling cannot use any tools; a quick check is sketched below. |
| Multi-turn conversation | Mandatory | Agents maintain conversation context across multiple exchanges. The model must handle system + user + assistant message sequences. |
| 32K+ context window | Recommended | Long conversations, tool results, and knowledge base passages consume context. 32K+ prevents premature context compression. |
| Instruction following | Recommended | The system prompt defines agent behavior, constraints, and output format. Better instruction following = more reliable agents. |
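
One way to sanity-check tool-calling support is to send a request with a `tools` definition to any OpenAI-compatible endpoint and see whether the model answers with a structured tool call. The sketch below targets Ollama's `/v1` endpoint (covered later in this guide); the model name and the `get_weather` function are placeholders, not part of ByteBrew itself.

```bash
# Probe tool-calling support on an OpenAI-compatible endpoint.
# The model name and the get_weather tool are placeholders.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:14b",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
# A tool-calling model responds with choices[0].message.tool_calls
# rather than plain text content.
```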

## Supported providers

ByteBrew supports these providers out of the box:

| Provider | Type | API Key Required | Notes |
| --- | --- | --- | --- |
| OpenAI | Cloud | Yes | Best tool calling support. Models: GPT-5.4, GPT-5.4 Mini. |
| Anthropic | Cloud | Yes | Native support. Models: Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5. |
| Azure OpenAI | Cloud | Yes | Azure-hosted OpenAI models. Deployment-based URLs; requires api_version. |
| Google (Gemini) | Cloud | Yes | Native Gemini API support. Models: Gemini 3.1 Pro, Gemini 2.5 Flash. |
| DeepSeek | Cloud | Yes | Cost-effective models. Preset base URL. |
| Mistral | Cloud | Yes | Mistral AI models. Preset base URL. |
| xAI | Cloud | Yes | Grok models. Preset base URL. |
| Z.ai (GLM) | Cloud | Yes | GLM models. Preset base URL. |
| Ollama | Local | No | Free, private. Requires Ollama installed on the host. |
| OpenRouter | Cloud | Yes | Aggregator. Access 100+ models via a single API key. Preset base URL. |
| Custom (vLLM, LiteLLM) | Self-hosted | Varies | Any OpenAI-compatible API endpoint via the openai_compatible provider (see the sketch below). |
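
As a hedged sketch of the two less obvious entries: an Azure OpenAI model (deployment-based URL plus api_version) and a self-hosted vLLM endpoint via openai_compatible. The provider ids, field names, and values here are assumptions based on the table above, not confirmed configuration keys; check each provider page for the exact schema.

```yaml
# Sketch only: provider ids and field names are assumptions,
# not confirmed ByteBrew configuration keys.
models:
  azure-gpt:
    provider: azure_openai                    # assumed provider id
    model: my-gpt-deployment                  # Azure uses deployment names
    base_url: "https://my-resource.openai.azure.com"
    api_version: "2024-06-01"                 # required by Azure (example value)
    api_key: ${AZURE_OPENAI_API_KEY}
  vllm-model:
    provider: openai_compatible               # any OpenAI-compatible server
    model: meta-llama/Llama-3.1-8B-Instruct   # whatever the server serves
    base_url: "http://vllm-host:8000/v1"
    api_key: "unused"                         # some servers ignore the key but require a value
```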

See Model Registry for the full catalog of known models with capabilities, pricing, and tier classifications.

## Recommended cloud models

| Model | Provider | Strengths | Best for |
| --- | --- | --- | --- |
| gpt-4o | OpenAI | Excellent tool calling, fast | Supervisors, complex reasoning |
| gpt-4o-mini | OpenAI | Good quality, low cost | Specialist agents, high volume |
| claude-sonnet-4-20250514 | Anthropic | Strong reasoning, long context | Research agents, analysis |
| claude-3-haiku | Anthropic | Fast, cheap | Simple tasks, data retrieval |

## Recommended local models (Ollama)

| Model | Parameters | VRAM | Tool calling quality |
| --- | --- | --- | --- |
| qwen2.5-coder:32b | 32B | 24 GB | Excellent. Best quality/hardware ratio for local deployment. |
| qwen2.5:14b | 14B | 12 GB | Good. Minimum recommended size for stable tool calling. |
| llama3.2:3b | 3B | 4 GB | Basic. Works for simple single-tool agents. Not recommended for multi-step tasks. |
| mistral:7b | 7B | 8 GB | Fair. Better instruction following than llama 7B, but tool calling can be inconsistent. |

Ollama exposes two APIs: native (/api) and OpenAI-compatible (/v1). ByteBrew requires the OpenAI-compatible endpoint.

```yaml
# CORRECT: use the /v1 endpoint
models:
  local-model:
    provider: ollama
    model: qwen2.5-coder:32b
    base_url: "http://localhost:11434/v1"   # /v1 is required
    api_key: "ollama"

# WRONG: the native API does not support the tool calling format
# base_url: "http://localhost:11434/api"   # Will not work
```
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model (downloads once, cached locally)
ollama pull qwen2.5-coder:32b

# Verify it works
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:32b", "messages": [{"role": "user", "content": "Hello"}]}'
```
## Adding models

To add a model through the admin UI:

1. Navigate to Admin Dashboard -> Models.
2. Click Add Model.
3. Select the provider (Ollama, OpenAI Compatible, Anthropic).
4. Fill in the model name, base URL (if needed), and API key.
5. Click Save. The engine validates the connection automatically.
Alternatively, import a model configuration via the API:

```bash
# Import a model configuration via YAML
curl -X POST http://localhost:8443/api/v1/config/import \
  -H "Authorization: Bearer bb_admin_token" \
  -H "Content-Type: application/x-yaml" \
  -d '
models:
  my-new-model:
    provider: openai
    model: gpt-4o
    api_key: ${OPENAI_API_KEY}
'
# Single quotes keep ${OPENAI_API_KEY} literal, to be resolved
# by the engine as in the config file examples.

# Reload to apply
curl -X POST http://localhost:8443/api/v1/config/reload \
  -H "Authorization: Bearer bb_admin_token"
```

## Per-agent model assignment

Different agents can use different models. Use your best model for the supervisor and cheaper models for specialists:

```yaml
agents:
  supervisor:
    model: gpt-4o        # Best reasoning for coordination
  researcher:
    model: gpt-4o-mini   # Cheaper for data retrieval
  local-analyzer:
    model: qwen-local    # Free, private, no API costs

models:
  gpt-4o:
    provider: openai
    api_key: ${OPENAI_API_KEY}
  gpt-4o-mini:
    provider: openai
    api_key: ${OPENAI_API_KEY}
  qwen-local:
    provider: ollama
    model: qwen2.5-coder:32b
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
```

## Bring Your Own Key (BYOK)

Bring Your Own Key lets API consumers override the model for a single request by passing headers:

```bash
curl -N http://localhost:8443/api/v1/schemas/{schema_id}/chat \
  -H "Authorization: Bearer bb_your_token" \
  -H "X-Model-Provider: anthropic" \
  -H "X-Model-API-Key: sk-ant-customer-key" \
  -H "X-Model-Name: claude-sonnet-4-20250514" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'
```

BYOK must be enabled per provider in Settings. See the BYOK integration guide for details.
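
For orientation only, per-provider enablement might look something like the sketch below in a settings file. The key names here are assumptions, and the BYOK integration guide has the authoritative ones.

```yaml
# Sketch only: key names are assumptions, not documented settings
settings:
  byok:
    anthropic:
      enabled: true
    openai:
      enabled: false   # requests with X-Model-Provider: openai are rejected
```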