AI Providers
Octomind supports seven AI providers through a unified interface. Provider support is implemented in [octolib](https://github.com/muvon/octolib).
Model Format
All models use the `provider:model` format:
model = "openrouter:anthropic/claude-sonnet-4"
model = "openai:gpt-4o"
model = "anthropic:claude-sonnet-4"
model = "deepseek:deepseek-chat"
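The split happens at the first colon: everything before it is the provider, everything after is the model name (which may itself contain slashes, as in OpenRouter IDs). A minimal shell sketch of that rule, not part of the octomind CLI:

```shell
# Split a model string on the first colon. The model name keeps any
# later slashes or colons intact (e.g. OpenRouter's provider/model IDs).
model="openrouter:anthropic/claude-sonnet-4"
provider="${model%%:*}"   # strip the longest ':*' suffix -> text before the first colon
name="${model#*:}"        # strip the shortest '*:' prefix -> text after the first colon
if [ -n "$provider" ] && [ -n "$name" ] && [ "$provider" != "$model" ]; then
  echo "provider=$provider name=$name"
else
  echo "invalid model format: expected provider:model" >&2
fi
```

A string with no colon fails the `"$provider" != "$model"` check, since both expansions then return the whole string.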
Supported Providers
OpenRouter (recommended)
Access many providers through a single API key. Best for flexibility and model switching.
export OPENROUTER_API_KEY="your_key"
model = "openrouter:anthropic/claude-sonnet-4"
model = "openrouter:openai/gpt-4o"
model = "openrouter:google/gemini-2.5-flash-preview"
Get a key at [openrouter.ai](https://openrouter.ai).
OpenAI
Direct access to GPT models.
export OPENAI_API_KEY="your_key"
model = "openai:gpt-4o"
model = "openai:gpt-4o-mini"
Anthropic
Direct access to Claude models. Supports prompt caching for cost savings.
export ANTHROPIC_API_KEY="your_key"
model = "anthropic:claude-sonnet-4"
model = "anthropic:claude-haiku-4-5"
Google (Vertex AI)
Gemini models via Google Cloud.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"
model = "google:gemini-2.5-flash-preview"
Requires a Google Cloud project with Vertex AI API enabled.
Amazon (Bedrock)
AWS-hosted models.
export AWS_ACCESS_KEY_ID="your_key"
export AWS_SECRET_ACCESS_KEY="your_secret"
export AWS_REGION="us-east-1"
model = "amazon:anthropic.claude-v2"
Cloudflare (Workers AI)
Edge inference with low latency.
export CLOUDFLARE_API_TOKEN="your_token"
model = "cloudflare:@cf/meta/llama-3.1-8b-instruct"
DeepSeek
Cost-effective models.
export DEEPSEEK_API_KEY="your_key"
model = "deepseek:deepseek-chat"
Provider Comparison
| Provider | Format | Caching | Vision | Structured Output |
|---|---|---|---|---|
| OpenRouter | openrouter:provider/model | Yes (model-dependent) | Yes | Model-dependent |
| OpenAI | openai:model | No | Yes (GPT-4o+) | Yes |
| Anthropic | anthropic:model | Yes (Claude 3.5+) | Yes (Claude 3+) | Yes (tool-based) |
| Google | google:model | No | Yes (Gemini 1.5+) | No |
| Amazon | amazon:model | No | Yes (Claude models) | No |
| Cloudflare | cloudflare:model | No | Limited | No |
| DeepSeek | deepseek:model | No | No | No |
Model Selection Strategy
| Use Case | Recommended | Why |
|---|---|---|
| Main development | anthropic:claude-sonnet-4 | Best coding, caching support |
| Fast queries / layers | openai:gpt-4o-mini | Fast, cheap |
| Compression decisions | anthropic:claude-haiku-4-5 | 10x cheaper than Sonnet |
| Research / exploration | openrouter:google/gemini-2.5-flash-preview | Large context, fast |
| Cost-effective | deepseek:deepseek-chat | Lowest cost |
Prompt Caching
Providers with caching support can reduce costs by caching repeated context:
- Anthropic: Automatic for Claude 3.5+ models. Cache writes cost 1.25x the base input price; cache reads cost 0.1x.
- OpenRouter: Depends on the underlying model.
Configure caching behavior:
cache_tokens_threshold = 2048 # Cache responses > 2048 tokens
cache_timeout_seconds = 240 # Cache lifetime
use_long_system_cache = true # Longer cache for system messages
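To see why caching pays off, here is the arithmetic with the Anthropic multipliers above (write 1.25x, read 0.1x of the base input price), using an assumed example price of $3 per million input tokens:

```shell
# Illustrative arithmetic only; $3/M input tokens is an assumed example price.
base=3.00
# Same large prompt sent 5 times without caching:
uncached=$(awk -v b="$base" 'BEGIN { printf "%.2f", 5 * b }')
# With caching: one cache write (1.25x) plus four cache reads (0.1x each):
cached=$(awk -v b="$base" 'BEGIN { printf "%.2f", 1.25 * b + 4 * 0.1 * b }')
echo "uncached=\$$uncached cached=\$$cached"
```

Here the cached path costs $4.95 against $15.00 uncached, roughly a 3x saving; the gain grows with the number of cache reads within the cache lifetime.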
Cost Tracking
Every request tracks token usage and cost:
/info # Session overview
/report # Detailed per-request breakdown
Set spending limits:
max_session_spending_threshold = 5.0 # USD per session
max_request_spending_threshold = 1.0 # USD per request
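Both thresholds are plain USD comparisons against tracked cost. A hypothetical sketch of the same check (octomind enforces this internally; `request_cost` here is a made-up value):

```shell
threshold=1.0        # mirrors max_request_spending_threshold
request_cost=1.25    # hypothetical cost of the next request, in USD
# awk handles the floating-point comparison portably:
over=$(awk -v c="$request_cost" -v t="$threshold" \
  'BEGIN { if (c > t) print "yes"; else print "no" }')
echo "over_limit=$over"
```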
Switching Models
Change model mid-session:
/model openai:gpt-4o
/model anthropic:claude-sonnet-4
Or override at startup:
octomind run -m anthropic:claude-sonnet-4
Troubleshooting
"Invalid model format": Must be provider:model. Example: openrouter:anthropic/claude-sonnet-4.
"API key not found": Set the provider's environment variable. Use octomind config --show to check.
"Provider does not support structured output": Not all providers support --schema. Use OpenAI or Anthropic for structured output.
Google Vertex AI issues: Ensure GOOGLE_APPLICATION_CREDENTIALS points to a valid JSON file and the Vertex AI API is enabled.