AI Providers

Octomind supports 7 AI providers through a unified interface. Provider support is implemented in [octolib](https://github.com/muvon/octolib) -- new providers are added there.

Model Format

All models use the provider:model format:

model = "openrouter:anthropic/claude-sonnet-4"
model = "openai:gpt-4o"
model = "anthropic:claude-sonnet-4"
model = "deepseek:deepseek-chat"
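
The provider prefix is everything before the first colon; the remainder is the model identifier, which may itself contain slashes (as in OpenRouter paths). A minimal parsing sketch in Python (the function name is illustrative, not part of Octomind):

```python
def parse_model(spec: str) -> tuple[str, str]:
    """Split a provider:model string on the FIRST colon only,
    so model paths like anthropic/claude-sonnet-4 survive intact."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"Invalid model format: {spec!r} (expected provider:model)")
    return provider, model

# parse_model("openrouter:anthropic/claude-sonnet-4")
#   -> ("openrouter", "anthropic/claude-sonnet-4")
```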

Supported Providers

OpenRouter (recommended)

Access many providers through a single API key. Best for flexibility and model switching.

export OPENROUTER_API_KEY="your_key"
model = "openrouter:anthropic/claude-sonnet-4"
model = "openrouter:openai/gpt-4o"
model = "openrouter:google/gemini-2.5-flash-preview"

Get a key at openrouter.ai.

OpenAI

Direct access to GPT models.

export OPENAI_API_KEY="your_key"
model = "openai:gpt-4o"
model = "openai:gpt-4o-mini"

Anthropic

Direct access to Claude models. Supports prompt caching for cost savings.

export ANTHROPIC_API_KEY="your_key"
model = "anthropic:claude-sonnet-4"
model = "anthropic:claude-haiku-4-5"

Google (Vertex AI)

Gemini models via Google Cloud.

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"
model = "google:gemini-2.5-flash-preview"

Requires a Google Cloud project with Vertex AI API enabled.

Amazon (Bedrock)

AWS-hosted models.

export AWS_ACCESS_KEY_ID="your_key"
export AWS_SECRET_ACCESS_KEY="your_secret"
export AWS_REGION="us-east-1"
model = "amazon:anthropic.claude-v2"

Cloudflare (Workers AI)

Edge inference with low latency.

export CLOUDFLARE_API_TOKEN="your_token"
model = "cloudflare:@cf/meta/llama-3.1-8b-instruct"

DeepSeek

Cost-effective models.

export DEEPSEEK_API_KEY="your_key"
model = "deepseek:deepseek-chat"

Provider Comparison

| Provider | Format | Caching | Vision | Structured Output |
|---|---|---|---|---|
| OpenRouter | openrouter:provider/model | Yes (model-dependent) | Yes | Model-dependent |
| OpenAI | openai:model | No | Yes (GPT-4o+) | Yes |
| Anthropic | anthropic:model | Yes (Claude 3.5+) | Yes (Claude 3+) | Yes (tool-based) |
| Google | google:model | No | Yes (Gemini 1.5+) | No |
| Amazon | amazon:model | No | Yes (Claude models) | No |
| Cloudflare | cloudflare:model | No | Limited | No |
| DeepSeek | deepseek:model | No | No | No |

Model Selection Strategy

| Use Case | Recommended | Why |
|---|---|---|
| Main development | anthropic:claude-sonnet-4 | Best coding, caching support |
| Fast queries / layers | openai:gpt-4o-mini | Fast, cheap |
| Compression decisions | anthropic:claude-haiku-4-5 | 10x cheaper than Sonnet |
| Research / exploration | openrouter:google/gemini-2.5-flash-preview | Large context, fast |
| Cost-effective | deepseek:deepseek-chat | Lowest cost |

Prompt Caching

Providers with caching support can reduce costs by caching repeated context:

  • Anthropic: Automatic for Claude 3.5+ models. Cache write at 1.25x, read at 0.1x cost.
  • OpenRouter: Depends on underlying model.
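
With the Anthropic multipliers above (write 1.25x, read 0.1x of the base input rate), caching a reused prefix pays off from the second send onward. A worked sketch (the $0.003/1k input rate is a hypothetical figure for illustration, not a quoted price):

```python
def cached_vs_uncached(base_cost: float, sends: int) -> tuple[float, float]:
    """Total input cost for a context sent `sends` times:
    uncached pays full price every time; cached pays 1.25x once
    (cache write) and 0.1x for each subsequent read."""
    uncached = base_cost * sends
    cached = base_cost * 1.25 + base_cost * 0.10 * (sends - 1)
    return uncached, cached

# A 10k-token prefix reused 5 times at a hypothetical $0.003/1k rate:
base = 10_000 / 1000 * 0.003        # $0.03 per uncached send
uncached, cached = cached_vs_uncached(base, 5)
# uncached ≈ $0.15, cached ≈ $0.0495 -- roughly a 3x saving
```

Note that a single send is strictly more expensive with caching (1.25x vs 1x), which is why short one-off requests gain nothing from it.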

Configure caching behavior:

cache_tokens_threshold = 2048    # Cache responses > 2048 tokens
cache_timeout_seconds = 240      # Cache lifetime
use_long_system_cache = true     # Longer cache for system messages

Cost Tracking

Every request tracks token usage and cost:

/info     # Session overview
/report   # Detailed per-request breakdown

Set spending limits:

max_session_spending_threshold = 5.0   # USD per session
max_request_spending_threshold = 1.0   # USD per request
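
The enforcement logic behind these two thresholds can be sketched as follows; this is an illustrative model mirroring the config keys above, not Octomind's actual implementation:

```python
class SpendTracker:
    """Illustrative sketch: reject a request that exceeds the
    per-request limit, and stop once the session total crosses
    the session limit."""

    def __init__(self, max_session: float = 5.0, max_request: float = 1.0):
        self.max_session = max_session
        self.max_request = max_request
        self.session_total = 0.0

    def record(self, request_cost: float) -> None:
        if request_cost > self.max_request:
            raise RuntimeError(
                f"request cost ${request_cost:.2f} exceeds per-request limit")
        self.session_total += request_cost
        if self.session_total > self.max_session:
            raise RuntimeError(
                f"session total ${self.session_total:.2f} exceeds session limit")
```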

Switching Models

Change model mid-session:

/model openai:gpt-4o
/model anthropic:claude-sonnet-4

Or override at startup:

octomind run -m anthropic:claude-sonnet-4

Troubleshooting

"Invalid model format": Must be provider:model. Example: openrouter:anthropic/claude-sonnet-4.

"API key not found": Set the provider's environment variable. Use octomind config --show to check.

"Provider does not support structured output": Not all providers support --schema. Use OpenAI or Anthropic for structured output.

Google Vertex AI issues: Ensure GOOGLE_APPLICATION_CREDENTIALS points to a valid JSON file and the Vertex AI API is enabled.
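
A quick sanity check for the credentials setup can be scripted; this sketch only verifies that the variable is set, the file exists, and it parses as service-account JSON (the "type": "service_account" field is standard in Google Cloud service-account keys):

```python
import json
import os

def check_credentials() -> str:
    """Return 'ok' if GOOGLE_APPLICATION_CREDENTIALS points at a
    readable service-account JSON file, else a short diagnostic."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"file not found: {path}"
    try:
        with open(path) as f:
            data = json.load(f)
    except json.JSONDecodeError:
        return f"not valid JSON: {path}"
    if data.get("type") != "service_account":
        return f"unexpected key type: {data.get('type')!r}"
    return "ok"
```

This does not confirm that the Vertex AI API is enabled for the project; that still has to be checked in the Google Cloud console.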