Multi-Provider LLM Routing: One API for 13+ Providers
Route AI requests to the right LLM provider automatically. One endpoint for OpenAI, Anthropic, Google, Mistral, and 9 more.
What you will learn
- Route requests to 13+ LLM providers through a single endpoint
- Implement cost-optimized routing based on task complexity
- Set up failover between providers for high availability
- Compare provider performance with real cost data
The Multi-Provider Challenge
No single LLM provider is best at everything. GPT-4o excels at reasoning. Claude handles long documents beautifully. Gemini is fast and cost-effective for classification. Mistral offers strong performance at lower cost for European data residency. The question is not which provider to use — it is how to use them all efficiently.
Without routing, you are locked into one provider: every agent uses GPT-4o, even for simple tasks that a cheaper model handles well, and a provider outage means all agents stop.
With routing, each agent uses the best model for its task: simple classification uses Gemini Flash ($0.075/1M tokens), complex reasoning uses GPT-4o, and if one provider goes down, traffic fails over to another.
Supported Providers
- OpenAI — GPT-4o, GPT-4o-mini, o1, o3
- Anthropic — Claude Sonnet, Claude Haiku, Claude Opus
- Google — Gemini 2.5 Flash, Gemini 2.5 Pro
- Mistral — Mistral Large, Mistral Small, Codestral
- AWS Bedrock — Claude, Titan, Llama via AWS
- Azure OpenAI — GPT models via Azure
- Grok — xAI's Grok models
- DeepSeek — DeepSeek V3, DeepSeek Coder
- Perplexity — pplx-7b, pplx-70b (search-augmented)
- Replicate — Llama, Mixtral via Replicate
- Plus GitHub Copilot, Microsoft 365 Copilot, and more
How Routing Works
The Gateway uses a standard OpenAI-compatible API. You specify the model in the request, and the Gateway routes to the correct provider. Your API keys for each provider are stored encrypted in the platform — the agent never sees them.
from openai import OpenAI

client = OpenAI(
    base_url="https://dobby-ai.com/api/v1/gateway",
    api_key="gk_svc_your_key"
)

# Route to OpenAI
result_gpt = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this code for bugs"}]
)

# Route to Anthropic (same SDK, same endpoint)
result_claude = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Summarize this 50-page report"}]
)

# Route to Google (same SDK, same endpoint)
result_gemini = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Classify this support ticket"}]
)

The Gateway automatically handles provider-specific API differences. You write standard OpenAI SDK code. The Gateway translates to each provider's format, manages auth, and normalizes the response — 13+ providers, one SDK.
Cost-Optimized Routing Strategy
The most effective strategy is matching task complexity to model capability. Not every task needs the most expensive model. A cost-optimized routing strategy can reduce LLM spend by 40-60% without sacrificing quality.
- Tier 1 (Simple) — Classification, extraction, formatting → Gemini Flash, GPT-4o-mini ($0.075-0.15/1M tokens)
- Tier 2 (Standard) — Summarization, Q&A, code review → Claude Sonnet, GPT-4o ($2.50-5/1M tokens)
- Tier 3 (Complex) — Multi-step reasoning, architecture, strategy → Claude Opus, o1 ($10-15/1M tokens)
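The tier table above can be implemented as a small lookup that maps a task type to a cost tier, then to a model served through the Gateway. This is a minimal sketch: the task-type names and the exact model-per-tier assignments are illustrative assumptions, not part of the Gateway API.

```python
# Tier-based model router. TASK_TIERS and TIER_MODELS are illustrative
# assumptions that mirror the cost tiers above, not a Gateway feature.
TASK_TIERS = {
    "classification": 1, "extraction": 1, "formatting": 1,
    "summarization": 2, "qa": 2, "code_review": 2,
    "reasoning": 3, "architecture": 3, "strategy": 3,
}

# Cheapest capable model per tier (approximate price per 1M input tokens).
TIER_MODELS = {
    1: "gemini-2.5-flash",          # ~$0.075/1M
    2: "claude-sonnet-4-20250514",  # ~$2.50-5/1M
    3: "o1",                        # ~$10-15/1M
}

def pick_model(task_type: str) -> str:
    """Return the model for a task, defaulting to Tier 2 for unknown tasks."""
    tier = TASK_TIERS.get(task_type, 2)
    return TIER_MODELS[tier]
```

An agent would then call `client.chat.completions.create(model=pick_model("classification"), ...)` and let the Gateway route to the matching provider.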
Provider Failover
The Gateway includes a circuit breaker pattern. If a provider returns 5 consecutive errors, the circuit opens and requests are automatically routed to the fallback provider. After 30 seconds, the circuit half-opens and sends a test request. If it succeeds, traffic resumes.
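The breaker described above can be sketched as a small state machine. The Gateway implements this server-side; the sketch below is illustrative client-side logic using the same thresholds (5 consecutive errors to open, a 30-second cooldown before half-opening).

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: CLOSED -> OPEN after `threshold`
    consecutive failures; OPEN -> HALF_OPEN after `cooldown` seconds;
    HALF_OPEN -> CLOSED on one success, or back to OPEN on failure."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock          # injectable clock, handy for testing
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = None

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.cooldown:
                self.state = "HALF_OPEN"   # let one test request through
                return True
            return False                   # caller routes to fallback
        return True

    def record_success(self):
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

When `allow_request()` returns `False`, the router sends the request to the fallback provider instead of the failing one.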
Provider failover is automatic. Configure a primary and fallback provider per agent. If OpenAI is down, requests seamlessly route to Anthropic. The agent never sees the failure — and the switch is logged in the audit trail.
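A per-agent primary/fallback setup might look like the following. The field and agent names here are hypothetical, shown only to illustrate the shape of such a configuration; they are not the platform's documented schema.

```python
# Hypothetical per-agent routing config. Key names and agent names are
# illustrative assumptions, not the Gateway's actual configuration schema.
agent_routing = {
    "support-triage-agent": {
        "primary_model": "gpt-4o",                     # OpenAI
        "fallback_model": "claude-sonnet-4-20250514",  # Anthropic
    },
    "ticket-classifier-agent": {
        "primary_model": "gemini-2.5-flash",           # Google
        "fallback_model": "gpt-4o-mini",               # OpenAI
    },
}
```

The pairing matters: primary and fallback should sit with different providers, so a single provider outage never takes an agent down.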