Multi-Provider LLM Routing: One API for 13+ Providers
Route AI requests to the right LLM provider automatically. One OpenAI-compatible endpoint for OpenAI, Anthropic, Google, Mistral, and 9 more — with failover.
What you will learn
- Route requests to 13+ LLM providers through a single endpoint
- Implement cost-optimized routing based on task complexity
- Set up failover between providers for high availability
- Compare provider performance with real cost data
- Understand how provider outages affect your agent fleet
TL;DR — No single LLM provider is best at everything. Routing by task complexity can cut LLM spend 40-60% with no quality loss, and automatic failover keeps agents running even when a provider has a 2 AM outage.
The Multi-Provider Challenge
No single LLM provider is best at everything. GPT-4o excels at reasoning. Claude handles long documents beautifully. Gemini is fast and cost-effective for classification. Mistral offers strong performance at lower cost for European data residency. The question is not which provider to use — it is how to use them all efficiently.
Without routing: Locked into one provider. Every agent uses GPT-4o, even for simple tasks that a cheaper model handles well. A provider outage stops every agent in the fleet.
With routing: Each agent uses the best model for its task. Simple classification uses Gemini Flash ($0.075/1M tokens); complex reasoning uses GPT-4o. If one provider goes down, traffic fails over to another.
Supported Providers
- OpenAI — GPT-4o, GPT-4o-mini, o1, o3
- Anthropic — Claude Sonnet, Claude Haiku, Claude Opus
- Google — Gemini 2.5 Flash, Gemini 2.5 Pro
- Mistral — Mistral Large, Mistral Small, Codestral
- AWS Bedrock — Claude, Titan, Llama via AWS
- Azure OpenAI — GPT models via Azure
- Grok — xAI's Grok models
- DeepSeek — DeepSeek V3, DeepSeek Coder
- Perplexity — pplx-7b, pplx-70b (search-augmented)
- Replicate — Llama, Mixtral via Replicate
- Plus GitHub Copilot, Microsoft 365 Copilot, and more
How Routing Works
The Gateway uses a standard OpenAI-compatible API. You specify the model in the request, and the Gateway routes to the correct provider. Your API keys for each provider are stored encrypted in the platform — the agent never sees them.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://dobby-ai.com/api/v1/gateway",
    api_key="gk_svc_your_key"
)

# Route to OpenAI
result_gpt = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this code for bugs"}]
)

# Route to Anthropic (same SDK, same endpoint)
result_claude = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Summarize this 50-page report"}]
)

# Route to Google (same SDK, same endpoint)
result_gemini = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Classify this support ticket"}]
)
```

The Gateway automatically handles provider-specific API differences. You write standard OpenAI SDK code; the Gateway translates to each provider's format, manages auth, and normalizes the response — 13+ providers, one SDK.
Cost-Optimized Routing Strategy
The most effective strategy is matching task complexity to model capability. Not every task needs the most expensive model. A cost-optimized routing strategy can reduce LLM spend by 40-60% without sacrificing quality.
- Tier 1 (Simple) — Classification, extraction, formatting → Gemini Flash, GPT-4o-mini ($0.075-0.15/1M tokens)
- Tier 2 (Standard) — Summarization, Q&A, code review → Claude Sonnet, GPT-4o ($2.50-5/1M tokens)
- Tier 3 (Complex) — Multi-step reasoning, architecture, strategy → Claude Opus, o1 ($10-15/1M tokens)
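The tiers above can be sketched as a small lookup that picks a model by task tier. The tier labels, default choices, and the `pick_model` helper are illustrative conventions, not a built-in Gateway feature; the model names come from the table above.

```python
# Illustrative tier-to-model routing table; which model belongs in which
# tier is an assumption based on the pricing above, not a Gateway API.
TIER_MODELS = {
    "simple": "gemini-2.5-flash",            # classification, extraction, formatting
    "standard": "claude-sonnet-4-20250514",  # summarization, Q&A, code review
    "complex": "o1",                         # multi-step reasoning, architecture
}

def pick_model(tier: str) -> str:
    """Return the model for a task tier, defaulting to the standard tier."""
    return TIER_MODELS.get(tier, TIER_MODELS["standard"])
```

Because every model is reachable through the same endpoint, moving a task between tiers is just a change to the `model` string in the request.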
Provider Failover
The Gateway includes a circuit breaker pattern. If a provider returns 5 consecutive errors, the circuit opens and requests are automatically routed to the fallback provider. After 30 seconds, the circuit half-opens and sends a test request. If it succeeds, traffic resumes.
Without failover: OpenAI has a 20-minute outage. Every agent in your fleet fails. Support tickets pile up. You spend the outage writing an apology email.
With failover: OpenAI has the same outage. The circuit breaker opens after 5 errors, and traffic fails over to Anthropic within seconds. Agents keep running. You find out from the audit trail the next morning.
Provider failover is automatic. Configure a primary and fallback provider per agent. If OpenAI is down, requests seamlessly route to Anthropic. The agent never sees the failure — and the switch is logged in the audit trail.
Measuring Provider Quality With Real Data
Run the same prompt through 2-3 providers for a week. Capture quality scores (human rating or LLM-as-judge), latency, and cost. The FinOps dashboard already has cost and latency — plug quality scores in via the evaluation API and you get a quality-per-dollar ranking you can actually defend in a budget meeting.
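The final ranking step is simple arithmetic. As a sketch, with made-up weekly aggregates (the numbers are invented, and the shape of the data the evaluation API returns is an assumption):

```python
# Hypothetical week-long aggregates per provider: mean quality score (0-1,
# from human rating or LLM-as-judge) and total cost in USD for the same prompts.
results = {
    "gpt-4o":           {"quality": 0.91, "cost_usd": 42.00},
    "claude-sonnet":    {"quality": 0.89, "cost_usd": 31.50},
    "gemini-2.5-flash": {"quality": 0.82, "cost_usd": 4.20},
}

def quality_per_dollar(results: dict) -> list[tuple[str, float]]:
    """Rank providers by quality score per dollar spent, best first."""
    ranked = [(name, r["quality"] / r["cost_usd"]) for name, r in results.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

for name, score in quality_per_dollar(results):
    print(f"{name}: {score:.4f} quality points per dollar")
```

A ranking like this makes the trade-off explicit: the cheapest model often wins on quality per dollar even when it loses on raw quality.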
Frequently Asked Questions
Do I need a separate contract with every provider?
You can either BYOK (bring your existing provider contracts and keys) or use Dobby's shared provider quota. BYOK preserves your negotiated pricing; shared quota is the fastest way to start.
What happens to streaming responses when a provider fails mid-stream?
Gateway policy is fail-safe — a mid-stream failure is surfaced to the client rather than silently swapped, so the agent can choose to retry cleanly against the fallback. For idempotent prompts, enable automatic retry-with-failover.
Can I enforce a specific provider for compliance reasons?
Yes. Org-level policy can restrict models or providers. For EU-strict workloads, pin routing to Mistral or Azure OpenAI EU / AWS Bedrock EU endpoints. Any attempt to route elsewhere is blocked and logged.