
Multi-Provider LLM Routing: One API for 13+ Providers

Route AI requests to the right LLM provider automatically. One OpenAI-compatible endpoint for OpenAI, Anthropic, Google, Mistral, and 9 more — with failover.

10 min read · Gil Kal · Mar 25, 2026

What you will learn

  • Route requests to 13+ LLM providers through a single endpoint
  • Implement cost-optimized routing based on task complexity
  • Set up failover between providers for high availability
  • Compare provider performance with real cost data
  • Understand how provider outages affect your agent fleet

TL;DR — No single LLM provider is best at everything. Routing by task complexity can cut LLM spend 40-60% with no quality loss, and automatic failover keeps agents running even when a provider has a 2 AM outage.

The Multi-Provider Challenge

No single LLM provider is best at everything. GPT-4o excels at reasoning. Claude handles long documents beautifully. Gemini is fast and cost-effective for classification. Mistral offers strong performance at lower cost for European data residency. The question is not which provider to use — it is how to use them all efficiently.

Without Dobby

Locked into one provider. Every agent uses GPT-4o, even for simple tasks that a cheaper model handles well. Provider outage means all agents stop.

With Dobby

Each agent uses the best model for its task. Simple classification uses Gemini Flash ($0.075/1M tokens). Complex reasoning uses GPT-4o. If one provider goes down, traffic fails over to another.

Supported Providers

  • OpenAI — GPT-4o, GPT-4o-mini, o1, o3
  • Anthropic — Claude Sonnet, Claude Haiku, Claude Opus
  • Google — Gemini 2.5 Flash, Gemini 2.5 Pro
  • Mistral — Mistral Large, Mistral Small, Codestral
  • AWS Bedrock — Claude, Titan, Llama via AWS
  • Azure OpenAI — GPT models via Azure
  • Grok — xAI's Grok models
  • DeepSeek — DeepSeek V3, DeepSeek Coder
  • Perplexity — pplx-7b, pplx-70b (search-augmented)
  • Replicate — Llama, Mixtral via Replicate
  • Plus GitHub Copilot, Microsoft 365 Copilot, and more

How Routing Works

The Gateway uses a standard OpenAI-compatible API. You specify the model in the request, and the Gateway routes to the correct provider. Your API keys for each provider are stored encrypted in the platform — the agent never sees them.

python
from openai import OpenAI

client = OpenAI(
    base_url="https://dobby-ai.com/api/v1/gateway",
    api_key="gk_svc_your_key"
)

# Route to OpenAI
result_gpt = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this code for bugs"}]
)

# Route to Anthropic (same SDK, same endpoint)
result_claude = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Summarize this 50-page report"}]
)

# Route to Google (same SDK, same endpoint)
result_gemini = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Classify this support ticket"}]
)

The Gateway automatically handles provider-specific API differences. You write standard OpenAI SDK code. The Gateway translates to each provider's format, manages auth, and normalizes the response — 13+ providers, one SDK.
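Because the wire format is the standard OpenAI chat-completions schema, you can also skip the SDK entirely. The sketch below assumes the completions route sits under the gateway base URL shown above at the conventional /chat/completions path; verify the exact path for your deployment.

python
import requests

# Same routing, no SDK: a plain HTTP POST in the OpenAI
# chat-completions wire format. The /chat/completions suffix is the
# OpenAI convention; confirm the exact path for your deployment.
response = requests.post(
    "https://dobby-ai.com/api/v1/gateway/chat/completions",
    headers={"Authorization": "Bearer gk_svc_your_key"},
    json={
        "model": "gemini-2.5-flash",  # any supported provider's model
        "messages": [{"role": "user", "content": "Classify this support ticket"}],
    },
    timeout=30,
)
print(response.json()["choices"][0]["message"]["content"])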

Cost-Optimized Routing Strategy

The most effective strategy is matching task complexity to model capability. Not every task needs the most expensive model. A cost-optimized routing strategy can reduce LLM spend by 40-60% without sacrificing quality.

  • Tier 1 (Simple) — Classification, extraction, formatting → Gemini Flash, GPT-4o-mini ($0.075-0.15/1M tokens)
  • Tier 2 (Standard) — Summarization, Q&A, code review → Claude Sonnet, GPT-4o ($2.50-5/1M tokens)
  • Tier 3 (Complex) — Multi-step reasoning, architecture, strategy → Claude Opus, o1 ($10-15/1M tokens)
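To make the tiers concrete, here is a minimal client-side router that maps a tier label to a model before calling the Gateway. The tier names and model choices mirror the table above; how you classify a task into a tier (keyword rules, a cheap classifier model) is up to you and is assumed rather than shown.

python
# Client-side sketch of tiered routing, reusing the `client` from the
# earlier example. TIER_MODELS mirrors the tier table above.
TIER_MODELS = {
    "simple": "gemini-2.5-flash",            # classification, extraction
    "standard": "claude-sonnet-4-20250514",  # summarization, Q&A, review
    "complex": "o1",                         # multi-step reasoning
}

def route_by_complexity(client, task: str, tier: str) -> str:
    """Send the task to the cheapest model that handles its tier well."""
    model = TIER_MODELS.get(tier, TIER_MODELS["standard"])
    result = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": task}],
    )
    return result.choices[0].message.content

# reply = route_by_complexity(client, "Classify this support ticket", "simple")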

Provider Failover

The Gateway includes a circuit breaker pattern. If a provider returns 5 consecutive errors, the circuit opens and requests are automatically routed to the fallback provider. After 30 seconds, the circuit half-opens and sends a test request. If it succeeds, traffic resumes.
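The sketch below shows the core of that pattern. The thresholds (5 consecutive errors, 30-second cooldown) come from the description above; the class and method names are illustrative, not the Gateway's internal API.

python
import time

# Minimal circuit breaker sketch. Thresholds match the text above;
# names are illustrative. A production breaker would also limit the
# half-open state to a single probe request at a time.
class CircuitBreaker:
    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                        # closed: traffic flows
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True                        # half-open: send a test request
        return False                           # open: use the fallback provider

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None                  # close the circuit again

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()  # open the circuit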

Without Dobby

OpenAI has a 20-minute outage. Every agent in your fleet fails. Support tickets pile up. You spend the outage writing an apology email.

With Dobby

OpenAI has the same outage. Circuit breaker opens after 5 errors. Traffic fails over to Anthropic within seconds. Agents keep running. You find out from the audit trail the next morning.

Provider failover is automatic. Configure a primary and fallback provider per agent. If OpenAI is down, requests seamlessly route to Anthropic. The agent never sees the failure — and the switch is logged in the audit trail.
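If you want a belt-and-suspenders client, you can layer the same idea on your side. The wrapper below retries an idempotent request against a fallback model when the primary errors; the model pairing is illustrative, and the Gateway's server-side failover makes this optional.

python
from openai import OpenAI, APIError

# Optional client-side complement to the Gateway's server-side failover:
# retry an idempotent prompt against a fallback model if the primary
# errors. Model choices here are illustrative.
def complete_with_fallback(client: OpenAI, prompt: str,
                           primary: str = "gpt-4o",
                           fallback: str = "claude-sonnet-4-20250514") -> str:
    for model in (primary, fallback):
        try:
            result = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return result.choices[0].message.content
        except APIError:
            continue  # this model's provider failed; try the next one
    raise RuntimeError("both primary and fallback providers failed")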

Measuring Provider Quality With Real Data

Run the same prompt through 2-3 providers for a week. Capture quality scores (human rating or LLM-as-judge), latency, and cost. The FinOps dashboard already has cost and latency — plug quality scores in via the evaluation API and you get a quality-per-dollar ranking you can actually defend in a budget meeting.
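A bake-off harness can be as small as the sketch below: one prompt, several models, latency and token usage per run. The model list and prompt are examples; scoring quality (human rating or LLM-as-judge) and joining in per-token pricing are left to you.

python
import time

# Cross-provider bake-off sketch: same prompt through several models,
# capturing latency and token usage. Quality scoring and cost joining
# are assumed to happen downstream.
MODELS = ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.5-flash"]
PROMPT = "Summarize the attached incident report in three bullet points."

def benchmark(client) -> list[dict]:
    rows = []
    for model in MODELS:
        start = time.monotonic()
        result = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        rows.append({
            "model": model,
            "latency_s": round(time.monotonic() - start, 2),
            "total_tokens": result.usage.total_tokens,
            "answer": result.choices[0].message.content,
        })
    return rows  # score the answers, join with per-token cost, rank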

Frequently Asked Questions

Do I need a separate contract with every provider?

You can either BYOK (bring your existing provider contracts and keys) or use Dobby's shared provider quota. BYOK preserves your negotiated pricing; shared quota is the fastest way to start.

What happens to streaming responses when a provider fails mid-stream?

By design, a mid-stream failure is surfaced to the client rather than silently swapped to another provider, so the agent can retry cleanly against the fallback. For idempotent prompts, you can enable automatic retry-with-failover instead.

Can I enforce a specific provider for compliance reasons?

Yes. Org-level policy can restrict models or providers. For EU-strict workloads, pin routing to Mistral or Azure OpenAI EU / AWS Bedrock EU endpoints. Any attempt to route elsewhere is blocked and logged.


Ready to try this yourself?

Start free — no credit card required.
