AI Agent Cost Control: Budgets, Quotas & Alerts
Track AI costs per agent, set token budgets, configure provider quotas, and get alerts before you overspend.
What you will learn
- Track AI costs per agent, per provider, and per user
- Set up token budgets with automatic alerts at 80%, 90%, 100%
- Configure provider quotas to prevent overspend
- Use the FinOps dashboard to optimize cost allocation
The AI Cost Problem
AI agent costs are invisible until the bill arrives. A single agent running a loop with GPT-4 can burn $500 in an afternoon. Multiply that by 10 agents across 3 providers, and your monthly AI spend becomes unpredictable and uncontrollable.
The solution is not to stop using AI — it is to instrument every call, set limits, and get alerts before the damage is done.
Before: An end-of-month surprise, a $12,000 OpenAI bill. Nobody knows which agent caused it, and the team scrambles to check logs across three provider dashboards.
After: A real-time dashboard shows cost per agent. A budget alert fired at $500 (80% of a $625 monthly limit), and the responsible agent was automatically throttled before reaching the cap.
Three Layers of Cost Control
- Layer 1: Tracking — Every LLM call is metered with tokens consumed, cost calculated, and attributed to an agent, user, and provider
- Layer 2: Budgets — Token budgets per agent, per tenant, per organization. Daily and monthly limits with configurable thresholds
- Layer 3: Quotas — Provider-level quotas that sync with your actual provider limits (requests per day, tokens per minute)
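The tracking layer (Layer 1) can be sketched as a simple per-call meter. This is an illustrative Python sketch, not the platform's implementation; the price table and field names are assumptions, and real per-token prices vary by provider and model.

```python
# Hypothetical USD prices per 1K tokens (check your provider's price sheet).
PRICE_PER_1K = {
    "gpt-4o":      {"input": 0.0025,  "output": 0.0100},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}

def meter_call(agent_id, user_id, provider, model, input_tokens, output_tokens):
    """Compute the cost of one LLM call and attribute it to an agent, user, and provider."""
    p = PRICE_PER_1K[model]
    cost = (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]
    return {
        "agent_id": agent_id,
        "user_id": user_id,
        "provider": provider,
        "model": model,
        "tokens": input_tokens + output_tokens,
        "cost_usd": round(cost, 6),
    }

record = meter_call("agent_backend_001", "user_42", "openai", "gpt-4o", 1200, 400)
print(record["cost_usd"])  # 0.007
```

Emitting one such record per LLM call is what makes every downstream view (budgets, quotas, dashboards) possible.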
Setting Up Token Budgets
// Example: Create a monthly budget for an agent
POST /api/v1/tenants/{tenantId}/budgets
{
  "name": "Backend Agent Monthly",
  "agent_id": "agent_backend_001",
  "budget_type": "monthly",
  "token_limit": 500000,
  "cost_limit_usd": 50.00,
  "alert_thresholds": [80, 90, 100],
  "action_on_limit": "block" // or "warn"
}

When a budget threshold is hit, Dobby sends a Slack alert to #dobby-alerts with the agent name, current spend, and budget limit. At 100%, the agent is automatically blocked from making further LLM calls.
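The threshold behavior described above can be sketched as follows. This is assumed logic for illustration; the platform's actual evaluation may differ.

```python
ALERT_THRESHOLDS = [80, 90, 100]  # percent, as in the budget payload above

def check_budget(spent_usd, limit_usd, action_on_limit="block"):
    """Return (thresholds_crossed, blocked) for the current spend level."""
    pct = (spent_usd / limit_usd) * 100
    crossed = [t for t in ALERT_THRESHOLDS if pct >= t]
    blocked = action_on_limit == "block" and pct >= 100
    return crossed, blocked

# At $45 of a $50 monthly limit (90%), two alerts have fired but no block yet:
print(check_budget(45.00, 50.00))  # ([80, 90], False)
```

With `action_on_limit` set to `"warn"`, the 100% threshold still fires an alert but the agent keeps running.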
Provider Quotas
Provider quotas are different from budgets. Budgets control how much you want to spend. Quotas reflect how much your provider allows you to spend — API rate limits, requests per day, tokens per minute.
Sync your provider quotas into the platform so agents are automatically throttled before hitting provider rate limits. This prevents 429 errors and wasted retry cycles.
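Client-side throttling against a synced tokens-per-minute quota is commonly implemented as a token bucket. The sketch below is illustrative; the quota value and class shape are assumptions, not platform API.

```python
import time

class TokenBucket:
    """Throttle LLM calls against a tokens-per-minute provider quota."""
    def __init__(self, tokens_per_minute):
        self.capacity = tokens_per_minute
        self.tokens = tokens_per_minute
        self.refill_rate = tokens_per_minute / 60.0  # tokens per second
        self.last = time.monotonic()

    def try_consume(self, n):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True   # proceed with the call
        return False      # throttle locally instead of receiving a 429

bucket = TokenBucket(tokens_per_minute=90_000)
print(bucket.try_consume(5_000))  # True: within quota
```

Calls that would exceed the bucket wait or queue locally, so the provider never returns a 429 and no retry cycles are wasted.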
The FinOps Dashboard
The FinOps dashboard provides five views of your AI spend:
- Overview — total cost, daily trend, forecast
- Cost by Agent — which agents cost the most
- Cost by Provider — OpenAI vs Anthropic vs Google
- Cost by User — who is consuming the most
- Cost by Department
- Daily trend chart with 30-day forecast
- Per-agent drill-down with provider and model breakdown
- Budget tracking with visual progress bars
- Monthly comparison and week-over-week delta
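The per-agent view is essentially a roll-up of the metered call records. A minimal sketch, assuming records with `agent_id` and `cost_usd` fields as produced by the tracking layer:

```python
from collections import defaultdict

def cost_by_agent(records):
    """Aggregate metered call records into total cost per agent, highest first."""
    totals = defaultdict(float)
    for r in records:
        totals[r["agent_id"]] += r["cost_usd"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

records = [
    {"agent_id": "agent_backend_001",  "cost_usd": 12.50},
    {"agent_id": "agent_frontend_001", "cost_usd": 3.10},
    {"agent_id": "agent_backend_001",  "cost_usd": 7.50},
]
print(cost_by_agent(records))  # {'agent_backend_001': 20.0, 'agent_frontend_001': 3.1}
```

The same grouping keyed on `provider`, `user_id`, or a department field yields the other dashboard views.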
Cost Optimization Tips
- Route simple tasks to cheaper models (GPT-4o-mini instead of GPT-4o)
- Use the Gateway semantic cache to avoid duplicate LLM calls
- Set model restrictions per agent — not every agent needs the most expensive model
- Review the By Agent page weekly to catch cost spikes early
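The first tip, routing simple tasks to cheaper models, can be sketched as a small router. The heuristic, task types, and model names here are assumptions for illustration, not a fixed platform rule.

```python
CHEAP_MODEL = "gpt-4o-mini"
EXPENSIVE_MODEL = "gpt-4o"

# Hypothetical task types considered simple enough for the cheaper model.
SIMPLE_TASK_TYPES = {"classify", "extract", "summarize_short"}

def pick_model(task):
    """Route simple tasks to the cheaper model; everything else gets the expensive one."""
    if task.get("type") in SIMPLE_TASK_TYPES:
        return CHEAP_MODEL
    return EXPENSIVE_MODEL

print(pick_model({"type": "classify"}))     # gpt-4o-mini
print(pick_model({"type": "code_review"}))  # gpt-4o
```

Even a coarse router like this can cut spend substantially when a large share of calls are simple lookups or classifications.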
The Gateway semantic cache can reduce LLM costs by 20-40% for repetitive queries. It checks if a semantically similar question was already answered and returns the cached response in under 1ms.
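Conceptually, a semantic cache compares query embeddings rather than exact strings. The sketch below illustrates the idea with cosine similarity; the threshold, class shape, and toy embeddings are assumptions, not the Gateway's actual implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Return a cached answer when a query embedding is close enough to a stored one."""
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        for emb, response in self.entries:
            if cosine(embedding, emb) >= self.threshold:
                return response  # cache hit: no LLM call, no cost
        return None  # cache miss: call the LLM, then put() the result

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache()
cache.put([0.9, 0.1, 0.0], "Paris")
print(cache.get([0.89, 0.12, 0.01]))  # Paris — a near-identical query hits the cache
```

A production cache would use a vector index instead of a linear scan, which is how sub-millisecond lookups stay possible at scale.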