Advanced

FinOps for AI Agents: Per-Agent Cost Tracking & Optimization

Track, analyze, and optimize AI agent costs. Per-agent breakdown, provider comparison, forecasting, and 5 strategies that typically cut spend 40-60%.

12 min read | Gil Kal | Mar 19, 2026

What you will learn

Implement per-agent, per-provider, and per-user cost attribution
Use the FinOps dashboard to identify optimization opportunities
Build cost forecasts based on historical usage patterns
Reduce AI spend by 40-60% with model tiering and caching
Build a chargeback model that finance will trust

TL;DR — AI FinOps is three things: instrumentation (per-call attribution), analysis (which agent/provider/team drives cost), and optimization (tiering, caching, budgets). Most teams that do all three cut spend 40-60% in the first quarter.

AI FinOps: Beyond Cost Tracking

FinOps for AI agents goes beyond knowing how much you spend. It is about understanding where the money goes (per agent, per provider, per user), predicting future spend, and systematically reducing costs without sacrificing quality. The teams that master AI FinOps spend 40-60% less than those that do not.

Without Dobby

Monthly AI bill: $8,000. Nobody knows which agents cost the most. The team suspects the QA agent is expensive but cannot prove it. Cost optimization is guesswork.

With Dobby

Monthly AI bill: $3,200. The dashboard showed the QA agent costing $2,100/month on GPT-4o. Switching QA to Claude Sonnet kept quality the same at 60% less cost. Forecast: $2,800 next month.

The FinOps Dashboard

The FinOps dashboard provides five views of your AI spend, each answering a different question:

Overview — Total spend, daily trend, 30-day forecast, budget tracking. How much are we spending and where is it going?
Cost by Agent — Per-agent breakdown with expandable provider drill-down. Which agents cost the most?
Cost by Provider — OpenAI vs Anthropic vs Google cost comparison. Which provider gives the best value?
Cost by User — Per-user cost attribution. Who is consuming the most resources?
Cost by Department — Team-level cost allocation for chargeback. Which department should be billed?

Cost Attribution Architecture

Every LLM call through the Gateway is tagged with: agent ID, user ID, tenant ID, organization ID, provider, model, and request metadata. This creates a multi-dimensional cost cube that you can slice any way you need.

```sql
-- Example: top 10 most expensive agents over the last 30 days.
-- @orgId is a query parameter supplied by the caller.
SELECT
  agent_id,
  agent_name,
  COUNT(*) AS total_requests,
  SUM(total_tokens) AS total_tokens,
  SUM(estimated_cost_usd) AS total_cost,
  -- Average cost of a single request, rounded to 4 decimal places
  ROUND(SUM(estimated_cost_usd) / COUNT(*), 4) AS avg_cost_per_request
FROM ds_platform.llm_gateway_requests
WHERE organization_id = @orgId
  AND created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY agent_id, agent_name
ORDER BY total_cost DESC
LIMIT 10;
```
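To make the "slice any way you need" idea concrete, here is a minimal Python sketch of the same rollup over tagged request records. The `GatewayRequest` class, its field names, and the sample values are illustrative assumptions, not the Gateway's actual schema or real data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayRequest:
    """One tagged LLM call. Field names are illustrative, not the real schema."""
    agent_id: str
    user_id: str
    provider: str
    model: str
    total_tokens: int
    estimated_cost_usd: float


def cost_by(requests, dimension):
    """Sum estimated cost per value of one tag, most expensive first."""
    totals = defaultdict(float)
    for req in requests:
        totals[getattr(req, dimension)] += req.estimated_cost_usd
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample records for illustration only.
requests = [
    GatewayRequest("qa-agent", "u1", "openai", "gpt-4o", 1200, 0.50),
    GatewayRequest("qa-agent", "u2", "openai", "gpt-4o", 800, 0.25),
    GatewayRequest("triage-agent", "u1", "google", "gemini-flash", 500, 0.125),
]

print(cost_by(requests, "agent_id"))
print(cost_by(requests, "user_id"))
```

Because every record carries all the tags, switching from a per-agent view to a per-user or per-provider view only changes the `dimension` argument.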

The FinOps dashboard runs 6 parallel BigQuery queries to build the complete cost picture. Data updates in real time as Gateway requests flow through. There is no manual ETL and no separate analytics pipeline; it is built into the platform.

5 Optimization Strategies

1. Model Tiering

Match model capability to task complexity. Use GPT-4o-mini or Gemini Flash for simple tasks (classification, extraction); they cost roughly 10-20x less than GPT-4o. Reserve expensive models for complex reasoning.
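A tiering rule can be as small as a lookup. The sketch below is one possible shape, assuming you label each request with a task type. The task labels and the tier assignment are hypothetical and not part of the Gateway.

```python
# Hypothetical tier table: the task labels and tier choices are assumptions.
CHEAP_MODEL = "gpt-4o-mini"
PREMIUM_MODEL = "gpt-4o"

SIMPLE_TASKS = {"classification", "extraction"}


def pick_model(task_type: str) -> str:
    """Route simple tasks to the cheap tier and everything else to the premium tier."""
    return CHEAP_MODEL if task_type in SIMPLE_TASKS else PREMIUM_MODEL


print(pick_model("classification"))
print(pick_model("multi_step_reasoning"))
```

Unknown task types fall through to the premium model, so a missing label costs money rather than quality.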

2. Semantic Caching

The Gateway semantic cache checks if a semantically similar question was already answered. Cache hits return in under 1ms at zero LLM cost. Typical savings: 20-40% for repetitive workloads.
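The following sketch shows the general idea behind a semantic cache, not the Gateway's implementation. It assumes a toy bag-of-words `embed` function standing in for a real embedding model, and an arbitrary similarity threshold of 0.8.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector. A real cache would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cutoff
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, question):
        """Return the answer of the most similar cached question, or None."""
        query = embed(question)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, question, answer):
        self.entries.append((embed(question), answer))


cache = SemanticCache()
cache.put("how do i reset my password", "Use the reset link on the login page.")
print(cache.get("how do i reset my password please"))  # similar enough: cache hit
print(cache.get("what is the refund policy"))  # unrelated: cache miss
```

The threshold is the main tuning knob: set it too low and users get answers to questions they did not ask, set it too high and the hit rate drops.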

3. Token Budget Automation

Set daily and monthly budgets per agent. Automatic alerts at 80% and 90%. Automatic blocking at 100%. This prevents runaway costs before they happen.
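The alert and block thresholds above reduce to a simple comparison. This is a minimal sketch; the function name and the returned action labels are assumptions for illustration.

```python
def budget_action(spent_usd: float, budget_usd: float) -> str:
    """Map spend against a budget to an action: alert at 80% and 90%, block at 100%."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    used = spent_usd / budget_usd
    if used >= 1.0:
        return "block"
    if used >= 0.9:
        return "alert_90"
    if used >= 0.8:
        return "alert_80"
    return "ok"


print(budget_action(85.0, 100.0))
print(budget_action(120.0, 100.0))
```

Running the same check against both the daily and the monthly budget, and taking the stricter result, covers both limits.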

4. Provider Shopping

Use the Cost by Provider view to compare quality-per-dollar across providers. Often, switching one agent from Provider A to Provider B saves 30-50% with no quality loss.

5. Request Optimization

Reduce token consumption by: shortening system prompts, using structured outputs (JSON mode), batching similar requests, and pruning conversation history. Each optimization compounds.
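As one example of these techniques, conversation history can be pruned to a token budget. The sketch below assumes chat messages are dicts with `role` and `content` keys and uses a crude estimate of about 4 characters per token. A real tokenizer would be more accurate.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token. Use a real tokenizer in practice."""
    return max(1, len(text) // 4)


def prune_history(messages, max_tokens):
    """Keep system messages plus the newest other messages that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return system + kept[::-1]  # restore chronological order
```

Dropping the oldest turns first keeps the most recent context, which is usually what the next response depends on.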

Start with the Cost by Agent view. Find your top 3 most expensive agents. For each, check if a cheaper model would work. This single action typically saves 20-30% of total AI spend.

Chargeback That Finance Will Trust

Once per-agent and per-department cost is accurate, you can treat AI like any other cloud bill: allocate, budget, forecast, and chargeback. Three rules make finance comfortable.

Reconcile monthly — compare Dobby cost vs provider invoice, document any variance.
Version the allocation model — 'as of 2026-04-01 we attribute cached hits to the original requester'. Write it down.
Export the same CSV every month — same columns, same format. Boring is trustworthy.
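The monthly reconciliation in the first rule is simple arithmetic. Here is a minimal sketch, assuming a tolerance of 2% that you would agree on with finance. The function name, the tolerance, and the sample figures are all hypothetical.

```python
def reconcile(tracked_usd: float, invoiced_usd: float, tolerance_pct: float = 2.0):
    """Compare tracked cost with the provider invoice for one month.

    tolerance_pct is an assumed threshold, not a product default.
    """
    variance_usd = invoiced_usd - tracked_usd
    variance_pct = 100.0 * variance_usd / invoiced_usd if invoiced_usd else 0.0
    return {
        "variance_usd": round(variance_usd, 2),
        "variance_pct": round(variance_pct, 2),
        "within_tolerance": abs(variance_pct) <= tolerance_pct,
    }


print(reconcile(3200.0, 3250.0))
```

Any month that falls outside the tolerance gets a written explanation, which is the "document any variance" part of the rule.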

Frequently Asked Questions

How accurate is the forecast?

The 30-day forecast blends 7-day and 30-day running averages with seasonality adjustments. Typical error is 10-15% on stable workloads and higher during ramp periods. It is not a budget — it is a heads-up.
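To illustrate the blending idea, here is a simplified sketch. The equal 50/50 weighting is an assumption, and the seasonality adjustment is left out entirely, so this is not the dashboard's actual formula.

```python
def forecast_30_day(daily_costs, weight_7d=0.5):
    """Project the next 30 days from a blend of 7-day and 30-day daily averages.

    weight_7d is an assumed blend weight. Seasonality adjustment is omitted.
    """
    if len(daily_costs) < 30:
        raise ValueError("need at least 30 days of history")
    avg_7d = sum(daily_costs[-7:]) / 7
    avg_30d = sum(daily_costs[-30:]) / 30
    blended_daily = weight_7d * avg_7d + (1 - weight_7d) * avg_30d
    return round(30 * blended_daily, 2)


# Made-up history: 23 days at $100/day, then a ramp to $120/day for the last week.
history = [100.0] * 23 + [120.0] * 7
print(forecast_30_day(history))
```

Weighting the recent week more heavily makes the forecast react faster to a ramp, at the cost of more noise on stable workloads.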

Can I include non-LLM costs (vector DB, compute)?

Yes — push external cost events into the Gateway's cost metadata endpoint and they appear in the same dashboard. This gives you a single AI-TCO view rather than multiple disconnected bills.

How do I prove a 40-60% savings to my CFO?

Take a baseline month before optimizations, then compare post-optimization months with the same workload mix. The dashboard's week-over-week delta, combined with provider invoice reconciliation, is usually enough to satisfy finance.
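One way to keep the comparison fair when request volume changes is to compare cost per request rather than total spend. The sketch below uses hypothetical request counts together with the $8,000 and $3,200 monthly bills from the example earlier in this article.

```python
def savings_pct(baseline_cost, baseline_requests, current_cost, current_requests):
    """Percent reduction in cost per request between a baseline and a later month."""
    baseline_unit = baseline_cost / baseline_requests
    current_unit = current_cost / current_requests
    return round(100.0 * (baseline_unit - current_unit) / baseline_unit, 1)


# Request counts are made up; equal volume in both months keeps the example simple.
print(savings_pct(8000, 100_000, 3200, 100_000))
```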

