
Shadow AI: The Metric You're Not Measuring

Your org has 10× more AI in production than you think. Here's why traditional DLP misses it, what the right metric looks like, and how to build a governance score your board can trust.

Gil Kal · April 21, 2026 · 7 min read

Every CISO I've talked to in the last three months has been asked the same question by their board: "How much AI is running in our organization, and who's governing it?"

Every CISO I've talked to has given the same honest answer: "We have a spreadsheet. And it's probably wrong."

This is Shadow AI. Not the sci-fi version where a rogue LLM takes over your datacenter — the boring version where a marketing manager built a ChatGPT agent in Zapier six months ago and it's quietly sending customer data to OpenAI every time someone books a demo. The version where a backend engineer added Claude to their CI pipeline without telling security. The version where your HR team uses a browser extension that summarizes resumes — and sends them to a third-party AI vendor your DPO has never heard of.

Why your current DLP catches none of this

Traditional DLP was designed for a different threat model. It watches for files, emails, and HTTP uploads containing keyword patterns — credit card numbers, SSNs, regex for sensitive strings. It works reasonably well when the question is "did Bob email the customer database to his personal Gmail?"

It works badly when the question is "which of my 400 employees is pasting customer PII into ChatGPT?" Three reasons:

  • ChatGPT.com traffic looks like any other SaaS traffic to your DLP. No attachment. No keyword-dense email. Just HTTPS to openai.com.
  • LLM responses don't come through your DLP at all — they go directly to the user's browser.
  • Your DLP has no concept of "AI agents" as a distinct category. A LangChain agent making 50 API calls per minute looks like any other backend service.

Which is how you end up with 40-plus unsanctioned AI integrations and a one-line answer for the board: "we'll get back to you."

What "Shadow AI" actually looks like (three patterns)

From the CISO interviews I've been running, shadow AI lands in three distinct patterns:

1. Employee browser usage

Your employees paste company data into ChatGPT.com, Claude.ai, Gemini, Copilot, Perplexity. They use Chrome extensions that rewrite emails with AI. They dictate meeting notes to tools like Granola or Fathom. Most of this is well-intentioned — employees trying to be productive. Almost none of it is governed.

2. Unsanctioned backend integrations

An engineer adds an OpenAI call to a microservice to summarize logs. A product team builds a LangChain agent to auto-tag support tickets. A growth hacker wires Zapier to Claude to generate marketing copy. These integrations have API keys, production data access, and zero audit trail. They work until they don't.

3. Third-party vendors embedding LLMs

Your Salesforce, your HubSpot, your ATS — all of them are quietly shipping LLM-powered features, which means your vendors are passing your customers' data to LLM providers you never contracted with. This isn't their fault; it's the 2026 SaaS default. But it IS your problem, because it's your data.

The metric you should be measuring

When your board asks "how exposed are we to AI risk?", the right answer is a number. Not a slide. Not a narrative. A number between 0 and 100.

Here's the metric I've been pitching to design-partner CISOs, and the feedback has been unanimous: YES, this is what we need.

Per-agent Governance Posture Score: every AI agent in the org scored against a fixed set of governance controls — is there a policy bound to it? A budget cap? Audit logs? A kill switch? An owner? — rolled up to a single org-level number.

We score each agent 0-100 across 18 controls in 6 categories (safety, cost, observability, operational, security, quality). A fresh agent with no policy, no budget, no audit, no owner might score 15. An agent with all 18 controls configured scores 100. The org-wide rollup is the single number your CISO takes to the board.
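The rollup described above can be sketched in a few lines. The six categories come from the post; the individual control names below are illustrative placeholders, not Dobby's actual schema, and the org rollup here is a simple mean (one of several reasonable choices).

```python
from statistics import mean

# Hypothetical checklist: 18 controls across the 6 categories named above.
# Control names are illustrative assumptions, not a real product schema.
CONTROLS = {
    "safety":        ["policy_bound", "content_filter", "kill_switch"],
    "cost":          ["budget_cap", "rate_limit", "cost_alerts"],
    "observability": ["audit_logs", "tracing", "log_retention"],
    "operational":   ["owner_assigned", "runbook", "review_date"],
    "security":      ["key_rotation", "least_privilege", "data_classification"],
    "quality":       ["eval_suite", "regression_tests", "human_review"],
}
ALL_CONTROLS = [c for cs in CONTROLS.values() for c in cs]  # 18 total

def agent_score(configured: set[str]) -> float:
    """Score one agent 0-100: share of the 18 controls it has configured."""
    return 100 * len(configured & set(ALL_CONTROLS)) / len(ALL_CONTROLS)

def org_score(agents: dict[str, set[str]]) -> float:
    """Org-wide rollup: mean of per-agent scores (simplest possible rollup)."""
    return mean(agent_score(c) for c in agents.values())

fleet = {
    "support-tagger": set(ALL_CONTROLS),    # fully governed agent -> 100
    "log-summarizer": {"policy_bound", "audit_logs", "owner_assigned"},
    "zapier-copybot": set(),                # fresh shadow agent -> 0
}
print(f"org governance score: {org_score(fleet):.1f}")
```

One number per agent, one number for the org — that second number is the slide-free answer for the board.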

The insight is not the controls themselves — every compliance team has a spreadsheet of those. The insight is applying them AT THE AGENT LAYER, automatically, for every agent (including the ones IT didn't authorize), and surfacing the score as a live metric.

Why "observability" doesn't solve this

If you've spent any time in the AI Gateway space, you've seen the pitch: "We give you dashboards for every LLM call." That's observability. It's useful. It's not governance.

Observability tells you: "agent X made 400 requests yesterday." Governance tells you: "agent X has no policy binding, no budget cap, no audit retention, and was last reviewed 90 days ago — and 37 other agents in your fleet are in the same state."

Observability is what happened. Governance is what's approved. The difference matters because auditors, regulators, and boards ask governance questions, not observability questions.

The "4 Control Modes" framing

Here's the framing that clicks with CISOs once we've walked through the problem: every piece of AI traffic in your org can be reached in exactly one of four ways. You should know which mode each integration falls into, and your policy should flex accordingly.

  • **Inline** — the AI traffic flows through a gateway you control. Full policy enforcement possible.
  • **Hybrid** — some routes flow through an MCP proxy you control. Tool-level governance possible.
  • **Surrounding** — the AI traffic doesn't flow through you, but you receive telemetry. Post-hoc audit + alerting.
  • **Shadow** — neither traffic nor telemetry. You don't know it exists (yet).

Most enterprise security teams only operate in the first mode (Inline). Shadow AI is, by definition, the fourth mode. Which is why it's invisible to you. The trick is to make the fourth mode visible first, then gradually promote it to mode 3, then 2, then 1 — each promotion unlocks a stricter governance policy.
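The four modes form a simple decision tree, which a minimal sketch makes explicit (the boolean signals here — gateway, MCP proxy, telemetry — are my shorthand for the mode definitions above):

```python
from enum import Enum

class Mode(Enum):
    INLINE = 1       # traffic flows through a gateway you control
    HYBRID = 2       # some routes flow through an MCP proxy you control
    SURROUNDING = 3  # no traffic path, but you receive telemetry
    SHADOW = 4       # neither traffic nor telemetry

def classify(through_gateway: bool, through_mcp_proxy: bool,
             has_telemetry: bool) -> Mode:
    """Map one AI integration to its control mode, strongest signal first."""
    if through_gateway:
        return Mode.INLINE
    if through_mcp_proxy:
        return Mode.HYBRID
    if has_telemetry:
        return Mode.SURROUNDING
    return Mode.SHADOW
```

"Promotion" is then just moving an integration up this tree: first get telemetry (Shadow → Surrounding), then route its tool calls through a proxy (→ Hybrid), then put it fully inline (→ Inline).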

What to do this quarter

Regardless of whether you buy Dobby or another tool, the operational steps are the same:

  • Run a log-based Discovery scan on your outbound traffic logs (your SIEM already has them). You're looking for any traffic to *.openai.com, *.anthropic.com, *.ai.google.dev, *.perplexity.ai, and a dozen other AI endpoints. Group by source IP or user, rank by volume.
  • Survey your employees. One well-crafted anonymous form with an honest "we're not going to punish you — we just need to know" framing gets you 80% of the picture. The other 20% comes from step 1.
  • Pick a metric. A single number you'll report to your board quarterly. Per-agent governance score is my recommendation, but even a simple "unmanaged AI sources: 47" is better than nothing.
  • Get one win this quarter. Pick the top-3 most-used shadow AI tools, register them formally, bind a policy, put them under governance. Ship the number improvement to the board.
  • Repeat quarterly. The number goes down. You've proven governance.
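Step 1 — the log-based discovery scan — is a grep-and-group exercise. A minimal sketch, assuming a whitespace-delimited log format of `timestamp source destination_host` (your SIEM export will differ; adjust the parsing accordingly):

```python
from collections import Counter

# AI endpoints to flag, mirroring the domains named in step 1; extend as needed.
AI_DOMAINS = (".openai.com", ".anthropic.com", ".ai.google.dev", ".perplexity.ai")

def discover(log_lines: list[str]) -> list[tuple[str, int]]:
    """Count AI-endpoint hits per source IP/user, ranked by volume."""
    hits: Counter[str] = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        _, source, host = parts[:3]
        if host.endswith(AI_DOMAINS):  # str.endswith accepts a tuple
            hits[source] += 1
    return hits.most_common()

logs = [
    "2026-04-21T09:00 10.0.0.12 api.openai.com",
    "2026-04-21T09:01 10.0.0.12 api.openai.com",
    "2026-04-21T09:02 10.0.0.45 api.anthropic.com",
    "2026-04-21T09:03 10.0.0.45 www.example.com",
]
print(discover(logs))  # [('10.0.0.12', 2), ('10.0.0.45', 1)]
```

The top of that ranked list is where your survey (step 2) and your first governance win (step 4) should start.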

The bottom line

Shadow AI is not a tooling problem. It's a measurement problem. Once you have the right metric in place — a single org-wide governance score — the operational work becomes obvious. The hardest step is the first one: admitting that what you have today is a spreadsheet and it's probably wrong.

If you're a CISO at a 100-500 person organization in a compliance-sensitive industry, and you want to see your governance score in 30 minutes, I'd be happy to walk you through it. Book a call — no demo unless you ask for one, just 8 research questions about how your team is thinking about this.

— Gil

Ready to take control of your AI agents?

Start free with Dobby AI — connect, monitor, and govern agents from any framework.

Get Started Free