Human-in-the-Loop: Why AI Agents Still Need Humans
Fully autonomous AI agents sound exciting — until one overspends your budget or sends the wrong email. Approval gates keep agents productive and safe.
The promise of AI agents is autonomy: define a goal, let the agent figure out the steps, and get the result. In practice, every production team learns the same lesson — fully autonomous agents are a liability. Not because the technology is bad, but because the cost of a wrong decision scales with autonomy. A chatbot that gives a mediocre answer is a minor issue. An agent that deploys untested code to production or commits to a vendor contract is a different story entirely.
Consider what happens without guardrails. A code review agent finds a vulnerability and decides to push a fix directly to production — bypassing the staging environment. A customer support agent, trying to resolve a complaint, offers a 90 percent discount without checking company policy. A DevOps agent, responding to a scaling alert, spins up 50 new instances when 5 would have been enough. These are not hypothetical edge cases. They are the kinds of decisions agents make when they optimize for task completion without human judgment in the loop.
The Trust Spectrum
Think of agent autonomy as a spectrum. On one end, every action requires human approval — safe but slow. On the other end, agents act freely — fast but risky. The sweet spot is somewhere in the middle: let agents handle routine decisions autonomously while requiring human approval for high-risk actions. The challenge is defining where that line sits for your organization.
Different teams draw the line differently. An engineering team might auto-approve code changes that touch fewer than three files, all within a known safe directory. A finance team might require approval for any action involving money, regardless of amount. A marketing team might let agents draft content freely but require sign-off before anything is published externally. The point is that the line should be a conscious decision, not an afterthought.
What Approval Gates Look Like
An approval gate is a checkpoint where an agent pauses execution and waits for a human decision. The agent presents what it wants to do and why, then a human approves, rejects, or requests changes. The key is that the gate is non-blocking for the human, who reviews when available, but blocking for the agent, which cannot proceed without approval. This means agents do not sit idle waiting. They move on to other tasks and resume when approval arrives.
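The pause-and-resume behavior can be sketched as a small agent loop. This is a minimal illustration, not Dobby's implementation: the `Task` class, status strings, and `run_agent` function are all hypothetical names chosen for clarity.

```python
# Minimal sketch of "blocking for the agent, non-blocking for the human".
# Task, run_agent, and the status values are illustrative, not a real API.
from dataclasses import dataclass
from collections import deque

@dataclass
class Task:
    name: str
    needs_approval: bool = False
    status: str = "ready"   # ready -> pending -> approved -> done

def run_agent(tasks: deque, approvals: list) -> list:
    """Agent loop: park gated tasks as 'pending' and keep working."""
    done = []
    while tasks:
        task = tasks.popleft()
        if task.needs_approval and task.status != "approved":
            task.status = "pending"
            approvals.append(task)   # human reviews later; agent moves on
            continue
        task.status = "done"
        done.append(task)
    return done

tasks = deque([Task("lint fix"), Task("deploy to prod", needs_approval=True), Task("update docs")])
pending: list = []
finished = run_agent(tasks, pending)
print([t.name for t in finished])  # routine tasks complete immediately
print([t.name for t in pending])   # the gated task waits for a human
```

The gated task is parked rather than dropped: once a reviewer flips its status to approved, the agent can requeue and finish it on a later pass.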
Common triggers for approval gates include:
- Code changes that affect more than 5 files or 100 lines
- External API calls that cost more than a threshold amount
- Actions that modify production infrastructure
- Communications sent to external parties (emails, messages, Slack posts)
- Database migrations or schema changes
- Any action involving financial transactions or commitments
- Any action that the agent itself flags as uncertain
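The trigger list above amounts to a single predicate over a proposed action. The sketch below is one way to express it; the `Action` fields and the cost threshold are illustrative assumptions, not a real Dobby schema.

```python
# Hypothetical predicate mirroring the trigger list above.
# Field names and COST_THRESHOLD_USD are assumed for illustration.
from dataclasses import dataclass

@dataclass
class Action:
    files_changed: int = 0
    lines_changed: int = 0
    cost_usd: float = 0.0
    touches_production: bool = False
    sends_external_message: bool = False
    is_schema_change: bool = False
    involves_money: bool = False
    agent_flagged_uncertain: bool = False

COST_THRESHOLD_USD = 1.00  # assumed value; set per organization

def needs_approval(a: Action) -> bool:
    """True if any trigger fires; the action then waits at the gate."""
    return (
        a.files_changed > 5 or a.lines_changed > 100
        or a.cost_usd > COST_THRESHOLD_USD
        or a.touches_production
        or a.sends_external_message
        or a.is_schema_change
        or a.involves_money
        or a.agent_flagged_uncertain
    )

print(needs_approval(Action(files_changed=2, lines_changed=30)))  # False
print(needs_approval(Action(touches_production=True)))            # True
```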
The approval flow itself matters. A good system sends notifications where the reviewer already works — Slack, email, or inside the IDE via MCP. The reviewer sees a summary of the proposed action, the agent's reasoning, and any relevant context. They can approve with one click, reject with a reason, or request modifications. The entire exchange is logged.
Auto-Approve Conditions
Not every action needs a human in the loop. Auto-approve conditions let you define safe boundaries: if an agent modifies fewer than 3 files, all within allowed paths, and the cost is under a threshold — approve automatically. This gives agents room to be productive while keeping guardrails on anything unusual. The conditions are per-agent and per-organization, so your security agent can have stricter rules than your documentation agent.
Auto-approve conditions are where teams recover speed without sacrificing safety. A well-tuned configuration might auto-approve 70 to 80 percent of agent actions while routing the remaining 20 to 30 percent that actually need human judgment to a reviewer. The key is reviewing auto-approve conditions regularly based on audit trail data — tightening rules where agents make mistakes and loosening them where approvals are always granted.
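Evaluating these safe boundaries is a conjunction: every condition must hold, or the action goes to a human. A sketch, assuming condition keys shaped like the ones described above (max files, max lines, allowed paths, cost threshold); the function itself is illustrative.

```python
# Sketch of auto-approve evaluation: approve only if EVERY boundary holds.
# The condition keys follow the shape described in the text; the function
# and change dict are illustrative, not Dobby's actual implementation.
def auto_approved(change: dict, conditions: dict) -> bool:
    within_paths = all(
        any(path.startswith(prefix) for prefix in conditions["allowed_paths"])
        for path in change["paths"]
    )
    return (
        len(change["paths"]) <= conditions["max_files"]
        and change["lines"] <= conditions["max_lines"]
        and change["cost_usd"] <= conditions["max_cost_usd"]
        and within_paths
    )

conditions = {
    "max_files": 3,
    "max_lines": 50,
    "allowed_paths": ["src/utils/", "src/helpers/"],
    "max_cost_usd": 0.50,
}
safe = {"paths": ["src/utils/format.py"], "lines": 12, "cost_usd": 0.10}
risky = {"paths": ["src/payments/charge.py"], "lines": 12, "cost_usd": 0.10}
print(auto_approved(safe, conditions))   # True: small change in an allowed path
print(auto_approved(risky, conditions))  # False: path outside the safe list
```

Note the asymmetry: a single failed condition is enough to escalate, which is the conservative default you want when tuning rules from audit data.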
How Dobby Implements Approval Gates
In Dobby, approval gates are a first-class feature of the agent system. Every agent has a requires_approval flag and a set of auto_approve_conditions that define when human review is needed. When an agent hits a gate, the task moves to pending status and a notification is sent to the assigned reviewer. Approvals can be handled through the dashboard, via Slack interactive buttons, or through MCP tools directly in your IDE.
```json
{
  "agent": "backend-worker",
  "requires_approval": true,
  "auto_approve_conditions": {
    "max_files": 3,
    "max_lines": 50,
    "allowed_paths": ["src/utils/", "src/helpers/"],
    "max_cost_usd": 0.50
  },
  "approval_routing": {
    "notify": ["slack:#dev-approvals", "email:[email protected]"],
    "escalate_after_minutes": 60,
    "auto_reject_after_hours": 24
  }
}
```

The approval API supports four actions: get pending approvals, approve a task, reject a task, and request changes. Each action is available through the REST API, the MCP server, and Slack interactive messages. This means reviewers never need to context-switch to a separate dashboard — they approve where they already work.
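The four actions map naturally onto a thin client. The endpoint paths and payloads below are assumptions for illustration, not Dobby's documented REST API; the injected `transport` callable stands in for whatever HTTP layer you use.

```python
# Hedged sketch of the four approval actions as a thin client.
# Endpoint paths and payload shapes are ASSUMED, not Dobby's real API.
class ApprovalClient:
    def __init__(self, transport):
        # transport: callable(method, path, body) -> response
        self.transport = transport

    def pending(self):
        return self.transport("GET", "/approvals/pending", None)

    def approve(self, task_id: str):
        return self.transport("POST", f"/approvals/{task_id}/approve", {})

    def reject(self, task_id: str, reason: str):
        return self.transport("POST", f"/approvals/{task_id}/reject", {"reason": reason})

    def request_changes(self, task_id: str, comment: str):
        return self.transport("POST", f"/approvals/{task_id}/request-changes", {"comment": comment})

# A fake transport lets the sketch run without a server.
calls = []
client = ApprovalClient(lambda m, p, b: calls.append((m, p, b)) or {"ok": True})
client.approve("task-42")
client.reject("task-43", "touches production")
print(calls[0])  # ('POST', '/approvals/task-42/approve', {})
```

Injecting the transport is what makes the same client usable behind REST, MCP, or a Slack action handler: only the delivery mechanism changes, not the four verbs.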
The Kill-Switch
Sometimes approval gates are not enough. If an agent is behaving erratically, consuming unexpected resources, or a security incident is unfolding, you need the ability to stop everything immediately. A kill-switch blocks all agent activity for an organization — every LLM call, every MCP tool invocation, every task execution. It is the emergency brake that makes the whole system trustworthy. The kill-switch supports three scopes: block all traffic, block LLM calls only, or block new key creation only.
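The three scopes reduce to a simple check that every operation passes through before executing. The enum names and `is_blocked` function below are assumptions for illustration, not Dobby's API.

```python
# Illustrative sketch of the three kill-switch scopes described above.
# The enum values and operation strings are assumed names.
from enum import Enum

class KillSwitch(Enum):
    OFF = "off"
    BLOCK_ALL = "block_all"            # every LLM call, tool call, task
    BLOCK_LLM = "block_llm"            # LLM calls only
    BLOCK_NEW_KEYS = "block_new_keys"  # new key creation only

def is_blocked(switch: KillSwitch, operation: str) -> bool:
    """Gate every operation through the org-wide kill-switch."""
    if switch is KillSwitch.BLOCK_ALL:
        return True
    if switch is KillSwitch.BLOCK_LLM:
        return operation == "llm_call"
    if switch is KillSwitch.BLOCK_NEW_KEYS:
        return operation == "create_key"
    return False

print(is_blocked(KillSwitch.BLOCK_LLM, "llm_call"))   # True
print(is_blocked(KillSwitch.BLOCK_LLM, "mcp_tool"))   # False
print(is_blocked(KillSwitch.BLOCK_ALL, "mcp_tool"))   # True
```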
The kill-switch is not a feature you hope to use. It is the feature that lets you sleep at night knowing agents are running in production.
Building Trust Incrementally
The most successful teams start with strict approval requirements and relax them over time as they build confidence. Week one: every action requires approval. Week two: auto-approve routine actions like linting fixes and documentation updates. Month two: expand autonomous boundaries based on audit trail analysis. Month six: the agent handles 90 percent of tasks autonomously, and the team only reviews truly novel situations.
This incremental approach builds organizational trust in AI agents without ever exposing the business to uncontrolled risk. It also generates the data you need to make informed decisions. Instead of guessing which actions are safe to automate, you have months of audit trail proving it.
The Audit Trail
Every approval decision — who approved, when, what the agent proposed, what it actually did — is recorded in an immutable audit trail. This is not just for compliance. It is the data that lets you tune auto-approve conditions, identify patterns in rejections, and prove to stakeholders that AI agents are operating within agreed boundaries. Without the audit trail, human-in-the-loop is just a checkbox. With it, it becomes a learning system.
The audit trail also answers questions that come up during incident reviews. Why did this agent take that action? Who approved it? What context did the reviewer have? How long did the approval take? These are the questions that turn a post-mortem from finger-pointing into process improvement. And for regulated industries — finance, healthcare, government — the audit trail is not optional. It is the documentation that proves your AI agents operate within policy.
Ready to take control of your AI agents?
Start free with Dobby AI — connect, monitor, and govern agents from any framework.
Get Started Free