AI Agent Kill-Switch: Emergency Controls That Work in 5 Seconds
Learn how to stop all AI agent activity instantly with a kill-switch. Scoped controls, fast propagation, and recovery procedures.
What you will learn
- Understand when and why you need a kill-switch for AI agents
- Activate scoped kill-switches (all traffic, LLM only, new keys only)
- Know the propagation timeline — from activation to full enforcement
- Follow the recovery procedure to safely resume operations
When You Need a Kill-Switch
Kill-switches are emergency controls. You do not use them daily — but when you need them, you need them fast. Scenarios include: an agent entering an infinite loop burning tokens, a compromised API key being used maliciously, a production incident where agent actions would make things worse, or a data breach where you need to stop all data access immediately.
The kill-switch is insurance you hope to never use. But the one time you need it, nothing else will do.
Without a kill-switch: an agent goes rogue at 2 AM. You scramble to find API keys across 3 providers, revoke them manually, and restart services. It takes 45 minutes. Damage: $3,000 in wasted tokens and 200 incorrect customer emails sent.
With a kill-switch: an alert fires. You activate the kill-switch from your phone. All agent traffic stops in 5 seconds. Damage contained. You investigate in the morning, fix the root cause, and deactivate the switch.
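To put the two scenarios side by side, here is a rough back-of-the-envelope sketch. It assumes the $3,000 accrued at a constant rate over the full 45 minutes, which is a simplification, and the function name `spend_during` is made up for illustration.

```python
def spend_during(damage_usd, incident_minutes, window_seconds):
    """Estimate spend in a shorter window, assuming a constant burn rate."""
    per_second = damage_usd / (incident_minutes * 60)
    return per_second * window_seconds


# Figures from the scenario above: $3,000 over 45 minutes.
per_minute = spend_during(3000, 45, 60)
five_seconds = spend_during(3000, 45, 5)

print(f"Average burn: ${per_minute:.2f}/min")       # Average burn: $66.67/min
print(f"Spend in 5 seconds: ${five_seconds:.2f}")   # Spend in 5 seconds: $5.56
```

In-flight requests still complete after activation, so real spend during the stop window would likely be somewhat higher than this estimate.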
Three Kill-Switch Scopes
- All Traffic — blocks every request through the Gateway. Use for critical incidents where you need total silence.
- LLM Only — blocks LLM calls but allows MCP tool calls and API access. Use when the LLM provider is the problem (bad model outputs, prompt injection).
- New Keys Only — blocks new API key creation but allows existing keys to work. Use when you suspect a key compromise and want to prevent escalation.
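The three scopes can be read as a small decision table. The sketch below is one possible way to express it, not Dobby's implementation: only the `"all"` scope value appears in the API example in this article, so `"llm_only"`, `"new_keys_only"`, the request kinds, and `is_blocked` are hypothetical names. It also assumes that key creation counts as Gateway traffic under the All Traffic scope.

```python
from enum import Enum


class Scope(Enum):
    ALL = "all"                      # value taken from the API example in this article
    LLM_ONLY = "llm_only"            # hypothetical identifier
    NEW_KEYS_ONLY = "new_keys_only"  # hypothetical identifier


class RequestKind(Enum):
    LLM_CALL = "llm_call"
    MCP_TOOL_CALL = "mcp_tool_call"
    API_ACCESS = "api_access"
    KEY_CREATION = "key_creation"


def is_blocked(active_scope, kind):
    """Return True if a request of this kind is rejected under the active scope.

    active_scope is None when no kill-switch is active.
    """
    if active_scope is None:
        return False
    if active_scope is Scope.ALL:
        # Assumption: every request kind, including key creation, is Gateway traffic.
        return True
    if active_scope is Scope.LLM_ONLY:
        return kind is RequestKind.LLM_CALL
    if active_scope is Scope.NEW_KEYS_ONLY:
        return kind is RequestKind.KEY_CREATION
    raise ValueError(f"unknown scope: {active_scope!r}")


print(is_blocked(Scope.LLM_ONLY, RequestKind.MCP_TOOL_CALL))  # False
print(is_blocked(Scope.LLM_ONLY, RequestKind.LLM_CALL))       # True
```

Picking the narrowest scope that contains the incident keeps unaffected traffic running while you investigate.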
How Fast It Propagates
The kill-switch state is cached in Redis with a 5-second TTL, and every Gateway request checks that state. Once you activate the switch, all new requests are blocked within 5 seconds. In-flight requests complete normally; they are not interrupted mid-stream.
In Dobby, the kill-switch is available at the organization level. Activate it from the dashboard, the API, or the MCP tools. It affects all tenants under the organization, within the same 5-second propagation window.
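The 5-second bound follows from the cache TTL: a cached "inactive" value can be served until it expires, and no longer. The sketch below illustrates that with a generic read-through cache. It is one plausible reading of the design rather than Dobby's actual code; `CachedKillSwitch` and `fetch_state` are hypothetical names, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class CachedKillSwitch:
    """Serves the kill-switch state from a local cache with a fixed TTL."""

    def __init__(self, fetch_state, ttl_seconds=5.0):
        self._fetch_state = fetch_state   # callable returning True when the switch is active
        self._ttl = ttl_seconds
        self._cached = False
        self._expires_at = float("-inf")  # forces a fetch on first use

    def is_active(self, now):
        # Refresh from the source of truth only after the cached value has expired.
        if now >= self._expires_at:
            self._cached = self._fetch_state()
            self._expires_at = now + self._ttl
        return self._cached


source_of_truth = {"active": False}
switch = CachedKillSwitch(lambda: source_of_truth["active"])

print(switch.is_active(now=0.0))   # False (fetched, cached until t=5.0)
source_of_truth["active"] = True   # operator activates the switch just after t=0
print(switch.is_active(now=4.9))   # False (stale cached value still served)
print(switch.is_active(now=5.0))   # True  (cache expired, fresh state fetched)
```

The worst case is an activation that lands just after a refresh, which is why enforcement is described as "within 5 seconds" rather than immediate.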
Activating via API
# Activate kill-switch (all traffic)
curl -X POST "https://dobby-ai.com/api/v1/organizations/{org_id}/gateway/kill-switch" \
  -H "Authorization: Bearer sk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "scope": "all",
    "reason": "Agent loop detected - burning tokens at $50/min"
  }'
# Response: { "active": true, "scope": "all", "activated_at": "..." }
# Check status
curl "https://dobby-ai.com/api/v1/organizations/{org_id}/gateway/kill-switch" \
  -H "Authorization: Bearer sk_live_your_key"
# Deactivate
curl -X DELETE "https://dobby-ai.com/api/v1/organizations/{org_id}/gateway/kill-switch" \
  -H "Authorization: Bearer sk_live_your_key"
Recovery Procedure
1. Investigate the root cause. Check the audit trail for the time period leading up to the incident. Identify which agent and which requests caused the problem.
2. Fix the root cause. Update the agent configuration, revoke compromised keys, adjust policies, or disable the specific agent.
3. Deactivate the kill-switch. Traffic resumes within 5 seconds. Monitor the dashboard closely for the first 10 minutes.
4. Post-mortem. Document what happened, what the impact was, and which preventive measures will be added (budget alerts, stricter policies, additional approval gates).
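For the investigation step, a first pass is often just counting which agent generated the most activity shortly before the incident. The sketch below assumes you have exported audit events as dictionaries with `agent_id` and `timestamp` fields. Those field names, the 30-minute lookback, and the sample events are illustrative assumptions, not Dobby's audit schema.

```python
from collections import Counter
from datetime import datetime, timedelta


def requests_per_agent(events, incident_at, lookback=timedelta(minutes=30)):
    """Count audit events per agent in [incident_at - lookback, incident_at].

    Returns (agent_id, count) pairs, busiest agent first.
    """
    start = incident_at - lookback
    counts = Counter(
        event["agent_id"]
        for event in events
        if start <= event["timestamp"] <= incident_at
    )
    return counts.most_common()


# Made-up sample events for illustration only.
incident_at = datetime(2025, 1, 1, 2, 0)
events = [
    {"agent_id": "email-agent", "timestamp": datetime(2025, 1, 1, 1, 50)},
    {"agent_id": "email-agent", "timestamp": datetime(2025, 1, 1, 1, 55)},
    {"agent_id": "report-agent", "timestamp": datetime(2025, 1, 1, 1, 40)},
    {"agent_id": "report-agent", "timestamp": datetime(2025, 1, 1, 0, 10)},  # outside the window
]

print(requests_per_agent(events, incident_at))
# [('email-agent', 2), ('report-agent', 1)]
```

A request count only narrows the search; confirming the root cause still means reading the individual requests for the top agents.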
Prevention Is Better Than Emergency
The kill-switch is a last resort. Good governance reduces the chance you will ever need it. Set token budgets with alerts at 80%. Configure approval gates for high-risk actions. Restrict models to prevent expensive mistakes. Monitor the Live feed for anomalies.
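As an illustration of the 80% budget alert, the check itself is a simple threshold comparison. The sketch below is a generic version under stated assumptions: `budget_status`, its parameters, and the three status labels are hypothetical and do not describe how Dobby evaluates budgets.

```python
def budget_status(tokens_used, token_budget, alert_ratio=0.8):
    """Classify token usage against a budget.

    Returns "ok" below the alert threshold, "alert" from the threshold up to
    the budget, and "exceeded" once the budget is fully used.
    """
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    ratio = tokens_used / token_budget
    if ratio >= 1.0:
        return "exceeded"
    if ratio >= alert_ratio:
        return "alert"
    return "ok"


print(budget_status(500_000, 1_000_000))    # ok
print(budget_status(800_000, 1_000_000))    # alert
print(budget_status(1_200_000, 1_000_000))  # exceeded
```

An alert at 80% leaves room to intervene before the budget runs out, which is the point of catching problems before they need a kill-switch.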
When the kill-switch is activated, Dobby sends an immediate Slack alert to #dobby-alerts with the reason, scope, and who activated it. When it is deactivated, a recovery alert is sent. Everything is logged in the audit trail.