When an AI agent is compromised or behaves unexpectedly, traditional incident response playbooks may not apply. AI systems require specialized detection, containment, and recovery procedures. This playbook provides a framework for handling AI-related security incidents.
AI Incident Categories
Not all AI incidents are the same. Understanding the category helps determine response priorities:
| Category | Description | Severity | Response Time |
|---|---|---|---|
| Active Exploitation | Attacker actively using compromised agent | Critical | Immediate |
| Data Breach | Sensitive data exposed through agent | Critical | < 1 hour |
| Safety Bypass | Agent producing harmful outputs | High | < 4 hours |
| Anomalous Behavior | Unexpected but not clearly malicious | Medium | < 24 hours |
| Performance Degradation | Quality or reliability issues | Low | < 72 hours |
Detection Signals
High-Confidence Indicators
These signals warrant immediate investigation:
Definite Compromise:
- Agent accessing resources outside normal scope
- Unusual tool call patterns (frequency, parameters)
- Data exfiltration attempts detected
- Known attack patterns in logs
Probable Compromise:
- Multiple failed attempts followed by success
- Sudden behavior changes
- Requests for credentials or sensitive data
- Attempts to escalate privileges
Low-Confidence Indicators
These may indicate issues but require context:
Possible Issues:
- Increased error rates
- Latency spikes
- User complaints about responses
- Unusual topic patterns
Response Phases
Phase 1: Detection & Triage
Immediate Actions (0-15 minutes):
-
Verify the Alert
- Confirm the incident is real (not false positive)
- Review raw logs and evidence
- Identify affected systems/users
-
Initial Classification
- Determine incident category
- Assess potential impact scope
- Identify affected data types
-
Notify Stakeholders
- Alert incident response team
- Notify management if high severity
- Prepare for potential escalation
Phase 2: Containment
Goal: Stop the bleeding without destroying evidence.
Containment Options by Severity:
| Severity | Primary Action | Secondary Action |
|---|---|---|
| Critical | Kill switch - disable agent entirely | Isolate affected systems |
| High | Restrict to read-only operations | Enable enhanced logging |
| Medium | Rate limit operations | Increase monitoring |
| Low | Flag for review | Document behavior |
Containment Checklist:
- Affected agent identified
- Containment action selected
- Containment implemented
- Verification that containment is effective
- Evidence preservation initiated
- Timeline documentation started
Phase 3: Investigation
Evidence Collection:
Gather and preserve:
- Complete conversation logs
- Tool invocation history
- Input/output pairs
- System metrics and telemetry
- User reports and complaints
- Configuration snapshots
Analysis Framework:
-
Timeline Construction
- When did anomalous behavior start?
- What was the first indicator?
- What actions occurred in sequence?
-
Attack Vector Identification
- How did the attacker gain access?
- What prompt or input triggered the issue?
- Was this direct or indirect injection?
-
Impact Assessment
- What data was exposed?
- What actions were taken?
- Who was affected?
Phase 4: Eradication
Remove the Threat:
-
Identify Root Cause
- Document the specific vulnerability
- Understand how it was exploited
- Identify any persistence mechanisms
-
Remediate
- Patch the vulnerability
- Update guardrails and filters
- Strengthen relevant controls
-
Verify Fix
- Test with known attack patterns
- Confirm vulnerability is addressed
- Review related areas for similar issues
Phase 5: Recovery
Restore Normal Operations:
Staged Recovery:
- Enhanced monitoring mode
- Limited operation (reduced permissions)
- Gradual permission restoration
- Full operation with monitoring
Validation Checklist:
- Vulnerability confirmed fixed
- Agent behavior tested
- Monitoring in place
- Rollback plan ready
- Stakeholders notified
Phase 6: Post-Incident
Learn and Improve:
Post-Incident Review:
- What happened?
- How was it detected?
- What worked well in response?
- What could be improved?
- What systemic changes are needed?
Documentation:
- Incident report
- Timeline of events
- Actions taken
- Lessons learned
- Recommendations
Communication Templates
Internal Alert (Critical)
CRITICAL AI INCIDENT - [Agent Name]
Time Detected: [Timestamp]
Category: [Active Exploitation / Data Breach / etc.]
Status: [Investigating / Contained / Resolved]
Summary: [Brief description]
Immediate Actions Taken:
- [Action 1]
- [Action 2]
Next Update: [Time]
Incident Lead: [Name]
Stakeholder Update
AI Security Incident Update
Incident ID: [Number]
Status: [Status]
Impact: [Description of impact]
What Happened:
[Brief, non-technical summary]
Current Status:
[What's happening now]
Next Steps:
[What will happen next]
Questions: Contact [Name] at [Contact]
Preparation Checklist
Before incidents happen, ensure you have:
Technical Readiness:
- Kill switch mechanism tested
- Logging and monitoring in place
- Evidence collection procedures
- Backup and recovery tested
Process Readiness:
- Incident response playbook documented
- On-call rotation established
- Escalation paths defined
- Communication templates ready
Team Readiness:
- Team trained on AI-specific incidents
- Tabletop exercises conducted
- Roles and responsibilities clear
- Contact information current
Key Takeaways
- Prepare before incidents - Have playbooks, tools, and training ready
- Detect early - Invest in monitoring and alerting
- Contain quickly - Speed matters, but preserve evidence
- Investigate thoroughly - Understand root cause before declaring victory
- Learn continuously - Every incident is an opportunity to improve
AI incident response is different from traditional IR. Prepare accordingly.
Need help building your AI incident response capability? Schedule a demo to see how Saf3AI can help.