When an AI agent is compromised or behaves unexpectedly, traditional incident response playbooks may not apply. AI systems require specialized detection, containment, and recovery procedures. This playbook provides a framework for handling AI-related security incidents.

[Figure: AI Incident Response Phases]

AI Incident Categories

Not all AI incidents are the same. Understanding the category helps determine response priorities:

| Category | Description | Severity | Response Time |
|---|---|---|---|
| Active Exploitation | Attacker actively using compromised agent | Critical | Immediate |
| Data Breach | Sensitive data exposed through agent | Critical | < 1 hour |
| Safety Bypass | Agent producing harmful outputs | High | < 4 hours |
| Anomalous Behavior | Unexpected but not clearly malicious | Medium | < 24 hours |
| Performance Degradation | Quality or reliability issues | Low | < 72 hours |
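
The category table can be encoded as data so that tooling can compute a response deadline from the detection time. The following is a minimal sketch: the class, field, and function names are illustrative assumptions, and "Immediate" is modeled as a zero-length window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IncidentCategory:
    name: str
    severity: str
    response_within: timedelta  # timedelta(0) stands in for "Immediate"


# Values copied from the category table; the structure itself is an assumption.
CATEGORIES = {
    c.name: c
    for c in (
        IncidentCategory("Active Exploitation", "Critical", timedelta(0)),
        IncidentCategory("Data Breach", "Critical", timedelta(hours=1)),
        IncidentCategory("Safety Bypass", "High", timedelta(hours=4)),
        IncidentCategory("Anomalous Behavior", "Medium", timedelta(hours=24)),
        IncidentCategory("Performance Degradation", "Low", timedelta(hours=72)),
    )
}


def response_deadline(category_name: str, detected_at: datetime) -> datetime:
    """Return the latest time by which the response should have started."""
    return detected_at + CATEGORIES[category_name].response_within
```

Keeping the table in one place like this means alerting and reporting code can share the same severity and timing values instead of hard-coding them separately.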

Detection Signals

High-Confidence Indicators

These signals warrant immediate investigation:

Definite Compromise:

  • Agent accessing resources outside normal scope
  • Unusual tool call patterns (frequency, parameters)
  • Data exfiltration attempts detected
  • Known attack patterns in logs

Probable Compromise:

  • Multiple failed attempts followed by success
  • Sudden behavior changes
  • Requests for credentials or sensitive data
  • Attempts to escalate privileges

Low-Confidence Indicators

These may indicate issues but require context:

Possible Issues:

  • Increased error rates
  • Latency spikes
  • User complaints about responses
  • Unusual topic patterns
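
One way to act on these tiers is to map observed signals to a triage decision. This sketch assumes a monitoring pipeline that already emits signal labels; the label strings and decision names are hypothetical shorthand for the indicators listed above, not part of any real product.

```python
# Hypothetical labels for the high-confidence indicators (definite and probable).
HIGH_CONFIDENCE = frozenset({
    "out_of_scope_access",
    "unusual_tool_calls",
    "exfiltration_attempt",
    "known_attack_pattern",
    "failed_then_success",
    "sudden_behavior_change",
    "credential_request",
    "privilege_escalation",
})

# Hypothetical labels for the low-confidence indicators.
LOW_CONFIDENCE = frozenset({
    "error_rate_increase",
    "latency_spike",
    "user_complaint",
    "unusual_topics",
})


def triage_signals(observed):
    """Map a collection of observed signal labels to a triage decision."""
    observed = set(observed)
    unknown = observed - HIGH_CONFIDENCE - LOW_CONFIDENCE
    if unknown:
        raise ValueError(f"unrecognized signals: {sorted(unknown)}")
    if observed & HIGH_CONFIDENCE:
        return "investigate_immediately"
    if observed & LOW_CONFIDENCE:
        return "review_with_context"
    return "no_action"
```

Any single high-confidence signal wins over any number of low-confidence ones here, which mirrors the "warrant immediate investigation" wording; a weighted score would be a reasonable alternative.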

Response Phases

Phase 1: Detection & Triage

Immediate Actions (0-15 minutes):

  1. Verify the Alert

    • Confirm the incident is real (not a false positive)
    • Review raw logs and evidence
    • Identify affected systems/users
  2. Initial Classification

    • Determine incident category
    • Assess potential impact scope
    • Identify affected data types
  3. Notify Stakeholders

    • Alert incident response team
    • Notify management if high severity
    • Prepare for potential escalation

Phase 2: Containment

Goal: Stop the bleeding without destroying evidence.

Containment Options by Severity:

| Severity | Primary Action | Secondary Action |
|---|---|---|
| Critical | Kill switch - disable agent entirely | Isolate affected systems |
| High | Restrict to read-only operations | Enable enhanced logging |
| Medium | Rate limit operations | Increase monitoring |
| Low | Flag for review | Document behavior |
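
The severity-to-action mapping above could be wired into an agent's control layer roughly as follows. This is a sketch under assumptions: `AgentControls` and its fields are hypothetical, the rate limit value is an arbitrary placeholder, and actions that happen outside the agent process (such as isolating systems) are only recorded as follow-ups for a human or another system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentControls:
    """Hypothetical runtime switches that an agent wrapper would consult."""
    enabled: bool = True
    read_only: bool = False
    rate_limit_per_min: Optional[int] = None
    enhanced_logging: bool = False
    follow_ups: List[str] = field(default_factory=list)


def contain(controls: AgentControls, severity: str) -> AgentControls:
    """Apply the primary and secondary containment actions for a severity."""
    if severity == "Critical":
        controls.enabled = False  # kill switch
        controls.follow_ups.append("isolate affected systems")
    elif severity == "High":
        controls.read_only = True
        controls.enhanced_logging = True
    elif severity == "Medium":
        controls.rate_limit_per_min = 10  # assumed placeholder value
        controls.follow_ups.append("increase monitoring")
    elif severity == "Low":
        controls.follow_ups.extend(["flag for review", "document behavior"])
    else:
        raise ValueError(f"unknown severity: {severity!r}")
    return controls
```

Note that none of these actions delete logs or state, in keeping with the goal of containing without destroying evidence.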

Containment Checklist:

  • Affected agent identified
  • Containment action selected
  • Containment implemented
  • Containment effectiveness verified
  • Evidence preservation initiated
  • Timeline documentation started

Phase 3: Investigation

Evidence Collection:

Gather and preserve:

  • Complete conversation logs
  • Tool invocation history
  • Input/output pairs
  • System metrics and telemetry
  • User reports and complaints
  • Configuration snapshots
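
To show later that collected evidence has not been altered, a common approach is to record a cryptographic hash of each artifact at collection time. The sketch below uses SHA-256 from Python's standard library; the manifest layout and function names are assumptions, and artifacts are passed as in-memory bytes for brevity.

```python
import hashlib
from datetime import datetime, timezone


def build_manifest(artifacts):
    """Record a SHA-256 digest per artifact; `artifacts` maps name -> raw bytes."""
    return {
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "sha256": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(artifacts.items())
        },
    }


def changed_artifacts(manifest, artifacts):
    """Return names that are missing or whose content no longer matches."""
    changed = []
    for name, expected in manifest["sha256"].items():
        data = artifacts.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            changed.append(name)
    return changed
```

Storing the manifest separately from the evidence itself (for example, in the incident ticket) makes tampering with both at once harder.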

Analysis Framework:

  1. Timeline Construction

    • When did anomalous behavior start?
    • What was the first indicator?
    • What actions occurred in sequence?
  2. Attack Vector Identification

    • How did the attacker gain access?
    • What prompt or input triggered the issue?
    • Was this a direct or indirect prompt injection?
  3. Impact Assessment

    • What data was exposed?
    • What actions were taken?
    • Who was affected?
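
Timeline construction usually means merging events from several sources and ordering them by time. A minimal sketch, assuming each event is a dict with an ISO 8601 `ts` string with an explicit UTC offset and an `anomalous` flag set by an analyst or detector (both field names are hypothetical):

```python
from datetime import datetime


def build_timeline(events):
    """Return events sorted chronologically by their ISO 8601 'ts' field."""
    return sorted(events, key=lambda e: datetime.fromisoformat(e["ts"]))


def first_indicator(timeline):
    """Return the earliest event flagged anomalous, or None if there is none."""
    return next((e for e in timeline if e.get("anomalous")), None)
```

For example, given conversation-log and tool-call events in arbitrary order, `first_indicator(build_timeline(events))` answers "What was the first indicator?" directly.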

Phase 4: Eradication

Remove the Threat:

  1. Identify Root Cause

    • Document the specific vulnerability
    • Understand how it was exploited
    • Identify any persistence mechanisms
  2. Remediate

    • Patch the vulnerability
    • Update guardrails and filters
    • Strengthen relevant controls
  3. Verify Fix

    • Test with known attack patterns
    • Confirm vulnerability is addressed
    • Review related areas for similar issues
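
"Test with known attack patterns" can be automated as a small regression check that replays recorded attack inputs against the patched guardrail. The sketch below is illustrative only: `toy_guardrail` is a deliberately naive substring filter invented for this example, standing in for whatever filter the agent actually uses.

```python
def still_exploitable(is_allowed, attack_inputs):
    """Return the known attack inputs that the guardrail still lets through."""
    return [text for text in attack_inputs if is_allowed(text)]


# Toy guardrail for illustration; real filters need far more than substring checks.
BLOCKED_PHRASES = ("ignore previous instructions", "reveal your system prompt")


def toy_guardrail(text):
    """Return True if the input is allowed, False if it is blocked."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)
```

An empty result from `still_exploitable` is the signal that the replayed patterns are now blocked; any returned inputs point to variants the fix does not yet cover.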

Phase 5: Recovery

Restore Normal Operations:

Staged Recovery:

  1. Enhanced monitoring mode
  2. Limited operation (reduced permissions)
  3. Gradual permission restoration
  4. Full operation with monitoring
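
The staged recovery sequence can be modeled as a simple ordered state machine. In this sketch the stage identifiers are derived from the list above, and the choice to step back one stage when health checks fail is an assumption; some teams may prefer to hold at the current stage or return to containment instead.

```python
RECOVERY_STAGES = (
    "enhanced_monitoring",
    "limited_operation",
    "gradual_permission_restoration",
    "full_operation_with_monitoring",
)


def next_stage(current, checks_passed):
    """Advance one stage when checks pass; otherwise step back one stage."""
    index = RECOVERY_STAGES.index(current)  # raises ValueError if unknown
    if checks_passed:
        return RECOVERY_STAGES[min(index + 1, len(RECOVERY_STAGES) - 1)]
    return RECOVERY_STAGES[max(index - 1, 0)]
```

Because stages can only move one step at a time, an agent cannot jump from enhanced monitoring straight to full operation without passing the intermediate checks.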

Validation Checklist:

  • Vulnerability confirmed fixed
  • Agent behavior tested
  • Monitoring in place
  • Rollback plan ready
  • Stakeholders notified

Phase 6: Post-Incident

Learn and Improve:

Post-Incident Review:

  • What happened?
  • How was it detected?
  • What worked well in response?
  • What could be improved?
  • What systemic changes are needed?

Documentation:

  • Incident report
  • Timeline of events
  • Actions taken
  • Lessons learned
  • Recommendations

Communication Templates

Internal Alert (Critical)

CRITICAL AI INCIDENT - [Agent Name]

Time Detected: [Timestamp]
Category: [Active Exploitation / Data Breach / etc.]
Status: [Investigating / Contained / Resolved]

Summary: [Brief description]

Immediate Actions Taken:
- [Action 1]
- [Action 2]

Next Update: [Time]
Incident Lead: [Name]
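
If alerts are sent by tooling rather than typed by hand, the template above can be filled in programmatically. A minimal sketch, assuming a hypothetical `render_critical_alert` helper whose parameters mirror the bracketed placeholders:

```python
def render_critical_alert(agent_name, detected_at, category, status,
                          summary, actions, next_update, incident_lead):
    """Fill in the internal critical-alert template as plain text."""
    action_lines = "\n".join(f"- {action}" for action in actions)
    return (
        f"CRITICAL AI INCIDENT - {agent_name}\n"
        "\n"
        f"Time Detected: {detected_at}\n"
        f"Category: {category}\n"
        f"Status: {status}\n"
        "\n"
        f"Summary: {summary}\n"
        "\n"
        "Immediate Actions Taken:\n"
        f"{action_lines}\n"
        "\n"
        f"Next Update: {next_update}\n"
        f"Incident Lead: {incident_lead}\n"
    )
```

Generating the message from structured fields keeps alerts consistent across incidents and makes it easy to reuse the same data in the incident timeline.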

Stakeholder Update

AI Security Incident Update

Incident ID: [Number]
Status: [Status]
Impact: [Description of impact]

What Happened:
[Brief, non-technical summary]

Current Status:
[What's happening now]

Next Steps:
[What will happen next]

Questions: Contact [Name] at [Contact]

Preparation Checklist

Before incidents happen, ensure you have:

Technical Readiness:

  • Kill switch mechanism tested
  • Logging and monitoring in place
  • Evidence collection procedures documented
  • Backup and recovery tested

Process Readiness:

  • Incident response playbook documented
  • On-call rotation established
  • Escalation paths defined
  • Communication templates ready

Team Readiness:

  • Team trained on AI-specific incidents
  • Tabletop exercises conducted
  • Roles and responsibilities clear
  • Contact information current

Key Takeaways

  1. Prepare before incidents - Have playbooks, tools, and training ready
  2. Detect early - Invest in monitoring and alerting
  3. Contain quickly - Speed matters, but preserve evidence
  4. Investigate thoroughly - Understand root cause before declaring victory
  5. Learn continuously - Every incident is an opportunity to improve

AI incident response is different from traditional IR. Prepare accordingly.

