Tabletop Scenario Template: Cloud Outage

SOC 2 | Incident Response

This scenario simulates a major outage affecting your primary cloud provider (e.g., AWS, Azure, GCP) and is intended for SaaS companies with remote teams and cloud-hosted infrastructure.

Exercise Setup

| Field | Value |
| --- | --- |
| Scenario Name | Cloud Infrastructure Outage |
| Exercise Type | Discussion-based Tabletop |
| Facilitator | Name/Role of Facilitator |
| Date | MM/DD/YYYY |
| Duration | 60–90 minutes |
| Participants | Engineering, Security, Customer Success, Product, Executive |
| Objective | Test the response to a sustained cloud infrastructure failure and assess communications, data resilience, and decision-making under pressure. |

Scenario Briefing

At 8:30 AM EST, your monitoring system detects elevated latency and timeout errors for your primary SaaS application hosted in AWS (us-east-1).

By 8:45 AM, your status page shows a partial outage. Users are reporting login failures and data failing to load.


Timeline of Escalation

| Time | New Information |
| --- | --- |
| T+0 | Customers report your app is down. Monitoring shows service errors. |
| T+15 min | AWS status page confirms “connectivity issues” in us-east-1. |
| T+30 min | Your backups are stored in the same region. Users escalate via social media. |
| T+45 min | A key enterprise client emails saying their team is blocked. |
| T+60 min | Internal tools (Slack, CI/CD) also affected. Engineers are asking for direction. |
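
For facilitators who prefer to script the exercise, the escalation timeline above can be held as a small data structure and released step by step. The following is a minimal sketch, not part of the template itself: the names `Inject`, `INJECTS`, and `released_by` are hypothetical, and the inject text is copied from the timeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inject:
    """One piece of new information released during the exercise."""
    offset_min: int  # minutes after T+0
    update: str      # what participants are told at that point


# Injects copied from the escalation timeline (names here are illustrative).
INJECTS = [
    Inject(0, "Customers report your app is down. Monitoring shows service errors."),
    Inject(15, "AWS status page confirms “connectivity issues” in us-east-1."),
    Inject(30, "Your backups are stored in the same region. Users escalate via social media."),
    Inject(45, "A key enterprise client emails saying their team is blocked."),
    Inject(60, "Internal tools (Slack, CI/CD) also affected. Engineers are asking for direction."),
]


def released_by(elapsed_min: int, injects: list[Inject] = INJECTS) -> list[Inject]:
    """Return every inject participants should have seen after elapsed_min minutes."""
    return [inject for inject in injects if inject.offset_min <= elapsed_min]


if __name__ == "__main__":
    # Show what the room knows half an hour into the scenario.
    for inject in released_by(30):
        print(f"T+{inject.offset_min} min: {inject.update}")
```

Keeping the injects in one list makes it easy to add, reorder, or re-time them for a later run of the exercise without rewriting the briefing.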


Discussion Prompts

| Category | Questions |
| --- | --- |
| Detection | How do we first become aware of the issue? Who confirms it? |
| Communication | When and how do we notify users? Who approves the message? |
| Internal Coordination | What tools do we use to coordinate? Who leads incident command? |
| Business Impact | What services are down? What SLAs are being violated? |
| Recovery Plan | Do we have cross-region failover? Can we restore from backup? |
| Escalation | Who is authorized to declare this a major incident? Do we contact AWS? |
| Postmortem Readiness | Are logs and actions being recorded? What needs to be reviewed later? |
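
To ground the Business Impact question about SLAs, it can help to work out the downtime budget before the exercise. The sketch below is an illustration under stated assumptions: the 99.9% and 99.5% uptime targets and the 30-day measurement period are examples, not values from this template, and the function names are hypothetical. Substitute the terms from your own customer contracts.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, for an uptime SLA over the given period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def sla_breached(outage_minutes: float, sla_percent: float, period_days: int = 30) -> bool:
    """True if a single outage of outage_minutes exceeds the whole period's budget."""
    return outage_minutes > allowed_downtime_minutes(sla_percent, period_days)


if __name__ == "__main__":
    # Assumed targets for illustration only; 60 minutes matches T+60 in the scenario.
    for target in (99.9, 99.5):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows {budget:.1f} min; "
              f"60-min outage breaches it: {sla_breached(60, target)}")
```

Under these assumptions, a 99.9% monthly target leaves roughly 43 minutes of budget, so the scenario's outage would already exceed it by T+60, while a 99.5% target (216 minutes) would not yet be breached.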

Key Documents to Reference

  • Incident Response Plan
  • Disaster Recovery Plan
  • Customer Communication Templates
  • Status Page Access
  • Engineering On-call Roster
  • Escalation Contacts Sheet



After-Action Review (Facilitator Use)

| Question | Notes |
| --- | --- |
| What went well? | E.g., “Quick detection via monitoring.” |
| What needs improvement? | E.g., “Confusion over who owns customer messaging.” |
| What were the biggest delays or gaps? | E.g., “Backups were not regionally redundant.” |
| Which documents or procedures need updating? | E.g., “IR Plan did not include Slack fallback channels.” |
| Who owns each follow-up action? | List action items and owners. |
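
One gap this scenario is designed to surface is backups stored in the same region as production. As a possible follow-up action, a team could keep a simple inventory of backup locations and check it against the primary region. This is a minimal sketch with hypothetical names (`BackupLocation`, `backups_outside`, `is_regionally_redundant`) and a made-up inventory; it does not query any cloud provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupLocation:
    """A place where backups are stored (inventory entries are hypothetical)."""
    name: str
    region: str


def backups_outside(primary_region: str, backups: list[BackupLocation]) -> list[BackupLocation]:
    """Return the backups that would survive a loss of the primary region."""
    return [b for b in backups if b.region != primary_region]


def is_regionally_redundant(primary_region: str, backups: list[BackupLocation]) -> bool:
    """True if at least one backup lives outside the primary region."""
    return len(backups_outside(primary_region, backups)) > 0


if __name__ == "__main__":
    # Example inventory mirroring the scenario: everything sits in us-east-1.
    inventory = [
        BackupLocation("db-snapshots", "us-east-1"),
        BackupLocation("object-store-archive", "us-east-1"),
    ]
    if not is_regionally_redundant("us-east-1", inventory):
        print("Gap: no backups outside us-east-1")
```

The inventory would need to be maintained by hand or exported from your own tooling; the check is only as accurate as that list.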