Disaster Recovery

RTO & RPO
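
RPO (Recovery Point Objective) is how much data loss is acceptable, measured as the time between the last recoverable point (e.g. the last backup) and the disaster. RTO (Recovery Time Objective) is how much downtime is acceptable, measured from the disaster until service is restored. A minimal sketch of that arithmetic follows; the timestamps and targets are made up purely for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_backup: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: everything written after the last backup is lost."""
    return disaster - last_backup


def achieved_rto(disaster: datetime, restored: datetime) -> timedelta:
    """Downtime: how long the service was unavailable."""
    return restored - disaster


# Hypothetical incident timeline (illustrative values only).
last_backup = datetime(2024, 1, 10, 2, 0)
disaster = datetime(2024, 1, 10, 9, 30)
restored = datetime(2024, 1, 10, 13, 30)

rpo = achieved_rpo(last_backup, disaster)  # 7 hours 30 minutes of lost data
rto = achieved_rto(disaster, restored)     # 4 hours of downtime

# Hypothetical business targets.
rpo_target = timedelta(hours=24)
rto_target = timedelta(hours=2)

rpo_met = rpo <= rpo_target
rto_met = rto <= rto_target
print("RPO met:", rpo_met)
print("RTO met:", rto_met)
```

In this made-up incident the RPO target is met but the RTO target is not, which would point toward a strategy with faster recovery than Backup & Restore.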


DR Strategies

  • Backup & Restore

    • High RPO & RTO but cheapest

Backup & Restore DR
  • Pilot Light

    • A small version of the application is always running in the cloud: only the critical core (e.g. the DB)

Pilot Light DR
  • Warm Standby

    • Full system is up and running with a minimum scale

    • Upon disaster, scale up for production load

Warm standby DR
  • Hot Site / Multi Site

    • Lowest RTO but expensive

    • Full production scale is running on AWS and on premise

On premise & on AWS DR

Multi Site DR
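
The Warm Standby step above ("upon disaster, scale up for production load") could be sketched as follows, assuming the standby fleet is an EC2 Auto Scaling group. The group name `dr-web-asg`, the capacity numbers, and the rule of setting the maximum to twice the production capacity are all hypothetical choices for illustration. The function only builds the request parameters, so it runs without AWS credentials.

```python
def scale_up_request(group_name: str, production_capacity: int) -> dict:
    """Build keyword arguments for the Auto Scaling UpdateAutoScalingGroup call.

    Raises the standby group from its minimal size to full production size.
    """
    if production_capacity < 1:
        raise ValueError("production_capacity must be at least 1")
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": production_capacity,
        # Assumption: leave headroom of 2x for load spikes after failover.
        "MaxSize": production_capacity * 2,
        "DesiredCapacity": production_capacity,
    }


# Hypothetical group name and capacity.
request = scale_up_request("dr-web-asg", 10)
print(request)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("autoscaling").update_auto_scaling_group(**request)
```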

Disaster Recovery Tips

  • Backup

    • EBS snapshot, RDS automated backups / snapshots, etc.

    • Regular pushes to S3 / S3 IA / Glacier with lifecycle policy, Cross Region Replication

    • From on-premise: Snowball / Storage Gateway

  • High Availability

    • Use Route 53 to fail DNS over from Region to Region

    • RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3

    • Site-to-Site VPN as a backup connection in case Direct Connect fails

  • Replication

    • RDS Replication (cross region), Amazon Aurora + Global Database

    • Database replication from on-premise to RDS

    • Storage Gateway

  • Automation

    • CloudFormation / Elastic Beanstalk to recreate a whole new environment

    • Recover / reboot EC2 instances with CloudWatch alarm actions when an alarm triggers

    • AWS Lambda for customized automations

  • Chaos

    • Netflix has a "Simian Army" that randomly terminates EC2 instances
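
The Backup tip above (pushes to S3 / S3 IA / Glacier with a lifecycle policy) could be sketched as the following lifecycle configuration. The bucket name, the `backups/` prefix, the rule ID, and the 30 / 90 day thresholds are hypothetical. The dictionary follows the shape expected by the S3 `PutBucketLifecycleConfiguration` API, and the code only builds it, so it runs without AWS credentials.

```python
def backup_lifecycle(prefix: str, ia_after_days: int, glacier_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that tiers backups down over time."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "tier-down-backups",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Move to Standard-IA first, then to Glacier.
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


lifecycle = backup_lifecycle("backups/", ia_after_days=30, glacier_after_days=90)
print(lifecycle)

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-dr-backups",  # hypothetical bucket name
#       LifecycleConfiguration=lifecycle,
#   )
```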
