Disaster Recovery
RTO & RPO
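Both objectives are time intervals measured around the moment of the disaster. RPO (Recovery Point Objective) bounds how far back the last recoverable copy of the data may be, and RTO (Recovery Time Objective) bounds how long the service may stay down. A minimal sketch of the arithmetic, using made-up timestamps and assumed targets:

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Data written after the last backup is lost; compare this to the RPO."""
    return disaster - last_backup


def downtime(disaster: datetime, service_restored: datetime) -> timedelta:
    """Time the service was unavailable; compare this to the RTO."""
    return service_restored - disaster


# Illustrative timestamps only (not taken from any real incident).
last_backup = datetime(2024, 1, 1, 2, 0)
disaster = datetime(2024, 1, 1, 11, 30)
restored = datetime(2024, 1, 1, 15, 30)

rpo_target = timedelta(hours=24)  # assumed objective
rto_target = timedelta(hours=2)   # assumed objective

loss = data_loss_window(last_backup, disaster)  # 9 h 30 min
outage = downtime(disaster, restored)           # 4 h

print("RPO met:", loss <= rpo_target)    # True
print("RTO met:", outage <= rto_target)  # False
```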
DR Strategies
Backup & Restore
Highest RPO and RTO, but the cheapest option
Pilot Light
Only the critical core of the application (e.g. the DB) is kept running in the cloud
Warm Standby
Full system is up and running with a minimum scale
Upon disaster, scale up for production load
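The scale-up step for a warm standby could look roughly like the sketch below. It assumes the standby runs in an EC2 Auto Scaling group; the function name, the group name, and the "twice production" headroom for `MaxSize` are hypothetical choices, while `update_auto_scaling_group` is the boto3 Auto Scaling call.

```python
def scale_up_warm_standby(autoscaling_client, group_name: str,
                          production_size: int) -> dict:
    """Raise a minimally scaled Auto Scaling group to production capacity."""
    params = {
        "AutoScalingGroupName": group_name,
        "MinSize": production_size,
        "MaxSize": production_size * 2,  # assumed headroom, tune as needed
        "DesiredCapacity": production_size,
    }
    # The client is passed in so the function can be exercised with a stub.
    autoscaling_client.update_auto_scaling_group(**params)
    return params
```

In real use the client would come from something like `boto3.client("autoscaling", region_name="us-west-2")`, pointed at the recovery Region (the Region name here is only an example).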
Hot Site / Multi Site
Lowest RTO but expensive
Full production scale is running on AWS and on premises
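The four strategies trade cost against recovery speed in a fixed order. The sketch below encodes that order as an ordinal rank (not a measured cost or time) and picks the cheapest strategy that recovers fast enough. The `Strategy` class and `cheapest_fast_enough` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    rank: int  # 1 = cheapest, slowest recovery; 4 = most expensive, fastest


STRATEGIES = [
    Strategy("Backup & Restore", 1),
    Strategy("Pilot Light", 2),
    Strategy("Warm Standby", 3),
    Strategy("Hot Site / Multi Site", 4),
]


def cheapest_fast_enough(required_rank: int) -> Strategy:
    """Return the cheapest strategy whose recovery-speed rank meets the need."""
    candidates = [s for s in STRATEGIES if s.rank >= required_rank]
    if not candidates:
        raise ValueError("no strategy recovers that fast")
    return min(candidates, key=lambda s: s.rank)
```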
Disaster Recovery Tips
Backup
EBS snapshot, RDS automated backups / snapshots, etc.
Regular pushes to S3 / S3 IA / Glacier with lifecycle policy, Cross Region Replication
From on-premises: Snowball / Storage Gateway
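As a sketch of the lifecycle policy mentioned above, the following builds an S3 lifecycle configuration that moves backups to S3 IA and then to Glacier. The rule ID, prefix, and day counts are assumptions chosen for illustration; `put_bucket_lifecycle_configuration` is the boto3 S3 call that would apply it.

```python
def backup_lifecycle_rule(prefix: str, ia_after_days: int = 30,
                          glacier_after_days: int = 90) -> dict:
    """Build a lifecycle configuration that tiers backups down over time."""
    return {
        "Rules": [{
            "ID": "tier-down-backups",  # hypothetical rule name
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [
                {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_after_days, "StorageClass": "GLACIER"},
            ],
        }]
    }


def apply_backup_lifecycle(s3_client, bucket: str, prefix: str) -> dict:
    """Attach the lifecycle configuration to a bucket via the given client."""
    config = backup_lifecycle_rule(prefix)
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config)
    return config
```

The client would typically be `boto3.client("s3")`; it is injected here so the logic can be checked without AWS credentials.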
High Availability
Use Route 53 to fail DNS over from one Region to another
RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
Site-to-Site VPN as a backup connection in case Direct Connect fails
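One way to express Region-to-Region DNS failover is a pair of Route 53 failover records, sketched below. The helper names, record name, IP addresses (documentation ranges), and health check ID are all placeholders; `change_resource_record_sets` is the boto3 Route 53 call.

```python
def failover_record(name: str, ip: str, role: str, set_id: str,
                    health_check_id=None, ttl: int = 60) -> dict:
    """Build one UPSERT change for a failover A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def upsert_failover_pair(route53_client, hosted_zone_id: str, name: str,
                         primary_ip: str, secondary_ip: str,
                         primary_health_check_id: str) -> dict:
    """Create or update the primary and secondary records in one batch."""
    batch = {"Changes": [
        failover_record(name, primary_ip, "PRIMARY", "primary-region",
                        primary_health_check_id),
        failover_record(name, secondary_ip, "SECONDARY", "dr-region"),
    ]}
    route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=batch)
    return batch
```

When the health check on the primary record fails, Route 53 starts answering with the secondary record; a short TTL keeps clients from caching the old answer for long.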
Replication
RDS replication (cross-Region), Amazon Aurora Global Database
Database replication from on-premises to RDS
Storage Gateway
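A cross-Region RDS read replica can be requested with a call along the lines of the sketch below. It assumes the client is created in the recovery Region and that the source instance is referenced by its ARN; the function name, identifiers, and instance class are hypothetical, while `create_db_instance_read_replica` is the boto3 RDS call.

```python
def create_cross_region_replica(rds_client_in_dr_region, source_arn: str,
                                replica_id: str,
                                instance_class: str = "db.t3.medium") -> dict:
    """Request a read replica in the client's Region from a source in another."""
    if not source_arn.startswith("arn:aws:rds:"):
        # Cross-Region replicas identify the source by ARN, not by plain name.
        raise ValueError("source must be given as an RDS instance ARN")
    params = {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_arn,
        "DBInstanceClass": instance_class,  # assumed size
    }
    rds_client_in_dr_region.create_db_instance_read_replica(**params)
    return params
```

During a disaster the replica can be promoted to a standalone instance, which is what makes this replication useful for recovery rather than only for read scaling.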
Automation
CloudFormation / Elastic Beanstalk to recreate a whole new environment
Recover / reboot EC2 instances with CloudWatch alarm actions when status checks fail
AWS Lambda for customized automations
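The CloudWatch recovery item can be sketched as an alarm on the EC2 system status check whose action is the built-in EC2 recover action. The alarm name, period, and evaluation count are assumptions; `put_metric_alarm` is the boto3 CloudWatch call.

```python
def recovery_alarm_params(instance_id: str, region: str) -> dict:
    """Build an alarm that recovers an instance when its system check fails."""
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 2,  # two failing minutes in a row (assumed)
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(cloudwatch_client, instance_id: str,
                          region: str) -> dict:
    """Register the alarm through the given CloudWatch client."""
    params = recovery_alarm_params(instance_id, region)
    cloudwatch_client.put_metric_alarm(**params)
    return params
```

Swapping the metric to `StatusCheckFailed_Instance` and the action to `arn:aws:automate:<region>:ec2:reboot` gives the reboot variant.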
Chaos
Netflix has a "Simian Army" that randomly terminates EC2 instances
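The idea of random termination can be sketched in a few lines. This is not Netflix's implementation, only a minimal illustration: the function names are hypothetical, the instance list is supplied by the caller, and `terminate_instances` is the boto3 EC2 call.

```python
import random


def pick_victim(instance_ids, rng: random.Random):
    """Choose one instance at random, or None if there is nothing to pick."""
    if not instance_ids:
        return None
    return rng.choice(sorted(instance_ids))


def terminate_random_instance(ec2_client, instance_ids, rng=None,
                              dry_run: bool = True):
    """Terminate one randomly chosen instance and return its ID."""
    rng = rng or random.Random()
    victim = pick_victim(instance_ids, rng)
    if victim is not None:
        # With DryRun=True, AWS only checks permissions; a real boto3 client
        # then raises a ClientError (DryRunOperation) instead of terminating.
        ec2_client.terminate_instances(InstanceIds=[victim], DryRun=dry_run)
    return victim
```

Defaulting `dry_run` to `True` is a deliberate safety choice for a sketch like this; a real chaos tool would also restrict the candidate list to instances that have opted in.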