Disaster Recovery

RTO & RPO

DR Strategies

Warm Standby
- The full system is up and running, but at minimum scale
- Upon disaster, scale up for production load
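
The scale-up step could be sketched as below, assuming the standby fleet sits in an EC2 Auto Scaling group. The group name, capacity numbers, and helper names are hypothetical; only the parameter names come from the Auto Scaling UpdateAutoScalingGroup API.

```python
def scale_up_params(group_name, production_capacity, max_size=None):
    """Build keyword arguments for Auto Scaling's UpdateAutoScalingGroup call.

    group_name and the capacity values are illustrative assumptions.
    """
    if production_capacity < 1:
        raise ValueError("production_capacity must be at least 1")
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": production_capacity,
        "DesiredCapacity": production_capacity,
        # MaxSize must never be lower than the desired capacity.
        "MaxSize": max(production_capacity, max_size or production_capacity),
    }


def scale_up(group_name, production_capacity, region):
    """Send the request. Needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("autoscaling", region_name=region)
    client.update_auto_scaling_group(
        **scale_up_params(group_name, production_capacity)
    )
```

Keeping the parameter builder separate from the API call makes the failover runbook easy to review and test without touching a real account.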

Hot Site / Multi Site
- Lowest RTO but expensive
- Full production scale is running both on AWS and on-premise

Disaster Recovery Tips

Backup
- EBS snapshot, RDS automated backups / snapshots, etc.
- Regular pushes to S3 / S3 IA / Glacier with lifecycle policy, Cross Region Replication
- From on-premise: Snowball / Storage Gateway
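
As a rough illustration of the lifecycle policy mentioned above, the sketch below builds a rule that moves backups from S3 Standard to S3 IA and then to Glacier. The prefix, rule ID, day counts, and function names are assumptions for the example, not values from these notes.

```python
def backup_lifecycle(prefix="backups/", ia_after_days=30, glacier_after_days=90):
    """Build a LifecycleConfiguration dict for S3.

    The prefix and day counts are illustrative defaults.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "tier-down-backups",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket):
    """Attach the rule to a bucket. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=backup_lifecycle()
    )
```
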
High Availability
- Use Route 53 to fail DNS over from one Region to another
- RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
- Site-to-Site VPN as a backup connection in case Direct Connect fails
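
One way to express the Route 53 failover idea is a pair of failover records, where the primary is tied to a health check. This is a minimal sketch; the domain name, IP addresses, health check ID, and helper name are placeholders invented for the example.

```python
def failover_change_batch(name, primary_ip, secondary_ip, health_check_id, ttl=60):
    """Build a Route 53 ChangeBatch with PRIMARY and SECONDARY failover records."""

    def record(role, ip):
        rrset = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{name}-{role.lower()}",
            "Failover": role,
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        if role == "PRIMARY":
            # Route 53 answers with the secondary once this check is unhealthy.
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Changes": [record("PRIMARY", primary_ip), record("SECONDARY", secondary_ip)]
    }


def apply_failover(hosted_zone_id, batch):
    """Submit the records. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("route53").change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=batch
    )
```

A short TTL is used here so that clients pick up the secondary Region quickly after a failover.
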
Replication
- RDS Replication (cross-Region), Amazon Aurora + Global Database
- Database replication from on-premise to RDS
- Storage Gateway
Automation
- CloudFormation / Elastic Beanstalk to recreate a whole new environment
- Recover / reboot EC2 instances with CloudWatch alarm actions when status checks fail
- AWS Lambda for customized automations
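
The CloudWatch recovery bullet could look roughly like the sketch below: an alarm on the system status check whose action is the built-in EC2 recover action. The alarm name, period, and evaluation settings are assumptions chosen for illustration.

```python
def recover_alarm_params(instance_id, region):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call.

    The alarm name and timing values are illustrative assumptions.
    """
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Built-in action that recovers the instance onto healthy hardware.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(instance_id, region):
    """Create the alarm. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(
        **recover_alarm_params(instance_id, region)
    )
```
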
Chaos
- Netflix has a "Simian Army" that randomly terminates EC2 instances
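
A toy version of the random-termination idea is sketched below. It is not Netflix's tooling; the function names, the protected list, and the dry-run default are all assumptions made for this example.

```python
import random


def pick_victim(instance_ids, protected=(), rng=None):
    """Pick one instance ID at random, skipping protected ones.

    Returns None when nothing is eligible.
    """
    rng = rng or random.Random()
    candidates = sorted(set(instance_ids) - set(protected))
    if not candidates:
        return None
    return rng.choice(candidates)


def terminate(instance_id, region, dry_run=True):
    """Terminate the chosen instance. Needs boto3 and AWS credentials.

    With dry_run=True, EC2 only checks permissions and boto3 raises a
    ClientError instead of terminating anything.
    """
    import boto3  # imported lazily so pick_victim works without it

    boto3.client("ec2", region_name=region).terminate_instances(
        InstanceIds=[instance_id], DryRun=dry_run
    )
```

Running this kind of experiment against a non-production environment first is a sensible way to check that the automation above actually recovers the system.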

