Redshift

Introduction

A fully managed data warehouse for OLAP and Business Intelligence (BI) workloads, based on PostgreSQL; scales to PBs of data.
Pay for the instances you provision
- Worth it when you have sustained usage; otherwise use Athena for sporadic queries.
Runs in a single AZ inside a VPC by default (Multi-AZ deployments have since been added for RA3 node types)
Integrates with BI tools:
- AWS Quicksight
- Tableau
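The "sustained usage vs. sporadic queries" trade-off can be sketched as a break-even calculation. This is a rough sketch only: every price and size passed in is a hypothetical placeholder, not a real AWS price, and the functions are illustrative helpers, not part of any AWS SDK.

```python
def monthly_provisioned_cost(node_count, price_per_node_hour, hours=730.0):
    """Cost of keeping a provisioned cluster running for a whole month."""
    return node_count * price_per_node_hour * hours


def monthly_scan_cost(queries_per_month, tb_scanned_per_query, price_per_tb):
    """Cost of a pay-per-data-scanned service (Athena-style) for the same month."""
    return queries_per_month * tb_scanned_per_query * price_per_tb


def break_even_queries(node_count, price_per_node_hour,
                       tb_scanned_per_query, price_per_tb, hours=730.0):
    """Monthly query count at which both pricing models cost the same."""
    provisioned = monthly_provisioned_cost(node_count, price_per_node_hour, hours)
    return provisioned / (tb_scanned_per_query * price_per_tb)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to make the arithmetic easy to follow.
    print(break_even_queries(node_count=2, price_per_node_hour=0.25,
                             tb_scanned_per_query=0.1, price_per_tb=5.0))
```

Below the break-even count the pay-per-scan model is cheaper; above it the provisioned cluster wins. Plug in current prices from the AWS pricing pages before relying on the result.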

Features

A cluster can have 1 to 128 nodes; per-node storage depends on the node type (e.g. 160 GB on dc2.large).
Data can be loaded from:
- S3
- DynamoDB
- Kinesis Firehose
- DMS
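Loading from S3 is usually done with the COPY command. A minimal sketch that only builds the SQL text; the table name, bucket, and role ARN are made-up placeholders, and `build_copy_statement` is an illustrative helper rather than an AWS API. Only formats that need no extra options are allowed here.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, data_format="PARQUET"):
    """Return a Redshift COPY statement that loads `table` from an S3 prefix."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    fmt = data_format.upper()
    if fmt not in {"PARQUET", "ORC", "CSV"}:
        raise ValueError(f"unsupported format in this sketch: {data_format}")
    # Naive guard: this sketch does no SQL escaping, so reject quotes outright.
    for value in (table, s3_uri, iam_role_arn):
        if "'" in value or ";" in value:
            raise ValueError("quotes and semicolons are not allowed in this sketch")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {fmt};"
    )


if __name__ == "__main__":
    print(build_copy_statement(
        "sales",                                        # hypothetical table
        "s3://example-bucket/sales/",                   # hypothetical bucket
        "arn:aws:iam::123456789012:role/ExampleRole",   # hypothetical role
    ))
```

The resulting statement would be run through any SQL client connected to the cluster.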
Data Processing
- Columnar Data Storage
  - With block size 1 MB
- Advanced Compression
- Massively parallel processing (MPP): automatically distributes data and query load across the nodes
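Why columnar storage compresses well can be shown with a toy example. This is an illustration of the idea only, not how Redshift stores blocks internally: the rows are pivoted into columns, and a column with repeated values is run-length encoded.

```python
from itertools import groupby


def rows_to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    rows = [
        {"region": "eu", "amount": 10},
        {"region": "eu", "amount": 12},
        {"region": "us", "amount": 7},
    ]
    columns = rows_to_columns(rows)
    # Values of one column sit next to each other, so repeats are easy to squash.
    print(run_length_encode(columns["region"]))
```

A query that only aggregates `amount` also never has to read the `region` values, which is the other benefit of the columnar layout.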
Node Type
- Single Node
- Multi-Node
  - Leader Node (not charged)
  - Compute Node
Can enable "Redshift enhanced VPC routing" so that COPY / UNLOAD traffic between the cluster and your data repositories goes through your VPC instead of the public internet.
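Enhanced VPC routing is a cluster setting. A minimal sketch using the boto3 `modify_cluster` call; the cluster identifier and region are placeholders, and boto3 (a third-party package) is imported only when the change is actually applied.

```python
def enhanced_vpc_routing_params(cluster_id, enabled=True):
    """Keyword arguments for the Redshift modify_cluster call."""
    return {"ClusterIdentifier": cluster_id, "EnhancedVpcRouting": enabled}


def apply_enhanced_vpc_routing(cluster_id, region, enabled=True):
    """Apply the setting; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("redshift", region_name=region)
    return client.modify_cluster(**enhanced_vpc_routing_params(cluster_id, enabled))


if __name__ == "__main__":
    # "example-cluster" is a hypothetical identifier.
    print(enhanced_vpc_routing_params("example-cluster"))
```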
Encryption
- In-transit with SSL
- At rest, Redshift takes care of key management
  - Can manage your own keys through hardware security module (HSM)
  - Can manage keys with KMS
Snapshot
- Point-in-time backups of a cluster, stored in S3
- Snapshots are incremental
- Can restore snapshots to a new AZ or a new cluster
- Can configure to copy snapshot to another region
  - If KMS is used, set up a snapshot copy grant for a master key in the destination region
  - Enable cross-region snapshots in your Redshift cluster to copy snapshots of the cluster to another region
- Can be created:
  - Automatically: every 8 hours, every 5 GB per node of data changes, or on a schedule; kept for a configurable retention period.
  - Manually: retained until you delete it
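The cross-region copy steps above can be sketched as an ordered plan of boto3 Redshift calls (`create_snapshot_copy_grant`, `enable_snapshot_copy`). Assumptions: the cluster name, regions, key ID, and grant name are placeholders, and the plan/run split is just one way to structure it. The grant is created in the destination region; the copy is enabled on the cluster in the source region.

```python
def cross_region_copy_plan(cluster_id, source_region, destination_region,
                           retention_days=7, kms_key_id=None,
                           grant_name="example-copy-grant"):
    """Return (region, operation, params) steps for cross-region snapshot copy."""
    steps = []
    enable_params = {
        "ClusterIdentifier": cluster_id,
        "DestinationRegion": destination_region,
        "RetentionPeriod": retention_days,
    }
    if kms_key_id is not None:
        # KMS-encrypted cluster: a copy grant must exist in the destination region.
        steps.append((destination_region, "create_snapshot_copy_grant",
                      {"SnapshotCopyGrantName": grant_name, "KmsKeyId": kms_key_id}))
        enable_params["SnapshotCopyGrantName"] = grant_name
    steps.append((source_region, "enable_snapshot_copy", enable_params))
    return steps


def run_plan(steps):
    """Execute the plan; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the plan can be built without it

    for region, operation, params in steps:
        client = boto3.client("redshift", region_name=region)
        getattr(client, operation)(**params)


if __name__ == "__main__":
    for step in cross_region_copy_plan("example-cluster", "us-east-1", "eu-west-1",
                                       kms_key_id="example-key-id"):
        print(step)
```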
Workload management
- Can define separate query queues so that short, fast-running queries don't get stuck behind long-running ones.
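Queues are defined as JSON in the `wlm_json_configuration` cluster parameter. A rough sketch of such a document: the key names follow my reading of the Redshift WLM documentation and should be double-checked, while the queue names, concurrency values, and memory split are invented for illustration.

```python
import json


def build_wlm_config(etl_concurrency=2, default_concurrency=5,
                     etl_memory_percent=60):
    """Build a manual WLM configuration with an ETL queue and a default queue."""
    default_memory_percent = 100 - etl_memory_percent
    if not 0 < etl_memory_percent < 100:
        raise ValueError("etl_memory_percent must be between 1 and 99")
    queues = [
        # Queries tagged with query group "etl" (hypothetical name) land here.
        {"query_group": ["etl"], "query_concurrency": etl_concurrency,
         "memory_percent_to_use": etl_memory_percent},
        # A queue without groups acts as the default queue.
        {"query_concurrency": default_concurrency,
         "memory_percent_to_use": default_memory_percent},
        # Let short queries bypass the longer-running ones.
        {"short_query_queue": True},
    ]
    return json.dumps(queues)


if __name__ == "__main__":
    print(build_wlm_config())
```

The JSON string would be set as the value of `wlm_json_configuration` in the cluster's parameter group.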
Cost efficiency
- Storing the summary data in Redshift
- Keeping detailed transaction data out of Redshift (Ex. store on S3)
- Making use of Redshift Spectrum for drill-down queries that join tables from Redshift / S3.
Redshift Spectrum
- Query data that is already in S3 without loading it
- Must have a Redshift cluster available to start the query
- The query is then submitted to thousands of Redshift Spectrum nodes
- If there is no need to join S3 data with Redshift tables, use Athena instead.
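A drill-down query with Spectrum needs an external schema that points at the S3 data catalog, after which external tables can be joined with local ones. A minimal sketch that builds both SQL statements; all schema, table, column, and role names are placeholders, and the helpers do no SQL escaping.

```python
def external_schema_sql(schema, catalog_database, iam_role_arn):
    """SQL that exposes a data catalog database as an external schema."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_database}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def drill_down_sql(summary_table, external_table, key,
                   summary_column, detail_column):
    """SQL that joins a local summary table with detail rows kept in S3."""
    return (
        f"SELECT s.{key}, s.{summary_column}, d.{detail_column}\n"
        f"FROM {summary_table} AS s\n"
        f"JOIN {external_table} AS d ON d.{key} = s.{key};"
    )


if __name__ == "__main__":
    print(external_schema_sql("spectrum", "example_catalog_db",
                              "arn:aws:iam::123456789012:role/ExampleRole"))
    print(drill_down_sql("daily_sales_summary", "spectrum.transactions",
                         "order_date", "total_amount", "transaction_id"))
```

The summary table lives in the cluster while `spectrum.transactions` stays in S3, which matches the cost-efficiency pattern described above.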
Troubleshooting queries
