Redshift

Introduction

  • A fully managed, petabyte-scale data warehouse for OLAP and Business Intelligence (BI) workloads, based on PostgreSQL.

  • Pay for the node instances you provision, whether or not queries are running

    • Cost-effective for sustained usage; for sporadic queries, use Athena (which charges per query) instead.

  • A cluster runs in a single AZ within a VPC (Multi-AZ deployments are available for RA3 node types)

  • Integrates with BI tools:

    • Amazon QuickSight

    • Tableau

Features

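  • A cluster can have 1 to 128 nodes (the maximum depends on the node type); storage per node also depends on the node type (e.g., 160 GB for dc2.large).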

  • Data can be loaded from:

    • S3

    • DynamoDB

    • Kinesis Data Firehose

    • AWS Database Migration Service (DMS)
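
Loading from S3 is done with the SQL COPY command. The sketch below only assembles a COPY statement as a string, assuming Parquet files under one S3 prefix and an IAM role attached to the cluster; the table name, bucket, and role ARN are hypothetical, and the statement would still have to be run through a SQL client connected to the cluster.

```python
# Hypothetical example values; replace with your own table, bucket, and role.
EXAMPLE_ROLE = "arn:aws:iam::123456789012:role/MyRedshiftRole"
ALLOWED_FORMATS = {"PARQUET", "ORC", "CSV"}


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_sql(table: str, s3_uri: str, iam_role_arn: str,
                   file_format: str = "PARQUET") -> str:
    """Build a COPY statement that loads `table` from files under an S3 prefix.

    `table` is inserted as-is, so it must be a trusted identifier.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("COPY from S3 expects an s3:// URI")
    if file_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format in this sketch: {file_format}")
    return (f"COPY {table} FROM {sql_literal(s3_uri)} "
            f"IAM_ROLE {sql_literal(iam_role_arn)} FORMAT AS {file_format};")


print(build_copy_sql("sales", "s3://my-bucket/sales/", EXAMPLE_ROLE))
```

Building the statement separately from executing it keeps quoting in one place; the IAM role, rather than embedded access keys, is what authorizes the cluster to read the bucket.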

  • Data Processing

    • Columnar Data Storage

      • Stored in 1 MB blocks

    • Advanced Compression

    • Massively parallel processing (MPP): data and query work are automatically distributed across the compute nodes
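
The MPP idea can be illustrated with a toy model of KEY distribution: each compute node is split into slices, and each row is placed on a slice by hashing a distribution-key column, so rows that share a key land together. This is a minimal sketch, assuming `zlib.crc32` as a stand-in for Redshift's internal hash; the column name and slice count are hypothetical. The skew ratio mirrors the notion of checking how evenly the nodes are loaded.

```python
import zlib


def assign_slice(dist_key: str, num_slices: int) -> int:
    """Map a distribution-key value to a slice (stand-in hash, not Redshift's)."""
    return zlib.crc32(dist_key.encode("utf-8")) % num_slices


def distribute(rows: list[dict], key_column: str, num_slices: int) -> dict[int, list[dict]]:
    """Group rows by the slice their distribution-key value hashes to."""
    slices: dict[int, list[dict]] = {i: [] for i in range(num_slices)}
    for row in rows:
        slices[assign_slice(str(row[key_column]), num_slices)].append(row)
    return slices


def skew_ratio(slices: dict[int, list[dict]]) -> float:
    """Largest slice size divided by the mean slice size (1.0 means perfectly even)."""
    sizes = [len(rows) for rows in slices.values()]
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Hypothetical data: 1,000 orders spread over 10 customers and 4 slices.
orders = [{"customer_id": i % 10, "amount": i} for i in range(1000)]
placement = distribute(orders, "customer_id", num_slices=4)
print({s: len(rows) for s, rows in placement.items()}, round(skew_ratio(placement), 2))
```

Co-locating rows by key lets joins on that key run locally on each slice; a low-cardinality or uneven key, as in this toy data, shows up as a skew ratio well above 1.0.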

  • Node Type

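    • Single node: one node acts as both leader and compute node

    • Multi-node

      • Leader node (not charged): receives client connections, parses queries, and builds execution plans

      • Compute nodes: store the data and run the query steps in parallel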

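  • "Redshift enhanced VPC routing" can be enabled to force COPY / UNLOAD traffic between the cluster and data repositories (e.g., S3) through your VPC, so VPC features such as security groups, VPC endpoints, and NAT gateways control that path.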

  • Encryption

    • In transit: SSL/TLS connections

    • At rest: Redshift handles the encryption key hierarchy; the top-level key can be managed with

      • A hardware security module (HSM)

      • AWS KMS

  • Snapshot

    • Point-in-time backups of a cluster, stored in S3

    • Snapshots are incremental: only data changed since the previous snapshot is stored

    • Restoring a snapshot creates a new cluster, which can be placed in a different AZ

    • Snapshots can be copied automatically to another region

      • If the cluster is encrypted with KMS, create a snapshot copy grant for a KMS key in the destination region

      • Enable cross-region snapshot copy on the cluster, choosing the destination region

    • Can be created:

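      • Automatically: about every 8 hours or after every 5 GB per node of data changes (whichever comes first), or on a custom schedule; kept for a retention period (default 1 day, up to 35 days)

      • Manually: kept until you delete them (or for an optional retention period)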
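
The automated-snapshot rule can be sketched as a trigger check plus retention pruning. This is a minimal model, assuming the 5 GB threshold is compared against total changed data divided by the node count, and a retention period given in whole days; the function names are hypothetical, and Redshift evaluates this internally rather than through any user-facing API.

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(hours=8)
CHANGE_THRESHOLD_GB_PER_NODE = 5.0


def automated_snapshot_due(last_snapshot: datetime, now: datetime,
                           changed_gb: float, node_count: int) -> bool:
    """True once 8 hours have passed or ~5 GB per node has changed, whichever comes first.

    Assumption: the per-node threshold is checked against changed_gb / node_count.
    """
    per_node_change = changed_gb / node_count
    return (now - last_snapshot >= SNAPSHOT_INTERVAL
            or per_node_change >= CHANGE_THRESHOLD_GB_PER_NODE)


def expired_automated_snapshots(snapshot_times: list[datetime], now: datetime,
                                retention_days: int = 1) -> list[datetime]:
    """Automated snapshots older than the retention period (manual ones are never pruned here)."""
    cutoff = now - timedelta(days=retention_days)
    return [t for t in snapshot_times if t < cutoff]


start = datetime(2024, 1, 1)
print(automated_snapshot_due(start, start + timedelta(hours=3), changed_gb=10.0, node_count=2))
```

Manual snapshots are deliberately left out of the pruning function, matching the note that they are kept until deleted.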

  • Workload management

    • Define separate query queues so that short, fast-running queries don't get stuck behind long-running ones.
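
Manual WLM queues are configured as a JSON array in the cluster parameter group's `wlm_json_configuration` parameter. A minimal sketch, assuming a hypothetical "dashboard" query group, a hypothetical "etl" user group, and the memory split shown; the key names are assumed from the `wlm_json_configuration` format and should be verified against the WLM documentation. The routing helper models the rule that a query goes to the first queue whose query group or user group matches, otherwise to the default (last) queue.

```python
import json
from typing import Optional

# Hypothetical queue layout; the manual queues' memory percentages sum to 100.
QUEUES = [
    {"query_group": ["dashboard"], "query_concurrency": 10, "memory_percent_to_use": 30},
    {"user_group": ["etl"], "query_concurrency": 3, "memory_percent_to_use": 50},
    {"query_concurrency": 5, "memory_percent_to_use": 20},  # default queue: listed last
    {"short_query_queue": True},  # short query acceleration entry
]


def wlm_json(queues: list[dict]) -> str:
    """Serialize the queues as a value for the wlm_json_configuration parameter."""
    return json.dumps(queues)


def route_query(queues: list[dict], query_group: Optional[str] = None,
                user_group: Optional[str] = None) -> int:
    """Index of the first manual queue matching the query or user group, else the default.

    Routing to the short-query queue by predicted runtime is not modeled.
    """
    manual = [q for q in queues if not q.get("short_query_queue")]
    for i, queue in enumerate(manual[:-1]):  # the last manual queue is the default
        if (query_group in queue.get("query_group", [])
                or user_group in queue.get("user_group", [])):
            return i
    return len(manual) - 1


print(wlm_json(QUEUES))
```

Giving dashboards their own high-concurrency queue is one way to keep them from waiting behind ETL jobs, which get fewer slots but more memory each.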

  • Cost efficiency

    • Store summary data in Redshift

    • Keep detailed transaction data out of Redshift (e.g., in S3)

    • Use Redshift Spectrum for drill-down queries that join Redshift tables with data in S3.
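
One way to move detailed rows out of Redshift is the UNLOAD command, which writes a query result to S3. The sketch below builds an UNLOAD statement that exports Parquet partitioned by a column, assuming hypothetical table, bucket, column, and role names; it only produces the SQL text.

```python
# Hypothetical role ARN used in the example call below.
EXAMPLE_ROLE = "arn:aws:iam::123456789012:role/MyRedshiftRole"


def build_unload_sql(select_sql: str, s3_prefix: str, iam_role_arn: str,
                     partition_cols: tuple[str, ...] = ()) -> str:
    """Build an UNLOAD statement that exports a query result to S3 as Parquet."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("UNLOAD expects an s3:// prefix")
    # The query is itself a string literal, so embedded single quotes are doubled.
    query = select_sql.replace("'", "''")
    sql = (f"UNLOAD ('{query}') TO '{s3_prefix}' "
           f"IAM_ROLE '{iam_role_arn}' FORMAT AS PARQUET")
    if partition_cols:
        # Partition columns become folder names such as sale_date=2023-12-31/.
        sql += f" PARTITION BY ({', '.join(partition_cols)})"
    return sql + ";"


print(build_unload_sql(
    "SELECT * FROM sales_detail WHERE sale_date < '2024-01-01'",
    "s3://my-bucket/sales-detail/",
    EXAMPLE_ROLE,
    partition_cols=("sale_date",),
))
```

Partitioned Parquet in S3 is a format that Spectrum (or Athena) can later query while scanning only the partitions a query needs.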

  • Redshift Spectrum

    • Queries data directly in S3 without loading it into Redshift

    • Requires a running Redshift cluster to submit the query

    • The query is then distributed across thousands of Redshift Spectrum nodes

    • If you don't need to join S3 data with Redshift tables, use Athena instead.
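
A typical Spectrum setup maps a data catalog database to an external schema, then joins its tables with local Redshift tables. A minimal sketch that builds both statements as strings, assuming a hypothetical catalog database `salesdb`, a local summary table `sales_summary`, an external table `sales_detail`, and the role ARN shown; identifiers are inserted unquoted and must be trusted.

```python
# Hypothetical role ARN used in the example calls below.
EXAMPLE_ROLE = "arn:aws:iam::123456789012:role/MyRedshiftRole"


def create_external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Map a data catalog database to a local external schema for Spectrum."""
    return (f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
            f"FROM DATA CATALOG DATABASE '{catalog_db}' "
            f"IAM_ROLE '{iam_role_arn}' "
            "CREATE EXTERNAL DATABASE IF NOT EXISTS;")


def drill_down_sql(local_table: str, schema: str, external_table: str, key: str) -> str:
    """Join a summary table stored in Redshift with detail rows read from S3."""
    return (f"SELECT s.{key}, COUNT(*) AS detail_rows "
            f"FROM {local_table} s "
            f"JOIN {schema}.{external_table} d ON s.{key} = d.{key} "
            f"GROUP BY s.{key};")


print(create_external_schema_sql("spectrum", "salesdb", EXAMPLE_ROLE))
print(drill_down_sql("sales_summary", "spectrum", "sales_detail", "region"))
```

The join is the reason to use Spectrum here; a query that touched only `spectrum.sales_detail` could run in Athena without a cluster.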
