Redshift

Last updated 4 years ago

Introduction

  • A fully managed OLAP data warehouse for Business Intelligence (BI) workloads, based on PostgreSQL; scales to PBs of data.

  • Pay for the instances you provision

    • Worth it when usage is sustained; for sporadic queries, use Athena instead.

  • A cluster currently runs in a single AZ, inside a VPC

  • Integrates with BI tools:

    • Amazon QuickSight

    • Tableau

Features

  • A cluster can have 1 to 128 nodes; storage per node depends on the node type (e.g. 160 GB on dc2.large).

  • Data can be loaded from:

    • S3

    • DynamoDB

    • Kinesis Firehose

    • DMS

  • Data Processing

    • Columnar Data Storage

      • With block size 1 MB

    • Advanced Compression

    • Massively parallel processing (MPP): data and query load are automatically distributed across the nodes

  • Node Type

    • Single Node

    • Multi-Node

      • Leader Node (not charged)

      • Compute Node

  • Can enable "Enhanced VPC Routing" so that COPY / UNLOAD traffic is routed through the VPC instead of over the public internet.

  • Encryption

    • In-transit with SSL

    • At rest, Redshift takes care of key management

      • Can manage your own keys through a hardware security module (HSM)

      • Can manage keys with KMS

  • Snapshot

    • Point-in-time backups of a cluster, stored in S3

    • Snapshots are incremental

    • Can restore snapshots to a new AZ or a new cluster

    • Can be configured to copy snapshots automatically to another region

      • If KMS is used, set up a snapshot copy grant for a master key in the destination region

      • Enable cross-region snapshots in your Redshift cluster to copy snapshots of the cluster to another region

    • Can be created:

      • Automatically: every 8 hours, after every 5 GB of data changes per node, or on a schedule; kept for a configurable retention period.

      • Manually: retained until you delete it

  • Workload management

    • Can define separate query queues for different workloads so that short, fast-running queries don't get stuck behind long-running ones.

  • Cost efficiency

    • Storing the summary data in Redshift

    • Keeping detailed transaction data out of Redshift (e.g. storing it in S3)

    • Making use of Redshift Spectrum for drill-down queries that join tables from Redshift / S3.

  • Redshift Spectrum

    • Query data that is already in S3 without loading it

    • Must have a Redshift cluster available to start the query

    • The query is then submitted to thousands of Redshift Spectrum nodes

    • If there is no need to join different sources, use Athena.
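
The cost-efficiency pattern in the notes (summary data loaded into Redshift, detail data left in S3, Spectrum for drill-down joins) can be sketched as SQL text built in Python. This is a minimal sketch under stated assumptions, not a definitive setup: every table, column, schema, bucket, and IAM role name is a hypothetical placeholder, and the functions only build statements; they do not connect to a cluster.

```python
"""Sketch: summary data in Redshift, detail data in S3, joined via Spectrum.

All identifiers are hypothetical. Values are interpolated directly into the
SQL text, so only trusted constants should be passed in.
"""


def copy_from_s3_sql(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """COPY statement that bulk-loads CSV files under an S3 prefix into a table."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV;"
    )


def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """External schema that exposes data-catalog tables (backed by S3) to Spectrum."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def drill_down_sql(summary_table: str, detail_table: str, order_date: str) -> str:
    """Join a local summary table with an external, S3-backed detail table."""
    return (
        "SELECT s.order_date, s.total_amount, d.order_id, d.amount\n"
        f"FROM {summary_table} AS s\n"
        f"JOIN {detail_table} AS d ON d.order_date = s.order_date\n"
        f"WHERE s.order_date = '{order_date}';"
    )


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/HypotheticalRedshiftRole"
    statements = (
        copy_from_s3_sql("daily_sales_summary", "s3://hypothetical-bucket/summary/", role),
        external_schema_sql("spectrum", "salesdb", role),
        drill_down_sql("daily_sales_summary", "spectrum.order_details", "2020-01-01"),
    )
    for statement in statements:
        print(statement, end="\n\n")
```

The generated statements would be run through any SQL client connected to the cluster. When the drill-down does not need to join against tables stored in Redshift, the same S3 data could be queried from Athena instead.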
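
The cross-region snapshot copy steps listed under Snapshot can be sketched with the boto3 Redshift client methods `create_snapshot_copy_grant` and `enable_snapshot_copy`. This is a minimal sketch, assuming a KMS-encrypted cluster: the cluster name, grant name, key ID, and region are hypothetical, and `RecordingClient` is a stand-in added here so the example runs without AWS credentials. In real use the two clients would come from `boto3.client("redshift", region_name=...)`.

```python
class RecordingClient:
    """Stand-in for a boto3 Redshift client: records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any method name is accepted; the call is stored as (name, kwargs).
        def record(**kwargs):
            self.calls.append((name, kwargs))

        return record


def enable_encrypted_cross_region_copy(
    source_client,
    destination_client,
    cluster_id: str,
    destination_region: str,
    kms_key_id: str,
    grant_name: str,
    retention_days: int = 7,
) -> None:
    """Turn on automatic snapshot copy to another region for a KMS-encrypted cluster."""
    # Step 1: in the destination region, create a snapshot copy grant
    # so Redshift may use the KMS key there to encrypt copied snapshots.
    destination_client.create_snapshot_copy_grant(
        SnapshotCopyGrantName=grant_name,
        KmsKeyId=kms_key_id,
    )
    # Step 2: in the source region, enable cross-region snapshot copy
    # on the cluster and point it at the grant created above.
    source_client.enable_snapshot_copy(
        ClusterIdentifier=cluster_id,
        DestinationRegion=destination_region,
        RetentionPeriod=retention_days,
        SnapshotCopyGrantName=grant_name,
    )


if __name__ == "__main__":
    source, destination = RecordingClient(), RecordingClient()
    enable_encrypted_cross_region_copy(
        source,
        destination,
        cluster_id="hypothetical-cluster",
        destination_region="us-west-2",
        kms_key_id="hypothetical-key-id",
        grant_name="hypothetical-grant",
    )
    for name, kwargs in destination.calls + source.calls:
        print(name, kwargs)
```

The grant is created with the destination-region client and the copy is enabled with the source-region client, which mirrors the order of the two snapshot bullets. The 7-day retention default is an arbitrary choice for illustration.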

Troubleshooting queries