DynamoDB

Introduction

  • Fully managed NoSQL database with massive scale (around 1,000,000 requests per second) and single-digit millisecond latency.

  • Similar to Apache Cassandra (Cassandra workloads can be migrated to DynamoDB). Data is organized into tables.

  • Data is stored on SSDs and replicated across 3 geographically distinct data centers (you cannot choose the AZs).

  • Backups are available, including point-in-time recovery.

Features

  • Capacity mode:

    • Provisioned (read / write capacity units with auto scaling; the default, free-tier eligible)

      • If capacity is running out, the indexes are already well used, and you don't want to increase cost, consider exporting / archiving old data.

      • Reserved capacity can be purchased in advance to lower costs.

    • On-demand (pay per request, no capacity planning)

  • Read type:

    • Eventually consistent reads (default)

    • Strongly consistent reads
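
A worked sketch of how capacity units add up under the two read types. The sizing rules used here are not stated in these notes; they are the standard DynamoDB capacity model as I understand it (one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; one WCU covers one write per second of an item up to 1 KB). The function names are hypothetical.

```python
import math


def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = False) -> int:
    """Estimate the RCUs needed for a steady read rate.

    Assumption: item size is rounded up to the next 4 KB block, and an
    eventually consistent read costs half of a strongly consistent one.
    """
    blocks = math.ceil(item_size_bytes / 4096)
    units = blocks * reads_per_second
    if not strongly_consistent:
        units = math.ceil(units / 2)
    return units


def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """Estimate the WCUs needed: item size is rounded up to the next 1 KB block."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second
```

For example, reading a 6 KB item 10 times per second takes two 4 KB blocks per read, so the estimate doubles when the reads are strongly consistent rather than eventually consistent.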

  • Supports ACID transactions across multiple tables.

  • Integrated with IAM for security.

  • Data types:

    • Scalar types: string, number, binary, boolean, null.

    • Document types: list, map

    • Set types: string set, number set, binary set.
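
To make the type list concrete, here is a minimal sketch of how Python values could map onto the type descriptors used by the low-level DynamoDB API (`S`, `N`, `B`, `BOOL`, `NULL`, `L`, `M`, `SS`, `NS`, `BS`). The helper `to_attribute_value` is hypothetical and hand-written for illustration; boto3 ships its own serializer, so this is only meant to show the shape of the data.

```python
from typing import Any


def to_attribute_value(value: Any) -> dict:
    """Encode a Python value as a DynamoDB attribute value (illustrative only)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings in the low-level API
    if isinstance(value, bytes):
        return {"B": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:  # sets must be non-empty
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
    raise TypeError(f"unsupported value: {value!r}")
```

A set must hold a single element type, which is why mixed or empty sets fall through to the `TypeError`.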

  • Primary Key (must be chosen at table creation time and must uniquely identify each item)

    • Partition Key (Hash attribute)

    • Partition Key (Hash attribute) + Sort Key (Range Attribute)

      • Data is grouped by Partition Key.

      • A timestamp is a good Sort Key candidate.
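
The composite key described above can be sketched as request parameters. The two helpers below only build plain dictionaries in the shape of the CreateTable and Query APIs; the table name, attribute names, and helper names are hypothetical, and the choice of a string partition key with a numeric timestamp sort key is an assumption made for the example.

```python
def table_definition(table_name: str, partition_key: str, sort_key: str) -> dict:
    """Build CreateTable parameters for a partition key + sort key table."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"},
            {"AttributeName": sort_key, "AttributeType": "N"},  # epoch timestamp
        ],
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},
            {"AttributeName": sort_key, "KeyType": "RANGE"},
        ],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity mode
    }


def time_range_query(table_name: str, partition_key: str, sort_key: str,
                     partition_value: str, start: int, end: int) -> dict:
    """Build Query parameters: one partition, sort key restricted to a range."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk AND #sk BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#pk": partition_key, "#sk": sort_key},
        "ExpressionAttributeValues": {
            ":pk": {"S": partition_value},
            ":start": {"N": str(start)},
            ":end": {"N": str(end)},
        },
    }
```

With boto3 installed and credentials configured, these dictionaries could be passed as keyword arguments, for example `boto3.client("dynamodb").query(**time_range_query(...))`.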

  • Working With Indexes

    • Queries must use the Partition Key (optionally with the Sort Key) of the base table or of an index; you cannot query by an arbitrary attribute on the fly.

    • Local Secondary Index (to select an alternative Sort Key)

      • Uses the same partition key as the base table.

      • The key of a Local Secondary Index must be a composite key, and its hash attribute must be the same as the base table's hash attribute.

      • Item collection size (all items sharing one partition key value, in the table and its local secondary indexes) <= 10 GB.

      • Must be created together with the base table and cannot be deleted while the table exists.

      • Supports both eventually consistent and strongly consistent reads.

      • Reads / writes consume capacity units from the base table.

      • Best Practices

        • Use indexes sparingly (avoid indexing write-heavy tables; don't add indexes that are not used).

        • Choose projections carefully.

        • Optimize projections to avoid fetches from the base table.

        • Take advantage of sparse indexes (e.g. add an attribute to an item so that it is indexed, and remove the attribute when the item no longer needs to appear in the index).

        • Watch for expanding item collections.

    • Global Secondary Index (for when you need another key that works like the Primary Key)

      • Contains every item (row) of the base table that has the index key attribute.

      • The key of a Global Secondary Index can be a simple partition key or a composite key. Key attributes can be any top-level string, number, or binary attribute.

      • No size restrictions.

      • Can be created together with the base table, or created / deleted at any time afterwards.

      • Supports eventually consistent reads only.

      • Reads / writes consume capacity units from the index itself.

      • Best Practices

        • Choose a key that will provide uniform workloads.

        • Take advantage of sparse indexes (the indexed attribute appears in only a small fraction of items).

        • Use a Global Secondary Index for quick lookups (e.g. to select subsets of attributes).

        • A Global Secondary Index can serve as an eventually consistent read replica.
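
The index rules above can be sketched as two small checks. This is a minimal illustration, not a client library: `check_local_index` and `index_query` are hypothetical helpers, the index and attribute names are made up, and the dictionaries follow the shape of the CreateTable key schema and Query parameters.

```python
def check_local_index(base_hash_key: str, index_key_schema: list) -> None:
    """Raise ValueError if a key schema breaks the Local Secondary Index rules."""
    by_type = {entry["KeyType"]: entry["AttributeName"] for entry in index_key_schema}
    if set(by_type) != {"HASH", "RANGE"}:
        raise ValueError("a local secondary index needs a composite key")
    if by_type["HASH"] != base_hash_key:
        raise ValueError("a local secondary index must reuse the table's hash attribute")


def index_query(table_name: str, index_name: str, key_name: str, key_value: str,
                is_global: bool, consistent: bool = False) -> dict:
    """Build Query parameters for an index, rejecting strong reads on a global index."""
    if is_global and consistent:
        raise ValueError("global secondary indexes support eventually consistent reads only")
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_name},
        "ExpressionAttributeValues": {":v": {"S": key_value}},
        "ConsistentRead": consistent,
    }
```

Encoding the rules as early checks keeps a mistaken index definition or read mode from reaching the service, where it would be rejected anyway.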

  • Allows for the storage of large text and binary objects with a limit of 400 KB per item (row).

    • If an object is over 400 KB, store it in S3 and save a reference to it in the item.
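
A minimal sketch of the "store it in S3, save a reference" pattern. The `upload` callable, the attribute names, and the inline threshold are assumptions for illustration; the threshold is set below 400 KB because the limit applies to the whole item, not just the payload.

```python
from typing import Callable

MAX_ITEM_BYTES = 400 * 1024       # per-item limit quoted above
MAX_INLINE_BYTES = 350 * 1024     # hypothetical threshold leaving headroom for other attributes


def build_item(object_id: str, payload: bytes,
               upload: Callable[[str, bytes], str]) -> dict:
    """Inline small payloads; offload large ones and keep only a reference.

    `upload` is a hypothetical callable that stores the bytes elsewhere
    (for example in S3) and returns a location string.
    """
    item = {"id": {"S": object_id}}
    if len(payload) <= MAX_INLINE_BYTES:
        item["payload"] = {"B": payload}
    else:
        item["payload_ref"] = {"S": upload(object_id, payload)}
    return item
```

In a real setup the `upload` callable might wrap an S3 `put_object` call and return the bucket and key; here it is injected so the decision logic can be exercised without any AWS access.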

  • TTL: automatically purges old items (rows) after a specified epoch timestamp, without consuming WCU / RCU.
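
TTL works off a numeric attribute holding an epoch timestamp in seconds. The sketch below stamps an item with such an attribute; the helper name `with_expiry` and the attribute name `expires_at` are hypothetical, and the attribute actually used is whichever one is configured as the table's TTL attribute.

```python
import time
from typing import Optional


def with_expiry(item: dict, ttl_seconds: int, now: Optional[float] = None,
                attribute: str = "expires_at") -> dict:
    """Return a copy of `item` with a TTL attribute set to an epoch time in seconds."""
    now = time.time() if now is None else now
    expiry = int(now) + ttl_seconds
    return {**item, attribute: {"N": str(expiry)}}
```

Expired items are removed in the background rather than at the exact expiry second, so readers that must never see stale data may still want to filter on the attribute.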

  • DynamoDB Streams:

    • React to changes to DynamoDB tables in real time

    • Can be read by AWS Lambda, EC2, etc., which can then forward the changes to Elasticsearch, Kinesis, etc.

    • Stream records are retained for 24 hours.
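
As a rough sketch of reacting to changes, the function below walks a batch of stream records in the shape Lambda receives them (`Records`, `eventName`, and a `dynamodb` section with `Keys` and optional images) and splits them into upserts and deletes. The function name and the returned structure are hypothetical; forwarding to a search index or another stream is left out.

```python
def handle_stream_batch(event: dict, context=None) -> dict:
    """Split a batch of DynamoDB Streams records into upserts and deletes."""
    upserts, deletes = [], []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is present only when the stream view type includes new images.
            upserts.append({"keys": change["Keys"], "item": change.get("NewImage")})
        elif record["eventName"] == "REMOVE":
            deletes.append(change["Keys"])
    return {"upserts": upserts, "deletes": deletes}
```

A downstream indexer would then write the upserts and remove the deleted keys, ideally in an idempotent way since a batch can be retried.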

  • Global Tables (cross region replication)

    • Active-Active replication to many regions

    • Must enable DynamoDB Streams

    • Useful for low-latency access and disaster recovery (DR)

  • DAX (DynamoDB Accelerator)

    • Seamless cache for DynamoDB, no application re-write.

    • Writes go through DAX to DynamoDB

    • Microsecond latency for cached reads

    • Solves the Hot Key Problem (too many reads on the same key)

    • Cache TTL is 5 minutes by default

    • Up to 10 nodes in the cluster

    • Multi-AZ (3 nodes minimum recommended for production)

    • Secure (Encryption at rest with KMS, VPC, IAM, CloudTrail, etc.)
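
The caching behaviour listed above (writes pass through to the table, reads are served from cache until a TTL expires) can be modelled with a toy in-memory class. This is not the DAX client and says nothing about how DAX is implemented; the class name, the dictionary standing in for the table, and the injectable clock are all assumptions made so the idea can be run locally.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class WriteThroughCache:
    """Toy read-through / write-through cache with a per-entry TTL."""

    def __init__(self, backing: Dict[str, dict], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.backing = backing        # stands in for the DynamoDB table
        self.ttl = ttl_seconds        # 5 minutes, matching the default above
        self.clock = clock
        self.entries: Dict[str, Tuple[float, Optional[dict]]] = {}
        self.backing_reads = 0        # counts reads that reached the table

    def get(self, key: str) -> Optional[dict]:
        hit = self.entries.get(key)
        if hit is not None and hit[0] > self.clock():
            return hit[1]             # fresh entry: served from cache
        self.backing_reads += 1
        value = self.backing.get(key)
        self.entries[key] = (self.clock() + self.ttl, value)
        return value

    def put(self, key: str, value: dict) -> None:
        self.backing[key] = value     # write goes through to the table
        self.entries[key] = (self.clock() + self.ttl, value)
```

Repeated reads of one hot key reach the backing store only once per TTL window, which is the effect that relieves a hot partition.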

  • Scenarios:

    • S3 indexing with DynamoDB

      • Create a DynamoDB table and indexes for later retrieval through an API.

      • An S3 event triggers a Lambda function, which inserts the data into DynamoDB.

    • DAX vs ElastiCache

      • DAX: caches individual objects for queries

      • ElastiCache: stores aggregated results
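
For the S3 indexing scenario, the Lambda step could look roughly like the sketch below: it turns an S3 event notification into DynamoDB items, one per object. The function name and the chosen attributes (`bucket`, `key`, `size`, `event_time`) are assumptions; the actual write (for example one PutItem call per item) is left to the caller.

```python
from urllib.parse import unquote_plus


def items_from_s3_event(event: dict) -> list:
    """Turn an S3 event notification into DynamoDB items, one per object."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        key = unquote_plus(s3["object"]["key"])  # object keys arrive URL-encoded
        items.append({
            "bucket": {"S": s3["bucket"]["name"]},
            "key": {"S": key},
            # size can be missing (for example on delete events), so default to 0
            "size": {"N": str(s3["object"].get("size", 0))},
            "event_time": {"S": record["eventTime"]},
        })
    return items
```

Using the bucket as partition key and the object key as sort key would be one possible layout for such an index table, with secondary indexes added for whatever other lookups the API needs.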
