Kinesis
Introduction
Kinesis is used to collect, process, and analyze real-time streaming data.
Data is automatically and synchronously replicated across 3 AZs.
Good for:
IoT
Realtime Big Data
Streaming processing
Features
Kinesis services:
Kinesis Data Streams (low latency streaming ingest at scale)
Consists of ordered "shards"; a stream's total capacity is the sum of all its shards. Data can be replayed (re-read) by consumers.
Must manage scaling (shard splitting / merging)
Stores streaming data from producers in shards until it is consumed (e.g. by EC2 instances)
Retention period: 24 hours (default) up to 7 days. Once inserted, data cannot be deleted or modified (immutability).
Billing is per shard provisioned.
Batching (PutRecords) or per-message calls (PutRecord) available.
Need to write your own code for producer / consumer.
Producer:
Options:
AWS SDK (simple producer)
Kinesis Producer Library (KPL): batch, compression, retries, with C++ / Java
Kinesis Agent
Monitor log files and sends them to Kinesis directly
Can write to Kinesis Data Streams and Kinesis Firehose
Limit:
1 MB/s or 1000 messages/s write per shard; exceeding this raises ProvisionedThroughputExceededException.
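As a rough sketch of the producer side (assuming boto3; the stream name and partition key here are placeholders), the per-shard write limits above also tell you how many shards to provision:

```python
import math

# Per-shard write limits for Kinesis Data Streams.
MAX_WRITE_MB_PER_SHARD = 1   # 1 MB/s per shard
MAX_MSGS_PER_SHARD = 1000    # 1000 records/s per shard

def shards_needed(write_mb_per_s: float, msgs_per_s: float) -> int:
    """Minimum shard count that stays under both per-shard write limits."""
    return max(
        math.ceil(write_mb_per_s / MAX_WRITE_MB_PER_SHARD),
        math.ceil(msgs_per_s / MAX_MSGS_PER_SHARD),
    )

def put_event(stream_name: str, device_id: str, payload: bytes) -> None:
    """Write one record; records with the same partition key land on the same shard."""
    import boto3
    from botocore.exceptions import ClientError

    client = boto3.client("kinesis")
    try:
        client.put_record(
            StreamName=stream_name,
            Data=payload,
            PartitionKey=device_id,  # hashed to pick the target shard
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
            pass  # back off and retry (the KPL handles retries for you)
        else:
            raise
```

For example, a workload of 3.5 MB/s and 2,500 messages/s needs at least 4 shards, since the bandwidth limit dominates.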
Consumer:
Options:
AWS SDK
Lambda (Event source mapping)
KCL (Kinesis Client Library): checkpointing, coordinated reads
Limits:
Classic consumers:
~200 ms latency
2 MB/s at read per shard across all consumers
5 API calls per second per shard across all consumers
Enhanced fan-out consumers:
~70 ms latency
2 MB/s at read per shard, per enhanced consumer
No API calls needed (push model)
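A minimal classic (polling) consumer can be sketched with boto3 as below (stream and shard names are placeholders). The helper makes the classic vs. enhanced fan-out read limits above concrete:

```python
def per_consumer_read_mbps(n_consumers: int, enhanced_fan_out: bool) -> float:
    """Read bandwidth each consumer gets from one shard:
    2 MB/s shared across consumers (classic) vs. 2 MB/s each (enhanced fan-out)."""
    return 2.0 if enhanced_fan_out else 2.0 / n_consumers

def read_shard(stream_name: str, shard_id: str) -> None:
    """Classic polling loop for a single shard."""
    import time
    import boto3

    client = boto3.client("kinesis")
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest record in retention
    )["ShardIterator"]
    while iterator:
        resp = client.get_records(ShardIterator=iterator, Limit=100)
        for record in resp["Records"]:
            print(record["Data"])
        iterator = resp.get("NextShardIterator")
        time.sleep(0.25)  # stay under the 5 GetRecords calls/s/shard limit
```

With 4 classic consumers on one shard, each effectively gets 0.5 MB/s; with enhanced fan-out, each gets the full 2 MB/s.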
Kinesis Firehose (delivers only to a few specific destinations; near real time; serverless)
Fully managed buffered streaming service, no administration, automatic scaling, serverless
The buffer is flushed when the buffer size or time limit is reached, so it's near real-time (minimum ~60 seconds latency for non-full batches).
No data storage: once flushed, the data is gone (cannot be replayed).
Supports many data formats, conversions, transformations, and compression (via Lambda; blueprints available).
Pay for the amount of data going through Firehose
Doesn't use shards to keep data; it only receives streaming data from producers:
SDK / KPL / Kinesis Agent
Kinesis Data Streams
CloudWatch logs & events
IoT rules actions
Can transform streams with Lambda
Can then load only into S3, Redshift, Elasticsearch & Splunk
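A small sketch of the Firehose side (assuming boto3; the delivery stream name is a placeholder). The helper models the buffer-flush rule described above: Firehose delivers when either the size hint or the time hint is reached, whichever comes first:

```python
def should_flush(buffered_bytes: int, seconds_since_flush: float,
                 size_limit_bytes: int = 5 * 1024 * 1024,
                 interval_s: float = 60.0) -> bool:
    """Firehose flushes its buffer when EITHER the size or the time hint is hit."""
    return buffered_bytes >= size_limit_bytes or seconds_since_flush >= interval_s

def send_to_firehose(delivery_stream: str, payload: bytes) -> None:
    """Hand a record to Firehose; buffering and delivery (e.g. to S3) are managed."""
    import boto3

    boto3.client("firehose").put_record(
        DeliveryStreamName=delivery_stream,
        Record={"Data": payload},
    )
```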
Example of architecture:
Kinesis Analytics (perform real time analytics on streams using SQL)
Pay only for resources consumed (but not cheap)
Scales automatically, with real-time latency in the milliseconds-to-seconds range.
Apply SQL-like queries to Kinesis Data Streams / Firehose, then send the results to S3, Redshift, or an Elasticsearch cluster.
Can use Lambda for pre-processing
Use Case:
Streaming ETL
Continuous metric generation
Responsive analytics
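To make "continuous metric generation" concrete, here is a plain-Python sketch of what a tumbling-window aggregation computes (the SQL in the comment is only an illustrative Kinesis-Analytics-style query, not taken from this document):

```python
from collections import Counter

# Roughly what a tumbling-window Analytics SQL query computes, e.g.:
#   SELECT STREAM item, COUNT(*) FROM SOURCE_SQL_STREAM_001
#   GROUP BY item, STEP(ROWTIME BY INTERVAL '60' SECOND);
def tumbling_counts(events, window_s=60):
    """Count events per key within fixed, non-overlapping time windows."""
    windows = {}
    for ts, key in events:  # events: (unix_timestamp, key) pairs
        bucket = int(ts // window_s) * window_s  # start time of the window
        windows.setdefault(bucket, Counter())[key] += 1
    return windows

counts = tumbling_counts([(0, "a"), (10, "a"), (70, "b")])
# window [0, 60) counts {"a": 2}; window [60, 120) counts {"b": 1}
```

Each window is emitted as soon as it closes, which is what makes the metric "continuous" rather than a batch report.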
Scenario:
Streaming Architecture comparison of Kinesis and DynamoDB
Comparison of storages
Tips
Streaming Data != Data Streaming
Streaming Data is about data that is continuously generated by different sources.
Data Streaming is the process of transferring a stream of data from one place to another