Kinesis
Introduction
Kinesis is used to collect, process, and analyze real-time streaming data.
Data is automatically and synchronously replicated across 3 AZs.
Good for:
IoT
Realtime Big Data
Streaming processing
Features
Kinesis services:
Kinesis Data Streams (low latency streaming ingest at scale)
Consists of ordered "shards"; a stream's total capacity is the sum of all its shards. Data can be replayed (re-read) by consumers.
Must manage scaling (shard splitting / merging)
Stores streaming data from producers in shards until it is consumed (e.g. by EC2 instances)
Retention period: 24 hours (default) up to 7 days. Once inserted, data cannot be deleted or modified (immutability).
Billing is per shard provisioned.
Batching (PutRecords) or per-message calls (PutRecord) available.
Need to write your own code for producer / consumer.
Producer:
Options:
AWS SDK (simple producer)
Kinesis Producer Library (KPL): batch, compression, retries, with C++ / Java
Kinesis Agent
Monitor log files and sends them to Kinesis directly
Can write to Kinesis Data Streams and Kinesis Firehose
Limit:
1 MB/s or 1000 messages/s write per shard; exceeding this raises ProvisionedThroughputExceededException.
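As a rough sketch of the producer side (assuming boto3; the stream name and partition key here are placeholders), the per-shard write limits above also tell you how many shards to provision:

```python
import math

# Per-shard write limits for Kinesis Data Streams.
MAX_WRITE_MB_PER_SHARD = 1   # 1 MB/s per shard
MAX_MSGS_PER_SHARD = 1000    # 1000 records/s per shard

def shards_needed(write_mb_per_s: float, msgs_per_s: float) -> int:
    """Minimum shard count that stays under both per-shard write limits."""
    return max(
        math.ceil(write_mb_per_s / MAX_WRITE_MB_PER_SHARD),
        math.ceil(msgs_per_s / MAX_MSGS_PER_SHARD),
    )

def put_event(stream_name: str, device_id: str, payload: bytes) -> None:
    """Write one record; records with the same partition key land on the same shard."""
    import boto3
    from botocore.exceptions import ClientError

    client = boto3.client("kinesis")
    try:
        client.put_record(
            StreamName=stream_name,
            Data=payload,
            PartitionKey=device_id,  # hashed to pick the target shard
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
            pass  # back off and retry (the KPL handles retries for you)
        else:
            raise
```

For example, a workload of 3.5 MB/s and 2,500 messages/s needs at least 4 shards, since the bandwidth limit dominates.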
Consumer:
Options:
AWS SDK
Lambda (Event source mapping)
KCL (Kinesis Client Library): checkpointing, coordinated reads
Limits:
Classic consumers:
~200 ms latency
2 MB/s at read per shard across all consumers
5 API calls per second per shard across all consumers
Enhanced fan-out consumers:
~70 ms latency
2 MB/s at read per shard, per enhanced consumer
No API calls needed (push model)
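A minimal classic (polling) consumer can be sketched with boto3 as below (stream and shard names are placeholders). The helper makes the classic vs. enhanced fan-out read limits above concrete:

```python
def per_consumer_read_mbps(n_consumers: int, enhanced_fan_out: bool) -> float:
    """Read bandwidth each consumer gets from one shard:
    2 MB/s shared across consumers (classic) vs. 2 MB/s each (enhanced fan-out)."""
    return 2.0 if enhanced_fan_out else 2.0 / n_consumers

def read_shard(stream_name: str, shard_id: str) -> None:
    """Classic polling loop for a single shard."""
    import time
    import boto3

    client = boto3.client("kinesis")
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest record in retention
    )["ShardIterator"]
    while iterator:
        resp = client.get_records(ShardIterator=iterator, Limit=100)
        for record in resp["Records"]:
            print(record["Data"])
        iterator = resp.get("NextShardIterator")
        time.sleep(0.25)  # stay under the 5 GetRecords calls/s/shard limit
```

With 4 classic consumers on one shard, each effectively gets 0.5 MB/s; with enhanced fan-out, each gets the full 2 MB/s.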
Kinesis Firehose (delivers only to a few specific destinations; near real time; serverless)
Fully managed buffered streaming service, no administration, automatic scaling, serverless
The buffer is flushed when the buffer size or time limit is reached, so it's near real-time (minimum ~60 seconds latency for non-full batches).
No data storage: once flushed, the data is gone (cannot be replayed).
Supports many data formats, conversions, transformations, and compression (via Lambda; blueprints available).
Pay for the amount of data going through Firehose
Doesn't use shards to keep data; it only receives streaming data from producers:
SDK / KPL / Kinesis Agent
Kinesis Data Streams
CloudWatch logs & events
IoT rules actions
Can transform streams with Lambda
Can then load only into S3, Redshift, Elasticsearch & Splunk
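A small sketch of the Firehose side (assuming boto3; the delivery stream name is a placeholder). The helper models the buffer-flush rule described above: Firehose delivers when either the size hint or the time hint is reached, whichever comes first:

```python
def should_flush(buffered_bytes: int, seconds_since_flush: float,
                 size_limit_bytes: int = 5 * 1024 * 1024,
                 interval_s: float = 60.0) -> bool:
    """Firehose flushes its buffer when EITHER the size or the time hint is hit."""
    return buffered_bytes >= size_limit_bytes or seconds_since_flush >= interval_s

def send_to_firehose(delivery_stream: str, payload: bytes) -> None:
    """Hand a record to Firehose; buffering and delivery (e.g. to S3) are managed."""
    import boto3

    boto3.client("firehose").put_record(
        DeliveryStreamName=delivery_stream,
        Record={"Data": payload},
    )
```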
Example of architecture:
Kinesis Analytics (perform real time analytics on streams using SQL)
Pay only for resources consumed (but not cheap)
Scales automatically, with real-time latency in the milliseconds-to-seconds range.
Apply SQL-like queries to Kinesis Data Streams / Firehose, then send the results to S3, Redshift, or an Elasticsearch cluster.
Can use Lambda for pre-processing
Use Case:
Streaming ETL
Continuous metric generation
Responsive analytics
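To make "continuous metric generation" concrete, here is a plain-Python sketch of what a tumbling-window aggregation computes (the SQL in the comment is only an illustrative Kinesis-Analytics-style query, not taken from this document):

```python
from collections import Counter

# Roughly what a tumbling-window Analytics SQL query computes, e.g.:
#   SELECT STREAM item, COUNT(*) FROM SOURCE_SQL_STREAM_001
#   GROUP BY item, STEP(ROWTIME BY INTERVAL '60' SECOND);
def tumbling_counts(events, window_s=60):
    """Count events per key within fixed, non-overlapping time windows."""
    windows = {}
    for ts, key in events:  # events: (unix_timestamp, key) pairs
        bucket = int(ts // window_s) * window_s  # start time of the window
        windows.setdefault(bucket, Counter())[key] += 1
    return windows

counts = tumbling_counts([(0, "a"), (10, "a"), (70, "b")])
# window [0, 60) counts {"a": 2}; window [60, 120) counts {"b": 1}
```

Each window is emitted as soon as it closes, which is what makes the metric "continuous" rather than a batch report.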
Scenario:
Streaming Architecture comparison of Kinesis and DynamoDB
Comparison of storages
Tips
Streaming Data != Data Streaming
Streaming Data is about data that is continuously generated by different sources.
Data Streaming is the process of transferring a stream of data from one place to another