Deep Dive into Amazon Kinesis Data Streams and Consumers

July 17, 2020 / Eternal Team

Have you ever wondered how services are built that ingest huge volumes of data and still produce reliable output? What kind of machines do they run on? If these questions have been on your mind, fret no more: we will explain how you can do this yourself quite easily and introduce you to AWS Kinesis.

By definition, “AWS Kinesis is a service which allows you to collect, process, and analyze real-time streaming data”.

It has four different variants, each applying to a particular use case.

  • Video Streams
  • Data Streams
  • Data Firehose
  • Data Analytics

The streamed data could be almost anything: clickstream data, video or audio data, application logs, other unstructured data, and so on.

A Kinesis stream is made up of a set of shards, and each shard is a sequence of data records, each with its own unique sequence number.

  • The data capacity of your stream is the sum of the capacities of all of its shards.
  • For reads, each shard gives you five read transactions per second, up to a maximum of two megabytes per second.
  • For writes, each shard accepts 1,000 records per second, up to a maximum of one megabyte per second.
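As a rough illustration of how per-shard limits add up, here is a minimal Python sketch. The constant names and the stream_capacity function are hypothetical helpers written for this post, not part of any AWS SDK; the numbers are simply the per-shard limits listed above.

```python
# Per-shard limits as quoted in the list above (names are illustrative only).
READ_TRANSACTIONS_PER_SEC_PER_SHARD = 5
READ_MB_PER_SEC_PER_SHARD = 2
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000
WRITE_MB_PER_SEC_PER_SHARD = 1


def stream_capacity(shard_count):
    """Return the total capacity of a stream as the sum over its shards."""
    if shard_count < 1:
        raise ValueError("a stream needs at least one shard")
    return {
        "read_transactions_per_sec": shard_count * READ_TRANSACTIONS_PER_SEC_PER_SHARD,
        "read_mb_per_sec": shard_count * READ_MB_PER_SEC_PER_SHARD,
        "write_records_per_sec": shard_count * WRITE_RECORDS_PER_SEC_PER_SHARD,
        "write_mb_per_sec": shard_count * WRITE_MB_PER_SEC_PER_SHARD,
    }


if __name__ == "__main__":
    # Example: total capacity of a four-shard stream.
    for name, value in stream_capacity(4).items():
        print(f"{name}: {value}")
```

For a four-shard stream this gives 20 read transactions and 8 MB of reads per second, plus 4,000 records and 4 MB of writes per second.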

So as your data rate goes up, you will also need to increase the number of shards to handle it. Increasing the number of shards in this way is known as resharding.
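To get a feel for when resharding becomes necessary, the sketch below estimates how many shards a given traffic level needs. It assumes the per-shard limits quoted earlier in this post (1 MB and 1,000 records per second for writes, 2 MB per second for reads); the shards_needed function and its parameter names are hypothetical, and a real sizing exercise should also leave headroom for traffic spikes.

```python
import math

# Per-shard limits quoted earlier in this post (names are illustrative only).
WRITE_MB_PER_SEC_PER_SHARD = 1.0
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000
READ_MB_PER_SEC_PER_SHARD = 2.0


def shards_needed(write_mb_per_sec, write_records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies every per-shard limit."""
    by_write_bytes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    by_write_records = math.ceil(write_records_per_sec / WRITE_RECORDS_PER_SEC_PER_SHARD)
    by_read_bytes = math.ceil(read_mb_per_sec / READ_MB_PER_SEC_PER_SHARD)
    # The tightest limit decides; a stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    before = shards_needed(write_mb_per_sec=3.5, write_records_per_sec=2500, read_mb_per_sec=6)
    after = shards_needed(write_mb_per_sec=5.2, write_records_per_sec=2500, read_mb_per_sec=6)
    print(f"shards before growth: {before}, after growth: {after}")
```

In this made-up example the write volume grows from 3.5 MB to 5.2 MB per second, so the stream would need to be resharded from four shards to six.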

A plus point is that Kinesis has no upfront cost: you only pay for the resources you use, starting from as little as $0.015 per shard-hour.

But what about the consumer (an app built to read and process data records from Kinesis)?

A consumer is typically an EC2 instance running an app that consumes data from your stream. On the consumer, you have the Kinesis Client Library, which tracks the number of shards that exist in your stream and discovers when new shards are added.

Say you increase the number of shards from four to six: it is the Kinesis Client Library that detects the change and responds accordingly. The Kinesis Client Library ensures that for every shard there is a record processor, which actually processes the data being streamed on your Kinesis stream.
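The idea of "one record processor per shard" can be modeled in a few lines of Python. This is a toy simulation only: the RecordProcessor and Worker classes and the shard IDs are hypothetical and are not the Kinesis Client Library's actual API, and the real library handles far more (checkpointing, closing parent shards after a reshard, failures).

```python
class RecordProcessor:
    """Hypothetical stand-in for a record processor bound to one shard."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.processed = []

    def process_records(self, records):
        # A real processor would run the application's own logic here.
        self.processed.extend(records)


class Worker:
    """Toy model of a consumer that keeps exactly one processor per shard."""

    def __init__(self):
        self.processors = {}

    def sync_shards(self, shard_ids):
        """Add processors for new shards and drop processors for removed ones."""
        current = set(shard_ids)
        known = set(self.processors)
        for shard_id in current - known:
            self.processors[shard_id] = RecordProcessor(shard_id)
        for shard_id in known - current:
            del self.processors[shard_id]
        return sorted(self.processors)


if __name__ == "__main__":
    worker = Worker()
    worker.sync_shards([f"shard-{i}" for i in range(4)])
    print("processors with four shards:", len(worker.processors))
    # The stream grows from four shards to six; two processors are added.
    worker.sync_shards([f"shard-{i}" for i in range(6)])
    print("processors with six shards:", len(worker.processors))
```

Existing processors are left untouched when the shard list grows, so only the two new shards get new processors.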

The client library also manages the number of record processors relative to the number of shards and consumers.

If you only have one consumer instance, the Kinesis Client Library creates all of the record processors on that single consumer instance.

However, if you have two or more consumer instances, it load balances across all of them and aims to create an equal number of record processors on each instance.
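The balancing behavior can be pictured with a simple round-robin assignment. This is only an illustrative sketch under that assumption: the assign_shards function and the instance and shard names are hypothetical, and the Kinesis Client Library coordinates its consumers through its own mechanism rather than through code like this.

```python
def assign_shards(shard_ids, instance_ids):
    """Spread shards (one record processor each) across consumer instances."""
    if not instance_ids:
        raise ValueError("at least one consumer instance is required")
    assignment = {instance: [] for instance in instance_ids}
    for index, shard_id in enumerate(shard_ids):
        # Round-robin keeps the per-instance counts within one of each other.
        instance = instance_ids[index % len(instance_ids)]
        assignment[instance].append(shard_id)
    return assignment


if __name__ == "__main__":
    shards = [f"shard-{i}" for i in range(6)]
    # One instance: every record processor runs on it.
    print(assign_shards(shards, ["consumer-a"]))
    # Two instances: three record processors on each.
    print(assign_shards(shards, ["consumer-a", "consumer-b"]))
```

When the shard count does not divide evenly, some instances end up with one more record processor than others, which is as close to equal as the split can get.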

What is the Kinesis Client Library?

The Amazon Kinesis Client Library for Java (Amazon KCL) enables Java developers to easily consume and process data from Amazon Kinesis (https://github.com/awslabs/amazon-kinesis-client).

Conclusion

We covered what Kinesis and shards are. The main thing to be aware of is that on your consumer instances, it is the Kinesis Client Library that does all the work of managing record processors: it creates a record processor for each shard being consumed by your instance.

For more details and pricing please visit: https://aws.amazon.com/kinesis/
