The talk "Serverless data streaming: Amazon Kinesis Data Streams and AWS Lambda" by Anahit Pogosova was the first session I attended at re:Invent 2023. The presentation was packed with practical information and real-world experience that anyone who works with Kinesis needs to understand.
In this blog, I aim to distill some of the key takeaways to help you navigate the complexities of Amazon Kinesis Data Streams, in particular how to deal with partially failed batch requests and poison pills.
One of the first points that struck me was the behavior of AWS Kinesis when handling batch requests.
AWS Kinesis supports batching records to reduce per-request overhead. However, when you send a batch with the PutRecords API call, Amazon Kinesis doesn't guarantee that all records will be ingested: it may ingest some, all, or none of them. Individual records typically fail because of throttling (ProvisionedThroughputExceededException) or an internal error (InternalFailure), whereas a request that exceeds the batch limits (500 records or 5 MB in total) is rejected as a whole.
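To stay within those batch limits, a producer has to split its records into PutRecords-sized chunks. The helper below is a minimal sketch, assuming records are dicts in the shape boto3's `put_records` expects (`Data` as bytes, `PartitionKey` as a string) and that each record has already been checked against the 1 MB per-record limit. `chunk_for_put_records` is a hypothetical name, and the limits are the documented defaults at the time of writing, so check the current Kinesis quotas before relying on them.

```python
from typing import Any, Dict, Iterator, List

# Documented PutRecords limits (verify against the current Kinesis quotas).
MAX_RECORDS_PER_REQUEST = 500
MAX_REQUEST_BYTES = 5 * 1024 * 1024  # 5 MiB per request, partition keys included


def record_size(record: Dict[str, Any]) -> int:
    """Bytes counted for one entry: data blob plus UTF-8 partition key."""
    return len(record["Data"]) + len(record["PartitionKey"].encode("utf-8"))


def chunk_for_put_records(records: List[Dict[str, Any]]) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of records that each fit into a single PutRecords request.

    Assumes every record is already within the per-record size limit.
    """
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for record in records:
        size = record_size(record)
        # Close the current batch if adding this record would break either limit.
        if batch and (len(batch) == MAX_RECORDS_PER_REQUEST
                      or batch_bytes + size > MAX_REQUEST_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded list can be passed as the `Records` argument of a single `put_records` call; fitting the limits only avoids whole-request rejection, so individual records in the batch can still fail.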
Here's the interesting part. When one or more records in a PutRecords request fail, Kinesis doesn't fail the whole request. Instead, it returns a successful HTTP 200 response with a PutRecordsResult object. This object contains a FailedRecordCount and an array of PutRecordsResultEntry items, one per record and in the same order as the request. A successfully ingested entry has a SequenceNumber and a ShardId; a failed entry has an ErrorCode and an ErrorMessage instead.
This behavior can be counterintuitive, since developers often expect all-or-nothing semantics: either every record is ingested or the whole request fails. If you don't handle it, failed records that are never retried are silently lost. It's therefore crucial to inspect the PutRecordsResult (starting with FailedRecordCount) and re-send the failed records, usually with retry logic that uses exponential backoff and jitter, especially for throttling errors.
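Here is a minimal sketch of that retry loop in Python. It assumes a boto3 Kinesis client (or any object with the same `put_records(StreamName=..., Records=...)` method) and relies on the documented fact that result entries come back in the same order as the request. The function name and the default attempt and delay values are illustrative assumptions, not recommended settings.

```python
import random
import time
from typing import Any, Dict, List


def put_records_with_retry(client: Any, stream_name: str,
                           records: List[Dict[str, Any]],
                           max_attempts: int = 5,
                           base_delay: float = 0.1,
                           max_delay: float = 5.0) -> List[Dict[str, Any]]:
    """Call PutRecords, re-sending only the entries that failed.

    Returns the records that still failed after max_attempts (empty on success).
    """
    pending = list(records)
    for attempt in range(max_attempts):
        if not pending:
            return []
        response = client.put_records(StreamName=stream_name, Records=pending)
        if response["FailedRecordCount"] == 0:
            return []
        # HTTP 200 does not mean success: result entries line up with the
        # request by position, and a failed entry carries ErrorCode/ErrorMessage.
        pending = [record for record, result in zip(pending, response["Records"])
                   if "ErrorCode" in result]
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter: sleep a random time up to
            # a cap that doubles on every attempt.
            cap = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, cap))
    return pending
```

A call would look like `put_records_with_retry(boto3.client("kinesis"), "my-stream", batch)`, where `my-stream` is a placeholder. Whatever the function returns still needs to be logged or parked somewhere, otherwise it is lost. Randomizing the sleep ("full jitter") keeps many throttled producers from retrying in lockstep.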
A poison pill in AWS Kinesis is a record that cannot be processed successfully, for example because of its size, data type, or other characteristics, and that therefore fails every time it is retried. Size problems surface even before ingestion: Kinesis Data Streams limits each record to 1 MB, and a larger record is rejected with a validation error (not a ProvisionedThroughputExceededException, which signals throttling) and is never ingested into Amazon Kinesis.
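Because an oversize record is rejected outright, it can be worth checking sizes before sending, so that one large record doesn't get the call rejected and can be diverted for separate handling. This is a minimal sketch, assuming the same boto3-style record dicts and, conservatively, counting the partition key toward the 1 MB (1 MiB) limit. `split_oversized` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

# Per-record limit; counting the partition key is a conservative assumption.
MAX_RECORD_BYTES = 1024 * 1024


def split_oversized(records: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate records that fit the per-record limit from ones Kinesis would reject."""
    accepted: List[Dict[str, Any]] = []
    oversized: List[Dict[str, Any]] = []
    for record in records:
        size = len(record["Data"]) + len(record["PartitionKey"].encode("utf-8"))
        (accepted if size <= MAX_RECORD_BYTES else oversized).append(record)
    return accepted, oversized
```

The `oversized` list could then be stored elsewhere (for example in object storage) and referenced from a small record on the stream, which is one common workaround rather than anything Kinesis does for you.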
When you consume data from Kinesis with the Amazon Kinesis Client Library (KCL), a poison pill record can halt progress on the entire shard, because records within a shard are processed in order.
This problem is made worse by default settings. A consumer that starts from the TRIM_HORIZON iterator reads from the oldest record still retained in the stream, and if failed records are retried indefinitely (as a Lambda event source mapping does by default), the consumer keeps trying to process the poison pill until the record expires, which takes 24 hours with the default retention period. Meanwhile, every record behind it in that shard waits.
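The underlying problem is unbounded, in-order retries. The sketch below is a framework-agnostic model (not the KCL or Lambda API) of a consumer that still processes a shard's records in order, but caps the attempts per record and diverts a record that keeps failing so the records behind it are not blocked. `process_shard_in_order`, `handle`, and the in-memory dead-letter list are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple


def process_shard_in_order(records: List[bytes],
                           handle: Callable[[bytes], None],
                           max_attempts: int = 3) -> Tuple[int, List[bytes]]:
    """Process records strictly in order, giving up on a record after max_attempts.

    Returns (number processed successfully, records diverted as poison pills).
    Without the attempt cap, one bad record would block everything after it.
    """
    processed = 0
    dead_letters: List[bytes] = []
    for record in records:
        for attempt in range(1, max_attempts + 1):
            try:
                handle(record)
                processed += 1
                break
            except Exception:
                if attempt == max_attempts:
                    # Park the record for later inspection and move on.
                    dead_letters.append(record)
    return processed, dead_letters
```

In a real consumer, the dead-letter list would be a durable store, and you would still checkpoint past the diverted record so the shard keeps moving.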
The poison pill failure mode can be mitigated with on-failure destinations, event source failure handling, and bisect batch on error.
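Destinations give failed records an alternate place to go. When Lambda cannot process a batch from AWS Kinesis, instead of retrying indefinitely it can send details of the failed batch (such as the shard ID and the range of sequence numbers) to an on-failure destination, for example an Amazon SQS queue or an Amazon SNS topic, once the configured retry or record-age limit is reached. Having a separate place for failed records ensures that your stream continues processing the next records, and you can use those details to fetch and inspect the failed records later.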
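As a sketch of what this looks like in configuration, the helper below builds keyword arguments for the Lambda CreateEventSourceMapping API (boto3's `create_event_source_mapping`). The ARNs, function name, batch size, and the retry and age limits are placeholder assumptions. The key point is that a destination only receives a batch after the retry or record-age limit is reached, so leaving retries unbounded defeats its purpose.

```python
from typing import Any, Dict


def kinesis_mapping_params(stream_arn: str, function_name: str,
                           on_failure_arn: str) -> Dict[str, Any]:
    """Keyword arguments for a Kinesis -> Lambda event source mapping.

    All numeric values are illustrative, not recommendations.
    """
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",
        "BatchSize": 100,
        # The default is to retry until the records expire, which would delay
        # the destination until data ages out of the stream; bound it instead.
        "MaximumRetryAttempts": 3,
        "MaximumRecordAgeInSeconds": 3600,
        # Where Lambda sends details of a batch it gave up on.
        "DestinationConfig": {"OnFailure": {"Destination": on_failure_arn}},
    }
```

The result would be passed as `boto3.client("lambda").create_event_source_mapping(**params)`.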
When Lambda reads from AWS Kinesis and the function invocation fails, Lambda by default retries the whole batch until it succeeds or the data expires. You can bound this on the event source mapping with MaximumRetryAttempts and MaximumRecordAgeInSeconds. With partial batch responses (ReportBatchItemFailures), the function can also report the specific record that failed, so that Lambda retries from that record instead of reprocessing the whole batch.
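With partial batch responses enabled (`FunctionResponseTypes=["ReportBatchItemFailures"]` on the event source mapping), the handler can tell Lambda which record failed. This is a minimal sketch, assuming JSON payloads; `process` is a hypothetical stand-in for your business logic. For Kinesis, the item identifier is the record's sequence number, and Lambda resumes from the earliest reported record, so the handler stops at the first failure.

```python
import base64
import json
from typing import Any, Dict


def process(payload: Dict[str, Any]) -> None:
    """Hypothetical business logic; raises on records it cannot handle."""
    if "id" not in payload:
        raise ValueError("missing id")


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    for record in event["Records"]:
        kinesis = record["kinesis"]
        try:
            # Kinesis data arrives base64-encoded in the Lambda event.
            payload = json.loads(base64.b64decode(kinesis["data"]))
            process(payload)
        except Exception:
            # Report the first failing record and stop; Lambda retries
            # starting from this sequence number.
            return {"batchItemFailures": [{"itemIdentifier": kinesis["sequenceNumber"]}]}
    # An empty list tells Lambda the whole batch succeeded.
    return {"batchItemFailures": []}
```

Combined with a retry limit and an on-failure destination, this keeps a single bad record from forcing the good records before it to be reprocessed.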
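Bisect batch on error is another way to handle poison pills. If a batch contains a bad record, the function invocation fails. If you enable bisect on error (BisectBatchOnFunctionError), Lambda splits the failed batch in two and retries each half separately, so repeated splitting eventually isolates the bad record while the good records get processed.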
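To see why bisecting isolates the bad record, here is a simplified local model of the behavior in Python. It is not Lambda's implementation: it ignores retry limits and timing, and `invoke` is a hypothetical stand-in for a function invocation.

```python
from typing import Callable, List


def bisect_on_error(batch: List[str], invoke: Callable[[List[str]], None]) -> List[str]:
    """Simplified model of bisect-on-error; returns the records that fail alone.

    Each failing batch is split in half and both halves are retried, so the
    good records eventually succeed and the poison pill is isolated.
    """
    try:
        invoke(batch)
        return []
    except Exception:
        if len(batch) == 1:
            # Isolated poison pill: this is what would go to the destination.
            return batch
        mid = len(batch) // 2
        return bisect_on_error(batch[:mid], invoke) + bisect_on_error(batch[mid:], invoke)
```

Note that good records in a failing half are invoked more than once, so the function should be idempotent.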
The insights from Anahit Pogosova's session at re:Invent 2023 are invaluable for anyone looking to integrate Kinesis Data Streams into their applications. Understanding how Amazon Kinesis responds to failed records in batch requests, how to mitigate poison pills, and (as the talk also covers) the limitations of AWS Kinesis On-Demand is vital before you build on the service. For much more detail, watch the full video of the presentation.
At StratusGrid, we specialize in providing solutions and expert guidance to help you maximize the potential of your Amazon Kinesis projects. Whether you're grappling with batch request nuances, poison pill challenges, or seeking to optimize your data streaming architecture, our team of AWS experts is ready to assist.
Contact StratusGrid today and let's work together to turn your data streaming challenges into successes.