AWS Lambda is a managed service that enables you to deploy application code. You can write function code in many different programming languages, such as JavaScript, Python, Rust, and more.
One of the major benefits of running your business logic in AWS Lambda functions is that you’re billed for usage down to the millisecond level of granularity. Lambda also handles scaling for you automatically by running parallel executions of your function code.
AWS Lambda functions have a service-imposed maximum execution time of 15 minutes. In some application architectures, a Lambda function might be expected to execute for only a few milliseconds, or seconds. When Lambda functions are invoked, the request can be synchronous or asynchronous.
Synchronous invocations wait for the function to finish and return its result to the caller, whereas asynchronous invocations queue the event and return immediately, so the function runs in the background without returning output to the caller.
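As a rough sketch of the two invocation modes, the helper below wraps the Lambda `invoke` API call. The name `invoke_function` and its parameters are illustrative, and `client` is assumed to be a boto3 Lambda client such as `boto3.client("lambda")`.

```python
import json


def invoke_function(client, function_name, payload, wait_for_result=True):
    """Invoke a Lambda function synchronously or asynchronously.

    `client` is assumed to be a boto3 Lambda client, e.g. boto3.client("lambda").
    """
    response = client.invoke(
        FunctionName=function_name,
        # "RequestResponse" waits for the function's result;
        # "Event" queues the event and returns right away.
        InvocationType="RequestResponse" if wait_for_result else "Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    if wait_for_result:
        # For synchronous calls, Payload is a stream holding the function's output.
        return json.loads(response["Payload"].read())
    # Asynchronous calls return a status code (202 when accepted) and no function output.
    return response["StatusCode"]
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without making real AWS calls.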
For example, a Lambda function might retrieve data from a database and send it back to the caller. Another practical example would be a user registering an account for a SaaS application, where a Lambda function creates a new user record in a data store such as Amazon DynamoDB. For CPU-intensive data processing tasks, such as converting media file formats, a Lambda function could execute asynchronously for several minutes.
Each Lambda function can be configured with its own timeout threshold, in seconds. The default is 3 seconds, and the maximum is 900 seconds (15 minutes). If your function code exceeds the configured execution time, the Lambda service terminates the function execution.
Although having a practical timeout threshold is important for cost control and predictable performance, premature termination of functions can also cause unpredictable application behavior. It’s best to configure your timeout for a reasonable value that’s slightly longer than your function’s expected execution time.
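One way to reduce the impact of premature termination is to have the function watch its own remaining time and stop cleanly. The sketch below uses the `get_remaining_time_in_millis()` method that Lambda provides on the Python context object. The 500 ms margin, the `items` event key, and the `process_item` helper are assumptions made for illustration.

```python
SAFETY_MARGIN_MS = 500  # assumed buffer; tune for your workload


def handler(event, context):
    """Process items until done or until the timeout is close."""
    processed = []
    pending = list(event.get("items", []))
    while pending:
        # get_remaining_time_in_millis() is provided by the Lambda context object.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break  # stop cleanly instead of being terminated mid-item
        processed.append(process_item(pending.pop(0)))
    return {"processed": processed, "remaining": pending}


def process_item(item):
    """Hypothetical unit of work; replace with real logic."""
    return item * 2
```

Returning the unprocessed items lets the caller decide whether to invoke the function again with the remainder.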
Here are some common scenarios where Lambda function timeouts could occur.
When a Lambda function is invoked asynchronously and fails execution due to a timeout, the function is retried up to 2 times. Each time your Lambda function is retried, you will incur the cost of the function’s execution duration.
This can increase your cost for AWS Lambda compute time and invocation requests significantly, especially if your function is being invoked and retried frequently, and if you have a long timeout threshold configured. You can modify the default Lambda retry behavior to retry once or not at all.
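To make the cost effect concrete, here is a minimal sketch that estimates the worst-case charge for one asynchronous event whose every attempt runs until the timeout, plus a helper that lowers the retry count through the `put_function_event_invoke_config` API call. The prices are assumed example values, so check current AWS pricing for your Region and architecture. The function names are illustrative, and `client` is assumed to be `boto3.client("lambda")`.

```python
# Assumed example prices; check current AWS pricing before relying on them.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def timed_out_invocation_cost(timeout_s, memory_mb, retries=2):
    """Worst-case cost of one async event whose every attempt times out."""
    attempts = 1 + retries  # the original attempt plus each retry
    gb_seconds = attempts * timeout_s * (memory_mb / 1024)
    return gb_seconds * PRICE_PER_GB_SECOND + attempts * PRICE_PER_REQUEST


def set_async_retries(client, function_name, max_retries):
    """Limit async retries; `client` is assumed to be boto3.client("lambda")."""
    if max_retries not in (0, 1, 2):
        raise ValueError("Lambda accepts 0, 1, or 2 retry attempts")
    return client.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=max_retries,
    )
```

Under these assumed prices, a 1,024 MB function with a 900-second timeout that times out on all three attempts costs roughly $0.045 per event, three times the cost of the same event with retries disabled.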
When you’re deploying AWS Lambda functions, you can start with a higher timeout value, and progressively reduce it once you identify how long your function takes to execute, on average.
AWS Lambda automatically emits metrics to the Amazon CloudWatch service that help you understand your function’s behavior. Keep an eye on the Duration metric to see how long your function takes to execute, and use this data to guide your timeout threshold.
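The following sketch shows one way to pull the Duration metric and turn it into a candidate timeout. It uses the CloudWatch `get_metric_statistics` API call; the function names, the 7-day window, and the 1.5x headroom factor are assumptions for illustration, and `cloudwatch` is assumed to be `boto3.client("cloudwatch")`.

```python
from datetime import datetime, timedelta, timezone
import math


def max_duration_ms(cloudwatch, function_name, days=7):
    """Return the highest Duration (ms) seen in the last `days` days, or None.

    `cloudwatch` is assumed to be boto3.client("cloudwatch").
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Duration",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    return max((dp["Maximum"] for dp in datapoints), default=None)


def suggest_timeout_seconds(observed_max_ms, headroom=1.5):
    """Suggest a timeout with headroom, clamped to Lambda's 1-900 second range."""
    return min(900, max(1, math.ceil(observed_max_ms / 1000 * headroom)))
```

For instance, an observed maximum of 2,000 ms with the assumed 1.5x headroom suggests a 3-second timeout. Treat the result as a starting point and review it against your function's behavior under load.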
Stratusphere™ FinOps is a Software-as-a-Service (SaaS) platform from StratusGrid that enables you to identify excessive timeouts in AWS Lambda functions across your entire organization.
According to the documentation for AWS Trusted Advisor (one of Stratusphere™ FinOps’s data sources), excessive timeouts are flagged when “> 10% of invocations end in an error due to a timeout on any given day within the last 7 days.” For example, if a function is invoked 1,000 times in a day, and more than 100 of those invocations exceed the configured timeout threshold, then this alert will be triggered.
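The quoted rule can be expressed as a simple ratio check. This is a minimal sketch that reads "> 10%" as a strict inequality evaluated per day; the function name and the handling of days with no invocations are assumptions, not Trusted Advisor's actual implementation.

```python
def has_excessive_timeouts(invocations, timeouts, threshold=0.10):
    """True when the share of timed-out invocations is strictly above the threshold."""
    if invocations <= 0:
        return False  # assumption: a day with no invocations is never flagged
    return timeouts / invocations > threshold


# Example daily counts: 150 timeouts in 1,000 invocations is 15%, so it is flagged;
# 50 timeouts in 1,000 invocations is 5%, so it is not.
flagged = has_excessive_timeouts(1000, 150)
not_flagged = has_excessive_timeouts(1000, 50)
```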
In the Stratusphere™ FinOps web interface, navigate to the Findings section. Select the Service filter drop-down box and choose AWS Lambda. Select the Levels of Effort filter and select High. Stratusphere™ FinOps classifies the Level of Effort required for remediation as High, due to the potential for code debugging and external service investigation that may be required.
In the list below, you will see the AWS Lambda functions that are candidates for remediation due to excessive timeouts. Each of the AWS Lambda functions has a unique Amazon Resource Name (ARN), shown under the Resource ID column. The ARN tells you which AWS account and Region the function belongs to, so that you can locate it for remediation.
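If you are working with many findings, the account and Region can be extracted from each ARN programmatically. The sketch below assumes the standard Lambda function ARN layout, `arn:partition:lambda:region:account-id:function:name`, with an optional version or alias qualifier; the function name `parse_lambda_arn` and the sample ARN are illustrative.

```python
def parse_lambda_arn(arn):
    """Split a Lambda function ARN into its components.

    Expected shape: arn:partition:lambda:region:account-id:function:name[:qualifier]
    """
    parts = arn.split(":")
    valid = (
        len(parts) in (7, 8)
        and parts[0] == "arn"
        and parts[2] == "lambda"
        and parts[5] == "function"
    )
    if not valid:
        raise ValueError(f"Not a Lambda function ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "function_name": parts[6],
        "qualifier": parts[7] if len(parts) == 8 else None,
    }


# Sample ARN with a placeholder account ID.
info = parse_lambda_arn("arn:aws:lambda:us-east-1:123456789012:function:my-function")
```

Here `info["region"]` is `"us-east-1"` and `info["account_id"]` is `"123456789012"`.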
Before you apply a remediation technique for your AWS Lambda functions, you should consider the risks below.
Depending on the reason for the excessive timeouts in your AWS Lambda function, there may be a different approach to resolving it.
Let’s explore each of these remediations in more depth, in the sections below.
If your code takes longer to execute than the currently configured Lambda function timeout threshold, then you can simply increase the threshold to accommodate the estimated execution time. To update the Lambda function’s timeout threshold, follow these steps.
This updated configuration should give your Lambda function code adequate time to execute. Feel free to repeat these steps to adjust the timeout period according to your requirements.
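The same change can also be scripted. Here is a minimal sketch using the `update_function_configuration` API call; the wrapper name `set_function_timeout` is illustrative, and `client` is assumed to be `boto3.client("lambda")`.

```python
def set_function_timeout(client, function_name, timeout_seconds):
    """Update a function's timeout and return the value Lambda reports back.

    `client` is assumed to be a boto3 Lambda client, e.g. boto3.client("lambda").
    """
    # Lambda accepts timeouts from 1 second up to 900 seconds (15 minutes).
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("timeout_seconds must be between 1 and 900")
    response = client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )
    return response["Timeout"]
```

Validating the range locally gives a clearer error than waiting for the service to reject the request.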
If your Lambda function code depends on an external service, you will want to ensure that this service is healthy and responsive. Examples of service dependencies include a REST API developed by an internal team, an external SaaS REST API, or a database service (e.g., MySQL or PostgreSQL).
StratusGrid recommends implementing observability tools, such as Amazon CloudWatch or Datadog, to ensure that your services are responding with the expected results, and in a timely manner.
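Alongside a dedicated observability tool, you can time dependency calls inside the function itself so that slow responses show up in your logs. This is a rough sketch only: `timed_call`, its return shape, and the one-second threshold are assumptions for illustration, and `call` stands in for whatever request your code makes to the dependency.

```python
import time


def timed_call(call, slow_after_s=1.0):
    """Run `call()` and report how long it took and whether it failed."""
    started = time.perf_counter()
    try:
        result, error = call(), None
    except Exception as exc:  # report the failure instead of hiding it
        result, error = None, exc
    elapsed = time.perf_counter() - started
    return {
        "result": result,
        "error": error,
        "elapsed_s": elapsed,
        "slow": elapsed > slow_after_s,
    }
```

A handler could log the returned dictionary for each dependency call, which makes it easier to tell whether a timeout came from your own code or from a slow downstream service.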
Because AWS Lambda simply executes the code that you hand off to it, a bug in that code can cause an unintended infinite loop. For example, consider the contrived scenario in the function code in the screenshot below. This code will never return a result to the Lambda runtime, because it is caught in an infinite “while” loop.
To fix this, we need to specify a condition to break out of the loop, so the rest of the Lambda handler function can execute. In Python, we can use the break statement to exit the loop and continue with the code that follows it. Alternatively, we could eliminate the loop from the function code entirely, and find another method of accomplishing our code’s objective.
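As an illustration of the fix, the handler below bounds its loop with an explicit exit condition. It is a contrived sketch rather than the code from the original example: `MAX_ATTEMPTS` and the `check_for_result` helper are hypothetical.

```python
MAX_ATTEMPTS = 5  # assumed upper bound; pick one that suits your workload


def handler(event, context):
    """Poll for a result, but never loop forever."""
    attempts = 0
    result = None
    while True:
        attempts += 1
        result = check_for_result(attempts)
        if result is not None or attempts >= MAX_ATTEMPTS:
            break  # exit condition: got a result or ran out of attempts
    return {"result": result, "attempts": attempts}


def check_for_result(attempt):
    """Hypothetical check that succeeds on the third attempt."""
    return "done" if attempt >= 3 else None
```

Capping the number of attempts guarantees that the handler returns even if the condition it is waiting for never becomes true.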
Frequent timeouts of AWS Lambda functions can drive up your cloud spend unnecessarily, and create instability in your business applications.
Using Stratusphere™ FinOps from StratusGrid, you can easily identify Lambda functions that are experiencing high rates of timeouts, and flag them for remediation. Once these remediation opportunities have been identified, your software development teams can take the necessary steps to remediate each function, according to its unique requirements.
If you're facing challenges with Lambda function timeouts or simply want to ensure your AWS Lambda functions are optimized for performance and cost-efficiency, we're here to help. Contact us now to take the first step towards optimized Lambda function performance.
Check out these resources for more information about AWS Lambda functions.