How to Remediate Lambda Functions with Excessive Timeouts in Stratusphere™ FinOps

Discover how to tackle excessive timeouts in AWS Lambda. Learn strategies to optimize your functions in Stratusphere™ FinOps and boost your cloud efficiency.

AWS Lambda is a managed service that enables you to deploy application code. You can write function code in many different programming languages, such as JavaScript, Python, Rust, and more.

One of the major benefits of running your business logic in AWS Lambda functions is that you’re billed for usage down to the millisecond level of granularity. Lambda also handles scaling for you automatically by running parallel executions of your function code.

AWS Lambda functions have a service-imposed maximum execution time of 15 minutes. In some application architectures, a Lambda function might be expected to execute for only a few milliseconds, or seconds. When Lambda functions are invoked, the request can be synchronous or asynchronous.

Synchronous invocations wait for the function to finish and return its result to the caller, whereas asynchronous invocations are queued by Lambda and run in the background without returning the function’s output to the caller.
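In the Lambda Invoke API, the two modes are selected with the InvocationType parameter. The sketch below assumes a boto3 Lambda client (for example, boto3.client("lambda")) is passed in; the helper names and the function name in the usage note are hypothetical.

```python
import json


def invoke_sync(lambda_client, function_name, payload):
    """Invoke the function and wait for its result (RequestResponse)."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # The function's output arrives as a readable stream under "Payload".
    return json.loads(response["Payload"].read())


def invoke_async(lambda_client, function_name, payload):
    """Queue the event and return right away (Event); no function output."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Lambda acknowledges an accepted asynchronous event with status 202.
    return response["StatusCode"] == 202
```

A call such as invoke_sync(boto3.client("lambda"), "my-function", {"user_id": 42}) would block until the function returns, while invoke_async with the same arguments would only confirm that the event was accepted.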

For example, a Lambda function might retrieve data from a database and send it back to the caller. Another practical example would be a user registering an account for a SaaS service. A Lambda function might create a new user record in a storage mechanism such as Amazon DynamoDB. For CPU-intensive data processing tasks, such as converting media file formats, a Lambda function could execute asynchronously, for several minutes.

Lambda Function Timeout Period

Each Lambda function can be configured with its own timeout threshold, in seconds. The default configuration is set for 3 seconds. If your function code exceeds this execution time, then the Lambda service will kill the function execution.

Although having a practical timeout threshold is important for cost control and predictable performance, premature termination of functions can also cause unpredictable application behavior. It’s best to configure your timeout for a reasonable value that’s slightly longer than your function’s expected execution time.
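One way to soften the impact of a premature termination is to have the handler watch its own remaining time and stop cleanly before Lambda kills it. The sketch below relies on the get_remaining_time_in_millis() method that the Python runtime exposes on the context object; the process_item helper, the event shape, and the 500 ms safety margin are assumptions made for illustration.

```python
SAFETY_MARGIN_MS = 500  # assumed buffer; tune for your own workload


def process_item(item):
    """Hypothetical placeholder for the real per-item work."""
    return str(item).upper()


def lambda_handler(event, context):
    """Process items until finished or until the timeout is close."""
    items = event.get("items", [])
    processed = []
    for item in items:
        # Stop early instead of being terminated mid-item by the Lambda service.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        processed.append(process_item(item))
    return {
        "processed": len(processed),
        "remaining": len(items) - len(processed),
    }
```

Returning the count of unprocessed items lets the caller decide whether to re-invoke the function with the remainder, rather than discovering a timeout after the fact.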

Here are some common scenarios where Lambda function timeouts could occur.

  • Function code generally requires a longer timeout threshold
  • Infinite loop bug in function code
  • External service dependency takes longer than expected to respond

Lambda Timeout Retries

When a Lambda function is invoked asynchronously and fails execution due to a timeout, the function is retried up to 2 times. Each time your Lambda function is retried, you will incur the cost of the function’s execution duration.
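To get a feel for the worst case, the arithmetic can be sketched as follows. It assumes every attempt runs all the way to the timeout, and it reports GB-seconds (the unit Lambda uses for duration billing) rather than a dollar amount, since pricing varies by region and architecture. The function names are hypothetical.

```python
def worst_case_billed_seconds(timeout_seconds, max_retries=2):
    """Billed duration if the first attempt and every retry all time out."""
    attempts = 1 + max_retries
    return timeout_seconds * attempts


def worst_case_gb_seconds(timeout_seconds, memory_mb, max_retries=2):
    """The same worst case expressed in GB-seconds."""
    seconds = worst_case_billed_seconds(timeout_seconds, max_retries)
    return seconds * memory_mb / 1024


# A 15-minute timeout with the default 2 retries, at 512 MB of memory:
print(worst_case_billed_seconds(900))   # 2700
print(worst_case_gb_seconds(900, 512))  # 1350.0
```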

This can increase your cost for AWS Lambda compute time and invocation requests significantly, especially if your function is being invoked and retried frequently, and if you have a long timeout threshold configured. You can modify the default Lambda retry behavior to retry once or not at all.
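As a rough sketch, the retry count for asynchronous invocations can be set with the put_function_event_invoke_config call on a boto3 Lambda client. The helper name below is hypothetical, and the client is passed in so the snippet stays independent of your credentials setup. Note that this call is understood to replace any existing asynchronous invocation configuration on the function, so check for destinations or event-age settings before using it.

```python
def set_async_retries(lambda_client, function_name, max_retries):
    """Set how many times Lambda retries a failed asynchronous invocation."""
    # The service accepts 0 (never retry), 1, or 2 (the default).
    if max_retries not in (0, 1, 2):
        raise ValueError("max_retries must be 0, 1, or 2")
    return lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=max_retries,
    )
```

For a function whose timeouts are not transient, set_async_retries(boto3.client("lambda"), "my-function", 0) would stop Lambda from paying for two more doomed attempts; "my-function" is a placeholder name.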

Specifying the Correct Timeout Threshold

When you’re deploying AWS Lambda functions, you can start with a higher timeout value, and progressively reduce it once you identify how long your function takes to execute, on average.

AWS Lambda automatically emits metrics to the Amazon CloudWatch service that help you understand your function’s behavior. Keep an eye on the Duration metric to see how long your function takes to execute, and use this data to guide your timeout threshold.
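A minimal sketch of that workflow, assuming a boto3 CloudWatch client is passed in: it reads the hourly maximum of the Duration metric (reported in milliseconds) and turns the largest value into a suggested timeout. The helper names and the 1.5x headroom factor are assumptions, not an AWS or StratusGrid recommendation. If the function is already timing out, the observed maximum will likely sit at the current timeout, so treat the result as a lower bound in that case.

```python
import math
from datetime import datetime, timedelta, timezone


def max_duration_ms(cloudwatch_client, function_name, days=7):
    """Return the largest observed Duration (ms) over the last `days` days."""
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Duration",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    # None signals that the function had no invocations in the window.
    return max((point["Maximum"] for point in datapoints), default=None)


def suggest_timeout_seconds(observed_max_ms, headroom=1.5):
    """Round the padded duration up to whole seconds, within Lambda's limits."""
    seconds = math.ceil(observed_max_ms * headroom / 1000)
    return min(max(seconds, 1), 900)  # Lambda allows 1 to 900 seconds
```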

Identify Excessive Timeouts in Lambda

Stratusphere™ FinOps is a Software-as-a-Service (SaaS) platform from StratusGrid that enables you to identify excessive timeouts in AWS Lambda functions across your entire organization.

According to the documentation for AWS Trusted Advisor, one of Stratusphere™ FinOps’s data sources, excessive timeouts are flagged when “> 10% of invocations end in an error due to a timeout on any given day within the last 7 days.” For example, if a function is invoked 1,000 times in a day, and more than 100 of those invocations exceed the configured timeout threshold, then this alert will be triggered.
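Reading the quoted rule literally as a strict "greater than 10%" comparison, the check can be sketched like this. The input shape (a mapping of day to invocation and timeout counts) and the function name are assumptions for illustration; this is not how Trusted Advisor or Stratusphere™ FinOps is implemented internally.

```python
def days_with_excessive_timeouts(daily_counts, threshold_percent=10):
    """Return the days on which more than threshold_percent of invocations timed out.

    daily_counts maps a day label to an (invocations, timeouts) tuple.
    """
    flagged = []
    for day, (invocations, timeouts) in daily_counts.items():
        # Integer arithmetic avoids floating-point edge cases at exactly 10%.
        if invocations > 0 and timeouts * 100 > threshold_percent * invocations:
            flagged.append(day)
    return flagged


last_week = {
    "2024-05-01": (1000, 100),  # exactly 10%: not flagged
    "2024-05-02": (1000, 101),  # just over 10%: flagged
    "2024-05-03": (0, 0),       # no invocations: not flagged
}
print(days_with_excessive_timeouts(last_week))  # ['2024-05-02']
```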

In the Stratusphere™ FinOps web interface, navigate to the Findings section. Select the Service filter drop-down box and choose AWS Lambda. Select the Levels of Effort filter and select High. Stratusphere™ FinOps classifies the Level of Effort required for remediation as High, due to the potential for code debugging and external service investigation that may be required.

[Image: Stratusphere™ FinOps Findings filters]

In the list below, you will see the AWS Lambda functions that are candidates for remediation due to excessive timeouts. Each AWS Lambda function has a unique Amazon Resource Name (ARN), shown under the Resource ID column. The ARN tells you which AWS account and region the function belongs to, so that you can locate it for remediation.
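Lambda function ARNs follow the pattern arn:partition:lambda:region:account-id:function:name, optionally followed by a version or alias. A small sketch for pulling the account and region out of a Resource ID value; the helper name and the sample ARN are made up for illustration.

```python
def parse_lambda_arn(arn):
    """Split a Lambda function ARN into its account, region, and name."""
    parts = arn.split(":")
    if (
        len(parts) < 7
        or parts[0] != "arn"
        or parts[2] != "lambda"
        or parts[5] != "function"
    ):
        raise ValueError(f"Not a Lambda function ARN: {arn}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "function_name": parts[6],
        # A version number or alias may follow the function name.
        "qualifier": parts[7] if len(parts) > 7 else None,
    }


info = parse_lambda_arn("arn:aws:lambda:us-east-1:123456789012:function:my-function")
print(info["account_id"], info["region"])  # 123456789012 us-east-1
```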

[Image: Stratusphere™ FinOps Findings dashboard]

Risks & Other Considerations

Before you apply a remediation technique for your AWS Lambda functions, you should consider the risks below.

  • Risk: Increasing the Lambda function timeout threshold can increase cloud costs
    • Consideration: Arbitrarily increasing your Lambda function timeout, without understanding what is causing it, can increase your AWS cloud spend. Make sure that you thoroughly test your code and understand the approximate, reasonable execution time. If the code is not executing within that threshold, you may want to add debugging messages to help identify what is causing performance slowdowns, or try increasing the memory allocation to the function.

Remediation

Depending on the reason for the excessive timeouts in your AWS Lambda function, there may be a different approach to resolving it.

  • Increase your Lambda function timeout threshold
  • Investigate and resolve external service issues, and add monitoring
  • Fix infinite loop in Lambda function code

Let’s explore each of these remediations in more depth, in the sections below.

Increase Lambda Function Timeout

If your code takes longer to execute than the currently configured Lambda function timeout threshold, then you can simply increase the threshold to accommodate the estimated execution time. To update the Lambda function’s timeout threshold, follow these steps.

  1. Log in to the AWS Management Console
  2. Select the AWS Region where your Lambda function exists
  3. Navigate to the AWS Lambda service
  4. Select your function from the Functions list
  5. Select the Configuration tab from the function details
  6. Select the General Configuration option
  7. Click the Edit button
  8. Specify a new value for the Timeout section
  9. Click the Save button

[Image: AWS Lambda timeout settings]

This updated configuration should give your Lambda function code adequate time to execute. Feel free to repeat these steps to adjust the timeout period according to your requirements.
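If you have many functions to adjust, the console steps above can also be scripted. The sketch below assumes a boto3 Lambda client created for the function's region is passed in, and uses the update_function_configuration call; the helper name is hypothetical.

```python
def set_function_timeout(lambda_client, function_name, timeout_seconds):
    """Update a function's timeout and return the value Lambda reports back."""
    # Lambda accepts timeouts from 1 second up to 900 seconds (15 minutes).
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("timeout_seconds must be between 1 and 900")
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )
    return response["Timeout"]
```

For example, set_function_timeout(boto3.client("lambda", region_name="us-east-1"), "my-function", 30) would raise the timeout of a placeholder function named my-function to 30 seconds.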

Fix External Service Dependency

If your Lambda function code depends on an external service, you will want to ensure that this service is healthy and responsive. Examples of service dependencies include a REST API developed by an internal team, an external SaaS REST API, or a database service (e.g., MySQL or PostgreSQL).

StratusGrid recommends implementing observability tools, such as Amazon CloudWatch or Datadog, to ensure that your services are responding with the expected results, and in a timely manner.
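Even before a full observability tool is in place, you can get a first signal by timing each dependency call inside the function and logging the result; anything a Lambda function prints is sent to Amazon CloudWatch Logs. The wrapper below is only a sketch: its name, the log field names, and the 1,000 ms "slow" threshold are assumptions.

```python
import json
import time


def timed_call(name, func, *args, slow_threshold_ms=1000, **kwargs):
    """Call func, then log how long the named dependency took to respond."""
    start = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        # Runs on success and on failure, so slow errors are recorded too.
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(json.dumps({
            "dependency": name,
            "elapsed_ms": round(elapsed_ms, 1),
            "slow": elapsed_ms > slow_threshold_ms,
        }))
```

Wrapping a call as timed_call("users-api", fetch_user, user_id), where fetch_user is your own function, produces one structured log line per call that can be searched or turned into a metric later.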

Fix Infinite Loop Bug

Considering that AWS Lambda simply executes the code that you hand off to it, it’s possible for your code to cause an unexpected infinite loop. For example, consider the contrived scenario in the function code in the screenshot below. This code will never return a result to the Lambda runtime, because it is caught in an infinite “while” loop.

To fix this, we need to specify a condition to break out of the loop, so the rest of the Lambda handler function can execute. In Python, we can simply specify the break statement to break execution out of the loop and continue on. Alternatively, we could simply eliminate the loop from the function code entirely, and find another method of accomplishing our code’s objective.
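As a contrived sketch of this kind of bug and one possible fix, consider a handler that polls a status until some work is done. The function names, the "done" status value, and the event shape are all hypothetical. The fixed version uses break as described above, and also caps the number of attempts so that the loop ends even if the status never changes.

```python
def poll_forever(check_status):
    """Buggy: the loop has no exit path, so the handler never returns."""
    while True:
        check_status()
        # Bug: nothing here breaks out of the loop, even when the work is done.


def poll_with_exit(check_status, max_attempts=10):
    """Fixed: break once the work is done, and give up after max_attempts."""
    attempts = 0
    status = None
    while attempts < max_attempts:
        status = check_status()
        attempts += 1
        if status == "done":
            break  # leave the loop so the rest of the handler can run
    return {"status": status, "attempts": attempts}


def lambda_handler(event, context):
    # Simulated status source; a real function might query a queue or an API.
    statuses = iter(event.get("statuses", ["pending", "done"]))
    return poll_with_exit(lambda: next(statuses, "pending"))
```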

[Screenshot: Lambda function code for the infinite loop example]

Take Action to Optimize Your Lambda Functions Now!

Frequent timeouts of AWS Lambda functions can drive up your cloud spend unnecessarily, and create instability in your business applications. 

Using Stratusphere™ FinOps from StratusGrid, you can easily identify Lambda functions that are experiencing high rates of timeouts, and flag them for remediation. Once these remediation opportunities have been identified, your software development teams can take the necessary steps to remediate each function, according to its unique requirements.

If you're facing challenges with Lambda function timeouts or simply want to ensure your AWS Lambda functions are optimized for performance and cost-efficiency, we're here to help. Contact us now to take the first step towards optimized Lambda function performance.
