Benchmarking Amazon EC2 Instance Performance

Learn how to benchmark AWS EC2 instances for optimal performance and cost savings. This guide covers setup, configuration, and data-driven insights.

Why We Benchmark

The Amazon Web Services (AWS) cloud platform provides many different types of Elastic Compute Cloud (EC2) instances. EC2 instances are essentially virtual machines running in the cloud, although AWS also offers bare metal EC2 instances for workloads that need them.

EC2 instances have many different configuration options available. In addition to the instance type, you can also specify the target operating system, type and size of network attached storage, the AWS Region and Availability Zone, dedicated or shared tenancy, and much more.

While AWS attempts to provide a consistent customer experience, it's possible for varying configurations of EC2 instances to perform differently. Depending on the specific workload you're running, there might be a specific configuration that makes the most sense for that workload. Wouldn't it be nice to have some empirical data that could help you make data-driven decisions about where and how to run your cloud workloads?

That's the objective of this article. We want to create a repeatable process that can periodically benchmark the performance of various EC2 instance configurations.

Architecture: How We Benchmark

To consistently benchmark Amazon EC2 instances, we propose the use of an EC2 “user data” script, which is executed at instance initialization time via cloud-init. This allows the script to run promptly after launch and then shut down the VM, which keeps the cost of executing cloud benchmarks under control.

This is not the only mechanism for running benchmark scripts, however. Another reasonable option would be to embed the benchmark script in an AWS Systems Manager “document” that can be deployed to EC2 instances. However, this requires that the instance have an AWS IAM Role (Instance Profile) attached, with permissions to the AWS Systems Manager (SSM) service APIs, and that it have the Systems Manager agent installed. The agent is pre-installed in many of the operating system images provided by AWS; see the “Find AMIs with the SSM Agent preinstalled” document for details.
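
For illustration, here is a rough sketch of what the Run Command route could look like, using the AWS managed AWS-RunShellScript document. The instance ID is a placeholder, and this assumes the AWS.Tools.SimpleSystemsManagement PowerShell module plus the agent and instance profile prerequisites above; it is not the approach we use in this article.

# Sketch only: run a command on a managed instance via SSM Run Command.
# Assumes the SSM agent is running and an instance profile with SSM permissions is attached.
$Command = Send-SSMCommand -DocumentName 'AWS-RunShellScript' `
 -InstanceId 'i-0123456789abcdef0' `
 -Parameter @{ commands = @('nice cargo build') }

# Check on the command's status and output later
Get-SSMCommandInvocation -CommandId $Command.CommandId -Detail $true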

EC2 Spot Instances

To control the costs for running benchmark EC2 instances, we propose the use of Amazon EC2 Spot Instances. Using Spot Instances can reduce your cloud compute costs by up to 90%, but often somewhere in the 60-80% range. One caveat with Spot Instances is that some Availability Zones don't have Spot Instance capacity for certain instance types. In those cases, you can simply launch on-demand instances instead.
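
As a quick sanity check before launching, you can inspect recent Spot pricing for an instance type using the AWS Tools for PowerShell (introduced later in this article). A hedged sketch, where $AWSProfile is the credentials profile variable defined in the launch script below:

# Sketch: review recent Spot prices for an instance type to estimate the discount
Get-EC2SpotPriceHistory -InstanceType 'c7a.2xlarge' `
 -ProductDescription 'Linux/UNIX' `
 -MaxResult 10 `
 -ProfileName $AWSProfile |
 Select-Object -Property AvailabilityZone, Price, Timestamp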

VPC Subnets

We set up a VPC in each AWS region, with a subnet in each Availability Zone. This allows us to benchmark performance and evaluate consistency across different zones, given the same EC2 instance type. We enable subnets with both IPv4 and IPv6 network stacks for full connectivity.
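
Setting this up is standard VPC work, so we won't walk through it in detail here. A minimal sketch of the idea (IPv4 only; the internet gateway, routing, and IPv6 CIDR associations are omitted for brevity):

# Sketch: create a VPC with one subnet per Availability Zone in the current region
$Vpc = New-EC2Vpc -CidrBlock '10.42.0.0/16' -ProfileName $AWSProfile
$Index = 0
foreach ($AZ in Get-EC2AvailabilityZone -ProfileName $AWSProfile) {
 New-EC2Subnet -VpcId $Vpc.VpcId `
  -CidrBlock ('10.42.{0}.0/24' -f $Index) `
  -AvailabilityZone $AZ.ZoneName `
  -ProfileName $AWSProfile
 $Index++
}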

Store and Visualize Performance Data

There's a plethora of mechanisms you can use to store performance benchmark data. We propose using InfluxDB, which is a popular open source and self-hostable time series database engine. Each EC2 instance can write its own benchmark data using the InfluxDB write API.

In addition to ingesting performance metrics, InfluxDB provides a web-based user interface for building charts and dashboards to compare performance data. The web UI is polished and easy to use.

Benchmark Workload: CPU

One of the considerations for any benchmark is identifying which system component is being evaluated. Some workloads, such as databases, may stress all system components (CPU, memory, and disk), while others may simply make heavy use of the CPU. For this article, we will focus on evaluating CPU performance by compiling an application.

The Rust programming language makes an excellent choice for benchmarking CPU performance. The Rust compiler utilizes 100% of the CPU whenever possible, which can even lead to starvation of other important services, such as Secure Shell (SSH). Hence, it is important that the benchmark is configured in such a way that it does not block administrative access to the system. The Linux “nice” command can be used for this purpose, to de-prioritize the benchmark over administrative use of services like sshd.

We propose creating a simple Rust application, adding several “heavy” dependencies to it, and then compiling the application. First, we need to install the Rust toolchain with the following lines.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"

Once the Rust toolchain is installed, we can initialize a new benchmark application, add the dependencies, and compile it. We don’t want the dependency download step included in our CPU-focused benchmark, so we run “cargo build” once, clean the project’s compiled artifacts with “cargo clean”, and then run “cargo build” again. The second build reuses the dependencies downloaded during the first, so only compilation work is measured. (Cargo also offers a “cargo fetch” subcommand to pre-download dependency sources, but the build/clean/build approach works without it.)

cargo new bench01
cd bench01
cargo add diesel --features=mysql,chrono,postgres,serde_json,time,uuid,numeric,network-address,large-tables,huge-tables,extras
cargo add bevy --features=exr,flac,file_watcher,dynamic_linking,detailed_trace,bmp,dds,bevy_dev_tools,meshlet,mp3,pnm,serialize,shader_format_glsl,asset_processor,async-io
cargo add tokio --features=full
nice cargo build
cargo clean
nice cargo build # <--  this is the command we want to measure execution time of

Changes to the dependencies over time could shift the benchmark duration in either direction. To keep the benchmark consistent over a longer period of time, it’s important to pin specific versions of the dependencies. For the sake of this article, though, we will just use the in-line commands above to add the dependencies to the project.
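
If you do want to pin versions, cargo’s add subcommand accepts a name@version spec. For example (the version numbers here are illustrative placeholders, not recommendations):

cargo add diesel@2.2.4 --features=mysql,chrono,postgres,serde_json,time,uuid,numeric,network-address,large-tables,huge-tables,extras
cargo add tokio@1.40.0 --features=full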

Install Prerequisites

We’ll need to install a few dependencies to run the benchmarking script. The benchmark script is designed to be interactive, so that you can select different EC2 instance types and Subnets (Availability Zones) for each invocation. We wrote the script in PowerShell, a cross-platform, object-oriented language, which makes it easy to automate deployment of AWS cloud resources.

After installing PowerShell, you can run this command to install the necessary modules.

$ModuleList = @(
 'Microsoft.PowerShell.ConsoleGuiTools',
 'AWS.Tools.EC2'
)
Install-Module -Scope CurrentUser -Name $ModuleList -Force

Also, we assume that you’ve already set up your AWS credentials and config files, as necessary. While the AWS CLI configuration documentation covers this process, the same concepts apply to the AWS SDKs for other languages.
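
If you haven’t stored a credentials profile yet, the AWS.Tools modules can do that for you. A minimal sketch, with placeholder key values:

# Sketch: store a named credentials profile for use with the AWS.Tools modules
Set-AWSCredential -AccessKey 'AKIA...' -SecretKey 'YOURSECRETKEY' -StoreAs 'YOUR_AWS_PROFILE'

The profile name used here matches the $AWSProfile value referenced in the launch script below.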

EC2 User Data Script

Now that we've determined the benchmark we want to run, we need to wrap it inside a user data script that will run on the Amazon EC2 instances we launch. The following simply defines the user data script as a large “here-string” (multi-line string) in PowerShell, and stores it in a variable.

$UserDataScript = @'
#!/usr/bin/env bash
# This is some magic I copied online to log all script
# output to a file, in case you need to debug something. Extremely useful.
exec 3>&1 4>&2
trap 'exec 2>&4 1>&3' 0 1 2 3
exec 1>/root/userdata-log.txt 2>&1

# Install some common dependencies
apt update;
apt upgrade --yes;
apt install curl gcc docker.io httpie --yes;
# Install PowerShell on 64-bit ARM or Intel/AMD 64-bit architecture
if [[ "$HOSTTYPE" == "arm64" || "$HOSTTYPE" == "aarch64" ]]; then
 export PWSH_URL=https://github.com/PowerShell/PowerShell/releases/download/v7.4.6/powershell-7.4.6-linux-arm64.tar.gz
 export PWSH_DIR=/usr/local/bin/pwsh/
 http --download $PWSH_URL --output /root/pwsh.tar.gz
 mkdir ${PWSH_DIR}
 tar --directory ${PWSH_DIR} -xzvf /root/pwsh.tar.gz
 chmod +x ${PWSH_DIR}pwsh
 ln -s ${PWSH_DIR}pwsh /usr/bin/pwsh
else
 snap install powershell --classic
fi

snap install btop

# Install some dependencies to compile Rust crates Bevy and Diesel
apt-get install libasound2-dev libudev-dev pkg-config libmysqlclient-dev --yes

cat > script.sh <<'EOF'
#!/usr/bin/env bash

cat > $HOME/bench_script.ps1 <<'EOF2'

cd $HOME/bench01

# Clean the Rust project before compiling
cargo clean
$TimedResult = Measure-Command -Expression {
 nice cargo build
}
$InfluxWriteAPI = @{
 Uri = 'https://YOURINFLUXDBSERVERDNSNAME/api/v2/write?org=YOURINFLUXDBORG&bucket=YOURINFLUXDBBUCKET'
 Method = 'Post'
 Body = 'ec2bench,instanceType=INSTANCETYPEGOESHERE,availabilityZoneId=AZIDGOESHERE benchmark_time={0}' -f ([int]$TimedResult.TotalMilliseconds)
 Headers = @{
   Authorization = 'Token {0}' -f 'INFLUXDBTOKENGOESHERE'
 }
}
Invoke-RestMethod @InfluxWriteAPI
EOF2

echo "Switching to user home directory"
cd $HOME

pwsh -Command Install-Module -Name AWS.Tools.S3 -Scope CurrentUser -Force

# Install the Rust Toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"

# Create a new Rust project
cargo new bench01
cd bench01
cargo add diesel --features=mysql,chrono,postgres,serde_json,time,uuid,numeric,network-address,large-tables,huge-tables,extras
cargo add bevy --features=exr,flac,file_watcher,dynamic_linking,detailed_trace,bmp,dds,bevy_dev_tools,meshlet,mp3,pnm,serialize,shader_format_glsl,asset_processor,async-io
cargo add tokio --features=full
nice cargo build

pwsh $HOME/bench_script.ps1

EOF

chmod +x script.sh

sudo --set-home --preserve-env -u ubuntu ./script.sh

touch /root/done.txt
shutdown -h now
'@

There’s a lot going on in this user data script, so let’s summarize it:

  • Install any pending Linux base package updates
  • Install PowerShell on Linux (ARM64 or Intel / AMD 64-bit)
  • Install btop for live monitoring from SSH session (optional)
  • Install Linux dependencies for Bevy and Diesel (Rust crates) compilation
  • Define a Bash script that will run as the Ubuntu (non-root) user
    • Creates a PowerShell script that runs the benchmark and writes results
    • Installs Rust toolchain
    • Creates a Rust project
    • Adds Rust dependencies (Bevy, Diesel)
    • Pre-builds the Rust project
    • Runs the PowerShell benchmark script
  • Execute the Bash script as the Ubuntu user
  • Shut down the EC2 instance (which auto-terminates it)

There’s a bit of nested scripting happening, so feel free to pick it apart into smaller pieces, to gain a better understanding of what’s going on. The script makes use of the Bash “here documents” (multi-line strings) to define and execute code, so that it can be easily invoked as a non-root user.

Value Substitutions

There’s a number of different values you’ll need to substitute, in the EC2 user data script, in order to use this script for yourself.

  • YOURINFLUXDBSERVERDNSNAME - set this to your InfluxDB server’s IPv4 or IPv6 address, or a corresponding DNS name
  • YOURINFLUXDBORG - set this to your InfluxDB organization name
  • YOURINFLUXDBBUCKET - set this to your InfluxDB bucket name
  • INFLUXDBTOKENGOESHERE - generate an API token for InfluxDB and set it here
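
If you’d rather not edit the here-string by hand, you can apply these substitutions with the same .Replace() technique the launch script uses later for the instance type and AZ. A small sketch, assuming you’ve exported your API token as an environment variable:

# Sketch: substitute your InfluxDB details into the user data script
$UserDataScript = $UserDataScript.Replace('YOURINFLUXDBSERVERDNSNAME', 'influx.example.com')
$UserDataScript = $UserDataScript.Replace('YOURINFLUXDBORG', 'my-org')
$UserDataScript = $UserDataScript.Replace('YOURINFLUXDBBUCKET', 'ec2bench')
$UserDataScript = $UserDataScript.Replace('INFLUXDBTOKENGOESHERE', $env:INFLUX_TOKEN)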

The other substitutions are taken care of by the remainder of the benchmark automation script, which we’ll review below.

  • INSTANCETYPEGOESHERE - this records the EC2 instance type that is being benchmarked, so that you can identify it in the InfluxDB metric results
  • AZIDGOESHERE - this records the AWS Availability Zone ID (different from your AWS account’s AZ mapping), so you can identify them in your InfluxDB metrics

Again, this simply defines the user data script as a large string for now; nothing actually executes until you launch EC2 instances. This concludes the walkthrough of the EC2 user data script that will execute the benchmark and record the results.

Benchmark Launch Script

Now that we’ve talked through the entire EC2 user data script, let’s walk through the additional code that actually launches the benchmark EC2 instances. There are quite a few different parameters you can specify when launching EC2 instances, so bear with us as we explain the launch script.

At a high level, the steps for the benchmark launch script are:

  • Ask user which EC2 instance types to benchmark
  • Ask user which Subnet IDs (Availability Zones) to test in
  • Create a new EC2 instance for each selected Instance Type, in each selected Subnet

Gather User Input

Before launching the EC2 instances, we need to ask the user (you) which EC2 instance types you want to run the benchmark on, and which Subnet IDs (Availability Zones) to test in. We can do that using the PowerShell snippet below.

$AWSProfile = 'YOUR_AWS_PROFILE'
Set-DefaultAWSRegion -Region us-west-2

# Which EC2 instance types do you want to run benchmarks against?
$InstanceTypeList = Get-EC2InstanceType -ProfileName $AWSProfile | Sort-Object -Property InstanceType | Select-Object -Property InstanceType | Out-ConsoleGridView -OutputMode Multiple
if (!$InstanceTypeList) {
 Write-Error -Message 'No EC2 instance types were selected. Please re-run benchmark.'
 return
}

# Ask the user to select one or more subnets (Availability Zones) to launch EC2 benchmarks in
$SubnetList = Get-EC2Subnet -ProfileName $AWSProfile | Sort-Object -Property VpcId | Select-Object -Property SubnetId,AvailabilityZoneId,VpcId,Ipv6CidrBlockAssociationSet,AssignIpv6AddressOnCreation,MapPublicIpOnLaunch | Out-ConsoleGridView -OutputMode Multiple
if (!$SubnetList) {
 Write-Error -Message 'No VPC subnets were selected. Please re-run the benchmark.'
 return
}

Make sure you specify your AWS credentials profile name on the first line; the rest of the code can remain unchanged. If you don’t select any EC2 instance types or Subnets, the script will print a helpful message and abort, to avoid any unexpected behavior. After we capture these details, it’s finally time to launch the EC2 instances, for each permutation of Instance Type and Subnet ID. That’s what we’ll do in the loop below.

foreach ($InstanceType in $InstanceTypeList) {
 foreach ($Subnet in $SubnetList) {
   $Instance = @{
     InstanceMarketOption = $(
       $Market = [Amazon.EC2.Model.InstanceMarketOptionsRequest]::new()
       $Market.MarketType = 'spot'
       $Market.SpotOptions = [Amazon.EC2.Model.SpotMarketOptions]::new()
       $Market.SpotOptions.SpotInstanceType = [Amazon.EC2.SpotInstanceType]::OneTime
       $Market
     )
     InstanceInitiatedShutdownBehavior = [Amazon.EC2.ShutdownBehavior]::Terminate
     InstanceType = $InstanceType.InstanceType
     ImageId = 'ami-04dd23e62ed049936'
     SubnetId = $Subnet.SubnetId
     ProfileName = $AWSProfile
     # Optionally specify an SSH public key
     # KeyName = '2024-Q4-stratusgrid-trevor'
     # Optionally specify an IAM Instance Profile (Role)
      # IamInstanceProfile_Arn = 'arn:aws:iam::999888777666:instance-profile/sg-benchmark'

     UserData = [System.Convert]::ToBase64String([byte[]][char[]]($UserDataScript.Replace('INSTANCETYPEGOESHERE', $InstanceType.InstanceType).Replace('AZIDGOESHERE', $Subnet.AvailabilityZoneId)))
     TagSpecification = $(
       $Tag = [Amazon.EC2.Model.TagSpecification]::new()
       $Tag.ResourceType = [Amazon.EC2.ResourceType]::Instance
       $Tag.Tags += [Amazon.EC2.Model.Tag]::new("Name", ("my-bench-{0}" -f $InstanceType.InstanceType))
       $Tag.Tags += [Amazon.EC2.Model.Tag]::new("Owner", "YOUR NAME HERE")
       $Tag
     )
   }
   New-EC2Instance @Instance
 }
}

First we iterate over the selected instance types, then we iterate over the selected subnets. For each permutation, we create a PowerShell HashTable containing all the parameter names and values for the New-EC2Instance command. Finally, we call the New-EC2Instance command, and “splat” the HashTable of parameters onto it. This is what creates the EC2 instance in your AWS account.

Due to the complexities of the AWS SDK for PowerShell, there are several lines of code that handle setting up the EC2 Spot configuration, and the EC2 Instance Resource Tags. Also, the EC2 User Data script must 1) have values substituted for instance type and AZ, and 2) be Base64-encoded, before it is passed into the New-EC2Instance command. Hopefully a future version of the SDK makes this process more straightforward.
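
If you want to sanity-check the encoded payload, you can round-trip it before launching anything. A quick sketch:

# Sketch: encode the user data script, then decode it to verify the contents
$Encoded = [System.Convert]::ToBase64String([byte[]][char[]]$UserDataScript)
[System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String($Encoded))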

You’ll also notice that we do some basic string replacement operations to inject the Instance Type and Availability Zone ID as metric tags, into the benchmark script.

Update Security Parameters

You can optionally enable the SSH key pair name, which is the public key that will be associated with the EC2 instances. Simply uncomment the KeyName parameter and specify the name of your specific key pair. Note that this is a regional resource reference; your SSH key pair must be imported into all AWS regions you are testing.

If you’d like to manage your EC2 instance with AWS Systems Manager, you can also uncomment the IamInstanceProfile_Arn parameter and specify your custom IAM role Amazon Resource Name (ARN). Your role will need the built-in SSMManagedInstanceCore IAM policy associated with it, and any other custom policies you might need to access various AWS resources. This is only needed if you further customize the benchmark user data script, and need access to other AWS resources.

Updating the AMI ID

If you use a region other than the us-west-2 Oregon region, or if you’re reading this article far into the future, you’ll want to update the Amazon Machine Image (AMI) ID that’s referenced in the parameters. The AMI ID is specific to a particular AWS region. In the above script, we’re currently hard-coding the AMI ID for Ubuntu 24.04 LTS for Intel/AMD 64-bit, for the us-west-2 region.
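
Rather than hard-coding the AMI ID, you can also resolve it at runtime. One option is Canonical’s public SSM parameters; a hedged sketch (the parameter path below is our assumption, so verify it for your distribution and architecture):

# Sketch: look up the latest Ubuntu 24.04 LTS amd64 AMI ID for the current region
# Requires the AWS.Tools.SimpleSystemsManagement module
$ParamName = '/aws/service/canonical/ubuntu/server/24.04/stable/current/amd64/hvm/ebs-gp3/ami-id'
(Get-SSMParameter -Name $ParamName -ProfileName $AWSProfile).Value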

Update EC2 Resource Tags

You can optionally update the EC2 resource tag values in the above script as well. I added tags to give instances a Name and an Owner, which is generally good practice. You can use your own naming scheme and plug in an appropriate value for the Owner tag, such as a team or individual name.

Execute the Benchmark Script

Now that you have an understanding of all the components of the Benchmark script, let's go ahead and run it. You can copy and paste the entire script into a PowerShell terminal. You can also copy and paste it into Microsoft Visual Studio Code, install the PowerShell Extension, and run it from there.

Upon running the script, you should be prompted for the EC2 instance types, and subnets. Check out the screenshots below for examples of what you might see. After selecting each of these options, you should see the script attempt to launch all of the EC2 instance permutations for your selection.

[Screenshot: selecting EC2 instance types from the grid view prompt]

[Screenshot: selecting VPC subnets (Availability Zones) from the grid view prompt]

After launching the benchmark script, wait for your EC2 instances to finish running the benchmark and shut themselves down; each instance will likely take several minutes to finish.

Explore Benchmark Results

After executing the script to invoke your benchmark EC2 instances, it's time to take a look at the results in your InfluxDB server. After logging into the InfluxDB web interface, open the Data Explorer view. Select the time range where your benchmark data resides, similar to the screenshot below.

[Screenshot: selecting the time range in the InfluxDB Data Explorer]

After selecting the data time range, you can select the InfluxDB bucket, measurement, and any filters you want to apply. For example, let’s say we want to compare the performance of c7a.2xlarge instances across all Availability Zones, as shown in the screenshot below. Simply leave the Availability Zone filter unselected, select all of its values, or limit it to the ones you want to visualize.

[Screenshot: selecting the bucket, measurement, and filters in the Data Explorer]

After selecting these filters, you should see a graph similar to the following. The data shown here is somewhat sparse, and there’s a valid reason for it: Spot Instance capacity was not available in one Availability Zone, and certain instance types are only available in certain AZs. You’ll typically see these errors show up while executing the benchmark launch script.

[Screenshot: line graph of benchmark compile times in InfluxDB]

While there are only a few data points in this screenshot, we can make some inferences from it. Notice that the red line, on the bottom-right, shows closely matched performance between its first and second data points. The purple, pink, and orange lines, by contrast, show a decent variance between the available data points. If we hover the mouse over a data point, we can see the Instance Type and Availability Zone tags for that series; that’s why we made sure to add those tags to our data points when writing them to InfluxDB.

While we only have a few data points to work with, here are some general conclusions:

  • Both the Intel and AMD-based instance types in usw2-az3 seem to have some variance in their performance over time, along with the Intel instance in usw2-az4
    • The others are more stable
  • The c7i.2xlarge in usw2-az4 started off slower than usw2-az2 and usw2-az3
  • For this benchmark, the AMD-based c7a.2xlarge seems consistently faster than the Intel-based c7i.2xlarge

Feel free to play around with different filters and use the data to answer your own questions about Amazon EC2.
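
If you’d rather pull the raw numbers than click through the UI, the same data is available from the InfluxDB v2 query API. Here is a hedged sketch using a simple Flux query over HTTP, with the same placeholders as the user data script:

# Sketch: query benchmark results back out of InfluxDB via the v2 HTTP API
$Flux = @'
from(bucket: "YOURINFLUXDBBUCKET")
 |> range(start: -7d)
 |> filter(fn: (r) => r._measurement == "ec2bench" and r._field == "benchmark_time")
'@
$InfluxQueryAPI = @{
 Uri = 'https://YOURINFLUXDBSERVERDNSNAME/api/v2/query?org=YOURINFLUXDBORG'
 Method = 'Post'
 Body = $Flux
 ContentType = 'application/vnd.flux'
 Headers = @{
   Authorization = 'Token INFLUXDBTOKENGOESHERE'
 }
}
# Results come back as annotated CSV
Invoke-RestMethod @InfluxQueryAPI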

Making the Most of Your AWS EC2 Benchmark Data

I hope this article has inspired you to run some benchmarks on AWS and evaluate the platform’s performance, using various workloads and resource configurations. With a little automation work, we can gather empirical data and use that to help us make informed decisions. You can take this a step further and run this type of benchmark automation on a schedule, using a CI/CD pipeline service like GitHub Actions. That way you can start analyzing data at peak times of the day, comparing across different days of the week, and even holiday periods. Rather than prompting for Instance Types and Subnet IDs, you could modify the script to hard-code those values as input parameters.
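
For example, the top of a non-interactive variant of the launch script might look like the following sketch, replacing the two Out-ConsoleGridView prompts:

# Sketch: script parameters for scheduled, non-interactive benchmark runs
param(
 [Parameter(Mandatory)] [string[]] $InstanceTypes, # e.g. 'c7a.2xlarge', 'c7i.2xlarge'
 [Parameter(Mandatory)] [string[]] $SubnetIds,     # e.g. 'subnet-0abc...', 'subnet-0def...'
 [string] $AWSProfile = 'YOUR_AWS_PROFILE'
)
# ...then reuse the nested foreach loop from the launch script,
# iterating over $InstanceTypes and $SubnetIds instead of grid view selections.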

Beyond the EC2 service, there are many other AWS services that may benefit from benchmarking, to ensure consistency in performance and availability. For example, it may be interesting to test if DynamoDB tables have any performance impact from enabling additional indexes. Another interesting thing to test might be query performance, as the size of a DynamoDB table expands. How much slower is a given query on a 1TB table versus a 5GB table, for example?

Please reach out to us at StratusGrid if there’s a cloud project you have in mind that we can help out with. We would also love to show you a live demo of our SaaS tool, Stratusphere. Stratusphere helps organizations find cost optimization and security improvement opportunities across their cloud platforms.
