Explore the role of networking in cloud engineering and learn how you can expand your skills beyond the job description for better problem-solving.
So, you work in IT? That means your job stops where the job description does, right? This is not true of any job, as learning should never be limited by a job description - otherwise, we would never be challenged beyond what we already know!
Now, let’s narrow our scope down to IT engineers, specifically those who are cloud-focused. Being cloud-focused means we’re building new and exciting stuff all day and it’s all cloudy and AWS handles all of the networking. That seems like something we could only dream about, but then one day you realize AWS handles 90% of the workload while your networking team handles the other 10%.
While this idea sounds great in theory, it is essential that as engineers, we strive to be well-rounded and always seek to push ourselves outside of our comfort zone. In this article, we will talk about the reality of this concept that engineers need to face by using a real-life example, and then I’ll provide some applicable tools to help you achieve this idea.
We all have a part in the grand scheme of our infrastructure from App Code to Security to Networking and everything in between. In the past, when the app server couldn’t connect to the file server, we would throw our hands up in the air and say it was the networking team's problem and we were roadblocked. It is easy to pass on the blame without taking responsibility, but the reality is that being willing to admit to your mistakes is an important part of the job.
I hear this question a lot, and to answer, I have an example of a situation that recently happened at StratusGrid.
One of our lead software engineers recently ran into an issue with a VPC-connected Lambda that could not reach the Internet through a NAT Gateway. He is well-versed in networking and ran through every test he could to determine the cause. At that point, the issue was handed off, and after a couple of hours of troubleshooting, we determined it was due to the network ACLs (NACLs) on the public subnet where the NAT Gateway was located.
When we looked back on the issue, we found that the lead had spotted the NACLs but didn't know enough about them. He is a full-stack developer who had already gone above and beyond on networking knowledge. This is the type of networking knowledge that can empower you as a software engineer.
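To make that failure mode concrete, here is a minimal Terraform sketch of NACL rules on a public subnet that let a NAT Gateway pass outbound HTTPS. It is not the configuration from the incident: the variable names (`vpc_id`, `public_subnet_ids`, `private_subnet_cidr`) and the rule numbers are assumptions made for illustration. Because NACLs are stateless, return traffic on ephemeral ports has to be allowed explicitly in both directions, which is the part that is easy to miss.

```hcl
# Hypothetical inputs, named for this sketch only.
variable "vpc_id" {
  type = string
}

variable "public_subnet_ids" {
  description = "Public subnets that host the NAT Gateway"
  type        = list(string)
}

variable "private_subnet_cidr" {
  description = "CIDR of the private subnet where the Lambda ENIs live"
  type        = string
}

resource "aws_network_acl" "public" {
  vpc_id     = var.vpc_id
  subnet_ids = var.public_subnet_ids
}

# Inbound: HTTPS requests arriving from the private subnet on their way out.
resource "aws_network_acl_rule" "in_https_from_private" {
  network_acl_id = aws_network_acl.public.id
  rule_number    = 100
  egress         = false
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = var.private_subnet_cidr
  from_port      = 443
  to_port        = 443
}

# Inbound: responses from the Internet returning to the NAT Gateway.
resource "aws_network_acl_rule" "in_ephemeral_from_internet" {
  network_acl_id = aws_network_acl.public.id
  rule_number    = 110
  egress         = false
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 1024
  to_port        = 65535
}

# Outbound: the NAT Gateway forwarding HTTPS requests to the Internet.
resource "aws_network_acl_rule" "out_https_to_internet" {
  network_acl_id = aws_network_acl.public.id
  rule_number    = 100
  egress         = true
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 443
  to_port        = 443
}

# Outbound: responses handed back to the clients in the private subnet.
resource "aws_network_acl_rule" "out_ephemeral_to_private" {
  network_acl_id = aws_network_acl.public.id
  rule_number    = 110
  egress         = true
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = var.private_subnet_cidr
  from_port      = 1024
  to_port        = 65535
}
```

A custom NACL denies everything it does not explicitly allow, so a real public subnet would need further rules for anything else it serves (a load balancer listener, for example).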
As engineers, we all have our favorite tools that just work - some tools that cause us pain (though they get the job done), and tools we’re encouraged to use. We’re going to talk about some of these tools and how StratusGrid uses them as well as a few of their pros and cons.
For StratusGrid, most of our engineers work in Terraform (HCL) or CDK (TypeScript) for our day-to-day jobs. The wonderful thing about Terraform is the amazing community-supported modules, especially the AWS-specific ones. There are very few deployments where StratusGrid doesn’t use the community AWS VPC Module. VPCs are simple on the surface (though they can get complex very quickly), and by utilizing this module and its features, we’re able to quickly deploy VPCs and changes.
With this being said, the VPC module isn’t perfect and has some quirks, though it allows our engineers to quickly deploy with standardization in a programmatic manner for a variety of networks.
One of the things that StratusGrid engineers really like about the VPC module is that it allows you to easily define Public/Private/Database subnets to create a proper three-tier application architecture. In the logic of the module, it allows you to define if the database subnets should have routes to the Internet, and it allows you to define your NAT Gateway logic from a single one to one per availability zone (AZ). These are just a few of the many features it offers.
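As a rough illustration of those options, a three-tier layout with the community module might look like the sketch below. The name, CIDR ranges, availability zones, and version pin are placeholder assumptions rather than values StratusGrid uses; the input names are the module's own.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # assumed pin; check the registry for the release you use

  # Placeholder name, CIDR ranges, and AZs for illustration only.
  name = "example-three-tier"
  cidr = "10.0.0.0/16"
  azs  = ["us-east-1a", "us-east-1b", "us-east-1c"]

  public_subnets   = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
  private_subnets  = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
  database_subnets = ["10.0.20.0/24", "10.0.21.0/24", "10.0.22.0/24"]

  # NAT Gateway logic: one per AZ here. Set single_nat_gateway = true
  # (and one_nat_gateway_per_az = false) to share a single gateway instead.
  enable_nat_gateway     = true
  single_nat_gateway     = false
  one_nat_gateway_per_az = true

  # Database tier: its own route table, with no routes to the Internet.
  create_database_subnet_route_table     = true
  create_database_internet_gateway_route = false
  create_database_nat_gateway_route      = false
}
```

One NAT Gateway per AZ costs more but keeps each zone's outbound traffic independent of the others; a single gateway is cheaper and is often enough for non-production environments.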
One of my favorite things about the VPC module is how easy it is to log every single packet on your network. StratusGrid has repeatedly used this feature to diagnose what is happening and where a packet is getting dropped.
In a traditional network, you might start with a Wireshark packet capture on the source and destination box and hope it's not being dropped somewhere else in the line, and then you would need to have your network team help diagnose where it's being dropped. None of that needs to happen with VPC Flow Logs; with the example code shown below combined with the AWS VPC Module discussed above, you can have every ENI in the VPC logging within about five minutes:
# Variables
variable "cloud_watch_retention" {
  description = "Global Repo CloudWatch Log Retention in Days"
  type        = number

  validation {
    condition     = contains([0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653], var.cloud_watch_retention)
    error_message = "Not a valid retention day option, see https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group ."
  }
}

variable "env_name" {
  description = "Environment name string to be used for decisions and name generation. Appended to name_suffix to create full_suffix"
  type        = string
}
# Account lookup used in the key policy below. Declared here so the snippet
# stands on its own; drop it if your configuration already declares it.
data "aws_caller_identity" "current" {}

# KMS for CloudWatch Logs - https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/encrypt-log-data-kms.html
# Note: var.region is assumed to be declared elsewhere in your configuration.
data "aws_iam_policy_document" "cloudwatch_kms" {
  # Let the account root (and, through IAM, its users and roles) manage the key
  statement {
    sid       = "Enable IAM User Permissions"
    actions   = ["kms:*"]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # Let CloudWatch Logs use the key, but only for log groups in this account
  statement {
    sid = "Allow cloudwatch to encrypt logs"
    actions = [
      "kms:Encrypt*",
      "kms:Decrypt*",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:Describe*",
    ]
    resources = ["*"]

    condition {
      test     = "ArnEquals"
      variable = "kms:EncryptionContext:aws:logs:arn"
      values   = ["arn:aws:logs:*:${data.aws_caller_identity.current.account_id}:log-group:*"]
    }

    principals {
      type        = "Service"
      identifiers = ["logs.${var.region}.amazonaws.com"]
    }
  }
}
# KMS
resource "aws_kms_key" "cloudwatch" {
  description         = "Default Key for CloudWatch Log Groups"
  enable_key_rotation = true
  policy              = data.aws_iam_policy_document.cloudwatch_kms.json
}

# CloudWatch KMS Key Alias (alias names must start with "alias/")
resource "aws_kms_alias" "cloudwatch" {
  name          = "alias/cloudwatch-default-key"
  target_key_id = aws_kms_key.cloudwatch.key_id
}
# CloudWatch Log Group for VPC Flow Logs
# Note: var.name_prefix and local.name_suffix are assumed to be defined elsewhere.
resource "aws_cloudwatch_log_group" "vpc_flow_logs" {
  count = var.env_name == "dev" ? 0 : 1 # If dev no flow logs, otherwise enable them

  name              = "${var.name_prefix}-vpc-flow-logs${local.name_suffix}"
  retention_in_days = var.cloud_watch_retention
  kms_key_id        = aws_kms_key.cloudwatch.arn # CloudWatch Logs expects the key ARN here, not the alias ARN
}
# Module additional code
module "vpc" {
  # ... your existing VPC module arguments (source, name, cidr, subnets) go here ...

  # VPC Flow Logs
  enable_flow_log                                 = var.env_name != "dev" # If dev no flow logs, otherwise enable them
  create_flow_log_cloudwatch_iam_role             = true
  flow_log_destination_type                       = "cloud-watch-logs"
  flow_log_file_format                            = "plain-text"
  flow_log_traffic_type                           = "ALL"
  flow_log_cloudwatch_log_group_retention_in_days = var.cloud_watch_retention
  # The log group is not created in dev, so indexing [0] directly would fail there
  flow_log_destination_arn = try(aws_cloudwatch_log_group.vpc_flow_logs[0].arn, "")
}
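Once the flow logs are landing in CloudWatch, a saved Logs Insights query makes it quicker to see where packets are being dropped. The sketch below is an assumption layered on the code above rather than part of it: the query name is hypothetical, and the field names (`interfaceId`, `srcAddr`, `dstAddr`, `action`, and so on) are the ones Logs Insights discovers for flow logs in the default format.

```hcl
# Saved Logs Insights query listing the most recent rejected packets.
# Reuses var.env_name and the vpc_flow_logs log group defined above.
resource "aws_cloudwatch_query_definition" "flow_log_rejects" {
  count = var.env_name == "dev" ? 0 : 1 # No flow logs in dev, so no query either

  name            = "vpc-flow-logs/rejected-packets" # hypothetical name
  log_group_names = aws_cloudwatch_log_group.vpc_flow_logs[*].name

  query_string = <<-EOT
    fields @timestamp, interfaceId, srcAddr, srcPort, dstAddr, dstPort, protocol, action
    | filter action = "REJECT"
    | sort @timestamp desc
    | limit 50
  EOT
}
```

Adding a filter on `srcAddr` or `dstAddr` narrows the results to one host, which is usually the first step when chasing a single failing connection.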
You may have heard the phrase, “Forget everything you know about networking in the cloud”, but that phrase is a bit broad. It’s true in essence, but in practice, it’s mostly wrong and in my experience, it depends on the day (just like everything else in IT). With cloud networking, specifically in AWS, the VPC is your network, unlike EC2-Classic, which won't exist past this year.
AWS controls all aspects of your backend network, like the routers and switches, but you can always add your own router and replace some of their routers’ functions. ARP still exists within subnets in the VPC, route tables are easily customizable, NACLs give you stateless rules at the subnet level, and security groups give you stateful layer 3/4 rules for segmentation.
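As a sketch of the security group side of that segmentation, the example below lets an application tier reach a database tier on one port and nothing else. The variable name, group names, and the PostgreSQL port are assumptions chosen for illustration.

```hcl
# Hypothetical input, named for this sketch only.
variable "app_vpc_id" {
  type = string
}

resource "aws_security_group" "app" {
  name   = "example-app"
  vpc_id = var.app_vpc_id
}

resource "aws_security_group" "db" {
  name   = "example-db"
  vpc_id = var.app_vpc_id
}

# The database tier accepts traffic only from members of the app group.
resource "aws_vpc_security_group_ingress_rule" "db_from_app" {
  security_group_id            = aws_security_group.db.id
  referenced_security_group_id = aws_security_group.app.id
  ip_protocol                  = "tcp"
  from_port                    = 5432 # assumed PostgreSQL port
  to_port                      = 5432
}

# Terraform-managed groups start with no egress rules, so the app tier
# needs an explicit rule to reach the database tier.
resource "aws_vpc_security_group_egress_rule" "app_to_db" {
  security_group_id            = aws_security_group.app.id
  referenced_security_group_id = aws_security_group.db.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
}
```

Because security groups are stateful, the reply traffic needs no rule of its own, which is the main practical difference from the NACL case.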
As a matter of fact, you can even change your default gateway to route over a Site-to-Site (S2S) VPN, on purpose or by accident. You still have routing protocols such as BGP available to your VPC, so you can communicate with on-premises infrastructure or cloud-to-cloud. All of the basic networking concepts from the OSI model still apply (excluding physical cables most of the time): packets still have MTUs, and your machines must still be able to reach each other across whatever segmentation you put in place.
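To show how little it takes to send a subnet's default route over a VPN, here is a minimal sketch. The variable names are hypothetical, and it assumes a virtual private gateway that is already attached to the VPC and has a working Site-to-Site VPN connection.

```hcl
# Hypothetical inputs, named for this sketch only.
variable "private_route_table_id" {
  type = string
}

variable "vpn_gateway_id" {
  description = "Virtual private gateway already attached to the VPC"
  type        = string
}

# Let routes learned from the VPN (for example over BGP) appear in the route table.
resource "aws_vpn_gateway_route_propagation" "private" {
  vpn_gateway_id = var.vpn_gateway_id
  route_table_id = var.private_route_table_id
}

# Send all non-local traffic from this route table across the VPN.
resource "aws_route" "default_via_vpn" {
  route_table_id         = var.private_route_table_id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = var.vpn_gateway_id
}
```

If that route table already has a default route to a NAT Gateway, the second resource conflicts with it, so this is a change to make deliberately and with a rollback plan.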
While doing networking in the cloud, here are a few common issues that often turn out to be the root cause:
Final Destination local firewall - everything can be set up correctly, but a common mistake is not adjusting the firewall rules on the final destination. This can cause health checks to fail and the server to not work.
All of us have job descriptions. They are one of the most important pieces of information we use during the hiring process, annual reviews, and as a compass to guide our day-to-day focus. However, it can become all too easy to limit yourself by using the description as a way to say, “Sorry, but that’s not my job.” Yes, our job is to stay focused on the assigned tasks at hand but our job is also to innovate, support other teams, and challenge ourselves to grow beyond our current capacity.
Are you facing challenges with network or cloud issues? StratusGrid is here to help. Our team of experts specializes in providing innovative solutions to complex cloud and networking problems. Whether you're looking to enhance your cloud infrastructure, troubleshoot network issues, or simply want to learn more about how networking can empower your cloud strategies, we've got you covered.
Contact us today and let our engineers guide you through your cloud and networking journey.