AWS Cloud Cost Savings by Migrating Kubernetes Workloads to Graviton
Running Kubernetes clusters on AWS is powerful and offers a wide range of deployment options depending on instance choice. One significant optimization opportunity is migrating workloads from the traditional AMD64 (x86_64) architecture to ARM64-based AWS Graviton instances. This post walks through our recent migration project and highlights the practical steps and real-world challenges.
Motivation: Better Performance at Lower Cost
AWS Graviton instances are built on custom ARM chips designed for cloud workloads. They deliver better price–performance than comparable Intel and AMD instances, with costs up to 40% lower. For Kubernetes clusters running many workloads, this can make a major difference in both performance and budget.
In our migration, we tested Graviton across development, staging, and production environments. The results were clear:
- Development: Saved about $300/month
- Staging: Saved about $400/month
- Production: Saved about $1,200/month
Overall, we achieved around 45% lower compute costs while maintaining or improving workload performance. Graviton let us run services with lower CPU usage, faster response times, and reduced build durations for some applications.
AWS Graviton Migration Process Overview

1. **Setting Up Graviton Node Groups.** We added new EKS managed node groups with Graviton instances (t4g.xlarge, c6g.xlarge, c7g.xlarge). For Karpenter, we created provisioners that scale ARM64 nodes. This allowed us to run x86 and Graviton workloads side by side.

2. **Building Multi-Architecture Images.** We updated CI/CD to build images for both amd64 and arm64. Golang services used cross-compilation to avoid emulation, while Python and .NET used ARM64 runners. This made builds faster and deployments smooth.

3. **Updating Scheduling Rules.** We added node selectors and tolerations so Kubernetes could place pods on Graviton nodes. This let us migrate gradually and control workloads during testing.

4. **Handling Compatibility Issues.** Some libraries and tools did not support ARM64. In those cases, we kept workloads on x86 or switched to manual solutions. This way, most services migrated successfully without disruption.
We’ll detail each step and our approach in the following sections.
1. Setting Up Graviton EKS Node Groups
When setting up Graviton in our clusters, we worked with two types of autoscalers: EKS Managed Node Groups and Karpenter. By using both, we had the reliability of managed scaling with EKS and the flexibility of custom scaling policies with Karpenter. This gave us full control during the migration and let us balance stability with efficiency.
Note: We used taints and tolerations to gain more control during the migration of our workloads. Since multi-architecture images can run on either node type, using taints and tolerations is optional.
A. EKS Managed Node Groups
For clusters using managed node groups, we added a new group with Graviton instances. We selected t4g.xlarge, c6g.xlarge and c7g.xlarge because they closely matched the size and capacity of our old AMD64 nodes.
Sample Terraform Configuration:
eks_managed_node_groups = {
  # ... existing configurations
  linux_arm = {
    min_size     = 1
    max_size     = 6
    desired_size = 4
    ami_type     = "AL2_ARM_64"
    instance_types = [
      "t4g.xlarge", # Burstable performance
      "c6g.xlarge", # Compute optimized
      "c7g.xlarge", # Latest generation compute optimized
    ]
    labels = {
      workload = "graviton"
    }
    taints = [{
      key    = "graviton"
      value  = "enabled"
      effect = "NO_SCHEDULE"
    }]
  }
}
B. Karpenter Provisioners
For clusters running Karpenter, we created new provisioners that launch Graviton instances. We also applied taints and tolerations to control which workloads would run on these new nodes during the migration.
Sample Karpenter Graviton Provisioner:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: graviton-nodepool
spec:
  consolidation:
    enabled: true
  limits:
    resources:
      cpu: "200"
  providerRef:
    name: default
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["t4g.xlarge", "c6g.xlarge", "c7g.xlarge"]
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
    - key: kubernetes.io/arch
      operator: In
      values: ["arm64"]
    - key: kubernetes.io/os
      operator: In
      values: ["linux"]
  taints:
    - effect: NoSchedule
      key: graviton
      value: enabled
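Before moving any workloads, it is worth confirming that ARM64 nodes actually join the cluster with the expected labels and taints. A quick check with kubectl might look like this (the node name in the second command is a placeholder):

```bash
# Show each node's CPU architecture and instance type labels
kubectl get nodes -L kubernetes.io/arch -L node.kubernetes.io/instance-type

# Confirm the graviton taint is applied to a new node (replace with a real node name)
kubectl describe node <graviton-node-name> | grep -A 2 Taints
```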
2. Building Multi-Architecture Images
Running workloads on AWS Graviton nodes means your containers must support the arm64 architecture. To achieve this, we updated our CI/CD pipelines and Dockerfiles to build multi-architecture images that work on both amd64 and arm64. This allowed us to deploy the same image across x86 and Graviton nodes without maintaining separate builds.
A. GitHub Actions Build Pipeline Changes
We updated our CI/CD pipelines to build and push images for both architectures. Using Docker Buildx with QEMU, we could build amd64 and arm64 images in one step. This ensured every service had a single multi-arch image that runs on any node type.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      ...
+     - name: Set up QEMU
+       uses: docker/setup-qemu-action@v3
+
+     - name: Set up Docker Buildx
+       uses: docker/setup-buildx-action@v3
      - name: Build and Push
        run: |
          docker buildx build \
            --file ./Dockerfile \
+           --platform linux/amd64,linux/arm64 \
            --tag <image>:<tag> \
            --no-cache \
            .
      ...
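Once the pipeline has pushed a tag, it is worth verifying that the image really is multi-arch. One way to do this, with the image reference below replaced by your own, is to inspect the manifest list:

```bash
# Lists the platforms (e.g. linux/amd64 and linux/arm64) contained in the pushed tag
docker buildx imagetools inspect <registry>/<image>:<tag>
```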
B. Example Golang Cross-Build Dockerfile
For Go services, we enhanced our multi-architecture build performance by leveraging Go's native cross-compilation capabilities. The following Dockerfile optimizations eliminate the need for ARM64 emulation during the build process, significantly reducing build times.
FROM --platform=$BUILDPLATFORM golang:1.21.0 AS builder
ARG BUILDPLATFORM
ARG TARGETARCH
ARG TARGETOS
WORKDIR /app
COPY go.sum go.mod ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o /bin/app ./cmd/
FROM --platform=$TARGETPLATFORM scratch
COPY --from=builder /bin/app /bin/app
ENTRYPOINT ["/bin/app"]
To run the build, the command below can be used.

docker buildx build \
  --file ./Dockerfile \
  --platform linux/amd64,linux/arm64 \
  --tag <image>:<tag> \
  --no-cache \
  .
Note: For languages like Python and .NET, cross-building is less straightforward (see the “Challenges” below).
3. Updating Scheduling Rules (Optional)
As mentioned above, to control rollout, we added nodeSelectors and tolerations so Kubernetes could place specific pods on Graviton nodes. This gave us flexibility to test gradually before moving everything over. While optional, this step helped reduce migration risk.
tolerations:
  - key: graviton
    operator: Exists
nodeSelector:
  kubernetes.io/arch: arm64
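For reference, here is a minimal, hypothetical Deployment sketch showing where these fields sit in the pod spec; the name, image, and port are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-api                  # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sample-api
  template:
    metadata:
      labels:
        app: sample-api
    spec:
      # Pin the pod to ARM64 nodes during the gradual rollout
      nodeSelector:
        kubernetes.io/arch: arm64
      # Allow scheduling onto nodes tainted with graviton=enabled:NoSchedule
      tolerations:
        - key: graviton
          operator: Exists
      containers:
        - name: sample-api
          image: <registry>/sample-api:latest   # multi-arch image built earlier
          ports:
            - containerPort: 8080
```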
Challenges Encountered During Graviton Migration
1. Library Incompatibility
Some dependencies may not (yet) support ARM64. For example, we discovered the Fiona library for Python had no ARM64 support, meaning we had to skip Graviton migration for that specific service.
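A quick way to check whether a Python dependency publishes prebuilt ARM64 wheels is to ask pip for a binary-only download targeting an aarch64 platform tag. This is a sketch, with the output directory chosen arbitrarily:

```bash
# Fails if the package has no prebuilt aarch64 wheel on PyPI for the current Python version
pip download fiona \
  --platform manylinux2014_aarch64 \
  --only-binary=:all: \
  --no-deps \
  -d /tmp/arm64-wheels
```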
2. Slow GitHub Actions Docker Builds for Python and .NET
Unlike Go, Python and .NET don't natively support cross-building. Using QEMU emulation in Buildx made these image builds 5x slower!
The solution: running builds on native ARM64 GitHub Actions runners.
- Provisioned ARM64 GitHub self-hosted runners (since the public ARM64 runners were still in preview and our repository is private)
- Split our build jobs to run AMD64 and ARM64 builds on matching runners
- Merged images into a multi-arch manifest tag after building both types
Docker’s official docs describe this multi-runner, manifest-merge pattern; a simplified sketch follows.
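In this hypothetical sketch, the self-hosted runner labels, registry, and image names are placeholders and the registry login steps are omitted; the important parts are the two native build jobs and the final docker buildx imagetools create step that merges the per-architecture tags into one multi-arch tag:

```yaml
jobs:
  build-amd64:
    runs-on: ubuntu-latest              # native x86_64 runner
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      # ... registry login omitted
      - name: Build and push amd64 image
        run: |
          docker buildx build \
            --platform linux/amd64 \
            --tag <registry>/app:amd64 \
            --push .

  build-arm64:
    runs-on: [self-hosted, arm64]       # placeholder labels for the ARM64 self-hosted runner
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      # ... registry login omitted
      - name: Build and push arm64 image
        run: |
          docker buildx build \
            --platform linux/arm64 \
            --tag <registry>/app:arm64 \
            --push .

  merge-manifest:
    needs: [build-amd64, build-arm64]
    runs-on: ubuntu-latest
    steps:
      # ... registry login omitted
      - name: Create and push the multi-arch manifest
        run: |
          docker buildx imagetools create \
            --tag <registry>/app:latest \
            <registry>/app:amd64 \
            <registry>/app:arm64
```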
3. Instana Auto Trace Graviton Incompatibility
Our monitoring tool (Instana) did not support automatic instrumentation on Graviton. To keep visibility, we switched to manual instrumentation based on Instana’s language-specific docs.
Future Optimization: Spot Instances with Graviton
As a next step, we plan to combine Graviton with Spot capacity, since running the same ARM64 workloads on Spot Instances can push savings even further.
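As a sketch (not something we have rolled out yet), the Karpenter provisioner shown earlier could be extended to allow Spot capacity simply by widening its capacity-type requirement:

```yaml
  # Fragment of the earlier Provisioner spec: allow Spot in addition to On-Demand capacity
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
```

Workloads scheduled onto these nodes must tolerate interruptions, since Spot capacity can be reclaimed at any time.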
Summary
Migrating our Kubernetes workloads to AWS Graviton was not without challenges, but the results made it worth the effort. By carefully updating node groups, pipelines, and scheduling rules, we achieved around 45% lower compute costs while also improving workload efficiency. Services ran with lower CPU usage, faster response times, and in many cases shorter build times.
Although we had to handle a few compatibility issues, these were manageable and did not block the migration. Overall, Graviton proved to be a reliable, cost-effective option for running Kubernetes at scale.
That said, Graviton migration may not be ideal for every application. Some programming languages and libraries still lack full ARM64 support, which can create compatibility issues. In addition, depending on the language and runtime, certain applications may even lose performance when moved from x86 to ARM. Because of this, it’s important to benchmark and test your workloads first to confirm that they actually benefit from Graviton before fully committing.
👉 For teams looking to cut costs and improve performance, moving to Graviton is a practical and future-ready choice.
Frequently Asked Questions

**Is Graviton suitable for every application?**
Not always. Some applications and libraries do not fully support ARM64. For example, certain Python or .NET dependencies may lack compatibility. That’s why benchmarking and compatibility testing before migration is essential.

**Do I need multi-architecture images?**
Yes, if you plan to run both x86 and ARM nodes in the same cluster. Without multi-arch images, pods might be scheduled onto the wrong architecture and crash.

**Is Graviton always faster than x86?**
No. While Graviton usually provides better price–performance, certain CPU-intensive applications or those using specific instruction sets may perform better on x86. Measuring workload performance is the best way to confirm the benefits.

**Can Graviton be combined with Spot Instances?**
Yes. Running workloads on Graviton Spot Instances can increase savings even further (up to 70–80%). However, since Spot nodes can be reclaimed at any time, workloads must be resilient to interruptions.

**Does the migration cause downtime?**
Not if planned correctly. By adding new Graviton node groups and gradually shifting workloads using nodeSelectors, taints, and tolerations, you can achieve a zero-downtime migration.
Muhammad Bintang Bahy
Site Reliability Engineer @kloia