In this post, I’ll show you how to set up a managed Kubernetes cluster on AWS EKS with Karpenter-driven autoscaling on low-cost spot instances.
Amazon EKS (Elastic Kubernetes Service) is a managed Kubernetes service that lets you run Kubernetes on AWS without having to install, operate, and maintain your own Kubernetes control plane or nodes. Karpenter is an open-source project developed by Amazon that offers a Kubernetes-native node autoscaling solution. Integrating Karpenter with EKS can optimize resource allocation and reduce costs significantly.
Using spot instances can drastically reduce the cost of the cloud infrastructure Kubernetes uses to process workloads. Personally, I recommend this approach for DevOps tooling, machine learning, or any other application that needs a lot of compute in the cloud.
This post is based on the Karpenter getting started documentation.
Requirements
- Linux Bash Terminal
- AWS account and AWS cli tool
- kubectl – Kubernetes cli tool
- eksctl – AWS EKS cli tool
- helm – Package manager for Kubernetes
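Optionally, check that these tools are available on your PATH before starting; the commands below only print version information and are safe to run.
# Verify the required cli tools are installed (versions will vary)
aws --version
kubectl version --client
eksctl version
helm version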
Installation Steps
- First, create some environment variables we are going to use for the installation.
# Export environment variables
export K8S_VERSION=1.26
export KARPENTER_VERSION="v0.27.5"
export CLUSTER_NAME="demo"
export AWS_DEFAULT_REGION="us-east-1"
export AWS_PROFILE=default
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export TEMPOUT=$(mktemp)
- Check environment variables before we start the installation.
echo $KARPENTER_VERSION \
$CLUSTER_NAME \
$AWS_DEFAULT_REGION \
$AWS_ACCOUNT_ID \
$TEMPOUT
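- Optionally, add a small bash guard (not part of the Karpenter docs) that warns you if any of the variables above came out empty.
# Warn if any required environment variable is unset or empty
for v in K8S_VERSION KARPENTER_VERSION CLUSTER_NAME AWS_DEFAULT_REGION AWS_ACCOUNT_ID TEMPOUT; do
  [ -n "${!v}" ] || echo "WARNING: $v is empty, re-export it before continuing"
done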
- Let’s create an EKS cluster on AWS
# Karpenter AWS requirements configuration
curl -fsSL https://karpenter.sh/"${KARPENTER_VERSION}"/getting-started/getting-started-with-karpenter/cloudformation.yaml > $TEMPOUT \
&& aws cloudformation deploy \
--stack-name "Karpenter-${CLUSTER_NAME}" \
--template-file "${TEMPOUT}" \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides "ClusterName=${CLUSTER_NAME}"
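# (Optional) Confirm the Karpenter CloudFormation stack deployed successfully
# before creating the cluster; expect CREATE_COMPLETE or UPDATE_COMPLETE
aws cloudformation describe-stacks \
  --stack-name "Karpenter-${CLUSTER_NAME}" \
  --query "Stacks[0].StackStatus" \
  --output text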
# Create EKS cluster
eksctl create cluster -f - <<EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "${K8S_VERSION}"
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}

vpc:
  cidr: 10.10.0.0/16
  #autoAllocateIPv6: true
  # to disable public access to the endpoint and only allow private access,
  # set publicAccess to false
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
  nat:
    gateway: Single

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: karpenter
        namespace: karpenter
      roleName: ${CLUSTER_NAME}-karpenter
      attachPolicyARNs:
        - arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}
      roleOnly: true

iamIdentityMappings:
  - arn: "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}"
    username: system:node:{{EC2PrivateDNSName}}
    groups:
      - system:bootstrappers
      - system:nodes

# We keep one permanent node for core services like istio, grafana, etc.
managedNodeGroups:
  - instanceType: r5a.large
    amiFamily: AmazonLinux2
    name: ${CLUSTER_NAME}-node-core-services
    labels: { role: core }
    volumeSize: 100
    volumeType: gp3
    desiredCapacity: 1
    minSize: 1
    maxSize: 1

cloudWatch:
  clusterLogging:
    enableTypes: ["api", "audit", "authenticator", "controllerManager", "scheduler"]
    logRetentionInDays: 7

addons:
  - name: coredns
    version: latest # auto discovers the latest available
  - name: kube-proxy
    version: latest
  - name: aws-ebs-csi-driver
    wellKnownPolicies: # add IAM role and service account
      ebsCSIController: true

## Optionally run Karpenter on Fargate
# fargateProfiles:
#   - name: karpenter
#     selectors:
#       - namespace: karpenter
EOF
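- Cluster creation usually takes several minutes. Once eksctl returns, you can optionally confirm the cluster is ACTIVE before continuing.
# Optional: confirm the cluster exists and is ACTIVE
eksctl get cluster --name "${CLUSTER_NAME}" --region "${AWS_DEFAULT_REGION}"
aws eks describe-cluster --name "${CLUSTER_NAME}" --query "cluster.status" --output text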
- Export the cluster endpoint and the Karpenter IAM role ARN, and update the kubeconfig.
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)"
export KARPENTER_IAM_ROLE_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter"
# Check cluster endpoint and Karpenter IAM role
echo $CLUSTER_ENDPOINT $KARPENTER_IAM_ROLE_ARN
# Create kubeconfig
aws eks update-kubeconfig --region $AWS_DEFAULT_REGION --name $CLUSTER_NAME
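- As a quick sanity check, make sure kubectl can reach the new cluster; you should see the single node from the core-services managed node group.
# Verify kubectl access to the new cluster
kubectl cluster-info
kubectl get nodes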
- We need to create the EC2 Spot service-linked role in our account; otherwise, Karpenter will fail to launch nodes on spot instances.
# Create the EC2 Spot service-linked role ('|| true' ignores the error if it already exists)
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true
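- Optionally, confirm the service-linked role now exists; AWSServiceRoleForEC2Spot is the role AWS creates for spot.amazonaws.com.
# Optional: confirm the Spot service-linked role exists
aws iam get-role --role-name AWSServiceRoleForEC2Spot --query "Role.Arn" --output text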
- Install the Karpenter Helm chart.
# Log out of Docker to perform an unauthenticated pull against the public ECR
docker logout public.ecr.aws
helm registry logout public.ecr.aws
# Install karpenter helm chart
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION} --namespace karpenter --create-namespace \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
--set settings.aws.clusterName=${CLUSTER_NAME} \
--set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
--set settings.aws.interruptionQueueName=${CLUSTER_NAME} \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait
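- Before configuring Karpenter, it is worth checking that the controller came up cleanly in the karpenter namespace.
# Verify the Karpenter controller is running
kubectl get pods -n karpenter
kubectl get deployment karpenter -n karpenter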
- Once the Karpenter installation is finished, we must configure it by creating a Provisioner. This is how we tell Karpenter what kind of instances it should use when launching a new node and what the autoscaling limits are.
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: "karpenter.k8s.aws/instance-cpu"
      operator: In
      values: ["2", "4", "8", "16", "32"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: 10
      memory: 64Gi
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 30
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelector:
    karpenter.sh/discovery: ${CLUSTER_NAME}
EOF
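- You can optionally confirm that both custom resources were created; the describe output also shows the requirements and limits Karpenter will enforce.
# Verify the Provisioner and AWSNodeTemplate were created
kubectl get provisioners.karpenter.sh
kubectl get awsnodetemplates.karpenter.k8s.aws
kubectl describe provisioner default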
- To verify that the Karpenter installation works, create a test deployment and scale it up.
# Create a test deployment
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
EOF
# Scale the deployment
kubectl scale deployment inflate --replicas 5
# Check the logs to see how the autoscaling is working
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
# Check the nodes created by karpenter
kubectl get nodes
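- To confirm the new nodes are actually spot instances, you can print the capacity-type and instance-type labels that Karpenter sets on the nodes it provisions.
# Show capacity type and instance type labels on the nodes
kubectl get nodes -L karpenter.sh/capacity-type -L node.kubernetes.io/instance-type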
- Delete the test deployment to scale the nodes back down.
# Delete test deployment
kubectl delete deployment inflate
# Check the logs to watch the progress of the nodes scaling down
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
# Check if the nodes are gone
kubectl get nodes
- Once we have finished testing Karpenter, we can clean up and delete the EKS cluster.
# Uninstall Karpenter helm chart
helm uninstall karpenter --namespace karpenter
# Remove resources created by cloudformation
aws cloudformation delete-stack --stack-name "Karpenter-${CLUSTER_NAME}"
# Delete launch templates
aws ec2 describe-launch-templates --filters Name=tag:karpenter.k8s.aws/cluster,Values=${CLUSTER_NAME} |
jq -r ".LaunchTemplates[].LaunchTemplateName" |
xargs -I{} aws ec2 delete-launch-template --launch-template-name {}
# Delete EKS cluster
eksctl delete cluster --name "${CLUSTER_NAME}"
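- Cluster and stack deletion can take a while; the optional commands below wait for the CloudFormation stack to finish deleting and list any EKS clusters left in the region.
# Wait for the Karpenter CloudFormation stack deletion to complete
aws cloudformation wait stack-delete-complete --stack-name "Karpenter-${CLUSTER_NAME}"
# Optional: list any remaining EKS clusters in the region
aws eks list-clusters --output table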