Introduction
EKS compute costs can spiral out of control quickly. I inherited a cluster with static node groups that were either over-provisioned (wasting money) or under-provisioned (causing scheduling failures). Karpenter changed everything - it provisions exactly the right nodes, at the right time, using the cheapest instance types.
This post covers how I built a multi-tenant EKS platform that:
- Reduced compute costs by 60% using Spot instances intelligently
- Provisions nodes in under 60 seconds (vs 5+ minutes with Cluster Autoscaler)
- Consolidates underutilized nodes automatically
- Allocates costs per team with accurate showback
Architecture Overview
flowchart TB
subgraph EKSCluster["EKS Cluster"]
subgraph ControlPlane["Control Plane"]
API[Kubernetes API]
KARPENTER[Karpenter Controller]
end
subgraph NodePools["Node Pools"]
subgraph OnDemand["On-Demand Pool"]
OD1[m6i.xlarge]
OD2[m6i.2xlarge]
end
subgraph Spot["Spot Pool"]
SP1[c6i.xlarge]
SP2[c6i.2xlarge]
SP3[m6i.xlarge]
SP4[r6i.xlarge]
end
subgraph GPU["GPU Pool"]
GPU1[g5.xlarge]
GPU2[g5.2xlarge]
end
end
subgraph Workloads["Workloads by Team"]
subgraph TeamA["Team A - Production"]
A1[Critical API]
A2[Payment Service]
end
subgraph TeamB["Team B - Analytics"]
B1[Spark Jobs]
B2[Data Pipeline]
end
subgraph TeamC["Team C - ML"]
C1[Training Jobs]
C2[Inference]
end
end
end
subgraph CostManagement["Cost Management"]
KUBECOST[Kubecost]
CUR[AWS Cost & Usage Report]
DASHBOARD[Cost Dashboard]
end
KARPENTER --> NodePools
TeamA --> OnDemand
TeamB --> Spot
TeamC --> GPU
NodePools --> KUBECOST
KUBECOST --> DASHBOARD
CUR --> DASHBOARD
style ControlPlane fill:#1a1a2e,stroke:#00d9ff,stroke-width:2px,color:#fff
style OnDemand fill:#2a9d8f,stroke:#fff,stroke-width:2px,color:#fff
style Spot fill:#f77f00,stroke:#fff,stroke-width:2px,color:#fff
style GPU fill:#9b5de5,stroke:#fff,stroke-width:2px,color:#fff
style CostManagement fill:#264653,stroke:#e63946,stroke-width:2px,color:#fff
Karpenter Installation
# karpenter/main.tf
resource "helm_release" "karpenter" {
name = "karpenter"
repository = "oci://public.ecr.aws/karpenter"
chart = "karpenter"
version = "0.34.0" # NodePool disruption budgets used below require Karpenter v0.34+
namespace = "karpenter"
create_namespace = true
values = [
yamlencode({
settings = {
clusterName = var.cluster_name
clusterEndpoint = var.cluster_endpoint
interruptionQueue = aws_sqs_queue.karpenter.name
}
serviceAccount = {
annotations = {
"eks.amazonaws.com/role-arn" = aws_iam_role.karpenter.arn
}
}
controller = {
resources = {
requests = {
cpu = "500m"
memory = "512Mi"
}
limits = {
cpu = "1"
memory = "1Gi"
}
}
}
# Run two controller replicas for high availability
replicas = 2
})
]
depends_on = [
aws_iam_role_policy_attachment.karpenter,
]
}
# SQS queue for Spot interruption handling
resource "aws_sqs_queue" "karpenter" {
name = "karpenter-${var.cluster_name}"
message_retention_seconds = 300
sqs_managed_sse_enabled = true
}
# EventBridge rules for Spot interruptions
resource "aws_cloudwatch_event_rule" "spot_interruption" {
name = "karpenter-spot-interruption"
description = "Spot instance interruption notice"
event_pattern = jsonencode({
source = ["aws.ec2"]
detail-type = ["EC2 Spot Instance Interruption Warning"]
})
}
resource "aws_cloudwatch_event_target" "spot_interruption" {
rule = aws_cloudwatch_event_rule.spot_interruption.name
target_id = "karpenter"
arn = aws_sqs_queue.karpenter.arn
}
NodePool Configuration
flowchart TD
subgraph NodePoolStrategy["Node Pool Strategy"]
direction TB
subgraph Critical["Critical Workloads"]
CRIT_REQ["Requirements:<br/>- High availability<br/>- Predictable performance"]
CRIT_POOL["On-Demand NodePool<br/>- m6i, c6i families<br/>- No Spot"]
end
subgraph General["General Workloads"]
GEN_REQ["Requirements:<br/>- Cost efficient<br/>- Interruption tolerant"]
GEN_POOL["Spot NodePool<br/>- Diverse instance types<br/>- 70% cost savings"]
end
subgraph Batch["Batch/Analytics"]
BATCH_REQ["Requirements:<br/>- Checkpointing<br/>- Flexible scheduling"]
BATCH_POOL["Spot NodePool<br/>- Large instances<br/>- Consolidation enabled"]
end
subgraph MLWorkloads["ML Workloads"]
ML_REQ["Requirements:<br/>- GPU instances<br/>- Training/Inference"]
ML_POOL["GPU NodePool<br/>- g5 instances<br/>- Spot for training"]
end
end
style Critical fill:#e63946,stroke:#fff,stroke-width:2px,color:#fff
style General fill:#2a9d8f,stroke:#fff,stroke-width:2px,color:#fff
style Batch fill:#f77f00,stroke:#fff,stroke-width:2px,color:#fff
style MLWorkloads fill:#9b5de5,stroke:#fff,stroke-width:2px,color:#fff
On-Demand NodePool for Critical Workloads
# karpenter/nodepools/critical.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: critical
spec:
template:
metadata:
labels:
workload-type: critical
billing-team: platform
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["amd64"]
- key: kubernetes.io/os
operator: In
values: ["linux"]
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand"] # No Spot for critical
- key: karpenter.k8s.aws/instance-category
operator: In
values: ["m", "c"]
- key: karpenter.k8s.aws/instance-generation
operator: Gt
values: ["5"]
- key: karpenter.k8s.aws/instance-size
operator: In
values: ["xlarge", "2xlarge", "4xlarge"]
nodeClassRef:
name: default
taints:
- key: workload-type
value: critical
effect: NoSchedule
limits:
cpu: 1000
memory: 2000Gi
disruption:
consolidationPolicy: WhenEmpty
consolidateAfter: 30s
budgets:
- nodes: "10%"
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
name: default
spec:
amiFamily: AL2
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: "true"
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: "true"
instanceProfile: KarpenterNodeInstanceProfile
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 100Gi
volumeType: gp3
encrypted: true
deleteOnTermination: true
tags:
Environment: production
ManagedBy: karpenter
Spot NodePool for General Workloads
# karpenter/nodepools/spot-general.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: spot-general
spec:
template:
metadata:
labels:
workload-type: general
capacity-type: spot
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["amd64"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
- key: karpenter.k8s.aws/instance-category
operator: In
values: ["c", "m", "r"]
- key: karpenter.k8s.aws/instance-generation
operator: Gt
values: ["5"]
- key: karpenter.k8s.aws/instance-size
operator: In
values: ["large", "xlarge", "2xlarge"]
# Diversify across instance types for Spot availability
- key: karpenter.k8s.aws/instance-family
operator: In
values: ["c6i", "c6a", "c7i", "m6i", "m6a", "m7i", "r6i", "r6a"]
nodeClassRef:
name: default
limits:
cpu: 2000
memory: 4000Gi
disruption:
# consolidateAfter cannot be combined with WhenUnderutilized in v1beta1
consolidationPolicy: WhenUnderutilized
budgets:
- nodes: "20%"
# Weight for scheduling preference (higher = preferred)
weight: 100
GPU NodePool for ML Workloads
# karpenter/nodepools/gpu.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: gpu
spec:
template:
metadata:
labels:
workload-type: gpu
nvidia.com/gpu: "true"
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["amd64"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
- key: karpenter.k8s.aws/instance-category
operator: In
values: ["g"]
- key: karpenter.k8s.aws/instance-family
operator: In
values: ["g5", "g4dn"]
- key: karpenter.k8s.aws/instance-size
operator: In
values: ["xlarge", "2xlarge", "4xlarge"]
nodeClassRef:
name: gpu
taints:
- key: nvidia.com/gpu
value: "true"
effect: NoSchedule
limits:
cpu: 500
memory: 1000Gi
nvidia.com/gpu: 50
disruption:
consolidationPolicy: WhenEmpty
consolidateAfter: 5m
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
name: gpu
spec:
amiFamily: AL2
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: "true"
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: "true"
instanceProfile: KarpenterNodeInstanceProfile
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 200Gi
volumeType: gp3
encrypted: true
# NVIDIA driver installation
userData: |
#!/bin/bash
set -e
# Install NVIDIA drivers
amazon-linux-extras install -y epel
yum install -y nvidia-driver-latest-dkms
# Install nvidia-container-toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | \
tee /etc/yum.repos.d/nvidia-container-toolkit.repo
yum install -y nvidia-container-toolkit
# Configure containerd
nvidia-ctk runtime configure --runtime=containerd
systemctl restart containerd
Node Provisioning Flow
sequenceDiagram
participant Scheduler as Kubernetes Scheduler
participant Karpenter as Karpenter Controller
participant EC2 as AWS EC2
participant Node as New Node
Scheduler->>Scheduler: Pod pending (no capacity)
Scheduler->>Karpenter: Unschedulable pod event
Karpenter->>Karpenter: Evaluate NodePools
Karpenter->>Karpenter: Calculate optimal instance type
Note over Karpenter: Consider: CPU, memory, GPU,<br/>architecture, Spot availability
Karpenter->>EC2: CreateFleet (Spot or On-Demand)
EC2-->>Karpenter: Instance launched
Karpenter->>Node: Bootstrap node
Node->>Node: Join cluster
Node-->>Scheduler: Node ready
Scheduler->>Node: Schedule pending pods
Note over Karpenter: ~60 seconds total
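To make the "calculate optimal instance type" step concrete: Karpenter bin-packs the resource requests of pending pods against the instance types each NodePool allows and launches the cheapest option that fits. A minimal sketch of a pod that would trigger this (the name, sizes, and image are illustrative):
# capacity-probe.yaml (hypothetical) - these requests would typically be
# packed onto a single xlarge-class node from the untainted spot-general pool
apiVersion: v1
kind: Pod
metadata:
  name: capacity-probe
  namespace: team-b
spec:
  priorityClassName: default
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "2"
          memory: 6Gi
        limits:
          cpu: "2"
          memory: 6Gi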
Multi-Tenant Resource Quotas
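Quotas hang off per-team namespaces, and labeling each namespace with the team name is also what lets Kubecost roll costs up per team later (the label key matches the allocation config further down). A minimal sketch, assuming team-a namespaces are created this way:
# namespaces/team-a.yaml (assumed layout)
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    team: team-a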
# quotas/team-a.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: team-a-quota
namespace: team-a
spec:
hard:
requests.cpu: "100"
requests.memory: 200Gi
limits.cpu: "200"
limits.memory: 400Gi
persistentvolumeclaims: "50"
services.loadbalancers: "5"
---
apiVersion: v1
kind: LimitRange
metadata:
name: team-a-limits
namespace: team-a
spec:
limits:
- type: Container
default:
cpu: "500m"
memory: 512Mi
defaultRequest:
cpu: "100m"
memory: 128Mi
max:
cpu: "8"
memory: 32Gi
- type: PersistentVolumeClaim
max:
storage: 100Gi
Priority Classes for Workload Scheduling
# priority-classes.yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: critical
value: 1000000
globalDefault: false
description: "Critical production workloads - highest priority, never preempts others"
preemptionPolicy: Never
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high
value: 100000
globalDefault: false
description: "High priority workloads"
preemptionPolicy: PreemptLowerPriority
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: default
value: 10000
globalDefault: true
description: "Default priority for general workloads"
preemptionPolicy: PreemptLowerPriority
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: batch
value: 1000
globalDefault: false
description: "Batch jobs - can be preempted"
preemptionPolicy: PreemptLowerPriority
Node Consolidation
flowchart TD
subgraph Before["Before Consolidation"]
N1["Node 1<br/>CPU: 20%<br/>Memory: 30%"]
N2["Node 2<br/>CPU: 25%<br/>Memory: 20%"]
N3["Node 3<br/>CPU: 15%<br/>Memory: 25%"]
P1[Pod A] --> N1
P2[Pod B] --> N1
P3[Pod C] --> N2
P4[Pod D] --> N3
end
subgraph After["After Consolidation"]
N1A["Node 1<br/>CPU: 60%<br/>Memory: 75%"]
P1A[Pod A] --> N1A
P2A[Pod B] --> N1A
P3A[Pod C] --> N1A
P4A[Pod D] --> N1A
end
Before -->|"Karpenter<br/>Consolidation"| After
SAVINGS["Cost Savings:<br/>2 nodes removed<br/>~66% reduction"]
style Before fill:#e63946,stroke:#fff,stroke-width:2px,color:#fff
style After fill:#2a9d8f,stroke:#fff,stroke-width:2px,color:#fff
style SAVINGS fill:#ffbe0b,stroke:#fff,stroke-width:2px,color:#000
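Consolidation is aggressive by design, so workloads that genuinely cannot be moved need an explicit opt-out. Karpenter skips voluntary disruption of nodes running pods that carry the karpenter.sh/do-not-disrupt annotation; a minimal sketch (the workload name and image are hypothetical):
# do-not-disrupt-example.yaml (hypothetical workload)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: report-builder
  namespace: team-b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: report-builder
  template:
    metadata:
      labels:
        app: report-builder
      annotations:
        # Karpenter will not voluntarily consolidate a node
        # while a pod with this annotation is running on it
        karpenter.sh/do-not-disrupt: "true"
    spec:
      containers:
        - name: builder
          image: report-builder:v1   # hypothetical image
          resources:
            requests:
              cpu: "1"
              memory: 2Gi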
Spot Instance Strategy
# spot-strategy.tf
# Diversify across instance types and AZs
locals {
spot_instance_types = [
# Compute optimized
"c6i.large", "c6i.xlarge", "c6i.2xlarge",
"c6a.large", "c6a.xlarge", "c6a.2xlarge",
"c7i.large", "c7i.xlarge", "c7i.2xlarge",
# General purpose
"m6i.large", "m6i.xlarge", "m6i.2xlarge",
"m6a.large", "m6a.xlarge", "m6a.2xlarge",
"m7i.large", "m7i.xlarge", "m7i.2xlarge",
# Memory optimized (for some workloads)
"r6i.large", "r6i.xlarge",
"r6a.large", "r6a.xlarge",
]
}
# Monitor Spot pricing and availability
resource "aws_cloudwatch_metric_alarm" "spot_interruptions" {
alarm_name = "high-spot-interruptions"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 3
metric_name = "SpotInterruptionRate"
namespace = "Karpenter"
period = 300
statistic = "Sum"
threshold = 5
alarm_description = "High rate of Spot interruptions"
alarm_actions = [var.sns_topic_arn]
}
Cost Allocation with Kubecost
# kubecost/main.tf
resource "helm_release" "kubecost" {
name = "kubecost"
repository = "https://kubecost.github.io/cost-analyzer/"
chart = "cost-analyzer"
namespace = "kubecost"
version = "1.106.0"
create_namespace = true
values = [
yamlencode({
global = {
prometheus = {
enabled = false # Use existing Prometheus
fqdn = "http://prometheus-server.monitoring:80"
}
}
kubecostModel = {
etlCloudAsset = true
}
# AWS integration for accurate pricing
kubecostProductConfigs = {
cloudIntegrationJSON = jsonencode({
aws = [{
athenaBucketName = var.athena_bucket
athenaRegion = var.region
athenaDatabase = "athenacurcfn_cost_report"
athenaTable = "cost_report"
athenaWorkgroup = "primary"
masterPayerARN = var.master_payer_arn
}]
})
}
# Cost allocation
kubecostDeployment = {
labels = {
"app.kubernetes.io/component" = "cost-analyzer"
}
}
})
]
}
Cost Allocation Dashboard
# kubecost/allocation-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: allocation-config
namespace: kubecost
data:
allocation.yaml: |
# Team-based allocation
teams:
- name: team-a
namespaces:
- team-a
- team-a-staging
labels:
team: team-a
- name: team-b
namespaces:
- team-b
- analytics
labels:
team: team-b
- name: platform
namespaces:
- kube-system
- monitoring
- karpenter
labels:
team: platform
# Shared cost distribution
sharedCosts:
- name: cluster-overhead
type: weighted
filter:
namespaces:
- kube-system
- monitoring
- name: networking
type: proportional
filter:
labels:
cost-type: networking
Cost Savings Breakdown
flowchart LR
subgraph Before["Before Karpenter"]
B_NODES["Static Node Groups<br/>Always running<br/>Over-provisioned"]
B_COST["Monthly Cost<br/>$50,000"]
end
subgraph After["After Karpenter"]
subgraph Savings["Savings Sources"]
S1["Spot Instances<br/>-40%"]
S2["Right-sizing<br/>-15%"]
S3["Consolidation<br/>-10%"]
S4["Scale to Zero<br/>-5%"]
end
A_COST["Monthly Cost<br/>$20,000"]
end
Before --> After
TOTAL["Total Savings: 60%<br/>$30,000/month"]
style Before fill:#e63946,stroke:#fff,stroke-width:2px,color:#fff
style After fill:#2a9d8f,stroke:#fff,stroke-width:2px,color:#fff
style TOTAL fill:#ffbe0b,stroke:#fff,stroke-width:2px,color:#000
Workload Examples
Critical Workload (On-Demand)
apiVersion: apps/v1
kind: Deployment
metadata:
name: payment-service
namespace: team-a
spec:
replicas: 3
selector:
matchLabels:
app: payment-service
template:
metadata:
labels:
app: payment-service
workload-type: critical
spec:
priorityClassName: critical
nodeSelector:
workload-type: critical
tolerations:
- key: workload-type
value: critical
effect: NoSchedule
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: payment-service
containers:
- name: payment-service
image: payment-service:v1.2.3
resources:
requests:
cpu: "500m"
memory: 512Mi
limits:
cpu: "2"
memory: 2Gi
Batch Workload (Spot)
apiVersion: batch/v1
kind: Job
metadata:
name: data-processing
namespace: team-b
spec:
parallelism: 10
completions: 100
backoffLimit: 3
template:
metadata:
labels:
app: data-processing
workload-type: batch
spec:
priorityClassName: batch
nodeSelector:
capacity-type: spot
tolerations:
- key: karpenter.sh/disruption
operator: Exists
restartPolicy: OnFailure
containers:
- name: processor
image: data-processor:v2.0
resources:
requests:
cpu: "2"
memory: 4Gi
limits:
cpu: "4"
memory: 8Gi
Monitoring and Alerts
# monitoring/karpenter-alerts.tf
resource "aws_cloudwatch_metric_alarm" "karpenter_pending_pods" {
alarm_name = "karpenter-pending-pods"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 3
metric_name = "pending_pods"
namespace = "Karpenter"
period = 60
statistic = "Maximum"
threshold = 10
alarm_description = "Too many pending pods - Karpenter may be failing"
alarm_actions = [var.sns_topic_arn]
}
resource "aws_cloudwatch_metric_alarm" "node_launch_failures" {
alarm_name = "karpenter-launch-failures"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "nodeclaims_launch_failed"
namespace = "Karpenter"
period = 300
statistic = "Sum"
threshold = 3
alarm_description = "Karpenter failing to launch nodes"
alarm_actions = [var.sns_topic_arn]
}
Best Practices
| Practice | Why |
|---|---|
| Diversify Spot instance types | Higher availability, fewer interruptions |
| Use consolidation wisely | Balance cost vs stability |
| Set appropriate limits | Prevent runaway scaling |
| Tag everything | Accurate cost allocation |
| Use PriorityClasses | Protect critical workloads |
| Monitor Spot interruptions | React to capacity issues |
Troubleshooting
"Pods stuck pending"
# Check Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f
# Check NodePool status
kubectl get nodepools -o yaml
# Check if limits are reached
kubectl get nodepools -o jsonpath='{.items[*].status}'
"Nodes not consolidating"
- Check PodDisruptionBudgets - an overly strict PDB blocks node drains (see the sketch below)
- Verify the consolidation policy is set on the NodePool
- Check for pods carrying the karpenter.sh/do-not-disrupt annotation
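On the first point: a PDB that never allows an eviction blocks node drains indefinitely, while one that permits a single eviction keeps consolidation moving. A minimal sketch for the payment-service example above (3 replicas, so minAvailable: 2 still lets one pod move at a time):
# pdb/payment-service.yaml (assumed path)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service
  namespace: team-a
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payment-service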
"Spot interruptions causing issues"
- Increase instance type diversity
- Add a fallback to on-demand (see the sketch below)
- Implement proper pod disruption handling
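For the on-demand fallback, allowing both capacity types in a single NodePool is usually enough: Karpenter prefers Spot when capacity is available and falls back to On-Demand when it is not. A minimal sketch (the pool name is illustrative; only the relevant requirements are shown):
# karpenter/nodepools/spot-fallback.yaml (illustrative)
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-with-fallback
spec:
  template:
    spec:
      requirements:
        # Spot first, On-Demand when Spot capacity is unavailable
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
      nodeClassRef:
        name: default
  limits:
    cpu: 1000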
Conclusion
Karpenter transforms EKS cost management from a guessing game into a precise, automated process. The combination of:
- Intelligent provisioning - right instance at the right time
- Spot instances - 70% savings on interruptible workloads
- Automatic consolidation - no more wasted capacity
- Per-team cost allocation - accountability and showback
delivers significant cost savings while actually improving cluster responsiveness. The key is matching workload requirements to the right NodePool and letting Karpenter handle the rest.