
Choosing AWS Node Types for Your Workloads#

This guide helps you select the appropriate AWS EC2 instance types for your Kubernetes workloads on the IT Common Platform.

Overview#

When deploying workloads to AWS EKS clusters, the platform uses Karpenter to automatically provision EC2 instances based on your pod requirements. Your choice of node types directly impacts:

  • Application performance
  • Cost efficiency
  • Availability and resilience
  • Scaling capabilities

Instance Type Information#

For detailed information about AWS EC2 instance types, their specifications, and use cases, refer to the official AWS EC2 instance types documentation.

On-Demand vs Spot Instances#

On-Demand Instances#

What they are: Standard EC2 instances that run continuously until you terminate them.

Pros:

  • Guaranteed availability once launched
  • No interruptions or forced terminations
  • Predictable pricing
  • Best for production workloads

Cons:

  • Higher cost (spot pricing is typically 60-90% lower)
  • No cost savings for unused capacity

Use cases:

  • Production applications
  • Stateful workloads
  • Applications that cannot tolerate interruptions
  • Databases and data stores

Spot Instances#

What they are: Spare EC2 capacity available at discounted rates (up to 90% off on-demand prices).

Pros:

  • Significant cost savings (60-90% cheaper)
  • Same performance as on-demand instances
  • Ideal for fault-tolerant workloads

Cons:

  • Can be reclaimed by AWS with a 2-minute interruption notice
  • Not guaranteed to be available
  • Price fluctuates based on demand

Use cases:

  • Batch processing jobs
  • CI/CD pipelines
  • Development and testing environments
  • Stateless, fault-tolerant applications
  • Background processing tasks
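
If a node pool offers both capacity types, a fault-tolerant workload can opt into spot nodes by selecting on Karpenter's standard capacity-type node label. A minimal sketch (the Deployment name, labels, and image are placeholders to adapt):

```yaml
# Hypothetical Deployment that requests spot capacity via the
# standard Karpenter capacity-type node label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker                       # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot   # schedule only onto spot nodes
      containers:
        - name: worker
          image: registry.example.com/batch-worker:latest   # placeholder image
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
```

Pods without this selector remain free to land on either capacity type, so reserve it for workloads that genuinely tolerate interruption.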

Small vs Large Instances#

Small Instances (e.g., t3.small, t3.medium)#

Pros:

  • Lower cost per instance
  • Better bin packing for small pods
  • Reduced blast radius if a node fails
  • Faster node provisioning and termination
  • More granular scaling

Cons:

  • Higher overhead ratio (system pods consume a larger percentage of each node's capacity)
  • More nodes to manage
  • Potential for increased network traffic between nodes
  • May hit AWS API rate limits with many nodes

Best for:

  • Microservices with small resource requirements
  • Development and testing environments
  • Applications with variable load

Large Instances (e.g., m5.2xlarge, m5.4xlarge)#

Pros:

  • Lower overhead ratio (system resources consume a smaller percentage of each node's capacity)
  • Fewer nodes to manage
  • Better for applications requiring node-local communication
  • More efficient for large pods

Cons:

  • Higher cost per instance
  • Larger blast radius on node failure
  • Less granular scaling
  • Potential for resource waste with poor bin packing

Best for:

  • Resource-intensive applications
  • Databases and caches
  • Applications requiring high memory or CPU
  • Workloads with stable, predictable resource needs

Currently Available Node Types#

The IT Common Platform currently offers the following standard node configurations:

Standard Node Types#

  • t3a.medium (2 vCPU, 4 GB RAM) - Burstable performance, general purpose
  • t3a.large (2 vCPU, 8 GB RAM) - Burstable performance, general purpose

These instance types are suitable for most general-purpose workloads and provide a good balance of cost and performance. They are currently used across all production, pre-production, and development environments.

Need Different Instance Types?#

If your workload requires different instance types (e.g., compute-optimized, memory-optimized, or larger instances), please create a ServiceNow ticket. The team can help evaluate your requirements and configure appropriate node pools for your specific needs.

AWS Instance Costs#

To estimate and compare costs for different instance types, use the AWS Pricing Calculator or the Amazon EC2 pricing pages.

How to Request Node Pool Changes#

  1. Gather requirements: Resource needs (CPU/memory), workload type, availability requirements, desired instance types
  2. Submit request: Go to IT Common Platform Consultation and include:
    • Namespace/tenant name
    • Desired instance types and capacity type (on-demand/spot)
    • Resource limits and performance requirements
    • Business justification
  3. Validate changes: After implementation, verify pod placement with kubectl get pods -o wide -n <namespace> and monitor in Grafana
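
For the validation step, node-level labels make it easy to confirm which instance types and capacity types your pods actually landed on. A quick check using standard Kubernetes and Karpenter node labels:

```shell
# Show each node's instance type and capacity type (on-demand vs spot)
kubectl get nodes -L node.kubernetes.io/instance-type -L karpenter.sh/capacity-type

# Confirm which node each pod in your namespace is running on
kubectl get pods -o wide -n <namespace>
```

Cross-reference the node names from the second command against the first to verify placement.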

Best Practices#

  1. Start with standard instance types (t3a.medium/t3a.large) and optimize based on metrics
  2. Monitor actual usage through Grafana before requesting changes
  3. Set appropriate pod resource requests to ensure efficient bin packing:
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
    
  4. Set pod disruption budgets when using spot instances to maintain availability
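
For point 4, a PodDisruptionBudget caps how many replicas can be evicted at once during spot interruptions or node consolidation. A minimal sketch (the name, label, and budget are placeholders to adapt to your Deployment):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb          # placeholder name
spec:
  minAvailable: 2            # keep at least 2 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: web-app           # must match your Deployment's pod labels
```

Size minAvailable below your replica count, or the budget can block node drains entirely.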

Common Configurations#

Development Environment#

nodePools:
  - name: dev
    instanceTypes: ["t3a.small", "t3a.medium"]
    capacityTypes: ["spot"]
    limits:
      cpu: 20
      memory: 40Gi

Production Web Application#

nodePools:
  - name: web-app
    instanceTypes: ["t3a.medium", "t3a.large"]
    capacityTypes: ["on-demand"]
    limits:
      cpu: 100
      memory: 200Gi

Monitoring and Optimization#

Use the platform's monitoring tools to validate your instance choices:

  • Grafana Dashboards: Monitor CPU, memory, and network utilization
  • Opencost: Analyze cost per pod and namespace

Regular review of these metrics will help you optimize your node pool configuration for both performance and cost.

Additional Resources#