
Choosing AWS Node Types for Your Workloads#

This guide helps you select the appropriate AWS EC2 instance types for your Kubernetes workloads on the IT Common Platform.

Overview#

When deploying workloads to AWS EKS clusters, the platform uses Karpenter to automatically provision EC2 instances based on your pod requirements. Your choice of node types directly impacts:

  • Application performance
  • Cost efficiency
  • Availability and resilience
  • Scaling capabilities

Instance Type Information#

For detailed information about AWS EC2 instance types, their specifications, and use cases, refer to the official AWS EC2 instance types documentation.

On-Demand vs Spot Instances#

On-Demand Instances#

What they are: Standard EC2 instances that run continuously until you terminate them.

Pros:

  • Guaranteed availability once launched
  • No interruptions or forced terminations
  • Predictable pricing
  • Best for production workloads

Cons:

  • Higher cost (spot pricing is typically 60-90% lower)
  • No cost savings for unused capacity

Use cases:

  • Production applications
  • Stateful workloads
  • Applications that cannot tolerate interruptions
  • Databases and data stores

Spot Instances#

What they are: Spare EC2 capacity available at discounted rates (up to 90% off on-demand prices).

Pros:

  • Significant cost savings (60-90% cheaper)
  • Same performance as on-demand instances
  • Ideal for fault-tolerant workloads

Cons:

  • Can be reclaimed by AWS with a 2-minute interruption notice
  • Not guaranteed to be available
  • Price fluctuates based on demand

Use cases:

  • Batch processing jobs
  • CI/CD pipelines
  • Development and testing environments
  • Stateless, fault-tolerant applications
  • Background processing tasks
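
If a node pool offers both capacity types, a fault-tolerant workload can opt into spot nodes by selecting on Karpenter's standard capacity-type node label. A minimal sketch (the Deployment name, labels, and image are placeholders to adapt):

```yaml
# Hypothetical Deployment that requests spot capacity via the
# standard Karpenter capacity-type node label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker                       # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot   # schedule only onto spot nodes
      containers:
        - name: worker
          image: registry.example.com/batch-worker:latest   # placeholder image
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
```

Pods without this selector remain free to land on either capacity type, so reserve it for workloads that genuinely tolerate interruption.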

Small vs Large Instances#

Small Instances (e.g., t3.small, t3.medium)#

Pros:

  • Lower cost per instance
  • Better bin packing for small pods
  • Reduced blast radius if a node fails
  • Faster node provisioning and termination
  • More granular scaling

Cons:

  • Higher overhead ratio (system pods consume a larger percentage of each node's capacity)
  • More nodes to manage
  • Potential for increased network traffic between nodes
  • May hit AWS API rate limits with many nodes

Best for:

  • Microservices with small resource requirements
  • Development and testing environments
  • Applications with variable load

Large Instances (e.g., m5.2xlarge, m5.4xlarge)#

Pros:

  • Lower overhead ratio (system resources consume a smaller percentage of each node's capacity)
  • Fewer nodes to manage
  • Better for applications requiring node-local communication
  • More efficient for large pods

Cons:

  • Higher cost per instance
  • Larger blast radius on node failure
  • Less granular scaling
  • Potential for resource waste with poor bin packing

Best for:

  • Resource-intensive applications
  • Databases and caches
  • Applications requiring high memory or CPU
  • Workloads with stable, predictable resource needs

Currently Available Node Types#

The IT Common Platform currently offers the following standard node configurations:

Standard Node Types#

  • t3a.medium (2 vCPU, 4 GB RAM) - Burstable performance, general purpose
  • t3a.large (2 vCPU, 8 GB RAM) - Burstable performance, general purpose

These instance types are suitable for most general-purpose workloads and provide a good balance of cost and performance. They are currently used across all production, pre-production, and development environments.

Need Different Instance Types?#

If your workload requires different instance types (e.g., compute-optimized, memory-optimized, or larger instances), please create a ServiceNow ticket. The team can help evaluate your requirements and configure appropriate node pools for your specific needs.

AWS Instance Costs#

To estimate and compare costs for different instance types, use the AWS Pricing Calculator or the Amazon EC2 pricing pages.

How to Request Node Pool Changes#

  1. Gather requirements: Resource needs (CPU/memory), workload type, availability requirements, desired instance types
  2. Submit request: Go to IT Common Platform Consultation and include:
    • Namespace/tenant name
    • Desired instance types and capacity type (on-demand/spot)
    • Resource limits and performance requirements
    • Business justification
  3. Validate changes: After implementation, verify pod placement with kubectl get pods -o wide -n <namespace> and monitor in Grafana
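
For the validation step, node-level labels make it easy to confirm which instance types and capacity types your pods actually landed on. A quick check using standard Kubernetes and Karpenter node labels:

```shell
# Show each node's instance type and capacity type (on-demand vs spot)
kubectl get nodes -L node.kubernetes.io/instance-type -L karpenter.sh/capacity-type

# Confirm which node each pod in your namespace is running on
kubectl get pods -o wide -n <namespace>
```

Cross-reference the node names from the second command against the first to verify placement.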

Best Practices#

  1. Start with standard instance types (t3a.medium/t3a.large) and optimize based on metrics
  2. Monitor actual usage through Grafana before requesting changes
  3. Set appropriate pod resource requests to ensure efficient bin packing:
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
    
  4. Set pod disruption budgets when using spot instances to maintain availability
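
For point 4, a PodDisruptionBudget caps how many replicas can be evicted at once during spot interruptions or node consolidation. A minimal sketch (the name, label, and budget are placeholders to adapt to your Deployment):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb          # placeholder name
spec:
  minAvailable: 2            # keep at least 2 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: web-app           # must match your Deployment's pod labels
```

Size minAvailable below your replica count, or the budget can block node drains entirely.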

Common Configurations#

Development Environment#

nodePools:
  - name: dev
    instanceTypes: ["t3a.small", "t3a.medium"]
    capacityTypes: ["spot"]
    limits:
      cpu: 20
      memory: 40Gi

Production Web Application#

nodePools:
  - name: web-app
    instanceTypes: ["t3a.medium", "t3a.large"]
    capacityTypes: ["on-demand"]
    limits:
      cpu: 100
      memory: 200Gi

Monitoring and Optimization#

Use the platform's monitoring tools to validate your instance choices:

  • Grafana Dashboards: Monitor CPU, memory, and network utilization
  • Opencost: Analyze cost per pod and namespace

Regular review of these metrics will help you optimize your node pool configuration for both performance and cost.

Additional Resources#