# Choosing AWS Node Types for Your Workloads
This guide helps you select the appropriate AWS EC2 instance types for your Kubernetes workloads on the IT Common Platform.
## Overview
When deploying workloads to AWS EKS clusters, the platform uses Karpenter to automatically provision EC2 instances based on your pod requirements. Your choice of node types directly impacts:
- Application performance
- Cost efficiency
- Availability and resilience
- Scaling capabilities
## Instance Type Information
For detailed information about AWS EC2 instance types, their specifications, and use cases, please refer to the official AWS documentation:
- AWS EC2 Instance Types - Complete list of all instance families and their characteristics
- AWS Instance Type Explorer - Interactive tool to find and compare instance types
## On-Demand vs Spot Instances

### On-Demand Instances
What they are: Standard EC2 instances that run continuously until you terminate them.
Pros:
- Guaranteed availability once launched
- No interruptions or forced terminations
- Predictable pricing
- Best for production workloads
Cons:
- Higher cost (spot capacity is typically 60-90% cheaper)
- No cost savings for unused capacity
Use cases:
- Production applications
- Stateful workloads
- Applications that cannot tolerate interruptions
- Databases and data stores
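For workloads like these, you can pin pods to on-demand capacity explicitly. The sketch below is illustrative (the Deployment name and image are hypothetical) and assumes the standard `karpenter.sh/capacity-type` label that Karpenter applies to the nodes it provisions:

```yaml
# Hypothetical production Deployment pinned to on-demand capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      nodeSelector:
        # Only schedule onto on-demand nodes, avoiding spot interruptions.
        karpenter.sh/capacity-type: on-demand
      containers:
        - name: orders-api
          image: registry.example.com/orders-api:1.0.0  # illustrative image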
### Spot Instances
What they are: Spare EC2 capacity available at discounted rates (up to 90% off on-demand prices).
Pros:
- Significant cost savings (60-90% cheaper)
- Same performance as on-demand instances
- Ideal for fault-tolerant workloads
Cons:
- Can be terminated with 2-minute notice
- Not guaranteed to be available
- Price fluctuates based on demand
Use cases:
- Batch processing jobs
- CI/CD pipelines
- Development and testing environments
- Stateless, fault-tolerant applications
- Background processing tasks
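Conversely, interruption-tolerant workloads can opt into spot capacity explicitly. A minimal sketch (the Job name, namespace, and command are hypothetical), again assuming Karpenter's `karpenter.sh/capacity-type` node label:

```yaml
# Hypothetical batch Job that runs only on spot nodes.
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-worker
  namespace: my-namespace   # illustrative namespace
spec:
  template:
    spec:
      nodeSelector:
        # Only schedule onto spot nodes to capture the discount.
        karpenter.sh/capacity-type: spot
      # Retry the pod if a spot node is reclaimed mid-run.
      restartPolicy: OnFailure
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sh", "-c", "echo processing batch && sleep 30"]
```

Because spot nodes can disappear with a 2-minute notice, design such jobs to be idempotent or checkpointed so a retry does not corrupt results.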
## Small vs Large Instances

### Small Instances (e.g., t3.small, t3.medium)
Pros:
- Lower cost per instance
- Better bin packing for small pods
- Reduced blast radius if a node fails
- Faster node provisioning and termination
- More granular scaling
Cons:
- Higher overhead ratio (system pods consume larger percentage)
- More nodes to manage
- Potential for increased network traffic between nodes
- May hit AWS API rate limits with many nodes
Best for:
- Microservices with small resource requirements
- Development and testing environments
- Applications with variable load
### Large Instances (e.g., m5.2xlarge, m5.4xlarge)
Pros:
- Lower overhead ratio (system resources are a smaller percentage)
- Fewer nodes to manage
- Better for applications requiring node-local communication
- More efficient for large pods
Cons:
- Higher cost per instance
- Larger blast radius on node failure
- Less granular scaling
- Potential for resource waste with poor bin packing
Best for:
- Resource-intensive applications
- Databases and caches
- Applications requiring high memory or CPU
- Workloads with stable, predictable resource needs
## Currently Available Node Types
The IT Common Platform currently offers the following standard node configurations:
### Standard Node Types
- t3a.medium (2 vCPU, 4 GB RAM) - Burstable performance, general purpose
- t3a.large (2 vCPU, 8 GB RAM) - Burstable performance, general purpose
These instance types are suitable for most general-purpose workloads and provide a good balance of cost and performance. They are currently used across all production, pre-production, and development environments.
### Need Different Instance Types?
If your workload requires different instance types (e.g., compute-optimized, memory-optimized, or larger instances), please create a ServiceNow ticket. The team can help evaluate your requirements and configure appropriate node pools for your specific needs.
## AWS Instance Costs
To estimate and compare costs for different instance types:
- AWS EC2 Instance Pricing - Official pricing for all instance types
- AWS Pricing Calculator - Estimate costs for your configuration
- EC2 Instance Comparison - Third-party tool to compare instance types and prices
- AWS Spot Pricing History - View spot price trends
## How to Request Node Pool Changes

1. Gather requirements: resource needs (CPU/memory), workload type, availability requirements, desired instance types
2. Submit request: go to IT Common Platform Consultation and include:
   - Namespace/tenant name
   - Desired instance types and capacity type (on-demand/spot)
   - Resource limits and performance requirements
   - Business justification
3. Validate changes: after implementation, verify pod placement with `kubectl get pods -o wide -n <namespace>` and monitor in Grafana
## Best Practices

- Start with standard instance types (t3a.medium/t3a.large) and optimize based on metrics
- Monitor actual usage through Grafana before requesting changes
- Set appropriate pod resource requests to ensure efficient bin packing
- Set pod disruption budgets when using spot instances to maintain availability
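The last two practices can be sketched together. The names and numbers below are illustrative, not platform defaults:

```yaml
# Explicit requests let the scheduler (and Karpenter) bin-pack accurately.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          resources:
            requests:
              cpu: 250m      # what the scheduler reserves per pod
              memory: 256Mi
            limits:
              memory: 512Mi  # cap to protect neighbouring pods
---
# A PodDisruptionBudget keeps a minimum number of replicas running while
# nodes (e.g., reclaimed spot instances) are drained.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: web
```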
## Common Configurations

### Development Environment

```yaml
nodePools:
  - name: dev
    instanceTypes: ["t3a.small", "t3a.medium"]
    capacityTypes: ["spot"]
    limits:
      cpu: 20
      memory: 40Gi
```
### Production Web Application

```yaml
nodePools:
  - name: web-app
    instanceTypes: ["t3a.medium", "t3a.large"]
    capacityTypes: ["on-demand"]
    limits:
      cpu: 100
      memory: 200Gi
```
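If the platform team provisions additional pools for you, a mixed-capacity pool for batch workloads might look like this hypothetical example, which assumes the same configuration schema as the pools above:

```yaml
nodePools:
  - name: batch
    instanceTypes: ["t3a.medium", "t3a.large"]
    # With both capacity types allowed, Karpenter generally favours the
    # cheaper spot capacity and can fall back to on-demand when spot is scarce.
    capacityTypes: ["spot", "on-demand"]
    limits:
      cpu: 50
      memory: 100Gi
```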
## Monitoring and Optimization
Use the platform's monitoring tools to validate your instance choices:
- Grafana Dashboards: Monitor CPU, memory, and network utilization
- Opencost: Analyze cost per pod and namespace
Regular review of these metrics will help you optimize your node pool configuration for both performance and cost.