Kubernetes abstracts infrastructure beautifully — but that abstraction can hide waste. Teams often over-provision resources because the cost of an outage feels higher than the cost of idle compute. The result: clusters running at 15-25% utilization while cloud bills climb monthly.
Understanding Where Money Goes
Before optimizing, you need visibility. Most Kubernetes cost comes from three sources:
Compute — Worker nodes are typically 60-70% of the bill. Over-provisioned CPU and memory requests mean you are paying for capacity that sits idle.
Storage — Persistent volumes, especially high-IOPS SSD classes, add up quickly. Orphaned PVCs left behind by deleted workloads keep accruing charges silently.
Networking — Cross-AZ traffic, NAT gateway charges, and load balancer fees are often overlooked but can represent 10-15% of total spend.
Right-Sizing Workloads
The most impactful optimization is aligning resource requests with actual usage. Most teams set requests based on guesswork during initial deployment and never revisit them.
Use the Vertical Pod Autoscaler (VPA) in recommendation mode to analyze actual CPU and memory consumption over 7-14 days. Compare recommendations against current requests — the gap is your waste.
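In recommendation mode the VPA only observes and never evicts pods. A minimal manifest might look like this (the Deployment name `checkout` is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout        # hypothetical workload name
  updatePolicy:
    updateMode: "Off"     # recommend only, never evict or resize pods
```

Read the results with `kubectl describe vpa checkout-vpa` and compare the recommended target values against the workload's current requests.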
A common pattern: a service requests 1 CPU and 2Gi memory but consistently uses 200m CPU and 400Mi memory. That is 80% waste on a single workload. Multiply across 50-100 services and the savings are substantial.
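The arithmetic behind that waste figure, as a quick sketch:

```python
def waste_fraction(requested, used):
    """Fraction of a requested resource that sits idle."""
    return 1 - used / requested

# Example from the text: 1 CPU / 2Gi requested, 200m CPU / 400Mi used.
cpu_waste = waste_fraction(1000, 200)   # millicores
mem_waste = waste_fraction(2048, 400)   # MiB

print(f"CPU waste: {cpu_waste:.0%}")     # 80%
print(f"Memory waste: {mem_waste:.0%}")  # 80%
```

At even a modest node price, reclaiming 800 idle millicores and 1.6Gi of memory per service across 50-100 services frees entire nodes.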
Autoscaling Strategies
Horizontal Pod Autoscaler (HPA) — Scale pod replicas based on CPU, memory, or custom metrics. Set target utilization to 70-80% to balance responsiveness with efficiency. Use custom metrics (requests per second, queue depth) for more accurate scaling.
Cluster Autoscaler — Automatically adds or removes nodes based on pending pods. Configure scale-down delays to avoid thrashing. Use mixed instance pools to balance cost and availability.
KEDA (Event-Driven) — For workloads with variable traffic patterns, KEDA scales to zero during idle periods and ramps up based on queue length or event count.
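As a sketch of the HPA approach, here is an `autoscaling/v2` HPA targeting 75% average CPU utilization (the Deployment name `api` and the replica bounds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api             # hypothetical workload name
  minReplicas: 2          # keep headroom for availability
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
```

Note that utilization is measured against the pod's resource *requests*, which is another reason right-sizing requests has to come first.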
Spot and Preemptible Instances
Spot instances offer 60-90% discounts, but the provider can reclaim them on short notice (as little as two minutes on AWS). They work well for:
- Stateless web services with multiple replicas
- Batch processing and data pipelines
- CI/CD build agents
- Development and staging environments
Use node affinity and tolerations to schedule spot-tolerant workloads on cheaper nodes while keeping stateful services on on-demand capacity.
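A sketch of that scheduling split, assuming spot nodes carry a `node.example.com/capacity-type=spot` label and taint (the key is a placeholder — real keys vary by provider, e.g. `cloud.google.com/gke-spot` on GKE or `eks.amazonaws.com/capacityType` on EKS):

```yaml
# Pod spec fragment for a spot-tolerant, stateless workload
spec:
  tolerations:
  - key: node.example.com/capacity-type   # assumed taint on spot nodes
    operator: Equal
    value: spot
    effect: NoSchedule
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: node.example.com/capacity-type
            operator: In
            values: ["spot"]
```

Using *preferred* rather than *required* affinity lets the scheduler fall back to on-demand capacity when spot nodes are unavailable.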
Storage Optimization
- Audit persistent volumes monthly — delete orphaned PVCs
- Use appropriate storage classes: not everything needs high-IOPS SSD
- Implement lifecycle policies for logs and temporary data
- Consider object storage (S3/GCS) instead of block storage for large datasets
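As an example of the storage-class point, on AWS with the EBS CSI driver a general-purpose `gp3` class is usually far cheaper than a provisioned-IOPS class for workloads that don't need the extra throughput (a sketch; provisioner and parameters vary by cloud):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete          # release the volume when the PVC is deleted
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

Making a class like this the cluster default steers new workloads away from expensive tiers unless they opt in explicitly.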
FinOps Practices
Technology alone does not control costs — you need organizational practices:
1. Tagging and allocation — Label every namespace and workload with team, environment, and cost center
2. Showback reports — Monthly reports showing each team their infrastructure spend
3. Budget alerts — Automated notifications when spend exceeds thresholds
4. Regular reviews — Quarterly optimization sprints focused on the top 10 cost drivers
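The tagging practice above might look like this on a namespace (the label keys are a team convention, not a Kubernetes standard; the values are hypothetical):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    team: payments          # hypothetical team name
    environment: production
    cost-center: cc-1234    # hypothetical cost-center code
```

Cost tools such as Kubecost and OpenCost can then aggregate spend by these labels to produce the showback reports.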
Quick Wins Checklist
- Enable VPA in recommendation mode across all namespaces
- Reduce over-provisioned resource requests by 30%+
- Move dev/staging to spot instances
- Delete orphaned PVCs and unused load balancers
- Set up cost monitoring with Kubecost or OpenCost
- Configure HPA on stateless services
- Implement node auto-scaling with appropriate cool-down periods
Conclusion
Kubernetes cost optimization is not a one-time project — it is an ongoing practice. Start with visibility (you cannot optimize what you cannot measure), right-size the obvious waste, then build organizational habits that prevent cost drift. Teams that adopt FinOps practices commonly report 30-50% cost reductions within the first quarter.