Kubernetes abstracts infrastructure beautifully — but that abstraction can hide waste. Teams often over-provision resources because the cost of an outage feels higher than the cost of idle compute. The result: clusters running at 15-25% utilization while cloud bills climb monthly.
Understanding Where Money Goes
Before optimizing, you need visibility. Most Kubernetes cost comes from three sources:
Compute — Worker nodes are typically 60-70% of the bill. Over-provisioned CPU and memory requests mean you are paying for capacity that sits idle.
Storage — Persistent volumes, especially high-IOPS SSD classes, add up quickly. Orphaned PVCs left behind by deleted workloads keep accruing charges silently.
Networking — Cross-AZ traffic, NAT gateway charges, and load balancer fees are often overlooked but can represent 10-15% of total spend.
Right-Sizing Workloads
The most impactful optimization is aligning resource requests with actual usage. Most teams set requests based on guesswork during initial deployment and never revisit them.
Use the Vertical Pod Autoscaler (VPA) in recommendation mode to analyze actual CPU and memory consumption over 7-14 days. Compare recommendations against current requests — the gap is your waste.
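In recommendation mode the VPA only observes and never evicts pods. A minimal manifest might look like this (the Deployment name `checkout` is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout        # hypothetical workload name
  updatePolicy:
    updateMode: "Off"     # recommend only, never evict or resize pods
```

Read the results with `kubectl describe vpa checkout-vpa` and compare the recommended target values against the workload's current requests.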
A common pattern: a service requests 1 CPU and 2Gi memory but consistently uses 200m CPU and 400Mi memory. That is 80% waste on a single workload. Multiply across 50-100 services and the savings are substantial.
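The arithmetic behind that waste figure, as a quick sketch:

```python
def waste_fraction(requested, used):
    """Fraction of a requested resource that sits idle."""
    return 1 - used / requested

# Example from the text: 1 CPU / 2Gi requested, 200m CPU / 400Mi used.
cpu_waste = waste_fraction(1000, 200)   # millicores
mem_waste = waste_fraction(2048, 400)   # MiB

print(f"CPU waste: {cpu_waste:.0%}")     # 80%
print(f"Memory waste: {mem_waste:.0%}")  # 80%
```

At even a modest node price, reclaiming 800 idle millicores and 1.6Gi of memory per service across 50-100 services frees entire nodes.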
Autoscaling Strategies
Horizontal Pod Autoscaler (HPA) — Scale pod replicas based on CPU, memory, or custom metrics. Set target utilization to 70-80% to balance responsiveness with efficiency. Use custom metrics (requests per second, queue depth) for more accurate scaling.
Cluster Autoscaler — Automatically adds or removes nodes based on pending pods. Configure scale-down delays to avoid thrashing. Use mixed instance pools to balance cost and availability.
KEDA (Event-Driven) — For workloads with variable traffic patterns, KEDA scales to zero during idle periods and ramps up based on queue length or event count.
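As a sketch of the HPA approach, here is an `autoscaling/v2` HPA targeting 75% average CPU utilization (the Deployment name `api` and the replica bounds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api             # hypothetical workload name
  minReplicas: 2          # keep headroom for availability
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
```

Note that utilization is measured against the pod's resource *requests*, which is another reason right-sizing requests has to come first.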
Spot and Preemptible Instances
Spot instances offer 60-90% discounts, but the provider can reclaim them on short notice (as little as two minutes on AWS). They work well for:
- Stateless web services with multiple replicas
- Batch processing and data pipelines
- CI/CD build agents
- Development and staging environments
Use node affinity and tolerations to schedule spot-tolerant workloads on cheaper nodes while keeping stateful services on on-demand capacity.
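A sketch of that scheduling split, assuming spot nodes carry a `node.example.com/capacity-type=spot` label and taint (the key is a placeholder — real keys vary by provider, e.g. `cloud.google.com/gke-spot` on GKE or `eks.amazonaws.com/capacityType` on EKS):

```yaml
# Pod spec fragment for a spot-tolerant, stateless workload
spec:
  tolerations:
  - key: node.example.com/capacity-type   # assumed taint on spot nodes
    operator: Equal
    value: spot
    effect: NoSchedule
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: node.example.com/capacity-type
            operator: In
            values: ["spot"]
```

Using *preferred* rather than *required* affinity lets the scheduler fall back to on-demand capacity when spot nodes are unavailable.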
Storage Optimization
- Audit persistent volumes monthly — delete orphaned PVCs
- Use appropriate storage classes: not everything needs high-IOPS SSD
- Implement lifecycle policies for logs and temporary data
- Consider object storage (S3/GCS) instead of block storage for large datasets
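As an example of the storage-class point, on AWS with the EBS CSI driver a general-purpose `gp3` class is usually far cheaper than a provisioned-IOPS class for workloads that don't need the extra throughput (a sketch; provisioner and parameters vary by cloud):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete          # release the volume when the PVC is deleted
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

Making a class like this the cluster default steers new workloads away from expensive tiers unless they opt in explicitly.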
FinOps Practices
Technology alone does not control costs — you need organizational practices:
1. Tagging and allocation — Label every namespace and workload with team, environment, and cost center
2. Showback reports — Monthly reports showing each team their infrastructure spend
3. Budget alerts — Automated notifications when spend exceeds thresholds
4. Regular reviews — Quarterly optimization sprints focused on the top 10 cost drivers
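The tagging practice above might look like this on a namespace (the label keys are a team convention, not a Kubernetes standard; the values are hypothetical):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    team: payments          # hypothetical team name
    environment: production
    cost-center: cc-1234    # hypothetical cost-center code
```

Cost tools such as Kubecost and OpenCost can then aggregate spend by these labels to produce the showback reports.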
Quick Wins Checklist
- Enable VPA in recommendation mode across all namespaces
- Reduce over-provisioned resource requests by 30%+
- Move dev/staging to spot instances
- Delete orphaned PVCs and unused load balancers
- Set up cost monitoring with Kubecost or OpenCost
- Configure HPA on stateless services
- Implement node auto-scaling with appropriate cool-down periods
Conclusion
Kubernetes cost optimization is not a one-time project — it is an ongoing practice. Start with visibility (you cannot optimize what you cannot measure), right-size the obvious waste, then build organizational habits that prevent cost drift. Teams that adopt FinOps practices commonly report 30-50% cost reductions within the first quarter.