Automation platform provider Cast AI today released its 2026 State of Kubernetes Optimization Report, a comprehensive analysis of GPU, CPU, and memory utilization across non-optimized Kubernetes clusters. Drawing on data from tens of thousands of clusters, the report delivers a clear and urgent message: GPUs sit at just 5% utilization despite their high cost. The efficiency gains that Kubernetes was designed to unlock are not emerging naturally with scale, and the gap between what organizations are paying for and what they are actually using is widening.
Kubernetes Is Growing. Efficiency Is Not
As Kubernetes adoption accelerates across organizations of every size and industry, resource utilization is moving in the opposite direction. Average CPU utilization across clusters stood at just 8% in 2025, while memory utilization was 20%.
The AI Era Is Adding a New Layer of Waste
A newer and rapidly escalating pressure is amplifying the problem: the expansion of GPU-equipped nodes as Kubernetes becomes the default platform for AI and ML workloads. Yet the data tells the same story as CPU and memory. GPU utilization averaged just 5% across the clusters analyzed, representing an enormous and largely invisible cost for organizations investing heavily in AI infrastructure.
As enterprises race to build AI capabilities on Kubernetes, the report warns that without the right optimization infrastructure in place, GPU waste will emerge as one of the most expensive inefficiencies in the modern cloud stack.
One-Time Fixes Are Not Enough
Cast AI’s report identifies a critical misconception holding back Kubernetes efficiency: the belief that configuration is a deployment-time task. Rightsizing that runs once at deployment is not rightsizing. Workloads change, traffic patterns shift, and the configuration that was accurate six months ago is unlikely to remain accurate today. The same applies to Spot Instance selection, autoscaler configuration, commitment utilization, and node lifecycle management. Each has a time dimension that manual processes simply cannot keep pace with at scale.
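The "time dimension" the report describes can be sketched in a few lines: rather than fixing resource requests once at deploy time, a continuous process periodically compares observed usage against current requests and flags drift. This is a minimal illustrative sketch, not Cast AI's implementation; the function names, headroom, and tolerance values are assumptions chosen for the example.

```python
# Sketch of continuous rightsizing: compare a workload's recently observed
# peak usage against its configured request, and flag it when the two have
# drifted apart. All names and thresholds here are illustrative assumptions.

def recommend_request(observed_peak_millicores: float, headroom: float = 0.2) -> float:
    """Recommend a CPU request: recent observed peak plus a safety margin."""
    return observed_peak_millicores * (1 + headroom)

def needs_resize(current_request: float, recommendation: float,
                 tolerance: float = 0.15) -> bool:
    """Flag drift when the current request is off by more than `tolerance`."""
    return abs(current_request - recommendation) / current_request > tolerance

# A workload sized correctly six months ago (request: 1000m) whose real
# peak traffic has since fallen to 200m:
rec = recommend_request(200)    # 240.0 millicores recommended
print(needs_resize(1000, rec))  # True: the old request is badly oversized
```

Run once at deployment, such a check is a snapshot; run continuously, it becomes the kind of ongoing optimization loop the report argues manual processes cannot sustain at scale.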
“A GPU sitting idle costs dollars per hour. A CPU sitting idle costs cents. And 95% of GPU capacity is doing nothing,” said Laurent Gil, founder and president, Cast AI. “Cloud vendors just raised H200 prices 15%, breaking a 20-year trend of falling compute costs. That’s not a configuration problem as much as it is a business emergency. Autonomous optimization is the only rational response to infrastructure economics that are moving against you.”
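A back-of-envelope calculation makes the economics in the quote concrete. The 5% GPU and 8% CPU utilization figures come from the report; the hourly rates below are illustrative assumptions, not report data.

```python
# Rough monthly cost of capacity paid for but left idle, per compute unit.
# Rates are assumed for illustration only; utilization figures are from
# the Cast AI report (GPU 5%, CPU 8%).
GPU_RATE_PER_HOUR = 10.0   # assumed H200-class on-demand rate (illustrative)
CPU_RATE_PER_HOUR = 0.05   # assumed per-vCPU on-demand rate (illustrative)
HOURS_PER_MONTH = 730

def monthly_waste(rate_per_hour: float, utilization: float) -> float:
    """Cost of the share of capacity that sits unused over a month."""
    return rate_per_hour * HOURS_PER_MONTH * (1 - utilization)

print(monthly_waste(GPU_RATE_PER_HOUR, 0.05))  # 6935.0 per GPU per month
print(monthly_waste(CPU_RATE_PER_HOUR, 0.08))  # ~33.58 per vCPU per month
```

At these assumed rates, one idle GPU burns roughly two hundred times as much money per month as one idle vCPU, which is the asymmetry the quote is pointing at.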
