
For five straight years, industry tracking on cloud waste showed the share of wasted spend declining as FinOps practices matured.

In 2026, the trend reversed. Wasted cloud spend climbed to 29%, and the data points to AI and new PaaS complexity as the drivers.

Worth pausing on what that means in practice. We built cost programs around workloads with predictable demand: databases that grow steadily, web tiers with consistent traffic, applications that consume roughly the same compute week to week.

GPU experimentation, container platforms, and inference workloads don’t behave that way.

The cost program is doing exactly what it was designed to do. The problem is that the workload it was designed for has been displaced.


Your forecast model is reading a workload that isn’t there anymore

Cloud cost forecasting works by reading the past and projecting it forward. Feed the model twenty-four months of tagged usage data, seasonal adjustments, and planned launches, and it returns a number that finance can budget against.

The math assumes that next month’s compute pattern resembles the pattern of the last several months. That assumption is what AI experimentation breaks.

A training run consumes 800 GPU-hours across three weeks and then stops. An inference workload runs on provisioned throughput in Q1, hits a model upgrade, and shifts to pay-as-you-go in Q2.

Neither pattern feeds the forecast anything stable to extrapolate from. The model returns a confident number against data that no longer follows a stable pattern.
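
To see why, here's a minimal sketch with invented numbers: nine months of steady spend, a training burst in months ten and eleven, and a naive linear trend fitted over all of it.

```python
import numpy as np

# All figures are invented for illustration: nine months of a steady
# workload, then a training burst lands in months 10-11 and stops.
steady = [100, 102, 101, 103, 104, 103, 105, 104, 106]  # $k/month
burst = [286, 325, 107]                                 # burst, then back to baseline

spend = np.array(steady + burst, dtype=float)
months = np.arange(1, len(spend) + 1)

# Naive forecast: fit a linear trend to all twelve months, project month 13.
slope, intercept = np.polyfit(months, spend, 1)
print(f"trend forecast for month 13: ${slope * 13 + intercept:,.0f}k")

# The burst drags the fitted slope upward, so the model confidently
# projects ~$214k for a workload whose real baseline is back near $107k.
```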

The signal is already in the practitioner data. Industry tracking on AI cloud challenges ranks unpredictable AI usage patterns as the third-largest concern after security and data quality.

The FinOps Foundation 2026 report names AI value attribution as the top practitioner challenge and identifies pre-deployment architecture costing as the single capability practitioners most want from current tools and cannot get.

What you can do

Forecast AI workloads in their own cohort, separate from steady-state. Treat experimentation as project allocation with a defined end date, not a baseline to trend forward.

Bring engineering and finance into the same forecast cycle for AI specifically, because the assumptions that worked for VMs need to be restated explicitly for GPUs.
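
A sketch of what that cohort separation can look like, with made-up records; the cohort label and the project end date are things your tagging policy would have to supply, not fields any provider emits today.

```python
from collections import defaultdict

# Illustrative records only: (workload_tag, cohort, monthly_spend_usd, end_date).
records = [
    ("web-tier",      "steady",        41_000, None),
    ("orders-db",     "steady",        18_000, None),
    ("llm-finetune",  "ai-experiment", 62_000, "2026-09-30"),
    ("inference-api", "ai-inference",  27_000, None),
]

cohorts = defaultdict(list)
for name, cohort, spend, end_date in records:
    cohorts[cohort].append((name, spend, end_date))

# Steady workloads get trended forward; experiments get a fixed project
# budget that drops to zero after the end date instead of being extrapolated.
steady_baseline = sum(s for _, s, _ in cohorts["steady"])
experiment_budget = sum(s for _, s, _ in cohorts["ai-experiment"])

print(f"trend this forward: ${steady_baseline:,}/mo")
print(f"project-allocate this, with an expiry: ${experiment_budget:,}/mo")
```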

Reservations bet on a stable baseline. AI workloads don’t have one.

Forecasting fails when the workload pattern doesn’t repeat. Commitments fail one step downstream, when the workload pattern can’t be locked in. Reserved Instances and Savings Plans are financial instruments built on the assumption that a workload has a stable enough demand floor to commit against for one to three years.

The discount math depends on it. A web tier with steady traffic earns the full reservation discount because the baseline holds.
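
The arithmetic is worth seeing once. Here's a back-of-the-envelope version, assuming a 40% reservation discount for illustration; real Azure rates vary by SKU and term.

```python
payg_hourly = 10.00                   # on-demand $/hr (assumed)
reserved_hourly = payg_hourly * 0.60  # committed $/hr, billed 24/7

for utilization in (0.95, 0.60, 0.30):
    # Reserved cost spread over the hours the workload actually runs.
    effective = reserved_hourly / utilization
    verdict = "beats PAYG" if effective < payg_hourly else "at or above PAYG"
    print(f"{utilization:.0%} utilized -> ${effective:.2f}/hr effective ({verdict})")

# Break-even lands at exactly 1 - discount = 60% utilization. A steady
# web tier clears it easily; a bursty training workload at 30% pays
# twice the PAYG rate per useful hour, discount notwithstanding.
```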

AI workloads have two different shapes, and neither fits the instrument. Training runs are bursty with no continuous floor, and committing too aggressively leaves capacity stranded when the experiment ends.

Inference is continuous but version-dependent, so a reservation locked in against today’s model can be wrong by next quarter when the team upgrades to a newer foundation model with different compute requirements.

The strongest evidence isn’t a survey. It’s a product change. In February 2025, Microsoft enabled customer-initiated exchanges of Azure OpenAI-provisioned reservations directly in the portal, with refund flexibility shortly after.

That capability exists because the underlying commitment math broke at the workload level. The legacy 1-year and 3-year shape didn’t survive AI capacity patterns, and Microsoft’s own platform shipped a control admitting it.

Industry data underlines the same pattern: fewer than half of organizations are utilizing any single commitment discount type with any cloud provider in 2026.

What you can do

Classify every workload by stability class before any commitment decision. The production baseline moves to full reservation coverage. Inference goes to provisioned throughput units (PTUs) with exchange flexibility.

Training and experimentation stay on PAYG. Audit utilization quarterly, not annually, because the workload mix is now moving fast enough to invalidate annual commitment reviews.
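
One way to make that classification mechanical is a coefficient-of-variation bucket, sketched here with illustrative thresholds, not a published standard.

```python
import statistics

def stability_class(weekly_usage: list[float]) -> str:
    """Bucket a workload by coefficient of variation of weekly usage."""
    mean = statistics.mean(weekly_usage)
    if mean == 0:
        return "idle"
    cv = statistics.stdev(weekly_usage) / mean
    if cv < 0.15:
        return "stable: full reservation coverage"
    if cv < 0.50:
        return "semi-stable: PTUs with exchange flexibility"
    return "volatile: PAYG, audit quarterly"

print(stability_class([100, 102, 99, 101]))  # steady production baseline
print(stability_class([80, 300, 0, 40]))     # GPU experimentation
```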

Your container platform looks utilized. The pods inside it don’t.

Container cost dashboards measure utilization at the node level. The waste lives one layer down, at the pod-request level.

Engineers request more CPU and memory than the workload actually consumes, the cluster scales out to meet those requests, and the dashboard reports a healthy node running near its allocated capacity. The pod inside is consuming a quarter of what it requested.
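
Here's the gap in numbers, with made-up pods; the requested-versus-used pair is the kind of data you can assemble from the pod spec and `kubectl top pods`.

```python
pods = [
    # (name, requested millicores, used millicores) -- all invented
    ("checkout-api-7f9", 2000, 480),
    ("search-worker-2c1", 4000, 950),
    ("batch-resizer-9aa", 1000, 610),
]

for name, requested, used in pods:
    print(f"{name}: requested {requested}m, used {used}m "
          f"({used / requested:.0%} of request)")

# The scheduler packs nodes by *requests*, so a node running these pods
# can report ~90% allocated while the work inside uses a fraction of it.
```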

This failure mode hides cleanly because the metric most cost programs monitor is in the wrong unit.

When most compute was VMs, node-level utilization was the right question. Containers shifted the unit of analysis to the pod, and the cost program didn’t follow.

Roughly 35% of compute spend now runs through containers, and almost none of the standard cost reviews look at requested-versus-actual at the workload tier.

Datadog’s 2024 research found 83% of container costs were tied to idle resources, with the majority coming from cluster overprovisioning and the rest from workload requests larger than actual use.

The cross-platform follow-up confirmed the pattern persists across Azure Container Apps and other managed runtimes; most workloads consume less than half their requested memory and under a quarter of their requested CPU.

What you can do

Change the metric. Pod-request-versus-actual at the workload tier replaces node utilization as the primary signal. AKS cost recommendations in Azure Advisor surface this directly.

Make the request-sizing recommendation a deployment gate rather than a quarterly review, so the resource request gets right-sized before the workload ships, not six months after.
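
As a sketch of what that gate could look like in a pipeline: reject a manifest whose CPU request is more than double the usage observed in staging. The 2x threshold, the 30% headroom, and the metrics source are all assumptions to tune against your own stack.

```python
import sys

MAX_OVERREQUEST = 2.0  # assumed ceiling on request / observed p95

def request_gate(requested_m: int, observed_p95_m: int) -> None:
    if observed_p95_m == 0:
        return  # no usage data yet; pass, but flag for a follow-up review
    ratio = requested_m / observed_p95_m
    if ratio > MAX_OVERREQUEST:
        sys.exit(f"FAIL: request is {ratio:.1f}x observed p95; "
                 f"right-size toward ~{int(observed_p95_m * 1.3)}m")

request_gate(requested_m=2000, observed_p95_m=480)  # exits: 4.2x over
```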

The trend reversed because the workload shape changed faster than the control plane around it. Forecasting, commitments, and container metrics are where the breaks show up first because they carry the most weight.

Anomaly thresholds and tagging discipline bend in the same direction, and the operating model that holds across all five is what separates a cost program that drifts from one that compounds.

So the question worth sitting with is: when was your reservation portfolio last reclassified by workload stability, and does your forecast model still assume the workload mix you had two years ago?

Most teams find the answer is “longer than it should have been,” and the redesign is bigger than a quarterly review can absorb. Simform’s Azure Cost Optimization whitepaper covers which controls to keep, which to retire, and what continuous cost governance looks like when AI and container workloads are the new baseline.


