
For five straight years, industry tracking on cloud waste showed the share of wasted spend declining as FinOps practices matured.

In 2026, the trend reversed. Wasted cloud spend climbed to 29%, and the data points to AI and new PaaS complexity as the drivers.

Worth pausing on what that means in practice. We built cost programs around workloads with predictable demand: databases that grow steadily, web tiers with consistent traffic, applications that consume roughly the same compute week to week.

GPU experimentation, container platforms, and inference workloads don’t behave that way.

The cost program is doing exactly what it was designed to do. The problem is that the workload it was designed for has been displaced.


Your forecast model is reading a workload that isn’t there anymore

Cloud cost forecasting works by reading the past and projecting it forward. Feed the model twenty-four months of tagged usage data, seasonal adjustments, and planned launches, and it returns a number that finance can budget against.

The math assumes that next month’s compute pattern resembles the pattern of the last several months. That assumption is what AI experimentation breaks.

A training run consumes 800 GPU-hours across three weeks and then stops. An inference workload runs on provisioned throughput in Q1, hits a model upgrade, and shifts to pay-as-you-go in Q2.

Neither pattern feeds the forecast anything stable to extrapolate from. The model returns a confident number against data that no longer follows a stable pattern.
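
To see why, here's a minimal sketch with invented numbers: nine months of steady spend, a training burst in months ten and eleven, and a naive linear trend fitted over all of it.

```python
import numpy as np

# All figures are invented for illustration: nine months of a steady
# workload, then a training burst lands in months 10-11 and stops.
steady = [100, 102, 101, 103, 104, 103, 105, 104, 106]  # $k/month
burst = [286, 325, 107]                                 # burst, then back to baseline

spend = np.array(steady + burst, dtype=float)
months = np.arange(1, len(spend) + 1)

# Naive forecast: fit a linear trend to all twelve months, project month 13.
slope, intercept = np.polyfit(months, spend, 1)
print(f"trend forecast for month 13: ${slope * 13 + intercept:,.0f}k")

# The burst drags the fitted slope upward, so the model confidently
# projects ~$214k for a workload whose real baseline is back near $107k.
```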

The signal is already in the practitioner data. Industry tracking on AI cloud challenges ranks unpredictable AI usage patterns as the third-largest concern after security and data quality.

The FinOps Foundation 2026 report names AI value attribution as the top practitioner challenge and identifies pre-deployment architecture costing as the single capability practitioners most want from current tools and cannot get.

What you can do

Forecast AI workloads in their own cohort, separate from steady-state. Treat experimentation as project allocation with a defined end date, not a baseline to trend forward.

Bring engineering and finance into the same forecast cycle for AI specifically, because the assumptions that worked for VMs need to be restated explicitly for GPUs.
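
A sketch of what that cohort separation can look like, with made-up records; the cohort label and the project end date are things your tagging policy would have to supply, not fields any provider emits today.

```python
from collections import defaultdict

# Illustrative records only: (workload_tag, cohort, monthly_spend_usd, end_date).
records = [
    ("web-tier",      "steady",        41_000, None),
    ("orders-db",     "steady",        18_000, None),
    ("llm-finetune",  "ai-experiment", 62_000, "2026-09-30"),
    ("inference-api", "ai-inference",  27_000, None),
]

cohorts = defaultdict(list)
for name, cohort, spend, end_date in records:
    cohorts[cohort].append((name, spend, end_date))

# Steady workloads get trended forward; experiments get a fixed project
# budget that drops to zero after the end date instead of being extrapolated.
steady_baseline = sum(s for _, s, _ in cohorts["steady"])
experiment_budget = sum(s for _, s, _ in cohorts["ai-experiment"])

print(f"trend this forward: ${steady_baseline:,}/mo")
print(f"project-allocate this, with an expiry: ${experiment_budget:,}/mo")
```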

Reservations bet on a stable baseline. AI workloads don’t have one.

Forecasting fails when the workload pattern doesn’t repeat. Commitments fail one step downstream, when the workload pattern can’t be locked in. Reserved Instances and Savings Plans are financial instruments built on the assumption that a workload has a stable enough demand floor to commit against for one to three years.

The discount math depends on it. A web tier with steady traffic earns the full reservation discount because the baseline holds.
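
The arithmetic is worth seeing once. Here's a back-of-the-envelope version, assuming a 40% reservation discount for illustration; real Azure rates vary by SKU and term.

```python
payg_hourly = 10.00                   # on-demand $/hr (assumed)
reserved_hourly = payg_hourly * 0.60  # committed $/hr, billed 24/7

for utilization in (0.95, 0.60, 0.30):
    # Reserved cost spread over the hours the workload actually runs.
    effective = reserved_hourly / utilization
    verdict = "beats PAYG" if effective < payg_hourly else "at or above PAYG"
    print(f"{utilization:.0%} utilized -> ${effective:.2f}/hr effective ({verdict})")

# Break-even lands at exactly 1 - discount = 60% utilization. A steady
# web tier clears it easily; a bursty training workload at 30% pays
# twice the PAYG rate per useful hour, discount notwithstanding.
```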

AI workloads have two different shapes, and neither fits the instrument. Training runs are bursty with no continuous floor, and committing too aggressively leaves capacity stranded when the experiment ends.

Inference is continuous but version-dependent, so a reservation locked in against today’s model can be wrong by next quarter when the team upgrades to a newer foundation model with different compute requirements.

The strongest evidence isn’t a survey. It’s a product change. In February 2025, Microsoft enabled customer-initiated exchanges of Azure OpenAI-provisioned reservations directly in the portal, with refund flexibility shortly after.

That capability exists because the underlying commitment math broke at the workload level. The legacy 1-year and 3-year shape didn’t survive AI capacity patterns, and Microsoft’s own platform shipped a control admitting it.

Industry data underlines the same pattern: fewer than half of organizations are utilizing any single commitment discount type with any cloud provider in 2026.

What you can do

Classify every workload by stability class before any commitment decision. The production baseline moves to full reservation coverage. Inference goes to provisioned throughput units (PTUs) with exchange flexibility.

Training and experimentation stay on PAYG. Audit utilization quarterly, not annually, because the workload mix is now moving fast enough to invalidate annual commitment reviews.
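
One way to make that classification mechanical is a coefficient-of-variation bucket, sketched here with illustrative thresholds, not a published standard.

```python
import statistics

def stability_class(weekly_usage: list[float]) -> str:
    """Bucket a workload by coefficient of variation of weekly usage."""
    mean = statistics.mean(weekly_usage)
    if mean == 0:
        return "idle"
    cv = statistics.stdev(weekly_usage) / mean
    if cv < 0.15:
        return "stable: full reservation coverage"
    if cv < 0.50:
        return "semi-stable: PTUs with exchange flexibility"
    return "volatile: PAYG, audit quarterly"

print(stability_class([100, 102, 99, 101]))  # steady production baseline
print(stability_class([80, 300, 0, 40]))     # GPU experimentation
```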

Your container platform looks utilized. The pods inside it don’t.

Container cost dashboards measure utilization at the node level. The waste lives one layer down, at the pod-request level.

Engineers request more CPU and memory than the workload actually consumes, the cluster scales out to meet those requests, and the dashboard reports a healthy node running near its allocated capacity. The pod inside is consuming a quarter of what it requested.
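
Here's the gap in numbers, with made-up pods; the requested-versus-used pair is the kind of data you can assemble from the pod spec and `kubectl top pods`.

```python
pods = [
    # (name, requested millicores, used millicores) -- all invented
    ("checkout-api-7f9", 2000, 480),
    ("search-worker-2c1", 4000, 950),
    ("batch-resizer-9aa", 1000, 610),
]

for name, requested, used in pods:
    print(f"{name}: requested {requested}m, used {used}m "
          f"({used / requested:.0%} of request)")

# The scheduler packs nodes by *requests*, so a node running these pods
# can report ~90% allocated while the work inside uses a fraction of it.
```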

This failure mode hides cleanly because the metric most cost programs monitor is in the wrong unit.

When most compute was VMs, node-level utilization was the right question. Containers shifted the unit of analysis to the pod, and the cost program didn’t follow.

Roughly 35% of compute spend now runs through containers, and almost none of the standard cost reviews look at requested-versus-actual at the workload tier.

Datadog’s 2024 research found 83% of container costs were tied to idle resources, with the majority coming from cluster overprovisioning and the rest from workload requests larger than actual use.

The cross-platform follow-up confirmed the pattern persists across Azure Container Apps and other managed runtimes; most workloads consume less than half their requested memory and under a quarter of their requested CPU.

What you can do

Change the metric. Pod-request-versus-actual at the workload tier replaces node utilization as the primary signal. AKS cost recommendations in Azure Advisor surface this directly.

Make the request-sizing recommendation a deployment gate rather than a quarterly review, so the resource request gets right-sized before the workload ships, not six months after.
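
As a sketch of what that gate could look like in a pipeline: reject a manifest whose CPU request is more than double the usage observed in staging. The 2x threshold, the 30% headroom, and the metrics source are all assumptions to tune against your own stack.

```python
import sys

MAX_OVERREQUEST = 2.0  # assumed ceiling on request / observed p95

def request_gate(requested_m: int, observed_p95_m: int) -> None:
    if observed_p95_m == 0:
        return  # no usage data yet; pass, but flag for a follow-up review
    ratio = requested_m / observed_p95_m
    if ratio > MAX_OVERREQUEST:
        sys.exit(f"FAIL: request is {ratio:.1f}x observed p95; "
                 f"right-size toward ~{int(observed_p95_m * 1.3)}m")

request_gate(requested_m=2000, observed_p95_m=480)  # exits: 4.2x over
```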

The trend reversed because the workload shape changed faster than the control plane around it. Forecasting, commitments, and container metrics are where the breaks show up first because they carry the most weight.

Anomaly thresholds and tagging discipline bend in the same direction, and the operating model that holds across all five is what separates a cost program that drifts from one that compounds.

So the question worth sitting with is: when was your reservation portfolio last reclassified by workload stability, and does your forecast model still assume the workload mix you had two years ago?

Most teams find the answer is “longer than it should have been,” and the redesign is bigger than a quarterly review can absorb. Simform’s Azure Cost Optimization whitepaper covers which controls to keep, which to retire, and what continuous cost governance looks like when AI and container workloads are the new baseline.


