The cost of running AI has fallen 900-fold since 2022. Token prices halve every three to six months. This is the fastest deflationary curve in the history of computing infrastructure – faster than cloud storage, faster than semiconductors, faster than anything the enterprise software industry has seen before.
But contact center budgets don’t reflect that.
If anything, AI spend is going up. Leaders who expected costs to fall after their AI investments are instead fielding questions from finance about why the bills keep rising.
The technology is getting cheaper. The invoices aren’t getting smaller.
This isn’t a coincidence. It’s a structural problem – and it has four distinct causes.
The distance problem
Before getting to the four drivers of AI pricing, it helps to understand the underlying dynamic.
Most enterprise AI isn’t bought directly. It’s bought through a platform – a CCaaS vendor, a CRM suite, a workforce management tool – that has wrapped an underlying model in its own software layer and repriced it accordingly. That layer is where value is captured. It’s also where the deflationary curve stops.
The further you are from the model, the less of the cost reduction you see. Every intermediary between your budget and the underlying API is a point where margin is rebuilt on top of falling infrastructure costs.
This isn’t an argument against buying AI through platforms. It’s an argument for understanding exactly what you’re buying – and what you’re paying for that you might not need to.

Why your AI bill isn’t falling
1. You’re running the wrong model for the task
There is currently a 336x price gap between the most and least expensive AI models on the market. That gap exists because different models are built for genuinely different things. Frontier models – the ones vendors lead with in demos – are optimized for complex reasoning, nuanced generation, and tasks that require real depth.
Most contact center tasks don’t require that.
Intent classification, call routing logic, simple query handling, wrap-up summarization – these are well within the capability of much cheaper, faster, lighter models. But vendors bundle flagship models because they’re easier to sell and better for vendor margins. The result is that you’re running a Formula One engine to drive to the supermarket, and paying accordingly.
What to do:
Before signing or renewing any AI contract, ask which model runs on which task – and why. If the vendor can’t answer that clearly, treat it as a signal. Map your actual use cases to model requirements before you commit, not after.
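One way to make that mapping concrete is a small table of tasks against model tiers. The sketch below is illustrative only: the tier names, per-million-token prices, task list, and monthly volumes are all hypothetical placeholders, not figures from any vendor or from the statistics quoted above.

```python
# All prices and volumes are hypothetical placeholders, not vendor figures.
# Prices are in USD per million tokens.
PRICE_PER_MILLION = {"light": 0.25, "mid": 2.00, "frontier": 15.00}

# task -> (cheapest tier assumed adequate, monthly volume in millions of tokens)
TASKS = {
    "intent_classification": ("light", 400),
    "call_routing": ("light", 150),
    "wrap_up_summarization": ("mid", 300),
    "complex_complaint_handling": ("frontier", 50),
}


def monthly_cost(assignment):
    """Sum the monthly cost for a mapping of task name -> model tier."""
    return sum(
        PRICE_PER_MILLION[tier] * TASKS[task][1]
        for task, tier in assignment.items()
    )


# Scenario A: the bundled default, every task on the flagship model.
all_frontier = {task: "frontier" for task in TASKS}
# Scenario B: each task on the cheapest tier assumed to be adequate.
right_sized = {task: tier for task, (tier, _volume) in TASKS.items()}

print(f"Everything on the frontier tier: ${monthly_cost(all_frontier):,.2f}")
print(f"Right-sized per task:            ${monthly_cost(right_sized):,.2f}")
```

With these made-up inputs the two scenarios come to $13,500.00 and $1,487.50 a month. The real numbers will differ; the point is that the comparison is only possible once you know which model runs on which task.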
2. Your token usage is inflated – and nobody is fixing it
78% of AI leaders report unexpected charges. A significant portion of those charges comes not from the unit price, but from the volume of tokens being consumed – driven by poor prompt design, inefficient orchestration, and pipelines that weren’t built with cost efficiency in mind.
The deeper problem is incentive alignment. Inside a vendor’s platform layer, there is no structural reason to optimize your token consumption. More tokens used means more billed. Efficiency is your problem, not theirs.
Token waste is invisible in most reporting. It doesn’t show up as a line item. It accumulates quietly behind a per-seat or per-resolution price that obscures what’s actually happening at the model layer.
What to do:
Treat token efficiency as an operational metric – the same way you’d treat handling time or cost per contact. Ask vendors for consumption data, not just billing summaries. If you have access to the underlying usage logs, review them. The patterns are usually revealing.
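If you can get usage logs out of the platform, even a rough script will surface the patterns. The sketch below assumes a hypothetical log format of (task, contact ID, input tokens, output tokens); the field layout, sample rows, and the 2x-median threshold are all assumptions chosen for illustration, not a real vendor export.

```python
from collections import defaultdict
from statistics import median

# Hypothetical usage records: (task, contact_id, input_tokens, output_tokens)
USAGE_LOG = [
    ("summarization", "c1", 1200, 150),
    ("summarization", "c2", 4800, 160),
    ("classification", "c1", 300, 5),
    ("classification", "c2", 310, 5),
    ("classification", "c3", 2900, 6),
]


def tokens_per_contact(log):
    """Average total tokens per distinct contact, grouped by task."""
    totals = defaultdict(int)
    contacts = defaultdict(set)
    for task, contact_id, tokens_in, tokens_out in log:
        totals[task] += tokens_in + tokens_out
        contacts[task].add(contact_id)
    return {task: totals[task] / len(contacts[task]) for task in totals}


def flag_outliers(log, factor=2.0):
    """Return calls whose total tokens exceed factor x the task's median."""
    by_task = defaultdict(list)
    for task, contact_id, tokens_in, tokens_out in log:
        by_task[task].append((contact_id, tokens_in + tokens_out))
    flagged = []
    for task, rows in by_task.items():
        threshold = factor * median(total for _cid, total in rows)
        flagged += [(task, cid, total) for cid, total in rows if total > threshold]
    return flagged


print(tokens_per_contact(USAGE_LOG))
print(flag_outliers(USAGE_LOG))
```

Here the third classification call uses roughly nine times the tokens of its neighbors, which is the kind of anomaly (an oversized prompt, a runaway context window) that never shows up in a per-seat invoice.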

3. Vendor capture is absorbing the savings
In a typical CCaaS AI contract, hidden costs add 60% on top of the listed price. That figure deserves a moment.
The deflationary curve is real – at the model layer. But the software layer above it is where margin is reconstructed. As infrastructure costs fall, platform vendors don’t pass the reduction through. They rebuild margin. The list price may stay flat or even fall slightly, while the underlying cost of delivery drops dramatically. The delta is captured by the vendor.
This is rational behavior from a vendor perspective. It’s also something buyers rarely scrutinize because the billing structure makes it nearly impossible to see.
What to do:
In any vendor conversation, push to separate the model cost from the platform cost. What are you paying for the underlying inference? What are you paying for the software layer on top? If a vendor won’t or can’t give you that breakdown, you’re operating blind – and they know it.
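Even without a vendor breakdown, you can estimate the split yourself. The sketch below is a back-of-the-envelope calculation under stated assumptions: the per-resolution price, the token counts per resolution, and the per-million-token model prices are hypothetical inputs you would replace with your own contract terms and publicly listed API prices.

```python
def estimate_inference_share(
    price_per_resolution,  # what the platform bills you per resolution (USD)
    tokens_in,             # assumed average input tokens per resolution
    tokens_out,            # assumed average output tokens per resolution
    usd_per_million_in,    # assumed model price per million input tokens
    usd_per_million_out,   # assumed model price per million output tokens
):
    """Return (estimated inference cost, its share of the billed price)."""
    inference = (
        tokens_in * usd_per_million_in + tokens_out * usd_per_million_out
    ) / 1_000_000
    return inference, inference / price_per_resolution


# Hypothetical example: $1.00 per resolution, 8,000 input and 1,000 output
# tokens, at $2.50 and $10.00 per million tokens respectively.
inference, share = estimate_inference_share(1.00, 8_000, 1_000, 2.50, 10.00)
print(f"Estimated inference cost per resolution: ${inference:.3f}")
print(f"Share of billed price: {share:.1%}")
```

In this made-up case inference is about three cents of every dollar billed. The remainder is not all margin, since the platform layer carries real costs, but the estimate gives you a number to put on the table when you ask for the breakdown.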
4. Falling prices, rising usage – net AI spend stays flat
Between 2024 and 2025, input token usage grew 3x and output token usage grew 4x across enterprise AI deployments. Token prices fell. Total spend didn’t.
This is the offset problem. The deflationary curve on unit cost is real, but volume growth absorbs it. As AI becomes more embedded in contact center operations – more touchpoints, more automation steps, more summarization and classification running in parallel – consumption grows faster than prices fall.
CX workloads don’t scale token usage as aggressively as coding workloads, but the directional trend is the same. A falling price on a growing volume is not a saving. It’s a smaller increase than you would otherwise have had.
What to do:
Track actual consumption over time, not just unit price. Build a baseline now if you don’t have one. When a vendor presents cost reduction figures, ask whether those figures account for projected usage growth – or whether they’re comparing unit costs in isolation.
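The offset arithmetic is simple enough to check on a napkin: new spend relative to old spend is usage growth multiplied by the change in effective unit price. The sketch below uses the 3x and 4x usage growth figures cited above; the price changes are hypothetical, because the effective price you pay through a platform is exactly the number you need to find out.

```python
def spend_ratio(usage_growth, price_ratio):
    """New spend divided by old spend.

    usage_growth: new volume / old volume (3.0 means usage tripled)
    price_ratio:  new unit price / old unit price (0.7 means a 30% cut)
    """
    return usage_growth * price_ratio


def breakeven_price_ratio(usage_growth):
    """Price ratio at which spend stays exactly flat for a given usage growth."""
    return 1 / usage_growth


# Usage tripled; effective price through the platform fell 30% (hypothetical).
print(f"{spend_ratio(3.0, 0.7):.2f}x spend")

# Usage quadrupled; price would need to fall to a quarter just to stay flat.
print(f"{breakeven_price_ratio(4.0):.2f} of the old unit price to break even")
```

Under these assumptions, a 30% price cut against tripled usage still leaves spend at about 2.1 times the previous year. Any vendor figure that quotes the 30% without the 3x is comparing unit costs in isolation.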

The common thread
Each of these four problems is distinct. But they share a root cause: buying AI at a distance, through layers that obscure the economics and misalign the incentives.
That distance isn’t accidental. Platform vendors benefit from opacity. Bundled models, opaque billing, and consumption data that stays inside the platform are features of the business model, not oversights.
The contact centers that will manage AI costs effectively over the next three to five years won’t necessarily be the ones with the most negotiating leverage. They’ll be the ones with enough architectural visibility to see what’s actually running, and enough flexibility to make different choices as the market moves.
What that looks like in practice
A few principles that follow from the analysis:
Know which model is running on which task. This should be a standard procurement question, not an afterthought.
Own your consumption data. If your current platform doesn’t give you access to token-level usage, that’s a gap worth addressing – either through contract negotiation or platform review.
Separate inference cost from platform cost. The two are bundled for vendor convenience, not yours. Understanding the split is the foundation of any serious cost conversation.
Build for optionality. The AI model market is moving faster than any enterprise procurement cycle. Architectures that lock you into a single model or vendor layer will cost more over time, not less – because you can’t respond to a cost curve you can’t access.
The savings are real
The 900x reduction in AI infrastructure cost since 2022 is not vendor spin. The halving cycle is documented. The deflationary pressure on model pricing is structural and ongoing.
The savings exist. They’re just not reaching you yet – because the way most enterprise AI is bought ensures they don’t have to.
That’s a solvable problem. But solving it starts with understanding where the distance is, and what it’s costing you.