Tuesday, June 02, 2026

Twenty Years of Compute: From Virtual Machines to Intelligence

Over the past twenty years, compute has mostly done one thing: push profit up the stack, first past machines, then past software, and now past large models themselves.

Cerebras has just gone public. In intraday trading on its first day, the stock nearly doubled, and its market cap briefly touched about $95 billion.

In the same stretch of headlines, Nvidia paid roughly $20 billion at the end of 2025 to absorb Groq’s core team, and OpenAI and Anthropic now raise rounds of $10 billion or more. Capital has cast its vote almost entirely for two layers: companies that train large models, and companies that build GPUs and inference chips.

This looks like the endgame. It is more likely the high-water mark of a compute cycle repeating itself. Twenty years ago, the scarce thing was the machine. Later, profit climbed to the cloud, then to software, then to applications. The two layers that look most profitable today will probably not hold high margins five to ten years from now. Profit will move from “making intelligence” to “using intelligence.”

1. The Old Script: The Bottom Layer Always Gets Commoditized

Cloud computing has replayed the same script again and again: the layer that is scarce, expensive, and most profitable today becomes standardized and commoditized tomorrow, handing its premium to the next layer built on top.

In August 2006, AWS launched EC2. Compute moved from the heavy fixed asset of self-built data centers to an “instance-hour” utility meter. SaaS then turned software into “seat-month” subscriptions, with operations, upgrades, and security patches absorbed by the vendor. Every time the stack grew upward, the layer beneath it was commoditized once more, and money moved up.

Over the past twenty years, compute went from machines to cloud, and from cloud to software. Over the next decade, intelligence will move along the same chain. The billing unit is already climbing: from the token, to the action, to the solved problem. The only question is where this round of commoditization stops. My answer: it burns all the way through large models and GPUs themselves.

2. The Two Most Expensive Layers Are Being Hollowed Out by Their Own Builders

For intelligence to become cheap, its price has to collapse first. That is already happening, and the people building it are driving the collapse themselves.

The first nail is convergence. GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro all cluster in the 93–94% range on GPQA Diamond. A benchmark this close to saturation can no longer support a durable price premium.

Move to engineering tasks, and the leaderboards spread out again.

OpenAI reports GPT-5.5 at 58.6% on SWE-Bench Pro. When Anthropic introduced Opus 4.8, it emphasized workflow benchmarks such as SWE-bench, Super-Agent, and CursorBench. The point is not that “all models are the same.” The point is that a general foundation model is becoming harder to rent out at a premium on the strength of a single benchmark.

The second nail is price collapse. According to Epoch AI, the median inference price needed to reach the same benchmark score has been falling by roughly 50x per year. When DeepSeek R1 entered the market, it priced comparable reasoning inference about 96% below the incumbents, and within a year OpenAI, Anthropic, and Google had followed with steep price cuts across comparable models.
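
To make the compounding concrete, here is a small Python sketch. It assumes, as a simplification the source does not make, that the roughly 50x annual rate holds as a steady exponential decline; the function names are illustrative, not from any library.

```python
# A minimal sketch of the price arithmetic above.
# Assumption (not from the source): the ~50x/year decline is treated as a
# constant exponential rate for a fixed benchmark score.

ANNUAL_DECLINE = 50.0  # price falls ~50x per year for the same score


def relative_price(years: float, annual_decline: float = ANNUAL_DECLINE) -> float:
    """Price after `years`, as a fraction of today's price for the same score."""
    return annual_decline ** (-years)


def monthly_multiplier(annual_decline: float = ANNUAL_DECLINE) -> float:
    """Constant month-over-month decline factor equivalent to the annual rate."""
    return annual_decline ** (1.0 / 12.0)


def cut_to_factor(cut_fraction: float) -> float:
    """Convert a one-time price cut (0.96 means 96%) into an 'N times cheaper' factor."""
    if not 0.0 <= cut_fraction < 1.0:
        raise ValueError("cut_fraction must be in [0, 1)")
    return 1.0 / (1.0 - cut_fraction)


if __name__ == "__main__":
    for years in (0.5, 1, 2):
        print(f"after {years} years: {relative_price(years):.4%} of today's price")
    print(f"equivalent monthly decline factor: {monthly_multiplier():.3f}x")
    print(f"a 96% cut makes inference about {cut_to_factor(0.96):.0f}x cheaper")
```

Under that assumption, two years compound to 2,500x, so the same score costs 0.04% of today's price, and the monthly equivalent is a drop of roughly 28%. By comparison, a 96% cut is a one-time 25x reduction, which is less than a single year of the trend.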

The irony is that high-speed inference, one of capital’s favorite lanes, is also pushing prices down. Capital is pouring tens of billions into “faster, cheaper tokens,” but that is exactly how the intelligence layer gets commoditized. The better this works, the more intelligence looks like electricity, and the less likely it is to command high prices for long. Today’s hot money is rushing into the layer that may be the least profitable in the future.

The third nail is open source. Open source does not need to match the frontier. It only needs to be “good enough” in more and more commercial settings, and the map of “good enough” expands every year. The frontier still leads by 6–12 months in long-tail areas such as agents and long-horizon coding, but that lead keeps shrinking. Frontier premiums are temporary rent: what commands a high price is always “today’s frontier,” never “yesterday’s.”

3. Inference Is Moving Down to Edge Devices

The second source of pressure is local inference.

Small open-source models that run on phones and laptops get stronger with each generation. Models with only a few billion parameters from families such as Llama, Qwen, Gemma, and Phi, paired with on-device NPUs, can already handle a meaningful share of daily work: summarization, rewriting, classification, and local question answering. Apple, Qualcomm, and Intel are building NPUs into their mainstream chips, effectively preinstalling a slice of free inference on more than a billion devices.

Once inference runs on your own device at essentially no marginal cost, the cloud-side tollbooth that charges by the token loses much of its grip. Cloud GPUs will still carry training and the heaviest frontier inference, but a large share of “good enough” everyday calls will sink to the device. Add convergence and falling prices, and the conclusion is blunt: the marginal profit of making intelligence is being squeezed toward zero from both ends.

4. The Valuable Thing Was Never the Model. It Was Using It Well.

So where does the profit go? Start with a pattern that keeps recurring: enterprise spending on models often does not pencil out, while spending on wiring models into the business does.

In MIT NANDA’s August 2025 survey, roughly 95% of organizations that had invested in generative AI reported no measurable return on the P&L. The methodology was debated, but the direction matches multiple CIO surveys: the failure is not that the models do not work, but that they do not fit into workflows.

The two most awkward examples come from two of the most aggressive adopters. At the end of 2025, Microsoft rolled out Claude Code across its Experiences and Devices group. Less than six months later, it adjusted the deployment, moving some engineers back to its own GitHub Copilot CLI. Uber rolled Claude Code out to thousands of engineers, burned through its entire annual AI coding budget by April 2026, and saw its COO publicly question the ROI. These companies were not avoiding AI. Quite the opposite: they pushed it to the limit early, and so they also hit the wall where the math stops working early.

A CIO I know in e-commerce ERP put it more directly. Last year, they launched 12 AI pilots, and the board applauded the demos. Six months later, only two survived: an Excel plugin that finance kept using, and invoice extraction plugged into an existing OCR pipeline. The other 10 did not fail because the model was bad. They failed because they could not be wired into the ticketing system, could not pass compliance review, or had no cross-functional owner. “The model answered correctly. The organization could not absorb it.”

That sentence reveals where the profit goes. Models themselves are getting cheaper and more similar. What remains truly scarce, truly hard, and therefore truly valuable is the ability to embed them into real processes and solve a real problem. That work lives at the application and software layer, not the model layer.

5. Intelligence Is Electricity. So Where Does the Road Lead?

The dream of selling compute like electricity has been around for more than sixty years. In 1961, speaking at MIT’s centennial, John McCarthy said computing might one day be organized as a public utility and sold to everyone. In March 2026, Altman said almost the same thing at BlackRock’s infrastructure summit, with “intelligence” substituted into the line.

The history of electricity contains a fact people often miss: power generation has never been the most profitable link. When electricity prices approach commodity levels, power plants earn thin regulated margins. The real profit goes to the people who use electricity to do things: factories, appliances, and the whole modern economy.

If intelligence becomes electricity, model companies are power plants, and GPUs are generators. Profit will not stay in their hands. It will move to applications and software that use intelligence to solve real problems.

Some will say that AWS turned servers into a commodity and still became Amazon’s profit engine. Correct.

In 2025, AWS reached $128.7 billion in revenue and $45.6 billion in operating profit, and not because it charged a premium for “the strongest compute.” It won through scale, lock-in, ecosystem control, and long-term operating leverage. The best destination for model companies may also look more like an “AWS of intelligence” than like a business that rents out “today’s strongest model.”
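
The size of that profit engine is easier to see as a margin. This short Python sketch simply divides the two figures quoted above; the inputs are the article’s numbers in billions of USD, and the function and constant names are illustrative.

```python
# A minimal sketch: operating margin computed from the AWS figures quoted in the text.
# Inputs are the article's numbers (billions of USD); nothing here is independently sourced.


def operating_margin(operating_profit: float, revenue: float) -> float:
    """Operating profit as a fraction of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return operating_profit / revenue


AWS_REVENUE_2025_B = 128.7
AWS_OPERATING_PROFIT_2025_B = 45.6

if __name__ == "__main__":
    margin = operating_margin(AWS_OPERATING_PROFIT_2025_B, AWS_REVENUE_2025_B)
    print(f"AWS 2025 operating margin: {margin:.1%}")  # prints 35.4%
```

A roughly 35% operating margin on a business that began as commodity server rental is the point of the comparison: the premium came from operating the layer at scale, not from owning the strongest hardware.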

As for who captures the application-layer value, including whether a new “App Store” emerges as it did in the mobile era and who owns it, that is still an open bet. But one thing is already clear: it will not automatically belong to today’s model vendors or chip vendors.

The next AI giant does not necessarily need the strongest model, nor the most GPUs. It only has to answer one question: who can plug cheap intelligence into the expensive real world?
