Tuesday, June 02, 2026

Twenty Years of Compute: From Virtual Machines to Intelligence


Over the past twenty years, compute has mostly done one thing: moved profit up the stack — first past machines, then past software, and now past large models themselves.

Cerebras has just gone public. In intraday trading on its first day, the stock nearly doubled, and its market cap briefly touched about $95 billion.

In the same stretch of headlines, Nvidia paid roughly $20 billion at the end of 2025 to absorb Groq’s core team, and OpenAI and Anthropic now raise rounds of $10 billion or more. Capital has cast its vote almost entirely for two layers: companies that train large models, and companies that build GPUs and inference chips.

This looks like the endgame. It is more likely the high-water mark of a compute cycle repeating itself. Twenty years ago, the scarce thing was the machine. Later, profit climbed to the cloud, then to software, then to applications. The two layers that look most profitable today will probably not hold high margins five to ten years from now. Profit will move from “making intelligence” to “using intelligence.”

1. The Old Script: The Bottom Layer Always Gets Commoditized

Cloud computing has replayed the same script again and again: the layer that is scarce, expensive, and most profitable today becomes standardized and commoditized tomorrow, handing its premium to the next layer built on top.

In August 2006, AWS launched EC2. Compute moved from the heavy fixed asset of self-built data centers to an “instance-hour” utility meter. SaaS then turned software into “seat-month” subscriptions, with operations, upgrades, and security patches swallowed by vendors. Every time the stack grew upward, the layer beneath it was commoditized once more, and money moved up.

Over the past twenty years, compute went from machines to cloud, and from cloud to software. Over the next decade, intelligence will move along the same chain. The billing unit has already climbed from one token, to one action, to one solved problem. The only question is where this round of commoditization stops. My answer: it burns all the way through large models and GPUs themselves.

2. The Two Most Expensive Layers Are Being Hollowed Out by Their Own Builders

For intelligence to become cheap, its power bill has to collapse first. That is already happening, and the people building it are driving the collapse themselves.

The first nail is convergence. GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro all crowd into the 93%-94% range on GPQA Diamond. Benchmarks this close to saturation can no longer support a durable premium.

Move to engineering tasks, and the leaderboards spread out again.

OpenAI reports GPT-5.5 at 58.6% on SWE-Bench Pro. When Anthropic introduced Opus 4.8, it emphasized workflow benchmarks such as SWE-bench, Super-Agent, and CursorBench. The point is not that “all models are the same.” The point is that a general foundation model is becoming harder to rent out at a premium on the strength of a single benchmark.

The second nail is price collapse. According to Epoch AI, the median inference price needed to reach the same benchmark score has been falling by roughly 50x per year. DeepSeek R1 entered the market and cut comparable inference pricing by about 96%, forcing OpenAI, Anthropic, and Google to follow with steep price cuts across comparable models within a year.
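
To make the compounding concrete, here is a minimal sketch that assumes the decline is a constant 50x every year. That is a simplification: Epoch AI's figure is a median across benchmarks, and the real rate varies widely by task. The $10 per million tokens starting price is hypothetical.

```python
def price_after(years: float, start_price: float, annual_factor: float = 50.0) -> float:
    """Price to reach a fixed benchmark score after `years`.

    Assumes a constant annual decline factor (50x/year is the median cited above).
    """
    return start_price / annual_factor ** years


# Hypothetical starting point: $10 per million tokens for some fixed score.
start = 10.0
for t in (0, 0.5, 1, 2):
    print(f"year {t}: ${price_after(t, start):.4f} per million tokens")
```

At that rate, one year removes 98% of the price, and two years leave 1/2,500 of where it started.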

The irony is that high-speed inference, one of capital’s favorite lanes, is also pushing prices down. Capital is pouring tens of billions into “faster, cheaper tokens,” but that is exactly how the intelligence layer becomes commoditized. The more successful this commoditization becomes, the more intelligence looks like electricity, and the less likely it is to command high prices for long. Today’s hot money is rushing into the layer that may be least profitable in the future.

The third nail is open source. Open source does not need to match the frontier. It only needs to be “good enough” in more and more commercial settings, and the map of “good enough” expands every year. The frontier still leads by 6–12 months in long-tail areas such as agents and long-horizon coding, but that long tail keeps retreating. Frontier premiums are temporary rent. What can command a high price is always “today’s frontier,” not “yesterday’s frontier.”

3. Inference Is Moving Down to Edge Devices

The second pressure-release valve is local inference.

Small open-source models that run on phones and laptops are getting stronger with each generation. Llama, Qwen, Gemma, Phi, and other models with only a few billion parameters, paired with on-device NPUs, can already handle a meaningful share of daily work: summarization, rewriting, classification, and local question answering. Apple, Qualcomm, and Intel are building NPUs into every main chip, effectively preinstalling a slice of free inference across more than a billion devices.

Once inference can run for free on your own device, the cloud-side tollbooth that charges by the token loses much of its grip. Cloud and GPUs will still carry training and the heaviest frontier inference, but massive volumes of “good enough” daily calls will sink to the device. Add convergence and falling prices, and the conclusion is blunt: the marginal profit of making intelligence is being squeezed toward zero from both ends.

4. The Valuable Thing Was Never the Model. It Was Using It Well.

So where does the profit go? Start with a fact that keeps reappearing: enterprise spending on models often does not pencil out. Spending on wiring models into the business does.

In MIT NANDA’s August 2025 survey, roughly 95% of companies had spent money on generative AI, yet could not see measurable returns in the P&L. The methodology was debated, but the direction matches multiple CIO surveys: the failure is not that the models do not work. It is that they do not fit into workflows.

The two most awkward examples came from two of the most aggressive companies. At the end of 2025, Microsoft rolled out Claude Code across its Experiences and Devices group. Less than half a year later, it adjusted the deployment and moved some engineers back to its own GitHub Copilot CLI. Uber rolled Claude Code out to thousands of engineers, burned through its entire annual AI coding budget by April 2026, and saw its COO publicly question the ROI. These companies were not avoiding AI. Quite the opposite: they pushed AI to the limit early, and therefore hit the “the math does not work” wall early.

An e-commerce ERP CIO I know put it more directly. Last year, they launched 12 AI pilots, and the board applauded the demos. Six months later, only two survived: an Excel plugin that finance kept, and invoice extraction plugged into an existing OCR pipeline. The other 10 did not fail because the model was bad. They failed because they could not be wired into the ticketing system, could not pass compliance review, or had no cross-functional owner. “The model answered correctly. The organization could not absorb it.”

That sentence reveals where the profit goes. Models themselves are getting cheaper and more similar. What remains truly scarce, truly hard, and therefore truly valuable is the ability to embed them into real processes and solve a real problem. That work lives at the application and software layer, not the model layer.

5. Intelligence Is Electricity. So Where Does the Road Lead?

The dream of “selling intelligence like electricity” has been around for more than sixty years. In 1961, McCarthy said in MIT’s centennial lecture that computation would one day be sold to everyone like a public utility. In March 2026, Altman said almost the same thing at BlackRock’s infrastructure summit, with intelligence substituted into the line.

The history of electricity contains a fact people often miss: power generation has never been the most profitable link. When electricity prices approach commodity levels, power plants earn thin regulated margins. The real profit goes to the people who use electricity to do things: factories, appliances, and the whole modern economy.

If intelligence becomes electricity, model companies are power plants, and GPUs are generators. Profit will not stay in their hands. It will move to applications and software that use intelligence to solve real problems.

Some will say that AWS turned servers into a commodity and still became Amazon’s profit engine. Correct.

In 2025, AWS reached $128.7 billion in revenue and $45.6 billion in operating profit, but not because it sold a single premium point of “the strongest compute.” It won through scale, lock-in, ecosystem control, and long-term operating leverage. The best destination for model companies may also look more like an “AWS of intelligence” than like a business that rents out “today’s strongest model.”

As for who captures the application-layer value — whether a new “App Store” appears as it did in the mobile era, and who owns it — that is still an unsettled wager. But one thing is already clear: it will not automatically belong to today’s model vendors or chip vendors.

The next AI giant does not necessarily need the strongest model, nor the most GPUs. It only has to answer one question: who can plug cheap intelligence into the expensive real world?

Thursday, May 21, 2026

Why Do We Need Each Other? Economics' Final Question

 In Book I, Chapter 1 of The Wealth of Nations, published in 1776, Adam Smith described a pin factory. An untrained worker making pins alone might not make even one pin a day, and certainly could not make twenty.

Wednesday, May 20, 2026

The Last Interface: Where Human-Computer Interaction Ends

 One late night in 1965, a programmer walked toward the machine room with a stack of punched cards in his arms.

Each card had 80 columns, one character per column. The whole stack might have carried less information than a text message today. He handed the cards to an operator in a white coat and went back to sleep. The result would arrive the next day. If one character was wrong, the whole day was wasted.

One late night in 2026, you open your computer and say: help me turn these 50 emails into a business trip plan. Draft the plan first. Ask me before booking tickets or sending messages.

The computer starts researching, arranging the schedule, editing spreadsheets, and finally lays out the actions waiting for your confirmation.

The goal is the same: make the machine do something for you. What changed is everything that used to sit between “your intent” and “the machine’s execution.”

From punched cards to today, the history of human-computer interaction is the history of removing that stack, layer by layer.


A History of Removing Intermediaries

In the punched-card era, humans served the machine. You could not touch the machine directly. You had to translate what was in your head into physical holes the machine could read, then hand those cards to a specialized class of operators who fed the machine for you. ENIAC, publicly unveiled in 1946, was even more extreme: “programming” it meant wiring logic into the machine with cables and plugboards. One rewiring job could take days. People were not using computers. They were tending to them.

In 1961, MIT demonstrated the CTSS time-sharing system. Terminals gave more people their first taste of using a computer in something close to real time. The machine began to respond to you. But the command line still forced humans to accommodate the machine: you had to memorize an artificial language in which command names, syntax, and parameters all had to be exact, and one wrong character meant an error. On the screen there was only a blank cursor. If you did not already know what to do, you had nowhere to begin.

In 1968, Engelbart demonstrated the mouse. Then came Xerox PARC, and then the Macintosh in 1984. The graphical interface introduced the desktop metaphor: it wrapped the alien computer in the familiar shape of an office desk, with files, folders, and a trash can.

The selling point of the Apple Lisa, the Macintosh's 1983 predecessor, was almost this simple: if you can recognize the trash can on an office desk, you can use this computer. This was the first time the machine actively borrowed a human mental model to accommodate the human. Interaction shifted from recall to recognition.

In 2007, the iPhone removed the mouse - the proxy pointer on the screen. Your finger landed directly on the content.

Now natural language is removing the last fixed intermediary: the controls you must first learn, locate, and understand. You no longer need to find the button first. You just say what you want.

Punched cards -> command line -> graphical interface -> touchscreen -> natural language. Every generation of interface has done the same thing: remove one layer of machine language that humans had to learn. In “The Battle for the Desktop: Who Will Take Over Your Computer,” I wrote that every leap has moved in the same direction: lowering the cost of human adaptation to machines, while increasing the machine’s ability to understand humans. The next question is obvious: if this curve keeps going, where does it end?


The End of the Curve: Not Zero, But One

Start by seeing an interface as a translation layer. It exists for one reason: humans and machines speak different languages. Punched cards, command lines, icons, and menus are all translations, each generation easier to understand than the one before.

So what happens when machines can understand human language directly?

The fixed translation layer loses its reason to stay permanently in front.

We have chased the dream of “operating machines by speaking” for a long time, and failed for a long time. Siri arrived in 2011. Alexa arrived in 2014. For more than a decade, the high-frequency uses of voice assistants remained concentrated around music, weather, timers, and alarms. The reason was simple: before large models, voice assistants depended on predefined skills or intent systems. Your wording had to fall into a slot they had prepared. They were not fully understanding you. They were matching you.

Large models changed exactly this. They can follow context, infer intent, ask follow-up questions, and remember what came before. For the first time, natural language is qualified to become something close to a complete interface, not merely an input shortcut.

But the idea that “the interface will disappear” is not new. In 1991, Mark Weiser wrote in Scientific American that the most profound technologies are the ones that disappear. In 2015, designer Golden Krishna wrote a book titled The Best Interface Is No Interface. More than thirty years later, that ideal has not arrived. What we got instead was more and more apps, and hundreds of icons inside a phone.

So my prediction is different from these earlier visions. The endpoint of the interface is not “zero interfaces.” It is “one interface”: one supervisable agentic entry point that can carry authorization and responsibility.

Why one?

First, convergence is a recurring script in the history of technology. The smartphone did not make devices vanish. It made devices converge. It swallowed the point-and-shoot camera, GPS navigator, MP3 player, calculator, flashlight, voice recorder, and paper map in one sweep. According to CIPA data, global shipments of built-in-lens cameras fell from about 109 million units in 2010 to about 3.58 million units in 2020, a roughly 97% collapse in ten years. General-purpose platforms defeat special-purpose devices not because they are best at every single thing, but because they are more convenient as a whole.

Second, the key that makes “one interface” technically plausible is generative UI. In the past, “one entry point” meant “limited functionality,” because interfaces were drawn in advance by designers. Now interfaces can be generated on demand. Google has already shown early versions of this in Gemini 3-related products: AI Mode can generate interactive tools and simulations based on a query, and experimental views in the Gemini app can create one-off interactive interfaces from prompts. Need a slider? A slider appears. Need a table? A table appears. Use it, then discard it.

“Only one agentic entry point” no longer means “only a chat box.” Some front doors of specialized apps can retreat into the background and become tools and APIs called by the agent.


That One Interface Is a Cockpit, Not a Chat Box

But “one interface” does not mean “one chat box.” That is the easiest trap to fall into right now.

Today, the mainstream way we interact with AI is the text dialogue box. ChatGPT, Claude, and Gemini all look like this. But more people are pointing out something uncomfortable: the chat box looks a lot like the command line coming back. A blank input field, a blinking cursor, and you must invent what to say and discover through trial and error what it can do. Is that not the old command line problem all over again? The graphical interface worked so hard to move interaction from recall to recognition with menus and icons. Now we have returned to an empty box that tells you nothing. Some simply call it “a command line wearing natural language as a costume.”

To understand what it should become, we need to start from a fact that is badly underestimated: language is a low-bandwidth channel. A study across 17 languages found that the information rate of human speech is almost constant, at roughly 39 bits per second. The bandwidth from the retina to the brain is on the order of tens of millions of bits per second. The two measurements are not directly comparable, but the gap is already more than five orders of magnitude. This is the quantitative reminder behind “a picture is worth a thousand words.”
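
The size of that gap can be checked with a back-of-envelope calculation. As the text notes, the two rates measure different things, so this is only an order-of-magnitude sketch; the 10 million bits per second figure is an assumed lower bound for "tens of millions."

```python
import math

SPEECH_BPS = 39          # cross-language speech information rate cited above
RETINA_BPS = 10_000_000  # assumption: low end of "tens of millions of bits per second"

ratio = RETINA_BPS / SPEECH_BPS
orders = math.log10(ratio)
print(f"ratio is about {ratio:,.0f}x, or {orders:.1f} orders of magnitude")
```

Even at that lower bound the ratio is above 10^5, which is where "more than five orders of magnitude" comes from.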

On the input side, language is excellent for expressing goals. One sentence is enough. On the output side, the machine must return analysis, data, and plans; you still need to inspect, compare, and continuously adjust. Language is too slow. Output must rely on vision.

So the last interface is a hybrid: you express intent in language, and it presents results visually. Generative UI summons the controls needed for the task. An entry point that listens, plus a canvas that changes on demand.

But that is only the form. The crucial change in the last interface is not its form. It is its nature.

Every interface before this - from punched cards to touchscreens - was an operation panel. You click once, the machine responds once. Control and responsibility remain in your hands at every step. An agentic interface is different. You no longer operate. You delegate. You state a goal, and it breaks that goal into a chain of actions and executes them. What you hand over is not an “instruction.” It is an “intent.”

This means the last interface is not fundamentally about input. It is about trust and supervision.

When an agent runs a long chain of actions you have not reviewed one by one, what you really need is not a brighter button. You need four things: visibility into what it plans to do, so the black box becomes a glass box; the ability to stop and correct it when it goes off course; a way to verify afterward that it did the right thing; and boundaries that define what it may decide by itself and what it must ask you before doing.
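
Those four needs can be made concrete with a toy supervision loop. This is a minimal sketch, not any product's API: `Action`, `Supervisor`, `auto_allowed`, and `ask_human` are hypothetical names, and the policy (run listed, reversible steps alone; ask before anything else) is an assumed boundary rule that mirrors the late-night request above: draft first, ask before booking.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    name: str
    description: str
    reversible: bool  # hypothetical flag: can this step be undone cheaply?


@dataclass
class Supervisor:
    auto_allowed: set[str]               # boundary: steps the agent may take alone
    ask_human: Callable[[Action], bool]  # returns True if the human approves
    log: list[str] = field(default_factory=list)

    def run(self, plan: list[Action]) -> bool:
        # Visibility: record the whole plan before anything executes.
        for step in plan:
            self.log.append(f"PLAN {step.name}: {step.description}")
        for step in plan:
            # Boundary: irreversible or unlisted steps require explicit approval.
            needs_approval = step.name not in self.auto_allowed or not step.reversible
            if needs_approval and not self.ask_human(step):
                # Stop and correct: a refusal halts the chain at this step.
                self.log.append(f"STOPPED at {step.name}")
                return False
            # Verification: each executed step leaves an auditable record.
            # (A real agent would call a tool here; this sketch only logs it.)
            self.log.append(f"DONE {step.name}")
        return True


plan = [
    Action("draft_plan", "Draft a trip plan from 50 emails", reversible=True),
    Action("edit_sheet", "Update the travel spreadsheet", reversible=True),
    Action("book_tickets", "Book the flights", reversible=False),
]
# Simulated human who declines the booking step.
sup = Supervisor(auto_allowed={"draft_plan", "edit_sheet"}, ask_human=lambda a: False)
completed = sup.run(plan)
print("\n".join(sup.log))
```

The human is never asked about the draft or the spreadsheet, only about the irreversible purchase: supervision on the loop rather than approval at every step.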

HCI already has a useful vocabulary for this: the human role is moving from human-in-the-loop, stuck inside the loop approving each step, to human-on-the-loop, standing above the loop and supervising it. In “The Reins of Artificial Intelligence and the Return of Cybernetics,” I wrote that engineers are putting a precise set of reins on large models. Those reins are attached to the machine. The last interface is the other end of the reins - the end held in human hands.

It is no longer a panel. It is a cockpit.

Of course, graphical interfaces will not disappear. Ben Shneiderman’s idea of “direct manipulation,” proposed in 1983, still holds. Continuous, spatial, and fuzzy intentions are inherently hard to express through the one-dimensional channel of language. The command line is not dead either. Programmers still live in terminals every day. Convergence does not mean extinction. What will disappear is the current mode in which every graphical interface governs its own separate front door. They will retreat behind the agent, be summoned when needed, and fold away when finished. One entry point in front; graphics still alive behind it.

As for who controls that entry point, and what power structure will emerge from interface convergence - that belongs to another essay, “The Battle for the Desktop.” Here I only want to make one point clear: once interfaces converge, their nature changes.


Two Late Nights: A Spiral, Not a Loop

At this point, the whole thing may feel absurd.

Human-computer interaction has been evolving for more than sixty years. From the 80 columns of a punched card, to the text box of the command line, to the countless icons on the desktop, to touchscreens, and then to… an entry point that looks like a text box. We have made a huge circle and returned to a blinking cursor.

But this is not a loop. It is a spiral.

The shape is similar: both are boxes waiting for input. But the direction is completely different. The command-line box required you to learn its language: memorize commands, remember syntax, and fail on one wrong character. The agentic entry point means it learns your language: say it however you like, and it follows your meaning.

After more than sixty years, we have not returned to the starting point. We have returned to the space above it.

Back to the two late nights at the beginning.

Science fiction had already drawn this scene long ago. In Star Trek, crew members could look up and say “Computer” to ask any question or issue any command. The Alexa team later publicly acknowledged that this shipboard computer was one of their original inspirations. In 1987, Apple produced the Knowledge Navigator concept video, in which a conversational assistant could ask follow-up questions and manage your schedule. Frame by frame, it almost predicted today’s Siri lineage. Then came Samantha in Her: no screen, only voice.

But these stories almost always carry a trace of unease. HAL in 2001: A Space Odyssey moves from a gentle voice to lethal intent. Samantha in Her, at her most intimate with you, quietly evolves beyond the range you can follow, then leaves. The more natural and intimate the conversational interface becomes, the heavier the unease grows: are you using it, or depending on it; are you supervising it, or is it taking care of you?

In 1965, a human carried cards and served the machine. In 2026, a human speaks one goal, and the machine begins to run the process on the human’s behalf. On the surface, the master-servant relationship has been completely reversed.

But when the last interface becomes an agent you must constantly supervise, authorize, and calibrate - while depending on it more every day - has that reversal really happened?

Tuesday, April 28, 2026

Two Civilizations of a Kilowatt-Hour

 On September 20, 2024, Microsoft signed a $1.6 billion power purchase agreement. The electricity was coming from the Three Mile Island nuclear plant.

Yes, that Three Mile Island from 1979. The unit that melted down back then was Unit 2. Unit 1 had actually operated safely for decades and was only shut down in 2019 because it wasn't turning a profit. Microsoft paid to have Constellation Energy restart this 835-megawatt reactor, rename it the Crane Clean Energy Center, and feed 100% of its output into Microsoft's data centers. They signed a 20-year deal.

By January 2026, Constellation reported that the restart was ahead of schedule: originally set to connect to the grid in 2028, the plant is now slated for 2027.

A power plant infamous for a nuclear disaster was resurrected by a software company. Its purpose isn't to heat homes; it's to power GPUs for AI.


The Shattered Illusion

For the past two decades, the entire IT industry believed in one core tenet: digital is light. Once code is deployed globally, its marginal cost is near zero. If an article is published online, whether one person reads it or a hundred million do, the extra electricity cost is negligible. Venture capitalists called this "zero-marginal-cost scalability."

Bitcoin was the first to puncture this illusion. In 2025, the Bitcoin network consumed about 173 TWh of electricity, with Cambridge estimates exceeding 211 TWh. That’s roughly the annual power consumption of the entire country of Ukraine.

AI followed up and tore the hole wide open. On April 20, Fortune reported a staggering figure: data centers devoured half of all new electricity demand in the US last year. The IEA estimates that total global data center power consumption will break 1,050 TWh by the end of the year—ranking fifth globally, wedged right between Japan and Russia.

It's time to throw away the notion of an "asset-light" digital world.


What Does the Same Kilowatt-Hour Produce?

Both Bitcoin and AI are converting electricity into digital outputs at a massive scale. But once the power is burned, what comes out the other end is entirely different.

Bitcoin burns electricity to produce trust. Miners repeatedly run SHA-256 hash operations, and the resulting outputs carry no semantic meaning—nobody cares what that specific string of numbers is. What miners are proving is simply: "I genuinely spent this much electricity." This proof makes tampering with the ledger incredibly expensive, so expensive that it's economically irrational. Gold operates on the exact same logic: it’s valuable not just because it’s shiny, but because digging it out of the ground is tremendously difficult. Bitcoin transplanted this logic into the digital realm—using physical cost to anchor trust, bypassing the need for banks or government stamps.
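
A minimal proof-of-work sketch with Python's standard `hashlib` shows the asymmetry. It is simplified: real Bitcoin double-hashes an 80-byte binary block header and compares the result against a numeric target, whereas this toy hashes a string and asks for a prefix of hex zeros. The block text and difficulty are illustrative.

```python
from __future__ import annotations

import hashlib


def mine(block_data: str, difficulty: int) -> tuple[int, str]:
    """Find a nonce whose SHA-256 hex digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1  # no shortcut: the only way forward is another hash


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    # Checking the work costs a single hash; producing it cost many.
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


nonce, digest = mine("alice pays bob 1 BTC", difficulty=4)
print(nonce, digest)
```

Each extra hex zero multiplies the expected number of attempts by 16, which is how the cost on the power meter is tuned, while verification stays at one hash. The digest itself means nothing; only the work behind it counts.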

AI burns electricity to produce replicas of cognitive ability. GPT-5.5 running an inference, Claude Opus 4.6 writing a script, Gemini 3.1 analyzing a CT scan—every single invocation is a GPU cluster grinding through floating-point operations. Unlike Bitcoin, these computations yield specific semantic outputs: an analysis report, a bug fix, a contract review.

Bitcoin manufactures scarcity; AI manufactures abundance. One turns electricity into "you can trust me," while the other turns it into "I can do work for you." Both put the cost on the power meter, yet they seem to point toward entirely different civilizations.

But beneath the surface lies a commonality that very few people mention.


Two Probability Machines

Bitcoin's consensus is probabilistic. Once a transaction is packed into a block, it isn't "irreversible forever." More accurately, as subsequent blocks are added, the probability of reversing that transaction drops exponentially. The industry standard is to wait for 6 confirmations—not because it's absolutely mathematically safe after 6, but because by then, the cost of an override is so astronomical that no rational attacker would attempt it. Satoshi Nakamoto calculated this probability decay curve in the whitepaper. The so-called "immutability" was never a mathematical impossibility; it was always an economic impracticality.
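
The decay curve can be reproduced in a few lines. This is a Python port of the short C routine in section 11 of the whitepaper: the attacker's progress while the honest chain adds z blocks is modeled as a Poisson variable, and catching up from a deficit follows the gambler's-ruin probability (q/p)^z. The early return for q ≥ p is that same model's majority case, added here because the original routine assumes an honest majority.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Chance that an attacker with hash-power share q rewrites a block z confirmations deep."""
    p = 1.0 - q  # honest share of hash power
    if q >= p:
        return 1.0  # a majority attacker eventually catches up
    lam = z * (q / p)  # expected attacker blocks while the honest chain adds z
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        # Subtract the cases where the attacker has k blocks and never catches up.
        total -= poisson * (1 - (q / p) ** (z - k))
    return total


for z in (0, 1, 2, 6):
    print(z, f"{attacker_success_probability(0.1, z):.7f}")
```

For an attacker with 10% of the hash power, the whitepaper's table drops below 0.1% at z = 5, and z = 6 gives about 0.024%: low enough to accept the payment, but never zero.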

AI's output is also probabilistic. The process of an LLM generating each token is essentially sampling from a probability distribution. Ask the same question twice, and you might get different answers. The model doesn't "know" what is correct—it is statistically inclined to produce a plausible result, but no mechanism guarantees that any single output is flawless. The so-called "hallucination" is not a bug; it is an inherent property of this sampling mechanism.
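
A minimal sketch of temperature sampling makes the point concrete. The three-word vocabulary and its scores are made up for illustration; real models score tens of thousands of tokens and often add refinements such as top-p truncation.

```python
from __future__ import annotations

import math
import random


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab: list[str], logits: list[float],
                 temperature: float, rng: random.Random) -> str:
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical next-token scores after "The capital of Australia is"
vocab = ["Canberra", "Sydney", "Melbourne"]
logits = [3.0, 2.0, 0.5]

rng = random.Random(0)
print([sample_token(vocab, logits, 1.0, rng) for _ in range(10)])
print(softmax(logits, 1.0))
print(softmax(logits, 0.2))
```

At temperature 1.0 the wrong answer "Sydney" keeps roughly a 25% share. Lowering the temperature to 0.2 pushes "Canberra" above 99%, but mathematically no positive temperature drives the alternatives to exactly zero.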

This leads to a highly underestimated commonality: both systems use energy to buy probabilistic guarantees, not deterministic ones.

Bitcoin burns electricity to buy the guarantee that "this transaction will probably not be reversed." After six blocks, the probability is low enough that you can safely accept the money, but theoretically, it is never zero. AI burns electricity to buy the guarantee that "this output is probably reasonable." It’s good enough for most daily scenarios, but nobody can promise that every inference is correct.

This shared trait goes much deeper than just "they both use a lot of power." It implies that both systems inherently require additional mechanisms to bridge the gap between "probably right" and "definitely right." Bitcoin bridges it by waiting for more blocks, relying on exchange risk controls, and leaning on the credit enhancements of clearing networks. What does AI rely on? We still lack a solid engineering answer for that.


Software Companies Are Becoming Infrastructure Companies

The Three Mile Island story isn't an isolated incident; it's a microcosm of a structural shift.

In 2026, the combined capital expenditures of Amazon, Alphabet, Microsoft, Meta, and Oracle are hurtling toward $660 billion to $690 billion. A Goldman Sachs report put it bluntly: the capex intensity of these companies has reached 45% to 57% of their revenues.

What do these numbers mean? The capex intensity of traditional software companies usually hovers around 5% to 10%. Automakers sit roughly at 15% to 25%. Oil companies might hit 30% at the peak of their industry cycle. Right now, the capex intensity of Microsoft and Google surpasses that of Toyota and ExxonMobil.

What are they buying? GPU clusters, liquid cooling systems, fiber optics, electrical substations, and long-term power purchase agreements. Google signed the world's first corporate PPA for an SMR (Small Modular Reactor). Amazon bought a data center campus wired directly to a nuclear plant in Pennsylvania.

Harvard economist Jason Furman shared a figure that made me read it twice: in the first half of 2025, 92% of US GDP growth came from AI infrastructure investments. Strip those investments away, and the annualized growth rate of the US economy was a mere 0.1%.

This doesn't look like a tech industry expansion. It looks like a nation whose economic growth is tethered to the construction of data centers.

In the internet era, the core assets of tech companies were code, users, and network effects. Today, that list includes a few new items: power contracts, chip fab capacities, water cooling rights, and substation access. Software companies are morphing into infrastructure companies. This transition is happening so quietly that almost no one discusses it, yet it is far more profound than any new model release.


The End of the Replication Economy

When physical forms change, business logic must follow suit.

For the past twenty years, the two most lucrative businesses on the internet were online advertising and online gaming. The defining trait of both models is that the marginal cost of delivery is practically zero. When Google shows you one more ad, or Tencent lets you play one more match, the extra electricity and bandwidth cost to the servers is negligible. Therefore, the internet plays a traffic game—corral the users, and indirectly monetize them via ads or virtual items. Users don't pay directly for the product's core value; advertisers and whales foot the bill.

I call this the "replication economy." A product is created once and replicated infinitely, with each replication costing almost nothing. Software, music, video, social networks—nearly all tech giants of the past two decades were built on this logic.

AI has overturned this logic completely.

Every single API call burns GPU time. Every token carries real electricity and compute costs. The more users you have and the more frequently they prompt, the higher your costs go. This is the exact opposite of the internet's "scale equals profitability" logic. OpenAI lost over $5 billion in 2024, burning massive amounts of cash on inference compute. Anthropic is also losing money. A $20 monthly subscription fee simply cannot sustain a heavy AI user. A programmer coding all day with Claude might burn through their monthly subscription's worth of compute in a single day.

This is a "production economy," not a "replication economy." Every single delivery incurs a tangible production cost.

So where does the money come from?

The advertising model doesn't work here. Shoving ads into an AI dialogue ruins the experience, and fundamentally, advertising is a traffic game of "trading free content for attention," which directly contradicts the physical reality that every AI invocation burns money. The subscription model has a ceiling, too. A fixed monthly fee can't cover the compute consumption of power users; you either cap their usage (frustrating users) or subsidize the compute (bankrupting the company).
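The subscription ceiling is easy to see with a little arithmetic. The sketch below assumes a $20 flat fee and an invented blended inference cost of $5 per million tokens (not any provider's actual price), and shows how the provider's margin on a single subscriber shrinks and then turns negative as daily usage rises.

```python
# Hypothetical numbers for illustration only; the per-token cost is an
# assumption, not any real provider's pricing.
MONTHLY_FEE = 20.0       # flat subscription, USD per month
COST_PER_MTOK = 5.0      # assumed blended inference cost, USD per million tokens
DAYS_PER_MONTH = 30

def monthly_margin(tokens_per_day: float) -> float:
    """Provider margin on one subscriber: flat revenue minus usage-driven cost."""
    compute_cost = tokens_per_day * DAYS_PER_MONTH * COST_PER_MTOK / 1_000_000
    return MONTHLY_FEE - compute_cost

def break_even_tokens_per_day() -> float:
    """Daily usage at which compute cost exactly consumes the subscription fee."""
    return MONTHLY_FEE * 1_000_000 / (DAYS_PER_MONTH * COST_PER_MTOK)

for label, daily in [("light", 20_000), ("regular", 100_000), ("heavy coder", 4_000_000)]:
    print(f"{label:>12}: margin ${monthly_margin(daily):8.2f}/month")
print(f"break-even: {break_even_tokens_per_day():,.0f} tokens/day")
```

At these assumed prices, 4 million tokens in a single day costs $20, a full month's fee, which is the heavy-coder scenario described earlier. The provider can only cap that user or absorb the loss.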

The way out lies in the B2B sector, in direct value creation.

If a white-collar worker earns $15,000 a month, and an AI can replace or augment 10% of their work, that is $1,500 a month of value. How much is the enterprise willing to pay for this? Definitely more than a $20 subscription. When enterprises buy AI, they aren't buying a chat box; they are buying cost replacement. If a paralegal's human cost to review a contract is $300 an hour, and an AI takes ten seconds to do an initial screening at an inference cost of 20 cents, then even if the AI is only 60% accurate and still requires human review later, the enterprise's ROI is solidly positive.
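The contract-screening example can be made explicit. This sketch takes the $300/hour rate, the 20-cent inference cost, and the 60% accuracy from the text; the review durations are hypothetical, as is the simplifying rule that every wrong screening triggers a full human review on top of the check.

```python
# Illustrative model of the contract-screening example. Rate, inference cost,
# and accuracy come from the text; both review durations are assumptions.
HOURLY_RATE = 300.0
INFERENCE_COST = 0.20
ACCURACY = 0.60
FULL_REVIEW_HOURS = 1.0   # assumed: a from-scratch human review of one contract
VERIFY_HOURS = 0.25       # assumed: a human checking the AI's screening

def cost_without_ai() -> float:
    return HOURLY_RATE * FULL_REVIEW_HOURS

def cost_with_ai(accuracy: float = ACCURACY, verify_hours: float = VERIFY_HOURS) -> float:
    # Every AI output is checked; the (1 - accuracy) share that turns out wrong
    # also needs a full human review on top of the check.
    human_hours = verify_hours + (1.0 - accuracy) * FULL_REVIEW_HOURS
    return INFERENCE_COST + HOURLY_RATE * human_hours

def break_even_verify_hours(accuracy: float = ACCURACY) -> float:
    # Largest verification time for which AI screening still saves money.
    return accuracy * FULL_REVIEW_HOURS - INFERENCE_COST / HOURLY_RATE

print(f"without AI: ${cost_without_ai():.2f}")
print(f"with AI:    ${cost_with_ai():.2f}")
print(f"break-even verification time: {break_even_verify_hours() * 60:.1f} minutes")
```

Under this model the saving holds as long as checking the AI's work takes less than about 36 minutes per contract, so the positive ROI rests mostly on verification being fast, which is the verification-cost problem the essay returns to later.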

This is a fundamental paradigm shift. The internet era relied on indirect monetization (ads, traffic, attention economy); the AI era must rely on direct monetization (charging for value created). Not because direct monetization is nobler, but because every delivery has a cost, and you have to ensure the revenue covers it.

Taking it a step further: this is actually a good thing. The byproducts of the traffic game are information pollution, attention wars, and the rampant spread of fake content. Because the marginal cost is zero, producing and distributing garbage content is also practically free. AI's "production economy" logic naturally repels low-value output because every meaningless token is a real financial loss. Economic pressure will force AI toward high-value scenarios, rather than incentivizing it to generate more noise like the internet did.

Of course, this filtering mechanism isn't automatic. Someone willing to burn cash can still use AI to batch-generate garbage for SEO; being economically irrational doesn't make it technically impossible. But the overall direction is clear: the center of gravity for AI commercialization will be in B2B, in scenarios with quantifiable ROI, not in B2C subscriptions.


Agents Don't Sleep

The biggest change hasn't even arrived yet.

The mainstream use of AI today is still "human asks a question, machine gives an answer." The energy consumption of this usage pattern is pulsed: you burn power when you use it, and it sits idle when you don't. But agentic AI is altering this model.

Deloitte pointed out an easily overlooked trend in this year's report: enterprises are deploying "always-on" monitoring agents—scanning emails, logs, market data, and operational metrics 24/7. These backend agents don't wait for a prompt to start; they continuously consume compute, even at 3 AM.

From chatbots to reasoning models to autonomous agents, the compute required for a single inference session has grown roughly 10,000 times.

10,000 times.

In 2024, the average enterprise AI budget was $1.2 million. By 2026, it surged to $7 million. Inference accounts for 85% of enterprise AI budgets. And these numbers were recorded before the large-scale deployment of agents.

When millions of agents are running simultaneously (managing supply chains, inspecting codebases, monitoring compliance risks, coordinating cross-timezone teams), inference demand will no longer be a "peak load." It will become "baseload": like the base layer of demand on a power grid, it never fluctuates, doesn't care about human sleep schedules, and doesn't shut off at 5 PM.
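The difference between pulsed and baseload demand can be put in numbers with the load factor, the ratio of average to peak demand. The sketch uses an invented hourly profile: chat traffic concentrated in working hours, plus a hypothetical constant floor from always-on agents. None of the values are measurements.

```python
# Hypothetical hourly inference demand (arbitrary units), for illustration only.
# Human-driven chat follows a working-day curve; always-on agents add a flat floor.
def chat_demand(hour: int) -> float:
    """Assumed pulsed profile: busy from 9:00 to 17:59, quiet otherwise."""
    return 10.0 if 9 <= hour < 18 else 1.0

AGENT_FLOOR = 30.0  # assumed constant draw from always-on agents

def load_factor(profile: list[float]) -> float:
    """Average load divided by peak load; 1.0 means perfectly flat demand."""
    return (sum(profile) / len(profile)) / max(profile)

chat_only = [chat_demand(h) for h in range(24)]
with_agents = [chat_demand(h) + AGENT_FLOOR for h in range(24)]

for name, profile in [("chat only", chat_only), ("chat + agents", with_agents)]:
    print(f"{name:>13}: min {min(profile):5.1f}, peak {max(profile):5.1f}, "
          f"load factor {load_factor(profile):.2f}")
```

Adding the flat agent floor lifts the load factor from about 0.44 to about 0.86: demand barely dips overnight, which is the kind of load that baseload generation is built to serve.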

This is AI's true energy challenge. Training a massive model is a one-time capital expenditure; once the energy is burned, it's done. The continuous power draw of millions of always-on agents will be the lion's share. Baseload demand requires baseload power. Wind and solar are intermittent and cannot serve as baseload without large-scale storage; nuclear and natural gas can.

Microsoft resurrecting Three Mile Island, Google betting on SMRs, Amazon buying up nuclear substations—these moves don't seem so absurd anymore. They aren't buying power for today's chatbots; they are stockpiling rations for tomorrow's fleet of always-on agents.


The $25 Billion Hidden Bill

On April 21, Fortune published a report highlighting a number that rarely gets discussed: US data centers cause an estimated $25 billion in hidden damages annually. The air pollution from coal and gas plants, the freshwater consumed by cooling systems, the public resources squeezed by grid expansions—these costs don't appear on any tech company's balance sheet.

Google's 2025 environmental report is a prime example. Its data center electricity use surged by 27% in 2024, and its carbon emissions have grown 51% cumulatively since 2019, reaching 11.5 million tons. Its clean-energy share rose only from 64% to 66%. Two percentage points of decarbonization progress simply cannot keep pace with a 27% surge in power demand.
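A back-of-envelope check shows why two points of share cannot offset the demand growth. This sketch treats the clean-energy share as a simple fraction of total data center electricity, which is a simplification (Google's reporting methodology is more involved), and asks how much the non-clean portion grew in absolute terms.

```python
# Back-of-envelope check on the Google figures quoted above. Assumption: the
# clean-energy share is a simple fraction of total data center electricity.
growth = 1.27                      # electricity use grew 27% in 2024
clean_before, clean_after = 0.64, 0.66

non_clean_before = 1.0 * (1 - clean_before)   # normalized to the prior year's usage
non_clean_after = growth * (1 - clean_after)
change = non_clean_after / non_clean_before - 1

print(f"non-clean electricity change: {change:+.1%}")
```

Under that assumption, electricity from non-clean sources still grows by roughly 20% in absolute terms, even though its share fell.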

The pushback has evolved from numbers on a page to real people. In 2025, at least 16 data centers in the US were voted down or delayed by local communities, affecting $64 billion in investments. A resident in a Virginia town put it bluntly:

"I pay my power bill to run my air conditioner, not to calculate tokens for your chatbots."

Pew's polling corroborates this sentiment: the public views the jobs and tax revenue that data centers bring positively, but its resentment toward their energy consumption and environmental impact is far stronger. Once AI's physical footprint spreads from the server farms of Silicon Valley to the farmlands of Ohio, it ceases to be just a technical issue.


The Cost of Probability

As mentioned earlier, both Bitcoin and AI are probability machines. On the Bitcoin side, the methods for bridging the probability gap are quite mature—wait for more block confirmations, employ exchange risk controls, and use clearing networks as a safety net. Burn more electricity, wait a bit longer, and you crush the probability of failure down to near-zero.
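The Bitcoin side of this comparison has a closed-form answer. Section 11 of Nakamoto's whitepaper computes the probability that an attacker holding a share q of the hash rate ever overtakes the honest chain from z blocks behind; the function below is a Python port of the C routine printed there. The 10% attacker share in the usage loop is an arbitrary example value.

```python
import math

def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-rate share q ever catches up from
    z blocks behind, ported from the calculation in the Bitcoin whitepaper."""
    if q >= 0.5:
        return 1.0  # a majority attacker eventually catches up with certainty
    p = 1.0 - q
    lam = z * (q / p)  # expected attacker progress while the honest chain mines z blocks
    total = 1.0
    for k in range(z + 1):
        # Poisson probability that the attacker has mined exactly k blocks so far.
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        # Subtract the chance that the attacker is at k and never closes the gap.
        total -= poisson * (1 - (q / p) ** (z - k))
    return total

for z in range(7):
    print(f"z={z}  P={attacker_success_probability(0.1, z):.7f}")
```

With q = 0.1, each extra confirmation cuts the attacker's chance by roughly a factor of four, which is why "wait six blocks" works as a fully automated safety net.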

The problem on the AI side is much messier.

Generating a paragraph of legal analysis costs just $0.01 in inference fees. But verifying that this analysis is accurate in your jurisdiction, hasn't missed key precedents, and can actually be adopted in court—that still requires a human lawyer spending two hours. An AI-generated code snippet might pass the tests and look fine, but will it crash under edge conditions? Does it introduce race conditions under concurrency? If it causes a production outage, who is liable? None of these problems can be solved simply by burning a few more kilowatt-hours.

Generation costs are dropping exponentially; verification costs are not.

This is the destiny of probabilistic systems. Bitcoin's probabilistic nature is constrained by the structural design of the blockchain—wait six blocks, the probability is negligible, and the process is automated. But AI's probabilistic nature is diffused across the semantic layer. There is no automated "six blocks" mechanism that can serve as a safety net. Every output may require human intervention to judge, and human judgment is expensive, slow, and unscalable.

What does this mean?

It means that the truly valuable AI systems will not be the ones that can generate the most tokens, but the ones that can close the loop from "generation → verification → execution → feedback → accountability." On this chain, generation is the cheapest link. Verification and accountability are what cost real money.

It also means that measuring efficiency by "cost per token" is wholly insufficient. The real metric should be "verified, adopted output per kilowatt-hour." A system that uses ten times the electricity but yields verified diagnostic conclusions is vastly more "efficient" than a low-power system whose conclusions nobody dares to trust.
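A minimal sketch of the "verified, adopted output per kilowatt-hour" metric, comparing two hypothetical systems. The energy figures and verification rates are invented to mirror the example in the text (one system uses ten times the energy per output), and the cost of verification itself is not modeled.

```python
from dataclasses import dataclass

@dataclass
class System:
    """Hypothetical AI system profile; every number used below is illustrative."""
    name: str
    kwh_per_output: float   # energy to produce one output
    verified_rate: float    # fraction of outputs that pass verification and get adopted

    def raw_outputs_per_kwh(self) -> float:
        """Raw outputs per kWh: the 'cost per token' view."""
        return 1.0 / self.kwh_per_output

    def verified_outputs_per_kwh(self) -> float:
        """Verified, adopted outputs per kWh: the metric proposed in the text."""
        return self.verified_rate / self.kwh_per_output

cheap = System("low-power, rarely trusted", kwh_per_output=0.01, verified_rate=0.02)
careful = System("10x energy, usually verified", kwh_per_output=0.10, verified_rate=0.90)

for s in (cheap, careful):
    print(f"{s.name:>28}: raw {s.raw_outputs_per_kwh():6.1f}/kWh, "
          f"verified {s.verified_outputs_per_kwh():5.1f}/kWh")
```

With these made-up numbers, the low-power system wins on raw outputs per kWh (100 vs 10) but loses on verified outputs per kWh (2 vs 9), so the ranking flips once only trusted output counts.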

I believe this will be the most profound shift in the AI industry over the next five years: moving from a competition of who generates the most and the fastest, to a competition of whose output is credible and who can bear the liability for that output. It took Bitcoin fifteen years to convince mainstream society that its probabilistic consensus was reliable. AI will have to walk the same path, and along the way, there is no ready-made mathematical proof from Nakamoto's whitepaper to borrow.


An Unexpected Lesson from Bitcoin Miners

Back to Bitcoin. AI can take notes on the pitfalls Bitcoin encountered during its energy controversies.

When China banned mining in 2021, the farms migrated to Kazakhstan, Texas, and Northern Europe. By 2025, renewable and nuclear sources supplied 52.4% of the global Bitcoin network's energy, including hydro at 23.4%, wind at 15.4%, and nuclear at 9.8%. Miners didn't flock to these areas out of ecological enlightenment; they went because the electricity was the cheapest. And the electricity was cheap because renewable energy was oversupplied and stranded.

Miners turned into the grid's "sponges": absorbing capacity when power was abundant, and shutting down to yield it when the grid was stressed. Some mining farms in Texas signed demand-response agreements with the grid—when a heatwave hits, they power down to let residents run their ACs.

AI data centers running inference services can't be turned on and off on a whim. But training runs can. Scheduling massive training jobs during off-peak hours at night and choosing sites close to renewable generation are the paths miners carved out by voting with their feet.
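Shifting a training job into off-peak hours is, at its simplest, a sliding-window search. The sketch below picks the cheapest contiguous block of hours in a day, allowing the block to wrap past midnight. The hourly prices are invented; a real scheduler would also have to deal with checkpointing, live grid signals, and contractual demand-response terms, none of which are modeled here.

```python
# Hypothetical hourly electricity prices (USD/MWh) for one day, index = hour 0-23.
# These values are invented for illustration.
PRICES = [22, 20, 18, 17, 17, 30, 38, 45, 55, 60, 62, 65,
          66, 64, 63, 66, 72, 80, 78, 60, 45, 35, 28, 24]

def cheapest_window(prices: list[float], hours: int) -> tuple[int, float]:
    """Return (start_hour, total_price) of the cheapest contiguous block of
    `hours` hours, allowing wraparound past midnight. Sliding-window sum, O(n)."""
    n = len(prices)
    if not 1 <= hours <= n:
        raise ValueError("hours must be between 1 and len(prices)")
    window = sum(prices[:hours])
    best_start, best_sum = 0, window
    for start in range(1, n):
        # Slide right by one hour: drop the hour leaving, add the hour entering.
        window += prices[(start + hours - 1) % n] - prices[start - 1]
        if window < best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum

start, total = cheapest_window(PRICES, hours=6)
print(f"start a 6-hour training job at {start:02d}:00 (sum of hourly prices: {total})")
```

With these prices the cheapest six-hour block starts at 23:00 and runs through the early morning, the same overnight slot miners learned to exploit.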

Bitcoin's fifteen-year energy debate proved one thing: any digital system that burns electricity at scale will eventually be dragged to the real world's negotiating table. How much power was burned, how much carbon was emitted, whose resources were crowded out—these questions don't disappear just because the system runs "in the cloud."


A Kilowatt-Hour

I keep thinking about Microsoft resurrecting Three Mile Island. Beyond the sheer drama of it, it exposes a profound contradiction: we are industrially manufacturing the two most intangible concepts in human civilization—trust and intelligence—using the most tangible, physical means imaginable. Electricity, silicon, cooling water, concrete.

At their lowest level, Bitcoin and AI belong to the same category: they are probability machines. Both use energy to buy the guarantee of being "probably right," and then employ additional engineering and institutional scaffolding to approach being "definitely right." It took Bitcoin fifteen years, burning the electricity equivalent of a mid-sized country, to finally compel the mainstream financial system to accept its probabilistic consensus. More than a trillion dollars in assets now rest on this mechanism, which still runs without a central bank's guarantee.

AI is only at the starting line of this journey. It can generate more and more, but the gap between "probably right" and "definitely right" hasn't been reliably sealed. What does it take to seal it? Better verification mechanisms, clearer chains of liability, more mature industry standards. None of these can be solved by simply burning more electricity, nor can they be built in a year or two.

Meanwhile, the underlying physical bill is inflating at breakneck speed. The capex intensity of tech companies has eclipsed that of oil giants, an entire nation's economic growth is tethered to data center construction, and the baseload power draw of millions of agents hasn't even begun to hit the budget.

"What is this kilowatt-hour calculating?" Three years ago, this was an internal debate within tech communities. Today, it dictates grid planning, nuclear policy, semiconductor export controls, and the trajectory of regional economies.

Years ago, Bitcoin was mocked as "burning electricity to mine thin air." Fifteen years later, we look back and see that electricity forged a probabilistic trust network with a global market cap over a trillion dollars. What will the electricity currently being burned by AI forge? That depends on whether we can find a reasonable engineering compromise between probability and certainty. And it depends on who gets to define what "reasonable" means.

The latter question is far more difficult than the former.