AI Compute Economics: Why Real Output Is Growing 18x Faster Than the Spending

Maggie Nanyonga · 2026-05-20 · AI Compute Economics, AI Infrastructure, Inference Economics, AI GDP, Hyperscalers, Custom Silicon, Enterprise AI, Macroeconomics

AI compute spending grew 144% a year while real output grew 2,600%, and that measurement gap reveals where the economic surplus actually pools.

The most important number in artificial intelligence is the one almost nobody is reporting. In the United States, nominal spending on AI compute grew at roughly 144 percent a year, climbing from about \$37 billion in 2023 to \$90 billion in 2024 and \$219 billion in 2025. Those are the figures that fill earnings calls and infrastructure headlines. They are also, in the most consequential sense, the wrong figures. When the same output is measured the way national statistical agencies measure an economy, adjusting for the collapsing cost of delivering a fixed unit of capability, real AI output grew at approximately 2,600 percent a year. The gap between those two numbers, nominal spend rising 144 percent while real output rises at roughly eighteen times that rate, is not a rounding error. It is the entire story of the AI economy, and it decides who captures the surplus and who gets compressed out of it.

This framing comes from a recent attempt to do something conventional economics has not yet done: build a Gross Domestic Product for the machines. In Working Paper 26-9, Measuring the AI Economy, Anton Korinek of the Peterson Institute for International Economics and Patrick McKelvey of the Bank of Canada draw a national-accounting boundary around AI computation and measure the value created inside it. Korinek is also affiliated with the Anthropic Institute, a tie the paper discloses, and one worth flagging on a subject where AI labs have an obvious stake. The methods stand or fall on their own logic, which the rest of this piece examines. The result reframes nearly every strategic assumption operators carry about infrastructure, margins, and competitive moats. What follows works through that framework and then turns it toward the question executives actually care about: where the money goes from here.

The Measurement Gap Hides AI's Real Growth

Nominal spending is a poor proxy for AI output because the cost of a unit of AI capability is falling faster than almost any price in modern economic history. The PIIE framework constructs a chained price index for inference that holds model intelligence constant, tracking the cheapest available price for a fixed capability tier over time. That index implies the price of a token at constant capability fell by a factor of roughly 35 per year. Models also generate longer responses, with output lengths growing about 2.2-fold annually, which partially offsets the decline. Net the two against each other (2.2 divided by 35) and the effective price of a unit of AI output still falls to roughly 6 percent of its prior-year value. That is an annual real price decline near 94 percent.
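The netting step is easy to check by hand. A minimal sketch, using only the rounded figures quoted above (the variable names are illustrative, not the paper's):

```python
# Rounded figures from the text; variable names are illustrative.
token_price_decline = 35.0    # constant-capability token price falls ~35x per year
output_length_growth = 2.2    # tokens per response grow ~2.2x per year

# Cost of one response this year relative to last year:
# each token is ~35x cheaper, but there are ~2.2x as many of them.
effective_price_ratio = output_length_growth / token_price_decline
annual_real_decline = 1.0 - effective_price_ratio

print(f"price relative to prior year: {effective_price_ratio:.1%}")
print(f"annual real price decline:    {annual_real_decline:.1%}")
```

With these inputs the ratio comes out at about 6.3 percent, a decline of about 93.7 percent, which matches the rounded figures in the paragraph.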

When prices fall that fast, headline spending tells you almost nothing about volume. A flat dollar figure can conceal an order-of-magnitude increase in delivered capability. Applying a chained Fisher quantity index, the same methodology the U.S. Bureau of Economic Analysis uses for national accounts, the framework finds real AI output expanding more than 26-fold per year. Nominal AI GDP rose from about \$42 billion in 2023 to \$251 billion in 2025. Real AI GDP, expressed in constant 2023 dollars, rose from \$42 billion to more than \$31 trillion over the same window. These quality-adjusted figures are estimates, not observations, and they hinge on the deflator that turns falling token prices into real volume. Push on that assumption and the exact multiple moves. The researchers cross-check the result against a second method built from chip-shipment data, and it points the same way, which is the part that matters. The claim is not that the figure is precisely 2,600 percent. It is that real output is growing far faster than spending suggests.
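For readers who have not met the index before, here is a minimal sketch of a chained Fisher quantity index. The two capability tiers, their prices, and their volumes are invented purely for illustration and are not the paper's data. The function names are also hypothetical.

```python
from math import sqrt

def fisher_quantity_link(p0, q0, p1, q1):
    """Fisher quantity index between two adjacent periods.

    Geometric mean of the Laspeyres index (base-period prices)
    and the Paasche index (current-period prices).
    """
    laspeyres = sum(p * q for p, q in zip(p0, q1)) / sum(p * q for p, q in zip(p0, q0))
    paasche = sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p1, q0))
    return sqrt(laspeyres * paasche)

def chained_fisher(prices, quantities):
    """Multiply adjacent-period links into a cumulative index (first period = 1.0)."""
    index = [1.0]
    for t in range(1, len(prices)):
        link = fisher_quantity_link(prices[t - 1], quantities[t - 1],
                                    prices[t], quantities[t])
        index.append(index[-1] * link)
    return index

# Invented example: two capability tiers over three years.
prices = [[10.0, 40.0], [1.0, 5.0], [0.1, 0.5]]                 # price per unit
quantities = [[100.0, 10.0], [2000.0, 300.0], [50000.0, 9000.0]]  # units sold

nominal = [sum(p * q for p, q in zip(ps, qs)) for ps, qs in zip(prices, quantities)]
index = chained_fisher(prices, quantities)

for year, (spend, real) in enumerate(zip(nominal, index)):
    print(f"year {year}: nominal spend {spend:,.0f}, real quantity index {real:,.1f}")
```

In this toy market nominal spending rises 2.5-fold in the first year while the quantity index rises roughly 23-fold. That is the same qualitative pattern the framework reports, produced here by nothing more than falling prices and rising volumes.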

| Metric | 2023 | 2024 | 2025 | Annual growth |
|---|---|---|---|---|
| Nominal compute spending | \$37B | \$90B | \$219B | ~144% |
| Real AI output (quality-adjusted) | Baseline | 26x | 676x | ~2,600% |
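The headline growth rate and the eighteen-fold gap can be recovered from the table's own numbers. A small sketch, with illustrative variable names:

```python
# Nominal compute spending from the table (USD billions).
spend = {2023: 37.0, 2024: 90.0, 2025: 219.0}

periods = 2025 - 2023
# Compound annual multiple, then the growth rate in percent.
nominal_multiple = (spend[2025] / spend[2023]) ** (1 / periods)
nominal_growth_pct = (nominal_multiple - 1) * 100

real_growth_pct = 2600.0  # headline real-output estimate quoted in the text

gap = real_growth_pct / nominal_growth_pct
print(f"nominal growth: {nominal_growth_pct:.0f}% a year")
print(f"real growth is about {gap:.0f}x the nominal rate")
```

The compound rate works out to about 143 percent a year, and dividing 2,600 by that gives a ratio of about 18.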

This divergence has a clear historical precedent. During the early decades of the semiconductor era, hedonic price indices revealed that real computing output was exploding even as nominal market revenues advanced at far more modest rates. AI is running the same play, only faster. Anyone budgeting, forecasting, or competing off the nominal figures is reading the wrong instrument.

Treating AI as Its Own Economy

The framework's core move is to draw a border around AI computation and treat everything crossing it as trade between two economies. On one side sits the human-attributable economy. On the other sits the AI-attributable economy, defined as all economic value created by foundational AI computation rather than human cognition. Operationally, that means workloads running inside AI-accelerated data centers. Workloads small enough to run on consumer hardware are excluded for now, which is a limitation worth noting as smaller models improve.

Once the border exists, the accounting becomes a trade ledger. When a model serves an API call, fulfills a chatbot subscription, or executes an enterprise workflow, it exports a service from the AI economy back to the human one, and that export adds to AI GDP. Conversely, every input the AI sector consumes from the human side, the electricity, the silicon, the construction labor, the engineers maintaining the systems, is an import that subtracts from gross product, exactly as imported intermediate goods do in any national account. Where humans and models work in tight loops, the result is heavy bilateral trade in intermediate goods, the same pattern you see in integrated cross-border supply chains.

This is more than an accounting curiosity. It forces a discipline most AI commentary lacks: separating what the AI sector produces from what it merely consumes. And it immediately produces a counterintuitive result.

The Net-Zero Capital Paradox Explains the Gap

The hundreds of billions flowing into data center construction contribute almost nothing to current AI GDP. This is the net-zero capital paradox, and it is the methodological reason the measurement gap exists.

Consider an economy that builds a housing development using entirely imported lumber, imported steel, and imported labor. The construction counts as domestic investment, but the imported materials and labor count against it in equal measure. The net contribution to current GDP is roughly zero, even though the housing stock has grown. Data center buildouts work the same way inside the AI economy. The shells, the cooling loops, the server racks, the merchant silicon: all of it is produced in the human economy and exported into the AI economy. During the build phase, the physical investment is offset by the imports that supply it, yielding a net-zero impact on current AI GDP. This is a statement about the AI sector's own books, not a claim that the spending is inert. Those same dollars are counted in the broader national accounts, as construction and manufacturing output in the human economy. Inside the AI economy's boundary, the capital simply arrives as an import, so it nets against the investment that deploys it.

Physical capital is therefore not output. It is a leading indicator of future output, a signal of the compute capacity that will exist once the facility is energized. The only activities that add directly to AI GDP are the production of intangible model capital through training, which is the AI economy's equivalent of intellectual property investment, and the live export of inference services. This is precisely why nominal headlines mislead. They over-index on the visible capital expenditure, which is an import, and miss the intangible capital and token exports that constitute the real product.
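The bookkeeping behind the paradox fits in a few lines. The sketch below is an assumption-laden toy: the `AILedger` class, its field names, and every number in it are hypothetical, chosen only to show how imported capital nets to zero while training and inference do not.

```python
from dataclasses import dataclass

@dataclass
class AILedger:
    """One period of the AI economy's books (hypothetical units)."""
    inference_exports: float = 0.0     # services sold to the human economy
    training_investment: float = 0.0   # intangible model capital produced
    physical_investment: float = 0.0   # data centers and chips installed
    imports: float = 0.0               # everything bought from the human economy

    def gdp(self) -> float:
        # Expenditure-style identity: exports plus investment, less imports.
        return (self.inference_exports + self.training_investment
                + self.physical_investment - self.imports)

# Hypothetical build phase: 100 of capital is installed, all of it imported.
build = AILedger(physical_investment=100.0, imports=100.0)

# Hypothetical operating phase: services and model capital are produced,
# with electricity, labor, and parts as the remaining imports.
operate = AILedger(inference_exports=60.0, training_investment=25.0, imports=30.0)

print(build.gdp())    # 0.0
print(operate.gdp())  # 55.0
```

The build phase moves a great deal of money and records nothing. The operating phase is where gross product first appears on the AI economy's side of the border.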

How the Number Gets Built

Because AI revenue is buried inside diversified technology giants, the framework estimates output backward from the one signal that cannot hide: electricity. Total AI compute is reconstructed from data center power consumption, which rose from 28.7 terawatt-hours in 2023 to 138 terawatt-hours in 2025, then converted to billable spending through a chain of engineering adjustments. Power usage effectiveness is modeled at 1.3, stripping out cooling and distribution overhead. A thermal design power adjustment of 0.9 reflects that chips rarely run at maximum rated draw. A sellable fraction of 0.9 accounts for the powered-on compute that never bills a customer because of sync lags, scheduling gaps, and downtime.
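One way to read that chain of adjustments is as a sequence of divisions and multiplications. The sketch below is a simplification, not the paper's published formula: it assumes the terawatt-hour figures are facility-level, that all IT energy goes to accelerators, and that every chip has the same rated draw. The 700-watt rating and the function name are hypothetical.

```python
def billable_chip_hours(facility_twh, rated_watts, pue=1.3, tdp_adj=0.9, sellable=0.9):
    """Convert facility energy into chip-hours that could be billed.

    facility_twh : annual data center energy, terawatt-hours
    rated_watts  : rated (TDP) draw of one accelerator, watts (assumed uniform)
    pue          : power usage effectiveness; dividing strips cooling and distribution
    tdp_adj      : average draw as a share of rated draw
    sellable     : share of powered-on time that actually bills a customer
    """
    it_energy_wh = facility_twh * 1e12 / pue        # energy reaching IT equipment
    average_draw_w = rated_watts * tdp_adj          # chips rarely run at full rating
    powered_chip_hours = it_energy_wh / average_draw_w
    return powered_chip_hours * sellable

RATED_WATTS = 700.0  # hypothetical accelerator rating, for illustration only

for year, twh in ((2023, 28.7), (2025, 138.0)):
    hours = billable_chip_hours(twh, RATED_WATTS)
    print(f"{year}: {hours:.2e} billable chip-hours")
```

Multiplying billable chip-hours by a rental price per chip-hour for each chip type would give the spending estimate. Those prices are not stated in this piece, so the sketch stops one step short.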

The most revealing detail sits in the coverage factors. To correct for the chips hidden inside proprietary networks, the framework estimates what share of each chip type is actually visible in public cluster data. Nvidia's Hopper family registers a coverage factor of about 0.30, making it the most liquidly tracked asset in the market. AMD's MI300 series tracks at 0.18. Google's TPU stock registers at 0.04. Only about 4 percent of it surfaces in public cluster tracking, a gap consistent with heavy internal and private deployment rather than the rentable market. The reading is inferential rather than proof of intent, but the direction holds: much of the leading edge of AI capacity sits outside the cloud anyone can rent, which is precisely where most analysis is forced to look.
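A coverage factor is used by dividing what is visible by the share believed to be visible. The sketch below assumes that simple scaling, and the observed count of 1,000 units per family is invented so the three factors can be compared on equal footing.

```python
# Coverage factors quoted in the text.
COVERAGE = {"Nvidia Hopper": 0.30, "AMD MI300": 0.18, "Google TPU": 0.04}

def implied_total(observed_units, coverage):
    """Scale a publicly visible count up to an implied installed base."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    return observed_units / coverage

OBSERVED = 1_000  # hypothetical: the same visible count for every family

for chip, cov in COVERAGE.items():
    total = implied_total(OBSERVED, cov)
    print(f"{chip:14s} implied total {total:>8,.0f}  (unseen share {1 - cov:.0%})")
```

At a coverage of 0.04, each visible unit stands in for 25 installed ones, so small errors in the factor move the estimate a long way. Shifting it from 0.04 to 0.05 would cut the implied total by a fifth.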

Inference Economics Is Where the Real Growth Lives

The explosion in real output is driven by inference, the ongoing work of serving models, not by the one-time investment of training. The framework splits compute roughly evenly between the two, but the trajectory points hard toward inference, and three forces make its growth non-linear.

The first is the reasoning token asymmetry. Picture the difference between an executive who answers from the top of their head and an analyst who fills legal pads with hidden calculation before saying a single word. Traditional models behaved like the executive, where every token billed was a token shown. Large reasoning models behave like the analyst. They generate thousands of internal thinking tokens, testing logic and cross-examining themselves, before displaying a short final answer. The enterprise pays for and sees the answer, but the infrastructure burned the electricity for the entire internal monologue.

The second is the agentic loop multiplier. When multiple specialized agents pass work back and forth, the cost compounds like a game of whisper down the lane in which every participant must read the entire prior transcript aloud before adding a sentence. Each agent re-parses the prior history of the chain to stay consistent, so context windows balloon and token consumption can multiply severalfold over a single-turn exchange. The size of that multiple depends on the framework, the task decomposition, and how aggressively the system caches, so it resists a single clean figure. Much of the spend buys repetition, not new analysis.
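The compounding is easier to see in a toy model. The sketch below assumes each turn re-reads the full prior transcript and then adds a fixed number of new tokens. The `cache_discount` parameter, the share of re-read cost that caching avoids, is a deliberate simplification, and the turn counts and token sizes are hypothetical.

```python
def chain_tokens(turns, tokens_per_turn, cache_discount=0.0):
    """Total tokens processed when every turn re-reads the prior transcript.

    cache_discount : share of re-read cost avoided by caching (0 = none, 1 = all)
    """
    total = 0.0
    history = 0
    for _ in range(turns):
        total += history * (1.0 - cache_discount)  # re-parse everything said so far
        total += tokens_per_turn                   # new tokens produced this turn
        history += tokens_per_turn
    return total

TURNS, PER_TURN = 8, 500            # hypothetical agent chain
baseline = TURNS * PER_TURN         # cost if nothing were ever re-read

no_cache = chain_tokens(TURNS, PER_TURN)
cached = chain_tokens(TURNS, PER_TURN, cache_discount=0.75)

print(f"no caching:   {no_cache / baseline:.2f}x baseline")
print(f"75% caching:  {cached / baseline:.2f}x baseline")
```

In this setup an eight-turn chain processes 4.5 times the tokens of the same work done without re-reading, and aggressive caching pulls that down to under 2 times. The multiple grows with the number of turns, which is why it resists a single clean figure.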

The third is the output horizon. Even for an identical prompt, response lengths have grown roughly 2.2 times per year as models are tuned to say more. Stack these three forces together and the result is real inference output growing on the order of 39 times per year, even as the price per unit collapses. The volume of cognition being delivered is the part the spending figures cannot see.

The Hardware Consequence

If inference now dominates the workload, the chips optimized for training become an expensive way to serve it. A training GPU is a freight train. Building a foundational model means hauling a mountain of data in one massive parallel journey, and the brute-force engine of a multi-thousand-node cluster is exactly what that requires. Serving a single user prompt is a delivery-scooter task: nimble, sequential, low payload. Running a 700-to-1,000-watt training GPU to answer one routine query is the economic equivalent of dispatching a hundred-car freight train across town to deliver one pizza. It works, but the cost per delivery is catastrophic. Training silicon only reaches efficient utilization with large batches, so on real-time single queries it idles at high baseline power and inflates the cost per million tokens.
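The batching point can be made concrete with a back-of-the-envelope model. Everything here is an idealized assumption: the hourly cost, the per-stream token rate, and the saturation batch size are hypothetical, and real per-stream throughput degrades as batches grow instead of holding constant.

```python
def cost_per_million_tokens(hourly_cost, tokens_per_sec_per_stream, batch_size, max_batch=64):
    """Serving cost per million tokens for a chip billed by the hour.

    Idealized: throughput scales linearly with batch size until the
    chip saturates at max_batch, after which extra requests add nothing.
    """
    effective_batch = min(batch_size, max_batch)
    tokens_per_hour = tokens_per_sec_per_stream * effective_batch * 3600
    return hourly_cost / tokens_per_hour * 1_000_000

HOURLY_COST = 3.0    # hypothetical all-in cost of one accelerator-hour, USD
STREAM_RATE = 50.0   # hypothetical tokens per second for one request

for batch in (1, 8, 64):
    cost = cost_per_million_tokens(HOURLY_COST, STREAM_RATE, batch)
    print(f"batch {batch:>2}: ${cost:.2f} per million tokens")
```

Under these assumptions a chip serving one request at a time costs 64 times as much per token as the same chip running full batches. That spread, not the sticker price of the hardware, is what makes the freight train a poor pizza courier.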

That mismatch is fracturing the data center floor into specialized tiers. General-purpose GPUs from Nvidia and AMD still command the majority of accelerator revenue, on the order of 75 to 80 percent, anchored by training demand and the entrenchment of the CUDA ecosystem, though industry trackers project Nvidia's share drifting toward the mid-70s as alternatives scale. Custom hyperscaler ASICs are the fastest-growing category, with Bloomberg Intelligence projecting a 44.6 percent compound growth rate for custom accelerators through 2033 against roughly 16 percent for GPUs, driven by total-cost advantages that analysts at SemiAnalysis and Bernstein estimate at 40 to 65 percent at scale. A third tier is emerging for latency-critical work: Nvidia's \$20 billion licensing-and-talent deal for Groq's language processing unit technology, struck in late 2025, produced the Groq 3 LPX accelerator that now sits inside its Vera Rubin platform as a dedicated decode-phase co-processor. A fourth tier of low-power accelerators handles embeddings and preprocessing on a throughput-per-watt basis. These market figures come from industry trackers and analyst houses rather than audited shipment data, so treat the specific percentages as directional. They get revised. The pattern across sources does not.

| Tier | Primary role | Economic logic | Direction |
|---|---|---|---|
| GPUs (Nvidia, AMD) | Training, heavy-context serving | Flexibility plus CUDA lock-in | Dominant in dollars, shrinking in unit share |
| Custom ASICs (TPU, Trainium, MTIA, Maia) | High-volume cloud inference | Built at cost, no merchant margin | Fastest-growing category |
| LPUs (Groq architecture) | Real-time and agentic serving | On-chip SRAM, deterministic latency | Niche hyper-growth |
| NPUs and dense accelerators | Embeddings, preprocessing | Maximum throughput per watt | Steady internal expansion |

The throughline is that the monoculture is over. The defensible question for any infrastructure operator is no longer how many GPUs they own, but whether their silicon mix matches the workload they actually serve.

Where the Value Pools

As raw capability commoditizes, the surplus abandons the middle and anchors at two poles of scarcity. This is the strategic payoff of measuring AI as an economy: once you see that real output is compounding while the price of capability collapses, you can see that owning the model is no longer enough, because the model is becoming cheap.

At one pole sits physical capital. Operators who control finite power allocations, grid interconnects, land, cooling, and custom silicon capacity earn durable, utility-like returns. Power is the ultimate gatekeeper of compute velocity, and a secured megawatt is becoming a harder asset than a trained model. At the other pole sits intangible capital, but in a narrower form than before. The return is migrating away from the model weights themselves, which open-weight architectures are commoditizing, and toward the specification and verification layer: the validation pipelines, evaluation harnesses, and runtime policy controls that make a non-deterministic model safe to run inside a regulated enterprise. That layer is the new moat, because it is the part that cannot be cheaply copied.

Between those poles sits the middle-layer squeeze, and it falls hardest on one specific kind of company: the undifferentiated wrapper that puts a thin interface around someone else's API and owns nothing else. Those players control neither the physical compute beneath them nor the cognitive and verification capital above them, and they are caught between the rising cost of the tokens they consume and the falling price of the capability they resell. The qualifier matters. A middle-layer company that owns a real scarcity of its own, proprietary data, deep workflow lock-in, a regulated-industry compliance moat, or hard-won distribution, is not in the squeeze. Vertical software in law, healthcare, and finance is full of such businesses, and they defend their margins on switching costs rather than on model weights. The squeeze is not a tax on sitting in the middle. It is a tax on sitting in the middle with nothing to defend.

The companies that endure will own a scarcity that the market cannot route around: a megawatt nobody else can secure, or a verification layer nobody else can replicate. Everything in between is renting its margin from one of the two, and the rent is going up.

