The Inference Report

April 8, 2026
From the Wire

The industry is fracturing along lines of compute access and control. Those with chips, data, and relationships to cloud providers are consolidating power; everyone else is either building around the margins or discovering that AI's productivity gains don't survive contact with real work.

The capital flow tells the story most clearly. Anthropic just expanded its compute deal with Google and Broadcom as its run-rate revenue hit thirty billion dollars. Firmus, Nvidia's Asia data center play, raised $1.35 billion in six months and is now valued at $5.5 billion. Uber is expanding its AWS contract to run ride-sharing features on Amazon's chips. Intel is joining Musk's Terafab project. These aren't decisions about which model is smartest. They're decisions about who controls the hardware and software stack when the real money gets deployed. Arcee, a 26-person startup that built an open source LLM, is gaining traction with users, but the headlines calling it tiny are telling. Scale in this industry still means access to billions of dollars of compute and the relationships to secure it.

But the productivity story is cracking under scrutiny. Even at ninety percent accuracy, Google's AI Overviews are serving wrong answers by the million every hour. AMD's AI Group director publicly flagged that Claude Code cuts corners on complex problems, offering answers that look quicker and lighter but don't hold up, forcing her team to stop using it. Only 28 percent of AI use cases in infrastructure and operations fully succeed and meet ROI expectations, while 20 percent fail outright. The gap between what AI can do in controlled demos and what it delivers when bolted onto real systems is widening, not closing. Yet companies are cutting engineering teams on the fantasy that AI can now build and maintain enterprise applications with minimal supervision, and those cuts will produce consequences beyond bad quarters.

Sloane Duvall