The Inference Report — May 21, 2026

The AI industry has abandoned the pretense that models are the product. Across infrastructure, benchmarks, and deployed systems, the consolidation pattern is unmistakable: the money, the compute, and the power are flowing toward agents, inference infrastructure, and the systems that make autonomous execution possible at scale.

Google has reorganized its entire product surface around agency. Gemini 3.5 Flash is engineered for agentic workflows and positioned as four times faster than Claude Opus 4.7 and twice as fast as Gemini 3.1 Pro. The company is unifying its coding tools under Antigravity and embedding agents across Search, Android, and enterprise platforms. Nvidia CEO Jensen Huang announced a $200 billion market opportunity in CPUs for AI agents, not in inference chips for chat. This reorientation is structural, not cosmetic. Google processes 3.2 quadrillion tokens per month, and the token is now the unit of metering, pricing, and control. Whoever owns the inference layer owns the billing relationship.

The compute arms race has become visible and expensive. xAI burned $6.4 billion in 2025 and is purchasing $2.8 billion in natural gas turbines over three years while paying Anthropic $1.25 billion per month for compute. Anthropic is on track for its first profitable quarter with $10.9 billion in projected Q2 revenue, a milestone neither OpenAI nor xAI has reached. OpenAI is preparing its IPO filing for as soon as September with Goldman Sachs and Morgan Stanley. These are capital-intensive utilities being valued as such. Figure AI's continuous livestream of humanoid robots handling packages is not marketing; it is proof of concept that the market will watch robots work. The question has shifted from whether agents will exist to who controls the compute they run on.

In benchmarks and deployed code, the shift manifests as concrete technical priorities. Claude Opus 4.6 climbed 12.4 points on SWE-rebench to reach 65.3 percent, the largest single-model improvement in the dataset, while GLM-5 and Kimi K2 Thinking each gained roughly 13 to 16 points. On GitHub, the trending patterns split between agentic coding frameworks that reduce hallucination and token waste, and unglamorous infrastructure: llama.cpp and whisper.cpp remain gravitational centers for efficient local inference, now joined by quantization and pruning strategies. The secondary wave addresses production realities: observability tools like Phoenix, time-series anomaly detection, data synthesis, and ML pipeline orchestration. These don't trend virally because they solve problems that only matter once something actually ships. Agentic coding promises leverage over writing itself. The infrastructure work promises leverage over everything that comes after.

Grant Calloway

AI Labs

AMD

NVIDIA

NVIDIA Announces Financial Results for First Quarter Fiscal 2027

OpenAI

From the Wire

Research Papers: Focused

Paradoxes of Game Theoretic Equilibria and Price of Anarchy cs.GT

For decades, static solution concepts (Nash, Correlated, and Coarse Correlated Equilibria) and the Price of Anarchy (PoA) have formed the bedrock of algorithmic game theory, with no-regret learning proving fast convergence to such game-theoretic equilibria. We show that reducing multi-agent learning to static equilibrium and black-box regret analysis obscures underlying dynamic disequilibrium and game theoretic bounds. First, interior Nash equilibria lack $C^1$ vector field information, meaning agents cannot distinguish aligned from strictly opposing incentives. Inheriting this geometry, the worst-case pure Nash equilibria dictating robust PoA bounds manifest as topologically unstable strict saddles, and in canonical congestion games, as global repellers supported on almost everywhere strictly dominated strategies. Anchoring efficiency guarantees to these unstable states causes algebraic sensitivity; we prove that accommodating all strictly positive affine costs renders the PoA unbounded. Furthermore, projecting learning trajectories onto the discrete simplex of correlated play systematically accommodates non-rationalizable behavior. Evaluating dynamics via Coarse Correlated Equilibria or proximal refinements fails to preclude strictly dominated strategies. Moreover, optimal $O(1/T)$ swap-regret minimization does not preclude macroscopic turbulence, manifesting as chaotic limit sets even in minimal games. Finally, we examine the non-atomic limit of congestion games. Though considered highly stable with tight sub-linear $\Theta(p/\ln p)$ PoA bounds (where $p$ is the polynomial degree), we prove that under discrete-time learning, the unique equilibrium destabilizes into Li-Yorke chaos and global attractors whose time-averaged inefficiency degrades exponentially as $2^p$. These results necessitate re-evaluating worst-case equilibrium frameworks for dynamically grounded metrics.

Contextual Procurement Auctions with Bandit Learning cs.GT

We study repeated contextual procurement auctions in which producers have private costs and the platform must learn context-dependent product values from bandit feedback. The objective is welfare rather than revenue or a virtual-cost surrogate: regret is the total surplus loss relative to the full-information efficient procurement rule. We first show that the natural UCB allocation rule attains $\tilde O(\sqrt{ngT})$ welfare regret under truthful bids, but its adaptive bid-dependent learning path does not by itself give a truthfulness guarantee. To obtain exact incentives, we design a bid-independent explore-then-commit mechanism with empirical critical payments; it is dominant-strategy truthful and has $\tilde O((ng)^{1/3}T^{2/3})$ regret. We then introduce frozen-payment UCB, which estimates payments in an initial bid-independent exploration phase, freezes those payment estimates, and continues adaptive UCB allocation learning afterwards. Under a smoothed truthful-path margin condition, this mechanism gives a regret-incentive tradeoff: the near-UCB tuning attains $\tilde O(\sqrt{ngT})$ welfare regret, while the average per-round gain from any fixed deviation is at most $\tilde O(T^{-1/4})$ for fixed $n,g$. A matching lower bound shows that this frozen-payment frontier is unavoidable.

Contextual Procurement Auctions with Bandit Learning cs.GT

We study repeated contextual procurement auctions in which the platform must learn context-dependent product values from bandit feedback. We give an exactly truthful explore-then-commit mechanism with $\widetilde O((ng)^{1/3}T^{2/3})$ regret. We also give a frozen-payment UCB mechanism with a regret-incentive tradeoff: the near-UCB tuning attains $\widetilde O(\sqrt{ngT})$ welfare regret, while for fixed $n,g$ its total incentive error is $\widetilde O(T^{3/4})$; the balanced tuning gives $\widetilde O(T^{2/3})$ on both scales. Regret is measured as welfare loss relative to the full-information efficient allocation. We prove a matching lower bound for the frozen-payment regret-incentive tradeoff.
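
The two listings of this paper quote the frozen-payment incentive guarantee on different scales: an average per-round deviation gain of $\tilde O(T^{-1/4})$ in one, a total incentive error of $\widetilde O(T^{3/4})$ in the other. Assuming the total is simply the per-round gain summed over the horizon $T$ (an interpretation, not something either abstract states), the two figures agree. The second line is the textbook balancing heuristic for explore-then-commit, with a hypothetical exploration length $m$ and the dependence on $n$ and $g$ ignored; it is offered as intuition for the $T^{2/3}$ rate, not as the paper's proof.

```latex
% Per-round gain times the number of rounds gives the total incentive error.
T \cdot \tilde O\!\left(T^{-1/4}\right) = \tilde O\!\left(T^{3/4}\right)

% Explore-then-commit heuristic: m exploration rounds cost O(m);
% the commit phase pays an estimation error of order m^{-1/2} per round.
\min_{m}\; \left( m + T\, m^{-1/2} \right)
\;\Longrightarrow\; m \asymp T^{2/3},
\qquad \text{regret} \asymp T^{2/3}
```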

LLM Semantic Signaling Game and Mechanism Design: Systematic Blindness, Awareness Shaping, and Mindset Dynamics cs.GT

Large language models (LLMs) increasingly mediate strategic interactions through natural language, making semantic control a critical element of communication and deception. This paper develops a semantic signaling game in which a sender selects a semantic control, an LLM generates a stochastic message, and a receiver evaluates the message using an awareness-dependent scoring mechanism. Receiver awareness is modeled as a type that determines which linguistic features are perceived and used for inference, providing a formal model of systematic blindness. The framework connects prompt-based control, statistical detection, and game-theoretic equilibrium analysis. Gaussian approximations of aggregate message scores enable likelihood-ratio decision rules, while Perfect Bayesian Nash equilibria characterize strategic behavior. The paper further develops mechanism-design approaches that reshape receiver awareness, penalize deceptive semantic controls, and modify receiver populations to induce benign pooling equilibria. Numerical experiments validate the Gaussian approximation, quantify awareness-ordering effects, analyze mindset dynamics under adaptive adversaries, and demonstrate how awareness shaping and guardrail costs reduce successful phishing attacks. The proposed framework provides a principled foundation for analyzing strategic language-mediated interactions in agentic AI systems and offers new tools for the design of robust and secure human-AI communication.
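
The likelihood-ratio decision rule mentioned in this abstract can be illustrated with a minimal sketch. It assumes the receiver's aggregate message score is Gaussian under each hypothesis with a known mean and standard deviation. The function names, the `BENIGN` and `DECEPTIVE` parameters, and the zero log-threshold are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def log_likelihood_ratio(score, benign, deceptive):
    """log p(score | deceptive) - log p(score | benign).

    Each hypothesis is a (mean, std) pair for the aggregate message score.
    """
    return gaussian_log_pdf(score, *deceptive) - gaussian_log_pdf(score, *benign)


def flag_as_deceptive(score, benign, deceptive, log_threshold=0.0):
    """Flag the message when the evidence for 'deceptive' exceeds the threshold."""
    return log_likelihood_ratio(score, benign, deceptive) > log_threshold


# Hypothetical score distributions (mean, std); not values from the paper.
BENIGN = (0.0, 1.0)
DECEPTIVE = (2.0, 1.0)

if __name__ == "__main__":
    for score in (0.5, 1.0, 1.5):
        print(score, flag_as_deceptive(score, BENIGN, DECEPTIVE))
```

With equal variances the log-likelihood ratio is linear in the score, so the rule reduces to a cutoff halfway between the two means. Raising `log_threshold` trades missed detections for fewer false alarms.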

Projected Exploitability Descent for Nash Equilibrium Computation in Multiplayer Imperfect-Information Games cs.GT

Many important games have more than two players and imperfect information. Existing approaches for computing Nash equilibrium, the central game-theoretic solution concept, in such games either lack scalability or obtain poor performance. In this paper we introduce a new algorithm called projected exploitability descent (PED) for approximating Nash equilibria in multiplayer games of imperfect information. The algorithm works by running projected subgradient descent minimizing a proxy for the multiplayer generalized exploitability function. The objective is nonconvex and nonsmooth, but can be represented as the sum of the maxima of linear functions, for which a subgradient can easily be computed and projected to the polytope of feasible sequence-form strategies. We explore performance of PED on a generalized version of the well-studied benchmark game three-player Kuhn poker. No prior exact algorithms scale to the version of the game with deck size larger than 4, and we compare performance to the popular algorithms of fictitious play (FP) and counterfactual regret minimization (CFR). We find that PED obtains a consistent near-monotonic improvement throughout all runs, though both FP and CFR perform significantly better in the initial iterations. This inspires a hybrid algorithm FP-PED that runs FP for an initial burn-in period before switching to PED for stable long-run refinement. We can alternatively view this as a multi-step algorithm that runs FP as a pre-processing step to obtain a strong initialization for PED.
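
The core loop the abstract describes, a projected subgradient step on an objective built from maxima of linear functions, can be sketched on a much smaller problem. The example below is a toy under stated assumptions: it uses two-player zero-sum rock-paper-scissors in normal form, where exploitability is convex, and projects onto the probability simplex rather than the sequence-form polytope. It is not the paper's PED algorithm, and all function names and the step-size schedule are hypothetical.

```python
import math

# Row player's payoffs in rock-paper-scissors; the column player receives the negative.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        candidate = (cumulative - 1.0) / i
        if ui - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def exploitability(x, y):
    """Sum of both players' best-response values; each term is a max of linear functions."""
    row_values = [sum(A[i][j] * y[j] for j in range(3)) for i in range(3)]   # (A y)_i
    col_values = [-sum(A[i][j] * x[i] for i in range(3)) for j in range(3)]  # (-A^T x)_j
    return max(row_values) + max(col_values), row_values, col_values


def projected_subgradient_descent(x, y, iterations=2000, step=0.5):
    """Minimize exploitability over both strategies; return the best iterate seen."""
    best = (exploitability(x, y)[0], x, y)
    for t in range(1, iterations + 1):
        _, row_values, col_values = exploitability(x, y)
        i_star = row_values.index(max(row_values))   # row player's best response to y
        j_star = col_values.index(max(col_values))   # column player's best response to x
        grad_y = A[i_star]                            # subgradient of max_i (A y)_i in y
        grad_x = [-A[i][j_star] for i in range(3)]    # subgradient of max_j (-A^T x)_j in x
        eta = step / math.sqrt(t)                     # diminishing step size
        x = project_to_simplex([xi - eta * g for xi, g in zip(x, grad_x)])
        y = project_to_simplex([yj - eta * g for yj, g in zip(y, grad_y)])
        value = exploitability(x, y)[0]
        if value < best[0]:
            best = (value, x, y)
    return best


start_x, start_y = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
initial_value = exploitability(start_x, start_y)[0]
final_value, final_x, final_y = projected_subgradient_descent(start_x, start_y)

if __name__ == "__main__":
    print("initial exploitability:", initial_value)
    print("best exploitability found:", final_value)
```

Subgradient steps are not monotone, which is why the sketch tracks the best iterate. In the multiplayer setting the abstract targets, the objective is nonconvex, so the convergence behavior of this toy should not be read as a claim about PED itself.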

Improved Multi-Dimensional Forecasting for Swap Regret cs.GT

We study the problem of forecasting for an arbitrary number of downstream agents with unknown objectives, each of whom best responds to the forecaster's predictions. We seek a single forecaster that guarantees sublinear swap regret for all downstream agents simultaneously. For two-dimensional outcome spaces, we give a polynomial time algorithm that guarantees $\tilde{O}(\sqrt{kT})$ swap regret for any downstream agent with $k$ actions. This improves over the previously known bound of $\tilde{O}(kT^{5/8})$ and avoids the exponential in $T$ runtime of prior algorithms in this setting. Our algorithm extends nicely to other low dimensional environments, retaining $\tilde{O}(\sqrt{T})$ downstream swap regret while the exponent of $k$ in the regret bound and the exponent of $T$ in the running time both grow with dimension. For arbitrary dimension $d$, we give a forecasting algorithm that guarantees $\tilde{O}(d\sqrt{kT})$ swap regret, assuming the forecaster knows an upper bound $k$ on the number of actions available to any downstream agent, albeit with a much longer runtime. This improves upon previous high dimensional guarantees that had $\tilde{O}(T^{2/3})$ dependence and required additional behavioral assumptions.
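
Swap regret, the quantity bounded in this abstract, is easiest to see on a small play history. The sketch below computes the standard empirical definitions for a single agent: external regret compares against the best fixed action, while swap regret lets every action be remapped to its own best replacement. The function names and the three-round history are hypothetical and serve only as an illustration; nothing here reproduces the paper's forecasting algorithm.

```python
def external_regret(actions, utilities):
    """Gain from replacing every played action with the single best fixed action."""
    n_actions = len(utilities[0])
    realized = sum(u[a] for a, u in zip(actions, utilities))
    best_fixed = max(sum(u[b] for u in utilities) for b in range(n_actions))
    return best_fixed - realized


def swap_regret(actions, utilities):
    """Gain from the best swap function, which remaps each action independently."""
    n_actions = len(utilities[0])
    total = 0.0
    for a in range(n_actions):
        # Utility vectors from the rounds in which action a was actually played.
        rounds = [u for played, u in zip(actions, utilities) if played == a]
        # Gain from replacing a with b on exactly those rounds (b == a gives 0).
        gains = [sum(u[b] - u[a] for u in rounds) for b in range(n_actions)]
        total += max(gains)
    return total


# Hypothetical history: utilities[t][b] is what action b would have earned in round t.
ACTIONS = [0, 0, 1]
UTILITIES = [[0, 1], [0, 1], [1, 0]]

if __name__ == "__main__":
    print("external regret:", external_regret(ACTIONS, UTILITIES))
    print("swap regret:", swap_regret(ACTIONS, UTILITIES))
```

In this history the agent is wrong in a different direction each time, so remapping each action separately recovers more than any single fixed action can. Swap regret is never smaller than external regret, because a constant remapping is one of the allowed swap functions.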

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	Speed (tok/s)	Price ($/1M tokens)
1	GPT-5.5	60.2	64	$11.25
2	Claude Opus 4.7	57.3	48	$10.94
3	Gemini 3.1 Pro Preview	57.2	138	$4.50
4	GPT-5.4	56.8	81	$5.63
5	Qwen3.7 Max	56.6	0	$0.00
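
The speed and price columns can be combined into rough per-job figures. The sketch below is illustrative arithmetic only, under several assumptions: the price column is read as US dollars per million tokens, generation runs sequentially at the listed speed, and a zero in either column (as in the Qwen3.7 Max row) is treated as not reported rather than as free or stalled. The `score_per_usd` ratio is a hypothetical convenience, not a metric the report defines.

```python
# Rows copied from the Intelligence Index table: (model, score, tok/s, $ per 1M tokens).
ROWS = [
    ("GPT-5.5", 60.2, 64, 11.25),
    ("Claude Opus 4.7", 57.3, 48, 10.94),
    ("Gemini 3.1 Pro Preview", 57.2, 138, 4.50),
    ("GPT-5.4", 56.8, 81, 5.63),
    ("Qwen3.7 Max", 56.6, 0, 0.00),
]


def derived_metrics(score, tok_per_s, usd_per_million, n_tokens=1_000_000):
    """Time and cost to generate n_tokens, plus index points per dollar.

    Returns None when speed or price is zero, which this sketch assumes
    means the value was not reported.
    """
    if tok_per_s <= 0 or usd_per_million <= 0:
        return None
    return {
        "hours": n_tokens / tok_per_s / 3600.0,
        "cost_usd": usd_per_million * n_tokens / 1_000_000,
        "score_per_usd": score / usd_per_million,
    }


METRICS = {model: derived_metrics(score, speed, price) for model, score, speed, price in ROWS}

# Model with the most index points per dollar among rows that report both columns.
BEST_VALUE = max(
    (model for model in METRICS if METRICS[model] is not None),
    key=lambda model: METRICS[model]["score_per_usd"],
)

if __name__ == "__main__":
    for model, metrics in METRICS.items():
        print(model, metrics)
    print("best score per dollar:", BEST_VALUE)
```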

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model	Score
1	Claude Opus 4.6	65.3%
2	gpt-5.2-2025-12-11-medium	64.4%
3	GLM-5	62.8%
4	Junie	62.8%
5	gpt-5.4-2026-03-05-medium	62.8%

GitHub Repos

Trending

colbymchenry/codegraph

26469 ★

Pre-indexed code knowledge graph for Claude Code — fewer tokens, fewer tool calls, 100% local

Imbad0202/academic-research-skills

36173 ★

Academic Research Skills for Claude Code: research → write → review → revise → finalize

tinyhumansai/openhuman

24209 ★

Your Personal AI super intelligence. Private, Simple and extremely powerful.

multica-ai/andrej-karpathy-skills

155979 ★

A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.

rohitg00/ai-engineering-from-scratch

33431 ★

Learn it. Build it. Ship it for others.

Daily discovery

MRPT/mrpt (Robotics)

2133 ★

The Mobile Robot Programming Toolkit (MRPT)

Arize-ai/phoenix (Prompt Engineering)

9766 ★

AI Observability & Evaluation

KudoAI/duckduckgpt (NLP)

272 ★

🐤 AI chat & search summaries in DuckDuckGo, powered by the latest LLMs

tabularis-ai/be_great (Synthetic Data)

362 ★

A novel approach for synthesizing tabular data using pretrained large language models

mlrun/mlrun (MLOps)

1668 ★

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.