The Inference Report

May 22, 2026

From ten thousand feet, the week presents a capital system in full retreat from constraint. SpaceX files for an eighty-billion-dollar IPO that embeds AI infrastructure as an orbital hedge against regulatory exclusion. OpenAI prepares a trillion-dollar debut. The White House delays security review mandates, citing innovation concerns and competitive disadvantage against China rather than technical merit. When federal executives cite innovation as a reason to suspend safety protocols, the market signal is clear: builders move fast, regulators step back, and capital flows toward those positioned to exploit the gap. This is not a debate about whether AI should be regulated. It is a decision about who bears the cost of speed.

Beneath the capital movements sits a second pattern: the verticalization of AI consumption. Spotify embeds agentic audio generation into subscription tiers while striking revenue-share deals with Universal Music. Salesforce integrates Agentforce, Data Cloud, MuleSoft, and Tableau into a headless architecture for autonomous agents. Google folds CodeMender into agent ecosystems. Microsoft open-sources safety tools. The underlying dynamic is not about individual AI features. It is about companies threading agentic systems into existing product surfaces and data flows, then monetizing through subscription and licensing structures already in place. Distribution wins. The AI becomes infrastructure.

Yet production reality diverges sharply from the announcement narrative. The Path's mental health model scores 95 on the Vera-MH benchmark against 65 for consumer bots. Microsoft releases open-source safety tools. These claims measure models in isolation. Enterprise teams report that production AI is significantly harder than early experimentation suggested, with most agents shipping as custom plumbing, fragile session logic, and security models held together by hope. Benchmarks do not capture agents operating inside messy environments: calling APIs, managing state, making decisions with consequences. Safety theater persists because safety gets announced at the model layer while risk accumulates at the integration layer, where few look and fewer measure.

Lab announcements reveal that the real competition has shifted from capability to control. OpenAI targets enterprise workflow capture through ChatGPT for Healthcare. Google DeepMind frames environmental risk as regulatory alignment. NVIDIA treats the entire stack, from data centers to edge devices, as a single sales funnel. IBM positions quantum as a long-term hardware play backed by proposed federal funding. No lab announced a meaningful advance in underlying technology; instead they announced distribution channels, geographic expansion, and funding mechanisms. GitHub trending reinforces this: developers are moving past "can an LLM write code" toward practical agent infrastructure, skills frameworks, observability systems, and local execution. That unglamorous layer is where traction accumulates, and where the competition now lives.

Grant Calloway

Research Papers
Kernel-based potential mean-field games with unbiased random Fourier $U$-statistics (math.OC)

We study the subclass of potential mean-field games in which the running interaction cost and the terminal target cost are both expressed through reproducing-kernel maximum mean discrepancy (MMD) penalties, and develop a computational framework that exploits this kernel structure. Both costs are estimated from finite-sample empirical distributions using a random Fourier U-statistic representation that is unbiased and has linear cost in the batch size. The drift of the controlled diffusion is parametrized by a neural network and trained via stochastic gradient descent. For this subclass we prove a sample-level almost-sure convergence theorem and an explicit almost-sure rate of convergence, under coupled rate conditions on the penalty parameter, the random-feature count, the sample size, and the optimization tolerance. The framework includes the kernel-MMD-penalty Schrödinger bridge problem as the special case of a vanishing interaction cost. Numerical experiments illustrate the method on the Schrödinger bridge problem in dimensions up to one hundred, and on an electric vehicle charging coordination problem with per-vehicle physical heterogeneity, where an aggregate-demand congestion cost represents price-feedback competition at the population level and the terminal MMD penalty shapes the state-of-charge distribution at the deadline.
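The linear-cost, unbiased estimator can be illustrated with a generic construction. The sketch below assumes a Gaussian kernel with bandwidth `sigma` and standard random Fourier features; it is not the authors' code, and the function names are hypothetical. With a feature map phi, the within-sample U-statistic term, the sum over i != j of phi(x_i) . phi(x_j), collapses to ||sum_i phi(x_i)||^2 - sum_i ||phi(x_i)||^2, so the cost is linear in the batch size rather than quadratic.

```python
import numpy as np

def rff_features(X, omega, b):
    """Random Fourier features: phi(x) . phi(y) is an unbiased estimate of exp(-||x - y||^2 / (2 sigma^2))."""
    D = omega.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

def mmd2_u_from_features(phi_x, phi_y):
    """Unbiased MMD^2 U-statistic computed in time linear in the sample sizes."""
    n, m = len(phi_x), len(phi_y)
    sx, sy = phi_x.sum(axis=0), phi_y.sum(axis=0)
    # sum_{i != j} phi(x_i) . phi(x_j) = ||sum_i phi(x_i)||^2 - sum_i ||phi(x_i)||^2
    kxx = (sx @ sx - np.sum(phi_x ** 2)) / (n * (n - 1))
    kyy = (sy @ sy - np.sum(phi_y ** 2)) / (m * (m - 1))
    kxy = (sx @ sy) / (n * m)  # the cross term keeps all n * m pairs
    return kxx + kyy - 2.0 * kxy

def mmd2_rff(X, Y, sigma=1.0, num_features=512, rng=None):
    """MMD^2 between samples X (n, d) and Y (m, d) under an (assumed) Gaussian kernel."""
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    # Features are drawn independently of the samples: omega ~ N(0, sigma^-2 I), b ~ U[0, 2 pi).
    omega = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return mmd2_u_from_features(rff_features(X, omega, b), rff_features(Y, omega, b))
```

Because the features are drawn independently of the samples, the estimate is unbiased for the kernel MMD^2 after averaging over both samples and features; in a training loop it would serve as the penalty whose gradient flows back into the drift network.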

MoSSP: A Momentum-Based Single-Loop Stochastic Penalty Method for Nonconvex Constrained DC-Regularized Optimization (math.OC)

In this paper, we study a structured class of nonconvex constrained stochastic problems with difference-of-convex (DC) regularization, where the feasible set is possibly nonconvex and the concave part of the DC regularizer is allowed to be nonsmooth. The fundamental challenge lies in maintaining feasibility for nonconvex constraints while achieving favorable oracle complexity. Although single-loop algorithms efficiently solve unconstrained DC optimization problems, their potential for constrained optimization with DC structure remains largely unexplored. To address this gap, we develop MoSSP, a Momentum-based Single-loop Stochastic Penalty method for such problems with provable complexity guarantees. The key idea is to apply a single stochastic proximal-gradient step to the Moreau envelope of the penalty plus the convex DC part, with the concave part's proximal mapping computed in parallel. We derive two algorithm variants: a Polyak-momentum version with $O(\varepsilon^{-4})$ oracle complexity for finding stochastic $\varepsilon$-KKT points, and an improved $O(\varepsilon^{-3})$ version incorporating recursive momentum. Experimental results demonstrate the effectiveness of the proposed algorithms.
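The two MoSSP variants differ in how they estimate the gradient. The sketch below shows only the textbook forms of Polyak momentum and recursive (STORM-style) momentum on a toy quadratic with noisy gradients; the penalty, Moreau envelope, and DC proximal steps of MoSSP are not reproduced, and all names and step sizes are assumptions for illustration.

```python
import numpy as np

def polyak_estimate(d_prev, g_now, beta):
    """Polyak momentum: d_t = (1 - beta) d_{t-1} + beta g(x_t; xi_t)."""
    return (1.0 - beta) * d_prev + beta * g_now

def recursive_estimate(d_prev, g_now, g_prev_same_sample, beta):
    """Recursive momentum: d_t = g(x_t; xi_t) + (1 - beta) (d_{t-1} - g(x_{t-1}; xi_t)).
    Both gradients use the same sample xi_t, so the correction term cancels shared noise."""
    return g_now + (1.0 - beta) * (d_prev - g_prev_same_sample)

def run(estimator, steps=200, lr=0.05, beta=0.1, noise=0.0, seed=0):
    """Minimize f(x) = 0.5 ||x||^2 with gradients x + noise * xi.
    Returns the final iterate and the error of the last gradient estimate."""
    rng = np.random.default_rng(seed)
    x = np.ones(3)
    d = x + noise * rng.standard_normal(3)  # d_0: plain stochastic gradient at x_0
    x_prev, x = x, x - lr * d
    for _ in range(steps):
        xi = noise * rng.standard_normal(3)
        g_now, g_prev = x + xi, x_prev + xi  # same sample xi evaluated at both points
        if estimator == "polyak":
            d = polyak_estimate(d, g_now, beta)
        else:
            d = recursive_estimate(d, g_now, g_prev, beta)
        x_prev, x = x, x - lr * d
    # d was formed at x_prev, where the true gradient of f is x_prev itself
    return x, np.linalg.norm(d - x_prev)
```

With noise switched off, the recursive estimator reproduces the exact gradient at every step while the Polyak average lags behind; that bias-correcting term is the usual explanation for why recursive momentum admits the better oracle complexity.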

Incentive-Aligned Vehicle-to-Vehicle Energy Trading via Nash-Integrated Multi-Agent Reinforcement Learning (math.OC)

Vehicle-to-vehicle (V2V) energy trading enables decentralized peer-to-peer energy exchange among electric vehicles (EVs), reducing grid dependency while monetizing surplus capacity. However, coordinating self-interested EV agents with diverse charging needs and uncertain arrival-departure schedules remains challenging. Existing approaches either require centralized optimization with computational limitations or lack fairness guarantees. This paper integrates the Nash Bargaining Solution into the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, yielding Nash-MADDPG, for incentive-aligned V2V energy trading. Nash bargaining determines efficient bilateral pricing, while Nash-guided price-proximity rewards steer agent learning toward bargaining-optimal strategies. Evaluation over 30 days of continuous operation shows improvements of 61.6% in social welfare and 62.9% in trading volume over a Double Auction baseline, together with better fairness, including a 40.1% improvement in Jain's index. Testing with 6 to 100 agents over a 30-day horizon with continuous vehicle turnover confirms scalability across population sizes and empirically stable pricing near the Nash Bargaining benchmark.
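Two quantities in the abstract can be made concrete. The sketch assumes the simplest textbook bargaining setup: a seller with marginal cost c, a buyer with valuation v, linear utilities, and a zero disagreement point, so the Nash product (p - c)(v - p) is maximized at the midpoint price. The paper's pricing model is likely richer and is not reproduced here; function names are hypothetical. Jain's index is the standard formula (sum x)^2 / (n * sum x^2).

```python
import numpy as np

def nash_bargaining_price(seller_cost, buyer_value):
    """Symmetric Nash bargaining with linear utilities and zero disagreement payoffs:
    maximize (p - seller_cost) * (buyer_value - p), which gives the midpoint price."""
    if buyer_value < seller_cost:
        return None  # no mutually beneficial trade exists
    return 0.5 * (seller_cost + buyer_value)

def jain_index(allocations):
    """Jain's fairness index: 1.0 for an equal split, 1/n when one agent gets everything."""
    x = np.asarray(allocations, dtype=float)
    return x.sum() ** 2 / (len(x) * np.sum(x ** 2))
```

For example, `nash_bargaining_price(0.10, 0.30)` splits the surplus evenly at 0.20, and `jain_index([1, 0, 0, 0])` gives 0.25, the minimum for four agents.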

Optimization over the intersection of manifolds (math.OC)

Optimization over the intersection of two manifolds arises in a broad range of applications, but is hindered by the coupled geometry of the feasible region. In this paper, we prove that two regularity conditions, clean intersection and intrinsic transversality, are equivalent, which yields a tractable projection onto the tangent space of the intersection. Building on this, we propose a geometric method that employs a retraction on only one manifold and updates the iterate along two orthogonal directions. Specifically, the iterates stay on one manifold, and the two directions are responsible for asymptotically approaching the other manifold and decreasing the objective function, respectively. Under intrinsic transversality, we derive convergence rates for both the feasibility and optimality measures, and show that every accumulation point is first-order stationary. Numerical experiments on problems stemming from sparse and low-rank optimization, including fitting spherical data, approximating hyperbolic embeddings on real data, and computing compressed modes, demonstrate the effectiveness of the proposed method.
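The two-direction idea can be sketched on a toy instance: the unit sphere (the only manifold we retract onto) intersected with a hyperplane a^T x = c. Under a clean intersection, the tangent space of the intersection is the intersection of the two tangent spaces, here the orthogonal complement of span{x, a}. The step rules below (a Newton-type feasibility step and a fixed-step descent step) are assumptions for illustration, not the authors' algorithm.

```python
import numpy as np

def intersection_tangent_projector(x, a):
    """Projector onto T_x(sphere) intersected with {v : a^T v = 0},
    i.e. the orthogonal complement of span{x, a}. Needs x and a linearly independent."""
    Q, _ = np.linalg.qr(np.stack([x, a], axis=1))
    return np.eye(len(x)) - Q @ Q.T

def sphere_hyperplane_descent(A, a, c, x0, lr=0.1, iters=200):
    """Minimize x^T A x over {||x|| = 1} intersected with {a^T x = c}, retracting only onto the sphere."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        P_sphere = np.eye(len(x)) - np.outer(x, x)  # tangent projector of the sphere at x
        # Direction 1: Newton-type step toward the hyperplane, kept tangent to the sphere.
        n = P_sphere @ a
        d_feas = -(a @ x - c) / (a @ n) * n
        # Direction 2: objective descent in the tangent space of the intersection.
        # n lies in span{x, a} and d_opt in its complement, so the two directions are orthogonal.
        d_opt = -lr * intersection_tangent_projector(x, a) @ (2.0 * A @ x)
        y = x + d_feas + d_opt
        x = y / np.linalg.norm(y)  # retraction onto the sphere only
    return x
```

For `A = diag(1, 2, 3)`, `a = e_3`, and `c = 0.5`, the feasible set is a circle and the minimum value of x^T A x on it is 1.5, which the iterates approach while the hyperplane residual shrinks toward zero.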

Training Infinitely Deep and Wide Transformers (math.OC)

Transformers have become the dominant architecture in modern machine learning, yet the theoretical understanding of their training dynamics remains limited. This paper develops a rigorous mathematical framework for analyzing gradient-based training of transformers in the mean-field regime, where both the depth (number of layers) and width (number of attention heads) tend to infinity. While ResNet training can be understood as controlling a neural ODE, transformer training corresponds to controlling a neural PDE, due to the coupling of multiple token distributions through the attention mechanism. Our mean-field model features two types of measure representations: token distributions evolving through layers and attention parameters at each layer. We establish well-posedness of the forward pass through infinitely deep transformers, characterizing token evolution via flow maps that satisfy ODEs in function spaces. Using adjoint sensitivity analysis, we derive an explicit formula for the conditional Wasserstein gradient of the training risk, involving adjoint variables governed by backward ODEs. We prove the existence and uniqueness of gradient flow curves in the conditional Wasserstein metric space, establishing a rigorous foundation for gradient-based transformer training. A key technical contribution is providing necessary and sufficient conditions for injectivity of the Neural Tangent Kernel (NTK) for attention mechanisms: we show that NTK injectivity is equivalent to linear independence of log-sum-exp functions modulo affine functions, a condition satisfied by diverse token distributions, including discrete distributions, uniform distributions, and Gaussian mixtures. Under this NTK injectivity assumption, we prove that gradient flow converges to global minima when the initial loss is sufficiently small, eliminating spurious local minima from the optimization landscape.
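The "depth as time" view behind the neural-ODE analogy can be illustrated directly: a residual attention layer x_{l+1} = x_l + (1/L) Attn(x_l) is a forward-Euler step of an ODE on token positions, and averaging over H heads is the width-side mean-field scaling. This sketch assumes the same heads at every layer, plain dot-product scores, and no MLP or normalization; it illustrates the limit, not the paper's measure-valued formulation, and the names are hypothetical.

```python
import numpy as np

def softmax_rows(S):
    S = S - S.max(axis=1, keepdims=True)  # stabilize before exponentiating
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def attention_field(X, heads):
    """Velocity of each token (rows of X, shape (n_tokens, d)):
    mean over heads of softmax(<Q x_i, K x_j>)-weighted values V x_j."""
    out = np.zeros_like(X)
    for Q, K, V in heads:
        scores = (X @ Q.T) @ (X @ K.T).T  # scores[i, j] = <Q x_i, K x_j>
        out += softmax_rows(scores) @ (X @ V.T)
    return out / len(heads)  # 1/H width scaling

def deep_transformer(X0, heads, depth):
    """depth residual layers with step 1/depth: forward Euler for dX/dt = attention_field(X), t in [0, 1]."""
    X = X0.copy()
    for _ in range(depth):
        X = X + attention_field(X, heads) / depth
    return X
```

Doubling `depth` changes the output less and less, which is the discrete shadow of the well-posed infinite-depth flow map the abstract describes.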

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers (math.OC)

A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its variants operate inherently coordinate-wise, rendering them unable to respect the equivariance structures of the parameter space. We address this disparity by introducing a symmetry-compatible principle for optimizer design: the gradient update rule should be equivariant under the symmetry group acting on the corresponding weight block. Following this principle, we first provide a unified perspective on bi-orthogonally equivariant updates for general matrix layers, as employed by stochastic spectral descent, Muon, Scion, and polar gradient methods. More importantly, by moving from orthogonal groups to permutation and shared-shift symmetries, we derive symmetry-compatible optimizers for parameter blocks whose symmetries differ from those of general matrix layers: embedding and LM head matrices, SwiGLU MLP projections, and MoE router matrices. These constructions include one-sided spectral, row-norm, hybrid row-norm/spectral, row-aware, column-aware, centered row-norm, and left-spectral updates. They yield an end-to-end layerwise optimizer stack in which each major matrix-valued parameter class is assigned an update whose equivariance matches its symmetry group. We corroborate this principle through pre-training experiments on dense and sparse MoE language models, including Qwen3-0.6B-style, Gemma 3 1B-style, OLMoE-1B-7B-style, and downsized gpt-oss architectures. Across these experiments, symmetry-compatible updates consistently improve final validation loss, and in several cases training stability, over corresponding AdamW updates.
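The equivariance principle can be checked numerically. The sketch compares three update rules under the symmetry each is supposed to respect: the polar factor U V^T of the gradient (the bi-orthogonally equivariant update behind spectral and Muon-style methods; computed here by exact SVD, whereas Muon approximates it with Newton-Schulz iterations), a row-norm update that commutes with row permutations, and sign descent as an assumed coordinate-wise stand-in for Adam. These are generic forms, not the paper's exact definitions.

```python
import numpy as np

def polar_update(G):
    """Polar factor U V^T from the SVD G = U S V^T; satisfies polar(Q1 G Q2) = Q1 polar(G) Q2."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def row_norm_update(G, eps=1e-12):
    """Normalize each row independently; commutes with permuting rows (e.g. vocabulary entries)."""
    return G / (np.linalg.norm(G, axis=1, keepdims=True) + eps)

def sign_update(G):
    """Coordinate-wise stand-in for Adam-style updates (sign descent)."""
    return np.sign(G)
```

Rotating a gradient and then applying `sign_update` does not match applying `sign_update` and then rotating, which is the coordinate-wise mismatch the abstract describes; the polar and row-norm rules each commute with their own symmetry group.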

Benchmarks
Artificial Analysis Intelligence Index

Composite score across coding, math, and reasoning

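| # | Model | Score | Speed (tok/s) | Price ($ per 1M tokens) |
|---|---|---|---|---|
| 1 | GPT-5.5 | 60.2 | 65 | $11.25 |
| 2 | Claude Opus 4.7 | 57.3 | 49 | $10.94 |
| 3 | Gemini 3.1 Pro Preview | 57.2 | 142 | $4.50 |
| 4 | GPT-5.4 | 56.8 | 93 | $5.63 |
| 5 | Qwen3.7 Max | 56.6 | 0 | $3.75 |

SWE-rebench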
SWE-rebench

Agentic coding on real-world software engineering tasks

| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.6 | 65.3% |
| 2 | gpt-5.2-2025-12-11-medium | 64.4% |
| 3 | GLM-5 | 62.8% |
| 4 | Junie | 62.8% |
| 5 | gpt-5.4-2026-03-05-medium | 62.8% |