The Inference Report

May 22, 2026

From ten thousand feet, the week presents a capital system in full retreat from constraint. SpaceX files an eighty-billion-dollar IPO embedding AI infrastructure as orbital hedge against regulatory exclusion. OpenAI prepares a trillion-dollar debut. The White House delays security review mandates, citing innovation concerns and competitive disadvantage against China rather than technical merit. When federal executives cite innovation as reason to suspend safety protocols, the market signal clarifies: builders move fast, regulators step back, and capital flows toward those positioned to exploit the gap. This is not a debate about whether AI should be regulated. It is a decision about who bears the cost of speed.

Beneath the capital movements sits a second pattern: the verticalization of AI consumption. Spotify embeds agentic audio generation into subscription tiers while striking revenue-share deals with Universal Music. Salesforce integrates Agentforce, Data Cloud, MuleSoft, and Tableau into a headless architecture for autonomous agents. Google folds CodeMinder into agent ecosystems. Microsoft open-sources safety tools. The underlying dynamic is not about individual AI features. It is about companies threading agentic systems into existing product surfaces and data flows, then monetizing through subscription and licensing structures already in place. Distribution wins. The AI becomes infrastructure.

Yet production reality diverges sharply from announcement narrative. The Path's mental health model scores 95 on the Vera-MH benchmark against 65 for consumer bots. Microsoft releases open-source safety tools. These claims measure models in isolation. Enterprise teams report that production AI is significantly harder than early experimentation suggested, with most agents shipping as custom plumbing, fragile session logic, and security models held together by hope. Benchmarks do not capture agents operating inside messy environments, calling APIs, managing state, and making decisions with consequences. Safety theater persists because safety gets announced at the model layer while risk accumulates at the integration layer, where few look and fewer measure.

Lab announcements reveal the real competition has shifted from capability to control. OpenAI targets enterprise workflow capture through ChatGPT for Healthcare. Google DeepMind frames environmental risk as regulatory alignment. NVIDIA treats the entire stack from data centers to edge devices as a single sales funnel. IBM positions quantum as a long-term hardware play backed by proposed federal funding. No lab announced a meaningful advance in underlying technology. Instead they announced distribution channels, geographic expansion, funding mechanisms. GitHub trending reinforces this: developers are moving past "can an LLM write code" toward practical agent infrastructure, skills frameworks, observability systems, and local execution. The unglamorous layer where traction accumulates. That is where the real competition lives now.

Grant Calloway

AI Labs

Google DeepMind

We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

IBM

IBM and U.S. Department of Commerce Announce America’s First Purpose-Built Quantum Foundry, Supported by Proposed $1 Billion CHIPS Award

NVIDIA

OpenAI

AdventHealth advances whole-person care with OpenAI

From the Wire

Research Papers: Focused

Learned Pairwise Deep Dual-Optimal Inequalities for Stabilizing Column Generation math.OC

Column generation (CG) is central to many large-scale optimization algorithms, including branch-price-and-cut methods for vehicle routing problems, but unstable dual solutions can substantially slow its convergence. Existing deep dual-optimal inequalities can reduce this instability by restricting the dual space. Their construction, however, typically relies on problem-specific exchange arguments that are difficult to establish for routing problems with capacity limits, time windows, and other resource constraints. We introduce learned pairwise deep dual-optimal inequalities (L-PDDOIs), a learning framework that predicts pairwise orderings between dual variables and incorporates their primal counterparts directly into the master problem. To construct training labels, the framework samples optimal dual solutions and selects pairwise order relations that hold simultaneously on a sufficiently large common subset of the samples. A classifier then assigns a score to each candidate relation. Because conflicts and redundancies among the predicted relations can impair performance, graph-based postprocessing filters and compresses the candidate set before deployment. We further introduce a recovery procedure that selectively relaxes learned inequalities and provides a certificate when the baseline CG bound has been restored. On the main test sets for the capacitated vehicle routing problem and the vehicle routing problem with time windows, direct deployment of L-PDDOIs reduces the geometric mean root CG time by 89.7% and 93.9%, respectively, while incurring mean bound losses of only 1.3% and 0.5%. The recovery procedure retains corresponding time reductions of 54.8% and 83.1%, respectively, while guaranteeing no loss in the CG bound.

Learning-enabled Acceleration of Scenario-based Model Predictive Control math.OC

Scenario-based model predictive control (SBMPC) is a variant of model predictive control (MPC) that explicitly accounts for uncertainty by optimizing control actions over multiple predicted scenarios. However, its computational complexity increases rapidly with the number of scenarios and the prediction horizon, limiting its applicability to real-time planning and control. This paper presents a learning-accelerated Alternating Direction Method of Multipliers (ADMM) algorithm for efficiently solving SBMPC problems by leveraging parallel computing and Moreau envelope learning, while maintaining high solution accuracy. We reformulate the SBMPC problems into consensus forms that can be decomposed via ADMM, separating the scenario-dependent dynamics from non-anticipativity constraints and enabling parallel updates across scenarios and time steps. Building on this decomposition, we utilize existing learning-to-optimize schemes, which leverage Moreau envelope learning of the cost function to accelerate the primal update in ADMM, thereby reducing computation time. The proposed framework is evaluated on a microgrid energy management problem subject to load and renewable generation uncertainties. Comparisons with IPOPT and MadNLP, popular and modern nonlinear programming solvers, demonstrate substantial computational speedups while maintaining reliable closed-loop control performance.
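To make the consensus form concrete, here is a minimal sketch of scaled-form consensus ADMM on a toy problem. It is not the paper's algorithm: the quadratic per-scenario cost, the function name `consensus_admm`, and the scalar decision variable are all assumptions chosen so the primal update has a closed form, and no Moreau envelope learning is involved. Each "scenario" keeps its own copy of the decision and the consensus variable plays the role of the non-anticipativity constraint.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Toy consensus ADMM (illustrative assumption, not the paper's solver).

    Minimizes sum_i 0.5 * (x_i - targets[i])**2 subject to x_i == z for all i.
    """
    n = len(targets)
    z = 0.0
    u = [0.0] * n  # scaled dual variables, one per scenario
    x = [0.0] * n
    for _ in range(iterations):
        # Scenario-wise primal update. For this quadratic cost the minimizer
        # is available in closed form, and each scenario is independent of
        # the others, so this step could run in parallel.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Consensus update: average the scenario copies plus scaled duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual update: accumulate each scenario's disagreement with z.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z, x


if __name__ == "__main__":
    z, x = consensus_admm([1.0, 2.0, 6.0])
    print(z, x)
```

For this toy cost the consensus value converges to the mean of the targets. In a learning-accelerated variant of the kind the abstract describes, the primal update is the step a learned surrogate would replace.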

Actor-Critic Learning for Extended Mean Field Control with Deterministic Policies math.OC

This paper develops a model-free reinforcement learning framework for continuous-time extended mean field control problems, where both the dynamics and reward may depend on the joint distribution of states and controls. We adopt deterministic feedback policies, under which the state-action distribution is induced directly as a push-forward of the state law. This avoids optimization over stochastic kernels and bypasses key limitations of existing approaches in extended mean field settings. We first establish a model-free sensitivity formula for parameterized McKean-Vlasov dynamics and use it to derive a deterministic policy gradient formula expressed through an advantage-rate function on the Wasserstein space. We then refine this formula by introducing local value and advantage-rate representations that depend on the state, action, and joint state-action distribution, yielding a policy gradient that includes both action derivatives and measure-derivative terms with respect to the control distribution. These characterizations lead to a martingale-based learning principle and motivate a continuous-time deep deterministic policy gradient algorithm combining particle approximations, measure-dependent neural networks, temporal-difference learning, and exploration in either action or parameter space. Numerical experiments on stochastic Cucker-Smale consensus control and optimal liquidation with trade crowding demonstrate the efficiency, stability, and robustness of the proposed method, including problems with explicit dependence on the control distribution.

Inter-Stop Energy Prediction and Causal Driver Quantification for Dual-Source Trolleybuses via a Time-Aware Tabular Deep Learning Architecture math.OC

Dual-source trolleybuses alternate between overhead catenary supply and on-board battery operation, creating energy-use patterns driven by route attributes, high-frequency trajectories, and hourly weather. Existing models struggle to represent these heterogeneous inputs and rarely explain the causal drivers of consumption. This paper proposes a time-aware tabular deep learning framework for inter-stop energy management. Periodic time encoding is integrated into a parameter-efficient batch-ensemble backbone to jointly learn static and sequential features, while Bayesian optimization with tree-structured density estimation tunes hyperparameters. To move beyond prediction, a three-layer causal explanation pipeline combines feature attribution for marginal effects, a linear non-Gaussian acyclic model for causal direction discovery, and a meta-learner for net average treatment effects. Experiments on the Zurich trolleybus dataset enriched with meteorological records achieve a MAPE of 6.52% and R of 0.982, outperforming ten statistical, tree-ensemble, and deep learning baselines. Ablation results show that periodic time encoding contributes most to the accuracy gain. Causal analysis identifies regenerative braking ratio and average speed as the strongest energy-saving factors, while coasting distance is the main driver of excess consumption. The findings offer actionable thresholds for vehicle technology, driving behavior, capacity allocation, and catenary network planning.
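The abstract does not say how its periodic time encoding is defined, so the following is only a sketch of one common choice: mapping the hour of day onto a circle with sine and cosine. The function names `encode_hour` and `distance` are hypothetical. The point of the encoding is that 23:00 and 00:00 end up as close together as 00:00 and 01:00, which a raw hour feature cannot express.

```python
import math


def encode_hour(hour, period=24.0):
    """Map a time of day onto the unit circle (assumed sin/cos encoding)."""
    angle = 2.0 * math.pi * (hour % period) / period
    return (math.sin(angle), math.cos(angle))


def distance(a, b):
    """Euclidean distance between two encoded time points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


if __name__ == "__main__":
    # Adjacent hours across midnight are as close as any other adjacent pair.
    print(distance(encode_hour(23), encode_hour(0)))
    print(distance(encode_hour(0), encode_hour(1)))
    # Opposite times of day are the farthest apart.
    print(distance(encode_hour(0), encode_hour(12)))
```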

Optimization Geometrodynamics: A Framework for Dynamic Geometric Optimization math.OC

Most gradient-based optimization methods move parameters through a fixed background geometry, even when their internal states implicitly define changing notions of length, curvature, and preconditioning. We introduce optimization geometrodynamics, a benchmark language in which optimization is a coupled evolution of a parameter trajectory, a transported distribution of particles, and a controlled time-varying Riemannian metric. The language separates invariant obstructions from improvable geometric mismatch: positive metrics preserve critical points and Morse indices, and cannot remove global geodesic-convexity obstructions, but can alter conditioning, distributional transport, and flux away from exact critical points. We introduce dynamic geometric complexity, the minimum geometric cost required to reduce an optimization difficulty observable. In the oracle benchmark model of strongly convex quadratic objectives with full positive-definite metric control, this complexity is exactly the affine-invariant distance from the relative log-spectrum to a low-condition-number set. We also analyze Hessian-matching flows, spectral Onsager relaxation, discrete exponential projection updates, gauge-invariant observables, and fixed-time local Morse-saddle flux. The paper is theory-only: its claims are formal statements with proofs, intended to provide invariants and benchmark costs against which implementable adaptive optimizers can be compared once their admissible metric families, curvature estimates, and discretization errors are specified.

Mathematical methods of reinforcement learning math.OC

Reinforcement learning (RL) is increasingly grounded in tools from probability, optimization, and operator theory. This survey organizes the mathematical structures that underpin the design and analysis of modern algorithms in RL. We begin from Markov decision processes (MDPs) and the Bellman operators, emphasizing contraction mappings, monotonicity, and fixed-point theory that yield convergence guarantees and rates for value and policy iteration, and temporal-difference schemes. We then develop the optimization perspective: stochastic approximation and martingale methods, convex duality and the role of regularization linking mirror/proximal methods. Function approximation is treated through linear and non-linear settings, covering stabilization, error decomposition, and sample-complexity via concentration inequalities for dependent data and mixing processes. We further cover off-policy evaluation/learning, constrained RL and constrained MDPs (CMDPs). Throughout we unify algorithmic templates under common operator and variational lenses, highlighting both finite-sample bounds and asymptotic results. Our presentation is intended to provide a unified mathematical entry point for researchers in probability, optimization, and statistics interested in reinforcement learning.
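The contraction argument behind value iteration can be checked numerically on a small example. The sketch below is not taken from the survey: the two-state, two-action MDP, its rewards, and all names are hypothetical. It applies the Bellman optimality operator, shows how to measure the sup-norm distance between two value functions, and iterates to the fixed point.

```python
def bellman_optimality(values, transitions, rewards, gamma):
    """Apply the Bellman optimality operator once to a value function."""
    return [
        max(
            rewards[s][a]
            + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])
            for a in range(len(transitions[s]))
        )
        for s in range(len(values))
    ]


def sup_distance(v, w):
    """Sup-norm distance between two value functions."""
    return max(abs(a - b) for a, b in zip(v, w))


def value_iteration(transitions, rewards, gamma, tol=1e-10):
    """Iterate the operator until successive iterates differ by less than tol."""
    values = [0.0] * len(transitions)
    while True:
        updated = bellman_optimality(values, transitions, rewards, gamma)
        if sup_distance(updated, values) < tol:
            return updated
        values = updated


# Hypothetical MDP: TRANSITIONS[s][a] is a list of (probability, next_state).
TRANSITIONS = [
    [[(1.0, 0)], [(0.5, 0), (0.5, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
REWARDS = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.9

if __name__ == "__main__":
    v, w = [0.0, 0.0], [10.0, -5.0]
    tv = bellman_optimality(v, TRANSITIONS, REWARDS, GAMMA)
    tw = bellman_optimality(w, TRANSITIONS, REWARDS, GAMMA)
    # One application shrinks the sup-norm distance by at least a factor gamma.
    print(sup_distance(tv, tw), GAMMA * sup_distance(v, w))
    print(value_iteration(TRANSITIONS, REWARDS, GAMMA))
```

Because the operator is a contraction with modulus equal to the discount factor, the loop terminates for any discount factor strictly below one, and the returned values are within a small multiple of `tol` of the unique fixed point.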

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	tok/s	$/1M
1	GPT-5.5	60.2	65	$11.25
2	Claude Opus 4.7	57.3	49	$10.94
3	Gemini 3.1 Pro Preview	57.2	142	$4.50
4	GPT-5.4	56.8	93	$5.63
5	Qwen3.7 Max	56.6	0	$3.75

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model	Score
1	Claude Opus 4.6	65.3%
2	gpt-5.2-2025-12-11-medium	64.4%
3	GLM-5	62.8%
4	Junie	62.8%
5	gpt-5.4-2026-03-05-medium	62.8%

GitHub Repos

Trending

anthropics/claude-plugins-official

30987 ★

Official, Anthropic-managed directory of high quality Claude Code Plugins.

colbymchenry/codegraph

26469 ★

Pre-indexed code knowledge graph for Claude Code — fewer tokens, fewer tool calls, 100% local

multica-ai/andrej-karpathy-skills

155979 ★

A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.

dotnet/skills

4382 ★

Repository for skills to assist AI coding agents with .NET and C#

obra/superpowers

252042 ★

An agentic skills framework & software development methodology that works.

Daily discovery

jjang-ai/vmlxLLM

520 ★

vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1 Paged (super fast ttft) + Hybrid SSM Scheduler + Cont Batching + etc!

RunanywhereAI/runanywhere-sdks (Diffusion Models)

10347 ★

Production ready toolkit to run AI locally

netdata/netdata (MCP)

78903 ★

The fastest path to AI-powered full stack observability, even for lean teams.

opendilab/awesome-RLHF (RLHF)

4369 ★

A curated list of reinforcement learning with human feedback resources (continually updated)

autogluon/autogluon (AutoML)

10444 ★

Fast and Accurate ML in 3 Lines of Code