The Inference Report

June 3, 2026

The infrastructure layer for autonomous AI agents is now the primary battleground for competitive advantage, and the companies racing to control it are abandoning the fiction that models alone drive value. Microsoft's Project Solara, Scout, and new agent governance tools reveal a strategic pivot away from model capability toward the operating system where agents execute tasks. Workday, Snowflake, and others are simultaneously building compliance and context layers because enterprises will not deploy agents without guardrails embedded in the platform itself. The leverage now flows to whoever owns the policy engine, the identity layer, and the audit trail when an agent acts on your behalf continuously. This shift from model-centric competition to infrastructure-centric competition is not theoretical. It is visible across every company announcement this week: OpenAI is moving Codex from a developer tool into a horizontal productivity layer across finance, marketing, and analytics; NVIDIA and Microsoft are packaging hardware, runtimes, data layers, and tuned models as an integrated full-stack offering; Hugging Face and Anthropic are focusing on local agent deployment and orchestration infrastructure rather than model releases.

The market is simultaneously correcting the productivity narrative that sustained AI valuations through 2024. Cyera is raising at 80x ARR despite operating losses while Uber capped AI spending after exhausting its budget in four months. Most telling: Impulse Space raised half a billion dollars explicitly to hire humans instead of betting on AI replacing engineers. When venture capital flows toward human hiring rather than automation, the market is signaling that the productivity gains from current AI are narrower than the hype suggested. Autonomous systems require human oversight, domain expertise, and governance infrastructure that existing models cannot provide. The collision between the venture narrative of "AI will do the work" and the operating reality of what autonomous systems actually need is reshaping how companies deploy capital.

Regulatory capture is outpacing formal regulation. Trump signed a narrower executive order requiring only voluntary prerelease government reviews of advanced models after industry objections killed stronger versions. Meanwhile, Anthropic is scaling Claude Mythos access to 150 organizations across 15 countries targeting critical infrastructure in power, water, healthcare, and communications, essentially certifying itself as trustworthy for systems affecting 100 million people. This is not regulation; this is the regulated choosing which regulator to work with. Supply chain attacks on npm packages targeting OpenAI Codex users and Red Hat cloud services reveal the real vulnerability is not the models themselves but the developer tools and integrations wrapping around them, which are moving too fast for security to keep pace.

On the execution side, the GitHub ecosystem confirms what the infrastructure announcements suggest: developers are moving past monolithic agent frameworks toward specialized components that solve concrete problems. LangGraph is maturing into genuine state management and resilience patterns. Headroom claims to cut token usage by 60-95% by compressing logs and RAG chunks before they reach the model. MarkItDown's more than 140,000 stars reflect a simpler truth: converting documents to Markdown remains a bottleneck for RAG pipelines. VoxCPM2, Open-LLM-VTuber, Scrapling, and CVAT occupy distinct niches rather than pretending to solve everything. This is a healthier ecosystem than monolithic platforms, and it mirrors the infrastructure layer consolidation happening upstream: better plumbing matters more than smarter agents.

Grant Calloway

AI Labs

Anthropic

Expanding Project Glasswing

Hugging Face

Holo3.1: Fast & Local Computer Use Agents

IBM

IBM Commits More Than $10 Billion to Quantum Computing, Funding Its Roadmap from Today's Leading Systems to the World's First Fault-Tolerant Quantum Computers

NVIDIA

OpenAI

From the Wire

Research Papers: Focused

Learned Pairwise Deep Dual-Optimal Inequalities for Stabilizing Column Generation math.OC

Column generation (CG) is central to many large-scale optimization algorithms, including branch-price-and-cut methods for vehicle routing problems, but unstable dual solutions can substantially slow its convergence. Existing deep dual-optimal inequalities can reduce this instability by restricting the dual space. Their construction, however, typically relies on problem-specific exchange arguments that are difficult to establish for routing problems with capacity limits, time windows, and other resource constraints. We introduce learned pairwise deep dual-optimal inequalities (L-PDDOIs), a learning framework that predicts pairwise orderings between dual variables and incorporates their primal counterparts directly into the master problem. To construct training labels, the framework samples optimal dual solutions and selects pairwise order relations that hold simultaneously on a sufficiently large common subset of the samples. A classifier then assigns a score to each candidate relation. Because conflicts and redundancies among the predicted relations can impair performance, graph-based postprocessing filters and compresses the candidate set before deployment. We further introduce a recovery procedure that selectively relaxes learned inequalities and provides a certificate when the baseline CG bound has been restored. On the main test sets for the capacitated vehicle routing problem and the vehicle routing problem with time windows, direct deployment of L-PDDOIs reduces the geometric mean root CG time by 89.7% and 93.9%, respectively, while incurring mean bound losses of only 1.3% and 0.5%. The recovery procedure retains corresponding time reductions of 54.8% and 83.1%, respectively, while guaranteeing no loss in the CG bound.
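The label-construction step in the abstract above (keep pairwise order relations that hold together on a large common subset of sampled dual solutions) can be illustrated with a small greedy sketch. This is not the paper's procedure: the function name `select_pairwise_orderings`, the greedy ordering, and the `min_support` threshold are assumptions made for illustration only.

```python
from itertools import permutations

def select_pairwise_orderings(dual_samples, min_support):
    """Greedy sketch (hypothetical, not the paper's algorithm).

    Picks relations pi_i < pi_j so that all chosen relations hold
    simultaneously on at least `min_support` of the sampled dual vectors.
    """
    n = len(dual_samples[0])
    # support[(i, j)] = indices of samples in which pi_i < pi_j strictly holds
    support = {
        (i, j): {k for k, pi in enumerate(dual_samples) if pi[i] < pi[j]}
        for i, j in permutations(range(n), 2)
    }
    common = set(range(len(dual_samples)))  # samples consistent with all choices so far
    chosen = []
    # Try the best-supported relations first; break ties by index for determinism.
    for pair, samples in sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        candidate = common & samples
        if len(candidate) >= min_support:
            chosen.append(pair)
            common = candidate
    return chosen, common

# Made-up dual samples for three constraints.
samples = [(1.0, 2.0, 3.0), (0.5, 2.5, 2.0), (1.0, 3.0, 4.0), (3.0, 1.0, 2.0)]
relations, shared = select_pairwise_orderings(samples, min_support=3)
```

In this toy, a selected pair `(i, j)` would become a candidate label for the classifier; the graph-based filtering and the recovery procedure described in the abstract are not modeled here.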

Learning-enabled Acceleration of Scenario-based Model Predictive Control math.OC

Scenario-based model predictive control (SBMPC) is a variant of model predictive control (MPC) that explicitly accounts for uncertainty by optimizing control actions over multiple predicted scenarios. However, its computational complexity increases rapidly with the number of scenarios and prediction horizon, limiting its applicability to real-time planning and control. This paper presents a learning-accelerated Alternating Direction Method of Multipliers (ADMM) algorithm for efficiently solving SBMPC problems by leveraging parallel computing and Moreau envelope learning, while maintaining high solution accuracy. We reformulate the SBMPC problems into consensus forms that can be decomposed via ADMM, separating the scenario-dependent dynamics from non-anticipativity constraints and enabling parallel updates across scenarios and time steps. Building on this decomposition, we utilize existing learning-to-optimize schemes, which leverage Moreau envelope learning of the cost function to accelerate the primal update in ADMM, thereby reducing computation time. The proposed framework is evaluated on a microgrid energy management problem subject to load and renewable generation uncertainties. Comparisons with IPOPT and MadNLP, popular and modern nonlinear programming solvers, demonstrate substantial computational speedups while maintaining reliable closed-loop control performance.
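For readers unfamiliar with consensus ADMM, a minimal sketch of the decomposition idea follows. It is a toy under stated assumptions, not the paper's algorithm: each "scenario" has a scalar quadratic cost 0.5 * w_i * (x - a_i)^2, the shared variable `z` stands in for a decision that must agree across scenarios, and the names `consensus_admm`, `targets`, and `weights` are hypothetical. The per-scenario `x` update is a proximal step, which is the kind of primal update the abstract says the learned component accelerates; no learning is included here.

```python
def consensus_admm(targets, weights, rho=1.0, iters=500):
    """Scaled-form consensus ADMM for min sum_i 0.5*w_i*(x_i - a_i)^2 s.t. x_i = z."""
    n = len(targets)
    u = [0.0] * n   # scaled dual variables, one per scenario
    z = 0.0         # shared (consensus) decision
    x = [0.0] * n
    for _ in range(iters):
        # Local updates are independent across scenarios, so they could run in parallel.
        x = [(w * a + rho * (z - ui)) / (w + rho)
             for a, w, ui in zip(targets, weights, u)]
        # Consensus update: average of local copies plus scaled duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual update: accumulate each scenario's disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z, x

z_star, local_copies = consensus_admm([1.0, 2.0, 6.0], [1.0, 1.0, 2.0])
```

For this quadratic toy the consensus value should approach the weighted mean of the targets, which gives a simple correctness check.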

Actor-Critic Learning for Extended Mean Field Control with Deterministic Policies math.OC

This paper develops a model-free reinforcement learning framework for continuous-time extended mean field control problems, where both the dynamics and reward may depend on the joint distribution of states and controls. We adopt deterministic feedback policies, under which the state-action distribution is induced directly as a push-forward of the state law. This avoids optimization over stochastic kernels and bypasses key limitations of existing approaches in extended mean field settings. We first establish a model-free sensitivity formula for parameterized McKean-Vlasov dynamics and use it to derive a deterministic policy gradient formula expressed through an advantage-rate function on the Wasserstein space. We then refine this formula by introducing local value and advantage-rate representations that depend on the state, action, and joint state-action distribution, yielding a policy gradient that includes both action derivatives and measure-derivative terms with respect to the control distribution. These characterizations lead to a martingale-based learning principle and motivate a continuous-time deep deterministic policy gradient algorithm combining particle approximations, measure-dependent neural networks, temporal-difference learning, and exploration in either action or parameter space. Numerical experiments on stochastic Cucker-Smale consensus control and optimal liquidation with trade crowding demonstrate the efficiency, stability, and robustness of the proposed method, including problems with explicit dependence on the control distribution.

Inter-Stop Energy Prediction and Causal Driver Quantification for Dual-Source Trolleybuses via a Time-Aware Tabular Deep Learning Architecture math.OC

Dual-source trolleybuses alternate between overhead catenary supply and on-board battery operation, creating energy-use patterns driven by route attributes, high-frequency trajectories, and hourly weather. Existing models struggle to represent these heterogeneous inputs and rarely explain the causal drivers of consumption. This paper proposes a time-aware tabular deep learning framework for inter-stop energy management. Periodic time encoding is integrated into a parameter-efficient batch-ensemble backbone to jointly learn static and sequential features, while Bayesian optimization with tree-structured density estimation tunes hyperparameters. To move beyond prediction, a three-layer causal explanation pipeline combines feature attribution for marginal effects, a linear non-Gaussian acyclic model for causal direction discovery, and a meta-learner for net average treatment effects. Experiments on the Zurich trolleybus dataset enriched with meteorological records achieve a MAPE of 6.52% and R² of 0.982, outperforming ten statistical, tree-ensemble, and deep learning baselines. Ablation results show that periodic time encoding contributes most to the accuracy gain. Causal analysis identifies regenerative braking ratio and average speed as the strongest energy-saving factors, while coasting distance is the main driver of excess consumption. The findings offer actionable thresholds for vehicle technology, driving behavior, capacity allocation, and catenary network planning.
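The abstract does not specify how its periodic time encoding is built. A common choice, shown here purely as an assumption, is to map a cyclic quantity such as hour of day onto a sine/cosine pair so that 23:00 and 00:00 end up close together. The helper name `periodic_encode` is hypothetical.

```python
import math

def periodic_encode(value, period):
    """Map a cyclic value (e.g. hour of day, period=24) to a point on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

def circle_distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

midnight = periodic_encode(0, 24)
late_evening = periodic_encode(23, 24)
early_morning = periodic_encode(1, 24)
noon = periodic_encode(12, 24)
```

With a raw hour feature, 23 and 0 are 23 units apart; under this encoding they are as close as 0 and 1, which is the property a time-aware model typically wants.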

Optimization Geometrodynamics: A Framework for Dynamic Geometric Optimization math.OC

Most gradient-based optimization methods move parameters through a fixed background geometry, even when their internal states implicitly define changing notions of length, curvature, and preconditioning. We introduce optimization geometrodynamics, a benchmark language in which optimization is a coupled evolution of a parameter trajectory, a transported distribution of particles, and a controlled time-varying Riemannian metric. The language separates invariant obstructions from improvable geometric mismatch: positive metrics preserve critical points and Morse indices, and cannot remove global geodesic-convexity obstructions, but can alter conditioning, distributional transport, and flux away from exact critical points. We introduce dynamic geometric complexity, the minimum geometric cost required to reduce an optimization difficulty observable. In the oracle benchmark model of strongly convex quadratic objectives with full positive-definite metric control, this complexity is exactly the affine-invariant distance from the relative log-spectrum to a low-condition-number set. We also analyze Hessian-matching flows, spectral Onsager relaxation, discrete exponential projection updates, gauge-invariant observables, and fixed-time local Morse-saddle flux. The paper is theory-only: its claims are formal statements with proofs, intended to provide invariants and benchmark costs against which implementable adaptive optimizers can be compared once their admissible metric families, curvature estimates, and discretization errors are specified.

Mathematical methods of reinforcement learning math.OC

Reinforcement learning (RL) is increasingly grounded in tools from probability, optimization, and operator theory. This survey organizes the mathematical structures that underpin the design and analysis of modern algorithms in RL. We begin from Markov decision processes (MDPs) and the Bellman operators, emphasizing contraction mappings, monotonicity, and fixed-point theory that yield convergence guarantees and rates for value and policy iteration, and temporal-difference schemes. We then develop the optimization perspective: stochastic approximation and martingale methods, convex duality and the role of regularization linking mirror/proximal methods. Function approximation is treated through linear and non-linear settings, covering stabilization, error decomposition, and sample-complexity via concentration inequalities for dependent data and mixing processes. We further cover off-policy evaluation/learning, constrained RL and constrained MDPs (CMDPs). Throughout we unify algorithmic templates under common operator and variational lenses, highlighting both finite-sample bounds and asymptotic results. Our presentation is intended to provide a unified mathematical entry point for researchers in probability, optimization, and statistics interested in reinforcement learning.
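The contraction property mentioned in the abstract above can be checked numerically on a small example. The sketch below runs value iteration on a made-up two-state, two-action MDP; the transition table `P`, rewards `R`, and discount `GAMMA` are illustrative assumptions, not taken from the survey.

```python
GAMMA = 0.9  # discount factor (assumed for this toy)

# P[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
P = {
    0: {0: [(0, 0.8), (1, 0.2)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.6), (0, 0.4)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}

def bellman_optimality(v):
    """Apply the Bellman optimality operator T to a value function v."""
    return [
        max(R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a]) for a in P[s])
        for s in sorted(P)
    ]

def sup_norm(u, v):
    """Sup-norm distance between two value functions."""
    return max(abs(x - y) for x, y in zip(u, v))

def value_iteration(v0, tol=1e-10, max_iter=10_000):
    """Iterate v <- T v until successive iterates differ by less than tol."""
    v = list(v0)
    for k in range(max_iter):
        v_next = bellman_optimality(v)
        if sup_norm(v_next, v) < tol:
            return v_next, k + 1
        v = v_next
    return v, max_iter

v_star, n_iters = value_iteration([0.0, 0.0])
```

Because T is a GAMMA-contraction in the sup norm, applying it to any two value functions should shrink their distance by at least a factor of GAMMA, and the iterates converge to the unique fixed point regardless of the starting guess.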

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	Speed (tok/s)	Price ($/1M tokens)
1	Claude Opus 4.8	61.4	59	$10.94
2	GPT-5.5	60.2	67	$11.25
3	Claude Opus 4.7	57.3	53	$10.94
4	Gemini 3.1 Pro Preview	57.2	123	$4.50
5	GPT-5.4	56.8	79	$5.63

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model	Score
1	gpt-5.5-2026-04-23-xhigh	62.7%
2	Codex	60.4%
3	Claude Code	59.6%
4	gpt-5.5-2026-04-23-medium	58.9%
5	Claude Opus 4.8-xhigh	56.4%

GitHub Repos

Trending

chopratejas/headroom

42574 ★

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
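To make the idea of shrinking logs before they reach a model concrete, here is a toy sketch that collapses runs of near-identical lines. It is not Headroom's API or algorithm; `compress_log` and the digit-masking heuristic are assumptions for illustration, and unlike the "same answers" claim above, this toy is lossy with no guarantee about downstream answers.

```python
import re
from itertools import groupby

# Treat any run of digits (timestamps, durations, ids) as volatile.
_VOLATILE = re.compile(r"\d+")

def _template(line):
    """Mask digits so lines that differ only in numbers compare equal."""
    return _VOLATILE.sub("<n>", line.strip())

def compress_log(lines):
    """Collapse consecutive lines sharing a template into one line plus a count."""
    out = []
    for _, group in groupby(lines, key=_template):
        group = list(group)
        first = group[0].rstrip("\n")
        if len(group) == 1:
            out.append(first)
        else:
            out.append(f"{first}  [+{len(group) - 1} similar lines omitted]")
    return out

raw = [
    "GET /a 200 12ms",
    "GET /a 200 15ms",
    "GET /a 200 11ms",
    "ERROR timeout",
]
compact = compress_log(raw)
```

Here the three request lines collapse into one annotated line while the error line is kept verbatim, so the rare event stays visible in the shortened context.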

microsoft/markitdown

143443 ★

Python tool for converting files and office documents to Markdown.

affaan-m/ECC

225416 ★

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

D4Vinci/Scrapling

60665 ★

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

nesquena/hermes-webui

13269 ★

Hermes WebUI: The best way to use Hermes Agent from the web or from your phone!

Daily discovery

PufferAI/PufferLib (Reinforcement Learning)

5806 ★

Simplifying reinforcement learning for complex game environments

makeecat/Peng (Robotics)

699 ★

A minimal quadrotor autonomy framework in Rust (Mac, Linux, Windows)

autogluon/autogluon (AutoML)

10444 ★

Fast and Accurate ML in 3 Lines of Code

langchain-ai/langgraph (Generative AI)

33724 ★

Build resilient language agents as graphs.

expectedparrot/edsl (Synthetic Data)

472 ★

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.