The infrastructure layer for autonomous AI agents is now the primary battleground for competitive advantage, and the companies racing to control it are abandoning the fiction that models alone drive value. Microsoft's Project Solara, Scout, and new agent governance tools reveal a strategic pivot away from model capability toward the operating system where agents execute tasks. Workday, Snowflake, and others are simultaneously building compliance and context layers because enterprises will not deploy agents without guardrails embedded in the platform itself. The leverage now flows to whoever owns the policy engine, the identity layer, and the audit trail when an agent acts continuously on your behalf. This shift from model-centric competition to infrastructure-centric competition is not theoretical. It is visible across this week's company announcements: OpenAI is moving Codex from a developer tool into a horizontal productivity layer across finance, marketing, and analytics; NVIDIA and Microsoft are packaging hardware, runtimes, data layers, and tuned models as an integrated full-stack offering; Hugging Face and Anthropic are focusing on local agent deployment and orchestration infrastructure rather than model releases.
The market is simultaneously correcting the productivity narrative that sustained AI valuations through 2024. Cyera is raising at 80x ARR despite operating losses while Uber capped AI spending after exhausting its budget in four months. Most telling: Impulse Space raised half a billion dollars explicitly to hire humans instead of betting on AI replacing engineers. When venture capital flows toward human hiring rather than automation, the market is signaling that the productivity gains from current AI are narrower than the hype suggested. Autonomous systems require human oversight, domain expertise, and governance infrastructure that existing models cannot provide. The collision between the venture narrative of "AI will do the work" and the operating reality of what autonomous systems actually need is reshaping how companies deploy capital.
Regulatory capture is outpacing formal regulation. Trump signed a narrower executive order requiring only voluntary prerelease government reviews of advanced models after industry objections killed stronger versions. Meanwhile, Anthropic is scaling Claude Mythos access to 150 organizations across 15 countries targeting critical infrastructure in power, water, healthcare, and communications, essentially certifying itself as trustworthy for systems affecting 100 million people. This is not regulation; this is the regulated choosing which regulator to work with. Supply chain attacks on npm packages targeting OpenAI Codex users and Red Hat cloud services reveal the real vulnerability is not the models themselves but the developer tools and integrations wrapping around them, which are moving too fast for security to keep pace.
On the execution side, the GitHub ecosystem confirms what the infrastructure announcements suggest: developers are moving past monolithic agent frameworks toward specialized components that solve concrete problems. LangGraph is maturing into genuine state management and resilience patterns. Headroom cuts token usage by 60-95% by compressing logs and RAG chunks before they reach the model. MarkItDown's 140,000 stars reflect a simpler truth: converting documents to Markdown remains a bottleneck for RAG pipelines. VoxCPM2, Open-LLM-VTuber, Scrapling, and CVAT occupy distinct niches rather than pretending to solve everything. This is a healthier ecosystem than monolithic platforms, and it mirrors the infrastructure layer consolidation happening upstream: better plumbing matters more than smarter agents.
Grant Calloway
We study the subclass of potential mean-field games in which the running interaction cost and the terminal target cost are both expressed through reproducing-kernel maximum mean discrepancy (MMD) penalties, and develop a computational framework that exploits this kernel structure. Both costs are estimated from finite-sample empirical distributions using a random Fourier U-statistic representation that is unbiased and has linear cost in the batch size. The drift of the controlled diffusion is parametrized by a neural network and trained via stochastic gradient descent. For this subclass we prove a sample-level almost-sure convergence theorem and an explicit almost-sure rate of convergence, under coupled rate conditions on the penalty parameter, the random-feature count, the sample size, and the optimization tolerance. The framework includes the kernel-MMD-penalty Schrödinger bridge problem as the special case of a vanishing interaction cost. Numerical experiments illustrate the method on the Schrödinger bridge problem in dimensions up to one hundred, and on an electric vehicle charging coordination problem with per-vehicle physical heterogeneity, where an aggregate-demand congestion cost represents price-feedback competition at the population level and the terminal MMD penalty shapes the state-of-charge distribution at the deadline.
In this paper, we study a structured class of nonconvex constrained stochastic problems with difference-of-convex (DC) regularization, where the feasible set is possibly nonconvex and the concave part of the DC regularizer is allowed to be nonsmooth. The fundamental challenge lies in maintaining feasibility for nonconvex constraints while achieving favorable oracle complexity. Although single-loop algorithms efficiently solve unconstrained DC optimization problems, their potential for constrained optimization with DC structure remains largely unexplored. To address this gap, we develop MoSSP, a Momentum-based Single-loop Stochastic Penalty method for such problems with provable complexity guarantees. The key idea is to apply a single stochastic proximal-gradient step to the Moreau envelope of the penalty plus the convex DC part, with the concave part's proximal mapping computed in parallel. We derive two algorithm variants: a Polyak-momentum version with $O(\varepsilon^{-4})$ oracle complexity for finding stochastic $\varepsilon$-KKT points, and an improved $O(\varepsilon^{-3})$ version incorporating recursive momentum. Experimental results demonstrate the effectiveness of the proposed algorithms.
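For reference, the standard Moreau envelope and proximal mapping of a function $f$ with parameter $\lambda > 0$ are recalled below. The symbols $f$ and $\lambda$ are generic placeholders rather than the paper's notation, and the gradient identity is stated for convex $f$; how MoSSP instantiates these objects is not spelled out in the abstract.

```latex
\[
M_{\lambda f}(x) = \min_{y}\Big\{ f(y) + \tfrac{1}{2\lambda}\,\|x - y\|^2 \Big\},
\qquad
\operatorname{prox}_{\lambda f}(x) = \arg\min_{y}\Big\{ f(y) + \tfrac{1}{2\lambda}\,\|x - y\|^2 \Big\},
\]
\[
\nabla M_{\lambda f}(x) = \tfrac{1}{\lambda}\big(x - \operatorname{prox}_{\lambda f}(x)\big).
\]
```

The gradient identity is why a step on the envelope needs only one proximal evaluation per iteration.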
Vehicle-to-vehicle (V2V) energy trading enables decentralized peer-to-peer energy exchange among electric vehicles (EVs), reducing grid dependency while monetizing surplus capacity. However, coordinating self-interested EV agents with diverse charging needs and uncertain arrival-departure schedules remains challenging. Existing approaches either require centralized optimization, with its computational limitations, or lack fairness guarantees. This paper integrates the Nash Bargaining Solution into Multi-Agent Deep Deterministic Policy Gradient (MADDPG), yielding Nash-MADDPG, for incentive-aligned V2V energy trading. Nash bargaining determines efficient bilateral pricing, while Nash-guided price-proximity rewards steer agent learning toward bargaining-optimal strategies. Evaluation over 30 days of continuous operation shows improvements over Double Auction of 61.6% in social welfare and 62.9% in trading volume, along with better fairness, including a 40.1% improvement in Jain's index. Testing with 6 to 100 agents over a 30-day horizon with continuous vehicle turnover confirms scalability across population sizes and empirically stable pricing near the Nash Bargaining benchmark.
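Two quantities named in this abstract have simple closed forms, sketched below. Jain's index is the standard $(\sum_i x_i)^2 / (n \sum_i x_i^2)$. The bargaining price assumes linear utilities and zero disagreement payoffs, which is an illustrative simplification and not the paper's stated model; the function and parameter names are hypothetical.

```python
def jain_index(allocations):
    """Jain's fairness index: 1.0 when all shares are equal, 1/n when one agent gets everything."""
    n = len(allocations)
    total = sum(allocations)
    squares = sum(a * a for a in allocations)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero allocation")
    return total * total / (n * squares)


def nash_bargaining_price(seller_reserve, buyer_value):
    """Price maximizing (p - seller_reserve) * (buyer_value - p).

    Assumes linear utilities and zero disagreement payoffs, so the
    maximizer is the midpoint. Returns None when no trade is beneficial.
    """
    if buyer_value < seller_reserve:
        return None
    return 0.5 * (seller_reserve + buyer_value)


print(jain_index([5.0, 5.0, 5.0, 5.0]))   # equal shares
print(jain_index([20.0, 0.0, 0.0, 0.0]))  # one agent takes all
print(nash_bargaining_price(0.10, 0.30))  # price per kWh in arbitrary currency units
```

Under these assumptions the surplus is split evenly between buyer and seller, which is the sense in which a bargaining price can act as a fairness benchmark.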
Optimization over the intersection of two manifolds arises in a broad range of applications, but is hindered by the coupled geometry of the feasible region. In this paper, we prove that two regularity conditions, clean intersection and intrinsic transversality, are equivalent, which yields a tractable projection onto the tangent space of the intersection. Building on this, we propose a geometric method that employs a retraction on only one manifold and updates the iterate along two orthogonal directions. Specifically, the iterates stay on one manifold, and the two directions are responsible for asymptotically approaching the other manifold and decreasing the objective function, respectively. Under intrinsic transversality, we derive the convergence rate for both the feasibility and optimality measures, and show that every accumulation point is first-order stationary. Numerical experiments on problems stemming from sparse and low-rank optimization, including fitting spherical data, approximating hyperbolic embeddings on real data, and computing compressed modes, demonstrate the effectiveness of the proposed method.
Transformers have become the dominant architecture in modern machine learning, yet the theoretical understanding of their training dynamics remains limited. This paper develops a rigorous mathematical framework for analyzing gradient-based training of transformers in the mean-field regime, where both the depth (number of layers) and width (number of attention heads) tend to infinity. While ResNet training can be understood as controlling a neural ODE, transformer training corresponds to controlling a neural PDE, due to the coupling of multiple token distributions through the attention mechanism. Our mean-field model features two types of measure representations: token distributions evolving through layers and attention parameters at each layer. We establish well-posedness of the forward pass through infinitely deep transformers, characterizing token evolution via flow maps that satisfy ODEs in function spaces. Using adjoint sensitivity analysis, we derive an explicit formula for the conditional Wasserstein gradient of the training risk, involving adjoint variables governed by backward ODEs. We prove the existence and uniqueness of gradient flow curves in the conditional Wasserstein metric space, establishing a rigorous foundation for gradient-based transformer training. A key technical contribution is providing necessary and sufficient conditions for injectivity of the Neural Tangent Kernel (NTK) for attention mechanisms: we show that NTK injectivity is equivalent to linear independence of log-sum-exp functions modulo affine functions, a condition satisfied by diverse token distributions, including discrete distributions, uniform distributions, and Gaussian mixtures. Under this NTK injectivity assumption, we prove that gradient flow converges to global minima when the initial loss is sufficiently small, eliminating spurious local minima from the optimization landscape.
A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its variants operate inherently coordinate-wise, rendering them unable to respect the equivariance structures of the parameter space. We address this disparity by introducing a symmetry-compatible principle for optimizer design: the gradient update rule should be equivariant under the symmetry group acting on the corresponding weight block. Following this principle, we first provide a unified perspective on bi-orthogonally equivariant updates for general matrix layers, as employed by stochastic spectral descent, Muon, Scion, and polar gradient methods. More importantly, by moving from orthogonal groups to permutation and shared-shift symmetries, we derive symmetry-compatible optimizers for parameter blocks whose symmetries differ from those of general matrix layers: embedding and LM head matrices, SwiGLU MLP projections, and MoE router matrices. These constructions include one-sided spectral, row-norm, hybrid row-norm/spectral, row-aware, column-aware, centered row-norm, and left-spectral updates. They yield an end-to-end layerwise optimizer stack in which each major matrix-valued parameter class is assigned an update whose equivariance matches its symmetry group. We corroborate this principle through pre-training experiments on dense and sparse MoE language models, including Qwen3-0.6B-style, Gemma 3 1B-style, OLMoE-1B-7B-style, and downsized gpt-oss architectures. Across these experiments, symmetry-compatible updates consistently improve final validation loss, and in several cases training stability, over corresponding AdamW updates.
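The equivariance principle can be checked numerically on a toy case. The sketch below uses a row-normalized update as one plausible reading of "row-norm"; it is not the paper's exact rule, and the coordinate-wise sign update is only a stand-in for an Adam-style elementwise rule. It tests whether the update commutes with a row permutation and with a rotation applied on the right.

```python
import math


def row_norm_update(grad, eps=1e-12):
    """Rescale every row of the gradient matrix to (almost) unit L2 norm."""
    return [[g / (math.sqrt(sum(v * v for v in row)) + eps) for g in row]
            for row in grad]


def sign_update(grad):
    """Coordinate-wise sign, a stand-in for an Adam-style elementwise rule."""
    return [[(g > 0) - (g < 0) for g in row] for row in grad]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


theta = math.pi / 4
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
grad = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
perm = [2, 0, 1]

# Row permutation: update(P G) should equal P update(G).
perm_gap = max_abs_diff(row_norm_update([grad[i] for i in perm]),
                        [row_norm_update(grad)[i] for i in perm])

# Right rotation: update(G Q) should equal update(G) Q if the rule is equivariant.
rot_gap_rownorm = max_abs_diff(row_norm_update(matmul(grad, rotation)),
                               matmul(row_norm_update(grad), rotation))
rot_gap_sign = max_abs_diff(sign_update(matmul(grad, rotation)),
                            matmul(sign_update(grad), rotation))

print("row permutation gap (row-norm):", perm_gap)
print("right rotation gap (row-norm): ", rot_gap_rownorm)
print("right rotation gap (sign):     ", rot_gap_sign)
```

The row-normalized rule commutes with both transformations up to rounding, while the coordinate-wise rule does not commute with the rotation. Which symmetry group is the right one for a given parameter block is the question the paper itself addresses.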
Composite score across coding, math, and reasoning
| # | Model | Score | Tokens/s | $/1M tokens |
|---|---|---|---|---|
| 1 | Claude Opus 4.8 | 61.4 | 59 | $10.94 |
| 2 | GPT-5.5 | 60.2 | 67 | $11.25 |
| 3 | Claude Opus 4.7 | 57.3 | 53 | $10.94 |
| 4 | Gemini 3.1 Pro Preview | 57.2 | 123 | $4.50 |
| 5 | GPT-5.4 | 56.8 | 79 | $5.63 |
Agentic coding on real-world software engineering tasks
| # | Model | Score |
|---|---|---|
| 1 | gpt-5.5-2026-04-23-xhigh | 62.7% |
| 2 | Codex | 60.4% |
| 3 | Claude Code | 59.6% |
| 4 | gpt-5.5-2026-04-23-medium | 58.9% |
| 5 | Claude Opus 4.8-xhigh | 56.4% |
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
Python tool for converting files and office documents to Markdown.
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
Hermes WebUI: The best way to use Hermes Agent from the web or from your phone!
Simplifying reinforcement learning for complex game environments
A minimal quadrotor autonomy framework in Rust (Mac, Linux, Windows)
Fast and Accurate ML in 3 Lines of Code
Build resilient language agents as graphs.
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.