The frontier AI labs have stopped competing on model capability alone. The real race is for control over the operating layer where intelligence gets deployed, governed, and monetized. OpenAI is shipping agentic coding tools that control your desktop. Anthropic is expanding to London while negotiating Pentagon access. Google is embedding AI directly into Chrome and Photos. Whether Claude Opus 4.7 outscores a leaked competitor matters less than who owns the infrastructure where these models actually run.
Venture capital is treating AI infrastructure as the new platform layer. Factory commanded a $1.5 billion valuation after three years. Upscale AI raised $2 billion just seven months after launch. Physical Intelligence's π0.7 robot brain attracted major funding. But concentration is accelerating. First-quarter venture funding flowed overwhelmingly to large, well-funded U.S. companies even as global deal count fell. Data center delays now threaten Microsoft and OpenAI projects. Meta raised Quest headset prices by $50 to $100 citing RAM shortages. When infrastructure becomes the bottleneck, whoever controls it owns the next decade of software. AWS is tightening its relationship with Anthropic by launching Claude Opus 4.7 through Bedrock's new inference engine, positioning Amazon's infrastructure as the default deployment layer for Claude users. IBM and NVIDIA are pursuing quantum-adjacent positioning to establish themselves as infrastructure for the quantum transition. The pattern across the stack is consolidation around inference engines, API grant programs, and vertical models that embed switching costs into workflows.
The real tension surfaces in how these labs are positioning themselves against traditional software. Anthropic's Chief Product Officer left Figma's board to build competing design tools. Runway's CEO is betting AI can make fifty films instead of one blockbuster. Canva's AI assistant calls external tools. Enterprise customers are beginning to see AI not as a feature but as a replacement for entire categories of software. The margin isn't in the model; it's in the operational layer that makes models reliable enough to replace humans at scale. InsightFinder raised $15 million to diagnose where AI agents fail. Antioch built robotics simulation platforms for the same purpose. Google blocked 8.3 billion ads while suspending fewer advertisers, demonstrating how platform power compounds when you control both the model and the distribution channel.
Developers are already building for this future. GitHub's trending repositories reveal two waves of investment. One is infrastructure for AI agents: memory systems like claude-mem and knowledge engines like cognee built as separate, composable pieces rather than baked into monolithic platforms. The second is self-evolution. GenericAgent achieves full system control from a 3.3K-line seed with 6x lower token consumption than baseline approaches. EvoMap's Evolver and EvoScientist use Gene Expression Programming to let agents modify themselves. These implementations may not be production-ready yet, but they point toward a real problem: manually updating agent prompts and skills doesn't scale. Meanwhile, benchmark convergence at the frontier suggests the capability differentiation game is narrowing. Claude Opus 4.6 moved from fourth to first on SWE-rebench, climbing 12.3 points to 65.3 percent. The gap between first and second place narrowed to 0.9 points, with the top six models clustering between 62.3 and 65.3 percent. The field is consolidating not around who builds the smartest model, but around who builds the infrastructure that makes those models deployable, governable, and hard to leave.
Grant Calloway
This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded backward dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon--Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with a target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.
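The abstract's off-model training idea — generate one fixed dataset under a reference law, then recover expectations under any target model by importance sampling with Radon–Nikodym weights — can be illustrated in the simplest possible setting. This is a hedged sketch, not the paper's construction: the reference law is a standard Gaussian, the target shifts the mean, and the weight `exp(mu*x - mu^2/2)` is the exact Gaussian likelihood ratio for that pair.

```python
import math
import random

def rn_weight(x, mu):
    """Radon-Nikodym density dN(mu,1)/dN(0,1) evaluated at x."""
    return math.exp(mu * x - 0.5 * mu * mu)

def reweighted_mean(sample, f, mu):
    """Estimate E_{N(mu,1)}[f(X)] from a FIXED sample drawn under N(0,1),
    by reweighting rather than regenerating the data."""
    return sum(rn_weight(x, mu) * f(x) for x in sample) / len(sample)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200_000)]  # reference law

# "Recalibration": change the target parameter mu, reuse the same sample.
est = reweighted_mean(sample, lambda x: x, mu=0.5)  # should be close to 0.5
```

The same mechanism is what lets the paper's adaptive update reweight one synthetic training set across repeated recalibrations instead of resimulating trajectories.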
Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased simulations seldom produce them. Transition Path Theory (TPT) provides a rigorous statistical framework for analyzing such events: it characterizes the ensemble of reactive trajectories between two designated metastable states (reactant and product), and its central object--the committor function, which gives the probability that the system will next reach the product rather than the reactant--encodes all essential kinetic and thermodynamic information. We introduce a framework that casts committor estimation as a stochastic optimal control (SOC) problem. In this formulation the committor defines a feedback control--proportional to the gradient of its logarithm--that actively steers trajectories toward the reactive region, thereby enabling efficient sampling of reactive paths. To solve the resulting hitting-time control problem we develop two complementary objectives: a direct backpropagation loss and a principled off-policy Value Matching loss, for which we establish first-order optimality guarantees. We further address metastability, which can trap controlled trajectories in intermediate basins, by introducing an alternative sampling process that preserves the reactive current while lowering effective energy barriers. On benchmark systems, the framework yields markedly more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.
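The committor-as-control idea is easiest to see in one dimension, where the committor of an overdamped Langevin diffusion between two points has a closed form, and the feedback control is proportional to the gradient of its logarithm. The sketch below is illustrative only — the double-well potential, the inverse temperature beta = 1, and the grid size are assumptions, not the paper's benchmark setup.

```python
import math

def committor_1d(V, a, b, beta=1.0, n=2000):
    """Closed-form committor for 1D overdamped Langevin between x=a (reactant)
    and x=b (product): q(x) = int_a^x e^{beta*V} / int_a^b e^{beta*V}."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    dens = [math.exp(beta * V(x)) for x in xs]
    cum = [0.0]
    for i in range(n):  # trapezoid rule running integral
        cum.append(cum[-1] + 0.5 * (dens[i] + dens[i + 1]) * h)
    total = cum[-1]
    return xs, [c / total for c in cum]

V = lambda x: (x * x - 1.0) ** 2          # illustrative double-well, minima at +/-1
xs, q = committor_1d(V, -1.0, 1.0)

# Feedback control u = sigma^2 * d/dx log q (here sigma^2 = 2/beta = 2),
# approximated with central differences away from the boundary.
def control(i, sigma2=2.0):
    dq = (q[i + 1] - q[i - 1]) / (xs[i + 1] - xs[i - 1])
    return sigma2 * dq / max(q[i], 1e-300)
```

By construction q is 0 at the reactant, 1 at the product, and monotone in between; the positive control in the transition region is what steers trajectories toward the product, which is exactly the sampling mechanism the abstract describes.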
Causal representation learning (CRL) aims to identify the underlying latent variables from high-dimensional observations, even when the latent variables are mutually dependent. We study this problem for latent variables that follow a potentially degenerate Gaussian mixture distribution and that are only observed through a piecewise affine mixing function. We provide a series of progressively stronger identifiability results for this challenging setting, in which the probability density functions are ill-defined because of the potential degeneracy. For identifiability up to permutation and scaling, we leverage a sparsity regularization on the learned representation. Based on our theoretical results, we propose a two-stage method to estimate the latent variables by enforcing sparsity and Gaussianity in the learned representations. Experiments on synthetic and image data highlight our method's effectiveness in recovering the ground-truth latent variables.
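The abstract states the sparsity regularization abstractly. One standard way an L1 penalty is enforced on a learned representation is through its proximal operator, soft-thresholding, sketched below — this is an illustrative assumption about the mechanism, not the authors' algorithm.

```python
def soft_threshold(z, lam):
    """Proximal operator of lam * ||z||_1: shrinks every coordinate toward
    zero by lam and zeroes out anything smaller, enforcing sparsity."""
    return [max(abs(v) - lam, 0.0) * (1.0 if v >= 0 else -1.0) for v in z]

# Small coordinates of a representation vector are zeroed, large ones survive:
z = [0.05, -0.4, 1.3, -0.02]
sparse_z = soft_threshold(z, 0.1)  # first and last entries become 0.0
```

In a two-stage pipeline of the kind described, a step like this would sit between the encoder output and the Gaussianity-enforcing stage.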
Clustering and dimensionality reduction have been crucial topics in machine learning and computer vision. Clustering high-dimensional data has long been challenging due to the curse of dimensionality. For that reason, a more promising direction is the joint learning of dimension reduction and clustering. In this work, we propose a Manifold Learning Framework that learns dimensionality reduction and clustering simultaneously. The proposed framework is able to jointly learn the parameters of a dimension reduction technique (e.g., a linear projection or a neural network) and cluster the data based on the resulting features (e.g., under a Gaussian Mixture Model framework). The framework searches for the dimension-reduction parameters and the optimal clusters by traversing a manifold using gradient-based manifold optimization. The proposed framework is exemplified with a Gaussian Mixture Model as a simple but efficient example, in a process somewhat similar to unsupervised Linear Discriminant Analysis (LDA). We apply the proposed method to the unsupervised clustering of simulated data as well as a benchmark image dataset (MNIST). The experimental results indicate that our algorithm outperforms popular clustering algorithms from the literature.
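The manifold-traversal step in this kind of framework typically means: take a Euclidean gradient step on the projection matrix, then retract the result back onto the Stiefel manifold of orthonormal columns. A minimal sketch of the retraction (Gram–Schmidt QR, with illustrative matrix values — not the paper's data or optimizer):

```python
import math

def retract(W):
    """Re-orthonormalize the columns of W (Gram-Schmidt QR retraction),
    pulling a Euclidean gradient step back onto the Stiefel manifold."""
    rows, cols = len(W), len(W[0])
    Q = [[W[i][j] for j in range(cols)] for i in range(rows)]
    for j in range(cols):
        for k in range(j):  # remove components along earlier columns
            dot = sum(Q[i][j] * Q[i][k] for i in range(rows))
            for i in range(rows):
                Q[i][j] -= dot * Q[i][k]
        norm = math.sqrt(sum(Q[i][j] ** 2 for i in range(rows)))
        for i in range(rows):
            Q[i][j] /= norm
    return Q

# One manifold step looks like: W <- retract(W - step * euclidean_gradient)
W = [[1.0, 0.2], [0.1, 1.0], [0.3, 0.1]]   # 3D data projected to 2D
Q = retract(W)
```

The GMM likelihood (or any clustering objective) supplies the Euclidean gradient; the retraction is what keeps the learned projection a valid dimension reduction throughout training.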
The robust low-rank tensor completion problem addresses the challenge of recovering corrupted high-dimensional tensor data with missing entries, outliers, and sparse noise commonly found in real-world applications. Existing methodologies have encountered fundamental limitations due to their reliance on uniform regularization schemes, particularly the tensor nuclear norm and $\ell_1$ norm regularization approaches, which indiscriminately apply equal shrinkage to all singular values and sparse components, thereby compromising the preservation of critical tensor structures. The proposed tensor weighted correlated total variation (TWCTV) regularizer addresses these shortcomings through an $M$-product framework that combines a weighted Schatten-$p$ norm on gradient tensors, enforcing both low-rankness and smoothness, with weighted sparse components for noise suppression. The proposed weighting scheme adaptively reduces the thresholding level to preserve both dominant singular values and sparse components, thus improving the reconstruction of critical structural elements and nuanced details in the recovered signal. Through a systematic algorithmic approach, we introduce an enhanced alternating direction method of multipliers (ADMM) that offers both computational efficiency and theoretical substantiation, with convergence properties comprehensively analyzed within the $M$-product framework. Comprehensive numerical evaluations across image completion, denoising, and background subtraction tasks validate the superior performance of this approach relative to established benchmark methods.
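The core contrast — uniform nuclear-norm shrinkage versus an adaptive weighting that lowers the threshold on dominant singular values — can be sketched on a single spectrum. The `1/(sigma + eps)` weight below is a common choice for such schemes and is assumed here for illustration; it is not necessarily the paper's exact weighting.

```python
def uniform_shrink(sigmas, lam):
    """Nuclear-norm style shrinkage: every singular value loses lam."""
    return [max(s - lam, 0.0) for s in sigmas]

def weighted_shrink(sigmas, lam, eps=1e-3):
    """Adaptive shrinkage: weight w_i = 1/(sigma_i + eps) lowers the
    effective threshold on dominant values, so they are better preserved,
    while small (noise-like) values are shrunk harder."""
    return [max(s - lam / (s + eps), 0.0) for s in sigmas]

sigmas = [10.0, 1.0, 0.1]                  # illustrative spectrum
u = uniform_shrink(sigmas, 0.5)            # dominant value loses 0.5
w = weighted_shrink(sigmas, 0.5)           # dominant value barely shrinks
```

The same asymmetry applies to the weighted sparse components: large-magnitude entries (likely signal structure) are penalized less than small ones (likely noise).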
We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in the standard bandits). Typical distributions considered depend on specific parameter values, whose prior knowledge is required in theory but quite difficult to estimate in practice; an example is the commonly assumed sub-Gaussian family. We alleviate this issue by instead considering a new general family of sub-exponential distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the expected regret on this family, that is parameterized by the unknown covariance matrix of outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of the regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.
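Since the proposed algorithm plugs in covariance estimates of the joint outcome distribution, the basic ingredient is a one-pass covariance estimator updated after each round. The sketch below uses a standard Welford-style co-moment update on a toy 2D correlated outcome — an illustration of the estimation step, not the paper's bandit algorithm or its confidence construction.

```python
import random

def cov_update(state, x):
    """One-pass (Welford-style) update of mean and covariance for a
    d-dimensional outcome x; state = (n, mean, M) where M accumulates
    centered cross-products."""
    n, mean, M = state
    n += 1
    delta = [xi - mi for xi, mi in zip(x, mean)]        # vs old mean
    mean = [mi + di / n for mi, di in zip(mean, delta)]
    delta2 = [xi - mi for xi, mi in zip(x, mean)]       # vs new mean
    d = len(x)
    M = [[M[i][j] + delta[i] * delta2[j] for j in range(d)] for i in range(d)]
    return n, mean, M

def covariance(state):
    n, _, M = state
    return [[m / (n - 1) for m in row] for row in M]

random.seed(1)
state = (0, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
for _ in range(50_000):
    z = random.gauss(0, 1)                              # shared factor
    x = [z + 0.1 * random.gauss(0, 1), z + 0.1 * random.gauss(0, 1)]
    state = cov_update(state, x)
C = covariance(state)   # off-diagonal near 1.0: strongly correlated arms
```

The point of the lower bound in the abstract is that regret should scale with this estimated covariance structure rather than with a worst-case sub-Gaussian proxy supplied in advance.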
Composite score across coding, math, and reasoning
| # | Model | Score | tok/s | $/1M |
|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 57.2 | 123 | $4.50 |
| 2 | GPT-5.4 | 56.8 | 81 | $5.63 |
| 3 | GPT-5.3 Codex | 53.6 | 70 | $4.81 |
| 4 | Claude Opus 4.6 | 53.0 | 44 | $10.00 |
| 5 | Muse Spark | 52.1 | 0 | $0.00 |
Agentic coding on real-world software engineering tasks
| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.6 | 65.3% |
| 2 | gpt-5.2-2025-12-11-medium | 64.4% |
| 3 | GLM-5 | 62.8% |
| 4 | gpt-5.4-2026-03-05-medium | 62.8% |
| 5 | GLM-5.1 | 62.7% |
A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.
Self-evolving agent: grows skill tree from 3.3K-line seed, achieving full system control with 6x less token consumption
The open-source voice synthesis studio
An open source template for building cloud agents.
🔬 Harness Vibe Research with Self-evolving AI Scientists
Full-Stack Development Platform for Building Reliable Agents
End-to-End Speech Processing Toolkit
Gokart addresses reproducibility, task dependencies, code-quality constraints, and ease of use for machine learning pipelines.
Fully open-source, lightweight network video recorder system written in C with a modern JS frontend