The Inference Report — April 17, 2026

The frontier AI labs have stopped competing on model capability alone. The real race is for control over the operating layer where intelligence gets deployed, governed, and monetized. OpenAI is shipping agentic coding tools that control your desktop. Anthropic is expanding to London while negotiating Pentagon access. Google is embedding AI directly into Chrome and Photos. Whether Claude Opus 4.7 outscores a leaked competitor matters less than who owns the infrastructure where these models actually run.

Venture capital is treating AI infrastructure as the new platform layer. Factory commanded a $1.5 billion valuation after three years. Upscale AI raised $2 billion just seven months after launch. Physical Intelligence's π0.7 robot brain attracted major funding. But concentration is accelerating. First-quarter venture funding flowed overwhelmingly to large, well-funded U.S. companies even as global deal count fell. Data center delays now threaten Microsoft and OpenAI projects. Meta raised Quest headset prices by $50 to $100 citing RAM shortages. When infrastructure becomes the bottleneck, whoever controls it owns the next decade of software. AWS is tightening its relationship with Anthropic by launching Claude Opus 4.7 through Bedrock's new inference engine, positioning Amazon's infrastructure as the default deployment layer for Claude users. IBM and NVIDIA are pursuing quantum-adjacent positioning to establish themselves as infrastructure for the quantum transition. The pattern across the stack is consolidation around inference engines, API grant programs, and vertical models that embed switching costs into workflows.

The real tension surfaces in how these labs are positioning themselves against traditional software. Anthropic's Chief Product Officer left Figma's board to build competing design tools. Runway's CEO is betting AI can make fifty films instead of one blockbuster. Canva's AI assistant calls external tools. Enterprise customers are beginning to see AI not as a feature but as a replacement for entire categories of software. The margin isn't in the model; it's in the operational layer that makes models reliable enough to replace humans at scale. InsightFinder raised $15 million to diagnose where AI agents fail. Antioch built robotics simulation platforms for the same purpose. Google blocked 8.3 billion ads while suspending fewer advertisers, demonstrating how platform power compounds when you control both the model and the distribution channel.

Developers are already building for this future. GitHub's trending repositories reveal two waves of investment. One is infrastructure for AI agents: memory systems like claude-mem and knowledge engines like cognee built as separate, composable pieces rather than baked into monolithic platforms. The second is self-evolution. GenericAgent achieves full system control from a 3.3K-line seed with 6x lower token consumption than baseline approaches. EvoMap's Evolver and EvoScientist use Gene Expression Programming to let agents modify themselves. These implementations may not be production-ready yet, but they point toward a real problem: manually updating agent prompts and skills doesn't scale. Meanwhile, benchmark convergence at the frontier suggests the capability differentiation game is narrowing. Claude Opus 4.6 moved from fourth to first on SWE-rebench, climbing 12.3 points to 65.3 percent. The gap between first and second place narrowed to 0.9 points, with the top six models clustering between 62.3 and 65.3 percent. The field is consolidating not around who builds the smartest model, but around who builds the infrastructure that makes those models deployable, governable, and hard to leave.

Grant Calloway

AI Labs

AWS

Introducing Anthropic’s Claude Opus 4.7 model in Amazon Bedrock

Anthropic

Introducing Claude Opus 4.7

Google

Hugging Face

IBM

NVIDIA

OpenAI

From the Wire

Research Papers — Focused

Operator-Informed Gaussian Processes for Complex Helmholtz Wavefields: From Synthetic Benchmarks to In Vivo Brain Elastography stat.ML

The Helmholtz equation governs time-harmonic wave propagation, and in dissipative media a complex modulus renders its squared wavenumber $κ^2$ complex. Inferring such fields from sparse, noisy data calls for solvers that also quantify their own uncertainty. Physics-informed Gaussian-process (GP) regression supplies this by returning a posterior over the solution, yet operator-conditioned formulations have been developed almost exclusively for real-valued fields. We extend operator-informed GP regression to complex-valued Helmholtz problems by realifying the complex operator into an equivalent coupled real block, which enables inference with standard real-valued GP conditioning. The construction admits a family of priors, from a proper diagonal prior to coregionalized and multiscale variants, and conditions on PDE residuals and boundary traces. On benchmark problems in one to three dimensions, the solver is competitive with finite-difference and neural-network baselines at a far smaller interior-constraint budget. Unlike those deterministic baselines, it returns a posterior over the complex wavefield rather than a point estimate. Applied to in vivo brain magnetic resonance elastography, a proper multiscale prior reconstructs the shear curl field to a correlation of $0.77$ with measurement, above a $0.75$ target. The gain arises from the multiscale kernel rather than from real-imaginary coupling. We further identify a low-frequency accuracy ceiling set by model mismatch and a posterior uncertainty that is not yet calibrated. Calibrated uncertainty therefore emerges as the central next step for probabilistic wavefield inference in dissipative media.
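
As a hedged illustration of the realification step this abstract describes, assume the operator takes the common form $\nabla^2 + κ^2$ (the abstract does not state its sign convention) and write $u = u_R + i u_I$, $κ^2 = a + i b$, $f = f_R + i f_I$. The symbols $a$, $b$, $u_R$, $u_I$, $f_R$, and $f_I$ are introduced here for illustration only and are not taken from the paper. Separating real and imaginary parts then gives one coupled real block system:

```latex
(\nabla^2 + \kappa^2)\,u = f
\quad\Longleftrightarrow\quad
\begin{pmatrix}
\nabla^2 + a & -b \\
b & \nabla^2 + a
\end{pmatrix}
\begin{pmatrix} u_R \\ u_I \end{pmatrix}
=
\begin{pmatrix} f_R \\ f_I \end{pmatrix}
```

Under these assumptions the off-diagonal entries $\pm b$ carry the dissipation: when $b = 0$ the two real fields decouple into independent real Helmholtz problems.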

Spectral Concentration and Recovery in Sparse High-Dimensional Random Geometric Graphs stat.ML

We study sparse random geometric graphs generated by connecting pairs of high-dimensional vectors whose inner product exceeds a threshold. The latent vectors are sampled either uniformly from the sphere or from a standard Gaussian distribution. Although every edge appears with probability $p$, the edges are dependent through their shared latent vectors. For the spherical model, at the connectivity scale $np=Ω(\log n)$, we prove $\|A-\mathbb E A\|=O\left(\sqrt{np\log n}+npτ\right)$, with high probability, where $τ$ is the cap threshold. This sharpens the spectral norm bound of Liu, Mohanty, Schramm, and Yang (2023) under weaker assumptions. An analogous result holds for the Gaussian model after removing the fluctuations of the vector norms, yielding improved global synchronization guarantees for the homogeneous Kuramoto model. We then recover the latent geometry from the leading eigenspace. When $np\gg\log n$, both the latent vector and relative Gram matrix errors vanish provided $d\ll np\log(1/p)/\log n$. The required lower dimension is only $d\gg\log(1/p)$ for the spherical model and $d\gg\log^2(1/p)\log n$ for the Gaussian model, improving the recovery guarantees of Li and Schramm (2023). Finally, we prove the first exact recovery result for the Gaussian mixture block model of Li and Schramm (2023). At the optimal connectivity scale $np=Ω(\log n)$, a polynomial-time semidefinite program exactly recovers all labels in a moderate-separation regime, whereas larger separation makes exact recovery impossible because isolated vertices appear with high probability. Our proofs combine orthogonal polynomial expansions, decoupling, and matrix concentration, avoiding the trace-moment arguments used in previous work.
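
The graph model in this abstract can be sketched in a few lines. The following is a minimal standard-library illustration of the spherical case only, assuming latent vectors are drawn by normalizing Gaussian samples; the function names, the seed, and the values of `n`, `d`, and `tau` are hypothetical choices, not taken from the paper.

```python
import math
import random


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere in R^d (normalized Gaussians)."""
    points = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        points.append([x / norm for x in v])
    return points


def threshold_graph(points, tau):
    """Adjacency matrix with an edge between i and j whenever <x_i, x_j> > tau."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(points[i], points[j]))
            if inner > tau:
                adj[i][j] = adj[j][i] = 1
    return adj


def edge_density(adj):
    """Fraction of vertex pairs that are connected (an empirical estimate of p)."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pts = sample_sphere(n=200, d=50, rng=rng)
adj = threshold_graph(pts, tau=0.3)
print(f"empirical edge density: {edge_density(adj):.4f}")
```

Two edges that share a vertex both depend on the same latent vector, which is the dependence the abstract refers to; raising `tau` makes the graph sparser.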

Optimal Self-Distillation for Rectified Flow via Linear Probing stat.ML

Modern generative models are increasingly trained using model-generated signals, creating both opportunities for self-improvement and risks of collapse. We study optimal self-distillation (SD) for rectified flow (RF): given a suboptimal teacher velocity field, can a student trained on a mixture of true RF velocities and teacher velocities provably improve the teacher? For linear RF with ridge regularization on fixed interpolation pairs, we prove an exact affine path identity, derive the optimal mixing coefficient in closed form, and show strict improvement in integrated velocity risk whenever the teacher risk is nonstationary along the regularization path. The optimal coefficient obeys a sign rule: positive mixing corrects under-regularized teachers, while negative mixing corrects over-regularized teachers. We also give a one-shot generalized cross-validation (GCV) and validation tuning procedure that avoids grid search over mixing weights and repeated refitting. Combining this theorem with RF Wasserstein convergence bounds, we show that optimal self-distillation improves the velocity estimation terms controlling continuous-time and finite-step generation error. Experiments with Gaussian models, Gaussian mixtures, and image data show that optimal self-distillation improves velocity risk, mode recovery, and finite-step generation relative to both the teacher and pure distillation.

cGAP: Generalized Association Plots with HOMALS-Guided Heatmaps for Visualization of High-Dimensional Categorical Data stat.ML

High-dimensional categorical data arise in genetics, biomedicine, and the social sciences, yet visualization tools for such data remain far less developed than those for continuous variables. Existing methods either scale poorly, rely heavily on low-dimensional displays detached from the original data matrix, or prioritize predictive accuracy over interpretability. To address this gap, we introduce categorical Generalized Association Plots (cGAP), a visualization framework for nominal, ordinal, and binary data that preserves the original data matrix while augmenting it with interpretable geometric structure. cGAP uses Homogeneity Analysis (HOMALS) to embed subjects and category levels in a three-dimensional Euclidean space and maps the embedding to red-green-blue coordinates so that similar patterns receive similar colors. The framework integrates three coordinated views: a HOMALS-guided heatmap of the raw data matrix, a subject proximity matrix, and a variable proximity matrix. Seriation algorithms are then used to reorder rows and columns to reveal coherent clusters, outliers, and local-to-global structure. We also derive barycentric traceability, projection-distortion, and contrast-preservation properties that clarify how embedding geometry is transferred to the display. We demonstrate the versatility of cGAP through applications to student-animal classification data, mammalian dentition profiles, mushroom records from the UCI Machine Learning Repository, and the Clusters of Orthologous Genes database. These examples show that cGAP supports transparent exploratory analysis by maintaining traceability between derived visual structure and the original categorical observations. cGAP provides a full-matrix, heatmap-based visualization environment for investigating complex categorical datasets across scientific domains.
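
The embedding-to-color step can be illustrated with a short sketch. The abstract does not specify how the three HOMALS coordinates become red-green-blue values, so the per-axis min-max scaling below is an assumption, and `embedding_to_rgb` is a hypothetical helper name rather than part of cGAP.

```python
def embedding_to_rgb(coords):
    """Min-max scale each of the three axes to 0..255 and return (R, G, B) tuples."""
    if not coords or any(len(c) != 3 for c in coords):
        raise ValueError("expected a non-empty list of 3-D points")
    lows = [min(c[k] for c in coords) for k in range(3)]
    highs = [max(c[k] for c in coords) for k in range(3)]
    colors = []
    for c in coords:
        rgb = []
        for k in range(3):
            span = highs[k] - lows[k]
            # A constant axis carries no information; map it to 0.
            scaled = 0.0 if span == 0 else (c[k] - lows[k]) / span
            rgb.append(round(255 * scaled))
        colors.append(tuple(rgb))
    return colors


# Hypothetical 3-D embedding of three subjects; the last two are close together.
demo = [(-1.0, 0.0, 2.0), (1.0, 0.5, 2.0), (0.9, 0.5, 2.0)]
print(embedding_to_rgb(demo))
```

Under this assumed mapping, points that sit close together in the embedding receive nearly identical colors, which is the property the abstract relies on for reading structure off the heatmap.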

Subjective Risk Decomposition: A New View for Uncertainty Quantification stat.ML

We present a novel viewpoint for uncertainty quantification. Uncertainty measures are not primitives in need of axioms and argumentation, but instead consequences of higher-level modelling decisions. We show how epistemic and aleatoric uncertainty measures can be derived via decomposition of a subjective risk, based on a strictly proper loss. Reverse cross-entropy provides a prominent example, where decomposition recovers the classic information-theoretic uncertainty terms. The same approach recovers numerous measures previously proposed across the UQ literature, providing them a common theoretical foundation. From a practical point of view, this suggests a new approach to UQ: given a modelling scenario and strictly proper loss, the corresponding epistemic and aleatoric terms are induced by the subjective-risk decomposition. We then extend our view to learning theory: we introduce and analyse subjective risk analogues of excess risk, approximation error, and estimation error, and identify the connections to UQ. We consider this a first step towards a full learning-theoretic framework for uncertainty quantification.
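
For readers unfamiliar with the information-theoretic terms mentioned in this abstract, here is a minimal sketch of the widely used entropy-based split for an ensemble of categorical predictions. It assumes those terms are total entropy, expected entropy, and their difference (the mutual information). It illustrates the quantities themselves and is not the paper's subjective-risk derivation; `decompose` and the example ensembles are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose(member_probs):
    """Return (total, aleatoric, epistemic) for equally weighted ensemble members."""
    m = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / m for c in range(k)]
    total = entropy(mean)                                  # H(E[p])
    aleatoric = sum(entropy(p) for p in member_probs) / m  # E[H(p)]
    epistemic = total - aleatoric                          # mutual information
    return total, aleatoric, epistemic


# Members agree that the outcome is a coin flip: all uncertainty is aleatoric.
print(decompose([[0.5, 0.5], [0.5, 0.5]]))
# Members are individually certain but disagree: all uncertainty is epistemic.
print(decompose([[1.0, 0.0], [0.0, 1.0]]))
```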

Price of Fairness in Bandits: A Tight Minimax Characterization stat.ML

In bandit problems, standard regret-minimizing algorithms treat exploration as an amortized cost, which can expose early participants to unfair ex-ante losses in settings such as clinical trials. Recent work addresses this by evaluating the sequence of per-round expected rewards through the generalized $p$-mean, interpolating between utilitarian welfare ($p=1$), Nash welfare ($p\to0$), and Rawlsian fairness ($p\to-\infty$). Although tight guarantees are known for $p\ge0$, the strictly fair regime $q=-p>0$ remains unresolved because negative-power means are dominated by the smallest per-round rewards. For $σ$-sub-Gaussian rewards with nonnegative means, the best prior algorithm relied on uniform early exploration and achieved regret $O(k^{(q+1)/2}/\sqrt{T})$, while the only general lower bound was the classical $Ω(σ\sqrt{k/T})$. Thus it was unclear whether the extra dependence on $k$ was intrinsic to strict fairness or an artifact of uniform exploration. We close this gap by identifying the exact polynomial price of strict fairness. Using a needle-in-haystack construction, we prove an algorithm-independent lower bound $Ω(σ\sqrt{k^{\max(1,q)}/T})$; for $q>1$, this shows that the penalty $k^{q/2}$ is information-theoretically unavoidable. We then introduce UCB-HARE (Harmonic Anchored Rank Exploration), which replaces uniform exploration with an inverse-weighted harmonic rank schedule protected by a certified positive-mean anchor. Its regret is $\widetilde{O}(σ\sqrt{k^{\max(1,q)}/T})$, matching the lower bound up to logarithmic factors. Experiments on synthetic instances confirm that UCB-HARE improves over uniform-exploration baselines, with gains increasing as $q$ grows.
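
The generalized $p$-mean at the center of this abstract is easy to compute directly. The sketch below uses the standard definition $M_p(x) = \left(\frac{1}{n}\sum_i x_i^p\right)^{1/p}$, with the geometric mean at $p=0$ and the minimum at $p=-\infty$. As a simplifying assumption it requires strictly positive rewards, whereas the abstract allows nonnegative means. The reward sequence is made up for illustration.

```python
import math


def p_mean(rewards, p):
    """Generalized p-mean of strictly positive values."""
    if not rewards or any(r <= 0 for r in rewards):
        raise ValueError("rewards must be a non-empty list of positive numbers")
    n = len(rewards)
    if p == -math.inf:
        return min(rewards)  # Rawlsian limit
    if p == math.inf:
        return max(rewards)
    if p == 0:
        # Nash welfare limit: the geometric mean.
        return math.exp(sum(math.log(r) for r in rewards) / n)
    return (sum(r ** p for r in rewards) / n) ** (1.0 / p)


# One poorly served early round followed by three good rounds (hypothetical).
rewards = [0.01, 1.0, 1.0, 1.0]
for p in (1, 0, -1, -5, -math.inf):
    print(f"p = {p}: {p_mean(rewards, p):.4f}")
```

As $p$ becomes more negative the value is pulled toward the single worst round, which is why uniform early exploration is costly in the strictly fair regime.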

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	tok/s	$/1M tokens
1	Gemini 3.1 Pro Preview	57.2	123	$4.50
2	GPT-5.4	56.8	81	$5.63
3	GPT-5.3 Codex	53.6	70	$4.81
4	Claude Opus 4.6	53.0	44	$10.00
5	Muse Spark	52.1	0	$0.00

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model	Score
1	Claude Opus 4.6	65.3%
2	gpt-5.2-2025-12-11-medium	64.4%
3	GLM-5	62.8%
4	gpt-5.4-2026-03-05-medium	62.8%
5	GLM-5.1	62.7%

GitHub Repos

Trending

forrestchang/andrej-karpathy-skills

115291 ★

thedotmack/claude-mem

78952 ★

A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.

lsdefine/GenericAgent

12970 ★

Self-evolving agent: grows skill tree from 3.3K-line seed, achieving full system control with 6x less token consumption

jamiepine/voicebox

33512 ★

The open-source voice synthesis studio

vercel-labs/open-agents

5168 ★

An open source template for building cloud agents.

Daily discovery

EvoScientist/EvoScientist (AI Agents)

2793 ★