The Inference Report — June 2, 2026

The companies building AI at scale have shifted from competing on model capability to competing on who can afford the infrastructure to run those models. Alphabet's $80 billion stock sale, Anthropic's confidential IPO filing, and SpaceX's water-access disclosures all point to a single constraint: raw physical and financial resources. OpenAI's 1 GW Michigan data center groundbreaking and its simultaneous move to distribute frontier models through AWS and Amazon Bedrock signal that compute capacity itself has become a competitive moat, while the chip makers NVIDIA and AMD are positioning around the software stacks that will make their hardware sticky for edge deployment and enterprise workloads. The shift from model capability to operational capacity is reshaping who gets to compete at all.

The economics of inference remain broken for many workloads, as GitHub Copilot's usage-based pricing and reports of users burning through monthly credit allotments in a single day make clear. Yet companies that can absorb high compute costs are capturing the value: GM's shift from 15-hour CFD simulations to one-minute AI-powered iterations demonstrates that the real margin accrues to those with existing distribution and hardware relationships. HPE's stock soared 37% on booming demand for data center equipment. Microsoft, Dell, and HP are shipping AI agent PCs as NVIDIA chases the $200 billion CPU market, because the cheaper economics of edge inference outweigh the latency trade-offs. The capital intensity of AI infrastructure is consolidating power among companies that can raise tens of billions and among those with existing relationships to enterprise procurement.

The secondary effects are already visible in operational failures. Meta's AI support chatbot was duped into helping steal Instagram handles because the company moved fast without production discipline. OpenAI faces a lawsuit from Florida over ChatGPT's alleged role in violent incidents, a category of liability that no insurance framework has priced in. Flowise's MCP implementation has a one-click remote code execution vulnerability affecting self-hosted deployments. A startup testing robots faces a $12,000 lawsuit after allegedly trashing an Airbnb. These failures are not failures of AI itself but of companies deploying it without the operational maturity that comes from running systems at scale under scrutiny. The winners in the next phase will not just be those who can raise capital and build chips. They will be those who can run production systems without breaking things.

The developer community is already building the infrastructure layer that this reality demands. GitHub's trending set shows aggressive investment in infrastructure for AI agents: trading frameworks, memory engines, web scrapers, and terminal coding agents are all climbing the charts. TradingAgents sits at 82k stars because it solves a concrete problem: coordinating multiple LLM instances to make financial decisions. Equally important is conversion and data movement: microsoft/markitdown's 139k stars reflect that file format conversion is a solved problem that never stops being needed now that every document pipeline touches an LLM. The unglamorous work of getting data into shape before the model sees it is where the actual friction lives, and that is where serious projects are investing.

Grant Calloway

AI Labs

AMD

AWS

Anthropic

Anthropic confidentially submits draft S-1 to the SEC

Hugging Face

Microsoft

There is no Copilot without the pilots, says Slovenian insurance executive

NVIDIA

OpenAI

From the Wire

Research Papers — Focused

Operator-Informed Gaussian Processes for Complex Helmholtz Wavefields: From Synthetic Benchmarks to In Vivo Brain Elastography stat.ML

The Helmholtz equation governs time-harmonic wave propagation, and in dissipative media a complex modulus renders its squared wavenumber $κ^2$ complex. Inferring such fields from sparse, noisy data calls for solvers that also quantify their own uncertainty. Physics-informed Gaussian-process (GP) regression supplies this by returning a posterior over the solution, yet operator-conditioned formulations have been developed almost exclusively for real-valued fields. We extend operator-informed GP regression to complex-valued Helmholtz problems by realifying the complex operator into an equivalent coupled real block, which enables inference with standard real-valued GP conditioning. The construction admits a family of priors, from a proper diagonal prior to coregionalized and multiscale variants, and conditions on PDE residuals and boundary traces. On benchmark problems in one to three dimensions, the solver is competitive with finite-difference and neural-network baselines at a far smaller interior-constraint budget. Unlike those deterministic baselines, it returns a posterior over the complex wavefield rather than a point estimate. Applied to in vivo brain magnetic resonance elastography, a proper multiscale prior reconstructs the shear curl field to a correlation of $0.77$ with measurement, above a $0.75$ target. The gain arises from the multiscale kernel rather than from real-imaginary coupling. We further identify a low-frequency accuracy ceiling set by model mismatch and a posterior uncertainty that is not yet calibrated. Calibrated uncertainty therefore emerges as the central next step for probabilistic wavefield inference in dissipative media.
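
For readers unfamiliar with the term, "realifying" a complex operator can be illustrated generically: splitting the wavenumber, field, and source into real and imaginary parts turns one complex equation into a coupled real block system. The block below is a sketch of that idea only, not the paper's exact construction. The sign convention $\Delta u + κ^2 u = f$ and the symbols $a$, $b$, $u_r$, $u_i$, $f_r$, $f_i$ are assumptions introduced for illustration.

```latex
% Assumed convention: \Delta u + \kappa^2 u = f, with
% \kappa^2 = a + i b, \quad u = u_r + i u_i, \quad f = f_r + i f_i
% (a, b, u_r, u_i, f_r, f_i all real-valued).
\begin{aligned}
(\Delta + a)\,u_r - b\,u_i &= f_r, \\
b\,u_r + (\Delta + a)\,u_i &= f_i,
\end{aligned}
\qquad\Longleftrightarrow\qquad
\begin{pmatrix} \Delta + a & -b \\ b & \Delta + a \end{pmatrix}
\begin{pmatrix} u_r \\ u_i \end{pmatrix}
=
\begin{pmatrix} f_r \\ f_i \end{pmatrix}.
```

The two rows are the real and imaginary parts of the original equation, so a real-valued GP over the stacked pair $(u_r, u_i)$ can be conditioned on the block operator with standard machinery.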

Spectral Concentration and Recovery in Sparse High-Dimensional Random Geometric Graphs stat.ML

We study sparse random geometric graphs generated by connecting pairs of high-dimensional vectors whose inner product exceeds a threshold. The latent vectors are sampled either uniformly from the sphere or from a standard Gaussian distribution. Although every edge appears with probability $p$, the edges are dependent through their shared latent vectors. For the spherical model, at the connectivity scale $np=Ω(\log n)$, we prove $\|A-\mathbb E A\|=O\left(\sqrt{np\log n}+npτ\right)$, with high probability, where $τ$ is the cap threshold. This sharpens the spectral norm bound of Liu, Mohanty, Schramm, and Yang (2023) under weaker assumptions. An analogous result holds for the Gaussian model after removing the fluctuations of the vector norms, yielding improved global synchronization guarantees for the homogeneous Kuramoto model. We then recover the latent geometry from the leading eigenspace. When $np\gg\log n$, both the latent vector and relative Gram matrix errors vanish provided $d\ll np\log(1/p)/\log n$. The required lower dimension is only $d\gg\log(1/p)$ for the spherical model and $d\gg\log^2(1/p)\log n$ for the Gaussian model, improving the recovery guarantees of Li and Schramm (2023). Finally, we prove the first exact recovery result for the Gaussian mixture block model of Li and Schramm (2023). At the optimal connectivity scale $np=Ω(\log n)$, a polynomial-time semidefinite program exactly recovers all labels in a moderate-separation regime, whereas larger separation makes exact recovery impossible because isolated vertices appear with high probability. Our proofs combine orthogonal polynomial expansions, decoupling, and matrix concentration, avoiding the trace-moment arguments used in previous work.
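
The spherical model described in the abstract can be sketched in a few lines: sample latent vectors uniformly on the sphere and connect pairs whose inner product exceeds a threshold. This is a minimal illustration under stated assumptions, not the authors' code. The function names `sample_sphere` and `geometric_graph` and the parameters `n`, `d`, `tau`, and `seed` are hypothetical.

```python
import math
import random


def sample_sphere(d, rng):
    """Draw one point uniformly from the unit sphere in R^d by
    normalizing a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def geometric_graph(n, d, tau, seed=0):
    """Return (latent vectors, adjacency matrix) for a random geometric
    graph: vertices i and j are joined when <x_i, x_j> > tau."""
    rng = random.Random(seed)
    vecs = [sample_sphere(d, rng) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(vecs[i], vecs[j]))
            if inner > tau:
                # Edges are undirected, so fill both entries.
                adj[i][j] = adj[j][i] = 1
    return vecs, adj


if __name__ == "__main__":
    vecs, adj = geometric_graph(n=200, d=20, tau=0.4, seed=1)
    degrees = [sum(row) for row in adj]
    print("mean degree:", sum(degrees) / len(degrees))
```

Raising `tau` makes the graph sparser. Every edge has the same marginal probability, but edges sharing a vertex are dependent through that vertex's latent vector, which is the dependence the abstract refers to.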

Optimal Self-Distillation for Rectified Flow via Linear Probing stat.ML

Modern generative models are increasingly trained using model-generated signals, creating both opportunities for self-improvement and risks of collapse. We study optimal self-distillation (SD) for rectified flow (RF): given a suboptimal teacher velocity field, can a student trained on a mixture of true RF velocities and teacher velocities provably improve the teacher? For linear RF with ridge regularization on fixed interpolation pairs, we prove an exact affine path identity, derive the optimal mixing coefficient in closed form, and show strict improvement in integrated velocity risk whenever the teacher risk is nonstationary along the regularization path. The optimal coefficient obeys a sign rule: positive mixing corrects under-regularized teachers, while negative mixing corrects over-regularized teachers. We also give a one-shot generalized cross-validation (GCV) and validation tuning procedure that avoids grid search over mixing weights and repeated refitting. Combining this theorem with RF Wasserstein convergence bounds, we show that optimal self-distillation improves the velocity estimation terms controlling continuous-time and finite-step generation error. Experiments with Gaussian models, Gaussian mixtures, and image data show that optimal self-distillation improves velocity risk, mode recovery, and finite-step generation relative to both the teacher and pure distillation.

cGAP: Generalized Association Plots with HOMALS-Guided Heatmaps for Visualization of High-Dimensional Categorical Data stat.ML

High-dimensional categorical data arise in genetics, biomedicine, and the social sciences, yet visualization tools for such data remain far less developed than those for continuous variables. Existing methods either scale poorly, rely heavily on low-dimensional displays detached from the original data matrix, or prioritize predictive accuracy over interpretability. To address this gap, we introduce categorical Generalized Association Plots (cGAP), a visualization framework for nominal, ordinal, and binary data that preserves the original data matrix while augmenting it with interpretable geometric structure. cGAP uses Homogeneity Analysis (HOMALS) to embed subjects and category levels in a three-dimensional Euclidean space and maps the embedding to red-green-blue coordinates so that similar patterns receive similar colors. The framework integrates three coordinated views: a HOMALS-guided heatmap of the raw data matrix, a subject proximity matrix, and a variable proximity matrix. Seriation algorithms are then used to reorder rows and columns to reveal coherent clusters, outliers, and local-to-global structure. We also derive barycentric traceability, projection-distortion, and contrast-preservation properties that clarify how embedding geometry is transferred to the display. We demonstrate the versatility of cGAP through applications to student-animal classification data, mammalian dentition profiles, mushroom records from the UCI Machine Learning Repository, and the Clusters of Orthologous Genes database. These examples show that cGAP supports transparent exploratory analysis by maintaining traceability between derived visual structure and the original categorical observations. cGAP provides a full-matrix, heatmap-based visualization environment for investigating complex categorical datasets across scientific domains.

Subjective Risk Decomposition: A New View for Uncertainty Quantification stat.ML

We present a novel viewpoint for uncertainty quantification. Uncertainty measures are not primitives in need of axioms and argumentation, but consequences of higher-level modelling decisions. We show how epistemic and aleatoric uncertainty measures can be derived via decomposition of a subjective risk, based on a strictly proper loss. Reverse cross-entropy provides a prominent example, where decomposition recovers the classic information-theoretic uncertainty terms. The same approach recovers numerous measures previously proposed across the UQ literature, providing them a common theoretical foundation. From a practical point of view, this suggests a new approach to UQ: given a modelling scenario and strictly proper loss, the corresponding epistemic and aleatoric terms are induced by the subjective-risk decomposition. We then extend our view to learning theory: we introduce and analyse subjective risk analogues of excess risk, approximation error, and estimation error, and identify the connections to UQ. We consider this a first step towards a full learning-theoretic framework for uncertainty quantification.
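
For context, the "classic information-theoretic uncertainty terms" are commonly taken to be total uncertainty (entropy of the averaged prediction), aleatoric uncertainty (average entropy of the individual predictions), and epistemic uncertainty (their difference, a mutual information). The sketch below computes those three quantities for a small set of predictive distributions. It illustrates the terms as they are usually defined, not the paper's subjective-risk derivation, and the names `entropy` and `decompose` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability classes contribute 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def decompose(member_probs):
    """Split predictive uncertainty for one input.

    member_probs: list of class-probability vectors, one per ensemble
    member or posterior sample (assumed equally weighted).
    Returns (total, aleatoric, epistemic).
    """
    m = len(member_probs)
    k = len(member_probs[0])
    # Averaged predictive distribution across members.
    mean = [sum(p[c] for p in member_probs) / m for c in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / m
    # Non-negative up to rounding, by concavity of entropy.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


if __name__ == "__main__":
    agree = [[0.5, 0.5], [0.5, 0.5]]      # members agree, outcome is noisy
    disagree = [[1.0, 0.0], [0.0, 1.0]]   # members are confident but conflict
    print("agree:   ", decompose(agree))
    print("disagree:", decompose(disagree))
```

The two toy inputs have the same total uncertainty but opposite splits: agreement on a noisy outcome is purely aleatoric, while confident disagreement is purely epistemic.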

Price of Fairness in Bandits: A Tight Minimax Characterization stat.ML

In bandit problems, standard regret-minimizing algorithms treat exploration as an amortized cost, which can expose early participants to unfair ex-ante losses in settings such as clinical trials. Recent work addresses this by evaluating the sequence of per-round expected rewards through the generalized $p$-mean, interpolating between utilitarian welfare ($p=1$), Nash welfare ($p\to0$), and Rawlsian fairness ($p\to-\infty$). Although tight guarantees are known for $p\ge0$, the strictly fair regime $q=-p>0$ remains unresolved because negative-power means are dominated by the smallest per-round rewards. For $σ$-sub-Gaussian rewards with nonnegative means, the best prior algorithm relied on uniform early exploration and achieved regret $O(k^{(q+1)/2}/\sqrt{T})$, while the only general lower bound was the classical $Ω(σ\sqrt{k/T})$. Thus it was unclear whether the extra dependence on $k$ was intrinsic to strict fairness or an artifact of uniform exploration. We close this gap by identifying the exact polynomial price of strict fairness. Using a needle-in-haystack construction, we prove an algorithm-independent lower bound $Ω(σ\sqrt{k^{\max(1,q)}/T})$; for $q>1$, this shows that the penalty $k^{q/2}$ is information-theoretically unavoidable. We then introduce UCB-HARE (Harmonic Anchored Rank Exploration), which replaces uniform exploration with an inverse-weighted harmonic rank schedule protected by a certified positive-mean anchor. Its regret is $\widetilde{O}(σ\sqrt{k^{\max(1,q)}/T})$, matching the lower bound up to logarithmic factors. Experiments on synthetic instances confirm that UCB-HARE improves over uniform-exploration baselines, with gains increasing as $q$ grows.
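
The generalized $p$-mean that the abstract uses as its welfare measure is easy to compute directly, which makes the three regimes concrete. The sketch below assumes strictly positive per-round rewards so that negative powers and logarithms are defined. The function name `generalized_p_mean` and the example rewards are hypothetical, and nothing here reproduces UCB-HARE itself.

```python
import math


def generalized_p_mean(values, p):
    """Generalized (power) mean of strictly positive values.

    p = 1        -> arithmetic mean (utilitarian welfare)
    p = 0        -> geometric mean, the p -> 0 limit (Nash welfare)
    p = -inf     -> minimum, the p -> -inf limit (Rawlsian fairness)
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    if p == -math.inf:
        return min(values)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


if __name__ == "__main__":
    # One poor early round followed by good rounds (illustrative data).
    rewards = [0.1, 0.8, 0.9, 0.9]
    for p in (1, 0, -1, -4, -math.inf):
        print(p, generalized_p_mean(rewards, p))
```

As `p` decreases, the mean is pulled toward the single worst round, which is why negative-power means make poor early exploration rounds so costly.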

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	tok/s	$/1M tokens
1	Claude Opus 4.8	61.4	60	$10.94
2	GPT-5.5	60.2	66	$11.25
3	Claude Opus 4.7	57.3	56	$10.94
4	Gemini 3.1 Pro Preview	57.2	132	$4.50
5	GPT-5.4	56.8	79	$5.63

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model	Score
1	gpt-5.5-2026-04-23-xhigh	62.7%
2	Codex	60.4%
3	Claude Code	59.6%
4	gpt-5.5-2026-04-23-medium	58.9%
5	Claude Opus 4.8-xhigh	56.4%

GitHub Repos

Trending

microsoft/markitdown

143443 ★

Python tool for converting files and office documents to Markdown.

nesquena/hermes-webui

13269 ★

Hermes WebUI: The best way to use Hermes Agent from the web or from your phone!

supermemoryai/supermemory

25375 ★

Memory engine and app that is extremely fast, scalable. The Memory API for the AI era.

D4Vinci/Scrapling

60665 ★

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

pbakaus/impeccable

33141 ★

The design language that makes your AI harness better at design.

Daily discovery

FireRedTeam/FireRedASR2S (Speech Recognition)

530 ★

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech/singing/music in 100+ langs. FireRedLID supports 100+ langs and 20+ zh dialects. FireRedPunc supports zh and en.

Haoyu-ha/LNLN (Multimodal)

116 ★

Towards Robust Multimodal Sentiment Analysis with Incomplete Data

skypilot-org/skypilot (MLOps)

10257 ★

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).

sdv-dev/DeepEcho (Synthetic Data)

123 ★

Synthetic Data Generation for mixed-type, multivariate time series.

autowarefoundation/autoware_vision_pilot (Robotics)

562 ★

Free self-driving car stack - fully open-source ADAS and autonomous driving system