The Inference Report

April 30, 2026

The infrastructure race has collapsed into a single dimension: compute capacity. Google Cloud crossed $20 billion in quarterly revenue while admitting it was capacity-constrained, meaning demand exceeded supply hard enough to leave money on the table. Microsoft deploys Copilot to over 20 million paid users without paying OpenAI for the underlying models, a structural advantage that compounds quarter to quarter. Amazon is spending heavily to match, and SoftBank is building a robotics company specifically to construct data centers, which is to say the bottleneck has become so acute that you need robots to build the infrastructure that runs the robots. Capital is flowing toward whoever controls the pipe, with Anthropic raising at a $900 billion valuation and Runway at $5.3 billion on video models alone. But the pipe itself is becoming a liability. Drone strikes on data centers in the Middle East have made war damage uninsurable, forcing Big Tech to rethink regional projects. This is not regulatory friction. This is physical risk pricing itself into the business model.

The real tension is between velocity and control. Companies are raising record capital to move faster, but the faster they move, the less visibility they have into what they've built. A senior engineer at a well-funded company couldn't explain how a critical algorithm at the heart of their product worked. An AI model called Centaur claimed to mimic human thinking across 160 cognitive tasks but was just memorizing patterns. The infrastructure is scaling exponentially while the ability to reason about it is not. OpenAI's pivot from dismissing model quirks as harmless to forensically examining failure modes signals recognition that as models scale, their failure modes scale too. Hugging Face flags evaluation as the new computational bottleneck, suggesting the open-source ecosystem sees a different constraint than the closed labs do. The legal template for what happens when AI companies convert from nonprofit to for-profit is being written in real time in the Musk v. Altman trial, with $134 billion in assets hanging in the balance. An AI agent wiped out a company's entire customer database in nine seconds and confessed. A critical remote code execution vulnerability in GitHub could let attackers run arbitrary code across millions of repositories. These aren't edge cases. They're the friction that emerges when you move faster than your operational maturity can support.

The developer ecosystem is splitting into two camps: one building agentic systems to delegate routine work to autonomous agents, the other solving the practical infrastructure problems those agents create. Presidio detects PII before it reaches an LLM (a minimal sketch of that pattern follows this paragraph). Memory services and knowledge graph builders exist because agents without persistent context are expensive and unreliable. The diversity of agent frameworks suggests the category is still contested, which means early adopters are paying the integration cost. Real momentum isn't in the agent frameworks themselves but in the supporting layer that makes agents feasible to run, debug, and reason about at all. On the benchmark front, Claude Opus 4.6 holds the SWE-rebench lead at 65.3%, but the meaningful movement occurs in the mid-field where Chinese models have made gains on specialized evaluations. The lack of dramatic score inflation and the persistence of the same top performers suggest the evaluations are not drifting, though divergence between benchmarks for mid-tier models warrants investigation into whether they stress different failure modes. The constraint now is not chips or capital. It's whether you can build something at scale and still understand what you've built before it breaks something that matters.
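
As a concrete example of that supporting layer, here is a minimal sketch of the Presidio pattern mentioned above: detect and mask PII before a prompt ever reaches a model. It assumes the presidio-analyzer and presidio-anonymizer packages; the sample text is invented, and the replacement tags are Presidio's defaults.

```python
# Minimal sketch: scrub PII from a prompt before it reaches an LLM, using
# Microsoft Presidio (presidio-analyzer + presidio-anonymizer).
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def scrub(prompt: str) -> str:
    # Detect entities such as names, emails, and phone numbers.
    findings = analyzer.analyze(text=prompt, language="en")
    # Replace each detected span with a tag like <PERSON> or <EMAIL_ADDRESS>.
    return anonymizer.anonymize(text=prompt, analyzer_results=findings).text

print(scrub("Contact Jane Doe at jane.doe@example.com about the refund."))
# -> "Contact <PERSON> at <EMAIL_ADDRESS> about the refund."
```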

Grant Calloway

Research Papers
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models cs.CL

Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference steps within a single architecture, none address cross-architecture knowledge transfer, in which the teacher and student differ in architecture, attention mechanism, and tokenizer. We present TIDE, the first framework for cross-architecture dLLM distillation, comprising three modular components: (1) TIDAL, which jointly modulates distillation strength across training progress and diffusion timestep to account for the teacher's noise-dependent reliability; (2) CompDemo, which enriches the teacher's context via complementary mask splitting to improve predictions under heavy masking; and (3) Reverse CALM, a cross-tokenizer objective that inverts chunk-level likelihood matching, yielding bounded gradients and dual-end noise filtering. Distilling 8B dense and 16B MoE teachers into a 0.6B student via two heterogeneous pipelines yields a student that outperforms the baseline by an average of 1.53 points across eight benchmarks, with notable gains in code generation, where HumanEval scores reach 48.78 compared to 32.3 for the AR baseline.
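
To make the TIDAL idea concrete, here is a minimal PyTorch sketch of a timestep-modulated distillation loss in that spirit. It assumes a shared vocabulary between teacher and student (which the paper's Reverse CALM objective is specifically designed not to require), and both weighting schedules are illustrative assumptions rather than the paper's formulation.

```python
# Timestep-modulated distillation sketch: down-weight the teacher's signal at
# heavily masked diffusion timesteps, where the teacher is less reliable, and
# anneal the overall distillation strength over training. Shared vocabulary
# and both schedules are assumptions, not the paper's exact method.
import torch
import torch.nn.functional as F

def tidal_style_loss(student_logits, teacher_logits, mask_ratio, progress):
    """student_logits, teacher_logits: [batch, seq, vocab];
    mask_ratio: [batch] fraction of tokens masked at this timestep;
    progress: scalar in [0, 1], fraction of training completed."""
    # Token-level KL(teacher || student).
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="none",
    ).sum(-1)                                              # [batch, seq]

    # Trust the teacher less under heavy masking, distill harder early on.
    timestep_weight = (1.0 - mask_ratio).clamp(min=0.1)    # [batch]
    progress_weight = 1.0 - 0.5 * progress                 # scalar
    weight = progress_weight * timestep_weight[:, None]    # [batch, 1]

    return (weight * kl).mean()
```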

Hyper Input Convex Neural Networks for Shape Constrained Learning and Optimal Transport cs.LG

We introduce Hyper Input Convex Neural Networks (HyCNNs), a novel neural network architecture designed for learning convex functions. HyCNNs combine the principles of Maxout networks with input convex neural networks (ICNNs) to create a neural network that is always convex in the input, is theoretically capable of leveraging depth, and performs reliably when trained at scale, in contrast to ICNNs. Concretely, we prove that HyCNNs require exponentially fewer parameters than ICNNs to approximate quadratic functions up to a given precision. Across a series of synthetic experiments, we demonstrate that HyCNNs outperform existing ICNNs and MLPs in terms of predictive performance for convex regression and interpolation tasks. We further apply HyCNNs to learn high-dimensional optimal transport maps for synthetic examples and for single-cell RNA sequencing data, where they oftentimes outperform ICNN-based neural optimal transport methods and other baselines across a wide range of settings.
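
The underlying construction is simple to state: a pointwise maximum over affine functions of the input is convex, and nonnegative combinations of convex functions stay convex. Below is a minimal PyTorch sketch of a Maxout-style input-convex block along those lines; the layer sizes and the softplus parameterization of the nonnegative weights are assumptions, not the paper's exact architecture.

```python
# Maxout-style input-convex network sketch: each block takes a pointwise max
# over affine functions of the raw input plus a nonnegative-weighted map of
# the previous (convex) features, so the output stays convex in the input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvexMaxoutBlock(nn.Module):
    def __init__(self, in_dim, hidden_dim, n_pieces=4):
        super().__init__()
        self.affine = nn.Linear(in_dim, hidden_dim * n_pieces)   # affine in x
        self.prev = nn.Linear(hidden_dim, hidden_dim * n_pieces, bias=False)
        self.hidden_dim, self.n_pieces = hidden_dim, n_pieces

    def forward(self, x, z_prev=None):
        h = self.affine(x)
        if z_prev is not None:
            # softplus keeps the weights on previous convex features nonnegative
            h = h + F.linear(z_prev, F.softplus(self.prev.weight))
        h = h.view(*h.shape[:-1], self.hidden_dim, self.n_pieces)
        return h.max(dim=-1).values          # max over pieces: convex in x

class ConvexNet(nn.Module):
    """Scalar f(x) that is convex in x by construction."""
    def __init__(self, in_dim, hidden_dim=64, depth=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [ConvexMaxoutBlock(in_dim, hidden_dim) for _ in range(depth)])
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        z = None
        for block in self.blocks:
            z = block(x, z)
        return F.linear(z, F.softplus(self.out.weight), self.out.bias)
```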

Select to Think: Unlocking SLM Potential with Local Sufficiency cs.CL

Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger counterparts (LLMs). To mitigate this gap, current approaches invoke an LLM to generate tokens at points of reasoning divergence, but these external calls introduce substantial latency and costs. Alternatively, standard distillation is often hindered by capacity limitations, as SLMs struggle to accurately mimic the LLM's complex generative distribution. We address this dilemma by identifying local sufficiency: at divergence points, the LLM's preferred token consistently resides within the SLM's top-K next-token predictions, even when it fails to emerge as the SLM's top-1 choice. We therefore propose SELECT TO THINK (S2T), which reframes the LLM's role from open-ended generation to selection among the SLM's proposals, simplifying the supervision signal to discrete candidate rankings. Leveraging this, we introduce S2T-LOCAL, which distills the selection logic into the SLM, empowering it to perform autonomous re-ranking without inference-time LLM dependency. Empirically, we demonstrate that a 1.5B SLM's top-8 candidates capture the 32B LLM's choice with a 95% hit rate. Translating this potential into performance, S2T-LOCAL improves greedy decoding by 24.1% on average across benchmarks, effectively matching the efficacy of 8-path self-consistency while operating with single-trajectory efficiency.
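
Both pieces of the abstract are easy to sketch. Below is a minimal PyTorch illustration of the local-sufficiency check (does the LLM's preferred token land in the SLM's top-K?) and of a selection-style decoding step in which a selector ranks the SLM's proposals instead of generating tokens itself; aligned vocabularies and the value of K are assumptions.

```python
# Local-sufficiency check and selection-style decoding, as a sketch. Assumes
# the two models share a tokenizer so their logits index the same vocabulary.
import torch

def local_sufficiency_hit(slm_logits, llm_logits, k=8):
    """slm_logits, llm_logits: [batch, vocab] next-token logits at the same
    decoding position. Returns True per example if the LLM's argmax token is
    among the SLM's top-k candidates."""
    slm_topk = slm_logits.topk(k, dim=-1).indices              # [batch, k]
    llm_choice = llm_logits.argmax(dim=-1, keepdim=True)       # [batch, 1]
    return (slm_topk == llm_choice).any(dim=-1)

def select_to_think_step(slm_logits, selector_scores, k=8):
    """The SLM proposes top-k candidates; a selector (the LLM during data
    collection, or the distilled re-ranker at inference) picks among them."""
    candidate_ids = slm_logits.topk(k, dim=-1).indices          # [batch, k]
    picked = selector_scores.gather(-1, candidate_ids).argmax(-1, keepdim=True)
    return candidate_ids.gather(-1, picked).squeeze(-1)         # [batch]
```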

Learning Over-Relaxation Policies for ADMM with Convergence Guarantees math.OC

The Alternating Direction Method of Multipliers (ADMM) is a widely used method for structured convex optimization, and its practical performance depends strongly on the choice of penalty and relaxation parameters. Motivated by settings such as Model Predictive Control (MPC), where one repeatedly solves related optimization problems with fixed structure and changing parameter values, we propose learning online updates of the relaxation parameter to improve performance on problem classes of interest. This choice is computationally attractive in OSQP-like architectures, since adapting relaxation does not trigger the matrix refactorizations associated with penalty updates. We establish convergence guarantees for ADMM with time-varying penalty and relaxation parameters under mild assumptions, and show on benchmark quadratic programs that the resulting learned policies improve both iteration count and wall-clock time over baseline OSQP.
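
For orientation, here is a minimal NumPy sketch of over-relaxed ADMM for the lasso with a fixed relaxation parameter alpha; the learned, time-varying alpha policy is the paper's contribution and is not reproduced here. The sketch does show why adapting alpha online is cheap: the cached factorization depends on the penalty rho, not on alpha.

```python
# Over-relaxed ADMM for the lasso: min 0.5*||Ax - b||^2 + lam*||z||_1, x = z.
# alpha is the relaxation parameter; changing it between iterations would not
# invalidate the cached Cholesky factor, unlike changing rho.
import numpy as np

def lasso_admm(A, b, lam, rho=1.0, alpha=1.6, iters=200):
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))   # factor once
    for _ in range(iters):
        # x-update: solve the quadratic subproblem with the cached factor.
        rhs = A.T @ b + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # Over-relaxation: blend the fresh x with the previous z.
        x_hat = alpha * x + (1.0 - alpha) * z
        # z-update: soft-thresholding, the prox of the l1 term.
        v = x_hat + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # Dual (scaled multiplier) update.
        u = u + x_hat - z
    return z
```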

A Note on How to Remove the $\ln\ln T$ Term from the Squint Bound cs.LG

In Orabona and Pál [2016], we introduced the shifted KT potentials to remove the $\ln \ln T$ factor in the parameter-free learning-with-experts bound. In this short technical note, I show that this is equivalent to changing the prior in the Krichevsky--Trofimov algorithm. Then, I show how to use the same idea to remove the $\ln \ln T$ factor in the data-independent bound for the Squint algorithm.

ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation cs.SE

LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two extremes -- compositional code creation, i.e., building a complete, internally structured class from a specification -- remains underserved. Current evaluations are either confined to isolated functions or rely on manually curated class-level tasks that are expensive to scale and increasingly susceptible to data contamination. We introduce ClassEval-Pro, a benchmark of 300 class-level tasks spanning 11 domains, constructed through an automated three-stage pipeline that combines complexity enhancement, cross-domain class composition, and integration of real-world GitHub code contributed after January 2025. Every task is validated by an LLM Judge Ensemble and must pass test suites with over 90% line coverage. We evaluate five frontier LLMs under five generation strategies. The best model achieves only 45.6% class-level Pass@1, with a 17.7-point gap between the strongest and weakest models, confirming the benchmark's discriminative power. Strategy choice strongly interacts with model capability: structured approaches such as bottom-up improve weaker models by up to 9.4 percentage points, while compositional generation collapses to as low as 1.3%. Error analysis over 500 manually annotated failures reveals that logic errors (56.2%) and dependency errors (38.0%) dominate, identifying cross-method coordination as the core bottleneck.
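
For reference, class-level Pass@1 figures like the 45.6% quoted above are typically computed with the standard unbiased pass@k estimator (Chen et al., 2021); whether ClassEval-Pro uses exactly this estimator is an assumption. A minimal sketch:

```python
# Unbiased pass@k: with n generations per task, of which c pass the test
# suite, this is the probability that at least one of k drawn samples passes.
# Benchmark-level Pass@1 is the mean of pass_at_k(n, c, 1) over tasks.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:          # every size-k draw must contain a passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(5, 2, 1))  # 5 samples, 2 pass -> Pass@1 estimate of 0.4
```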

Benchmarks
Artificial Analysis Intelligence Index

Composite score across coding, math, and reasoning

#  | Model                   | Score | tok/s | $/1M
1  | GPT-5.5                 | 60.2  | 65    | $11.25
2  | Claude Opus 4.7         | 57.3  | 52    | $10.00
3  | Gemini 3.1 Pro Preview  | 57.2  | 129   | $4.50
4  | GPT-5.4                 | 56.8  | 93    | $5.63
5  | Kimi K2.6               | 53.9  | 25    | $1.71
SWE-rebench

Agentic coding on real-world software engineering tasks

#  | Model                      | Score
1  | Claude Opus 4.6            | 65.3%
2  | gpt-5.2-2025-12-11-medium  | 64.4%
3  | GLM-5                      | 62.8%
4  | gpt-5.4-2026-03-05-medium  | 62.8%
5  | GLM-5.1                    | 62.7%