The Inference Report

May 27, 2026

The infrastructure supporting AI agents is fracturing under the weight of production reality. Starlette, downloaded 325 million times weekly, carries a critical vulnerability that exposes millions of agents to compromise just as enterprises race to deploy them. Meanwhile, the gap between ambition and execution keeps widening: 85% of organizations want agentic systems within three years, but 76% cannot operationally support them. Production agents are being quietly downscoped to read-only assistants and human-in-the-loop workflows because real-world data arrives late, facts conflict, APIs time out, and permissions fail. The demos work. The deployments don't.

Market behavior is signaling rejection of forced consolidation. DuckDuckGo installs jumped 30% when Google forced AI Search integration, while OpenRouter's valuation more than doubled to $1.3 billion on the strength of 5x usage growth in six months, driven by demand for choice among models rather than capture around a single interface. Distribution no longer guarantees control. Those offering optionality are winning where incumbents expected lock-in.

At the silicon level, NVIDIA is positioning Vera to handle the computational demands of continuous execution and agentic reasoning, targeting architectural gaps that batch-processing inference never faced. AWS and Anthropic are making different bets: AWS emphasizes startup engagement and geographic expansion while Anthropic opens in Seoul ahead of Computex, suggesting they believe the next phase of competition happens at distribution and regional footprint rather than at the memory bus. The divergence reveals two different theories of where the bottleneck actually lies.

GitHub and the open-source layer tell the real story. Claude-mem, Understand-Anything, and Taste-Skill have accumulated tens of thousands of stars by solving concrete problems: agents forget, code needs to be queryable, and generic output needs filtering. Mukul975's cybersecurity skills repository maps 754 competencies across Claude Code, Cursor, Copilot, and 20+ platforms rather than locking into a single vendor, a pattern repeated across agent harnesses and knowledge-work plugins. The infrastructure race has shifted from model size to making agents stateful, searchable, portable, and data-aware. What's being built is not a unified system but a fragmented web of tools designed to work across multiple platforms, each solving a specific failure mode of the previous generation.

Grant Calloway

AI Labs

AWS

AWS Weekly Roundup: AWS Local Zones in Istanbul, open-source ExtendDB, Kiro Web, and more (May 25, 2026)

Anthropic

Anthropic appoints KiYoung Choi as Representative Director of Korea ahead of Seoul office opening

NVIDIA

NVIDIA Vera CPU Is ‘Packing a Heavy-Hitting Punch’ Against Competition

From the Wire

Research Papers: Focused

Subgrid-Scale Parameterization in Burgers' Equation Using Structure-Preserving Neural Networks and Entropy Variables math.NA

We present a machine learning approach for developing subgrid-scale (SGS) parametrizations in coarse simulations of partial differential equations. We utilize structure-preserving neural networks and entropy variables to learn subgrid fluxes in coarse simulations of the Burgers' equation. In particular, we employ a decoupled neural network architecture explicitly separating the subgrid corrections into two distinct components: a conservative Flux Potential network and an Eddy Viscosity network. We demonstrate that this reduced-order framework maintains high physical fidelity, accurately reproducing the energy spectrum, spatial and temporal correlation functions, and dynamical characteristics of the full-scale system. Furthermore, we show that our approach is robust and applicable to parameters outside the training regime.

Spectral-Informed Neural Networks Outperform Spectral Methods in High-dimensional PDEs math.NA

For low-dimensional problems ($d\leq3$), spectral methods can achieve exceptionally high accuracy. For middle-dimensional problems ($4 \leq d \lesssim 10$), spectral methods remain feasible through specific techniques such as sparse grids or hyperbolic cross. However, for high-dimensional problems ($d\gg 10$), spectral methods suffer from the curse of dimensionality. Physics-informed neural networks (PINNs) have emerged as a promising approach to overcome this challenge, offering scalability to high dimensions, but often suffer from limited accuracy and efficiency. Recently proposed spectral-informed neural networks (SINNs) combine spectral methods with PINNs, operating directly in the spectral domain to avoid spatial derivative computations and to reduce memory consumption. In this work, we introduce Modified SINNs, which integrate coefficient decay scaling and basis embeddings motivated by harmonic analysis to enhance accuracy in high-dimensional problems and enable accurate approximation of unknown spectral coefficients. Numerical experiments on steady and time-dependent partial differential equations demonstrate that Modified SINNs outperform sparse grid spectral methods on middle-dimensional problems with incomplete spectral information and achieve superior accuracy compared to PINNs on high-dimensional problems.

Approximation of solutions of parameter-dependent problems by residual neural networks math.NA

We develop a convergent scheme to train neural networks involving analytic activation functions based on gradient flows. Convergence properties are guaranteed by Lojasiewicz theory. The main advantage of this approach is its simplicity of implementation. The coefficients of the network are approximated by solving a system of ordinary differential equations. We test the method by constructing residual neural network approximations of solutions of parametric problems. The dependence of the solutions of simple ordinary differential equations on a few parameters is correctly reproduced. The solutions of inverse problems involving wave constraints which depend on a few parameters can be reasonably approximated, even in regions in which the problem is severely ill posed.
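As a rough illustration of what "training by gradient flow" means, the sketch below fits a tiny one-hidden-layer tanh network to sin(x) by integrating the ODE system dθ/dt = -∇L(θ) with forward Euler. This is not the authors' scheme: the network shape, the target function, the step size `dt`, and the helper names are all assumptions chosen for the example, and it uses a plain feed-forward network rather than the residual architecture the abstract mentions.

```python
import numpy as np

def model(theta, x):
    """One hidden layer with tanh (an analytic activation): f(x) = sum_j a_j tanh(w_j x + b_j)."""
    a, w, b = theta
    return np.tanh(np.outer(x, w) + b) @ a

def loss(theta, x, y):
    """Half mean-squared error between the network output and the target."""
    r = model(theta, x) - y
    return 0.5 * np.mean(r ** 2)

def grad(theta, x, y):
    """Hand-derived gradient of the loss with respect to (a, w, b)."""
    a, w, b = theta
    h = np.tanh(np.outer(x, w) + b)       # hidden activations, shape (n, m)
    r = (h @ a - y) / x.size              # residual scaled by 1/n, shape (n,)
    dh = (1.0 - h ** 2) * a               # d f / d pre-activation, shape (n, m)
    return h.T @ r, (dh * x[:, None]).T @ r, dh.T @ r

def gradient_flow(theta, x, y, dt=0.01, steps=5000):
    """Forward-Euler integration of d(theta)/dt = -grad L(theta)."""
    for _ in range(steps):
        g = grad(theta, x, y)
        theta = tuple(p - dt * gp for p, gp in zip(theta, g))
    return theta

# Hypothetical setup: 8 hidden units, target sin(x) on [-pi, pi].
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 64)
y = np.sin(x)
theta0 = tuple(rng.normal(scale=0.5, size=8) for _ in range(3))

theta1 = gradient_flow(theta0, x, y)
loss_before = loss(theta0, x, y)
loss_after = loss(theta1, x, y)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Forward Euler on a gradient flow is ordinary gradient descent with learning rate `dt`; a higher-order or adaptive ODE integrator could be substituted in `gradient_flow` without touching the rest of the sketch.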

Deep Learning-based Surrogate Modelling of the LOD Method for Multiscale Problems math.NA

Multiscale problems are notoriously difficult to tackle using traditional numerical methods, as accurately resolving fine-scale features often requires prohibitively fine discretizations. This challenge is particularly pronounced in applications such as materials science, fluid dynamics, climate systems, chemical processes, and complex networks. Recent neural operator models provide a promising data-driven alternative, but frequently struggle to achieve sufficient accuracy in the presence of strongly heterogeneous or oscillatory coefficients. In this work, we focus on the solution of elliptic PDEs with rough and high-contrast inputs. The Localized Orthogonal Decomposition (LOD) method is a well-established numerical approach for such problems, but it comes at a substantial computational cost. We investigate the performance of popular neural operator architectures on these challenging multiscale problems and identify key limitations in their ability to resolve fine-scale structure. To overcome these challenges, we introduce LOD-MSNO (LOD-Multiscale Neural Operator), a hybrid approach that leverages the LOD method as a strong multiscale prior by building on its representation of the solution as a linear combination of problem-adapted basis functions, while addressing its main computational bottlenecks through data-driven operator learning. We further provide theoretical error estimates for the proposed coefficient-learning framework. Lastly, we demonstrate the potential of our proposed method to outperform current neural operator baselines in terms of accuracy for challenging multiscale inputs, while mainly retaining the computational efficiency of neural operator models.

Kernel-based Operator Learning: Error Analysis, Budget Allocation, and a Physics-Informed Extension math.NA

We study kernel-based operator learning in a two-stage sampling framework, where an offline kernel regression operator learns a discretized representation of the target operator from input-output pairs and an online kernel reconstruction operator recovers the output function from predicted observations. Our main theoretical contribution is an explicit budget allocation condition relating the number $N$ of training pairs, the number $n$ of input observations, and the output resolution $m$. The condition is derived from a coupled error analysis that interprets the surrogate as a reconstruction from approximate data. This yields a decomposition of the total error into reconstruction and learning contributions that can be analyzed independently. As a consequence, we obtain quantitative scaling laws describing how $N$, $n$, and $m$ must be coupled to guarantee convergence and to balance offline learning and online reconstruction errors. The resulting estimates extend previous analyses of kernel-based operator learning. We further introduce a physics-informed extension that incorporates knowledge of the underlying PDE at evaluation time. Rather than encoding constraints directly into the kernel, we augment the online reconstruction step by penalizing PDE residuals at collocation points. The method requires no retraining for new inputs. Numerical experiments illustrate the theoretical findings and demonstrate the effectiveness of the proposed physics-informed reconstruction strategy.

Online TT-ALS for Streaming Tensor Decomposition with Incremental Orthogonalization math.NA

Tensor Train (TT) decomposition is a powerful technique for analyzing high-dimensional data. Existing algorithms for computing TT decompositions can be categorized into two main types: conventional batch-based approaches and recursive online methods. In the context of streaming data, batch methods typically achieve higher reconstruction accuracy but often suffer from memory exhaustion, while online methods provide greater computational efficiency. In this work, we introduce Online TT-ALS (Alternating Least Squares), an algorithm that sequentially enforces orthogonality constraints. This approach allows for efficient and exact updates of the core tensor while maintaining high reconstruction accuracy. Theoretically, we prove that enforcing these orthogonal gauge constraints guarantees monotonic decrease of the local objective function and temporal smoothness. Computationally, our deterministic single-sweep update reduces the rank dependence from quadratic to linear, achieving an overall complexity of $\mathcal{O}(I^{n-1} r)$. Experimental results demonstrate that the proposed method outperforms existing online techniques not only in terms of mathematical approximation accuracy but also in human perception-based video quality metrics. Furthermore, compared to recent deep learning-based paradigms, our algebraic approach achieves speedups of several orders of magnitude. Consequently, our method exhibits high computational efficiency and is suitable for low-latency real-time processing applications.
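For readers unfamiliar with the Tensor Train format, here is a minimal sketch of the conventional batch TT-SVD construction, which builds the cores by sequential truncated SVDs. It is background only and is not the Online TT-ALS algorithm from the abstract; the function names, the `max_rank` parameter, and the test tensor are assumptions made for the example.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Batch TT-SVD: factor a d-way array into cores of shape (r_prev, n_k, r_next)."""
    dims = tensor.shape
    cores = []
    rank = 1
    mat = tensor
    for k in range(len(dims) - 1):
        # Unfold so rows index (previous rank, current mode).
        mat = mat.reshape(rank * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, s.size)
        # The left factor becomes a left-orthogonal core.
        cores.append(u[:, :new_rank].reshape(rank, dims[k], new_rank))
        # Carry the remainder (singular values times right factor) forward.
        mat = s[:new_rank, None] * vt[:new_rank]
        rank = new_rank
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the cores back into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    # Drop the boundary ranks of size 1.
    return full.reshape(full.shape[1:-1])

# Hypothetical example: a small random 3-way tensor, with no effective truncation.
rng = np.random.default_rng(0)
tensor = rng.normal(size=(4, 5, 6))
cores = tt_svd(tensor, max_rank=30)
approx = tt_reconstruct(cores)
print([c.shape for c in cores], np.linalg.norm(tensor - approx))
```

Every core except the last comes out left-orthogonal here, which is one common form of the orthogonal gauge that TT algorithms exploit. Lowering `max_rank` trades reconstruction accuracy for smaller cores.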

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	tok/s	$/1M tokens
1	GPT-5.5	60.2	72	$11.25
2	Claude Opus 4.7	57.3	54	$10.94
3	Gemini 3.1 Pro Preview	57.2	130	$4.50
4	GPT-5.4	56.8	90	$5.63
5	Qwen3.7 Max	56.6	206	$3.75

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model	Score
1	Claude Opus 4.6	65.3%
2	gpt-5.2-2025-12-11-medium	64.4%
3	GLM-5	62.8%
4	Junie	62.8%
5	gpt-5.4-2026-03-05-medium	62.8%

GitHub Repos

Trending

Lum1104/Understand-Anything

43640 ★

Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more.

affaan-m/ECC

225416 ★

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

rohitg00/ai-engineering-from-scratch

33431 ★

Learn it. Build it. Ship it for others.

anthropics/knowledge-work-plugins

17517 ★

Open source repository of plugins primarily intended for knowledge workers to use in Claude Cowork

mukul975/Anthropic-Cybersecurity-Skills

21522 ★

754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI & 20+ platforms · 26 security domains · Apache 2.0

Daily discovery

pytorch/executorch · Neural Network

4746 ★

On-device AI across mobile, embedded and edge for PyTorch

vercel/ai · Generative AI

24491 ★

The AI Toolkit for TypeScript. From the creators of Next.js, the AI SDK is a free open-source library for building AI-powered applications and agents

NVIDIA-AI-Blueprints/rag · RAG

647 ★

This NVIDIA RAG blueprint serves as a reference solution for a foundational Retrieval Augmented Generation (RAG) pipeline.

HumanSignal/label-studio · MLOps

27423 ★

Label Studio is a multi-type data labeling and annotation tool with standardized output format

sdv-dev/Copulas · Synthetic Data

646 ★

A library to model multivariate data using copulas.