The Inference Report

May 5, 2026

The financialization of artificial intelligence has entered a new phase. OpenAI extracted $10 billion from a consortium of 19 Wall Street firms while Anthropic closed $1.5 billion from Blackstone, Goldman Sachs, and Hellman & Friedman, then immediately launched a joint venture with those same asset managers to aggressively market enterprise AI products. This is not a go-to-market strategy shift. This is a distribution layer replacement. Venture capital, which built these companies, has been superseded by private equity, which brings patient capital, institutional sales relationships, and the ability to embed AI into existing portfolios rather than pitch it as a standalone product. Cerebras is heading for a blockbuster IPO valued at $26.6 billion or more. The money is flowing to companies positioned as infrastructure or enterprise tools, not consumer products or research labs.

Yet actual product performance is diverging sharply from market narrative. Microsoft announced more than 20 million paying Copilot users, up 33 percent from 15 million in January, but the company is not claiming those users are generating outsized productivity gains or revenue. Image AI models now drive app downloads at 6.5 times the rate of chatbot upgrades, yet most of those downloads do not convert to revenue. Anthropic, which bills itself as the most sophisticated evaluation shop in AI, shipped three quality regressions in Claude Code that its own internal evaluations did not catch. The gap between what AI can do and what it actually does for paying customers is widening, not closing. Capital is flowing into the space anyway because the institutional buyers now have skin in the game and incentive to make the bet work.

OpenAI and Anthropic are racing to embed themselves into enterprise workflows through vertical integration and partnership capital, while IBM and AWS are positioning infrastructure and orchestration as the durable layer beneath that stack. What is absent is any lab announcing a pure model capability that does not come bundled with services, deployment, or infrastructure commitments. The era of selling weights is over. In software engineering benchmarks, Claude Opus 4.6 holds the SWE-rebench top position at 65.3 percent, with gpt-5.2-2025-12-11-medium, GLM-5, and Junie clustered tightly behind it. Yet the divergence between SWE-rebench and Artificial Analysis suggests these benchmarks measure different failure modes: SWE-rebench tests unmodified real pull requests requiring executable validation, while Artificial Analysis appears to weight instruction-following and synthetic tasks more heavily. Developers on GitHub are moving past simple LLM calls toward agent orchestration frameworks that coordinate multiple instances toward specific goals, while simultaneously prioritizing privacy and local control through tools that keep data on device and offer open replacements for SaaS incumbents. The question now is whose orchestration layer becomes the standard, and whether that standard is open or proprietary.

Grant Calloway

AI Labs
From the Wire
Research Papers — Focused
HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs cs.CE

We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closures, boundary handling) in a query-conditioned way. Rather than learning a monolithic map, HyCOP learns a policy over short programs - which module to apply and for how long - conditioned on regime features and state statistics. Modules may be numerical sub-solvers or learned components, enabling hybrid surrogates evaluated at arbitrary query times without autoregressive rollout. Across diverse PDE benchmarks, HyCOP produces interpretable programs, delivers order-of-magnitude OOD improvements over monolithic neural operators, and supports modular transfer through dictionary updates (e.g., boundary swaps, residual enrichment). Our theory characterizes expressivity and gives an error decomposition that separates composition error from module error and doubles as a process-level diagnostic.
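The composition idea above can be made concrete with a toy sketch: simple numerical modules (advection, diffusion) applied according to a short program of (module, steps) pairs. This is a minimal illustration of query-conditioned module composition, not HyCOP's actual API; the module names, parameters, and greedy program format are assumptions.

```python
import numpy as np

def diffusion_step(u, nu=0.1, dx=1.0, dt=0.1):
    """One explicit finite-difference diffusion update with periodic BCs."""
    lap = np.roll(u, -1) - 2 * u + np.roll(u, 1)
    return u + nu * dt * lap / dx**2

def advection_step(u, c=1.0, dx=1.0, dt=0.1):
    """One first-order upwind advection update with periodic BCs."""
    return u - c * dt * (u - np.roll(u, 1)) / dx

# Illustrative module dictionary; HyCOP's real modules also include
# learned closures and boundary handling.
MODULES = {"diffuse": diffusion_step, "advect": advection_step}

def run_program(u0, program):
    """Apply a short program: a list of (module_name, n_steps) pairs."""
    u = u0.copy()
    for name, n_steps in program:
        for _ in range(n_steps):
            u = MODULES[name](u)
    return u

u0 = np.zeros(64)
u0[28:36] = 1.0  # square pulse initial condition
u = run_program(u0, [("advect", 10), ("diffuse", 5)])
```

In the paper's framing, the program itself is emitted by a learned policy conditioned on regime features; here it is hard-coded to keep the sketch self-contained.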

Design Structure Matrix Modularization with Large Language Models cs.CE

Design Structure Matrix (DSM) modularization, the task of partitioning system elements into cohesive modules, is a fundamental combinatorial challenge in engineering design. Traditional methods treat modularization as a pure graph optimization, without access to the engineering context embedded in the system. Building on prior work on LLM-based combinatorial optimization for DSM sequencing, this paper extends the method to modularization across five cases and three backbone LLMs. Our method achieves near-reference quality within 30 iterations without requiring specialized optimization code. Counterintuitively, domain knowledge, beneficial in sequencing, consistently impairs performance on more complex DSMs. We attribute this to semantic misalignment between the LLM's functional priors and the purely structural optimization objective, and propose the semantic-alignment hypothesis as a testable condition governing knowledge effectiveness with LLMs. Ablation studies identify the most effective input representation, objective formulation, and solution pool design for practical deployment. These findings offer practical guidance for deploying LLMs in engineering design optimization.
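The structural objective the LLM is optimizing against can be sketched with a simple proxy: the fraction of dependencies that stay inside modules rather than crossing boundaries. The matrix, partition labels, and scoring rule below are toy illustrations, not the paper's objective formulation.

```python
import numpy as np

def partition_score(dsm, labels):
    """Fraction of nonzero off-diagonal DSM entries whose endpoints
    share a module (higher = more cohesive modularization)."""
    dsm = np.asarray(dsm)
    labels = np.asarray(labels)
    rows, cols = np.nonzero(dsm)
    off_diag = rows != cols
    same = labels[rows[off_diag]] == labels[cols[off_diag]]
    return same.mean() if same.size else 1.0

# Toy 4-element system: elements 0,1 interact; 2,3 interact; one cross link.
dsm = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 1, 1, 0]])
good = partition_score(dsm, [0, 0, 1, 1])  # respects the two clusters
bad = partition_score(dsm, [0, 1, 0, 1])   # splits them
```

The semantic-alignment hypothesis is visible even in this toy: the score is purely structural, so any functional prior the LLM brings must happen to agree with it to help.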

MappingEvolve: LLM-Driven Code Evolution for Technology Mapping cs.CE

Technology mapping is a critical yet challenging stage in logic synthesis. While Large Language Models (LLMs) have been applied to generate optimization scripts, their potential for core algorithm enhancement remains untapped. We introduce MappingEvolve, an open-source framework that pioneers the use of LLMs to directly evolve technology mapping code. Our method abstracts the mapping process into distinct optimization operators and employs a hierarchical agent-based architecture, comprising a Planner, Evolver, and Evaluator, to guide the evolutionary search. This structured approach enables strategic and effective code modifications. Experiments show our method significantly outperforms direct evolution and strong baselines, achieving 10.04% area reduction versus ABC and 7.93% versus mockturtle, with 46.6%-96.0% S_overall improvement on EPFL benchmarks, while explicitly navigating the area-delay trade-off. Our code and data are available at https://github.com/Flians/MappingEvolve.
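The Planner/Evolver/Evaluator loop can be sketched as a bare-bones evolutionary search. Everything here is a stand-in: the candidate is a parameter vector rather than mapper code, the "area" objective is a toy quadratic rather than a benchmark run, and the greedy acceptance rule is an assumption, not the framework's search strategy.

```python
import random

random.seed(0)

def evaluate(candidate):
    """Evaluator: lower 'area' is better; a toy quadratic stands in
    for compiling and measuring a mapped netlist."""
    return sum((x - 3) ** 2 for x in candidate)

def evolve(candidate):
    """Evolver: propose a small mutation of one parameter."""
    mutant = list(candidate)
    i = random.randrange(len(mutant))
    mutant[i] += random.uniform(-1, 1)
    return mutant

def search(n_params=4, iters=200):
    """Planner: keep the best candidate seen (greedy hill climbing)."""
    best = [0.0] * n_params
    best_cost = evaluate(best)
    for _ in range(iters):
        cand = evolve(best)
        cost = evaluate(cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

best, cost = search()
```

In MappingEvolve the Evolver's mutations are LLM-generated code edits and the Planner steers the search strategically; the loop structure is what carries over.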

Partition-of-Unity Gaussian Kolmogorov-Arnold Networks cs.CE

Gaussian basis functions provide an efficient and flexible alternative to spline activations in KANs. In this work, we introduce the partition-of-unity Gaussian KAN (PU-GKAN), a Shepard-type normalized Gaussian KAN in which the Gaussian basis values on each edge are divided by their local sum over fixed centers. This produces a partition-of-unity feature map with trainable coefficients, while preserving the standard edge-based KAN structure. The normalized construction gives exact constant reproduction at the edge level and admits an explicit finite-feature kernel interpretation. We formulate both the standard Gaussian KAN (GKAN) and PU-GKAN from a finite-feature and additive-kernel viewpoint, making the induced layer kernels and empirical feature matrices explicit. Using the first-layer feature matrix as the reference object, we adopt a practical scale-selection interval for ε, with the lower endpoint determined by adjacent-center overlap and the upper endpoint determined by a conservative conditioning threshold. Numerical experiments show that PU-GKAN reduces sensitivity to ε, improves validation accuracy for most smooth and moderately non-smooth targets, and gives more stable training behavior. The benefit persists across sample-size and center-number sweeps, higher-dimensional architectures, Matérn RBF bases, and physics-informed examples involving Helmholtz and wave equations. These results indicate that Shepard-type partition-of-unity normalization is a simple and effective stabilization mechanism for RBF-based KANs.
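The Shepard-type normalization is simple enough to sketch directly: compute Gaussian basis values at fixed centers, then divide each row by its sum. The center grid and ε value below are illustrative choices, not the paper's experimental settings.

```python
import numpy as np

def gaussian_features(x, centers, eps):
    """Raw Gaussian basis values exp(-((x - c_j) / eps)^2) per center."""
    d = x[:, None] - centers[None, :]
    return np.exp(-((d / eps) ** 2))

def pu_features(x, centers, eps):
    """Partition-of-unity features: each row normalized to sum to 1
    (the Shepard-type normalization over fixed centers)."""
    phi = gaussian_features(x, centers, eps)
    return phi / phi.sum(axis=1, keepdims=True)

centers = np.linspace(0.0, 1.0, 9)  # G = 9 fixed centers
x = np.linspace(0.0, 1.0, 50)
psi = pu_features(x, centers, eps=0.25)

# Exact constant reproduction: with all edge coefficients equal to c,
# the output sum_j c * psi_j(x) equals c at every query point.
const = psi @ np.full(9, 2.5)
```

The constant-reproduction property falls out immediately from the normalization: the features sum to one at every x, so a constant coefficient vector is reproduced exactly.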

Scaling of Gaussian Kolmogorov-Arnold Networks cs.CE

The Gaussian scale parameter ε is central to the behavior of Gaussian Kolmogorov-Arnold Networks (KANs), yet its role in deep edge-based architectures has not been studied systematically. In this paper, we investigate how ε affects Gaussian KANs through first-layer feature geometry, conditioning, and approximation behavior. Our central observation is that scale selection is governed primarily by the first layer, since it is the only layer constructed directly on the input domain and any loss of distinguishability introduced there cannot be recovered by later layers. From this viewpoint, we analyze the first-layer feature matrix and identify a practical operating interval, ε ∈ [1/(G-1), 2/(G-1)], where G denotes the number of Gaussian centers. For the standard shared-center Gaussian KAN used in current practice, we interpret this interval not as a universal optimality result, but as a stable and effective design rule, and validate it through brute-force sweeps over ε across function-approximation problems with different collocation densities, grid resolutions, network architectures, and input dimensions, as well as a physics-informed Helmholtz problem. We further show that this range is useful for fixed-scale selection, variable-scale constructions, constrained training of ε, and efficient scale search using early training MSE. Finally, using a matched Chebyshev reference, we show that a properly scaled Gaussian KAN can already be competitive in accuracy relative to another standard KAN basis. In this way, the paper positions scale selection as a practical design principle for Gaussian KANs rather than as an ad hoc hyperparameter choice.
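The operating interval ε ∈ [1/(G-1), 2/(G-1)] can be probed numerically via the conditioning of a first-layer feature matrix. The condition-number check below is a simplified stand-in for the paper's analysis, and the choice of G, input grid, and comparison ε are illustrative assumptions.

```python
import numpy as np

def eps_interval(G):
    """The paper's operating interval for the scale parameter."""
    return 1.0 / (G - 1), 2.0 / (G - 1)

def feature_condition(G, eps, n=200):
    """Condition number of a first-layer Gaussian feature matrix
    built from n inputs on [0, 1] and G evenly spaced centers."""
    x = np.linspace(0.0, 1.0, n)
    centers = np.linspace(0.0, 1.0, G)
    phi = np.exp(-(((x[:, None] - centers[None, :]) / eps) ** 2))
    return np.linalg.cond(phi)

G = 11
lo, hi = eps_interval(G)                      # (0.1, 0.2) for G = 11
in_range = feature_condition(G, (lo + hi) / 2)
too_wide = feature_condition(G, 10 * hi)      # overly flat basis
```

With ε far above the interval, the Gaussians flatten, the columns of the feature matrix become nearly collinear, and the conditioning degrades sharply; this is the distinguishability loss the first-layer argument warns about.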

Decoupling Identity from Utility: Privacy-by-Design Frameworks for Financial Ecosystems cs.CE

Financial institutions face tension between maximizing data utility and mitigating the re-identification risks inherent in traditional anonymization methods. This paper explores Differentially Private (DP) synthetic data as a robust "Privacy by Design" framework to resolve this conflict, ensuring output privacy while satisfying stringent regulatory obligations. We examine two distinct generative paradigms: Direct Tabular Synthesis, which reconstructs high-fidelity joint distributions from raw data, and DP-Seeded Agent-Based Modeling (ABM), which uses DP-protected aggregates to parameterize complex, stateful simulations. While tabular synthesis excels at reflecting static historical correlations for QA testing and business analytics, the DP-Seeded ABM offers a forward-looking "counterfactual laboratory" capable of modeling dynamic market behaviors and black swan events. By decoupling individual identities from data utility, these methodologies eliminate traditional data-clearing bottlenecks, enabling seamless cross-institutional research and compliant decision-making in an evolving regulatory landscape.
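The "DP-seeded" pattern can be illustrated in a few lines: release one aggregate through the Laplace mechanism, then use only that noisy statistic to parameterize a downstream population of agents. The clamping bounds, ε value, sensitivity bookkeeping, and agent model are all illustrative assumptions, not the paper's methodology.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mean(values, lo, hi, epsilon):
    """Clamped mean released via the Laplace mechanism; clamping to
    [lo, hi] bounds each record's influence on the mean."""
    clipped = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(clipped)
    noise = rng.laplace(0.0, sensitivity / epsilon)
    return clipped.mean() + noise

# Raw per-account balances never leave the institution; only the noisy
# aggregate does, and it seeds the synthetic agent population.
balances = rng.normal(5_000, 1_500, size=10_000)
seed_mean = dp_mean(balances, 0, 20_000, epsilon=1.0)
agents = rng.normal(seed_mean, 1_500, size=500)
```

The decoupling is visible in the data flow: the simulation sees only the DP-protected aggregate, so its outputs carry the privacy guarantee of that release while remaining free to model dynamics the raw table never contained.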

Benchmarks
Artificial Analysis Intelligence Index

Composite score across coding, math, and reasoning

#  Model                    Score  tok/s  $/1M
1  GPT-5.5                  60.2   74     $11.25
2  Claude Opus 4.7          57.3   58     $10.94
3  Gemini 3.1 Pro Preview   57.2   130    $4.50
4  GPT-5.4                  56.8   86     $5.63
5  Kimi K2.6                53.9   30     $1.71
SWE-rebench

Agentic coding on real-world software engineering tasks

#  Model                       Score
1  Claude Opus 4.6             65.3%
2  gpt-5.2-2025-12-11-medium   64.4%
3  GLM-5                       62.8%
4  Junie                       62.8%
5  gpt-5.4-2026-03-05-medium   62.8%