The Inference Report — May 15, 2026

"The constraint is no longer capability. It's capital, energy, and talent retention."

The infrastructure of AI is revealing itself as fundamentally material rather than neutral. NV Energy's decision to cut power to 49,000 Lake Tahoe residents in favor of Nevada data centers is the most literal expression of this reality: when energy is scarce, it flows to the highest-value customer, not the most critical one. The same logic is reshaping compensation across the sector. Anthropic is moving Claude from unlimited subscriptions to per-call billing starting June 15, a metering mechanism that changes how agent work gets valued. McKinsey is shifting partner compensation toward equity. Cisco cut nearly 4,000 jobs while posting record revenue. These aren't isolated business moves. They're synchronized signals of capital and labor being redistributed as companies move from research to production at scale. When partnerships fail (OpenAI exploring legal action against Apple over a ChatGPT integration that failed to deliver subscribers, SpaceXAI losing 50 employees since its February merger, the Musk v. Altman litigation), the failure is rarely about capability. It's about returns falling short of expectations, and executives fighting to protect their positions when that happens.

The market is consolidating around operational deployment rather than model advancement. OpenAI is pushing Codex into production workflows and real-time steering at Sea Limited. AWS is moving beyond model selection into prompt optimization tooling that compares outputs across five models simultaneously. Anthropic is signing enterprise deals with PwC and the Gates Foundation. Microsoft and IBM are both positioning themselves as implementation engines for existing organizations rather than research frontiers. The infrastructure layer (AMD optimizing inference, Hugging Face publishing embedding improvements, NVIDIA shipping games on GeForce NOW) is competing for the same outcome: making the layer between the model and the customer's problem invisible. The labs that win will be the ones that own that layer.

GitHub trending repositories show developers solving concrete friction points: persistent memory and agentic workflows (AgentMemory), computer vision infrastructure (Roboflow's Supervision, NVIDIA's video analytics), and offline inference (Supertone's on-device TTS, LocalAI's hardware-agnostic serving). What's absent is any major push on reasoning or long-horizon planning. The momentum is instead in making existing approaches reliable, composable, and deployable at scale. The gap between established tools and emerging specialized stacks suggests the market for agentic frameworks is fragmenting rather than converging. Across signal processing research, the same pattern holds: self-supervised contrastive methods paired with structured neural architectures are winning over generic approaches because they encode domain-specific priors. Uncertainty quantification is moving from post-hoc calibration to a training objective. Lightweight adaptation techniques are replacing full retraining. The field is optimizing for practical deployment constraints (latency, bandwidth, label scarcity), not theoretical capability.

Grant Calloway

AI Labs

AI21 Labs

Reproducing Variance: Caching in Agentic LLM Pipelines

AMD

Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 & W8A8 Quantization with AMD Quark

AWS

Amazon Bedrock introduces new advanced prompt optimization and migration tool

Anthropic

Hugging Face

IBM

A New Way to Make AI Actually Work in the Real World

Microsoft

Costa Rican dairy cooperative turns AI agents into coworkers

NVIDIA

Sea You in the Cloud: ‘Subnautica 2’ Early Access Dives Onto GeForce NOW

OpenAI

From the Wire

Research Papers — Focused

Modulation Consistency-based Contrastive Learning for Self-Supervised Automatic Modulation Classification eess.SP

Deep learning-based AMC methods have achieved remarkable performance, but their practical deployment remains constrained by the high cost of labeled data. Although self-supervised learning (SSL) reduces the reliance on labels, existing SSL-based AMC methods often rely on task-agnostic pretext objectives misaligned with modulation classification, leading to representations entangled with nuisance factors such as symbol, channel, and noise. In this paper, we identify intra-instance modulation consistency as a task-aware structural prior, whereby different temporal segments of the same signal may differ in waveform while preserving the same modulation type, thus providing a principled cue for task-aligned self-supervision. Based on this prior, we propose Mod-CL, a Modulation consistency-based Contrastive Learning framework that constructs positive pairs from different temporal segments of the same signal instance, to encourage the model to learn shared modulation information while suppressing nuisance variations. We further develop a contrastive objective tailored to Mod-CL, which jointly exploits temporal segmentation and data augmentation to pull together views sharing the same modulation semantics while avoiding supervisory conflicts within each signal instance. Extensive experiments on RadioML datasets show that Mod-CL consistently outperforms strong baselines, especially in low-label regimes, achieving substantial improvements in linear probing accuracy.
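The positive-pair idea in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`segment_pairs`, `info_nce`), the segment length, and the use of a plain InfoNCE loss over cosine similarities are all hypothetical choices, since the abstract does not specify the tailored objective or the encoder.

```python
import math
import random


def segment_pairs(signal, seg_len, rng):
    """Cut two non-overlapping temporal segments from one signal instance.

    Both segments come from the same signal, so they share its modulation type
    and can serve as a positive pair (hypothetical helper, not from the paper).
    """
    if len(signal) < 2 * seg_len:
        raise ValueError("signal too short for two non-overlapping segments")
    start_a = rng.randrange(0, len(signal) - 2 * seg_len + 1)
    start_b = rng.randrange(start_a + seg_len, len(signal) - seg_len + 1)
    return signal[start_a:start_a + seg_len], signal[start_b:start_b + seg_len]


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchors, positives, temperature=0.1):
    """Plain InfoNCE loss (an assumed stand-in for the paper's objective).

    anchors[i] and positives[i] are embeddings of two segments of signal i;
    positives[j] for j != i act as negatives.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy usage: a synthetic waveform stands in for a received signal.
rng = random.Random(0)
signal = [math.sin(0.3 * t) for t in range(128)]
seg_a, seg_b = segment_pairs(signal, seg_len=32, rng=rng)
```

In this sketch each signal contributes exactly one positive pair, so no two segments of the same instance are ever treated as negatives; how the paper handles several segments and augmentations per instance is not covered here.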

Pretraining Strategies and Scaling for ECG Foundation Models: A Systematic Study eess.SP

Specialized foundation models are beginning to emerge in various medical subdomains, but pretraining methodologies and parametric scaling with the size of the pretraining dataset are rarely assessed systematically and in a like-for-like manner. This work focuses on foundation models for electrocardiography (ECG) data, one of the most widely captured physiological time series world-wide. We present a comprehensive assessment of pretraining methodologies, covering five different contrastive and non-contrastive self-supervised learning objectives for ECG foundation models, and investigate their scaling behavior with pretraining dataset sizes up to 11M input samples, exclusively from publicly available sources. Pretraining strategy has a meaningful and consistent impact on downstream performance, with contrastive predictive coding (slightly ahead of JEPA) yielding the most transferable representations across diverse clinical tasks. Scaling pretraining data continues to yield meaningful improvements up to 11M samples for most objectives. We also compare model architectures across all pretraining methodologies and find evidence for a clear superiority of structured state space models compared to transformers and CNN models. We hypothesize that the strong inductive biases of structured state space models, rather than pretraining scale alone, are the primary driver of effective ECG representation learning, with important implications for future foundation model development in this and potentially other physiological signal domains.

Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance eess.SP

To address the issues of high interruption time and measurement report overhead under user equipment (UE) mobility, especially in high-speed 5G use cases, AI/ML techniques (AI/ML beam management and mobility procedures) have been proposed. These techniques rely heavily on data that are most often simulated for various scenarios and do not accurately reflect real deployment behavior or user traffic patterns. There is therefore a pressing need for realistic datasets under various conditions. This work presents a dataset collected from a commercially deployed network across various modes of mobility (pedestrian, bike, car, bus, and train) and at multiple speeds to depict real-time UE mobility. When collecting the dataset, we focused primarily on handover (HO) scenarios, with the aim of reducing the HO interruption time and maintaining continuous throughput during and immediately after HO execution. To support this research, the dataset includes timing advance (TA) measurements at various signaling events such as RACH trigger, MAC CE, and PDCCH grant, which are typically missing in existing works. We give a detailed description of the creation of the dataset: experimental setup, data acquisition, and extraction. We also present an exploratory analysis of the data, with a primary focus on mobility, beam management, and TA. We discuss multiple use cases in which the proposed dataset can facilitate understanding of the inference of the AI/ML model. One such use case is to train and evaluate various AI/ML models for TA prediction.

PropSplat: Map-Free RF Field Reconstruction via 3D Gaussian Propagation Splatting eess.SP

Building a site-specific propagation model typically requires either ray-tracing over detailed 3D maps or dense measurement campaigns. Both approaches are expensive and often infeasible for rapid deployments where geographic data is unavailable or outdated. We present PropSplat, a map-free propagation modeling method that reconstructs radio frequency (RF) fields using 3D anisotropic Gaussian primitives. Each Gaussian encodes a scalar path loss offset relative to an explicit baseline path loss model with a learnable path loss exponent. Gaussians are initialized along observed transmitter-receiver paths and optimized end-to-end to learn the propagation environment without external information like floor plans, terrain databases, or clutter data. We evaluate PropSplat against wireless radiance field methods NeRF², GSRF, and WRF-GS+ on two real-world datasets. On large-scale outdoor drive-tests spanning multiple topographical regions at six sub-6 GHz frequencies, PropSplat achieves 5.38 dB RMSE when training measurements are spaced 300 m apart and outperforms WRF-GS+ (5.87 dB), GSRF (7.46 dB), and NeRF² (14.76 dB). On indoor Bluetooth Low Energy measurements, PropSplat achieves 0.19 m mean localization error, an order of magnitude better than NeRF² (1.84 m), while achieving near-identical received signal strength prediction accuracy. These results show that accurate site-specific propagation reconstruction is achievable from sparse RF-native measurements. The need for geographic data as a prerequisite for scalable RF environment modeling is reduced.
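To make the "baseline plus Gaussian offsets" structure concrete, here is a rough sketch. It assumes a standard log-distance baseline, PL(d) = PL0 + 10·n·log10(d/d0), and it simplifies heavily: the Gaussians are isotropic rather than anisotropic, and each one just adds its offset weighted by its unnormalized density at the receiver. The abstract does not say how PropSplat aggregates Gaussians along a path, so the aggregation rule, the class `GaussianOffset`, the function names, and the 40 dB reference loss are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianOffset:
    center: tuple      # (x, y, z) position in metres
    sigma: float       # isotropic spread in metres (the paper uses anisotropic Gaussians)
    offset_db: float   # scalar path loss offset in dB


def baseline_path_loss_db(distance_m, exponent, ref_loss_db=40.0, ref_distance_m=1.0):
    """Log-distance path loss; ref_loss_db is an arbitrary illustrative value."""
    return ref_loss_db + 10.0 * exponent * math.log10(distance_m / ref_distance_m)


def predicted_path_loss_db(tx, rx, exponent, gaussians):
    """Baseline loss plus Gaussian offsets evaluated at the receiver (assumed rule)."""
    loss = baseline_path_loss_db(math.dist(tx, rx), exponent)
    for g in gaussians:
        squared_distance = sum((a - b) ** 2 for a, b in zip(rx, g.center))
        loss += g.offset_db * math.exp(-0.5 * squared_distance / g.sigma ** 2)
    return loss


# Toy usage: one obstruction-like Gaussian sitting at the receiver location.
tx = (0.0, 0.0, 0.0)
rx = (10.0, 0.0, 0.0)
clutter = [GaussianOffset(center=(10.0, 0.0, 0.0), sigma=2.0, offset_db=5.0)]
free_loss = predicted_path_loss_db(tx, rx, exponent=2.0, gaussians=[])
cluttered_loss = predicted_path_loss_db(tx, rx, exponent=2.0, gaussians=clutter)
```

In a learned version, the exponent and each Gaussian's center, spread, and offset would be optimized against measured path loss; that fitting step is omitted here.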

CredibleDFGO: Differentiable Factor Graph Optimization with Credibility Supervision eess.SP

Global navigation satellite system (GNSS) positioning is widely used for urban navigation, but the covariance reported by the GNSS solver is often unreliable in urban canyons. Existing differentiable factor graph optimization (DFGO) methods already learn measurement weighting through the solver, but they still use position-only objectives. As a result, the mean estimate may improve while the reported covariance remains too small, too large, or wrong in shape. In this work, we propose CredibleDFGO (CDFGO), a differentiable GNSS factor graph framework that makes covariance credibility an explicit training target. The Weighting Generation Network (WGN) predicts per-satellite reliability weights. The differentiable Gauss-Newton solver maps these weights to a position estimate and posterior covariance, and proper scoring rules supervise the East-North predictive distribution end-to-end. We study negative log-likelihood (NLL), Energy Score (ES), and their combination. Results on three UrbanNav test scenes show consistent gains in uncertainty credibility. Positioning accuracy also improves on the medium-urban and harsh-urban scenes, and the mean horizontal error and 95th-percentile error improve on the deep-urban scene. On the harsh-urban Mong Kok (MK) scene, CDFGO-Combined reduces the mean horizontal error from 13.77 m to 11.68 m, reduces NLL from 40.63 to 6.59, and reduces ES from 12.31 to 9.05. The case studies link the MK improvement to better axis-wise consistency, more credible local covariance ellipses, and satellite-level reweighting.
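The two scoring rules named in the abstract can be illustrated with a small sketch. It uses the textbook definitions of the Gaussian negative log-likelihood for a 2D (East-North) error and a sample-based estimate of the energy score, ES ≈ mean‖xᵢ − y‖ − ½·mean‖xᵢ − xⱼ‖. The function names and the toy numbers are hypothetical, and how the paper combines or weights the two scores is not shown.

```python
import math


def gaussian_nll_2d(error, cov):
    """NLL of a 2D position error under a zero-mean Gaussian with covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    ex, ey = error
    # Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    mahalanobis = (d * ex * ex - (b + c) * ex * ey + a * ey * ey) / det
    return 0.5 * (mahalanobis + math.log(det)) + math.log(2.0 * math.pi)


def energy_score(samples, truth):
    """Sample-based energy score: lower is better, 0 for a perfect point forecast."""
    m = len(samples)
    to_truth = sum(math.dist(s, truth) for s in samples) / m
    spread = sum(math.dist(p, q) for p in samples for q in samples) / (2.0 * m * m)
    return to_truth - spread


# Toy usage: the same 3 m East error scored under two reported covariances.
error = (3.0, 0.0)
overconfident = gaussian_nll_2d(error, ((1.0, 0.0), (0.0, 1.0)))   # 1 m sigma
calibrated = gaussian_nll_2d(error, ((9.0, 0.0), (0.0, 9.0)))      # 3 m sigma
```

The NLL penalizes the overconfident covariance more heavily than the one matching the actual error, which is the kind of signal a position-only loss cannot provide.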

Sequential Inference for Gaussian Processes: A Signal Processing Perspective eess.SP

The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in its nearly 100-year history. ML models support the development of SP systems that represent complex, nonlinear relationships with high predictive accuracy. Adapting these models often requires sequential inference, which differs both theoretically and methodologically from the usual paradigm of ML, where data are often assumed independent and identically distributed. Gaussian processes (GPs) are a flexible yet principled framework for modeling random functions, and they have become increasingly relevant to SP as statistical and ML methods assume a more prominent role. We provide a self-contained, tutorial-style overview of GPs, with a particular focus on recent methodological advances in sequential, incremental, or streaming inference. We introduce these techniques from a signal-processing perspective while bridging them to recent advances in ML. Many of the developments we survey have direct applications to state-space modeling, sequential regression and forecasting, anomaly detection in time series, sequential Bayesian optimization, adaptive and active sensing, and sequential detection and decision-making. By organizing these advances from a signal-processing perspective, we intend to equip practitioners with practical tools and a coherent roadmap for deploying sequential GP models in real-world systems.
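As one simple instance of sequential GP inference, the sketch below absorbs observations one at a time, growing the inverse of the noisy kernel matrix with a block-inverse (Schur complement) update rather than refactorizing from scratch. It is an illustration only: the class name `SequentialGP`, the RBF kernel, and the hyperparameter values are assumptions, and the tutorial surveys methods well beyond this exact-inference case.

```python
import math


def rbf(x1, x2, length_scale=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


class SequentialGP:
    """Exact GP regression updated one observation at a time (illustrative)."""

    def __init__(self, noise_var=1e-2, length_scale=1.0):
        self.noise_var = noise_var
        self.length_scale = length_scale
        self.xs, self.ys = [], []
        self.k_inv = []  # inverse of (K + noise_var * I)

    def _cross(self, x):
        return [rbf(xi, x, self.length_scale) for xi in self.xs]

    def _apply_inverse(self, vec):
        return [sum(r * v for r, v in zip(row, vec)) for row in self.k_inv]

    def update(self, x, y):
        c = rbf(x, x, self.length_scale) + self.noise_var
        if not self.xs:
            self.k_inv = [[1.0 / c]]
        else:
            k = self._cross(x)
            v = self._apply_inverse(k)
            s = c - sum(ki * vi for ki, vi in zip(k, v))  # Schur complement
            n = len(k)
            grown = [
                [self.k_inv[i][j] + v[i] * v[j] / s for j in range(n)] + [-v[i] / s]
                for i in range(n)
            ]
            grown.append([-vj / s for vj in v] + [1.0 / s])
            self.k_inv = grown
        self.xs.append(x)
        self.ys.append(y)

    def predict(self, x):
        """Posterior mean and variance of the latent function at x."""
        prior_var = rbf(x, x, self.length_scale)
        if not self.xs:
            return 0.0, prior_var
        k = self._cross(x)
        alpha = self._apply_inverse(self.ys)
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = self._apply_inverse(k)
        return mean, prior_var - sum(ki * vi for ki, vi in zip(k, v))


# Toy usage: stream three observations, then query.
gp = SequentialGP(noise_var=1e-6)
for x_obs, y_obs in [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]:
    gp.update(x_obs, y_obs)
mean_at_one, var_at_one = gp.predict(1.0)
```

Each update costs O(n²) in the number of stored points, which is why streaming settings often turn to sparse or state-space approximations instead.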

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	tok/s	$/1M tokens
1	GPT-5.5	60.2	66	$11.25
2	Claude Opus 4.7	57.3	62	$10.94
3	Gemini 3.1 Pro Preview	57.2	126	$4.50
4	GPT-5.4	56.8	83	$5.63
5	Kimi K2.6	53.9	43	$1.71

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model	Score
1	Claude Opus 4.6	65.3%
2	gpt-5.2-2025-12-11-medium	64.4%
3	GLM-5	62.8%
4	Junie	62.8%
5	gpt-5.4-2026-03-05-medium	62.8%

GitHub Repos

Trending

tinyhumansai/openhuman

8222 ★

Your Personal AI super intelligence. Private, Simple and extremely powerful.

rohitg00/agentmemory

9289 ★

#1 Persistent memory for AI coding agents based on real-world benchmarks

obra/superpowers

191902 ★

An agentic skills framework & software development methodology that works.

K-Dense-AI/scientific-agent-skills

22037 ★

A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing.

shiyu-coder/Kronos

24981 ★

Kronos: A Foundation Model for the Language of Financial Markets

Daily discovery

jatinkrmalik/vocalinux (Speech Recognition)

306 ★

Free, open-source, 100% offline voice dictation for Linux. Speak and type anywhere via whisper.cpp, Whisper & VOSK engines, GPU-accelerated, works on X11 + Wayland!

flwrlabs/flower (Federated Learning)

6901 ★

Flower: A Friendly Federated AI Framework

sdv-dev/Copulas (Synthetic Data)

645 ★

A library to model multivariate data using copulas.

AstrBotDevs/AstrBot (MCP)

32242 ★

Agentic IM Chatbot infrastructure that integrates lots of IM platforms, LLMs, plugins and AI feature, and can be your openclaw alternative. ✨

pytorch/pytorch (Deep Learning)

99920 ★

Tensors and Dynamic neural networks in Python with strong GPU acceleration