The Inference Report — May 28, 2026

Nvidia's $150 billion commitment to Taiwan represents a decisive realignment of AI infrastructure power away from Washington's policy ambitions and toward the physical realities of chips, power, and proximity to manufacturing. The decision is not a vote of confidence in US incentives but a calculation that Taiwan's existing TSMC ecosystem offers lower friction than any regulatory environment. Snowflake's $6 billion AWS deal for AI CPUs and DigitalBridge's $1 billion acquisition of ArcLight energy underscore the same pattern: capital flows to whoever guarantees reliable compute and power, not to whoever promises the best policy. This shift is happening just as the gap between AI's claimed capabilities and its actual performance widens. Google's AI search results misspelling the company's own name, along with reports that agents ignore evidence and struggle to learn, expose fundamental brittleness in the technology. Yet Remote reports a 50% gain in revenue per employee from AI adoption, and Cognition has reached a $492 million annualized run rate. The resolution is structural rather than technical: organizations absorb failures through human review, sandboxed environments, and constrained deployment. Productivity gains are real. Safety is purchased through constraint.

The competitive battleground has shifted from foundation model capability to the infrastructure that coordinates multiple agents across fragmented enterprise systems. OpenAI's positioning of Codex as the connective tissue for agent orchestration and Nvidia's framing of AI factories as token factories that convert power into intelligence both signal that near-term value capture lies in inference economics (cost per token and performance per watt) rather than raw model capability. Yet ITBench results published on Hugging Face showed frontier models scoring below 50 percent on agentic enterprise IT tasks, exposing the chasm between static benchmark performance and actual agent behavior in production workflows. Anthropic's coding agents for social science research and Hugging Face's local-first robotics deployment show the market already fragmenting by use case and deployment constraint. What remains conspicuously absent is any lab claiming general superiority in agentic reasoning; instead, announcements focus on embedding agents into specific workflows where switching costs are real.

Regulation is becoming a competitive advantage for incumbents. Illinois's passage of America's strongest AI safety bill, which requires third-party audits of models from OpenAI, Anthropic, and Google, creates compliance costs that lock in dominant players while raising barriers for new entrants. Cognition's $25 billion valuation after eight months, Kirkland & Ellis committing $500 million to build proprietary AI technology, and OpenAI's foundation allocating $250 million to research AI's economic impact all point to the same conclusion: winners are determined by speed and user lock-in before rules harden, not by regulatory favorability.

The GitHub trending set confirms this pattern. Infrastructure tooling like Streamlit and NocoBase gains traction because it replaces something developers were already doing. Meanwhile, repos claiming compatibility with twenty platforms and selling taste as a feature show coordinated promotion around specific AI platforms rather than organic adoption. The real signal concentrates in unglamorous places where the actual constraints live: cost, latency, and the gap between benchmark numbers and production behavior. That is where the work is happening.

Grant Calloway

AI Labs

AMD

Deep Dive Into 4-Wave Interleave FP8 GEMM

Anthropic

Economic Research: Coding agents in the social sciences

Google

Private analytics via zero-trust aggregation

NVIDIA

AI Factories: The New Infrastructure of Intelligence

Research Papers (Focused)

ANet Patu-1: The Value of Connection in the Agent Network cs.NI

The Internet taught us that the value of a network depends on how its nodes connect: broadcast stars scale as V ∝ N (Sarnoff), fully-connected meshes as N^2 (Metcalfe), and group-forming networks as 2^N (Reed). We ask the analogous question for networks of AI agents. We model the net value of connection as a function of coordination-group size, derive from it the properties an optimal collaboration protocol must have, and introduce ANet Patu-1, a self-organizing consensus protocol in which the network continuously re-forms its own coalitions, adaptively riding the upper envelope of all three regimes at O(1) parallel consensus rounds. To measure value without opinion-grading, we score an emergent protocol by formally specifying it and deriving its complexity, the way distributed algorithms are analyzed. Two results follow. (i) Emergence: a crowd of the cheapest model, when heterogeneous, starts weak, but its collective value compounds with N and overtakes a homogeneous crowd of a far stronger model, a crossover that marks a scaling law for collaboration rather than for scale. (ii) Reflexivity: a heterogeneous network, given only its own problem and no design hints, converges on ANet Patu-1 itself, reconstructing the high-dimensional law that governs its own connective value.
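The three classic laws the abstract cites can be compared directly. This is a minimal Python sketch that assumes unit constant factors (the abstract gives none) and ignores coordination costs, so it shows only gross value; it does not reproduce the paper's net-value model or ANet Patu-1 itself, and the helper names are hypothetical.

```python
# Gross value of a network of n nodes under the three classic laws.
# Constant factors are assumed to be 1 (illustrative only).


def sarnoff(n: int) -> int:
    """Broadcast star: value grows linearly with the audience, ~ N."""
    return n


def metcalfe(n: int) -> int:
    """Fully connected mesh: value tracks pairwise links, ~ N^2."""
    return n * n


def reed(n: int) -> int:
    """Group-forming network: value tracks possible subgroups, ~ 2^N.
    (Counting only groups of two or more members gives 2^N - N - 1.)"""
    return 2 ** n


def dominant_regime(n: int) -> str:
    """Name of the law with the largest gross value at size n (first listed wins ties)."""
    values = {"Sarnoff": sarnoff(n), "Metcalfe": metcalfe(n), "Reed": reed(n)}
    return max(values, key=values.get)


for n in (3, 8, 16):
    print(n, sarnoff(n), metcalfe(n), reed(n), dominant_regime(n))
```

With unit constants the mesh law edges out Reed's law at N = 3 (9 vs. 8), but 2^N dominates from N = 5 onward (32 vs. 25), which is why group formation matters most as agent counts grow.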

ADORN: Adaptive Drift handling for Open RAN using Reinforcement Learning cs.NI

Dynamic traffic variations in Open Radio Access Networks (O-RAN) lead to drift, which degrades the performance of Artificial Intelligence/Machine Learning (AI/ML) models. Traditional retraining approaches maintain forecasting accuracy but incur high computational cost and may lead to violations of Service Level Agreements (SLAs). This work proposes a Q-learning-based adaptive retraining approach that formulates the retraining decision as a Markov Decision Process (MDP), where a Reinforcement Learning (RL) agent learns a policy that balances forecasting accuracy and retraining cost. The proposed approach incorporates a multi-expert Long Short-Term Memory (LSTM) ensemble to mitigate catastrophic forgetting and improve robustness across diverse traffic conditions. Experimental results show that the proposed approach effectively reduces retraining overhead compared to greedy and random baselines, while maintaining system performance within predefined limits.
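The retrain-or-wait decision the abstract describes maps naturally onto tabular Q-learning. The sketch below is a toy, not ADORN: the four discrete drift levels, the deterministic drift dynamics, the per-step error penalty equal to the drift level, and the retraining cost of 5 are all assumptions chosen for illustration, and the paper's multi-expert LSTM ensemble is omitted entirely.

```python
import random

# Hypothetical toy MDP (not the ADORN paper's formulation):
# state  = discretized drift level 0..3 (0 = model freshly retrained)
# action = KEEP serving the current model, or RETRAIN it
N_STATES = 4
KEEP, RETRAIN = 0, 1
RETRAIN_COST = 5.0  # assumed compute cost of one retraining run
GAMMA = 0.9         # discount factor


def step(state: int, action: int) -> tuple[int, float]:
    """Deterministic toy dynamics: drift grows one level per step unless we retrain."""
    if action == RETRAIN:
        return 0, -RETRAIN_COST
    next_state = min(state + 1, N_STATES - 1)
    # Forecast error is assumed proportional to the drift level after the step.
    return next_state, -float(next_state)


def q_learning(steps: int = 50_000, alpha: float = 0.1,
               epsilon: float = 0.2, seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # optimistic start: true values are negative
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection (ties go to KEEP).
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = KEEP if q[state][KEEP] >= q[state][RETRAIN] else RETRAIN
        next_state, reward = step(state, action)
        # Standard off-policy Q-learning update toward the bootstrapped target.
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q: list[list[float]]) -> list[str]:
    return ["keep" if row[KEEP] >= row[RETRAIN] else "retrain" for row in q]


q_table = q_learning()
print(greedy_policy(q_table))
```

Under these assumed costs, solving the Bellman equations by hand gives an optimal policy of keeping the model at drift levels 0 and 1 and retraining from level 2 onward, and the learned table should reproduce it. Raising RETRAIN_COST tends to push the retraining threshold to higher drift levels, which is the accuracy-versus-cost trade-off the abstract describes.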

Spatio-Temporal Scheduling Prediction Under Backhaul Delay for Resilient Coordinated Beamforming cs.NI

Coordinated beamforming (CBF) in distributed 5G networks relies on the timely exchange of inter-cell scheduling information, but backhaul latency makes this information stale. Even a single transmission time interval (TTI) of delay can reduce the performance of CBF with signal-to-leakage-plus-noise ratio precoding (CBF-SLNR) below the uncoordinated baseline, because the precoder suppresses interference toward users that are no longer active. Coordination on stale information is therefore worse than no coordination at all. To address this, we propose a two-stage predictive framework in which a Spectral Temporal Graph Neural Network (StemGNN) predicts future user equipment (UE) scheduling states from delayed historical observations, and the predictions replace the stale inputs to the CBF-SLNR precoder. Evaluated on a three-cell massive MIMO downlink with 60 UEs and 64 antennas per base station under QuaDRiGa Urban Micro (UMi) channels and a proportional fair scheduler, StemGNN achieves a mean scheduling prediction accuracy of 87.57%, outperforming LSTM, GRU, Simple RNN, and Markov chain baselines at all evaluated horizons, with gains of up to 7.71% over LSTM at longer horizons, where inter-UE structural dependencies dominate over temporal autocorrelation. When integrated into coordinated beamforming, the predictions recover 57-73% of the sum-rate loss caused by one TTI of backhaul delay, improving sum rate by 9.58-14.35% over the no-prediction baseline and recovering up to 83% of the Lag-1 fairness loss for cell-edge users, with fairness gains persisting at higher lag values where throughput gains diminish. These results show that treating backhaul latency as a spatio-temporal forecasting problem is an effective approach to robust inter-cell coordination in delay-constrained networks.
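The abstract reports two related figures: the share of the delay-induced sum-rate loss that prediction recovers, and the relative sum-rate gain over the no-prediction baseline. Assuming the usual definitions (the abstract does not spell them out), both follow from three sum rates: with fresh scheduling information, with one-TTI-stale information, and with StemGNN predictions. The function names are hypothetical and the rates below are placeholders, not results from the paper.

```python
def recovery_fraction(r_fresh: float, r_stale: float, r_pred: float) -> float:
    """Share of the delay-induced sum-rate loss recovered by prediction (assumed definition)."""
    loss = r_fresh - r_stale
    if loss <= 0:
        raise ValueError("stale scheduling info should lower the sum rate")
    return (r_pred - r_stale) / loss


def relative_gain(r_stale: float, r_pred: float) -> float:
    """Relative sum-rate improvement over the no-prediction (stale-input) baseline."""
    return (r_pred - r_stale) / r_stale


# Hypothetical sum rates in bit/s/Hz, chosen only to make the arithmetic easy.
r_fresh, r_stale, r_pred = 60.0, 50.0, 56.0
print(f"recovered {recovery_fraction(r_fresh, r_stale, r_pred):.0%} of the loss")
print(f"gain over baseline: {relative_gain(r_stale, r_pred):.0%}")
```

The two percentages differ because the gain is measured against the degraded baseline, while the recovery fraction is measured against the size of the loss itself.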

From Agentic to Autogenic Network Management for AI-Native 6G and Beyond: A Standards Perspective cs.NI

Standards bodies, including TM Forum, 3GPP, and ETSI, are converging on Agentic AI as the foundation for next-generation network management, where Large AI Model (LAM)-based agents autonomously interpret intent, coordinate resources, and adapt operational behaviors at runtime. However, achieving this vision at the scale and complexity of 6G networks requires management systems that can generate and evolve their own automation software during operation. We introduce Autogenic network management, a reference architecture that extends agentic capabilities with self-programming, self-reflection, self-orienting, and self-architecting capabilities. The architecture supports a practical staged deployment, beginning with human-supervised LAM-based agents and progressing toward autonomous operation as confidence builds. We demonstrate the approach through high-priority operator scenarios drawn from TM Forum's autonomous network use cases, showing how autogenic management addresses real operational challenges. We conclude with a research roadmap outlining the technical advances needed to make autogenic network management realistic in future 6G networks.

Agentic AI for IPoDWDM Network Lifecycle Automation: An MCP-Enabled Architecture cs.NI

We present a distributed, vendor-agnostic multi-MCP (Model Context Protocol) architecture for SDN-based automation and autonomous control of multi-vendor, multi-layer IPoDWDM networks. The framework enables end-to-end (E2E) service lifecycle automation and closed-loop cross-layer control using a GNPy model and optical telemetry, and it is experimentally validated on an IPoDWDM testbed.

MCP-Enabled Agentic AI for Autonomous IPoDWDM Network Lifecycle Automation cs.NI

This demo presents an MCP-enabled agentic AI architecture for autonomous control of vendor-agnostic IPoDWDM networks. We demonstrate live, end-to-end, multi-layer lifecycle automation and closed-loop control using GNPy and telemetry, validated on a real testbed.
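Both IPoDWDM entries above rest on the Model Context Protocol, in which an agent invokes capabilities that a server exposes as tools. The Python sketch below builds the kind of JSON-RPC 2.0 tools/call request an agent would send to such a server. The tool name set_channel_launch_power and its arguments are hypothetical, not taken from either paper, and the sketch omits the initialization handshake, tool discovery (tools/list), and the transport layer.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool on an optical-domain MCP server; the name and
# arguments are illustrative only, not from the papers.
request = mcp_tool_call(
    "set_channel_launch_power",
    {"node": "roadm-a", "channel": 32, "power_dbm": 1.5},
)
print(request)
```

In a closed loop like the one the abstracts describe, an agent would alternate calls like this with telemetry reads and a GNPy-based estimate before deciding the next adjustment; that loop is not shown here.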

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	Speed (tok/s)	Price ($/1M tokens)
1	GPT-5.5	60.2	81	$11.25
2	Claude Opus 4.7	57.3	55	$10.94
3	Gemini 3.1 Pro Preview	57.2	132	$4.50
4	GPT-5.4	56.8	89	$5.63
5	Qwen3.7 Max	56.6	206	$3.75
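The table's columns combine into the inference-economics figures the editorial emphasizes. This minimal Python sketch computes the cost and single-stream generation time of a hypothetical 2,000-token job, plus a crude score-per-dollar ratio. It assumes the $/1M column is one flat price per million tokens and that tok/s is a sustained single-stream output rate; the table does not say how either is measured, and score per dollar is an illustrative ratio, not a metric the table reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    name: str
    score: float            # Intelligence Index composite score
    tokens_per_s: float     # output speed in tokens per second
    usd_per_million: float  # price in USD per 1M tokens (assumed flat rate)


ROWS = [
    ModelRow("GPT-5.5", 60.2, 81, 11.25),
    ModelRow("Claude Opus 4.7", 57.3, 55, 10.94),
    ModelRow("Gemini 3.1 Pro Preview", 57.2, 132, 4.50),
    ModelRow("GPT-5.4", 56.8, 89, 5.63),
    ModelRow("Qwen3.7 Max", 56.6, 206, 3.75),
]


def job_cost_usd(row: ModelRow, tokens: int) -> float:
    """Cost of generating `tokens` tokens at the listed per-million price."""
    return tokens / 1_000_000 * row.usd_per_million


def job_seconds(row: ModelRow, tokens: int) -> float:
    """Single-stream generation time at the listed throughput."""
    return tokens / row.tokens_per_s


def score_per_dollar(row: ModelRow) -> float:
    """Crude value ratio: index points per dollar of the 1M-token price."""
    return row.score / row.usd_per_million


JOB_TOKENS = 2_000  # hypothetical response length
for row in sorted(ROWS, key=score_per_dollar, reverse=True):
    print(f"{row.name:<24} {score_per_dollar(row):5.2f} pts/$  "
          f"${job_cost_usd(row, JOB_TOKENS):.4f}  {job_seconds(row, JOB_TOKENS):5.1f} s")
```

On these numbers Qwen3.7 Max ranks first on both score per dollar and speed, while GPT-5.5 holds the top score at three times Qwen's price; which trade-off matters depends on whether a workload is bound by quality, latency, or budget.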

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model	Score
1	gpt-5.5-2026-04-23-xhigh	62.7%
2	Codex	60.4%
3	Claude Code	59.6%
4	gpt-5.5-2026-04-23-medium	58.9%
5	gpt-5.4-2026-03-05-medium	54.9%

GitHub Repos

Trending

Lum1104/Understand-Anything

43640 ★

Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more.

hardikpandya/stop-slop

7152 ★

A skill file for removing AI tells from prose

affaan-m/ECC

225416 ★

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

anthropics/knowledge-work-plugins

17517 ★

Open source repository of plugins primarily intended for knowledge workers to use in Claude Cowork

Leonxlnx/taste-skill

59585 ★

Taste-Skill: gives your AI good taste. Stops the AI from generating boring, generic slop.

Daily discovery

kubeflow/katib (AutoML)

1684 ★

Automated Machine Learning on Kubernetes

nocobase/nocobase (AI Agents)

22551 ★

NocoBase is an open-source AI + no-code platform for building business systems fast. Instead of generating everything from scratch, AI works on top of production-proven infrastructure and a WYSIWYG no-code interface, so you get both speed and reliability.

schmitech/orbit (AI Safety)

281 ★

A self-hosted AI infrastructure for private RAG and multi-model applications.

streamlit/streamlit (Deep Learning)

44735 ★

Streamlit — A faster way to build and share data apps.

666DZY666/micronet (Model Compression)

2270 ★

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa; Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.
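The 8-bit post-training quantization listed above can be illustrated with the simplest calibration scheme: symmetric, per-tensor, max-abs scaling. This is a generic Python sketch, not micronet's or TensorRT's API; production toolchains often use per-channel scales and calibration more robust than a plain maximum, and the function names and weights here are made up for illustration.

```python
def calibrate_scale(samples: list[float], num_bits: int = 8) -> float:
    """Symmetric max-abs calibration: the largest |x| maps to the top integer code."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(x) for x in samples)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: list[float], scale: float, num_bits: int = 8) -> list[int]:
    """Round to the nearest code (Python rounds half to even) and clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


# Hypothetical weights, also used as the calibration set.
weights = [0.5, -1.27, 0.013, 0.9]
scale = calibrate_scale(weights)
codes = quantize(weights, scale)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error <= scale / 2)
```

Because nothing falls outside the calibrated range, every round-trip error stays within half a quantization step; the 0.013 weight, for example, lands on code 1 and comes back as roughly 0.01.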