The Inference Report

June 1, 2026

Capital is abandoning manufacturing for silicon. SoftBank's market value now exceeds Toyota's, marking the definitive end of Japan's industrial era and the beginning of its AI hardware bet. Ardian is building a five-billion-euro data center outside Paris. Intel will ship inference GPUs by year end. The infrastructure thesis is being priced in at scale: money that moved from pipes to software has now swung decisively back to pipes. What's being tested now is whether returns justify the deployment, and whether the public will tolerate the environmental and labor costs of getting there.

NVIDIA has moved beyond selling chips. At Computex, the company announced a unified stack: hardware, software layers, reference designs, and open-source tools, now joined by NVIDIA Cosmos 3's integration into the Hugging Face model hub. The architecture makes building on NVIDIA infrastructure cheaper and faster than building anywhere else, and NVIDIA calls that openness. Taiwan's manufacturing ecosystem serves as both proof of concept and supply-chain lock-in. Token demand is exploding. Every factory, hospital, and developer scaling agentic AI is being funneled toward the same compute layer. This isn't about winning deals. It's about making the alternative friction-filled enough that builders stop considering it.

Beneath this confidence sits a widening credibility gap. Gen Z uses AI more than any other cohort but increasingly views it as a threat rather than an opportunity. Erin Brockovich's scrutiny of data center secrecy suggests environmental and transparency concerns will become friction points as capacity demands accelerate. Wall Street strategists are shrugging off bubble warnings while betting on AI stocks to defy gravity, a posture that depends entirely on the infrastructure thesis holding.

Developer energy has already shifted from "what if we use an LLM" to "how do we actually operate these systems." GitHub trending shows agent coordination tools like Harness, CORAL, and Pi Subagents solving real problems that didn't exist two years ago. Aegis enforces runtime policies with cryptographic audit trails. Claude Code dominates, but the deeper signal is the emergence of glue tools: MarkItDown converts documents to Markdown, Scrapling handles web scraping, and Supermemory builds memory layers. Openpilot remains the most interesting robotics repo because it ships actual code running on actual cars, not research simulations. The practical layer is solidifying while the infrastructure layer consolidates.

Grant Calloway

AI Labs

Hugging Face

Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action

NVIDIA

From the Wire

Research Papers: Focused

The Energy Society: A Simulation Environment for Studying Agent Cooperation under Survival Pressure cs.MA

LLM-based agents are increasingly deployed in multi-agent environments whose incentives can shape their behavior. We introduce The Energy Society, a minimal survival economy for studying how competitive and cooperative incentives affect emergent behavior when inference cost is directly tied to survival: Agents spend energy based on model size when generating tokens, regain energy by completing jobs or receiving donations, and deactivate if their energy reaches zero. We compare competitive and cooperative objectives against a baseline setting and several control variants. Across experiments, larger models consistently consume the most energy and spend more energy than they gain, even in those settings where token cost is not size-dependent. Cooperative incentives substantially alter behavior: agents donate to reactivate others, sometimes at the cost of their own survival, and job allocation changes. Ablations reveal that allowing agents to recommend actions to each other supports coordination and ambitious job selection, while memory helps agents calibrate risk from past outcomes. Agents rarely choose direct sabotage, but show more subtle signs of self-serving behavior in the competitive setting. The Energy Society is a compact testbed for studying the interaction between token costs and group incentives under survival pressure. Source code is available at https://github.com/LucasBergholdt/EnergySociety
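The survival economy described above reduces to a per-agent energy ledger. The sketch below illustrates only that ledger and is not the authors' code (theirs is in the linked repository): the `size_b` parameter, the per-token cost constant, and the donation rule are hypothetical choices made for clarity, covering the size-dependent cost setting.

```python
from dataclasses import dataclass

# Hypothetical energy cost per generated token, per billion parameters.
COST_PER_TOKEN_PER_B = 0.001


@dataclass
class Agent:
    name: str
    size_b: float          # hypothetical model size in billions of parameters
    energy: float
    active: bool = True


def spend(agent: Agent, tokens: int) -> None:
    """Charge energy for generated tokens; larger models pay more per token."""
    if not agent.active:
        return
    agent.energy -= tokens * agent.size_b * COST_PER_TOKEN_PER_B
    if agent.energy <= 0:
        agent.energy = 0.0
        agent.active = False           # energy hit zero: the agent deactivates


def complete_job(agent: Agent, reward: float) -> None:
    """Completing a job restores energy to an active agent."""
    if agent.active:
        agent.energy += reward


def donate(donor: Agent, recipient: Agent, amount: float) -> None:
    """Transfer energy; a positive balance reactivates the recipient.

    A donor may give away everything it has and deactivate itself,
    mirroring the self-sacrificing donations the abstract reports.
    """
    if not donor.active or amount <= 0:
        return
    amount = min(amount, donor.energy)
    donor.energy -= amount
    if donor.energy <= 0:
        donor.energy = 0.0
        donor.active = False
    recipient.energy += amount
    if recipient.energy > 0:
        recipient.active = True
```

For example, with 100 starting energy, a 70B agent that generates 2,000 tokens spends 140 units and deactivates, while a 7B agent spends 14 and can then donate 30 to bring the larger agent back.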

Social Simulations: from Agent-Based Modeling to Digital Twins cs.MA

This book chapter covers the evolution of social simulation from classical agent-based models, in which agents interact according to explicitly defined behavioral rules, to AI-enhanced simulations based on Large Language Models and, ultimately, Social Digital Twins: high-fidelity, data-driven representations of real-world socio-technical systems. Along this trajectory, we discuss the main methodological foundations, applications, advantages, and limitations of each paradigm, highlighting the progressive shift from abstract models designed to investigate general social mechanisms toward increasingly realistic computational representations of specific social systems.

MetaInfer: A Knowledge Only LLM Inference Engine Generator SKILL Toolbox cs.MA

As LLM technology advances, the space of model families, compute hardware, quantization schemes, parallelization strategies, and specialized optimization kernels continues to expand, sharply increasing the code complexity and maintenance cost of general-purpose inference frameworks. Conventional software engineering uses multiple layers of abstraction to support diverse application scenarios, but these abstractions also increase system complexity and may introduce additional performance overhead. This paper presents MetaInfer, an 'LLM-as-Compiler' approach in which users specify only the runtime constraints of an inference program. An LLM-driven multi-agent collaboration system, coupled with a contract knowledge base (CKB), then automatically generates a compact customized inference framework that satisfies these constraints. We evaluate MetaInfer from three perspectives: the effect of source-code reference, the runtime behavior and performance profile of engines generated under the zero-reference constraint on CKB-covered targets, and knowledge-base evolution for new model and platform scenarios. The results show that MetaInfer organizes generation constraints, validation feedback, and knowledge consolidation into a continuous closed loop, enabling runnable customized inference solutions to be generated from explicit knowledge. The code is publicly available at https://github.com/MetaInfer/MetaInfer.
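The closed loop in this abstract (constraints in, engine out, validation feedback consolidated back into the knowledge base) can be sketched as a small driver. Everything here is hypothetical: `Constraints`, the `generate` and `validate` callables, and the dictionary standing in for the contract knowledge base are illustrative stand-ins, not MetaInfer's interfaces. In the real system, generation is performed by the LLM-driven multi-agent collaboration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Constraints:
    """Hypothetical runtime constraints a user might state for an inference engine."""
    model_family: str
    hardware: str
    quantization: str


# Hypothetical stand-in for the contract knowledge base: notes keyed by target.
KnowledgeBase = dict[tuple[str, str, str], list[str]]


def closed_loop(
    c: Constraints,
    kb: KnowledgeBase,
    generate: Callable[[Constraints, list[str]], str],
    validate: Callable[[str], Optional[str]],
    max_iters: int = 5,
) -> Optional[str]:
    """Generate an engine, validate it, and fold the outcome back into the KB.

    `generate` maps constraints plus accumulated notes to engine source code;
    `validate` returns None on success or an error message on failure.
    Returns the first engine that validates, or None after `max_iters` tries.
    """
    key = (c.model_family, c.hardware, c.quantization)
    notes = kb.setdefault(key, [])        # an uncovered target starts with no notes
    for _ in range(max_iters):
        engine = generate(c, notes)
        error = validate(engine)
        if error is None:
            notes.append("validated")     # consolidate the success
            return engine
        notes.append(error)               # validation feedback becomes knowledge
    return None


# Toy stubs standing in for the LLM agents and the validation harness.
def demo_generate(c: Constraints, notes: list[str]) -> str:
    return "engine-v2" if "missing kernel" in notes else "engine-v1"


def demo_validate(engine: str) -> Optional[str]:
    return None if engine == "engine-v2" else "missing kernel"
```

With these stubs, `closed_loop(Constraints("llama", "cuda", "int4"), {}, demo_generate, demo_validate)` fails once, records "missing kernel" for that target, and returns "engine-v2" on the second attempt. Passing the knowledge base in explicitly keeps what was learned available to later runs on the same target.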

Distributed Agent System: Fault-Tolerant Collaboration Among Embodied Agents cs.MA

AI engineering is shifting from passive text generation by large language models (LLMs) to agent-driven task execution, creating new reliability challenges for long-horizon tasks under resource constraints and environmental uncertainty. Conventional error-elimination optimization strategies fail to address cumulative error propagation. This paper proposes Distributed Agent System (DAS), a device-edge-cloud framework for fault-tolerant collaboration among heterogeneous agents. We redefine agent reliability as system-level fault tolerance rather than single-turn zero-error accuracy, and present a two-layer fault-tolerance architecture: single-agent execution reliability via fault-tolerant alignment, and cross-agent communication reliability via semi-formal language protocols. This framework provides a practical engineering pathway for reliable collaboration among heterogeneous embodied agents in industrial scenarios.
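A quick calculation shows why single-turn accuracy is a weak proxy for long-horizon reliability, which motivates the system-level framing above. The sketch assumes independent step failures that are always detected and can be retried up to a fixed budget. These are simplifying assumptions for illustration, not the fault model used by DAS.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that n independent steps all succeed on the first attempt."""
    return p_step ** n_steps


def chain_success_with_retries(p_step: float, n_steps: int, attempts: int) -> float:
    """Probability the chain completes when each step gets `attempts` tries.

    Assumes failures are independent and always detected, so a failed step
    can be retried; undetected errors would still propagate.
    """
    p_step_recovered = 1 - (1 - p_step) ** attempts
    return p_step_recovered ** n_steps


for n in (10, 50, 100):
    print(n, round(chain_success(0.99, n), 3),
          round(chain_success_with_retries(0.99, n, 3), 4))
```

At 99% per-step accuracy, a 100-step task finishes cleanly only about 37% of the time (0.99^100 ≈ 0.366), while three attempts per detected failure lift the same chain to roughly 99.99%. The gain depends entirely on detecting failures, which is why fault tolerance is a system property rather than a per-turn one.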

Auditing Belief-Conditioned LLM Agents in Hidden-Information Social Deduction Games cs.MA

Evaluating LLM agents in hidden-information multi-agent settings is hard: final outcomes are high-variance and rarely reveal why an agent decided as it did. We study this in a 9-player Werewolf environment where agents act under strict, code-level information isolation, and we build an auditable framework that maintains an external belief state over hidden roles, logs belief updates and belief-action deviations as structured evidence, and supports a defensive offline improvement loop that reviews bad cases before any strategy change. Across 1,080 frozen games spanning belief-disabled, active-belief, kernel-ablation, camp-restricted, consumption-policy, and high-load arms, and including a seed-paired A0/A1 comparison, the active-belief condition is associated with substantially better good-side outcomes: in the 200-seed A0/A1 comparison the good-side win rate rises from 0.205 to 0.390 (paired McNemar $\chi^2 = 16.4$, $p < 0.001$), with fewer irreversible witch-poison errors. We do not, however, attribute this shift to belief content. Direct action-belief consistency is low ($\approx 0.21$), and giving belief only to the werewolves helps the good side more than giving it only to the good side, which argues against a simple holder-benefit account; we therefore report the effect as an association and treat its mechanism as unresolved. The contribution is the audit framework itself: it makes the effect measurable, exposes low direct action-belief consistency, rejects an unreliable forced-consumption intervention with evidence, and separates strategy effects from load confounds. We accordingly position external belief in high-noise hidden-information games primarily as an auditable cognitive baseline that also carries decision-relevant signal, turning opaque agent behavior into replayable evidence for safer, controlled iteration.
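The paired comparison above uses McNemar's test, which looks only at discordant pairs: seeds where exactly one arm won. The function below implements the textbook chi-square statistic with one degree of freedom, with optional continuity correction; the abstract does not say which variant the authors used, and their discordant-pair counts are not given here. For orientation, win rates of 0.205 and 0.390 over 200 seeds correspond to 41 and 78 good-side wins.

```python
import math


def mcnemar(b: int, c: int, continuity: bool = False) -> tuple[float, float]:
    """McNemar's test for paired binary outcomes.

    b: pairs where the first arm won and the second lost.
    c: pairs where the first arm lost and the second won.
    Returns (chi-square statistic with 1 degree of freedom, two-sided p-value).
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)          # Edwards' continuity correction
    stat = diff * diff / n
    # Chi-square(1) is a squared standard normal: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

With hypothetical counts `b=10`, `c=30`, the statistic is 20² / 40 = 10.0 (p ≈ 0.0016). Because concordant pairs drop out, seed pairing removes much of the game-to-game variance the authors flag as the core evaluation problem.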

Multi-Agent LLMs Fail to Explore Each Other cs.MA

Exploration is essential for reliable autonomy in multi-agent systems, yet it remains unclear whether large language model (LLM) agents can explore effectively when interacting with one another. We show that modern LLM agents fail to do so, often exhibiting myopic and polarized interaction patterns that lead to suboptimal coordination and increased regret. We formalize this challenge as the Multi-Agent Exploration problem, modeling it as a partially observable stochastic game (POSG) in which agents must probe peers to infer their capabilities and identify effective interaction strategies. To address this, we introduce Multi-Agent Contextual Exploration (MACE), a lightweight framework that explicitly promotes exploration through structured peer selection. Across both contextual and parametric diversity settings, MACE substantially improves exploration behavior and downstream task performance. We further show theoretically that the value of exploration increases with agent diversity. Overall, our results highlight a fundamental limitation of current LLM agents and underscore the importance of explicitly guided exploration for reliable multi-agent autonomy. Code will be released at https://github.com/deeplearning-wisc/mace
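The abstract does not spell out MACE's peer-selection rule, so the sketch below swaps in a standard technique, UCB1, to show what explicitly promoting exploration in peer selection can look like. Each peer is treated as a bandit arm, and an optimism bonus pushes the agent to try under-sampled peers instead of locking onto whichever peer looked best early, which is the myopic pattern described above. The class name, reward scale, and simulated success rates are assumptions.

```python
import math
import random


class UCBPeerSelector:
    """Pick interaction partners with UCB1 (a standard bandit rule, not MACE itself)."""

    def __init__(self, peers: list[str], c: float = math.sqrt(2)):
        self.c = c                                   # exploration strength
        self.counts = {p: 0 for p in peers}          # times each peer was chosen
        self.totals = {p: 0.0 for p in peers}        # summed rewards per peer
        self.t = 0                                   # total selections so far

    def select(self) -> str:
        self.t += 1
        for peer, n in self.counts.items():          # probe every peer once first
            if n == 0:
                return peer

        def score(peer: str) -> float:
            mean = self.totals[peer] / self.counts[peer]
            bonus = self.c * math.sqrt(math.log(self.t) / self.counts[peer])
            return mean + bonus                      # optimism toward rarely tried peers

        return max(self.counts, key=score)

    def update(self, peer: str, reward: float) -> None:
        """Record the outcome of working with `peer` (reward in [0, 1])."""
        self.counts[peer] += 1
        self.totals[peer] += reward


def simulate(true_rates: dict[str, float], rounds: int, seed: int = 0) -> dict[str, int]:
    """Run the selector against peers with hidden success rates; return pick counts."""
    rng = random.Random(seed)
    selector = UCBPeerSelector(list(true_rates))
    for _ in range(rounds):
        peer = selector.select()
        selector.update(peer, 1.0 if rng.random() < true_rates[peer] else 0.0)
    return selector.counts
```

A purely greedy selector that always picks the best observed mean can get stuck on a mediocre peer after a few lucky rounds; the logarithmic bonus keeps revisiting peers until the evidence rules them out, which is what keeps regret low.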

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	tok/s	$/1M tokens
1	Claude Opus 4.8	61.4	63	$10.94
2	GPT-5.5	60.2	66	$11.25
3	Claude Opus 4.7	57.3	60	$10.94
4	Gemini 3.1 Pro Preview	57.2	144	$4.50
5	GPT-5.4	56.8	86	$5.63
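The speed and price columns translate directly into latency and budget for a given workload. The sketch assumes the `$/1M` column is a single blended price per million tokens and `tok/s` is per-request output throughput; the table specifies neither, and real billing usually prices input and output tokens separately. The 50M-token month and 2,000-token answer are hypothetical workloads.

```python
# Rows copied from the Intelligence Index table: (model, score, tok/s, $ per 1M tokens).
ROWS = [
    ("Claude Opus 4.8", 61.4, 63, 10.94),
    ("GPT-5.5", 60.2, 66, 11.25),
    ("Claude Opus 4.7", 57.3, 60, 10.94),
    ("Gemini 3.1 Pro Preview", 57.2, 144, 4.50),
    ("GPT-5.4", 56.8, 86, 5.63),
]


def workload_cost(price_per_million: float, tokens: int) -> float:
    """Dollar cost of `tokens` tokens at an assumed blended per-million-token price."""
    return price_per_million * tokens / 1_000_000


def generation_seconds(tokens_per_second: float, tokens: int) -> float:
    """Wall-clock time to stream `tokens` tokens from a single request."""
    return tokens / tokens_per_second


for model, score, tps, price in ROWS:
    cost = workload_cost(price, 50_000_000)   # hypothetical 50M-token month
    secs = generation_seconds(tps, 2_000)     # hypothetical 2,000-token answer
    print(f"{model:24} score={score:5.1f}  ${cost:8.2f}/month  {secs:5.1f}s per answer")
```

Under these assumptions, the month costs about $547 on Claude Opus 4.8 versus $225 on Gemini 3.1 Pro Preview, and a 2,000-token answer streams in roughly 32 s versus 14 s, for a 4.2-point difference in score.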

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model or agent	Score
1	gpt-5.5-2026-04-23-xhigh	62.7%
2	Codex	60.4%
3	Claude Code	59.6%
4	gpt-5.5-2026-04-23-medium	58.9%
5	Claude Opus 4.8-xhigh	56.4%

GitHub Repos

Trending

microsoft/markitdown

143443 ★

Python tool for converting files and office documents to Markdown.

D4Vinci/Scrapling

60665 ★

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

nesquena/hermes-webui

13269 ★

Hermes WebUI: The best way to use Hermes Agent from the web or from your phone!

EveryInc/compound-engineering-plugin

19242 ★

Official Compound Engineering plugin for Claude Code, Codex, Cursor, and more

github/docs

19815 ★

The open-source repo for docs.github.com

Daily discovery

VectifyAI/OpenKB (LLM)

1970 ★

OpenKB: Open LLM Knowledge Base

apache/hamilton (MLOps)

2503 ★

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage, tracing, and metadata. It runs and scales everywhere Python does.
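Hamilton's central idea is that each function is a dataflow node and its parameter names identify the upstream nodes it depends on, so lineage comes straight from function signatures. The toy resolver below reimplements only that idea with the standard library. It is not Hamilton's API (the real library builds and runs the graph through its `driver` module), and the `revenue`/`profit` nodes are made-up examples.

```python
import inspect
from typing import Any, Callable


def execute(
    nodes: dict[str, Callable[..., Any]],
    outputs: list[str],
    inputs: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, list[str]]]:
    """Compute `outputs`, wiring each function's parameters to same-named nodes.

    Returns the requested values plus a lineage map (node -> its direct inputs).
    """
    cache: dict[str, Any] = dict(inputs)
    lineage: dict[str, list[str]] = {}

    def resolve(name: str, stack: tuple[str, ...] = ()) -> Any:
        if name in cache:
            return cache[name]
        if name in stack:
            raise ValueError(f"cycle detected at {name!r}")
        if name not in nodes:
            raise KeyError(f"no node or input named {name!r}")
        deps = list(inspect.signature(nodes[name]).parameters)
        lineage[name] = deps                     # record which values fed this node
        args = {dep: resolve(dep, stack + (name,)) for dep in deps}
        cache[name] = nodes[name](**args)
        return cache[name]

    return {name: resolve(name) for name in outputs}, lineage


# Hypothetical dataflow: each function is a node; its parameters name its inputs.
def revenue(units: int, price: float) -> float:
    return units * price


def profit(revenue: float, cost: float) -> float:
    return revenue - cost


NODES = {"revenue": revenue, "profit": profit}
```

Calling `execute(NODES, ["profit"], {"units": 10, "price": 2.5, "cost": 5.0})` returns `{"profit": 20.0}` along with the lineage `profit <- revenue, cost` and `revenue <- units, price`. Because each node is a plain function, it can be unit-tested in isolation, which is the testability claim in the description.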

modelscope/modelscope (NLP)

9028 ★

ModelScope: bring the notion of Model-as-a-Service to life.

RunanywhereAI/runanywhere-sdks (Diffusion Models)

10347 ★

Production-ready toolkit to run AI locally

Justin0504/Aegis (AI Safety)

362 ★

Runtime policy enforcement for AI agents. Cryptographic audit trail, human-in-the-loop approvals, kill switch. Zero code changes.
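The features listed for Aegis (runtime policy checks, a cryptographic audit trail, human approval, and a kill switch) can be illustrated with a hash-chained audit log, a common way to make a log tamper-evident. This is a generic standard-library sketch, not Aegis's implementation or API: the class name, the blocked/approval tool sets, and the record fields are assumptions, and human-in-the-loop approval is modeled only as a `needs_approval` verdict.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


class AuditedGate:
    """Hypothetical policy gate for agent tool calls with a hash-chained audit log."""

    def __init__(self, blocked_tools: set[str], approval_tools: set[str]):
        self.blocked = blocked_tools
        self.approval = approval_tools
        self.killed = False
        self.log: list[dict[str, str]] = []

    def _append(self, entry: dict) -> None:
        # Each record commits to the previous record's hash, forming a chain.
        prev = self.log[-1]["hash"] if self.log else GENESIS
        body = json.dumps({**entry, "prev": prev}, sort_keys=True)
        self.log.append({"body": body, "hash": hashlib.sha256(body.encode()).hexdigest()})

    def check(self, tool: str, args: dict) -> str:
        """Return 'allow', 'deny', or 'needs_approval' and log the verdict.

        `args` must be JSON-serializable so it can be hashed into the log.
        """
        if self.killed or tool in self.blocked:
            verdict = "deny"
        elif tool in self.approval:
            verdict = "needs_approval"   # a human must sign off before execution
        else:
            verdict = "allow"
        self._append({"tool": tool, "args": args, "verdict": verdict})
        return verdict

    def kill(self) -> None:
        """Kill switch: deny every later call (denials are still logged)."""
        self.killed = True
        self._append({"tool": "<kill_switch>", "args": {}, "verdict": "kill"})

    def verify(self) -> bool:
        """Recompute the chain; an edited record or a broken link fails."""
        prev = GENESIS
        for rec in self.log:
            if json.loads(rec["body"])["prev"] != prev:
                return False
            if hashlib.sha256(rec["body"].encode()).hexdigest() != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

Editing any logged call makes `verify()` return False. A chain alone cannot stop someone who rewrites every later hash too, so systems like this typically also sign or externally anchor the latest hash.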