The Inference Report — March 1, 2026

The AI industry's central tension is no longer whether agents will work, but whether anyone can afford to run them at scale. That double bind defined the week's news across every sector. The Pentagon issued Anthropic an ultimatum that made plain what has been true for months: the era of AI companies negotiating their own terms with the military is over. Defense Secretary Pete Hegseth gave the company until February 27 to agree to broader military access or risk being designated a supply chain risk, and raised the Defense Production Act as a means to force compliance regardless. OpenAI simultaneously admitted that its own Pentagon deal was definitely rushed and that the optics did not look good. The companies that built the most capable models are discovering that capability is precisely what makes them indispensable to the state, and therefore subject to state demands.

That same dynamic, growing ambition colliding with stubborn constraints, played out across the infrastructure layer. NVIDIA posted a record $68 billion in fourth-quarter sales, AWS refreshed its compute lineup with the M8azn and C8id instance generations, and Railway raised $100 million to challenge AWS with AI-native cloud infrastructure, a telling signal that legacy providers are struggling to serve this new workload. But the physical world is pushing back: data center builders cannot convince farmers to sell their land even with million-dollar offers, while Microsoft and Anthropic have both pledged to cover electricity cost increases driven by their own expansion. Security is lagging as well. The agentic future is arriving before the safety scaffolding is in place, as the SANDWORM_MODE npm attack, the AWS Kiro incident, and security-driven restrictions on OpenClaw all show.

What may prove most durable, though, is the consumer market's quiet recalibration. Samsung's Galaxy S26 ships with more AI and a higher price tag, Amazon made Alexa+ free for Prime members, and the QuitGPT campaign is gaining traction among developers frustrated with meandering responses and $20 monthly fees. Claude Code's pricing, which reaches $200 per month, has opened a lane for free alternatives like Goose to gain ground. The line between what is genuinely useful and what is merely novel is getting sharper: Microsoft's markitdown solves a boring but real problem and has nearly 90,000 stars, while DeepSpeed and the Hugging Face transformers library form foundational infrastructure that shows up in dependency trees across thousands of projects. Dedicated memory systems for autonomous agents, such as memU, mark a maturation beyond the prompt-response paradigm, but the question hanging over every lab is whether the industry's pricing, product design, and value proposition will hold up under real usage.

Grant Calloway

AI Labs

AI21 Labs

AMD

Alibaba (Qwen)

Google DeepMind

Hugging Face

MiniMax

NVIDIA

OpenAI

Our agreement with the Department of War

From the Wire

Research Papers

SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems cs.RO

Safety-critical task planning in robotic systems remains challenging: classical planners suffer from poor scalability, Reinforcement Learning (RL)-based methods generalize poorly, and base Large Language Models (LLMs) cannot guarantee safety. To address this gap, we propose safety-generalizable large language models, named SafeGen-LLM. SafeGen-LLM can not only enhance the safety satisfaction of task plans but also generalize well to novel safety properties in various domains. We first construct a multi-domain Planning Domain Definition Language 3 (PDDL3) benchmark with explicit safety constraints. Then, we introduce a two-stage post-training framework: Supervised Fine-Tuning (SFT) on a constraint-compliant planning dataset to learn planning syntax and semantics, and Group Relative Policy Optimization (GRPO) guided by fine-grained reward machines derived from formal verification to enforce safety alignment and by curriculum learning to better handle complex tasks. Extensive experiments show that SafeGen-LLM achieves strong safety generalization and outperforms frontier proprietary baselines across multi-domain planning tasks and multiple input formats (e.g., PDDLs and natural language).
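The GRPO stage is the least familiar step in this pipeline. As a rough illustration, the sketch below shows the group-relative advantage GRPO is generally described with: several plans are sampled for the same task, and each plan's reward is normalized against the mean and standard deviation of its own group, so no learned value function is needed. The reward function `toy_safety_reward` is hypothetical and only stands in for the paper's reward machines derived from formal verification; the actions and the unsafe set are invented for the example.

```python
import statistics
from typing import List, Sequence, Set


def toy_safety_reward(plan: Sequence[str], unsafe: Set[str], goal: str) -> float:
    # Hypothetical fine-grained reward (not the paper's reward machine):
    # partial credit for the fraction of safe steps, plus credit for reaching the goal.
    if not plan:
        return 0.0
    safe_fraction = sum(action not in unsafe for action in plan) / len(plan)
    reached = 1.0 if plan[-1] == goal else 0.0
    return 0.5 * safe_fraction + 0.5 * reached


def grpo_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    # Group-relative advantage: score each sample against its own group's
    # mean and population standard deviation.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


group = [
    ["pick", "move", "place"],        # safe and reaches the goal
    ["pick", "enter_zone", "place"],  # reaches the goal with one unsafe step
    ["pick", "move"],                 # safe but stops short of the goal
]
rewards = [toy_safety_reward(plan, {"enter_zone"}, "place") for plan in group]
advs = grpo_advantages(rewards)
```

Plans that beat their group's average get a positive advantage and are reinforced; the unsafe-but-successful plan still scores above the incomplete one here, which is why a real reward design has to weigh safety violations carefully.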

Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion cs.DS

We present improved learning-augmented algorithms for finding an approximate minimum spanning tree (MST) for points in an arbitrary metric space. Our work follows a recent framework called metric forest completion (MFC), where the learned input is a forest that must be given additional edges to form a full spanning tree. Veldt et al. (2025) showed that optimally completing the forest takes $\Omega(n^2)$ time, but designed a 2.62-approximation for MFC with subquadratic complexity. The same method is a $(2\gamma + 1)$-approximation for the original MST problem, where $\gamma \geq 1$ is a quality parameter for the initial forest. We introduce a generalized method that interpolates between this prior algorithm and an optimal $\Omega(n^2)$-time MFC algorithm. Our approach considers only edges incident to a growing number of strategically chosen "representative" points. One corollary of our analysis is to improve the approximation factor of the previous algorithm from 2.62 for MFC and $(2\gamma + 1)$ for metric MST to 2 and $2\gamma$, respectively. We prove this is tight for worst-case instances, but we still obtain better instance-specific approximations using our generalized method. We complement our theoretical results with a thorough experimental evaluation.
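To make "edges incident to representative points" concrete, here is a simplified sketch assuming one representative per forest component (its first point, a hypothetical choice) and Euclidean points. For each pair of components it keeps only the cheapest edge touching either component's representative, then runs Kruskal's algorithm over the components. This is not the authors' algorithm or its approximation guarantees; it only shows why the number of distance evaluations scales with points times components rather than points squared.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def complete_forest(points: Sequence[Point], components: Sequence[Sequence[int]],
                    dist: Callable[[Point, Point], float] = math.dist
                    ) -> List[Tuple[int, int, float]]:
    """Return edges (u, v, weight) that join the forest components into one tree."""
    reps = [comp[0] for comp in components]  # hypothetical: first point represents its component
    candidates = []
    for i, j in combinations(range(len(components)), 2):
        # Only edges touching rep(i) or rep(j) are examined, never all |C_i| * |C_j| pairs.
        best = min(
            [(dist(points[reps[i]], points[b]), reps[i], b) for b in components[j]]
            + [(dist(points[reps[j]], points[a]), a, reps[j]) for a in components[i]]
        )
        candidates.append((best[0], i, j, best[1], best[2]))

    # Kruskal's algorithm over components, using a small union-find.
    parent = list(range(len(components)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    added = []
    for w, i, j, u, v in sorted(candidates):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            added.append((u, v, w))
    return added


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (20.0, 0.0)]
components = [[0, 1], [2, 3], [4]]  # connected components of the learned forest
added = complete_forest(points, components)  # [(1, 2, 9.0), (3, 4, 9.0)]
```

On this toy line the added edges match the optimal completion, but in general a single representative per component can miss the cheapest cross-component edge; a growing set of representatives, as the abstract describes, trades more distance evaluations for a better completion.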

Adaptive Combinatorial Experimental Design: Pareto Optimality for Decision-Making and Inference cs.LG

In this paper, we provide the first investigation into adaptive combinatorial experimental design, focusing on the trade-off between regret minimization and statistical power in combinatorial multi-armed bandits (CMAB). While minimizing regret requires repeated exploitation of high-reward arms, accurate inference on reward gaps requires sufficient exploration of suboptimal actions. We formalize this trade-off through the concept of Pareto optimality and establish equivalent conditions for Pareto-efficient learning in CMAB. We consider two relevant cases under different information structures, i.e., full-bandit feedback and semi-bandit feedback, and propose two algorithms MixCombKL and MixCombUCB respectively for these two cases. We provide theoretical guarantees showing that both algorithms are Pareto optimal, achieving finite-time guarantees on both regret and estimation error of arm gaps. Our results further reveal that richer feedback significantly tightens the attainable Pareto frontier, with the primary gains arising from improved estimation accuracy under our proposed methods. Taken together, these findings establish a principled framework for adaptive combinatorial experimentation in multi-objective decision-making.
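The regret-versus-inference tension can be seen in a small simulation. The sketch below is a generic illustration, not MixCombKL or MixCombUCB: it assumes Bernoulli arms, a top-k action set with semi-bandit feedback (a reward is observed for every chosen arm), and a hypothetical mixing rate `alpha` that sends a fraction of rounds to uniform random exploration instead of the UCB choice. Raising `alpha` gives suboptimal arms more samples, and hence tighter gap estimates, at the cost of extra regret.

```python
import math
import random
from typing import List, Sequence, Tuple


def ucb_index(reward_sum: float, count: int, t: int) -> float:
    # Unpulled arms get top priority; otherwise empirical mean plus a UCB bonus.
    if count == 0:
        return math.inf
    return reward_sum / count + math.sqrt(1.5 * math.log(t) / count)


def mixed_ucb_topk(means: Sequence[float], k: int, horizon: int, alpha: float,
                   seed: int = 0) -> Tuple[List[int], List[float]]:
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    sums = [0.0] * n
    for t in range(1, horizon + 1):
        if rng.random() < alpha:
            chosen = rng.sample(range(n), k)  # exploration round: serves inference on gaps
        else:
            chosen = sorted(range(n), key=lambda i: ucb_index(sums[i], counts[i], t),
                            reverse=True)[:k]  # exploitation round: serves low regret
        for i in chosen:
            reward = 1.0 if rng.random() < means[i] else 0.0  # semi-bandit: per-arm reward
            counts[i] += 1
            sums[i] += reward
    estimates = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(n)]
    return counts, estimates


counts, estimates = mixed_ucb_topk([0.9, 0.8, 0.3, 0.2], k=2, horizon=2000, alpha=0.2)
gap_estimate = estimates[1] - estimates[2]  # gap between the 2nd- and 3rd-best arms
```

With `alpha=0` the two weak arms would be sampled only as often as UCB needs to rule them out, which keeps regret low but leaves their mean estimates, and therefore the gap estimates, noisy.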

A Variational Estimator for $L_p$ Calibration Errors stat.ML

Calibration, the problem of ensuring that predicted probabilities align with observed class frequencies, is a basic desideratum for reliable prediction with machine learning systems. Calibration error is traditionally assessed via a divergence function, using the expected divergence between predictions and empirical frequencies. Accurately estimating this quantity is challenging, especially in the multiclass setting. Here, we show how to extend a recent variational framework for estimating calibration errors beyond divergences induced by proper losses, to cover a broad class of calibration errors induced by $L_p$ divergences. Our method can separate over- and under-confidence and, unlike non-variational approaches, avoids overestimation. We provide extensive experiments and integrate our code in the open-source package probmetrics (https://github.com/dholzmueller/probmetrics) for evaluating calibration errors.
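For contrast with the variational approach, the sketch below implements the classical equal-width binning estimator of an $L_p$ calibration error for binary predictions, assuming `probs` are predicted probabilities of the positive class. It is neither the paper's estimator nor the probmetrics API; binned estimators of this kind are the sort of non-variational baseline that can be biased, which is part of the paper's motivation.

```python
from typing import List, Sequence, Tuple


def binned_lp_calibration_error(probs: Sequence[float], labels: Sequence[int],
                                p: float = 2.0, n_bins: int = 10) -> float:
    """Equal-width binning estimate of the L_p calibration error (binary case)."""
    n = len(probs)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for q, y in zip(probs, labels):
        bins[min(int(q * n_bins), n_bins - 1)].append((q, y))  # q = 1.0 lands in the last bin
    total = 0.0
    for members in bins:
        if not members:
            continue
        confidence = sum(q for q, _ in members) / len(members)  # mean predicted probability
        frequency = sum(y for _, y in members) / len(members)   # observed positive rate
        total += (len(members) / n) * abs(confidence - frequency) ** p
    return total ** (1.0 / p)


probs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
print(round(binned_lp_calibration_error(probs, labels, p=2.0), 4))  # 0.15
```

Both bins miss by 0.15 in this toy example, so every choice of $p$ yields 0.15; with uneven per-bin gaps, larger $p$ weights the worst-calibrated bins more heavily.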

BLISSNet: Deep Operator Learning for Fast and Accurate Flow Reconstruction from Sparse Sensor Measurements physics.flu-dyn

Reconstructing fluid flows from sparse sensor measurements is a fundamental challenge in science and engineering. Widely separated measurements and complex, multiscale dynamics make accurate recovery of fine-scale structures difficult. In addition, existing methods face a persistent tradeoff: high-accuracy models are often computationally expensive, whereas faster approaches typically compromise fidelity. In this work, we introduce BLISSNet, a model that strikes a strong balance between reconstruction accuracy and computational efficiency for both flow reconstruction and nudging-based data assimilation. The model follows a DeepONet-like architecture, enabling zero-shot inference on domains of arbitrary size. After the first model call on a given domain, certain network components can be precomputed, leading to low inference cost for subsequent evaluations on large domains. Consequently, the model can achieve faster inference than classical interpolation methods such as radial basis function or bicubic interpolation. This combination of high accuracy, low cost, and zero-shot generalization makes BLISSNet well-suited for large-scale real-time flow reconstruction and data assimilation tasks.
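The abstract describes a DeepONet-like design in which some components can be precomputed after the first call on a domain. A minimal, untrained sketch of that general idea: a branch network encodes sensor readings into coefficients, a trunk network encodes query coordinates into basis features, and the field at each point is their dot product. Because the trunk depends only on coordinates, its features can be cached per grid and reused for every new set of measurements. Which BLISSNet components are actually precomputed is not stated here, so the cache in the hypothetical `TinyDeepONet` class is an assumption, and the layer sizes are arbitrary.

```python
import math
import random
from typing import Dict, List, Tuple

Layer = Tuple[List[List[float]], List[float]]


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Layer:
    # Random weights scaled by 1/sqrt(fan_in) and zero biases (untrained).
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(weights, bias, x, act=math.tanh) -> List[float]:
    return [act(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, bias)]


class TinyDeepONet:
    def __init__(self, n_sensors: int, n_basis: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.branch = init_layer(n_basis, n_sensors, rng)  # sensor readings -> coefficients
        self.trunk = init_layer(n_basis, 2, rng)           # (x, y) -> basis features
        self._cache: Dict[tuple, List[List[float]]] = {}

    def trunk_features(self, grid):
        key = tuple(grid)
        if key not in self._cache:  # computed on the first call for this domain only
            self._cache[key] = [dense(*self.trunk, xy) for xy in grid]
        return self._cache[key]

    def __call__(self, sensors, grid):
        coeffs = dense(*self.branch, sensors, act=lambda z: z)  # linear branch output
        return [sum(c * t for c, t in zip(coeffs, feats))
                for feats in self.trunk_features(grid)]


grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]  # 25 query points
model = TinyDeepONet(n_sensors=3)
field_a = model([0.1, 0.5, -0.2], grid)  # fills the trunk cache
field_b = model([0.2, 1.0, -0.4], grid)  # reuses the cached trunk features
larger = model([0.1, 0.5, -0.2], [(i / 8, 0.0) for i in range(9)])  # a different domain
```

After the first call, each new measurement set costs one branch evaluation plus a dot product per query point, which is the kind of saving that can make repeated inference on a fixed large domain cheaper than re-running a full interpolation.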

MuViT: Multi-Resolution Vision Transformers for Learning Across Scales in Microscopy cs.CV

Modern microscopy routinely produces gigapixel images that contain structures across multiple spatial scales, from fine cellular morphology to broader tissue organization. Many analysis tasks require combining these scales, yet most vision models operate at a single resolution or derive multi-scale features from one view, limiting their ability to exploit the inherently multi-resolution nature of microscopy data. We introduce MuViT, a transformer architecture built to fuse true multi-resolution observations from the same underlying image. MuViT embeds all patches into a shared world-coordinate system and extends rotary positional embeddings to these coordinates, enabling attention to integrate wide-field context with high-resolution detail within a single encoder. Across synthetic benchmarks, kidney histopathology, and high-resolution mouse-brain microscopy, MuViT delivers consistent improvements over strong ViT and CNN baselines. Multi-resolution MAE pretraining further produces scale-consistent representations that enhance downstream tasks. These results demonstrate that explicit world-coordinate modelling provides a simple yet powerful mechanism for leveraging multi-resolution information in large-scale microscopy analysis.
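MuViT's central move is applying rotary embeddings to shared world coordinates rather than to per-image patch indices. A minimal sketch, assuming the common 2D split in which half of each feature vector is rotated by the x coordinate and half by the y coordinate, with a hypothetical frequency base of 100: because rotary attention scores depend only on coordinate differences, a low-resolution patch and a high-resolution patch interact according to where they sit in the world, regardless of their native pixel grids. `patch_center_world` is a hypothetical helper mapping a patch index at a given downsampling factor to world coordinates.

```python
import math
from typing import List, Sequence, Tuple


def patch_center_world(row: int, col: int, scale: float, patch: int = 16) -> Tuple[float, float]:
    # Hypothetical mapping: a patch at downsampling factor `scale` spans
    # scale * patch world pixels, so its center sits half a patch in.
    return (scale * patch * (col + 0.5), scale * patch * (row + 0.5))


def rope_1d(vec: Sequence[float], pos: float, base: float = 100.0) -> List[float]:
    # Rotate consecutive pairs (v[i], v[i+1]) by the angle pos * base**(-i/d).
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out


def rope_2d(vec: Sequence[float], xy: Tuple[float, float]) -> List[float]:
    # First half of the features encodes x, second half encodes y;
    # len(vec) must be a multiple of 4 so each half splits into pairs.
    half = len(vec) // 2
    return rope_1d(vec[:half], xy[0]) + rope_1d(vec[half:], xy[1])


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.5, 0.7, 1.1, -0.4, 0.2, 0.9]
k = [0.8, 0.1, -0.6, 0.4, 0.0, 1.3, -0.7, 0.5]
coarse = patch_center_world(0, 0, scale=4)  # patch from a 4x-downsampled view
fine = patch_center_world(2, 2, scale=1)    # patch from the full-resolution view
score = dot(rope_2d(q, fine), rope_2d(k, coarse))
# Shifting both patches by the same world offset leaves the attention logit unchanged.
shifted = dot(rope_2d(q, (fine[0] + 100, fine[1] - 50)),
              rope_2d(k, (coarse[0] + 100, coarse[1] - 50)))
```

Using continuous world coordinates instead of integer patch indices is what lets patches from different magnifications share one positional frame inside a single encoder.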

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

No benchmark data.

SWE-rebench

Agentic coding on real-world software engineering tasks

No benchmark data.

GitHub Repos

Trending

ruvnet/wifi-densepose

19539 ★

Production-ready implementation of InvisPose - a revolutionary WiFi-based dense human pose estimation system that enables real-time full-body tracking through walls using commodity mesh routers

moeru-ai/airi

40403 ★

💖🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported.

anthropics/claude-code

129171 ★

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.

tukaani-project/xz

1258 ★

XZ Utils

Shubhamsaboo/awesome-llm-apps

98978 ★

Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.

Daily discovery

Natfii/ZeroClaw-Android (AI Agents)

143 ★

Run AI agents 24/7 on your Android phone. Native Rust core, 25+ providers (OpenAI, Claude, Gemini, Groq, DeepSeek, Ollama), encrypted key storage, plugin browser, Material You UI. Self-hosted alternative to Mac Mini setups. MIT licensed.

shimat/opencvsharp (Computer Vision)

5925 ★

OpenCV wrapper for .NET

HKUDS/LightRAG (Knowledge Graph)

36634 ★

[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

qdrant/qdrant (Machine Learning)

31374 ★

Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

RunMaestro/Maestro (Generative AI)

2367 ★

Agent Orchestration Command Center