The Inference Report — June 13, 2026

The state's hand is now visibly on the kill switch. Anthropic pulled Claude Fable 5 offline after the Trump administration flagged a jailbreak vulnerability as a national security threat, and the company's public frustration signals a collision between regulatory authority and commercial confidence that will reshape how AI companies operate. This is no longer a debate about safety frameworks. It is about who controls the product after deployment, and the answer is unambiguous: the government does, regardless of what internal testing showed. That shift cascades through everything else happening today. SpaceX, which has raised nearly $12 billion in private capital since 2002, closed its IPO up 19 percent, yet its success is hard to separate from government contracts and regulatory favor. Mistral is rumored to be raising 3 billion euros at a 20 billion euro valuation, while European AI companies operate under entirely different political constraints than their US counterparts. Anthropic is recruiting 1,000 Claude Corps fellows to evangelize AI to nonprofits across the US, a move that reads less like community outreach and more like inoculation against the student backlash that has greeted AI speakers at graduation ceremonies. Meta's internal AI unit is described by employees as chaotic and demoralizing, with 6,500 people caught between competing visions of strategy. Companies are spending capital and political energy on managing perception and compliance, not just on building better models.

The ground beneath data centers is shifting in ways that matter more immediately. Protests have blocked $130 billion in data center projects so far this year. Google is suing a Chinese cybercrime operation called Outsider Enterprise that used Gemini to automate scam sites, targeting hundreds of thousands of victims with 2.5 million text messages in two weeks. Ukraine is installing AI modules on drones and robots for autonomous targeting. The Pokémon Go dataset, collected from players hunting creatures on their phones, continues to be repurposed for AI training and military drone applications. These are the real pressures: local communities blocking infrastructure, criminal groups automating fraud at scale, militaries weaponizing autonomous systems, and the casual reuse of consumer data for purposes users never consented to.

The labs are signaling a coordinated shift toward agents and workflow automation, but the actual product moves reveal something more granular: the real competition is over who owns the layer between model capability and user outcome. OpenAI is embedding itself into Preply's tutoring pipeline not because teaching people to prompt is a business, but because controlling the workflow means controlling the switching cost. GitHub's work on making Copilot CLI more selective about delegation and Hugging Face's olmo-eval workbench both point to the same friction point: agents sound inevitable until you actually run them, at which point developers need better visibility into what is being delegated and why. NVIDIA's AgentPerf benchmark and Blackwell's claim of 20x more agents per megawatt are infrastructure positioning, but the benchmark itself is the more important move. Across the set, no one is arguing about model scale or reasoning anymore. The argument is about whose system sits closest to the user's actual work.

On GitHub, developers are moving past "can we build agents" to "how do we build them reliably, with measurable behavior and traceable decision paths." Repos such as addyosmani/agent-skills and obra/superpowers position themselves as production-grade methodologies. LMCache and MLflow occupy the plumbing layer: LMCache cuts inference cost by managing the KV cache, while MLflow provides observability and control for production AI systems. The pattern is not "AI is eating software." It is "developers are building the operational layer that makes AI systems dependable enough to ship."
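To make "optimizing inference cost through KV cache management" concrete, here is a toy prefix cache. It assumes requests that share a leading prefix (such as a system prompt) can reuse the attention key/value state already computed for that prefix, so only the new suffix needs a prefill pass. This is a hedged illustration of the general idea, not LMCache's API or design: the chunk size, the prefix-hashing scheme, and the `PrefixKVCache` class are all hypothetical, and a placeholder string stands in for real K/V tensors.

```python
import hashlib
from typing import Dict, List

CHUNK = 4  # hypothetical chunk size, in tokens


def chunk_keys(tokens: List[int]) -> List[str]:
    """One key per *full* chunk; each key hashes the entire prefix up to that chunk.

    Keying on the whole prefix (not just the chunk) matters because a token's
    K/V values depend on every token before it.
    """
    keys: List[str] = []
    h = hashlib.sha256()
    for end in range(CHUNK, len(tokens) + 1, CHUNK):
        h.update(repr(tokens[end - CHUNK:end]).encode())
        keys.append(h.hexdigest())  # hexdigest() does not reset h, so the hash keeps accumulating
    return keys


class PrefixKVCache:
    """Toy prefix cache: stores a placeholder string where a real system would store K/V tensors."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.computed_tokens = 0  # tokens that needed a fresh prefill pass

    def prefill(self, tokens: List[int]) -> int:
        """Return how many leading tokens were served from the cache."""
        keys = chunk_keys(tokens)
        hits = 0
        for key in keys:  # find the longest cached prefix, chunk by chunk
            if key not in self.store:
                break
            hits += 1
        for i in range(hits, len(keys)):  # "compute" and store the missing chunks
            self.store[keys[i]] = f"kv-chunk-{i}"
        reused = hits * CHUNK
        self.computed_tokens += len(tokens) - reused  # a trailing partial chunk is always recomputed
        return reused


cache = PrefixKVCache()
system_prompt = list(range(1, 9))  # 8 tokens shared by both requests
first = cache.prefill(system_prompt + [9, 10])
second = cache.prefill(system_prompt + [20, 21, 22, 23])
print(first, second, cache.computed_tokens)
```

The second request reuses the 8 shared tokens, so only 4 of its 12 tokens are recomputed. Chunking at fixed boundaries is one simple way to key prefixes; production caches make different trade-offs around chunk size, eviction, and where the tensors live.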

Grant Calloway

AI Labs

Anthropic

GitHub Blog

How we made GitHub Copilot CLI more selective about delegation

Google

Hugging Face

olmo-eval: An evaluation workbench for the model development loop

Microsoft

AI alone won’t change your business. The system running it will.

NVIDIA

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark

OpenAI

From the Wire

Research Papers: Focused

OmniPlan: An Adaptive Framework for Timely and Near-Optimal Network Planning Optimization cs.NI

Network planning optimization is a fundamental problem across diverse domains, including transportation systems, communication networks, and power grids. It requires simultaneous optimization of multiple competing objectives under complex constraints. Existing network planning optimization frameworks rely on mixed integer programming (MIP) solvers, heuristics, and deep reinforcement learning (DRL) models to compute planning decisions. However, they cannot adapt effectively to diverse and dynamic user intents, which forces a trade-off between execution time and optimality. In this paper, we propose OmniPlan, an adaptive framework that achieves both timeliness and near-optimality in network planning optimization. To provide the adaptability that existing solutions lack, OmniPlan employs a large language model (LLM)-based interpreter to convert heterogeneous natural-language intents into a unified, quantifiable user-preference vector. It then employs a mixture-of-experts architecture that integrates MIP solvers, heuristics, and DRL models as specialized experts, adapting to diverse intents by dynamically selecting timely and near-optimal experts. Finally, it incorporates a DRL-based expert configuration module that fine-tunes optimization objective weights to align planning decisions with user-specific preferences. We evaluate OmniPlan on a representative real-world workload, distributed machine learning (ML), using it to offload a wide spectrum of ML inference tasks (e.g., decision trees, SVM, naive Bayes, XGBoost, and random forests) onto a network of hardware devices. Experiments on a real-world testbed indicate that OmniPlan achieves near-optimal, low-execution-time offloading for real-world ML inference tasks, reducing latency by up to 97.8% and network device resource consumption by up to 11.5%.
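The expert-selection step can be pictured as a weighted trade-off between predicted solve time and predicted optimality gap. The sketch below is a hedged illustration, not OmniPlan's actual selector: it assumes the preference vector has already been produced (here it is simply passed in as two weights) and that each expert comes with estimated time and gap figures. The `Expert` fields, the normalization, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Expert:
    name: str
    est_time_s: float  # hypothetical predicted solve time
    est_gap: float     # hypothetical predicted optimality gap (0.0 = optimal)


def select_expert(experts: List[Expert], pref: Tuple[float, float]) -> Expert:
    """Pick the expert with the lowest preference-weighted, normalized cost.

    pref = (weight on execution time, weight on optimality gap), standing in for
    the user-preference vector that an LLM interpreter would produce.
    """
    w_time, w_gap = pref
    max_t = max(e.est_time_s for e in experts) or 1.0  # avoid dividing by zero
    max_g = max(e.est_gap for e in experts) or 1.0

    def cost(e: Expert) -> float:
        return w_time * e.est_time_s / max_t + w_gap * e.est_gap / max_g

    return min(experts, key=cost)


EXPERTS = [
    Expert("MIP", est_time_s=120.0, est_gap=0.0),
    Expert("heuristic", est_time_s=0.5, est_gap=0.15),
    Expert("DRL", est_time_s=2.0, est_gap=0.05),
]

for pref in [(1.0, 0.0), (0.5, 0.5), (0.1, 0.9)]:
    print(pref, select_expert(EXPERTS, pref).name)
```

With these made-up numbers, a latency-only intent picks the heuristic, a balanced intent picks the DRL model, and an optimality-first intent picks the exact MIP solver, which is the kind of intent-dependent switching the abstract describes.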

A T-API-Compliant ReAct Agentic Loop for Optical Networks: Generic vs. Domain-Specific Tool Abstractions cs.NI

Optical networks need intent-driven, closed-loop agentic management, a key enabler for higher levels of autonomy. We present the first T-API-compliant reasoning-and-acting (ReAct) loop. We show that domain-specific composite tools achieve 90% oracle-validated correctness with threefold token savings compared to generic tools.
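One plausible reading of the generic-versus-composite result is that a composite tool collapses several reason/act round trips into one, and each round trip costs tokens. The sketch below shows a bare ReAct-style loop run against both tool styles. It is an assumption-laden toy: the inventory, tool names, and scripted policies (which stand in for the LLM's reasoning step) are hypothetical, and none of it reflects the real T-API data model or the paper's tool set.

```python
from typing import Any, Callable, Dict, List, Tuple

# Toy inventory, loosely shaped like a connectivity-service lookup.
# Hypothetical: this is NOT the real T-API data model or REST interface.
SERVICES = {"svc-1": {"name": "metro-link", "connection": "conn-7"}}
CONNECTIONS = {"conn-7": {"oper_state": "ENABLED"}}

Action = Tuple[str, Dict[str, Any]]
Step = Tuple[Action, Any]  # (action taken, observation returned)

# Generic tools: thin wrappers over individual resources.
GENERIC_TOOLS: Dict[str, Callable[..., Any]] = {
    "list_services": lambda: list(SERVICES),
    "get_service": lambda uuid: SERVICES[uuid],
    "get_connection": lambda uuid: CONNECTIONS[uuid],
}


def service_status(name: str) -> str:
    """Composite tool: resolves service -> connection -> state in a single call."""
    for svc in SERVICES.values():
        if svc["name"] == name:
            return CONNECTIONS[svc["connection"]]["oper_state"]
    return "NOT_FOUND"


COMPOSITE_TOOLS: Dict[str, Callable[..., Any]] = {"service_status": service_status}


def run_react(policy: Callable[[List[Step]], Action],
              tools: Dict[str, Callable[..., Any]],
              max_steps: int = 10) -> Tuple[Any, int]:
    """Reason/act loop: ask the policy for an action, execute it, feed back the observation."""
    history: List[Step] = []
    for _ in range(max_steps):
        name, args = policy(history)
        if name == "finish":
            return args["answer"], len(history)  # answer and number of tool calls used
        observation = tools[name](**args)
        history.append(((name, args), observation))
    raise RuntimeError("step budget exhausted")


# Scripted stand-ins for the LLM's reasoning step.
def generic_policy(history: List[Step]) -> Action:
    if not history:
        return "list_services", {}
    (last_tool, _), obs = history[-1]
    if last_tool == "list_services":
        return "get_service", {"uuid": obs[0]}  # only one service in this toy inventory
    if last_tool == "get_service":
        return "get_connection", {"uuid": obs["connection"]}
    return "finish", {"answer": obs["oper_state"]}


def composite_policy(history: List[Step]) -> Action:
    if not history:
        return "service_status", {"name": "metro-link"}
    return "finish", {"answer": history[-1][1]}


print(run_react(generic_policy, GENERIC_TOOLS))
print(run_react(composite_policy, COMPOSITE_TOOLS))
```

Both runs reach the same answer, but the generic toolset needs three tool calls where the composite tool needs one; in a real agent, each extra call also means another model turn carrying the growing history.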

Hidden Degradation Costs in Energy-Cost-Only HEMS Optimisation: Study on Battery and PV Sensitivity cs.NI

Residential battery energy storage systems (BESS) are increasingly deployed alongside photovoltaic (PV) generation to reduce household energy costs under volatile time-of-use (TOU) tariffs. Model predictive control (MPC) is a widely adopted optimisation strategy for home energy management systems (HEMS), typically formulated to minimise net energy cost subject to physical and operational constraints. However, battery degradation is rarely embedded in the optimisation objective, so its cost goes unquantified, and aggressive, high-cycle-count strategies could incur significant losses once deployed on physical systems. This paper presents a receding-horizon mixed-integer linear programming (MILP) baseline for a UK residential HEMS, using demand data from the REFIT dataset. A 3-by-3 sensitivity study is conducted across three battery sizes and three PV array sizes, with post-hoc degradation cost estimated using the Naumann stress model and rainflow cycle counting. Results show that degradation remains constant for each battery size and can exceed energy cost savings by up to 1,060%. These results demonstrate that energy-cost-only optimisation systematically underestimates the true system cost, motivating a degradation-aware control formulation.
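Rainflow counting is what turns a battery's state-of-charge trace into a list of charge/discharge cycles and their depths, which a stress model then prices. The sketch below is a simplified three-point rainflow counter in the style of ASTM E1049, assuming SOC is given in percentage points. The cost function is a deliberately crude, hypothetical placeholder that depends only on cycle depth; it is not the Naumann model used in the paper, and `cost_per_full_cycle` and `alpha` are made-up parameters.

```python
from typing import List, Tuple

Cycle = Tuple[float, float]  # (depth in SOC percentage points, count: 1.0 full or 0.5 half)


def turning_points(soc: List[float]) -> List[float]:
    """Keep only the local peaks and valleys of a state-of-charge trace."""
    pts = [soc[0]]
    for x in soc[1:]:
        if x == pts[-1]:
            continue  # flat segment
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (x - pts[-1]) > 0:
            pts[-1] = x  # still moving the same way: extend the current leg
        else:
            pts.append(x)
    return pts


def rainflow(soc: List[float]) -> List[Cycle]:
    """Simplified three-point rainflow counting in the style of ASTM E1049."""
    cycles: List[Cycle] = []
    stack: List[float] = []
    for p in turning_points(soc):
        stack.append(p)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])  # most recent range
            y = abs(stack[-2] - stack[-3])  # previous range
            if x < y:
                break
            if len(stack) == 3:
                cycles.append((y, 0.5))  # range touches the starting point: half cycle
                stack.pop(0)
            else:
                cycles.append((y, 1.0))  # closed loop: full cycle
                last = stack.pop()
                del stack[-2:]  # drop the two points that formed the closed loop
                stack.append(last)
    # Whatever is left never closed: count each remaining range as a half cycle.
    cycles.extend((abs(b - a), 0.5) for a, b in zip(stack, stack[1:]))
    return cycles


def degradation_cost(cycles: List[Cycle], cost_per_full_cycle: float, alpha: float) -> float:
    """Hypothetical placeholder stress model: cost grows as (depth/100)**alpha. NOT Naumann."""
    return sum(n * cost_per_full_cycle * (d / 100.0) ** alpha for d, n in cycles)


soc = [0, 100, 40, 60, 0]  # one deep cycle with a shallow 20-point micro-cycle inside it
cycles = rainflow(soc)
print(cycles)
print(degradation_cost(cycles, cost_per_full_cycle=10.0, alpha=2.0))
```

The trace decomposes into one full 20-point micro-cycle plus two half cycles of 100 points. An energy-cost-only MILP never sees this breakdown, which is why the paper has to estimate degradation after the fact.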

Free-Placement Optimization of Ground Station Locations for Low-Earth Orbit Satellites cs.NI

Rapidly expanding low Earth orbit satellite constellations are placing increasing demands on terrestrial ground networks, motivating the development of more efficient ground station network designs. Current approaches select sites from predefined locations, limiting optimization to existing infrastructure and constraining performance. In contrast, free-placement optimization operates over a continuous spatial domain on Earth, broadening the search space and allowing higher-throughput configurations at the cost of potentially requiring new infrastructure deployment. In this work, we introduce SCORE (Sequential Cyclic Optimization via Refinement & Evaluation), a two-stage free-placement method for ground station design. SCORE combines sequential coordinate selection with cyclic refinement to manage high-dimensionality, non-convexity, and local minima that challenge global optimizers. We benchmark SCORE against one-shot methods such as differential evolution (DE) and integer programming approaches using locations from Kongsberg Satellite Services and the World Teleport Association. Tests across two commercial Earth observation constellations (Capella Space and ICEYE) and one synthetic Walker-Star constellation show that SCORE requires up to 5x fewer function evaluations to converge relative to DE while improving downlink throughput by up to 13%. Compared to fixed-site methods, unconstrained SCORE achieves up to 15% greater total downlink, establishing a strong empirical performance benchmark for flexible placement; infrastructure-constrained SCORE retains over 92% of this gain while restricting placement to within proximity of existing fiber and power infrastructure. We also explore trade-offs between expanding existing stations and deploying new sites, informing future ground network design for operational constellations.
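The two-stage structure described above (sequential placement, then cyclic refinement) can be sketched on a toy problem. This is a hedged illustration of that general pattern, not SCORE itself: the Gaussian "throughput" proxy, the unit-square domain, the demand points, `sigma`, and the candidate grid (a coarse stand-in for the continuous domain) are all hypothetical, and the real method optimizes downlink throughput for actual constellations.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical demand: two clusters of "pass" locations in a unit square.
DEMAND: List[Point] = [(0.2, 0.2), (0.25, 0.2), (0.2, 0.25),
                       (0.8, 0.8), (0.75, 0.8), (0.8, 0.75)]
# Candidate positions; a coarse grid stands in for the continuous domain.
GRID: List[Point] = [(i / 20, j / 20) for i in range(21) for j in range(21)]


def throughput(stations: List[Point], sigma: float = 0.15) -> float:
    """Toy proxy: each demand point is served by its best station, with Gaussian falloff."""
    if not stations:
        return 0.0
    return sum(
        max(math.exp(-((px - sx) ** 2 + (py - sy) ** 2) / (2 * sigma ** 2)) for sx, sy in stations)
        for px, py in DEMAND
    )


def best_position(fixed: List[Point]) -> Point:
    """Best grid position for one new station, holding the others fixed."""
    return max(GRID, key=lambda c: throughput(fixed + [c]))


def place(k: int, max_rounds: int = 10) -> List[Point]:
    # Stage 1 (sequential): add stations one at a time, greedily.
    stations: List[Point] = []
    for _ in range(k):
        stations.append(best_position(stations))
    # Stage 2 (cyclic refinement): revisit each station with the rest held fixed.
    for _ in range(max_rounds):
        improved = False
        for j in range(k):
            others = stations[:j] + stations[j + 1:]
            candidate = best_position(others)
            if throughput(others + [candidate]) > throughput(stations) + 1e-12:
                stations[j] = candidate
                improved = True
        if not improved:
            break  # no single-station move helps: a local optimum on this grid
    return stations


print(place(2))
```

Breaking a joint search over all coordinates into one-station-at-a-time moves keeps each subproblem small, which is the usual motivation for this kind of decomposition when the full problem is high-dimensional and non-convex; it can still stall in a local optimum, which is what the refinement stage is meant to mitigate.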

Graphical Causal Reasoning for Root Cause Analysis in Cloud Networks cs.NI

Cloud computing relies on large-scale networks, which are inherently complex systems. In this paper, we present a novel approach to root cause analysis (RCA) of cloud network incidents, leveraging graph-based causal discovery techniques. Our method addresses the limitations of rule-based automation by introducing a spatiotemporal grouping strategy and an automation ontology to reduce the dimensionality of the problem. We construct a causal graph from binary time series data using bivariate Granger causality and conditional independence tests. For inference, we introduce a probabilistic method that assigns edge-specific conditional probabilities as a function of time lag, allowing for interpretable, time-aware root cause scoring via causal graph traversal. We evaluated the system using a labeled dataset of 35 production incidents from a major cloud provider. The model successfully recalled the correct root cause in 85.7% of incidents and produced an exact match in 74.3%. In production, the deployed system has been used in over 800 real-world incidents, with positive qualitative feedback from network engineers. These results highlight the practicality of a data-driven, causal approach to RCA in dynamic and large-scale operational environments.
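The inference step can be sketched as two pieces: estimating a lagged conditional probability for each causal edge from binary alert series, then ranking candidate roots by traversing the graph. This is a minimal sketch under stated assumptions, not the paper's formulation: the edges and their lags are assumed to come from an earlier causal-discovery step, each edge uses a single fixed lag rather than a probability that varies with lag, and the "sum of best path probabilities" score is our own simplification. Node names and data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def lagged_prob(parent: List[int], child: List[int], lag: int) -> float:
    """Estimate P(child[t + lag] = 1 | parent[t] = 1) from two binary alert series."""
    hits = total = 0
    for t in range(len(parent) - lag):
        if parent[t]:
            total += 1
            hits += child[t + lag]
    return hits / total if total else 0.0


def best_path_prob(src: str, dst: str, probs: Dict[Edge, float],
                   seen: Optional[Set[str]] = None) -> float:
    """Highest product of edge probabilities over any simple path src -> dst."""
    if src == dst:
        return 1.0
    seen = (seen or set()) | {src}
    best = 0.0
    for (p, c), w in probs.items():
        if p == src and c not in seen:
            best = max(best, w * best_path_prob(c, dst, probs, seen))
    return best


def rank_roots(alerting: List[str], probs: Dict[Edge, float]) -> List[Tuple[str, float]]:
    """Score each node by how well it explains the alerting set; highest first."""
    nodes = {n for edge in probs for n in edge}
    scores = {r: sum(best_path_prob(r, a, probs) for a in alerting) for r in nodes}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical binary alert series, one sample per time bucket.
series = {
    "link_flap": [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    "bgp_down":  [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    "pkt_loss":  [0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
}
# Edges and lags are assumed to come from an earlier causal-discovery step.
edges = [("link_flap", "bgp_down", 1), ("bgp_down", "pkt_loss", 1)]
probs = {(p, c): lagged_prob(series[p], series[c], lag) for p, c, lag in edges}

print(probs)
print(rank_roots(["link_flap", "bgp_down", "pkt_loss"], probs))
```

Here a link flap is followed by a BGP session drop two times out of three, and every BGP drop is followed by packet loss, so the upstream link flap explains the most alerts and ranks first. Keeping the per-edge probabilities explicit is what makes the ranking interpretable to an engineer reading it.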

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	Speed (tok/s)	Price ($/1M tokens)
1	Claude Fable 5	64.9	68	$20.00
2	Claude Opus 4.8	61.4	57	$10.00
3	GPT-5.5	60.2	62	$11.25
4	Claude Opus 4.7	57.3	52	$10.00
5	Gemini 3.1 Pro Preview	57.2	132	$4.50

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model	Score
1	gpt-5.5-2026-04-23-xhigh	62.7%
2	Junie	61.6%
3	Codex	60.4%
4	Claude Code	59.6%
5	gpt-5.5-2026-04-23-medium	58.9%

GitHub Repos

Trending

addyosmani/agent-skills

58861 ★

Production-grade engineering skills for AI coding agents.

music-assistant/server

2636 ★

Music Assistant is a free, open-source media library manager that connects to your streaming services and a wide range of connected speakers. The server is the beating heart, the core of Music Assistant, and must run on an always-on device such as a Raspberry Pi, a NAS, an Intel NUC, or similar.

mattermost/mattermost

37447 ★

Mattermost is an open source platform for secure collaboration across the entire software development lifecycle.

apple/container

36589 ★

A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.

iptv-org/iptv

124579 ★

Collection of publicly available IPTV channels from all over the world

Daily discovery

EvoMap/evolver (Prompt Engineering)

8561 ★

The GEP-Powered Self-Evolution Engine for AI Agents. Genome Evolution Protocol. | evomap.ai

verivital/nnv (Neural Network)

142 ★

Neural Network Verification Software Tool https://www.verivital.com Documentation:

mlflow/mlflow (MLOps)

26483 ★

The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

alphadl/AdaRubrics (RLHF)

215 ★

AdaRubric: Adaptive Dynamic Rubric Evaluator for Agent Trajectories

ultralytics/yolo-flutter-app (Object Detection)

440 ★

Flutter plugin for Ultralytics YOLO