The power to set rules around AI is increasingly divorced from the power to control where it actually runs. The Academy's ban on AI-generated actors and scripts, Apple's warnings about security risks from "vibe coding" apps, and Disney's deployment of face recognition at theme parks all represent institutions drawing formal boundaries around legitimate AI use. Yet these gestures of control are happening precisely as the technical foundation for that control erodes. Gemma 4's release has made local models competitive enough for production deployment, meaning developers can now run capable AI on their own hardware rather than rent access from frontier providers. The NSA's testing of Anthropic's Mythos Preview and Disney's operationalization of surveillance systems suggest different institutions are moving fast to embed AI into contexts where the cost of failure is high, but the infrastructure to support those deployments is increasingly available to anyone with a GPU.
The sorting is visible across multiple technical layers. Benchmark results show Claude Opus 4.6 now leading software engineering tasks at 65.3%, but the divergence between the SWE-rebench and Artificial Analysis rankings reveals that no single evaluation framework captures complete capability, and that Chinese model families are closing gaps faster than top-tier consolidation would suggest. On GitHub, the trending repositories split sharply between financial automation through multi-agent systems and infrastructure for deploying agents at scale. Ruflo, Hermes Web UI, and the Claude Agent SDK all address the same friction point: agents work in notebooks but fail in production without coordination and observability. The discovery set shows developers investing in prompt optimization, memory systems for agents, and standardization layers like MLX-OpenAI-server that make local models speak the OpenAI API. That is not innovation; it is standardization, and standardization is the clearest signal that the commodity shift is real.
The structural tension is this: institutions that built power through scarcity and gatekeeping are erecting rules and warnings as their leverage erodes, while actual capability distribution tilts toward anyone with commodity hardware and an internet connection. The velocity of deployment visible in GitHub repositories focused on high-stakes applications like financial trading and web scraping suggests that regulators will have to catch up to what developers are already building. The gatekeepers can set cultural boundaries. They cannot control where the models run.
Grant Calloway
No lab headlines.
We develop a geometric and information-theoretic framework for encoder-decoder learning built on the Information Bottleneck (IB) principle. Recasting IB as a rate-distortion problem with Kullback-Leibler (KL) divergence as distortion, we show that the optimal representation at any distortion level is a soft clustering of the \emph{predictive manifold} $\mathcal{M}=\{p(Y|x):x\in\mathcal{X}\}$ inside the probability simplex, admitting a linear decoder in the canonical parameterization. We derive a chain of exact transformations, from flat Dirichlet to exponential to isotropic Gaussian, connecting the maximum entropy prior on the simplex to Euclidean space, with quantified entropy overhead at each step, and show that Sketched Isotropic Gaussian Regularization (SIGReg) implements a Gaussian relaxation of this principle whose overhead affects rate accounting but not achievable prediction. This relaxation provides a principled distributional regularizer for learning with limited or no supervision. Using the Conditional Entropy Bottleneck (CEB) decomposition, we derive concrete encoder losses for supervised and semi-supervised settings, estimated via minibatch marginals without variational bounds. In the self-supervised setting, the CEB conditional rate is replaced by a view-prediction proxy. SIGReg serves as the distributional regularizer for both the semi-supervised and self-supervised settings. Experiments on toy problems and FashionMNIST confirm the predicted rate-distortion trade-offs and show that the non-parametric estimator is competitive with the standard variational approach.
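For orientation, the standard IB objective and its rate-distortion recasting with KL divergence as distortion (the form this abstract builds on) can be written as:

```latex
\min_{p(t|x)} \; I(X;T) \;-\; \beta\, I(T;Y),
\qquad
R(D) \;=\; \min_{\; p(t|x)\,:\; \mathbb{E}\left[ D_{\mathrm{KL}}\!\left( p(Y|x) \,\|\, p(Y|t) \right) \right] \le D \;} I(X;T).
```

Here $T$ is the learned representation; the constraint says the representation's predictive distribution $p(Y|t)$ may deviate from the true $p(Y|x)$ by at most $D$ in expected KL, which is exactly the soft clustering of the predictive manifold the abstract describes.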
In quantum-secure scenarios, existing research on mobile edge devices and intelligent computing and edge (ICE) systems based on the Non-Orthogonal Multiple Access (NOMA) communication model has overlooked the energy overhead of Post-Quantum Cryptography (PQC) modules, and the high complexity of traditional resource allocation algorithms fails to meet the demands of real-time decision-making. To address these challenges, this paper proposes a lightweight agentic AI framework for online joint optimization within ICE-enabled mobile devices. The scheme constructs a multi-stage stochastic Mixed Integer Nonlinear Programming (MINLP) model that incorporates static power-consumption constraints for PQC modules. Based on Lyapunov optimization theory, the long-term optimization problem is decoupled into per-slot subproblems, and a linear-complexity algorithm is proposed to solve the nonconvex NOMA power allocation. Simulation results verify that the proposed scheme significantly improves computational throughput while maintaining queue stability and satisfying energy consumption constraints. Compared with traditional Successive Convex Approximation (SCA) algorithms, complexity is reduced to $\mathcal{O}(N)$, a speedup of approximately 46x when the number of devices is $N=35$, meeting the real-time decision-making requirements of dynamic wireless environments.
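The Lyapunov decoupling the abstract relies on is the standard drift-plus-penalty technique: at each slot, each device greedily minimizes a weighted energy cost minus the backlog-weighted service, then updates its queue. A minimal sketch, with illustrative function and parameter names that are not the paper's:

```python
def drift_plus_penalty_step(Q, arrivals, V, power_cost, service_levels):
    """One drift-plus-penalty decision per device queue (toy sketch).

    Q              : current queue backlogs (bits)
    arrivals       : new task arrivals this slot (bits)
    V              : penalty weight trading energy against backlog
    power_cost     : callable, energy cost of serving s bits
    service_levels : candidate service amounts to search over
    Returns updated queues and the chosen service per device.
    """
    choices = []
    for q in Q:
        # Per-slot subproblem: minimize V * energy - backlog * service.
        b = min(service_levels, key=lambda s: V * power_cost(s) - q * s)
        choices.append(b)
    # Queue dynamics: backlog drains by service, grows by arrivals.
    Q_next = [max(q - b, 0.0) + a for q, b, a in zip(Q, choices, arrivals)]
    return Q_next, choices
```

Because the per-slot subproblem is a one-dimensional search per device, the whole step is linear in the number of devices, which is where an $\mathcal{O}(N)$ bound of this kind comes from.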
While distributed device-edge speculative decoding enhances resource utilization across heterogeneous nodes, its performance is often bottlenecked by conventional token-level verification strategies. Such rigid alignment leads to excessive rejections, significantly diminishing the accepted sequence length and increasing interaction rounds under fluctuating wireless conditions. In this paper, we propose WISV (Wireless-Informed Semantic Verification), a novel distributed speculative decoding framework that goes beyond strict token-level matching via a channel-aware semantic acceptance policy. WISV integrates a lightweight decision head into the edge-side target LLM to dynamically evaluate speculative tokens by synthesizing high-dimensional hidden representations with instantaneous channel state information (CSI). To optimize the trade-off between verification fidelity and communication overhead, we further design two tailored communication protocols: full-hidden upload and mismatch-first selective-hidden upload. Extensive simulations using a 1B drafter and an 8B target model demonstrate that WISV achieves up to a 60.8% increase in accepted length, a 37.3% reduction in interaction rounds, and a 31.4% improvement in end-to-end latency compared to vanilla speculative decoding across tested settings, while maintaining a negligible task accuracy drop (<1%). Finally, we validate WISV on a hardware testbed comprising an NVIDIA Jetson AGX Orin and an A40-equipped server, confirming its real-world efficacy in accelerating edge-deployed LLM inference.
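The core contrast in the abstract is between strict token-level verification and a relaxed semantic acceptance rule. A toy, deterministic sketch of the two policies (the real WISV decision head conditions on hidden states and CSI; the threshold rule below is only an illustration):

```python
def accepted_prefix_strict(draft_tokens, p_draft, p_target):
    """Token-level verification: keep a draft token only while the
    target model assigns it at least the drafter's probability
    (a deterministic stand-in for the usual stochastic accept test).
    Returns the length of the accepted prefix."""
    n = 0
    for _, pd, pt in zip(draft_tokens, p_draft, p_target):
        if pt >= pd:
            n += 1
        else:
            break
    return n

def accepted_prefix_relaxed(draft_tokens, p_target, tau=0.1):
    """Relaxed, WISV-style semantic acceptance (sketch): keep a draft
    token as long as the target model finds it merely plausible
    (probability above a small threshold tau), rather than requiring
    a tight match to the target's own distribution."""
    n = 0
    for _, pt in zip(draft_tokens, p_target):
        if pt >= tau:
            n += 1
        else:
            break
    return n
```

Longer accepted prefixes mean fewer device-edge interaction rounds, which is the mechanism behind the reported latency gains.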
To address the high data traffic demands of sixth-generation (6G) networks, this paper proposes a novel architecture that integrates autonomous aerial vehicles (AAVs) and multi-functional reconfigurable intelligent surfaces (MF-RISs), termed AM-RIS, into fluid antenna (FA)-assisted full-duplex (FD) networks. The AM-RIS provides hybrid functionalities, including signal reflection, amplification, and energy harvesting (EH), potentially improving both signal coverage and sustainability. Meanwhile, the FA enables fine-grained spatial adaptability at the FD-enabled base station (BS), complementing residual self-interference (SI) suppression. We aim to maximize overall energy efficiency (EE) by jointly optimizing downlink (DL) transmit beamforming at the BS, uplink (UL) user power, the AM-RIS configuration, and the positions of the FA and AM-RIS. Owing to the hybrid continuous-discrete parameters and high dimensionality of this intractable problem, we conceive a self-optimized multi-agent hybrid deep reinforcement learning (DRL) framework (SOHRL), which integrates multi-agent deep Q-networks (DQN) and multi-agent proximal policy optimization (PPO) to handle discrete and continuous actions, respectively. To enhance self-adaptability, an attention-driven state representation and meta-level hyperparameter optimization are incorporated, enabling the agents to autonomously adjust their learning hyperparameters. Simulation results validate the effectiveness of the proposed AM-RIS-enabled FA-aided FD networks empowered by the SOHRL algorithm: SOHRL outperforms the attention-free variant as well as conventional hybrid, multi-agent, and standalone DRL baselines. Moreover, AM-RIS in FD achieves the highest EE compared with half-duplex operation, conventional rigid antenna arrays, partial EH, and conventional RIS without amplification, highlighting its potential as a compelling solution for EE-aware wireless networks.
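The hybrid DQN/PPO split the abstract describes amounts to routing the discrete part of each action through a value head and the continuous part through a Gaussian policy head. A minimal, framework-free sketch; all names are illustrative, not the paper's implementation:

```python
import random

def select_hybrid_action(q_values, policy_mean, policy_std,
                         epsilon=0.1, rng=None):
    """Hybrid action selection sketch: a DQN-style head picks the
    discrete action (e.g., an AM-RIS mode) by epsilon-greedy argmax
    over Q-values, while a PPO-style Gaussian head samples the
    continuous action (e.g., beamforming powers)."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        discrete = rng.randrange(len(q_values))  # explore
    else:
        discrete = max(range(len(q_values)), key=q_values.__getitem__)
    # Sample each continuous dimension from its Gaussian policy.
    continuous = [rng.gauss(m, s) for m, s in zip(policy_mean, policy_std)]
    return discrete, continuous
```

In training, the discrete head would be updated with a TD loss and the continuous head with the clipped PPO objective; the sketch only shows how one joint action is assembled.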
Classical rate-distortion (RD) theory characterizes the fundamental limit of natural signal compression through the tradeoff between coding rate and reconstruction distortion. The rate-distortion-perception (RDP) framework adds a divergence-based measure of perceptual quality, but as a modeling principle rather than a theoretically derived one, leaving its theoretical origin unclear. In this paper, motivated by a synonymity-based semantic information perspective, we reformulate perceptual reconstruction as recovering any admissible sample within an ideal synonymous set (synset) associated with the source, rather than the source sample itself, and correspondingly establish a synonymous source coding architecture. On this basis, we develop a synonymous variational inference (SVI) analysis framework with a synonymous variational lower bound (SVLBO) for tractable analysis of synset-oriented compression. Within this framework, we establish a synonymity-perception consistency principle, showing that optimal identification of semantic information is theoretically consistent with perceptual optimization. Based on this derivation, we prove a synonymous RDP tradeoff for the proposed synonymous source coding. These analytical results show that the distributional divergence term arises naturally from the synset-based reconstruction objective, clarify its compatibility with existing RDP formulations and classical RD theory, and suggest the potential advantages of synonymous source coding.
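For reference, the standard RDP function from the literature that this abstract seeks to ground theoretically adds a distribution-matching constraint to the classical RD problem:

```latex
R(D, P) \;=\; \min_{p(\hat{X}\mid X)} \; I(X;\hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\!\left[ \Delta(X,\hat{X}) \right] \le D,
\qquad
d\!\left( p_X, \, p_{\hat{X}} \right) \le P,
```

where $\Delta$ is a per-sample distortion and $d$ a divergence between the source and reconstruction distributions. The abstract's claim is that the second constraint, usually postulated, falls out of the synset-based reconstruction objective.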
The communication bottleneck in federated learning (FL) has spurred extensive research into techniques to reduce the volume of data exchanged between client devices and the central parameter server. In this paper, we systematically classify gradient and model compression schemes into three categories based on the type of correlations they exploit: structural, temporal, and spatial. We examine the sources of such correlations, propose quantitative metrics for measuring their magnitude, and reinterpret existing compression methods through this unified correlation-based framework. Our experimental studies demonstrate that the degrees of structural, temporal, and spatial correlations vary significantly depending on task complexity, model architecture, and algorithmic configurations. These findings suggest that algorithm designers should carefully evaluate correlation assumptions under specific deployment scenarios rather than assuming that they are always present. Motivated by these findings, we propose two adaptive compression designs that actively switch between different compression modes based on the measured correlation strength, and we evaluate their performance gains relative to conventional non-adaptive approaches. In summary, our unified taxonomy provides a clean and principled foundation for developing more effective and application-specific compression techniques for FL systems.
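A concrete instance of the paper's "measure the correlation, then pick the compression mode" idea: temporal correlation can be proxied by the cosine similarity between consecutive rounds' gradients, and the client can delta-encode only when it is high. A toy sketch; the threshold and mode names are illustrative, not the paper's:

```python
import math

def cosine(u, v):
    """Cosine similarity between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def choose_mode(g_prev, g_curr, threshold=0.5):
    """Adaptive switch sketch: when consecutive-round gradients are
    strongly correlated, delta-encode against the previous round;
    otherwise compress the raw gradient directly."""
    rho = cosine(g_prev, g_curr)
    return ("delta", rho) if rho >= threshold else ("raw", rho)
```

This is exactly the kind of check the paper argues designers should run per deployment, since the measured correlation varies with task, architecture, and algorithm rather than being universally present.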
Composite score across coding, math, and reasoning
| # | Model | Score | tok/s | $/1M |
|---|---|---|---|---|
| 1 | GPT-5.5 | 60.2 | 76 | $11.25 |
| 2 | Claude Opus 4.7 | 57.3 | 61 | $10.94 |
| 3 | Gemini 3.1 Pro Preview | 57.2 | 133 | $4.50 |
| 4 | GPT-5.4 | 56.8 | 84 | $5.63 |
| 5 | Kimi K2.6 | 53.9 | 29 | $1.71 |
Agentic coding on real-world software engineering tasks
| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.6 | 65.3% |
| 2 | gpt-5.2-2025-12-11-medium | 64.4% |
| 3 | GLM-5 | 62.8% |
| 4 | Junie | 62.8% |
| 5 | gpt-5.4-2026-03-05-medium | 62.8% |
TradingAgents: Multi-Agents LLM Financial Trading Framework
🌊 The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architecture, distributed swarm intelligence, RAG integration, and native Claude Code / Codex Integration
Claude Agent SDK with a web browsing tool
🕵️♂️ Collect a dossier on a person by username from 3000+ sites
Web dashboard for Hermes Agent — multi-platform AI chat, session management, scheduled jobs, usage analytics & channel configuration (Telegram, Discord, Slack, WhatsApp)
An AI prompt optimizer for writing better prompts and getting better AI results.
A high-performance API server that exposes OpenAI-compatible endpoints for MLX models. Built in Python on FastAPI, it offers an efficient, scalable, and user-friendly way to run MLX-based vision and language models locally behind an OpenAI-compatible interface.
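The point of OpenAI-compatible servers like this is that clients need no new wire format. A minimal sketch of building such a request; the port, path prefix, and model name are assumptions, so check the server's own docs for its actual defaults:

```python
import json

def chat_completion_request(model, user_message,
                            base_url="http://localhost:8000/v1"):
    """Build an OpenAI-style /chat/completions request for a local,
    OpenAI-compatible server (URL and defaults are assumptions)."""
    url = f"{base_url}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }
    return url, json.dumps(payload)
```

Because the payload matches the OpenAI chat-completions schema, existing OpenAI SDK clients can typically be pointed at `base_url` unchanged, which is how local models end up "speaking" the OpenAI API.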
🐍 Geometric Computer Vision Library for Spatial AI
100+ AI Machine learning Deep learning Computer vision NLP Projects with code