The Inference Report

June 29, 2026

Execution is becoming the constraint. Ford rehired the engineers it fired, admitting that algorithms alone cannot replace human judgment in manufacturing. Meanwhile, capital keeps flooding into the commodity layer (chips, power, memory), while the builders solving actual problems operate in a narrower band. Flexion Robotics is training robots to perform real work. Liquid AI shipped a 230-million-parameter model that runs on a phone. DeepSeek open-sourced a speculative decoding framework that cuts inference time by 57 to 85 percent. These are things that ship and run, not roadmaps. The gap between what investors think they're buying and what actually works is widening.

HP's partnership with OpenAI illustrates the broader shift: non-AI companies are choosing integration speed over proprietary capability, wrapping frontier models into their hardware and workflows rather than building their own. That accelerates the commoditization of the hardware layer itself. When your value proposition becomes access and integration speed, you're no longer selling the thing; you're selling the conduit. Capital markets have priced AI as solved infrastructure. The actual builders know better.

Research papers cluster around compositional learning, constrained optimization, and inference-time adaptation: work that distinguishes carefully between what a method optimizes and what problem it actually solves. On GitHub, developers are done waiting for frameworks to abstract the details away. They want tools that run on their own hardware: CuPy brings a NumPy-compatible API to GPUs, and mlx-audio runs speech models on Apple Silicon. Code-aware systems like codebase-memory-mcp index codebases in 158 languages into knowledge graphs, solving the specific problem of giving AI enough context without drowning it in tokens. That's where the real work is happening.

The benchmarks reveal little. SWE-rebench and Artificial Analysis diverge substantially at the top, their methodologies opaque and their scoring reproducibility unclear; even within SWE-rebench, the top two entries fall inside each other's reported error bars. Without knowing what each test measures or how tasks are sampled, the rankings function as indices rather than measures of capability. The infrastructure vendors will keep raising capital, but the real scarcity is talent that knows how to build things that work in production. When capital starts flowing there instead, the hierarchy of returns will shift.

Grant Calloway

AI Labs

OpenAI

HP Inc. launches Frontier strategic partnership with OpenAI

Research Papers

DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation with a Single Hand (cs.RO)

Dexterous manipulation policies can solve individual skills, but composing them to perform multiple tasks with a single hand remains challenging. Adding a new task on top of an existing manipulation skill often imposes conflicting demands on overlapping fingers and contact modes, causing destructive interference between preserving an existing manipulation outcome and executing a new one. We propose DexCompose, a role-aware residual composition framework that reuses pretrained dexterous policies for multi-task manipulation through explicit finger-level action ownership. Given two pretrained full-hand policies, DexCompose first collects successful post-task states from the first skill and performs release tests over candidate finger masks to identify which fingers are necessary for maintaining the established skill state. It then trains two asymmetric residual modules: a bounded residual stabilizer for task preservation, and a context-aware residual that adapts the frozen downstream policy only within the action subspace assigned to the new task. We evaluate the framework on 16 composite dexterous manipulation tasks spanning four object-retention skills and four downstream interactions. DexCompose achieves a 77.4% average composite success rate, demonstrating that structural action ownership with dual residuals offers a promising direction for composing dexterous skills beyond conventional policy chaining.
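A minimal sketch of the two ideas in the DexCompose abstract: a release test that picks which fingers must keep holding, and an action composer that gives those fingers to the first skill (plus a bounded stabilizer residual) and the remaining fingers to the downstream policy (plus a context residual). Everything here is an assumption for illustration: `still_retained` stands in for a simulator rollout, `finger_of_joint` and `bound` are hypothetical parameters, the residuals would come from trained networks, and the brute-force search for the smallest passing mask is one plausible reading of "release tests over candidate finger masks", not the paper's procedure.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def find_holding_fingers(
    n_fingers: int,
    still_retained: Callable[[frozenset], bool],
) -> frozenset:
    """Return the smallest finger set whose grip alone preserves the skill state.

    `still_retained(kept)` is a hypothetical release test: starting from a stored
    successful post-task state, release every finger not in `kept` and report
    whether the object is still held.
    """
    for size in range(n_fingers + 1):
        for kept in combinations(range(n_fingers), size):
            if still_retained(frozenset(kept)):
                return frozenset(kept)
    # No proper subset passed the test: the whole hand must keep holding.
    return frozenset(range(n_fingers))


def clip(x: float, bound: float) -> float:
    """Clamp x to [-bound, bound]."""
    return max(-bound, min(bound, x))


def compose_action(
    finger_of_joint: Sequence[int],  # which finger each action dimension belongs to
    holding: frozenset,              # fingers owned by the first (retention) skill
    a_skill: Sequence[float],        # frozen first-skill policy output
    a_task: Sequence[float],         # frozen downstream policy output
    r_stab: Sequence[float],         # stabilizer residual, kept small by clipping
    r_ctx: Sequence[float],          # context-aware residual for the new task
    bound: float = 0.05,
) -> List[float]:
    """Assign each joint to exactly one policy according to finger ownership."""
    action = []
    for j, finger in enumerate(finger_of_joint):
        if finger in holding:
            action.append(a_skill[j] + clip(r_stab[j], bound))
        else:
            action.append(a_task[j] + r_ctx[j])
    return action
```

With five fingers and a toy test that passes whenever the thumb (0) and index finger (1) stay closed, `find_holding_fingers` returns `{0, 1}`; joints on those fingers then follow the first skill, and only the remaining joints are steered by the new task. Exhaustive search over 2^5 = 32 masks is cheap for a five-finger hand.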

Surprises in Proper Positive-Only Learning (stat.ML)

Binary classification from positive-only samples is a variant of PAC learning in which the learner receives i.i.d. samples from the positive region of an unknown target concept, but is evaluated under the original distribution (which places mass on both positive and negative regions). This model dates back to Natarajan [1987, STOC], and the characterization of improper learning is well-known -- it even appears in textbooks. The characterization of proper positive-only learning, however, has long remained open. In this work, we revisit and settle this question: a concept class is properly learnable from positive-only samples if and only if it has finite VC dimension and satisfies a new combinatorial condition, which we call uniform exterior separability. Together with several separation results, this characterization reveals a surprisingly rich landscape that differs sharply from standard PAC learning: proper and improper learning are separated, randomized and deterministic proper learning are separated, there are classes for which no ERM is a learner, and finite VC dimension does not suffice even for non-uniform learning. Along the way, we introduce new combinatorial dimensions that we believe can be of broader interest in learning theory.
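A classic textbook illustration of the positive-only setting, not the paper's construction: for intervals on the real line, the tightest interval around the positive samples is a proper hypothesis that never labels a negative point positive, because it sits inside the target interval. Its only errors are positive points it misses, and that region shrinks as more positives arrive. The paper's point is that such easy cases do not generalize: finite VC dimension alone is not enough for proper learning from positives.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def fit_interval(positives: Sequence[float]) -> Interval:
    """Tightest interval containing every positive sample (a proper hypothesis).

    All samples lie inside the unknown target interval, so the fitted interval
    is contained in it: errors can only be missed positives, never false positives.
    """
    if not positives:
        raise ValueError("need at least one positive sample")
    return min(positives), max(positives)


def predict(interval: Interval, x: float) -> bool:
    """Label x positive iff it falls inside the fitted interval."""
    lo, hi = interval
    return lo <= x <= hi
```

For positives drawn at 0.3, 0.5, and 0.7, the learner returns (0.3, 0.7): a new point at 0.4 is labeled positive, while 0.2 and 0.8 are labeled negative, even if the true target happens to extend past them.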

Which Nash Equilibrium? Solver-Dependent Selection on Zero-Sum Nash Polytopes (cs.GT)

Many two-player zero-sum games admit not a unique Nash equilibrium but a convex set of them: a polytope of profiles that all share the minimax value V* yet prescribe different behaviour. Standard solvers each converge to some equilibrium and are treated as interchangeable. We ask whether they instead select different members of the Nash set, systematically as a function of the algorithm rather than the seed. Using a tabular, exactly solvable testbed of six games with analytically known Nash sets -- including a two-dimensional Nash polytope and Kuhn poker -- we find that (i) selection is determined by the algorithm, not the seed, but families differ only on asymmetric Nash sets; (ii) regularized last-iterate methods (R-NaD, magnetic mirror descent) select the maximum-entropy member, the information projection of their uniform reference onto the Nash set -- exactly on the 2-D polytope and at 99.7% of maximum entropy in Kuhn -- while regret-averaging methods (CFR, CFR+, fictitious play) drift to a lower-entropy face; we confirm this on a randomized 180-game ensemble, where R-NaD attains the maximum-entropy member in 100% of converged games while CFR+ sits strictly below it in 94% (paired Wilcoxon p < 10^-27); (iii) the selected member has downstream consequences against sub-optimal opponents that scale with sequential/hidden-information structure but stay bounded -- in Kuhn the max-entropy member is a strictly better hedge, whereas on the matrix games the members differ without either dominating. We also report two negative results correcting common intuitions: removing CFR's positive-orthant (max(R,0)) projection does not eliminate boundary drift; and R-NaD's selection is anchor-following, not initialization-independent. We state the maximum-entropy / I-projection characterization as a strongly data-supported conjecture, checked throughout against analytic ground truth.
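A toy illustration of a non-unique equilibrium set and its maximum-entropy member, assuming a 3x2 zero-sum game invented for this sketch (it is not one of the paper's six games). Because rows 0 and 1 are identical, every row strategy of the form (t, 1/2 - t, 1/2) with t in [0, 1/2] guarantees the game value of 0. Since KL(p || uniform) = log n - H(p), the maximum-entropy member is also the information projection of the uniform distribution onto that set. The code computes that member directly by grid search; it does not run R-NaD, CFR, or any other solver.

```python
import math

# Row player's payoffs in a toy zero-sum game (the column player receives the negative).
# Rows 0 and 1 are duplicates, so the row player's optimal strategies form a segment.
A = [[1.0, -1.0],
     [1.0, -1.0],
     [-1.0, 1.0]]


def worst_case_payoff(p):
    """Row player's guaranteed payoff: minimum over the column player's pure replies."""
    return min(sum(p[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0])))


def entropy(p):
    """Shannon entropy in nats, treating 0 * log 0 as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def max_entropy_optimal_strategy(steps=1000, value=0.0, tol=1e-12):
    """Grid-search p = (t, 1/2 - t, 1/2) for t in [0, 1/2], keep only profiles that
    guarantee the game value, and return the one with maximum entropy."""
    best = None
    for i in range(steps + 1):
        t = 0.5 * i / steps
        p = (t, 0.5 - t, 0.5)
        if worst_case_payoff(p) < value - tol:
            continue  # not minimax-optimal for the row player
        if best is None or entropy(p) > entropy(best):
            best = p
    return best
```

The search returns (0.25, 0.25, 0.5): the duplicated rows split their mass evenly. A solver could in principle end anywhere on the segment, which is exactly the selection question the paper studies.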

Second-Order KKT Guarantees for Bregman ADMM in Nonconvex and Non-Lipschitz Optimization (math.OC)

We analyze Bregman ADMM for nonconvex linearly constrained problems under two-sided relative smoothness, a condition that replaces the standard Lipschitz gradient assumption with a Hessian comparison relative to a Bregman kernel. This setting covers polynomial objectives arising in matrix and tensor models for which a global Lipschitz-gradient constant need not exist. We show that on an invariant open state-space domain, one iteration of Bregman ADMM defines a smooth primal-dual fixed-point map whose strict-saddle KKT points are unstable fixed points; consequently, from random initialization the iterates converge to a strict saddle with probability zero. Combined with existing first-order convergence results, this yields almost-sure second-order stationarity of limiting KKT points. We extend the analysis to a multi-block star consensus formulation for distributed optimization. The technical novelty lies in a determinant reduction with a Bregman-specific symmetrization and scaling step in the two-block spectral argument, together with a null-space cancellation exploiting the star graph structure in the consensus case. Numerical experiments on distributed matrix factorization illustrate the theory, and a symmetric tensor factorization example demonstrates the broader Bregman proximal splitting idea beyond the separable consensus setting.
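A one-dimensional check of two-sided relative smoothness, with an objective and kernel chosen for illustration (neither is taken from the paper). The quartic has no global Lipschitz gradient constant, because its second derivative grows without bound, but that second derivative is sandwiched by the kernel's:

```latex
\begin{aligned}
f(x) &= \tfrac{1}{4}x^4, && f''(x) = 3x^2 \ \text{(unbounded)},\\
h(x) &= \tfrac{1}{4}x^4 + \tfrac{1}{2}x^2, && h''(x) = 3x^2 + 1,\\
-(3x^2 + 1) &\le 3x^2 \le 3x^2 + 1 && \Longrightarrow\ -L\,h''(x) \le f''(x) \le L\,h''(x) \ \text{with } L = 1.
\end{aligned}
```

Equivalently, h - f = x^2/2 and h + f = x^4/2 + x^2/2 are both convex, so the pair satisfies the condition with L = 1 even though no Euclidean smoothness constant exists.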

VGB for Masked Diffusion Model: Efficient Test-time Scaling for Reward Satisfaction and Sample Editing (cs.LG)

Inference-time scaling is a promising paradigm to improve generative models, especially when outputs must satisfy structural constraints or optimize downstream rewards. We consider masked diffusion models (MDMs) and introduce MDM-VGB, a discrete diffusion sampler that augments unmasking generation with theoretically principled reward-guided remasking. Inspired by the recent success of the classical Jerrum-Sinclair backtracking Markov chain in reward-tilted generation, MDM-VGB extends the backtracking random walk from a fixed prefix tree to a masked-state graph, allowing tokens to be unmasked and remasked at arbitrary positions. The resulting sampler favors unmasking and remasking moves that lead to higher-value partial configurations, enabling both effective high-reward generation and efficient repair of low-reward samples. We prove that MDM-VGB is robust to process-verifier noise and achieves quadratic complexity, while popular test-time heuristics such as best-of-N can incur exponential complexity due to error accumulation. Our theoretical findings are corroborated by strong empirical performance, particularly on popular constraint-satisfaction and scientific benchmarks such as Sudoku and QM9.
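A back-of-the-envelope model of why best-of-N suffers from error accumulation, under assumptions chosen only for illustration: each of `length` tokens is independently correct with probability `p_token`, and the repair strategy has a perfect per-position verifier. Neither assumption comes from the paper, which handles verifier noise and proves a quadratic bound for MDM-VGB; the sketch only shows the exponential-versus-linear gap in its simplest form.

```python
def best_of_n_expected_samples(p_token: float, length: int) -> float:
    """Expected number of full samples until one is entirely correct.

    A whole sample is correct with probability p_token ** length, and the number
    of independent tries until the first success is geometric with mean 1 / that.
    """
    return p_token ** -length


def local_repair_expected_draws(p_token: float, length: int) -> float:
    """Expected token draws if each position is redrawn until it is correct.

    Assumes a perfect per-position verifier: each position needs a geometric
    number of draws with mean 1 / p_token, and positions are independent.
    """
    return length / p_token
```

For a 9x9 grid (81 cells) with `p_token = 0.9`, the first count is a little over 5,000 full samples, while the second is 90 token draws.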

Democratic ICAI: Debating Our Way to Steering Principles from Preferences (cs.LG)

Preference-based alignment often struggles to capture the reasoning that underlies human judgments. Many evaluations rely on multiple interacting criteria, yet pairwise labels reveal only the final choice rather than the considerations that shape preferences. Inverse Constitutional AI (ICAI) improves interpretability in decision making by summarizing preferences into natural-language principles, but its single-pass explanations miss much of the nuance involved in complex decisions. We introduce Democratic ICAI, a novel approach that gathers multiple competing rationales through structured persona debate, offering a broader and more expressive account of the factors influencing each comparison. From these richer signals, we derive clearer and more comprehensive steering principles and use them to guide decision modeling through both LLM-based and decision-tree judges. Experiments on creative preference benchmarks, MuCE-Pref and LiTBench, across multiple creative task categories show that Democratic ICAI yields a more faithful preference structure. It improves average preference prediction across tasks relative to deliberative prompting and principle-based baselines, while producing constitutions that LLM annotators prefer.
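A minimal sketch of how natural-language principles could drive a decision-tree-style judge, assuming each principle has already been turned into a pairwise voting function. The two principles (`avoids_cliches`, `is_concise`), the `CLICHES` list, and the `decision_list_judge` helper are hypothetical toys; in Democratic ICAI the principles come from persona debate and are applied by LLM or decision-tree judges, neither of which this code reproduces. A decision list is the simplest decision tree: each node checks one principle and falls through to the next on a tie.

```python
from typing import Callable, Sequence, Tuple

# A principle compares responses a and b: +1 prefers a, -1 prefers b, 0 abstains.
Principle = Callable[[str, str], int]

CLICHES = ("once upon a time", "dark and stormy", "happily ever after")


def avoids_cliches(a: str, b: str) -> int:
    """Prefer the response containing fewer stock phrases."""
    def count(s: str) -> int:
        return sum(s.lower().count(c) for c in CLICHES)
    ca, cb = count(a), count(b)
    return (ca < cb) - (ca > cb)


def is_concise(a: str, b: str) -> int:
    """Prefer the shorter response."""
    return (len(a) < len(b)) - (len(a) > len(b))


def decision_list_judge(principles: Sequence[Principle], a: str, b: str, default: int = 1) -> int:
    """Return the first non-abstaining principle's vote, else `default`."""
    for principle in principles:
        vote = principle(a, b)
        if vote != 0:
            return vote
    return default


def accuracy(judge: Callable[[str, str], int], pairs: Sequence[Tuple[str, str, int]]) -> float:
    """Fraction of (a, b, label) pairs where the judge matches the human label."""
    hits = sum(judge(a, b) == label for a, b, label in pairs)
    return hits / len(pairs)
```

On a pair where the human preferred the cliché-free response, the first principle decides correctly; on a pair where the human preferred the longer, more vivid response, neither principle captures that, and the judge misses. Richer principles are what the paper's debate step is meant to surface.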

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	Output speed (tok/s)	Price ($/1M tokens)
1	Claude Fable 5	59.9	0	$20.00
2	Claude Opus 4.8	55.7	58	$10.00
3	GPT-5.5	54.8	83	$11.25
4	Claude Opus 4.7	53.5	55	$10.00
5	GPT-5.4	51.4	174	$5.63

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model / agent	Provider	Type	Score
1	gpt-5.5-2026-04-23-xhigh	OpenAI	Model	62.7% ± 0.91%
2	Junie	Junie	Agent	61.6% ± 0.64%
3	Codex	OpenAI	Agent	60.4% ± 1.37%
4	Claude Code	Anthropic	Agent	59.6% ± 1.98%
5	gpt-5.5-2026-04-23-medium	OpenAI	Model	58.9% ± 0.78%
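The ± half-widths in the SWE-rebench table allow a rough overlap check between entries. This sketch assumes each entry is a symmetric band of score ± half-width; the table does not say whether these are standard errors or confidence intervals, so overlapping bands are a heuristic signal that two ranks may not be distinguishable, not a significance test.

```python
# (name, score %, ± half-width %) as listed in the SWE-rebench table
ROWS = [
    ("gpt-5.5-2026-04-23-xhigh", 62.7, 0.91),
    ("Junie", 61.6, 0.64),
    ("Codex", 60.4, 1.37),
    ("Claude Code", 59.6, 1.98),
    ("gpt-5.5-2026-04-23-medium", 58.9, 0.78),
]


def overlapping_pairs(rows):
    """Return name pairs whose [score - hw, score + hw] bands intersect."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            (na, sa, ea), (nb, sb, eb) = rows[i], rows[j]
            # Two closed intervals intersect iff the larger lower end <= the smaller upper end.
            if max(sa - ea, sb - eb) <= min(sa + ea, sb + eb):
                pairs.append((na, nb))
    return pairs
```

On these five rows, six of the ten pairs overlap, including the top two entries; the top entry's band does not overlap with Codex, Claude Code, or the medium-effort run.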

GitHub Repos

Trending

simplex-chat/simplex-chat

15714 ★

SimpleX - the first messaging network operating without user identifiers of any kind - 100% private by design! iOS, Android and desktop apps 📱!

ripienaar/free-for-dev

126101 ★

A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev

commaai/openpilot

62598 ★

openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.

Robbyant/lingbot-map

8444 ★

A feed-forward 3D foundation model for reconstructing scenes from streaming data

DeusData/codebase-memory-mcp

20533 ★

High-performance code intelligence MCP server. Indexes codebases into a persistent knowledge graph — average repo in milliseconds. 158 languages, sub-ms queries, 99% fewer tokens. Single static binary, zero dependencies.

Daily discovery

Blaizzy/mlx-audio (Transformers)

7451 ★

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

StarTrail-org/LEANNRAG

12609 ★

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

DmitrL-dev/AISecurity (AI Safety)

106 ★

AI Security Platform: Defense (61 Rust engines + Micro-Model Swarm) + Offense (39K+ payloads)

FootprintAI/Containarium (MCP)

242 ★

Open-source agent runtime — SSH-native isolation, eBPF egress policy, Kubernetes + LXC backends, GPU passthrough, MCP-native CLI

esengine/DeepSeek-Reasonix (LLM)

25296 ★

DeepSeek-native AI coding agent for your terminal. Engineered around prefix-cache stability — leave it running.