The Inference Report

April 15, 2026

The AI industry is experiencing a collision between its capital requirements and its ability to generate returns, forcing a visible separation between firms that can sustain high burn rates and those that cannot. Anthropic's $380 billion valuation now appears cheaper than OpenAI's implicit $1.2 trillion price tag, signaling that investors are beginning to distinguish between narrative and actual business fundamentals. This reckoning extends across the entire stack. Microsoft is raising Surface prices 33 percent due to RAM shortages while struggling to meet its carbon-negative pledge as data center electricity demand is projected to double by 2030. The hyperscalers priced AI services as premium goods when GPU access was scarce and alternatives nonexistent. That advantage has evaporated. Neocloud providers are undercutting them significantly, and the market is becoming ruthless about cost. When Globalstar announced a merger with Amazon for $11.6 billion to integrate satellite connectivity, it was not about innovation. It was about who controls the last mile to devices when electricity becomes the binding constraint.

Regulatory capture is replacing regulatory avoidance as the dominant industry strategy. OpenAI and Anthropic are now openly clashing over Illinois liability law, with OpenAI backing protections that would shield labs from mass casualty events while Anthropic opposes them. Simultaneously, Anthropic is briefing the Trump administration on its Mythos cybersecurity model while suing the government. This apparent contradiction reflects the actual shape of modern regulatory strategy: compete on liability frameworks while maintaining government relationships. Maine passed the first state data center construction ban, a precedent the industry fears will spread. Silicon Valley is spending millions to stop Alex Bores, a former Palantir employee, from reaching Congress after he helped pass tough AI laws. The industry's political spending is no longer primarily about shaping distant federal policy. It is about local control and preventing precedents that spread.

The practical work of deploying AI at scale is creating operational bottlenecks that model improvements alone cannot solve. GitHub is introducing Stacked PRs to handle the volume of code AI tools generate, breaking large pull requests into smaller units because traditional code review cannot keep pace. Enterprise developers are building autonomous AI agents faster than security infrastructure can contain them, forcing vendors like Curity to rebuild identity and access management from scratch. Microsoft is testing features inspired by Openclaw to make Copilot more autonomous, but experts warn this introduces major security risks. Ukraine is replacing soldiers with robots to offset drone casualties. Max Hodak's Science Corp is preparing to place a sensor in a human brain. These are no longer experiments. They are deployments. The question is no longer whether AI works. It is whether the systems built to govern it can scale as fast as the infrastructure that runs it.

The competitive advantage in AI is shifting from general-purpose capability to specialized access and operational infrastructure. OpenAI's Trusted Access for Cyber program now includes GPT-5.4-Cyber, a model explicitly designed for vetted defenders, pairing capability advancement with gating. Google DeepMind's Gemini Robotics ER 1.6 reflects a similar push into embodied reasoning, where real-world task performance can be measured and monetized. Meanwhile, NVIDIA is positioning quantum AI models as open-source infrastructure, treating the foundation as a platform layer. GitHub's work on security benchmarks and AI21's warning about coding agent benchmark inflation both point to the same problem: agent capabilities are being measured against benchmarks that do not predict real-world reliability. Security and robotics are where capability claims get tested against reality fastest. That is where the money follows.

Grant Calloway

Research Papers
SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis cs.CV

Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene graphs, yet evaluation still relies on LLM or VLM judges that score rendered views, making judgments sensitive to viewpoint, prompt phrasing, and hallucination. When the evaluator is unstable, it becomes difficult to determine whether a model has produced a spatially plausible scene or whether the output score merely reflects the choice of viewpoint, rendering, or prompt. We introduce SceneCritic, a symbolic evaluator for floor-plan-level layouts. SceneCritic's constraints are grounded in SceneOnto, a structured spatial ontology we construct by aggregating indoor scene priors from 3D-FRONT, ScanNet, and Visual Genome. SceneCritic traverses this ontology to jointly verify semantic, orientation, and geometric coherence across object relationships, providing object-level and relationship-level assessments that identify specific violations and successful placements. Furthermore, we pair SceneCritic with an iterative refinement test bed that probes how models build and revise spatial structure under different critic modalities: a rule-based critic using collision constraints as feedback, an LLM critic operating on the layout as text, and a VLM critic operating on rendered observations. Through extensive experiments, we show that (a) SceneCritic aligns substantially better with human judgments than VLM-based evaluators, (b) text-only LLMs can outperform VLMs on semantic layout quality, and (c) image-based VLM refinement is the most effective critic modality for semantic and orientation correction.
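The flavor of object-level and relationship-level verification can be sketched with a toy symbolic critic. Everything below (the `Obj` record, the `adjacent` gap threshold, the relation tuples) is an invented stand-in, not SceneCritic's actual SceneOnto-grounded constraint set; it only shows the shape of the idea, checking geometry symbolically instead of asking a VLM to judge a rendered view.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    x: float          # floor-plan centre, metres
    y: float
    w: float          # footprint width
    d: float          # footprint depth

def overlaps(a: Obj, b: Obj) -> bool:
    """Axis-aligned footprint collision test."""
    return (abs(a.x - b.x) < (a.w + b.w) / 2 and
            abs(a.y - b.y) < (a.d + b.d) / 2)

def adjacent(a: Obj, b: Obj, gap: float = 0.5) -> bool:
    """Footprints within `gap` metres of touching, without overlapping."""
    dx = abs(a.x - b.x) - (a.w + b.w) / 2
    dy = abs(a.y - b.y) - (a.d + b.d) / 2
    return max(dx, dy) <= gap and not overlaps(a, b)

def critique(layout, relations):
    """Object-level collisions plus relationship-level pass/fail."""
    report = {"collisions": [], "relations": {}}
    for i, a in enumerate(layout):
        for b in layout[i + 1:]:
            if overlaps(a, b):
                report["collisions"].append((a.name, b.name))
    objs = {o.name: o for o in layout}
    for rel, s, t in relations:
        if rel == "adjacent":
            report["relations"][(rel, s, t)] = adjacent(objs[s], objs[t])
    return report

bed = Obj("bed", 1.0, 1.0, 2.0, 1.8)
stand = Obj("nightstand", 2.3, 1.0, 0.5, 0.5)
lamp = Obj("lamp", 1.0, 1.0, 0.3, 0.3)   # placed inside the bed: a violation
report = critique([bed, stand, lamp], [("adjacent", "nightstand", "bed")])
```

Because the checks are symbolic, the verdict is invariant to viewpoint and prompt phrasing, which is exactly the instability the paper attributes to VLM judges.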

Visual Preference Optimization with Rubric Rewards cs.CV

The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality differences that matter in multimodal tasks. Existing pipelines often rely on off-policy perturbations or coarse outcome-based signals, which are not well suited to fine-grained visual reasoning. We propose rDPO, a preference optimization framework based on instance-specific rubrics. For each image-instruction pair, we create a checklist-style rubric of essential and additional criteria to score responses from arbitrary policies. The instruction-rubric pool is built offline and reused during the construction of on-policy data. On public reward modeling benchmarks, rubric-based prompting substantially improves a 30B-A3B judge and brings it close to GPT-5.4. On public downstream benchmarks, rubric-based filtering raises the macro average from 81.14 to 82.69, whereas outcome-based filtering drops it to 75.82. When evaluating scalability on a comprehensive benchmark, rDPO achieves 61.01, markedly outperforming the style-constrained baseline (52.36) and surpassing the base model's 59.48. Together, these results show that visual preference optimization benefits from combining on-policy data construction with instance-specific criterion-level feedback.
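A minimal sketch of checklist-style rubric scoring, assuming toy string predicates in place of the paper's instance-specific criteria (the gating rule, the partial-credit formula, and the example criteria are all illustrative inventions, not rDPO's actual scoring function):

```python
def rubric_score(response: str, essential, additional) -> float:
    """Essential criteria gate the score; additional ones add partial credit."""
    if not all(check(response) for check in essential):
        return 0.0
    extra = sum(check(response) for check in additional)
    return 1.0 + extra / max(len(additional), 1)

def build_preference_pair(responses, essential, additional):
    """Pick best/worst on-policy responses by rubric score -> (chosen, rejected)."""
    scored = sorted(responses, key=lambda r: rubric_score(r, essential, additional))
    return scored[-1], scored[0]

essential = [lambda r: "dog" in r]           # must mention the main subject
additional = [lambda r: "red collar" in r]   # fine-grained visual detail

chosen, rejected = build_preference_pair(
    ["A dog with a red collar runs.", "A dog runs.", "A cat sleeps."],
    essential, additional)
```

The point of the design is that the preference signal is criterion-level rather than a single outcome score, so the pair construction can distinguish responses that are both "correct" but differ in fine-grained visual detail.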

CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations cs.LG

The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byte streams. CLAD bypasses these bottlenecks by exploiting a key insight: normal logs compress into regular byte patterns, while anomalies systematically disrupt them. To extract these multi-scale deviations from opaque bytes, we propose a purpose-built architecture integrating a dilated convolutional byte encoder, a hybrid Transformer-mLSTM, and four-way aggregation pooling. This is coupled with a two-stage training strategy of masked pre-training and focal-contrastive fine-tuning to effectively handle severe class imbalance. Evaluated across five datasets, CLAD achieves a state-of-the-art average F1-score of 0.9909 and outperforms the best baseline by 2.72 percentage points. It delivers superior accuracy while completely eliminating decompression and parsing overheads, offering a robust solution that generalizes to structured streaming compressors.
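The key insight, that anomalies disrupt the regular byte patterns of compressible logs, can be demonstrated without any deep learning at all. The sketch below is not CLAD's architecture; it just uses per-window zlib compression ratios as a crude anomaly signal under that same assumption (the log lines and window size are made up):

```python
import zlib

def window_ratios(lines, window=4):
    """Per-window compression ratio over raw bytes; higher = less regular."""
    ratios = []
    for i in range(0, len(lines) - window + 1, window):
        raw = "\n".join(lines[i:i + window]).encode()
        ratios.append(len(zlib.compress(raw)) / len(raw))
    return ratios

logs = ["INFO request ok id=%d" % i for i in range(11)]
logs.append("PANIC 0x7f3a kernel oops Zq9#w!~&B%1u^")   # injected anomaly
ratios = window_ratios(logs)   # windows cover lines 0-3, 4-7, 8-11
```

The repetitive INFO windows compress well, while the window containing the high-entropy PANIC line compresses noticeably worse, which is the statistical regularity CLAD's byte encoder learns to exploit at multiple scales.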

Classical and Quantum Speedups for Non-Convex Optimization via Energy Conserving Descent quant-ph

The Energy Conserving Descent (ECD) algorithm was recently proposed (De Luca & Silverstein, 2022) as a global non-convex optimization method. Unlike gradient descent, appropriately configured ECD dynamics escape strict local minima and converge to a global minimum, making it appealing for machine learning optimization. We present the first analytical study of ECD, focusing on the one-dimensional setting for this first installment. We formalize a stochastic ECD dynamics (sECD) with energy-preserving noise, as well as a quantum analog of the ECD Hamiltonian (qECD), providing the foundation for a quantum algorithm through Hamiltonian simulation. For positive double-well objectives, we compute the expected hitting time from a local to the global minimum. We prove that both sECD and qECD yield exponential speedup over their respective gradient descent baselines: stochastic gradient descent and its quantization. For objectives with tall barriers, qECD achieves a further speedup over sECD.
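The qualitative mechanism, energy conservation letting a trajectory cross barriers that trap gradient descent, is easy to see in one dimension. The toy below is not the paper's sECD/qECD: it is plain frictionless Hamiltonian dynamics on an invented double well, integrated with a leapfrog scheme that approximately conserves E = v²/2 + f(q), so a trajectory whose energy exceeds the barrier reaches the global well while gradient descent from the same start stalls.

```python
def f(q):
    """Asymmetric double well: local minimum near q = -0.96, global near q = 1."""
    return (q**2 - 1)**2 - 0.3 * q

def grad(q):
    return 4 * q * (q**2 - 1) - 0.3

def leapfrog(q, v, dt=0.01, steps=2000):
    """Symplectic integration of the frictionless dynamics (energy-conserving)."""
    traj = [q]
    v -= 0.5 * dt * grad(q)      # half-kick to stagger velocities
    for _ in range(steps):
        q += dt * v
        v -= dt * grad(q)
        traj.append(q)
    return traj

def gd(q, lr=0.01, steps=2000):
    """Plain gradient descent for comparison."""
    for _ in range(steps):
        q -= lr * grad(q)
    return q

traj = leapfrog(-1.2, 1.2)   # E ~ 1.27 > barrier height ~ 1.01, so it escapes
q_gd = gd(-1.2)              # stalls in the shallow left well, f(q_gd) > 0
```

ECD proper adds a specific kinetic term and configuration so that the dynamics not only escape local minima but concentrate near the global one; the paper's contribution is analyzing hitting times for such dynamics and their stochastic and quantum variants.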

Representation geometry shapes task performance in vision-language modeling for CT enterography cs.CV

Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (IBD), yet the representational choices that best support automated analysis of this modality are unknown. We present the first study of vision-language transfer learning on abdominal CT enterography and identify two main findings. First, mean pooling of slice embeddings gives better categorical disease assessment (59.2% three-class accuracy), whereas attention pooling gives better cross-modal retrieval (0.235 text-to-image MRR). This pattern holds across all LoRA configurations tested and suggests that the two aggregators emphasize different properties of the learned representation. Second, per-slice tissue contrast matters more than broader spatial coverage: multi-window RGB encoding, which maps complementary Hounsfield Unit windows to RGB channels, outperforms all strategies that increase spatial coverage through multiplanar sampling, and in this setting adding coronal and sagittal views reduces classification performance. For report generation, fine-tuning without retrieval context yields within-1 severity accuracy at the prevalence-matched chance level (70.4% vs. 71% random), suggesting little learned ordering beyond the class distribution. Retrieval-augmented generation (RAG) improves this across all configurations, scoring 7 to 14 percentage points above the chance baseline and improving ordinal MAE from 0.98 to 0.80-0.89. A three-teacher pseudolabel framework enables all comparisons without expert annotations. Together, these findings provide the first baselines for this underexplored modality and offer practical guidance for building vision-language systems for volumetric medical imaging.
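The mean-vs-attention pooling contrast at the heart of the first finding can be sketched in a few lines. These are generic textbook aggregators over toy 2-dimensional slice embeddings, not the paper's implementation; the query vector and embeddings are invented:

```python
import math

def mean_pool(slices):
    """Uniform average of slice embeddings into one volume embedding."""
    d = len(slices[0])
    return [sum(s[j] for s in slices) / len(slices) for j in range(d)]

def attention_pool(slices, query):
    """Softmax(dot(query, slice)) weighted combination of slice embeddings."""
    scores = [sum(q * x for q, x in zip(query, s)) for s in slices]
    m = max(scores)
    weights = [math.exp(t - m) for t in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    d = len(slices[0])
    return [sum(weights[i] * slices[i][j] for i in range(len(slices)))
            for j in range(d)]

slices = [[1.0, 0.0], [0.0, 1.0]]                    # two toy slice embeddings
pooled_mean = mean_pool(slices)                      # blends all slices equally
pooled_attn = attention_pool(slices, [10.0, 0.0])    # focuses on the matching slice
```

Mean pooling summarizes the whole volume evenly, which plausibly suits whole-study disease grading, while attention pooling can lock onto the slice most similar to a query, which plausibly suits text-to-image retrieval; that is one reading of why the two aggregators win on different tasks.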

Toward Autonomous Long-Horizon Engineering for ML Research cs.CL

Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for autonomous long-horizon engineering for ML research built on a simple principle: strong long-horizon performance requires both structured orchestration and durable state continuity. To this end, AiScientist combines hierarchical orchestration with a permission-scoped File-as-Bus workspace: a top-level Orchestrator maintains stage-level control through concise summaries and a workspace map, while specialized agents repeatedly re-ground on durable artifacts such as analyses, plans, code, and experimental evidence rather than relying primarily on conversational handoffs, yielding thin control over thick state. Across two complementary benchmarks, AiScientist improves PaperBench score by 10.54 points on average over the best matched baseline and achieves 81.82 Any Medal% on MLE-Bench Lite. Ablation studies further show that the File-as-Bus protocol is a key driver of performance: removing it reduces PaperBench by 6.41 points and MLE-Bench Lite by 31.82 points. These results suggest that long-horizon ML research engineering is a systems problem of coordinating specialized work over durable project state, rather than a purely local reasoning problem.
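A hedged sketch of the File-as-Bus pattern, agents handing off through durable workspace artifacts instead of conversation. The agent names, file names, and JSON schema below are all illustrative inventions, not AiScientist's API; the point is only that each agent re-grounds by reading files, so state survives context loss:

```python
import json
import tempfile
from pathlib import Path

def planner(workspace: Path):
    """Writes a durable plan artifact instead of messaging the next agent."""
    (workspace / "plan.json").write_text(json.dumps(
        {"stage": "experiment", "todo": ["train baseline", "ablate pooling"]}))

def engineer(workspace: Path):
    """Re-grounds on the plan file, then leaves its own durable evidence."""
    plan = json.loads((workspace / "plan.json").read_text())
    results = {task: "done" for task in plan["todo"]}
    (workspace / "results.json").write_text(json.dumps(results))

def orchestrator(workspace: Path):
    """Thin control over thick state: sequences stages, reads only summaries."""
    planner(workspace)
    engineer(workspace)
    return json.loads((workspace / "results.json").read_text())

ws = Path(tempfile.mkdtemp())
results = orchestrator(ws)
```

Because every handoff is a file, any agent (or a restarted one) can reconstruct the project state from the workspace alone, which is the continuity property the ablation suggests matters more than any single agent's reasoning.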

Benchmarks

Artificial Analysis Intelligence Index

Composite score across coding, math, and reasoning

#  Model                    Score  tok/s  $/1M
1  Gemini 3.1 Pro Preview   57.2   122    $4.50
2  GPT-5.4                  56.8   79     $5.63
3  GPT-5.3 Codex            53.6   65     $4.81
4  Claude Opus 4.6          53     43     $10.00
5  Muse Spark               52.1   0      $0.00
SWE-rebench

Agentic coding on real-world software engineering tasks

#  Model                      Score
1  Claude Opus 4.6            65.3%
2  gpt-5.2-2025-12-11-medium  64.4%
3  GLM-5                      62.8%
4  gpt-5.4-2026-03-05-medium  62.8%
5  Gemini 3.1 Pro Preview     62.3%