Anthropic's $44 billion annualized revenue run rate and a $200 billion Google Cloud commitment have drawn no regulatory enforcement action whatsoever. The week's volume reveals a market that has already moved past the question of whether AI deployment should wait for policy frameworks and settled firmly on the answer: no. The EU softened AI Act deadlines to late 2027 and 2028, effectively conceding that compliance timelines cannot catch up to shipping velocity. Meanwhile, OpenAI launched voice reasoning into the API, Perplexity shipped its Personal Computer on Mac, Spotify expanded its AI DJ to four new languages, and Bumble integrated AI dating assistants. Each deployment embeds the infrastructure deeper into production systems before regulators can respond. Moonshot AI hit $200 million in annualized revenue, and startups like Fazeshift raised $17 million to automate accounts receivable because the economic case for labor displacement is too strong to resist. The pattern is unmistakable: regulation follows deployment, not the reverse.
What emerges beneath the noise is a secondary but crucial tension between augmentation and displacement. Basata automates medical office administration. Teradata's Autonomous Knowledge Platform forces enterprises to answer which data agents can use and who is accountable when they fail. These are not philosophical questions but questions about cost centers and headcount. Pennsylvania sued Character.AI for impersonating a psychiatrist. A union vote landed at Google DeepMind. These actions arrive after deployment, not before. The builders have already won the race to install the infrastructure.
In the labs, the competition has stratified into layers that bypass traditional benchmarking. OpenAI owns the consumer and enterprise API surface through monetization of ChatGPT and specialized GPT-5.5 variants for security teams. NVIDIA consolidated the deployment layer through infrastructure partnerships. AMD is fighting for cost parity on GPU optimization. Anthropic is publishing mechanistic interpretability work and open-source alignment tooling as credibility plays. The real differentiation is no longer model capability alone but relationship to enterprise builders and the ability to solve operational problems at scale. GitHub's trending repos confirm this: developers are moving past monolithic "throw an LLM at it" frameworks toward domain-specific composition. InsForge and goose abstract deployment friction. DeepSeek-TUI, agent-skills, and specialized tools like local-deep-research raise the ceiling on what agents can actually do. The distinction matters because it reveals where the work now concentrates: not in model training but in the infrastructure, tooling, and operational maturity that make deployment faster and cheaper than any regulatory process can constrain.
Grant Calloway
Large language models have driven major advances in Text-to-SQL generation. However, they suffer from high computational cost, long latency, and data privacy concerns, which make them impractical for many real-world applications. A natural alternative is small language models (SLMs), which enable efficient and private on-premise deployment. Yet SLMs often struggle with weak reasoning and poor instruction following. Conventional reinforcement learning methods based on sparse binary (0/1) rewards provide little learning signal when the generated SQL is incorrect, leading to unstable or collapsed training. To overcome these issues, we propose FINER-SQL, a scalable and reusable reinforcement learning framework that enhances SLMs through fine-grained execution feedback. Built on group relative policy optimization, FINER-SQL replaces sparse supervision with dense and interpretable rewards that offer continuous feedback even for incorrect SQL. It introduces two key reward functions: a memory reward, which aligns reasoning with verified traces for semantic stability, and an atomic reward, which measures operation-level overlap to grant partial credit for structurally correct but incomplete SQL. This approach transforms discrete correctness into continuous learning, enabling stable, critic-free optimization. Experiments on the BIRD and Spider benchmarks show that FINER-SQL achieves up to 67.73% and 85% execution accuracy with a 3B model, matching much larger LLMs while reducing inference latency to 5.57 s/sample. These results highlight a cost-efficient and privacy-preserving path toward high-performance Text-to-SQL generation. Our code is available at https://github.com/thanhdath/finer-sql.
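The abstract's atomic reward grants partial credit when a generated query shares operations with the gold query even though execution fails. The paper's exact formulation isn't reproduced here; the sketch below illustrates the idea with a hypothetical Jaccard overlap over a coarse set of SQL clause keywords:

```python
import re

# Coarse operation vocabulary; the paper's actual operation set may be finer-grained.
SQL_OPS = ("select", "from", "where", "group by", "having", "order by", "limit", "join")

def atomic_ops(sql: str) -> set:
    """Extract the set of clause-level operations present in a SQL string."""
    s = sql.lower()
    return {op for op in SQL_OPS
            if re.search(r"\b" + op.replace(" ", r"\s+") + r"\b", s)}

def atomic_reward(pred_sql: str, gold_sql: str) -> float:
    """Jaccard overlap of operations: a dense signal that rewards
    structurally correct but incomplete SQL, instead of a 0/1 execution check."""
    p, g = atomic_ops(pred_sql), atomic_ops(gold_sql)
    if not p and not g:
        return 1.0
    return len(p & g) / len(p | g)
```

A prediction missing only an `ORDER BY` clause scores 0.75 rather than 0, giving the policy gradient something to climb.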
The connection between subset-maximal repairs for inconsistent databases involving various integrity constraints and acceptable sets of arguments within argumentation frameworks has recently drawn growing interest. In this paper, we contribute to this domain by establishing a new connection when integrity constraints (ICs) include denial constraints and local-as-view tuple-generating dependencies. It turns out that SET-based Argumentation Frameworks (SETAFs), an extension of Dung's argumentation frameworks (AFs) allowing collective attacks, are needed. It is known that subset-maximal repairs under denial constraints correspond to the naive extensions, which also coincide with the preferred and stable extensions in the resulting SETAFs. Our main findings establish that repairs under the considered fragment of tuple-generating dependencies correspond to the preferred extensions. Moreover, for these dependencies, additional preprocessing allows computing a unique extension that is both stable and naive. Allowing both types of constraints breaks this relationship, and even the preprocessing does not help, as only the preferred semantics captures these repairs. Finally, while it is known that functional dependencies do not require set-based attacks, we prove the same for inclusion dependencies. Thus, one can translate inconsistent databases under these restricted classes of ICs to plain AFs with attacks only between arguments.
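The repairs-to-extensions correspondence is easy to see on a toy plain AF (the SETAF collective attacks the paper needs are omitted here): treat each database tuple as an argument and encode a binary denial-constraint violation as a mutual attack. Naive extensions, i.e. subset-maximal conflict-free sets, then coincide with subset-maximal repairs. A brute-force sketch:

```python
from itertools import combinations

def conflict_free(S, attacks):
    """A set S is conflict-free if no member of S attacks another member."""
    return not any((a, b) in attacks for a in S for b in S)

def naive_extensions(args, attacks):
    """Naive extensions of a Dung AF: subset-maximal conflict-free sets.
    Exponential enumeration -- fine for illustration, not for real databases."""
    args = sorted(args)
    cf = [frozenset(S)
          for r in range(len(args) + 1)
          for S in combinations(args, r)
          if conflict_free(S, attacks)]
    return {S for S in cf if not any(S < T for T in cf)}
```

With tuples {a, b, c} where a and b jointly violate a denial constraint (mutual attack), the naive extensions are {a, c} and {b, c}: exactly the two subset-maximal repairs.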
Text-to-SQL enables non-expert users to query databases in natural language, yet real-world schemas often suffer from ambiguous, abbreviated, or inconsistent naming conventions that degrade model accuracy. Existing approaches treat schemas as fixed and address errors downstream. In this paper, we frame schema refinement as a constrained optimization problem: find a renaming function that maximizes downstream Text-to-SQL execution accuracy while preserving query equivalence through database views. We analyze the computational hardness of this problem, which motivates a column-wise greedy decomposition, and instantiate it as EGRefine: a four-phase pipeline that screens ambiguous columns, generates context-aware candidate names, verifies them through execution-grounded feedback, and materializes the result as non-destructive SQL views. The pipeline carries two structural properties: column-local non-degradation, ensured by the conservative selection rule in the verification phase, and database-level query equivalence, ensured by the view-based materialization phase. Together they make the resulting refinement safe by construction at the column level, with cross-column and prompt-level interactions handled empirically rather than analytically. Across controlled schema-degradation, real-world, and enterprise benchmarks, EGRefine recovers accuracy lost to schema naming noise where applicable and correctly abstains where the underlying task exceeds current Text-to-SQL capabilities, with refined schemas transferring across model families to enable refine-once, serve-many-models deployment. Code and data are publicly available at https://github.com/ai-jiaqian/EGRefine.
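The "non-destructive SQL views" materialization is the key safety mechanism: the base table keeps its cryptic names, and a view exposes the refined ones as a pure renaming projection, so query equivalence holds by construction. A minimal sketch with SQLite (table and column names here are hypothetical, not from the paper):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A table with the kind of abbreviated names EGRefine targets.
con.execute("CREATE TABLE t (cust_nm TEXT, ord_amt REAL)")
con.execute("INSERT INTO t VALUES ('alice', 10.0), ('bob', 20.0)")

# Non-destructive refinement: readable names via a view, base table untouched.
con.execute("""
    CREATE VIEW t_refined AS
    SELECT cust_nm AS customer_name, ord_amt AS order_amount FROM t
""")

# Downstream Text-to-SQL now queries against the refined schema.
rows = con.execute(
    "SELECT customer_name FROM t_refined WHERE order_amount > 15"
).fetchall()
```

Because the view adds no filtering or computation, any query over `t_refined` rewrites one-to-one onto `t`, which is what makes the refinement safe to deploy without migrating data.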
Visual Graph Query Interfaces (VQIs) empower non-programmers to query graph data by constructing visual queries intuitively. Devising efficient technologies in Graph Query Engines (GQEs) for interactive search and exploration has also been studied for years. However, these two vibrant scientific fields are traditionally independent of each other, creating a vast barrier for users who wish to explore the full-stack operations of graph querying. In this demonstration, we propose VisualNeo, a novel VQI system built on Neo4j that facilitates efficient subgraph queries over large graph databases. VisualNeo inherits several features from recent state-of-the-art VQIs, including data-driven GUI design and canned pattern generation. Additionally, it embodies a database manager module so that users can connect to generic Neo4j databases. It performs query processing through the Neo4j driver and provides aesthetically pleasing exploration of query results.
Viruses are the most abundant biological entities on Earth and play a pivotal role in microbial ecosystems, yet, as prominent human pathogens, they are also closely linked to human morbidity and mortality. Accurate identification and classification of viral genome sequences is therefore essential, but existing genome-based classification models, which rely largely on composition- or frequency-based subsequence features, often suffer from limited interpretability and reduced accuracy, particularly on complex or imbalanced datasets. To address these limitations, we propose GeneNSPCla (Genomic Negative Sequential Pattern-based Classification), a novel viral classification framework based on Negative Sequential Patterns (NSPs) that extracts discriminative absence-based features from nucleotide sequences of RNA viral genomes. By transforming these NSPs into numerical feature vectors and integrating them into multiple supervised classifiers, GeneNSPCla effectively captures both presence and absence signals in viral sequences. Furthermore, we propose GONPM+, a negative pattern mining algorithm adapted for genomic data, which discovers longer and more biologically meaningful negative sequential patterns. Experimental results show that the average accuracy of GONPM+ across 8 classifiers improves by 10.03% over the original negative pattern mining algorithm and by 24.75% over the positive pattern mining algorithm. These findings highlight the effectiveness of incorporating absence-based sequential information, providing a new and complementary perspective for viral genome analysis and classification.
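The core idea, encoding the *absence* of a pattern as a feature alongside its presence, can be sketched in a few lines. This is an illustrative simplification (contiguous substring matching, hand-picked patterns), not the GONPM+ mining algorithm, which discovers the negative patterns itself:

```python
def nsp_features(sequence, positive_patterns, negative_patterns):
    """Binary feature vector over a nucleotide string: 1 per positive
    pattern that occurs, plus 1 per negative pattern that is absent."""
    presence = [1 if p in sequence else 0 for p in positive_patterns]
    absence = [1 if p not in sequence else 0 for p in negative_patterns]
    return presence + absence
```

The absence bits are what conventional frequency-based features discard: a motif that reliably *never* occurs in one viral family is as discriminative as one that always does.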
The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA systems act as stateless planners, generating retrieval plans in isolation without exploiting historical query patterns: analogous to a database system that optimizes every query from scratch without a plan cache. This fundamental design flaw leads to schema hallucinations and limited retrieval coverage. We propose CacheRAG, a systematic cache-augmented architecture for LLM-based KGQA that transforms stateless planners into continual learners. Unlike traditional database plan caching (which optimizes for frequency), CacheRAG introduces three novel design principles tailored for LLM contexts: (1) Schema-agnostic user interface: A two-stage semantic parsing framework via Intermediate Semantic Representation (ISR) enables non-expert users to interact purely in natural language, while a Backend Adapter grounds the LLM with local schema context to compile executable physical queries safely. (2) Diversity-optimized cache retrieval: A two-layer hierarchical index (Domain → Aspect) coupled with Maximal Marginal Relevance (MMR) maximizes structural variety in cached examples, effectively mitigating reasoning homogeneity. (3) Bounded heuristic expansion: Deterministic depth and breadth subgraph operators with strict complexity guarantees significantly enhance retrieval recall without risking unbounded API execution. Extensive experiments on multiple benchmarks demonstrate that CacheRAG significantly outperforms state-of-the-art baselines (e.g., +13.2% accuracy and +17.5% truthfulness on the CRAG dataset).
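The MMR step in principle (2) is a standard greedy selection that trades relevance against redundancy, so cached examples fed to the LLM are structurally varied rather than near-duplicates. A generic sketch (CacheRAG's hierarchical index and exact scoring are not reproduced; similarity matrices here are assumed precomputed):

```python
def mmr_select(query_sim, pairwise_sim, k, lam=0.5):
    """Greedy Maximal Marginal Relevance: pick k items maximizing
    lam * relevance - (1 - lam) * max similarity to already-picked items."""
    selected = []
    remaining = list(range(len(query_sim)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two near-duplicate cached plans and one dissimilar plan, plain top-k would return the duplicates; MMR picks the most relevant plan plus the dissimilar one, which is exactly the "structural variety" the abstract claims.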
Composite score across coding, math, and reasoning
| # | Model | Score | tok/s | $/1M |
|---|---|---|---|---|
| 1 | GPT-5.5 | 60.2 | 75 | $11.25 |
| 2 | Claude Opus 4.7 | 57.3 | 52 | $10.94 |
| 3 | Gemini 3.1 Pro Preview | 57.2 | 125 | $4.50 |
| 4 | GPT-5.4 | 56.8 | 78 | $5.63 |
| 5 | Kimi K2.6 | 53.9 | 38 | $1.71 |
Agentic coding on real-world software engineering tasks
| # | Model | Score |
|---|---|---|
| 1 | Claude Opus 4.6 | 65.3% |
| 2 | gpt-5.2-2025-12-11-medium | 64.4% |
| 3 | GLM-5 | 62.8% |
| 4 | Junie | 62.8% |
| 5 | gpt-5.4-2026-03-05-medium | 62.8% |
Coding agent for DeepSeek models that runs in your terminal
DFlash: Block Diffusion for Flash Speculative Decoding
Give agents everything they need to ship fullstack apps. The backend built for agentic development.
Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted.
AgenticX is a unified, production-ready multi-agent platform — Python SDK + CLI (agx) + Studio server + Machi desktop app. Features Meta-Agent orchestration, 15+ LLM providers, MCP Hub, hierarchical memory, avatar & group chat, skill ecosystem, safety sandbox, and IM gateway (Feishu/WeChat).
Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
An agentic Machine Learning Engineer