The Inference Report

May 17, 2026

Fragmentation is the story. The AI industry is sorting itself into winners and losers not by technological capability alone but by proximity to capital, regulatory favor, and distribution channels. OpenAI's partnership with Malta reveals the real playbook: lock in government relationships before competitors do, embed your product into civic infrastructure, and frame commercial distribution as a public good. Meanwhile, the infrastructure layer is splitting between monolithic all-in-one solutions like Bun and Superpowers that eliminate coordination overhead, and focused modular components like RAGLite that preserve flexibility for heterogeneous stacks. Both approaches are winning because they solve real friction, but they solve it for different developer profiles.

The fractures run deeper than product strategy. arXiv's ban on outsourced language model research and bug bounty programs drowning in AI-generated noise both signal the same problem: automation floods any system with low friction and high noise tolerance, and the industry acknowledges this but won't solve it structurally. ByteDance and Kuaishou's lead in video generation exposes how Western AI dominance claims often reflect benchmark scores rather than production-grade systems that actually ship at scale. The CFTC's plan to deploy AI against insider trading in prediction markets reads as regulatory capture dressed as oversight. This is capital sorting itself toward massive concentration or narrow technical wins, away from the messy middle.

Research in human-computer interaction reveals what happens when this sorting completes: AI's apparent frictionlessness and sycophantic affirmation quietly shift human behavior and expectations in ways that feel effortless but carry hidden costs to judgment and epistemic integrity. Observability and knowledge management are emerging as first-class infrastructure concerns precisely because systems are becoming harder to inspect and understand as they grow more complex and interconnected. Developers are building private knowledge bases and on-device inference solutions to manage the cognitive overhead of working with multiple models. The unglamorous problems that compound at scale are now driving architectural decisions. The industry is not moving toward coherence. It is efficiently sorting capital, talent, and regulatory attention away from the messy middle.

Grant Calloway

AI Labs

OpenAI

OpenAI and Malta partner to bring ChatGPT Plus to all citizens

From the Wire

Research Papers: Focused

Memory-Driven Self-Disclosure and Relational Turning Points: A Longitudinal Multimodal Study of Human-AI Interaction cs.HC

As conversational AI systems are designed for repeated use, a central question is how a series of interactions becomes a relationship. We present a longitudinal multimodal study of a memory-augmented conversational agent (24 participants x 10 sessions), in which participants rated five relational constructs -- familiarity, self-disclosure, perceived memory, conversational quality, and enjoyment -- after each session. Two complementary dynamics emerge. First, conversational quality strongly shapes how enjoyable a session feels in the moment but does not carry forward across sessions, whereas perceived memory is relationally conditioned -- predicted by prior relational state rather than reflecting system capability alone -- and it shapes later enjoyment indirectly, via subsequent self-disclosure. Second, relationships are punctuated by discrete turning points -- crashes and surges -- that are partially traceable in multimodal behavior and open different intervention windows: surges are more behaviorally detectable in the moment, enjoyment surges persist more reliably than enjoyment crashes recover, and some crashes are better forecast from person-specific behavioral drift than detected after they have already occurred. Together, the findings suggest that longitudinal human-AI relationships are built through both slow accumulation and abrupt turning points.

When AI Blurs the Boundaries of Contribution: An Empirical Study of Authorship Calibration cs.HC

The broad adoption of Artificial Intelligence (AI), especially Generative AI, raises pressing questions about how users interact with these systems to produce new content. In this paper, we introduce the concept of authorship calibration, defined as users' awareness of their actual authorship when interacting with AI. Using the CoAuthor dataset, we empirically examine how authorship calibration varies across users and how it relates to their frequency of AI use. Our results reveal high variability: users relying heavily on AI tend to misjudge their authorship, whereas those using AI less frequently exhibit more accurate authorship calibration. These findings suggest that AI can obscure users' perception of their own authorship. In learning contexts, miscalibration can affect metacognitive monitoring and learning strategies, ultimately impacting learning outcomes. Fostering authorship calibration then appears essential for promoting responsible and educationally meaningful AI integration.

SoftBoard: A Multi-Agent Tool for the Creation and Evaluation of Low-Fidelity Prototypes cs.HC

User Experience (UX) is recognized as a critical factor for the success of digital products, particularly in software startups, environments marked by time constraints, limited resources, and low maturity in design practices. Building Minimum Viable Products (MVPs) through low-fidelity prototyping represents a well-established strategy for rapid validation cycles at reduced cost. A systematic literature mapping, however, revealed gaps in the ecosystem of available tools: a predominance of general-purpose solutions adapted for prototyping, the absence of integrated methodological guidance, and the incipient use of Artificial Intelligence in the design process. This paper presents SoftBoard, a web-based tool for the creation and evaluation of low-fidelity prototypes in the context of MVP development. The tool integrates a prototype editor, team-based project organization, and a multi-agent system based on large language models that supports requirements elicitation and refinement, automates prototype generation, and evaluates interface quality based on usability heuristics. This integration aims to reduce reliance on prior UX expertise, standardize the prototyping process, and support teams in building MVPs aligned with user needs. As future work, a feasibility study with software professionals is currently underway.

TRAIL: A Platform for Configurable Human-AI Teaming Experiments cs.HC

An AI teammate's design properties (personality, communication style, when it speaks) can shape a team's trust, coordination, and decisions. Studying this rigorously demands infrastructure no existing tool provides: reproducible configuration of an AI teammate embedded in instrumented, real-time collaboration sustained over time. We present the Team Research and AI Integration Lab (TRAIL), a web platform that makes the AI teammate a configurable, reproducible design object, pairing a Big Five persona with a selective-participation message pipeline, dual memory, chained longitudinal experiments, and export-ready analytics. In a real six-session classroom deployment (about 51 students), TRAIL sustained longitudinal chaining, held the AI to a stable minority of the conversation, and enabled export-driven AI-human text-similarity analysis. A single blind persona change produced a design-consistent double dissociation: a cognitive-scaffolding agent drew stronger contribution ratings and closer linguistic alignment; a socially-supportive agent, a warmer team climate and lower over-reliance.

Practical Judgment, Virtue, and Intuition in the Use of Opaque AI-Enabled Systems cs.HC

AI-enabled systems are seeing increasing deployment across numerous domains, with many being "black boxes" with respect to core functions and capabilities. I.e., many systems take inputs and give outputs, but without users having any ability to see how the former lead to the latter. AI-enabled systems are also being used to augment autonomy in systems, and autonomy coupled with opacity raises numerous concerns surrounding, e.g., the reliability of systems, their regularity in functioning, human ability to control them, or whether deploying opaque and potentially autonomous systems is in compliance with ethical and legal norms. In this article, we argue that many of these worries can be mitigated by leveraging practical judgment, virtue, and intuition in the deployment and use of opaque AI-enabled systems. We show that focusing on these distinctly human capabilities provides a means for bridging between the practical challenges created by opacity and the ethical, legal, and social norms underpinning particular domains. We argue that a core element in doing this is a recognition that many positive human traits are not quantifiable and we therefore must develop training regimen and guidelines on AI deployment anchored in humanistic but non-quantifiable values. Throughout the article, we focus on the military domain as an exemplar of the importance of practical judgment, virtue, and intuition as drivers for ethical and effective human decision-making surrounding AI deployments, but the underlying arguments apply to all domains where opaque and potentially autonomous systems are being deployed (subject to domain-specific alterations).

Spatula: Exploring On-Demand In-Situ Interfaces and Interaction for Attribute Control cs.HC

Controlling attributes is a critical step toward achieving the final creative outcome, yet current approaches fall short in supporting users in the iterative refinement of generative content. We propose Spatula, a proof-of-concept system that generates on-demand, in-situ attribute control interfaces and interactions for creating motion graphics. Building on a technical probe that automatically analyzes animation context and generates corresponding attributes and UI, we frame attribute control as an explorable landscape and explore the attribute control space along four key dimensions: Discoverability, Resolution, Scope, and Expandability. Findings from a user study (N=12) show that our system provides intuitive and convenient interactions while supporting diverse needs for fine-grained parameter control. Furthermore, our applications demonstrate that the plug-and-play design generalizes to other domains, such as web design and 3D modeling.

Benchmarks

Intelligence Index

Composite score across coding, math, and reasoning

#	Model	Score	tok/s	$/1M
1	GPT-5.5	60.2	75	$11.25
2	Claude Opus 4.7	57.3	51	$10.94
3	Gemini 3.1 Pro Preview	57.2	131	$4.50
4	GPT-5.4	56.8	82	$5.63
5	Kimi K2.6	53.9	44	$1.71
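
The table ranks by composite score alone, so the price column is easy to overlook. As a rough sketch, the rows can be re-sorted by score divided by price. Two assumptions here are not stated in the table: that the `$/1M` column means US dollars per million tokens, and that a simple score-to-price ratio is a meaningful comparison. The `score_per_dollar` helper is hypothetical and is not part of the index.

```python
# Rows copied from the Intelligence Index table above:
# (model, composite score, tokens per second, price).
# Assumption: the price column is USD per 1M tokens.
ROWS = [
    ("GPT-5.5", 60.2, 75, 11.25),
    ("Claude Opus 4.7", 57.3, 51, 10.94),
    ("Gemini 3.1 Pro Preview", 57.2, 131, 4.50),
    ("GPT-5.4", 56.8, 82, 5.63),
    ("Kimi K2.6", 53.9, 44, 1.71),
]


def score_per_dollar(rows):
    """Return (model, score / price) pairs, highest ratio first.

    Hypothetical helper for illustration only; the ratio is not a
    metric published by the index.
    """
    ranked = [(model, score / price) for model, score, _tok_s, price in rows]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for model, ratio in score_per_dollar(ROWS):
        print(f"{model:<24} {ratio:6.2f}")
```

Under those assumptions the ordering inverts much of the leaderboard: Kimi K2.6 comes first and the two most expensive models come last. The ratio ignores throughput and any difference between input and output pricing, so it is a starting point for comparison rather than a ranking in its own right.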

SWE-rebench

Agentic coding on real-world software engineering tasks

#	Model	Score
1	Claude Opus 4.6	65.3%
2	gpt-5.2-2025-12-11-medium	64.4%
3	GLM-5	62.8%
4	Junie	62.8%
5	gpt-5.4-2026-03-05-medium	62.8%

GitHub Repos

Trending

oven-sh/bun

94609 ★

Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one

K-Dense-AI/scientific-agent-skills

24651 ★

A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing.

obra/superpowers

252042 ★

An agentic skills framework & software development methodology that works.

Anil-matcha/Open-Generative-AI

21536 ★

Uncensored, open-source alternative to Higgsfield AI, Freepik AI, Krea AI, Openart AI — Free, unrestricted AI image & video generation studio with 200+ models (Flux, Midjourney, Kling, Sora, Veo). No content filters. Self-hosted, MIT licensed.

supertone-inc/supertonic

8556 ★

Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.

Daily discovery

superlinear-ai/raglite RAG

1158 ★

🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL

hongbo-miao/hongbomiao.com Neural Network

298 ★

A personal research and development (R&D) lab that facilitates the sharing of knowledge.

xtreme1-io/xtreme1 RLHF

1191 ★