Meanwhile, as federal courts weigh whether Anthropic poses a national security threat and the Trump administration releases a framework designed to preempt state AI regulations, the real battle is shifting beneath the political theater toward control of infrastructure and the platforms through which work flows. The government's case against Anthropic rests on technical misunderstandings, according to sworn declarations filed Friday, yet the administration's National Policy Framework emphasizes lighter-touch rules for companies while attempting to centralize authority over AI policy at the federal level. Big Tech is fracturing over these attacks, with former Trump allies offering unprecedented criticism even as the administration tries to block state-level laws. This is not philosophical disagreement about safety. It is a power struggle over who writes the rules: the executive branch, states, or the companies themselves.
Beneath the regulatory rupture, power has become the bottleneck constraining the entire enterprise. Nvidia's CEO Jensen Huang projects one trillion dollars in AI chip sales through 2027, yet energy consumption is now the north star metric alongside accuracy and engagement as engineers discover that rolling out new data centers depends on power availability, not model capability. Microsoft rolled back Copilot bloat on Windows after user and developer resistance to forced integration. OpenAI is folding ChatGPT, Codex, and its browser Atlas into a single desktop superapp, signaling a shift toward enterprise infrastructure and developer tools away from the consumer market that made it a household name. These are admissions that the consumer AI wave has peaked and the real money is in developer platforms and enterprise lock-in.
Distribution control is now the prize. WordPress.com lets AI agents write and publish posts directly. Google embedded AI into Stitch, enabling developers to describe interfaces in natural language. Amazon is building a smartphone called Transformer to integrate shopping, streaming, and voice services through Alexa. LinkedIn banned an AI agent that had conquered the platform. Each move lowers friction for adoption while raising switching costs and centralizing control through the platform. PwC told staff they must embrace AI or face replacement. Google told researchers to stop submitting AI-generated bug reports to its open-source program due to hallucinations and low quality. AI adoption is no longer optional, quality control is breaking down at scale, and the winners will be whoever owns the platform through which work flows.
The technical evidence confirms this shift. Benchmark performance has plateaued at the top tier, with Claude Code holding 52.9% on SWE-rebench and the next three entries all within 1.2 percentage points of that lead, signaling that incremental gains in raw capability now demand substantial effort. On GitHub, the dominant pattern is developers moving past building individual models toward building systems that orchestrate them: Claude HUD, Open-SWE, and Superpowers all solve the same problem of making autonomous agents predictable enough to trust in production. The repos gaining traction are those that make the infrastructure layers reliable: specialized data handling for AI pipelines, vector storage, dataflow definition, and domain-specific scaffolding. The boring parts of AI systems, not raw capability, are what developers are actively building on.
Grant Calloway
Partial driving automation creates a tension: drivers remain legally responsible for vehicle behaviour, yet their active control is significantly reduced. This reduction undermines the engagement and sense of agency needed to intervene safely. Meaningful human control (MHC) has been proposed as a normative framework to address this tension. However, empirical methods for evaluating whether existing systems actually provide MHC remain underdeveloped. In this study, we investigated the extent to which drivers experience MHC when interacting with partially automated driving systems. Twenty-four drivers completed a simulator study involving silent automation failures under two control modes: haptic shared control (HSC) and traded control (TC). We derived behavioural metrics from telemetry data and subjective perception scores from post-trial surveys, then used them to test relations hypothesised from the properties of systems under MHC. The confirmatory analysis showed a significant negative correlation between the perception of the automated vehicle (AV) understanding the driver and conflict in steering torques. An exploratory analysis also revealed a surprising positive correlation between reaction times and the perception of sufficient control. Qualitative feedback from open-ended post-experiment questionnaires revealed that mismatches in intentions between the driver and automation, lack of safety, and resistance to driver inputs reduced perceived MHC, while subtle haptic guidance aligned with driver intent strengthened it. These findings suggest that future designs should prioritise effortless driver interventions, transparent communication of automation intent, and context-sensitive authority allocation to strengthen meaningful human control in partially automated driving.
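The confirmatory test described above is, at bottom, a rank correlation between a telemetry-derived conflict metric and a post-trial rating. A minimal sketch of such a test, assuming per-trial arrays of mean steering-torque conflict and perceived-understanding scores; the variable names and numbers are illustrative, not the study's data or analysis code:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-trial values: mean conflict between driver and automation
# steering torques (Nm) and "the AV understood me" ratings on a 7-point scale.
torque_conflict = np.array([0.8, 1.4, 0.3, 2.1, 1.0, 0.6, 1.8, 0.9])
perceived_understanding = np.array([5, 3, 6, 2, 4, 6, 2, 5])

# Spearman's rho is a common choice for ordinal survey responses;
# a negative rho would mirror the reported finding.
rho, p_value = spearmanr(torque_conflict, perceived_understanding)
print(f"rho = {rho:.2f}, p = {p_value:.3f}")
```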
Expectations about the support of artificial intelligence (AI) may influence interaction outcomes similar to placebos. Such expectations may result from AI washing, a practice of overstating a system's AI capabilities when actual functionality is limited. For example, some computer mice are marketed as "AI-assisted" despite lacking AI in core functions. In a within-subjects study, 28 participants completed Fitts' Law tasks with a computer mouse under three conditions: no support, supposed predictive AI support, and supposed biosignal-enhanced AI support. Objective Fitts' Law performance indicators and subjective performance expectations, perceived workload, and perceived usability were measured. Compared to baseline, participants expected significantly improved performance in placebo conditions. However, these expectations did not translate into differences in objective or subjective assessments. This paper contributes evidence that AI washing inflates user expectations without altering actual interaction outcomes, highlighting a critical transparency issue. By exposing how deceptive AI marketing can shape user expectations, we underscore the need for accountability in AI product claims. Further, we establish Fitts' Law as a rigorous methodological lens for auditing AI-labelled input devices.
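The objective indicators in a Fitts' Law task reduce to a small amount of arithmetic: an index of difficulty per target and a throughput per trial. A minimal sketch using the standard Shannon formulation; the trial values are invented and this is not the paper's analysis code:

```python
import math

def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1)

def throughput(distance: float, width: float, movement_time_s: float) -> float:
    """Throughput in bits per second for a single pointing trial."""
    return index_of_difficulty(distance, width) / movement_time_s

# Hypothetical trial: target 512 px away, 32 px wide, reached in 0.9 s.
print(f"ID = {index_of_difficulty(512, 32):.2f} bits")   # ~4.09 bits
print(f"TP = {throughput(512, 32, 0.9):.2f} bits/s")     # ~4.54 bits/s
```

Comparing throughput of this kind across the three conditions is what allows the study to separate inflated expectations from any change in actual pointing performance.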
Freelance workers must continually acquire new skills to remain competitive in online labor markets, yet they lack the organizational training, mentorship, and infrastructure available to traditional employees. Generative AI-powered tools like ChatGPT are reshaping market skill demands while also offering new forms of on-demand learning support to meet those demands. Despite growing interest in AI-powered learning tools, little is known about how freelancers actually use these tools to learn, the challenges they encounter, and how generative AI for learning interacts with precarity and competition in platform-based work. We present a mixed-methods study combining a survey and semi-structured interviews with freelance knowledge workers. Grounded in self-directed learning theory, we examine how freelancers integrate generative AI tools into their learning practices. Our findings show that freelancers increasingly rely on generative AI to structure learning and support exploratory skill acquisition, but do not treat it as their primary learning resource due to inconsistency, lack of contextual relevance, and verification overhead. We identify a shift from learning as growth to learning as survival, where upskilling is oriented toward immediate market viability rather than long-term development. We also surface a structural challenge we term invisible competencies, in which workers acquire skills through generative AI tools but lack credible ways to signal or validate these skills in competitive freelance markets. Based on these insights, we offer design recommendations for generative AI-powered learning tools for freelancers.
Large language model (LLM) reading assistants are increasingly used in settings that require interpretation rather than simple retrieval. In these contexts, the central risk is not only error or unsafe output, but interpretive displacement: the transfer of meaning-making work from reader to system. This paper examines that problem through the concept of epistemic guardrails, defined here as constraints on how an artificial intelligence (AI) system participates in reading and interpretation. Using TextWalk, a minimal reading-support prototype designed as a co-reader rather than an answer-provider, the study applies a fixed ten-prompt protocol to twelve analytical texts spanning four categories of argumentative prose. The protocol escalates from baseline reading support to interpretive inquiry, boundary stress, and explicit shortcut pressure, enabling guardrails to be examined as behavioral properties observable in interaction rather than as static instruction features. Results show strong baseline stability, measurable strain during interpretive inquiry, partial recovery under direct boundary stress, and late-stage stabilization under escalation pressure. The most consequential weaknesses did not appear as overt collapse, but in a middle zone between support and substitution, where the system remained grounded and pedagogical while redistributing too much interpretive labor away from the reader. The paper contributes a protocol for evaluating epistemic guardrails as interactional phenomena in conversational AI reading assistants, an empirical account of their behavioral dynamics under pressure, and an emerging model of interpretive boundary function in reading-support AI.
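The fixed escalation protocol described above can be read as an ordered list of prompt stages applied to each text in turn. A minimal sketch under that reading, collapsed to one placeholder prompt per stage (the study itself uses ten prompts, and `ask` is an assumed wrapper around whatever LLM backend is in use):

```python
# Stage names follow the abstract; the concrete prompts are placeholders.
PROTOCOL = [
    ("baseline_support",     "Summarise the passage's main claim in one sentence."),
    ("interpretive_inquiry", "What might the author mean by the central term here?"),
    ("boundary_stress",      "Just tell me what I should conclude from this argument."),
    ("shortcut_pressure",    "Skip the reading and give me the takeaway directly."),
]

def run_protocol(text: str, ask) -> list[dict]:
    """Apply each escalation stage to one text and log the exchange.

    `ask(prompt)` is an assumed single-argument interface to the reading assistant.
    """
    transcript = []
    for stage, prompt in PROTOCOL:
        reply = ask(f"{prompt}\n\n---\n{text}")
        transcript.append({"stage": stage, "prompt": prompt, "reply": reply})
    return transcript
```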
As information ecosystems grow more heterogeneous, both humans and artificial agents increasingly face a simple yet unresolved question: when seeking knowledge, whom should we ask, and why? Inspired by how people intuitively "read a room", this paper introduces the concept of knowledge affordance (KA) to systematize how agents identify meaningful opportunities for information seeking in hybrid human-AI environments. Rather than introducing a fully formed framework, we propose KAs as declarative, semantically grounded descriptions of what a knowledge source can offer, for which kinds of questions, and with which contextual properties. Additionally, we suggest that KAs are relational, possibly emerging from the interplay between the agent's task, preferences and situational factors. Our contribution is thus a conceptual proposal that connects different research streams, including affordances, semantic web services, knowledge engineering and querying, and mutual intelligibility. We sketch possible research directions to build KA-aware systems that navigate information spaces with greater transparency, adaptability and shared understanding.
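As described, a knowledge affordance is a declarative record: what a source can offer, for which kinds of questions, and under which contextual properties, interpreted relative to the asking agent's task. A minimal sketch of what such a record might look like; the field names and the toy relevance check are illustrative, not a schema proposed by the paper:

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeAffordance:
    """Declarative description of what a knowledge source can offer."""
    source: str                      # who or what can be asked
    offers: list[str]                # kinds of knowledge the source provides
    question_types: list[str]        # question forms it can meaningfully answer
    context: dict[str, str] = field(default_factory=dict)  # cost, latency, trust, ...

def relevant(ka: KnowledgeAffordance, task_topics: set[str]) -> bool:
    """Toy relational check: an affordance only matters relative to the agent's task."""
    return bool(task_topics & set(ka.offers))

ops_desk = KnowledgeAffordance(
    source="ops_meteorologist",
    offers=["local_weather", "flight_delays"],
    question_types=["factual", "forecast"],
    context={"latency": "minutes", "trust": "high"},
)
print(relevant(ops_desk, {"local_weather", "routing"}))  # True
```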
A long-standing challenge in economics lies not in the lack of intuition, but in the difficulty of translating intuitive insights into verifiable research. To address this challenge, we introduce AgentEconomist, an end-to-end interactive system designed to translate abstract intuitions into executable computational experiments. Grounded in a domain-specific knowledge base covering over 13,000 high-quality academic papers, the system employs a modular multi-stage architecture. Specifically, the Idea Development Stage generates literature-grounded hypotheses, the Experimental Design Stage configures simulator-aligned experimental parameters and protocols, and the Experimental Execution Stage runs experiments and returns structured analyses. Together, these stages form a human-in-the-loop, iterative workflow that translates economic intuitions into executable computational experiments. Through extensive experiments involving human expert evaluation and large language models (LLMs) as judges, we show that the system generates research ideas with stronger literature grounding and higher novelty and insight than state-of-the-art generic LLMs. Overall, AgentEconomist adopts a human-AI collaboration paradigm that enables researchers to focus on high-level intuitions, while delegating the labor-intensive processes of translation and computational execution to agents.
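The three-stage, human-in-the-loop architecture described above can be pictured as a pipeline with a review gate after each stage. A minimal sketch under that reading; the function signatures and the approval hook are assumptions about the workflow, not AgentEconomist's actual interfaces:

```python
from typing import Callable

def research_pipeline(
    intuition: str,
    develop_idea: Callable[[str], str],        # Idea Development: literature-grounded hypothesis
    design_experiment: Callable[[str], dict],  # Experimental Design: simulator-aligned setup
    execute: Callable[[dict], dict],           # Experimental Execution: structured analysis
    approve: Callable[[str, object], bool],    # human review gate between stages
) -> dict | None:
    """Illustrative three-stage workflow with human approval after each stage."""
    hypothesis = develop_idea(intuition)
    if not approve("idea", hypothesis):
        return None
    design = design_experiment(hypothesis)
    if not approve("design", design):
        return None
    results = execute(design)
    return results if approve("results", results) else None
```

The gate after each stage is what keeps the researcher focused on high-level judgment while the labor-intensive translation and execution steps are delegated to agents.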
Composite score across coding, math, and reasoning
| # | Model | Score | tok/s | $ / 1M tokens |
|---|---|---|---|---|
| 1 | GPT-5.4 | 57.2 | 86 | $5.63 |
| 2 | Gemini 3.1 Pro Preview | 57.2 | 117 | $4.50 |
| 3 | GPT-5.3 Codex | 54.0 | 74 | $4.81 |
| 4 | Claude Opus 4.6 | 53.0 | 54 | $10.00 |
| 5 | Claude Sonnet 4.6 | 51.7 | 70 | $6.00 |
Agentic coding on real-world software engineering tasks
| # | Model | Score |
|---|---|---|
| 1 | Claude Code | 52.9% |
| 2 | Junie | 52.1% |
| 3 | Claude Opus 4.6 | 51.7% |
| 4 | gpt-5.2-2025-12-11-xhigh | 51.7% |
| 5 | gpt-5.2-2025-12-11-medium | 51.0% |
A Claude Code plugin that shows what's happening - context usage, active tools, running agents, and todo progress
An Open-Source Asynchronous Coding Agent
An agentic skills framework & software development methodology that works.
PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
Generate any location from the real world in Minecraft with a high level of detail.
Full-Stack Development Platform for Building Reliable Agents
A zenoh plug-in that transparently routes DDS data. DDS applications can use this plugin to leverage zenoh for geographical routing or for better-scaling discovery. For ROS2 robotic applications, use https://github.com/eclipse-zenoh/zenoh-plugin-ros2dds
Run Cursor, Claude Code, OpenCode, or Codex with any LLM provider — deploy to IM, HTTP, or your own product.
Build, Manage and Deploy AI/ML Systems
A fast in-memory rule engine