Meanwhile Oracle is betting its balance sheet on AI workloads that haven't materialized yet, cutting 21,000 jobs to fund billions in data center spending, while Menlo Ventures raised $3 billion, its largest fund in 50 years, to back startups that may never justify the infrastructure costs. The capital intensity of building AI systems is creating a two-tier market: large incumbents and well-funded startups racing to lock in organizational context and customer relationships before rivals do, while smaller competitors and workers targeted for replacement are being sorted into the losing column. Nvidia's banned chips are doubling in price on China's black market, SpaceX is raising $25 billion in bonds despite lacking a credit rating, and political challengers who resist Silicon Valley's leverage are losing House primaries. The cost of entry is so high that only the largest players can afford to play.
But the actual business case remains unproven. Enterprise deployments of agents remain fragmented and unverified. Fika Jobs is building video-first hiring platforms with AI agents, OpenAI launched Patch the Planet to find vulnerabilities, and Nvidia is calling new PC chips a reinvention, yet none provides evidence that enterprises are actually deploying agents at scale or that productivity gains justify the infrastructure costs. Meta paused employee monitoring after security controls failed twice, revealing that the data collection underpinning AI training is vulnerable to the same people it's meant to monitor. JD.com's founder acknowledged robots will replace 700,000 delivery workers while expressing reluctance about the outcome, a statement containing no plan to avoid displacement. The infrastructure is being built on conviction, not results.
Labs have stopped racing on capability and started competing on infrastructure lock-in. OpenAI is positioning itself as the standards-setter while simultaneously funding frameworks through the Appia Foundation, a move that looks like infrastructure capture dressed as cooperation. Nvidia controls 81 percent of the TOP500 supercomputers and is moving downstream into domain-specific toolkits and enterprise agent deployment, signaling a shift from selling raw compute to selling the full stack. Anthropic's Claude Tag and Mistral's OCR 4 are tactical releases addressing specific use cases rather than strategic pivots. Research papers have shifted from proving capability to decomposing complex tasks into modular, interpretable components and aligning model architectures to problem structure rather than chasing parameter count. Benchmarks show stability at the top tier with incremental reshuffling below, suggesting the capability frontier has stabilized enough that the next phase is about who can build the most reliable, cost-efficient, and specialized systems on top of it.
The developer community has moved decisively past whether agents can work to how to make them work reliably at scale. Trending repositories consolidate agentic systems into opinionated, production-ready harnesses where agent skills are treated as first-class abstractions and orchestration layers are built with specific operational constraints in mind. Performance improvements are substantial: DeusData's codebase-memory-mcp achieves 99% token reduction through persistent knowledge graphs, and lance-format's lakehouse format delivers 100x faster random access for multimodal data. The infrastructure is hardening into specialized, domain-specific tooling rather than monolithic platforms. If the productivity case doesn't materialize at enterprise scale, the capital structure supporting this consolidation will face a reckoning. For now, the market is pricing in success and lock-in, not competition or failure.
Grant Calloway
Vision-language-action (VLA) models can learn manipulation skills from demonstrations, but their capabilities are bounded by the skills in the training data. We present InSight, a framework that unlocks autonomous skill acquisition by rendering VLAs steerable at the primitive-action level (e.g., "move gripper to the bowl", "lift upward", "pour the bottle"). InSight consists of two primary stages: (1) an automated segmentation pipeline that partitions demonstrations into labeled primitives via VLM plan decomposition and end-effector poses to enable VLA primitive steerability, and (2) a VLM-guided data flywheel that identifies missing primitives required to accomplish a novel task, autonomously attempts demonstrations of the missing primitives with VLM-proposed low-level control, and automatically labels, stores, and integrates successful demonstrations into the VLA training set. We evaluate InSight across simulation and real-world manipulation tasks, including block flipping, drawer closing, sweeping, twisting, and pouring, without any human demonstrations of these target skills. Once learned, these primitives can be composed to execute novel, long-horizon tasks without additional human demonstrations. Our findings demonstrate that primitive steerability provides a practical foundation for continual skill acquisition in VLA policies. Project website: https://insight-vla.github.io.
We study the last iterate of the stochastic subgradient method (SsGM) for one-dimensional convex Lipschitz objectives. For a fixed horizon $n$, we consider the standard fixed stepsizes $\eta=\Theta(1/\sqrt n)$. We prove that, for such stepsize policies, under additive i.i.d. subgradient noise with uniformly bounded variance, the last iterate has an optimization error of order $1/\sqrt n$, thereby removing the extra $\log n$ factor present in existing generic bounds. On the other hand, we show that without the i.i.d. assumption, the optimization error can be of order $(\log n)/\sqrt n$. Thus, under the uniformly bounded variance assumption alone, the last iterate of SsGM is suboptimal even in dimension one, resolving negatively an open problem posed by Koren and Segal (COLT 2020).
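To make the setting concrete, here is a minimal sketch of the fixed-stepsize stochastic subgradient method that returns only its last iterate. The objective $f(x)=|x|$, the Gaussian noise, the constant `c`, and all function names are illustrative assumptions and are not taken from the paper; the sketch only mirrors the setup the abstract describes (one dimension, horizon $n$, stepsize proportional to $1/\sqrt n$, additive i.i.d. noise).

```python
import math
import random


def subgradient_abs(x):
    """Return a subgradient of f(x) = |x| (at 0 any value in [-1, 1] is valid; we pick 0)."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def last_iterate_ssgm(n, x0=1.0, c=1.0, noise_std=1.0, seed=0):
    """Run n steps with the fixed stepsize eta = c / sqrt(n) and return the last iterate.

    Assumptions (ours, for illustration): f(x) = |x|, and additive i.i.d.
    Gaussian noise on the subgradient with standard deviation noise_std.
    """
    rng = random.Random(seed)
    eta = c / math.sqrt(n)
    x = x0
    for _ in range(n):
        noisy_subgradient = subgradient_abs(x) + rng.gauss(0.0, noise_std)
        x -= eta * noisy_subgradient
    return x


def mean_last_iterate_error(n, trials=200):
    """Average f(x_n) - min f = |x_n| over independent runs (one seed per run)."""
    total = 0.0
    for seed in range(trials):
        total += abs(last_iterate_ssgm(n, seed=seed))
    return total / trials


if __name__ == "__main__":
    for horizon in (100, 400, 1600):
        print(horizon, mean_last_iterate_error(horizon))
```

Running the script for several horizons gives a rough empirical view of how the last-iterate error shrinks as $n$ grows. It is a simulation of one easy instance, not a check of the paper's bounds.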
Sparse voxel representation has emerged as a scalable foundation for image-to-3D Gaussian Splatting (3DGS) generation, yet current methods struggle to preserve high-frequency visual details of input images due to two structural bottlenecks. First, they adopt discriminative 2D features optimized for semantic abstraction to construct sparse voxel latents, which suppress reconstructive cues and induce a representation bottleneck. Second, in the generation stage, standard diffusion transformers lack effective mechanisms to align dense 2D image tokens with sparse 3D voxel latents, resulting in a cross-modal correspondence bottleneck. To address these issues, we propose FLUX3D, a scalable image-to-3DGS framework that boosts both representation learning and cross-modal alignment during generation. We first revisit 2D feature selection for sparse-voxel-based 3D representation learning, propose Diffusion-Aligned Structured Latents (DA-SLAT) and couple it with a decoder-only architecture to improve 3DGS reconstruction fidelity. We also design a sparse-structure-aware diffusion framework, which integrates the Sparse-structure Multimodal Diffusion Transformer (SMDiT) and Modal-Aware Rotary Positional Embedding (MARoPE) to achieve geometry-agnostic 2D-3D alignment. Extensive benchmark experiments demonstrate that FLUX3D yields substantial improvements in appearance fidelity and significantly outperforms all state-of-the-art (SOTA) methods in generating high-quality 3DGS assets.
Agentic language models dramatically expand the applications of AI yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that generalize across diverse agentic tasks. The OpenThoughts-Agent (OT-Agent) project addresses this gap with a fully open data curation pipeline for training agentic models. We conduct more than 100 controlled ablation experiments to systematically investigate each stage of the pipeline, yielding insights on the importance of task sources and diversity. We then assemble a training set of 100K examples from our pipeline and fine-tune Qwen3-32B on this dataset, which yields an average accuracy of 44.8% across seven agentic benchmarks and a 3.9 percentage point improvement over the strongest existing open data agentic model (Nemotron-Terminal-32B, 40.9%). Moreover, our training data exhibits strong scaling properties, outperforming alternative open datasets at every training set size in compute-controlled comparisons. We publicly release our training sets, data pipeline, experimental data, and models at openthoughts.ai to support future open research on agentic model training.
Artificial intelligence (AI) can enhance what people who use augmentative and alternative communication (AAC) are able to do with their systems. However, evaluating AI-powered AAC interfaces can be difficult. People are intersectional beings and current evaluation metrics can struggle to capture the multifaceted and nuanced desires people may have for their AAC. We explore the complicated nature of six AAC problem spaces, explore how AI might be used in these spaces, and suggest more robust methods of evaluation that take the intersectional nuances of people into account. We also discuss broader issues that arise across these problem spaces and how they could be addressed using our proposed evaluation methods.
Fourier Neural Operators (FNO) learn solution operators of partial differential equations by parameterizing global convolutions in the complex Fourier domain. For real-valued PDE solutions, the complex FFT carries representational redundancy through conjugate symmetry. We introduce the Hartley Neural Operator (HNO), the exact real-valued mirror of FNO: it replaces the FFT with the purely real Discrete Hartley Transform and learns a single real multiplier per retained spectral mode, with no complex arithmetic. Because the real Hartley spectrum is not halved by conjugate symmetry, HNO retains twice as many frequency corners as FNO but one real weight where FNO carries a complex pair, so the two operators are iso-parametric at equal width and differ only in spectral basis. Our central thesis is that the best basis is a property of the operator. Self-adjoint elliptic operators (Poisson, biharmonic) have real, symmetric Green's functions that the real Hartley multiplier diagonalizes exactly, and HNO is favored there. Time-dependent operators carry phase, from oscillation in the wave equation to transport in advection, Burgers, and Navier-Stokes, which a real diagonal multiplier cannot represent, so FNO is favored there, and increasingly so with the operator's phase content, leaving the phaseless heat equation as the borderline case. Training both operators identically and benchmarking across PDE classes, initial-condition families, and boundary conditions, we find an elliptic-versus-time-dependent split that is monotone in operator phase content and matches the Green's-function theory we develop. Rather than a universal winner, our findings give a predictive rule: match the spectral basis to the symmetry of the solution operator.
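As a rough illustration of the building block this abstract describes, the sketch below implements a one-dimensional Discrete Hartley Transform and a "one real weight per mode" spectral multiplier in plain Python. The function names, the direct $O(N^2)$ summation, and the absence of mode truncation are simplifying assumptions made here; this is not the authors' implementation.

```python
import math


def dht(x):
    """Discrete Hartley Transform: H[k] = sum_j x[j] * (cos(t) + sin(t)), t = 2*pi*j*k/N.

    For real input this equals Re(F[k]) - Im(F[k]), where F is the usual DFT,
    so the output is purely real.
    """
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for j in range(n):
            angle = 2.0 * math.pi * j * k / n
            acc += x[j] * (math.cos(angle) + math.sin(angle))
        out.append(acc)
    return out


def idht(h):
    """Inverse DHT: the transform is its own inverse up to a factor of 1/N."""
    n = len(h)
    return [value / n for value in dht(h)]


def hartley_multiplier_layer(x, weights):
    """Scale each Hartley mode by one real weight, then transform back.

    Hypothetical simplification: `weights` has one entry per mode, and a
    dropped mode is represented by a weight of 0.0.
    """
    if len(weights) != len(x):
        raise ValueError("weights must have one entry per mode")
    spectrum = dht(x)
    scaled = [w * s for w, s in zip(weights, spectrum)]
    return idht(scaled)


if __name__ == "__main__":
    signal = [1.0, 2.0, 0.5, -1.0]
    print(hartley_multiplier_layer(signal, [1.0, 0.5, 0.0, 0.5]))
```

Every intermediate value stays real, which is the property the abstract contrasts with the complex pair FNO stores per mode. A learned version would treat `weights` as trainable parameters.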
Composite score across coding, math, and reasoning
| # | Model | Score | tok/s | $/1M |
|---|---|---|---|---|
| 1 | Claude Fable 5 | 59.9 | 0 | $20.00 |
| 2 | Claude Opus 4.8 | 55.7 | 72 | $10.00 |
| 3 | GPT-5.5 | 54.8 | 64 | $11.25 |
| 4 | Claude Opus 4.7 | 53.5 | 62 | $10.00 |
| 5 | GPT-5.4 | 51.4 | 161 | $5.63 |
Agentic coding on real-world software engineering tasks
| # | Model | Score |
|---|---|---|
| 1 | gpt-5.5-2026-04-23-xhigh | 62.7% |
| 2 | Junie | 61.6% |
| 3 | Codex | 60.4% |
| 4 | Claude Code | 59.6% |
| 5 | gpt-5.5-2026-04-23-medium | 58.9% |
World's first open-source, agentic video production system. 11 pipelines, 49 tools, 400+ agent skills. Turn your AI coding assistant into a full video production studio.
754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI & 20+ platforms · 26 security domains · Apache 2.0
Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA
An open-source SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills and subagents, it handles different levels of tasks that could take minutes to hours.
Real-time global intelligence dashboard. AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking in a unified situational awareness interface
[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
An Open Source Machine Learning Framework for Everyone
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
Build an agent harness and control it end-to-end. Open-source SDK for production AI agents in Python & TypeScript - any model, any cloud.
Cloud Robotics Core: Kubernetes, Federation, App Management