OpenAI dominated the day by closing $110 billion in private funding, one of the largest such rounds in history, immediately followed by Amazon's announcement of a $50 billion investment in the company alongside an expanded $100 billion cloud deal with AWS. The twin announcements signaled that the race to secure AI infrastructure has reached a new intensity. Wall Street reacted with immediate jitters as US tech stocks headed for their worst month in almost a year amid what analysts described as "AI psychosis." The market tension was compounded by Condé Nast CEO Roger Lynch calling AI a "death blow" to Google search, a stark warning that generative search tools are already reshaping the advertising and information economy.
The funding news overshadowed several other developments. Block, the payments company formerly known as Square, revealed it had cut 40% of its workforce to go all-in on AI tools, a dramatic bet that echoes broader industry consolidation around artificial intelligence as a core business rather than a supplementary capability. Meanwhile, AI music generator Suno reached a significant milestone with 2 million paid subscribers and $300 million in annual recurring revenue, demonstrating that consumer AI products can achieve substantial monetization even as enterprise adoption remains the dominant narrative. Perplexity pressed ahead with its agent ambitions by launching a new "Computer" product that can run other agents on users' behalf, while Google and OpenAI employees publicly backed Anthropic's refusal to work with the Pentagon in an open letter that highlighted growing labor activism around AI ethics and military applications.
The lab announcements reflected a field pushing in multiple directions. Google's Gemini team highlighted February updates for its app, NVIDIA published guidance on maximizing GPU utilization with Run:ai and NIM, and Hugging Face introduced a Visual Aesthetic Benchmark to test whether frontier models can judge beauty, an intriguing probe into whether models can capture subjective human judgment. GitHub published a practical guide to building with Copilot CLI, a sign that developer tooling continues to mature. With few headline research releases in today's feeds, these benchmark and tooling efforts stand as the clearest indicators of where the field is heading.
Three days before Mobile World Congress, the landscape suggests that infrastructure and monetization are converging as the defining themes of this moment. OpenAI's historic raise, Amazon's deepening commitment, and Suno's subscription success collectively indicate that the market is no longer questioning whether AI will be economically dominant but rather how quickly that dominance will consolidate and who will control the underlying compute and distribution layers.
Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under heteroscedasticity. Methods that improve adaptivity typically fit auxiliary nuisance models or introduce additional data splits/partitions to learn the conformal score, increasing cost and reducing data efficiency. We propose LoBoost, a model-native local conformal method that reuses the fitted ensemble's leaf structure to define multiscale calibration groups. Each input is encoded by its sequence of visited leaves; at resolution level k, we group points by matching prefixes of leaf indices across the first k trees and calibrate residual quantiles within each group. LoBoost requires no retraining, auxiliary models, or extra splitting beyond the standard train/calibration split. Experiments show competitive interval quality, improved test MSE on most datasets, and large calibration speedups.
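The grouping step is simple enough to sketch: encode each point by the leaf indices it visits in the first k trees, pool calibration residuals by matching leaf prefixes, and fall back to the global split-conformal quantile when a group is too small. Below is a minimal Python sketch in this spirit, assuming scikit-learn's GradientBoostingRegressor; the prefix depth `k`, the minimum group size, and the fallback rule are illustrative assumptions, not the paper's exact procedure.

```python
# Leaf-prefix conformal calibration sketch (illustrative, not the paper's exact method).
import numpy as np
from collections import defaultdict
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

alpha, k, min_group = 0.1, 3, 30   # miscoverage, prefix depth, min group size (assumed)

X, y = make_friedman1(n_samples=3000, noise=1.0, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

def leaf_prefix(A):
    # apply() returns the leaf hit in each tree, shape (n, n_estimators);
    # the first k columns define the calibration group at resolution k.
    return [tuple(row) for row in model.apply(A)[:, :k].astype(int)]

def conformal_q(scores):
    # Finite-sample-corrected empirical quantile of calibration residuals.
    n = len(scores)
    return np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

res = np.abs(y_cal - model.predict(X_cal))
groups = defaultdict(list)
for key, r in zip(leaf_prefix(X_cal), res):
    groups[key].append(r)

q_global = conformal_q(res)  # split-conformal fallback for sparse groups
q_local = {key: conformal_q(v) for key, v in groups.items() if len(v) >= min_group}

# Interval width is the local quantile where the group is populated,
# and the global quantile otherwise.
pred = model.predict(X_te)
width = np.array([q_local.get(key, q_global) for key in leaf_prefix(X_te)])
lo, hi = pred - width, pred + width
print("empirical coverage:", np.mean((y_te >= lo) & (y_te <= hi)))
```

Because the groups reuse the ensemble's own partition of the input space, no auxiliary model is fit and calibration remains a single pass over the calibration split.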
Flow matching has emerged as a simulation-free alternative to diffusion-based generative modeling, producing samples by solving an ODE whose time-dependent velocity field is learned along an interpolation between a simple source distribution (e.g., a standard normal) and a target data distribution. Flow-based methods often exhibit greater training stability and have achieved strong empirical performance in high-dimensional settings where data concentrate near a low-dimensional manifold, such as text-to-image synthesis, video generation, and molecular structure generation. Despite this success, existing theoretical analyses of flow matching assume target distributions with smooth, full-dimensional densities, leaving its effectiveness in manifold-supported settings largely unexplained. To this end, we theoretically analyze flow matching with linear interpolation when the target distribution is supported on a smooth manifold. We establish a non-asymptotic convergence guarantee for the learned velocity field, and then propagate this estimation error through the ODE to obtain statistical consistency of the implicit density estimator induced by the flow-matching objective. The resulting convergence rate is near minimax-optimal, depends only on the intrinsic dimension, and reflects the smoothness of both the manifold and the target distribution. Together, these results provide a principled explanation for how flow matching adapts to intrinsic data geometry and circumvents the curse of dimensionality.
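The objective behind this analysis is concrete: draw x0 from the source, x1 from the data, interpolate xt = (1 - t) x0 + t x1, and regress a network onto the conditional velocity x1 - x0, then sample by integrating the learned ODE. Here is a minimal PyTorch sketch on a toy manifold-supported target (a noisy circle in R^2); the network size and the Euler sampler are illustrative choices, not the estimator analyzed in the paper.

```python
# Flow matching with linear interpolation on a toy 1-D manifold in R^2.
import torch
import torch.nn as nn

torch.manual_seed(0)

def sample_target(n):
    # Target concentrated near a circle: intrinsic dimension 1, ambient dimension 2.
    theta = 2 * torch.pi * torch.rand(n)
    return torch.stack([theta.cos(), theta.sin()], dim=1) + 0.01 * torch.randn(n, 2)

# v_theta(x, t): small MLP taking the state and time as input.
net = nn.Sequential(nn.Linear(3, 64), nn.SiLU(),
                    nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x1 = sample_target(256)            # data sample
    x0 = torch.randn_like(x1)          # standard-normal source
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1         # linear interpolation path
    target = x1 - x0                   # conditional velocity along the path
    loss = ((net(torch.cat([xt, t], dim=1)) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: integrate dx/dt = v_theta(x, t) from t = 0 to 1 with Euler steps.
with torch.no_grad():
    x = torch.randn(1000, 2)
    for i in range(100):
        t = torch.full((1000, 1), i / 100)
        x = x + net(torch.cat([x, t], dim=1)) / 100
print("mean radius of samples:", x.norm(dim=1).mean().item())  # ~1 for the circle
```

The toy target makes the paper's setting visible in miniature: the data live on a one-dimensional curve inside a two-dimensional ambient space, which is exactly the regime the theory addresses.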
In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on statistical modeling, identifiability, and scalable inference. We first establish a general convergence result from BNNs to GPs by relaxing assumptions used in prior formulations, and we compare alternative parameterizations of the limiting GP model. Building on this theory, we propose a new covariance function defined as a convex mixture of components induced by four widely used activation functions, and we characterize key properties including positive definiteness and both strict and practical identifiability under different input designs. For computation, we develop a scalable maximum a posteriori (MAP) training and prediction procedure using a Nyström approximation, and we show how the Nyström rank and anchor selection control the cost-accuracy trade-off. Experiments on controlled simulations and real-world tabular datasets demonstrate stable hyperparameter estimates and competitive predictive performance at realistic computational cost.
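The Nyström step is the computational core: pick m anchor points, factor the m-by-m anchor kernel, and work with the induced n-by-m feature map instead of the full covariance, so cost scales with the rank m rather than with n squared. A minimal NumPy sketch of the approximate MAP predictive mean follows; the RBF kernel is a stand-in for the paper's activation-mixture covariance, and random anchor subsampling is one illustrative selection rule among those the paper compares.

```python
# Nystrom-approximate GP regression sketch (RBF stand-in kernel).
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ls=1.0):
    # Stand-in covariance; the paper's convex mixture of four
    # activation-induced components would slot in here instead.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(500)
Xs = np.linspace(-3, 3, 200)[:, None]          # test inputs

m, noise = 30, 0.1**2                          # Nystrom rank and noise variance
Z = X[rng.choice(len(X), m, replace=False)]    # random anchors (one possible rule)

# Nystrom features: Phi = K_nm @ L^{-T} with L = chol(K_mm), so that
# Phi @ Phi.T approximates the full n x n kernel at O(n m^2) cost.
L = np.linalg.cholesky(rbf(Z, Z) + 1e-8 * np.eye(m))
Phi = np.linalg.solve(L, rbf(Z, X)).T          # shape (n, m)

# MAP weights of the equivalent Bayesian linear model in feature space.
w = np.linalg.solve(Phi.T @ Phi + noise * np.eye(m), Phi.T @ y)

Phi_s = np.linalg.solve(L, rbf(Z, Xs)).T
mean = Phi_s @ w                               # approximate GP predictive mean
print("train RMSE:", np.sqrt(np.mean((Phi @ w - y) ** 2)))
```

Increasing m tightens the approximation toward the exact GP mean at the cost of a larger linear solve, which is precisely the cost-accuracy trade-off the abstract describes.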
Amortized Bayesian Inference (ABI) enables efficient posterior estimation using generative neural networks trained on simulated data, but often suffers from performance degradation under model misspecification. While self-consistency (SC) training on unlabeled empirical data can enhance network robustness, current approaches are limited to static, single-task settings and fail to handle sequentially arriving data or distribution shifts. We propose a continual learning framework for ABI that decouples simulation-based pre-training from unsupervised sequential SC fine-tuning on real-world data. To address the challenge of catastrophic forgetting, we introduce two adaptation strategies: (1) SC with episodic replay, utilizing a memory buffer of past observations, and (2) SC with elastic weight consolidation, which regularizes updates to preserve task-critical parameters. Across three diverse case studies, our methods significantly mitigate forgetting and yield posterior estimates that outperform standard simulation-based training, coming closer to MCMC reference posteriors and providing a viable path toward trustworthy ABI across a range of tasks.
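The second strategy follows the classical elastic weight consolidation recipe: after training on a task, estimate a diagonal Fisher information for the current parameters, then penalize later updates that move Fisher-important weights away from their anchored values. A minimal PyTorch sketch of that penalty appears below; the network and the placeholder loss standing in for the SC objective are generic illustrations, not the paper's ABI architecture.

```python
# Elastic weight consolidation (EWC) sketch for sequential fine-tuning.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
loss_fn = lambda x: net(x).pow(2).mean()        # placeholder for the SC loss

def fisher_diag(batches):
    # Diagonal Fisher estimate: average squared gradients over batches.
    fisher = {n: torch.zeros_like(p) for n, p in net.named_parameters()}
    for batch in batches:
        net.zero_grad()
        loss_fn(batch).backward()
        for n, p in net.named_parameters():
            fisher[n] += p.grad.detach() ** 2 / len(batches)
    return fisher

def ewc_penalty(fisher, anchor, lam=100.0):
    # Quadratic pull toward the anchored parameters, weighted by Fisher:
    # (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2
    return 0.5 * lam * sum(
        (fisher[n] * (p - anchor[n]) ** 2).sum() for n, p in net.named_parameters()
    )

# After pre-training: anchor the parameters and estimate the Fisher diagonal.
anchor = {n: p.detach().clone() for n, p in net.named_parameters()}
fisher = fisher_diag([torch.randn(64, 4) for _ in range(10)])

# Sequential fine-tuning on a new data chunk with the EWC regularizer.
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):
    x = torch.randn(64, 4)                      # placeholder real-data batch
    loss = loss_fn(x) + ewc_penalty(fisher, anchor)
    opt.zero_grad(); loss.backward(); opt.step()
```

Episodic replay, the first strategy, needs no regularizer at all: one simply mixes buffered past observations into each fine-tuning batch, trading memory for the Fisher bookkeeping above.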
We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding an emerging notion of complexity and feature learning directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel selection effects.
We introduce kernel integrated $R^2$, a new measure of statistical dependence that combines the local normalization principle of the recently introduced integrated $R^2$ with the flexibility of reproducing kernel Hilbert spaces (RKHSs). The proposed measure extends integrated $R^2$ from scalar responses to responses taking values in general spaces equipped with a characteristic kernel, making it possible to measure dependence for multivariate, functional, and structured data while remaining sensitive to tail behavior and oscillatory dependence structures. We establish that (i) this new measure takes values in $[0,1]$, (ii) equals zero if and only if independence holds, and (iii) equals one if and only if the response is almost surely a measurable function of the covariates. Two estimators are proposed: a graph-based method using $K$-nearest neighbors and an RKHS-based method built on conditional mean embeddings. We prove consistency and derive convergence rates for the graph-based estimator, showing that it adapts to intrinsic dimensionality. Numerical experiments on simulated data and a real-data experiment in the context of dependency testing for media annotations demonstrate competitive power against state-of-the-art dependence measures, particularly in settings involving non-linear and structured relationships.
| # | Model | Score |
|---|---|---|
| 1 | Claude Code | 52.9% |
| 2 | Claude Opus 4.6 | 51.7% |
| 3 | gpt-5.2-2025-12-11-xhigh | 51.7% |
| 4 | gpt-5.2-2025-12-11-medium | 51.0% |
| 5 | gpt-5.1-codex-max | 48.5% |
Production-ready implementation of InvisPose - a revolutionary WiFi-based dense human pose estimation system that enables real-time full-body tracking through walls using commodity mesh routers
An open-source SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, and subagents, it handles tasks at different scales, from ones that take minutes to ones that take hours.
Fast and accurate automatic speech recognition (ASR) for edge devices
A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems. Use when building, optimizing, or debugging agent systems that require effective context management.
An agentic skills framework & software development methodology that works.
A lightweight suite of motion imitation methods for training controllers.
Spec-driven development for large codebases
🌐 The open-source Agentic browser; alternative to ChatGPT Atlas, Perplexity Comet, Dia.
SIMD-Accelerated Sampling-based Motion Planning
An Open Source Machine Learning Framework for Everyone