The Inference Report

June 4, 2026

The AI market is consolidating into tiers by capital and scale while regulatory pressure applies selectively to incumbents already winning. Google faces orders to disclose sources in AI search results after claiming users don't want them, yet simultaneously raised $85 billion to fund spending that will dwarf any competitor's runway. Anthropic pulled in $50 billion in May alone, representing 54 percent of global venture funding that month. The message is unmistakable: scale and capital are tilting sharply upward, and regulatory friction applies only to those already dominant.

Product competition below the headline layer reveals a different pattern. Gemma 4 12B runs on any laptop with 16GB of RAM, signaling that smaller efficient models are becoming table stakes. Nvidia's RTX Spark chips are designed to move AI workloads from cloud to edge, potentially fragmenting the market into mainstream laptops and premium workstations. Google bundles AI into existing surfaces, Amazon generates product images at search time, WhatsApp charges businesses per token for AI agents. These are not innovations but plays to lock in usage and extract margin from infrastructure everyone else is already building. Lovable signed a multiyear expansion with Google Cloud worth a 5x increase in footprint and expanded access to Anthropic's Claude, a reminder that distribution and API access matter more than the model itself.

Regulatory capture is happening in plain sight. UK publishers can now opt out of Google's generative AI search, a requirement rolling out globally but only after Google already trained on their content. OpenAI added session visibility controls to ChatGPT after the governance problem surfaced. Meanwhile, startups handling 17,000 calls per day in Africa and the Middle East, builders using Claude for DIY infrastructure, and firms like Coralogix raising capital to monitor AI agents in production are moving fast into underserved markets. They operate on top of APIs and infrastructure controlled by companies that just raised enough capital to outspend any competitor for the next decade.

Across labs and research, competition has shifted from benchmark scores to operational deployment. OpenAI doubled down on agent-driven enterprise workflow automation, Nvidia positioned itself as the platform for robotics and autonomous systems, and IBM pushed hands-on AI experience to 20,000 universities to build a developer pipeline. Materials science research shows the same pattern: machine learning is no longer treated as a surrogate for expensive calculations but embedded as a component in hybrid workflows where validation, constraint-handling, and mechanistic transparency are prerequisites. In developer tooling, token compression via Headroom achieves 60-95% reduction while preserving answer quality, signaling teams have moved past throwing everything at models into surgical data selection. Agent frameworks fractured into specialized tools rather than monolithic systems, and inference optimization became hardware-specific rather than general-purpose. The field stopped waiting for hardware to catch up and started building software that works within real constraints.

Grant Calloway

AI LabsAll labs
From the WireAll feeds
Research Papers — FocusedAll papers
AdsMind: A Physics-Grounded Multi-Agent System for Self-Correcting Discovery of Adsorption Configurations on Heterogeneous Catalyst Surfaces cond-mat.mtrl-sci

Identifying the lowest-energy surface-adsorbate configuration is critical for modeling heterogeneous catalysis, yet exhaustive exploration with ab initio calculations is computationally prohibitive. Machine-learning force fields (MLFFs) accelerate structural relaxation but leave the search over the vast configurational space a major bottleneck, and open-loop large language model (LLM) agents lack a physics-grounded feedback mechanism to correct erroneous initial guesses. We propose AdsMind (Adsorption configuration discovery with Machine intelligence and relaxation feedback), a closed-loop multi-agent framework that enables autonomous error correction through MLFF relaxation feedback. Across four LLM backends, AdsMind achieves consistently high search reliability, with success rates of 100% and 98.8% on the benchmarks AA20 and OCD-GMAE62. Relative to its single-pass (1-Shot) ablation it reduces cross-backend energy dispersion, and it uses only 4.11 and 4.67 MLFF relaxations per case, respectively -- an approximately 14-fold reduction over heuristic enumeration baselines. Density functional theory (DFT) validation using VASP/PBE on six representative AA20 systems shows that the reported open-loop Adsorb-Agent outputs exhibit qualitative adsorption-energy sign errors for molecular adsorbates, whereas AdsMind preserves the correct sign in all tested cases with closer quantitative agreement. AdsMind thus delivers reliability, self-reflection, and interpretability simultaneously, supporting more DFT-informed autonomous chemistry workflows.

Physics-Informed Attention Mechanism and Generalization Capability of Deep Learning-Based Grain Growth Evolution Prediction cond-mat.mtrl-sci

Machine Learning (ML) models for grain growth prediction are typically trained on idealized synthetic data, yet practical applications require generalization to conditions outside the training distribution. This study evaluated the Out-Of-Distribution (OOD) generalization capability of the trained model from our previous study across three test cases, including experimental microstructures, microstructures characterized by a bimodal grain size distribution, and abnormal grain growth. To further probe whether physics-informed architectural design could improve robustness under these different conditions, a boundary-masked attention mechanism was proposed specifically for grain growth, constraining attention to grain boundary pixels. Both the baseline and the proposed physics-informed attention model were evaluated without retraining or fine-tuning on the OOD data. Both models successfully generalized to all three test cases, yet the boundary-masked attention mechanism provided substantial improvements, with the most notable gains for microstructures characterized by a bimodal grain size distribution, where Structural Similarity Index Measure (SSIM) improved from \num{0.6221} to \num{0.7609} and mean grain size ($\overline{R}$) error decreased from \SI{8.75}{\percent} to \SI{3.57}{\percent}. The attention heatmap analysis revealed that the boundary-masked attention model learned to concentrate attention on large grain boundaries in a manner consistent with curvature-driven grain growth physics, emerging from training without being explicitly encoded into the architecture. These results indicate that models trained on synthetic data can generalize to diverse OOD conditions without retraining, and that physics-informed attention may improve accuracy when the boundary morphology matches the training domain.

InvDesMobility: a reliability-gated first-principles feedback framework for closed-loop materials discovery cond-mat.mtrl-sci

Inverse materials design starts from target functionality and searches for structures that can realize it. Its value in closed-loop discovery depends not only on prediction performance, but also on whether expensive first-principles results are independently validated, provenance-recorded, and admitted as feedback only when evidence is sufficient. This is especially important for composite properties such as carrier mobility, where a final scalar value hides intermediate quantities, fit quality, convergence history, and workflow assumptions. Here we present InvDesMobility, a reliability-gated first-principles feedback framework that integrates multi-agent automated DFT, evidence stratification, generative structure proposal, acquisition ranking, and auditable release. Using 516 2DMatPedia-derived candidates, the workflow produced 280 QC-passed materials and 573 retained carrier-direction seed channels after channel-level reliability gating. These records were split into two feedback objects: relaxed structures updated the generative model, while retained mobility channels trained the acquisition model and set validation priority. Over multiple iterations, InvDesMobility screened 2.4 x 10^6 structures, submitted 102 candidates for DFT validation, and retained 86 reliability-gated generated channels across 41 formulas. Overall, the main contribution is not a fixed list of high-mobility materials, but a transferable feedback contract that makes closed-loop inverse design both useful and auditable when learning from expensive calculated properties. All source data, retained feedback records, and workflows are available at https://github.com/DreamLufei/invDesMobility, with an accompanying evidence website at https://dreamlufei.github.io/invDesMobility/.

Modelling magnetic material properties with uncertainty-aware neural networks cond-mat.mtrl-sci

Machine learning is increasingly applied to accelerate the discovery of novel materials by exploring large compositional and structural design spaces. Yet, the scarcity of high-quality data and the frequent need for out-of-distribution prediction introduce substantial uncertainty, making the assessment of model reliability essential. In this work, we investigate uncertainty quantification as a means to evaluate model confidence in the context of permanent magnet research. In a first study, we benchmark classical and modern machine learning models for predicting intrinsic magnetic properties, focusing on the quality of their uncertainty estimates. We apply Gaussian negative log-likelihood loss and dropout-based Bayesian approximation as practical strategies for estimating predictive uncertainty. In a second study, we transfer these architectural features for uncertainty estimation to a more complex task: predicting coercivity from microstructural information using a graph neural network. Together, these studies demonstrate that uncertainty quantification not only enhances the trustworthiness of predictions but is also transferable across different modeling tasks.

Context-Aware Deep Learning for Defect Classification in Atomic-Resolution STEM cond-mat.mtrl-sci

Artificial intelligence is rapidly advancing materials characterization, yet most applications in electron microscopy rely solely on image contrast, overlooking the chemical and experimental context that shapes image formation. This limitation makes defect classification inherently ambiguous, as similar contrasts can arise from different materials or imaging conditions. Here we develop a context-aware learning framework that integrates image-derived contrast with metadata describing composition, beam energy, and detector geometry. Using a systematically constructed dataset of ~55 million simulated patches spanning 576 cases across 96 doped monolayer transition-metal dichalcogenides, we show that conditioning on contextual variables transforms defect classification from an ill-posed image-only task into a well-posed, physically grounded problem. The framework achieves over 98% accuracy on simulations and near-human agreement on experimental data, with a 94% reduction in posterior entropy. By emphasizing contextual grounding over architectural complexity, this approach links experimental image contrast to the underlying chemical and imaging conditions, supporting physically grounded defect assignments and a general pathway toward multimodal AI models for autonomous materials characterization.

Sustainable Metal-Organic Framework Water Harvesters in the Artificial Intelligence Era cond-mat.mtrl-sci

Metal-organic frameworks (MOFs) are excellent candidates for water harvesting due to their tunable pore environments, which can be precisely engineered to capture and release water in arid conditions. Integrating artificial intelligence (AI) into MOF discovery can further accelerate the design of high-performance sorbents by identifying structural features that enhance atmospheric water harvesting (AWH), stability, and cycling efficiency. In this Perspective, we examine key MOF design principles, including cooperative adsorption, operational relative humidity (RH), uptake capacity, hysteresis, and scalability. We highlight recent design advancements such as multivariate strategies and long-arm linker extension, and examine how these principles tune pore capacity and hydrophilicity, while preserving stability and crystallinity. Furthermore, we discuss how AI, large language models (LLMs), and data mining can accelerate the discovery process through predictive synthesis, inverse design, and elucidating synthesis-structure-property relationships for the next generation of MOF water harvesters.

BenchmarksFull tables
Artificial AnalysisIntelligence Index

Composite score across coding, math, and reasoning

#ModelScoretok/s$/1M
1Claude Opus 4.861.453$10.94
2GPT-5.560.267$11.25
3Claude Opus 4.757.347$10.94
4Gemini 3.1 Pro Preview57.2116$4.50
5GPT-5.456.878$5.63
SWE-rebench

Agentic coding on real-world software engineering tasks

#ModelScore
1gpt-5.5-2026-04-23-xhigh62.7%
2Codex60.4%
3Claude Code59.6%
4gpt-5.5-2026-04-23-medium58.9%
5Claude Opus 4.8-xhigh56.4%