OpenAI is positioning smaller models as the infrastructure layer for distributed agent workloads: GPT-5.4 mini and nano are optimized for tool use and high-volume API calls, while its capture of data on worker-compensation queries signals that it sees enterprise workflows and labor-market applications as the next revenue surface. NVIDIA, meanwhile, is building the hardware and networking stack to make that distribution possible: AI grids that push inference into telecom networks, RTX-accelerated local devices running open models, and Apple Vision Pro integrations for professional applications. The company is also tightening relationships across industrial software (Cadence, Siemens, PTC, Dassault Systèmes) and robotics, positioning CUDA and Omniverse as the standard tools for physical AI at manufacturing scale.

Hugging Face and Mistral are competing in the open-source compact-model space, with Nemotron 3 Nano and Holotron-12B for local deployment and Forge as a new offering, but they lack the infrastructure moat that NVIDIA controls and the API distribution that OpenAI commands. Google and DeepMind are moving into measurement, namely healthcare workflows and AGI evaluation frameworks, neither of which directly competes with the inference and deployment layer that OpenAI and NVIDIA are racing to own. IBM's acquisition of Confluent signals that real-time data infrastructure is now seen as essential to enterprise agents; the company is betting that integrating it with watsonx.data and Z systems will lock in existing customers.

The pattern across these announcements is clear: the money is flowing toward whoever controls the stack from model to inference to data to hardware, not toward individual model releases or research frameworks.
Sloane Duvall
A curated reference of models from major AI labs, with open/closed weight status, input modalities, and context window size. American labs tend toward closed-weight models, while Chinese labs tend toward open-weight models.