The labs are converging on agents as the concrete unit of deployment, and the infrastructure race is bifurcating between platform builders and silicon specialists. OpenAI is shipping agent runtime primitives (the Responses API, container orchestration, prompt injection defenses) that turn models into executable workflows for enterprises already embedded in its stack. Rakuten and Wayfair aren't buying models; they're buying operational leverage: incident response times cut in half, automated catalog maintenance, ticket triage that doesn't require human gatekeeping. That is where the margin lives. Meanwhile, NVIDIA and its partners are racing to solve the throughput problem at the foundation layer. Nemotron 3 Super, a 120-billion-parameter model with only 12 billion parameters active at inference, is explicitly designed to run agentic workloads efficiently, with the kind of inference cost structure that makes autonomous systems economically viable at scale. Nebius's partnership with NVIDIA signals that hyperscalers outside the Big Three see full-stack infrastructure as the battleground, not model weights. Google's clinical diagnostics study and Meta's MTIA chip roadmap suggest both are betting on domain-specific deployments and custom silicon rather than competing in the general-purpose agent layer, where OpenAI already has distribution. The pattern is clear: whoever controls the agent runtime and the inference economics wins the next cycle. Model capability alone is table stakes.
Sloane Duvall
A curated reference of models from major AI labs, with open/closed-weight status, input modalities, and context window size. American labs tend toward closed-weight models, while Chinese labs tend toward open-weight models.
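As a minimal sketch of how such a reference might be structured, here is one possible Python representation. The schema and the helper function are assumptions for illustration, and the rows are a few well-known examples rather than the curated list itself:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One row of the reference: lab, weight status, modalities, context size."""
    name: str
    lab: str
    country: str
    open_weights: bool
    input_modalities: tuple[str, ...]
    context_window: int  # tokens


# Illustrative entries only; the curated reference would supply the real rows.
MODELS = [
    ModelEntry("GPT-4o", "OpenAI", "US", False, ("text", "image", "audio"), 128_000),
    ModelEntry("Llama 3.1 405B", "Meta", "US", True, ("text",), 128_000),
    ModelEntry("Qwen2.5-72B", "Alibaba", "CN", True, ("text",), 128_000),
    ModelEntry("DeepSeek-V3", "DeepSeek", "CN", True, ("text",), 128_000),
]


def open_weight_share(entries: list[ModelEntry], country: str) -> float:
    """Fraction of a country's listed models released with open weights."""
    subset = [e for e in entries if e.country == country]
    return sum(e.open_weights for e in subset) / len(subset)
```

With these example rows, `open_weight_share` on the Chinese entries yields a higher open-weight fraction than on the American ones, matching the tendency the note describes (Meta's Llama being the familiar American exception).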