The lab announcements today pivot decisively toward agentic systems deployed in enterprise workflows, with the economics of inference, power consumption, token cost, and real-time performance, emerging as the central competitive battleground. OpenAI is positioning Codex as the connective tissue for agent orchestration across engineering, tax compliance, and open-source development, signaling that the near-term value capture lies not in foundation models themselves but in the infrastructure that coordinates multiple agents across fragmented enterprise systems. NVIDIA's framing of AI factories as token factories converting power into intelligence is the most candid articulation of this shift: as always-on autonomous agents become the deployment model, the unit economics of inference, cost per token, performance per watt, displace traditional model capability metrics as the actual competitive moat. Google and AMD are both publishing deep technical work on inference optimization and privacy-preserving aggregation, suggesting they recognize the infrastructure play is where margin lives. Hugging Face's ITBench benchmark revealing frontier models scoring below 50 percent on agentic enterprise IT tasks is the honest counterweight here: capability benchmarks on static tasks have decoupled from actual agent performance in production workflows, which means labs are racing to solve a different problem than the one they've been publicly measured against. Anthropic's work on coding agents for social science research and Hugging Face's local-first robotics deployment indicate the market is already fragmenting by use case and deployment constraint. What's conspicuously absent is any lab claiming general superiority in agentic reasoning or agent coordination, the announcements are instead about embedding agents into specific workflows where the friction is highest and the switching cost is real.
Sloane Duvall
A curated reference of models from major AI labs, with open/closed weight status, input modalities, and context window size. American labs tend towards closed weights models and Chinese labs tend toward open weights models.
None
None
None
None
None
None
None