The labs are signaling a decisive shift from model weights to inference infrastructure and operational tooling. OpenAI and Broadcom's Jalapeño chip targets the actual constraint labs face in production: getting LLMs to run fast and cheap at scale, not making them bigger or smarter. Google and DeepMind are pushing reasoning as a retrieval mechanism and computer use as a capability layer, both moves that make models useful without necessarily requiring new weights. The real activity, though, is in the plumbing. Hugging Face is optimizing fine-tuning pipelines with NVIDIA tooling and building benchmarks for real-world speech recognition. AMD is publishing kernel-level solutions for running DeepSeek efficiently on MI355X, a direct engineering response to competitive pressure. IBM, Red Hat, and Palo Alto are bundling vulnerability detection and remediation into a single workflow, treating security as an operational problem, not a model problem. Mistral is expanding connector control, Mistral is allowing users to customize integrations. AI21 Labs is merging weak agents into stronger ones through composition rather than scale. What's absent is louder than what's here: no lab announced a major new model. Instead, they're racing to own the layer between models and users, where margins live and lock-in begins. Inference chips, fine-tuning frameworks, computer use APIs, vulnerability workflows, and agent composition all point to the same insight: the model itself is becoming a commodity input. The money moves to whoever controls how it gets deployed.
Sloane Duvall
A curated reference of models from major AI labs, with open/closed weight status, input modalities, and context window size. American labs tend towards closed weights models and Chinese labs tend toward open weights models.
None
None
None
None
None
None
None