OpenAI is flooding the zone with use-case documentation for Codex across sales, operations, and data science teams. It isn't announcing new capabilities; it's packaging existing ones as solved problems for specific workflows. The move signals a shift from model releases to adoption infrastructure: if enterprises see their exact job mapped onto a tool, friction drops. Databricks' deployment of GPT-5.5 on OfficeQA Pro benchmarks follows the same pattern, validating the model on domain-specific tasks before enterprises commit integration resources.

Meanwhile, ChatGPT's new personal finance module for Pro subscribers is a direct play for recurring revenue and stickiness through vertical integration: connect your bank, get financial guidance. That transforms the product from a chat interface into a data-dependent service with switching costs. GitHub's accessibility agent and AMD's work on semantic dataset splitting are both infrastructure plays, but they reveal different bets: GitHub is testing whether agents can solve accessibility as a general problem rather than a case-by-case patch, while AMD is addressing a fundamental data science problem that affects every model's real-world performance.

The collective pattern shows labs moving past benchmark theater into the harder work of making AI useful in the specific, messy context of actual business operations and financial decisions. That's where the stickiness lives, and where the defensibility lies.
Sloane Duvall
A curated reference of models from major AI labs, with open/closed-weight status, input modalities, and context window size. American labs tend toward closed-weight models, while Chinese labs tend toward open-weight models.