The Inference Report

July 1, 2026
AI Labs — News

The market is fragmenting along infrastructure and application lines, with labs placing sharply different bets on where AI's economic value will concentrate. OpenAI is publishing adoption metrics and benchmarking genomics performance. These moves read as defensive: an effort to establish a presence in life sciences before competitors lock in researcher workflows. Google and NVIDIA are pushing harder into infrastructure. TabFM targets the unglamorous but economically dense world of tabular data, and NVIDIA explicitly frames its token cost analysis and robotics software stack around production deployment and cost discipline rather than capability. Anthropic released Claude Sonnet 5 and Claude Science, positioning itself as the inference vendor for specialized workloads. AI21 Labs is calling out routing inefficiency, a sign that token arbitrage is becoming a real cost center for enterprises. The pattern is clear: whoever controls the inference layer for domain-specific work controls the margin. Benchmark releases and adoption reports matter less than who owns the production pipeline.

The infrastructure vendors are winning the narrative. NVIDIA's announcements span GPU optimization, robotics software, synthetic data workflows, and life sciences tooling. This full-stack play locks in developer dependency before application-layer competitors can establish themselves. AMD is optimizing chiplet communication, a technical move that matters only if the MI300X gains traction, which in turn depends on NVIDIA losing its cost advantage. Hugging Face is doubling down on specialization and benchmarking, trying to stay relevant as a discovery layer while the real value migrates to whoever ships production systems. MIRI's newsletter launch and its governance-workshop coverage signal that the policy and safety conversation is decoupling from the engineering conversation, which means safety frameworks are unlikely to constrain competitive moves. What's missing from today's announcements is any lab cutting prices or directly challenging NVIDIA's inference economics. That absence is the story.

Sloane Duvall

AI Labs — Models

A curated reference of models from major AI labs, listing each model's weight status (open or closed), input modalities, and context window size. American labs tend toward closed-weights models, while Chinese labs tend toward open-weights models.
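As a rough check on that tendency, the sketch below tallies the closed- and open-weights entries per lab in this listing, counted by hand from the entries that follow, and aggregates them by country. It covers only this curated list, not every model each lab has released, so the percentages describe the list rather than the labs' full catalogs. The country codes are taken from the lab headers.

```python
# Closed- and open-weights entry counts per lab, tallied by hand from this
# curated listing only (not a complete census of each lab's releases).
# Each row: (lab, country code, closed-weights count, open-weights count)
LISTING = [
    ("Amazon", "US", 2, 0),
    ("Anthropic", "US", 12, 0),
    ("Google DeepMind", "US", 16, 2),
    ("OpenAI", "US", 31, 3),
    ("xAI", "US", 4, 0),
    ("Mistral AI", "FR", 4, 6),
    ("AI21 Labs", "IL", 0, 1),
    ("Alibaba (Qwen)", "CN", 14, 23),
    ("ByteDance", "CN", 4, 0),
    ("DeepSeek", "CN", 0, 6),
    ("MiniMax", "CN", 1, 5),
]


def totals_by_country(rows):
    """Sum closed and open counts per country, keeping first-seen order."""
    totals = {}
    for _lab, country, closed, open_ in rows:
        prev_closed, prev_open = totals.get(country, (0, 0))
        totals[country] = (prev_closed + closed, prev_open + open_)
    return totals


def open_share(closed, open_):
    """Fraction of a country's listed models that are open weights."""
    return open_ / (closed + open_)


if __name__ == "__main__":
    for country, (closed, open_) in totals_by_country(LISTING).items():
        print(f"{country}: {closed} closed, {open_} open, "
              f"{open_share(closed, open_):.0%} open")
```

Run as a script, this prints `US: 65 closed, 5 open, 7% open` and `CN: 19 closed, 34 open, 64% open`, along with the smaller French and Israeli tallies. The split is consistent with the claim above, though it is a count of listed entries, not of usage or market share.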

Amazon [US]
Closed Weights
  • Amazon: Nova 2 Lite
    Text, Vision, Video, Files (1M context)
  • Amazon: Nova Premier 1.0
    Text, Vision (1M context)
Open Weights
  None

Anthropic [US]
Closed Weights
  • Anthropic: Claude Fable 5
    Text, Vision, Files (1M context)
  • Anthropic: Claude Haiku 4.5
    Text, Vision, Files (200K context)
  • Anthropic: Claude Opus 4.1
    Text, Vision, Files (200K context)
  • Anthropic: Claude Opus 4.5
    Text, Vision, Files (200K context)
  • Anthropic: Claude Opus 4.6
    Text, Vision, Files (1M context)
  • Anthropic: Claude Opus 4.7
    Text, Vision, Files (1M context)
  • Anthropic: Claude Opus 4.7 (Fast)
    Text, Vision, Files (1M context)
  • Anthropic: Claude Opus 4.8
    Text, Vision, Files (1M context)
  • Anthropic: Claude Opus 4.8 (Fast)
    Text, Vision, Files (1M context)
  • Anthropic: Claude Sonnet 4.5
    Text, Vision, Files (1M context)
  • Anthropic: Claude Sonnet 4.6
    Text, Vision, Files (1M context)
  • Anthropic: Claude Sonnet 5
    Text, Vision, Files (1M context)
Open Weights
  None

Google DeepMind [US]
Closed Weights
  • Google: Gemini 2.5 Flash Lite
    Text, Vision, Video, Audio, Files (1M context)
  • Google: Gemini 2.5 Flash Lite Preview 09-2025
    Text, Vision, Video, Audio, Files (1M context)
  • Google: Gemini 3 Flash Preview
    Text, Vision, Video, Audio, Files (1M context)
  • Google: Gemini 3.1 Flash Lite
    Text, Vision, Video, Audio, Files (1M context)
  • Google: Gemini 3.1 Flash Lite Preview
    Text, Vision, Video, Audio, Files (1M context)
  • Google: Gemini 3.1 Pro Preview
    Text, Vision, Video, Audio, Files (1M context)
  • Google: Gemini 3.1 Pro Preview Custom Tools
    Text, Vision, Video, Audio, Files (1M context)
  • Google: Gemini 3.5 Flash
    Text, Vision, Video, Audio, Files (1M context)
  • Google: Lyria 3 Clip Preview
    Text, Vision (1M context)
  • Google: Lyria 3 Pro Preview
    Text, Vision (1M context)
  • Google: Nano Banana (Gemini 2.5 Flash Image)
    Text, Vision (33K context)
  • Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)
    Text, Vision (131K context)
  • Google: Nano Banana 2 (Gemini 3.1 Flash Image)
    Text, Vision (131K context)
  • Google: Nano Banana 2 Lite (Gemini 3.1 Flash Lite Image)
    Text, Vision (66K context)
  • Google: Nano Banana Pro (Gemini 3 Pro Image Preview)
    Text, Vision (66K context)
  • Google: Nano Banana Pro (Gemini 3 Pro Image)
    Text, Vision (66K context)
Open Weights
  • Google: Gemma 4 26B A4B
    Text, Vision, Video (262K context)
  • Google: Gemma 4 31B
    Text, Vision, Video (262K context)

OpenAI [US]
Closed Weights
  • OpenAI: GPT Audio
    Text, Audio (128K context)
  • OpenAI: GPT Audio Mini
    Text, Audio (128K context)
  • OpenAI: GPT Chat Latest
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5 Chat
    Text, Vision, Files (128K context)
  • OpenAI: GPT-5 Codex
    Text, Vision (400K context)
  • OpenAI: GPT-5 Image
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5 Image Mini
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5 Mini
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5 Nano
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5 Pro
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5.1
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5.1 Chat
    Text, Vision, Files (128K context)
  • OpenAI: GPT-5.1-Codex
    Text, Vision (400K context)
  • OpenAI: GPT-5.1-Codex-Max
    Text, Vision (400K context)
  • OpenAI: GPT-5.1-Codex-Mini
    Text, Vision (400K context)
  • OpenAI: GPT-5.2
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5.2 Chat
    Text, Vision, Files (128K context)
  • OpenAI: GPT-5.2 Pro
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5.2-Codex
    Text, Vision (400K context)
  • OpenAI: GPT-5.3 Chat
    Text, Vision, Files (128K context)
  • OpenAI: GPT-5.3-Codex
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5.4
    Text, Vision, Files (1M context)
  • OpenAI: GPT-5.4 Image 2
    Text, Vision, Files (272K context)
  • OpenAI: GPT-5.4 Mini
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5.4 Nano
    Text, Vision, Files (400K context)
  • OpenAI: GPT-5.4 Pro
    Text, Vision, Files (1M context)
  • OpenAI: GPT-5.5
    Text, Vision, Files (1M context)
  • OpenAI: GPT-5.5 Pro
    Text, Vision, Files (1M context)
  • OpenAI: o3 Deep Research
    Text, Vision, Files (200K context)
  • OpenAI: o4 Mini Deep Research
    Text, Vision, Files (200K context)
Open Weights
  • OpenAI: gpt-oss-120b
    Text (131K context)
  • OpenAI: gpt-oss-20b
    Text (131K context)
  • OpenAI: gpt-oss-safeguard-20b
    Text (131K context)

xAI [US]
Closed Weights
  • xAI: Grok 4.20
    Text, Vision, Files (2M context)
  • xAI: Grok 4.20 Multi-Agent
    Text, Vision, Files (2M context)
  • xAI: Grok 4.3
    Text, Vision, Files (1M context)
  • xAI: Grok Build 0.1
    Text, Vision (256K context)
Open Weights
  None

Mistral AI [FR]
Closed Weights
  • Mistral: Codestral 2508
    Text, Files (256K context)
  • Mistral: Mistral Large 3 2512
    Text, Vision, Files (262K context)
  • Mistral: Mistral Medium 3.1
    Text, Vision, Files (131K context)
  • Mistral: Mistral Medium 3.5
    Text, Vision, Files (262K context)
Open Weights
  • Mistral: Devstral 2 2512
    Text, Files (262K context)
  • Mistral: Ministral 3 14B 2512
    Text, Vision (262K context)
  • Mistral: Ministral 3 3B 2512
    Text, Vision (131K context)
  • Mistral: Ministral 3 8B 2512
    Text, Vision (262K context)
  • Mistral: Mistral Small 4
    Text, Vision (262K context)
  • Mistral: Voxtral Small 24B 2507
    Text, Audio, Files (32K context)

AI21 Labs [IL]
Closed Weights
  None
Open Weights
  • AI21: Jamba Large 1.7
    Text (256K context)

Alibaba (Qwen) [CN]
Closed Weights
  • Qwen: Qwen Plus 0728
    Text (1M context)
  • Qwen: Qwen Plus 0728 (thinking)
    Text (1M context)
  • Qwen: Qwen3 Coder Flash
    Text (1M context)
  • Qwen: Qwen3 Coder Plus
    Text (1M context)
  • Qwen: Qwen3 Max
    Text (262K context)
  • Qwen: Qwen3 Max Thinking
    Text (262K context)
  • Qwen: Qwen3.5 Plus 2026-02-15
    Text, Vision, Video (1M context)
  • Qwen: Qwen3.5 Plus 2026-04-20
    Text, Vision, Video (1M context)
  • Qwen: Qwen3.5-Flash
    Text, Vision, Video (1M context)
  • Qwen: Qwen3.6 Flash
    Text, Vision, Video (1M context)
  • Qwen: Qwen3.6 Max Preview
    Text (262K context)
  • Qwen: Qwen3.6 Plus
    Text, Vision, Video (1M context)
  • Qwen: Qwen3.7 Max
    Text (1M context)
  • Qwen: Qwen3.7 Plus
    Text, Vision (1M context)
Open Weights
  • Qwen: Qwen3 235B A22B Instruct 2507
    Text (262K context)
  • Qwen: Qwen3 235B A22B Thinking 2507
    Text (262K context)
  • Qwen: Qwen3 30B A3B Instruct 2507
    Text (131K context)
  • Qwen: Qwen3 30B A3B Thinking 2507
    Text (131K context)
  • Qwen: Qwen3 Coder 30B A3B Instruct
    Text (160K context)
  • Qwen: Qwen3 Coder 480B A35B
    Text (1M context)
  • Qwen: Qwen3 Coder Next
    Text (262K context)
  • Qwen: Qwen3 Next 80B A3B Instruct
    Text (262K context)
  • Qwen: Qwen3 Next 80B A3B Thinking
    Text (262K context)
  • Qwen: Qwen3 VL 235B A22B Instruct
    Text, Vision (262K context)
  • Qwen: Qwen3 VL 235B A22B Thinking
    Text, Vision (131K context)
  • Qwen: Qwen3 VL 30B A3B Instruct
    Text, Vision (262K context)
  • Qwen: Qwen3 VL 30B A3B Thinking
    Text, Vision (131K context)
  • Qwen: Qwen3 VL 32B Instruct
    Text, Vision (262K context)
  • Qwen: Qwen3 VL 8B Instruct
    Text, Vision (256K context)
  • Qwen: Qwen3 VL 8B Thinking
    Text, Vision (256K context)
  • Qwen: Qwen3.5 397B A17B
    Text, Vision, Video (256K context)
  • Qwen: Qwen3.5-122B-A10B
    Text, Vision, Video (262K context)
  • Qwen: Qwen3.5-27B
    Text, Vision, Video (262K context)
  • Qwen: Qwen3.5-35B-A3B
    Text, Vision, Video (262K context)
  • Qwen: Qwen3.5-9B
    Text, Vision, Video (262K context)
  • Qwen: Qwen3.6 27B
    Text, Vision, Video (262K context)
  • Qwen: Qwen3.6 35B A3B
    Text, Vision, Video (262K context)

ByteDance [CN]
Closed Weights
  • Seed: Seed 1.6
    Text, Vision, Video (262K context)
  • Seed: Seed 1.6 Flash
    Text, Vision, Video (262K context)
  • Seed: Seed-2.0-Lite
    Text, Vision, Video (262K context)
  • Seed: Seed-2.0-Mini
    Text, Vision, Video (262K context)
Open Weights
  None

DeepSeek [CN]
Closed Weights
  None
Open Weights
  • DeepSeek: DeepSeek V3.1
    Text (164K context)
  • DeepSeek: DeepSeek V3.1 Terminus
    Text (164K context)
  • DeepSeek: DeepSeek V3.2
    Text (131K context)
  • DeepSeek: DeepSeek V3.2 Exp
    Text (164K context)
  • DeepSeek: DeepSeek V4 Flash
    Text (1M context)
  • DeepSeek: DeepSeek V4 Pro
    Text (1M context)

MiniMax [CN]
Closed Weights
  • MiniMax: MiniMax M2-her
    Text (66K context)
Open Weights
  • MiniMax: MiniMax M2
    Text (205K context)
  • MiniMax: MiniMax M2.1
    Text (205K context)
  • MiniMax: MiniMax M2.5
    Text (205K context)
  • MiniMax: MiniMax M2.7
    Text (205K context)
  • MiniMax: MiniMax M3
    Text, Vision, Video (1M context)