The Inference Report

June 25, 2026

AI Labs — News

The labs are signaling a decisive shift from model weights to inference infrastructure and operational tooling. OpenAI and Broadcom's Jalapeño chip targets the actual constraint labs face in production: getting LLMs to run fast and cheap at scale, not making them bigger or smarter. Google and DeepMind are pushing reasoning as a retrieval mechanism and computer use as a capability layer, both moves that make models useful without necessarily requiring new weights. The real activity, though, is in the plumbing. Hugging Face is optimizing fine-tuning pipelines with NVIDIA tooling and building benchmarks for real-world speech recognition. AMD is publishing kernel-level solutions for running DeepSeek efficiently on MI355X, a direct engineering response to competitive pressure. IBM, Red Hat, and Palo Alto Networks are bundling vulnerability detection and remediation into a single workflow, treating security as an operational problem, not a model problem. Mistral is expanding connector control, letting users customize their integrations. AI21 Labs is merging weak agents into stronger ones through composition rather than scale. What's absent is louder than what's here: no lab announced a major new model. Instead, they're racing to own the layer between models and users, where margins live and lock-in begins. Inference chips, fine-tuning frameworks, computer use APIs, vulnerability workflows, and agent composition all point to the same insight: the model itself is becoming a commodity input. The money moves to whoever controls how it gets deployed.

Sloane Duvall

AI21 Labs

Tipping the scales: Merging weak agents into a state-of-the-art deep researcher

AMD

DP Attention and TBO for DeepSeek-V4 on MI355X

Google

Thinking to recall: How reasoning unlocks parametric knowledge in LLMs

Google DeepMind

Introducing computer use in Gemini 3.5 Flash

Hugging Face

IBM

IBM, Red Hat and Palo Alto Networks Expand Project Lightwell to Help Organizations Respond to Software Vulnerabilities

Mistral

Bringing more control over your connectors

OpenAI

OpenAI and Broadcom unveil LLM-optimized inference chip

AI Labs — Models

A curated reference of models from major AI labs, with open- or closed-weight status, input modalities, and context window size. American labs tend toward closed-weight models, while Chinese labs tend toward open-weight models.

Amazon

Closed Weights

Amazon: Nova 2 Lite
Text, Vision, Video, Files | 1M context
Amazon: Nova Premier 1.0
Text, Vision | 1M context

Open Weights

None

Anthropic

Closed Weights

Anthropic: Claude Fable 5
Text, Vision, Files | 1M context
Anthropic: Claude Haiku 4.5
Text, Vision, Files | 200K context
Anthropic: Claude Opus 4.1
Vision, Text, Files | 200K context
Anthropic: Claude Opus 4.5
Files, Vision, Text | 200K context
Anthropic: Claude Opus 4.6
Text, Vision, Files | 1M context
Anthropic: Claude Opus 4.6 (Fast)
Text, Vision, Files | 1M context
Anthropic: Claude Opus 4.7
Text, Vision, Files | 1M context
Anthropic: Claude Opus 4.7 (Fast)
Text, Vision, Files | 1M context
Anthropic: Claude Opus 4.8
Text, Vision, Files | 1M context
Anthropic: Claude Opus 4.8 (Fast)
Text, Vision, Files | 1M context
Anthropic: Claude Sonnet 4.5
Text, Vision, Files | 1M context
Anthropic: Claude Sonnet 4.6
Text, Vision, Files | 1M context

Open Weights

None

Google DeepMind

Closed Weights

Google: Gemini 2.5 Flash Lite
Text, Vision, Files, Audio, Video | 1M context
Google: Gemini 2.5 Flash Lite Preview 09-2025
Text, Vision, Files, Audio, Video | 1M context
Google: Gemini 3 Flash Preview
Text, Vision, Files, Audio, Video | 1M context
Google: Gemini 3.1 Flash Lite
Text, Vision, Video, Files, Audio | 1M context
Google: Gemini 3.1 Flash Lite Preview
Text, Vision, Video, Files, Audio | 1M context
Google: Gemini 3.1 Pro Preview
Audio, Files, Vision, Text, Video | 1M context
Google: Gemini 3.1 Pro Preview Custom Tools
Text, Audio, Vision, Video, Files | 1M context
Google: Gemini 3.5 Flash
Text, Vision, Video, Files, Audio | 1M context
Google: Lyria 3 Clip Preview
Text, Vision | 1M context
Google: Lyria 3 Pro Preview
Text, Vision | 1M context
Google: Nano Banana (Gemini 2.5 Flash Image)
Vision, Text | 33K context
Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)
Vision, Text | 131K context
Google: Nano Banana 2 (Gemini 3.1 Flash Image)
Vision, Text | 131K context
Google: Nano Banana Pro (Gemini 3 Pro Image Preview)
Vision, Text | 66K context
Google: Nano Banana Pro (Gemini 3 Pro Image)
Vision, Text | 66K context

Open Weights

Google: Gemma 4 26B A4B
Vision, Text, Video | 262K context
Google: Gemma 4 31B
Vision, Text, Video | 262K context

Microsoft

Closed Weights

None

Open Weights

Microsoft: Phi 4 Mini Instruct
Text | 131K context

OpenAI

Closed Weights

OpenAI: GPT Audio
Text, Audio | 128K context
OpenAI: GPT Audio Mini
Text, Audio | 128K context
OpenAI: GPT Chat Latest
Text, Vision, Files | 400K context
OpenAI: GPT-5
Text, Vision, Files | 400K context
OpenAI: GPT-5 Chat
Files, Vision, Text | 128K context
OpenAI: GPT-5 Codex
Text, Vision | 400K context
OpenAI: GPT-5 Image
Vision, Text, Files | 400K context
OpenAI: GPT-5 Image Mini
Files, Vision, Text | 400K context
OpenAI: GPT-5 Mini
Text, Vision, Files | 400K context
OpenAI: GPT-5 Nano
Text, Vision, Files | 400K context
OpenAI: GPT-5 Pro
Vision, Text, Files | 400K context
OpenAI: GPT-5.1
Vision, Text, Files | 400K context
OpenAI: GPT-5.1 Chat
Files, Vision, Text | 128K context
OpenAI: GPT-5.1-Codex
Text, Vision | 400K context
OpenAI: GPT-5.1-Codex-Max
Text, Vision | 400K context
OpenAI: GPT-5.1-Codex-Mini
Vision, Text | 400K context
OpenAI: GPT-5.2
Files, Vision, Text | 400K context
OpenAI: GPT-5.2 Chat
Files, Vision, Text | 128K context
OpenAI: GPT-5.2 Pro
Vision, Text, Files | 400K context
OpenAI: GPT-5.2-Codex
Text, Vision | 400K context
OpenAI: GPT-5.3 Chat
Text, Vision, Files | 128K context
OpenAI: GPT-5.3-Codex
Text, Vision, Files | 400K context
OpenAI: GPT-5.4
Text, Vision, Files | 1M context
OpenAI: GPT-5.4 Image 2
Vision, Text, Files | 272K context
OpenAI: GPT-5.4 Mini
Files, Vision, Text | 400K context
OpenAI: GPT-5.4 Nano
Files, Vision, Text | 400K context
OpenAI: GPT-5.4 Pro
Text, Vision, Files | 1M context
OpenAI: GPT-5.5
Files, Vision, Text | 1M context
OpenAI: GPT-5.5 Pro
Files, Vision, Text | 1M context
OpenAI: o3 Deep Research
Vision, Text, Files | 200K context
OpenAI: o4 Mini Deep Research
Files, Vision, Text | 200K context

Open Weights

OpenAI: gpt-oss-120b
Text | 131K context
OpenAI: gpt-oss-20b
Text | 131K context
OpenAI: gpt-oss-safeguard-20b
Text | 131K context

xAI

Closed Weights

xAI: Grok 4.20
Text, Vision, Files | 2M context
xAI: Grok 4.20 Multi-Agent
Text, Vision, Files | 2M context
xAI: Grok 4.3
Text, Vision | 1M context
xAI: Grok Build 0.1
Text, Vision | 256K context

Open Weights

None

Mistral AI

Closed Weights

Mistral: Codestral 2508
Text, Files | 256K context
Mistral: Mistral Large 3 2512
Text, Vision, Files | 262K context
Mistral: Mistral Medium 3.1
Text, Vision, Files | 131K context
Mistral: Mistral Medium 3.5
Text, Vision, Files | 262K context

Open Weights

Mistral: Devstral 2 2512
Text, Files | 262K context
Mistral: Ministral 3 14B 2512
Text, Vision | 262K context
Mistral: Ministral 3 3B 2512
Text, Vision | 131K context
Mistral: Ministral 3 8B 2512
Text, Vision | 262K context
Mistral: Mistral Small 4
Text, Vision | 262K context
Mistral: Voxtral Small 24B 2507
Text, Audio, Files | 32K context

AI21 Labs

Closed Weights

None

Open Weights

AI21: Jamba Large 1.7
Text | 256K context

Alibaba (Qwen)

Closed Weights

Qwen: Qwen Plus 0728
Text | 1M context
Qwen: Qwen Plus 0728 (thinking)
Text | 1M context
Qwen: Qwen3 Coder Flash
Text | 1M context
Qwen: Qwen3 Coder Plus
Text | 1M context
Qwen: Qwen3 Max
Text | 262K context
Qwen: Qwen3 Max Thinking
Text | 262K context
Qwen: Qwen3.5 Plus 2026-02-15
Text, Vision, Video | 1M context
Qwen: Qwen3.5 Plus 2026-04-20
Text, Vision, Video | 1M context
Qwen: Qwen3.5-Flash
Text, Vision, Video | 1M context
Qwen: Qwen3.6 Flash
Text, Vision, Video | 1M context
Qwen: Qwen3.6 Max Preview
Text | 262K context
Qwen: Qwen3.6 Plus
Text, Vision, Video | 1M context
Qwen: Qwen3.7 Max
Text | 1M context
Qwen: Qwen3.7 Plus
Text, Vision | 1M context

Open Weights

Qwen: Qwen3 235B A22B Instruct 2507
Text | 262K context
Qwen: Qwen3 235B A22B Thinking 2507
Text | 262K context
Qwen: Qwen3 30B A3B Instruct 2507
Text | 131K context
Qwen: Qwen3 30B A3B Thinking 2507
Text | 131K context
Qwen: Qwen3 Coder 30B A3B Instruct
Text | 160K context
Qwen: Qwen3 Coder 480B A35B
Text | 1M context
Qwen: Qwen3 Coder Next
Text | 262K context
Qwen: Qwen3 Next 80B A3B Instruct
Text | 262K context
Qwen: Qwen3 Next 80B A3B Thinking
Text | 262K context
Qwen: Qwen3 VL 235B A22B Instruct
Text, Vision | 262K context
Qwen: Qwen3 VL 235B A22B Thinking
Text, Vision | 131K context
Qwen: Qwen3 VL 30B A3B Instruct
Text, Vision | 262K context
Qwen: Qwen3 VL 30B A3B Thinking
Text, Vision | 131K context
Qwen: Qwen3 VL 32B Instruct
Text, Vision | 262K context
Qwen: Qwen3 VL 8B Instruct
Vision, Text | 256K context
Qwen: Qwen3 VL 8B Thinking
Vision, Text | 256K context
Qwen: Qwen3.5 397B A17B
Text, Vision, Video | 256K context
Qwen: Qwen3.5-122B-A10B
Text, Vision, Video | 262K context
Qwen: Qwen3.5-27B
Text, Vision, Video | 262K context
Qwen: Qwen3.5-35B-A3B
Text, Vision, Video | 262K context
Qwen: Qwen3.5-9B
Text, Vision, Video | 262K context
Qwen: Qwen3.6 27B
Text, Vision, Video | 262K context
Qwen: Qwen3.6 35B A3B
Text, Vision, Video | 262K context

ByteDance

Closed Weights

Seed: Seed 1.6
Vision, Text, Video | 262K context
Seed: Seed 1.6 Flash
Vision, Text, Video | 262K context
Seed: Seed-2.0-Lite
Text, Vision, Video | 262K context
Seed: Seed-2.0-Mini
Text, Vision, Video | 262K context

Open Weights

None

DeepSeek

Closed Weights

None

Open Weights

DeepSeek: DeepSeek V3.1
Text | 164K context
DeepSeek: DeepSeek V3.1 Terminus
Text | 164K context
DeepSeek: DeepSeek V3.2
Text | 131K context
DeepSeek: DeepSeek V3.2 Exp
Text | 164K context
DeepSeek: DeepSeek V4 Flash
Text | 1M context
DeepSeek: DeepSeek V4 Pro
Text | 1M context

MiniMax

Closed Weights

MiniMax: MiniMax M2-her
Text | 66K context

Open Weights

MiniMax: MiniMax M2
Text | 205K context
MiniMax: MiniMax M2.1
Text | 205K context
MiniMax: MiniMax M2.5
Text | 205K context
MiniMax: MiniMax M2.7
Text | 205K context
MiniMax: MiniMax M3
Text, Vision, Video | 1M context