Live Tracker

AI Model Leaderboard

The current ranking of the leading AI models — by capability, reliability, price, and deployment reality — plus every model release we cover, updated as the frontier moves.

Updated Jun 16, 2026 · 8 models tracked

The ranking

AI Model Leaderboard — 2026-06-16
#	Model	Vendor	Context	Pricing	Best for	Status
1	Claude Opus 4.8	Anthropic	1M tokens	$5 / $25 per MTok	Reasoning & agentic coding	Available
2	GPT-5.5	OpenAI	1M+ tokens	~$5 / $30 per MTok	Broadest all-rounder & tooling	Available
3	Gemini 3.1 Pro	Google DeepMind	1M–2M tokens	$2 / $12 per MTok (≤200K)	Multimodal & long-document	Available
4	Claude Fable 5	Anthropic	1M tokens	$10 / $50 per MTok	Highest-capability reasoning (when available)	Suspended
5	DeepSeek V4-Pro	DeepSeek	1M tokens	$0.435 / $0.87 per MTok	Best value · open-weights lineage	Available
6	Grok 4.3	xAI	1M tokens	$1.25 / $2.50 per MTok	Real-time data & agentic, cost-efficient	Available
7	Qwen3.7 Max	Alibaba	1M tokens	$2.50 / $7.50 per MTok	Multilingual reasoning	Available
8	Meta Muse Spark	Meta	262K tokens	Preview — no public pricing yet	Multimodal · Meta ecosystem	Preview

The models, in brief

#1
Claude Opus 4.8

Anthropic · $5 / $25 per MTok

Our highest-rated model for production work: it holds a coherent plan across long agentic coding loops and tends to flag uncertainty rather than fabricate.

Full analysis › · Visit Anthropic ›
#2
GPT-5.5

OpenAI · ~$5 / $30 per MTok

The safest single-vendor bet — the deepest tooling and integrations, plus a model ladder that tunes cost to task without leaving the ecosystem.

Full analysis › · Visit OpenAI ›
#3
Gemini 3.1 Pro

Google DeepMind · $2 / $12 per MTok (≤200K)

The multimodal and long-document leader: native video, audio and image understanding with a 1M–2M context, on paid-API pricing (the $19.99 consumer plan is separate).

In paid preview (no free API tier).

Full analysis › · Visit Google DeepMind ›
#4
Claude Fable 5
Suspended

Anthropic · $10 / $50 per MTok

Anthropic's most capable model — currently dark, suspended under a U.S. export-control order; it out-ranks Opus 4.8 on capability, but Opus 4.8 is the top model you can actually use today.

Suspended under a U.S. export-control order (June 2026).

Full analysis › · Visit Anthropic ›
#5
DeepSeek V4-Pro

DeepSeek · $0.435 / $0.87 per MTok

The value flagship of 2026: near-frontier quality at a fraction of closed-model cost, with an even cheaper V4-Flash tier ($0.14/$0.28) and an open-weights lineage you can self-host.

Flagship in preview; V4-Flash is the economy tier.

Full analysis › · Visit DeepSeek ›
#6
Grok 4.3

xAI · $1.25 / $2.50 per MTok

Real-time answers via native X access plus strong agentic performance — and after the 4.3 update, one of the cheapest frontier models to run.

Full analysis › · Visit xAI ›
#7
Qwen3.7 Max

Alibaba · $2.50 / $7.50 per MTok

The multilingual reasoning flagship, now a closed API-only Max tier; the open-weights Qwen3.5 remains the best free, self-hostable option.

Full analysis › · Visit Alibaba ›
#8
Meta Muse Spark
Preview

Meta · Preview — no public pricing yet

Meta's first proprietary frontier model (from Meta Superintelligence Labs) — frontier-tier on benchmarks, but still private-preview only as its public API slips.

Private invitation-only API; public developer API delayed (June 2026).

Visit Meta ›

The state of the AI frontier in 2026

There is no single best AI model in 2026, and any leaderboard that crowns one and stops is selling certainty that does not exist. The frontier has split into specialists: one model writes the cleanest code, another reasons hardest on graduate-level science, a third natively reads an hour of video, and a fourth does roughly ninety percent of the work for a fraction of the price. The useful question is no longer which model is best but which model is best for your task at your budget — which is why this leaderboard pairs an overall ranking with a per-model breakdown and a deep-dive guide.

Three structural shifts define the year. Price has collapsed at the bottom: open-weight models like DeepSeek's V4 line now deliver near-frontier quality at a fraction of the cost per token, and you can self-host them. Context has standardized: a one-million-token window, exotic two years ago, is now table stakes across the closed flagships. And reasoning has gone always-on: the leading models decide for themselves how much deliberation a problem warrants rather than exposing a manual toggle.

The other defining feature of 2026 is volatility. Models are repriced overnight, deprecated on short notice, and — as Claude Fable 5's mid-year suspension under a U.S. export-control order showed — can be switched off entirely by forces outside the vendor's roadmap. Model availability is now a supply-chain variable, not a given, which is the single strongest argument for a multi-model strategy and for tracking the frontier continuously rather than choosing once.

How we rank

Placements weigh reasoning depth, coding and agentic reliability, context and multimodality, price-to-quality, and deployment control, cross-checked against primary vendor documentation and public benchmarks (SWE-bench, the LMArena leaderboard) rather than vendor marketing. The table is refreshed when a model ships, is repriced, or changes status; the "latest releases" feed below updates automatically. For the task-by-task breakdown — strengths, weaknesses, and sourced pricing for every model — see the full Best LLMs guide.

Latest model releases

SpaceXAI Releases Grok Voice Think Fast 2.0 as Speech-to-Speech Leader

The successor model from SpaceXAI delivers benchmark gains in quality and reasoning while cutting token usage and latency, with a planned default rollout that positions it ahead of GPT-Realtime-2.1 and Gemini 3.1 Flash variants.

Jul 29, 2026 · Marcus Vance
Moonshot AI Raises $3.5 Billion at $35 Billion Valuation Following Kimi K3 Breakthrough

Beijing startup surpasses initial funding targets with oversubscribed round backed by National Artificial Intelligence Industry Investment Fund and eyes $50 billion pre-money round plus Hong Kong IPO.

Jul 29, 2026 · The Intel Desk
b.ai Offers Google Gemini 3.6 Flash and 3.5 Flash-Lite via API for Agent Development

The platform provides immediate access to the new models with large context windows and tool support, enabling developers to scale AI agents efficiently without additional setup.

Jul 29, 2026 · Marcus Vance
Qwen3.7 Flash Models Integrate with NanoGPT for Multimodal Agent Tasks

The models add near-1M context, tool calling and multimodal inputs to the platform for coding, visual and agent workflows at listed pricing.

Jul 28, 2026 · Marcus Vance
Claude Opus 5 Tops SWE-bench Verified at 96% as Anthropic Sweeps Podium

BenchLM.ai data from July 28, 2026 places three Anthropic models at the top of the software engineering benchmark while scores across 59 evaluated systems show tight clustering that points to saturation.

Jul 28, 2026 · Marcus Vance
Microsoft Launches MAI-Cyber-1-Flash as First Specialized Cybersecurity AI

The compact model integrates into MDASH to deliver leading benchmark results through a multi-model agentic approach that routes routine tasks internally while escalating complex cases.

Jul 28, 2026 · Marcus Vance
Claude Opus 5 Matches Fable 5 Performance at Half the Cost

Anthropic's July 2026 release positions the model as a practical option for agentic systems by delivering frontier benchmark results at reduced pricing compared to prior top-tier options.

Jul 27, 2026 · Marcus Vance
DeepSeek V4 Enters General Availability with Peak API Pricing

The official launch introduces optimizations for V4-Pro and V4-Flash while retiring legacy endpoints and applying new pricing structures to the open-weight models.

Jul 27, 2026 · Marcus Vance
SpaceXAI Launches Grok 4.5 as Strongest Frontier Coding Model at 80 TPS

The new release from SpaceXAI emphasizes curated engineering data and multi-step reasoning to support coding and agentic tasks, with rollout across web, mobile, and API channels.

Jul 26, 2026 · Marcus Vance

Frequently asked

What is the best AI model right now?

As of 2026-06-16, Claude Opus 4.8 (Anthropic) tops our leaderboard for overall capability and reliability — it leads on reasoning and long-horizon agentic coding. But there is no universal winner in 2026: the frontier has split into specialists, so the best model depends on your task and budget. DeepSeek V4-Pro wins on price-to-quality, and Gemini 3.1 Pro leads multimodal. For most teams the right answer is to run two or three behind a gateway and route per task.

How often is this leaderboard updated?

The ranking is reviewed and refreshed whenever a major model ships, is repriced, or changes status — most recently 2026-06-16. The "Latest model releases" section below updates automatically as we publish new model coverage, so the page reflects the current state of the frontier rather than a frozen snapshot.

Why is Claude Fable 5 marked suspended?

Anthropic suspended access to Claude Fable 5 in June 2026 after a U.S. export-control order required it to disable the model for all customers. It remains on the leaderboard for reference because it is one of the most capable models released, but it is not currently available — a reminder that in 2026, model availability is a supply-chain variable, not a given.

Are open-weight models included?

Yes. Open-weight models are first-class entries — DeepSeek V4-Pro ranks for value and Qwen3.5 (open weights, Apache 2.0) is the multilingual self-hosting pick. In 2026 the gap between the best open-weight models and the closed frontier has narrowed dramatically, and for high-volume, private, or self-hosted workloads they are often the most rational choice. The leaderboard ranks them on the same capability, price, and deployment criteria as the closed flagships.

Where can I see the full per-model comparison?

Each row links to its full analysis in our Best LLMs guide, which breaks every model down by task (coding, reasoning, multimodal, cost), with strengths, weaknesses, pricing detail, and sources. The leaderboard is the live snapshot; the guide is the deep dive.

Track the frontier with The AI Brief

New model releases and leaderboard shifts, the moment they land — free.

The ranking

The models, in brief

Claude Opus 4.8

GPT-5.5

Gemini 3.1 Pro

Claude Fable 5

DeepSeek V4-Pro

Grok 4.3

Qwen3.7 Max

Meta Muse Spark