AI Model Leaderboard
The current ranking of the leading AI models — by capability, reliability, price, and deployment reality — plus every model release we cover, updated as the frontier moves.
· 8 models tracked
The ranking
| # | Model | Vendor | Context | Pricing | Best for | Status |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | Anthropic | 1M tokens | $5 / $25 per MTok | Reasoning & agentic coding | Available |
| 2 | GPT-5.5 | OpenAI | 1M+ tokens | ~$5 / $30 per MTok | Broadest all-rounder & tooling | Available |
| 3 | Gemini 3.1 Pro | Google DeepMind | 1M–2M tokens | $2 / $12 per MTok (≤200K) | Multimodal & long-document | Available |
| 4 | Claude Fable 5 | Anthropic | 1M tokens | $10 / $50 per MTok | Highest-capability reasoning (when available) | Suspended |
| 5 | DeepSeek V4-Pro | DeepSeek | 1M tokens | $0.435 / $0.87 per MTok | Best value · open-weights lineage | Available |
| 6 | Grok 4.3 | xAI | 1M tokens | $1.25 / $2.50 per MTok | Real-time data & agentic, cost-efficient | Available |
| 7 | Qwen3.7 Max | Alibaba | 1M tokens | $2.50 / $7.50 per MTok | Multilingual reasoning | Available |
| 8 | Meta Muse Spark | Meta | 262K tokens | Preview — no public pricing yet | Multimodal · Meta ecosystem | Preview |
The models, in brief
- #1
Claude Opus 4.8
Anthropic · $5 / $25 per MTok
Our highest-rated model for production work: it holds a coherent plan across long agentic coding loops and tends to flag uncertainty rather than fabricate.
- #2
GPT-5.5
OpenAI · ~$5 / $30 per MTok
The safest single-vendor bet — the deepest tooling and integrations, plus a model ladder that tunes cost to task without leaving the ecosystem.
- #3
Gemini 3.1 Pro
Google DeepMind · $2 / $12 per MTok (≤200K)
The multimodal and long-document leader: native video, audio and image understanding with a 1M–2M context, on paid-API pricing (the $19.99 consumer plan is separate).
In paid preview (no free API tier).
- #4
Claude Fable 5
SuspendedAnthropic · $10 / $50 per MTok
Anthropic's most capable model — currently dark, suspended under a U.S. export-control order; it out-ranks Opus 4.8 on capability, but Opus 4.8 is the top model you can actually use today.
Suspended under a U.S. export-control order (June 2026).
- #5
DeepSeek V4-Pro
DeepSeek · $0.435 / $0.87 per MTok
The value flagship of 2026: near-frontier quality at a fraction of closed-model cost, with an even cheaper V4-Flash tier ($0.14/$0.28) and an open-weights lineage you can self-host.
Flagship in preview; V4-Flash is the economy tier.
- #6
Grok 4.3
xAI · $1.25 / $2.50 per MTok
Real-time answers via native X access plus strong agentic performance — and after the 4.3 update, one of the cheapest frontier models to run.
- #7
Qwen3.7 Max
Alibaba · $2.50 / $7.50 per MTok
The multilingual reasoning flagship, now a closed API-only Max tier; the open-weights Qwen3.5 remains the best free, self-hostable option.
- #8
Meta Muse Spark
PreviewMeta · Preview — no public pricing yet
Meta's first proprietary frontier model (from Meta Superintelligence Labs) — frontier-tier on benchmarks, but still private-preview only as its public API slips.
Private invitation-only API; public developer API delayed (June 2026).
The state of the AI frontier in 2026
There is no single best AI model in 2026, and any leaderboard that crowns one and stops is selling certainty that does not exist. The frontier has split into specialists: one model writes the cleanest code, another reasons hardest on graduate-level science, a third natively reads an hour of video, and a fourth does roughly ninety percent of the work for a fraction of the price. The useful question is no longer which model is best but which model is best for your task at your budget — which is why this leaderboard pairs an overall ranking with a per-model breakdown and a deep-dive guide.
Three structural shifts define the year. Price has collapsed at the bottom: open-weight models like DeepSeek's V4 line now deliver near-frontier quality at a fraction of the cost per token, and you can self-host them. Context has standardized: a one-million-token window, exotic two years ago, is now table stakes across the closed flagships. And reasoning has gone always-on: the leading models decide for themselves how much deliberation a problem warrants rather than exposing a manual toggle.
The other defining feature of 2026 is volatility. Models are repriced overnight, deprecated on short notice, and — as Claude Fable 5's mid-year suspension under a U.S. export-control order showed — can be switched off entirely by forces outside the vendor's roadmap. Model availability is now a supply-chain variable, not a given, which is the single strongest argument for a multi-model strategy and for tracking the frontier continuously rather than choosing once.
How we rank
Placements weigh reasoning depth, coding and agentic reliability, context and multimodality, price-to-quality, and deployment control, cross-checked against primary vendor documentation and public benchmarks (SWE-bench, the LMArena leaderboard) rather than vendor marketing. The table is refreshed when a model ships, is repriced, or changes status; the "latest releases" feed below updates automatically. For the task-by-task breakdown — strengths, weaknesses, and sourced pricing for every model — see the full Best LLMs guide.
Latest model releases
-
Z.ai Releases GLM-5.2 Open-Weight Frontier Model With 1M Context
The firm grants immediate subscriber access to its latest model while scheduling open weights under MIT license, highlighting a push for accessible frontier capabilities in coding and long-context domains.
-
Zhipu AI Rolls Out GLM-5.2 with 1M Context as Open Frontier Response
The Chinese developer has made its latest model immediately available to subscribers while scheduling full API access and MIT-licensed weights for the following week.
-
Anthropic Suspends Claude Fable 5 and Mythos 5 After U.S. Export Control Order
The U.S. government export control order led Anthropic to suspend access to Claude Fable 5 and Mythos 5 on June 12, 2026, limiting availability to Opus 4.8.
-
Moonshot AI Releases Kimi K2.7-Code Open-Source Coding Model
The new model from Moonshot AI offers efficiency improvements in coding tasks while being freely available under open licensing.
-
Anthropic Suspends Claude Fable 5 and Mythos 5 After U.S. Export-Control Order
A federal export-control directive forced Anthropic to disable its two most capable models for every customer. Here is where Claude Fable 5 went — and why regulators pulled it.
-
Microsoft MAI-Thinking-1 Matches Claude Opus 4.6 on Coding Benchmarks at Mid-Size
The new reasoning model from Microsoft AI uses clean licensed data and a sparse MoE design to achieve high performance on software engineering and math tasks without third-party distillation.
-
xAI Updates Grok Web Release Notes With Connectors and API Enhancements
xAI's recent updates to Grok Web release notes highlight the rollout of Connectors for enterprise app integrations and WebSocket enhancements that lower latency for agent-based tasks.
-
Anthropic Launches Claude Fable 5 with Mythos Capabilities and Safety Handoffs
The dual release pairs a safeguarded model for broad access with an unrestricted variant for select users, addressing both performance demands and risk management in frontier AI deployment.
-
xAI Grok 4.3 Enhances Agentic Tool Calling and Instruction Following
This flagship model update from xAI brings native single-brain tool use, a large context window, and configurable reasoning to improve reliability in AI agent applications, while older models are phased out.
Frequently asked
What is the best AI model right now?
As of 2026-06-16, Claude Opus 4.8 (Anthropic) tops our leaderboard for overall capability and reliability — it leads on reasoning and long-horizon agentic coding. But there is no universal winner in 2026: the frontier has split into specialists, so the best model depends on your task and budget. DeepSeek V4-Pro wins on price-to-quality, and Gemini 3.1 Pro leads multimodal. For most teams the right answer is to run two or three behind a gateway and route per task.
How often is this leaderboard updated?
The ranking is reviewed and refreshed whenever a major model ships, is repriced, or changes status — most recently 2026-06-16. The "Latest model releases" section below updates automatically as we publish new model coverage, so the page reflects the current state of the frontier rather than a frozen snapshot.
Why is Claude Fable 5 marked suspended?
Anthropic suspended access to Claude Fable 5 in June 2026 after a U.S. export-control order required it to disable the model for all customers. It remains on the leaderboard for reference because it is one of the most capable models released, but it is not currently available — a reminder that in 2026, model availability is a supply-chain variable, not a given.
Are open-weight models included?
Yes. Open-weight models are first-class entries — DeepSeek V4-Pro ranks for value and Qwen3.5 (open weights, Apache 2.0) is the multilingual self-hosting pick. In 2026 the gap between the best open-weight models and the closed frontier has narrowed dramatically, and for high-volume, private, or self-hosted workloads they are often the most rational choice. The leaderboard ranks them on the same capability, price, and deployment criteria as the closed flagships.
Where can I see the full per-model comparison?
Each row links to its full analysis in our Best LLMs guide, which breaks every model down by task (coding, reasoning, multimodal, cost), with strengths, weaknesses, pricing detail, and sources. The leaderboard is the live snapshot; the guide is the deep dive.