Sunday, June 14, 2026

Today’s Edition

AI Intel Report

MARKETS

Live Tracker

AI Model Leaderboard

The current ranking of the leading AI models — by capability, reliability, price, and deployment reality — plus every model release we cover, updated as the frontier moves.

· 8 models tracked

The ranking

AI Model Leaderboard — 2026-06-16
#ModelVendor ContextPricingBest forStatus
1 Claude Opus 4.8 Anthropic 1M tokens $5 / $25 per MTok Reasoning & agentic coding Available
2 GPT-5.5 OpenAI 1M+ tokens ~$5 / $30 per MTok Broadest all-rounder & tooling Available
3 Gemini 3.1 Pro Google DeepMind 1M–2M tokens $2 / $12 per MTok (≤200K) Multimodal & long-document Available
4 Claude Fable 5 Anthropic 1M tokens $10 / $50 per MTok Highest-capability reasoning (when available) Suspended
5 DeepSeek V4-Pro DeepSeek 1M tokens $0.435 / $0.87 per MTok Best value · open-weights lineage Available
6 Grok 4.3 xAI 1M tokens $1.25 / $2.50 per MTok Real-time data & agentic, cost-efficient Available
7 Qwen3.7 Max Alibaba 1M tokens $2.50 / $7.50 per MTok Multilingual reasoning Available
8 Meta Muse Spark Meta 262K tokens Preview — no public pricing yet Multimodal · Meta ecosystem Preview

The models, in brief

  1. #1

    Claude Opus 4.8

    Anthropic · $5 / $25 per MTok

    Our highest-rated model for production work: it holds a coherent plan across long agentic coding loops and tends to flag uncertainty rather than fabricate.

  2. #2

    GPT-5.5

    OpenAI · ~$5 / $30 per MTok

    The safest single-vendor bet — the deepest tooling and integrations, plus a model ladder that tunes cost to task without leaving the ecosystem.

  3. #3

    Gemini 3.1 Pro

    Google DeepMind · $2 / $12 per MTok (≤200K)

    The multimodal and long-document leader: native video, audio and image understanding with a 1M–2M context, on paid-API pricing (the $19.99 consumer plan is separate).

    In paid preview (no free API tier).

  4. #4

    Claude Fable 5

    Suspended

    Anthropic · $10 / $50 per MTok

    Anthropic's most capable model — currently dark, suspended under a U.S. export-control order; it out-ranks Opus 4.8 on capability, but Opus 4.8 is the top model you can actually use today.

    Suspended under a U.S. export-control order (June 2026).

  5. #5

    DeepSeek V4-Pro

    DeepSeek · $0.435 / $0.87 per MTok

    The value flagship of 2026: near-frontier quality at a fraction of closed-model cost, with an even cheaper V4-Flash tier ($0.14/$0.28) and an open-weights lineage you can self-host.

    Flagship in preview; V4-Flash is the economy tier.

  6. #6

    Grok 4.3

    xAI · $1.25 / $2.50 per MTok

    Real-time answers via native X access plus strong agentic performance — and after the 4.3 update, one of the cheapest frontier models to run.

  7. #7

    Qwen3.7 Max

    Alibaba · $2.50 / $7.50 per MTok

    The multilingual reasoning flagship, now a closed API-only Max tier; the open-weights Qwen3.5 remains the best free, self-hostable option.

  8. #8

    Meta Muse Spark

    Preview

    Meta · Preview — no public pricing yet

    Meta's first proprietary frontier model (from Meta Superintelligence Labs) — frontier-tier on benchmarks, but still private-preview only as its public API slips.

    Private invitation-only API; public developer API delayed (June 2026).

The state of the AI frontier in 2026

There is no single best AI model in 2026, and any leaderboard that crowns one and stops is selling certainty that does not exist. The frontier has split into specialists: one model writes the cleanest code, another reasons hardest on graduate-level science, a third natively reads an hour of video, and a fourth does roughly ninety percent of the work for a fraction of the price. The useful question is no longer which model is best but which model is best for your task at your budget — which is why this leaderboard pairs an overall ranking with a per-model breakdown and a deep-dive guide.

Three structural shifts define the year. Price has collapsed at the bottom: open-weight models like DeepSeek's V4 line now deliver near-frontier quality at a fraction of the cost per token, and you can self-host them. Context has standardized: a one-million-token window, exotic two years ago, is now table stakes across the closed flagships. And reasoning has gone always-on: the leading models decide for themselves how much deliberation a problem warrants rather than exposing a manual toggle.

The other defining feature of 2026 is volatility. Models are repriced overnight, deprecated on short notice, and — as Claude Fable 5's mid-year suspension under a U.S. export-control order showed — can be switched off entirely by forces outside the vendor's roadmap. Model availability is now a supply-chain variable, not a given, which is the single strongest argument for a multi-model strategy and for tracking the frontier continuously rather than choosing once.

How we rank

Placements weigh reasoning depth, coding and agentic reliability, context and multimodality, price-to-quality, and deployment control, cross-checked against primary vendor documentation and public benchmarks (SWE-bench, the LMArena leaderboard) rather than vendor marketing. The table is refreshed when a model ships, is repriced, or changes status; the "latest releases" feed below updates automatically. For the task-by-task breakdown — strengths, weaknesses, and sourced pricing for every model — see the full Best LLMs guide.

Latest model releases

Frequently asked

What is the best AI model right now?

As of 2026-06-16, Claude Opus 4.8 (Anthropic) tops our leaderboard for overall capability and reliability — it leads on reasoning and long-horizon agentic coding. But there is no universal winner in 2026: the frontier has split into specialists, so the best model depends on your task and budget. DeepSeek V4-Pro wins on price-to-quality, and Gemini 3.1 Pro leads multimodal. For most teams the right answer is to run two or three behind a gateway and route per task.

How often is this leaderboard updated?

The ranking is reviewed and refreshed whenever a major model ships, is repriced, or changes status — most recently 2026-06-16. The "Latest model releases" section below updates automatically as we publish new model coverage, so the page reflects the current state of the frontier rather than a frozen snapshot.

Why is Claude Fable 5 marked suspended?

Anthropic suspended access to Claude Fable 5 in June 2026 after a U.S. export-control order required it to disable the model for all customers. It remains on the leaderboard for reference because it is one of the most capable models released, but it is not currently available — a reminder that in 2026, model availability is a supply-chain variable, not a given.

Are open-weight models included?

Yes. Open-weight models are first-class entries — DeepSeek V4-Pro ranks for value and Qwen3.5 (open weights, Apache 2.0) is the multilingual self-hosting pick. In 2026 the gap between the best open-weight models and the closed frontier has narrowed dramatically, and for high-volume, private, or self-hosted workloads they are often the most rational choice. The leaderboard ranks them on the same capability, price, and deployment criteria as the closed flagships.

Where can I see the full per-model comparison?

Each row links to its full analysis in our Best LLMs guide, which breaks every model down by task (coding, reasoning, multimodal, cost), with strengths, weaknesses, pricing detail, and sources. The leaderboard is the live snapshot; the guide is the deep dive.