# Qwen-AgentWorld-397B-A17B Leads AgentWorldBench at 58.71

> The closed mixture-of-experts model edges GPT-5.4 while an Apache 2.0 35B variant ships with weights, data and a new open benchmark built from real agent trajectories.

*Published 2026-06-27 · By Marcus Vance*

Qwen-AgentWorld is a family of language world models that simulate agent environments across seven domains via long chain-of-thought reasoning.

Qwen released two models under the Qwen-AgentWorld name. The larger closed variant uses a mixture-of-experts design.

The smaller variant carries an Apache 2.0 license and includes both weights and training data.

## What background context surrounds the Qwen-AgentWorld release?

Agent systems require models that can predict outcomes of sequences of actions inside simulated environments. Earlier language models often lacked sufficient training on such interaction data.

Qwen drew from its existing large language model lineage to create specialized world models. The effort targets seven distinct domains that cover common agent use cases.

The domains are MCP, Search, Terminal, SWE, Web, OS and Android. Each domain supplies trajectories that reflect real agent behavior.
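To make the idea of a language world model concrete, the sketch below shows one hypothetical way to frame the task: given a domain, a history of action and observation pairs, and a proposed next action, build a text prompt and ask a text generator to predict the next observation. The prompt layout, the `Step` type, `build_prompt`, `predict_observation` and the stub generator are all assumptions for illustration. The source does not describe the input format Qwen-AgentWorld actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# The seven domains named in the article.
DOMAINS = ("MCP", "Search", "Terminal", "SWE", "Web", "OS", "Android")


@dataclass(frozen=True)
class Step:
    """One agent action and the observation the environment returned."""
    action: str
    observation: str


def build_prompt(domain: str, history: Sequence[Step], next_action: str) -> str:
    """Serialize a trajectory prefix into plain text (hypothetical layout)."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    lines = [f"Domain: {domain}"]
    for i, step in enumerate(history, start=1):
        lines.append(f"Action {i}: {step.action}")
        lines.append(f"Observation {i}: {step.observation}")
    n = len(history) + 1
    lines.append(f"Action {n}: {next_action}")
    lines.append(f"Observation {n}:")  # the model is asked to continue from here
    return "\n".join(lines)


def predict_observation(
    generate: Callable[[str], str],
    domain: str,
    history: Sequence[Step],
    next_action: str,
) -> str:
    """Ask any text generator to simulate the environment's next observation."""
    return generate(build_prompt(domain, history, next_action)).strip()


def stub_generate(prompt: str) -> str:
    """Stand-in for a real model call; always returns a fixed string."""
    return " file1.txt\nfile2.txt\n"


history = [Step("cd /tmp/demo", "")]
prompt = build_prompt("Terminal", history, "ls")
predicted = predict_observation(stub_generate, "Terminal", history, "ls")
print(prompt)
print(predicted)
```

Passing the generator in as a callable keeps the sketch independent of any particular inference library; a real deployment would replace `stub_generate` with a call to a hosted or locally loaded model.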

## What new elements appear in the Qwen-AgentWorld announcement?

The announcement introduces both a 397-billion-parameter closed model and a 35-billion-parameter open model. The closed model is Qwen-AgentWorld-397B-A17B, with 17 billion active parameters.

The open model is Qwen-AgentWorld-35B-A3B with 3 billion active parameters and a 256K context window. Weights for the open model appear on Hugging Face and ModelScope.

A new benchmark named AgentWorldBench accompanies the models. The benchmark aggregates real trajectories collected from five frontier models across nine prior agent benchmarks.

## What technical details define the training and architecture?

Training proceeds through a three-stage pipeline:

- Continual Pre-Training (CPT)
- Supervised Fine-Tuning (SFT)
- Reinforcement Learning (RL)

More than 10 million environment interaction trajectories supply the training signal. These trajectories span the seven listed domains.

The mixture-of-experts design activates only a fraction of total parameters during inference. This structure supports the large total parameter counts while controlling compute.
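As a rough illustration of how small the active share is in the mixture-of-experts design, the snippet below divides the active parameter count by the total for each model, using only the figures stated in this article. The `active_fraction` helper is a hypothetical name, and the ratio is a nominal one: it says nothing about actual latency or memory use, which depend on factors the article does not cover.

```python
# Total and active parameter counts in billions, as stated in the article.
MODELS = {
    "Qwen-AgentWorld-397B-A17B": (397, 17),
    "Qwen-AgentWorld-35B-A3B": (35, 3),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters that are active, as a value between 0 and 1."""
    return active_b / total_b


fractions = {name: active_fraction(*counts) for name, counts in MODELS.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active")
```

On these numbers the closed model activates roughly 4.3% of its parameters and the open model roughly 8.6%.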

## How do the models compare on AgentWorldBench?

AgentWorldBench measures simulation quality for agent environments. Higher scores indicate closer alignment with observed trajectories from frontier models.

Qwen-AgentWorld-397B-A17B reaches 58.71. GPT-5.4 reaches 58.25. Claude Opus 4.8 reaches 56.59. Gemini 3.1 Pro reaches 54.57.

The 35B variant, Qwen-AgentWorld-35B-A3B, records an 8.66-point gain over its base model, Qwen3.5-35B-A3B, on the same benchmark.

AgentWorldBench scores for Qwen-AgentWorld-397B-A17B and competing models

| Model | AgentWorldBench Score |
| --- | --- |
| Qwen-AgentWorld-397B-A17B | 58.71 |
| GPT-5.4 | 58.25 |
| Claude Opus 4.8 | 56.59 |
| Gemini 3.1 Pro | 54.57 |
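The size of the lead is easier to judge as explicit margins. The short calculation below takes the four scores reported above and subtracts each competitor's score from the leader's. It uses `Decimal` to avoid floating-point rounding noise in the differences; the variable names are illustrative only.

```python
from decimal import Decimal

# AgentWorldBench scores as reported in the article.
SCORES = {
    "Qwen-AgentWorld-397B-A17B": Decimal("58.71"),
    "GPT-5.4": Decimal("58.25"),
    "Claude Opus 4.8": Decimal("56.59"),
    "Gemini 3.1 Pro": Decimal("54.57"),
}

# Highest-scoring model, then its margin over every other model.
leader = max(SCORES, key=SCORES.get)
margins = {
    name: SCORES[leader] - score
    for name, score in SCORES.items()
    if name != leader
}

for name, margin in margins.items():
    print(f"{leader} leads {name} by {margin} points")
```

The margins come out to 0.46 points over GPT-5.4, 2.12 over Claude Opus 4.8 and 4.14 over Gemini 3.1 Pro.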

## What release terms apply to the open variant?

Qwen-AgentWorld-35B-A3B carries the Apache 2.0 license. The license permits commercial use and modification.

The model and the benchmark are also hosted on GitHub under the QwenLM organization. All open-weight artifacts use the same Apache 2.0 terms.

The 397B model remains closed. Only the 35B model provides public weights.

## What market and stakeholder implications follow from the release?

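The open 35B model supplies developers with a ready starting point for agent simulation research. Organizations can fine-tune and redistribute the weights under Apache 2.0, which permits commercial use and modification provided the license text and attribution notices are retained.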

The closed 397B model preserves a performance margin for Qwen in internal evaluations. This dual strategy balances openness with competitive differentiation.

Agent framework builders gain access to a model trained specifically on interaction data rather than generic text. This specialization may reduce the need for custom environment simulators in some workflows.

## What expert reactions have accompanied the models?

The release demonstrates that language models can serve as world simulators when trained on sufficient trajectory data. The narrow 0.46-point margin over GPT-5.4 shows how closely matched the leading models in the category are.

The decision to open the 35B variant while keeping the larger model closed reflects standard practices in frontier model releases. The accompanying benchmark release supports reproducible evaluation.

> Today we release Qwen-AgentWorld, a native language world model that simulates agent environments across seven domains.
>
> Qwen Team, Qwen research team

## What developments are likely next?

Additional trajectory data may further close the gap between open and closed variants. Community contributions to AgentWorldBench could expand domain coverage.

Integration of the open weights into existing agent orchestration libraries is expected. Such integration would test the models in live environments beyond the benchmark.

Future iterations may increase active parameter counts or context length while retaining the mixture-of-experts structure. The current 256K window already supports extended agent sessions.

## Sources

1. [Qwen-AgentWorld-397B-A17B achieves the highest overall simulation quality, outperforming GPT-5.4, Claude Opus 4.8, and Gemini 3.1 Pro.](https://qwen.ai/blog?id=qwen-agentworld)
2. [We introduce Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B, the first language world models capable of simulating agentic environments covering 7 domains via long chain-of-thought reasoning. Leveraging more than 10M environment interaction trajectories...](https://arxiv.org/abs/2606.24597)
3. [Qwen-AgentWorld is the first language world model to cover seven agent interaction domains within a single model. License: apache-2.0](https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B)

---
Source: https://aiintelreport.com/frontier-models/qwen-agentworld-397b-agent-simulation
