Meituan LongCat-2.0: 1.6T-Parameter MoE Model Trained on Domestic ASICs
Food delivery company Meituan demonstrates that domestic Chinese hardware can support full training of a trillion-parameter agentic model, with open-sourcing planned.
LongCat-2.0 is a 1.6-trillion-parameter Mixture-of-Experts language model with approximately 48 billion active parameters per token.
Meituan announced the release of LongCat-2.0 after completing pretraining on more than 35 trillion tokens on a large domestic compute cluster. The food delivery company positions the model as optimized for agentic coding and workflows, and its native 1-million-token context window lets it handle long sequences. A preview version ran on OpenRouter under the name Owl Alpha, where its call volume at times placed it among the top three models globally. Meituan also plans to publish the weights on public repositories so that researchers and developers can experiment with the model more broadly.
What background led to the development of LongCat-2.0?
Meituan built its AI efforts around operational needs in logistics and customer service before moving into frontier-scale research. Domestic hardware constraints have shaped recent Chinese AI projects, and the company chose to train from scratch on locally produced ASICs rather than on imported accelerators. This choice aligns with broader national priorities around technology independence. Pretraining ran for millions of accelerator-days without any reported major loss spikes, which suggests stable training behavior at this parameter count.
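The reported figures support a rough estimate of training compute. The sketch below relies on the common rule of thumb that training costs about 6 FLOPs per active parameter per token, covering the forward and backward passes. Three inputs are assumptions rather than numbers Meituan has published: that rule of thumb, the decision to ignore attention cost, and a figure of 1 million accelerator-days, taken as the low end of "millions". The sketch also converts accelerator-days into wall-clock time on the 50,000-card cluster described in the specifications.

```python
# Back-of-envelope compute estimate from the reported LongCat-2.0 figures.
# ASSUMPTION: training FLOPs ~= 6 * active params * tokens (rule of thumb;
# ignores attention cost, which grows with sequence length).

TOTAL_PARAMS = 1.6e12
ACTIVE_AVG, ACTIVE_MIN, ACTIVE_MAX = 48e9, 33e9, 56e9
TOKENS = 35e12            # "more than 35 trillion", so this is a lower bound
CARDS = 50_000
SECONDS_PER_DAY = 86_400


def active_fraction(active, total=TOTAL_PARAMS):
    """Share of all parameters that a single token actually uses."""
    return active / total


def train_flops(active=ACTIVE_AVG, tokens=TOKENS):
    """Approximate total training compute in FLOPs."""
    return 6 * active * tokens


def sustained_flops_per_card(accel_days, flops=None):
    """Average FLOP/s each card must sustain to finish within accel_days."""
    flops = train_flops() if flops is None else flops
    return flops / (accel_days * SECONDS_PER_DAY)


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(ACTIVE_AVG):.1%} "
          f"(range {active_fraction(ACTIVE_MIN):.1%}"
          f"-{active_fraction(ACTIVE_MAX):.1%})")
    print(f"training compute: ~{train_flops():.2e} FLOPs")
    accel_days = 1e6      # ASSUMPTION: low end of "millions of accelerator-days"
    print(f"wall-clock on {CARDS} cards: {accel_days / CARDS:.0f} days")
    tflops = sustained_flops_per_card(accel_days) / 1e12
    print(f"implied sustained throughput: {tflops:.0f} TFLOP/s per card")
```

Under these assumptions, training works out to roughly 1e25 FLOPs. Each card would need to sustain about 117 TFLOP/s on average over 20 days of wall-clock time. A larger accelerator-day count lowers the per-card figure proportionally.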
The decision to pursue a Mixture-of-Experts architecture reflects efficiency considerations when operating under hardware limitations. Only a fraction of the total parameters activates for each token, which reduces memory and compute demands during both training and inference. Meituan integrated features such as LongCat Sparse Attention and N-gram Embedding to manage the 1-million-token context window effectively. These design elements support agentic use cases where models must maintain coherence over long interactions or code repositories.
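To make the sparse-activation idea concrete, here is a minimal top-k Mixture-of-Experts routing sketch in plain Python. It is a generic textbook version, not Meituan's implementation. The router matrix, the toy dense "experts", and the function names (`moe_forward`, `matvec`) are all hypothetical. The router scores each token against every expert, but only the k highest-scoring experts are evaluated, and that is where the compute savings come from.

```python
import math
import random


def softmax(xs):
    m = max(xs)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def moe_forward(token, router, experts, k):
    """Send one token to its top-k experts and mix their outputs.

    router:  num_experts x dim matrix producing one score per expert.
    experts: list of dim x dim matrices, toy stand-ins for FFN blocks.
    """
    scores = softmax(matvec(router, token))
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    norm = sum(scores[i] for i in top)   # renormalize gates over the chosen k
    out = [0.0] * len(token)
    for i in top:                        # only these k experts are evaluated
        gate = scores[i] / norm
        out = [o + gate * y for o, y in zip(out, matvec(experts[i], token))]
    return out, top


if __name__ == "__main__":
    rng = random.Random(0)
    dim, num_experts, k = 8, 16, 2
    router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
    experts = [[[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
               for _ in range(num_experts)]
    token = [rng.gauss(0, 1) for _ in range(dim)]
    out, used = moe_forward(token, router, experts, k)
    print(f"{len(used)} of {num_experts} experts ran; output dim {len(out)}")
```

With 16 experts and k = 2, each token touches one eighth of the expert weights. Production MoE layers add details omitted here, such as load-balancing objectives and batched dispatch of tokens to experts.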
What technical specifications define LongCat-2.0?
LongCat-2.0 contains 1.6 trillion total parameters but activates only between 33 billion and 56 billion per token, about 48 billion on average. Dynamic routing selects which expert subnetworks process each token, so the model gains capacity without a proportional increase in active compute. The pretraining corpus exceeded 35 trillion tokens drawn from diverse sources to support coding, reasoning, and general language tasks. Inference runs on the same 50,000-card domestic cluster used for training, demonstrating end-to-end compatibility with that hardware.
| Specification | Value |
|---|---|
| Total Parameters | 1.6 trillion |
| Active Parameters per Token | ~48 billion (range 33B-56B) |
| Context Window | 1 million tokens |
| Pretraining Tokens | 35 trillion+ |
| Training Hardware | 50,000 domestic ASICs |
| Optimization Focus | Agentic coding and workflows |
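The 33B-to-56B spread in active parameters means the amount of computation varies from token to token. This is a hypothetical sketch, not LongCat-2.0's documented design. One way to get that behavior is to let the router choose among ordinary FFN experts and extra identity ("zero-compute") slots that pass the token through unchanged. All sizes below are toy values in arbitrary units, and the names `route` and `active_params` are illustrative.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, k, num_real):
    """Top-k routing over num_real FFN experts plus identity slots.

    Slots with index >= num_real are hypothetical zero-compute experts that
    return the token unchanged, so they add no active parameters.
    """
    probs = softmax(logits)
    top = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)[:k]
    real = [i for i in top if i < num_real]
    return top, real


def active_params(num_real_chosen, shared, per_expert):
    # Shared layers (attention, embeddings, ...) always run; the routed cost
    # scales with how many real experts this token picked.
    return shared + num_real_chosen * per_expert


if __name__ == "__main__":
    rng = random.Random(0)
    NUM_REAL, NUM_IDENTITY, K = 64, 16, 8    # toy sizes, not LongCat-2.0's
    SHARED, PER_EXPERT = 10.0, 1.0           # toy units
    counts = []
    for _ in range(1000):
        logits = [rng.gauss(0, 1) for _ in range(NUM_REAL + NUM_IDENTITY)]
        _, real = route(logits, K, NUM_REAL)
        counts.append(active_params(len(real), SHARED, PER_EXPERT))
    mean = sum(counts) / len(counts)
    print(f"active per token: min {min(counts)}, mean {mean:.2f}, max {max(counts)}")
```

With random logits, the number of real experts each token uses is left to chance. A trained router would instead learn which tokens can skip computation, which is how a range like 33B to 56B around a ~48B average could arise.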
How does LongCat-2.0 perform on benchmarks?
On SWE-bench Pro the model scored 59.5, reflecting strong performance on software engineering tasks that require multi-step code modifications. On Terminal-Bench 2.1 it scored 70.8, indicating reliable handling of terminal-based agent workflows. These results place it among competitive large systems despite the hardware constraints. Both scores come from evaluations run by the development team and reported in the official model documentation, and they have not yet been independently verified.
What are the market and stakeholder implications?
The release signals that Chinese firms can achieve competitive scale without foreign accelerators, potentially accelerating similar projects at other domestic companies. Open-sourcing under a permissive license may encourage adoption in research communities seeking alternatives to closed models. Stakeholders in the global supply chain could observe shifts in procurement strategies as hardware independence becomes more feasible. Agentic optimization may also influence enterprise deployment patterns where long-context reasoning is required.
Meituan's background in high-volume transaction processing likely informed the focus on reliable, long-running agent behaviors. Integration potential with existing delivery and logistics platforms offers a direct path to real-world validation. Other organizations monitoring the open weights release may adapt the architecture for their own clusters, testing whether the reported efficiency gains hold across varied hardware setups.
What expert reactions have surfaced regarding the release?
Industry observers note the significance of completing a full training run on domestic processors at this scale. The absence of major training instabilities suggests mature engineering practices around the chosen ASIC cluster. Questions remain about long-term maintenance costs and whether the active parameter count delivers sustained advantages in production environments compared with denser models.
"LongCat-2.0 has demonstrated that we now have the capability to train large-scale models on domestic computing clusters," Meituan said.
What comes next for LongCat-2.0?
Meituan intends to publish the complete weights on GitHub and Hugging Face, enabling community fine-tuning and evaluation. Additional work on agent tooling and integration with coding environments is expected to follow the initial release. Performance on extended benchmarks and real-world agent tasks will determine further adoption.
- Full model weights will be released on GitHub and Hugging Face under a permissive license.
- Further optimizations for agentic workflows and coding tasks are anticipated.
- Integration with Meituan operational systems may provide additional validation data.
- Other Chinese technology firms are likely to pursue comparable domestic training runs.
- Community evaluations will clarify comparative strengths against existing frontier systems.
The open release creates opportunities for independent verification of the claimed hardware efficiency and benchmark scores. Researchers may explore scaling laws specific to the MoE design under domestic ASIC constraints. Continued monitoring of usage patterns on platforms such as OpenRouter will indicate practical demand for the model's agentic capabilities.
Frequently asked questions
What makes LongCat-2.0 different from other large models?
It is described as the first trillion-parameter model trained entirely on domestic Chinese chips, and it combines a 1-million-token context window with optimization for agentic workflows.