
New MiniMax-M1 dethrones DeepSeek as China's top AI model
What's the story
Shanghai-based MiniMax has launched an open-source AI reasoning model that rivals Chinese competitor DeepSeek and US-based OpenAI and Google in terms of performance and cost.
The new model, named MiniMax-M1, was released on Monday under the Apache 2.0 open-source license.
Unlike Meta's Llama family, which is offered under a community license that's not open source, and DeepSeek's models, which are only partially covered by an open-source license, MiniMax-M1 is fully open source.
Performance comparison
MiniMax-M1's capabilities are top-tier among open-source models
MiniMax claims that in complex, productivity-oriented scenarios, M1's capabilities are top-tier among open-source models.
The company also says it beats domestic closed-source models and comes close to the best overseas models while providing the industry's best cost-effectiveness.
On various benchmarks (AIME 2024, LiveCodeBench, SWE-bench Verified, Tau-bench, and MRCR), M1 is competitive with OpenAI o3, Gemini 2.5 Pro, Claude 4 Opus, DeepSeek R1, DeepSeek R1-0528, and Qwen3-235B to varying degrees.
Model specifications
One million tokens context window
MiniMax is gunning for DeepSeek's top spot in the industry by boasting a context window (the amount of input it can handle) of one million tokens.
This limit matches Google Gemini 2.5 Pro and is eight times the capacity of DeepSeek R1.
The model can also produce outputs of up to 80,000 tokens, beating DeepSeek R1's 64,000-token capacity but falling short of OpenAI's o3, which can deliver a whopping 100,000 tokens in response to a prompt.
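The figures above can be lined up in a quick sanity check. Note one assumption: the article implies DeepSeek R1's context window from the "eight times" comparison, so the 128,000-token figure below is inferred rather than stated; the other numbers are as reported.

```python
# Reported token limits; DeepSeek R1's context window (128,000) is an
# assumption inferred from the article's "eight times" comparison.
context_window = {
    "MiniMax-M1": 1_000_000,
    "Gemini 2.5 Pro": 1_000_000,
    "DeepSeek R1": 128_000,  # assumed, not stated in the article
}
max_output = {
    "MiniMax-M1": 80_000,
    "DeepSeek R1": 64_000,
    "OpenAI o3": 100_000,
}

# M1's context is roughly eight times R1's, matching the article's claim.
ratio = context_window["MiniMax-M1"] / context_window["DeepSeek R1"]
print(f"M1 context is {ratio:.1f}x DeepSeek R1's")
```

Under that assumption the ratio works out to about 7.8, which rounds to the "eight times" the article cites.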
Efficiency boost
Lightning Attention mechanism makes M1 model compute efficient
Backed by Alibaba Group, Tencent, and IDG Capital, MiniMax touts its Lightning Attention mechanism, a more efficient way of computing attention matrices that improves both training and inference efficiency.
The company claims this feature gives the M1 model an edge when computing long context inputs and reasoning.
"When performing deep reasoning with 80,000 tokens it requires only about 30% of the computing power of DeepSeek R1," MiniMax said.
Cost efficiency
CISPO algorithm leads to lower computing costs
MiniMax-M1's more efficient computation, combined with an improved reinforcement learning algorithm called CISPO, results in lower computing costs.
"The entire reinforcement learning phase used only 512 [NVIDIA] H800s for three weeks with a rental cost of just $537,400," MiniMax said. "This is an order of magnitude less than initially anticipated."