China’s most talked-about artificial intelligence lab has returned with a headline-grabbing release that once again shakes the assumption that frontier AI is a purely American story. DeepSeek, the Hangzhou-based startup that rattled Wall Street earlier this year, has unveiled its latest flagship models, DeepSeek V3.2 and the more muscular DeepSeek V3.2 Speciale. According to the company’s own technical report and independent researchers who have tested the release, the new models match or outperform OpenAI’s GPT-5 and Google’s Gemini 3 Pro on several high-stakes benchmarks, all while being released under an open-source license, so anyone can download and run them.
The Comeback Story
DeepSeek first stunned the tech world in early 2025, when its R1 reasoning model wiped close to a trillion dollars off US tech stocks in a single session and forced Silicon Valley to admit that a small Chinese lab could build something genuinely competitive with far less compute. After months of quieter updates, the company has now come roaring back with V3.2, signaling that its earlier success was no fluke. The release has landed at a tense moment for the global AI race, with Washington tightening chip export rules and US firms pouring tens of billions of dollars into data centers. That a team operating under those very restrictions has once again delivered a frontier-class model is, by any measure, a remarkable engineering feat.
What the Benchmarks Actually Show
The headline claim centers on DeepSeek V3.2 Speciale, the high-compute variant aimed squarely at deep reasoning tasks. According to benchmarks published by DeepSeek and verified by independent analysts, Speciale scored 96.0 percent on AIME 2025, the elite American math competition used as a reasoning stress test, edging out GPT-5 High at 94.6 percent and Gemini 3 Pro at 95.0 percent. On the notoriously brutal HMMT February 2025 test, Speciale hit 99.2 percent, reportedly the highest mark among reasoning models to date.
The results get more striking in live competition settings. Speciale scored 35 out of 42 at the 2025 International Mathematical Olympiad, a gold-medal-level performance, earned 492 out of 600 at the International Olympiad in Informatics, and placed second at the ICPC World Finals by solving ten of twelve problems. DeepSeek says it is the first open-source model ever to clear gold-medal thresholds at the IMO, CMO, IOI, and ICPC World Finals in the same release cycle. That said, the results are not a clean sweep. On Humanity’s Last Exam, Speciale trails Gemini 3 Pro, and on SWE-bench Verified, the real-world software engineering benchmark, it scores 73.1 percent versus Gemini 3 Pro’s 76.2 percent.
What Makes This Release Different
Two things stand out. The first is the price. DeepSeek reports that Speciale delivers this performance at roughly 25 to 30 times lower output-token cost than GPT-5 and Gemini 3 Pro. For developers and enterprises running heavy workloads, that kind of cost differential is not a marginal advantage. It is a potential category killer.
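To make the arithmetic concrete, here is a back-of-envelope sketch of what a 25x gap in output-token price means at scale. The per-token prices and workload size below are hypothetical placeholders, not published rates:

```python
# Back-of-envelope cost comparison at a hypothetical 25x output-price gap.
# All figures below are illustrative placeholders, not actual pricing.

FRONTIER_PRICE_PER_M_OUT = 10.00  # USD per 1M output tokens (hypothetical)
SPECIALE_PRICE_PER_M_OUT = FRONTIER_PRICE_PER_M_OUT / 25  # 25x cheaper

def monthly_output_cost(tokens_per_month: int, price_per_m: float) -> float:
    """USD cost for a month's worth of output tokens at a given per-1M price."""
    return tokens_per_month / 1_000_000 * price_per_m

workload = 5_000_000_000  # 5B output tokens/month, a heavy enterprise workload

frontier = monthly_output_cost(workload, FRONTIER_PRICE_PER_M_OUT)
speciale = monthly_output_cost(workload, SPECIALE_PRICE_PER_M_OUT)

print(f"Frontier model: ${frontier:,.0f}/month")  # $50,000/month
print(f"Speciale:       ${speciale:,.0f}/month")  # $2,000/month
```

At that scale the difference is tens of thousands of dollars per month for a single workload, which is why the pricing claim, if it holds up in production, matters as much as the benchmark scores.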
The second is the license. DeepSeek has released V3.2 with open weights on Hugging Face under a permissive license, meaning researchers, startups, and even competitors can download the model, study it, fine-tune it, and run it on their own hardware. That is a sharp contrast to the tightly guarded model weights behind ChatGPT and Gemini. Architecturally, the release leans on a new sparse attention mechanism, DeepSeek Sparse Attention (DSA), that makes long-context inference dramatically cheaper, plus a training pipeline that devoted more than 10 percent of pre-training compute to reinforcement learning, a far higher ratio than most labs attempt.
How Silicon Valley Is Reacting
The reaction inside the US AI industry has ranged from cautious respect to visible concern. Analysts have pointed out that DeepSeek achieved these results despite running on less advanced chips than its American rivals, thanks to export restrictions that block the sale of top-tier Nvidia GPUs to Chinese firms. One AI researcher summed up the mood by noting that DeepSeek publicly released an IMO gold-medal-class model before OpenAI or Google did. That framing has stung in a sector where speed to market has become a bragging right.
There are real caveats, though, and American observers have been quick to flag them. Speciale uses significantly more tokens than Gemini 3 Pro to reach similar accuracy, sometimes twice as many, which narrows the real-world cost advantage on latency-sensitive applications. The model also lacks tool-use support in its current release, cannot process images, and is not tuned for conversational interaction. In other words, it is a reasoning specialist, not a drop-in replacement for ChatGPT.
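The token-usage caveat is worth quantifying. If Speciale is roughly 25 times cheaper per output token but needs about twice as many tokens to reach an answer, the effective per-answer advantage is closer to 12.5x. A quick sketch of that accounting, with all figures hypothetical:

```python
# Effective cost per answer when a cheaper model emits more tokens.
# All figures are hypothetical, chosen only to illustrate the accounting.

def cost_per_answer(price_per_m_tokens: float, tokens_per_answer: int) -> float:
    """USD cost of one answer: per-1M-token price times tokens emitted."""
    return price_per_m_tokens * tokens_per_answer / 1_000_000

# Competitor: pricier tokens, shorter answers.
competitor = cost_per_answer(price_per_m_tokens=10.0, tokens_per_answer=2_000)
# Speciale: 25x cheaper tokens, but roughly twice the tokens per answer.
speciale = cost_per_answer(price_per_m_tokens=10.0 / 25, tokens_per_answer=4_000)

effective_advantage = competitor / speciale
print(f"Effective advantage: {effective_advantage:.1f}x")  # 12.5x
```

Still a large gap, but the extra tokens also cost wall-clock time, which no price cut recovers on latency-sensitive workloads.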
The Bigger Picture: The US vs China AI Race
Beyond the benchmarks, the release carries unmistakable geopolitical weight. For years, the dominant Washington narrative has been that aggressive chip controls would keep China at least a generation behind on frontier AI. DeepSeek V3.2 is the latest data point suggesting that assumption deserves a serious rethink. A Chinese lab, working under hardware constraints that would cripple most US startups, has produced a model that genuinely trades blows with the best America has to offer and open-sourced it on top of that.
For policymakers, that is a complicated outcome. Open weights mean that any restrictions on American access to DeepSeek are effectively moot, since the model can be downloaded by anyone, anywhere. For US firms like OpenAI, Anthropic, and Google, it means the pricing umbrella they have enjoyed at the top of the market is shrinking. And for the broader industry, it reinforces a trend that has been building all year, namely that the AI frontier is no longer a two- or three-horse race.
Should You Be Paying Attention?
If you are a casual ChatGPT user asking it to draft emails, the short answer is no, not yet. Speciale is API-only, skews toward hardcore reasoning tasks, and is not designed to replace a conversational assistant. But if you are a developer, a startup founder working with tight margins, or an enterprise team running high-volume AI workloads, DeepSeek V3.2 is worth a serious look. The cost profile alone could rewrite budgets, and the open-source angle gives you control that closed models simply do not offer. One thing is now beyond dispute. DeepSeek is back, it is serious, and the conversation about who leads in AI just got a lot more interesting.
Frequently Asked Questions
What is DeepSeek V3.2?
DeepSeek V3.2 is the latest flagship artificial intelligence model from Chinese AI lab DeepSeek. It comes in two versions: a standard V3.2 that powers the DeepSeek app and web interface, and a higher-compute V3.2 Speciale variant available through the API and focused on deep reasoning tasks.
Does DeepSeek really beat ChatGPT and Gemini?
DeepSeek V3.2 Speciale outperforms GPT-5 and Gemini 3 Pro on several specific benchmarks, including AIME 2025, HMMT, Codeforces, and live math and coding olympiads. However, Gemini 3 Pro still leads on some tests, such as Humanity’s Last Exam and SWE-bench Verified, so the picture is competitive rather than a clean sweep.
Is DeepSeek V3.2 open source?
Yes. DeepSeek has released the weights for both V3.2 and V3.2 Speciale on Hugging Face under a permissive open-source license, meaning developers and researchers can download, inspect, and run the models on their own infrastructure.
How much cheaper is DeepSeek compared to ChatGPT and Gemini?
According to DeepSeek, the Speciale variant produces output tokens at roughly 25 to 30 times lower cost than GPT-5 and Gemini 3 Pro on comparable tasks, though it also tends to use more tokens per answer, which narrows the real-world savings on some workloads.
Can I use DeepSeek V3.2 in the United States?
Yes. The DeepSeek app is available to US users, and the open weights can be downloaded and run locally by anyone. That said, users and businesses should be mindful that data sent to DeepSeek’s hosted API may be processed on servers in China, which raises privacy and compliance questions for sensitive use cases.
What are the limitations of DeepSeek V3.2 Speciale?
Speciale is powerful but narrow. It does not currently support tool use, cannot process images, generates significantly more tokens per response than competing models, and is not optimized for casual conversation. It is best understood as a specialist reasoning engine rather than a general-purpose assistant.
Why does this release matter for the US vs China AI race?
The release shows that a Chinese lab operating under strict US chip export restrictions can still produce a model that competes with the best American systems and then open-source it for the world to use. That challenges a core assumption behind current US AI policy and signals that the global AI frontier is no longer a two-horse race.



