The History Zhipu AI Did Not Expect to Make
On February 11, 2026, Z.ai (formerly Zhipu AI) officially released GLM-5.1, its new flagship large language model, just days before Lunar New Year. Within hours of the announcement, Zhipu's shares surged 34 percent on the Hong Kong Stock Exchange. The company had become China's first publicly traded AI firm only in January 2026, and this launch was its first major product release as a listed entity.
But the story has a prequel. Five days before the official launch, OpenRouter quietly listed a model called "Pony Alpha" — no attribution, zero cost, 200K context window. It processed 40 billion tokens on its first day. The AI community speculated wildly: DeepSeek V4? Grok 4.2? A new Claude variant?
The evidence quickly pointed to Zhipu. The model self-identified as GLM under certain prompts. The output style matched the GLM series. And the timing aligned perfectly with Zhipu's pre-announced release window. Some caught the zodiac joke: 2026 is the Year of the Horse. Pony Alpha.
What happened: Pony Alpha was GLM-5.1 undergoing a genuine live stress test with real users before the official launch: free exposure, real feedback, and no hype cycle distorting the results. Stealth rollouts like this are increasingly common in the AI industry, and for good reason.
The Numbers
GLM-5.1 is a 744-billion-parameter Mixture-of-Experts model that activates 40B parameters per token. Its pre-training corpus grew from 23T to 28.5T tokens. The context window is 200K tokens, and the model uses DeepSeek Sparse Attention (DSA) for efficient long-range handling.
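Zhipu has not published GLM-5.1's routing internals, but the core Mixture-of-Experts idea behind "744B total, 40B active" can be sketched generically: a router scores all experts per token and only the top-k actually run. The sketch below is a toy illustration with made-up sizes and a made-up gating function, not GLM-5.1's actual architecture.

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k experts of a toy MoE layer.

    x: (d,) token activation; gate_w: (d, n_experts) router weights;
    experts: list of (W, b) pairs, each a tiny feed-forward expert.
    """
    logits = x @ gate_w                      # router score for every expert
    top = np.argsort(logits)[-k:]            # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        W, b = experts[i]
        out += w * np.tanh(x @ W + b)        # only k experts execute per token
    return out, top

rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_w = rng.normal(size=(d, n_experts))
experts = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(n_experts)]
y, used = topk_moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(used)  # indices of the 2 (of 4) experts that ran for this token
```

The point of the design is the ratio: compute per token scales with the k active experts, while total capacity scales with all of them, which is how a 744B model can run with 40B-parameter cost per token.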
On the benchmarks that matter for coding and agentic tasks:
- SWE-bench Verified: 77.8% — beats Gemini 3 Pro (76.2%) and GPT-5.2 (75.4%), trails Claude Opus 4.5 (80.9%)
- AIME 2026: 92.7%
- GPQA-Diamond: 86.0%
- CC-Bench-V2 frontend build success: 98%
- Vending Bench 2: leads open-source models on this benchmark, which simulates running a vending machine business over a full year
One notable result from Artificial Analysis: GLM-5.1 scored -1 on the AA-Omniscience Index, meaning it leads the industry in knowing when to say it does not know, rather than hallucinating. That kind of calibrated abstention matters enormously in production deployments.
The Huawei Chip Angle
According to Reuters, GLM-5.1 was trained entirely on Huawei Ascend chips using the MindSpore framework — zero dependency on NVIDIA hardware. Zhipu has been on the U.S. Entity List since January 2025, which bans access to H100 and H200 GPUs.
The fact that they can produce a frontier-class model under these constraints is significant. It suggests that China's domestic compute stack — at least at the level of a well-funded national champion — is more viable at scale than many Western analysts assumed. The Ascend 910B chips are reportedly competitive with the A100 for training workloads, and MindSpore has matured into a credible alternative to PyTorch for large-scale distributed training.
Geopolitical bottom line: GLM-5.1 is as much a statement about the viability of China's AI compute stack as it is a product launch. The U.S. chip export controls were designed precisely to prevent this scenario.
Pricing and Access
GLM-5.1 is available via the Z.ai API at $1.00 per million input tokens and $3.20 per million output tokens, roughly 5x cheaper on input and 8x cheaper on output than Claude Opus 4.6. The weights are also available on Hugging Face (under an MIT license) and via OpenRouter.
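To make the list prices concrete, here is a minimal cost calculation at the quoted rates. The token counts in the example are hypothetical; only the per-million prices come from the launch announcement.

```python
# GLM-5.1 list prices from the launch announcement (USD per million tokens).
INPUT_PER_M = 1.00
OUTPUT_PER_M = 3.20

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted GLM-5.1 rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Hypothetical long-context coding session: 150K tokens in, 20K tokens out.
cost = api_cost(150_000, 20_000)
print(f"${cost:.3f}")  # $0.214
```

Even a near-full-context request stays in the tens of cents, which is the practical meaning of the pricing gap versus frontier competitors.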
The GLM Coding Plan — Zhipu's answer to Anthropic's Claude Code — saw a 30 percent price hike in the week following the model's launch, driven by demand. The company is clearly in a strong negotiating position with the market right now.
What the Stealth Launch Tells Us
The Pony Alpha episode is instructive. Zhipu ran GLM-5.1 in production, at scale, with real users — and got genuine usage data and feedback before the official announcement. No press coverage, no hype cycle, no competitor analysis distorted by advance briefings. Just raw signal.
As the AI industry matures, expect more of these quiet production rollouts ahead of major announcements. The days when a model launch required a big keynote are fading. The teams that can run genuine production traffic before launch have a meaningful advantage in understanding how their models actually behave under load.
Release details from Reuters (February 11, 2026). Benchmark data from the Hugging Face blog and Artificial Analysis.


