
GLM-5.2 is the new leading open weights model on Artificial Analysis

artificialanalysis.ai|596 points|314 comments|by himata4113|Jun 17, 2026

GLM-5.2: The New Champion of Open Weights Models

Published June 17, 2026 | Source: Artificial Analysis

Z ai has officially released GLM-5.2, which has ascended to the top spot for open weights models on the Artificial Analysis Intelligence Index. With a score of 51, the model now defines the Pareto frontier when balancing Intelligence against Cost per Task.

While GLM-5.2 keeps the same architectural footprint as its predecessor, GLM-5.1 (744B total parameters and 40B active parameters), it achieves an 11-point increase on the Intelligence Index v4.1. This leap places it well ahead of competitors such as MiniMax-M3 (44) and DeepSeek V4 Pro (max) (44).


🚀 Key Performance Milestones

Summary: GLM-5.2 is currently the premier open weights model on the Intelligence Index v4.1, outperforming MiniMax-M3, DeepSeek V4 Pro, and Kimi K2.6.

Scientific & Reasoning Breakthroughs

The model shows substantial gains across nearly all evaluations, with the most dramatic improvements seen in scientific reasoning:

  • CritPt: ↑16% (reaching 21%)
  • HLE: ↑12% (reaching 40%)
  • AA-LCR: ↑9% (reaching 71%)
  • tau3 banking: ↑15% (reaching 27%)
  • SciCode: ↑7% (reaching 50%)
  • TerminalBench v2.1: ↑16 points (reaching 78%)
  • GPQA Diamond: ↑3 points (reaching 89%)
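The list mixes "%" and "points" for the gains. As a rough sketch, if every gain is read as an absolute change in percentage points (an assumption; the source does not say whether the "%" figures are absolute or relative), the implied earlier scores can be backed out by subtraction. The names `REPORTED` and `implied_previous_scores` are illustrative only.

```python
# (gain, new score) pairs copied from the list above.
# Assumption: every gain is an absolute change in percentage points.
REPORTED = {
    "CritPt": (16, 21),
    "HLE": (12, 40),
    "AA-LCR": (9, 71),
    "tau3 banking": (15, 27),
    "SciCode": (7, 50),
    "TerminalBench v2.1": (16, 78),
    "GPQA Diamond": (3, 89),
}


def implied_previous_scores(reported=REPORTED):
    """Return the earlier score implied by each (gain, new score) pair."""
    return {name: new - gain for name, (gain, new) in reported.items()}


if __name__ == "__main__":
    for name, previous in implied_previous_scores().items():
        gain, new = REPORTED[name]
        print(f"{name:<20} {previous:>3}% -> {new:>3}% (+{gain} points)")
```

These implied values are derived, not reported, and would differ if the "%" gains were meant as relative improvements.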

Agentic Capabilities: GDPval-AA v2

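GLM-5.2 posts the highest score in the comparison below on GDPval-AA v2, a benchmark of real-world agentic performance. It is narrowly ahead of GPT-5.5 (xhigh reasoning) and well clear of the other open weights models.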

Model                        GDPval-AA v2 Score
GLM-5.2                      1524
MiniMax-M3                   1418
DeepSeek V4 Pro (max)        1328
GPT-5.5 (xhigh reasoning)    1514

Note on GDPval-AA v2 Methodology: This updated benchmark improves upon the original by:

  1. Baselining Elo to human performance at 1000.
  2. Implementing a rotating panel of frontier-model judges.
  3. Extending the turn limit from 100 to 250 to better evaluate long-horizon agent trajectories.
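To give the Elo gaps above some intuition, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the conventional logistic Elo formula with a 400-point scale; the source does not state which scale or fitting procedure GDPval-AA v2 uses, so the resulting numbers are illustrative rather than official.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    Assumption: the conventional 400-point scale. The benchmark's actual
    scale is not given in the source.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores from the GDPval-AA v2 comparison, plus the stated human baseline.
RATINGS = {
    "GLM-5.2": 1524,
    "MiniMax-M3": 1418,
    "DeepSeek V4 Pro (max)": 1328,
    "GPT-5.5 (xhigh reasoning)": 1514,
    "Human baseline": 1000,
}

if __name__ == "__main__":
    glm = RATINGS["GLM-5.2"]
    for name, rating in RATINGS.items():
        if name != "GLM-5.2":
            print(f"GLM-5.2 vs {name}: {elo_expected_score(glm, rating):.1%}")
```

Under this assumption, the 10-point gap to GPT-5.5 (xhigh reasoning) corresponds to a preference rate only slightly above 50%, while the gap to the human baseline is far larger.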

📉 Efficiency and Cost Analysis

The Pareto Frontier

GLM-5.2 occupies a highly strategic position on the Intelligence vs. Cost per Task chart, offering the lowest cost for its specific intelligence tier.

Cost Comparison per Task:

  • GLM-5.2: ~$0.46
  • GLM-5.1: $0.25
  • Kimi K2.6: $0.31
  • MiniMax-M3: $0.18
  • DeepSeek V4 Pro (max): $0.05
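A model sits on the Pareto frontier when no other model is at least as intelligent and at least as cheap, with a strict improvement on one of the two. The sketch below applies that definition to the figures quoted in this article. Two caveats: the GLM-5.1 score of 40 is derived here as 51 minus the reported 11-point gain rather than quoted directly, and Kimi K2.6 is left out because its Intelligence Index score is not given in the text.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    intelligence: float   # Intelligence Index v4.1 score
    cost_per_task: float  # USD per task


MODELS = [
    Model("GLM-5.2", 51, 0.46),
    Model("GLM-5.1", 40, 0.25),  # derived: 51 - 11, not quoted in the source
    Model("MiniMax-M3", 44, 0.18),
    Model("DeepSeek V4 Pro (max)", 44, 0.05),
]


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.intelligence >= b.intelligence and a.cost_per_task <= b.cost_per_task
    strictly_better = a.intelligence > b.intelligence or a.cost_per_task < b.cost_per_task
    return no_worse and strictly_better


def pareto_frontier(models):
    """Return the models that no other model dominates."""
    return [m for m in models if not any(dominates(other, m) for other in models)]


if __name__ == "__main__":
    for model in pareto_frontier(MODELS):
        print(model.name, model.intelligence, model.cost_per_task)
```

With these four data points, GLM-5.2 stays on the frontier despite being the most expensive entry, because nothing cheaper matches its score.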

Token Consumption

One trade-off for this intelligence is token efficiency. GLM-5.2 is more "verbose" in its reasoning process:

  • Total output tokens per task: 43k (of which 37k is dedicated to reasoning).
  • Comparison: This is significantly higher than GLM-5.1 (26k), MiniMax-M3 (24k), Kimi K2.6 (35k), and DeepSeek V4 Pro (37k).
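The token figures above can be turned into two simple ratios: the share of GLM-5.2's output spent on reasoning, and how much more output it produces than each comparison model. The sketch below is plain arithmetic on the quoted numbers; the function names are illustrative.

```python
# Output tokens per task, in thousands, as quoted above.
OUTPUT_TOKENS_K = {
    "GLM-5.2": 43,
    "GLM-5.1": 26,
    "MiniMax-M3": 24,
    "Kimi K2.6": 35,
    "DeepSeek V4 Pro": 37,
}
GLM_52_REASONING_K = 37  # reasoning tokens within GLM-5.2's 43k total


def reasoning_share(reasoning_k: float, total_k: float) -> float:
    """Fraction of output tokens that are reasoning tokens."""
    return reasoning_k / total_k


def relative_verbosity(baseline: str = "GLM-5.2", tokens=OUTPUT_TOKENS_K):
    """Ratio of the baseline's output tokens to each other model's."""
    base = tokens[baseline]
    return {name: base / count for name, count in tokens.items() if name != baseline}


if __name__ == "__main__":
    share = reasoning_share(GLM_52_REASONING_K, OUTPUT_TOKENS_K["GLM-5.2"])
    print(f"Reasoning share of GLM-5.2 output: {share:.0%}")
    for name, ratio in relative_verbosity().items():
        print(f"GLM-5.2 emits {ratio:.2f}x the output tokens of {name}")
```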

🛠️ Technical Specifications & Availability

Model Profile

  • License: MIT
  • Architecture: 744B Total / 40B Active Parameters
  • Context Window: 200K → 1M tokens
  • Omniscience Index: 4 (up from 2 in GLM-5.1)
    • Accuracy: 25.1% (vs 24.2%)
    • Hallucination Rate: 28.1% (vs 29.4%)
    • Attempt Rate: 47% (flat)
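The Omniscience Index figures can be related to the accuracy and hallucination numbers with a short calculation. The sketch below rests on two assumptions that the article does not spell out: that the index is the percentage of correct answers minus the percentage of incorrect answers, and that the hallucination rate is the share of non-correct responses that were wrong answers rather than abstentions. Under those assumptions the reported inputs round to the reported index values of 4 and 2.

```python
def incorrect_rate(accuracy_pct: float, hallucination_pct: float) -> float:
    """Percentage of all questions answered incorrectly.

    Assumption: hallucination rate = incorrect / (all non-correct responses).
    """
    return hallucination_pct / 100.0 * (100.0 - accuracy_pct)


def omniscience_index(accuracy_pct: float, hallucination_pct: float) -> float:
    """Assumed index: percent correct minus percent incorrect."""
    return accuracy_pct - incorrect_rate(accuracy_pct, hallucination_pct)


if __name__ == "__main__":
    for name, accuracy, hallucination in [
        ("GLM-5.2", 25.1, 28.1),
        ("GLM-5.1", 24.2, 29.4),
    ]:
        index = omniscience_index(accuracy, hallucination)
        print(f"{name}: index {index:.2f}, rounded {round(index)}")
```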

Pricing Structure

The pricing remains consistent with the previous version:

{
  "pricing_per_1M_tokens": {
    "input": "$1.40",
    "output": "$4.40",
    "cache_hit": "$0.26"
  }
}
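As a minimal sketch of how these rates translate into a per-request bill, the function below prices a request from its token counts. It assumes cached input tokens are billed at the cache-hit rate in place of the regular input rate; the function name and the token counts in the usage example are hypothetical.

```python
# Prices in USD per 1M tokens, from the pricing block above.
PRICING_PER_1M = {"input": 1.40, "output": 4.40, "cache_hit": 0.26}


def request_cost(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    pricing=PRICING_PER_1M,
) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached input tokens are charged at the cache-hit rate
    instead of the regular input rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * pricing["input"]
        + cached_input_tokens * pricing["cache_hit"]
        + output_tokens * pricing["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Output-only cost of the 43k output tokens per task quoted earlier.
    print(f"${request_cost(0, 43_000):.4f}")
    # Hypothetical request: 100k input tokens, 80k of them cached.
    print(f"${request_cost(100_000, 43_000, cached_input_tokens=80_000):.4f}")
```

At these rates, 43k output tokens alone cost about $0.19. The article does not break down how the ~$0.46 cost per task is composed, so the remainder cannot be attributed from the text.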

Where to Access

Beyond Z ai's own API, GLM-5.2 is available via:

  • DeepInfra
  • Novita
  • Nebius
  • Parasail
  • Siliconflow
  • GMI Cloud
  • Baseten
  • Fireworks

🖼️ Visual Data & Analysis

Figure 1: Intelligence Index Positioning

Figure 2: Cost vs. Intelligence Pareto Frontier

Figure 3: Detailed Evaluation Breakdown

Figure 4: Token Efficiency Analysis

Figure 5: Agentic Performance Metrics

Figure 6: Comparison with Proprietary Models

Figure 7: Intelligence Index v4.1 Distribution

Figure 8: Reasoning Token Breakdown


Further Reading:

  • Explore GLM-5.2 at artificialanalysis.ai/models/glm-5-2
  • Read about the shift toward agentic workloads in the Intelligence Index v4.1 announcement (June 16, 2026).
  • Check out the launch of Claude Fable 5, the first Mythos-class model (June 10, 2026).