VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

arxiv.org|332 points|175 comments|by timhigins|Jun 23, 2026

VibeThinker-3B: Redefining the Limits of Verifiable Reasoning in Compact LLMs

This technical report introduces VibeThinker-3B, a dense language model with 3 billion parameters. The project's primary objective was to determine how far verifiable reasoning can be pushed within a strictly small model.

🛠️ The Development Pipeline

The researchers utilized a specialized post-training framework known as the Spectrum-to-Signal paradigm. The model's capabilities were honed through a systematic, three-stage optimization process:

  1. Curriculum-based Supervised Fine-Tuning (SFT): A structured approach to initial training.
  2. Multi-domain Reinforcement Learning (RL): Utilizing techniques like GRPO to refine reasoning paths.
  3. Offline Self-Distillation: Further compressing and refining the model's internal logic.
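The report summary names GRPO (Group Relative Policy Optimization) but gives no implementation details, so the following is only a minimal sketch of the group-relative advantage step that GRPO is generally known for, not the authors' code. It assumes binary rewards from a verifier (1.0 for an accepted answer, 0.0 otherwise) and uses the population standard deviation; the function name `group_relative_advantages` and the `eps` parameter are hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled answer relative to the other answers in its group.

    rewards: rewards for several answers sampled from the SAME prompt.
    eps: small constant (assumed value) that avoids division by zero
         when every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four answers to one math prompt, two of them verified.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
# Verified answers get a positive advantage, rejected ones a negative one.
```

When every answer in a group earns the same reward, all advantages are zero, so that prompt contributes no learning signal. This is one reason verifiable tasks with a mix of correct and incorrect samples suit this style of training.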

📊 Performance Benchmarks

VibeThinker-3B demonstrates "frontier-level" capabilities, often rivaling or surpassing flagship models that are significantly larger (e.g., Gemini 3 Pro, GLM-5, and DeepSeek V3.2).

| Benchmark         | Score | Notes                                                  |
|-------------------|-------|--------------------------------------------------------|
| AIME26            | 94.3  | Increases to 97.1 via claim-level test-time scaling    |
| LiveCodeBench v6  | 80.2  | Measured as Pass@1                                     |
| LeetCode (Unseen) | 96.1% | Acceptance rate on recent contests                     |
| IFEval            | 93.4  | Validates strict instruction following                 |
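For readers unfamiliar with the Pass@1 metric, here is a minimal sketch of the widely used unbiased pass@k estimator, which reduces to the fraction of correct samples when k = 1. The summary does not say how many samples per problem the authors drew or which estimator they used, so the function names and the example counts below are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct.

    n: samples generated for one problem
    c: how many of those n samples passed the tests
    k: sample budget being evaluated (k = 1 gives Pass@1)
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Too few failures to fill k draws, so a correct one is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k: int = 1) -> float:
    """Average the per-problem estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical run: three problems, four samples each.
results = [(4, 4), (4, 2), (4, 0)]
score = benchmark_pass_at_k(results, k=1)  # 0.5
```

With k = 1 the estimator is simply c / n per problem, so a reported Pass@1 of 80.2 corresponds to an average per-problem success rate of 80.2% under this definition.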

Key Achievements:

  • Outperforms models orders of magnitude larger.
  • Maintains high instruction controllability (via IFEval).
  • Exhibits exceptional out-of-distribution generalization.

🧠 Theoretical Contribution: The Parametric Compression-Coverage Hypothesis

The findings from VibeThinker-3B (and previous 1.5B iterations) lead the authors to propose a new theoretical framework:

The Parametric Compression-Coverage Hypothesis: This theory posits that verifiable reasoning can be compressed into "compact reasoning cores." In contrast, open-domain knowledge and general competence require "broad parameter coverage" to account for the vast array of facts, concepts, and long-tail edge cases.

In mathematical terms, we can view the requirement for parameters ($P$) as: $P_{\text{reasoning}} \ll P_{\text{knowledge}}$

This suggests that small models are not just efficient alternatives to large ones but a complementary path toward achieving frontier performance in specific, dense capability regimes.

📝 Summary of Impact

By showing that a 3B model can compete with the industry's largest reasoning systems, VibeThinker-3B challenges the assumption that reasoning ability must scale with model size. It indicates that for tasks where the answer is verifiable (like math and code), architectural efficiency and training quality can outweigh raw parameter count.


Reference: arXiv:2606.16140
Authors: Sen Xu, Shixi Liu, Wei Wang, et al.
Date: June 15, 2026
