Local Qwen isn't a worse Opus, it's a different tool
Many claim that local models like Qwen 27B or 35-A3B are "near-Opus level." However, I'm not basing my perspective on a cursory glance, a random X post about canceling a Claude subscription, or a hobbyist's report of a model crawling at single-digit tokens per second with a tiny 32K context window. Nor is this a tweet from a celebrity CEO coding on a plane.
Instead, this is a transparent account from a founder of a small software business where local models have provided actual, albeit caveated, value. I have skin in the game, but no motive to shill cloud or local solutions; I simply want local models to be reliable.
What this exploration covers:
- How the hardware investment paid for itself in 2-3 months.
- The specific business use cases it serves.
- Why unsupervised trust is still impossible.
- Qwen's biggest flaws: ~~perfect reliability~~ infinite loops and hallucinations (especially when quantized for consumer GPUs).
My AI Use Case & Background
My path as a founder and maintainer began with OpenFaaS, which I built entirely by hand back in 2016. I laid the foundation alone and then grew it through community involvement—not because I lacked the ability to solo it, but because I wanted a successful open-source ecosystem.
My professional trajectory looked like this:
- 2017: Joined VMware to fund my time.
- 2019: Shifted toward an open-core, bootstrapped company model due to market changes.
The Product Ecosystem
Our lean team currently manages a suite of tools focused on efficiency, control, and autonomy:
These products rely on low-level Linux primitives: containers, Firecracker microVMs, network protocols, and Kubernetes. They are primarily written in Go, with some React for UIs and documentation. Because we are small, we provide high-touch, "non-scalable" support to our users.
I've adopted AI tools since their inception: starting with VS Code tab completion, moving to ChatGPT for bug hunting, and eventually spending 12 hours a day in tmux. I even built Superterm.dev to manage my sessions and visualize coding agents. I've watched AI evolve from "boilerplate reduction" to "end-to-end architecture." While I still handle my own writing, I rarely write code by hand now and rely heavily on Claude or Codex.
The Frontier Intelligence Shift
Between November 2025 and January 2026, a paradigm shift occurred. Developers on X began reporting that Claude Opus had evolved to the point of handling nearly all their professional workloads.
- Manual coding became as obsolete as milk left in the sun.
- Cost: Top-tier plans settled around $200/mo for individuals.
- Limits: If you manage your unattended tasks, you can stretch the 5-hour and weekly limits.
The Case for Local Models
One might ask: "Why use anything less than the absolute best?"
In 2026, we are in a strange era where any idea can be cloned overnight by an unknown competitor using a subscription in a developing nation. I've seen this happen to SlicerVM (hand-written in 2022) and Superterm (created in 2026 via agents).
While a "vibecoded" clone isn't the same as a well-architected solution backed by an experienced team, in a market where the cost of software drops to near zero, "free and good enough" often wins.
The Capacity Gap
There is a massive difference in scale between frontier and local models. Frontier models are estimated at:
This isn't just a marginal increase; it's a different league of reasoning and knowledge. Yet, a dense model like Qwen 3.6 27B performs surprisingly well.
| Model | SWE-Bench Verified Score |
|---|---|
| Claude Opus 4.8 | |
| Qwen 3.6 27B | |
This gap leads people to shout that "local is nearly SOTA," claiming a 6-year-old GPU can replace a $200/mo subscription.
The Trap of "Benchmaxxing"
Benchmarks are moving targets. Since they are public, models can be tuned specifically to score higher on them.
The SWE-Bench Verified test focuses on Python issues. While Python supports async and threading, most Python code is single-threaded and synchronous. This is fundamentally different from our work: distributed systems written in Go.
In Go, we deal with:
- channels
- contexts
- structs
When a local model fails, it doesn't just give a wrong answer; it often enters a failure state like this:
```go
// Example of a hallucinated infinite-loop risk.
condition := false
for {
	// The model might forget the exit condition,
	// or hallucinate a variable that never changes.
	if condition {
		break
	}
	// ... logic that never sets condition to true
}
```
This risk of infinite loops and hallucinations spikes significantly when you quantize the model to fit it onto consumer-grade hardware.