Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?
Discussion: Transitioning from Cloud LLMs to Local Models for Software Development
The developer community is currently engaged in a heated debate: Can a local Large Language Model (LLM) truly replace the productivity gains provided by proprietary giants like Claude 3.5 Sonnet or GPT-4o?
While the allure of privacy and zero subscription fees is strong, the reality of "local-first" coding is a complex trade-off between raw intelligence and sovereignty.
The Current Landscape
For most, the gold standard remains Claude 3.5 Sonnet due to its superior reasoning and coding capabilities. However, the gap is closing. Many developers are attempting to move away from the cloud to avoid data leaks and network latency.
It is tempting to say that local models are useless for complex architecture, but that overstates the case: they are becoming viable for specific tasks.
The Hardware Bottleneck
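The primary constraint isn't the software, but the VRAM. To run a model effectively, you need enough GPU memory to hold the weights. The relationship can be roughly simplified as:
$$\text{VRAM (GB)} \approx P \times \frac{b}{8} \times 1.2$$
(Where $P$ is the parameter count in billions, $b$ is the number of bits per weight, and 1.2 accounts for KV cache and overhead).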
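As a rough worked example, the sketch below assumes the rule of thumb is the size of the weights (parameter count times bytes per weight) multiplied by the 1.2 overhead factor. The function names and the 24 GB budget are illustrative choices, not something the thread specifies, and real usage will vary with context length and runtime.

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB: weight size times an overhead factor.

    Assumption: 1.2 covers KV cache and runtime overhead, as in the
    rule of thumb above. Long contexts can need noticeably more.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight * overhead


def fits_on_gpu(params_billions: float, bits_per_weight: int,
                vram_gb: float) -> bool:
    """True if the estimate fits within the given VRAM budget."""
    return estimate_vram_gb(params_billions, bits_per_weight) <= vram_gb


# Illustrative configurations: (label, parameters in billions, bits per weight)
configs = [
    ("16B model, 4-bit", 16, 4),
    ("70B model, 4-bit", 70, 4),
    ("70B model, 16-bit", 70, 16),
]

for label, params, bits in configs:
    need = estimate_vram_gb(params, bits)
    verdict = "fits" if fits_on_gpu(params, bits, 24) else "does not fit"
    print(f"{label}: ~{need:.1f} GB, {verdict} in 24 GB")
```

Under these assumptions a 16B model at 4-bit needs roughly 9.6 GB, while a 70B model at 4-bit needs roughly 42 GB and so would not fit on a single 24 GB card.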
Local Contenders vs. Cloud Giants
Depending on the task, different models excel. Below is a comparison of the current options discussed by the community:
| Feature | Claude 3.5 / GPT-4o | DeepSeek-Coder-V2 | Llama 3 (70B) | CodeQwen |
|---|---|---|---|---|
| Reasoning | ||||
| Privacy | Low (Cloud) | High (Local) | High (Local) | High (Local) |
| Latency | Variable (Network) | Low (Local) | Low (Local) | Very Low |
| Cost | Subscription/API | Free (Hardware cost) | Free (Hardware cost) | Free (Hardware cost) |
The Workflow Integration
Most users aren't just using a chat interface; they are integrating these models directly into their IDEs.
"The magic happens when you stop copy-pasting and start using a bridge that connects your local weights to your editor's context."
Recommended Toolstack
- Backend: `Ollama` or `vLLM` for serving the model.
- Frontend/Plugin: `Continue.dev` or `Tabby` for IDE integration.
- Model Selection: `deepseek-coder` for logic, `starcoder2` for autocomplete.
Implementation Example
To run a coding model via Ollama, a user might execute:
```shell
ollama run deepseek-coder-v2:16b
```
And configure their config.json in Continue.dev:
```json
{
  "models": [
    {
      "title": "Local DeepSeek",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ]
}
```
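For scripts that sit outside the editor, the same local model can be reached over Ollama's HTTP API. The following is a minimal sketch, assuming Ollama is running on its default port (11434) and that the `deepseek-coder-v2:16b` tag has already been pulled. The helper names `build_request` and `ask_local_model` are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str,
                  model: str = "deepseek-coder-v2:16b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str,
                    model: str = "deepseek-coder-v2:16b",
                    timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text.

    Requires a running Ollama instance; raises URLError if none is reachable.
    """
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, calling `ask_local_model("Write a Python function that reverses a string.")` would return the model's completion as a single string.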
Decision Logic: Which one to use?
The community generally follows a hybrid decision tree, choosing which model to invoke for each coding task rather than committing to a single one.
The "Hybrid" Strategy
Rather than a total replacement, most "power users" adopt a tiered approach. They use local models for the "grunt work" and cloud models for the "brain work."
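The tiered approach can be sketched as a small routing function. This is only an illustration: the task categories, the privacy rule, and the `choose_model` name are assumptions rather than anything the thread agreed on, and the model names are taken from the examples elsewhere in this write-up.

```python
LOCAL_MODEL = "deepseek-coder-v2:16b"  # served locally, e.g. via Ollama
CLOUD_MODEL = "gpt-4o"                 # cloud fallback for hard problems

# Hypothetical task categories; adjust to your own workflow.
GRUNT_WORK = {"autocomplete", "boilerplate", "docstring", "unit_test"}
BRAIN_WORK = {"architecture", "cross_file_refactor", "hard_bug"}


def choose_model(task_kind: str, contains_private_code: bool = False) -> str:
    """Pick a model for a task using the tiered 'hybrid' strategy.

    Assumed rules:
    - private code never leaves the machine, so it always goes local;
    - "brain work" goes to the cloud model;
    - everything else, including unknown task kinds, stays local.
    """
    if contains_private_code:
        return LOCAL_MODEL
    if task_kind in BRAIN_WORK:
        return CLOUD_MODEL
    return LOCAL_MODEL


print(choose_model("autocomplete"))
print(choose_model("hard_bug"))
print(choose_model("hard_bug", contains_private_code=True))
```

Defaulting unknown tasks to the local model keeps cost and data exposure low. A team that values answer quality over privacy might reverse that default.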
The Local Checklist for Setup:
- Upgrade GPU to at least 24 GB VRAM (e.g., RTX 3090/4090).
- Install `Ollama` for easy model management.
- Configure `Continue.dev` in VS Code or JetBrains.
- Test `DeepSeek-Coder-V2` for Python/TypeScript proficiency.
- Set up a fallback API key for GPT-4o for "impossible" bugs.

Final Nuances
The consensus is that while we aren't quite at the point where a local model can replace Claude 3.5 for a senior engineer's entire day, we are very close for roughly 80% of daily tasks. The nuance lies in the quantization: at a comparable memory footprint, a 4-bit quantized version of a large model often outperforms a full-precision version of a smaller model.
It is no longer a question of "if" local models can code, but "when" the hardware becomes cheap enough to make the cloud irrelevant.