Running local models is good now
The Era of Viable Local LLMs
By Vicki Boykis | June 15, 2026
After spending significant time experimenting with local models since their inception, I can finally say: they are actually good now.
🛠️ My Technical Environment
To achieve these results, I am utilizing a 2022 M2 Mac. Here is the hardware breakdown:
| Component | Specification |
|---|---|
| Processor | Apple M2 |
| Memory | |
| Storage |
Throughout my journey, I've cycled through a variety of models and orchestration layers:
Models Tested:
Mistral 7BGemma 3OpenAI OSS-20BQwen 3 MOE(and variousQwen 2.5 Coderiterations)Gemma 4(specifically the26b-a4band12b-qatversions)
System Setups:
- Raw
llama.cpppaired with Open WebUI llama-cpp-python- Ollama
llamafiles- LM Studio
📈 The Evolution of Local Intelligence
In the beginning, local models were usable frustrating. They were sluggish, cumbersome to deploy, and generally failed at complex programming tasks.
The "Vibe Metric": I don't have a formal scientific benchmark. Instead, I use a personal heuristic: Do I feel the need to double-check this output against a frontier API model?
The turning point was the release of GPT-OSS. For the first time, my reliance on API-based verification dropped significantly. Initially, I treated local models as a high-speed, personalized version of Google for development queries that didn't require real-time web data.
However, the arrival of the Gemma 4 family changed the game. I can now perform agentic coding locally. I've found that these loops operate at roughly of the speed and accuracy of top-tier frontier models.
The Progression of Capability
💻 Real-World Applications
My current default is the gemma-4-26b-a4b implementation via LM Studio. I've used this setup for several non-trivial tasks:
- Refactoring: Converting a messy Python notebook into a structured repository consisting of 5-6 distinct modules.
- Linting: Ensuring correct type hints for generics (a task frontier models usually handle, but not always consistently).
- Content & Testing: Proofreading blog entries and generating comprehensive unit tests.
- Prototyping: Bootstrapping a two-tower recommendation model from scratch.
![Placeholder: Image showing a basic but functional two-tower model architecture generated by a local agent]
I am also currently developing an application to track trending topics from Arxiv papers. When I asked Pi to analyze my LM Studio logs, it confirmed that most of my work involves "personalized Google" style lookups for my project, Rijksearch. While these aren't "groundbreaking" tasks, they are computationally heavy; my K-V cache frequently expands to fill the entire of RAM.
Crucially: Six months ago, these workflows were simply impossible on local hardware.
⚙️ Setting Up Local Agentic Flows
If you want to replicate this, you don't have to take my word for it—try it yourself. To get started, you will need:
- Inference Engine (e.g., LM Studio)
- Agentic Harness (e.g., Pi)
- Model Artifact (The actual weights)
My Specific Configuration
I use Pi as the harness and LM Studio as the server. While llama.cpp might be faster, this combo is highly accessible.
Key Adjustments:
- Model Choice: While some suggest
Gemma 26B A4B, I recommendgemma-4-12b-qat. It is smaller, faster, and maintains impressive accuracy. - Security: To prevent the agent from running rogue Python code or browsing the web, I encapsulate every session in a Docker container with permissions restricted solely to
bash. (I may addcurllater for research).
Agent Configuration (models.json):
"lmstudio": {
"baseUrl": "http://host.docker.internal:1234/v1",
"api": "openai-completions",
"apiKey": "not-needed",
"models": [
{
"id": "google/gemma-4-12b-qat",
"input": ["text", "image"]
}
]
}
Docker Compose Setup:
services:
pi:
build:
context: .
dockerfile: Dockerfile
image: pi-agent:0.74.0
init: true
stdin_open: true
tty: true
extra_hosts:
- "host.docker.internal:host-gateway"
environment:
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY :- }
OPENAI_API_KEY: ${OPENAI_API_KEY :- not-needed}
GEMINI_API_KEY: ${GEMINI_API_KEY :- }
OPENAI_API_BASE: ${OPENAI_API_BASE :- http://host.docker.internal:1234/v1}
WHATEVER_API_KEY: ${WHATEVER_API_KEY :- }
volumes:
- ${HOME}/.pi/agent/models.json:/config/models.json
- ${WORKSPACE :- .}:/workspace
- pi-config:/config
- pi-sessions:/sessions
working_dir: /workspace
volumes:
pi-config:
pi-sessions:
Execution Script:
SCRIPT_DIR="$(cd -- "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
WORKSPACE_DIR="${WORKSPACE :-$(pwd)}"
case "$WORKSPACE_DIR" in
/* ) ;;
* ) WORKSPACE_DIR="$(cd -- "$WORKSPACE_DIR" && pwd)" ;;
esac
export WORKSPACE="$WORKSPACE_DIR"
sandbox="${PI_SANDBOX :- 0}"
pi_args=()
while (( $# )) ; do
case "$1" in
--sandbox ) sandbox=1 ;;
--no ) # ... (truncated)