← Back to news

Running local models is good now

vickiboykis.com|1137 points|460 comments|by jfb|Jun 16, 2026

The Era of Viable Local LLMs

By Vicki Boykis | June 15, 2026

After spending significant time experimenting with local models since their inception, I can finally say: they are actually good now.

🛠️ My Technical Environment

To achieve these results, I am utilizing a 2022 M2 Mac. Here is the hardware breakdown:

ComponentSpecification
ProcessorApple M2
Memory64 GB RAM64\text{ GB RAM}
Storage1 TB SSD1\text{ TB SSD}

Throughout my journey, I've cycled through a variety of models and orchestration layers:

Models Tested:

  • Mistral 7B
  • Gemma 3
  • OpenAI OSS-20B
  • Qwen 3 MOE (and various Qwen 2.5 Coder iterations)
  • Gemma 4 (specifically the 26b-a4b and 12b-qat versions)

System Setups:

  • Raw llama.cpp paired with Open WebUI
  • llama-cpp-python
  • Ollama
  • llamafiles
  • LM Studio

📈 The Evolution of Local Intelligence

In the beginning, local models were usable frustrating. They were sluggish, cumbersome to deploy, and generally failed at complex programming tasks.

The "Vibe Metric": I don't have a formal scientific benchmark. Instead, I use a personal heuristic: Do I feel the need to double-check this output against a frontier API model?

The turning point was the release of GPT-OSS. For the first time, my reliance on API-based verification dropped significantly. Initially, I treated local models as a high-speed, personalized version of Google for development queries that didn't require real-time web data.

However, the arrival of the Gemma 4 family changed the game. I can now perform agentic coding locally. I've found that these loops operate at roughly 75%\approx 75\% of the speed and accuracy of top-tier frontier models.

The Progression of Capability


💻 Real-World Applications

My current default is the gemma-4-26b-a4b implementation via LM Studio. I've used this setup for several non-trivial tasks:

  1. Refactoring: Converting a messy Python notebook into a structured repository consisting of 5-6 distinct modules.
  2. Linting: Ensuring correct type hints for generics (a task frontier models usually handle, but not always consistently).
  3. Content & Testing: Proofreading blog entries and generating comprehensive unit tests.
  4. Prototyping: Bootstrapping a two-tower recommendation model from scratch.

![Placeholder: Image showing a basic but functional two-tower model architecture generated by a local agent]

I am also currently developing an application to track trending topics from Arxiv papers. When I asked Pi to analyze my LM Studio logs, it confirmed that most of my work involves "personalized Google" style lookups for my project, Rijksearch. While these aren't "groundbreaking" tasks, they are computationally heavy; my K-V cache frequently expands to fill the entire 64 GB64\text{ GB} of RAM.

Crucially: Six months ago, these workflows were simply impossible on local hardware.


⚙️ Setting Up Local Agentic Flows

If you want to replicate this, you don't have to take my word for it—try it yourself. To get started, you will need:

  • Inference Engine (e.g., LM Studio)
  • Agentic Harness (e.g., Pi)
  • Model Artifact (The actual weights)

My Specific Configuration

I use Pi as the harness and LM Studio as the server. While llama.cpp might be faster, this combo is highly accessible.

Key Adjustments:

  • Model Choice: While some suggest Gemma 26B A4B, I recommend gemma-4-12b-qat. It is smaller, faster, and maintains impressive accuracy.
  • Security: To prevent the agent from running rogue Python code or browsing the web, I encapsulate every session in a Docker container with permissions restricted solely to bash. (I may add curl later for research).

Agent Configuration (models.json):

"lmstudio": {
    "baseUrl": "http://host.docker.internal:1234/v1",
    "api": "openai-completions",
    "apiKey": "not-needed",
    "models": [
        {
            "id": "google/gemma-4-12b-qat",
            "input": ["text", "image"]
        }
    ]
}

Docker Compose Setup:

services:
  pi:
    build:
      context: .
      dockerfile: Dockerfile
    image: pi-agent:0.74.0
    init: true
    stdin_open: true
    tty: true
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY :- }
      OPENAI_API_KEY: ${OPENAI_API_KEY :- not-needed}
      GEMINI_API_KEY: ${GEMINI_API_KEY :- }
      OPENAI_API_BASE: ${OPENAI_API_BASE :- http://host.docker.internal:1234/v1}
      WHATEVER_API_KEY: ${WHATEVER_API_KEY :- }
    volumes:
      - ${HOME}/.pi/agent/models.json:/config/models.json
      - ${WORKSPACE :- .}:/workspace
      - pi-config:/config
      - pi-sessions:/sessions
    working_dir: /workspace

volumes:
  pi-config:
  pi-sessions:

Execution Script:

SCRIPT_DIR="$(cd -- "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
WORKSPACE_DIR="${WORKSPACE :-$(pwd)}"

case "$WORKSPACE_DIR" in
    /* ) ;;
    * ) WORKSPACE_DIR="$(cd -- "$WORKSPACE_DIR" && pwd)" ;;
esac

export WORKSPACE="$WORKSPACE_DIR"
sandbox="${PI_SANDBOX :- 0}"
pi_args=()

while (( $# )) ; do
    case "$1" in
        --sandbox ) sandbox=1 ;;
        --no ) # ... (truncated)