The Era of Viable Local LLMs

By Vicki Boykis | June 15, 2026

After spending significant time experimenting with local models since their inception, I can finally say: they are actually good now.

🛠️ My Technical Environment

To achieve these results, I am utilizing a 2022 M2 Mac. Here is the hardware breakdown:

Component	Specification
Processor	Apple M2
Memory	$64\text{ GB RAM}$
Storage	$1\text{ TB SSD}$

Throughout my journey, I've cycled through a variety of models and orchestration layers:

Models Tested:

Mistral 7B
Gemma 3
OpenAI OSS-20B
Qwen 3 MOE (and various Qwen 2.5 Coder iterations)
Gemma 4 (specifically the 26b-a4b and 12b-qat versions)

System Setups:

Raw llama.cpp paired with Open WebUI
llama-cpp-python
Ollama
llamafiles
LM Studio

📈 The Evolution of Local Intelligence

In the beginning, local models were ~~usable~~ frustrating. They were sluggish, cumbersome to deploy, and generally failed at complex programming tasks.

The "Vibe Metric": I don't have a formal scientific benchmark. Instead, I use a personal heuristic: Do I feel the need to double-check this output against a frontier API model?

The turning point was the release of GPT-OSS. For the first time, my reliance on API-based verification dropped significantly. Initially, I treated local models as a high-speed, personalized version of Google for development queries that didn't require real-time web data.

However, the arrival of the Gemma 4 family changed the game. I can now perform agentic coding locally. I've found that these loops operate at roughly $\approx 75\%$ of the speed and accuracy of top-tier frontier models.

The Progression of Capability

💻 Real-World Applications

My current default is the gemma-4-26b-a4b implementation via LM Studio. I've used this setup for several non-trivial tasks:

Refactoring: Converting a messy Python notebook into a structured repository consisting of 5-6 distinct modules.
Linting: Ensuring correct type hints for generics (a task frontier models usually handle, but not always consistently).
Content & Testing: Proofreading blog entries and generating comprehensive unit tests.
Prototyping: Bootstrapping a two-tower recommendation model from scratch.

![Placeholder: Image showing a basic but functional two-tower model architecture generated by a local agent]

I am also currently developing an application to track trending topics from Arxiv papers. When I asked Pi to analyze my LM Studio logs, it confirmed that most of my work involves "personalized Google" style lookups for my project, Rijksearch. While these aren't "groundbreaking" tasks, they are computationally heavy; my K-V cache frequently expands to fill the entire $64\text{ GB}$ of RAM.

Crucially: Six months ago, these workflows were simply impossible on local hardware.

⚙️ Setting Up Local Agentic Flows

If you want to replicate this, you don't have to take my word for it—try it yourself. To get started, you will need:

Inference Engine (e.g., LM Studio)
Agentic Harness (e.g., Pi)
Model Artifact (The actual weights)

My Specific Configuration

I use Pi as the harness and LM Studio as the server. While llama.cpp might be faster, this combo is highly accessible.

Key Adjustments:

Model Choice: While some suggest Gemma 26B A4B, I recommend gemma-4-12b-qat. It is smaller, faster, and maintains impressive accuracy.
Security: To prevent the agent from running rogue Python code or browsing the web, I encapsulate every session in a Docker container with permissions restricted solely to bash. (I may add curl later for research).

Agent Configuration (models.json):

"lmstudio": {
    "baseUrl": "http://host.docker.internal:1234/v1",
    "api": "openai-completions",
    "apiKey": "not-needed",
    "models": [
        {
            "id": "google/gemma-4-12b-qat",
            "input": ["text", "image"]
        }
    ]
}

Docker Compose Setup:

services:
  pi:
    build:
      context: .
      dockerfile: Dockerfile
    image: pi-agent:0.74.0
    init: true
    stdin_open: true
    tty: true
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY :- }
      OPENAI_API_KEY: ${OPENAI_API_KEY :- not-needed}
      GEMINI_API_KEY: ${GEMINI_API_KEY :- }
      OPENAI_API_BASE: ${OPENAI_API_BASE :- http://host.docker.internal:1234/v1}
      WHATEVER_API_KEY: ${WHATEVER_API_KEY :- }
    volumes:
      - ${HOME}/.pi/agent/models.json:/config/models.json
      - ${WORKSPACE :- .}:/workspace
      - pi-config:/config
      - pi-sessions:/sessions
    working_dir: /workspace

volumes:
  pi-config:
  pi-sessions:

Execution Script:

SCRIPT_DIR="$(cd -- "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
WORKSPACE_DIR="${WORKSPACE :-$(pwd)}"

case "$WORKSPACE_DIR" in
    /* ) ;;
    * ) WORKSPACE_DIR="$(cd -- "$WORKSPACE_DIR" && pwd)" ;;
esac

export WORKSPACE="$WORKSPACE_DIR"
sandbox="${PI_SANDBOX :- 0}"
pi_args=()

while (( $# )) ; do
    case "$1" in
        --sandbox ) sandbox=1 ;;
        --no ) # ... (truncated)