What Makes a Good Agent Harness?
And why the sandbox paradigm is more nuanced than it looks.
There is a question that keeps surfacing whenever engineers move from building LLM-powered prototypes to production agents: how much structure should the scaffolding impose on the model?
Get it wrong in one direction, and your "agent" is just a fancy state machine with an LLM classifier bolted on. Get it wrong in the other direction, and you have an autonomous process that is powerful, unpredictable, and nearly impossible to debug.
This post explores what a well-designed agent harness actually looks like, where the sandbox paradigm fits, and how to think about the tradeoffs.
The Core Tension
This represents a spectrum:
- State Machine: fully pre-defined control flow
- ReAct: mixed reasoning + tool use with structure
- Function Calling: model chooses tools but within fixed contracts
- Sandbox: model writes and executes code freely inside constraints
The key question is not “which is best”, but which matches the task and model capability.
What a Harness Actually Does
A harness has exactly three legitimate jobs:
- execute what the LLM decided
- return results faithfully (including errors)
- enforce safety boundaries without replacing reasoning
In practice, this means the harness is just an execution layer. It should not “interpret” or “optimize” the model’s intent.
Four Ways Harnesses Go Wrong
1. Structural Lobotomy
Instead of letting the model reason, we pre-route everything into fixed branches like:
- search
- answer
- clarify
This turns the model into a classifier. It cannot dynamically chain tools or revise plans.
2. Context Fragmentation
When we split an agent into many micro-nodes, each node only sees partial history.
The result is predictable: the system loses global coherence, and decisions become locally optimal but globally inconsistent.
3. Swallowed Errors
A common failure mode is silently handling tool failures:
- retrying without exposing the error
- substituting fallback outputs
- hiding timeouts or exceptions
This removes a critical signal. If the model cannot see failures, it cannot adapt its strategy.
4. Harness-Owned Termination
There are two ways to stop an agent loop:
- Bad default: stop after
max_steps - Better design: stop when the model explicitly signals completion
The first forces arbitrary cutoffs; the second preserves the model’s agency.
The Ideal Harness Structure
The ideal design is simple:
-
The user sends a task to the harness
-
The harness forwards full context + tool definitions to the model
-
The model decides:
- what tool to call
- how to interpret results
- when to stop
-
The harness only executes tools and returns raw outputs
A key principle: the model owns decision-making, the harness owns execution.
The Sandbox Paradigm
The sandbox is the extreme version of this design philosophy.
Instead of giving the model a fixed set of tools, we give it a runtime environment where it can write and execute code.
What this changes conceptually:
- Tool selection becomes program writing
- Multi-step reasoning becomes code execution flow
- Intermediate steps are no longer externalized as tool calls, but internalized in code
Tradeoffs:
Advantages
- extremely expressive
- flexible for unknown workflows
- easy to audit (code is explicit)
Limitations
- requires strong coding ability from the model
- harder integration with authenticated external systems
- resource usage becomes less predictable
- failures can be more complex to debug
Choosing the Right Paradigm
A practical decision rule:
| Scenario | Best fit |
|---|---|
| Fully predictable workflow | State machine |
| Multi-step reasoning with known tools | Function calling |
| Open-ended tasks without fixed structure | Sandbox |
A useful intuition:
- If you already know the steps → don’t use a sandbox
- If you don’t know the steps → sandbox may be appropriate
- If you want controlled flexibility → function calling is usually enough
A Note on Model Capability
The most important factor is not architecture — it is model quality.
Weak models:
- mis-select tools
- fail to recover from errors
- lose coherence in long tasks
Strong models:
- self-correct when tools fail
- maintain long-horizon intent
- naturally choose appropriate abstraction levels
In other words, harness design amplifies capability, it does not replace it.
Summary
A good agent harness:
- keeps the LLM in control of decisions
- executes tools without interpretation
- returns full error signals
- preserves full context
- avoids imposing artificial structure
- lets the model decide when to stop
The sandbox paradigm is the most powerful expression of this idea — but also the most demanding.
It works well only when the model is strong enough to operate with that level of freedom. Otherwise, simpler function-calling systems are usually more reliable and easier to control.
Related reading: Does a Harness Lobotomize Your LLM? — on the structural failure modes of over-engineered agent scaffolding.