And why the sandbox paradigm is more nuanced than it looks.

There is a question that keeps surfacing whenever engineers move from building LLM-powered prototypes to production agents: how much structure should the scaffolding impose on the model?

Get it wrong in one direction, and your "agent" is just a fancy state machine with an LLM classifier bolted on. Get it wrong in the other direction, and you have an autonomous process that is powerful, unpredictable, and nearly impossible to debug.

This post explores what a well-designed agent harness actually looks like, where the sandbox paradigm fits, and how to think about the tradeoffs.

The Core Tension

This represents a spectrum:

State Machine: fully pre-defined control flow
ReAct: mixed reasoning + tool use with structure
Function Calling: model chooses tools but within fixed contracts
Sandbox: model writes and executes code freely inside constraints

The key question is not “which is best”, but which matches the task and model capability.

What a Harness Actually Does

A harness has exactly three legitimate jobs:

execute what the LLM decided
return results faithfully (including errors)
enforce safety boundaries without replacing reasoning

In practice, this means the harness is just an execution layer. It should not “interpret” or “optimize” the model’s intent.

Four Ways Harnesses Go Wrong

1. Structural Lobotomy

Instead of letting the model reason, we pre-route everything into fixed branches like:

search
answer
clarify

This turns the model into a classifier. It cannot dynamically chain tools or revise plans.

2. Context Fragmentation

When we split an agent into many micro-nodes, each node only sees partial history.

The result is predictable: the system loses global coherence, and decisions become locally optimal but globally inconsistent.

3. Swallowed Errors

A common failure mode is silently handling tool failures:

retrying without exposing the error
substituting fallback outputs
hiding timeouts or exceptions

This removes a critical signal. If the model cannot see failures, it cannot adapt its strategy.

4. Harness-Owned Termination

There are two ways to stop an agent loop:

Bad default: stop after max_steps
Better design: stop when the model explicitly signals completion

The first forces arbitrary cutoffs; the second preserves the model’s agency.

The Ideal Harness Structure

The ideal design is simple:

The user sends a task to the harness
The harness forwards full context + tool definitions to the model
The model decides:
- what tool to call
- how to interpret results
- when to stop
The harness only executes tools and returns raw outputs

A key principle: the model owns decision-making, the harness owns execution.

The Sandbox Paradigm

The sandbox is the extreme version of this design philosophy.

Instead of giving the model a fixed set of tools, we give it a runtime environment where it can write and execute code.

What this changes conceptually:

Tool selection becomes program writing
Multi-step reasoning becomes code execution flow
Intermediate steps are no longer externalized as tool calls, but internalized in code

Tradeoffs:

Advantages

extremely expressive
flexible for unknown workflows
easy to audit (code is explicit)

Limitations

requires strong coding ability from the model
harder integration with authenticated external systems
resource usage becomes less predictable
failures can be more complex to debug

Choosing the Right Paradigm

A practical decision rule:

Scenario	Best fit
Fully predictable workflow	State machine
Multi-step reasoning with known tools	Function calling
Open-ended tasks without fixed structure	Sandbox

A useful intuition:

If you already know the steps → don’t use a sandbox
If you don’t know the steps → sandbox may be appropriate
If you want controlled flexibility → function calling is usually enough

A Note on Model Capability

The most important factor is not architecture — it is model quality.

Weak models:

mis-select tools
fail to recover from errors
lose coherence in long tasks

Strong models:

self-correct when tools fail
maintain long-horizon intent
naturally choose appropriate abstraction levels

In other words, harness design amplifies capability, it does not replace it.

Summary

A good agent harness:

keeps the LLM in control of decisions
executes tools without interpretation
returns full error signals
preserves full context
avoids imposing artificial structure
lets the model decide when to stop

The sandbox paradigm is the most powerful expression of this idea — but also the most demanding.

It works well only when the model is strong enough to operate with that level of freedom. Otherwise, simpler function-calling systems are usually more reliable and easier to control.

Related reading: Does a Harness Lobotomize Your LLM? — on the structural failure modes of over-engineered agent scaffolding.