logo
Kurashizu Blogwhere ideas flow
  • Home
  • News
  • Blog
  • About
  • Home
  • News
  • Blog
  • About

© 2026 Kurashizu. All rights reserved.

Admin·New Post·Service Status
Powered by Cloudflare·23.06.26
← Back to blog

Token-Efficient Agent Escalation

May 10, 2026|Kurashizu
cost-managementtoken-efficiencyharness-engineering

In practical agent engineering, small models often fail to reliably follow multi-step protocols. Instead of forcing perfect compliance at the model level, we design system-level enforcement with structured outputs and escalation to larger models.

The core idea is simple:

Use small models for execution, and large models only when the system detects failure or uncertainty.


1. Key Problem

Small models tend to:

  • ignore constraints under load
  • simplify multi-step instructions
  • fail to report uncertainty
  • hallucinate completion

The issue is not capability, but unreliable adherence to protocol.


2. System Architecture

Instead of relying on prompting, we enforce behavior via structure and routing.


3. Core Execution Loop


4. Structured Output Contract

We force the small model to always output a schema:

{
  "step": "",
  "action": "",
  "reasoning": "",
  "status": "ok | confused | blocked",
  "confidence": 0.0
}

This makes internal state externally verifiable.


5. Escalation Triggers

Escalation happens when:

  • confidence < threshold
  • status == "confused"
  • schema invalid
  • repeated failure detected

This ensures failure is caught early instead of propagating.


6. Compact Context Principle

When escalating, we do NOT send full history.

We only send:

  • current goal
  • last valid state
  • failure point
  • error signals
  • relevant code snippet
  • constraints

This minimizes token cost while preserving diagnostic signal.


7. Why This Works

This system works because it shifts reliability from model behavior to system enforcement:

Failure ModeSystem Fix
ignored rulesschema validation
silent failurestatus field
hallucinationstructured output
long context costescalation routing

8. Runtime Behavior

run → step execution → validate → continue
run → failure → escalate → patch → restart
run → success → terminate

Over time, the system converges to fewer escalations and more stable execution.


9. Key Insight

You do not make small models reliable. You make unreliability detectable and recoverable.


10. Extensions

This pattern can evolve into:

  • skill compilers
  • agent operating systems
  • automated debugging loops
  • multi-model routing systems

It is a general pattern for cost-efficient agent architectures.