The Coming Loop

"I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do." — Boris Cherny

Recently, I've noticed a shift. People aren't just using coding agents; they are building sophisticated architectures on top of them. Whether it's happening via Pi or other platforms, the fundamental architecture is identical: a task enters a queue, a machine attempts it, and a harness evaluates the result.

If the goal isn't met, the harness manages the next step—whether that means continuing the session, injecting new prompts, resetting the context, or handing the task to a different machine. The process persists long after a standard LLM would have simply said, "I am done."
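The queue, attempt, evaluate cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not any real tool's API: `Task`, `run_harness`, `fake_agent`, and `fake_judge` are hypothetical names, and the only "next step" modeled here is re-queuing the task.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    goal: str
    attempts: int = 0
    history: list = field(default_factory=list)  # outputs of earlier attempts


def run_harness(tasks, attempt, goal_met, max_attempts=3):
    """Retry each queued task until the evaluator accepts or the budget runs out."""
    results = {}
    while tasks:
        task = tasks.popleft()
        output = attempt(task)          # the machine tries the task
        task.attempts += 1
        task.history.append(output)
        if goal_met(task, output):      # the harness evaluates the result
            results[task.goal] = output
        elif task.attempts < max_attempts:
            # A real harness would choose here between continuing the session,
            # injecting a new prompt, resetting context, or handing off.
            tasks.append(task)
        else:
            results[task.goal] = None   # budget exhausted
    return results


# Hypothetical stand-ins: the agent claims success too early on its first try.
def fake_agent(task):
    return "done" if task.attempts >= 1 else "I am done (but tests fail)"


def fake_judge(task, output):
    return output == "done"


results = run_harness(deque([Task("fix flaky test")]), fake_agent, fake_judge)
```

The point of the sketch is that the agent's own "I am done" is never trusted; only `goal_met` decides whether a task leaves the queue.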

Understanding the Two Loops

To understand this, we have to distinguish between two different layers of iteration:

The Agent Loop: The internal cycle where the model calls a tool $\rightarrow$ processes the output $\rightarrow$ calls another tool $\rightarrow$ edits a file $\rightarrow$ runs a test.
The Harness Loop: The external "meta-loop" that governs the agent. This is the layer currently dominating the discourse on Twitter and the evolution of agentic engineering.
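For contrast, the inner agent loop can be sketched like this. It is a toy under stated assumptions: the "model" is any callable that returns the next tool call or `None` when it considers itself finished, and `agent_loop`, `scripted_model`, and the tool names are all hypothetical.

```python
def agent_loop(model, tools, max_steps=20):
    """Run tool calls chosen by the model until it stops or the step budget ends."""
    transcript = []
    for _ in range(max_steps):
        call = model(transcript)        # model picks the next tool call
        if call is None:                # model decides it is done
            break
        name, arg = call
        observation = tools[name](arg)  # execute the tool
        transcript.append(f"{name}({arg!r}) -> {observation}")
    return transcript


# A scripted stand-in for a model: read, edit, then run the tests.
def scripted_model(transcript):
    plan = [("read_file", "app.py"), ("edit_file", "app.py"), ("run_tests", "tests/")]
    return plan[len(transcript)] if len(transcript) < len(plan) else None


tools = {
    "read_file": lambda path: f"contents of {path}",
    "edit_file": lambda path: f"patched {path}",
    "run_tests": lambda path: "3 passed",
}

steps = agent_loop(scripted_model, tools)
```

A harness loop sits one level up: it calls something like `agent_loop` repeatedly and decides what happens after each run ends.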

The Friction: Why I'm Not There Yet

Despite the hype, I haven't found this workflow successful for the code I truly care about. My standards remain high: I want code to be elegant, and more importantly, I want to comprehend it.

I refuse to be in a position where, during a technical debate or under high pressure, I have to ask a clanker (an AI) to explain how my own system works. While I don't know if I'll still value "human comprehension" in a few years, for now, it is non-negotiable.

The "Slop" Problem

When code is generated by loops without active human steering, the quality often degrades. I've observed that current models produce code that is:

~~Concise~~ $\rightarrow$ Overly defensive
~~Elegant~~ $\rightarrow$ Unnecessarily complex
~~Architectural~~ $\rightarrow$ Too local in reasoning

They tend to invent clumsy abstractions and use "machinery" to hide poor design. In fact, I feel we might be moving backward. Tools like Claude Code combined with Fable can work autonomously for 30+ minutes, whereas previously a human would have intervened to correct course.

The "Exception Terror"

As Andrej Karpathy noted, these models are "mortally terrified of exceptions." This leads to a fundamental design flaw:

$\text{Correct Fix} = \text{Make the malformed case impossible (Invariants)}$

$\text{LLM Fix} = \text{Wrap everything in a try-catch/defensive check}$

When you put this behavior into a loop, you create a compounding effect. Each iteration adds another layer of local defense. The system looks more robust, but it becomes a nightmare to understand.
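One way to read "make the malformed case impossible" is to validate once, at construction, and let the type carry the guarantee from there. The sketch below is an illustration only; `Percentage` and `apply_discount` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    """A value that, in normal use, can only exist inside [0, 100]."""
    value: float

    def __post_init__(self):
        # The single place where the invariant is enforced.
        if not 0.0 <= self.value <= 100.0:
            raise ValueError(f"percentage out of range: {self.value}")


def apply_discount(price: float, discount: Percentage) -> float:
    # No defensive checks here: the type already rules out bad discounts.
    return price * (1 - discount.value / 100)


discounted = apply_discount(200.0, Percentage(25.0))
```

Every function that accepts a `Percentage` gets to skip the range check, which is the opposite of each iteration adding one more local guard.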

| Feature | Human-Centric Design | Loop-Generated "Slop" |
| --- | --- | --- |
| Error Handling | Prevents invalid states | Handles every possible invalid state |
| Abstraction | Based on system-wide logic | Based on local failure patterns |
| Maintainability | High (Understandable) | Low (Obscured by defenses) |

The Danger for Juniors: When junior developers use these tools without guidance, they aren't just getting bad code; they are being taught bad habits, because the LLM can provide a convincing (though flawed) argument for why the complex, defensive code is "better."

Where the Loop Actually Wins

It would be intellectually dishonest to say loops don't work. In specific domains, they are already transformative.

Successful Use Cases:

Large-scale Porting: For example, migrating Bun components from Zig to Rust.
Performance Optimization: The loop can Experiment $\rightarrow$ Benchmark $\rightarrow$ Discard/Keep in a way humans cannot.
Security Auditing: Exploring a massive problem space to find vulnerabilities.
Rapid Research: Generating PoCs that don't need to last.
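The experiment, benchmark, keep-or-discard cycle from the performance item can be sketched as below. This is a toy version under my own assumptions: `pick_fastest` is a hypothetical helper, candidates are plain Python functions, and correctness is checked against a single input before any timing happens.

```python
import timeit


def pick_fastest(baseline, candidates, check_input, repeats=5, number=200):
    """Keep a candidate only if it matches the baseline's output and runs faster."""
    def cost(fn):
        # Best-of-N timing to reduce noise from the machine.
        return min(timeit.repeat(lambda: fn(check_input), repeat=repeats, number=number))

    expected = baseline(check_input)
    best, best_cost = baseline, cost(baseline)
    for candidate in candidates:
        if candidate(check_input) != expected:
            continue                      # discard: wrong answer
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:    # keep: correct and faster
            best, best_cost = candidate, candidate_cost
    return best


def slow_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total


def wrong_sum(xs):
    return sum(xs) + 1  # fast but incorrect, so it must be discarded


data = list(range(1000))
chosen = pick_fastest(slow_sum, [wrong_sum, sum], data)
```

Which function wins depends on the machine, but an incorrect candidate can never be kept, however fast it is.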

The Common Thread: Longevity vs. Utility

The loops that work are those where the output is either a mechanical translation (verifiable by binary tests) or a temporary artifact (code that doesn't need a long shelf life).

Whether the "judge" in the loop is a binary test suite or another LLM, the process works because the goal is a specific outcome, not long-term maintainability.
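A binary judge for a mechanical translation can be as simple as a differential test: feed the same random inputs to the original and the port, and accept only if every output matches. The sketch below assumes two Python functions standing in for the "before" and "after" implementations; all names are hypothetical.

```python
import random
from functools import reduce


def reference_checksum(data: bytes) -> int:
    # Stands in for the original implementation.
    total = 0
    for byte in data:
        total = (total * 31 + byte) % 65521
    return total


def ported_checksum(data: bytes) -> int:
    # Stands in for the machine-produced translation.
    return reduce(lambda acc, b: (acc * 31 + b) % 65521, data, 0)


def binary_judge(reference, port, trials=500, seed=0) -> bool:
    """Return True only if the port agrees with the reference on every trial."""
    rng = random.Random(seed)  # seeded so a failure is reproducible
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        if reference(data) != port(data):
            return False
    return True


port_accepted = binary_judge(reference_checksum, ported_checksum)
```

The judge says nothing about whether the port is pleasant to read, which is exactly why this kind of loop suits outcome-driven work.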

```python
# Example of "Loop Slop" vs "Clean Code"
# Loop-generated approach:
def process_data(data):
    if data is None: return None
    try:
        if not isinstance(data, list): return []
        # ... 10 more defensive checks ...
        return [item.strip() for item in data if item]
    except Exception as e:
        print(f"Error: {e}")
        return []

# Human-centric approach:
# Ensure data is a list at the type-boundary/API level
def process_data(data: list[str]):
    return [item.strip() for item in data if item]
```

I genuinely love loops that automate the tedious, boring parts of engineering. The issue isn't the harness; it's the model's tendency to produce "slop." As long as the loop is driving toward a useful outcome, it's a powerful tool, provided we don't mistake "robust-looking" code for good code.

![Image Placeholder: A conceptual diagram showing a human steering a loop vs. a loop running wild into a thicket of defensive code]