Kurashizu's Blog

We have officially transitioned from the era of "AI as a smart autocomplete" to the era of Agentic Coding. Today, autonomous software agents don't just write snippets; they read entire repositories, formulate execution plans, invoke CLI tools via protocols like MCP (Model Context Protocol), and rewrite massive legacy subsystems.

Yet, as agents gain autonomy, they expose a catastrophic vulnerability that every developer eventually runs into: The Perception Gap.

An agent can generate 2,000 lines of technically flawless React code that passes every unit test, only for a human developer to boot it up and realize the UI layout is upside down, the user experience feels disjointed, or a subtle global side effect is draining the system's performance.

Agents excel at deterministic logic, but they lack human intuition, architectural "taste," and visual empathy. To build reliable software with AI, we must shift our focus from prompting to alignment engineering.

1. Low-Cost Visual Grounding: The Power of ASCII Prototyping

One of the most pragmatic tactics emerging from leading agentic frameworks (like OpenHands and Aider) is enforcing a strict Plan Mode before a single line of production code is written.

When tasks involve user interfaces or systemic layouts, forcing the agent to output an ASCII wireframe within its plan.md acts as a low-cost visual contract.

+-----------------------------------+
| [Logo] Search Projects... (Avatar)|
+--------+--------------------------+
| Nav 1  |  Main Repository Content |
| Nav 2  |  [Active Filter]         |
| Nav 3  |  +--------------------+  |
|        |  | Commit History     |  |
+--------+--+--------------------+--+

Why this works:

Human-in-the-Loop (HITL) Interception: It is far cheaper to correct an agent's mental model while it is still rendered in plaintext characters than to refactor a broken CSS flexbox layout after the fact.
Structural Anchor: The ASCII map serves as a rigid constraint during the Act phase. The agent references this plaintext blueprint to ensure that the generated DOM or component hierarchy matches the human's spatial expectations.
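To make the "visual contract" idea concrete, here is a minimal sketch of a gate that refuses to leave the planning step until the plan contains a wireframe and a human has approved it. The function names and the detection rule (two `+---+` border lines plus at least one `|...|` row) are assumptions made for illustration, not part of any particular agent framework.

```typescript
// Hypothetical helper: decides whether a plan contains an ASCII wireframe.
// The detection rule (two border lines plus one content row) is an assumption
// made for this sketch, not a convention of any specific agent framework.

const BORDER_LINE = /^\+[-+]+\+$/; // e.g. "+--------+------+"
const CONTENT_ROW = /^\|.*\|$/;    // e.g. "| Nav 1  | Main |"

function hasAsciiWireframe(planMarkdown: string): boolean {
  const lines = planMarkdown.split("\n").map((line) => line.trim());
  const borders = lines.filter((line) => BORDER_LINE.test(line)).length;
  const rows = lines.filter((line) => CONTENT_ROW.test(line)).length;
  // A box needs at least a top border, a bottom border, and one row.
  return borders >= 2 && rows >= 1;
}

function canEnterActPhase(planMarkdown: string, humanApproved: boolean): boolean {
  // Both conditions must hold: the contract exists and a human signed off.
  return hasAsciiWireframe(planMarkdown) && humanApproved;
}

const examplePlan = [
  "## Layout",
  "+--------+---------+",
  "| Nav 1  | Content |",
  "+--------+---------+",
].join("\n");

console.log(canEnterActPhase(examplePlan, true));    // true
console.log(canEnterActPhase("Just prose.", true));  // false
console.log(canEnterActPhase(examplePlan, false));   // false
```

A check this shallow only proves that a box exists, not that it is the right box. The human review is still what carries the alignment.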

2. System-Level Alignment: Establishing the Local Covenant

Agents entering a massive repository are like junior developers on their first day—they don't know your tribal knowledge, naming conventions, or strict architectural bans. To bridge this, modern ecosystems utilize static rule files like CLAUDE.md or DEVELOPER.md placed directly at the repository root.

# Repository Conventions

## Technology Stack
- TypeScript in strict mode only.
- Functional components with React Hooks; strictly ban Class components.

## Code Style & Guardrails
- Use early returns (Guard Clauses) to eliminate deeply nested `if` statements.
- Never use the `any` type. If a type is unknown, use `unknown`.

By injecting this file directly into the agent's system context window, you achieve preventative alignment. The agent aligns its output with your codebase's local engineering culture before it creates a single file or opens a pull request.
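How the injection happens depends on the tool, but the core idea fits in a few lines. The sketch below assumes the harness has already read the rule file from disk; `buildSystemPrompt`, `RuleFile`, and the `<repository_rules>` wrapper tag are hypothetical names chosen for this example, not the format any specific agent uses.

```typescript
// Hypothetical sketch: how an agent harness might prepend repository rules
// to its system prompt. The tag name and function are illustrative only.

interface RuleFile {
  path: string;    // e.g. "CLAUDE.md"
  content: string; // raw file text, already read by the caller
}

function buildSystemPrompt(basePrompt: string, ruleFiles: RuleFile[]): string {
  const sections = ruleFiles
    .filter((file) => file.content.trim().length > 0) // skip empty files
    .map(
      (file) =>
        `<repository_rules source="${file.path}">\n${file.content.trim()}\n</repository_rules>`
    );

  // Early return: with no rules, the base prompt is used unchanged.
  if (sections.length === 0) return basePrompt;

  return [basePrompt].concat(sections).join("\n\n");
}

const systemPrompt = buildSystemPrompt("You are a coding agent.", [
  { path: "CLAUDE.md", content: "- Never use the `any` type.\n" },
]);
console.log(systemPrompt);
```

Because the rules travel with every request, they cost context tokens each time, which is a good reason to keep the file short and specific.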

3. The Ultimate Synthesis: Agentic Test-Driven Development (TDFlow)

If agents lack human intuition during manual testing, how do we enforce rigorous correctness? We look to a classic software engineering paradigm: Test-Driven Development (TDD). When integrated into multi-agent workflows, the traditional Red-Green-Refactor cycle becomes the ultimate automated alignment loop.

  ┌───────────────────────────────────────────────────────────────────────────────────────────┐
  ▼                                                                                           │
[RED: Agent writes E2E/Unit test] ──► [GREEN: Agent writes barebones code] ──► [REFACTOR: Dual-agent critique]

RED: The human describes a highly specific edge case or bug using natural language. The agent's first job is not to fix the bug, but to write an automated test (e.g., using Playwright or Vitest) that accurately reproduces the failure. The test suite turns Red.
GREEN: The agent writes the minimal, sometimes messy code required to turn that test Green.
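REFACTOR: With the test green, the agent restructures the code without changing its behavior, re-running the suite after each change. This is where the hard question appears: how does an agent know its refactored code is actually better if it doesn't possess human intuition?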
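The three phases can be sketched as a small driver loop. Everything here is hypothetical: `CodingAgent`, `TestRunner`, and `runTddCycle` are names invented for this example, and the stub agent at the bottom stands in for real LLM calls and a real test runner such as Vitest or Playwright.

```typescript
// Hypothetical interfaces: neither `CodingAgent` nor `runTddCycle` belongs to a
// real framework. The test runner is injected so the loop itself stays testable.

type TestResult = "red" | "green";

interface CodingAgent {
  writeFailingTest(bugReport: string): string; // returns test source
  writeMinimalFix(testSource: string): string; // returns implementation source
  refactor(implementation: string): string;    // returns cleaned-up source
}

type TestRunner = (testSource: string, implementation: string | null) => TestResult;

function runTddCycle(agent: CodingAgent, runTests: TestRunner, bugReport: string): string {
  // RED: the new test must fail before any fix exists.
  const test = agent.writeFailingTest(bugReport);
  if (runTests(test, null) !== "red") {
    throw new Error("RED failed: the test does not reproduce the bug");
  }

  // GREEN: minimal code that makes the test pass.
  const fix = agent.writeMinimalFix(test);
  if (runTests(test, fix) !== "green") {
    throw new Error("GREEN failed: the fix does not pass the test");
  }

  // REFACTOR: keep the refactored version only if the test stays green.
  const refactored = agent.refactor(fix);
  return runTests(test, refactored) === "green" ? refactored : fix;
}

// Stub agent and runner, for demonstration only.
const stubAgent: CodingAgent = {
  writeFailingTest: (report) => `expect(fn()).toBe(1) // ${report}`,
  writeMinimalFix: () => "function fn(){return 1}",
  refactor: (impl) => impl.replace("function fn(){return 1}", "const fn = () => 1;"),
};
const stubRunner: TestRunner = (_test, impl) => (impl === null ? "red" : "green");

console.log(runTddCycle(stubAgent, stubRunner, "off-by-one in pagination"));
// prints: const fn = () => 1;
```

The important design choice is the RED check: if the new test passes before any fix exists, it is not reproducing the bug, and the loop should stop rather than continue on a false premise.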

4. Engineering "Taste": How Agents Quantify Clean Code

Because an AI cannot look at code and inherently feel its elegance, agentic architectures substitute human intuition with quantitative software metrics and multi-agent adversarial critique. During the Refactor phase, the agent validates its structural choices against three distinct pillars:

I. Static Metric Gates

An agent evaluates its refactored code by invoking native AST (Abstract Syntax Tree) analysis tools to measure structural improvements:

Cyclomatic Complexity: Keeping the number of independent branching paths through each function low.
Cognitive Complexity & Nesting Depth: Reducing nested loops and conditional blocks to manageable levels.
Duplication Density: Using token scanners to ensure copy-pasted blocks are modularized under the DRY (Don't Repeat Yourself) principle.
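As a rough illustration of a metric gate, the sketch below approximates two of these numbers with plain text scanning. This is deliberately not a real AST analysis: the regex can be fooled by keywords inside strings or comments, it ignores ternaries, and the gate rule ("no metric may get worse") is an assumption for the example. A production setup would rely on an AST-based linter instead.

```typescript
// Crude, text-based approximations for illustration only. A real gate would
// use an AST-based tool; these counts can be fooled by strings and comments.

interface CodeMetrics {
  complexity: number; // approximate cyclomatic complexity
  nesting: number;    // maximum brace nesting depth
}

function approxCyclomaticComplexity(source: string): number {
  // One base path, plus one per decision point found in the text.
  const decisionPoints = source.match(/\b(if|for|while|case|catch)\b|&&|\|\|/g);
  return 1 + (decisionPoints ? decisionPoints.length : 0);
}

function maxNestingDepth(source: string): number {
  let depth = 0;
  let max = 0;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "{") {
      depth += 1;
      max = Math.max(max, depth);
    } else if (ch === "}") {
      depth = Math.max(0, depth - 1);
    }
  }
  return max;
}

function measureCode(source: string): CodeMetrics {
  return { complexity: approxCyclomaticComplexity(source), nesting: maxNestingDepth(source) };
}

// Assumed gate rule: the refactor must not make either metric worse.
function passesMetricGate(before: string, after: string): boolean {
  const b = measureCode(before);
  const a = measureCode(after);
  return a.complexity <= b.complexity && a.nesting <= b.nesting;
}

const nestedVersion = `function price(u) {
  if (u) {
    if (u.active) {
      if (u.premium) { return 5; }
    }
  }
  return 10;
}`;

const guardClauseVersion = `function price(u) {
  if (!u) return 10;
  if (!u.active) return 10;
  if (!u.premium) return 10;
  return 5;
}`;

console.log(measureCode(nestedVersion));      // { complexity: 4, nesting: 4 }
console.log(measureCode(guardClauseVersion)); // { complexity: 4, nesting: 1 }
console.log(passesMetricGate(nestedVersion, guardClauseVersion)); // true
```

Note what the numbers show: rewriting with guard clauses leaves the cyclomatic complexity unchanged (the same three decisions exist) while the nesting depth drops from 4 to 1. That is why a single metric is rarely enough for a gate.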

II. Runtime Telemetry Profiles

Excellent code isn't just readable; it's performant. Advanced agent loops run isolated benchmarks comparing the micro-performance of the pre-refactored code against the new architecture, analyzing execution time deltas and memory allocations.
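A before/after timing comparison might look like the following sketch. It covers execution time only, not memory allocations, and it assumes a runtime that exposes the global `performance.now()` (modern browsers and recent Node.js versions do). `compareRuntime` and the two sample functions are invented for this example.

```typescript
// Illustrative micro-benchmark. Timings from a harness this simple are noisy;
// JIT warm-up and GC can dominate, so treat results as a rough signal only.

interface BenchmarkResult {
  baselineMs: number;  // median time of the pre-refactor function
  candidateMs: number; // median time of the refactored function
  runs: number;
}

function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

function timeOnce(fn: () => void): number {
  const start = performance.now();
  fn();
  return performance.now() - start;
}

function compareRuntime(baseline: () => void, candidate: () => void, runs = 20): BenchmarkResult {
  const baselineTimes: number[] = [];
  const candidateTimes: number[] = [];
  for (let i = 0; i < runs; i++) {
    // Interleave the two so that drift affects both equally.
    baselineTimes.push(timeOnce(baseline));
    candidateTimes.push(timeOnce(candidate));
  }
  return { baselineMs: median(baselineTimes), candidateMs: median(candidateTimes), runs };
}

// Sample pair: same result, different allocation behavior.
function buildWithConcat(n: number): number[] {
  let out: number[] = [];
  for (let i = 0; i < n; i++) out = out.concat(i); // copies the array each time
  return out;
}

function buildWithPush(n: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < n; i++) out.push(i);
  return out;
}

const report = compareRuntime(() => buildWithConcat(2000), () => buildWithPush(2000), 5);
console.log(report);
```

The median is used instead of the mean so that a single slow run, for example one interrupted by garbage collection, does not skew the comparison.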

III. The "Shadow Tech Lead" (Adversarial Critique)

Never let the agent that wrote the code act as the sole judge of its quality. Modern workflows deploy a multi-agent split:

The Coder Agent: Focuses entirely on implementation and passing the test suite.
The Reviewer Agent: Prompted with an aggressively critical persona, acting as a Tech Lead. It reviews the code diff against strict style guidelines, flagging unhandled promises, semantic naming issues, or missing documentation. The refactoring step is only deemed successful when the Reviewer Agent grants an automated LGTM (Looks Good To Me) and all regression tests remain green.
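The acceptance rule (reviewer approval and green regression tests, both required) can be sketched as a bounded loop. All names here (`Coder`, `Reviewer`, `refactorWithReview`) are hypothetical, and the stub reviewer that only looks for `: any` is a stand-in for a real LLM-backed review.

```typescript
// Hypothetical roles and types: real systems would back `Coder` and `Reviewer`
// with LLM calls. Here they are plain functions so the control flow is visible.

interface Review {
  verdict: "LGTM" | "CHANGES_REQUESTED";
  findings: string[];
}

type Coder = (previousDiff: string | null, findings: string[]) => string; // returns a diff
type Reviewer = (diff: string) => Review;
type RegressionSuite = (diff: string) => boolean; // true when all tests pass

function refactorWithReview(
  code: Coder,
  review: Reviewer,
  testsPass: RegressionSuite,
  maxRounds = 3
): string | null {
  let diff: string | null = null;
  let findings: string[] = [];
  for (let round = 0; round < maxRounds; round++) {
    diff = code(diff, findings);
    const result = review(diff);
    const green = testsPass(diff);
    // Accept only when BOTH gates hold: reviewer approval and green tests.
    if (result.verdict === "LGTM" && green) return diff;
    findings = green ? result.findings : result.findings.concat("Regression tests failed.");
  }
  return null; // no accepted refactor within the budget; escalate to a human
}

// Stub reviewer: rejects any diff that still contains an `any` annotation.
const strictReviewer: Reviewer = (diff) =>
  diff.indexOf(": any") !== -1
    ? { verdict: "CHANGES_REQUESTED", findings: ["Replace `any` with `unknown`."] }
    : { verdict: "LGTM", findings: [] };

// Stub coder: first attempt uses `any`, later attempts apply the feedback.
const stubCoder: Coder = (previous, findings) =>
  previous === null || findings.length === 0
    ? "+ function parse(input: any) {}"
    : previous.replace(": any", ": unknown");

console.log(refactorWithReview(stubCoder, strictReviewer, () => true));
// prints: + function parse(input: unknown) {}
```

The `maxRounds` cap matters: without it, two agents that disagree can loop indefinitely, so returning `null` and handing the decision to a human is the safer default.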

Summary: Shifting the Paradigm

The developers who will thrive in this agentic era are not those who treat LLMs like magical oracles, but those who treat them like highly volatile, extremely fast execution engines that require rigid rails. By wrapping your AI agents in structural blueprints (ASCII), rigid project covenants (CLAUDE.md), and strict automated loops (TDD), you turn unpredictable "Vibe Coding" into a disciplined, high-velocity engineering pipeline. The future of software engineering isn't about writing code faster; it's about building the systems that ensure AI writes code correctly.