Human Judgment as a Specification
Posted on: 09 June 2026
Tags: #LargeLanguageModels, #Semantics, #Tools, #LinearTemporalLogic
The Intersection of GenAI and Formalism
The rapid spread of generative AI in software engineering calls for a matching increase in the use of formal methods. As AI systems are deployed with increasing autonomy, we need a mechanism to verify that these "wild" systems actually deliver the outcomes we intend.
To leverage the power of formal verification tools, the specification must be expressed mathematically. However, a significant skill gap exists:
The Dilemma: The average programmer possesses far more expertise in writing code than in constructing formal mathematical specifications.
The Translation Problem
The core challenge is the transition from informal descriptions (natural language prose) to formal specifications (mathematical logic).
We can represent this desired transformation as a function that maps an informal description to a formal specification.
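One way to picture the shape of that function is as an OCaml type. This is only a sketch under stated assumptions: the names `informal`, `formal`, `translate`, and `toy_translate` are hypothetical, and modeling a specification as a predicate over input/output lists is a simplification chosen for illustration.

```ocaml
(* Hypothetical names throughout; none of them come from the post. *)

(* An informal description is just prose. *)
type informal = string

(* A formal specification, modeled here as a predicate that accepts or
   rejects an (input, output) pair of a list function. *)
type 'a formal = 'a list -> 'a list -> bool

(* The desired transformation. It returns an option because some prose
   may be too vague to translate at all. *)
type 'a translate = informal -> 'a formal option

(* A toy translator that recognizes exactly one phrase. It exists only
   to show the shape of the function, not to solve the problem. *)
let toy_translate : int translate = function
  | "the output is empty" -> Some (fun _ output -> output = [])
  | _ -> None
```

The hard part is everything the toy version leaves out: a real translator must cope with arbitrary prose, and nothing in the type guarantees that the predicate it returns is the one the author meant.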
The "Incorrect" Approach to Specification
A tempting, yet flawed, solution is to utilize Large Language Models (LLMs) to perform this translation. While LLMs are surprisingly adept at generating syntax for various formal notations, this approach is fraught with danger.
The Minsky-Eisenberg Debate
The tension regarding this method is captured in the following exchange:
Ron Minsky (suggesting a lighthearted approach): "I wonder if a more plausible model is, you go to your large language model and say, ‘Please write me a specification for a function that sorts a list.’ And then it, like, spits something out. And then you look at it and think, yeah, that seems about right."
Richard Eisenberg (the critical counterpoint): "How can we be sure that the generated specification is the right one?"
Why "Looking Right" Isn't Enough
Simply glancing at a generated spec is insufficient. The risks include:
| Risk Factor | Description | Impact |
|---|---|---|
| Subtle Errors | The spec looks correct but contains a logical flaw. | False sense of security. |
| Ambiguity | The prose was vague; the LLM picked the wrong interpretation. | System meets the wrong goal. |
| Misconceptions | The LLM inherits common myths about the language. | Fundamental logic errors. |
| Lack of Ground Truth | The requirement exists only in the user's mind. | No objective way to verify. |
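To make the "Subtle Errors" row concrete, here is a minimal OCaml sketch. The names `weak_spec_sort` and `bogus_sort` are hypothetical. The spec asks only that the output be sorted and forgets that the output must contain the same elements as the input, which is an easy omission to miss at a glance.

```ocaml
(* True when the list is in non-decreasing order. *)
let rec is_sorted = function
  | [] | [_] -> true
  | x :: (y :: _ as rest) -> x <= y && is_sorted rest

(* A spec that "seems about right": it only demands sorted output and
   never relates the output back to the input. *)
let weak_spec_sort (_input : int list) (output : int list) : bool =
  is_sorted output

(* A useless "sort" that throws its input away. *)
let bogus_sort (_ : int list) : int list = []

let () =
  let input = [3; 1; 2] in
  (* Prints "true": the flawed spec accepts the bogus implementation. *)
  Printf.printf "%b\n" (weak_spec_sort input (bogus_sort input))
```

A verifier given `weak_spec_sort` would certify `bogus_sort` as correct, which is exactly the false sense of security the table describes.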
Visualizing the Failure Pipeline
Potential Failure Checklist
- Is the specification ambiguous?
- Does the LLM have a misconception about LTL or its semantics?
- Is there a clear "ground truth" for this requirement?
- Are there subtle edge cases the LLM ignored?
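The first question on the checklist can be illustrated with a small OCaml sketch. Suppose the prose says only "return the elements of the list in ascending order". The two functions below (hypothetical names) are both defensible readings of that sentence, yet they disagree as soon as the input contains duplicates.

```ocaml
(* Reading 1: keep every element, duplicates included. *)
let sort_keep_duplicates (l : int list) : int list =
  List.sort compare l

(* Reading 2: treat the input as a set and drop duplicates. *)
let sort_drop_duplicates (l : int list) : int list =
  List.sort_uniq compare l

let () =
  let input = [2; 1; 2] in
  let show l = String.concat "; " (List.map string_of_int l) in
  (* Prints "[1; 2; 2] vs [1; 2]" *)
  Printf.printf "[%s] vs [%s]\n"
    (show (sort_keep_duplicates input))
    (show (sort_drop_duplicates input))
```

An LLM asked to formalize the sentence has to pick one reading, and the prose alone gives no way to tell whether it picked the intended one.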
Example: The "Sorting" Specification
If we asked an LLM to provide a formal specification for a sorting function, it might produce something like this in an OCaml-style, pseudo-formal notation:
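```ocaml
(* Hypothetical Formal Spec for Sort *)

(* `rec` is required because is_sorted calls itself. *)
let rec is_sorted (l : 'a list) =
  match l with
  | [] | [_] -> true
  | x :: y :: tl -> x <= y && is_sorted (y :: tl)

let is_permutation (l1 : 'a list) (l2 : 'a list) =
  (* Logic to ensure elements are the same *)
  ...

(* The output must be ordered and must contain exactly the input's elements. *)
let spec_sort (input : 'a list) (output : 'a list) =
  is_sorted output && is_permutation input output
```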
While the code above seems correct, the process of relying on an LLM to generate it without a rigorous verification framework is where the danger lies.
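One lightweight check, offered as an illustration and not as the verification framework itself, is to run a candidate spec against a handful of input/output pairs that a human has already judged. The sketch below makes several assumptions: the body of `is_permutation` (sort both lists and compare) is a stand-in for the logic elided above, and `judged_cases` and `disagreements` are hypothetical names.

```ocaml
let rec is_sorted = function
  | [] | [_] -> true
  | x :: (y :: _ as rest) -> x <= y && is_sorted rest

(* Assumed definition: two lists are permutations of each other when
   they are equal after sorting. *)
let is_permutation l1 l2 = List.sort compare l1 = List.sort compare l2

let spec_sort input output =
  is_sorted output && is_permutation input output

(* Hand-labeled cases: (input, output, should the spec accept it?). *)
let judged_cases = [
  ([3; 1; 2], [1; 2; 3], true);   (* a correct sort *)
  ([3; 1; 2], [],        false);  (* sorted, but elements were lost *)
  ([2; 2; 1], [1; 2],    false);  (* a duplicate was dropped *)
  ([],        [],        true);   (* the empty list is already sorted *)
]

(* The cases on which the spec and the human verdict disagree. *)
let disagreements spec cases =
  List.filter (fun (input, output, expected) ->
    spec input output <> expected) cases

let () =
  (* Prints "0 disagreement(s)" for spec_sort on these cases. *)
  Printf.printf "%d disagreement(s)\n"
    (List.length (disagreements spec_sort judged_cases))
```

Agreement on four cases proves nothing about the spec in general, but a disagreement is a cheap and concrete signal that the spec and the human's intent have diverged. For instance, a spec that only checks `is_sorted output` would disagree with the second and third cases.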
