(How to Write a (Lisp) Interpreter (In Python)) (2010)
Guide: Constructing a (Lisp) Interpreter (Using Python) (2010)
This resource serves a dual purpose: providing a general overview of how to implement interpreters for programming languages and specifically guiding the creation of an interpreter for the majority of the Scheme Lisp dialect, utilizing Python 3 as the host language.
Years ago, I demonstrated how to build a semi-practical Scheme interpreter using Java and Common Lisp.
The objective here is to present, with maximum simplicity and brevity, what Alan Kay described as the "Maxwell's Equations of Software." But why is this pursuit significant?
"If you don't know how compilers work, then you don't know how computers work." — Steve Yegge
Yegge identifies eight distinct problems that can be addressed via compilers (or interpreters, though Yegge often adds a layer of cynicism to his descriptions).
Syntax vs. Semantics in Scheme
To understand interpreters, we must distinguish between two concepts:
- Syntax: The specific arrangement of characters required to form valid expressions or statements.
- Semantics: The actual meaning or logic behind those expressions.
For instance, in mathematical notation, the syntax for adding one and two is 1 + 2. The semantics is the act of performing addition on those two integers, which results in the value 3. In formal terms, we say the expression evaluates as:
The Uniqueness of Scheme Syntax
Scheme deviates significantly from the norms of most mainstream languages. Compare the following logic:
Java Implementation:
if (x.val() > 0) {
return fn(A[i] + 3 * i, new String[] {"one", "two"});
}
Scheme Implementation:
(if (> (val x) 0)
(fn (+ (aref A i) (* 3 i))
(quote (one two))))
While Java employs a complex array of syntactic rules—including infix operators, three different bracket types, operator precedence, semicolons, and dot notation—Scheme is radically simpler. In Scheme, everything is an expression.
- Atomic Expressions: These are indivisible units. In Scheme, these include numbers and symbols. Notably, operators like
+are treated as symbols, just likefnorA. - List Expressions: These consist of an opening parenthesis
(, followed by zero or more expressions, and a closing parenthesis).
The first element of a list defines its behavior. If it starts with a keyword (e.g., (if ...)), it is known as a special form.
Complexity Comparison
The elegance of Scheme lies in its minimalism.
| Language | Keywords | Syntactic Forms |
|---|---|---|
| Scheme | 5 | 8 |
| Python | 33 | 110 |
| Java | 50 | 133 |
Though the abundance of parentheses can be daunting, it provides unmatched consistency. Some joke that Lisp stands for "Lots of Irritating Silly Parentheses," but it more accurately stands for "Lisp Is Syntactically Pure."
The Implementation Roadmap
To build the full interpreter, we will proceed in two distinct phases:
- Phase 1: Define a stripped-down version called the Lispy Calculator.
- Phase 2: Expand this into a near-complete Scheme language.
Language 1: The Lispy Calculator
The Lispy Calculator is a minimal subset of Scheme. It supports only five syntactic forms: two atomic types, two special forms, and the procedure call. It can handle any calculation a standard calculator can, provided you use prefix notation. Additionally, it introduces variable definitions and conditional logic.
Example: Calculating the area of a circle () with
(define r 10)
(* pi (* r r))
Expression Reference Table
| Expression | Syntax | Semantics & Example |
|---|---|---|
| Variable Reference | symbol | The symbol is treated as a name; it evaluates to the variable's stored value. <br> Ex: r 10 |
| Constant | number | A literal number evaluates to itself. <br> Ex: 12 12 |
| Conditional | (if test conseq alt) | Evaluates test. If true, returns conseq; otherwise, returns alt. <br> Ex: (if (< 10 20) (+ 1 1) (+ 3 3)) 6 |
| Definition | (define symbol exp) | Creates a new variable symbol and assigns it the value of exp. |
| Procedure Call | (proc args...) | If proc isn't if, define, or quote, it's a function. Evaluate proc and args, then apply the function. <br> Ex: (sqrt (* 2 8)) 4.0 |
Anatomy of a Language Interpreter
An interpreter is divided into two primary stages:
- Parsing: This component converts a raw string of characters into an internal representation. It verifies the syntax and typically produces an Abstract Syntax Tree (AST), which mirrors the nested structure of the code.
- Note: A compiler also starts with an AST but translates it into a series of machine-executable instructions rather than executing it directly.
- Execution: The AST is processed according to the language's semantic rules. In our implementation, the execution function is named
eval(which overrides Python's nativeevalfunction).
The Interpretation Pipeline
Practical Example:
If we provide a program like (begin (def..., the interpreter will parse the string into a tree and then eval that tree to produce the final output.