So far, we have implemented a number of fundamental datatypes in our interpreter, e.g., numbers and booleans. While each has introduced new syntactic forms, operators, and typechecking and evaluation rules, they have all been “simple” to implement in the sense that they have not required heavy modification to our infrastructure. Indeed, more complex language features require that we make additional changes to the architecture of our interpreter. We will spend the remainder of this course examining such features in depth.
The first of these features are variables, i.e., storage locations that contain values. Variables are cornerstones of any real programming language—without them, we cannot write parameterized code. However, to add variables to our language, we will need to both extend the syntactic forms of our language as well as enhance typechecking and evaluation to account for this new behavior.
Statements Versus Expressions
To implement variables, we must provide ways to (a) introduce variables into a program and (b) use variables that are in scope, i.e. available for use at a particular point in our program. Most variables provide a variable initialization construct that allows you to introduce a new variable. For example, in C:
int sum(int x, int y) {
int ret = x + y;
return x + y;
}
Importantly, in these languages, the variable initialization is not an expression! To observe this, you can using a variable initialization construct in a position where an expression is expected:
// Error: expected an expression, found a statement
("%d", 1 + (int z = x + y)); printf
Variable initialization constructs in most languages do not produce a value as a result of their execution. Instead, they produce an effect in our code, in this case, allocating space for a new storage location denoted by the variable. In this respect, they are our first example of a different syntactic class in our programs, statements.
Definition (Statements): statements are program fragments that do not produce values. Instead, they produce effects when executed.
In many programming languages, a program is usually defined to be a collection of (top-level) statements. For example, in this Python program:
= 10
x
def foo(y):
return x + y
print(foo(5))
We have three different statements:
- A variable definition.
- A function definition.
- An expression statement.
The final statement is interesting in that it shows that many languages allow for expressions to appear as statements but not vice versa! When used as a statement, the final value that the expression evaluates to is ignored. Thus, an expression statement is only useful as long as the expression produces side-effects!
Variable and Environments
With this in mind, let’s add variable declarations as
statements in our language. In our formal grammar, we’ll use the
metavariable s
to denote statements and
e
to denote expressions. A program prog
in our language is simply a collection of statements that will be
executed in-order.
s ::= (define x e) | (display e)
e ::= x | n | (+ e1 e2)
v ::= n
prog ::= s1 ... sk
The statement form (define x e)
initializes a
variable x
to contain the result of evaluating
e
. We’ll add an additional form,
(display e)
, that is, effectively, our “expression
statement.” When we execute a display
statement, we
print to the console the result of evaluating e
.
Extending our AST to handle statements is straightforward. We
need to include a new AST type, Stmt
, for statements.
A Prog
is, therefore, a list of
Stmt
s:
type Stmt = { tag: 'define', id: string, e: Exp }
type Disp = { tag: 'display', e: Exp }
type Prog = Stmt[]
// Along with appropriate constructor definitions...
When parsing, we must not just parse a single s-expression
corresponding to a single expression, we must now parse a list of
s-expressions corresponding to a complete program. We also need to
add functions to parse our statements and program constructs,
parseStmt
and parseProg
.
Once these preliminaries are out of the way, we can focus on how we integrate variables into evaluation. Importantly, we need a way to track the value of variables as they are initialized throughout our program. To do so, we will need to maintain an additional data structure during evaluation that captures precisely this information.
We call this structure an environment, a mapping from
variables to their values during evaluation. We can implement
environments using a mapping data structure, such as a hash map.
In Javascript/Typescript, the built-in Map
datatype
serves this purpose:
type Env = Map<string, Value>
To add environments to evaluation:
We add the current
Env
as a parameter to ourevaluate
function:function evaluate (env: Env, e: Exp): Value { /* ... */ }
Most of our syntactic forms ignores
env
, passing it along as needed through any recursive calls.When executing a
define
statement,(define x e)
, we updateEnv
with a new entry:x
is bound to the value resulting from evaluatinge
.We evaluate a variable expression
x
by looking up whatx
is bound to instead ofenv
.
The first and third tasks are straightforward to implement. To complete the second task, we need to enhance our interpreter to handle statements. Because statements do not return values, it doesn’t make sense to “evaluate” them. Instead, we execute statements, producing effects in our program.
We can add an execute
function to our interpreter
for this purpose:
function execute (env:Env, s: Stmt) { /* ... */ }
execute
proceeds by case analysis on the kind of
Stmt
that we have:
(display e)
executes by first evaluatinge
to a value and then printing that value to the screen. In Javascript/Typescript, we can achieve this effect withconsole.log
.(define x e)
executes by evaluatinge
to a valuev
, then updating theEnv
so thatx
is bound tov
.
Functions
Another place where variables arise in computer programs are functions. Virtually every programming languages has facilities for declaring and calling functions. Traditionally, function declarations are statements that occur at the top-level. But following in the footsteps of functional languages, many modern languages feature some kind of expression-based function declaration form, i.e., a “lambda” (from the Lambda calculus). Consequently, functions in these languages are “first-class” citizens, i.e., functions are values.
Let’s add first-class functions to our language along with function calls, i.e., function application, to our set of expressions:
e ::= ... | (lambda x e) | (e1 e2)
v ::= ... | (lambda x e)
Note that the lambda
form is, itself, a value
since it will not take any further steps of evaluation. A function
application (e1 e2)
, on the other hand, evaluates as
follows:
- Evaluate
e1
to alambda
value,(lambda x e)
. Ife1
does not evaluate to alambda
, then we should raise a (dynamic) type error. - Evaluate
e2
to a value, call itv
. - Evaluate the body of the function
e
to a value wherex
is bound tov
(the valuee2
evaluated to). This final value is the result of the function call.
We used an environment to record the values of global variables
bound with define
. We can also use environments with
function parameters. However, we must be careful to respect the
stack-like discipline of function calls! For example,
consider the following code:
define f
(lambda (x)
(+ x y)))
(
define g
(lambda (y)
(+ 1 (f 5))))
(
10) (g
A common beginner mistake when learning about functions is
believing that because g
calls f
,
f
can directly access g
’s parameter,
y
. This is not possible because the scope of
y
is the body of g
. However, if we
aren’t careful, we may evaluate f
in an environment
that includes g
’s definition of y
!
Before working through how we might implement environment-based evaluation for function application, let’s take a look at an alternative that is arguably easier to implement although more limited in its ability to cope with other language features, in particular, mutation. Rather than using an environment to remember the values of variables, we can instead immediately substitute the value for a variable once we know it. For example, consider the following code:
define f
(lambda (x)
(+ x 1)))
(
+ 1 2) (f (
To evaluate the expression (f 1 2)
using
substitution:
- We first observe that
f
is a variable, so we look up its corresponding value in the environment, alambda
. - We then evaluate the argument
(+ 1 2)
to a value,3
. - We then evaluate the body of the lambda
(+ x 1)
, substituting argument3
everywhere the parameterx
occurs in the body. Thus, we evaluate(+ 3 1)
which evaluates to4
. Consequently4
is the value produced by the function call.
The critical operation here is the substitution of a value for a variable in an expression (presumably, the body of the function). We can implement this as a simple recursive function over the expression that we substitute for:
/** @returns `e` but with `x` substituted for `v` everywhere `x` occurs in `e`. */
function substitute (v: Value, x: string, e: Exp): Exp { /* ... */ }
- In almost all cases, substitution simply recursively
substitutes into the expression’s sub-components. For example, to
substitute in a plus expression of the form
(+ e1 e2)
, we return(+ e1' e2')
wheree1'
ande2'
are the results of substituting intoe1
ande2
, respectfully. - When the expression being substituted for is a variable, we
have work to do!
- If the variable is equal to
x
, then we replace it withv
. - Otherwise, the variable is not equal to
x
so we leave it alone.
- If the variable is equal to