Back to CSC 312 (Spring 2023)

So far, we have implemented a number of fundamental datatypes in our interpreter, e.g., numbers and booleans. While each has introduced new syntactic forms, operators, and typechecking and evaluation rules, they have all been “simple” to implement in the sense that they have not required heavy modification to our infrastructure. Indeed, more complex language features require that we make additional changes to the architecture of our interpreter. We will spend the remainder of this course examining such features in depth.

The first of these features are variables, i.e., storage locations that contain values. Variables are cornerstones of any real programming language—without them, we cannot write parameterized code. However, to add variables to our language, we will need to both extend the syntactic forms of our language as well as enhance typechecking and evaluation to account for this new behavior.

Statements Versus Expressions

To implement variables, we must provide ways to (a) introduce variables into a program and (b) use variables that are in scope, i.e. available for use at a particular point in our program. Most variables provide a variable initialization construct that allows you to introduce a new variable. For example, in C:

int sum(int x, int y) {
    int ret = x + y;
    return x + y;
}

Importantly, in these languages, the variable initialization is not an expression! To observe this, you can using a variable initialization construct in a position where an expression is expected:

// Error: expected an expression, found a statement
printf("%d", 1 + (int z = x + y));

Variable initialization constructs in most languages do not produce a value as a result of their execution. Instead, they produce an effect in our code, in this case, allocating space for a new storage location denoted by the variable. In this respect, they are our first example of a different syntactic class in our programs, statements.

Definition (Statements): statements are program fragments that do not produce values. Instead, they produce effects when executed.

In many programming languages, a program is usually defined to be a collection of (top-level) statements. For example, in this Python program:

x = 10

def foo(y):
    return x + y

print(foo(5))

We have three different statements:

A variable definition.
A function definition.
An expression statement.

The final statement is interesting in that it shows that many languages allow for expressions to appear as statements but not vice versa! When used as a statement, the final value that the expression evaluates to is ignored. Thus, an expression statement is only useful as long as the expression produces side-effects!

Variable and Environments

With this in mind, let’s add variable declarations as statements in our language. In our formal grammar, we’ll use the metavariable s to denote statements and e to denote expressions. A program prog in our language is simply a collection of statements that will be executed in-order.

s    ::= (define x e) | (display e)
e    ::= x | n | (+ e1 e2)
v    ::= n
prog ::= s1 ... sk

The statement form (define x e) initializes a variable x to contain the result of evaluating e. We’ll add an additional form, (display e), that is, effectively, our “expression statement.” When we execute a display statement, we print to the console the result of evaluating e.

Extending our AST to handle statements is straightforward. We need to include a new AST type, Stmt, for statements. A Prog is, therefore, a list of Stmts:

type Stmt = { tag: 'define', id: string, e: Exp }
type Disp = { tag: 'display', e: Exp }
type Prog = Stmt[]
// Along with appropriate constructor definitions...

When parsing, we must not just parse a single s-expression corresponding to a single expression, we must now parse a list of s-expressions corresponding to a complete program. We also need to add functions to parse our statements and program constructs, parseStmt and parseProg.

Once these preliminaries are out of the way, we can focus on how we integrate variables into evaluation. Importantly, we need a way to track the value of variables as they are initialized throughout our program. To do so, we will need to maintain an additional data structure during evaluation that captures precisely this information.

We call this structure an environment, a mapping from variables to their values during evaluation. We can implement environments using a mapping data structure, such as a hash map. In Javascript/Typescript, the built-in Map datatype serves this purpose:

type Env = Map<string, Value>

To add environments to evaluation:

We add the current Env as a parameter to our evaluate function:
```
function evaluate (env: Env, e: Exp): Value { /* ... */ }
```
Most of our syntactic forms ignores env, passing it along as needed through any recursive calls.
When executing a define statement, (define x e), we update Env with a new entry: x is bound to the value resulting from evaluating e.
We evaluate a variable expression x by looking up what x is bound to instead of env.

The first and third tasks are straightforward to implement. To complete the second task, we need to enhance our interpreter to handle statements. Because statements do not return values, it doesn’t make sense to “evaluate” them. Instead, we execute statements, producing effects in our program.

We can add an execute function to our interpreter for this purpose:

function execute (env:Env, s: Stmt) { /* ... */ }

execute proceeds by case analysis on the kind of Stmt that we have:

(display e) executes by first evaluating e to a value and then printing that value to the screen. In Javascript/Typescript, we can achieve this effect with console.log.
(define x e) executes by evaluating e to a value v, then updating the Env so that x is bound to v.

Functions

Another place where variables arise in computer programs are functions. Virtually every programming languages has facilities for declaring and calling functions. Traditionally, function declarations are statements that occur at the top-level. But following in the footsteps of functional languages, many modern languages feature some kind of expression-based function declaration form, i.e., a “lambda” (from the Lambda calculus). Consequently, functions in these languages are “first-class” citizens, i.e., functions are values.

Let’s add first-class functions to our language along with function calls, i.e., function application, to our set of expressions:

e ::= ... | (lambda x e) | (e1 e2)
v ::= ... | (lambda x e)

Note that the lambda form is, itself, a value since it will not take any further steps of evaluation. A function application (e1 e2), on the other hand, evaluates as follows:

Evaluate e1 to a lambda value, (lambda x e). If e1 does not evaluate to a lambda, then we should raise a (dynamic) type error.
Evaluate e2 to a value, call it v.
Evaluate the body of the function e to a value where x is bound to v (the value e2 evaluated to). This final value is the result of the function call.

We used an environment to record the values of global variables bound with define. We can also use environments with function parameters. However, we must be careful to respect the stack-like discipline of function calls! For example, consider the following code:

(define f
  (lambda (x)
    (+ x y)))

(define g
  (lambda (y)
    (+ 1 (f 5))))

(g 10)

A common beginner mistake when learning about functions is believing that because g calls f, f can directly access g’s parameter, y. This is not possible because the scope of y is the body of g. However, if we aren’t careful, we may evaluate f in an environment that includes g’s definition of y!

Before working through how we might implement environment-based evaluation for function application, let’s take a look at an alternative that is arguably easier to implement although more limited in its ability to cope with other language features, in particular, mutation. Rather than using an environment to remember the values of variables, we can instead immediately substitute the value for a variable once we know it. For example, consider the following code:

(define f
  (lambda (x)
    (+ x 1)))

(f (+ 1 2)

To evaluate the expression (f 1 2) using substitution:

We first observe that f is a variable, so we look up its corresponding value in the environment, a lambda.
We then evaluate the argument (+ 1 2) to a value, 3.
We then evaluate the body of the lambda (+ x 1), substituting argument 3 everywhere the parameter x occurs in the body. Thus, we evaluate (+ 3 1) which evaluates to 4. Consequently 4 is the value produced by the function call.

The critical operation here is the substitution of a value for a variable in an expression (presumably, the body of the function). We can implement this as a simple recursive function over the expression that we substitute for:

/** @returns `e` but with `x` substituted for `v` everywhere `x` occurs in `e`. */
function substitute (v: Value, x: string, e: Exp): Exp { /* ... */ }

In almost all cases, substitution simply recursively substitutes into the expression’s sub-components. For example, to substitute in a plus expression of the form (+ e1 e2), we return (+ e1' e2') where e1' and e2' are the results of substituting into e1 and e2, respectfully.
When the expression being substituted for is a variable, we have work to do!
- If the variable is equal to x, then we replace it with v.
- Otherwise, the variable is not equal to x so we leave it alone.