CSC 208: Discrete Mathematics (Fall 2024)

Instructor: Peter-Michael Osera
Office Hours: Science 2811, by appointment
Course Mentor: Peter Versh
Meeting Times: MWF 3:00–3:50, Science 3815

What are the mathematical foundations of computer science? How does mathematical formalism relate to the pragmatics of computer science? In this course, we study discrete mathematics, broadly the branches of mathematics that study discrete objects, and their applications towards computer science.

By understanding discrete mathematics deeply, we, in turn, gain an understanding of how mathematics informs our studies as computer scientists, namely:

We solve problems in computer science by modeling domains of problems, a process that is, at its core, mathematical.
The interpretation of the syntax and semantics of mathematics is identical to the interpretation of a programming language, so we can leverage our understanding of programming to learn mathematics rapidly.
There is a spectrum of reasoning between absolute mathematical formalism and informal reasoning, a spectrum that we must move across at will as competent programmers.

Finally, by studying discrete mathematics in depth and relating it to our experiences as computer programmers, we also gain expertise and comfort in studying mathematics as a discipline of modeling and problem-solving.

Discrete Structures (Fall 2024)

Overview

By understanding discrete mathematics deeply, we, in turn, gain an understanding of how mathematics informs our studies as computer scientists, namely:

We solve problems in computer science by modeling domains of problems, a process that is, at its core, mathematical.
The interpretation of the syntax and semantics of mathematics is identical to the interpretation of a programming language, so we can leverage our understanding of programming to learn mathematics rapidly.
There is a spectrum of reasoning between absolute mathematical formalism and informal reasoning, a spectrum that we must move across at will as competent programmers.

Course Learning Outcomes

Mathematical Literacy
- Reading: read and comprehend mathematical text involving both symbols and prose.
- Writing: author arguments of varying mathematical rigor with fundamental proof techniques (i.e., constructive, inductive, contradictory, and equivalence-based arguments).
- Analyzing: critically analyze rigorous mathematical arguments for latent assumptions and missing detail.
- Problem-solving: employ the concrete-to-abstract method to efficiently solve mathematical problems.
Mathematical Modeling
- Objects: model real-world phenomena using relevant objects drawn from discrete mathematics (i.e., sets, relations, random variables, and graphs).
- Properties: formally state and prove relevant properties of mathematical models using propositional and first-order logic.
Mathematical Computation
- Combinatorics: count the number of elements in a finite algebraic structure using combinatorial principles.
- Probability: compute the probability of an event using combinatorial principles.
- Graphs: carry out the execution of fundamental graph algorithms by hand, e.g., traversals, spanning trees, and paths.
Program Reasoning
- Operational correctness: state and reason about formal properties of programs using operational semantics.
- Complexity: count the number of relevant operations a (potentially recursive) program performs.
- Algorithmic correctness: state and reason about formal properties of algorithms (specified in pseudocode) using tools drawn from discrete mathematics.
- Constructive design: translate between a (constructive) proof of a property and a program that enjoys that property by design.
Mathematical Soft Skills
- Collaboration: employ appropriate collaborative strategies to productively solve problems with peers.
- Practice: habitualize learning mathematics through self-driven, hands-on exploration and problem-solving practice.

Core Outcomes

Core learning outcomes are the fundamental skills that you should be able to confidentially perform quickly and efficiently by the end of the course. They are asssessed via the core exams we conduct throughout the semester:

Weeks 1–4
- Author a formal proof of a property of a pure program.
- Author a formal proof by structural induction.
- Author a formal proof by mathematical induction.
- Model propositions rigorously in terms of first-order logic.
- Author a formal proof of a abstract proposition in propositional/first-order logic.
Weeks 5–9
- Author a rigorous proof of the equality of two sets.
- Author a rigorous proof utilizing classical reasoning (“proof by contradiction”).
- Model real-world phenomena using the fundamental definitions of relations.
- Model real-world phenomena using the formal definitions of graphs and trees.
- Author rigorous proofs of properties of graphs and their associated algorithms.
Week 10–13
- Count the number of elements in an algebraic structure using combinatorial principles.
- Accurately count the number of relevant operations that a (recursive) program performs.
- Compute the probability of an event using fundamental combinatorial principles.
- Apply random variables and expectation to model a probabilistic phenomena.
- Interpret a combinatorial formula as an algorithm for constructing an object when choice is involved.

Textbook, Software, and Connectivity

There is no required textbook for this course. Required readings for the course will be distributed electronically as a work-in-progress textbook. The course webpage and readings contain suggestions for supplementary texts that give alternative takes on topics or go into more depth.

In this course, we will use the following software packages and services:

Python: a general-purpose scripting programming language.
Overleaf: online editing and collaboration of LaTeX documents.
Gradescope: deliverable submission and feedback reporting.
Microsoft Teams: communication and Q&A.

You will receive invitations to Gradescope and Teams at the beginning of the semester. Please let me know if you do not receive access to this site. The course webpage also contains additional software resources, e.g., Python and LaTeX tutorials and other utilities, that you might find useful in this course.

Diversity, Inclusion, and Accommodations

I am committed to fostering an inclusive classroom environment that allows you try, fail, and succeed, all in the service of mastering the learning outcomes of the course. In turn, I expect you to fully engage with the course through class attendance and timely submissions of work. If anything precludes you from doing so, e.g., illness, sports event, or religious observance, the golden rule is to let me know as early as possible. I will do what I can to help you succeed in the course. However, this requires that I have advanced notice, at least a week in advances, when appropriate and possible, so that we have an appropriate time frame to develop and implement a plan of action based on your needs.

I particularly encourage students with disabilities to meet with me and discuss how our classroom and course activities could impact their work and what accommodations would be essential. As part of this process, you should contact the Office of Disability Resources for further guidance and instructions.

Title IX

Grinnell College is committed to compliance with Title IX and to supporting the academic success of pregnant and parenting students and students with pregnancy related conditions. If you are a pregnant student, have pregnancy related conditions, or are a parenting student (child under one-year needs documented medical care) who wishes to request reasonable related supportive measures from the College under Title IX, please email the Title IX Coordinator at titleix@grinnell.edu. The Title IX Coordinator will work with Disability Resources and your professors to provide reasonable supportive measures in support of your education while pregnant or as a parent under Title IX.

Course Work and Evaluation

My grading philosophy comes from the grading for growth movement described in such texts as Nilson’s Specifications Grading, Feldman’s Grading for Equity, and Blum’s Ungrading. I believe that:

The stated learning outcomes are achievable by anyone that enrolls in this course.
Mastery of the learning outcomes is obtained via exploration, experimentation, and failure.
Eventual mastery should be valued as highly as “getting it right” the first time.
Your final course grade should reflect your mastery of the course’s learning goals at the end of the term.

To this end, the course is structured around several deliverables that, when taken together, indicate your mastery of the course’s learning outcomes outlined in the Overview section of the syllabus.

Deliverables

The main activities of our course are centered around four kinds of deliverables:

Daily drills: introductory practice problems tied to each reading due the day before each class period.
Lab exercises: collaborative practice and exploration-style problems worked on during class.
Demonstration exercises: individually completed weekly homework sets that apply the weekly concepts to substantial tasks aligned with the themes of the course.
Core exams: in-class exams that directly assess mastery of the core skills of the course.

Daily Drills

In each course reading, you will find a small number of practice problems that reinforce the concepts introduced in the reading. As the old saying goes, “Mathematics is not a spectator sport,” so these drills are designed to help you begin putting the day’s topics into practice.

Each class period's daily drill is due at 10 PM the day before class.
Daily drills are graded on a binary satisfactory (S)/non-satisfactory (N) scale. If it is clear that you have put effort into your responses by completing the drill with mostly positive results, you can expect to receive a satisfactory grade.
There is a 24 hour grace period for turning in daily drills, e.g., if you forget to press the submit button. Otherwise, late daily drills will not be accepted! Note that you can miss several daily drills without penalty to your final grade; see how your overall letter grade is calculated for details.
You are expected to bring your completed daily drills to class every day. We will frequently use the daily drills to begin our class discussion.

Lab Exercises

The bulk of your practice and exploration of the course learning goals come through lab exercises. These lab exercises will allow you to gain familiarity and eventual fluency with the course concepts by exploring and working through problems. Lab exercises are completed in small groups so that you can take advantage of the benefits of collaborative learning.

Each set of labs is due the Saturday of the week that the lab is assigned. For example, if labs are assigned on Monday, Wednesday, and Friday, they are due the same week on Saturday.
Like daily drills, labs are graded on a binary satisfactory (S)/non-satisfactory (N) scale.
There is a 48 hour grace period to turn in labs, e.g., if you need more time to coordinate with your partner. Like daily drills, late labs will not be accepted, and you can miss several labs without penalty to your final grade; see how your overall letter grade is calculated for details.
While labs are graded on a binary scale, you are expected to read the detailed feedback given by the course staff. This feedback will help you self-assess your mastery of the course content.

Demonstration Exercises

The demonstration exercises, i.e., weekly homework, allows you to demonstrate mastery of the course's learning outcomes through problems that put the course concepts into more practical, real-world contexts.

Demo responses will be graded in more depth than the other deliverables, specifically along two dimensions:

Is the response correct? Does the response correctly answer the question(s) posed? Does it meet the specification outlined in the problem description?
Is the response well-designed? Does it follow the design requirements and conventions appropriate to the medium? Is the deliverable clear, and does it communicate a proper understanding of both the problem and its solution?

Rather than using a point-based system that obscures these two dimensions, we codify these requirements with an EMRN rubric (an adaption of the “EMRF” rubric designed by Stutzman and Race). Demonstration responses are graded on a four-point scale:

Excellent (E)
- Complete understanding of the material is evident.
- Exhibits, at worst, a few minor design errors and can serve as an exemplary solution for the course.
Meets Expectations (M)
- Complete understanding of the material is evident without the need for further revision.
- Exhibits minor correctness or design errors that can improve the submission significantly if revised.
Needs Revision (R)
- One or more misunderstandings of the material are evident from the work.
- Exhibits many minor errors or one or more major errors that necessitate revision.
Not Completed (N)
- Not completed to a degree where understanding is evident.

Note that excellent ratings represent work that reflects mastery of the material and mindfulness towards producing quality work. To obtain excellent ratings, you should dedicate ample time to review and revise your work—just like writing a paper—before the deadline.

Each week, normally on Mondays, you are allowed turn in up to two demonstration exercises for grading, whether they are new submissions or revisions. This includes the Monday of finals week. Additionally, you may turn in a final two demos for grading by final deadline for all work.

If you are turning in a revision of a demonstration exercise, you must fill out the revision request form in addition to turning your work in to Gradescope to notify the course staff. If you do not fill out the revision request form, the course staff will be unable to grade your revision for that week.

Core Exams

Some of the course's learning outcomes are core outcomes, demonstrable skills that you should be confident performing by the end of the semester. To directly assess your mastery of these skills, we will conduct a series of core exams during the semester. Core exams are in-class exams inspired by mastery-based testing practices found in mathematics.

Core exams consist of one problem for each core learning outcome of the course covered so far at the time of the exam. This includes all learning outcomes covered in previous core exams, allowing you reattempt problems if you missed them on previous exams!

Core problems are graded on a binary satisfactory (S)/non-satisfactory (N) scale where a satisfactory answer is completely correct (modulo minor flaws that are understandable given the timed, in-class nature of the exam). Note that, unlikely the demos, core problems more closely resemble the daily drills in terms of their scope and complexity.

Once you receive an S on a problem tied to a particular core outcome, you do not need to attempt additional problems connected to that outcome in subsequent exams, i.e., you have demonstrated mastery of that outcome, so you are done with it!

The final core exam period of the course, held during finals week, is a revision core exam. No new learning outcomes are introduced so that you have a final opportunity to demonstrate mastery of any core outcomes you have missed throughout the semester.

Final Deadline for All Work

Note that all work must be submitted by Friday, December 20, 5:00 PM CST . This is College policy and cannot be waived for any reason. If you find yourself needing to turn in work past this deadline, you must consult with me as soon as possible beforehand to submit an incomplete request for the course. Regarding incompletes:

Incomplete requests are not automatically granted; make sure you have a backup plan in case your incomplete request is denied.
The only work we will allow during the incomplete period are demonstration exercises.

Overall Letter Grades

Major letter grades for the course are determined by tiers, a collection of required grades from your demonstration exercises and core exams. You will receive the grade corresponding to the tier for which you meet all of the requirements. For example, if you qualify for the A tier in one category and the C tier in another category, then you qualify for the C tier overall as you only meet the requirements for a C among all the categories.

Note that I reserve the right to update the requirements for grades as circumstances dictate during the semester, e.g., if a deliverable is cut. However, I will always update the requirements so that they are no stricter than they were previously.

Tier	Demonstration Exercises (8)	Core (18)
C	No Ns, at most 3 Rs, at least 1 Es	At least 9 Ss
B	No Ns, at most 2 Rs, at least 3 Es	At least 11 Ss
A	No Ns, at most 1 Rs, at least 5 Es	At least 13 Ss

D: exactly one of the requirements of a C are met.
F: no requirements for a C are met.

Plus/minus grades

To earn a plus/minus grade, you must have completed one tier’s requirements and partially meet the next tier’s requirements. This will arise in two situations: C/B and B/A. For example, you may completely meet the requirements of a C and meet the requirements of a B for demos, but not for core exams. In these situations, you will earn a minus grade for the higher tier, i.e., a B- if you are between a C and B and an A- if you are between a B and an A.

Be aware that if you are at an A tier for one deliverable category but at a C tier for another, then you fully qualify for the C tier and partially meet the requirements of the B tier and thus would earn a B-.

Daily drill and lab grades

You may miss turning in at most three daily drills and at most three lab exercises without penalty. After the first three missing drill or lab exercises, your overall letter grade will lower by one-third of a letter grade (i.e., A becomes A-, B- becomes a C+, C becomes a D) for every two additional deliverables you miss from that category. The following table summarizes this policy for concrete numbers of missed daily drills and labs through 9 although the policy extends to any number of missing assignments.

Missed x/y daily/labs	0–3 labs	4 lab	5 labs	6 labs	7 labs	8 labs	9 labs
0–3 dailies	-0	-1/3	-1/3	-2/3	-2/3	-1	-1
4 dailies	-1/3	-2/3	-2/3	-1	-1	-4/3	-4/3
5 dailies	-1/3	-2/3	-2/3	-1	-1	-4/3	-4/3
6 dailies	-2/3	-1	-1	-4/3	-4/3	-5/3	-5/3
7 dailies	-2/3	-1	-1	-4/3	-4/3	-5/3	-5/3
8 dailies	-1	-4/3	-4/3	-5/3	-5/3	-2	-2
9 dailies	-1	-4/3	-4/3	-5/3	-5/3	-2	-2

Course Breakpoints

Our grading system offers flexibility, but at the cost of giving the illusion that if you fall behind in your work, there is always an opportunity to catch up. While this is true in theory, in practice, it is difficult to do so in many situations because of personal issues, competing courses, extracurricular obligations, etc. This flexibility also makes it difficult—for both you and myself—to determine when you have fallen behind in the course and need external help such as the course staff, tutors, or academic advising.

I encourage you to preemptively come to me for help and guidance if you feel like you are falling behind. However, to be more clear about when you might be falling behind in the course, I am tracking the following course breakpoints in your progress. When one of the following situations occurs:

You have missed more than two classes in a row.
You have used up either your “free” daily drill or lab assignments.
You receive an N on a demo.
You do not turn in any demos during a revision window in which you have outstanding demos that need revision.
After a core examination, your total completed outcomes among all outstanding core outcomes is below 60%.
You are otherwise at substantial risk of earning below a C in the course.

I will follow up with you and academic advising (via an academic alert) to check in, provide guidance, and develop a plan for getting back on track.

Help, Collaboration, and Academic Honesty

There are several resources—course staff, your peers, and external sources—that you can use to expedite your learning in the course. However, you must balance your use of external resources with the need to produce original work, so that we can assess your learning appropriately. To this end, we have several policies that describeou resources are permissible to use depending on the deliverable you are using them on.

The Instructor and Course Staff

Please use each course staff member’s preferred means of communication, e.g., email or Microsoft Teams DM, to communicate with them about individual issues concerning logistics, grades, demos, or core exam questions. Regardless of the medium, note that the course staff will generally not respond immediately to messages. However, we will check our messages at fixed times throught the day.

Peer Learning and Outside Resources

Utilizing discussion with peers and outside learning to facilitate your learning is a critical skill for success in computer science. However, at the same time, you must be aware that getting stuck and pushing through challenging problems is essential for robust learning. To this end, we allow the following forms of collaboration.

You are encouraged to collaborate with your peers on daily drills and the labs. You may also consult the course staff, other people, and external resources, including AI-based completion tools. In all cases, you (or your group in the case of group work) should independently write up your solutions and cite all the resources you used in authoring your work.
You may only discuss the demonstration exercises and core exam questions with the course staff. When completing these, you may only consult the course website and book when developing your solutions. You may not collaborate with peers, consult external resources, or share information about assignments with others.

Keep in mind that adaptation of pre-existing code or solutions whether it comes from a peer, myself, or the Internet, requires a citation in the cases where it is allowed.

In all cases, the work that you produce should be your own. The golden rule is that you should be capable of reproducing your deliverable on the spot with minimal effort if it was accidentally deleted.

If you feel that the stress and pressure of the course are compelling you to violate the academic honesty policies of the course and the college as explained in the student handbook, please talk to me as soon as possible. The course’s grading policies are designed to help you manage your time in light of the different stressors in your life. I will do my best to work with you to figure out how to help you better manage your time relative to your learning goals and desired achievement level for the course.

While you retain copyright of the work you produce, we must still uphold the academic integrity of this course. To this end, you may not share copies of your assignments with others (unless otherwise allowed by the course policies) or upload your assignments to third-party websites unless substantial changes are made to the assignment (e.g., significant extensions and improvements to your code). Ultimately, it must be clear that the end product is significantly different from what was asked in the original assignment. I do recognize that there are times when you want to do this, e.g., uploading projects to Github for your resume, and so I encourage you to talk to me in advance so that we can ensure that you upload a meaningful project that does not run afoul of this policy.

Course Tools

We will use two tools in this course:

The Python programming language as a subject for our study of program correctness.
The LaTeX documentation preparation system to author mathematical prose.

Python is available via straightforward installers on all operating systems:

https://www.python.org/downloads/

Additionally, it is available through most platform's package managers, e.g., Homebrew on Macs.

To develop Python code, I recommend using Visual Studio Code which features robust support for developing Python programs. Although, any editor of your choice will work!

LaTeX can similarly be installed directly on most platform:

https://www.latex-project.org/get/

However, the raw edit-and-compilation experience for LaTeX is lacking. I recommend using the Overleaf online service for authoring LaTeX documents. Overleaf has a full-featured, in-browser LaTeX IDE that smooths away many of the warts when developing in LaTeX. A free Overleaf account is sufficient for both this course and most people's needs.

Additionally, Overleaf provides an excellent introductory tutorial to LaTeX. I highly recommend you work through it and come back to it as a reference for this course:

https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes

We'll introduce the parts of Python that you need for this course. However, you should certainly use this time to dig deeper into the language. I recommend using a comparison resource like Learn X in Y minutes:

https://learnxinyminutes.com/docs/python/

That provides a succinct, practical reference for the language as an initial guide and, later, reference when picking up the language.

Solutions to the labs

Demonstration Exercise #1

Demonstration exercises are your opportunity to showcase your mastery of the week's material. In lab, you gain practice employing the fundamental skills we introduce in class. In contrast, the demos are integrative in nature, requiring to put those skills into context as well as mix different skills together to solve problems that are more like the ones you find in the real world.

Formatting and Submitting Your Work

Create a LaTeX document for each demonstration exercise, using the csc208_template.tex template file to format your document. Your write-ups for each problem should reflect good writing principles and mathematical style:

Grammatically correct English prose and mathematics.
Structure of the prose made obvious through subheadings, paragraphs, and lists.
Math appropriately integrated into the prose as needed.

To achieve this, you will need to set aside ample time for revision and editing of your work before the deadline. Approach the demonstration exercises not like a one-and-done problem set, but as a writing assignment where you are putting in work that you eventually refine into a polished, complete product.

When you are done, you should compile your LaTeX file to a PDF and then submit that PDF to Gradescope when you are done. Please do not submit your LaTeX source (in textual or PDF format); we would like to only work with a complete, rendered document!

Problem 1: Tracing

Consider the following Python program mystery that performs a more complicated recursive computation:

def mystery(l):
    if len(l) == 0:
        return []
    elif len(l) == 1:
        return l
    else:
        x = head(l)
        y = head(tail(l))
        t = tail(tail(l))
        return cons(y, cons(x, mystery(t)))

Give the step-by-step evaluation of the following expressions. In your work, whenever you need to write down a conditional statement, you can elide the branches of the conditional with ellipses. In other words, instead of:
```
if len(l) == 0:
    return []
elif len(l) == 1:
    return l
else:
    x = head(l)
    y = head(tail(l))
    t = tail(tail(l))
    return cons(y, cons(x, mystery(t)))
```
You should write instead:
```
if len(l) == 0: ...
```
- mystery([])
- mystery(0, 1)
- mystery("a", "b", "c", "d")
When giving your evaluation traces, please use a verbatim block, placing each step of the derivation on its own line, separated by a long arrow (-->), e.g.,
```
\begin{verbatim}
    1 + 2 * 3
--> 1 + 6
--> 7
\end{verbatim}
```
In a sentence, describe what the mystery function does.

Problem 2: Blasting Booleans

Consider the following Python functions over booleans that give implementations of the standard boolean operations not, and, and or:

def my_not(b):
    if b:
        return False
    else:
        return True

def my_and(b1, b2):
    if b1:
        return b2
    else:
        return False

def my_or(b1, b2):
    if b1:
        return True
    else:
        return b2

Prove the following claims of program equivalence about these definitions:

Claim 1

my_not(my_and(True, False)) ≡ True.

Claim 2

There exists a boolean b1 such that for all booleans b2, my_or(b1, b2) ≡ True.

Claim 3 (Negation is an Involution)

For any boolean b, my_not(my_not b) ≡ b.

Claim 4 (De Morgan's Law)

For any pair of booleans b1 and b2, my_not(my_and(b1, b2)) ≡ my_or(my_not(b1), my_not(b2)).

Make sure not to skip any steps in your derivations!

Course Tools

We will use two tools in this course:

The Python programming language as a subject for our study of program correctness.
The LaTeX documentation preparation system to author mathematical prose.

Python is available via straightforward installers on all operating systems:

https://www.python.org/downloads/

Additionally, it is available through most platform's package managers, e.g., Homebrew on Macs.

To develop Python code, I recommend using Visual Studio Code which features robust support for developing Python programs. Although, any editor of your choice will work!

LaTeX can similarly be installed directly on most platform:

https://www.latex-project.org/get/

Additionally, Overleaf provides an excellent introductory tutorial to LaTeX. I highly recommend you work through it and come back to it as a reference for this course:

https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes

https://learnxinyminutes.com/docs/python/

That provides a succinct, practical reference for the language as an initial guide and, later, reference when picking up the language.

Concrete Evaluation

We begin our journey into the foundations of computer science by first studying one of its key applications to computer programming: program correctness. Rigorously stating and proving program properties will require us to deep-dive into mathematical logic, the sub-field of mathematics devoted to modeling deductive reasoning. Deductive reasoning is the foundation for virtually all activities in computer science, whether you are designing an algorithm, verifying the correctness of a circuit, or assessing a program's complexity. In this manner, this initial portion of our investigation of the foundations of computer science is, quite literally, the cornerstone of everything you do in this field moving forward.

Why Functional Programming?

In this course, we'll use the Python programming language as our vehicle for studying program correctness. In particular, we'll focus on a pure, functional subset of Python that corresponds to other function languages such as the Scheme we introduce in CSC 151. We do this because, as we will see shortly, pure, functional languages admit for a simple, substitution-based model of computation. By "pure," we mean that the language does not allow functions to produce side-effects. A side-effect is some behavior observable to the caller of the function beyond the function's return value.

The canonical example of a side-effect is the mutation of global variables. For example, in Python:

glob = 0    # a global variable

def increment_global():
    # in Python, we have to declare a global variable as
    # locally accessible with a `global` declaration
    global glob
    glob = glob + 1

def main():
    print(glob)           # 0
    increment_global()
    print(glob)           # 1
    increment_global();
    print(glob)           # 2
    increment_global();
    increment_global();
    print(glob)           # 4

The increment_global function takes no parameters and produces no output. Its sole task is to produce a side-effect: change the state of global by incrementing its value. Since the function does not return a value, main can only use increment_global for its side-effect by calling the function and observing the change to global.

We'll have more to say about side-effects and their relationship with the substitutive model of computation we introduce later in this reading. For now, note that ultimately it is not the language that is important here. What is important is whether we are considering a pure fragment of that language, instead. Indeed, the lessons we learn here regarding program correctness apply to any program that behaves in a pure manner, i.e., has no side-effects. For example, we can easily translate our model of computation to the pure fragment Python!

This fact supports a general maxim about programming you have hopefully heard at some point:

Minimize the number of side-effects in your code.

Intuitively, complicated code with side-effects is frequently tricky to reason about, e.g., explicit pointer manipulation in C. Our study of program correctness in this course will give us concrete reasons why this is indeed the case.

Program Correctness

Program correctness is a particularly topical application of logic to start with because every developer cares that their programs are correct. However, what do we mean by correctness?

For this example, we'll first introduce several functions so that we can operate on Python lists like Scheme lists. These functions give common names to common list indexing and access tricks that we would use in Python to achieve the same effect.

# returns the head (first element) of list l
def head(l):
    return l[0]

# return the tail of l, a list that is l but without its first element
def tail(l):
    return l[1:]

# return true iff list l is empty
def is_empty(l):
    return len(l) == 0

# return a new list that is the result of consing x onto the front of l
def cons(x, l):
    return [x, *l]

With these functions in mind, consider the following Python function that appends two lists together recursively:

def list_append(l1, l2):
    if is_empty(l1, l2):
        return l2
    else:
        return cons(head(l1), list_append(tail(l1), l2))

What does it mean for list_append to be "correct"? Of course, we have a strong intuition about how list_append should behave: the result of list_append should be a list that "glues" l2 onto the end of l1. But being more specific about what this means without saying the word "append" is tricky! As we try to crystallize what it means for list_append to be correct, we run into two issues:

Generality

A natural way to specify the correctness of list_append is through a test suite. For example, in Python, we can write a simple collection of tests using the unittest module:

import unittest

class list_appendTest(unittest.TestCase):
    def test(self):
      self.assertEqual(list_append([1, 2], [3, 4, 5]), [1, 2, 3, 4, 5])
      self.assertEqual(list_append([], [1, 2, 3, 4 ,5]))
      self.assertEqual(list_append([1, 2, 3, 4, 5], []))

unittest.main()

Note that these tests exemplify very specific properties about the behavior of list_append. A test case demands that when a function is given a particular set of inputs, e.g., [1, 2] and [3, 4, 5], that the function must produce the output [1, 2, 3, 4, 5]. You can't get more specific about the behavior of list-append than that!

However, are these tests enough? While they cover a wide variety of the potential behavior of list_append, they certainly don't cover every possible execution of the function. For example, the tests don't cover when list-append is given two empty ([]) lists. They also don't cover the case where the expected output is not [1, 2, 3, 4, 5]! In this sense, while tests are highly specific, they do not necessarily generalize to the full behavior of the function. This always leaves us with an inkling of doubt that our functions are truly correct.

Specificity

How might we arrive at a more general notion of program correctness? Rather than thinking about concrete test cases, we might consider more abstract propositions about the behavior of the function. Such abstract propositions are typically framed in terms of relationships between the inputs and outputs of the function. With list_append, we might think about the lengths of the input and output lists and how we might phrase our intuition about the function "gluing" its arguments together. This insight leads to the following property that we might check of the function:

For all lists l1 and l2, len(l1) + len(l2) $=$ len(list_append(l1, l2)).

To put another way, the lengths of l1 and l2 are preserved through list_append. That is, list_append doesn't (a) add any elements to the output beyond those found in the input, and (b) does not remove any elements.

By virtue of its statement, this property applies to all possible inputs to list_append. So if we were able to prove that this property held, we knew that it holds for the function, irrespective of the inputs that you pass to the function (assuming that the inputs are lists, of course). This stands in contrast to test cases where we make claims over individual pairs of inputs and outputs.

However, unlike a test case which is highly indicative of our intuition of correctness for this function, this property is only an approximation of our intuition. To put it another way, even if we know this property holds of every input to list_append, it doesn't mean that the function is correct! Here is an example implementation of list_append where the property holds, but the function is not "correct":

define bad_list_append(l1, l2):
    if is_empty(l1):
        return l2
    else:
        cons(0, bad_list_append(tail(l1), l2))

The lengths of the input lists are preserved with bad_list_append. However, the output is not the result of gluing together the input lists!

bad_list_append([1, 2, 3], [4, 5])
> [0, 0, 0, 4, 5]

Every element of l1 is replaced with 0! In this sense, our length-property of list_append is general---it applies to every possible set of inputs to the function---but not specific---the property only partially implies correctness.

Balancing Generality and Specificity

In summary, we've seen that test cases and general program properties sit on opposite ends of the generality-specificity spectrum. When we think about program correctness and verifying the behavior of programs, we are always trying to find the balance between generality and specify in our verification techniques. Ultimately, you can never be 100% sure that your program works in all cases---after all, a stray gamma ray can ruin whatever verification you have done about your program. So when employing program verification techniques---testing, automated analysis, or formal proof---you must ultimately make an engineering decision based on context, needs, and available time and resources. We won't have time to dedicate substantial time to these pragmatic concerns in this course, but keep them in the back of your mind as we introduce formal reasoning for program correctness in the sections that follow.

Exercise

Consider the Python function lcm(x, y) from the math module which returns the least-common multiple (LCM) of the numbers x and y. For example, the LCM of 3 and 4 is 12, and the LCM of 6 and 15 is 30.

Write a few tests for this function.
Write a few properties of this function's behavior that implies its correctness.

A Substitutive Model of Computation

In order to rigorously prove that a property holds of a program, we must first have a rigorous model of how a program executes. You likely have some intuition about how Python programs operate. For example, try to deduce what this poorly named function does:

def foo(bar):
    if is_empty(bar):
        return 0
    else:
        1 + foo(tail(bar))

How might you proceed? You might imagine taking an example input, e.g., [9, 1, 2], and predicting how the function processes that input. With a few examples and prior knowledge of how conditionals and recursion work, you can likely deduce that this function calculates the length of the input list given to it. Our first goal, therefore, is to formalize this intuition, namely, explicate the rules that govern how Python programs operate.

For a pure, functional programming language like the subset of Python that we work with, we can use a simple substitutive model of computation that extends how we evaluate arithmetic expressions. While simple, this model is capable of capturing the behavior of most Python programs we might consider in this course.

Review: Arithmetic Expressions

First, let's recall the basic definitions and rules surrounding arithmetic expressions. We can divide these into two sorts:

Syntactic definitions and rules that govern when a collection of symbols is a well-formed arithmetic expression.
Semantic definitions and rules that give meaning to a well-formed arithmetic expression. "Meaning," in this case, can be thought of as "how the arithmetic expression computes."

Syntax

Here is an example of a well-formed arithmetic expression:

$8 \times (10 - 5 \div 2)$

In contrast, here is an ill-formed arithmetic expression:

$(4 + 5) - (; \times 9)$

In this ill-formed arithmetic expression, the multiplication operator (×) is missing a left-hand argument.

We can formalize this intuition about how to create a well-formed arithmetic expression with a grammar. Similarly to natural language, grammars concisely define syntactic categories as well as rules of formation for various textual objects. Here is a grammar for arithmetic expressions:

e ::= <number> | e1 + e2 | e1 - e2 | e1 × e2 | e1 ÷ e2 | (e)

To the left of the ::= symbol is a variable, e, that represents a particular syntactic category. To the right are the rules for forming an element of that category. Here, the syntactic category is "arithmetic expressions," traditionally represented with the variable e. The rules are given as a collection of possible forms or alternatives, separated by pipes (|). The grammar says that a well-formed expression is either:

A number (<number> is a placeholder for any number), or
An addition of the form e1 + e2 where e1 and e2 are expressions, or
A subtraction of the form e1 - e2 where e1 and e2 are expressions, or
A multiplication of the form e1 × e2 where e1 and e2 are expressions, or
A division of the form e1 ÷ e2 where e1 and e2 are expressions, or
A parenthesized expression of the form (e) where e is an expression.

Note

About Pronunciation: it may not seem like it, but pronunciation is an excellent tool for internalizing the meaning of a collection of mathematical symbols. Every collection of mathematical symbols is really a sentence in the natural language sense. In our grammar, we can pronounce ::= as "is" and | as "or," so we obtain: "e is a number, or an addition, or … ." This makes it more intuitive and clear what the symbols represent. As we introduce more mathematical symbols throughout the course, always look for ways to pronounce these symbols as cohesive sentences.

Importantly, the grammar is recursive in nature: the various alternatives contain expressions inside of them. This allows us to systematically break down an arithmetic expression into smaller parts, a fact that we leverage when we evaluate expressions. For example:

$8 \times (10 - 5 \div 2)$ is an expression.
$8$ is an expression.
$(10 - 5 \div 2)$ is an expression.
$10 - 5 \div 2$ is an expression.
$10$ is an expression.
$5 \div 2$ is an expression.
$5$ is an expression.
$2$ is an expression.

Note that when interpreting symbols, there is sometimes ambiguity as to how different symbols group together. For example, consider the sub-expression from above:

$10 - 5 \div 2.$

Should we interpret the expression as $10$ and $5 \div 2$ as in the example or alternatively $10 - 5$ and $2$ . Our knowledge of arithmetic tells us that the first interpretation is correct because the $(\div)$ operator takes precedence over the $(-)$ operator. Rules of precedence are, therefore, additional syntactic rules that we must consider in some cases. There also exist rules of associativity for some grammars that govern the order in which we group symbols when the same operator is invoked multiple times. For example, we traditionally interpret subtraction as left-associative. That is $1 - 2 - 3$ is understood to be $(1 - 2) - 3$ rather than $1 - (2 - 3)$ , i.e., $1 - 2$ goes first rather than $2 - 3$ .

Semantics

Once we have established what a well-formed expression looks like, we then need to define what it means. In general, the meaning of a well-formed object is dependent entirely on the purpose of said object. For arithmetic expressions, we care about computing the result of such an expression, so we take that to be the meaning or semantics of an expression.

Traditionally, we define the rules for computing or evaluating an arithmetic expression as follows:

Find the next sub-expression to evaluate using the rules of associativity and precedence to resolve ambiguity.
Substitute the value that the sub-expression evaluates to for that sub-expression in question.
Repeat the process until you are left with a final value.

A value is simply an expression that can not be evaluated or simplified any further. For arithmetic expressions, any number is a value.

By repeating this process, we obtain a series of evaluation steps that an arithmetic expression takes before arriving at a final value. For example, here are the steps of evaluation taken for our sample expression from above:

    8 × (10 - 5 ÷ 2)
--> 8 × (10 - 2.5)
--> 8 × 7.5
--> 60

Note that at each step, the resulting expressions are equivalent, e.g., $8 \times (10 - 5 \div 2)$ and $8 \times 7.5$ are equivalent; both of these expressions are also equivalent to the value $60$ .

Exercise

For each of the following arithmetic expressions, determine if (a) they are well-formed and (b) if they are well-formed, what is their step-by-step evaluation to a final value.

$100 \div (25 \times 5 - 3)$ .
$38 - \times 5 + 2)$ .
$1 + 2 - 3 + 4 \times 5 \div 6 - 7$ .

Core Python and the Substitutive Model

Our substitutive computational model for Python extends the rules for evaluating arithmetic expressions. Like arithmetic expressions, we will think of Python programs as an expression that evaluates to a final value. And like arithmetic, programs operate by stepwise evaluation, where we find the next sub-expression to evaluates and substitute. Python's rules for precedence mirror those from arithmetic. So we only need to consider precedence for Python's additional constructs beyond arithmetic.

Expressions

For our purposes, well-formed Python expressions can be defined by the following grammar:

e ::= x                         (variables)
    | <number>                  (numbers)
    | True                      (true)
    | False                     (false)
    | lambda (x1, ..., xk): e   (lambdas)
    | e(e1, ..., ek)            (function application)

In other words, a Racket expression is either:

A variable x.
A number.
The true value True.
The false value False.
A Lambda with arguments x1, ..., xk and a body expression e.
A function application applying arguments e1, ..., ek to function e.

As an example, the following expression taken from our definition of list_append above:

cons(head(l1), list_append(tail(l1), l2))

Is a well-formed expression that is a function application of two arguments to the function bound to the variable cons. Each argument is, itself, a function application.

Values

In arithmetic, numbers were our only kind of value. In Racket, we have multiple kinds of values corresponding to the different types of Racket:

Numbers are still values (of type int or float depending on whether number is integral or floating-point).
True and False are values (of bool type)
lambda (x1, ..., xk): e is also a value (of function type).

Statements

In addition to expressions, Python also has statements defined by the following grammar:

s ::= s1                    (sequenced statements)
      s2
    | x = e                 (variable declaration/assignment)
    | def f(x1, ..., xk):   (function declaration)
          s
    | return e              (returns)
    | if e1:                (conditional statements)
          s1
      else:
          s2

The sequenced statement looks syntactically odd on its own. But it captures idea that wherever one statement is expected, multiple statements can appear by placing them one line after the other at the same indentation level. For example:

def f(x):
    y = x + 1
    z = y + 1
    return x + y + z

Here, the body of f can be thought of as three statements or a single sequenced statement (y = x + 1) that, itself, precedes a single sequenced statement (z = y + 1), that, itself, precedes a final statement (return x + y + z).

In contrast to expression, statements do not produce values. Instead, they have some kind of effect on the execution of our program. For example:

Variable and function declarations introduce new variables into scope.
Return statements causes the program to return from a function with a specified value.
Conditional statements transfer the flow of execution into one of several possible branches.

Reviewing our list_append definition, we see that it is a function declaration. The body of list_append is a single conditional and inside each branch of the conditional is a return statement.

Substitutive Evaluation for Python

The heart of Python evaluation is function application. Function application directly generalizes arithmetic evaluation for operators: In arithmetic evaluation:

We first evaluate the left- and right-hand arguments to the operator to values.
We then apply the operator to those values.

For example, for $e_{1} + e_{2}$ , we first evaluate $e_{1}$ to a value, call it $v_{1}$ and then evaluate $e_{2}$ to a value, call it $v_{2}$ . We then carry out the addition of $v_{1}$ and $v_{2}$ .

To evaluate a function application, we first evaluate each of the sub-expressions of the function application to values. For example, consider the inc function defined to be:

def inc(n):
    return n + 1

if we have the expression, inc(3 + (5 * 2)), it evaluates as follows:

    inc(3 + (5 * 2))
--> inc(3 + 10)
--> inc(13)

Once we arrive at this point, we need to evaluate the actual function application itself. In the case of primitive operations, we just carry them out directly and substitute their resulting values into the overall expression. However, arbitrary functions will perform arbitrary operations, so we cannot simply just "substitute the result"; we must calculate it by hand!

To do this, we want to perform the following steps:

We substitute the body of the function be applied to for the overall function application.
We substitute the actual arguments passed to the function for each of its parameters.
We then continue evaluation of the resulting expression like normal.

However, you may have noticed in the Python grammar we presented earlier that the body of functions are statements and not expressions. This poses a conundrum for our evaluation model because we want each step to be a valid expression in our language. In other words, we don't want to an expression to step to a statement; that doesn't make sense!

To reconcile this fact, we will introduce one additional expression form that only appears in our step-by-step evaluative model. The statement-expression, written {s1; ...; sk} represents a series of statements that are currently executing that must end in a return statement. The value that this return statement evaluates to is what this statement-expression evaluates to when it is done executing.

For example, with our inc expression above:

We substitute the statement-expression {return n + 1} for the function call inc(13).
We substitute 13 for n everywhere it occurs in this new statement-expression: {return 13 + 1}.
We continue evaluation of the resulting expression like normal.

To evaluate a statement-expression, we execute the first statement in the list of statements left to execute. In our example, this means we must evaluate return 13 + 1. Evaluating a return statement is straightforward: we evaluate the return's expression to a value and then substitute this value for the entire statement-expression.

This results in the following steps of evaluation:

    (inc 13)
--> {return 13 + 1}
--> {return 14}
--> 14

Observe how the body of inc took two steps to evaluate: one to evaluate the return's argument and one to return from the function call.

These rules apply to functions of multiple arguments as well. For example, consider the following definition of a function that averages three numbers:

def avg3(x1, x2, x3):
    return (x1 + x2 + x3) / 3

Then the expression avg3(2 * 5, 8, 1 + 2) evaluates as follows:

    avg3(2 * 5, 8, 1 + 2)
--> avg3(10, 8, 1 + 2)
--> avg3(10, 8, 3)
--> {return (10 + 8 + 3) / 3}
--> {return (18 + 3) / 3}
--> {return 21 / 3}
--> 7

Note that we evaluate the arguments to avg3 in left-to-right order, one argument at a time.

Exercise

Consider the following Racket top-level definitions:

def f(x, y):
    return x + y

def g(a, b):
    return f(a, a) - f(b, b)

Use your mental model of computation to give the step-by-step evaluation of the following Racket expression: g(5, 3). Check your work by evaluating this expression in DrRacket and verifying that you get the same final result.

Return Statements

As described above, to evaluate a return statement:

Evaluate the return's argument to a value.
Substitute this value for the return statement's immediate surrounding statement-expression.

In other words, {return v} --> v for any value v.

Statement-expressions can be nested because functions call other functions. In these situations, we simply evaluate the innermost statement-expression. This corresponds to evaluating the most recent function call!

For example, consider the following example:

def f(n):
    return n + 1

def g(a, b)
    return f(a) + f(b)

And consider the step-by-step evaluation of g(8, 2):

    g(8, 2)
--> {return f(8) + f(2)}
--> {return {return 8 + 1} + f(2)}
--> {return {return 9} + f(2)}
--> {return 9 + f(2)}
--> {return 9 + {return 2 + 1}}
--> {return 9 + {return 3}}
--> {return 9 + 3}
--> {return 12}
--> 12

Conditionals

Recall that conditional statements have the form:

if e:
    s1
else:
    s2

To evaluate a conditional:

We first evaluate the guard e to a value. This value ought to be a boolean value, i.e., True or False.
If the guard evaluates to True, we substitute statement s1 for the entire conditional.
Otherwise, the guard must have evaluated to False. We, therefore, substitute s2 for the entire conditional.
We then continue evaluating the resulting expression.

Here is an example of a conditional within our substitutive model.

    1 + { if 3 < 5:
              return 2
          else:
              return 5 * 5 }
--> 1 + { if False:
              return 2
          else:
              return 5 * 5 }
--> 1 + { return 2 }
--> 1 + 2
--> 3

Note that it is sometimes quite onerous to write a conditional statement over multiple lines. When it is more prudent to do so, we will collapse everything onto a single line, e.g.,

{if 3 < 5: return 2 else: return 5 * 5}

Variable Declarations

Variable declarations allow us assign names to intermediate computations for the purposes of readability or performance (i.e., to avoid redundant work). To evaluate a variable declaration of the form x = e:

We evaluate e to a value v.
We substitute every occurrence of x for v in the immediate statement-expression that encloses this assignment.
Finally, we remove this variable declaration from the statement-expression that we are evaluating.

Consider the following function that declares local variables:

def f():
    x = 1 + 1
    y = x * 2
    z = x * y
    return x + y + z

Here's an example of the step-by-step evaluation of a call to this no-argument function:

    f()
--> { x = 1 + 1
      y = x * 2
      z = x * y
      return x + y + z }
--> { x = 2
      y = x * 2
      z = x * y
      return x + y + z }
--> { y = 2 * 2
      z = 2 * y
      return 2 + y + z }
--> { y = 4
      z = 2 * y
      return 2 + y + z }
--> { z = 2 * 4
      return 2 + 4 + z }
--> { z = 8
      return 2 + 4 + z }
--> { return 2 + 4 + 8 }
--> { return 6 + 8 }
--> { return 14 }
--> 14

Note

A note on precision: while performing these step-by-step derivations, you might be tempted to skip steps, e.g., evaluating all the arguments of a function call at once down to a single value. Resist this urge! One of the "hidden" skills we want to develop in working through these derivations is unfailing attention to detail. We want to be able to carry out mechanical processes like these derivations exactly as advertised.

This is useful for two purposes:

Such attention to detail is a skill necessary for effective algorithmic design. By maintaining this level of precision, you are more likely to catch corner cases and oddities in your design that you can fix before they later become bugs in code.
More immediate for our purposes, skipping steps in deductive reasoning is the number one way to introduce erroneous reasoning in your proofs! Mechanics don't lie, so being able to explicate every step of a logical argument is necessary to be certain that your reasoning is sound.

Later in the course, we will discuss lifting our reasoning to more high-level concerns where we are free to skip low-level steps so that we don't lose sight of the bigger picture of what we are trying to prove. However, our high-level reasoning is always backed by low-level mechanics. It is imperative your develop those mechanics now, otherwise, your high-level reasoning will be baseless!

Additional Exercises

Exercise (Additional Tracing, ‡)

Consider the following Python top-level definitions:

def compute3(x, y, z):
    intermediate = x + y
    return z * intermediate + intermediate

def triple(n):
    return compute3(n, n, n)

Give the step-by-step evaluation (i.e., evaluation traces) for each of the following expressions. Make sure to write down all steps of evaluation as required by our substitutive model of computation!

3 + (5 * (2 / (10 - 5)))
compute3(2 * 3, 8 + 3, 1 / 2)
triple(5 + 0.25)

Make sure to check your work by entering these expressions into Python!

Exploring Python

Today, we'll explore our first mathematical model, a substitutive semantics for Python programs.

LaTeX and the CSC208 Package

For each lab and demonstration exercise, you will submit a final LaTeX-generated PDF with your solutions. To help format your files, we've created a LaTeX template template for you to use. You can download the template here:

csc208_template.tex

The template includes a number of standard packages and formats the document to look like a more traditional document, i.e., not in narrow column format.

To get started, simply download the file and import it into a new Overleaf project. Make sure to fill in the \title, \author, and \grinnellusername macros at the top of the file with the appropriate information!

For each problem, you should use the \begin{problem} ... \end{problem} environment to format your solutions by problem. Note that the environment puts each problem on a new page!

Problem 1: What's in a name?

A strength of possessing an explicit computation model is explaining tricky corner cases in the language's design. In this problem, we'll look at some of the issues arising with names in Python.

Consider the following function definitions.

def f1(x, y):
    return x + y + x

def f2(n):
    return n * 3

def f3(n):
    return f2(4) + f2(n) + n

Give execution traces for each of the following expressions. Try to let each member of your group take the lead on writing down the derivation, so everyone gets their practice in. The remaining members of the group should check their work.

When formatting your traces, use the \begin{verbatim} ... \end{verbatim} environment. Importantly, verbatim will allow a chunk of text to leak off the page if it is too big. If this is the case, please properly format your document by introducing newlines and/or breaking up your trace into multiple verbatim blocks!
- f1(1+2, 7)
- f2(f2(f3(5)))
- f3(11)
Consider an alternative definition of f2 given below:
```
def f2(x):
    return x * 3
```
Compare the second expression's execution from the last part (f2(f2(f5(5)))) with the old and new definitions of f2. In a few sentences:
- Comment on the behavior of both versions of f2. Do they behave the same or differently?
- From this example, what can you say about the choice of names of function parameters between different functions. Does it matter what names you choose? Why or why not?

Now consider the following top-level declarations:

value = 12              # A

print(value)            # 1

def f(value):           # B
    x = value           # 2
    value = 5           
    return x + value    # 3

f(value)                # 4
print(value)            # 5

Give an execution trace for the expression on line 4. Double-check your final result in Python.

Based on your derivation, determine which definitions of value (points labeled A--C) are used for each usage of value in f, i.e., the points labeled 1--5.
Give a rule for determining which of the definitions of a variable applies to some use of that variable. Once you have a candidate rule, check your answer with a member of the course staff.

Problem 2: Loop the Loop

Another corner case in the evaluation of Python programs concerns non-termination. Consider the following recursive function:

# Returns a new list that is l with x in head position
def cons(x, l):
    return [x, *l]

def mystery(n, x)
    if n == 0:
        return []
    else:
        return cons(x, mystery(n - 1), x)

Give execution traces for each of the following expressions. Again, let every group member take lead on at least one of the expressions. Note that we didn't talk about recursion in the reading! However, there is nothing new to say here—recursion "just works" in this model, so follow your nose!
- mystery(0, 10)
- mystery(1, "a")
- mystery(3, False)
In a sentence or two describe what this function does.
Now, trace the execution of this expression. What happens to your execution trace for this expression? Give enough of your trace to explain what is going on.
- mystery(-1, "q")
Run this expression in Python. What happens? Try to reconcile explain the behavior of Python in terms of how your trace played out.
Try tracing this curious expression:
- (lambda x: x(x))(lambda x: x(x))
Be very careful about variable names and substitution here! What happens to your execution trace for this expression? Give enough of your trace to explain what is going on.

Propositions and Proofs

Our formal model of computation allows us to reason about the behavior of programs. But to what ends can we apply this reasoning? Besides merely checking our work, we can also use our formal model to prove propositions about our programs.

Proposition

A proposition is an assertion or statement, potentially provable.

Another word for a proposition is a claim. Here are some example propositions over programs that we could consider:

len("hello world!") is equivalent to 12.
There exists a list l for which len(l) is 0.
insertion_sort(l) performs more comparison operations than mergesort(l) for any list l.
For any number n, 2 * n is greater than n.

The first proposition is a fully concrete proposition of the sort we have previously considered. The second is an abstract proposition because it involves a variable, here l.

Ultimately, the first two propositions involve equivalences between expressions. But propositions do not need to be restricted to equivalences. The third proposition talks about the number of operations that one expression performs relative to another.

Furthermore, propositions don't even need to be provable! For example, the final proposition is not provable. A counterexample to this proposition arises when we consider n = -1. 2 * -1 evaluates to -2, and -2 is not greater than -1!

Info

"True" versus "Provable": in our discussion of logic and its applications to program correctness, we will need to discuss both boolean expressions as well as propositions. There exist commonalities between both---they involve truth and falseness. However, booleans exist inside our model, i.e., at the level of programs. Propositions exist outside of the model as they ultimately are statements about the model.

To distinguish between the two, we'll use the terms true and false when discussing booleans and "provable" and "refutable" when discussing propositions. We should think of boolean expressions as evaluating to true or false. In contrast, we will employ logical reasoning to show that a proposition is provable or refutable.

Equivalences Between Expressions

Of the many sorts of propositions possible, we will work exclusively with equivalences in our discussion of program correctness.

Definition: Program Equivalence

Two expressions $e_{1}$ and $e_{2}$ are equivalent, written $e_{1} \equiv e_{2}$ (LaTeX: \equiv, Unicode: ≡) if $e_{1} ⟶^{*} v$ and $e_{2} ⟶^{*} v$ .

Recall that $e ⟶ e^{'}$ ("steps to", LaTeX: \longrightarrow) means that the Python expression $e$ takes a single step of evaluation to $e^{'}$ in our mental model of computation. The notation $e ⟶^{*} e^{'}$ ("evaluates to", LaTeX: \longrightarrow^*) means that $e$ takes zero or more steps to arrive at $e^{'}$ . With this in mind, the formal definition of equivalence amounts to saying the following:

Two expressions are equivalent if they evaluate to identical values.

Thus, we can prove a claim of program equivalence by using our mental model to give a step-by-step derivation (an execution trace) of an expression to a final value. If both sides of the equivalence evaluate to the same value, then we know they are equivalent by our definition! The execution trace itself is a proof that the equivalence holds.

For example, consider the following recursive definition of factorial:

def factorial(n):
    if n == 0:
        return 1
    else:
        n * factorial(n-1)

and subsequent claim about its behavior:

Claim

Claim: factorial(3) $\equiv$ 6.

We can prove this claim by evaluating the left-hand side of the equivalence and observing that it is identical to the right-hand side:

Proof

The left-hand side expression evaluates as follows:

    factorial(3)
--> {if 3 == 0: return 1 else: return 3 * factorial(3-1)}
--> {if False: return 1 else: return 3 * factorial(3-1)}
--> {return 3 * factorial(3-1)}
--> {return 3 * factorial(2)}
--> {return 3 * { if 2 == 0: return 1 else: return 2 * factorial(2-1)}}
--> {return 3 * { if False: return 1 else: return 2 * factorial(2-1)}}
--> {return 3 * { return 2 * factorial(2-1)}}
--> {return 3 * { return 2 * factorial(1)}}
--> {return 3 * { return 2 * {if 1 == 0: return 1 else: return 1 * factorial(1-1)}}}
--> {return 3 * { return 2 * {if False: return 1 else: return 1 * factorial(1-1)}}}
--> {return 3 * { return 2 * {return 1 * factorial(1-1)}}}
--> {return 3 * { return 2 * {return 1 * factorial(0)}}}
--> {return 3 * { return 2 * {return 1 * {if 0 == 0: return 1 else: return 0 * factorial(0-1)}}}}
--> {return 3 * { return 2 * {return 1 * {if True: return 1 else: return 0 * factorial(0-1)}}}}
--> {return 3 * { return 2 * {return 1 * {return 1}}}}
--> {return 3 * { return 2 * {return 1 * 1}}}
--> {return 3 * { return 2 * {return 1}}}
--> {return 3 * { return 2 * 1}}
--> {return 3 * { return 2}}
--> {return 3 * 2}
--> {return 6}
--> 6

This precise step-by-step analysis of the behavior of the expression rigorously proves our claim!

Formatting Proofs

We will use this standard format for writing formal proofs for the remainder of the course. In summary, we write:

Proof Template

Claim: (The claim to be proven)

Proof. (The proof of the claim)

Exercise (Concrete Evaluation, ‡)

Prove the following claim over concrete expressions:

Claim: 1 + 2 + 3 $\equiv$ (3 * (3 + 1)) / 2.

Write our your proof in the format described above.

Symbolic Execution

Previously, we introduced a model of computation for the Racket programming language. This model allows us to prove program properties when concrete values are involved. However, we frequently wish to prove properties where the values are unknown. For example, we might consider a proposition about the standard list_append function:

For all lists l1 and l2, len(l1) + len(l2) = len(list_append(l1, l2))

This proposition ranges over unknown, rather than concrete, lists. We, therefore, need to upgrade our mental model to work with these unknown quantities.

Abstract Propositions

Up to this point, we have considered concrete expressions, i.e., expressions that do not contain variables. What happens if we allow expressions to contain variables. As an example, consider the following implementation of the boolean and function:

# N.B. non-short-circuiting version of `and`
def my_and(b1, b2):
    if b1:
        return b2
    else:
        return False

Now, let's consider the following equivalence claim:

Claim: my_and(True, b) ≡ False

Here, b is a variable, presumed to be of boolean type. However, how do we interpret b? It turns out there are two interpretations we might consider:

Does there exist a boolean value to b so that the proposition is provable?
Is the proposition provable for all possible boolean values that b can take on?

The former interpretation is called an existential quantification of b. We alternatively say that b is existentially quantified or is an "existential." Quantification refers to the fact that our interpretation tells us "how many" values of b to consider in the proposition. In existential quantification, we consider a single value.

In contrast, the latter interpretation is called a universal quantification of b. In universal quantification, we mean that the proposition holds for all possible values of b. Note that the above proposition is provable if b is interpreted existentially: if we let b be #f then:

    my_and(True, b)
--> my_and(True, False)
--> { if True: return False else: return False }
--> { return False }
--> False

However, the proposition does not hold when b is universally quantified. More specifically, while it holds when b is False, it does not hold when b is True.

    my_and(True, b)
--> my_and(True, True)
--> { if True: return False else: return False }
--> { return False }
--> False

Because of this, we must be explicit when introducing variables into our proposition. For each such variable, we must declare whether it is existentially quantified and universally quantified. To do so, we can use the words:

For all for universal quantification, e.g., "for all lists l …" and
There exists for existential quantification, e.g., "there exists a number n …".

Furthermore, we reason about the variable differently depending on its quantification, as we see in the following sections.

Exercise (Quantified Propositions)

Write down an additional existential and universal claim involving the my_and function.

Existential Propositions

If a variable $x$ appears as an existential in a proposition, we interpret that variable as: there exists a value for $x$ such that the proposition holds. In other words, the proposition is provable if we can give a single value to substitute for $x$ so that the resulting proposition is provable. Thus, to prove an existential claim, we can choose such a value, substitute for the existential variable, and then use concrete evaluation as before.

As an example, let's formally prove the existential version of the my_and claim that we introduced above:

Proof

Claim: there exists a boolean b such that my_and(True, b) ≡ False.

Proof. Let b be False. Then we have:

    my_and(True, False)
--> my_and(True, False)
--> { if True: return False else: return False }
--> { return False }
--> False

Note how our proof has changed now that our proposition is abstract:

In our claim, we explicitly quantify the variable b by declaring it existential by using "there exists" to describe it.
In our proof, we explicitly choose a value for the existentially quantified variable ("Let b be False").

We can also existentially quantify over multiple variables. In these situations, we provide instantiations for each variable but otherwise proceed as normal.

Proof

Claim: There exists lists l1 and l2 such that list_append(l1, l2) ≡ [1 2 3].

Proof. Let l1 be [] (the empty list) and l2 be [1 2 3].

    list_append([], [1, 2, 3])
--> { if is_empty([]):
          return [1, 2, 3]
      else:
          return cons(head([]), list_append(tail([]), [1, 2, 3]))
    }
--> { if True:
          return [1, 2, 3]
      else:
          return cons(head([]), list_append(tail([]), [1, 2, 3]))
    }
--> { return [1, 2, 3] }
--> [1, 2, 3]

Exercise (Alternative Instantiation)

Revise the proof of list-append above by choosing alternative instantiations for l1 and l2 and deriving an alternative execution trace for the expression.

Universal Quantification

When a variable is universally quantified, it stands for any possible value. Let's take a look at a simple squaring function:

def square(n):
    return n * n

And a simple universal claim about this function:

Claim

for all numbers n, square(n) ≡ n * n.

Because the claim holds for all possible values of n, we can't choose a n like with existentials. Instead, we must hold n abstract, i.e., consider it to be an arbitrary number, and then proceed with the proof. In effect, because n is universally quantified, we treat n like a constant, yet unknown, quantity in our reasoning.

Variables versus Unknown Constants

It may seem pedantic to distinguish between a variable and a constant of unknown quantity. However, there a subtle yet essential difference between the two concepts. A variable is an object in a proposition that must be quantified to give it meaning. An unknown constant already has meaning---it is known to be a single value. However, we don't assume anything about the variable's identity beyond what we already know, e.g., whether it is a list or a number.

When we use our mental model of computation, we immediately arrive at a problem: both square(n) and n * n cannot take any evaluation steps! square(n) cannot step because n needs to be a value before we perform the function application, and we said that values were numbers, boolean constants, or lambdas. n * n cannot step because since we don't know what n is, we don't know what concrete value the multiplication will produce. We say that both expressions are stuck: they are not values, but they cannot take any additional evaluation steps.

We can't reconcile the n * n case. Without knowing what n is, we cannot carry out the multiplication. However, if we treat the constant-yet-unknown n as a value, then we can proceed with the function application:

    square(n)
--> n * n

Even though the left- and right-hand sides of the equivalence are not values, they are identical. This fact is sufficient to conclude that the two original expressions are equivalent according to our definition of program equivalences! Let's put these ideas together into a complete proof of the proposition:

Proof

Claim: for all numbers n, square(n) ≡ n * n.

Proof. Let n be an arbitrary number. Then the left-hand side of the equivalence simplifies to square(n) --> n * n, which is identical to the right-hand side.

In summary, when we encounter a universally quantified variable in a proposition, we:

Consider the variable to be a constant, yet unknown value. For convenience, we keep the name of this constant to be the same as the (universally quantified) variable, but we understand that the two are different objects!
When reasoning about the constant, we assume that it is a value for the purposes of our mental model of computation.

Case Analysis

Sometimes when we work with universally quantified variables, we can get away without thinking about their actual values. However, more often or not, our reasoning must consider their possible values. This reasoning will ultimately depend on the types of values involved, and this is where our proofs get more intricate in their design!

As an introduction to these concepts, let's consider the case where we know the type in question only allows for a finite set of values. At present, only the boolean type has this property. Booleans only allow for two values, True and False, whereas there are an infinite number of numbers and functions to choose from!

To see how we can take advantage of the finiteness of the boolean type in our proofs, let's consider the following simple claim, again using the my_and function:

Claim

For all booleans b, my_and(b, False) ≡ False.

If we let b be arbitrary, we can begin evaluating the left-hand expression.

    my_and(b, False)
--> { if b: return False else: return False }

We can see pretty readily that no matter how the conditional evaluates, we will return False. Be careful, though; this intuition is not sufficient for formal proof! We must instead rely on our evaluation model directly to ultimately show that this intuition is correct. However, if b is unknown, we don't know which branch the conditional will produce.

Thankfully, we assumed that b was a boolean, so it must either be True or False. Therefore, we can proceed by case analysis: we will consider two separate cases, b is True and b is False, and show that the claim holds in both cases. If the claim holds for both cases, we know that the claim holds for every possible value of b and, thus, have completed the proof.

Proof

Proof. Let b be an arbitrary boolean. The left-hand side of the equivalence evaluates as follows:

    my_and(b, False)
--> { if b: return False else: return False }

Because b is a boolean, either b is True or b is False.

b is True. Then { if True: return False else: return False } --> { return False } --> False
b is False. Then { if False: return False else: return False } --> { return False } --> False

In both cases, the left-hand side steps to False, precisely the right-hand side of the equivalence.

Typesetting Cases

When performing a case analysis, it is imperative to state the proof's different cases explicitly. To format this in LaTeX, use a bulleted lists, e.g.,

\begin{proof}
  Because \code{b} is a boolean, either \code{b} is \code{True} or \code{b} is \code{False}.

  \begin{itemize}
  \item \textbf{\code{b} is \code{True}}.
    Then \code{ \{ if True: return False else: return False \} --> \{ return False \} --> False }
  \item \textbf{\code{b} is \code{False}}.
    Then \code{ \{ if False: return False else: return False \} --> \{ return False \} --> False }`
  \end{itemize}

  In both cases, the left-hand side steps to \code{#f}, precisely the right-hand side of the equivalence.
\end{proof}

Note how I bold each case at the start of the bullet to clearly separate the statement of the case from the subsequent proof.

Exercise (Order of Reasoning)

In the previous proof, we performed the case analysis on \code{b} after partially evaluating \code{my_and}. Rewrite the proof so that the case analysis happens before any evaluation occurs. Was either approach more concise? Easier to reason about? Why?

Exercise (Quantified Propositions, ‡)

Exercise (Quantified Propositions, ‡): consider the following Python definition of the boolean disjunction, \code{or}, function:

def my_of(b1, b2)
    if b1:
        True
    else:
        b2

Prove the following claims over this function:

Claim 1: there exists booleans b1 and b2 such that my_or(b1, b2) ≡ False.
Claim 2: for all booleans b, my_or(b, True) ≡ True.

Program Equivalence Proofs

Problem 1: Equivalence Propositions

Exploring a model for interesting corner cases and behavior is one thing. However, we are ultimately interested in using our model to formally prove properties of programs. We call such properties propositions, statements about the world that are (potentially) provable. A proof is a logical argument that the proposition is indeed true.

There are many kinds of propositions we might consider when we think about program correctness. The most fundamental of these is program equivalence, asserting that two programs produce the same value.

Let's try writing our first proof and writing it down in LaTeX. Here is a recursive function definition over lists:

def is_empty(l):
    return len(l) == 0

def head(l):
    return l[0]

def tail(l):
    return l[1:]

def cons(x, l)
    return [x, *l]

def list_length(l):
    if is_empty(l):
        return 0
    else:
        return 1 + list_length(tail(l))

We will prove the following claim:

Claim

list_length([21, 7, 4]) ≡ 3.

The equivalence symbol (≡) (LaTeX: \equiv) acts like equality in that it asserts that the left- and right-hand sides of the symbol are equivalent. We say that two programs are equivalent if they evaluate to the same final value.

Before we write anything, we must prove the claim first! To show that an equivalence between two expressions holds, it is sufficient to show that both sides evaluate to identical values. The right-hand side of the equivalence is already a value, 3, so we need to show that the left-side of the equivalence evaluates to this same value. Use our mental model of computation to give an evaluation trace of list_length([21, 7, 4]).
Now, let's write the proof in your lab write-up. When writing proofs, we will always:
- Restate the claim.
- Give the proof.
For example, here is a proof of a simple arithmetic equivalence in this style:
Sample Proof
```
Claim: `5 * pow(2, 3) - 1` ≡ `39`.
```
Proof. The left-hand side of the equivalence evaluates as follows:
```
    5 * pow(2, 3) - 1
--> 5 * 8 - 1
--> 40 - 1
--> 39
```
□

Problem 2: Symbolic Reasoning with Xor

Consider the following definition of the boolean xor function:

def xor(b1, b2):
    if b1:
        return not b2
    else:
        b2

Prove the following claims about this function:

Claim (Xor can return true)

There exists a boolean b such that xor(True, b) ≡ True.

Claim (Xor cancellation)

For any boolean b, xor(b, b) ≡ False.

Claim (Xor equivalence)

For any pair of booleans b1 and b2, xor(b1, b2) ≡ and(or(b1, b2), not(and(b1, b2))).

For this final equivalence, you may evaluate individual calls to and, or, and not, in a single step of evaluation.

Demonstration Exercise 2

Problem 1: Verified Grades

Write a function calculate_grade(num_n, num_r, num_m, num_e, num_core) that takes five arguments, the number of Ns, Rs, Ms, Es, and core LOs earned by a student in this course and returns a string corresponding to the major letter grade they earned, i.e., "A", "B", "C", "D", or "F". The calculation for overall grades in the course can be found in the course syllabus. Include your source code for this function in your demo write-up, ideally in a \begin{lstlisting} ... \end{lstlisting} source code block.
State a property about calculate_grade that implies its correctness and formally prove that the property holds of your implementation. Your property must universally quantify over at least three of the parameters to calculate_grade. Otherwise, you are free to state any property of calculate_grade that could reasonably imply that your implementation is correct.

Problem 2: What's Wrong?

Consider the following erroneous implementation of a map_add(n, l) function which ought to add n to every element of a list of numbers l.

def cons(x, l):
    return [x, *l]

define map_add(n, l):
    match l:
        case []:
            return []
        case [head, *tail]
            return map_add(n, tail)

Part 1: Debug

In a sentence or two, describe what is wrong with this implementation.

Part 2: Debunk

Consider the following erroneous "proof" of the correctness of map_add.

Proof

Claim For any list of natural numbers l, map_add(0, l) ≡ l.

Proof. We proceed by induction on l.

Case: l is empty. Then the left-hand side evaluates as follows: map_add(0, l) --> [] ≡ l.
Case: l is non-empty. Let head and tail be the head and tail of l, respectively. Our induction hypothesis states that:

IH: map_add(0, tail) ≡ l.

The left-hand side evaluates as follows: map_add(0, l) -->* map_add(0, tail). By our induction hypothesis, map-add(0, tail) ≡ l, which completes proof.

Analyze the proof and in a sentence or two describe the error in the proof that allows the proof to go through.

Part 3: Deal With

Finally, replicate the proof, fixing the error identified in the previous step, and attempt to carry the proof to completion. Describe in a sentence or two where you get "stuck" in the proof and cannot proceed forward.

Problem 3: Zero Is The Hero

Part 1: Implement

Implement a recursive Python function dropzeroes(l) that takes a list l of numbers as input and returns l but with all 0s. For example dropzeroes([1, 3, 0, 1, 0, 2]) -->* [1, 3, 1, 2]. Make sure to test your code in Python and add your code to this write-up using a code block.

Part 2: Prove

Prove the following property of dropzeroes:

Claim (Idempotency of dropzeroes)

dropzeroes(dropzeroes(l)) ≡ dropzeroes(l)

(Hint: be precise with your program derivations and every step you make! In particular, make sure that every step is justified, and you consider all possibilities of evaluation.)

Preconditions and Proof States

With quantified variables, we can write rich equality propositions over functions. However, there is an essential detail in our claims that we have glossed over until now. Recall our list_append claim from the previous readings:

Claim

For any lists l1 and l2, len(l1) + len(l2) ≡ len(list_append(l1, l2)).

We snuck under the radar that we assumed that l1 and l2 were lists! This fact is essential because Python is happy to let us call our functions with any values that we want. The only caveat is that we may not like the results!

>>> list_append("banana", 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in list_append
  File "<stdin>", line 5, in list_append
  File "<stdin>", line 5, in list_append
  [Previous line repeated 3 more times]
TypeError: Value after * must be an iterable, not int

Sometimes we might get lucky and get a runtime error indicating that our assumption was violated. However, in other cases, we aren't so lucky:

>>> list_append([], "wrong")
'wrong'

Here we feed list_append a string for its second argument. Recall that list_append performs case analysis on its first argument. When that first argument is empty, then the second argument is returned immediately. So even though the second argument had an incorrect type, the function still "works."

Nothing stops the user of list_append from passing inappropriate arguments. list_append assumes that the arguments passed obey specific properties, in this particular case, that the arguments are lists. These assumptions are integral to the correct behavior of the function. Consequently, when we analyze a program, we will also need to track and utilize these assumptions in our reasoning. This realization will lead us to a formal definition of proof that will guide our remaining study of the foundation of mathematics.

Review: Preconditions and Postconditions

In CSC 151, we captured assumptions about our functions' requirements and behavior as preconditions and postconditions.

Definitions

Precondition: a property about the inputs to a function that (a) the caller of the function must fulfill and (b) the function assumes are true.

Post-condition: a property about the behavior of the function that (a) the caller can assume holds provided they fulfilled the preconditions of the function and (b) the function ensures during its execution.

Preconditions and post-conditions form a contract between the caller of the function and the function itself, also called the callee. The caller agrees to fulfill the preconditions of the function. In return, the callee fulfills its post-conditions under the assumption that the preconditions have been fulfilled.

Preconditions and post-conditions are an integral aspect of program design. They formalize the notion that while a function, in theory, can guard against every possible erroneous input, it is impractical to do so. For example, recall that mergesort relies on a helper function merge that takes two lists and combines them into a single sorted list:

def merge(l1, l2):
    '''merge(l1, l2) merges lists l1 and l2 into a sorted list.'''
    # ...
    pass

The post-condition of merge is that the list it returns is sorted, i.e., using our language of propositions:

Post-condition

is_sorted(merge(l1, l2)) ≡ True.

is_sorted is a custom function that takes a list l as input and returns True if and only if l is sorted.

However, what preconditions do we place upon the inputs, l1 and l2?

l1 and l2 must both be lists.
The elements of l1 and l2 must all be comparable to each other (presumably with the (<) operator).
l1 and l2 must be sorted.

If the implementor of merge was paranoid, they might consider implementing explicit checks for these three preconditions. However, imagine what these checks might look like in code. All the checks require that we either:

Scan the lists l1 and l2 multiple times, once for each precondition we wish to check.
Integrate all three checks into the merging behavior of the function.

The former option is modular---we can write one helper function per check---but inefficient. The latter option is more efficient but highly invasive to the merge function and leads to an unreadable implementation.

Both options are undesirable. This fact is why we usually choose a third option: make the preconditions documented assumptions about the inputs of the function.

Checking Preconditions in Code

Note that how we account for preconditions in our programs is ultimately an engineering decision. You have to consider the context you are in---development ecosystem, company culture--and perform a cost-benefit analysis to determine what the "correct" approach is for your code.

In CSC 151, we introduced a "kernel-husk" implementation that separated precondition checks from the function's primary behavior. While inefficient, this approach is desirable in some situations because Scheme provides little static error checking. Other languages provide different mechanisms to capture precondition checks without incurring runtime costs when they are important. For example, Java has an assert statement that you can use to check preconditions only when you compile the program in debug mode. When you release your software, you can easily switch to release mode, which removes all the assert checks.

Preconditions as Assumptions During Proof

When we reason about concrete propositions, preconditions are unnecessary because we work with actual values. However, with abstract propositions, preconditions become assumptions about (universally quantified) variables.

We initially obtain these assumptions as part of the proposition we are verifying. For example, in our list_append claim:

Claim

For any lists l1 and l2, len(l1) + len(l2) ≡ len(list_append(l1, l2)).

The text "for any lists …" introduces the precondition that l1 and l2 are lists.

As another example, consider the post-condition of merge now realized as a proposition:

Claim

For any lists of numbers l1 and l2, if is_sorted(l1) ≡ True and is_sorted(l2) ≡ True then is_sorted(merge(l1, l2)) ≡ True.

In addition to the assumptions we have about the type of the variables, we can also have arbitrary properties about these variables.

Our claim is of the form "if … then …" where the preconditions sit between the "if" and "then" We assume these preconditions and then go on to prove the claim which follows the "then." In this example, the additional preconditions are that:

is_sorted(l1) ≡ True
is_sorted(l2) ≡ True

And then the claim we prove with these assumptions is:

is_sorted(merge(l1, l2)).

Short-hand for Propositions Involving Booleans

As we discuss preconditions throughout this reading and beyond, we will frequently work with preconditions that assert an equivalence between two expressions of boolean type. For example, to state the proposition that x is less than y, we would declare that it is equivalent to True:

x < y ≡ True

This notation is quite taxing to write where we have many preconditions in our reasoning. Instead, we'll use shorthand, taking advantage of the similarities between boolean expressions and propositions. When we write the following proposition:

x < y

We really mean the complete equivalence x < y ≡ True. With this notation, we can rewrite the above claim above more elegantly as:

Claim

For any lists of numbers l1 and l2, if is_sorted(l1) and is_sorted(l2) then is_sorted(merge(l1, l2)).

On Mathematical Short-hand

This short-hand is our first example of a common mathematics practice. In our pursuit of a rigorous, formal definition, we might create a situation where it is highly inconvenient to use this definition in prose. Perhaps this definition is too verbose. Or maybe it is highly parameterized, but in the common case, the parameterization is unnecessary.

In these situations, we introduce short-hand notation in mathematics to concisely write down our ideas. An example of this notation are symbols, e.g., $x ∣ y$ for " $x$ divides $y$ ." Alternatively, we may introduce puns, seemingly invalid mathematical syntax, given special meaning in context. For example, a Racket expression is not a program equivalence. However, in the above example, we introduce the pun that a boolean Python expression is, implicitly, a program equivalence with True.

Mathematical short-hand is a secret sauce of writing mathematics. It allows us to write beautifully concise yet thoroughly precise prose. However, if you aren't aware of the meaning of symbols and puns in a piece of mathematical writing, you can quickly become lost. Keep an eye out for such short-hand when reading mathematical prose, and you will find your comprehension to go up rapidly as a result.

Exercise

Write a claim about the behavior of the following function that implies its correctness. Make sure to include any necessary preconditions in your claim.

def index_of(x, l):
    '''Returns the (0-based) index of element x in list l'''
    # ...
    pass

Tracking and Utilizing Assumptions

Assumptions about our variables initially come from preconditions we write into our claims. How do we use these assumptions in our proofs of program correctness? It turns out we have been doing this already without realizing it!

For example, in our analysis of the boolean my_and function from our previous reading, we said the following:

Because b is a boolean, either b is True or b is False.

This line of reason is only correct because we assumed via a precondition:

Claim

For all booleans b, my_and(b, False) ≡ False.

That b is indeed a boolean. If we did not have such a precondition, this line of reasoning would be invalid!

When we have an assumption that asserts the type of a universally quantified variable, we can use that assumption to conclude that the variable is of a particular shape according to that type. For example:

A variable that is a natural number is zero or a positive integer.
A variable that is a function is bound to a lambda value, e.g., (lambda (n) (+ n 1)).
A variable that is a boolean is either #t or #f.

Furthermore, if an operation requires a value of a particular type as a precondition, a type assumption about a variable allows us to conclude that the variable fulfills that precondition. For example, if we know that x is a number then we know that x > 3 will produce a boolean result instead of potentially throwing an error.

Utilizing General Assumptions

Besides type assumptions, we can also assert general properties about our variables as preconditions. In our merge example, we assumed that the input lists satisfied the is_sorted predicate, i.e., were sorted. We can then use this fact to deduce other properties of our lists that will help us ascertain correctness, e.g., that the smallest elements of each list are at the front.

Because these properties are general, we have to reason about them in a context-specific manner. Let's look at a simple example of reasoning about one common kind of assumption: numeric constraints. Consider the following simple numeric function:

def double(n):
    return n * 2

And suppose we want to prove the following claim about this function:

Claim

For all numbers n, if n > 0 then double(n) > 0.

Employing our symbolic techniques, we first assume that n is an arbitrary number. However, the claim that follows the quantification of n includes a precondition---it is of the form "if … then … ". Therefore, in addition to n, we also gain the assumption that n > 0, i.e., n is positive.

When we go prove that double(n) > 0, we know from our model of computation that:

double(n) > 0 --> n * 2 > 0

This resulting expression is not always true. In particular, if n is non-negative, then n * 2 will be negative and thus less than 0. However, our assumption that n > 0) tells us this is not the case!

Here is a complete proof of the claim that employs this observation.

Proof

Claim: For all numbers n, if n > 0 then (double n) > 0.

Proof. Let n be a number and assume that n > 0. By the definition of double we have that:

double(n) > 0 --> n * 2 > 0.

However, by our assumption that n > 0 we know that n * 2 is a positive quantity---multiplying two positive numbers results in a positive number. Therefore, n * 2 > 0 holds.

Being Explicit When Invoking Assumptions

In the above proof, I was explicit about:

When I used an assumption in a step of reasoning, and
How I used that assumption to infer new information.

At this stage of your development as a mathematician, you should do the same for any other assumption you employ in a proof. As an example of what not to do, here is the same proof but without the explicit call-out of the assumption:

Proof

Claim: For all numbers n, if n > 0 then double(n) > 0.

Proof. Let n be a number and assume that n > 0. By the definition of double we have that:

double(n) > 0 --> n * 2 > 0.

Therefore, n * 2 > 0 holds.

This proof is valid. The steps presented and the conclusion are sound. However, this is a less formal proof because it has left some steps of reasoning implicit. In this course, we aim for rigorous, formal proof, and so we should not leave any steps implicit in our reasoning.

That being said, we will quickly find that dogmatically following this guidance will lead us to ruin. Analogous to programming, if we write our proofs at too low of a level, we will get lost in the weeds with low-level details and lose sight of the bigger picture.

For now, let's be ultra-explicit about our reasoning and follow the formula above whenever we invoke an assumption. However, because reasoning about type constraints, e.g., happens so frequently, it is okay if you elide when you use a type assumption in your proofs. For example, instead of saying:

By our assumptions, we know b is a boolean. Therefore, b must either be True or False.

You can, instead, say:

b is either True or False.

However, you should still be clear when you are assuming the type of a variable using a "let"-style statement, e.g.,

Let b be a boolean.

Assumptions that Arise During Analysis

Assumptions arise not only when we initially process our claim. We also gain new assumptions during the proving process. For example, let's consider the following simple function:

def nat_sub(x, y):
    if x < y:
        return 0
    else:
        return x - y

And the following claim about this function:

Claim

For all numbers x and y, nat_sub(x, y) >= 0.

Here is a partial proof of this claim:

Proof

Let x and y be numbers. Then we have that:

    nat_sub(x, y)
--> { if x < y: return 0 else: return x - y }

But now, we're stuck! We need to evaluate the guard to proceed, but we don't know how the expression x < y will evaluate. However, we do know the following:

x and y are numbers.
x < y will produce a result because x and y are numbers.
The result of x < y is either True or False because the expression has boolean type.

Because of this, we can proceed by case analysis on the result of x < y: it evaluates to either True or False. Let us consider each case in turn:

Proof

x < y --> True: In this case:

    { if x < y: return 0 else: return x - y }
--> { if True: return 0 else: x - y }
--> { return 0 }
--> 0

So nat_sub(x, y) -->* 0 which is greater-than or equal to 0.

x < y --> False: In this case

    { if x < y: return 0 else: return x - y }
--> { if False: return 0 else: x - y }
--> { return x - y }
--> x - y

Here, we seem to be stuck again. We don't know how to proceed with the subtraction since x and y are held abstract. However, we must remember that in this case, we are assuming that x < y is False. Therefore, we know that x is greater than or equal to y. Because we know the difference between a larger number and a smaller number is positive, we can conclude that x - y <= 0 as desired.

Let's see this reasoning together in a complete proof of our claim.

Proof

Claim: for all numbers x and y, nat_sub(x, y) >= 0.

Proof. Let x and y be numbers. Then we have that:

    nat_sub(x, y)
--> { if x < y: return 0 else: return x - y }

Either x < y --> #t or x < y --> #f.

x < y --> True: In this case:

    { if x < y: return 0 else: return x - y }
--> { if True: return 0 else: x - y }
--> { return 0 }
--> 0

Thus, nat_sub(x, y) -->* 0, a non-negative result.

x < y --> False. Then we have:
```
    { if x < y: return 0 else: return x - y }
--> { if False: return 0 else: x - y }
--> { return x - y }
--> x - y
```
From our case analysis, we assume that x < y does not hold. Thus, we know that x >= y. Because we know subtracting an equal-or-smaller number from a larger number results in a non-negative quantity, we can conclude that x - y <= 0.

In summary, we can obtain assumptions from places other than our claims, e.g., through case analysis or the post-condition of an applied function. We can then use these newly acquired facts to complete our proofs.

Exercise (Nesting)

Consider the following conditional statement in Python:

if x < 0:
    return e1
elif x <= 3:
    return e2
elif x == 5:
    return e3
elif x <= 8:
    return e4
else
    return e5

For each branch expressions e1 through e5 above, describe the assumptions about x you can make by entering that branch of the cond.

(Hint: make sure you account for the fact that to reach a branch, all the previous branches must have returned false.)

Proof States and Proving

In summary, we have introduced assumptions into our proofs of program correctness. These assumptions come from preconditions or through analysis of our code. We use these assumptions to prove our claim. Because our claim evolves throughout the proof, e.g., it is more accurate to say that we use assumptions to prove a goal proposition where the initial goal proposition is our original claim.

Surprisingly, it turns out that all mathematical proofs can be thought of in these terms!

Definitions

Proof State: the state of a proof or proof state is a pair of a set of assumptions and a goal proposition to prove.

Proof: a proof is a sequence of logical steps that manipulate a proof state towards a final, provable result.

When either reading or writing a mathematical proof, we should keep these definitions in mind. In particular, we have to be aware at every point of the proof:

What is our set of assumptions?
What are we trying to prove?

Exercise (Entry Fees, ‡)

Consider the following function that calculates the entry fee of a single ticket to Disney World in 2021 (mined from the Disney website):

def calculate_price(day_of_week, age, is_park_hopper):
    return 115 + \
           -5 if age < 10 else 0 + \
           14 if is_friday_or_weekend(day_of_week) else 0 + \
           65 if is_park_hopper else 0

(For conciseness sake, the code above utilizes two lesser known features of Python:

You can split up a long line into multiple lines by adding '\' to the end of each line.
The conditional expression operator e2 if e1 else e3 is analogous to the Scheme conditional expression (if e1 e2 e3). If e1 evaluates to True, the conditional expression evaluates to whatever e2 evaluates to. Otherwise, it evaluates to whatever e3 evaluates to.)

Prove the following claim:

Claim: For all days of the week d, natural numbers n, and boolean b, if b ≡ True then calculate-price(d, n, b) >= 175.

(Note: yea, Disney World is expensive, huh?)

Assumptions in proofs

Shorthand for Program Derivations

Most of the functions we will encounter from this point forward are simple conditionals, but our mental model can be quite verbose in these situations. Consider the following toy function:

def f(x):
    if 0 < x + 2:
        return "hello"
    else:
        return "goodbye"

For example, in our model, the function call f(1) evaluates as follows:

--> f(1)
--> { if 0 < 1 + 2:
         return "hello"
      else:
         return "goodbye" }
--> { if 0 < 3
         return "hello"
      else:
         return "goodbye" }
--> { if True
         return "hello"
      else:
         return "goodbye" }
-->  { return "hello" }
--> "hello"

While precise, these steps will obscure the broader argument we are trying to make for our program.

To make our program derivations more readable, we will use shorthand for this lab when evaluating function calls:

You are free to evaluate a function call directly to the returned expression that ultimately executed by the function.

If the statement-expression above came about from a function call, say f(1), we would write the evaluation of that call as:

     f(1) 
-->* "hello"

Skipping over the explicit execution of the statement-expression that the function call evaluates to in our model. We use the notation -->* (LaTeX: \longrightarrow^*, Unicode: ⟶*) to mean that multiple evaluation steps happened between the two lines of the derivation.

Be aware that whenever we skip steps, we introduce the possibility of making an error in our reasoning! So double-check your work whenever you skip steps in this manner!

Problem: Narrowing Down the Possibilities

Consider the following Python functions:

def f1(name):
    if name == "Alice":
        return 0            # Point A
    elif name == "Bob":
        return 1            # Point B
    elif name == "Carol":
        return 2            # Point C
    elif name != "Daniel":
        return 3            # Point D
    elif name != "Emily":
        return 4            # Point E

def f2(x, y):
    if y >= 1:
        return 0            # Point A
    elif x >= 1: 
        return 1            # Point B
    elif x == -1:
        return 2            # Point C
    elif x <= -5 or y >= 3:
        return 3            # Point D
    elif x == 0:
        return 4            # Point E
    elif x == y:
        return 5            # Point F

For each of the functions:

Identify the types of each of the parameters
Describe the set of assumptions we know are true about the parameters inside each branch at each labeled point, i.e., when that branch's guard is True.
Are each of the conditionals exhaustive? If not, describe what values are not covered by the if-else statements.

Problem: Capture

Consider the following Python function that computes a report for a student given their current grades in a course:

def report(hw_avg, quiz1, quiz2, quiz3):
    if hw_avg > 90 and quiz1 > 90 and quiz2 > 90 and quiz3 > 90:
        return "excelling"
    elif: hw_avg > 75 and quiz1 <= quiz2 and quiz2 <= quiz3:
        return "progressing"
    elif: hw_avg < 60 or quiz1 < 60 or quiz2 < 60 or quiz3 < 60:
        return "needs help"
    else:
        return "just fine"

Describe the preconditions (both in terms of types and general constraints) on the parameters.
Describe the conditions under which report(hw_avg, quiz1, quiz2, quiz3) will return "just fine". Describe them in terms of properties that ought to hold on each parameter to the function.
Are there any conditions under which the function reports an inappropriate string, given the arguments and your interpretation of the function's intended behavior?

Problem: Looking Ahead

In this problem, we'll motivate our discussion for next period on recursion and induction. Consider the standard append function for lists.

# Our standard list helper functions...
def is_empty(l):
    return len(l) == 0

def head(l):
    return l[0]

def tail(l):
    return l[1:]

def cons(x, l):
    return [x, *l]

def append(l1, l2):
    if is_empty(l1):
        return l2
    else:
        return cons(head(l1), append(tail(l1), l2))

Part 1: Simple Reasoning About Append

While the function is recursive, our substitutive model of computation is capable of modeling the behavior of this function. Thus, we can prove properties about append just like any other function. Prove this one as an exercise:

Claim (Nil is an Identity on the Left)

For any list l, append([], l) ≡ l.

Part 2: (Not) Commutativity

We say that an operation is commutative if we can swap the order of the objects involved, i.e., if x ≡ y then y ≡ x and vice versa. This is true for some operations, e.g., integers and addition, but not other, e.g., integers and subtraction. It turns out that append is not commutative for lists! Prove this fact by way of a counterexample:

Claim (Append is Not Commutative)

There exists lists l1 and l2 such that append(l1, l2) $ \not\equiv $ append(l2, l1).

Part 3: A Humble Attempt

Because append is not commutative, this means that just because the empty list is an identity on the left-hand side of the function, it is not necessarily an identity on the right-hand side. With a little bit of thought, though, we can convince ourselves this ought to be the case. However, proving this fact is deceptively difficult! Attempt to prove this claim:

Claim (Nil Is An Identity on the Right)

For any list l, append(l, []) ≡ l.

(Hint: you can perform case analysis on each new list you encounter in your proof, even the tail of the original list!)

At some point, you will get stuck, ideally at a point where you start seeing that your reasoning will go on infinitely. Check with an instructor that you have indeed gotten stuck at the "right point."

Part 4: Reflection

Finally, reflect on your experience. Answering the following questions in a few sentences each:

Why were you able to directly prove the left-identity claim?
Why is the right-identity claim more difficult to prove?
What information do you need in order to break the chain of infinite reason that you found in the previous proof?

Inductive Reasoning

Review of Recursive Design

The workhorse of functional programming is recursion. Recursion allows us to perform repetitive behavior by defining an operation in terms of a smaller version of itself. However, recursion is not just something you learn in Racket and summarily forget about for the rest of your career. Recursion is pervasive everywhere in computer science, especially in algorithmic design.

Because of this, we must know how to reason about recursive programs. Our model of computation is robust enough to trace the execution of recursive programs. However, our formal reasoning tools fall short in letting us prove properties of these programs. We introduce induction one of the foundational reasoning principles in mathematics and computer science, to account for this disparity.

A Review of Lists

Lists are a common data structure found in most modern programming languages. Python is no exception!

Like numbers, there are an infinite number of possible lists, e.g.,

[]
[3, 5, 8]
["Hi", "goodbye", "!"]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10].

However, our recursive definition of a list categorizes these infinite possibilities into a finite set of cases.

Definition (List)

A list is either:

Empty, or
Non-empty, consisting of a head element and the remainder of the list, its tail.

This definition is similar to that of a boolean, where we define a boolean as one of two possible values: True or False. However, the definition of a list is recursive because the non-empty case includes a component that is also a list.

Accessing Lists Recursively

In Python, we typically operate over lists much like an array with indexing operations and mutation. However, we can easily write functions that allow us to operate on lists according to our recursive definition, similarly to Scheme.

def is_empty(l):
    '''is_empty(l) returns true if list l is empty'''
    return len(l) == 0

def head(l):
    '''head(l) returns the element at the front of (non-empty) list l'''
    return l[0]

def tail(l):
    '''tail(l) returns (non-empty) list l, but without its head element'''
    return l[1:]

The implementation of the tail function, in particular, takes advantage of Python's list slicing expressions. l[n:m] returns a list containing the elements of l starting at index n, ending at index m (exclusive). When we elide m in a list slice, m becomes len(l), i.e., the last index of the list.

We can create lists explicitly with list literal notation, e.g., [1, 2, 3, 4, 5]. Additionally, we can write a "cons" function:

def cons(x, l):
    '''cons(x, l) returns list l but with element x at the front'''
    return [x, *l]

That allows us to add an element to the front of a list. The implementation of cons uses Python's sequence unpacking expression where *l takes the elements of list l and, effectively, injects them into a context where a sequence is expected. Here, the elements of l become the remaining elements of the list literal that cons returns.

Exercise

Predict the results of each of the following expressions:

head([1, 2, 3])
tail([1, 2, 3])
head(tail(tail([1, 2, 3])))
is_empty(tail(tail(tail([1, 2, 3]))))

Recursive Design with Lists

Because lists are defined via a finite set of cases, we define operations over lists using a combination of the is_empty function to determine which kind of list we have and head and tail to access parts of the list. For some simple operations, this is enough to get by, e.g., a function that retrieves the second element of a list:

def second(l):
    if is_empty(l):
        raise ValueError('empty list given')
    elif is_empty(tail(l)):
        raise ValueError('singleton list given')
    else:
        return head(tail(l))

We can translate this code to a high-level description:

If the list is empty, throw an error (via Python's raise statement).
If the list contains one element, throw an error.
If the list has at least two elements, retrieve the second element.

However, mere case analysis doesn't allow us to write more complex behavior. Consider the prototypical example of a recursive function: computing the length of a list. Let's first consider a high-level recursive design of the operation of this function, call it length(l). Because a list is either empty or non-empty, this leads to straightforward case analysis. The empty case is straightforward:

If the list is empty, its length is 0.

However, in the non-empty case, it isn't immediately clear how we should proceed. We can draw the non-empty case of a list l with head element v and tail tl as follows:

l = [v][ ?? tl ?? ]

Besides knowing that tl is a list, we don't know anything about it---it is an arbitrary list. So how can we compute the length of l in this case? We proceed by decomposing the length according to the parts of the lists we can access:

length =  1 + length t
l      = [h][ ?? t ?? ]

We know that the head element h contributes 1 to the overall length of l. How much does the tail of the list t contribute? Since t is, itself, a list, t contributes whatever length(t) produces. But this is a call to the same function that we are defining, albeit with a smaller list than the original input!

Critically, as long as the call to length(t) works, the overall design of length is correct. This assumption that we make---that the recursive call "just works" as long as it is passed a "smaller" input---is called the recursive assumption, and it is the distinguishing feature of recursion compared to simple case analysis.

In summary, we define length recursively as follows:

The length of a list l is:

0 if l is empty.

1 plus the length of the tail of l if l is non-empty.

In the recursive case of the definition, our recursive assumption allows us to conclude that the "length of the tail of l" is well-defined.

Our high-level recursive design of length admits a direct translation into Racket using our list functions:

def length(l):
    if is_empty(l):
        return 0
    else:
        return 1 + length(tail(l))

Because the translation is immediate from the high-level recursive design, we can be confident our function is correct provided that the design is correct.

Pattern Matching with Lists

Recall that one of our design goals is to write programs that are correct from inspection. In particular, when we have a recursive design, we want our code to look like that design. Let's see how our recursive definition of length fares in this respect. Below, we have replicated the definition of length with the recursive design in-lined in comments:

def length(l):
    if is_empty(l):             # A list is either empty or non-empty.
        return 0                # + The empty list has zero length.
    else:                       # + A non-empty list has...
        hd = head(l)            #     - A head element hd and
        tl = tail(l)            #     - A tail element tl.
        return 1 + length(tl)   #   The length of a non-empty list is
                                #   plus the length of the tail.

In this version of the code, we explicitly bind the head and tail of l to be clear where these components of l are manipulated.

Overall, this isn't too bad! Like our design, the code is clearly conditioned on whether l is empty or non-empty. Furthermore, the results of the cases clearly implement the cases of our design, so we can believe our implementation is correct as long as we believe our design is correct.

Is there anything we can improve here? Yes—some subtle, yet important things, in fact:

We need to make sure that the guard of our conditional accurately reflects the cases of our data structure. Here, our list is either empty or non-empty which is captured by an is_empty check.
We know that in the recursive case that our non-empty list is made up of a head and tail which we need to manually access using head and tail, respectively. We locally bind names to these individual pieces so that we don't interchange head and tail calls in our code, but these bindings add additional complexity to our implementation.

To fix these issues, we'll use the pattern matching facilities of Python to express our recursive design directly without the need for a guard expression or let-binding. Note that when we talk about pattern matching here, we don't mean regular expression matching but instead a separate facilities of Python for writing code that looks at the shape of a data type.

First, we'll revise our list definition slightly based on the functions we use to construct lists.

A list is either:

[], the empty list.

cons(v, l), a non-empty list constructed that consists of a head element v and a list l.

Remember, in this recursive scheme, a list is ultimately composed of repeated cons calls ending in []. For example:

>>> [1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
> cons(1, cons(2, cons(3, cons(4, cons(5, []))))) 
[1, 2, 3, 4, 5]

Because of this, we know that our constructive definition of a list covers all possible lists. Now, we'll use pattern matching to directly express length in terms of this constructive definition:

def length(l):
    match l:
        case []:
            return 0
        case [hd, *tl]:
            return 1 + length(tl)

This version of length behaves identically to the previous version of the code but is more concise, directly reflecting our constructive definition of a list.

The pattern matching statement in Python allows us to directly perform a case analysis on data:

After the match keyword is the scrutinee or subject of the pattern match, l.
Following the scrutinee are branches, each one corresponding to a particular shape of the data we're analyzing.
- Following the case keyword is a pattern, a particular value shape that the scrutinee must match in order for this branch to be selected.
- On the next line after each case is the body of the pattern match that is evaluated when the scrutinee matches the branch's pattern.

Importantly, patterns may contain variables that are bound to parts of the scrutinee on a successful match. The first pattern, the empty list [], does not contain any such variables. But the second pattern, a list literal pattern combined with an unpacking operator, binds the first element of the list to hd and the tail of the list to tl. By using pattern matching, we no longer need to bind locals to name the subcomponents of a list!

A Recursive Skeleton for Lists

Ultimately, the recursive design of a function contains two parts:

Case analysis over a recursively-defined structure.
A recursive assumption allowing us to use the function recursively on a smaller object than the input.

When we fix the structure, e.g., lists, we arrive at a skeleton or template for defining recursive functions that operate on that structure. This skeleton serves as a starting point for our recursive designs. The skeleton always mimics the recursive definition of the structure:

Recursive Skeleton for Lists

For an input list l:

What do we do when l is empty (the base case)?
What do we do when l is non-empty (the recursive case)? When l is non-empty, we have access to the head of l and the tail of l with pattern matching. Furthermore, when l is non-empty, we can use our recursive assumption to recursively call our function on the tail of l.

Note that this skeleton is only a starting point in our recursive design. We may need to generalize or expand the skeleton, e.g., by adding additional base cases depending on the problem.

Exercise (Intersperse)

Write a high-level recursive design for the list_intersperse function. list_intersperse(v, l) returns l but with v placed between every element of l. For example:

>>> list_intersperse(0, [1, 2, 3, 4, 5])
[1, 0, 2, 0, 3, 0, 4, 0, 5]
>>> list_intersperse(0, [])
[]
>>> list_intersperse(0, [1])
[1]

(Hint: this is an example of a function where its most elegant implementation comes from having multiple base cases. Consider an additional base case in your recursive design.)

Inductive Reasoning with Lists

Now that we've discussed how we write recursive programs over lists, we'll develop our primary technique for reasoning about recursive structures, structural induction.

Reasoning About Recursive Functions

Let's come back to the proposition about append that we used to start our discussion of program correctness:

Claim

For all lists l1 and l2, length(l1) + length(l2) ≡ length(append(l1, l2)).

To prove this claim, we need the definitions of both length and append.

Exercise (Append)

Try to design and implement append(l1, l2), which returns the result of appending l1 onto the front of l2 without peeking below!

def length(l)
    match l:
        case []:
            return 0
        case [head, *tail]:
            return 1 + length(tail)

def append(l1, l2):
    match l1:
        case []:
            return l2
        case [head, *tail]:
            return cons(head, append(tail))

The proof proceeds similarly to all of our symbolic proofs so far: assume arbitrary values for the universally quantified variables and attempt to use symbolic evaluation.

Proof

Proof. Let l1 and l2 be arbitrary lists. The left-hand side of the equivalence evaluates to:

    length(l1) + length(l2)
--> { match l1:
          case []:
              return 0
          case [head, *tail]:
              return 1 + length(tail) } + length(l2)

The right-hand side of the equivalence evaluates to:

    length(append(l1, l2))
--> length({ match l1:
                 case []:
                     return l2
                 case [head, *tail]:
                     return cons(head, append(tail, l2)) })

At this point, both sides of the equivalence are stuck. However, we know that because l1 is a list, it is either empty or non-empty. Therefore, we can proceed with a case analysis of this fact!

Proof (Empty Case)

Either l1 is empty or non-empty.

l1 is empty, i.e., l1 is []. The left-hand side of the equivalence evaluates as follows:

...
--> { match []:
          case []:
              return 0
          case [head, *tail]:
              return 1 + length(tail) } + length(l2)
--> { return 0 }
--> 0 + length(l2)
--> length(l2)

On the right-hand side of the equivalence, we have:

...
--> length({ match []:
                case []:
                    return l2
                case [head, *tail]:
                    return cons(head, append(tail, l2)) })
--> length({ return l2 })
--> length(l2)

Both sides evaluate to (length l2), so they are equivalent!

So the empty case works out just fine. What about the non-empty case?

Proof (Non-empty Case)

l1 is non-empty. Since l1 is non-empty, l1 is (cons h t) for some value h and list t, so on the left-hand side of the equivalence, we have:

...
--> { match l1:
          case []:
              return 0
          case [head, *tail]:
              return 1 + length(tail) } + length(l2)
--> { return 1 + length(tail) } + length(l2)
--> (1 + length(tail)) + length(l2)
--> 1 + (length(tail) + length(l2))

The final step of evaluation comes from the commutative property of addition: $(1 + x) + y = 1 + (x + y)$ .

On the right-hand side of the equivalence, we have:

...
--> length({ match l1:
                case []:
                    return l2
                case [head, *tail]:
                    return cons(head, append(tail, l2)) })
--> length({ return cons(head, append(tail, l2)) })
--> length(cons(head, append(tai1, l2)))
--> { match cons(head, append(tail, l2)):
          case []:
              return 0
          case [h2, t2]:
              return 1 + length(t2) }
--> { return 1 + length(t2) }
--> return 1 + length(t2)
--> return 1 + length(append(tail, l2))

Note that this evaluation is a bit trickier than the previous ones that we have seen. In particular, we have to observe that the tail of cons(x, y) is simply y! Nevertheless, if we push through accurately, we can persevere!

At this point, our equivalence in the non-empty case is:

1 + length(tail) + length(l2) ≡ 1 + length(append(tail, l2))

tail is still abstract, so we can't proceed further. One way to proceed is to note that tail itself is a list. Therefore, we can perform case analysis on it---is tail empty or non-empty?

Proof (Case analysis on the tail)

(Still in the case where l1 is non-empty.)

tail is either empty or non-empty.

tail is empty. The left-hand side of the equivalence evaluates to:

...
--> 1 + length([]) + length(l2)
--> 1 + { match []:
              case []:
                  return 0
              case [head2, *tail2]:
                  return 1 + length(tail2) } + length(l2)
--> 1 + { return 0 } + length(l2)
--> 1 + 0 + length(l2)
--> 1 + length(l2)

The right-hand side of the equivalence evaluates to:

...
--> 1 + length(append([], l2))
--> 1 + length({ match []:
                     case []:
                         return l2
                     case [head2, *tail2]:
                         return cons(head2, append(tail2, l2)) })
--> 1 + length({ return l2 })
--> 1 + length(l2)

Both sides of the equivalence are 1 + length(l2), completing this case.

Note that when tail is empty, the original list l1 only contains a single element. Therefore, it should not be surprising that the equivalence boils down to demonstrating that both sides evaluates to 1 + length(l2).

Again, while the empty case works out, the non-empty case runs into problems.

Proof (Case analysis on the tail, non-empty case)

(Still in the case where l1 is non-empty.)

tail is non-empty. The left-hand side of the equivalence evaluates to:

...
--> 1 + length([]) + length(l2)
--> 1 + { match []:
              case []:
                  return 0
              case [head2, *tail2]:
                  return 1 + length(tail2) } + length(l2)
--> 1 + { return 1 + length(tail2) } + length(l2)
--> 1 + (1 + length(tail2)) + length(l2)
--> 1 + 1 + length(tail2) + length(l2)

tail2 here is the tail of tail, i.e., tail(tail(l1))!

Notice a pattern yet? Here is where our case analyses have taken the left-hand side of the equivalence so far:

     length(l1) + length(l2)
-->* <... l1 is non-empty ...>
-->* 1 + length(tail) + length(l2)
-->* <... tail of l1 is non-empty ...>
-->* 1 + 1 + length(tail2) + length(l2)

We could now proceed with case analysis on tail2. We'll find that the base/empty case is provable because in that case, we assume that l1 has exactly two elements. But then, we'll end up in the same situation we are in, but with one additional (+ 1 ... at the front of the expression! Because the inductive structure is defined in terms of itself, and we are proving this property over all possible lists, we don't know when to stop our case analysis!

Exercise (The Other Side)

We demonstrated that case analysis and evaluation of the equivalence's left-hand side seemingly has no end. Perform a similar analysis of the equivalence's right-hand side, starting when tail is non-empty. You should arrive at the point where the right-hand side evaluates to:

1 + 1 + length(append(tail2, l2))

Inductive Reasoning

How do we break this seemingly infinite chain of reasoning? We employ an inductive assumption similar to the recursive assumption we use to design recursive functions. The recursive assumption is that our function "just works" for the tail of the list. Our inductive assumption states that our original claim holds for the tail of the list!

Recall that our original claim stated:

Claim

For all lists l1 and l2, length(l1) + length(l2) ≡ length(append(l1, l2)).

Our inductive assumption is precisely the original claim but specialized to the tail of the list that we perform case analysis over. We also call this inductive assumption our inductive hypothesis.

Inductive Hypotheiss

length(tail) + length(l2) ≡ length(append(tail, l2)).

While we are trying to prove the claim, the inductive hypothesis is an assumption we can use in our proof.

Let's unwind our proof back to the case analysis of l1. The case where l1 was empty was provable without this inductive hypothesis, so let's focus on the non-empty case. Recall that before we performed case analysis, we arrived at a proof state where our goal equivalence to prove was:

1 + length(tail) + length(l2) ≡ 1 + length(append(tail, l2))

Compare this goal equivalence with our induction hypothesis above. We see that the left-hand side of the induction hypothesis equivalence, length(tail) + length(l2), is contained in the left-hand side of the goal equivalence. Because our induction hypothesis states that this expression is equivalent to length(append(tail, l2)), we can rewrite the former expression to the latter expression in our goal! This fact allows us to finish the proof as follows:

Proof (Completed, finally!)

l1 is non-empty. Our induction hypothesis states that:
```
`length(tail) + length(l2) ≡ length(append(tail, l2))`
```
Since l1 is non-empty, evaluation simplifies the goal equivalence to:
```
1 + length(tail) + length(l2) ≡ 1 + length(append(tail, l2))
```
By our induction hypothesis, we can rewrite this goal to:
```
1 + length(append(tail, l2)) ≡ 1 + length(append(tail, l2))
```
Which completes the proof.

We call a proof that uses an induction hypothesis a proof by induction or inductive proof. Like recursion in programming, inductive proofs are pervasive in mathematics.

In summary, here is a complete inductive proof of the append claim. In this proof, we'll step directly from a call to length or append directly to the branch of the match that we would have selected. We'll take this evaluation shortcut moving forward to avoid cluttering our proof.

Note in our proof that we declare that we "proceed by induction on l1" and then move into a case analysis. This exemplifies how we should think of inductive proof moving forward:

Inductive proof

An inductive proof is a case analysis over a recursively-defined structure with the additional benefit of an induction hypothesis to avoid infinite reasoning.

Proof

Claim: for all lists l1 and l2, length(l1) + length(l2) ≡ length(append(l1, l2)).

Proof. We proceed by induction on l1.

l1 is empty, thus l1 is []. The left-hand side of the equivalence evaluates as follows:
```
    length([]) + length(l2)
--> 0 + length(l2)
--> length(l2)
```
On the right-hand side of the equivalence, we have:
```
    length(append([], l2))
--> length(l2)
```
l1 is non-empty. Let head and tail be the head element and tail of l1, respectively. Our induction hypothesis is:

Inductive hypothesis: length(tail) + length(l2) ≡ length(append(tail, l2)).

On the left-hand side of the equivalence, we have:
```
--> length(l1) + length(l2)
--> (1 + length(tail)) + length(l2)
  ≡ 1 + (length(tail) + length(l2))
```
The final step comes from the commutative property of addition: $(1 + x) + y = 1 + (x + y)$ .

On the right-hand side of the equivalence, we have:
```
--> length(append(l1, l2))
--> length(cons(head, append(tail, l2)))
--> 1 + length(append(tail, l2))
```
In summary, we now have:
```
1 + (length(tail) + length(l2)) ≡ 1 + length(append(tail, l2))
```
We can use our induction hypothesis to rewrite the left-hand side of the equivalence to the right-hand side:
```
1 + length(append(tail, l2)) ≡ 1 + length(append(tail, l2))
```
Completing the proof.

Exercise (Switcharoo, ‡)

In our proof of the correctness of append, we performed induction on l1. Could we have instead performed induction on l2? Try it out! And based on your findings, explain why or why not in a few sentences.

Recursion to Induction

Now we'll gain first-hand experience proving properties of our recursive designs.

Problem 1: The Structure of Inductive Proof

We use inductive reasoning to prove properties of programs involving recursion. In this problem, we'll incrementally develop all the pieces of an inductive proof concerning the append function over lists:

def cons(x, l):
    return [x, *l]

def append(l1, l2):
    match l1:
        case []:
            return l2
        case [head, *tail]:
            return cons(head, append(tail, l2))

Note

Remember that, unless explicitly stated, you may take the shortcut that a function call evaluates directly to the expression that it would return in a single step.

In the reading, we explored these two claims regarding append:

Claim (left-null-append)

For all lists l, append([], l) ≡ l.

Claim (right-null-append)

For all lists l, append(l, []) ≡ l.

You noted that the right-null-append claim requires more proof machinery than just symbolic execution. We need induction to prove it! Like recursion, inductive proof also has a skeleton we can use as a starting point in understanding this technique. Below is our inductive proof skeleton for claims involving lists with places you should fill in indicated with italicized parentheses: (FILL ME IN!).

Proof Skeleton

Proof. We prove this claim by induction on (The list we are performing case analysis over).

The list is empty. (Proof of the claim assuming the list is empty---the base case)
The list is non-empty. We assume our induction hypothesis:

Induction Hypothesis: (The induction hypothesis)

We must prove:

(Restatement of proof goal)

(Proof of the claim assuming the list is non-empty---the inductive case)

Some notes about this skeleton:

Because induction is ultimately a case analysis, we must do induction on some object and thus must explicitly identify it. In our case, this is a list, but generally speaking, we can perform induction on any inductively-defined object that we can get our hands on.
The proof of the base case, when the list is empty, is usually straightforward. We know the list is empty, so we typically evaluate the goal equivalences to final values and observe that they are identical.
In the inductive case, we explicitly name (a) the induction hypothesis and (b) the proof goal. We do as a point of presentation---we want to be clear what we are assuming in our induction hypothesis and how that differs from the proof goal. Recall that the induction hypothesis is the original claim but specialized to the tail of the list we are performing induction over.
Finally, in the inductive case, we will likely need to use our induction hypothesis. Like other assumptions, we should be explicit in our prose when we invoke and how we use it to update our proof state.

Use this skeleton to author a proof of the right-null-append claim.

Problem 2: On Your Own

Consider the following Python function:

def any(f, l):
    match l:
        case []:
            return False
        case [head, *tail]:
            if f(head):
                return True
            else:
                return any(f, tail)

For the following claim, write a formal proof of the claim.

Claim

For any list l, any(lambda (v): False, l) ≡ False.

Mathematical Induction

So far, we have applied induction exclusively to lists. However, are there other structures that are inductive? It turns out that the natural numbers are the most common structure to perform induction over in mathematics. Now that we have some experience with inductive proof, let's explore this foundational application of the technique in detail.

The Natural Numbers

Recall that we can apply induction to any inductively-defined structure, that is, a structure defined in terms of a finite set of (potentially recursively-defined) cases. How can we define the natural numbers, i.e., non-negative integers, in this manner? One natural choice is to start at zero, the smallest natural number, and work our way up.

Since it is the smallest natural number, zero seems to serve as a base case, similar to the empty list. However, how can we characterize a non-zero natural number in terms of another natural number? Consider the number seven, written in unary, i.e., tally marks:

 1111111
\_______/
    |
    7

Do we see a smaller natural number nested somewhere inside of seven? The unary representation makes this explicit:

 1   111111
    \______/
        |
        6

Seven can be thought of as six but with one extra mark! In general, any non-zero natural number $k$ is one more than $k - 1$ . This fact leads to the following definition of the natural number as an inductive structure:

Definition (Natural Number)

A natural number is either:

Zero, or
The successor, $k + 1$ of some natural number $k$ .

With lists, we can use list functions list to query and break apart the structure:

is_empty allows us to test if a list is empty.
head lets us extract the head of an empty list.
tail lets us extract the tail of an empty list.
cons lets us construct a list from a head and tail.

We do the same thing with natural numbers, albeit with functions that you have seen before but may not have thought of in this context:

(==) allows us to test if a natural number is zero.
(-) allows us to retrieve the smaller natural number from a larger one.
(+) allows us to build up larger natural numbers from smaller ones.

Exercise (Primitive Zero)

Using the inductive definition of a natural number, write a recursive function plus(n1, n2), which returns the result of adding n1 and n2. In your implementation, you are only allowed to use:

Conditionals,
... == 0,
A recursive function call,
Subtraction by one, i.e., n - 1.

(Note: This isn't how we would implement plus in a real system. This is merely an exercise in using the inductive definition of the natural numbers.)

Induction Over the Natural Numbers

Now that we have an inductive definition of the natural numbers, we can:

Perform recursive operations over the natural numbers, and
Prove properties involving natural numbers by using inductive reasoning.

Of course, not having a formal inductive definition didn't stop us from writing recursive functions over the natural numbers. Similarly, with lists, having this definition in-hand can give us more confidence that we are doing the right thing when we begin programming.

With that being said, let us focus on the second activity for the remainder of this section: writing inductive proofs with natural numbers. Induction over the natural numbers is so common that we give it a generic name: mathematical induction. Consider the following canonical recursive function over the natural numbers:

def factorial(n):
    if n == 0:
        return 1
    else
        return n * factorial(n-1)

And the following claim about factorial asserts that factorial produces only positive, non-negative, non-zero results.

Claim

For any natural number n, 0 < factorial(n).

Note that the definition of factorial reflects the structure of the inductive definition of a natural number. This strongly suggests that we ought to prove this claim using induction. Let's go ahead and try it, following the same style of proof introduced for lists, but instead invoking the inductive definition of natural numbers.

Proof Skeleton

Proof. By induction on n.

n is zero. …
n is non-zero. …

The case analysis of our inductive is similar to that of lists. The "base case" for a natural number is when that number is zero. The "inductive case" for a natural number is when that number is non-zero. Note that because a natural number cannot be negative and the inductive case number is not zero, this number must be a positive integer. If we call that natural number k, then we know that k-1 is well-defined (since k is at least 1).

The proof of the base case follows from evaluation:

Proof (Base Case)

n is zero. factorial(n) -->* 1 and so 0 < (factorial 0) -->* 0 < 1.

The proof of the recursive case also follows from evaluation, invoking the induction hypothesis, and collecting assumed constraints about the different pieces of the inequality.

Proof (Inductive Case)

n is non-zero. We assume our induction hypothesis:

Induction Hypothesis: 0 < factorial(n-1)

And prove:

Goal: 0 < factorial(n)

The left-hand side of the equivalence evaluates as follows:
```
     0 < factorial(n)
-->* 0 < n * factorial(n-1)
```
Our induction hypothesis tells us that 0 < factorial(n-1), i.e., factorial(n-1) is a strictly positive integer. Furthermore, we know that n is non-zero, so n is also a strictly positive integer. Therefore, we know that their product n * factorial(n-1) is also strictly positive and thus 0 < n * factorial (- n 1).

Note that in the inductive case, we acquire the following assumptions:

n is non-zero because we are in the inductive case.
0 < factorial(n-1) from our induction hypothesis.

These two facts combined with the knowledge that two multiplying two positive numbers produces a positive number tell us that the body of factorial produces a strictly positive result.

Exercise (Racket Sum, ‡)

Consider the following Python definition.

def sum(n)
    if n == 0:
        return 0
    else:
        return n + sum(n-1)

Prove the following claim using mathematical induction:

Claim: for all natural numbers n, sum(n) ≡ (n * (n+1)) / 2.

(Hint: when manipulating the right-hand side of the equation, you will likely need to employ several arithmetic identities to get the goal into a form where the induction hypothesis applies.)

Structural Versus Mathematical Induction

Now, we have two objects that can be the subject of induction: lists and numbers. In some situations, we might have both types available as candidate subjects of inductive analysis. Does it matter which one that we pick? Naturally, it depends on the operation we are analyzing! However, here are some general considerations when choosing one kind of induction over the other:

You can only perform induction on things you have

For example, let's consider the function replicate(n, v) which produces a list of n copies of v. And consider the following claim about this function:

Claim

For all natural numbers n and values v, length(replicate(n, v)) ≡ n.

Since our claim is ultimately about the length of a list and replicate produces a list, it is tempting to say that we will prove this claim by induction on a list. However, note that our assumptions don't give us a list! As part of quantifying our variables, we assume the existence of some natural number n and a value (of arbitrary type) v. We do not have an actual list to perform induction over.

Perform induction on the object your function performs case analysis over

In other cases, we have multiple objects, say, for example, a natural number and a list, available to us. Which one should we perform induction over? We should likely perform induction on the object that our function is performing case analysis over. That way, the cases allow us to make progress in evaluating the function call.

For example, consider the function list_inc(l, n), which increments every element of l by n. Both l and n are inductively defined. However, the implementation of list_inc:

def list_inc(l, n):
    match l:
        case []:
            return []
        case [head, *tail]:
            return cons(n+head, list_inc(tail, n))

Relies on a case analysis on l, not n! Thus, it is likely that we should analyze the function by structural induction on l rather than mathematical induction on n.

Exercise (What If?)

Imagine the following claim over list_inc:

Claim: for all lists l and natural numbers n, length(list_inc(l, n)) ≡ length(l).

Where do you specifically get stuck in proving this claim if you perform mathematical induction on n?

That being said, mathematical induction can be applied in any context where we have access to a natural number, even if it is not part of the immediate goal. For example, consider a claim over lists we've seen previously:

Claim

For all lists l, append(l, []) ≡ l.

While there is no natural number variable introduced by the claim, we do have access to one number: length(l) produces a natural number since l is assumed to be a list! In other words, we could try to proceed by induction on the length of l rather than its structure.

Proof

Proof. By induction on the result of length(l). Because l is a list length(l) must produce a natural number; call it n.

n is zero. Because n is zero, that means l has no elements and thus l is the empty list and append([], []) -->* '().
n is non-zero. Our induction hypothesis states that:

Inductive Hypothesis: for any list l1 where length(l1) = n-1, append(l1, []) ≡ l1.

Goal: append(l, []) ≡ l.

Because n is non-zero, we know that l has at least one element and is thus non-empty. The left-hand side of the equivalence evaluates as follows:
```
     append(l, [])
-->* cons(head, append(tail, []))
```
Because length(l) ≡ n, length(tail) ≡ n-1, i.e., tail has one less element than l. Thus we can invoke our induction hypothesis to rewrite append(tail, []) to l, thus proving the claim.
```
    cons(head, append(tail, []))
  ≡ cons(head, tail)
--> l
```
The derivation's final line follows from the fact that cons attaches its first argument onto the front of the second argument.

It turns out that the proof works out, but our reasoning is a lot more complicated!

We have to continually conclude that the input list l is either empty or non-empty from its length. Note that we shouldn't assume this fact is true---we ought to prove it true as an auxiliary claim or lemma.
Our induction hypothesis is far more complicated to state and utilize. We'll talk more about the specifics of what it says when we discuss logic in detail. But intuitively, the induction hypothesis says that our original claim holds for a list that has length one less than the original list. Because we assume that the induction hypothesis is true, we get to choose an instantiation for l1 that benefits our reasoning, here the tail of the list.

In short, when we can perform structural induction on an object, we can frequently figure out a way to also perform mathematical induction on that same object, e.g., through some notion of "length." However, when possible, we should use structural induction because it allows us to work directly with the object's definition rather than indirectly through that "length."

Mathematical Induction Beyond Program Correctness

As a final note, mathematical induction also serves as a bridge between our current focus on program correctness and the wider world of formal mathematics. We can perform induction on natural numbers that appear at the level of mathematics, not just programs! The mechanics are identical, although the domain of proof may be different, something we'll discuss in the coming weeks. We close by highlighting how we can directly translate our proof of the positivity fact for factorial-the-Racket-function to factorial-the-math-function.

First, we define factorial in mathematical notation as follows:

$0! = n! = 1 n \times (n - 1)!$

And then we can state and prove our positivity claim as follows:

Proof

Claim: for all natural numbers $n$ , $n! > 0$ .

Proof. By induction on $n$ .

$n$ is zero, then $0! = 1 > 0$ by definition.
$n$ is non-zero.

Induction Hypothesis: $(n - 1)! > 0$

Goal: $n! > 0$ .

Since $n$ is non-zero, $n! = n \times (n - 1)!$ . We know from our IH and by our case assumption that both $n$ and $n - 1$ are strictly positive so $n \times (n - 1)! > 0$ as well.

Compare the proof of positive for factorial versus $n!$ and note how they are identical save for the fact that we use program simplification for factorial and arithmetic for $n!$ . Notably, while the domains---programs versus arithmetic---are different, the proof technique---mathematical induction---remains the same!

Exercise (Math Sum, ‡)

In the Racket Sum exercise, you showed that the Racket sum function followed the arithmetic sum identity $1 + \dots + n = \frac{n ( n + 1 )}{2}$ . Following the proof template above for mathematical induction for arithmetic to prove that the arithmetic sum identity is true:

Claim (Arithmetic Summation): $1 + \dots + n = \frac{n ( n + 1 )}{2}$ .

(Hint: $1 + \dots + n$ is short-hand for the summation of $1$ through $n$ . Where is the smaller summation sequence instead of this longer one?)

Mathematical Induction Practice

Now, we'll practice writing inductive proofs, but this time, over the natural numbers. We'll also begin to test our understanding of proof mechanics by exploring incorrect proofs of correct propositions. As we shall see, writing proofs is one thing, but identifying where proofs go wrong is quite tricky!

Problem 1: Mathematical Induction Practice

Consider the following Python definitions:

def cons(x, l):
    return [x, *l]

def length(l):
    match l:
        case []:
            return 0
        case [_, *tail]:
            return 1 + length(tail)

def replicate(v, n):
    if n == 0:
        return []
    else:
        return cons(v, replicate(v, n-1))

Prove the following claim using mathematical induction.

Claim

For all values v and natural numbers n, length(replicate(v, n)) ≡ n.

Problem 2: More Mathematical Induction Practice

Consider the following claim regarding the distribution of powers and multiplication.

Claim

For all natural numbers $a$ , $b$ , and $n$ , $(ab)^{n} = a^{n} b^{n}$ .

Prove this claim by mathematical induction on $n$ . In your proof, you may only rely on the following mathematical facts:

For any numbers $x$ and $n$ , $x \cdot x^{n - 1} = x^{n}$
Commutativity of multiplication: for any numbers $x$ and $y$ , $x y = y x$ .
Associativity of multiplication: for any numbers $x$ , $y$ , and $z$ , $x (yz) = (x y) z$ .

Problem 3: Problem Proof

Here is a bogus claim about replicate and a corresponding "proof" of that claim:

Bogus Claim and Proof

Claim: for all values v and natural numbers n, length(replicate(v, n)) ≡ 0.

Proof. Let v be a value and n be a natural number, we proceed by induction on n.

n = 0. The left-hand side of the equivalence evaluates as follows:
```
     length(replicate(v, 0))
-->* length([])
-->* 0
```
Which is the right-hand side of the equivalence.
n = k+1. We must prove that:

Goal: length(replicate(v, n) ≡ 0.

From our induction hypothesis, we know that length(replicate(v, n)) ≡ 0 so we are done.

In a sentence or two, describe what the claim is saying and why it is incorrect.
In a sentence or two, describe the error in the "proof."
Correct the error and attempt to finish the proof. You should get stuck, i.e., a point where you are unable to move further in the proof. Show your work to get to this point and describe in a sentence or two why you cannot proceed forward with the proof.

Problem 4: Even More Mathematical Induction Practice

We can formally define the $i$ th odd number to be $d_{i} = 2 i - 1$ (where $d_{1}$ is the first odd number: $d_{1} = 2 \cdot 1 - 1 = 1$ ). Prove the following claim using mathematical induction on $n$ :

Claim

Claim: the sum of the first $n$ odd numbers $d_{1} + \dots + d_{n} = n^{2}$ .

(Hint: where does the induction hypothesis appear in the left-hand side summation?)

Problem 5: Another Problem Proof (Optional)

Here is yet another bogus claim and "proof" of that claim:

Bogus Claim and Proof

Claim: For all natural numbers $n$ , $n + 1 \leq n$ .

Proof. By induction on $n$ . In the inductive case where $n = k + 1$ , our induction hypothesis states that $k + 1 \leq k$ . We must then show that $(k + 1) + 1 \leq k + 1$ :

$(k + 1) (k + 1) + 1 \leq k \leq k + 1 inductive hypothesis transitivity of \leq$

And thus our goal is immediately proven.

In a sentence or two, describe what the claim is saying and why it is incorrect.
In a sentence or two, describe the error in the "proof."
Correct the error and attempt to finish the proof. You should get stuck, i.e., a point where you are unable to move further in the proof. Show your work to get to this point and describe in a sentence or two why you cannot proceed forward with the proof.

Demonstration Exercise 3

Problem 1: Down By More Than One

Consider the following recursive Python definitions:

def is_even(n):
    if n == 0:
        return True
    elif n == 1:
        return False
    else:
        return is_even(n-2)

def iterate(f, x, n):
    if n == 0:
        return x
    else:
        return iterate(f, f(x), n-1)

For this problem, you may take the shortcut of evaluating any function call directly to the expression that it returns in a single step.

First prove the following auxiliary claim about even:

Lemma

For any natural number n, if even(n) and n is not zero then 2 <= n.

Then, consider the following claim about iterate:

Claim

For all natural numbers n and booleans b, if is_even(n) then iterate(not, b, n) ≡ b.

In a sentence or two, describe at a high-level why this claim is correct.

Finally, formally prove this claim using the auxiliary claim.

Problem 2: Translation

Consider the following parameterized atomic propositions:

$A (x, y, z) =$ " $x$ knows that $y$ owes $z$ money"
$B (x, y) =$ " $x$ owes $y$ money"
$D (x, y) =$ " $x$ will pay $y$ "
$C (x) =$ " $x$ is an honest person"

First, translate each of the following English propositions into a formal propositional statement, using the parameterized atomic propositions above. Make sure to make maximal use of the logical connectives, in particular, negation, in your formal statements.

Either Sam owes the store money or the store owes Sam money.
If Henry knows that Io owes Mateo money and Io is honest, then Io will pay Mateo.
For all people $x$ , if $x$ is not honest, then $x$ will not pay Tina.
There exists people $x$ and $y$ such that for any person $z$ , $x$ owes $y$ money but $z$ does not know that $x$ owes $y$ money.

Now consider the following formal propositional statements. Give English propositions that are equivalent to these formal statements.

$D ("Laura", "Roberto") \to C ("Laura")$ .
$\forall x, y . C (x) \to D (x, y)$ .
$\forall x . \neg C (x) \to \exists y, z . A (y, x, z) \to \neg D (x, z) \land D (x, y)$ .

Problem 3: Objection!

A logical fallacy is a misconception due to an erroneous step of logical reasoning. In this problem, we'll explore some common logical fallacies, their formal representation in propositional logic, and their refutation.

For each of the propositions below:

Translate the proposition into a formal propositional statement, representing atomic propositions with fresh propositional variables. Make sure (a) state which variables correspond to which atomic propositions and (b) make maximal use of the logical connectives in your formal statement.
In a few sentences, argue why the resulting formal proposition is not provable using your intuition of the semantics of the logical connectives of propositional logic.

(Hint: recall that logical implication $(\to)$ captures the idea that we assume that a proposition is true and go on to prove another proposition true using that assumption. Whenever a situation describes a proposition that you are supposed to assume is true, you should use implication to represent this fact.)

If I buy you a present, you will like me. I did not buy you a present, therefore you will not like me.
We know that this drug will cure cancer or kill the patient. We observe that the drug cures the patient. Therefore, the drug will not kill the patient.
Suspect A claims that suspect B was the thief. We showed that suspect A is a liar. Therefore, we know that suspect B is not a thief.
It has been proven that if you run, you will get into shape. I got into shape, therefore, I must have run.
I claim that there are mole people underground that are plotting to take over the world. There exists no evidence that implies there are not mole people underground. Therefore, my claim about mole people is correct.

Variations of Inductive Proof

We have learned that operations and proofs over inductively-defined objects follow from their inductive definitions. For example, recall the inductive definition of a list:

Definition (List)

A list is either:

Empty, or
Non-empty, a combination of a head element and the rest of the list, its tail.

But this isn't the only way of sorting the infinite possible lists into a finite set of cases. For example, here is an alternative definition:

Definition (List, Alternative)

A list is either:

Empty, or
A singleton list containing exactly one element, or
Non-empty with at least two elements and a tail.

This definition differs from the previous one because it explicitly calls out the singleton list case. Both definitions are equivalent---every possible list is described by exactly one of the cases in each definition. But in some situations, using an alternative inductive definition is appropriate to better fit the operation we are describing or reasoning about.

We'll look at an inductive proof that takes advantage of this alternative definition. This is just one example of the variations we might encounter when performing inductive proof. There are many others out there, more than we can cover in the brief time we have in this course. However, be aware that these variations exist, but they all ultimately rest on our basic definition of inductive reasoning:

Definition (Induction)

A proof by induction is a proof that proceeds by case analysis on the inductive structure of an object where you may assume an inductive hypothesis in the inductive cases of the proof.

Case Study: Intersperse

Previously, we introduced the intersperse function takes an element x and a list l and returns a new list that puts x between each element of l.

def intersperse(v, l):
    match l:
        case []:
            return []
        case [x]:
            return [x]
        case [h, *tail]:
            return [h, v, *intersperse(v, tail)]

Exercise (Why Three?)

It is not immediately obvious why intersperse requires that we have a separate case for the singleton list. Try implementing intersperse naively with our standard, two-case definition of lists, i.e., the list is empty or non-empty. Test your implementation and discover what bug arises with only two cases! Try to then summarize in a few sentence why this bug occurs and how the singleton case in the above implementation fixes the problem.

Suppose we want to prove the following claim:

Claim (Length of Intersperse)

For all values v and lists l. If not is_empty(l) then length(intersperse(v, l)) = 2 * length(l) - 1.

Note that intersperse is defined according to our alternative, singleton-based definition of a list. Our proof, therefore, follows the cases outlined by this definition. If the cases include recursive sub-structure, i.e., tails of lists drawn from the subject of the inductive proof, then we gain the induction hypotheses for all of these sub-structures. These are our inductive cases whereas cases with no recursive sub-structures are called base cases.

Here is a formal proof the claim as an exemplar of the concepts we've learned so faralong with some of these additional concerns we've introduced such as using an alternative definition of lists. Study this proof carefully, paying attention to its structure and formatting.

Proof

Proof. Let v be an arbitrary value and l be an arbitrary list. We assume that not is_empty(l) i.e., l is non-empty. We proceed by induction on the structure of l.

l is empty. By assumption, we know that l is not empty. Thus, we do not need to consider this case.
l has one element. Call the singleton element of this list x. The left-hand side of the equality evaluates as follows:
```
length(intersperse(v, [x])) -->* length([x]) -->* 1
```
And the right-side evaluates to:
```
2 * length([x]) - 1 -->* 2 * 1 - 1 -->* 1
```
l has more than one element. Let the first element be h and the remainder of the list be tail. We assume our inductive hypothesis holds:

IH: If not is_empty(tail) then length(intersperse(v, tail)) = 2 * length(tail) - 1.

We must prove:

Goal: length(intersperse(v, l)) = 2 * length(l) - 1.

The left-hand side of the equality evaluates to:
```
     length(interserse(v, l))
-->* length([h, v, *intersperse(v, tail)])
-->* 2 + length(intersperse(v, tail))
```
In this case, we know that l has at least two elements, so not is_empty(tail). Therefore, our induction hypothesis applies and so we can rewrite this final quantity to 2 + 2 * length(tail) - 1.

Recalling that in this case, l has at least two elements, the right-hand side of the equality evaluates to:
```
    2 * length(l) - 1
--> 2 * (1 + length(tail)) - 1
  ≡ 2 + 2 * length(tail) - 1
```
So both sides of the equality evaluate to 2 + 2 * length(tail) - 1, completing the proof.

Note that our claim is the form of a conditional: "if … then … ." This means that our induction hypothesis must also be of this form! This form of proposition is called a logical implication. When trying to prove an implication, we assume the "if" portion (the premise) and go on the prove the "then" portion (the conclusion). However, if we assume that an implication is true, we must first prove the "if" portion and then we can go on to assume the "then" portion. We will learn more about logical implication and, more generally, mathematical logic soon!

Exercise (Proof Busters, Intersperse Edition, ‡)

Consider the following implementation of intersperse where we do not introduce a special case for a 1-element list:

def intersperse(v, l):
    match l:
        case []:
            return []
        case [head, *tail]:
            return [head, v, *intersperse(v, tail)]

This implementation is not correct. (If you don't see why, I encourage you to experiment with the function in the Python interpreter!)

Nevertheless, try applying the proof from the reading to these function. How does the proof change? The claim should not hold for this function---where does the proof go wrong in your analysis?

Lab: Strong Mathematical Induction

Consider the following classic numeric problem regarding making change:

Suppose that you live in a country where currency only comes in 3¢ and 10¢ denominations. Show that you can form any amount of money $n$ from 3¢ and 10¢ pieces as long as $n \geq 18$ .

Let us attempt to prove this claim by induction. Really, the claims says that we can choose $x$ and $y$ , such that $3 x + 10 y = n$ as long as $n \geq 18$ . $x$ and $y$ here are the number of 3¢ and 10¢ pieces, respectively.

Claim

Claim: for any natural number $n$ where $n \geq 18$ , there exists natural numbers $x$ and $y$ such that $3 x + 10 y = n$ .

Attempt to prove this claim by mathematical induction. Push the proof through as much as possible until you get stuck. Identify you get stuck and why you cannot make any more progress.
Since the proof did not work out, you might be tempted to believe that the claim is actually false; not an unreasonable conclusion. However, it turns out this claim is actually true. Develop a series of examples for $n = 18$ through $n = 26$ (9 examples in all) to convince yourself that this is the case.
Sometimes, a particular proof technique is not strong enough to prove a proposition. In these situations, we must resort to a different proof technique better suited for the situation at hand.

Rather than using regular mathematical induction, we'll invoke a variant of mathematical induction, strong induction, to prove this claim.

Definition (Strong Induction): strong induction is a proof principle that is like mathematical induction, i.e., induction over a natural number $n$ , except that we assume that our induction hypothesis holds for any natural number $k < n$ .

Regular mathematical induction allows us to handle situations where our recursive definitions decrease by one on each step. Strong induction allows us to handle situations where our recursive definitions steps by more than just one and more generally, in irregular patterns.

Practically speaking, to use the strong induction principle:
- We state that we are using strong induction instead of regular induction.
- We change our induction hypothesis to reflect the fact that the hypothesis holds for any natural number $k < n$ rather than just $n - 1$ .
- Finally, we also may need to include additional base cases to account for the potentially irregular pattern that our inductive object decreases by.
Change your original, incomplete inductive proof to a strong induction proof and complete it.

(Hint 1: now your induction hypothesis holds for any number less than $n$ . Don't overthink this part! Choose a convenient target value that is less than $n$ that you can apply your induction hypothesis to.)

(Hint 2: now think about how much you decrease $n$ by in each inductive step. A single base case of $n = 18$ is no longer sufficient because you are no longer decreasing by one. What additional base cases do you need?)
Once you are done, you might think this proof feels "too simple!" In some sense it is because the corresponding algorithm for making change implied by this proof is also straightforward! In a sentence or two, describe the algorithm for making change that is suggested by your reasoning. This final bit highlights the connection between proving and algorithmic design. We can design an algorithm with correctness in mind; dually, we often can derive an algorithm from our reasoning!

Have you ever been in an argument where the other person "just wasn't being reasonable?" What were they doing that was "without reason?"

Making "facts" up out of thin air?
Forming connections between claims without appropriate evidence?
Jumping between unrelated arguments?
Not addressing the problem at hand?

How do you know that you were being reasonable and your partner was the unreasonable one? Is there an agreed-upon set of rules for what it means to operate "within reason" or is it simply a matter of perspective?

Mathematical logic is the study of formal logical reasoning. Coincidentally, logic itself is one of the foundations of both mathematics and computer science. Recall that mathematical prose contains propositions that assert relationships between definitions. These propositions are stated in terms of mathematical logic and thus form one of the cornerstones of the discipline.

As computer scientists, we care about logic for similar reasons. We care about the properties of the algorithms we design and implement, in particular, their correctness and complexity. Logic gives us a way of stating and reasoning about these properties in a precise manner.

Propositional Logic

Let us begin to formalize our intuition about propositions. First, how can we classify propositions? We can classify them into two buckets:

Atomic propositions that make elementary claims about the world. "It is beautiful today" and "it is rainy today" are both elementary claims about the weather.
Compound propositions that are composed of smaller propositions "It is either rainy today or it is beautiful today" is a single compound proposition made of up two atomic propositions.

This is not unlike arithmetic expressions that contain atomic parts---integers---and compound parts---operators. As you learned about arithmetic throughout grade school, you learned about each operator and how they transformed their inputs in different ways. Likewise, we will learn about the different logical operators---traditionally called connectives in mathematical logic---and how they operate over propositions.

We can formally define the syntax of a proposition $p$ similarly to arithmetic using a grammar:

$p ::= A ∣ ⊤ ∣ ⊥ ∣ \neg p ∣ p_{1} \land p_{2} ∣ p_{1} \lor p_{2} ∣ p_{1} \to p_{2} .$

A grammar defines for a sort (the proposition $p$ ) its set of allowable syntactic forms, each written between the pipes ( $∣$ ). A proposition can take on one of the following forms:

Atomic propositions, elementary claims about whatever domain we are considering. When we don't care about the particular domain (e.g., we want to hold the domain abstract), we'll use capital letters such as $A$ as meta-variables for atomic propositions.
Top, $⊤$ (LaTeX: \top), pronounced "top", the proposition which is always provable.
Bot, $⊥$ (LaTeX: \bot), pronounced "bottom" or "bot", the proposition which is never provable.
Logical negation, $\neg p$ (LaTeX: \neg), pronounced "not", which stands for proposition $p$ , but negated. For example, the negation of $p$ = "It is beautiful today" is $\neg p$ = "It is not beautiful today".
Logical conjunction, $p_{1} \land p_{2}$ (LaTeX: \wedge), pronounced "and", which stands for the proposition where both $p_{1}$ and $p_{2}$ are provable. For example, the conjunction of $p_{1}$ = "It is beautiful today" and $p_{2}$ = "The sun is out" claims that it is both beautiful and the sun is out.
Logical disjunction, $p_{1} \lor p_{2}$ (LaTeX: \vee), pronounced "or", which stands for the proposition where at least one of $p_{1}$ or $p_{2}$ is provable. For example, the disjunction of $p_{1}$ = "It is beautiful today" and $p_{2}$ = "It is raining today" claims that it is either beautiful today or it is raining today (or both are true).
Logical implication, $p_{1} \to p_{2}$ (LaTeX: \rightarrow), pronounced "implies", which stands for the proposition where $p_{2}$ is provable assuming $p_{1}$ is provable. For example, if $p_{1}$ = "It is cloudy" and $p_{2}$ = "It is raining", then $p_{1} \to p_{2}$ claims that it is raining, assuming that it is cloudy.

Notably, logical implication are the conditional propositions we have seen previously of the form "if … then … ." For example, if we had the following program correctness claim:

Claim

Claim: for all natural numbers n, if n >= 0 then factorial(n) > 0.

We might write it concisely using the formal notation above as:

Claim

Claim: for all natural numbers n, n >= 0 $\to$ factorial(n) > 0.

This form of logic is called propositional logic because it focuses on propositions and basic logic connectives between them. There exist more exotic logics that build upon propositional logic with additional connectives that capture richer sorts of propositions we might consider.

Modeling with Propositions

Recall that mathematics is the study of modeling the world. In the case of mathematical logic, we would like to model real-world propositions using the language of mathematical propositions we have defined so far. This amounts to translating our informal natural language descriptions into formal mathematical descriptions.

To do this, keep in mind that mathematical propositions are formed from atomic and compound propositions. We then need to:

Identify the atomic propositions in the statement.
Determine how the atomic propositions are related through various compound propositions.

As an example, let's consider a proposition more complicated than the previous ones that we have encountered:

"If it is either the case that the grocer delivers our food on time, or I get out to the store this morning, then the party will start on time and people will be happy.

First, we identify the domain under which we are making this proposition: preparing for a party. Thus, the atomic propositions are the statements that directly talk about preparing for a party. These propositions are:

$p_{1}$ = "The grocer delivers our food on time."
$p_{2}$ = "I get out to the store this morning."
$p_{3}$ = "The party will start on time."
$p_{4}$ = "People will be happy."

Now, we re-analyze the statement to determine how the propositions are related, translating those relations into logical connectives. You may find it helpful to replace every occurrence of the atomic propositions with variables to make the connecting language pop out:

"If it is either the case that $p_{1}$ or $p_{2}$ then $p_{3}$ and $p_{4}$ ."

From this reformulation of the statement, we can see that:

$p_{1}$ and $p_{2}$ are related by the word "or" which implies disjunction.
$p_{3}$ and $p_{4}$ are related by the word "and" which implies conjunction.
These previous two compound conjunctions are related by the words "if" and "then" which implies implication.

Thus we arrive at our full formalization of the statement:

$p_{1} \lor p_{2} \to p_{3} \land p_{4} .$

(Note that $(\to)$ has lower precedence than $(\lor)$ or $(\land)$ so the statement written as-is is not ambiguous.)

When translating statements in this manner, it is useful to keep in mind some key words that usually imply the given connective:

$(\neg)$ : "not"
$(\land)$ : "and"
$(\lor)$ : "or"
$(\to)$ : "implies", "if ...then"

Reasoning with Propositions

Recall that our goal is to formally model reasoning. While equivalences allow us to show that two propositions are equivalent, they do not tell us how to reason with propositions. That is, how do we prove propositions once they have been formally stated?

First, we must understand the reasoning process itself so that we can understand its parts. From there, we can state rules in terms of manipulating these parts. When we are trying to prove a proposition, whether it is in the context of a debate, an argument with a friend, or a mathematical proof, our proving process can be broken up into two parts:

A set of assumptions, propositions that we are assuming are true.
The proposition that we are currently trying to prove, our proof goal.

We call this pair of objects our proof state. Our ultimate goal when proving a proposition is to transform the proof goal into an "obviously" correct statement through valid applications of proof rules.

For example, if we are in the midst of a political debate, we might be tasked with proving the statement:

"80s-era Reaganomics is the solution to our current economic crisis."

Our proof goal is precisely this statement and our set of assumptions includes our knowledge of Reaganomics and the current economy. We might take a step of reasoning---unfolding the definition of Reaganomics, for example---to refine our goal:

"Economic policies associated with tax reduction and the promotion of the free-market are the solution to our current economic crisis."

We can then apply additional assumptions and logical rules to transform this statement further. Once our logical argument has transformed the statement into a self-evident form, for example:

"Everyone needs money to live."

Then, if everyone agrees this is a true statement, we're done! Because the current proof goal follows logically from assumptions and appropriate uses of logical rules, we know that if this proposition is true, then our original proposition is true. In a future reading, we will study these rules in detail in a system called natural deduction.

Propositional Logic Versus Boolean Algebra

Throughout this discussion, you might have noted similarities between propositions and a concept you have encountered previously in programming: booleans. Values of boolean type take on one of two forms, true and false, and there are various operators between booleans, && and ||,logical-AND and logical-OR, respectively. It seems like we can draw the following correspondence between propositions and booleans (using the boolean operators found in a C-like language):

$⊤$ is true.
$⊥$ is false.
$(\neg)$ is !.
$(\land)$ is &&.
$(\lor)$ is ||.

This correspondence is quite accurate! In particular, the equivalences we discussed in the previous section for propositions also work for boolean expressions. The only caveat is that there is no analog to implication for the usual boolean expressions, but the equivalence:

$p_{1} \to p_{2} \equiv \neg p_{1} \lor p_{2}$

Allows us to translate an implication into an appropriate boolean expression.

So why do we talk about propositions when booleans exist? It turns out they serve subtle, yet distinct purposes!

A proposition is a statement that may be proven.
A boolean expression is an expression that evaluates to true or false.

Note that a proposition does not say anything about whether the claim is true or false. It also doesn't say anything computational in nature, unlike a boolean expression which carries with it an evaluation semantics for how to evaluate it in the context of a larger computer program.

Furthermore, the scope of a boolean expression is narrow: we care about booleans values insofar as they assist in controlling the flow of a program, e.g., with conditionals and loops. We don't need to complicate booleans any further! In contrast, propositions concern arbitrary statements, and our language of propositions may need to be richer than the corresponding language of booleans to capture the statements we have in mind.

Exercise (Translation, ‡)

Consider the following propositions

$p_{1}$ = "The sky is cloudy."
$p_{2}$ = "I will go running today."
$p_{3}$ = "I'm going to eat a cheeseburger."

Write down English translations of these formal statements:

$p_{2} \lor p_{3}$ .
$p_{1} \to p_{3} \to p_{2}$ . (Note: implication is a right-associative operator! This means that this statement is equivalent to $p_{1} \to (p_{3} \to p_{2})$ ).
$p_{1} \to p_{2} \land \neg p_{3}$ .

Exercise (Funny Nesting)

Recall that implication is right-associative. That is, sequences of $(\to)$ operators parenthesize "to the right", i.e.,

$p_{1} \to p_{2} \to p_{3} \equiv p_{1} \to (p_{2} \to p_{3}) .$

Now, consider the abstract proposition where we instead parenthesize to the left:

$(p_{1} \to p_{2}) \to p_{3} .$

Come up with concrete instantiations of $p_{1}$ , $p_{2}$ , and $p_{3}$ and write down the English translation of this proposition.

First-order Logic

Propositional logic gives us a formal language for expressing propositions about mathematical objects. However, is it sufficiently expressive? That is, can we write down any proposition we might have in mind using propositional logic?

On the surface, this seems to be the case because we can instantiate our atomic propositions to be whatever we would like. For example, consider the following proposition:

Every month has a day in which it rains.

We can certainly take this entire proposition to be an atomic proposition, i.e.,

$P = Every month has a day in which it rains .$

However, this is not an ideal encoding of the proposition in logic! Why is this the case? We can see that there is structure to the claim that is lost by taking the whole proposition to be atomic. In this case, the notion of "every" month is not distinguished from the rest of the proposition. This is a common idea in propositions, for example:

For every natural number $n$ , $n > 0$ .
For any boolean expression $e$ , $e ⟶^{*} true$ .
For any pair of people in the group, they are friends with each other.

We would like our formal language of logic to capture this idea of "every" as a distinguished form so that we can reason about it precisely. However, propositional logic has no such form!

To this end, we introduce a common extension to proposition logic, predicate logic (also known as first-order-logic), which introduces a notion of quantification to propositions. Quantification captures precisely this concept of "every" we have identified, and more!

Introducing Quantification

When we express the notion of "every month" or "any pair of people," we are really introducing unknown quantities, i.e., variables, into our propositions. For example, our sample proposition above introduces two such variables, months and days. At first glance, we might try expressing the proposition as an abstract proposition, i.e., a function that expects a month and day and produces a proposition:

$P (x, y) = x has y \land it rains during y .$

We call such functions that produce propositions predicates.

However, this encoding of our proposition is not quite accurate! If we think carefully about the situation, we will note that the interpretation of $x$ and $y$ are subtlety different! We can make this more clear by a slight rewriting of our original proposition:

For every month, there exists a day in which it rains.

Note how we are interested in "every month." However, we are not interested in "every day"; we only identify a single day in which the property holds. The property may hold for more than one day for any given month, but we only care that there is at least one such day.

Thus, we find ourselves needing to quantify our variables in two different ways:

Any value (of a given type).
At least one value (of a given type).

Observe that our predicate notation $P (x, y)$ does not tell us which quantification we are using!

First-order logic extends propositional logic with two additional connectives that make explicit this quantification:

$p ::= \dots ∣ A (x) ∣ \forall x . p ∣ \exists x . p$

Universal quantification of a proposition, written $\forall x . p$ (pronounced "for all $x$ , $p$ ", LaTeX: \forall) introduces a universally quantified variable $x$ that may be mentioned inside of $p$ .
Existential quantification of a proposition, written $\exists x . p$ (pronounced "there exists $x$ , $p$ ", LaTeX: \exists) introduces an existentially quantified variable $x$ that may be mentioned inside of $p$ .

These quantified variables may then appear inside of our atomic propositions. We write $A (x)$ to remind ourselves that our atomic propositions may be parameterized by these quantified variables.

With quantifiers, we can now fully specify our example in first-order logic:

$P = \forall x . \exists y . x has y \land it rains during y .$

The first quantifier $\forall x$ introduces a new variable $x$ that is interpreted universally. That is, the following proposition holds for every possible value of $x$ . The second quantifier $\exists y$ introduces a new variable $y$ that is interpreted existentially. That is, the following proposition holds for at least one possible value of $y$ . When taken together, our English pronunciation of this formal proposition now lines up with our intuition:

For all months $x$ , there exists a day $y$ where $x$ has $y$ and it rains during $y$ .

Variables and Scope in Mathematics

Like function headers in a programming language, quantifiers introduce variables into our propositions. For example, consider the following Python function definition:

def list_length(l):
    match l:
        case []:
            return 0
        case [_, *tail]:
            return 1 + list_length(tail)

Here, the function declaration introduces a new program variable l that may be used anywhere inside of the function list_length. However, l has no meaning outside of list_length. The location where a variable has meaning is called its scope:

Definition (Scope)

The scope of a variable is the region in which that variable has meaning.

In the case of functions, the scope of a parameter is the body of its enclosing function. So if we try to evaluate l outside of the definition of list_length, we receive an error:

>>> list_length([1, 2, 3])
3
>>> l
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'l' is not defined

The same principles hold for logical variables. The quantifiers $\forall x . p$ and $\exists x . p$ introduce an appropriately quantified variable $c$ that has scope inside of $p$ . $x$ does not have meaning outside of $p$ . For example, the proposition:

$P = x > 5 \land \forall x . x \neq = 0.$

Is malformed because the $x$ outside the quantifier is not well-scoped.

Implicit Types and Quantification

Variables are always quantified according to a particular set of values, its type. However, you may have noticed that we never mention the type of a variable in its quantification. How do we know what the type of $x$ is in $\forall x . p$ ? Traditionally, we infer the types of variables based on their usage.

For example, in the proposition:

$\forall x . \exists y . x > y .$

We can see that $x$ and $y$ are used in a numeric comparison $(>)$ , so we assume that the quantified variables range over numbers. Could the variables be quantified more specifically? Certainly! $x$ and $y$ could certainly be real numbers, integers, or even the natural numbers (i.e., the positive integers or zero). We would need to look at more surrounding context, e.g., how this proposition is used in a larger mathematical text, to know whether the variables can be typed in more specific ways.

In addition to implicit types, sometimes quantification is also implicit. For example, the standard statement of the Pythagorean theorem is:

$a^{2} + b^{2} = c^{2}$

Where the quantification of $a$ , $b$ , and $c$ is implicit. What is the implied quantification of these variables? It turns out that when a quantifier is omitted, we frequently intend for the variables to be universally quantified, i.e., the Pythagorean theorem holds for any $a$ , $b$ , and $c$ of appropriate types.

When writing mathematics, you should always explicitly quantify your variables so your intent is clear. However, you should be aware that implicit typing and quantification is pervasive in mathematical texts, so you will need to use inference and your best judgment to make sense of a variable.

Exercise (Quantified Translation, ‡)

Consider the following parameterized propositions:

$A_{1} (x)$ = " $x$ is a kitty."
$A_{2} (x, y)$ = " $x$ likes $y$ ."
$A_{3} (x)$ = " $x$ is a dog."

Translate these formal propositions into informal English descriptions:

$\exists x . \exists y . A_{1} (x) \land A_{3} (y)$ .
$\forall x . \exists y . A_{1} (y) \to A_{2} (x, y)$ .
$\forall x . \forall y . (A_{1} (x) \land A_{3} (y)) \to \neg A_{2} (x, y)$ .

Lab: Modeling with Logic

Problem: Back and Forth

Translate each of the natural language propositions into formal logical propositions. Ensure in your answer that you (1) identify the atomic propositions and assign them to variables and (2) write your formal proposition in terms of these variables. Make sure you maximally translate the proposition using as many of the connectives introduced in the reading. In particular, make sure you use the negation logical operator when appropriate in your answers.

It rained this morning and it will be sunny this afternoon.
If I am forced to stay in-doors this week, I will play Genshin Impact, have fun, and become poor.
If I either eat Taco Bell or McDonalds, I will either become swoll, or I will not survive the night.
If Jan and Bob both work at the company, then Jan certainly works more than Bob.

Now, consider the following parameterized atomic propositions:

$A (x)$ = "I love $x$ ."
$B (x)$ = " $x$ barfs."
$C (x)$ = "I own $x$ ."
$D (x)$ = "I'm sending away $x$ ."

As well as their parameterized versions:

Translate each of the formal logical propositions to natural language propositions. Ensure that your natural language propositions clearly indicate the explicit grouping found in the formal logic propositions.

$A (cats) \land B (cats)$ .
$A (cats) \to \neg C (dogs)$ .
$A (cats) \lor (B (cat) \to C (dog))$ .
$B (my cat) \to D (my cat) \land \neg A (my cat)$ .
$A (dogs) \to C (dogs) \land D (cats)$ .

Note that $(\land)$ and $(\lor)$ have higher precedence than $(\to)$ . So, for example,

$B (my cat) \to D (my cat) \land \neg A (my cat)$

is equivalent to

$B (my cat) \to (D (my cat) \land \neg A (my cat)) .$

Problem: Warm-up

In our examples from the reading relied heavily on non-mathematical, intuitive propositions to convey the meaning of quantified propositions. Now we'll consider translating propositions of a more mathematical nature between formal, symbolic statements and English.

First, translate each of the following formal statements regarding integers into English and for each, explain in a sentence or two why they are true.

$\exists x . \exists y . x - y = 0$ .
$\forall x . \exists y . y > x$ .
$\forall x . \forall y . x \geq y \land y \geq x \to x = y$ .

Now, consider the following theorems taken from Fortnow's paper on his Favorite Ten Complexity Theorems of the Past Decade. Computational complexity theory is a branch of computer science concerning the resource usage (both time and space) of algorithms. While we do not have the background to fully understand what each of Fortnow's theorems (i.e., main propositions) says, we can certainly translate their logical intent. For each of these theorems, identify relevant atomic propositions and predicates and give a formal statement of the theorem maximizing your use of our logical connectives.

Theorem 1: For any one-way function, there exists a pseudorandom generator constructed from that one-way function.
Theorem 2: There are no sparse sets hard for $NP$ via polynominal-time bounded truth-table queries unless $P = NP$ . (Hint: to deal with "unless," think about what it means and try to rewrite the theorem in terms of a "if...then" rather than "unless.")

Problem: Quantification and Negation

The negation of propositions can be tricky to reason about. De Morgan's laws:

$\neg (p_{1} \land p_{2}) \equiv \neg p_{1} \lor \neg p_{2}$
$\neg (p_{1} \lor p_{2}) \equiv \neg p_{1} \land \neg p_{2}$

Remind us that negation doesn't simply "distribute into" connectives like conjunction and disjunction. Instead, De Morgan's laws say we "distribute/factor the negations and flip the signs." Are there similar equivalences for quantifiers? We'll answer this question by using real-world examples and our intuition to gain understanding.

Consider the following variants of propositions involving universal and existential quantification and negation both inside and outside of the quantifier.

$P_{1} = \forall x . A (x)$ .
$P_{2} = \exists x . A (x)$ .
$P_{3} = \neg\forall x . A (x)$ .
$P_{4} = \neg\exists x . A (x)$ .
$P_{5} = \forall x . \neg A (x)$ .
$P_{6} = \exists x . \neg A (x)$ .

Come up with a concrete example predicate $A (x)$ to use for this exploration. For each of these propositions above, translate the proposition into English. Try to choose a predicate that is both memorable and simple to reason about.
Using your English translations as a guide, decide which pairs of propositions are equivalent.
For each pair of propositions you believe are equivalent, write a sentence or two justifying the equivalence, using your concrete example to explain why it holds.

Problem: Thinking About Proof

Thinking ahead, we've used first-order logic to explore the extends of what we can express as far as propositions go. However, we are studying logic in this course to derive a set of rules for proving a proposition. Because our propositions are defined inductively, i.e., as a finite set of cases, our rules for proving a proposition also follow by case analysis on the different cases.

For each possible form of a proposition given in the reading, describe at a high-level how you would go about proving that proposition in a few sentences per form. Use a concrete example of a proposition in English for each case to describe your process. For example, for conjunction $p_{1} \land p_{2}$ , I would instantiate $p_{1}$ and $p_{2}$ to concrete propositions, e.g.,

$p_{1}$ = "The sky is rainy."
$p_{2}$ = "It is dark outside."

And then explain how I would prove $p_{1} \land p_{2}$ . Use your example and your intuition about what each logical connective means to arrive at your process. In tomorrow's reading and lab, we'll firm up your intuition with concrete rules and then look at how we express these rules precisely using mathematical notation.

Natural Deduction

So far we have discussed the objects of study of logic, the proposition, defined recursively as:

$p ::= A (x) ∣ ⊤ ∣ ⊥ ∣ \neg p ∣ p_{1} \land p_{2} ∣ p_{1} \lor p_{2} ∣ p_{1} \to p_{2} ∣ \forall x . p ∣ \exists x . p$

However, recall that our purpose for studying propositional logic was to develop rules for developing logically sound arguments. Equivalences, while convenient to use in certain situations, e.g., rewriting complex boolean equations, do not translate into what we would consider to be "natural" deductive reasoning.

Now, we present a system for performing deductive reasoning within propositional logic. This system closely mirrors our intuition about how we would prove propositions, thus giving it the name natural deduction. As we explore how we formally define deductive reasoning in this chapter, keep in mind your intuition about how these connectives ought to work to help you navigate the symbols that we use to represent this process.

The Components of a Proof

To understand what are the components proof that we capture in natural deduction, let us scrutinize a basic proof involving the even numbers. First we remind ourselves what it means for a number to be even.

Definition (Evenness)

Call a natural number $n$ even if $n = 2 k$ for some natural number $k \geq 0$ .

In other words, an even number is a multiple of two. This intuition informs how we prove the following fact:

Claim

Claim: For any natural number $n$ , if $n$ is even then $4 n$ is also even.

Proof. Let $n$ be an even natural number. Because $n$ is even, by the definition of evenness, $n = 2 k$ for some natural number $k$ . Then:

$4 n = 4 (2 k) = 8 k = 2 (4 k) .$

So we can conclude that $2 n$ is also even by the definition of evenness because it can be expressed as $2 m$ where $m = 4 k$ .

From the above proof, we can see that there is an overall proposition that we must prove, called the goal proposition:

For any natural number $n$ , if $n$ is even then $4 n$ is also even.

As the proof evolves, our goal proposition changes over time according to the steps of reasoning that we take. For example, implicit in the equational reasoning we do in the proof:

$4 n = 4 (2 k)$

Is a transformation of our goal proposition, " $4 n$ is even," to another proposition " $4 (2 k)$ is even." Ultimately, our steps of reasoning transform our goal repeatedly until the goal proposition is "obviously" provable. In the proof above, we transform our goal from " $4 n$ is even" to " $2 (4 k)$ is even" by the rules of arithmetic. At this point, we can apply the definition of evenness directly which says that a number is even if it can be expressed as a multiple of two.

However, in addition to the goal proposition, there are also assumptions that we acquire and use in the proof. For example, the proposition " $n$ is even" becomes an assumption of our proof after we process the initial goal proposition which is stated in the form of an implication, $P \to Q$ , where:

$P =$ " $n$ is even"
$Q =$ " $4 n$ is even"

We then utilize this assumption in our steps of reasoning. In particular, it is the assumption that $n$ is even coupled with the definition of evenness that allows us to conclude that $n = 2 k$ for some natural number $k$ . There exists other assumptions in our proof as well, e.g., $n$ and $k$ are natural numbers, that we use implicitly in our reasoning.

Proof States and Proofs

We can crystallize these observations into the following definition of "proof."

Definition (Proof State)

The state of a proof or proof state is a pair of a set of assumptions and a goal proposition to prove.

Definition (Proof)

A proof is a sequence of logical steps that manipulate a proof state towards a final, provable result.

In our above example, we initially started with a proof state consisting of:

No assumptions.
The initial claim as our goal proposition: "For any natural number $n$ , if $n$ is even then $4 n$ is also even."

After some steps of reasoning that breaks apart this initial goal, we arrived at the following proof state:

Assumptions: " $n$ is a natural number," " $n$ is even"
Goal: " $4 n$ is even."

At the end of the proof, we rewrote the goal in terms of a new variable $m$ and performed some factoring, leading to the final proof state:

Assumptions: " $n$ and $m$ are natural numbers," " $n$ is even," " $m = 4 k$ "
Goal: " $2 m$ is even"

This final goal is provable directly from the definition of evenness since the quantity $2 m$ is precisely two times a natural number.

Rules of Natural Deduction

Our proof rules manipulate proof states, ultimately driving them towards goal propositions that are directly provable through our assumptions. Because of the syntax of our propositions breaks up propositions into a finite set of cases, our rules operate by case analysis on the syntax or shape of the propositions contained in the proof state. Specifically, these propositions can either appear as assumptions or in the goal proposition. Therefore, for each kind of proposition, we give one or more rules describing how we "process" that proposition depending on where it appears in the proof goal.

Rules that operate on propositions in the goal proposition are called introduction rules. This is because we typically first introduce various propositions when they initially appear in our goal.
Rules that operate on propositions in assumptions are called elimination rules. These rules use or "consume" assumptions, resulting in more assumptions and/or an updated set of goals.

A Note on the Presentation of Proof Rules

When we talk about rules of proof we really mean the set of "allowable actions" that we can take on the current proof state. We describe these rules in the following form:

To prove a proof state of the form … we can prove proof state(s) of the form …

One of our proof rules applies when it has the form specified by the rule, e.g., the goal proposition is a conjunction or there is an implication in the assumption list. The result of the proof rule is one or more new proof states that we must now prove instead of the current state. In effect, the proof rule updates the current state to be a new state (or set of states) that becomes the new proof state under consideration.

In terms of notation, we will use $p$ and $q$ to denote arbitrary propositions. Similarly, for assumptions, we will use the traditional metavariable $Γ$ (i.e., Greek uppercase Gamma, $L A T E X$ : \Gamma) to represent an arbitrary set of assumptions. When writing down our proof rules, it is very onerous to describe the components of proof states in prose:

If our proof state contains assumptions $Γ$ and goal proposition $p$ … .

Instead, we will write down a proof state as a pair of a set of assumptions and a proof state. Traditionally, we write pairs using parentheses, separating the components of the pair with commas, e.g., a coordinate pair $(x, y)$ . However, in logic, we traditionally write this pair as a stylized pair called a sequent:

$Γ ⊢ p .$

The turnstile symbol ( $L A T E X$ , \vdash) acts like the comma in the coordinate pair; it merely creates visual separation between the assumptions $Γ$ and the goal proposition $p$ .

Conjunction and Assumptions

A conjunction, $p_{1} \land p_{2}$ , is a proposition that asserts that both propositions $p_{1}$ and $p_{2}$ are provable. For example, the proposition "I'm running out of cheeseburgers and the sky is falling" is a conjunction of two propositions: "I'm running out of cheeseburgers" and "the sky is falling." We call each of the sub-propositions of a conjunction a conjunct.

First, let's consider how we prove a proposition that is a conjunction. If we have to prove the example proposition above, our intuition tells that we must prove both "sides" of the conjunction. That is, to show that the proposition "I'm running out of cheeseburgers and they sky is falling", we must show that both

"I'm running out of cheeseburgers" and
"the sky is falling"

Are true independently of each other. In particular, we don't get to rely on fact being true when going to prove the other fact.

We can codify this intuition into the following proof rule describing how we can process a conjunction when it appears as our proof goal:

Proof Rule [intro-∧]

To prove $Γ ⊢ p_{1} \land p_{2}$ , we may prove the following two proof states separately:

$Γ ⊢ p_{1}$ .
$Γ ⊢ p_{2}$ .

Recall that $Γ ⊢ p_{1} \land p_{2}$ represents the following proof state:

We assume that the assumptions in set $Γ$ are provable.
Our goal proposition we must prove is $p_{1} \land p_{2}$ .

Our proof rule says that whenever we have a proof state of this form, we can prove it by proving $Γ ⊢ p_{1}$ and $Γ ⊢ p_{2}$ as two separate cases. More generally, our introduction rules describe, for each kind of proposition, how to prove that proposition when it appears as a goal.

Note that this proof rule applies only when the overall goal proposition is of the form of a conjunction. For example:

If our goal is $A \land B$ , then the intro-∧ rule applies with $p_{1} = A$ and $p_{2} = B$ .
If our goal is $(A \to B) \land (C \land D)$ , then the intro-∧ rule applies with $p_{1} = A \to B$ and $p_{2} = C \land D)$ .

This rule cannot be applied if only a sub-component of the goal proposition is a conjunction. For example, if our goal is $A \to (B \land C)$ , the rule does not apply even though $B \land C$ is part of the goal.

Now, what happens if the conjunction appears to the left of the turnstile, i.e., as an assumption? From our intro-∧ rule, we know that if the conjunction is provable, then both conjuncts are provable individually. So if we know that "I'm running out of cheeseburgers and the sky is falling" is provable, then we know that both conjuncts are true, i.e., "I'm running out of chesseburgers" and "the sky is falling." The following pair of rules captures this idea:

Proof Rule [elim-∧-left]

To prove $Γ ⊢ p$ , if $p_{1} \land p_{2} \in Γ$ , then we may prove $p_{1}, Γ ⊢ p$ .

Proof Rule [elim-∧-right]

To prove $Γ ⊢ p$ , if $p_{1} \land p_{2} \in Γ$ , then we may prove $p_{2}, Γ ⊢ p$ .

If we are trying to prove some proposition $p$ and we assume that a conjunction $p_{1} \land p_{2}$ is true, then the two rules say we can continue trying to prove $p$ , but with an additional assumption gained from the conjunction. In terms of the new notation in these rules:

To state that $p_{1} \land p_{2}$ is contained in $Γ$ , we write $p_{1} \land p_{2} \in G amma$ where the $(\in)$ symbol ( $L A T E X$ , \in) is pronounced "in."
To add a new assumption to the set of assumptions $Γ$ , we "cons" on the new assumption, e.g., $p_{1}$ in the elim-∧-left rule, by adding it to the front of $Γ$ with a comma, i.e., $p_{1}, Γ$ .

The elim-∧-left rule allows us to add the left conjunct to our assumptions, and the elim-∧-right rule allows us to add the right conjunct. In effect, these elimination rules allow us to decompose or extract information from assumptions. However, how do we use these assumptions to prove a goal? The assumption rule allows us to prove a goal proposition directly if it is one of our assumptions.

Proof Rule [assumption]

If $p \in Γ$ , then $Γ ⊢ p$ is proven immediately.

Proofs in Natural Deduction

First-order logic and natural deduction give us all the definitions necessary for rigorously defining every step of reasoning in a proof. Let's see this in action with a simple claim within formal propositional logic.

Claim: $A, B ⊢ A \land B$ .

First, let's make sure we understand what the claim says. To the left of the turnstile are our set of assumptions. Here we assume that propositions $A$ and $B$ are provable. To the right of the turnstile is our goal. We are trying to prove that $A \land B$ is provable.

Next, let's develop a high-level proof strategy. Because we are working at such a low-level of proof, it is important to ask ourselves:

Is the proof state as I understand it provable?
How do the different parts of the proof state relate to each other?

In our example claim, we see that the proposition consists of atomic propositions $A$ and $B$ joined by conjunction and those same propositions appear as assumptions in the initial sequent. Thus, our overall strategy in our proof will be to decompose the proof goal into cases where we need to prove these individual cases directly via our hypotheses.

With this in mind, let's see what the formal proof looks like:

Proof

Claim: $A, B ⊢ A \land B$ .

Proof. By the intro-∧ rule, we must prove two new goals:

Case $A, B ⊢ A$ . $A$ is an assumption, so we are done.
Case $A, B ⊢ B$ . $B$ is an assumption, so we are done.

Because each proof rule creates zero or more additional proof states that we must prove or discharge, our proofs take on a tree-like shape. In a diagram, our reasoning would look as follows:

      A, B |- A ∧ B
        /        \
  [intro-∧ (1)]   [intro-∧ (2)]
      /            \
 A, B |- A     A, B |- B
    /                \
[assumption: A]  [assumption: B]

Because the intro-∧ rule creates two sub-goals we must prove, we have two branches of reasoning emanating from our initial proof state, one for each goal (labeled (1) and (2), respectively). We then prove each branch immediately by invoking the appropriate assumption. Every proof, not just in formal logic but in any context, has a hierarchical structure like this.

We can write this hierarchical structure in linear prose with a bulleted list where indentation levels correspond to branching. Whenever we perform case analysis, we should be clear when we are entering different cases, usually through some kind of sub-heading or bullet-like structure.

Finally, note that we cite every step of our proof. This "luxury" is afforded to us because our proof rules are now explicit and precise---we can justify every step of reasoning as an invocation of one of our proof rules!

Exercise (Adaption)

Formally prove the following claim in propositional logic:

Claim: $A, B ⊢ A \land B \land A .$

(Hint: remember that $(\land)$ is a left-associative operator!)

Implication

Next, let's look at implication. An implication, $p_{1} \to p_{2}$ , is a proposition that asserts that whenever $p_{1}$ is provable, then $p_{2}$ is provable as well. For example, the proposition "If I'm running out of cheeseburgers then the sky is falling" is an implication where "I'm running out of cheeseburgers" is the premise of the implication and "the sky is falling" is the conclusion.

Implication is closely related to the preconditions and postconditions we analyzed in our study of program correctness. Pre- and postconditions form an implication where the preconditions are premises and the postcondition is a conclusion. When we went to prove a claim that involved pre- and postconditions, we assumed that preconditions held and went on to prove the postcondition. Likewise here, to prove an implication, we assume the premises and then go on to prove the conclusion.

Proof Rule [intro-→]

To prove $Γ ⊢ p_{1} \to p_{2}$ we must prove $p_{1}, Γ ⊢ p_{2}$ .

What do we do if the implication appears as a hypothesis, i.e., to the left of the turnstile, rather than the right? We saw this process, too, in our discussion of program correctness. If we had an auxiliary claim that we proven that was the form of a conditional, or if our induction hypothesis was a conditional, we needed to first prove the preconditions and then we could assume that the conclusion held. Thus, to use an assumed implication, we must first prove its premise and then we can assume the conclusion as a new hypothesis:

Proof Rule [elim-→]

To prove $Γ ⊢ p$ , if $p_{1} \to p_{2} \in Γ$ , then we may prove both $Γ ⊢ p_{1}$ and $p_{2}, Γ ⊢ p$ .

After using elim-→, we must prove two proof goals:

The first requires us to prove that the premise of the assumed implication is provable.
The second requires us to prove our original goal, but with the additional information of the conclusion of implication.

If you are familiar with logic already, e.g., from a symbolic logic class, you should recognize elim-→ as the modus ponens rule that says:

If $p$ implies $q$ and $p$ is true, then $q$ is true as well.

Here is a pair of simple example proofs that illustrate the basic usage of these rules:

Proof

Claim: $\cdot ⊢ A \to A$ .

Proof. By intro-→, we first assume that $A$ is provable and go on to show that $A$ is provable. However, this is simply the assumption that we just acquired.

Proof

Claim: $A \to B, A ⊢ B$ .

Proof. By elim-→, we must show that $A$ is provable and then we may assume $B$ is provable. $A$ is provable by assumption and the goal $B$ is provable by the assumption we gained from eliminating the assumption.

Compare the flow of these proofs with the rules presented in this section to make sure you understand how the rules translate into actual proof steps.

Disjunction

Now, let's look at disjunction. A disjunction, $p_{1} \lor p_{2}$ , is a proposition that asserts that one or both of $p_{1}$ and $p_{2}$ are provable as well. For example, the proposition "I'm running out of cheeseburgers or the sky is falling" is a disjunction where at least one of "I'm running out of cheeseburgers" and "the sky is falling" must be provable.

Again, using our intuition, we can see that if we have to prove a disjunction, our job is easier than with a conjunction. With a conjunction we must prove both conjuncts; with disjunction, we may prove only one of the disjuncts.

Proof Rule [intro-∨-left]

To prove $Γ ⊢ p_{1} \lor p_{2}$ , we may prove $Γ ⊢ p_{1}$ .

Proof Rule [intro-∨-right]

To prove $Γ ⊢ p_{1} \lor p_{2}$ , we may prove $Γ ⊢ p_{2}$ .

The intro-∨-left and intro-∨-right rules allow us to explicit choose the left- and right-hand sides of the disjunction to prove, respectively.

We have flexibility in proving a disjunction. However, that flexibility results in complications when we reasoning about what we know if a disjunction is assumed to be true. As an example, suppose that our example disjunction from above is assumed to be true. What do we know as a result of this fact? Well, we know that at least one of "I'm running out of cheeseburgers" and "the sky is falling" is true. The problem is that we don't know which one is true!

We seem to be stuck! It doesn't seem like we can extract any interesting information from a disjunction. Indeed, a disjunction doesn't give us the same direct access to new information like a conjunction. However, disjunctions can be used in an indirect manner to prove a claim!

Suppose that we are trying to prove the claim that "the apocalypse is now" and we know our disjunction is true. We know that at least of the disjuncts is true, so if we can do the following:

Assume that "I'm running out of cheeseburgers" and then prove "the apocalypse is now."
Assume that "the sky is falling" and then prove "the apocalypse is now."

It must be the case that apocalypse is happening because both disjuncts imply the apocalypse and we know at least one of the disjuncts is true!

This logic gives rise to the following left-rule for disjunction:

Proof Rule [elim-∨]

To prove $Γ ⊢ p$ , if $p_{1} \lor p_{2} \in Γ$ , then we may prove $p_{1}, Γ ⊢ p$ and $p_{2}, Γ ⊢ p$ .

In effect, a disjunction gives us additional assumptions when proving a claim, but we must consider all possible cases when doing so. This reasoning should sound familiar: this is precisely the kind of reasoning we invoke when analyzing the guard of a conditional in a computer program: "either the guard evaluates to true or false." This reasoning is possible because we know that the guard of a conditional produces a boolean value and booleans can only take on two values.

Top, Bottom, and Negation

Finally, we arrive at our two "trivial" propositions. Recall that $⊤$ ("top") is the proposition that is always provable and $⊥$ ("bottom") is the proposition that is never provable.

First let's consider $⊤$ . Since $⊤$ is always provable, its introduction is straightforward to write down:

Proof Rule [intro-⊤]

$Γ ⊢ ⊤$ is always provable.

But what if we know $⊤$ holds as an assumption? Well that doesn't mean much because intro-⊤ tells us that we can always prove $⊤$ ! Indeed, there is no left-rule for $⊤$ because knowing $⊤$ holds does not tell us anything new.

Similarly, because $⊥$ is never provable, there is no introduction rule for $⊥$ because we should never be able to prove it! But what does it means if $⊥$ somehow becomes an assumption in our proof state? Think about what this implies:

$⊥$ is defined to never be provable.
We assume that $⊥$ is provable as a hypothesis.

This is a logical contradiction: we are assuming something we know is not true! We are now in an inconsistent state of reasoning where, it turns out, anything is provable. This gives rise to the following elimination rule for $⊥$ :

Proof Rule [elim-⊥]

If $Γ ⊢ p$ and $⊥ \in Γ$ then our goal $p$ is always provable.

Finally, how do we handle negation? It turns out rather than giving proof rules for negation, we'll employ an equivalence, relating negation to implication:

Definition (Negation)

$\neg p \equiv p \to ⊥$ .

$p \to ⊥$ means that whenever we can prove $p$ we can "prove" a contradiction. Since contradictions cannot arise, it must be the case that $\neg p$ must be true instead.

Exercise (Follow the Rules, ‡)

Formally prove the following claim:

Claim: $A, B ⊢ A \land (B \lor C)$ .

Reasoning About Quantifiers

We have given rules for all of the connectives in propositional logic. But what about the additional constructs from first-order logic: universal and existential quantification? Let's study each of these constructs in turn.

Universal Quantification

A universally quantified proposition, for example, $\forall x . x \geq 5$ holds for all possible values of its quantified variable. Here, $x \geq 5$ means that every possible value of $x$ is greater than five.

Because the proposition must hold for all possible values, when we go to prove a universally quantified proposition, we must consider the value as arbitrary. In other words, we don't get to assume anything about the universally quantified variable. As we discussed during the program correctness section of the course, this amounts to substituting an unknown, yet constant value for that variable when we go to reason about the proposition. In practice, we think of the variable as the unknown, yet constant value, but it is important to remember that we need to remove the variable, either implicitly or explicitly, from the proposition before we can continue processing it.

In contrast, when we have a universally quantified proposition as an assumption, we can take advantage of the fact that the proposition holds for all values. We do so by choosing a particular value for the proposition's quantified variable, a process called instantiation. We can choose whatever value we want, or even instantiate the proposition multiple times to many variables, depending on the situation at hand.

We can summarize this behavior as introduction and elimination rules in our natural deduction system:

Proof Rule [intro-∀]

To prove $Γ ⊢ \forall x . p$ , we must prove $Γ ⊢ \forall [c / x] p$ where $[c / x] p$ is the substitution of some unknown constant $c$ for $x$ everywhere $x$ occurs in proposition $p$ .

Proof Rule [elim-∀]

To prove $Γ ⊢ p$ , if $\forall x . p \in Γ$ , then we may prove $[v / x] p, Γ ⊢ p$ where $[v / x] p$ is the substitution of some chosen value $v$ for $x$ everywhere $x$ occurs in $p$ .

The difference between the rules is subtle but important to summarize:

In intro-∀, we hold the quantified variable abstract.
In elim-∀, we choose a particular value for the variable.

We formalize this through substitution notation. The proposition $[e / x] p$ is proposition $p$ but every occurrence of variable $x$ is replaced with $e$ . Furthermore, we use the convention that:

$c$ is a unknown, yet constant value.
$v$ is a chosen value.

From this, we see that in the intro-∀ rule, we substitute an unknown, constant value $c$ for the quantified variable $x$ . In contrast, in the elim-∀ rule, we substitute a chosen value $v$ for $x$ .

Existential Quantification

An existentially quantified proposition, for example, $\exists x . x \geq 5$ holds for at least possible value that its quantified variable can take on. Here, $x \geq 5$ means that there is at least one value for $x$ such that $x$ is greater than five.

Since an existentially quantified proposition holds if a single value makes the proposition true, then we get the luxury of choosing such a value when going to prove an existential. For example, we might choose $x = 10$ and then we would be tasked with proving that $10 \geq 5$ . However, this flexibility comes at a price. When know an existential is provable, we do not know what value(s) make the proposition true. So the only thing we know is the existentially quantified proposition is true for some unknown, constant value.

We can summarize this behavior with a pair of rules as well:

Proof Rule [intro-∃]

To prove $Γ ⊢ \exists x . p$ , we must prove $Γ ⊢ \exists [v / x] p$ where $[v / x] p$ is the substitution of some chosen $v$ for $x$ in $p$ .

Proof Rule [elim-∃]

To prove $Γ ⊢ p$ , if $\exists x . p \in Γ$ , then we may prove $[c / x] p, Γ ⊢ p$ where $[c / x] p$ is the substitution of some arbitrary constant $c$ for $x$ everywhere $x$ occurs in $p$ .

Summary of Reasoning with Universals and Existentials

We can summarize our reasoning principles using universals and existentials with the following table:

Position/Quantifier	$\forall x . p$	$\exists x . p$
Goal	$x$ is arbitrary	$x$ is chosen
Assumption	$x$ is chosen	$x$ is arbitrary

Note how the rules of reasoning about universal and existential quantification are flipped depending on whether the proposition appears in goal or assumption position. We call mathematical objects that have a reciprocal relationship of this nature duals. In other words, universal and existential quantification are duals of each other. This dual nature leads to a number of interesting properties between the two connectives, e.g., De Morgan's law-style reasoning:

$\neg\forall x . p \equiv \exists x . \neg p \neg\exists x . p \equiv \forall x . \neg p .$

Another example of duals in logic are conjunction and disjunction. Compare their introduction and elimination rules:

Position/Connective	$p \land q$	$p \lor q$
Goal	Prove both	Prove one
Assumption	Assume one	Assume both

Here, the duality manifests itself in whether we choose to analyze one or both of the arguments to the connective.

Duals highlight an important goal of mathematical modeling. When we model a phenomena, we are interested in understanding the relationship between objects in our models. As mathematicians, we care about these relationships so that we discover and ultimately prove properties about these objects. As computer scientists and programmers, we can exploit these relationships to write more efficient or concise code.

Exercise (Alternation, ‡)

Consider the following abstract first-order proposition:

$\forall x . \exists y . \forall z . p (x, y, z) .$

For each of the variables $x$ , $y$ , and $z$ identify whether you must hold the variable abstract or when you get to choose a value for that variable when you are:

Proving the proposition.
Utilizing the proposition as an assumption.

Logical Reasoning

In today's brief lab, we'll begin applying the rules of natural deduction to write rigorous, low-level proofs.

Problem: Starting Off

For each claim:

In a sentence, describe what the claim is saying. Your description should be in the rough form "under assumptions ... we must show that ... ." You can reference the atomic propositions in the claims directly in your description, i.e., you don't need to instantiate the proof state into a real-world context.
Give a rigorous, natural deduction-style proof of the claim.

Claim 1

$\cdot ⊢ (A \to B) \to A \to B$ .

Claim 2

$A ⊢ (A \to B) \to (B \to C) \to C$ .

Natural Deduction

So far we have discussed the objects of study of logic, the proposition, defined recursively as:

$p ::= A (x) ∣ ⊤ ∣ ⊥ ∣ \neg p ∣ p_{1} \land p_{2} ∣ p_{1} \lor p_{2} ∣ p_{1} \to p_{2} ∣ \forall x . p ∣ \exists x . p$

The Components of a Proof

Definition (Evenness)

Call a natural number $n$ even if $n = 2 k$ for some natural number $k \geq 0$ .

In other words, an even number is a multiple of two. This intuition informs how we prove the following fact:

Claim

Claim: For any natural number $n$ , if $n$ is even then $4 n$ is also even.

Proof. Let $n$ be an even natural number. Because $n$ is even, by the definition of evenness, $n = 2 k$ for some natural number $k$ . Then:

$4 n = 4 (2 k) = 8 k = 2 (4 k) .$

So we can conclude that $2 n$ is also even by the definition of evenness because it can be expressed as $2 m$ where $m = 4 k$ .

From the above proof, we can see that there is an overall proposition that we must prove, called the goal proposition:

For any natural number $n$ , if $n$ is even then $4 n$ is also even.

As the proof evolves, our goal proposition changes over time according to the steps of reasoning that we take. For example, implicit in the equational reasoning we do in the proof:

$4 n = 4 (2 k)$

$P =$ " $n$ is even"
$Q =$ " $4 n$ is even"

Proof States and Proofs

We can crystallize these observations into the following definition of "proof."

Definition (Proof State)

The state of a proof or proof state is a pair of a set of assumptions and a goal proposition to prove.

Definition (Proof)

A proof is a sequence of logical steps that manipulate a proof state towards a final, provable result.

In our above example, we initially started with a proof state consisting of:

No assumptions.
The initial claim as our goal proposition: "For any natural number $n$ , if $n$ is even then $4 n$ is also even."

After some steps of reasoning that breaks apart this initial goal, we arrived at the following proof state:

Assumptions: " $n$ is a natural number," " $n$ is even"
Goal: " $4 n$ is even."

At the end of the proof, we rewrote the goal in terms of a new variable $m$ and performed some factoring, leading to the final proof state:

Assumptions: " $n$ and $m$ are natural numbers," " $n$ is even," " $m = 4 k$ "
Goal: " $2 m$ is even"

This final goal is provable directly from the definition of evenness since the quantity $2 m$ is precisely two times a natural number.

Rules of Natural Deduction

Rules that operate on propositions in the goal proposition are called introduction rules. This is because we typically first introduce various propositions when they initially appear in our goal.
Rules that operate on propositions in assumptions are called elimination rules. These rules use or "consume" assumptions, resulting in more assumptions and/or an updated set of goals.

A Note on the Presentation of Proof Rules

When we talk about rules of proof we really mean the set of "allowable actions" that we can take on the current proof state. We describe these rules in the following form:

To prove a proof state of the form … we can prove proof state(s) of the form …

If our proof state contains assumptions $Γ$ and goal proposition $p$ … .

$Γ ⊢ p .$

The turnstile symbol ( $L A T E X$ , \vdash) acts like the comma in the coordinate pair; it merely creates visual separation between the assumptions $Γ$ and the goal proposition $p$ .

Conjunction and Assumptions

"I'm running out of cheeseburgers" and
"the sky is falling"

Are true independently of each other. In particular, we don't get to rely on fact being true when going to prove the other fact.

We can codify this intuition into the following proof rule describing how we can process a conjunction when it appears as our proof goal:

Proof Rule [intro-∧]

To prove $Γ ⊢ p_{1} \land p_{2}$ , we may prove the following two proof states separately:

$Γ ⊢ p_{1}$ .
$Γ ⊢ p_{2}$ .

Recall that $Γ ⊢ p_{1} \land p_{2}$ represents the following proof state:

We assume that the assumptions in set $Γ$ are provable.
Our goal proposition we must prove is $p_{1} \land p_{2}$ .

Note that this proof rule applies only when the overall goal proposition is of the form of a conjunction. For example:

If our goal is $A \land B$ , then the intro-∧ rule applies with $p_{1} = A$ and $p_{2} = B$ .
If our goal is $(A \to B) \land (C \land D)$ , then the intro-∧ rule applies with $p_{1} = A \to B$ and $p_{2} = C \land D)$ .

Proof Rule [elim-∧-left]

To prove $Γ ⊢ p$ , if $p_{1} \land p_{2} \in Γ$ , then we may prove $p_{1}, Γ ⊢ p$ .

Proof Rule [elim-∧-right]

To prove $Γ ⊢ p$ , if $p_{1} \land p_{2} \in Γ$ , then we may prove $p_{2}, Γ ⊢ p$ .

To state that $p_{1} \land p_{2}$ is contained in $Γ$ , we write $p_{1} \land p_{2} \in G amma$ where the $(\in)$ symbol ( $L A T E X$ , \in) is pronounced "in."
To add a new assumption to the set of assumptions $Γ$ , we "cons" on the new assumption, e.g., $p_{1}$ in the elim-∧-left rule, by adding it to the front of $Γ$ with a comma, i.e., $p_{1}, Γ$ .

Proof Rule [assumption]

If $p \in Γ$ , then $Γ ⊢ p$ is proven immediately.

Proofs in Natural Deduction

Claim: $A, B ⊢ A \land B$ .

Next, let's develop a high-level proof strategy. Because we are working at such a low-level of proof, it is important to ask ourselves:

Is the proof state as I understand it provable?
How do the different parts of the proof state relate to each other?

With this in mind, let's see what the formal proof looks like:

Proof

Claim: $A, B ⊢ A \land B$ .

Proof. By the intro-∧ rule, we must prove two new goals:

Case $A, B ⊢ A$ . $A$ is an assumption, so we are done.
Case $A, B ⊢ B$ . $B$ is an assumption, so we are done.

Because each proof rule creates zero or more additional proof states that we must prove or discharge, our proofs take on a tree-like shape. In a diagram, our reasoning would look as follows:

      A, B |- A ∧ B
        /        \
  [intro-∧ (1)]   [intro-∧ (2)]
      /            \
 A, B |- A     A, B |- B
    /                \
[assumption: A]  [assumption: B]

Exercise (Adaption)

Formally prove the following claim in propositional logic:

Claim: $A, B ⊢ A \land B \land A .$

(Hint: remember that $(\land)$ is a left-associative operator!)

Implication

Proof Rule [intro-→]

To prove $Γ ⊢ p_{1} \to p_{2}$ we must prove $p_{1}, Γ ⊢ p_{2}$ .

Proof Rule [elim-→]

To prove $Γ ⊢ p$ , if $p_{1} \to p_{2} \in Γ$ , then we may prove both $Γ ⊢ p_{1}$ and $p_{2}, Γ ⊢ p$ .

After using elim-→, we must prove two proof goals:

The first requires us to prove that the premise of the assumed implication is provable.
The second requires us to prove our original goal, but with the additional information of the conclusion of implication.

If you are familiar with logic already, e.g., from a symbolic logic class, you should recognize elim-→ as the modus ponens rule that says:

If $p$ implies $q$ and $p$ is true, then $q$ is true as well.

Here is a pair of simple example proofs that illustrate the basic usage of these rules:

Proof

Claim: $\cdot ⊢ A \to A$ .

Proof. By intro-→, we first assume that $A$ is provable and go on to show that $A$ is provable. However, this is simply the assumption that we just acquired.

Proof

Claim: $A \to B, A ⊢ B$ .

Compare the flow of these proofs with the rules presented in this section to make sure you understand how the rules translate into actual proof steps.

Disjunction

Proof Rule [intro-∨-left]

To prove $Γ ⊢ p_{1} \lor p_{2}$ , we may prove $Γ ⊢ p_{1}$ .

Proof Rule [intro-∨-right]

To prove $Γ ⊢ p_{1} \lor p_{2}$ , we may prove $Γ ⊢ p_{2}$ .

The intro-∨-left and intro-∨-right rules allow us to explicit choose the left- and right-hand sides of the disjunction to prove, respectively.

Suppose that we are trying to prove the claim that "the apocalypse is now" and we know our disjunction is true. We know that at least of the disjuncts is true, so if we can do the following:

Assume that "I'm running out of cheeseburgers" and then prove "the apocalypse is now."
Assume that "the sky is falling" and then prove "the apocalypse is now."

It must be the case that apocalypse is happening because both disjuncts imply the apocalypse and we know at least one of the disjuncts is true!

This logic gives rise to the following left-rule for disjunction:

Proof Rule [elim-∨]

To prove $Γ ⊢ p$ , if $p_{1} \lor p_{2} \in Γ$ , then we may prove $p_{1}, Γ ⊢ p$ and $p_{2}, Γ ⊢ p$ .

Top, Bottom, and Negation

Finally, we arrive at our two "trivial" propositions. Recall that $⊤$ ("top") is the proposition that is always provable and $⊥$ ("bottom") is the proposition that is never provable.

First let's consider $⊤$ . Since $⊤$ is always provable, its introduction is straightforward to write down:

Proof Rule [intro-⊤]

$Γ ⊢ ⊤$ is always provable.

$⊥$ is defined to never be provable.
We assume that $⊥$ is provable as a hypothesis.

Proof Rule [elim-⊥]

If $Γ ⊢ p$ and $⊥ \in Γ$ then our goal $p$ is always provable.

Finally, how do we handle negation? It turns out rather than giving proof rules for negation, we'll employ an equivalence, relating negation to implication:

Definition (Negation)

$\neg p \equiv p \to ⊥$ .

$p \to ⊥$ means that whenever we can prove $p$ we can "prove" a contradiction. Since contradictions cannot arise, it must be the case that $\neg p$ must be true instead.

Exercise (Follow the Rules, ‡)

Formally prove the following claim:

Claim: $A, B ⊢ A \land (B \lor C)$ .

Reasoning About Quantifiers

Universal Quantification

We can summarize this behavior as introduction and elimination rules in our natural deduction system:

Proof Rule [intro-∀]

To prove $Γ ⊢ \forall x . p$ , we must prove $Γ ⊢ \forall [c / x] p$ where $[c / x] p$ is the substitution of some unknown constant $c$ for $x$ everywhere $x$ occurs in proposition $p$ .

Proof Rule [elim-∀]

To prove $Γ ⊢ p$ , if $\forall x . p \in Γ$ , then we may prove $[v / x] p, Γ ⊢ p$ where $[v / x] p$ is the substitution of some chosen value $v$ for $x$ everywhere $x$ occurs in $p$ .

The difference between the rules is subtle but important to summarize:

In intro-∀, we hold the quantified variable abstract.
In elim-∀, we choose a particular value for the variable.

We formalize this through substitution notation. The proposition $[e / x] p$ is proposition $p$ but every occurrence of variable $x$ is replaced with $e$ . Furthermore, we use the convention that:

$c$ is a unknown, yet constant value.
$v$ is a chosen value.

Existential Quantification

We can summarize this behavior with a pair of rules as well:

Proof Rule [intro-∃]

To prove $Γ ⊢ \exists x . p$ , we must prove $Γ ⊢ \exists [v / x] p$ where $[v / x] p$ is the substitution of some chosen $v$ for $x$ in $p$ .

Proof Rule [elim-∃]

To prove $Γ ⊢ p$ , if $\exists x . p \in Γ$ , then we may prove $[c / x] p, Γ ⊢ p$ where $[c / x] p$ is the substitution of some arbitrary constant $c$ for $x$ everywhere $x$ occurs in $p$ .

Summary of Reasoning with Universals and Existentials

We can summarize our reasoning principles using universals and existentials with the following table:

Position/Quantifier	$\forall x . p$	$\exists x . p$
Goal	$x$ is arbitrary	$x$ is chosen
Assumption	$x$ is chosen	$x$ is arbitrary

$\neg\forall x . p \equiv \exists x . \neg p \neg\exists x . p \equiv \forall x . \neg p .$

Another example of duals in logic are conjunction and disjunction. Compare their introduction and elimination rules:

Position/Connective	$p \land q$	$p \lor q$
Goal	Prove both	Prove one
Assumption	Assume one	Assume both

Here, the duality manifests itself in whether we choose to analyze one or both of the arguments to the connective.

Exercise (Alternation, ‡)

Consider the following abstract first-order proposition:

$\forall x . \exists y . \forall z . p (x, y, z) .$

For each of the variables $x$ , $y$ , and $z$ identify whether you must hold the variable abstract or when you get to choose a value for that variable when you are:

Proving the proposition.
Utilizing the proposition as an assumption.

More Logical Reasoning

Problem 1: Logical Equivalences

In our formulation of mathematical logic, we take negation to be equivalent to an implication with $⊥$ :

$\neg p \equiv p \to ⊥.$

It turns out equivalence itself is also a logical proposition! We say that $p \equiv q$ if whenever $p$ is provable, then $q$ is provable as well and vice versa. This leads to the definition of logical equivalence:

Definition (Logical Equivalence)

We say that proposition $p$ is (logically) equivalent to $q$ , written $p \equiv q$ , whenever $p$ is provable, then $q$ is provable and whenever $q$ is provable, then $p$ is provable. In other words, $p \equiv q$ exactly when the two propositions are provable:

$p \to q$ and
$q \to p$ .

We say that proving an equivalence is proof "in both directions."

Use this formal definition of logical equivalence to prove these common properties of propositional logic:

Claim 1 (Idempotence of Conjunction): $\cdot ⊢ p \land p \equiv p$ .

Claim 2 (Absorption of Disjunction): $\cdot ⊢ p \lor (p \land q) \equiv p$ .

Problem 2: Coup De Grâce

Finally, try to prove this more complicated claim that brings together a number of the rules we introduced in the reading.

Claim: $\cdot ⊢ (A \to (B \lor C)) \to (A \land \neg C) \to B$ .

(Hint: make a plan regarding how you will plan the overall goal $B$ in terms of the premises $A \to (B \lor C)$ and $A \land \neg C$ . How will the premises contribute towards proving $B$ ?)

Additional Topics in Logic

Logical reasoning and program correctness form the backbone of our mathematical and programming endeavors. To close this first portion of the course, we look at some final, miscellaneous topics that further tie together these two critical concepts together.

The Spectrum of Formality

What does it mean for a proof to be "formal?" Is it enough for a proof to contain symbols and reference relevant proof rules? In today's class we will answer these important questions!

Formal Proof Revisited

Previously we discussed the role of intuition and formal definition in our understanding of mathematics. Recall that mathematics is the science of modeling abstract phenomena. Our intuition helps us gain a foothold to understand these abstract models, in particular, by instantiating them to real-world examples we understand. However, these models are ultimately determined by their formal definitions.

If we are interested in rigorous, logical arguments, then they must also be rooted in these formal definitions rather than our intuition. This is what we mean by "formal proof!"

Definition (Formal Proof)

A formal proof is a mathematical argument rooted in formal definitions.

To put it another way, a formal proof is a logical argument whose steps of reasoning are justified by the formal definitions of whatever model we are operating in. For example, consider the inequality proof involving the mathematical factorial function from a previous reading:

Definition (Factorial)

factorial is a function of type $N \to N$ ( $N$ is the set of natural numbers), written $n!$ , defined inductively on $n$ as follows:

$0! = n! = 1 n \times (n - 1)!$

Proof

Claim: for any natural number $n$ , $n! > 0$ .

Proof. Let $n$ be a natural number. We proceed by induction on $n$ .

Case $n = 0$ . $0! = 1 > 0$ .
Case $n = k + 1$ . We assume that:

IH: $(n - 1)! > 0$

And must prove:

Goal: $n! > 0$ .

We observe that since $n$ is non-zero that:

$n! = n \times (n - 1)!$

By our induction hypothesis, $(n - 1)! > 0$ , i.e., a positive natural number. Since $n$ is non-zero, we know from multiplying two positive natural numbers results in a positive natural number, thus justifying our desired result:

$n \times (n - 1)! > 0.$

I think we can all agree this a formal proof! Observe how our argument proceeds by expanding $n!$ according to its definition and appealing to mathematical facts we know from our previous math education.

In contrast, consider this less formal proof:

(A Less Formal) Proof

Claim: for any natural number $n$ , $n! > 0$ .

Proof. Let $n$ be a natural number. We observe that the inequality holds because factorial always produces a non-zero, positive result.

Why is this proof less formal than the previous one? Note that this proof does not say anything inaccurate! Formality is not about correctness! It is a matter of whether appropriate justification exists for the reasoning in the argument.

In this example, we do not justify why factorial always produces a non-zero, positive result. Here's another attempt that introduces some additional information:

(A Slightly Less Formal) Proof

Claim: for any natural number $n$ , $n! > 0$ .

Proof. Let $n$ be a natural number. We observe that the inequality holds because factorial always produces a non-zero, positive result. This is because $n!$ is the product of all the positive natural numbers from $1$ to $n$ , and we know that the product of positive natural numbers is, itself, a positive natural number.

Now we introduced prose to justify our reasoning, is this proof formal? I would argue not! While we have an intuition that factorial is the product of the positive natural numbers from $1$ to $n$ , that is not the formal definition we have given factorial. In this sense, this third proof is still not formal because it does not rely on the formal definitions of our mathematical objects.

With this in mind, look at this excerpt from our original proof:

Since $n$ is non-zero, we know from multiplying two positive natural numbers results in a positive natural number, thus justifying our desired result:

This is awfully similar to the second proof in that we stated a "fact" without justification. Do we need to provide justification for this fact? But wait, we didn't given any justification for our mathematical induction proof principle. Do we need to justify that as well?

We can play this game ad nauseam because, like computer programs, our mathematical models are built upon stacks of abstractions. For example, underlying our proof are our principles of natural deduction. In particular, the line:

Let $n$ be a natural number.

Is really an application of our intro-∀ rule where we prove a universally quantified proposition by instantiating its quantified variable with an arbitrary value. Did we need to cite this rule in our proof to be truly formal?

Our way out of this madness is to recognize that formality in mathematical arguments is not a binary formal/not-formal question. Instead, mathematical arguments span a spectrum of formality where their level of formality is defined by what top-level assumptions they make.

Definition (Trusted Reasoning Base)

In a formal argument, our trusted reasoning base (TRB) is the set of assumptions we make about the world in order to establish our formal argument.

In our "formal" proof above, our TRB includes:

First-order logic and natural deduction reasoning principles.
The soundness of mathematical induction.
Facts about arithmetic, i.e., the result of multiplying positive natural numbers.

When making mathematical arguments, we have to be cognizant of the assumptions we are making and, thus, our position on the spectrum of formality.

Symbols, Formality, and Readability

With all that being said, a common misconception we should address is that the formality of a proof is a function of the number of symbols it includes. With formality defined above as the size of our TRB, how do symbols fit into the picture? Here is our original proof again, but this time, without symbols:

(A formal?) Proof

Claim: for any natural number $n$ , factorial of $n$ is greater than $0$ .

Proof. Let $n$ be a natural number. We proceed by induction on $n$ .

In the case where $n$ is zero, we know that factorial of zero is equal to one which is certainly greater than zero.
In the case where $n$ is non-zero, we assume that:

IH: Factorial of $n$ minus one is greater than zero.

And must prove:

Goal: Factorial of $n$ is greater than zero.

We observe that since $n$ is non-zero that $n$ factorial is equal to $n$ times $n$ minus one factorial. By our induction hypothesis, we know that $n$ minus one is greater than zero, i.e., a positive natural number. Since $n$ is non-zero, we know from multiplying two positive natural numbers results in a positive natural number, thus justifying our desired result: $n$ times $n$ minus one factorial is indeed greater than zero.

You will probably find the first version of the proof much more readable than this version. Why is that? Likely, you see this version of the proof as much more verbose than the first, and it is easy to lose sight of the forest for the trees when you are wading through text.

From this, we see that symbols serve the purpose of acting as concise shorthand for formal mathematical definitions. Symbols pack much information in a small amount of space. Consequently, we can also overwhelm the reader if we include too many symbols as you likely felt when you first saw logic and natural deduction! Thus, a readable proof is typically achieved by intermixing prose and symbols, balancing the information we present to the reader to help them remember the salient details, but not overwhelm them.

Where should we sit on the spectrum of formality? How do we balance readability with all the details that are frequently present in intricate logical arguments? Just like traditional writing, there are no easy answers here, and the best way to learn is by writing a bunch, and experimenting what works for you as a writing of mathematical arguments! Throughout the rest of the course, we'll try to hone your proof writing instinct so that you can craft good mathematical prose.

Exercise (Sequent-to-String, ‡)

Recall from our lab on natural deduction your proof for this sequent.

Claim: $A ⊢ (A \to B) \to (B \to C) \to C$ .

You wrote your proof in pure symbols utilizing natural deduction. Keeping in mind the need to strike a balance between symbols and prose, write a more readable version of your proof by intermixing symbols and prose. In your writing, try to convey the big picture of the proof while also communicating the critical details, i.e., what is the "heart" of the reasoning.

Uniting Reasoning and Design

Traditionally, we think of algorithmic design and reasoning as two separate activities. First, you design an algorithm, and then, in a separate step, you reason about that algorithm. This is not wrong, but perhaps undesirable in two ways:

In terms of workload, you are taking two passes through the code: once for construction and once for verification.
By separating the two activities, you are not able to take advantage of one activity to assist in the other.

Recursive design and inductive reasoning, when employed correctly, alleviates these concerns. To see this, observe the similarities between our recursive design and inductive reasoning templates:

Definition (Recursive design, lists)

If a function takes a list as input, define the behavior of the function in terms of possible shapes of a list:

The list is empty.
The list is non-empty with an element at the front of the list (its head) and a remaining sublist (its tail). In designing this case, assume that the function just works when applied recursively to the tail of the list.

Definition (Inductive proof, lists)

When proving a property of a function that takes a list as input, prove the property according to the possible shapes of a list:

The list is empty.
The list is non-empty with an element at the front of the list (its head) and a remaining sublist (its tail). When proving this case, assume an inductive hypothesis that states that the property holds for the tail of the list.

If we follow these templates, we can unite recursive design and inductive proof, so that when we build a recursive program, we can have some confidence that our program is correct, simply by following the template! To do this, we not only build a program according to the template, we also have a property in mind that we can use to check our work as we design the algorithm. We design the program in such a way that the property is "obviously" fulfilled. This has the dual benefit of not only ensuring we have fulfilled the property, but we also receive concrete guidance as to how to proceed!

An Example: Designing Map

As an example, let's consider re-designing the standard map function over lists. For example map((lambda x: x+1), [1, 2, 3, 4, 5]) -->* [2, 3, 4, 5, 6].

Before we apply the recursive design template, let's consider a property of map that ought to hold. After some thought, we might note that a simple property of map is that it should not change the number of elements in the list. Thus, we can consider the property:

Claim

For any list l and unary function f that takes elements of l as input, length(map(f, l)) ≡ length(l).

Now let's design the function, (map f l), with this property in mind. For this function, f is a function and l is a list.

When the input list l is empty, our property says that map should produce a list of length length([]) -->* 0. The only list of this sort is the empty list, so map should return an empty list in this case.
When is l is non-empty, the list has a head and tail, and we assume that length(map(f, tail)) ≡ length(tail). We know that length(tail) is precisely one less than the length of l. So, if we take the list generated by map(f, tail), we just need to extend it by one additional element. To obtain such an element, we can transform head by f and then cons that onto map(f, tail).

Note in our design how we leverage our induction hypothesis, length(map(f, tail)) ≡ length(tail), to steer the design of the function.

In this sense, we see how verification can directly inform program design. I argue that you've likely done this when writing your programs whether you were cognizant of it or not! However, now that we've seen the connection directly, you can begin to leverage this connection whenever you write programs!

Exercise (Chessing It Up, ‡)

As we move away from program correctness, we'll apply all our logical techniques to other domains. As a fun example of this, consider a classical induction problem: tours on a chessboard.

Chess is played on a $n \times n$ board (with $n = 8$ usually). There are a variety of pieces in chess, each of which move in a different way. For these claims, we will consider two pieces:

The rook which can move any number of squares in a cardinal (non-diagonal) direction.
The knight which moves in a L-shaped pattern: 2 squares horizontally, 1 square vertically or 2 squares vertically, 1 square horizontally.

Furthermore, we will consider two problems (really, thought experiments because these specific situations never arise in a real chess game) when only one such piece is on the board.

The walk is a sequence of moves for a single piece that causes the piece to visit every square of the board. It is ok if the piece visits the same square multiple times.
A tour is a walk with the additional property that the piece visits every square exactly once. In a tour, a piece cannot visit the same square multiple times.

When considering walks and tours, we are free to initially place our piece on the board at any position. In addition, we only consider a square visited if the piece ends its movement on that square. With these things in mind, use induction to prove the following fact:

Claim (Rook's Tours). There exists a rook's tour for any chessboard of size $n \geq 1$ .

(Hint: we need an object to perform induction over. What inductively defined structures are present in the claim?)

Lab: Beyond Program Correctness

In the reading today, we introduced the Rook's Tour problem as an example of inductive reasoning outside of program correctness:

Chess is played on a $n \times n$ board (with $n = 8$ usually). There are a variety of pieces in chess, each of which move in a different way. For these claims, we will consider two pieces:

The rook which can move any number of squares in a cardinal (non-diagonal) direction.
The knight which moves in a L-shaped pattern: 2 squares horizontally, 1 square vertically or 2 squares vertically, 1 square horizontally.

Furthermore, we will consider two problems (really, thought experiments because these specific situations never arise in a real chess game) when only one such piece is on the board.

The walk is a sequence of moves for a single piece that causes the piece to visit every square of the board. It is ok if the piece visits the same square multiple times.
A tour is a walk with the additional property that the piece visits every square exactly once. In a tour, a piece cannot visit the same square multiple times.

Claim (Rook's Tours). There exists a rook's tour for any chessboard of size $n \geq 1$ .

(Hint: we need an object to perform induction over. What inductively defined structures are present in the claim?)

We'll focus on the Rook's Tour for our lab. Likely, you came up with a solution that "works," but the proof was difficult! We will discuss this discrepancy in class and collaboratively develop a proof (and corresponding algorithm) that works with our inductive reasoning principle!

Additional problem: Knight's Walk

The other problem alluded to in the description is a Knight's Walk.

Claim (Knight's Walk). There exists a rook's tour for any chessboard of size $n \geq 4$ .

While seemingly more complicated because of how knight's move, this proof is a relatively straightforward induction. Feel free to give it a shot!

Demonstration Exercise 4

Problem 1: Set Equality

Prove the following common identity about sets:

Claim (Distribution of Union)

$S \cup (T \cap U) = (S \cup T) \cap (S \cup U)$

Problem 2: Partitions and Pivots, Revisited

Recall the definition of partitions and pivots from the lab:

Definition (Partition)

A partition of a set $T$ is a pair of subsets, $S_{1}$ and $S_{2}$ , of $T$ that obeys the following properties:

$S_{1} \cap S_{2} = \emptyset$ .
$S_{1} \cup S_{2} = T$ .

Claim (Pivots Determine Partitions)

Let $a \in S$ . Define $T_{1}$ and $T_{2}$ as follows:

$T_{1} = P (S - {a})$ .
$T_{2} = {B \cup {a} ∣ B \in P (S - {a})}$ .

$T_{1}$ and $T_{2}$ form a partition of $P (S)$ where $a$ is its pivot.

In lab, we explored this definition and proposition using examples. Additionally, we begun to show that this claim is correct. To do so, observe that we must show that the $T_{1}$ and $T_{2}$ defined in the claim form a partition. By the definition of partition, we must show that:

$T_{1} \cap T_{2} = \emptyset$ .
$T_{1} \cup T_{2} = P (S)$ .

In the latter case, you needed to show that sets $T_{1} \cup T_{2}$ and $T$ are subsets "in both directions," of which, you showed the right-to-left direction in class. First, prove the right-to-left direction:

Lemma (Right-to-left Direction)

Let $S$ be a set and let $a \in S$ . Define $T_{1}$ and $T_{2}$ as follows:

$T_{1} = P (S - {a})$ .
$T_{2} = {B \cup {a} ∣ B \in P (S - {a})}$ .

$T_{1} \cup T_{2} \subseteq P (S)$ .

Now, prove the first proposition:

Lemma (Emptiness of Intersection)

Let $S$ be a set and let $a \in S$ . Define $T_{1}$ and $T_{2}$ as follows:

$T_{1} = P (S - {a})$ .
$T_{2} = {B \cup {a} ∣ B \in P (S - {a})}$ .

$T_{1} \cap T_{2} = \emptyset$ .

Finally, put all three lemmas together---the lemmas from the lab and these two lemmas---to write a proof of the "Pivots Determine Partitions" claim.

(Hint: this final proof should be short since you already did all the work. You can simply cite these three lemma in your proof without replicating their steps!)

Sets and Their Operations

As computer scientists, we aim to use mathematics to model computational phenomena in order to

Precisely understand how the phenomena works (usually for the purposes of implementing that phenomena as a computer program) and
Abstract differences between seemingly unrelated phenomena to discover how they are related.

Very often, these computations involve collections of data, for example, a class management system might store a collection of students, or a server might need to track the set of computers it can directly communicate with on the network. Furthermore, these computations can frequently be phrased in terms of manipulations of or queries over these data, for example, generating pairs of students for a class activity, or finding if there exists a series of computers that connect the server to some other computer on the network.

To model data, we frequently resort to mathematical sets.

Definition (Set)

A set is a collection of distinct objects, i.e., there are no duplicates in a set.

Set theory is the branch of mathematics that studies how we construct and manipulate sets. Because virtually any mathematical model includes data of some sort---integers, real numbers, or more abstract objects---sets form the basis of formal mathematics. In addition to using sets directly in our models, we’ll also see sets in every other field of mathematics. In this unit, we briefly explore the field of set theory with an eye towards building up a working understanding of what set theory provides us and what questions we can answer by framing our problems in terms of sets.

Specifying Sets of Objects

When discussing sets, we must first define the universe under consideration.

Definition (Universe, Set Theory)

The universe or domain of discourse $U$ (LaTeX: \mathcal{U}) is the collection of elements that we may include in our sets.

As a concrete example, consider representing a collection of students at Grinnell using sets. We can, therefore, consider our universe to be all students at the college. We typically denote the universe under consideration by the variable $U$ , so we might say:

Let $U$ be the collection of all students at the college.

In many cases, the universe may be inferred by context, e.g., if our sets contain integers, then we can likely infer from context that $U$ is the collection of integers. However, regardless of whether $U$ is stated explicitly or inferred, we must keep it in mind as many of our operations over sets will require this knowledge.

With our universe defined, we may now define sets over this universe. For example, for simplicity's sake, suppose that the college only contains five students: Jessica, Sam, Phillip, Jordan, and Elise. We might specify a set $S_{1}$ containing the first three students as follows:

$S_{1} = {Jessica, Sam, Phillip}$

We say that Jessica, Sam, and Phillip are all elements of the set $S_{1}$ . The elements of the set are surrounded by curly braces. (Note that in LaTeX, you will need to escape the curly braces, i.e., \{ ... \}. You may also consider putting in an explicit thin space, \,, between the braces, i.e., \{\, ... \,\}.)

The primary query that we can ask of a set is whether a set contains a particular element. For example, Jessica is an element of $S_{1}$ or, more colloquially, we say that Jessica is in $S_{1}$ . We can think of set inclusion as a proposition, between a potential element of a set and a set.

Definition (Set Inclusion)

We say that value $x$ is in a set $S$ if $x$ , written $x \in S$ (LaTeX: \in), is an element of $S$ .

With this, we can write the inclusion relationship between Jessica and $S_{1}$ as follows:

$Jessica \in S_{1} .$

Likewise, we know that Elise is not in $S_{1}$ . Like how we write $(\neq =)$ (LaTeX: \neq) to say that two values are not equal, we write $\in /$ (LaTeX \notin) to say that an element is not in a set. For Elise, we would then write

$Elise \in / S_{1} .$

The order of the elements does not matter in a set, so the following set

$S_{2} = {Sam, Jessica, Phillip}$

is equivalent to $S_{1}$ because they contain exactly the same elements. As mentioned previously, sets only contain unique elements, so our sets cannot duplicates. From this definition, we cannot have a set that, for example, contains Jessica twice.

Set Comprehensions

An alternative to explicitly enumerating all the elements of a set, we can also specify a set by way of a set comprehension. For example, the following set:

$S_{3} = {s ∣ s \in U, s is a student whose name starts with "J"}$

is equivalent to the set ${Jessica, Jordan}$ .

A set comprehension is broken up into two parts separate by a pipe ( $∣$ ) (LaTeX: \mid). To the left of pipe is the output expression which normally involves one or more variables. To the right of the of the pipe is a collection of qualifiers that refine which elements are included in the set. There are two sorts of qualifiers we might include:

Generators describe what values the variables of the output expression take on. In the example above, we quantify that $s$ is drawn from our universe $U$ by stating $s \in U$ . This quantification is implicitly universal in nature: we really intend $\forall s . s \in U$ however, we traditionally do not include the extra verbiage of the $\forall$ symbol. (Note that we can think of our universe itself as a set!)

Predicates describe conditions that must hold of the values involved to include it in the set. For an element to be included in the set, it must satisfy all of the predicates of the comprehension.

By combining the generator and the predicate of $S_{3}$ , we see that $S_{3}$ will contain all the students of the college whose name starts with "J".

Note that we frequently elide the generator from our set comprehensions when it is clear from context what the variable ranges over. For example, the predicate already describes $s$ as a student, so it is unnecessary to express that it is also a member of $U$ . We can more concisely write $S_{3}$ as follows:

$S_{3} = {s ∣ s is a student whose name starts with "J"} .$

Set comprehensions are a powerful, flexible method for describing the members of a set. To illustrate this, let's consider another simple universe, the universe of natural numbers. The natural numbers consist of zero and the positive integers. First, let's define a set over this universe, e.g.,

$S_{4} = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10} .$

Now let's consider specifying some more complicated set comprehensions. For example, we may denote the sets of even and odd numbers in the range zero through ten as:

$S_{even} = S_{odd} = {n ∣ n \in S_{4}, n mod 2 = 0} {n ∣ n \in S_{4}, n mod 2 = 1} .$

In contrast, the following set comprehension contains a non-trivial expression:

$S_{5} = {2 n ∣ n \in S_{4}}$

For each $n \in U$ , $S_{5}$ contains the value $2 n$ . Thus, $S_{5} = {0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20}$ .

When a comprehension includes a single variable, the set ranges over all the elements from the variable is drawn from. When a comprehension includes multiple variables, the set ranges over all the possible combinations of values from which the variables are drawn. For example, consider the following set:

$S_{6} = {(n_{1}, n_{2}) ∣ n_{1} \in S_{even}, n_{2} \in S_{odd}}$

Is equivalent to the larger set:

$S_{6} = {(0, 1), (0, 3), (0, 5), (0, 7), (0, 9) (2, 1), (2, 3), (2, 5), (2, 7), (2, 9) (4, 1), (4, 3), (4, 5), (4, 7), (4, 9) (6, 1), (6, 3), (6, 5), (6, 7), (6, 9) (8, 1), (8, 3), (8, 5), (8, 7), (8, 9) (10, 1), (10, 3), (10, 5), (10, 7), (10, 9)} .$

First, let's consider the expression portion of the comprehension. The expression consists of a pair of elements $(n_{1}, n_{2})$ , so we expect the elements of $S_{6}$ to be pairs. Furthermore, $n_{1}$ and $n_{2}$ are drawn from the sets $S_{even}$ and $S_{odd}$ , respectively, so we expect the pairs to contain natural numbers.

In effect, by including two generators, we consider all possible pairs of elements drawn from the two generators. We can alternatively interpret how these set comprehension "compute" their elements through a series of nested for-loops. In effect, the comprehension computes:

result = []
for n_1 in S_even:
    for n_2 in S_odd:
        pair = (n1, n2)
        result.append(pair)

In general, when we have any collection of generators, we can think of them as a collection of nested for-loops where the body of the loop includes a new element in the set according to the output expression.

Standard Sets of Objects

Very often, our sets a drawn from a standard universes, usually over numbers. Rather than repeating the description of these sets, we denote them using the following variables written in blackboard font (LaTeX: \mathbb{...}):

$N = Z = Z^{+} = Z^{-} = Q = R = {0, 1, 2, \dots} {\dots, - 2, - 1, 0, 1, 2, \dots} {n ∣ n \in Z, n > 0} {n ∣ n \in Z, n < 0} {\dots, - 1, 0, 1, \frac{1}{2}, \dots} {\dots, - 3, 2.5, π, \dots} Natural Numbers Integers Positive Integers Negative Integers Rationals Reals$

Finite and Infinite Sets

In the situation where our set contains a finite number of elements, we denote the size of a set by $∣ - ∣$ , the set surrounded by pipes. For example, the size of $S_{6}$ , the set of pairs of even and odd elements drawn from the range zero through ten is: $∣ S_{6} ∣ = 30$ .

However, a set need not contain a finite number of elements. The universes of numbers we discussed previous all contain an infinite number of elements. However, we can also directly construct sets that are also infinite in size. For example, consider the set:

$S_{7} = {n ∣ n \in N, n mod 10 = 0} .$

$S_{7}$ contains all natural numbers that are a multiple of 10. There is clearly an infinite amount of such numbers, but this poses no problem for defining what the set $S_{7}$ contains. As we shall see, whether a set is finite or infinite does not change the behavior how the basic operations over sets that we consider next. However, infinite sets pose some significant conundrums for set theory that will briefly explore at the end of this chapter.

Reading Exercise (Descriptions)

Write down formal set comprehensions for each of the following descriptions of sets:

The set of all natural numbers that are either less than five or greater than 20.
The set of all pairs of integers such that the sum of the pair of numbers is equal to zero.
The set of all real numbers that are also positive integers.

Set Operations

With our basic definitions for sets—inclusion and set comprehension—we can define the fundamental operations over sets.

Union and Intersection

The union of two sets $S_{1}$ and $S_{2}$ , written $S_{1} \cup S_{2}$ , produces a set that contains all of the elements drawn from either of these sets. For example, if $S_{1} = {2, 3, 5}$ and $S_{2} = {3, 5, 9}$ then $S_{1} \cup S_{2} = {2, 3, 5, 9}$ (keeping in mind that duplicates are discarded with sets).

Definition (Set Union)

The union of two sets $S_{1}$ and $S_{2}$ , written $S_{1} \cup S_{2}$ (LaTeX: \cup) is defined as:

$S_{1} \cup S_{2} = {x ∣ x \in S_{1} \lor x \in S_{2}} .$

In contrast, the intersection of two sets $S_{1}$ and $S_{2}$ , written $S_{1} \cap S_{2}$ , produces a set that contains the elements that are found in both $S_{1}$ and $S_{2}$ . For example, if $S_{1} = {2, 3, 5}$ and $S_{2} = {3, 5, 9}$ then $S_{1} \cap S_{2} = {3, 5}$ .

Definition (Set Intersection)

The intersection of two sets $S_{1}$ and $S_{2}$ , written $S_{1} \cap S_{2}$ (LaTeX: \cap) is defined as:

$S_{1} \cap S_{2} = {x ∣ x \in S_{1} \land x \in S_{2}} .$

Note the parallels between the definitions of these set theoretic operations and the logical connectives we explored earlier:

Set union $(\cup)$ is defined in terms of logical disjunction $(\lor)$ .
Set intersection $(\cap)$ is defined in terms of logical conjunction $(\land)$ .

This is no coincidence! We can think of union and intersection as the set-theoretic realization of logical disjunction and conjunction, respectively. Because they are defined directly in terms of their logical counterparts, union and intersection behave similarly to them as well.

Difference and Complement

The difference of two sets $S_{1}$ and $S_{2}$ , written $S_{1} - S_{2}$ , produces a set that contains the elements of $S_{1}$ that are not also in $S_{2}$ . For example, if $S_{1} = {2, 3, 5}$ and $S_{2} = {3, 5, 9}$ then $S_{1} - S_{2} = {2}$ . Note that $3$ and $5$ are removed from the difference since they are in $S_{2}$ .

Definition (Set Difference)

The difference of two sets $s_{1}$ and $s_{2}$ , written $s_{1} - s_{2}$ is defined as:

$s_{1} - s_{2} = {x ∣ x \in s_{1} \land x \in / s_{2}} .$

The complement of a set $S$ , written $\overline{S}$ , is the set of elements that are not found in this set. Note that this requires knowledge of what our universe $U$ is in order to constrain what elements are not in the set in question. Say that our universe $U$ is over the finite set $U = {1, 2, 3, 4, 5}$ . Then if $S = {2, 3, 5}$ , then $\overline{S} = {1, 4}$ . In contrast, if we expand our universe to be the natural numbers, e.g., $U = N$ , then $\overline{S} = {x ∣ x \in N \land x \neq = 2 \land x \neq = 3 \land x \neq = 5}$ . Formally, we can write this as:

Definition (Set Complement)

The complement of a set $S$ , written $\overline{S}$ (LaTeX: \overline{...}) is defined as:

$\overline{S} = {x ∣ x \in / S} .$

Note that set complement is defined in terms of logical negation and is, thus, strongly identified with logical negation in the same way union and intersection identify with conjunction and disjunction. There is not a direct analog to set difference in logic, but we can see that difference can be written in terms of the other three operators via a direct translation of its formal definition:

$S_{1} - S_{2} \equiv S_{1} \cap \overline{S_{2}} .$

Exercise (Different Strokes)

Consider the following sets:

$S_{1} = {1, 3, 4, 6, 8} S_{2} = {3, 4, 5, 7, 9}$

Demonstrate the equivalence of set difference's definition with the equivalent formulation in terms of intersection and complement by (a) deriving $S_{1} - S_{2}$ in terms of the formal definition of set difference and (b) checking the equivalence on this example by "executing" $S_{1} \cap \overline{S_{2}}$ and observing that you obtain identical results.

Cartesian Product

The Cartesian product of two sets $S_{1}$ and $S_{2}$ , written $S_{1} \times S_{2}$ , is the set of all the possible pairs of elements drawn from $S_{1}$ and $S_{2}$ . The first element of these pairs is drawn from $S_{1}$ , and the second element of these pairs is drawn from $S_{2}$ . It is a bit easier to see how the Cartestian product works by using sets of abstract symbols rather than numbers. First, let's consider $S_{1} = {†, ‡, ⊞}$ and $S_{2} = {▷, ◁}$ . The Cartestian product of these two sets is:

$S_{1} \times S_{2} = {(†, ▷), (†, ◁), (‡, ▷), (‡, ◁), (⊞, ▷), (⊞, ◁)} .$

Note that each element of $S_{1}$ is paired with each possible element of $S_{2}$ . When writing out the Cartesian product, it’ll be useful to do so in a systematic, grid-like manner like above where each row corresponds to a choice of element from $S_{1}$ and each column corresponds to a choice of element from $S_{2}$ .

With this in mind, we can return to our universe of natural numbers and consider our canonical sample sets. For example, if $S_{1} = {2, 3, 5, 8}$ and $S_{2} = {3, 5, 9}$ then:

$S_{1} \times S_{2} = {(2, 3), (2, 5), (2, 9), (3, 3), (3, 5), (3, 9), (5, 3), (5, 5), (5, 9), (8, 3), (8, 5), (8, 9)} .$

Note here that elements shared in common between $S_{1}$ and $S_{2}$ are not discarded like with disjunction. This is because such elements always result in unique pairs in the resulting set.

Definition (Cartesian Product)

The Cartesian product of two sets $S_{1}$ and $S_{2}$ , written $S_{1} \times S_{2}$ (LaTeX: \times) is defined as:

$S_{1} \times S_{2} = {(x_{1}, x_{2}) ∣ x_{1} \in S_{1} \land x_{2} \in S_{2}} .$

Subsets and Set Equality

A set is a subset of another set if all the elements of the first set are contained in the second set. For example, if we have $S_{1} = {2, 5}$ and $S_{2} = {1, 2, 3, 4, 5}$ , then $S_{1}$ is a subset of $S_{2}$ , written $S_{1} \subseteq S_{2}$ . In contrast, $S_{2}$ is not a subset of $S_{1}$ , written $S_{2} \neq \subseteq S_{1}$ . Note that this is not an operation over sets (that produces another set) but, rather, a proposition over sets (that is potentially provable).

The basic proposition we can assert about a set $S$ is whether an element $x$ is found inside the set, written $x \in S$ . We can use this inclusion proposition to formally define subset:

Definition (Subset)

We say that $S_{1}$ is a subset of $S_{2}$ , written $S_{1} \subseteq S_{2}$ (LaTeX: \subseteq) if:

$S_{1} \subseteq S_{2} \Leftrightarrow \forall x . x \in S_{1} \to x \in S_{2} .$

In other words, if $S_{1}$ is a subset of $S_{2}$ then every element of $S_{1}$ must also be an element of $S_{2}$ .

A proper subset, written $S_{1} \subset S_{2}$ (LaTeX: \subset), is where $S_{1} \subseteq S_{2}$ but $S_{1} \neq = S_{2}$ . Note that when we say "subset", we will implicitly mean "subset-or-equal" and use the term "proper subset" to denote this case where we require that the two sets are also not equal.

If we know that $x$ and $y$ are numbers and $x \leq y$ and $y \leq x$ , we know that $x = y$ . This is because if $x$ and $y$ cannot both be less than each other; they must, therefore, be equal to each other. Likewise, if we know that $S_{1} \subseteq S_{2}$ and $S_{2} \subseteq S_{1}$ , we know that $S_{1}$ and $S_{2}$ must be equal.

This realization gives us an alternative definition of set equality in terms of subsets.

Definition (Set Equality)

We say that sets $S_{1}$ and $S_{2}$ are equal if and only if they are subsets of each other. In other words:

$S_{1} = S_{2} \Leftrightarrow S_{1} \subseteq S_{2} \land S_{2} \subseteq S_{1} .$

This definition, in turns, gives us a principle for reasoning about the equality of sets, the so-called double inclusion principle, that we will later use to prove that two sets are equal.

Power Set

Finally, the power set of a set $S$ is a set that contains all the subsets of $S$ .

Definition (Power Set)

The power set of a set $S$ , written $P (S)$ , is:

$P (S) = {T ∣ T \subseteq S}$

Note in our formal definition that $T$ implicitly is a set because it is, by definition, a subset of $S$ . This is not a problem! Sets can certainly contain other sets which allows us to create more complex structures.

As an example of the power set operation, let $S = {1, 2, 3, 4}$ . Then:

$P (S) = {\emptyset, {1}, {2}, {3}, {4} {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4} {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4} {1, 2, 3, 4}} .$

To make it easier to enumerate all the possible subsets of $S$ in a systematic way, we arrange them in order of size.

There is one subset of size zero, the set containing no elements, e.g., the empty set. We can write the empty set using set literal notation, ${}$ but we traditionally use the empty set symbol $\emptyset$ (LaTeX: \emptyset) to denote the empty set.
There are four subsets of size one corresponding to the singleton sets, each containing of the elements of $S$ .
There are six subsets of size two.
There are four subsets of size three.
Finally, there is a single subset of size four which is $S$ itself (because it has size four).

In this particular case, this results in 16 subsets overall. You can imagine that the number of such subsets grows dramatically as the size of the input set increases. We will revisit this point in our discussion of counting.

Problem (Execution, ‡)

Define the following universe and sets drawn from that universe:

$U = {1, 2, 3, 4, 5} S = {1, 3, 4, 5} T = {2, 4, 5} .$

Write down the contents of the resulting set operations use set literals:

$S \cap T$ .
$S \cup T$ .
$\overline{T}$ .
$(S - T) \times T$ .
$P (\overline{T})$ .

Set Inclusion Principles

Inclusion and Equality

A basic question we can ask about sets is whether one element is contained in a set. For example:

Claim

If $x \in A \cap B$ then $x \in A \cup B$ .

This claim posits that if an arbitrary element of the intersection of $A$ and $B$ , it is also in the union of $A$ and $B$ . Intuitively, we know this is true because the intersection of two sets contains elements that are in both sets whereas the union only demands that the elements are in at least one of the sets.

To formally prove this claim, we will work from our initial assumption that $x \in A \cap B$ and proceed forwards to our goal that $x \in A \cup B$ . To do so, we will utilize the formal definitions of our operators to justify each step of our reasoning explicitly. Here is a formal proof of the claim above.

Proof

Proof. We suppose that $x \in A$ and show that $x \in A \cup B$ . By the definition of $(\cup)$ , we must show that $x \in A \lor x \in B$ . However, we already know that $x \in A$ by assumption.

Alternatively, we can present the same proof using a two-column style where each row consists of a fact on the left-hand side and a rule on the right-hand side that justifies how the fact is derivable from the fact on the previous row.

Proof (two-column format)

$x \in A x \in A \cup B [assumption] [def. (\cup)]$

In $L A T E X$ , we can use the \begin{align*} ... \end{align*} environment to format the proof in this two-column style. For example, the following LaTeX code produces the above math prose:

\begin{align*}
x \in A        & & [\text{assumption}] \\
x \in A \cup B & & [\text{def. \$(\cup)\$}]
\end{align*}

Note that supposing that we have an arbitrary $x \in A$ is equivalent to proving a claim that is universally quantified (i.e., $\forall$ ), over that variable $x$ . Therefore this proof also shows that $A \subseteq A \cup B$ as the subset proposition is equivalent to:

$A \subseteq A \cup B \equiv \forall x . x \in A \to x \in A \cup B .$

Our natural deduction rules tells us that to prove this logical proposition, we:

Assume an arbitrary $x$ by the intro (∀) rule.
Assume that $x \in A$ by the intro (→) rule.
Go on to prove that $x \in A \cup B$ .

This is precisely how our proofs above proceeded! In our prose-based proof, we explicated this reasoning although did not cite natural deduction rules justifying the reasoning. At this higher level of proof, we don't cite rules of logic although we know that our reasoning is backed by them. Our symbolic, two-column proof avoids this verbiage, leaving the introductory steps of reasoning implicit so that we can focus on the important parts of the proof: the step-by-step manipulation of sets.

In practice, because we will prove membership of an arbitrary element of a set, we will usually state our claims in terms of subset relationships. For example, here is a similar claim and proof to our original one, but utilizing subset notation instead:

Proof

Claim: $A \cap B \subseteq B$

Proof: Let $x \in A \cap B$ . It suffices to show that if $x \in A \cap B$ then $x \in B$ . However, we know that $x \in A \lor x \in B$ by the definition of $(\cap)$ , allowing us to conclude that $x \in B$ .

Proving Set Inclusion Claims

In summary, when proving that a set $S$ is a subset of another set $T$ , we:

Assume that we have an arbitrary element $x$ of the set $S$ .
Give a proof that shows how we can logically reason step-by-step from this initial assumption to our final goal.
End our proof by showing that $x$ is an element of $T$ , thereby proving our claim.

In logic, this is called forwards reasoning because we are reasoning from our assumptions and axioms to our final goal. This contrasts with our program correctness and natural deduction proofs where we tended to work from our initial goal and generate new assumptions and refined goals from it, a process called backwards reasoning. Note that both forms of reasoning—from assumptions or from our goal—are valid and can be intermixed in a single proof. Ultimately, whether we operate in a forwards or backwards manner in our proofs is a function of context: the domain of the proof and the particular proof state that we are in.

As with all proofs, our proofs in set theory consist of assumptions and a goal. Our assumptions take on various forms:

Element inclusion, e.g., $x \in A$ .
Subsets, e.g., $S \in T$ .
Equality, e.g., $x = (y, z)$ or $S = T \cap V$ .

Like propositional logic, how we reason about our different set operations depends on whether the operation appears in an assumption (something we already know) or a goal (something we are trying to prove). As an assumption:

If we know $x \in S \cup T$ then $x$ is in either $S$ or $T$ .
If we know $x \in S \cap T$ then $x$ is in both $S$ and $T$ .
If we know that $x \in S - T$ then $x$ is in $S$ and not in $T$ .
If we know that $x \in \overline{S}$ then we know $x$ is not in $S$ .
If we know that $x \in S \times T$ then we know that $x = (s, t)$ where $s \in S$ and $t \in T$ .
If we know that $x \in P (S)$ then we know that $x \subseteq S$ .

All of these rules of inference follow directly from our formal definitions for our operations. Likewise, if these operations instead appear as our goal:

If we must show $x \in S \cup T$ then we must show $x$ is in either $S$ or $T$ .
If we must show $x \in S \cap T$ then we must show that $x$ is in both $S$ and $T$ .
If we must show $x \in S - T$ then we must show that $x$ is in $S$ and not in $T$ .
If we must show $x \in \overline{S}$ then we must show $x$ is not in $S$ .
If we must show $x \in S \times T$ then we must show that $x = (s, t)$ , $s \in S$ , and $t \in T$ .
If we must show $x \in P (S)$ then we show that $x \subseteq S$ .

To show these different rules in action, consider the following claim and proof over a more complicated subset relationship:

Proof

Claim: $A \times (B \cup C) \subseteq (A \times B) \cup (A \times C)$

Proof: Let $(x, y) \in A \times (B \cup C)$ with $x \in A$ and $y \in B \cup C$ . By the definition of $(\times)$ , it suffices to show that $(x, y) \in (A \times B) \cup (A \times C)$ . And by the definition of $(\cap)$ , we know that $y \in B \lor y \in C$ . Now consider whether $y \in B$ or $y \in C$ .

If $y \in B$ , then by the definition of $(\times)$ , $(x, y) \in A \times B$ and by the definition of $(\cup)$ , $(x, y) \in (A \times B) \cup (A \times C)$ .
If $y \in C$ , then by the definition of $(\times)$ , $(x, y) \in A \times C$ and by the definition of $(\cup)$ , $(x, y) \in (A \times B) \cup (A \times C)$ .

Note several things with this proof:

When we know that an element a member of a union, we can perform case analysis to refine which set that element comes from.
When we show that an element is in a Cartesian product, we must show that it is a pair and that each of the pair’s components come from the appropriate sets. Because the justification for these parts may not all come from the previous line of the proof, we state which of the lines these justifications come from.

Equality Proofs

Recall that we defined set equality in terms of subsets:

$S = T = def S \subseteq T \land T \subseteq S .$

Thus, to prove that two sets are equal, we need to perform two subset proofs, one in each direction. In the previous section, we proved that $A \times (B \cup C) \subseteq (A \times B) \cup (A \times C)$ . By showing that $(A \times B) \cup (A \times C) \subseteq A \times (B \cup C)$ , we can then conclude that the two sets are indeed equal. Here is a two-column proof of the right-to-left direction of the claim:

Proof

Claim: $(A \times B) \cup (A \times C) \subseteq A \times (B \cup C)$ .

Proof: Let $x \in (A \times B) \cup (A \times C)$ . Consider whether $x \in A \times B$ or $x \in A \times C$ .

Suppose $x \in A \times B$ .

$x \in (A \times B) x = (a, b), a \in A, b \in B b \in B \cup C (a, b) \in A \times (B \cup C) [assumption] [def. (\times)] [def. (\cup)] [def. (\times)] .$

Now suppose $x \in A \times C$ .

$x \in (A \times C) x = (a, c), a \in A, c \in C c \in B \cup C (a, c) \in A \times (B \cup C) [assumption] [def. (\times)] [def. (\cup)] [defn (\times)] .$

We call such equality proofs double-inclusion proofs. Double-inclusion or proving "both sides" of the equality is a powerful, alternative technique for showing that two objects are equal. While it is the primary way we show the equality of sets, we can also apply it to other "equality-like" operations. For example:

To show that two logical propositions are equivalent, $p \equiv q$ , we can show $p \to q$ and $q \to p$ .
To show that two numbers are equivalent, $x = y$ , we can show $x \leq y$ and $y \leq x$ .

Empty Set Proofs

Our proof techniques for set inclusion runs into a snag when we consider the empty set. For example, consider the following claim:

Claim

Claim: $A \cap \overline{A} = \emptyset$ .

Intuitively, $\overline{A}$ contains precisely the elements that are not in $A$ . Thus, we expect the intersection to be empty. To prove this equality, we must show that the left- and right-hand sides are subsets of each other. In one direction, this proof proceeds trivially:

Proof

Claim: $\emptyset \subseteq A \cap \overline{A}$

Proof. There is no such $x$ such that $x \in \emptyset$ , so the claim is vacuously true.

Recall that the definition of subset says that $A \subseteq B = def \forall x . x \in A \to x \in B$ . No elements are contained in $\emptyset$ by definition, so the logical proposition holds trivially, i.e., there are no $x$ that fulfill $x \in A$ .

In the other direction, we become stuck with our standard proof machinery:

Proof

Claim: $A \cap \overline{A} \subseteq \emptyset$

Proof. We assume that $x \in A \cap \overline{A}$ . We must show that $x \in \emptyset$ . But… .

We begin the proof by assuming $x \in A \cap \overline{A}$ . Note that we know this is not possible because the intersection should be empty, but this is precisely what we are trying to prove! However, we encounter a worse problem: our proof requires us to show that $x \in \emptyset$ and that is certainly impossible!

Because of this, we need an alternative proof strategy to prove set emptiness—that a set is equivalent to the empty set. The strategy we'll employ is our final fundamental proof technique, proof by contradiction.

To prove that a proposition $P$ holds using a proof by contradiction:

We assume $\neg P$ is provable.
We then show how this assumption allows us to prove a contradiction, i.e., $⊥$ or $Q \land \neg Q$ for some proposition $Q$ .
Because we cannot logically conclude a contradiction holds and our proof proceeds logically, the only thing that could have caused the contradiction was our initial assumption that $\neg P$ holds. Therefore, $\neg P$ must not hold and so $P$ must hold.

We will apply the technique of proof by contradiction to set emptiness proofs where we show that some set $S = \emptyset$ as follows:

First assume for the sake of contradiction that $x \in S$ .
Then we will show a contradiction. In the context of set theory, this usually means showing that some element $y$ (not necessarily $x$ ) is both in a set and not in a set, e.g., $y \in U \land y \in / U$ .
From this contradiction, we can conclude that our assumption that $x \in S$ is false and thus $x \in / S$ for all $x$ and thus $S = \emptyset$ .

Let us use this proof technique to show that our claim above holds directly without the use of subsets.

Proof

Claim: $A \cap \overline{A} = \emptyset$ .

Proof. We prove this claim by assuming that some $x \in A \cap \overline{A}$ and deriving a contradiction.

$x \in A \cap \overline{A} x \in A, x \in \overline{A} x \in / A [assumption] [def. (\cap)] [def. (\overline{\cdot})]$

$x \in A$ and $x \in / A$ cannot both be true. Our original assumption that there exists an $x \in A \cap \overline{A}$ must then be false and thus no such $x$ exists. Therefore, $A \cap \overline{A} = \emptyset$ .

Exercise (Arr, ‡)

Prove the left-to-right direction of DeMorgan's Law:

Claim: $\overline{A \cup B} \subseteq \overline{A} \cap \overline{B}$ .

Artificial Examples and Sets

Problem 1: Corners

Artificial examples are a particularly useful device for exploring the corner cases of a mathematical definition. While our intuition allows us to explain the "common" scenarios, we sometimes do not have real-world examples that exercise corner cases. Furthermore, sometimes the corner cases behave precisely again our intuition, leading us to make incorrect assumptions about what a mathematical definition says.

In contrast, artificial examples allow us to create situations that are small enough to analyze directly using the definitions involved so that we can obtain a crisp, definitions-based understanding of what is going on. In short, we create small sets built from abstract values, e.g., $a$ , $b$ , and $c$ , and then "run" our definitions on these sets and then observe the results. Ideally, the examples are constructed in such a way that we isolate our predicted behavior, so the example directly explains the situation. In programming, the analogy here is a minimally reproducing example or a "repro" that isolates buggy behavior in a program and is unlikely to produce other effects that might make analysis unnecessarily complicated.

For each of the following (intentionally vague) questions about the fundamental definitions of sets:

Create one or more artificial examples that help you answer the question.
Answer the question by generalizing your observations from your artificial examples. Explain your reasoning in a few sentences.

Is the following claim always true?

Claim: for any sets $S_{1}$ and $S_{2}$ , $∣ S_{1} \cup S_{2} ∣ = ∣ S_{1} ∣ + ∣ S_{2} ∣$ .

(Recall that $∣ S ∣$ is the size of $S$ , i.e., the number of elements it contains.)
Is the empty set, $\emptyset$ , the subset of any set? What about itself?
When performing subtraction over numbers, we obtain negative numbers. We have set subtraction, $A - B$ . Is there a resulting "negative" set we can obtain from set subtraction?
When performing multiplication of numbers, we know that multiplying any number by zero results in zero. The Cartesian Product, $A \times B$ , seems to behave similarly to numeric multiplication. Is there an analogous set in set theory for Cartesian Product?
Set subtraction and Cartesian Product are analogous to arithmetic subtraction and multiplication, respectively. Is there an arithmetic analog to the power set operation, $P (S)$ ? If so, what is it? In particular, what is $P (\emptyset)$ ?

Problem 2: Trickiness

Artificial examples are also useful for gaining intuition about tricky definitions. Here is an example of such a definition:

Definition (Partition)

A partition of a set $T$ is a pair of subsets, $S_{1}$ and $S_{2}$ , of $T$ that obeys the following properties:

$S_{1} \cap S_{2} = \emptyset$ .
$S_{1} \cup S_{2} = T$ .

Give an artificial example of a partition.
Instantiate that artificial example to a real-world example. Use real-world objects in place of your abstract values. Try to choose a real-world domain where your intuitive notion of a "partition" would be relevant.
Using your examples as a guide, describe at a high-level what each of the two conditions of a partition is saying in a sentence or two each.

Problem 3: The Other Side

In a previous reading exercise, we proved the left-to-right direction of De Morgan's Law. Go ahead and prove the right-to-left direction to demonstrate equivalence of the two set expressions.

Claim

$\overline{A} \cap \overline{B} \subseteq \overline{A \cup B}$ .

Problem 4: Pivoting to New Things

Now let's consider a proposition about this definition.

Claim (Pivots Determine Partitions)

Let $a \in S$ . Define $T_{1}, T_{2} \subseteq P (S)$ as follows:

$T_{1} = P (S - {a})$ .
$T_{2} = {B \cup {a} ∣ B \in P (S - {a})}$ .

$T_{1}$ and $T_{2}$ form a partition of $P (S)$ where $a$ is its pivot.

Give an artificial example of a set $S$ .
Identify a pivot element $a$ drawn from $S$ and list the contents of $T_{1}$ and $T_{2}$ for that choice of pivot.
Explain in a sentence or two why your concrete sets $T_{1}$ and $T_{2}$ form a partition according to this definition.

Now, to prove this claim, we must show that for an arbitrary set $S$ and choice of pivot $a$ that:

$T_{1} \cap T_{2} = \emptyset$ .
$T_{1} \cup T_{2} = P (S)$ .

(Note that our claim states that $T_{1}$ and $T_{2}$ are a partition for the power set of $S$ , not necessarily $S$ itself.) We'll focus on the second proposition for this lab. Recall that, as a set equality, the proposition really consists of two subset proofs due to double inclusion:

$T_{1} \cup T_{2} \subseteq P (S)$ , the left-to-right or "if" direction.
$P (S) \subseteq T_{1} \cup T_{2}$ , the right-to-left or "only if" direction.

You will prove the left-to-right direction in the demonstration exercise for this week. As a warm-up for this, you will now prove the right-to-left direction to wrap up this lab!

Lemma (Left-to-right Direction)

$P (S) \subseteq T_{1} \cup T_{2}$

This proof is a bit trickier than the other ones we've seen so far, primarily because of $P (S)$ . Here are some hints to help guide your proof development:

For this set inclusion proof, note that the only thing you can conclude from your initial premise that $X \in P (S)$ is that $X \subseteq S$ . To make progress you need to perform case analysis on the fact that the pivot $a$ is either in $X$ or it is not in $X$ . From there, you can choose which of $T_{1}$ or $T_{2}$ ought to be a member of and then proceed forward.
Note that since $X$ is a set, it'll be difficult to reason about $X$ 's relationship to $T_{1}$ and $T_{2}$ . To work around this, recall the definition of subset: $X \subseteq Y \equiv \forall x \in X . x \in X \to x \in Y$ . At this point in the proof you should consider an arbitrary $x \in X$ and try to show that $x$ is also in $T_{1}$ or $T_{2}$ . This will then imply that $X$ is a subset of the appropriate partition since the $x$ was arbitrary!

Problem 5: Double the Practice Makes Perfect (Optional)

Formally prove the following standard identities of sets. Recall that to prove a set equality, you must prove both directions of the equality.

(Absorption) $A \cup (A \cap B) = A$
(Distribution of Difference) $(A - B) - C = A - (B \cup C)$ .

Proof by Contradiction

So far, we have looked exclusively at proofs by construction where our goal is to construct an object, e.g., an evaluation trace that provides evidence that a proposition holds. However, in some cases, it is not feasible to construct such evidence directly. This is particularly true when we to prove that an object does not obey some property of interest. Indeed, as we've seen from our study of mathematical logic, reasoning about the negation of a property can be subtly tricky!

In these situations, we can employ a different proof technique, proof by contradiction, to show the proposition of interest. Proving a proposition $P$ by contradiction proceeds as follows:

For the sake of contradiction, assume that $\neg P$ , i.e., the negation of the goal proposition, is true.
From this assumption, derive additional facts until we can exhibit a logical contradiction.
If we are able to derive a contradiction, we know that it must be because we assumed $\neg P$ holds. Thus, it must be the case that $\neg P$ does not hold and therefore $P$ holds instead.

Note that in terms of our formal natural deduction rules, this final step invokes the law of the excluded middle:

Definition (Law of the Excluded Middle)

For any proposition $P$ , exactly one of $P$ or $\neg P$ is provable.

We saw this briefly when we reasoned about set inclusion proofs involving the empty set. However, reasoning via contradiction is pervasive throughout mathematics. A classical example of contradiction proof is showing that $2$ is irrational. Recall that the definition of a rational number.

Definition (Rational Number)

A number $n$ is considered rational whenever there exists numbers $a$ and $b$ such that $n = \frac{a}{b}$ .

In other words, a number is rational if it can be expressed as a fraction.

In contrast, an irrational number is precisely a number that is not rational! By "negating" the definition of rational number, we see that we must show that there is no way to decompose the number into a fraction. But that means that we have to show that every possible fractional decomposition does not work, a much harder task than exhibiting a single fractional decomposition!

Instead of this route, we can use a proof by contradiction. We will assume that $2$ is irrational and then follow our nose until we arrive at a contradiction.

Take the time to read and scrutinize this classic proof! It is especially important to understand the details of a proof by contradiction because one misstep or unstated assumption can lead to a contradiction that may not be real.

Proof (Irrationality of square root of two)

Claim: $2$ is irrational.

Proof. Assume for the sake of contradiction that $2$ is rational. By the definition of rational, there exists $a$ and $b$ such that $2 = \frac{a}{b}$ . Furthermore, assume that $\frac{a}{b}$ is simplified, i.e., $a$ and $b$ have no common factors. Now consider the following algebraic manipulation:

$2 = b 2 = (b 2)^{2} = 2 b^{2} = \frac{a}{b} a a^{2} a^{2}$

Observe that $a^{2}$ must be even because it is divisible by 2 (precisely because it is equal to $2 b^{2}$ ). Because the square of an even number is also even, $a$ must be even as well. Therefore, $a = 2 m$ for some integer $m$ . Substituting back for $a$ yields:

$2 b^{2} = 2 b^{2} = b^{2} = (2 m)^{2} 4 m^{2} 2 m^{2}$

Now observe that $b^{2}$ must be even as well because it is divisible by 2 as well and thus $b$ is even. However, we have now established that both $a$ and $b$ are even. This is a contradiction because we assumed they had no factors in common, but, in fact, they do---the common factor is 2.

Thus, our original assumption that $2$ is rational is incorrect; $2$ must be irrational, instead.

Reading Exercises

Check 1: Empty Set Inclusion (‡)

Prove the following set equality using a proof by contradiction:

Claim

For any sets $S$ and $T$ , $S - (T \cup S) = \emptyset$ .

Demonstration Exercise 5

Problem: Representatives: Real-World Edition

In lab, you developed a number of artificial examples of different variants of functions. In this problem, you'll develop real-world examples of these variants. In any programming language you'd like, write functions (one per each variant) that are:

A partial function.
An injective function.
A surjective function.
A left-unique partial function.
A bijection.

Recall that a mathematical function is defined entirely by the values it takes as input and produce as output. In particular, throwing an error is not "outputting" a value!

Additionally, you may need to specify pre-conditions of the types of the input to your function, e.g., restricting something of type int to be a natural number. If you do so, please mention such pre-conditions in a comment in your code.

Problem: A-mazing

One surprising application of equivalence classes is maze generation. Initially, we might start with a completely walled off maze except for distinguished entry and exit points.

┌─┬─┬─┬─┬─┐
├─┼─┼─┼─┤  
├─┼─┼─┼─┼─┤
├─┼─┼─┼─┼─┤
├─┼─┼─┼─┼─┤
└ ┴─┴─┴─┴─┘

The entry for this maze is in the bottom-left corner and the exit is in the top-right. (Note that this choice was arbitrary; we could have flipped entry and exit!)

We can then punch out walls at random:

┌─┬─┬─┬─┬─┐   ┌─┬─┬─┬─┬─┐   ┌─┬─┬─┬─┬─┐
├─┼─┼─┼─┤     ├─┼─┼─┼ ┤     ├─┼ ┼─┼ ┤  
├─┼─┼─┼─┼─┤   ├─┼─┼─┼─┼─┤   ├─┼─┼─┼─┼─┤
├─┼─┼─┼─┼─┤ → ├─┼─┼─┼─┼─┤ → ├─┼─┼─┼─┼─┤ → ⋯
├─┼─┼─┼─┼─┤   ├─┼─┼─┼─┼─┤   ├─┼─┼─┼─┼─┤
└ ┴─┴─┴─┴─┘   └ ┴─┴─┴─┴─┘   └ ┴─┴─┴─┴─┘

Until we have a completed maze:

┌───────┬─┐
├── ┌─┐ │
│ ──┴─┤ │ │
│ ┌── │ └ ┤
├─┴── ┼ ╶─┤
└ ────┴───┘

An algorithm for generating a maze proceeds simply: keep punching out walls until the maze is complete. But what does complete mean? In this problem, we'll use the theory of relations to crystallize the notion of maze completeness.

As a starting point, let's consider labeling each of the rooms induced by the walls of our original maze:

a b c d e
f g h i j
k l m n o
p q r s t

With these labels, we observe that $p$ is the room containing the entrance to the maze and $j$ contains the exit.

First, formally define our universe $U$ and a binary relation $R$ that captures the notion of a maze and the connection between rooms in a maze.
Prove that $R$ is an equivalence.
Use $R$ to give a formal definition for what it means to be a maze to be complete. (Hint: we need to relate the rooms containing the entrance and exit in some fashion!)
Argue in a few sentences why the algorithm for generating a maze:

Keep punching out walls randomly until the maze is complete.

Always generates a maze. Use your formal definition of completeness in your argument.

Problem: The Cyclotron

A cycle in a relation $R$ is a sequence of $k$ distinct elements $x_{1}, \dots, x_{k}$ that are related in sequence and the sequence ends with $x_{k}$ and $x_{1}$ being related, i.e.,

$(x_{1}, x_{2}), (x_{2}, x_{3}), \dots, (x_{k - 1}, x_{k}), (x_{k}, x_{1}) \in R .$

A cycle is considered non-trivial if $k \geq 2$ .

Give an artificial example equivalence relation and a non-trivial cycle in that equivalence relation.
Give a real-world example of an equivalence relation and an example of a non-trivial cycle in that relation.
Recall from the reading that a partial ordering is a reflexive, anti-symmetric, and transitive relation. We'll show that a partial ordering contains no non-trival cycles. To prove this claim, we'll instead show that for any natural number $k$ , there does not exist any cycles of size $k$ :

Claim (Cycles and Partial Orders): Let $R$ be a partial order and $k \geq 2$ be a natural number. Then $R$ possess no non-trivial cycles of size $k$ .

To prove this claim, proceed by strong induction on the length $k$ of the cycle under consideration. Since we are proving the non-existence of an object (cycles in this case), you will need to use a proof by contradiction within each case!

(Hint: when proving this fact by contradiction, you will assume the existence of a cycle of a certain size. Use the properties of a partial order to infer additional relations between the elements of the cycle until a contradiction arises.)

Functions and Relations

Previously, we explored the mathematical formalism of sets. Sets allow us to model collections of data. However, we frequently wish to capture relevant relationships in our data. Structured data is called as such because the data in the collection is related in some way. For example:

An element precedes another element in a list.
A person is friends with another person in a social network.
One number is a divisor of another number.

To model structured data, we need some way of modeling these relationships between individual datum. In this chapter, we use sets to develop the theory of relations which will allow us to formally reason about these relationships.

Definitions and Notation

Intuitively, a relation relates two objects by some property as defined by the relation. To capture this correspondence we combine pairs with sets:

Definition (Relation)

A relation $R$ over a universe $U$ is a subset of pairs of elements drawn from $U$ , i.e., $R \in P (U \times U)$ .

Suppose we have a relation $R$ . An element of $R$ , $(a, b) \in R$ denotes that $a$ and $b$ are related by $R$ , which we can express in different ways:

Notation	Form
$(a, b) \in R$	Set notation
$R (a, b)$	Function notation
$a R b$	Infix notation

For example, let our universe $U = {"Mary", "Miguel", "Li", "Lana"}$ . Then we can define a relation $owes : U \times U$ that captures whether one person owes another money. With this relation, the following expressions:

$("Miguel", "Li") \in owes$ .
$owes ("Miguel", "Li")$ .
$"Miguel" owes "Li"$ .

All posit that Miguel owes Li money.

Note that because the order of a pair matters, these expressions do not automatically assert that Li owes Miguel money. We would need to include this fact separately: $("Li", "Miguel") \in owes$ .

Operations Over Relations

Because we define relations in terms of sets, we can define our fundamental operations over relations using set-theoretic notation.

Domain and Range

First, we can project out the left-hand and right-hand elements of a relation, typically called the domain and range, respectively.

Definition (Domain)

Let $R$ be a relation. Define the domain of $R$ , written $dom (R)$ , as:

$dom (R) = {x ∣ \exists y . (x, y) \in R} .$

Definition (Range)

Let $R$ be a relation. Define the range of $R$ as:

$range (R) = {y ∣ \exists x . (x, y) \in R} .$

Alternatively, the range may also be called the codomain of the relation.

Cardinality

Frequently, we may want to have the domain and range of a relation come from disparate sets $S$ and $T$ . This is no problem for our definition of relation; we can simply define the universe to be the union of $S$ and $T$ . Then our pairs are drawn from this union where the domain is always an element of $S$ and the range is always an element of $T$ .

In this case, we can define a relation between sets $S$ and natural numbers $n$ where $n$ is the number of elements in $S$ . We commonly call this the cardinality of a set. This is normally written $∣ S ∣ = n$ , but to align with our relation notation, we can write $card (S, n)$ to denote this fact. For example:

$({1, 2, 3}, 3) \in card$ .
$({a}, 10) \in / card$ .
$(\emptyset, 0) \in card$ .

Lifted Operations

Because relations are sets, we can lift any binary operation over sets to relations. As examples, let $R$ and $S$ be two relations. Then define the following lifted operations over sets to relations as:

$R \cup S R \cap S \overline{R} = {(a, b) ∣ (a, b) \in R \lor (a, b) \in S} = {(a, b) ∣ (a, b) \in R \land (a, b) \in S} = {(a, b) ∣ (a, b) \in / R}$

Definition (Relational Union)

As a practical example of applying set-theoretic operations to relations, consider using relations to map items in a store to their stock, i.e., a relation whose domain is (abstract) objects and the codomain is natural numbers. We might have two different stores, with their own separate stocks of disparate items:

$R_{1} = {(a, 2), (b, 0), (c, 3)}$ .
$R_{2} = {(d, 1), (e, 5), (f, 0)}$ .

Then $R_{1} \cup R_{2}$ might represent joining together the stocks into a single store:

$R_{1} \cup R_{2} = {(a, 2), (b, 0), (c, 3), (d, 1), (e, 5), (f, 0)}$ .

Transformations

Beyond lifted operations, we can also define several fundamental transformations over relations.

Definition (Inverse)

Let $R$ be a relation. Define the inverse of $R$ , written $R^{- 1}$ , as:

$R^{- 1} = {(b, a) ∣ (a, b) \in R}$

Definition (Composition)

Let $R$ and $S$ be relations. Define the composition of $R$ and $S$ , written $S \circ R$ (LaTeX: \circ) as:

$S \circ R = {(a, c) ∣ (a, b) \in R, (b, c) \in S}$

Note that with composition that we "run the relation" from right-to-left, first through $R$ and then through $S$ .

Definition (Image)

Let $R$ be a relation. Define the image of an element $a$ , written $R (a)$ , as:

$R (a) = {b ∣ (a, b) \in R}$

This final transformation is particularly useful when talking about functions which (we will discover shortly) are a special case of relations. In particular, note that if we have a function $f$ , then both the definition and notation of image coincides with "run the function", $f (x)$ .

Function-like Relations

Functions form the heart of computation within mathematics. Consider the following partially specified relation:

$R_{1} = {(0, 1), (1, 2), (2, 3), (3, 4), \dots}$

From inspection, you would rightfully conclude that $R$ relates a natural number to the number one greater than it, i.e., $R$ is the increment function. We can see that the left-hand element of a pair represents an input to the function and the right-hand element of a pair is its corresponding output.

Based on this example, it may feel like functions and relations are the same. However, not all relations are functions. For example, consider the following relation:

$R_{2} = {(0, 1), (0, 2), (0, 3)} .$

If we think of $R_{2}$ as a function, what is the result of $R_{2} (0)$ ? It appears there are three choices— $1$ , $2$ , and $3$ ! This does not align with our intuition about how a function works where a single input to a function should generate a single output.

In actuality, functions can be thought of as a special case of relations. Next, we'll develop the definitions necessary to classify certain relations as functions. These definitions will help us understand better the nature of functions as well as leverage the functions-as-relations view in our own mathematical models.

Note

This next section is light on exposition by intention! You should employ the strategies we've discussed in the course to understand and internalize these definitions. Create small example sets that exhibit each of these definitions and try to understand the essence of the definitions by generalizing the structure of the examples.

Totality and Uniqueness

The two main properties that separate functions from other relations are totality and uniqueness. Because functions distinguish between inputs and outputs in a non-symmetric fashion, totality and uniqueness can apply either to the inputs of the function (the "left") or the outputs of the function (the "right").

Totality

Totality concerns whether all the elements in the universe of some relation appear in the relation.

Definition (left totality)

A relation $R$ is left-total if all elements are related by $R$ on the left

$\forall x . \exists y . (x, y) \in R .$

Definition (right totality)

A relation $R$ is right-total if all elements are related by $R$ on the right:

$\forall y . \exists x . (x, y) \in R .$

Uniqueness

Uniqueness concerns whether an element is related to a single other element. The way that we express this property formally is that if an element is mapped to two elements, those two elements are in fact the same.

Definition (left-unique)

A relation $R$ is left-unique if every element in the relation on the left-hand side is mapped to a unique element right.

$\forall x, y, z . (x, y) \in R \to (z, y) \in R \to x = z .$

Definition (right-unique)

A relation $R$ is right-unique if every element in the relation on the right-hand side is mapped to a unique element on the left.

$\forall x, y, z . (x, y) \in R \to (x, z) \in R \to y = z .$

With totality and uniqueness defined, we can define particular refinements relations in terms of these properties.

Definition (partial function)

A relation is a partial function if it is right-unique.

Definition (function)

A relation is a function if it is both right-unique and left-total.

To better distinguish from partial functions, we also call right-unique and left-total relations total functions. Note that a total function is one that is well-defined, i.e., "has an answer" every possible input. In contrast, a partial function may be undefined on some inputs; this corresponds to the non-existence of a pair mentioning the undefined element on the left-hand side.

Definition (injectivity)

Definition (injectivity): a relation is an injective function if it is a function (right-unique and left-total) as well as left-unique.

Definition (surjectivity)

Definition (surjectivity): A relation is a surjective function if it is a function (right-unique and left-total) as well as right-total.

Definition (bijection)

Definition (bijection): A relation is a bijection if it is a function (right-unique and left-total) as well as injective and surjective (left-unique and right-total).

Reading Exercise (Definitions, ‡)

Consider the following relation $R$ over $U = {a, b, c, d, e}$ :

$R = {(a, c), (b, c), (c, c), (d, c), (e, c)} .$

Does the relation fulfill each of the given properties? If so, you can simply say "yes". If not, give a single sentence explaining why not.

Left-total
Right-total
Left-unique
Right-unique
Partial function
Function
Injective function
Surjective function
Bijection

A Plethora of Definitions

In this lab, we'll be exploring the variety of definitions found in today's reading that span relations and functions. These definitions test our ability to (a) read definitions made from logical propositions and (b) explore definitions using artificial and real-world examples.

Problem 1: Relatable

Consider the following artificial set:

$S = {a, b, c, d, e, f}$

As well as the following relation $R$ over $S$ :

$R = {(a, b), (a, c), (b, d), (c, c), (d, b), (e, b), (e, f)}$

Compute the following operations over $R$ :
- $dom (R)$ .
- $range (R)$ .
- $R^{- 1}$ .
- $R (a)$ .
Instantiation this artificial example to a real-life example. That is, give real-life meaning to the set $S$ and the relation captured by $R$ . Describe this real-life example in a sentence or two.
For each of the operations from part (a), interpret what the operation is "computing" in terms of your real-life example.

Problem 2: Representatives

Uniqueness and totality are deceptively complex definitions. In this problem, we'll explore them using artificial and real-world examples.

Give artificial examples of the following relations:
- A partial function.
- An injective function.
- A surjective function.
- A left-unique partial function.
- A bijection.
Your examples should possess exactly the properties implied by the definitions and no more. For example, your injective function should not also be a bijection. This way, you can use your artificial example to understand precisely what it means for a function be injective.
The notion of partial and (total) functions is an important concept in computer science! Note that the notion of output in a mathematical function corresponds to the function returning a value. Other kinds of ways that a function can produce some kind of observable behavior, e.g., mutating variables, throwing an exception, or printing to the console, do not count as output.

Give an example of a function in a programming language of your choice (e.g., Racket) that is:
- A partial function.
- A surjective function.

Problem 3: Applications to Programming

Here is a common piece of advice about function design couched in terms of the language of relations:

We should heavily favor designing total rather than partial functions whenever possible.

Keep in mind that when we talk about programming language functions, we only count as output the values that a function returns.

In a sentence or two, clarify what we mean by total versus partial programming language functions.
In a few sentences, describe why we might favor total versus partial functions by appealing to the definitions of relations we've talked about today.

Problem 4: Compression

Imagine that you are writing a file-compressing program ala zip. We can think of this program as a pair of functions that (a) takes a file as input and produces a (presumably) compressed version of the file as output and (b) takes a compressed version of the file and produces the original file as output.

When viewed as functions, we clearly want these functions to be total, i.e., we want to be able to compress any file. However, what about the other properties of functions?

First suppose that the functions in question were surjective but not injective. In a few sentences, describe whether such a compression program would be a good compression program.
Now suppose that the functions are injective but not surjective. In a few sentences, describe whether the resulting compression program would be a good compression program.

Problem 5: Why We Don't Talk About Floats

In CSC 161, learn that IEEE floating-point numbers (i.e., the float and double types) are approximations of real numbers in a computer system. Because they are approximations, computations over floats are prone to imprecision. To account for this, we typically say two floats $x$ and $y$ are equal whenever the difference between the two is no larger than some threshold value, call it $ϵ$ (LaTeX: \epsilon). To model floats, we will use numbers drawn from the reals, and so we can define equality between floats, written $≐$ , as:

$(≐) = {(x, y) ∣ x, y \in R, ∣ y - x ∣ < ϵ} .$

In this definition, we can think of $ϵ$ as a pre-defined global constant.

Is $(≐)$ reflexive? If so, prove this fact formally. If not, give a counterexample demonstrating that the claim does not hold.
Is $(≐)$ symmetric? If so, prove this fact formally. If not, give a counterexample demonstrating that the claim does not hold.
Is $(≐)$ transitive? If so, prove this fact formally. If not, give a counterexample demonstrating that the claim does not hold.
Is $(≐)$ an equivalence relation? Briefly explain your answer using your results from the previous parts.
In a few sentences, describe the implications of your answer to part (d) to computer programming. In particular, in what situations do your results from (d) arise and potentially introduce hard-to-find bugs in a program?

Problem 6: Closure

Recall that the closure of a set under a property $P$ of a relation $R$ is the largest relation that satisfies $P$ and contains $R$ . We can compute the closure of any relation under a property by repeatedly applying the property to generate new pairs to add to the relation until we can no longer add new pairs.

Consider the following universe $U$ and relation $R : U \times U$ :

$U = R = {a, b, c, d, e} {(a, b), (b, d), (c, e), (e, d)}$

Compute the:
- The reflexive closure of $R$ .
- The transitive closure of $R$ .
- The equivalence closure of $R$ .
Now consider $U$ to be the set of all Racket programs and $R$ to be the relation:

${(e_{1}, e_{2}) ∣ e_{1} ⟶ e_{2}} .$

In other words, $R$ is the single-step relation, $e_{1} ⟶ e_{2}$ , of our Racket model of computation where $e_{1}$ takes a single step of evaluation to $e_{2}$ . For example (* 3 (+ 4 5)) $⟶$ (* 3 9). Answer the following questions regarding $R$ in a sentence or two each:
- Is $R$ reflexive? Why?
- Is $R$ symmetric? Why?
- What does the transitive closure of $R$ represent? (Hint: if you iterate the single-step relation on an expression repeatedly, what do you obtain?)

Equivalences and Orderings

There are a number of special kinds of relations that are ubiquitous in mathematics. We have already studied functions-as-relations. Now we will explore two other kinds of common relations:

The equivalence, which captures the notion of equality between objects in a universe
The ordering, which captures the notion of, literally just that, ordering between objects.

Equivalences

Like functions, equivalences are a refinement of relations. In particular, a relation that enjoys these three properties, reflexivity, symmetry, and transitivity, is considered an equivalence.

Definition (Reflexivity)

A relation $R$ is reflexive if it relates every element in the universe to itself.

$\forall x . (x, x) \in R$

Definition (Symmetry)

A relation $R$ is symmetric if any pair of related elements are also related "in the opposite direction."

$\forall x, y . (x, y) \in R \to (y, x) \in R$

Definition (Transitivity)

A relation $R$ is transitive if whenever any pair of elements are related with a common element in the middle, the first and last elements are also related.

$\forall x, y, z . (x, y) \in R \to (y, z) \in R \to (x, z) \in R$

These three concepts form the definition of an equivalence relation.

Definition (Equivalence)

A relation an equivalence if it is reflexive, symmetric, and transitive.

The standard equality relation $(=)$ over the natural numbers $N$ is an equivalence relation as it fulfills all three properties of an equivalence:

Reflexive : Identical numbers are considered equal.

Symmetric : Order doesn't matter when asserting equality between numbers.

Transitive : When declaring $x = y$ and $y = z$ , we know that these two facts establish that $x$ and $y$ are the same number and $z$ and $y$ are the same number. From this, we can conclude that $x$ and $z$ must also be the same number.

Reasoning About Equivalences

To formally show that a relation is an equivalence, we must show that it obeys the three properties of an equivalence: reflexivity, symmetry, and transitivity. We show the outline of such a proof using the following real-world example, arithmetic expressions, e.g., $3 + 5$ or $3 \cdot (2 - 1)$ .

Let $(\equiv)$ be the following relation:

$(\equiv) = {(e_{1}, e_{2}) ∣ e_{1} and e_{2} are arithmetic expressions that evaluate to the same value v}$

Claim: $(\equiv)$ is an equivalence relation.

To show that $(\equiv)$ is an equivalence relation, we must show that it is reflexive, symmetric, and transitive.

Proof

Proof. We show that $(\equiv)$ is a reflexive, symmetric, and transitive relation:

Reflexive : Because an arithmetic expression evaluates to a unique value, it must be the case that $\forall e . e \equiv e$

Symmetric : Let $e_{1}, e_{2} \in U$ and assume that $e_{1} \equiv e_{2}$ . By the definition of $(\equiv)$ , since the pair of expressions is related, they must evaluate to the same value $v$ . Because of this fact and the definition of $(\equiv)$ , we know that the pair is related in the other direction, i.e., $e_{2} \equiv e_{1}$ .

Transitive : Let $e_{1}, e_{2}, e_{3} \in U$ and assume that $e_{1} \equiv e_{2}$ and $e_{2} \equiv e_{3}$ . By the definition of $R$ , this means that $e_{1}$ and $e_{2}$ evaluate to the same value, call it $v_{1}$ and $e_{2}$ and $e_{3}$ evaluate evaluate to the same value, call it $v_{2}$ . However, we know that an arithmetic expression evaluates to a unique value, so it must be the case that $v_{1}$ and $v_{2}$ are identical since they are both the result from evaluating $e_{2}$ . This means that $e_{1} \equiv e_{3}$ as well.

Equivalence Closures

The closure of a set $S$ under a property $P$ is the (smallest) set $S^{*} \subseteq S$ whose elements all satisfy $P$ . The concept of closure lifts to relations in the expected way. For example, let $U = {0,., 10}$ and $P$ be the property of symmetry $\forall x, y . (x, y) \in R \to (y, x) \in R$ , then if $R$ is the relation:

$R = {(0, 3), (2, 5), (6, 9), (5, 2)},$

Then the symmetric closure of $R$ is the relation $R^{*}$ :

$S = {(0, 3), (3, 0), (2, 5), (5, 2), (6, 9), (9, 6)} .$

We can compute the closure of any relation under a property by repeatedly applying the property to generate new pairs to add to the relation until we can no longer add new pairs.

We can apply the notion of closure to all the properties of an equivalence relation simultaneously to form an equivalence closure of a set of elements. Intuitively, the equivalence closure of a set of elements captures all the different equalities induced by the properties of equivalences.

For example, consider an artificial set $S = {a, b, c}$ and suppose we know that some relation $R$ relates the elements as follows:

$(a, b) \in R, (b, c) \in R .$

If we furthermore know that $R$ ought to be an equivalence relation, then we can compute the equivalence closure of $R$ as follows:

The reflexive closure of the relation relates every element to itself: $(a, a), (b, b), (c, c) \in R$ .
The symmetric closure of the relation relates every pair "in the other direction": $(b, a), (c, b) \in R$ .
The transitive closure of the relation connects every transitive pair of elements: $(a, c) \in R$ .
Finally, we also have to consider the symmetric closure again for this new pair: $(c, a) \in R$ .

In this particular case, the equivalence closure of $R$ is all nine possible pairs of $S$ with itself, i.e., $S \times S$ . This case captures the intuition that the two original equalities are sufficient to deduce that all the elements of $S$ are equal.

Exercise (Closure, ‡)

Consider the following relation $R$ over universe $U = {a, b, c, d, e, f}$ :

$R = {(a, b), (c, e), (d, b), (f, e)} .$

Compute the equivalence closure of $R$ .

Equivalence Classes

Intuitively, an equivalence relation captures some notion of equality between objects. We can then think about grouping together sets of mutually equal objects. For example, let's return to arithmetic expressions. The following expressions are all equivalent to each other:

$10$ .
$5 + 5$ .
$2 \times (2 + 3)$ .

Because they all evaluate to $10$ . Consider creating a set of such expressions, call it $S_{4}$ with the property that they all evaluate to $4$ :

$S_{4} = {e ∣ e is an arithmetic expression that evaluates to 4} .$

Any pair of expressions within $S_{4}$ are equivalent to each other. We call such a set an equivalence class.

Definition (Equivalence Classes)

An equivalence class of an equivalence relation $R$ over universe $U$ is a set $S$ of elements drawn from $U$ that are pairwise equivalent according to $R$ , i.e.,

$\forall x, y \in S . (x, y) \in R .$

Recall that $x mod y$ is the whole-number remainder of $x \div y$ . Because of the nature of division, the result of $x mod y$ takes on the values $0, \dots, y - 1$ . Because of this, we can consider, e.g., the equivalences classes induced by taking a number and modding it by 3. $x mod 3$ takes on three values, $0$ , $1$ , and $2$ , and thus, $mod 3$ induces three equivalence classes:

$E_{1} = {0, 3, 6, 9, \dots}$ .
$E_{2} = {1, 4, 7, 10, \dots}$ .
$E_{3} = {2, 5, 8, 11, \dots}$ .

In the context of the modulus operator, we can say that any pair of numbers in an equivalence class are equivalent modulo 3, e.g., $4$ and $11$ are equivalent modulo $3$ .

Orderings

There are many ways that we might order data. For a collection of strings, for example, we might order by:

Simple length. For example, "dog" is less than "doghouse" since it is shorter.
Lexicographical, i.e., dictionary order, comparing letter-by-letter. For example, "alphabet" is less than "zoo" in lexicographical ordering.
A more arbitrary measure such as the number of consecutive trailing constants in the word. For example, "correctness" (-ss) comes before "algorithms" (-thms).

What is the essential nature of these relationships that make it a valid "ordering?" Let's start with the properties from an equivalence—reflexivity, symmetry, and transitivity—and use them In other words, which of the three properties of an equivalence—reflexivity, symmetry, and transitivity—are necessary for a relation to be considered an ordering? Let's consider numeric ordering $(\leq)$ as our quintessential example of orderings and each of these properties in turn:

Reflexivity: $(\leq)$ appears to be reflexive because any number is equal to itself!
Symmetry: $(\leq)$ appears to not be symmetric. For example, $3 \leq 5$ but $5 \neq \leq 3$ .
Transitivity: $(\leq)$ also appears to be transitive. If we establish that $x \leq y$ and $y \leq z$ , we form the following chain of comparisons $x \leq y \leq z$ with the understanding that this notation implies that $x \leq z$ , too.

So it seems like an ordering is reflexive, transitive, but not symmetric! However, it seems like we need to say something stronger than "not symmetric." To see this, observe that we never want different numbers to be related to each other in both directions. In other words, it should never be the case that $x \leq y$ and $y \leq x$ for different $x$ and $y$ ! We call this property anti-symmetry:

Definition (Anti-symmetry)

A relation $R$ is anti-symmetric if any pair of elements are related, at most, in one direction:

$\forall x, y . (x, y) \in R \to (y, x) \in R \to x = y .$

Observe how we have to define this "zero-or-one" property in the same way we do with uniqueness. Intuitively, the formal definition of anti-symmetry says that whenever two elements are related in both directions, they must be the same element.

With the definition of anti-symmetry, we can now formally define a partial ordering:

Definition (Partial Ordering)

A relation $R$ is a partial ordering if it is reflexive, anti-symmetric, and transitive.

As an example of a partial ordering, consider a hierarchy of employees where an employee has a manager, their manager has a manager, and so forth. We say that one employee is higher in the hierarchy than another if the first employee is the direct or indirect manager (i.e., manager of their manager, manager of their manager's manager, etc.) of the second. We can consider the following notion of employee equality:

$(=_{e}) = {(e_{1}, e_{2}) ∣ e_{1} and e_{2} have the same position}$

Along with an ordering based on the employee hierarchy:

$(\leq_{e}) = {(e_{1}, e_{2}) ∣ e_{1} =_{e} e_{2} or e_{2} is higher than e_{1} in the company hierarchy.}$

We can show that this relation obeys the properties of a partial ordering:

Proof

Claim: $(\leq_{e})$ is a partial ordering.

Proof. We show that $(\leq_{e})$ is reflexive, anti-symmetric, and transitive.

: Reflexive. For any employee $e$ , $e \leq_{e} e$ since they have the same position.

: Anti-symmetric. Suppose we have two employees $e_{1}$ and $e_{2}$ and we know $e_{1} \leq_{e} e_{2}$ and $e_{2} \leq e_{1}$ . From the definition of $(\leq_{e})$ , either it is the case that $e_{1} = e_{2}$ or $e_{1}$ and $e_{2}$ are mutually higher in the company hierarchy than the other. The former case is impossible because this would imply there is a cycle in the hierarchy, but that would violate the definition of a hierarchy. Thus, it must be the case that $e_{1} = e_{2}$ .

: Transitivity: Suppose we have three employees $e_{1}$ , $e_{2}$ , and $e_{3}$ and $e_{1} \leq_{e} e_{2}$ and $e_{2} \leq_{e} e_{3}$ . Since our definition of "higher-up" in the hierarchy includes both direct and indirect managerial relationships, since $e_{2}$ is higher-up than $e_{1}$ and $e_{3}$ is higher-up than $e_{2}$ , then $e_{3}$ is also higher up than $e_{1}$ . In other words, $e_{3}$ is either identical to $e_{1}$ or is at least one level of manager above $e_{1}$ .

Note that two employees that don't have a direct manager in common are incomparable by this definition because they don't appear in each other's chain of managers to the top of the hierarchy. This is why partial orders are named as such: some employees are incomparable to each other under this relation. To obtain this property, we add it directly to our definition, giving us a total ordering:

Definition (Total Ordering)

A relation $R$ is a total ordering if it is a partial ordering with the additional property that any pair of elements is related in either direction:

$\forall x, y . R (x, y) \lor R (y, x)$

Exercise (Orderings, ‡)

Consider the set $U = {a, b, c, d, e}$ and the following relation $R$ over this set:

$R = {(a, a), (b, b), (c, c), (d, d), (e, e), (a, b), (a, c), (b, d), (c, d), (d, e) (a, d), (a, e), (b, e), (c, e)}$

Prove that $R$ is a partial order.
Is $R$ a total order? If so, prove it. If not, provide a counterexample demonstrating that some pair of elements in $U$ is incomparable.

Demonstration Exercise 6

In this demonstration exercise, you'll encounter yet another area of graph theory, flow networks, and gain experience in exploring mathematical definitions with examples.

Problem 1: Exploring Flow Networks

An important specialization of graphs are flow networks:

Definition (Flow Network)

A flow network is a directed graph $G = (V, E)$ coupled with a capacity function $c : E \to R^{+}$ . $c$ describes the maximum amount of flow across a particular edge.

The prototypical real-life example of a flow network is a network of pipes that carry water. Each edge corresponds to a pipe and the pipes are connected at joints, represented by vertices. Flow in this context is literal water flow where the capacity represents how much water each pipe can hold.

In this formulation, we represent the flow through a network as a separate flow function:

Definition (Flow)

Given a flow network $G = (V, E)$ , a flow function $f$ is a function $f : E \to R$ that describes the amount of flow traveling over a particular edge.

Finally, it is common when discussing flow networks to distinguish two nodes, a source and sink node, $s, t \in V$ , respectively. We think of them as the entry and exit points of flow in our network, whatever that means for the domain in question. We then define the flow value of the network as the flow going into the sink (and thus, out of the network!):

Definition (Flow Value)

Given a flow network $G = (V, E)$ with flow function $f$ and source and sink vertices $s, t \in V$ , respectively, the flow value $∣ f ∣$ of the flow function is the amount of flow that goes into the sink:

$v \in V \sum f (v t)$

Give two additional real-world examples of flow networks, explicitly describing what each of the components of the network:

Vertices $V$ ,
Edges $E$ ,
The capacity function $c$ ,
The flow function $f$ ,
The source $s$ and sink $t$ , and
The flow value $∣ f ∣$ .

Represent in each example.

Problem 2: Feasible Flows

Next, come up with an artificial example of a complete flow network with at least five vertices. Make sure to formally specify all the components of the network!

Note that to avoid cluttering our definitions, we make several assumptions about the shape of a flow network:

We assume (but do not specify) symmetric edges, i.e., if $(u, v) \in E$ then $(v, u) \in E$ .
If an edge $(u, v) \in / E$ then we assume it is present (again, with explicit specifying this fact). However, we also assume that $f (uv) = c (uv) = 0$ .

Using your artificial example, explore these three critical conditions under which a flow function is considered valid or feasible. All three conditions are specified in terms of a flow network with components as described above:

Condition 1

$\forall u, v \in V . f (u, v) \leq c (u, v)$ .

Condition 2

$\forall u, v \in V . f (u, v) = - f (v, u)$ .

Condition 3

$\forall v \in V - {s, t} . \sum_{(u, v) \in E} f (u, v) = \sum_{(u, v) \in E} f (v, u) .$

If your artificial example does not meet these conditions, change it so it does!

For each condition:

Describe in a sentence or two what the condition is enforcing about flow in a flow network.
For each of your real-world examples from the previous part, describe what the condition means in the concrete context of the example. (Note: the real-world instantiation of each condition may be quite similar to your general description. That is fine!)

Problem 3: The max flow-min cut theorem

Recall from the reading that a cut of a graph is a partition of the graph into two sets of vertices:

Definition (Cut)

Let $G = (V, E)$ be a graph. A cut of the graph is a partition of its vertices $(S, V - S)$ with $S \subseteq V$ . A non-trivial cut is one where $S \neq = \emptyset$ and $S \neq = V$ .

In the context of flow networks, we define a $s$ - $t$ cut to be a cut where $s$ is in one partition and $t$ is in the other.

A critical theorem of flow networks is the max-flow min-cut theorem:

Max-flow min-cut theorem

For any flow network with source and sink vertices $s$ and $t$ , the maximum flow value $∣ f ∣$ among all feasible flow functions is equal to the minimum capacity over all possible $s$ - $t$ cuts.

The capacity of a $s$ - $t$ cut is simply the sum of the capacities the edges that connect vertices across the two partitions induced by the cut.

Create an artificial example of a flow network and show that this theorem holds of your example. Make sure to identify precisely:
- The feasible flow function that produces the maximum flow value.
- The $s$ - $t$ cut that produces the minimum capacity. You may either use your artificial example from the previous section or a new example.
Go back to your real-world examples and for each example, describe in a sentence or two what the max-flow min-cut theorem means for that example's particular context. Try to avoid using the terms "flow" and "cut" in your descriptions.
Finally, in a few sentences, describe why this theorem holds. You do not need to prove this claim formally, but your explanation should appeal directly to nature of a cut and what it does to the capacity and/or flow of the network.

Introduction to Graphs

Recall that a relation models a relationship between objects. Of particular note are binary relations which capture relationships between pairs of objects drawn from the same universe. For example, we might consider:

An ordering relationship $(\leq)$ between members of $N$ .
A friendship relationship between friends.
A connectedness relationship between cities by roads or available flights.
A transition relationship between states in an abstract machine. For example, an idealized traffic light is such an abstract machine where the traffic light switches between "green", "yellow", and "red."

Binary relationships are ubiquitous in mathematics and computer science, and they all have a similar structure: a relation $R : U \times U$ . Can we exploit this structure to talk about all these sorts of relationships in a uniform manner? Is there a set of universal definitions and properties that apply to binary relations?

This is precisely the study of graph theory, the next area of discrete mathematics we'll examine in this course. Graph theory is really the study of binary relations, although we more commonly think of a graph as a visual object with nodes and edges.

Basic Definitions

Consider the following abstract binary relation over universe $U = {a, b, c, d, e, f}$ .

$R = {(a, b), (b, c), (c, d), (d, e)} .$

A graph allows us to visualize these relationships. Here is an example of a such graph for this relation:

A graph for relation R

We call the elements $a, \dots, f$ vertices or nodes of the graph. For each related pair of elements, we draw a line called an edge in our graph.

While our graph is simply a graphical representation of our binary relation, we traditionally represent a graph using a slightly different structure. We say that the graph above is $G = (V, E)$ . That is, graph $G$ is a pair of sets:

$V$ is the set of vertices in the graph. Here $V = {a, b, c, d, e, f}$ .
$E$ is the set of edges in the graph. Here $E = {(a, b), (b, c), (c, d), (d, e)}$ .

Definition (Graph)

A graph $G$ is a pair of sets:

$V$ , the set of vertices or nodes of the graph.
$E : V \times V$ , the set of edges of the graph.

Because we talk about edges so much, we frequently write the edge $(a, b) \in E$ as $ab \in E$ , i.e., we drop the pair notation and simply write the vertices together.

Exercise (Sketchin')

Draw the graph $G = (V, E)$ where:

$V = {a, b, c, d, e, f, g}$ .
$E = {a g, b g, c g, d g, e g, f g}$ .

Variations

The fundamental definition of a graph is a simple riff on a binary relation. We call such graphs simple graphs. However, there exists several variations of graphs that accommodate the wide range of scenarios we might find ourselves in.

Directed versus Undirected Graphs

Because individual relationships are encoded as pairs, the order matters between vertices. For example, the pair $(a, b)$ is distinct from the pair $(b, a)$ . In a directed graph or digraph, we acknowledge this fact and distinguish between the two orderings.

For example, consider the following graph $G = (V, E)$ with

$V = {a, b, c, d}$ .
$E = {ab, b c, c d, d c, d a}$ .

If we consider this graph directed, we would draw it as follows:

A directed graph

Note that the edges are directed edges where the direction is indicated by an arrowhead. If we were to have two vertices be mutually related, i.e., related in both directions, we need two edges, one for each direction. For example, $c$ and $d$ are mutually related, so we connect them with two edges $c d$ and $d c$ .

In contrast, we can consider $G$ to be undirected where we do not distinguish between the two orderings. Effectively, this means relations are unordered sets rather ordered pairs, but in terms of notation, we still keep $E : V \times V$ . If we consider $G$ to be undirected, we would draw it as follows:

A undirected graph

Here, the edges are undirected, i.e., without arrowheads. Effectively, we treat a single edge pair as relating symmetrically by default, so the edge $ab$ implies that $a$ is related to $b$ and $b$ is related to $a$ . Because of this, we should not include symmetric pairs in our set of edges. So we should define $E$ for the above graph as $E = {ab, b c, c d, d a}$ where we removed the symmetric pair $d c$ .

When should we employ a directed versus undirected graph? We should employ a directed graph where it is not assumed that our relation is symmetric for every pair of related vertices. For example, a "loves"-style relationship where $a$ loves $b$ is not inherently symmetric since $b$ might not love $a$ . A directed graph allows us to represent this distinction. A directed graph can always represent an undirected graph by explicitly including symmetric edges. Therefore, we can think of an undirected graph as a shortcut where we can avoid writing extras edges if we know that our relation is already symmetric. For example, a "friends"-style relationship is symmetric because $a$ being friends with $b$ implies that $b$ is also $a$ 's friend.

Self-loops

Like symmetry, we may or may not take reflexivity of a relation for granted. If we do not take this for granted, i.e., some elements are reflexively related but not all of them are, then we might consider introducing self-loops into a graph. For example, consider the following digraph $G = (V, E)$ with $E = {ab, b c, c a, aa, cc}$ .

A graph with self-loops

In this graph, $a$ and $c$ are related to themselves, but not $b$ .

Weights and Multi-graphs

Edges encode relations between objects in a graph. We can also carry additional information on these edges dependent on context. Most commonly, we will add numeric weights to our edges, e.g., to capture the distance between cities, or the cost of moving from one state to another. Both directed and undirected graphs can be weighted. As an example, consider the digraph $G = (V, E)$ with $E = {ab, b c, c a, c d}$ .

A weighted graph

We annotate the edges with a weight whose interpretation depends on context. For example, we can see that the edge $c a$ has weight 5. We represent the weights on our graph formally with an additional structure, a function $W : E \to Z$ , that maps edges to weight value. The codomain of $W$ can be whatever type is appropriate for the problem at hand; here we choose integers ( $Z$ ). For the above graph, we would define our weight function as:

$W (ab) = W (b c) = W (c a) = W (c d) = 315 - 2$

We can also extend our graphs further by extending $E$ to be a multiset, a set that tracks duplicate elements. This allows us to express the idea of multiple edges, e.g., with different weights according to $W$ .

Simple Graphs Revisited

Now that we have introduced various variations on a graph, we can finally come back and formally define a simple graph as a graph with no such variations.

Definition (Simple Graph)

A simple graph is an undirected, unweighted graph with no self-loops.

In closing, we have many variations of a graph that we might consider. In successive readings, we'll consider various analyses over graphs and problems we might try to solve. The beauty of graph theory is that because graphs are so general, by defining and solving problems in terms of graphs, we can apply our solutions to a whole host of problems!

Exercise (What's That?, ‡)

Consider the following formal definition of an abstract graph $G = (V, E)$ with:

$V = {a, b, c, d, e} E = {(a, b), (a, c), (a, d), (a, e), (c, d), (c, e), (b, e)}$

Draw $G$ .
Instantiate this abstract graph to a real-life scenario. Describe what objects the vertices $V$ represent and what relationship between objects is captured by $E$ .
Observe that $c$ , $d$ , and $e$ are mutually connected in this graph, i.e., each vertex has an edge to the other. Interpret the fact that they are mutually connected in your real-life scenario. Is the fact that they are mutually connected have special meaning in the scenario you envisioned?

Trees

Frequently the relations we draw between objects are hierarchical in nature. That is the objects have a parent-to-child relationship, for example:

A literal parent and their children.
A manager and the employees that report to them.
A folder and the files it contains.

We represent these relationships with a specialized kind of graph called a tree.

Definition (Tree)

A tree is an undirected graph that contains no cycles.

Here is an artificial example of a tree with five nodes, $a$ -- $f$ :

An example abstract tree with five nodes

We distinguish a vertex of the tree as its root. Here we'll consider $a$ to be the root of the tree although any of the vertices could be considered the root. By convention, we draw trees "upside down" with the root at the top and the tree growing downwards.

The root allows us to categorize the vertices of the tree by their distance from the root. We call a collection of vertices that are the same distance away from the root a level.

Definition (Level (Tree))

Let $T = (V, E)$ be a tree with a distinguished root $r$ . Define the $i$ th level of a tree, denoted $L_{i}$ to be the set of vertices that are $i$ nodes away from $r$ :

$L_{i} = {v ∣ v \in V and there exists a path of length i from r to v} .$

Definition (Height)

The height at tree is the maximum level of any of its vertices.

In our above example:

$L_{0} = L_{1} = L_{2} = {a} {b, c} {d, e, f}$

And the tree has height 2. Note that $L_{k}$ for any $k > 2$ is $\emptyset$ since there are no nodes greater than 2 away from $A$ .

With levels defined, we can now formally define the parent-child relationship that characterizes trees:

Definition (Parent)

The parent of a vertex $v$ at level $i$ of a tree is the node $u$ for which the edge $(u, v)$ is in the tree and $u$ is at level $i - 1$ .

Definition (Children)

The children of a vertex $u$ at level $i$ of a tree are the nodes $V$ for which each $v \in V$ , the edge $(u, v)$ is in the tree and $v$ is at level $i + 1$ .

Because a tree contains no cycles and the tree is rooted at a particular node, it follows that every vertex of a tree except the root has exactly one parent. (This is a worthwhile claim to prove yourself for practice!)

We can categorize trees by the maximal number of children any single node possesses. We call this value the tree's fan-out:

Definition (Fan-out)

The fan-out of a tree $k$ is the maximum number of children that any one vertex of the tree possesses. We call such a tree a $k$ -ary tree or a $k$ -tree for short. Notably a $1$ -tree is a sequence or a list, and a $2$ -tree is a binary tree.

Finally, we've restricted ourselves to connected trees, trees in which all its vertices are mutually reachable. If a graph is unconnected, but all of its connected components themselves are trees, then we call the graph a forest:

Definition (Connected Components)

Call an undirected graph $G = (V, E)$ connected if there exists a path between every pair of vertices in $G$ . A connected component $G^{'}$ of $G$ is a sub-graph of $G$ that is, itself, connected.

Definition (Forest)

A graph is a forest if all of its connected components are trees.

Directed Acyclic Graphs

We generally assume that a tree is an undirected graph. However, we can apply the same concepts to a directed graph. This results in a kind of graph called a directed acyclic graph (DAG):

Definition (Directed Acylic Graph)

A directed acyclic graph (DAG) is a directed graph that contains no cycles.

DAGs are outside the scope of our discussion of the basics of graphs, but be aware that DAGs have their own interesting properties and operations distinct from trees!

Depth-First Tree Traversals

Previously, we learned about depth-first and breadth-first traversals for graphs. Breadth-first traversals remain the same: we traverse the vertices of the tree by order of increasing level. However, the specialized nature of trees lets us specify different sorts of depth-first traversals, in particular, for binary trees where every node possesses at most two children. In such a tree, we call one child the left child and the other child the right child.

With this in mind, we can describe a recursive algorithm for depth-first search specialized to binary trees. In the Python-like code below, we use dot-notation to denote children, e.g., v.l for v's left child and v.r for v's right child.

def preDFS(u):
    visit(u)
    preDFS(u.l)
    preDFS(u.r)

Note that we visit the node u before visiting its children. Performing this traversal on our example tree from the beginning of this section yields the sequence:

$a, b, d, e, c, f .$

This kind of depth-first traversal of the tree is called a pre-order traversal of the tree. We first visit the current element and then visit its children.

In contrast, a post-order traversal of the tree visits the children first and then the current node last.

def postDFS(u):
    postDFS(u.l)
    postDFS(u.r)
    visit(u)

A post-order traversal of the graph yields the following sequence:

$d, e, b, f, c, a .$

Finally, we can intermix visiting children and visiting the current node with an in-order traversal:

def inDFS(u):
    inDFS(u.l)
    inDFS(u.r)
    visit(u)

An in-order traversal yields the following sequence:

$d, b, e, a, c, f .$

Graph Problems

Because graphs describe such a wide variety of phenomena, it is not surprising that there are many graph problems we might consider. A comprehensive study of graph theory and algorithms is beyond the scope of this course, but it is important as a working computer programmer to be aware of these different problems. Being able to see a graph problem lurking within the problem you are trying to solve usually leads to a quick and efficient solution!

In this lab, we'll explore several of the most common graph problems. You'll employ artificial and real-world examples to understand the practical nature of the formal definitions involved with the problem. And then, you'll use that built-up intuition to explore a problem within this area of graph theory.

Problem 1: Bipartiteness

Definition (Bipartite Graphs)

Let $G = (V, E)$ be a simple graph. We call $G$ bipartite if there exists a partition of the vertices of $V$ into two sets, $V_{1}$ and $V_{2}$ such that $V_{1} \cap V_{2} = \emptyset$ and for any edge $(u, v) \in E$ , $u \in V_{1}$ and $v \in V_{2}$ .

Create two artificial examples of graphs, one that is bipartite and one that is not bipartite.
Instantiate your artificial examples to a real-world example where the notion of bipartiteness would be useful. Describe what the vertices and edges of your graph represent and describe how to interpret the notion of bipartiteness in this context.
Consider the following additional definition:

Definition (Perfect Matching): Let $G = (V, E)$ be a simple graph. A perfecting matching $M \subseteq E$ of $G$ is a subset of the edges of $E$ such that every vertex of $V$ is incident (i.e., mentioned) in exactly one edge of $M$ .

Give a perfect matching, if there exists one, in your positive artificial example above, and in a sentence or two, describe what a perfecting matching means in your real-world scenario.

Problem 2: Cliques

Definition (Cliques)

Let $G = (V, E)$ be a simple graph. A clique is a subset of vertices $V^{'} \subseteq V$ such that for any pair of vertices $u, v \in V^{'}$ that $(u, v) \in E$ . Call a $k$ -clique a clique that contains $k$ vertices.

Create two artificial examples of graphs, one that contains a $4$ -clique and one that does not contain a $4$ -clique, i.e., a clique of size $4$ .
Instantiate your artificial examples to a real-world example where the notion of a clique would be useful. Describe what the vertices and edges of your graph represent and describe how to interpret the notion of clique in this context.
Consider the following additional definition:

Definition (Complete Graph): $G = (V, E)$ is a complete graph if for every pair of vertices $u, v \in V$ , $(u, v) \in E$ .

In a few sentences, describe what the relationship is between a complete graph and a clique and describe what this interpretation means in the context of the real-world example you gave previously.

Problem 3: Coloring

Definition (Coloring)

Let $G = (V, E)$ be a simple graph and $C$ be a set of colors. A coloring of graph $G$ is a function $h : V \to C$ such that for any pair of vertices $u, v \in V$ , if $(u, v) \in E$ , then $h (u) \neq = h (v)$ . We call a graph $k$ -colorable if it has a coloring with a set of $k$ colors.

Create two artificial examples of graphs, one that contains a $3$ -coloring and one that does not contain a $3$ -coloring, i.e., a coloring of size $3$ .
The famous four color theorem says the following:

Theorem (The Four Color Theorem): any map can be colored with at most four colors.

By "map" in this claim, we mean a geographic map, e.g., a map of the United States, or a map of the counties in Iowa. Test this theorem out by drawing 2--3 simple maps with, e.g., 5--6 regions and giving them a 4-coloring.
For each of your maps, give a corresponding graph that represents that map. What do nodes and edges represent in the graph?
Draw a graph that does not contain a 4-coloring and attempt to translate it into a physical map based on your answer to the previous part. In a sentence or two, what about that graph makes it so that you cannot translate it into a geographic map?

(Hint: for a simple non-4-colorable graph, look back to problem 2 and the notion of completeness. Using completeness, try to sketch out the smallest possible graph that certainly does not contain a 4-coloring because it has "too many edges.")

Spanning Trees

Consider a graph $G = (V, E)$ . A spanning tree of $G$ is a subgraph $G^{'} = (V^{'}, E^{'})$ (with $V^{'} \subseteq V$ and $E^{'} \subseteq E$ ) that is a tree (i.e., contains no cycles) that covers every vertex of $G$ . That is $V^{'} = V$ . For example, consider the following graph:

A graph of six nodes

A spanning tree for the graph is given below:

A spanning tree of the previous graph

Spanning trees are not necessarily unique. For example, here is a different spanning tree for the graph.

An alternative spanning tree of the previous graph

Constructing a spanning tree for a graph is useful in many situations, for example:

Consider an electrical network for a neighborhood where nodes represents houses and edges represent potential electrical connections between houses. A spanning tree in this context represents a minimal set of electrical connections used to connect all of the houses to the power grid.
Consider a collection of networked computers where nodes represent computers and edges represent physical connections between computers. A spanning tree in this context represents a minimal collection of connections that allow one machine to communicate with another without the fear of encountering a routing loop where a message is relayed between a collection of machines in perpetuity.

Exercise (Sizes of Spanning Trees)

Say that we have a graph $G = (V, E)$ . What is the number of edges of any spanning tree of $G$ ? Prove your claim.

Constructing Spanning Trees

To construct a spanning tree, we can employ either traversal algorithm we have discussed for graphs---breadth-first or depth-first search---to reach every vertex from some arbitrary starting vertex. When processing a vertex $v$ , we add to our working set each vertex $v^{'}$ connected to $v$ (i.e., $(v, v^{'}) \in E$ ) that we haven't already seen. If we are performing breadth-first search, we treat the working set like a queue, processing the oldest entry in the working set first. If we are performing depth-first search, we treat the working set like a stack, processing the newest entry in the working set first.

For example, the first spanning tree we discussed previously is the result of a breadth-first traversal, namely:

$A, B, D, C, E, F .$

The second spanning tree is the result of a depth-first traversal, namely:

$A, D, C, B, F, E .$

Exercise (Runtime of Spanning Tree Construction)

Prove that for a graph $G = (V, E)$ , we must process exactly $V - 1$ vertices when constructing the spanning tree, irrespective of the traversal used to generate the tree.

Minimum Spanning Trees

Now, let's consider extending our graph edges with weights. Each weight represents a "cost" associated with that edge, for example:

Distances if the edges represent connections between physical places.
Monetary amounts if the edges represent a transformation from one kind of object to another.

Let:

$weight (E) = (v_{1}, v_{2}, w) \in E \sum w$

Be the sum of the weights of all the edges in $E$ .

With a weighted graph, we can refine our notion of spanning tree. We can now consider minimum spanning trees (MST), a spanning tree with minimal cost. Formally defined:

Definition (Minimum Spanning Tree)

Let $G = (V, E)$ be a weighted graph. Then a minimum spanning tree, $T_{1} = (V_{1}, E_{1})$ , be a spanning tree of $G$ such that for any other spanning tree $T_{2} = (V_{2}, E_{2})$ of $G$ , it is the case that $weight (E_{1}) \leq weight (E_{2})$ .

In our above examples, we could attach an appropriate cost to each graph:

Weights in the electrical network represent physical distance between houses.
Weights in the computer network represent the average time it takes for two computers to communicate with each other.

Minimum spanning trees for each of these examples then also minimize the values of these trees---physical distances and average communication time, respectively---in addition to "spanning" the graphs.

Algorithms for Minimum Spanning Trees

Note that our current methods for constructing minimum spanning trees are agnostic to the weights of the edges. Even though such algorithms choose a minimal number of edges, this doesn't guarantee that the weight of the resulting spanning tree is minimized. For example, consider the following weighted graph:

An example weighted graph

A BFS traversal of the graph starting at $A$ produces the following weighted graph:

A spanning tree generated from a BFS of the previous graph

Its weight is $3 + 1 + 15 + 5 + 8 + 4 = 36$ but it is not minimal. The following minimum spanning tree is minimal for our graph:

A minimum spanning tree of the previous graph

The weight of this MST is $3 + 1 + 1 + 3 + 5 + 4 = 17$ . We clearly need another method of calculating a MST that takes into account (a) the vertices we have yet to explore and (b) the weights of the chosen edges.

There are several algorithms that we could consider. Here, we will consider Prim's Algorithm which we will present as a modification of breadth-first search for this setting. To gain some intuition about how to proceed, let's see how naive breadth-first search failed to produce the MST and then modify the algorithm to obtain the desired result.

BST producing a spanning tree of the previous graph

Initially, we begin our BFS at $A$ and then add $B$ . We can think of $A$ and $B$ and the edge $A B$ as part of our MST. Our goal is to figure out which edge to add to the MST next. Note that we must pick an edge that does not create a loop in the MST, thus we must only consider edges that connect the MST to a vertex not already in the MST.

At this point, BFS would consider $C$ next. However, we see that the edge $A C$ is not in the MST. This is because it turns out it is more profitable to instead include vertex $C$ by way of edge $CG$ instead. How can we avoid making this choice? Note that we have three edges to choose from at this point--- $A C$ , $B D$ , and $BE$ ---and the last two edges have a lower weight (1) than $A C$ (5). It seems like we should consider one of these edges first since its weight is smaller.

If we choose $B D$ , we obtain the following extended MST that includes $A$ , $B$ , and $D$ :

Choosing BD to extend the spanning tree

Now we can consider the following edges:

$A C$ ---cost 5,
$BE$ ---cost 1, and
$D F$ ---cost 15.

By considering the lowest weight edge next, we then add $BE$ to the MST:

Extending the spanning tree with BE

We continue in this manner, considering the lowest weight edge that expands the MST by one vertex. We would next add $EG$ (cost 5):

Extending the spanning tree with EG

Next we would add $FG$ (cost 3):

Extending the spanning tree with FG

And then finally $CG$ (cost 4):

Extending the spanning tree with CG

At this point, we know we are done because there are no other vertices left to add to the graph. Alternatively, we know that any tree of the graph contains $∣ V ∣ - 1$ edges, so once we add this number of edges, we can safely stop.

To summarize, Prim's algorithm proceeds as follows on a graph $G = (V, E)$ :

Choose a start vertex $s \in V$ as an initial $T$ consisting of just $s$ .
For $∣ V ∣ - 1$ times:
- Choose the minimum weight edge $e \in E$ that connects $T$ with a vertex not in $T$ and add $e$ to $T$ .
Afterwards $T$ is a MST for $G$ .

Prim's algorithm is an example of a greedy algorithm. A greedy algorithm is one that, for each iteration, makes its next choice by choosing the minimum or maximum option available. Here, that choice is the minimum weight edge that expands the current MST. However, how do we know this choice is correct?

To prove that our greedy choice is correct, we must first introduce the notion of a cut.

Definition (Cut)

$G = (V, E)$ be a graph. A cut of the graph is a partition of its vertices $(S, V - S)$ with $S \subseteq V$ . A non-trivial cut is one where $S \neq = \emptyset$ and $S \neq = V$ .

Since $S$ in the above definition uniquely identifies the cut, we will refer to the cut by $S$ for notational convenience.

Cuts allow us to talk about the state of Prim's algorithm precisely. At any iteration of the algorithm, we may view the MST $T = (V^{'}, E^{'})$ as inducing a cut $V^{'}$ of $G$ . The algorithm then considers the minimum weight edge that flows between vertices of the cut, i.e., vertices of the form $u, v$ with $u \in V^{'}$ and $v \in V - V^{'}$ . We must show that this edge belongs to some MST of $G$ . In particular, we'll show that it can belong to the eventual MST $T^{*}$ of the graph grown from the current MST $T$ , that is, $T \subseteq T^{*}$ .

Claim (Cut Property)

Let $G = (V, E)$ be a graph and $T = (V^{'}, E^{'})$ with $V^{'} \subseteq V$ and $E^{'} \subseteq E$ be a minimum spanning tree for $V^{'}$ . Consider the set of edges $D$ that connect a vertex in $T$ to a vertex not in $T$ , i.e.,

$D = {(u, v) ∣ (u, v) \in E, u \in V^{'}, v \in V - V^{'}},$

And let $e$ be an edge with minimum weight of $D$ . $e$ belongs to some MST $T^{*}$ of $G$ where $T \subseteq T^{*}$ .

Proof

We prove this fact by contradiction. Let the cut induced by $T = (V^{'}, E^{'})$ be $(V^{'}, V - V^{'})$ and the minimum weight edge under consideration be $e = (u, v)$ with $u \in V^{'}$ and $v \in V - V^{'}$ . Suppose for the sake of contradiction that $e$ does not belong to any MST of $G$ of which $T$ is a subtree. Now consider an arbitrary MST of $G$ , call it $T_{2}^{*}$ . Note that because $v$ must be connected in $T_{2}^{*}$ , if $e$ does not connect $e$ in $T_{2}^{*}$ , there must exist some other path between $T$ and $v$ in $T_{2}^{*}$ . Let the edge in this path that flows across the cut induced by $T$ be $e^{'}$ .

Now, consider the alternative tree of $G$ where we replace $e^{'}$ with $e$ , call it $T_{1}^{*}$ . $T_{1}^{*}$ is a spanning tree because any vertex that was reachable through $e^{'}$ is now reachable through $e$ . Furthermore, note that $e$ and $e^{'}$ are both edges across the cut induced by $T$ , but $e$ is assumed to have minimum weight among such edges. Therefore, $weight (e) \leq weight (e^{'})$ . But $T_{1}^{*}$ only differs from $T_{2}^{*}$ in this edge, so we can conclude that $weight (T_{1}^{*}) \leq weight (T_{2}^{*})$ which implies that $weight (T_{1}^{*})$ is a MST for $G$ , a contradiction since $e$ belongs to it.

This argument is an example of an exchange argument for greedy algorithms. We justify the greedy choice by arguing that any "sub-optimal" choice could be substituted by the greedy choice to obtain a solution at least as good as the original one. Our argument accounts for the fact that since edge weights are not necessarily distinct that $e$ and $e^{'}$ could be both valid minimal choices. The resulting MSTs are, therefore, potentially different, but both have the same (minimal) weight. Note that if the edge weights of $G$ are distinct, then $e$ is the minimum weight edge of the cut and thus $T_{2}^{*}$ in the proof above is not a MST; $T_{1}^{*}$ . This implies that in the case where edge weights are distinct, the minimum weight edge $e$ must belong to any MST of $G$ , not just one of them.

We can use this lemma to prove the correctness of Prim's algorithm easily. The algorithm maintains a MST for the vertices it contains so far, extending the MST by one edge (and thus, one vertex) on each iteration. We therefore prove the correctness of Prim's algorithm by induction of the number of iterations of its loop, claiming that $T$ is a MST for the vertices it contains so far.

Claim (Correctness of Prim's Algorithm)

On the $n$ -th iteration of Prim's on a graph $G = (V, E)$ , $T = (V^{'}, E^{'})$ with $V^{'} \subseteq V$ and $E \subseteq E^{'}$ is a MST for $V^{'}$ .*

Proof

By induction on $n$ .

$n = 0$ . Initially $T$ contains a single vertex and is trivially a MST for that one vertex.
$n = k + 1$ . Our induction hypothesis says that on iteration $k$ , $T = (V^{'}, E^{'})$ is a MST for $V^{'}$ In the $(k + 1)$ -st iteration, we extend $T$ with an edge $e$ across the cut induced by $T$ . By the Cut Property, we know that $e$ can belong some MST $T^{*}$ of $G$ with $T^{*} \subseteq T$ . Therefore, we know that $T$ extended with $e$ is a MST for its vertices.

Shortest Paths

A related problem to minimum spanning trees is shortest paths. That is, what is the shortest path between two vertices in a graph, say $a$ and $b$ . A naive greedy approach seems compelling given our exploration of minimum spanning trees. For example, consider the following graph:

An example graph

Suppose we want to find the shortest path from $A$ to $F$ . We could try to start from $A$ , and choose the edge with the least weight to traverse next, eventually arriving at $F$ . However, this yields the path $A, C, E, F$ which has total weight $4 + 10 + 3 = 17$ . The optimal path instead is the top path of the graph: $A, B, D, F$ with total weight $5 + 5 + 1 = 11$ . Note that even if we were to divine that it is better to go to $B$ from $A$ , we encounter the same problem at $B$ : a greedy choice will send us down to $E$ (at cost 4) when traversing $D$ would have been the correct move!

More generally, this approach has the problem where it may send us down a sub-optimal path:

Sub-optimal greedy searching for a shortest path

For example, we may reach vertex $z$ in the graph above only to find that we are either in a sub-optimal path or worse yet, that the target node of our path is unreachable from $z$ ! In this case, we would be forced to backtrack to explore other possible paths. In general, this occurs when a greedy choice fails: we have to "undo" our optimal choice and try other possibilities instead. We may end up exploring all such possibilities exhaustively which is problematic if there are many possibilities to consider! This realization might motivate us to dismiss any sort of greedy algorithm for this problem. However, if we are smart in tracking enough information so that we never need to backtrack, we can retain a greedy approach!

The algorithm we'll consider is Dijkstra's algorithm which we can think of as a refinement of Prim's algorithm for minimal path searching. Note that Prim's proceeded by growing a minimal spanning tree from a single node. Dijkstra's proceeds similarly, growing an optimal path from the start vertex under consideration. However, unlike Prim's which only tracks the growing MST of interest, Dijkstra's does not just record the current optimal path to the desired end vertex but all such optimal paths to every vertex in the graph from the start vertex. This additional information is sufficient for us to make greedy choices that always lead to the discovery of the optimal path.

Suppose we are interested in finding the shortest path from $A$ to $F$ in our example graph for this section. Dijkstra's ultimately tracks the shortest path from $A$ through the nodes it has visited so far to all nodes in the graph, refining these paths as it greedily consumes vertices in the graph. Initially, we know that the shortest path from $A$ to itself is simply staying at $A$ for a cost of 0. For every other node, we don't know a path---indeed, such a path may not exist!---so we assign the value $\infty$ to these paths.

Destination	Path	Cost
$A$	$A$	0
$B$	?	$\infty$
$C$	?	$\infty$
$D$	?	$\infty$
$E$	?	$\infty$
$F$	?	$\infty$

We begin by considering $A$ and its edges. We look at each edge incident to $A$ and update our table based on these edges. When looking at these edges, we ask the question:

Can taking this edge to a vertex $v$ give us a new optimal path from $A$ to $v$ ?

There are two such edges to consider: $A B$ and $A C$ . Since we don't know of any paths from $A$ to either of these edge's endpoints---represented by the fact that their entries in the table are $\infty$ ---we can update our shortest paths entries for these vertices with these edges.

Destination	Path	Cost
$A$	$A$	0
$B$	$A B$	$5$
$C$	$A C$	$4$
$D$	?	$\infty$
$E$	?	$\infty$
$F$	?	$\infty$

After processing these edges, we are done with $A$ . An invariant of the algorithm is that we never need to reconsider these edges again. The important information about them has been recorded in the table, allowing us to avoid backtracking if an optimal path needs to be updated!

We now repeat the process by choosing a vertex that we have not yet visited and updating the table based on its incident edges that we have not yet considered. Which vertex do we consider next? This is where we'll make our greedy choice: we'll consider the vertex with minimum cost according to the table that we have not yet visited.

In our running example, this is node $C$ with current path cost $4$ . We now update our table with $C$ 's additional edge: $CE$ . Note that this edge gives us a path from $A$ to $E$ ; what is its length? It is the length of the optimal path from $A$ to $C$ plus the cost of traversing $CE$ ! This quantity is $4 + 10 = 14$ corresponding to the path $A CE$ .

Destination	Path	Cost
$A$	$A$	0
$B$	$A B$	$5$
$C$	$A C$	$4$
$D$	?	$\infty$
$E$	$A CE$	$14$
$F$	?	$\infty$

Next we consider node $B$ since its current shortest path cost $4$ is lower than $E$ 's cost $14$ . We thus consider edges $B D$ and $BE$ next. Edge $B D$ updates the path to $D$ as expected. However, edge $BE$ introduces a choice between two paths:

The current shortest path in the table: $A CE$ with cost $14$ .
The candidate shortest path through $BE$ . The cost of this path is the minimal cost of reaching $B$ from $A$ plus the cost of traversing $BE$ : $5 + 4 = 9$ .

We note that $9 < 14$ and thus the candidate path is shorter than our current best known path. We therefore update the table to reflect the fact, recording a new best shortest path for $E$ !

Destination	Path	Cost
$A$	$A$	0
$B$	$A B$	$5$
$C$	$A C$	$4$
$D$	$A B D$	$10$
$E$	$A BE$	$9$
$F$	?	$\infty$

We next consider $E$ with current shortest path length $9$ . It has one unvisited incident edge, $EF$ , allowing us to final reach our intended endpoint, $F$ , with cost $9 + 3 = 12$ .

Destination	Path	Cost
$A$	$A$	0
$B$	$A B$	$5$
$C$	$A C$	$4$
$D$	$A B D$	$10$
$E$	$A BE$	$9$
$F$	$A BEF$	$12$

Now we will consider vertex $D$ with edge $D F$ . We employ the same logic here as with $B$ . We need to compare the:

The current shortest path recorded in the table: $A BEF$ of length $12$ .
The candidate shortest path through $D$ : $A B D F$ of length $10 + 1 = 11$ .

The candidate shortest path is shorter, so we update the table with the new path for $F$ .

Destination	Path	Cost
$A$	$A$	0
$B$	$A B$	$5$
$C$	$A C$	$4$
$D$	$A B D$	$10$
$E$	$A BE$	$9$
$F$	$A B D F$	$11$

Since $F$ has no unvisited edges to process, we don't need to do anything to process it, completing the search procedure. We can now inspect the table to find the shortest path from $A$ to $F$ , which is $A BEF$ of length $11$ as desired!

With this example, we see that the salient parts of Dijkstra's algorithm are:

Repeatedly choosing vertices to process based on their current, best-known shortest paths from the start vertex.
Comparing our current known shortest paths with paths through the current vertex under consideration and choosing the better of the two.

Let $G = (V, E)$ be an undirected graph. Suppose that we are interested in finding all shortest paths starting from vertex $s \in V$ . Let $cost (x)$ be the cost of the shortest known path from $s$ to $x$ . Dijkstra's algorithm proceeds as follows:

Initially, let $cost (s) = 0$ and $cost (v) = \infty$ for all $u \in V$ that are not $s$ .
Repeatedly choose vertex $u$ that has minimal $cost (v)$ among all vertices that have not yet been processed in $V$ .
- For each edge $(u, v) \in E$ that has not yet been processed by the algorithm, update the shortest known path to $v$ as follows:
  
  $cost (v) \leftarrow min (cost (v), cost (u) + weight (uv))$

After Dijkstra's algorithm completes, $cost (t)$ records the shortest path from $s$ to $t$ in $G$ for all $t \in V$ .

Note that our choice of initially assigning unknown paths the value $\infty$ makes the description of our algorithm concise. We don't need to define a special case for when we first find a path to a target vertex. We will always choose the found path because $\infty$ is effectively larger than the length of any known path length we would consider during execution of the algorithm.

Spanning Trees and Shortest Paths

Problem: Minimum Spanners

Consider the following weighted, undirected graph:

A weighted, undirected graph

Come up with at least three spanning trees for this graph using the techniques discussed in the reading. Write down the collection of edges that compromises each tree.
When our graphs have weights, we are concerned with finding minimum spanning trees (MSTs), spanning trees whose sum-of-weights-of-edges is minimal for any spanning tree of the graph. Use Prim's algorithm as described in the reading to come up with a MST for this graph. Note the edges of your MST and the sum-of-weights of that tree. Check with an instructor that your MST is indeed the minimal one.

Problem: Jump Up, Superstar!

Example graph, a 3x3 grid of nodes whose edges all have weight one.

In the reading, we explored Dijkstra's algorithm, a fundamental technique for finding the shortest paths of a graph. Here, we explore the algorithm in more detail and then extend it to make it more appropriate for certain real world scenarios.

Run Dijkstra's algorithm on the graph above, starting at vertex $A$ . In your write-up, give the resulting shortest paths and their costs from $A$ to every other node in the graph.
In many cases, weights in our graph correspond to physical distances between vertices. In these scenarios, the triangle inequality will hold between vertices. For any three vertices $a$ , $b$ , and $c$ that have edges between them, i.e., they form a triangle:

$∣ ab ∣ + ∣ b c ∣ \geq ∣ a c ∣$

Where $∣ ab ∣$ is the weight of edge $ab$ .

In a sentence or two, translate what the triangle inequality says with respect to the domain of vertices and lengths of paths between them.
In a sentence or two, explain why the graph from the previous question on spanning trees does not obey the triangle inequality.
In cases where the triangle inequality holds, we can take advantage of this fact by introducing a heuristic function that allows us to better prioritize the nodes we consider during Dijkstra's search. Let $h (v)$ be the straight line distance between $v$ and target $t$ . For example, in the graph from part (a), if we declare our target vertex to be $I$ , then:

$h (A) h (E) h (H) = h (A) = 2^{2} + 2^{2} = 8 = 1^{2} + 1^{2} = 2 = 1.$

All due to the Pythagorean theorem.

Rerun Dijkstra's on this graph to find the optimal path from $A$ to $I$ . However, when you need to calculate the next node to process, rather than choosing the minimal $c (v)$ , the current best-known path from $A$ to $v$ , choose the vertex $v$ that minimizes:

$c (v) + h (v) .$

In other words, choose the vertex $v$ that minimizes the sum of the current best-known path from $A$ to $v$ and the straight line distance from $v$ to $I$ . You may stop once you have established the shortest path from $A$ to $I$ .

In your write-up, give the sequence of nodes that you visited in your modified algorithm. Also, in your write-up describe in a few sentences how the heuristic function $h (v)$ made your shortest path search more efficient.
This extension of Dijkstra's algorithm is called $A^{*}$ ("A-star") which is a common pathfinding algorithm in many contexts where the graph in question represents physical distances. In this problem, we considered a heuristic function $h (v)$ , but it turns out that any heuristic function will work as long as it doesn't overestimate the actual cost to get to the goal node in the graph.

Discuss why the heuristic function must now overestimate the actual cost of the shortest path. Specifically, what happens to Dijkstra's algorithm if the heuristic function has this bad property?

Case Study: Finite Automata

Graphs are a foundational data structure in many areas of computer science. To close this portion of the course, we'll take a look at a particular application of graphs towards an older theme: program correctness. We'll use graphs to develop a simple model from the theory of computation, the finite automata. While simple, the model captures a wide variety of computations that we might consider in a computer program. This model will put together everything we have learned so far towards the task of using mathematics productively in programming and tease at what future courses in the foundations of computer science will cover!

Strings

A finite automata is an abstract machine that consumes strings of characters as input and produces a boolean, a yes or no answer, as output. Before we explore the finite automata itself, we must first formalize the notion of a string.

Definition (String)

A string $s$ drawn from an alphabet $Σ$ is a sequence of zero or more characters drawn from $Σ$ , i.e., $s = x_{1} \dots x_{k}$ where $x_{1}, \dots, x_{k} \in Σ$ .

Note that when we write $s = x_{1} \dots x_{k}$ , this implicitly binds $x_{1}, \dots, x_{k}$ to the individual characters of $s$ . Because there are $k$ variables bound in this manner, we know that the string has length $k$ .

Unlike strings in a programming language, we explicitly define the set of possible characters that make up our strings. We call this set the alphabet under consideration and denote it with $Σ$ ( $L A T E X$ : \Sigma). Here are some examples of alphabets and possible strings drawn from those alphabets:

Examples

Example 1: let $Σ_{1}$ be the set of the 26 lowercase English letters and the space characters. Then $s = hello world$ is a string drawn from $Σ_{1}$ .

Example 2: let $Σ_{2} = {0, 1}$ . Then $s = 010010110$ is a string drawn from $Σ_{2}$ . Strings drawn form $Σ_{2}$ are binary strings, i.e., sequences of 1s and 0s.

Example 3: note that the definition of string is a sequence of zero or more characters. Therefore, the empty string, written $ϵ$ ( $L A T E X$ : \epsilon), is a string drawn from any alphabet, including $Σ_{1}$ or $Σ_{2}$ as defined above.

In a conventional programming language, we would write the empty string $ϵ$ as "". However, since we traditionally do not use quotes to delineate strings, we must rely on a special symbol to denote the empty string.

Overview of Finite Automata

Now let's look at our first finite automata to get a feel for this sort of abstract machine.

    ┌─────────┬────────┐
    │         a        b
    ↓         │        │
-→ (q0) -a→ (q1) -b→ (q2) -a→ [q3]──┐
   │  ↑                         ↑  a,b
   └b-┘                         └───┘

This simple finite automata recognizes strings drawn from the alphabet $Σ = {a, b}$ . We can think of a finite automata as a labeled, directed graph where the nodes are states and the edges are transitions between states.

Informally, a finite automata operates as follows:

The machine begins initially in its start state, denoted in the graph above as the state $q_{0}$ with an incoming edge but no out-going state.
Next it reads in an input string character-by-character. As it reads each character, the machine transitions from its current state to a new state by moving along the edge annotated with the character that was read in.
Once the input string is completely consumed, we check to see what state the machine ended on. If the state is an accepting state, then the machine accepts the input string, i.e., returns "true." Otherwise, the machine rejects the input string, i.e., returns "false." In our example, we denote a final state in brackets rather than parentheses, so $q_{3}$ above is an accepting state whereas all the other states are normal.

As an example, here is how the machine operates over the string $ababa$ :

The machine initially starts in $q_{0}$ .
The machine reads an $a$ and transitions from $q_{0}$ to $q_{1}$ .
The machine reads a $b$ and transitions from $q_{1}$ to $q_{2}$ .
The machine reads an $a$ and transitions from $q_{2}$ to $q_{0}$ .
The machine reads a $b$ and transitions from $q_{0}$ back to $q_{0}$ .
The machine reads an $a$ and transitions from $q_{0}$ back to $q_{1}$ .

So when reading the string $ababa$ , the machine ends on state $q_{1}$ . $q_{1}$ is not an accepting state, so the automata rejects this string. In contrast, the machine would accept the string $bbababa$ .

Exercise (Simulation)

Trace the execution of the finite automata above on the input string $bbababa$ to show that the automata accepts this string.

With some effort, we can also verify that the strings:

$bbbababbb$ ,
$abab$ , and
$aba$ ,

Are accepted by this automata, whereas the strings

$bbbb$ ,
$aaaa$ , and
The empty string $ϵ$

Are not accepted by this automata.

What are the set of strings that the automata accepts? It turns out that the automata accepts any string that contains $aba$ ! We can observe this by inspecting the states and transitions of the automata and ask the question: how do we reach the acceptance state $q_{3}$ ?

We can only reach this state by moving from $q_{0}$ through $q_{1}$ and $q_{2}$ and finally to $q_{3}$ . According to the transitions, we can only do so by reading in the characters $a$ , $b$ , and $a$ in order. Also note that:

If we have not seen $aba$ yet, then any character not involved in this pattern returns us to $q_{0}$ , i.e., our search for this substring resets.
Once we have seen $aba$ and land in $q_{3}$ , anything that we read keeps us in this acceptance state. In other words, once we reach $q_{3}$ , we will always accept the string!

Finite Automata Formally Defined

Now that we have a high-level idea of how finite automata operate, let's look at their formal description.

Definition (Finite Automata)

A (deterministic) finite automata $D$ is a 5-tuple $D = (Q, Σ, q_{0}, δ, F)$ where:

$Q$ is the set of states.
$Σ$ is the alphabet.
$q_{0} \in Q$ is the initial state.
$δ : Q \times Σ \to Q$ is the state-transition function.
$F \subseteq Q$ is the set of accepting states.

Note that even though the automata is a directed graph, we do not model it directly as such! Instead, we model it as a collection of five components---the set of states, alphabet, initial state, transition function, and accepting states. The graph-like nature of an automata is inferred by this choice of representation:

States are the nodes.
The transition function contains the (directed) edges.

In particular, note that the transition function, when viewed as a relation, can be thought of a set of pairs, just like the edges of a graph!

Our example automata from above can be formally represented as follows:

$Q = {q_{0}, q_{1}, q_{2}, q_{3}}$ .
$Σ = {a, b}$ .
$q_{0}$ is the initial state.
$F = {q_{3}}$ .

The transition function is defined by the following transition pairs expressed in a state transition table:

	$a$	$b$
$q_{0}$	$q_{1}$	$q_{0}$
$q_{1}$	$q_{0}$	$q_{2}$
$q_{2}$	$q_{3}$	$q_{0}$
$q_{3}$	$q_{3}$	$q_{3}$

For example, in state $q_{0}$ , if the automata reads an $a$ , then the automata transitions to state $q_{1}$ .

Example

Example: define an automata $D = (Q, Σ, q_{even}, δ, F)$ as follows:

$Σ = {0, 1}$ .
$Q = {q_{even}, q_{odd}}$ .
$F = {q_{even}}$ .

Observe that there are two states and two characters of our alphabet. Therefore, we expect there to be $2 \times 2 = 4$ state-transition pairs in $δ$ . We'll, therefore, define the transition function $δ$ by cases as follows:

$δ (q_{even}, 0) = δ (q_{even}, 1) = δ (q_{odd}, 0) = δ (q_{odd}, 1) = q_{odd} q_{even} q_{even} q_{odd}$

Exercise (Formal Picture)

Draw the graph representation of the automata $D$ from the example above.

Acceptance

In order to formally verify that an automata is correct, we also need to formalize the notion of acceptance. Acceptance has a somewhat complicated definition; let's take a look:

Definition (Acceptance)

Consider a finite automata $D = (Q, q_{0}, Σ, δ, F)$ and $s = x_{1} \dots x_{k}$ be a string drawn from $Σ$ . We say that automata $D$ accepts string $s$ if there exists a sequence of states $q_{1}, \dots, q_{k} \in Q$ where:

$\forall i . 0 \leq i < k \to δ (q_{i}, x_{1}) = q_{i + 1}$ and
$q_{k} \in F$ .

Intuitively, this definition says that an automata accepts a string if the string drives the automata from its starting state to a final state.

Exercise (Back and Forth)

Take this intuition about what the formal definition of automata acceptance is saying and try to map the intuition on the symbols. In particular, how does the formal definition capture the idea that the input string "drives the automata from its starting state to a final state?"

Exercise (Scrutinize)

According to the formal definition of acceptance, does an automata accept a string if during execution of that string, the automata enters a final state, but is not in a final state at the execution?

Exercise (Spell It Out, ‡)

Consider the formally defined automata $D$ in the example above that reads binary strings. Apply the definition of acceptance to show that $D$ accepts the string $01100101$ .

(Hint: what does the formal definition of acceptance say we must construct and what must we show about this construction to show that $D$ accepts the string?)

Verification of State Machines

Problem 1: Traffic Lights

The beauty of finite state machines is that they can be used to model a wide variety of phenomena. For example, a finite state machine can be used to describe the behavior of a system of traffic lights. A single traffic light can be either green (go), yellow (caution), and red (stop).

A four-way intersection typically has one light for each incoming road. However, on many intersections, the behavior of the lights across from each other are symmetric. That is, consider the following intersection:

    A
    |
D ===== B
    |
    C

Because A and C are opposite each other, it is fine if both their lights are green, so they will typically be in sync, likewise with D and B. However, it would be problematic if A and B's lights were synced because then cars driving from A and B might collide into each other.

Because of this, we will model an intersection as a pair of color values, one for each light with the idea that each light represents parallel intersections, i.e., A and C or B and D. On top of this, when both lights become red, we need to know which of the two sets of lights should become green. Therefore, we need to also include a third value in our pair that tells us which light—the first or second element of the pair—should become green once both lights become red.

This leads to the following formal definition of the state of traffic lights at an intersection $S$ :

Definition (Traffic Light State)

The state of traffic lights at an intersection is represented by a triple $S$ :

$S : C \times C \times W$

Where $C = {g, y, r}$ representing the colors green, yellow, and red respectively and $W = {0, 1}$ representing whether the first light ( $0$ ) or the second light ( $1$ ) will become green next.

Example

The state $S = (g, r, 1)$ represents the situation where the first light is green, the second light is red, and the second light will turn green once both lights become red.

1.a: Code-to-Automata

We can translate these definitions into code pretty easily. Python allows us to write tuples directly, and we can use a string to represent color and a boolean to represent which light will toggle next. For example, the Python expression ('g', 'r', True) captures the state $S$ above. With this, we can give a Python function that simulates a traffic intersection:

def advance_intersection(state):
    match state:
        case ('g', 'r', b):      return ('y', 'r', b)
        case ('y', 'r', b):      return ('r', 'r', b)
        case ('r', 'r', True):   return ('r', 'g', False)
        case ('r', 'g', b):      return ('r', 'y', b)
        case ('r', 'y', b):      return ('r', 'r', b)
        case ('r', 'r', False):  return ('g', 'r', True)

Feel free to try this Python function out and give it some sample inputs, e.g., advance_intersection(('g', 'r', True)), and see how it operates.

Once you are comfortable with the function, draw a finite automata that represents this simulation. You should think carefully about how the different aspects of this problem map to the different components of a finite automata. You do not need to give a formal, symbolic description of the automata.

1.b: Correctness

A properly working intersection does not give right-of-way to perpendicular lights. We'll call such an intersection consistent. An inconsistent intersection is, therefore, one that gives right-of-way to perpendicular lights. Write a Python function, intersection_consistent, that takes a state as input and returns True iff the intersection is consistent.

1.c: A Slight Hiccup

We would like to eventually prove that advance_intersection only produces consistent results. However, it is not always the case that this is true. Show this by proving the following claim:

Claim

There exists an intersection state s such that intersection_consistent (advance_intersection s) = False.

(Hint: use your finite automata diagram to identify any particular states that don't have reasonable transitions!)

1.d: Fix-ups

What pre-condition do you need on advance_intersection to guarantee that advance_intersection always produces a consistent result? Write the condition formally $P (s)$ as a function of an arbitrary intersection state.

1.e: Proof of Correctness

With the precondition $P (s)$ that you identified in the previous part, finally prove the correctness of advance_intersection:

Claim

For all intersection states $s$ , $P (s) \to$ intersection_consistent (advance_intersection s) = True.

Problem 2: Servers

Another example of a finite state machine is a server-style program that hosts data, public and private. The server allows users to optionally authenticate after connecting to enable them to read private data in addition to public data. While we would need to appeal to networking and database libraries to properly implement such a server, we can emulate the authentication portion of the system with simple Python code.

First, let's define the series of commands we can issue to the server as a collection of strings:

"Connect": connects to the server
"ReadPublic": reads public data from the server
"Authenticate": authenticates with the server
"ReadPrivate": reads private data
"Disconnect": disconnects from the server

A session is a sequence of these commands.

Next we'll formalize the rules of the server, i.e., what constitutes a valid session:

A user must connect before reading or authenticating to the server.
A user must authenticate before reading private data.
A user must be disconnected at the end the session.

The state of a session, then, consists of two parts: whether a client is connected to the server and whether the client is authenticated. We'll represent this with a pair of booleans.

Definition (Client-Server State)

Let the state $s$ of our client-server simulation be a pair of booleans, represented by $0$ (False) and $1$ (True)

$S : {0, 1} \times {0, 1} .$

Problem 2a: Translation

Below is a candidate implementation of this protocol, a function that takes the current server state and a command and returns the updated state:

def process_command (state, cmd):
    (connected, authenticated) = state
    match cmd:
        case 'Connect':
            if not connected:
                print("Connected")
                return (True, authenticated)
            else:
                raise RuntimeError("Tried to connect but already connected")
        case 'ReadPublic':
            if connected:
                print("Reading public data")
                return (connected, authenticated)
            else:
                raise RuntimeError("Tried to read before connecting")
        case 'Authenticate':
            if connected and not authenticated:
                print("Authenticated")
                return (connected, true)
            elif not connected:
                raise RuntimeError("Tried to authenticate before connecting")
            else:
                raise RuntimeError("Tried to authenticate but already authenticated")
        case 'ReadPrivate':
            if connected and authenticated:
                print("Read private data")
                return (connected, authenticated)
            elif not connected:
                raise RuntimeError("Tried to read before connecting")
            else:
                raise RuntimeError("Tried to read private data before authenticating")
        case 'Disconnect':
            if connected:
                print("Disconnected")
                return (false, authenticated)
            else:
                raise RuntimeError("Tried to disconnect but not connected")

From the code, write down a finite state machine that captures this protocol.

Problem 2b: Problems

If you didn't see already, it turns out there is a serious flaw in our implementation of the server protocol! Prove the following claim:

Claim

There exists a sequence of commands $c_{1}, \dots, c_{k}$ such that they allow a user to read private data from the server without authenticating.

Problem 2c: Fix-ups

Correct your finite automata from problem 2a so the problem from the previous part is resolved. Use this new finite automata to rewrite the code so that it is correct!

Demonstration Exercise 7

Problem: Sportsball

Consider the problem of filling basketball teams. A single basketball team must have five people. Give combinatorial descriptions for each of the required values.

Basketball teams have five basic positions---center, power forward, small forward, shooting guard, and point guard. If there are $s$ students to choose from, how many ways can I fill a single basketball team where each student is assigned to one of these positions?
In a pickup game of basketball, the teams typically don't make a distinction between positions. In this context (where players are simply "on" a team without being assigned a position), how many ways are there to fill a team when there are $s$ students to choose from?
Now suppose you are a high school P.E. coach and you are filling a team of mixed grades. Such a team has at least 1 player that is in each of 9th, 10th, 11th, and 12th grades. The last slot may be drawn from any of these players. Let the number of students from each grade be:
- 9th graders: $a$
- 10th graders: $b$
- 11th graders: $c$
- 12th graders: $d$
How many ways are there to fill a single team?
At the college level, we only differentiate between three positions---center, forward, and guard. A team is normally composed of a center, two forwards, and two guards. Suppose there are $s$ students to choose from. How many ways can we fill a single team where each student is assigned to one of these positions? (Hint: You want to rule out repetition where, e.g., student $s_{1}$ is first chosen to be a guard, then $s_{2}$ and the case where $s_{2}$ is first chosen, then $s_{1}$ .)
Suppose you are filling teams based on a rating system where:
- There are $x$ 2 point players.
- There are $y$ 3 point players.
- There are $z$ 5 point players.
As a coach you may spend at most 15 points on forming a team of 5 players (but you may choose to spend less). How many different teams can you form in this system?

(Hint: Systematically explore all the possible valid point combinations, determine how many teams you can form from each, and then combine everything together. Make sure you give an unsimplified formula for this problem so your derivation is clear.)
Finally, suppose the pool of players consist of
- $v$ point guards,
- $w$ shooting guards,
- $x$ small forwards,
- $y$ power forwards, and
- $z$ centers.
In a sentence or two, give a description of the quantity described by the following combinatorial description. Be explicit in your description when ordering matters.

$(1 v) \cdot (2 x) \cdot (1 y) \cdot (1 z) .$

Problem: Relationship Counting

Consider the problem of counting binary relations over a domain of set $S$ and range of set $T$ .

How many possible functions can we form with domain $S$ and range $T$ ? Justify your combinatorial description using a few sentences explaining how you constructed your formula.
How many possible injective functions can we form with domain $S$ and range $T$ ? Justify your combinatorial description using a few sentences explaining how you constructed your formula.
Under what conditions (i.e., constraints on the sizes of $S$ and $T$ ) can we not form any injective functions between these sets? Prove this claim using the principles of counting.
Similarly, under what conditions (i.e., constraints on the sizes of $S$ and $T$ ) can we not form any surjective functions between these sets? Prove this claim using the principles of counting.

Problem: Parallel Hell

Consider the problem of analyzing programs operating in parallel over shared state. For example, here are two example program snippets written in C over a shared global variable glob with initial value 3.

// Program A
/* 1a */ int x = glob;
/* 2a */ glob = 5;
/* 3a */ glob = x * 2 + glob;

// Program B
/* 1b */ int y = glob;
/* 2b */ glob = glob - 2;
/* 3b */ glob = y;

If program A and B operated in serial, then execution of one program follows the other, so the value of glob will be 11. However, in a world where $A$ and $B$ operate in parallel, it is not clear which program executes first. On top of that, the instructions of $A$ and $B$ may be interleaved, leading to a variety of possible outcomes.

One way we can think of the possible interleavings of parallel programs is to serialize their executions, imagining how the two programs interact with each other through glob. For example, one such interleaving might be: 1a, 2a, 1b, 3a, 2b, 3b where:

Program A executes its first two instructions,
Then program B executes its first instruction,
Then program A finishes, and
Then program B finishes after that.

This results in a different value for glob, 5! Trace through this execution to make sure you understand why the resulting value is 5.

In general, a serialized execution is an interleaving of the instructions of the parallel-executing programs involved. The only constraint on this execution is that the order of the instructions within a program is preserved. Thus, the following interleaving: 1a, 1b, 2a, 3b, 3a, 2b is not a valid interleaving because 3b comes before 2b.

Come up with a serialized execution above where glob has the value 3 at the end of the execution.

(Hint: There's not really a systematic way to approach this problem in general. You might find it more productive to simply enumerate possible executions until you stumble on one that produces the desired result.)
The state explosion problem when performing static analysis over parallel programs come from the number of interleavings of instructions we must consider. Suppose that we are analyzing a pair of parallel programs that each have $m$ and $n$ instructions, respectively. Derive a combinatorial description of the number of possible serialized executions of these two programs and give a justification of why this description is accurate.

(Hint 1: There is an elegant description of this situation! Think careful about the requirements of a particular interleaving. What must be true about the order of instructions? How does this constrain the possible ways we can layout instructions if we, for example, layout one set of instructions first?)

(Hint 2: Draw diagrams to help you spatially understand these relationships!)
What does the state explosion problem say about the problem of trying to analyze parallel programs? For example, is it feasible to perform brute-force search of interleavings to find those that exhibit bugs (such as race conditions)? In a few sentences, explain why or why not.

Counting

Recall that the cardinality or size of a set $S$ , written $∣ S ∣$ , is the number of elements contained in $S$ . In some cases, computing the size of a set is straightforward. For example, if $S = {a, b, c, d, e}$ then $∣ S ∣ = 5$ by inspection of the definition of $S$ . However, suppose we have:

$S_{1} = {a, b, c, d} S_{2} = {a, c, e, f} .$

What is $∣ P (S_{1} \cup S_{2}) ∣$ ? We can compute the contents of $P (S_{1} \cup S_{2})$ and then count the number of elements. However, if there are many elements in the set, it might be impractical to compute-and-count. Furthermore, what if don't know the contents of $S_{1}$ and $S_{2}$ ? We would like to express the cardinality of this quantity in terms of $∣ S_{1} ∣$ and $∣ S_{2} ∣$ .

In this chapter, we focus on techniques for calculating the cardinality of finite sets, a branch of mathematics called enumerative combinatorics. As computer scientists, we are interested in not just modeling data but also performing operations over this data. Thus, we care greatly about techniques for calculating the sizes of sets as their sizes ultimately influence the expected runtime of the algorithms we development.

For example, consider the problem of determining the optimal route for a delivery truck to visit a number of businesses in a city and return back to the delivery center. This problem, a variant of the traveling salesman problem, is a fundamental problem with applications to a wide range of domains. A simple algorithm is as follows:

Enumerate every possible path through the city that originates from the delivery center and pick the shortest among them that (a) visits every business and (b) returns to the center.

How long would this program take to run? This is equivalent to asking the following question: what is the size of the set of all possible paths through the city? It turns out for a sufficiently well-connected city, there are an exponential number of paths to consider, far too many to simply enumerate in a reasonable amount of time. To see why this is the case, we will develop principles for counting sets of increasing complexity, using our set operations as a guide.

The Sum and Product Rules

Let's first explore how we might calculate the cardinality of the union of two sets. Suppose that we have the following sets:

$S_{1} = {a, b, c} S_{2} = {d, e}$

$S_{1} \cup S_{2} = {a, b, c, d, e}$ so $∣ S_{1} \cup S_{2} ∣ = 5$ . $∣ S_{1} ∣ = 3$ and $∣ S_{2} ∣ = 2$ so it is tempting to infer that the cardinality of the union of two sets is the sum of their cardinalities. However, consider the following alternative sets:

$S_{1} = {a, b, c} S_{2} = {a, c}$

$∣ S_{1} ∣ = 3$ and $∣ S_{2} ∣ = 2$ but $S_{1} \cup S_{2} = {a, b, c}$ and thus $∣ S_{1} \cup S_{2} ∣ = 3$ . Therefore, in order to assert that the cardinality of the union of sets is the sum of the cardinalities of the sets, we must also require that the sets do not possess elements in common! This gives rise to the sum rule for set cardinalities:

Definition (Sum Rule for Sets)

If $S_{1} \cap S_{2} = \emptyset$ then $∣ S_{1} \cup S_{2} ∣ = ∣ S_{1} ∣ + ∣ S_{2} ∣$ .

Note that we capture the notion of "sets do not possess elements in common" with the condition $S_{1} \cap S_{2} = \emptyset$ .

Exercise (Boundaries

Give a lower bound and upper bound for the cardinality of the union of two sets $S_{1}$ and $S_{2}$ . Justify your bounds in a sentence or two a piece.

We can generalize the sum rule to any number of sets as long as they are pairwise disjoint.

Definition (Pairwise Disjoint)

Say that a collection of $k$ sets $S_{1}, \dots, S_{k}$ are pairwise disjoint if for any pair of such sets $S_{i}$ and $S_{j}$ with $i \neq = j$ that $S_{i} \cap S_{j} = \emptyset$ .

Then we can say that for a collection of sets $S_{1}, \dots, S_{k}$ , if they are pairwise disjoint, then $∣ S_{1} \cup \dots \cup S_{k} ∣ = ∣ S_{1} ∣ + \dots + ∣ S_{k} ∣$ .

Next, let's consider the Cartesian product, $S_{1} \times S_{2}$ . Suppose that we again have:

$S_{1} = {a, b, c} S_{2} = {d, e}$

Then:

$S_{1} \times S_{2} = {(a, d), (a, e), (b, d), (b, e), (c, d), (c, e)}$

So $∣ S_{1} \times S_{2} ∣ = 6$ . Because $∣ S_{1} ∣ = 3$ and $∣ S_{2} ∣ = 2$ , we can infer that the cardinality of the Cartesian product is the product of the input sets.

Indeed, this is the case, which gives us the product rule for sets.

Definition (Product Rule for Sets)

$∣ S_{1} \times S_{2} ∣ = ∣ S_{1} ∣ \cdot ∣ S_{2} ∣$ .

Exercise (Duplicate Denouncement)

The sum rule places a pairwise disjointness restriction on its input sets. Is the same restriction necessary for the product rule? Calculate the size of the Cartesian product $S_{1} \times S_{2}$ of the following sets:

$S_{1} = {a, b, c} S_{2} = {a, b, c}$

And use this example to answer the question of whether pairwise disjointness is necessary to apply the product rule.

Counting as Choices

Set operations give us a formal, definition-based view of counting elements in sets. A useful, higher-level view of our counting principles are phrasing them as the number of possible choices we can make from a given set. This view of counting as choice is particularly useful for applying counting principles to real-world examples.

For example, let's consider the sum rule. Suppose that we have on a field trip:

10 first grade students,
15 second grade students, and
8 third grade students.

There are $10 + 15 + 8 = 23$ different students we can choose from, overall. If we label the sets $S_{1}$ , $S_{2}$ , and $S_{3}$ , respectively, then the sum rule tells us that:

$∣ S_{1} \cup S_{2} \cup S_{3} ∣ = ∣ S_{1} ∣ + ∣ S_{2} ∣ + ∣ S_{3} ∣ = 10 + 15 + 8 = 23.$

Set union and the sum rule allow us to consider choices when we combine sets into a single set. In contrast, Cartesian product allows us to make independent choices from a collection of sets. Each of the $k$ sets of a Cartesian product represents a different pool of elements to choose from. The Cartesian product enumerates all the different ways we can generate a tuple of $k$ elements by choosing one element from each pool.

Definition (Tuple)

A tuple is a fixed-size collection of $k$ elements, written as a $(x_{1}, \dots, x_{k})$ where the order between elements is relevant. We call a tuple of $k$ elements a $k$ -tuple, e.g., a pair is a 2-tuple.

Suppose that we have two hats, five shirts, three pairs of pants, and two pairs of shoes. The total number of outfits we can put together consisting of a hat, shirt, pants, and shoes is:

$2 \times 5 \times 3 \times 2 = 60.$

Alternatively, we can think of the problem as having four sets, one for hats, shirts, pants, and shoes. An outfit is, therefore, a 4-tuple with elements drawn from each of these sets. The Cartesian product of these four sets then gives us all possible outfits as 4-tuples.

Combinatorial Descriptions

Consider the total quantity of outfits we derived previous:

$2 \times 5 \times 3 \times 2.$

This unsimplified formula actually tells us quite a bit about the quantity in question. Because of the product rule, we know that the quantity represents the number of ways we can form choices from pools of two, five, three, and two choices, respectively.

In contrast, consider in isolation the value that this formula evaluates to:

$60.$

While technically accurate, this value tells us very little about the structure of the quantity or object that we are counting!

When counting quantities, we will universally favor giving unsimplified formulae for our set cardinalities rather than simplifying the formulae to a final result. We call these formulae combinatorial descriptions because these unsimplified cardinality formulae communicate the various choices we made in constructing an object in terms of set operations and our counting principles. In effect, a combinatorial description serves as a terse proof that an object can be decomposed used our counting principles as long as we know how to interpret the arithmetic operations contained within!

The Power Set

We saw in the previous reading that the power set of a set $S$ is the set of all subsets that you can make from $S$ . If $S = {a, b, c}$ , then:

$S = {\emptyset, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}} .$

So $∣ S ∣ = 2$ and $∣ P (S) ∣ = 8$ .

Exercise (Powerful Data)

Calculate the power sets and their cardinalities for a variety of sizes from $0$ to $4$ . You can also try computing the power set for a set of size $5$ or larger. However, be wary that the size of the power set grows very quickly as we shall discuss next!

With additional data points, we can see that the $∣ P (S) ∣$ seems to grow exponentially with the size of the input set! Indeed, the following property of the size of a power set is true:

Claim (Power Set Cardinality)

For any set $S$ with cardinality $k$ , $∣ P (S) ∣ = 2^{k}$ .

It is difficult to validate this empirically. If this proposition is true, then trying to calculate the power set of a 10-element set ought to result in $2^{10} = 1024$ elements which is certainly not reasonable to do by-hand! Instead, we would like to prove that this formula holds through a counting argument that establishes the cardinality of a set without needing to enumerate elements. We instead use our counting principles in a systematic fashion to describe how to build or choose elements from that set.

Let's see how we can do this to justify our claim about power sets.

Proof

The power set of $S$ contains all the possible subsets of $S$ . Consider constructing one such subset. Each of the $k$ elements of $S$ can be either included in the subset or not. By the product rule, this means that the total number of possible such subsets we can construct is

$k 2 \times \dots \times 2 = 2^{k}$

In other words, we can choose a subset of a set by forming a $k$ -tuple. Each element of the $k$ -tuple corresponds to one of the elements of the set. We can then assign a boolean value to each position indicating whether that element is in or out of the subset.

As a concrete example, suppose we have $S = {a, b, c}$ . Then the tuple $(t, f, t)$ corresponds to the subset ${a, c}$ .

This particular counting argument, the in-out argument, is particularly useful in computer science because we frequently work with binary data (0--1) or boolean choices (yes-or-no). As an example, suppose we have a piece of datum that is $k$ bits wide, e.g., 32 bits for an integer. Recall that a bit can either be set to 0 or 1. We can think of each integer as the collection of bits in the 32 bit sequence that are set 1. Since there are 32 bits in the collection, there must be $2^{32}$ such possible sets and thus $2^{32}$ possible integers. (Note that other bits of the datum might be devoted to other tasks, so the number of effective datum possible from $k$ -bits might be different than this raw quantity!)

The Inclusion-Exclusion Principle

The sum rule gives us a cardinality of the union of sets provided they are pairwise disjoint. However, can we use intersection to precisely characterize the size of the union without the need for a restriction?

To explore this idea, consider the following statistics about college majors:

There are 45 computer science majors.
There are 20 math majors.
There are 30 economics majors.

And suppose that we want to compute the total number of CS, math, and econ majors at the college. It would be tempting to say that the total is $45 + 20 + 30 = 95$ but what about a double major? If a person majors in both computer science and math, they would be counted twice, once for the count of CS majors and once for the count of math majors. If we knew how many double majors there were of all possible combinations, we can subtract them out once to account for this overcounting.

Suppose that we know that:

There are 10 computer science and math double majors.
There are 5 computer science and economics double majors.
There are 5 math and economics double majors.

Then, we might say that the total number of majors is:

$45 + 20 + 30 - 10 - 5 - 5 = 75.$

However, what about those rare triple majors? Consider such a single triple major:

The triple major appears once in each of the individual major counts, so we triple counted them in our addition.
The triple major appears once in each of the double major counts, so we triple counted them again in our subtraction.

That means we need to add them back in one last time! If we know that:

There are 2 computer science, math, and economics triple majors.

Then the total number of majors is:

$45 + 20 + 30 - 10 - 5 - 5 + 2 = 77.$

Intuitively, adding up all the singleton sets of majors overcounts the double major overlap, so we subtract them out. But by subtracting out the double major overlap, we undercount the triple majors, so we add them back in.

We generalize this alternation of addition and subtraction to account for overlap in the Inclusion-exclusion Principle or Generalized sum rule:

Definition (Inclusion-exclusion Principle)

Let $S_{1}, \dots, S_{k}$ be a collection of sets. Then the cardinality of the union of these sets is given by:

$∣ S_{1} \cup \dots \cup S_{k} ∣ = ∣ S_{1} ∣ + \dots + ∣ S_{k} ∣ - ∣ S_{1} \cap S_{2} ∣ - \dots - ∣ S_{k - 1} \cap S_{k} ∣ + ∣ S_{1} \cap S_{2} \cap S_{3} ∣ + \dots + ∣ S_{k - 2} \cap S_{k - 1} \cap S_{k} ∣ \dots$

In other words, the cardinality of the union of a set of sets $S_{1} \cup \dots \cup S_{k}$ , written $∣ S_{1} \cup \dots \cup S_{k} ∣$ is the sum of all the cardinalities of the singleton sets, the difference of all the intersection of pairs of sets, the sum of all the intersection of triples of sets, and so forth.

Exercise (Culinary Master, ‡)

Imagine that you are experimenting in the kitchen with a new dish and you need to add some vegetables, spices, colors, and sweeteners. Suppose that you have $a$ kinds of vegetables, $b$ kinds of spices, $c$ kinds of colors, and $d$ kinds of sweeteners to choose from. Give combinatorial descriptions (i.e., unevaluated counting formulae) for each of the following quantities:

The total number of vegetables and spices to choose from.
The number of ways to combine a single kind of vegetable, spice, and sweetener into the dish.
The number of ways to combine pairs of vegetables and spices and pairs of colors and sweeteners into the dish.

Ordered and Unordered Choice

So far, we have developed a number of counting principles based on our fundamental set operations. We will now use those principles to development two of the most foundational counting principles in combinatorics: permutations and combinations.

Ordered Choices with Replacement

Consider a set of 21 pre-school students. Every week, one student is randomly chosen to be the teacher's assistant. How many different sequences of assistants can we have over a 4-week period?

We can model this situation using a 4-tuple. If $T$ is the set of students, then each element of the 4-tuple is a member of this set. The 4-tuple, therefore, has the following type:

$(\cdot, \cdot, \cdot, \cdot) : T \times T \times T \times T$

The product rule tells us that the number of such tuples is:

$∣ T ∣ \times ∣ T ∣ \times ∣ T ∣ = 21 \times 21 \times 21 \times 21.$

Recall that this unsimplified expression, a combinatorial description, is preferred over its final value: $2 1^{4} = 194481$ . While this value is useful for understanding the magnitude of the number of possibilities, it does not tell us how this expression was derived.

Alternatively, we can view this problem as making choices from a set with replacement. We choose an element from the set $T$ ---there are 21 such choices---place that element back in the set and then make another choice from the same set. Since we do this process four times, there must be $21 \times 21 \times 21 \times 21$ such choices.

Exercise (Choices, Part 1)

Give a combinatorial description for the number of sequences (i.e., order is relevant) of size $6$ you can generate from the set of lowercase letters $S = {a, \dots, z}$ .

Ordered Choices without Replacement

You might have felt that the above arrangement of assistants to be unfair. For example, a possible set of choices we can make is:

$(Roy, Roy, Roy, Roy) .$

Once Roy is teacher's assistant, we might want to avoid choosing Roy in the future. How can we account for this? When making successive choices, we can simply not replace Roy in the set of students. Of course, we should not single out Roy. We should do this for every student that we pick!

If we were to form the same set of sequence of four teaching assistants from the set $T$ of 21 students using this strategy of no replacement, we would have:

$21$ choices for the first assistant.
$21 - 1 = 20$ choices for the second assistant.
$21 - 2 = 19$ choices for the third assistant.
$21 - 3 = 18$ choices for the fourth assistant.

In summary, the total number of possible sequences of four teaching assistants without replacement is:

$21 \times 20 \times 19 \times 18.$

Intuitively, the first choice is made with the entire set $T$ . The second choice is with $T - {x}$ where $x$ is the first choice. The third choice is made with $T - {x, y}$ where $y$ is the second choice. The final choice is made with $T - {x, y, z}$ where $z$ is the third choice.

In general, if we are creating a sequence of size $k$ from $n$ elements without replacement, there are:

$n \times (n - 1) \times \dots \times (n - k + 1) .$

This quantity is so common in counting that we give it a name, the falling factorial.

Definition (Falling Factorial)

We define:

$(n)_{k} = n \times (n - 1) \times \dots \times (n - k + 1) .$

$(n)_{k}$ , pronounced " $n$ to the falling $k$ " is the falling factorial of $n$ down to $k$ .

In terms of our counting principles, $(n)_{k}$ is the number of sequences of size $k$ from $n$ elements without replacement. Use this falling factorial notation in your combinatorial descriptions whenever you want to capture this quantity.

Exercise (Check It)

Why is the final term in the definition of $(n)_{k}$ $(n - k + 1)$ ? What would happen if we removed " $+ 1$ " from this term? Is the resulting formula equivalent to the original?

Permutations

Continuing with our teaching assistant example, it is clear that if there are 21 students in the class (represented by the set $T$ ) that we will have 21 choices to make. How many such sequences can we make where we eventually choose every student in the class to be an assistant? We can specify this using the falling factorial:

$(∣ T ∣)_{∣ T ∣} = ∣ T ∣ \times (∣ T ∣ - 1) \times \dots \times (∣ T ∣ - ∣ T ∣ + 1) = ∣ T ∣ \times \dots \times 1.$

The right-hand side of this quantity is simply our standard factorial function! So we can view $(n)_{n} = n!$ to be the number of sequences of size $n$ made from a set of $n$ elements without replacement.

However, there is another way we can view this quantity. We can, instead, consider choosing an arbitrary ordering of the students, a permutation. Reading off this ordering from left-to-right gives us the desired sequence since each element has a unique position in the ordering. We can think of $n!$ as the number of permutations of a $n$ -element set.

Permutations of a collection of objects arise in a variety of circumstances. For example, consider a list of $n$ elements. Can we efficiently sort a list by randomly generating a permutation until we arrive at a sorted one? This seemingly silly sorting algorithm has a name, Bogo Sort.

There are $n!$ possible permutations of a list. How many of these permutations are sorted? For simplicity's sake, let's assume that all the elements of the list are distinct. How many choices are there for each of the $n$ slots of the permutation?

There is only one choice for the first slot of the sorted permutation, the smallest element of the list.
There is only one choice for the second slot of the sorted permutation, the second smallest element of the list.

And so forth, for every slot of the permutation. In other words, there is a single permutation that is sorted for any list!

So Bogo Sort must randomly generate one permutation out of the $n!$ possible permutations for the list of $n$ elements. How likely is this? We are aware that $n!$ grows very quickly as $n$ grows:

$0! = 1$ .
$3! = 6$ .
$5! = 120$ .
$10! = 3628800$ .

According to Stirling's approximation:

$k! \approx 2 πk (\frac{k}{e})^{k}$

So factorial grows exponentially in $k$ . This means that for a list of any reasonable size, Bogo Sort is very unlikely to ever produce a correct answer!

Generally speaking, if an algorithm demands that we brute-force analyze the various permutations of a collection of data, that algorithm is likely computationally infeasible in practice on non-trivial inputs. If we recognize that we are in this situation, we ought to redesign our algorithm to rule out some of these possibilities and thus regain tractability.

Overcounting

In some cases, we will find it difficult to arrive at a direct combinational description of a quantity. It is sometimes easier instead to over-count the quantity and remove out redundant amounts. We saw this with the Principle of Inclusion-Exclusion where we subtracted out over-counted elements. However, we can generalize this technique to any situation, not just ones involving unions of sets.

As an example, consider the following situation. You are trying to count the number of sheep at a farm. However, there is wall in the way that only allows you to see the sheep's feet. Can you use this information to count the number of sheep? Assuming that all the sheep has four legs, you know that you can count the number of legs and divide by four to get the total number of sheep.

In effect, counting sheep feet over-counts the true number of sheep. However, we know that we're over-counting by a factor of 4---each sheep has 4 feet---so we can remove this factor by division.

As a second example of employing over-counting, consider counting the number of possible triangles formed by three distinct points, $A$ , $B$ , and $C$ . What is different about this problem relative to counting permutations is that certain permutations are considered equal to each other. For example, the triangles:

   A        B        C
  / \      / \      / \
 /   \    /   \    /   \
C-----B  A-----C  B-----A

Are all really the same triangle because if we rotate the first triangle counterclockwise, we obtain the second triangle, and again for the third.

How do we count the number of such unique triangles? One way to do this is to first consider the possible permutations of the three points $A$ , $B$ , and $C$ :

$A BC B A C C A B A CB BC A CB A$

We can think of each permutation as reading the points of the triangle in some predefined starting point and order, e.g., the top-most point in a clockwise direction. Note that in this light, we have two sets of triangles that are equivalent.

$A BC = C A B = BC A B A C = CB A = A CB$

Thus, even though there are $3! = 6$ permutations of the three points, we only have two different triangles according to this definition:

\begin{gather*} ABC \quad BAC \end{gather*}

Note that with three possible nodes, we will generate three equivalent triangles, corresponding to the three ways we can shift the sequence one element to the right: $A BC$ , $C A B$ , $BC A$ . To account for this redundancy, we can divide out the three expected equivalent triangles from each unique triangle that we generate, yielding:

$\frac{3 !}{3} = 2.$

To summarize, when we try to count a collection of objects, it is sometimes convenient to over-count and then remove the excess elements that do not meet our criteria. In the above example, the excess elements are equivalent triangles, and we know that for every triangle that we care to count, three equivalent triangles are introduced. To remove this redundancy, we divide accordingly.

Unordered Choices

With the technique of overcounting, we can now consider how to count the number of possible subsets of size $k$ with elements drawn from a collection of $n$ elements. In our counting terminology, sets are collections of objects where their order is irrelevant. We call such a set a combination drawn from the collection.

To derive a counting principle for combinations, we will first count the number of subsequences of size $k$ that we can create from $n$ elements. We can then remove the repetitive subsequences that contain the same set of elements but in a different order.

To summarize, the quantity that we want can be described as:

$\frac{#/subsequences of size k from n elements}{#/repeated subsequences of size k} .$

The numerator is simply $(n)_{k}$ . Now, how many of these subsequences are redundant? Note that there are $k!$ permutations of a $k$ -sequence. Of these permutations, only one is relevant because every other permutation is simply a re-ordering of the others! Therefore, the number of repeated subsequences is $k!$ . This leads to the final formula:

$\frac{( n ) _{k}}{k !} = \frac{n \times ( n - 1 ) \times \dots \times ( n - k + 1 )}{k !} .$

We frequently count the number of $k$ -combinations drawn from a set of $n$ elements, so denote it with the following notation:

$(k n) = \frac{( n ) _{k}}{k !} .$

$(k n)$ is pronounced " $n$ choose $k$ ", which denotes the number of $k$ subsets drawn from $n$ elements. This is also called the binominal coefficient, so-called because the formula also denotes the number of occurrences of $x^{n - k} y^{k}$ in the expansion of $(x + y)^{n}$ .

Summary

Here is a summary of the various counting principles we have discussed so far:

The Sum Rule: $∣ S_{1} \cup S_{2} ∣ = ∣ S_{1} ∣ \cup ∣ S_{2} ∣$ whenever $S_{1} \cap S_{2} = \emptyset$ .
The Product Rule: $∣ S_{1} \times S_{2} ∣ = ∣ S_{1} ∣ \times ∣ S_{2} ∣$ .
The Principle of Inclusion-Exclusion: $∣ S_{1} \cup S_{2} ∣ = ∣ S_{1} ∣ + ∣ S_{2} ∣ - ∣ S_{1} \cap S_{2} ∣$ .
$k$ -sequences from $n$ elements: $(n)_{k}$ .
Permutations of $n$ elements: $(n)_{n} = n!$ .
$k$ -subsets from $n$ elements: $(k n)$ .

Exercise (Choices, Part 2, ‡)

In poker, you play with a deck of 52 unique cards. Give combinatorial descriptions of the following quantities:

The number of ways that we can draw-and-replace sequences of 5 cards.
The number of five card hands (N.B., the arrangement of your hand is irrelevant).
The number of ways we can draw three pairs of cards from the deck.

(Hint: apply both your counting principles for sequences and combinations here.)

Counting Practice

Get out your fingers! In this lab, we're going to practice writing combinatorial descriptions using our fundamental counting principles. Recall that a combinatorial description is an unsimplified arithmetic formula whose structure communicates how we've decomposed a set according to our counting principles. Throughout, give combinatorial descriptions for each of the desired quantities when appropriate. You should also check your work by instantiating your combinatorial descriptions to small, artificial sets and then hand-constructing the quantities in question.

Problem: Take-away

Suppose that you have a deck of $c$ playing cards.

Consider drawing a sequence of three cards from the deck where the order of the cards is relevant. However, you form this sequence by drawing a card and then putting it back in the deck, i.e., with replacement. Give a combinatorial description for the number of possible three-card sequences you could draw in this manner.
Now consider three-card sequences from the deck but without replacement. That is, each drawn card stays out of the deck. Give a combinatorial description for the number of possible three-card sequences you could draw in the manner and show your check that your description is correct.

(Hint: The description for this part should be different from the description for the previous part! Consider drawing one card from the deck. How many cards remain if you don’t replace this initial card?)
What happens when you mix the two approaches? Suppose that you:
- First draw 2 cards with replacement.
- Next, draw 2 cards without replacement.
- Finally, draw 1 card with replacement.
Give a combinatorial description of the number of such sequences of five cards.
Suppose you are interested in counting the number of five-card hands you can draw from the deck. We might suggest that the number of such hands is:

$c \times (c - 1) \times (c - 2) \times (c - 3) \times (c - 4) .$

Think about what it means to have a hand of cards in a card game and what you would view as two equivalent hands. With this in mind, in a sentence or two, explain why this combinatorial description does not accurately describe the situation at hand.

Problem: The Perfect Fit

Suppose that a software company was hiring individuals based on the following criteria:

Knowledge of Racket programming.
Can write an inductive proof.
Is no taller than 175 cm.

The company, through pre-screening, ensures that all of their candidates possess at least one of these qualities. This hiring season, the company noted of their $c$ candidates:

$x$ knew how to program in Racket.
$y$ knew how to write an inductive proof.
$z$ were no taller than 175 cm.
$a$ candidates fulfilled all three criteria.

The company ultimately decided to hire candidates that satisfied exactly two of these criteria.

Use the principle of inclusion-exclusion to derive an unsimplified combinatorial description of this quantity and show your check that your description is correct.

Problem: Alternative

In the reading, you learned that the number of $k$ -sized (unordered) sets drawn from $n$ objects is given by:

$(k n) = \frac{( n ) _{k}}{k !} .$

If you have seen the combination (or "choose") operator before, you may not have learned that it was equal to this quantity. Instead, you likely learned that:

$(k n) = \frac{n !}{k ! ( n - k )!} .$

In a few sentences, justify or "explain" this second formula in terms of overcounting. What are you overcounting and what are you removing from that overcount to arrive at the desired final quantity?

Problem: Poker Hands

In variations of poker, players receive a five-card hand from a deck of 52 player cards. Recall that each of the 52 cards has a rank and suit:

The ranks range from 2--10, Jack (J), Queen (Q), King (K), and Ace (A), 13 ranks overall.
The suits are drawn from spades, clubs, hearts, and diamonds.

Give combinatorial descriptions for each of the following values.

The total number of possible poker hands. Remember that a poker hand is an unordered collection of five cards.
The total number of possible poker hand that contains exactly two pairs. A pair is a pair of the cards with the same rank, e.g., the 8 of spades and hearts.

Rather than computing this quantity by determining the ways of drawing the first card of the hand, the second card, and so forth, which will lead to a series of conditional choices, consider the following strategy instead:
- Choose 2 ranks to participate in the pair.
- Choose 2 suits from each rank to participate in the pair.
- Choose a remaining rank and a suite from that rank as the final card.
The total number of possible poker hands that are a full house. A full house is a hand consisting of a three-of-a-kind and a pair. A three-of-a-kind is a pair but with three cards of the same rank instead of two.
The total number of possible poker hands that are a four-of-a-kind, e.g., the Ace of spades, clubs, hearts, and diamonds.
The total number of possible poker hands that are a single pair. A single pair is where two of the cards have the same rank, e.g., the Jack of spades and hearts. Note that when a hand contains a single pair, it should not contain other, better hands, e.g., two pairs, a three-of-a-kind, or a four-of-a-kind.
The total number of possible poker hands that are a flush where all five cards are of the same suit, e.g., all spades cards. Like a pair, this quantity should also not include better hands: a straight flush (where you have a flush and the cards are in consecutive rank, e.g., 5-6-7-8-9) and a royal flush (a straight flush that starts with 10, i.e., 10-J-Q-K-A).

Check your work by observing that your combinatorial descriptions produce these quantities:

The number of poker hands: 2,598,960.
Two pairs: 123,552.
Full house: 3,744.
Four-of-a-kind: 624.
Pair: 1,098,240.
Flush: 5,108.

Problem: Deceptive

With five-card poker hands, a royal flush is a straight flush that runs 10-J-Q-K-A. The following combinatorial description denotes the number of royal flushes:

$(1 4) .$

In a sentence or two, justify this combinatorial description.

Problem: Shades of Pre-registration

Suppose that you are building a schedule from among $k$ distinct possible courses. Give combinatorial descriptions of each of the cardinalities described below. For each description, check your work by instantiating a concrete set of courses, and demonstrating that your description works for each such concrete example.

The number of five course schedules (i.e., sequences of courses) you can build from these courses.
The number of sets of three courses (i.e., unordered collections) you can build from these courses.
The number of four course schedules that contain both Basket Weaving or Underwater Knitting.

(Hint: choose positions for Basket Weaving and Underwater Knitting. This implies where the remaining two courses will go. Finally choose which courses go into those remaining two slots.)
The number of four course schedules that do not contain both Basket Weaving and Underwater Knitting.

(Hint: leverage your answer to the previous part!)
The number of six course schedules that include Calculus I and Computer Science I but do not place Calculus I after Computer Science I in the schedule.

(Hint: similarly to the previous part, choose positions for the two named courses. But here, you need to remove the possibilities that get the order these courses swapped.)
(Bonus Problem) The number of six course schedules that include Calculus I and Computer Science I but do not place Calculus I immediately after Computer Science I in the schedule.

(Hint: unlike the previous problem, we only want to ensure that Calc I does not appear in the next time slot after CS I. How can we break up the possible positions of the two courses to get a handle on these situations? How do we then account for overcounting?)

Counting-based Reasoning

Recall in our previous readings how we derived the number of $k$ -subsets drawn from $n$ elements, $(k n)$ :

Generate an ordered sequence of size $k$ drawn from $n$ elements, $(n)_{k}$ such sequences in all. For every unique subset of $k$ elements, remove the $k!$ permutations of that subset that appear in the set of ordered sequences. What remains is a subset of size $k$ drawn from the $n$ elements. Therefore, $(k n) = \frac{( n ) _{k}}{k !}$ .

This is not the only way we can derive $(k n)$ . Consider this alternative derivation.

Generate an ordered sequence of size $n$ , $n!$ such sequences in all and select the first $k$ elements from the sequence to be a subset. Observe that once we have distinguished the first $k$ elements from the remaining $n - k$ elements of the sequence, that the first $k$ elements can be permuted in $k!$ ways and the remaining elements can be permuted in $(n - k)!$ ways. Therefore, for every such unique subset of $k$ elements, remove the $k! (n - k)!$ equivalent sequences that contain this subset. Thus, $(k n) = \frac{n !}{k ! ( n - k )!}$ .

Observe that from both derivations we can conclude that:

$(k n) = \frac{( n ) _{k}}{k !} = \frac{n !}{k ! ( n - k )!}$

In other words, if we can count a collection of objects in two different ways, those two different ways must be equal. This is the principle of double counting to establish the equivalence of arithmetic formulae.

As a second example, let's consider the following combinatorial identity:

$(k n) = (n - k n) .$

We could use arithmetic to demonstrate this identity. However, let's use double counting instead which will immediately unveil why the two quantities are true. To do so, we must demonstrate that both formulae count the same object. It is clear from the left-hand side that the quantity in question is likely the number of $k$ -sized subsets drawn from $n$ elements. The left-hand side is precisely this quantity, so we must argue that the right-hand also computes this quantity.

Proof

Claim: the number of $k$ -sized subsets drawn from $n$ elements is $(n - k n)$ .

Proof. Observe that $(n - k n)$ is the number of possible subsets of size $n - k$ . Consider such a subset and note that $k$ elements are not included in this subset. The elements not included in the $n - k$ subset form the $k$ -sized subset in question. Finally, because we consider all such possible $n - k$ -sized subsets, then we will also consider all $k$ -sized subsets in this manner.

Because we demonstrated that the two formulae count the same collection---the number of $k$ -subsets drawn from $n$ elements---we can conclude that they are equal.

Also note that this particular example is interesting because it demonstrates yet another counting technique: implicit counting. Sometimes it is difficult to construct an object directly. Instead, we can construct another object that implies the desired object. Here, we constructed $n - k$ subsets whose existence implied the $k$ -sized subset we wanted.

Exercise (The One-two Punch, ‡)

Use the double counting principle to justify Pascal's Identity:

$(k n) = (k - 1 n - 1) + (k n - 1)$

(Hint: recall that summation means that you are making an "or"-style choice, i.e., breaking up the problem into two mutually-exclusive choices. For this problem, think about distinguishing one of the $n$ elements of the set under consideration. Work from the premise that your "or"-choice is whether this element is in the generated $k$ -sized subset or not.

Case Study: Derangements

A derangement of a sequence of elements is a permutation of the sequence where no element is placed in its original position. In this problem, we'll incrementally build up the number of derangements of $n$ elements through example exploration and double counting.

To begin with, let's first understand what a derangement is. Consider a group of $n$ students in a writing class with papers to be peer-reviewed. A derangement of this group of people corresponds to an assignment of peer-reviewed papers to students such that no student receive their own paper.

Problem (A Small Example)

First let's explore a concrete example to gain a feel for what calculating derangements entails. Suppose we have a sequence $S = a, b, c, d$ . Here are the $4! = 24$ permutations of this sequence for your reference:

$a, b, c, d b, a, c, d c, a, b, d a, c, b, d b, c, a, d c, b, a, d c, b, d, a b, c, d, a d, c, b, a c, d, b, a b, d, c, a d, b, c, a d, a, c, b a, d, c, b c, d, a, b d, c, a, b a, c, d, b c, a, d, b b, a, d, c a, b, d, c d, b, a, c b, d, a, c a, d, b, c d, a, b, c$

Use this list to identify the 9 derangements of $S$ .

Note that the number of derangements is closely related to the number of permutations. In this problem, we're going to count the number of derangements twice: once using recursion to obtain a recursively-defined function, a recurrence relation, and another time using inclusion-exclusion to obtain an explicit formula. By our double counting principle, we will therefore know that the two formula are equal!

First let's start with the recursive formula. Let's denote the number of derangements of $n$ elements as $! n$ . Note that this similar but not the same notation for factorial: $n!$ . Let's use our real-life scenario of peer-reviewed papers to illustrate the choices each student can make in picking a paper to review.

First, let's imagine a line of $n$ students that will choose papers to peer review and consider the choices for the first student:

Problem (Choices for First Student)

Problem (Choices For First Student). How many choices of papers are there for the first student to review among the $n$ students given that they are not allowed to review their own paper?

Suppose that the first student has chosen student $i$ 's paper ( $i \neq = 1$ ). Now let's consider student $i$ 's choices. Note that there's an asymmetry in our choices here! While the first student chose paper $i$ , student $i$ could not have chosen that paper since it is their own paper! Our successive choices now depend on whether student $i$ chooses the first student's paper.

Problem (Second Student: Chooses First Student's Paper)

Suppose that student $i$ chooses the first student's paper. In this situation, the first student and student $i$ have mutually paired up. What is a recursive formula for the number of derangements of the rest of the students in this case?

Now consider the case where student $i$ chooses a paper that is not the first person's paper. Because the first student choose student $i$ 's paper, there are now $n - 1$ papers left.

Problem (Second Student: Chooses Another Paper)

To count the number of possibilities, we note that while student $i$ 's paper has been taken, student $i$ can't take the first student's paper. If we put student $i$ back in line, what is a recursive formula for the number of derangements for the remaining students.

Finally let's put all this together into a final recursive formula for the number of derangements of a sequence of $n$ elements.

Problem (Putting It All Together)

Exercise (Putting It All Together).

Give a recursive formula for $! n$ , the number of derangements of $n$ elements in terms of the three formula you derived above.

(Hint: Think about the quantities in terms of the numbers of ways the first student and student $i$ can choose a paper.)

Now let's come up with an explicit formula for the number of derangements using inclusion-exclusion. Our approach will start with $n!$ , the number of permutations and then subtract out permutations that are not derangements. We'll do this in a systematic way: we'll consider all the permutations where one element is in original position, then two elements, and then three, all the way up to $n$ .

Increasing Non-Derangements

Consider the artificial sequence $S = a, b, c, d$ from before. Define the set $S_{k}$ to be the set of permutations of $1, 2, 3, 4$ where element $k$ is in its original position. List the contents of $S_{a}$ , $S_{b}$ , $S_{c}$ , and $S_{d}$ .

From this, derive an formula using set operations over $S_{a}, \dots S_{d}$ that describes the set of sequences with at least one element in its correct position.

Once you are done, you should note substantial overlap between the four sets. While we want to subtract all of these bad sequences from the total number of permutations $n!$ , we don't want to "over-over-count" and subtract too much! The principle of inclusion-exclusion comes to the rescue here!

A Concrete Formula

Give a formula for $! 4$ (since $∣ S ∣ = 4$ ) in terms of the concrete size of the buckets you calculated above using the principle of inclusion-exclusion. Verify that your formula results in 9, the number of derangements of this particular set.

Once you have this concrete formula, what is left is coming up with a general formula for the size of an arbitrary bucket.

An Abstract Formula

Consider a sequence of $n$ elements. Give a formula for the number of permutations of the $n$ elements where at least $k$ elements are in their own positions. To do this, frame this as two choices:

The number of ways to choose the $k$ elements that are in their own positions.
The number of ways to arrange the remaining elements of the sequence.

Finally, we can now come up with an overall count for the number of derangements and equate it to our recurrence. Note in developing this explicit formula, we have been counting the number of permutations with at least one element in its original position. In coming up with this final formula, note that the number of derangements is precisely the permutations where no such element is in its original position.

The Punchline

Put everything together to write down two formula for the number of derangements $! n$ of $n$ elements, a recursive formula and an explicit formula.

Counting Operations

An appropriate description of an object or an algorithm readily admits a combinatorial description of its size or complexity. This is especially important for computer scientists because we want to analyze the complexity of the programs we develop. We measure the complexity of a program by expected amount of resources that it consumes as a function of the size of its input. By resources, we typically mean either the time the program takes to execute---its time complexity---or the amount of memory the program consumes while executing---its space complexity. In CSC 207, you'll explore program complexity as it relates to the fundamental data structures of computer science. In this reading, we'll approach the topic of program complexity as an exercise in applied counting.

Critical Operations

The true amount of resources that a program takes to execute depends on many factors, e.g., applications running at the same time, the underlying operating system or hardware, most of which we cannot hope to easily capture with a mathematical function. We therefore approximate the complexity of a function by choosing a measure that implies the amount of resources a program uses while executing but is ultimately independent of these particular details. Usually, this measure is the number of critical operations that a program executes. The actual operations that we count are highly dependent on the program under consideration; we ultimately want to pick a minimal number of operations that is manageable to analyze but also accurately reflects the behavior of the program. In particular, when we analyze time complexity, we can use our critical operation(s) as the "unit" of time that the program takes to execute.

For example, consider the standard python implementation of the factorial function:

def factorial(n):
    if n == 0:
        return 1
    else:
        n * factorial(n-1)

A natural candidate for critical operations to count, here, are multiplications. Multiplications are the core behavior of the factorial computation. Furthermore, we intuitively believe they are the most common operation that occurs in factorial. Because of this, we can have confidence that the number of multiplications is a good approximation of the total time that factorial takes to run.

However, this is not the only operation we might consider. For example, we might consider equality comparisons, i.e., how many times we evaluate (= n 0) during the evaluation of factorial. Choosing equalities versus multiplications will lead in slightly different counts of total operations. In CSC 207, you will learn about Big-O notation that will allow you to conclude that these differences don't matter. For now, we'll simply note that both are reasonable choices as critical operations without further proof.

Counting Operations

Once we've identified our critical operations, we can now go about the business of counting them in our programs. In the absence of branching constructs, i.e., straight line code, this is straightforward: simply count all the occurrences of the operation in the code! For example, if we are counting calls to display as our critical operations, then display-string above clearly calls display four times.

When we sequence a series of actions, e.g., with begin, we can take the union of their resulting operations, i.e., add them up. For example, consider the following functions:

def display_strings():
    print('!')
    print('#')
    print('@')
    print('%')

def f():
    display_strings()
    display_strings()
    display_strings()

f calls display_strings three times, each of which calls print three times, so there are $3 + 3 + 3 = 9$ calls to print made overall.

Things get more interesting when we move from straight-line code to code that involves branching. We'll first focus on loops. Let's attempt to count the number of times display-range calls display.

def display_range(n):
    for i in range(n):
        print(i)

Here, the function in question calls print once for each element in range(n). The number of elements produced by range(n) is n since (range n) produces the list [0, ..., n-1]. Thus, display_range(n) calls print $1 \cdot n = n$ times. We can express as a (mathematical) function of $n$ , $f (n) = n$ represents the number of times print is called as a function of the input n.

Another tool that we can use to capture this quantity in a way that's more indicative of the structure of the code is summation notation. We recognize that the loop performs a set of repeated, sequential actions that, as per our previous discussion, can be combined with addition. Thus, we really have:

$f (n) = n 1 + 1 + \dots + 1 .$

Summation notation allows to express this repeated addition concisely:

$i = 0 \sum n - 1 1.$

We can think of a summation as the direct encoding of a for-loop. It is composed of three parts:

Below the $Σ$ (upper-case sigma, $L A T E X$ : \Sigma), an initializer for a bound variable that may appear in the body of the summation.
Above the $Σ$ , an (inclusive) upper-bound on the bound variable. This is the final value of the bound variable of the summation.
To the right of the $Σ$ , the body of the summation which describes each summand. The body appears as a summand once for each value the bound variable takes on.

In this particular case, the bound variable is $i$ , and it ranges from $0$ to $n - 1$ , inclusive. For each of these values of $i$ , the summand $1$ appears in the overall sum. In other words:

$i = 0 \sum n - 1 1 = n 1 + \dots + 1 .$

Note that the range from $0$ to $n - 1$ , inclusive, includes $n$ numbers. In contrast with computer programmers, mathematicians tend to use 1-indexing rather than 0-indexing as we normally do with our loops. This would be the more colloquial way to write this summation in math terms.

$i = 1 \sum n 1 = n 1 + \dots + 1 .$

In this particular case, the two summations are equivalent because the body of the summation doesn't reference $i$ . But if it does, then the meaning of the summation changes slightly. For example, consider these two summations that mention the bound variable in their bodies:

$i = 0 \sum n - 1 i = 0 + 1 + \dots + (n - 1) i = 1 \sum n i = 1 + 2 + \dots + n = \frac{n ( n + 1 )}{2}$

The change of lower- and upper-bounds of the variable resulted in a shift of the summands by one! Note that we can fix this by adjusting our usage of the variable in the body of the summand. For example, if we wanted to fix the $1$ -- $n$ summation so that it behaved like the $0$ -- $n - 1$ summation, we would write:

$i = 1 \sum n (i - 1) = (1 - 1) + (2 - 1) + \dots + (n - 1) = 0 + 1 + \dots + (n - 1) .$

Summations can be manipulated algebraic either by unfolding their definition as we have done or by using known identities. Examples of summation identities can be found in a variety of places, e.g., the Wikipedia article on summations.

Growth of Functions

Once we develop a mathematical function for the number of relevant operations an algorithm performs, we can categorize this function in terms of how fast it grows as its input grows:

Constant Functions are those functions that do not depend on their input. For example $f (x) = 300$ is a constant function that evaluates to $300$ , irrespective of its input. A function that grabs the head, i.e., first element of a linked list is a constant function since the process does not depend on the size of the list.
Linear Functions take the form $f (x) = m x + b$ where $m$ and $b$ are constants. They correspond to lines in a geometric sense. For example, walking an array takes linear time.
Quadratic Functions take the form $f (x) = a x^{2} + b x + c$ where $a$ , $b$ , and $c$ are constants. They correspond to curves. Functions with quadratic complexity arise, for example, when we must perform an operation involving all possible pairs of a collection of objects. If there are $n$ objects, then there are $n \times n = n^{2}$ operations that must be performed.
Cubic Functions take the form $f (x) = a x^{3} + b x^{2} + c x + d$ where $a$ , $b$ , $c$ , and $d$ . They correspond to curves with an inflection point and have a slope greater than a quadratic function. Functions with cubic complexity arise, for example, when we must perform an operation involving all possible triples of a collection of objects. Like the quadratic case, if there are $n$ objects, then there are $n \times n \times n = n^{3}$ operations to be performed.
Polynomial Functions generalizes the previous functions discussed so far. A polynomial has the form $f (x) = \sum_{i} a_{i} x^{i} + c$ where each $a_{i}$ and $c$ are constants. We'll usually lump quadratic and cubic functions under the "polynomial" functions and be more specific when we want to talk about linear and constant functions.
Exponential Functions take the form $f (x) = a b^{x}$ where $a$ and $b$ are constants. They also correspond to curves but with a steeper slope. Exponential functions arise, for example, when we have to consider all possible subsets of a collection of objects. For a collection of $n$ objects, there are $2^{n}$ possible such subsets.
Factorial, $f (x) = x!$ , corresponds to the number of possible orderings or permutations of $x$ elements. If our program needs to generate or consider all permutations of a collection of $n$ elements, then its runtime will be $n!$ .
Logarithmic Functions take the form $f (x) = lo g x$ . When using $lo g$ we usually assume the base of the logarithm is 10 (so that $lo g 1 0^{3} = 3$ ). However, in computer science, we usually assume $lo g$ is base 2. It will turn out the base of the logarithm is usually irrelevant for our purposes of asymptotic analysis because via the change-of-base rule--- $lo g_{a} x = \frac{l o g _{b} x}{l o g _{a} b}$ ---logarithms of different bases only differ by a constant amount (the term $lo g_{a} b$ in the rule). Logarithmic functions arise when we are able to divide a problem into sub-problems whose size is reduced by some factor, e.g., by half. When these problems are smaller versions the original problem, we call them "divide-and-conquer" problems and frequently use recursive design to solve them.
Linearithmic Functions are "linear-like" functions by some logarithmic factor, i.e., have the form $f (x) = x lo g x$ . Linearithmic functions arise when a divide-and-conquer sub-problems requires a linear amount of work. For example, the most efficient general-purpose sorting algorithms have this runtime.

Big-O Notation

When we perform complexity analysis, we would like to classify the growth behavior of our program according to one of the classes of functions listed above. We use Big-O notation to capture this fact. When we write $O (f)$ for some function $f$ , we refer to the set of functions that are all in the same growth class as $f$ . For example, $O (f)$ where $f (n) = n$ , refers to the class of linear functions such as:

$f_{1} (n) = 3 n f_{2} (n) = 5 n - 3 f_{3} (n) = 1 - n$

If we therefore think of $O (f)$ as a set of functions, we can write $g \in O (f)$ to mean that function $g$ belongs to the same class of functions that $O (f)$ belongs to. The functions $f, f_{1}, f_{2}, f_{3}$ above are all in the same complexity class so $f \in O (f_{1})$ , $f_{1} \in O (f)$ , $f \in O (f_{2})$ , $f_{2} \in O (f_{1})$ , etc..

We can categorize the complexity of our functions by using Big-O notation in tandem with the mathematical models we build to count the functions' relevant operations. For example, the following function:

def display_bunches(n):
    for i in range(n):
        print('!')
        print('!')

Performs $2$ prints for every n. Thus, the number of operations this function performs is:

$T (n) = \sum i = 1^{n} 2 = 2 n$

We can then say that $T \in O (n)$ , declaring that the runtime of the function is in the linear complexity class.

Note that when describing the complexity class, we tend to use the simplest function in that class, e.g., $n$ instead of $2 n$ or $3 n + 5$ even though these are technically accurate. With the constant complexity class we write $O (1)$ since it relates together all constant functions together.

The Formal Definition of Big-O

So far, we've developed an informal idea of Big-O---a classification of the growth rate of mathematical functions. Now let's unveil the specifics:

Definition (Big-O)

We write $f \in O (g)$ to mean that a function $f$ is upper-bounded by a function $g$ . This is true when the following condition holds:

$\exists c, n_{0} . \forall n \geq n_{0} . f (n) ∣ \leq c ∣ g (n) ∣$

What does this mean? First of all, for some function of interest $f$ , we say that $f \in O (g)$ , pronounced " $f$ is (Big) O-of- $g$ " or " $f$ is order $g$ ". This is true whenever there exists ( $\exists$ ) two constants $c$ and $n_{0}$ such that for all ( $\forall$ ) $n$ where $n \geq n_{0}$ the following inequality holds: $∣ f (n) \leq c ∣ g (n) ∣$ . That is $f (n)$ dominates $g (n)$ by some constant factor $c$ after some starting input $n_{0}$ .

$f \in O (g)$ captures the idea that $f$ is bounded above by $g$ . To show this, we must give two integers:

$c$ , a constant factor that $g$ is multiplied by and
$n_{0}$ , the minimum input size to consider.

Such that for all input sizes greater than or equal to $n_{0}$ , $f (n)$ is less than or equal to $c g (n)$ . That is, from $n_{0}$ on, $f$ is also smaller (or equal) to $g$ within a constant.

For example, let's show that $f \in O (g)$ where $f (n) = 30$ and $g (n) = x^{2}$ . First let's examine a graph of the two functions:

Graphs of f and g.

We can see that eventually $g$ (the red line) dominates $f$ (the blue line), but what is that point? This is the point where $g (n) = x^{2} = 30$ . Solving for $n$ yields $n = 3 0^{1/2} \equiv 5.48$ . Thus, we can claim that $n_{0} = 6$ (rounding up to be safe) and $c = 1$ . Here, we see the inequality holds because $f (6) = 30 \leq g (6) = 36$ . With this, we can conclude that $f \in O (g)$ .

Note that Big-O provides an upper bound on the asymptotic complexity of a function. For example, in the above example $f \in O (n!)$ as well. To see this, note that for $n \geq 5$ , $f (n) \leq g (n)$ . However, this is a weak upper bound because many other classes of functions are "smaller" than factorial, for example, polynomials and linear functions.

We always want to provide the tightest bound possible. However, because the $f$ that we are analyzing is not a pure mathematical function but a computer program with arbitrary behavior that we are trying to model, we will sometimes be unable to give a tight bound. We will therefore resort to a less tight bound in these situations. For example, you may suspect that the program has quadratic complexity but have difficulty proving it, so instead, you may claim a cubic bound instead which may be easier to show.

Exercise (Counting Exercise, ‡)

Consider the following function that creates a list of pairs from an input list:

def pair_up(l):
    result = []
    for x in range(3):
        for y in l:
            result.append((x, y))
    return result

Write down a mathematical function $T (n)$ that describes the number of append made as a function of the length of l, call it $n$ .
Give a big-O upper-bound for your $T$ and prove that this bound holds of $T$ .

Problem: Brute-force

Many algorithms in computing can be classified as combinatorial algorithms. To develop a combinatorial problem for a problem, we must cast that problem as a search for an object of interest from among a finite set of possibilities. Thus, we can think of the algorithm as a search procedure that moves through the space of possibilities in as efficient of a manner as possible.

If there are a finite number of possibilities, then one simple class of combinatorial algorithms are brute-force algorithms where we ignore efficiency and simply try all possible possibilities in some systematic fashion. While seemingly inelegant, sometimes brute-force solutions are the more pragmatic approach---and sometimes even more efficient in real-world settings! However, in many other cases, it is precisely avoiding brute-force search that is the purpose of the algorithm we develop for a task at hand.

Each of the following situations describes a problem that can be recast as a combinatorial algorithm. For each situation:

Describe what the finite set of possibilities are and the object of interest you are looking for.
Give a combinatorial description of the number of possibilities that a brute force algorithm would search through.
From the combinatorial description, give the (temporal) computational complexity in Big-O notation of the associated brute force algorithm for that problem.
Finally, from the Big-O description you gave, state whether the brute force algorithm is an efficient solution to the problem.

Note that some situations described below come from our previous lab on graph problems:

Finding the smallest element among $n$ elements of an array.
Finding a path between points $u$ and $v$ in a simple graph $G = (V, E)$ .
Determining whether the $n$ elements of an array are in sorted order.
Given a bipartite graph $G = (V, E)$ with $V_{1}, V_{2} \subseteq V$ forming a partition of $V$ and a matching $M \subseteq E$ , determining whether $M$ is a perfect matching.
Given a bipartite graph $G = (V, E)$ with $V_{1}, V_{2} \subseteq V$ forming a partition of $V$ and a matching $M \subseteq E$ , determine whether $G$ has a perfect matching.
Given a simple $G = (V, E)$ , determine whether $G$ has a $k$ -coloring.

Recurrences

Loops are one way to obtain repetitive behavior in a program, and we represent them in mathematics with summations. The other method we use to obtain repetitive behavior is recursion. For recursion, we use recurrence relations, recursively-defined mathematical functions, to capture their complexity.

Let us return to the factorial function we defined earlier:

def factorial(n):
    if n = 0:
        return 1
    else:
        return n * factorial(n-1)

We've established that we wish to count the number of multiplications that factorial performs. Intuitively, we know the answer to this question already: factorial(n) should perform $n$ multiplications, once for each number it multiplies from $n$ to $1$ . However, let's use this simple example as an opportunity to develop a technique for analyzing the complexity of recursive functions and, more generally, the size of any recursively defined object.

Ultimately, we want to define a function $f : N \to N$ where the input is the size of the input to factorial, i.e., the value of n, and the output is the number of multiplications that performs. Because the function ultimately performs a case analysis on $n$ , our function will also be conditionally defined based on $n$ :

$n = 0$ : the function immediately returns 1 and performs no multiplications.
$n > 0$ : we can see that the function directly performs one multiplication, but it also makes a recursive call to factorial(n-1). We can express this fact directly by recursively calling $f (n - 1)$ .

Our function $f$ is defined directly in terms of this case analysis:

$f (0) = f (n) = 0 1 + f (n - 1)$

Because $f$ is recursive, we call it a recurrence relation. Recurrence relations arise naturally when talking about the complexity of recursive functions.

Solving Recurrence Relations

To determine the complexity of a recursive function, we need to find an equivalent closed-form equation for the recurrence relation. The definition of "closed-form equation" varies based on context; we will interpret a "closed-form equation" as an equation that does not involve any recursion. There are many methods for solving recurrences, e.g., characteristic equations, recursion trees, the master theorem, that require mathematics that are outside the scope of our course. We instead present a simple substitution-based technique to first guess what the closed-form equation is for a simple recurrence relation and then check that we were correct by using inductive proof.

Guessing a Closed-Form Equation

To guess a closed-form equation for a recurrence, we repeatedly substitute for the recursive calls of our occurrence until we see a pattern. From that pattern, we extrapolate a likely equation for the recurrence. In the case of factorial, we may start with $f (k)$ for some arbitrary $k > 0$ and perform some substitutions to see what happens:

$f (k) = = = = = 1 + f (k - 1) 1 + (1 + f (k - 2)) 1 + (1 + (1 + f (k - 3))) 1 + (1 + (1 + (1 + f (k - 3)))) \dots$

We see that for every recursive call, we add one to the overall result. This immediately suggests the following equation: $f (n) = n$ . Alternatively, if this leap wasn't clear, we might consider operating more symbolically to mechanically derive this result. We observe that after $m$ expansions, we have:

$f (k) = = = = 1 + f (k - 1) 1 + (1 + f (k - 2)) \dots m + f (k - m) .$

Now we ask: when do these recursive calls end? These recursive call ends when $k - m = 0$ (the base case of the recurrence) which occurs when $k = m$ . Substituting back into the equation yields:

$f (k) = = = = = m + f (k - m) k + f (k - k) k + f (0) k + 0 k$

Checking a Closed-Form Equation

Now that we have guessed a closed-form solution to our recurrence, we must now check it for correctness. Since our recurrence is recursive in nature, it is not surprising that we must use induction to check that our recurrence is equivalent to our guessed closed-form equation. To differentiate between these two equations, we'll use $f (n)$ to denote the recurrence and $g (n)$ to denote our guessed equation.

Now our claim and subsequent simply equates these two functions:

Claim

$\forall n . f (n) = g (n) = n$ .

Proof

We proceed by induction $n$ .

$n = 0$ : $f (0) = 0$ by the definition of the recurrence and $g (n) = n$ so $g (0) = 0$ .
$n > 0$ : Our inductive hypothesis says that $f (n - 1) = g (n - 1) = n - 1$ and we must show that $f (n) = g (n) = n$ . By the definition of our recurrence $f (n) = 1 + f (n - 1)$ . We can substitute for $f (n - 1)$ by our induction hypothesis, yielding $f (n) = 1 + n - 1 = n$ , completing the proof.

In summary, when we encounter recursively defined structures, we count them using recurrence relations we can solve for closed-form solutions through a combination of the substitution method and induction to guess and check, respectively.

Exercise (Once More, With Feeling, ‡)

Repeat the analysis of factorial using recurrence relations but instead count the number of comparisons the function makes. Your should derive a new (but similar) recurrence, guess a closed-form equation, and then prove that closed-form equation correct with induction.

Recurrences

Problem: Recursive Counting

Now we'll try out hand at counting operations embedded in recursive functions. Recall that we use recurrence relations to capture these counts, mimicking the structure of the function. We then guess a closed-form solution to the recurrence and check it with an inductive proof.

In class, we analyzed a sorting example. In lab, we'll analyze two implementations of the power/exponentiation function.

def pow1(x, y):
    if y == 0:
        return 1
    else:
        return x * pow1(x, y-1)

def pow2(x, y):
    if y == 0:
        return 1
    elif y == 1:
        return x
    elif y % 2 == 0:
        return pow2(x, y/2) * pow2(x, y/2)
    else:
        return x * pow2(x, y/2) * pow2(x, y/2)

For each of pow1 and pow2:

Identify a critical operation to count.
Give a recurrence relation describing the number of critical operations the function performs as a function of the size of its input.

(Hint: which of the two inputs to pow1 and pow2 contributes to its runtime?)
Guess a closed-form solution to these recurrences by using the substitution method.

(Hint: for pow2 the following summation identity will be useful:

$i = 0 \sum n 2^{i} = 2^{n + 1} - 1.)$
Check your closed-form solution with an inductive proof.

Demonstration Exercise 8

Problem 1: Down for the Count

Consider the following Racket code:

def f(n):
    for i in range(int(n/2)):
        print("!")
    for i in range(n):
        for j in range(n):
            print("!")
            print("!")

def g(n):
    if n <= 1:
        pass # i.e., do nothing
    else:
        print("!")
        print("!")
        print("!")
        g(n-2)

Give two combinatorial descriptions for the number of !s printed by f, the first using summation notation, and the second, an explicit formula without the use of summation notation. Your description should be a function of the input n.
Give a recurrence relation describing the number of !s printed by g.

(Hint: Your recurrence relation's case should account for the fact that the recurrence goes down by two!)
Derive a closed-form expression for this recurrence relation and prove that the closed-form expression is equivalent to the recurrence. For this step, assume that $n$ is even.

(Hint: When proving your formula correct, note that because the recurrence ticks down by two, standard mathematical induction will not suffice.)
Adapt your answer from the previous part to account for any $n \in N$ .

(Hint: There are two ways to proceed. You may consider creating an additional recurrence to your solution for the case when $n$ is odd. It is also possible to come up with a single recurrence that uses the floor function $⌊ r ⌋$ which rounds real number $r$ down to the nearest whole number. For example $⌊ 4.7 ⌋ = 4$ .)

Problem 2: Cash Money

In the game of roulette, players bet on the outcome of a ball randomly landing in 38 distinct slots labeled with the numbers $1$ -- $36$ , $0$ , and $00$ . The numbers in the range $1$ -- $36$ are furthermore colored as follows:

$1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36 2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35 = red = black$

Give combinational descriptions (i.e., unsimplified formulae) for each of the required values.

What is the probability of winning a "00" bet (where the ball lands on $00$ )?
What is the probability of winning an "odd" bet (where the ball lands on an odd number)?
What is the probability of winning either a "1st dozen" bet (where the ball lands on a number in the range 1--12) or a red bet (where the ball lands on a red number)?
What is the probability of winning both an "even" bet (where the ball lands on an even number, 0 and 00 do not count as even) and a "3rd dozen" bet (where the ball lands on a 25--36)?
Let $X$ be a random variable describing the pay-off of a single set of bets in dollars. What is the expected value of placing both a "black" bet (where the ball lands on a black number) and a "1st dozen bet" bet if the pay-off for the black bet is 1 dollar and the pay-off for the "1st dozen bet" is 2 dollars?

Frequentist Probability

Uncertainty is a fundamental part of life. For example, imagine interviewing for an internship position. We ask ourselves, instinctually, what are the chances that I will get the job? Perhaps you feel like your chances are higher that day because you slept and ate well that morning. But maybe you know the company uses a programming language you aren't entirely comfortable with. You weigh these factors and arrive at an intuition of the likelihood that the interview is successful. However, you know that even though you feel like your chances are high, you may still not get the job.

How do we model this uncertainty? Up until this point, everything that we have done has been deterministic in nature, i.e., there has only been a singular, definite outcome of an event, whether that it is evaluating a program or simplifying an arithmetic expression. However, uncertainty introduces the need to consider multiple possible outcomes arising from a single event. Can we precisely define what it means for one of these events to be more likely to occur?

Probability theory models uncertainty by capturing our intuition that some events are more likely to occur than others. In this chapter, we'll study probability theory as an application of counting. If we can count of the number of occurrences of an event, we can assert a probability value that describes the likelihood of that event. As computer scientists, probability theory is particularly important for two reasons:

Because uncertainty is a natural phenomenon, we need ways of capturing and reasoning about uncertainty to accurately model real-world objects.
Uncertainty holds special potential for algorithmic design. Can we trade certainty like a resource to make our algorithms more efficient?

You likely have seen probability computations in your pre-collegiate math education. Such a probability value of a particular event occurring is computed using the following formula:

$\frac{Number of times that event occurs}{Total number of possible events} .$

In this reading, we'll introduce the fundamental definitions of this frequentist perspective on probability theory as well as some key concepts: expectation and conditional probabilities. We'll only scratch the surface of probability theory in this course. I highly recommend pursuing additional course work in this area, e.g., STA 209, because probability theory is becoming increasingly important for all computer scientists to understand in a world where statistical and machine learning-based techniques are gaining prevalence.

The Foundations of Frequentist Probability Theory

The probability of an event occurring can only be described in terms of other events occurring. We thus define a sample space that captures all the possible events under consideration.

Definition (Sample Space)

A sample space, $Ω$ , is the set of all possible outcomes of an experiment. Such a sample space is considered discrete if $Ω$ has finite cardinality. Otherwise, the sample space is considered continuous.

In this class, we focus exclusively on discrete probability. It is in the title of the class, after all.

Example

The sample space of an experiment where we flip a pair of coins is denoted by:

$Ω_{1} = {(H, H), (H, T), (T, H), (T, T)} .$

The sample space of an experiment where we roll three six-sided dice is denoted by:

$Ω_{2} = {(x, y, z) ∣ x, y, z \in [1, \dots, 6]} .$

Note that the sample space captures precisely the set of possible outcomes. Other outcomes, by definition, are not under consideration, e.g., the coins landing on their sides, unless they are included in $Ω$ .

Formally, an event, $E \subseteq Ω$ , describes the outcome of a particular experiment.

Example

The event describing when we obtain at least one head in two coin flips is denoted by:

$E_{1} = {(H, H), (H, T), (T, H)} .$

The event describing when the sum of the three die we roll is exactly 4 is denoted by: $E_{2} = {(x, y, z) ∣ (x, y, z) \subseteq Ω_{2}, x + y + z = 4}$

With an event formally defined, we can now define its likelihood, i.e., probability through a probability mass function. A probability mass function is a function $f : P (Ω) \to R$ that obeys the following properties:

$\forall E \in P (Ω) . f (E) \geq 0$ : the probability function produces non-negative probabilities.
$P (Ω) = 1$ : the probability that any event happens in the sample space is 1.
If $E_{1}, \dots, E_{k}$ are pairwise disjoint events, then:

$f (E_{1} \cup \dots \cup E_{k}) = i = 1 \sum k f (E_{i}),$

The probability of the union of a collection of disjoint events is the sum of their individual probabilities, i.e., the sum rule for probability theory.

We take the event as the domain of our probability function rather than a single outcome because we can always represent outcomes as singleton sets of events.

Example

If our coins are fair, then we expect that the probability of obtaining a heads or tails with a single coin to be equal.

$f (e) = \frac{1}{4}$

Now, suppose that we wish to know the probability of obtaining at least one head in two flips. This is denoted by the event:

$E = {(H, H), (H, T), (T, H)} .$

And by the sum rule, the probability of this event is:

$f (E) = = f ({(H, H)}) + f ({(H, T)}) + f ({(T, H)}) \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4} .$

Exercise (Gambling, ‡)

Consider the following gambling game:

Roll three six-side dice in sequence. Say that a die wins if it is a five or a six.

Write down the sample space $Ω$ of possible outcomes.
Assuming that the dice are all fair, what is the probability that you will win one occurrence of the gambling game?

Random Variables and Expectation

In this reading, we see how we can use probabilities to compute the expected value arising from an experiment. This simple statistic is the entry point into the wider world of statistical analysis where we look at common patterns of probability distributions and their properties. We won't have time in this course to explore statistics in detail, but we will talk about some basics here, so that you are aware of them for future study.

Random Variables and Expectation

Recall how our fundamental probability definitions are set up:

The outcome of an experiment is drawn from the sample space $Ω$ .
An event $E \subseteq Ω$ describes a particular collection of outcomes of interest.
For each event, we assign a probability to that event through the probability mass function $Pr : P (Ω) \to R$ which obeys the three axioms of probability of theory.

With our probability mass function, we can state the likelihood of events occurring. However, we can also use our probability mass function in conjunction with some other machinery to state the weighted average of the possible outcomes of an experiment. To do this, we first need to define a way to interpret the outcome of an experiment. We do by way of a function (confusingly) called a random variable.

Definition (Random Variable)

A random variable is a function $X : Ω \to T$ for some output type $T$ . A random variable represents some interpretation of the outcomes of some random process.

Consider our example of rolling three random dice, denoted by the set of outcomes:

$Ω = {(x, y, z) ∣ x, y, z \in [1, \dots, 6]} .$

The sum of these dice forms a random variable, $X$ :

$X (x, y, z) = x + y + z .$

The codomain of this random variable is the set of natural numbers in the range $[3, \dots, 18]$ .

Note that the codomain of a random variable need not be a number. For example, if the sample space $Ω$ is the set of valid rock-paper-scissor plays:

$Σ = {(p_{1}, p_{2}) ∣ p_{1}, p_{2} \in P}$

Where $P = {rock, paper, scissors}$ .

Then the random variable $X_{w}$ :

$X_{w} (p_{1}, p_{2}) = {win lose p_{1} beats p_{2} otherwise$

Has type $X_{w} : P \times P \to {win, lose}$ .

Exercise (Random Variables)

Let \Omega consist of the outcomes of flipping three coins. Define a random variable $X$ that gives the parity of the coins, i.e., the number of coins that turn up heads.

While the codomain of a random variable can be of any type, we most commonly work with real-valued random variables, i.e., $R$ . Let $X$ be a random variable over a set of outcomes $Ω$ of type $X : Ω \to R$ . Also suppose the existence of a probability function $p r : Ω \to R$ over these outcomes. Then the expected value of $X$ , written $E [X]$ , is defined to be the weighted average of the outcomes and their respective probabilities:

$E [X] = t \in Ω \sum X (t) \cdot p r (t) .$

Example

Example: consider an experiment where we have a weighed six-sided dice with outcomes $Ω = {1, 2, 3, 4, 5, 6}$ . The probabilities of each outcome are:

$f (1) = \frac{1}{20} f (2) = \frac{1}{6} f (3) = \frac{1}{6} f (4) = \frac{1}{6} f (5) = \frac{1}{5} f (6) = \frac{1}{4} .$

Let $X : Ω \to R$ be a random variable that represents the value of a particular dice roll. Then the expectation of $X$ is the expected value of the weighted die:

$E [X] = 1 \cdot \frac{1}{20} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{5} + 6 \cdot \frac{1}{4} = 4.05.$

In contrast, if the probabilities of all the sides of the die were equally likely, then the expected value of the die would be:

$E [X] = i = 1 \sum 6 i \cdot \frac{1}{6} = 3.5.$

We can think of the expectation of a random variable to be the weighted average of that variable where the weights are the probabilities of the various outcomes.

One consequence of the definition of expectation is that we can treat $E [-]$ as an operation on a (random) variable. With this perspective, we can see that several algebraic properties hold of expectations. The most important of these is the linearity of expectation:

Claim (Linearity of Expectation)

Let $X$ and $Y$ be real-valued random variables. Then the following identities hold:

$E [X + Y] = E [X] + E [Y] E [a X] = a E [X]$

For some constant value $a$ .

The linearity of expectation says that addition and multiplication (of a constant) distribute in a natural sense through expectation. This fact allows us to manipulate and combine random variables as if they were plain old variables.

Example

let $Ω$ be the set of all pairs of outcomes of two six-sided dice. Let $X$ be a random variable defined as follows:

$X (k) = {10 k is even otherwise .$

And let $Y$ be a random variable that is defined to be the sum of the two dice values.

By the linearity of expectations, $E [X + Y] = E [X] + E [Y] = \frac{1}{2} + \frac{7}{2} = 4$ is the sum of averages of the two random variables. Also by the linearity of expectations $E [2 Y] = 2 E [Y] = 2 \cdot \frac{7}{2} = 7$ , the average of $Y$ scaled by a factor of two.

Probability Distributions

Many experiments share similar distributions of probabilities among its outcomes. The study of probability distributions and their properties is an important part of the mathematical subfield of statistics. Here, we explore the basic concepts of probability distributions in light of our fundamental definitions of probability theory.

Definition (Probability Distribution)

Let $X : Ω \to T$ be a random variable over a sample space $Ω$ and interpretation $T$ . A probability distribution is a function $Pr (t) : T \to R$ that describes the probabilities of the various interpretations of the elements of the sample space.

More informally, a probability distribution is a description of how a probability function distributes probabilities among the possible outcomes of an experiment. Many kinds of experiments fall into a handful of well known and understood probability distributions.

Bernoulli Distributions

Let $X$ be a random variable with codomain ${0, 1}$ . Then a probability distribution $Pr : {0, 1} \to R$ over $X$ forms a Bernoulli distribution with probability $p$ where:

$Pr (1) Pr (0) = p = 1 - p$

We call $p$ a parameter of the probability distribution. The Bernoulli distribution describe the outcome of a single experiment with a binary outcome---success or failure. The use of $0$ and $1$ to indicate boolean values is common in many areas of mathematics.

Example

Example: here are some applications of the Bernoulli distribution.

The probability of a single fair coin flip being heads forms a Bernoulli distribution with success $p = 0.5$ .
Suppose you play a game where you roll two six-sided dice and you win if the sum of the die is greater than 8. Then the probability of winning the game forms a Bernoulli distribution with success $p = \frac{10}{36}$ . (Note that there are 10 ways out of $6 \times 6 = 36$ possibilities to get a higher than an 8 with two six-sided dice.)

Note how the Bernoulli distribution allows us to concisely describe the distribution of a set of probabilities. Different distributions exist in statistics that capture a wide variety of possible probabilities and situations.

Binominal Distributions

We can describe a particular probability distribution using a variety of statistics which summarize salient characteristics of that distribution. These include statistics you ought to be familiar with already, e.g.,

The average (the expected value of a random value),
Median (the value that splits the probability distribution in half), and
Mode (the most frequent value).

Let $X : Ω \to N$ be a random variable that records the number of successes $k$ after running $n$ independent experiments. Then a probability distribution $Pr : N \to R$ over $X$ forms a Binomial distribution with the probability of generating $k$ successes is given by:

$Pr (k) = (k n) p^{k} (1 - p)^{n - k}$

Where $p$ is the probability of a single experiment generating a success. As shorthand, we write $B (n, p)$ for the binomial distribution consisting of $n$ independent experiments with probability of success $p$ for an individual experiment.

The probability function is derived combinatorially as follows:

The probability of getting $k$ successes is $k p \cdot \dots \cdot p = p^{k}$ .
The remaining experiments must be failures, and there are $n - k$ of them, so the probability of this is $n - k (1 - p) \cdot \dots \cdot (1 - p) = (1 - p)^{n - k}$ .
Finally, any $k$ -subset of the $n$ experiments may succeed, so there are $(k n)$ such combinations where we have $k$ successes and $n - k$ failures.

This final point is why we alternatively call the "choose" operator $(k n)$ the binomial coefficient.

The binomial distribution is a generalization of the Bernoulli distribution where we conduct $n$ such experiments rather than a single one. As such, it is potentially relevant whenever we are discussing the outcomes of running repeated trials with a binary result.

Example

Suppose that we flip a biased coin with probability $p = 0.75$ heads and $1 - p = 0.25$ tails $20$ times with success defined as obtaining heads. This forms a binomial distribution $B (20, 0.75)$ .

As a consequence of identifying a probability distribution is binomial, we can apply known formulae to quickly derive statistics for that distribution.

Let $B (n, p)$ be a binomial distribution. Then:

The mean or expected value of the distribution is $n p$ .
The median of the distribution is either $⌊ n p ⌋$ or $⌈ n p ⌉$ . $⌊ - ⌋$ is the unary flooring function which rounds its argument down. In contrast, $⌈ - ⌉$ is the unary ceiling function which rounds its argument up.
The mode of the distribution is either $⌊(n + 1) p ⌋$ or $⌈(n + 1) p ⌉ - 1$ .

Example

Example: For our biased coin example above, the expected number of heads is $20 \cdot 0.75 = 15$ .

Exercise (More Gambling, ‡)

Let's expand on the simple gambling game from the previous readings' exercise.

Suppose that to play the game, you need to put in $1. Roll three six-sided dice in sequence. Say that a die wins if it is a five or a six.

If the first die wins you get $1.

If the first and second die win you get $2.

If the first, second, and third die win you get $4.

In all other cases, you get nothing.

a. Write down the sample space $Ω$ of possible outcomes and a set of four disjoint events $E_{1}, \dots, E_{4}$ that describe the outcomes above. b. Write down the definition of a random variable $X$ that describes the amount of money you may win from a single play of the game. c. Calculate the expected value $E [X]$ of the random variable you defined above. Based on the computed value of $E [X]$ , is it worthwhile to play this game?

Probability Practice

In this lab, we'll practice using the fundamental definitions of probability theory to perform some probabilistic computations.

Problem: Roleplaying

In tabletop games such as Dungeons and Dragons, players roll a variety of polyhedral die each with a certain number of sides. To express the number of such dice rolled, we use the notation $x d y$ to refer to rolling $x$ dice with $y$ sides. For example $1 d 6$ refers to rolling one six-sided die (with values 1--6). In contrast, $2 d 8$ means rolling two eight-sided dice (with values 1--8).

Give combinatorial descriptions for each of the following values:

The probability of succeeding at a medium ability check in Dungeons and Dragons (with no additional modifiers). To succeed at a medium ability check, the player rolls $1 d 20$ and succeeds if they get a 15 or higher.
The probability that the player succeeds a medium ability check with disadvantage. A player has disadvantage if the ability check is made under circumstances unfavorable to the player. When a player has disadvantage, in contrast, they roll $2 d 20$ and take the lower of the two rolls.

(Hint: consider what rolls lead to player success in this scenario.)
The probability of succeeding a medium ability check with advantage. A player has advantage if the ability check is made under circumstances that are favorable to the player. When a player has advantage, they roll $2 d 20$ and take the higher of the two rolls.

(Hint: you might find it easier to reason about the situations in which a player fails, instead.)
The probability of resisting a disintegrate spell. Consider a level 6 disintegrate spell. To resist the spell, the victim rolls a single $d 20$ and then adds their dexterity modifier $d$ to the roll. This is compared to the modified spell-casting ability of the caster, calculated as follows:

$8 + spellcasting profiency bonus + intelligence modifier .$

Let $s$ be the spellcasting bonus and $i$ be the intelligence modifier. The victim resists the spell if their modified roll is higher than the modified spellcasting ability of the caster.
The probability that a creature survives a magic missile spell cast at level 2. Magic missiles deals $2 d 4 + 2$ , i.e., two $d 4$ s plus 2 additional damage, to a target (without a chance for resisting or saving). Consider two separate cases, the victim with:
- 7 hit points.
- $h$ hit points, i.e., an arbitrary number of hit points.
The creature survives if the damage dealt by the spell is less than its remaining hit points.

(Hint: the space of possible damage values is small enough to hand-calculate. For $h$ hit points, you'll need to define the probability as a piecewise function---what should the probabilities of survival be if $h$ is equal to or less than the minimum damage? The maximum damage?)
The probability of being disintegrated by a disintegrate spell. If a disintegrate spell is not resisted, the victim takes $10 d 6 + 40$ damage. If this damage reduces the victim to $0$ hit points or lower, the target is disintegrated---they cannot be resurrected except through a True Resurrection or Wish spell! Calculate this probability assuming that the dexterity modifier of the victim is $1$ , the spellcasting proficiency bonus of the caster is $4$ , the caster's intelligence modifier is $2$ , and, in two separate cases, the victim has:
- $46$ hit points.
- $99$ hit points.
(Hint: with concrete numbers, you can compute (a) the total number of outcomes possible from the $10 d 6$ roll and (b) the total number of ways that the damage roll exceeds the victim's health. In the second case, you can come up with a formula for all the ways that $10 d 6 + 40$ can result in $99$ or greater damage.)

Problem: Expectations

So far, we have calculated the probability of an event occurring. Suppose that we assign numeric values to each possible event, e.g., an amount of money won if you have a certain hand. We can then use these probabilities to compute the average value of an experiment, called its expected value or expectation.

To formalize this notion, we can create a function $f : P (Ω) \to R$ be an interpretation function that takes an event as input and then produces its numeric "value" as output. If $p : P (Ω) \to R$ is the probability of an event occurring, then the expected value of the experiment is given by:

$E \in P (Q) \sum f (E) \cdot p (E) .$

In other words, the expectation is the weighted average of all the possible events of the experiment.

Suppose that we draw a poker hand at random and receive the following pay-out depending on the outcome:
- $1 if we draw a pair.
- $10 if we draw a full house.
- $100 if we draw a flush.
If none of these occurrences happen, we lose $3 dollars. Compute the expectation of this game.
Now consider our Dungeons and Dragon example again. To hit a monster in combat, the player must first roll $1 d 20$ to determine if they hit the monster. They hit the monster if the roll equals to or exceeds the monster's armor class. If they do hit the monster, they roll a certain number of additional dice to determine the amount of damage dealt to the monster.

Let's say for the purposes of the problem that the player is equipped with two daggers so they roll $2 d 4$ damage if they hit and the monster in question has an armor class of 12. What is the expected amount of damage dealt to the monster by the player in a single attack?

Conditional Probability

So far, we've considered probability computations in the absence of additional information. However, how does knowledge of one event influence the probability of another event occurring? By modeling this phenomena with conditional probabilities, we can begin to model the notion of learning where the discovery of new information influences our current knowledge. This is the basis of modern machine learning techniques that are so prevalent in modern-day computing.

Definition (Conditional Probability)

The conditional probability of an event $A$ given that an event $B$ has occurred, written $Pr (A ∣ B)$ is:

$Pr (A ∣ B) = \frac{Pr ( A \cap B )}{Pr ( B )} .$

We can pronounce $Pr (A ∣ B)$ as the probability of event $A$ occurring given that $B$ has occurred. This is a sort of implication, but for probabilities.

For example, consider the random value $X$ representing the sum of rolling two six-sided dice. Here are all the possible outcomes of $X$ :

$X = 2$ , 1 possibility: $1 + 1$ .
$X = 3$ , 2 possibilities: $1 + 2$ , $2 + 1$ .
$X = 4$ , 3 possibilities: $1 + 3$ , $3 + 1$ , $2 + 2$ .
$X = 5$ , 4 possibilities: $1 + 4$ , $4 + 1$ , $2 + 3$ , $3 + 2$ .
$X = 6$ , 5 possibilities: $1 + 5$ , $5 + 1$ , $2 + 4$ , $4 + 2$ , $3 + 3$ .
$X = 7$ , 6 possibilities: $1 + 6$ , $6 + 1$ , $2 + 5$ , $5 + 2$ , $3 + 4$ , $4 + 3$ .
$X = 8$ , 5 possibilities: $2 + 6$ , $6 + 2$ , $3 + 5$ , $5 + 3$ , $4 + 4$ .
$X = 9$ , 4 possibilities: $3 + 6$ , $6 + 3$ , $4 + 5$ , $5 + 4$ .
$X = 10$ , 3 possibilities: $4 + 6$ , $6 + 4$ , $5 + 5$ .
$X = 11$ , 2 possibilities: $5 + 6$ , $6 + 5$ .
$X = 12$ , 1 possibility: $6 + 6$ .

The probability of $X = 8$ is $Pr (8) = \frac{5}{36}$ . However, what if we know that the first die is a $3$ ? Then, we only consider the dice rolls where the first die $x$ is a $3$ :

$3 + 1, 3 + 2, 3 + 3, 3 + 4, 3 + 5, 3 + 6$

Of these six possibilities, only one results in a sum of $8$ , so we have that $Pr (8 ∣ x = 3) = \frac{1}{6}$ . Alternatively, we can calculate this directly using the definition of conditional probability:

$Pr (8 ∣ x = 3) = \frac{Pr ( 8 \cap x = 3 )}{Pr ( x = 3 )} = \frac{\frac{1}{36}}{\frac{6}{36}} = \frac{1}{6} .$

Independence

In the example above, knowing that the first dice is a 3 influences the probability that their sum is 8. We say that the two events---"the first dice is a 3" and "the sum of the two dice is 8"---are dependent on each other. However, we have an intuition that some events are independent of each other. For example, consider the two events:

$E_{1}$ = "The first dice is a two."
$E_{2}$ = "The second dice is even."

$P r (E_{1}) = \frac{6}{36}$ since there are six possibilities for the second dice when the first is fixed to two. $P r (E_{2}) = \frac{3 \cdot 6}{36}$ since there are 3 possibilities for the second dice to be even and then 6 possibilities for the first dice once the second has been fixed. However, since we believe that $E_{1}$ is independent of $E_{2}$ that $Pr (E_{1} ∣ E_{2}) = Pr (E_{1})$ , i.e., knowledge of $E_{2}$ does not change the probability of $E_{1}$ .

We formalize the notion of independence in probability theory as follows:

Definition (Independence)

We say that two events $E_{1}, E_{2} \subseteq Ω$ are independent if $Pr (E_{1} \cap E_{2}) = Pr (E_{1}) \cdot Pr (E_{2})$ .

That is, independence is the condition necessary for us to apply the combinatorial product rule to probabilities. If two events are independent, then we can reason about their probabilities in sequence.

Proof

Claim: If two events $E_{1}, E_{2} \subseteq Ω$ are independent, then $P r (E_{1} ∣ E_{2}) = E_{1}$ .

Proof. By the definition of conditional probability and independence:

$Pr (E_{1} ∣ E_{2}) = \frac{Pr ( E _{1} \cap E _{2} )}{Pr ( E _{2} )} = \frac{Pr ( E _{1} ) Pr ( E _{2} )}{P r ( E _{2} )} = Pr (E_{1}) .$

A similar argument holds for $Pr (E_{2})$ as well.

Bayes' Theorem

Is the probability $Pr (A ∣ B)$ related to $Pr (B ∣ A)$ in any way? We can use the definition of conditional probability to explore this idea:

$Pr (A ∣ B) = \frac{Pr ( A \cap B )}{Pr ( B )} Pr (B ∣ A) = \frac{Pr ( B \cap A )}{Pr ( A )} .$

But set intersection is symmetric, so we have that:

$Pr (A \cap B) = Pr (A ∣ B) Pr (B) = Pr (B ∣ A) Pr (A) .$

But now we can remove $A \cap B$ entirely from discussion and reason exclusively about conditional probabilities. This insight leads us to Bayes' Theorem:

Theorem (Bayes' Theorem)

For any events $A, B \subseteq Ω$ :

$Pr (A ∣ B) = \frac{Pr ( B ∣ A ) Pr ( A )}{Pr ( B )} .$

Bayes' Theorem allows us to talk concretely about our updated belief of an event $A$ occurring given the new knowledge that $B$ occurred. A classical example of this concerns drug testing. Suppose that we have a drug test that has the following characteristics:

The true positivity rate of the drug test is 95%. This is the rate at which the drug test reports "yes" when the drug is actually present.
The true negativity rate of the drug test is 90%. This is the rate at which the drug test reports "no" when the drug is not present.

Furthermore, suppose that we assume that 1% of people use this drug. What is $Pr (user ∣ pos)$ the probability that a person is a user of a drug given that they tested positive? By Bayes' Theorem, this quantity is given by:

$Pr (user ∣ pos) = \frac{Pr ( pos ∣ user ) Pr ( user )}{Pr ( p os )} .$

What are these various probabilities on the right-hand side of the equation?

$Pr (pos ∣ user)$ is the probability of a test reporting positive when the person is actually a user. This is precisely the true positivity rate, 0.95 in our case.
$Pr (user)$ is the probability that a person is a user of the drug, assumed to be 1% in our example.
$Pr (pos)$ is the probability that a given test is positive.

We don't have immediate access to this last value. However, we can reconstruct it using the probabilities that we have! We observe that the following equality holds:

$Pr (pos) = Pr (pos \cap user) + Pr (pos \cap non-user)$

Because every person is either a user or non-user of a drug. We can then use the definition of conditional probability to rewrite the equation in terms of the conditional probabilities that we know:

$= Pr (pos \cap user) + Pr (pos \cap non-user) Pr (pos ∣ user) Pr (user) + Pr (pos ∣ non-user) Pr (non-user)$

The probability $Pr (pos ∣ non-user)$ is the false negativity rate, which is $1 - 0.90 = 0.10$ .

Putting all of this together, we obtain:

$Pr (user ∣ pos) = = = \frac{Pr ( pos ∣ user ) Pr ( user )}{Pr ( pos ∣ user ) Pr ( user ) + Pr ( pos ∣ non-user ) Pr ( non-user )} \frac{0.95 \cdot 0.01}{0.95 \cdot 0.01 + 0.10 \cdot 0.99} 0.0876$

In other words, the probability of a positive test given the person is a drug user is only 8.8%! Since many more people are non-users than users, it is more important that our drug functions correctly in the negative cases rather than the positive cases. To see this, observe that the drug always reporting "yes" would result in more false claims than the situation where the drug always reports "no." This is because, by default, there are many more "no" cases than there are "yes" cases.

Exercise (False Negativity, ‡)

redo the drug test calculation two more times:

In the first, raise the true positivity rate to 100%.
In the second, raise the true negativity rate to 95%.

Which calculation produces the better probability for $Pr (pos ∣ user)$ ?

Inference

Recall Bayes's theorem:

$Pr (A ∣ B) = \frac{Pr ( B ∣ A )}{Pr ( B )} \cdot Pr (A)$

Observe that we can interpret Bayes's theorem as a way of expressing an updated belief about the probability of $A$ given that we learn some new information $B$ .

$Pr (A)$ is the original probability, called the prior belief of $A$ .
$Pr (A ∣ B)$ is the updated probability, called the posterior belief of $A$ given $B$ occurs.

To obtain the posterior belief, we multiply the prior belief by:

$Pr (B ∣ A)$ , the likelihood, the probability of the new information occurring given $A$ occurs.
$Pr (B)$ , the prior probability of the new information.

We can apply this interpretation to the classification problem to obtain a simple algorithm, the Naive Bayes algorithm, for classifying data according to a training set. For example, consider the following table of data that records when we are likely to go to class given that certain circumstances occur:

Time of Day	Partied Yesterday?	Test Today?	Hungry?	Went to Class?
Early	Yes	No	Yes	No
Early	No	No	No	No
Early	No	No	No	Yes
Late	Yes	Yes	Yes	Yes
Late	No	No	Yes	No
Early	Yes	Yes	No	Yes
Early	No	No	Yes	Yes
Late	No	No	No	No
Late	Yes	No	Yes	Yes
Late	No	Yes	Yes	Yes

We'll call an individual row of this table a piece of data. The first four columns are:

"Time of day", either early or late
"Partied yesterday?", either yes or no.
"Test today?", either yes or no.
"Hungry?", either yes or no.

We'll call these columns the features of the data. To simplify our subsequent calculation, we'll let all our features be two-valued, i.e., boolean. But in general, we can consider other types of values as well. We can represent these features as a 4-tuple, e.g., $(Early, Yes, No, Yes)$ for the first entry of the table.

The final column, "Went to Class?", is the class of the data. For example, the class of the first entry of the table is "No," i.e., when it was early, we partied, there was no test, and we were hungry, we recorded that we did not go to class.

The table above represents our training data. Given this training data, a classification algorithm is computes the most likely class for a new piece of data. For example, if we see the data:

$(Early, No, No, Yes)$

Is it more likely we'll go to class or not in this scenario?

Problem 1: Working Out the Theory

First, let's see how Bayes Theorem applies to this situation. We'll let $A$ be the result of the classifier, i.e., whether we go to class, and $B$ will be the data fed to the algorithm, i.e., the 4-tuple. If we let $X$ be the 4-tuple and $c$ be the class, then we have according to Bayes's theorem:

$Pr (c ∣ X) = \frac{Pr ( X ∣ c )}{Pr ( X )} \cdot Pr (c)$

Note that $c$ can be either "Yes" or "No", so we can calculate the probability of either class for our data $X$ by instantiating $c$ appropriately. Ultimately, our classifier to compute each of these probabilities utilize the probabilities $Pr (Yes ∣ X)$ and $Pr (No ∣ X)$ . Our classifier chooses the class corresponding to the higher of the two probabilities as the class to assign to $X$ .

In terms of notation, the $arg max$ function captures this idea succinctly. If $C = {"Yes", "No"}$ is our set of classes, then this computation is:

$c \in C arg max \frac{Pr ( X ∣ c )}{Pr ( X )} \cdot Pr (c)$

In other words, we choose the class $c$ that maximizes the probability described by Bayes's equation.

Next, we must figure out how to compute each of these probabilities in terms of our training data. But before we do that, let's try to simplify the formula to avoid having to compute too much stuff. One observation for simplification is that computing $Pr (X)$ is unnecessary, so we can instead compute:

$c \in C arg max Pr (X ∣ c) Pr (c)$

In a sentence or two, describe why we don't need to compute $Pr (X)$ in our $arg max$ calculation.

(Hint: think about for each $c$ what $Pr (X)$ represents and whether it depends on the particular $c$ in question.)

Problem 2: Independence and Computation

The above calculation now demands that we compute two quantities for each class $c$ :

$Pr (c)$ : the probability that the class occurs in the training data.
$Pr (X ∣ c)$ : the likelihood of the data given the class.

We can employ the definition of conditional probability which says that:

$Pr (X \cap c) = Pr (X ∣ c) \cdot Pr (c)$

Employed in this fashion, we call this the probability chain rule.

Therefore, we are really computing $Pr (X \cap c)$ , the probability that we have data $X$ and class $c$ at the same time. Recall that $X$ is really a tuple of sub-events, $X = (x_{1}, x_{2}, x_{3}, x_{4})$ . With this in mind, we can then repeatedly apply the chain rule, observing that intersection of events is commutative:

$Pr (X ∣ c) \cdot Pr (c) = Pr (c) \cdot Pr (x_{1} ∣ c) \cdot Pr (x_{2} ∣ c, x_{1}) \cdot Pr (x_{3} ∣ c, x_{2}, x_{1}) \cdot Pr (x_{4} ∣ c, x_{3}, x_{2}, x_{1}) .$

While this, in theory, allows us to compute $Pr (X ∣ c)$ , it is onerous to do. We need to know the conditional probability of each feature given the others.

To simplify the computation, let's assume that the features $x_{1}, \dots, x_{4}$ are independent from each other, i.e., their probabilities do not depend on each other. This leads us to the following computation:

$Pr (X ∣ c) = Pr (x_{1} ∣ c) \cdot Pr (x_{2} ∣ c) \cdot Pr (x_{3} ∣ c) \cdot Pr (x_{4} ∣ c) .$

$Pr (x_{1} ∣ c)$ , for example, is the probability in our training data that it was early (or not) given that we went to class (or not). These probabilities are much easier to compute!

In summary, we now have the following computation to perform:

$c \in C arg max Pr (c) \cdot Pr (x_{1} ∣ c) \cdot Pr (x_{2} ∣ c) \cdot Pr (x_{3} ∣ c) \cdot Pr (x_{4} ∣ c) .$

And then we choose the class $c$ that maximizes this probability.

Go through the training data and hand-calculate the various probabilities you need to carry out this computation. It'll be useful to organize these probabilities into various tables that you can look up easily. Once you have done this, determine what class your resulting classifier assigns the following data:

$(Early, No, No, Yes)$

Create three other test data points and carry out the classifier on those data as well, showing the computation you performed.

Counting

Our reading on countability can be found on Profs. Autry's and Liu's CSC 208 website:

Counting (CSC 208 (24sp))

Lab: Countability Proofs

Our lab on countability proofs can be found on Profs. Autry's and Liu's CSC 208 website:

Countability Proofs (CSC 208 (24sp))

CSC 208: Discrete Structures