\[ \newcommand{\set}[1]{\{\, #1 \,\}} \newcommand{\card}[1]{|#1|} \newcommand{\pset}[1]{\mathcal{P}(#1)} \newcommand{\falling}[2]{(#1)_{#2}} \]

*Uncertainty* is a fundamental part of life. For example,
imagine interviewing for an internship position. We ask ourselves,
instinctually, *what are the chances that I will get the job?*
Perhaps you feel like your chances are higher that day because you slept
and ate well that morning. But maybe you know the company uses a
programming language you arenâ€™t entirely comfortable with. You weigh
these factors and arrive at an intuition of the *likelihood* that
the interview is successful. However, you know that even though you feel
like your chances are high, you may still not get the job.

How do we model this uncertainty? Up until this point, everything
that we have done has been *deterministic* in nature,
*i.e.*, there has only been a single, definite outcome of an
event, whether that is evaluating a program or simplifying an
arithmetic expression. However, uncertainty introduces the need to
consider multiple possible outcomes arising from a single event. Can we
precisely define what it means for one of these events to be more likely
to occur?

*Probability theory* models uncertainty by capturing our
intuition that some events are more likely to occur than others. In this
chapter, we'll study probability theory as an application of counting.
If we can count the number of occurrences of an event, we can assign
a *probability value* that describes the likelihood of that
event. For computer scientists, probability theory is particularly
important for two reasons:

- Because uncertainty is a natural phenomenon, we need ways of capturing and reasoning about uncertainty to accurately model real-world objects.
- Uncertainty holds special potential for algorithmic design. Can we trade certainty like a resource to make our algorithms more efficient?

You likely have seen probability computations in your pre-collegiate math education. The probability of a particular event occurring is computed using the following formula:

\[ \frac{\text{Number of times that event occurs}}{\text{Total number of possible events}}. \]
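This counting formula can be sketched directly in Python. The example event below, a single fair six-sided die showing an even number, is our own illustration rather than one from the text:

```python
from fractions import Fraction

# All outcomes of rolling a single fair six-sided die.
outcomes = list(range(1, 7))

# Event: the die shows an even number.
event = [x for x in outcomes if x % 2 == 0]

# Probability = (number of outcomes in the event) / (total outcomes).
probability = Fraction(len(event), len(outcomes))
print(probability)  # 1/2
```

Using `Fraction` keeps the ratio exact rather than approximating it with floating-point arithmetic.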

In this reading, we'll introduce the fundamental definitions of this
*frequentist* perspective on probability theory as well as some
key concepts: expectation and conditional probabilities. We'll only
scratch the surface of probability theory in this course. I highly
recommend pursuing additional course work in this area, *e.g.*,
STA 209, because probability theory is becoming increasingly important
for *all* computer scientists to understand in a world where
statistical and machine learning-based techniques are gaining
prevalence.

# The Foundations of Frequentist Probability Theory

The probability of an event occurring can only be described relative
to the other events that could occur. We thus define a *sample
space* that captures all the possible outcomes under consideration.

**Definition (Sample Space)**: a *sample space*,
\(\Omega\), is the set of all possible
outcomes of an experiment. Such a sample space is considered
*discrete* if \(\Omega\) has
finite cardinality. Otherwise, the sample space is considered
*continuous*.

In this class, we focus exclusively on discrete probability. It is in the title of the class, after all.

**Example**: the sample space of an experiment where we
flip a pair of coins is denoted by:

\[ \Omega_{1} = \set{(H, H), (H, T), (T, H), (T, T)}. \]

The sample space of an experiment where we roll three six-sided dice is denoted by:

\[ \Omega_{2} = \set{(x, y, z) \mid x, y, z \in \set{1, \ldots, 6}}. \]

Note that the sample space captures precisely the set of possible
outcomes. Other outcomes, by definition, are not under consideration,
*e.g.*, the coins landing on their sides, unless they are
included in \(\Omega\).
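Both sample spaces above are small enough to enumerate directly. Here is a sketch in Python using `itertools.product`; the variable names are our own:

```python
from itertools import product

# Omega_1: all ordered outcomes of flipping a pair of coins.
omega1 = set(product("HT", repeat=2))

# Omega_2: all ordered outcomes of rolling three six-sided dice.
omega2 = set(product(range(1, 7), repeat=3))

print(len(omega1))  # 4
print(len(omega2))  # 216, i.e., 6^3
```

Note that the outcomes are ordered tuples, so \((H, T)\) and \((T, H)\) count as distinct outcomes, just as in the definition of \(\Omega_1\).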

Formally, an *event*, \(E \subseteq
\Omega\), describes a set of possible *outcomes* of a particular
experiment.

**Example**: the event describing when we obtain at
least one head in two coin flips is denoted by:

\[ E_1 = \set{(H, H), (H, T), (T, H)}. \]

The event describing when the sum of the three dice we roll is exactly 4 is denoted by: \[ E_2 = \set{(x, y, z) \mid (x, y, z) \in \Omega_{2},\, x + y + z = 4}. \]
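Since events are just subsets of the sample space, we can build them by filtering the sample space. A sketch in Python, with variable names of our own choosing:

```python
from itertools import product

# The two sample spaces from the earlier example.
omega1 = set(product("HT", repeat=2))
omega2 = set(product(range(1, 7), repeat=3))

# E_1: at least one head among the two coin flips.
e1 = {outcome for outcome in omega1 if "H" in outcome}

# E_2: the three dice sum to exactly 4.
e2 = {(x, y, z) for (x, y, z) in omega2 if x + y + z == 4}

print(sorted(e1))  # [('H', 'H'), ('H', 'T'), ('T', 'H')]
print(sorted(e2))  # [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```

Enumerating \(E_2\) confirms that only the three orderings of one 2 and two 1s sum to 4.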

With an event formally defined, we can now define its likelihood,
*i.e.*, its probability, through a *probability mass function*. A
probability mass function is a function \(f :
\mathcal{P}(\Omega) \to \mathbb{R}\) that obeys the following
properties:

- \(\forall E \in \mathcal{P}(\Omega).\,f(E) \geq 0\): the probability function produces non-negative probabilities.
- \(f(\Omega) = 1\): the probability that *any* event in the sample space happens is 1.
- If \(E_1, \ldots, E_k\) are pairwise disjoint events, then: \[ f(E_1 \cup \ldots \cup E_k) = \sum_{i = 1}^{k} f(E_i), \] *i.e.*, the probability of the union of a collection of disjoint events is the sum of their individual probabilities. This is *the sum rule* for probability theory.

We take the event as the domain of our probability function rather than a single outcome because we can always represent outcomes as singleton sets of events.

**Example**: if our coins are fair, then we expect each of the
four outcomes of flipping a pair of coins to be equally likely,
*i.e.*, for every outcome \(e \in \Omega_{1}\):

\[ f(\set{e}) = \frac{1}{4}. \]

Now, suppose that we wish to know the probability of obtaining at least one head in two flips. This is denoted by the event:

\[ E = \set{(H, H), (H, T), (T, H)}. \]

And by the sum rule, the probability of this event is:

\[\begin{align} f(E) =&\; f(\{\, (H, H) \,\}) + f(\{\, (H, T) \,\}) + f(\{\, (T, H) \,\}) \\ =&\; \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4}. \end{align}\]
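When every outcome is equally likely, this computation reduces to counting: \(f(E) = \card{E} / \card{\Omega}\). A sketch of such a uniform probability mass function in Python, with `Fraction` keeping the arithmetic exact:

```python
from fractions import Fraction
from itertools import product

# Sample space for flipping a pair of fair coins.
omega = set(product("HT", repeat=2))

def f(event):
    """Uniform PMF: every outcome is equally likely, so by the sum
    rule over singletons, f(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

# E: at least one head in two flips.
e = {outcome for outcome in omega if "H" in outcome}

print(f(e))      # 3/4
print(f(omega))  # 1, as required of a probability mass function
```

Checking `f(omega)` confirms that the second property of a probability mass function holds for this definition.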

**Exercise (Gambling, ‡)**: consider the following
gambling game:

Roll three six-sided dice in sequence. Say that a die
*wins* if it shows a five or a six.

- Write down the sample space \(\Omega\) of possible outcomes.
- Assuming that the dice are all fair, what is the probability that you will win one occurrence of the gambling game?