Sets and Their Operations
As computer scientists, we aim to use mathematics to model computational phenomena in order to
- Precisely understand how the phenomena works (usually for the purposes of implementing that phenomena as a computer program) and
- Abstract differences between seemingly unrelated phenomena to discover how they are related.
Very often, these computations involve collections of data, for example, a class management system might store a collection of students, or a server might need to track the set of computers it can directly communicate with on the network. Furthermore, these computations can frequently be phrased in terms of manipulations of or queries over these data, for example, generating pairs of students for a class activity, or finding if there exists a series of computers that connect the server to some other computer on the network.
To model data, we frequently resort to mathematical sets.
Set theory is the branch of mathematics that studies how we construct and manipulate sets. Because virtually any mathematical model includes data of some sort---integers, real numbers, or more abstract objects---sets form the basis of formal mathematics. In addition to using sets directly in our models, we’ll also see sets in every other field of mathematics. In this unit, we briefly explore the field of set theory with an eye towards building up a working understanding of what set theory provides us and what questions we can answer by framing our problems in terms of sets.
Specifying Sets of Objects
When discussing sets, we must first define the universe under consideration.
The universe or domain of discourse (LaTeX: \mathcal{U}) is the collection of elements that we may include in our sets.
As a concrete example, consider representing a collection of students at Grinnell using sets. We can, therefore, consider our universe to be all students at the college. We typically denote the universe under consideration by the variable , so we might say:
Let be the collection of all students at the college.
In many cases, the universe may be inferred by context, e.g., if our sets contain integers, then we can likely infer from context that is the collection of integers. However, regardless of whether is stated explicitly or inferred, we must keep it in mind as many of our operations over sets will require this knowledge.
With our universe defined, we may now define sets over this universe. For example, for simplicity's sake, suppose that the college only contains five students: Jessica, Sam, Phillip, Jordan, and Elise. We might specify a set containing the first three students as follows:
We say that Jessica, Sam, and Phillip are all elements of the set .
The elements of the set are surrounded by curly braces.
(Note that in LaTeX, you will need to escape the curly braces, i.e., \{ ... \}.
You may also consider putting in an explicit thin space, \,, between the braces, i.e., \{\, ... \,\}.)
The primary query that we can ask of a set is whether a set contains a particular element. For example, Jessica is an element of or, more colloquially, we say that Jessica is in . We can think of set inclusion as a proposition, between a potential element of a set and a set.
We say that value is in a set if , written (LaTeX: \in), is an element of .
With this, we can write the inclusion relationship between Jessica and as follows:
Likewise, we know that Elise is not in .
Like how we write (LaTeX: \neq) to say that two values are not equal, we write (LaTeX \notin) to say that an element is not in a set.
For Elise, we would then write
The order of the elements does not matter in a set, so the following set
is equivalent to because they contain exactly the same elements. As mentioned previously, sets only contain unique elements, so our sets cannot duplicates. From this definition, we cannot have a set that, for example, contains Jessica twice.
Set Comprehensions
An alternative to explicitly enumerating all the elements of a set, we can also specify a set by way of a set comprehension. For example, the following set:
is equivalent to the set .
A set comprehension is broken up into two parts separate by a pipe () (LaTeX: \mid).
To the left of pipe is the output expression which normally involves one or more variables.
To the right of the of the pipe is a collection of qualifiers that refine which elements are included in the set.
There are two sorts of qualifiers we might include:
- Generators describe what values the variables of the output expression take on. In the example above, we quantify that is drawn from our universe by stating . This quantification is implicitly universal in nature: we really intend however, we traditionally do not include the extra verbiage of the symbol. (Note that we can think of our universe itself as a set!)
- Predicates describe conditions that must hold of the values involved to include it in the set. For an element to be included in the set, it must satisfy all of the predicates of the comprehension.
By combining the generator and the predicate of , we see that will contain all the students of the college whose name starts with "J".
Note that we frequently elide the generator from our set comprehensions when it is clear from context what the variable ranges over. For example, the predicate already describes as a student, so it is unnecessary to express that it is also a member of . We can more concisely write as follows:
Set comprehensions are a powerful, flexible method for describing the members of a set. To illustrate this, let's consider another simple universe, the universe of natural numbers. The natural numbers consist of zero and the positive integers. First, let's define a set over this universe, e.g.,
Now let's consider specifying some more complicated set comprehensions. For example, we may denote the sets of even and odd numbers in the range zero through ten as:
In contrast, the following set comprehension contains a non-trivial expression:
For each , contains the value . Thus, .
When a comprehension includes a single variable, the set ranges over all the elements from the variable is drawn from. When a comprehension includes multiple variables, the set ranges over all the possible combinations of values from which the variables are drawn. For example, consider the following set:
Is equivalent to the larger set:
First, let's consider the expression portion of the comprehension. The expression consists of a pair of elements , so we expect the elements of to be pairs. Furthermore, and are drawn from the sets and , respectively, so we expect the pairs to contain natural numbers.
In effect, by including two generators, we consider all possible pairs of elements drawn from the two generators. We can alternatively interpret how these set comprehension "compute" their elements through a series of nested for-loops. In effect, the comprehension computes:
result = []
for n_1 in S_even:
for n_2 in S_odd:
pair = (n1, n2)
result.append(pair)
In general, when we have any collection of generators, we can think of them as a collection of nested for-loops where the body of the loop includes a new element in the set according to the output expression.
Standard Sets of Objects
Very often, our sets a drawn from a standard universes, usually over numbers.
Rather than repeating the description of these sets, we denote them using the following variables written in blackboard font (LaTeX: \mathbb{...}):
Finite and Infinite Sets
In the situation where our set contains a finite number of elements, we denote the size of a set by , the set surrounded by pipes. For example, the size of , the set of pairs of even and odd elements drawn from the range zero through ten is: .
However, a set need not contain a finite number of elements. The universes of numbers we discussed previous all contain an infinite number of elements. However, we can also directly construct sets that are also infinite in size. For example, consider the set:
contains all natural numbers that are a multiple of 10. There is clearly an infinite amount of such numbers, but this poses no problem for defining what the set contains. As we shall see, whether a set is finite or infinite does not change the behavior how the basic operations over sets that we consider next. However, infinite sets pose some significant conundrums for set theory that will briefly explore at the end of this chapter.
Write down formal set comprehensions for each of the following descriptions of sets:
- The set of all natural numbers that are either less than five or greater than 20.
- The set of all pairs of integers such that the sum of the pair of numbers is equal to zero.
- The set of all real numbers that are also positive integers.
Set Operations
With our basic definitions for sets—inclusion and set comprehension—we can define the fundamental operations over sets.
Union and Intersection
The union of two sets and , written , produces a set that contains all of the elements drawn from either of these sets. For example, if and then (keeping in mind that duplicates are discarded with sets).
In contrast, the intersection of two sets and , written , produces a set that contains the elements that are found in both and . For example, if and then .
The intersection of two sets and , written (LaTeX: \cap) is defined as:
Note the parallels between the definitions of these set theoretic operations and the logical connectives we explored earlier:
- Set union is defined in terms of logical disjunction .
- Set intersection is defined in terms of logical conjunction .
This is no coincidence! We can think of union and intersection as the set-theoretic realization of logical disjunction and conjunction, respectively. Because they are defined directly in terms of their logical counterparts, union and intersection behave similarly to them as well.
Difference and Complement
The difference of two sets and , written , produces a set that contains the elements of that are not also in . For example, if and then . Note that and are removed from the difference since they are in .
The complement of a set , written , is the set of elements that are not found in this set. Note that this requires knowledge of what our universe is in order to constrain what elements are not in the set in question. Say that our universe is over the finite set . Then if , then . In contrast, if we expand our universe to be the natural numbers, e.g., , then . Formally, we can write this as:
The complement of a set , written (LaTeX: \overline{...}) is defined as:
Note that set complement is defined in terms of logical negation and is, thus, strongly identified with logical negation in the same way union and intersection identify with conjunction and disjunction. There is not a direct analog to set difference in logic, but we can see that difference can be written in terms of the other three operators via a direct translation of its formal definition:
Consider the following sets:
Demonstrate the equivalence of set difference's definition with the equivalent formulation in terms of intersection and complement by (a) deriving in terms of the formal definition of set difference and (b) checking the equivalence on this example by "executing" and observing that you obtain identical results.
Cartesian Product
The Cartesian product of two sets and , written , is the set of all the possible pairs of elements drawn from and . The first element of these pairs is drawn from , and the second element of these pairs is drawn from . It is a bit easier to see how the Cartestian product works by using sets of abstract symbols rather than numbers. First, let's consider and . The Cartestian product of these two sets is:
Note that each element of is paired with each possible element of . When writing out the Cartesian product, it’ll be useful to do so in a systematic, grid-like manner like above where each row corresponds to a choice of element from and each column corresponds to a choice of element from .
With this in mind, we can return to our universe of natural numbers and consider our canonical sample sets. For example, if and then:
Note here that elements shared in common between and are not discarded like with disjunction. This is because such elements always result in unique pairs in the resulting set.
The Cartesian product of two sets and , written (LaTeX: \times) is defined as:
Subsets and Set Equality
A set is a subset of another set if all the elements of the first set are contained in the second set. For example, if we have and , then is a subset of , written . In contrast, is not a subset of , written . Note that this is not an operation over sets (that produces another set) but, rather, a proposition over sets (that is potentially provable).
The basic proposition we can assert about a set is whether an element is found inside the set, written . We can use this inclusion proposition to formally define subset:
In other words, if is a subset of then every element of must also be an element of .
A proper subset, written (LaTeX: \subset), is where but .
Note that when we say "subset", we will implicitly mean "subset-or-equal" and use the term "proper subset" to denote this case where we require that the two sets are also not equal.
If we know that and are numbers and and , we know that . This is because if and cannot both be less than each other; they must, therefore, be equal to each other. Likewise, if we know that and , we know that and must be equal.
This realization gives us an alternative definition of set equality in terms of subsets.
We say that sets and are equal if and only if they are subsets of each other. In other words:
This definition, in turns, gives us a principle for reasoning about the equality of sets, the so-called double inclusion principle, that we will later use to prove that two sets are equal.
Power Set
Finally, the power set of a set is a set that contains all the subsets of .
Note in our formal definition that implicitly is a set because it is, by definition, a subset of . This is not a problem! Sets can certainly contain other sets which allows us to create more complex structures.
As an example of the power set operation, let . Then:
To make it easier to enumerate all the possible subsets of in a systematic way, we arrange them in order of size.
-
There is one subset of size zero, the set containing no elements, e.g., the empty set. We can write the empty set using set literal notation, but we traditionally use the empty set symbol (LaTeX:
\emptyset) to denote the empty set. -
There are four subsets of size one corresponding to the singleton sets, each containing of the elements of .
-
There are six subsets of size two.
-
There are four subsets of size three.
-
Finally, there is a single subset of size four which is itself (because it has size four).
In this particular case, this results in 16 subsets overall. You can imagine that the number of such subsets grows dramatically as the size of the input set increases. We will revisit this point in our discussion of counting.