Suppose that we have a collection \(I = \{ 1, \ldots, k \}\) of \(k\) items where each item has positive, integral value \(v_i\) and weight \(w_i\). The goal of the classic *knapsack problem* is to fill a backpack with capacity \(C\) (a positive integer) with a subset of the items from \(I\) such that the sum of the *weights* of the chosen items does not exceed \(C\). In the decision variant of this problem, we attempt to fill the knapsack so that the sum of values of the chosen items is equal to some target value \(t\). For this problem, we’ll instead consider the maximization variant of this problem where we attempt to fill the knapsack so that the sum of the values is maximized while respecting the capacity of the knapsack.

As we discussed in class this week, the knapsack problem is clearly expressible as an 0-1 integer program and as such is known to be \(\mathsf{NP}\)-hard. In this demonstration exercise, we’ll develop an approximation algorithm for this problem by considering a number of *greedy* approaches.

# Problem 1: By Value

First, let’s consider a simple algorithm utilizing the values of the items:

First remove any items that do not fit in the knapsack,

i.e., an item \(i\) such that \(w_i > C\). Then, order the items in terms ofdecreasing value(most valuable item first) and add items to the knapsack until the knapsack’s capacity would be exceeded by adding another item.

Give an example instance of the knapsack problem where this algorithm *fails* to find the optimal solution. Also give the solution generated by the algorithm and what the optimal solution is.

# Problem 2: By Ratio

Obviously, a greedy algorithm that orders by weight instead of value would be a non-starter. However, what if we instead consider the *ratio* of value to weight for each item instead?

First remove any items that do not fit in the knapsack,

i.e., an item \(i\) such that \(w_i > C\). Then, order the items in terms ofdecreasing value-weight ratio,i.e., \(\frac{v_i}{w_i}\) for item \(i\), and add items to the knapsack until the knapsack’s capacity would be exceeded by adding another item.

Give an example instance of the knapsack problem where this algorithm also *fails* to find the optimal solution. Give the solution generated by the algorithm and what the optimal solution is.

# Problem 3: Surprise

The by-ratio algorithm seems to be superior to the by-value algorithm. However, it turns out this is not always the case!

Give a *pair* of example instances, one where the by-ratio algorithm outperforms the by-value algorithm and one where the by-value algorithm outperforms the by-ratio algorithm. Again, report the solutions generated by each algorithm and the optimal solutions for each of the instances.

(*Hint*: you may not have to look hard to find these examples—check problems 1 and 2 again!)

# Problem 4: Why Not Both?

The previous problem suggests an intriguing approach: why not try both algorithms and take the better of the two?

Run the by-value and by-ratio algorithms and use the solution that generates the higher value.

For the remainder of this demo, we’ll work to prove that this algorithm is a *2-approximation*, *i.e.*, the result is no smaller than \(\frac{1}{2}\) from the optimal solution.

First, let’s make sure that we aren’t wasting our time. Since the knapsack problem is \(\mathsf{NP}\)-hard, we expect a brute-force algorithm to take an exponential amount of time. State and prove the runtime of our (supposed) 2-approximation algorithm for knapsack.

# Problem 5: Just a Bit

To begin our proof of correctness, let’s consider the by-ratio algorithm first. Let’s consider \(I_\mathsf{r}\) to be the *sequence* of the \(k\) items, \(I_\mathsf{r} = 1, \ldots, k\) where the items are ordered by decreasing ratio. That is, for any \(i < k \in I_\mathsf{r}\), it is the case that:

\[ \frac{v_i}{w_i} \leq \frac{v_{i+1}}{w_{i+1}}. \]

The by-ratio algorithm chooses elements of \(I_\mathsf{r}\) in left-to-right order until the knapsack is filled. Let \(n \leq k\) be the index that the by-ratio algorithm ends on so the solution generated by the algorithm is \(S_\mathsf{r} = 1, \ldots, n\). Thus, the value of the optimal solution given by the by-ratio algorithm is \(\mathsf{OPT}_r = \sum_{i = 1}^n v_i\). Finally, let \(\mathsf{OPT}\) be the *true* optimal solution to the knapsack instance.

Prove that if \(n = k\), then \(\mathsf{OPT}_r = \mathsf{OPT}\).

Suppose \(n < k\). Prove that \(\sum_{i = 1}^{n+1} v_i ≥ \mathsf{OPT}\).

(

*Hint*: think about what the collection of elements from \(i\) to \(n+1\) of \(I_r\) represents and, in particular, what the \(n+1\) element represents with respect to the optimal solution.)

# Problem 6: Bounds

Now, let’s think about the by-value algorithm in relation to the by-ratio algorithm. Suppose the by-ratio algorithm produces a solution \(S_r\) where \(n \leq k\) is the index the element ends on. Let \(\mathsf{OPT}_v\), be the optimal value of the by-value algorithm. Prove that \(v_{n+1} \leq \mathsf{OPT}_v\).

# Problem 7: Putting It Together

Now, let’s put together the facts from the previous two parts together to complete the argument.

- Use the previous two parts to show that \(\mathsf{OPT}_r + \mathsf{OPT}_v \geq \mathsf{OPT}\).
- Argue in a few sentences why this fact implies that the algorithm from problem 4 is indeed a 2-approximation of the Knapsack problem.