A greedy algorithm attempts to find a global optimum through iterative local optimisation. The heuristic/rule we use to decide how to proceed at each stage is called the greedy rule, which should be easy to compute.

In this module, we explore 2 practical methods of proving that a greedy algorithm produces an optimal solution:

  • Staying ahead (& the interval scheduling problem)
  • The exchange argument

Note the fact it produces “an” optimal solution, not “the” optimal solution: there may be more than one.

An algorithm which “stays ahead” is at least as good as any other algorithm at each stage of execution, and therefore must be optimal. To demonstrate a “staying ahead” proof we examine the Interval Scheduling Problem.



Input: A set of intervals $\{1, \dots, n\}$ such that $s(i)$ and $f(i)$ refer to the start and finish time of interval $i$ respectively
Output: The largest subset $S \subseteq \{1, \dots, n\}$ such that no two intervals in $S$ are incompatible. (Two intervals $i$ and $j$ are incompatible if and only if $s(i) \le s(j) < f(i)$ or $s(j) \le s(i) < f(j)$, i.e. they overlap)

This can be solved using a greedy algorithm, but we need to define the rules that allow us to implement it. We can sort the intervals by a specific rule, then take intervals in order, providing they are compatible with all intervals already selected.

The rule used to order the intervals can be many things: earliest start time, earliest finish time, or shortest interval all being possibilities.

The only correct and optimal rule is earliest finish time, with the following algorithm:

Interval Scheduling Problem
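A minimal sketch of the Earliest Finish First greedy in Python (representing each interval as a `(start, finish)` tuple is an assumption, not a convention from the notes):

```python
def earliest_finish_first(intervals):
    """Greedy interval scheduling: sort by finish time, then take each
    interval that is compatible with everything selected so far."""
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:  # compatible with all selected intervals
            selected.append((start, finish))
            last_finish = finish
    return selected
```

Because every selected interval starts no earlier than the previous finish, checking compatibility against only the most recently selected interval suffices.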


We can prove this algorithm runs in $O(n \log n)$ time: sorting the intervals by finish time costs $O(n \log n)$, and the single pass over the sorted intervals costs $O(n)$.

We must also prove the correctness of the algorithm, i.e. that the Earliest Finish First algorithm is optimal.



Another common interval problem is the interval partitioning problem: consider the scenario that you are once again arranging a set of intervals $\{1, \dots, n\}$. Interval $i$ has start time $s(i)$ and finish time $f(i)$. Your goal is to find the minimum number of groups the intervals can be split into, such that the intervals within each group are mutually compatible.

Input: A set of intervals $\{1, \dots, n\}$ such that $s(i)$ and $f(i)$ refer to the start and finish time of interval $i$ respectively
Output: The smallest possible collection of sets of compatible intervals covering all of $\{1, \dots, n\}$

Recall the “earliest start time first” rule from earlier. We can prove that while it was incorrect for the interval scheduling problem, it is correct and optimal for interval partitioning:

Interval Partitioning Problem

$$\begin{aligned}
&\text{sort intervals by start time, and renumber them such that } s(1) \le s(2) \le \dots \le s(n) \\
&d = 0 \text{ \# number of allocated sets} \\
&\text{for } j = 1..n: \\
&\quad \text{if interval } j \text{ is compatible with all intervals in some allocated set } k: \\
&\qquad \text{schedule } j \text{ in set } k \\
&\quad \text{else:} \\
&\qquad \text{allocate new set } d + 1 \text{ and set } d = d + 1 \\
&\qquad \text{schedule } j \text{ in set } d \\
&\text{return the schedule}
\end{aligned}$$

If we use a suitable data structure to store the set of schedules (e.g. a priority queue keyed by each set's latest finish time), we can prove this algorithm runs in $O(n \log n)$ time: sorting costs $O(n \log n)$, and each of the $n$ compatibility checks costs $O(\log n)$.
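The pseudocode above can be sketched as executable Python, using a min-heap keyed by each set's latest finish time so that the compatibility check only ever inspects the most promising set (the `(start, finish)` tuple representation is an assumption):

```python
import heapq

def earliest_start_first(intervals):
    """Greedy interval partitioning: process intervals by start time and
    reuse the set whose latest finish time is smallest, if compatible."""
    sets = []   # sets[k] is the list of intervals scheduled in set k
    heap = []   # entries (latest_finish_in_set, set_index)
    for start, finish in sorted(intervals):
        if heap and heap[0][0] <= start:
            # the set finishing earliest is compatible: reuse it
            _, k = heapq.heappop(heap)
            sets[k].append((start, finish))
        else:
            # no existing set is compatible: allocate a new one
            k = len(sets)
            sets.append([(start, finish)])
        heapq.heappush(heap, (finish, k))
    return sets
```

If the earliest-finishing set is incompatible with interval $j$, every other set is too, which is why checking only the top of the heap suffices.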

Definition: The depth of a set of open intervals is the maximum number of intervals that contain some common point, i.e. the point where the most intervals overlap determines the depth (which will equal the minimum number of separate schedules required).

We must also prove the correctness of the algorithm, i.e. that the Earliest Start First (ESF) algorithm is optimal: it allocates exactly as many sets as the depth of the input, and no schedule can use fewer sets than the depth.

Broadly, an exchange argument is a proof which shows that a modification to a given solution can only make it less optimal (and hence the solution must be optimal).

One of the most well-known problems in graph theory is finding the shortest path between two nodes in a connected, weighted graph. There is a variant on this problem—the single source shortest path problem, which asks whether we can efficiently find the shortest path from some start node to every other node in the graph.

Input: A connected weighted graph $G = (V, E)$ and a start node $s$. The weight of each edge $e$ is denoted $\ell(e)$
Output: A path of lowest weight from $s$ to each $v \in V$. The weight of a path $P$ is defined as $\sum_{e \in P} \ell(e)$

Dijkstra's algorithm is the classic algorithm for the single source shortest path problem. At a high level it works as follows:

  • We maintain an “estimate” for the distance from the source node to each other node of the graph
  • In each round of the algorithm, we explore an unexplored node with the lowest estimate (i.e. the closest unexplored node).
  • Initially, we set the estimates for the shortest paths to each vertex to $\infty$ (and $0$ for the source itself), and update these as we explore the graph.

The use of a predecessor array $prev$ allows us to record, for each node, which neighbour precedes it on a shortest path from the source. For example, suppose the shortest path to an arbitrary node $v$ is $s \to u \to v$: then $prev[v] = u$ and $prev[u] = s$. By using the array like this, we can reconstruct the shortest path from $s$ to any other node using linear space.

Using a binary heap allows Dijkstra to be implemented in $O((n + m) \log n)$ time; however, there are more sophisticated implementations (e.g. using Fibonacci heaps) that can achieve $O(m + n \log n)$.
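The binary-heap version can be sketched in Python with `heapq`, including the predecessor array and linear-space path reconstruction described above (the adjacency-dict input format `{node: [(neighbour, weight), ...]}` is an assumption):

```python
import heapq

def dijkstra(graph, source):
    """Single source shortest paths. graph: {node: [(neighbour, weight)]}.
    Estimates start at infinity and shrink as we settle, in each round,
    the unexplored node with the lowest estimate."""
    dist = {v: float("inf") for v in graph}
    prev = {v: None for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled via a shorter path
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u  # remember the neighbour the shortest path uses
                heapq.heappush(heap, (dist[v], v))
    return dist, prev

def shortest_path(prev, target):
    """Walk prev pointers back to the source: linear-space reconstruction."""
    path = []
    while target is not None:
        path.append(target)
        target = prev[target]
    return path[::-1]
```

Stale heap entries are skipped rather than deleted, a standard workaround for `heapq` having no decrease-key operation.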

Another common problem in graph theory is, given a connected, weighted graph $G$, to find a tree which spans $G$ for the minimum cost.

Input: A connected weighted graph $G = (V, E)$, where $\ell(e)$ is the weight of an edge $e$

Output: A set of edges $T \subseteq E$ such that the graph $(V, T)$ is a spanning tree of $G$ and $\sum_{e \in T} \ell(e)$ is minimised.

We begin the solution by observing a property of cutsets of a minimum spanning tree, allowing us to identify which edges in $E$ will be included in $T$.

definition Cutsets

Given a graph $G = (V, E)$ and a subset of vertices $S \subset V$, the cutset of $S$ is the set of edges $X \subseteq E$ with exactly one endpoint in $S$.

claim

For each cutset $X$ of a graph $G$, every minimum spanning tree $T$ contains an edge $e \in X$ such that there does not exist $e' \in X$ where $\ell(e') < \ell(e)$, i.e. $T$ contains a minimum-weight edge of every cutset.
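This cut property is exactly what Prim's algorithm exploits: it grows a tree from an arbitrary root and, at each step, adds a minimum-weight edge of the cutset separating the tree from the rest of the graph. A sketch under the same assumed adjacency-dict format, for an undirected graph:

```python
import heapq

def prim_mst(graph, root):
    """Minimum spanning tree via the cut property: repeatedly add the
    cheapest edge crossing the cut between the tree and everything else.
    graph: {node: [(neighbour, weight)]}, with each edge listed both ways."""
    in_tree = {root}
    tree_edges = []
    # heap of candidate cutset edges (weight, u, v) with u inside the tree
    heap = [(w, root, v) for v, w in graph[root]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # both endpoints now in the tree: no longer crosses the cut
        in_tree.add(v)
        tree_edges.append((u, v, w))
        for x, wx in graph[v]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return tree_edges
```

Each popped edge that still crosses the cut is a minimum-weight edge of the current cutset, so by the claim above it is safe to add to the tree.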