Table of contents
- The Interval Scheduling Problem
- The Interval Partitioning (Colouring) Problem
- Minimising Lateness
- Strategies of Analysis
A greedy algorithm attempts to find a global optimum through iterative local optimisation. The heuristic/rule we use to decide how to proceed at each stage is called the greedy rule, which should be easy to compute.
In this module, we explore two practical methods of proving that a greedy algorithm produces an optimal solution:
- Staying ahead (& the interval scheduling problem)
- The exchange argument
Note that it produces “an” optimal solution, not “the” optimal solution; there may be more than one.
An algorithm which “stays ahead” is at least as good as any other algorithm at each stage of execution, and therefore must be optimal. To demonstrate a “staying ahead” proof, we examine the Interval Scheduling Problem.
Input: A set of intervals $\{1, 2, \dots, n\}$ such that $s(j)$ and $f(j)$ refer to the start and finish time of interval $j$ respectively
Output: The largest subset $S \subseteq \{1, \dots, n\}$ such that no two intervals in $S$ are incompatible. (Two intervals $i$ and $j$ are incompatible if and only if $s(i) \le s(j) < f(i)$ or $s(j) \le s(i) < f(j)$, i.e. they overlap)
This can be solved using a greedy algorithm, but we need to define the rules that allow us to implement it. We can sort the intervals by a specific rule, then take intervals in order, providing they are compatible with all intervals already selected.
The rule used to order the intervals could be one of many things: earliest start time, earliest finish time, and shortest interval are all possibilities.
The only correct and optimal rule is earliest finish time, with the following algorithm:
Interval Scheduling Problem
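A minimal Python sketch of the earliest-finish-first rule (the function name and the `(start, finish)` pair representation are illustrative assumptions):

```python
def earliest_finish_first(intervals):
    """Greedily select a largest set of pairwise compatible intervals."""
    selected = []
    last_finish = float("-inf")  # f(j*): finish time of the last interval added
    # Greedy rule: consider intervals in order of earliest finish time
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:  # compatible with everything selected so far
            selected.append((start, finish))
            last_finish = finish
    return selected


# Example: three mutually compatible intervals can be selected here
print(earliest_finish_first([(1, 3), (2, 5), (4, 7), (6, 9), (8, 10)]))
# [(1, 3), (4, 7), (8, 10)]
```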
We can prove this algorithm runs in $O(n \log n)$:
Proof of Runtime
We know that sorting is at best $O(n \log n)$; the loop over the intervals is $O(n)$, provided each compatibility check is $O(1)$:
- If we keep track of the interval $j^*$, which is the last interval added to $S$
- a new interval $j$ is compatible with $S$ if and only if $s(j) \ge f(j^*)$

Therefore, each comparison is indeed $O(1)$, sorting dominates, and the total is $O(n \log n)$.
We must also prove the correctness of the algorithm, i.e. that the Earliest Finish First algorithm is optimal
Proof of Correctness
We will prove by contradiction. Let us assume that the Earliest Finish First (eff) algorithm is not optimal.
Let $S = \{i_1, i_2, \dots, i_k\}$ be the set of intervals selected by eff
Let $O = \{j_1, j_2, \dots, j_m\}$ be the optimal set, with $i_r = j_r$ for as large a value of $r$ as possible. If eff is not optimal, $m > k$.
If interval $i_{r+1}$ doesn’t exist (i.e. $r = k$ and eff stopped after $i_k$), then by nature of the algorithm all remaining jobs are incompatible with $S$. However, since $j_1, \dots, j_k = i_1, \dots, i_k$, and we know that the optimal set must strictly have more jobs than eff, the job $j_{k+1}$ must exist and be compatible with $i_1, \dots, i_k$, and thus we reach a contradiction.
If interval $i_{r+1}$ exists, it cannot finish later than $j_{r+1}$ because of the sorting rule, i.e. $f(i_{r+1}) \le f(j_{r+1})$. Thus we can just replace $j_{r+1}$ with $i_{r+1}$ in $O$ and guarantee that all jobs $j_{r+2}$ and afterwards are still compatible. Thus the modified set is still optimal, and the condition that we chose the largest possible $r$ has been violated.
Another common interval problem is the interval partitioning problem: consider the scenario that you are once again arranging a set of intervals $\{1, 2, \dots, n\}$. Interval $j$ has start time $s(j)$ and finish time $f(j)$. Your goal is to find the minimum number of groups the intervals can be split into, such that all intervals are scheduled and no two intervals in the same group are incompatible.
Input: A set of intervals $\{1, 2, \dots, n\}$ such that $s(j)$ and $f(j)$ refer to the start and finish time of interval $j$ respectively
Output: The smallest possible collection of sets of compatible intervals covering all $n$ intervals
Recall the “earliest start time first” rule from earlier. We can prove that while it was incorrect for the interval scheduling problem, it is correct and optimal for interval partitioning:
Interval Partitioning Problem
$$\begin{aligned}
&\text{sort intervals by start time, and renumber them such that } s(1) \le s(2) \le \dots \le s(n) \\
&d = 0 \text{ \# number of allocated sets} \\
&\text{for } j = 1..n: \\
&\quad \text{if interval } j \text{ is compatible with all intervals in some set } k \le d: \\
&\qquad \text{schedule } j \text{ in set } k \\
&\quad \text{else:} \\
&\qquad \text{allocate new set } d + 1 \text{ and schedule } j \text{ in set } d + 1 \\
&\qquad d = d + 1 \\
&\text{return the schedule}
\end{aligned}$$
If we use a suitable data structure to store the set of schedules, we can prove this algorithm runs in $O(n \log n)$:
Proof of Runtime
We know that sorting is at best $O(n \log n)$. If we store the set of schedules in a priority queue with the key being the finish time of the last interval:
- When we allocate a new schedule, we insert it into the priority queue
- When we schedule $j$ in $k$, we increase the key of $k$ to $f(j)$
- To determine whether $j$ is compatible with any $k$, we compare $s(j)$ to the findMin of the priority queue
The total number of searches in the priority queue is of order $O(n)$, where each priority queue operation is $O(\log n)$, and therefore we get $O(n \log n)$ overall.

Definition
The depth of a set of open intervals is the maximum number of intervals that contain some point, i.e. the point where the most intervals overlap from all schedules determines the depth (which will equal the number of separate schedules)
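To make the priority-queue implementation concrete, here is a Python sketch using the standard `heapq` module; since `heapq` has no increase-key operation, the sketch pops the minimum schedule and pushes it back with its new key, which has the same $O(\log n)$ cost (the `(start, finish)` representation is again an assumption):

```python
import heapq


def earliest_start_partition(intervals):
    """Partition intervals into the minimum number of compatible sets."""
    schedules = []  # schedules[k] is the list of intervals in set k
    heap = []       # entries: (finish time of last interval, schedule index)
    # Greedy rule: consider intervals in order of earliest start time
    for start, finish in sorted(intervals):
        # findMin: the schedule whose last interval finishes earliest
        if heap and heap[0][0] <= start:
            _, k = heapq.heappop(heap)  # compatible, so reuse schedule k
        else:
            k = len(schedules)          # incompatible with all: open a new set
            schedules.append([])
        schedules[k].append((start, finish))
        heapq.heappush(heap, (finish, k))  # schedule k's key becomes f(j)
    return schedules


# Example: depth is 2, so two schedules suffice
for k, sched in enumerate(earliest_start_partition([(0, 3), (1, 4), (3, 6), (4, 7)])):
    print(k, sched)
```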
We must also prove the correctness of the algorithm, i.e. that the Earliest Start First (esf) algorithm is optimal
Proof of Correctness
Let $d$ be the number of schedules esf allocates.
Schedule number $d$ is opened because we need to schedule an interval $j$ which is incompatible with at least one interval in each of schedules $1, \dots, d - 1$.
Because of the earliest start sort, each of these $d - 1$ incompatible intervals must have a start time $\le s(j)$. Additionally, none of them (nor $j$ itself) will have ended by $s(j)$, otherwise they would be compatible with $j$.
Thus there will be $d$ intervals overlapping at time $s(j) + \epsilon$ for some small $\epsilon > 0$, so the depth is at least $d$. Since no schedule can contain two overlapping intervals, any valid partition needs at least as many schedules as the depth; esf uses exactly $d$ of them, which demonstrates that esf is optimal.
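As a sanity check on the definition, the depth of a set of intervals can be computed directly with a sweep over the sorted endpoints; a short sketch under the same `(start, finish)` assumption:

```python
def depth(intervals):
    """Maximum number of intervals that contain a common point."""
    events = []
    for start, finish in intervals:
        events.append((start, 1))    # an interval opens
        events.append((finish, -1))  # an interval closes
    # Process closings before openings at equal times, since an interval
    # finishing at time t is compatible with one starting at time t
    events.sort(key=lambda e: (e[0], e[1]))
    best = current = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best


print(depth([(0, 3), (1, 4), (3, 6), (4, 7)]))  # 2
```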
Broadly, an exchange argument is a proof which shows that a modification to a given solution can only make it less optimal (and hence the solution must be optimal).
One of the most well-known problems in graph theory is finding the shortest path between two nodes in a connected, weighted graph. There is a variant on this problem—the single source shortest path problem, which asks whether we can efficiently find the shortest path from some start node to every other node in the graph.
Input: A connected weighted graph $G = (V, E)$ and a start node $s \in V$. The weight of each edge $(v, u) \in E$ is denoted $c(v, u)$
Output: A path of lowest weight from $s$ to each $v \in V$. The weight of a path $P$ is defined as $\sum_{(v, u) \in P} c(v, u)$
Dijkstra's algorithm is a classic algorithm for the single source shortest path problem. At a high level it works as follows:
- We maintain an “estimate” $\operatorname{dist}[v]$ of the distance from the source node $s$ to each other node $v$ of the graph
- In each round of the algorithm, we explore an unexplored node with the lowest estimate (i.e. the closest unexplored node).
- Initially, we set the estimates for the shortest paths to each vertex as $\operatorname{dist}[v] = \infty$ (and $\operatorname{dist}[s] = 0$) and update these as we explore the graph.
Pseudocode
$$\begin{aligned}
&\text{foreach } v \in V: \operatorname{dist}[v] \leftarrow \infty \\
&\operatorname{dist}[s] \leftarrow 0 \\
&F \leftarrow V \text{ \# the set of unexplored nodes} \\
&\text{while } F \text{ is not empty:} \\
&\quad v \leftarrow \operatorname{argmin}_{v \in F} \operatorname{dist}[v] \\
&\quad \text{remove } v \text{ from } F \\
&\quad \text{foreach neighbour } u \text{ of } v \text{ still in } F: \\
&\qquad \text{if } \operatorname{dist}[v] + c(v, u) < \operatorname{dist}[u]: \\
&\qquad\quad \operatorname{dist}[u] \leftarrow \operatorname{dist}[v] + c(v, u) \\
&\qquad\quad \operatorname{prev}[u] \leftarrow v
\end{aligned}$$
The use of the $\operatorname{prev}$ array in the code above allows us to say, for each node, which neighbour precedes it on the shortest path back to the source. For example, suppose the shortest path to an arbitrary node $t$ is $s \to a \to b \to t$:
then $\operatorname{prev}[t] = b$, $\operatorname{prev}[b] = a$, and $\operatorname{prev}[a] = s$.
By using the array like this, we can reconstruct the shortest path from $s$ to any other node using linear space.
Using a binary heap allows Dijkstra to be implemented in $O((n + m) \log n)$ time (where $n = |V|$ and $m = |E|$); however, there are more sophisticated implementations, such as Fibonacci heaps, that can achieve $O(m + n \log n)$
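A `heapq`-based Python sketch of the binary-heap version (the adjacency-dict representation and the helper names are assumptions, not part of the original algorithm statement); `shortest_path` walks the $\operatorname{prev}$ array back from the target, as described above:

```python
import heapq


def dijkstra(graph, s):
    """Single source shortest paths from s.

    graph: dict mapping node -> list of (neighbour, edge weight) pairs.
    Returns the (dist, prev) dictionaries.
    """
    dist = {v: float("inf") for v in graph}
    prev = {}
    dist[s] = 0
    heap = [(0, s)]  # (estimate, node)
    explored = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in explored:
            continue  # stale entry: v was already explored with a smaller estimate
        explored.add(v)
        for u, c in graph[v]:
            if d + c < dist[u]:
                dist[u] = d + c
                prev[u] = v
                heapq.heappush(heap, (dist[u], u))
    return dist, prev


def shortest_path(prev, s, t):
    """Reconstruct the s-t path by walking prev back from t."""
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]


graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)],
         "b": [("t", 3)], "t": []}
dist, prev = dijkstra(graph, "s")
print(dist["t"], shortest_path(prev, "s", "t"))  # 6 ['s', 'a', 'b', 't']
```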
Another common problem in graph theory is, given a connected, weighted graph $G = (V, E)$, to find a tree which spans $V$ for the minimum cost.
Input: A connected weighted graph $G = (V, E)$, where $\ell(e)$ is the weight of an edge $e \in E$
Output: A set of edges $T \subseteq E$ such that the graph $(V, T)$ is a tree and $\sum_{e \in T} \ell(e)$ is minimised.
We begin the solution by observing a property of minimum spanning trees with respect to cutsets, allowing us to identify which edges in $E$ will be included in $T$.
definition Cutsets
A cutset of a set of vertices $X \subseteq V$ is the set of edges in $E$ with exactly one endpoint in $X$, for a graph $G = (V, E)$
claim
For each cutset $D$ of a graph $G$, every minimum spanning tree $T$ contains an edge $e \in D$ such that there does not exist $e' \in D$ where $\ell(e') < \ell(e)$
Proof
Suppose that for some cutset $D$ of a graph $G$ we have a minimum-cost edge $e = (u, v) \in D$. It suffices to show that if $T$ is a minimum spanning tree, then either $e \in T$ or $T$ contains another edge $e' \in D$ where $\ell(e') = \ell(e)$.
This is equivalent to the contrapositive: a spanning tree containing no minimum-cost edge of $D$ cannot be minimal.
Consider a cutset $D$. Suppose:
(1) every edge $f \in D$ that belongs to $T$ satisfies $\ell(e) < \ell(f)$, i.e. $T$ contains no minimum-cost edge of $D$
(2) $T$ is a minimum spanning tree
We can then say:
(3) By (2), there must exist a unique path in $T$ from $u$ to $v$; since $u$ and $v$ lie on opposite sides of the cut, some edge $f \in D$ lies on the path from $u$ to $v$
(4) By (1), $\ell(e) < \ell(f)$
(5) Let $T' = (T \setminus \{f\}) \cup \{e\}$. $T'$ is still a spanning tree, since removing $f$ splits $T$ into two components and $e$ reconnects them; by (4), $\ell(T') < \ell(T)$
(6) By (5), $T$ is not a minimum spanning tree, contradicting (2)
(7) Since $D$ is an arbitrary cutset, (6) holds for all cutsets, Q.E.D.
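This cut property is what makes greedy MST algorithms safe: at each step, the cheapest edge leaving the set of explored vertices is a minimum-cost edge of a cutset, so it can be added without losing optimality. A `heapq`-based Python sketch of this idea, in the style of Prim's algorithm (the adjacency-dict representation mirrors the Dijkstra sketch and is an assumption):

```python
import heapq


def prim(graph, root):
    """Minimum spanning tree by repeatedly taking a minimum-cost cutset edge.

    graph: dict mapping node -> list of (neighbour, edge weight) pairs,
    with each undirected edge listed from both endpoints. Returns edge set T.
    """
    explored = {root}
    T = []
    # Heap of edges leaving the cut (X = explored), keyed by weight
    heap = [(w, root, u) for u, w in graph[root]]
    heapq.heapify(heap)
    while heap and len(explored) < len(graph):
        w, v, u = heapq.heappop(heap)
        if u in explored:
            continue  # this edge no longer crosses the cut
        # (v, u) is a minimum-cost edge of the cutset of `explored`: safe to take
        explored.add(u)
        T.append((v, u, w))
        for x, wx in graph[u]:
            if x not in explored:
                heapq.heappush(heap, (wx, u, x))
    return T


graph = {"a": [("b", 1), ("c", 4)], "b": [("a", 1), ("c", 2)],
         "c": [("a", 4), ("b", 2)]}
print(prim(graph, "a"))  # [('a', 'b', 1), ('b', 'c', 2)]
```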