Data Structures IITK


Data Structures and Algorithms (CS210/ESO207/ESO211) Lecture 36

Sorting beyond O(n log n) bound

Overview of today's lecture

The sorting algorithms you studied till now

Integer sorting

Solving 2 problems from Practice sheet 6 and one problem from Practice sheet 5.

Sorting algorithms studied till now

Algorithms for sorting n elements

Word RAM model of computation: characteristics

Word is the basic storage unit of RAM. A word is a collection of a few bytes.

Each input item (number, name) is stored in binary format.

RAM can be viewed as a huge array of words. Any arbitrary location of RAM can be accessed in the same time irrespective of the location.

Data as well as Program reside fully in RAM.

Each arithmetic or logical operation (+, -, *, /, or, xor, ...) involving a constant number of words takes a constant number of steps by the CPU.

Each arithmetic or logical operation (+, -, *, /, or, xor, ...) involving O(log n) bits takes a constant number of steps by the CPU, where n is the number of bits of the input instance.

Integer sorting

Counting sort: algorithm for sorting integers

[Figure: counting sort on the array A = 2 5 3 0 2 3 0 3 (indices 0 to 7), with Count = 2 0 2 3 0 1 and Place = 2 2 4 7 7 8 (indices 0 to 5). Successive slides move elements of A into the output array B (indices 0 to 7) while Place is updated to 2 2 4 6 7 8 and then to 1 2 4 6 7 8.]

Practice sheet 6

We shall solve exercises 5 and 1 from this sheet.

Solution of Problem 5 of practice sheet 6

Description (in terms of intervals): Given a set A of n intervals, compute a smallest set B of intervals so that for every interval I in A\B, there is some interval in B which overlaps/intersects with I.
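As an aside on the counting sort slides earlier in this lecture: the slides show only a trace of the arrays A, Count, Place and B, so the following is a minimal sketch reconstructed from that trace, not necessarily the exact procedure presented in class. It assumes the keys are integers in the range 0 to k-1 and that A is scanned from right to left, which is what the Place values in the trace suggest.

```python
def counting_sort(a, k):
    """Stable counting sort of a list of integers, each in the range 0..k-1."""
    # count[v] = number of occurrences of v in a
    count = [0] * k
    for x in a:
        count[x] += 1

    # place[v] = number of elements that are <= v (prefix sums of count)
    place = [0] * k
    running = 0
    for v in range(k):
        running += count[v]
        place[v] = running

    # Scan a from the right so that equal keys keep their relative order.
    b = [None] * len(a)
    for x in reversed(a):
        place[x] -= 1
        b[place[x]] = x
    return b


print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 6))
```

The running time is O(n + k), so for k = O(n) the sort is linear and does not contradict the O(n log n) lower bound, which applies to comparison-based sorting only.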

[Figure: a set A of intervals, some of them drawn in green.]

The set of green intervals is a solution but not an optimal solution.
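To make the problem statement concrete, here is a small sketch that checks whether a set B is a solution and finds a smallest solution by brute force. The function names, the representation of an interval as a (start, finish) tuple, and the convention that closed intervals touching at an endpoint overlap are all assumptions made for illustration. The brute force takes exponential time and is useful only for checking small examples by hand.

```python
from itertools import combinations


def overlaps(p, q):
    """Closed intervals (start, finish) overlap when neither ends before the other starts."""
    return p[0] <= q[1] and q[0] <= p[1]


def is_solution(a, b):
    """True if every interval of a that is not in b overlaps some interval of b."""
    chosen = set(b)
    return all(i in chosen or any(overlaps(i, j) for j in b) for i in a)


def smallest_solution(a):
    """Try subsets of a in increasing order of size (exponential time)."""
    for size in range(len(a) + 1):
        for b in combinations(a, size):
            if is_solution(a, b):
                return list(b)


a = [(0, 2), (1, 5), (4, 7)]
print(is_solution(a, [(0, 2), (4, 7)]))   # a solution of size 2
print(smallest_solution(a))               # a solution of size 1
```

In this example the set {(0, 2), (4, 7)} is a solution but not an optimal one, since (1, 5) alone overlaps both of the other intervals.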

Let I* be the interval with earliest finish time. Let I' be the interval with maximum finish time overlapping I*.

Lemma1: There is an optimal solution for set A that contains I'.

[Figure: set A with I* and I' marked.]

Question: How to obtain a smaller instance A' using this greedy approach?

Naive approach (again inspired from the job scheduling problem): remove from A all intervals which overlap with I'. This is A'. This approach does not work! Here is a counterexample.

The problem is that some deleted interval (in this case I'') could have been used for intersecting many intervals if it were not deleted. But deleting it from the instance disallows it to be selected in the solution.

[Figure: the counterexample, with I', I* and I'' marked in A.]

Overview of the approach

In order to make sure we do not delete intervals (like I'' in the previous slide) if they are essential to be selected to cover many other intervals, we make some observations and introduce a terminology called uniquely covered interval. It turns out that we need to keep I'' in the smaller instance if there is an interval there which is uniquely covered by I''. Otherwise, we may discard I''.

An observation

We can delete all intervals whose finish time is before the finish time of I', because any interval overlapped by such intervals will anyway be overlapped by I'. Let us consider intervals which overlap with I', but have finish time greater than that of I'. In the example shown below, these intervals are those three intervals which cross the red line.

Observation1: Among the intervals crossing the red line, we need to keep only the interval which has maximum finish time (I'' in this picture).

Proof: Notice that each of these intervals is anyway intersected by I'. As far as using them to intersect other intervals is concerned, we may better choose I'' for this purpose.

So from now onwards, we shall assume that there is exactly one interval I'' in A which overlaps I' (intersects the red line) and has finish time larger than that of I'.

[Figure: set A with I', I* and I'' marked.]

Uniquely covered interval

I2 is said to be uniquely covered by I1 if

I2 is fully covered by I1, and

every interval overlapping I2 is also fully covered by I1.

Lemma2: If some interval I2 is uniquely covered by I1, then there is an optimal solution containing I1.

Proof: Surely I2 or some other interval overlapping it must be there in the optimal solution. If we replace that interval by I1, we still get a solution of the same size and hence an optimal solution.
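The definition of a uniquely covered interval can be written as a small predicate. This is only a sketch under assumptions: intervals are (start, finish) tuples, closed intervals touching at an endpoint count as overlapping, and the function names are hypothetical.

```python
def overlaps(p, q):
    """Closed intervals (start, finish) overlap when neither ends before the other starts."""
    return p[0] <= q[1] and q[0] <= p[1]


def covers(outer, inner):
    """True if inner lies fully inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def uniquely_covered(i2, i1, intervals):
    """True if i2 is fully covered by i1 and every interval of `intervals`
    that overlaps i2 is also fully covered by i1."""
    if not covers(i1, i2):
        return False
    return all(covers(i1, j) for j in intervals if overlaps(j, i2))


a = [(0, 10), (2, 4), (3, 5), (9, 12)]
print(uniquely_covered((2, 4), (0, 10), a))               # True
print(uniquely_covered((2, 4), (0, 10), a + [(3, 11)]))   # False: (3, 11) sticks out of (0, 10)
```

Note that I1 always overlaps I2 and covers itself, so it never violates the second condition.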

[Figure: intervals I2 and I1.]

We are now ready to describe the construction of A' from A. There will be two cases. We shall then prove that |Opt(A)| = |Opt(A')| + 1 for each of these cases.

Important note: The reader is advised to fully understand Lemma1, Lemma2, Observation1, and the notion of uniquely covered interval. Also fully internalize the notations I*, I', and I''. This will help the reader understand the rest of the solution.

Constructing A' from A

[Figure: set A with I', I* and I'' marked, and the sets D and E to the right of the red line.]

We need to take care of intervals whose starting point is to the right of the red line (finish time of I'). We can partition these intervals into two sets.

D: those which overlap with I''.

E: those that start after the end of I'' and hence do not overlap with I''.

Now we shall describe the two cases for the construction of A'.

Case1: There is an interval in D uniquely covered by I''.

[Figure: the smaller instance A' for Case 1, with I'', D and E.]

If there is an interval in D uniquely covered by I'', then we define A' as follows. Remove all intervals from A which overlap with I' (this was our usual way of defining A' in our wrong solution). Now add I'' to this set. This set is the smaller instance A' for Case 1.

We shall now define A' for Case 2.

Case2: There is no interval in D uniquely covered by I''.

[Figure: set A with I', I*, I'', D and E, and the smaller instance A' for Case 2, with D and E only.]

If there is no interval in D uniquely covered by I'', then we define A' as follows. Remove all intervals from A which overlap with I' (this was our usual way of defining A' in our wrong solution). This set is the smaller instance A' for Case 2.
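The two cases can be put together into one greedy step, sketched below under stated assumptions. Intervals are (start, finish) tuples, closed intervals touching at an endpoint overlap, and all function and variable names (greedy_step, i_prime, i_dprime, and so on) are hypothetical. When no interval crosses the red line, so that I'' does not exist, the sketch simply keeps the intervals that do not overlap I'. The slides do not discuss that situation, so that choice is an assumption. That the output is a smallest solution rests on the theorem of this lecture; the sketch itself only follows the construction.

```python
def overlaps(p, q):
    """Closed intervals (start, finish) overlap when neither ends before the other starts."""
    return p[0] <= q[1] and q[0] <= p[1]


def covers(outer, inner):
    """True if inner lies fully inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def greedy_step(a):
    """One greedy step on a non-empty list of intervals: return (I', A')."""
    i_star = min(a, key=lambda iv: iv[1])                     # earliest finish time
    i_prime = max((iv for iv in a if overlaps(iv, i_star)),
                  key=lambda iv: iv[1])                       # I'
    rest = [iv for iv in a if not overlaps(iv, i_prime)]      # D and E together
    crossing = [iv for iv in a
                if overlaps(iv, i_prime) and iv[1] > i_prime[1]]
    if not crossing:                                          # assumption: no I'' exists
        return i_prime, rest
    i_dprime = max(crossing, key=lambda iv: iv[1])            # I'' (Observation1)
    d = [iv for iv in rest if overlaps(iv, i_dprime)]
    # After Observation1, only I'' and the intervals of D and E can overlap
    # an interval of D, so the uniqueness test looks at these intervals only.
    kept = rest + [i_dprime]
    case1 = any(
        covers(i_dprime, x)
        and all(covers(i_dprime, j) for j in kept if overlaps(j, x))
        for x in d
    )
    return i_prime, (kept if case1 else rest)


def greedy_cover(a):
    """Repeat the greedy step until no interval is left."""
    solution, a = [], list(a)
    while a:
        chosen, a = greedy_step(a)
        solution.append(chosen)
    return solution


# Case 1 example: the naive deletion of (4, 12) would force three more picks.
print(greedy_cover([(0, 2), (1, 5), (4, 12), (6, 7), (8, 9), (10, 11)]))
# Case 2 example: (6, 10) is not covered by I'' = (4, 8), so I'' is discarded.
print(greedy_cover([(0, 2), (1, 5), (4, 8), (6, 10), (9, 11)]))
```

Each step scans the list a constant number of times except for the uniqueness test, which is quadratic in the worst case, so every step runs in time polynomial in n.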

Theorem: |Opt(A)| = |Opt(A')| + 1

We shall prove this theorem for Case 1 as well as Case 2.

Case1: There is an interval in D uniquely covered by I''.

|Opt(A)| ≤ |Opt(A')| + 1

[Figure: the instances A and A' for Case 1.]

Now using Lemma2, it follows that there is an optimal solution for A' containing I''. What to add to this solution to get a solution for A? We need to add just I' to get a solution for A and we are done.

Case1: There is an interval in D uniquely covered by I''.

|Opt(A')| ≤ |Opt(A)| - 1

[Figure: the instances A and A' for Case 1.]

Using Lemma1 and Lemma2, it follows that there is an optimal solution for A containing I' and I''. We need to just remove I' from this optimal solution for A to get a solution for A' and we are done. This finishes the proof of the Theorem for Case 1. We shall now analyze Case 2 and prove the Theorem for this case as well.

Case2: There is no interval in D uniquely covered by I''.

|Opt(A)| ≤ |Opt(A')| + 1

[Figure: the instances A and A' for Case 2.]

Consider any optimal solution for A'. Note that this optimal solution takes care of D and E. So we just need to take care of intervals from A which intersect the red line. These are taken care of by adding I' to this solution. We are done.

Case2: There is no interval in D uniquely covered by I''.

|Opt(A')| ≤ |Opt(A)| - 1

[Figure: the instances A and A' for Case 2, with a violet vertical line at the finish time of I''.]

Using Lemma1, it follows that there is an optimal solution for A containing I'. If I'' is not in this optimal solution, we can see that removing I' from this optimal solution gives a valid solution for A'. So let us consider the case when I'' is present in the optimal solution of A. The problem is that I'' is not present in A', so we need a substitute for I'' from A'. Notice that I'' can serve the purpose of overlapping intervals from D only. So we should search for a substitute for I'' in D only. We replace I'' by the interval from D which intersects the violet line and has earliest start time. See the following slide for its justification.

Let the substitute be the interval in D which intersects the violet vertical line (that is, has finish time greater than that of I'') and has earliest start time. It suffices to show that every interval of D overlaps with the substitute. We proceed as follows. Consider any interval in D. There are two cases.

Its finish time is less than that of I''. In other words, it does not intersect the violet line. In this case, there must be some other interval in D that overlaps it and intersects the violet line (otherwise, it would be uniquely covered by I''). Since the start time of the substitute is no later than that of this other interval, the interval under consideration is overlapped by the substitute as well.

Its finish time is more than that of I''. In other words, it does intersect the violet line. Hence it overlaps with the substitute as well, since the latter also intersects the violet line.

This completes the proof.

Concluding slide for exercise 5

We demonstrated a greedy strategy and proved its correctness by establishing the relationship between the optimal solution of the given instance and the optimal solution of the smaller instance defined by the greedy step. Each step of the greedy strategy can be executed in time polynomial in n.

Theorem: There is a polynomial time algorithm for computing a smallest subset of intervals overlapping a given set of intervals.

Problem 1

Given an array A storing n elements, and a number k, compute the k elements nearest to the median. Time complexity should be O(n).
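One possible way to approach Problem 1 is sketched below. It is not the intended solution from the practice sheet but an illustration under assumptions: "nearest" is taken to mean smallest absolute difference in value, the median itself counts as one of the k elements, 1 <= k <= n, and for even n the lower median is used. Randomized quickselect stands in for the deterministic linear time median finding algorithm, so the bound obtained here is expected O(n) rather than worst-case O(n). The function names are hypothetical.

```python
import random


def select(items, r):
    """Return the element of rank r (0-indexed) by randomized quickselect."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if r < len(smaller):
            items = smaller
        elif r < len(smaller) + len(equal):
            return pivot
        else:
            r -= len(smaller) + len(equal)
            items = [x for x in items if x > pivot]


def k_nearest_to_median(a, k):
    """Return k elements of a whose values are nearest to the (lower) median."""
    median = select(a, (len(a) - 1) // 2)
    # The k-th smallest distance to the median acts as a threshold.
    threshold = select([abs(x - median) for x in a], k - 1)
    closer = [x for x in a if abs(x - median) < threshold]
    ties = [x for x in a if abs(x - median) == threshold]
    return closer + ties[:k - len(closer)]


print(sorted(k_nearest_to_median([7, 1, 9, 3, 5, 11, 13], 3)))
```

Both calls to select discard a part of the input in every round, which is the same "reduce the problem size in each step" idea as in the hint.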

Hint: Use the following tools.

Divide and conquer strategy like the one used in problem 2 of the same practice sheet.

Linear time median finding algorithm.

You need to divide the problem to half the size in each step.

Finding DFS tree from start and finish time

There was a problem in practice sheet 5 where, given the start time and finish time of DFS traversal for all vertices, the aim is to compute the DFN number and the DFS tree.

A few students were facing the problem of determining the children of a node in the DFS tree. An easy way to achieve this goal is an indirect way: in order to compute the children of a vertex in the DFS tree, it suffices if we can compute the parent of each vertex. We can do the latter task as follows.

Among all vertices neighboring a vertex u, find all those vertices whose start time is smaller than that of u. All these vertices are ancestors of u. Who among them will be the parent of u? Surely, the vertex with maximum start time.

So we can compute the parent of vertex u in O(deg(u)) time. Time spent over all vertices will be O(m+n). Hence we can compute the children of each vertex in the DFS tree, and hence the entire DFS tree structure, in O(m+n) time.
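The parent computation described above can be sketched as follows. The sketch assumes an undirected graph given as an adjacency list (a dictionary from each vertex to the list of its neighbors) and a dictionary of DFS start times. The function names are hypothetical. A vertex with no neighbor of smaller start time is treated as a root.

```python
def dfs_parents(adj, start):
    """Parent of each vertex in the DFS tree: the neighbor with the largest
    start time among the neighbors whose start time is smaller."""
    parent = {}
    for u in adj:
        earlier = [v for v in adj[u] if start[v] < start[u]]   # ancestors of u
        parent[u] = max(earlier, key=lambda v: start[v]) if earlier else None
    return parent


def children_from_parents(parent):
    """Invert the parent map to get the children of every vertex."""
    children = {u: [] for u in parent}
    for u, p in parent.items():
        if p is not None:
            children[p].append(u)
    return children


# Edges a-b, a-c, b-c, c-d; DFS from a visits a, b, c, d in this order.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
start = {"a": 1, "b": 2, "c": 3, "d": 4}
parent = dfs_parents(adj, start)
print(parent)
print(children_from_parents(parent))
```

Each vertex u is handled by one pass over its adjacency list, which gives the O(deg(u)) cost per vertex and O(m+n) in total.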
