Advanced Algorithms and Data Structures
Prof. Tapio Elomaa
[email protected]
MAT-72006 AADS, Fall 2016 (31-Aug-16)

Course Prerequisites
• A seven credit unit course
• We take things a bit further than basic algorithms / data structures courses that you might have attended
• We will assume familiarity with
  – Necessary mathematics
  – Elementary data structures
  – Programming
Course Basics
• There will be 4 hours of lectures per week
• Weekly exercises start in a week's time
• We will not have a programming exercise this year (unless you demand to have one)
• We might consider organizing a seminar with voluntary presentations (yielding extra points) at the end of the course
Organization & Timetable
• Lectures: Prof. Tapio Elomaa
  – Mon & Wed 12–14 in TB216
  – Aug. 29 – Dec. 7, 2016
• Period break Oct. 17–23, 2016
• Exercises: M.Sc. Juho Lauri
  – Thu 12–14 in TB224, Start: Sept. 8
• Exam: Fri Dec. 16, 2016 @ 13–16
Course Grading
• Exam: Maximum of 30 points
• Weekly exercises yield extra points
• 40% of questions answered: 1 point
• 80% answered: 6 points
• In between: linear scale (so that decimals are possible)
• Final grading depends on what we agree as course components
Material
• The textbook of the course is
  – Cormen, Leiserson, Rivest, Stein: Introduction to Algorithms, 3rd ed., MIT Press, 2009
• There is no prepared material; the slides appear on the web as the lectures proceed
  – http://www.cs.tut.fi/~elomaa/teach/72006.html
• The exam is based on the lectures (i.e., not on the slides only)
Content (Plan)
I. Foundations
II. (Sorting and) Order Statistics
III. Data Structures
IV. Advanced Design and Analysis Techniques
V. Advanced Data Structures
VI. Graph Algorithms
VII. Selected Topics
I Foundations
The Role of Algorithms in Computing
Getting Started
Growth of Functions
Recurrences
Probabilistic Analysis and Randomized Algorithms
II (Sorting and) Order Statistics

Heapsort
Quicksort
Sorting in Linear Time
Medians and Order Statistics

Some of these could also be student presentation topics
The sorting problem
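• Input: A sequence of $n$ numbers $\langle a_1, a_2, \ldots, a_n \rangle$
• Output: A permutation (reordering) $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the input sequence such that $a'_1 \le a'_2 \le \cdots \le a'_n$
• The numbers that we wish to sort are also known as keys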
INSERTION-SORT(A)
1. for j = 2 to A.length
2.     key = A[j]
3.     // Insert A[j] into the sorted sequence A[1..j − 1]
4.     i = j − 1
5.     while i > 0 and A[i] > key
6.         A[i + 1] = A[i]
7.         i = i − 1
8.     A[i + 1] = key
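Example: INSERTION-SORT on the array A = ⟨5, 2, 4, 6, 1, 3⟩. Each row shows the array at the start of the for loop iteration with the given j; the last row is the sorted result.

j = 2:   5 2 4 6 1 3
j = 3:   2 5 4 6 1 3
j = 4:   2 4 5 6 1 3
j = 5:   2 4 5 6 1 3
j = 6:   1 2 4 5 6 3
sorted:  1 2 3 4 5 6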
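The trace on ⟨5, 2, 4, 6, 1, 3⟩ can be reproduced with a minimal Python sketch of INSERTION-SORT. The function name `insertion_sort` and the returned list of snapshots are assumptions made for illustration; Python lists are 0-based, so the pseudocode's j = 2..n becomes `j = 1..n-1` and the test `i > 0` becomes `i >= 0`.

```python
def insertion_sort(a):
    """Sort the list a in place and return a snapshot of a after each insertion."""
    snapshots = [list(a)]                 # state before the first iteration
    for j in range(1, len(a)):            # pseudocode: for j = 2 to A.length
        key = a[j]
        i = j - 1
        # Shift elements larger than key one position to the right.
        while i >= 0 and a[i] > key:      # pseudocode: while i > 0 and A[i] > key
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key                    # insert key into its proper place
        snapshots.append(list(a))
    return snapshots


trace = insertion_sort([5, 2, 4, 6, 1, 3])
for state in trace:
    print(state)
```

The first snapshot is the input, and each later snapshot is the array after one more key has been inserted.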
Correctness of the Algorithm
• The following loop invariant helps us understand why the algorithm is correct:

  At the start of each iteration of the for loop of lines 1–8, the subarray $A[1..j-1]$ consists of the elements originally in $A[1..j-1]$ but in sorted order
Initialization
• The loop invariant holds before the first loop iteration, when $j = 2$:
  – The subarray $A[1..j-1]$, therefore, consists of just the single element $A[1]$
  – It is the original element in $A[1]$
  – This subarray is trivially sorted
  – Therefore, the loop invariant holds prior to the first iteration of the loop
Maintenance

• Each iteration maintains the loop invariant:
  – The body of the for loop works by moving $A[j-1]$, $A[j-2]$, $A[j-3]$, … by one position to the right until the proper position for $A[j]$ is found (lines 4–7)
  – At this point the value of $A[j]$ is inserted (line 8)
  – The subarray $A[1..j]$ then consists of the elements originally in $A[1..j]$, but in sorted order
Termination
• The condition causing the for loop to terminate is that $j > A.\mathit{length} = n$
• Because each loop iteration increases $j$ by 1, we must have $j = n + 1$ at that time
• Substituting $n + 1$ for $j$ in the wording of the loop invariant, we have that the subarray $A[1..n]$ consists of the elements originally in $A[1..n]$, but in sorted order
• $A[1..n]$ is the entire array, so the entire array is sorted
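The initialization, maintenance, and termination argument can also be checked mechanically. The sketch below states the loop invariant as a runtime assertion; the name `insertion_sort_checked` is hypothetical, and indices are 0-based, so the invariant reads "`a[0:j]` holds the elements originally in `a[0:j]`, in sorted order".

```python
def insertion_sort_checked(a):
    """Insertion sort that asserts the loop invariant at the start of every iteration."""
    original = list(a)                    # copy kept only to state the invariant
    for j in range(1, len(a)):
        # Loop invariant (0-based): a[0:j] is original[0:j] in sorted order.
        assert a[:j] == sorted(original[:j])
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    # Termination: the sorted prefix now covers the entire array.
    assert a == sorted(original)
    return a


print(insertion_sort_checked([5, 2, 4, 6, 1, 3]))
```

Passing assertions on a few inputs are no substitute for the proof; they only show the invariant holding where the proof says it should.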
Analysis of insertion sort
• The time taken by INSERTION-SORT depends on the input:
  – sorting a thousand numbers takes longer than sorting three numbers
• Moreover, the procedure can take different amounts of time to sort two input sequences of the same size
  – depending on how nearly sorted they already are
Input size
• The time taken by an algorithm grows with the size of the input
• It is traditional to describe the running time of a program as a function of the size of its input
• For many problems, such as sorting, the most natural measure of input size is the number of items in the input, i.e., the array size
• For, e.g., multiplying two integers, the best measure is the total number of bits needed to represent the input in binary notation
• Sometimes it is more appropriate to describe the size with two numbers rather than one
• E.g., if the input to an algorithm is a graph, the input size can be described by the numbers of vertices and edges in it
Running time
• Running time of an algorithm on an input:
  – The number of primitive operations (“steps”) executed
• A step should be defined to be as machine-independent as possible
• For the moment:
  – Constant amount of time to execute each line of pseudocode
  – We assume that each execution of the $i$th line takes time $c_i$, where $c_i$ is a constant
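INSERTION-SORT(A)                                            cost    times
1  for j = 2 to A.length                                     c_1     n
2      key = A[j]                                            c_2     n − 1
3      // Insert A[j] into the sorted sequence A[1..j − 1]   0       n − 1
4      i = j − 1                                             c_4     n − 1
5      while i > 0 and A[i] > key                            c_5     $\sum_{j=2}^{n} t_j$
6          A[i + 1] = A[i]                                   c_6     $\sum_{j=2}^{n} (t_j - 1)$
7          i = i − 1                                         c_7     $\sum_{j=2}^{n} (t_j - 1)$
8      A[i + 1] = key                                        c_8     n − 1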
• $t_j$ denotes the number of times the while loop test in line 5 is executed for that value of $j$
• When a for or while loop exits in the usual way, the test is executed one time more than the loop body
• Comments are not executable statements, so they take no time
• The running time of the algorithm is the sum of the running times for each statement executed
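To make $t_j$ concrete, the following sketch instruments insertion sort so that it records how often the while test of line 5 is evaluated for each $j$. The function name `while_test_counts` is an assumption made for illustration; the returned dictionary uses the pseudocode's 1-based $j$ (2 to $n$) as keys.

```python
def while_test_counts(values):
    """Return {j: t_j}, the number of line-5 tests for each j = 2..n (1-based)."""
    a = list(values)                      # work on a copy, leave the input untouched
    t = {}
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        tests = 1                         # the last test, the one that exits the loop
        while i >= 0 and a[i] > key:
            tests += 1                    # one successful test per executed loop body
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
        t[j + 1] = tests                  # convert 0-based j to the pseudocode's j
    return t


print(while_test_counts([5, 2, 4, 6, 1, 3]))
```

Each $t_j$ is one more than the number of elements shifted for that $j$, which is the "test is executed one time more than the loop body" rule applied to the while loop.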
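• To compute $T(n)$, the running time of INSERTION-SORT on an input of $n$ values,
  – we sum the products of the cost and times columns, obtaining

$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 \sum_{j=2}^{n} (t_j - 1) + c_8 (n-1)$$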
Best case
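• The best case occurs if the array is already sorted
• For each $j = 2, 3, \ldots, n$, we then find that $A[i] \le \mathit{key}$ in line 5 when $i$ has its initial value of $j - 1$
• Thus $t_j = 1$ for $j = 2, 3, \ldots, n$, and the best-case running time is

$$\begin{aligned}
T(n) &= c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 (n-1) + c_8 (n-1) \\
     &= (c_1 + c_2 + c_4 + c_5 + c_8)\, n - (c_2 + c_4 + c_5 + c_8)
\end{aligned}$$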
Worst case
• We can express the best-case running time as $an + b$ for constants $a$ and $b$ that depend on the statement costs $c_i$
• It is a linear function of $n$
• The worst case results when the array is in reverse sorted order, that is, in decreasing order
• We must compare each element $A[j]$ with each element in the entire sorted subarray $A[1..j-1]$, and so $t_j = j$ for $j = 2, 3, \ldots, n$
• Note that

$$\sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1$$

and

$$\sum_{j=2}^{n} (j-1) = \frac{n(n-1)}{2}$$

by the summation of an arithmetic series

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$$
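• The worst-case running time of INSERTION-SORT is

$$\begin{aligned}
T(n) &= c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \left( \frac{n(n+1)}{2} - 1 \right) + c_6 \left( \frac{n(n-1)}{2} \right) + c_7 \left( \frac{n(n-1)}{2} \right) + c_8 (n-1) \\
     &= \left( \frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2} \right) n^2 + \left( c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8 \right) n - (c_2 + c_4 + c_5 + c_8)
\end{aligned}$$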
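As a sanity check on the algebra, the sketch below evaluates $T(n)$ directly from the cost/times table and compares it with the best-case and worst-case closed forms. The statement costs in `costs` are made-up integers chosen only for illustration, and all function names are hypothetical; `fractions.Fraction` keeps the halves exact.

```python
from fractions import Fraction


def running_time(c, t):
    """T(n) summed from the cost/times table; c maps line -> c_i, t maps j -> t_j."""
    n = len(t) + 1                        # t has one entry for each j = 2..n
    return (c[1] * n
            + (c[2] + c[4] + c[8]) * (n - 1)
            + c[5] * sum(t.values())
            + (c[6] + c[7]) * sum(tj - 1 for tj in t.values()))


def best_case(c, n):
    """Linear closed form, valid when t_j = 1 for all j."""
    return (c[1] + c[2] + c[4] + c[5] + c[8]) * n - (c[2] + c[4] + c[5] + c[8])


def worst_case(c, n):
    """Quadratic closed form, valid when t_j = j for all j."""
    a = Fraction(c[5] + c[6] + c[7], 2)
    b = c[1] + c[2] + c[4] + Fraction(c[5] - c[6] - c[7], 2) + c[8]
    return a * n * n + b * n - (c[2] + c[4] + c[5] + c[8])


costs = {1: 3, 2: 1, 4: 1, 5: 4, 6: 2, 7: 1, 8: 1}   # hypothetical statement costs
agree = all(
    running_time(costs, {j: 1 for j in range(2, n + 1)}) == best_case(costs, n)
    and running_time(costs, {j: j for j in range(2, n + 1)}) == worst_case(costs, n)
    for n in range(2, 200)
)
print(agree)
```

Agreement over a range of $n$ does not prove the identities, but a sign error in any coefficient would show up immediately.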
• We can express this worst-case running time as $an^2 + bn + c$ for constants $a$, $b$, and $c$ that depend on the statement costs $c_i$
• It is a quadratic function of $n$
• It is the rate of growth, or order of growth, of the running time that really interests us
• We consider only the leading term of a formula ($an^2$); the lower-order terms are relatively insignificant for large values of $n$
• We also ignore the leading term's constant coefficient, since constant factors are less significant than the rate of growth in determining computational efficiency for large inputs
• For insertion sort, we are left with the factor of $n^2$ from the leading term
• We write that insertion sort has a worst-case running time of $\Theta(n^2)$ (“theta of $n$-squared”)
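A quick numeric illustration of why the lower-order terms can be dropped: for a quadratic $an^2 + bn + c$, the ratio to $n^2$ settles at $a$ as $n$ grows. The coefficients below are arbitrary assumptions, deliberately chosen so that the lower-order terms dominate for small $n$.

```python
def quadratic_time(n, a=2.0, b=100.0, c=1000.0):
    """A hypothetical running time a*n^2 + b*n + c (coefficients are made up)."""
    return a * n * n + b * n + c


for n in (10, 100, 1000, 10**4, 10**5, 10**6):
    # The ratio T(n) / n^2 approaches the leading coefficient a.
    print(n, quadratic_time(n) / n**2)
```

For small $n$ the ratio is far from $a$, which is why order-of-growth statements are about large inputs.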
2.3 Designing algorithms
• Insertion sort is an incremental approach: having sorted $A[1..j-1]$, we insert $A[j]$ into its proper place, yielding sorted subarray $A[1..j]$
• Let us examine an alternative design approach, known as “divide-and-conquer”
• We design a sorting algorithm whose worst-case running time is much lower than that of insertion sort
• The running times of divide-and-conquer algorithms are often easily determined
The divide-and-conquer approach
• Many useful algorithms are recursive:
  – to solve a problem, they call themselves to deal with closely related subproblems
• These algorithms typically follow a divide-and-conquer approach:
  – Break the problem into subproblems that resemble the original problem but are smaller,
  – Solve the subproblems recursively,
  – Combine these solutions to create a solution to the original problem
• The paradigm involves three steps at each level of the recursion:
  1. Divide the problem into a number of subproblems that are smaller instances of the same problem
  2. Conquer the subproblems by solving them recursively
     • If the sizes are small enough, just solve the subproblems in a straightforward manner
  3. Combine the solutions to the subproblems into the solution for the original problem
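The three steps can be seen on a toy problem that is not taken from the lecture: finding the maximum of an array. The function name `dc_max` and the inclusive index convention `a[lo..hi]` are assumptions made for this sketch.

```python
def dc_max(a, lo, hi):
    """Return the maximum of a[lo..hi] (inclusive bounds, requires lo <= hi)."""
    if lo == hi:
        # Small enough: solve the subproblem in a straightforward manner.
        return a[lo]
    mid = (lo + hi) // 2                  # Divide: split the range in two halves
    left_max = dc_max(a, lo, mid)         # Conquer: solve each half recursively
    right_max = dc_max(a, mid + 1, hi)
    # Combine: the larger of the two partial answers is the overall answer.
    return left_max if left_max >= right_max else right_max


print(dc_max([5, 2, 4, 6, 1, 3], 0, 5))
```

Here the combine step is a single comparison; in merge sort the same slot is filled by the merging of two sorted sequences.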
The merge sort algorithm
• Divide: Divide the $n$-element sequence into two subsequences of $n/2$ elements each
• Conquer: Sort the two subsequences recursively using merge sort
• Combine: Merge the two sorted subsequences to produce the sorted answer
  – Recursion “bottoms out” when the sequence to be sorted has length 1: a sequence of length 1 is already in sorted order
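The divide, conquer, and combine steps of merge sort map directly onto a short recursive function. This is a sketch under simplifying assumptions, not the lecture's procedure: it returns a new list instead of sorting in place, and it delegates the combine step to the standard library's `heapq.merge` rather than a hand-written merge.

```python
from heapq import merge


def merge_sort(seq):
    """Return a new sorted list with the elements of seq."""
    if len(seq) <= 1:
        # Recursion bottoms out: length 0 or 1 is already in sorted order.
        return list(seq)
    mid = len(seq) // 2                   # Divide into two halves of about n/2 elements
    left = merge_sort(seq[:mid])          # Conquer: sort each half recursively
    right = merge_sort(seq[mid:])
    return list(merge(left, right))       # Combine: merge the two sorted halves


print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))
```

When $n$ is odd the two halves differ in size by one element, which the slicing handles without special cases.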
• The key operation is the merging of two sorted sequences in the “combine” step
• We call auxiliary procedure MERGE($A, p, q, r$), where $A$ is an array and $p$, $q$, and $r$ are indices such that $p \le q < r$
• The procedure assumes that the subarrays $A[p..q]$ and $A[q+1..r]$ are in sorted order and merges them to form a single sorted subarray that replaces the current subarray $A[p..r]$
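The contract of MERGE can be sketched in Python as follows. This is an illustrative version with 0-based inclusive indices that copies the two runs and checks their bounds explicitly; it is not necessarily how the lecture's MERGE handles the ends of the runs, and the local names `left` and `right` are assumptions.

```python
def merge(a, p, q, r):
    """Merge sorted a[p..q] and a[q+1..r] (inclusive, 0-based) into sorted a[p..r]."""
    left = a[p:q + 1]                     # copy of the first sorted run
    right = a[q + 1:r + 1]                # copy of the second sorted run
    i = j = 0
    for k in range(p, r + 1):
        # Take from left if right is exhausted, or if left's head is not larger.
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


data = [2, 4, 5, 7, 1, 2, 3, 6]
merge(data, 0, 3, 7)
print(data)
```

Each position of `a[p..r]` is written exactly once, so the merge takes time linear in $r - p + 1$.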
MERGE(A, p, q, r)
1. n_1 = q − p + 1
2. n_2 = r − q
3. Let L[1..n_1 + 1] and