Algorithm Analysis
Data Structures and Algorithms (60-254)


Quantification of Performance

• We want to understand the behaviour of our algorithms (both runtime and space utilization) on different inputs.

• This will enable us to compare different algorithms that solve the same problem.

• To do this, we characterize the performance as a function of the size of the input (why? is this reasonable? what are the pitfalls?)

• The functions may be complex. We want to understand their growth rate in simpler terms.


Growth of Functions

Let f(n) be a function of a positive integer n. The dominant term of f(n) determines the behavior of f(n) as n → ∞. For example, let $f(n) = 2n^3 + 3n^2 + 4n + 1$. The dominant term of f(n) is $2n^3$. This means that as n becomes large (n → ∞):

• $2n^3$ dominates the behavior of f(n).
• The other terms' contributions become much less significant.
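A quick numerical check of this claim (an illustrative sketch, not part of the original slides): if $2n^3$ dominates, the ratio $f(n) / 2n^3$ should approach 1 as n grows. The helper name `dominant` is hypothetical.

```python
def f(n):
    # f(n) = 2n^3 + 3n^2 + 4n + 1, the example above
    return 2 * n**3 + 3 * n**2 + 4 * n + 1

def dominant(n):
    # the dominant term 2n^3
    return 2 * n**3

for n in (10, 100, 1000, 10000):
    # The ratio tends to 1: the lower-order terms stop mattering.
    print(n, f(n) / dominant(n))
```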


More Examples

Example 2: The term dominates the behavior of f(n) as n → ∞.

Example 3: The term 2n dominates the behavior of f(n) as n → ∞.


Rate of Growth

The rate of growth describes how a function behaves as n → ∞. It is determined by its dominant term. The big-Oh notation is a shorthand way of expressing this. The relationship “f(n) is O(n^2)” is interpreted as:

• f(n) grows no faster than $n^2$ as n becomes large (n → ∞).
• The dominant term of f(n) does not grow faster than $n^2$.
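To get a feel for how far apart common growth rates are, the sketch below tabulates a few of them for increasing n. The choice of base-2 logarithms is an assumption; the base only changes a constant factor, so it does not affect big-Oh.

```python
import math

# Each entry maps a label to the function it names.
growth = {
    "log n": math.log2,
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n ** 2,
    "n^3": lambda n: n ** 3,
}

for n in (16, 1024, 2 ** 20):
    row = ", ".join(f"{name} = {fn(n):,.0f}" for name, fn in growth.items())
    print(f"n = {n:,}: {row}")
```

Even at n = 1024, the $n^2$ column is about a hundred times larger than the n log n column, which is why the dominant term is all that matters for large inputs.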


More Examples

Also, O(n^2) is true for:

We could make up as many such functions as we wish.


Definition of Big Oh

Problem: Find the most accurate description for any function in terms of the big-Oh notation.

A formal definition of “f(n) is O(g(n))” is that the inequality $f(n) \le c \cdot g(n)$ holds for all $n \ge n_0$, where $n_0$ and $c$ are positive constants.

f(n) and g(n) are functions mapping nonnegative integers to real numbers. Informally, “f(n) is order g(n)”.


Graphical Interpretation of Big Oh


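Example

Let $f(n) = 7n^2 + 0.5n + 6$ and $g(n) = n^2$. Then f(n) is O(n^2), taking c = 10 and $n_0 = 2$: for all $n \ge 2$, $7n^2 + 0.5n + 6 \le 10n^2$.

[Figure: graphs of 7n^2 + 0.5n + 6 and 10n^2, with n0 = 2 marked.]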
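The constants can be spot-checked numerically. This is a minimal sketch: checking a finite range illustrates the inequality but does not prove it for all n. The names `C`, `N0`, and `bound_holds` are illustrative.

```python
def f(n):
    return 7 * n**2 + 0.5 * n + 6

def g(n):
    return n**2

C, N0 = 10, 2  # the constants from the example

def bound_holds(n):
    """Check the big-Oh inequality f(n) <= C * g(n) at a single n."""
    return f(n) <= C * g(n)

# Finite spot check over N0..10_000 (illustration, not proof).
print(all(bound_holds(n) for n in range(N0, 10_001)))  # True
# n = 1 is excluded by N0: f(1) = 13.5 > 10 = C * g(1).
print(bound_holds(1))  # False
```

The failure at n = 1 is why the definition only requires the inequality from some $n_0$ onward.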


In General

In general, if $f(n) = a_0 + a_1 n + \dots + a_{d-1} n^{d-1} + a_d n^d$

Then, f(n) is O(n^d).
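One way to see why (a standard argument, added as a worked derivation): for $n \ge 1$, every power satisfies $n^k \le n^d$ when $0 \le k \le d$, so

```latex
\begin{aligned}
f(n) &= \sum_{k=0}^{d} a_k n^k
      \;\le\; \sum_{k=0}^{d} |a_k|\, n^k \\
     &\le\; \sum_{k=0}^{d} |a_k|\, n^d
      \;=\; \Big(\sum_{k=0}^{d} |a_k|\Big)\, n^d
      \qquad \text{for all } n \ge 1 .
\end{aligned}
```

Hence $c = |a_0| + |a_1| + \dots + |a_d|$ and $n_0 = 1$ satisfy the formal definition, assuming not all coefficients are zero (so that c > 0). For the example $7n^2 + 0.5n + 6$ this gives c = 13.5 and $n_0 = 1$, a different but equally valid pair from c = 10, $n_0 = 2$.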

We will see other functions too, for example: O(log n), O(n log n), etc. Having defined the big-Oh notation, we now turn to quantifying performance.


Quantifying Performance

We want to quantify the behavior of an algorithm, and use this to compare the efficiency of two algorithms on the same problem.

Observations:
1. A program (algorithm) consumes resources: time and space.
2. The amount of resources consumed is directly related to the size of the input.

Say, we have the following table:

Input size    Time
I1            T1
I2            T2
…             …

How can we derive the function T(I)?


Problems:
• Too many parameters for interpolation to be practical.
• T(I) depends on:
  • the machine on which the program is run
  • the compiler used
  • the programming language
  • the programmer who writes the code

Solution: Imagine our algorithm runs on an “algorithm machine” that accepts pseudo-code.

Assumptions:
• READs and WRITEs take constant time.
• Arithmetic operations take constant time.
• Logical operations take constant time.
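Under these unit-cost assumptions, running time becomes a count of basic steps. The sketch below is hypothetical: the accounting (one unit per assignment, READ, addition, and the final WRITE) is our own choice for illustration, and any other constant-cost accounting would change only the constants, not the linear growth.

```python
def sum_with_cost(values):
    """Sum a list while tallying unit-cost steps on the idealized machine.

    Hypothetical accounting: the initial assignment, each READ, each
    addition, and the final WRITE each cost 1 unit.
    """
    cost = 0
    total = 0
    cost += 1            # assignment: total <- 0
    for x in values:
        cost += 1        # READ x
        total += x
        cost += 1        # arithmetic: total + x
    cost += 1            # WRITE total
    return total, cost   # cost = 2n + 2 for n values

print(sum_with_cost([1, 2, 3]))  # (6, 8)
```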


Worst Case

• Even with these assumptions, deriving T(I) exactly is still complicated.
• Instead, we quantify our algorithm on the worst-case input.

• This is called “worst-case analysis”.

• There is also “average-case analysis”:
• It requires a probability distribution over the set of inputs, which is usually unknown.

• Not studied in this course.
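To make “worst-case input” concrete, here is a sketch using linear search, an example chosen for illustration (it is not on the slides). The number of comparisons depends on the input: 1 in the best case (target first), n in the worst case (target last or absent), so the worst-case T(n) is O(n).

```python
def linear_search(values, target):
    """Return (index of target or -1, number of comparisons made)."""
    comparisons = 0
    for index, x in enumerate(values):
        comparisons += 1
        if x == target:
            return index, comparisons
    return -1, comparisons

print(linear_search([5, 3, 9], 5))  # (0, 1): best case
print(linear_search([5, 3, 9], 7))  # (-1, 3): worst case, all n elements compared
```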


Input Size

Input size is:
• not always easy to determine, and
• problem dependent.

Some examples:
• Graph-theoretic problems: the number of vertices, V, and the number of edges, E.
• Matrix multiplication: the number of rows and columns of the input matrices.
• Sorting: the number of elements, n.


Reporting Performance

• Typically we don’t try to find exactly what T(I) is.
• Instead, we say: T(I) is O(g(I)).

• For example, the time complexity of mergesort on n elements: T(n) is O(n log n).

• That is, the running time of mergesort is at most a constant times n log n for all n ≥ n0.


Analysis of Examples

1. Given a list of n elements, find the minimum (or maximum). Then T(n) is O(n): we look at every element once to determine the minimum (maximum).

2. Given n points in the plane, find the closest pair of points. In this case, T(n) is O(n^2). Why? A brute-force algorithm looks at all n(n-1)/2 pairs of points, which is O(n^2).


3. Given n points in a plane, determine whether any three points lie on a straight line.

In this case, T(n) is O(n^3).

Why? A brute-force algorithm examines all n(n-1)(n-2)/6 triples of points, which is O(n^3).
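The three examples can be sketched as straightforward brute-force Python functions (illustrative versions, not the course's code; the function names are hypothetical). The loop structure makes each bound visible: one pass, all pairs, all triples. `math.dist` requires Python 3.8 or later, and the collinearity test assumes integer coordinates so that the comparison with zero is exact.

```python
import math
from itertools import combinations

def find_min(values):
    """Example 1: one pass over the list, n - 1 comparisons, so O(n)."""
    if not values:
        raise ValueError("find_min() needs at least one element")
    smallest = values[0]
    for x in values[1:]:
        if x < smallest:
            smallest = x
    return smallest

def closest_pair(points):
    """Example 2: try all n(n-1)/2 pairs, so O(n^2)."""
    if len(points) < 2:
        raise ValueError("closest_pair() needs at least two points")
    return min(combinations(points, 2), key=lambda pq: math.dist(pq[0], pq[1]))

def has_collinear_triple(points):
    """Example 3: try all n(n-1)(n-2)/6 triples, so O(n^3).

    Assumes integer coordinates, so the zero test is exact.
    """
    for (x1, y1), (x2, y2), (x3, y3) in combinations(points, 3):
        # Cross product of (p2 - p1) and (p3 - p1); zero means collinear.
        if (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1) == 0:
            return True
    return False
```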


WAIT A MINUTE!

• What is T(n) for finding the GCD of m and n?

• The naïve brute force algorithm was O(n) but the Euclidean algorithm was O(log n)? Hmmm…

• And how about this? Find gcd(m = 1989, n = 1590). Algorithm: Step 1. Output 3.

• So T(n) is O(1).

• We must distinguish between the complexity of an algorithm and the complexity of a class of problems.
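The difference between the two GCD algorithms can be seen by counting steps. This is a sketch under an assumption: the naïve algorithm is taken to try every candidate divisor from min(m, n) downward (the slides do not show the course's exact naïve version). On gcd(1989, 1590) the naïve loop needs 1588 steps, while Euclid's algorithm needs 5.

```python
def gcd_naive(m, n):
    """Try every candidate divisor from min(m, n) down to 1; O(min(m, n)) steps."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive integers")
    steps = 0
    for d in range(min(m, n), 0, -1):
        steps += 1
        if m % d == 0 and n % d == 0:
            return d, steps  # always reached by d = 1 at the latest

def gcd_euclid(m, n):
    """Euclid's algorithm: replace (m, n) by (n, m mod n) until n is 0; O(log n) steps."""
    steps = 0
    while n != 0:
        m, n = n, m % n
        steps += 1
    return m, steps

print(gcd_naive(1989, 1590))   # (3, 1588)
print(gcd_euclid(1989, 1590))  # (3, 5)
```

The one-line “Output 3” algorithm is faster still, but it solves only one instance; complexity is a property of an algorithm over all inputs of a given size.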


Maximum Contiguous Subsequence (MCS) Problem

Given a sequence of n integers: $a_1, a_2, a_3, \dots, a_{n-1}, a_n$

a contiguous subsequence is: $a_i, a_{i+1}, \dots, a_{j-1}, a_j$, where $1 \le i \le j \le n$.

The problem: Determine a contiguous subsequence such that:

$a_i + a_{i+1} + \dots + a_{j-1} + a_j \ge 0$ is maximal.

Example: for -1, -2, -3, -4, -5, -6, the MCS is empty and has value 0 by definition.


More Examples

For the sequence -1, 2, 3, -3, 2, an MCS is 2, 3, whose value is 2 + 3 = 5. Note: there may be more than one MCS. For example, -1, 1, -1, 1, -1, 1 has six MCSs, each of value 1.


An O(n2) Algorithm for MCS

Search problems have an associated search space. We need to figure out how large the search space is. For the MCS problem: how many subsequences need to be examined? For example, take -1, 2, 3, -3, 2. The subsequences that begin with -1 are:

-1
-1, 2
-1, 2, 3
-1, 2, 3, -3
-1, 2, 3, -3, 2

The ones beginning with 2 are:

2
2, 3
2, 3, -3
2, 3, -3, 2

Those beginning with 3 are:

3
3, -3
3, -3, 2

The ones beginning with -3:

-3
-3, 2

and beginning with the last 2, just one: 2

Including the empty sequence, a total of 16 subsequences are examined.
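The enumeration can be reproduced mechanically. This sketch (the generator name is hypothetical) yields the empty subsequence once, then every (start, end) pair, and recovers the count of 16 for -1, 2, 3, -3, 2. In general it produces n(n+1)/2 + 1 subsequences for n elements.

```python
def contiguous_subsequences(seq):
    """Yield every contiguous subsequence of seq, including the empty one."""
    yield []                       # the empty subsequence, counted once
    n = len(seq)
    for i in range(n):             # start position
        for j in range(i, n):      # end position (inclusive)
            yield seq[i : j + 1]

example = [-1, 2, 3, -3, 2]
subs = list(contiguous_subsequences(example))
print(len(subs))  # 16
```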

In general, given $a_1, a_2, a_3, \dots, a_{n-1}, a_n$,

we have n subsequences beginning with a_1:
a_1
a_1, a_2
a_1, a_2, a_3
…
a_1, a_2, a_3, …, a_{n-1}, a_n

n-1 beginning with a_2:
a_2
a_2, a_3
…
a_2, a_3, …, a_{n-1}, a_n

and so on. Then, two subsequences beginning with a_{n-1}:
a_{n-1}
a_{n-1}, a_n

and, finally, one beginning with a_n:
a_n

Total number of possible subsequences: $1 + 2 + \dots + (n-1) + n + 1 = \frac{n(n+1)}{2} + 1$, where the final 1 counts the empty subsequence.

Analysis: The dominant term is $n^2/2$; hence the search space is O(n^2). A “brute-force” algorithm follows…



Algorithm MCSBruteForce

Input: A sequence a_1, a_2, a_3, …, a_{n-1}, a_n.
Output: value, start and end of MCS.

Set maxSum ← 0
for i = 1 to n do
    Set sum ← 0
    for j = i to n do
        sum ← sum + a_j
        if (sum > maxSum) then
            maxSum ← sum
            start ← i
            end ← j

Print start, end, maxSum and STOP.
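A Python transcription of MCSBruteForce (a sketch; indices are kept 1-based to match the pseudocode). The pseudocode leaves start and end unset when every element is negative; here they are reported as None in that case, which is our own choice for the empty MCS.

```python
def mcs_brute_force(a):
    """MCSBruteForce: return (maxSum, start, end) with 1-based, inclusive positions.

    start and end stay None when the MCS is empty (maxSum = 0).
    """
    n = len(a)
    max_sum, start, end = 0, None, None
    for i in range(1, n + 1):          # every start position
        running = 0
        for j in range(i, n + 1):      # every end position j >= i
            running += a[j - 1]        # a_j in 1-based notation
            if running > max_sum:
                max_sum, start, end = running, i, j
    return max_sum, start, end

print(mcs_brute_force([-1, 2, 3, -3, 2]))  # (5, 2, 3)
```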


Improved MCS Algorithm

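The idea is to avoid looking at all the subsequences. We introduce the following notion.

Given: $a_i, a_{i+1}, \dots, a_k, a_{k+1}, \dots, a_j$   (1)

the subsequence $a_i, a_{i+1}, \dots, a_k$

is a prefix of (1), where $i \le k \le j$.

The prefix sum is $a_i + a_{i+1} + \dots + a_k$.

Observation: In an MCS no prefix sum can be negative, since dropping a prefix with a negative sum would leave a contiguous subsequence with a strictly larger sum.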



In the previous example, -1, 2, 3, -3, 2, we exclude:

-1
-1, 2
-1, 2, 3
-1, 2, 3, -3
-1, 2, 3, -3, 2

and

-3
-3, 2

as possible candidates.



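In general: if the running sum ever becomes negative (sum < 0) at position j, skip over index positions i+1, …, j as start positions. A subsequence starting at one of them either ends before j, in which case its sum is no larger than that of the one starting at i, or reaches j, in which case it has a negative prefix. Also, if sum ≥ 0 always holds for a starting position i, none of positions i+1, …, n is a candidate start position, since all prefix sums are non-negative.

The improved MCS algorithm therefore inspects each element just once. The algorithm follows…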



Algorithm MCSImproved

Set i ← 1; Set start ← end ← 1
Set maxSum ← sum ← 0

for j = 1 to n do
    sum ← sum + a_j
    if (sum > maxSum) then
        maxSum ← sum
        start ← i
        end ← j
    if (sum < 0) then
        i ← j + 1
        sum ← 0

Print start, end, maxSum and STOP.
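A Python transcription of MCSImproved (a sketch, 1-based like the pseudocode). One deliberate difference: start and end begin as None rather than 1, so an empty MCS is reported unambiguously. The single loop reads each element once.

```python
def mcs_improved(a):
    """MCSImproved: return (maxSum, start, end) with 1-based, inclusive positions."""
    max_sum, start, end = 0, None, None
    i = 1            # current candidate start position
    running = 0      # sum of a_i .. a_j
    for j in range(1, len(a) + 1):
        running += a[j - 1]
        if running > max_sum:
            max_sum, start, end = running, i, j
        if running < 0:
            # No MCS starts at i..j: restart just after j.
            i = j + 1
            running = 0
    return max_sum, start, end

print(mcs_improved([-1, 2, 3, -3, 2]))  # (5, 2, 3)
```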


Analysis of the Algorithms

Algorithm MCSBruteForce:
• The outer loop is executed n times.
• For each i, the inner loop is executed n – i + 1 times.
• Thus, the total number of times the inner loop is executed is $\sum_{i=1}^{n} (n - i + 1) = n + (n-1) + \dots + 1 = \frac{n(n+1)}{2}$, so T(n) is O(n^2).

Algorithm MCSImproved:
• It has a single for loop, which visits all n elements.
• Hence, T(n) is O(n).
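The count for MCSBruteForce can be confirmed by instrumenting its loop structure. This sketch (the function name is hypothetical) counts inner-loop iterations only and compares them with n(n+1)/2.

```python
def count_inner_iterations(n):
    """Count how often MCSBruteForce's inner-loop body runs for input size n."""
    count = 0
    for i in range(1, n + 1):
        for _ in range(i, n + 1):   # n - i + 1 iterations for this i
            count += 1
    return count

for n in (1, 5, 10, 100):
    print(n, count_inner_iterations(n), n * (n + 1) // 2)
```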
