Virtual University of Pakistan Lecture No. 5 Statistics and Probability by Miss Saleha Naghmi Habibullah
Page 1: STA301_LEC05

Virtual University of Pakistan

Lecture No. 5

Statistics and Probability

by

Miss Saleha Naghmi Habibullah

Page 2: STA301_LEC05

IN THE LAST LECTURE, YOU LEARNT:

•Frequency distribution of a continuous variable

•Relative frequency distribution and percentage frequency distribution

•Histogram

•Frequency Polygon

•Frequency Curve

Today’s lecture is in continuation with the last lecture, and today we will begin with various types of frequency curves that are encountered in practice.

Also, we will discuss the cumulative frequency distribution and cumulative frequency polygon for a continuous variable.

Page 3: STA301_LEC05

FREQUENCY POLYGON

A frequency polygon is obtained by plotting the class frequencies against the mid-points of the classes, and connecting the points so obtained by straight line segments.

In our example of the EPA mileage ratings, the classes were:

Class Boundaries    Mid-Point (X)    Frequency (f)
26.95 – 29.95       28.45            0
29.95 – 32.95       31.45            2
32.95 – 35.95       34.45            4
35.95 – 38.95       37.45            14
38.95 – 41.95       40.45            8
41.95 – 44.95       43.45            2
44.95 – 47.95       46.45            0
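The plotting points of the frequency polygon can be worked out directly from the class boundaries. The following is a minimal sketch in Python, assuming that the two end classes have zero frequency (the five middle classes already account for all 30 cars); the variable names are illustrative only.

```python
# Class boundaries and frequencies from the EPA mileage example.
# Assumption: the first and last classes are empty (frequency 0).
boundaries = [(26.95, 29.95), (29.95, 32.95), (32.95, 35.95), (35.95, 38.95),
              (38.95, 41.95), (41.95, 44.95), (44.95, 47.95)]
frequencies = [0, 2, 4, 14, 8, 2, 0]

# Mid-point of a class = (lower boundary + upper boundary) / 2.
midpoints = [round((lower + upper) / 2, 2) for lower, upper in boundaries]

# The vertices of the polygon are the (mid-point, frequency) pairs, taken in order
# and joined by straight line segments.
polygon_points = list(zip(midpoints, frequencies))

for x, f in polygon_points:
    print(f"{x:6.2f}  {f:3d}")
```

Because the end classes have zero frequency, the first and last vertices lie on the X-axis, which is what closes the polygon.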

Page 4: STA301_LEC05

[Figure: frequency polygon of the EPA mileage ratings; X-axis: miles per gallon (class mid-points 28.45 to 46.45), Y-axis: number of cars (0 to 16)]

Page 5: STA301_LEC05

Also, it was mentioned that, when the frequency polygon is smoothed, we obtain what may be called the FREQUENCY CURVE.

[Figure: frequency polygon with the smoothed frequency curve drawn as a dotted line; X-axis: miles per gallon, Y-axis: number of cars (0 to 16)]

In the above figure, the dotted line represents the frequency curve.

It should be noted that the frequency curve does not have to touch all the points.

Page 6: STA301_LEC05

The purpose of the frequency curve is simply to display the overall pattern of the distribution.

Hence we draw the curve by the free-hand method, and it does not have to touch all the plotted points.

It should be realized that the frequency curve is actually a theoretical concept.

If the class interval of a histogram is made very small, and the number of classes is very large, the rectangles of the histogram will be narrow as shown below:

Page 7: STA301_LEC05
Page 8: STA301_LEC05

The smaller the class interval and the larger the number of classes, the narrower the rectangles will be. In this way, the histogram approaches a smooth curve as shown below:

Page 9: STA301_LEC05

VARIOUS TYPES OF FREQUENCY CURVES

•the symmetrical frequency curve

•the moderately skewed frequency curve

•the extremely skewed frequency curve

•the U-shaped frequency curve

Page 10: STA301_LEC05

THE SYMMETRIC CURVE

Page 11: STA301_LEC05

[Figure: symmetric frequency curve; horizontal axis X, vertical axis f]

If we place a vertical mirror in the centre of this graph, the left hand side will be the mirror image of the right hand side.

Page 12: STA301_LEC05

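THE POSITIVELY SKEWED CURVE

The positively skewed frequency curve is the one for which the right tail is longer than the left tail.

[Figure: positively skewed frequency curve; horizontal axis X, vertical axis f]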

Page 13: STA301_LEC05

THE NEGATIVELY SKEWED CURVE

On the other hand, the negatively skewed frequency curve is the one for which the left tail is longer than the right tail.

[Figure: negatively skewed frequency curve; horizontal axis X, vertical axis f]

Page 14: STA301_LEC05

THE EXTREMELY NEGATIVELY SKEWED (J-SHAPED) CURVE

[Figure: J-shaped frequency curve; horizontal axis X, vertical axis f]

This is the case when the maximum frequency occurs at the end of the frequency table.

Page 15: STA301_LEC05

For example, if we think of the death rates of adult males of various age groups starting from age 20 and going up to age 79 years, we might obtain something like this:

Age Group    No. of deaths per thousand
20 – 29      2.1
30 – 39      4.3
40 – 49      5.7
50 – 59      8.9
60 – 69      12.4
70 – 79      16.7

Page 16: STA301_LEC05

This will result in a J-shaped distribution similar to the one shown above.

THE EXTREMELY POSITIVELY SKEWED (REVERSE J-SHAPED) CURVE

Similarly, the extremely positively skewed distribution is known as the REVERSE J-shaped distribution.

[Figure: reverse J-shaped frequency curve; horizontal axis X, vertical axis f]

Page 17: STA301_LEC05

Example

The following are the numbers of 6’s obtained in 50 rolls of 4 dice:

0 0 1 0 0 0 2 0 0 1
0 0 0 0 1 1 0 1 2 0
0 1 0 0 0 1 1 0 1 0
0 1 2 1 0 0 3 1 1 0
0 0 0 1 2 1 0 0 1 1

Construct a frequency distribution and line chart, and discuss the overall shape of the distribution.

Page 18: STA301_LEC05

Solution

Applying the tally method, we obtain the following frequency distribution:

Page 19: STA301_LEC05

Frequency distribution

No. of 6’s    Tally                                Frequency
0             ||||| ||||| ||||| ||||| ||||| |||    28
1             ||||| ||||| ||||| ||                 17
2             ||||                                 4
3             |                                    1
Total                                              50
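The tally method can also be carried out mechanically. The sketch below, in Python, counts how often each value of X occurs in the recorded data; the variable names are illustrative, and the digit string is simply the data of this example written without spaces.

```python
from collections import Counter

# The recorded values (number of 6's in each roll of 4 dice), one digit per roll.
rolls = "00100020010000110120010001101001210031100001210011"
observations = [int(ch) for ch in rolls]

# Counter plays the role of the tally column: one count per distinct value of X.
frequency = Counter(observations)

for x in sorted(frequency):
    print(x, frequency[x])
print("Total", sum(frequency.values()))
```

The counts agree with the frequency column of the table, and the total confirms that 50 rolls were recorded.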

Page 20: STA301_LEC05

Line Chart

[Figure: line chart of the number of 6’s; horizontal axis X (0 to 3), vertical axis f (0 to 30)]

Page 21: STA301_LEC05

Evidently, this is an extremely positively skewed distribution, which may also be regarded as a reverse J-shaped distribution.

Page 22: STA301_LEC05

In this example, since X is a discrete variable, we should not actually draw a continuous curve in this diagram. We have done so here only to indicate the overall shape of the distribution.

Page 23: STA301_LEC05

Does the above frequency distribution indicate that the dice that were rolled were unfair?

Page 24: STA301_LEC05

THE U-SHAPED CURVE

A relatively LESS frequently encountered frequency distribution is the U-shaped distribution.

[Figure: U-shaped frequency curve; horizontal axis X, vertical axis f]

Page 25: STA301_LEC05

If we consider the example of the death rates not only for the adult population but for the population of ALL the age groups, we will obtain the U-shaped distribution.

Out of all these curves, the MOST frequently encountered frequency distribution is the moderately skewed frequency distribution. There are thousands of natural and social phenomena which yield the moderately skewed frequency distribution.

Page 26: STA301_LEC05

Another rather less frequently encountered distribution is the uniform distribution.

Page 27: STA301_LEC05

Example

Suppose that a fair die is rolled 120 times and the following frequency distribution is obtained:

Page 28: STA301_LEC05

Frequency distribution

No. of dots on the upper-most face (X)    Frequency (f)
1                                         19
2                                         22
3                                         20
4                                         21
5                                         19
6                                         19
Total                                     120

Page 29: STA301_LEC05

Line chart

[Figure: line chart of the number of dots on the upper-most face; horizontal axis X (1 to 6), vertical axis f (0 to 30)]

Page 30: STA301_LEC05

The point to be noted is that, since the die was absolutely fair, every side of the die had an equal chance of coming up on top.

As such, out of 120 tosses, we could have expected to obtain X = 1 in 20 of the tosses, X = 2 in 20 of the tosses, X = 3 in 20 of the tosses, and so on.
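The comparison between the observed and the expected frequencies can be laid out as follows. This is only an illustrative sketch in Python using the frequencies of this example; the variable names are not part of the lecture.

```python
# Observed frequencies from the example: 120 rolls of a fair die.
observed = {1: 19, 2: 22, 3: 20, 4: 21, 5: 19, 6: 19}

total_rolls = sum(observed.values())       # 120 rolls in all
expected = total_rolls / len(observed)     # six equally likely faces: 120 / 6

# How far each observed frequency lies from the expected frequency.
deviations = {face: f - expected for face, f in observed.items()}

for face, f in observed.items():
    print(f"X = {face}: observed {f}, expected {expected:.0f}, deviation {deviations[face]:+.0f}")
```

No observed frequency differs from the expected frequency by more than 2, which is why the line chart looks nearly flat.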

Page 31: STA301_LEC05

Whenever we are dealing with “an equally likely” situation of the type described in this example, we encounter the uniform distribution.

Page 32: STA301_LEC05

Suppose that we walk into a school and collect data of the weights, heights, marks, shoulder-lengths, finger-lengths or any other such variable pertaining to the children of any one class. If we construct a frequency distribution of this data, and draw its histogram and its frequency curve, we will find that our data will generate a moderately skewed distribution.

Until now, we have discussed the various possible shapes of the frequency distribution of a continuous variable.

Similar shapes are possible for the frequency distribution of a discrete variable.

Page 33: STA301_LEC05

VARIOUS TYPES OF DISCRETE FREQUENCY DISTRIBUTION

I. Positively Skewed Distribution

[Figure: line chart of a positively skewed discrete distribution; horizontal axis X (0 to 10)]

Page 34: STA301_LEC05

II. Negatively Skewed Distribution

[Figure: line chart of a negatively skewed discrete distribution; horizontal axis X (0 to 10)]

Page 35: STA301_LEC05

III. Symmetric Distribution

[Figure: line chart of a symmetric discrete distribution; horizontal axis X (0 to 10)]

Page 36: STA301_LEC05

Let us now consider another aspect of the frequency distribution i.e. the CUMULATIVE FREQUENCY DISTRIBUTION. As in the case of the frequency distribution of a discrete variable, if we start adding the frequencies of our frequency table column-wise, we obtain the column of cumulative frequencies.

Page 37: STA301_LEC05

CUMULATIVE FREQUENCY DISTRIBUTION

Class Boundaries    Frequency    Cumulative Frequency
29.95 – 32.95       2            2
32.95 – 35.95       4            2+4 = 6
35.95 – 38.95       14           6+14 = 20
38.95 – 41.95       8            20+8 = 28
41.95 – 44.95       2            28+2 = 30
Total               30

Page 38: STA301_LEC05

In the above table, 2+4 gives 6, 6+14 gives 20, and so on.

The question arises: “What is the purpose of making this column?”

You will recall that, when we were discussing the frequency distribution of a discrete variable, any particular cumulative frequency meant that we were counting the number of observations starting from the very first value of X and going up to THAT particular value of X against which that particular cumulative frequency was falling.

Page 39: STA301_LEC05

In the case of the distribution of a continuous variable, each of these cumulative frequencies represents the total frequency of the distribution from the lower class boundary of the lowest class to the UPPER class boundary of THAT class whose cumulative frequency we are considering.

In the above table, the total number of cars showing mileage less than 35.95 miles per gallon is 6, the total number of cars showing mileage less than 41.95 miles per gallon is 28, etc.
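The running totals and their “less than” reading can be reproduced with a few lines of Python. This is a minimal sketch using the EPA mileage classes; the helper cars_below is a hypothetical name introduced only for this illustration.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies of the EPA mileage example.
upper_boundaries = [32.95, 35.95, 38.95, 41.95, 44.95]
frequencies = [2, 4, 14, 8, 2]

# Each cumulative frequency is the class frequency added to the preceding total.
cumulative = list(accumulate(frequencies))

def cars_below(boundary):
    """Number of cars with mileage less than the given upper class boundary."""
    return cumulative[upper_boundaries.index(boundary)]

print(cumulative)
print(cars_below(35.95), cars_below(41.95))
```

The lookup works only for values that are actual upper class boundaries, because the table gives no information about points inside a class.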

Page 40: STA301_LEC05


Page 41: STA301_LEC05

Such a cumulative frequency distribution is called a “less than” type of cumulative frequency distribution.

The graph of a cumulative frequency distribution is called a CUMULATIVE FREQUENCY POLYGON or OGIVE.

A “less than” type ogive is obtained by marking off the upper class boundaries of the various classes along the X-axis and the cumulative frequencies along the Y-axis, as shown below:

Page 42: STA301_LEC05

[Figure: axes for the ogive; X-axis: upper class boundaries, Y-axis: cumulative frequency cf (0 to 30)]

Page 43: STA301_LEC05

The cumulative frequencies are plotted on the graph paper against the upper class boundaries, and the points so obtained are joined by means of straight line segments. Hence we obtain the cumulative frequency polygon shown below:

[Figure: Cumulative Frequency Polygon or OGIVE; X-axis: upper class boundaries (29.95 to 44.95), Y-axis: cumulative frequency (0 to 35)]

Page 44: STA301_LEC05

It should be noted that this graph is touching the X-Axis on the left-hand side. This is achieved by ADDING a class having zero frequency in the beginning of our frequency distribution, as shown below:

CUMULATIVE FREQUENCY DISTRIBUTION

Class Boundaries    Frequency    Cumulative Frequency
26.95 – 29.95       0            0
29.95 – 32.95       2            0+2 = 2
32.95 – 35.95       4            2+4 = 6
35.95 – 38.95       14           6+14 = 20
38.95 – 41.95       8            20+8 = 28
41.95 – 44.95       2            28+2 = 30
Total               30

Page 45: STA301_LEC05

Since the frequency of the first class is zero, the cumulative frequency of the first class will also be zero, and hence, automatically, the cumulative frequency polygon will touch the X-Axis on the left-hand side.

If we want our cumulative frequency polygon to be closed on the right-hand side also, we can achieve this by connecting the last point on our graph paper with the X-axis by means of a vertical line, as shown below:

Page 46: STA301_LEC05

[Figure: OGIVE closed on the right-hand side by a vertical line; X-axis: upper class boundaries (29.95 to 44.95), Y-axis: cumulative frequency (0 to 35)]

Page 47: STA301_LEC05

Example

Let us consolidate these ideas with the help of the example of the ages of the managers of child-care centers that we discussed in the last lecture. The following table contains the ages of 50 managers of child-care centers in five cities of a developed country:

Page 48: STA301_LEC05

Ages of a sample of managers of Urban child-care centers

42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54

Convert this data into a frequency distribution.

Page 49: STA301_LEC05

Frequency Distribution of Child-Care Managers’ Ages

Class Interval    Frequency
20 – 29           6
30 – 39           18
40 – 49           11
50 – 59           11
60 – 69           3
70 – 79           1
Total             50
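The conversion of the raw ages into this frequency distribution can be sketched in Python as follows. The class width of 10 years and the lowest class limit of 20 are read off the table; the variable names are illustrative.

```python
# Ages of the 50 child-care managers, as listed in the example.
ages = [42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
        53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
        52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
        55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
        61, 31, 30, 40, 60, 74, 37, 29, 43, 54]

lowest_limit = 20   # lower limit of the first class (20 - 29)
width = 10          # each class covers ten whole years
lower_limits = [20, 30, 40, 50, 60, 70]

# Each age falls in the class whose index is (age - lowest limit) // width.
counts = [0] * len(lower_limits)
for age in ages:
    counts[(age - lowest_limit) // width] += 1

for lower, f in zip(lower_limits, counts):
    print(f"{lower} - {lower + width - 1}: {f}")
print("Total:", sum(counts))
```

Integer division does the work of the tally here: every age from 30 to 39, for example, gives the same class index 1.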

Construct the cumulative frequency distribution.

Page 50: STA301_LEC05

Cumulative Frequency

The cumulative frequency is the running total of the frequencies, carried through to the overall total.

The cumulative frequency for each class interval is the frequency for that class interval added to the preceding cumulative total.

Page 51: STA301_LEC05

Cumulative frequencies of child-care data

Class Interval    Frequency    Cumulative frequency
20 – 29           6            6
30 – 39           18           24
40 – 49           11           35
50 – 59           11           46
60 – 69           3            49
70 – 79           1            50
Total             50

Page 52: STA301_LEC05

Interpretation

•24 of the 50 managers (i.e. 48% of the managers) are 39 years of age or less (i.e. less than 40 years old).

•46 of the 50 managers (i.e. 92% of the managers) are 59 years of age or less (i.e. less than 60 years old), and so on.
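The percentages quoted in this interpretation follow directly from the cumulative frequencies. A short Python sketch, with illustrative variable names, is given below.

```python
from itertools import accumulate

# Class intervals and frequencies of the child-care data.
classes = ["20 - 29", "30 - 39", "40 - 49", "50 - 59", "60 - 69", "70 - 79"]
frequencies = [6, 18, 11, 11, 3, 1]

cumulative = list(accumulate(frequencies))   # running totals
total = cumulative[-1]                       # the last running total is the overall total

# Cumulative frequency expressed as a percentage of all managers.
percent = [100 * cf / total for cf in cumulative]

for label, cf, pct in zip(classes, cumulative, percent):
    upper = label.split(" - ")[1]
    print(f"{cf} of {total} managers ({pct:.0f}%) are {upper} years of age or less")
```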

Page 53: STA301_LEC05

Cumulative frequency polygon or Ogive

[Figure: ogive of the child-care data; X-axis: class boundaries (19.5 to 79.5), Y-axis: cumulative frequency (0 to 60)]

Page 54: STA301_LEC05

Real-life applications

The concept of cumulative frequency is used in many ways, including:

•Sales cumulated over a fiscal year

•Sports scores during a contest (cumulated points)

•Years of service

•Points earned in a course

•Costs of doing business over a period of time

Page 55: STA301_LEC05

EXAMPLE:

For a sample of 40 pizza products, the following data represent cost of a slice in dollars (SCost).

PRODUCT                                 SCost
Pizza Hut Hand Tossed                   1.51
Domino’s Deep Dish                      1.53
Pizza Hut Pan Pizza                     1.51
Domino’s Hand Tossed                    1.90
Little Caesars Pan! Pizza!              1.23

Continued …...

Page 56: STA301_LEC05

Continued …...

PRODUCT                                 SCost
Boboli crust with Boboli sauce          1.00
Jack’s Super Cheese                     0.69
Pappalo’s Three Cheese                  0.75
Tombstone Original Extra Cheese         0.81
Master Choice Gourmet Four Cheese       0.90
Celeste Pizza For One                   0.92
Totino’s Party                          0.64
The New Weight Watchers Extra Cheese    1.54
Jeno’s Crisp’N Tasty                    0.72
Stouffer’s French Bread 2-Cheese        1.15

Page 57: STA301_LEC05

Continued …...

PRODUCT                                 SCost
Ellio’s 9-slice                         0.52
Kroger                                  0.72
Healthy Choice French Bread             1.50
Lean Cuisine French Bread               1.49
DiGiorno Rising Crust                   0.87
Tombstone Special Order                 0.81
Pappalo’s                               0.73
Jack’s New More Cheese!                 0.64
Tombstone Original                      0.77
Red Baron Premium                       0.80

Page 58: STA301_LEC05

PRODUCT                                 SCost
Tony’s Italian Style Pastry Crust       0.83
Red Baron Deep Dish Singles             1.13
Totino’s Party                          0.62
The New Weight Watchers                 1.52
Jeno’s Crisp’N Tasty                    0.71
Stouffer’s French Bread                 1.14
Celeste Pizza For One                   1.11
Tombstone For One French Bread          1.11
Healthy Choice French Bread             1.46
Lean Cuisine French Bread               1.71

Continued …...

Page 59: STA301_LEC05

PRODUCT                                 SCost
Little Caesars Pizza! Pizza!            1.28
Pizza Hut Stuffed Crust                 1.23
DiGiorno Rising Crust Four Cheese       0.90
Tombstone Special Order Four Cheese     0.85
Red Baron Premium 4-Cheese              0.80

Example taken from “Business Statistics – A First Course” by Mark L. Berenson & David M. Levine (International Edition), Prentice-Hall International, Inc., Copyright © 1998.

Source: “Pizza,” Copyright 1997 by Consumers Union of United States, Inc., Yonkers, N.Y. 10703.

Page 60: STA301_LEC05

In order to construct the frequency distribution of the above data, the first thing to note is that, in this example, all our data values are correct to two decimal places. As such, we should construct the class limits correct to TWO decimal places, and the class boundaries correct to three decimal places.

As in the last example, first of all, let us find the maximum and the minimum values in our data, and compute the RANGE.

Minimum value X0 = 0.52

Maximum value Xm = 1.90

Hence: Range = 1.90 - 0.52 = 1.38

Page 61: STA301_LEC05

Desired number of classes = 8

Class interval h ≈ Range / Number of classes = 1.38 / 8 = 0.1725 ≈ 0.20

Lower limit of the first class = 0.51

Hence, our successive class limits come out to be:

Class Limits
0.51 – 0.70
0.71 – 0.90
0.91 – 1.10
1.11 – 1.30
1.31 – 1.50
1.51 – 1.70
1.71 – 1.90

Page 62: STA301_LEC05

Class Limits    Class Boundaries
0.51 – 0.70     0.505 – 0.705
0.71 – 0.90     0.705 – 0.905
0.91 – 1.10     0.905 – 1.105
1.11 – 1.30     1.105 – 1.305
1.31 – 1.50     1.305 – 1.505
1.51 – 1.70     1.505 – 1.705
1.71 – 1.90     1.705 – 1.905
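The steps from the range to the class boundaries can be traced in Python. The sketch below works in cents (whole numbers) rather than dollars so that two-decimal values stay exact; this choice, the width of 20 cents standing for the rounded class interval 0.20, and the variable names are assumptions made for the illustration.

```python
# All amounts are in cents so that values like 0.51 are exact integers.
x_min, x_max = 52, 190            # minimum 0.52 and maximum 1.90
data_range = x_max - x_min        # 138 cents, i.e. 1.38

desired_classes = 8
raw_width = data_range / desired_classes   # 17.25 cents, i.e. 0.1725
width = 20                        # rounded up to the convenient value 0.20
first_lower = 51                  # lower limit of the first class, 0.51

# Successive class limits: each class spans `width` consecutive cent values.
limits = []
lower = first_lower
while lower <= x_max:
    limits.append((lower, lower + width - 1))
    lower += width

# Class boundaries lie half a cent below and above the class limits.
boundaries = [((lo * 10 - 5) / 1000, (hi * 10 + 5) / 1000) for lo, hi in limits]

for (lo, hi), (b_lo, b_hi) in zip(limits, boundaries):
    print(f"{lo / 100:.2f} - {hi / 100:.2f}    {b_lo:.3f} - {b_hi:.3f}")
```

Rounding the class interval up from 0.1725 to 0.20 is why seven classes, rather than the desired eight, are enough to cover the data.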

Page 63: STA301_LEC05

By tallying the data-values in the appropriate classes, we will obtain a frequency distribution similar to the one that we obtained in the examples of the EPA mileage ratings.
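The tallying step can be sketched in Python as well. The costs below are transcribed from the tables of this example and written in cents (an assumption made to keep the arithmetic exact); the class structure is the one constructed above, and the variable names are illustrative.

```python
# Cost of a slice for the 40 pizza products, in cents, in the order of the tables.
costs_cents = [151, 153, 151, 190, 123,
               100, 69, 75, 81, 90, 92, 64, 154, 72, 115,
               52, 72, 150, 149, 87, 81, 73, 64, 77, 80,
               83, 113, 62, 152, 71, 114, 111, 111, 146, 171,
               128, 123, 90, 85, 80]

first_lower = 51    # lower limit of the first class (0.51)
width = 20          # class interval (0.20)
n_classes = 7       # classes 0.51 - 0.70 up to 1.71 - 1.90

# Each cost falls in the class whose index is (cost - first lower limit) // width.
counts = [0] * n_classes
for cost in costs_cents:
    counts[(cost - first_lower) // width] += 1

for i, f in enumerate(counts):
    lo = first_lower + i * width
    hi = lo + width - 1
    print(f"{lo / 100:.2f} - {hi / 100:.2f}: {f}")
print("Total:", sum(counts))
```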

By constructing the histogram of this data-set, we will be able to decide whether our distribution is symmetric, positively skewed or negatively skewed.

Page 64: STA301_LEC05

IN TODAY’S LECTURE, YOU LEARNT

•Frequency Distribution of a continuous variable

•Relative frequency distribution

•Percentage frequency distribution

•Histogram

•Frequency polygon

•Frequency curve

Page 65: STA301_LEC05

IN THE NEXT LECTURE, YOU WILL LEARN

•Stem and leaf plot

•Dot plot

•The Concept of Central Tendency