
chapter2-091117004812-phpapp01

Apr 05, 2018

  • 7/31/2019 chapter2-091117004812-phpapp01

    1/55

    QQS1013

    ELEMENTARY STATISTICS

    CHAPTER 2 DESCRIPTIVE STATISTICS

    2.1 Introduction

    2.2 Organizing and Graphing Qualitative Data

    2.3 Organizing and Graphing Quantitative Data

    2.4 Central Tendency Measurement

    2.5 Dispersion Measurement

    2.6 Mean, Variance and Standard Deviation for Grouped Data

    2.7 Measure of Skewness


    OBJECTIVES

    After completing this chapter, students should be able to:

    Create and interpret graphical displays involving qualitative and quantitative data.

    Describe the difference between grouped and ungrouped frequency distributions, frequency and relative frequency, and relative frequency and cumulative relative frequency.

    Identify and describe the parts of a frequency distribution: class boundaries, class width, and class midpoint.

    Identify the shapes of distributions.

    Compute, describe, compare and interpret the three measures of central tendency (mean, median, and mode) for ungrouped and grouped data.

    Compute, describe, compare and interpret the two measures of dispersion, range and standard deviation (variance), for ungrouped and grouped data.

    Compute, describe, and interpret the two measures of position, quartiles and interquartile range, for ungrouped and grouped data.

    Compute, describe and interpret the measure of skewness: the Pearson coefficient of skewness.


    2.1 Introduction

    Raw data - Data recorded in the sequence in which they are collected and before they are processed or ranked.

    Array data - Raw data that are arranged in ascending or descending order.

    Example 1

    Here is a list of questions asked in a large statistics class and the raw data given by one of the students:

    1. What is your sex (m = male, f = female)? Answer (raw data): m

    2. How many hours did you sleep last night? Answer: 5 hours

    3. Randomly pick a letter, S or Q. Answer: S

    4. What is your height in inches? Answer: 67 inches

    5. What's the fastest you've ever driven a car (mph)? Answer: 110 mph

    Example 2

    Quantitative raw data

    Qualitative raw data

    These data are also called ungrouped data.

    2.2 Organizing and Graphing Qualitative Data

    2.2.1 Frequency Distributions / Table

    2.2.2 Relative Frequency and Percentage Distribution

    2.2.3 Graphical Presentation of Qualitative Data

    2.2.1 Frequency Distributions / Table

    A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories. It shows how the frequencies are distributed over the various categories.

    It is also called a frequency distribution table or simply a frequency table.

    The number of elements (for example, students) that belong to a certain category is called the frequency of that category.


    2.2.2 Relative Frequency and Percentage Distribution

    A relative frequency distribution is a listing of all categories along

    with their relative frequencies (given as proportions or percentages).

    It is commonplace to give the frequency and relative frequency

    distribution together.

    Calculating relative frequency and percentage of a category:

    $$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$$

    $$\text{Percentage} = (\text{Relative frequency}) \times 100$$


    Example 3

    A sample of UUM staff-owned vehicles produced by Proton was

    identified and the make of each noted. The resulting sample follows (W =

    Wira, Is = Iswara, Wj = Waja, St = Satria, P = Perdana, Sv = Savvy):

    W W P Is Is P Is W St Wj

    Is W W Wj Is W W Is W Wj

    Wj Is Wj Sv W W W Wj St W

    Wj Sv W Is P Sv Wj Wj W W

    St W W W W St St P Wj Sv

    Construct a frequency distribution table for these data with their relative

    frequency and percentage.

    Solution:

    Category    Frequency    Relative Frequency    Percentage (%)
    Wira        19           19/50 = 0.38          0.38 × 100 = 38
    Iswara      8            0.16                  16
    Perdana     4            0.08                  8
    Waja        10           0.20                  20
    Satria      5            0.10                  10
    Savvy       4            0.08                  8
    Total       50           1.00                  100
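    As a cross-check on the solution above, the counting can be sketched in Python with `collections.Counter`. This is a minimal illustration, not part of the original course material; the variable names (`data`, `names`, `table`) are our own, and the 50 observations are the Example 3 sample typed in row by row.

```python
from collections import Counter

# Example 3 sample of 50 Proton vehicles, one row of 10 per line
# (W = Wira, Is = Iswara, Wj = Waja, St = Satria, P = Perdana, Sv = Savvy).
data = """
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
""".split()

names = {"W": "Wira", "Is": "Iswara", "P": "Perdana",
         "Wj": "Waja", "St": "Satria", "Sv": "Savvy"}

freq = Counter(data)          # frequency of each category
total = sum(freq.values())    # sum of all frequencies (n = 50)

table = {}
for code, name in names.items():
    f = freq[code]
    rel = f / total                      # relative frequency = f / sum of f
    table[name] = (f, rel, rel * 100)    # percentage = relative frequency * 100

for name, (f, rel, pct) in table.items():
    print(f"{name:<8} {f:>3} {rel:>6.2f} {pct:>6.1f}")
```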

    2.2.3 Graphical Presentation of Qualitative Data

    1. Bar Graphs

    A graph made of bars whose heights represent the frequencies of

    respective categories.

    Such a graph is most helpful when you have many categories to

    represent.

    Notice that a gap is inserted between each of the bars.

    There are four types of bar chart:

    => simple/vertical bar chart

    => horizontal bar chart

    => component bar chart

    => multiple bar chart

    Simple/ Vertical Bar Chart

    To construct a vertical bar chart, mark the various categories on the horizontal axis and mark the frequencies on the vertical axis.

    Refer to Figure 2.1 and Figure 2.2.

    [Figure 2.1 and Figure 2.2]

    Horizontal Bar Chart

    To construct a horizontal bar chart, mark the various categories on

    the vertical axis and mark the frequencies on the horizontal axis.

    Example 4: Refer to Example 3.

    Figure 2.3: UUM Staff-owned Vehicles Produced By Proton [horizontal bar chart; vertical axis: Types of Vehicle (Wira, Iswara, Perdana, Waja, Satria, Savvy); horizontal axis: Frequency (0 to 20)]


    Another example of a horizontal bar chart: Figure 2.4

    Figure 2.4: Number of students at Diversity College who are immigrants, by last country of permanent residence

    Component Bar Chart

    To construct a component bar chart, all categories are placed in one bar and each bar is divided into components.

    The height of each component should tally with its respective frequency.

    Example 5

    Suppose we want to illustrate the information below, representing the number of people participating in the activities offered by an outdoor pursuits centre during June of three consecutive years.

    Activity    2004    2005    2006
    Climbing    21      34      36
    Caving      10      12      21
    Walking     75      85      100
    Sailing     36      36      40
    Total       142     167     197
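    A component bar chart stacks the components of each year on top of one another, so each segment starts where the previous one ends. The sketch below is an illustration under the assumption that the components are stacked in table order; `stack_segments` is a hypothetical helper name. It computes the bottom and top of every segment for the Example 5 data, and the top of the last segment is the height of the whole bar, i.e. the yearly total.

```python
# Example 5 data: participants per activity in June of each year.
years = [2004, 2005, 2006]
activities = {
    "Climbing": [21, 34, 36],
    "Caving":   [10, 12, 21],
    "Walking":  [75, 85, 100],
    "Sailing":  [36, 36, 40],
}

def stack_segments(activities, n_years):
    """Return, for each year, a list of (activity, bottom, top) segments."""
    segments = []
    for j in range(n_years):
        bottom = 0
        year_segments = []
        for name, counts in activities.items():
            top = bottom + counts[j]   # segment height equals its frequency
            year_segments.append((name, bottom, top))
            bottom = top               # next segment starts where this one ends
        segments.append(year_segments)
    return segments

segments = stack_segments(activities, len(years))
for year, segs in zip(years, segments):
    print(year, segs)
```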


    Solution:

    Figure 2.5: Activities Breakdown (Jun) [component bar chart; horizontal axis: Year (2004, 2005, 2006); vertical axis: Number of participants (0 to 200); components: Climbing, Caving, Walking, Sailing]

    Multiple Bar Chart

    To construct a multiple bar chart, the bars representing the categories are gathered in groups.

    The height of each bar represents the frequency of its category.

    It is useful for making comparisons (two or more values).

    Example 6: Refer to Example 5.

    Figure 2.6: Activities Breakdown (Jun) [multiple bar chart; horizontal axis: Year (2004, 2005, 2006); vertical axis: Number of participants (0 to 120); bars: Climbing, Caving, Walking, Sailing]


    Another example of a horizontal bar chart: Figure 2.7

    Figure 2.7: Preferred snack choices of students at UUM

    The bar graphs for relative frequency and percentage distributions

    can be drawn simply by marking the relative frequencies or

    percentages, instead of the class frequencies.

    2. Pie Chart

    A circle divided into portions that represent the relative frequencies

    or percentages of a population or a sample belonging to different

    categories.

    An alternative to the bar chart and useful for summarizing a single

    categorical variable if there are not too many categories.

    The chart makes it easy to compare relative sizes of each

    class/category.

    The whole pie represents the total sample or population. The pie is

    divided into different portions that represent the different categories.

    To construct a pie chart, we multiply 360° by the relative frequency of each category to obtain the degree measure (size of the angle) for the corresponding category.


    Example 7 (Table 2.6 and Figure 2.8):

    Table 2.6 Figure 2.8

    Example 8 (Table 2.7 and Figure 2.9):

    Movie Genre        Frequency    Relative Frequency    Angle Size
    Comedy             54           0.27                  360 × 0.27 = 97.2°
    Action             36           0.18                  360 × 0.18 = 64.8°
    Romance            28           0.14                  360 × 0.14 = 50.4°
    Drama              28           0.14                  360 × 0.14 = 50.4°
    Horror             22           0.11                  360 × 0.11 = 39.6°
    Foreign            16           0.08                  360 × 0.08 = 28.8°
    Science Fiction    16           0.08                  360 × 0.08 = 28.8°
    Total              200          1.00                  360°

    [Figure 2.9]
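    The angle rule (angle = 360° × relative frequency) can be sketched as a short Python function. `pie_angles` is a hypothetical name used only for this illustration; the frequencies are those of Example 8.

```python
# Example 8: frequencies of movie genres (total 200).
genres = {
    "Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
    "Horror": 22, "Foreign": 16, "Science Fiction": 16,
}

def pie_angles(freqs):
    """Angle in degrees for each category = 360 * relative frequency."""
    total = sum(freqs.values())
    return {name: 360 * f / total for name, f in freqs.items()}

angles = pie_angles(genres)
for name, angle in angles.items():
    print(f"{name:<16} {angle:6.1f} degrees")
```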


    3. Line Graph/Time Series Graph

    A graph that represents data occurring over a specific period of time.

    Line graphs are among the most popular graphs because their visual characteristics reveal data trends clearly and they are easy to create.

    When analyzing the graph, look for a trend or pattern that occurs over the time period.

    For example, check whether the line is ascending (indicating an increase over time) or descending (indicating a decrease over time).

    Another thing to look for is the slope, or steepness, of the line. A line that is steep over a specific time period indicates a rapid increase or decrease over that period.

    Two data sets can be compared on the same graph (called a compound time series graph) if two lines are used.

    Data collected on the same element for the same variable at different

    points in time or for different periods of time are called time series

    data.

    A line graph is a visual comparison of how two variables, shown on the x- and y-axes, are related or vary with each other. It shows related information by drawing a continuous line between all the points on a grid.

    Line graphs compare two variables: one is plotted along the x-axis (horizontal) and the other along the y-axis (vertical).

    The y-axis in a line graph usually indicates a quantity (e.g., RM, number of sales, litres) or a percentage, while the horizontal x-axis often measures units of time. As a result, the line graph is often viewed as a time series graph.


    Example 9

    A transit manager wishes to use the following data for a presentation showing how Port Authority Transit ridership has changed over the years. Draw a time series graph for the data and summarize the findings.

    Year    Ridership (in millions)
    1990    88.0
    1991    85.0
    1992    75.7
    1993    76.6
    1994    75.4

    Solution:

    The graph shows a decline in ridership through 1992 and then a leveling off for the years 1993 and 1994.

    [Figure: time series graph; horizontal axis: Year (1990 to 1994); vertical axis: Ridership (in millions), 75 to 89]
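    To support the trend described in the solution with numbers, one can compute the change in ridership from each year to the next. This is a small illustrative sketch using the Example 9 data; a large negative change corresponds to a steep downward segment of the line.

```python
# Example 9: Port Authority Transit ridership (millions).
ridership = {1990: 88.0, 1991: 85.0, 1992: 75.7, 1993: 76.6, 1994: 75.4}

years = sorted(ridership)
# Change from each year to the next, rounded to one decimal place.
changes = {
    f"{a}-{b}": round(ridership[b] - ridership[a], 1)
    for a, b in zip(years, years[1:])
}
print(changes)
```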


    Exercise 1

    1. The following data show the method of payment used by 16 customers in a supermarket checkout line. Here, C = cash, CK = check, CC = credit card, D = debit and O = other.

    C CK CK C CC D O C

    CK CC D CC C CK CK CC

    a. Construct a frequency distribution table.

    b. Calculate the relative frequencies and percentages for all categories.

    c. Draw a pie chart for the percentage distribution.

    2. The frequency distribution table below shows the sales of certain products in ZeeZee Company, with the sales frequency of each product over a certain period. Find the relative frequency and the percentage of each product. Then, construct a pie chart using the obtained information.

    Type of Product    Frequency    Relative Frequency    Percentage    Angle Size
    A                  13
    B                  12
    C                  5
    D                  9
    E                  11

    3. Draw a time series graph to represent the data for the number of worldwide airline fatalities for the given years.

    Year                 1990    1991    1992    1993    1994    1995    1996
    No. of fatalities    440     510     990     801     732     557     1132

    4. A questionnaire about how people get news resulted in the following information from 25 respondents (N = newspaper, T = television, R = radio, M = magazine).

    N N R T T

    R N T M R

    M M N R N

    T R M N M

    T R R N N

    a. Construct a frequency distribution for the data.

    b. Construct a bar graph for the data.

    5. The following information shows the export and import trade (in million RM) for four months of a certain year. Using this information, present the data in a component bar graph.

    Month        Export    Import
    September    28        20
    October      30        28
    November     32        17
    December     24        14

    6. The following information represents the maximum rainfall in millimetres (mm) in each state in Malaysia. You are asked to help a meteorologist in your area make an analysis. Based on your knowledge, present this information using the most appropriate chart and give your comments.

    State                               Quantity (mm)
    Perlis                              435
    Kedah                               512
    Pulau Pinang                        163
    Perak                               721
    Selangor                            664
    Wilayah Persekutuan Kuala Lumpur    1003
    Negeri Sembilan                     390
    Melaka                              223
    Johor                               876
    Pahang                              1050
    Terengganu                          1255
    Kelantan                            986
    Sarawak                             878
    Sabah                               456


    2.3 Organizing and Graphing Quantitative Data

    2.3.1 Stem and Leaf Display

    2.3.2 Frequency Distribution

    2.3.3 Relative Frequency and Percentage Distributions

    2.3.4 Graphing Grouped Data

    2.3.5 Shapes of Histogram

    2.3.6 Cumulative Frequency Distributions

    2.3.7 Box-Plot

    2.3.1 Stem-and-Leaf Display

    In a stem-and-leaf display of quantitative data, each value is divided into two portions: a stem and a leaf. Then the leaves for each stem are shown separately in a display.

    It gives information about the pattern of the data.

    It shows which values are frequently repeated.

    Example 10

    25 12 9 10 5 12 23 7 36 13 11 12 31 28 37 6 14 41 38 44 13 22 18 19

    Solution:

    Stem | Leaves
    0    | 9 5 7 6
    1    | 2 0 2 3 1 2 4 3 8 9
    2    | 5 3 8 2
    3    | 6 1 7 8
    4    | 1 4
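    The display in Example 10 can be generated with a few lines of Python. This sketch assumes non-negative whole numbers below 100, uses the tens digit as the stem and the units digit as the leaf, and keeps the leaves in the order the values appear (as in the solution above); `stem_and_leaf` is a hypothetical helper name.

```python
def stem_and_leaf(values):
    """Group values by tens digit (stem); leaves are units digits in order of appearance."""
    display = {}
    for v in values:
        stem, leaf = divmod(v, 10)
        display.setdefault(stem, []).append(leaf)
    return dict(sorted(display.items()))   # stems in increasing order

data = [25, 12, 9, 10, 5, 12, 23, 7, 36, 13, 11, 12,
        31, 28, 37, 6, 14, 41, 38, 44, 13, 22, 18, 19]

display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```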


    2.3.2 Frequency Distributions

    A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.

    Data presented in the form of a frequency distribution are called grouped data.

    The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. It is also called the real class limit.

    To find the midpoint of the upper limit of the first class and the lower limit of the second class, we divide the sum of these two limits by 2.

    e.g.:

    $$\text{Class boundary} = \frac{400 + 401}{2} = 400.5$$


    Class Width (class size)

    $$\text{Class width} = \text{Upper boundary} - \text{Lower boundary}$$

    e.g.: Width of the first class = 600.5 - 400.5 = 200

    Class Midpoint or Mark

    $$\text{Class midpoint or mark} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$

    e.g.:

    $$\text{Midpoint of the 1st class} = \frac{401 + 600}{2} = 500.5$$
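    The three definitions above can be written as small Python functions. The function names are illustrative, and the sketch assumes that the class after 401 - 600 starts at 601, which is what an upper boundary of 600.5 implies.

```python
def class_boundary(upper_limit, next_lower_limit):
    """Midpoint of the upper limit of one class and the lower limit of the next."""
    return (upper_limit + next_lower_limit) / 2

def class_width(lower_boundary, upper_boundary):
    return upper_boundary - lower_boundary

def class_midpoint(lower_limit, upper_limit):
    return (lower_limit + upper_limit) / 2

# First class 401 - 600; the next class is assumed to start at 601.
lower_b = class_boundary(400, 401)   # 400.5
upper_b = class_boundary(600, 601)   # 600.5
print(lower_b, upper_b, class_width(lower_b, upper_b), class_midpoint(401, 600))
```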


    Constructing Frequency Distribution Tables

    1. To decide the number of classes, we use Sturges' formula, which is

    c = 1 + 3.3 log n

    where c is the no. of classes,

    n is the no. of observations in the data set, and log is the common (base-10) logarithm.

    2. Class width,

    $$i = \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} = \frac{\text{Range}}{c}$$

    This class width is rounded to a convenient number.

    3. Lower Limit of the First Class or the Starting Point

    Use the smallest value in the data set.
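    Steps 1 to 3 can be sketched as follows, using the values of Example 11 below (n = 30, largest value 242, smallest value 135). The rounding choices are assumptions: the number of classes is rounded to the nearest integer, and the class width is rounded up, which is one reasonable reading of "a convenient number" because it guarantees that c classes cover the whole range. The function names are hypothetical.

```python
import math

def sturges_classes(n):
    """c = 1 + 3.3 log10(n), rounded to the nearest whole number (assumption)."""
    return round(1 + 3.3 * math.log10(n))

def class_width(largest, smallest, c):
    """i = (largest - smallest) / c, rounded up to a whole number (assumption)."""
    return math.ceil((largest - smallest) / c)

c = sturges_classes(30)          # 1 + 3.3(1.477) = 5.87, rounded to 6
i = class_width(242, 135, c)     # 107 / 6 = 17.8, rounded up to 18
start = 135                      # lower limit of the first class = smallest value
print(c, i, start)
```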

    Example 11

    The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season.


    Solution:

    i) Number of classes, c = 1 + 3.3 log 30 = 1 + 3.3(1.48) = 5.88 ≈ 6 classes

    ii) Class width,

    $$i = \frac{242 - 135}{6} = 17.8 \approx 18$$

    iii) Starting point = 135

    Table 2.10 Frequency Distribution for Data of Table 2.9

    Total Home Runs    Tally          f
    135 - 152          ||||| |||||    10
    153 - 170          ||             2
    171 - 188          |||||          5
    189 - 206          ||||| |        6
    207 - 224          |||            3
    225 - 242          ||||           4
                                      Σf = 30


    2.3.3 Relative Frequency and Percentage Distributions

    $$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Sum of all frequencies}} = \frac{f}{\sum f}$$

    $$\text{Percentage} = (\text{Relative frequency}) \times 100$$

    Example 12 (Refer to Example 11)

    Table 2.11: Relative Frequency and Percentage Distributions

    Total Home Runs    Class Boundaries         Relative Frequency    %
    135 - 152          134.5 less than 152.5    0.3333                33.33
    153 - 170          152.5 less than 170.5    0.0667                6.67
    171 - 188          170.5 less than 188.5    0.1667                16.67
    189 - 206          188.5 less than 206.5    0.2                   20
    207 - 224          206.5 less than 224.5    0.1                   10
    225 - 242          224.5 less than 242.5    0.1333                13.33
    Sum                                         1.0                   100%
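    Table 2.11 can be reproduced from the frequencies of Table 2.10 with the sketch below. It assumes whole-number class limits, so each class boundary lies 0.5 below the lower limit or 0.5 above the upper limit, which agrees with the midpoint rule for class boundaries given earlier.

```python
# Table 2.10: classes (lower limit, upper limit) and frequencies.
classes = [(135, 152), (153, 170), (171, 188), (189, 206), (207, 224), (225, 242)]
freqs = [10, 2, 5, 6, 3, 4]

total = sum(freqs)  # sum of f = 30
rows = []
for (lo, hi), f in zip(classes, freqs):
    boundaries = (lo - 0.5, hi + 0.5)   # whole-number limits one unit apart
    rel = f / total                     # relative frequency
    rows.append((lo, hi, boundaries, round(rel, 4), round(rel * 100, 2)))

for row in rows:
    print(row)
```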

    2.3.4 Graphing Grouped Data

    1. Histograms

    A histogram is a graph in which the class boundaries are

    marked on the horizontal axis and either the frequencies,

    relative frequencies, or percentages are marked on the vertical

    axis. The frequencies, relative frequencies or percentages are

    represented by the heights of the bars.

    In a histogram, the bars are drawn adjacent to each other (there are no gaps between bars), and there is a space between the y-axis and the first bar.


    Example 13 (Refer to Example 11)

    Figure 2.10: Frequency histogram for Table 2.10 [horizontal axis: Total home runs (class boundaries 134.5, 152.5, 170.5, 188.5, 206.5, 224.5, 242.5); vertical axis: Frequency (0 to 12)]

    2. Polygon

    A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.

    Example 13

    Figure 2.11: Frequency polygon for Table 2.10 [horizontal axis: Total home runs (134.5 to 242.5); vertical axis: Frequency (0 to 12)]


    For a very large data set, as the number of classes is increased (and the width of classes is decreased), the frequency polygon eventually becomes a smooth curve called a frequency distribution curve or simply a frequency curve.

    Figure 2.12: Frequency distribution curve

    2.3.5 Shape of Histogram

    The shape of a histogram is the same as the shape of its polygon.


    The most common shapes are:

    (i) Symmetric

    Figure 2.13 & 2.14: Symmetric histograms


    (ii) Right skewed and (iii) Left skewed

    Figure 2.15 & 2.16: Right skewed and Left skewed

    Describing data using graphs helps us gain insight into the main characteristics of the data.

    When interpreting a graph, we should be very cautious. We should

    observe carefully whether the frequency axis has been truncated or

    whether any axis has been unnecessarily shortened or stretched.

    2.3.6 Cumulative Frequency Distributions

    A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.

    Example 14: Using the frequency distribution of Table 2.11,

    Total Home Runs    Class Boundaries         Cumulative Frequency
    135 - 152          134.5 less than 152.5    10
    153 - 170          152.5 less than 170.5    10 + 2 = 12
    171 - 188          170.5 less than 188.5    10 + 2 + 5 = 17
    189 - 206          188.5 less than 206.5    10 + 2 + 5 + 6 = 23
    207 - 224          206.5 less than 224.5    10 + 2 + 5 + 6 + 3 = 26
    225 - 242          224.5 less than 242.5    10 + 2 + 5 + 6 + 3 + 4 = 30
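    Cumulative frequencies are running totals of the class frequencies, so Python's `itertools.accumulate` produces them directly. A minimal sketch for the Example 14 data:

```python
from itertools import accumulate

# Frequencies of Table 2.10 / 2.11 in class order, with each class's upper boundary.
freqs = [10, 2, 5, 6, 3, 4]
upper_boundaries = [152.5, 170.5, 188.5, 206.5, 224.5, 242.5]

# Cumulative frequency of a class = its frequency plus all earlier frequencies.
cum = list(accumulate(freqs))
for ub, cf in zip(upper_boundaries, cum):
    print(f"less than {ub}: {cf}")
```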


    Ogive

    An ogive is a curve drawn for the cumulative frequency distribution

    by joining with straight lines the dots marked above the upper

    boundaries of classes at heights equal to the cumulative frequencies

    of respective classes.

    There are two types of ogive:

    (i) ogive less than

    (ii) ogive greater than

    First, build a table of cumulative frequency.

    Example 15 (Ogive Less Than)

    Earnings (RM)    Number of students (f)
    30 - 39          5
    40 - 49          6
    50 - 59          6
    60 - 69          3
    70 - 79          3
    80 - 89          7
    Total            30

    Earnings (RM)     Cumulative Frequency (F)
    Less than 29.5    0
    Less than 39.5    5
    Less than 49.5    11
    Less than 59.5    17
    Less than 69.5    20
    Less than 79.5    23
    Less than 89.5    30

    Figure 2.17 [less-than ogive; horizontal axis: Earnings (29.5 to 89.5); vertical axis: Cumulative Frequency (0 to 35)]


    Example 16 (Ogive Greater Than)

    Earnings (RM)    Number of students (f)
    30 - 39          5
    40 - 49          6
    50 - 59          6
    60 - 69          3
    70 - 79          3
    80 - 89          7
    Total            30

    Earnings (RM)     Cumulative Frequency (F)
    More than 29.5    30
    More than 39.5    25
    More than 49.5    19
    More than 59.5    13
    More than 69.5    10
    More than 79.5    7
    More than 89.5    0

    Figure 2.18 [greater-than ogive; horizontal axis: Earnings (29.5 to 89.5); vertical axis: Cumulative Frequency (0 to 35)]
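    Both ogive tables can be computed from the class frequencies. In this sketch, using the Example 15 and 16 data, the "less than" column is 0 followed by the running totals, and the "more than" column is the total n minus the "less than" value at the same boundary.

```python
from itertools import accumulate

# Examples 15 and 16: earnings classes 30-39, ..., 80-89 and number of students.
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [5, 6, 6, 3, 3, 7]
n = sum(freqs)

# "Less than" ogive: 0 below the first boundary, then running totals.
less_than = [0] + list(accumulate(freqs))
# "More than" ogive: values above each boundary = n minus those below it.
greater_than = [n - cf for cf in less_than]

for b, lt, gt in zip(boundaries, less_than, greater_than):
    print(f"{b}: less than -> {lt}, more than -> {gt}")
```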


    2.3.7 Box-Plot

    A box plot describes and analyzes data graphically using five measurements: the smallest value, the first quartile (K1), the second quartile (median or K2), the third quartile (K3) and the largest value.

    [Figure: box plots for symmetric data, left skewed data and right skewed data, each marking the smallest value, K1, median, K3 and largest value]

    2.4 Measures of Central Tendency

    2.4.1 Ungrouped Data
    (1) Mean
    (2) Weighted mean
    (3) Median
    (4) Mode

    2.4.2 Grouped Data
    (1) Mean
    (2) Median
    (3) Mode

    2.4.3 Relationship among mean, median & mode

    2.4.1 Ungrouped Data

    1. Mean

    Mean for population data:

    $$\mu = \frac{\sum x}{N}$$

    Mean for sample data:

    $$\bar{x} = \frac{\sum x}{n}$$

    where: Σx = the sum of all values, N = the population size, n = the sample size, μ = the population mean, x̄ = the sample mean.

    Example 17

    The following data give the prices (rounded to the nearest thousand RM) of five homes sold recently in Sekayang.

    158 189 265 127 191

    Find the mean sale price for these homes.

    Solution:

    $$\bar{x} = \frac{\sum x}{n} = \frac{158 + 189 + 265 + 127 + 191}{5} = \frac{930}{5} = 186$$

    Thus, these five homes were sold for an average price of RM186 thousand, or RM186 000.
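    A one-line Python version of the sample-mean formula, applied to Example 17. This is an illustrative sketch; `mean` is our own helper name, and Python's `statistics.mean` gives the same result.

```python
def mean(values):
    """Sample mean x-bar = (sum of x) / n; the population mean divides by N the same way."""
    return sum(values) / len(values)

prices = [158, 189, 265, 127, 191]   # Example 17, thousand RM
print(mean(prices))
```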


    The mean has the advantage that its calculation includes each value of the data set.

    2. Weighted Mean

    Used when the values do not carry equal importance, so each value is given a weight.

    Weighted mean:

    $$\bar{x} = \frac{\sum wx}{\sum w}$$

    where w is a weight.

    Example 18

    Consider the following data on electrical components purchased from a factory:

    Type     Number of components (w)    Cost/unit (x)
    1        1200                        RM3.00
    2        500                         RM3.40
    3        2500                        RM2.80
    4        1000                        RM2.90
    5        800                         RM3.25
    Total    6000

    Solution:

    $$\bar{x} = \frac{\sum wx}{\sum w} = \frac{1200(3) + 500(3.4) + 2500(2.8) + 1000(2.9) + 800(3.25)}{1200 + 500 + 2500 + 1000 + 800} = \frac{17800}{6000} = 2.967$$

    The mean cost of a unit of the component is RM2.97.
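    The weighted-mean formula translates directly into Python. The sketch below uses the Example 18 data; `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """x-bar = sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

cost = [3.00, 3.40, 2.80, 2.90, 3.25]    # Example 18: cost per unit in RM (x)
units = [1200, 500, 2500, 1000, 800]     # number of components (w)
print(round(weighted_mean(cost, units), 3))
```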


    3. Median

    Median is the value of the middle term in a data set that has been

    ranked in increasing order.

    Procedure for finding the Median

    Step 1: Rank the data set in increasing order.

    Step 2: Determine the depth (position or location) of the median.

    $$\text{Depth of median} = \frac{n + 1}{2}$$

    Step 3: Determine the value of the median.

    Example 19

    Find the median for the following data:

    10 5 19 8 3

    Solution:

    (1) Rank the data in increasing order: 3 5 8 10 19

    (2) Determine the depth of the median:

    $$\text{Depth of median} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3$$

    (3) Determine the value of the median: the median is located in the third position of the ranked data set.

    3 5 8 10 19

    Hence, the median for the above data = 8.


    Example 20

    Find the median for the following data:

    10 5 19 8 3 15

    Solution:

    (1) Rank the data in increasing order:

    3 5 8 10 15 19

    (2) Determine the depth of the median:

    $$\text{Depth of median} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$$

    (3) Determine the value of the median: the median is located halfway between the 3rd and 4th positions of the ranked data set.

    $$\text{Median} = \frac{8 + 10}{2} = 9$$

    Hence, the median for the above data = 9.
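    The three-step procedure can be sketched in Python, reproducing Examples 19 and 20. The helper name `median` is our own; Python's `statistics.median` returns the same values for these data.

```python
def median(values):
    """Median via the depth rule: rank the data, then depth = (n + 1) / 2."""
    ranked = sorted(values)                 # Step 1: rank in increasing order
    depth = (len(ranked) + 1) / 2           # Step 2: depth (1-based position)
    lower = ranked[int(depth) - 1]          # value at the whole-number part of the depth
    if depth.is_integer():                  # Step 3: odd n gives a single middle value
        return lower
    upper = ranked[int(depth)]              # even n: average the two middle values
    return (lower + upper) / 2

print(median([10, 5, 19, 8, 3]))        # Example 19 -> 8
print(median([10, 5, 19, 8, 3, 15]))    # Example 20 -> 9.0
```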

    The median gives the center of a histogram, with half of the data

    values to the left of (or, less than) the median and half to the right of

    (or, more than) the median.

    The advantage of using the median is that it is not influenced by

    outliers.


    4. Mode

    Mode is the value that occurs with the highest frequency in a data set.

    Example 21

    1. What is the mode for the given data?

    77 69 74 81 71 68 74 73

    2. What is the mode for the given data?

    77 69 68 74 81 71 68 74 73

    Solution:

    1. Mode = 74 (this number occurs twice): Unimodal

    2. Mode = 68 and 74: Bimodal

    A major shortcoming of the mode is that a data set may have

    none or may have more than one mode.

    One advantage of the mode is that it can be calculated for both

    kinds of data, quantitative and qualitative.
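    A sketch that finds every mode of a data set, so that bimodal data (as in Example 21) and data with no mode are both handled. It follows the convention that a data set in which every value occurs once has no mode; `modes` is a hypothetical helper name.

```python
from collections import Counter

def modes(values):
    """All values with the highest frequency; an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:            # every value occurs once: the data set has no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([77, 69, 74, 81, 71, 68, 74, 73]))       # Example 21, data set 1
print(modes([77, 69, 68, 74, 81, 71, 68, 74, 73]))   # Example 21, data set 2
```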

    2.4.2 Grouped Data

    1. Mean

    Mean for population data:

    $$\mu = \frac{\sum fx}{N}$$

    Mean for sample data:

    $$\bar{x} = \frac{\sum fx}{n}$$

    where x is the midpoint and f is the frequency of a class.


    Example 22

    The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company. Calculate the mean.

    Number of orders    f
    10 - 12             4
    13 - 15             12
    16 - 18             20
    19 - 21             14
                        n = 50

    Solution:

    Because the data set includes only 50 days, it represents a sample. The value of Σfx is calculated in the following table:

    Number of orders    f         x     fx
    10 - 12             4         11    44
    13 - 15             12        14    168
    16 - 18             20        17    340
    19 - 21             14        20    280
                        n = 50          Σfx = 832

    The value of the sample mean is:

    $$\bar{x} = \frac{\sum fx}{n} = \frac{832}{50} = 16.64$$

    Thus, this mail-order company received an average of 16.64 orders per day during these 50 days.
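    The grouped-mean calculation of Example 22 can be sketched as follows: compute each class midpoint x, multiply it by f, and divide Σfx by n. The variable names are illustrative.

```python
# Example 22: classes of number of orders and their frequencies.
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
f = [4, 12, 20, 14]

x = [(lo + hi) / 2 for lo, hi in classes]      # class midpoints: 11, 14, 17, 20
n = sum(f)                                     # 50
sum_fx = sum(fi * xi for fi, xi in zip(f, x))  # 832
mean = sum_fx / n
print(n, sum_fx, mean)
```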


    2. Median

    Step 1: Construct the cumulative frequency distribution.

    Step 2: Determine the class that contains the median. The median class is the first class whose cumulative frequency is at least n/2.

    Step 3: Find the median by using the following formula:

    $$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i$$

    where: n = the total frequency, F = the total (cumulative) frequency before the median class, i = the class width, L_m = the lower boundary of the median class, f_m = the frequency of the median class.

    Example 23

    Based on the grouped data below, find the median:

    Time to travel to work    Frequency
    1 - 10                    8
    11 - 20                   14
    21 - 30                   12
    31 - 40                   9
    41 - 50                   7


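    Solution:

    1st Step: Construct the cumulative frequency distribution.

    Time to travel to work    Frequency    Cumulative Frequency
    1 - 10                    8            8
    11 - 20                   14           22
    21 - 30                   12           34
    31 - 40                   9            43
    41 - 50                   7            50

    $$\frac{n}{2} = \frac{50}{2} = 25$$

    The median class is the 3rd class (21 - 30), the first class whose cumulative frequency (34) is at least 25. Its lower boundary is (20 + 21)/2 = 20.5.

    So, F = 22, f_m = 12, L_m = 20.5 and i = 10.

    Therefore,

    $$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i = 20.5 + \left(\frac{25 - 22}{12}\right) 10 = 23$$

    Thus, 25 persons take less than 23 minutes to travel to work and another 25 persons take more than 23 minutes to travel to work.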
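    The three steps can be sketched as one Python function. This is an illustration that assumes equal-width classes with whole-number limits, so the lower boundary is the lower limit minus 0.5; `grouped_median` is a hypothetical name. Applied to Example 23 it returns 23.0.

```python
from itertools import accumulate

def grouped_median(classes, freqs):
    """Median = L_m + ((n/2 - F) / f_m) * i for whole-number class limits."""
    n = sum(freqs)
    cum = list(accumulate(freqs))
    # Median class: first class whose cumulative frequency is at least n/2.
    k = next(j for j, cf in enumerate(cum) if cf >= n / 2)
    lower_limit, upper_limit = classes[k]
    L_m = lower_limit - 0.5                  # lower boundary of the median class
    i = upper_limit - lower_limit + 1        # class width (boundary to boundary)
    F = cum[k - 1] if k > 0 else 0           # cumulative frequency before the median class
    f_m = freqs[k]
    return L_m + (n / 2 - F) / f_m * i

classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]
print(grouped_median(classes, freqs))
```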


    3. Mode

    Mode is the value that has the highest frequency in a data set.

    For grouped data, the mode class (or modal class) is the class with the highest frequency.

    To find the mode for grouped data, use the following formula:

    $$\text{Mode} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$

    where:

    L_mo is the lower boundary of the mode class

    Δ1 is the difference between the frequency of the mode class and the frequency of the class before the mode class

    Δ2 is the difference between the frequency of the mode class and the frequency of the class after the mode class

    i is the class width

    Example 24

    Based on the grouped data below, find the mode.

    Time to travel to work    Frequency
    1 - 10                    8
    11 - 20                   14
    21 - 30                   12
    31 - 40                   9
    41 - 50                   7
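    The mode formula can be sketched in the same style and applied to the Example 24 data. Assumptions: equal-width classes with whole-number limits, and a missing neighbouring class (when the modal class is the first or last) is treated as having frequency 0; `grouped_mode` is a hypothetical name. For these data the modal class is 11 - 20, with Δ1 = 14 - 8 = 6 and Δ2 = 14 - 12 = 2, so the formula gives 10.5 + (6/8)(10) = 18.

```python
def grouped_mode(classes, freqs):
    """Mode = L_mo + (d1 / (d1 + d2)) * i, using the highest-frequency class."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])   # modal class (first one on ties)
    lower_limit, upper_limit = classes[k]
    L_mo = lower_limit - 0.5                 # lower boundary of the modal class
    i = upper_limit - lower_limit + 1        # class width
    before = freqs[k - 1] if k > 0 else 0    # assumption: missing neighbour counts as 0
    after = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = freqs[k] - before
    d2 = freqs[k] - after
    return L_mo + d1 / (d1 + d2) * i

classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]
print(grouped_mode(classes, freqs))
```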


    2.4.3 Relationship among mean, median & mode

    As discussed in previous topic, histogram or a frequency

    distribution curve can assume either skewed shape or

    symmetrical shape.

    Knowing the value of mean, median and mode can give us

    some idea about the shape of frequency curve.

    (1) For a symmetrical histogram and frequency curve with one

    peak, the value of the mean, median and mode are identical

    and they lie at the center of the distribution.(Figure 2.20)(2) For a histogram and a frequency curve skewed to the right, the

    value of the mean is the largest that of the mode is the smallest

    and the value of the median lies between these two.

    Figure 2.20: Mean, median, andmode for a symmetric histogram

    and frequency distribution curve

    Figure 2.21: Mean, median, and mode fora histogram and frequency distributioncurve skewed to

    the right

    (3) For a histogram and afrequency curve skewed to

    the left, the value of the

    mean is the smallest and

    that of the mode is the

    largest and the value of the

    median lies between thesetwo.

  • 7/31/2019 chapter2-091117004812-phpapp01

    39/55

    39

    Figure 2.22: Mean, median, and mode for a histogram and frequency distribution curve skewed to the left
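    To see the ordering for a right-skewed shape on actual numbers, here is a small Python sketch. The data set is made up purely for illustration (a cluster of small values with a long tail to the right), and the sketch only uses the standard `statistics` module.

```python
import statistics

# Made-up data with a long right tail (illustration only).
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]

mean = statistics.mean(data)      # pulled towards the large values in the tail
median = statistics.median(data)  # middle of the ranked data
mode = statistics.mode(data)      # most frequent value

# For a right-skewed distribution we expect mode < median < mean.
print(mean, median, mode)
```

    Here the mean is 4, the median is 3 and the mode is 2, matching case (2) above.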


    2.5 Dispersion Measurement

    Measures of central tendency such as the mean, median and mode do not reveal the whole picture of the distribution of a data set.

    Two data sets with the same mean may have completely different spreads.

    The variation among the observations of one data set may be much larger or smaller than that of the other data set.
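    A minimal Python sketch of this point, using two made-up data sets (not from the text) that share a mean of 50 but differ greatly in spread. It uses `statistics.stdev`, the sample standard deviation defined later in this section.

```python
import statistics

# Two made-up data sets with the same mean but very different spreads.
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for name, data in (("tight", tight), ("wide", wide)):
    # Same centre, different dispersion: the mean alone cannot tell them apart.
    print(name, statistics.mean(data), round(statistics.stdev(data), 4))
```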

    2.5.1 Ungrouped data

    (1) Range

    (2) Standard Deviation

    2.5.2 Grouped data

    (1) Range

    (2) Standard deviation

    2.5.3 Relative Dispersion Measurement

    2.5.1 Ungrouped Data

    1. Range

    Range = Largest value - Smallest value

    Example 25:

    Find the range of production for this data set,


    Solution:

    Range = Largest value - Smallest value

    = 267 277 - 49 651

    = 217 626

    Disadvantages:

    - It is influenced by outliers.

    - It is based on two values only; all other values in the data set are ignored.
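    The range and its sensitivity to outliers can be sketched in a few lines of Python. The helper name `data_range` is hypothetical. The full data set of Example 25 is not reproduced in the text, so the first check uses only its largest and smallest values; the outlier demonstration uses made-up numbers.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


# Example 25, using only its two extreme values.
print(data_range([267277, 49651]))        # 217626

# Made-up data: a single outlier changes the range dramatically.
print(data_range([10, 12, 13, 15]))       # 5
print(data_range([10, 12, 13, 15, 100]))  # 90
```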

    2. Variance and Standard Deviation

    The standard deviation is the most widely used measure of dispersion.

    The value of the standard deviation tells how closely the values of a data set are clustered around the mean.

    A lower standard deviation indicates that the data values are spread over a relatively small range around the mean.

    A larger standard deviation indicates that the data values are spread over a relatively large range around the mean (far from the mean).

    The standard deviation is obtained as the positive square root of the variance:

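    Variance and standard deviation:

    Population: $\sigma^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$ and $\sigma = \sqrt{\sigma^2}$

    Sample: $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$ and $s = \sqrt{s^2}$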


    Example 26

    Let x denote the total production (in units) of five companies:

    Company    Production
    A          62
    B          93
    C          126
    D          75
    E          34

    Find the variance and standard deviation.

    Solution:

    Company    Production (x)    x²
    A          62                3844
    B          93                8649
    C          126               15876
    D          75                5625
    E          34                1156
    Total      Σx = 390          Σx² = 35150

    $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \dfrac{35150 - \frac{(390)^2}{5}}{5-1} = 1182.50$

    Since $s^2 = 1182.50$, therefore $s = \sqrt{1182.50} = 34.3875$.
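    The computational formula for the sample variance can be reproduced in Python for the Example 26 data. This is a sketch: it evaluates the shortcut formula directly, then cross-checks it against `statistics.variance` (which divides by n - 1) and shows the population version `statistics.pvariance` (which divides by N).

```python
import math
import statistics

production = [62, 93, 126, 75, 34]  # Example 26, companies A to E

n = len(production)
sum_x = sum(production)                  # sum of x
sum_x2 = sum(x * x for x in production)  # sum of x squared

# Sample variance via the computational ("shortcut") formula.
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(s2)
print(s2, round(s, 4))

# Cross-check against the standard library's sample variance.
assert math.isclose(s2, statistics.variance(production))

# Treating the five companies as a whole population divides by N instead.
print(statistics.pvariance(production))
```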


    The properties of variance and standard deviation:

    (1) The standard deviation measures the variation of all values from the mean.

    (2) The variance and the standard deviation are never negative. Larger values of the variance or standard deviation indicate greater amounts of variation.

    (3) The value of s can increase dramatically with the inclusion of one or more outliers.

    (4) The variance is measured in the square of the units of the original data, while the standard deviation is measured in the same units as the original data values.

    2.5.2 Grouped Data

    1. Range

    Range = Upper boundary of last class - Lower boundary of first class

    Class       Frequency
    41 - 50     1
    51 - 60     3
    61 - 70     7
    71 - 80     13
    81 - 90     10
    91 - 100    6
    Total       40

    Upper boundary of last class = 100.5

    Lower boundary of first class = 40.5

    Range = 100.5 - 40.5 = 60
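    For grouped data the range uses class boundaries rather than raw values. A minimal Python sketch for the table above, assuming integer class limits so that each boundary lies 0.5 beyond the corresponding limit:

```python
# Class limits from the grouped range table.
classes = [(41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]

# Assumption: integer class limits, so boundary = limit +/- 0.5.
upper_boundary_last = classes[-1][1] + 0.5   # 100.5
lower_boundary_first = classes[0][0] - 0.5   # 40.5

grouped_range = upper_boundary_last - lower_boundary_first
print(grouped_range)  # 60.0
```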


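    2. Variance and Standard Deviation

    Population: $\sigma^2 = \dfrac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}$ and $\sigma = \sqrt{\sigma^2}$

    Sample: $s^2 = \dfrac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1}$ and $s = \sqrt{s^2}$

    where f is the class frequency and x is the class midpoint.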

    Example 27

    Find the variance and standard deviation for the following data:

    No. of orders    f
    10 - 12          4
    13 - 15          12
    16 - 18          20
    19 - 21          14
    Total            n = 50

    Solution:

    No. of orders    f     x     fx     fx²
    10 - 12          4     11    44     484
    13 - 15          12    14    168    2352
    16 - 18          20    17    340    5780
    19 - 21          14    20    280    5600
    Total            n = 50      832    14216
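    The grouped sample variance for Example 27 can be sketched in Python by taking each class midpoint as x. Recomputing the fx column gives Σfx = 44 + 168 + 340 + 280 = 832, which is the total used here; the results are rounded only for display.

```python
import math

# Example 27: number of orders, grouped into classes of width 3.
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
freqs = [4, 12, 20, 14]

midpoints = [(lo + hi) / 2 for lo, hi in classes]              # x = 11, 14, 17, 20
n = sum(freqs)                                                  # 50
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))           # sum of fx
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))      # sum of fx^2

# Sample variance for grouped data (shortcut formula, n - 1 denominator).
s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
s = math.sqrt(s2)
print(round(s2, 4), round(s, 4))
```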


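    1. Quartiles

    Quartiles are three summary measures that divide a ranked data set into four equal parts. The depth is the position of the quartile in the ranked data, where n is the number of observations.

    The 1st quartile, denoted $Q_1$: $\text{Depth of } Q_1 = \dfrac{n+1}{4}$

    The 2nd quartile, $Q_2$, is the median of the data set.

    The 3rd quartile, denoted $Q_3$: $\text{Depth of } Q_3 = \dfrac{3(n+1)}{4}$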

    Example 29

    1. The data below list the total revenue for the 11 top tourism companies in Malaysia:

    109.7 79.9 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5 86.8

    Solution:

    Step 1: Arrange the data in increasing order

    76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

    Step 2: Determine the depth for Q1 and Q3


    $\text{Depth of } Q_1 = \dfrac{n+1}{4} = \dfrac{11+1}{4} = 3$

    $\text{Depth of } Q_3 = \dfrac{3(n+1)}{4} = \dfrac{3(11+1)}{4} = 9$

    Step 3: Determine Q1 and Q3

    76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

    Q1 = 79.9 (the 3rd value)

    Q3 = 103.5 (the 9th value)

    2. The data below list the total revenue for the 12 top tourism companies in Malaysia:

    109.7 79.9 74.1 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5 86.8

    Solution:

    Step 1: Arrange the data in increasing order

    74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

    Step 2: Determine the depth for Q1 and Q3

    $\text{Depth of } Q_1 = \dfrac{n+1}{4} = \dfrac{12+1}{4} = 3.25$

    $\text{Depth of } Q_3 = \dfrac{3(n+1)}{4} = \dfrac{3(12+1)}{4} = 9.75$


    Step 3: Determine the Q1 and Q3

    74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

    Q1 = 79.4 + 0.25(79.9 - 79.4) = 79.525

    Q3 = 98.0 + 0.75(103.5 - 98.0) = 102.125

    2. Interquartile Range

    The interquartile range (IQR) is the difference between the third quartile and the first quartile of a data set:

    IQR = Q3 - Q1

    Example 30

    Referring to Example 29 (part 2), calculate the IQR.

    Solution:

    IQR = Q3 - Q1 = 102.125 - 79.525 = 22.6
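    The depth rule for ungrouped quartiles, with linear interpolation when the depth is fractional, can be sketched in Python and checked against Examples 29 and 30. The function name `quartile` is hypothetical. As a cross-check, Python's `statistics.quantiles` with its default `method="exclusive"` also places the quartiles at positions (n + 1)/4 and 3(n + 1)/4; other software may use different quartile conventions and give slightly different answers.

```python
import statistics


def quartile(values, k):
    """Return the k-th quartile (k = 1, 2 or 3).

    Uses depth k(n + 1)/4 in the ranked data and interpolates linearly
    between neighbouring values when the depth is not a whole number.
    """
    data = sorted(values)
    n = len(data)
    depth = k * (n + 1) / 4     # 1-based position in the ranked data
    pos = int(depth)            # whole part of the depth
    frac = depth - pos          # fractional part used for interpolation
    if pos < 1:                 # depth before the first value
        return data[0]
    if pos >= n:                # depth at or beyond the last value
        return data[-1]
    return data[pos - 1] + frac * (data[pos] - data[pos - 1])


# Example 29: revenue figures as in the ranked lists of Step 1
revenue_11 = [109.7, 79.9, 121.2, 76.4, 80.2, 82.1, 79.4, 89.3, 98.0, 103.5, 86.8]
revenue_12 = revenue_11 + [74.1]

print(quartile(revenue_11, 1), quartile(revenue_11, 3))   # 79.9 103.5

q1, q3 = quartile(revenue_12, 1), quartile(revenue_12, 3)
print(round(q1, 3), round(q3, 3), round(q3 - q1, 3))      # Example 30: IQR

# Cross-check with the standard library (default method="exclusive").
print(statistics.quantiles(revenue_12, n=4))
```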

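    2.6.2 Grouped Data

    1. Quartiles

    Following the median formula, the equations for Q1 and Q3 are:

    $Q_1 = L_{Q_1} + i\left(\dfrac{\frac{n}{4} - F_{Q_1}}{f_{Q_1}}\right)$ ; $Q_3 = L_{Q_3} + i\left(\dfrac{\frac{3n}{4} - F_{Q_3}}{f_{Q_3}}\right)$

    where L is the lower boundary of the quartile class, F is the cumulative frequency of the classes before the quartile class, f is the frequency of the quartile class, and i is the class width.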


    Example 31

    Refer to example 23, find Q1 and Q3

    Solution:

    1st Step: Construct the cumulative frequency distribution

    Time to travel to work    Frequency    Cumulative Frequency
    1 - 10                    8            8
    11 - 20                   14           22
    21 - 30                   12           34
    31 - 40                   9            43
    41 - 50                   7            50

    2nd Step: Determine the Q1 and Q3

    Class $Q_1 = \dfrac{n}{4} = \dfrac{50}{4} = 12.5$

    Class Q1 is the 2nd class.

    Therefore,

    $Q_1 = L_{Q_1} + i\left(\dfrac{\frac{n}{4} - F_{Q_1}}{f_{Q_1}}\right) = 10.5 + 10\left(\dfrac{12.5 - 8}{14}\right) = 13.7143$


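    Class $Q_3 = \dfrac{3n}{4} = \dfrac{3(50)}{4} = 37.5$

    Class Q3 is the 4th class.

    Therefore,

    $Q_3 = L_{Q_3} + i\left(\dfrac{\frac{3n}{4} - F_{Q_3}}{f_{Q_3}}\right) = 30.5 + 10\left(\dfrac{37.5 - 34}{9}\right) = 34.3889$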

    2. Interquartile Range

    IQR = Q3 - Q1

    Example 32:

    Referring to Example 31, calculate the IQR.

    Solution:

    IQR = Q3 - Q1 = 34.3889 - 13.7143 = 20.6746
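    The grouped quartile formula can be sketched in Python by walking the cumulative frequencies until the quartile position kn/4 is reached. The function name `grouped_quartile` is hypothetical, and the sketch assumes integer class limits of equal width, so each lower boundary is the lower limit minus 0.5. With k = 2 it reduces to the grouped median.

```python
def grouped_quartile(classes, freqs, k):
    """k-th quartile of grouped data: L + i * (k*n/4 - F) / f  (k = 1, 2 or 3).

    classes: (lower_limit, upper_limit) tuples with integer limits (assumed),
             so the lower boundary of a class is its lower limit - 0.5.
    freqs:   frequency of each class.
    """
    n = sum(freqs)
    target = k * n / 4            # position of the quartile, k*n/4
    cumulative = 0                # F: cumulative frequency before the current class
    for (lo, hi), f in zip(classes, freqs):
        if cumulative + f >= target:      # first class whose cumulative frequency reaches it
            lower_boundary = lo - 0.5     # L
            width = hi - lo + 1           # i
            return lower_boundary + width * (target - cumulative) / f
        cumulative += f
    raise ValueError("quartile position exceeds the total frequency")


# Example 31: time to travel to work
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]

q1 = grouped_quartile(classes, freqs, 1)
q3 = grouped_quartile(classes, freqs, 3)
print(round(q1, 4), round(q3, 4), round(q3 - q1, 4))  # 13.7143 34.3889 20.6746
```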


    2.7 Measure of Skewness

    The Pearson Coefficient of Skewness (also called the Skewness Coefficient) is used to determine the skewness of data (symmetric, left skewed or right skewed):

    $S_k = \dfrac{\text{Mean} - \text{Mode}}{s}$ or $S_k = \dfrac{3(\text{Mean} - \text{Median})}{s}$

    If Sk is positive: right skewed

    If Sk is negative: left skewed

    If Sk = 0: symmetric

    If Sk takes a value in (-0.9999, -0.0001) or (0.0001, 0.9999): approximately symmetric

    Example 33

    The durations of stay of cancer patients warded in Hospital Seberang Jaya were recorded in a frequency distribution. From the record, the mean is 28 days, the median is 25 days and the mode is 23 days. The standard deviation is 4.2 days.

    a. What is the type of distribution?

    b. Find the skewness coefficient.

    Solution:

    a. This distribution is right skewed because the mean is the largest value.

    b. $S_k = \dfrac{\text{Mean} - \text{Mode}}{s} = \dfrac{28 - 23}{4.2} = 1.1905$

    OR

    $S_k = \dfrac{3(\text{Mean} - \text{Median})}{s} = \dfrac{3(28 - 25)}{4.2} = 2.1429$

    So, from the Sk value, this distribution is right skewed.
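    Both forms of the Pearson coefficient can be evaluated with a short Python sketch using the Example 33 summary values. The function names are hypothetical; the sketch labels the result by the sign of Sk only and does not implement the "approximately symmetric" band.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Sk = (Mean - Mode) / s"""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Sk = 3 (Mean - Median) / s"""
    return 3 * (mean - median) / sd


# Example 33: mean 28, median 25, mode 23, standard deviation 4.2 (days)
sk1 = pearson_skewness_mode(28, 23, 4.2)
sk2 = pearson_skewness_median(28, 25, 4.2)
print(round(sk1, 4), round(sk2, 4))  # 1.1905 2.1429

for sk in (sk1, sk2):
    # A positive coefficient indicates a right-skewed distribution.
    print("right skewed" if sk > 0 else "left skewed" if sk < 0 else "symmetric")
```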


    Exercise 2:

    1. A survey research company asks 100 people how many times they have been to the dentist in the last five years. Their grouped responses appear below.

    Number of Visits    Number of Responses
    0 - 4               16
    5 - 9               25
    10 - 14             48
    15 - 19             11

    What are the mean and variance of the data?

    2. A researcher asked 25 consumers: "How much would you pay for a television adapter that provides Internet access?" Their grouped responses are as follows:

    Amount ($)    Number of Responses
    0 - 99        2
    100 - 199     2
    200 - 249     3
    250 - 299     3
    300 - 349     6
    350 - 399     3
    400 - 499     4
    500 - 999     2

    Calculate the mean, variance, and standard deviation.

    3. The following data give the number of pairs of shoes sold per day by a particular shoe store in the last 20 days.

    85 90 89 70 79 80 83 83 75 76

    89 86 71 76 77 89 70 65 90 86

    Calculate the:

    a. mean and interpret the value.

    b. median and interpret the value.

    c. mode and interpret the value.

    d. standard deviation.


    4. The following data show the serving times (in minutes) for 40 customers in a post office:

    2.0 4.5 2.5 2.9 4.2 2.9 3.5 2.8

    3.2 2.9 4.0 3.0 3.8 2.5 2.3 3.5

    2.1 3.1 3.6 4.3 4.7 2.6 4.1 3.1

    4.6 2.8 5.1 2.7 2.6 4.4 3.5 3.0

    2.7 3.9 2.9 2.9 2.5 3.7 3.3 2.4

    a. Construct a frequency distribution table with a class width of 0.5.

    b. Construct a histogram.

    c. Calculate the mode and median of the data.

    d. Find the mean of serving time.

    e. Determine the skewness of the data.

    f. Find the first and third quartile value of the data.

    g. Determine the value of interquartile range.

    5. In a survey of a class of final-semester students, data were obtained on the number of textbooks owned.

    Number of students

    Number of textbooks owned

    12

    9

    11

    15

    108

    5

    5

    3

    2

    10

    Find the average number of textbooks for the class. Use the weighted mean.

    6. The following data represent the ages of 15 people buying lift tickets at a ski area.

    15 25 26 17 38 16 60 21

    30 53 28 40 20 35 31

    Calculate the quartiles and the interquartile range.

    7. A student scores 60 on a mathematics test that has a mean of 54 and a standard deviation of 3, and she scores 80 on a history test with a mean of 75 and a standard deviation of 2. On which test did she perform better?


    8. The following table gives the distribution of the share prices for ABC Company, which was listed on BSKL in 2005.

    Price (RM)    Frequency
    12 - 14       5
    15 - 17       14
    18 - 20       25
    21 - 23       7
    24 - 26       6
    27 - 29       3

    Find the mean, median and mode for this data.