3.Numerical Descriptive Techniques

Apr 03, 2018

  • 7/28/2019 3.Numerical Descriptive Techniques

    Numerical Descriptive Techniques

    Summary Measures

    Describing Data Numerically

      Central Tendency: Arithmetic Mean, Median, Mode, Geometric Mean

      Quartiles

      Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

      Shape: Skewness

    Measures of Central Location

    Usually, we focus our attention on two types of measures when describing population characteristics: central location, and variability or spread.

    The measure of central location reflects the locations of all the actual data points.

    Measures of Central Location

    The measure of central location reflects the locations of all the actual data points. How?

    With one data point, clearly the central location is at the point itself.

    With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).

    But if the third data point appears on the left-hand side of the midrange, it should pull the central location to the left.

    The Arithmetic Mean

    Mean = (Sum of the observations) / (Number of observations)

    This is the most popular and useful measure of central location.

    The Arithmetic Mean

    Sample mean (n = sample size):

    $$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

    Population mean (N = population size):

    $$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

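    The Arithmetic Mean

    Example 1 The reported times on the Internet of 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time on the Internet.

    $$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{x_1 + x_2 + \cdots + x_{10}}{10} = \frac{0 + 7 + \cdots + 22}{10} = 11.0$$

    Example 2 Suppose the telephone bills represent the population of measurements. The population mean is

    $$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \cdots + x_{200}}{200} = \frac{42.19 + 38.45 + \cdots + 45.77}{200} = 43.59$$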
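    The mean calculation of Example 1 can be reproduced with a short Python sketch. The helper `arithmetic_mean` is a hypothetical name chosen for illustration; the standard library's `statistics.mean` serves only as a cross-check.

```python
import statistics

# arithmetic_mean is an illustrative helper, not part of any library
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Example 1: reported hours on the Internet for 10 adults
internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

print(arithmetic_mean(internet_hours))  # 11.0
print(statistics.mean(internet_hours))  # same value from the standard library
```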

    The Arithmetic Mean

    Drawback of the mean:

    It can be influenced by unusual observations, because it uses all the information in the data set.

    The Median

    The median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude. It divides the data in half.

    Example 3 Find the median of the time on the Internet for the 10 adults of Example 1.

    Even number of observations: 0, 0, 5, 7, 8, 9, 12, 14, 22, 33. The median is the average of the two middle values, (8 + 9)/2 = 8.5.

    Suppose only 9 adults were sampled (exclude, say, the longest time, 33).

    Odd number of observations: 0, 0, 5, 7, 8, 9, 12, 14, 22. The median is the middle value, 8.

    The Median

    Find the median of 8, 2, 9, 11, 1, 6, 3.

    n = 7 (odd sample size). First order the data: 1, 2, 3, 6, 8, 9, 11. The median is the 4th ordered observation, 6.

    For an odd sample size, the median is the ((n + 1)/2)th ordered observation.

    The Median

    The engineering group receives e-mail requests for technical information from sales and service personnel. The daily numbers for 6 days were 11, 9, 17, 19, 4, and 15. What is the central location of the data?

    For even sample sizes, the median is the average of the (n/2)th and (n/2 + 1)th ordered observations. Here the ordered data are 4, 9, 11, 15, 17, 19, so the median is (11 + 15)/2 = 13.
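    The odd and even sample-size rules can be sketched in Python as follows. The function `median` is a hypothetical helper written for illustration (the standard library's `statistics.median` applies the same rules); the data are those of Example 1 (with and without the value 33) and the two small data sets on these slides.

```python
# median is an illustrative helper; statistics.median applies the same rules
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1)/2)th ordered observation, i.e. index n // 2 counting from 0
        return ordered[mid]
    # even n: average of the (n/2)th and (n/2 + 1)th ordered observations
    return (ordered[mid - 1] + ordered[mid]) / 2

internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]  # Example 1 / Example 3
without_longest = [t for t in internet_hours if t != 33]

print(median(internet_hours))          # 8.5
print(median(without_longest))         # 8
print(median([8, 2, 9, 11, 1, 6, 3]))  # 6
print(median([11, 9, 17, 19, 4, 15]))  # 13.0
```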

    The Mode

    The mode of a set of observations is the value that occurs most frequently.

    A set of data may have one mode (or modal class), or two or more modes.

    The modal class: for large data sets the modal class is much more relevant than a single-value mode.

    The Mode

    Find the mode for the data in Example 1. Here are the data again: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22.

    Solution

    All observations except 0 occur once. There are two 0s. Thus, the mode is zero.

    Is this a good measure of central location? The value 0 does not reside at the center of this set (compare with the mean = 11.0 and the median = 8.5).

    Relationship among Mean, Median, and Mode

    If a distribution is symmetrical, the mean, median and mode coincide: Mean = Median = Mode.

    If a distribution is asymmetrical, and skewed to the left or to the right, the three measures differ.

    A positively skewed distribution (skewed to the right): Mode < Median < Mean.

    Relationship among Mean, Median, and Mode

    If a distribution is symmetrical, the mean, median and mode coincide.

    If a distribution is nonsymmetrical, and skewed to the left or to the right, the three measures differ.

    A positively skewed distribution (skewed to the right): Mode < Median < Mean.

    A negatively skewed distribution (skewed to the left): Mean < Median < Mode.
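    As a rough illustration, the Example 1 data have a long right tail (the value 33), and their mode, median and mean fall in the order listed for a positively skewed distribution. The sketch uses only standard-library functions (`statistics.multimode` requires Python 3.8 or later). Keep in mind that the ordering rule describes distributions; a small sample need not follow it.

```python
import statistics

# Example 1 data: hours on the Internet, with a long right tail (33)
internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

mean = statistics.mean(internet_hours)        # 11
median = statistics.median(internet_hours)    # 8.5
modes = statistics.multimode(internet_hours)  # [0], the only repeated value

# Ordering expected for a positively skewed distribution: Mode < Median < Mean
print(modes[0] < median < mean)  # True
```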

    Geometric Mean

    The arithmetic mean is the most popular measure of the central location of the distribution of a set of observations.

    But the arithmetic mean is not a good measure of the average rate at which a quantity grows over time. That quantity, whose growth rate (or rate of change) we wish to measure, might be the total annual sales of a firm or the market value of an investment.

    The geometric mean should be used to measure the average growth rate of the values of a variable over time.
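    A minimal sketch, assuming the usual definition of the average growth rate, $R_g = [(1 + R_1)(1 + R_2)\cdots(1 + R_n)]^{1/n} - 1$, where $R_i$ is the rate of return in period i. The helper `geometric_mean_rate` and the investment figures (a hypothetical $1,000 that grows to $2,000 in year one and falls back to $1,000 in year two) are assumptions made for illustration.

```python
import math

# geometric_mean_rate is an illustrative helper, assuming every rate is > -1 (-100%)
def geometric_mean_rate(rates):
    """Average growth rate per period: [(1 + R1)(1 + R2)...(1 + Rn)]**(1/n) - 1."""
    if not rates:
        raise ValueError("need at least one period")
    growth = math.prod(1 + r for r in rates)  # total growth factor over all periods
    return growth ** (1 / len(rates)) - 1

# Hypothetical investment: $1,000 -> $2,000 -> $1,000
returns = [1.00, -0.50]  # +100% in year 1, -50% in year 2

print(sum(returns) / len(returns))   # 0.25, the arithmetic mean suggests 25% growth a year
print(geometric_mean_rate(returns))  # 0.0, the investment did not grow at all
```

    The arithmetic mean of the two returns would suggest growth, while the geometric mean reports 0% for an investment that ended where it started.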

    Measures of variability

    Measures of central location fail to tell the whole story about the distribution.

    A question of interest still remains unanswered:

    How much are the observations spread out around the mean value?

    Measures of variability

    Observe two hypothetical data sets:

    Small variability: the average value provides a good representation of the observations in the data set.

    This data set is now changing to...

    Measures of variability

    Observe two hypothetical data sets:

    Small variability: the average value provides a good representation of the observations in the data set.

    Larger variability: the same average value does not provide as good a representation of the observations in the data set as before.

    The range

    The range of a set of observations is the difference between the largest and smallest observations:

    Range = Largest observation - Smallest observation

    Its major advantage is the ease with which it can be computed.

    Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points.

    But how do all the observations spread out between the smallest and the largest observation? The range cannot assist in answering this question.

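    The Variance

    This measure reflects the dispersion of all the observations.

    The variance of a population of size N, $x_1, x_2, \ldots, x_N$, whose mean is $\mu$, is defined as

    $$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

    The variance of a sample of n observations $x_1, x_2, \ldots, x_n$, whose mean is $\bar{x}$, is defined as

    $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$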
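    The two definitions differ only in the divisor (N versus n - 1). A minimal Python sketch, with hypothetical helper names and a made-up data set, cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`:

```python
import statistics

# population_variance and sample_variance are illustrative helpers
def population_variance(values):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    values = list(values)
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, mean 5

print(population_variance(data))  # 4.0
print(sample_variance(data))      # about 4.5714 (32 / 7)
print(statistics.pvariance(data), statistics.variance(data))  # library cross-check
```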

    Why not use the sum of deviations?

    Consider two small populations:

    A: 8, 9, 10, 11, 12
    B: 4, 7, 10, 13, 16

    The mean of both populations is 10, but measurements in B are more dispersed than those in A. A measure of dispersion should agree with this observation.

    Can the sum of deviations be a good measure of dispersion?

    A: (8 - 10) + (9 - 10) + (10 - 10) + (11 - 10) + (12 - 10) = -2 - 1 + 0 + 1 + 2 = 0
    B: (4 - 10) + (7 - 10) + (10 - 10) + (13 - 10) + (16 - 10) = -6 - 3 + 0 + 3 + 6 = 0

    The sum of deviations is zero for both populations and, therefore, is not a good measure of dispersion.

    The Variance

    Let us calculate the variance of the two populations:

    $$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$$

    $$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$$

    Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of variation instead? After all, the sum of squared deviations increases in magnitude when the variation of a data set increases!
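    The comparison of populations A and B can be checked numerically. In this short sketch (helper names are hypothetical) the sum of deviations is 0 for both populations, while the population variance separates them (2 versus 18).

```python
# sum_of_deviations and population_variance are illustrative helpers
def sum_of_deviations(values):
    mu = sum(values) / len(values)
    return sum(x - mu for x in values)

def population_variance(values):
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

for name, values in (("A", population_a), ("B", population_b)):
    # positive and negative deviations cancel, so the middle number is 0.0 for both
    print(name, sum_of_deviations(values), population_variance(values))
# A 0.0 2.0
# B 0.0 18.0
```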

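    The Variance

    Which data set has a larger dispersion?

    Data set A: ten observations, five at 1 and five at 3 (mean 2).

    Data set B: two observations, 1 and 5 (mean 3).

    Data set B is more dispersed around the mean.

    Let us calculate the sum of squared deviations for both data sets.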

    The Variance

    $$\text{Sum}_A = (1-2)^2 + \cdots + (1-2)^2 + (3-2)^2 + \cdots + (3-2)^2 = 10$$

    $$\text{Sum}_B = (1-3)^2 + (5-3)^2 = 8$$

    Sum_A > Sum_B. This is inconsistent with the observation that set B is more dispersed.

    The Variance

    However, when calculated on a per-observation basis (variance), the data set dispersions are properly ranked:

    $$\sigma_A^2 = \frac{\text{Sum}_A}{N} = \frac{10}{10} = 1$$

    $$\sigma_B^2 = \frac{\text{Sum}_B}{N} = \frac{8}{2} = 4$$
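    The per-observation argument can be reproduced in a few lines. The data sets are reconstructed from the sums shown (A: five 1s and five 3s; B: 1 and 5), and the helper name is hypothetical.

```python
# sum_of_squared_deviations is an illustrative helper
def sum_of_squared_deviations(values):
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

data_a = [1] * 5 + [3] * 5  # ten observations, mean 2
data_b = [1, 5]             # two observations, mean 3

ss_a = sum_of_squared_deviations(data_a)
ss_b = sum_of_squared_deviations(data_b)
print(ss_a, ss_b)  # 10.0 8.0, so Sum_A > Sum_B although B is more dispersed

# Dividing by the number of observations ranks the data sets correctly
print(ss_a / len(data_a), ss_b / len(data_b))  # 1.0 4.0
```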

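    The Variance

    Example 4 The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13. Find its mean and variance.

    Solution

    $$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14 \text{ jobs}$$

    $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{6 - 1}\left[(17 - 14)^2 + (15 - 14)^2 + \cdots + (13 - 14)^2\right] = 33.2 \text{ jobs}^2$$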

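    The Variance Shortcut method

    $$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \frac{1}{6 - 1}\left[17^2 + 15^2 + \cdots + 13^2 - \frac{(17 + 15 + \cdots + 13)^2}{6}\right] = 33.2 \text{ jobs}^2$$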
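    Both formulas for $s^2$ can be compared on the Example 4 data in a short sketch; the two helper names are illustrative. Algebraically they agree, but with large values the shortcut subtracts two nearly equal numbers and can lose floating-point precision, so the definitional form is the safer choice in code.

```python
# sample_variance_definitional and sample_variance_shortcut are illustrative helpers
def sample_variance_definitional(values):
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    # s^2 = [sum(x_i^2) - (sum(x_i))^2 / n] / (n - 1)
    n = len(values)
    total = sum(values)
    total_of_squares = sum(x * x for x in values)
    return (total_of_squares - total ** 2 / n) / (n - 1)

jobs = [17, 15, 23, 7, 9, 13]  # Example 4

print(sum(jobs) / len(jobs))               # 14.0
print(sample_variance_definitional(jobs))  # 33.2
print(sample_variance_shortcut(jobs))      # 33.2
```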

    Standard Deviation

    The standard deviation of a set of observations is the square root of the variance.

    Population standard deviation: $\sigma = \sqrt{\sigma^2}$

    Sample standard deviation: $s = \sqrt{s^2}$
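    A short sketch, using only the standard library, that takes the square root of the Example 4 sample variance and compares it with `statistics.stdev`:

```python
import math
import statistics

jobs = [17, 15, 23, 7, 9, 13]  # Example 4 sample

s_squared = statistics.variance(jobs)  # sample variance, n - 1 in the denominator
s = math.sqrt(s_squared)               # sample standard deviation

print(s_squared)                                # 33.2
print(round(s, 3))                              # 5.762
print(math.isclose(s, statistics.stdev(jobs)))  # True
```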

    Standard Deviation

    Example 5 To examine the consistency of shots for a new, innovative golf club, a golfer was asked to hit 150 shots, 75 with a currently used (7-iron) club and 75 with the new club. The distances were recorded.

    Which 7-iron is more consistent?

    Standard Deviation

    Example 5 solution

    Excel printout, from the Descriptive Statistics sub-menu.

    Statistic            Current      Innovation
    Mean                 150.5467     150.1467
    Standard Error       0.668815     0.357011
    Median               151          150
    Mode                 150          149
    Standard Deviation   5.792104     3.091808
    Sample Variance      33.54847     9.559279
    Kurtosis             0.12674      -0.88542
    Skewness             -0.42989     0.177338
    Range                28           12
    Minimum              134          144
    Maximum              162          156
    Sum                  11291        11261
    Count                75           75

    The innovation club is more consistent, and because the means are close, it is considered a better club.

    Interpreting Standard Deviation

    The standard deviation can be used to compare the variability of several distributions and to make a statement about the general shape of a distribution.

    The empirical rule: If a sample of observations has a mound-shaped distribution, the interval

    $(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements
    $(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements
    $(\bar{x} - 3s, \bar{x} + 3s)$ contains approximately 99.7% of the measurements

    Interpreting Standard Deviation

    Example 6 A statistics practitioner wants to describe the way returns on investment are distributed.

    The mean return = 10%

    The standard deviation of the return = 8%

    The histogram is bell shaped.

    Interpreting Standard Deviation

    Example 6 solution

    The empirical rule can be applied (bell-shaped histogram). Describing the return distribution:

    Approximately 68% of the returns lie between 2% and 18% [10 - 1(8), 10 + 1(8)].

    Approximately 95% of the returns lie between -6% and 26% [10 - 2(8), 10 + 2(8)].

    Approximately 99.7% of the returns lie between -14% and 34% [10 - 3(8), 10 + 3(8)].
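    The interval arithmetic of Example 6 can be written as a small function. `empirical_rule_intervals` is a hypothetical name, and the coverage figures are the approximate ones from the empirical rule, which applies only to mound-shaped data.

```python
# empirical_rule_intervals is an illustrative helper for mound-shaped data only
def empirical_rule_intervals(mean, sd):
    """Map k = 1, 2, 3 to (mean - k*sd, mean + k*sd, approximate share inside)."""
    approx_share = {1: "68%", 2: "95%", 3: "99.7%"}
    return {k: (mean - k * sd, mean + k * sd, approx_share[k]) for k in (1, 2, 3)}

# Example 6: mean return 10%, standard deviation 8%, bell-shaped histogram
for k, (low, high, share) in empirical_rule_intervals(10, 8).items():
    print(f"approximately {share} of the returns lie between {low}% and {high}%")
```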

    Chebyshev's Theorem

    For any value of k ≥ 1, at least 100(1 - 1/k²)% of the data lie within the interval from $\bar{x} - ks$ to $\bar{x} + ks$.

    This theorem is valid for any set of measurements (sample, population) of any shape!

    k   Interval             Chebyshev                   Empirical Rule
    1   (x̄ - s, x̄ + s)       at least 0%  (1 - 1/1²)     approximately 68%
    2   (x̄ - 2s, x̄ + 2s)     at least 75% (1 - 1/2²)     approximately 95%
    3   (x̄ - 3s, x̄ + 3s)     at least 89% (1 - 1/3²)     approximately 99.7%

    Chebyshev's Theorem

    Example 7 The annual salaries of the employees of a chain of computer stores produced a positively skewed histogram. The mean and standard deviation are $28,000 and $3,000, respectively. What can you say about the salaries at this chain?

    Solution

    At least 75% of the salaries lie between $22,000 and $34,000 [28000 - 2(3000), 28000 + 2(3000)].

    At least 88.9% of the salaries lie between $19,000 and $37,000 [28000 - 3(3000), 28000 + 3(3000)].
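    A minimal sketch of the Chebyshev computation used in Example 7; `chebyshev_interval` is an illustrative helper name. Unlike the empirical rule, the bound 1 - 1/k² holds for data of any shape, which is why it applies to the skewed salary histogram.

```python
# chebyshev_interval is an illustrative helper
def chebyshev_interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd, minimum share of data inside = 1 - 1/k**2)."""
    if k < 1:
        raise ValueError("this sketch only accepts k >= 1")
    return mean - k * sd, mean + k * sd, 1 - 1 / k ** 2

# Example 7: mean salary $28,000, standard deviation $3,000, positively skewed histogram
for k in (2, 3):
    low, high, at_least = chebyshev_interval(28_000, 3_000, k)
    print(f"at least {at_least:.1%} of the salaries lie between ${low:,} and ${high:,}")
# at least 75.0% of the salaries lie between $22,000 and $34,000
# at least 88.9% of the salaries lie between $19,000 and $37,000
```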

    The Coefficient of Variation

    The coefficient of variation of a set of measurements is the standard deviation divided by the mean value. This coefficient provides a proportionate measure of variation.

    Population coefficient of variation: $CV = \frac{\sigma}{\mu}$

    Sample coefficient of variation: $cv = \frac{s}{\bar{x}}$

    A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
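    A short sketch of the sample coefficient of variation; `coefficient_of_variation` is a hypothetical helper built on the standard library, and it assumes a positive mean. It first reproduces the slide's comparison of a standard deviation of 10 against means of 100 and 500, then applies the formula to the Example 4 sample.

```python
import statistics

# coefficient_of_variation is an illustrative helper; it assumes a positive mean
def coefficient_of_variation(values):
    """Sample coefficient of variation: s divided by x-bar."""
    return statistics.stdev(values) / statistics.mean(values)

# The slide's comparison: the same standard deviation relative to two different means
print(10 / 100, 10 / 500)  # 0.1 0.02

# Example 4 sample (jobs applied for): s is about 5.76 and x-bar is 14
print(round(coefficient_of_variation([17, 15, 23, 7, 9, 13]), 3))  # 0.412
```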

    Sample Percentiles and Box Plots

    Percentile

    The pth percentile of a set of measurements is the value for which

    p percent of the observations are less than that value, and

    (100 - p) percent of all the observations are greater than that value.

    Example

    Suppose your score is the 60th percentile of an SAT test. Then 60% of all the scores lie below your score and 40% lie above it.

    Sample Percentiles

    To determine the sample 100p percentile of a data set of size n, determine a data value such that

    a) at least np of the values are less than or equal to it;

    b) at least n(1 - p) of the values are greater than or equal to it.

    Find the 10th percentile of 6 8 3 6 2 8 1. Order the data: 1 2 3 6 6 8 8.

    Find np and n(1 - p): 7(0.10) = 0.70 and 7(1 - 0.10) = 6.3. We need a data value such that at least 0.7 of the values are less than or equal to it and at least 6.3 of the values are greater than or equal to it. So, the first observation, 1, is the 10th percentile.

    Quartiles

    Commonly used percentiles:

    First (lower) decile = 10th percentile
    First (lower) quartile, Q1 = 25th percentile
    Second (middle) quartile, Q2 = 50th percentile
    Third quartile, Q3 = 75th percentile
    Ninth (upper) decile = 90th percentile

    Quartiles

    Example 8

    Find the quartiles of the following set of measurements: 7, 8, 12, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8

    Quartiles

    Solution Sort the observations (15 observations):

    2, 4, 4, 5, 7, 8, 8, 10, 12, 17, 18, 21, 27, 29, 30

    The first quartile:

    At most (.25)(15) = 3.75 observations should appear below the first quartile. Check the first 3 observations on the left-hand side.

    At most (.75)(15) = 11.25 observations should appear above the first quartile. Check 11 observations on the right-hand side.

    The one observation left unchecked, the 4th, is 5; this is the first quartile.

    Comment: If the number of observations is even, two observations remain unchecked. In this case choose the midpoint between these two observations.
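    The counting rule from the Sample Percentiles slide, together with the midpoint convention from the Comment above, can be sketched as follows. `sample_percentile` is a hypothetical helper; it uses `fractions.Fraction` so that products such as np are compared exactly, and it assumes p is given as a proportion between 0 and 1.

```python
from fractions import Fraction

# sample_percentile is an illustrative helper; p is a proportion, e.g. 0.25 for Q1
def sample_percentile(data, p):
    """Data value with at least n*p values <= it and at least n*(1 - p) values >= it;
    if two data values qualify, return their midpoint."""
    p = Fraction(str(p))  # exact arithmetic, so 10 * 0.3 is exactly 3
    xs = sorted(data)
    n = len(xs)
    qualifying = [
        x for x in sorted(set(xs))
        if sum(v <= x for v in xs) >= n * p and sum(v >= x for v in xs) >= n * (1 - p)
    ]
    return sum(qualifying) / len(qualifying)

print(sample_percentile([6, 8, 3, 6, 2, 8, 1], 0.10))  # 1.0, the 10th percentile

example_8 = [7, 8, 12, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8]
print(sample_percentile(example_8, 0.25))              # 5.0, the first quartile
```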

    Location of Percentiles

    Find the location of any percentile using the formula

    $$L_P = (n + 1)\frac{P}{100}$$

    where $L_P$ is the location of the Pth percentile.

    Example 9 Calculate the 25th, 50th, and 75th percentiles of the data in Example 1.

    Location of Percentiles

    Example 9 solution

    After sorting the data we have 0, 0, 5, 7, 8, 9, 12, 14, 22, 33.

    $$L_{25} = (10 + 1)\frac{25}{100} = 2.75$$

    The 2.75th location lies three quarters of the way from the second observation (0) to the third observation (5), so it translates to the value 0 + (.75)(5 - 0) = 3.75.

    Location of Percentiles

    Example 9 solution continued

    $$L_{50} = (10 + 1)\frac{50}{100} = 5.5$$

    The 50th percentile is halfway between the fifth and sixth observations (in the middle between 8 and 9), that is, 8.5.

    Location of Percentiles

    Example 9 solution continued

    $$L_{75} = (10 + 1)\frac{75}{100} = 8.25$$

    The 75th percentile is one quarter of the distance between the eighth and ninth observations, that is, 14 + .25(22 - 14) = 16.
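    The location formula with linear interpolation can be sketched in Python. `percentile_by_location` is an illustrative name, and the clamping at the ends of the data (locations below 1 or above n) is an assumption the slides do not address. Different percentile conventions can disagree on the same data: the counting rule on the Sample Percentiles slide gives 5, not 3.75, for the 25th percentile of Example 1.

```python
import math

# percentile_by_location is an illustrative helper; clamping at the ends is an assumption
def percentile_by_location(data, P):
    """Pth percentile using L_P = (n + 1) * P / 100 and linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    location = (n + 1) * P / 100
    if location <= 1:   # before the first observation: clamp
        return float(xs[0])
    if location >= n:   # beyond the last observation: clamp
        return float(xs[-1])
    whole = math.floor(location)   # 1-based position of the observation just below
    fraction = location - whole    # share of the gap to the next observation
    lower, upper = xs[whole - 1], xs[whole]
    return lower + fraction * (upper - lower)

internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]  # Example 1 data
for P in (25, 50, 75):
    print(P, percentile_by_location(internet_hours, P))
# 25 3.75
# 50 8.5
# 75 16.0
```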

    Quartiles and Variability

    Quartiles can provide an idea about the shape of a histogram.

    [Figure: positions of Q1, Q2 and Q3 on a positively skewed histogram and on a negatively skewed histogram]

    Interquartile Range

    This is a measure of the spread of the middle 50% of the observations. A large value indicates a large spread of the observations.

    Interquartile range = Q3 - Q1

    Box Plot

    This is a pictorial display that provides the main descriptive measures of the data set:

    L - the largest observation
    Q3 - the upper quartile
    Q2 - the median
    Q1 - the lower quartile
    S - the smallest observation

    [Figure: S, Q1, Q2, Q3 and L along an axis, with a box from Q1 to Q3, a whisker on each side, and the distance 1.5(Q3 - Q1) marked beyond each end of the box]

    Box Plot

    Example 10

    Bills: 42.19, 38.45, 29.23, 89.35, 118.04, 110.46, ...

    Smallest = 0
    Q1 = 9.275
    Median = 26.905
    Q3 = 84.9425
    Largest = 119.63
    IQR = 75.6675
    Outliers = ()

    Left-hand boundary = 9.275 - 1.5(IQR) = -104.226
    Right-hand boundary = 84.9425 + 1.5(IQR) = 198.4438

    [Figure: box plot with boundaries at -104.226 and 198.4438, a box from 9.275 to 84.9425 with the median at 26.905, and whiskers reaching 0 and 119.63]

    No outliers are found.
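    The boundary arithmetic of Example 10 can be sketched as two small functions with hypothetical names. They take the quartiles as given rather than recomputing them, since the full bills data are not listed here.

```python
# box_plot_fences and find_outliers are illustrative helpers
def box_plot_fences(q1, q3):
    """Left and right boundaries: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Observations that fall outside the boundaries."""
    low, high = box_plot_fences(q1, q3)
    return [x for x in values if x < low or x > high]

# Example 10: quartiles of the telephone bills
low, high = box_plot_fences(9.275, 84.9425)
print(low, high)  # approximately -104.226 and 198.444

# If the smallest and largest bills lie inside the boundaries, every bill does
print(find_outliers([0, 119.63], 9.275, 84.9425))  # []
```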

    Box Plot

    The following data give noise levels measured at 36 different times directly outside of Grand Central Station in Manhattan.

    NOISE: 82, 89, 94, 110, ...

    Smallest = 60
    Q1 = 75
    Median = 90
    Q3 = 107
    Largest = 125
    IQR = 32
    Outliers = (none)

    Left-hand boundary = 75 - 1.5(IQR) = 27
    Right-hand boundary = 107 + 1.5(IQR) = 155

    [Figure: box plot of NOISE on an axis from 60 to 130, with the box from 75 to 107]

    Box Plot

    NOISE - continued

    Interpreting the box plot results

    The scores range from 60 to 125. About half the scores are smaller than 90, and about half are larger than 90.

    About half the scores lie between 75 and 107.

    About a quarter lie below 75 and a quarter above 107.

    [Figure: box plot with Q1 = 75, Q2 = 90 and Q3 = 107 on a range from 60 to 125, marking 25%, 50% and 25% of the scores]

    Box Plot

    NOISE - continued

    The histogram is positively skewed.

    [Figure: the same box plot (Q1 = 75, Q2 = 90, Q3 = 107, range 60 to 125, 25% / 50% / 25% of the scores) drawn above the histogram]

    Distribution Shape and Box-and-Whisker Plot

    [Figure: box-and-whisker plots with Q1, Q2 and Q3 marked for left-skewed, symmetric and right-skewed distributions]

    Box Plot

    Example 11

    A study was organized to compare the quality of service in 5 drive-through restaurants. Interpret the results.

    Example 11 solution

    Minitab box plot

    Box Plot

    [Figure: Minitab box plots of service times (axis from 100 to 300) for Popeyes, Wendy's, Hardee's, Jack in the Box and McDonald's]

    Wendy's service time appears to be the shortest and most consistent.

    Hardee's service time variability is the largest.

    Jack in the Box is the slowest in service.

    Box Plot

    [Figure: the same Minitab box plots for Popeyes, Wendy's, Hardee's, Jack in the Box and McDonald's, annotated with the callouts "Times are positively skewed" and "Times are symmetric"]

    Paired Data Sets and the Sample Correlation Coefficient

    The covariance and the coefficient of correlation are used to measure the direction and strength of the linear relationship between two variables.

    Covariance - is there any pattern to the way two variables move together?

    Coefficient of correlation - how strong is the linear relationship between two variables?

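    Covariance

    Population covariance:

    $$COV(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$

    $\mu_x$ ($\mu_y$) is the population mean of the variable X (Y). N is the population size.

    Sample covariance:

    $$cov(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

    $\bar{x}$ ($\bar{y}$) is the sample mean of the variable X (Y). n is the sample size.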

    Covariance

    Compare the following three sets:

    xi   yi   (xi - x̄)   (yi - ȳ)   (xi - x̄)(yi - ȳ)
    2    13   -3          -7          21
    6    20    1           0           0
    7    27    2           7          14
    x̄ = 5, ȳ = 20, Cov(x, y) = 17.5

    xi   yi   (xi - x̄)   (yi - ȳ)   (xi - x̄)(yi - ȳ)
    2    27   -3           7         -21
    6    20    1           0           0
    7    13    2          -7         -14
    x̄ = 5, ȳ = 20, Cov(x, y) = -17.5

    xi   yi
    2    20
    6    27
    7    13
    x̄ = 5, ȳ = 20, Cov(x, y) = -3.5

    Covariance

    If the two variables move in opposite directions (one increases when the other one decreases), the covariance is a large negative number.

    If the two variables are unrelated, the covariance will be close to zero.

    If the two variables move in the same direction (both increase or both decrease), the covariance is a large positive number.
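    The three covariance values from the comparison slide can be reproduced with a short sketch of the sample covariance formula; `sample_covariance` is a hypothetical helper name (Python 3.10 and later also provide `statistics.covariance`).

```python
# sample_covariance is an illustrative helper
def sample_covariance(x, y):
    """cov(x, y) = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired lists with at least two observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

x = [2, 6, 7]
print(sample_covariance(x, [13, 20, 27]))  # 17.5, the variables move together
print(sample_covariance(x, [27, 20, 13]))  # -17.5, they move in opposite directions
print(sample_covariance(x, [20, 27, 13]))  # -3.5, no clear pattern
```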

    The coefficient of correlation

    This coefficient answers the question: How strong is the association between X and Y?

    Population coefficient of correlation:

    $$\rho = \frac{COV(X, Y)}{\sigma_x \sigma_y}$$

    Sample coefficient of correlation:

    $$r = \frac{cov(X, Y)}{s_x s_y}$$

    The coefficient of correlation

    ρ or r = +1: strong positive linear relationship (COV(X, Y) > 0)

    ρ or r = 0: no linear relationship (COV(X, Y) = 0)

    ρ or r = -1: strong negative linear relationship (COV(X, Y) < 0)

    The coefficient of correlation

    If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship).

    If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship).

    No straight-line relationship is indicated by a coefficient close to zero.
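    A minimal sketch of the sample coefficient of correlation, applied to the three data sets from the covariance slides; `sample_correlation` is an illustrative name. The n - 1 factors cancel in the ratio, and the function assumes neither variable is constant (otherwise $s_x$ or $s_y$ is zero and r is undefined).

```python
import math

# sample_correlation is an illustrative helper; it assumes neither list is constant
def sample_correlation(x, y):
    """r = cov(x, y) / (s_x * s_y), with n - 1 in every denominator."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)

x = [2, 6, 7]
print(round(sample_correlation(x, [13, 20, 27]), 3))  # 0.945, strong positive
print(round(sample_correlation(x, [27, 20, 13]), 3))  # -0.945, strong negative
print(round(sample_correlation(x, [20, 27, 13]), 3))  # -0.189, weak
```

    Unlike the covariance values 17.5, -17.5 and -3.5, these coefficients lie between -1 and +1, so their strength can be read directly.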