Centrality revisited


Jan 18, 2016


Erik Reed

We have already seen how to compute the median. If we use the median as an axe we cut the data into two halves, each with an equal number of entries. (Careful: if n is odd the two halves do not contain the median; if n is even they may.) We compute the “median” on each half, and …


call the two resulting numbers (one on the left, one on the right of the median) the lower (on the left) quartile and the upper (on the right) quartile. We get this nice “breakdown of data:”
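As a minimal sketch of this median-and-halves procedure, the Python snippet below sorts the data, splits it around the median, and takes the median of each half. The function name `quartiles` is an illustrative choice and not part of the slides; other textbooks and libraries use slightly different quartile conventions, so results may differ from theirs.

```python
from statistics import median

def quartiles(data):
    """Return (lower quartile, median, upper quartile) using the
    median-and-halves rule. Assumes at least two data points."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # For odd n the median itself is left out of both halves;
    # for even n the halves together use every entry.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

print(quartiles([3, 5, 7, -4, 6, 8, -2]))  # (-2, 5, 7)
```

Sorted, the seven numbers are -4, -2, 3, 5, 6, 7, 8: the median is 5, the lower half is -4, -2, 3 and the upper half is 6, 7, 8.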

That justifies the name quartiles (duh!). We could go on cutting sets of data in half, but not now. Instead we look at


Measures of “Spread”

One measure of spread comes immediately to mind, the range, but a quick look at some examples shows right away that it isn’t precise enough: wildly different sets of data have the same range. Now what? Another way to look at spread, besides the range (which is too crude a measure), is to look at how “spread out” the data are, that is, how far they wander away from the middle.
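To see why the range alone is too crude, here is a small sketch with two made-up data sets (they are illustrative only and do not come from the slides). Both have the same range, yet one hugs the middle while the other sits at the extremes.

```python
# Two made-up data sets (illustrative only) with identical range.
clustered = [0, 5, 5, 5, 5, 5, 10]    # most values sit in the middle
scattered = [0, 0, 0, 5, 10, 10, 10]  # most values sit at the ends

def data_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)

print(data_range(clustered), data_range(scattered))  # 10 10
```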


Unfortunately we have to decide first: which middle?

Say we have a finite set of data $x_1, x_2, x_3, \dots, x_n$.

Intuitively we would like to take the median, but for computational ease we’ll choose the average ($\bar{x}$ for a sample, $\mu$ for a population). So … we write the distance between $\bar{x}$ and $x_i$ for each datum, add the distances and divide by $n$. We can write a long hand formula for this as follows:

$$\frac{|x_1-\bar{x}| + |x_2-\bar{x}| + \cdots + |x_n-\bar{x}|}{n}$$


Or an even prettier short hand formula (BUT forget about pretty long or pretty short, learn the method!)

$$\frac{1}{n}\sum_{i=1}^{n}|x_i-\bar{x}|$$

This is very nice, except that absolute values are computationally intractable!
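The method itself (distance of each datum from the average, add them up, divide by n) can be sketched in a few lines of Python. The function name `mean_abs_deviation` is a hypothetical label for this average distance, and exact fractions are used only so that no rounding gets in the way.

```python
from fractions import Fraction

def mean_abs_deviation(data):
    """Average distance between each datum and the mean."""
    n = len(data)
    mean = Fraction(sum(data), n)          # the chosen "middle"
    return sum(abs(x - mean) for x in data) / n

# The seven numbers used as the worked example later in these slides.
print(mean_abs_deviation([3, 5, 7, -4, 6, 8, -2]))  # 180/49
```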

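Much nicer (computationally) is to square each distance instead of taking its absolute value:

$$\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$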


This average of squared distances is called the variance (denoted by Var). So we have the baptism (definition)

$$\mathrm{Var} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
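Read as a recipe, the definition says: find the mean, square each datum's distance from it, and average those squares. A minimal Python sketch follows; the name `population_variance` and the five-number data set are illustrative assumptions, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def population_variance(data):
    """Average of the squared distances from the mean (divides by n)."""
    n = len(data)
    mean = Fraction(sum(data), n)
    return sum((x - mean) ** 2 for x in data) / n

# Mean is 3; squared distances are 4, 1, 0, 1, 4; their average is 2.
print(population_variance([1, 2, 3, 4, 5]))  # 2
```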

(This formula may lead to fairly difficult computations; we’ll learn a short-cut soon.) If the data are from the entire population of interest, life is good. If, however, the data are from just a sample of the population, it turns out that


the value we get from Var tends to underestimate the true value of Var (from the entire population; such is life!). We compensate for this slight underestimation by a slight increase in the value of Var. We just multiply by the fraction $\frac{n}{n-1}$ (why is this an increase?). In summary:

population Var $= \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$

sample Var = (population Var) • $\frac{n}{n-1}$
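One way to check the correction factor is against Python's standard `statistics` module, whose `pvariance` divides by n and whose `variance` divides by n - 1. The eight-number data set below is made up for illustration and is not from the slides.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample, illustrative only
n = len(data)

population_var = statistics.pvariance(data)   # divides by n
sample_var = statistics.variance(data)        # divides by n - 1

# Multiplying the population value by n / (n - 1) should reproduce
# the sample value, up to floating-point rounding.
corrected = population_var * n / (n - 1)

print(population_var, sample_var, corrected)
```

Since n / (n - 1) is always greater than 1, the corrected value is always the larger of the two.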


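We have stated before that the formula

$$\mathrm{Var} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

can lead to some seriously difficult computations. Try applying it to the set of numbers

3 5 7 -4 6 8 -2

There is, however, a short-cut. In formula it looks worse, but in words (and use) it is much easier.

$$\mathrm{Var} = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n}x_i\right)^2$$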


In words it says:

1. First compute the mean

2. Then compute the mean of the squares

3. Then subtract 1² from 2 (that is, subtract the square of the result of step 1 from the result of step 2).

Let's try the short-cut on the set of numbers

3 5 7 -4 6 8 -2
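The three steps can be checked on these seven numbers with a short Python sketch. Exact fractions are used so that nothing is lost to rounding; the variable names are illustrative.

```python
from fractions import Fraction

data = [3, 5, 7, -4, 6, 8, -2]
n = len(data)

mean = Fraction(sum(data), n)                            # step 1
mean_of_squares = Fraction(sum(x * x for x in data), n)  # step 2
var = mean_of_squares - mean ** 2                        # step 3: 2 minus 1 squared

print(mean, mean_of_squares, var)  # 23/7 29 892/49
```

As a decimal, 892/49 is about 18.20. This is the population variance; no sample correction has been applied.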


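Step 1² gives $\left(\frac{23}{7}\right)^2 = \frac{529}{49} \approx 10.80$

Step 2 gives $\frac{203}{7} = 29$

We get Var $= 29 - \frac{529}{49} = \frac{892}{49} \approx 18.20$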


When the number of data points is small there is an even easier (visually) way to proceed. We apply it to the same set of seven data points:


Final Remarks

The variance we have computed is a population variance. If the data come from a sample we must remember to correct our answer, multiplying by the correction factor $\frac{n}{n-1}$.

Then we take a square root and obtain the corresponding standard deviation (population or sample).
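Putting the final remarks together for the seven numbers used above, a minimal Python sketch might look like this. The variable names are illustrative, and the variance is computed with the short-cut (mean of the squares minus the square of the mean).

```python
import math
from fractions import Fraction

data = [3, 5, 7, -4, 6, 8, -2]
n = len(data)

# Population variance via the short-cut.
population_var = Fraction(sum(x * x for x in data), n) - Fraction(sum(data), n) ** 2

# Correct for a sample by multiplying by n / (n - 1).
sample_var = population_var * Fraction(n, n - 1)

# The standard deviation is the square root of the matching variance.
population_sd = math.sqrt(population_var)
sample_sd = math.sqrt(sample_var)

print(round(population_sd, 2), round(sample_sd, 2))  # 4.27 4.61
```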