1 Continuous Data
1 Continuous Data. 2 Median Sorted data: Min position 1 Max position n The median is the value in the “middle” position: position ½(1 + n) If this.

Dec 22, 2015

Mikel Baldridge
Transcript

1

Continuous Data

2

Median

Sorted data: Min is at position 1, Max is at position n.

The median is the value in the “middle” position:

position ½(1 + n)

If this position is halfway between, then average the two associated data values.

Median = 50th percentile.
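
The position rule above can be sketched in Python. This is a minimal illustration, not part of the original slides; the function name and the two small samples are hypothetical.

```python
def median_by_position(values):
    """Median as the value at position (1 + n) / 2 of the sorted data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    position = (1 + n) / 2      # 1-based position; ends in .5 when n is even
    lower = int(position)       # whole-number part of the position
    if position == lower:
        return data[lower - 1]  # convert the 1-based position to a 0-based index
    # Position is halfway between two values: average them.
    return (data[lower - 1] + data[lower]) / 2


# Hypothetical samples, for illustration only.
odd_sample = [7.0, 3.0, 9.0, 1.0, 5.0]   # n = 5, middle position is 3
even_sample = [4.0, 8.0, 2.0, 6.0]       # n = 4, middle position is 2.5
print(median_by_position(odd_sample))
print(median_by_position(even_sample))
```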

3

Median – Failure Time data

Failure times in hours.

The median is 232.3.

The 50th percentile is 232.3.

4

Percentile / Percentile Rank

The idea is to put the data onto a 0% - 100% scale.

Data scale: x

Percent scale: k

“x is the kth percentile” is equivalent to “the percentile rank of x is k.”

5

Interpretation

“x is the kth percentile” / “the percentile rank of x is k”

This means…

(Approximately*) k% of units** have variable*** less than x and (100 – k)% of units have variable greater than x.

* technically required; you may omit this

** state what the units are – don’t use the word “units”

*** state what the variable is – don’t use the word “variable”
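
The interpretation can be checked directly by counting. The sketch below assumes a small hypothetical list of measurements (not data from the slides) and reports the percent of values below and above a chosen x.

```python
def percent_below_and_above(values, x):
    """Return (percent of values below x, percent of values above x)."""
    n = len(values)
    below = sum(1 for v in values if v < x)
    above = sum(1 for v in values if v > x)
    return 100 * below / n, 100 * above / n


# Hypothetical measurements, for illustration only.
sample = [2.1, 2.5, 2.8, 3.0, 3.1, 3.3, 3.4, 3.6, 3.8, 3.9]
below, above = percent_below_and_above(sample, 3.35)
print(f"{below}% below, {above}% above")
```

If x is (approximately) the kth percentile, the first number comes out near k and the second near 100 - k.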

6

Illustration 1

For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274.

GPA: x = 3.274

Percent: k = 70% (= 0.70)

“the 70th percentile of GPAs is 3.274”

“the percentile rank of 3.274 is 70”

Write a sentence explaining what this means, without using the word “percentile.” Your statement must identify the units and variable. You may use the word “percent,” and you must use the numbers 3.274 and 70.

7

Illustration 1

For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274.

70% of graduating seniors have GPA below 3.274; the other 30% have GPA above 3.274.

Units: graduating seniors. Variable: GPA.

8

Illustration 1

For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274.

It is not correct to say…

Out of 100 graduating seniors, 70 have GPA below 3.274; the other 30 have GPA above 3.274.

There aren’t exactly 100 graduating seniors.

If you chose 100, you would be unlikely to get a 70/30 split.
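
How unlikely is an exact 70/30 split? A rough sketch, assuming the 100 seniors are drawn independently from a large population in which each has a 70% chance of a GPA below 3.274 (a binomial model; this assumption is not stated on the slides):

```python
from math import comb


def prob_exact_split(n, k, p):
    """Binomial probability that exactly k of n sampled units fall below the percentile."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Chance that exactly 70 of 100 sampled seniors have GPA below the 70th percentile.
p_70_30 = prob_exact_split(100, 70, 0.70)
print(f"{p_70_30:.3f}")
```

Under this model the exact 70/30 split is the single most likely outcome, yet it still occurs in fewer than 1 in 10 samples.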

9

Illustration 1

For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274.

It is not correct to say…

Out of 100 graduating seniors, 70 have GPA below 3.274; the other 30 have GPA above 3.274.

This statement is only true on average, assuming you averaged over all possible samples of 100 graduating seniors. Expressing this is more difficult and confusing, so just say it the correct way:

70% of graduating seniors have GPA below 3.274; the other 30% have GPA above 3.274.

10

Illustration 1

70% of graduating seniors have GPA below 3.274; the other 30% have GPA above 3.274.

Do not worry about seniors with GPA exactly 3.274.

This figure is likely rounded. Very few (much less than 1% of) people will have exactly this GPA.

11

Percentiles

Suitable for data where there are few to no ties.

Continuous data

12

Illustration 2

In discussing investment opportunities, a financial advisor speaks about a company’s “price to earnings” ratio (PE): the price of a share of stock divided by the annual profit the company makes per share (i.e., how much it costs to purchase $1 of annual profit).

“For the ECC Company, its PE of 7.3 is at the 15th percentile among companies in the industrial sector.”

Write a sentence explaining what this means, without using the word “percentile.” Your statement must identify the units and variable. You may use the word “percent,” and you must use the numbers 7.3 and 15.

13

Illustration 2

“For the ECC Company, its PE of 7.3 is at the 15th percentile among companies in the industrial sector.”

15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3.

Units: companies in the industrial sector. Variable: PE.

14

Illustration 2

15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3.

It is not correct to say…

Out of 100 industrial companies, 15 have PE below 7.3; the other 85 have PE above 7.3.

There aren’t exactly 100 industrial companies.

If you chose 100, you would be unlikely to get a 15/85 split.

15

Illustration 2

15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3.

It is not correct to say…

Out of 100 industrial companies, 15 have PE below 7.3; the other 85 have PE above 7.3.

This statement is only true on average assuming you averaged over all possible samples of 100 companies. Expressing this is more difficult and confusing, so just say it the correct way.

16

Illustration 2

15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3.

Do not worry about companies with PE exactly 7.3. Even ECC’s PE is not exactly 7.3; it’s rounded to that figure.

17

Percentiles & Percentile Ranks in Excel

Data in a sorted (low to high) array

Value on data scale: x

Value on % scale (“Percentile Rank”): k (%)

From x to k: =PERCENTRANK(array, x, 9)

(the 9 requests nine significant digits in the result; the default is three)

From k to x: =PERCENTILE(array, k/100)

18

Sorted failure time data in cells B2 through B29 (n = 28).

Determine the percentile rank for a failure time of 216.6 hours.

=PERCENTRANK(B2:B29, 216.6, 9)

0.3704 = 37.04%

“216.6 is the 37.04th percentile.”

“The percentile rank of 216.6 is 37.04.”
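
For readers working outside Excel, here is a hedged sketch of what PERCENTRANK is understood to compute: for a value in the array, the number of values below it divided by (n - 1), with linear interpolation for values that fall between two data points. The function name and sample data are hypothetical, and the significance argument is not reproduced.

```python
def percent_rank_inc(sorted_data, x):
    """Approximate Excel's PERCENTRANK(array, x) on data sorted low to high."""
    n = len(sorted_data)
    if n < 2 or x < sorted_data[0] or x > sorted_data[-1]:
        raise ValueError("x must lie within the range of at least two data values")
    below = sum(1 for v in sorted_data if v < x)
    if x in sorted_data:
        return below / (n - 1)
    # x falls strictly between two neighbours: interpolate between their ranks.
    lo, hi = sorted_data[below - 1], sorted_data[below]
    frac = (x - lo) / (hi - lo)
    return (below - 1 + frac) / (n - 1)


# Hypothetical data, for illustration only (not the failure-time data).
data = [10, 20, 30, 40, 50]
print(percent_rank_inc(data, 30))   # 2 values below, n - 1 = 4
print(percent_rank_inc(data, 35))   # halfway between the ranks of 30 and 40

# Consistency check on the slide's result: with n = 28, a data value
# with 10 values below it has rank 10 / 27.
print(round(10 / 27, 4))
```

The failure-time data are not reproduced in this transcript, so the last line is only an inference: 0.3704 is what the formula gives if 216.6 is in the data with 10 smaller values.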

19

Rounding of Percents

For 10% - 90%

to the nearest 1% is generally fine

For 1% - 10% and 90% - 99%

to the nearest 0.1% is fine

For 0.1% - 1.0% and 99.0% - 99.9%

to the nearest 0.01% is fine

It’s OK to give more precision than is called for. You can run into trouble working with less precision than specified here.

20

Rounding of Percents

Consider two treatments for your condition. With Treatment A the chance of dying is 0.51%. With Treatment B the chance is 1.49%.

Rounded to the nearest 1%, both are 1%.

Out of 10,000 people getting treatment A, on average 51 die.

Out of 10,000 people getting treatment B, on average 149 die.

Almost 3 times as many.
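
The arithmetic behind this comparison, as a short sketch (the function name is hypothetical):

```python
def expected_deaths(rate_percent, group_size=10_000):
    """Average number of deaths in a group, given the chance of dying in percent."""
    return rate_percent / 100 * group_size


deaths_a = expected_deaths(0.51)   # Treatment A
deaths_b = expected_deaths(1.49)   # Treatment B

# Rounded to the nearest 1%, the two rates look identical...
print(round(0.51), round(1.49))
# ...but the expected counts differ by a factor of almost 3.
print(round(deaths_a), round(deaths_b), round(deaths_b / deaths_a, 2))
```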

21

Sorted failure time data in cells B2 through B29 (n = 28).

Determine the 75th percentile.

75% has to be “converted” to 0.75 for use in PERCENTILE

=PERCENTILE(B2:B29, 0.75)

254.2

“254.2 is the 75th percentile.”

“The percentile rank of 254.2 is 75.”
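
A sketch of the calculation PERCENTILE is understood to perform: find the (possibly fractional) position 1 + p(n - 1) in the sorted data and interpolate linearly between the two neighbouring values. The function name and sample data are hypothetical; the failure-time data are not reproduced here.

```python
def percentile_inc(sorted_data, p):
    """Approximate Excel's PERCENTILE(array, p) for 0 <= p <= 1, data sorted low to high."""
    n = len(sorted_data)
    if n == 0 or not 0 <= p <= 1:
        raise ValueError("need non-empty data and p between 0 and 1")
    position = p * (n - 1)        # 0-based fractional position
    lower = int(position)
    frac = position - lower
    if lower + 1 >= n:            # p == 1: the maximum
        return sorted_data[-1]
    step = sorted_data[lower + 1] - sorted_data[lower]
    return sorted_data[lower] + frac * step


# Hypothetical data, for illustration only.
data = [10, 20, 30, 40, 50]
k = 75                                # percent scale
print(percentile_inc(data, k / 100))  # k must be converted to 0.75 first
```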

22

# of Cars Owned

Suppose we surveyed 100 families.

Most would say 1 or 2, some 3, a few 4, and a few 0.

The data are highly discrete.

[Bar chart: # of Families (vertical axis, 0 to 50) by # of Cars Owned (horizontal axis, 0 to 4)]

23

# of Cars Owned (sorted)

0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 4

The 38th percentile is 2. The 82nd percentile is 2.

Percentiles don’t make much sense for discrete data (and make no sense for categorical data).
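
The problem can be seen by tallying the sorted list: 5 zeros, 32 ones, 45 twos, 17 threes, and one 4. The sketch below uses a simple nearest-rank rule (the kth percentile is the value at sorted position k when n = 100). That rule is an assumption made for illustration; Excel's PERCENTILE interpolates and can return a non-integer for some k.

```python
# Counts tallied from the sorted list on this slide.
counts = {0: 5, 1: 32, 2: 45, 3: 17, 4: 1}
cars = [value for value, count in sorted(counts.items()) for _ in range(count)]


def nearest_rank_percentile(sorted_data, k):
    """Value at sorted position ceil(k * n / 100), for a whole-number percent k."""
    n = len(sorted_data)
    position = max(1, (k * n + 99) // 100)   # integer ceiling of k * n / 100
    return sorted_data[position - 1]


# Every percentile from the 38th through the 82nd lands on the same value.
print(nearest_rank_percentile(cars, 38), nearest_rank_percentile(cars, 82))
print({nearest_rank_percentile(cars, k) for k in range(38, 83)})
```

With 45 of the 100 families tied at 2, a wide band of percentiles collapses onto one value, which is why the percentile scale says little here.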