1 Multiple-choice example
1 Multiple-choice example. 2 Solution The typical level of score achieved is the AVERAGE or CENTRAL TENDENCY. (Some authorities prefer the term LEVEL.)

Mar 28, 2015

Hannah Corbett

Multiple-choice example

Solution

• The typical level of score achieved is the AVERAGE or CENTRAL TENDENCY. (Some authorities prefer the term LEVEL.) Statement A is false.

• B looks good; BUT TRY THE OTHERS.

• The SD is the square root of the variance, so C is false.

• D provides no further information.

• We accept B.

Study questions

The mean weight of three people in a car is 170 pounds. They pick up another person, whose weight is 190 pounds. What is now the mean weight of the people in the car?

Answer

• From the definition of the mean as the total divided by the number of values, the total weight of the first three people is 170 × 3 = 510 lbs.

• Adding the fourth person, we have a new total weight of 510 + 190 = 700 lbs.

• The new mean weight is 700/4 = 175 lbs.
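The same update can be written as a small function. This is a minimal Python sketch; the name `updated_mean` is illustrative and does not come from the lecture.

```python
def updated_mean(old_mean, old_n, new_value):
    """Mean after one more value joins a group whose mean and size are known."""
    old_total = old_mean * old_n              # total = mean x number of values
    return (old_total + new_value) / (old_n + 1)

# The car problem: three people averaging 170 lb, plus one person of 190 lb.
new_mean = updated_mean(170, 3, 190)
print(new_mean)  # 175.0
```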

Adding and multiplying by 2

• We have seen that the mean of the scores in the Caffeine group is 11.90 and the SD is 3.28.

• Suppose we add a constant of 2 to each of the 20 scores. What effects would that have upon the values of the mean, the variance and the SD?

• What would be the effects of multiplying each score by 2?

A histogram showing the distribution of height in 523 men

A normal curve

• The histogram of men’s heights is approximately symmetrical and bell-shaped.

• The curve, which is truly SYMMETRICAL and BELL-SHAPED, is known as a NORMAL curve.

• A variable with such a distribution is said to have a NORMAL DISTRIBUTION.

Representing a distribution

• The height distribution can be represented like this.

• Not all distributions have this shape.

• But the data from the Caffeine and Placebo conditions in our experiment do, more or less.

[Figure: sketch of the distribution, with the mean marked.]

Notation

• Returning to the data from the Caffeine experiment, let M be the mean score of those participants tested after ingesting caffeine.

• Let s² and s be the variance and standard deviation of the 20 scores, respectively. (The standard deviation is the square root of the variance.)

Adding and multiplying by a constant

Statistic    Adding a constant k    Multiplying by a constant k

M            M + k                  kM

s²           s²                     k²s²

Adding a constant

• Adding a constant of 2 to every score simply shifts the whole distribution two units to the right.

• So the new mean will be the old one plus two: new mean = 11.90 + 2 = 13.90

• The SPREAD of the scores, however, will be unaltered, so the variance and the SD will have the same values as before.

Adding a constant k

• If you have summation algebra, you can easily show that adding a constant k adds the same value to the mean.

• In the derivation, M_X is the mean of the original scores, whereas M_(X+k) is the mean of the scores with k added to each of them. The terms s²_X and s²_(X+k) are to be interpreted in a similar way.

• The addition of k makes NO DIFFERENCE to the value of the variance (or the standard deviation).
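The derivation itself did not survive in this transcript. A standard version is sketched here, writing N for the number of scores. The divisor N − 1 is an assumption about how the lecture defines the variance; the conclusion is the same with a divisor of N.

```latex
\begin{align*}
M_{X+k} &= \frac{\sum (X + k)}{N} = \frac{\sum X + Nk}{N} = M_X + k \\[4pt]
s^2_{X+k} &= \frac{\sum \bigl[(X + k) - (M_X + k)\bigr]^2}{N - 1}
           = \frac{\sum (X - M_X)^2}{N - 1} = s^2_X
\end{align*}
```

The constant cancels inside every deviation, which is why the spread is untouched.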

Multiplying by a constant

• Multiplying each score by a constant k not only increases the mean by a factor of k, but also increases the SPREAD of the scores about the new mean.

• The new mean will be k times the old one.

• The new variance will be k² times the old variance.

• The new SD will be k times the old one.

Multiplying by a constant …

• The mean was originally 11.90.

• The SD and variance were originally 3.28 and 10.73, respectively.

• When all scores are multiplied by a factor of 2, the SD becomes 3.28 × 2 = 6.55; the variance becomes 10.73 × 4 = 42.91. (From the rounded figures the products are 6.56 and 42.92; the values shown presumably come from the unrounded SD and variance.)

• The standard deviation has increased by a factor of 2; but the variance has increased by a factor of 4.

Multiplying by a constant k

• A little more summation algebra shows that multiplying by a constant k multiplies the value of the mean by k.

• The dispersion or variance also increases – by a factor of k².
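The summation algebra is not reproduced in this transcript. A standard version is sketched here, with N the number of scores and M_X, s²_X the mean and variance of the original scores. The divisor N − 1 is again an assumption; a divisor of N leads to the same result.

```latex
\begin{align*}
M_{kX} &= \frac{\sum kX}{N} = k\,\frac{\sum X}{N} = k\,M_X \\[4pt]
s^2_{kX} &= \frac{\sum (kX - k M_X)^2}{N - 1}
          = k^2\,\frac{\sum (X - M_X)^2}{N - 1} = k^2\, s^2_X
\end{align*}
```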

Standard deviation of kX

Since the standard deviation is the square root of the variance, multiplying the original scores by the constant k multiplies the standard deviation by k. (More precisely, by the absolute value of k, which matters only when k is negative.)
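The rules for adding and multiplying can be checked numerically. The sketch below uses Python's standard `statistics` module on a small hypothetical set of scores; these are not the Caffeine data, which are not included in the transcript.

```python
from statistics import mean, stdev, variance

# Hypothetical scores, NOT the Caffeine data from the lecture.
scores = [8, 10, 11, 12, 14, 17]
k = 2

shifted = [x + k for x in scores]   # add the constant to every score
scaled = [x * k for x in scores]    # multiply every score by the constant

print(mean(scores), variance(scores), stdev(scores))
print(mean(shifted), variance(shifted), stdev(shifted))  # mean + k; spread unchanged
print(mean(scaled), variance(scaled), stdev(scaled))     # k*mean, k**2*variance, k*SD
```

Here `variance` and `stdev` use the N − 1 divisor; the same pattern holds with a divisor of N.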

Lecture 4

MORE DESCRIPTIVE STATISTICS

Properties of a distribution

The three most important properties of a distribution are:

1. The typical value, AVERAGE or CENTRAL TENDENCY. (The terms LEVEL and LOCATION are also used.)

2. The SPREAD or DISPERSION of scores around the average.

3. The SHAPE of the distribution.

Statistics

• The CENTRAL TENDENCY of a distribution is measured by AVERAGES, one measure of the average being the MEAN. Today I shall consider two additional measures of the average; but there are several others.

• The SPREAD or DISPERSION of a distribution is measured by the VARIANCE, STANDARD DEVIATION and various RANGE STATISTICS.

• There are also statistics for measuring the asymmetry or SKEWNESS of a distribution.

The Poisson distribution

• By no means all variables are normally distributed.

• In a nursery, there are 20 electric lights.

• The mean rate at which light bulbs blow is 2 per week.

• But occasionally many more blow.

• The distribution looks like this.

A Poisson distribution

Poisson distribution …

• This distribution is POSITIVELY SKEWED: that is, it has a tail to the right.

• Most of the values bunch around a mean of 2 bulbs blowing.

• The tail represents the occasional large numbers of blow-outs.
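The shape described above can be reproduced from the Poisson formula, P(x) = e^(−λ) λ^x / x!, with λ = 2 as the mean number of bulbs blowing per week. This is a minimal sketch; the function name `poisson_pmf` and the text bar chart are illustrative only.

```python
from math import exp, factorial

def poisson_pmf(count, rate):
    """Probability of exactly `count` events when the mean number is `rate`."""
    return exp(-rate) * rate ** count / factorial(count)

rate = 2  # mean number of bulbs blowing per week, from the slide
for count in range(9):
    p = poisson_pmf(count, rate)
    # One '#' per percentage point gives a rough sideways histogram.
    print(f"{count} bulbs: {p:.3f} " + "#" * round(p * 100))
```

The bars bunch at 1 and 2 bulbs and thin out gradually towards the larger counts, which is the tail to the right.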

Measuring skewness

• Asymmetry or skewness is measured with a statistic which I shall call simply ‘Skewness’.

• (Skewness is a complex measure, involving the cube of the deviations of the scores about their mean.)

• The Statistical Package for the Social Sciences (SPSS) will calculate the value of Skewness for any distribution.

• If the value of Skewness is positive, the distribution is positively skewed; a negative value indicates negative skewness.
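To show how the cubed deviations produce the sign, here is a minimal sketch of the simple moment coefficient of skewness. It is an assumption that this is close to what the lecture has in mind: statistical packages such as SPSS typically apply a small-sample adjustment, so their values may differ slightly. The data sets are hypothetical.

```python
def skewness(scores):
    """Moment coefficient of skewness: mean cubed deviation / SD**3 (divisor N)."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n   # mean squared deviation
    m3 = sum((x - m) ** 3 for x in scores) / n   # mean cubed deviation
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]   # hypothetical, with one large score

print(skewness(symmetric))    # 0.0
print(skewness(right_tail))   # positive: tail to the right
```

Cubing keeps the sign of each deviation, so a few large positive deviations outweigh many small negative ones.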

Comparing skewness

[Figure: three distributions, with Skewness = 0.044, Skewness = +0.795 and Skewness = -0.795.]

Salaries in the US

Skewness = 2.13

Outliers

• Often data sets contain scores that are atypical of the distribution as a whole.

• Such an atypical score is known as an OUTLIER.

• With small data sets, outliers can have marked effects upon the values of some statistics.

• Such statistics can become UNREPRESENTATIVE of the data as a whole.

The mean as the centre of gravity

• The mean can be thought of as THE CENTRE OF GRAVITY of a distribution, the point at which it would BALANCE on a knife-point.

• Positive and negative deviations can be thought of as the distances of the points to the right and the left of the balance point.

• The deviations must sum to zero if balance is to be maintained.

• Points further from the balance point exert more LEVERAGE: the scores 0 and 6 exert more leverage than the 4s and the 1s.

• The 3s exert no leverage at all, since they are situated at the balance point.
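The balance-point idea can be checked directly. The 16 scores below are hypothetical: the lecture's own data are not in the transcript, so this set was simply chosen to have a mean of 3 and to run from 0 to 6.

```python
# Hypothetical set of 16 scores with a mean of 3.
scores = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6]

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

print(mean)              # 3.0
print(sum(deviations))   # 0.0: positive and negative deviations balance

# Leverage grows with distance from the balance point.
leverage = {x: abs(x - mean) for x in sorted(set(scores))}
print(leverage)          # 0 and 6 are furthest away; 3 has none
```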

An outlier exerts ‘leverage’ upon the value of the mean.

• Add a score of 20 to the set. This is clearly an OUTLIER.

• There were 16 scores in the old set; 17 in the augmented set.

• The new mean is [(3×16) + 20]/17 = 68/17 = 4. (See the car problem.)

• The outlier has exerted LEVERAGE upon the value of the mean.

• Arguably the value of the new mean isn’t typical of the distribution.

[Figure: the old mean, the new mean, and the outlier exerting leverage.]

Other measures of ‘the average’

• There are other measures of the average or central tendency which are more ROBUST to the influence of outliers.

• Two such measures are the MEDIAN and the MODE.

The mode

• The MODE is the MOST FREQUENT score.

• In the original distribution, the mode is 3.

• In the new distribution, the mode is still 3.

• But the mean has been drawn to the right by the outlier.

• The mode is more resistant to the outlier’s influence.

[Figure: the mode and the mean marked on the original and the augmented distributions.]

Problems with the mode

• The mode is only useful as a measure of the average with well-shaped data sets.

• Take the distribution of salaries in a firm which would, of course, be positively skewed, with the directors up in the tail on the right.

• Several of the directors, however, might be on exactly the same salary, which might therefore be the modal salary. Here the mode (200 K?) might be quite atypical of the salary of employees.

A bimodal distribution

The median

• The MEDIAN is the MIDDLE SCORE. The median is the score below (or above) which half the distribution lies.

• Obtain the median by arranging the scores in order and taking the middle one.

• The median of the scores (1, 2, 7, 8, 9) is 7.

• The median of the original distribution on the left is 3.

• The median of the augmented distribution is still 3.

• Like the mode, the median is ROBUST to the pull of outliers.

[Figure: the median and the mean marked on the original and the augmented distributions.]
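The contrast between the three averages can be seen by computing them before and after an outlier of 20 is added. The 16 original scores here are hypothetical, chosen only so that the mean, median and mode are all 3, as in the lecture's example.

```python
from statistics import mean, median, mode

# Hypothetical scores; the lecture's own data are not in the transcript.
scores = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6]
augmented = scores + [20]   # add the outlier

for label, data in (("original", scores), ("with outlier", augmented)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

The mean moves from 3 to 4, while the median and the mode both stay at 3.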

Salaries again

• The mean value of 34K seems rather atypically high.

• The median of 29K seems somewhat more typical.

• Pay no attention to the mode with salary distributions – it can be very misleading.

[Figure: the salary distribution, with Mean = 34K, Median = 29K and Mode = 31K marked.]

Uses of the median

• Classical statistical theory is based upon the MEAN, rather than the median.

• But the median is very useful for EXPLORING YOUR DATA before proceeding to the stage of making formal statistical tests.

• The comparison between the values of the mean and the median provides important information about the shape of a distribution.

• Early measures of skewness were based upon the difference between the mean and the median.

Relative frequency as an area

• Think of the AREAS of the bars as representing the RELATIVE FREQUENCIES with which values within their class intervals occur in the distribution.

• Their total area is approximately the area under the normal curve.

• The total area under the curve represents UNITY or 100%.

• ALL values lie SOMEWHERE under the curve.

Relative frequency as an area

Relative frequency of heights between 65 inches and 70 inches.

Percentiles

• A PERCENTILE is the VALUE or SCORE below which a specified percentage or proportion of the distribution lies.

• The 30th percentile is the value below which 30% of the distribution lies.

• The 70th percentile is the value below which 70% of scores lie.

The 30th and 70th percentiles

[Figure: the 30th percentile, with 0.30 of the distribution below it, and the 70th percentile, with 0.70 below it.]

The median is the 50th percentile

[Figure: the 50th percentile (median), with 0.50 of the distribution on either side.]

Quartiles

• The three QUARTILES are percentiles which divide the distribution into four parts.

• The FIRST QUARTILE Q1 (also known as the LOWER QUARTILE) is the value below which 25% of scores lie.

• The SECOND QUARTILE Q2 is the score below which 50% of scores lie. The second quartile is the MEDIAN.

• The THIRD QUARTILE Q3 (also known as the UPPER QUARTILE) is the value below which 75% of the distribution lies.
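For a concrete set of scores, the three quartiles can be obtained with `statistics.quantiles` from Python's standard library. The data are hypothetical, and the choice of `method="inclusive"` is an assumption: several conventions exist for sample quantiles, and they can give slightly different values for the same data.

```python
from statistics import median, quantiles

scores = list(range(1, 22))   # hypothetical scores 1, 2, ..., 21

# n=4 asks for the three cut points that divide the data into four parts.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

print(q1, q2, q3)        # 6.0 11.0 16.0
print(median(scores))    # 11, the same value as the second quartile
```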

Upper and lower QUARTILES

[Figure: the 25th percentile (lower quartile, Q1), with 0.25 of the distribution below it, and the 75th percentile (upper quartile, Q3), with 0.75 below it.]

More measures of spread: the range statistics

The interquartile range (IQR)

The interquartile range includes 50% of the values in the distribution.

The semi-interquartile range (SIQR)

• The midquartile is NOT the median.

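• The semi-interquartile range is half the interquartile range, (Q3 – Q1)/2. For a continuous distribution it is also the median of the absolute deviations (that is, the deviations with signs ignored) of scores from the midquartile, the point midway between Q1 and Q3.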
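A short numerical check of these range statistics follows, as a sketch on hypothetical data. The `inclusive` quantile convention is an assumption, and the midquartile is taken to be the point midway between Q1 and Q3.

```python
from statistics import median, quantiles

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical
q1, _, q3 = quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1                 # interquartile range
siqr = iqr / 2                # semi-interquartile range
midquartile = (q1 + q3) / 2   # point midway between the quartiles

# Median of the absolute deviations of the scores from the midquartile.
mad_from_mid = median(abs(x - midquartile) for x in scores)

print(iqr, siqr, mad_from_mid)   # 4.0 2.0 2.0
```

These scores are symmetric, so the midquartile happens to coincide with the median; in a skewed distribution the two generally differ.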

Heights of men

IQR = 3.44

SIQR = 1.72

Comparison of the measures

• In this distribution, the SIQR and the SD have different values.

• For a well-shaped distribution like this, we should prefer the SD.

• For the purposes of exploration, however, the SIQR might provide a more useful measure of spread.

95% of the distribution

• 95% of ANY distribution lies between the 2.5th percentile and the 97.5th percentile.

• BELOW the 2.5th percentile lie .025 (2.5%) of the scores.

• ABOVE the 97.5th percentile lie .025 (2.5%) of the scores.

• Outside those limits lie .025+.025 = .05 (5%) of the scores.

95% of ANY continuous distribution

[Figure: a distribution with 0.95 (95%) of the area between the 2.5th and 97.5th percentiles, and 0.025 in each tail.]

Normal distribution

• A NORMAL DISTRIBUTION is symmetrical and bell-shaped.

• If a variable is normally distributed, 95% of values lie within 1.96 standard deviations (2 approx.) on EITHER side of the mean.

[Figure: a normal curve with 0.95 (95%) of the area between mean – 1.96×SD and mean + 1.96×SD, and 2½% (.025) in each tail.]

Another useful interval

• NINETY-NINE per cent of values in a normal distribution lie within 2.58 standard deviations on either side of the mean.

• Only .01/2 = .005 or ½ % of values lie above this range.

• Only .01/2 = .005 or ½ % of values lie below this range.

• The upper value is the 99.5th percentile; the lower value is the .5th percentile.

99% of values

[Figure: a normal curve with 0.99 (99%) of the area between Mean – 2.58×SD and Mean + 2.58×SD, and ½% (0.005) in each tail.]
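The multipliers 1.96 and 2.58 can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from Python's standard library; the helper name `central_z` is illustrative only.

```python
from statistics import NormalDist

def central_z(proportion):
    """z such that `proportion` of a normal distribution lies within mean +/- z*SD."""
    tail = (1 - proportion) / 2            # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)  # value with that much area above it

print(round(central_z(0.95), 2))   # 1.96
print(round(central_z(0.99), 2))   # 2.58
```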

Within ONE standard deviation

• Approximately SIXTY-EIGHT per cent of values in a normal distribution lie within 1 standard deviation on either side of the mean.

• So the upper limit of this interval (mean + 1SD) is the [68 + 32/2]th percentile, that is, the 84th percentile.

• The lower limit of this interval (mean – 1SD) is the (32/2)th percentile, that is, the 16th percentile.
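These three figures can be confirmed from the cumulative distribution function of the standard normal distribution. This is a minimal sketch using Python's standard library.

```python
from statistics import NormalDist

z = NormalDist()                       # standard normal: mean 0, SD 1
within_one_sd = z.cdf(1) - z.cdf(-1)   # area between mean - 1SD and mean + 1SD

print(round(within_one_sd * 100))   # 68
print(round(z.cdf(1) * 100))        # 84  (percentile of mean + 1SD)
print(round(z.cdf(-1) * 100))       # 16  (percentile of mean - 1SD)
```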

The 95th percentile

• NINETY-FIVE per cent of values lie BELOW the value that is 1.64 standard deviations above the mean, i.e., mean + 1.64×SD. (To three decimal places the multiplier is 1.645.)

• (Because of the symmetry of the normal distribution, we can also say that 95% of values lie ABOVE the value that is 1.64 standard deviations BELOW the mean, i.e., mean – 1.64×SD.)

• These statements apply only to the normal distribution.
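The one-sided multiplier comes from the inverse of the normal cumulative distribution function, evaluated at 0.95 rather than 0.975. A brief sketch using Python's standard library:

```python
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.95)   # value with 95% of the distribution below it
z05 = NormalDist().inv_cdf(0.05)   # value with 95% of the distribution above it

print(round(z95, 3))   # 1.645
print(round(z05, 3))   # -1.645, the mirror image by symmetry
```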

The 95th percentile of a normal distribution

[Figure: a normal curve with 0.95 (95%) of the area below Mean + 1.64×SD.]

Study question: distribution of IQ

• The IQ has an approximately normal distribution, with a mean of 100 and a standard deviation of 15.

• If 1000 people are drawn at random from the population, how many of them can we expect to have IQs …

1. greater than 130?

2. between 100 and 130?

3. less than 85?

Study question

At which percentile in the IQ distribution is

1. an IQ of 130?

2. an IQ of 115?

3. an IQ of 100?

4. an IQ of 85?
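Once the IQ study questions have been attempted by hand, the answers can be checked against the exact normal model. This is a sketch under the stated assumption that IQ is normal with mean 100 and SD 15; the hand rules (68%, 95%) are rounded, so answers based on them will differ slightly from these.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # model stated in the study question
n = 1000                            # people drawn at random

above_130 = n * (1 - iq.cdf(130))
between_100_130 = n * (iq.cdf(130) - iq.cdf(100))
below_85 = n * iq.cdf(85)
print(round(above_130), round(between_100_130), round(below_85))

# Percentile of each IQ: the percentage of the distribution lying below it.
for score in (130, 115, 100, 85):
    print(score, round(iq.cdf(score) * 100, 1))
```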

Next week

• Next week, I shall show you how to use the statistical package SPSS to explore the results of an experiment and obtain the sorts of statistics I have been talking about this afternoon, including percentiles.

• I shall show you how to obtain graphs of a distribution.