My contact details • Colin Gray • Room S2 (Thursday mornings, especially) • E-mail address: psy045@abdn.ac.uk • Telephone: (27) 2234 • A rapid response to any queries assured!

Mar 28, 2015

Alexis Newton

My contact details

• Colin Gray

• Room S2 (Thursday mornings, especially)

• E-mail address: psy045@abdn.ac.uk

• Telephone: (27) 2234

• A rapid response to any queries assured!

This afternoon’s programme

• 1:30 – 3:30pm Simple descriptive statistics.

• 3:30 – 4:00pm A break for coffee.

• 4:00 – 4:45pm Finding probabilities.

SESSION 1

Describing data

Kinds of data

Univariate, bivariate and multivariate data sets

• We can classify data according to the number of measured variables in the data set.

• If there is one measured variable, we have a UNIVARIATE data set.

• If there are two measured variables, we have a BIVARIATE data set.

• If there are three or more measured variables, we have a MULTIVARIATE data set.

Levels of measurement

• There are three levels of measurement:

1. Scale, interval or continuous.

2. Ordinal.

3. Nominal.

Scale data

• Measures on an independent scale with units. Heights, weights, performance scores and IQs are all scale data. So also are counts, such as the number of hits. Each score has ‘stand-alone’ meaning.

Ordinal data

• Data in the form of RANKS (1st, 3rd, 53rd). A rank has meaning only in relation to the other individuals in the sample. A rank does not express, in units, the extent to which a property is possessed.

• Rarely would a researcher collect data in the form of ranks. But there are hidden issues here. Some would argue that ratings are really ordinal data (with ties) and should be treated as such in statistical analysis.

Nominal data

• Assignments to categories (so-many males, so-many females). Nominal data may be coded numerically, but the numbers are arbitrary LABELS, as when John receives a 1 for Sex, while Jane receives a 2.

• Nominal data are not really measurements at all.

Experimental versus correlational research

• In a true experiment such as a randomised clinical trial, the researcher manipulates one variable, the INDEPENDENT VARIABLE (IV), with a view to demonstrating that it has a causal effect upon the DEPENDENT VARIABLE (DV).

• The DV is measured during the course of the experiment.

• In correlational research, ALL variables are measured as they occur in the people studied.

Comparison

• Experimental research usually results in univariate data sets.

• The statistical analysis usually involves COMPARISON of scores obtained under the different experimental conditions.

• For example, performance under an active condition might be compared with performance under a control condition.

Association

• Correlational research results in bivariate or multivariate data sets.

• Here, the interest centres on the possible existence of statistical ASSOCIATIONS among the variables measured.

• If watching screened violence promotes actual violence, we should find that those who watch the most screened violence tend to be the most violent, those who watch the least tend to be the least violent, and so on.

Uses of statistics

1. We use statistics to SUMMARISE and DESCRIBE our data.

2. We use statistics to CONFIRM patterns in our data. One aspect of this process of confirmation is the making of statistical TESTS.

A simple two-group experiment

• The experimenter wants to show that ingestion of caffeine improves shooting accuracy, as measured by number of Hits.

• Participants are randomly assigned to one of the two conditions.

• All participants shoot at the same target.

Results of the Caffeine experiment

The raw data

• The table shows the RAW DATA, that is, the ORIGINAL SCORES achieved by the participants.

• From inspection, it seems that the Caffeine group tended to have higher scores.

• With larger data sets, however, it can be very difficult to see what’s going on merely from inspection.

Distribution

• The DISTRIBUTION of a variable is a table or diagram showing the relative FREQUENCIES, over the entire range, with which different values occur.

• A good first move in a statistical analysis is to draw a graph of the distribution.

Distributions of the Caffeine and Placebo data

Three important aspects of a distribution

1. Its LEVEL or CENTRAL TENDENCY.

2. The SPREAD or DISPERSION of scores around the centre.

3. The SHAPE of the distribution.

Different central tendencies

• The scores of the Caffeine group TEND to be higher than do the scores of the Placebo group. The two distributions differ in LEVEL or CENTRAL TENDENCY.

• There is, however, considerable overlap: some participants in the Placebo condition outperformed some of those in the Caffeine condition.

Individual differences

• In the Caffeine distribution, values are densest around 13; whereas in the Placebo distribution, values are densest around 9.

• But there is a huge RANGE in performance.

• The worst performer (who scored 2) was in the Caffeine group; the best (who scored 20) was in the Placebo group.

Central tendency: the “average”

• An average is a measure of level or central tendency, the “typical” value.

• It is clear from inspection of the figure that the average score of the Caffeine distribution should be higher than the average score of the Placebo distribution.

• There are several different measures of the “average” of a set of scores.

The mean

• The MEAN of a set of scores is the sum of their values divided by the number of scores.

• If X is a score and n is the number of scores, the mean M is: M = ΣX / n.

Example

• The mean of the scores 10, 1, 3, 4 and 2 is …
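The example above can be checked with a minimal Python sketch. The function name mean is an illustrative choice, not something taken from the slides.

```python
def mean(scores):
    """Return the arithmetic mean: the sum of the scores divided by their number."""
    return sum(scores) / len(scores)

scores = [10, 1, 3, 4, 2]   # the five scores from the example
m = mean(scores)            # sum is 20, n is 5
print(m)                    # prints 4.0
```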

The two group means

Deviation scores

• A deviation score d is a score from which the mean has been subtracted.

• Deviation scores have the very important property that they sum to zero.

• Therefore, their mean is also zero.
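The zero-sum property can be seen in a short Python sketch. The scores are borrowed from the earlier example of the mean; the variable names are illustrative.

```python
scores = [10, 1, 3, 4, 2]
m = sum(scores) / len(scores)            # the mean of the scores

# A deviation score is a score minus the mean.
deviations = [x - m for x in scores]

total = sum(deviations)                  # sum of the deviation scores
mean_deviation = total / len(deviations) # hence their mean
print(deviations, total, mean_deviation)
```

Whatever scores are supplied, the positive and negative deviations cancel, apart from tiny floating-point rounding error with non-integer data.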

Centring

• In column X are raw scores, centred on their mean value of 2.

• Place the deviation scores d in the next column. This operation is known as CENTRING and is common in regression analysis.

• The new values are now centred on zero, rather than the mean of the original values.

The mean as the ‘centre of gravity’

• The mean can be thought of as THE CENTRE OF GRAVITY of a distribution, the point at which it would BALANCE on a knife-point.

• We can see (because this distribution is symmetrical) that the mean of this distribution is 3.

Outliers

• Often data sets contain scores that are atypical of the distribution as a whole.

• Such an atypical score is known as an OUTLIER.

• With small data sets, outliers can have marked effects upon the values of some statistics.

• Such statistics can become UNREPRESENTATIVE of the data as a whole.

An outlier (20 hits) exerts ‘leverage’ upon the value of the mean.

Other measures of ‘the average’

• There are other measures of the average or central tendency which are more ROBUST to the influence of outliers.

• Two such measures are the MEDIAN and the MODE.
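A small Python sketch illustrates what robustness means here. The scores below are hypothetical, made up for illustration (they are not the data from the slides); a single outlier of 20 hits is added to a small group.

```python
import statistics

# Hypothetical scores, not the data from the slides.
without_outlier = [8, 9, 9, 10, 11]
with_outlier = without_outlier + [20]    # add one atypical score

mean_shift = statistics.mean(with_outlier) - statistics.mean(without_outlier)
median_shift = statistics.median(with_outlier) - statistics.median(without_outlier)

# The outlier pulls the mean much further than it pulls the median.
print(mean_shift, median_shift)
```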

The median

• The MEDIAN of a distribution is the MIDDLE number. It is the value below which 50% of the distribution lies.

• The medians of the scores in the Placebo and Caffeine groups are, respectively, 9 and 12.5.

Points about the median

• Notice that, for the Placebo group, the median does not have the value of any of the actual scores.

• With symmetrical distributions, the median and the mean have similar values.

The mode

• The MODE is the MOST FREQUENT value.

• For the Placebo and Caffeine groups, the values of the mode are 8 and 13, respectively.

• All three measures of central tendency or level, therefore, agree that the Caffeine group typically performed at a higher level than did the Placebo group.

Comparison of the three measures

• The mean is the basis of classical statistical theory, because it has many useful mathematical properties.

• The median is useful for exploring data sets, particularly in comparison with the mean. With an extremely asymmetrical distribution, the median is arguably a truer measure of level in the data as a whole.

• The mode is seldom used.

Properties of the mean

• We have seen that deviations about the mean sum to zero.

• The sum of the SQUARES of deviations about the mean is a MINIMUM, that is, it is smaller than the sum of squared deviations about any other value.
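The minimum property can be checked numerically. In this Python sketch the helper name sum_of_squares and the scores (taken from the earlier example of the mean) are illustrative choices.

```python
def sum_of_squares(scores, centre):
    """Sum of squared deviations of the scores about an arbitrary centre."""
    return sum((x - centre) ** 2 for x in scores)

scores = [10, 1, 3, 4, 2]
m = sum(scores) / len(scores)             # the mean, 4.0

about_mean = sum_of_squares(scores, m)    # 36 + 9 + 1 + 0 + 4 = 50
about_three = sum_of_squares(scores, 3)   # 55
about_five = sum_of_squares(scores, 5)    # 55
print(about_mean, about_three, about_five)
```

Moving the centre away from the mean in either direction increases the sum of squares.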

A property of the median

• The sum of ABSOLUTE deviations about the MEDIAN is also a minimum.

• But absolute values are less useful mathematically.
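The corresponding check for the median, again as an illustrative Python sketch with the same example scores and a hypothetical helper name:

```python
import statistics

def sum_of_absolute_deviations(scores, centre):
    """Sum of absolute deviations of the scores about an arbitrary centre."""
    return sum(abs(x - centre) for x in scores)

scores = [10, 1, 3, 4, 2]
med = statistics.median(scores)                      # 3
m = statistics.mean(scores)                          # 4

about_median = sum_of_absolute_deviations(scores, med)  # 7 + 2 + 0 + 1 + 1 = 11
about_mean = sum_of_absolute_deviations(scores, m)      # 6 + 3 + 1 + 0 + 2 = 12
print(about_median, about_mean)
```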

A second scenario

• The scores of both groups cluster around the same value: 12. Since the distributions are completely symmetrical, the mean of either is clearly 12.

• In the Caffeine distribution, however, the scores are more widely SPREAD OUT or DISPERSED than those of the Placebo group.

The simple range

• The SIMPLE RANGE is the highest score minus the lowest score.

• So, for the Placebo group in Scenario 2, the simple range is (15 – 9) = 6 score units.

• For the Caffeine group, the simple range is (18 – 6) = 12 score units.

• On this measure of dispersion, therefore, the Caffeine distribution shows twice as much spread or dispersion of scores around the mean.
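A minimal Python sketch of the simple range follows. Only the highest and lowest scores (15 and 9; 18 and 6) come from the slides; the scores in between are hypothetical fillers chosen to be symmetrical about 12.

```python
def simple_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)

# Hypothetical data: only the extremes are taken from the slides.
placebo = [9, 11, 12, 12, 13, 15]
caffeine = [6, 9, 12, 12, 15, 18]

placebo_range = simple_range(placebo)     # 15 - 9
caffeine_range = simple_range(caffeine)   # 18 - 6
print(placebo_range, caffeine_range)      # prints 6 12
```

Note that changing any of the middle scores leaves both ranges untouched, which is exactly the limitation of this statistic.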

A problem with the simple range

• The simple range statistic only uses TWO scores out of the whole distribution.

• Should those particular scores be highly atypical of the distribution, the range may not reflect the true spread of scores about the mean of the distribution. The data from the original scenario (left) exemplify this situation.

Other range statistics

• Nevertheless, the simple range can be a very useful statistic when you are EXPLORING a data set.

• Also available are more complex RANGE STATISTICS (the interquartile range, the semi-interquartile range) which use more of the information in a data set than does the simple range.

The variance and the standard deviation (SD)

• The VARIANCE (s²) and the STANDARD DEVIATION (s or SD) are also measures of dispersion.

• Both statistics use the values of ALL the scores in the distribution.

Deviation scores again

• The DEVIATION SCORE is the building block from which the variance and SD are calculated.

• Could the mean deviation serve as a measure of spread?

• No, because deviations about the mean sum to zero. So the mean deviation is also zero, whatever the spread of your data.

Squared deviations

• The sum of the SQUARED deviations is always either positive (when scores have different values) or zero (if all the scores have the same value).

• If there is any variability in the scores at all, the sum of the squared deviations will have a positive value.

Formula for the variance

s² = Σ(X − M)² / (n − 1)

• The Greek letter sigma (Σ) is used to indicate that you are to obtain the deviation of each score from the mean, square it, then add up all the squared deviations.

• The sample variance s² is close to being the MEAN SQUARED DEVIATION.

• The value 1 is subtracted from n in order to improve the sample variance as an estimate of the spread of values in the population.
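The steps described above can be sketched in Python. The function name sample_variance and the scores (from the earlier example of the mean, not the Caffeine data) are illustrative; the result is compared with the standard library's statistics.variance, which also divides by n − 1.

```python
import math
import statistics

def sample_variance(scores):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)

scores = [10, 1, 3, 4, 2]
s2 = sample_variance(scores)   # 50 / 4 = 12.5
sd = math.sqrt(s2)             # positive square root of the variance
print(s2, statistics.variance(scores), sd)
```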

Applying the formula

Variance of the Caffeine scores in Scenario 1

Adding a constant

• Adding a constant of ten to every score in the Caffeine group simply shifts the whole distribution ten units to the right.

• So the new mean will be the old one plus ten: new mean = 11.90 + 10 = 21.90

• The SPREAD of the scores, however, will be unaltered, so the variance and the SD will have the same values as before.

Multiplying by a constant

• Multiplying each score by a constant of ten not only increases the mean by a factor of ten, but also increases the SPREAD of the scores about the new mean.

• The new mean will be ten times the old one.

• The new variance will be ten SQUARED, that is one hundred, times the old variance.

• The new SD will be ten times the old one.
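The effects of adding and multiplying by a constant of ten can be verified with a short Python sketch. The scores are illustrative (they are not the Caffeine data), and the variable names are assumptions.

```python
import statistics

scores = [10, 1, 3, 4, 2]              # illustrative scores
added = [x + 10 for x in scores]       # add a constant of ten
scaled = [x * 10 for x in scores]      # multiply by a constant of ten

for label, data in [("original", scores), ("plus ten", added), ("times ten", scaled)]:
    # Report the mean, the sample variance and the sample SD of each version.
    print(label, statistics.mean(data), statistics.variance(data), statistics.stdev(data))
```

Adding the constant changes only the mean; multiplying changes the mean and SD by a factor of ten and the variance by a factor of one hundred.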

Adding and multiplying scores by a constant of ten

Examples

Effect of centring

• When you centre scores by subtracting the mean, the mean becomes zero.

• The variance, however, remains unaltered.

Interpreting the variance

• The simple range statistic has the merit of being in the same units as the raw data.

• The variance, since it is based on the squares of the deviations, is in SQUARED UNITS and is therefore difficult to interpret.

• If you take the (positive) square root of the variance, you have the STANDARD DEVIATION, which is in the original units of measurement.

The standard deviation is the positive square root of the variance

• We found that the variance of the scores of the Caffeine group was 10.73.

• To obtain the standard deviation, we take the square root of 10.73, which is 3.28.

• The square root operation restores the measure of spread to the original measurement units: we can say that the standard deviation is 3.28 hits.

Tables of results

• As well as means, always include the standard deviations.

Vulnerability of variance and SD to outliers

• We have seen that the mean is vulnerable to the leverage exerted by outliers.

• This is true, a fortiori, of the variance, because it is the sum of the SQUARES of deviations from the mean.

• The leverage effect is NOT removed by taking the square root of the variance to obtain the standard deviation.

Standard or z scores

• A standard or z score is a special kind of deviation score which expresses a value as so-many standard deviations above or below the mean (0): z = (X − M) / s, where M is the mean and s is the standard deviation of the scores.

Mean and SD of z scores

• Their mean is always zero (because they are deviation scores).

• Their variance and standard deviation are 1.
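Both properties can be confirmed with a minimal Python sketch. The function name z_scores and the scores are illustrative; the sample SD (n − 1 denominator) is assumed throughout.

```python
import statistics

def z_scores(scores):
    """Convert each score to so-many standard deviations above or below the mean."""
    m = statistics.mean(scores)
    s = statistics.stdev(scores)       # sample SD, n - 1 denominator
    return [(x - m) / s for x in scores]

scores = [10, 1, 3, 4, 2]              # illustrative scores
z = z_scores(scores)
z_mean = statistics.mean(z)            # zero, apart from rounding error
z_sd = statistics.stdev(z)             # one, apart from rounding error
print(z_mean, z_sd)
```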

Advantage of z scores

• Scores in different units (heights and weights) cannot be directly compared.

• But when someone’s weight has a z score of –1 (one SD below the mean) and their height has a z score of +2 (two SDs above the mean), we can say that the person is tall and thin.

• If we can make additional assumptions about the distribution, knowledge of z scores is even more informative.

Distribution shape

• We have measured the AVERAGE and the SPREAD of the Caffeine and Placebo distributions.

• We noted that both distributions were (at least approximately) SYMMETRICAL.

• There are circumstances in which that would not be the case.

A disappointing result

• The mean for the Caffeine group is only very slightly greater than the Placebo mean.

• But note that both means are near the top of the scale (20).

• And notice how small the SDs are.

Ceiling effect

• The scores of both groups are bunched around the top of the scale.

• Any possible effect of caffeine intake has been masked by a CEILING EFFECT.

• The task chosen was TOO EASY for the participants.

• No conclusions about the effects of ingestion of caffeine can be drawn from these data.

Another disappointing result

• Again the Caffeine mean is only slightly greater than the Placebo mean.

• But both means are near the bottom of the scale (zero).

• Once again, note the small SDs.

Floor effect

• The scores of both groups are bunched around the bottom of the scale.

• The task was too difficult.

• No conclusions about the effects of ingestion of caffeine can be drawn from these data either.

Skewness

• In both Scenarios 3 and 4, the distributions are asymmetric or SKEWED.

• When a distribution has a tail to the left, it is said to be NEGATIVELY SKEWED; when it has a tail to the right, it is POSITIVELY SKEWED.

• When there is a ceiling effect, the distributions are negatively skewed; when there is a floor effect, they are positively skewed.
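The direction of skew can be illustrated in Python. This sketch uses one simple moment-based coefficient (the mean of the cubed z scores), which is only one of several definitions of skewness; the scores are hypothetical, chosen to bunch at the top and bottom of a 0 to 20 scale.

```python
import statistics

def skewness(scores):
    """Mean of the cubed z scores: negative for a left tail, positive for a right tail."""
    m = statistics.mean(scores)
    s = statistics.pstdev(scores)      # SD with n in the denominator
    return sum(((x - m) / s) ** 3 for x in scores) / len(scores)

# Hypothetical scores on a 0 to 20 scale.
ceiling_scores = [20, 20, 19, 19, 18, 17, 14]   # bunched at the top, tail to the left
floor_scores = [0, 0, 1, 1, 2, 3, 6]            # bunched at the bottom, tail to the right

print(skewness(ceiling_scores), skewness(floor_scores))
```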

Screen violence and actual violence

• Does screened violence promote actual violence?

• Ethical and practical considerations may rule out direct manipulation of the amount of violent material that children watch.

• It may be more feasible to measure children on the amount of screen violence they watch and upon their actual violence.

Correlation

• A statistical ASSOCIATION or CORRELATION is a tendency for events or values to occur together.

• If exposure to screen violence promotes actual violence, we should expect those who watch more violence to be more violent and those who watch less violence to be less violent.

• Such a POSITIVE ASSOCIATION would be at least consistent with the hypothesis.

A scatterplot

• Here is a picture of the results of our study.

• In this SCATTERPLOT, each point represents one of the children.

• Richard got a score of 2 on Exposure and 4 on Actual.

• John got 9 on Exposure and 8 on Actual.

• Jim got scores of 5 on both Exposure and Actual.

A strong positive correlation

• When the shape of a scatterplot is a narrow ellipse like this, a strong correlation is indicated.

• The results of the study are consistent with the hypothesis.

A negative correlation?

• Does the number of complaints made against GPs vary inversely with the average length of their appointments?

• The following scatterplot supports this hypothesis.

A strong negative correlation

Scatterplot indicating no association

• When the cloud of points is circular, there is NO ASSOCIATION between the variables.

Linear functions

• Y is a LINEAR FUNCTION of X if the graph of Y upon X is a straight line.

• For example, temperature in degrees Fahrenheit is a linear function of temperature in degrees Celsius.

The Pearson correlation

• The PEARSON CORRELATION (r) is designed to measure the strength of a supposed linear relationship between two variables.

• A correlation can only take values within the range from –1 to +1, inclusive.

• The closer the value of a correlation to unity (forgetting the sign), the STRONGER the linear association.

Formula for the Pearson correlation

• There are several equivalent formulae. Here is the simplest.

• Transform X and Y to standard scores z.

• Divide the sum of the products of the pairs of standard scores by (n – 1).
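The two steps above can be sketched in Python. The function name pearson_r is illustrative. Only the three children named on the scatterplot slide (Richard, John and Jim) are used here, because the full data set is not reproduced in this transcript, so the result will not equal the value reported for the whole study.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: sum of products of paired z scores, divided by n - 1."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)   # sample SDs
    zx = [(x - mx) / sx for x in xs]                      # standard scores for X
    zy = [(y - my) / sy for y in ys]                      # standard scores for Y
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

exposure = [2, 9, 5]   # Richard, John, Jim
actual = [4, 8, 5]
r = pearson_r(exposure, actual)
print(round(r, 3))
```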

The calculation of r for the violence data

• The value of r (.892) is high and positive, consistent with the appearance of the scatterplot.

Centring again

• What is the effect upon the value of r when the variables involved are centred?

• There is no effect.

• In fact, no linear transformation of either variable (or both variables) will change the ABSOLUTE value of r.

• Suppose you measure the heights and weights of 100 people in inches and pounds and find that the correlation is +.6. If you convert the heights and weights to centimetres and grams, respectively, the correlation is still +.6.

• Merely subtracting their mean from the values of each variable leaves the correlation unchanged.
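A short Python sketch illustrates this invariance. The heights and weights are hypothetical values invented for the illustration, and pearson_r is an illustrative helper based on standard scores.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: sum of products of paired z scores, divided by n - 1."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical heights (inches) and weights (pounds).
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 130, 128, 160, 155, 180]

r_original = pearson_r(heights_in, weights_lb)

# Change of units: inches to centimetres, pounds to grams.
r_metric = pearson_r([h * 2.54 for h in heights_in], [w * 453.592 for w in weights_lb])

# Centring: subtract the mean from every height.
mean_h = statistics.mean(heights_in)
r_centred = pearson_r([h - mean_h for h in heights_in], weights_lb)

# Multiplying one variable by -1 reverses the sign but not the absolute value.
r_flipped = pearson_r([-h for h in heights_in], weights_lb)

print(r_original, r_metric, r_centred, r_flipped)
```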

Reversing the slope

• If you multiply all the scores on one variable by –1, you will change the slope of the scatterplot; but the absolute value of r will remain the same.

Centring in regression

• We have seen that centring does not change the variance of a variable in the data set.

• Nor does centring change the correlations among the variables.

• Centring is used in several multivariate procedures in order to help the algorithm to find a unique solution.

Question

• We have been told of a bivariate data set, from which the calculated Pearson correlation is ZERO: r = 0.

• From this information alone, can we conclude that the two variables are independent, that is, there is no association between them?

• The answer is NO!

The scatterplot

• There is a perfect but nonlinear association between the two variables.

• Yet the Pearson correlation is zero.
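
The data behind the scatterplot are not reproduced here, so the sketch below uses an assumed example of the same kind: Y is the square of X, so that Y is completely determined by X, yet r comes out as zero. The helper `pearson_r` is written for this illustration.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Sum of products of paired z scores, divided by (n - 1)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical data: a perfect but nonlinear (quadratic) relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(round(r, 6))
```

The positive products on the right of the plot are cancelled by the negative products on the left, which is why r cannot detect this kind of association.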

Anscombe’s data set

• Many years ago, Francis Anscombe (The American Statistician, 1973) published a famous paper warning readers of the pitfalls awaiting the unwary user of information about correlations.

• There were four bivariate data sets, all of which produced a Pearson correlation with a value of +.82.
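
The equal correlations can be checked directly. The sketch below uses the four data sets as they are usually reproduced from Anscombe's paper; the values are typed in here as an assumption and should be checked against the original table. The helper `pearson_r` is written for this illustration.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Sum of products of paired z scores, divided by (n - 1)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (len(x) - 1)

# Sets 1 to 3 share the same X values; set 4 has its own
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {
    "set 1": (x123, y1),
    "set 2": (x123, y2),
    "set 3": (x123, y3),
    "set 4": (x4, y4),
}
correlations = {name: pearson_r(x, y) for name, (x, y) in quartet.items()}
for name, r in correlations.items():
    print(name, round(r, 3))
```

The numbers alone give no hint that the four scatterplots look completely different, which is the point of the paper.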

An elliptical scatterplot

• This is fine.

• The elliptical scatterplot indicates that there is indeed a basically linear relationship between variable Y1 and variable X1.

A non-linear relationship

• There is actually a perfect association between variable Y2 and variable X1.

• This relationship, however, is non-linear and is understated by the value of r.

An understatement by r

• There is a substantial correlation.

• The scatterplot, however, is not elliptical.

• Basically there is a perfect linear relationship between Y3 and X1.

• The outlier (a typo?) has depressed the value of r.

Anscombe’s rule

• When you examine a scatterplot (something you should ALWAYS do when interpreting a correlation), ask yourself the following question:

“Would the removal of one or two points at random affect the basically elliptical shape of the scatterplot? If the shape would remain essentially the same, the value of r accurately reflects the association between the variables.”

In summary …

• The Pearson correlation r is a measure of the strength of a supposed LINEAR relationship between two variables.

• It is one of the most widely used of statistical measures; but it is also one of the most misused.

• Wherever possible, a value of r should be interpreted in the context of the scatterplot.

Have we really gathered evidence for the hypothesis that viewing screened violence increases actual violence?

A famous dictum

CORRELATION does not imply CAUSATION

A causal model

• The scientific hypothesis implies this CAUSAL MODEL.

• The results are CONSISTENT with the hypothesis.

Another causal model

• The child’s violent tendencies and appetite for violence lead to his watching violent programmes as often as possible.

• This model is also consistent with the data.

Yet another causal model

• NEITHER variable causes the other.

• Both are determined by the behaviour of the child’s parents.

Direction of causality

• Returning to the caffeine experiment, it would be ridiculous to suggest that shooting accuracy determines the group to which one is assigned.

• In the violence study, however, which was of CORRELATIONAL, rather than EXPERIMENTAL design, the direction of causation is uncertain.

• Indeed, at least three possible MODELS OF CAUSATION are consistent with the results.

A background variable

• Perhaps neither Exposure nor Actual violence causes the other.

• Perhaps they are caused by a background parental behaviour variable.

• We have data on such a variable.

• The background variable correlates highly with both Exposure and Actual violence.

Partial correlation

A PARTIAL CORRELATION is what remains of a Pearson correlation between two variables when the influence of a third variable has been removed, or PARTIALLED OUT.
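
For three variables, the standard first-order partial correlation can be computed from the three pairwise Pearson correlations. The sketch below assumes that formula; the function name `partial_r` and the two correlations involving the background variable are hypothetical, since those values are not given in these slides (only r = .892 for Exposure and Actual violence is).

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with Z partialled out (first-order partial correlation)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_exposure_violence = 0.892   # value reported earlier for the violence data
r_exposure_parent = 0.90      # hypothetical
r_violence_parent = 0.92      # hypothetical

r_partial = partial_r(r_exposure_violence, r_exposure_parent, r_violence_parent)
print(round(r_partial, 3))
```

When the third variable correlates highly with both of the others, as assumed here, the partial correlation is much smaller than the original correlation.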

The partial correlation

• The partial correlation fails to reach significance.

• Now that we have taken the background variable into consideration, we see that there is no significant correlation between Exposure and Actual violence.

• It appears that, of the three possible causal models, the ‘third party’ model gives the most convincing account of these data.

Coffee break

Histograms

• A HISTOGRAM is useful for displaying the distribution of a large data set.

• Here is a histogram of the heights of 1000 men.

Heights of 1000 men

Features of a histogram

• The entire range of variation (shown on the x-axis) is divided into CLASS INTERVALS.

• The heights of the bars are proportional to the FREQUENCIES of values (y-axis) falling within the class intervals represented by the bases of the bars.

• The bars touch each other, indicating the CONTINUOUS variation of the variable.

A normal distribution

Salaries in the US

• Many variables have asymmetrical distributions.

Skewness = 2.13

Measuring skewness

• Asymmetry or skewness is measured with a statistic which I shall call simply ‘Skewness’.

• (Skewness is a complex measure, involving the cube of the deviations of the scores about their mean.)

• PASW will calculate the value of Skewness for any distribution.

• If the value of Skewness is positive, the distribution is positively skewed; a negative value indicates negative skewness.
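
To show how the cubed deviations produce the sign, here is a sketch using the simple moment-based form of skewness. Statistical packages may apply a small-sample adjustment, so their values can differ slightly from this one. The data sets are hypothetical.

```python
from statistics import mean

def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by (mean squared deviation) ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n   # mean squared deviation
    m3 = sum((v - m) ** 3 for v in values) / n   # mean cubed deviation keeps the sign
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]          # one large value, like a very high salary
left_tail = [-v for v in right_tail]      # mirror image

print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
```

Cubing preserves the sign of each deviation, so a long right-hand tail gives a positive value and a long left-hand tail a negative one.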

Skewness of three distributions

Relative frequency as an area

• The area of a bar is the proportion of values within the range of its base.

• The green area is the proportion of heights between 70 inches and 75 inches.

Proportion between 65” and 75”

Proportion of heights either below 65” or above 75”.

Unity

• All values lie within the total range.

• The area of the green bars is 100% or unity.

Populations and samples

• We have some scores on shooting accuracy from the caffeine trial.

• The POPULATION of such scores is the reference set, that is, the infinite set of all possible scores.

• Our data are merely a subset or SAMPLE from the population.

Theoretical populations or distributions

• In these talks, the term “population” always refers to a theoretical distribution.

• For example the 1000 men’s heights are a sample from a theoretical NORMAL population whose mean is 69” and whose standard deviation is 2.59”.

• This NORMAL distribution is symmetrical and bell-shaped.

Statistics versus parameters

• STATISTICS are characteristics of SAMPLES.

• PARAMETERS are characteristics of populations.

Notational convention

• Roman letters denote statistics such as our sample means and SDs.

• Greek letters denote the corresponding population characteristics or parameters.

Two parameters

• There is an infinitely large family of normal distributions.

• To specify a normal distribution you must assign values to TWO parameters:

1. The mean

2. The standard deviation

The height population

Probability

Probability

• The PROBABILITY of an event is a measure of its likelihood, which can take values from zero (an impossible event) to unity (a certainty).

• There have been several definitions of probability.

• All of them raise serious philosophical questions.

An ‘event’

• An EVENT is the outcome of an experiment of chance, such as rolling a die, tossing a coin – or running a psychological experiment.

• Chance is an important factor in the outcome of an experiment.

• Joe, Fred and Mary participated this time; but Anne, Jim and Fiona could easily have done so – and their scores would certainly have been different.

Classical ‘probability’

• “The first impetus came from a situation in which the dissolute nobility of France were competing in a race to ruin at the gaming tables” (Hogben, 1967; p.551).

• In 1654, Pascal and Fermat analysed the gambling strategies of one particular nobleman.

• Their approach was to determine the number of ways an outcome (such as a particular hand in cards) could occur in comparison with the total number of possibilities.

Classical definition of a probability

• The probability of an event is the NUMBER OF WAYS in which the event can occur, divided by the TOTAL NUMBER OF OUTCOMES.

• Roll a die.

• What is the probability of a six?

• There is ONE way of getting a six. There are SIX possible outcomes.

• So the probability of a six is 1/6.

More examples

• Roll a die. What is the probability of an even number?

• That could happen in three ways: 2 spots, 4 spots or six spots.

• So the probability is 3/6 = ½.

• What is the probability of a seven? There is NO WAY in which that could happen, so the probability is 0/6 = 0 (indicating an IMPOSSIBILITY).

• A number between 1 and 6, inclusive? That event could happen in six ways, so the probability is 6/6 = 1 (indicating a CERTAINTY).
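
The die examples can be reproduced by counting, as in the sketch below. The function name `classical_probability` is an assumption made for this illustration, and the outcomes are taken to be equally likely, as the classical definition requires.

```python
from fractions import Fraction

def classical_probability(event, outcomes):
    """Number of outcomes in which the event occurs, divided by the total number of outcomes."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

die = [1, 2, 3, 4, 5, 6]   # the six equally likely faces

p_six = classical_probability(lambda face: face == 6, die)
p_even = classical_probability(lambda face: face % 2 == 0, die)
p_seven = classical_probability(lambda face: face == 7, die)
p_any = classical_probability(lambda face: 1 <= face <= 6, die)

print(p_six, p_even, p_seven, p_any)
```

Using `Fraction` keeps the answers as exact ratios (1/6, 1/2, 0 and 1) rather than rounded decimals.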

A formula for classical probability

• If an experiment of chance has N possible outcomes and an event E can occur in n ways, then the probability of E is P(E) = n/N.

A problem with the classical definition

• The classical definition is circular.

• The outcomes counted in the definition have to be assumed “equally likely”, which (by implication) presses the term ‘probability’ into service for its own definition.

The empirical definition of a probability

• This notion is implicit in the notion of a FAIR coin.

• A fair coin is one that, IN THE LONG RUN, shows heads half the time.

• This “convergence”, however, which is a special case of what I shall call simplistically “The law of large numbers”, is an empirical fact.

• It cannot, however, be proved “analytically”, that is, by mathematical deduction.

Interpretation of a probability

• If a coin is ‘fair’, the probability of a head is ½.

• This does not mean that if I toss the coin 100 times, I shall get 50 heads.

• Nor does it mean that if I toss the coin a million times, I shall get close to half a million heads.

• But with a million tosses, the proportion of heads will be closer to ½ than it would be if I were to toss the coin 10 times, 100 times or 1000 times.

• A probability is a PROPORTION to which we can get as close as desired by taking a sample of sufficient size.
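
A small simulation illustrates the idea. It is a sketch only: the coin is simulated with Python's pseudo-random number generator, the seed is an arbitrary choice that makes the run repeatable, and any single run can depart from the typical pattern.

```python
import random

def proportion_of_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(1)   # fixed seed so that the run can be repeated
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, proportion_of_heads(n, rng))
```

The proportions typically settle closer and closer to ½ as the number of tosses grows, even though the COUNT of heads may drift further from exactly half.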

Health events

• A HEALTH EVENT is an uncertain occurrence, such as acute appendicitis, admission to a dental clinic - or death.

• ADVERSE events are those occurring after admission to hospital.

• The likelihood of such events occurring is quantified as proportions obtainable from the records over a period of time.

• These proportions are thus EMPIRICAL PROBABILITIES.

The laws of large numbers

• You can make a sample resemble the population as closely as you like by making it sufficiently large.

• So small samples from the same population can show considerable variation; whereas very large samples show little variation.

Example

• I draw five samples of size ten from a normal population with mean zero and standard deviation 1. (The STANDARD normal distribution.)

• I then draw five samples of size one million from the same population.
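
The same comparison can be sketched in Python. To keep the run short, the large samples here contain 100,000 values rather than one million; that size, the seed and the function name `sample_mean` are all assumptions made for this illustration.

```python
import random

def sample_mean(sample_size, rng):
    """Mean of one random sample from the standard normal distribution."""
    return sum(rng.gauss(0, 1) for _ in range(sample_size)) / sample_size

rng = random.Random(42)   # fixed seed so that the run can be repeated

small_means = [sample_mean(10, rng) for _ in range(5)]
large_means = [sample_mean(100_000, rng) for _ in range(5)]

print("n = 10:     ", [round(m, 3) for m in small_means])
print("n = 100,000:", [round(m, 3) for m in large_means])
```

The five small-sample means scatter widely around zero, while the five large-sample means are almost indistinguishable from the population mean.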

Size ten versus size one million

Large samples and populations

• With the lower histograms, you are looking at the population, rather than at samples.

• … relative frequencies become PROBABILITIES.

• Visualise the probability of a value within a specified interval as the area under the curve of the theoretical distribution between the limits of the interval.

Relative frequency becomes probability

Probability distribution

• When we take a measurement such as a person’s height, we assume we have performed an experiment of chance.

• We have sampled from a theoretical population.

• Since areas under the curve represent probabilities, theoretical distributions are known as PROBABILITY DISTRIBUTIONS.

Random variable or variate

• A RANDOM VARIABLE or VARIATE is a variable that takes values in an unpredictable way.

• The values of a random variable make up a theoretical distribution or population, i.e., a probability distribution.

• Let X be a value selected at random from a normal population with mean 69 and standard deviation 2.58. The variable X is a normal random variable or normal VARIATE.

Cumulative probability

• The cumulative probability of a value from a distribution is the probability of a value less than or equal to that value.

• In the height distribution, the cumulative probability of 75 inches is .99; the cumulative probability of 70 inches is .65.

Cumulative probability of 75”

Cumulative probability of 70”

Probability of a height in the range from 70 to 75 inches

• Just subtract the cumulative probability of 70 from the cumulative probability of 75.
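
The subtraction can be checked with `NormalDist` from Python's standard `statistics` module. The sketch assumes the height population described earlier (mean 69 inches, standard deviation 2.58 inches); the variable names are chosen for this illustration.

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=2.58)   # theoretical height population (inches)

p_75 = heights.cdf(75)        # cumulative probability of 75 inches
p_70 = heights.cdf(70)        # cumulative probability of 70 inches
p_between = p_75 - p_70       # probability of a height between 70 and 75 inches

print(round(p_75, 2), round(p_70, 2), round(p_between, 2))
```

On these assumptions, roughly a third of heights fall between 70 and 75 inches.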

Percentiles

• A PERCENTILE is the value below which a specified proportion of the distribution lies.

• The 90th percentile is the value below which 90% of values lie.

• The 10th percentile is the value below which 10% of values lie.

• The 50th percentile (the MEDIAN) is the value below which 50% of values lie.
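
Finding a percentile is the reverse of finding a cumulative probability: we supply the proportion and ask for the value. A sketch using the inverse cumulative function of `statistics.NormalDist`, again assuming the height population with mean 69 inches and standard deviation 2.58 inches:

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=2.58)   # theoretical height population (inches)

p10 = heights.inv_cdf(0.10)   # 10th percentile
p50 = heights.inv_cdf(0.50)   # 50th percentile (the median)
p90 = heights.inv_cdf(0.90)   # 90th percentile

print(round(p10, 2), round(p50, 2), round(p90, 2))
```

Because the normal distribution is symmetrical, the median coincides with the mean, and the 10th and 90th percentiles lie equally far on either side of it.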

The 30th and 70th percentiles

• The green areas are the cumulative probabilities of the 30th and 70th percentile values.

The median is the 50th percentile

• The cumulative probability of the median or middle value is .50.

95% of the distribution

• 95% of ANY distribution lies between the 2.5th percentile and the 97.5th percentile.

• BELOW the 2.5th percentile lie .025 (2.5%) of the scores.

• ABOVE the 97.5th percentile lie .025 (2.5%) of the scores.

• Outside those limits lie .025+.025 = .05 (5%) of the scores.

95% of ANY continuous distribution lies between the 2.5th and 97.5th percentiles

Normal distribution

• A NORMAL DISTRIBUTION is symmetrical and bell-shaped.

• If a variable is normally distributed, 95% of values lie within 1.96 standard deviations (2 approx.) on EITHER side of the mean.

The 95th percentile

• NINETY-FIVE per cent of values lie BELOW the value that is 1.64 standard deviations above the mean, i.e., mean + 1.64×SD.

• (Because of the symmetry of the normal distribution, we can also say that 95% of values lie ABOVE the value that is 1.64 standard deviations BELOW the mean, i.e., mean – 1.64×SD.)

• These statements apply only to the normal distribution.
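
The two multipliers quoted for the normal distribution (1.96 and 1.64) can be recovered from the standard normal distribution, as in this sketch using `statistics.NormalDist`; the variable names are chosen for this illustration.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal distribution: mean 0, SD 1

z_975 = z.inv_cdf(0.975)                 # cuts off the upper 2.5%
z_95 = z.inv_cdf(0.95)                   # cuts off the upper 5%
central = z.cdf(1.96) - z.cdf(-1.96)     # area within 1.96 SDs of the mean

print(round(z_975, 3), round(z_95, 3), round(central, 3))
```

To three decimal places the 95th percentile is 1.645 standard deviations above the mean, which is usually quoted as 1.64 or 1.65.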

The 95th percentile of a normal distribution

The standard normal variable z

• Let X be a normal variable with mean μ and SD σ.

• Let z be defined by the formula z = (X – μ)/σ.

• z is also normally distributed, and is known as the STANDARD NORMAL VARIABLE.

Mean and standard deviation of the standard normal distribution

• We have seen that the effect of standardising scores is to centre the distribution on zero and produce a variance and standard deviation of 1.

• Thus the standard normal distribution has a mean of zero and an SD of 1.

Standard normal curve

Any normal distribution can be transformed to the standard normal distribution by subtracting the mean from each value and dividing the difference by the standard deviation.

The standard normal distribution

Questions about probability

• Questions about the probabilities of ranges of values of a normally distributed random variable can always be rephrased in terms of the standard normal distribution.

• Just convert the raw values to z scores by subtracting the mean and dividing by the standard deviation.

A question about IQ

• The IQ measure has an approximately normal distribution, with a mean of 100 and a standard deviation of 15.

• If 1000 people are drawn at random from the population, how many of them can we expect to have IQs greater than 130?

Solution

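• Transform 130 to z: z = (130 – 100)/15 = 2.

• A proportion of roughly .025 of values lie above z = 2 (the exact figure is nearer .023), so we can expect about 25 in a thousand people (more exactly, about 23) to have IQs greater than 130.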
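
The solution can be checked with `statistics.NormalDist`. This is a sketch in which the variable names are chosen for the illustration. Note that the exact upper-tail area beyond z = 2 is a little smaller than the rounded figure of .025, which strictly belongs to z = 1.96.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)     # the IQ population

z = (130 - iq.mean) / iq.stdev        # standard score for an IQ of 130
p_above = 1 - iq.cdf(130)             # proportion of IQs greater than 130
expected_in_1000 = 1000 * p_above     # expected number among 1000 people

print(z, round(p_above, 4), round(expected_in_1000, 1))
```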

Taking samples

• Suppose I take 16 people’s IQs and calculate the mean. It might be 95.1. I take another 16 people and find that their mean is 102.6.

• I draw a total of 4000 samples, calculating the value of the mean each time.

• The means will vary considerably, but not so much as the original distribution of IQs.
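
The sampling exercise can be imitated by simulation. The sketch below assumes that IQ is exactly normal with mean 100 and standard deviation 15; the seed is an arbitrary choice that makes the run repeatable.

```python
import random
from statistics import mean, stdev

rng = random.Random(2015)   # fixed seed so that the run can be repeated

def mean_of_sample(n, rng):
    """Mean IQ of one random sample of n people."""
    return sum(rng.gauss(100, 15) for _ in range(n)) / n

# 4000 samples of 16 people each, recording the mean every time
means = [mean_of_sample(16, rng) for _ in range(4000)]

print(round(mean(means), 2))    # centre of the 4000 means
print(round(stdev(means), 2))   # spread of the 4000 means
```

The means centre on 100, but their standard deviation is only about a quarter of the 15 points that separate individual IQs.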

The mean is a random variable

• A random variable X is one whose values are not predictable. One can only assign probabilities to ranges of its values.

• A statistic such as the mean M, since its value depends upon the values of X selected for the sample, is also a random variable or variate.

• The variate M has a distribution of its own.

Sampling distribution

• The probability distribution of a STATISTIC (such as the mean or the variance) is known as its SAMPLING DISTRIBUTION.

• If X is normally distributed, then so is M.

• If we can specify the sampling distribution of M by giving a value to its SD, we can assign probabilities to ranges of values for M.

The IQ distribution

Drawing to scale

• If I request a histogram of the sampling distribution of the mean, it will look similar to the histogram of IQ.

• But if I ask for BACK-TO-BACK HISTOGRAMS, we can compare the two distributions drawn to the same scale.

• In the following figure, the distribution on the right is the sampling distribution of the mean.

Back-to-back histograms

Shape of the sampling distribution

• It’s narrower than the original distribution.

• The standard deviation has been much reduced.

• The areas of both distributions are the same (unity, 100%, or a probability of one).

• But values of the mean are particularly thick on the ground in the region of the population mean value of the IQ, that is 100.

Sampling distribution of the mean

Standard error of the mean

• The STANDARD ERROR of a statistic is the standard deviation of its SAMPLING or PROBABILITY distribution.

• It is called the standard “error” because, if a sample value were to be used as an estimate of the corresponding parameter (the population mean), the estimate would be, to at least some degree, wide of the mark.

Standard error of the mean

• If we draw samples of size n from a normal distribution with mean μ and standard deviation σ, the standard error of the mean σM is given by $\sigma_M = \sigma / \sqrt{n}$.

Standard error of the mean

For the IQ example (σ = 15, n = 16), the standard error of the mean is

$$\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{16}} = 3.75$$

Sample size

• As the sample size n increases, the denominator of the formula increases and the standard error of the mean is reduced.

• The distribution becomes taller and narrower.

• The effect of increasing the size of the sample is to reduce the dispersion or variance of the sampling distribution of the mean.
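The effect of sample size can be checked with a few lines of Python. This is only a sketch: the function name `standard_error` is an illustrative choice, and the values σ = 15 and n = 16, 64 are taken from the IQ example used in these slides.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


SIGMA = 15.0  # SD of the IQ population used in the slides

# Quadrupling the sample size halves the standard error.
for n in (16, 64):
    print(n, standard_error(SIGMA, n))
```

With n = 16 the standard error is 3.75; with n = 64 it falls to 1.875, which is why the second sampling distribution is taller and narrower.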


Effect of increasing the sample size n

[Figure: The IQ distribution, together with the sampling distributions of the mean for n = 16 and n = 64, all centred on μ.]


Referring to z

• A question about a range of values of ANY normally distributed variable can always be translated into a question about a range of values of the standard normal variable z.

• Just subtract the mean and divide by the standard deviation.

• BUT if your question is about a range of values for the MEAN, you must divide by the STANDARD ERROR, not the original population SD.


Question

• If I select 9 IQs at random and take their mean M, what is the probability that M is at least 110?


Convert values to z

• This question is about a mean, so we must refer to the sampling distribution of the mean.

• The standard error of the mean is 15 divided by the square root of 9, that is, 5.

• If M = 110, z = (110 – 100)/5 = 2.

• So we want the probability of a value of z of more than 2.
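The conversion above can be sketched in Python, assuming the standard library's `statistics.NormalDist` (Python 3.8 or later) as a stand-in for a printed table of the standard normal distribution. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N = 100.0, 15.0, 9    # IQ population and sample size from the question

sem = SIGMA / sqrt(N)            # standard error of the mean: 15 / 3 = 5
z = (110 - MU) / sem             # (110 - 100) / 5 = 2

# Upper-tail area of the standard normal distribution beyond z.
p_at_least_110 = 1 - NormalDist().cdf(z)

print(sem, z, round(p_at_least_110, 4))
```

The exact upper-tail area beyond z = 2 is about .023, a little less than the .025 obtained from the rule of thumb that 95% of values lie between –2 and +2.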


Referring to the standard normal distribution


Answer


Important!

• If your question is about MEANS, divide by the STANDARD ERROR OF THE MEAN σM, not the standard deviation of the original population.


Question

• If I select a sample of size n = 16 from the IQ population, what is the probability that the mean lies between 92.5 and 100?


Convert values to z

• The question is about a mean, so we must use the standard error of the mean to find the z values.

• The SEM is 15 divided by the square root of 16 (that is, 4), which gives 3.75.

• So z = (92.5 – 100)/3.75 = –2.

• For 100, z = 0.


Referring to the standard normal distribution

• About 95% of values lie between –2 and +2.

• So the green area (between z = –2 and z = 0) is half of that, 47.5%.

• The probability is approximately .475.
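As a check on this answer, the same area can be found directly from the sampling distribution of the mean. The sketch below assumes Python's `statistics.NormalDist`; the name `sampling_dist` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N = 100.0, 15.0, 16

sem = SIGMA / sqrt(N)                        # 15 / 4 = 3.75
sampling_dist = NormalDist(mu=MU, sigma=sem) # sampling distribution of the mean

# Area between 92.5 and 100 is the difference of two cumulative probabilities.
p_between = sampling_dist.cdf(100) - sampling_dist.cdf(92.5)

print(sem, round(p_between, 4))
```

The exact area is about .477, which agrees with the approximate .475 found from the 95% rule.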


Two populations

• Suspend your disbelief and suppose that two barrels each contain millions of tickets, on each of which is the value of an IQ. So each barrel contains a normal distribution with mean 100 and SD 15.

• I draw a sample of size 16 from each barrel and calculate the means M1 and M2.

• I also calculate the difference M1 – M2 and put it in a third barrel.

• The process is repeated millions of times.

• The third barrel now contains the sampling distribution of the DIFFERENCE (between means).

• The sampling distribution of the difference is also normal.


Barrels


Another random variable

• We have seen that the sample mean M is a random variable, whose probability distribution is the sampling distribution of the mean.

• The difference between means M1 – M2 is also a random variable.

• Its probability distribution is known as the SAMPLING DISTRIBUTION OF THE DIFFERENCE (between means).


Sampling distribution of the difference (between means)


Variance of the difference

• We have seen that the sample means M1 and M2 are random variables.

• They are INDEPENDENT random variables – separate barrels.

• The variance of the sum OR DIFFERENCE BETWEEN independent random variables is the sum of their separate variances. (Remember that a variance cannot be negative.)
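The variance rule can be verified exactly on a small case. The sketch below swaps the sample means for two fair dice (an assumption made purely for illustration, since dice can be enumerated completely) and uses exact fractions, so no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def variance(values):
    """Population variance of equally likely values, as an exact fraction."""
    values = list(values)
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n


faces = range(1, 7)

var_one_die = variance(faces)
# All 36 equally likely pairs from two independent dice.
var_sum = variance(x + y for x, y in product(faces, faces))
var_diff = variance(x - y for x, y in product(faces, faces))

print(var_one_die, var_sum, var_diff)
```

The sum and the difference have the same variance, 35/6, which is twice the variance of a single die (35/12). Subtracting the variables does not subtract the variances.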


Sampling variance of the difference

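• Sampling variance of means from the first barrel: $\sigma_{M_1}^2 = \sigma_1^2 / n_1$

• From the second: $\sigma_{M_2}^2 = \sigma_2^2 / n_2$

• Sampling variance of M1 – M2: $\sigma_{M_1 - M_2}^2 = \sigma_1^2 / n_1 + \sigma_2^2 / n_2$

• Standard error of the difference between means: $\sigma_{M_1 - M_2} = \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$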
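The barrel experiment can be imitated with a short simulation. This is a sketch under stated assumptions: it uses 20,000 repetitions rather than millions, a fixed random seed chosen arbitrarily, and the helper name `sample_mean` is illustrative. It compares the standard deviation of the simulated differences with the value given by the variance-sum rule.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

MU, SIGMA, N = 100.0, 15.0, 16   # each barrel: IQ population, samples of 16
REPS = 20_000                    # far fewer than "millions", to keep the run short

# Standard error of the difference from the variance-sum rule.
se_diff = sqrt(SIGMA**2 / N + SIGMA**2 / N)

rng = random.Random(0)           # fixed seed so the run is repeatable


def sample_mean() -> float:
    """Draw N values from the IQ population and return their mean."""
    return fmean(rng.gauss(MU, SIGMA) for _ in range(N))


# Each repetition draws one sample from each barrel and stores M1 - M2.
diffs = [sample_mean() - sample_mean() for _ in range(REPS)]

print("theoretical SE:", round(se_diff, 4))
print("simulated mean and SD:", fmean(diffs), pstdev(diffs))
```

The theoretical standard error is 5.3033. The simulated differences should have a mean close to zero and a standard deviation close to that figure, though the exact simulated values depend on the seed.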


Standard error of the difference


In our example, $\sigma_{M_1 - M_2} = \sqrt{\dfrac{15^2}{16} + \dfrac{15^2}{16}} = \sqrt{28.125} \approx 5.3033$.


Question

• I draw a sample of size 16 from each of two identical IQ distributions, with mean 100 and SD 15.

• What is the probability that the difference (M1 – M2) is at least +10.61?

• What is the probability of a difference in EITHER direction?


Answer

• The question is about a difference between means, so we must refer to the sampling distribution of the difference.

• We have found that the standard error of the difference is 5.3033.

• As usual, we convert the value to z:

• z = (10.61 – 0)/5.3033 = +2.

• So we want the probability of a value of z at least as great as +2.


Referring to the standard normal distribution

• We know that .025 (2.5%) of the distribution lies above z = 1.96 (2 approx).

• So the probability of a difference greater than +10.61 is approximately .025.

• The probability of a difference this large in EITHER direction is approximately .025 × 2 = .05.
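The one-tailed and two-tailed probabilities can be computed without rounding z to 1.96. The sketch below assumes Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N = 15.0, 16

# Standard error of the difference between two independent sample means.
se_diff = sqrt(SIGMA**2 / N + SIGMA**2 / N)

# Under identical populations, the expected difference is zero.
z = (10.61 - 0) / se_diff

p_one_tail = 1 - NormalDist().cdf(z)   # difference of at least +10.61
p_two_tail = 2 * p_one_tail            # difference this large in either direction

print(round(z, 3), round(p_one_tail, 3), round(p_two_tail, 3))
```

Because z here is very slightly above 2 rather than 1.96, the exact probabilities come out at about .023 and .045, a little below the rounded .025 and .05.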


Summary

• The three most important properties of a distribution are LEVEL, SPREAD and SHAPE.

• Several measures of these properties were discussed.

• The notion of population was introduced and the notion of probability introduced in that context.

• The concept of a sampling distribution was introduced.

• The sampling distributions of the mean and of the difference between means were discussed.

• Questions about the probabilities of ranges of values for the mean and difference between means can be answered with reference to the standard normal distribution.


Appendix

PROBABILITY


An experiment of chance

• An EXPERIMENT OF CHANCE is a procedure with an uncertain outcome, such as tossing a coin or rolling a die.

• The classical notion of PROBABILITY arises in the context of an experiment of chance.


The sample space

• Consider an experiment of chance in which a coin is tossed and a die is rolled.

• There are twelve possible outcomes, which can be set out in an array called a SAMPLE SPACE (S).

• Each outcome is known as an ELEMENTARY EVENT.

• The number of elementary events, n(S), is 12.
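The sample space for this experiment is small enough to list in full. As a sketch, with "H" and "T" as assumed labels for the two sides of the coin:

```python
from itertools import product

coin = ("H", "T")
die = (1, 2, 3, 4, 5, 6)

# Every pairing of a coin result with a die result is one elementary event.
sample_space = list(product(coin, die))

for outcome in sample_space:
    print(outcome)

print("n(S) =", len(sample_space))
```

The listing contains 2 × 6 = 12 elementary events, matching n(S) = 12.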


Drawing of the sample space


Drawing of an event space


The classical definition revisited

• Let E be “a one or a two on the die”.

• Then n(E) = 4.

• Following the classical definition of a probability, $\mathrm{Prob}(E) = \dfrac{n(E)}{n(S)} = \dfrac{4}{12} = \dfrac{1}{3}$.


Complementary events

• Two events are complementary if they are

1. Mutually exclusive;

2. Exhaustive.

• If E is “a one or a two on the die”, the event “not E”, which is denoted by Ē, is “any other number on the die”.

• Events E and Ē are complementary: they have no common outcome points and they exhaust the possibilities.


Probabilities of complementary events

• If E and Ē are complementary events, their probabilities, p and q, respectively, sum to one.

• So p + q = 1; p = 1 – q; q = 1 – p.
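The probabilities of E and its complement can be counted directly from the coin-and-die sample space. This is a sketch using exact fractions; the names `S`, `E` and `not_E` simply mirror the notation of the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: one coin toss combined with one roll of a die.
S = list(product("HT", range(1, 7)))

# E is "a one or a two on the die"; not_E is every other outcome.
E = [(c, d) for c, d in S if d in (1, 2)]
not_E = [outcome for outcome in S if outcome not in E]

p = Fraction(len(E), len(S))       # classical probability n(E) / n(S)
q = Fraction(len(not_E), len(S))

print(len(E), p, q, p + q)
```

Counting gives n(E) = 4, p = 1/3 and q = 2/3, so that p + q = 1.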


Mutually exclusive events

• Two events, A and B, are said to be MUTUALLY EXCLUSIVE if the probability of their joint occurrence is zero.

• In terms of S, the event spaces of A and B have no elementary outcome points in common.

• For example, if A is “a six on the die” and B is “a one or a two on the die”, A and B are mutually exclusive.


Two mutually exclusive events


The exclusive OR rule

• If A and B are two mutually exclusive events, the probability of either occurring, that is, Prob(A or B), is the sum of their separate probabilities.


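In our example, with A “a six on the die” and B “a one or a two on the die”, $\mathrm{Prob}(A \text{ or } B) = \mathrm{Prob}(A) + \mathrm{Prob}(B) = \dfrac{1}{6} + \dfrac{1}{3} = \dfrac{1}{2}$.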
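The OR rule for the mutually exclusive events A (“a six on the die”) and B (“a one or a two on the die”) can be confirmed by counting outcome points in the coin-and-die sample space. The helper `prob` and the predicate names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", range(1, 7)))   # coin-and-die sample space


def prob(event) -> Fraction:
    """Classical probability: outcomes satisfying the event over n(S)."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))


def is_six(outcome) -> bool:          # event A
    return outcome[1] == 6


def is_one_or_two(outcome) -> bool:   # event B
    return outcome[1] in (1, 2)


p_a = prob(is_six)
p_b = prob(is_one_or_two)
p_both = prob(lambda o: is_six(o) and is_one_or_two(o))
p_either = prob(lambda o: is_six(o) or is_one_or_two(o))

print(p_a, p_b, p_both, p_either)
```

The joint probability is zero, confirming that A and B are mutually exclusive, and Prob(A or B) = 1/6 + 1/3 = 1/2.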


Independent events

• Two events A and B are INDEPENDENT if the occurrence of either has no effect upon the probability of the occurrence of the other.

• For example, if A is “a head” and B is “a six”, A and B are independent.


AND rule for independent events

• If events A and B are independent, the probability of their joint occurrence Prob(A and B) is the product of their separate probabilities.

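• In our example, with A “a head” and B “a six”, $\mathrm{Prob}(A \text{ and } B) = \mathrm{Prob}(A) \times \mathrm{Prob}(B) = \dfrac{1}{2} \times \dfrac{1}{6} = \dfrac{1}{12}$.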
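The AND rule for the independent events A (“a head”) and B (“a six”) can likewise be checked by counting in the coin-and-die sample space. As before, the helper `prob` is an illustrative assumption, not part of the original slides.

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", range(1, 7)))   # coin-and-die sample space


def prob(event) -> Fraction:
    """Classical probability: outcomes satisfying the event over n(S)."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))


p_head = prob(lambda o: o[0] == "H")                  # event A
p_six = prob(lambda o: o[1] == 6)                     # event B
p_joint = prob(lambda o: o[0] == "H" and o[1] == 6)   # A and B together

print(p_head, p_six, p_joint, p_head * p_six)
```

Only one of the twelve outcome points is “a head and a six”, so the joint probability is 1/12, equal to the product 1/2 × 1/6.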

