8/19/2019 PS - Handbook.pdf
1/108
CONTENTS
Lecture 1: Exploratory Data Analysis And Descriptive Statistics 11.1 Studying one variable at a time 11.2 Studying two variables at a time 61.3 Studying more than two variables at a time 6
1.4 Effect of Transformation 61.5 Percentiles 61.6 Exercises 8
Lecture 2: Probability 112.1 Probability 132.2 Operations with Probability 142.3 Additive Rules of Probability 152.4 Complement of Event A 152.5 Conditional Probability 162.6 Independent Events 17
2.7 Intersection of events A and B 172.8 Bayes’ Rule 192.9 Exercises 21
Lecture 3: Discrete Random variables 253.1 Introduction 253.2 Bernoulli Distribution 273.3 Binomial Distribution 273.4 Poisson Distribution 303.5 Exercises 32
Lecture 4: Continuous Random variables 334.1 Introduction 334.2 Exponential Distribution 364.3 Exercises 38
Lecture 5: Normal Distribution 395.1 Introduction 395.2 Normal as an approximating distribution 405.3 Exercises 41
Lecture 6:
Random Sampling and sampling distributions 436.1 Introduction 436.2 Exercises 46
Lecture 7: Test of Hypotheses 487.1 Introduction to Hypothesis Testing 487.2 Procedure 487.3 Confidence intervals for hypothesis testing 507.4 Proportions 517.5 Sample size 537.6 Hypothesis Test for Proportions 53
7.7 Exercises 54
8/19/2019 PS - Handbook.pdf
2/108
Lecture 8: Type I and II Errors 568.1 Introduction 568.2 Type I and II Errors 568.3 Exercises 56
Lecture 9: Further Hypothesis Tests 58
9.1 Introduction 589.2 Comparison of two population means 589.3 The difference between two proportions 619.4 Paired samples 639.5 Exercises 65
Lecture 10: Inference for Variance 6710.1 Introduction 67
10.2 Confidence Interval for 2 6710.3 Confidence Interval for the Ratio of Two Variances 6810.4 Significance Test of Hypotheses about a Variance 68
10.5 Significance Test of Hypotheses about Two Variances 6910.6 Exercises 70
Lecture 11: Chi-squared Test 7211.1 Goodness-of-fit Test 7211.2 Test for Homogeneity 7411.3 Continuity Correction 7711.4 Exercises 78
Lecture 12: Regression Analysis 8012.1 Introduction 8012.2 Correlation 8212.3 Regression 8312.4 Exercises 87
Lab Assessment one 88
Lab Assessment two 90
Lab Assessment three 93
Lab Assessment four 94
Lab Assessment five 96
Lab Assessment six 98
Lab Assessment seven 99Lab Assessment eight 101
Lab Assessment nine 104
Lab Assessment ten 106
8/19/2019 PS - Handbook.pdf
3/108
LECTURE 1
EXPLORATORY DATA ANALYSIS AND DESCRIPTIVE STATISTICS
Statistics can be divided into two major areas. Descriptive statistics comprises the statistical methods
dealing with the collection, tabulation and summarization of data, so as to present meaningful
information. Statistical inference, on the other hand, consists of the methods involved with the
analysis and interpretation of data that will enable the statistician to develop meaningful inferences
about the data. Both sub fields are interrelated; while descriptive statistics organizes the collected data
in a systematic manner, statistical inference analyses the data and enables one to produce significant
inferences about it.
A population is the totality of the observations with which a statistician is concerned. The
observations could refer to anything of interest, such as persons, animals or objects; it need not belimited to people. The size of the population is defined to be the number of observations in the
population. In collecting data concerning a population, the statistician is often interested in arriving at
conclusions involving the entirety of the population.
A sample is a subset of a population. A random sample of n observations is a sample with n
observations, selected in such a way that every such sample of the population has the same probability
of being selected. These samples are considered to be unbiased.
Often, a sample of the population is taken, data collected from it, and inferences about the population
are made based on the analysis of the sample data.
1.1 Studying one variable at a time
A stem-and-leaf plot is a graphical display showing the frequency of values in specifiedintervals. It is useful for small amounts of data as it retains the actual numerical values.
Example:
Stem Leaf
1 3456662 0000112233453 22334444484 1112345675 223367896 2697 458 9
The stem is an integer and the leaf is a decimal value.
A histogram is a graphical way to display the shape of the distribution.
8/19/2019 PS - Handbook.pdf
4/108
A box-plot is a graphical summary of the distribution of a variable. The minimum, the 1stquartile, the median, the 3rd quartile and the maximum are used to construct a box-plot.
This is called the five-number summary.
1. The ends of the box are at the quartiles.
2. Mark the median with a line.
3. Observations more than 1.5 * IQR outside the box are considered to be outliers and
are marked with stars.
4. Whiskers extend from the ends of the box to the smallest and largest observations
that are not outliers.
Mean
The statistical mean of a set of observations is the average of the measurements in a set of data. The
population mean and sample mean are defined as follows:
Class mid points
F r e q u e n c y
32302826242220
40
30
20
10
0
Mean 25.74
StDev 2.389
N 250
Histogram of BMI
34
32
30
28
26
24
22
20
Box plot of BMI
8/19/2019 PS - Handbook.pdf
5/108
Given the set of data values , , . . . ., from a finite population of size , the population mean is calculated as
1
=
Given the set of data values , , . . . ., from a sample of size , the sample̅ 1
=
The sample mean is often used as an estimator of the mean of the population from whence the sample
was taken. In fact, the sample mean is statistically proven to be a most effective estimator for the
population mean.
A tr immed mean of a set of values is a mean with a specified percentage of the largest and smallestvalues excluded from the calculation.
Median
The median of a set of observations is that value that, when the observations are arranged in an
ascending or descending order, satisfies the following condition:
1. If the number of observations is odd, the median is the middle value.2. If the number of observations is even, the median is the average of the two middle values.
The median is the same as the 50th percentile of a set of data.
Mode
The mode of a set of observations is the specific value that occurs with the greatest frequency. There
may be more than one mode in a set of observations, if there are several values that all occur with the
greatest frequency. A mode may also not exist; this is true if all the observations occur with the same
frequency.
Another measure of central location that is occasionally used is the midrange. It is computed as the
average of the smallest and largest values in a set of data.
Example 1.1: Given the following set of data
1.2 1.5 2.6 3.8 2.4 1.9 3.5 2.5 2.4 3.0
It can be sorted in ascending order:
1.2 1.5 1.9 2.4 2.4 2.5 2.6 3.0 3.5 3.8
The mean, median and mode are computed as follows:
8/19/2019 PS - Handbook.pdf
6/108
x =10
0.34.25.25.39.14.28.36.25.12.1
= 2.48
x~ = (2.4 + 2.5) / 2
= 2.45
The mode is 2.4, since it is the only value that occurs twice.
The midrange is (1.2 + 3.8) / 2 = 2.5.
Note that the mean, median and mode of this set of data are very close to each other. This suggests
that the data is very symmetrically distributed.
Range
The range of a set of observations is the absolute value of the difference between the largest and
smallest values in the set. It measures the size of the smallest contiguous interval of real numbers that
encompasses all the data values.
Example 1.2: Given the following sorted data:
1.2 1.5 1.9 2.4 2.4 2.5 2.6 3.0 3.5 3.8
The range of this set of data is 3.8 - 1.2 = 2.6.
Variance and Standard Deviation
The variance of a set of data is a cumulative measure of the squares of the difference of all the data
values from the mean.
The population and sample variance are calculated as follows:
Given the set of data values , , . . . ., from a finite population of size , the population varianceis calculated as,
1 ( )
=
Given the set of data values , , . . . ., from a sample of size , the sample variance iscalculated as,
1 1 ( ̅)
=
8/19/2019 PS - Handbook.pdf
7/108
Note that the population variance is simply the arithmetic mean of the squares of the difference
between each data value in the population and the mean. On the other hand, the formula for the sample
variance is similar to the formula for the population variance, except that the denominator in the
fraction is ( 1 ) instead of . Using the above formula, the sample variance is statistically provento be a most effective estimator for the variance of the population to which the sample belongs.
The standard deviation of a set of data is the positive square root of the variance.
Example 1.3: Given the following sorted data:
1.2 1.5 1.9 2.4 2.4 2.5 2.6 3.0 3.5 3.8
x = 2.48 as computed earlier
2 s =
110
1
((1.2 - 2.48) 2 + (1.5 - 2.48) 2 + (1.9 - 2.48) 2 + (2.4 - 2.48)2
+ (2.4 - 2.48) 2 + (2.5 - 2.48)2 + (2.6 - 2.48) 2 + (3.0 - 2.48) 2
+ (3.5 - 2.48)2 + (3.8 - 2.48)2)
= (1 / 9) × (1.6384 + 0.9604 + 0.3364 + 0.0064 + 0.0064 + 0.0004 + 0.0144 + 0.2704 + 1.0404
+ 1.7424)
= 0.6684
s = (0.6684) 1/2 = 0.8176
The sample variance can also be calculated as follows:
2
11
22
)1(
1 n
i
i
n
i
i x xnnn
s
Example 1.4: Given the above data, we can calculate s using the above formula:
n
i
i x1
2 = 2222222222 8.35.30.36.25.24.24.29.15.12.1
= 1.44 + 2.25 + 3.61 + 5.76 + 5.76 + 6.25 + 6.76 + 9.00 + 12.25
+ 14.44= 67.52
2 s =910
1
× (10 × 67.52- 28.24 )
= 0.6684
1.2 Studying two variables at a time
A two-way frequency table gives the number of cases within each combination ofcategories of two qualitative variables.
A Scatter plot is a two-dimensional graphical display of two quantitative variables.
8/19/2019 PS - Handbook.pdf
8/108
1.3 Studying more than two variable at a time
A multiway frequency table or multidimensional contingency table displays the number
of cases within each combination of categories of several qualitative variables.
1.4 Effect of Transformation
A transformation of a variable is a mathematical manipulation of each value of the variable. When
we make a transformation, we transform the original scale of measurement for the variable to a new
scale.
Many statistical techniques require that the data is approximately normally distributed so we often
apply a transformation to the data. If the data is skewed to the right, we can try the natural logarithms
or the square root. If the data is skewed to the left, we can try a power transformation greater than one.
All these transformations are non-linear.
8/19/2019 PS - Handbook.pdf
9/108
1.5 Percentiles
Percentiles are values in a given set of observations that divide the data into 100 equal parts. These
values can be denoted by , , . . . . . , where1 % of the data falls below (is less than or equal to) P 12 % of the data falls below P2
:
:
99 % of the data falls below P99
Percentiles can be calculated using a sorted list of observations or the cumulative frequency
distribution table corresponding to the observations. In the latter method, it is assumed that the values
in a class interval are uniformly distributed within it; extrapolation is then used to calculate the
percentiles. As this assumption is often untrue, percentile values can differ depending on whether raw
data or frequency distributions were used in the computation. Therefore, percentiles are often treated
as estimates for the value below which certain percentages of the observations fall.
Example 1.1: Given the following sorted list of observations:
0.7 0.8 0.9 1.1 1.2 1.4 1.9 2.2 2.2 2.3
2.5 3.1 3.2 3.3 3.4 3.8 3.9 4.0 4.1 4.2
4.3 4.6 4.7 5.0 5.2 5.5 5.6 5.8 5.9 6.1
6.4 6.6 6.8 7.0 7.7 8.2 8.9 9.2 9.5 9.9
P75 = 6.1, since 40 x 75 % = 30 and 6.1 is the 30th ranked value.
P45 = 4.0, since 40 x 45 % = 18 and 4.0 is the 18th ranked value.
P62 = 5.2, since 40 x 62 % = 24.8 and 5.2 is the 25th ranked value.
This set of observations has the following cumulative frequency distribution:
Measurements Cumulative Frequency Relative Cumulative Frequency
0.0 - 1.0 3 0.075
1.0 - 2.0 7 0.175
2.0 - 3.0 11 0.275
3.0 - 4.0 18 0.450
4.0 - 5.0 24 0.600
5.0 - 6.0 29 0.725
6.0 - 7.0 34 0.8507.0 - 8.0 35 0.875
8.0 - 9.0 37 0.925
9.0 - 10.0 40 1.000
Totals 40 1.000
The percentiles can also be calculated from the cumulative frequency distribution table, using
extrapolation to arrive at estimates:
6.0 + 1.0 ∗(0.750.725)(0.850.725) 6.0 +
0.0250.125 6.2
8/19/2019 PS - Handbook.pdf
10/108
where 6.0 is the upper class limit of interval 5.0 - 6.0 with cumulative frequency 0.725, and 0.850 is
the cumulative frequency of the next interval, 6.0 - 7.0, with class width 1.0.
4.0 , since the interval 3.0 - 4.0 has a cumulative frequency of 0.45 5.0 + 1.0 ∗ (0.6200.600)(0.7250.600) 5.0 +
0.0200.125 5.16
where 5.0 is the upper class limit of interval 4.0 - 5.0 with cumulative frequency 0.600, and 0.725 is
the cumulative frequency of the next interval, 5.0 - 6.0, with class width 1.0.
The values of P75 and P62 differ between the two methods of calculation, while the values of P45 for both
methods are the same.
Deciles are values in a given set of observations that divide the data into 10 equal parts. These values
can be denoted by , , . . . . . , , where10 % of the data falls below D1
20 % of the data falls below D2
:
:
90 % of the data falls below D9
It is easy to see that
D1 = P10 D4 = P40 D7 = P70
D2 = P20 D5 = P50 D8 = P80
D3 = P30 D6 = P60 D9 = P90
Quartiles are values in a given set of observations that divide the data in 4 equal parts. These values
can be denoted by Q1, Q2 and Q3, where
25 % of the data falls below Q1
50 % of the data falls below Q2
75 % of the data falls below Q3
Again, it is obvious that , and .Deciles and quartiles are calculated in the same manner as percentiles.
1.6 Exercises
1. Construct a box-plot using the five-number summary, minimum, Q1, median, Q3, maximum,given as 48, 63, 70, 81 and 100 respectively.
2. Consider the following strength measurements.
66 117 132 111 107 85 89 79 91 97 138 103
111 86 78 96 93 101 102 110 95 96 88 122 115 92 137 91 84 96 97 100 105 104 137 80 104
8/19/2019 PS - Handbook.pdf
11/108
8/19/2019 PS - Handbook.pdf
12/108
In 1984-1985, 482,528 men and 496,949 women received bachelor’s degrees 143,390 men
and 142,861 women received master’s degrees, 21,700 men and 11,243 women received
doctorates, and 50,455 men and 24,608 women received first professional degrees.
(a) Arrange this information in one or more frequency tables.
(b) Discuss the relationship between sex and degree, separately for the two academic years.(c) Discuss the relationship between year and degree, separately for men and women.
1.7 Further Exercises (Probability and statistics: Walpole, Myers and Myers – 8th edition)Exercises - Page 52 - Question 1.13 & 1.14
8/19/2019 PS - Handbook.pdf
13/108
LECTURE 2
PROBABILITY
So far we have used tools of data analysis to learn about a collection of information. In formal statisticalanalysis, we go beyond the goals of data analysis. In general, statistical analysis (inference) involves
making probability statements about populations based on what we observe in our samples. The ideas
in probability that are needed for formal statistical inference are discussed in this lecture.
Statisticians use the word experiment to describe any process that generates a set of data.
An experiment is a process leading to a well-defined observation or outcome that generates a set of
data.
A simple example of a statistical experiment is the tossing of a coin. In this experiment there are only
two possible outcomes, heads and tails.We are particularly interested in the observations obtained by repeating the experiment several times
under the same conditions. In most cases the outcomes will depend on chance and, therefore, cannot
be predicted with certainty. When a coin is tossed repeatedly, we cannot be certain that a given toss
will result in head. However we know the entire set of possibilities for each toss.
The sample space is the set of all possible outcomes of the experiment and is denoted by S. Each of
the possible outcomes is called an element or a member of the sample space , or simply a sample point.
If the sample space has finite number of elements we can list them as follows.
The sample space of S, of possible outcomes when a coin is tossed, may be written as
S = {H, T}Where H and T corresponds to “heads” and “Tails”.
In some experiment it is helpful to list the elements of the sample space systematically by means of a
tree diagram.
Sample spaces with large or infinite number of sample points are best described by a statement or a
rule. For example, if the possible outcomes of an experiment are the set of cities in the world with a
population over 1 million, the sample space is written
S = { x | x is a city with a population over 1 million}
A finite sample space is a sample space that contains a finite number of outcomes.
The sample spaces that contain the outcomes of tossing a coin, drawings from a bag of mixed-colour
balls, and dealings from a regular 52-card deck are examples of discrete sample spaces.
A continuous sample space is a sample space that contains an interval of values.
Sample spaces that contain the outcomes of temperature readings, height measurements, and salaries
are examples of continuous sample spaces.
An event is a subset of the sample space and is denoted by E.
It may contain some, all or none of the outcomes comprising the sample space. If the event contains
only one sample point, it is a simple event . If the event contains two or more sample points, it is a
compound event . And if the event contains no sample points, it is known as a null space.
8/19/2019 PS - Handbook.pdf
14/108
For any given experiment we may interested in occurrence of certain events rather than in the outcome
of a specific element in the sample space. For example we may interest in the event A that the outcome
when a die is tossed is divisible by 3. This will occur if the outcome is an element of the subset A =
{3,6} of the sample space S = {1,2,3,4,5,6} of tossing a die experiment.
To each event we assign a collection of sample points, which constitute a subset of the sample space.That subset represents all of the elements for which the element is true.
The complement of an event A with respect to S is the subset of all the elements of s that are not in
A. we denote the complement of a by the symbol A .
For example consider the sample space
S = {A, B, C, D, E}. Let A = {B, D}. Then A ={A, C, E}.
The intersection of two events A and B, denoted by the symbol B A is the event containing allelements that are common to A and B.
In the tossing of a die we might let A be the event that an even number occurs and B the event that a
number greater than 3 shows. Then the subsets A = {2,4,6} and B ={4,5,6} are subsets of the same
sample space S={1,2,3,4,5,6}. Both A and B will occur on a given toss if the outcome is an element of
the subset {4,6}, which is the intersection of A and B. So B A = {4,6}
For certain statistical experiments it is usual to define two events that cannot occur simultaneously.
Such events are said to be mutually exclusive.
Two events A and B are mutually exclusive, or disjoint if B A , that is, if A and B have noelements in common.
The Union of the two events A and B, denoted by the symbol B A , is the event containing all theelements that belong to A or B or both.
Example2.1:
Consider tossing a die and observing the number that appears on top face. This has a well-defined
outcome that is top face can be 1,2 3, 4, 5 or 6. So this can be taken as an experiment .The sample space S of the experiment is S = {1,2 ,3,4,5,6}.
S consists of 6 definite outcomes. So S is a finite sample space.
Some events on this sample space can be identified as even number occurs, odd number occurs and
number greater than 3 occurs.
Let A be the event that an even number occurs, B that an odd number occurs, and C that a numbergreater than 3 occurs. Then
Throwing the die
* * * * * *
| | | | | |
1 2 3 4 5 6
8/19/2019 PS - Handbook.pdf
15/108
A = {2,4,6}
B = {1,3,5}
C = {4,5,6}
C B = {1,3,4,5,6}C B = {5}
B A = {}, So A and B are mutually exclusive.
2.1 Probability
The probability of an event is the chance or likelihood of the event occurring.
In this chapter we consider only those experiments for which the sample space contains a finite number
of elements. The probability of an outcome or sample point is a real number, between 0 and 1 that
provides a measure of likelihood that the outcome or sample point will actually occur. A sample point
that absolutely cannot occur has a probability of 0, while a sample point that will always occur has a
probability of 1; all other sample points are assigned a probability based on this relative measure.
A probability function assigns a unique number or probability to each outcome.
The probability of an event A is the summation of the probabilities of all the sample points in A and
is denoted by P(A).
If event A is a subset of the sample space S, then 0 P(A) 1. If A = , then P(A) = P( ) =
0; if A = S, then P(A) = P(S) = 1. Otherwise, the value of P(A) is between 0 and 1.
Example2.2: A coin is tossed twice. What is the probability that at least one head occurs?
Solution:
The sample space for this experiment is S = {HH, HT, TH, TT}. If the coin is balanced each of these
outcomes would be equally likely to occur. Therefore, we assign a probability of w to each sample
point. Then 4w = 1,or w =1/4.If A represents the event of at least one head occurring, then
A = {HH,HT,TH} and4
3
4
1
4
1
4
1)( A P
If an experiment can result in any one of N different equally likely outcomes, and if exactly n of theseoutcomes correspond to event A, then the probability of event A is
N
n A P )(
Example2.3:
A mixture of candies contains 6 mints, 4 toffees, and 3 chocolates. If a person makes a random selection
of one of these candies, find the probability of getting (a) mint, or (b)a toffee or a chocolate.
Solution:
Let M, T, and C represent the events that the person selects, respectively, a mint, toffee, or chocolatecandy. The total number of candies is 13, all of which are equally likely to be selected.
8/19/2019 PS - Handbook.pdf
16/108
(a) Since 6 of the 13 candies are mints, the probability of event M , selecting a mint at random, is
13
6)( M P
(b) Since 7 of the 13 candies are toffees or chocolates, it follows that 13
7 B A P
2.2 Operations with Probability
Often it is easier to calculate the probability of other events. This may well be true if the event in
question can be represented as the union or intersection of two other events or as the complement of
some event.
Just as events can be treated as sets, so can probabilities of an event (in a sense). The formulas used to
calculate the probability of unions, intersections and complements of events are similar to the ones
used for sets.
8/19/2019 PS - Handbook.pdf
17/108
2.3 Additive Rules of Probability
Given that event A and event B are subsets of the sample space S, the following rules Union of events
A and B (Additive Rule of Probability)
If A and B are any two events, then B A P B P A P B A P
where B A P is the probability that either events A or B occur and B A P is the probability that both events A and B occur.
If A and B are mutually exclusive, then
B P A P B A P
Since if events A and B are mutually exclusive (i.e. A and B cannot occur together), P(A B) =P( ) = 0
In general we can write,If a set of events A1, A2 , A3,....., An are mutually exclusive, then
nn A P A P A P A A A P 21121
2.4 Complement of event A
Since S A A , and the event A and its complement are mutually exclusive,
1 A P A P A A A A P S P
A P A P 1
Example 2.4:What is the probability of getting a total of 7 or 11 when a pair of dice are tossed?
Solution:
Let A be the event that 7 occurs and B the event that 11 comes up. Now, a total of 7 occurs for 6 of
the 36 sample points and a total of 11 occurs for only 2 of the sample points. Since all sample points
are equally likely, we have 6 A P and 81 B P .The events A and B are mutually exclusive, sincea total of 7 and 11cannot both occur on the same toss. Therefore,
9
2
18
1
6
1 B P A P B A P
This result could also have been obtained by counting the total number of points for the event B A P , namely 8, and writing
.9
2
36
81
N
n B A P
8/19/2019 PS - Handbook.pdf
18/108
8/19/2019 PS - Handbook.pdf
19/108
95.082.0
78.0|
A P
A D P A D P
2.6 Independent Events
Although conditional probability allows for an alteration of the probability of an event in the light of
additional material, it also helps to understand the concept of Independent Events. In the above
example A D P | differs from D P . This suggests that the occurrence of A influenced D. Howeverconsider the situation where we have events A and B and A P B A P | . In other words theoccurrence of B had no impact on the occurrence of A. Here the occurrence of A is independent of the
occurrence B.
Two events A and B are independent if and only if
B P A B P | and A P B A P | . Otherwise, A and B are dependent.
2.7 Intersection of events A and B (Multiplicative Rules of probability)
If in an experiment the events A and B can both occur, then
)|( A B P A P B A P
Thus the probability that both A and B occur is equal to the probability that A occurs multiplied
by the probability that B occurs, given that A occurs. Since the events B A P and B A P are equivalent, it follows from above rule that we can also write
)|( B A P B P A B P B A P
If two events A and B are independent then
B P A P B A P
If in an experiment, the events k A A A A ,,,, 321 can occur, then
121213121321 || k k k A A A P A P A A P A P A A P A P A A A A P If the events k A A A A ,,,, 321 are independent, then
k k A P A P A P A P A A A A P 321321
Example 2.5:A card is drawn from a regular deck of 52 cards. Event A is the event that the card drawn is a Jack.
Event B is the event that the card drawn is a diamond. Find the probability that the
a. card drawn is a diamond and a Jack. b. card drawn is a Jack given that the card is a diamond.c. card drawn is a diamond given that the card is a Jack.
Solution:
P(A) = 4 / 52 = 1 / 13 since there are 4 Jacks in the deck
P(B) = 13 / 52 = 1 / 4 since there are 13 diamonds in the deck
P(A B) = 1 / 52 since there is only 1 Jack of diamonds in the deck
8/19/2019 PS - Handbook.pdf
20/108
P(A|B) = P(A B) / P(B) = (1 / 52) / (13 / 52) = 1 / 13 = P(A)P(B|A) = P(A B) / P(A) = (1 / 52) / (4 / 52) = 1 / 4 = P(B)
As event A and B are two independent events we can get P (A B) using the formula also.That is P (A B) = P (A) P (B) = (1/13) (1/4) = 1/52
Example 2.6:A bag contains 6 blue balls and 4 red balls. Two balls will be drawn from the bag. Calculate the
probability of either one of the balls is blue.
Solution:
Let event A be the event that the first ball is blue, and let event B be the event that the second ball is
blue. Then, the event A' will be the event that the first ball is red, and event B' will be the event that
the second ball is red.
Since there are 6 blue balls out of a total of 10 balls, the probability of choosing a blue ball in the firstdrawing is 6/10. If a blue ball is taken out, then there will only be 5 blue balls and 9 total balls left;
the probability of choosing a blue ball will be 5/9. On the other hand, if the first ball is a red ball, then
there will be 6 blue balls and a total of 9 balls, in which case there would be a 6/9 (or 2/3) probability
of getting a blue ball.
Therefore
P(A) = 6/10
P(A') = 1 - 6/10 = 4/10
P(B|A) = 5/9
P(B|A') = 2/3
P(A B) = P(A) P(B|A) = (6/10)(5/9) = 1/3P(A' B) = P(A') P(B|A') = (4/10)(2/3) = 4/15
Since A and A' are mutually exclusive events, (B A) and (B A') are also mutually exclusiveevents. Thus, we can calculate P(B) as follows:
P(B)= P(B S) = P( B (A A') ) = P( (B A) (B A') )= P(B A) + P(B A')= 1/3 + 4/15
= 9/15
P (A B) = P(A) + P(B) - P(A B)
= 6/10 + 9/15 - 1/3
= 13/15
(A B) is the event that either one of the two balls drawn is blue. This being the case, P(A B)is the probability that the first ball is blue, plus the probability that the first ball is red and the second
ball is blue. Thus,
P (A B) = P(A) + P(A' B)= 6/10 + 4/15
8/19/2019 PS - Handbook.pdf
21/108
= 13/15
This is a different way to obtain the solution, but the result is the same nevertheless (Events A and B
are independent events.)
2.8 Bayes’ Rule
If the set of events A1 , A2 , ....., An constitutes a partition of the sample space S, and event B is a
subset of S, then
B = B S= B (A1 A2 ......... An )= (B A1 ) (B A2 ) ....... (B An )
As the events A1 , A2 , ......, An are mutually exclusive, then the events (B Ai ), where i {1, 2,...., n }, is also mutually exclusive. Assuming that none of the events A1 , A2 ,....., An is null, i.e. P(Ai
) 0 , i {1, 2,...., n }
P(B) = P(B A1 ) + P(B A2 ) + ....... + P(B An)= P(A1) P(B | A1 ) + P(A2) P(B | A2) + ....... + P(An ) P(B | An )
Theorem of total probability
If the events k A A A ,...,, 21 constitute a partition of the sample space S such that, 0i A P fori=1,2,…,k, then for any event B of S,
k
i
ii
k
i
i A B P A P A B P B P 11
|
From the definition of conditional probability,
)(
)|()(
)(
)()|(
B P
A B P A P
B P
B A P B A P iiii
Thus we have derived Bayes' Rule, which states the following:
Bayes' Rule :
If the set of events A1 , A2 ,....., An constitutes a partition of the sample space S, P(Ai ) 0 , i {1,
2,....., n }, and event B is a subset of S, P(B) 0,
)|()(........)|()()|()(
)|()()|(
2211 nn
ii
i A B P A P A B P A P A B P A P
A B P A P B A P
Example 2.7:A family had plans to go fishing on a Sunday afternoon, but their plans were dependent on the weather
at noon Sunday. If it was sunny, then there was a 90 % chance that they would go fishing. If it was
cloudy, then the probability that they would go fishing would drop to 50 %. And if it was raining, the
chances dropped to 15 %. The weather prediction, which we can assume to be accurate, called for a10 % chance of rain, a 25 % chance of clouds, and a 65 % chance of sunshine.
8/19/2019 PS - Handbook.pdf
22/108
Set event F as the event that the family goes fishing
S as the event that the weather is sunny at Sunday noon
C as the event that the weather is cloudy at Sunday noon
R as the event that the weather is rainy at Sunday noon
Assuming that the family ends up going fishing, find the probability of each type of weather occurring.
Solution:
P(S) = 0.65, P(C) = 0.25, P(R) = 0.10
Note that P(S) + P(C) + P(R) = 1, and of course S, C and R are mutually exclusive events.
P(F|S) = 0.90, P(F|C) = 0.50, P(F|R) = 0.15
P(F) = P(F|S) P(S) + P(F|C) P(C) + P(F|R) P(R)
= (0.90)(0.65) + (0.50)(0.25) + (0.15)(0.10)
= 0.585 + 0.125 + 0.015
= 0.725
Assuming that the family ends up going fishing, the probability of each type of weather occurring isP(S|F) = probability of sunny weather, given that the family went fishing.
=)(
)()|(
F P
S P S F P =
725.0
)65.0)(90.0( = 0.807
P(S|F) = probability of cloudy weather, given that the family went fishing.
=)(
)()|(
F P
C P C F P =
725.0
)25.0)(50.0( = 0.172
P(S|F) = probability of rainy weather, given that the family went fishing.
=)(
)()|(
F P
R P R F P =
725.0
)10.0)(15.0( = 0.021
Note that P(S|F) + P(C|F) + P(R|F) = 0.807 + 0.172 + 0.021 = 1.000
2.9 Exercises (Extracted from Schaum’s Series by Walpole & Mayer)
1. A pair of dice is tossed and the two numbers appearing on the top are recorded. Draw the samplespace and find the number of elements in each of the following events:
(a) A = { two numbers are equal }(b) B = { sum is 10 or more }(c) C = { 5 appears on first die }(d) D = { 5 appears on at least one die }
2. Determine the probability p of each event:(a) An even number appears in the toss of a fair die.
8/19/2019 PS - Handbook.pdf
23/108
(b) At least one tail appears in the toss of 3 fair die.(c) A white marble appears in the random drawing of 1 marble from a box containing 4 white
marbles, 3 red marbles and 5 blue marbles.
3. A box contains 15 billiard balls, which are numbered from 1 to 15. A ball is drawn at random andthe number recorded. Find the probability P that the number is;
(a) Even(b) Less than 5(c) Even and less than 5(d) Even or less than 5
4. A class contains 10 men and 20 women of which half the women and half the men have browneyes. Find the probability P that a person chosen at random is a man or has brown eyes.
8/19/2019 PS - Handbook.pdf
24/108
5. A sample space S consists of 4 elements, that is, S = 4321 ,,, aaaa . Under which of the followingfunctions P does become a probability space?
(a) 3.0,2.0,3.0,4.0 4321 a P a P a P a P
(b) 1.0,7.0,2.0,4.0 4321 a P a P a P a P
(c) 3.0,1.0,2.0,4.0 4321 a P a P a P a P (d)
1.0,5.0,0,4.0 4321 a P a P a P a P
6. Suppose A and B are events with ,6.0 A P ,3.0 B P and .2.0 B A P Find the probabilitythat:
(a) A does not occur.(b) B does not occur.(c) A or B occurs.(d) Neither A nor B occurs.
7. Three fair coins, a penny, a nickel, and a dime, are tossed. Find the probability p that they are allheads if:
(a) The penny is heads(b) At least one of the coins is heads,(c) The dime is tails
8. A billiard ball is drawn at random from a box containing 15 billiard balls numbered 1 to 15, andthe number n is recorded.
(a) Find the probability p that n exceeds 10.(b) If n is even, find the probability p that n exceeds 10.
9. In a certain college, 25 percent of the students failed mathematics, 15 percent failed chemistry, and10 percent failed both mathematics and chemistry. A student is selected random.
(a) If the student failed chemistry, what is the probability that he or she failed mathematics?(b) If the student failed mathematics, what is the probability that he or she failed chemistry?(c) What is the probability that the student failed mathematics or chemistry?(d) What is the probability that the student failed neither mathematics nor chemistry?
10. Find A B P | if :(a) A is a subset of B.(b) A and B are mutually exclusive (disjoint) Assume 0 A P .
11. If the probabilities are, respectively, 0.09,0.15,0.21, and 0.23 that a person purchasing a newautomobile will choose the color green, white, red, or blue, what is the probability that a given
buyer will purchase a new automobile that comes in one of those colors?
12. Suppose that a factory has a fuse box containing 20 fuses, of which 5 are defective. If 2 fuses areselected at random and removed from the box in succession without replacing the first, what is the
probability that both fuses are defective?
13. Items sampled on a production line may be classified as defective (D) or non-defective (N). Listelements in the sample space if sampling process terminates:
(a) After 4 items have been sampled.
8/19/2019 PS - Handbook.pdf
25/108
(b) After 3 defectives in a row have been observed or 4 items have been sampled.(c) When the first defective is observed.
Suppose, 5% of the products are defective.
(d) Find the probability of exactly 2 defective items if sampling processes (a) is adopted.(e) In the sampling process (c), what is the probability that the sampling process is terminated
before the 3rd item is sampled?
14. It is compulsory for the driver of a car to wear a seat belt while driving. The results of a surveyshow that not all drivers are wearing seat belts.
Age Driver wearing
seat belt
Driver not
wearing seat belt
< 40 375 52
>= 40 425 148
Use the data to estimate the probability that a randomly chosen driver(a) Is wearing a seat belt.(b) Is under 40 and wearing a seat belt.(c) Suppose the randomly chosen driver is under 40. What is the probability that the driver is
wearing a seat belt?
15. In a certain region of the country it is known from past experience that the probability of selectingan adult over 40 years of age with cancer is 0.05. If the probability of a doctor correctly diagnosing
a person with cancer as having the disease is 0.78 and the probability of incorrectly diagnosing a
person without cancer as having the disease is 0.06, what is the probability that a person isdiagnosed as having cancer?
16. In a certain assembly plant, three machines, B1, B2, and B3, make 30%,45%,and 25%,respectively,of the products. It is known from past experience that 2%, 3%, and 2% of the products made by
each machine, respectively, are defective. Now, suppose that a finished product is randomly
selected. What is the probability that it is defective?
8/19/2019 PS - Handbook.pdf
26/108
17. Suppose that three machines at a factory are used to produce a large quantity of identical parts. The production machines have different capacities: Machine A has a large capacity and produces 60%
of the parts, while machines B and C produce 30% and 10% of the parts, respectively.
Historical data indicate that 10% of the parts produce by Machine A are defective, compared to
30 % for Machine B and 40% for Machine C.
(a) Complete the following table.
(b) What are the conditional probabilities, updated in light of the evidence that the part is defective
of machine A, B or C having produced it?
2.9 Further Exercises (Probability and statistics: Walpole, Myers and Myers – 8th edition)
Exercises - Page 97 - Question 2. 109, 2. 110, 2. 111, 2. 112, 2. 129
LECTURE 3
DISCRETE RANDOM VARIABLES
3.1 Introduction
Machine Defective Nondefective Total
A
B
C
Total 100
8/19/2019 PS - Handbook.pdf
27/108
Definition: A random variable X is a numerically valued variable defined on the sample space, . R:X
We say that X is a discrete random variable if it can take only a countable set of values, i.e. integer
or rational values.
Consider tossing a fair coin. We know that the outcome is either a head or a tail.
P(head) =2
1 , P (tail) =
2
1
If we denote the number of heads by X, then
P(X = 1) =2
1 , P (X = 0) =
2
1
X is an example of a random variable. Note that a random variable is usually labelled with a capital
letter (say X). The realised value of the random variable X is denoted by x.
Definition: If we have a discrete random variable X taking values n21 x,.....,x,x
with probabilities n21 p,......., p, p respectively, where
i,0 p1 p..... p p p in321 ,
then this defines a discrete probability distribution for X. Although we have written the random
variable X as taking a finite set of values in this definition, it also holds for an X which takes an infinite
countable set of values, e.g. all non-negative integers.
We may write P( X = xi) as pi. This is sometimes referred to as the probability function for X.
Example 3.1: Two fair dice are thrown. Let X be the sum of the values on the faces turned uppermost.Find the probability distribution for X.
The sample space can be shown as follows.
X 2 3 4 5 6 7 8 9 10 11 12
P(X)1
36 2
36 3
36 4
36 5
36 6
36 5
36 4
36 3
36 2
36 1
36
Note that X = 2 if and only if both dice show 1. Also, X = 3 if and only if one die shows 1 and the
other 2.
Note that the sum of the probabilities is one and all are positive so this is a valid probability distribution.
Example 3.2: The discrete random variable X has probability function given by
P(X=x) = cx2 , x=1,2,3,4. Find C.
X 1 2 3 4
P(X) c 4c 9c 16c
We know that c + 4c + 9c + 16c = 1 and hence c =301
8/19/2019 PS - Handbook.pdf
28/108
Definition: Suppose X is a discrete random variable taking values n21 x,.....,x,x , with probabilities
n321 p,....., p, p, p then the mean or expected value of X, written as or E[X] is given by
n
i 1
ii x pE(X)μ
If X takes an infinite number of values the sum is taken over all values of i.
To justify this definition, suppose we had a sample of x values where x occurs with frequency f . Thenthe sample mean would be
ixf
f
f
xf x
i
i
i
ii
In the limit if we collect enough data ii f f tends to p.
Example 3.3: A die is thrown, what is the mean (or expected) score?
6
16
6
15
6
14
6
13
6
12
6
11)( X E = 3.5
Note that the expected value of a random variable is not necessarily a value the random variable can
take. The expected score when we throw a. fair die is 2 1/2, but a die cannot take this value. Think of
the expected value or mean of a random variable as a measure of where the distribution is centred
around.
The expectation of any function of a random variable, g(X) say, is defined in a similar way.
Definition: If X is a discrete random variable then the expectation of X is given by
n
1i
ii )g(x pE[g(X)]
We can also define the variance and standard deviation of a random variable.
Definition: If X is a discrete random variable then its variance, written Var[X] is defined by
n
1i
2
ii μ)(x pVar[X]
The standard deviation of X is the positive square root of the variance of X.By multiplying out the bracket it is straightforward to see that the variance is given by
22ii μ)x p(Var[X] or 22 (E[X])]E[XVar[X]
Example 3.4 : A die is thrown, what is the variance of the score?
6
91
6
16
6
15
6
14
6
13
6
1.2
6
11]E[X 2222222
12
35
4
49
6
91]V[X
The variance gives an idea of how spread out the distribution is.
8/19/2019 PS - Handbook.pdf
29/108
3.2 Bernoulli Distribution
X 0 1
P(x) 1-p P
Example 3.5 : Toss a coin once. Let p be the probability of getting a head and
X = 0 if T occurs
= 1 if H occurs
Then X ~ Bernoulli (p).
3.3 Binomial Distribution
If we have n Bernoulli trials with probability of a success equal to p then the probability of r successes
is given by the binomial probability
n.,2,1,0,r p)(1 pc)( r nr r n
r X P
Thus, if we consider the random variable X which is the number of successes in n Bernoulli trials, then
P(X = r) is given by the binomial probability with parameters n and p.
Statistical table 1 gives the probability of r or more successes in n independent trials with the
probability of success p. For example if we wanted the probability of obtaining 23 or more heads in50 tosses of a fair coin we find that the answer is 0.76006.
Example 3.6: Suppose that 5% of the articles made by a factory are defective. What is the probability
of finding 1 defective in a sample of 10 from a very large batch? Since it is a large batch we may treat
this as sampling without replacement and the number of defectives, X, will have a binomial distribution
with n =10 and
p = 0.05. Thus
475.095.005.01
10)1( 9
X P
We can also find this quantity from the tables, 31512.008614.040126.0)2()1()1( X P X P X P
The tables are only given for some values of n and p so are not always useful, but you should knowhow to use them. Note that although p is only given up to 0.5, we can always turn a problem where the
probability of a ‘success’ is greater than 0.5 into a question about ‘failures’ which will have probability
less than 0.5. An example of this is given next.
Example 3.7: Fifty seeds were planted and it is known that the probability of any seed germinating is
0.8. Assuming that the number of seeds germinating follows a binomial distribution, using tables find
the probabilities of the following events (a) exactly 40 seeds germinate,
(b) more than 12 seeds fail to germinate,(c) more than 38 but fewer than 45 seeds germinate.
8/19/2019 PS - Handbook.pdf
30/108
8/19/2019 PS - Handbook.pdf
31/108
np
p pnp
p pk
nnp
p pr
nr np
p pr
nn
p pr nr
nn
p pr nr
nr
p pr
nr
p pr
nr
r X rP X E
n
n
k
k nr
n
r
r nr
n
r
r nr
n
r
r nr
n
r
r nr
n
r
r nr
n
r
r nr
n
r
1
1
0
1
1
1
1
1
1
1
0
0
)]1([
)1(1
)1(1
1
)1(1
1
)1()!()!1(
)!1(
)1()!(!
!.
)1(
)1(
)()(
Recall that 22 ]][[][][ X E X E X Var
Now ][)]1([][][ 22
X E X X E X E X E
Then
2
22
2
0
22
2
22
2
2
2
0
)1(
)]1([)1(
)1(2
)1(
)1(2
2)1(
)1(2
2)1(
)1()!()!2(
)!2)(1(
)1()!(!
!)1(
)()1()]1([
pnn
p p pnn
p pk
n pnn
p pr
n pnn
p pr
nnn
p pr nr
nnn
p pr nr
nr r
r X P r r X X E
n
n
k
r nk
n
r
r nr
n
r
r nr
n
r
r nr
n
r
r nr
n
r
8/19/2019 PS - Handbook.pdf
32/108
Thus )1()(.)1(][ 22 pnpnpnp pnn X Var
Example 3.8: The random variable X has a binomial distribution with parameters n=100 and p=0.8.
Find the mean and the variance of X.
The mean = np = 80, the variance is np (1- p) =16
3.4 Poisson Distribution
Suppose events occur at random at an average rate per minute. Examples include radioactive decayand arrivals in a queue. Then the distribution of the number of events which occur in one minute, is
said to have a Poisson distribution with parameter . If X has a Poisson distribution then
,2,1,0!
)exp(][ r r
r X P r
where >0. Note that
1
)exp()exp(
!)exp(
!)exp(
0 0
r r
r r
r r
So this is a valid probability distribution.
It can be shown that )(,)( X V X E
Statistical Table (3) gives the probability that a Poisson random variable with mean will be greateror equal to r in the same way as the binomial tables. For example, suppose that X is a random variable
with Poisson distribution with mean 2.0.Find )2()3()3()2()2()1( X P X P X P
27067.032332.059399.0)3()2()2( X P X P X P
40611.059399.01)2(1)2(
32332.0)3(
X P X P
X P
A property of the Poisson distribution is that if X is Poisson with mean then kX is Poisson with mean
k . This can be useful in calculating probabilities of numbers of event in a time period different tothat for which information is given.
Note that if X has a binomial distribution with parameters n and p that np X E )( and
)1(][ pnp X Var . Now if p is small then 1-p is close to one and np(1-p) np .This suggest that if pis small we may be able to approximate X by a Poisson random variable with mean np. So long as p
is small (may be < 0.1) and n is large (may be >50) a binomially distributed random variable is well
approximated by a Poisson random variable of mean np.
Example 3.9: IF X has a binomial distribution, n=100, p=0.01 then from the tables
36973.0)1(
026424)2(
63397.0)1(
X P
X P
X P
8/19/2019 PS - Handbook.pdf
33/108
The corresponding quantities from the Poisson tables with =1 are
36788.0)1(
26424.0)2(
63212.0)1(
Y P
Y P
Y P
Example 3.10: The probability that a car has defective gearbox is 0.02. If I check the gearboxes of
140 cars what is a suitable approximation to the probability that I find
(a) 2 defectives (b) more than 5 defectives (c) fewer than 4 defectives
Let X be the number of defective gearboxes that I find. Then X has a binomial distribution with n=140
and p=0.02. Since n is large and p is small a Poisson random variable with mean 8.2 np will
give a good approximation to X tables
692.030806.01)4(1)4()(
065.0)6()5()(
238.053055.076892.0)3()2()2()(
X P X P c
X P X P b
X P X P X P a
3.5 Exercises
1. A manufacturing process produces components which are free from any faults with probability p. Find the probability that in a sample of size 50 from a large batch there are fewer than 4
faulty components when p = 0.95. Find the probability that in a sample of size 50 there are
fewer than 10 faulty when p = 0.75.
2. Use the table to give a suitable approximation to the probability that 5 X where X is binomialrandom variable with parameters p = 0.05 and n = 400.
3. A car-pooling study shows that the number of passengers, X in a car (excluding the driver) islikely to assume the values 01,2,3 and 4 with probabilities given by the table.
X 0 1 2 3 4
P(X=x) 0.7 0.1 0.1 0.05 0.05
(a) Determine the probability of at least two passengers in a car.(b) Find the cumulative distribution function of X and sketch it.(c) Calculate
(i) E(X)(ii) E(X2)(iii) V(X)
4. Suppose that in late summer, the Fremantle Surf Life Saving club makes an average of two surfrescues per day Use the Poisson probability distribution to determine the probability that
(a) More than two rescues are made on a particular day.
8/19/2019 PS - Handbook.pdf
34/108
(b) Five surf rescues are made in a 3-day period.
3.6 Further Exercises (Probability and statistics: Walpole, Myers and Myers – 8th Edition)
1. Exercises - Page 189 - Question 5.51 – 5.70
8/19/2019 PS - Handbook.pdf
35/108
LECTURE 4
CONTINUOUS RANDOM VARIABLES
4.1 Introduction
A random variable X is a numerically valued variable defined on the sample space, X: R
We say that X is a continuous variable if it is not discrete.
Definition: If X is a continuous random variable then there exists a non-negative function, f(x), called
the probability density function of X such that
And
b
a
dx x f b X a P
dx x f
)()(and
1)(
Note that any function, which is non-negative and integrates to one is a possible probability density
function for a random variable X. As with discrete random variables some density functions are
commonly used to model continuous random variables. It is also convenient to define the following
function.
Definition The cumulative distribution function, F(x) of a continuous random variable x is defined by
t
dx x f t X P t F )()()(
Note that for a discrete random variable the cumulative distribution function
P(X x) will be a step function with steps of height P(X = x) at the points at which X is defined. Thecontinuous version can be thought of as a limiting case when all values of x in an interval are possible.
Note that the cumulative distribution function is always non-decreasing and
x
x F 0)(lim
x
x F 1)(lim
We define the mean of a continuous random variable as follows.
Definition If X is a continuous random variable with probability density function f(x) then the mean
or expected value of X, E[X] or p is defined by
dx x xf X E )(][
We define the expectation of a function of X in a similar way
Definition If X is a continuous random variable with probability density function f(x) then the
expected value of g(x) is defined by
8/19/2019 PS - Handbook.pdf
36/108
dx x f x g X g E )()()]([
Similarly the variance is defined by
Definition If X is a continuous random variable with probability density function f(x) then the variance
of X, Var [X] is defined by
22
2
)(
)()(][
dx x f x
dx x f x xVar
We can also define the median of a continuous random variable.
Definition if X is a continuous random variable with probability density function f(x) then the median
of x is the value m satisfying the equation
m
m
dx x f dx x f 2
1)()(
It is the value such that X is equally likely to be more than the median as less than it.
Example 4.1: A random variable X has probability density function.
otherwise0
10)1()(
2 xif xcx x f
1. Determine c.
2. Find E[X].
3. Find Var[X].
4. Show that the median m satisfies the equation
0186 34 mm Solution:
1. We know that
1)( dx x f
so
8/19/2019 PS - Handbook.pdf
37/108
12
1
1
043
1
0
32
1)(
1)(
43
c
c
dx x xc
x x
and hence c = 12.
2.
5
3
)54
(12
).(12)(
1
0
54
1
0
43
x x
dx x x X E
3.
3
2
30
12
)65
(12
)(12][
1
0
65
1
0
542
x x
dx x x X E
Thus25
1
5
3
5
2][
2
X Var
4.
8/19/2019 PS - Handbook.pdf
38/108
0186
5.034
5.0)43
(12
5.0)43
(12
5.0)(12
34
43
430
43
0
32
mm
mm
mm
x x
dx x x
m
4.2 Exponential Distribution
The exponential distribution can be used to model the lifetimes of components. It is also linked to the
Poisson distribution. If X has a Poisson distribution then the time between occurrences of X follows
an exponential distribution.
The probability density function for an exponential distribution is
otherwise0
0,)exp()(
xif x x f
We shall check first that this is a valid p d f. Clearly 0)( x f . Also
1]exp[]exp[
00
xdx x
To find the mean we use integration by parts
1
exp[1
)exp(]]exp[[
)exp(][
0
0
0
0
x
dx x x x
dx x x X E
8/19/2019 PS - Handbook.pdf
39/108
To find the variance we first find E[X2]. This is also done by integration by parts
2
00
2
0
22
2
12
)(1
)exp(2]]exp[[
)exp(][
X E
dx x x x x
dx x x X E
Therefore
222
112][
X Var
The cumulative distribution function is given by
0)(00
)()(
0
xif dt t f
xif x X P x F x
Now
]exp[1
]exp[[
)exp()(
0
0 0
x
t
dt t dt t f
The median m is given by F(m) = ½. Therefore substituting into the cdf
.2ln
2ln
2/1ln
2/1]exp[
2/1].exp[1
1
m
m
m
m
m
4.3 Exercises
1. The random variable X has probability density function 3)2()( xc x f for 0
8/19/2019 PS - Handbook.pdf
40/108
2. Assume that the continuous random variable x has the probability density function
otherwise0
2/30)49()(
2 x for xk x f
(a) Calculate the value of k .
(b) Find the mean and variance of x.(c) Find the cumulative distribution function of x.
(d) Find the median of x.
(e) Find P(1/2x< 1).
3. The time (in hours) between successive calls has an exponential distribution with parameter 1/6 . What is the probability of waiting more than 15 minutes between any two successivecalls?
4. Identify and name the continuous random variables from the following list of variables: X : the number of automobile accidents per year in Virginia.
Y : the length of time to play 18 holes of golf.
M : the amount of milk produced yearly by a particular cow.
N : the number of eggs laid each month by a hen.
P : the number of building permits issued each in a certain city.
Q: the weight of grain produced per acre.
4.4 Further Exercises (Probability and statistics: Walpole, Myers and Myers – 8th Edition)
1. Exercises - Page 112- Question 3.7, 3.9, 3.12, 3.21
8/19/2019 PS - Handbook.pdf
41/108
LECTURE 5
NORMAL DISTRIBUTION
5.1 Introduction
The normal, or Gaussian, distribution is the most commonly used distribution in statistics. A normally
distributed random variable with mean and variance 2 has its probability function given by
xfor ]/2σμ)(xexp[2πσ
1φ(x) 22
It is denoted by )σ N(μ(~X 2 .
If X is normally distributed with mean 0 and variance 1, then we write N(0,1)~X . Its probability
density function is usually written as (x) and is given by
xfor )2xexp(2π
1φ(x) 2
The cumulative distribution function is denoted by (x).
We can calculate probabilities for a normal distribution from the standard normal using
N(0,1)~σ
μX
Statistical Table (4) gives the probability that a standard normal random variable, i.e. with mean zero
and variance 1, is larger than specified value. i.e. 1-(x).. In using the tables we utilise the symmetryof the normal distribution, and the fact that 0.50)P(Z0)P(Z
Example 5.1: Calculate the probabilities of the following events.
(i) Z < -2.45,(ii) (Z < - 2.1) ( Z > 2.1)(iii) 0 < Z < 1.2
Solution:
(i) By symmetry P ( Z < -2.45) = P (Z > 2.45) = 0.00714
(ii) By symmetry P [( Z < -2.1) P ( Z > 2.1) ] = 2 P ( Z > 2.1) = 2 x 0.01786 = 0.03572(iii)
P [ Z > 1.2] = 0.11507P [Z < 1.2 ] = 1 – 0.11507
= 0.88493
P[0 < Z < 1.2] = 0.88493 – 0.5
= 0.38493
Example 5.2: It is known that in a certain district the heights of adult males are normally distributed
with mean 175cm and standard deviation 7cm. Find the probability that a man selected at random from
this district will be
(a) over 182cm tall.
(b) between 170cm and 181cm tall.
(c) under 179cm tall.
Let X be the height of the selected man.
8/19/2019 PS - Handbook.pdf
42/108
Then ) N(175,7~X 2 Z = (X-175)/7 ~ N (0,1)
(a) P( X > 182) = P ( Z > (182 – 175)/7) = P (Z > 1) = 0.159
(b) P( 170 < X < 181) = P ( -5/7 < Z < 6/7)
= P (Z >-5/7) – P (Z > 6/7)
= 0.7625 – 0.1968 = 0.566
(c) P (X< 179) = P ( Z < 4/7) = 1 – P( Z > 4/7) 1 – 0.284 = 0.716
5.2 Normal as an Approximating Distribution
When n is large and p moderate we may use the normal distribution to approximate binomial
probabilities. Note that as we are approximating a discrete random variable by a continuous one, wehave to employ continuity correction.
For discrete random variable P( X < x) = P ( X x-1) We approximate these quantities by P (Y < x -
21 ) . We illustrate the technique in the following example.
Example 5.3: A fair coin is tossed 150 times. Find a suitable approximation to the
probability of each of the following events.
(a) more than 70 heads
(b) fewer than 82 heads(c) more than 72 but fewer than 79 heads.
Let X be the number of heads thrown, then X has a binomial distribution with n = 150 and p = ½ . As
n is larger and p moderate we may approximate X by Y a normal random variable with mean np = 75
and variance np(1-p) = 37.5.
a. We require P(X > 71) but this is the same as P(X 70 ) so we approximate by P (Y > 70.5).
0.7690.735)P(Z)37.575)/(70.5P(Z
b. We require P( X < 82) but this is the same as P(X 81) so we approximate
by P(Y < 81.5). P(Z < (81.5 – 75)/ 5.37 ) P(Z < 1.06) = 1- 0.145 = 0.855
8/19/2019 PS - Handbook.pdf
43/108
(c) We require P (72 < X < 79) which is the same as P (73 X 78) and thus we approximate by(72.5 < y < 78.5).
P(-0.408 < Z < 0.571) = 0.658 – 0.284 = 0.374
We may similarly approximate a Poisson random variable by a normal one of the same mean and
variance so long as this mean is moderately large. We again have to use the continuity correction.
Example 5.4: A radioactive source emits particles at random at an average rate of 36 per hour. Find
an approximation to the probability that more than 40 particles are emitted in one hour.
Let X be the number of particles emitted in one hour. Then X has a Poisson distribution with mean 36
and variance 36. We can approximate X by Y which has a N(36, 36) distribution. We require P(X >
40). This is approximately P(Y 40.5).
0.2266
0.75)P(Z
)6
3640.5P(Z40.5)P(Y
5.3 Exercises
1. The sample data consists of the values:
0.325 0.317 0.375 0.325 0.508 0.117 0.150 0.317 0.275 0.383
Do they appear to come from a Normal Distribution?
(i) What is the percentage of values within one standard deviation of the mean?(ii) What is the percentage of values within two standard deviations of the mean?Do they appear to come from a Normal Distribution? Justify your answer.
2. Construct a Normal probability plot using SPSS for the data given in (1) and explain how itcould be used for checking normality.
3. 94 95 30 98 76 73 95 97 86 91 85 70 96
70 91 72 97 97 84 28 19 90 77 58 58 47
48 28 20 65
(a) Plot these data.(b) Find the mean and the standard deviation for this data.(c) Let X be a Gaussian (Normal) random variable with mean and standard deviation you
calculated in part (b). Find the following probabilities.
8/19/2019 PS - Handbook.pdf
44/108
(i) P(X < 30)(ii) P(X > 90)(iii) P(50 < X < 80)
(d) Find the proportion of data values that are
(i) less than 30(j) greater than 90(k) from 50 to 80
(e) Can the distribution of these values approximated by a Normal distribution?
4. The lengths of a batch of bolts are assumed normally distributed with mean 4cm and standard
deviation 0.1cm. What is the probability that a bolt selected at random will be more than
4.1655cm in length? (Give answer to 5 dp)
5. A coin is to be tossed 100 times.
(a) Assuming the coin is biased with P(head) =0.6, use a normal approximation to estimate
the probability that between 56 and 63 heads occur.
(b) Assume P(head)=0.99. Use a suitable approximation to estimate the probability that
exactly 99 heads occur. (Do not calculate the exact binomial probability).
5.4 Further Exercises (Probability and statistics: Walpole, Myers and Myers – 8th Edition)
1. Exercises - Page 209 - Question 6.1 – 6.10
8/19/2019 PS - Handbook.pdf
45/108
LECTURE 6
RANDOM SAMPLING AND SAMPLING DISTRIBUTIONS
6.1 Introduction
Definition: The sampling distribution of a random variable is the collection or distribution of all
possible values of the random variable over all possible samples. If the sample is a random sample of
size n from an infinite population then, x1,x2,…xn are independent random variables each with the
same distribution (i.e. same p.d.f or probability function) as the population so that
E(xi) = Var (xi) = 2
Theorem 1 Averaging over all random samples of size n from an arbitrary population with mean and variance
2 , the sample mean x and sample variance 2
s have the following three properties:
E ( x) = i.e is an unbiased estimator of
Var ( x) = 2/ni.e. the variability of as an estimator decreases with n.
22 σ]E[s i.e. s2 is an unbiased estimate of 2 .
Thus s2/n is used as an unbiased estimate of the variability or variance of x as an estimator of μ .
Example 6.1: An infinite population is described by an asymmetrical discrete distribution with just
two values: -3 with probability 0.3 and +1 with probability 0.7. Thus we have
0.20.7)(10.3)3(E[X]μ
36.3)2.0(7.0)(10.33)(μ]E[XVar[X]σ 222222
These are the values of the (usually unknown) population parameters. Let us now look at all samples
of size of 3. There are infinitely many, but we can tabulate them as follows:
Sample observations x s2 P(sample)
-3, -3, -3 -3 0 (0.3)3 = 0.027-3, -3,1 -5/3 32/6 3(0.3)2 x 0.7 = 0.189
-3,1,1 -1/3 32/6 3(0.7)2 x 0.3 = 0.441
1,1,1 1 0 (0.7)3 = 0.343
Thus we see that 2 as an estimate of p is 2.8 below the ’true ’value in 2.7% of
samples, 1.2 above in 34.3% of samples etc. Taking the average over all samples or equivalently the
expectation over the sampling distribution, we see that
)xE( = - 3x0.027+3
5x 0.189 +
3
1 x 0.441 + 1 x 0.343 = - 0.2
exactly, confirming the first result of Theorem 1.
Then
8/19/2019 PS - Handbook.pdf
46/108
22])X(E[]XE[]XV[
= 2222
2 )2.0(343.0)1(441.03
1189.0
3
5027.0).3(
x
= 2
12.1 which confirms the second result. The third result is verified for this example by averaging over all possible values of s2 thus:
22 σ3.360.34300.4416
320.189
6
320.0270)E(s
Proof of Theorem 1
μ
μ)(nn
1
μ)(μn
1
])E[x](E[xn
1.]xE[
)x(x
n
1x
n1
n1
n
σ.
n
nσ
)σ(σn
1
])Var[x](Var[xn
1
]x[xVar n
1]x[Var
2
2
2
22
2
n12
n12
Note: x1, …,xn are independent (random sample)
The theorem shows thatn
is the standard deviation of the sampling distribution of x. A sample
estimate of this variability isn
s and is called the (estimated) standard error of the (sample) mean.
Theorem 2 Central Limit Theorem says that as n the sampling distribution of x tends to a Normal distribution with the same mean and variance.
The importance of this result is that we do not need to know the form or type of the original populationdistribution if our sample size is sufficiently large. We can use instead the Normal distribution for
8/19/2019 PS - Handbook.pdf
47/108
statistical inference with the knowledge that the probabilities we calculate will be good approximations
to the true (but generally unknown) probabilities.
n N X
2
,~ approximately for large n.
Then using the properties of the Normal distribution we can say that for large n,
u
n
X P
can be found approximately for any specified value u without knowing the original form of the
population.
If, however, we do know the form of the population and it follows a Normal distribution, then for any
sample size n > 1 it can be shown that
n
σμ, N~X
2
Thus 1); N(0~nσ
μXZ
has a sampling distribution which is Standard Normal for any n (Table 4).
As the population standard deviation is often unknown, replacing it by the corresponding samplequantity s changes the sampling distribution.
However, provided the underlying population is Normal, it can be shown that
ns
μXT
has a ‘Student’s t-distribution’ with v ‘degrees of freedom’, where 1 nv (named after W. S.Gossett, who took the pseudonym ‘Student’). The per centiles of this distribution are given in Table 7.
Another distribution, which arises from random samples of Normal popu lations, is the ‘chi-square’
distribution, whose percentage points are given in Table 8. It can be shown that
2
1n2
2
χ ~σ
1)s(nV
the chi-square distribution with 1n degrees of freedom, whatever the value of X . Yet anotherdistribution is the (Fisher) F-distribution with percentage points in Table 9. The F and 2 distributions
are used for statistical inference on the variances of Normal populations as well as for wider application
in Goodness-of-Fit tests.
Note:
Using SPSS it is possible to check these distributional results empirically by generating a sufficient
8/19/2019 PS - Handbook.pdf
48/108
number of random samples from a Normal population.
6.2 Exercises
1. The heights of 1000 students are approximately normally distributed with a mean of 174.5 cm anda standard deviation of 6.9 cm. If 200 random samples of size 25 are drawn from this population
and the means recorded, determine
(a) The expected mean and standard deviation of the sampling distribution of the mean.(b) The number of sample means that fall between 172.5 and 175.8 cm inclusive.(c) The number of sample means that falling below 172 cm.
2. Show that the sample variance is unchanged if a constant is added to or subtracted from each value
of the sample.
3. If the size of a sample is 36 and the standard error of the mean is 2, what must the size of thesample become if the standard error is to be reduced to 1.2?
4. The amount of time that a drive-through bank teller depends on a customer is a random variablewith a mean 3.2 minutes and a standard deviation 1.6 minutes. If a random sample of 64customers is observed, find the probability that their mean time at the teller’s counter is
a) At most 2.7 minutes; b) More than 3.5 minutes;
c) At least 3.2 minutes but less than 3.4 minutes.
5. If all possible samples of size 16 are drawn from a normal population with mean equal to 50 and
standard deviation equal to 5, what is the probability that a sample mean will fall in the intervalfrom 1.9 , 0.4? Assume that the sample means can be measured to any degree ofaccuracy.
6.3 Further Exercises (Probability and statistics: Walpole, Myers and Myers – 8th Edition)
1. Exercises - Page 275 - Question 8.18 – 8.20
8/19/2019 PS - Handbook.pdf
49/108
8/19/2019 PS - Handbook.pdf
50/108
LECTURE 7
TEST OF HYPOTHESIS
7.1 Introduction to Hypothesis Testing
Definition 1 The null hypothesis H 0 is a statement about the value of the parameter of interest. A simple
null hypothesis specifies the population distribution exactly. We examine the data to see whether they
support ‘or provide evidence against the null hypothesis H0.
The alternative hypothesis H1 describes only the possibilities (there may be many) that we are prepared
to consider if H0 is not true.
Definition 2 The test statistic for H0 versus H1 is a random variable with known (or approximately
known) distribution-assuming H0 to be true ‘under H0’. The observed value of the test statistic can
indicate departures from H0 in favour of H1.
Definition 3 The P-value gives the probability of, under H 0 , observing a value of the test statistic at
least as extreme as the value actually observed, where extremities indicate departures from H0 in favour
of H1. If the P-value is as small or smaller than , we say the test is statistically significant.
7.2 Procedure
Null and Alternative Hypotheses A clear statement of both should be given in terms of the population
parameter of interest, together with a short verbal interpretation.
Test Statistics: The formula in terms of sample statistics such as mean and standard deviation should
be stated with the (sampling) distribution under the null hypothesis. Then the observed value of the
test statistic should be calculated to at least three significant figures.
Assess evidence: The P-value should be used to form a verbal statement or conclusion regarding the
truth or otherwise of the null hypothesis. Finally a verbal interpretation of this conclusion should be
given for the non-statistician.
Depending on the conclusion reached (if any) the investigator may wish to quote a confidence interval
for the parameter at the desired level.
Example 7.1: Articles produced by a manufacturer should have mean length 4 cm. and standard
deviation 0.02cm. A test sample of size 10 from a large batch of production has x = 4.01. Is there
evidence that the unknown mean length μ , say, of articles in the batch is unsatisfactory?
8/19/2019 PS - Handbook.pdf
51/108
The Null hypothesis is H0 : = 4 (batch satisfactory) to be tested against the alternative
H1 : μ 4 (batch unsatisfactory).
We need a test statistic whose distribution is known under the null hypothesis i.e. assuming H 0 to be
true. We know that in general for random samples from a Normal population
)n
σ., N(μ~X
2
so, under H 0
)10
(0.02), N(4~X
2
N(0,1)~100.02/
4XZ
is standard Normal . Large values of Z (either positive or negative) indicate departures from H 0 in
favour of H1 and the observed value of Z is
58.110/02.0
401.4
Z
So, the probability of observing a value of Z at least as extreme as this (the P-value) is
P (Z> 1.58) + P( Z < -1.58) = 2 x P (Z> 1.58) = 0.1141,
using the symmetry of the Normal distribution.
Thus there is a 11.4% chance of observing this sample result or worse even if the batch is satisfactory.
We therefore conclude that there is no evidence against the null hypothesis.
Note that this was a ‘two-sided’ or ‘two-tailed’ test as the alternative hypothesis is ‘two sided’, namely
4 . If there was a legal requirement of a maximum mean length of cm, then we would not be
concerned with the possibility that 4. We would ask
whether there was sufficient evidence in the data to make us worry about failing the requirement, and
the test statistic and observed value would be the same as before. Only large positive and not negative
values of Z would indicate departures in favour of H1 so the P-value is just P(Z >1.58) = 0.057. Now
we have slight evidence against H0 in favour of.H1 i.e. slight evidence that the batch may fail to meet
the legal requirement. This is called a ‘one-tailed’ or ‘one-sided’ test as the alternative hypothesis is
“one-sided’, namely > 4.
However, the assumption that the population variance 2 is known is often unrealistic:
Example 7.2: A random sample of 5 men had a mean height x of 70 inches and a sample standard
deviation s of 2 inches. Is there any evidence in these data against the (null) hypothesis that the mean
of the population is 67 inches? To test H0 : = 67 versus H1 : 67 we need a test statistic whosedistribution is known under H0. Such a statistic is
ns
μXT
~ t0 under Ho
8/19/2019 PS - Handbook.pdf
52/108
That is, Student’s ‘t’ with 4 degrees of freedom. The observed value of T is 3.35 so to calculate P we
must refer this value to percentage points of the t-distribution with 4 degrees of freedom (d.o.f.) Now
t4(0.025) = 2.776 lie on either side of our observed value.
Alternatively we can use the 2-values on the second row of Table 7 to arrive at the same answer. We
cannot therefore say exactly what the probability of obtaining a value at least as extreme as the oneobserved is, but we can specify it within a suitable range and this is sufficient to enable us to conclude
that there is moderate evidence against the null hypothesis. So even this small sample provides
evidence.
7.3 Confidence Intervals for Hypothesis Testing
Often we may be asked to estimate the population mean, , rather than testing a hypothesis about it.Or we may have performed a test and found evidence against the null hypothesis casting doubt on our
original hypothesised value. We can (and indeed must) give an estimate of uncertainty along with our
best estimate of p, which is ~, the sample mean.
Whatever the value of
95.0]96.196.1[
n
X P
,
cross multiplying we get
95.0]/96.1/96.1[ n X n P
Subtracting X gives
95.0]/96.1/96.1[ n X n X P
and finally multiplying by — 1 gives,
95.0]/96.1/96.1[ n X n X P
and this is true whatever the value of , so we can say that the random interval)/96.1,/96.1( n X n X has a probability of 0.95 of containing or covering the value of ;
that is, 95% of all samples will give intervals (calculated according to this formula) which contain the
true value of the population mean. This interval is called a 95% confidence interval for . Note that
there is no guarantee that any specific sample contains with 95% probability.
In general a 100(1 — )% confidence interval for is given by
[n
x )2/(1 ]
where )2/(1 denotes an upper percentage point of the standard Normal distribution when 2 is
known, and given by
[n
st x )2/(1 ]
when 2
is unknown.
8/19/2019 PS - Handbook.pdf
53/108
Example 7.1 revisited: The above argument can be used for a 95% confidence interval as we are
assuming that the population variance 2 is known. Thus)10/02.096.1,10/02.096.1( X X
is a 95% confidence interval for p and substituting the observed value x = 4.01 we obtain (3.9976,
4.0224).
This interval includes 4 cm. Therefore, no evidence to say that the articles in the batch are
unsatisfactory.
Example 7.2 revisited : When 2 is unknown,
)52.776s/X,52.776s/X(
which contains 4 cm with probability 0.95 as t4(0.025) = 2.776 from Table 7, there being o