IEM Outline Lecture Notes Autumn 2016


200052 INTRODUCTION TO ECONOMIC METHODS

LECTURE - WEEK 1

Required Reading:
Ref. File 1: Section 1.13
Ref. File 3: Introduction and Sections 3.1 to 3.4, 3.7

KEYS TO PASSING THIS UNIT:

(i) Undertake the required reading from the reference files each week. (It may be necessary to re-read some sections more than once) – Approximately 4 hours per week.

(ii) Carefully study lecture material and take notice of advice given in lectures.

(iii) Attempt tutorial exercises before tutorials and work out where you have difficulties, which hopefully can be resolved in tutorials.

(iv) Make a conscious effort to keep up with the material presented.

1. INTRODUCTION TO UNIT

1.1 How Can We Define Statistics?

Statistics, for our purposes, encompasses the following major activities:

(i) Collection and description of information, or data - “descriptive statistics”. We will normally be dealing with a subset of a larger collection or set of data. The subset is called a sample, the larger set a population.

(ii) Using sample data to make inferences about a population - “statistical inference”.

1.2 Why Study Statistics?

(i) (Major) It can be useful. It can help us to make decisions in the face of uncertainty.

(ii) People are bombarded with statistics all the time. Often statistics is used in ways that are not warranted. It is important not to be fooled by people who misuse statistics.

(iii) It is important to have a clear understanding of the strengths and limitations of statistical analysis.

1.3 Structure of the Subject

Descriptive Statistics:

How we summarise the characteristics of raw data (using graphs, summary measures, etc.)

Probability Theory and Probability Distributions (“deductive statistics”):

Rules (or axioms) for calculating probabilities of certain things (called events) happening.

Probability theory can be considered part of descriptive statistics.

Here we will be concerned about making probability statements about a given population.

Sampling Theory and Sampling Distributions (the basis of “inductive statistics”):

Here we will be concerned with making probability statements about characteristics of samples, given assumptions about the population from which the sample was drawn.

Point and Interval Estimation:

Point Estimation - Here we will be concerned about producing a particular estimate (a number), based on sample data, of a characteristic of a population.

Interval Estimation - Here we will not give an estimate of a population characteristic, but rather a range in which we are confident (to some degree) the true value of the population characteristic is.

Hypothesis Testing:

Under this heading we will be looking at ways of testing hypotheses about characteristics of populations, based on sample data.



Regression Analysis:

In this case we will be concerned with estimating linear relationships between different variables, i.e. linear equations.

We will go on to examine statistical tests associated with estimated regression equations.

Introduction to Differential Calculus

2. DESCRIPTIVE STATISTICS

2.1 Some Basic Definitions Relating to Data

(i) Elementary Units and Frames:

Statistical data normally represents measurements or observations of a certain characteristic or variable of interest of each member of a set of objects or people.

Each object (or person) for which the characteristic is or can be measured is called an elementary unit.

The set or listing of all possible elementary units is called a frame.

(ii) Population/Sample:

A statistical population is the set of measurements or observations of a characteristic of interest for all elementary units in a frame.

A population may comprise a finite or infinite number of elements (observations), depending on the context.

A statistical sample is a subset of a population.

(iii) Parameters/Statistics:

For our purposes -

The numerical characteristics which describe a population are called parameters of the population.

The numerical values calculated from sample data are called sample statistics. These sample statistics can be thought of as describing or characterizing the sample.

(iv) Qualitative and Quantitative Variables:

Populations may be quantitative or qualitative. Data from quantitative populations is called quantitative or interval data. Data from qualitative populations is called qualitative, nominal or categorical data.

Data from a quantitative population can be expressed numerically in a meaningful way. The variable (or characteristic) associated with a quantitative population is called a quantitative variable.

Data from qualitative populations cannot be expressed numerically in a meaningful way. The variable (or characteristic) associated with a qualitative population is called a qualitative or categorical variable.

Note: Just because we assign a numerical code to a qualitative variable does not mean the variable is quantitative.

(v) Discrete and Continuous Quantitative Variables:

A discrete quantitative variable can assume only certain discrete numerical values (on the number line); i.e. there are gaps between the various values. Depending on the variable, there could be a finite or infinite number of these discrete values.

A continuous quantitative variable can assume any value in a specific range or interval. The interval can be of finite or infinite width.

Note: By definition there are an infinite number of values a continuous variable can take.

2.2 Frequency Distributions

(a) Introduction

Suppose we have a set of raw statistical data. At this stage we will make no distinction as to whether we are talking about a statistical population or sample.

In studying the data it is often useful to initially group the raw data into different classes or categories. A frequency distribution for a set of data lists the number of observations or ‘data points’ in each class used for grouping (the class frequencies). The classes of a frequency distribution must be mutually exclusive (an observation cannot fall into two classes) and exhaustive (any observation must belong to a class).

(b) Frequency Distributions for Quantitative Data

Each class of a frequency distribution of quantitative data usually has a lower and an upper limit, although sometimes it is necessary or convenient to have open-ended classes, i.e. classes which have either an upper or lower limit but not both.

Example:
Suppose we have data on the number of children in 100 households as follows:

Class                     Frequency
0 to under 2 children        30
2 to under 4 children        55
4 to under 6 children        13
6 or more children            2

The class width is the difference between successive lower class limits or upper class limits.

Note: An open-ended class has no class width.



General Advice for Forming Frequency Distributions:

The number of classes should generally be between 5 and 20.

Class widths are ideally equal, but this may not always be possible, and open-ended classes may be necessary.

Class limits should be chosen such that the class midpoint is close to the average of observations in the class. This is because in calculating summary statistics based on grouped data the midpoint is used as representative of all observations in the class.

(c) Relative, Cumulative and Cumulative Relative Frequency Distributions

A relative frequency distribution shows the proportion of all observations falling in each class. It is obtained by dividing the class frequencies ($f_i$) by the total number of observations in the data (‘n’).

A cumulative frequency distribution shows, for each class $i$, the total of the first $i$ frequencies.

A cumulative relative frequency distribution shows, for each class $i$, the total of the first $i$ relative frequencies.



For the previous example we have

Class (i)       Frequency ($f_i$)   Cumulative Frequency   Relative Frequency   Cumulative Rel. Freq.
0 to under 2          30                    30                   0.30                  0.30
2 to under 4          55                    85                   0.55                  0.85
4 to under 6          13                    98                   0.13                  0.98
6+ children            2                   100                   0.02                  1.00
Total                100                                         1.00

An ogive is a graph of the cumulative relative frequency distribution.
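The columns of the table above can be reproduced mechanically. The following is a minimal Python sketch, not part of the original notes; the variable names are illustrative only.

```python
from itertools import accumulate

# Class frequencies from the household example (100 households in total).
classes = ["0 to under 2", "2 to under 4", "4 to under 6", "6 or more"]
frequencies = [30, 55, 13, 2]

n = sum(frequencies)                                  # total number of observations
relative = [f / n for f in frequencies]               # f_i / n
cumulative = list(accumulate(frequencies))            # running total of the f_i
cumulative_relative = [c / n for c in cumulative]     # running total of f_i / n

for row in zip(classes, frequencies, cumulative, relative, cumulative_relative):
    print("{:<14}{:>5}{:>6}{:>7.2f}{:>7.2f}".format(*row))
```

Plotting `cumulative_relative` against the upper class limits would give the ogive.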

2.3 Histograms

Histograms give us a convenient way of visualising the distribution of observations over classes. They take the form of a series of adjacent (contiguous) rectangles, one for each class, with the base of each rectangle centred over the corresponding class midpoint.

In a frequency histogram the areas of the rectangles are proportional to the class frequencies, with the factor of proportionality the same for all classes. Thus if all the classes have the same width, each rectangle will have the same base width and the class frequencies can be represented by the rectangle heights.

In a relative frequency histogram the areas of the rectangles are proportional to the relative frequencies.

Similarly cumulative and cumulative relative frequency histograms can be defined.


Note: Frequency and relative frequency histograms will have the same shape.

Example:
Consider the following distribution

Class               Frequ.   Rel. Freq.   Cum. Freq.
0.5 to under 2.5      10        0.1           10
2.5 to under 4.5      30        0.3           40
4.5 to under 6.5      50        0.5           90
6.5 to under 8.5      10        0.1          100

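[Figure: Frequency Histogram. Vertical axis: Frequency (10, 30, 50); horizontal axis: class limits 0.5, 2.5, 4.5, 6.5, 8.5.]

[Figure: Relative Frequency Histogram. Vertical axis: Relative Frequency (0.1, 0.3, 0.5); horizontal axis: class limits 0.5, 2.5, 4.5, 6.5, 8.5.]

[Figure: Cumulative Frequency Histogram. Vertical axis: Cumulative Frequency (10, 40, 90, 100); horizontal axis: class limits 0.5, 2.5, 4.5, 6.5, 8.5.]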

2.4 Shapes of Distributions

The frequency or relative frequency histogram gives us a representation of the shape of the distribution of the data being analysed.

There are several terms commonly used to describe the shapes of distributions.

A distribution is described as negatively skewed (skewed to the left) if it has the following shape

[Figure: A Distribution that is Skewed to the Left. Relative Frequency against Variable Value.]

A distribution is positively skewed (skewed to the right) if it has the following shape.

[Figure: A Distribution that is Skewed to the Right. Relative Frequency against Variable Value.]

A distribution is symmetric if it has the following shape.

[Figure: A Symmetric Distribution. Relative Frequency against Variable Value.]

The above are all examples of unimodal distributions. A bimodal distribution has two peaks.



2.5 Bivariate Frequency Distributions

Often it is of interest to classify observations of elementary units according to two variables (characteristics). This allows one to gauge the relationship between the two variables.

Example:
Consider the final results of 50 students in a particular subject. Each student’s final grade and gender are recorded, allowing the derivation of the following bivariate frequency distribution.

                            Grade
Gender          HD   Dist.   Credit   Pass   Fail   Row Total
Male             5     4       10       6      2       27
Female           2     3       11       2      5       23
Column Total     7     7       21       8      7       50

Each combination of grade and gender is represented by a cell in the bivariate frequency distribution, which contains the frequency of that combination in the data.

The row totals represent, in this example, the marginal frequencies of males and females in the class (27 and 23, respectively).

The column totals represent the marginal frequencies of final grades.


Marginal frequencies, represented by the row and column totals, each refer to one variable only.

We can express the information in a bivariate frequency distribution as a relative frequency distribution by dividing each entry in the distribution by the total number of observations.

Example:

For the previous example, the bivariate relative frequency distribution is given by

                            Grade
Gender          HD     Dist.   Credit   Pass    Fail    Row Total
Male           0.10    0.08     0.20    0.12    0.04      0.54
Female         0.04    0.06     0.22    0.04    0.10      0.46
Col. Total     0.14    0.14     0.42    0.16    0.14      1.00

The row and column totals in the above table are called the marginal relative frequencies.
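As an illustration only, the relative and marginal relative frequencies of the grade/gender example can be derived from the raw counts as follows. This is a sketch with illustrative variable names, not part of the original notes.

```python
# Cell counts from the bivariate frequency distribution (50 students).
grades = ["HD", "Dist.", "Credit", "Pass", "Fail"]
counts = {
    "Male":   [5, 4, 10, 6, 2],
    "Female": [2, 3, 11, 2, 5],
}

total = sum(sum(row) for row in counts.values())

# Bivariate relative frequencies: each cell divided by the total.
joint_rel = {g: [c / total for c in row] for g, row in counts.items()}

# Marginal relative frequencies: row totals and column totals over the total.
row_marginals = {g: sum(row) / total for g, row in counts.items()}
col_marginals = [sum(col) / total for col in zip(*counts.values())]

print(row_marginals)
print(dict(zip(grades, col_marginals)))
```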


3. MEASURES OF CENTRAL TENDENCY AND DISPERSION

In this section we shall look at important ways of summarising data from both populations and samples. We shall be concerned with measures of the

‘centre’ of a frequency distribution

‘dispersion’ of values in a frequency distribution

3.1 Summation Notation

Suppose we have ‘n’ numbers. By labelling the numbers $(1, 2, 3, \ldots, n)$, we can represent the numbers by

$x_i, \quad i = 1, \ldots, n$

The sum of the numbers can be denoted

$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

$\sum_{i=1}^{n} x_i$ is a shorthand way of writing the sum.

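Theorem (Basic Properties of Summation Notation)
Given ‘c’ is some constant and $a_1, a_2, \ldots, a_n$ are ‘n’ numbers:

(i) $\sum_{i=1}^{n} c a_i = c \sum_{i=1}^{n} a_i$

(ii) $\sum_{i=1}^{n} (a_i + c) = \sum_{i=1}^{n} a_i + nc$

(iii) $\sum_{i=1}^{n} (a_i + c)^2 = \sum_{i=1}^{n} a_i^2 + 2c \sum_{i=1}^{n} a_i + nc^2$

(iv) $\sum_{i=1}^{n} (a_i - c)^2 = \sum_{i=1}^{n} a_i^2 - 2c \sum_{i=1}^{n} a_i + nc^2$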

Example:
Consider the following four labelled numbers.

$a_1 = 1, \quad a_2 = 3, \quad a_3 = 2, \quad a_4 = 1$

Use property (iii) of the above theorem to calculate $\sum_{i=1}^{4} (a_i + 1)^2$.



3.2 Measures of Central Tendency

For each measure considered there are population and sample versions. We will suppose here there are N values in the population and ‘n’ values in a sample.

Note that at this stage we are only concerned with quantitative variables, and we assume the population contains a finite number of values.

Definition (Mean of a Finite Quantitative Population)
If $x_1, x_2, x_3, \ldots, x_N$ represents a finite population of ‘N’ quantitative data points, then the mean of this population is given by

Population mean $= \mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$

($\mu$ is the Greek letter ‘mu’)

Definition (Mean of a Sample from a Quantitative Population)
If $x_1, x_2, x_3, \ldots, x_n$ represents a particular sample of size ‘n’ from a quantitative population, then the mean of this sample is given by

Sample mean $= \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$


Definition (Mode of a Set of Data)
The mode is the data value that occurs most frequently in a set of data (population or sample).

Definition (The Median of a Set of Data)
If quantitative data is arranged in ascending or descending order, the middle value of data is called the median. If there is an even number of data points, the median is typically taken to be the arithmetic average of the two middle values.

Example:
Consider the following set of data, which we can assume to be a sample from a population.

 1    1    5    4   12    4
 3    1    2    7    6    6
 5    1    1    5    8    9
10    2    4    2    6   30

$n = 24$, $x_1 = 1$, $x_3 = 5$, $x_{11} = 6$, etc. (if we label across rows then down)
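The three measures of central tendency can be checked for this sample with Python's standard `statistics` module. The sketch below is not part of the original notes; it assumes the 24 values are laid out in four rows of six and labelled across rows, which is the reading consistent with $x_1 = 1$, $x_3 = 5$ and $x_{11} = 6$.

```python
import statistics

# The 24 sample values, listed across rows then down (assumed layout).
data = [1, 1, 5, 4, 12, 4,
        3, 1, 2, 7, 6, 6,
        5, 1, 1, 5, 8, 9,
        10, 2, 4, 2, 6, 30]

mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # average of the 12th and 13th ordered values
mode = statistics.mode(data)      # most frequently occurring value

print(len(data), mean, median, mode)
```

Under these assumptions the mean is 5.625, the median is 4.5 and the mode is 1, so the mode sits at the lower end of the data rather than near its centre.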



Comparison of the Mean, Median and Mode

The mean takes account of all observation values; therefore it can be affected by extreme values or outliers, i.e. values which differ greatly from the majority of values.

The median and mode are unaffected by extremely high or low values.

The mode may not represent a “central” value in the distribution, as in the above example, but it may be useful, for example, for qualitative data.

If the frequency (or relative frequency) distribution is perfectly symmetric and unimodal, the mean, median and mode will coincide.

[Figure: Symmetric Distribution. Relative Frequency against Variable Value, with the mean, median and mode at the same point.]

If the distribution is skewed to the right (positively skewed) and unimodal, mode < median < mean.

[Figure: Distribution that is Skewed to the Right. Relative Frequency against Variable Value, with the mode, median and mean marked from left to right.]

If the distribution is skewed to the left (negatively skewed) and unimodal, mean < median < mode.

[Figure: Distribution that is Skewed to the Left. Relative Frequency against Variable Value, with the mean, median and mode marked from left to right.]



MAIN POINTS

A statistical population is a set of measurements or characteristics of elementary units of interest.

Once a population is defined, a sample is a subset from the population.

Parameters are numerical characteristics of a population.

Sample statistics are numerical characteristics of a sample.

A frequency or relative frequency distribution describes how data is distributed over different classes or categories.

A histogram shows graphically a frequency, relative frequency or cumulative frequency distribution (the areas of the ‘contiguous’ rectangles are proportional to the frequencies or relative frequencies).

The mean is affected by ‘extreme’ values; the median and the mode are not affected by ‘extreme’ values.

The population mean is denoted $\mu$; the sample mean is denoted $\bar{x}$.

The median divides a set of quantitative data into two equal halves.




LECTURE - WEEK 2

Required Reading:
Ref. File 1: Section 1.1
Ref. File 3: Sections 3.5(a)-(d), 3.5(f)
Ref. File 4: Introduction and Sections 4.1, 4.2

3. MEASURES OF CENTRAL TENDENCY AND DISPERSION CONTINUED

3.3 Measures of Dispersion

(a) The Range

Definition (Range of a Set of Data)
The range of a set of quantitative data is the difference between the highest and lowest data values.

(b) The Mean Absolute Deviation

Definition (Deviation from the Mean)

Consider a particular value $x_i$ from a finite data set. The deviation from the mean of this value is defined as

$(x_i - \mu)$ if the population mean is known

$(x_i - \bar{x})$ if only a sample mean is available


Definition (Mean Absolute Deviation)
(i) If $x_1, x_2, x_3, \ldots, x_N$ represents a finite quantitative population, then the population mean absolute deviation is given by

Population MAD $= \frac{\sum_{i=1}^{N} |x_i - \mu|}{N}$

(ii) If $x_1, x_2, x_3, \ldots, x_n$ represents a sample from a quantitative population, then the sample mean absolute deviation is given by

Sample MAD $= \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$

(c) The Standard Deviation and Variance

Another more mathematically convenient way of analysing the deviations from the mean is to square them. This leads to the definition of the variance.

Definition (Variance of a Finite Quantitative Population)
If $x_1, x_2, x_3, \ldots, x_N$ represent a finite population of N quantitative data points, then the variance of this population is given by

(Finite) Population variance $= \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Definition (Variance of a Sample from a Quantitative Population)
If $x_1, x_2, x_3, \ldots, x_n$ represent a particular sample of size ‘n’ from a quantitative population, then the variance of this sample is given by

Sample variance $= s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Alternatively we can equivalently write:

Population variance $= \sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N}$

Sample variance $= s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$


The standard deviation is defined as the positive square root of the variance.

Definition (Finite Population and Sample Standard Deviations)
(i) If $x_1, x_2, x_3, \ldots, x_N$ represent a finite quantitative population, then the population standard deviation is given by

Population standard deviation $= \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

(ii) If $x_1, x_2, x_3, \ldots, x_n$ represent a sample from a quantitative population, then the sample standard deviation is given by

Sample standard deviation $= s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$


An advantage of the standard deviation over the variance is that it is expressed in the original units of measure.

Example:

Calculate $s^2$ and ‘s’ for the previous 24 number example. (36.3315, 6.0276)
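The quoted answers can be checked numerically. The sketch below is not part of the original notes; it computes the sample variance both from the definition and from the equivalent shortcut formula, using the 24 values of the earlier example (the order of the values does not affect the result).

```python
import statistics

# The 24 sample values from the earlier example.
data = [1, 1, 5, 4, 12, 4,
        3, 1, 2, 7, 6, 6,
        5, 1, 1, 5, 8, 9,
        10, 2, 4, 2, 6, 30]

n = len(data)
mean = statistics.mean(data)

# Definition: sum of squared deviations from the sample mean, divided by n - 1.
s2_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut: (sum of squares - n * mean^2) / (n - 1).
s2_shortcut = (sum(x * x for x in data) - n * mean ** 2) / (n - 1)

s = s2_definition ** 0.5  # sample standard deviation

print(round(s2_definition, 4), round(s2_shortcut, 4), round(s, 4))
```

Both formulas give 36.3315 to four decimal places, with a standard deviation of 6.0276, in agreement with the answers quoted in the example.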

3.4 The Coefficient of Variation

The coefficient of variation is useful for comparing the variability of data sets with means that differ significantly, or data sets based on different units of measure.

Definition (Coefficient of Variation)
(i) For a population with mean $\mu$ and standard deviation $\sigma$:

Population coefficient of variation $= \frac{\sigma}{\mu}$

(ii) For a sample with mean $\bar{x}$ and standard deviation ‘s’:

Sample coefficient of variation $= \frac{s}{\bar{x}}$


Example:
Suppose we wish to compare the variability of the weights of a given sample of people with the variability of their daily calorie intake. We are told

sample mean of weights = 68 kg
sample standard deviation of weights = 5 kg
sample mean of daily calorie intake = 1200 calories
sample standard deviation of daily calorie intake = 300 calories
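A short calculation, given here as a sketch rather than as part of the original notes, shows how the comparison works: each standard deviation is divided by its own mean, so the units cancel and the two ratios can be compared directly.

```python
# Sample coefficient of variation = s / x-bar (a unit-free ratio).
cv_weight = 5 / 68        # kg / kg
cv_calories = 300 / 1200  # calories / calories

print(round(cv_weight, 4), cv_calories)
```

The ratios are roughly 0.0735 for weight and 0.25 for calorie intake, so relative to its mean, daily calorie intake is considerably more variable than weight in this sample.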

3.5 Chebyshev’s Theorem and the Empirical Rule

Theorem (Chebyshev’s Theorem)
For any quantitative population with a finite variance, the proportion of data points less than ‘c’ standard deviations from the mean is at least $1 - (1/c^2)$, where $c > 0$.

For hump-shaped or bell-shaped (unimodal) distributions, Chebyshev’s theorem will give a conservative indication of the concentration of population data points around the mean. In such cases we can refer to the empirical rule.

The Empirical Rule
For a bell-shaped distribution of sample or population data, it will be approximately true that

68% of the data points will lie within 1 standard deviation of the mean.

95% of the data points will lie within 2 standard deviations of the mean.

99.7% of the data points will lie within 3 standard deviations of the mean.
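To see how conservative Chebyshev's bound is, the sketch below (not part of the original notes) compares the bound $1 - 1/c^2$ with the actual proportion of points within ‘c’ standard deviations of the mean. Purely for illustration it treats the 24 values of the earlier example as if they were a complete population, so the population standard deviation (divisor N) is used; the function names are hypothetical.

```python
import statistics

def chebyshev_bound(c):
    """Lower bound on the proportion of points within c standard deviations."""
    return 1 - 1 / c ** 2

def proportion_within(values, c):
    """Actual proportion of values less than c population SDs from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population SD (divides by N)
    inside = [x for x in values if abs(x - mu) < c * sigma]
    return len(inside) / len(values)

# Assumption: the earlier 24-number example, treated here as a population.
data = [1, 1, 5, 4, 12, 4, 3, 1, 2, 7, 6, 6,
        5, 1, 1, 5, 8, 9, 10, 2, 4, 2, 6, 30]

for c in (1.5, 2, 3):
    print(c, round(chebyshev_bound(c), 4), round(proportion_within(data, c), 4))
```

For c = 2 the bound guarantees at least 75% of the points, whereas 23 of the 24 values actually fall within two standard deviations. The empirical rule's 95% figure is not guaranteed for these data, since their distribution is skewed rather than bell-shaped.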



4. INTRODUCTORY PROBABILITY THEORY

4.1 Basic Set Theory

A set is a collection of objects or elements.

Definitions (Sets)
The set of all elements of interest in a particular problem or context is called the universal set, which can be denoted by, say, $S$. Other basic definitions relating to sets are as follows:

(i) The null set, denoted $\emptyset$, contains no elements.

(ii) If an element denoted ‘x’ is a member of a set $A$, this is commonly denoted $x \in A$; if ‘x’ is not a member of set $A$, this can be denoted $x \notin A$.

(iii) The intersection of sets $A$ and $B$, denoted $A \cap B$, is the set of elements in both $A$ and $B$.

(iv) The union of sets $A$ and $B$, denoted $A \cup B$, is the set of elements in $A$ and/or $B$.

(v) Set $A$ is said to be a subset of set $B$, denoted $A \subset B$, if all elements in $A$ are also in $B$; if $A$ is not a subset of $B$, this can be denoted $A \not\subset B$.

(vi) The complement of set $A$, denoted $\bar{A}$, is the set of elements in $S$ but not in $A$.

(vii) If $A \cap B = \emptyset$, we say $A$ and $B$ are mutually exclusive or disjoint sets; they have no element in common.


Venn diagrams are often a convenient way of portraying sets and the relationship between them. An example is the following diagram.

[Figure: Venn diagram of sets within the universal set, including a pair of disjoint (mutually exclusive) sets.]

Example
Suppose we have the set $S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$

Define $A = \{1, 3, 5, 7, 9\}$, $B = \{1, 3, 4, 7\}$
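The set operations defined in this section map directly onto Python's built-in `set` type. The sketch below is illustrative only: the names `S`, `A` and `B` for the universal set and the two defined sets are assumptions, because the set symbols are not legible in the source.

```python
# Universal set and two subsets (labels S, A, B are assumed names).
S = set(range(1, 11))   # {1, 2, ..., 10}
A = {1, 3, 5, 7, 9}
B = {1, 3, 4, 7}

intersection = A & B    # elements in both A and B
union = A | B           # elements in A and/or B
complement_A = S - A    # elements in S but not in A
B_subset_of_A = B <= A  # is every element of B also in A?
disjoint = A.isdisjoint(B)

print(intersection, union, complement_A, B_subset_of_A, disjoint)
```

Here the intersection is {1, 3, 7}, so the two sets are not mutually exclusive, and B is not a subset of A because 4 belongs to B but not to A.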



4.2 Terminology Related to Statistical Experiments

An experiment, in a statistical sense, is an act or process that leads to an outcome which cannot be predicted with certainty.

Definition (Simple Events and Events)
A simple event of an experiment is an outcome that cannot be decomposed into simpler outcomes. An event is a collection or set of one or more simple events. An event is said to have occurred if a simple event included in the event occurs.

Definition (Sample Space of a Statistical Experiment)
The sample space of an experiment, which will be denoted $S$, is the set of all possible simple events. It can be described as the event consisting of all simple events.

Venn diagrams often provide a convenient way of depicting sample spaces and events.

Definition (Discrete Sample Space)
A discrete sample space consists of either a finite number of simple events or a countable and infinite number of simple events.

Definition (Continuous Sample Space)
A continuous sample space consists of simple events that represent all the points in an interval on the real number line. The interval could be of finite or infinite width.

4.3 Basic Concepts of Probability

(a) Probabilities of Events as Relative Frequencies

Definition (Probability of an Event)
If $f_E$ is the frequency with which event ‘E’ occurs in ‘n’ repetitions (trials) of an experiment under identical conditions/rules, $P(E)$ is defined as

$P(E) = \lim_{n \to \infty} \frac{f_E}{n}$
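The limiting relative frequency cannot be observed directly, but a simulation gives a feel for it. The sketch below is an illustration that is not part of the original notes: it simulates tosses of a fair die with a seeded pseudo-random generator and reports the relative frequency of the event "odd number of dots" for increasing numbers of trials. The function name and the seed are arbitrary choices.

```python
import random

def relative_frequency(n, seed=0):
    """Relative frequency f_E / n of the event 'odd face' in n simulated tosses."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    f_E = sum(1 for _ in range(n) if rng.randint(1, 6) % 2 == 1)
    return f_E / n

for n in (10, 100, 10_000, 200_000):
    print(n, relative_frequency(n))
```

As ‘n’ grows the relative frequency should settle close to 0.5, the probability of an odd face on a fair die, although any finite run only approximates the limit.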


(b) Definition of a Probability Distribution

Definition (Probability Distribution)
A probability model or probability distribution for an experiment takes the form of either a list of probabilities of simple events or some other representation of the relative frequency distribution of the underlying population associated with the experiment.

(c) Axioms of Probability

Suppose an experiment has a sample space $S$. Any assignment of probabilities to events in $S$ (subsets of $S$) must satisfy the following axioms:

1. For any event ‘E’ in $S$, $0 \le P(E) \le 1$

2. $P(S) = 1$

3. The probability of an event that is the union of a collection of mutually exclusive events is given by the sum of the probabilities of these mutually exclusive events. (The ‘additive property of probability’)


(d) Assigning Probabilities to Simple Events in Discrete Sample Spaces

There are three broad approaches to assigning probabilities to events.

(i) The Underlying Population Relative Frequency Distribution is Known or Assumed

In this case the relative frequencies of the simple events can be considered the probabilities of these simple events.

As a special case, the ‘classical’ or ‘equally likely’ approach to assigning probabilities is applicable in experiments where it is reasonable to assume that each simple event is equally likely. In this case, if there are ‘n’ simple events, each will occur with probability 1/n.

(ii) The Underlying Population Relative Frequency Distribution is Not Known or Assumed, but the Experiment is Repeatable

This approach relies on past observation of outcomes from an experiment that allows an approximate determination of relative frequencies of simple events and events.

In terms of this approach, the probability of an event is approximated by the relative frequency of the event in a ‘large’ number of identical trials of the experiment considered. This is often referred to as the ‘empirical’ or ‘relative frequency’ approach to assigning probabilities.

(iii) The Underlying Population Relative Frequency Distribution is Not Known or Assumed, and the Experiment is Not Repeatable

In many circumstances an experiment may not be repeatable, i.e. it will only happen once. In such circumstances people assign subjective probabilities to the experiment outcomes which reflect their personal beliefs.

For two events ‘A’ and ‘B’ defined on a sample space:

$P(A \cap B)$ = probability of simple events in both ‘A’ and ‘B’.

$P(A \cup B)$ = probability of simple events in ‘A’ and/or ‘B’.

[Figure: Venn diagram of the sample space showing overlapping events ‘A’ and ‘B’ and their intersection.]

Events ‘A’ and ‘B’ are said to be mutually exclusive if $A \cap B = \emptyset$. It follows immediately that, if ‘A’ and ‘B’ are mutually exclusive,

$P(A \cap B) = 0$

[Figure: Venn diagram of the sample space showing non-overlapping events ‘A’ and ‘B’.]

Two Approaches to Determining the Probability of an Event Defined on a Discrete Sample Space:

(i) Add up the probabilities of the simple events included in the event.

(ii) Use various probability rules and laws relating to unions, intersections and complements of events (considered later)


The first approach above can be formalised as performance of the following steps:

(i) Define the experiment.

(ii) List the simple events and assign probabilities to them in a way consistent with the axioms of probability.

(iii) Determine the simple events included in the event of interest.

(iv) Sum the probabilities of the simple events in the event of interest to find its probability.

Example:
Consider the experiment of tossing a fair die once and let ‘A’ be the event of obtaining an odd number of dots on the upward facing side.
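The four steps can be followed literally for the die example. The sketch below is not part of the original notes; it uses exact fractions and assumes, as "fair" implies, that each face has probability 1/6.

```python
from fractions import Fraction

# Steps (i) and (ii): the experiment is one toss of a fair die; list the
# simple events and assign each a probability of 1/6.
sample_space = [1, 2, 3, 4, 5, 6]
prob = {face: Fraction(1, 6) for face in sample_space}
total_probability = sum(prob.values())  # should equal 1 (second axiom)

# Step (iii): the simple events included in A (an odd number of dots).
A = [face for face in sample_space if face % 2 == 1]

# Step (iv): sum the probabilities of the simple events in A.
p_A = sum(prob[face] for face in A)

print(A, p_A)
```

This gives A = {1, 3, 5} and P(A) = 1/2.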


Example:
Suppose

$S = \{1, 2, 3, 4, 5, 6, 7, 8\}$, $A = \{1, 3, 5, 6\}$, $B = \{2, 3, 4, 5, 8\}$,

where $S$ is the sample space of a statistical experiment and all the simple events are equally likely.

Example:
Suppose that for $S = \{1, 2, 3, 4, 5, 6, 7, 8\}$:

$P(1) = P(2) = P(3) = P(6) = 0.1$

$P(4) = P(7) = P(8) = 0.08$

$P(5) = 0.36$

with $A = \{1, 3, 5, 6\}$, $B = \{2, 3, 4, 5, 8\}$
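The notes do not state which probabilities are to be found in these two examples. As an illustration, the sketch below sums simple-event probabilities to obtain P(A), P(B), P(A ∩ B) and P(A ∪ B) under both assignments. The helper name `event_probability` is hypothetical, and `S` is an assumed label for the sample space.

```python
from fractions import Fraction

def event_probability(event, prob):
    """Sum the probabilities of the simple events included in the event."""
    return sum(prob[e] for e in event)

S = range(1, 9)
A = {1, 3, 5, 6}
B = {2, 3, 4, 5, 8}

# First example: all eight simple events equally likely.
equal = {e: Fraction(1, 8) for e in S}

# Second example: unequal probabilities, held as exact fractions.
unequal = {e: Fraction(10, 100) for e in (1, 2, 3, 6)}
unequal.update({e: Fraction(8, 100) for e in (4, 7, 8)})
unequal[5] = Fraction(36, 100)

for name, prob in (("equally likely", equal), ("unequal", unequal)):
    print(name,
          event_probability(A, prob), event_probability(B, prob),
          event_probability(A & B, prob), event_probability(A | B, prob))
```

In the equally likely case the four values are 1/2, 5/8, 1/4 and 7/8; with the unequal assignment they are 0.66, 0.72, 0.46 and 0.92.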


MAIN POINTS

For a finite population,

variance $= \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

For a sample,

variance $= s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

The standard deviation is the square root of the variance: it has the same units of measure as the data.

Chebyshev’s theorem applies to all statistical populations.

The empirical rule applies only to hump-shaped distributions.

The coefficient of variation measures dispersion relative to the mean. It allows us to compare the dispersions of data sets with different means and units of measure.

In set notation:

$\cap$ means ‘and’

$\cup$ means ‘and/or’

$\bar{A}$ means ‘not A’


In statistical experiments:

Simple events cannot be decomposed into simpler outcomes.

The sample space is the set of all simple events.

Events are collections or sets of one or more simple events.

An event occurs if any of its included simple events occurs.

All statistical experiments can be thought of as sampling from a statistical population.

Probabilities must obey certain axioms.




LECTURE - WEEK 3

Required Reading:
Ref. File 4: Sections 4.3, 4.4, 4.6

4. PROBABILITY THEORY CONTINUED

4.4 Discrete Bivariate Probability Distributions

Definitions (Joint and Marginal Probabilities)
Suppose a statistical experiment for which simple events take the form of intersections of outcomes with respect to two or more variables. For such a statistical experiment:

The probabilities of the simple events are referred to as joint probabilities.

The probabilities of events representing outcomes with respect to one of the variables only are called marginal probabilities.

A listing or other representation of the joint probabilities is called a joint probability distribution.


Example:
Suppose we have the following data on all 1950 first year students at a particular university.

                              Work Status
Age in Years     Not Working   Part-Time   Full-Time   Row Total
Under 25            1200          200         250         1650
25 - 34              100           75         100          275
35 or over            10            5          10           25
Column Total        1310          280         360         1950

Consider the experiment of selecting one of the students at random. Define the following events for the experiment:

A: Under 25
B: 25 - 34
C: 35 or over
D: Not working
E: Part-time worker
F: Full-time worker

Calculate the following probabilities:

$P(A)$, $P(C)$, $P(D)$, $P(D \cap A)$, $P(C \cup F)$, $P(\bar{C})$, $P(\bar{C} \cap E)$

(11/13, 1/78, 131/195, 8/13, 5/26, 77/78, 11/78)
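Each of these probabilities is a count of students divided by 1950. The sketch below is not part of the original notes. It reads the requested list as P(A), P(C), P(D), P(D ∩ A), P(C ∪ F), P(not C) and P(not C ∩ E), which is an interpretation of the garbled source that happens to reproduce the quoted answers; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction

# Student counts by age group (rows) and work status (columns).
counts = {
    "Under 25":   {"Not working": 1200, "Part-time": 200, "Full-time": 250},
    "25 - 34":    {"Not working": 100,  "Part-time": 75,  "Full-time": 100},
    "35 or over": {"Not working": 10,   "Part-time": 5,   "Full-time": 10},
}
total = sum(sum(row.values()) for row in counts.values())

def prob(condition):
    """P(event), where condition(age, work) is True for cells in the event."""
    favourable = sum(n for age, row in counts.items()
                     for work, n in row.items() if condition(age, work))
    return Fraction(favourable, total)

p_A = prob(lambda age, work: age == "Under 25")
p_C = prob(lambda age, work: age == "35 or over")
p_D = prob(lambda age, work: work == "Not working")
p_D_and_A = prob(lambda age, work: work == "Not working" and age == "Under 25")
p_C_or_F = prob(lambda age, work: age == "35 or over" or work == "Full-time")
p_not_C = 1 - p_C
p_not_C_and_E = prob(lambda age, work: age != "35 or over" and work == "Part-time")

print(p_A, p_C, p_D, p_D_and_A, p_C_or_F, p_not_C, p_not_C_and_E)
```

P(A), P(C) and P(D) are marginal probabilities (row or column totals over 1950), while P(D ∩ A) is a joint probability taken from a single cell of the table.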



4.5 Useful Counting Techniques

(a) The Multiplicative Rule

Theorem (Multiplicative Rule of Counting)
Suppose two sets of elements, sets $A$ and $B$, consist of $n_A$ and $n_B$ distinct elements, respectively; $n_A$ and $n_B$ need not be equal. Then it is possible to form $n_A n_B$ distinct pairs of elements consisting of one element from set $A$ and one element from set $B$, without regard to order within a pair.

Example:
If a take-away food store sells 10 different food items and 5 different types of drink, $10 \times 5 = 50$ distinct food/drink pairs are possible.

The multiplicative rule can be extended naturally. Thus $n_1 n_2 \cdots n_k$ different sets of ‘k’ elements are possible if one selects an element from each of ‘k’ groups consisting of $n_1, n_2, \ldots, n_k$ distinct elements, respectively.

Example:

Suppose we select 5 people at random. What is the probability that they were born on different days of the week, assuming an individual has an equal probability of being born on any of the seven days of the week? (Approx. 0.1499)

A simple event here is an ordered sequence of 5 elements, the first representing the day of the week the first person was born on, the second the day the second person was born on, and so forth.
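The quoted answer follows from two applications of the multiplicative rule. In the sketch below, which is not part of the original notes, the number of equally likely ordered sequences is 7 × 7 × 7 × 7 × 7, and the number with five different days is 7 × 6 × 5 × 4 × 3.

```python
import math

# All ordered sequences of 5 birthdays-of-the-week: 7 choices for each person.
total_sequences = 7 ** 5

# Sequences with all days different: 7 * 6 * 5 * 4 * 3 (math.perm needs Python 3.8+).
favourable_sequences = math.perm(7, 5)

p_all_different = favourable_sequences / total_sequences
print(favourable_sequences, total_sequences, round(p_all_different, 4))
```

This gives 2520/16807, or about 0.1499.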

(b) Permutations

Definition (Permutations)
A permutation is an ordered sequence of elements.

Definition (Factorial Notation)
If ‘N’ is a non-negative integer, we define:

$N! = N(N-1)(N-2) \cdots (3)(2)(1)$

(‘N-factorial’)

And

$0! = 1$

Theorem (Number of Permutations)
The total number of possible distinct permutations (ordered sequences) of ‘R’ elements selected (without replacement) from ‘N’ distinct elements, denoted ${}^{N}P_{R}$, is given by

${}^{N}P_{R} = \frac{N!}{(N-R)!}$

Example:
Consider the numbers 1, 2, 3, 4. How many permutations of these four numbers taken 2 at a time can be found? (12)

(c) Combinations

Definition (Combinations)

A set of ‘R’ elements selected from a set of ‘N’ distinct elements without regard to order is called a combination.

Theorem (Number of Combinations)
The total number of possible combinations of ‘R’ elements selected from a set of ‘N’ distinct elements is given by

${}^{N}C_{R} = \frac{N!}{R!(N-R)!}$

Example:
In how many ways can a committee of 4 people be chosen from a group of 7 people? (35)
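Both counting formulas can be checked against a brute-force enumeration. The sketch below is not part of the original notes; it uses the standard library functions `math.perm` and `math.comb` (Python 3.8 or later) alongside `itertools`.

```python
import math
from itertools import combinations, permutations

# Permutations of the numbers 1, 2, 3, 4 taken 2 at a time: order matters.
ordered_pairs = list(permutations([1, 2, 3, 4], 2))
n_permutations = math.perm(4, 2)   # 4! / (4 - 2)!

# Committees of 4 chosen from 7 people: order does not matter.
committees = list(combinations(range(7), 4))
n_combinations = math.comb(7, 4)   # 7! / (4! * 3!)

print(len(ordered_pairs), n_permutations, len(committees), n_combinations)
```

The enumeration lists (1, 2) and (2, 1) as different permutations, while each committee is counted once regardless of the order in which its members are named.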

(d) Permutations of ‘N’ Non-Distinct Elements

Theorem (Number of Permutations of ‘N’ Non-Distinct Elements)
Consider a set of ‘N’ elements of which $N_1$ are alike, $N_2$ are alike, ..., and $N_r$ are alike, where $N_i \ge 1$ $(i = 1, \ldots, r)$ and $\sum_{i=1}^{r} N_i = N$. Then the number of distinct permutations of these ‘N’ elements is given by

$\frac{N!}{N_1! \, N_2! \cdots N_r!}$

If the above result is specialized to the case where ‘x’ is the number of distinct arrangements (or distinct permutations) of ‘N’ objects where ‘R’ are alike and $(N-R)$ are alike, then

$x = \frac{N!}{R!(N-R)!} = {}^{N}C_{R}$



Example:
Say we have 3 black flags and 2 red flags. How many distinct ways are there of arranging these flags in a row? (10)

Example:
Suppose there are 6 applicants for 2 similar jobs. As the personnel manager is too lazy to assess them, he simply selects 2 of the applicants at random and gives them each a job. What is the probability that he selects one of the 2 best applicants and one of the 4 worst applicants? (8/15)
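Both answers above follow from the counting formulae in this section. The sketch below is one way to verify them; it assumes the six applicants split into the 2 best and the 4 worst, as the example describes, and the names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

# Flags: 5 objects, 3 alike (black) and 2 alike (red): 5! / (3! 2!)
flag_arrangements = factorial(5) // (factorial(3) * factorial(2))

# Applicants: choose 1 of the 2 best and 1 of the 4 worst,
# out of all equally likely ways of choosing 2 of the 6.
p_one_best_one_worst = Fraction(comb(2, 1) * comb(4, 1), comb(6, 2))

print(flag_arrangements, p_one_best_one_worst)
```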



4.6 Conditional Probability

Definition (Conditional Probability)
The probability of event ‘A’ occurring given that event ‘B’ occurs, or the conditional probability of ‘A’ given ‘B’ (has occurred), is denoted $P(A|B)$. Provided $P(B) > 0$, this conditional probability is defined to be

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

Example:
Suppose that a survey of women aged 20-30 years suggests the following joint probability table relating to marital status and desire to become pregnant within the next 12 months.



                          Desire
Marital status    Pregnancy    No pregnancy    Total
Married           0.08         0.47            0.55
Unmarried         0.02         0.43            0.45
Total             0.10         0.90            1.00

Theorem (Multiplicative Law of Probability)

Suppose events ‘A’ and ‘B’ are defined on a sample space. Then

$P(A \cap B) = P(A)P(B|A) = P(B)P(A|B)$

Example:Define events ‘A’ and ‘B’ in the following way:

‘A’: A student achieves a mark of over 65% in a first year statistics exam

‘B’: A student goes on to complete her bachelor's degree.



Suppose past experience indicates

$P(A) = 0.7$, $P(B|A) = 0.88$

4.7 Independence of Events

Sometimes, whether an event ‘B’ has occurred or not will have no effect on the probability of ‘A’ occurring. In this case we say events ‘A’ and ‘B’ are independent.

Definition (Independent and Dependent Events)
Events ‘A’ and ‘B’ are said to be statistically independent if

$P(A \cap B) = P(A)P(B)$

If $P(A \cap B) \ne P(A)P(B)$, the events are said to be statistically dependent.



Alternative Definition (Independent and Dependent Events)

Events ‘A’ and ‘B’ are said to be statistically independent if

$P(A|B) = P(A)$

$P(B|A) = P(B)$

Otherwise the events are said to be statistically dependent.

Example:
Consider the single die tossing experiment again and define the following events:

‘A’: an odd number of dots results
‘B’: a number of dots greater than 2 results

Are ‘A’ and ‘B’ independent?
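One way to answer this question is to enumerate the six outcomes and compare $P(A \cap B)$ with $P(A)P(B)$. The sketch below assumes a fair die (each face has probability 1/6); the helper `prob` is a hypothetical name used only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {x for x in sample_space if x % 2 == 1}  # odd number of dots
B = {x for x in sample_space if x > 2}       # more than 2 dots


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


# Independence holds exactly when P(A and B) equals P(A) * P(B).
independent = prob(A & B) == prob(A) * prob(B)
print(prob(A & B), prob(A) * prob(B), independent)
```

Here $P(A \cap B) = 1/3$ and $P(A)P(B) = (1/2)(2/3) = 1/3$, so under the fair-die assumption the two events are independent.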



4.8 More Useful Probability Rules

(a) The Additive Law of Probability

Theorem (Additive Law of Probability)
For two events ‘A’ and ‘B’ defined on a sample space

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Example:
Again suppose that for the sample space $\{1, 2, 3, 4, 5, 6, 7, 8\}$:

$P(1) = P(2) = P(3) = P(6) = 0.1$

$P(4) = P(7) = P(8) = 0.08$

$P(5) = 0.36$

with $A = \{1, 3, 5, 6\}$, $B = \{2, 3, 4, 5, 8\}$
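The additive law can be checked numerically on this example by computing $P(A \cup B)$ in two ways: from the law itself, and by summing the probabilities of the outcomes in the union. This is a minimal sketch; the dictionary `p` and the helper `prob` are illustrative names.

```python
# Outcome probabilities as given in the example.
p = {1: 0.1, 2: 0.1, 3: 0.1, 6: 0.1, 4: 0.08, 7: 0.08, 8: 0.08, 5: 0.36}

A = {1, 3, 5, 6}
B = {2, 3, 4, 5, 8}


def prob(event):
    """Probability of an event as the sum of its outcome probabilities."""
    return sum(p[outcome] for outcome in event)


via_additive_law = prob(A) + prob(B) - prob(A & B)
direct = prob(A | B)
print(round(via_additive_law, 2), round(direct, 2))
```

Both routes give 0.92, which also equals $1 - P(7)$ because outcome 7 is the only one outside $A \cup B$.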



(b) The Complementation Rule

Theorem 4.7 (Complementation Rule)
Suppose an event ‘E’ and its complement $\bar{E}$ are defined on some sample space. Then

$P(\bar{E}) = 1 - P(E)$

(c) The Law of Total Probability

Theorem (Law of Total Probability)
Suppose a sample space and a set of ‘k’ events $E_1, E_2, \ldots, E_k$ such that

$P(E_i) > 0$ $(i = 1, \ldots, k)$

$E_i \cap E_j = \emptyset$ $(i \ne j)$ (i.e. the events are mutually exclusive)

$E_1 \cup E_2 \cup \cdots \cup E_k$ is the whole sample space (i.e. the events are exhaustive on the sample space)

Then for any event ‘A’ defined on the sample space:

$P(A) = P(E_1 \cap A) + P(E_2 \cap A) + \cdots + P(E_k \cap A)$

$= P(E_1)P(A|E_1) + P(E_2)P(A|E_2) + \cdots + P(E_k)P(A|E_k)$

$= \sum_{j=1}^{k} P(E_j)P(A|E_j)$



MAIN POINTS

In some statistical experiments the number of basic outcomes in the sample space or event of interest can be enumerated by using the ‘multiplicative rule’, permutation or combination formulae, depending on how a basic outcome can be represented most appropriately.

$P(A|B)$ means the probability event ‘A’ occurs given that event ‘B’ has occurred. The conditional probability definition is

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

Multiplicative law of probability:
$P(A \cap B) = P(A)P(B|A) = P(B)P(A|B)$

Events ‘A’ and ‘B’ are statistically independent if the probability of ‘A’ occurring is not affected by whether ‘B’ has occurred.

Events ‘A’ and ‘B’ are independent if
$P(A \cap B) = P(A)P(B)$
or equivalently $P(A|B) = P(A)$

Additive law of probability:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$




LECTURE - WEEK 4

Required Reading:
Ref. File 4: Sections 4.7 to 4.9
Ref. File 5: Introduction and Sections 5.1 to 5.4

4. PROBABILITY THEORY CONTINUED

4.9 Sampling With and Without Replacement

Definition (Random Sample from a Statistical Population)
A random sample of ‘n’ elements from a statistical population is such that every possible combination of ‘n’ elements from the population has an equal probability of being in the sample.

Many experiments involve taking a random sample from a finite population. If we sample with replacement, we effectively return each observation to the population before making the next selection. In this way the population from which we are sampling remains the same from one selection to the next; provided sampling is random, the successive outcomes will be independent.

If we sample without replacement from a finite population, the outcome of any one selection will depend on the outcomes of all previous selections; the population is reduced with each selection.



Example:
Suppose that in a given street 50 residents voted in the last election. Of these, 15 voted for party ‘A’, 30 voted for party ‘B’ and 5 voted for neither party ‘A’ nor ‘B’. Suppose that one evening a candidate for the next election visits the residents of the street to introduce herself. What is the probability that the first two eligible voters she meets voted for party ‘A’ at the last election? (3/35)

Example:

Consider the experiment of successively drawing 2 cards from a deck of 52 playing cards. Define the following events:

$A_1$: ace on first draw

$A_2$: ace on second draw

What is the probability of selecting 2 aces if sampling (drawing) is (i) without replacement, and (ii) with replacement? ((i) 1/221, (ii) 1/169)
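The two sampling schemes differ only in the conditional probability used for the second selection. The following sketch applies the multiplicative law to both examples above with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Two aces WITHOUT replacement: P(A1) * P(A2 | A1) = 4/52 * 3/51
p_without = Fraction(4, 52) * Fraction(3, 51)

# Two aces WITH replacement: the draws are independent, so 4/52 * 4/52
p_with = Fraction(4, 52) * Fraction(4, 52)

# Street example: two party 'A' voters met in a row (without replacement)
p_two_party_a = Fraction(15, 50) * Fraction(14, 49)

print(p_without, p_with, p_two_party_a)
```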



Note: If we simultaneously select a sample of ‘n’ elements, we are effectively sampling without replacement.

4.10 Probability Trees

Tree diagrams can be a useful aid in calculating the probabilities of intersections of events (i.e. joint probabilities).

Example:
Greasy Mo’s take-away food store offers special $10 meal deals consisting of a small pizza or a kebab, together with a can of soft drink, a milkshake or a cup of fruit juice. Past experience has shown that 60% of meal deal buyers choose a pizza (‘P’), 40% choose kebabs (‘K’), 75% choose soft drink (‘S’), 20% choose a milkshake (‘M’) and 5% choose fruit juice (‘J’). Assume the events ‘P’ and ‘K’ are independent of the events ‘S’, ‘M’ and ‘J’. What is the probability that a meal deal customer (chosen at random) will choose a pizza and fruit juice? (0.03)

The tree diagram for this example can be drawn as below.

P: 0.6
    S: 0.75    $P(P \cap S) = 0.6(0.75) = 0.45$
    M: 0.2     $P(P \cap M) = 0.6(0.2) = 0.12$
    J: 0.05    $P(P \cap J) = 0.6(0.05) = 0.03$

K: 0.4
    S: 0.75    $P(K \cap S) = 0.4(0.75) = 0.3$
    M: 0.2     $P(K \cap M) = 0.4(0.2) = 0.08$
    J: 0.05    $P(K \cap J) = 0.4(0.05) = 0.02$
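Each leaf of the tree is the product of the probabilities along its branches, which is valid here because the food and drink choices are assumed independent. A minimal sketch that enumerates every branch (dictionary names are illustrative):

```python
from itertools import product

food = {"P": 0.6, "K": 0.4}
drink = {"S": 0.75, "M": 0.2, "J": 0.05}

# Joint probability of each (food, drink) pair under independence.
joint = {(f, d): food[f] * drink[d] for f, d in product(food, drink)}

for (f, d), prob in joint.items():
    print(f"P({f} and {d}) = {prob:.2f}")
```

The six joint probabilities sum to 1, which is a useful check that the tree covers every possible meal deal.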



5. PROBABILITY DISTRIBUTIONS OF DISCRETE RANDOM VARIABLES

5.1 Probability Distributions and Random Variables

A probability distribution can be considered a theoretical model for a relative frequency distribution of data from a real life population.

A probability distribution thus specifies the probabilities associated with the various outcomes of a statistical experiment. It can take the form of a table, a graph or some formula.

From now on we shall be concerned with the characteristics of probability distributions. However, to facilitate our study we shall now represent simple events and events associated with statistical experiments by values of random variables.

Definition (Random Variable)
A random variable X is a rule that assigns to each simple event of a statistical experiment a unique numerical value.

The above definition can also be expressed in the following slightly more mathematical way.



Alternative Definition (Random Variable)
A random variable X is a real valued function for which the domain is the sample space of a statistical experiment.

In most statistical experiments of interest, outcomes give rise to quantitative data that can be considered values of the random variable being studied.

In experiments which give rise to categorical or qualitative data, a random variable can normally also be defined.

Example:
Consider the experiment of selecting a person at random and noting their hair colour.



Definition (Discrete Random Variable)
A discrete random variable can only assume a finite or infinite and countable number of values.

Definition (Continuous Random Variable)
A continuous random variable can assume any value in an interval (finite or infinite).

Definition (Discrete Probability Distribution)
A discrete probability distribution lists a probability for, or provides a means (e.g. a rule or formula) of assigning a probability to, each value a discrete random variable can take.

Suppose our random variable is called X. Then $P(X = x)$ represents the probability that the random variable takes on the particular value ‘x’.

Properties of the Discrete Probability Distribution of a Random Variable X:

$0 \le P(X = x) \le 1$ for all values of ‘x’

$\sum_{\text{all } x} P(X = x) = 1$



Example:
Consider again the experiment of tossing a fair die once and noting the number of dots on the upward facing side (X).

Definition (Cumulative Distribution Function)
The cumulative distribution function of a random variable X, denoted $F(x)$, is defined as

$F(x) = P(X \le x)$

where ‘x’ is any real number.
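For the fair-die example, the probability distribution and the cumulative distribution function can be written out explicitly. The sketch below is illustrative only; `pmf` and `cdf` are hypothetical names, and the die is assumed fair.

```python
from fractions import Fraction

# P(X = x) for a fair die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(x):
    """F(x) = P(X <= x): sum the probabilities of all values not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


print(cdf(0), cdf(3), cdf(3.5), cdf(6))
```

Note that $F(x)$ is defined for any real ‘x’, not just the values the die can show: it is 0 below 1, steps up by 1/6 at each face value, and equals 1 from 6 onwards.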

5.2 Expected Values of Random Variables

It is of interest to have a measure of the centre of the probability distribution of a random variable X. This role is filled by the expected value of X.



Definition (Expected Value of a Discrete Random Variable)

The expected value of a discrete random variable X is defined as

$E(X) = \sum_{\text{all } x} x \, P(X = x)$

If a statistical experiment considered generates values of the random variable that coincide with values in the population considered, and the theoretical probability distribution of the random variable and population relative frequency distribution are the same, the mean of the theoretical distribution of X will be the same as the population mean (i.e. $\mu$). That is, $E(X) = \mu$.

Example:
Suppose you buy a lottery ticket for $10. The sole prize in the lottery is $100,000 and 100,000 tickets are sold. If the lottery is fair (i.e. each ticket sold has an equal chance of winning), what will be your expected gain from buying the lottery ticket? (-9)
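The expected gain is the probability-weighted sum of the two possible net outcomes: winning the prize less the ticket price, or losing the ticket price. A small sketch with exact arithmetic (names are illustrative):

```python
from fractions import Fraction

tickets = 100_000
prize = 100_000
price = 10

# Net gain and its probability for each possible outcome.
gain_distribution = {
    prize - price: Fraction(1, tickets),       # the single winning ticket
    -price: Fraction(tickets - 1, tickets),    # every other ticket
}

expected_gain = sum(gain * p for gain, p in gain_distribution.items())
print(expected_gain)
```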



Theorem (Expected Value of a Function of a Discrete Random Variable)

Suppose a function $g(X)$ of a discrete random variable X. The expected value of this function, if it exists, is given by

$E[g(X)] = \sum_{\text{all } x} g(x) \, P(X = x)$

Theorem 5.2 (Various Properties of Expected Values)

If ‘c’ is any constant then

$E(c) = c$

If ‘c’ is any constant and $g(X)$ is any function of a discrete or continuous random variable X then

$E[c \, g(X)] = c \, E[g(X)]$

If $g_i(X)$ $(i = 1, \ldots, k)$ are ‘k’ functions of a discrete or continuous random variable X then

$E[g_1(X) + \cdots + g_k(X)] = E[g_1(X)] + \cdots + E[g_k(X)]$

If $h(X)$ and $g(X)$ are two functions of a discrete or continuous random variable X such that $h(X) \le g(X)$ for all X, then

$E[h(X)] \le E[g(X)]$



5.3 The Variance of a Random Variable

To gauge the dispersion of a random variable X about its expected value or mean we can calculate the expected value of its squared distance $(X - E(X))^2$ from the mean. This is called the variance of the random variable X, denoted $Var(X)$.

Definition (Variance of a Random Variable)

The variance of any random variable X (discrete or continuous) is given by

$Var(X) = E[(X - E(X))^2]$

Definition (Standard Deviation of a Random Variable)

The standard deviation of any random variable X (discrete or continuous) is given by

$SD(X) = \sqrt{Var(X)} = \sqrt{E[(X - E(X))^2]}$

Again assuming the probability distribution of X is an accurate representation of the population relative frequency distribution of X, we can write $Var(X) = \sigma^2$, where $\sigma^2$ is the population variance.



An alternative way of writing (and calculating) $Var(X)$ is

$Var(X) = E(X^2) - [E(X)]^2$

$= \sum_{\text{all } x} x^2 \, P(X = x) - [E(X)]^2$ (if X is discrete)

Example:
Suppose a lottery offers 3 prizes: $1,000, $2,000 and $3,000. 10,000 tickets are sold and each ticket has an equal chance of winning a prize. Calculate the variance and standard deviation of the random variable X representing the value of the prize won by a ticket. (1399.64, 37.4118)

x        P(X = x)      x^2          x P(X = x)    x^2 P(X = x)
0        9997/10000    0            0             0
1,000    1/10000       1,000,000    0.1           100
2,000    1/10000       4,000,000    0.2           400
3,000    1/10000       9,000,000    0.3           900
Total                               0.6           1400
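The table's totals feed directly into the alternative variance formula: $Var(X) = E(X^2) - [E(X)]^2 = 1400 - 0.6^2$. The sketch below repeats the calculation; it is an illustrative check only and the variable names are hypothetical.

```python
from fractions import Fraction
from math import sqrt

# Prize value and its probability for a single ticket.
pmf = {
    0: Fraction(9997, 10000),
    1000: Fraction(1, 10000),
    2000: Fraction(1, 10000),
    3000: Fraction(1, 10000),
}

mean = sum(x * p for x, p in pmf.items())              # E(X)
mean_of_squares = sum(x**2 * p for x, p in pmf.items())  # E(X^2)

variance = mean_of_squares - mean**2
standard_deviation = sqrt(float(variance))

print(float(mean), float(mean_of_squares), float(variance), standard_deviation)
```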



If we wish to determine the variance of a linear function $Y = g(X) = a + bX$ of a random variable X, the following rule can be used

$Var(Y) = Var(a + bX) = b^2 \, Var(X)$



5.4 The Binomial Distribution

The binomial distribution is a discrete probability distribution based on ‘n’ repetitions of an experiment whose outcomes are represented by a Bernoulli random variable.

(a) Bernoulli Experiments

A Bernoulli experiment (or trial) is such that only 2 outcomes are possible. These outcomes can be denoted success (‘S’) and failure (‘F’), with probabilities ‘p’ and $(1 - p)$, respectively.

A Bernoulli random variable Y is usually defined so that it takes the value 1 if the outcome of a Bernoulli experiment is a success, and the value 0 if the outcome is a failure. Thus

$P(Y = 1) = p$

$P(Y = 0) = (1 - p)$

The mean and variance of a Bernoulli random variable defined in the above way are

$E(Y) = p$

$Var(Y) = p(1 - p)$



(b) Binomial Experiments

Definition (Binomial Experiment)
A binomial experiment fulfils the following requirements:

(i) There are ‘n’ repetitions or ‘trials’ of a Bernoulli experiment for which there are only two outcomes, ‘success’ or ‘failure’.
(ii) All trials are performed under identical conditions.
(iii) The trials are independent.
(iv) The probability of success ‘p’ is the same for each trial.
(v) The random variable of interest, say X, is the number of successes observed in the ‘n’ trials.

Theorem (The Binomial Probability Function)
Let X represent the number of successes in a binomial experiment consisting of ‘n’ trials and with a probability ‘p’ of success on each trial. The probability of ‘x’ successes in such an experiment is given by

$P(X = x) = {}_n C_x \, p^x (1 - p)^{n - x}$ for $x = 0, 1, 2, 3, \ldots, n$



Example:
A company that supplies reverse-cycle air conditioning units has found from experience that 70% of the units it installs require servicing within the first 6 weeks of operation. In a given week the firm installs 10 air conditioning units. Calculate the probability that, within 6 weeks

5 of the units require servicing (0.1029 approx.)

none of the units require servicing (0 approx.)

all of the units require servicing (0.0282 approx.)
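The three answers come from evaluating the binomial probability function with $n = 10$ and $p = 0.7$. A minimal sketch follows; `binomial_pmf` is a hypothetical helper written for illustration, not a library function.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x) for a binomial(n, p) variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.7
p_five = binomial_pmf(5, n, p)
p_none = binomial_pmf(0, n, p)
p_all = binomial_pmf(10, n, p)

print(round(p_five, 4), round(p_none, 6), round(p_all, 4))
```

The probability that none of the units require servicing is $0.3^{10}$, which is about 0.000006 and so rounds to 0 at four decimal places.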



(c) Cumulative Binomial Probabilities

(Extract of Appendix 3)

CUMULATIVE BINOMIAL PROBABILITIES: $P(X \le x \mid p, n)$

                                          p
 n    x    0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40    ....   0.70
 1    0    0.9500  0.9000  0.8500  0.8000  0.7500  0.7000  0.6500  0.6000  ....   0.3000
      1    1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  ....   1.0000

 2    0    0.9025  0.8100  0.7225  0.6400  0.5625  0.4900  0.4225  0.3600  ....   0.0900
      1    0.9975  0.9900  0.9775  0.9600  0.9375  0.9100  0.8775  0.8400  ....   0.5100
      2    1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  ....   1.0000

 3    0    0.8574  0.7290  0.6141  0.5120  0.4219  0.3430  0.2746  0.2160  ....   0.0270
      1    0.9928  0.9720  0.9393  0.8960  0.8438  0.7840  0.7183  0.6480  ....   0.2160
      2    0.9999  0.9990  0.9966  0.9920  0.9844  0.9730  0.9571  0.9360  ....   0.6570
      3    1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  ....   1.0000

 10   0    0.5987  0.3487  0.1969  0.1074  0.0563  0.0282  0.0135  0.0060  ....   0.0000
      1    0.9139  0.7361  0.5443  0.3758  0.2440  0.1493  0.0860  0.0464  ....   0.0001
      2    0.9885  0.9298  0.8202  0.6778  0.5256  0.3828  0.2616  0.1673  ....   0.0016
      3    0.9990  0.9872  0.9500  0.8791  0.7759  0.6496  0.5138  0.3823  ....   0.0106
      4    0.9999  0.9984  0.9901  0.9672  0.9219  0.8497  0.7515  0.6331  ....   0.0473
      5    1.0000  0.9999  0.9986  0.9936  0.9803  0.9527  0.9051  0.8338  ....   0.1503
      6    1.0000  1.0000  0.9999  0.9991  0.9965  0.9894  0.9740  0.9452  ....   0.3504
      7    1.0000  1.0000  1.0000  0.9999  0.9996  0.9984  0.9952  0.9877  ....   0.6172
      8    1.0000  1.0000  1.0000  1.0000  1.0000  0.9999  0.9995  0.9983  ....   0.8507
      9    1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  0.9999  ....   0.9718
      10   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  ....   1.0000

Example:
Referring to the previous air conditioning unit example, calculate the probability that within 6 weeks of installation

less than 8 of the air conditioners require servicing. (0.6172 approx.)

4 or more of the air conditioners require servicing. (0.9894 approx.)



Example:
Referring to the previous air conditioning unit example, use the cumulative binomial tables to calculate the probability that within 6 weeks of installation

5 units require servicing (0.103)

10 units require servicing (0.0282)
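The table look-ups in the two cumulative examples can be mimicked by summing the binomial probability function. In the sketch below, `binomial_cdf` is a hypothetical helper standing in for the cumulative table; "less than 8" is read as $P(X \le 7)$, "4 or more" as $1 - P(X \le 3)$, and an exact value such as $P(X = 5)$ as the difference $P(X \le 5) - P(X \le 4)$.

```python
from math import comb


def binomial_cdf(x, n, p):
    """P(X <= x) for a binomial(n, p) variable, by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


n, p = 10, 0.7

p_less_than_8 = binomial_cdf(7, n, p)
p_4_or_more = 1 - binomial_cdf(3, n, p)
p_exactly_5 = binomial_cdf(5, n, p) - binomial_cdf(4, n, p)
p_exactly_10 = binomial_cdf(10, n, p) - binomial_cdf(9, n, p)

print(round(p_less_than_8, 4), round(p_4_or_more, 4),
      round(p_exactly_5, 4), round(p_exactly_10, 4))
```

With the tabulated values the last two differences are 0.1503 - 0.0473 = 0.1030 and 1.0000 - 0.9718 = 0.0282.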

(d) Characteristics of the Binomial Distribution

Theorem (Mean and Variance of a Binomial Random Variable)
Let X represent the number of successes in a binomial experiment consisting of ‘n’ trials, and where the probability of success on each trial is ‘p’. Then

$E(X) = np$

$Var(X) = np(1 - p)$



Each combination of ‘n’ and ‘p’ gives a particular binomial distribution. We say ‘n’ and ‘p’ are the parameters of the binomial distribution.

If $p = 0.5$, the binomial distribution is symmetric.

Example:
Suppose $n = 5$ and $p = 0.5$.

(Probability histogram: probability on the vertical axis, with bar heights 0.0313, 0.1563 and 0.3125; $X = 0, 1, 2, 3, 4, 5$ on the horizontal axis.)

The binomial distribution will be skewed to the left (i.e. ‘negatively skewed’) if $p > 0.5$, and skewed to the right (i.e. ‘positively skewed’) if $p < 0.5$. In either case the tendency to be skewed diminishes as ‘n’ increases.



The binomial distribution is a model for the relative frequency (probability) distribution of numbers of successes in ‘n’ trials of a Bernoulli experiment.

The binomial distribution can be represented by the probability function

$P(X = x) = {}_n C_x \, p^x (1 - p)^{n - x}$

where ‘n’ is the number of trials, ‘x’ the number of successes and ‘p’ the probability of success at each trial.




LECTURE - WEEK 5

Required Reading:
Ref. File 6: Introduction and Sections 6.1 to 6.4

6. CONTINUOUS PROBABILITY DISTRIBUTIONS

6.1 Introduction

From now on we shall be mainly concerned with studying the distributions of continuous random variables. As we have noted, a continuous random variable can assume any value in a given interval.

The probability distribution for a continuous random variable X will have a smooth curve or line as its graphical representation. The heights of the points on this curve will be given by a function of ‘x’, denoted $f(x)$, which is variously called the probability density function, the probability distribution or simply the density function of the random variable X.

Areas under a density function $f(x)$ represent probabilities of X taking on values in the corresponding intervals.



(Figure: graph of the density function $y = f(x)$ against X; the area under the curve between ‘a’ and ‘b’ is $P(a \le X \le b)$.)

Properties of Density Functions
If $f(x)$ is a valid density function, it satisfies the following two properties:

(i) $f(x) \ge 0$ for all x

(ii) $\int_{-\infty}^{\infty} f(x) \, dx = 1$

Note: For a continuous random variable the probability associated with any particular value of the variable is 0.

The mean and variance of a continuous random variable are normally determined using calculus.



6.2 The Uniform Distribution

“If a random variable X can take on any value in a given finite interval $a \le x \le b$ and the probability of the variable taking a value in a given finite sub-interval is the same as the probability the variable takes a value in any other finite sub-interval of the same width, we say the variable X is uniformly distributed.” We have the following formal definition.

Definition (Uniform Random Variable)
A continuous random variable X is said to be uniformly distributed over the finite interval $a \le X \le b$ if and only if its density function is given by

$f(x) = \begin{cases} \frac{1}{b - a}, & \text{if } a \le x \le b \\ 0, & \text{if } x < a \text{ or } x > b \end{cases}$

We can calculate probabilities with respect to the random variable X in the above definition from

$P(c \le X \le d) = \frac{d - c}{b - a}$ for $c \ge a$, $d \le b$


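(Figure: uniform density $f(x)$ of constant height $1/(b-a)$ over the interval from ‘a’ to ‘b’, with total area = 1 and the points ‘c’ and ‘d’ marked between ‘a’ and ‘b’.)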

Theorem (Expected Value and Variance of a Uniform Random Variable)

Suppose the random variable X is uniformly distributed over the finite interval $a \le x \le b$. The expected value and variance of X are, respectively

$E(X) = \frac{a + b}{2}$

$Var(X) = \frac{(b - a)^2}{12}$

Example:

The amount of petrol sold daily by a service station (say X) is known to be uniformly distributed between 4,000 and 6,000 litres inclusive. What is the probability of sales on any one day being between 5,500 and 6,000 litres? (0.25)
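For a uniform variable the probability of an interval is simply its width divided by the width of the whole range. The sketch below applies this to the petrol example; `uniform_prob` is a hypothetical helper and assumes $a \le c \le d \le b$.

```python
def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b], assuming a <= c <= d <= b."""
    return (d - c) / (b - a)


a, b = 4000, 6000  # litres

p_sales = uniform_prob(5500, 6000, a, b)
mean = (a + b) / 2            # E(X)
variance = (b - a) ** 2 / 12  # Var(X)

print(p_sales, mean, round(variance, 2))
```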



6.3 The Normal (Gaussian) Distribution

The normal distribution represents a family of “bell-shaped” distributions that are distinguished according to their mean and variance.

Definition (Normally Distributed Random Variable)
A random variable X is normally distributed if and only if it has a density function of the following form:

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2}}$ for all real ‘x’

Where:

$\mu$ and $\sigma^2$ are parameters of the distribution of X. They are used to represent $E(X)$ and $Var(X)$, respectively.

‘e’ is the irrational number that serves as the base for natural logarithms ($e = 2.7182...$)

$\pi$ is the irrational number representing the ratio of the circumference of a circle to its diameter ($\pi = 3.1415...$)

A normal distribution with mean $\mu$ and variance $\sigma^2$ is usually denoted $N(\mu, \sigma^2)$.

The normal distribution has a positive density for all real ‘x’. Therefore it can, strictly speaking, never exactly match the distribution of a variable that only takes on non-negative values. However, even in such cases it can often give a very good approximation.



The normal distribution is symmetric about $\mu$.

(Figure: graph of the normal density function $y = f(x)$.)

For any normal distribution it will be the case that, approximately:

68% of its values will fall within one standard deviation ($\sigma$) of $\mu$.

95.5% of its values will fall within two standard deviations ($2\sigma$) of $\mu$.

99.7% of its values will fall within three standard deviations ($3\sigma$) of $\mu$.

Computing areas under a normal density function is difficult, but we can use a table showing probabilities associated with the standardised normal random variable (many calculators and Microsoft Excel are also able to calculate these probabilities).
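The three percentages can also be reproduced without a table. The sketch below uses the identity $P(|Z| \le k) = \mathrm{erf}(k/\sqrt{2})$ for a standard normal variable, with `math.erf` from the Python standard library; `within_k_sd` is a hypothetical helper name.

```python
from math import erf, sqrt


def within_k_sd(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normally distributed X."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))
```

The result does not depend on $\mu$ or $\sigma$, which is why a single table for the standardised variable is enough.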



The standard normal distribution has a mean of 0 and a variance (and standard deviation) of 1. A standard normal variable is often denoted Z. Thus

$Z \sim N(0, 1)$

Probabilities relating to $X \sim N(\mu, \sigma^2)$ can be calculated by first calculating the standardised Z scores corresponding to the value(s) of X and then using the standard normal probability table. This is formalized by the following theorem.

Theorem 6.2 (The Standardizing Transformation of Non-Standard Normal Probabilities)
A random variable X is normally distributed with mean $\mu$ and variance $\sigma^2$ if and only if $Z = \frac{X - \mu}{\sigma}$ is a standard normal random variable, that is

$X \sim N(\mu, \sigma^2)$ if and only if $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$



Also note that a linear function of a normal variable is also normally distributed.

(Extract of Appendix 5)
AREAS UNDER THE STANDARD NORMAL DISTRIBUTION

The table below gives areas under the standard normal distribution between 0 and z.

(Figure: standard normal density with the area between 0 and z marked on the Z axis.)

 z      0      1      2      3      4      5      6      7      8      9
0.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0754
0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6   .2258  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2518  .2549
0.7   .2580  .2612  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8   .2881  .2910  .2939  .2967  .2996  .3023  .3051  .3078  .3106  .3133
0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6   .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
....  .....  .....  .....  .....  .....  .....  .....  .....  .....  .....
3.8   .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999
3.9   .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000

Example:
If $Z \sim N(0, 1)$ determine the following probabilities:

$P(Z > 0)$ (0.5)

$P(Z > 0.5)$ (0.3085)

$P(-0.1 < Z < 0.9)$ (0.3557)

$P(Z > 1.64)$ (0.0505)



Example:
If $X \sim N(12, 4)$, calculate $P(X < 6.26)$, $P(7 < X < 13)$ and $P(X > 15.5)$. (0.0021, 0.6853, 0.0401)
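The standardizing transformation can be sketched in code as follows. This assumes the example concerns $X \sim N(12, 4)$ (mean 12, variance 4, so $\sigma = 2$) and the events $X < 6.26$, $7 < X < 13$ and $X > 15.5$, which is the reading consistent with the quoted answers. The helper names are hypothetical, and the standard normal cumulative probability is computed from `math.erf` rather than read from the table.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """P(Z <= z) for a standard normal variable Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the Z score (x - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)


mu, sigma = 12, sqrt(4)

p_below = normal_cdf(6.26, mu, sigma)                                # z = -2.87
p_between = normal_cdf(13, mu, sigma) - normal_cdf(7, mu, sigma)     # z from -2.5 to 0.5
p_above = 1 - normal_cdf(15.5, mu, sigma)                            # z = 1.75

print(round(p_below, 4), round(p_between, 4), round(p_above, 4))
```

Small differences in the last decimal place relative to table-based answers are expected, since the table is rounded to four places.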



Example:
From several years’ records, a fish market manager has determined that the weight of deep sea bream sold in the market (X) is approximately normally distributed with a mean of 420 grams and a standard deviation of 80 grams. Assuming this distribution will remain unchanged in the future, calculate the expected proportions of deep sea bream sold over the next year weighing

(a) Between 300 and 400 grams. (0.3345)

(b) Between 300 and 500 grams. (0.7745)

(c) More than 600 grams. (0.0122)



Example:
It is known that 60% of cars registered in a given town use unleaded petrol. A random sample of 200 cars is selected. Determine the probability that, of the cars in the sample:

130 use unleaded petrol. (0.021)

more than 130 use unleaded petrol. (0.0643)

less than 130 use unleaded petrol. (0.9147)
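The quoted answers are consistent with a normal approximation to the binomial with continuity correction, using $\mu = np = 120$ and $\sigma^2 = np(1-p) = 48$. The sketch below is written under that assumption: "exactly 130" is treated as the interval 129.5 to 130.5, "more than 130" as above 130.5, and "less than 130" as below 129.5. Helper names are hypothetical.

```python
from math import erf, sqrt

n, p = 200, 0.6
mu = n * p                     # 120
sigma = sqrt(n * p * (1 - p))  # sqrt(48)


def normal_cdf(x):
    """P(X <= x) for the approximating N(mu, sigma^2) distribution."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


p_exactly_130 = normal_cdf(130.5) - normal_cdf(129.5)
p_more_than_130 = 1 - normal_cdf(130.5)
p_less_than_130 = normal_cdf(129.5)

print(round(p_exactly_130, 4), round(p_more_than_130, 4), round(p_less_than_130, 4))
```

Because this code does not round the Z scores to two decimal places before looking them up, its results differ slightly in the third or fourth decimal place from the table-based answers in the example.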



MAIN POINTS

The graphical representation of a continuous random variable is the graph of its density function; this is the counterpart of the probability histogram for a discrete random variable.

The probability that a continuous random variable takes on a value in some range is given by an area under the density function.

The uniform distribution has a constant density function.

If X is normally distributed with a mean $\mu$ and variance $\sigma^2$, we can write this information as

$X \sim N(\mu, \sigma^2)$

The standard normal random variable Z is such that

$Z \sim N(0, 1)$

Areas under a normal density function can be calculated with reference to the ‘standard normal table’, and making use of the symmetry of the distribution as needed.

The normal distribution can be used to approximate binomial probabilities provided $np \ge 5$ and $n(1 - p) \ge 5$ (with $\mu = np$ and $\sigma^2 = np(1 - p)$); the approximation can be improved by using continuity correction.




LECTURE - WEEK 6

Required Reading:
Ref. File 7: Introduction and Sections 7.1 to 7.4

7. INTRODUCTION TO ESTIMATION

7.1 Estimators and Their Properties

From now on we will mainly be concerned with ‘random samples of random variables’.

Definition (Random Sample of Size ‘n’ of a Random Variable)
Consider a set of random variables $X_1, X_2, \ldots, X_n$. This set of random variables is said to represent a random sample of size ‘n’ of the random variable X if

(i) $X_1, X_2, \ldots, X_n$ are all statistically independent

and

(ii) $X_1, X_2, \ldots, X_n$ each have the same probability distribution (or distribution function) as the random variable X.

We will mostly use an upper case italicized letter to denote a random variable, and a lower case non-italicized letter to denote an actual realization or value of the variable.



Definition (Sample Statistic)
Suppose the random variables $X_1, X_2, \ldots, X_n$ are associated with a sample of size ‘n’ from a statistical population. Then any function of (or formula containing) $X_1, X_2, \ldots, X_n$ that does not depend on any unknown parameter is called a sample statistic.

Definition (Estimator/Estimate of a Population Parameter)
Suppose the random variables $X_1, X_2, \ldots, X_n$ are associated with a sample of size ‘n’ from a statistical population. Then a sample statistic involving $X_1, X_2, \ldots, X_n$ that is used to estimate a parameter of the population or associated probability distribution is called an estimator of the parameter, and a realization of the sample statistic (an actual number) is called an estimate of the parameter.



Definition (Sample Mean and Variance of a Random Variable)

Suppose the random variables $X_1, X_2, \ldots, X_n$ represent a random sample of size ‘n’ of the random variable X. The sample mean and variance of X are then defined as, respectively

Sample Mean of X: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Sample Variance of X: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

If an estimator is used to obtain a single value estimate of a parameter, this estimate is called a point estimate.

An interval estimate describes a range, or interval, of values in which the population parameter is believed to be.

An interval estimate is normally centred around a point estimate.

Since estimators are functions of random variables, they will also be random variables whose values vary from sample to sample. The probability distribution of an estimator is called a sampling distribution.



Most statistical inference is based on a knowledge of the sampling distributions of estimators.

Properties of Estimators

Definition (Unbiased Estimator)
Consider an estimator $\hat{\theta}$ of some population parameter $\theta$. $\hat{\theta}$ is an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$. If $E(\hat{\theta}) \ne \theta$, $\hat{\theta}$ is said to be a biased estimator of $\theta$ with the value of the bias given by $B = E(\hat{\theta}) - \theta$.

($\theta$ is the lower case version of the Greek letter ‘theta’)



Definition (Relative Efficiency of an Estimator)

If $\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased estimators of a population parameter $\theta$ with unequal variances, $\hat{\theta}_1$ is said to be relatively more efficient than $\hat{\theta}_2$ if

$Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$

Definition (Consistency of an Estimator)
An estimator $\hat{\theta}$ of some population parameter $\theta$ is said to be a consistent estimator of $\theta$ if, as the (random) sample size increases, the probability increases of the estimator yielding an estimate in some arbitrary fixed interval, however small, centred round the true parameter value.

Theorem (Sufficient Condition for Consistency of an Estimator)

An estimator $\hat{\theta}$ of some population parameter $\theta$ is a consistent estimator of this parameter if

$\lim_{n \to \infty} E(\hat{\theta}) = \theta$ and $\lim_{n \to \infty} Var(\hat{\theta}) = 0$



7.2 The Sampling Distribution of the Sample Mean

Example:
Suppose we know that in a large city 20% of households possess no car, 60% possess one car and 20% possess two cars. If we let X be the number of cars in a household we can write the probability distribution of X as

x           0      1      2
P(X = x)    1/5    3/5    1/5

Determine the sampling distribution of $\bar{X}$ based on random samples of size 2.

x_1    x_2    x̄      P((X_1 = x_1) ∩ (X_2 = x_2))
0      0      0      1/5 × 1/5 = 1/25
0      1      0.5    1/5 × 3/5 = 3/25
0      2      1      1/5 × 1/5 = 1/25
1      0      0.5    3/5 × 1/5 = 3/25
1      1      1      3/5 × 3/5 = 9/25
1      2      1.5    3/5 × 1/5 = 3/25
2      0      1      1/5 × 1/5 = 1/25
2      1      1.5    1/5 × 3/5 = 3/25
2      2      2      1/5 × 1/5 = 1/25


x̄            0       0.5     1        1.5     2
P(X̄ = x̄)     1/25    6/25    11/25    6/25    1/25
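The sampling distribution of the sample mean for samples of size 2 can be built mechanically: enumerate every ordered pair of values, multiply their probabilities (the two observations are independent), and accumulate the probability for each resulting mean. A sketch with illustrative names:

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Distribution of X, the number of cars in a household.
pmf = {0: Fraction(1, 5), 1: Fraction(3, 5), 2: Fraction(1, 5)}

# Accumulate P(sample mean = value) over all ordered samples (x1, x2).
sampling_dist = defaultdict(Fraction)
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    sample_mean = Fraction(x1 + x2, 2)
    sampling_dist[sample_mean] += p1 * p2

for value in sorted(sampling_dist):
    print(float(value), sampling_dist[value])
```

The expected value of this sampling distribution is 1, the same as $E(X)$, in line with the unbiasedness of the sample mean.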

Theorem (The Central Limit Theorem)
Consider a random sample $X_1, X_2, \ldots, X_n$ of size ‘n’ of a random variable X with a finite mean $E(X) = \mu$ and a finite variance $Var(X) = \sigma^2$. Then:

(i) If X is (exactly) normally distributed, the sample mean $\bar{X}$ will be exactly normally distributed with a mean $\mu$ and a variance $\sigma^2 / n$.

(ii) If X is not normally distributed, the sample mean $\bar{X}$ will be approximately normally distributed with a mean $\mu$ and a variance $\sigma^2 / n$ for large sample sizes. This approximation is generally considered to be valid when $n \ge 30$.



Note: $\mathrm{Var}(\bar{X})$ decreases as ‘n’ increases and approaches zero in the limit. This, together with the fact that $\bar{X}$ is unbiased, ensures that $\bar{X}$ is a consistent estimator of $\mu$.

Note: The standard deviation of an estimator is often called the standard error of the estimator, although often this term is used for an estimate of the standard deviation of an estimator.

Example:
A particular type of light bulb has a mean life of 6,000 hours and a standard deviation of bulb life of 400 hours. What percentage of random samples made up of 100 observations of bulb lives will yield mean bulb lives between 5,950 and 6,050 hours? (78.88%)
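The calculation behind this example can be checked with Python's standard library. This is a sketch that assumes the central limit theorem applies (n = 100), so that the sample mean is treated as normal with mean 6,000 and standard error 400/√100 = 40; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, n = 6000, 400, 100

# Standard error of the sample mean: sigma / sqrt(n) = 400 / 10 = 40 hours.
standard_error = sigma / n ** 0.5

# Approximate sampling distribution of the sample mean.
xbar_dist = NormalDist(mu, standard_error)

# P(5950 < sample mean < 6050), i.e. P(-1.25 < Z < 1.25).
prob = xbar_dist.cdf(6050) - xbar_dist.cdf(5950)
print(round(100 * prob, 2))
```

The computed value is about 78.87%; the 78.88% quoted above corresponds to doubling the four-decimal table value 0.3944 for z = 1.25.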



8. INTERVAL ESTIMATION

8.1 Introduction and Terminology

A confidence interval not only comprises an interval of possible population parameter values, but also some measure of the degree of belief or confidence that the interval does indeed contain the parameter in question.

The level of confidence associated with a confidence interval is the probability, evaluated before we actually take a sample, that we will obtain a realization of the interval that contains the population parameter. It is usually denoted $(1 - \alpha)100\%$, where $\alpha$ is the probability ($0 < \alpha < 1$) of obtaining a realization of the interval that does not contain the population parameter.

Confidence intervals are constructed on the basis of knowledge of the sampling distribution of the estimator (or some function thereof) and a predetermined $\alpha$.

The $z_\alpha$ Notation:

$z_\alpha$ is used to denote the value of the standard normal variable Z such that

$$P(Z > z_\alpha) = \alpha$$
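Values of this kind are normally read from standard normal tables, but they can also be computed. The sketch below takes $z_\alpha$ to be the point with upper-tail probability $\alpha$ under the standard normal distribution; the helper name `z_value` is hypothetical.

```python
from statistics import NormalDist

def z_value(alpha):
    """Return z_alpha: the point with upper-tail area alpha under N(0, 1)."""
    # Upper-tail area alpha means cumulative probability 1 - alpha.
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.10, 0.05, 0.025, 0.005):
    print(alpha, round(z_value(alpha), 3))
```

For a two-sided interval with confidence level (1 − α)100%, the relevant value is `z_value(alpha / 2)`, for example about 1.96 when α = 0.05.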



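$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

Therefore, for a given $\alpha$:

$$
\begin{aligned}
P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) &= 1 - \alpha \\
P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < z_{\alpha/2}\right) &= 1 - \alpha \\
P\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha \\
P\left(z_{\alpha/2}\frac{\sigma}{\sqrt{n}} > \mu - \bar{X} > -z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha \\
P\left(\bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} > \mu > \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha \\
P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha \qquad (*)
\end{aligned}
$$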

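Thus the $(1 - \alpha)100\%$ confidence interval for an observed $\bar{x}$ is given by

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \qquad (**)$$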
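The interval can be computed directly once the sample mean is observed. The sketch below assumes a known population standard deviation and a normal (or approximately normal) sampling distribution for the sample mean; the function name `mean_confidence_interval` and the observed mean of 5,980 hours are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(xbar, sigma, n, alpha=0.05):
    """(1 - alpha)100% confidence interval for the mean with known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)          # z_{alpha/2} * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical data: 100 bulb lives with sample mean 5,980 hours, sigma = 400 hours.
lower, upper = mean_confidence_interval(xbar=5980, sigma=400, n=100, alpha=0.05)
print(round(lower, 1), round(upper, 1))
```

With these illustrative inputs the 95% interval is roughly 5,901.6 to 6,058.4 hours, that is, the observed mean plus or minus about 1.96 × 40.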



8.3 Properties of Confidence Intervals

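The width of a confidence interval for the population mean, where we are justified in using the normal distribution, is given by

$$(\bar{x} + z_{\alpha/2}\sigma_{\bar{X}}) - (\bar{x} - z_{\alpha/2}\sigma_{\bar{X}}) = 2 z_{\alpha/2}\sigma_{\bar{X}} = 2 z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$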

For a given confidence level and a given $\sigma$, the confidence interval width decreases with increasing ‘n’. This leads to a criterion for choosing ‘n’.

If we wish to use a calculated $\bar{x}$ to estimate $\mu$ to within ‘D’ (units) with $(1 - \alpha)100\%$ confidence, we should choose ‘n’ such that

$$n \geq \left(\frac{z_{\alpha/2}\,\sigma}{D}\right)^2$$

(assuming a normally distributed population or an $n \geq 30$)



Example:
A clothing shop located in a busy shopping arcade is interested in estimating the mean age of people who frequent the arcade. The shop intends to use this information in determining the appropriate range of clothing it should stock in order to maximize sales. A sample of people is to be selected at random in the arcade and questioned by the shop manager about their age. What should the sample size be if the shop manager wishes to use a calculated $\bar{x}$ to estimate the average age of people who frequent the arcade to within 1.5 years, with 95% confidence, assuming the population standard deviation is approximately 7.5? (97)
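The sample size for this example can be reproduced with a short calculation. This sketch applies the criterion that n must be at least (z σ / D) squared, where z is the standard normal value with upper-tail area α/2, rounding up to the next whole number; the function name `required_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, d, alpha=0.05):
    """Smallest n giving a margin of error of at most d at confidence (1 - alpha)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = 0.05
    return ceil((z * sigma / d) ** 2)         # round up: n must be a whole number

# Arcade example: sigma = 7.5 years, D = 1.5 years, 95% confidence.
n = required_sample_size(sigma=7.5, d=1.5, alpha=0.05)
print(n)
```

Here (1.96 × 7.5 / 1.5)² is about 96.04, so rounding up gives the 97 quoted in the example.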



MAIN POINTS

A random sample of a random variable is such that the random variables representing the sample are independently and identically distributed.

An estimator of a population parameter is a formula containing the random variables representing sample values.

The probability distribution of an estimator is called a sampling distribution.

An unbiased estimator of a parameter has a mean equal to the parameter value.

A consistent estimator has a probability distribution that becomes ‘more concentrated’ around the true parameter value as n tends to infinity.

$E(\bar{X}) = \mu$, and $\mathrm{Var}(\bar{X}) = \sigma^2/n$ (if the $X_i$'s represent a random sample of the random variable X)

$\bar{X}$ is an unbiased and consistent estimator of $\mu$.

The central limit theorem says that even if we are sampling from a non-normal distribution, the distribution of the sample mean will be approximately normal provided the sample size is sufficiently large.
