8/18/2019 IEM Outline Lecture Notes Autumn 2016
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 1
Required Reading:
Ref. File 1: Section 1.13
Ref. File 3: Introduction and Sections 3.1 to 3.4, 3.7
KEYS TO PASSING THIS UNIT:
(i) Undertake the required reading from the reference files each week. (It may be necessary to read some sections more than once.) Allow approximately 4 hours per week.
(ii) Carefully study lecture material and take notice of advice given in lectures.
(iii) Attempt tutorial exercises before tutorials and work out where you have difficulties, which hopefully can be resolved in tutorials.
(iv) Make a conscious effort to keep up with the material presented.
1. INTRODUCTION TO UNIT
1.1 How Can We Define Statistics?
Statistics, for our purposes, encompasses the following major activities:
(i) Collection and description of information, or data: “descriptive statistics”. We will normally be dealing with a subset of a larger collection or set of data. The subset is called a sample, the larger set a population.
(ii) Using sample data to make inferences about a population: “statistical inference”.
1.2 Why Study Statistics?
(i) (Major) It can be useful. It can help us to make decisions in the face of uncertainty.
(ii) People are bombarded with statistics all the time. Often statistics are used in ways that are not warranted. It is important not to be fooled by people who misuse statistics.
(iii) It is important to have a clear understanding of the strengths and limitations of statistical analysis.
1.3 Structure of the Subject
Descriptive Statistics:
How we summarise the characteristics of raw data (using graphs, summary measures, etc.)
Probability Theory and Probability Distributions (“deductive statistics”):
Rules (or axioms) for calculating probabilities of certain things (called events) happening.
Probability theory can be considered part of descriptive statistics.
Here we will be concerned with making probability statements about a given population.
Sampling Theory and Sampling Distributions (the basis of “inductive statistics”):
Here we will be concerned with making probability statements about characteristics of samples, given assumptions about the population from which the sample was drawn.
Point and Interval Estimation:
Point Estimation - Here we will be concerned with producing a particular estimate (a number), based on sample data, of a characteristic of a population.
Interval Estimation - Here we will not give a single estimate of a population characteristic, but rather a range in which we are confident (to some degree) the true value of the population characteristic lies.
Hypothesis Testing:
Under this heading we will be looking at ways of testing hypotheses about characteristics of populations, based on sample data.
Regression Analysis:
In this case we will be concerned with estimating linear relationships between different variables, i.e. linear equations.
We will go on to examine statistical tests associated with estimated regression equations.
Introduction to Differential Calculus
2. DESCRIPTIVE STATISTICS
2.1 Some Basic Definitions Relating to Data
(i) Elementary Units and Frames:
Statistical data normally represents measurements or observations of a certain characteristic or variable of interest of each member of a set of objects or people.
Each object (or person) for which the characteristic is or can be measured is called an elementary unit.
The set or listing of all possible elementary units is called a frame.
(ii) Population/Sample:
A statistical population is the set of measurements or observations of a characteristic of interest for all elementary units in a frame.
A population may comprise a finite or infinite number of elements (observations), depending on the context.
A statistical sample is a subset of a population.
(iii) Parameters/Statistics:
For our purposes:
The numerical characteristics which describe a population are called parameters of the population.
The numerical values calculated from sample data are called sample statistics. These sample statistics can be thought of as describing or characterizing the sample.
(iv) Qualitative and Quantitative Variables:
Populations may be quantitative or qualitative. Data from quantitative populations is called quantitative or interval data. Data from qualitative populations is called qualitative, nominal or categorical data.
Data from a quantitative population can be expressed numerically in a meaningful way. The variable (or characteristic) associated with a quantitative population is called a quantitative variable.
Data from qualitative populations cannot be expressed numerically in a meaningful way. The variable (or characteristic) associated with a qualitative population is called a qualitative or categorical variable.
Note: Just because we assign a numerical code to a qualitative variable does not mean the variable is quantitative.
(v) Discrete and Continuous Quantitative Variables:
A discrete quantitative variable can assume only certain discrete numerical values (on the number line); i.e. there are gaps between the various values. Depending on the variable, there could be a finite or infinite number of these discrete values.
A continuous quantitative variable can assume any value in a specific range or interval. The interval can be of finite or infinite width.
Note: By definition there are an infinite number of values a continuous variable can take.
2.2 Frequency Distributions
(a) Introduction
Suppose we have a set of raw statistical data. At this stage we will make no distinction as to whether we are talking about a statistical population or sample.
In studying the data it is often useful to initially group the raw data into different classes or categories. A frequency distribution for a set of data lists the number of observations or ‘data points’ in each class used for grouping (the class frequencies). The classes of a frequency distribution must be mutually exclusive (an observation cannot fall into two classes) and exhaustive (any observation must belong to a class).
(b) Frequency Distributions for Quantitative Data
Each class of a frequency distribution of quantitative data usually has a lower and an upper limit, although sometimes it is necessary or convenient to have open-ended classes, i.e. classes which have either an upper or lower limit but not both.
Example:
Suppose we have data on the number of children in 100 households as follows:
Class                      Frequency
0 to under 2 children         30
2 to under 4 children         55
4 to under 6 children         13
6 or more children             2
The class width is the difference between successive lower class limits or upper class limits.
Note: An open-ended class has no class width.
General Advice for Forming Frequency Distributions:
The number of classes should generally be between 5 and 20.
Class widths are ideally equal, but this may not always be possible, and open-ended classes may be necessary.
Class limits should be chosen such that the class midpoint is close to the average of observations in the class. This is because in calculating summary statistics based on grouped data the midpoint is used as representative of all observations in the class.
(c) Relative, Cumulative and Cumulative Relative Frequency Distributions
A relative frequency distribution shows the proportion of all observations falling in each class. It is obtained by dividing the class frequencies ($f_i$) by the total number of observations in the data (‘n’).
A cumulative frequency distribution shows, for each class $i$, the total of the first $i$ frequencies.
A cumulative relative frequency distribution shows, for each class $i$, the total of the first $i$ relative frequencies.
For the previous example we have
Class (i)        Frequency ($f_i$)   Cumulative Frequency   Relative Frequency   Cumulative Rel. Freq.
0 to under 2            30                   30                   0.30                  0.30
2 to under 4            55                   85                   0.55                  0.85
4 to under 6            13                   98                   0.13                  0.98
6+ children              2                  100                   0.02                  1.00
Total                  100                                        1.00
An ogive is a graph of the cumulative relative frequency distribution.
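As an illustration, the columns of the table for the household example can be rebuilt from the class frequencies alone. The following is a minimal Python sketch; the variable names are chosen here for illustration and are not part of the notes.

```python
from itertools import accumulate

# Class labels and frequencies from the household example.
classes = ["0 to under 2", "2 to under 4", "4 to under 6", "6 or more"]
frequencies = [30, 55, 13, 2]

n = sum(frequencies)                               # total number of observations
relative = [f / n for f in frequencies]            # relative frequencies f_i / n
cumulative = list(accumulate(frequencies))         # running total of the frequencies
cumulative_relative = [c / n for c in cumulative]  # running total as a proportion of n

for row in zip(classes, frequencies, cumulative, relative, cumulative_relative):
    print("{:<14}{:>5}{:>6}{:>7.2f}{:>7.2f}".format(*row))
```

Plotting `cumulative_relative` against the upper class limits would give the ogive.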
2.3 Histograms
Histograms give us a convenient way of visualising the distribution of observations over classes. They take the form of a series of adjacent (contiguous) rectangles, one for each class, with the base of each rectangle centred over the corresponding class midpoint.
In a frequency histogram the areas of the rectangles are proportional to the class frequencies, with the factor of proportionality the same for all classes. Thus if all the classes have the same width, each rectangle will have the same base width and the class frequencies can be represented by the rectangle heights.
In a relative frequency histogram the areas of the rectangles are proportional to the relative frequencies.
Similarly cumulative and cumulative relative frequency histograms can be defined.
Note: Frequency and relative frequency histograms will have the same shape.
Example:
Consider the following distribution
Class                Frequ.   Rel. Freq.   Cum. Freq.
0.5 to under 2.5       10        0.1           10
2.5 to under 4.5       30        0.3           40
4.5 to under 6.5       50        0.5           90
6.5 to under 8.5       10        0.1          100
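[Figure: Frequency Histogram. Vertical axis: Frequency (ticks at 10, 30, 50); horizontal axis: class limits 0.5, 2.5, 4.5, 6.5, 8.5.]
[Figure: Relative Frequency Histogram. Vertical axis: Relative Frequency (ticks at 0.1, 0.3, 0.5); horizontal axis: class limits 0.5, 2.5, 4.5, 6.5, 8.5.]
[Figure: Cumulative Frequency Histogram. Vertical axis: Cumulative Frequency (ticks at 10, 40, 90, 100); horizontal axis: class limits 0.5, 2.5, 4.5, 6.5, 8.5.]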
2.4 Shapes of Distributions
The frequency or relative frequency histogram gives us a representation of the shape of the distribution of the data being analysed.
There are several terms commonly used to describe the shapes of distributions.
A distribution is described as negatively skewed (skewed to the left) if it has the following shape.
[Figure: A Distribution that is Skewed to the Left. Vertical axis: Relative Frequency; horizontal axis: Variable Value.]
A distribution is positively skewed (skewed to the right) if it has the following shape.
[Figure: A Distribution that is Skewed to the Right. Vertical axis: Relative Frequency; horizontal axis: Variable Value.]
A distribution is symmetric if it has the following shape.
[Figure: A Symmetric Distribution. Vertical axis: Relative Frequency; horizontal axis: Variable Value.]
The above are all examples of unimodal distributions. A bimodal distribution has two peaks.
2.5 Bivariate Frequency Distributions
Often it is of interest to classify observations of elementary units according to two variables (characteristics). This allows one to gauge the relationship between the two variables.
Example:
Consider the final results of 50 students in a particular subject. Each student’s final grade and gender are recorded, allowing the derivation of the following bivariate frequency distribution.
                          Grade
Gender          HD   Dist.   Credit   Pass   Fail   Row Total
Male             5     4       10       6      2       27
Female           2     3       11       2      5       23
Column Total     7     7       21       8      7       50
Each combination of grade and gender is represented by a cell in the bivariate frequency distribution, which contains the frequency of that combination in the data.
The row totals represent, in this example, the marginal frequencies of males and females in the class (27 and 23, respectively).
The column totals represent the marginal frequencies of final grades.
Marginal frequencies, represented by the row and column totals, each refer to one variable only.
We can express the information in a bivariate frequency distribution as a relative frequency distribution by dividing each entry in the distribution by the total number of observations.
Example:
For the previous example, the bivariate relative frequency distribution is given by
                          Grade
Gender          HD     Dist.   Credit   Pass   Fail   Row Total
Male           0.10    0.08     0.20    0.12   0.04     0.54
Female         0.04    0.06     0.22    0.04   0.10     0.46
Col. Total     0.14    0.14     0.42    0.16   0.14     1.00
The row and column totals in the above table are called the marginal relative frequencies.
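The marginal and relative frequencies for the grade/gender example can be checked with a short Python sketch. The dictionary layout and names below are illustrative assumptions, not part of the notes.

```python
# Cell frequencies from the grade/gender example (rows: gender, columns: grade).
grades = ["HD", "Dist.", "Credit", "Pass", "Fail"]
counts = {
    "Male":   [5, 4, 10, 6, 2],
    "Female": [2, 3, 11, 2, 5],
}

n = sum(sum(row) for row in counts.values())   # total number of observations

# Marginal frequencies: row totals (gender) and column totals (grade).
row_totals = {gender: sum(row) for gender, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]

# Bivariate relative frequencies: each cell divided by n.
relative = {gender: [c / n for c in row] for gender, row in counts.items()}

print(row_totals, col_totals)
print(relative)
```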
3. MEASURES OF CENTRAL TENDENCY AND DISPERSION
In this section we shall look at important ways of summarising data from both populations and samples. We shall be concerned with measures of the
‘centre’ of a frequency distribution
‘dispersion’ of values in a frequency distribution
3.1 Summation Notation
Suppose we have ‘n’ numbers. By labelling the numbers $(1, 2, 3, \ldots, n)$, we can represent the numbers by
$$x_i, \quad i = 1, \ldots, n$$
The sum of the numbers can be denoted
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
$\sum_{i=1}^{n} x_i$ is a shorthand way of writing the sum.
Theorem (Basic Properties of Summation Notation)
Given ‘c’ is some constant and $a_1, a_2, \ldots, a_n$ are ‘n’ numbers:
(i) $\sum_{i=1}^{n} c a_i = c \sum_{i=1}^{n} a_i$
(ii) $\sum_{i=1}^{n} (a_i + c) = \sum_{i=1}^{n} a_i + nc$
(iii) $\sum_{i=1}^{n} (a_i + c)^2 = \sum_{i=1}^{n} a_i^2 + 2c \sum_{i=1}^{n} a_i + nc^2$
(iv) $\sum_{i=1}^{n} (a_i - c)^2 = \sum_{i=1}^{n} a_i^2 - 2c \sum_{i=1}^{n} a_i + nc^2$
Example:
Consider the following four labelled numbers.
$$a_1 = 1, \quad a_2 = 3, \quad a_3 = 2, \quad a_4 = 1$$
Use property (iii) of the above theorem to calculate
$$\sum_{i=1}^{4} (a_i + 1)^2$$
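A quick numerical check of this example can be sketched in Python. It assumes the four numbers are 1, 3, 2, 1, that the constant is $c = 1$, and that property (iii) is the expansion of $\sum (a_i + c)^2$; these readings are assumptions about the example rather than something the notes state explicitly.

```python
a = [1, 3, 2, 1]   # assumed values of a_1, ..., a_4
c = 1              # assumed constant
n = len(a)

# Left-hand side: square each (a_i + c) and add them up directly.
direct = sum((a_i + c) ** 2 for a_i in a)

# Right-hand side: sum of squares + 2c * (sum of a_i) + n * c^2.
via_property = sum(a_i ** 2 for a_i in a) + 2 * c * sum(a) + n * c ** 2

print(direct, via_property)
```

Both routes give the same total, which is the point of the property: only $\sum a_i$ and $\sum a_i^2$ are needed.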
3.2 Measures of Central Tendency
For each measure considered there are population and sample versions. We will suppose here there are N values in the population and ‘n’ values in a sample.
Note that at this stage we are only concerned with quantitative variables, and we assume the population contains a finite number of values.
Definition (Mean of a Finite Quantitative Population)
If $x_1, x_2, x_3, \ldots, x_N$ represents a finite population of ‘N’ quantitative data points, then the mean of this population is given by
$$\text{Population mean} = \mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$
($\mu$ is the Greek letter ‘mu’)
Definition (Mean of a Sample from a Quantitative Population)
If $x_1, x_2, x_3, \ldots, x_n$ represents a particular sample of size ‘n’ from a quantitative population, then the mean of this sample is given by
$$\text{Sample mean} = \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Definition (Mode of a Set of Data)
The mode is the data value that occurs most frequently in a set of data (population or sample).
Definition (The Median of a Set of Data)
If quantitative data is arranged in ascending or descending order, the middle value of data is called the median. If there is an even number of data points, the median is typically taken to be the arithmetic average of the two middle values.
Example:
Consider the following set of data, which we can assume to be a sample from a population.
 1    1    5    4   12    4
 3    1    2    7    6    6
 5    1    1    5    8    9
10    2    4    2    6   30
$n = 24$, $x_1 = 1$, $x_3 = 5$, $x_{11} = 6$, etc. (if we label across rows then down)
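The three measures can be computed for this example with Python's standard `statistics` module. The sketch assumes the 24 values are laid out in 4 rows of 6 and labelled across rows, which is the layout consistent with $x_1 = 1$, $x_3 = 5$ and $x_{11} = 6$.

```python
import statistics

# The 24 sample values, listed across rows then down (assumed layout).
data = [1, 1, 5, 4, 12, 4,
        3, 1, 2, 7, 6, 6,
        5, 1, 1, 5, 8, 9,
        10, 2, 4, 2, 6, 30]

mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # average of the two middle values, since n is even
mode = statistics.mode(data)      # most frequently occurring value

print(len(data), mean, median, mode)
```

Here the mode is the smallest value in the data, so it does not describe the centre of the distribution, while the single large value 30 pulls the mean above the median.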
Comparison of the Mean, Median and Mode
The mean takes account of all observation values; it can therefore be affected by extreme values or outliers, i.e. values which differ greatly from the majority of values.
The median and mode are unaffected by extremely high or low values.
The mode may not represent a “central” value in the distribution, as in the above example, but it may be useful, for example, for qualitative data.
If the frequency (or relative frequency) distribution is perfectly symmetric and unimodal, the mean, median and mode will coincide.
[Figure: Symmetric Distribution. Vertical axis: Relative Frequency; horizontal axis: Variable Value. The mean, median and mode are marked at the same point.]
If the distribution is skewed to the right (positively skewed) and unimodal, typically mode < median < mean.
[Figure: Distribution that is Skewed to the Right. Vertical axis: Relative Frequency; horizontal axis: Variable Value. Marked along the horizontal axis: Mode, Median, Mean.]
If the distribution is skewed to the left (negatively skewed) and unimodal, typically mean < median < mode.
[Figure: Distribution that is Skewed to the Left. Vertical axis: Relative Frequency; horizontal axis: Variable Value. Marked along the horizontal axis: Mean, Median, Mode.]
MAIN POINTS
A statistical population is a set of measurements or characteristics of elementary units of interest.
Once a population is defined, a sample is a subset of the population.
Parameters are numerical characteristics of a population.
Sample statistics are numerical characteristics of a sample.
A frequency or relative frequency distribution describes how data is distributed over different classes or categories.
A histogram shows graphically a frequency, relative frequency or cumulative frequency distribution (the areas of the ‘contiguous’ rectangles are proportional to the frequencies or relative frequencies).
The mean is affected by ‘extreme’ values; the median and the mode are not affected by ‘extreme’ values.
The population mean is denoted $\mu$; the sample mean is denoted $\bar{x}$.
The median divides a set of quantitative data into two equal halves.
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 2
Required Reading:
Ref. File 1: Section 1.1
Ref. File 3: Sections 3.5(a)-(d), 3.5(f)
Ref. File 4: Introduction and Sections 4.1, 4.2
3. MEASURES OF CENTRAL TENDENCY AND DISPERSION CONTINUED
3.3 Measures of Dispersion
(a) The Range
Definition (Range of a Set of Data)
The range of a set of quantitative data is the difference between the highest and lowest data values.
(b) The Mean Absolute Deviation
Definition (Deviation from the Mean)
Consider a particular value $x_i$ from a finite data set. The deviation from the mean of this value is defined as
$(x_i - \mu)$ if the population mean is known
$(x_i - \bar{x})$ if only a sample mean is available
Definition (Mean Absolute Deviation)
(i) If $x_1, x_2, x_3, \ldots, x_N$ represents a finite quantitative population, then the population mean absolute deviation is given by
$$\text{Population MAD} = \frac{\sum_{i=1}^{N} |x_i - \mu|}{N}$$
(ii) If $x_1, x_2, x_3, \ldots, x_n$ represents a sample from a quantitative population, then the sample mean absolute deviation is given by
$$\text{Sample MAD} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
(c) The Standard Deviation and Variance
(c) The Standard Deviation and Variance
Another, more mathematically convenient, way of analysing the deviations from the mean is to square them. This leads to the definition of the variance.
Definition (Variance of a Finite Quantitative Population)
If $x_1, x_2, x_3, \ldots, x_N$ represent a finite population of N quantitative data points, then the variance of this population is given by
$$\text{(Finite) Population variance} = \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Definition (Variance of a Sample from a Quantitative Population)
If $x_1, x_2, x_3, \ldots, x_n$ represent a particular sample of size ‘n’ from a quantitative population, then the variance of this sample is given by
$$\text{Sample variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Alternatively we can equivalently write:
$$\text{Population variance} = \sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N}$$
$$\text{Sample variance} = s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$
The standard deviation is defined as the positive square root of the variance.
Definition (Finite Population and Sample Standard Deviations)
(i) If $x_1, x_2, x_3, \ldots, x_N$ represent a finite quantitative population, then the population standard deviation is given by
$$\text{Population standard deviation} = \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
(ii) If $x_1, x_2, x_3, \ldots, x_n$ represent a sample from a quantitative population, then the sample standard deviation is given by
$$\text{Sample standard deviation} = s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
An advantage of the standard deviation over the variance is that it is expressed in the original units of measure.
Example:
Calculate $s^2$ and ‘s’ for the previous 24 number example. (36.3315, 6.0276)
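The quoted answers can be reproduced with a short Python sketch. It assumes the same 24 values as the earlier mean/median/mode example (read across rows), and uses `statistics.variance` and `statistics.stdev`, which apply the $n - 1$ divisor of the sample formulas.

```python
import statistics

# The 24 sample values from the earlier example (assumed layout: across rows).
data = [1, 1, 5, 4, 12, 4,
        3, 1, 2, 7, 6, 6,
        5, 1, 1, 5, 8, 9,
        10, 2, 4, 2, 6, 30]

n = len(data)
xbar = sum(data) / n

s2 = statistics.variance(data)   # sample variance, divisor n - 1
s = statistics.stdev(data)       # sample standard deviation, square root of s2

# Equivalent shortcut form: (sum of x_i^2 - n * xbar^2) / (n - 1).
s2_shortcut = (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

print(round(s2, 4), round(s, 4), round(s2_shortcut, 4))
```

For population data, `statistics.pvariance` and `statistics.pstdev` use the divisor N instead.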
3.4 The Coefficient of Variation
The coefficient of variation is useful for comparing the variability of data sets with means that differ significantly, or data sets based on different units of measure.
Definition (Coefficient of Variation)
(i) For a population with mean $\mu$ and standard deviation $\sigma$:
$$\text{Population coefficient of variation} = \frac{\sigma}{\mu}$$
(ii) For a sample with mean $\bar{x}$ and standard deviation ‘s’:
$$\text{Sample coefficient of variation} = \frac{s}{\bar{x}}$$
Example:
Suppose we wish to compare the variability of the weights of a given sample of people with the variability of their daily calorie intake. We are told
sample mean of weights = 68kg
sample standard deviation of weights = 5kg
sample mean of daily calorie intake = 1200 calories
sample standard deviation of daily calorie intake = 300 calories
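One way to work this example is to compute the sample coefficient of variation $s / \bar{x}$ for each variable and compare. The Python sketch below does so; the function name is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Sample coefficient of variation: standard deviation divided by mean."""
    return std_dev / mean

cv_weight = coefficient_of_variation(5, 68)         # weights in kg
cv_calories = coefficient_of_variation(300, 1200)   # daily calorie intake

print(round(cv_weight, 4), round(cv_calories, 4))
```

Because each ratio is unit-free, the two can be compared directly: relative to its mean, calorie intake is considerably more variable than weight in this sample.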
3.5 Chebyshev’s Theorem and the Empirical Rule
Theorem (Chebyshev’s Theorem)
For any quantitative population with a finite variance, the proportion of data points lying less than ‘c’ standard deviations from the mean is at least $1 - (1/c^2)$, where $c > 0$.
For hump-shaped or bell-shaped (unimodal) distributions, Chebyshev’s theorem will give a conservative indication of the concentration of population data points around the mean. In such cases we can refer to the empirical rule.
The Empirical Rule
For a bell-shaped distribution of sample or population data, it will be approximately true that
68% of the data points will lie within 1 standard deviation of the mean.
95% of the data points will lie within 2 standard deviations of the mean.
99.7% of the data points will lie within 3 standard deviations of the mean.
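To see how conservative Chebyshev's bound is, it can be tabulated next to the empirical-rule percentages. This is a minimal Python sketch; the function name is an illustrative choice.

```python
def chebyshev_lower_bound(c):
    """Minimum proportion of data within c standard deviations of the mean."""
    if c <= 0:
        raise ValueError("c must be positive")
    return 1 - 1 / c ** 2

# Approximate proportions from the empirical rule (bell-shaped data only).
empirical_rule = {1: 0.68, 2: 0.95, 3: 0.997}

for c, approx in empirical_rule.items():
    print(c, round(chebyshev_lower_bound(c), 3), approx)
```

For $c = 1$ the bound is 0, so it says nothing useful; it becomes informative only for $c > 1$, and even then sits well below the empirical-rule figures.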
4. INTRODUCTORY PROBABILITY THEORY
4.1 Basic Set Theory
A set is a collection of objects or elements.
Definitions (Sets)
The set of all elements of interest in a particular problem or context is called the universal set, which can be denoted by, say, $S$. Other basic definitions relating to sets are as follows:
(i) The null set, denoted $\emptyset$, contains no elements.
(ii) If an element denoted ‘x’ is a member of a set $A$, this is commonly denoted $x \in A$; if ‘x’ is not a member of set $A$, this can be denoted $x \notin A$.
(iii) The intersection of sets $A$ and $B$, denoted $A \cap B$, is the set of elements in both $A$ and $B$.
(iv) The union of sets $A$ and $B$, denoted $A \cup B$, is the set of elements in $A$ and/or $B$.
(v) Set $A$ is said to be a subset of set $B$, denoted $A \subset B$, if all elements in $A$ are also in $B$; if $A$ is not a subset of $B$, this can be denoted $A \not\subset B$.
(vi) The complement of set $A$, denoted $\bar{A}$, is the set of elements in $S$ but not in $A$.
(vii) If $A \cap B = \emptyset$, we say $A$ and $B$ are mutually exclusive or disjoint sets; they have no element in common.
Venn diagrams are often a convenient way of portraying sets and the relationship between them. An example is the following diagram.
[Figure: Venn diagrams of sets $A$ and $B$ within the universal set $S$, including the case $A \cap B = \emptyset$ ($A$, $B$ disjoint/mutually exclusive).]
Example
Suppose we have the set $S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$
Define $A = \{1, 3, 5, 7, 9\}$, $B = \{1, 3, 4, 7\}$
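The set operations defined above map directly onto Python's built-in `set` type. The sketch below applies them to this example; the names `S`, `A` and `B` for the universal set and the two defined sets are assumptions made here for illustration.

```python
S = set(range(1, 11))      # universal set {1, ..., 10}
A = {1, 3, 5, 7, 9}        # assumed name for the first defined set
B = {1, 3, 4, 7}           # assumed name for the second defined set

intersection = A & B       # elements in both A and B
union = A | B              # elements in A and/or B
complement_A = S - A       # elements of S that are not in A
is_subset = B <= A         # True only if every element of B is in A
disjoint = A.isdisjoint(B) # True only if A and B share no element

print(intersection, union, complement_A, is_subset, disjoint)
```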
4.2 Terminology Related to Statistical Experiments
An experiment, in a statistical sense, is an act or process that leads to an outcome which cannot be predicted with certainty.
Definition (Simple Events and Events)
A simple event of an experiment is an outcome that cannot be decomposed into simpler outcomes. An event is a collection or set of one or more simple events. An event is said to have occurred if a simple event included in the event occurs.
Definition (Sample Space of a Statistical Experiment)
The sample space of an experiment, which will be denoted $S$, is the set of all possible simple events. It can be described as the event consisting of all simple events.
Venn diagrams often provide a convenient way of depicting sample spaces and events.
Definition (Discrete Sample Space)
A discrete sample space consists of either a finite number of simple events or a countably infinite number of simple events.
Definition (Continuous Sample Space)
A continuous sample space consists of simple events that represent all the points in an interval on the real number line. The interval could be of finite or infinite width.
4.3 Basic Concepts of Probability
(a) Probabilities of Events as Relative Frequencies
Definition (Probability of an Event)
If $f_E$ is the frequency with which event ‘E’ occurs in ‘n’ repetitions (trials) of an experiment under identical conditions/rules, $P(E)$ is defined as
$$P(E) = \lim_{n \to \infty} \frac{f_E}{n}$$
(b) Definition of a Probability Distribution
Definition (Probability Distribution)
A probability model or probability distribution for an experiment takes the form of either a list of probabilities of simple events or some other representation of the relative frequency distribution of the underlying population associated with the experiment.
(c) Axioms of Probability
Suppose an experiment has a sample space $S$. Any assignment of probabilities to events in $S$ (subsets of $S$) must satisfy the following axioms:
1. For any event ‘E’ in $S$, $0 \le P(E) \le 1$
2. $P(S) = 1$
3. The probability of an event that is the union of a collection of mutually exclusive events is given by the sum of the probabilities of these mutually exclusive events. (The ‘additive property of probability’)
(d) Assigning Probabilities to Simple Events in Discrete Sample Spaces
There are three broad approaches to assigning probabilities to events.
(i) The Underlying Population Relative Frequency Distribution is Known or Assumed
In this case the relative frequencies of the simple events can be considered the probabilities of these simple events.
As a special case, the ‘classical’ or ‘equally likely’ approach to assigning probabilities is applicable in experiments where it is reasonable to assume that each simple event is equally likely. In this case, if there are ‘n’ simple events, each will occur with probability 1/n.
(ii) The Underlying Population Relative Frequency Distribution is Not Known or Assumed, but the Experiment is Repeatable
This approach relies on past observation of outcomes from an experiment that allows an approximate determination of relative frequencies of simple events and events.
In terms of this approach, the probability of an event is approximated by the relative frequency of the event in a ‘large’ number of identical trials of the experiment considered. This is often referred to as the ‘empirical’ or ‘relative frequency’ approach to assigning probabilities.
(iii) The Underlying Population Relative Frequency Distribution is Not Known or Assumed, and the Experiment is Not Repeatable
In many circumstances an experiment may not be repeatable, i.e. it will only happen once. In such circumstances people assign subjective probabilities to the experiment outcomes which reflect their personal beliefs.
For two events ‘A’ and ‘B’ defined on a sample space:
$P(A \cap B)$ = probability of simple events in both ‘A’ and ‘B’.
$P(A \cup B)$ = probability of simple events in ‘A’ and/or ‘B’.
[Figure: Venn diagram of overlapping events A and B in the sample space, with the region $A \cap B$ marked.]
Events ‘A’ and ‘B’ are said to be mutually exclusive if $A \cap B = \emptyset$. It follows immediately that, if ‘A’ and ‘B’ are mutually exclusive,
$$P(A \cap B) = 0$$
[Figure: Venn diagram of mutually exclusive events A and B in the sample space.]
Two Approaches to Determining the Probability of an Event Defined on a Discrete Sample Space:
(i) Add up the probabilities of the simple events included in the event.
(ii) Use various probability rules and laws relating to unions, intersections and complements of events (considered later)
The first approach above can be formalised as performance of the following steps:
(i) Define the experiment.
(ii) List the simple events and assign probabilities to them in a way consistent with the axioms of probability.
(iii) Determine the simple events included in the event of interest.
(iv) Sum the probabilities of the simple events in the event of interest to find its probability.
Example:
Consider the experiment of tossing a fair die once and let ‘A’ be the event of obtaining an odd number of dots on the upward facing side.
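The four steps can be traced for the die example in a few lines of Python. This is a sketch under the ‘equally likely’ assumption for a fair die; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Steps (i)-(ii): one toss of a fair die; six equally likely simple events.
sample_space = [1, 2, 3, 4, 5, 6]
prob = {face: Fraction(1, len(sample_space)) for face in sample_space}
assert sum(prob.values()) == 1   # consistent with the axioms

# Step (iii): simple events included in A (an odd number of dots).
A = [face for face in sample_space if face % 2 == 1]

# Step (iv): sum the probabilities of those simple events.
p_A = sum(prob[face] for face in A)

print(A, p_A)
```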
Example:
Suppose
$$S = \{1, 2, 3, 4, 5, 6, 7, 8\}, \quad A = \{1, 3, 5, 6\}, \quad B = \{2, 3, 4, 5, 8\}$$
where $S$ is the sample space of a statistical experiment and all the simple events are equally likely.
Example:
Suppose that for $S = \{1, 2, 3, 4, 5, 6, 7, 8\}$:
$$P(1) = P(2) = P(3) = P(6) = 0.1$$
$$P(4) = P(7) = P(8) = 0.08$$
$$P(5) = 0.36$$
with $A = \{1, 3, 5, 6\}$, $B = \{2, 3, 4, 5, 8\}$
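The two examples above do not say which probabilities are wanted. As an illustration only, the Python sketch below assumes the quantities of interest are $P(A)$, $P(B)$, $P(A \cap B)$ and $P(A \cup B)$, and finds each by summing simple-event probabilities. The helper `prob_of` and the name `S` for the sample space are hypothetical.

```python
from fractions import Fraction

S = set(range(1, 9))          # sample space {1, ..., 8}
A = {1, 3, 5, 6}
B = {2, 3, 4, 5, 8}

def prob_of(event, prob):
    """Sum the probabilities of the simple events included in the event."""
    return sum(prob[e] for e in event)

# First example: all eight simple events equally likely.
equal = {e: Fraction(1, 8) for e in S}

# Second example: unequal simple-event probabilities.
unequal = {e: Fraction(1, 10) for e in (1, 2, 3, 6)}
unequal.update({e: Fraction(8, 100) for e in (4, 7, 8)})
unequal[5] = Fraction(36, 100)
assert sum(unequal.values()) == 1   # the assignment satisfies the axioms

for label, prob in (("equal", equal), ("unequal", unequal)):
    print(label, prob_of(A, prob), prob_of(B, prob),
          prob_of(A & B, prob), prob_of(A | B, prob))
```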
MAIN POINTS
For a finite population,
$$\text{variance} = \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
For a sample,
$$\text{variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
The standard deviation is the square root of the variance: it has the same units of measure as the data.
Chebyshev’s theorem applies to all statistical populations.
The empirical rule applies only to hump-shaped distributions.
The coefficient of variation measures dispersion relative to the mean. It allows us to compare the dispersions of data sets with different means and units of measure.
In set notation:
$\cap$ means ‘and’
$\cup$ means ‘and/or’
$\bar{A}$ means ‘not A’
In statistical experiments:
Simple events cannot be decomposed into simpler outcomes.
The sample space is the set of all simple events.
Events are collections or sets of one or more simple events.
An event occurs if any of its included simple events occurs.
All statistical experiments can be thought of as sampling from a statistical population.
Probabilities must obey certain axioms.
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 3
Required Reading:
Ref. File 4: Sections 4.3, 4.4, 4.6
4. PROBABILITY THEORY CONTINUED
4.4 Discrete Bivariate Probability Distributions
Definitions (Joint and Marginal Probabilities)
Suppose the simple events of a statistical experiment take the form of intersections of outcomes with respect to two or more variables. For such a statistical experiment:
The probabilities of the simple events are referred to as joint probabilities.
The probabilities of events representing outcomes with respect to one of the variables only are called marginal probabilities.
A listing or other representation of the joint probabilities is called a joint probability distribution.
Example:
Suppose we have the following data on all 1950 first year students at a particular university.
                             Work Status
Age in Years     Not Working   Part-Time   Full-Time   Row Total
Under 25            1200          200         250         1650
25 - 34              100           75         100          275
35 or over            10            5          10           25
Column Total        1310          280         360         1950
Consider the experiment of selecting one of the students at random. Define the following events for the experiment:
A: Under 25
B: 25 - 34
C: 35 or over
D: Not working
E: Part-time worker
F: Full-time worker
Calculate the following probabilities:
$P(A)$, $P(C)$, $P(D)$, $P(D \cap A)$, $P(C \cup F)$, $P(\bar{C})$, $P(\bar{C} \cap E)$
(11/13, 1/78, 131/195, 8/13, 5/26, 77/78, 11/78)
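The listed answers can be checked by counting students in the relevant cells and dividing by 1950. The Python sketch below does this with exact fractions. The choice of events, namely $P(A)$, $P(C)$, $P(D)$, $P(D \cap A)$, $P(C \cup F)$, $P(\bar{C})$ and $P(\bar{C} \cap E)$, is the reading of the exercise that matches the quoted answers and should be treated as an assumption. The helper `prob` is hypothetical.

```python
from fractions import Fraction

# Counts from the table: (age group, work status) -> number of students.
counts = {
    ("Under 25", "Not working"): 1200, ("Under 25", "Part-time"): 200,
    ("Under 25", "Full-time"): 250,
    ("25 - 34", "Not working"): 100, ("25 - 34", "Part-time"): 75,
    ("25 - 34", "Full-time"): 100,
    ("35 or over", "Not working"): 10, ("35 or over", "Part-time"): 5,
    ("35 or over", "Full-time"): 10,
}
total = sum(counts.values())   # 1950 students

def prob(event):
    """Probability that a randomly selected student satisfies event(age, status)."""
    favourable = sum(c for (age, status), c in counts.items() if event(age, status))
    return Fraction(favourable, total)

p_A = prob(lambda age, status: age == "Under 25")            # marginal
p_C = prob(lambda age, status: age == "35 or over")          # marginal
p_D = prob(lambda age, status: status == "Not working")      # marginal
p_D_and_A = prob(lambda age, status: status == "Not working" and age == "Under 25")
p_C_or_F = prob(lambda age, status: age == "35 or over" or status == "Full-time")
p_not_C = 1 - p_C
p_notC_and_E = prob(lambda age, status: age != "35 or over" and status == "Part-time")

print(p_A, p_C, p_D, p_D_and_A, p_C_or_F, p_not_C, p_notC_and_E)
```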
4.5 Useful Counting Techniques
(a) The Multiplicative Rule
Theorem (Multiplicative Rule of Counting)
Suppose two sets of elements, sets $A$ and $B$, consist of $n_A$ and $n_B$ distinct elements, respectively; $n_A$ and $n_B$ need not be equal. Then it is possible to form $n_A n_B$ distinct pairs of elements consisting of one element from set $A$ and one element from set $B$, without regard to order within a pair.
Example:
If a take-away food store sells 10 different food items and 5 different types of drink, $10 \times 5 = 50$ distinct food/drink pairs are possible.
The multiplicative rule can be extended naturally. Thus $n_1 n_2 \cdots n_k$ different sets of ‘k’ elements are possible if one selects an element from each of ‘k’ groups consisting of $n_1, n_2, \ldots, n_k$ distinct elements, respectively.
Example:
Suppose we select 5 people at random. What is the probability that they were born on different days of the week, assuming an individual has an equal probability of being born on any of the seven days of the week? (Approx. 0.1499)
8/18/2019 IEM Outline Lecture Notes Autumn 2016
45/198
45
A simple event here is an ordered sequence of 5 elements, the first representing the day of the week the first person was born on, the second the day the second person was born on, and so forth.
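The birthday example can be worked through with the multiplicative rule alone. The following is a minimal sketch; the variable names are illustrative.

```python
# Total number of simple events: each of the 5 people can be born on any of 7 days.
days, people = 7, 5
total_sequences = days ** people

# Sequences with all birthdays on different days: 7 choices for the first
# person, 6 for the second, and so on (multiplicative rule again).
distinct_sequences = 1
for already_used in range(people):
    distinct_sequences *= days - already_used

p_all_different = distinct_sequences / total_sequences
print(distinct_sequences, total_sequences, round(p_all_different, 4))
```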
(b) Permutations
Definition (Permutations)
A permutation is an ordered sequence of elements.
Definition (Factorial Notation)
If ‘N’ is a non-negative integer, we define:
$$N! = N(N-1)(N-2) \cdots (3)(2)(1)$$
(‘N-factorial’)
and
$$0! = 1$$
Theorem (Number of Permutations)
The total number of possible distinct permutations (ordered sequences) of ‘R’ elements selected (without replacement) from ‘N’ distinct elements, denoted ${}^{N}P_{R}$, is given by
$${}^{N}P_{R} = \frac{N!}{(N-R)!}$$
Example: Consider the numbers 1, 2, 3, 4. How many permutations of these four numbers taken 2 at a time can be found? (12)
(c) Combinations
Definition (Combinations)
A set of ‘R’ elements selected from a set of ‘N’ distinct elements without regard to order is called a combination.
Theorem (Number of Combinations)
The total number of possible combinations of ‘R’ elements selected from a set of ‘N’ distinct elements is given by
$${}^{N}C_{R} = \frac{N!}{R!\,(N-R)!}$$
Example: In how many ways can a committee of 4 people be chosen from a group of 7 people? (35)
(d) Permutations of ‘N’ Non-Distinct Elements
Theorem (Number of Permutations of ‘N’ Non-Distinct Elements)
Consider a set of ‘N’ elements of which $N_1$ are alike, $N_2$ are alike, ..., and $N_r$ are alike, where $N_i \geq 1$ $(i = 1, \ldots, r)$ and $\sum_{i=1}^{r} N_i = N$. Then the number of distinct permutations of these ‘N’ elements is given by
$$\frac{N!}{N_1!\,N_2! \cdots N_r!}$$
If the above result is specialized to the case where ‘x’ is the number of distinct arrangements (or distinct permutations) of ‘N’ objects where ‘R’ are alike and $(N-R)$ are alike, then
$$x = \frac{N!}{R!\,(N-R)!} = {}^{N}C_{R}$$
Example: Say we have 3 black flags and 2 red flags. How many distinct ways are there of arranging these flags in a row? (10)
Example: Suppose there are 6 applicants for 2 similar jobs. Because the personnel manager is lazy, he simply selects 2 of the applicants at random and gives them each a job. What is the probability that he selects one of the 2 best applicants and one of the 4 worst applicants? (8/15)
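The counting examples in this section can be checked with Python's standard library (`math.perm` and `math.comb` require Python 3.8 or later). This is a sketch for verification only; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, perm

# Permutations of the numbers 1, 2, 3, 4 taken 2 at a time: N!/(N-R)!
perms_4_take_2 = perm(4, 2)

# Committees of 4 chosen from 7 people: N!/(R!(N-R)!)
committees = comb(7, 4)

# Arrangements of 3 black and 2 red flags: N!/(N_1! N_2!)
flag_rows = factorial(5) // (factorial(3) * factorial(2))

# One of the 2 best and one of the 4 worst applicants, out of all
# equally likely pairs chosen from 6 applicants.
p_one_best_one_worst = Fraction(comb(2, 1) * comb(4, 1), comb(6, 2))

print(perms_4_take_2, committees, flag_rows, p_one_best_one_worst)
```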
4.6 Conditional Probability
Definition (Conditional Probability)
The probability of event ‘A’ occurring given that event ‘B’ occurs, or the conditional probability of ‘A’ given ‘B’ (has occurred) is denoted $P(A \mid B)$. Provided $P(B) > 0$, this conditional probability is defined to be
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Example: Suppose that a survey of women aged 20-30 years suggests the following joint probability table relating to marital status and desire to become pregnant within the next 12 months.
                           Desire
Marital status   Pregnancy   No pregnancy   Total
Married               0.08           0.47    0.55
Unmarried             0.02           0.43    0.45
Total                 0.10           0.90    1.00
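The notes do not record which probabilities were asked for in this example. As an illustration only, the sketch below applies the conditional probability definition to the joint table to obtain the probability of desiring pregnancy given each marital status; the dictionary keys and function names are assumptions made for the sketch.

```python
# Joint probabilities from the table: (marital status, desire) -> probability.
joint = {
    ("married", "pregnancy"): 0.08,
    ("married", "no pregnancy"): 0.47,
    ("unmarried", "pregnancy"): 0.02,
    ("unmarried", "no pregnancy"): 0.43,
}


def marginal_status(status):
    """P(status): sum the joint probabilities across the row."""
    return sum(p for (s, _), p in joint.items() if s == status)


def conditional(desire, status):
    """P(desire | status) = P(desire and status) / P(status)."""
    return joint[(status, desire)] / marginal_status(status)


p_preg_given_married = conditional("pregnancy", "married")      # 0.08 / 0.55
p_preg_given_unmarried = conditional("pregnancy", "unmarried")  # 0.02 / 0.45
print(round(p_preg_given_married, 4), round(p_preg_given_unmarried, 4))
```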
Theorem (Multiplicative Law of Probability)
Suppose events ‘A’ and ‘B’ are defined on a sample space. Then
$$P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$
Example: Define events ‘A’ and ‘B’ in the following way:
‘A’: A student achieves a mark of over 65% in a first year statistics exam
‘B’: A student goes on to complete her bachelor's degree.
Suppose past experience indicates
$P(A) = 0.7$, $P(B \mid A) = 0.88$
4.7 Independence of Events
Sometimes, whether an event ‘B’ has occurred or not will have no effect on the probability of ‘A’ occurring. In this case we say events ‘A’ and ‘B’ are independent.
Definition (Independent and Dependent Events)
Events ‘A’ and ‘B’ are said to be statistically independent if
$$P(A \cap B) = P(A)\,P(B)$$
If $P(A \cap B) \neq P(A)\,P(B)$, the events are said to be statistically dependent.
Alternative Definition (Independent and Dependent Events)
Events ‘A’ and ‘B’ are said to be statistically independent if
$P(A \mid B) = P(A)$
$P(B \mid A) = P(B)$
Otherwise the events are said to be statistically dependent.
Example: Consider the single die tossing experiment again and define the following events:
‘A’: an odd number of dots results
‘B’: a number of dots greater than 2 results
Are ‘A’ and ‘B’ independent?
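One way to answer the die question is to enumerate the six equally likely outcomes and test the product definition of independence directly. This is a sketch that assumes a fair die; the names are illustrative.

```python
from fractions import Fraction

outcomes = range(1, 7)                     # faces of a fair die
A = {x for x in outcomes if x % 2 == 1}    # odd number of dots
B = {x for x in outcomes if x > 2}         # more than 2 dots


def prob(event):
    """Probability of an event when all six faces are equally likely."""
    return Fraction(len(event), 6)


p_A, p_B, p_A_and_B = prob(A), prob(B), prob(A & B)
independent = p_A_and_B == p_A * p_B
print(p_A, p_B, p_A_and_B, independent)
```

Here $P(A \cap B) = 1/3$ equals $P(A)P(B) = (1/2)(2/3)$, so the two events satisfy the definition of independence.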
4.8 More Useful Probability Rules
(a) The Additive Law of Probability
Theorem (Additive Law of Probability)
For two events ‘A’ and ‘B’ defined on a sample space
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
Example: Again suppose that for $S = \{1, 2, 3, 4, 5, 6, 7, 8\}$:
$P(1) = P(2) = P(3) = P(6) = 0.1$
$P(4) = P(7) = P(8) = 0.08$
$P(5) = 0.36$
with $A = \{1, 3, 5, 6\}$, $B = \{2, 3, 4, 5, 8\}$
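The additive law can be verified on this example by computing $P(A \cup B)$ both directly and through the formula. The sketch below uses exact fractions to avoid rounding; the helper name `prob` is illustrative.

```python
from fractions import Fraction

# Probabilities of the eight basic outcomes, as given in the example.
p = {
    1: Fraction("0.1"), 2: Fraction("0.1"), 3: Fraction("0.1"), 6: Fraction("0.1"),
    4: Fraction("0.08"), 7: Fraction("0.08"), 8: Fraction("0.08"),
    5: Fraction("0.36"),
}
A = {1, 3, 5, 6}
B = {2, 3, 4, 5, 8}


def prob(event):
    """Probability of an event as the sum over its basic outcomes."""
    return sum(p[outcome] for outcome in event)


direct = prob(A | B)                             # sum over the union
via_law = prob(A) + prob(B) - prob(A & B)        # additive law
print(float(prob(A)), float(prob(B)), float(prob(A & B)), float(direct))
```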
(b) The Complementation Rule
Theorem 4.7 (Complementation Rule)
Suppose an event ‘E’ and its complement $\bar{E}$ are defined on some sample space $S$. Then
$$P(\bar{E}) = 1 - P(E)$$
(c) The Law of Total Probability
Theorem (Law of Total Probability)
Suppose a sample space $S$ and a set of ‘k’ events $E_1, E_2, \ldots, E_k$ such that
$P(E_i) > 0$ $(i = 1, \ldots, k)$
$E_i \cap E_j = \emptyset$ $(i \neq j)$ (i.e. the events are mutually exclusive)
$E_1 \cup E_2 \cup \ldots \cup E_k = S$ (i.e. the events are exhaustive on $S$)
Then for any event ‘A’ defined on $S$:
$$P(A) = P(E_1 \cap A) + P(E_2 \cap A) + \ldots + P(E_k \cap A)$$
$$= P(E_1)\,P(A \mid E_1) + P(E_2)\,P(A \mid E_2) + \ldots + P(E_k)\,P(A \mid E_k)$$
$$= \sum_{j=1}^{k} P(E_j)\,P(A \mid E_j)$$
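As an illustration of the law of total probability, the sketch below reuses the first-year student data from earlier in this section: the three age groups are mutually exclusive and exhaustive, and the event of interest is ‘D’ (not working). Pairing the theorem with that data set is a choice made for this sketch, not an example from the notes.

```python
from fractions import Fraction

# Age-group sizes and "not working" counts from the student table.
group_totals = {"A": 1650, "B": 275, "C": 25}
not_working = {"A": 1200, "B": 100, "C": 10}
n_students = sum(group_totals.values())

p_D = Fraction(0)
for group, size in group_totals.items():
    p_group = Fraction(size, n_students)                   # P(E_j)
    p_D_given_group = Fraction(not_working[group], size)   # P(D | E_j)
    p_D += p_group * p_D_given_group                       # add P(E_j) P(D | E_j)

print(p_D)
```

The result agrees with the marginal probability obtained directly from the column total, 1310/1950.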
MAIN POINTS
In some statistical experiments the number of basic outcomes in the sample space or event of interest can be enumerated by using the ‘multiplicative rule’, permutation or combination formulae, depending on how a basic outcome can be represented most appropriately.
$P(A \mid B)$ means the probability event ‘A’ occurs given that event ‘B’ has occurred. The conditional probability definition is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Multiplicative law of probability: $P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$
Events ‘A’ and ‘B’ are statistically independent if the probability of ‘A’ occurring is not affected by whether ‘B’ has occurred.
Events ‘A’ and ‘B’ are independent if $P(A \cap B) = P(A)\,P(B)$ or equivalently $P(A \mid B) = P(A)$
Additive law of probability: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 4
Required Reading:
Ref. File 4: Sections 4.7 to 4.9
Ref. File 5: Introduction and Sections 5.1 to 5.4
4. PROBABILITY THEORY CONTINUED
4.9 Sampling With and Without Replacement
Definition (Random Sample from a Statistical Population)
A random sample of ‘n’ elements from a statistical population is such that every possible combination of ‘n’ elements from the population has an equal probability of being in the sample.
Many experiments involve taking a random sample from a finite population. If we sample with replacement, we effectively return each observation to the population before making the next selection. In this way the population from which we are sampling remains the same from one selection to the next; provided sampling is random, the successive outcomes will be independent.
If we sample without replacement from a finite population, the outcome of any one selection will depend on the outcomes of all previous selections; the population is reduced with each selection.
Example: Suppose that in a given street 50 residents voted in the last election. Of these, 15 voted for party ‘A’, 30 voted for party ‘B’ and 5 voted for neither party ‘A’ nor ‘B’. Suppose that one evening a candidate for the next election visits the residents of the street to introduce herself. What is the probability that the first two eligible voters she meets voted for party ‘A’ at the last election? (3/35)
Example: Consider the experiment of successively drawing 2 cards from a deck of 52 playing cards. Define the following events:
$A_1$: ace on first draw
$A_2$: ace on second draw
What is the probability of selecting 2 aces if sampling (drawing) is (i) without replacement, and (ii) with replacement? (1/221, 1/169)
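Both examples follow from the multiplicative law of probability, with the second factor conditioned on the first draw when sampling is without replacement. A minimal sketch with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

# Voters: two party 'A' voters in a row, sampling without replacement.
p_two_A_voters = Fraction(15, 50) * Fraction(14, 49)

# Cards, without replacement: P(A1) * P(A2 | A1), one ace fewer in 51 cards.
p_two_aces_without = Fraction(4, 52) * Fraction(3, 51)

# Cards, with replacement: the draws are independent, so P(A1) * P(A2).
p_two_aces_with = Fraction(4, 52) * Fraction(4, 52)

print(p_two_A_voters, p_two_aces_without, p_two_aces_with)
```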
Note: If we simultaneously select a sample of ‘n’ elements, we are effectively sampling without replacement.
4.10 Probability Trees
Tree diagrams can be a useful aid in calculating the probabilities of intersections of events (i.e. joint probabilities).
Example: Greasy Mo’s take-away food store offers special $10 meal deals consisting of a small pizza or a kebab, together with a can of soft drink, a milkshake or a cup of fruit juice. Past experience has shown that 60% of meal deal buyers choose a pizza (‘P’), 40% choose kebabs (‘K’), 75% choose soft drink (‘S’), 20% choose a milkshake (‘M’) and 5% choose fruit juice (‘J’). Assume the events ‘P’ and ‘K’ are independent of the events ‘S’, ‘M’ and ‘J’. What is the probability that a meal deal customer (chosen at random) will choose a pizza and fruit juice? (0.03)
The tree diagram for this example can be drawn as below.
P: 0.6
    S: 0.75    $P(P \cap S) = 0.6(0.75) = 0.45$
    M: 0.2     $P(P \cap M) = 0.6(0.2) = 0.12$
    J: 0.05    $P(P \cap J) = 0.6(0.05) = 0.03$
K: 0.4
    S: 0.75    $P(K \cap S) = 0.4(0.75) = 0.3$
    M: 0.2     $P(K \cap M) = 0.4(0.2) = 0.08$
    J: 0.05    $P(K \cap J) = 0.4(0.05) = 0.02$
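The leaves of the tree can be generated by multiplying the branch probabilities along each path, which is valid here because the meal choice is assumed independent of the drink choice. A sketch, with illustrative dictionary names:

```python
from itertools import product

meal = {"P": 0.6, "K": 0.4}                # first-stage branches
drink = {"S": 0.75, "M": 0.2, "J": 0.05}   # second-stage branches

# Joint probability of each path = product of the probabilities on its branches.
joint = {(m, d): meal[m] * drink[d] for m, d in product(meal, drink)}

for (m, d), prob in joint.items():
    print(f"P({m} and {d}) = {prob:.2f}")
```

The six joint probabilities sum to 1, as they must for a complete tree.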
5. PROBABILITY DISTRIBUTIONS OF DISCRETE RANDOM VARIABLES
5.1 Probability Distributions and Random Variables
A probability distribution can be considered a theoretical model for a relative frequency distribution of data from a real life population.
A probability distribution thus specifies the probabilities associated with the various outcomes of a statistical experiment. It can take the form of a table, a graph or some formula.
From now on we shall be concerned with the characteristics of probability distributions. However, to facilitate our study we shall now represent simple events and events associated with statistical experiments by values of random variables.
Definition (Random Variable)
A random variable $X$ is a rule that assigns to each simple event of a statistical experiment a unique numerical value.
The above definition can also be expressed in the following slightly more mathematical way.
Alternative Definition (Random Variable)
A random variable $X$ is a real valued function for which the domain is the sample space of a statistical experiment.
In most statistical experiments of interest, outcomes give rise to quantitative data that can be considered values of the random variable being studied.
In experiments which give rise to categorical or qualitative data, a random variable can normally also be defined.
Example: Consider the experiment of selecting a person at random and noting their hair colour.
Definition (Discrete Random Variable)
A discrete random variable can only assume a finite or infinite and countable number of values.
Definition (Continuous Random Variable)
A continuous random variable can assume any value in an interval (finite or infinite).
Definition (Discrete Probability Distribution)
A discrete probability distribution lists a probability for, or provides a means (e.g. a rule or formula) of assigning a probability to, each value a discrete random variable can take.
Suppose our random variable is called $X$. Then $P(X = x)$ represents the probability that the random variable takes on the particular value ‘x’.
Properties of the Discrete Probability Distribution of a Random Variable $X$:
$0 \leq P(X = x) \leq 1$ for all values of ‘x’
$\sum_{\text{all } x} P(X = x) = 1$
Example: Consider again the experiment of tossing a fair die once and noting the number of dots on the upward facing side ($X$).
Definition (Cumulative Distribution Function)
The cumulative distribution function of a random variable $X$, denoted $F(x)$, is defined as
$$F(x) = P(X \leq x)$$
where ‘x’ is any real number.
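For the fair die, the probability distribution and the cumulative distribution function can be written down in a few lines. This is a sketch; `pmf` and `cdf` are illustrative names.

```python
from fractions import Fraction

# Discrete probability distribution of X, the number of dots on a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(x):
    """F(x) = P(X <= x), defined for any real number x."""
    return sum(p for value, p in pmf.items() if value <= x)


print(sum(pmf.values()))          # probabilities sum to 1
print(cdf(0), cdf(3.5), cdf(6))   # F is 0 below 1 and reaches 1 at 6
```

Note that `cdf` accepts non-integer arguments such as 3.5, reflecting that $F(x)$ is defined for every real ‘x’ even though $X$ is discrete.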
5.2 Expected Values of Random Variables
It is of interest to have a measure of the centre of the probability distribution of a random variable $X$. This role is filled by the expected value of $X$.
Definition (Expected Value of a Discrete Random Variable)
The expected value of a discrete random variable $X$ is defined as
$$E(X) = \sum_{\text{all } x} x\,P(X = x)$$
If a statistical experiment considered generates values of the random variable that coincide with values in the population considered, and the theoretical probability distribution of the random variable and population relative frequency distribution are the same, the mean of the theoretical distribution of $X$ will be the same as the population mean (i.e. $\mu$). That is, $E(X) = \mu$.
Example: Suppose you buy a lottery ticket for $10. The sole prize in the lottery is $100,000 and 100,000 tickets are sold. If the lottery is fair (i.e. each ticket sold has an equal chance of winning), what will be your expected gain from buying the lottery ticket? (-9)
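The expected gain can be computed by listing the two possible values of the gain (prize less ticket price, or just the lost ticket price) with their probabilities and applying the definition of $E(X)$. A sketch with illustrative names:

```python
from fractions import Fraction

tickets_sold = 100_000
prize = 100_000
ticket_price = 10

# Probability distribution of the gain from one ticket.
gain_pmf = {
    prize - ticket_price: Fraction(1, tickets_sold),            # winning ticket
    -ticket_price: Fraction(tickets_sold - 1, tickets_sold),    # losing ticket
}

expected_gain = sum(gain * p for gain, p in gain_pmf.items())
print(expected_gain)
```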
Theorem (Expected Value of a Function of a Discrete Random Variable)
Suppose a function $g(X)$ of a discrete random variable $X$. The expected value of this function, if it exists, is given by
$$E[g(X)] = \sum_{\text{all } x} g(x)\,P(X = x)$$
Theorem 5.2 (Various Properties of Expected Values)
If ‘c’ is any constant then
$$E(c) = c$$
If ‘c’ is any constant and $g(X)$ is any function of a discrete or continuous random variable $X$ then
$$E[c\,g(X)] = c\,E[g(X)]$$
If $g_i(X)$ $(i = 1, \ldots, k)$ are ‘k’ functions of a discrete or continuous random variable $X$ then
$$E[g_1(X) + \ldots + g_k(X)] = E[g_1(X)] + \ldots + E[g_k(X)]$$
If $h(X)$ and $g(X)$ are two functions of a discrete or continuous random variable $X$ such that $h(X) \leq g(X)$ for all $X$, then
$$E[h(X)] \leq E[g(X)]$$
5.3 The Variance of a Random Variable
To gauge the dispersion of a random variable $X$ about its expected value or mean we can calculate the expected value of its squared distance $(X - E(X))^2$ from the mean. This is called the variance of the random variable $X$, denoted $Var(X)$.
Definition (Variance of a Random Variable)
The variance of any random variable $X$ (discrete or continuous) is given by
$$Var(X) = E[(X - E(X))^2]$$
Definition (Standard Deviation of a Random Variable)
The standard deviation of any random variable $X$ (discrete or continuous) is given by
$$SD(X) = \sqrt{Var(X)} = \sqrt{E[(X - E(X))^2]}$$
Again assuming the probability distribution of $X$ is an accurate representation of the population relative frequency distribution of $X$, we can write $Var(X) = \sigma^2$, where $\sigma^2$ is the population variance.
An alternative way of writing (and calculating) $Var(X)$ is
$$Var(X) = E(X^2) - [E(X)]^2$$
$$= \sum_{\text{all } x} x^2\,P(X = x) - [E(X)]^2 \quad \text{(if } X \text{ is discrete)}$$
Example: Suppose a lottery offers 3 prizes: $1,000, $2,000 and $3,000. 10,000 tickets are sold and each ticket has an equal chance of winning a prize. Calculate the variance and standard deviation of the random variable $X$ representing the value of the prize won by a ticket. (1399.64, 37.4118)
x        P(X = x)      x^2          x P(X = x)   x^2 P(X = x)
0        9997/10000    0            0            0
1,000    1/10000       1,000,000    0.1          100
2,000    1/10000       4,000,000    0.2          400
3,000    1/10000       9,000,000    0.3          900
Total                               0.6          1400
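The table's column totals give $E(X) = 0.6$ and $E(X^2) = 1400$, from which the variance follows by the shortcut formula. The sketch below reproduces that calculation; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Distribution of the prize value X won by a single ticket.
pmf = {
    0: Fraction(9997, 10000),
    1000: Fraction(1, 10000),
    2000: Fraction(1, 10000),
    3000: Fraction(1, 10000),
}

mean = sum(x * p for x, p in pmf.items())                # E(X)
mean_of_squares = sum(x**2 * p for x, p in pmf.items())  # E(X^2)
variance = mean_of_squares - mean**2                     # E(X^2) - [E(X)]^2
std_dev = sqrt(variance)

print(float(mean), float(mean_of_squares), float(variance), round(std_dev, 4))
```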
If we wish to determine the variance of a linear function $Y = g(X) = a + bX$ of a random variable $X$, the following rule can be used
$$Var(Y) = Var(a + bX) = b^2\,Var(X)$$
5.4 The Binomial Distribution
The binomial distribution is a discrete probability distribution based on ‘n’ repetitions of an experiment whose outcomes are represented by a Bernoulli random variable.
(a) Bernoulli Experiments
A Bernoulli experiment (or trial) is such that only 2 outcomes are possible. These outcomes can be denoted success (‘S’) and failure (‘F’), with probabilities ‘p’ and $(1 - p)$, respectively.
A Bernoulli random variable $Y$ is usually defined so that it takes the value 1 if the outcome of a Bernoulli experiment is a success, and the value 0 if the outcome is a failure. Thus
$P(Y = 1) = p$
$P(Y = 0) = (1 - p)$
The mean and variance of a Bernoulli random variable defined in the above way are
$E(Y) = p$
$Var(Y) = p(1 - p)$
(b) Binomial Experiments
Definition (Binomial Experiment)
A binomial experiment fulfils the following requirements:
(i) There are ‘n’ repetitions or ‘trials’ of a Bernoulli experiment for which there are only two outcomes, ‘success’ or ‘failure’.
(ii) All trials are performed under identical conditions.
(iii) The trials are independent.
(iv) The probability of success ‘p’ is the same for each trial.
(v) The random variable of interest, say $X$, is the number of successes observed in the ‘n’ trials.
Theorem (The Binomial Probability Function)
Let $X$ represent the number of successes in a binomial experiment consisting of ‘n’ trials and with a probability ‘p’ of success on each trial. The probability of ‘x’ successes in such an experiment is given by
$$P(X = x) = {}^{n}C_{x}\,p^{x}(1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, 3, \ldots, n$$
Example: A company that supplies reverse-cycle air conditioning units has found from experience that 70% of the units it installs require servicing within the first 6 weeks of operation. In a given week the firm installs 10 air conditioning units. Calculate the probability that, within 6 weeks
5 of the units require servicing (0.1029 approx.)
none of the units require servicing (0 approx.)
all of the units require servicing (0.0282 approx.)
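The three answers follow directly from the binomial probability function with n = 10 and p = 0.7. A minimal sketch, where `binomial_pmf` is an illustrative helper name:

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.7
p_five = binomial_pmf(5, n, p)
p_none = binomial_pmf(0, n, p)
p_all = binomial_pmf(10, n, p)
print(round(p_five, 4), round(p_none, 4), round(p_all, 4))
```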
(c) Cumulative Binomial Probabilities
(Extract of Appendix 3)
CUMULATIVE BINOMIAL PROBABILITIES: $P(X \leq x \mid p, n)$
                                          p
 n    x     0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40   ....    0.70
 1    0   0.9500  0.9000  0.8500  0.8000  0.7500  0.7000  0.6500  0.6000   ....  0.3000
      1   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000   ....  1.0000
 2    0   0.9025  0.8100  0.7225  0.6400  0.5625  0.4900  0.4225  0.3600   ....  0.0900
      1   0.9975  0.9900  0.9775  0.9600  0.9375  0.9100  0.8775  0.8400   ....  0.5100
      2   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000   ....  1.0000
 3    0   0.8574  0.7290  0.6141  0.5120  0.4219  0.3430  0.2746  0.2160   ....  0.0270
      1   0.9928  0.9720  0.9393  0.8960  0.8438  0.7840  0.7183  0.6480   ....  0.2160
      2   0.9999  0.9990  0.9966  0.9920  0.9844  0.9730  0.9571  0.9360   ....  0.6570
      3   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000   ....  1.0000
10    0   0.5987  0.3487  0.1969  0.1074  0.0563  0.0282  0.0135  0.0060   ....  0.0000
      1   0.9139  0.7361  0.5443  0.3758  0.2440  0.1493  0.0860  0.0464   ....  0.0001
      2   0.9885  0.9298  0.8202  0.6778  0.5256  0.3828  0.2616  0.1673   ....  0.0016
      3   0.9990  0.9872  0.9500  0.8791  0.7759  0.6496  0.5138  0.3823   ....  0.0106
      4   0.9999  0.9984  0.9901  0.9672  0.9219  0.8497  0.7515  0.6331   ....  0.0473
      5   1.0000  0.9999  0.9986  0.9936  0.9803  0.9527  0.9051  0.8338   ....  0.1503
      6   1.0000  1.0000  0.9999  0.9991  0.9965  0.9894  0.9740  0.9452   ....  0.3504
      7   1.0000  1.0000  1.0000  0.9999  0.9996  0.9984  0.9952  0.9877   ....  0.6172
      8   1.0000  1.0000  1.0000  1.0000  1.0000  0.9999  0.9995  0.9983   ....  0.8507
      9   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  0.9999   ....  0.9718
     10   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000   ....  1.0000
Example: Referring to the previous air conditioning unit example, calculate the probability that within 6 weeks of installation
less than 8 of the air conditioners require servicing. (0.6172 approx.)
4 or more of the air conditioners require servicing. (0.9894 approx.)
Example: Referring to the previous air conditioning unit example, use the cumulative binomial tables to calculate the probability that within 6 weeks of installation
5 units require servicing (0.103)
10 units require servicing (0.0282)
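The table look-ups in these two examples can be mimicked by summing the binomial probability function. The sketch below assumes n = 10 and p = 0.7 as in the air conditioning example; `binomial_cdf` is an illustrative helper name. Small differences from the quoted answers reflect the four-decimal rounding of the table entries.

```python
from math import comb


def binomial_cdf(x, n, p):
    """P(X <= x): cumulative binomial probability, as tabulated in the appendix."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


n, p = 10, 0.7
p_less_than_8 = binomial_cdf(7, n, p)                         # P(X <= 7)
p_4_or_more = 1 - binomial_cdf(3, n, p)                       # 1 - P(X <= 3)
p_exactly_5 = binomial_cdf(5, n, p) - binomial_cdf(4, n, p)   # P(X <= 5) - P(X <= 4)
p_exactly_10 = 1 - binomial_cdf(9, n, p)                      # 1 - P(X <= 9)
print(round(p_less_than_8, 4), round(p_4_or_more, 4),
      round(p_exactly_5, 4), round(p_exactly_10, 4))
```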
(d) Characteristics of the Binomial Distribution
Theorem (Mean and Variance of a Binomial Random Variable)
Let $X$ represent the number of successes in a binomial experiment consisting of ‘n’ trials, and where the probability of success on each trial is ‘p’. Then
$E(X) = np$
$Var(X) = np(1 - p)$
Each combination of ‘n’ and ‘p’ gives a particular binomial distribution. We say ‘n’ and ‘p’ are the parameters of the binomial distribution.
If $p = 0.5$, the binomial distribution is symmetric.
Example: Suppose $n = 5$ and $p = 0.5$
(Probability histogram: probability plotted against $X$ = 0, 1, 2, 3, 4, 5, with vertical axis values 0.0313, 0.1563 and 0.3125.)
The binomial distribution will be skewed to the left (i.e. ‘negatively skewed’) if $p > 0.5$, and skewed to the right (i.e. ‘positively skewed’) if $p < 0.5$. In either case the tendency to be skewed diminishes as ‘n’ increases.
The binomial distribution is a model for the relative frequency (probability) distribution of numbers of successes in ‘n’ trials of a Bernoulli experiment.
The binomial distribution can be represented by the probability function
$$P(X = x) = {}^{n}C_{x}\,p^{x}(1 - p)^{n - x}$$
where ‘n’ is the number of trials, ‘x’ the number of successes and ‘p’ the probability of success at each trial.
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 5
Required Reading:
Ref. File 6: Introduction and Sections 6.1 to 6.4
6. CONTINUOUS PROBABILITY DISTRIBUTIONS
6.1 Introduction
From now on we shall be mainly concerned with studying the distributions of continuous random variables. As we have noted, a continuous random variable can assume any value in a given interval.
The probability distribution for a continuous random variable $X$ will have a smooth curve or line as its graphical representation. The heights of the points on this curve will be given by a function of ‘x’, denoted $f(x)$, which is variously called the probability density function, the probability distribution or simply the density function of the random variable $X$.
Areas under a density function $f(x)$ represent probabilities of $X$ taking on values in the corresponding intervals.
(Figure: graph of the density function $y = f(x)$ against $X$; the area under the curve between ‘a’ and ‘b’ equals $P(a \leq X \leq b)$.)
Properties of Density Functions
If $f(x)$ is a valid density function, it satisfies the following two properties:
(i) $f(x) \geq 0$ for all x
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Note: For a continuous random variable the probability associated with any particular value of the variable is 0.
The mean and variance of a continuous random variable are normally determined using calculus.
6.2 The Uniform Distribution
“If a random variable $X$ can take on any value in a given finite interval $a \leq x \leq b$ and the probability of the variable taking a value in a given finite sub-interval is the same as the probability the variable takes a value in any other finite sub-interval of the same width, we say the variable $X$ is uniformly distributed.” We have the following formal definition.
Definition (Uniform Random Variable)
A continuous random variable $X$ is said to be uniformly distributed over the finite interval $a \leq X \leq b$ if and only if its density function is given by
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & \text{if } a \leq x \leq b \\ 0, & \text{if } x < a \text{ or } x > b \end{cases}$$
We can calculate probabilities with respect to the random variable $X$ in the above definition from
$$P(c \leq X \leq d) = \frac{d - c}{b - a} \quad \text{for } c \geq a,\ d \leq b$$
(Figure: uniform density $f(x)$ of constant height $1/(b-a)$ between ‘a’ and ‘b’, with total area 1; the points ‘c’ and ‘d’ lie between ‘a’ and ‘b’ on the $X$ axis.)
Theorem (Expected Value and Variance of a Uniform Random Variable)
Suppose the random variable $X$ is uniformly distributed over the finite interval $a \leq x \leq b$. The expected value and variance of $X$ are, respectively
$$E(X) = \frac{(a + b)}{2}$$
$$Var(X) = \frac{(b - a)^2}{12}$$
Example: The amount of petrol sold daily by a service station (say $X$) is known to be uniformly distributed between 4,000 and 6,000 litres inclusive. What is the probability of sales on any one day being between 5,500 and 6,000 litres? (0.25)
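The petrol example applies the interval formula for a uniform random variable with a = 4,000 and b = 6,000. The sketch below also evaluates the mean and variance from the theorem; the function name `uniform_prob` is illustrative.

```python
def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b], assuming a <= c <= d <= b."""
    return (d - c) / (b - a)


a, b = 4000, 6000                       # litres sold per day
p_sales = uniform_prob(5500, 6000, a, b)
mean = (a + b) / 2                      # E(X)
variance = (b - a) ** 2 / 12            # Var(X)
print(p_sales, mean, round(variance, 2))
```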
6.3 The Normal (Gaussian) Distribution
The normal distribution represents a family of “bell-shaped” distributions that are distinguished according to their mean and variance.
Definition (Normally Distributed Random Variable)
A random variable $X$ is normally distributed if and only if it has a density function of the following form:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2\sigma^2}(x - \mu)^2} \quad \text{for all real ‘x’}$$
Where:
$\mu$ and $\sigma^2$ are parameters of the distribution of $X$. They are used to represent $E(X)$ and $Var(X)$, respectively.
‘e’ is the irrational number that serves as the base for natural logarithms $(e = 2.7182..)$
$\pi$ is the irrational number representing the ratio of the circumference of a circle to its diameter $(\pi = 3.1415..)$
A normal distribution with mean $\mu$ and variance $\sigma^2$ is usually denoted $N(\mu, \sigma^2)$.
The normal distribution has a positive density for all real ‘x’. Therefore it can strictly speaking never exactly match the distribution of a variable that only takes on non-negative values. However, even in such cases it can often give a very good approximation.
The normal distribution is symmetric about $\mu$.
(Figure: bell-shaped graph of the density function $y = f(x)$.)
For any normal distribution it will be the case that, approximately:
68% of its values will fall within one standard deviation ($\pm\sigma$) of $\mu$.
95.5% of its values will fall within two standard deviations ($\pm 2\sigma$) of $\mu$.
99.7% of its values will fall within three standard deviations ($\pm 3\sigma$) of $\mu$.
Computing areas under a normal density function is difficult, but we can use a table showing probabilities associated with the standardised normal random variable (many calculators and Microsoft Excel are also able to calculate these probabilities).
The standard normal distribution has a mean of 0 and a variance (and standard deviation) of 1. A standard normal variable is often denoted $Z$. Thus
$Z \sim N(0, 1)$
Probabilities relating to $X \sim N(\mu, \sigma^2)$ can be calculated by first calculating the standardised $Z$ scores corresponding to the value(s) of $X$ and then using the standard normal probability table. This is formalized by the following theorem.
Theorem 6.2 (The Standardizing Transformation of Non-Standard Normal Probabilities)
A random variable $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$ if and only if $Z = \frac{X - \mu}{\sigma}$ is a standard normal random variable, that is
$X \sim N(\mu, \sigma^2)$ if and only if $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$
Also note that a linear function of a normal variable is also normally distributed.
(Extract of Appendix 5)
AREAS UNDER THE STANDARD NORMAL DISTRIBUTION
The table below gives areas under the standard normal distribution between 0 and z.
(Figure: standard normal curve with 0 and z marked on the $Z$ axis.)
  z       0      1      2      3      4      5      6      7      8      9
 0.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
 0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0754
 0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
 0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
 0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
 0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
 0.6   .2258  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2518  .2549
 0.7   .2580  .2612  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
 0.8   .2881  .2910  .2939  .2967  .2996  .3023  .3051  .3078  .3106  .3133
 0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
 1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
 1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
 1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
 1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
 1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
 1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
 1.6   .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
 1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
 1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
 1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
 ....  .....  .....  .....  .....  .....  .....  .....  .....  .....  .....
 3.8   .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999
 3.9   .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000
Example: If $Z \sim N(0, 1)$ determine the following probabilities:
$P(Z > 0)$ (0.5)
$P(Z > 0.5)$ (0.3085)
$P(-0.1 < Z < 0.9)$ (0.3557)
$P(Z > 1.64)$ (0.0505)
Example: If $X \sim N(12, 4)$, calculate $P(X < 6.26)$, $P(7 < X < 13)$ and $P(X > 15.5)$. (0.0021, 0.6853, 0.0401)
Example: From several years’ records, a fish market manager has determined that the weight of deep sea bream sold in the market ($X$) is approximately normally distributed with a mean of 420 grams and a standard deviation of 80 grams. Assuming this distribution will remain unchanged in the future, calculate the expected proportions of deep sea bream sold over the next year weighing
(a) Between 300 and 400 grams. (0.3345)
(b) Between 300 and 500 grams. (0.7745)
(c) More than 600 grams. (0.0122)
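As an alternative to the standard normal table, the bream proportions can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch; results may differ from table-based answers in the fourth decimal place because the table rounds z to two decimals.

```python
from statistics import NormalDist

# Weight of deep sea bream: mean 420 g, standard deviation 80 g.
weight = NormalDist(mu=420, sigma=80)

p_300_to_400 = weight.cdf(400) - weight.cdf(300)
p_300_to_500 = weight.cdf(500) - weight.cdf(300)
p_over_600 = 1 - weight.cdf(600)

# Standardised Z score for 600 g, as would be used with the table.
z_600 = (600 - 420) / 80

print(round(p_300_to_400, 4), round(p_300_to_500, 4), round(p_over_600, 4), z_600)
```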
Example: It is known that 60% of cars registered in a given town use unleaded petrol. A random sample of 200 cars is selected. Determine the probability that, of the cars in the sample:
130 use unleaded petrol. (0.021)
more than 130 use unleaded petrol. (0.0643)
less than 130 use unleaded petrol. (0.9147)
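The quoted answers are consistent with a normal approximation to the binomial using a continuity correction, with $\mu = np$ and $\sigma^2 = np(1-p)$. The sketch below makes that assumption explicit; it is an illustration rather than the worked solution from the lecture, and its results differ slightly from the quoted values because the table rounds z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

n, p = 200, 0.6
# Approximating normal distribution: mean np = 120, variance np(1-p) = 48.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Continuity correction: the discrete value 130 covers the interval 129.5 to 130.5.
p_exactly_130 = approx.cdf(130.5) - approx.cdf(129.5)
p_more_than_130 = 1 - approx.cdf(130.5)
p_less_than_130 = approx.cdf(129.5)

print(round(p_exactly_130, 4), round(p_more_than_130, 4), round(p_less_than_130, 4))
```

The three probabilities sum to 1, since the three events partition the possible sample counts.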
MAIN POINTS
The graphical representation of a continuous random variable is the graph of its density function; this is the counterpart of the probability histogram for a discrete random variable.
The probability that a continuous random variable takes on a value in some range is given by an area under the density function.
The uniform distribution has a constant density function.
If $X$ is normally distributed with a mean $\mu$ and variance $\sigma^2$, we can write this information as
$X \sim N(\mu, \sigma^2)$
The standard normal random variable $Z$ is such that
$Z \sim N(0, 1)$
Areas under a normal density function can be calculated with reference to the ‘standard normal table’, and making use of the symmetry of the distribution as needed.
The normal distribution can be used to approximate binomial probabilities provided $np \geq 5$ and $n(1 - p) \geq 5$ (with $\mu = np$ and $\sigma^2 = np(1 - p)$); the approximation can be improved by using continuity correction.
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 6
Required Reading:
Ref. File 7: Introduction and Sections 7.1 to 7.4
7. INTRODUCTION TO ESTIMATION
7.1 Estimators and Their Properties
From now on we will mainly be concerned with ‘random samples of random variables’.
Definition (Random Sample of Size ‘n’ of a Random Variable)
Consider a set of random variables $X_1, X_2, \ldots, X_n$. This set of random variables is said to represent a random sample of size ‘n’ of the random variable $X$ if
(i) $X_1, X_2, \ldots, X_n$ are all statistically independent
and
(ii) $X_1, X_2, \ldots, X_n$ each have the same probability distribution (or distribution function) as the random variable $X$.
We will mostly use an upper case italicized letter to denote a random variable, and a lower case non-italicized letter to denote an actual realization or value of the variable.
Definition (Sample Statistic)
Suppose the random variables $X_1, X_2, \ldots, X_n$ are associated with a sample of size ‘n’ from a statistical population. Then any function of (or formula containing) $X_1, X_2, \ldots, X_n$ that does not depend on any unknown parameter is called a sample statistic.
Definition (Estimator/Estimate of a Population Parameter)
Suppose the random variables $X_1, X_2, \ldots, X_n$ are associated with a sample of size ‘n’ from a statistical population. Then a sample statistic involving $X_1, X_2, \ldots, X_n$ that is used to estimate a parameter of the population or associated probability distribution is called an estimator of the parameter, and a realization of the sample statistic (an actual number) is called an estimate of the parameter.
Definition (Sample Mean and Variance of a Random Variable)
Suppose the random variables $X_1, X_2, \ldots, X_n$ represent a random sample of size ‘n’ of the random variable $X$. The sample mean and variance of $X$ are then defined as, respectively
Sample Mean of $X$: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
Sample Variance of $X$: $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
If an estimator is used to obtain a single value estimate of a parameter, this estimate is called a point estimate.
An interval estimate describes a range, or interval, of values in which the population parameter is believed to be.
An interval estimate is normally centred around a point estimate.
Since estimators are functions of random variables, they will also be random variables whose values vary from sample to sample. The probability distribution of an estimator is called a sampling distribution.
Most statistical inference is based on a knowledge of the
sampling distributions of estimators.
Properties of Estimators
Definition (Unbiased Estimator)
Consider an estimator $\hat{\theta}$ of some population parameter $\theta$. $\hat{\theta}$ is an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$. If $E(\hat{\theta}) \neq \theta$, $\hat{\theta}$ is said to be a biased estimator of $\theta$, with the value of the bias given by $B = E(\hat{\theta}) - \theta$.
($\theta$ is the lower case version of the Greek letter ‘theta’)
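To make the idea of bias concrete, the sketch below computes the exact expected value of two candidate estimators of a population variance by listing every possible random sample of size 2. The population (a variable equal to 0 or 1 with probability 1/2 each) and all function names are assumptions chosen purely for illustration. The first estimator is the sample variance defined earlier (divisor n − 1); the second divides by n instead.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: X is 0 or 1, each with probability 1/2,
# so the true variance is 1/4.
distribution = {0: Fraction(1, 2), 1: Fraction(1, 2)}
true_variance = Fraction(1, 4)
n = 2  # sample size


def expected_value(estimator):
    """Exact expectation of an estimator over all random samples of size n."""
    total = Fraction(0)
    for sample in product(distribution, repeat=n):
        probability = Fraction(1)
        for x in sample:
            probability *= distribution[x]  # independence: multiply marginals
        total += probability * estimator(sample)
    return total


def variance_divide_by_n_minus_1(sample):
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample) / (len(sample) - 1)


def variance_divide_by_n(sample):
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample) / len(sample)


# Bias = E(estimator) - true parameter value.
bias_s2 = expected_value(variance_divide_by_n_minus_1) - true_variance
bias_alt = expected_value(variance_divide_by_n) - true_variance
print(bias_s2, bias_alt)
```

In this small case the n − 1 version has zero bias, while the divide-by-n version underestimates the variance on average.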
Definition (Relative Efficiency of an Estimator)
If $\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased estimators of a population parameter $\theta$ with unequal variances, $\hat{\theta}_1$ is said to be relatively more efficient than $\hat{\theta}_2$ if
$$Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$$
Definition (Consistency of an Estimator)
An estimator $\hat{\theta}$ of some population parameter $\theta$ is said to be a consistent estimator of $\theta$ if, as the (random) sample size increases, the probability that the estimator yields an estimate in any fixed interval, however small, centred on the true parameter value approaches 1.

Theorem (Sufficient Condition for Consistency of an Estimator)
An estimator $\hat{\theta}$ of some population parameter $\theta$ is a consistent estimator of this parameter if
$$\lim_{n \to \infty} E(\hat{\theta}) = \theta \quad \text{and} \quad \lim_{n \to \infty} Var(\hat{\theta}) = 0$$
7.2 The Sampling Distribution of the Sample Mean
Example:
Suppose we know that in a large city 20% of households possess no car, 60% possess one car and 20% possess two cars. If we let $X$ be the number of cars in a household we can write the probability distribution of $X$ as

x             0      1      2
P(X = x)      1/5    3/5    1/5
Determine the sampling distribution of $\bar{X}$ based on random samples of size 2.

x_1    x_2    x̄      P((X_1 = x_1) ∩ (X_2 = x_2))
0      0      0      1/5 × 1/5 = 1/25
0      1      0.5    1/5 × 3/5 = 3/25
0      2      1      1/5 × 1/5 = 1/25
1      0      0.5    3/5 × 1/5 = 3/25
1      1      1      3/5 × 3/5 = 9/25
1      2      1.5    3/5 × 1/5 = 3/25
2      0      1      1/5 × 1/5 = 1/25
2      1      1.5    1/5 × 3/5 = 3/25
2      2      2      1/5 × 1/5 = 1/25
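Collecting the probabilities for each distinct value of the sample mean gives the sampling distribution of $\bar{X}$:

x̄             0       0.5     1        1.5     2
P(X̄ = x̄)      1/25    6/25    11/25    6/25    1/25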
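The enumeration in this example can be reproduced mechanically. The following sketch lists every ordered sample of size 2 from the car-ownership distribution, multiplies the probabilities (which is valid because the draws in a random sample are independent), and adds up the probability attached to each value of the sample mean. The function name and the use of exact fractions are illustrative choices.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Probability distribution of X (number of cars per household) from the example.
p_x = {0: Fraction(1, 5), 1: Fraction(3, 5), 2: Fraction(1, 5)}


def sampling_distribution_of_mean(p, n):
    """Exact distribution of the sample mean for random samples of size n."""
    dist = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for sample in product(p, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= p[x]  # independence: joint probability is the product
        dist[Fraction(sum(sample), n)] += prob
    return dict(sorted(dist.items()))


dist = sampling_distribution_of_mean(p_x, 2)
for x_bar, prob in dist.items():
    print(float(x_bar), prob)
```

Changing the second argument gives the sampling distribution for other sample sizes, although the number of samples to enumerate grows as 3 to the power n.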
Theorem (The Central Limit Theorem)
Consider a random sample $X_1, X_2, \ldots, X_n$ of size ‘n’ of a random variable $X$ with a finite mean $E(X) = \mu$ and a finite variance $Var(X) = \sigma^2$. Then:
(i) If $X$ is (exactly) normally distributed, the sample mean $\bar{X}$ will be exactly normally distributed with a mean $\mu$ and a variance $\sigma^2/n$.
(ii) If $X$ is not normally distributed, the sample mean $\bar{X}$ will be approximately normally distributed with a mean $\mu$ and a variance $\sigma^2/n$ for large sample sizes. This approximation is generally considered to be valid when $n \geq 30$.
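Part (ii) of the theorem can be explored by simulation. The sketch below repeatedly draws random samples of size 30 from the (non-normal) car-ownership distribution used in the earlier example and looks at how the resulting sample means behave. The seed, the number of replications and the variable names are arbitrary assumptions made for illustration; the simulation only suggests, and does not prove, the theorem.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

values = [0, 1, 2]          # number of cars per household
weights = [0.2, 0.6, 0.2]   # P(X = 0), P(X = 1), P(X = 2)
mu = 1.0                    # E(X) = 0(0.2) + 1(0.6) + 2(0.2)
sigma_squared = 0.4         # Var(X) = E(X^2) - mu^2 = 1.4 - 1
n = 30                      # sample size
replications = 20_000       # number of simulated samples (arbitrary choice)

# One sample mean per simulated random sample of size n.
sample_means = [
    sum(random.choices(values, weights=weights, k=n)) / n
    for _ in range(replications)
]

mean_of_means = sum(sample_means) / replications
variance_of_means = (
    sum((m - mean_of_means) ** 2 for m in sample_means) / replications
)

# Share of sample means within 1.96 standard errors of mu; a normal
# distribution would put about 95% of its probability in this range.
standard_error = (sigma_squared / n) ** 0.5
share_within = (
    sum(1 for m in sample_means if abs(m - mu) <= 1.96 * standard_error)
    / replications
)

print(mean_of_means, variance_of_means, sigma_squared / n, share_within)
```

The simulated mean and variance of the sample means should sit close to $\mu$ and $\sigma^2/n$. Because $X$ is discrete, the share within 1.96 standard errors will only roughly match the normal figure of 95%.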
Note: $Var(\bar{X})$ decreases as ‘n’ increases and approaches zero in the limit. This, together with the fact that $\bar{X}$ is unbiased, ensures that $\bar{X}$ is a consistent estimator of $\mu$.
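The two facts used in this note follow directly from the definition of a random sample. As a short worked sketch, using only that the $X_i$ are independent and each have mean $\mu$ and variance $\sigma^2$:

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu$$

$$Var(\bar{X}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \to 0 \text{ as } n \to \infty$$

The second line relies on independence, which allows the variance of the sum to be written as the sum of the variances.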
Note: The standard deviation of an estimator is often called the standard error of the estimator, although often this term is used for an estimate of the standard deviation of an estimator.
Example:
A particular type of light bulb has a mean life of 6,000 hours and a standard deviation of bulb life of 400 hours. What percentage of random samples made up of 100 observations of bulb lives will yield mean bulb lives between 5,950 and 6,050 hours? (78.88%)
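One way to check the light bulb answer is sketched below. Since n = 100 is large, the central limit theorem suggests treating the sample mean as approximately normal with mean 6,000 and standard error 400/√100 = 40. The sketch uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

mu = 6000      # mean bulb life in hours (from the example)
sigma = 400    # standard deviation of bulb life in hours
n = 100        # sample size

standard_error = sigma / n ** 0.5          # 400 / 10 = 40
sample_mean_dist = NormalDist(mu, standard_error)

# P(5950 < sample mean < 6050), i.e. P(-1.25 < Z < 1.25)
probability = sample_mean_dist.cdf(6050) - sample_mean_dist.cdf(5950)
print(f"{probability:.4f}")
```

The computed probability agrees with the quoted 78.88% up to the rounding in four-decimal standard normal tables.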
8. INTERVAL ESTIMATION
8.1 Introduction and Terminology
A confidence interval comprises not only an interval of possible population parameter values, but also some measure of the degree of belief or confidence that the interval does indeed contain the parameter in question.
The level of confidence associated with a confidence interval is the probability that we will obtain a realization of the interval that contains the population parameter, i.e. a probability assessed before we actually take a sample. It is usually denoted $(1 - \alpha)100\%$, where $\alpha$ is the probability ($0 < \alpha < 1$) of obtaining a realization of the interval that does not contain the population parameter.
Confidence intervals are constructed on the basis of knowledge of the sampling distribution of the estimator (or some function thereof) and a predetermined $\alpha$.
The $z_\alpha$ Notation:
$z_\alpha$ is used to denote the value of the standard normal variable $Z$ such that
$$P(Z > z_\alpha) = \alpha$$
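In practice $z_\alpha$ is read from standard normal tables. As a hedged alternative, the sketch below obtains it from the inverse cumulative distribution function of the standard normal: an upper-tail area of $\alpha$ corresponds to a cumulative probability of $1 - \alpha$. The helper name `z_value` is an illustrative assumption.

```python
from statistics import NormalDist


def z_value(alpha):
    """Return z_alpha, the point with upper-tail area alpha under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)


# Commonly used tail areas.
for alpha in (0.10, 0.05, 0.025, 0.005):
    print(alpha, round(z_value(alpha), 3))
```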
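$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Therefore, for a given $\alpha$:

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$

$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

$$P\left(z_{\alpha/2}\frac{\sigma}{\sqrt{n}} > \mu - \bar{X} > -z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

$$P\left(\bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} > \mu > \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \qquad (*)$$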
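Thus the $(1 - \alpha)100\%$ confidence interval for $\mu$ based on an observed $\bar{x}$ is given by

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \qquad (**)$$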
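A minimal sketch of this interval in Python follows. It assumes the population standard deviation is known and that the normal distribution is appropriate for the sample mean. The function name and the numbers in the usage example (a sample mean of 50, a standard deviation of 10 and a sample of 25 from a normal population) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def confidence_interval(x_bar, sigma, n, alpha):
    """(1 - alpha)100% confidence interval for the mean, sigma known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / n ** 0.5         # z_{alpha/2} * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical observed sample mean and known population standard deviation.
lower, upper = confidence_interval(x_bar=50.0, sigma=10.0, n=25, alpha=0.05)
print(round(lower, 2), round(upper, 2))
```

The interval is symmetric about the observed sample mean, so it can equally be reported as the point estimate plus or minus the half-width.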
8.3 Properties of Confidence Intervals
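The width of a confidence interval for the population mean, where we are justified in using the normal distribution, is given by

$$(\bar{x} + z_{\alpha/2}\sigma_{\bar{X}}) - (\bar{x} - z_{\alpha/2}\sigma_{\bar{X}}) = 2z_{\alpha/2}\sigma_{\bar{X}} = 2z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$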
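For a given confidence level and a given $\sigma$, the confidence interval width decreases with increasing ‘n’. This leads to a criterion for choosing ‘n’.
If we wish to use a calculated $\bar{x}$ to estimate $\mu$ to within ‘D’ (units) with $(1 - \alpha)100\%$ confidence, we should choose ‘n’ such that

$$n \geq \left(\frac{z_{\alpha/2}\,\sigma}{D}\right)^2$$

(assuming a normally distributed population or $n \geq 30$)
This follows from requiring the half-width of the interval, $z_{\alpha/2}\sigma/\sqrt{n}$, to be no greater than $D$; in practice ‘n’ is rounded up to the next whole number.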
Example:
A clothing shop located in a busy shopping arcade is interested in estimating the mean age of people who frequent the arcade. The shop intends to use this information in determining the appropriate range of clothing it should stock in order to maximize sales. A sample of people is to be selected at random in the arcade and questioned by the shop manager about their age. What should the sample size be if the shop manager wishes to use a calculated $\bar{x}$ to estimate the average age of people who frequent the arcade to within 1.5 years, with 95% confidence, assuming the population standard deviation is approximately 7.5? (97)
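The sample size for the arcade example can be checked with a short calculation. The sketch below applies the sample size criterion from Section 8.3 and rounds up to the next whole person; the function name is an illustrative assumption.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, d, alpha):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) no greater than d."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return math.ceil((z * sigma / d) ** 2)    # round up to a whole number


# Arcade example: sigma about 7.5 years, within 1.5 years, 95% confidence.
n = required_sample_size(sigma=7.5, d=1.5, alpha=0.05)
print(n)
```

Rounding up rather than to the nearest integer matters here: the unrounded value is just over 96, and a sample of 96 would leave the interval slightly too wide.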
MAIN POINTS
A random sample of a random variable is such that the random variables representing the sample are independently and identically distributed.

An estimator of a population parameter is a formula containing the random variables representing sample values.

The probability distribution of an estimator is called a sampling distribution.

An unbiased estimator of a parameter has a mean equal to the parameter value.

A consistent estimator has a probability distribution that becomes ‘more concentrated’ around the true parameter value as n tends to infinity.

$E(\bar{X}) = \mu$, and $Var(\bar{X}) = \sigma^2/n$ (if the $X_i$'s represent a random sample of the random variable $X$).

$\bar{X}$ is an unbiased and consistent estimator of $\mu$.

The central limit theorem says that even if we are sampling from a non-normal distribution, the distribution of the sample mean will be approximately normal provided the sample size is sufficiently large.