Top Banner
Normal and Poisson Distributions GTECH 201 Lecture 14
47

Normal and Poisson Distributions GTECH 201 Lecture 14.

Dec 21, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Normal and Poisson Distributions GTECH 201 Lecture 14.

Normal and Poisson Distributions

GTECH 201Lecture 14

Page 2: Normal and Poisson Distributions GTECH 201 Lecture 14.

Sampling Population

The entire group of objects about which information is sought

Unit Any individual member of the population

Sample A part or a subset of the population used to

gain information about the whole Sampling Frame

The list of units from which the sample is chosen

Page 3: Normal and Poisson Distributions GTECH 201 Lecture 14.

Simple Random Sampling

A simple random sample of size n is a sample of n units chosen in such a way that every collection of n units from a sampling frame has the same chance of being chosen

Page 4: Normal and Poisson Distributions GTECH 201 Lecture 14.

Random Sampling in R

In R you can simulate random draws

For example, to pick five numbers at random from the set 1:40, you can

> sample(1:40,5)

[1] 4 30 28 40 13

Page 5: Normal and Poisson Distributions GTECH 201 Lecture 14.

Sampling with Replacement

Default in R is ‘without replacement’

sample(c("H", "T"), 10, replace=T)

[1] "T" "T" "T" "T" "T" "H" "H" "T" "H" "T“

prob=c(.9,.1)sample(c("S", "F"), 10, replace=T, prob)

Page 6: Normal and Poisson Distributions GTECH 201 Lecture 14.

Random Number Tables A table of random digits is:

A list of 10 digits 0 through 9 having the following properties

The digit in any position in the list has the same chance of being any of of 0 through 9;

The digits in different positions are independent, in that the value of one has no influence on the value of any other

Any pair of digits has the same chance of being any of the 100 possible pairs, i.e., 00,01,02, ..98, 99

Any triple of digits has the same chance of being any of the 1000 possible triples, i.e., 000, 001, 002, …998, 999

Page 7: Normal and Poisson Distributions GTECH 201 Lecture 14.

Using Random Number Tables

A health inspector must select a SRS of size 5 from 100 containers of ice cream to check for E. coli contamination

The task is to draw a set of units from the sampling frame

Assign a number to each individual Label the containers 00, 01,02,…99 Enter table and read across any line

81486 69487 6051309297

81, 48, 66, 94, 87, 60, 51, 30, 92, 97

Page 8: Normal and Poisson Distributions GTECH 201 Lecture 14.

Random Number Generation in R

> rnorm(10) > rnorm(10, mean=7, sd=5) > rbinom(10, size=20, prob=.5)

We will revisit the meaning of the parameters at the end of today’s session

Page 9: Normal and Poisson Distributions GTECH 201 Lecture 14.

Combinatorics 1 Back to draw five out of 40

sample(1:40,5) The probability for any given

number is 1/40 in the first sample,, 1/39 in the second, and so on

P(x ) = 1/(40*39*38*37*36*35) > 1/prod(40:36)

[1] 1.266449e-08But…

Page 10: Normal and Poisson Distributions GTECH 201 Lecture 14.

We don’t care about the order of the five numbers out of 40

There are 5*4*3*2*1 combinations for the five drawn numbers

> prod(1:5) / prod(40:36)[1] 1.519738e-06

Shorthand for the above in > 1/choose(40,5)

Combinatorics 2

Page 11: Normal and Poisson Distributions GTECH 201 Lecture 14.

Binomial Distribution

Discrete probability distribution Events have only 2 possible outcomes

binary, yes-no, presence-absence Computing probability of multiple events

or trials Examples

Probability that x number of people are alive at the age of 65

Probability of a river reaching flood stage for three consecutive years

Page 12: Normal and Poisson Distributions GTECH 201 Lecture 14.

When to Apply Binomial If sample is less than 10% of a large

population in which a proportion p have acharacteristic of interest, then the distribution X, the number in the sample with that characteristic, is approximately binomial (n, p), where n is the sample size

Page 13: Normal and Poisson Distributions GTECH 201 Lecture 14.

Geometric Distribution

Tossing a biased coin until the first head appears pr(H) = p

pr(X = x) = pr(TT…T H) = pr(T1 ∩ T2 ∩ ..∩ Hx) = (1 – p)x-1 p

The geometric distribution is the distributionof the number of tosses of a biased coin up toand including the first head

Page 14: Normal and Poisson Distributions GTECH 201 Lecture 14.

Poisson Distribution Discrete probability distribution

Named in honor of Simeon Poisson (1781-1840)

What is it used for? To model the frequency with which a specified

event occurs over a period of time The specified event occurs randomly

Independent of past or future occurrences Geographers also use this distribution to model

how frequently an event occurs across a particular area

We can also examine a data set (of frequency counts in order to determine whether a random distribution exists

Page 15: Normal and Poisson Distributions GTECH 201 Lecture 14.

Poisson Distribution is used…

To analyze the number of patients arriving at a hospital emergency room between 6 AM and 7 AM on a particular day Obvious implications for resource allocation

To analyze the number of phone calls per day arriving at a telephone switchboard

To analyze the number of cars using the drive through window at a fast-food restaurant

To analyze hailstorm occurrence in one Canadian province

Page 16: Normal and Poisson Distributions GTECH 201 Lecture 14.

The Poisson Probability Formula

( )!

x

P X x ex

Lambda () is a positive real number (mean frequency)

e = 2.718 (mathematical constant)

X = 0, 1, 2, 3, ….(frequency of an occurrence)

X!= X factorial

Page 17: Normal and Poisson Distributions GTECH 201 Lecture 14.

Example - 1 General Hospital, located in Phoenix, keeps

records of emergency room traffic. From these records, we find that the number of patients arriving between 10 AM and 12 Noon has a Poisson distribution of with parameter =6.9Determine the probability that, on any given day, the number of patients arriving at that emergency room between 10 AM and 12 Noon will be: Exactly four At the most two

Page 18: Normal and Poisson Distributions GTECH 201 Lecture 14.

Exactly four arrivals, x=4

4(6.9) (6.9)( 4)

4! P X e

(6.9) 2266.7121( 4) 2.71824

P X

0.095

Page 19: Normal and Poisson Distributions GTECH 201 Lecture 14.

At the most, two arrivals…

( 2) ( 0) ( 1) ( 2) P X p X p X P X

0 1 2(6.9) (6.9) (6.9) (6.9)

( 2)0! 1! 2!

P X e

6.9 (1 6.9 23.805) e

0.032

( 2)P X

Page 20: Normal and Poisson Distributions GTECH 201 Lecture 14.

Revisiting Mean and Standard Deviation

Mean

Standard Dev.

Page 21: Normal and Poisson Distributions GTECH 201 Lecture 14.

What if…We wanted to obtain a table of probabilities for the random variable X, the number of patients arriving between 10AM and 12 Noon?

Number arriving, x

Probability (X=x)

0 0.0011 0.0072 0.0243 0.0554 0.0955 0.1316 0.1517 0.1498 0.1289 0.098

10 0.068

Page 22: Normal and Poisson Distributions GTECH 201 Lecture 14.

Discrete versus Continuous

Distributions Moving from individual probabilities to

total number of successes or failures Probability distribution f (x ) = P (X=x)

for discrete events:

Probability distribution for continuous events:

!1

!( )!n xxn

f x p px n x

22

1exp

22

xf x

Page 23: Normal and Poisson Distributions GTECH 201 Lecture 14.

Expected Values Population standard deviation

square root of the average squared distance of X from the mean

2( ) [( ) ]sd X E X

Page 24: Normal and Poisson Distributions GTECH 201 Lecture 14.

Expected Values

Mean and Poisson distribution

It can be shown that this adds to . Thus, for Poisson-distributed populations E(X) =

The standard deviation sd(X) for Poisson() is √

E(X) = ixipr(xi)

= 0 x pr(0) + 1 x pr(1) + 2 x pr(2) + ...

= 0 x pr( ) + 1 x pr( ) + 2 x pr( ) + ...e-00!

e-11!

e-22!

Page 25: Normal and Poisson Distributions GTECH 201 Lecture 14.

Probability Density Functions

Moving from the discrete to the continuous

Increasing the frequency of observations results in an ever finer histogram

Total area under the curve = 1

Page 26: Normal and Poisson Distributions GTECH 201 Lecture 14.

Probability Density Functions

Population means and standard dev’s• x balances the distribution

• The standard deviation is calculated as for discrete density functions

x

f(x)

x

Page 27: Normal and Poisson Distributions GTECH 201 Lecture 14.

The Normal Distribution

Page 28: Normal and Poisson Distributions GTECH 201 Lecture 14.

Properties of a Normal Distribution

Continuous Probability Distribution Symmetrical about a central point

No skewness Central point in this dataset corresponds to all

three measures of central tendency Also called a Bell Curve If we accept or assume that our data is

normally distributed, then, We can compute the probability of different

outcomes

Page 29: Normal and Poisson Distributions GTECH 201 Lecture 14.

Using the symmetrical property of the

distribution, we can conclude: 50 % of values must lie to the right, i.e. they

are greater than the mean 50% of values must lie to the left, i.e. If the data is normally distributed, the

probability values are also normally distributed

The total area under the normal curve represents all (100%) of probable outcomes

What can you say about data values in a normally distributed data set?

Properties of a Normal Distribution

Page 30: Normal and Poisson Distributions GTECH 201 Lecture 14.

Normal Distribution and Standard Deviations

Page 31: Normal and Poisson Distributions GTECH 201 Lecture 14.

Approximating a Normal Distribution

In reality, If a variable’s distribution is shaped

roughly like a normal curve, Then the variable approximates a

normal distribution Normal Distribution is determined by

Mean Standard Deviation These measures are considered parameters

of a Normal Distribution / Normal Curve

Page 32: Normal and Poisson Distributions GTECH 201 Lecture 14.

Equation of a Normal Curve

Mean = ; Standard Deviation =

e = 2.718 ; = 3.142

22

1exp

22

xf x

Page 33: Normal and Poisson Distributions GTECH 201 Lecture 14.

Areas Within the Normal Curve

For a normally distributed variable, the percentage of all possible observations

that lie within any specified range equals the corresponding area under its associated normal curve expressed as a percentage

A college has an enrollment of 3264 female students. Mean height is 64.4 inches, standard deviation is 2.4 inches Frequency and relative frequency are

presented

Page 34: Normal and Poisson Distributions GTECH 201 Lecture 14.

Height (in) Freq Rel. Freq56 < 57 3 0.000957 < 58 6 0.001858 < 59 26 0.008059 < 60 74 0.022760 < 61 147 0.045061 < 62 247 0.075762 < 63 382 0.117063 < 64 483 0.148064 < 65 559 0.171365 < 66 514 0.157566 < 67 359 0.110067 < 68 240 0.073568 < 69 122 0.037469 < 70 65 0.019970 < 71 24 0.007471 < 72 7 0.002172 < 73 5 0.001573 < 74 1 0.0003

3264 1

Frequency and Relative

Frequency Table

0.0735, i.e. 7.35 % of the students are between 67 and 68 inches tall

Page 35: Normal and Poisson Distributions GTECH 201 Lecture 14.

Relative Frequency Histogram with Normal Curve

0.0735 = the area that has been cross-hatched

Shaded area under the normal curve approximates the percentage of students who are between 67-68 inches tall

Page 36: Normal and Poisson Distributions GTECH 201 Lecture 14.

Standardizing a Normal Variable

Once we have mean and standard deviation of a curve, we know its distribution and the associated normal curve

Percentages for a normally distributed variable are equal to the areas under the associated normal curve

There could be hundreds of different normal curves (one for each choice of mean or std. dev. value

How can we find the areas under a standard normal curve?

A normally distributed variable with a mean of 0 and a standard deviation of 1 is said to have a standard normal distribution

Page 37: Normal and Poisson Distributions GTECH 201 Lecture 14.

Z Score

xz

The variable z is called the standardized version of x, or the standardized variable corresponding to x, with the mean = 0 and standard deviation = 1Almost all observations in a dataset will lie within three standard deviations to either side of the mean, i.e., almost all possible observations will have z scores between – 3 and + 3

x xz

s

Page 38: Normal and Poisson Distributions GTECH 201 Lecture 14.

Normal Curve Properties

The total area under the standard normal curve is equal to 1

The standard normal curve extends infinitely in both directions, approaching but never touching the horizontal axis

Standard normal curve is symmetric about 0

Most of the area under a standard normal curve lies between –3 and + 3

Page 39: Normal and Poisson Distributions GTECH 201 Lecture 14.

Using the Standard Normal Table

The times taken for runners to complete a local 10 km race is normally distributed with a mean of 61 minutes and a standard deviation of 9 minutes. Let x be the finish time of a randomly selected runner. Find the probability that x > 75 minutes

Step 1 Calculate the standard score

z = 75-61/9; z = 1.56Step 2 Determine the probability from the normal table For z of 1.56, p = 0.4406Step 3 Interpret the result p (x>75) = 0.5 – 0.446 = 0.054 or 5.4% chance

Page 40: Normal and Poisson Distributions GTECH 201 Lecture 14.

Using the Standard Normal Table

In the previous example, what is the probability that someone finishes in less than 45 minutes?

Step 1 Calculate the standard score z = 45-61/9; z = -1.78Step 2 Determine the probability from the normal table For z of -1.78, area= 0.4625 Step 3 Interpret the result p (x<45) = 1- (0.5+0.4625) = 0.038 or 3.8 % of the runners finish in less than 45 minutes

Page 41: Normal and Poisson Distributions GTECH 201 Lecture 14.

Three Distributions

Distribution Parameters

Binomial number of events or trials probability of success

Poisson mean number of events

Normal mean standard deviation

Page 42: Normal and Poisson Distributions GTECH 201 Lecture 14.

Normal Approximations for Discrete Distributions

Approximation of the Binomial Binomial is used for large n and small p If p is moderate (not close to 0 or 1),

then the Binomial can be approximated by the normal

Rule of thumb: np (1-p) ≥ 10 Other normal approximations

If X ~ Poisson(), normal works well for ≥ 10

Page 43: Normal and Poisson Distributions GTECH 201 Lecture 14.

Built-in Distributions in

Four fundamental items can be calculated for a statistical distribution: Density or point probability Cumulated probability, distribution

function Quantiles Pseudo-random numbers

In there are functions for each of these

Page 44: Normal and Poisson Distributions GTECH 201 Lecture 14.

Density of a Normal Distribution

> x = seq(-4, 4, 0.1) > plot (x, dnorm(x), type="l")

-4 -2 0 2 4

0.0

0.1

0.2

0.3

0.4

x

dnor

m(x

)

Page 45: Normal and Poisson Distributions GTECH 201 Lecture 14.

For Discrete Distributions..

> x = 0:50 > plot (x, dbinom(x, size=50, prob=.33,

type="h")

0 10 20 30 40 50

0.0

00

.02

0.0

40

.06

0.0

80

.10

0.1

2

x

dbi

nom

(x, s

ize

= 5

0, p

rob

= 0

.33)

Page 46: Normal and Poisson Distributions GTECH 201 Lecture 14.

Cumulative Distribution Functions

Could be graphed but is not very informative

Example Blood sugar concentration in the US

population has a mean of 132 and a standard deviation of 13.How special is a patient with a value 160?

1 – pnorm(160, mean=132, sd=13)[1] 0.01562612 or 1.5%

Page 47: Normal and Poisson Distributions GTECH 201 Lecture 14.

Random Number Generation in R

> rnorm(10) > rnorm(10, mean=7, sd=5) > rbinom(10, size=20, prob=.5)

Now you understand the parameters…