
unit4quantitativeskills.org/module1/unit4.pdf · 2020. 9. 11.

Oct 17, 2020


UNIT 4 THEORETICAL PROBABILITY DISTRIBUTION

Prof. Ryung Kim

[email protected]


Unit 1.3 (PnG p. 4)

This unit extends the notion of probability and introduces some common probability distributions. These mathematical models are useful as a basis for the methods studied in the remainder of the text.


1. PROBABILITY DISTRIBUTIONS


Random Variable

A random variable is a function that assigns a number to each outcome.

E.g. In a single toss of a coin, let X be 1 if we observe a head and 0 if we observe a tail.

E.g. Consider tossing a coin 3 times, and let Y be the number of heads.

Notation

X, Y, Z, … : random variables

x, y, z, … : observations
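To make the idea of a random variable as a function concrete, here is a minimal Python sketch (not part of the original slides). It lists every outcome of three coin tosses and applies Y, the number of heads, to each one. The names `outcomes` and `count_heads` are illustrative only.

```python
from itertools import product

def count_heads(outcome):
    """Random variable Y: maps one outcome (a tuple of 'H'/'T') to its number of heads."""
    return sum(1 for toss in outcome if toss == "H")

# Sample space for three tosses: 2 * 2 * 2 = 8 outcomes.
outcomes = list(product("HT", repeat=3))

for outcome in outcomes:
    print("".join(outcome), "->", count_heads(outcome))

# The distinct values Y can take.
values_of_y = sorted({count_heads(o) for o in outcomes})
print(values_of_y)  # [0, 1, 2, 3]
```

Each printed line pairs an outcome (lowercase x, y, z play this role for observed values) with the number the random variable assigns to it.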


Definitions - Types of Random Variables

Discrete random variable: takes a finite (or countable) number of values

• Marital status, number of ear infections an infant develops during the first year of life, …

Continuous random variable: takes infinitely many values that can be mapped onto a continuous scale

• Height, weight, lifetime, forced expiratory volume in 1 second, …


Probability distribution

A probability distribution (for a discrete random variable) is a table (or formula) giving the probability of each value of the random variable.


Elementary Statistics, 10th Edition, p 202


Requirements for Probability Distribution

$\sum_i P(x_i) = 1$

$0 \le P(x_i) \le 1$

Mean, Variance, Standard Deviation of a Discrete Probability Distribution

$\mu = \sum_i [x_i \cdot P(x_i)]$   Mean or Expected Value

$\sigma^2 = \sum_i [(x_i - \mu)^2 \cdot P(x_i)]$   Variance

$\sigma = \sqrt{\sum_i [(x_i - \mu)^2 \cdot P(x_i)]}$   Standard Deviation
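As an illustration of these formulas (a sketch, not part of the original slides), the Python snippet below checks the two requirements and computes the mean, variance, and standard deviation for the number of heads in three coin tosses. It assumes a fair coin, which the slides do not state; the probabilities 1/8, 3/8, 3/8, 1/8 follow from that assumption.

```python
from math import sqrt, isclose

# Assumed distribution: number of heads in three tosses of a FAIR coin.
dist = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

# Requirements: each probability lies in [0, 1] and they sum to 1.
assert all(0 <= p <= 1 for p in dist.values())
assert isclose(sum(dist.values()), 1.0)

# Mean, variance, and standard deviation from the definitions.
mean = sum(x * p for x, p in dist.items())
variance = sum((x - mean) ** 2 * p for x, p in dist.items())
std_dev = sqrt(variance)

print(mean, variance, round(std_dev, 3))  # 1.5 0.75 0.866
```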


2. BERNOULLI AND BINOMIAL PROBABILITY DISTRIBUTION


Bernoulli Random Variable

A Bernoulli random variable Y has two possible values 1 and 0, and the probabilities of obtaining those values are p and 1-p, respectively.

In other words,

P(Y=1) = p and P(Y=0) = 1-p

E.g. life/death, male/female, sickness/health


Bernoulli Random variable

In 1987, 29% of the adults in the U.S. smoked cigarettes, cigars, or pipes [CDC, 1989]

Suppose we randomly select one person and let Y be 1 if the person smokes and 0 if he or she does not.

P(Y=1) = p = 0.29

P(Y=0) = 1-p = 0.71


Binomial Random variable

Now, suppose we randomly select three people, and let X be the number of smokers.

1st person   2nd person   3rd person   Probability        Number of smokers (X)
0            0            0            (1-p)(1-p)(1-p)    0
1            0            0            p(1-p)(1-p)        1
0            1            0            (1-p)p(1-p)        1
0            0            1            (1-p)(1-p)p        1
1            1            0            pp(1-p)            2
1            0            1            p(1-p)p            2
0            1            1            (1-p)pp            2
1            1            1            ppp                3

$P(X=0) = (1-p)^3 = (0.71)^3 = 0.358$

$P(X=1) = 3p(1-p)^2 = 3(0.29)(0.71)^2 = 0.439$

$P(X=2) = 3p^2(1-p) = 3(0.29)^2(0.71) = 0.179$

$P(X=3) = p^3 = (0.29)^3 = 0.024$
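The table can be reproduced by brute-force enumeration. The following Python sketch (an illustration, not from the slides) loops over all eight smoker/non-smoker patterns, multiplies the per-person probabilities as the table does (which treats the three selections as independent), and adds each product to the total for its number of smokers. The name `prob_x` is hypothetical.

```python
from itertools import product

p = 0.29  # P(a randomly selected person smokes), from the slide

# prob_x[k] accumulates P(X = k), where X = number of smokers among three people.
prob_x = {k: 0.0 for k in range(4)}

for outcome in product([0, 1], repeat=3):  # 1 = smoker, 0 = non-smoker
    prob = 1.0
    for person in outcome:
        prob *= p if person == 1 else 1 - p
    prob_x[sum(outcome)] += prob

for k, prob in prob_x.items():
    print(k, round(prob, 3))
```

The rounded results should agree with the four probabilities stated on the slide (0.358, 0.439, 0.179, 0.024).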


Notation

n = number of trials

x = number of successes among n trials

p = probability of success in any one trial

q = probability of failure in any one trial (q = 1 – p)

Probability distribution of Binomial Random Variable

$P(X=x) = \frac{n!}{(n-x)!\,x!} \cdot p^x (1-p)^{n-x}$ for x = 0, 1, 2, . . ., n


Understanding the Binomial Probability Formula

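$P(X=x) = \frac{n!}{(n-x)!\,x!} \cdot p^x (1-p)^{n-x}$

The first factor, $\frac{n!}{(n-x)!\,x!} = \binom{n}{x}$, is the number of combinations with exactly x successes in n trials.

The second factor, $p^x (1-p)^{n-x}$, is the probability of x successes and n-x failures in a particular order.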
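The two factors of the formula can be computed separately. This is a minimal Python sketch (not from the slides) using the standard library's `math.comb` for the number of combinations; the function name `binomial_pmf` is illustrative. It is applied to the three-person smoking example with n = 3 and p = 0.29.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    # comb(n, x) counts the orderings; p**x * (1-p)**(n-x) is the probability of one ordering.
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 3, 0.29
for x in range(n + 1):
    print(x, comb(n, x), round(binomial_pmf(x, n, p), 3))
```

For x = 1 and x = 2 the combination count is 3, which is where the factor 3 in the smoking example comes from.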


Binomial probability distribution

One can model a random phenomenon by a binomial probability distribution if it meets the following requirements:

1. The procedure has a fixed number of trials.

2. The trials must be independent.

3. Each trial has outcomes classified into two categories (‘success’ or ‘failure’).

4. The probability of a success remains the same in all trials.


Example – binomial model

When and how do you model a phenomenon with a binomial distribution?

In a population of flatworms (Planaria) living in a certain pond, one in five individuals is an adult and four in five are juveniles. An ecologist plans to count the adults in a random sample of 12 flatworms from the pond. What is the probability that she finds fewer than 5 adults?


Binomial probability model

[Figure: binomial probability distribution for n = 12 and p = 0.2, which is right skewed. The distribution is left skewed when p > 0.5 and symmetric when p = 0.5.]


Use R to compute binomial probabilities
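The R output shown on this slide is not reproduced in the transcript. In R, `dbinom(x, size, prob)` and `pbinom(q, size, prob)` give binomial point and cumulative probabilities. As a stand-in, the Python sketch below works the flatworm example, assuming the binomial model with n = 12 and p = 0.2 applies; the function names are illustrative, not taken from the slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x), summing the point probabilities from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 12, 0.2  # 12 flatworms sampled, one in five is an adult

# "Fewer than 5 adults" means X <= 4.
prob_fewer_than_5 = binomial_cdf(4, n, p)
print(round(prob_fewer_than_5, 4))  # 0.9274
```

Under these assumptions the ecologist finds fewer than 5 adults with probability of about 0.93.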


Use Stata to compute binomial probabilities

Mean, Variance, Standard Deviation of Binomial Distribution

Mean $\mu = np$

Variance $\sigma^2 = np(1-p)$

Std. Dev. $\sigma = \sqrt{np(1-p)}$

Recall the definition of mean and variance…

$\mu = \sum_i [x_i \cdot P(x_i)]$   $\sigma^2 = \sum_i [(x_i - \mu)^2 \cdot P(x_i)]$
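One way to see that the binomial shortcuts agree with the general definitions is to compute both numerically. The Python sketch below (an illustration, not from the slides) does this for the flatworm values n = 12 and p = 0.2; the variable names are hypothetical.

```python
from math import comb, sqrt, isclose

n, p = 12, 0.2  # flatworm example values from the slides

# Binomial probabilities P(X = x) for x = 0, ..., n.
probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Mean and variance from the general definitions for a discrete distribution.
mean = sum(x * px for x, px in enumerate(probs))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))

# They agree (up to floating-point error) with the binomial shortcuts.
assert isclose(mean, n * p)
assert isclose(variance, n * p * (1 - p))

print(round(mean, 2), round(variance, 2), round(sqrt(variance), 3))
```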


3. NORMAL DISTRIBUTION


Normal distribution

Many random variables of interest – e.g. blood pressure, amount of chemicals in the human body, height, and weight – are approximately normally distributed. (PnG, p.177)

The importance of the normal distribution will become clear in the following chapters.

The normal distribution is a continuous distribution.


Probability density function

A density curve is the graph of a continuous probability distribution.

1. The total area under the curve must equal 1.

2. Every point on the curve must have a vertical height that is 0 or greater. (That is, the curve cannot fall below the x-axis.)


Density function of Normal Distribution

The normal probability distribution has a bell-shaped density function, and the total area under its density curve is equal to 1. Its mean $\mu$ can be any number and its variance $\sigma^2$ can be any positive number.

Graph from Elementary Statistics, 10th Edition


Probability density function - Area and Probability

Because the total area under the density curve is equal to 1, there is a correspondence between area and probability.

The area under the density curve between a and b corresponds to P(a ≤ X ≤ b), i.e. the probability that the random variable takes a value between a and b.
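The correspondence between area and probability can be checked numerically. The sketch below (not from the slides) approximates the area under a normal density with the trapezoidal rule and compares it with the probability obtained from the cumulative distribution function in Python's `statistics.NormalDist`. The choice of mean 0, standard deviation 1, and the interval from -1 to 2 is arbitrary and only for illustration.

```python
from statistics import NormalDist

def area_under_density(dist, a, b, steps=10_000):
    """Trapezoidal approximation of the area under dist's density between a and b."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    total += sum(dist.pdf(a + i * width) for i in range(1, steps))
    return total * width

z = NormalDist(mu=0, sigma=1)  # parameters chosen only for illustration
a, b = -1.0, 2.0               # arbitrary interval

area = area_under_density(z, a, b)
prob = z.cdf(b) - z.cdf(a)     # P(a <= X <= b) from the cumulative distribution
print(round(area, 4), round(prob, 4))
```

The two printed numbers should match to the displayed precision, and taking a very wide interval (for example -10 to 10) should give an area very close to 1.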


Mean, Variance, Standard Deviation of a Continuous Probability Distribution

$\mu = \int y\, p(y)\, dy$   Mean or Expected Value

$\sigma^2 = \int (y - \mu)^2\, p(y)\, dy$   Variance

$\sigma = \sqrt{\int (y - \mu)^2\, p(y)\, dy}$   Standard Deviation

I will not ask you to do integration to compute mean, variance, or standard deviation.


Shape of Normal Probability Density

[Figure: normal density curves with mean = 0]

Shape of Normal Probability Density

[Figure: normal density curves with standard deviation (σ) = 1]


THE STANDARD NORMAL DISTRIBUTION


Definition – Standard Normal Distribution

The standard normal distribution is the normal distribution with mean equal to 0 and standard deviation equal to 1.

It is extremely important to develop the skill to find areas corresponding to various regions under the graph of the standard normal distribution.

Graph from Elementary Statistics, 10th Edition


Computing Standard Normal Probabilities

P(z < 1.58) = 0.9429

P(z > –1.23) = 0.8907


Computing standard normal probabilities

P(–2.00 < z < 1.50) = ?

P(z < –2.00) = 0.0228

P(z < 1.50) = 0.9332

P(–2.00 < z < 1.50) = 0.9332 – 0.0228 = 0.9104
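The three standard normal probabilities on these slides can be reproduced without a printed table. This is a minimal Python sketch (not from the slides) using `statistics.NormalDist` from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # defaults to mean 0, standard deviation 1

p_below = z.cdf(1.58)                   # P(z < 1.58)
p_above = 1 - z.cdf(-1.23)              # P(z > -1.23), the complement of the left tail
p_between = z.cdf(1.50) - z.cdf(-2.00)  # P(-2.00 < z < 1.50), a difference of two left tails

print(round(p_below, 4), round(p_above, 4), round(p_between, 4))
```

Any interval probability is handled the same way: subtract the cumulative probability at the lower bound from the one at the upper bound.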

Graph from Elementary Statistics, 10th Edition

Finding Percentiles of Standard Normal Distribution

Finding the 95th Percentile (z score will be positive)

[Figure: standard normal curve with the upper 5% (0.05) of the area lying to the right of z = 1.645]

Graph from Elementary Statistics, 10th Edition


Finding Percentiles (continued)

Finding the Bottom 2.5% and Upper 2.5% (one z score will be negative and the other positive)

[Figure: standard normal curve with cutoffs at z = -1.96 and z = 1.96]

Use R to compute standard normal percentiles

R can be used to find the standard normal percentiles.
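The R examples from this slide are not reproduced in the transcript. In R, `qnorm(p)` returns the standard normal percentile for cumulative probability p. As a hedged stand-in, the Python sketch below uses `statistics.NormalDist.inv_cdf` from the standard library to recover the 95th percentile and the cutoffs for the bottom and upper 2.5%; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

p95 = z.inv_cdf(0.95)     # 95th percentile
lower = z.inv_cdf(0.025)  # cuts off the bottom 2.5%
upper = z.inv_cdf(0.975)  # cuts off the upper 2.5%

print(round(p95, 3), round(lower, 2), round(upper, 2))  # 1.645 -1.96 1.96
```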


Use Stata to compute standard normal probability

Stata can be used to find the standard normal probabilities.


Use Stata to compute standard normal percentiles

Stata can be used to find the standard normal percentiles.


Z-TRANSFORMATION


To compute a probability for a non-standard normal distribution, convert it to the standard normal (Z-transformation):

$z = \frac{x - \mu}{\sigma}$

$Z = \frac{Y - \mu}{\sigma}$

Graph from Elementary Statistics, 10th Edition

If Y is a normal random variable with mean $\mu$ and variance $\sigma^2$, then Z is a standard normal random variable.


Example – Systolic blood pressure (p.180)

For the population of 18 to 74 year-old males in the U.S., systolic blood pressure is approximately normally distributed with mean 129 millimeters of mercury (mm Hg) and standard deviation 19.8 mm Hg [5].

What is the proportion of men in the population who have systolic blood pressures greater than 150 mm Hg?


Example - cont.

$\mu$ = 129 mm Hg, $\sigma$ = 19.8 mm Hg

$z = \frac{150 - 129}{19.8} = 1.06$

P(Y > 150 mm Hg) = P(Z > 1.06) = 1 – 0.8554 = 0.1446

Elementary Statistics, 10th Edition


Use R to compute Normal Probability
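The R output for this slide is not reproduced in the transcript. In R, `pnorm(q, mean, sd)` gives normal cumulative probabilities. As a stand-in, this Python sketch (not from the slides) works the systolic blood pressure example two ways: directly on the mm Hg scale, and through the Z-transformation. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 129, 19.8  # mean and standard deviation in mm Hg, from the example
bp = NormalDist(mu, sigma)

# Direct calculation on the original scale: P(Y > 150).
p_direct = 1 - bp.cdf(150)

# Same probability via the Z-transformation z = (y - mu) / sigma.
z = (150 - mu) / sigma
p_via_z = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_direct, 4), round(p_via_z, 4))
```

Because z is not rounded to 1.06 here, the result (about 0.1444) differs slightly from the table-based 0.1446.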

Example – Finding percentiles

Find the values that cut off the upper and lower 2.5% of the curve of systolic blood pressure.


Example – Finding percentiles (cont.)

$y = \mu + z\sigma$

$y = 129 + 1.96 \cdot 19.8$

$y = 167.81$

“The pressure of 167.81 (mm Hg) separates the lowest 97.5% from the highest 2.5%”


Use R to compute Normal Probability
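The R output for this slide is not reproduced in the transcript. In R, `qnorm(p, mean, sd)` returns normal percentiles. As a stand-in, the Python sketch below (not from the slides) finds both cutoffs for the systolic blood pressure example, first with `inv_cdf` and then by hand with z = ±1.96. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 129, 19.8  # mm Hg, from the systolic blood pressure example
bp = NormalDist(mu, sigma)

lower = bp.inv_cdf(0.025)  # value cutting off the lower 2.5%
upper = bp.inv_cdf(0.975)  # value cutting off the upper 2.5%

# Same cutoffs by hand: y = mu + z * sigma with z = -1.96 and z = 1.96.
lower_by_hand = mu - 1.96 * sigma
upper_by_hand = mu + 1.96 * sigma

print(round(lower, 2), round(upper, 2))
print(round(lower_by_hand, 2), round(upper_by_hand, 2))
```

Both approaches should give roughly 90.19 mm Hg for the lower cutoff and 167.81 mm Hg for the upper one.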


Use Stata to compute Normal Probability


Other Probability Distributions

For counts

• Poisson distributions

For positive continuous variables

• Exponential distributions
• Gamma distributions
• Weibull distributions
• Chi-square distributions

For percentages or proportions as continuous variables

• Beta distributions

And MANY others


Reference

• Principles of Biostatistics (Pagano and Gauvreau)
• Elementary Statistics by Triola, 10th edition
• Applied Statistics for Engineers and Scientists by Petruccelli et al.

Acknowledgements

• Prof. Jayson Wilbur, WPI
• Prof. Balgobin Nandram, WPI
• Prof. Lee Jaeyong, Seoul National University
• Some slides provided by Pearson Education, Inc., publishing as Pearson Addison-Wesley
