  • Introduction to Statistical Inference

    Floyd Bullard

    SAMSI/CRSC Undergraduate Workshop at NCSU

    23 May 2006


  • Parametric models

    Statistical inference means drawing conclusions based on data. There are many contexts in which inference is desirable, and there are many approaches to performing inference.

    One important inferential context is parametric models. For example, if you have noisy (x, y) data that you think follow the pattern y = β₀ + β₁x + error, then you might want to estimate β₀, β₁, and the magnitude of the error.

    Throughout this week, we'll be examining parametric models. (More complex than this simple linear model, of course.)
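
    As a concrete illustration (a sketch added here, not from the original slides; the data are simulated), estimating those parameters in MATLAB might look like this:

    x = (1:20)';
    y = 2 + 0.5*x + 0.8*randn(20,1);   % simulated data: beta0 = 2, beta1 = 0.5
    b = polyfit(x, y, 1);              % least-squares fit: b(1) estimates beta1, b(2) estimates beta0
    resid = y - polyval(b, x);         % residuals around the fitted line
    std(resid)                         % rough estimate of the error magnitude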

  • Likelihood ratios

    There are numerous tools available for parameter estimation, and you'll be introduced to two or three of them this week. The one we'll look at this afternoon may be the most straightforward and easiest to understand: likelihood ratios.

  • Example 1

    Suppose a large bag contains a million marbles, some fraction of which are red. Let's call the fraction of red marbles π. π is a constant, but its value is unknown to us. We want to estimate the value of π.


  • Example 1 (continued)

    Obviously we'd just be guessing if we didn't collect any data, so let's suppose we draw 3 marbles out at random and find that the first is white, the second is red, and the third is white.

    Question: What would be the probability of that particular sequence, WRW, if π were equal to, say, 0.2?


  • Example 1 (continued)

    If π = 0.2, then the probability of drawing out the sequence WRW would be 0.8 × 0.2 × 0.8 = 0.128.

    Question: What would be the probability of that particular sequence, WRW, if π = 0.7?


  • Example 1 (continued)

    If π = 0.7, then the probability of drawing out the sequence WRW would be 0.3 × 0.7 × 0.3 = 0.063.

    Notice that π = 0.7 is less likely to have produced the observed sequence WRW than is π = 0.2.

    Question: Of all possible values of π ∈ [0, 1], which one would have had the greatest probability of producing the sequence WRW?

  • Example 1 (continued)

    Your gut feeling may be that π = 1/3 is the candidate value of π that would have had the greatest probability of producing the sequence we observed, WRW. But can that be proven?

  • Example 1 (continued)

    The probability of observing the sequence WRW for some unknown value of π is given by the equation

    L(π) = (1 − π)(π)(1 − π) = π(1 − π)².

    Differentiating gives:

    dL(π)/dπ = π · 2(1 − π) · (−1) + (1 − π)² · 1
             = 3π² − 4π + 1
             = (3π − 1)(π − 1)

    Setting this derivative equal to zero gives π = 1/3 or π = 1, and it is π = 1/3 that maximizes L.
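
    As a numerical cross-check (an addition, not on the original slide), MATLAB's fminbnd can locate the maximizer by minimizing the negative of L:

    negL = @(p) -(p.*(1-p).^2);   % negative likelihood for the sequence WRW
    fminbnd(negL, 0, 1)           % returns approximately 1/3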

  • Example 1 (continued)

    The function L(π) is called the likelihood function, and the value of π that maximizes L(π) is called the maximum likelihood estimate, or MLE. In this case we did indeed have an MLE of 1/3.

  • Example 1 (continued)

    The MLE may be the best guess for π, at least based on the maximum likelihood criterion, but surely there are other values of π that are also plausible. How should we find them?

  • Example 1 (continued)

    Figure: The likelihood function L(π) plotted against π. What values of π are plausible, given the observation WRW?

  • Example 1 (continued)

    Here is the MATLAB code that generated the graph on the previous slide:

    p = [0:0.01:1];

    L = p.*((1-p).^2);

    plot(p,L)

    xlabel('\pi')

    ylabel('Likelihood L(\pi)')

  • Example 2

    Okay. Now suppose that you again have a bag with a million marbles, and again you want to estimate the proportion of reds, π. This time you drew 50 marbles out at random and observed 28 reds and 22 whites.

    Come up with the MLE for π and also use MATLAB to give a range of other plausible values for π.
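
    One possible MATLAB sketch (an illustration, not the workshop's official solution): the likelihood for 28 reds and 22 whites peaks at the sample proportion 28/50 = 0.56, and a 10%-of-maximum cutoff picks out a range of plausible values.

    p = 0:0.001:1;
    L = (p.^28).*((1-p).^22);         % likelihood for 28 reds, 22 whites
    [Lmax, k] = max(L);
    p(k)                              % MLE: 0.56
    plausible = p(L >= 0.1*Lmax);     % values within a factor of 10 of the maximum
    [min(plausible), max(plausible)]  % roughly [0.41, 0.70], as on the next slide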

  • Example 2 (continued)

    Figure: The likelihood function L(π) for 28 reds in 50 draws (vertical scale of order 10⁻¹⁵). The red line is at 0.1 of the MLE's likelihood. Plausible values of π (by this criterion) are between 0.41 and 0.70.

  • A warning

    The likelihood function L is not a probability density function, and it does not integrate to 1!
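
    For instance (a quick check added here, not on the slide), numerically integrating the Example 1 likelihood in MATLAB:

    p = 0:0.001:1;
    L = p.*((1-p).^2);   % the Example 1 likelihood
    trapz(p, L)          % about 1/12 = 0.0833, not 1: L is not a pdf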

  • A comment

    Notice that in the second example the scale of the likelihood function was much smaller than in the first example. (Why?)

    Some of you likely know some combinatorics, and perhaps were inclined to include a binomial coefficient in the likelihood function:

    L(π) = (50 choose 28) · π²⁸(1 − π)²²,

    instead of L(π) = π²⁸(1 − π)²².

    Why might that matter? How does it change our inferential conclusions?
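
    A small MATLAB check (added here; the slide leaves the question open for discussion) shows what the coefficient does: it rescales L by a constant, so ratios of likelihoods, and hence the plausible range, are unchanged:

    p = 0:0.001:1;
    L1 = (p.^28).*((1-p).^22);           % without the binomial coefficient
    L2 = nchoosek(50, 28) * L1;          % with it: a constant multiple
    max(abs(L1/max(L1) - L2/max(L2)))    % 0 (up to rounding): likelihood ratios agree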

  • Example 3

    Suppose now we have data that we will model as having come from a normal distribution with an unknown mean μ and an unknown standard deviation σ. For example, these five heights (in inches) of randomly selected MLB players:

    Player           Position   Team        Height   Age
    Pedro Lopez      2B         White Sox   73       22
    Boof Bonser      P          Twins       76       24
    Ken Ray          P          Braves      74       31
    Xavier Nady      RF         Mets        74       27
    Jeremy Guthrie   P          Indians     73       27

  • Example 3 (continued)

    Recall that the normal distribution is a continuous probability density function (pdf), so the probability of observing any number exactly is, technically, 0. But these players' heights are clearly rounded to the nearest inch. So the probability of observing a height of 73 inches, when the actual height is rounded to the nearest inch, is equal to the area under the normal curve over the span of heights that would round to 73 inches.
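
    A small sketch of that idea (added here; the μ and σ values are arbitrary assumptions for illustration):

    mu = 74; sigma = 2;   % hypothetical parameter values
    p_exact = normcdf(73.5, mu, sigma) - normcdf(72.5, mu, sigma);   % area that rounds to 73
    p_approx = 1 * normpdf(73, mu, sigma);                           % bin width (1 inch) times density
    [p_exact, p_approx]                                              % nearly equal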

  • Example 3 (continued)

    Figure: Hypothetical height distribution of MLB players (probability density against height in inches). The probability that a player's height will be within a half inch of 73 inches is (roughly) proportional to the pdf at 73 inches.

  • Example 3 (continued)

    So if f(h) is a normal probability density function with mean μ and standard deviation σ, then the probability of observing the heights h1, h2, h3, h4, and h5 is (approximately) proportional to f(h1) · f(h2) · f(h3) · f(h4) · f(h5).

    Let's not forget what we're trying to do: estimate μ and σ! The likelihood function L is a function of both μ and σ, and it is proportional to the product of the five normal densities:

    L(μ, σ) ∝ f(h1) · f(h2) · f(h3) · f(h4) · f(h5),

    where f is the normal probability density function with parameters μ and σ.

  • Example 3 (continued)

    Happily, the normal probability density function is a built-in function in MATLAB:

    normpdf(X, mu, sigma)

    X can be a vector of values, and MATLAB will compute the normal pdf at each of them, returning a vector.

    As such, we may compute the likelihood function at a particular μ and σ in MATLAB like this:

    data = [73, 76, 74, 74, 73];

    L = prod(normpdf(data, mu, sigma));   % likelihood at a single mu and sigma

  • Example 3 (continued)

    data = [73, 76, 74, 74, 73];
    mu = 70:0.1:80;
    sigma = 0.1:0.1:5;   % sigma must be positive; starting at 0 would give NaNs
    L = zeros(length(mu), length(sigma));
    for i = 1:length(mu)
        for j = 1:length(sigma)
            L(i,j) = prod(normpdf(data, mu(i), sigma(j)));
        end
    end
    surf(sigma, mu, L)
    xlabel('sigma')
    ylabel('mu')
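
    To read the MLE off this grid (an addition, not on the original slide), one can locate the largest entry of L:

    [Lmax, idx] = max(L(:));          % largest likelihood over the whole grid
    [i, j] = ind2sub(size(L), idx);   % convert linear index to row and column
    mu(i), sigma(j)                   % approximately 74 and 1.1 for these data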

  • Example 3 (continued)

    Figure: Surface plot of L(μ, σ) over μ from 70 to 80 and σ from 0 to 5 (values of order 10⁻⁴). The likelihood function shows what values of the parameters μ and σ are most consistent with the observed data values.

  • Example 3 (continued)

    Figure: contour(sigma, mu, L) — the MLB player heights likelihood, μ and σ, with μ running from 70 to 80 inches and σ from 0 to 5 inches.

  • Example 3 (continued)

    Figure: MLB player heights, μ and σ: level contours at 10%, 5%, and 1% of the maximum likelihood.

  • Example 4 (a cautionary tale!)

    Here are five randomly sampled MLB players' annual salaries:

    Player            Position   Team           2006 Salary
    Jeff Fassero      P          Giants         $750 K
    Brad Penny        P          Dodgers        $5250 K
    Chipper Jones     3B         Braves         $12333 K
    Jose Valverde     P          Diamondbacks   $359 K
    Alfredo Amezaga   SS         Marlins        $340 K

    Let's use the same technique we used with MLB players' heights to estimate the mean and standard deviation of players' salaries.
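
    The computation is the same grid loop as before, just with salary data and wider ranges (a sketch added here; the ranges are read off the figure on the next slide):

    data = [750, 5250, 12333, 359, 340];   % 2006 salaries in $K
    mu = -6000:100:12000;
    sigma = 100:100:15000;
    L = zeros(length(mu), length(sigma));
    for i = 1:length(mu)
        for j = 1:length(sigma)
            L(i,j) = prod(normpdf(data, mu(i), sigma(j)));
        end
    end
    contour(sigma, mu, L)   % compare with the next slide: what's wrong with this picture?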

  • Example 4 (continued)

    Figure: MLB player salaries, μ and σ (in $K): level contours at 10%, 5%, and 1% of the maximum likelihood, with μ running from −6000 to 12000 and σ from 0 to 15000. What's wrong with this picture?

  • Example 4 (continued)

    Moral: If the model isn't any good, then the inference won't be either.

  • Conclusion

    Statistical inference means drawing conclusions based on data. One context for inference is the parametric model, in which data are supposed to come from a certain distribution family, the members of which are distinguished by differing parameter values. The normal distribution family is one example.

    One tool of statistical inference is the likelihood ratio, in which a parameter value is considered consistent with the data if the ratio of its likelihood to the maximum likelihood is at least some threshold value, such as 10% or 1%. While more sophisticated inferential tools exist, this one may be the most straightforward and obvious.

  • Conclusion

    Enjoy the week here at NCSU!

    Feel free to ask any of us questions at any time!
