Sociology 5811: Lecture 7: Samples, Populations, The Sampling Distribution Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

Jan 11, 2016

Announcements

• Problem Set #2 due today!

Review: Populations

• Population: The entire set of persons, objects, or events that have at least one common characteristic of interest to a researcher (Knoke, p. 15)

• Beyond the literal definition, a population is the general group that we wish to study and gain insight into

• Sample: A subset of a population

• Random Sample: A sample chosen from a population such that each observation has an equal chance of being selected (Knoke, p. 77)

• Randomness is one strategy to avoid biased samples.

Review: Statistical Inference

• Statistical inference: making statistical generalizations about a population from evidence contained in a sample (Knoke, p. 77)

• When is statistical inference likely to work?

• 1. When a sample is large

• If a sample approaches the size of the population, it is likely to be a good reflection of that population

• 2. When a sample is representative of the entire population

• As opposed to a sample that is atypical in some way, and thus not reflective of the larger group.

Populations and Samples

• Population parameters (μ, σ) are constants

• There is one true value, but it is usually unknown

• Sample statistics (Y-bar, s) are variables

• Up until now we’ve treated them as constants

• But, there are many possible samples

• The values of the mean and S.D. vary depending on which sample you have

• Like any variable, the mean and S.D. have a distribution

• Called the “sampling distribution”

• Made up of the values of the statistic for all possible samples (of a given size) from a population

Populations and Samples: Overview

• Characteristics: population “parameters” vs. sample “statistics”

• Characteristics are: constant for the population (only one value) vs. variable for samples (varies from sample to sample)

• Notation: Greek for populations (μ, σ) vs. Roman for samples (Y-bar, s)

• Estimate: a “hat” marks a “point estimate” based on a sample (e.g., μ̂, σ̂)

Population and Sample Distributions

[Figure: population and sample distributions, labeled Y-bar and s]

Estimating the Mean

• Suppose we want to know the mean of a population (μ). What do we do?

• Plan A: Spend $100 million to survey our entire population

• If it is even possible to survey the whole population

• Plan B: Spend $1,000 sampling a few hundred people.

• Estimate the mean

• Use formulas to estimate μ from the sample: μ̂

Estimating the Mean

• Question: Given our sample, what is our best guess of the population mean?

• Answer: The sample mean, Y-bar

• Look at Y-bar, assume that it is a “good guess”

• Thus, we calculate:

$$\hat{\mu} = \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$$
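A minimal Python sketch of the formula above, for readers who want to check it by computer. The function name `estimate_mean` is hypothetical, and the three CD counts are borrowed from the small example population used later in this lecture purely for illustration.

```python
def estimate_mean(sample):
    """Point estimate of the population mean: mu-hat = Y-bar = (1/N) * sum of Y_i."""
    if not sample:
        raise ValueError("need at least one observation")
    return sum(sample) / len(sample)


# Illustrative sample: number of CDs owned by three sampled people
sample = [30, 100, 20]
mu_hat = estimate_mean(sample)
print(mu_hat)  # 50.0
```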

Estimating the Mean

• Issue: There is an enormous (for practical purposes, infinite) number of possible samples that one can take from any population

• Each possible sample has a mean, and most of these means are different

• Some are close to the population mean, some not

• Q: How do we know if we got a “good guess”?

• A: We can’t know for sure. We may draw incorrect conclusions about the mean

• But: We can use probability theory to determine if our guess is likely to be good!

Estimates and Sampling Distributions

• It is possible to take more than one sample

• And calculate more than one estimate of the mean

• If we took many samples (and calculated many means), we’d see a range of estimates

• We could even plot a histogram of the many estimates

• Our confidence in our guess depends on how “spread out” the range of guesses tends to be

• The “standard deviation” of that particular histogram.
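To make the idea of “many samples, many estimates” concrete, here is a small simulation sketch. The population is invented (10,000 random whole numbers from 0 to 120), and the sample size of 25 and the 1,000 repetitions are arbitrary choices; the point is only that the estimates form a spread-out distribution whose standard deviation can be measured.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values (e.g., CDs owned), invented for illustration
population = [rng.randint(0, 120) for _ in range(10_000)]


def many_sample_means(pop, n, reps, rng):
    """Draw `reps` random samples of size n (without replacement) and return their means."""
    return [sum(rng.sample(pop, n)) / n for _ in range(reps)]


means = many_sample_means(population, n=25, reps=1_000, rng=rng)

print("population mean:", round(mean(population), 2))
print("range of estimates:", round(min(means), 2), "to", round(max(means), 2))
# The spread of the estimates: the S.D. of the histogram of sample means
print("S.D. of the estimates:", round(pstdev(means), 2))
```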

Sampling Distributions

• Sampling Distribution: The distribution of estimates created by taking all possible unique samples (of a fixed size) from a population

• Example: Take every possible 10-person sample of sociology graduate students (all combinations)

• 1. Calculate the mean of each sample

• 2. Graph a histogram of all estimates

• This is called “the sampling distribution of the mean”

• Note: The sampling distribution is rarely known

• It is typically thought of as a probability distribution.

Sampling Distribution Notation

• Population mean and S.D. are: μ_Y, σ_Y

• Each sample has a mean and S.D.: Y-bar, s

• The sampling distribution of the mean (i.e., the distribution of mean estimates) also has a mean

• And a S.D., aka the “standard error”

• Mean, S.D. of sampling distribution: μ_Ȳ, σ_Ȳ (“mu-sub-Y-bar” and “sigma-sub-Y-bar”)

• Question: Why are they Greek?

• A: Because the set of all possible samples is itself a population

• Question: Why is there a sub-Y-bar?

• Because it is the mean of all possible Y-bars (means)

Sampling Distribution of the Mean

• It turns out that under some circumstances, the shape of the sampling distribution of the mean can be determined

• Thus allowing one to get a sense of the range of estimates of the mean one is likely to see

• If the distribution is narrow (small S.D.), our guess is probably good!

• If it is wide (large S.D.), our guess may be quite bad

• This provides insight into the probable location of the population mean

• Even if you only have one single sample to look at

• This “trick” lets us draw conclusions!!!

Sampling Distribution Example

• Let’s create a sampling distribution from a small population (N = 5, μ = 52), using samples of size N = 3

Case   # of CDs
1      30
2      100
3      20
4      70
5      40

• Note how the mean varies depending on the sample

• Mean of cases 1,2,3 = 50

• Mean of 2,4,5 = 70

• For this population (N = 5), we can calculate every possible mean based on samples of size 3

Sampling Distribution Example

• First, we must calculate every possible mean

Case   # of CDs
1      30
2      100
3      20
4      70
5      40

• 1,2,3 = 50
• 1,2,4 = 66.67
• 1,2,5 = 56.67
• 1,3,4 = 40
• 1,3,5 = 30
• 1,4,5 = 46.67
• 2,3,4 = 63.33
• 2,3,5 = 53.33
• 2,4,5 = 70
• 3,4,5 = 43.33
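The enumeration above can be reproduced with Python's `itertools.combinations`, which yields each unique 3-case sample once, in the same order as the list. This is a sketch; the helper name `all_sample_means` is hypothetical.

```python
from itertools import combinations

# The five-case population from the slide: case number -> number of CDs
population = {1: 30, 2: 100, 3: 20, 4: 70, 5: 40}


def all_sample_means(pop, n):
    """Mean of every unique sample of n cases (drawn without replacement)."""
    return {
        cases: sum(pop[c] for c in cases) / n
        for cases in combinations(sorted(pop), n)
    }


means = all_sample_means(population, 3)
for cases, m in means.items():
    print(cases, round(m, 2))
```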

Sampling Distribution Example

• Here, you can see how the sample mean is really a variable

• This complete list of all possible means is the sampling distribution

• As a probability distribution, this tells us the probability of picking a sample with each mean

• Note: Sampling distribution mean = 52

• Same as population mean!

Sample   Y-bar
1        50
2        66.67
3        56.67
4        40
5        30
6        46.67
7        63.33
8        53.33
9        70
10       43.33
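A quick check of the claim that the sampling distribution’s mean equals the population mean. This sketch uses exact fractions so that the rounded values in the table (66.67, 56.67, ...) cannot hide the equality; variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

cds = [30, 100, 20, 70, 40]  # the five-case population

# Exact arithmetic: every mean is stored as a fraction, not a rounded decimal
population_mean = Fraction(sum(cds), len(cds))
sample_means = [Fraction(sum(s), len(s)) for s in combinations(cds, 3)]
sampling_dist_mean = sum(sample_means) / len(sample_means)

print(population_mean, sampling_dist_mean)  # 52 52
```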

Sampling Distribution Example

• Histogram of Sampling Distribution (N=3):

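[Figure: histogram of the 10 sample means; x-axis bins 17-27, 27-37, 37-47, 47-57, 57-67, 67-77, 77-87; y-axis frequency 0 to 4; the population mean μ = 52 is marked]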

• Note: The distribution centers around the population mean

• And, it is roughly symmetrical

Sampling Distribution Example

• As a probability distribution, the sampling distribution gives a sense of the quality of our estimate of μ

[Figure: the same histogram with probability on the y-axis (0 to .4); x-axis bins 17-27 through 77-87; μ = 52 is marked]

• Probability = Frequency / N, where N is the number of possible samples (here, 10)

• The probability of picking a sample with a mean that is within +/- 5 of μ is p = .3 (30%)

• The probability of overestimating μ by more than 15 is about p = .1 (10%)

• Q: What is the probability of a “poor” estimate of μ?
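The probabilities on this slide can be recomputed from the ten sample means. This sketch treats each of the 10 samples as equally likely and counts a histogram bin as including its lower edge; both are assumptions about how the figure was drawn. The helper `prob` is hypothetical, and the “poor estimate” question is left open.

```python
from itertools import combinations

cds = [30, 100, 20, 70, 40]
mu = sum(cds) / len(cds)  # population mean
means = [sum(s) / 3 for s in combinations(cds, 3)]


def prob(event):
    """Probability = Frequency / N, with N = number of possible samples (10)."""
    return sum(1 for m in means if event(m)) / len(means)


# Histogram bins from the slide: 17-27, 27-37, ..., 77-87 (lower edge inclusive)
bins = [(lo, lo + 10) for lo in range(17, 87, 10)]
for lo, hi in bins:
    print(f"{lo}-{hi}: p = {prob(lambda m: lo <= m < hi):.1f}")

p_close = prob(lambda m: abs(m - mu) <= 5)  # within +/- 5 of mu
p_over15 = prob(lambda m: m - mu > 15)      # overestimates mu by more than 15
print(p_close, p_over15)  # 0.3 0.1
```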

Sampling Distribution Example

• Note: If the sampling distribution is narrow, most of our estimates of the mean will be good

• That is, they will be close to μ, the population mean

• If the sampling distribution is wide, the probability of a “bad” estimate goes up

• A measure of dispersion can help us assess the sampling distribution

• Recall: the standard deviation of a sampling distribution is called: the standard error

• It tells us the width of the sampling distribution!
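For the small CD example, the standard error can be computed directly as the standard deviation of the ten sample means. A sketch, using `pstdev` on the assumption that the ten means are the complete sampling distribution rather than a sample from it:

```python
from itertools import combinations
from statistics import mean, pstdev

cds = [30, 100, 20, 70, 40]
sample_means = [sum(s) / 3 for s in combinations(cds, 3)]

# The standard error is the standard deviation of the sampling distribution itself.
# pstdev (population S.D.) is used because these 10 means are all possible means.
standard_error = pstdev(sample_means)
print(round(mean(sample_means), 2), round(standard_error, 2))  # 52.0 11.94
```

For a population this tiny, sampled without replacement, this value is smaller than σ/√N (about 29.26/√3 ≈ 16.89); the σ/√N formula that comes with the Central Limit Theorem assumes independent draws, e.g., a population much larger than the sample.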

The Central Limit Theorem

• But, how do we know the width of the sampling distribution?

• Statisticians have shown that the sampling distribution will have consistent properties, if we have a large sample

• Several of these properties constitute the “Central Limit Theorem”

• These properties provide the basis for drawing statistical inferences about the mean.

The Central Limit Theorem

• If you have a large sample (Large N):

• 1. The sampling distribution of the mean (that is, the set of all possible estimates of the mean) clusters around the true population mean

• 2. The estimates cluster in the shape of a normal curve

• Even if the population distribution is not normal

• 3. The estimates are dispersed around the population mean with a knowable standard deviation: σ/√N (“sigma over root N”)

The Central Limit Theorem

• Formally stated:

1. As N grows large, the sampling distribution of the mean approaches normality

2. $\mu_{\bar{Y}} = \mu_Y$

3. $\sigma_{\bar{Y}} = \dfrac{\sigma_Y}{\sqrt{N}}$
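The three statements can be checked by simulation. The sketch below assumes a strongly skewed population (exponential, with μ_Y = 1 and σ_Y = 1), draws 2,000 samples of size N = 100, and compares the center and spread of the sample means with μ_Y and σ_Y/√N = 0.1. The seed, sample size, and number of repetitions are arbitrary illustration choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def simulate_sample_means(n, reps, seed=0):
    """Means of `reps` samples of size n from a skewed population.

    Assumption for illustration: the population is exponential with
    mean 1 and S.D. 1, which is far from normal.
    """
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


n, reps = 100, 2_000
mu_y, sigma_y = 1.0, 1.0
means = simulate_sample_means(n, reps)

se = sigma_y / sqrt(n)  # CLT: sigma_Ybar = sigma_Y / sqrt(N)
print("mean of means:", round(mean(means), 3), "vs mu_Y =", mu_y)
print("S.D. of means:", round(pstdev(means), 3), "vs sigma_Y/sqrt(N) =", se)

# Normality check: share of means within 1 and 2 standard errors of mu_Y
# (expected to be near .68 and .95 if the sampling distribution is close to normal)
within_1 = sum(abs(m - mu_y) <= se for m in means) / reps
within_2 = sum(abs(m - mu_y) <= 2 * se for m in means) / reps
print("within 1 SE:", within_1, " within 2 SE:", within_2)
```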

Central Limit Theorem: Visually

[Figure: a single sample (Y-bar, s_Y) shown against the sampling distribution of the mean (μ_Ȳ, σ_Ȳ)]

Implications of the C.L.T

• What does this mean for us?

• Typically, we only have one sample, and thus only one estimate of μ

• The actual value of μ is unknown

• So we don’t know the center of the sampling distribution

• All we know for certain is that our estimate falls somewhere in the sampling distribution

• This is always true by definition

• And, later, we’ll estimate its width.

Implications of the C.L.T

• Visually: Suppose we observe μ̂ = 16

[Figure: four panels, each placing μ̂ = 16 at a different position within a sampling distribution centered on μ. Annotations: “There are many possible locations of μ” and “But, mu-hat always falls within the sampling distribution”]

Implications of the C.L.T

• We know that the mean from our sample falls somewhere in this sampling distribution

• Which has mean μ and standard deviation σ/√N (sigma over square root N)

• If we can estimate σ, we can estimate σ/√N: the “standard error” of the mean

• We don’t know exactly where the sample falls

• But, laws of probability suggest that we are most likely to draw a sample with a mean near the center

• Recall: in a normal curve, about 68% of cases fall within +/- 1 SD and about 95% within +/- 2 SD

• So, we can determine the range around μ in which 95% (or 99%, or 99.9%) of cases will fall.
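The 68% and 95% figures come from the standard normal curve. As a sketch, Python’s `statistics.NormalDist` (Python 3.8+) can reproduce them, along with the more precise 1.96 cutoff for 95%.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, S.D. 1

within_1sd = z.cdf(1) - z.cdf(-1)  # share of cases within +/- 1 SD
within_2sd = z.cdf(2) - z.cdf(-2)  # share of cases within +/- 2 SD
z_for_95 = z.inv_cdf(0.975)        # cutoff leaving 2.5% in each tail

print(round(within_1sd, 3), round(within_2sd, 3), round(z_for_95, 2))  # 0.683 0.954 1.96
```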

Implications of the C.L.T

• What is the relation between the Standard Error and the size of our sample (N)?

• Answer: It is an inverse relationship.

• The standard deviation of the sampling distribution shrinks as N gets larger (in proportion to 1/√N)

• Formula:

$$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$$

• Conclusion: Estimates of the mean based on larger samples tend to cluster closer around the true population mean.
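A short sketch of the inverse relationship, assuming a hypothetical population S.D. of 200. The function name `standard_error` is illustrative.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean: sigma_Y / sqrt(N)."""
    return sigma / sqrt(n)


sigma = 200  # hypothetical population S.D.
for n in (10, 25, 50, 100, 400):
    print(n, round(standard_error(sigma, n), 2))
```

With these inputs, N = 25, 100, and 400 give standard errors of 40, 20, and 10 (quadrupling N halves the standard error), and N = 10 versus N = 50 gives about 63.25 versus 28.28, in line with the smaller-versus-larger sample comparison in this lecture.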

Implications of the CLT

• Visually: The width of the sampling distribution is an inverse function of N (sample size)

• The distribution of mean estimates based on N = 10 will be more dispersed. Mean estimates based on N = 50 will cluster closer to μ.

[Figure: two sampling distributions of μ̂ around μ, a wider one for the smaller sample size and a narrower one for the larger sample size]

Confidence Intervals

• Benefits of knowing the width of the sampling distribution:

• 1. You can figure out the general range of error by which a given point estimate might miss the true mean

• Based on the range around the true mean within which the estimates will fall

• 2. And, this defines the range around an estimate that is likely to hold the population mean

• A “confidence interval”

• Note: These only work if N is large!

Confidence Interval

• Confidence Interval: “A range of values around a point estimate that makes it possible to state the probability that an interval contains the population parameter between its lower and upper bounds.” (Bohrnstedt & Knoke p. 90)

• It involves a range and a probability

• Examples:

• We are 95% confident that the mean number of CDs owned by grad students is between 20 and 45

• We are 50% confident the mean rainfall this year will be between 12 and 22 inches.

Confidence Interval

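• Visually: It is probable that μ falls near μ̂ (here, μ̂ = 16)

[Figure: a distribution centered on μ̂ = 16; the region near the center is labeled “Probable values of μ” and the tails are labeled “Range where μ is unlikely to be”]

• Q: Can μ be this far from μ̂?

• Answer: Yes, but it is very improbable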

Confidence Interval

• To figure out the range of “error” in our mean estimate, we need to know the width of the sampling distribution

• The Standard Error! (The S.D. of this distribution)

• The Central Limit Theorem provides a formula:

$$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$$

• Problem: We do not know the exact value of sigma-sub-Y, the population standard deviation!

Confidence Interval

• Question: How do we calculate the standard error if we don’t know the population S.D.?

• Answer: We estimate it using the information we have

• Formula for best estimate:

$$\hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{N}}$$

• Where N is the sample size and s-sub-Y is the sample standard deviation
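The estimated standard error can be computed from a single sample. This sketch assumes s_Y is the usual sample standard deviation with an N − 1 denominator, which is what Python’s `statistics.stdev` computes; the function name and the three-value sample are illustrative only.

```python
from math import sqrt
from statistics import stdev


def estimated_standard_error(sample):
    """Estimate sigma_Ybar by s_Y / sqrt(N), using the sample S.D. (N - 1 denominator)."""
    return stdev(sample) / sqrt(len(sample))


sample = [30, 100, 20]  # illustrative sample of CD counts
print(round(estimated_standard_error(sample), 2))  # 25.17
```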

95% Confidence Interval Example

• Suppose we have a sample of 100 students with a mean SAT score of 1020 and a standard deviation of 200

• How do we find the 95% Confidence Interval?

• If N is large, we know that:

• 1. The sampling distribution is roughly normal

• 2. Therefore 95% of samples will yield a mean estimate within about 2 (more precisely, 1.96) standard deviations (of the sampling distribution) of the population mean (μ)

• Thus, 95% of the time, our estimates of μ (Y-bar) are within two “standard errors” of the actual value of μ.

95% Confidence Interval

• Formula for 95% confidence interval:

$$95\%\ \text{CI}: \quad \bar{Y} \pm 2\,\sigma_{\bar{Y}}$$

• Where Y-bar is the mean estimate and sigma-sub-Y-bar is the standard error

• Result: Two values – an upper and lower bound

• Adding our estimate of the standard error:

$$\bar{Y} \pm 2\,\hat{\sigma}_{\bar{Y}} = \bar{Y} \pm 2\,\frac{s_Y}{\sqrt{N}}$$

95% Confidence Interval

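• Suppose we have a sample of 100 students with a mean SAT score of 1020 and a standard deviation of 200

• Calculate:

$$95\%\ \text{CI}: \quad \bar{Y} \pm 2\left(\frac{s}{\sqrt{N}}\right) = 1020 \pm 2\left(\frac{200}{\sqrt{100}}\right) = 1020 \pm 2\left(\frac{200}{10}\right) = 1020 \pm 2(20) = 1020 \pm 40$$

• Thus, we are 95% confident that the population mean falls between 980 and 1060.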
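The worked example can be wrapped in a small function. This is a sketch: the name `confidence_interval_95` is hypothetical, z = 2 reproduces the rule of thumb used above, and z = 1.96 is shown for comparison. Like the formula itself, it is only appropriate when N is large.

```python
from math import sqrt


def confidence_interval_95(mean, sd, n, z=2.0):
    """Approximate 95% CI for the population mean: Y-bar +/- z * s / sqrt(N).

    z = 2 is the rule of thumb; z = 1.96 is the more precise normal cutoff.
    Valid only when N is large, so the sampling distribution is roughly normal.
    """
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


print(confidence_interval_95(1020, 200, 100))  # (980.0, 1060.0)
print(confidence_interval_95(1020, 200, 100, z=1.96))
```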