Inferential Statistics Confidence Intervals and Hypothesis Testing

Dec 19, 2015

Page 1: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Inferential Statistics

Confidence Intervals and Hypothesis Testing

Page 2: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Samples vs. Populations

• Population
– All of the objects that belong to a class (e.g. all Darl projectile points, all Americans, all pollen grains)
– A theoretical distribution
• Sample
– Some of the objects in a class
– Observations drawn from a distribution

Page 3: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Two Distributions

• The sample distribution is the distribution of the values of a sample – exactly what we get plotting a histogram or a kernel density plot

• The sampling distribution is the distribution of a statistic that we have computed from the sample (e.g. a mean)
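
The distinction can be illustrated with a small simulation. This is only a sketch: it assumes a normal population with mean 25 and standard deviation 4 and samples of size 16 (the values used in the plotting code in this transcript), and the variable names are hypothetical.

```r
set.seed(42)                      # arbitrary seed so the run is repeatable

# Assumed population: normal with mean 25 and sd 4 (illustrative values)
pop_mean <- 25
pop_sd   <- 4
n        <- 16

# One sample: its values make up the sample distribution
one_sample <- rnorm(n, mean = pop_mean, sd = pop_sd)

# Many samples: the mean of each one makes up the sampling distribution of the mean
sample_means <- replicate(5000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

sd(one_sample)     # spread of a single sample, in the neighborhood of pop_sd
sd(sample_means)   # spread of the means, close to pop_sd / sqrt(n) = 1
```

The means vary much less than the individual values, which is why the sampling distribution is narrower than the sample distribution.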

Page 4: Inferential Statistics Confidence Intervals and Hypothesis Testing.
Page 5: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Confidence Intervals

• Given a sample statistic estimating a population parameter, what is the parameter’s actual value?

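• Standard Error of the Estimate provides the standard deviation for the sample statistic:

$$ s_{\bar{X}} = \frac{s}{\sqrt{n}} $$

where s is the sample standard deviation and n is the sample size.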

Page 6: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Example 1

• Snodgrass house size. Mean area is 236.8 with a standard deviation of 94.25 based on 91 houses.

• Area is slightly asymmetrical
• Can we use these data to predict house sizes at other Mississippian sites?

Page 7: Inferential Statistics Confidence Intervals and Hypothesis Testing.
Page 8: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Example 1 (cont)

• The confidence interval is based on the mean, sd, and sample size
• Mean ± t*sd/sqrt(n), where t is the t-distribution quantile for the chosen confidence level
• For 95%, 90%, 67% confidence
– qt(c(.025,.975), df=90)
– qt(c(.05,.95), df=90)
– qt(c(.165,.835), df=90)
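
As a worked example, the interval can be computed directly from the summary statistics quoted for the Snodgrass houses (mean 236.8, sd 94.25, n = 91). This is a sketch; the helper name `ci_from_summary` is hypothetical and not part of the original slides.

```r
# Summary statistics quoted in Example 1
xbar <- 236.8    # mean house area
s    <- 94.25    # standard deviation
n    <- 91       # number of houses

se <- s / sqrt(n)   # standard error of the mean

# Hypothetical helper: t-based interval for a given confidence level
ci_from_summary <- function(level) {
  tail <- (1 - level) / 2
  xbar + qt(c(tail, 1 - tail), df = n - 1) * se
}

ci95 <- ci_from_summary(0.95)
ci90 <- ci_from_summary(0.90)
ci67 <- ci_from_summary(0.67)
round(rbind(ci95, ci90, ci67), 1)
```

With these inputs the 95% interval runs from roughly 217 to 256; lower confidence levels give narrower intervals.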

Page 9: Inferential Statistics Confidence Intervals and Hypothesis Testing.

# Assumes a data frame named Snodgrass with an Area column is already loaded
# (for example from the archdata package)

# Distributions
x <- seq(10, 40, length.out=200)
y1 <- dnorm(x, mean=25, sd=4)   # sample distribution
y2 <- dnorm(x, mean=25, sd=1)   # sampling distribution of the mean: sd = 4/sqrt(16)
max(y2)
plot(x, y1, type="l", ylim=c(0, .4), col="red")
lines(x, y2, col="blue")
text(c(28, 26.3), c(.08, .30), c("Sample Distribution\n mean=25, sd=4", "Sampling Distribution\n mean=25, sd=1, n=16"), col=c("red", "blue"), pos=4)

# Snodgrass House Areas
plot(density(Snodgrass$Area), main="Snodgrass House Areas")
xs <- seq(0, 475, length.out=100)
lines(xs, dnorm(xs, mean=236.8, sd=94.25), lty=2)
abline(v=mean(Snodgrass$Area))
legend("topright", c("Kernel Density", "Normal Distribution"), lty=c(1, 2))

# Confidence interval function
# level may be given as a percentage (95) or a proportion (0.95)
conf <- function(x, level) {
  level <- ifelse(level>1, level/100, level)
  tail <- (1-level)/2
  mean(x)+qt(c(tail, 1-tail), df=length(x)-1)*sd(x)/sqrt(length(x))
}

Page 10: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Bootstrapping

• Confidence intervals depend on a normal sampling distribution

• This will generally be a reasonable assumption if the sample size is moderately large

• We can draw repeated samples of house areas (with replacement) to see how close the sampling distribution of the mean is to normal

Page 11: Inferential Statistics Confidence Intervals and Hypothesis Testing.
Page 12: Inferential Statistics Confidence Intervals and Hypothesis Testing.

# Draw 100 samples of size 50
samples <- sapply(1:100, function(x) mean(sample(Snodgrass$Area, 50, replace=TRUE)))
range(samples)
quantile(samples, probs=c(.025, .975))   # percentile interval from the resampled means
conf(Snodgrass$Area, 95)                 # t-based interval for comparison
plot(density(samples), main="Sample Size = 50")
x <- seq(175, 300, 1)
lines(x, dnorm(x, mean=mean(samples), sd=sd(samples)), lty=2)
legend("topright", c("Kernel Density", "Normal Distribution"), lty=c(1, 2))

# Draw 1000 samples of size 91
samples <- sapply(1:1000, function(x) mean(sample(Snodgrass$Area, 91, replace=TRUE)))
range(samples)
quantile(samples, probs=c(.025, .975))
conf(Snodgrass$Area, 95)
plot(density(samples), main="Sample Size = 91")
x <- seq(175, 300, 1)
lines(x, dnorm(x, mean=mean(samples), sd=sd(samples)), lty=2)
legend("topright", c("Kernel Density", "Normal Distribution"), lty=c(1, 2))

Page 13: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Example 2
• Radiocarbon Ages are presented as an age estimate and a standard error: 2810 ± 110 B.P.

• The probability that the true age is between 2700 and 2920 B.P. (± 1 standard error) is .6827, or .3173 that it is outside that range

• The probability that the true age is between 2590 and 3030 B.P. (± 2 standard errors) is .9545, or .0455 that it is outside that range
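
These probabilities follow from the normal distribution and can be checked with `pnorm()`. The sketch below assumes the dating error is normally distributed around the estimate; `prob_within` is a hypothetical helper name.

```r
age <- 2810   # age estimate in years B.P.
se  <- 110    # reported standard error

# Probability that the true age lies within k standard errors of the estimate,
# assuming a normal error distribution
prob_within <- function(k) {
  pnorm(age + k * se, mean = age, sd = se) -
    pnorm(age - k * se, mean = age, sd = se)
}

p1 <- prob_within(1)   # 2700 to 2920 B.P.
p2 <- prob_within(2)   # 2590 to 3030 B.P.
round(c(inside_1se = p1, outside_1se = 1 - p1,
        inside_2se = p2, outside_2se = 1 - p2), 4)
```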

Page 14: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Hypothesis Testing

• Assumptions and Null Hypothesis
• Test Statistic (method)
• Significance Level
• Observe Data
• Compute Test Statistic
• Make Decision

Page 15: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Assumptions

• Data are a random sample
– Every combination is equally likely
• Appropriate sampling distribution

Page 16: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Null Hypothesis

• Represented by H0

• Must be specific, e.g. S1-S2 = 0
• The difference between two sample statistics is zero, i.e. they are drawn from the same population (two tailed test)
• Or S1-S2>0 (one tailed)

Page 17: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Test Statistic

• Measurement Levels
• Number of groups
• Dependent vs. Independent
• Power

Page 18: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Significance Level

• Nothing is absolute in probability
• Select probability of making certain kinds of errors
• Cannot minimize both kinds of errors
• Social scientists often use p ≤ 0.05
• Consider how many tests are being run: the chance of at least one false positive grows with the number of tests
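
The effect of running many tests can be made concrete with a little arithmetic. The sketch below assumes the tests are independent and each uses a 0.05 level; the Bonferroni adjustment shown is one common remedy and is not mentioned in the original slides.

```r
alpha <- 0.05              # per-test significance level
k     <- c(1, 5, 10, 20)   # number of independent tests (illustrative)

# Chance of at least one false rejection across k independent tests
familywise <- 1 - (1 - alpha)^k

# Bonferroni adjustment: per-test level that keeps the overall rate near alpha
bonferroni <- alpha / k

data.frame(tests = k,
           familywise = round(familywise, 3),
           per_test_level = bonferroni)
```

With 20 independent tests at the 0.05 level, the chance of at least one false rejection is about 0.64.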

Page 19: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Errors in Hypothesis Testing

                              Null Hypothesis (H0) is
Research Decision             True                  False
Reject H0                     Error: Type I, α      Correct Decision
Accept H0 (fail to reject)    Correct Decision      Error: Type II, β
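
The meaning of α can be checked by simulation: when the null hypothesis is true, a test at level α should reject in about α of repeated experiments. This is a sketch using made-up normal data; the group sizes and parameters are arbitrary assumptions.

```r
set.seed(1)
alpha <- 0.05
n_sim <- 2000

# Both groups come from the same normal population, so H0 is true by construction
p_values <- replicate(n_sim, {
  g1 <- rnorm(30, mean = 100, sd = 15)
  g2 <- rnorm(30, mean = 100, sd = 15)
  t.test(g1, g2)$p.value
})

type1_rate <- mean(p_values < alpha)   # proportion of false rejections
type1_rate
```

The observed rejection rate should land near 0.05; every one of those rejections is a Type I error.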

Page 20: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Difference of Means (t-test)

• Independent random samples of normally distributed variates

• Designs: one sample, two independent samples, or two related (paired) samples

• With two independent samples, the variances may be equal or unequal

• Sample statistics follow the t-distribution
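
Each of these designs maps onto a different call to `t.test()`. The sketch below uses simulated, hypothetical measurements rather than any data from the slides.

```r
set.seed(7)
# Simulated measurements (hypothetical; not the Snodgrass data)
a <- rnorm(25, mean = 50, sd = 8)
b <- rnorm(30, mean = 55, sd = 8)
before <- rnorm(20, mean = 50, sd = 8)
after  <- before + rnorm(20, mean = 2, sd = 3)

one_sample <- t.test(a, mu = 50)                    # one sample against a fixed value
pooled     <- t.test(a, b, var.equal = TRUE)        # two independent, equal variances
welch      <- t.test(a, b, var.equal = FALSE)       # two independent, unequal variances (R's default)
paired     <- t.test(before, after, paired = TRUE)  # two related samples

c(one_sample = one_sample$p.value, pooled = pooled$p.value,
  welch = welch$p.value, paired = paired$p.value)
```

The pooled test uses n1 + n2 - 2 degrees of freedom, while the unequal-variance (Welch) version adjusts the degrees of freedom downward.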

Page 21: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Example

• Snodgrass site is a Mississippian site in Missouri that was occupied about A.D. 1164

Page 22: Inferential Statistics Confidence Intervals and Hypothesis Testing.
Page 23: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Using Rcmdr
• Snodgrass Site – Null hypothesis: house sizes inside and outside the wall are the same
• Check normality – shapiro.test()
• Check equal variances – var.test() or bartlett.test()
• Compute statistic and make decision – t.test()
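
Outside Rcmdr, the same sequence of checks can be run from the R console. The sketch below uses simulated house areas in a hypothetical data frame `houses` (columns `area` and `where`); the real analysis would use the Snodgrass data instead.

```r
set.seed(11)
# Hypothetical house areas; the real analysis would use the Snodgrass data
houses <- data.frame(
  area  = c(rnorm(40, mean = 300, sd = 80), rnorm(50, mean = 200, sd = 60)),
  where = factor(rep(c("Inside", "Outside"), times = c(40, 50)))
)

# 1. Normality within each group (H0: the data are normal)
normality <- tapply(houses$area, houses$where,
                    function(v) shapiro.test(v)$p.value)

# 2. Equal variances (H0: the two variances are equal)
variance_check <- var.test(area ~ where, data = houses)

# 3. t-test, pooled only if the variance test does not reject at 0.05
equal_var <- variance_check$p.value > 0.05
result <- t.test(area ~ where, data = houses, var.equal = equal_var)

normality
variance_check$p.value
result
```

Choosing the pooled or unequal-variance test from the outcome of `var.test()` is one possible convention; simply using the unequal-variance default of `t.test()` is a common alternative.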

Page 24: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Wilcoxon Test

• Use when the data do not follow a normal distribution, or are ranks rather than interval/ratio-scale measurements

• Nonparametric test that is similar to the t-test but not as powerful

• Tests for equality of medians
– wilcox.test()
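
A short sketch of the call, using skewed simulated data where the normality assumption is doubtful. The data are hypothetical, and the t-test is included only for comparison.

```r
set.seed(3)
# Skewed, hypothetical measurements (log-normal), so normality is doubtful
g1 <- rlnorm(25, meanlog = 3.0, sdlog = 0.6)
g2 <- rlnorm(25, meanlog = 3.5, sdlog = 0.6)

rank_test <- wilcox.test(g1, g2)   # rank-based comparison of the two groups
mean_test <- t.test(g1, g2)        # for comparison only

c(wilcoxon = rank_test$p.value, t_test = mean_test$p.value)
```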

Page 25: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Difference of Proportions

• Uses the normal distribution to approximate the binomial distribution to test differences between proportions (probabilities)

• This approximation is accurate as long as N × min(p, 1-p) > 5, where N is the sample size, p is the proportion, and min() is the minimum
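
The rule of thumb is easy to check before running the test. A minimal sketch, with a hypothetical helper name and illustrative sample sizes and proportions:

```r
# Rule of thumb from the slide: N * min(p, 1 - p) > 5
normal_approx_ok <- function(N, p) {
  N * pmin(p, 1 - p) > 5
}

normal_approx_ok(N = 91, p = 0.30)   # 91 * 0.30 = 27.3, so TRUE
normal_approx_ok(N = 20, p = 0.10)   # 20 * 0.10 = 2, so FALSE
```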

Page 26: Inferential Statistics Confidence Intervals and Hypothesis Testing.

Using Rcmdr
• Must have two or more variables defined as factors, e.g.,
– Create ProjPts to be equal to as.factor(ifelse(Points>0, 1, 0)) using Data | Manage variables . . . | Compute new variable
– Statistics | Proportions | Two sample . . .
– prop.test()
– Are the % Absent equal inside and outside the wall?
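
The same test can be run directly with `prop.test()` from counts. The counts below are hypothetical and chosen only to show the call; they are not taken from the Snodgrass data.

```r
# Hypothetical counts (not taken from the Snodgrass data):
# houses with no projectile points, inside and outside the wall
absent <- c(Inside = 12, Outside = 30)
total  <- c(Inside = 40, Outside = 50)

test <- prop.test(x = absent, n = total)   # continuity correction applied by default
test$estimate   # the two sample proportions
test$p.value    # small values count against equal proportions
```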