Basic Statistics Primer - courses.lsa.umich.edu

Transcript
Page 1: Basic Statistics Primer - courses.lsa.umich.edu

Basic Statistics Primer

August 2021 UM FMRI Workshop

Dr. Adriene Beltz
Assistant Professor of Psychology
[email protected]

Page 2: Basic Statistics Primer - courses.lsa.umich.edu

Primer Topics

- Statistics
- Populations, Samples, & the Central Limit Theorem (CLT)
- Hypothesis Testing
  - One-tailed & two-tailed tests
  - Type I & Type II Errors
- General Linear Model (GLM)
  - Regression
  - t-test
  - ANOVA
- Advanced Topics
  - Dependent t-test
  - One-sample t-test
  - Factorial ANOVA

Page 3: Basic Statistics Primer - courses.lsa.umich.edu

Statistics

Page 4: Basic Statistics Primer - courses.lsa.umich.edu

Statistics is…

- The science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data, or values, measurements, and observations
- 73.6% of statistics are made up.
- Statistics is harder than algebra!

Page 5: Basic Statistics Primer - courses.lsa.umich.edu

Types of Statistics

- Descriptive: A first step in analysis to describe the nature of the data; consists of organizing, summarizing, and presenting data.
  - Measures of central tendency: Describing a data set by its middle value
  - Frequency distributions: Functions showing how often values in a data set occur
- Inferential: A follow-up step in analysis to make inferences from data; consists of generalizing from samples to populations, performing hypothesis tests, determining relationships, and making predictions.

Page 6: Basic Statistics Primer - courses.lsa.umich.edu

Value of Plotting

“We sometimes learn more from what we see than from what we compute; sometimes what we learn from what we see is that we shouldn’t compute, at least not on those data as they stand.” (Cohen, 1990, p. 1305)

Page 7: Basic Statistics Primer - courses.lsa.umich.edu

Frequency Distribution = Histogram

- A graph plotting values of observations on the horizontal axis, with a bar showing how many times each value occurred in the data set

Page 8: Basic Statistics Primer - courses.lsa.umich.edu

The Normal Distribution

- Bell-shaped
- Symmetric about the center

The curve shows the idealized shape.

Where are the mean, median, and mode? At the center: in a normal distribution, the mean, median, and mode coincide.

Page 9: Basic Statistics Primer - courses.lsa.umich.edu

Normal Probability Distribution

- Probability distribution: An idealized frequency distribution from which the likelihood of certain values occurring can be determined
- The z-curve is the standard normal probability distribution
  - Mean = 0 and standard deviation = 1

[Figure 1.15 from Field, Discovering Statistics Using SPSS, p. 34: the probability density function of the standard normal (z) distribution. Probability .025 lies in each tail beyond ±1.96, probability .95 lies between −1.96 and 1.96, and 1.65 cuts off the top .05.]

Page 10: Basic Statistics Primer - courses.lsa.umich.edu

Normal Probability Distribution: ±1.96

- 1.96 cuts off the top 2.5% of the distribution
- −1.96 cuts off the bottom 2.5% of the distribution
- Thus, 95% of z-scores lie between −1.96 and 1.96


Page 11: Basic Statistics Primer - courses.lsa.umich.edu

Z-scores Standardize

- Z-scores standardize a score with respect to the other scores in the data set
- They express a score in terms of how many standard deviations it is away from the mean

z = (X − X̄) / s

In your first undergraduate neuroscience exam, the class average was 85 with a standard deviation of 8. You got a 92. What was your z-score?

z = (92 − 85) / 8 = 7 / 8 = .88
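This calculation can be checked in Python; a minimal sketch (the function name `z_score` is my own):

```python
def z_score(x, mean, sd):
    """Standardize a score: how many SDs x lies from the mean."""
    return (x - mean) / sd

# Exam example from the slide: score 92, class mean 85, SD 8.
print(z_score(92, 85, 8))  # 0.875, i.e. about .88
```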

Page 12: Basic Statistics Primer - courses.lsa.umich.edu

The Normal Probability Distribution

- What proportion of scores fall below z = .93?
  - Look up: .8238
- What proportion of scores fall below z = −.93?
  - Look up and subtract: 1 − .8238 = .1762
- What is the z for a score in the 97th percentile?
  - Look up: 1.88
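Rather than a printed z table, these look-ups can be computed from the standard normal CDF, which Python's stdlib supports via the error function. A sketch (`phi` and `phi_inv` are my own names):

```python
import math

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def phi_inv(p):
    """Invert the CDF by bisection (phi is monotone increasing)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(phi(0.93), 4))      # ≈ .8238, the table value
print(round(phi(-0.93), 4))     # ≈ .1762, by symmetry 1 − Φ(.93)
print(round(phi_inv(0.97), 2))  # ≈ 1.88, the 97th percentile
```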

Page 13: Basic Statistics Primer - courses.lsa.umich.edu

Populations, Samples, & the Central Limit Theorem (CLT)

Page 14: Basic Statistics Primer - courses.lsa.umich.edu

Populations & Samples

Population
- The collection of units (people, plankton, plants, cities, authors, etc.) to which findings are generalized
- Parameters
  - Mean: µ
  - Standard deviation: σ

Sample
- A smaller (but hopefully representative) collection of units from a population, used to determine truths about that population
- Statistics
  - Mean: X̄
  - Standard deviation: s

Page 15: Basic Statistics Primer - courses.lsa.umich.edu

Populations & Samples

Page 16: Basic Statistics Primer - courses.lsa.umich.edu

The Central Limit Theorem

- Generally, the sum (or mean) of a large number of independent random variables is approximately normally distributed
- With many samples, the sampling distribution is normal, even if the population distribution is not
- This is important because many statistical tests assume that data are normally distributed

Page 17: Basic Statistics Primer - courses.lsa.umich.edu

[Figure: histogram of nine sample means (M = 8, 9, 9, 10, 10, 10, 11, 11, 12) drawn from a population with µ = 10. The sampling distribution of the means has Mean = 10 and SD = 1.22.]

The Central Limit Theorem

Sampling variation creates a sampling distribution: a frequency distribution of the sample means.

As the number of samples increases, the distribution becomes normal with M = µ and standard error of the mean SE = σ / √N.
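The CLT can be watched in action with a short simulation, a sketch using only the stdlib (the names and the choice of an exponential population are my own): even though the population is skewed, the sample means pile up symmetrically around µ, with spread close to σ/√n.

```python
import random
import statistics

def sampling_distribution(n_samples=2000, n=30, seed=1):
    """Means of many samples drawn from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(n_samples)]

means = sampling_distribution()
# The Exp(1) population has mean 1 and SD 1; the sample means cluster
# around 1, with spread close to the standard error 1 / sqrt(30).
print(statistics.fmean(means))
print(statistics.stdev(means), 1 / 30 ** 0.5)
```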

Page 18: Basic Statistics Primer - courses.lsa.umich.edu

Degrees of Freedom

Sample: X̄ = 10, with scores 12, 11, 8, 9
(12 + 11 + 9 + 8) / 4 = 40 / 4 = 10

Population: µ = 10, with scores 15, 8, 7, and one unknown score
Once the mean is fixed, the final score is not free to vary:
(15 + 8 + 7 + X) / 4 = 10, so 15 + 8 + 7 + X = 40, and X = 10

Page 19: Basic Statistics Primer - courses.lsa.umich.edu

Central Limit Theorem

Bunnies, Dragons and the 'Normal' World

Page 20: Basic Statistics Primer - courses.lsa.umich.edu

Hypothesis Testing

Page 21: Basic Statistics Primer - courses.lsa.umich.edu

Types of Hypotheses

- Null hypothesis: H0
  - There is no effect
  - Jelly beans do not cause acne.
- Alternative hypothesis: H1
  - The experimental hypothesis
  - There is an effect
  - Jelly beans cause acne.

Page 22: Basic Statistics Primer - courses.lsa.umich.edu

Test Statistics

- A statistic for which the frequency of particular values is known (as in a z or t distribution)
- Observed values can be used to test hypotheses

test statistic = systematic variation / unsystematic variation

Page 23: Basic Statistics Primer - courses.lsa.umich.edu

Statistical Significance

- Assuming that the null hypothesis is true and the study is repeated an infinite number of times by drawing random samples from the same population(s), less than 5% of these results will be more extreme than the current result.
- NOT the probability that the null hypothesis is true

From Kline (2013), as in Cassidy et al. (2019)

Page 24: Basic Statistics Primer - courses.lsa.umich.edu

P-values

- 95% of z-scores lie between −1.96 and +1.96
- 99% of z-scores lie between −2.58 and +2.58
- 99.9% of z-scores lie between −3.29 and +3.29
- A p-value: assuming the null hypothesis is correct, the probability of obtaining results at least as extreme as the test statistic

These z-cutoffs are critical values.

Page 25: Basic Statistics Primer - courses.lsa.umich.edu

One- and Two-Tailed Tests

With a one-tailed test, there is increased likelihood of detecting an effect (if one is present), but an effect will be missed if it is in the other direction.

[Figure: standard normal density. For a two-tailed test, probability .025 lies in each tail beyond ±1.96; a result in the lower tail means the mean of group 1 is smaller than the mean of group 2 (or there is a negative relationship), and a result in the upper tail means the mean of group 1 is bigger than the mean of group 2 (or there is a positive relationship). For a one-tailed test, probability .05 lies beyond 1.64.]

Page 26: Basic Statistics Primer - courses.lsa.umich.edu

Type I and Type II Errors

- Type I error
  - Occurs when an effect is detected in the sample when, in fact, it does not exist in the population
  - The probability is the α-level (usually .05)
- Type II error
  - Occurs when no effect is detected in the sample when, in fact, there is one in the population
  - The probability is the β-level (often .2)

Page 27: Basic Statistics Primer - courses.lsa.umich.edu

General Linear Model (GLM)

Page 28: Basic Statistics Primer - courses.lsa.umich.edu

[Excerpt from Field, Discovering Statistics Using SPSS, p. 50:]

must represent the data collected (the observed data) as closely as possible. The degree to which a statistical model represents the data collected is known as the fit of the model.

Figure 2.2 illustrates three models that an engineer might build to represent the real-world bridge that she wants to create. The first model is an excellent representation of the real-world situation and is said to be a good fit (i.e., the model is basically a very good replica of reality). If the engineer uses this model to make predictions about the real world then, because it so closely resembles reality, she can be confident that these predictions will be accurate. So, if the model collapses in a strong wind, then there is a good chance that the real bridge would collapse also. The second model has some similarities to the real world: the model includes some of the basic structural features, but there are some big differences from the real-world bridge (namely the absence of one of the supporting towers). We might consider this model to have a moderate fit (i.e., there are some similarities to reality but also some important differences). If the engineer uses this model to make predictions about the real world then these predictions may be inaccurate or even catastrophic (e.g., the model predicts that the bridge will collapse in a strong wind, causing the real bridge to be closed down, creating 100-mile tailbacks with everyone stranded in the snow; all of which was unnecessary because the real bridge was perfectly safe, the model being a bad representation of reality). We can have some confidence, but not complete confidence, in predictions from this model. The final model is completely different to the real-world situation; it bears no structural similarities to the real bridge and is a poor fit. As such, any predictions based on this model are likely to be completely inaccurate. Extending this analogy to science, it is important when we fit a statistical model to a set of data that it fits the data well. If our model is a poor fit of the observed data then the predictions we make from it will be equally poor.

Figure 2.2: Fitting models to real-world data (see text for details). [Panels: The Real World; Statistical Models with Good Fit, Moderate Fit, and Poor Fit; observed data versus model-predicted data.]

Model fit is how accurately a model represents observed data.

Page 29: Basic Statistics Primer - courses.lsa.umich.edu

Statistical Models

Page 30: Basic Statistics Primer - courses.lsa.umich.edu

A Simple Statistical Model: The Mean

- The mean is a hypothetical value (i.e., it doesn't have to be a value that actually exists in the data set)
- It is a good guess at a typical score

X̄ = (1/n) Σᵢ xᵢ

Page 31: Basic Statistics Primer - courses.lsa.umich.edu

What is Regression?

- A linear model in which an outcome variable is predicted from other variables (not just the mean)
  - Simple regression: Outcome is predicted by one variable
  - Multiple regression: Outcome is predicted by a (linear combination of) more than one variable

Page 32: Basic Statistics Primer - courses.lsa.umich.edu

Equation for a Straight Line

Yi = b0 + b1·Xi + εi   (the same form as y = mx + b)

- Yi: Outcome score for individual i
- b0: Intercept (value of Y when X = 0); the point at which the regression line crosses the Y-axis
- b1: Slope (gradient) of the regression line; the regression coefficient for the predictor, reflecting the direction and magnitude of the relationship (the change in Y associated with a unit change in X)
- Xi: Predictor score for individual i
- εi: Difference between the predicted and observed score for participant i
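Estimating b0 and b1 from data by least squares can be sketched in a few lines of Python (the function name `fit_line` is my own; the data are made up so the fit is exact):

```python
def fit_line(xs, ys):
    """Estimate intercept b0 and slope b1 by least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

# Data generated from Y = 2 + 3X exactly, so the fit recovers b0 = 2, b1 = 3.
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]
print(fit_line(xs, ys))  # (2.0, 3.0)
```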

Page 33: Basic Statistics Primer - courses.lsa.umich.edu

Intercepts and Slopes

[Excerpt from Field, Discovering Statistics Using SPSS, p. 296:]

model (i.e., a straight line) to summarize the relationship between two variables: the gradient (b1) tells us what the model looks like (its shape) and the intercept (b0) tells us where the model is (its location in geometric space).

This is all quite abstract, so let's look at an example. Imagine that I was interested in predicting physical and downloaded album sales (outcome) from the amount of money spent advertising that album (predictor). We could summarize this relationship using a linear model by replacing the names of our variables into equation (8.1):

Y_i = b0 + b1·X_i + ε_i

album sales_i = b0 + b1·advertising budget_i + ε_i   (8.2)

Once we have estimated the values of the bs we would be able to make a prediction about album sales by replacing 'advertising' with a number representing how much we wanted to spend advertising an album. For example, imagine that b0 turned out to be 50 and b1 turned out to be 100. Our model would be:

album sales_i = 50 + (100 × advertising budget_i) + ε_i   (8.3)

Note that I have replaced the betas with their numeric values. Now, we can make a prediction. Imagine we wanted to spend £5 on advertising; we can replace the variable 'advertising budget' with this value and solve the equation to discover how many album sales we will get:

album sales_i = 50 + (100 × 5) + ε_i = 550 + ε_i

So, based on our model we can predict that if we spend £5 on advertising, we'll sell 550 albums. I've left the error term in there to remind you that this prediction will probably not be perfectly accurate. This value of 550 album sales is known as a predicted value.

8.2.2. The linear model with several predictors

We have seen that we can use a straight line to 'model' the relationship between two variables. However, life is usually more complicated than that: there are often numerous variables that might be related to the outcome of interest. To take our album sales example, we…

Figure 8.2: Lines that share the same intercept but have different gradients, and lines with the same gradients but different intercepts.
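The worked prediction above, as a tiny Python sketch (`predict_sales` is my own name; b0 = 50 and b1 = 100 come from the excerpt's example):

```python
def predict_sales(budget, b0=50, b1=100):
    """Predicted album sales for a given advertising budget (in £)."""
    return b0 + b1 * budget

print(predict_sales(5))  # 550, the predicted value from the text
```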

Page 34: Basic Statistics Primer - courses.lsa.umich.edu

Visualizing Regression

Page 35: Basic Statistics Primer - courses.lsa.umich.edu

Which Line?

Scatterplots with lines representing the model.

- The line of best fit is the line, of all possible lines, that minimizes the difference between the observed values and the model-predicted values
- Use the method of least squares to find this line
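A sketch of why the least-squares line is "best" (names and data are my own): of several candidate lines, the one closest to the least-squares solution has the smallest sum of squared residuals.

```python
def ss_residuals(xs, ys, b0, b1):
    """Sum of squared differences between observed and predicted Y."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# The least-squares line for these data is very close to b0 = 0, b1 = 2;
# the other candidates fit visibly worse.
for b0, b1 in [(0.0, 2.0), (1.0, 1.5), (-1.0, 2.5)]:
    print(b0, b1, ss_residuals(xs, ys, b0, b1))
```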

Page 36: Basic Statistics Primer - courses.lsa.umich.edu

Multiple Regression Equation for 2 Predictors

Yi = b0 + b1·X1,i + b2·X2,i + εi

- Yi: Outcome score for individual i
- b0: Intercept (value of Y when all Xs = 0); the point at which the regression plane crosses the Y-axis
- b1: Regression coefficient for predictor variable 1; the change in Y associated with a unit change in X1, holding X2 constant
- X1,i: Predictor variable 1 score for individual i
- b2: Regression coefficient for predictor variable 2; the change in Y associated with a unit change in X2, holding X1 constant
- X2,i: Predictor variable 2 score for individual i
- εi: Residual for participant i
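Estimating b0, b1, and b2 amounts to solving the normal equations. A self-contained sketch in pure Python (all names are my own; a real analysis would use a linear algebra library):

```python
def solve3(A, v):
    """Solve a 3x3 linear system A b = v by Gauss-Jordan elimination."""
    m = [row[:] + [val] for row, val in zip(A, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_two_predictors(x1, x2, y):
    """Least squares for Y = b0 + b1*X1 + b2*X2 via the normal equations."""
    n = len(y)
    cols = [[1.0] * n, x1, x2]  # columns of the design matrix
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    v = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve3(A, v)

# Data built to satisfy Y = 1 + 2*X1 + 3*X2 exactly.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(fit_two_predictors(x1, x2, y))  # ≈ [1.0, 2.0, 3.0]
```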

Page 37: Basic Statistics Primer - courses.lsa.umich.edu

Assumptions

- Linear relationships
- Predictors
  - Have variance
  - Are additive
  - No multicollinearity
- Outcome
  - Measured at interval level
  - Independence of measurements
- Residuals (differences between observed and predicted values)
  - Random, normally distributed, with a mean of 0
  - Uncorrelated
  - Homoscedasticity: Similar variance at all levels of the predictors

Most of these are examined after the regression analysis (because they are based on model residuals).

Page 38: Basic Statistics Primer - courses.lsa.umich.edu

Testing the Model

- If the model results in better prediction than the mean, then SSM will be much greater than SSR
- This tests the null hypothesis that the model explains no variance in the outcome

F = MSM / MSR

Page 39: Basic Statistics Primer - courses.lsa.umich.edu

Testing the Model

- R²: The proportion of variance accounted for by the regression model
  - A gauge of the size of the relationship
  - The Pearson correlation coefficient squared

R² = SSM / SST

- Use t-statistics to determine the significance of predictors
  - bexpected = 0
  - Null hypothesis: bobserved = 0
  - df = N − (k + 1)

t = (bobserved − bexpected) / SEb
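R² = SSM/SST can be sketched directly (`r_squared` is my own name; with perfectly linear data the model explains all of the variance):

```python
def r_squared(xs, ys, b0, b1):
    """Proportion of variance in ys explained by the line b0 + b1*x."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_model = sum((b0 + b1 * x - mean_y) ** 2 for x in xs)
    return ss_model / ss_total

# Perfectly linear data (y = 1 + 2x), so the model explains everything.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(r_squared(xs, ys, 1, 2))  # 1.0
```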

Page 40: Basic Statistics Primer - courses.lsa.umich.edu

General Linear Model (GLM)

- The GLM is an umbrella for regression and the analyses based on it, including t-tests and correlations, and it extends to more than one dependent variable

Page 41: Basic Statistics Primer - courses.lsa.umich.edu

Independent t-test

- A parametric statistical test that compares two means based on independent data

t = (X̄1 − X̄2) / √(s²p/n1 + s²p/n2)

where the pooled variance is

s²p = [(n1 − 1)·s²1 + (n2 − 1)·s²2] / (n1 + n2 − 2)
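These two formulas can be sketched together in Python (`pooled_t` is my own name; the data are made up):

```python
import math

def pooled_t(a, b):
    """Independent-samples t with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    vp = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(vp / n1 + vp / n2)
    return (m1 - m2) / se, n1 + n2 - 2

group_a = [5.0, 6.0, 7.0, 8.0]
group_b = [1.0, 2.0, 3.0, 4.0]
t, df = pooled_t(group_a, group_b)
print(t, df)
```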

Page 42: Basic Statistics Primer - courses.lsa.umich.edu

t table

The t distribution is just a slightly flattened z distribution, used for small samples.

Page 43: Basic Statistics Primer - courses.lsa.umich.edu

t-test Example

- Does treatment (n = 10; coded 1) impact mPFC activity compared to a control group (n = 10; coded 0)?
- H0: Group means are the same (MA = MB)
- H1: Group means are different (MA ≠ MB)

Page 44: Basic Statistics Primer - courses.lsa.umich.edu

t-test as Regression

Yi = b0 + b1·Xi + εi

Grouping variable: Treatment or Control (coded 0 & 1)

[Output shown for both the t-test and the regression.]

The t-statistic and p-values are the same. The intercept is the mean of the group coded 0, and the slope is the difference between the group means: 12772.6 − 11872.5 = 900.1.
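The equivalence can be verified with a sketch (names and data are my own, not the slide's mPFC data): regressing the outcome on a 0/1 group code gives an intercept equal to the mean of the group coded 0 and a slope equal to the difference between the group means.

```python
def slope_intercept(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1

control = [10.0, 12.0, 11.0, 9.0]     # coded 0 (hypothetical values)
treatment = [14.0, 15.0, 13.0, 16.0]  # coded 1
xs = [0] * len(control) + [1] * len(treatment)
ys = control + treatment

b0, b1 = slope_intercept(xs, ys)
mean_c = sum(control) / len(control)
mean_t = sum(treatment) / len(treatment)
print(b0, mean_c)           # intercept equals the control-group mean
print(b1, mean_t - mean_c)  # slope equals the difference in means
```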

Page 45: Basic Statistics Primer - courses.lsa.umich.edu

Analysis of Variance (ANOVA)

- A parametric statistical test that compares more than two means
  - One-way versus factorial
- The familywise error rate (the Type I error rate for a study) is increased with multiple statistical tests

[Diagram: three groups (1, 2, 3) and the three pairwise comparisons between them: 1 vs 2, 2 vs 3, 1 vs 3.]

familywise error = 1 − 0.95ⁿ

With 3 t-tests at α = .05, the familywise error rate is 1 − (.95 × .95 × .95) = 1 − .86 = .14!
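The familywise error formula as a one-function sketch (`familywise_error` is my own name):

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

print(round(familywise_error(3), 2))   # 0.14, as on the slide
print(round(familywise_error(20), 2))  # grows quickly with more tests
```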

Page 46: Basic Statistics Primer - courses.lsa.umich.edu

False Positives When No Corrections for Multiple Comparisons

Bennett et al. (2009)

“Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction”

Page 47: Basic Statistics Primer - courses.lsa.umich.edu

Logic of ANOVA: Variances

F = MSB / MSW

Page 48: Basic Statistics Primer - courses.lsa.umich.edu

[Plot: group means versus the grand mean, illustrating between-group deviations.]

SSB = Σ nk (x̄k − x̄grand)²,  dfB = k − 1

Page 49: Basic Statistics Primer - courses.lsa.umich.edu

[Plot: individual scores versus their group means, illustrating within-group deviations.]

SSW = Σ (xik − x̄k)²,  dfW = N − k
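Putting SSB and SSW together gives the omnibus F. A pure-Python sketch (`one_way_anova` is my own name; the data are made up):

```python
def one_way_anova(groups):
    """One-way ANOVA F from between- and within-group sums of squares."""
    all_scores = [x for g in groups for x in g]
    grand = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]

    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b = len(groups) - 1
    df_w = len(all_scores) - len(groups)
    return (ssb / df_b) / (ssw / df_w)  # MSB / MSW

groups = [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
print(one_way_anova(groups))  # 27.0
```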

Page 50: Basic Statistics Primer - courses.lsa.umich.edu

F table

Page 51: Basic Statistics Primer - courses.lsa.umich.edu

ANOVA Example

- Do treatment (n = 10; coded 2), control (n = 10; coded 1), and never-treated (n = 10; coded 3) groups differ on mPFC activity?
- H0: Group means are the same (MA = MB = MC)
- H1: At least one group mean differs (MA ≠ MB = MC, or MA = MB ≠ MC, or MA ≠ MB ≠ MC)
- ANOVA is an omnibus test
  - It indicates an overall effect or difference between groups
  - It does NOT indicate which mean(s) differ(s)

Page 52: Basic Statistics Primer - courses.lsa.umich.edu

ANOVA Example

- Do treatment (n = 10; coded 2), control (n = 10; coded 1), and never-treated (n = 10; coded 3) groups differ on mPFC activity?

The omnibus test reveals a significant difference between the group means somewhere...

Page 53: Basic Statistics Primer - courses.lsa.umich.edu

Coding Schemes

- Dichotomous variables can be used in regression
- Categorical variables with more than two levels can also be used if expressed as a series of variable codes
  - k − 1 variables in the series, where k = number of groups
- Different coding schemes generally provide the same omnibus test results (i.e., F-test), but different test coefficients
- Dummy coding
  - b0 = Mreference
  - b1 = difference between Mreference and Mgroup1
  - b2 = difference between Mreference and Mgroup2

Group          C1   C2
1 (Tx)          1    0
2 (control)     0    1
3 (reference)   0    0
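Building the k − 1 dummy codes can be sketched as follows (`dummy_code` is my own name; the labels are hypothetical):

```python
def dummy_code(groups, reference):
    """k-1 dummy codes per observation; the reference group is all zeros."""
    levels = [g for g in sorted(set(groups)) if g != reference]
    return [[1 if g == lvl else 0 for lvl in levels] for g in groups]

# Hypothetical group labels; group 3 is the reference, as on the slide.
labels = [1, 2, 3, 1, 3]
print(dummy_code(labels, reference=3))
# [[1, 0], [0, 1], [0, 0], [1, 0], [0, 0]]
```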

Page 54: Basic Statistics Primer - courses.lsa.umich.edu

ANOVA as Regression

Yi = b0 + b1·X1,i + b2·X2,i + εi

Grouping variables: Two dummy codes

The F-test is the same as the ANOVA omnibus test. The intercept is the reference group mean, D1 is the difference between the treatment and reference groups, and D2 is the difference between the control and reference groups.

Page 55: Basic Statistics Primer - courses.lsa.umich.edu

GLM Summary

Introduction to the General Linear Model - Statistics for the Social Sciences

Page 56: Basic Statistics Primer - courses.lsa.umich.edu

Advanced Topics

Page 57: Basic Statistics Primer - courses.lsa.umich.edu

Dependent t-test

- A parametric statistical test that compares two means based on related data

Page 58: Basic Statistics Primer - courses.lsa.umich.edu

Dependent t-test

t = (D̄ − µD) / (sD / √N)

df = N − 1
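The dependent t formula as a sketch (`dependent_t` is my own name; µD = 0 under the null; the data are made up):

```python
import math

def dependent_t(before, after):
    """Paired t: mean difference over its standard error; returns (t, df)."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return mean_d / (sd / math.sqrt(n)), n - 1

pre = [10.0, 12.0, 9.0, 11.0]
post = [12.0, 13.0, 10.0, 13.0]
t, df = dependent_t(pre, post)
print(t, df)
```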

Page 59: Basic Statistics Primer - courses.lsa.umich.edu

One-sample t-test

- A parametric statistical test that compares a mean to a hypothesized value (often 0)

t = (observed sample mean − expected population mean) / (estimate of the standard error of the sample mean)

- The expected population mean is the value if the null hypothesis is true (often 0)
- The standard error of the sample mean is estimated as s / √n
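And the one-sample t as a sketch (`one_sample_t` is my own name; the data are made up):

```python
import math

def one_sample_t(xs, mu=0.0):
    """t comparing the sample mean to a hypothesized mean; returns (t, df)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return (mean - mu) / (s / math.sqrt(n)), n - 1

scores = [0.5, 1.5, 1.0, 2.0, 0.0]
t, df = one_sample_t(scores, mu=0.0)
print(t, df)
```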

Page 60: Basic Statistics Primer - courses.lsa.umich.edu

Factorial ANOVA

Page 61: Basic Statistics Primer - courses.lsa.umich.edu

Questions?

Office hours: Sunday, August 8th, 6-8 PM
Email: [email protected]