Effect Sizes and Power Review. Statistical Power Statistical power refers to the probability of finding a particular sized effect Specifically, it is.

Effect Sizes and Power Review

Statistical Power

Statistical power refers to the probability of detecting an effect of a particular size

Specifically, it is 1 − β, where β is the type II error rate

– The probability of rejecting the null hypothesis if it is false

It is a function of type I error rate, sample size, and effect size

Its utility lies in helping us determine the sample size needed to find an effect size of a certain magnitude

Two kinds of power analysis

A priori

– Used when planning your study

– What sample size is needed to obtain a certain level of power?

Post hoc

– Used when evaluating your study

– What chance did you have of significant results?

– Not really useful

If you do the power analysis and conduct your analysis accordingly then you did what you could. To say after, “I would have found a difference but didn’t have enough power” isn’t going to impress anyone.

A priori power

We can use the relationship among n, d, and δ (the noncentrality parameter, i.e. what the sampling distribution is centered on if H0 is false), plus our specified α, to calculate how many subjects we need to run

Decide on your α level

Decide on an acceptable level of power/type II error rate

Figure out the effect size you are looking for

Calculate n
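The steps above can be sketched in Python for the one-sample case. This is a minimal sketch, not Howell's table-based procedure: it assumes a two-tailed test and a normal approximation to the noncentral t distribution, and it uses the one-sample relationship δ = d√n, so that n = (δ/d)². The function names `required_delta` and `n_one_sample` and the example value d = 0.5 are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def required_delta(power, alpha=0.05):
    """Noncentrality parameter needed for the target power
    (assumption: two-tailed test, normal approximation)."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)

def n_one_sample(d, power, alpha=0.05):
    """Unrounded n for a one-sample test: delta = d * sqrt(n), so n = (delta / d) ** 2."""
    return (required_delta(power, alpha) / d) ** 2

# Step 1: alpha = .05; step 2: power = .80; step 3: d = 0.5 (illustrative); step 4: n
n = n_one_sample(d=0.5, power=0.80)
print(f"unrounded n = {n:.1f}, so plan for {ceil(n)} subjects")
```

Because the approximation uses z rather than t critical values, it will tend to understate n slightly when samples are small.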

A priori Effect Size?

Figure out an effect size before I run my experiment?

Several ways to do this:

– Base it on substantive knowledge: what you know about the situation and scale of measurement

– Base it on previous research

– Use conventions

An acceptable level of power?

Why not set power at .99? Practicalities

– Howell shows how for a one-sample t test and an effect size d of 0.33:

Power = .80, then n = 72

Power = .95, then n = 119

Power = .99, then n = 162

Cost of increasing power (usually done through increasing n) can be high
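To see the cost concretely, approximate power can be computed for the sample sizes Howell reports. This is a rough sketch, assuming a two-tailed α of .05 and a normal approximation; the helper name `approx_power` is hypothetical, and exact values from the noncentral t distribution will differ slightly.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample test (normal approximation)."""
    z = NormalDist()
    delta = d * sqrt(n)                  # noncentrality parameter for one sample
    z_crit = z.inv_cdf(1 - alpha / 2)    # critical value under H0
    return z.cdf(delta - z_crit)         # ignores the negligible opposite tail

for n in (72, 119, 162):
    print(n, round(approx_power(0.33, n), 2))
```

In this case, moving from .80 to .99 power more than doubles the required sample size.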

Howell’s general rule

Look for big effects

or

Use big samples

You may now start to understand how little power many of the studies in psych have, considering they are often looking for small effects

Many seem to think that if they use the central limit theorem rule of thumb (n = 30), which doesn’t even hold that often, the power problem is solved too

This is clearly not the case
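A quick calculation illustrates the point. The sketch below assumes two independent groups of 30 each, a two-tailed α of .05, a normal approximation, and Cohen's conventional benchmarks of d = 0.2, 0.5, and 0.8 for small, medium, and large effects; `power_two_groups` is an illustrative name, not something from the source.

```python
from math import sqrt
from statistics import NormalDist

def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate two-tailed power for two independent groups of equal size."""
    z = NormalDist()
    delta = d * sqrt(n_per_group / 2)    # noncentrality with equal n per group
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region when the true effect is d
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)

for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label:>6} (d = {d}): power ~ {power_two_groups(d, 30):.2f}")
```

Under these assumptions, n = 30 per group gives reasonable power only for large effects.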

Post hoc power: the power of the actual study

If you fail to reject the null hypothesis, you might want to know what chance you had of finding a significant result (i.e. to defend the failure)

As many point out, this is a little dubious

One thing we can understand regarding the power of a particular study at hand is that it can be affected by a number of issues, such as:

– Reliability of measurement: an increase in reliability can actually result in power increasing or decreasing, as we will see later, though here I stress the decrease due to unreliable measures

– Outliers

– Skewness

– Unequal N for group comparisons

– The analysis chosen

Something to consider

Doing a sample size calculation is nice in that it gives a sense of what to shoot for, but rarely if ever do the data or circumstances bear out such that it provides a perfect estimate for our needs

– Mike’s sample size calculation for all studies: the sample size needed is the largest N you can obtain based on practical considerations (e.g. time, money)

Also, even the useful form of power analysis (for sample size calculation) involves statistical significance as its focus

While it gives you something to shoot for, our real interest regards the effect size itself and how comfortable we are with its estimation

Emphasizing effect size over statistical significance in a sense de-emphasizes the power problem

Always a relationship

Commonly define the null hypothesis as ‘no difference’ or ‘no relationship’

There is always a non-zero relationship (to some decimal place) seen in sample data

As such obtaining statistical significance can be seen as just a matter of sample size

Furthermore, the importance and magnitude of an effect are not reflected (because of the role of sample size in probability value attained)
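The dependence of the p value on sample size can be illustrated with a small calculation. This is a sketch under simplifying assumptions: a one-sample z approximation, and an observed standardized effect fixed at an arbitrary tiny value (d = 0.05); `two_tailed_p` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p(d, n):
    """Two-tailed p value for an observed one-sample standardized effect d (z approximation)."""
    z_stat = d * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

d = 0.05  # a tiny effect, held constant (illustrative value)
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: p = {two_tailed_p(d, n):.4g}")
```

The effect is identical in every row; only n changes, yet the result moves from clearly nonsignificant to highly significant.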

What should we be doing?

Want to make sure we have looked hard enough for the difference – power analysis

Figure out how big the thing we are looking for is – effect size

Effect Size

There are different ways to speak about the relationship between variables, but in general effect size refers to the practical, rather than statistical, significance

– This is what we are really interested in

No one cares about the statistical particulars if the effect is real and will change the way we think about things and how we act

However, the effect size, like our other measures, varies from sample to sample

– I.e. if we did a study 5 times, we would get 5 different effect sizes

So while we are primarily interested in effect size, we will need to be cautious in our interpretation there too, and use other available evidence also to come to our final conclusions
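The sample-to-sample variability of an effect size can be seen in a small simulation. This is only a sketch: the true effect (d = 0.5), group size (30), normal populations with unit variance, and the seed are all arbitrary assumptions, and `simulate_ds` is an illustrative name.

```python
import random
from math import sqrt
from statistics import mean, variance

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)

def simulate_ds(true_d=0.5, n_per_group=30, n_studies=5, seed=1):
    """Run the same two-group study several times and collect the observed d values."""
    rng = random.Random(seed)
    ds = []
    for _ in range(n_studies):
        treatment = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        ds.append(cohens_d(treatment, control))
    return ds

print([round(d, 2) for d in simulate_ds()])
```

Each run of the "same" study yields a different observed d, even though the population effect never changes.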

Calculating effect size

Different statistical tests have different effect sizes developed for them

However, the general principle is the same

Effect size refers to the magnitude of the impact of the independent variable (factor) on the outcome variable

Thinking about effect size again

d family: Focused on standardized mean differences

– Allows comparison across samples and variables with differing variance

– Equivalent to z scores

– Note sometimes no need to standardize (units of the scale have inherent meaning)

r family: Variance-accounted-for

– Amount of variance explained versus the total

d family and r family

Example: Cohen’s d – Differences Between Means

Used with independent samples t test

Cohen initially suggested that one could use either sample’s standard deviation, since they should be equal in the population according to our assumptions. In practice people now use the pooled variance (i.e. its square root, the pooled standard deviation).

Variations of this are for control group settings, dependent samples, more than two groups… but the notion of standardized mean difference is the same

$$ d = \frac{\bar{X}_1 - \bar{X}_2}{s_p} $$
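Cohen's d for independent samples, the difference between the two sample means divided by the pooled standard deviation, can be computed from summary statistics. A minimal sketch follows; the function names and the example means, standard deviations, and group sizes are hypothetical.

```python
from math import sqrt

def pooled_sd(s1, n1, s2, n2):
    """Square root of the pooled variance of two independent samples."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference: (mean1 - mean2) / pooled SD."""
    return (m1 - m2) / pooled_sd(s1, n1, s2, n2)

# Hypothetical summary statistics, for illustration only
d = cohens_d(m1=105.0, s1=15.0, n1=40, m2=100.0, s2=15.0, n2=40)
print(round(d, 3))
```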

Cohen’s d – Differences Between Means

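Relationship to t

$$ d = t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

Relationship to r_pb

$$ d = r_{pb}\sqrt{\left(\frac{n_1 + n_2 - 2}{1 - r_{pb}^2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} $$

$$ r_{pb} = \frac{d}{\sqrt{d^2 + 1/(pq)}} $$

p and q are the proportions of the total each group makes up. If the groups are equal, p = .5 and q = .5.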
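These conversions are simple to express in code. The sketch below assumes the standard forms of the three relationships (d from t, d from the point-biserial correlation, and the point-biserial correlation from d); the function names and example inputs are illustrative.

```python
from math import sqrt

def d_from_t(t, n1, n2):
    """d = t * sqrt(1/n1 + 1/n2) for an independent samples t test."""
    return t * sqrt(1 / n1 + 1 / n2)

def d_from_r(r_pb, n1, n2):
    """d from the point-biserial correlation and the two group sizes."""
    return r_pb * sqrt((n1 + n2 - 2) / (1 - r_pb**2) * (1 / n1 + 1 / n2))

def r_from_d(d, p=0.5, q=0.5):
    """Point-biserial r from d; p and q are the group proportions (p + q = 1)."""
    return d / sqrt(d**2 + 1 / (p * q))

print(round(d_from_t(t=2.5, n1=20, n2=20), 3))
print(round(r_from_d(0.5), 3))
```

Note that converting d to r and back recovers the original d only approximately, because one formula uses n1 + n2 − 2 while the other works with proportions of the total N.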

Characterizing effect size

Cohen emphasized that the interpretation of effects requires the researcher to consider things narrowly in terms of the specific area of inquiry

Evaluation of effect sizes inherently requires a personal value judgment regarding the practical or clinical importance of the effects

Even though rules of thumb exist, use them only as a last resort and be wary of “mindlessly invoking” these criteria

Association

A measure of association describes the amount of the covariation between the independent and dependent variables

It is expressed in an unsquared metric or a squared metric—the former is usually a correlation, the latter a variance-accounted-for effect size

We can apply the measure to continuous data (r and R²), categorical predictors with a continuous DV (η²), and strictly categorical settings (e.g. phi)

Again the notion is the same: a measure of linear association which, if squared, provides a measure of the variance in the DV that can be accounted for by the predictor

Case-level effect sizes for group differences

Indexes such as Cohen’s d and η² estimate effect size at the group or variable level only

However, it is often of interest to estimate differences at the case level

Case-level indexes of group distinctiveness are proportions of scores from one group versus another that fall above or below a reference point

– Examples: Cohen’s U indexes, common language effect size, tail ratios

Reference points can be relative (e.g., a certain number of standard deviations above or below the mean in the combined frequency distribution) or more absolute (e.g., the cutting score on an admissions test)

Note that all three types of effect size applicable to the group difference setting can be converted to one another; it is just a matter of preference as to which one we use for communication
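Two of the case-level indexes mentioned above can be obtained directly from d. The sketch below relies on an assumption that should be stated loudly: both populations are normal with equal variances. Under that assumption, Cohen's U3 (the proportion of the lower group falling below the mean of the higher group) is Φ(d), and the common language effect size (the probability that a randomly chosen case from the higher group exceeds one from the lower group) is Φ(d/√2). The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def cohens_u3(d):
    """Proportion of the lower group scoring below the higher group's mean
    (assumes normal populations with equal variances)."""
    return NormalDist().cdf(d)

def common_language_es(d):
    """Probability that a random case from the higher group exceeds a random case
    from the lower group (same normality and equal variance assumptions)."""
    return NormalDist().cdf(d / sqrt(2))

d = 0.5  # illustrative value
print(f"U3 = {cohens_u3(d):.3f}, CL = {common_language_es(d):.3f}")
```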

Confidence Intervals for Effect Size

Effect size statistics such as Cohen’s d and η² have complex distributions

General form is the same as any CI

$$ \text{statistic} \pm (\text{critical value} \times \text{standard error}) $$

Confidence Intervals for Effect Size

Traditional methods of interval estimation rely on approximate standard errors assuming large sample sizes

We need a computer program to help us find the correct noncentrality parameters to use in calculating exact confidence intervals for effect sizes

Both standalone programs (Steiger) and statistical packages (R) can do this for us, and thus provide a measure of effect while noting the uncertainty with that estimate
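For contrast with the exact noncentral approach, the traditional large-sample interval for d can be computed by hand. This sketch is not the exact method: it assumes the common approximate standard error for d with two independent groups and a normal critical value. The function name and example inputs are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def approx_ci_for_d(d, n1, n2, confidence=0.95):
    """Large-sample CI for Cohen's d: statistic +/- (critical value * standard error)."""
    # Approximate standard error of d (assumption: large-sample formula)
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return d - z_crit * se, d + z_crit * se

low, high = approx_ci_for_d(d=0.5, n1=30, n2=30)
print(f"d = 0.5, approximate 95% CI [{low:.2f}, {high:.2f}]")
```

With 30 cases per group the interval is very wide, which makes the uncertainty in the estimate hard to ignore.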

Limitations of effect size measures

Variability across samples

– No more a limitation than other statistics, but one needs to be fully aware of this

– Just because you found a moderate effect doesn’t mean that there is one

Standardized mean differences:

– Heterogeneity of within-conditions variances across studies can limit their usefulness; the unstandardized contrast may be better in this case

Measures of association:

– Correlations can be affected by sample variances and whether the samples are independent or not, the design is balanced or not, or the factors are fixed or not

– Also affected by artifacts such as missing observations, range restriction, categorization of continuous variables, and measurement error (see Hunter & Schmidt, 1994, for various corrections)

– Variance-accounted-for indexes can make some effects look smaller than they really are in terms of their substantive significance


How to fool yourself with effect size estimation:

1. Measure effect size only at the group level

2. Apply generic definitions of effect size magnitude without first looking to the literature in your area

3. Believe that an effect size judged as “large” according to generic definitions must be an important result and that a “small” effect is unimportant

4. Ignore the question of how theoretical or practical significance should be gauged in your research area

5. Estimate effect size only for statistically significant results


6. Believe that finding large effects somehow lessens the need for replication

7. Forget that effect sizes are subject to sampling error

8. Forget that effect sizes for fixed factors are specific to the particular levels selected for study

9. Forget that standardized effect sizes encapsulate other quantities such as the unstandardized effect size, error variance, and experimental design

10. As a journal editor or reviewer, substitute effect size magnitude for statistical significance as a criterion for whether a work is published

Recommendations

Report effect sizes along with statistical significance

Report confidence intervals

Use graphics

Use common sense combined with theoretical considerations

Do not rely on any one result to support your conclusions
