Hypothesis Testing
Dec 17, 2015
Research hypotheses are formulated in terms of the outcome that the experimenter wants, and an alternative outcome that he doesn't want
I.e. if we're comparing scores on an exam with two groups, one with test anxiety and one without, our hypotheses are:
(1) That the group with test anxiety will score lower (expected outcome)
(2) The two groups will score the same (unexpected outcome)
Hypothesis Testing
The hypothesis that outlines the outcome that we're expecting/hoping for is the Research Hypothesis (H₁)
The hypothesis that runs counter to our expectations is the Null Hypothesis (H₀)
Hypothesis Testing
We can use the sampling distribution of the mean to determine the probability that we would obtain the mean of our sample by chance
I.e. the same way we could convert a score to a z-score and determine the probability of obtaining values higher or lower than it
[Figure: Normal distribution of z-scores (−3.50 to +3.50) with a cutoff at z = +1.645]
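The score-to-probability conversion just described can be sketched in Python. The score of 115 and the IQ-style scale (μ = 100, σ = 15) are hypothetical choices, and `normal_cdf` is a small helper built on the standard library's error function:

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical example: a score of 115 on a scale with mean 100, SD 15.
score, mu, sigma = 115, 100, 15
z = (score - mu) / sigma          # z = 1.0

p_below = normal_cdf(z)           # P(a value lower than the score) ~ .8413
p_above = 1 - p_below             # P(a value higher than the score) ~ .1587
print(f"z = {z:.2f}, P(below) = {p_below:.4f}, P(above) = {p_above:.4f}")
```

This reproduces what a z-table gives: the area below and above the converted score.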
Hypothesis Testing
If the probability is low (i.e. only a 5% chance or less), we can assume that chance sampling error did not produce our results, and our IV did
I.e. in our comparison of people with test anxiety, our test-anxious group may also be quite dumb, resulting in their poor test scores. However, if their scores are extreme enough (low), we can discount even that possibility
Hypothesis Testing
Why bother with H₀ at all?
Technically, we can never prove a particular hypothesis to be true
You cannot prove the statement "All ducks are black", because you would have to have observations on all ducks that were, are, and ever will be (i.e. on all ducks)
You can disprove a hypothesis – "All ducks are black" can easily be proven false by seeing one white (non-black) duck
This is why, technically, we are supposed to talk about "rejecting H₀" and not "accepting H₁", and about "failing to reject H₀", never "proving H₀"
Hypothesis Testing
Beginning with the assumption that H₀ is true, and trying to disprove it, also maintains the scientific spirit of objectivity and skepticism
Objectivity – illustrates that we value the results of the data more than the hypothesis that, if proven, would make us happiest (H₁)
Skepticism – shows that we are not convinced of even our own hypothesis until it is confirmed by the data
Hypothesis Testing
In our example of people with (x₁) and without test anxiety (x₂), where our hypothesis is that people with anxiety will have lower IQ scores:
H₀ = [x₁ ≥ x₂]
H₁ = [x₁ < x₂]
Hypothesis Testing
If, instead, we were testing if the group with anxiety was different from the average student population (Hint: Look at the italics), how would we phrase H₀ and H₁?
What if we were testing whether or not the two groups (x₁ & x₂) were equal?
Hypothesis Testing
How do we know when our sample is rare enough to reject H₀?
Statistical convention says that when the probability of obtaining a mean at least as extreme as the one you've obtained is only 5% or less, we can say this is not due to chance
AKA the probability of rejecting H₀ when it is "true" (i.e. screwing up) = the significance/rejection level, or alpha (α); the cutoff score it implies is the critical value
HOWEVER, THIS DOES NOT MEAN THAT 5.1% IS MEANINGLESS!
p<.05
Hypothesis Testing
For our group with test anxiety, if their mean score on an IQ test was 70, we first convert this into a z-score (μ = 100, σ = 15)
z = (70 − 100)/15 = −2
Since our H₁ is that the group with anxiety will score lower than those without, we look at the percent in the "Lesser Portion"
Hypothesis Testing
From Table E.10, the probability of obtaining a score at or below z = −2 is .0228, or 2.3%
Since this is below the 5% convention, we would reject H₀ (or "accept" H₁)
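The whole worked example can be checked without the table. As a minimal sketch in Python, the standard library's error function stands in for Table E.10:

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 100, 15      # IQ population parameters from the example
sample_mean = 70         # mean IQ score of the test-anxious group

z = (sample_mean - mu) / sigma    # (70 - 100)/15 = -2.0
p = normal_cdf(z)                 # lower-tail ("Lesser Portion") probability

alpha = 0.05
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p:.4f} -> {decision}")   # p = 0.0228 -> reject H0
```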
Hypothesis Testing
α is the p("accepting" H₁ when it is false / rejecting H₀ when it is true), i.e. of making a mistake called a Type I Error
p("accepting" H₁ when it is false) ≠ p("accepting" H₁) – the former refers to a type of error, the latter simply to an outcome
What about the p("accepting" H₀ when it is false / rejecting H₁ when it is true)? This is called a Type II Error, or β (Beta)
Hypothesis Testing
Why not make α as small as possible?
Because as α [p(Type I Error)] decreases, β [p(Type II Error)] increases
[Figure: overlapping H₀ and H₁ distributions; red = α, blue = β]
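The tradeoff can be made concrete with a small sketch. Assume a one-tailed test where the statistic is N(0, 1) under H₀ and, when H₁ is true, is shifted up by 2 standard errors; both the shift and the cutoffs below are hypothetical choices:

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

shift = 2.0  # assumed true effect size under H1, in standard-error units

for cutoff in (1.0, 1.645, 2.33):
    alpha = 1 - normal_cdf(cutoff)       # P(reject H0 | H0 true)      = Type I
    beta = normal_cdf(cutoff - shift)    # P(fail to reject | H1 true) = Type II
    print(f"cutoff = {cutoff:.3f}: alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Moving the cutoff to the right shrinks α but grows β, exactly the tradeoff described above.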
Hypothesis Testing
It seems like we care more about Type I Error than Type II Error. Why?
Scientists are more likely to commit a Type I Error because they are more motivated to prove their hypothesis (H₁)
In law, establishing motive is important to proving guilt; without a motive, there's little reason to expect that a crime will occur, let alone stringently attempt to protect against it
Hypothesis Testing
So long as we're only willing to take a 5% chance of incorrectly rejecting H₀, it doesn't matter how we distribute this 5%, as long as it doesn't exceed 5%
We can place all 5% in one "tail" of the distribution if we only expect a difference in means in one direction = One-Tailed/Directional Test
We can place half of 5% (2.5%) in either "tail" if we have no a priori (before) hypothesis about where our mean difference will be = Two-Tailed/Non-Directional Test
The decision of which type of test to use should be made a priori based on theory, not data driven
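A quick sketch of the consequence, with a hypothetical observed z of −1.80: the same result can clear the one-tailed cutoff but miss the two-tailed one.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = -1.80  # hypothetical observed z-score

p_one_tailed = normal_cdf(z)              # all 5% in the lower tail
p_two_tailed = 2 * normal_cdf(-abs(z))    # 2.5% in each tail

print(f"one-tailed p = {p_one_tailed:.4f}")   # 0.0359 -> significant at .05
print(f"two-tailed p = {p_two_tailed:.4f}")   # 0.0719 -> not significant at .05
```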
Hypothesis Testing
[Figure: rejection regions for a One-Tailed Test vs. a Two-Tailed Test]
Hypothesis Testing
H₀ and H₁ with One- and Two-Tailed Tests:
For One-Tailed Tests:
If our hypothesis is that group x is lower than group y
H₀ = (x ≥ y)
H₁ = (x < y)
For Two-Tailed Tests:
If our hypothesis is that group x is either greater than or less than group y
H₀ = (x = y)
H₁ = (x ≠ y)
Hypothesis Testing
Psychologists can be sneaky bastards and covertly increase α by testing one hypothesis many times by:
Evaluating one hypothesis with many different statistical tests
Using more than one measure to operationalize one DV
I.e. measuring depression with both the Beck Depression Inventory-II (BDI-II) and the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) = testing depression twice = doubling your α
Hypothesis Testing
What should you do to prevent this from happening?
If you're testing one hypothesis many different ways or with many measures, adjust α accordingly w/ the Bonferroni Correction
Note: NOT the same as the Beeferoni™ Correction, which prevents incorrect preparation of Chef Boyardee™ products
Testing w/ 2 tests: test using α = .05/2 = .025
Testing using 3 measures of one construct: use α = .05/3 = .0167
Testing w/ 2 tests and 3 measures: use α = .05/6 = .008
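The arithmetic above just divides α by the total number of looks at the data; a small helper (the function name is hypothetical) makes that explicit:

```python
def bonferroni_alpha(alpha, n_tests, n_measures=1):
    """Per-test alpha when one hypothesis gets n_tests * n_measures looks."""
    return alpha / (n_tests * n_measures)

print(round(bonferroni_alpha(0.05, 2), 4))     # 2 tests              -> 0.025
print(round(bonferroni_alpha(0.05, 3), 4))     # 3 measures           -> 0.0167
print(round(bonferroni_alpha(0.05, 2, 3), 4))  # 2 tests x 3 measures -> 0.0083
```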
Hypothesis Testing
Example:
Your hypothesis is that males and females will differ in degree of instrumental aggression (IA = aggression designed to obtain an end). IA is measured with the Instrumental Aggression Scale (IAS) and the Positive and Negative Affect Scale (PANAS), and the groups are evaluated with both ANOVA and SEM
What is your corrected α-level?
Hypothesis Testing
Three of the Ten Commandments of Statistics:
1. P-values indicate the probability that your findings occurred by chance, or the likelihood of obtaining them again in a similar sample – NOT the strength of the relationship between an IV and DV
I.e. NEVER SAY: "In my experiment evaluating the influence of coffee (the IV) on people's activity levels (the DV), I found highly significant results at p = .000001, indicating that coffee produces a lot of activity in people"
CORRECT – "The likelihood that the effect, that coffee boosted activity levels, was due to sampling error (i.e. chance) was only .000001"
Hypothesis Testing
Three of the Ten Commandments of Statistics:
2. p = .052, .055, etc. is not "insignificant", and does not mean that a relationship between your IV and DV does not exist, just that it did not meet "conventional" levels of significance.
3. When testing a hypothesis multiple ways, always use some corrected level of α (i.e. the Bonferroni Correction).