Week 11: ANOVA

Mar 09, 2016

Soh Mei Ling

Summed-up notes from books and webs on ANOVA
    Week 11

    Parametric technique                   Non-parametric technique
    One-way between-groups ANOVA           Kruskal-Wallis Test
    One-way repeated measures ANOVA        Friedman Test
    Two-way between-groups ANOVA           None
    Mixed between-within groups ANOVA      None

    Analysis of variance is used when you have two or more groups or time points.

    Paired-sample/ repeated measures/ within-group techniques are used when you test the same

    people on more than one occasion, or you have matched pairs.

    Independent/ between-group techniques are used when the participants in each group are different

    people (independent of one another).

    One-way ANOVA: one categorical independent variable (e.g., gender) and one continuous
    dependent variable (e.g., scores).

    Two-way ANOVA: two categorical independent variables (e.g., gender, age group) and one
    continuous dependent variable (e.g., scores).

    Examples for types of ANOVA:

    A manager wants to raise the productivity at his company by increasing the speed at which his

    employees can use a particular spreadsheet program. As he does not have the skills in-house, he

    employs an external agency which provides training in this spreadsheet program. They offer 3

    courses: a beginner, intermediate and advanced course. He is unsure which course is needed for the

    type of work they do at his company, so he sends 10 employees on the beginner course, 10 on the

    intermediate and 10 on the advanced course. When they all return from the training, he gives them

    a problem to solve using the spreadsheet program, and times how long it takes them to complete

    the problem. He then compares the three courses (beginner, intermediate, advanced) to see if there

    are any differences in the average time it took to complete the problem.

    One-way between-group ANOVA

    Heart disease is one of the largest causes of premature death and it is now known that chronic, low-

    level inflammation is a cause of heart disease. Exercise is known to have many benefits, including

    protection against heart disease. A researcher wants to know whether this protection against heart

    disease might be due to exercise reducing inflammation. The researcher was also curious as to

    whether this protection might be gained over a short period of time or whether it took longer. In

    order to investigate this idea, the researcher recruited 20 participants who underwent a 6-month

    exercise training program. In order to determine whether inflammation had been reduced, the

    researcher measured the inflammatory marker called CRP at pre-training, 2 weeks into training and

    after 6 months of training.

    One-way repeated measures ANOVA/ One-way within-group ANOVA


    Assumptions for one-way between-group/within-group ANOVA:

    Before running any parametric test, we always need to make sure that the data we want to analyse

    can actually be analysed using a one-way ANOVA.

    Assumption #1: The dependent variable should be measured at the interval or ratio level (i.e., continuous scale rather than discrete scale). For example: Revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg).

    Assumption #2 (between-group): The independent variable should consist of two or more categorical, independent groups. When you have only two groups (e.g., gender: male and female), an independent-samples t-test is commonly used, although a one-way ANOVA will generate the same results. For example: ethnicity (e.g., 3 groups: Caucasian, African American and Hispanic), physical activity level (e.g., 4 groups: sedentary, low, moderate and high), profession (e.g., 5 groups: surgeon, doctor, nurse, dentist, therapist).

    Assumption #2 (within-group): The independent variable should consist of at least two categorical "related groups" or "matched pairs". "Related groups" indicates that the same subjects are present in both groups. The reason it is possible to have the same subjects in each group is that each subject has been measured on two occasions on the same dependent variable. For example, individuals' performance in a spelling test (the dependent variable) before and after they underwent a new form of computerized teaching method to improve spelling. The repeated measures ANOVA can also be used to compare different subjects, but this does not happen very often.

    Assumption #3 (between-group only; N/A for within-group techniques): You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves. For example, when using between-group techniques, there must be different participants in each group, with no participant being in more than one group. This is more of a study design issue than something you can test for, but it is an important assumption of the one-way ANOVA. If your study fails this assumption, you will need to use another statistical test instead of the one-way ANOVA (e.g., a repeated measures design).

    Assumption #4: There should be no significant outliers. Outliers are simply single data points within your data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally). The problem with outliers is that they can have a negative effect on the one-way ANOVA, reducing the validity and accuracy of your results.

    Assumption #5: Your dependent variable should be approximately normally distributed for each category of the independent variable. We say "approximately" normal because the one-way ANOVA is quite robust to violations of normality: the assumption can be somewhat violated and the test will still provide valid results, provided that the experimental design is balanced. You can test for normality using the Shapiro-Wilk test of normality, which is easy to run in SPSS Statistics.


    Assumption #6 (between-group): There needs to be homogeneity of variances. You can test this assumption in SPSS Statistics using Levene's test for homogeneity of variances. If your data fails this assumption, you will need to carry out a Welch ANOVA instead of a one-way ANOVA (which you can also do using SPSS Statistics) and use a different post-hoc test.

    Assumption #6 (within-group): Known as sphericity, the variances of the differences between all combinations of related groups must be equal. Unfortunately, repeated measures ANOVAs are particularly susceptible to violating the assumption of sphericity, which causes the test to become too liberal (i.e., leads to an increase in the Type I error rate; that is, the likelihood of detecting a statistically significant result when there isn't one). Fortunately, SPSS Statistics makes it easy to test whether your data has met or failed this assumption.

    Example:

    Research question: Is there a statistically significant difference in undergraduate students'
    grade points for a Statistics class based on the type of lecture medium (online conference class,
    traditional lecture, and traditional lecture supplemented by online conference class)?

    H0: There is no statistically significant difference in undergraduate students' grade points for a
    Statistics class based on the type of lecture (online conference class, traditional lecture, and
    traditional lecture supplemented by online conference class).

    Null hypothesis: No difference between population means (μ1 = μ2 = μ3).

    Research hypothesis: Population means are not all equal (at least one μ differs).

    Schematic: an independent-samples t-test compares the outcome (dependent variable) across two
    groups (control vs. treatment), whereas a one-way ANOVA compares the outcome across three or
    more groups (e.g., control, treatment 1, treatment 2).

    Example #1 (post hoc):

    We have three teaching programs (online conference class, traditional lecture, and traditional lecture
    supplemented by online conference class) and we are interested in the effectiveness of each
    program on increasing undergraduate students' grade points for a Statistics class. Below is the data
    for analysis.

               Online conference class   Traditional lecture   Traditional lecture supplemented
                                                               by online conference class
               12                        20                    40
               15                        19                    35
                9                        23                    42
    Mean, X̄   12.00                     20.67                 39.00

    Table summary:

    Source    SS                             df      MS                        F-ratio
    Between   SS(B) = Σ n_j (X̄_j − X̄_G)²   k − 1   MS(B) = SS(B) / (k − 1)   MS(B) / MS(W)
    Within    SS(W) = Σ (X_ij − X̄_j)²       N − k   MS(W) = SS(W) / (N − k)
    Total     SS(T) = Σ (X_ij − X̄_G)²       N − 1

    i refers to the individual observation (cell).
    j refers to the specific group.
    k refers to the number of conditions/treatments/groups.
    n_j is the number of observations in each group (level of factor A).
    N refers to the total number of participants for the entire study.
    X̄_G is the grand mean.

    Computation of ANOVA:

    Sum of squares between-groups examines the differences among the group means by calculating
    the variation of each group mean (X̄_j) around the grand mean (X̄_G). This is variation in scores that
    is due to the treatment (or independent variable).

    SS_A = Σ n_j (X̄_j − X̄_G)²

    X̄_G = (12.00 + 20.67 + 39.00) / 3
        = (12 + 15 + 9 + 20 + 19 + 23 + 40 + 35 + 42) / 9
        = 23.89

    SS_A = 3 [(12.00 − 23.89)² + (20.67 − 23.89)² + (39.00 − 23.89)²]
         = 1140.22

    Sum of squares within-groups examines error variation, i.e., the variation of individual scores
    around each group mean. This is variation in scores that is not due to the treatment (or independent
    variable) but due to variation among individuals.

    SS_S/A = Σ (X_ij − X̄_j)²

    SS_Online = (12 − 12)² + (15 − 12)² + (9 − 12)²
              = 0 + 9 + 9
              = 18.00

    SS_Traditional = (20 − 20.67)² + (19 − 20.67)² + (23 − 20.67)²
                   = 0.44 + 2.78 + 5.44
                   = 8.67

    SS_Online and traditional combination = (40 − 39)² + (35 − 39)² + (42 − 39)²
                                          = 1 + 16 + 9
                                          = 26.00

    SS_S/A = 18 + 8.67 + 26
           = 52.67

    Total sum of squares can be computed by adding SS_A and SS_S/A, but also by simply subtracting
    each score from the grand mean, squaring, and then summing across all cases.

    SS_T = SS_A + SS_S/A

    SS_T = Σ (X_ij − X̄_G)²

    SS_T = (12 − 23.89)² + (15 − 23.89)² + (9 − 23.89)² + (20 − 23.89)² + (19 − 23.89)² +
           (23 − 23.89)² + (40 − 23.89)² + (35 − 23.89)² + (42 − 23.89)²
         = 141.35 + 79.01 + 221.68 + 15.12 + 23.90 + 0.79 + 259.57 + 123.46 + 328.01
         = 1192.89

    Source     SS        df          MS        F-ratio
    Between    1140.22   3 − 1 = 2   570.11    64.95
    Within     52.67     9 − 3 = 6   8.78
    Total      1192.89   9 − 1 = 8

    F(2, 6) = 64.95, p < .001
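The worked calculation above can be reproduced with a short script (a sketch using only plain Python; the dictionary keys are our own shorthand labels for the three programs):

```python
# One-way ANOVA from first principles for the three teaching programs
# in Example #1 (scores from the table above; group labels are ours).
groups = {
    "online": [12, 15, 9],
    "traditional": [20, 19, 23],
    "combined": [40, 35, 42],
}

scores = [x for g in groups.values() for x in g]
grand_mean = sum(scores) / len(scores)
k = len(groups)                  # number of groups
N = len(scores)                  # total observations

# SS between: group size times squared deviation of each group mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups.values())

# SS within: squared deviations of scores from their own group mean
ss_within = sum((x - sum(g) / len(g)) ** 2
                for g in groups.values() for x in g)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_ratio = ms_between / ms_within
```

The script recovers the same SS, MS and F values as the hand calculation, without the rounding of intermediate steps.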


    Test procedures in SPSS Statistics

    1. Click Analyze > Compare Means > One-way ANOVA

    2. Dependent List: Dependent Variable

    3. Factor: Independent Variable

    4. Post hoc: Tukey

    5. Options: Descriptive, Homogeneity of variance test, Means Plot

    6. Missing Values: Exclude cases analysis by analysis

    Data outcome:

    Levene's test tests the homogeneity of variance (HOV), i.e., whether the variances of the groups
    are the same.

    ***If Levene's test is significant (i.e., the p-value is less than .05), then we can say that the
    variances are significantly different and we have violated the assumption of homogeneity. We
    always want the p-value for Levene's test to be more than .05, so that the HOV assumption is not
    violated.

    If we find that we have violated the HOV assumption, we will need to refer to the table
    "Robust Tests of Equality of Means".

    Solution: Adjust the F-test to correct the problem using the Brown-Forsythe (1974) F-ratio or
    Welch's F.

    Effect size:

    One can estimate the magnitude of the effect of the independent variable by computing η² or ω².

    η² = SS_A / SS_T

    η² = 1140.22 / 1192.89
       = 0.956 or 95.6%

    ω² = [SS_A − (k − 1)(MS_S/A)] / (SS_T + MS_S/A)

    ω² = [1140.22 − (3 − 1)(8.78)] / (1192.89 + 8.78)
       = 0.93 or 93%
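Both estimates can be checked numerically; this sketch plugs in the SS and MS values from the ANOVA table above:

```python
# Effect-size estimates, using values from the one-way ANOVA table above
# (SS_A = 1140.22, SS_T = 1192.89, MS_within = 8.78, k = 3 groups).
ss_a, ss_t, ms_within, k = 1140.22, 1192.89, 8.78, 3

eta_sq = ss_a / ss_t                                          # eta-squared
omega_sq = (ss_a - (k - 1) * ms_within) / (ss_t + ms_within)  # omega-squared
```

Omega-squared is slightly smaller than eta-squared because it corrects for the upward bias of eta-squared in small samples.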

    Reporting of results:

    There was a statistically significant difference between groups as determined by one-way ANOVA
    (F(2, 6) = 64.95, p < .001). A Tukey HSD post-hoc test revealed that the differences between all
    treatments were significant: online class (M = 12.00, SD = 3.00), traditional class (M = 20.67,
    SD = 2.08) and traditional class with online class (M = 39.00, SD = 3.61), with each pairwise
    comparison significant at p < .001. The proportion of variance in undergraduates' grades accounted
    for by the type of teaching program was approximately 96% (η² = 0.956).

  • 8

    Pairwise comparison

    (i) Planned comparison: Planned at the beginning of the study

    Planned comparisons are more sensitive in detecting the differences.

    However, they do not control for the increased risk of Type 1 errors (rejecting the
    null hypothesis when it is true). Post-hoc tests set more stringent significance levels to
    reduce the risk of a Type 1 error, given the larger number of comparison tests
    performed.

    One way to control for Type 1 error is to apply Bonferroni adjustment to the alpha

    level that you will use to judge statistical significance. This involves setting a more

    stringent alpha level for each comparison, to keep the alpha across all the tests at a

    reasonable level.

    To achieve this, you divide the alpha level (usually .05) by the number of

    comparisons that you intend to make.
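As a quick sketch, the adjustment is a single division (the comparison count of 3 is illustrative, e.g., all pairwise tests among three groups):

```python
# Bonferroni adjustment: divide the overall alpha by the number of
# comparisons to get the per-comparison significance threshold.
alpha = 0.05
n_comparisons = 3            # illustrative: three pairwise comparisons
adjusted_alpha = alpha / n_comparisons   # each test is judged against this
```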

    Test procedures in SPSS Statistics

    Same procedure as one-way ANOVA but just that instead of clicking on the

    post-hoc button, we click on the contrasts button.

    Coefficients: Put in the pre-determined coefficients

    Make sure that the coefficient total comes up to 0.

    (ii) Post-hoc comparison: Conducted if the F-ratio is significant; these are exploratory. The
    common ones are Fisher's LSD, Tukey's HSD and Scheffé's tests.

    Post hoc comparisons are designed to guard against the possibility of an increased

    Type 1 error due to the large number of different comparisons being made. This is

    done by setting more stringent criteria for significance, and therefore it is often

    harder to achieve significance. With small samples, this can be a problem, as it can

    be very hard to find a significant result even when the apparent difference in

    scores between the groups is quite large.

    Test procedures in SPSS Statistics

    Same procedure as one-way ANOVA and click on the post-hoc button.

    ** It is not appropriate to try both and see which results you prefer!

    Example #1 (planned comparison):

    We have three teaching programs (online conference class, traditional lecture, and traditional lecture
    supplemented by online conference class) and we are interested in the effectiveness of each
    program on increasing undergraduate students' grade points for a Statistics class. Below is the data
    for analysis.

            Online conference class   Traditional lecture   Traditional lecture supplemented
                                                            by online conference class
            12                        20                    40
            15                        19                    35
             9                        23                    42
    Mean    12.00                     20.67                 39.00

    Research question 1: Is the combination of traditional lecture and online conference class superior
    to the online conference class and the traditional lecture alone?

    H0: A traditional lecture supplemented by online conference class is NOT superior to the online
    conference class and the traditional lecture alone.

    H0: No difference between population means for online class vs. combination of lecture and online
    class (μ1 = μ3) and traditional lecture vs. combination of lecture and online class (μ2 = μ3).

    H1: A traditional lecture supplemented by online conference class is superior to the online
    conference class and the traditional lecture alone.

    H1: There is a difference between population means for online class vs. combination of lecture and
    online class (μ1 < μ3) and traditional lecture vs. combination of lecture and online class (μ2 < μ3).

    Type of lecture                                               Coded as   Coefficients
    Online conference class                                       A          -1
    Traditional lecture                                           B          -1
    Traditional lecture supplemented by online conference class   C           2

    Research question 2: Is a traditional lecture supplemented by online conference class superior to
    the online conference class alone? (Ignoring the traditional lecture.)

    H0: No difference between population means for online class vs. combination of lecture and online
    class (μ1 = μ3).

    H1: There is a difference between population means for online class vs. combination of lecture and
    online class (μ1 < μ3).

    Type of lecture                                               Coded as   Coefficients
    Online conference class                                       A          -1
    Traditional lecture                                           B           0
    Traditional lecture supplemented by online conference class   C           1
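As a sketch of how such a contrast is evaluated by hand (using the standard contrast t-test formula t = L / SE(L), with MS_within taken from the earlier one-way ANOVA table; variable names are ours), applying the coefficients from Research question 1:

```python
# Planned contrast for Research question 1: combined class (+2) vs.
# online (-1) and traditional (-1).  Standard contrast formulas:
#   L = sum(c_j * mean_j),  SE(L) = sqrt(MS_within * sum(c_j**2 / n_j)).
import math

means = [12.00, 20.67, 39.00]      # online, traditional, combined
coeffs = [-1, -1, 2]
n_per_group = 3
ms_within = 8.78                   # from the one-way ANOVA table above

assert sum(coeffs) == 0            # a valid contrast must sum to zero

contrast = sum(c * m for c, m in zip(coeffs, means))
se = math.sqrt(ms_within * sum(c ** 2 / n_per_group for c in coeffs))
t_stat = contrast / se             # compare against t with N - k = 6 df
```

The large t value reflects how far the combined-class mean sits above the other two groups.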

    Example #2 (planned comparison):

    A researcher wants to test the effectiveness of Drug X in preventing seasonal allergy, and she
    administered the drug to the patients in her research clinic. She randomly grouped them into 3
    conditions: placebo (sugar pill), low dose and high dose. The dependent variable is an objective
    measure of the effectiveness of the drug.

               Placebo   Low Dose   High Dose
               3         5          7
               2         2          4
               1         4          5
               1         2          3
               4         3          6
    Mean, X̄   2.20      3.20       5.00
    s          1.30      1.30       1.58
    s²         1.70      1.70       2.50

    Grand mean = 3.467
    Grand SD = 1.767
    Grand variance = 3.124
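The descriptive statistics above can be verified with the standard library (a sketch; group labels are ours):

```python
# Descriptive statistics for the Drug X example above (n = 5 per group).
import statistics

groups = {
    "placebo":   [3, 2, 1, 1, 4],
    "low dose":  [5, 2, 4, 2, 3],
    "high dose": [7, 4, 5, 3, 6],
}

means = {name: statistics.mean(g) for name, g in groups.items()}
sds = {name: statistics.stdev(g) for name, g in groups.items()}       # sample SD
variances = {name: statistics.variance(g) for name, g in groups.items()}

all_scores = [x for g in groups.values() for x in g]
grand_mean = statistics.mean(all_scores)
```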

    One-Way ANOVA

    H0: Means for the three groups are the same (μ1 = μ2 = μ3).

    H1: Means for the three groups are not all the same.

    Planned comparisons

    Research question 1: Is Drug X superior to placebo? Is Drug X effective in preventing seasonal

    allergy?

    Conditions Coded as Coefficients

    Placebo A -2

    Low dose B 1

    High dose C 1

    Research question 2: What amount of the dose (low vs. high) is needed to prevent seasonal allergy?

    Conditions Coded as Coefficients

    Placebo A 0

    Low dose B -1

    High dose C 1

    Figure 1: Overview of the general procedure for one-way ANOVA

    1. Explore the data: check for outliers, normality, homogeneity, etc. (boxplots, histograms,
       descriptive statistics).
    2. Correct any outlier/normality problems.
    3. Run the ANOVA. If Levene's test is significant, use the Welch or Brown-Forsythe F.
    4. Follow-up tests: planned comparisons if you have specific hypotheses; post-hoc tests if not.
    5. Calculate the effect size.


    One-way Repeated Measures ANOVA/ Within-subjects ANOVA/ ANOVA for correlated samples

    It is the equivalent of the one-way ANOVA, but for related rather than independent groups. You can also

    think of it as an extension of the dependent t-test.

    There is one categorical (e.g.: nominal or ordinal) independent variable and one continuous (e.g.:

    interval or ratio) dependent variable.

    We use a repeated measures ANOVA when:

    (1) It is a study that investigates changes in mean scores over three or more time points.

    For example, you might be investigating the effect of a 6-month exercise training

    programme on blood pressure and want to measure blood pressure at 3 separate time

    points (pre-, midway and post-exercise intervention), which would allow you to develop a

    time-course for any exercise effect.

    In repeated measures ANOVA, the independent variable has categories

    called levels or related groups. Where measurements are repeated over time, such as when

    measuring changes in blood pressure due to an exercise-training programme, the

    independent variable is time. Each level (or related group) is a specific time point. Hence,

    for the exercise-training study, there would be three time points and each time-point is a

    level of the independent variable (a schematic of a time-course repeated measures design is

    shown below):


    (2) It is a study that investigates differences in mean scores under three or more different

    conditions.

    For example, you might get the same subjects to eat different types of cake (chocolate,

    caramel and lemon) and rate each one for taste, rather than having a different group of people

    taste each cake.

    Where measurements are made under different conditions, the conditions are the levels (or

    related groups) of the independent variable (e.g., type of cake is the independent variable

    with chocolate, caramel, and lemon cake as the levels of the independent variable). A

    schematic of a different-conditions repeated measures design is shown below. It should be

    noted that often the levels of the independent variable are not referred to as conditions,

    but treatments. Which one you want to use is up to you. There is no right or wrong naming

    convention. You will also see the independent variable more commonly referred to as

    the within-subjects factor.

    ***It is important to note that for these two studies mentioned above, the same people are being

    measured more than once on the same dependent variable. This is also why it is called repeated

    measures design.

    Hypothesis for Repeated Measures ANOVA

    The repeated measures ANOVA tests whether there are any differences between related population means. The null hypothesis (H0) states that the means are equal:

    H0: μ1 = μ2 = μ3 = … = μk

    where μ = population mean and k = number of related groups. The alternative hypothesis (HA) states that the related population means are not equal (at least one mean is different from another mean):

    HA: at least two means are significantly different


    F-Ratio:

    One-way ANOVA: F = MS_between / MS_within

    Repeated measures ANOVA: F = MS_conditions / MS_error

    In one-way ANOVA, we partition the variability attributable to the differences between groups

    (SSconditions) and variability within groups (SSw). However, with a repeated measures ANOVA, as we are using the same subjects in each group, we can remove the variability due to the individual differences between subjects, referred to as SSsubjects, from the within-groups variability (SSw). Each subject becomes a level of a factor called subjects. And, with the ability to subtract SSsubjects it will leave us with a smaller SSerror term.

    Once the between-subjects variability has been removed, our new SSerror only reflects individual variability within each condition. You might recognise this as the interaction effect of subject by conditions; that is, how subjects react to the different conditions.

    Example #3

    You are interested in investigating the effect of a 6-month exercise training programme on blood
    pressure and want to measure blood pressure at 3 separate time points (pre-, midway and post-
    exercise intervention), which would allow you to develop a time-course for any exercise effect.

    Exercise intervention

    Subject    Pre      3 months   6 months   Subject mean, X̄
    1          45       50         55         50
    2          42       42         45         43
    3          36       41         43         40
    4          39       35         40         38
    5          51       55         59         55
    6          44       49         56         49.67
    Mean, X̄   42.83    45.33      49.67

    Grand mean = 45.94

    Table summary:

    Source                 SS                         df               MS                                 F-ratio
    Between (treatments)   SS(B) = n Σ (X̄_j − X̄_G)²   k − 1            MS(B) = SS(B) / (k − 1)            MS(B) / MS(e)
    Subjects               SS(S) = k Σ (X̄_i − X̄_G)²   n − 1            MS(S) = SS(S) / (n − 1)
    Error                  SS(e) = SS(W) − SS(S)      (k − 1)(n − 1)   MS(e) = SS(e) / [(k − 1)(n − 1)]
    Total                  SS(T) = Σ (X_ij − X̄_G)²    N − 1

    where SS(W) = Σ (X_ij − X̄_j)² is the within-groups sum of squares, so that SS(W) = SS(S) + SS(e).

    i refers to the individual subject.
    j refers to the specific group (level).
    k refers to the number of levels in a factor.
    n is the number of subjects in each group (level of factor A).
    N is the total number of observations in the whole study.
    X̄_G is the grand mean.


    Computation of ANOVA:

    Sum of squares between-groups examines the differences between related group means by
    calculating the variation of each time-point mean (X̄_j) around the grand mean (X̄_G). This is
    variation in scores that is due to the treatment (or independent variable).

    SS_between = n Σ (X̄_j − X̄_G)²

    SS_between = 6 [(42.83 − 45.94)² + (45.33 − 45.94)² + (49.67 − 45.94)²]
               = 6 [9.68 + 0.37 + 13.85]
               = 143.44

    Sum of squares within-groups examines error variation, i.e., the variation of individual scores
    around each group mean. This is variation in scores that is not due to the treatment (or independent
    variable) but due to variation caused by other factors.

    SS_within = Σ (X_ij − X̄_j)²

    SS_pre = (45 − 42.83)² + (42 − 42.83)² + (36 − 42.83)² + (39 − 42.83)² + (51 − 42.83)² + (44 − 42.83)²
           = 134.83

    SS_3months = (50 − 45.33)² + (42 − 45.33)² + (41 − 45.33)² + (35 − 45.33)² + (55 − 45.33)² + (49 − 45.33)²
               = 265.33

    SS_6months = (55 − 49.67)² + (45 − 49.67)² + (43 − 49.67)² + (40 − 49.67)² + (59 − 49.67)² + (56 − 49.67)²
               = 315.33

    SS_within = 134.83 + 265.33 + 315.33
              = 715.5

    Sum of squares for subjects examines the variation of each subject's mean (X̄_i) around the grand
    mean. This captures consistent individual differences between subjects, which a repeated measures
    design allows us to remove from the error term.

    SS_subjects = k Σ (X̄_i − X̄_G)²

    SS_subjects = 3 [(50 − 45.94)² + (43 − 45.94)² + (40 − 45.94)² + (38 − 45.94)² + (55 − 45.94)² + (49.67 − 45.94)²]
                = 658.3

    SS_within = SS_subjects + SS_error

    SS_error = SS_within − SS_subjects

    SS_error = 715.5 − 658.3
             = 57.2

    Total sum of squares can be computed by adding SS_between and SS_within, but also by simply
    subtracting each score from the grand mean, squaring, and then summing across all cases.

    SS_T = SS_between + SS_within = SS_between + SS_subjects + SS_error

    SS_T = Σ (X_ij − X̄_G)²

    SS_T = 143.44 + 715.5
         = 858.94

    Source     SS       df                    MS        F-ratio
    Between    143.44   3 − 1 = 2             71.72     12.53
    Subjects   658.3    6 − 1 = 5             131.66
    Error      57.2     (3 − 1)(6 − 1) = 10   5.72
    Total      858.94   18 − 1 = 17

    There was a statistically significant effect of time on exercise-induced fitness, F(2, 10) = 12.53,
    p = .002.
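The whole partition can be reproduced with a short script (a sketch in plain Python; variable names are ours):

```python
# Repeated-measures ANOVA from first principles for Example #3
# (6 subjects x 3 time points; data from the table above).
data = [            # rows = subjects; columns = pre, 3 months, 6 months
    [45, 50, 55],
    [42, 42, 45],
    [36, 41, 43],
    [39, 35, 40],
    [51, 55, 59],
    [44, 49, 56],
]
n = len(data)                    # subjects
k = len(data[0])                 # time points
grand_mean = sum(sum(row) for row in data) / (n * k)

time_means = [sum(row[j] for row in data) / n for j in range(k)]
subj_means = [sum(row) / k for row in data]

ss_between = n * sum((m - grand_mean) ** 2 for m in time_means)
ss_within = sum((data[i][j] - time_means[j]) ** 2
                for i in range(n) for j in range(k))
ss_subjects = k * sum((m - grand_mean) ** 2 for m in subj_means)
ss_error = ss_within - ss_subjects       # remove individual differences

ms_between = ss_between / (k - 1)
ms_error = ss_error / ((k - 1) * (n - 1))
f_ratio = ms_between / ms_error
```

Subtracting SS_subjects from SS_within is exactly the step that gives the repeated-measures design its smaller error term.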

    Partial eta-squared:

    η²_p = SS_between / (SS_between + SS_error)
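Plugging in the SS values from the ANOVA table above (a one-line check):

```python
# Partial eta-squared for the repeated-measures example
# (SS_between = 143.44, SS_error = 57.22 from the table above).
ss_between, ss_error = 143.44, 57.22
partial_eta_sq = ss_between / (ss_between + ss_error)
```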


    Test procedures in SPSS Statistics

    1. Click Analyze > General Linear Model > Repeated Measures

    2. Within Subject Factor Name: Put in meaningful name for your Independent Variable (e.g.:

    Time or Condition)

    3. Number of Levels: No. of levels in the factor

    4. Measure Name: Put in meaningful name for your Dependent Variable

    5. Define

    6. Within-subjects variables: Drag the related levels for IV into this box in order (e.g.: Time 1,

    Time 2, Time 3).

    7. Plots: Move factors into Horizontal Axis, then Add and Continue

    8. Options: Transfer the IV from Factor(s) and Factor Interactions to the Display Means for box.

    9. Tick Compare Main Effects

    10. Select Bonferroni for Confidence interval adjustment.

    11. Display: Descriptive statistics, Estimates of effect size and Homogeneity tests.

    12. Continue and OK.


    Increased Power in a Repeated Measures ANOVA

    The major advantage with running a repeated measures ANOVA over an independent

    ANOVA is that the test is generally much more powerful. This particular advantage is achieved by the

    reduction in MSerror (the denominator of the F-statistic) that comes from the partitioning of

    variability due to differences between subjects (SSsubjects) from the original error term in an

    independent ANOVA (SSw): i.e. SSerror = SSw - SSsubjects.

    We achieved a result of F(2, 10) = 12.53, p = .002, for our example repeated measures

    ANOVA. How does this compare to if we had run an independent ANOVA instead? Well, if we ran

    through the calculations, we would have ended up with a result of F(2, 15) = 1.504, p = .254, for the

    independent ANOVA. We can clearly see the advantage of using the same subjects in a repeated

    measures ANOVA as opposed to different subjects.

    For our exercise-training example, the illustration below shows that after taking away

    SSsubjects from SSw we are left with an error term (SSerror) that is only 8% as large as the

    independent ANOVA error term.

    This does not lead to an automatic increase in the F-statistic as there are a greater number

    of degrees of freedom for SSw than SSerror. However, it is usual for SSsubjects to account for such a

    large percentage of the within-groups variability that the reduction in the error term is large enough

    to more than compensate for the loss in the degrees of freedom (as used in selecting an F-

    distribution).

    Underlying Assumptions: Sphericity

    ANOVAs with repeated measures (within-subject factors) are particularly susceptible to the

    violation of the assumption of sphericity. Sphericity is the condition where the variances of the

    differences between all combinations of related groups (levels) are equal. Violation of sphericity is

    when the variances of the differences between all combinations of related groups are not equal.

    Sphericity can be likened to homogeneity of variances in a between-subjects ANOVA.

    The violation of sphericity is serious for the repeated measures ANOVA, with violation

    causing the test to become too liberal (i.e., an increase in the Type I error rate). Therefore,

    determining whether sphericity has been violated is very important. Luckily, if violations of sphericity

    do occur, corrections have been developed to produce a more valid critical F-value (i.e., reduce the

    increase in Type I error rate). This is achieved by estimating the degree to which sphericity has been

    violated and applying a correction factor to the degrees of freedom of the F-distribution.

    Testing for sphericity is an option in SPSS using Mauchly's Test for Sphericity as part of the

    GLM Repeated Measures procedure. Mauchly's Test of Sphericity tests the null hypothesis that the

    variances of the differences are equal. Thus, if Mauchly's Test of Sphericity is statistically significant

    (p < .05), we can reject the null hypothesis and accept the alternative hypothesis that the variances

    of the differences are not equal (i.e., sphericity has been violated).

    Mauchly's Test of Sphericity (a)

    Measure: CBR

    Within Subjects   Mauchly's   Approx.      df   Sig.   Epsilon (b)
    Effect            W           Chi-Square               Greenhouse-Geisser   Huynh-Feldt   Lower-bound
    time              .434        3.343        2    .188   .638                 .760          .500

    Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed
    dependent variables is proportional to an identity matrix.
    a. Design: Intercept
       Within Subjects Design: time
    b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected
       tests are displayed in the Tests of Within-Subjects Effects table.

    Reporting on Mauchly's Test of Sphericity

    Mauchly's Test of Sphericity indicated that the assumption of sphericity had not been violated,
    χ²(2) = 3.343, p = .188.

    When it is violated, you can report it as:

    Mauchly's Test of Sphericity indicated that the assumption of sphericity had been violated, χ²(2) =
    22.115, p < .0005, and therefore a Greenhouse-Geisser correction was used. There was a significant
    effect of time on the DV, F(1.171, 38) = XXX, p < .0005.

    Effect size is interpreted according to Cohen's (1988) guidelines for eta-squared:

    Small: 0.01
    Medium: 0.059
    Large: 0.138

    So if you end up with η² = 0.45, you can assume the effect size is very large. It also means that 45%
    of the variance in the DV can be accounted for by the IV.


    Results:

Table 97
Descriptive statistics for the effect of a 6-month exercise training programme at 3 time points:
pre-, mid- (3 months) and post-exercise (6 months).

Time point   n   Mean    SD
Pre          6   42.83   5.19
3 months     6   45.33   7.29
6 months     6   49.67   7.94

Table 98
Analysis of variance (ANOVA) summary.

Source         SS       df   MS      F       p      Partial η²
Time           143.44   2    71.72   12.53   .002   .715
Error (Time)   57.22    10   5.72

Table 99
Bonferroni comparisons for time: pre, 3 months, and 6 months.

                                                            95% CI
Comparison              Mean Difference   Std. Error   Lower Bound   Upper Bound
Pre vs. 6 months        6.83*             1.70         .82           12.85
3 months vs. 6 months   4.33*             .72          1.81          6.86

* p < .05
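Comparisons like those in Table 99 can be sketched with paired t-tests at a Bonferroni-adjusted alpha. The scores below are purely illustrative (the raw data behind Tables 97-99 are not given), chosen so the pattern matches the reported result:

```python
from itertools import combinations
from scipy import stats

# Illustrative scores only: the same 6 subjects measured at each time point,
# listed in the same subject order within each list.
scores = {
    "pre":      [38, 40, 44, 46, 41, 48],
    "3 months": [40, 39, 47, 46, 45, 49],
    "6 months": [44, 45, 52, 53, 49, 56],
}

pairs = list(combinations(scores, 2))
alpha = 0.05 / len(pairs)   # Bonferroni-adjusted alpha for 3 comparisons

results = {}
for a, b in pairs:
    t, p = stats.ttest_rel(scores[a], scores[b])   # paired (repeated measures) t-test
    results[(a, b)] = p
    verdict = "significant" if p < alpha else "not significant"
    print(f"{a} vs. {b}: t = {t:.2f}, p = {p:.4f} ({verdict})")
```

Dividing alpha by the number of comparisons is what SPSS's Bonferroni option does implicitly when it multiplies each p value by the number of comparisons.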

Reporting the result:

A repeated measures ANOVA was conducted to investigate the effect of a 6-month exercise training
programme on blood pressure at 3 separate time points (pre-, midway and post-exercise
intervention). The means and standard deviations of blood pressure are presented in Table 97.
Mauchly's Test of Sphericity indicated that the assumption of sphericity had not been violated,
χ²(2) = 3.343, p = .188.

The repeated measures ANOVA determined that blood pressure differed statistically significantly
across the three time points, F(2, 10) = 12.53, p = .002, with a large effect size (partial η² =
.715). Post hoc tests using the Bonferroni correction revealed that the mean differences in blood
pressure between pre and 6 months (MD = 6.83, SE = 1.70, 95% CI = .82 to 12.85) and between 3 months
and 6 months (MD = 4.33, SE = .72, 95% CI = 1.81 to 6.86) were statistically significant. However,
there was no significant difference in blood pressure between pre and 3 months (MD = 2.50, SE =
1.52, 95% CI = −2.88 to 7.88). Therefore, we can conclude that a long-term (6-month) exercise
training programme elicits a significant change in blood pressure, but 3 months of training does
not.
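The F ratio and partial eta squared in Table 98 come from partitioning the total sum of squares into condition, subject and error components. A minimal numpy sketch of that partition (a hand-rolled one-way repeated measures ANOVA, demonstrated on a tiny made-up dataset, not the study's):

```python
import numpy as np
from scipy import stats

def rm_anova(data):
    """One-way repeated measures ANOVA on an (n subjects x k conditions) array.

    Partitions SS_total into conditions, subjects, and error, as in Table 98.
    Returns (F, df_effect, df_error, p, partial eta squared).
    """
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()   # between conditions
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_total = ((data - grand) ** 2).sum()
    ss_error = ss_total - ss_cond - ss_subj                  # residual (error) term
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    p = stats.f.sf(f, df_cond, df_error)
    eta = ss_cond / (ss_cond + ss_error)                     # partial eta squared
    return f, df_cond, df_error, p, eta

# Demo on a tiny illustrative dataset (3 subjects x 2 conditions)
f, df1, df2, p, eta = rm_anova(np.array([[1., 2], [2, 2], [3, 5]]))
print(f"F({df1}, {df2}) = {f:.2f}, p = {p:.3f}, partial eta squared = {eta:.2f}")
```

Removing the subject sum of squares from the error term is what makes the repeated measures design more powerful than a between-groups ANOVA on the same numbers.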

  • 22

    Example # 4 Research conducted by: Pearson et al. (2003)

    Case study prepared by: David Lane and Emily Zitek

    Overview: This study investigated the cognitive effects of stimulant medication in children with

    mental retardation and Attention-Deficit/Hyperactivity Disorder. This case study shows the data for

    the Delay of Gratification (DOG) task. Children were given various dosages of a drug,

methylphenidate (MPH), and then completed this task as part of a larger battery of tests. The order

    of doses was counterbalanced so that each dose appeared equally often in each position. For

    example, six children received the lowest dose first, six received it second, etc. The children were on

    each dose one week before testing.

    This task, adapted from the preschool delay task of the Gordon Diagnostic System (Gordon, 1983),

    measures the ability to suppress or delay impulsive behavioral responses. Children were told that a

    star would appear on the computer screen if they waited long enough to press a response key. If a

child responded less than four seconds after their previous response, they did not earn a
star, and the 4-second counter restarted. The DOG task differentiates children with and without ADHD of

    normal intelligence (e.g., Mayes et al., 2001), and is sensitive to MPH treatment in these children

    (Hall & Kataria, 1992).

    Questions to Answer

    Does higher dosage lead to higher cognitive performance (measured by the number of correct

    responses to the DOG task)?

    Design Issues

    This is a repeated-measures design because each participant performed the task after each dosage.

    Descriptions of Variables

    Variable Description

    d0 Number of correct responses after taking a placebo

    d15 Number of correct responses after taking .15 mg/kg of the drug

    d30 Number of correct responses after taking .30 mg/kg of the drug

    d60 Number of correct responses after taking .60 mg/kg of the drug
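As the table at the start of these notes points out, the nonparametric alternative to a one-way repeated measures ANOVA is the Friedman test, which scipy exposes as `scipy.stats.friedmanchisquare`. A sketch on hypothetical DOG-task counts (NOT Pearson et al.'s actual data), with one list per dose and the same children in the same order across lists:

```python
from scipy.stats import friedmanchisquare

# Hypothetical correct-response counts for 6 children (NOT the study's data)
d0  = [20, 15, 28, 22, 18, 25]   # placebo
d15 = [24, 18, 30, 25, 20, 27]   # .15 mg/kg MPH
d30 = [26, 21, 33, 27, 24, 30]   # .30 mg/kg MPH
d60 = [25, 20, 31, 26, 22, 29]   # .60 mg/kg MPH

# Friedman test ranks each child's scores across the four doses and
# tests whether the mean ranks differ between doses.
stat, p = friedmanchisquare(d0, d15, d30, d60)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```

A significant Friedman result would usually be followed by pairwise Wilcoxon signed-rank tests with a Bonferroni correction, mirroring the post hoc step in the parametric analysis above.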

References:

Pearson, D. A., Santos, C. W., Jerger, S. W., Casat, C. D., Roache, J., Loveland, K. A., Lane,
D. M., Lachar, D., Faria, L. P., & Getchell, C. (2003). Treatment effects of methylphenidate on
cognitive functioning in children with mental retardation and ADHD. Journal of the American
Academy of Child and Adolescent Psychiatry, 43, 677-685.