Page 1
Planned Contrast: Execution (Conceptual)
1. You must predict the pattern of the interaction before gathering data. Here, we predict that Democratic women will be the most opposed to gun instruction in school, compared with Democratic men, Republican men, and Republican women.
[Figure: predicted rating (0 to 5) for Republicans and Democrats, plotted separately for males and females]
Page 2
Post Hoc Tests
Do female Democrats differ from the other groups?
Group means (genparty):
1 = Male/Republican: 5.00
2 = Male/Democrat: 4.50
3 = Female/Republican: 4.75
4 = Female/Democrat: 2.75
Conduct six t tests? NO. Why not? Doing so capitalizes on chance: every additional test raises the likelihood of at least one Type I error.
Solution: Post hoc tests of multiple comparisons.
Post hoc tests correct for this inflated likelihood of Type I error.
Kent's favorite: the Tukey test of multiple comparisons, which is one of the more generous (less conservative) post hoc tests.
NOTE: Post hoc tests can be run on any set of multiple means, not only on those used in planned contrasts.
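Why six t tests is a problem can be made concrete. Four groups give 4 choose 2 = 6 pairwise comparisons. The sketch below computes the chance of at least one Type I error across k tests as 1 - (1 - alpha)^k. It assumes, for simplicity, that the tests are independent; pairwise tests that share group means are not, so treat the number as a rough illustration rather than an exact rate.

```python
from math import comb


def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


n_groups = 4                    # the four gender-by-party groups
n_pairs = comb(n_groups, 2)     # every pairwise t test: 6
fwer = familywise_error(0.05, n_pairs)
print(f"{n_pairs} tests at alpha = .05 -> familywise error of about {fwer:.3f}")
```

Under this simplifying assumption, the familywise error rate is roughly .26 rather than the nominal .05.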
Page 3
Conducting Post Hoc Tests
1. Recode data from multiple factors into a single factor, as for the planned contrast.
2. Run a one-way ANOVA (ONEWAY).
3. Select the post hoc tests option.
ONEWAY gunctrl BY genparty
  /CONTRAST = -1 -1 -1 3
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS
  /POSTHOC = TUKEY ALPHA(.05).
The /POSTHOC = TUKEY subcommand selects the post hoc test.
Note: It is not necessary to conduct a planned contrast in order to conduct a post hoc test.
Page 4
Descriptives
gunctrl
Group | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum
male republican | 4 | 5.0000 | .81650 | .40825 | 3.7008 | 6.2992 | 4.00 | 6.00
male democrat | 4 | 4.5000 | 1.29099 | .64550 | 2.4457 | 6.5543 | 3.00 | 6.00
female republican | 4 | 4.7500 | .95743 | .47871 | 3.2265 | 6.2735 | 4.00 | 6.00
female democrat | 4 | 2.7500 | .95743 | .47871 | 1.2265 | 4.2735 | 2.00 | 4.00
Total | 16 | 4.2500 | 1.29099 | .32275 | 3.5621 | 4.9379 | 2.00 | 6.00

ANOVA
gunctrl
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 12.500 | 3 | 4.167 | 4.000 | .035
Within Groups | 12.500 | 12 | 1.042 | |
Total | 25.000 | 15 | | |
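As a check on the ANOVA table, the sums of squares and F can be recomputed from the Descriptives alone: SS between is the n-weighted squared distance of each group mean from the grand mean, and SS within is each group's variance times its degrees of freedom. This is a sketch using only the printed (rounded) n, mean, and SD; the p value would require the F distribution, which is not in the Python standard library, so Sig. is left to SPSS.

```python
# (n, mean, SD) for each group, copied from the Descriptives table above
GROUPS = {
    "male republican": (4, 5.00, 0.81650),
    "male democrat": (4, 4.50, 1.29099),
    "female republican": (4, 4.75, 0.95743),
    "female democrat": (4, 2.75, 0.95743),
}


def oneway_from_summary(groups):
    """One-way ANOVA sums of squares, mean squares, and F from group n, mean, and SD."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total
    # Between groups: weighted squared distance of each group mean from the grand mean
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
    # Within groups: each group's variance times its degrees of freedom
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


for name, value in oneway_from_summary(GROUPS).items():
    print(f"{name}: {value:.3f}")
```

Because the SDs are rounded in the output, SS within comes out at 12.500 only to about four decimal places.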
Page 5
Multiple Comparisons
Dependent Variable: gunctrl
Tukey HSD
(I) genparty | (J) genparty | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound
male republican | male democrat | .50000 | .72169 | .898 | -1.6426 | 2.6426
male republican | female republican | .25000 | .72169 | .985 | -1.8926 | 2.3926
male republican | female democrat | 2.25000* | .72169 | .039 | .1074 | 4.3926
male democrat | male republican | -.50000 | .72169 | .898 | -2.6426 | 1.6426
male democrat | female republican | -.25000 | .72169 | .985 | -2.3926 | 1.8926
male democrat | female democrat | 1.75000 | .72169 | .125 | -.3926 | 3.8926
female republican | male republican | -.25000 | .72169 | .985 | -2.3926 | 1.8926
female republican | male democrat | .25000 | .72169 | .985 | -1.8926 | 2.3926
female republican | female democrat | 2.00000 | .72169 | .070 | -.1426 | 4.1426
female democrat | male republican | -2.25000* | .72169 | .039 | -4.3926 | -.1074
female democrat | male democrat | -1.75000 | .72169 | .125 | -3.8926 | .3926
female democrat | female republican | -2.00000 | .72169 | .070 | -4.1426 | .1426
* The mean difference is significant at the .05 level.
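The Tukey decision above can be reproduced by hand. Each pair's mean difference is divided by the standard error of a group mean, sqrt(MS within / n), giving a studentized range statistic q that is compared with a critical value. This sketch assumes q(.05; k = 4, df = 12) = 4.20 from a printed studentized range table; SPSS uses the exact value, so its confidence limits differ slightly in the last digits.

```python
from itertools import combinations
from math import sqrt

# Group means and ANOVA values copied from the SPSS output above
MEANS = {
    "male republican": 5.00,
    "male democrat": 4.50,
    "female republican": 4.75,
    "female democrat": 2.75,
}
MS_WITHIN = 12.5 / 12   # within-groups SS / df, about 1.042
N_PER_GROUP = 4
Q_CRIT = 4.20           # assumed table value of q(.05; k = 4, df = 12)


def tukey_hsd(means, ms_within, n, q_crit):
    """Return (group_i, group_j, mean difference, q, significant?) for every pair."""
    se_mean = sqrt(ms_within / n)   # standard error of one group mean
    results = []
    for (gi, mi), (gj, mj) in combinations(means.items(), 2):
        diff = mi - mj
        q = abs(diff) / se_mean
        results.append((gi, gj, diff, q, q >= q_crit))
    return results


se_diff = sqrt(2 * MS_WITHIN / N_PER_GROUP)   # the "Std. Error" column in SPSS
print(f"SE of a difference: {se_diff:.5f}")
for gi, gj, diff, q, sig in tukey_hsd(MEANS, MS_WITHIN, N_PER_GROUP, Q_CRIT):
    print(f"{gi} vs {gj}: diff = {diff:+.2f}, q = {q:.2f}, {'*' if sig else 'n.s.'}")
```

Only male Republicans vs. female Democrats (q of about 4.41) clears the critical value, matching the single starred row in the SPSS table.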
Page 6
Data Management Issues
Setting up data file
Checking accuracy of data
Disposition of data
Why obsess over these details? Murphy's Law:
If something can go wrong, it will go wrong, and at the worst possible time.
Errars Happin!
Page 7
Creating a Coding Master
1. Get a copy of the survey
2. Assign variable names
3. Assign variable values
4. Assign missing values
5. Proof the master for accuracy
6. Make a spare copy; keep it in a file drawer
Page 8
Coding Master
[Figure: survey page annotated with variable names and variable values]
Note: Variable values are not needed for scales.
Page 9
Cleaning Data Set
1. Exercise in delay of gratification
2. Purpose: Reduce random error
3. Improve power of inferential stats.
Page 10
Complete Data Set
Note: Are any cases missing data?
Page 11
Checking Descriptives
Are any “Minimums” too low? Are any “Maximums” too high?
Do Ns indicate missing data?
Do SDs indicate extreme outliers?
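The descriptive checks above can be automated once the coding master defines a legal range for each variable. This is a minimal sketch: the variable ranges and the sample cases are hypothetical, chosen to show one out-of-range code, one typo, and one blank. It prints N, minimum, maximum, and SD per variable and returns the problems it finds.

```python
import statistics

# Hypothetical codebook ranges (assumed for illustration): variable -> (lowest, highest) legal code
VALID_RANGE = {"gender": (1, 2), "party": (1, 2), "gunctrl": (1, 7)}


def check_descriptives(cases, valid_range):
    """Print N, min, max, and SD per variable; return a list of flagged problems."""
    problems = []
    for var, (lowest, highest) in valid_range.items():
        values = [case[var] for case in cases if case.get(var) is not None]
        if not values:
            problems.append(f"{var}: no valid values")
            continue
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        print(f"{var}: N = {len(values)}, min = {min(values)}, max = {max(values)}, SD = {sd:.2f}")
        if len(values) < len(cases):          # N smaller than number of cases -> missing data
            problems.append(f"{var}: {len(cases) - len(values)} case(s) missing")
        if min(values) < lowest:              # minimum too low
            problems.append(f"{var}: minimum {min(values)} is below {lowest}")
        if max(values) > highest:             # maximum too high
            problems.append(f"{var}: maximum {max(values)} is above {highest}")
    return problems


# Hypothetical entered data: an impossible code (gender = 3), a typo (55), and a blank
cases = [
    {"gender": 1, "party": 2, "gunctrl": 5},
    {"gender": 2, "party": 1, "gunctrl": 55},
    {"gender": 3, "party": 2, "gunctrl": 4},
    {"gender": 1, "party": None, "gunctrl": 6},
]
for problem in check_descriptives(cases, VALID_RANGE):
    print(problem)
```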
Page 12
Checking Correlations Between Variables
Do variables correlate in the expected manner?
Page 13
Using Cross Tabs to Check for Missing or Erroneous Data Entry
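Case A: Expect equal cell sizes
Gender | Oldest | Youngest | Total
Males | 10 | 10 | 20
Females | 5 | 15 | 20
TOTAL | 15 | 25 | 40
(Females split 5 vs. 15 instead of the expected 10 vs. 10, which suggests missing or miscoded entries.)

Case B: Impossible outcome
Number of Siblings | Oldest | Youngest | Only Child
None | 4 | 3 | 6
One | 3 | 4 | 0
More than one | 3 | 4 | 2
TOTAL | 10 | 10 | 8
(An oldest or youngest child must have at least one sibling, and an only child cannot have any, so nonzero counts in those cells point to data-entry errors.)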
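The Case B check can be scripted: cross-tabulate the two variables, then list any cells that should logically be empty, along with the ID numbers to pull from the raw-data files. The records and the category labels below are hypothetical, for illustration only.

```python
from collections import Counter

# Logically impossible (birth order, number of siblings) combinations
IMPOSSIBLE = {
    ("oldest", "none"),
    ("youngest", "none"),
    ("only child", "one"),
    ("only child", "more than one"),
}


def crosstab(records):
    """Count cases in each (birth_order, siblings) cell."""
    return Counter((r["birth_order"], r["siblings"]) for r in records)


def impossible_cells(table):
    """Return the cells that should be empty but are not."""
    return {cell: n for cell, n in table.items() if cell in IMPOSSIBLE}


# Hypothetical entered records; IDs 4 and 5 contain entry errors
records = [
    {"id": 1, "birth_order": "oldest", "siblings": "one"},
    {"id": 2, "birth_order": "only child", "siblings": "none"},
    {"id": 3, "birth_order": "youngest", "siblings": "more than one"},
    {"id": 4, "birth_order": "oldest", "siblings": "none"},
    {"id": 5, "birth_order": "only child", "siblings": "one"},
]
print(impossible_cells(crosstab(records)))
bad_ids = [r["id"] for r in records if (r["birth_order"], r["siblings"]) in IMPOSSIBLE]
print("Check raw records for IDs:", bad_ids)
```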
Page 14
Storing Data
Raw Data
1. Hold raw data in a secure place
2. File raw data by ID #
3. Hold raw data for at least 5 years post publication, per APA
Automated Data
1. One pristine source, one working file, one syntax file
2. Back up, back up, back up
3. Use an external hard drive as a backup for your PC
Page 15
File Raw Data Records By ID Number
[Figure: raw-data folders labeled by ID range: 01-20, 21-40, 41-60, 61-80, 81-100, 101-120]
Page 16
COMMENT SYNTAX FILE GUN CONTROL STUDY SPRING 2007.

COMMENT DATA MANAGEMENT.
* Combine gender and party into one four-level factor.
IF (gender = 1 & party = 1) genparty = 1.
IF (gender = 1 & party = 2) genparty = 2.
IF (gender = 2 & party = 1) genparty = 3.
IF (gender = 2 & party = 2) genparty = 4.
EXECUTE.

COMMENT ANALYSES.
UNIANOVA gunctrl BY gender party
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /PRINT = DESCRIPTIVE
  /CRITERIA = ALPHA(.05)
  /DESIGN = gender party gender*party.

ONEWAY gunctrl BY genparty
  /CONTRAST = -1 -1 -1 3
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS
  /POSTHOC = TUKEY ALPHA(.05).
Save Syntax File!!!
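For readers who keep a parallel copy of the data outside SPSS, the same recode can be expressed as a lookup table. This sketch mirrors the IF statements (gender 1 = male, 2 = female; party 1 = Republican, 2 = Democrat, as in the output labels); like SPSS, it leaves the new variable missing (None here) when no condition matches.

```python
# Same coding as the SPSS IF statements: (gender, party) -> genparty
GENPARTY = {(1, 1): 1, (1, 2): 2, (2, 1): 3, (2, 2): 4}


def recode_genparty(gender, party):
    """Return the combined four-level code, or None (missing) if no IF condition matches."""
    return GENPARTY.get((gender, party))


cases = [(1, 1), (1, 2), (2, 1), (2, 2), (2, None)]
print([recode_genparty(g, p) for g, p in cases])  # [1, 2, 3, 4, None]
```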
Page 17
Research Project Notebook
Purpose: All-in-one handy summary of the research project
Content:
1. Administrative (timeline, list of staff, etc.)
2. Overview of Research
3. Experiment Materials
 * Surveys
 * Consents, debriefings
 * Manipulations
 * Procedures summary/instructions
4. IRB materials
 * Application
 * Approval
5. Data
 * Coding forms
 * Syntax file
 * Primary outcomes
Page 18
Correlation
Class 20
Page 19
Today's Class Covers
What and why of measures of association
Covariation
Pearson's r correlation coefficient
Partial Correlation
Comparing two correlations
Non-Parametric correlations
Page 20
Do Variables Relate to One Another?
Is teacher pay related to performance? (Positive)
Is exercise related to illness? (Negative)
Is CO2 related to global warming? (Positive)
Is platoon cohesion related to PTSD? (Negative)
Is TV viewing related to shoe size? (Zero)
Page 21
Exercise and Illness
1. How many times a week do you exercise? _____
2. How many days have you missed school this term due to illness? _____
3. How many hours of sleep do you get each night? ____
Page 22
Interpreting Correlations
Variables: [A] Exercise, [B] Illness, [C] Sleep Hours
A --> B Exercise reduces illness
B --> A Illness reduces exercise
C --> (A & B) Third variable (sleep) affects exercise and illness simultaneously
Page 23
Exercise and Illness Data (fabricated)
subject | exercise.days | sleep.hours | sick.days
1 | 5 | 7 | 0
2 | 3 | 6 | 2
3 | 4 | 8 | 1
4 | 6 | 7 | 1
5 | 2 | 6 | 3
6 | 4 | 7 | 1
7 | 1 | 5 | 7
8 | 7 | 6 | 3
9 | 4 | 7 | 3
10 | 3 | 6 | 3
11 | 5 | 7 | 2
12 | 2 | 6 | 4
13 | 3 | 5 | 2
14 | 3 | 6 | 4
Page 24
Description of Data
Page 25
Scatterplot: Exercise and Days Sick
[Figure: scatterplot of exercise days vs. sick days, with the regression line]
Page 26
Co-variation
[Figure: exercise days and sick days for each subject; x-axis: subject number (1 to 14); y-axis: # days (0 to 8)]
Page 27
Covariation Formula
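$$\operatorname{cov}(x, y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$$

$$\operatorname{cov}(\text{exercise}, \text{sickness}) = \frac{(-3.32) + (0.40) + (-0.46) + \cdots + (-1.02)}{14 - 1} = \frac{-23}{13} = -1.77$$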
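The covariance can be checked against the 14 subjects in the data table. This sketch computes each subject's cross-product of deviations and divides their sum by N - 1. Unrounded, the sum is about -22.71, so the covariance is about -1.75; the slide's -1.77 comes from rounding the sum to -23.

```python
exercise_days = [5, 3, 4, 6, 2, 4, 1, 7, 4, 3, 5, 2, 3, 3]
sick_days = [0, 2, 1, 1, 3, 1, 7, 3, 3, 3, 2, 4, 2, 4]


def cross_products(x, y):
    """(Xi - mean X) * (Yi - mean Y) for each case."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]


def covariance(x, y):
    """Sample covariance: sum of cross-products divided by N - 1."""
    return sum(cross_products(x, y)) / (len(x) - 1)


products = cross_products(exercise_days, sick_days)
print([round(p, 2) for p in products[:3]], round(products[-1], 2))
print(round(sum(products), 2), round(covariance(exercise_days, sick_days), 3))  # -22.71 -1.747
```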
Page 28
Problem with Covariation
"To all health and exercise researchers: Please send us your exercise and health covariations."
Team 1: exercise = days per week exercise, covariation = -1.77
Team 2: exercise = hours per week exercise, covariation = -34.00
What if all we have are the covariations?
How do we compare them?
How would we know, in this case, whether Team 1 showed a larger, smaller, or equal covariation than did Team 2?
Page 29
Pearson Correlation Coefficient
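$$r = \frac{\operatorname{cov}_{xy}}{s_x s_y} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{(N - 1)\, s_x s_y}$$

Pearson r ranges from -1.00 to +1.00 (its population counterpart is ρ, "rho").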
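Dividing by the two standard deviations is what makes r comparable across teams. In this sketch, a hypothetical Team 2 records each exercise day as 1.5 hours (an assumption for illustration, not the slide's Team 2 data): the covariance changes with the unit of measurement, but r does not. The data are the 14 subjects from the table.

```python
import statistics

exercise_days = [5, 3, 4, 6, 2, 4, 1, 7, 4, 3, 5, 2, 3, 3]
sick_days = [0, 2, 1, 1, 3, 1, 7, 3, 3, 3, 2, 4, 2, 4]


def covariance(x, y):
    """Sample covariance (divides by N - 1)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))


# Hypothetical: Team 2 records each exercise day as 1.5 hours
exercise_hours = [1.5 * d for d in exercise_days]

print(round(covariance(exercise_days, sick_days), 3), round(covariance(exercise_hours, sick_days), 3))  # -1.747 -2.621
print(round(pearson_r(exercise_days, sick_days), 3), round(pearson_r(exercise_hours, sick_days), 3))    # -0.613 -0.613
print(round(pearson_r(exercise_days, sick_days) ** 2, 3))                                               # 0.375
```

The unrounded r² is about .375; squaring the rounded r of .613 gives the .376 used on the next slides.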
Page 30
Using R2 to Interpret Correlation
R² = r² = proportion of variance shared between the correlated variables.
Correlation of exercise.hours with sick.days: r = -.613
R² = (-.613)² = .376
“About 38% of variability in sick days is explained by variability in exercise hours.”
Page 31
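Variation in Sick Days Explained by Exercise Hours
[Figure: number of sick days last term (axis from 0 to 7, marked at 2.5), with the share explained by exercise hours highlighted]
Exercise hours explain 37.6% of the variation: R² = (-.613)² = .376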
Page 32
Partial Correlation
Issue: How much does Variable 1 explain Variable 2, AFTER accounting for the influence of Variable 3?
Sickness and Exercise Study: How much does exercise explain days sick, AFTER accounting for the influence of nightly hours of sleep?
Partial Correlation answers this question.
Page 33
Partial Correlations in SPSS
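* Sleep hours with sick days, controlling for exercise days.
PARTIAL CORR
  /VARIABLES = sleep.hours sick.days BY exercise.days
  /SIGNIFICANCE = TWOTAIL
  /MISSING = LISTWISE.
* Sleep hours with exercise days, controlling for sick days.
PARTIAL CORR
  /VARIABLES = sleep.hours exercise.days BY sick.days
  /SIGNIFICANCE = TWOTAIL
  /MISSING = LISTWISE.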
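The partial correlation can also be computed from the three zero-order correlations with the standard first-order formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). This sketch answers the study question directly (exercise with sick days, controlling for sleep) using the 14-subject data; in PARTIAL CORR that would correspond to listing exercise.days sick.days BY sleep.hours.

```python
from math import sqrt

exercise_days = [5, 3, 4, 6, 2, 4, 1, 7, 4, 3, 5, 2, 3, 3]
sleep_hours = [7, 6, 8, 7, 6, 7, 5, 6, 7, 6, 7, 6, 5, 6]
sick_days = [0, 2, 1, 1, 3, 1, 7, 3, 3, 3, 2, 4, 2, 4]


def pearson_r(x, y):
    """Pearson correlation from sums of squared and cross-product deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear influence of z from both."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


print(f"zero-order r(exercise, sick) = {pearson_r(exercise_days, sick_days):.3f}")
print(f"partial r(exercise, sick | sleep) = {partial_r(exercise_days, sick_days, sleep_hours):.3f}")
```

With these data the relationship weakens (from about -.61 to about -.41) once sleep is held constant, which is the pattern the third-variable explanation predicts.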
Page 34
Non-Parametric Correlations
Assumptions of Correlations
1. Normally distributed data
2. Homogeneity of variance
3. Interval data (at least)
What if Assumptions Not Met?
Spearman's rho: Data are ordinal.
Kendall's tau: Data are ordinal, the sample is small, and many scores share the same rank (ties).
Page 35
Parametric Correlations
Assumptions of Correlations
1. Normally distributed data
2. Homogeneity of variance
3. Interval data (at least)
Var. A: Watch TV (1 hr, 2 hr, 3 hr, 4 hr, 5 hr)
Var. B: Eat Fast Food (1 day, 2 days, 3 days, 4 days, 5 days)
Page 36
Non-Parametric Correlations
Var. A: Watch TV (Never, Daily, Weekly, Monthly, Yearly)
Var. B: Eat Fast Food (Never, Daily, Weekends, Holidays, Leap Years)
What if Assumptions Not Met?
Spearman's rho: Data are ordinal.
Kendall's tau: Data are ordinal, the sample is small, and many scores share the same rank (ties).
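Both rank-based coefficients are short to compute. This sketch computes Spearman's rho as Pearson r on the ranks (tied scores get the average of their ranks) and Kendall's tau-b, the tie-corrected version of tau, from concordant and discordant pairs. The ordinal codes for TV and fast-food frequency are hypothetical, with higher codes meaning more often.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Pearson r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """(concordant - discordant) / sqrt(pairs untied on x * pairs untied on y)."""
    s = untied_x = untied_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = (x[i] > x[j]) - (x[i] < x[j])   # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])   # sign of the y difference
            s += dx * dy                          # +1 concordant, -1 discordant, 0 tied
            untied_x += int(dx != 0)
            untied_y += int(dy != 0)
    return s / sqrt(untied_x * untied_y)


# Hypothetical ordinal codes (1 = least often ... 5 = most often) for 10 respondents
tv = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
fast_food = [1, 1, 2, 2, 3, 4, 3, 5, 4, 5]
print(f"Spearman rho = {spearman_rho(tv, fast_food):.3f}")
print(f"Kendall tau-b = {kendall_tau_b(tv, fast_food):.3f}")
```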
Page 37
Comparing Correlations
Issue: How do we know if one correlation is different from another?
Example: Is the correlation between nightly sleep and sick days different from the correlation between TV hours and sick days?
Page 38
Difference Between Correlations
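Link to calculator for two independent-samples correlations: http://faculty.vassar.edu/lowry/rdiff.html
Diff. between 2 independent correlations, where $z_r = \tfrac{1}{2}\ln\frac{1 + r}{1 - r}$ is Fisher's r-to-z transformation:

$$z = \frac{z_{r_1} - z_{r_2}}{\sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}}$$

Diff. between 2 dependent correlations (one sample, both correlations sharing variable y; df = n - 3):

$$t_{\text{difference}} = (r_{xy} - r_{zy})\sqrt{\frac{(n - 3)(1 + r_{xz})}{2\left(1 - r_{xy}^2 - r_{xz}^2 - r_{zy}^2 + 2\, r_{xy} r_{xz} r_{zy}\right)}}$$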
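Both tests can be sketched in a few lines. The independent-samples z uses Fisher's transformation (math.atanh(r) equals ½ ln[(1 + r)/(1 - r)]) and a two-tailed normal p value built from math.erf. The dependent-samples t follows the formula above with df = n - 3; its p value needs the t distribution, which is not in the Python standard library, so look it up in a table. All correlations and sample sizes in the example are hypothetical.

```python
from math import atanh, erf, sqrt


def fisher_z(r):
    """Fisher r-to-z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return atanh(r)


def two_tailed_p(z):
    """Two-tailed p value for a standard normal z (Phi computed from erf)."""
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - phi)


def z_independent(r1, n1, r2, n2):
    """z test for the difference between correlations from two independent samples."""
    return (fisher_z(r1) - fisher_z(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))


def t_dependent(r_xy, r_zy, r_xz, n):
    """t (df = n - 3) for two correlations that share variable y within one sample."""
    det = 1 - r_xy ** 2 - r_xz ** 2 - r_zy ** 2 + 2 * r_xy * r_xz * r_zy
    t = (r_xy - r_zy) * sqrt((n - 3) * (1 + r_xz) / (2 * det))
    return t, n - 3


# Hypothetical values for illustration only
z = z_independent(-0.60, 100, -0.30, 120)
print(f"independent: z = {z:.2f}, p = {two_tailed_p(z):.3f}")
t, df = t_dependent(r_xy=-0.60, r_zy=-0.30, r_xz=0.20, n=100)
print(f"dependent: t({df}) = {t:.2f}")
```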
Page 39
Note: Assumes independent samples
Page 40
Partial Correlation
[Figure: Venn diagram of overlapping variance among Sick Days, Exercise Days, and Sleep Hours]
var. explained = .376
var. explained = .27
var. explained by exercise alone (.04)
var. explained by sleep alone (.04)
var. explained by exercise + sleep (.21)