Sociology 5811: Lecture 16: Crosstabs 2 Measures of Association Plus Differences in Proportions Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

Dec 25, 2015


Morgan Hunter

Sociology 5811: Lecture 16: Crosstabs 2

Measures of Association

Plus Differences in Proportions

Copyright © 2005 by Evan Schofer

Do not copy or distribute without permission


Announcements

• Final project proposals due Nov 15

– Get started now!!!

– Find a dataset

– Figure out what hypotheses you might test

• Today: Wrap up crosstabs

– If time remains, we’ll discuss project ideas…


Review: Chi-square Test

• Chi-Square test is a test of independence

• Null hypothesis: the two categorical variables are statistically independent

• There is no relationship between them

• H0: Gender and political party are independent

• Alternate hypothesis: the variables are related, not independent of each other

• H1: Gender and political party are not independent

• Test is based on comparing the observed cell values with the values you’d expect if there were no relationship between variables.


Review: Expected Cell Values

• If two variables are independent, cell values will depend only on the row and column marginals

– Marginals reflect frequencies. If a row (or column) marginal is high, all cells in that row (or column) should be high

• The formula for the expected value in a cell is:

$$\hat{f}_{ij} = \frac{(f_i)(f_j)}{N}$$

• $f_i$ and $f_j$ are the row and column marginals

• N is the total sample size
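A minimal standard-library Python sketch of the expected-value formula, assuming the crosstab is stored as a list of rows of observed counts (the name `expected_counts` is illustrative, not from the lecture). Applied to the gender and party table from the next slide, it reproduces the expected values listed there.

```python
def expected_counts(table):
    """Expected frequencies under independence: f_hat_ij = (f_i)(f_j) / N.

    `table` is a list of rows; each row is a list of observed cell counts.
    """
    row_totals = [sum(row) for row in table]          # f_i: row marginals
    col_totals = [sum(col) for col in zip(*table)]    # f_j: column marginals
    n = sum(row_totals)                               # N: total sample size
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Democrat, Republican; columns: women, men
observed = [[27, 10],
            [16, 15]]
for row in expected_counts(observed):
    print([round(e, 1) for e in row])
# [23.4, 13.6]
# [19.6, 11.4]
```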


Review: Chi-square Test

• The Chi-square formula:

$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}$$

• Where:

• R = total number of rows in the table

• C = total number of columns in the table

• Eij = the expected frequency in row i, column j

• Oij = the observed frequency in row i, column j

– Assumption for test: large N (> 100)

– Degrees of freedom for the critical value: (R – 1)(C – 1)


Chi-square Test of Independence

• Example: Gender and Political Views

– Let’s pretend that an N of 68 is sufficient

| | Women | Men |
|---|---|---|
| Democrat | O11: 27, E11: 23.4 | O12: 10, E12: 13.6 |
| Republican | O21: 16, E21: 19.6 | O22: 15, E22: 11.4 |


Chi-square Test of Independence

• Compute (E – O)²/E for each cell

| | Women | Men |
|---|---|---|
| Democrat | (23.4 – 27)²/23.4 = .55 | (13.6 – 10)²/13.6 = .95 |
| Republican | (19.6 – 16)²/19.6 = .66 | (11.4 – 15)²/11.4 = 1.14 |

Chi-Square Test of Independence

• Finally, sum up to compute the Chi-square

• χ² = .55 + .95 + .66 + 1.14 = 3.30

• What is the critical value for α = .05?

– Degrees of freedom: (R – 1)(C – 1) = (2 – 1)(2 – 1) = 1

– According to Knoke, p. 509: critical value is 3.84

• Question: Can we reject H0?

– No. χ² of 3.30 is less than the critical value

• We cannot conclude that there is a relationship between gender and political party affiliation.
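The whole test can be sketched in a few lines of standard-library Python. `chi_square` is an illustrative helper, and the 3.84 cutoff is the lecture’s table value for α = .05 with 1 degree of freedom rather than something the code computes. Because it uses unrounded expected values, its statistic can differ in the last digit from a hand calculation with expected values rounded to one decimal.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (expected - observed) ** 2 / expected   # always divide by E
    df = (len(table) - 1) * (len(table[0]) - 1)           # (R - 1)(C - 1)
    return stat, df


stat, df = chi_square([[27, 10], [16, 15]])
print(round(stat, 2), df)        # 3.31 1
CRITICAL_05_DF1 = 3.84           # lecture's critical value, alpha = .05, df = 1
print("reject H0" if stat > CRITICAL_05_DF1 else "cannot reject H0")
```

If SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should report the same statistic; its default `correction=True` applies Yates’ continuity correction to 2x2 tables and gives a smaller value.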


Chi-square Test of Independence

• Weaknesses of chi-square tests:

• 1. If the sample is very large, we almost always reject H0.

• Even tiny covariations are statistically significant

• But, they may not be socially meaningful differences

• 2. It doesn’t tell us how strong the relationship is

– It doesn’t tell us if it is a large, meaningful difference or a very small one

• It is only a test of “independence” vs. “dependence”

• Measures of Association address this shortcoming.


Measures of Association

• Separate from the issue of independence, statisticians have created measures of association

– They are measures that tell us how strong the relationship is between two variables

Weak association:

| | Women | Men |
|---|---|---|
| Dem. | 51 | 49 |
| Rep. | 49 | 51 |

Strong association:

| | Women | Men |
|---|---|---|
| Dem. | 100 | 0 |
| Rep. | 0 | 100 |


Crosstab Association: Yule’s Q

• #1: Yule’s Q

– Appropriate only for 2x2 tables (2 rows, 2 columns)

• Label cell frequencies a through d:

a b

c d

• Recall that extreme values along the “diagonal” (cells a & d) or the “off-diagonal” (b & c) indicate a strong relationship.

• Yule’s Q captures that in a measure

• 0 = no association. -1, +1 = strong association

Formula:

$$Q = \frac{ad - bc}{ad + bc}$$


Crosstab Association: Yule’s Q

• Rule of thumb for interpreting Yule’s Q (Bohrnstedt & Knoke, p. 150):

| Absolute value of Q | Strength of association |
|---|---|
| 0 to .24 | “virtually no relationship” |
| .25 to .49 | “weak relationship” |
| .50 to .74 | “moderate relationship” |
| .75 to 1.0 | “strong relationship” |


Crosstab Association: Yule’s Q

• Example: Gender and Political Party Affiliation

a b

c d

| | Women | Men |
|---|---|---|
| Dem | 27 | 10 |
| Rep | 16 | 15 |

Calculate “bc”: bc = (10)(16) = 160

Calculate “ad”: ad = (27)(15) = 405

$$Q = \frac{ad - bc}{ad + bc} = \frac{405 - 160}{405 + 160} = \frac{245}{565} = .43$$

• .43 = “weak association”, toward the upper end of that range
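A minimal sketch of Yule’s Q plus the Bohrnstedt & Knoke rule of thumb, assuming the a b / c d cell layout from the slides and Q = (ad − bc)/(ad + bc). The names `yules_q` and `describe_q` are illustrative, and treating anything below .25, .50, or .75 as belonging to the lower band is an assumption about how to handle values between the published ranges. It also checks the weak and strong tables from the Measures of Association slide.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table laid out as  a b / c d."""
    ad, bc = a * d, b * c
    return (ad - bc) / (ad + bc)


def describe_q(q):
    """Rule of thumb (Bohrnstedt & Knoke, p. 150) applied to |Q|."""
    size = abs(q)
    if size < 0.25:
        return "virtually no relationship"
    if size < 0.50:
        return "weak relationship"
    if size < 0.75:
        return "moderate relationship"
    return "strong relationship"


q = yules_q(27, 10, 16, 15)                 # gender / party example
print(round(q, 2), describe_q(q))           # 0.43 weak relationship
print(round(yules_q(51, 49, 49, 51), 2))    # 0.04  (weak-association table)
print(yules_q(100, 0, 0, 100))              # 1.0   (strong-association table)
```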


Association: Other Measures

• Phi (φ)

– Very similar to Yule’s Q

– Only for 2x2 tables; ranges from –1 to 1, 0 = no association

• Gamma (G)

– Based on a very different method of calculation

– Not limited to 2x2 tables

– Requires ordered variables

• Tau-c (τc) and Somers’ d (dyx)

– Same basic principle as Gamma

• Several others are discussed in Knoke and Norusis.
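The slides do not give a formula for phi. This sketch assumes the standard 2x2 definition, φ = (ad − bc) / √((a+b)(c+d)(a+c)(b+d)), which also satisfies χ² = Nφ² for the uncorrected chi-square; `phi` is an illustrative name. For the gender and party table, phi comes out near .22 while Yule’s Q is about .43, so the two measures share a sign and range but not a scale.

```python
import math


def phi(a, b, c, d):
    """Phi coefficient for a 2x2 table laid out as  a b / c d."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))   # product of marginals
    return (a * d - b * c) / denom


p = phi(27, 10, 16, 15)
print(round(p, 3))             # 0.221
print(round(68 * p ** 2, 2))   # 3.31, the chi-square statistic (N * phi^2)
```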


Crosstab Association: Gamma

• Gamma, like Q, is based on comparing “diagonal” to “off-diagonal” cases

– But, it does so differently

• Jargon:

• Concordant pairs: Pairs of cases where one case is higher on both variables than another case

• Discordant pairs: Pairs of cases for which the first case (when compared to a second) is higher on one variable but lower on another


Crosstab Association: Gamma

• Example: Approval of candidates

– Cases in the “Love Trees/Love Guns” cell make concordant pairs with cases lower on both

| | Hate Trees | Trees = OK | Love Trees |
|---|---|---|---|
| Love Guns | 1205 | 603 | 71 |
| Guns = OK | 659 | 1498 | 452 |
| Hate Guns | 431 | 467 | 1120 |

All 71 individuals can form a pair with everyone in the cells that are lower on both variables. Just multiply!

(71)(659 + 1498 + 431 + 467) = 216,905 concordant pairs


Crosstab Association: Gamma

• More possible concordant pairs

– The “Love Guns/Trees = OK” cell and the “Guns = OK/Love Trees” cell also can have concordant pairs

| | Hate Trees | Trees = OK | Love Trees |
|---|---|---|---|
| Love Guns | 1205 | 603 | 71 |
| Guns = OK | 659 | 1498 | 452 |
| Hate Guns | 431 | 467 | 1120 |

These 603 can pair with all those that score lower on approval of both Guns and Trees:

(603)(659 + 431) = 657,270 concordant pairs

These 452 can pair lower too!

(452)(431 + 467) = 405,896 concordant pairs


Crosstab Association: Gamma

• Discordant pairs: Pairs where the first person ranks higher on one dimension (e.g., approval of Trees) but lower on the other (e.g., approval of Guns)

| | Hate Trees | Trees = OK | Love Trees |
|---|---|---|---|
| Love Guns | 1205 | 603 | 71 |
| Guns = OK | 659 | 1498 | 452 |
| Hate Guns | 431 | 467 | 1120 |

The top-left cell is higher on Guns but lower on Trees than the cells below and to its right. They make pairs:

(1205)(1498 + 452 + 467 + 1120) = 4,262,085 discordant pairs


Crosstab Association: Gamma

• If all pairs are concordant or all pairs are discordant, the variables are strongly related

• If there are an equal number of discordant and concordant pairs, the variables are weakly associated.

• Formula for Gamma:

$$G = \frac{n_s - n_d}{n_s + n_d}$$

• $n_s$ = number of concordant pairs

• $n_d$ = number of discordant pairs
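A brute-force sketch of gamma for an ordered crosstab, assuming both the rows and the columns are listed from low to high; `gamma` is an illustrative name, and pairs tied on either variable are skipped. The slides show three of the four concordant products and one of the four discordant products; counting every cell this way also picks up pairs such as the 1498 “Guns = OK/Trees = OK” cases with the 431 “Hate Guns/Hate Trees” cases (645,638 more concordant pairs). The approval counts are the ones on the slides, with the gun rows reversed so both variables run low to high.

```python
def gamma(table):
    """Goodman-Kruskal gamma for a crosstab whose rows and columns are both
    ordered low -> high. Returns (n_s, n_d, G); tied pairs are ignored."""
    n_rows, n_cols = len(table), len(table[0])
    n_s = n_d = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):        # cases higher on the row variable
                for l in range(n_cols):
                    if l > j:                     # ...and higher on the column variable
                        n_s += table[i][j] * table[k][l]
                    elif l < j:                   # ...but lower on the column variable
                        n_d += table[i][j] * table[k][l]
    return n_s, n_d, (n_s - n_d) / (n_s + n_d)


# Rows: Hate Guns, Guns = OK, Love Guns; columns: Hate Trees, Trees = OK, Love Trees
approval = [[431, 467, 1120],
            [659, 1498, 452],
            [1205, 603, 71]]
n_s, n_d, g = gamma(approval)
print(n_s, n_d, round(g, 2))   # 1925709 7933594 -0.61
```

The negative value says that respondents who approve more of guns tend to approve less of trees in this table.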


Crosstab Association: Gamma

• Calculation of Gamma is typically done by computer

• Zero indicates no association

• +1 = strong positive association

• -1 = strong negative association

• It is possible to do hypothesis tests on Gamma

– To determine if population gamma differs from zero

• Requirements: random sample, N > 50

• See Knoke, p. 155-6.


Crosstab Association

• Final remarks:

• You have a variety of possible measures to assess association among variables. Which one should you use?

• Yule’s Q and Phi require a 2x2 table

• Larger ordered tables: use Gamma, Tau-c, or Somers’ d

• Ideally, report more than one to show that your findings are robust.


Odds Ratios

• Odds ratios are a powerful way of analyzing relationships in crosstabs

• Many advanced categorical data analysis techniques are based on odds ratios

• Review: What is a probability?

– p(A) = number of outcomes that are “A” divided by total number of outcomes

• To convert a frequency distribution to a probability distribution, simply divide frequency by N

• The same can be done with crosstabs: Cell frequency over N is probability.


Odds Ratios

• If total N = 68, the probability of drawing a case from each cell is:

| | Women | Men |
|---|---|---|
| Dem | 27 / 68 | 10 / 68 |
| Rep | 16 / 68 | 15 / 68 |

| | Women | Men |
|---|---|---|
| Dem | .397 | .147 |
| Rep | .235 | .221 |
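Converting cell counts to probabilities is just division by N. A minimal sketch, assuming the table is stored as a dictionary keyed by (party, gender); the layout is an illustrative choice.

```python
counts = {("Dem", "Women"): 27, ("Dem", "Men"): 10,
          ("Rep", "Women"): 16, ("Rep", "Men"): 15}
n = sum(counts.values())                          # total N = 68
probs = {cell: f / n for cell, f in counts.items()}   # cell frequency / N

for cell, p in probs.items():
    print(cell, round(p, 3))
# ('Dem', 'Women') 0.397
# ('Dem', 'Men') 0.147
# ('Rep', 'Women') 0.235
# ('Rep', 'Men') 0.221
```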


Odds Ratios

• Odds are similar to probability… but not quite

• Odds of A = number of outcomes that are A, divided by number of outcomes that are not A

– Note: The denominator differs from that of a probability

• Ex: Probability of rolling 1 on a 6-sided die = 1/6

• Odds of rolling a 1 on a six-sided die = 1/5

• Odds can also be calculated from probabilities:

$$\text{odds}_i = \frac{p_i}{1 - p_i}$$
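Both routes to the odds of rolling a 1 can be checked with exact fractions. A minimal sketch using Python’s standard `fractions` module; the function names are illustrative.

```python
from fractions import Fraction


def odds_from_probability(p):
    """Odds of outcome i: p_i / (1 - p_i)."""
    return p / (1 - p)


def odds_from_counts(n_a, n_not_a):
    """Odds of A: outcomes that are A divided by outcomes that are not A."""
    return Fraction(n_a, n_not_a)


# Rolling a 1 on a fair six-sided die: one outcome is "1", five are not.
print(odds_from_probability(Fraction(1, 6)))   # 1/5
print(odds_from_counts(1, 5))                  # 1/5
```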


Odds Ratios

• Conditional odds = odds of being in one category of a variable within a specific category of another variable

– Example: For women, what are the odds of being a Democrat?

– Instead of the overall odds of being a Democrat, conditional odds are about a particular subgroup in a table

| | Women | Men |
|---|---|---|
| Dem | 27 | 10 |
| Rep | 16 | 15 |

Conditional odds of being a Democrat for women are:

27 / 16 = 1.69

Note: Odds for women differ from the odds for men (10 / 15 = .67)
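Conditional odds are computed within one column at a time. A minimal sketch, assuming the table is stored column by column as nested dictionaries; the structure and the name `conditional_odds` are illustrative.

```python
# Columns are gender; within each column, counts by party.
table = {"Women": {"Dem": 27, "Rep": 16},
         "Men":   {"Dem": 10, "Rep": 15}}


def conditional_odds(column, category="Dem", other="Rep"):
    """Odds of `category` versus `other` within one column of the table."""
    return table[column][category] / table[column][other]


for gender in ("Women", "Men"):
    print(gender, round(conditional_odds(gender), 2))
# Women 1.69
# Men 0.67
```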


Odds Ratios

• If variables in a crosstab are independent, their conditional odds are equal

• Odds of falling into one category or another are same for all values of other variable

• If variables in a crosstab are associated, conditional odds differ

• Odds can be compared by making a ratio

– Ratio is equal to 1 if odds are the same for two groups

• Ratios much greater or less than 1 indicate very different odds.


Odds Ratios

• Formula for odds ratio in a 2x2 table:

$$OR_{XY} = \frac{b/d}{a/c} = \frac{bc}{ad}$$

| | Women | Men |
|---|---|---|
| Dem | 27 (a) | 10 (b) |
| Rep | 16 (c) | 15 (d) |

• Ex: OR = (10)(16) / [(27)(15)] = 160 / 405 = .395

• Interpretation: men have .395 times the odds of being a Democrat compared to women

• Inverted value (1 / .395 = 2.53) indicates that women’s odds of being a Democrat are 2.53 times men’s odds
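A minimal sketch of the 2x2 odds ratio in the slide’s orientation, OR = (b/d)/(a/c) = bc/ad, with women in the first column and men in the second; `odds_ratio` is an illustrative name. Zero cells in `a` or `d` would raise a division error, matching the caution on the next slide.

```python
def odds_ratio(a, b, c, d):
    """OR_XY = (b/d) / (a/c) = bc / ad for a 2x2 table laid out as  a b / c d.

    With women in the first column and men in the second, this is men's odds
    of being a Democrat divided by women's odds."""
    return (b * c) / (a * d)


or_men_vs_women = odds_ratio(27, 10, 16, 15)
print(round(or_men_vs_women, 3))       # 0.395
print(round(1 / or_men_vs_women, 2))   # 2.53  (women's odds relative to men's)
```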


Odds Ratios: Final Remarks

• 1. Cells with zeros cause problems for odds ratios

• Ratios with zero in denominator are undefined.

• Thus, you need tables with no empty (zero) cells

• 2. Odds ratios can be used to measure association

– Indeed, Yule’s Q is based on them

• 3. Odds ratios form the basis for most advanced categorical data analysis techniques

• For now it may be easier to use Yule’s Q, etc. But, if you need to do advanced techniques, you will use odds ratios.
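One way to see that Yule’s Q is built from the odds ratio: dividing the numerator and denominator of (ad − bc)/(ad + bc) by bc gives Q = (OR − 1)/(OR + 1) with OR = ad/bc. A minimal sketch of that identity; the function name is illustrative.

```python
def yules_q_from_or(odds_ratio):
    """Yule's Q written through the odds ratio: Q = (OR - 1) / (OR + 1)."""
    return (odds_ratio - 1) / (odds_ratio + 1)


a, b, c, d = 27, 10, 16, 15
or_ad_bc = (a * d) / (b * c)          # 405 / 160: women's odds relative to men's
print(round(yules_q_from_or(or_ad_bc), 2))       # 0.43, same as (ad - bc) / (ad + bc)
print(round(yules_q_from_or(1 / or_ad_bc), 2))   # -0.43: inverting the ratio flips the sign
```

An odds ratio of 1 (identical conditional odds) maps to Q = 0, and ratios far from 1 in either direction push Q toward +1 or –1.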


Tests for Difference in Proportions

• Another approach to small (2x2) tables:

• Instead of making a crosstab, you can just think about the proportion of people in a given category

• More similar to a t-test than to a chi-square test

• Ex: Do you approve of Pres. Bush? (Yes/No)

• Sample: N = 86 women, 80 men

• Proportion of women that approve: PW = .70

• Proportion of men that approve: PM = .78

• Issue: Do the populations of men and women differ?

– Or are the differences just due to sampling variability?


Tests for Difference in Proportions

• Hypotheses:

• Again, the typical null hypothesis is that there are no differences between groups

• Which is equivalent to statistical independence

• H0: Proportion women = proportion men

• H1: Proportion women ≠ proportion men

– Note: One-tailed (directional) hypotheses can also be used.


Tests for Difference in Proportions

• Strategy: Figure out the sampling distribution for differences in proportions

• Statisticians have determined relevant info:

• 1. If samples are “large”, the sampling distribution of the difference in proportions is normal

– The Z-distribution can be used for hypothesis tests

• 2. A Z-value can be calculated using the formula:

$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}}$$


Tests for Difference in Proportions

• Standard error can be estimated as:

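$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{P_{both}(1 - P_{both})}\,\sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$

• Where:

$$P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$$

– P1 and P2 are the sample proportions in groups 1 and 2, N1 and N2 are the group sample sizes, and Pboth is the pooled proportion for both groups combined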


Difference in Proportions: Example

• Q: Do you approve of Pres. Bush? (Yes/No)

• Sample: N = 86 women, 80 men

• Women: N = 86, PW = .70

• Men: N = 80, PM = .78

• Total N is “large”: 166 people

– So, we can use a Z-test

• Use α = .05; two-tailed critical Z = 1.96


Difference in Proportions: Example

• Use formula to calculate Z-value

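$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{.70 - .78}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{-.08}{\hat{\sigma}_{(P_1 - P_2)}}$$

• And, estimate the standard error as:

$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{P_{both}(1 - P_{both})}\,\sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$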


Difference in Proportions: Example

• First: Calculate Pboth:

$$P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2} = \frac{86(.70) + 80(.78)}{86 + 80} = \frac{60.2 + 62.4}{166} = .739$$


Difference in Proportions: Example

• Plug in Pboth = .739:

$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{.739(1 - .739)}\,\sqrt{\frac{86 + 80}{(86)(80)}} = \sqrt{.1929}\,\sqrt{\frac{166}{6880}} = (.4392)(.1553) = .0682$$

Difference in Proportions: Example

• Finally, plug in S.E. and calculate Z:

$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{.70 - .78}{.0682} = \frac{-.08}{.0682} = -1.17$$

Difference in Proportions: Example

• Results:

• Critical Z = ±1.96

• Observed Z = –1.17

• Conclusion: We can’t reject the null hypothesis

– |Z| = 1.17 is less than 1.96, so women and men do not clearly differ in approval of Bush
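The pooled two-proportion Z test from these slides fits in a short standard-library function. This is a sketch under the lecture’s large-sample assumption; `two_proportion_z` is an illustrative name, and the ±1.96 cutoff is the two-tailed α = .05 value used above.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Z test for a difference in proportions using the pooled proportion.
    Returns (P_both, standard error, Z)."""
    p_both = (n1 * p1 + n2 * p2) / (n1 + n2)                        # pooled P
    se = math.sqrt(p_both * (1 - p_both)) * math.sqrt((n1 + n2) / (n1 * n2))
    return p_both, se, (p1 - p2) / se


p_both, se, z = two_proportion_z(0.70, 86, 0.78, 80)   # women vs. men
print(round(p_both, 3), round(se, 3), round(z, 2))     # 0.739 0.068 -1.17
print("reject H0" if abs(z) > 1.96 else "cannot reject H0")   # two-tailed, alpha = .05
```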