Cross Tabulation and Chi Square Test for Independence
Feb 10, 2016
Cross-tabulation
• Helps answer questions about whether two or more variables of interest are linked:
– Is the type of mouthwash user (heavy or light) related to gender?
– Is the preference for a certain flavor (cherry or lemon) related to the geographic region (north, south, east, west)?
– Is income level associated with gender?
• Cross-tabulation determines association, not causality.
Dependent and Independent Variables
• The variable being studied is called the dependent variable or response variable.
• A variable that influences the dependent variable is called an independent variable.
Cross-tabulation
• Cross-tabulation of two or more variables is possible if the variables are discrete:
– The frequency of one variable is subdivided by the categories of the other variable.
• Generally a cross-tabulation table has:
– Row percentages
– Column percentages
– Total percentages
• Which one is better? It depends on which variable is considered independent.
Cross tabulation
GROUPINC * Gender Crosstabulation

GROUPINC            Statistic             Female    Male      Total
income <= 5         Count                 10        9         19
                    % within GROUPINC     52.6%     47.4%     100.0%
                    % within Gender       55.6%     18.8%     28.8%
                    % of Total            15.2%     13.6%     28.8%
5 < income <= 10    Count                 5         25        30
                    % within GROUPINC     16.7%     83.3%     100.0%
                    % within Gender       27.8%     52.1%     45.5%
                    % of Total            7.6%      37.9%     45.5%
income > 10         Count                 3         14        17
                    % within GROUPINC     17.6%     82.4%     100.0%
                    % within Gender       16.7%     29.2%     25.8%
                    % of Total            4.5%      21.2%     25.8%
Total               Count                 18        48        66
                    % within GROUPINC     27.3%     72.7%     100.0%
                    % within Gender       100.0%    100.0%    100.0%
                    % of Total            27.3%     72.7%     100.0%
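The three kinds of percentages in the GROUPINC * Gender table can be recomputed from the raw counts. The following is a minimal sketch in plain Python; the function name `crosstab_percentages` and the dictionary layout are illustrative assumptions, not part of the original slides.

```python
# Counts from the GROUPINC * Gender table: rows are income groups,
# columns are (Female, Male).
counts = {
    "income <= 5": [10, 9],
    "5 < income <= 10": [5, 25],
    "income > 10": [3, 14],
}


def crosstab_percentages(table):
    """Return row, column, and total percentages for a dict of count rows.

    Hypothetical helper written for illustration only.
    """
    row_totals = {label: sum(row) for label, row in table.items()}
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    grand_total = sum(row_totals.values())

    result = {}
    for label, row in table.items():
        result[label] = {
            # Share of each cell within its own row (% within GROUPINC)
            "row_pct": [100 * x / row_totals[label] for x in row],
            # Share of each cell within its own column (% within Gender)
            "col_pct": [100 * x / col_totals[j] for j, x in enumerate(row)],
            # Share of each cell in the whole sample (% of Total)
            "total_pct": [100 * x / grand_total for x in row],
        }
    return result


percentages = crosstab_percentages(counts)
for label, pct in percentages.items():
    print(label, {k: [round(v, 1) for v in vals] for k, vals in pct.items()})
```

Row percentages are the natural choice when the row variable is treated as independent, and column percentages when the column variable is.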
Contingency Table
• A contingency table shows the joint distribution of two discrete variables.
• This distribution represents the probability of observing a case in each cell.
– Probability is calculated as:

$$P = \frac{\text{Observed cases}}{\text{Total cases}}$$
Chi-square Test for Independence
• The chi-square test for independence determines whether two variables are associated.
H0: The two variables are independent.
H1: The two variables are not independent.
• Chi-square test results are unstable if the expected count in any cell is lower than 5.
Chi-Square Statistic

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

χ² = chi-square statistic
O_i = observed frequency in the ith cell
E_i = expected frequency in the ith cell

Estimated Cell Frequency

$$E_{ij} = \frac{R_i C_j}{n}$$

R_i = total observed frequency in the ith row
C_j = total observed frequency in the jth column
n = sample size
E_ij = estimated cell frequency

Degrees of Freedom

$$d.f. = (R-1)(C-1)$$

where R is the number of rows and C is the number of columns.
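The estimated cell frequency follows from the independence assumption in H0. A short derivation, using the symbols R_i, C_j, n, and E_ij defined above and estimating each marginal probability by its sample share:

```latex
\begin{aligned}
P(\text{row } i \text{ and column } j)
  &= P(\text{row } i)\, P(\text{column } j)
   \approx \frac{R_i}{n} \cdot \frac{C_j}{n} \\
E_{ij}
  &= n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n}
   = \frac{R_i C_j}{n}
\end{aligned}
```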
Chi-Square Test: Differences Among Groups Example

Awareness of Tire Manufacturer's Brand (observed/expected)

            Men      Women    Total
Aware       50/39    10/21    60
Unaware     15/26    25/14    40
Total       65       35       100

$$\chi^2 = \frac{(50-39)^2}{39} + \frac{(10-21)^2}{21} + \frac{(15-26)^2}{26} + \frac{(25-14)^2}{14}$$

$$\chi^2 = 3.102 + 5.762 + 4.654 + 8.643 = 22.161$$

$$d.f. = (R-1)(C-1) = (2-1)(2-1) = 1$$

χ² with 1 d.f. at the .05 level: critical value = 3.84
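The tire-brand awareness example can be checked numerically. This is a minimal sketch in plain Python; the function `chi_square_independence` is a hypothetical helper written for illustration, and the critical value 3.84 is taken from the slide rather than computed.

```python
def chi_square_independence(observed):
    """Chi-square statistic, degrees of freedom, and expected counts
    for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected frequency for each cell: E_ij = R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (O - E)^2 / E over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Observed counts from the example:
# rows = (Aware, Unaware), columns = (Men, Women).
observed = [[50, 10], [15, 25]]
statistic, df, expected = chi_square_independence(observed)

# Critical value quoted in the slides for 1 d.f. at the .05 level.
CRITICAL_VALUE_05_1DF = 3.84
reject_h0 = statistic > CRITICAL_VALUE_05_1DF

print(round(statistic, 3), df, expected, reject_h0)
```

Because the statistic is far larger than the critical value, the sketch reaches the same decision as the slides: reject H0.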
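Chi-square Test for Independence
• Under H0, the test statistic approximately follows the chi-square distribution (χ²).
[Figure: chi-square distribution with the "Reject H0" region to the right of the critical value 3.84; the computed statistic 22.16 falls in that region.]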
Differences Between Groups when Comparing Means
• Ratio scaled dependent variables
• t-test
– When groups are small
– When population standard deviation is unknown
• z-test
– When groups are large
Null Hypothesis About Mean Differences Between Groups

$$\mu_1 = \mu_2 \quad \text{OR} \quad \mu_1 - \mu_2 = 0$$
t-Test for Difference of Means

$$t = \frac{\text{mean 1} - \text{mean 2}}{\text{Variability of random means}}$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$

X̄_1 = mean for Group 1
X̄_2 = mean for Group 2
S_{X̄1−X̄2} = the pooled or combined standard error of difference between means
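Pooled Estimate of the Standard Error

$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

S_1² = the variance of Group 1
S_2² = the variance of Group 2
n_1 = the sample size of Group 1
n_2 = the sample size of Group 2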
Degrees of Freedom
• d.f. = n − k
• where:
– n = n_1 + n_2
– k = number of groups
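t-Test for Difference of Means Example

$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(20)(2.1)^2 + (13)(2.6)^2}{33}\right)\left(\frac{1}{21} + \frac{1}{14}\right)} = .797$$

$$t = \frac{16.5 - 12.2}{.797} = \frac{4.3}{.797} = 5.395$$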
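The pooled standard error and t statistic can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the inputs (group means 16.5 and 12.2, standard deviations 2.1 and 2.6, sample sizes 21 and 14) are read from the slide example, and the function names are hypothetical.

```python
import math


def pooled_standard_error(s1, n1, s2, n2):
    """Pooled standard error of the difference between two means.

    s1 and s2 are the group standard deviations, so s1**2 and s2**2
    are the group variances used in the formula.
    """
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1 / n1 + 1 / n2))


def t_statistic(mean1, mean2, s1, n1, s2, n2):
    """Return the t statistic and its degrees of freedom (n - k with k = 2)."""
    se = pooled_standard_error(s1, n1, s2, n2)
    df = (n1 + n2) - 2
    return (mean1 - mean2) / se, df


# Values assumed from the slide example.
se = pooled_standard_error(2.1, 21, 2.6, 14)
t, df = t_statistic(16.5, 12.2, 2.1, 21, 2.6, 14)

print(round(se, 3), round(t, 3), df)
```

The degrees of freedom come out as 33, which matches the denominator n_1 + n_2 − 2 in the pooled variance.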
Comparing Two Groups when Comparing Proportions
• Percentage comparisons
• Sample proportion: p
• Population proportion: π
Differences Between Two Groups when Comparing Proportions
The hypothesis is:

$$H_0: \pi_1 = \pi_2$$

may be restated as:

$$H_0: \pi_1 - \pi_2 = 0$$
Z-Test for Differences of Proportions

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$$

p_1 = sample proportion of successes in Group 1
p_2 = sample proportion of successes in Group 2
(π_1 − π_2) = hypothesized population proportion 1 minus hypothesized population proportion 2
S_{p1−p2} = pooled estimate of the standard error of the difference of proportions
Pooled estimate of the standard error of the difference of proportions:

$$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

p̄ = pooled estimate of the proportion of successes in a sample of both groups
q̄ = (1 − p̄), a pooled estimate of the proportion of failures in a sample of both groups
n_1 = sample size for Group 1
n_2 = sample size for Group 2
Pooled estimate of the proportion of successes:

$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$
A Z-Test for Differences of Proportions

$$\bar{p} = \frac{(100)(.35) + (100)(.4)}{100 + 100} = .375$$

$$S_{p_1 - p_2} = \sqrt{(.375)(.625)\left(\frac{1}{100} + \frac{1}{100}\right)} = .068$$
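The proportions example can be carried through to the Z statistic in code. This is a minimal sketch; the inputs (p_1 = .35, p_2 = .40, n_1 = n_2 = 100) are read from the slide example, the hypothesized difference is taken as zero under H0, and the function name `z_test_proportions` is a hypothetical helper.

```python
import math


def z_test_proportions(p1, n1, p2, n2, hypothesized_diff=0.0):
    """Return the pooled proportion, standard error, and Z statistic
    for the difference between two sample proportions."""
    # Pooled estimate of the proportion of successes across both groups
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    # Pooled estimate of the proportion of failures
    q_bar = 1 - p_bar
    # Pooled standard error of the difference of proportions
    se = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    z = ((p1 - p2) - hypothesized_diff) / se
    return p_bar, se, z


# Values assumed from the slide example.
p_bar, se, z = z_test_proportions(0.35, 100, 0.40, 100)

print(round(p_bar, 3), round(se, 3), round(z, 2))
```

The slides stop at the standard error; the Z value printed here is simply the formula applied to those same inputs.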
Analysis of Variance
Hypothesis when comparing three groups:

$$\mu_1 = \mu_2 = \mu_3$$

Analysis of Variance F-Ratio

$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$
Analysis of Variance Sum of Squares

$$SS_{total} = SS_{within} + SS_{between}$$

Analysis of Variance Sum of Squares Total

$$SS_{total} = \sum_{i=1}^{n} \sum_{j=1}^{c} (X_{ij} - \bar{\bar{X}})^2$$

X_ij = individual scores, i.e., the ith observation or test unit in the jth group
X̄̄ (double bar) = grand mean
n = number of observations or test units in a group
c = number of groups (or columns)
Analysis of Variance Sum of Squares Within

$$SS_{within} = \sum_{i=1}^{n} \sum_{j=1}^{c} (X_{ij} - \bar{X}_j)^2$$

X_ij = individual scores, i.e., the ith observation or test unit in the jth group
X̄_j = mean of the jth group
n = number of observations or test units in a group
c = number of groups (or columns)
Analysis of Variance Sum of Squares Between

$$SS_{between} = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$$

X̄_j = mean of the jth group
X̄̄ (double bar) = grand mean
n_j = number of observations or test units in the jth group
Analysis of Variance Mean Squares Between

$$MS_{between} = \frac{SS_{between}}{c - 1}$$

Analysis of Variance Mean Square Within

$$MS_{within} = \frac{SS_{within}}{cn - c}$$

Analysis of Variance F-Ratio

$$F = \frac{MS_{between}}{MS_{within}}$$
A Test Market Experiment on Pricing
Sales in Units (thousands)

                          Regular Price    Reduced Price    Cents-Off Coupon
                          $.99             $.89             Regular Price
Test Market A, B, or C    130              145              153
Test Market D, E, or F    118              143              129
Test Market G, H, or I    87               120              96
Test Market J, K, or L    84               131              99
Mean                      X̄1 = 104.75      X̄2 = 134.75      X̄3 = 119.25
Grand Mean                119.58
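The sums of squares, mean squares, and F-ratio for the pricing experiment can be computed directly from the sales figures. This is a minimal sketch in plain Python; the function `one_way_anova` and its returned keys are hypothetical names chosen for illustration, and the slides themselves do not state the resulting F value.

```python
def one_way_anova(groups):
    """One-way ANOVA quantities for a list of groups (each a list of numbers)."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between groups: group size times squared distance of group mean from grand mean
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within groups: squared distance of each score from its own group mean
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    # Total: squared distance of each score from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df_between = len(groups) - 1                 # c - 1
    df_within = len(all_values) - len(groups)    # cn - c when groups are equal in size
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "grand_mean": grand_mean,
        "group_means": group_means,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Sales in units (thousands) for the three pricing treatments.
regular_price = [130, 118, 87, 84]
reduced_price = [145, 143, 120, 131]
cents_off_coupon = [153, 129, 96, 99]

result = one_way_anova([regular_price, reduced_price, cents_off_coupon])
for name, value in result.items():
    print(name, value)
```

The sketch also confirms the identity SS_total = SS_within + SS_between for these data.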
ANOVA Summary Table Source of Variation
• Between groups
• Sum of squares
– SSbetween
• Degrees of freedom
– c − 1, where c = number of groups
• Mean square (MSbetween)
– SSbetween/(c − 1)

• Within groups
• Sum of squares
– SSwithin
• Degrees of freedom
– cn − c, where c = number of groups, n = number of observations in a group
• Mean square (MSwithin)
– SSwithin/(cn − c)

$$F = \frac{MS_{between}}{MS_{within}}$$

• Total
• Sum of squares
– SStotal
• Degrees of freedom
– cn − 1, where c = number of groups, n = number of observations in a group