Statistical Tests for Computational Intelligence Research and Human Subjective Tests
Hideyuki TAKAGI
Kyushu University, Japan
http://www.design.kyushu-u.ac.jp/~takagi/
ver. March 26, 2015 (earlier versions: July 15, 2013; July 11, 2013; April 23, 2013)
Slides are downloadable from http://www.design.kyushu-u.ac.jp/~takagi
Contents

Parametric tests (data distribution: normality)
・unpaired (independent), 2 groups: unpaired t-test
・paired (related), 2 groups: paired t-test
・unpaired (independent), n groups (n > 2), one-way data: one-way ANOVA (Analysis of Variance)
・paired (related), n groups (n > 2), two-way data: two-way ANOVA

Non-parametric tests (data distribution: no normality)
・unpaired (independent), 2 groups: Mann-Whitney U-test
・paired (related), 2 groups: sign test, Wilcoxon signed-ranks test
・unpaired (independent), n groups (n > 2), one-way data: Kruskal-Wallis test
・paired (related), n groups (n > 2), two-way data: Friedman test
+Scheffé's method of paired comparison for Human Subjective Tests
How to Show Significance?
[Figure: fitness vs. generations convergence curves for the conventional EC, proposed EC1, and proposed EC2.]
Fig. XX Average convergence curves of n times of trial runs.
Just compare averages visually? It is not scientific.

How to Show Significance?
[Figure: sound made by the conventional IEC, sound made by the proposed IEC1, and sound made by the proposed IEC2.]
Sound design concept: exciting
Which method is good for making an exciting sound? How to show it?
You cannot show the superiority of your method without statistical tests.
Papers without statistical tests may be rejected.
[Figure: statistical test -> "My method is significantly better!"]
Which Test Should We Use?
[Chart: the test selection chart listed in Contents.]
When (p > 0.05), we assume that there is no significant difference between the variances σ²_A and σ²_B.

t-Test
Excel (32-bit version only?) has t-tests and ANOVA in its Data Analysis Tools. You must install the add-in (File -> Options -> Add-ins, and enable the add-in).
Q1: Where is the significant difference among A, B, and C?
A1: Apply multiple comparisons between all pairs among columns.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
Non-Parametric Tests
If normality and equal variances are not guaranteed, use non-parametric tests.
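The test selection chart in these slides can be written down as a small lookup, which makes the decision rule explicit. This is only a sketch: the function name `choose_test` and the three keys (normality, paired, more than two groups) are illustrative choices, not part of the slides.

```python
# Lookup table mirroring the test selection chart in the slides.
# Key: (data have normality, data are paired, more than two groups).
CHART = {
    (True, False, False): "unpaired t-test",
    (True, True, False): "paired t-test",
    (True, False, True): "one-way ANOVA",
    (True, True, True): "two-way ANOVA",
    (False, False, False): "Mann-Whitney U-test",
    (False, True, False): "sign test or Wilcoxon signed-ranks test",
    (False, False, True): "Kruskal-Wallis test",
    (False, True, True): "Friedman test",
}


def choose_test(normality, paired, n_groups):
    """Return the test name suggested by the chart (hypothetical helper)."""
    if n_groups < 2:
        raise ValueError("at least two groups are needed")
    return CHART[(normality, paired, n_groups > 2)]


print(choose_test(normality=False, paired=False, n_groups=2))
```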
Mann-Whitney U-test (Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)
Use this test when:
1. Two groups are compared.
2. The data have no normality.
3. There is no correspondence between the data of the two groups (independent).
Mann-Whitney U-test (Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)
1. Calculate a U value.
   U = 0 + 2 + 3 + 4 = 9, U' = 11 (U + U' = n1 n2)
   (When two values are the same, count the pair as 0.5.)
2. See a U-test table.
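The counting in step 1 can be sketched in a few lines. The data below are hypothetical, chosen only so that the per-datum counts are 0, 2, 3, and 4 as in the slide; the counting direction (how many data of the other group lie below each datum) is an assumption, since the slide shows it only as a figure.

```python
def mann_whitney_u(a, b):
    """Return (U, U').

    U counts, for each value in a, how many values in b are smaller;
    a tie counts as 0.5. U' follows from U + U' = n1 * n2.
    """
    u = 0.0
    for x in a:
        for y in b:
            if y < x:
                u += 1.0
            elif y == x:
                u += 0.5
    return u, len(a) * len(b) - u


# Hypothetical data (not from the slides) giving counts 0, 2, 3, 4.
group_a = [1, 4, 6, 8]
group_b = [2, 3, 5, 7, 9]
u, u_prime = mann_whitney_u(group_a, group_b)
print(u, u_prime)  # the smaller of the two is compared with the table
```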
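Mann-Whitney U-test (cont.) (Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)
• Use the smaller value of U or U'.
• When n1 ≤ 20 and n2 ≤ 20, see a Mann-Whitney test table (where n1 and n2 are the # of data of the two groups).
• Otherwise, since U roughly follows the normal distribution
  $N(\mu_U, \sigma_U^2) = N\left(\frac{n_1 n_2}{2}, \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}\right)$,
  normalize U as
  $z = \frac{U - \mu_U}{\sigma_U} = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$
  and check a standard normal distribution table with the z, where $\mu_U = \frac{n_1 n_2}{2}$ and $\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$.
Use an Excel function to calculate the p-value for the z-value: p-value = 1 - NORM.S.DIST(ABS(z), TRUE)
(NORM.S.DIST needs the second argument TRUE for the cumulative distribution. ABS is used because z is negative when the smaller of U and U' is normalized; the result is the one-sided p-value.)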
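As a rough cross-check of the normal approximation outside Excel, the same calculation can be sketched with the standard library. The sample sizes and the U value below are hypothetical, and the one-sided tail of |z| is an assumption about which p-value is wanted.

```python
import math


def u_to_z(u, n1, n2):
    """Normalize U with mean n1*n2/2 and variance n1*n2*(n1+n2+1)/12."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mu) / sigma


def one_sided_p(z):
    """Upper-tail probability of |z| under the standard normal distribution."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))


# Hypothetical case: n1 = n2 = 25 (beyond the table range) and U = 200.
z = u_to_z(200, 25, 25)
print(z, one_sided_p(z))
```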
Examples: Mann-Whitney U-test (Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)
Ex.1: U = 0 + 2 + 3 + 4 = 9, U' = 11
Ex.2: U = 3.5 + 5 + 5 + 5 + 5 = 23.5, U' = 1.5
Ex.3: U = 0 + 0.5 + 2.5 + 4 + 5 = 12, U' = 13

Mann-Whitney test table (rows and columns: n1 and n2), (p < 0.05)
         4     5     6    ...
...      -    ...   ...   ...
4        0     1     2    ...
5              2     3    ...
...                 ...   ...

Mann-Whitney test table (rows and columns: n1 and n2), (p < 0.01)
         4     5     6    ...
...      -    ...   ...   ...
4        -     -     0    ...
5              1     1    ...
...                 ...   ...
Exercise: Mann-Whitney U-test (Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)
U = 2.5 + 4 + 5 + 6 + 6 + 6 = 29.5, U' = 6.5

Mann-Whitney test table (rows and columns: n1 and n2), (p < 0.05)
         4     5     6     7
3        -     0     1     1
4        0     1     2     3
5              2     3     5
6                    5     6

Mann-Whitney test table (rows and columns: n1 and n2), (p < 0.01)
         4     5     6     7
3        -     -     -     -
4        -     -     0     0
5              1     1     1
6                    2     3

Since U' > 5, (p > 0.05): significance is not found.
Sign Test
(1) Sign Test: significance test between the # of winnings and losses.
(2) Wilcoxon's Signed Ranks Test: significance test using both the # of winnings and losses and the level of winnings/losses.

data of 2 groups    # of winnings and losses    the level of winnings/losses
173   174           -                           -1
143   137           +                           +6
158   151           +                           +7
156   143           +                           +13
176   180           -                           -4
165   162           +                           +3
Sign Test
1. Count the # of winnings and losses by comparing runs with the same initial data (e.g., at the n-th generation).
2. Check a sign test table to show the significance of the difference between the two methods.
Sign Test
[Figure: (+, -) marks along generations 0 to 50 for benchmark functions F1 to F8, each comparing DE_N with DE_LR, DE_LS, DE_FR_GLB_nD, DE_FR_LOC_nD, DE_FR_GLB_1D, and DE_FR_LOC_1D.]
Fig.3 in Y. Pei and H. Takagi, "Fourier analysis of the fitness landscape for evolutionary search acceleration," IEEE Congress on Evolutionary Computation (CEC), pp.1-7, Brisbane, Australia (June 10-15, 2012). The (+, -) marks show whether our proposed methods converge significantly better or poorer than normal DE, respectively (p ≤ 0.05).
Fig.2 in the same paper.
Task Example
Are the performances of pattern recognition methods A and B significantly different?
n1 cases: Both methods succeeded.
n2 cases: Method A succeeded, and method B failed.
n3 cases: Method A failed, and method B succeeded.
n4 cases: Both methods failed.

How to check?
1. Set N = n2 + n3.
2. Check the sign test table with the N.
3. If min(n2, n3) is smaller than the number in the table for the N, we can say that there is a significant difference at the significance level of XX.
Exercise
Is there a significant difference for n2 = 12 and n3 = 28?

ANSWER: Check the sign test table with N = 40. As n2 is bigger than 11 and smaller than 13, we can say that there is a significant difference between the two methods with (p < 0.05) but cannot say so with (p < 0.01).
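When no sign test table is at hand, the exercise can be cross-checked with an exact binomial calculation. This is a sketch under the assumption that the table corresponds to a two-sided test with win probability 0.5; the function name `sign_test_p` is illustrative and not from the slides.

```python
from math import comb


def sign_test_p(wins, losses):
    """Two-sided exact p-value of the sign test (ties already excluded)."""
    n = wins + losses
    k = min(wins, losses)
    # Probability of k or fewer "minority" outcomes under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Exercise values from the slide: n2 = 12 and n3 = 28 (N = 40).
p = sign_test_p(12, 28)
print(p, p < 0.05, p < 0.01)
```

Under this assumption the p-value for 12 vs. 28 falls between 0.01 and 0.05, which agrees with the answer read from the table.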
Sign Test
Let's think about the case of N = 17.
To say that n1 and n2 are significantly different,
(n1 vs. n2) = (17 vs. 0), (16 vs. 1), or (15 vs. 2) (p < 0.01)
or
(n1 vs. n2) = (14 vs. 3) or (13 vs. 4) (p < 0.05)

Exercise: Sign Test
Check the significance of:
・16 vs. 4
・14 vs. 1
・9 vs. 3
・18 vs. 5
Wilcoxon Signed-Ranks Test
Q: What should we do when a sign test could not show significance?
A: Try the Wilcoxon signed-ranks test. It is more sensitive than a simple sign test because it uses more information.
Wilcoxon Signed-Ranks Test
Columns of the calculation table: v (system A), v (system B), difference d, rank of |d|, and the rank with the sign of d added.
As T > 3, we cannot say that there is a significant difference between A and B.
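The columns of the calculation table (difference d, rank of |d|, signed rank) can be sketched as follows. The data are the six pairs from the slide that compares the sign test and the Wilcoxon test, not the data behind the "T > 3" statement; dropping zero differences and averaging tied ranks are common conventions assumed here.

```python
def wilcoxon_t(a, b):
    """Return (T, sum of positive ranks, sum of negative ranks)."""
    # Differences sorted by absolute value; zero differences are dropped.
    d = sorted((x - y for x, y in zip(a, b) if x != y), key=abs)
    pos = neg = 0.0
    i = 0
    while i < len(d):
        j = i
        while j < len(d) and abs(d[j]) == abs(d[i]):
            j += 1
        rank = (i + 1 + j) / 2  # average of ranks i+1 .. j for tied |d|
        for v in d[i:j]:
            if v > 0:
                pos += rank
            else:
                neg += rank
        i = j
    return min(pos, neg), pos, neg


system_a = [173, 143, 158, 156, 176, 165]
system_b = [174, 137, 151, 143, 180, 162]
print(wilcoxon_t(system_a, system_b))  # T is then compared with a table
```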
Exercise 2: Wilcoxon Signed-Ranks Test
Explain how to apply this test to check whether two groups are significantly different at the n-th generation shown in the figure.

Exercise 3: Wilcoxon Signed-Ranks Test
Kruskal-Wallis Test
Use this test when:
1. More than two groups are compared.
2. The data have no normality.
3. There is no correspondence among the data of the groups (independent).
Kruskal-Wallis Test
Let's use ranks of data.
[Figure: the 17 data of three groups are ranked from 1 to 17.]

Kruskal-Wallis Test
N: total # of data
k: # of groups
n_i: # of data of group i
R_i: sum of ranks of group i
R1 = 38, R2 = 69, R3 = 46

1. Rank all data.
2. Calculate N, k, n_i, and R_i.
3. Calculate the statistical value H (H = 6.609 in this example).
4. If k = 3 and N ≤ 17, compare the H with a significant point in a Kruskal-Wallis test table. Otherwise, assume that H follows the χ² distribution and test the H using a χ² distribution table.

significance point of (p<0.05): 5.765
significance point of (p<0.01): 8.124
Since the significant points of (p<0.05) and (p<0.01) for (n1, n2, n3) = (6, 5, 6) are 5.765 and 8.124, respectively, there are significant difference(s) somewhere among the three groups (p<0.05).
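The statistic H can be recomputed from the rank sums with the usual Kruskal-Wallis formula, H = 12 / (N (N + 1)) · Σ R_i² / n_i − 3 (N + 1). The sketch below uses the example values R = (38, 69, 46) and (n1, n2, n3) = (6, 5, 6); the function name is illustrative.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)."""
    n_total = sum(sizes)
    s = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)


h = kruskal_wallis_h([38, 69, 46], [6, 5, 6])
print(round(h, 3))  # compared with 5.765 (p<0.05) and 8.124 (p<0.01)
```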
Q1: Where is the significant difference among A, B, and C?
A1: Apply multiple comparisons between all pairs among columns.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
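Exercise: Kruskal-Wallis Test
Ranks of the three groups:
group 1: 1, 2, 4, 7, 10
group 2: 8, 11, 12, 13
group 3: 3, 5, 6, 9

Fill in: R1 = , R2 = , R3 = , N = n1 + n2 + n3 = , k = , (n1, n2, n3) = , (R1, R2, R3) = , and
$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) =$

Answer:
R1 = 24, R2 = 44, R3 = 23
N = 13 samples, k = 3 groups, (n1, n2, n3) = (5, 4, 4), (R1, R2, R3) = (24, 44, 23)
H = 6.227
significance point of (p<0.05): 5.657
significance point of (p<0.01): 7.760
There is/are significant difference(s) somewhere among the three groups (p<0.05).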
Friedman Test
When
(1) there are more than two groups,
(2) the data have correspondence (not independent), but
(3) the conditions of two-way ANOVA are not satisfied,
let's use ranks of data and the Friedman test.
(ex.) Comparison of recognition rates.
benchmark tasks     methods
                    a      b      c      d
A                   0.92   0.75   0.65   0.81
B                   0.48   0.45   0.41   0.52
C                   0.56   0.41   0.47   0.50
D                   0.61   0.50   0.56   0.54
[Figure: for each benchmark task, the rates of methods a, b, c, and d are ranked from 1 to 4.]
Friedman Test
Step 1: Make a ranking table.
Step 2: Sum ranks of the factor that you want to test.
Step 3: Calculate the Friedman test value, χ²_r:
$\chi_r^2 = \frac{12}{nk(k+1)} \sum_{i=1}^{k} R_i^2 - 3n(k+1)$
where (k, n) are the # of levels of factors 1 and 2 (here, # of methods k = 4 and # of data n = 4) and R_i is the rank sum of the i-th level.
Step 4: If k = 3 or 4, compare χ²_r with a significant point in a Friedman test table. Otherwise, use a χ² table of (k-1) degrees of freedom.

Ranking among methods
benchmark tasks     methods
                    a    b    c    d
A                   4    2    1    3
B                   3    2    1    4
C                   4    1    2    3
D                   4    1    3    2
Σ                   15   6    7    12
Example: Friedman Test
Step 1: Make a ranking table.
Step 2: Sum ranks of the factor that you want to test.
Step 3: Calculate the Friedman test value, χ²_r (χ²_r = 8.1 for the rank sums 15, 6, 7, and 12 with k = n = 4).
Step 4: Since the significant point for (k, n) = (4, 4) is 7.80, there is/are significant difference(s) somewhere among the four methods, a, b, c, and d (p<0.05).

A1: Apply multiple comparisons between all pairs among columns.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
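The four steps of the Friedman example can be traced with a short sketch that ranks the recognition rates of each benchmark task, sums the ranks per method, and evaluates χ²_r = 12 / (n k (k + 1)) · Σ R_i² − 3 n (k + 1). The helper names are illustrative, and the ranking helper assumes no ties within a row, which holds for this table.

```python
def rank_row(values):
    """Rank values within one row (1 = smallest); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def friedman_chi2(table):
    """Return (rank sums per column, Friedman test value)."""
    n = len(table)      # number of rows (benchmark tasks)
    k = len(table[0])   # number of columns (methods)
    ranks = [rank_row(row) for row in table]
    sums = [sum(col) for col in zip(*ranks)]
    chi2 = 12 / (n * k * (k + 1)) * sum(s * s for s in sums) - 3 * n * (k + 1)
    return sums, chi2


# Recognition rates of methods a, b, c, d on benchmark tasks A to D.
rates = [
    [0.92, 0.75, 0.65, 0.81],
    [0.48, 0.45, 0.41, 0.52],
    [0.56, 0.41, 0.47, 0.50],
    [0.61, 0.50, 0.56, 0.54],
]
sums, chi2 = friedman_chi2(rates)
print(sums, chi2)  # chi2 is compared with the significant point 7.80
```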
Multiple Comparisons
When there is a significant difference among groups, multiple comparisons are used to find which groups are significantly different from the others.

Example
4C2 = 6 pair comparisons, each with (p < 0.05):
1 - (1 - 0.05)^6 = 0.265, i.e., the overall significance level becomes 26.5%!
The solution is to apply the multiple pair comparisons with a stricter significance level.
Multiple Comparisons -- Bonferroni method --
When pair comparisons are applied m times, let's use a significance level of p / m.
Example: 4C2 = 6 pair comparisons with (p < 0.05 / 6)
Features:
(1) Simple.
(2) Rather strict, i.e., showing significance is rather hard.
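The arithmetic behind the 26.5% figure and the Bonferroni level can be checked directly. The sketch assumes the six comparisons are independent, which is what the expression 1 - (1 - 0.05)^6 implies.

```python
from math import comb

alpha = 0.05
m = comb(4, 2)  # number of pair comparisons among 4 groups

# Chance of at least one false positive over m independent comparisons.
familywise = 1 - (1 - alpha) ** m

# Bonferroni: test every pair at alpha / m instead of alpha.
per_comparison = alpha / m

print(m, round(familywise, 3), per_comparison)
```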
Multiple Comparisons -- Holm method --
A corrected Bonferroni method that detects significance more easily.

Example (six pair comparisons, sorted by p-value)
p-value    corrected p-value eqn.    corrected p-value
0.0076     = p-value * 6             0.0456
0.0095     = p-value * 5             0.0475
0.0280     = p-value * 4             0.1120
0.0320     = p-value * 3             0.0960
0.0380     = p-value * 2             0.0760
0.0410     = p-value * 1             0.0410
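The corrected p-values in the Holm example can be reproduced with a short sketch. The multipliers (6, 5, ..., 1) follow the slide; the step-down stopping rule (once one comparison fails, all larger p-values are also treated as non-significant) is the standard Holm procedure and is stated here as an addition, since the slide shows only the table.

```python
def holm(p_values, alpha=0.05):
    """Return (p, corrected p, significant) for each p-value, smallest first."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    results = []
    still_rejecting = True
    for rank, i in enumerate(order):
        corrected = p_values[i] * (m - rank)  # multipliers m, m-1, ..., 1
        still_rejecting = still_rejecting and corrected < alpha
        results.append((p_values[i], corrected, still_rejecting))
    return results


# p-values of the six pair comparisons in the slide.
for p, corrected, significant in holm([0.0076, 0.0095, 0.0280, 0.0320, 0.0380, 0.0410]):
    print(p, round(corrected, 4), significant)
```

With these values the first two comparisons remain significant at the 5% level, while plain Bonferroni (every p-value multiplied by 6) would keep only the first.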
Scheffé's Method of Paired Comparison
+Scheffé's method of paired comparison for Human Subjective Tests
Interactive Evolutionary Computation (IEC)
[Figure: Evolutionary Computation optimizes a target system based on human subjective evaluations. Application examples: lighting design of 3-D CG, measuring mental scale, geological simulation, hearing-aid fitting, MEMS design, image enhancement processing, room layout planning design, and room lighting design by optimizing LED assignments.]
Scheffé's Method of Paired Comparison
ANOVA based on nC2 paired comparisons for n objects.
[Figure: each pair is rated on a scale (even better, better, slightly better on each side); the ratings go into an ANOVA, followed by a significance check using a yardstick.]
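Scheffé's Method of Paired Comparison
Original method and three modified methods
・order effect: yes, all subjects must evaluate all pairs: no -> original (原法, 1952)
・order effect: yes, all subjects must evaluate all pairs: yes -> Ura's variation (浦の変法, 1956)
・order effect: no, all subjects must evaluate all pairs: no -> Haga's variation (芳賀の変法)
・order effect: no, all subjects must evaluate all pairs: yes -> Nakaya's variation (中屋の変法, 1970)

Order Effect
Presenting a pair in one order (1) and in the reverse order (2) may result in different evaluations.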
Scheffé's Method of Paired Comparison
1. Ask N human subjects to evaluate t objects in 3, 5 or 7 grades.
2. Assign [-1, +1], [-2, +2] or [-3, +3] for these grades.
3. Then, start calculation (see other material).

Questionnaire -> total raw data: paired comparisons for t = 3 objects by six subjects (N = 6)
            O1   O2   O3   O4   O5   O6
A1 - A2      2    1    1    2    1    2
A1 - A3      2    2    1    1    1    1
A2 - A3      1    0    1    1   -1    0
Application Example:
What is the best present to become her/his boy/girl friend?
[SITUATION] She/he is my longing. I want to become her/his boy/girl friend before we graduate from our university. To get over my one-way love, I decided to present something of about 3,000 JPY and express my heart.
I show you 5C2 pairs of presents. Please compare each pair and mark your relative evaluation in five levels.
Presents: strap for a mobile phone, invitation to a dinner, tea/coffee, stuffed animal, fountain pen
Ex. Q. ・・・
Results of Scheffé's Method of Paired Comparison (Nakaya's variation)
What is the best present to become her/his boy/girl friend?
[Figure: four preference scales from -1 (less effective) to 1 (more effective), for a present from a male and a present from a female, each shown as what the giver thinks effective ("I think effective.") and how it is received ("Reality is ..."); significant differences are marked.]
Comments in the figure:
・"I will catch her heart by dinner."
・"How about tea leaves or a stuffed animal?"
・"Eat! Eat! Eat!"
・"I hesitate to accept it as we have not gone about with him."
Scheffé's Method of Paired Comparison
Modified methods by Ura and Nakaya
Scheffé's Method of Paired Comparison
Modified method by Ura
Pairwise comparisons for objects which are affected by display order (order effect).
[Figure: rating scales from -2 to 2 (even better, better, slightly better on each side), presented for every pair in both display orders.]

Scheffé's Method of Paired Comparison
Modified method by Ura
Ask N human subjects to evaluate 2 × tC2 pairs for t objects in 3, 5 or 7 grades and assign [-1, +1], [-2, +2] or [-3, +3], respectively.
Scheffé's Method of Paired Comparison
Modified method by Ura
Step 1: Make a paired comparison table for each human subject.
[Figure: rating scales from -2 to 2 for the ordered pairs of A1 to A4; one table each for subjects O1, O2, and O3.]

        A1    A2    A3    A4
A1       -     0    -1    -1
A2       3     -     0     0
A3       3     1     -    -1
A4       3     3     1     -

$x_{ijl}$: evaluation value when the l-th human subject compares the i-th object with the j-th object.
Scheffé's Method of Paired Comparison
Modified method by Ura
Step 2: Make a table summing all subjects' data and calculate the average evaluations for all objects.
$\hat{\alpha}_i = \frac{1}{2tN} (x_{i\cdot\cdot} - x_{\cdot i\cdot})$
where t: # of objects (4), N: # of human subjects (3)

$x_{i\cdot\cdot} - x_{\cdot i\cdot}$ for A1, A2, A3, A4: 27, 13, -12, -28
Average of four objects: $\hat{\alpha}_1$ = 1.1250 (A1), $\hat{\alpha}_2$ = 0.5417 (A2), $\hat{\alpha}_3$ = -0.5000 (A3), $\hat{\alpha}_4$ = -1.1667 (A4)
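The averages plotted on the scale can be recovered from the four totals. The sketch reads 27, 13, -12, and -28 as the differences x_i.. - x_.i. for A1 to A4, an interpretation supported by the fact that dividing them by 2tN = 24 reproduces 1.1250, 0.5417, -0.5000, and -1.1667; the function name is illustrative.

```python
def ura_average_preferences(diffs, n_subjects):
    """alpha_i = (x_i.. - x_.i.) / (2 t N) for Ura's variation."""
    t = len(diffs)  # number of objects
    return [d / (2 * t * n_subjects) for d in diffs]


# Totals for A1..A4 with t = 4 objects and N = 3 subjects.
alphas = ura_average_preferences([27, 13, -12, -28], n_subjects=3)
print([round(a, 4) for a in alphas])
```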
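$S_\alpha = \frac{1}{2tN} \sum_i (x_{i\cdot\cdot} - x_{\cdot i\cdot})^2$
$S_{\alpha(B)} = \frac{1}{2t} \sum_l \sum_i (x_{i\cdot l} - x_{\cdot il})^2 - S_\alpha$
$S_\gamma = \frac{1}{2N} \sum_i \sum_{j>i} (x_{ij\cdot} - x_{ji\cdot})^2 - S_\alpha$
$S_\delta = \frac{1}{Nt(t-1)} x_{\cdot\cdot\cdot}^2$
$S_{\delta(B)} = \frac{1}{t(t-1)} \sum_l x_{\cdot\cdot l}^2 - S_\delta$
$S_T = \sum_l \sum_i \sum_{j \ne i} x_{ijl}^2$
$S_\varepsilon = S_T - S_\alpha - S_{\alpha(B)} - S_\gamma - S_\delta - S_{\delta(B)}$
where $S_\alpha$, $S_{\alpha(B)}$, $S_\gamma$, $S_\delta$, $S_{\delta(B)}$, $S_\varepsilon$, and $S_T$ are sums of squares and f is the degree of freedom.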
Scheffé's Method of Paired Comparison
Modified method by Ura
Step 3: Make an ANOVA table.
unbiased variance = S / f
F = (unbiased variance) / (unbiased variance of $S_\varepsilon$), for F tests.
Scheffé's Method of Paired Comparison
Modified method by Ura
ANOVA table.
[Figure: averages on a scale: A4 = -1.1667, A3 = -0.5000, A2 = 0.5417, A1 = 1.1250.]
There are significant differences among A1 - A4.
Scheffé's Method of Paired Comparison
Modified method by Ura
Step 4: Apply multiple comparisons between all pairs and find which distance is significant.
Q1: Where is the significant difference among A1, A2, and A3?
A1: Apply multiple comparisons between all pairs.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)

Example of a simple multiple comparison (see the next slide):
• Calculate a studentized yardstick.
• When a difference of averages > the studentized yardstick, the distance is significant.
[Figure: averages on a scale: A4 = -1.1667, A3 = -0.5000, A2 = 0.5417, A1 = 1.1250.]
Scheffé's Method of Paired Comparison
Modified method by Ura
Step 4: Example of a simple multiple comparison.
$Y_\phi = q_\phi(t, f) \sqrt{\hat{\sigma}^2 / (2tN)}$ (studentized yardstick)
where $(\hat{\sigma}^2, t, N)$ are an unbiased variance of $S_\varepsilon$, the # of objects, and the # of human subjects; $q_\phi(t, f)$ is a studentized range obtained from a statistical test table for t, the degree of freedom f of $S_\varepsilon$, and the significance level φ; see these variables in an ANOVA table.
When (t, f) = (4, 21), studentized yardsticks are calculated for the significance levels of 5% and 1%.
Scheffé's Method of Paired Comparison
Modified method by Nakaya
Pairwise comparisons for objects that can be compared without order effect.
[Figure: rating scales from -2 to 2 (even better, better, slightly better on each side), one per pair.]
Scheffé's Method of Paired Comparison
Modified method by Nakaya
1. Ask N human subjects to evaluate t objects in 3, 5 or 7 grades.
2. Assign [-1, +1], [-2, +2] or [-3, +3] for these grades, respectively.
3. Then, start calculation (see other material).

Questionnaire: paired comparisons for t = 3 objects by six human subjects (N = 6)
            O1   O2   O3   O4   O5   O6
A1 - A2      2    3    3    2    0    1
A1 - A3      2    0    0    1    1    0
A2 - A3     -3   -2   -1   -1   -3   -2
Scheffé's Method of Paired Comparison
Modified method by Nakaya
Step 1: Make a paired comparison table for each human subject.
$x_{ijl}$: evaluation value when the l-th human subject compares the i-th object with the j-th object.

Step 2: Make a table summing all subjects' data and calculate the average evaluations for all objects.
$\hat{\alpha}_i = \frac{1}{tN} x_{i\cdot\cdot}$
Average of three objects, where t: # of objects (3), N: # of human subjects (6)
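Step 2 of Nakaya's variation can be sketched with the questionnaire table of this example (rows A1 - A2, A1 - A3, A2 - A3 for subjects O1 to O6). The sketch assumes that an entry in row "Ai - Aj" is x_ijl and that the reversed comparison is its negative (x_jil = -x_ijl), which is how x_i.. is accumulated here; the data layout and function name are illustrative.

```python
def nakaya_average_preferences(pair_scores, t):
    """alpha_i = x_i.. / (t N), assuming x_jil = -x_ijl."""
    n_subjects = len(next(iter(pair_scores.values())))
    totals = [0] * t
    for (i, j), scores in pair_scores.items():
        s = sum(scores)
        totals[i] += s  # object i gets the score as given
        totals[j] -= s  # object j gets the opposite sign
    return [x / (t * n_subjects) for x in totals]


# Keys are 0-based object indices: (0, 1) stands for the row "A1 - A2".
scores = {
    (0, 1): [2, 3, 3, 2, 0, 1],
    (0, 2): [2, 0, 0, 1, 1, 0],
    (1, 2): [-3, -2, -1, -1, -3, -2],
}
print([round(a, 4) for a in nakaya_average_preferences(scores, t=3)])
```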
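$S_\alpha = \frac{1}{tN} \sum_i x_{i\cdot\cdot}^2$
$S_{\alpha(B)} = \frac{1}{t} \sum_l \sum_i x_{i\cdot l}^2 - S_\alpha$
$S_\varepsilon = S_T - S_\alpha - S_{\alpha(B)} - S_\gamma$
$F = \frac{\text{unbiased variance}}{\text{unbiased variance of } S_\varepsilon}$
There are significant differences among A1 - A3.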
Scheffé's Method of Paired Comparison
Modified method by Nakaya
Step 3: Make an ANOVA table.

Scheffé's Method of Paired Comparison
Modified method by Nakaya
Step 4: Apply multiple comparisons.
Q1: Where is the significant difference among A1, A2, and A3?
A1: Apply multiple comparisons between all pairs among columns.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
Scheffé's Method of Paired Comparison
Modified method by Nakaya
Step 4: Apply multiple comparisons between all pairs and find which distance is significant.
Example of a simple multiple comparison:
• Calculate a studentized yardstick.
• When a difference of averages > the studentized yardstick, the distance is significant.

$Y_\phi = q_\phi(t, f) \sqrt{\hat{\sigma}^2 / (tN)}$ (studentized yardstick)
$Y_{0.05} = 4.60 \sqrt{1.79 / (3 \cdot 6)} = 1.4506$
$Y_{0.01} = 6.97 \sqrt{1.79 / (3 \cdot 6)} = 2.1980$
where $(\hat{\sigma}^2, t, N)$ are an unbiased variance of $S_\varepsilon$, the # of objects, and the # of human subjects; $q_\phi(t, f)$ is a studentized range obtained from a statistical test table for t, the degree of freedom f of $S_\varepsilon$, and the significance level φ; see these variables in an ANOVA table.
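The yardstick values of the Nakaya example can be recomputed as a check. The numbers 4.60 and 6.97 (studentized ranges for 5% and 1%), the unbiased variance 1.79, t = 3, and N = 6 are read from the slide; the `both_orders` switch for the 2tN denominator of Ura's variation is an illustrative parameter, not something named in the slides.

```python
import math


def yardstick(q, variance, t, n_subjects, both_orders=False):
    """Y = q * sqrt(variance / (t N)); Ura's variation divides by 2 t N."""
    denom = (2 if both_orders else 1) * t * n_subjects
    return q * math.sqrt(variance / denom)


y05 = yardstick(4.60, 1.79, t=3, n_subjects=6)
y01 = yardstick(6.97, 1.79, t=3, n_subjects=6)
print(round(y05, 4), round(y01, 4))
```

A pair of objects is then judged significantly different when the gap between their averages exceeds the yardstick for the chosen level.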