SoC Presentation Title 2004
Comments from Pre-submission Presentation
Q: Why does kNN score so much lower than SVM (about 10% lower) on the Reuters and 20 Newsgroups corpora?
A: Refer to the following four references: [Joachims 98] [Debole 03 STM] [Dumais 98 Inductive] [Yang 99 Reexamination]
[Joachims98] [Debole03] [Dumais98] Results on the Reuters Corpus

                Bayes   Rocchio   C4.5    kNN    SVM (linear)   SVM (poly)   SVM (rbf)
Micro-BEP (%)   69.84   79.14     77.78   82.5   84.2           86           86

                kNN     SVM (linear)
Micro-F1        85.4    92.0

                NBayes  DT      SVM (linear)
Micro-BEP       81.5    88.4    92.0
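The tables above report micro-averaged scores, while the significance tests later in the deck also look at the macro level. As a minimal sketch of the difference, assuming per-category counts of true positives, false positives and false negatives are available (the function names and the counts below are hypothetical, not taken from the cited papers):

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts):
    """counts: list of (tp, fp, fn) tuples, one per category.

    Micro-averaging pools the counts over all categories first, so large
    categories dominate. Macro-averaging computes F1 per category and then
    takes the unweighted mean, so every category weighs the same.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    micro = f1(tp, fp, fn)
    macro = sum(f1(*c) for c in counts) / len(counts)
    return micro, macro


# Hypothetical counts: one large, easy category and one small, hard one.
counts = [(90, 10, 10), (1, 1, 3)]
micro, macro = micro_macro_f1(counts)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

With skewed category sizes such as these, the micro-averaged score sits close to the score of the large category, which is one reason a ranking of classifiers can differ between the two levels.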
[Yang 99 Reexamination] Significance Test
Micro-level analysis (s-test)
SVM > kNN >> {LLSF, NNet} >> NB
Macro-level analysis
{SVM, kNN, LLSF} >> {NB, NNet}
Error-rate based comparison
{SVM, kNN} > LLSF > NNet >> NB
Comments from Pre-submission Presentation
2. Explain why BEP and F1 are used in Chapter 7
- Add a reference
Breakeven point (1)
The breakeven point (BEP) was first proposed by Lewis [1992]. He later pointed out that BEP is not a good effectiveness measure, because:
1. there may be no parameter setting that yields the breakeven; in this case the final BEP value, obtained by interpolation, is artificial;
2. having P = R is not necessarily desirable, and it is not clear that a system that achieves a high BEP can be tuned to score high on other effectiveness measures.
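The first objection can be made concrete with a small sketch. It assumes one common interpolation convention, which the slide does not spell out: take the (P, R) pair where precision and recall are closest and report their average. The function name and the sample values are hypothetical.

```python
def breakeven_point(pr_pairs):
    """Return (bep, gap) for a list of (precision, recall) pairs.

    Assumed convention: pick the pair with the smallest |P - R| and
    report the mean of P and R as the (interpolated) breakeven point.
    gap is |P - R| at that pair; gap == 0 means a true breakeven exists.
    """
    if not pr_pairs:
        raise ValueError("need at least one (precision, recall) pair")
    p, r = min(pr_pairs, key=lambda pr: abs(pr[0] - pr[1]))
    return (p + r) / 2, abs(p - r)


# System A reaches P = R at one threshold, so its BEP is an observed value.
pairs_a = [(0.90, 0.60), (0.80, 0.80), (0.65, 0.90)]
# System B never gets P and R close, so its BEP is purely interpolated.
pairs_b = [(0.95, 0.40), (0.50, 0.90)]

bep_a, gap_a = breakeven_point(pairs_a)
bep_b, gap_b = breakeven_point(pairs_b)
print(f"A: BEP = {bep_a:.2f}, gap = {gap_a:.2f}")
print(f"B: BEP = {bep_b:.2f}, gap = {gap_b:.2f}")
```

Reporting the gap next to the BEP shows when the value is artificial: system B never operates at the point its BEP describes.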
Breakeven point (2)
[Yang 99 Reexamination] also noted that when P and R are not close enough for any value of the parameters, the interpolated breakeven may not be a reliable indicator of effectiveness.
Comments from Pre-submission Presentation
3. Adding more qualitative analysis would be better
Analysis and Proposal: Empirical observation

            Category: 00_acq              Category: 03_earn
Feature     idf     rf      chi2          idf     rf      chi2
acquir      3.553   4.368   850.66        3.553   1.074   81.50
stake       4.201   2.975   303.94        4.201   1.082   31.26
payout      4.999   1       10.87         4.999   7.820   44.68
dividend    3.567   1.033   46.63         3.567   4.408   295.46

Comparison of idf, rf and chi2 values of four features in two categories of the Reuters corpus
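The pattern in the table (idf identical across categories, rf and chi2 changing with the category) can be illustrated with a small sketch. The slide does not give the formulas, so the definitions below are assumptions: idf = log2(N / df), rf = log2(2 + a / max(1, c)) as used in the relevance-frequency literature, and the standard chi-square statistic on the 2x2 term/category table. All counts are hypothetical and do not reproduce the Reuters figures.

```python
import math


def term_weights(a, b, c, d):
    """Return (idf, rf, chi2) for one term and one category.

    a: documents in the category that contain the term
    b: documents in the category that do not contain the term
    c: documents outside the category that contain the term
    d: documents outside the category that do not contain the term
    The formulas are assumed definitions, not taken from the slides.
    """
    n = a + b + c + d
    idf = math.log2(n / (a + c))          # depends only on a + c
    rf = math.log2(2 + a / max(1, c))     # depends on the category split
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return idf, rf, chi2


# The same hypothetical term (110 documents out of 1000) seen from two categories.
in_cat_a = term_weights(a=100, b=100, c=10, d=790)   # term concentrated here
in_cat_b = term_weights(a=5, b=295, c=105, d=595)    # term rare here

for name, (idf, rf, chi2) in (("A", in_cat_a), ("B", in_cat_b)):
    print(f"category {name}: idf = {idf:.3f}, rf = {rf:.3f}, chi2 = {chi2:.2f}")
```

Under this assumed rf definition, a term that never occurs in the category (a = 0) gets rf = log2(2) = 1, which would be consistent with the value shown for payout under 00_acq.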
Comments from Pre-submission Presentation
4. Chapter 7: remove Joachims' results; quoting them is fine
Comments from Pre-submission Presentation
5. Tone down “best” claims
Qualify them with “to our knowledge” (or experience, understanding)
Pay attention to this usage when presenting
Introduction: Other Text Representations
• Word senses (meanings) [Kehagias 2001]
the same word assumes different meanings in different contexts
• Term clustering [Lewis 1992]
group words with a high degree of pairwise semantic relatedness
• Semantic and syntactic representation [Scott & Matwin 1999]
relationships between words, e.g. phrases, synonyms and hypernyms