Searching Questions by Identifying Question Topic and Question Focus
Huizhong Duan, Yunbo Cao, Chin-Yew Lin and Yong Yu
Shanghai Jiao Tong University & MSRA
ACL 2008
Rick Liu, 2008/7/9
Introduction
Questions and their answers
  Very large archives built up by online services
Examples
  Traditional FAQ services
  Community-based Q&A services
    ▪ Emerging
    ▪ Yahoo! Answers, Live QnA, Baidu Zhidao
Motivation
Question search
  Helps users find previously posted answers
Queried question:
  Any cool clubs in Berlin or Hamburg?
Candidate questions:
  What are the best/most fun clubs in Berlin?
  Any nice hotels in Berlin or Hamburg?
  How long does it take to Hamburg from Berlin?
  Cheap hotels in Berlin?
Motivation
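Any cool clubs in Berlin or Hamburg?
  Question topic: the major context of the question (Berlin, Hamburg)
  Question focus: the aspect of the topic being asked about (cool clubs)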
Approach
Identifying question topic & focus
  Question tree
  Determining the tree cut
Modeling question topic & focus for search
  Language model
Question Tree
Topic terms: BaseNP, WH-ngram
Topic profile: probability distribution of a topic term over categories
Specificity: inverse of the entropy of the topic profile
Topic chain: topic terms ordered by specificity value (descending)
Topic tree
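The definitions above can be illustrated with a small sketch. This is not the authors' implementation: the category names, the toy counts, and the smoothing constant `eps` (which keeps the inverse finite when the entropy is zero) are assumptions made for illustration.

```python
import math
from collections import Counter

def topic_profile(category_counts):
    """Normalize per-category counts of a topic term into a probability distribution."""
    total = sum(category_counts.values())
    return {c: n / total for c, n in category_counts.items()}

def specificity(profile, eps=1e-3):
    """Inverse of the entropy of the topic profile; eps is an assumed smoothing constant."""
    entropy = -sum(p * math.log(p) for p in profile.values() if p > 0)
    return 1.0 / (entropy + eps)

def topic_chain(terms, counts_by_term):
    """Order the topic terms of one question by descending specificity."""
    spec = {t: specificity(topic_profile(counts_by_term[t])) for t in terms}
    return sorted(terms, key=lambda t: spec[t], reverse=True)

# Hypothetical counts: how often each term occurs in questions of each category.
counts_by_term = {
    "hamburg": Counter({"Germany": 48, "Europe": 2}),
    "clubs": Counter({"Germany": 10, "Europe": 10, "United States": 10}),
}
chain = topic_chain(["clubs", "hamburg"], counts_by_term)
```

A term concentrated in one category (here "hamburg") has low entropy and therefore high specificity, so it is placed ahead of a term spread evenly across categories.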
Question Tree Example
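The figure for this slide is not preserved in this text. As a rough stand-in, the sketch below assumes that the question tree is a prefix tree built over topic chains; the chains, the nested-dict layout, and the function name `build_question_tree` are hypothetical.

```python
def build_question_tree(chains):
    """Build a prefix tree (nested dicts) over topic chains, counting questions per node."""
    root = {"count": 0, "children": {}}
    for chain in chains:
        node = root
        node["count"] += 1
        for term in chain:
            node = node["children"].setdefault(term, {"count": 0, "children": {}})
            node["count"] += 1
    return root

# Hypothetical topic chains, already ordered by descending specificity.
chains = [
    ["hamburg", "clubs"],
    ["hamburg", "hotels"],
    ["berlin", "clubs"],
    ["berlin", "hotels"],
    ["berlin", "hotels"],
]
tree = build_question_tree(chains)
```

Chains that share a leading term share a path from the root, so the node counts give the frequencies that a tree cut can later be chosen from.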
Tree Cut Model
$M = (\Gamma, \theta)$
  $\Gamma = [C_1, C_2, \ldots, C_k]$: the tree cut
  $\theta = [P(C_1), P(C_2), \ldots, P(C_k)]$: the probability parameter vector
  $\sum_{i=1}^{k} P(C_i) = 1$
A cut is any set of nodes in the tree that defines a partition
Tree Cut Model Example
[n0, n11], [n12, n21, n22, n23], [n13, n24]
[n11, n21, n22, n23, n24]
MDL-based Tree Cut Model
Minimum Description Length (MDL)
Ref: Li and Abe, 1998
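To make the MDL criterion concrete, here is a minimal sketch in the spirit of Li and Abe (1998). It assumes the usual decomposition into a parameter description length, (k/2) log |S|, and a data description length, the negative log-likelihood of the sample, and it drops the model description length on the assumption that all cuts are equally likely. The input layout and the toy frequencies are hypothetical.

```python
import math

def description_length(cut):
    """Parameter plus data description length (in bits) of a cut.

    `cut` is a list of classes; each class maps a member node to its observed frequency.
    """
    sample_size = sum(sum(cls.values()) for cls in cut)
    k = len(cut) - 1  # free parameters: the class probabilities sum to 1
    param_len = 0.5 * k * math.log2(sample_size)
    data_len = 0.0
    for cls in cut:
        class_freq = sum(cls.values())
        if class_freq == 0:
            continue  # classes with no observations add nothing to the data cost
        # Members of a class share the class probability uniformly.
        p_member = class_freq / sample_size / len(cls)
        data_len -= class_freq * math.log2(p_member)
    return param_len + data_len

# Hypothetical frequencies for four leaf nodes.
fine_cut = [{"a": 4}, {"b": 4}, {"c": 1}, {"d": 1}]  # every leaf is its own class
coarse_cut = [{"a": 4, "b": 4}, {"c": 1, "d": 1}]    # leaves merged under two parents
best = min([fine_cut, coarse_cut], key=description_length)
```

In this toy case both cuts fit the data equally well, so the coarser cut wins because it needs fewer parameters.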
Determining the Tree Cut
HEAD: question topic
TAIL: question focus
Modeling for Search
$P(q \mid \tilde{q})$
  $q$: queried question
  $\tilde{q}$: targeted question
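One way to read this model is as a weighted mixture of a topic (head) score and a focus (tail) score, roughly λ·P(H(q) | H(q̃)) + (1 − λ)·P(T(q) | T(q̃)). The sketch below assumes that form. The unigram model with collection smoothing, the smoothing weight `alpha`, and the toy data are assumptions for illustration, not details taken from the slides.

```python
from collections import Counter

def unigram_prob(terms, doc_terms, collection, alpha=0.2):
    """P(terms | doc_terms) under a unigram model, interpolated with collection frequencies."""
    doc, coll = Counter(doc_terms), Counter(collection)
    prob = 1.0
    for t in terms:
        p_doc = doc[t] / len(doc_terms) if doc_terms else 0.0
        p_coll = coll[t] / len(collection)
        prob *= (1 - alpha) * p_doc + alpha * p_coll
    return prob

def score(q_head, q_tail, c_head, c_tail, collection, lam=0.7):
    """lam * P(H(q) | H(q~)) + (1 - lam) * P(T(q) | T(q~))."""
    return (lam * unigram_prob(q_head, c_head, collection)
            + (1 - lam) * unigram_prob(q_tail, c_tail, collection))

# Hypothetical collection and a query split into head (topic) and tail (focus).
collection = ["berlin", "hamburg", "clubs", "hotels", "berlin", "hotels"]
query = (["berlin"], ["clubs"])
same_topic_and_focus = score(*query, ["berlin"], ["clubs"], collection)
same_topic_only = score(*query, ["berlin"], ["hotels"], collection)
same_focus_only = score(*query, ["hamburg"], ["clubs"], collection)
```

With λ above 0.5, a candidate that shares only the topic outranks one that shares only the focus.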
Experimental Data
Yahoo! Answers
  Resolved questions
    ▪ travel: 314,616 items
    ▪ computers & internet: 210,785 items
  Three fields
    ▪ title (the only field used)
    ▪ description
    ▪ answers
Ground Truth
Employed the Vector Space Model (VSM) to retrieve candidate results
Manual judgments: relevant / irrelevant
Baselines: VSM, LMIR
Evaluation: MAP, R-precision, MRR
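For reference, the three evaluation measures can be computed as in the sketch below, which uses their standard definitions. The ranked lists and relevance judgments are made up for illustration and are not taken from the experiments.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where relevant items appear."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision over the top R results, where R is the number of relevant items."""
    r = len(relevant)
    return sum(1 for item in ranked[:r] if item in relevant) / r if r else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical runs: (ranked result ids, set of ids judged relevant).
runs = [
    (["q3", "q1", "q7", "q2"], {"q1", "q2"}),
    (["q5", "q9", "q4"], {"q5"}),
]
map_score = mean(average_precision(r, rel) for r, rel in runs)
r_prec = mean(r_precision(r, rel) for r, rel in runs)
mrr = mean(reciprocal_rank(r, rel) for r, rel in runs)
```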
Results for ‘travel’
Results for ‘computers & internet’
About λ
More emphasis is placed on the question topic
Error Analysis ( travel )
Examined the correctness of the identified question topics and question foci
200 queried questions => 69 questions identified incorrectly
  (a) Only the head part is present (59)
  (b) Incorrect order (10)
(a) explains why λ is 0.7
Related Work
FAQ data
Community-based
  Jeon et al., 2005
  Compared four different retrieval methods
    ▪ Vector space model
    ▪ Okapi
    ▪ Language model
    ▪ Translation-based model
  Translation-based model performed the best
Translation Model
Lexical chasm
  Where to stay in Hamburg?
  The best hotel in Hamburg?
IBM Model 1
  Uses question titles and question descriptions as the parallel corpus
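As an illustration of how word translation probabilities can bridge the lexical chasm, the following is a minimal sketch of IBM Model 1 trained with EM. It omits the NULL word, and the toy pairs, the token lists, and the choice of which side plays the source are assumptions, not the setup used in the experiments.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with EM.

    Each pair is (source_tokens, target_tokens). The result maps
    (target_word, source_word) to a probability.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start uniform over every word pair that co-occurs in some sentence pair.
    t = {(w, s): uniform for src, tgt in pairs for s in src for w in tgt}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target word over the source words of its pair.
        for src, tgt in pairs:
            for w in tgt:
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalize the expected counts per source word.
        t = {(w, s): c / total[s] for (w, s), c in count.items()}
    return t

# Hypothetical parallel pairs (for example, title tokens and description tokens).
pairs = [
    (["stay", "hamburg"], ["hotel", "hamburg"]),
    (["stay", "berlin"], ["hotel", "berlin"]),
    (["clubs", "berlin"], ["clubs", "berlin"]),
]
t = train_ibm_model1(pairs)
```

After training, "stay" translates to "hotel" with higher probability than to either city name, which is the kind of association that lets "Where to stay in Hamburg?" match "The best hotel in Hamburg?".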
Results
Conclusions and Future Work
1) Data structure
2) MDL-based tree cut model used to identify question topic and question focus
3) A new form of language modeling for question search
4) Extensive experiments
Future work
  Currently only community-based Q&A data
  Extend to questions from forum sites / FAQ sites
Thanks
Modeling for Search
Translation Probability