TREC-2006 at Maryland: Blog, Enterprise and QA Tracks

Douglas Oard, Tamer Elsayed, Yejun Wu, Pengyi Zhang, Eileen Abels, Jimmy Lin, and Dagobert Soergel

Enterprise Track: Expert Search Task

Email-based Scoring

A candidate's score for topic T sums the support it receives from each email d in the retrieved set R_T. Support is the email's similarity to the topic weighted by the candidate's association with that email. The association sums the reference credit w_f(cand) that the candidate earns in each field f of the email:

$$\mathrm{score}(cand \mid T) = \sum_{d \in R_T} \mathrm{support}(cand \mid d, T), \qquad \mathrm{support}(cand \mid d, T) = \mathrm{sim}(d, T) \cdot \mathrm{assoc}(cand, d)$$

$$\mathrm{assoc}(cand, d) = \sum_{f \in d} w_f(cand)$$

Thread-based Scoring

Thread-based scoring sums over the emails d in each retrieved thread h ∈ H:

$$\mathrm{score}(cand \mid T) = \sum_{h \in H} \sum_{d \in h} \mathrm{support}(cand \mid d, T), \qquad \mathrm{support}(cand \mid d, T) = \mathrm{sim}(\mathrm{root}(h), T) \cdot \mathrm{level}(d, h) \cdot \mathrm{assoc}(cand, d)$$

[Figures: Retrieval Results and Supported Retrieval Results. Each shows the per-topic difference from the median AP for Title+Narr queries with email-based retrieval (email-retrieved, and email-retrieved+supported), with topics sorted by difference. Email Support Ratio and Number of Support Emails are shown per topic, with topics sorted by email support ratio. The same differences from the median AP are repeated with topics sorted by email support ratio. The last panel shows the difference from the best AP for Title+Narr/Email-retrieved+supported, with topics sorted by email support ratio (R² = 0.4758).]

Future Work
• Improved reference resolution
• Parameter tuning for weighted-field credit
• Learning from reply features
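The email-based and thread-based scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's implementation:

* It assumes sim(d, T), sim(root(h), T), and assoc(cand, d) have already been computed, for example by the retrieval engine and by reference resolution.
* The function names, the input dictionaries, and the `level` callback are hypothetical.
* The thread-based variant follows the decomposition sim(root(h), T) · level(d, h) · assoc(cand, d) reconstructed from the poster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def email_based_scores(
    retrieved: Iterable[str],
    sim: Dict[str, float],
    assoc: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """score(cand|T) = sum over d in R_T of sim(d, T) * assoc(cand, d)."""
    scores: Dict[str, float] = defaultdict(float)
    for d in retrieved:
        for cand, a in assoc.get(d, {}).items():
            scores[cand] += sim[d] * a  # support(cand | d, T)
    return dict(scores)


def thread_based_scores(
    threads: Dict[str, List[str]],
    root_sim: Dict[str, float],
    level: Callable[[str, str], float],
    assoc: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """score(cand|T) = sum over h in H and d in h of sim(root(h), T) * level(d, h) * assoc(cand, d).

    `level` is a hypothetical callback giving a weight for email d's position in thread h.
    """
    scores: Dict[str, float] = defaultdict(float)
    for h, emails in threads.items():
        for d in emails:
            for cand, a in assoc.get(d, {}).items():
                scores[cand] += root_sim[h] * level(d, h) * a
    return dict(scores)


# Toy example with made-up numbers: two retrieved emails.
assoc = {"e1": {"alice": 2.0, "bob": 1.0}, "e2": {"alice": 1.0}}
print(email_based_scores(["e1", "e2"], {"e1": 0.5, "e2": 0.25}, assoc))
# {'alice': 1.25, 'bob': 0.5}
```

Ranking candidates is then a matter of sorting the returned dictionary by value in descending order.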
College of Information Studies / Computer Science Department / UMIACS, University of Maryland, College Park, USA

[Figure: expert search system architecture. Components: W3C mailing lists, duplicate removal, email and thread index, retrieval engine (topic in, ranked list out), models of identity (email addresses, full names, nicknames), enriched candidate models, reference recognition, and candidate scoring.]

Reference credit w_f for email fields:
Field | w_f
Sender | 2.0
Receiver | 1.0
Subject | 1.0
New text | tf
Quoted sender | 1.0
Quoted receiver | 0.5
Quoted text | tf

Results

Query | Approach | Retrieval MAP | Retrieval P@10 | Supported MAP | Supported P@10
Title | Email | 0.195 | 0.406 | 0.072 | 0.182
Title + Narrative | Email | 0.350 | 0.504 | 0.141 | 0.298
Title | Thread | 0.218 | 0.449 | 0.090 | 0.198
Title + Narrative | Thread | 0.343 | 0.514 | 0.139 | 0.294
Title + Description | Thread | 0.315 | 0.502 | 0.119 | 0.278
Avg. of medians | | 0.341 | 0.508 | 0.154 | 0.294
Removing non-email-supported | | 0.365 | 0.525 | 0.147 | 0.311

How Much Email Support over Topics?
Performance Relative to Email Support
[Figure: difference from the best AP for Title+Narr/Email-retrieved, topics sorted by email support ratio (R² = 0.3531).]

Conclusion
• Performance is about average: the best runs are close to the average of the per-topic medians.
• Threads help with short queries (title only: MAP 0.218 with threads vs. 0.195 with emails).
• Topics with more email support are answered more accurately.

Blog Track: Opinion Retrieval Task

Task goal: locating blog posts that express an opinion about a target.
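The reference-credit table can be turned into an assoc(cand, d) function. This is a minimal sketch under three assumptions:

* Each email has already been reduced to the candidate ids recognized in each field; reference recognition itself is not shown.
* The field names used as dictionary keys are hypothetical.
* The table's "tf" entries for new and quoted text are read here as one unit of credit per mention. The poster does not define "tf" further.

```python
from typing import Dict, List

# Reference credit w_f per header field, as listed in the poster's table.
FIELD_CREDIT = {
    "sender": 2.0,
    "receiver": 1.0,
    "subject": 1.0,
    "quoted_sender": 1.0,
    "quoted_receiver": 0.5,
}
# Fields whose credit the table gives as "tf"; ASSUMED: one unit of credit per mention.
TF_FIELDS = {"new_text", "quoted_text"}


def assoc(cand: str, email: Dict[str, List[str]]) -> float:
    """assoc(cand, d) = sum over the fields f of d of w_f(cand).

    `email` maps a (hypothetical) field name to the candidate ids that
    reference recognition found in that field.
    """
    total = 0.0
    for field, refs in email.items():
        mentions = refs.count(cand)
        if mentions == 0:
            continue
        if field in TF_FIELDS:
            total += mentions  # credit grows with the number of mentions
        else:
            total += FIELD_CREDIT.get(field, 0.0)  # fixed credit for a header field
    return total


email = {
    "sender": ["alice"],
    "receiver": ["bob", "carol"],
    "quoted_receiver": ["alice"],
    "new_text": ["bob", "bob"],
}
print(assoc("alice", email), assoc("bob", email), assoc("carol", email))
# 2.5 3.0 1.0
```

Under this reading, the sender of an email always earns more fixed credit than any single receiver. Someone mentioned repeatedly in the body text can still accumulate more credit than the sender.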
Retrieval unit: permalinks (postings + comments), 3,215,171 documents.

Compute the semantic orientation (SO) of words (Turney & Littman, 2002):

$$\mathrm{PMI}(w_1, w_2) = \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(w_1 \ \mathrm{NEAR}\ w_2)}{\frac{1}{N}\,\mathrm{hits}(w_1) \cdot \frac{1}{N}\,\mathrm{hits}(w_2)}$$

$$\mathrm{SO\text{-}PMI}(w) = \mathrm{PMI}(w, \{\text{positive paradigms}\}) - \mathrm{PMI}(w, \{\text{negative paradigms}\})$$

[Figure: opinion scoring. Example paragraphs Par0001 to Par0005 of a document are shown with opinion scores. Word SO is computed as above and combined with Wilson & Wiebe's lexicon. Documents whose normalized opinion score is below 0.15 are demoted by a factor of 2 or 3.]

The run names appear to encode three choices:
• Retrieval unit: Par = paragraphs, Pas = fixed-size passages
• Query: Ti = title only, TiDes = title + description
• Treatment: Def = default, Dmt2/Dmt3 = demotion by a factor of 2/3

Runs comparison at topic relevance:
Run | MAP | Bpref | P@10 | R-Prec
PasTiDesDef | 0.2733 | 0.3866 | 0.5800 | 0.3516
ParTiDef | 0.2362 | 0.3580 | 0.5280 | 0.3162
ParTiDesDmt3 | 0.2812 | 0.4034 | 0.6200 | 0.3542
ParTiDesDmt2 | 0.2845 | 0.4040 | 0.6200 | 0.3501
ParTiDesDef | 0.2849 | 0.3998 | 0.6200 | 0.3490

Runs comparison at opinion relevance:
Run | MAP | Bpref | P@10 | R-Prec
PasTiDesDef | 0.1631 | 0.2274 | 0.3460 | 0.2264
ParTiDef | 0.1547 | 0.2256 | 0.3360 | 0.2106
ParTiDesDmt3 | 0.1873 | 0.2568 | 0.3780 | 0.2417
ParTiDesDmt2 | 0.1887 | 0.2573 | 0.3780 | 0.2421
ParTiDesDef | 0.1882 | 0.2521 | 0.3780 | 0.2441

[Figures: per-topic difference in AP (topics 851 to 900) between ParTiDesDmt2 and the median, and between ParTiDesDmt2 and ParTiDesDef.]

Conclusions
• Paragraphs work better than fixed-size passages for both topic and opinion retrieval.
• Title + description queries beat title-only queries.
• Demoting non-opinionated documents had little effect.

Future Work
Parameter tuning for:
• Low-frequency words
• Paragraph detection and passage size
• Aggregation of opinion scores
• Threshold of opinion scores
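The SO-PMI formula and the demotion step can be illustrated with a short sketch. The assumptions are:

* Hit counts are given as plain numbers, for example from a search engine's NEAR queries.
* Each paradigm set is treated as one pooled query, matching the set notation in SO-PMI(w).
* "Demotion by 2 or 3 times" is read as multiplying a document's rank position by the factor. The poster does not say whether ranks or scores are demoted.
* Opinion scores are assumed to be normalized already.

The function names are hypothetical.

```python
import math
from typing import Dict, List


def pmi(hits_near: float, hits_w1: float, hits_w2: float, n: float) -> float:
    """PMI(w1, w2) = log2( (hits(w1 NEAR w2)/N) / ((hits(w1)/N) * (hits(w2)/N)) ).

    All counts must be positive; a zero count makes the logarithm undefined.
    """
    return math.log2((hits_near / n) / ((hits_w1 / n) * (hits_w2 / n)))


def so_pmi(hits_w: float, near_pos: float, near_neg: float,
           hits_pos: float, hits_neg: float, n: float) -> float:
    """SO-PMI(w) = PMI(w, positive paradigms) - PMI(w, negative paradigms).

    ASSUMPTION: each paradigm set is queried as a single pooled term.
    """
    return pmi(near_pos, hits_w, hits_pos, n) - pmi(near_neg, hits_w, hits_neg, n)


def demote(ranked: List[str], opinion: Dict[str, float],
           factor: int = 2, threshold: float = 0.15) -> List[str]:
    """Push documents whose normalized opinion score is below `threshold` down.

    ASSUMPTION: demotion multiplies the document's 1-based rank position by
    `factor` (2 or 3); Python's stable sort keeps the original order on ties.
    """
    def key(item):
        pos, doc = item
        return pos * factor if opinion[doc] < threshold else pos

    return [doc for _, doc in sorted(enumerate(ranked, start=1), key=key)]


# Toy numbers: w co-occurs 4 times as often with the positive paradigms.
print(round(so_pmi(100, 40, 10, 200, 200, 1_000_000), 6))  # 2.0
print(demote(["a", "b", "c", "d"], {"a": 0.1, "b": 0.5, "c": 0.6, "d": 0.7}, factor=3))
# ['b', 'a', 'c', 'd']
```

With pooled paradigm sets, N and hits(w) cancel out of the difference. SO-PMI then reduces to log2 of the ratio of the NEAR counts, each normalized by its paradigm set's hit count.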
[Figure: lemmatized SO lexicon. Negative: SO from -3 to -0.05 (16,773 lemmas). Positive: SO from 0.05 to 5 (8,221 lemmas).]

Processing pipeline (figure):
• Cleaning: permalink documents are cleaned with rules based on the top 5 blog hosting sites.
• Segmentation: cleaned documents are split either into fixed-size passages (window = 50 words, overlap = 10 words) or into paragraphs, and both are indexed with Indri.
• Retrieval: the query returns a ranked list of the top 1000 paragraphs. These are merged back into documents for topic relevance evaluation and, with opinion scores, for opinion relevance evaluation.
• Term filtering: lemmatize; remove stop words, words not in the dictionary used by "spell", and words with DF <= 40.

Example document (BLOG06-20051224-029-0001622821.cln):

Notify Blogger about objectionable content. What does this mean? Blogger Get your own blog Flag Blog Next blog
rabbit + crow blog
It's like anything.
Tuesday, December 20, 2005
Blue Planet
"...Whoa!...Wow!...WOW!...Holy shit!...WOW!!!..." (my wife and I watching the first episode of Sir David Attenborough's "The Blue Planet" tonight)
posted by Neal Romanek at 9:18 PM - permalink
0 Comments: Comment?
About Me My Photo Name:Neal Romanek Location:Los Angeles The Previous Posts Archives SUBSCRIBE to the Rabbit + Crow Blog!
WEEKLEY POALE In our last Weeklie Poll, we asked which was your favorite Ceratopsian. The winner, amid spiky competition, was, of course... ...STYRACOSAURUS. THIS WEEK... If you found you could no longer walk, which mode of ambulation would you instead adopt? (_) Bustling (_) Charging (_) Creeping (_) Tip-Toeing (_) All of the above in various combinations (_) None of the above. I would adopt a stony stillness.
buy the new shirt questions? suggestions? confessions? [email address] Listed on BlogShares blog search directory

Example topic:
<num> Number: 851
<title> "March of the Penguins"
<desc> Description:
Provide opinion of the film documentary "March of the Penguins".
<narr> Narrative:
Relevant documents should include opinions concerning the film documentary "March of the Penguins". Articles or comments about penguins outside the context of this film documentary are not relevant.
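The fixed-size passage segmentation (window = 50 words, overlap = 10 words) can be sketched as follows. The assumptions are:

* Documents are already cleaned and tokenized into a word list; whitespace splitting is one simple choice.
* Indri indexing is not shown.
* The function name is hypothetical.

Each new window starts `window - overlap` = 40 words after the previous one.

```python
from typing import List


def fixed_size_passages(words: List[str], window: int = 50, overlap: int = 10) -> List[List[str]]:
    """Split a tokenized document into windows of `window` words.

    Consecutive windows share `overlap` words; the last window may be shorter.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if not words:
        return []
    step = window - overlap
    # A start at or beyond len(words) - overlap would only repeat words already
    # covered by the previous window, so the range stops there.
    return [words[start:start + window] for start in range(0, max(len(words) - overlap, 1), step)]


doc = [f"w{i}" for i in range(100)]
print([len(p) for p in fixed_size_passages(doc)])  # [50, 50, 20]
```

The overlap ensures that a phrase straddling a window boundary still appears intact in at least one passage, provided the phrase is no longer than 10 words.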
Enterprise Track: Participation Goals
• Building an expert search baseline system
• Applying models of identity to public mailing lists
• Building a reference-resolution infrastructure

QA Track: ciQA Task

Participation Goals
• To explore the effectiveness of single-iteration written clarification dialogs
• To explore different strategies for clarifying user needs in question answering
• To better understand the nature of complex, template-based questions

Methods
[Figure: system pipeline. Questions → queries → document retrieval → top 20 relevant documents → answer generation → unordered answers → answer ranking → ordered answers → interaction forms generation → analysis of interaction responses → refined answers.]

Example question (Topic 26):
Question: What evidence is there for transport of [smuggled VCDs] from [Hong Kong] to [China]?
Narrative: The analyst is particularly interested in knowing the volume of smuggled VCDs and also the ruses used by smugglers to hide their efforts.

Results and Analysis

Run | F-score
UMDM1pre | 0.316
UMDM1post | 0.350 (+10.6%)
UMDA1pre | 0.224
UMDA1post | 0.180 (-19.4%)

Average F-score improvement by interaction type:
Type | # Topics | Avg. improvement
#1 | 10 | -0.0124 (-4.0%)
#2 | 12 | 0.0300 (+8.2%)
#3 | 8 | 0.106 (+44.0%)

Analysis 2: Consistency in Judgment
Pre | Post | Manual | Manual + AutoFiller | Automatic
Y | Y | 194 | 224 | 78
N | N | 233 | 771 | 374
Consistent judgments | | 427 (87.9%) | 995 (90.3%) | 452 (90.0%)
Y | N | 37 | 48 | 20
N | Y | 22 | 59 | 30
Inconsistent judgments | | 59 (12.1%) | 107 (9.7%) | 50 (10.0%)
Difference (N→Y minus Y→N) | | -15 | +11 | +10

Analysis 3: Relevant Sentences vs. Answer Nuggets
 | Relevant sentences | Partially relevant sentences | Not relevant sentences
Nugget | 74 | 8 | 16
Not nugget | 258 | 69 | 270
All | 332 | 77 | 286
Share that are nuggets | 22% | 10% | 6%

Conclusion
• Relevance feedback does not always work for QA.
• The error margin of nugget judgments is about 10%.
• A relevant sentence is not necessarily an answer nugget.

Future Work
• Examination of possible systematic errors in nugget judgments
• Exploration of the relationship between relevant sentences and answer nuggets
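The percentages and the Difference row of the consistency table follow directly from the raw counts. The sketch below recomputes them. It reads Difference as the N→Y flips minus the Y→N flips, which matches all three columns. The function name is hypothetical.

```python
from typing import Dict


def consistency(yy: int, nn: int, yn: int, ny: int) -> Dict[str, float]:
    """Summarize how often a judgment stayed the same between Pre and Post.

    yy / nn: judged Y both times / N both times (consistent)
    yn / ny: flipped from Y to N / from N to Y (inconsistent)
    """
    consistent, inconsistent = yy + nn, yn + ny
    total = consistent + inconsistent
    return {
        "consistent": consistent,
        "consistent_pct": round(100 * consistent / total, 1),
        "inconsistent": inconsistent,
        "inconsistent_pct": round(100 * inconsistent / total, 1),
        "difference": ny - yn,  # net N->Y flips minus Y->N flips
    }


# Counts from the consistency table, in the order (Y-Y, N-N, Y-N, N-Y).
table = {
    "Manual": (194, 233, 37, 22),
    "Manual + AutoFiller": (224, 771, 48, 59),
    "Automatic": (78, 374, 20, 30),
}
for name, counts in table.items():
    print(name, consistency(*counts))
```

Every condition lands at roughly 10% inconsistent judgments. This is the basis for the ~10% error margin stated in the conclusion.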
External resources:
• CIA World Fact Book
• Google
• WordNet
• Roget's Thesaurus
• Wikipedia

Three types of interaction were used:

Interaction questions (Topic 026)
1. What types of smuggled disks are you interested in? Check all that apply:
□ VCDs □ CDs □ DVDs □ Other. Please specify:
…

Importance of answer types (Topic 042)
Please rate the importance of following types of evidence.
1. General claim of effects of aspirin. ○ Important. ○ Somewhat important. ○ Not needed at all.
2. Guideline of how aspirin can be used to treat heart diseases. ○ Important. ○ Somewhat important. ○ Not needed at all.
…

Relevance feedback (Topic 055)
Please indicate the relevance of the following answers.
1. Most of Sierra Leone's diamonds were and still are smuggled into neighboring Liberia for sale, according to several human rights groups and diamond industry experts. ○ Relevant. ○ Somewhat relevant. ○ Not relevant.
…

Analysis 1: Interaction Performances by Type of Interaction
[Figure: per-topic improvement of F-score (%) for the three interaction types: sample relevance feedback, importance of answer types, and clarification questions.]