Personalized Query Expansion for the Web Paul-Alexandru Chirita, Claudiu S. Firan, Wolfgang Nejdl Gabriel Barata
Mar 23, 2016
Motivation
by Tojosan @ Flickr
What is query expansion?
Add meaningful search terms to the query…
What is PIR based query expansion?
Add meaningful search terms to the query…
… related to the user’s interests.
Why PIR based query expansion?
More personalization quality!
More privacy!
Example
Google search: “canon book”
Example
Top 3 results:
• The Canon: A Whirligig Tour of the Beautiful Basics of Science (Hardcover) @ Amazon
• Western Canon @ Wikipedia
• Biblical Canon @ Wikipedia
Example
Expanded query: “canon book bible”
Example
Top 3 results:
• Biblical Canon @ Wikipedia
• Books of the Bible @ Wikipedia
• The Canon of the Bible @ catholicapologetics.org
Query Expansion using Desktop data
by Old Shoe Woman @ Flickr
Algorithms
• Expanding with Local Desktop Analysis
• Expanding with Global Desktop Analysis
Expanding with Local Desktop Analysis
• Term and Document Frequency
• Lexical Compounds
• Sentence Selection
Term and Document Frequency
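$$\mathit{TermScore} = \left[\frac{1}{2} + \frac{1}{2}\cdot\frac{\mathit{nrWords} - \mathit{pos}}{\mathit{nrWords}}\right]\cdot\log(1 + \mathit{TF})$$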
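A minimal sketch of how the TermScore formula could be computed, reading it as a position weight between 1/2 and 1 multiplied by log(1 + TF). The function names, the 0-based `pos` (index of the term's first occurrence), the natural logarithm, and the whitespace tokenization are assumptions made for illustration; the slides do not specify them.

```python
import math


def term_score(pos: int, nr_words: int, tf: int) -> float:
    """TermScore = [1/2 + 1/2 * (nrWords - pos) / nrWords] * log(1 + TF).

    Assumption: pos is the 0-based index of the term's first occurrence,
    and log is the natural logarithm.
    """
    position_weight = 0.5 + 0.5 * (nr_words - pos) / nr_words
    return position_weight * math.log(1 + tf)


def score_terms(tokens: list[str]) -> dict[str, float]:
    """Score every distinct term of one tokenized document."""
    first_pos: dict[str, int] = {}
    counts: dict[str, int] = {}
    for i, tok in enumerate(tokens):
        first_pos.setdefault(tok, i)          # remember first occurrence only
        counts[tok] = counts.get(tok, 0) + 1  # term frequency within the document
    n = len(tokens)
    return {t: term_score(first_pos[t], n, counts[t]) for t in counts}


scores = score_terms("canon bible book canon history".split())
ranked = sorted(scores, key=scores.get, reverse=True)
```

Terms that appear early and often end up first in `ranked`, which matches the intent of weighting both position and frequency.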
Lexical Compounds
{ adjective? Noun+ }
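The pattern above (an optional adjective followed by one or more nouns) can be sketched as a scan over part-of-speech tagged tokens. The tag names `ADJ` and `NOUN`, the input format of `(word, tag)` pairs, and the function name are hypothetical; any POS tagger could supply the tags.

```python
def lexical_compounds(tagged: list[tuple[str, str]]) -> list[str]:
    """Extract maximal spans matching { adjective? noun+ }.

    Assumption: tokens arrive as (word, tag) pairs using the
    illustrative tags "ADJ" and "NOUN".
    """
    compounds = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        if tagged[i][1] == "ADJ":  # optional leading adjective
            i += 1
        j = i
        while j < n and tagged[j][1] == "NOUN":  # one or more nouns
            j += 1
        if j > i:
            compounds.append(" ".join(word for word, _ in tagged[start:j]))
            i = j
        else:
            i = start + 1  # no noun followed, move on by one token
    return compounds


example = [
    ("the", "DET"), ("biblical", "ADJ"), ("canon", "NOUN"),
    ("lists", "VERB"), ("holy", "ADJ"), ("old", "ADJ"),
    ("book", "NOUN"), ("titles", "NOUN"),
]
found = lexical_compounds(example)
```

Note that a single noun also satisfies the pattern, so a filter on compound length may be wanted in practice.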
Sentence Selection
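$$\mathit{SentenceScore} = \frac{SW^2}{TW} + PS + \frac{TQ^2}{NQ}$$
$$PS = \begin{cases} \dfrac{\mathit{Avg}(NS) - \mathit{SentenceIndex}}{\mathit{Avg}^2(NS)}, & \text{if } \mathit{SentenceIndex} \le 10 \\ 0, & \text{if } \mathit{SentenceIndex} > 10 \end{cases}$$
$$TF > ms = \begin{cases} 7 - 0.1 \times (25 - NS), & \text{if } NS < 25 \\ 7, & \text{if } NS \in [25, 40] \\ 7 + 0.1 \times (NS - 40), & \text{if } NS > 40 \end{cases}$$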
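A rough sketch of the three sentence-selection formulas as plain functions. The slides do not define the symbols, so the interpretation here is an assumption: SW is the number of significant words in the sentence, TW its total word count, TQ the number of query terms it contains, NQ the number of query terms, NS the number of sentences in the document, and a word counts as significant when its TF exceeds the threshold ms.

```python
def significance_threshold(ns: int) -> float:
    """Threshold ms: a word is treated as significant when TF > ms."""
    if ns < 25:
        return 7 - 0.1 * (25 - ns)
    if ns <= 40:
        return 7.0
    return 7 + 0.1 * (ns - 40)


def position_score(sentence_index: int, avg_ns: float) -> float:
    """PS: only the first ten sentences receive a position bonus."""
    if sentence_index <= 10:
        return (avg_ns - sentence_index) / avg_ns ** 2
    return 0.0


def sentence_score(sw: int, tw: int, ps: float, tq: int, nq: int) -> float:
    """SentenceScore = SW^2 / TW + PS + TQ^2 / NQ."""
    return sw ** 2 / tw + ps + tq ** 2 / nq


# Hypothetical numbers: 3 significant words out of 9, both query terms present,
# fifth sentence, and an assumed average of 20 sentences per document.
ps = position_score(5, 20)
score = sentence_score(sw=3, tw=9, ps=ps, tq=2, nq=2)
```

The squared terms reward sentences that are dense in significant words and that cover most of the query, while PS adds a small bias toward the start of the document.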
Expanding with Global Desktop Analysis
• Term Co-occurrence Statistics
• Thesaurus based Expansion
Term Co-occurrence Statistics
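The slide gives no formula here, so the following is only an illustrative sketch of one co-occurrence measure, cosine similarity, which the evaluation lists as TC[CS]. It assumes the common document-level formulation DF(x, y) / sqrt(DF(x) · DF(y)), where DF counts the desktop documents containing a term or a pair of terms. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_cosine(docs: list[list[str]]) -> dict[tuple[str, str], float]:
    """Cosine similarity per term pair, based on document frequencies.

    Assumption: similarity(x, y) = DF(x, y) / sqrt(DF(x) * DF(y)).
    """
    df: Counter = Counter()
    pair_df: Counter = Counter()
    for doc in docs:
        terms = sorted(set(doc))                  # count each term once per document
        df.update(terms)
        pair_df.update(combinations(terms, 2))    # pairs are alphabetically ordered
    return {
        (a, b): n / math.sqrt(df[a] * df[b])
        for (a, b), n in pair_df.items()
    }


def expand(query_term: str, sims: dict[tuple[str, str], float], k: int = 1) -> list[str]:
    """Return the k terms that co-occur most strongly with query_term."""
    related = []
    for (a, b), s in sims.items():
        if a == query_term:
            related.append((s, b))
        elif b == query_term:
            related.append((s, a))
    return [term for _, term in sorted(related, reverse=True)[:k]]


docs = [["canon", "bible", "book"], ["canon", "bible"], ["canon", "camera"]]
sims = cooccurrence_cosine(docs)
suggestion = expand("canon", sims, k=1)
```

On this toy collection the strongest partner of "canon" is "bible", mirroring the expanded query from the earlier example.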
Thesaurus based Expansion
Experiments & Evaluation
by Canadian Museum of Nature @ Flickr
Experiments
• 18 users
• Files indexed within user selected paths, Emails and Web cache
Experiments
• They chose 4 queries:
– 1 from the top 2% log queries (avg. length = 2.0)
– 1 random log query (avg. length = 2.3)
– 1 self-selected specific query (avg. length = 2.9)
– 1 self-selected ambiguous query (avg. length = 1.8)
Evaluation
$$DCG(i) = \begin{cases} G(1), & \text{if } i = 1 \\ DCG(i - 1) + \dfrac{G(i)}{\log_2(i)}, & \text{otherwise} \end{cases}$$
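The DCG recurrence above unrolls into a simple loop over the ranked gains. This sketch assumes `gains` holds the relevance grades G(1), G(2), … in rank order; the function name and the sample grades are illustrative, not taken from the study.

```python
import math


def dcg(gains: list[float]) -> float:
    """DCG(i) = G(1) if i == 1, else DCG(i - 1) + G(i) / log2(i)."""
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        # Rank 1 is not discounted; later ranks are divided by log2(rank).
        total += gain if i == 1 else gain / math.log2(i)
    return total


good_ranking = dcg([2, 1, 0])   # most relevant result first
poor_ranking = dcg([0, 1, 2])   # same results in reverse order
```

Because the discount grows with rank, the same set of results scores higher when the most relevant ones come first.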
Evaluation
• Evaluated algorithms:
– Google: Google query output
– TF, DF: Term and Document Frequency
– LC, LC[O]: Regular and Optimized Lexical Compounds
– TC[CS], TC[MI], TC[LR]: Term Co-occurrence Statistics using Cosine Similarity, Mutual Information and Likelihood Ratio
– WN[SYN], WN[SUB], WN[SUP]: WordNet based expansion with synonyms, sub-concepts and super-concepts
Results
Log queries:
Results
Self-selected queries:
Introducing Adaptivity
by RavenCore17 @ Flickr
Query Clarity
Adaptive Expansion
Experiments
• Same experimental setup as for the previous analysis.
Results
Log queries:
Results
Self-selected queries:
Results
Conclusions
by ThisIsIt2 @ Flickr
Conclusions
• Five techniques for determining expansion terms from personal documents.
• Empirical analysis showed that these approaches perform very well.
• The expansion process was made adaptive to the features of each query.
• Adaptive expansion process proved to yield significant improvements over the static one.
End
Any questions?