IMA 8/11/2006 Advanced Math Search 1 Relevance Ranking and Hit Packaging in Math Search Abdou Youssef The George Washington University And The National Institute of Standards and Technology (DLMF)

Dec 24, 2015



IMA 8/11/2006 Advanced Math Search 1

Relevance Ranking and Hit Packaging in Math Search

Abdou Youssef

The George Washington University

And

The National Institute of Standards and Technology

(DLMF)


Outline
- What are relevance ranking and hit packaging
- Why relevance ranking and hit packaging
- Math-relevance ranking: factors & methods
- Math-hit packaging: issues and method


Relevance Ranking: What and Why
- What:
  - Measuring the relevance of each hit to a query
  - Sorting the hits from the most to the least relevant
- Why:
  - Numbers of hits are expected to be in the hundreds and even thousands
  - Too taxing, tedious and time consuming for users to plow through the hits looking for the relevant one(s)


Hit-Packaging: What and Why
- What:
  - Providing with each hit short, representative excerpts from the corresponding document
- Why:
  - Numbers of hits in the hundreds/thousands
  - Relevance ranking may not be perfect
  - There could be several objectively top-ranking, equally relevant hits
  - Brief hit-descriptions help users select


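Relevance Ranking: How it is typically done

For any document d and query q, the relevance score of d is:

\[
\mathrm{score}(d) = \sum_{\text{terms } t \text{ in } q} \frac{\mathrm{freq}(t \text{ in } d)}{|d|} \cdot \log\!\left(\frac{\text{Num. docs in DB}}{\text{Num. docs having } t}\right)
\]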
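As a rough illustration of the text-IR score on this slide (term frequency normalized by document length, times the log of the inverse document frequency), here is a minimal Python sketch. The function name `tfidf_score`, the toy documents, and the choice to skip terms that do not occur are assumptions made for the example, not part of the talk.

```python
import math
from collections import Counter

def tfidf_score(query_terms, doc_tokens, doc_freq, num_docs):
    """Sum over query terms of (freq of t in d / |d|) * log(N / docs having t)."""
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    score = 0.0
    for t in query_terms:
        df = doc_freq.get(t, 0)
        if df == 0 or counts[t] == 0:
            continue  # term absent from the collection or from this document
        score += (counts[t] / len(doc_tokens)) * math.log(num_docs / df)
    return score

# Toy collection (hypothetical data, already tokenized).
docs = {
    "d1": ["bessel", "function", "integral"],
    "d2": ["gamma", "function", "integral", "integral"],
    "d3": ["bessel", "bessel", "zeros"],
}
# Number of documents containing each term.
doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))

query = ["bessel", "zeros"]
ranked = sorted(
    docs,
    key=lambda d: tfidf_score(query, docs[d], doc_freq, len(docs)),
    reverse=True,
)
print(ranked)
```

Sorting by this score from highest to lowest is the "relevance ranking" step described earlier in the talk.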


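Relevance Ranking: How it is typically done (Contd.)

Some search systems allow users to boost some terms over others in a query:

\[
\mathrm{score}(d) = \sum_{\text{query terms } t} \mathrm{Boost}(t \text{ in } q) \cdot \frac{\mathrm{freq}(t \text{ in } d)}{|d|} \cdot \log\!\left(\frac{\text{Num. docs in DB}}{\text{Num. docs having } t}\right)
\]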
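Query-term boosting can be sketched as one extra multiplicative factor per term in the same sum. The function name `boosted_score`, the dictionary-based query format, and the sample numbers below are assumptions for illustration only.

```python
import math
from collections import Counter

def boosted_score(query_boosts, doc_tokens, doc_freq, num_docs):
    """query_boosts maps each query term to its user-supplied boost factor."""
    counts = Counter(doc_tokens)
    total = 0.0
    for term, boost in query_boosts.items():
        df = doc_freq.get(term, 0)
        if df and counts[term]:
            tf = counts[term] / len(doc_tokens)      # length-normalized frequency
            idf = math.log(num_docs / df)            # rarity of the term in the DB
            total += boost * tf * idf
    return total

# Hypothetical document and collection statistics.
doc = ["bessel", "zeros", "integral", "zeros"]
doc_freq = {"bessel": 10, "zeros": 5}
num_docs = 100

plain = boosted_score({"bessel": 1.0, "zeros": 1.0}, doc, doc_freq, num_docs)
boosted = boosted_score({"bessel": 3.0, "zeros": 1.0}, doc, doc_freq, num_docs)
print(plain, boosted)
```

With every boost set to 1.0 this reduces to the unboosted score, so boosting only changes the relative contribution of the terms the user chose to emphasize.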


Why Current Ranking Schemes Are Not Good for Math Search
- Length of a math object (e.g., equation) often has no bearing on its relevance/importance
- Frequency of a term in a math object also has no bearing on the relevance/importance of the math object
- Many considerations that impact the relevance/importance of a math object are not captured by the text-IR relevance metric


Factors to Consider in Math Relevance Ranking
- Static Factors
  - Static Weighting
- Dynamic Factors
  - Dynamic Weighting


Static Weighting: Determined fully by content/author
- Not all objects in a math file are of equal importance
- Therefore, when ranking hits, the nature of hit-contents must be factored in
  - Some objects must be given more weight than others in calculating the relevance score of a hit


Static Weighting: Possible Hierarchies of Weights
- Definitions; Theorems; Propositions, Corollaries; Lemmas
- Special Functions; Operators; Other math identifiers
- Expert-Ranked Formulas
- ...


Static Weighting: Native vs. Non-native Entities
- Native entities
  - An entity (e.g., term, concept, special function) should carry more weight in its "native chapter" than in passing references in other chapters
- Native connections
  - A connection between two entities should carry more weight in the chapter of either entity than in other chapters


Dynamic Weighting: Determined by Query/Users
- Query-biased weighting
- Number and weights of external references to an item
- Number of recent accesses to an item
  - By the same user in current session
  - By multiple users in the last N days
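One of the dynamic factors above, the number of accesses to an item in the last N days, could be computed from an access log roughly as follows. The function name `recent_access_count` and the log format (a plain list of timestamps per item) are assumptions; the talk does not say how accesses are recorded.

```python
from datetime import datetime, timedelta

def recent_access_count(access_times, now, days):
    """Count accesses that fall within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return sum(1 for ts in access_times if cutoff <= ts <= now)

# Hypothetical access log for a single item.
now = datetime(2006, 8, 11)
accesses = [datetime(2006, 8, 10), datetime(2006, 7, 1), datetime(2006, 8, 5)]

last_week = recent_access_count(accesses, now, days=7)
last_two_months = recent_access_count(accesses, now, days=60)
print(last_week, last_two_months)
```

The same helper covers the per-session case if the list is restricted to one user's accesses in the current session.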


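A New Model of Relevance Metrics

For any math document/object d and query q, the relevance score of d is:

\[
\mathrm{score}(d) = \sum_{\text{query terms } t} f(\mathrm{weight}(t)) \cdot g(\mathrm{weight}(t,d)) \cdot h(\mathrm{weight}(d)) \cdot \frac{\mathrm{freq}(t \text{ in } d)}{|d|} \cdot \log\!\left(\frac{\text{Num. docs in DB}}{\text{Num. docs having } t}\right)
\]

The functions f and g are attenuating functions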
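A minimal sketch of how such a weighted score might be computed, assuming the three weights are already available as numbers. The talk does not specify f, g, or h; the defaults below (`math.log1p` for the attenuating functions f and g, the identity for h) and all names and sample values are placeholders chosen for illustration.

```python
import math
from collections import Counter

def math_score(query_weights, doc_tokens, term_doc_weight, doc_weight,
               doc_freq, num_docs,
               f=math.log1p, g=math.log1p, h=lambda w: w):
    """query_weights: term -> weight(t); term_doc_weight: term -> weight(t, d)."""
    counts = Counter(doc_tokens)
    total = 0.0
    for t, w_t in query_weights.items():
        df = doc_freq.get(t, 0)
        if not df or not counts[t]:
            continue
        base = (counts[t] / len(doc_tokens)) * math.log(num_docs / df)
        total += f(w_t) * g(term_doc_weight.get(t, 1.0)) * base
    # h(weight(d)) does not depend on t, so it can be factored out of the sum.
    return h(doc_weight) * total

# Same tokens, but the term carries more weight in one object (hypothetical values).
tokens = ["gamma", "function", "reflection"]
doc_freq = {"gamma": 4}
s_definition = math_score({"gamma": 1.0}, tokens, {"gamma": 5.0}, 2.0, doc_freq, 20)
s_passing = math_score({"gamma": 1.0}, tokens, {"gamma": 1.0}, 1.0, doc_freq, 20)
print(s_definition, s_passing)
```

Two objects with identical text statistics now receive different scores, which is the point of adding the weight factors to the text-IR formula.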


Weight(t,d)
- Weight(t,d) defines the weight of term t
  - Intrinsically, and
  - In the context of document d
- It combines a component that depends on type(t) with a component that depends on context(t,d)


Weight(d)
- Weight(d) defines the weight of the document/object d depending on
  - The nature/type of d
  - The number of pointers to d
  - The number of accesses to d
- It combines components that depend on type(d), #Pointers, and #Recent.Accesses
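To make the three ingredients of Weight(d) concrete, here is one possible way to combine them. Everything specific in it is an assumption: the type weights in `TYPE_WEIGHT`, the logarithmic damping of the counts, and the choice to multiply the three parts are illustrative and are not taken from the talk.

```python
import math

# Hypothetical static weights per object type.
TYPE_WEIGHT = {"definition": 3.0, "theorem": 2.5, "lemma": 1.5, "other": 1.0}

def document_weight(doc_type, num_pointers, num_recent_accesses):
    """Combine object type, pointer count, and recent-access count into one weight."""
    type_part = TYPE_WEIGHT.get(doc_type, TYPE_WEIGHT["other"])
    pointer_part = 1.0 + math.log1p(num_pointers)        # damped, so large counts do not dominate
    access_part = 1.0 + math.log1p(num_recent_accesses)  # same damping for accesses
    return type_part * pointer_part * access_part

w_definition = document_weight("definition", num_pointers=10, num_recent_accesses=5)
w_lemma = document_weight("lemma", num_pointers=10, num_recent_accesses=5)
print(w_definition, w_lemma)
```

The type part is static and can be fixed at indexing time, while the pointer and access counts are dynamic and would need to be refreshed as usage data accumulates.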


This math-specific relevance scoring scheme is currently being developed and implemented for DLMF


Hit Packaging
- When the hit-content size is small, display the whole content with the hit
  - Equation hits: the equation itself
  - Graph hits: the whole graph or just the caption
  - Table hits: the whole table or just the caption
  - Definition/Theorem/Notation hits: the whole, unless it is embedded in a section


Hit Packaging: When the hit-content is too large
- The hit package must be
  - Excerpts from the corresponding document
  - Short: 2-5-10 lines long
  - Relevant to the query
  - Representative of the document contents


How to Choose the Excerpts
- Divide document into small fragments
  - Titles (of sections, subsections, etc.)
  - Equations
  - Captions
  - Sentences
- Compute the relevance for each fragment
- Rank the fragments by their relevance
- Choose 5-10 top-ranking fragments
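The excerpt-selection steps above can be sketched as follows. The fragment relevance used here (a simple count of query-term occurrences) is a stand-in assumption; in practice it would be replaced by the math-specific relevance score. The function names and sample fragments are hypothetical.

```python
import re

def fragment_relevance(fragment, query_terms):
    """Placeholder relevance: number of query-term occurrences in the fragment."""
    words = re.findall(r"\w+", fragment.lower())
    return sum(words.count(t) for t in query_terms)

def choose_excerpts(fragments, query_terms, k=5):
    """Score every fragment, rank by relevance, and keep the top k relevant ones."""
    scored = [(fragment_relevance(frag, query_terms), i, frag)
              for i, frag in enumerate(fragments)]
    relevant = [item for item in scored if item[0] > 0]
    # Highest score first; ties are broken by position in the document.
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return [frag for _, _, frag in relevant[:k]]

# A document already divided into fragments (title, sentence, caption, sentence).
fragments = [
    "Bessel Functions",
    "The zeros of Bessel functions are real and simple.",
    "Table of gamma function values",
    "J_n(x) has infinitely many zeros",
]
excerpts = choose_excerpts(fragments, ["bessel", "zeros"], k=2)
print(excerpts)
```

Because the fragments are scored against the query, the resulting hit package is query-biased: the same document can yield different excerpts for different queries.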


Implementation Matters
- Query processing and searching must be fast
  - Users cannot and should not wait too long
  - Servers often have to serve many users at once
- Therefore:
  - Hit relevance scoring must be fast
  - Hit packaging must be fast


Implementation Matters (Contd.)
- Indexing can be slow, because it is done offline, ahead of search time
- Therefore, compute and store in the index all kinds of information that
  - Facilitate the relevance-scoring of hits
  - Speed up document-fragmentation and fragment-scoring for query-biased hit-packaging at search time
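As a sketch of what "compute and store in the index" might look like, the offline step below precomputes the statistics that scoring and hit packaging need, so that search time involves only lookups. The index layout (a plain dictionary) and the function name `build_index` are assumptions for illustration.

```python
from collections import Counter

def build_index(docs):
    """Offline step. docs maps doc id -> list of fragments, each a list of tokens."""
    index = {
        "num_docs": len(docs),
        "doc_freq": Counter(),   # term -> number of documents containing it
        "term_counts": {},       # doc id -> term frequencies
        "doc_len": {},           # doc id -> total number of tokens
        "fragments": {},         # doc id -> pre-split fragments for hit packaging
    }
    for doc_id, fragments in docs.items():
        counts = Counter(t for frag in fragments for t in frag)
        index["term_counts"][doc_id] = counts
        index["doc_len"][doc_id] = sum(counts.values())
        index["fragments"][doc_id] = fragments
        index["doc_freq"].update(counts.keys())  # each document counts once per term
    return index

# Hypothetical two-document collection, already fragmented and tokenized.
docs = {
    "a": [["bessel", "function"], ["bessel", "zeros"]],
    "b": [["gamma", "function"]],
}
index = build_index(docs)
print(index["doc_freq"]["function"], index["term_counts"]["a"]["bessel"])
```

Storing the fragments alongside the statistics means the document does not have to be re-parsed and re-split when a query arrives.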


Closing thoughts
- Math-specific relevance scoring and hit packaging are critical to the success of math search
- We have barely started to scratch the surface
- Much research will be needed