Search strategy & tactics
Governed by effectiveness & feedback
© Tefko Saracevic
Dec 21, 2015
Some definitions
• Search statement (query):
– set of search terms with logical connectors and attributes; file and system dependent
• Search strategy (big picture):
– overall approach to searching of a question: selection of systems, files, search statements & tactics, sequence, output formats; cost, time aspects
Some definitions (cont.)
• Search tactics (action choices):
– choices & variations in search statements: terms, connectors, attributes
• Move:
– modifications of search strategies or tactics that are aimed at improving the results
• Cycle (particularly applicable to systems such as DIALOG):
– set of commands from start (begin) to viewing (type) results, or from a viewing to a viewing command
Some definitions (cont.)
• Effectiveness:
– performance as to objectives
– to what degree did a search accomplish what was desired?
– how well was it done in terms of relevance?
• Efficiency:
– performance as to costs
– at what cost and/or effort, time?
Both are KEY concepts & criteria for selection of strategy, tactics & evaluation
Effectiveness criteria
• Search tactics are chosen & changed following some criteria of accomplishment, such as:
– none - no thought given
– relevance (very often)
– magnitude (also very often)
– output attributes
– topic/strategy
• Tactics are altered interactively
– role & types of feedback
Knowing what tactics may produce what results is key to a professional searcher
Relevance: key concept in IR
• Attribute/criterion reflecting effectiveness of exchange of information between people (users) & IR systems in communication contacts, based on valuation by people
• Some attributes:
– in IR - user dependent
– multidimensional or faceted
– dynamic
– measurable - somewhat
– intuitively well understood
Types of relevance
• Several types considered:
– Systems or algorithmic relevance: relation between a query as entered and objects in the file of a system as retrieved or failed to be retrieved by a given procedure or algorithm. Comparative effectiveness.
– Topical or subject relevance: relation between topic in the query & topic covered by the retrieved objects, or objects in the file(s) of the system, or even in existence. Aboutness.
Types of relevance (cont.)
– Cognitive relevance or pertinence: relation between state of knowledge & cognitive information need of a user and the objects provided or in the file(s). Informativeness, novelty ...
– Motivational or affective relevance: relation between intents, goals & motivations of a user & objects retrieved by a system or in the file, or even in existence. Satisfaction ...
– Situational relevance or utility: relation between the task or problem-at-hand and the objects retrieved (or in the files). Relates to usefulness in decision-making, reduction of uncertainty ...
Effectiveness measures
• Precision:
– probability that, given that an object is retrieved, it is relevant; or the ratio of relevant items retrieved to all items retrieved
• Recall:
– probability that, given that an object is relevant, it is retrieved; or the ratio of relevant items retrieved to all relevant items in a file
• Precision is easy to establish, recall is not
– union of retrievals as a “trick” to establish recall
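The "union of retrievals" trick can be sketched as follows. This is one reading of the slide, not a method it spells out: the relevant items found by several different searches on the same question are pooled, and the pool stands in for "all relevant items in a file". The function name, search names, and document ids are hypothetical. The result is a relative recall; it overestimates true recall whenever relevant items exist that no search found.

```python
def relative_recall(found_relevant: set[str], pooled_relevant: set[str]) -> float:
    """Share of the pooled relevant items that one search retrieved."""
    if not pooled_relevant:
        return 0.0  # nothing relevant known, ratio is undefined; 0.0 by convention
    return len(found_relevant & pooled_relevant) / len(pooled_relevant)


# Hypothetical data: relevant items found by three searches on one question.
relevant_found = {
    "search_1": {"d1", "d2", "d3"},
    "search_2": {"d2", "d4"},
    "search_3": {"d3", "d4", "d5", "d6"},
}

# The union of all relevant items found serves as the estimate of
# "all relevant items in the file".
pool = set().union(*relevant_found.values())

for name, found in relevant_found.items():
    print(name, round(relative_recall(found, pool), 2))
```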
Calculation

Precision = a / (a + b)
Recall = a / (a + c)

High precision = maximize a, minimize b
High recall = maximize a, minimize c

                      Judged RELEVANT                   Judged NOT RELEVANT
Items RETRIEVED       a: relevant & retrieved           b: not relevant & retrieved
Items NOT RETRIEVED   c: relevant & not retrieved       d: not relevant & not retrieved

(a, b, c, d = no. of items in each cell)
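A minimal sketch of the calculation, using the cell names a, b, c from the table. The function name and the example counts are made up for illustration, and returning 0.0 for an empty denominator is a convention chosen here, not something the slide specifies.

```python
def precision_recall(a: int, b: int, c: int) -> tuple[float, float]:
    """Return (precision, recall) from contingency-table counts.

    a = relevant & retrieved
    b = not relevant & retrieved
    c = relevant & not retrieved
    """
    retrieved = a + b  # everything in the output
    relevant = a + c   # everything relevant in the file
    # An empty output or a file with no relevant items leaves the ratio
    # undefined; 0.0 is returned in that case (an assumption of this sketch).
    precision = a / retrieved if retrieved else 0.0
    recall = a / relevant if relevant else 0.0
    return precision, recall


# Hypothetical search: 20 items retrieved, 15 of them relevant,
# and 45 relevant items left behind in the file.
precision, recall = precision_recall(a=15, b=5, c=45)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints: precision = 0.75, recall = 0.25
```

Note that d (not relevant & not retrieved) plays no part in either measure.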
Interpretation: PRECISION
• Precision = percent of relevant stuff you have in your answer
– or conversely percent of junk
– high precision = most stuff relevant
– low precision = a lot of junk
• Some users demand high precision
– do not want to wade through much stuff
– but it comes at a price: relevant stuff may be missed (tradeoff)
Interpretation: RECALL
• A file may have a lot of relevant stuff
• Recall = percent of that relevant stuff in the file that you retrieved
– conversely percent of stuff you missed
– high recall = you missed little
– low recall = you missed a lot
• Some users demand high recall (e.g. PhD students doing dissertation)
– want to make sure that important stuff is not missed
– but will have to pay a price of wading through a lot of junk (tradeoff)
Precision-recall trade-off
• USUALLY: precision & recall are inversely related
– higher recall usually means lower precision & vice versa
[Figure: precision (vertical axis, 0 to 100 %) plotted against recall (horizontal axis, 0 to 100 %); labels: Ideal, Usual, Improvements]
Interpretation: TRADE-OFF
• It is like in life, usually:
– you get some, you lose some
• Usually, but not always
– keep in mind these are probabilities
– when you have high precision, most stuff you got is relevant or on target, but you missed stuff that is also relevant; it was left behind
– when you have high recall, you did not miss much, but you also got a lot of junk and have to wade through it
You use different tactics for high recall than for high precision
Search tactics
• What variations are possible?
– several ‘things’ in a query can be selected or changed that affect effectiveness
– each variation has a consequence in output: if I do X then Y will happen
1. LOGIC
– choice of connectors among terms (AND, OR, NOT, W …)
2. SCOPE
– no. of terms linked - ANDs (A AND B vs. A AND B AND C)
Search tactics (cont.)
3. EXHAUSTIVITY
– for each concept, no. of related terms - OR connections (A OR B vs. A OR B OR C)
4. TERM SPECIFICITY
– for each concept, level in hierarchy (broader vs. narrower terms)
5. SEARCHABLE FIELDS
– choice for text terms & non-text attributes, e.g. titles only, limit as to years
6. FILE OR SYSTEM SPECIFIC CAPABILITIES
– e.g. ranking, sorting
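To make SCOPE and EXHAUSTIVITY concrete, here is a small sketch that assembles a search statement from concept groups: terms within one concept are joined by OR (exhaustivity), and concepts are joined by AND (scope). The function name and the example terms are hypothetical, and the output uses generic Boolean notation; actual syntax is file and system dependent.

```python
def build_statement(concepts: list[list[str]]) -> str:
    """Join terms within a concept by OR and the concepts by AND."""
    blocks = []
    for terms in concepts:
        if not terms:
            continue  # skip empty concept groups
        block = " OR ".join(terms)
        if len(terms) > 1:
            block = f"({block})"  # parentheses keep the OR group together
        blocks.append(block)
    return " AND ".join(blocks)


# Two concepts (scope = 2), with 2 and 3 related terms (exhaustivity).
print(build_statement([["search", "retrieval"], ["tactics", "strategy", "moves"]]))
# prints: (search OR retrieval) AND (tactics OR strategy OR moves)
```

Adding a third concept list increases scope; adding a term to an existing list increases exhaustivity.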
Effectiveness “laws”
Tactic                              Output size   Recall   Precision
SCOPE - adding more ANDs            down          down     up
EXHAUSTIVITY - adding more ORs      up            up       down
USE OF NOTs - adding more NOTs      down          down     up
BROAD TERM USE - low specificity    up            up       down
PHRASE USE - high specificity       down          down     up
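The first three "laws" can be tried out on a toy example. Everything below is invented for illustration: the index, the terms, and the relevance judgments were picked so that the usual pattern shows up. Real searches follow these tendencies usually, not always.

```python
# Hypothetical inverted index: term -> ids of documents containing it.
index = {
    "retrieval":  {1, 2, 3, 4, 5, 6},
    "evaluation": {1, 2, 3},
    "testing":    {7, 8, 9, 10},
    "hardware":   {4, 5, 6},
}
relevant = {1, 2, 4, 9}  # assumed relevance judgments


def evaluate(retrieved: set[int]) -> tuple[int, float, float]:
    """Return (output size, precision, recall), rounded to 2 decimals."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant)
    return len(retrieved), round(precision, 2), round(recall, 2)


base = index["retrieval"]               # retrieval
more_and = base & index["evaluation"]   # retrieval AND evaluation
more_or = base | index["testing"]       # retrieval OR testing
with_not = base - index["hardware"]     # retrieval NOT hardware

for label, result in [("base", base), ("AND", more_and),
                      ("OR", more_or), ("NOT", with_not)]:
    print(label, evaluate(result))
# base (6, 0.5, 0.75)
# AND (3, 0.67, 0.5)
# OR (10, 0.4, 1.0)
# NOT (3, 0.67, 0.5)
```

Compared with the base statement, AND and NOT shrink the output and raise precision at the cost of recall, while OR does the reverse.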
Tactics: What to do?
• To increase precision:
– use precision devices
• To increase recall:
– use recall devices
• Each will also affect magnitude of output
• With experience, use of these devices will become second nature
Recall, precision devices
BROADENING - higher recall:     NARROWING - higher precision:
Fewer ANDs                      More ANDs
More ORs                        Fewer ORs
Fewer NOTs                      More NOTs
More free text                  Less free text
Fewer controlled                More controlled
More synonyms                   Fewer synonyms
Broader terms                   Narrower terms
Less specific                   More specific
More truncation                 Less truncation
Fewer qualifiers                More qualifiers
Fewer limits                    More limits
Citation growing                Building blocks
Other tactics
• Citation growing:
– find a relevant document
– look for documents cited in it
– look for documents citing it
– repeat on newly found relevant documents
• Building blocks
– find documents with term A
– review
– add term B & so on
• Using different feedbacks
– a most important tool
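The citation growing steps listed above can be sketched as a traversal of a citation network. The citation data and the relevance judgments here are hypothetical, and the `is_relevant` callback stands in for the searcher's own judgment; in practice a citation index would supply the cited and citing documents.

```python
from collections import deque

# Hypothetical citation data: document -> documents it cites.
cites = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": [],
    "E": ["A"],
    "F": ["E", "X"],
    "X": [],
}
relevant = {"A", "B", "D", "E"}  # assumed relevance judgments


def citation_growing(seed, cites, is_relevant):
    """Grow a set of relevant documents outward from one relevant seed."""
    # Reverse map: document -> documents citing it.
    cited_by = {}
    for doc, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(doc)

    found = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        doc = queue.popleft()
        # Look at documents cited in this one and documents citing it.
        for other in cites.get(doc, []) + cited_by.get(doc, []):
            if other in seen:
                continue
            seen.add(other)
            if is_relevant(other):
                found.append(other)
                queue.append(other)  # repeat on newly found relevant documents
    return found


print(citation_growing("A", cites, lambda d: d in relevant))
# prints: ['A', 'B', 'E', 'D']
```

Documents judged not relevant (C and F here) are not followed further, so X is never examined.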
Feedback in searching
• Any feedback implies loops
– a completion of a process provides information for modification, if any, for the next process
– information from output is used to change previous or create new input
• In searching:
– some information taken from output of a search is used to do something with the next query (search statement): examine what you got to decide what to do next in searching
– a basic tactic in searching
• Several feedback types used in searching
– each used for different decisions
Feedback types
• Content relevance feedback
– judge relevance of items retrieved
– make decision what to do next: switch files, change exhaustivity …
• Term relevance feedback
– find relevant documents
– examine what other terms are used in those documents
– search using additional terms
– also called query modification & in some systems done automatically
• Magnitude feedback
– on the basis of size of output make tactical decisions
– often the size is so big that documents are not examined, but the next search is done to limit size
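Term relevance feedback, as described above, can be sketched like this: take documents judged relevant, count the terms they use, and propose the most common ones that are not already in the query. The function name, the tiny stop-word list, and the example texts are assumptions for illustration; systems that do query modification automatically may work quite differently.

```python
from collections import Counter

# Small illustrative stop-word list (an assumption of this sketch).
STOP_WORDS = {"and", "in", "with", "the", "of"}


def suggest_terms(query_terms, relevant_docs, top_n=3):
    """Suggest additional search terms taken from relevant documents."""
    counts = Counter()
    for text in relevant_docs:
        # Count each term once per document, ignoring stop words.
        counts.update(set(text.lower().split()) - STOP_WORDS)
    for term in query_terms:
        counts.pop(term.lower(), None)  # drop terms already in the query
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Hypothetical texts of documents judged relevant.
relevant_docs = [
    "search tactics and relevance feedback",
    "relevance feedback in online search",
    "query modification with relevance feedback",
]
print(suggest_terms(["search", "tactics"], relevant_docs))
# prints: ['feedback', 'relevance', 'modification']
```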
Feedback types (cont.)
• Tactical review feedback
– after a number of queries (search statements) in the same search, review tactics as to getting desired outputs: review terms, logic, limits …
– change tactics accordingly
• Strategic review feedback
– after a while (or after consultation with user) review the “big” picture on what was searched and how: sources, terms, relevant documents, need satisfaction, changes in question, query …
– do next searches accordingly
– used in reiterative searching
• There is a difference between reviewing strategy & tactics
– but they can be combined
Bates Berry-picking model of searching
“…moving through many actions towards a general goal of satisfactory completion of research related to information need.”
– query is shifting (continually): as search progresses, queries are changing and different tactics are used
– searcher (user) may move through a variety of sources: new files, resources may be used; strategy may change
Berry-picking …
– new information may provide new ideas, new directions: feedback is used in various ways
– question is not satisfied by a single set of answers, but by a series of selections & bits of information found along the way: results may vary & may have to be provided in appropriate ways & means