Post on 09-Mar-2021
Leveraging Procedural Knowledge for Task-Oriented Search
Zi Yang, Eric Nyberg
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
{ziy,ehn}@cs.cmu.edu
Outline
• Background
• Problem Definition
• Proposed Approach
• Experiment
• Conclusion
Entity-centric vs. Task-oriented Search
• Entity-centric search
– Seek attributes, features, related entities, actions, etc.
• Task-oriented search
– Solution seeking and decision support.
Example task: organize a conference
• choose a hotel
• compare banquet options
• recruit volunteers
• contact the publisher
• consider the number and size of conference rooms
• arrange meal catering and menu plan
• check for discounted rate
How do searchers accomplish tasks using interactive search?
• Decompose the task into required subtasks manually
• Formulate queries manually
How do Search Engines Assist Searchers?
• Query suggestion as an example
Entity-centric search (existing solutions)
– Suggest attributes, features, related entities, actions, etc.
– Knowledge of attributes and features: descriptive knowledge
– Descriptive knowledge base
Task-oriented search (problem studied in this work)
– Suggest required subtasks, actions, solutions, etc.
– Knowledge exercised in the accomplishment of a task, i.e. how to do things: procedural knowledge
– Procedural knowledge base
Think Reversely!
• Can we learn procedural knowledge from users’ search activities and/or query suggestions, and build a PKB automatically?
Task-oriented search (problem also studied in this work)
– Suggest required subtasks, actions, solutions, etc.
– Knowledge exercised in the accomplishment of a task, i.e. how to do things: procedural knowledge
– Automatically built PKB (procedural knowledge base)
Related Work
• Search intent & task-oriented search
– Complex search task assistant from query log [Hassan et al. 2012, 2014]
– Task-oriented questions and how-to Web queries [Weber 2012]
– IMine, Subtask Mining @ NTCIR [Liu 2014]
• Procedural knowledge acquisition
– Ontologies proposed for structured representation of procedural knowledge [Fukazawa 2010, Pareti 2014]
– Extraction based on structural information [Jung 2010], definition of rules or templates [Addis 2009]
– Terminology: goal vs. target vs. purpose, instruction vs. action sequence, step vs. action, etc.
Outline
• Background
• Problem Definition
– Terminology
– Problem 1: Search Task Suggestion (STS)
– Problem 2: Automatic Procedural Knowledge Base Construction (APKBC)
– STS and APKBC
• Proposed Approach
• Experiment
• Conclusion
Terminology
Procedural knowledge graph/base (PKB)
• Example tasks: “How to Clean a Birdbath”, “How to Fix a Leaky Faucet”
• A task
– A short and concise summary
– A detailed explanation
• Is-achieved-by relation between a parent task and a list of subtasks
– Numbered “Steps”
– Bulleted substeps
– Outgoing free links
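The terminology above can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class names `Task` and `ProceduralKnowledgeBase`, their methods, and the sample explanation text are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """A PKB node: a short summary plus a detailed explanation."""
    summary: str
    explanation: str = ""
    # Ordered subtasks linked to this task by the is-achieved-by relation.
    subtasks: List["Task"] = field(default_factory=list)


class ProceduralKnowledgeBase:
    """Hypothetical in-memory PKB keyed by task summary."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}

    def add_task(self, task: Task) -> Task:
        self.tasks[task.summary] = task
        return task

    def add_subtask(self, parent: Task, child: Task) -> None:
        """Record that `parent` is achieved by `child`, preserving step order."""
        parent.subtasks.append(self.add_task(child))

    def first_level_subtasks(self, summary: str) -> List[Task]:
        return list(self.tasks[summary].subtasks)


# Illustrative data only; the explanation text is made up.
pkb = ProceduralKnowledgeBase()
root = pkb.add_task(Task("clean a birdbath", "Keep the basin free of algae."))
pkb.add_subtask(root, Task("empty the old water"))
pkb.add_subtask(root, Task("scrub the basin"))
print([s.summary for s in pkb.first_level_subtasks("clean a birdbath")])
```

Storing subtasks as an ordered list mirrors the numbered “Steps” from which the is-achieved-by relation is read.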
Problem 1: Search Task Suggestion (STS)
• When users turn to search engines for information seeking and problem solving, how can we leverage existing procedural knowledge to suggest sub search tasks (i.e. queries)?
Search Task Suggestion: Given a procedural knowledge graph G and a task-oriented search task q, we aim to
– 1(a) identify the task t from T the user intends to accomplish,
– 1(b) retrieve a list of n subtasks s1, …, sn,
– 1(c) suggest the corresponding sub search tasks p1, …, pk.
[Diagram: search task q and search tasks p1, …, pk on the task-oriented search side; task t and tasks s1, …, sn on the procedural knowledge base side.]
Problem 2: Automatic Procedural Knowledge Base Construction (APKBC)
• Users still face ad hoc situations (tasks) that are not covered by an existing PKB, but other searchers may have interacted with search engines to attempt a solution.
• Can we construct a PKB using search queries and relevant documents returned from search engines?
Automatic Procedural Knowledge Base Construction: Given a task t, we aim to
– 2(a) identify a search task q,
– 2(b) collect k related search tasks p1, …, pk,
– 2(c) identify n (≤ k) search tasks to generate n tasks s1, …, sn, with text descriptions, that can be performed to accomplish the task t.
[Diagram: search task q and search tasks p1, …, pk on the task-oriented search side; task t and tasks s1, …, sn on the procedural knowledge base side.]
Outline
• Background
• Problem Definition
• Proposed Approach
– Basic Idea
– Three-way Parallel Corpus Construction
– Feature Definition and Model Construction
• Experiment
• Conclusion
Queryable Phrase/Task Description Extraction: Basic Idea
• Joint learning from available artifacts
Existing PKBs
• Can indicate how to accomplish tasks
• Are not optimized for interactive search
Existing search logs
• Can reveal how to formulate queries
• Cannot cover how to search for procedural knowledge
Existing Web documents
• Can exemplify how to describe tasks
• Do not focus on procedure
Can we take advantage of all the artifacts and let them learn from each other?
• Query phrase extraction
• Three-way parallel corpus construction
• Task description extraction
Three-way Parallel Corpus Construction
• Parallel corpus := a set of matching triples ⟨a query q, a task t, a textual context c⟩
• Example: Grow Taller, http://www.wikihow.com/Grow-Taller
Three-way Parallel Corpus Construction (cont’d)
• Step 1: Extracting seed triples from search query log
– Scan through the entire search query log to find each query q that matches the description of task t.
– Extract the textual content from the top relevant documents to retrieve the context c.
– Exact matching is used in the experiment.
Example (Grow Taller)
Task descriptions in PKBs
• If you’re from a tall family and you’re not growing by your mid-teens, or if your height hasn’t changed much from before puberty or during puberty, then it’s a good idea to see a doctor…
• The human growth hormone (HGH) is produced naturally in our bodies, especially during deep or slow wave sleep. Getting good, sound sleep will encourage the production of HGH, which is created in the pituitary gland.
• …There are tons of “grow taller” exercises on the Internet, which claim to help you grow…
Contexts retrieved from the Web
• …If you’re from a tall family and you’re not growing by your mid-teens, or if your height hasn’t changed much from before puberty to during puberty, then it’s a good idea to see a doctor.
• The growth hormone (HGH) is produced naturally in the pituitary gland during deep or slow wave sleep.
Search queries in a session
• grow taller
Three-way Parallel Corpus Construction (cont’d)
• Step 2 (optional): Manually creating search tasks for tasks in the PKB
– Use the summary of the task t to form a search query q and issue it to the search engine to extract context c.
– Exclude this triple due to “artificiality”!
Example (Grow Taller): same task descriptions and Web contexts as in Step 1.
Search queries in a session
• grow taller
Three-way Parallel Corpus Construction (cont’d)
• Step 3: Collecting related queries
– Combine the user-issued queries from the same session (from Step 1) and the list of queries suggested by the search engine (from Steps 1 and 2).
Example (Grow Taller): same task descriptions and Web contexts as in Step 1.
Search queries in a session
• grow taller
• human growth hormone
• grow taller exercises
• …
Three-way Parallel Corpus Construction (cont’d)
• Step 4: Expanding parallel corpus
– For each related query p, find the subtask among s1, …, sn that contains p in its summary or explanation, and retrieve its context d.
– Discard unmatched related queries or task descriptions.
– Exact matching is used in the experiment.
Example (Grow Taller): same task descriptions and Web contexts as in Step 1.
Search queries in a session
• grow taller
• human growth hormone
• grow taller exercises
• …
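The matching in Step 4 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names, the subtask identifiers `s1` to `s3`, and the query “height increasing shoes” are made up, and the normalization (lowercasing and dropping punctuation before whole-word containment) is one plausible reading of “exact matching”.

```python
import re
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters by spaces, collapse spaces."""
    return " ".join(re.sub(r"[^a-z0-9]", " ", text.lower()).split())


def match_related_queries(
    related_queries: List[str],
    subtask_descriptions: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pair each related query p with every subtask whose description
    (summary or explanation) contains p; unmatched queries are discarded."""
    pairs = []
    for p in related_queries:
        needle = f" {normalize(p)} "
        for subtask_id, description in subtask_descriptions.items():
            # Padding with spaces restricts matches to whole words.
            if needle in f" {normalize(description)} ":
                pairs.append((p, subtask_id))
    return pairs


# Descriptions abridged from the Grow Taller example; ids are hypothetical.
subtasks = {
    "s1": "If you're from a tall family and you're not growing by your "
          "mid-teens, then it's a good idea to see a doctor",
    "s2": "The human growth hormone (HGH) is produced naturally in our bodies",
    "s3": 'There are tons of "grow taller" exercises on the Internet',
}
queries = ["human growth hormone", "grow taller exercises", "height increasing shoes"]
print(match_related_queries(queries, subtasks))
```

Without the punctuation stripping, “grow taller exercises” would not match the third description because of the quotation marks around “grow taller”.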
Three-way Parallel Corpus Construction (cont’d)
• Step 5: Annotating BIO
– Find the contiguous sequence of words from the task t (context c) that is most relevant to the query q (task t’s summary or explanation).
– Exact matching is used for annotating the task in the experiment.
– Selected the sentences from the context that contain all the tokens in the task summary and 70%+ of the tokens in the task explanation, and annotated the minimal span that contains those overlapping tokens.
Example (Grow Taller): same task descriptions and Web contexts as in Step 1.
Search queries in a session
• grow taller
• human growth hormone
• grow taller exercises
• …
Annotation labels shown on the example
• BQ IQ
• BQ IQ IQ
• BTE ITE …
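The context annotation rule can be sketched as below. This is a hedged illustration, not the authors' code: the function names are hypothetical, the overlap ratio is computed over distinct tokens (the slides do not say whether types or occurrences are counted), and a single threshold parameter stands in for both the “all tokens” (1.0) and “70%+” (0.7) conditions.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple


def minimal_span(sentence: List[str], needed: Set[str]) -> Optional[Tuple[int, int]]:
    """Smallest window [start, end) of `sentence` covering every token in `needed`."""
    best = None
    counts: Counter = Counter()
    covered, left = 0, 0
    for right, tok in enumerate(sentence):
        if tok in needed:
            counts[tok] += 1
            covered += counts[tok] == 1
        while needed and covered == len(needed):
            if best is None or right + 1 - left < best[1] - best[0]:
                best = (left, right + 1)
            if sentence[left] in needed:
                counts[sentence[left]] -= 1
                covered -= counts[sentence[left]] == 0
            left += 1
    return best


def bio_tags(sentence: List[str], target: List[str],
             prefix: str = "TE", min_overlap: float = 0.7) -> List[str]:
    """Tag the minimal span of overlapping tokens with B/I labels, else all O."""
    target_set = set(target)
    overlap = target_set & set(sentence)
    if not overlap or len(overlap) / len(target_set) < min_overlap:
        return ["O"] * len(sentence)
    start, end = minimal_span(sentence, overlap)
    return (["O"] * start + ["B" + prefix] + ["I" + prefix] * (end - start - 1)
            + ["O"] * (len(sentence) - end))


context = "the growth hormone hgh is produced naturally in the pituitary gland".split()
explanation = "growth hormone is produced naturally".split()
print(bio_tags(context, explanation))
```

Calling `bio_tags(sentence, summary, prefix="TS", min_overlap=1.0)` would correspond to the stricter summary condition.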
Feature Definition
• Feature list for both context and task (category, description/motivation, count)
– Location (LOC), count 2: appears in the task summary and explanation. Motivation: “Skimmable information that readers can quickly understand” should be provided in the title and the beginning sentence of each step.
– Part of speech (POS), count 36. Motivation: both the article title and the first sentence in each step begin with a verb in bare infinitive form.
– Parse (PAR): basic Stanford dependency types, count 50; named entity, noun phrase, verb phrase, count 3. Motivation: identify the task facets (subsidiary resources or constraints, etc.).
– Word: surface, stem, TF-IDF score, count 3.
– Context: surface, stem, TF-IDF score, POS tags of previous/next word, count 78.
Model Construction
• Word sequence labeling for query construction, task summary construction, and task explanation construction
– Problem: all three are word sequence labeling problems.
– Model: M_Q (query construction), M_TS (task summary construction), M_TE (task explanation construction).
– Features: the same feature set, except that location is only used for query construction.
– Training set: for M_Q, features X_t and labels Y_t extracted from the task description; for M_TS and M_TE, features X_c and labels Y_c extracted from the context.
– Prediction objective:
$y_t^* = \arg\max_{y_t \in \{BQ, IQ, O\}^{|t|}} p(y_t \mid x_t; M_Q)$
$y_c^* = \arg\max_{y_c \in \{BTS, ITS, O\}^{|c|}} p(y_c \mid x_c; M_{TS})$
$y_c^* = \arg\max_{y_c \in \{BTE, ITE, O\}^{|c|}} p(y_c \mid x_c; M_{TE})$
– Output: y_t* = O … O BQ IQ IQ O … O; y_c* = O … O BTS ITS ITS O … O BTE ITE ITE O … O
STS and APKBC
[Diagram: search task q and search tasks p1, …, pk on the task-oriented search side; task t and tasks s1, …, sn on the procedural knowledge base side.]
• STS: 1(a) identify task; 1(b) retrieve subtasks; 1(c) suggest and create sub search task
• APKBC: 2(a) identify search task; 2(b) collect related search tasks; 2(c) identify and generate subtasks
Annotations on the diagram
• Exact matching or retrieval based method
• Need a search intent model to retrieve task-oriented search tasks (future work)
• Refer to PKB to retrieve related subtasks
• Generate queryable phrases/task descriptions using an algorithm that learns how searchers formulate queries/editors describe procedural knowledge
Outline
• Background
• Problem Definition
• Proposed Approach
• Experiment
– Data Preparation
– Experiment Settings
– Search Task Suggestion Result
– Procedural Knowledge Base Construction Result
• Conclusion
Data Preparation
• English wikiHow data dump
• AOL search query log
• Queries suggested by search engines
• Context extracted from search engines
Experiment Settings
• Sequence labeling vs. end-to-end evaluation
– Gold standard: automatically labeled parallel corpus (sequence labeling); manual judgment (end-to-end).
– Test set: 10-fold cross validation (sequence labeling); 50 randomly sampled triples (end-to-end).
– Evaluation methods (sequence labeling): Precision, Recall, F-1, averaged on all test instances (macro-averaged) and on each task then across all tasks (micro-averaged); F-1 based ROUGE-2 and -S4.
– Evaluation methods (end-to-end): macro-averaged and micro-averaged Precision@8, MAP.
– Baseline methods: CRF (proposed), HMM (surface), LR, SVM, feature ablation (sequence labeling); Google, Bing, wikiHow (end-to-end).
– Feature extractors, learners: Stanford CoreNLP (sentence, token, stem, POS, dependency parse, chunk, named entity); MALLET (CRF, HMM); LibLinear (LR, SVM).
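For readers unfamiliar with the end-to-end metrics, the sketch below shows one common way to compute Precision@k and MAP from per-suggestion relevance judgments. It is illustrative only: the function names are hypothetical, Precision@k divides by k even when fewer than k suggestions exist, and average precision is normalized by the number of relevant suggestions found in the list, which may differ from the convention used in the experiments.

```python
from typing import List, Sequence


def precision_at_k(relevance: Sequence[bool], k: int = 8) -> float:
    """Fraction of the top-k suggestions judged relevant (always divides by k)."""
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool]) -> float:
    """Mean of the precision values at each rank holding a relevant suggestion."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: List[Sequence[bool]]) -> float:
    """Average of the per-task average precision values."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Made-up judgments for two tasks, ordered by suggestion rank.
judged = [True, False, True, False]
print(precision_at_k(judged, k=4), average_precision(judged))
print(mean_average_precision([judged, [False, False]]))
```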
Search Task Suggestion Result
• Query construction result
– The proposed CRF-based approach outperforms other classifiers*, esp. independent classifiers (max. SVM).
– Also outperforms each feature category** (max. W/ WORD), and LOU study ns (max. W/O POS).
[Bar chart: Macro F1, Micro F1, ROUGE-2, ROUGE-S4 for CRF, HMM, SVM, LR, TFIDF, W/ POS, W/ PAR, W/ LOC, W/ WORD, W/O POS, W/O PAR, W/O LOC, W/O WORD, LOCAL, CONTEXT. Data labels: .7471 .6930 .8112 .8087; .6855 .6612 .7922 .7892; .6803 .6175 .7713 .7657; .7466 .6870 .8113 .8082.]
Search Task Suggestion Result (cont’d)
• End-to-end example
– Slim down
– Play red alert 2
Task: slim down
• PROPOSED: weight loss; heavy food; junk food; keep up the mood
• GOOGLE: slim down diet; 7 day slim down; weight loss; slim down thighs
• BING: the slim down club; how to slim down fast; slim down challenge; how to slim down legs
Task: play red alert 2
• PROPOSED: build a barracks; build a war factory; radar chould; build a power plant/tesla reactor
• GOOGLE: red alert 2 complete (iso) original 2 disc; play red alert 2 free; play red alert 2 online free; play red alert 3
• BING: play red alert 2 game; play ra2 online; red alert 2 download; free red alert 3
Search Task Suggestion Result (cont’d)
• End-to-end evaluation
– Proposed approach is tailored for task-oriented search.
– Current general-purpose commercial search engines are designed for entity-centric search.
– Current search engines tend to suggest queries by appending keywords such as product, image, logo, online, free, etc.
[Bar chart: Macro P, Micro P, MAP for PROPOSED, GOOGLE, BING, LOG. Data labels: .4457 .4457 .3361; .0972 .0973 .0553; .0333 .0313 .0120; .0676 .0612 .0549.]
Automatic Procedural Knowledge Base Construction Result
• Task summary generation result
– All scores are lower than in the query construction task.
– CRF outperforms other classifiers* (max. SVM), each feature category ns (max. W/ POS), and LOU study ns (max. W/O WORD).
[Bar chart: Macro F1, Micro F1, ROUGE-2, ROUGE-S4 for CRF, HMM, SVM, LR, TFIDF, W/ POS, W/ PAR, W/ WORD, W/O POS, W/O PAR, W/O WORD, LOCAL, CONTEXT. Data labels: .4207 .3455 .4463 .4392; .1175 .1119 .2425 .2301; .3556 .3153 .3822 .3788; .4129 .3198 .4170 .4118.]
Automatic Procedural Knowledge Base Construction Result (cont’d)
• Task explanation generation result
– CRF outperforms other classifiers* (max. HMM, implying the importance of surface forms and sequence labeling nature).
– Also outperforms each feature category ns (max. W/ WORD).
– LOU study shows W/O PAR performs the best in terms of ROUGE.
[Bar chart: Macro F1, Micro F1, ROUGE-2, ROUGE-S4 for CRF, HMM, SVM, LR, TFIDF, W/ POS, W/ PAR, W/ WORD, W/O POS, W/O PAR, W/O WORD, LOCAL, CONTEXT. Data labels: .3853 .3577 .3698 .3686; .0000 .0050 .2450 .2324; .3639 .3176 .3489 .3472; .3718 .3468 .3804 .3793.]
Automatic Procedural Knowledge Base Construction Result (cont’d)
• End-to-end example
– A search engine would suggest “sign up for airbnb coupon” for “sign up for airbnb”, which implies an important resource for the task.
Task: sign up for airbnb
• Airbnb is no longer running the $50 OFF $200 promo but you can still save $25 OFF Your First Airbnb Stay of $75 or more by copying and pasting this link into your browser…
Task: make blueberry banana bread
• Please don’t use regular whole wheat in this recipe – the loaf will turn out very dense
• Add the wet ingredients – the egg mixture to the flour mixture and stir with a rubber spatula until just combined
• If you’re in need of a quick, easy and delicious way to use up the ripe babanas in your house… definitely
Task: become a cell phone dealer
• However, the cell phone provider may place restrictions on the manner in which you can use its company name, phone brands and images
• Visit the state’s business licensing agency’s website and your city’s occupational/business licensing department’s website to determine if you need a license for your prepaid cell phone business
Automatic Procedural Knowledge Base Construction Result (cont’d)
• End-to-end evaluation
– The automatic approach performs worse than manual curation in building a new PKB from scratch.
– But it still discovers relevant subtasks that are not covered in the current PKB, delivering fresh information that can hardly be added and updated instantly in a manual process.
[Bar chart: Macro P, Micro P, MAP for Proposed Summary Generation, Proposed Explanation Generation, wikiHow. Data labels: .0997 .0995 .0527; .2046 .2041 .1331; .9677 .9515 .9404.]
Conclusion
• Investigated two problems
– Search task suggestion using procedural knowledge
– Automatic procedural knowledge base construction from search activities
• Proposed to create a three-way parallel corpus of queries, query contexts, and task descriptions.
• Applied CRF-based sequence labeling models for query construction and task description generation.
• Future work
– User study
– Joint ranking
– APKBC using a natural language generation approach
Thanks! Questions?
• Code & Resources: http://github.com/ziy/pkb
• Related Workshop Talk: Answering Task-Oriented Questions from the Web, WebQA Workshop, Thursday 11am
• Contact: Zi Yang, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, ziy@cs.cmu.edu
• Acknowledgement: Travel is sponsored by SIGIR Student Travel Grant!
Parallel Corpus Construction Result
• Related query to subtask mapping
– Identified 1,182 query-task pairs using exact matching.
• Task to context mapping
– Selected the sentences that contain all the tokens in the task summary and 70%+ of the tokens in the task explanation.
– Annotated the minimal span that contains those overlapping tokens.
How Do Search Engines and Users Respond to Task-Oriented Queries?
• The number (and percentage) of suggested queries (or queries issued in the same session) that are mentioned within the description of some subtask.
– “New words”: e.g. slim down -> slim down diet
– Low quality may be due to an over-simplified session detection method
[Bar charts: averaged number and percentage (%) of matching queries, for “Full phrase” and “New words”, comparing Bing and Log.]
Search Task Suggestion
Given a task-oriented search task represented by query q
(a) Identify task
– Retrieve a list of candidate tasks from the PKB that mention the query q in either the summary or explanation.
– Select the task t whose occurrence of q maximizes the likelihood p(y_t = BQ IQ … IQ | x_t; M_Q).
(b) Retrieve subtasks
– Retrieve the first-level subtasks s1, …, sn of task t.
(c) Suggest and create sub search task
– Extract query candidates for each subtask si using M_Q again.
– Rank by p(y_si = BQ IQ … IQ | x_si; M_Q).
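The three STS steps can be strung together roughly as follows. This is a sketch under loud assumptions: the trained model M_Q is not reproduced here, so `span_prob` and `extract_query` are hypothetical callables standing in for its probabilities, and the toy PKB entries and toy scoring functions are invented purely to make the example run.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical PKB layout: task summary -> (explanation, first-level subtasks).
PKB = Dict[str, Tuple[str, List[str]]]


def suggest_search_tasks(
    q: str,
    pkb: PKB,
    span_prob: Callable[[str, str], float],
    extract_query: Callable[[str], Tuple[str, float]],
) -> List[str]:
    # (a) Candidate tasks mention q in the summary or the explanation;
    #     keep the one with the highest stand-in probability.
    candidates = [s for s, (expl, _) in pkb.items() if q in s or q in expl]
    if not candidates:
        return []
    task = max(candidates, key=lambda s: span_prob(q, s + " " + pkb[s][0]))
    # (b) First-level subtasks of the selected task.
    subtasks = pkb[task][1]
    # (c) One query candidate per subtask, ranked by its stand-in probability.
    scored = sorted((extract_query(s) for s in subtasks),
                    key=lambda pair: pair[1], reverse=True)
    return [phrase for phrase, _ in scored]


# Toy stand-ins for M_Q (assumptions, not the trained CRF).
def toy_span_prob(q: str, text: str) -> float:
    return len(q.split()) / len(text.split())


def toy_extract_query(subtask: str) -> Tuple[str, float]:
    return subtask, 1.0 / len(subtask.split())


toy_pkb: PKB = {
    "slim down": ("lose weight steadily", ["keep up the mood", "avoid junk food"]),
    "slim down a photo": ("resize an image in an editor", ["open the image editor"]),
}
print(suggest_search_tasks("slim down", toy_pkb, toy_span_prob, toy_extract_query))
```

Passing the model in as callables keeps the control flow of steps (a) to (c) visible without committing to a particular sequence labeling library.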
Automatic Procedural Knowledge Base Construction
Given a task t,
(a) Identify search task
– Apply M_Q to extract a task-oriented search query q.
(b) Collect related search tasks
– Identify the queries pi related to q in both search logs and suggested queries.
(c) Identify and generate subtasks
– Extract relevant document snippets for each related query pi from search engines.
– Apply M_TS/M_TE to extract the task summary and explanation.
Notes
• Search engines are able to correctly suggest related tasks to the user, rather than related entities or attributes.
• Search logs reveal how a specific user works to accomplish a task.
Data Preparation
• English wikiHow data dump
– Used a modified version of the WikiTeam tool.
– Obtained 149,975 articles that are non-redirect, in namespace “0”, non-stub, with “Introduction” and “Steps”.
– Created a PKB of 1,488,587 tasks and 1,439,217 relations.
• AOL search query log
– 21M (10M unique) queries in total.
– After downcasing and removing non-alphanumeric characters, 639 unique queries match 619 task summaries (whitespace and punctuation marks ignored).
– Identified 33,548 related query candidates by collecting the queries that were issued by the same user within 30 minutes after each matching query.
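The query log processing above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the log layout `(user id, timestamp, query)`, the function names, and the sample entries are assumptions, and the quadratic scan is only suitable for a toy log (a 21M-query log would need grouping by user first).

```python
import re
from datetime import datetime, timedelta
from typing import List, Tuple

LogEntry = Tuple[str, datetime, str]  # (user id, timestamp, query); assumed layout


def normalize(text: str) -> str:
    """Lowercase and drop every non-alphanumeric character, including whitespace."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def related_queries(
    log: List[LogEntry],
    task_summaries: List[str],
    window: timedelta = timedelta(minutes=30),
) -> List[Tuple[str, str]]:
    """Pairs (matching query, later query by the same user within `window`)."""
    keys = {normalize(s) for s in task_summaries}
    related = []
    for user, t0, query in log:
        if normalize(query) not in keys:
            continue  # query does not match any task summary
        for other_user, t1, other in log:
            if other_user == user and timedelta(0) < t1 - t0 <= window:
                related.append((query, other))
    return related


# Made-up log entries for illustration.
sample_log = [
    ("u1", datetime(2006, 3, 1, 10, 0), "Grow taller!"),
    ("u1", datetime(2006, 3, 1, 10, 5), "human growth hormone"),
    ("u1", datetime(2006, 3, 1, 10, 45), "cheap flights"),
    ("u2", datetime(2006, 3, 1, 10, 6), "grow taller exercises"),
]
print(related_queries(sample_log, ["Grow Taller"]))
```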
Data Preparation (cont’d)
• Queries suggested by search engines
– Randomly sampled 1,000 non-primitive tasks from the PKB that do not appear in the query log.
– Collected 9,906 related queries suggested by Google (avg. 6.11, max. 8) and 9,715 related queries suggested by Bing (avg. 5.99, max. 13) for the 1,639 queries.
• Context extracted from search engines
– Extracted URLs from Google’s first search result page, excluded the wikihow.com domain (for generalizability), the google.com domain, and URLs that have no subpaths (navigational search results), and downloaded 7,440 context documents.
– Used Boilerpipe to extract 7,437 documents as contexts, and an additional 3,512 documents for end-to-end evaluation.
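The URL filtering rule can be illustrated with Python's standard `urllib.parse`. This is a sketch of the stated exclusions only; the function name and the example URLs (other than the wikiHow one quoted earlier in the slides) are made up, and treating subdomains as part of an excluded domain is an assumption.

```python
from urllib.parse import urlparse

EXCLUDED_DOMAINS = ("wikihow.com", "google.com")


def keep_result_url(url: str) -> bool:
    """Keep a search result URL unless it is on an excluded domain
    or has no subpath (treated as a navigational result)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS):
        return False
    # A path of "" or "/" means the URL points at the site root.
    return parsed.path.strip("/") != ""


urls = [
    "http://www.wikihow.com/Grow-Taller",
    "https://www.google.com/search?q=grow+taller",
    "http://example.com/",
    "http://example.com/health/grow-taller",
]
print([u for u in urls if keep_result_url(u)])
```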
Search Task Suggestion Result (cont’d)
• 5 most contributing non-word features
– Query phrases are more likely extracted from the summary part of a description due to its clarity and conciseness.
– Singular nouns and verbs are indicators to begin a query.
– Verb phrase is used to decide whether to continue a query.
Top 5 features per label transition (rank 1 to 5)
• O → BQ: POS:NNP; LOC:sum; DEP:ccomp; POS:VB; DEP:nsubjpass
• BQ → IQ: POS-1:VB; LOC:sum; POS-1:VBP; POS-1:NNP; POS-1:NN
• IQ → IQ: LOC:sum; POS-1:IN; VP; DEP:dobj; POS+1:JJ
Automatic Procedural Knowledge Base Construction Result (cont’d)
• 5 most contributing non-word features
– Nouns and verbs are crucial for constructing task descriptions.
– Verbs are preferred over nouns to begin the summary.
– To begin an explanation, it prefers the “begin” of a sentence and a dependency label of nsubj.
– Verb phrases are also important.
Top 5 features per label transition (rank 1 to 5)
Summary
• O → BTS: POS:VB; POS:VBP; POS:NN; DEP:appos; POS:NNP
• BTS → ITS: POS-1:VB; POS-1:VBP; POS-1:NNP; POS-1:NN; DEP:case
• ITS → ITS: POS-1:VBP; POS:NNP; POS-1:IN; DEP:xcomp; POS:JJR
Explanation
• O → BTE: Begin; POS:VBG; POS:NN; DEP:compound; DEP:nsubj
• BTE → ITE: VP; POS-1:NN; POS-1:DT; NP; POS-1:VB
• ITE → ITE: POS-1:NN; VP; POS-1:NNS; POS-1:, ; POS-1:NNP