Geo-Tagging News Stories Using Contextual Modelling
Md Sadek Ferdous, University of Southampton
The International Journal of Information Retrieval Research is indexed or listed in the following: ACM Digital Library; Bacon’s Media Directory; Cabell’s Directories; DBLP; Google Scholar; INSPEC; JournalTOCs; MediaFinder; ProQuest Advanced Technologies & Aerospace Journals; ProQuest Computer Science Journals; ProQuest Illustrata: Technology; ProQuest SciTech Journals; ProQuest Technology Journals; Research Library; The Standard Periodical Directory; Ulrich’s Periodicals Directory; Web of Science; Web of Science Emerging Sources Citation Index (ESCI)
Editorial Preface
Vishal Bhatnagar, Ambedkar Institute of Advanced Communication Technologies and Research, Department of Computer Science and Engineering, Delhi, India
Arushi Jain, Ambedkar Institute of Advanced Communication Technologies and Research, Department of Computer Science and Engineering, New Delhi, India
Vishal Bhatnagar, Ambedkar Institute of Advanced Communication Technologies and Research, Department of Computer Science and Engineering, New Delhi, India
Volume 7 • Issue 4 • October-December 2017 • ISSN: 2155-6377 • eISSN: 2155-6385
An official publication of the Information Resources Management Association
Keywords: Contextual Modelling, Evaluation, Geo-Tagging, Information Retrieval, Text Mining
INTRODUCTION
With the ever-increasing popularity of Location-based Services, geo-tagging a document (the process of identifying geographic locations, or toponyms, in the document) has gained much attention in recent years. In such services, geographic locations act as the glue that binds together disparate document sets (such as textual contents, images and videos) from multiple data sources. Devices that produce multimedia documents such as images and videos can be equipped with additional sensors (e.g. GPS sensors) that geo-tag the document with geographic information such as latitude and longitude; this information is stored as metadata along with the corresponding document. Web services that accumulate such documents (e.g. YouTube and Flickr) can retrieve this information automatically. In addition, such services allow any user to manually tag a multimedia document with geographic locations in cases where the document has not been geo-tagged by its capturing device. Unfortunately, the geo-tagging procedure is rather cumbersome for textual documents and generally relies on manual human input. There have been several works to address
International Journal of Information Retrieval Research • Volume 7 • Issue 4 • October-December 2017
of context and geo-tagging for a news story. In Section: IMPLEMENTATION, we discuss our implementation along with the algorithm that utilises our model of context for geo-tagging a news story. We describe our evaluation procedure in Section: EVALUATION and present the results in Section: RESULT. We answer our research questions, discuss the advantages and highlight the limitations of the proposed approach in Section: DISCUSSION. We conclude in Section: CONCLUSION.
An influential work on geo-tagging web documents was presented in (Amitay, 2004). The paper described a data mining approach utilising a gazetteer (a geographical directory listing the names of places) to locate the places mentioned within a document as well as to determine the geographic focus of the document, representing its broader locality such as a city or state. The authors also discussed mechanisms to resolve two types of ambiguity: geo/non-geo and geo/geo. The first (geo/non-geo) arises when a location name coincides with a non-geographic name, e.g. Turkey, whereas the second (geo/geo) arises when places in different countries share the same name, e.g. London, England and London, Canada. Based on an evaluation over 600 web pages, the authors reported a precision of 82% for individual geo-tags and a precision of 91% in determining the geographic focus of a page. The paper did not investigate whether the approach would be suitable for street/locality-level granularity.
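To make the two ambiguity types concrete, the minimal Python sketch below flags a name as geo/non-geo when it is both a gazetteer entry and an ordinary word, and as geo/geo when the gazetteer lists more than one place under that name. The toy gazetteer, the word list and `classify_ambiguity` are hypothetical illustrations, not the Web-a-where method of (Amitay, 2004).

```python
from typing import Dict, List, Set

# Hypothetical toy gazetteer: place name -> distinct places that share the name.
GAZETTEER: Dict[str, List[str]] = {
    "London": ["London, England", "London, Ontario, Canada"],
    "Turkey": ["Turkey (country)"],
    "Canterbury": ["Canterbury, England"],
}

# Hypothetical set of ordinary (non-geographic) words, lower-cased.
COMMON_WORDS: Set[str] = {"turkey"}


def classify_ambiguity(name: str) -> List[str]:
    """Return the ambiguity types that apply to `name` (empty if none)."""
    places = GAZETTEER.get(name, [])
    kinds: List[str] = []
    if places and name.lower() in COMMON_WORDS:
        kinds.append("geo/non-geo")  # e.g. Turkey the country vs. turkey the bird
    if len(places) > 1:
        kinds.append("geo/geo")  # e.g. London, England vs. London, Canada
    return kinds


print(classify_ambiguity("London"))  # ['geo/geo']
print(classify_ambiguity("Turkey"))  # ['geo/non-geo']
```

A real system would also need context (surrounding words, other places in the text) to decide which reading applies; the sketch only detects that a decision is needed.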
One of the major challenges in geo-tagging a document is handling ambiguity. In this regard, the authors in (Garbin, 2005) presented an approach based on unsupervised machine learning that aggregates two publicly available gazetteers. First, ambiguous locations were disambiguated automatically by applying preference heuristics, and the results served as a training dataset for a machine learner. Next, the machine learner was used to disambiguate ambiguous locations in other data. Evaluated against a human-annotated news corpus containing 7,739 documents, their approach achieved 78.5% precision.
A recent work on geo-tagging news articles was presented in (Ignazio, 2014), where the authors extended an existing geo-tagging API called CLAVIN (CLAVIN, 2016) by applying a few heuristics based on the method described in (Amitay, 2004). Their approach determined the focus of a news article as well as all places mentioned in the article. They reported 95% accuracy over a small manually annotated dataset of 75 news articles, and 90%-91% accuracy in determining the focus at the country level using two separate samples of 10,000 news articles, one from the New York Times Annotated Corpus (Sandhaus, 2008) and one from the Reuters RCV-1 Corpus (Sandhaus, 2004).
There are several commercial APIs available for geo-tagging textual documents such as news stories, e.g. OpenCalais (OpenCalais, 2016), Placespotter (Placespotter, 2016) and GeoTag (GeoTag, 2016). However, we were looking for publicly available APIs that could be extended to meet our requirements. We found two such APIs, namely CLAVIN (CLAVIN, 2016) and CLIFF (CLIFF, 2016). Of these two, CLIFF is based on CLAVIN and extends CLAVIN's capabilities. Moreover, it has been reported to achieve better performance than CLAVIN (Ignazio, 2014). Therefore, we selected CLIFF for our experiment. CLIFF utilises Stanford NER (Stanford NER, 2016) to extract named entities and then applies a few heuristics to geo-tag the news and to determine its geographic focus. Even though CLIFF can identify a street/locality, it does not associate a street with a city. In addition, all the works discussed above mainly focused on either the country or the city level and did not investigate whether their approaches would be suitable for street- or locality-level geo-tagging. Furthermore, their approaches would fail in the absence of direct mentions of locations.
Let $C$ denote the set of all countries in the world and $CITY_c$ denote the set of all cities in a country $c \in C$. We denote the set of all roads in a city $city \in CITY_c$ of a country $c$ by $R_{city}$. Each country defines its own postcode format. Without specifying that format, we assume that a postcode is assigned to a collection of one or more roads within a specific city. We denote the set of postcodes for a city $city \in CITY_c$ by $PC_{city}$. To relate a postcode to the corresponding roads, we define the following function.
Definition 2: Let $pcToRoads: PC_{city} \rightarrow \mathcal{P}(R_{city})$ be the function that returns the set of roads to which a given postcode is assigned.
Finally, we define a location as an ordered tuple consisting of a road, a locality, a postcode, a city and a country. Formally, the set of locations for a city $city \in CITY_c$ in country $c \in C$ is denoted as $L_{city}$ and is defined as:
$$L_{city} = \{(r, loc, pc, city, c)\}$$
where:
$$r \in R_{city},\; loc \in LOC_{city},\; pc \in PC_{city},\; city \in CITY_c \text{ and } c \in C$$
$$l = (r_1, pcToLoc(roadToPC(r_1)), roadToPC(r_1), city, c)$$
where $l \in L_{city}$, $roadToPC(r_1)$ resolves the road $r_1$ to the corresponding postcode, and $pcToLoc(roadToPC(r_1))$ resolves the road to its postcode, which is then resolved to the corresponding locality. An example of a location where a locality is defined is: 176 King's Road, Chelsea, SW3 4UP, London, UK. An example of a location where a locality is not defined is: 29 Ethelbert Road, CT1 3NF, Canterbury, UK, where Canterbury is a city, not a locality, in the UK.
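The location tuple and its helper functions can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the lookup tables are tiny hand-made dictionaries built from the two example addresses (house numbers dropped), `roadToPC` is treated as single-valued exactly as in the equation for $l$ even though a real road can span several postcodes, and the names `Location`, `make_location` and `pc_to_roads` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Location:
    """One element of L_city: (road, locality, postcode, city, country)."""
    road: str
    locality: Optional[str]  # None when no locality is defined for the postcode
    postcode: str
    city: str
    country: str


def pc_to_roads(pc: str, road_to_pc: Dict[str, str]) -> Set[str]:
    """pcToRoads: every road in the city whose postcode is `pc`."""
    return {road for road, code in road_to_pc.items() if code == pc}


def make_location(road: str, city: str, country: str,
                  road_to_pc: Dict[str, str],
                  pc_to_loc: Dict[str, str]) -> Location:
    """Build l = (r, pcToLoc(roadToPC(r)), roadToPC(r), city, c)."""
    pc = road_to_pc[road]         # roadToPC(r)
    locality = pc_to_loc.get(pc)  # pcToLoc(roadToPC(r)); None if undefined
    return Location(road, locality, pc, city, country)


# Hypothetical tables built from the two example addresses (house numbers dropped).
LONDON_ROAD_TO_PC = {"King's Road": "SW3 4UP"}
LONDON_PC_TO_LOC = {"SW3 4UP": "Chelsea"}
CANTERBURY_ROAD_TO_PC = {"Ethelbert Road": "CT1 3NF"}
CANTERBURY_PC_TO_LOC: Dict[str, str] = {}  # no locality level for this postcode

chelsea = make_location("King's Road", "London", "UK",
                        LONDON_ROAD_TO_PC, LONDON_PC_TO_LOC)
print(chelsea.locality, chelsea.postcode)  # Chelsea SW3 4UP
```

Returning `None` for the locality mirrors the Canterbury example, where the tuple has no meaningful locality component.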
Definition 6: Let $geocoding: C \times (INFO_{CONTEXT} \setminus T) \rightarrow INFO_{SPATIAL}$ be the function that, given a country and any contextual information except a timestamp, returns the spatial information for that context within that country.
Definition 7: Let $geo\text{-}tagging_{news}: N \times C \rightarrow \mathcal{P}(INFO_{SPATIAL})$ be the function that, given a news story and a country, returns the set of spatial location information identified in that news story and located within that country.
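One way to read Definitions 6 and 7 as code is shown below: `geo_tagging_news` walks over a story's contextual information, skips timestamps (the set $T$), and collects whatever `geocoding` returns. The tuple encodings, the `TIMESTAMP` label and the choice to let `geocoding` return `None` when nothing lies within the country (the definition treats it as total) are assumptions for illustration, not the authors' implementation; the extraction of context from the story itself is abstracted away.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Set, Tuple

# Hypothetical encodings of the model's sets.
Context = Tuple[str, str]            # (kind, value), e.g. ("LOCATION", "Canterbury")
Spatial = Tuple[str, float, float]   # (name, latitude, longitude)
Geocoding = Callable[[str, Context], Optional[Spatial]]  # C x (INFO_CONTEXT \ T) -> INFO_SPATIAL

TIMESTAMP = "TIMESTAMP"  # label standing in for the excluded set T


def geo_tagging_news(context: Iterable[Context], country: str,
                     geocoding: Geocoding) -> FrozenSet[Spatial]:
    """Return the subset of INFO_SPATIAL found for one story within `country`."""
    tags: Set[Spatial] = set()
    for item in context:
        if item[0] == TIMESTAMP:
            continue  # timestamps are outside geocoding's domain
        spatial = geocoding(country, item)
        if spatial is not None:  # None: no spatial information inside this country
            tags.add(spatial)
    return frozenset(tags)
```

Because the result is a set, a place mentioned several times in one story yields a single tag, which matches the power-set codomain in Definition 7.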
$INFO_{SPATIAL}$ and so on) which are used to represent different entities, attributes and geographical properties. For example, $INFO_{CONTEXT}$ encodes the notion of contextual information with respect to a news story, whereas $INFO_{SPATIAL}$ represents the spatial location of a city within a country along with its coordinates. The model also includes functions that define the relationships between these sets. Among all the defined functions, $geocoding$ and $geo\text{-}tagging_{news}$ are of particular interest. The $geocoding$ function captures the conversion of a context into a spatial location consisting of the corresponding coordinates. The $geo\text{-}tagging_{news}$ function, on the other hand, captures the tagging of a news story with spatial information and provides a powerful abstraction that hides the internal geocoding process.
In practice, these two functions provide the blueprint for developing algorithms that implement an improved geo-tagger. In the next section, we describe how we achieved this.
The architecture of the Geo-Tagger is illustrated in Figure 3. The application relies on the following components:
• The Stanford Named Entity Recognizer (NER) (Stanford NER, 2016), which is used to extract named entities, such as aspatial locations, persons and organisations, from a news story, representing the set of contextual information ($INFO_{CONTEXT}$);
Since our news collection (denoted as $N$) primarily consists of news stories from a specific country, the main focus of the Geo-Tagger is to identify locations within that country. The bounding box coordinates specified in the input file are used to filter out any locations outside this bounding box. In this way, the bounding box coordinates in the input file act as the geographic focus specified by the user and represent a country $c \in C$ for the $geo\text{-}tagging_{news}$ function. This is in contrast with existing approaches, where the geographic focus is detected automatically. The advantage of our approach is discussed in Section: Advantages.
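The bounding-box filter amounts to a containment test on latitude and longitude. The sketch below is hypothetical: the format of the input file is not described here, so we assume it yields four numbers (south, west, north, east), and the UK box is a rough illustrative value, not one taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[str, float, float]  # (name, latitude, longitude)


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned lat/lon box; boxes crossing the 180th meridian are not handled."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def keep_within(points: List[Point], box: BoundingBox) -> List[Point]:
    """Drop geocoded points that fall outside the user-supplied geographic focus."""
    return [p for p in points if box.contains(p[1], p[2])]


# Rough, illustrative box around the UK (assumed, not from the paper's input file).
UK_BOX = BoundingBox(south=49.9, west=-8.7, north=60.9, east=1.8)

candidates = [("London", 51.51, -0.13), ("Paris", 48.86, 2.35), ("London, Ontario", 42.98, -81.25)]
print(keep_within(candidates, UK_BOX))  # [('London', 51.51, -0.13)]
```

Fixing the focus up front is what lets the filter discard the Ontario reading of "London" without any further disambiguation.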
The geo-tagging algorithm (Algorithm 1) is essentially an implementation of the $geo\text{-}tagging_{news}$ function. It takes a news story ($n \in N$) and the bounding box of a country (representing $c \in C$). Internally, it exploits the contextual information extracted from named entities using the Stanford NER, representing the $INFO_{CONTEXT}$ set. In the first phase, all aspatial locations ($LN \subseteq INFO_{CONTEXT}$) are processed one by one. If such a location represents a city whose coordinates (retrieved using the Geocode.Farm API) lie within the specified bounding box, the document is geo-tagged with the location. If the location represents either a locality or a street, then the location is fed into the ARA (Algorithm 2), because such a location may exist in more than one city within the specified bounding box. The ARA may return a properly disambiguated location, in which case the news is geo-tagged with that location. The ARA may also return a list of locations, indicating that the location in the news is still ambiguous. In such cases, the news is geo-tagged with the location along with an ambiguity tag and all matched cities, and is stored in the database. In the second phase, all organisations ($O \subseteq INFO_{CONTEXT}$) are extracted from the news using the Stanford NER. The coordinates of each organisation are then retrieved using the Geocode.Farm API and, if they are within the bounding box, the news is geo-tagged with the location of the organisation.
Algorithm 1. Geo-tagging algorithm
Input: a news story ($n \in N$), an input file (representing $c \in C$)
Output: spatial locations identified in the news (a subset of $INFO_{SPATIAL}$)
use Stanford NER to extract the set of named entities ($INFO_{CONTEXT}$) within the news
extract the locations $LN \subseteq INFO_{CONTEXT}$
loop for $ln \in LN$
    if $ln$ represents a city in $CITY_c$ then
        utilise the Geocode.Farm API by passing $ln$ and $c$, and retrieve the coordinates ($i \in INFO_{SPATIAL}$) of $ln$ within $c$
        geotag $n$ with $i$
    else if $ln \in LOC_{city}$ or $ln \in R_{city}$ (i.e. $ln$ represents a locality or a road in a city) then
        call ARA, passing the location ($ln$) and the named entities ($INFO_{CONTEXT}$) as inputs, and retrieve the coordinates ($i \in INFO_{SPATIAL}$)
        if $i \neq null$ then
            geotag $n$ with $i$
end loop
extract the set of organisations $O \subseteq INFO_{CONTEXT}$
loop for $o \in O$
    utilise the Geocode.Farm API by passing $o$ and $c$, and retrieve the coordinates ($i \in INFO_{SPATIAL}$) of $o$ within $c$
    if $i \neq null$ then
        geotag $n$ with $i$
end loop
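A minimal Python sketch of Algorithm 1's two-phase loop follows. The Stanford NER output is assumed to be a dictionary keyed by its entity labels (`LOCATION`, `ORGANIZATION`), and the three callables stand in for components described above but not exposed as code: `kind` (a hypothetical classifier deciding whether a name is a city, a locality/road, or neither), `geocode` (a hypothetical wrapper around a geocoding service such as Geocode.Farm that returns `None` for results outside the bounding box) and `ara` (standing in for Algorithm 2). Ambiguity tagging and database storage are omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coord = Tuple[float, float]
Entities = Dict[str, List[str]]  # NER output keyed by label


def geo_tag(entities: Entities,
            country: str,
            kind: Callable[[str, str], str],
            geocode: Callable[[str, str], Optional[Coord]],
            ara: Callable[[str, Entities, str], Optional[Coord]]) -> List[Tuple[str, Coord]]:
    """Two-phase sketch: aspatial locations first, then organisations."""
    tags: List[Tuple[str, Coord]] = []
    # Phase 1: aspatial locations LN.
    for ln in entities.get("LOCATION", []):
        k = kind(ln, country)
        if k == "city":
            i = geocode(ln, country)
        elif k == "locality_or_road":
            i = ara(ln, entities, country)  # may exist in several cities: disambiguate
        else:
            i = None
        if i is not None:
            tags.append((ln, i))
    # Phase 2: organisations O, geocoded directly.
    for o in entities.get("ORGANIZATION", []):
        i = geocode(o, country)
        if i is not None:
            tags.append((o, i))
    return tags
```

Passing the components in as functions keeps the control flow of the algorithm separate from any particular NER or geocoding service, which also makes it easy to exercise with stubs.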
Algorithm 2. Ambiguity resolution algorithm
Input: an aspatial location ($ln$), a news story ($n \in N$), the named entities from the news ($INFO_{CONTEXT}$) and a country $c \in C$
Output: a set of spatial locations $i \subseteq INFO_{SPATIAL}$ containing either one or multiple locations
retrieve $i' \subseteq INFO_{SPATIAL}$, the set of coordinates for $ln$ in $c$, using the Geocode.Farm API
loop for $i \in i'$
    if the coordinates in $i$ belong to a single city in $CITY_c$ then
        return $i$, associating $ln$ with that city
    else if the coordinates in $i$ belong to different cities within country $c$ then
        extract the city locations $LN \subseteq INFO_{CONTEXT}$
        if only one city is found ($|LN| = 1$) then
            return $i$, associating $ln$ with that city
        else if more than one city is found ($|LN| > 1$) then
            match the cities extracted in $LN$ against the cities in $i$
            if only a single match is found then
                return $i$, associating $ln$ with that city
            else if more than one match is found then
                determine the vicinity between $ln$ and each matched city
                if a single city has the lowest vicinity score then
                    return $i$, associating $ln$ with that city
                else
                    store all cities sharing the lowest vicinity score in a list ($l_1$)
        if no city is found ($|LN| = 0$) or the list $l_1$ exists then
            extract the organisations $O \subseteq INFO_{CONTEXT}$
            retrieve the location of each organisation using a similar heuristic and return $i$, associating $ln$ with the resolved city
end loop
return $l_1$
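The city-selection core of Algorithm 2 can be sketched as below. This is our reading, not the authors' code: `candidates` maps each city in which the geocoder found $ln$ to that candidate's coordinates, cities mentioned in the story are intersected with the candidates first (folding the one-city and several-cities branches into one check), and `vicinity` is a caller-supplied scoring function, since the vicinity measure itself is not specified here. The organisation-based fallback is not sketched; unresolved cases are returned as a list, mirroring $l_1$.

```python
from typing import Callable, Dict, List, Tuple, Union

Coord = Tuple[float, float]


def resolve_ambiguity(candidates: Dict[str, Coord],
                      mentioned_cities: List[str],
                      vicinity: Callable[[Coord, str], float]) -> Union[str, List[str]]:
    """Return one city if ln can be disambiguated, else the list of remaining cities."""
    if len(candidates) == 1:
        return next(iter(candidates))  # ln occurs in only one city
    matches = [city for city in candidates if city in mentioned_cities]
    if len(matches) == 1:
        return matches[0]  # exactly one candidate city is named in the story
    if len(matches) > 1:
        scores = {city: vicinity(candidates[city], city) for city in matches}
        best = min(scores.values())
        tied = [city for city, score in scores.items() if score == best]
        return tied[0] if len(tied) == 1 else tied  # ties play the role of l_1
    # No named city matches: Algorithm 2 would now try organisations (not sketched).
    return sorted(candidates)
```

Returning a list rather than guessing keeps the ambiguity visible to the caller, which corresponds to tagging the story with an ambiguity flag and all matched cities.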
The Ambiguity Resolution Algorithm (ARA) retrieves the coordinates of a location using the Geocode.Farm API. If the location ($ln \in LN$, mainly a locality or a street) is resolved to only one city having coordinates within the bounding box, the location is associated with that city and the news is geo-tagged with the location along with its coordinates. If the location is resolved to more than one city, the cities residing within the bounding box are selected. Then, the aspatial city locations ($LN \subseteq INFO_{CONTEXT}$) are extracted from the news using the Stanford NER. Two lists of cities
The study with the highest number of evaluated documents was reported in (Ignazio, 2014), which utilised the New York Times Annotated Corpus (Sandhaus, 2008) and the Reuters RCV-1 Corpus (Sandhaus, 2004), both annotated at country and city level. However, that evaluation was not conducted at a finer granularity (e.g. street/locality level). To our knowledge, there is no publicly available dataset annotated at street or locality level; hence, a comparative evaluation at this granularity against such a dataset was not possible. Instead, we designed a user study with a smaller dataset, further discussed in Section: Our Data Set below. This dataset was geo-tagged by the Geo-Tagger application and used for the user study. The main goal of the study is to demonstrate the effectiveness of the algorithm as well as to identify its limitations. The dataset generation procedure, the user study and its protocols are presented below.
Reuters Data Set
For a large-scale comparison, we compared the country-level results identified by the Geo-Tagger with those produced by CLIFF on the Reuters RCV-1 Corpus (Sandhaus, 2004). The RCV-1 corpus consists of over 800,000 news stories, each of which includes a country tag. From this collection, a sample of 10,000 news stories was randomly selected to form our Reuters dataset, which was then geo-tagged using both CLIFF and the Geo-Tagger.
Our Data Set
Our collection ($N$) of news stories consists of more than 11,000 news stories retrieved from different news websites over a period of around one year, as discussed in Section: INTRODUCTION. First, we utilised the CLIFF API to geo-tag all news stories in $N$, generating two datasets: the first set, $S_1$, consisting of news stories for which CLIFF failed to identify any location, and the second set, $S_2$, consisting of news stories that were geo-tagged by CLIFF. Then, from $S_1$, we chose 100 random news stories as our first evaluation dataset (denoted as EVAL-SET-100).
Next, our application was utilised to geo-tag these 300 news stories in EVAL-SET-100 and EVAL-SET-200, which were then used to carry out the following user study.
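The split into $S_1$ and $S_2$ and the random draw for EVAL-SET-100 can be sketched as follows. This is a minimal illustration assuming a callable `cliff_locations` (hypothetical) that returns the locations CLIFF found for a story; the fixed seed is our addition for reproducibility, and how EVAL-SET-200 was drawn is not shown here, so it is not sketched.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition_and_sample(stories: Sequence[T],
                         cliff_locations: Callable[[T], List[str]],
                         k: int = 100,
                         seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Split into S1 (CLIFF found nothing) and S2 (CLIFF found something),
    then draw k stories from S1 uniformly at random without replacement."""
    s1: List[T] = []
    s2: List[T] = []
    for story in stories:
        (s2 if cliff_locations(story) else s1).append(story)
    rng = random.Random(seed)  # seeded so the same sample can be regenerated
    return s1, s2, rng.sample(s1, k)  # raises ValueError if len(s1) < k
```

Sampling only from $S_1$ targets exactly the stories on which CLIFF produced no location, which is where an approach that uses organisations and other context should make a difference.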
User Study
The web-based user study consisted of 6 subjects (Female: 2; Male: 4; age range: 25-35 years). The subjects were recruited by sending an email to 6 different research groups. Since EVAL-SET-
T1: Is any location information (e.g. the name of a city, country, etc.) relevant to the news present in the news story? T1 helped us to gather statistics on the number of stories having no location information. These results are further used to evaluate the effectiveness of our approach, i.e. its ability to find a geo-location in the absence of direct mentions in a story.
T1 and T2
First, we analyse the results with respect to T1 and T2. The agreement level (i.e. both subjects having the same opinion) was 100% for T1 and 98% for T2. The disagreement level in
The results from the Geo-Tagger and the user responses with respect to T1 and T2 were also compared. The subjects identified locations in 260 out of 300 news stories (87%). This is because the subjects were able to identify locations from the names of organisations using their local knowledge. However, they failed to identify as many locations as the Geo-Tagger (87% vs. 93%), since they could not identify the locations of some organisations, e.g. primary schools, pubs, etc. On the other hand, the subjects identified streets/localities in 143 out of 300 news stories (48%) and hence performed better than CLIFF (48% vs. 15%). This is attributed to their local geographical knowledge, which allowed them to identify many streets or localities. However, the subjects still performed worse than the Geo-Tagger, which identified streets/localities in 233 out of 300 news stories (78%). The reason is that the Geo-Tagger, in the absence of a street/locality name, used the names of organisations to identify streets or localities. Even though the subjects had local knowledge, they could not resolve the street/locality information for some organisations. In the words of one subject:
Table 1. CLIFF vs. Geo-Tagger vs. user responses (percentage of the 300 news stories)
Task                   CLIFF   Geo-Tagger   User   SD
T1: Location           67%     93%          87%    14%
T2: Street/Locality    15%     78%          48%    32%
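The percentages in the text and the SD column of Table 1 can be cross-checked with a few lines of Python. The counts come from the text; reading SD as the sample standard deviation (n - 1 denominator) of the three percentages in each row is our inference, which the arithmetic below is consistent with.

```python
from statistics import stdev

N_STORIES = 300
# Counts reported in the text.
counts = {"User, T1": 260, "User, T2": 143, "Geo-Tagger, T2": 233}
percentages = {label: round(100 * k / N_STORIES) for label, k in counts.items()}

# Table 1 rows in column order: CLIFF, Geo-Tagger, User (percent of stories).
t1 = [67, 93, 87]
t2 = [15, 78, 48]
sd = {"T1": round(stdev(t1)), "T2": round(stdev(t2))}  # sample standard deviation

print(percentages)  # {'User, T1': 87, 'User, T2': 48, 'Geo-Tagger, T2': 78}
print(sd)           # {'T1': 14, 'T2': 32}
```

Using `statistics.pstdev` (population SD) instead would give 11 and 26, so the table's values match the sample version.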
Figure 4. Result plot for T1 and T2
A non-parametric Kruskal-Wallis test is used to examine the significance of the results obtained from the three independent groups (User Responses, Geo-Tagger and CLIFF) in relation to the number of news stories having location information in them (T1). The test results showed significant differences $\chi^2$
results showed significant differences between all pairs ($p < 0.001$), except User Responses and Geo-Tagger ($p = 0.018$). Hence, the statistical test also suggests that the Geo-Tagger is significantly more effective than CLIFF at finding locations in a news story.
A non-parametric Kruskal-Wallis test is used to examine the statistical significance of the results obtained from the three independent groups in relation to the number of news stories having street/locality locations in them (T2). The test results showed significant differences ($\chi^2 = 233.20$, df = 2, $p < 0.001$). A post-hoc Mann-Whitney test was conducted to follow up the findings, applying a Bonferroni correction so that all effects are reported at a 0.016 level of significance. The post-hoc test results showed significant differences between all pairs ($p < 0.001$). These results suggest that the Geo-Tagger is significantly more effective than CLIFF at finding streets/localities in a news story.
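The test statistic and the corrected threshold can be sketched in plain Python. The function below is a standard tie-corrected Kruskal-Wallis H, which `scipy.stats.kruskal` also computes; for T1/T2 each group would be a list of per-story 0/1 indicators (1 if that group found a location in the story). Since the per-story responses are not published here, running it on counts reconstructed from percentages need not reproduce the reported statistic exactly. The Bonferroni threshold for three pairwise comparisons is 0.05/3, about 0.0167, reported above as 0.016.

```python
from itertools import chain
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Tie-corrected H; compare with chi-square on len(groups) - 1 df."""
    values = list(chain.from_iterable(groups))
    n = len(values)
    ranks = average_ranks(values)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    counts: Dict[float, int] = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))  # undefined if all values are equal


def bonferroni_alpha(alpha: float = 0.05, comparisons: int = 3) -> float:
    """Per-comparison threshold for the three pairwise Mann-Whitney follow-ups."""
    return alpha / comparisons
```

The pairwise follow-ups themselves would use a Mann-Whitney U test (for example `scipy.stats.mannwhitneyu`), each judged against `bonferroni_alpha()`.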
T3
Next, we analyse the results with respect to T3. The agreement level of the subjects in relation to T3 was 95%. The lower agreement is attributed to the fact that some subjects marked subset locations as non-relevant, even when they accurately represented the news. For example, the locations in a news story may include: i) City X and ii) Road A, City X. Our approach presented such locations because we wanted to show the city-level scope of a news story to help the subjects identify irrelevant street/locality information. Moreover, in the case of sports news (e.g. a football match between Chelsea and Arsenal, both in London, UK), some subjects considered the names of the teams as relevant locations, but the rest did not. Furthermore, some subjects chose all irrelevant options and others did not. We believe this variation in agreement is a common phenomenon, since different users will have different subjective opinions regarding how a location can be inferred even in the absence of a direct mention. In addition, their opinions will be influenced by their familiarity with a particular location.
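The agreement level used for T1, T2 and T3 (both subjects giving the same answer) is plain percent agreement. A minimal sketch, assuming each subject's answers are stored as an equal-length list with one entry per judged item (the item granularity for T3, per story or per offered location, is an assumption here):

```python
from typing import Sequence


def percent_agreement(a: Sequence[object], b: Sequence[object]) -> float:
    """Share (in %) of items on which two subjects gave the same answer."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty answer lists of equal length")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


print(percent_agreement(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 75.0
```

Percent agreement does not correct for agreement expected by chance, which is worth keeping in mind when comparing the 95% for T3 with the higher values for T1 and T2.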
The Geo-Tagger identified 1,118 streets/localities in 300 news stories, including repeated entries for many locations (i.e. a street/locality found in more than one news story). Of these, 79 streets/localities, around 7% of 1,118, were tagged as irrelevant by the users (Mann-Whitney test, p < 0.001), indicating an accuracy of around 93% (see Figure 5) and demonstrating the effectiveness of the algorithm.
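The 7% and 93% figures follow directly from the two counts given in the paragraph; the helper name below is ours.

```python
def irrelevant_share(total_tags: int, irrelevant: int) -> float:
    """Percentage of identified streets/localities that users marked irrelevant."""
    return 100.0 * irrelevant / total_tags


share = irrelevant_share(1118, 79)
accuracy = 100.0 - share
print(f"{share:.1f}% irrelevant, {accuracy:.1f}% accuracy")  # 7.1% irrelevant, 92.9% accuracy
```

Because 1,118 counts repeated entries, this is an accuracy over tag occurrences rather than over distinct streets or localities.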
organisations, in addition to aspatial locations, and this helped the Geo-Tagger to geo-tag news even in the absence of direct mentions of aspatial locations. The accuracy of the Geo-Tagger closely matches the results obtained using our data set, which demonstrates the effectiveness of our approach across different data sets of news stories. The accuracy of CLIFF on our Reuters dataset is slightly lower than that reported in (Ignazio, 2014). The exact reason is difficult to establish, since the sample used in (Ignazio, 2014) and the one used in this paper will most likely differ, even though both originate from the same corpus. We could not compare the effectiveness of our approach at street/locality-level granularity over this 10,000-story dataset, since the news stories in RCV-1 are not geo-tagged at this granularity; hence, this comparison is not reported in this paper. Furthermore, even though a smaller dataset was used for this comparison, this does not invalidate the comparison, as the same dataset was used for both CLIFF and our approach.
Research Questions
With respect to RQ-1, we have developed a mathematical model of context for a news story (Section: GEO-TAGGING MODELLING) consisting of information that answers the 5 core questions of who, what, where, when and why. In this way, the context of a news story essentially characterises the story, just as a user's context characterises his/her situation. This is a simple yet powerful model, as several of its components, such as people, organisations and locations, can be utilised for geo-tagging news stories. Moreover, some of its components, such as location and category, can easily be expanded to the granularity of our choice. We have also formalised a model of spatial information and have shown how contextual information can be related to spatial information by developing a mathematical model of geocoding and geo-tagging. In essence, our model provides a blueprint for developing algorithms for the geo-tagging process. In the implementation section, we discussed how we developed such algorithms from our model to encode the geo-tagging process.
In this paper, we have developed a mathematical model of context and geo-tagging with respect to news stories and have exploited that model to geo-tag news stories even in the absence of direct mentions of locations, as well as at street/locality-level granularity. To achieve this, we have combined our model with a geo-tagging algorithm and utilised off-the-shelf tools and existing geocoding APIs. The dataset generated by applying our approach has been evaluated with 6 users. In addition, we have evaluated our approach over 10,000 news stories from the Reuters dataset. The results demonstrate the effectiveness of our approach against existing publicly available APIs.
In future, we plan to extend the evaluation with a larger sample and more subjects once the approach is improved, and then to conduct an experiment with an independent-groups design. Ultimately, we aim to release the dataset containing locations at street/locality granularity to the research community. As we have found that different subjects have different opinions regarding a few specific locations (e.g. locations relating to an organisation), it will be interesting to ask users to report a confidence level while evaluating the approach. Results with a very low confidence level could then be treated specially or even excluded from the overall evaluation.
REFERENCES
Abowd, G. D., Dey, A. K., Brown, P. J., Davies, N., Smith, M., & Steggles, P. (1999, September). Towards a better understanding of context and context-awareness. Proceedings of the International Symposium on Handheld and Ubiquitous Computing (pp. 304-307). Springer. doi:10.1007/3-540-48157-5_29
Amitay, E., Har'El, N., Sivan, R., & Soffer, A. (2004, July). Web-a-where: geotagging web content. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 273-280). ACM.
Errico, M., April, J., Asch, A., Khalfani, L., Smith, M., & Ybarra, X. (1997). The evolution of the summary news lead. Media History Monographs - On-Line Journal of Media History.
Garbin, E., & Mani, I. (2005, October). Disambiguating toponyms in news. Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing (pp. 363-370). Association for Computational Linguistics. doi:10.3115/1220575.1220621
Leidner, J. L. (2006). An evaluation dataset for the toponym resolution task. Computers, Environment and Urban Systems, 30(4), 400-417. doi:10.1016/j.compenvurbsys.2005.07.003
Lieberman, M. D., Samet, H., Sankaranarayanan, J., & Sperling, J. (2007, November). STEWARD: architecture of a spatio-textual search engine. Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems (p. 25). ACM. doi:10.1145/1341012.1341045
Md Sadek Ferdous is a Research Fellow at the University of Southampton, UK. He received his PhD degree in Computing Science from the University of Glasgow, UK in 2015. His research interests include Information Retrieval, Security, Privacy, Identity Management, Trust Management and Blockchain technologies.
Soumyadeb Chowdhury completed his Masters and PhD in Computing Science at the University of Glasgow. He was then employed as a Research Assistant working on the Integrated Multimedia City Data project in the Urban Big Data Center (funded by EPSRC). He is currently a lecturer in InfoComm Technology at the Singapore Institute of Technology. His research interests include Information Retrieval, Information Security and Privacy, Human Computer Interaction, Pervasive Sensing Technologies, Visualisation Metaphors and Usable Security.
All inquiries regarding IJIRR should be directed to the attention of: Zhongyu (Joan) Lu, Editor-in-Chief
All manuscript submissions to IJIRR should be sent through the online submission system: http://www.igi-global.com/authorseditors/titlesubmission/newproject.aspx
Coverage and Major Topics
The topics of interest in this journal include, but are not limited to:
Advanced software development related information retrieval issues • Classification • Clustering approaches • Content and context awareness and environment awareness • Filtering system • Index techniques • Information mining • Information retrieval in cloud computing issues • Information retrieval in education • Information retrieval in healthcare • Information retrieval in science, engineering, and technologies • Information retrieval in social science, social behaviors • Information retrieval with business, commerce, etc. • Information retrieval with Internet of Things • Knowledge mining • Link analysis • Machine learning on documents • Message passing • Metadata and XML retrieval • Mobile computing related information retrieval issues • Multimedia retrieval • Performance measures • Query languages and optimization • Retrieval architecture • Retrieval evaluation • Retrieval languages and operations • Retrieval strategies • Retrieval systems • Retrieval theories • Retrieval with big data technologies • Scalability • Search algorithms • Search engine • Social media related information retrieval issues • Taxonomy theory and applications • Text mining • Text, document, and image retrieval • Web mining
Mission
The mission of the International Journal of Information Retrieval Research (IJIRR) is to provide an outlet for researchers to present their research and obtain inspiration in the areas of information retrieval, computer science, and information science. Focusing on theories, methods, technologies, and tools, IJIRR is aimed towards information engineers, scientists, and related professionals. This journal exhibits expert experiences and state-of-the-art technologies in search and storage of texts, images, videos, and other data to stimulate innovation and exploration of improved approaches for conquering industry problems.
Ideas for Special Theme Issues may be submitted to the Editor(s)-in-Chief
International Journal of Information Retrieval Research