Extracting Metadata for Spatially-Aware Information Retrieval on the Internet Research Paper Presentation – CS572 Summer 2011 Presented by Donghee Sung Paper by Paul Clough (University of Sheffield Western Bank)

Dec 19, 2015

Page 1

Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Research Paper Presentation – CS572 Summer 2011

Presented by Donghee Sung

Paper by Paul Clough (University of Sheffield Western Bank)

Page 2

Short Overview

SPIRIT: spatial awareness for information systems

e.g.
- transport timetables
- routing systems for motorists
- map-based web sites
- location-based services

Key part: extraction and use of geospatial information

Page 3

Short Overview

Criteria: Speed, Reliability, Flexibility, Multilingualism

Geo-Parsing:
- Identifying geographic references
- Gazetteer lookup with context rules to filter out common-usage words and personal names

Geo-Coding:
- Assigning spatial coordinates
- Based on information from geographic resources
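The two stages on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the SPIRIT implementation: the `GAZETTEER` entries, their approximate coordinates, the `COMMON_USAGE` list and the function names `geo_parse` and `geo_code` are all assumptions made for the example.

```python
# Hypothetical mini-gazetteer: place name -> (latitude, longitude).
# Entries and approximate coordinates are illustrative only; they are
# not taken from TGN, OS or SABE.
GAZETTEER = {
    "sheffield": (53.38, -1.47),
    "london": (51.51, -0.13),
    "reading": (51.45, -0.97),
}

# Gazetteer names that also occur as ordinary words (assumed list).
COMMON_USAGE = {"reading"}


def geo_parse(text):
    """Geo-parsing: return tokens that match the gazetteer.

    A name on the common-usage list is accepted only when it is
    capitalised, a very crude stand-in for real context rules.
    """
    hits = []
    for raw in text.split():
        token = raw.strip(".,;:!?()\"'")
        key = token.lower()
        if key not in GAZETTEER:
            continue
        if key in COMMON_USAGE and token.islower():
            continue  # looks like the ordinary word, not the place
        hits.append(token)
    return hits


def geo_code(place_names):
    """Geo-coding: attach coordinates from the gazetteer to each name."""
    return {name: GAZETTEER[name.lower()] for name in place_names}


places = geo_parse("I enjoy reading about trains from Sheffield to London.")
print(places)            # ['Sheffield', 'London']
print(geo_code(places))
```

The lower-case "reading" is dropped as a common-usage word, while the two capitalised names are kept and receive coordinates.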

Page 4

What’s the SPIRIT?

< http://www.geo-spirit.org/ >

Page 5

What’s the SPIRIT?

SPIRIT: SPatially-Aware Information Retrieval on the InterneT

A search engine to find documents and datasets on the web relating to places or regions

Page 6

What’s the SPIRIT?

Existing web search facilities are poor at finding information related to a particular location.

- Vicinity: find other places within a radius
- www.somewherenear.com
- Yellow pages services: find a specific place or post code
- Buyukkokten: associated an administrator’s IP address with a telephone area code
- Stanford Research Institute: proposed ‘.geo’ with cells defined by latitude and longitude

Page 7

What’s the SPIRIT?

Resources relating to a place:
- may not be found
- may not be places nearby
- may have another name

Major shortcoming: cannot recognize alternative names
- modern/historical variants
- informal names
- contained place names

Page 8

What’s the SPIRIT?

SPIRIT Project

- Query expansion / relevance ranking procedures
- Machine learning techniques
  - extraction of geographical context
  - generating metadata
- Multi-modal user interface
  - textual input
  - interactive map feedback
- Spatial indices for web collections

Page 9

Data Sources

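Sources of spatial data: TGN (Getty Thesaurus of Geographic Names), OS (Ordnance Survey), SABE (Seamless Administrative Boundaries of Europe)

A large web collection: the SPIRIT collection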

Page 10

Data Sources

Page 11

Data Sources

Page 12

Data Sources

Page 13

Geo-Parsing Techniques

- Tokenization issues
- Stop-words
- Named-Entity Recognition (NER)
- Gazetteers

Page 14

Geo-Parsing Techniques

Page 15

Geo-Parsing Techniques

Named-Entity Recognition (NER)

Processing a text and identifying particular categories of Named Entities (NE)

People, Organization, Location, etc.

Page 16

Geo-Parsing Techniques

Tokenization Procedure

1) Tokenize on whitespace: @words = split(/\s+/, $sentence); (Perl regular expression)

"Isn't it ashame." -> Isn't / it / ashame.

2) Stemming / case conversion: isn't / it / asham

3) Removing stop-words
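The three steps above can be sketched in Python as follows. The slide's own code is Perl; this is only a rough analogue. The stop-word list and the suffix-stripping rule in `normalize` are assumptions for illustration (a toy rule, not the Porter stemmer or whatever stemmer the paper used).

```python
# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"it", "the", "a", "of"}


def tokenize(sentence):
    # Step 1: split on runs of whitespace, roughly what
    # split(/\s+/, $sentence) does in Perl.
    return sentence.split()


def normalize(token):
    # Step 2: lower-case, strip surrounding punctuation, then apply a
    # toy suffix-stripping rule in place of a real stemmer.
    word = token.lower().strip(".,;:!?\"")
    for suffix in ("ing", "ed", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    # Step 3: drop stop-words after normalisation.
    words = [normalize(token) for token in tokenize(sentence)]
    return [word for word in words if word not in STOP_WORDS]


print(tokenize("Isn't it ashame."))    # ["Isn't", 'it', 'ashame.']
print(preprocess("Isn't it ashame."))  # ["isn't", 'asham']
```

With the slide's example sentence, step 2 yields isn't / it / asham, and step 3 then removes "it".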

Page 17

Geo-Parsing Techniques

Default settings for indexing and retrieval:
- Case sensitivity: Off
- Stop-word removal: Off
- Stemming: Off

Stop-word removal / stemming -> reduce the size of index files

But both can be useful:
- Stop-words: ‘in’, ‘inside’, or ‘of’
- Stemming: “London” from “London” and “Londoner”
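One possible reading of this slide is that words like ‘in’ carry the spatial relationship of a query, so removing them as stop-words loses information, while stemming lets “Londoner” match “London”. The sketch below illustrates both points; the function names and the "strip a trailing er" rule are hypothetical and far cruder than a real stemmer.

```python
# Stop-words the slide lists as potentially useful.
SPATIAL_WORDS = {"in", "inside", "of"}


def spatial_pairs(query):
    """Return (spatial word, following token) pairs found in the query."""
    tokens = query.lower().split()
    return [(tok, nxt) for tok, nxt in zip(tokens, tokens[1:])
            if tok in SPATIAL_WORDS]


def remove_stop_words(query):
    """Drop the spatial words, as an ordinary stop-word filter would."""
    return " ".join(t for t in query.split() if t.lower() not in SPATIAL_WORDS)


def toy_stem(word):
    """Toy rule (not a real stemmer): strip a trailing 'er'."""
    return word[:-2] if word.endswith("er") and len(word) > 4 else word


print(spatial_pairs("hotels in Sheffield"))                     # [('in', 'sheffield')]
print(spatial_pairs(remove_stop_words("hotels in Sheffield")))  # []
print(toy_stem("Londoner"), toy_stem("London"))                 # London London
```

The same toy rule would also turn "Manchester" into "Manchest", which hints at why stemming place names needs care.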

Page 18

Geo-Parsing Techniques

Filtering candidate locations using context rules to remove:
- stop-words
- references to people and organizations
- links to emails/URLs
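A minimal sketch of this kind of filtering is shown below. The cue-word lists (`PERSON_TITLES`, `ORG_SUFFIXES`), the sample gazetteer and the function name `filter_candidates` are assumptions for illustration; the actual context rules used in the paper are not given on the slide.

```python
# Assumed cue words; a real system would use much richer context rules.
PERSON_TITLES = {"mr", "mrs", "ms", "dr", "prof"}
ORG_SUFFIXES = {"ltd", "plc", "inc"}
STOP_WORDS = {"over", "of", "in"}

# Hypothetical gazetteer; "over" stands for an entry that collides
# with a stop-word.
GAZETTEER = {"sheffield", "lincoln", "over"}


def filter_candidates(text):
    """Return gazetteer matches that survive the context rules."""
    tokens = text.split()
    kept = []
    for i, raw in enumerate(tokens):
        # Rule: skip e-mail addresses and URLs entirely.
        if "@" in raw or raw.lower().startswith(("http://", "https://", "www.")):
            continue
        word = raw.strip(".,;:!?()\"'")
        key = word.lower()
        # Rule: must be in the gazetteer and must not be a stop-word.
        if key not in GAZETTEER or key in STOP_WORDS:
            continue
        prev_word = tokens[i - 1].strip(".,").lower() if i > 0 else ""
        next_word = tokens[i + 1].strip(".,").lower() if i + 1 < len(tokens) else ""
        # Rule: a preceding title suggests a person ("Mr. Lincoln"),
        # a following suffix suggests an organization ("Sheffield Ltd").
        if prev_word in PERSON_TITLES or next_word in ORG_SUFFIXES:
            continue
        kept.append(word)
    return kept


text = ("Mr. Lincoln moved over to Sheffield Ltd offices near Lincoln, "
        "mail info@sheffield.example or visit Sheffield.")
print(filter_candidates(text))  # ['Lincoln', 'Sheffield']
```

Only the second "Lincoln" and the final "Sheffield" are kept; the person, the stop-word collision, the company name and the e-mail address are all filtered out.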

Page 19

Conclusion

The geo-parsing method could be improved by enhancing the gazetteer matching and filtering.

False hits could be reduced by generating a better list of stop-words and using further context rules.

The need for creating rules would be alleviated by generating further context rules from features using machine learning.

Page 21

Thank You!