
OpenStreetMap Geocoder Based on Solr

May 11, 2015


Presented by Ishan Chattopadhyaya, LucidWorks

This talk covers the technical aspects of a new OpenStreetMap geocoder based on Apache Solr and Lucene. Apache Lucene and Apache Solr 4.0 and later have markedly improved spatial search capabilities. Solr's improved support for distributed storage and search, via SolrCloud mode, also makes Solr-based applications easy to scale. OpenStreetMap's current geocoder, Nominatim, is based on PostgreSQL/PostGIS. Compared with a database system like PostgreSQL, Solr offers several benefits for building a geocoder: robust partial-text search, analysis in various languages (stemming, tokenization, stop words, etc.), spell checking, faceting, and highlighting. Through this presentation, the author aims to make the case for a Solr-based geocoder.
Transcript
Page 1: OpenStreetMap Geocoder Based on Solr
Page 2: OpenStreetMap Geocoder Based on Solr

Ishan Chattopadhyaya

LucidWorks

OpenStreetMap Foundation

Twitter: @ichattopadhyaya, OSM: chatman

Page 3: OpenStreetMap Geocoder Based on Solr

What is OpenStreetMap?

● Wikipedia of GeoData

● OpenStreetMap is a project aimed squarely at creating and providing free geographic data, such as street maps, to anyone who wants them.

Page 4: OpenStreetMap Geocoder Based on Solr

State of OSM

● Commercial competitors

– Google Maps

– Bing Maps

● http://tools.geofabrik.de/mc/

Page 5: OpenStreetMap Geocoder Based on Solr

The OpenStreetMap Software Stack

Page 6: OpenStreetMap Geocoder Based on Solr

What is a Geocoder?

● Input: raw query

● Output: geocoordinates

Page 7: OpenStreetMap Geocoder Based on Solr

Nominatim

● http://nominatim.openstreetmap.org/

Page 8: OpenStreetMap Geocoder Based on Solr

Goals for the new Geocoder

● Search for:

– Cities and towns

– Streets

– Address points

– Points of interest, businesses, amenities, attractions, etc.

● Reverse geocoding

● Support for fuzzy queries
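Reverse geocoding maps a coordinate back to the named areas that contain it. A minimal sketch of the underlying "contains" test, assuming admin boundaries are available as simple (lon, lat) polygons, is the even-odd ray-casting point-in-polygon check below. The boundary names and boxes are hypothetical and heavily simplified; Solr/Lucene answer this with indexed spatial predicates rather than a linear scan over polygons.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat)

def contains(polygon: List[Point], pt: Point) -> bool:
    """Even-odd ray casting: count how many polygon edges a ray cast
    from pt towards +x crosses; an odd count means pt is inside."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def reverse_geocode(pt: Point, boundaries: Dict[str, List[Point]]) -> List[str]:
    """Return the names of all boundaries that contain pt."""
    return [name for name, poly in boundaries.items() if contains(poly, pt)]

# Hypothetical, simplified rectangles (not real boundaries).
boundaries = {
    "Dublin": [(-6.45, 53.20), (-6.05, 53.20), (-6.05, 53.45), (-6.45, 53.45)],
    "Ballsbridge": [(-6.25, 53.32), (-6.21, 53.32), (-6.21, 53.34), (-6.25, 53.34)],
    "Cork": [(-8.60, 51.85), (-8.35, 51.85), (-8.35, 51.95), (-8.60, 51.95)],
}

print(reverse_geocode((-6.232063, 53.333833), boundaries))  # ['Dublin', 'Ballsbridge']
```

Returning every containing area, not just one, gives the full hierarchy (city, district, ...) that a structured reverse-geocoding result needs.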

Page 9: OpenStreetMap Geocoder Based on Solr

Good changes in Lucene/Solr 4.x

● Support for indexing polygons

– RecursivePrefixTree indexing

● Special spatial search predicates

– Contains

– IsWithin

– Intersects

– Etc.

● Reference: David Smiley's Lucene Revolution presentation

● SolrCloud mode for distributed indexing/searching
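The idea behind RecursivePrefixTree indexing is that a shape is indexed as the set of grid cells it covers, at several resolutions, so spatial predicates can be answered largely with term lookups. Lucene's spatial module offers a geohash-based grid (GeohashPrefixTree) among others. The sketch below is not Lucene's code; it only encodes a point as a geohash and emits every prefix, which roughly corresponds to the cell tokens a single point contributes, one per precision level.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 9) -> str:
    """Encode (lat, lon) as a geohash by alternately bisecting the
    longitude and latitude ranges, starting with longitude."""
    lat_rng = [-90.0, 90.0]
    lon_rng = [-180.0, 180.0]
    chars = []
    bits = 0      # bits accumulated for the current base32 character
    n_bits = 0
    even = True   # even bit positions encode longitude
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid   # keep the lower half
        even = not even
        n_bits += 1
        if n_bits == 5:    # 5 bits -> one base32 character
            chars.append(BASE32[bits])
            bits = 0
            n_bits = 0
    return "".join(chars)

def prefix_cells(lat: float, lon: float, max_level: int = 5) -> list:
    """Enclosing cells from coarsest to finest: one token per level."""
    h = geohash(lat, lon, max_level)
    return [h[:i] for i in range(1, max_level + 1)]

print(prefix_cells(42.6, -5.6))  # ['e', 'ez', 'ezs', 'ezs4', 'ezs42']
```

Because a coarse cell is a prefix of every finer cell inside it, a query shape can be decomposed into a few large cells plus finer cells along its edges, and matching documents are those sharing those cell terms.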

Page 10: OpenStreetMap Geocoder Based on Solr

Architecture

Planet dumps → Indexer → Solr → API Layer → www.Geocoder.in

Page 11: OpenStreetMap Geocoder Based on Solr

Indexing: OSM Data format

● Node

– “A node defines a single geospatial point using a latitude and longitude.”

● Way

– “A way is an ordered list of between 2 and 2,000 nodes. Ways are used to represent linear features (vectors), such as rivers or roads.”

● Relation

– “A Relation is an all-purpose data structure that documents a relationship between two or more other objects.”
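The three element types can be sketched as plain data classes. This is a simplified model assumed for illustration (field names, ids and tags are hypothetical, not the OSM XML/PBF schema); it shows that only nodes carry coordinates, so an indexer must resolve a way's node references before it can build a shape.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)

@dataclass
class Way:
    id: int
    node_refs: List[int]  # ordered node ids, no coordinates of its own
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        if not 2 <= len(self.node_refs) <= 2000:
            raise ValueError("a way must reference between 2 and 2,000 nodes")

@dataclass
class Member:
    type: str  # "node", "way" or "relation"
    ref: int
    role: str = ""

@dataclass
class Relation:
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)

def way_geometry(way: Way, nodes: Dict[int, Node]) -> List[Tuple[float, float]]:
    """Resolve a way's node references into an ordered (lon, lat) line."""
    return [(nodes[r].lon, nodes[r].lat) for r in way.node_refs]

# Hypothetical data for illustration.
nodes = {
    1: Node(1, lat=53.333, lon=-6.233),
    2: Node(2, lat=53.3345, lon=-6.231),
}
road = Way(10, node_refs=[1, 2], tags={"highway": "residential", "name": "Lansdowne Road"})
boundary = Relation(20, members=[Member("way", 10, "outer")], tags={"boundary": "administrative"})
print(way_geometry(road, nodes))  # [(-6.233, 53.333), (-6.231, 53.3345)]
```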

Page 12: OpenStreetMap Geocoder Based on Solr

Indexing: Facts and figures

● Number of OSM Nodes in the database = 2,071,039,612

● Number of OSM Ways in the database = 202,570,637

● Number of OSM Relations in the database = 2,217,240

Page 13: OpenStreetMap Geocoder Based on Solr

Indexing: Schema

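Fields: admin2, admin3, admin4, admin5, admin6, admin7, street, st_type, name, level, geo, popularity

Example street document:

– Admin values: Ireland, Dublin County, Dublin, Ballsbridge

– street: Lansdowne; st_type: Street

– name: Lansdowne Street; level: s; geo: <shape>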

Page 14: OpenStreetMap Geocoder Based on Solr

Indexing: Schema

Fields: admin2, admin3, admin4, admin5, admin6, admin7, street, st_type, name, level, geo, popularity

Example place document:

– Admin values: Ireland, Dublin County, Dublin

– name: Dublin; level: 6; geo: <shape>; popularity: 1

Page 15: OpenStreetMap Geocoder Based on Solr

Indexing: Schema (POIs)

Fields: admin2, admin3, admin4, admin5, admin6, admin7, street, st_type, name, category, geo

Example POI document:

– Admin values: Ireland, Dublin County, Dublin, Ballsbridge

– name: Ballsbridge Hotel; category: hotel; geo: <shape>

Page 16: OpenStreetMap Geocoder Based on Solr

Searching

Raw query → Classifier → Classifications → Validator → Valid classifications → Geocoder (lookup) → Structured location + geocodes

Page 17: OpenStreetMap Geocoder Based on Solr

Searching: Classification

Query → Tokenizer → Shingles → Bloom Filters → Classifications

Page 18: OpenStreetMap Geocoder Based on Solr

Searching: Classification

● Query= “hotels near lansdowne rd dublin”

● Shingles: hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, lansdowne rd, rd dublin, .., hotels near lansdowne rd dublin

Query → Tokenizer → Shingles → Bloom Filters → Classifications
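The shingles listed above are all contiguous word n-grams of the query. A minimal sketch, assuming a naive lowercase whitespace tokenizer (a Solr analysis chain would use its own tokenizer, and Lucene's ShingleFilter produces comparable token n-grams):

```python
from typing import List

def shingles(query: str) -> List[str]:
    """All contiguous word n-grams of the query, for n = 1..len(tokens)."""
    tokens = query.lower().split()  # naive whitespace tokenizer (assumption)
    out = []
    for n in range(1, len(tokens) + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out

s = shingles("hotels near lansdowne rd dublin")
print(len(s))  # 15
print(s[:7])   # ['hotels', 'near', 'lansdowne', 'rd', 'dublin', 'hotels near', 'near lansdowne']
```

A query of k tokens yields k(k+1)/2 shingles (15 for five tokens), so multi-word names such as "lansdowne rd" get a chance to match as a unit.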

Page 19: OpenStreetMap Geocoder Based on Solr

Searching: Classification

● hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, ..

Query → Tokenizer → Shingles → Bloom Filters → Classifications

Bloom filters: Cat | A2 | A4 | A5 | Streets

– “hotels”: match in Cat

Page 20: OpenStreetMap Geocoder Based on Solr

Searching: Classification

● hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, ..

Query → Tokenizer → Shingles → Bloom Filters → Classifications

Bloom filters: Cat | A2 | A4 | A5 | Streets

– “dublin”: matches in A5 and Streets

Page 21: OpenStreetMap Geocoder Based on Solr

Searching: Classification

● hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, ..

Query → Tokenizer → Shingles → Bloom Filters → Classifications

Bloom filters: Cat | A2 | A4 | A5 | Streets

– “lansdowne”: matches in A5 and Streets
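The slides check each shingle against one Bloom filter per class (category, admin levels, streets). A minimal sketch of that step follows; the filter size, hash count, double-hashing scheme and the name lists are all assumptions for illustration. A Bloom filter never misses a name it was built from, but it can report false positives, which only cost an extra candidate classification for a later validation step to discard.

```python
import hashlib
from typing import Dict, Iterable, List

class BloomFilter:
    """Bit-array Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits: int = 8192, n_hashes: int = 4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str) -> Iterable[int]:
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.n_hashes):
            yield (h1 + i * h2) % self.n_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def build_filters(names: Dict[str, List[str]]) -> Dict[str, BloomFilter]:
    """One filter per class, filled with that class's (lowercased) names."""
    filters = {}
    for cls, values in names.items():
        bf = BloomFilter()
        for v in values:
            bf.add(v.lower())
        filters[cls] = bf
    return filters

def classify(shingle_list: List[str], filters: Dict[str, BloomFilter]) -> Dict[str, List[str]]:
    """Map each shingle to every class whose filter reports a (possible) match."""
    result = {}
    for s in shingle_list:
        hits = [cls for cls, bf in filters.items() if s in bf]
        if hits:
            result[s] = hits
    return result

# Hypothetical name lists; a real index would load these from OSM data.
names = {
    "category": ["hotel", "hotels", "restaurant"],
    "admin5": ["dublin", "lansdowne", "ballsbridge"],
    "street": ["lansdowne", "dublin", "grafton"],
}
filters = build_filters(names)
result = classify(["hotels", "near", "lansdowne", "rd", "dublin"], filters)
print(result)
```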

Page 22: OpenStreetMap Geocoder Based on Solr

Searching: Classifications

● Query = “hotels near lansdowne rd dublin”

● Classifications:

– hotels = category

– lansdowne = admin5

– lansdowne = street

– dublin = admin5

– dublin = street

Page 23: OpenStreetMap Geocoder Based on Solr

Searching: Classifications

● Query = “hotels near lansdowne rd dublin”

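● Classifications:

– hotels = category

– lansdowne = admin5

– lansdowne = street

– dublin = admin5

– dublin = street

● Possible permutations (one symbol per query token; C = category, 5 = admin5, S = street, . = unused):

– C.5.5

– C.S.5

– C.5.S

– C...5

– C.5..

– etc.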
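The permutations are the Cartesian product of each token's possible readings. A minimal sketch, assuming that every token may also be left unused (".") as in the slide notation; the class-to-symbol table is inferred from the examples:

```python
from itertools import product
from typing import Dict, List

# Symbols as on the slides: C = category, 5 = admin5, S = street,
# "." = token not used in this interpretation.
CODES = {"category": "C", "admin5": "5", "street": "S"}

def permutations(tokens: List[str], classes: Dict[str, List[str]]) -> List[str]:
    """Every combination of per-token readings, one symbol per token."""
    options = []
    for tok in tokens:
        codes = [CODES[c] for c in classes.get(tok, [])]
        options.append(codes + ["."])  # any token may also be left unused
    return ["".join(combo) for combo in product(*options)]

tokens = "hotels near lansdowne rd dublin".split()
classes = {
    "hotels": ["category"],
    "lansdowne": ["admin5", "street"],
    "dublin": ["admin5", "street"],
}
perms = permutations(tokens, classes)
print(len(perms))  # 18
```

The raw product also contains implausible readings such as "C.S.S" (two streets) or "....." (nothing classified); weeding those out is the kind of job the validator stage in the search pipeline is there for.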

Page 24: OpenStreetMap Geocoder Based on Solr

Searching: Solr Query

● Query = “hotels near lansdowne rd dublin”

● Possible permutations:

– C.5.5: +level:5 +admin5:lansdowne +admin5:dublin

– C.S.5: +level:s +street:lansdowne +admin5:dublin

– C.5.S: +level:s +street:dublin +admin5:lansdowne

– C...5: +level:5 +admin5:dublin

– C.5..: +level:5 +admin5:lansdowne

– etc.
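The mapping from a permutation to a Solr query can be sketched as below. The rules are inferred from the five examples above and are assumptions, not the talk's stated algorithm: category and unused tokens are skipped (the category is applied later, in the POI search), a street reading sets the target level to "s", otherwise the level is the most specific admin level present, and the street clause precedes the admin clauses. Multi-word values would additionally need quoting or escaping.

```python
from typing import List

def to_solr_query(perm: str, tokens: List[str]) -> str:
    """Build the required-clause Solr query for one permutation."""
    streets, admins = [], []
    for code, tok in zip(perm, tokens):
        if code == "S":
            streets.append(f"+street:{tok}")
        elif code.isdigit():
            admins.append((int(code), f"+admin{code}:{tok}"))
        # "C" (category) and "." (unused) contribute no clause here
    if streets:
        level = "s"
    elif admins:
        level = str(max(lvl for lvl, _ in admins))  # most specific admin level
    else:
        raise ValueError("permutation has no location component")
    return " ".join([f"+level:{level}"] + streets + [clause for _, clause in admins])

tokens = "hotels near lansdowne rd dublin".split()
print(to_solr_query("C.S.5", tokens))  # +level:s +street:lansdowne +admin5:dublin
```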

Page 26: OpenStreetMap Geocoder Based on Solr

Searching: Solr Query

● Query = “hotels near lansdowne rd dublin”

● Possible permutations:

– C.5.5: +level:5 +admin5:lansdowne +admin5:dublin

– C.S.5: +level:s +street:lansdowne +admin5:dublin

– C.5.S: +level:s +street:dublin +admin5:lansdowne

– C...5: +level:5 +admin5:dublin

– C.5..: +level:5 +admin5:lansdowne

– etc.

"POINT (-6.232063,53.333833)"

Page 27: OpenStreetMap Geocoder Based on Solr

Searching: Searching for POIs

● Query = “hotels near lansdowne rd dublin”

● Query = “hotels near” near "POINT (-6.232063,53.333833)"

● Solr query:

– fl=*,score

– sort=score asc

– q={!geofilt score=distance filter=false sfield=geo pt=53.333833,-6.232063 d=10}

– fq=+category:hotel
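The POI request above can be assembled as an ordinary Solr select URL. In this sketch the base URL and collection name are hypothetical, and wt=json is an added convenience. Note that geofilt's pt takes "lat,lon", the reverse of the (lon, lat) order in the WKT-style POINT returned by the previous step, and that d is a radius which geofilt interprets in kilometres.

```python
from urllib.parse import urlencode

def poi_request(base_url: str, category: str, lat: float, lon: float,
                radius_km: float = 10) -> str:
    """Build a select URL ranking POIs of one category by distance from a point."""
    params = {
        # score=distance makes the score the distance from pt; filter=false
        # leaves the filtering to fq, so sort=score asc means nearest first.
        "q": f"{{!geofilt score=distance filter=false sfield=geo pt={lat},{lon} d={radius_km}}}",
        "fq": f"+category:{category}",
        "fl": "*,score",
        "sort": "score asc",
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"

# POINT (-6.232063,53.333833) is (lon, lat), so the order is swapped for pt.
url = poi_request("http://localhost:8983/solr/geocoder", "hotel", 53.333833, -6.232063)
print(url)
```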

Page 28: OpenStreetMap Geocoder Based on Solr

Searching: Searching for POIs

Page 29: OpenStreetMap Geocoder Based on Solr

Challenges: Indexing

● Street Associativity

● Incomplete polygons

Page 30: OpenStreetMap Geocoder Based on Solr

Challenges

● Handling Updates

● Data validation

Page 31: OpenStreetMap Geocoder Based on Solr

Distributed Search

● Need for distributed search?

● Geographical partitioning
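One way to get geographical partitioning in SolrCloud, offered here as an assumption rather than the approach described in the talk, is the compositeId document router: a document id of the form shardKey!docId is routed by hashing the shard key, so all documents sharing a prefix land on the same shard. The sketch below only builds such ids from a hypothetical region key (e.g. the country) and groups them by that key; the hashing and placement are done by Solr itself.

```python
from collections import defaultdict
from typing import Dict, List

def composite_id(region: str, osm_type: str, osm_id: int) -> str:
    """Build a compositeId-style id: '<shard key>!<document id>'."""
    key = region.strip().lower().replace(" ", "_")
    if "!" in key:
        raise ValueError("shard key must not contain '!'")
    return f"{key}!{osm_type}/{osm_id}"

def group_by_route_key(ids: List[str]) -> Dict[str, List[str]]:
    """Documents sharing a route key are co-located by the compositeId router."""
    groups = defaultdict(list)
    for doc_id in ids:
        groups[doc_id.split("!", 1)[0]].append(doc_id)
    return dict(groups)

# Hypothetical OSM ids.
ids = [
    composite_id("Ireland", "way", 1001),
    composite_id("Ireland", "node", 42),
    composite_id("United Kingdom", "way", 7),
]
print(group_by_route_key(ids))
# {'ireland': ['ireland!way/1001', 'ireland!node/42'], 'united_kingdom': ['united_kingdom!way/7']}
```

Co-locating a region's documents keeps a lookup whose admin context is already known on a single shard, at the cost of uneven shard sizes when regions differ greatly in data volume.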

Page 32: OpenStreetMap Geocoder Based on Solr

Conclusion

● http://www.geocoder.in/

● Twitter: @ichattopadhyaya