May 11, 2015
Ishan Chattopadhyaya
LucidWorks
OpenStreetMap Foundation
Twitter: @ichattopadhyaya, OSM: chatman
What is OpenStreetMap?
● Wikipedia of GeoData
● OpenStreetMap is a project aimed squarely at creating and providing free geographic data such as street maps to anyone who wants them.
State of OSM
● Commercial competitors
– Google Maps
– Bing Maps
● http://tools.geofabrik.de/mc/
The OpenStreetMap Software Stack
What is a Geocoder?
● Input: raw query
● Output: geocoordinates
Goals for the new Geocoder
● Search for:
– Cities and towns
– Streets
– Address points
– Places of Interest, Businesses, Amenities, Attractions etc.
● Reverse geocoding
● Support for fuzzy queries
Good changes in Lucene/Solr 4.x
● Support for indexing polygons
– RecursivePrefixTree indexing
● Special spatial search predicates
– Contains
– IsWithin
– Intersects
– Etc.
● Reference: David Smiley's LuceneRevolution presentation
● SolrCloud mode for distributed indexing/searching
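The spatial predicates listed above are typically passed to Solr as a filter on a RecursivePrefixTree field. The following is a minimal sketch of how such filter strings could be assembled; the field name `geo` comes from the schema slides, while the helper names `wkt_point`, `wkt_polygon` and `spatial_filter` are hypothetical and not part of Solr.

```python
# Sketch only: builds filter strings of the form  field:"Predicate(WKT)".
# Helper names are hypothetical; the predicate names come from the slide.

PREDICATES = {"Intersects", "IsWithin", "Contains"}


def wkt_point(lon, lat):
    """WKT uses x y order, i.e. longitude first."""
    return f"POINT({lon} {lat})"


def wkt_polygon(ring):
    """Build a WKT polygon from (lon, lat) pairs, closing the ring if needed."""
    points = list(ring)
    if points[0] != points[-1]:
        points.append(points[0])
    coords = ", ".join(f"{lon} {lat}" for lon, lat in points)
    return f"POLYGON(({coords}))"


def spatial_filter(field, predicate, wkt):
    """Return a Solr filter string applying one spatial predicate to a field."""
    if predicate not in PREDICATES:
        raise ValueError(f"unsupported predicate: {predicate}")
    return f'{field}:"{predicate}({wkt})"'


# Reverse geocoding could ask for every indexed shape that touches a point.
point_filter = spatial_filter("geo", "Intersects", wkt_point(-6.232063, 53.333833))

# A bounding area query: everything lying within a (made-up) rectangle.
area_filter = spatial_filter(
    "geo", "IsWithin", wkt_polygon([(-6.3, 53.3), (-6.2, 53.3), (-6.2, 53.4), (-6.3, 53.4)])
)
```

Polygon shapes in Solr 4.x generally require the JTS library on the classpath, so a setup like this assumes that dependency is available.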
Architecture
[Diagram components: Planet dumps, Indexer, Solr, API Layer, www.Geocoder.in]
Indexing: OSM Data format
● Node
– “A node defines a single geospatial point using a latitude and longitude.”
● Way
– “A way is an ordered list of between 2 and 2,000 nodes. Ways are used to represent linear features (vectors), such as rivers or roads.”
● Relation
– “A Relation is an all-purpose data structure that documents a relationship between two or more other objects.”
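The three element types can be modelled as small record types. This is an illustrative sketch, not the indexer's actual code; the class layout and the `way_coordinates` helper are assumptions, while the 2 to 2,000 node limit comes from the definition quoted above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A single geospatial point."""
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of 2 to 2,000 node ids, e.g. a road or a river."""
    id: int
    node_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 2 <= len(self.node_ids) <= 2000:
            raise ValueError("a way needs between 2 and 2,000 nodes")


@dataclass
class Member:
    """One participant in a relation."""
    type: str       # "node", "way" or "relation"
    ref: int        # id of the referenced object
    role: str = ""  # e.g. "outer" or "inner" for polygon rings


@dataclass
class Relation:
    """Documents a relationship between two or more other objects."""
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)


def way_coordinates(way: Way, nodes: Dict[int, Node]) -> List[Tuple[float, float]]:
    """Resolve a way's node ids into (lon, lat) pairs, preserving order."""
    return [(nodes[n].lon, nodes[n].lat) for n in way.node_ids]
```

Resolving node ids into coordinates like this is the first step in turning a way into a shape that can be indexed.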
Indexing: Facts and figures
● Number of OSM Nodes in the database = 2071039612
● Number of OSM Ways in the database = 202570637
● Number of OSM Relations in the database = 2217240
Indexing: Schema
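Fields: admin2, admin3, admin4, admin5, admin6, admin7, street, st_type
Example values (empty fields omitted): Ireland, Dublin County, Dublin, Ballsbridge, Lansdowne, Street
Fields: name, level, geo, popularity
Example values (empty fields omitted): Lansdowne Street, s, <shape>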
Indexing: Schema
Fields: admin2, admin3, admin4, admin5, admin6, admin7, street, st_type
Example values (empty fields omitted): Ireland, Dublin County, Dublin
Fields: name, level, geo, popularity
Example values: Dublin, 6, <shape>, 1
Indexing: Schema (POIs)
Fields: admin2, admin3, admin4, admin5, admin6, admin7, street, st_type
Example values (empty fields omitted): Ireland, Dublin County, Dublin, Ballsbridge
Fields: name, category, geo
Example values: Ballsbridge Hotel, hotel, <shape>
Searching
Raw query → Classifier → Classifications → Validator → Valid classifications → Geocoder (lookup) → Structured location + geocodes
Searching: Classification
Query → Tokenizer → Shingles → Bloom Filters → Classifications
● Query= “hotels near lansdowne rd dublin”
● Shingles: hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, lansdowne rd, rd dublin, .., hotels near lansdowne rd dublin
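Shingles here are all contiguous word n-grams of the query, from single tokens up to the whole query. A minimal sketch, assuming simple lowercase whitespace tokenization (the real tokenizer may differ):

```python
def shingles(query):
    """Return every contiguous run of tokens, shortest runs first."""
    tokens = query.lower().split()
    result = []
    for size in range(1, len(tokens) + 1):
        for start in range(len(tokens) - size + 1):
            result.append(" ".join(tokens[start:start + size]))
    return result


all_shingles = shingles("hotels near lansdowne rd dublin")
```

A query of n tokens yields n(n+1)/2 shingles, so the five-token example produces 15 candidates to classify.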
● Each shingle is checked against one Bloom filter per field:
  Shingle     Cat     A2   A4   A5      Streets
  hotels      Match
  dublin                        Match   Match
  lansdowne                     Match   Match
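The lookup step can be sketched with a small hand-rolled Bloom filter per field. Everything here is an assumption for illustration: the filter size, the SHA-256 based hashing, and the toy vocabularies, which in practice would be filled from the indexed names.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_filters(vocabularies):
    """Create one filter per field from {field: iterable of known names}."""
    filters = {}
    for name, words in vocabularies.items():
        bloom = BloomFilter()
        for word in words:
            bloom.add(word)
        filters[name] = bloom
    return filters


def classify(shingle_list, filters):
    """Return (shingle, field) pairs for every filter that reports a match."""
    return [(s, name) for s in shingle_list
            for name, bloom in filters.items() if s in bloom]


# Toy vocabularies standing in for the real index contents.
filters = build_filters({
    "category": ["hotels"],
    "admin5": ["lansdowne", "dublin"],
    "street": ["lansdowne", "dublin"],
})
matches = classify(["hotels", "near", "lansdowne", "rd", "dublin"], filters)
```

Because a Bloom filter can report false positives, a separate validation step after classification is a natural companion to this design.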
Searching: Classifications
● Query = “hotels near lansdowne rd dublin”
● Classifications:
– hotels = category
– lansdowne = admin5
– lansdowne = street
– dublin = admin5
– dublin = street
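● Possible permutations (one symbol per query token: C = category, 5 = admin5, S = street, . = token ignored):
– C.5.5
– C.S.5
– C.5.S
– C...5
– C.5..
– etc.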
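The permutation patterns can be enumerated as a Cartesian product over each token's possible roles. This sketch assumes one symbol per token and that any classified token may also be ignored; the `CODES` mapping and function name are hypothetical, and no validation rules are applied here.

```python
from itertools import product

# Hypothetical one-letter codes for the classification labels on the slide.
CODES = {"category": "C", "admin5": "5", "street": "S"}


def permutation_patterns(tokens, classifications):
    """Enumerate patterns such as 'C.5.5', one symbol per query token."""
    options = []
    for token in tokens:
        codes = [CODES[label] for label in classifications.get(token, [])]
        options.append(codes + ["."])  # "." means the token is ignored
    return ["".join(combo) for combo in product(*options)]


tokens = ["hotels", "near", "lansdowne", "rd", "dublin"]
classifications = {
    "hotels": ["category"],
    "lansdowne": ["admin5", "street"],
    "dublin": ["admin5", "street"],
}
patterns = permutation_patterns(tokens, classifications)
```

For this query the product gives 2 x 3 x 3 = 18 raw patterns, which a validator would then prune to the plausible ones.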
Searching: Solr Query
● Query = “hotels near lansdowne rd dublin”
● Possible permutations:
– C.5.5: +level:5 +admin5:lansdowne +admin5:dublin
– C.S.5: +level:s +street:lansdowne +admin5:dublin
– C.5.S: +level:s +street:dublin +admin5:lansdowne
– C...5: +level:5 +admin5:dublin
– C.5..: +level:5 +admin5:lansdowne
– etc.
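Each pattern maps mechanically onto a Solr query string. The sketch below reproduces the examples on the slide; the rule for choosing `level` (street if any token is a street, otherwise the deepest admin level used) is inferred from those examples and is an assumption, as is the function name.

```python
def solr_query(tokens, pattern):
    """Translate a pattern like 'C.S.5' into required Solr clauses."""
    street_clauses, admin_clauses, admin_levels = [], [], []
    for token, code in zip(tokens, pattern):
        if code == "S":
            street_clauses.append(f"+street:{token}")
        elif code.isdigit():
            admin_clauses.append(f"+admin{code}:{token}")
            admin_levels.append(int(code))
        # "C" (category) and "." (ignored) add no location clause.
    if street_clauses:
        level = "s"
    elif admin_levels:
        level = str(max(admin_levels))
    else:
        return ""  # nothing to look up
    return " ".join([f"+level:{level}"] + street_clauses + admin_clauses)


tokens = ["hotels", "near", "lansdowne", "rd", "dublin"]
queries = {p: solr_query(tokens, p) for p in ["C.5.5", "C.S.5", "C.5.S", "C...5", "C.5.."]}
```

The category token is left out on purpose: it is applied later, when searching for POIs around the resolved location.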
● Geocoder result: "POINT (-6.232063,53.333833)"
Searching: Searching for POIs
● Query = “hotels near lansdowne rd dublin”
● Query = “hotels near” near "POINT (-6.232063,53.333833)"
● Solr query:
– fl=*,score
– sort=score asc
– q={!geofilt score=distance filter=false sfield=geo pt=53.333833,-6.232063 d=10}
– fq=+category:hotel
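The same request can be assembled programmatically once the location has been resolved. A minimal sketch, assuming the parameters are sent as an ordinary URL query string; `poi_search_params` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def poi_search_params(lat, lon, category, radius=10):
    """Build the Solr parameters for a distance-sorted POI search.

    Note that geofilt's pt is "lat,lon", the reverse of WKT's x y order.
    """
    geofilt = (
        f"{{!geofilt score=distance filter=false "
        f"sfield=geo pt={lat},{lon} d={radius}}}"
    )
    return {
        "fl": "*,score",             # return all fields plus the distance score
        "sort": "score asc",         # nearest first
        "q": geofilt,
        "fq": f"+category:{category}",
    }


params = poi_search_params(53.333833, -6.232063, "hotel")
query_string = urlencode(params)  # percent-encodes "+", "{", "}" and spaces
```

With `score=distance` the score of each document is its distance from the point, which is why an ascending sort on score lists the closest hotels first.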
Challenges: Indexing
● Street Associativity
● Incomplete polygons
Challenges
● Handling Updates
● Data validation
Distributed Search
● Need for distributed search?
● Geographical partitioning
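One simple way to picture geographical partitioning is to route each document to a shard by its coordinates. The scheme below, equal-width longitude bands, is purely an assumption chosen for brevity and is not taken from the talk; a real deployment would likely balance shards by data density.

```python
def shard_for(lat, lon, num_shards=4):
    """Pick a shard index from equal-width longitude bands (hypothetical scheme)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    band = int((lon + 180.0) / 360.0 * num_shards)
    return min(band, num_shards - 1)  # lon == 180 falls into the last band


dublin_shard = shard_for(53.333833, -6.232063)
```

A query that has already been resolved to a location then only needs to be sent to the shard, or few shards, covering that area.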