OCTOBER 13-16, 2016 • AUSTIN, TX
Lucene/Solr Spatial in 2015
David Smiley
Search Engineer/Consultant (Freelance)
About David Smiley
Freelance Search Developer/Consultant
Expert Lucene/Solr development skills, advising (consulting), training
Java, spatial, and full-stack experience
Apache Lucene/Solr committer & PMC member
Primary author of “Apache Solr Enterprise Search Server”
More Spatial Contributors!
Spatial4j Lucene Solr
David Smiley ✔️ ✔️ ✔️
Ryan McKinley ✔️
Justin Deoliveira ✔️
Mike McCandless ✔️
Nick Knize ✔️
Karl Wright ✔️
Ishan Chattopadhyaya ✔️
Agenda
New Features / Capabilities
New Approaches
Improvements
Pending
Topic: New Features
Heatmaps / grid faceting (Lucene, Solr)
Surface-of-sphere shapes, Geo3d (Lucene)
Accurate indexed geometries (Lucene, Solr)
GeoJSON read/write (Spatial4j)
Heatmaps: Spatial Grid Faceting
Spatial density summary grid faceting, also useful for point-plotting search results
Usually rendered with a gradient radius
Lucene & Solr APIs
Usually scalable & fast…
v5.2
Heatmaps Under the Hood
Requires a PrefixTreeStrategy Lucene field (grid based)
Algorithm enumerates the underlying cells/terms and accumulates their counts in a corresponding grid
Conceptually facet.method=enum for spatial
Works on non-point indexed shapes too
Complexity: O(cells * cellDepthFactor), not O(docs)
No/low memory; mainly the grid of integers
Solr will distribute to shards and merge
Could be faster still; a BFS (vs DFS) layout would be perfect
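Speaker-note sketch of the accumulation step described above, in plain Java. This is not Lucene's implementation: the `Cell` class and `accumulate` method are hypothetical, the grid is a simple quad layout (2^level cells per side), and indexed cells are modeled as already-decoded entries instead of terms. It assumes a coarser cell adds its count to every grid cell it covers, which is one way non-point shapes can be handled.

```java
import java.util.List;

final class HeatmapSketch {
    /** Hypothetical stand-in for one indexed grid cell (a term) and its doc count. */
    static final class Cell {
        final int level, col, row, docCount;
        Cell(int level, int col, int row, int docCount) {
            this.level = level; this.col = col; this.row = row; this.docCount = docCount;
        }
    }

    /**
     * Accumulates per-cell doc counts into a (2^gridLevel x 2^gridLevel) grid.
     * Work is proportional to the number of cells visited, not the number of docs.
     */
    static int[][] accumulate(List<Cell> cells, int gridLevel) {
        int size = 1 << gridLevel;
        int[][] counts = new int[size][size];
        for (Cell c : cells) {
            // Assumption: cells finer than the grid are skipped because a
            // prefix tree also indexes their ancestor cell at gridLevel.
            if (c.level > gridLevel) continue;
            int span = 1 << (gridLevel - c.level); // grid cells covered per side
            for (int r = c.row * span; r < (c.row + 1) * span; r++) {
                for (int gridCol = c.col * span; gridCol < (c.col + 1) * span; gridCol++) {
                    counts[r][gridCol] += c.docCount;
                }
            }
        }
        return counts;
    }
}
```

The only sizeable allocation is the `int[][]` grid itself, which matches the "mainly the grid of integers" memory note.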
Solr Heatmap Faceting
On an RPT field (SpatialRecursivePrefixTreeFieldType)
prefixTree="packedQuad"
Query:
  /select?facet=true
    &facet.heatmap=geo_rpt
    &facet.heatmap.geom=["-180 -90" TO "180 90"]
    &facet.heatmap.format=ints2D   (or png)
// Normal Solr response...
"facet_counts":{
  ... // facet response fields
  "facet_heatmaps":{
    "loc_srpt":[
      "gridLevel",2,
      "columns",32,
      "rows",32,
      "minX",-180.0,
      "maxX",180.0,
      "minY",-90.0,
      "maxY",90.0,
      "counts_ints2D", [null, null, [0, 0, ... ]]
...
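A small Java sketch of how a client might turn a `counts_ints2D` position back into a map rectangle, using only the header values shown in the response (`columns`, `rows`, `minX`, `maxX`, `minY`, `maxY`). The class and method names are hypothetical. Two assumptions, as I understand the Solr reference guide: rows run top-down (row 0 touches `maxY`), and a `null` row stands for a row of zeros.

```java
final class HeatmapCellBounds {
    /**
     * Returns {minX, minY, maxX, maxY} of the grid cell at (row, col).
     * Assumes row 0 is the top (maxY) row and column 0 the left-most (minX) column.
     */
    static double[] cellBounds(int row, int col, int rows, int columns,
                               double minX, double maxX, double minY, double maxY) {
        double cellWidth = (maxX - minX) / columns;
        double cellHeight = (maxY - minY) / rows;
        double left = minX + col * cellWidth;
        double top = maxY - row * cellHeight;
        return new double[] {left, top - cellHeight, left + cellWidth, top};
    }

    /** Reads one count, treating a null row as a row of zeros (assumption). */
    static int countAt(int[][] countsInts2D, int row, int col) {
        int[] r = countsInts2D[row];
        return r == null ? 0 : r[col];
    }
}
```

With the 32 x 32 world grid from the response, each cell is 11.25 degrees wide and 5.625 degrees tall.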
Solr Heatmap Resources
Solr Ref guide: https://cwiki.apache.org/confluence/display/solr/Spatial+Search
Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html
Live Demo: http://worldwidegeoweb.com
Open-source JavaScript Solr Heatmap Libraries
https://github.com/spacemansteve/SolrHeatmapLayer
https://github.com/mejackreed/leaflet-solr-heatmap
https://github.com/voyagersearch/leaflet-solr-heatmap
Geo3D: Shapes on the Surface of a Sphere
… or Ellipsoid of configurable axis
Not a general 3D space geometry lib
Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics
Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer
Distance computations: Arc (angular or surface), Linear (straight-line), Normal
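To make the geocentric X, Y, Z idea concrete, here is a minimal Java sketch on a unit sphere (not an ellipsoid, and not the Geo3D API; all names are hypothetical). It converts lat/lon to a unit vector and computes two of the distance styles listed above: arc (angular) and linear (straight-line chord).

```java
final class UnitSphereSketch {
    /** Converts latitude/longitude in degrees to a point on the unit sphere. */
    static double[] toXYZ(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    /** Arc (angular) distance in radians between two unit vectors. */
    static double arcDistance(double[] a, double[] b) {
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
        // Clamp to guard against rounding slightly outside [-1, 1].
        return Math.acos(Math.max(-1.0, Math.min(1.0, dot)));
    }

    /** Linear (straight-line chord) distance between two unit vectors. */
    static double linearDistance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
}
```

Multiplying the arc distance by a sphere radius gives a surface distance; the chord is always the shorter of the two.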
All 2D Maps of the Earth Distort Straight Lines
A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!
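A quick way to check this claim is to compute the great-circle midpoint. The Java sketch below sums the two unit vectors and normalizes; the class name and the city coordinates (approximate: Anchorage 61.22 N, 149.90 W; Miami 25.76 N, 80.19 W) are my own assumptions, not from the slides.

```java
final class GreatCircleMidpoint {
    /** Converts latitude/longitude in degrees to a point on the unit sphere. */
    static double[] toXYZ(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    /** Returns {lat, lon} in degrees of the great-circle midpoint of two points. */
    static double[] midpoint(double lat1, double lon1, double lat2, double lon2) {
        double[] a = toXYZ(lat1, lon1);
        double[] b = toXYZ(lat2, lon2);
        double x = a[0] + b[0], y = a[1] + b[1], z = a[2] + b[2];
        double norm = Math.sqrt(x * x + y * y + z * z); // zero only for antipodal points
        return new double[] {Math.toDegrees(Math.asin(z / norm)), Math.toDegrees(Math.atan2(y, x))};
    }

    public static void main(String[] args) {
        double[] mid = midpoint(61.22, -149.90, 25.76, -80.19); // approximate city coordinates
        System.out.printf("midpoint: lat=%.2f lon=%.2f%n", mid[0], mid[1]);
    }
}
```

The midpoint lands at roughly 48.5 N, 103 W, well inland and about five degrees north of the naive average latitude, which is why the path looks bent on a flat map.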
Geo3D, continued…
Benefits
Inherently more accurate than 2D projected spatial, especially for big shapes or near poles
Many computations are fast; no expensive trigonometry
An alternative to JTS without the LGPL license (still)
Has own Lucene module (spatial3d), thus jar file
Maven groupId: org.apache.lucene, artifact: lucene-spatial3d
No Solr integration yet; pending more Spatial4j integration
Index & Search Geo3D Geometries
Spatial4j Geo3dShape wrapper with RPT
In Lucene-spatial for now
Index Geo3d shapes
Limited to grid accuracy
Query by Geo3d shape
Limited distance sort
Heatmaps
Geo3DPointField & PointInGeo3DShapeQuery
Based on a 3D BKD index
In spatial3d module
Index points-only
No multi-valued
Query by Geo3d shape
No distance sort
Leaner & faster than RPT
v5.4
v5.2
RPT/SpatialPrefixTrees and Accuracy
RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree
Thus represents shapes as grid cells of varying precision by prefix
Example, a point shape: D, DR, DRT, DRT2, DRT2Y
More accuracy scales
Example, a polygon shape: Too many to list… 508 cells
More accuracy does NOT scale
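The point example can be reproduced with a standard geohash encoder, sketched below in Java. This is an illustration only: the class and method names are hypothetical, geohash is conventionally lowercase, and RPT's actual term encoding may differ. Each token is a prefix of the next, so one extra level of precision costs a point just one more term.

```java
import java.util.ArrayList;
import java.util.List;

final class GeohashPrefixSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Returns the geohash cell token at each level 1..levels for a point. */
    static List<String> cellTokens(double lat, double lon, int levels) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder();
        List<String> tokens = new ArrayList<>();
        boolean splitLon = true; // geohash alternates longitude and latitude bits
        int bits = 0, bitCount = 0;
        while (tokens.size() < levels) {
            if (splitLon) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { bits = (bits << 1) | 1; minLon = mid; }
                else            { bits <<= 1;             maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { bits = (bits << 1) | 1; minLat = mid; }
                else            { bits <<= 1;             maxLat = mid; }
            }
            splitLon = !splitLon;
            if (++bitCount == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(bits));
                tokens.add(hash.toString());
                bits = 0;
                bitCount = 0;
            }
        }
        return tokens;
    }
}
```

A polygon has no such luxury: every extra level multiplies the number of cells needed along its edge, which is the "does NOT scale" half of the slide.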
Combining RPT with Serialized Geometry
RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
SDV (SerializedDVStrategy) stores serialized geometry (accurate)
RPT + SDV → CompositeSpatialStrategy
Accuracy & speed & smaller indexes
Optimized intersects predicate avoids some geometry checks
> 80% faster intersects queries, 75% smaller index
Solr adapter: RptWithGeometrySpatialField
Compatible with the Heatmaps feature
Includes a shape cache (per-segment); configurable
v5.2
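The "avoids some geometry checks" idea can be sketched in a few lines of Java. This is a simplified analogy, not CompositeSpatialStrategy itself: the indexed shape is a circle, its bounding box stands in for the coarse grid index, the query is a rectangle, and all names are hypothetical. The exact geometry is consulted only when the coarse approximation cannot decide.

```java
final class CompositeIntersectsSketch {
    /** Counts how often the exact geometry had to be consulted. */
    static int exactChecks = 0;

    /** Does the circle (cx, cy, r) intersect the rectangle [minX, maxX] x [minY, maxY]? */
    static boolean intersects(double cx, double cy, double r,
                              double minX, double minY, double maxX, double maxY) {
        // Phase 1: coarse approximation (bounding box stands in for grid cells).
        double bMinX = cx - r, bMaxX = cx + r, bMinY = cy - r, bMaxY = cy + r;
        if (bMaxX < minX || bMinX > maxX || bMaxY < minY || bMinY > maxY) {
            return false; // approximation is disjoint: certain miss
        }
        if (bMinX >= minX && bMaxX <= maxX && bMinY >= minY && bMaxY <= maxY) {
            return true;  // approximation is within the query: certain hit
        }
        // Phase 2: ambiguous, so check the exact (stored) geometry.
        exactChecks++;
        double nx = Math.max(minX, Math.min(cx, maxX)); // nearest rectangle point to the center
        double ny = Math.max(minY, Math.min(cy, maxY));
        return (cx - nx) * (cx - nx) + (cy - ny) * (cy - ny) <= r * r;
    }
}
```

Only shapes straddling the query boundary pay for the exact check, which is where a speedup of this kind would come from.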
Topic: New Approaches
Lucene
BKD Tree Indexes
GeoPointField
BKD Tree Indexes
New numeric/spatial index approach with own file format
Not based on Lucene Terms index
https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
Much faster and more compact than Trie/PrefixTree based indexes
Wither term auto-prefixing? LUCENE-5879
Indexed point-data only; multi-valued mostly
Intersects predicate only
Filtering only (no distance or other scoring)
Multiple implementations… (next slide)
Neat visualization https://youtu.be/x9WnzOvsGKs
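For readers who want the gist without the paper, here is a toy in-memory Java sketch of the block k-d idea: split points recursively at the median of the wider dimension until blocks are small, and at query time skip any block whose bounding box misses the query. It is not Lucene's on-disk format or API; the class name and the tiny block size are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

final class BlockKdTreeSketch {
    static final int MAX_LEAF_POINTS = 4; // tiny for illustration only

    final double minX, minY, maxX, maxY;  // bounding box of all points below this node
    final double[][] leafPoints;          // non-null only for leaf blocks
    final BlockKdTreeSketch left, right;

    /** Builds a tree over {x, y} points; the caller's array is left untouched. */
    BlockKdTreeSketch(double[][] pts) { this(pts.clone(), 0, pts.length); }

    private BlockKdTreeSketch(double[][] pts, int from, int to) {
        double x0 = Double.POSITIVE_INFINITY, y0 = Double.POSITIVE_INFINITY;
        double x1 = Double.NEGATIVE_INFINITY, y1 = Double.NEGATIVE_INFINITY;
        for (int i = from; i < to; i++) {
            x0 = Math.min(x0, pts[i][0]); x1 = Math.max(x1, pts[i][0]);
            y0 = Math.min(y0, pts[i][1]); y1 = Math.max(y1, pts[i][1]);
        }
        minX = x0; minY = y0; maxX = x1; maxY = y1;
        if (to - from <= MAX_LEAF_POINTS) {
            leafPoints = Arrays.copyOfRange(pts, from, to);
            left = null;
            right = null;
        } else {
            final int dim = (x1 - x0 >= y1 - y0) ? 0 : 1; // split the wider dimension
            Arrays.sort(pts, from, to, Comparator.comparingDouble((double[] p) -> p[dim]));
            int mid = (from + to) >>> 1;                   // median split
            leafPoints = null;
            left = new BlockKdTreeSketch(pts, from, mid);
            right = new BlockKdTreeSketch(pts, mid, to);
        }
    }

    /** Counts points inside the query box, skipping blocks that cannot match. */
    int count(double qMinX, double qMinY, double qMaxX, double qMaxY) {
        if (maxX < qMinX || minX > qMaxX || maxY < qMinY || minY > qMaxY) return 0;
        if (leafPoints == null) {
            return left.count(qMinX, qMinY, qMaxX, qMaxY) + right.count(qMinX, qMinY, qMaxX, qMaxY);
        }
        int n = 0;
        for (double[] p : leafPoints) {
            if (p[0] >= qMinX && p[0] <= qMaxX && p[1] >= qMinY && p[1] <= qMaxY) n++;
        }
        return n;
    }
}
```

The query here is a bounding-box intersects filter only, in keeping with the "intersects predicate only, filtering only" limits listed above.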
Multiple BKD Implementations
Multiple implementations of the same BKD concept:
(1D) RangeTreeDocValuesFormat
(2D) BKDPointField & BKD…Query
(3D) Geo3DPointField & PointInGeo3DShapeQuery
(ND) LUCENE-6825 (to Lucene-core) in-progress
1D, 2D, 3D implementations are either in lucene-sandbox or lucene-spatial3d for now
No Lucene-spatial module SpatialStrategy wrappers yet, thus no Spatial4j Shape integration nor Solr integration yet
BKD 1D: RangeTree
Efficient range search on single/multi-valued numbers or terms
Could be used for numbers, dates, IPV6 bytes, …
Alternatives: Normal number fields (trie), DateRangeField (RPT)
Would love to see a benchmark!
How-To:
RangeTreeDocValuesFormat
Numbers: SortedNumericDocValuesField with NumericRangeTreeQuery
Bytes: SortedSetDocValuesField with SortedSetRangeTreeQuery
v5.3
BKD 2D: BKDPointField
Efficient 2D geospatial point index
Alternative to RPT or GeoPointField
5.7x faster than RPT w/ GeoHash. Smaller indexes.
How-To:
Use BKDPointField (requires BKDTreeDocValuesFormat)
Query:
BKDPointInBBoxQuery
BKDPointInPolygonQuery
point-radius (circle): in-progress, LUCENE-6698
v5.3
GeoPointField
2D geospatial point field
Indexed point-only data, single/multi-valued
Spatial 2D Trie/PrefixTree terms index
But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
Configurable 2x grid size (defaults to 512)
Compact bit interleaved Z-order encoding
Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic
2-phase grid/postings then doc-values algorithm
v5.3
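The "bit interleaved Z-order encoding" bullet can be illustrated with a generic Morton-code sketch in Java. The quantization to 32 bits per dimension and the bit order (longitude in the odd positions) are assumptions of this sketch, not GeoPointField's documented layout, and the class name is hypothetical.

```java
final class MortonSketch {
    /** Quantizes a value in [min, max] to an unsigned 32-bit integer held in a long. */
    static long quantize(double v, double min, double max) {
        return (long) ((v - min) / (max - min) * 0xFFFFFFFFL);
    }

    /** Spreads the low 32 bits of v so they occupy the even bit positions of a long. */
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2))  & 0x3333333333333333L;
        v = (v | (v << 1))  & 0x5555555555555555L;
        return v;
    }

    /** Interleaves longitude bits (odd positions) with latitude bits (even positions). */
    static long encode(double lon, double lat) {
        return (spread(quantize(lon, -180, 180)) << 1) | spread(quantize(lat, -90, 90));
    }
}
```

Because the high bits alternate between the two dimensions, nearby points tend to share a long common bit prefix, which is what lets a trie-style terms index treat a prefix as a grid cell.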
…continued
Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
No Heatmaps, no custom Shape implementations
No Solr support yet
No dependencies
Easy to use compared to RPT; simpler internally too
How-To:
doc.add(new GeoPointField(name, lon, lat, Store.YES));
GeoPointDistanceQuery (sphere only) or GeoPointInBBoxQuery or GeoPointInPolygonQuery. …DistanceRangeQuery pending
Topic: Improvements
Spatial4j
Minimal longitude bounding-box algorithm
Lucene (PrefixTree / RPT indexing)
Leaner & faster non-point indexes
New PackedQuadPrefixTree
Solr
Distance units: Kilometers/Miles/Degrees
Nicer ST_* spatial query parsers (almost done)
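On the "minimal longitude bounding-box" item: the slides do not describe Spatial4j's algorithm, so the following Java sketch is only one simple way to get the same kind of result for a set of point longitudes. It finds the largest empty gap around the circle and returns the range that is the complement of that gap. Names are hypothetical; inputs are assumed to lie in [-180, 180] and to be non-empty.

```java
import java.util.Arrays;

final class MinLonBoxSketch {
    /**
     * Returns {west, east} of the narrowest longitude range covering all inputs.
     * west > east means the range crosses the dateline.
     */
    static double[] minimalLonRange(double... lons) {
        double[] sorted = lons.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        // Start with the wrap-around gap between the last and first longitude.
        double largestGap = sorted[0] + 360 - sorted[n - 1];
        int gapEnd = 0; // index of the longitude just east of the largest gap
        for (int i = 1; i < n; i++) {
            double gap = sorted[i] - sorted[i - 1];
            if (gap > largestGap) {
                largestGap = gap;
                gapEnd = i;
            }
        }
        double west = sorted[gapEnd];
        double east = sorted[(gapEnd + n - 1) % n];
        return new double[] {west, east};
    }
}
```

For longitudes 170, 175 and -170 this yields a 20-degree box across the dateline instead of a 340-degree box spanning most of the globe.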
Topic: Some Pending Spatial TODOs
Spatial4j
Geo3D integration (a JTS alternative)
Lucene
FlexPrefixTree: LUCENE-4922
Multi-dimensional BKD: LUCENE-6825
SpatialStrategy adapters for GeoPointField, etc.
Solr
Better spatial Solr QParsers: SOLR-4242
GeoJSON parsing
More FieldType adapters for latest Lucene spatial
DateRangeField faceting
Nearest-neighbor search
Well, 2015 isn’t over yet. :-)
That’s all for now; thanks for coming!
Need Lucene/Solr guidance or custom development?
Contact me!
Email: [email protected]
LinkedIn: http://www.linkedin.com/in/davidwsmiley
G+: +DavidSmiley
Twitter: @DavidWSmiley