APACHE SLING & FRIENDS TECH MEETUP BERLIN, 23-25 SEPTEMBER 2013 Scaling Search in Oak with Solr Tommaso Teofili
Scaling search in Oak with Solr

Sep 08, 2014

Tommaso Teofili

 
Page 1: Scaling search in Oak with Solr

APACHE SLING & FRIENDS TECH MEETUP BERLIN, 23-25 SEPTEMBER 2013

Scaling Search in Oak with Solr Tommaso Teofili

Page 2: Scaling search in Oak with Solr

adaptTo() 2013 2

A look back ...

Page 3: Scaling search in Oak with Solr

Scaling Search in Oak with Solr

§  Following up on last year’s “Oak / Solr integration” session

Page 4: Scaling search in Oak with Solr

Looking at last year’s agenda

§  What we have:
    §  Search Oak content with Solr independently
    §  Oak running with a Solr index
        §  Embedded
        §  Remote
§  What we miss:
    §  Solr based MicroKernel

Page 5: Scaling search in Oak with Solr

Apache Jackrabbit Oak

§  Scalable content repository
§  JCR compatibility (to some degree)
§  Performance especially for concurrent access
§  Scalability for huge repositories (> 100M nodes)
§  Support managed environments (e.g. OSGi)
§  Cloud deployments

Page 6: Scaling search in Oak with Solr

Apache Solr

§  Enterprise search platform
§  Based on Apache Lucene
§  Full text, faceting, highlighting, etc.
§  Dynamic cluster architecture
§  HTTP API
§  Latest release 4.4.0

Page 7: Scaling search in Oak with Solr

Why Apache Solr

§  Distributed, fault tolerant indexing / searching

§  Highly configurable
    §  without touching the repository

§  Customizable

Page 8: Scaling search in Oak with Solr

How it works ...

Page 9: Scaling search in Oak with Solr

IndexEditor API

§  an Editor receives info about changes to the content tree
§  the Editor evaluates the state before and after a specific Oak commit
§  can reject or accept the changes (possibly even modifying the tree itself)
§  an Editor is executed right before an Oak commit is persisted
§  the SolrIndexHook maps changes between states in the content tree to Solr documents and sends them to the Solr instance(s)
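
The before / after comparison described on this slide can be illustrated with a minimal, self-contained sketch. It assumes a toy `Node` type and string-encoded index operations (`ADD`, `UPDATE`, `DELETE`); these names are hypothetical and are not Oak's `Editor` or `NodeState` types. A real index editor would build Solr documents rather than strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of an index editor; none of these types are Oak classes. */
final class IndexDiffSketch {

    /** Minimal tree node: string properties plus named children. */
    static final class Node {
        final Map<String, String> properties = new TreeMap<>();
        final Map<String, Node> children = new TreeMap<>();

        Node property(String name, String value) {
            properties.put(name, value);
            return this;
        }

        Node child(String name, Node node) {
            children.put(name, node);
            return this;
        }
    }

    /** Compares the tree before and after a commit and lists index operations. */
    static List<String> diff(Node before, Node after) {
        List<String> ops = new ArrayList<>();
        collect("/", before, after, ops);
        return ops;
    }

    // 'before' is null when the node at 'path' was added by the commit.
    private static void collect(String path, Node before, Node after, List<String> ops) {
        if (before == null) {
            ops.add("ADD " + path + " " + after.properties);
        } else if (!before.properties.equals(after.properties)) {
            ops.add("UPDATE " + path + " " + after.properties);
        }
        // Added or changed children.
        for (Map.Entry<String, Node> entry : after.children.entrySet()) {
            Node old = before == null ? null : before.children.get(entry.getKey());
            collect(childPath(path, entry.getKey()), old, entry.getValue(), ops);
        }
        // Children that existed before but are gone after the commit.
        if (before != null) {
            for (String name : before.children.keySet()) {
                if (!after.children.containsKey(name)) {
                    ops.add("DELETE " + childPath(path, name));
                }
            }
        }
    }

    private static String childPath(String parent, String name) {
        return parent.equals("/") ? "/" + name : parent + "/" + name;
    }
}
```

Calling `IndexDiffSketch.diff(before, after)` returns the operations in traversal order, which is roughly the information an index editor needs in order to add, update, or delete documents in the index.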

Page 10: Scaling search in Oak with Solr

IndexEditor API – creating content

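    // State of the tree before the change
    NodeState before = builder.getNodeState();
    // Create a new node with one property
    builder.child("newnode").setProperty("prop", "val");
    // State of the tree after the change
    NodeState after = builder.getNodeState();
    // Wrap the Solr index editor in a commit hook
    EditorHook hook = new EditorHook(new SolrIndexEditorProvider(…));
    // Run the hook over the difference between the two states
    NodeState indexed = hook.processCommit(before, after);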

Page 11: Scaling search in Oak with Solr

QueryIndex API

§  evaluate the query (Filter) cost for each available index to find the “best”
§  execute the query (Filter) against a specific revision and root node state
    §  internally the Filter is usually mapped to the underlying implementation counterpart
    §  improved support for full text queries in Oak
§  optionally view the query “plan”
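
The cost-based choice of the “best” index can be sketched as follows. This is a minimal illustration under stated assumptions: `CostedIndex`, `selectBest`, and the two helper factories are hypothetical names, the filter is reduced to a plain string, and an infinite cost is assumed to mean "this index cannot answer the filter". It is not Oak's `QueryIndex` interface.

```java
import java.util.List;

/** Hypothetical stand-in for a query index; not Oak's QueryIndex interface. */
interface CostedIndex {
    String name();

    /** Estimated cost for the filter; POSITIVE_INFINITY means "cannot answer". */
    double cost(String filter);
}

final class IndexSelectionSketch {

    /** Returns the cheapest index for the filter, or null if none can answer it. */
    static CostedIndex selectBest(List<CostedIndex> indexes, String filter) {
        CostedIndex best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (CostedIndex index : indexes) {
            double cost = index.cost(filter);
            if (cost < bestCost) {
                bestCost = cost;
                best = index;
            }
        }
        return best;
    }

    /** Index that answers every filter at a constant cost. */
    static CostedIndex fixed(final String name, final double cost) {
        return new CostedIndex() {
            @Override public String name() { return name; }
            @Override public double cost(String filter) { return cost; }
        };
    }

    /** Index that only answers filters containing a full text condition. */
    static CostedIndex fullTextOnly(final String name, final double cost) {
        return new CostedIndex() {
            @Override public String name() { return name; }
            @Override public double cost(String filter) {
                return filter.contains("contains(") ? cost : Double.POSITIVE_INFINITY;
            }
        };
    }
}
```

With a cheap full-text-only index and an expensive catch-all index registered, a full text filter is routed to the former and any other filter falls back to the latter.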

Page 12: Scaling search in Oak with Solr

QueryIndex API

§  mapping Filter restrictions:
    §  Property restrictions:
        §  Each property is mapped as a field
            –  Can use term queries for simple value matching or range queries for “first to last”
    §  Path restrictions
        –  indexed as strings and with special fields for parent / children / descendant matching
    §  Full text expressions
        –  use (E)DisMax query parser and/or fallback fields
    §  NodeType restrictions TBD
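
As a rough illustration of the mapping above, the sketch below turns property and path restrictions into Solr query strings using standard Lucene/Solr term and range syntax. The class, the method names, and the field names `path_parent`, `path_child`, and `path_descendant` are assumptions made for this example. The actual field names depend on the Solr schema and the Oak configuration in use, and property names containing query syntax characters (such as `:`) would need additional escaping.

```java
/** Hypothetical mapping from filter restrictions to Solr query strings. */
final class RestrictionMappingSketch {

    // Assumed field names; the real ones depend on the Solr schema in use.
    static final String PATH_PARENT = "path_parent";
    static final String PATH_CHILD = "path_child";
    static final String PATH_DESCENDANT = "path_descendant";

    enum PathMatch { PARENT, CHILDREN, DESCENDANTS }

    /** Wraps a value in double quotes, escaping backslashes and quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Simple value matching becomes a term query on the property's field. */
    static String propertyEquals(String property, String value) {
        return property + ":" + quote(value);
    }

    /** A "first to last" restriction becomes a range query; null means open-ended. */
    static String propertyRange(String property, String first, String last) {
        String lower = first == null ? "*" : quote(first);
        String upper = last == null ? "*" : quote(last);
        return property + ":[" + lower + " TO " + upper + "]";
    }

    /** Path restrictions hit a dedicated field chosen by the kind of match. */
    static String path(PathMatch match, String path) {
        switch (match) {
            case PARENT:
                return PATH_PARENT + ":" + quote(path);
            case CHILDREN:
                return PATH_CHILD + ":" + quote(path);
            case DESCENDANTS:
                return PATH_DESCENDANT + ":" + quote(path);
            default:
                throw new IllegalArgumentException("Unsupported match: " + match);
        }
    }
}
```

For example, `propertyRange("size", "10", null)` yields an open-ended range on the `size` field, and `path(PathMatch.CHILDREN, "/content")` targets whichever field was populated for child matching at indexing time.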

Page 13: Scaling search in Oak with Solr

Configuring the Solr index in Oak

§  create an index configuration under
    §  /oak:index/solrIdx
        §  jcr:primaryType = oak:queryIndexDefinition
        §  type = solr
            –  plus some mandatory props (e.g. reindex)
§  Additional properties if you want to run an embedded Solr server (more on this later)

Page 14: Scaling search in Oak with Solr

Configuring the Solr index in Oak

§  Pluggable Solr server providers and configuration
    §  to allow different deployment scenarios
    §  to allow custom configuration

Page 15: Scaling search in Oak with Solr

Oak Solr core bundle

§  provides basic API and implementation to index and search Oak content on Solr
    §  Solr implementation of IndexEditor
    §  Solr implementation of QueryIndex
§  allows configurable mapping between
    §  property types and fields
        §  e.g. all binaries should be indexed in a specific field
    §  filter restrictions and fields
        §  e.g. path restrictions for children should hit a certain field
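
One way to picture the configurable mapping between property types and fields is the small sketch below. All names (`FieldMappingSketch`, `PropertyType`, the field `binary_content`) are hypothetical and chosen for illustration only. The fallback to the property's own name is an assumption based on the "each property is mapped as a field" rule from the QueryIndex slide.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, configurable mapping from property types to Solr fields. */
final class FieldMappingSketch {

    enum PropertyType { STRING, LONG, DATE, BINARY }

    private final Map<PropertyType, String> fieldsByType = new EnumMap<>(PropertyType.class);

    /** Registers a dedicated field for all properties of the given type. */
    FieldMappingSketch map(PropertyType type, String field) {
        fieldsByType.put(type, field);
        return this;
    }

    /** Returns the configured field for the type, else the property name itself. */
    String fieldFor(String propertyName, PropertyType type) {
        String field = fieldsByType.get(type);
        return field != null ? field : propertyName;
    }
}
```

With `new FieldMappingSketch().map(PropertyType.BINARY, "binary_content")`, every binary property is routed to `binary_content`, while unmapped types keep using a field named after the property.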

Page 16: Scaling search in Oak with Solr

Oak Solr Embedded bundle

§  provides support for indexing and searching on an embedded Solr instance
    §  running inside the Oak repository
§  configuration can be done
    §  via the repository
        –  stored in the index definition node
    §  via OSGi

Page 17: Scaling search in Oak with Solr

Oak Solr Remote bundle

§  provides support for indexing and searching on remote Solr instances
    §  single Solr instance
    §  distributed / replicated Solr cluster
    §  SolrCloud deployments
§  configuration is done via OSGi

Page 18: Scaling search in Oak with Solr

OSGi platform running on Oak with Solr

§  Start instance with Oak repository
§  Add a bunch of bundles for Solr
    –  oak-solr-core, oak-solr-remote, zookeeper, servicemix.bundles.solr-solrj, etc.
§  Configure the Solr instance
§  Configure oak-solr-remote providers via OSGi

Page 19: Scaling search in Oak with Solr

See it in action ...

Page 20: Scaling search in Oak with Solr

Solr index populated with Oak content

Page 21: Scaling search in Oak with Solr

What needs to be done

§  Easy OSGi deployment
§  Move common configuration stuff in oak-solr-core
§  Leverage new full text expression API
§  Solr MK?