Sitecore and Solr Ian Mariano and Steven Zhao NorthPoint Digital

Sitecore and Solr - Meetupfiles.meetup.com/10427732/Sitecore and Solr.pdf · • Sitecore and Solr ... Sitecore/.NET developer by day and rookie dad by night Sometimes it is the other

Mar 19, 2020


dariahiddleston

Sitecore and Solr
Ian Mariano and Steven Zhao

NorthPoint Digital


Agenda

• Who We Are
• Solr Overview
• Sitecore and Solr
  • Setup
  • Nuances
  • Use Cases and Demos
• Scaling Solr
• Q & A


Who We Are


NorthPoint

• Based in NYC with offices in BOS and PHL
• Agile solutions for Financial and Digital markets
• Digital
  • Delivering scalable content solutions
  • Focus on business outcomes and solution platforms
  • Technology Agnostic
  • Open Source, Java/Mobile, .NET, Big Data practices

We Lead with Experience


NorthPoint
Some of Our Clients


NorthPoint

Ian Mariano
Project Manager - Digital
20+ years operating in the intersection of technology and man
Colleague of instigators, storytellers and purveyors of fine design
@ianmariano

Steven Zhao
Senior Consultant - Digital
Sitecore/.NET developer by day and rookie dad by night
Sometimes it is the other way around
@stevenzhaonps


Solr Overview

• Open source search based on Apache Lucene
  • Extensible schema
  • Expanded query language
  • Faceted search and filtering
  • Extensible caching
  • Highly scalable and available
  • Index external data sources
  • Expanded update formats
  • Rich document processing
  • Multiple search collections

http://lucene.apache.org/solr/


Sitecore and Solr


Sitecore Solr Setup

• Set Up Solr
  • Download and configure
  • Local / Jetty for development
  • Use Tomcat / Glassfish / other servlet container for production
    • Use dedicated servers (instances)
    • Think about security
• Create an initial itembuckets collection for Sitecore native


Sitecore Solr Setup

Solr Setup Demo


Sitecore Solr Setup

• Set Up Sitecore to Use Solr
  • Download the appropriate Solr support package from SDN
  • Generate and install your schema.xml into the itembuckets collection
  • Configure the Solr endpoint
  • If needed, choose your IoC container
  • Re-index


Sitecore Solr Setup


Sitecore Solr Setup

Sitecore Solr Configuration Demo


Sitecore Solr Nuances

• Item buckets and content editor search
• What about missing field(s) from query results?
• What are dynamic fields?
• What are computed fields?


Sitecore Solr Use Cases

• General search
  • Faceting
  • Autocomplete
  • Boosting
• Search external sites
• File crawling
• Big data / external data


Sitecore/Solr General Search

• Searching can be across any specific field
• If no fields are provided, then the default field is used (configurable)
• General queries
• Pagination
• Sorting
• Filtering
• Return only specific fields


Sitecore/Solr General Search
Querying

q=stent

    SolrQueryResults<Content> r = solr.Query(new SolrQuery("stent"));

q=title:stent AND summary:aorta

    SolrQueryResults<Content> r = solr.Query(
        new SolrQueryByField("title", "stent") &&
        new SolrQueryByField("summary", "aorta"));

q=title:stent OR title:aorta

    SolrQueryResults<Content> r = solr.Query(
        new SolrQueryByField("title", "stent") ||
        new SolrQueryByField("title", "aorta"));
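
The raw `q` values on this slide can also be sent to Solr over plain HTTP. The following is a minimal Python sketch using only the standard library; it builds the request URLs without sending them. The host, port, and the helper name `select_url` are assumptions for illustration, and `itembuckets` is the collection named earlier in the deck.

```python
from urllib.parse import urlencode

# Assumed local Solr address; adjust to your own host and collection.
BASE = "http://localhost:8983/solr/itembuckets"

def select_url(base, params):
    """Build a /select URL from (name, value) pairs (hypothetical helper)."""
    return f"{base}/select?{urlencode(params)}"

simple = select_url(BASE, [("q", "stent")])
both = select_url(BASE, [("q", "title:stent AND summary:aorta")])
either = select_url(BASE, [("q", "title:stent OR title:aorta")])

for url in (simple, both, either):
    print(url)
```

Note that `urlencode` percent-encodes the colon and turns spaces into `+`, so the readable query syntax never has to be escaped by hand.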


Sitecore/Solr General Search
Pagination

start=x&rows=y (zero-based)

    new QueryOptions { Start = x, Rows = y }

Sorting

sort=field1 asc, field2 asc, …

    queryOptions.AddOrder(new SortOrder("field", Order.ASC));
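
Because `start` is a zero-based document offset rather than a page number, a small conversion step is usually needed. This is a sketch in Python under that assumption; the helper name `page_params` and the field names in the sort list are made up for illustration.

```python
from urllib.parse import urlencode

def page_params(page, page_size, sort_fields):
    """Return start/rows/sort parameters for a zero-based page index."""
    sort = ", ".join(f"{name} {direction}" for name, direction in sort_fields)
    return [("start", page * page_size), ("rows", page_size), ("sort", sort)]

# Third page (index 2) of ten results, sorted by two illustrative fields.
params = [("q", "stent")] + page_params(2, 10, [("title", "asc"), ("id", "asc")])
query_string = urlencode(params)
print(query_string)
```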


Sitecore/Solr General Search
Filtering

fq=type:news&fq=category:us

    queryOptions.FilterQueries = new ISolrQuery[]
    {
        new SolrQueryByField("type", "news"),
        new SolrQueryByField("category", "us")
    };

fq=+type:news +category:us

    queryOptions.FilterQueries = new ISolrQuery[]
    {
        new SolrQueryByField("type", "news") &&
        new SolrQueryByField("category", "us")
    };
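
The two filter styles differ on the wire: the first sends two `fq` parameters, the second sends one. A short Python sketch of both follows; the `*:*` main query is an assumption added only so the request is complete.

```python
from urllib.parse import urlencode

# Two separate fq parameters: Solr caches each filter independently.
separate = urlencode([("q", "*:*"), ("fq", "type:news"), ("fq", "category:us")])

# One combined fq: the pair is cached as a single filter.
combined = urlencode([("q", "*:*"), ("fq", "+type:news +category:us")])

print(separate)
print(combined)
```

The literal `+` operators must be percent-encoded as `%2B`, since an unencoded `+` in a query string is read as a space. `urlencode` handles that automatically.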


Sitecore/Solr General Search
Returning Specific Fields

fl=id title summary

    new QueryOptions { Fields = new[] { "id", "title", "summary" } }

fl=* score

    new QueryOptions { Fields = new[] { "*", "score" } }
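
As a rough illustration of the `fl` parameter itself, the sketch below joins a list of field names into the space-separated form shown on the slide. The helper `field_list` is hypothetical.

```python
from urllib.parse import urlencode

def field_list(fields, with_score=False):
    """Join field names into an fl value, optionally adding the score."""
    names = list(fields) + (["score"] if with_score else [])
    return " ".join(names)

narrow = urlencode({"q": "stent", "fl": field_list(["id", "title", "summary"])})
everything = field_list(["*"], with_score=True)
print(narrow)
print(everything)
```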


Solr Faceted Search

• Directed search like Amazon / Zappos
• Have a faceting strategy in line with your content strategy
• Facet by
  • Field value
  • Field range
  • Subqueries


Solr Faceted Search

• Field value (content containing Lincoln, faceting on category and author counts)
  • q=lincoln
  • facet=true
  • facet.field=category
  • facet.field=author
  • facet.mincount=1

    ISolrOperations<President> solr =
        ServiceLocator.Current.GetInstance<ISolrOperations<President>>();
    SolrQueryResults<President> results = solr.Query(
        new SolrQuery("lincoln"),
        new QueryOptions
        {
            Facet = new FacetParameters
            {
                Queries = new[]
                {
                    new SolrFacetFieldQuery("category"),
                    new SolrFacetFieldQuery("author")
                },
                MinCount = 1
            }
        });
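
When the same facet request is made over HTTP with JSON output, Solr by default returns each field's counts as a flat list that alternates value and count. The sketch below pairs them up in Python. The response fragment is made up for illustration; only its shape is meant to be representative.

```python
def facet_counts(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))

# Made-up fragment shaped like the facet_counts part of a JSON response.
response = {
    "facet_counts": {
        "facet_fields": {
            "category": ["biography", 12, "history", 7],
            "author": ["Sandburg", 3],
        }
    }
}

by_field = {
    name: facet_counts(flat)
    for name, flat in response["facet_counts"]["facet_fields"].items()
}
print(by_field)
```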


Solr Faceted Search

• Field range (matching products, faceting on price from 0 to 1000)
  • q=headphones
  • facet=true
  • facet.range=price
  • facet.range.start=0
  • facet.range.end=1000
  • facet.range.gap is also required by Solr; the slide gives no value

The SolrNet call counts the whole 0 to 1000 span with a single facet query (the equivalent of facet.query=price:[0 TO 1000]) rather than with facet.range:

    ISolrOperations<Product> solr =
        ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
    SolrQueryResults<Product> results = solr.Query(
        new SolrQuery("headphones"),
        new QueryOptions
        {
            Facet = new FacetParameters
            {
                Queries = new[]
                {
                    new SolrFacetQuery(
                        new SolrQueryByRange<decimal>("price", 0m, 1000m))
                },
                MinCount = 1
            }
        });
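
For the raw `facet.range` form, Solr also expects a `facet.range.gap` that sets the bucket width. The Python sketch below adds one and lists the bucket lower bounds that would result. The gap of 250 is purely an assumption for illustration, as are the helper names.

```python
from urllib.parse import urlencode

def range_facet_params(field, start, end, gap):
    """Parameters for one range facet (hypothetical helper)."""
    return [
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
    ]

def bucket_starts(start, end, gap):
    """Lower bound of each bucket between start and end."""
    return list(range(start, end, gap))

params = [("q", "headphones")] + range_facet_params("price", 0, 1000, 250)
query_string = urlencode(params)
print(query_string)
print(bucket_starts(0, 1000, 250))
```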


Solr Faceted Search

• Subqueries (matching products, faceting on specific price ranges)
  • q=headphones
  • facet=true
  • facet.query=price:[0 TO 99]
  • facet.query=price:[100 TO 199]
  • facet.query=price:[200 TO *]

    ISolrOperations<Product> solr =
        ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
    SolrQueryResults<Product> results = solr.Query(
        new SolrQuery("headphones"),
        new QueryOptions
        {
            Facet = new FacetParameters
            {
                Queries = new[]
                {
                    new SolrFacetQuery(
                        new SolrQueryByRange<decimal>("price", 0m, 99m)),
                    new SolrFacetQuery(
                        new SolrQueryByRange<decimal>("price", 100m, 199m)),
                    new SolrFacetQuery(
                        new SolrQueryByRange<string>("price", "200", "*"))
                },
                MinCount = 1
            }
        });
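
The three `facet.query` values follow a simple pattern, so they can be generated from a list of band boundaries. This Python sketch reproduces the slide's bands; the helper `price_bands` is hypothetical and assumes whole-number boundaries.

```python
def price_bands(bounds):
    """Build one facet.query per band; the last band is open-ended."""
    queries = []
    for low, high in zip(bounds, bounds[1:]):
        queries.append(("facet.query", f"price:[{low} TO {high - 1}]"))
    queries.append(("facet.query", f"price:[{bounds[-1]} TO *]"))
    return queries

bands = price_bands([0, 100, 200])
for name, value in bands:
    print(f"{name}={value}")
```

Inclusive whole-number bands like these skip fractional prices such as 99.50. If that matters, an exclusive upper bound such as `price:[0 TO 100}` may be a better fit on Solr versions that support mixed brackets.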


Solr Autocomplete

• Via faceting or limited fields
• Define a field type for autocompletion
  • Choose a tokenizer (whitespace)
  • Filter to lowercase to normalize queries
  • Filter using EdgeNGramFilterFactory to match word beginnings
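
To see why this analysis chain matches word beginnings, the following Python sketch imitates it: split on whitespace, lowercase, then emit the leading substrings of each token. It is a simulation for illustration only, not Solr's implementation; the `min_gram` and `max_gram` arguments stand in for the filter's minGramSize and maxGramSize settings, and the default values here are assumptions.

```python
def edge_ngrams(text, min_gram=1, max_gram=15):
    """Imitate whitespace tokenizing, lowercasing, and edge n-gramming."""
    grams = []
    for token in text.lower().split():
        longest = min(max_gram, len(token))
        for size in range(min_gram, longest + 1):
            grams.append(token[:size])
    return grams

indexed = edge_ngrams("Home Audio", min_gram=2, max_gram=4)
print(indexed)
```

A user typing "hom" or "au" then matches one of the indexed grams exactly, with no wildcard needed at query time.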


Solr Autocomplete

• Example: autocomplete list for a category facet
  • q=*:*
  • rows=0
  • facet=true
  • facet.field=category
  • facet.mincount=1
  • facet.limit=5
  • facet.prefix=home

    ISolrOperations<Product> solr =
        ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
    SolrQueryResults<Product> results = solr.Query(
        SolrQuery.All,
        new QueryOptions
        {
            Facet = new FacetParameters
            {
                Queries = new[]
                {
                    new SolrFacetFieldQuery("category")
                },
                MinCount = 1,
                Limit = 5,
                Prefix = "home"
            },
            Rows = 0
        });
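
The effect of `facet.prefix`, `facet.mincount`, and `facet.limit` together can be imitated on a plain dictionary of counts. The Python sketch below is a simulation under the assumption that results are ordered by count, highest first; the category values and counts are made up.

```python
def suggest(counts, prefix, limit=5, mincount=1):
    """Imitate prefix-filtered facet suggestions, most frequent first."""
    matches = [
        (value, count)
        for value, count in counts.items()
        if value.startswith(prefix) and count >= mincount
    ]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [value for value, _ in matches[:limit]]

# Made-up category counts for illustration.
counts = {"home audio": 40, "home theater": 55, "headphones": 90, "home office": 0}
suggestions = suggest(counts, "home")
print(suggestions)
```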


Solr Autocomplete

• Example: autocomplete list for a category field
  • q=category:*headph*
  • rows=0

    ISolrOperations<Product> solr =
        ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
    SolrQueryResults<Product> results = solr.Query(
        // Quoted = false keeps SolrNet from escaping the wildcards
        new SolrQueryByField("category", "*" + searchTerm + "*") { Quoted = false });

• For the UX
  • jQuery UI or alternate autocomplete facility
  • AJAX call to web service that executes the query (direct or proxied?)
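
If the search term comes straight from a text box, its query-syntax characters should be escaped before the wildcards are added. The following Python sketch shows one way to do that for a single-word term. The function names are hypothetical, and lowercasing assumes the field is lowercased at index time.

```python
import re

# Characters and operators with special meaning in the Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

def escape_term(text):
    """Backslash-escape query syntax characters in user input."""
    return _SPECIAL.sub(r"\\\1", text)

def contains_query(field, term):
    """Build a field:*term* query from a single-word, untrusted term."""
    return f"{field}:*{escape_term(term.lower())}*"

print(contains_query("category", "Headph"))
print(contains_query("category", "hi-fi"))
```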


Solr Search Boosting

• Why would you?
  • Content / marketing strategy
  • Tailored user search results
• Why not?
  • Skewed results
• Boost Using
  • Queries: q, dismax, or edismax
  • Boost functions in queries
  • Or static boosting (solrconfig.xml)
    • Document elevation
    • Static boosting request handler


Solr Search Boosting

• Simple "q" boosting
  • Each boost weights its clause's contribution to the total score
  • q=features:video^10+text:video^2

    SolrQueryResults<Product> results = solr.Query(
        new SolrQuery("features:video").Boost(10) +
        new SolrQuery("text:video").Boost(2));
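
The `+` between the two clauses in the slide's `q` value is a URL-encoded space, not the "required" operator. The Python sketch below builds the readable form and shows what it becomes once encoded; the helper `boosted` is hypothetical.

```python
from urllib.parse import urlencode

def boosted(clauses):
    """Join (field, term, boost) triples into a boosted query string."""
    return " ".join(f"{field}:{term}^{boost}" for field, term, boost in clauses)

q = boosted([("features", "video", 10), ("text", "video", 2)])
encoded = urlencode({"q": q})
print(q)
print(encoded)
```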


Solr Search Boosting

• dismax (disjunction max)
  • Search is executed across multiple fields with different relevance weights
  • A document's score is the maximum of its per-field scores (plus an optional tie-breaker), which gives more control over ranking
  • q=video
  • defType=dismax
  • qf=features^10 text^2

    SolrQueryResults<Product> results = solr.Query(
        new SolrQuery("video"),
        new QueryOptions
        {
            ExtraParams = new Dictionary<string, string>
            {
                { "defType", "dismax" },
                { "qf", "features^10 text^2" }
            }
        });
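
The "disjunction max" idea can be worked through with plain numbers. In this Python sketch the per-field scores are made up, and `tie` plays the role of dismax's tie-breaker parameter: at 0 only the best field counts, and at 1 the fields are simply summed.

```python
def dismax_score(field_scores, tie=0.0):
    """Best field score plus a tie-weighted share of the other fields."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

# Made-up scores: the 'features' match scored 3.0, the 'text' match 1.0.
pure_max = dismax_score([3.0, 1.0])
with_tie = dismax_score([3.0, 1.0], tie=0.5)
summed = dismax_score([3.0, 1.0], tie=1.0)
print(pure_max, with_tie, summed)
```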


Solr Search Boosting

• edismax (extended disjunction max - more features like full lucene parser, and/or, not, …)
  • q=video OR streaming
  • defType=edismax
  • qf=features^20 text^2
  • bq=category:portable^5

    SolrQueryResults<Product> results = solr.Query(
        new SolrQuery("video OR streaming"),
        new QueryOptions
        {
            ExtraParams = new Dictionary<string, string>
            {
                { "defType", "edismax" },
                { "qf", "features^20 text^2" },
                { "bq", "category:portable^5" }
            }
        });
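
For reference, the same edismax parameters can be assembled for a plain HTTP request. This Python sketch encodes them and decodes them again to confirm nothing is lost in the round trip.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "q": "video OR streaming",
    "defType": "edismax",
    "qf": "features^20 text^2",    # per-field weights
    "bq": "category:portable^5",   # extra boost query
}
query_string = urlencode(params)
decoded = parse_qs(query_string)
print(query_string)
```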


Sitecore Solr Use Cases

• General search
  • Faceting
  • Autocomplete
  • Boosting
• Search external sites
• File crawling
• Big data / external data


Search External Sites

• Search of other web properties you own
• Search partner web properties
• Shared Solr publish (Other CMS's Publish Indexes)
• Crawling External Sites (Like a Search Engine)
  • Custom scheduled Sitecore crawler
  • FileDataSource / HttpDataSource (DataImportHandler)
  • Nutch


Solr File Crawling

• Got a File Repository?
• Powered by Apache Tika
• Push files via POST to Solr
  • Enable Solr extracting request handler (solrconfig.xml)
• File List Entity Processor (DataImportHandler)
  • REST API /solr/collection/dataimport?command=…
• Use Nutch
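
Pushing a file to the extracting request handler amounts to a POST with the file bytes as the body. The Python sketch below only constructs such a request and does not send it. It assumes the handler is mapped at `/update/extract`; the collection name `files`, the document id, and the placeholder bytes are made up.

```python
from urllib.parse import urlencode
from urllib.request import Request

def extract_request(base, doc_id, payload, content_type):
    """Build a POST that hands raw file bytes to Solr for Tika extraction."""
    query = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(
        f"{base}/update/extract?{query}",
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )

# Hypothetical collection and placeholder bytes standing in for a real PDF.
req = extract_request(
    "http://localhost:8983/solr/files", "doc-1", b"%PDF-1.4 ...", "application/pdf"
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually submit it; that step is left out so the sketch runs without a Solr server.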


Big Data / External Data

• Expose searchable data to other applications
• Push to Solr
  • REST API
• Pull from Solr
  • DataImportHandler
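
Both routes come down to simple HTTP calls. The Python sketch below builds, without sending, a JSON update request for the push route and a DataImportHandler command URL for the pull route. The collection name `opendata`, the sample document, and the helper names are assumptions; `status` is used as an example command.

```python
import json
from urllib.request import Request

BASE = "http://localhost:8983/solr/opendata"  # hypothetical collection

def update_request(base, docs):
    """Build a POST that pushes a list of documents to Solr as JSON."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def dataimport_url(base, command):
    """URL that asks the DataImportHandler to run a command."""
    return f"{base}/dataimport?command={command}"

docs = [{"id": "1", "title": "Sample record"}]  # made-up document
push = update_request(BASE, docs)
pull = dataimport_url(BASE, "status")
print(push.full_url)
print(pull)
```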


Big Data / External Data

DataImportHandler Demo


Scaling Solr


Scaling Solr
General Tuning

• Debug Your Queries
  • debug=true
  • debug=timing
  • debug=query
  • debug=results
• Cache Configuration / When Not To Cache
  • Some filters aren’t good cache candidates (full dates with seconds, spatial)
  • fq={!cache=false}date:…
  • fq={!cache=false}location:…
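
As a sketch of how these tuning parameters combine in one request, the Python below marks a single filter as uncached and asks for timing information. The date range is a made-up example of a filter that changes on every request, and `uncached` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode

def uncached(filter_query):
    """Prefix a filter with the local parameter that skips the filter cache."""
    return "{!cache=false}" + filter_query

params = [
    ("q", "stent"),
    ("debug", "timing"),
    ("fq", "type:news"),                              # reusable, worth caching
    ("fq", uncached("date:[NOW-1HOUR TO NOW]")),      # differs on every request
]
query_string = urlencode(params)
print(query_string)
```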


Scaling Solr
SolrCloud

• High Availability
  • Replication
• Core vs. Collection
  • Core is instance
  • Collection spans Cores
• Sharding
  • Automatic vs. Custom
  • Better to split as they grow


Scaling Solr
SolrCloud

• Distributed Queries
  • Can limit to specific shards
    • shards=host1:port,host2:port
  • Limitations
    • Grouping component’s group.truncate & group.func are not supported
    • Unique key must be unique across shards
    • Elevation not supported
    • More like this not supported
    • Changed documents may yield false positive matches
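
Two of these points lend themselves to a quick check before querying: building the `shards` value and confirming that no unique key appears on more than one shard. The Python sketch below does both on made-up data; the shard addresses, ids, and helper names are assumptions.

```python
from collections import Counter

def shards_param(addresses):
    """Join shard addresses into the comma-separated shards value."""
    return ",".join(addresses)

def duplicated_keys(ids_by_shard):
    """Return unique keys that appear on more than one shard."""
    seen = Counter(key for ids in ids_by_shard.values() for key in set(ids))
    return sorted(key for key, shard_count in seen.items() if shard_count > 1)

# Made-up shard addresses and document ids.
ids_by_shard = {"host1:8983": ["a", "b"], "host2:8983": ["b", "c"]}
shards = shards_param(ids_by_shard.keys())
clashes = duplicated_keys(ids_by_shard)
print(shards)
print(clashes)
```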


Q & A


Thank You!

NorthPoint
http://www.northps.com @northps

Ian Mariano @ianmariano
Steven Zhao @stevenzhaonps

Links

SDN: http://sdn.sitecore.net
Nutch: http://nutch.apache.org
Solr: http://lucene.apache.org/solr and http://wiki.apache.org/solr/
Solr 4 Cookbook: http://www.ebooks-it.net/ebook/apache-solr-4-cookbook
NYC Open Data: https://nycopendata.socrata.com
US Open Data: http://www.data.gov