Jan 26, 2015



Binesh Gummadi

A 2009 presentation which I just found in archives
Apache Solr

Enterprise search platform from the Apache Lucene project

What is Solr?

● Search server
● Built upon Apache Lucene
● Very fast
● Scalable in both query load and collection size
● Interoperable
● Extensible
● Lucene power exposed over HTTP
● Spell checking, highlighting, faceting, etc.
● Caching
● Replication
● Distributed search

How stuff works?

● Field types
  ○ <fieldType name="text" class="solr.TextField" indexed="true" />
● Fields
  ○ <field name="technologies" type="text" indexed="true" stored="true" multiValued="true"/>
● Unique key (optional)
  ○ <uniqueKey>id</uniqueKey>
● Copy fields
  ○ <copyField source="developers" dest="df"/>
● Dynamic fields
  ○ <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
● Similarity configuration
  ○ Similarity is the scoring routine for each document vs. a query

● Lucene indexing parameters
  ○ <mergeFactor>10</mergeFactor>
  ○ <ramBufferSizeMB>32</ramBufferSizeMB>
● Cache settings
  ○ <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="
● Request handler configuration
  ○ <requestHandler name="dismax" class="solr.SearchHandler" >
● HTTP cache settings
  ○ <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
● Search components, response writers, query parsers
  ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  ○ <queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>
  ○ <queryParser name="lucene" class=""/>

Request Handler

<requestHandler name="/itas" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="v.template">browse</str>
    <str name=""></str>
    <str name="title">Solritas</str>
    <str name="wt">velocity</str>
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="facet">on</str>
    <str name="facet.field">df</str>
    <str name="facet.mincount">1</str>
    <str name="hl">true</str>
    <str name="hl.fl">developers</str>
    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
  </lst>
</requestHandler>

Response Writer

● A Response Writer generates the formatted response of a search.

● The wt parameter selects the Response Writer to be used

● json, php, phps, python, ruby, xml, xslt, velocity

<queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>

Analyzers, Tokenizers, Filters

● The Analyzer class is a native Lucene concept that determines how tokens are produced from a piece of text

<fieldType name="nametext" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
</fieldType>

● The job of a tokenizer is to break up a stream of text into tokens

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>

● A filter looks at each token in the stream sequentially and decides whether to pass it along, replace it, or discard it
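
The tokenizer-then-filter flow can be illustrated with a minimal plain-Java sketch. It does not use the Lucene or Solr classes; the class name AnalysisSketch, its methods, and the stop-word list are assumptions made up for illustration. It only mimics the idea: a tokenizer splits text into tokens, then a filter visits each token in order and either replaces it (lowercasing) or discards it (stop words).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; real analysis is done by Lucene tokenizers and filters.
public class AnalysisSketch {

    // Tokenizer step: break a stream of text into tokens on whitespace.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter step: visit each token in order, replace it with its lowercase
    // form, and discard it if it is a stop word.
    public static List<String> filter(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String lowered = t.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lowered)) {
                out.add(lowered);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        // prints [power, lucene]
        System.out.println(filter(tokenize("The Power of Lucene"), stopWords));
    }
}
```

In Solr itself the same chain is declared in schema.xml, as in the fieldType definitions on this slide, rather than written by hand.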

Other features

● Highlighting
  ○ &hl=true&hl.fl=developers
● Synonyms
  ○ <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
● Spell check
  ○ The spell check component can return a list of alternative spelling suggestions.
  ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
● Content Streams
  ○ Allows the Solr server to fetch local or remote data itself. Remote streaming must be enabled in solrconfig.xml.
● Solr Cell
  ○ Leveraging Tika, extracts and indexes rich documents such as Word, PDF, HTML, and many other types
● More like this


Indexing with solrJ

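import java.net.URL;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Connect to the Solr instance over HTTP
SolrServer solr = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));

// Build a document and send it to the index
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "EXAMPLEDOC01");
doc.addField("title", "NOVAJUG SolrJ Example");
solr.add(doc);

solr.commit();   // after a batch, not per document
solr.optimize(); // periodically, if/when needed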

Data Import Handler

● Indexes relational database, XML data, and e-mail sources

● Supports full and incremental/delta indexing
● Highly extensible with custom data sources, transformers, etc.

● Master is polled
● Replicant pulls Lucene index and optionally also Solr configuration files
● Query throughput scaling: replicate and load balance
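
The "replicate and load balance" idea can be sketched on the client side as a simple round-robin over replicant URLs. This is only an illustration under stated assumptions: the class ReplicaRoundRobin and the host names are hypothetical, and a real deployment would more likely put a hardware or software load balancer in front of the replicants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out replicant base URLs in rotation.
public class ReplicaRoundRobin {
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger(0);

    public ReplicaRoundRobin(List<String> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("at least one replica is required");
        }
        this.replicas = new ArrayList<>(replicas); // defensive copy
    }

    // Returns the base URL the next query should be sent to.
    public String nextReplica() {
        // floorMod keeps the index non-negative even if the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }

    public static void main(String[] args) {
        // Host names are made up for the example.
        ReplicaRoundRobin rr = new ReplicaRoundRobin(Arrays.asList(
                "http://replica1:8983/solr",
                "http://replica2:8983/solr"));
        for (int q = 0; q < 3; q++) {
            System.out.println(rr.nextReplica());
        }
    }
}
```

Queries go to the replicants chosen this way, while index updates still go only to the master.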

● Download Solr
● Start Solr
  ○ cd <solr_home>/example
  ○ java -jar start.jar
● Post documents
  ○ cd <solr_home>/example/exampledocs
  ○ java -jar post.jar *.xml
  ○ java -jar post.jar cw.xml
● Access Solr
  ○ http://localhost:8983/solr/admin/
● Querying Solr
  ○ http://localhost:8983/solr/select/?q=binesh
  ○ http://localhost:8983/solr/select/?q=binny
  ○ http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
  ○ http://localhost:8983/solr/itas/
● Luke
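
Query URLs like the ones on this slide can also be assembled in code, which helps once the search terms contain spaces or other characters that need URL encoding. The following is a minimal sketch using only the JDK; the class SelectUrlSketch and its build method are hypothetical names, not part of Solr or SolrJ, and a Map cannot repeat a parameter such as facet.field.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a /select/ URL from query parameters.
public class SelectUrlSketch {

    public static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append("/select/?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required on every JVM, so this should not happen
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "binesh");
        params.put("facet", "true");
        params.put("facet.field", "df");
        params.put("facet.mincount", "1");
        // prints http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
        System.out.println(build("http://localhost:8983/solr", params));
    }
}
```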

Liferay + Solr: Motivation

● Centralizing search index in clustered Liferay environment

● Performance improvement
  ○ Re-indexing costs too much for large DBs
  ○ Oftentimes, indexes of Liferay deployments in a cluster are not


Liferay + Solr: Configuration 1

Install Solr (

Setting up environment variables
● SOLR_HOME = /${solr installed folder}
● JAVA_OPTS = "$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME/example/solr/data"

solr.xml
● Place the file under ${tomcat}/conf/Catalina/localhost/ with the following content

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="$SOLR_HOME/apache-solr-1.4.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$SOLR_HOME" override="true" />
</Context>

Liferay + Solr: Configuration 2

schema.xml
● This file tells Solr how to index the data coming from Liferay, and can be customized for your installation.
● Copy this file from the solr-web plugin to $SOLR_HOME/conf (you may have to create the conf directory) in your Solr home folder.

...
<fields>
  <field name="comments" type="text" indexed="true" stored="true" />
  <field name="content" type="text" indexed="true" stored="true" />
  <field name="description" type="text" indexed="true" stored="true" />
  <field name="name" type="text" indexed="true" stored="true" />
  <field name="properties" type="text" indexed="true" stored="true" />
  <field name="title" type="text" indexed="true" stored="true" />
  <field name="uid" type="string" indexed="true" stored="true" />
  <field name="url" type="text" indexed="true" stored="true" />
  <field name="userName" type="text" indexed="true" stored="true" />
  <field name="version" type="text" indexed="true" stored="true" />
  <dynamicField name="*" type="string" indexed="true" stored="true" />
</fields>
<uniqueKey>uid</uniqueKey>
<defaultSearchField>content</defaultSearchField>
...
<copyField source="comments" dest="content"/>
...
...

Liferay + Solr: Configuration 3

Copy WAR file
● Copy the WAR file $SOLR_HOME/dist/apache-solr-${solr.version}.war into $SOLR_HOME/example, where ${solr.version} represents the Solr version number, e.g., 1.4.0.

Start Liferay/Tomcat
● Solr will be picked up and "solr" will be deployed automatically under the ${tomcat}/webapps folder

Install solr-web Liferay plugin
● Latest Liferay plugin can be checked out from the following location
● Build the checked out plugin and deploy it

Liferay + Solr: Configuration 4

Final Step
● We need to rebuild Liferay search indexes
● Control Panel > Server Administration

Liferay + Solr: How it works

...
<bean id="solrServer" class="">
  <constructor-arg type="java.lang.String" value="http://localhost:8080/solr" />
</bean>
<bean id="indexSearcher.solr" class="">
  <property name="solrServer" ref="solrServer" />
</bean>
<bean id="indexWriter.solr" class="">
  <property name="commit" value="true" />
  <property name="solrServer" ref="solrServer" />
</bean>
...

solr-spring.xml (from solr-web plugin)

Liferay + Solr: Back to the default?

● Simply undeploy the solr-web plugin
● Rebuild search indexes using the control panel, as described in the previous step