Full text search in Rails with Sunspot and Solr
Maurício Linhares
@mauriciojr - http://codeshooter.wordpress.com/

This material is provided under a Creative Commons Licence - http://creativecommons.org/licenses/by-nc-sa/3.0/

Full text search in Rails with Sunspot and Solr
Starting the engines
  Listing 1 - sunspot.yml
  Listing 2 - create_base_tables.rb
  Listing 3 - category.rb
  Listing 4 - product.rb
Searching
  Listing 4 - products_controller.rb
  Listing 5 - sunspot_hack.rb
Indexing
  Image 1 - Solr schema browser
  Image 2 - Viewing the analysis and search filters
  Image 3 - Solr analyzer page
Customizing fields
  Listing 6 - solr/conf/schema.xml excerpt
  Listing 7 - solr/conf/schema.xml excerpt
  Image 4 - Solr analyzer page
Partial matching
  Listing 8 - solr/conf/schema.xml excerpt
  Image 5 - Analyzer output with partial matching enabled
Faceting
  Listing 9 - products_controller.rb excerpt
  Listing 10 - product.rb excerpt
  Listing 11 - products/index.html.haml excerpt
  Image 6 - Faceting information
Conclusion

Full text search in Rails with Sunspot and Solr

Everyone wants their database to run everything as fast as possible. We usually say query less, add more caching mechanisms, add indexes to the columns being searched, but another solution is not to use the database at all and look for better tools for your querying needs.

When querying for text in our databases, we're often doing LIKE searches. LIKE searches are only performant if we have an index on that field and the query is written in a way that lets the index be used. Imagine that you have a field name and it contains the text "Battlestar Galactica". This query would be able to run and use the index:

    SELECT p.* FROM products p WHERE p.name LIKE 'Battlestar%'

The database would be able to optimize this query and use the index to find the expected row. But what if the query was like this one:

    SELECT p.* FROM products p WHERE p.name LIKE '%Galactica'

Database indexes usually match from left to right, so, unless you have a nasty trick up your sleeve, this query will just look at ALL the rows in the products table and perform a match on every name column before returning a result.
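To make the left-to-right point concrete, here is a small Ruby sketch. It is not how a database implements its indexes: a sorted array simply stands in for the index on name, and the sample names and both method names are made up for this illustration.

```ruby
# A sorted array stands in for the index on products.name (an assumption
# made for illustration; real database indexes are more elaborate).
NAMES = [
  "Galactica",
  "Caprica",
  "Battlestar Pegasus",
  "Alien",
  "Battlestar Galactica"
].sort.freeze

# Prefix match: binary search jumps to the first candidate, then we walk
# forward only while the prefix still matches.
def prefix_lookup(sorted_names, prefix)
  start = sorted_names.bsearch_index { |name| name >= prefix }
  return [] if start.nil?
  sorted_names.drop(start).take_while { |name| name.start_with?(prefix) }
end

# Suffix match: the sort order does not help, so every entry is inspected.
def suffix_scan(names, suffix)
  names.select { |name| name.end_with?(suffix) }
end

puts prefix_lookup(NAMES, "Battlestar").inspect
puts suffix_scan(NAMES, "Galactica").inspect
```

The prefix lookup only touches the entries around the first candidate, while the suffix scan has to look at every entry no matter how the array is sorted.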

And that's Really Bad News for you, as the DBA will probably come for you holding a morning star to beat you badly. So, querying with LIKE when what you need is full text search isn't nice.

That's where full text search based solutions come in to help. Tools like Solr allow you to perform optimized text searches, input filtering, categorization and even features like Google's "Did you mean?".

In this tutorial you'll learn how to add full text searching capabilities to your Rails application using Sunspot and Solr. We will also delve a little bit into Solr's configuration and learn how to use specific tokenizers to clean up input, perform partial matching of words and facet results.

This project uses Rails 3 and Ruby 1.9.2. You'll find a Gemfile and an .rvmrc with all dependencies declared, so it should be pretty easy to follow along or set up your environment based on them (if you're not using RVM, this is a GREAT time to learn it).

You can possibly follow this tutorial with a previous Rails version and without Bundler or RVM, given that all models and most of the code will look exactly the same in Rails 2, and Sunspot is compatible with Rails 2 too.

The source code for this application is available on GitHub - https://github.com/mauricio/sunspot_tutorial

Starting the engines

Download the Sunspot source code from GitHub - https://github.com/outoftime/sunspot

Enter the project folder and go to sunspot/solr-1.3. Inside that folder you should see a solr folder; copy it into your project's folder. This is where the general Solr configuration is going to live. Don't worry about these files just yet, we'll get to them later in this tutorial.

Now create a sunspot.yml file under your project's config folder. Here's a sample:

Listing 1 - sunspot.yml

    development:
      solr:
        hostname: localhost
        port: 8980
        log_level: INFO
      auto_commit_after_delete_request: true

    test:
      solr:
        hostname: localhost
        port: 8981
        log_level: OFF

    production:
      solr:
        hostname: localhost
        port: 8982
        log_level: WARNING
      auto_commit_after_request: true

You can have different configurations for every environment you're running. To see all configuration options, go to the Sunspot source code and head to the sunspot_rails/lib/sunspot/rails/configuration.rb file.
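If you want to double check what a given environment resolves to, the per-environment layout can be read with plain Ruby. This is only a sketch: it parses an inline sample with the standard YAML library instead of going through Sunspot, and the solr_url helper is a made-up name, not part of Sunspot's API.

```ruby
require "yaml"

# Inline sample mirroring two environments from the sunspot.yml listing.
SAMPLE_CONFIG = <<~YAML
  development:
    solr:
      hostname: localhost
      port: 8980
  test:
    solr:
      hostname: localhost
      port: 8981
YAML

# Hypothetical helper: builds the base Solr URL for one environment.
def solr_url(config, environment)
  solr = config.fetch(environment).fetch("solr")
  "http://#{solr.fetch('hostname')}:#{solr.fetch('port')}/solr"
end

config = YAML.safe_load(SAMPLE_CONFIG)
puts solr_url(config, "development")
puts solr_url(config, "test")
```

Using fetch instead of [] makes a typo in an environment or key name fail loudly instead of quietly returning nil.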

Now we'll create two models, Product and Category, so let's start by creating the migration that will set them up:

    rails g migration create_base_tables

Listing 2 - create_base_tables.rb

    class CreateBaseTables < ActiveRecord::Migration

      def self.up
        create_table :categories do |t|
          t.string :name, :null => false
        end

        create_table :products do |t|
          t.string :name, :null => false
          t.decimal :price, :scale => 2, :precision => 16, :null => false
          t.text :description
          t.integer :category_id, :null => false
        end

        add_index :products, :category_id
      end

      def self.down
        drop_table :categories
        drop_table :products
      end

    end

Now we move on to the basic models, starting with the Category model:

Listing 3 - category.rb

    class Category < ActiveRecord::Base

      has_many :products

      validates_presence_of :name
      validates_uniqueness_of :name, :allow_blank => true

      searchable :auto_index => true, :auto_remove => true do
        text :name
      end

      def to_s
        self.name
      end

    end

Here in the Category class we see our first reference to Sunspot, the searchable method, where we configure the fields that should be indexed by Solr. In the Category class there's only one field that's useful at this moment, the name, so we tell Sunspot to configure the name field to be indexed as text (you usually don't want your text indexed as string, as it will only be a hit on a full match).

The :auto_index and :auto_remove options are there to let Sunspot automatically send your model to be indexed at Solr when it is created/updated/destroyed. When either value is false you have to manually send your data to Solr, and unless you really want to do that, you should keep both of these values as true in your models.

Now let's look at the Product class:

Listing 4 - product.rb

    class Product < ActiveRecord::Base

      belongs_to :category

      validates_presence_of :name, :description, :category_id, :price
      validates_uniqueness_of :name, :allow_blank => true

      searchable :auto_index => true, :auto_remove => true do
        text :name, :boost => 2.0
        text :description
        float :price
        integer :category_id
      end

      def to_s
        self.name
      end

    end

In our Product class things are a little bit different: we have more fields (and more kinds of fields) being indexed. float and integer are pretty self-explanatory, but the name field has some black magic floating around, with the boost parameter. Boosting a field when indexing means that if the match is in that specific field, it has more relevance than if found somewhere else.

Imagine that you're looking for Iron Maiden's "Powerslave" album. You go to Iron Maiden's online store and search for "powerslave", hoping that the album will be the first hit, but then you see "Live After Death" before "Powerslave". Why did it happen? The "Live After Death" album contains the "Powerslave" song in its track listing, so it's a match as much as the real "Powerslave" album. What we need here is to tell the search tool that if a match is on an album name, it has higher relevance than if the hit is in the track listing.

Boosting allows you to reduce these issues. Some fields are inherently more important than others and you can tell that to Solr by configuring a :boost value for them. When something matches on them, the relevance of that match will be increased and it should come up before the other results in the search.
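The album example can be played out in a few lines of plain Ruby. Solr's real relevance scoring is a lot more involved than this; the sketch below just adds a fixed weight per matching field, and the Album struct, the weights and the method names are all assumptions made for the illustration.

```ruby
Album = Struct.new(:name, :tracks)

# Assumed weights for this sketch: a hit on the name counts double.
FIELD_BOOSTS = { :name => 2.0, :tracks => 1.0 }.freeze

# Adds up the boost of every field that contains the search term.
def score(album, term)
  needle = term.downcase
  total = 0.0
  total += FIELD_BOOSTS[:name] if album.name.downcase.include?(needle)
  total += FIELD_BOOSTS[:tracks] if album.tracks.any? { |track| track.downcase.include?(needle) }
  total
end

# Returns the names of the matching albums, best score first.
def ranked_names(albums, term)
  scored = albums.map { |album| [album, score(album, term)] }
  scored.select { |_, points| points > 0 }
        .sort_by { |_, points| -points }
        .map { |album, _| album.name }
end

ALBUMS = [
  Album.new("Live After Death", ["Aces High", "Powerslave"]),
  Album.new("Powerslave", ["Aces High", "Powerslave"])
].freeze

puts ranked_names(ALBUMS, "powerslave").inspect
```

With both weights set to 1.0 the two albums would tie on the track listing alone, which is exactly the situation boosting is meant to break.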

Searching

Now let's take a look at the ProductsController to see how we perform the search:

Listing 4 - products_controller.rb

    class ProductsController < ApplicationController

      def index
        @products = if params[:q].blank?
          Product.all :order => 'name ASC'
        else
          Product.solr_search do |s|
            s.keywords params[:q]
          end
        end
      end

    end

As you can see, searching is quite simple: you just call the solr_search method and send in the text to be searched for. One thing that I don't like about Sunspot is that searches do not return an Array-like object. You get a Sunspot::Search::StandardSearch object that has, as a property, the results array which contains the records returned by the search.

Here's a simple way to fix this issue (I usually place the contents of this file inside an initializer in config/initializers):

Listing 5 - sunspot_hack.rb

    ::Sunspot::Search::StandardSearch.class_eval do

      include Enumerable

      delegate(:current_page, :per_page, :total_entries, :total_pages,
               :offset, :previous_page, :next_page, :out_of_bounds?,
               :each, :in_groups_of, :blank?, :[],
               :to => :results)

    end

This simple monkeypatch makes the search object itself behave like an Enumerable/Array, so you can navigate directly in the results without having to call the results method. The methods usually used by will_paginate helpers are also included, so you can pass this object to a will_paginate call in your view and it's just going to work.
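If you would like to see the idea in isolation, the same trick works on a plain Ruby object. The sketch below uses Forwardable from the standard library instead of ActiveSupport's delegate, and FakeSearch is a made-up stand-in for the search object, not a Sunspot class.

```ruby
require "forwardable"

# Hypothetical stand-in for a search object that exposes a results array.
class FakeSearch
  include Enumerable
  extend Forwardable

  attr_reader :results

  # Forward the array-like calls straight to the results array.
  # Enumerable builds map, select, first and friends on top of each.
  def_delegators :results, :each, :[], :size, :empty?

  def initialize(results)
    @results = results
  end
end

search = FakeSearch.new(["Battlestar Galactica", "Caprica"])
puts search.first
puts search[1]
puts search.map { |name| name.upcase }.inspect
```

Only each has to be forwarded for the Enumerable methods to work; [] and the others are forwarded because Enumerable does not provide them.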

Indexing

Now that all the models are in place, we can start fine tuning the Solr indexing process. The first thing to understand here is what happens when you send text to be indexed by Solr. Let's get into the tool, starting the server:

    rake sunspot:solr:run

This rake task starts Solr in the foreground (if you wanted to start it in the background, you'd use sunspot:solr:start). With Solr running, you should add some data to the database. This tutorial's project on GitHub contains a seed.rb file with some basic data for testing; just copy it over to your project.

Also copy lib/tasks/db.rake from the tutorial project to your project. It contains a db:prepare task that truncates the database, seeds it and then indexes all items in Solr, and we're going to be reindexing data a lot.

With everything copied, run the db:prepare task:

    rake db:prepare

This will add the categories and products to your database and also index them in Solr. If this task ran successfully, head to the Solr administration interface at this URL:

http://localhost:8980/solr/admin/schema.jsp

Once you're there, click on FIELDS, then on NAME_TEXT. You should see a screen just like the one in Image 1:

Image 1 - Solr schema browser

If you don't see all the fields that are available in this image, your rake db:prepare command has probably failed or Solr wasn't running when you called it.

What we see here is the information about the fields we're indexing. This specific field contains all data from the name properties of both the Category and Product classes, as you can notice from the top 10 terms.

The name field is not indexed by its full content, as a relational database would usually do. The text is broken into tokens by the solr.StandardTokenizerFactory class in Solr. This class receives our text, like "Battlestar Galactica: The Boardgame", and turns it into:

    ["Battlestar", "Galactica", "The", "Boardgame"]

This is what gets indexed and, ultimately, searched by Solr. If you open the web application now and try to search for "battle", you won't have any matches. If you search for "Battlestar", you get the two products that match the name.
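Here is a rough Ruby imitation of that behaviour. The regular expression is only an approximation of what the standard tokenizer does, not its actual rules, and tokenize and token_match? are names made up for this sketch. The downcase calls mirror the lowercase filter that this Solr setup applies.

```ruby
# Rough approximation of a tokenizer: keep runs of letters and digits,
# drop everything else (spaces, punctuation).
def tokenize(text)
  text.scan(/[[:alnum:]]+/)
end

# A text matches only when the search term is equal to one of its tokens.
def token_match?(text, term)
  tokenize(text).map { |token| token.downcase }.include?(term.downcase)
end

name = "Battlestar Galactica: The Boardgame"
puts tokenize(name).inspect
puts token_match?(name, "battle")
puts token_match?(name, "Battlestar")
```

The search for "battle" fails because "battle" is not one of the tokens, even though it is the beginning of one.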

Everything when indexing information in Solr revolves around building the best tokens available for your input. You have to teach Solr to crunch your data in a way that makes sense and makes it easy to search, and you do this by adding filters to the indexing process. While on the same page as Image 1 above, click on the DETAILS links as shown in Image 2:

Image 2 - Viewing the analysis and search filters

Each field in Solr has two analyzers: the index analyzer, which prepares the input to be indexed, and the query analyzer, which prepares the search input to finally perform a search. Unless you have some special need, both of them are usually the same.

In our current configuration, we have the same two filters for both of the analyzers. The StandardFilterFactory filter removes punctuation characters from our input (the ":" in "Battlestar Galactica: The Boardgame" is not in our tokens) and the LowerCaseFilterFactory makes all input lowercase, so we can search with "baTTle", "BATTLE" or "BaTtLe" and they're all treated as the same term.

Before we move on to add more filters to our analyzers, let's take a look at the analyzer screen in Solr Admin at - http://localhost:8980/solr/admin/analysis.jsp?highlight=on

In this screen we see how our input is going to be transformed into tokens by the configured analyzers.

Image 3 - Solr analyzer page

In this screen we have selected the name_text field in Solr. In the "Field value (Index)" box you enter the values you're sending to be indexed, just like you would send from your model property; in the "Field value (Query)" box you enter the values you'd use to search.

Once you type and hit "Analyze" you should see the output just below the form, as we see in Image 3. This output shows how your input is transformed into tokens by the tokenizer and filters. This way you can easily experiment by adding more filters and seeing if the output really matches what you'd expect. This analysis view is your best friend when debugging search/indexing related issues or trying out ways to improve the way Solr indexes and matches your data.

Customizing fields

Now that you have an idea about how the indexing and searching processes work, let's start to customize the fields in Solr. Open up the solr/conf/schema.xml file and look for this reference:

Listing 6 - solr/conf/schema.xml excerpt

    <fieldtype class="solr.TextField" positionIncrementGap="100" name="text">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

If you look at Image 1, where we saw the name_text configuration, you'll see that the field type is text. The excerpt above is the configuration for all fields of type text, which means that if we add more filters here we'll affect all fields of this type. This greatly simplifies the way we configure the tool, as we don't have to define explicit configurations for every single field that our models have; we can just reuse this same text config for all fields that are supposed to be indexed as text.

But that's a lot of talking, let's get into action!

Let's start the job by looking at our indexed data from before:

    ["battlestar", "galactica", "the", "boardgame"]

The "the" is mostly useless, as it's going to be present in almost all properties and no one is ever going to search for "the" (oh yeah, there might be that ONE guy that does it). In Information Retrieval lingo, "the" is a stop word: it usually doesn't have meaning by itself and doesn't represent valuable information for our indexer. Removing all stop words from your input improves performance and the relevance of your results.

Given that this is a common operation, Solr already contains a filter that's capable of removing all stop words from your data, the solr.StopFilterFactory. Let's see how we can add it to our config:

Listing 7 - solr/conf/schema.xml excerpt

    <fieldtype class="solr.TextField" positionIncrementGap="100" name="text">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
      </analyzer>
    </fieldtype>

If you look at the solr/conf folder you'll see a stopwords.txt file that already contains most of the common stop words in English. You can add or remove words from there as needed, and if you're not indexing English text you can just remove the English words and add your language's stop words. Now make this change in your solr/conf/schema.xml file, stop and start Solr again and open the analyzer:

Image 4 - Solr analyzer page

As you can see, in the last step, the "the" was removed from both the index input and the query input. We're keeping only the pieces of information that are really useful, which makes our index smaller and also speeds up searching.

While you were not looking, we have also added two other filters. The first is solr.ISOLatin1AccentFilterFactory, which removes accents from words in Latin-based languages, like Portuguese: if the input is "não", it becomes "nao". After that there's solr.TrimFilterFactory, which removes unnecessary spaces from our tokens.
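The whole chain can be imitated in plain Ruby as a list of small steps applied one after the other. This is only a sketch of the idea, not of Solr's implementation: the tiny stop word list, the regular expression used as tokenizer and the analyze method are assumptions made for the example.

```ruby
# A tiny stand-in for stopwords.txt (assumed sample, not the real file).
STOP_WORDS = %w[the a an and of].freeze

# Each filter takes the token list and returns a new token list,
# in the same order as the filters in the schema.
FILTERS = [
  lambda { |tokens| tokens.map { |t| t.downcase } },
  lambda { |tokens| tokens.reject { |t| STOP_WORDS.include?(t) } },
  # Decompose accented letters, then drop the combining accent marks.
  lambda { |tokens| tokens.map { |t| t.unicode_normalize(:nfd).gsub(/\p{Mn}/u, "") } },
  # A no-op with this tokenizer, kept to mirror the trim filter.
  lambda { |tokens| tokens.map { |t| t.strip } }
].freeze

# Tokenizes the text and runs the tokens through every filter in order.
def analyze(text)
  tokens = text.scan(/[[:alnum:]]+/)
  FILTERS.reduce(tokens) { |current, filter| filter.call(current) }
end

puts analyze("Battlestar Galactica: The Boardgame").inspect
puts analyze("N\u00e3o").inspect # "\u00e3" is the "ã" in "Não"
```

The order matters here just like in the schema: the stop word check runs after lowercasing, so "The" and "the" are both removed.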

Partial matching

Another pretty common need is to be able to match only a part of a word, usually a prefix. Earlier in the tutorial, we saw that searching for "battle" doesn't yield any results, while "battlestar" does. This happens because Solr, by default, only sees a match if it's a full match. The word you entered must be exactly the same as a token that's available in the index; if there is no exact match, Solr will tell you that there are no results.

If you look at Lucene's Query Parser Syntax - http://lucene.apache.org/java/2_9_1/queryparsersyntax.html (Solr is somewhat of a web interface to Lucene) you'll see that you can use the "*" operator to perform a partial match. We could then search for "battle*" and this would yield the results we expect, but doing this kind of partial matching is slow and could possibly become a bottleneck for your application, so we have to figure out another way to do this.

When all you need is prefixed partial matching, the solr.EdgeNGramFilterFactory is your best friend. It will break words into pieces that will then be added to the index, so it looks like you have partial matching, but in fact the partials are tokens by themselves in the index. Let's see what our config would look like in this case:

Listing 8 - solr/conf/schema.xml excerpt

    <fieldtype class="solr.TextField" positionIncrementGap="100" name="text">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
      </analyzer>
    </fieldtype>

As you can see, now we have two <analyzer> sections in our <fieldtype>: one of the analyzers is for "index" and the other is for "query". This is needed because we don't want our search parameters to be transformed for a partial match. If the user is searching for "battle", it doesn't make sense to show them results for "bat", so the generation of pieces of each word should be done only when indexing information.
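A toy inverted index in Ruby shows why the pieces should only be generated on the index side. Everything here is an assumption made for the sketch: the two sample names, the hash used as index and the method names. Solr's own data structures are of course different.

```ruby
# Prefixes of a token, from min_size characters up to the whole token.
def prefixes(token, min_size = 3)
  (min_size..token.length).map { |size| token[0, size] }
end

# Tiny inverted index: prefix => names of the documents that contain it.
def build_index(names)
  index = Hash.new { |hash, key| hash[key] = [] }
  names.each do |name|
    name.downcase.split.each do |word|
      prefixes(word).each { |gram| index[gram] << name }
    end
  end
  index
end

INDEX = build_index(["Battlestar Galactica", "Batman Begins"])

# Query side without n-grams: only the literal term is looked up.
def search_plain(index, term)
  index.fetch(term.downcase, []).uniq
end

# What would happen if the query were broken into prefixes as well.
def search_expanded(index, term)
  prefixes(term.downcase).flat_map { |gram| index.fetch(gram, []) }.uniq
end

puts search_plain(INDEX, "battle").inspect
puts search_expanded(INDEX, "battle").inspect
```

The plain search only finds the product that really starts with "battle", while the expanded one also drags in everything that merely shares the "bat" prefix.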

Now restart your Solr instance and run the form we had in the analyzer view again. You should see something like Image 5:

Image 5 - Analyzer output with partial matching enabled

Looking at the output, "battlestar" became:

    ["bat", "batt", "battl", "battle", "battles", "battlest", "battlesta", "battlestar"]

Now, if you search for "battle", you should find all products that have "battle" as a prefix in any of their words, and the search input is not affected by this change.
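The token list is easy to reproduce by hand. The Ruby sketch below imitates what the filter does with the minGramSize and maxGramSize values from the listing; edge_ngrams and its keyword arguments are names made up for this example, not part of any Solr or Sunspot API.

```ruby
# Builds the prefixes of a token whose length is between min_size and
# max_size, which is the idea behind an edge n-gram filter.
def edge_ngrams(token, min_size: 3, max_size: 30)
  longest = [token.length, max_size].min
  (min_size..longest).map { |size| token[0, size] }
end

puts edge_ngrams("battlestar").inspect
puts edge_ngrams("battlestar", max_size: 5).inspect
puts edge_ngrams("of").inspect
```

Note that in this sketch a token shorter than min_size produces no pieces at all, so a very small minimum size means more matches at the price of a bigger index.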

Faceting

Faceting of results is YACF (Yet Another Cool Feature) that you get when using Solr and Sunspot. "What does that mean?", you might ask. It means that Solr is able to organize your results based on one of their properties and tell you how many results matched each property value.
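In plain Ruby terms, a facet is little more than a group-and-count over the matching results. The sketch below is an illustration of the concept only: the Hit struct, the sample hits and facet_counts are made up here, and Solr computes these counts itself instead of looping over records like this.

```ruby
Hit = Struct.new(:name, :category_id)

# Counts how many hits fall under each value of the given field,
# biggest group first (ties broken by the value itself).
def facet_counts(hits, field)
  hits.group_by { |hit| hit[field] }
      .map { |value, group| [value, group.size] }
      .sort_by { |value, count| [-count, value] }
end

HITS = [
  Hit.new("Battlestar Galactica: The Boardgame", 1),
  Hit.new("Battlestar Galactica DVD", 2),
  Hit.new("Battlestar Pegasus Model", 1)
].freeze

facet_counts(HITS, :category_id).each do |category_id, count|
  puts "category #{category_id}: #{count} hits"
end
```

Each pair of value and count is what ends up as one link in a "filter by category" list.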

"I still don't get it", you might be thinking now. In our Product model we're indexing the category_id property. We'll tell Sunspot to facet our search based on the category_id field and Sunspot will tell us how many matches each category had, even if we're paginating the results. Let's see how our searching code would change:

Listing 9 - products_controller.rb excerpt

    def index
      @page = (params[:page] || 1).to_i
      @products = if params[:q].blank?
        Product.paginate :order => 'name ASC', :per_page => 3, :page => @page
      else
        result = Product.solr_search do |s|
          s.keywords params[:q]
          unless params[:category_id].blank?
            s.with( :category_id ).equal_to( params[:category_id].to_i )
          else
            s.facet :category_id
          end
          s.paginate :per_page => 3, :page => @page
        end

        if result.facet( :category_id )
          @facet_rows = result.facet(:category_id).rows
        end

        result
      end
    end

The search code really changed a lot. Now, if there's a category_id parameter we will use that to filter our search; if there isn't, we're going to perform faceting with the s.facet :category_id call. There's also a slight change to the product.rb class, let's see it:

Listing 10 - product.rb excerpt

    searchable :auto_index => true, :auto_remove => true do
      text :name, :boost => 2.0
      text :description
      float :price
      integer :category_id, :references => ::Category
    end

We've added :references => ::Category to the :category_id field configuration so Sunspot knows that this field is, in fact, a foreign key to another object. This allows Sunspot to load the categories in the facets automatically for you.

The result.facet(:category_id) call asks the search object for the facets returned for the :category_id field in this search. Each row in its rows list contains an instance (which, in our case, is a Category object) and a count, that's the number of hits in that specific facet. Once you get your hands on the rows, you can use them in your view. Let's see how we used them:

Listing 11 - products/index.html.haml excerpt

    - if !@facet_rows.blank? && @facet_rows.size > 1
      %ul
        - for row in @facet_rows
          %li= link_to( "#{row.instance} (#{row.count})", products_path( :q => params[:q], :category_id => row.instance ) )

If there are facets available, we use them to add links that let the user filter based on each specific facet. Each row object has an instance and a count, and we use both in the interface to tell the user which category it is and how many hits it had. Here is how our user interface looks:

Image 6 - Faceting information

And now you finally have search functionality added to a Rails project, with partial matching, faceting, pagination and input cleanup. Just forget that you have ever performed a SELECT p.* FROM products p WHERE p.name LIKE '%battle%' and be happy to be using a great full text search solution.

Conclusion

Hopefully this tutorial is enough to get you up and running with Solr. For more advanced features I'd recommend you search the Solr wiki (http://wiki.apache.org/solr/FrontPage) and buy "Solr 1.4 Enterprise Search Server" by David Smiley and Eric Pugh (http://www.amazon.com/gp/product/1847195881?ie=UTF8&tag=ultimaspalavr-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=1847195881).