Page 1: Fulltext search pres

24 August, 2011 • SilverStripe Wellington Meetup • Hamish Friedlander

SilverStripe and Full Text Search

Giving the people what they want

Wednesday, 24 August 2011

Page 2: Fulltext search pres

What we’re covering

• What does search give you

• Three ways to get it

• Built in db backed search

• Sphinx module

• Full text search module

Big topic, not much time

Page 3: Fulltext search pres

What we’re not

• Search result visualization

• Search refinement

• Boost, result pre-calculation, faceting, spell checking, real-time results

• Integrating search with IA

• Measuring search usefulness

• 3rd party modules

But that doesn’t mean they’re not important

Page 4: Fulltext search pres

Why add search?

Page 5: Fulltext search pres

What are you trying to do?

• Most people use navigation by preference

• Stats depend on site, but average 70-95% navigation

• Search is primarily used to locate stuff that’s not obvious how to navigate to

• Deeply nested pages

• Cross-cutting information not provided as a taxonomic structure

• Re-discovering remembered items

• If search doesn’t give immediate results, users fall back to navigation again

Be aware of the goals of your users

Page 6: Fulltext search pres

Getting you there quicker

• Interesting is relative

• Ideally return the page the user is after

• But failing that, at least return a page the user is interested in

• Speed is perception

• Raw speed is rarely noticed (except when it is)

• Ability to understand results is as important as accuracy of results

• A second click is OK, as long as there’s a likely payoff: “did you mean” is fine, disambiguation is OK, paging is useless

To be used, search has to give interesting pages faster than navigation

Page 7: Fulltext search pres

Technology & Tools

Page 8: Fulltext search pres

Database internal full text search

• Most databases come with some full text search built in

• Generally work by adding new indexes to a table column

• Can easily combine full text queries with other filters

• But databases aren’t really designed for it

• Poor query language - no booleans

• Poor language processing

• Limited feature set - no field boost, spell checking, search suggestions, faceting, result fragments, ...

• Sometimes technically costly (MyISAM)

It’s just another index

Page 9: Fulltext search pres

External full text indexers

• Given a schema, and a set of documents, builds an index

• Schema gives both text processing and result relevancy rules

• Different engines either retrieve documents themselves or have documents sent to them

• Indexes might be write-once (rebuild entire index to add changes)

• Gives a language to query those indexes

• Generally query language is engine-specific

Solr, Sphinx, Elastic Search
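
The indexer bullets above can be illustrated with a toy example. This is a minimal sketch in Python, not how Solr, Sphinx or Elastic Search work internally: the schema (per-field boosts), the `analyze` rule and the scoring are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical schema: field name -> relevancy boost.
SCHEMA = {"title": 3.0, "content": 1.0}


def analyze(text):
    """Text processing rule: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(documents, schema=SCHEMA):
    """Build an inverted index: term -> {doc_id: accumulated boost}."""
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, doc in documents.items():
        for field_name, boost in schema.items():
            for term in analyze(doc.get(field_name, "")):
                index[term][doc_id] += boost
    return index


def search(index, query):
    """Rank documents by summed weights of matching terms (OR semantics)."""
    scores = defaultdict(float)
    for term in analyze(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


DOCS = {
    1: {"title": "Sphinx module", "content": "Easy full text search"},
    2: {"title": "Full text search", "content": "Solr backed search toolkit"},
    3: {"title": "About us", "content": "Company history"},
}
INDEX = build_index(DOCS)
print(search(INDEX, "search"))
```

In this sketch adding a document means calling `build_index` again, which mirrors the write-once case: the whole index is rebuilt to pick up changes.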

Page 10: Fulltext search pres

External engines + SilverStripe

• Building schemas is hard, time consuming, and annoying when the model changes

• Can build schemas directly off models

• Effectively free - all the necessary information is already present

• Flexible search - can change form structure without index changes

• Inefficient - includes information you won’t search against

• Or can build schemas off query design

• Needs more thought around design of query up front

• More efficient, leads to some more powerful abilities

A tale of two abstractions
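
The two approaches can be contrasted with a small sketch. It is written in Python for illustration only; `PAGE_MODEL`, `TYPE_MAP` and both functions are hypothetical and are not part of SilverStripe or of either module.

```python
# Hypothetical model definition: field name -> model field type.
PAGE_MODEL = {
    "Title": "Varchar",
    "Content": "HTMLText",
    "MetaDescription": "Text",
    "ShowInMenus": "Boolean",
    "Sort": "Int",
}

# Hypothetical mapping from model field types to index field types.
TYPE_MAP = {
    "Varchar": "text",
    "HTMLText": "text",
    "Text": "text",
    "Boolean": "boolean",
    "Int": "int",
}


def schema_from_model(model):
    """Index every field on the model: no design work, but a bigger index."""
    return {name: TYPE_MAP[kind] for name, kind in model.items()}


def schema_from_query(model, fulltext_fields, filter_fields):
    """Index only the fields the planned queries search or filter on."""
    wanted = list(fulltext_fields) + list(filter_fields)
    return {name: TYPE_MAP[model[name]] for name in wanted}


model_schema = schema_from_model(PAGE_MODEL)
query_schema = schema_from_query(PAGE_MODEL, ["Title", "Content"], ["ShowInMenus"])
print(len(model_schema), len(query_schema))
```

The model-driven schema carries `MetaDescription` and `Sort` even though no query uses them; the query-driven schema is smaller but has to be regenerated whenever the query design changes.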

Page 11: Fulltext search pres

SilverStripe Integration

Page 12: Fulltext search pres

Built-in search

✓ No external dependencies, separate indexes, schema files or setup

- Can only search SiteTree and File objects, and only specific fields

- Quality of results is heavily database dependent

Your database-dependent, barely acceptable default

Page 13: Fulltext search pres

Sphinx module

✓ Very little configuration gives great results on moderately sized sites

✓ Can search any DataObject, but...

- Combining search over multiple DataObjects doesn’t really work

- Limited real-time update support

- No exact match string mode makes filtering tricky

Easy, quality full text search

Page 14: Fulltext search pres

Fulltext search module

✓ Schemas generated from query structure

• More flexible and efficient than generating from model structure

• Closer to how external engines work natively

✓ Eventually multiple search backend support

• Currently: Solr

• In future: Sphinx, Elastic Search, Zend_Lucene

• Not intended to allow code-less swapping of backends

- Currently needs Solr, which is a Java app

• Loves memory, hates empty disk space

Powerful (eventually) search engine independent toolkit

Page 15: Fulltext search pres

Full text search module example

Page 16: Fulltext search pres

Define an index

Schema gets generated from this index

Page 17: Fulltext search pres

Define a form

Standard SilverStripe stuff

Page 18: Fulltext search pres

Build a query & apply to an index

Filters and excludes can be built & nested
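
The code shown on this slide did not survive in the transcript. As a stand-in, here is a minimal Python sketch of the general idea of nesting filter and exclude criteria. It is not the fulltextsearch module's API: the `Query` class, the helper functions, the field names and the Lucene-style output string are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical query object: search text plus filter/exclude criteria."""
    text: str = ""
    filters: list = field(default_factory=list)   # results must match these
    excludes: list = field(default_factory=list)  # results must not match these

    def search(self, text):
        self.text = text
        return self

    def filter(self, criterion):
        self.filters.append(criterion)
        return self

    def exclude(self, criterion):
        self.excludes.append(criterion)
        return self


def eq(field_name, value):
    return ("EQ", field_name, value)


def any_of(*criteria):
    return ("OR", list(criteria))


def all_of(*criteria):
    return ("AND", list(criteria))


def render(criterion):
    """Render one (possibly nested) criterion; values are not escaped here."""
    if criterion[0] == "EQ":
        _, name, value = criterion
        return f'{name}:"{value}"'
    op, children = criterion
    return "(" + f" {op} ".join(render(c) for c in children) + ")"


def to_query_string(query):
    """Combine text, filters (+) and excludes (-) into one query string."""
    parts = [f"+({query.text})"] if query.text else []
    parts += ["+" + render(c) for c in query.filters]
    parts += ["-" + render(c) for c in query.excludes]
    return " ".join(parts)


q = (
    Query()
    .search("full text")
    .filter(eq("ClassName", "Page"))
    .exclude(any_of(eq("ShowInSearch", 0),
                    all_of(eq("Status", "Draft"), eq("Locale", "en_NZ"))))
)
print(to_query_string(q))
```

Keeping criteria as plain nested tuples means the same query object could, in principle, be rendered differently for each backend.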

Page 19: Fulltext search pres

Final thoughts

Page 20: Fulltext search pres

Search without searching

• Looks like navigation, acts like search

• Instant taxonomies

• Deal with inconsistent data

• Encourages exploration

Search engines as fuzzy matchers
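
One possible reading of "looks like navigation, acts like search" is a browse list generated from the values found in the index. The sketch below is in Python; the data, the `normalise` rule and the function names are assumptions for illustration and are not taken from the talk.

```python
from collections import Counter


def normalise(value):
    """Fold case and whitespace so inconsistent data lands in one bucket."""
    return " ".join(value.lower().split())


def facet_counts(documents, field_name):
    """Count documents per normalised field value, most common first."""
    counts = Counter(
        normalise(doc[field_name]) for doc in documents if doc.get(field_name)
    )
    return counts.most_common()


def browse(documents, field_name, value):
    """Handler for a 'navigation' link: really a filtered search."""
    target = normalise(value)
    return [d for d in documents if normalise(d.get(field_name, "")) == target]


EVENTS = [
    {"title": "Meetup", "city": "Wellington"},
    {"title": "Hack day", "city": "wellington "},
    {"title": "Conference", "city": "Auckland"},
]
print(facet_counts(EVENTS, "city"))
```

The counts become the menu and each menu entry runs a search, so the taxonomy appears without anyone curating it by hand.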

Page 21: Fulltext search pres

Links

• https://github.com/silverstripe/silverstripe-sphinx

• https://github.com/silverstripe-labs/silverstripe-fulltextsearch

• http://sphinxsearch.com/

• http://lucene.apache.org/solr/

• http://www.elasticsearch.org/

• https://github.com/nyeholt/silverstripe-solr

• http://code.google.com/p/lucene-silverstripe-plugin/

Modules I’ve covered + some other stuff

Page 22: Fulltext search pres

Thank you!

Hamish Friedlander

Twitter: @hafriedlander

Email: [email protected]
