Solr

Posted on 26-Jun-2015

Solr TechSig 18/4/13 slides.

Solr

What is it?

• Text search index (engine)
• Open source
• Not a search product
• A tool that allows you to create a search solution

What is it like?

• Google, Google Appliance
• FAST
• Oracle Secure Enterprise Search
• etc.

Google Appliance:

• Sucks data in
• Can’t really configure
• Stuck with results
• Bonnet is locked

Solr:

• You need to feed data in
• Highly configurable
• Search results can be tuned
• There is no bonnet

Why am I doing a talk?

• Did a course
• Course material from LucidWorks
• Presented by FindWise
• FindWise are a search specialist that use a range of search engines

Caveats

• Course was in Solr 4.1.0; we use 3.6.1 for APVMA
• Course focussed on search, not ingestion or presentation
• Java API (SolrJ) recommended for ingestion
• ‘Browse’ interface uses Velocity templates for presentation, but probably isn’t good enough for most projects

Where does Solr fit?

Application Architecture

Apache Tika

• Extracts text and metadata from rich documents
• Works with Solr’s data import handler
• Used to be part of Lucene
• Formats: XML, PDF, Word, Excel, etc.
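As a rough sketch of handing a binary file to Tika through Solr, the following builds (but does not send) a request for Solr’s extracting request handler, where Tika runs inside Solr. It assumes the example configuration that maps that handler to `/update/extract`, a Solr 3.x-style base URL of `http://localhost:8983/solr`, and a unique key field called `id`; the document id, payload and content type are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL for a single-core Solr 3.x install (4.x adds a core name, e.g. /solr/collection1).
SOLR_BASE = "http://localhost:8983/solr"

def build_extract_request(doc_id, payload, content_type):
    """Build a POST for the extracting request handler; Tika parses the body server-side."""
    params = {
        "literal.id": doc_id,  # literal.<field> sets a field value directly
        "commit": "true",      # make the document searchable straight away
    }
    url = SOLR_BASE + "/update/extract?" + urlencode(params)
    return Request(url, data=payload, headers={"Content-Type": content_type}, method="POST")

req = build_extract_request("doc1", b"%PDF-1.4 ...", "application/pdf")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running Solr instance; building the request separately keeps the sketch testable without one.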

ManifoldCF

• Apache project
• Connector framework
• Used to connect to content repositories (sources)
• Sources include SharePoint, Documentum, CMIS, JDBC, RSS

Hydra

• From FindWise
• Although Solr supports validation (e.g. ‘required’), don’t use it for data cleanup
• Validation failures are inconvenient: the whole job fails
• Feed in clean data
• Use Hydra for cleanup
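Hydra itself is not shown here. As a generic illustration of the “feed in clean data” advice, this sketch cleans records before they reach Solr: it trims whitespace, drops empty values, and sets aside any record missing a required field, so one bad record cannot fail the whole batch. The field names (`id`, `title`) and the cleanup rules are hypothetical.

```python
# Hypothetical required fields; mirror whatever schema.xml marks required="true".
REQUIRED = ("id", "title")

def clean_record(record):
    """Trim string values and drop empty ones."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value in ("", None):
            continue
        cleaned[key] = value
    return cleaned

def partition(records):
    """Split records into (clean, rejected) so only valid documents are sent to Solr."""
    good, rejected = [], []
    for record in records:
        cleaned = clean_record(record)
        if all(field in cleaned for field in REQUIRED):
            good.append(cleaned)
        else:
            rejected.append(record)  # keep the original for reporting or manual fixing
    return good, rejected

good, rejected = partition([
    {"id": "1", "title": "  Solr basics  ", "author": ""},
    {"id": "2", "title": "   "},
])
print(good)           # [{'id': '1', 'title': 'Solr basics'}]
print(len(rejected))  # 1
```

Rejected records are kept rather than silently dropped, so they can be reported and fixed at the source instead of surfacing as a failed Solr job.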

Apache ZooKeeper

• Used for SolrCloud
• Clustering and sharding
• Solr 4.x only (SolrCloud arrived in Solr 4.0); not available in 3.6.1
• Originally a Hadoop subproject
• Used to manage Hadoop clusters
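To make “sharding” a little more concrete: as we understand it, SolrCloud hashes each document’s unique key and sends the document to the shard whose hash range contains that value, with the range layout kept in the cluster state that ZooKeeper stores. The sketch below mimics that idea by splitting the 32-bit hash space into equal contiguous ranges. It uses `zlib.crc32` purely as a stand-in; SolrCloud uses MurmurHash3, so real shard assignments will differ.

```python
import zlib

HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit value

def shard_ranges(num_shards):
    """Split the 32-bit hash space into contiguous, near-equal ranges."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else start + step - 1
        ranges.append((start, end))
    return ranges

def shard_for(doc_id, ranges):
    """Return the index of the shard whose range contains hash(doc_id)."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in for SolrCloud's MurmurHash3
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash outside all ranges")

ranges = shard_ranges(2)
print(ranges)  # [(0, 2147483647), (2147483648, 4294967295)]
print(shard_for("doc1", ranges))
```

Because assignment depends only on the id and the stored ranges, any node that can read the cluster state can work out where a document lives.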

Inside

General Approach

• Design schema
• Prototyping
• Integration

Design Schema

• A data modelling exercise
• Captured in schema.xml
• Dynamic fields can be useful in the first pass:

<dynamicField name="*" type="string" indexed="true" />
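Once several dynamic fields exist, it can be unclear which definition a given field name picks up. As we understand Solr’s rule, an explicitly declared field wins; otherwise the longest matching dynamic pattern applies (a pattern has a single leading or trailing `*`, and `*` alone matches everything). This sketch models that lookup; the schema below is hypothetical.

```python
# Hypothetical schema: explicit fields plus dynamic patterns (declaration order matters for ties).
FIELDS = {"id": "string", "title": "text"}
DYNAMIC_FIELDS = [("*_i", "int"), ("*_txt", "text"), ("*", "string")]

def matches(pattern, name):
    """A dynamic pattern has a single '*' at the start or the end."""
    if pattern == "*":
        return True
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return False

def field_type(name):
    """Explicit fields win; otherwise the longest matching dynamic pattern is used."""
    if name in FIELDS:
        return FIELDS[name]
    # sorted() is stable, so equal-length patterns keep their declaration order.
    for pattern, ftype in sorted(DYNAMIC_FIELDS, key=lambda p: -len(p[0])):
        if matches(pattern, name):
            return ftype
    return None  # Solr would reject a document containing an undefined field

print(field_type("title"))        # text
print(field_type("pages_i"))      # int
print(field_type("summary_txt"))  # text
print(field_type("anything"))     # string
```

The last case shows why a `*` catch-all helps in a first pass: every unplanned field still gets indexed, as a string, until the schema is designed properly.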

Prototyping

• Get the data in (index): CSV, XML, JSON
• post.jar
• URL to search and inspect raw results
• ‘Browse’ interface allows developers to understand how the search is working
• solrconfig.xml
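Getting data in and inspecting raw results can also be tried without post.jar. This sketch builds an update message in Solr’s XML `<add><doc><field name="...">` format and a query URL for the select handler that asks for JSON output (`wt=json`). It assumes the Solr 3.x example URL `http://localhost:8983/solr` and does not contact a server; the document is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed Solr 3.x example install

def add_xml(docs):
    """Serialise dicts into Solr's XML update format: <add><doc><field name=...>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:  # multi-valued fields repeat the <field> element
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")

def select_url(query, rows=10):
    """URL for the select handler, asking for indented JSON so raw results are easy to read."""
    params = {"q": query, "rows": rows, "wt": "json", "indent": "true"}
    return SOLR_BASE + "/select?" + urlencode(params)

print(add_xml([{"id": "1", "title": "Solr basics", "tags": ["search", "lucene"]}]))
print(select_url("title:solr"))
```

The XML would be POSTed to `/update` with `Content-Type: text/xml`, followed by a commit (for example `?commit=true`), before the select URL returns the new document.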

Integration

• Not covered
• Content ingestion
• Presentation of results
• Up to you…
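Presentation is left to the project, so as one small starting point, this sketch turns a Solr JSON response (`wt=json`) into display lines. It relies only on the standard response shape (`response.numFound`, `response.start`, `response.docs`); the `title` field and the sample response are made up.

```python
import json

def summarise(raw_json, title_field="title"):
    """Turn a Solr JSON search response into simple display lines."""
    response = json.loads(raw_json)["response"]
    lines = [f"{response['numFound']} result(s)"]
    # Number results from the page offset so later pages continue the count.
    for rank, doc in enumerate(response["docs"], start=response.get("start", 0) + 1):
        title = doc.get(title_field, "(untitled)")
        if isinstance(title, list):  # multi-valued fields come back as JSON arrays
            title = title[0] if title else "(untitled)"
        lines.append(f"{rank}. {title} [{doc.get('id', '?')}]")
    return lines

sample = ('{"responseHeader":{"status":0,"QTime":1},'
          '"response":{"numFound":2,"start":0,'
          '"docs":[{"id":"1","title":"Solr basics"},{"id":"2"}]}}')
for line in summarise(sample):
    print(line)
```

A real front end would add highlighting, facets and paging links, but the same two keys (`numFound` and `docs`) are where it starts.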

Demo
