Enterprise Search with ColdFusion Solr Dan Sirucek cf.Objective 2012 May 2012

Feb 03, 2022

Page 1: Enterprise Search with ColdFusion Solr

Enterprise Search with ColdFusion Solr

Dan Sirucek cf.Objective 2012 May 2012

About Me

• Senior Learning Technologist at WellPoint, Inc.
• Developer for 14 years
• Developing in ColdFusion for 8 years
• Started in SQL Server, ASP, ASP.NET, VB.NET
• Also work in Flash Builder/Flex, Java, and C#

Where We’ve Been: Growth and Consolidation

WellPoint, Inc. was formed in 2004 as the result of a merger between Anthem, Inc. and WellPoint Health Networks, creating the nation’s largest health benefits company by membership.

Where We Are: National Scale

1 out of 9 Americans are covered by WellPoint’s affiliated health plans

Note: Provider Network refers to BlueCard® PPO Network

• Nation’s Largest Insurer
  • ~34 million medical members
• Total Revenue
  • Nearly $60 billion
• Provider Network Advantage
  • ~94% Hospitals
  • ~82% Primary Physicians
  • ~84% Specialists
• Blue Licensee
  • 14 states

Agenda

• Problem and Goal
• Why Apache Solr for ColdFusion 9.01
• Solr Multi-core Overview
• Replication Overview
• Installation
• Replication Configuration
• Managing Collections on Multiple Solr Instances
• Extending ColdFusion Solr Schema
• Creating a Custom Search
• Q & A
• Resources

Problem and Goal

• Problem
  • Slow search response
  • Constant corruption issues
  • Verity wasn’t scalable
  • No redundancy
• Goal
  • Improve search response
  • Create an enterprise scalable solution
  • Implement redundancy for high availability
  • Maintain compatibility with <cfsearch /> & <cfindex /> tags

Why Apache Solr for ColdFusion 9.01

• Performance
  • Fast, very fast
  • Optimized for high volume web traffic
• Scalable
  • Distributed searches
  • Replication
• Redundancy
  • Replication supports
    – Master
    – Slave
    – Repeater

Solution Architecture

Technologies Used

• Windows Server 2008 64-bit
• IIS 7.0
• Application Request Routing
• ColdFusion 9.01 Multi-server
• Apache Tomcat 6
  • Master instance
• Apache Solr Standalone Installation for ColdFusion 9.01
  • Slave instances
• Java SE JDK 1.6.0_26 64-bit

Solr Multi-core Overview

• Solr core = ColdFusion collection
• Multiple Cores
  • Single Solr instance
  • Each Solr core has its own configuration and index
  • Unified administration
• Multi-core template
  • A template is used for creating a new core (collection)
  • The template contains a directory structure and the configuration files needed to create a new core
  • Location: SolrInstallationDirectory\multicore\template

Solr Multi-core Template

• conf directory
  • Contains configuration files used when creating a new Solr core
  • Two key files:
    schema.xml
      – Contains the details about which fields your index can contain
      – How those fields should be dealt with when adding documents to the index
      – How those fields should be dealt with when querying those fields
    solrconfig.xml
      – Contains the configuration settings for the Solr core
      – Used to configure replication

Solr Multi-core Template Continued

• conf directory continued
  • Files referenced by schema.xml:
    protwords.txt
      – Words that need protection from stemming
      – e.g. without protection, “maine” is stemmed to “main”
    stopwords.txt
      – Words to not index, e.g. a, an, and
    synonyms.txt
      – Synonym groups, e.g. GB,gib,gigabyte,gigabytes
      – Mappings used for spelling corrections, e.g. hippa => hipaa

Solr Multi-core Template Continued

• conf directory continued • Optional file:

    solrcore.properties
      – User defined properties to be referenced within solrconfig.xml
      – Syntax: Property=Value
      – File is referenced by default when present in conf directory
      – Example:

• data directory
  • Empty directory
  • Solr will create the following directories the first time content is indexed:
    index
    spellindex
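
As a rough illustration of the Property=Value syntax and of how a `${name}` reference in solrconfig.xml resolves, here is a minimal Python sketch. Solr performs this substitution itself; the helper names `parse_properties` and `substitute` are hypothetical, and the property values are made up (the property names MASTER_CORE_URL and POLL_TIME are the ones used later in this deck).

```python
import re

def parse_properties(text):
    """Parse simple Property=Value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        props[name.strip()] = value.strip()
    return props

def substitute(template, props):
    """Replace ${name} placeholders; unknown names are left untouched."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: props.get(m.group(1), m.group(0)),
                  template)

# Hypothetical file content, for illustration only.
sample = """
# solrcore.properties
MASTER_CORE_URL=http://solr-master.example.com:8080/solr
POLL_TIME=00:01:00
"""

props = parse_properties(sample)
print(substitute("${MASTER_CORE_URL}/${solr.core.name}/replication", props))
```

In this sketch `${solr.core.name}` is left as-is because it is not defined in the file; in a real installation Solr supplies it per core.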

Solr Replication Overview

• Replication Features
  • Efficient and automated distribution of index additions, updates, and deletions
  • Pull strategy allows for easy addition of slaves
  • Configurable distribution interval allows tradeoff between timeliness and cache utilization; the interval is set by the slave instance
  • Replication and automatic reloading of configuration files
  • Works over HTTP
  • Works across platforms with same configuration
• Replication Modes
  • Master – optimized for indexing
  • Slave – optimized for searches
  • Repeater – used in WAN to reduce bandwidth between data centers

Solr Replication Considerations & Challenges

• Considerations
  • Replication is not a server level configuration
  • Replication is configured at the Solr core (search collection) level
  • New cores need to be created on all Solr instances
• Challenges
  • Modify the multi-core template to implement replication when new cores are created
  • Automate the creation of a Solr core on all Solr instances
  • Create a consolidated view of cores on all instances

Solr Replication Requirements

• Basic Requirement
  • One master Solr instance
  • One or more slave Solr instances
  • Configuration of replication request handlers on master and slave instances
• Replication Request Handler
  • Configuration is handled in solrconfig.xml
  • Replication is defined by adding a request handler using XML syntax
  • Settings are used to set the properties for the request handler
  • Master and slave instances are both configured using a request handler, but use different attributes to define their roles

Master Replication Request Handler

• Replication request handler with all possible attributes • Screen shot

Required Master Settings

• replicateAfter
  • Configures when replication will be triggered
  • Valid values: startup, commit, optimize
  • If using the startup option, a commit and/or optimize entry is also needed to trigger replication on future commits/optimizes
  • Example:

Recommended Master Settings

• confFiles
  • Used to specify configuration files to be replicated
  • Comma delimited list of files to replicate
  • Can be configured to rename files on replication
    – Syntax: source_file_name.xml:destination_file_name.xml
  • Example:

Optional Master Settings

• backupAfter
  • Configures when a backup will be created
  • Valid values: optimize, startup, commit
• maxNumberOfBackups
  • Maximum number of backups to retain
• commitReserveDuration
  • Default 10 seconds
  • If commits are very frequent and the network is slow, you can tweak this value
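
To make the master settings above concrete, the following Python sketch builds a master replication request handler element of the kind that goes into solrconfig.xml. It is only an illustration: the function name `master_replication_handler` is hypothetical, the file list is an assumed example, and the element layout follows the general shape documented on the Solr replication wiki rather than the exact configuration shown in the talk.

```python
import xml.etree.ElementTree as ET

def master_replication_handler(replicate_after, conf_files):
    """Build a <requestHandler> element with a 'master' settings list."""
    handler = ET.Element("requestHandler",
                         {"name": "/replication",
                          "class": "solr.ReplicationHandler"})
    master = ET.SubElement(handler, "lst", {"name": "master"})
    # One replicateAfter entry per trigger (startup, commit, optimize).
    for event in replicate_after:
        ET.SubElement(master, "str", {"name": "replicateAfter"}).text = event
    # confFiles is a single comma delimited list; source:destination renames.
    ET.SubElement(master, "str", {"name": "confFiles"}).text = ",".join(conf_files)
    return handler

handler = master_replication_handler(
    ["startup", "commit"],
    # Assumed example list; the first entry uses the rename syntax.
    ["solrconfig_slave.xml:solrconfig.xml", "schema.xml", "stopwords.txt"],
)
master_xml = ET.tostring(handler, encoding="unicode")
print(master_xml)
```

Listing both startup and commit reflects the note on the required settings slide: startup alone would not trigger replication on later commits.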

Slave Replication Request Handler

• Slave replication request handler with all possible settings • Add screen shot and high level notes

Required Slave Settings

• Configuration file
  • solrconfig.xml
• masterUrl
  • Sets the URL of the Solr master instance
  • ${solr.core.name} – system variable
• pollInterval
  • Sets the interval at which the slave polls the master for changes
  • Considerations
    – Frequency of updates to index
    – Network bandwidth
    – Acceptable latency

Optional Slave Settings

• httpConnTimeout
  • Sets connection timeout on the underlying HttpConnectionManager
  • Default value 5000ms
• httpReadTimeout
  • Sets timeout when fetching index from master
  • Default value 10000ms
• httpBasicAuthUser
  • Use if basic authentication is enabled on master
• httpBasicAuthPassword
  • Use if basic authentication is enabled on master
• Compression
  • Use only if your bandwidth is low

Slave Replication Configuration Examples

• Basic configuration example

• Using solrcore.properties configuration example
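
The two slave variants named above can be sketched as follows. This Python snippet builds a slave replication request handler twice: once with literal values and once with `${...}` placeholders resolved from solrcore.properties. The function name, the host name, and the `/replication` URL suffix are assumptions for illustration; the property names MASTER_CORE_URL and POLL_TIME come from the template configuration slides.

```python
import xml.etree.ElementTree as ET

def slave_replication_handler(master_url, poll_interval):
    """Build a <requestHandler> element with a 'slave' settings list."""
    handler = ET.Element("requestHandler",
                         {"name": "/replication",
                          "class": "solr.ReplicationHandler"})
    slave = ET.SubElement(handler, "lst", {"name": "slave"})
    ET.SubElement(slave, "str", {"name": "masterUrl"}).text = master_url
    ET.SubElement(slave, "str", {"name": "pollInterval"}).text = poll_interval
    return handler

# Basic configuration: literal values (hypothetical host, hh:mm:ss interval).
basic = slave_replication_handler(
    "http://solr-master.example.com:8080/solr/${solr.core.name}/replication",
    "00:01:00",
)

# Same handler, with values taken from solrcore.properties at load time.
with_props = slave_replication_handler(
    "${MASTER_CORE_URL}/${solr.core.name}/replication",
    "${POLL_TIME}",
)

for element in (basic, with_props):
    print(ET.tostring(element, encoding="unicode"))
```

Keeping the URL and interval in solrcore.properties means the same solrconfig.xml can be reused across cores and environments, with only the properties file changing.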

Slave Solr Installation

• Slave Servers
  • Windows Server 2008 (64-bit, 8 GB RAM)
  • Install Java SE JDK 1.6.0_26 64-bit
    – Note location of installation directory
    – Example: D:\Apps\Java\jdk1.6.0_26
  • Execute Apache Solr Standalone Installation for ColdFusion 9.01 installer
    – Change Java Home from default to: javaInstallationDirectory\jdk1.6.0_26\jre
    – Example: D:\Apps\Java\jdk1.6.0_26\jre

Master Solr Installation

• Master Solr Server
  • Windows Server 2008 (64-bit, 8 GB RAM)
  • Download Java JDK 1.6.0_26 64-bit
  • Download Apache Tomcat 6 32-bit/64-bit Windows Service Installer
  • Execute Java JDK installer
    – Note installation directory
    – Example: E:\Apps\java
  • Execute the Tomcat 6 installer
    – Java JRE: specify the jre in the jdk 1.6.0_26 installation
      Example: E:\Apps\Java\jdk1.6.0_26\jre
    – Select installation directory
      Example: E:\Apps\tomcat6

Master Solr Installation Continued

• Master Solr Installation continued
  • Create a solr directory, e.g. E:\Apps\solr
  • Copy the following from slave installation
    – solr.war to solr directory: installationDirectory\webapps\solr.war
    – Multi-core directory to solr directory: installationDirectory\multicore
• Configure Tomcat service
  • Launch Configure Tomcat
  • Java tab
  • Set initial memory pool
  • Set maximum memory pool

Configure Tomcat for Solr

• Stop Apache Tomcat 6 service
• Create solr context
  • A Context is what Tomcat calls a web application
  • Location: tomcatInstallDir\conf\Catalina\localhost\
  • Create a solr.xml file
  • Edit solr.xml and define Solr context
  • Example:
• Start Apache Tomcat 6 service
• Launch Tomcat 6 - http://127.0.0.1:8080/manager/html
• Navigate to solr application
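
As a sketch of what the Tomcat context file might contain, the Python snippet below generates a Context element that points Tomcat at the solr.war file and sets the solr/home environment entry to the multicore directory. The paths assume the E:\Apps\solr layout from the installation slides, the function name `solr_context` is hypothetical, and the attribute set follows the commonly documented Solr-on-Tomcat pattern rather than the exact file used in the talk.

```python
import xml.etree.ElementTree as ET

def solr_context(war_path, solr_home):
    """Build a Tomcat <Context> element for the Solr web application."""
    context = ET.Element("Context",
                         {"docBase": war_path,
                          "debug": "0",
                          "crossContext": "true"})
    # solr/home tells the Solr webapp where its configuration lives.
    ET.SubElement(context, "Environment",
                  {"name": "solr/home",
                   "type": "java.lang.String",
                   "value": solr_home,
                   "override": "true"})
    return context

# Assumed paths, based on the example directories from the installation steps.
context = solr_context(r"E:\Apps\solr\solr.war", r"E:\Apps\solr\multicore")
context_xml = ET.tostring(context, encoding="unicode")
print(context_xml)
```

The printed element would be saved as solr.xml in tomcatInstallDir\conf\Catalina\localhost\, so that Tomcat deploys the application under the /solr path.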

Tomcat 6 Web Application Manager

Slave Configuration

• Apache Solr for ColdFusion 9.01 runs on the Jetty servlet container
• Jetty Configuration
  • Configuration file location: SolrInstallationDirectory\etc\jetty.xml
  • Connector system properties
    – jetty.port – default = 8983
    – jetty.host – default = not defined
  • Default configuration listens only on 127.0.0.1
  • Add jetty.host system property to the connector setting
    – 0.0.0.0 = listen on all IPs
    – Example:

Slave Jetty Configuration Continued

• Default connector configuration

• After update

Slave Service Configuration

• Service start up configuration
  • Default Java maximum memory setting is 256 MB
    – InstallationDirectory\solr.lax
  • Adjust maximum memory setting -Xmx
  • Add a minimum memory setting -Xms
  • Example:

Master Solr Multi-core Template Configuration

• Create solrcore.properties
  • Create a text file named solrcore.properties in the Solr multicore template directory
  • Add two properties
    – MASTER_CORE_URL=http://masterHostnameUrl:masterPort/solr
    – POLL_TIME=hh:mm:ss
  • Example:
• Create solrconfig_slave.xml
  • Make a copy of solrconfig.xml in the master Solr multicore template directory
  • Name the file solrconfig_slave.xml

Master Solr Multi-core Template Configuration Continued

• Configure solrconfig.xml for replication
  • Add master and slave replication request handlers
  • solrconfig.xml
  • solrconfig_slave.xml

Slave Solr Multi-core Template Configuration

• solrcore.properties
  • Copy solrcore.properties in template/conf directory on master to template/conf directory on slave
• solrconfig.xml
  • Delete solrconfig.xml file in template/conf on slave
  • Copy solrconfig_slave.xml in template/conf directory on master to template/conf directory on slave
  • Rename solrconfig_slave.xml to solrconfig.xml on slave

Creating New Collections

• Collections (cores) need to be created on all Solr instances
• Use Solr API to create new cores
  • REST-like API
  • Create new core parameters
    – action – CREATE
    – name – name of new core
    – instanceDir – directory path for new instance
    – template – directory path for the core template
    – wt – writer type (format of response); options: json, javabin, xml; default = xml
    – version = 1
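
The parameter list above maps onto a single HTTP GET. The Python sketch below assembles such a request URL; it is an illustration under assumptions, not the code from the talk (which used ColdFusion). The `/admin/cores` path, the host name, and the directory values are assumed; the parameter names are the ones listed on this slide.

```python
from urllib.parse import urlencode

def create_core_url(base_url, name, instance_dir, template_dir, wt="xml"):
    """Assemble the core-admin CREATE request URL for one Solr instance."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("instanceDir", instance_dir),
        ("template", template_dir),
        ("wt", wt),        # response format: json, javabin, or xml
        ("version", "1"),
    ]
    # urlencode percent-escapes path separators and other special characters.
    return base_url.rstrip("/") + "/admin/cores?" + urlencode(params)

# Hypothetical instance and directories, for illustration only.
url = create_core_url(
    "http://solr-master.example.com:8080/solr",
    name="hr_policies",
    instance_dir=r"E:\Apps\solr\multicore\hr_policies",
    template_dir=r"E:\Apps\solr\multicore\template",
    wt="json",
)
print(url)
```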

Creating New Collections Code

• In CF create an array of server instances
• Define collection name

Creating New Collections Code Continued

• Loop over server instance array
• Create collection on each instance
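
The ColdFusion code from these two slides is not part of the transcript. The loop it describes can be sketched in Python as below, purely as an illustration: the server list, core name, directory values, and `/admin/cores` path are assumptions, and `fake_fetch` stands in for a live Solr instance so the sketch runs without a network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def http_get_json(url):
    """Fetch a URL and decode the JSON body (expects wt=json)."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

def create_on_all(servers, name, instance_dir, template_dir, fetch=http_get_json):
    """Issue the CREATE action against every instance; collect results per server."""
    query = urlencode({
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
        "template": template_dir,
        "wt": "json",
    })
    results = {}
    for base_url in servers:
        url = base_url.rstrip("/") + "/admin/cores?" + query
        try:
            results[base_url] = fetch(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            results[base_url] = {"error": str(exc)}
    return results

# Hypothetical instances: master on Tomcat (8080), slaves on Jetty (8983).
SERVERS = [
    "http://solr-master.example.com:8080/solr",
    "http://solr-slave1.example.com:8983/solr",
    "http://solr-slave2.example.com:8983/solr",
]

def fake_fetch(url):
    """Stand-in for a live Solr instance so the sketch runs offline."""
    return {"responseHeader": {"status": 0, "QTime": 1}, "core": "hr_policies"}

results = create_on_all(SERVERS, "hr_policies", "hr_policies", "template",
                        fetch=fake_fetch)
for server, result in results.items():
    print(server, result)
```

Recording a result per server, including failures, is one way to get the consolidated view across instances that the replication challenges slide calls for.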

Collection Create Result Struct

• De-serialized file content (cfdump from previous slide)
  • core – collection name
  • responseHeader
    – QTime – query time in milliseconds
    – status
  • saved
    – File path to multicore\solr.xml
    – The multicore\solr.xml file is used to store core names and instance directories
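
A small Python sketch of reading that result structure follows. The sample body is made up to match the keys described above (core, responseHeader with QTime and status, saved); the function name `summarize_core_response` is hypothetical. A status of 0 in responseHeader is how Solr signals success.

```python
import json

def summarize_core_response(body):
    """Pull the fields described on the slide out of a JSON response body."""
    data = json.loads(body)
    header = data.get("responseHeader", {})
    return {
        "core": data.get("core"),
        "succeeded": header.get("status") == 0,
        "qtime_ms": header.get("QTime"),
        "saved": data.get("saved"),
    }

# Made-up sample body, shaped like the struct described above.
sample_body = json.dumps({
    "responseHeader": {"status": 0, "QTime": 187},
    "core": "hr_policies",
    "saved": r"E:\Apps\solr\multicore\solr.xml",
})

summary = summarize_core_response(sample_body)
print(summary)
```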

Solr Admin Master Replication

• Core admin
  • Navigate to Replication
• Replication admin
  • Index version
  • Location
  • Size

Solr Admin Slave Replication

• Core admin
  • Navigate to Replication
• Replication admin
  • Master
  • Poll Interval
  • Local Index Version & location
  • Replication status
• Controls
  • Disable Poll
  • Replicate Now

Deleting Collections

• Collections (cores) should be deleted from all Solr instances
• Use Solr API to delete cores
  • Delete core parameters
    – action – UNLOAD
    – core – name of core to delete
    – wt – writer type (format of response); options: json, javabin, xml; default = xml
    – version = 1

Delete Collections Code

• Loop over server instance array
• Delete collection on each instance
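
The delete loop mirrors the create loop. The following Python sketch is again only an illustration of the idea (the original code was ColdFusion and is not in the transcript): the host names and the `/admin/cores` path are assumed, and `recording_fetch` is a stand-in that records the URLs instead of calling a live server.

```python
from urllib.parse import urlencode

def unload_core_url(base_url, core, wt="json"):
    """Assemble the core-admin UNLOAD request URL for one Solr instance."""
    query = urlencode({"action": "UNLOAD", "core": core, "wt": wt})
    return base_url.rstrip("/") + "/admin/cores?" + query

def delete_on_all(servers, core, fetch):
    """Request the UNLOAD action on every instance; return results per server."""
    return {server: fetch(unload_core_url(server, core)) for server in servers}

# Stand-in fetcher that records URLs so the sketch runs offline.
requested = []

def recording_fetch(url):
    requested.append(url)
    return {"responseHeader": {"status": 0}}

# Hypothetical instances, for illustration only.
outcome = delete_on_all(
    ["http://solr-master.example.com:8080/solr",
     "http://solr-slave1.example.com:8983/solr"],
    "hr_policies",
    recording_fetch,
)
print("\n".join(requested))
```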

Extend ColdFusion Solr Schema (cfcore)

• Reasons to extend/change default functionality
  • Change default operator
    – The default is OR
  • Enable delete by key capability
  • Enable case sensitivity on search
• Possible changes to schema.xml
  • Default operator between words is OR
    – Changing default operator to AND will reduce number of results

Extend ColdFusion Solr Schema – Enable Delete by Key

• Enable delete by key
  • Default unique key is a system generated identifier
  • Possible use case
    – Use API to delete indexed content by the key value
  • Changes
    – Create a copy of schema.xml and name it schema_slave.xml
    – Update the replication confFiles setting to use schema_slave.xml:schema.xml
    – Changes to schema.xml
      Change the indexed attribute on the key field to true
      Change the unique key from uid to key
    – Changing the unique key on slave instances will break the cfsearch tag
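
Once key is the unique key on the master, deleting by key amounts to posting a small XML delete message to the core's update handler. The Python sketch below builds such a request without sending it. The helper names, the core URL, the `/update?commit=true` path, and the sample key are assumptions for illustration; the `<delete><id>` message form is standard Solr XML update syntax.

```python
from urllib.request import Request
from xml.sax.saxutils import escape

def delete_by_key_payload(key):
    """XML update message that deletes the document whose unique key matches."""
    return "<delete><id>" + escape(key) + "</id></delete>"

def delete_by_key_request(core_url, key):
    """Build (but do not send) the POST request to the core's update handler."""
    body = delete_by_key_payload(key).encode("utf-8")
    return Request(
        core_url.rstrip("/") + "/update?commit=true",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )

# Hypothetical master core URL and key value.
request = delete_by_key_request(
    "http://solr-master.example.com:8080/solr/hr_policies",
    r"E:\docs\benefits & claims.pdf",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The request targets the master because, per the slide, only the master's schema uses key as the unique key; slaves pick up the deletion through replication.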

Extend ColdFusion Solr Schema – Enable case sensitivity on search

• Enable case sensitivity on search
  • Default configuration uses a filter to change text to lower case
  • Possible use case
    – Search by title and retain case sensitivity
  • Schema change
    – Comment out solr.LowerCaseFilterFactory

Creating a Custom Search

• Use case
  • Return category facet counts
  • Date range search
• Solr Search API
  • Basic query parameters
    – q – search query
    – fq – filter query
    – qt – query type; name of the request handler in solrconfig.xml
    – start – start row
    – rows – number of rows to return in response
    – fl – comma delimited list of fields to include in response
    – wt – response writer type

Creating a Custom Search Continued

• Solr Search API continued
  • Highlight parameters
    – hl – enable highlighted snippets to be generated
    – hl.fragsize – the size, in characters, of the snippets created by the highlighter
    – hl.snippets – maximum number of snippets to generate per field
    – hl.simple.pre – text which appears before highlighted term
    – hl.simple.post – text which appears after highlighted term
  • Facet parameters
    – facet – enable facet counts in query response
    – facet.field – specify a field which should be treated as a facet
    – facet.mincount – minimum count to include facet in response
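
Putting the basic, highlight, and facet parameters together, a custom search request for the use case above (category facet counts plus a date range) might be assembled as in this Python sketch. It is an illustration only: the core URL and the field names key, title, summary, category, and modified are assumptions about the schema, and the function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

def build_search_url(core_url, query, filters=(), start=0, rows=10,
                     fields=("key", "title", "summary"),
                     facet_field=None, highlight=False):
    """Assemble a /select URL from the parameters listed on the slides."""
    params = [
        ("q", query),
        ("start", start),
        ("rows", rows),
        ("fl", ",".join(fields)),
        ("wt", "json"),
    ]
    # Each filter query narrows the result set without affecting scoring.
    params += [("fq", f) for f in filters]
    if facet_field:
        params += [("facet", "true"),
                   ("facet.field", facet_field),
                   ("facet.mincount", 1)]
    if highlight:
        params += [("hl", "true"),
                   ("hl.fragsize", 200),
                   ("hl.snippets", 2),
                   ("hl.simple.pre", "<strong>"),
                   ("hl.simple.post", "</strong>")]
    return core_url.rstrip("/") + "/select?" + urlencode(params)

# Hypothetical slave core and field names.
CORE = "http://solr-slave1.example.com:8983/solr/hr_policies"
url = build_search_url(
    CORE,
    "open enrollment",
    filters=["modified:[2012-01-01T00:00:00Z TO NOW]"],  # date range search
    facet_field="category",
    highlight=True,
)
print(url)
```

Searches go to a slave instance here, in line with the replication design where the master is optimized for indexing and slaves for searching.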

Creating a Custom Search Continued

• JSON specific parameter
  • json.nl
    – Controls the output format of NamedList used for field faceting data
    – flat (default) – flat array
      Example: [name1,val1, name2,val2]
    – map – JSON object
      Simplest mapping, but a NamedList can have repeated keys and preserves order, which a JSON object does not guarantee
    – arrarr – an array of two element arrays
      Example: [[name1,val1], [name2, val2], [name3,val3]]
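
With the default flat format, the client has to pair names with counts itself. A minimal Python sketch of that step is shown below; the function name `flat_to_pairs` and the sample facet values are made up for illustration.

```python
def flat_to_pairs(flat):
    """Turn [name1, val1, name2, val2, ...] into [(name1, val1), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("flat facet list must have an even number of items")
    # Even positions are names, odd positions are counts; order is preserved.
    return list(zip(flat[0::2], flat[1::2]))

# Made-up facet data in the flat (default) json.nl format.
facet_counts = ["benefits", 42, "claims", 17, "forms", 3]
pairs = flat_to_pairs(facet_counts)
print(pairs)
```

The resulting list of pairs has the same shape that json.nl=arrarr would return directly, so requesting arrarr is an alternative that avoids this client-side step.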

Creating a Custom Search Code

• Code Review

Custom Search User Interface Example

Q & A

Dan Sirucek
Sr. Learning Technologist
Learning Technologies and Content
21555 Oxnard Dr, MS: CAAC08-088I
Woodland Hills, CA 91316
Tel (818) 234-8017
Mobile (323) 251-1236
www.wellpoint.com
[email protected]

Resources

• Apache Tomcat 6 - http://tomcat.apache.org/download-60.cgi

• Apache Solr Standalone Installer for ColdFusion 9.0.1 - http://www.adobe.com/support/coldfusion/downloads.html

• Java JDK 1.6.0_26 download - http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u26-download-400750.html

• Apache Solr - http://lucene.apache.org/solr/

• Solr Wiki - http://wiki.apache.org/solr/FrontPage

• Solr Replication - http://wiki.apache.org/solr/SolrReplication

• Solr JSON Response Writer - http://wiki.apache.org/solr/SolJSON#JSON_Query_Response_Format

• Solr Facet Parameters - http://wiki.apache.org/solr/SimpleFacetParameters

• Solr Highlighting Parameters - http://wiki.apache.org/solr/HighlightingParameters