A noobs lesson on solr (configuration)

Aug 10, 2015

BTI360
Page 1: A noobs lesson on solr (configuration)

A NOOBS LESSON ON SOLR (CONFIGURATION)

Page 2: A noobs lesson on solr (configuration)

STEVE, STOP ME IF I’M WRONG

at any point

not exactly a secret, but a disclaimer: I don’t know everything there is to know about Solr or its configuration

Page 3: A noobs lesson on solr (configuration)

EASIEST WAY I CAN EXPLAIN SOLR.

how would you find all the pages a term or phrase appears on in a book?

Page 5: A noobs lesson on solr (configuration)

EASIEST WAY I CAN EXPLAIN SOLR.

so we can think of Solr like an index in the back of a book

we use our brains to find the words or terms in the index

Solr’s brain is schema.xml

the words or terms refer to documents (text streams)
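
To make the book analogy concrete, here is a minimal sketch of an "index in the back of a book" in plain Java. It is not Solr code: the class BookIndexSketch and its methods are hypothetical names made up for this illustration, and the word splitting is deliberately crude.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of the "index in the back of a book" idea.
// This is not Solr code; every name here is made up for the example.
final class BookIndexSketch {

    // term -> sorted set of page numbers on which the term appears
    private final Map<String, SortedSet<Integer>> index = new TreeMap<>();

    // Split a page into lowercase words and record the page number for each word.
    void addPage(int pageNumber, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(pageNumber);
            }
        }
    }

    // Look a term up the way a reader would use the index at the back of the book.
    List<Integer> pagesFor(String term) {
        return new ArrayList<>(index.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>()));
    }

    public static void main(String[] args) {
        BookIndexSketch book = new BookIndexSketch();
        book.addPage(1, "Solr is a search server");
        book.addPage(2, "The schema tells Solr how to index");
        book.addPage(3, "Queries search the index");
        System.out.println(book.pagesFor("solr"));   // [1, 2]
        System.out.println(book.pagesFor("index"));  // [2, 3]
        System.out.println(book.pagesFor("banana")); // []
    }
}
```

The lookup never rereads the pages; it only consults the term-to-pages map, which is the point of building an index up front.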

Page 6: A noobs lesson on solr (configuration)

HOW DOES THE INDEX GET POPULATED?

schema.xml!

Page 7: A noobs lesson on solr (configuration)

HOW DOES THE INDEX GET SEARCHED?

schema.xml!

Page 8: A noobs lesson on solr (configuration)

SO, SCHEMA.XML IS THE BRAIN

an index contains one or more documents

a document is the unit of search and indexing

documents contain fields

so, index = tons of documents, and each document has field(s)

make sense yet?
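
The index / document / field relationship can be sketched as a small data model. This is only an illustration in plain Java, not the SolrJ API: the class names, the field names ("id", "keyword") and the sample values are all made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "index -> documents -> fields"; not the SolrJ API.
final class DocumentModelSketch {

    // A document is a bag of named fields; a field may hold several values.
    static final class Doc {
        final Map<String, List<String>> fields = new LinkedHashMap<>();

        Doc add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            return this;
        }
    }

    // In this sketch the index is simply a list of documents.
    static List<Doc> sampleIndex() {
        List<Doc> index = new ArrayList<>();
        index.add(new Doc()
                .add("id", "doc-1")
                .add("keyword", "search"));
        index.add(new Doc()
                .add("id", "doc-2")
                .add("keyword", "schema")
                .add("keyword", "config")); // a second value for the same field
        return index;
    }

    public static void main(String[] args) {
        for (Doc doc : sampleIndex()) {
            System.out.println(doc.fields);
        }
    }
}
```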

Page 9: A noobs lesson on solr (configuration)

SO, SCHEMA.XML IS THE BRAIN

<field name="html" type="example" indexed="true" stored="true" multiValued="true" />

<fieldType name="example" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and schema.xml is where it’s at!

it defines the fields and how to index and search each field
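
To get a feel for what an analyzer chain like the one above does to a field value, here is a rough sketch in plain Java. The three methods are simplified stand-ins for the char filter, tokenizer and filter named in the fieldType; they use crude regular expressions and are an assumption for illustration, not how Solr implements those factories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-ins for an analyzer chain; not the real Solr implementations.
final class AnalyzerChainSketch {

    // Stand-in for the HTML-stripping char filter: replace anything that looks like a tag.
    static String stripHtml(String input) {
        return input.replaceAll("<[^>]*>", " ");
    }

    // Stand-in for the tokenizer: split on runs of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stand-in for the lowercase filter.
    static List<String> lowerCase(List<String> tokens) {
        List<String> lowered = new ArrayList<>();
        for (String token : tokens) {
            lowered.add(token.toLowerCase(Locale.ROOT));
        }
        return lowered;
    }

    // The chain runs in the order it is declared: char filter, tokenizer, filter.
    static List<String> analyze(String raw) {
        return lowerCase(tokenize(stripHtml(raw)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("<p>Hello <b>Solr</b> World</p>")); // [hello, solr, world]
    }
}
```

The terms that come out of the chain are what ends up in the index, so a query only matches if its own terms line up with them.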

Page 21: A noobs lesson on solr (configuration)

FIELD? FIELDTYPE? HALP PLS.

@Test
public void sslCertsHostNameField() throws SolrServerException {
    testExpectations("sslcerts-hostname", "d-128-100-108.bootp.virginia.edu",
        hit("VIRGINIA.EDU"),
        hit("bootp.virginia.edu"),
        hit("\"d-128-100-108.bootp.virginia.edu\""),
        miss("mail.virginia.edu"));
}

THIS TEST FAILS :-(

So where do we look?

Page 25: A noobs lesson on solr (configuration)

FIELD? FIELDTYPE? HALP PLS.

<field name="sslcerts-hostname" type="text_general" indexed="true" stored="true" multiValued="true" />

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
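
One way to see why a whitespace-only analyzer does not satisfy the hostname test is to simulate it. The sketch below is plain Java, not Solr: it assumes a query term matches only if it equals an indexed token exactly, and the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simulation of a whitespace-only analyzer; an assumption-laden sketch, not Solr code.
final class WhitespaceOnlySketch {

    // Split on whitespace only: no lowercasing, no splitting on dots or dashes.
    static Set<String> indexTerms(String value) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Assumed matching rule: the query term must equal an indexed token exactly.
    static boolean matches(String indexedValue, String queryTerm) {
        return indexTerms(indexedValue).contains(queryTerm);
    }

    public static void main(String[] args) {
        String host = "d-128-100-108.bootp.virginia.edu";
        System.out.println(matches(host, "VIRGINIA.EDU"));       // false, but the test expects a hit
        System.out.println(matches(host, "bootp.virginia.edu")); // false, but the test expects a hit
        System.out.println(matches(host, host));                 // true
        System.out.println(matches(host, "mail.virginia.edu"));  // false
    }
}
```

Because the hostname contains no whitespace, it is indexed as one long token, so partial and differently-cased queries have nothing to match against.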

Page 30: A noobs lesson on solr (configuration)

FIELD? FIELDTYPE? HALP PLS.

<field name="sslcerts-hostname" type="text_general" indexed="true" stored="true" multiValued="true" />

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<fieldType name="sslcerts_hostname" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer>
    <!-- split the value on whitespace -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- emit substrings of 3 to 25 characters -->
    <filter class="solr.NGramFilterFactory" maxGramSize="25" minGramSize="3"/>
    <!-- lowercase the grams -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Page 32: A noobs lesson on solr (configuration)

FIELD? FIELDTYPE? HALP PLS.

<!-- the field now uses the new sslcerts_hostname type -->
<field name="sslcerts-hostname" type="sslcerts_hostname" indexed="true" stored="true" multiValued="true" />

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<fieldType name="sslcerts_hostname" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" maxGramSize="25" minGramSize="3"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Page 35: A noobs lesson on solr (configuration)

FIELD? FIELDTYPE? HALP PLS.

@Test
public void sslCertsHostNameField() throws SolrServerException {
    testExpectations("sslcerts-hostname", "d-128-100-108.bootp.virginia.edu",
        hit("VIRGINIA.EDU"),
        hit("bootp.virginia.edu"),
        hit("\"d-128-100-108.bootp.virginia.edu\""),
        miss("mail.virginia.edu"));
}

THIS TEST PASSES :-D
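
A rough simulation of the n-gram plus lowercase chain shows why these expectations can hold. This is a plain-Java sketch under loud assumptions: the same chain is applied to the indexed value and to the query, and a query counts as a hit only if every one of its grams is also an indexed gram. How real Solr combines the grams depends on the query parser and the Solr version, so treat the matching rule here as hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Simulation of whitespace tokenizing + n-grams (3..25) + lowercasing; not Solr code.
final class NGramHostnameSketch {

    static final int MIN_GRAM = 3;
    static final int MAX_GRAM = 25;

    // Produce every lowercase substring of length MIN_GRAM..MAX_GRAM of each whitespace token.
    static Set<String> analyze(String value) {
        Set<String> grams = new HashSet<>();
        for (String token : value.trim().split("\\s+")) {
            for (int start = 0; start < token.length(); start++) {
                for (int len = MIN_GRAM; len <= MAX_GRAM && start + len <= token.length(); len++) {
                    grams.add(token.substring(start, start + len).toLowerCase(Locale.ROOT));
                }
            }
        }
        return grams;
    }

    // Assumed matching rule: all grams of the query must be present among the indexed grams.
    static boolean matches(String indexedValue, String query) {
        Set<String> queryGrams = analyze(query);
        return !queryGrams.isEmpty() && analyze(indexedValue).containsAll(queryGrams);
    }

    public static void main(String[] args) {
        String host = "d-128-100-108.bootp.virginia.edu";
        System.out.println(matches(host, "VIRGINIA.EDU"));       // true
        System.out.println(matches(host, "bootp.virginia.edu")); // true
        System.out.println(matches(host, host));                 // true
        System.out.println(matches(host, "mail.virginia.edu"));  // false ("mai" is not in the hostname)
    }
}
```

Under these assumptions, partial and uppercase queries hit because their grams are substrings of the lowercased hostname, while mail.virginia.edu misses because some of its grams never occur in the indexed value.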

Page 36: A noobs lesson on solr (configuration)

SUMMARY OF WHAT WE LEARNED.

A Solr index is made up of a bunch of documents (token streams): think of the index in the back of a book

schema.xml holds the brains, the power, the rules for how data gets stored as documents and how they’re returned from matching queries

thanks to Steve’s exercises, I was able to look at the schema.xml file and… for the most part, understand it

Hopefully you can look at it now and understand it too

Page 37: A noobs lesson on solr (configuration)

QUESTIONS?