Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany

Post on 12-Jan-2017

Harnessing The Power of Search

André Ricardo Barreto de Oliveira ("Arbo")
Software Engineer - Team Lead - Search

Darmstadt, Germany
7 October, 2015

What's Search and why is it so cool?

The dawn of Search

Searching higher

Search and the Digital Experience

Understanding Search

Inside the Search Engine

The Index
Documents
Fields

Not that different from ye olde database?...

Indexing documents

PUT /megacorp/employee/1
{
  "first_name" : "John",
  "last_name" : "Smith",
  "age" : 25,
  "about" : "I love to go rock climbing",
  "interests": [ "sports", "music" ]
}

PUT /megacorp/employee/2
{
  "first_name" : "Jane",
  "last_name" : "Smith",
  "age" : 32,
  "about" : "I like to collect rock albums",
  "interests": [ "music" ]
}

PUT /megacorp/employee/3
{
  "first_name" : "Douglas",
  "last_name" : "Fir",
  "age" : 35,
  "about": "I like to build cabinets",
  "interests": [ "forestry" ]
}
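To make the index, document and field vocabulary concrete, here is a minimal in-memory sketch of what indexing the three employees above could amount to. It is not how Elasticsearch or Liferay store data: the TinyIndex class, its methods and the whitespace-only tokenization are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy index: field name -> term -> ids of documents containing that term.
final class TinyIndex {
    private final Map<String, Map<String, SortedSet<Integer>>> postings = new HashMap<>();

    // Index one document: every field value is lowercased and split on whitespace.
    void index(int id, Map<String, String> fields) {
        for (Map.Entry<String, String> field : fields.entrySet()) {
            Map<String, SortedSet<Integer>> terms =
                postings.computeIfAbsent(field.getKey(), k -> new HashMap<>());
            for (String term : field.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!term.isEmpty()) {
                    terms.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
                }
            }
        }
    }

    // Return the ids of the documents whose field contains the term.
    SortedSet<Integer> lookup(String field, String term) {
        return postings.getOrDefault(field, Map.of())
            .getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    // The three employees from the slides, reduced to two fields each.
    static TinyIndex sample() {
        TinyIndex index = new TinyIndex();
        index.index(1, Map.of("last_name", "Smith", "about", "I love to go rock climbing"));
        index.index(2, Map.of("last_name", "Smith", "about", "I like to collect rock albums"));
        index.index(3, Map.of("last_name", "Fir", "about", "I like to build cabinets"));
        return index;
    }

    public static void main(String[] args) {
        TinyIndex index = sample();
        System.out.println(index.lookup("about", "rock"));
        System.out.println(index.lookup("last_name", "smith"));
    }
}
```

Looking up a term is a map access rather than a scan over every document, which is the main way this differs from a plain database table.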

Queries and Filters

GET /megacorp/employee/_search?q=last_name:Smith

"hits": [
  {
    "_source": {
      "first_name": "John",
      "last_name": "Smith",
      "age": 25,
      "about": "I love to go rock climbing",
      "interests": [ "sports", "music" ]
    }
  },
  {
    "_source": {
      "first_name": "Jane",
      "last_name": "Smith",
      "age": 32,
      "about": "I like to collect rock albums",
      "interests": [ "music" ]
    }
  }
]

GET /megacorp/employee/_search
{
  "query" : {
    "filtered" : {
      "filter" : {
        "range" : { "age" : { "gt" : 21 } }
      },
      "query" : {
        "match" : { "last_name" : "smith" }
      }
    }
  }
}

Full-Text Search

GET /megacorp/employee/_search
{
  "query" : {
    "match" : { "about" : "rock climbing" }
  }
}

"hits": [
  {
    "_score": 0.16273327,
    "_source": {
      "first_name": "John",
      "last_name": "Smith",
      "age": 25,
      "about": "I love to go rock climbing",
      "interests": [ "sports", "music" ]
    }
  },
  {
    "_score": 0.016878016,
    "_source": {
      "first_name": "Jane",
      "last_name": "Smith",
      "age": 32,
      "about": "I like to collect rock albums",
      "interests": [ "music" ]
    }
  }
]

Analysis and Analyzers

Set the shape to semi-transparent by calling Set_Trans(5)

Standard analyzer

set, the, shape, to, semi, transparent, by, calling, set_trans, 5

Simple analyzer

set, the, shape, to, semi, transparent, by, calling, set, trans

Whitespace analyzer

Set, the, shape, to, semi-transparent, by, calling, Set_Trans(5)

English language analyzer

set, shape, semi, transpar, call, set_tran, 5
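Two of the analyzers above are simple enough to approximate in a few lines. The sketch below is only an imitation of the whitespace and simple analyzers, written to reproduce the token lists on the slides. The AnalyzerSketch class is hypothetical and is not the Lucene or Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical re-implementation of two analyzers, for illustration only.
final class AnalyzerSketch {
    // Whitespace analyzer: split on whitespace, keep case and punctuation untouched.
    static List<String> whitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Simple analyzer: split on anything that is not a letter, then lowercase each token.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Set the shape to semi-transparent by calling Set_Trans(5)";
        System.out.println(whitespace(text));
        System.out.println(simple(text));
    }
}
```

The standard and English analyzers add word-boundary rules, stop words and stemming, so they are left out of this sketch.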

Field mappings

{
  "number_of_clicks": {
    "type": "integer"
  }
}

{
  "tag": {
    "type": "string",
    "index": "not_analyzed"
  }
}

{
  "tweet": {
    "type": "string",
    "analyzer": "english"
  }
}

Analytics and Aggregations

GET /megacorp/employee/_search
{
  "query": {
    "match": { "last_name": "smith" }
  },
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" },
      "aggs" : {
        "avg_age" : {
          "avg" : { "field" : "age" }
        }
      }
    }
  }
}

"buckets": [
  {
    "key": "music",
    "doc_count": 2,
    "avg_age": { "value": 28.5 }
  },
  {
    "key": "sports",
    "doc_count": 1,
    "avg_age": { "value": 25 }
  }
]
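The buckets above can be recomputed by hand, which helps show what a terms aggregation with a nested average does. The following sketch runs the same grouping over the three sample employees in plain Java. The class and method names are made up for this example, and sorting buckets by document count is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of "terms on interests, then avg on age".
final class TermsAggregationSketch {
    static final class Employee {
        final String lastName;
        final int age;
        final List<String> interests;

        Employee(String lastName, int age, List<String> interests) {
            this.lastName = lastName;
            this.age = age;
            this.interests = interests;
        }
    }

    static final class Bucket {
        final String key;
        final long docCount;
        final double avgAge;

        Bucket(String key, long docCount, double avgAge) {
            this.key = key;
            this.docCount = docCount;
            this.avgAge = avgAge;
        }

        @Override
        public String toString() {
            return key + " doc_count=" + docCount + " avg_age=" + avgAge;
        }
    }

    // Keep employees matching the last name, group them by interest, average the ages.
    static List<Bucket> interestsOf(List<Employee> employees, String lastName) {
        Map<String, List<Integer>> agesByInterest = new TreeMap<>();
        for (Employee employee : employees) {
            if (!employee.lastName.equalsIgnoreCase(lastName)) {
                continue;
            }
            for (String interest : employee.interests) {
                agesByInterest.computeIfAbsent(interest, k -> new ArrayList<>()).add(employee.age);
            }
        }
        List<Bucket> buckets = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : agesByInterest.entrySet()) {
            double avg = entry.getValue().stream().mapToInt(Integer::intValue).average().orElse(0);
            buckets.add(new Bucket(entry.getKey(), entry.getValue().size(), avg));
        }
        // Largest buckets first; ties keep their alphabetical order.
        buckets.sort(Comparator.comparingLong((Bucket b) -> b.docCount).reversed());
        return buckets;
    }

    static List<Employee> megacorp() {
        return List.of(
            new Employee("Smith", 25, List.of("sports", "music")),
            new Employee("Smith", 32, List.of("music")),
            new Employee("Fir", 35, List.of("forestry")));
    }

    public static void main(String[] args) {
        interestsOf(megacorp(), "smith").forEach(System.out::println);
    }
}
```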

The Liferay Search Infrastructure

The Liferay Search architecture

Liferay Portal

Assets: web content, message boards, wiki pages...

Search infrastructure

(Magic happens here)

Search engine(s)

Indices, documents, analysis...

The Liferay Search Engine plugins

public interface SearchEngine {

    public IndexSearcher getIndexSearcher();

    public IndexWriter getIndexWriter();

}

public class ElasticsearchSearchEngine extends BaseSearchEngine

public class ElasticsearchIndexSearcher extends BaseIndexSearcher

public class ElasticsearchIndexWriter extends BaseIndexWriter

public class SolrSearchEngine extends BaseSearchEngine

public class SolrIndexSearcher extends BaseIndexSearcher

public class SolrIndexWriter extends BaseIndexWriter
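The plugin idea above is that the portal talks only to the SearchEngine interface, and each engine supplies its own searcher and writer. The sketch below mimics that shape with a toy in-memory engine. The nested interfaces are simplified stand-ins that reuse the slide names, their method signatures are assumptions, and InMemorySearchEngine is hypothetical; none of it is Liferay's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified stand-ins for the plugin contract; signatures are assumptions.
final class PluggableEngineSketch {
    interface IndexWriter {
        void addDocument(String id, String text);
    }

    interface IndexSearcher {
        List<String> search(String term);
    }

    interface SearchEngine {
        IndexSearcher getIndexSearcher();

        IndexWriter getIndexWriter();
    }

    // Hypothetical engine that keeps documents in a map and matches by substring.
    static final class InMemorySearchEngine implements SearchEngine {
        private final Map<String, String> documents = new LinkedHashMap<>();

        @Override
        public IndexWriter getIndexWriter() {
            return documents::put;
        }

        @Override
        public IndexSearcher getIndexSearcher() {
            return term -> {
                String needle = term.toLowerCase(Locale.ROOT);
                List<String> ids = new ArrayList<>();
                documents.forEach((id, text) -> {
                    if (text.toLowerCase(Locale.ROOT).contains(needle)) {
                        ids.add(id);
                    }
                });
                return ids;
            };
        }
    }

    // Caller code depends only on the SearchEngine interface, never on a concrete engine.
    static List<String> indexAndSearch(SearchEngine engine, String term) {
        IndexWriter writer = engine.getIndexWriter();
        writer.addDocument("wiki-1", "Search in Liferay 7");
        writer.addDocument("blog-2", "Clustering with Elasticsearch");
        return engine.getIndexSearcher().search(term);
    }

    public static void main(String[] args) {
        System.out.println(indexAndSearch(new InMemorySearchEngine(), "liferay"));
    }
}
```

Swapping engines then means passing a different SearchEngine implementation to the same caller code.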

Solr: schema.xml

<fields>
    <field indexed="true" name="articleId" stored="true" type="string_keyword_lowercase" />
    <field indexed="true" name="companyId" stored="true" type="long" />
    <field indexed="true" name="emailAddress" stored="true" type="string" />
</fields>

The Liferay Document Mappings

Elasticsearch: liferay-type-mappings.json

"LiferayDocumentType": {
  "properties": {
    "articleId": {
      "analyzer": "keyword_lowercase",
      "store": "yes",
      "type": "string"
    },
    "companyId": {
      "index": "not_analyzed",
      "store": "yes",
      "type": "string"
    },
    "emailAddress": {
      "index": "not_analyzed",
      "store": "yes",
      "type": "string"
    }
  }
}

From Portal assets to Index documents…

public interface Indexer<T> {

    public Document getDocument(T object);

}

public class JournalArticleIndexer extends BaseIndexer<JournalArticle> {

    protected Document doGetDocument(JournalArticle journalArticle) {
        Document document = getBaseModelDocument(CLASS_NAME, journalArticle);

        document.addText(
            LocalizationUtil.getLocalizedName(Field.CONTENT, languageId),
            content);
        document.addKeyword(Field.VERSION, journalArticle.getVersion());
        document.addDate("displayDate", journalArticle.getDisplayDate());

        return document;
    }

}

public class MBMessageIndexer extends BaseIndexer<MBMessage> {

    protected Document doGetDocument(MBMessage mbMessage) {
        Document document = getBaseModelDocument(CLASS_NAME, mbMessage);

        document.addText(Field.CONTENT, processContent(mbMessage));
        document.addKeyword("discussion", discussion == null ? false : true);

        if (mbMessage.isAnonymous()) {
            document.remove(Field.USER_NAME);
        }

        return document;
    }

}

public interface Document {

    public void addKeyword(String name, String value);

    public void addNumber(String name, long value);

}
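The indexers above all follow one pattern: take a portal asset, copy the searchable parts into named fields of a Document, and drop anything that must not be searchable. Here is a self-contained sketch of that pattern. Document and Indexer are reduced stand-ins for the interfaces on the slides, while Message, MessageIndexer and the field names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-ins for the asset-to-document mapping; not Liferay's real classes.
final class IndexerSketch {
    static final class Document {
        private final Map<String, Object> fields = new LinkedHashMap<>();

        void addKeyword(String name, String value) {
            fields.put(name, value);
        }

        void addNumber(String name, long value) {
            fields.put(name, value);
        }

        void remove(String name) {
            fields.remove(name);
        }

        Map<String, Object> fields() {
            return Collections.unmodifiableMap(fields);
        }
    }

    interface Indexer<T> {
        Document getDocument(T object);
    }

    // Hypothetical asset type standing in for a message board post.
    static final class Message {
        final long id;
        final String userName;
        final String subject;
        final boolean anonymous;

        Message(long id, String userName, String subject, boolean anonymous) {
            this.id = id;
            this.userName = userName;
            this.subject = subject;
            this.anonymous = anonymous;
        }
    }

    static final class MessageIndexer implements Indexer<Message> {
        @Override
        public Document getDocument(Message message) {
            Document document = new Document();
            document.addNumber("messageId", message.id);
            document.addKeyword("subject", message.subject);
            document.addKeyword("userName", message.userName);
            // Anonymous posts must not be findable by author, so the field is removed.
            if (message.anonymous) {
                document.remove("userName");
            }
            return document;
        }
    }

    public static void main(String[] args) {
        MessageIndexer indexer = new MessageIndexer();
        System.out.println(indexer.getDocument(new Message(1L, "arbo", "Hello", false)).fields());
        System.out.println(indexer.getDocument(new Message(2L, "arbo", "Secret", true)).fields());
    }
}
```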

… from Search Box to queries and filters

public class JournalArticleIndexer extends BaseIndexer<JournalArticle> {

    public void postProcessSearchQuery(
        BooleanQuery searchQuery,
        BooleanFilter fullQueryBooleanFilter,
        SearchContext searchContext) {

        addSearchTerm(searchQuery, searchContext, Field.ARTICLE_ID, false);
        addSearchLocalizedTerm(searchQuery, searchContext, Field.CONTENT, false);
        addSearchLocalizedTerm(searchQuery, searchContext, Field.TITLE, false);
        addSearchTerm(searchQuery, searchContext, Field.USER_NAME, false);
    }

}

public class MBThreadIndexer extends BaseIndexer<MBThread> {

    public void postProcessContextBooleanFilter(
        BooleanFilter contextBooleanFilter,
        SearchContext searchContext) {

        contextBooleanFilter.addRequiredTerm("discussion", discussion);

        if ((endDate > 0) && (startDate > 0)) {
            contextBooleanFilter.addRangeTerm(
                "lastPostDate", startDate, endDate);
        }
    }

}
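The filter hook above narrows results by stacking required clauses onto a boolean filter. A rough in-memory equivalent is sketched below, under the assumption that every added clause must match and that range bounds are inclusive. BooleanFilterSketch and its accepts method are hypothetical and only echo the method names seen on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical boolean filter: a document passes only if every required clause matches.
final class BooleanFilterSketch {
    private final List<Predicate<Map<String, Object>>> required = new ArrayList<>();

    // Require an exact value for a field.
    BooleanFilterSketch addRequiredTerm(String field, Object value) {
        required.add(doc -> value.equals(doc.get(field)));
        return this;
    }

    // Require a numeric field to fall within [start, end] (inclusive bounds are an assumption).
    BooleanFilterSketch addRangeTerm(String field, long start, long end) {
        required.add(doc -> {
            Object value = doc.get(field);
            return value instanceof Long && (Long) value >= start && (Long) value <= end;
        });
        return this;
    }

    boolean accepts(Map<String, Object> doc) {
        return required.stream().allMatch(clause -> clause.test(doc));
    }

    public static void main(String[] args) {
        BooleanFilterSketch filter = new BooleanFilterSketch()
            .addRequiredTerm("discussion", false)
            .addRangeTerm("lastPostDate", 100L, 200L);
        System.out.println(filter.accepts(Map.of("discussion", false, "lastPostDate", 150L)));
        System.out.println(filter.accepts(Map.of("discussion", true, "lastPostDate", 150L)));
    }
}
```

Filters like this only include or exclude documents; they do not contribute to relevance scoring, which is left to the query.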

Classic query types (and filters)

TermQuery / TermFilter

"term" : { "locale" : "de_DE" }

TermRangeQuery / RangeTermFilter

"range" : { "age" : { "gte" : 8, "lte" : 42 } }

WildcardQuery

"wildcard" : { "company" : "L*ray" }

StringQuery

"query_string": { "query": "(content:this OR name:this) AND (content:that OR name:that)" }

BooleanQuery / BooleanFilter

"bool" : {
  "must" : { "term" : { "locale" : "de_DE" } },
  "must_not" : { "range" : { "age" : { "from" : 8, "to" : 42 } } },
  "should" : [
    { "wildcard" : { "company" : "L*ray" } },
    { "term" : { "product" : "Portal" } }
  ]
}

Speaking to the Search Engine

public interface Query {

    public BooleanFilter getPreBooleanFilter();

    public Filter getPostFilter();

}

public interface Filter {

    public Boolean isCached();

}

public class StringQueryTranslatorImpl implements StringQueryTranslator {

    public QueryBuilder translate(StringQuery stringQuery) {
        // Elasticsearch Client Java API
        return QueryBuilders.queryStringQuery(stringQuery.getQuery());
    }

}

public class ElasticsearchIndexSearcher extends BaseIndexSearcher {

    protected SearchResponse doSearch(
        SearchContext searchContext, Query query) {

        // Elasticsearch Client Java API
        Client client = _elasticsearchConnectionManager.getClient();

        SearchRequestBuilder searchRequestBuilder = client.prepareSearch(
            getSelectedIndexNames(queryConfig, searchContext));

        QueryBuilder queryBuilder = _queryTranslator.translate(
            query, searchContext);

        searchRequestBuilder.setQuery(queryBuilder);

        SearchResponse searchResponse = searchRequestBuilder.get();

        return searchResponse;
    }

}
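The translator classes above turn an engine-neutral query object into whatever the target engine understands. As a dependency-free illustration, the sketch below renders one term query both as Elasticsearch-style JSON and as Solr-style field:value syntax. TermQuery and TermQueryTranslator here are simplified, hypothetical types, and the string building skips the escaping that real code would need.

```java
// Hypothetical translators: one neutral query, two engine-specific renderings.
final class QueryTranslatorSketch {
    static final class TermQuery {
        final String field;
        final String value;

        TermQuery(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    interface TermQueryTranslator {
        String translate(TermQuery query);
    }

    // Elasticsearch query DSL form. No JSON escaping is done in this sketch.
    static final TermQueryTranslator ELASTICSEARCH =
        query -> "{ \"term\" : { \"" + query.field + "\" : \"" + query.value + "\" } }";

    // Solr query syntax form. Special characters are not escaped in this sketch.
    static final TermQueryTranslator SOLR =
        query -> query.field + ":" + query.value;

    public static void main(String[] args) {
        TermQuery query = new TermQuery("locale", "de_DE");
        System.out.println(ELASTICSEARCH.translate(query));
        System.out.println(SOLR.translate(query));
    }
}
```

The real translators on the slides return builder objects from the Elasticsearch client rather than strings, which avoids the escaping problem altogether.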

Search in Liferay 7

What's new in Liferay 7

Liferay 6

● Embedded Lucene by default

● Remote: Solr only

● Solr 4

● Portal-centric Lucene clustering

Liferay 7

● Embedded Elasticsearch by default

● Remote: Elasticsearch and Solr

● Solr 5.x and SolrCloud

● Native, transparent Elasticsearch clustering

● Queries + Filters + Boosting + Geolocation

● Extensibility and modularization

● Enterprise extras

○ Shield for security

○ Marvel for cluster monitoring

○ Kibana for visualization

New Queries

MatchQuery

"match" : { "subject" : { "query" : "Liferay Portal", "type" : "phrase" }}

MoreLikeThisQuery

"more_like_this" : {
  "fields" : ["title", "content"],
  "like_text" : "Search In Liferay 7",
  "min_term_freq" : 1,
  "max_query_terms" : 12
}

DisMaxQuery

"dis_max" : {
  "tie_breaker" : 0.7,
  "queries" : [
    { "term" : { "age" : 34 } },
    { "term" : { "age" : 35 } }
  ]
}

FuzzyQuery

"fuzzy" : { "user" : { "value" : "ed", "fuzziness" : 2, "max_expansions": 100 }}

MatchAllQuery / MatchAllFilter

"match_all" : { "boost" : 1.2 }

MultiMatchQuery

"multi_match" : {
  "query": "Enterprise. Open Source. For Life",
  "type": "most_fields",
  "fields": [ "title", "title.original", "title.shingles" ]
}

New Filters

ExistsFilter

"exists" : { "field" : "emailAddress" }

MissingFilter

"missing" : { "field" : "emailAddress" }

PrefixFilter

"prefix" : { "product" : "life" }

TermsFilter

"terms" : { "locale" : ["de_DE", "pt_BR", "en_CA"] }

QueryFilter

"fquery" : {
  "query" : {
    "bool" : {
      "must" : [
        { "wildcard" : { "company" : "L*ray" } },
        { "term" : { "product" : "Portal" } }
      ]
    }
  },
  "_cache" : true
}

Geolocation filters

GeoDistanceFilter

"geo_distance" : { "distance" : "12km", "pin.location" : { "lat" : 40, "lon" : -70 }}

GeoBoundingBoxFilter

"geo_bounding_box" : { "pin.location" : { "top_left" : { "lat" : 40.73, "lon" : -74.1 }, "bottom_right" : { "lat" : 40.01, "lon" : -71.12 } }}

GeoDistanceRangeFilter

"geo_distance_range" : { "from" : "200km", "to" : "400km", "pin.location" : { "lat" : 40, "lon" : -70 }}

GeoPolygonFilter

"geo_polygon" : { "person.location" : { "points" : [ [-70, 40], [-80, 30], [-90, 20] ] }}

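The distance-based filters above boil down to computing how far a document's location is from a pin and comparing that to a threshold. The sketch below does this with the haversine great-circle formula and a mean Earth radius of 6371 km. Both choices are assumptions for illustration, and Elasticsearch's own distance calculation may differ.

```java
// Hypothetical geo filters built on the haversine great-circle distance.
final class GeoDistanceSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance in kilometres between two latitude/longitude points.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Analogue of a distance filter: is the point at most maxKm from the pin?
    static boolean withinDistance(
            double lat, double lon, double pinLat, double pinLon, double maxKm) {
        return haversineKm(lat, lon, pinLat, pinLon) <= maxKm;
    }

    // Analogue of a distance range filter: is the point between fromKm and toKm from the pin?
    static boolean withinRange(
            double lat, double lon, double pinLat, double pinLon, double fromKm, double toKm) {
        double distance = haversineKm(lat, lon, pinLat, pinLon);
        return distance >= fromKm && distance <= toKm;
    }

    public static void main(String[] args) {
        System.out.println(haversineKm(40.1, -70, 40, -70));
        System.out.println(withinDistance(40.1, -70, 40, -70, 12));
        System.out.println(withinRange(42, -70, 40, -70, 200, 400));
    }
}
```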
Query-time boosting

"should": [
  { "match": { "title": { "query": "Liferay Portal", "boost": 2 } } },
  { "match": { "content": { "query": "Liferay Portal" } } }
]

New Aggregations: Top Hits

"terms": { "field": "conference", "size": 2 },
"aggs": {
  "talks": {
    "top_hits": {
      "size" : 1,
      "sort": [ { "attendees": { "order": "desc" } } ]
    }
  }
}

{
  "key": "Liferay DEVCON",
  "talks": { "hits": [ { "_source": { "title": "The Power of Search" } } ] }
},
{
  "key": "Liferay North America Symposium",
  "talks": { "hits": [ { "_source": { "title": "The ELK Stack" } } ] }
}

New Aggregations: Extended Stats

"extended_stats" : { "field" : "attendees" }

"attendees_per_talk_stats": {
  "count": 9,
  "min": 72,
  "max": 99,
  "avg": 86,
  "sum": 774,
  "sum_of_squares": 67028,
  "std_deviation": 7.180219742846005
}

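The extended stats figures are related to each other: the standard deviation shown on the slide follows from the count, sum and sum of squares if one assumes the population formula sqrt(sum_of_squares / count - avg^2). The sketch below applies that formula. ExtendedStatsSketch and its method names are hypothetical, and the eight-value data set in main is made up, because the slide does not list the individual attendee counts.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical helper showing how the extended stats values relate to each other.
final class ExtendedStatsSketch {
    // Population standard deviation derived from the aggregated sums.
    static double stdDeviationFromSums(long count, double sum, double sumOfSquares) {
        double avg = sum / count;
        return Math.sqrt(sumOfSquares / count - avg * avg);
    }

    static double sumOfSquares(double... values) {
        return DoubleStream.of(values).map(v -> v * v).sum();
    }

    // Same statistic computed from raw values.
    static double stdDeviationOf(double... values) {
        DoubleSummaryStatistics stats = DoubleStream.of(values).summaryStatistics();
        return stdDeviationFromSums(stats.getCount(), stats.getSum(), sumOfSquares(values));
    }

    public static void main(String[] args) {
        // Count, sum and sum of squares taken from the slide.
        System.out.println(stdDeviationFromSums(9, 774, 67028));
        // Made-up sample values, unrelated to the slide.
        System.out.println(stdDeviationOf(2, 4, 4, 4, 5, 5, 7, 9));
    }
}
```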
Modularity and Search

● OSGi

● Liferay's default Search Engine: now a plugin in itself

● Extension points in Search

○ Node Settings contributors → fine tune your cluster

○ Index Settings contributors → fine tune your shards and logs

○ Analyzers and Mappings contributors → fine tune your fields and queries

Liferay 7: Enter Elasticsearch

Why Elasticsearch?

Best of breed

Built for modern web applications

Distributed and clusterable by design

Lucene based

Multi-tenancy

Great vendor support

Great monitoring tools: Marvel, Logstash

Great for Developers

Open Source

Amazing documentation

High "just works" factor, e.g. zero-config indexing and clustering

REST for queries, health, admin - everything

Update live settings programmatically

Great Java Client API

Pretty JSON for talks ;-)

Clustering with Liferay and Elasticsearch

Production mode

Dev mode

Scaling and tuning made easy

Enterprise-level Search in Liferay 7 EE

Security: Shield

Protect your Liferay index with a username and password

SSL/TLS encryption for traffic within the Liferay Elasticsearch cluster

Elasticsearch plugin - no need for an external security solution

Restrict access to Liferay Portal instances with IP filtering

Monitoring: Marvel

Visualization: Kibana

Thanks and happy searching!

http://j.mp/SearchLiferayDevcon2015
andre.oliveira@liferay.com
github.com/arboliveira
@arbocombr
