Webinar: Fusion 2.3 Preview - Enhanced Features with Solr & Spark

2016

OCTOBER 11-14, BOSTON, MA

http://lucenerevolution.org

Lucidworks Fusion 2.3 Preview

Grant Ingersoll

@gsingers

CTO, Lucidworks

Search-Driven Everything

Customer Service

Customer Insights

Fraud Surveillance

Research Portal

Online Retail

Digital Content

Lucidworks Fusion Is Search-Driven Everything

• Drive next-generation relevance via Content, Collaboration and Context

• Harness best-in-class open source: Apache Solr + Spark

• Simplify application development and reduce ongoing maintenance

CATALOG

DYNAMIC NAVIGATION AND LANDING PAGES

INSTANT INSIGHTS AND ANALYTICS

PERSONALIZED SHOPPING EXPERIENCE

PROMOTIONS

USER HISTORY

Data Acquisition

Indexing & Streaming

Smart Access API

Recommendations & Alerts

Analytics & Insights

Extreme Relevancy

Access data from anywhere to build intelligent, data-driven applications.

Fusion Architecture

REST API

Worker Worker Cluster Mgr.

Apache Spark

Shards Shards

Apache Solr

HDFS (Optional)

Shared Config Mgmt

Leader Election

Load Balancing

ZK 1

Apache Zookeeper

ZK N

DATABASE WEB FILE LOGS HADOOP CLOUD

Connectors

Alerting/Messaging

NLP

Pipelines

Blob Storage

Scheduling

Recommenders/Signals

…

Core Services

Admin UI

SECURITY BUILT-IN

Lucidworks View

What’s New?

http://www.lucidworks.com/products/fusion

• General Improvements

• Index Pipeline Previews

• Better Time Series Indexing

• Spark goodness

Agenda

• System:

• Improved JavaScript stage performance

• Updated Versions for: Solr (5.4.1), Tika (1.12), Spark (1.6.1)

• Security:

• SAML-based security support

• API password-redaction capabilities

• Connectors:

• Box now supports JWT authentication for easier setup

• Azure now supports incremental crawling

• HDFS and Windows Shares now support Kerberos authentication

• Additional controls for GitHub crawling

General Improvements
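The API password-redaction item in the Security list can be illustrated with a minimal sketch: before a stored configuration is returned over the REST API, fields that look like secrets are masked. The key names in `SECRET_KEYS`, the mask string, and the sample connector config are assumptions for illustration, not Fusion's actual redaction rules.

```python
from typing import Any

# Hypothetical key names treated as secrets (an assumption, not Fusion's list).
SECRET_KEYS = {"password", "secret", "token"}
MASK = "********"

def redact(value: Any) -> Any:
    """Return a copy of a JSON-like value with secret fields masked."""
    if isinstance(value, dict):
        return {
            key: MASK if key.lower() in SECRET_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # strings, numbers, booleans and None pass through unchanged

# Hypothetical connector configuration as it might be stored server-side.
config = {"id": "web-crawler", "properties": {"username": "svc", "password": "hunter2"}}
print(redact(config))
# {'id': 'web-crawler', 'properties': {'username': 'svc', 'password': '********'}}
```

Returning a masked copy, rather than editing the stored object, keeps the real credential available to the connector while API responses never expose it.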

• Sample your data source and preview documents without indexing

• Build and test custom pipelines without affecting the original definitions

• Copy, save, and merge pipelines when you are done

Enhanced Data Modeling via Index Pipeline Previews
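The preview workflow above (sample documents, run them through a draft pipeline, index nothing, leave the saved definition alone) can be sketched in plain Python. The stage functions, the document shape, and `preview` are hypothetical illustrations of the idea, not Fusion's pipeline API.

```python
import copy
from typing import Callable, Dict, List

Doc = Dict[str, object]
Stage = Callable[[Doc], Doc]

def lowercase_title(doc: Doc) -> Doc:
    # Hypothetical stage: normalize the title field.
    out = dict(doc)
    if isinstance(out.get("title"), str):
        out["title"] = out["title"].lower()
    return out

def add_length(doc: Doc) -> Doc:
    # Hypothetical stage: derive a new field from the body text.
    out = dict(doc)
    out["body_len"] = len(str(out.get("body", "")))
    return out

def preview(stages: List[Stage], sample: List[Doc]) -> List[Doc]:
    """Run every stage over a sample and return the results; nothing is indexed."""
    results = []
    for doc in sample:
        current = copy.deepcopy(doc)  # never mutate the sampled source documents
        for stage in stages:
            current = stage(current)
        results.append(current)
    return results

original = [lowercase_title]          # the saved pipeline definition
draft = original + [add_length]       # edit a copy, not the saved definition
sample = [{"id": "1", "title": "Fusion 2.3", "body": "preview"}]
print(preview(draft, sample))
# [{'id': '1', 'title': 'fusion 2.3', 'body': 'preview', 'body_len': 7}]
```

Once the draft output looks right, it could be saved or merged back; until then, `original` and the sampled documents are untouched.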

• Greatly simplify the care and feeding of time-based indexes

• Point-and-click creation of time series shards

• Total control over number of shards and replication

• Easily defined retention and archiving policies (e.g., 30-day retention)

• Intelligent query parsing optimizes shard access

• Ideal for log data and signals

Time Series Done Right
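The two ideas behind time-based indexes, querying only the partitions that overlap a time range and dropping partitions outside a retention window, can be sketched as follows. This assumes one partition per day with a hypothetical `prefix_YYYY_MM_DD` naming scheme; the helper names are illustrative and are not Fusion's API.

```python
from datetime import date, datetime, timedelta
from typing import List

# Hypothetical naming scheme: one partition per day, e.g. "logs_2016_04_25".
def partition_name(prefix: str, day: date) -> str:
    return f"{prefix}_{day:%Y_%m_%d}"

def partitions_for_range(prefix: str, start: date, end: date) -> List[str]:
    """Only the daily partitions overlapping [start, end] need to be queried."""
    if end < start:
        return []
    return [partition_name(prefix, start + timedelta(days=i))
            for i in range((end - start).days + 1)]

def expired(partitions: List[str], prefix: str, today: date, retention_days: int) -> List[str]:
    """Partitions older than the retention window (today counts as day 1)."""
    cutoff = today - timedelta(days=retention_days)
    old = []
    for name in partitions:
        day = datetime.strptime(name[len(prefix) + 1:], "%Y_%m_%d").date()
        if day <= cutoff:
            old.append(name)
    return old

print(partitions_for_range("logs", date(2016, 4, 23), date(2016, 4, 25)))
# ['logs_2016_04_23', 'logs_2016_04_24', 'logs_2016_04_25']
existing = [partition_name("logs", date(2016, 3, 26)), partition_name("logs", date(2016, 3, 27))]
print(expired(existing, "logs", today=date(2016, 4, 25), retention_days=30))
# ['logs_2016_03_26']
```

A three-day query touches three partitions instead of the whole history, which is why time-partitioned layouts suit log data and signals.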

• User interface designed for getting started with Fusion quickly and customizing it easily

• Popular features are pre-configured

• Built on AngularJS and Apache-licensed open source

• Built-in templates for viewing a variety of data sources

• Learn more: https://lucidworks.com/products/view/

• Fork on GitHub: https://github.com/lucidworks/lucidworks-view

Lucidworks View

Demo

• Index Preview

• Time Series

• Lucidworks View

• Improved Spark Streaming and data locality integration, resulting in significant performance improvements

• $FUSION_HOME/bin/spark-shell is available for rapid prototyping and testing of Spark in the Fusion environment from the command line

• Check out: http://github.com/lucidworks/spark-solr

Spark FTW

• Support for new Spark Job types:

• Aggregations, Script, Item Similarity, Quality

• Spark Job API now available at “/spark/jobs”

• Create and run your own Spark jobs

• Leverage best-in-class libraries like MLlib, Mahout and DL4J

Fusion: Creating Jobs for Engineers Since 2015
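The Item Similarity job type is only named above. One common way to compute item-to-item similarity from signals is cosine similarity over the sets of users who interacted with each item; the sketch below does that on in-memory (user, item) pairs. It is an assumption for illustration: Fusion's job runs on Spark, and its exact similarity measure is not stated here. The sample signals are made up.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt
from typing import Dict, Iterable, Set, Tuple

def item_similarity(signals: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Cosine similarity between items, treating each (user, item) signal as binary."""
    users_by_item: Dict[str, Set[str]] = defaultdict(set)
    for user, item in signals:
        users_by_item[item].add(user)
    sims: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(users_by_item), 2):
        overlap = len(users_by_item[a] & users_by_item[b])
        if overlap:  # skip pairs that no user shares
            sims[(a, b)] = overlap / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
    return sims

# Made-up click signals: (user, item).
signals = [("u1", "ipad"), ("u1", "case"), ("u2", "ipad"),
           ("u2", "case"), ("u3", "ipad"), ("u3", "charger")]
sims = item_similarity(signals)
print(sims)
```

Here "case" scores higher against "ipad" than "charger" does, because two of the three iPad users also clicked the case.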

• Spark has only basic built-in text handling (whitespace tokenization and a few others)

• Lucene has a fast, capable text analysis system built in, hence:

• We’ve made Lucene Analyzers work nicely in Spark!

• Learn more at:

• https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/

• https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/analysis/LuceneTextAnalyzer.scala

Lucene + Spark: Getting Past the Whitespace
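To see why whitespace tokenization falls short, compare it with an analyzer-style chain (tokenizer, lowercase filter, stopword filter). This pure-Python stand-in only mimics the shape of a Lucene analyzer; it is not Lucene's StandardAnalyzer or the spark-solr `LuceneTextAnalyzer` API, and the stopword list is an assumption chosen for the example.

```python
import re
from typing import List

# Small stopword list chosen for this example only.
STOPWORDS = {"a", "an", "and", "the", "of", "to"}

def whitespace_tokens(text: str) -> List[str]:
    # Roughly what a plain whitespace tokenizer produces.
    return text.split()

def analyze(text: str) -> List[str]:
    """Analyzer-style chain: split on word characters, lowercase, drop stopwords."""
    tokens = re.findall(r"\w+", text)
    lowered = (t.lower() for t in tokens)
    return [t for t in lowered if t not in STOPWORDS]

text = "The Quick, brown fox; jumps over the LAZY dog."
print(whitespace_tokens(text))
# ['The', 'Quick,', 'brown', 'fox;', 'jumps', 'over', 'the', 'LAZY', 'dog.']
print(analyze(text))
# ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
```

With whitespace splitting, "fox;" and "fox" are different terms, so counts, features, and matches fragment; an analysis chain normalizes them before they reach Spark ML code.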

• Fusion can now capture and calculate common search metrics like:

• Mean Reciprocal Rank

• Precision/Recall

• NDCG (Normalized Discounted Cumulative Gain)

• Uses the same framework as signals and aggregations, so you can track and report on these metrics over time

Speaking of Quality…
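The metrics listed above have standard definitions. A small sketch of them follows, computed over one ranked list of document IDs per query and a set (or graded map) of relevant IDs. The function names and input shapes are assumptions for illustration, not the Fusion API, and NDCG is shown with linear gains (some definitions use 2^gain - 1 instead).

```python
from math import log2
from typing import Dict, List, Sequence, Set, Tuple

def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    # 1 / position of the first relevant result, or 0 if none is returned.
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def mean_reciprocal_rank(runs: List[Sequence[str]], judgments: List[Set[str]]) -> float:
    return sum(reciprocal_rank(r, j) for r, j in zip(runs, judgments)) / len(runs)

def precision_recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    precision = hits / k  # divides by k even if fewer than k results came back
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def ndcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    def dcg(scores: List[float]) -> float:
        return sum(g / log2(i + 1) for i, g in enumerate(scores, start=1))
    actual = dcg([gains.get(doc, 0.0) for doc in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])  # best possible ordering
    return actual / ideal if ideal > 0 else 0.0

runs = [["d3", "d1", "d2"], ["d5", "d4"]]
print(mean_reciprocal_rank(runs, [{"d1"}, {"d5"}]))                  # 0.75
print(precision_recall_at_k(["d3", "d1", "d2"], {"d1", "d2"}, k=2))  # (0.5, 0.5)
print(ndcg_at_k(["d3", "d1", "d2"], {"d1": 3, "d2": 2}, k=3))
```

Because each function takes plain lists, the same code works whether the rankings come from logged signals or from a labeled test set, which is what makes tracking them over time straightforward.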

Demo: Spark shell, run k-means, index clusters:

https://github.com/lucidworks/fusion-examples/tree/master/fusion-2.3-webinar/src/main/spark-shell
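The demo runs k-means with Spark MLlib from the Fusion spark-shell, and the linked code is the reference for that. As a reminder of what the algorithm does, here is Lloyd's iteration in plain Python. It is an illustration only: it uses first-k initialization rather than MLlib's default k-means|| and does not show indexing the clusters into Solr.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def assign(points: Sequence[Point], centers: List[Point]) -> List[int]:
    # Index of the nearest center (Euclidean distance) for every point.
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]

def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: assign points to nearest centers, move centers to cluster means."""
    centers = [tuple(map(float, p)) for p in points[:k]]  # naive init: first k points
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:  # converged: assignments no longer move the centers
            break
        centers = new_centers
    return centers, assign(points, centers)

centers, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(centers, labels)
# [(0.0, 0.5), (10.0, 10.5)] [0, 0, 1, 1]
```

Each label could then be written back as a field on its document, which is the role of the "index clusters" step in the demo.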

• Next release will be 3.0 (June/July timeframe)

• Java 8 and above

• Solr 6.x

• Query Pipeline Builder

• Enhanced Machine Learning capabilities

• Preview in 2.3, but marked experimental

• Full-featured Experiment Management framework with support for multi-armed bandit optimization

• Easy import/export for moving from Dev -> QA -> Staging -> Production

Looking Ahead
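The roadmap does not say which bandit strategy the experiment framework will use. As a sketch of the general idea, here is epsilon-greedy, one of the simplest multi-armed bandit policies: it mostly routes traffic to the variant with the best observed reward and occasionally explores the others. The class, the arm names, and the reward signal (for example, 1.0 for a click) are hypothetical.

```python
import random
from typing import Dict, List

class EpsilonGreedy:
    """Epsilon-greedy bandit over named variants (e.g. two ranking configurations)."""

    def __init__(self, arms: List[str], epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls: Dict[str, int] = {arm: 0 for arm in arms}
        self.rewards: Dict[str, float] = {arm: 0.0 for arm in arms}

    def mean(self, arm: str) -> float:
        n = self.pulls[arm]
        return self.rewards[arm] / n if n else 0.0

    def choose(self) -> str:
        # Explore with probability epsilon, otherwise exploit the best observed mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        return max(self.pulls, key=self.mean)

    def update(self, arm: str, reward: float) -> None:
        self.pulls[arm] += 1
        self.rewards[arm] += reward

bandit = EpsilonGreedy(["control", "variant_b"], epsilon=0.0)  # no exploration, for a deterministic demo
bandit.update("control", 0.0)
bandit.update("variant_b", 1.0)
print(bandit.choose())
# variant_b
```

Unlike a fixed 50/50 A/B split, the bandit shifts traffic toward the better variant while the experiment is still running, at the cost of a less balanced sample.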

• Fusion 2.3 will be available the week of April 25th

• Learn more about Fusion at: http://www.lucidworks.com/products/fusion

• Learn more about Lucidworks View: https://lucidworks.com/products/view/

• Fusion docs available at http://docs.lucidworks.com

Questions?
