Lucidworks Fusion 2.3 Preview
Grant Ingersoll
@gsingers
CTO, Lucidworks
Search-Driven Everything
Customer Service
Customer Insights
Fraud Surveillance
Research Portal
Online Retail
Digital Content
Lucidworks Fusion Is Search-Driven Everything
• Drive next generation relevance via Content, Collaboration and Context
• Harness best in class Open Source: Apache Solr + Spark
• Simplify application development and reduce ongoing maintenance
CATALOG
DYNAMIC NAVIGATION AND LANDING PAGES
INSTANT INSIGHTS AND ANALYTICS
PERSONALIZED SHOPPING EXPERIENCE
PROMOTIONS
USER HISTORY
Data Acquisition
Indexing & Streaming
Smart Access API
Recommendations & Alerts
Analytics & Insights
Extreme Relevancy
Access data from anywhere to build intelligent, data-driven applications.
Fusion Architecture
REST
API
Worker Worker Cluster Mgr.
Apache Spark
Shards Shards
Apache Solr
HDFS (Optional)
Shared Config Mgmt
Leader Election
Load Balancing
ZK 1
Apache ZooKeeper
ZK N
DATABASE WEB FILE LOGS HADOOP CLOUD
Connectors
Alerting/Messaging
NLP
Pipelines
Blob Storage
Scheduling
Recommenders/Signals
…
Core Services
Admin UI
SECURITY BUILT-IN
Lucidworks View
What’s New?
http://www.lucidworks.com/products/fusion
• General Improvements
• Index Pipeline Previews
• Better Time Series Indexing
• Spark goodness
Agenda
• System:
• Improved JavaScript Stage performance
• Updated Versions for: Solr (5.4.1), Tika (1.12), Spark (1.6.1)
• Security:
• SAML-based security support
• API password-redaction capabilities
• Connectors:
• Box now supports JWT authentication, for easier setup
• Azure now supports incremental crawling
• HDFS and Windows Shares now support Kerberos authentication
• Additional controls for GitHub crawling
General Improvements
• Sample your data source and preview documents without indexing
• Build and test custom pipelines without affecting the original definitions
• Copy, save, merge pipelines upon completion
Enhanced Data Modeling via Index Pipeline Previews
• Greatly simplify the care and feeding of time-based indexes
• Point and click creation of time series shards
• Total control over number of shards and replication
• Easily defined retention and archiving policies (e.g. 30 day retention)
• Intelligent query parsing optimizes shard access
• Ideal for log data and signals
Time Series Done Right
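To make the retention and shard-access bullets above concrete, here is a minimal sketch of the general idea, not Fusion's implementation. It assumes one partition per calendar day and a hypothetical naming scheme (`logs_YYYY_MM_DD`); the function names and the `logs` prefix are illustrative only.

```python
from datetime import date, timedelta
from typing import List


def partition_name(day: date, prefix: str = "logs") -> str:
    """Hypothetical naming scheme: one partition per calendar day."""
    return f"{prefix}_{day:%Y_%m_%d}"


def partitions_for_range(start: date, end: date, prefix: str = "logs") -> List[str]:
    """Partitions a query over the inclusive range [start, end] must touch."""
    if end < start:
        return []
    days = (end - start).days + 1
    return [partition_name(start + timedelta(days=i), prefix) for i in range(days)]


def expired_partitions(existing_days: List[date], today: date,
                       retention_days: int = 30, prefix: str = "logs") -> List[str]:
    """Partitions strictly older than the retention window (candidates to archive or drop)."""
    cutoff = today - timedelta(days=retention_days)
    return [partition_name(d, prefix) for d in sorted(existing_days) if d < cutoff]


today = date(2016, 4, 25)
# A three-day query only needs three daily partitions, not the whole index.
print(partitions_for_range(date(2016, 4, 23), today))
# With 30-day retention, only partitions dated before 2016-03-26 are expired.
print(expired_partitions([date(2016, 3, 20), date(2016, 3, 26), date(2016, 4, 24)], today))
```

The point of the sketch: when the query carries a time range, only the partitions covering that range need to be searched, and retention reduces to comparing each partition's date against a cutoff.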
• User interface designed for getting started with Fusion quickly and for easy customization
• Popular features are pre-configured
• Built on AngularJS and Apache-licensed open source
• Built-in templates for viewing a variety of data sources
• Learn more: https://lucidworks.com/products/view/
• Fork on GitHub: https://github.com/lucidworks/lucidworks-view
Lucidworks View
Demo
Index Preview
Time Series
Lucidworks View
• Improved Spark streaming and data locality integration resulting in significant performance improvements
• $FUSION_HOME/bin/spark-shell available for rapid prototyping and testing of Spark in the Fusion environment using the command line
• Check out: http://github.com/lucidworks/spark-solr
Spark FTW
• Support for new Spark Job types:
• Aggregations, Script, Item Similarity, Quality
• Spark Job API now available at “/spark/jobs”
• Create and run your own Spark jobs
• Leverage best-in-class libraries like MLlib, Mahout and DL4J
Fusion: Creating Jobs for Engineers Since 2015
• Spark has very basic text handling capabilities built-in (whitespace tokenization and a few others)
• Lucene has a fast, capable text analysis system built-in, hence:
• We’ve made Lucene Analyzers work nicely in Spark!
• Learn more at:
• https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/
• https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/analysis/LuceneTextAnalyzer.scala
Lucene + Spark: Getting Past the Whitespace
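A small illustration of why whitespace tokenization falls short. This is a plain-Python stand-in, not Lucene and not the spark-solr `LuceneTextAnalyzer` API; the function names and the tiny stopword list are assumptions made for the example.

```python
import re
from typing import FrozenSet, List

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS: FrozenSet[str] = frozenset({"the", "a", "an", "and", "of"})


def whitespace_tokens(text: str) -> List[str]:
    """Split on whitespace only: case and punctuation are left untouched."""
    return text.split()


def analyzed_tokens(text: str, stopwords: FrozenSet[str] = STOPWORDS) -> List[str]:
    """A simple analysis chain: lowercase, split on non-alphanumerics, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stopwords]


text = "The Quick-Brown fox, and the lazy DOG!"
print(whitespace_tokens(text))
# ['The', 'Quick-Brown', 'fox,', 'and', 'the', 'lazy', 'DOG!']
print(analyzed_tokens(text))
# ['quick', 'brown', 'fox', 'lazy', 'dog']
```

With whitespace splitting, `fox,` and `fox` are different terms and `DOG!` never matches `dog`. A real analyzer chain (tokenizer plus filters) normalizes these before the terms reach any downstream model.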
• Fusion can now capture and calculate common search metrics like:
• Mean Reciprocal Rank
• Precision/Recall
• NDCG (Normalized Discounted Cumulative Gain)
• Uses the same framework as signals and aggregations, meaning you can easily track and report across time
Speaking of Quality…
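For reference, the metrics named above can be sketched in a few lines. These are the standard textbook definitions in plain Python, not Fusion's implementation; the function names, the document IDs, and the `gains` mapping (document ID to graded relevance) are assumptions for the example.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def precision_recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision@k (hits / k) and recall@k (hits / number of relevant docs)."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def ndcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering, both cut at k."""
    dcg = sum(gains.get(doc, 0.0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# First relevant hit at rank 1 for one query and rank 2 for the other.
print(mean_reciprocal_rank([(["a", "b"], {"a"}), (["a", "b"], {"b"})]))  # 0.75
print(precision_recall_at_k(["d1", "d2", "d3", "d4"], {"d1", "d4", "d9"}, k=2))
print(ndcg_at_k(["a", "b"], {"a": 2.0, "b": 1.0}, k=2))  # 1.0 (ideal ordering)
```

Computing these per query and aggregating them per day or per release is what makes it possible to track relevance across time.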
Demo
Spark Shell, run k-Means, index clusters:
https://github.com/lucidworks/fusion-examples/tree/master/fusion-2.3-webinar/src/main/spark-shell
• Next Release will be 3.0 (June/July timeframe)
• Java 8 and above
• Solr 6.x
• Query Pipeline Builder
• Enhanced Machine Learning capabilities
• Preview in 2.3, but marked experimental
• Full-featured Experiment Management framework with support for multi-armed bandit optimization
• Easy import/export for moving from Dev -> QA -> Staging -> Production
Looking Ahead
• Fusion 2.3 will be available the week of April 25th
• Learn more about Fusion at: http://www.lucidworks.com/products/fusion
• Learn more about Lucidworks View: https://lucidworks.com/products/view/
• Fusion docs available at http://docs.lucidworks.com
Questions?