Page 1: II-SDV 2015, 20 - 21 April, in Nice

An Overview of the Enterprise Search Market, & Current Best Practices

Iain Fletcher

[email protected]

April 20, 2015

Page 2:

Agenda

• A brief overview of the current enterprise search market
• The convergence of search with analytics disciplines
• Likely future architectures for search applications

Page 3:

Search engines continue to proliferate…

Page 4:

High-level Search Engine Classifications

1. Part of a portfolio; many are recently acquired technologies
– E.g. SharePoint, HP Autonomy, IBM/Vivisimo, Dassault/Exalead
2. Stand-alone specialists, often bought to address specific apps
– E.g. GSA, Coveo, Attivio, Sinequa, Recommind
3. Open source, with or without support or proprietary add-ons
– Raw: e.g. Lucene, Solr, Elasticsearch
– With support/add-ons: e.g. LucidWorks, Cloudera Search, Elastic
4. Cloud-based services, typically based on open source technology
– E.g. Amazon CloudSearch, Microsoft Azure Search

Page 5:

Market Observations

The dominant market share is with SharePoint, open source, and the Google Search Appliance

• SharePoint 2013 search is credible, and bundled
– Search teams are under pressure to use it, or to provide a compelling reason to do otherwise
• Solr and Elasticsearch are robust and reliable
– Thanks to very widespread deployment
• The Google brand sells search – and a lot of GSAs have been shipped during the past few years

Page 6:

Functional Observations

• Core indexing / searching is generally fast and reliable
– Search is a maturing technology
• Key differences remain in peripheral functionality, such as content processing prior to indexing. For example:
– Coveo, Attivio, and Sinequa all have well-developed indexing pipelines, UI tools, and a range of data connectors
– SharePoint and GSA have limited content processing functionality and rely on third parties for connectivity
– Solr, Elasticsearch, AWS CloudSearch, and Azure Search don’t provide a formal indexing pipeline, UI, or connectors

Page 7:

Further Observations

• The search engines with less focus on peripheral issues (such as content processing and connectivity) have dominant market share
• Connectivity remains challenging, especially when combined with continual data growth
• The movement of data sets to the cloud adds further complexity
– Hybrid indexing environments will be with us for some years

Page 8:

Content Processing / Text Analysis Examples

• Normalization
– Names, dates, synonyms, spelling
• Entity identification and resolution
• Additional metadata from content analysis
• Categorization
• Document vector extraction
• Splitting and concatenation
• Duplicate and near-duplicate detection
• Link analysis
• Ingesting external signals
• Security enforcement and analysis

[Diagram labels: Index, security, category, metadata]
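
Two of the steps listed above, normalization and near-duplicate detection, can be sketched in a few lines. This is a minimal illustration, not the method used by any product named in the talk: the synonym table, the word-shingle size, and the 0.8 similarity threshold are all assumptions chosen for the example.

```python
import re

# Hypothetical synonym table; a real deployment would load a curated thesaurus.
SYNONYMS = {"colour": "color", "organisation": "organization"}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, and map synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(SYNONYMS.get(tok, tok) for tok in tokens)


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of an already normalized text."""
    words = text.split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    """Flag two documents as near-duplicates at or above the assumed threshold."""
    sim = jaccard(shingles(normalize(doc_a)), shingles(normalize(doc_b)))
    return sim >= threshold
```

Comparing every pair of documents this way does not scale to large collections; it is shown only to make the idea concrete.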

Page 9:

Future Directions

So what will search architectures look like in the future?

Important Influences:

• The need for organizational and analytical agility
• The convergence of search and (“big data”) analytics
• Continual growth in data volumes, and churn in repository / storage fashions

Page 10:

Converging Architectures

Let’s take a brief look at:

1. The “Big Data Architecture”, evangelized by IBM, Cloudera, etc.
2. Contemporary Search Architectures

Background Info

Page 11:

The Big Data Architecture

Designed for Structured Data

Page 12:

The Traditional Search Architecture

[Diagram: content sources (Employee Directory, CMS, File Share, etc.) feeding an integrated search engine made up of Connectors, Index Pipeline, Search Index, and UI]

Designed for Unstructured Content

Page 13:

The Traditional Search Architecture

[Diagram: content sources (Employee Directory, CMS, File Share, etc.) feeding an integrated search engine made up of Connectors, Index Pipeline, Search Index, and UI; labeled RE-INDEX]

• Indexing at a few documents per second?
• There are only 2.6 million seconds in a month
• If you change something significant in the index pipeline, you will need to re-index
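
A quick check of the arithmetic behind the re-indexing concern: a 30-day month has 30 × 86,400 = 2,592,000 seconds, roughly the 2.6 million quoted. The sketch below turns that into a re-index estimate. The rate of 3 documents per second and the 10 million document corpus are assumed figures for illustration, not numbers from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000, i.e. about 2.6 million


def docs_per_month(docs_per_second: float) -> float:
    """Documents that can be indexed in a 30-day month at a sustained rate."""
    return docs_per_second * SECONDS_PER_MONTH


def reindex_days(doc_count: int, docs_per_second: float) -> float:
    """Days needed to re-index doc_count documents at a sustained rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return doc_count / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed figures: 3 documents per second, 10 million documents.
    print(f"{docs_per_month(3):,.0f} documents per month")
    print(f"{reindex_days(10_000_000, 3):.1f} days to re-index")
```

At the assumed rate, a full month of continuous indexing covers under 8 million documents, so a 10 million document corpus takes more than a month to rebuild after a significant pipeline change.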

Page 14:

A Better Search Architecture

• Re-indexing rates greatly improved
• “Touch-time” with repositories can be managed autonomously

[Diagram components: Content Sources (Employee Directory, CMS, etc.), Connectors, Staging Repository, Content Processing (Iterative Development), RE-INDEX, Search Engine (Index Pipeline, Search Index)]
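
One reading of this architecture is that connectors copy raw content into the staging repository once, and every later re-index reads from staging rather than from the original sources. The sketch below illustrates that reading only. `StagingRepository`, `crawl_sources`, and `build_index` are hypothetical names, and the in-memory dictionary stands in for whatever store a real system would use.

```python
from typing import Callable, Dict, Iterable, Tuple

Pipeline = Callable[[str], str]


class StagingRepository:
    """Holds raw content copied from the sources, keyed by document id."""

    def __init__(self) -> None:
        self._raw: Dict[str, str] = {}

    def ingest(self, fetched: Iterable[Tuple[str, str]]) -> None:
        """Store (doc_id, raw_text) pairs delivered by a connector."""
        for doc_id, raw_text in fetched:
            self._raw[doc_id] = raw_text

    def items(self) -> Iterable[Tuple[str, str]]:
        return self._raw.items()


def build_index(staging: StagingRepository, pipeline: Pipeline) -> Dict[str, set]:
    """Run the pipeline over staged content and build a term -> doc ids map."""
    index: Dict[str, set] = {}
    for doc_id, raw_text in staging.items():
        for term in pipeline(raw_text).split():
            index.setdefault(term, set()).add(doc_id)
    return index


SOURCE_CALLS = 0  # counts how often the (slow) sources are touched


def crawl_sources():
    """Stand-in for connectors to a CMS, file share, or employee directory."""
    global SOURCE_CALLS
    SOURCE_CALLS += 1
    return [("doc1", "Enterprise Search"), ("doc2", "Search Analytics")]


staging = StagingRepository()
staging.ingest(crawl_sources())  # the only contact with the sources

index_v1 = build_index(staging, lambda text: text)          # first pipeline
index_v2 = build_index(staging, lambda text: text.lower())  # changed pipeline
```

Changing the pipeline and rebuilding the index leaves `SOURCE_CALLS` at 1, which is the point of the design: iteration on content processing is decoupled from the speed of the repositories.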

Page 15:

The Future Architecture?

[Diagram components: Content Sources (Employee Directory, CMS, etc.), Connectors, Hadoop, Staging Repository, Content Processing (Iterative Development), RE-INDEX, Search Engine (Index Pipeline, Search Index)]

• This environment will encourage ever more sophisticated content processing
• We expect much innovation in text analytics during the next few years
• Driven by cheap, easily available processing power
• The deliverable is a richer search index

Page 16:

The Future Architecture

[Diagram components: Content Sources (Employee Directory, CMS, etc.), Connectors, Hadoop, Staging Repository, Content Processing (Iterative Development), RE-INDEX, Search Engine (Index Pipeline, Search Index)]

• Google.com has worked something like this for 10+ years

Page 17:

An Integrated Search/Analytics Architecture

[Diagram components: Content Sources (CMS, File system), Connectors / Crawlers, Data Sources (Data Warehouse, Logfiles, etc.), ETL, Hadoop, Staging Repository, Content Processing (Iterative Development), Rapid & ad hoc Indexing, OSINT Search App., Search App., Analysis App. (two instances)]

• Encourages agile exploitation of data and content resources

Page 18:

Summary

• Search and Analytics are tending towards the same architecture
• Autonomous connectivity and content processing systems simplify and de-risk projects
• The “search index” is a mature technology, and becoming a commodity
– Thanks to open source alternatives setting high standards
• The centre of attention is shifting from the index to the content preparation
– This perhaps fits well with the profile of dominant market leaders: SharePoint, GSA, Solr, Elasticsearch…

Page 19:

Conclusion

• The foundation of great search and analytical applications is a clean, rich and detailed index
• Much of the innovation during the next few years will be in content analytics
– The architecture discussed makes it easy to adopt new ideas and products
– And it promotes agility, experimentation, and innovation
• In a data-driven world, agility is vital

Page 20:

The analyst quote….

And finally….

“Enterprise Search Can Bring Big Data Within Reach”

• Multiple, purpose-built indexes that are derived from enriched content are necessary.

http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/

* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 Blog
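
The idea of multiple, purpose-built indexes derived from the same enriched content can be illustrated with a small sketch. The record fields (`text`, `category`, `entities`) and the sample values are hypothetical, chosen only to show one enriched data set feeding several indexes.

```python
from collections import defaultdict

# Hypothetical enriched records; the fields are illustrative, not from the source.
ENRICHED = [
    {"id": "d1", "text": "quarterly patent report",
     "category": "legal", "entities": ["Acme"]},
    {"id": "d2", "text": "patent search tips",
     "category": "research", "entities": ["Acme", "Globex"]},
]


def build_index(records, key_fn):
    """Map every key produced by key_fn(record) to the ids of matching records."""
    index = defaultdict(set)
    for record in records:
        for key in key_fn(record):
            index[key].add(record["id"])
    return dict(index)


# Three indexes built from the same enriched content, each for a different purpose.
full_text_index = build_index(ENRICHED, lambda r: r["text"].split())
entity_index = build_index(ENRICHED, lambda r: r["entities"])
category_index = build_index(ENRICHED, lambda r: [r["category"]])
```

Because the enrichment happens once, upstream of the indexes, adding another purpose-built index is a matter of supplying another key function.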

Page 21:

Thank you!

Page 22:

Spare Slides

Page 23:

Reference Architecture

[Diagram components: Content sources, Connectors, Staging Repository, Big Data Framework, Content Processing Pipelines, Semantics, Text Mining, Quality Metrics, Indexes, Query parsing, Search Engine, Web Browser]
