
A New Content Processing Framework for Search Applications
Iain Fletcher, ifletcher@searchtechnologies.com

Feb 25, 2016

Transcript
Page 1: A New Content Processing Framework for Search Applications Iain Fletcher ifletcher@searchtechnologies.com

A New Content Processing Framework for Search Applications

Iain Fletcher
ifletcher@searchtechnologies.com

Page 2: Agenda

• Briefly About Search Technologies
• Key Issues for Enterprise Search
• A New Content Processing Framework for Search Applications
• How do we use it?
• What does it look like?
• Use case example

Page 3: Search Technologies overview

• The leading IT services company focused on search engines
  • Consulting
  • Implementation
  • Managed services
• Technology independent, working with most of the leading search engines
• 90 staff, 250+ customers

Page 4: Search Technologies overview

San Diego, CA
San Jose, CR
Herndon, VA
Ascot, UK
Boston, MA
Cincinnati, OH

Page 5: Executive team

Executive and enterprise search industry experience (number of years in the search engine industry)

• Kamran Khan, President & CEO: 18 years (International Sales, VP Sales, Executive)
• John Steinhauer, VP Technology: 16 years (Development Management, Project Management, Executive)
• Paul Nelson, Chief Architect: 22 years (Development, Innovation, Architecting, Dev. Management)
• Graham Charlesworth, VP Europe: 16 years (Business Development, VP Sales, Executive)
• Phil Lewis, Tech. Director, Europe: 19 years (Development, Innovation, Architecting, Project Management)
• Dennis Tran, VP & Founder: 21 years (International Sales, VP Sales)
• John Back, VP Sales: 15 years (Sales, Federal Sales Director)
• Iain Fletcher, VP Marketing: 16 years (International Sales, Product Management, VP Marketing)

Page 7: A New Content Processing Framework for Search Applications

Page 8: Agenda

• Briefly About Search Technologies
• Key Issues for Enterprise Search
• A New Content Processing Framework for Search Applications
• How do we use it?
• What does it look like?
• Use case example

Page 9: Enterprise Search - An Indifferent Reputation

• Major surveys show that no progress has been made during the last 10 years
• Searchers are successful in finding what they seek 50% of the time or less
  • 2001, IDC, “Quantifying Enterprise Search”
• More than half cannot find the information they need using their Enterprise search system
  • 2011, MindMetre/SmartLogic, “Mind the Enterprise Search Gap”

Page 10: Search Fundamentals

Page 11: Metadata Supports Relevance Ranking

Page 12: Metadata Supports Relevance Ranking

Supported by great metadata!
• Title
• Meta description
• URL
• Inbound links
• Alt tag text
• Etc.
• Provided for free by millions of SEO practitioners
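As a rough illustration of how metadata fields can support relevance ranking, the sketch below scores a document by counting query-term matches per field and weighting each field differently. The field names and weights are assumptions made for this example only; they are not taken from the presentation or from any particular search engine.

```python
import re

# Hypothetical field weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "description": 2.0, "url": 1.5, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(document, query):
    """Sum weighted counts of query terms across the document's fields."""
    terms = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = tokenize(document.get(field, ""))
        for term in terms:
            total += weight * words.count(term)
    return total


with_metadata = {"title": "Patent search guide", "body": "How to search"}
body_only = {"body": "A patent is a right"}
ranked_higher = score(with_metadata, "patent") > score(body_only, "patent")
```

A document with no usable title or description can only score through its body text, which is one way poor metadata weakens ranking.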

Page 13: Key Issues

• Almost all modern search functions are driven by data structure

Page 14: Key Issues

• The majority of serious problems in serious search systems are caused by data quality issues

Also...
• “Big Data” and BI from unstructured data will face the same challenges
• Can you trust an analysis if you are unsure of data provenance?

Page 15: Data quality examples

• The subscription portal caught out by template information
• The Intranet search skewed by a new piece of hardware
• The Intranet search where great quality was the problem!

Page 16: Key Issues

• Data structure and quality issues are addressed in the indexing pipelines of search engines
  • Cleaning, enriching, normalizing, granularizing...
• It is about process as much as technology
  • And data constantly evolves
• Sometimes the built-in indexing pipeline is not good enough (issues with scale, flexibility or transparency)
  • Some search engines don’t really have one
• We’ve written our own
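The cleaning, normalizing and granularizing steps named on this slide can be sketched as three small functions chained together. This is a minimal illustration under stated assumptions, not the pipeline described in the talk: the function names are hypothetical, the tag stripping and sentence splitting are deliberately naive, and the enriching step is left out.

```python
import re
import unicodedata


def clean(text):
    """Replace HTML-like tags with spaces, then collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()


def normalize(text):
    """Apply Unicode NFKC normalization so equivalent characters compare equal."""
    return unicodedata.normalize("NFKC", text)


def granularize(text):
    """Split the text into sentence-level units (naive split after ., ! or ?)."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text) if part]


def process(raw):
    """Run the three stages in order on one raw document."""
    return granularize(normalize(clean(raw)))


units = process("<p>First  point.</p> <p>Second point!</p>")
```

Keeping each stage as a separate function means a data quality problem can be traced to the stage that introduced it.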

Page 17: Agenda

• Briefly About Search Technologies
• Key Issues for Enterprise Search
• A New Content Processing Framework for Search Applications
• How do we use it?
• What does it look like?
• Use case example

Page 18: Document Processing Methodology for Search (DPMS)

• The Philosophy
  • Understand the Document Model
  • Understand the User Model
    • Includes business-level requirements
  • Create the Search Engine Model
    • Search = the pivot point between User and Data
  • Document everything

Page 19: DPMS – The Methodology

[Diagram: three phases]
1 Assessment: Assessment (Search Technologies Architect and Business Analyst); Assessment Report (expert assessment and recommendations)
2 Detailed Analysis: DPMS Analysis (Knowledge Engineer, Business Analyst, etc.); DMDs; Review (Architect, Domain Experts, Peers)
3 Execution: Implementation (Developer); Validation; Validate DMDs; Aspire; Search Engine

Page 20: DPMS – The Implementation

Page 21: Introducing “Aspire”

• Think of it as a stand-alone indexing pipeline with a framework + component architecture

• Framework built for scalability, performance and flexibility – designed to use cloud elasticity

• Components built to be autonomous and transparent

Page 22: Technology Suite

• 100% Java
• OSGi™ (see www.osgi.org)
  • The Dynamic Module System for Java™
• Apache Felix
  • Open source implementation of OSGi
• Jetty
  • Embedded HTTP server
• Maven & Maven Repositories
  • For component deployment

Page 23: Component Configuration

• Any number of document processing pipelines can be used in an application
  • Disparate data sources will need different treatment
  • Components can be shared where appropriate
  • Configurations are easy to change
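One way to picture several pipelines sharing components is a configuration that maps each data source to an ordered list of components. The sketch below is an assumption-laden illustration in Python, not Aspire's actual configuration format: the source names, component names and the `run_pipeline` helper are all hypothetical.

```python
def strip_whitespace(doc):
    """Shared component: trim leading and trailing whitespace from the text."""
    return {**doc, "text": doc["text"].strip()}


def lowercase(doc):
    """Component used only where case-insensitive text is wanted."""
    return {**doc, "text": doc["text"].lower()}


def tag_source(name):
    """Build a component that records which data source a document came from."""
    def component(doc):
        return {**doc, "source": name}
    return component


# Hypothetical configuration: one pipeline per data source.
# strip_whitespace appears in both pipelines, so it is a shared component.
PIPELINES = {
    "intranet": [strip_whitespace, lowercase, tag_source("intranet")],
    "file_share": [strip_whitespace, tag_source("file_share")],
}


def run_pipeline(source, doc):
    """Pass the document through each component configured for its source."""
    for component in PIPELINES[source]:
        doc = component(doc)
    return doc


intranet_doc = run_pipeline("intranet", {"text": "  Hello World "})
file_share_doc = run_pipeline("file_share", {"text": "  Hello World "})
```

Changing how a source is treated then means editing its list in the configuration rather than rewriting a component.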

Page 24: Component autonomy

• Components communicate via XML
• Each component has a known and transparent input and output, and can be tested in isolation
• This simplifies problem diagnosis, promotes transparency and controls cost-of-ownership
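A component that takes XML in and returns XML out can be tested on its own with a single input string. The sketch below assumes a made-up document shape (a `<doc>` element with a `<content>` child) and a hypothetical `add_word_count` component; the real XML schema used between Aspire components is not given in the slides.

```python
import xml.etree.ElementTree as ET


def add_word_count(xml_in):
    """Read a <doc> element, append a <wordCount> child, and return the XML as text."""
    doc = ET.fromstring(xml_in)
    content = doc.findtext("content", default="")
    ET.SubElement(doc, "wordCount").text = str(len(content.split()))
    return ET.tostring(doc, encoding="unicode")


# Isolation test: one known input, one inspectable output, no other components involved.
xml_out = add_word_count("<doc><content>enterprise search pipeline</content></doc>")
word_count = ET.fromstring(xml_out).findtext("wordCount")
```

Because the input and output are both plain XML, the output of any stage can be saved and inspected when diagnosing a problem.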

Page 25: Data Quality Monitoring

• Components have built-in quarantine systems to monitor data quality
  • Content is constantly evolving
  • This provides transparency and enables content issues to be diagnosed and resolved faster
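The quarantine idea can be sketched as a wrapper that sets aside any document that fails processing or validation, together with the reason, instead of passing it downstream. The class and function names below are hypothetical and the design is an assumption for illustration; the slides do not describe how Aspire implements quarantining.

```python
class QuarantiningComponent:
    """Wrap a processing step and collect documents that fail it."""

    def __init__(self, name, process, validate):
        self.name = name
        self.process = process      # function: doc -> processed doc
        self.validate = validate    # function: doc -> None if fine, else a reason string
        self.quarantine = []        # list of (original doc, reason) pairs

    def run(self, docs):
        """Return the documents that passed; store the rest in quarantine."""
        passed = []
        for doc in docs:
            try:
                result = self.process(doc)
                problem = self.validate(result)
            except Exception as exc:
                problem = f"{type(exc).__name__}: {exc}"
            if problem:
                self.quarantine.append((doc, problem))
            else:
                passed.append(result)
        return passed


def require_title(doc):
    """Example validation rule: a document must have a non-empty title."""
    return None if doc.get("title", "").strip() else "missing title"


component = QuarantiningComponent("title-check", lambda doc: doc, require_title)
good = component.run([{"title": "Annual report"}, {"title": ""}, {"body": "no title"}])
```

Reviewing the quarantine list over time shows which content sources are drifting in quality.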

Page 26: The Component Library

• Search Technologies maintains a library of components
  • Currently there are more than 70
  • Components can be as simple as 3 lines of Groovy script, or complex, 3rd party technologies
• Many applications can be addressed using existing components + configuration

Page 27: Component Upgrading

• Components can be upgraded in-situ from a cloud-based service, without stopping/restarting the system
• Helpful in the maintenance of complex or mission-critical systems

Page 28: Component control

• Every component has its own control / status page

Page 29: A very simple example

Page 30: Security expansion example

Page 31: Patent Assignee Name Normalization
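The slide content for this example did not survive in the transcript. As a rough idea of what assignee name normalization can involve, the sketch below maps spelling variants of a company name to one key by lowercasing, dropping punctuation and removing trailing legal-form suffixes. The suffix list, the function name and the company names are all hypothetical and are not taken from the talk.

```python
import re

# Hypothetical suffix list; a production list would be longer and language-aware.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation",
    "co", "company", "ltd", "limited", "llc", "gmbh",
}


def normalize_assignee(name):
    """Lowercase, strip punctuation, and drop trailing legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


variants = ["Acme Widgets, Inc.", "ACME WIDGETS CORPORATION", "Acme Widgets Co., Ltd."]
keys = {normalize_assignee(v) for v in variants}
```

With the variants collapsed to one key, a navigator can count all of a company's patents under a single entry.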

Page 32: Complexity example

• CPA Global Discover
• The world’s leading patent research portal
• 80 million patents from 95 patent offices
• More than a dozen navigators built
• Numerous graphical search results display options
• Whole document comparison features

Page 33: In Summary

• Many applications today don’t need this level of diligence
  • But as data and data dynamism grow, more will
• A stand-alone unstructured content processing system can serve multiple applications, and makes sense for some companies
• Method. Diligence. Transparency – it’s not rocket science...
• Applying this approach to enterprise search is a key part of moving user satisfaction forward during the next few years

Page 34: Thank You!

Iain Fletcher
ifletcher@searchtechnologies.com
http://uk.linkedin.com/in/iainfletcher