Kelly Boccia, Abi Natarajan, Konstantin Livitski, Senthil Anand Subbanan, Meyyappan Meyyappan


Jan 01, 2016


Agenda

• Business Requirements
  o Client Overview
  o Business Problem
  o Business Goal
  o Solution and Scope
• Technical Specification
  o System Context
  o Architecture Overview
  o Components & Modules
  o Security Model
  o Document indexing
  o Search Explained
• Implementation Plan
  o Resource & Costs
  o Development Environment
  o Production Environment
  o Success Criteria
• Prototype
• Q&A


Multi-National Manufacturing & Sales Corporation

Business Growth
  - Multiple Applications
  - Multiple Repositories

Business Problem


Business Goal

Organize Intellectual Capital and Assets

Accessibility - Connect knowledge workers securely to relevant information

Productivity - Increase productivity and reduce re-work by leveraging knowledge and expertise

Client Overview


Solution

Enterprise Knowledge Management Platform


System Context


Components & Modules


Architectural Overview


Security Model

• Integrated with GLOCO's existing security infrastructure
• Any access requires authentication
• To follow a link in search results, a user may need additional authorization for repository access


Document indexing

• A document is anything that a search result can point to
• Documents are external to the search engine
• Documents include text and metadata
• Lucene sees each document as a set of named fields
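To make the "set of named fields" idea concrete, here is a minimal Python sketch. The `Field` and `Document` classes, the field names, and the URL are hypothetical illustrations, not Lucene's API; they only loosely mirror Lucene's distinction between fields that are stored (returned with a hit) and fields that are indexed (broken into searchable terms).

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """One named field of a document (hypothetical model, not Lucene's API)."""
    name: str
    value: str
    stored: bool = False   # kept verbatim in the per-document record
    indexed: bool = True   # analyzed into terms for the inverted index


@dataclass
class Document:
    fields: list[Field] = field(default_factory=list)

    def stored_record(self) -> dict[str, str]:
        """Fields returned with a search hit, such as the URL."""
        return {f.name: f.value for f in self.fields if f.stored}

    def indexable_text(self) -> dict[str, str]:
        """Fields handed to the analyzer; they need not be kept afterwards."""
        return {f.name: f.value for f in self.fields if f.indexed}


# Hypothetical document: the text lives in an external repository,
# the search engine only sees these fields.
doc = Document([
    Field("url", "http://repo.example/reports/42", stored=True, indexed=False),
    Field("title", "Blood glucose monitoring study", stored=True),
    Field("body", "Full text extracted from the repository", stored=False),
    Field("tags", "diabetes"),
])
print(doc.stored_record())
# {'url': 'http://repo.example/reports/42', 'title': 'Blood glucose monitoring study'}
```

The URL is stored but not indexed because it is only needed to point back at the external document, while the body is indexed but not stored because the original text remains in its repository.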


How search works

• Lucene sees each document as a set of named fields
• A record is created for each document to store some fields
  o The URL is usually a stored field
• The main index is keyed by search term (i.e., inverted)
  o Typical text fields are tokenized, filtered, and stemmed into terms
  o Indexed fields may be discarded after processing
  o For each term, a list of document IDs is stored to help locate records
  o Term frequency and proximity (position) information are also stored
• Search involves retrieving document IDs by term, and then stored fields by document ID
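The indexing and search flow above can be sketched as a toy inverted index in Python. This is an illustration under simplifying assumptions, not how Lucene is implemented: the stop-word list and the suffix-stripping `analyze` function are crude, hypothetical stand-ins for a real analyzer chain (tokenizer, filters, stemmer), and ranking is by raw term frequency only.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real analyzer would be configurable.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on"}


def analyze(text: str) -> list[str]:
    """Tokenize, drop stop words, and apply a crude suffix-stripping stem."""
    terms = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOP_WORDS:
            continue
        for suffix in ("ing", "es", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        terms.append(token)
    return terms


class InvertedIndex:
    def __init__(self):
        # term -> {doc_id: [positions]}; len(positions) is the term frequency,
        # and the positions themselves would support proximity matching.
        self.postings = defaultdict(dict)
        # doc_id -> stored fields (e.g. the URL) returned with each hit
        self.records = {}

    def add(self, doc_id, text, stored):
        self.records[doc_id] = stored
        # The raw text is not kept; only its terms end up in the index.
        for pos, term in enumerate(analyze(text)):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def search(self, query):
        """Look up doc IDs per term, intersect them, then fetch stored fields by ID."""
        terms = analyze(query)
        if not terms:
            return []
        hits = set.intersection(*(set(self.postings.get(t, {})) for t in terms))

        def score(doc_id):  # total term frequency; positions are not used here
            return sum(len(self.postings[t][doc_id]) for t in terms)

        ranked = sorted(hits, key=lambda d: (-score(d), d))
        return [self.records[d] for d in ranked]


idx = InvertedIndex()
idx.add(1, "Blood glucose levels in diabetes patients", {"url": "http://repo.example/doc1"})
idx.add(2, "BP testing procedures", {"url": "http://repo.example/doc2"})
idx.add(3, "Blood glucose testing and blood pressure", {"url": "http://repo.example/doc3"})
print(idx.search("blood glucose"))  # doc3 ("blood" appears twice) ranks above doc1
print(idx.search("bp testing"))     # only doc2 contains both terms
```

Because the query goes through the same `analyze` step as the documents, "testing" in the query matches "testing" in a document through the shared stem "test".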


Resource / Cost Plan

• 21 weeks total effort
• 13-member team including GLOCO and Innova
• Innova supports the full SDLC, with phases: Solution Outline, High Level Design, Detailed Design, Build / Test / Deploy, and Post Production Support


SLATES - Development Environment

• Developer workstations to host virtual images
• Search servers: a fully configured environment for unit testing and shared development


SLATES - QA / Test and Production

• Sticky load balancer to remember which Tomcat instance served each user
• Each search server holds multiple instances
• Shared / cached network storage to share the index
• Similar configuration for both the QA and Production environments
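A minimal sketch of the "sticky" behavior, assuming the goal is to keep each user's session state on one Tomcat instance. The `StickyBalancer` class and the backend names are hypothetical; real load balancers usually implement stickiness with a session cookie rather than an in-memory table, and they also handle backend failure, which this sketch does not.

```python
from itertools import cycle


class StickyBalancer:
    """Pin each session to the backend that first served it (hypothetical sketch)."""

    def __init__(self, backends):
        self._next = cycle(backends)  # round-robin assignment for new sessions
        self._pinned = {}             # session_id -> backend

    def route(self, session_id):
        # First request of a session picks the next backend; later requests reuse it.
        if session_id not in self._pinned:
            self._pinned[session_id] = next(self._next)
        return self._pinned[session_id]


lb = StickyBalancer(["tomcat-a", "tomcat-b"])
print(lb.route("alice"))  # tomcat-a
print(lb.route("bob"))    # tomcat-b
print(lb.route("alice"))  # tomcat-a again: same session, same server
```

Since the index itself sits on shared network storage, any instance can answer a query; stickiness only matters for state held in the session.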


Success Criteria and Benchmarks

The most important project success criteria are:
• 10% time and resource savings on certain R&D activities
• 75% positive feedback on user surveys
• 50% of the target user group actively using the system
• 5% of available documents have user-defined tags


User 1 searches for the keyword ‘blood glucose’


User 1 gets back the results with the keyword ‘blood glucose’


User 1 adds tag ‘diabetes’ to a result


Tag ‘diabetes’ is immediately available for searching


User 2 searches for keyword ‘diabetes’


User 2 gets back a result for keyword ‘diabetes’


User 2 clicks the keyword ‘bp testing’ in the tag cloud


User 2 gets more results for keyword ‘bp testing’


Thank you!

Innova would like to thank:

Zoya Kinstler
Jeff Parker
Basem Naseim
Valar Jayaprakash
Classmates at Harvard University Extension School


Questions?


Reference Slides


Index Growth

• Index size is a percentage of the document corpus size
• Maintenance trade-off:
  o Segment merges are expensive: all segments are loaded and a new one is written
  o A fragmented index is expensive to query: all segments must be read
• Lucene index segments are write-once, which helps with concurrency
• Updates are done as delete + re-add, so updates should be batched
  o Direct tagging is inefficient, since every tag change is a delete + re-add of the whole document
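The consequences of write-once segments can be shown with a toy Python model. It is a simplification under stated assumptions, not Lucene's implementation: Lucene tracks deletions per segment and merges selectively under a merge policy, whereas here a tombstone set and a merge-everything step stand in for both. `SegmentedIndex` is a hypothetical name. The usage compares tagging documents one at a time against batching the same tag updates.

```python
class SegmentedIndex:
    """Toy model of write-once segments with delete + re-add updates (hypothetical)."""

    def __init__(self):
        self.segments = []    # each segment: dict doc_id -> text, never edited once written
        self.deleted = set()  # (segment_no, doc_id) tombstones
        self._buffer = {}     # pending adds, written out together as one new segment

    def update(self, doc_id, text):
        # No in-place edit: tombstone every old copy, then buffer the new version.
        for seg_no, seg in enumerate(self.segments):
            if doc_id in seg:
                self.deleted.add((seg_no, doc_id))
        self._buffer[doc_id] = text

    def flush(self):
        if self._buffer:
            self.segments.append(dict(self._buffer))
            self._buffer = {}

    def live_docs(self):
        # A query must visit every segment and skip tombstoned entries.
        return {doc_id: text
                for seg_no, seg in enumerate(self.segments)
                for doc_id, text in seg.items()
                if (seg_no, doc_id) not in self.deleted}

    def merge(self):
        # Expensive: read all segments, write one new segment of live documents.
        self.segments = [self.live_docs()]
        self.deleted = set()


unbatched = SegmentedIndex()
for i in range(3):
    unbatched.update(f"doc{i}", "report text")
unbatched.flush()
for i in range(3):  # tag one document at a time, flushing after each tag
    unbatched.update(f"doc{i}", "report text diabetes")
    unbatched.flush()
print(len(unbatched.segments))  # 4

batched = SegmentedIndex()
for i in range(3):
    batched.update(f"doc{i}", "report text")
batched.flush()
for i in range(3):  # same tag updates, one flush for the whole batch
    batched.update(f"doc{i}", "report text diabetes")
batched.flush()
print(len(batched.segments))  # 2
```

Both versions leave the same three live documents, but the unbatched one produces a more fragmented index that costs more to query until a merge rewrites it.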


Scalability

(Source: Mark Miller, "Scaling Lucene and Solr", Lucid Imagination, 2010)

• Query volume is scaled by replication
• Index size and indexing load are scaled by sharding
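A minimal sketch of the two scaling axes, assuming documents are assigned to shards by a stable hash of their ID and queries are spread across replicas round-robin. `Cluster` and `shard_for` are hypothetical names, and real systems add distributed scoring, result merging, and failover on top of this.

```python
import hashlib
from itertools import cycle


def shard_for(doc_id: str, num_shards: int) -> int:
    # Stable hash; Python's built-in hash() of a str varies between runs.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


class Cluster:
    """Hypothetical cluster: num_shards shards, each copied to replicas_per_shard replicas."""

    def __init__(self, num_shards: int, replicas_per_shard: int):
        self.num_shards = num_shards
        # replicas[s][r] is replica r of shard s: doc_id -> text
        self.replicas = [[{} for _ in range(replicas_per_shard)] for _ in range(num_shards)]
        self._next_replica = [cycle(range(replicas_per_shard)) for _ in range(num_shards)]

    def index(self, doc_id: str, text: str) -> None:
        # Sharding: a document lives on exactly one shard.
        shard = shard_for(doc_id, self.num_shards)
        # Replication: every replica of that shard must index it too.
        for replica in self.replicas[shard]:
            replica[doc_id] = text

    def search(self, term: str) -> list[str]:
        # A query fans out to all shards but only one replica per shard,
        # so adding replicas adds query capacity.
        hits = []
        for shard in range(self.num_shards):
            replica = self.replicas[shard][next(self._next_replica[shard])]
            hits.extend(d for d, text in replica.items() if term in text.split())
        return sorted(hits)


cluster = Cluster(num_shards=2, replicas_per_shard=3)
for i in range(10):
    cluster.index(f"doc{i}", "diabetes" if i % 2 else "bp testing")
print(cluster.search("diabetes"))  # ['doc1', 'doc3', 'doc5', 'doc7', 'doc9']
```

Adding replicas raises query capacity but multiplies indexing work, since every replica indexes every document of its shard; adding shards splits both the index and the indexing load.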


Phase 1 - Work Breakdown Chart

• 21 weeks total effort
• 13-member team including GLOCO and Innova
• Innova supports the full SDLC, with phases: Solution Outline, High Level Design, Detailed Design, Build / Test / Deploy, and Post Production Support


Use Case - Search and Tag


Hardware / Software - Detailed Configuration


Interface Specification
