SURE Internal Training
Jul 16, 2015
Work at SURE Technology & Consulting
Technical Team Leader and Enterprise Search Consultant.
In love with ASP.NET, SharePoint, ALM, and software architecture; involved in search technology and search solutions since 2007
Has working Experience with
Profile: http://www.linkedin.com/in/usamanada
Twitter: https://twitter.com/usama_nada
Let's Start
Know what Enterprise Search is.
How Search Works
The business of Search
General overview
Problems it came to solve
Different repositories
Data in many formats.
Very Large Volumes.
Security Concerns
Poor relevancy offered by database solutions
High query rates (queries per second) overloading your database
….
What is Enterprise Search?
It helps you find your stuff…
Give me a better definition…
Search Based Application
A software application in which a search engine platform is used as the core infrastructure for information access and reporting, and whose main purpose is performing a domain-oriented task.
Search Engine
Effectiveness (quality of results)
As good as possible
Efficiency (response time and throughput)
As quickly as possible
High-level overview of the search concepts and architecture
How Search Works: Getting the Data
Crawlers
Web Crawler
Focused Crawler
Connectors
Database
ECM
CRM
Exchange
Files
…
How Search Works: Process the Data (Indexing)
How Search Works
Document      Words
Document 1    the, cow, says, moo
Document 2    the, cat, and, the, hat
Document 3    the, dish, ran, away, with, the, spoon
Forward Index
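As a rough illustration of the indexing step, the sketch below stores the three example documents above as a forward index (document to words) and then inverts it into a word-to-documents lookup, the structure commonly derived from a forward index to answer queries quickly. The function names `invert` and `search` are illustrative assumptions, not the API of any particular search engine.

```python
from collections import defaultdict

# Forward index: document -> list of words (the three example documents).
forward_index = {
    "Document 1": ["the", "cow", "says", "moo"],
    "Document 2": ["the", "cat", "and", "the", "hat"],
    "Document 3": ["the", "dish", "ran", "away", "with", "the", "spoon"],
}


def invert(forward):
    """Build an inverted index: word -> sorted list of documents containing it."""
    inverted = defaultdict(set)
    for doc, words in forward.items():
        for word in words:
            inverted[word].add(doc)
    return {word: sorted(docs) for word, docs in inverted.items()}


def search(inverted, query):
    """Return the documents that contain every word of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(inverted.get(terms[0], []))
    for term in terms[1:]:
        result &= set(inverted.get(term, []))
    return sorted(result)


inverted_index = invert(forward_index)
print(search(inverted_index, "the cat"))
```

A query is then answered by intersecting the document lists of its words instead of scanning every document.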
How Search Works: Search the Data
How Search Works: Summary
Selected Features: Architecture
Distributed Computing capabilities
Support for building highly scalable, high-performance, fault-tolerant clusters
Index Replication, load balancing
Near Real-Time Indexing
….
For Developers and System Integrators
API Access for Indexing and Searching
Ability to build custom connectors
Advanced configurable Language Analysis
Configurable relevancy and ranking
….
Selected Features: Faceted Search and Filtering
Selected Features: Multimedia Search (Filter by Image Attributes)
Selected Features
Advanced Text Analysis.
Language detection + Tokenization + Normalization
Arabic (all NLP features: morphology, normalization, translation, named entities, synonyms, and more …)
Farsi (Persian), Urdu, Pashto, Cyrillic, Chinese/Japanese/Korean, and others
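The tokenization and normalization steps listed above can be sketched with the Python standard library alone. This is a deliberately simplified illustration: it only case-folds text and strips combining marks (accents and diacritics), and it leaves out language detection and the morphology that real Arabic or CJK analyzers need. The function names are assumptions made for this example.

```python
import re
import unicodedata


def normalize(text):
    """Case-fold the text and strip combining marks such as accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def tokenize(text):
    """Normalize the text, then split it into word tokens."""
    return re.findall(r"\w+", normalize(text))


print(tokenize("Café MOO, says the cow!"))
```

Applying the same normalization at index time and at query time is what lets a query for "cafe" match a document containing "Café".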
Selected Features: Entity Extraction Enables “Discovery”
Languages:
Arabic, Chinese, Dutch, English, French, German, Italian, Japanese, Korean, Pashto, Persian (Farsi, Dari), Portuguese, Russian, Spanish, Urdu, …
Selected Features: Synonyms
"DB administrator" is defined as a synonym of "Database Administrator".
This synonymy can be in one direction or both ways.
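The difference between one-direction and two-way synonymy can be shown with a small query-expansion sketch. The `SynonymMap` class and its `both_ways` flag are hypothetical names for this illustration; real engines configure synonyms in their own formats.

```python
from collections import defaultdict


class SynonymMap:
    """Hypothetical synonym table used to expand a query term."""

    def __init__(self):
        self._map = defaultdict(set)

    def add(self, term, synonym, both_ways=False):
        """Register `synonym` for `term`; optionally register the reverse too."""
        self._map[term.lower()].add(synonym.lower())
        if both_ways:
            self._map[synonym.lower()].add(term.lower())

    def expand(self, query):
        """Return the query followed by its registered synonyms."""
        q = query.lower()
        return [q] + sorted(self._map.get(q, ()))


one_way = SynonymMap()
one_way.add("DB administrator", "Database Administrator")
print(one_way.expand("DB administrator"))        # expanded with the synonym
print(one_way.expand("Database Administrator"))  # not expanded: one direction only
```

With `both_ways=True`, searching for either form would also match the other.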
Selected Features
Name Indexing (cross-language “People Search”).
Selected Features: Multilingual Search (Cross-Language Information Retrieval)
Afghanistan
Selected Features: Taxonomy (Categorizer)
Predicts the category of a new document using an existing training dataset (for example: dmoz)
Business
Consumer Services
Inquiries
Customer Service
Shopping
Pets
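To make the idea of predicting a category from a training dataset concrete, here is a minimal sketch that uses a multinomial naive Bayes classifier with add-one smoothing. Naive Bayes is chosen here only because it is compact; the slides do not say which algorithm a given product uses. The four training sentences are invented for illustration and are not taken from dmoz.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesCategorizer:
    """Tiny multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of documents
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def predict(self, text):
        """Return the category with the highest log-probability for the text."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat in self.doc_counts:
            score = math.log(self.doc_counts[cat] / total_docs)
            total_words = sum(self.word_counts[cat].values())
            for w in words:
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                count = self.word_counts[cat][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = cat, score
        return best


categorizer = NaiveBayesCategorizer()
# Hypothetical training data, invented for this example.
categorizer.train("dog food and cat toys", "Pets")
categorizer.train("buy leash for your puppy", "Pets")
categorizer.train("customer service complaint form", "Customer Service")
categorizer.train("contact support about billing inquiries", "Customer Service")

print(categorizer.predict("cat food"))
```

A production categorizer would train on a much larger labeled corpus and on analyzed tokens rather than whitespace-split words.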
Selected Features: Geospatial Search
• Limit search queries to a geographic area
• Users can draw polygon and circle shapes to refine search results to the desired areas
• Multiple areas can be selected for a single query
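The circle and polygon filters described above can be sketched as follows. The circle test uses the haversine great-circle distance; the polygon test uses planar ray casting on latitude and longitude, which is only a reasonable approximation for small areas away from the poles and the date line. The document records, coordinates (approximate), and function names are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_circle(point, center, radius_km):
    """True if the point lies within radius_km of the center."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km


def in_polygon(point, vertices):
    """Ray-casting test, treating (lat, lon) as planar coordinates."""
    y, x = point
    inside = False
    n = len(vertices)
    for i in range(n):
        y1, x1 = vertices[i]
        y2, x2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the point's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_by_areas(docs, areas):
    """Keep documents whose location falls inside at least one selected area."""
    return [d for d in docs if any(area(d["location"]) for area in areas)]


# Hypothetical documents with approximate (lat, lon) locations.
docs = [
    {"title": "Cairo office", "location": (30.0444, 31.2357)},
    {"title": "Alexandria office", "location": (31.2001, 29.9187)},
    {"title": "Riyadh office", "location": (24.7136, 46.6753)},
]

# Two areas selected for a single query: a 50 km circle and a drawn rectangle.
areas = [
    lambda p: in_circle(p, (30.0444, 31.2357), 50.0),
    lambda p: in_polygon(p, [(24.0, 46.0), (24.0, 47.0), (25.0, 47.0), (25.0, 46.0)]),
]

matches = [d["title"] for d in filter_by_areas(docs, areas)]
print(matches)
```

Modelling each area as a predicate keeps the multi-area case simple: a document matches if any selected shape contains it.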
Selected Features
Enterprise Search as a NoSQL Database
NoSQL Data Store:
Non-traditional data stores that are not built around SQL, with a distributed, fault-tolerant architecture built to provide high performance.
Selected Features: Enterprise Search as a BI Platform
Other Features
Spell checking
Query suggestion
Autosuggest
Search Alerts
Document Thumbnails
Sentiment analysis
Targeted ads and document boosting
Recommendations (“More Like This”)
Translation, visualization, …
Search Market
Market Size: In 2012, the total annual sales of search software may only amount to $3 billion at most, and there are probably no more than 80 companies in the business at present
Vendors: Exalead, Google, Oracle, Attivio, HP, ….
System Integrators: There are now a number of systems integration companies that specialize in search implementation projects, offering a range of services
Open Source Search: Getting much stronger since Solr's appearance in 2006, with different business models
Appliances: Started with Google and Autonomy, and now extends to Solr
Cloud: Cloud-based search-as-a-service applications, led by Amazon and Windows Azure.
Specialized Search Components: NLP Components, and Document Filters
Selected Market Players
• Lexmark - Isys-Search
References
Wikipedia: Web Crawler, Search Engine Indexing, TF-IDF, Cosine Similarity, Vector Space Model
Gartner: Gartner Magic Quadrant for Enterprise Search
Articles: “NoSQL, Lucene, and Solr”, “TF-IDF for Dummies”, “TF-IDF and cosine similarity”
Blogs: Exalead Blog, Attivio Blog, Enterprise Search Blog, LucidWorks Blog
Books: Enterprise Search (O’Reilly, 2012), An Introduction to Information Retrieval (Cambridge UP, 2009)
Slides: Exploring search driven applications with SharePoint 2013
Academic: Information Retrieval Course (Cornell University), Information Retrieval and Web Search (SFU), Search Engine Architecture (HPI)
Thank You