ECIR 2014 Industry Day
Content Discovery Through Entity Driven Search
Alessandro Benedetti
http://uk.linkedin.com/in/alexbenedetti
Antonio David Perez Morales
http://es.linkedin.com/in/adperezmorales
16th April 2014
• Experienced at building and delivering a wide range of enterprise solutions across the whole information life cycle
• Alfresco & Ephesoft certified Platinum Partner
• Red Hat Enterprise Linux Ready Partner
• Crafter & Varnish Gold Partners
• Search Solutions Consultant
• Alfresco Partner of the Year 2012 and 2013
Working effectively together
Who We Are
Antonio David Pérez Morales
- R&D Senior Engineer
- Master's in Software Engineering and Technology
- Digital Identity and Security expert
- Enterprise Search background
- Semantic, NLP, ML Technologies and Information Retrieval lover
- Apache Stanbol Committer
- Apache contributor
@adperezmorales
http://es.linkedin.com/in/adperezmorales/
Alessandro Benedetti
- R&D Senior Engineer
- Master in Computer Science
- Information Retrieval background
- Enterprise Search specialist
- Semantic, NLP, ML Technologies and Information Retrieval lover
@AlexBenedetti
http://uk.linkedin.com/in/alexbenedetti
Agenda
• Context
• Problem
• Solution
• Demo
• Future Work
Zaizi R&D Department
• Giving meaning to the content
• Enriching it semantically
• Adding value to ECM/CMS
• More structured content, easy to manage, link and search
• Improving search
• Across different domains, data sources, user experience
• Applied Machine Learning research
• Content Organization – Recommendation Systems
Enterprise Search Problems
Challenge: Search within Large and Heterogeneous Repositories
• Heterogeneous Data Sources
• Filesystem, DB, ECM/CMS, Email, …
• Unstructured Content
• PDFs, plain text, Word, …
• Documents not linked between each other
• Federated Search needed
• Search across data sources
• Different permissions
• Centralized endpoint
Current Enterprise Search Weaknesses
• Keyword based
• Low precision
• Ambiguous terms are not resolved in context
• Inaccurate weighting when keywords are combined in a query
Entity Driven Search
• Moves from keywords to Entities
• More understandable to humans
• Process the unstructured text
• Enrich it
• Build specific indexes
• Use entities and concepts in searches
Sensefy
• Semantic Enterprise Search Engine
• Federated Search
• Evolved User Experience
• Based on cutting-edge Open Source Frameworks
Architecture
RedLink
• Semantic Cloud platform
• Providing Software as a Service
• Manage unstructured data
• Extract knowledge and intelligence
• Make sense of information
• Feed into business processes
• Open-Source based components
• Entity Linking using Knowledge Bases
NLP & Semantic Enrichment
• From unstructured to structured
• NLP analysis, POS tagging
• Named Entity Recognition
• Linked Data
• Entity Linking using Knowledge Bases
• Disambiguation
• Indexing in Solr
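The enrichment steps above can be sketched end to end as follows. This is a minimal illustration, not the Sensefy or RedLink implementation: the knowledge base, the context-overlap disambiguation heuristic and the output field names (`entity_uris`, `entity_types`) are all assumptions made for the example, and POS tagging is skipped in favour of a plain dictionary lookup.

```python
import re

# Hypothetical toy knowledge base: surface form -> candidate entities.
# Each candidate carries a few context words used for disambiguation.
KNOWLEDGE_BASE = {
    "paris": [
        {"uri": "kb:Paris_France", "type": "City",
         "context": {"france", "seine", "city"}},
        {"uri": "kb:Paris_Hilton", "type": "Person",
         "context": {"hotel", "heiress", "celebrity"}},
    ],
    "london": [
        {"uri": "kb:London", "type": "City",
         "context": {"england", "thames", "city"}},
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_entities(text):
    """Spot known surface forms and pick one candidate entity for each."""
    tokens = tokenize(text)
    context = set(tokens)
    linked = []
    for token in tokens:
        candidates = KNOWLEDGE_BASE.get(token)
        if not candidates:
            continue
        # Disambiguation (assumed heuristic): keep the candidate whose
        # context words overlap most with the words of the document.
        best = max(candidates, key=lambda c: len(c["context"] & context))
        if best["uri"] not in [e["uri"] for e in linked]:
            linked.append(best)
    return linked


def to_index_document(doc_id, text):
    """Build a flat, structured document ready to be sent to an index."""
    entities = link_entities(text)
    return {
        "id": doc_id,
        "content": text,
        "entity_uris": [e["uri"] for e in entities],          # assumed field
        "entity_types": sorted({e["type"] for e in entities}),  # assumed field
    }


doc = to_index_document("doc-1", "Paris is a city on the Seine in France.")
print(doc["entity_uris"], doc["entity_types"])
```

The resulting dictionary is the kind of structured record that could then be posted to a Solr update handler; the actual schema used in the talk is not given in the slides.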
Smart Autocomplete
• Multi-phase suggestions
• Closer to natural language query formulation
• Named Entities infix
• Entity types infix
• Multi-language entity type support
• Properties driven query approach
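One possible reading of the multi-phase, infix-based suggestions is sketched below. The entity list, the type labels and the phase order (named entities first, then entity types in the user's language) are assumptions for illustration; the slides do not describe how Sensefy implements its suggester.

```python
# Hypothetical sample data: named entities and multilingual type labels.
ENTITIES = [
    {"label": "Barack Obama", "type": "Person"},
    {"label": "Michelle Obama", "type": "Person"},
    {"label": "Bank of England", "type": "Organization"},
]
ENTITY_TYPES = {
    "Person": {"en": "person", "es": "persona"},
    "Organization": {"en": "organization", "es": "organización"},
}


def suggest(typed_text, lang="en", limit=5):
    """Return (kind, label) suggestions for the text typed so far."""
    needle = typed_text.strip().lower()
    if not needle:
        return []
    suggestions = []
    # Phase 1: named entities whose label contains the typed text
    # anywhere (infix match), not only at the start.
    for entity in ENTITIES:
        if needle in entity["label"].lower():
            suggestions.append(("entity", entity["label"]))
    # Phase 2: entity types, matched on the label in the requested
    # language, falling back to English when no translation exists.
    for labels in ENTITY_TYPES.values():
        label = labels.get(lang, labels["en"])
        if needle in label.lower():
            suggestions.append(("type", label))
    return suggestions[:limit]


print(suggest("oba"))
print(suggest("pers", lang="es"))
```

A third phase could offer the properties configured for the chosen entity type, which is what would make the query formulation properties-driven.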
Smart Autocomplete Configuration
• Entity type properties
• Interesting to our use case and scenario
• Properties inheritance through type hierarchy
• Enhance type information from external resource
• Freebase, DBpedia, custom data set
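Property inheritance through the type hierarchy can be illustrated with a small lookup. The hierarchy and property names below are invented for the example; in practice they would come from the configured external resource.

```python
# Hypothetical type hierarchy (child -> parent) and per-type properties.
TYPE_PARENT = {"Politician": "Person", "Person": "Thing", "Thing": None}
TYPE_PROPERTIES = {
    "Thing": ["name"],
    "Person": ["birth_date", "nationality"],
    "Politician": ["party"],
}


def properties_for(entity_type):
    """Collect the properties of a type plus those of all its ancestors."""
    # Walk up the hierarchy from the type to its root ancestor.
    chain = []
    current = entity_type
    while current is not None:
        chain.append(current)
        current = TYPE_PARENT.get(current)
    # List inherited properties first, the most specific ones last.
    properties = []
    for type_name in reversed(chain):
        for prop in TYPE_PROPERTIES.get(type_name, []):
            if prop not in properties:
                properties.append(prop)
    return properties


print(properties_for("Politician"))
```

With this layout, a property only needs to be configured once at the most general type where it applies.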
Semantic Search
• Search by Named Entity
• Search by Entity Type
• Search by Entity Type properties
• Grouping Results by Sense
• Contextualize Results Using Semantic Information
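Grouping results by sense can be sketched as follows, assuming each indexed document already carries the URIs of its linked entities. The sample results, the sense table and the `unresolved` bucket are hypothetical.

```python
from collections import defaultdict

# Hypothetical search results, each annotated with linked entity URIs.
RESULTS = [
    {"id": "doc-1", "title": "Jaguar unveils new sports car",
     "entities": ["kb:Jaguar_Cars"]},
    {"id": "doc-2", "title": "Jaguar habitat shrinking",
     "entities": ["kb:Jaguar_animal"]},
    {"id": "doc-3", "title": "Jaguar sales rise",
     "entities": ["kb:Jaguar_Cars"]},
]
# Hypothetical table of the senses (entities) an ambiguous term can take.
SENSES = {"jaguar": ["kb:Jaguar_Cars", "kb:Jaguar_animal"]}


def group_by_sense(term, results):
    """Bucket result ids by the sense of `term` that each document mentions."""
    groups = defaultdict(list)
    senses = SENSES.get(term.lower(), [])
    for result in results:
        matched = [s for s in senses if s in result["entities"]]
        for sense in matched:
            groups[sense].append(result["id"])
        if not matched:
            # No known sense found in this document.
            groups["unresolved"].append(result["id"])
    return dict(groups)


print(group_by_sense("Jaguar", RESULTS))
```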
Semantic More Like This
• Search for Similar Documents based on Entities and Entities’ categories
• Similarity Function based on Documents’ Sense
• Not based on text tokens
• Entity Frequency / Inverse Document Frequency
• Entity Type Frequency / Inverse Document Frequency
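A minimal sketch of an entity-based similarity follows. It applies the standard TF-IDF weighting and cosine similarity to entity identifiers instead of text tokens. The slides do not give the actual similarity function, so the smoothed IDF formula, the cosine measure and the sample documents are assumptions.

```python
import math
from collections import Counter

# Hypothetical corpus: document id -> entities found in it (with repeats).
DOCS = {
    "doc-1": ["kb:London", "kb:Thames", "kb:London"],
    "doc-2": ["kb:London", "kb:Thames"],
    "doc-3": ["kb:Paris", "kb:Seine"],
}


def idf(entity, docs):
    """Smoothed inverse document frequency of an entity."""
    df = sum(1 for entities in docs.values() if entity in entities)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def vector(doc_id, docs):
    """Entity frequency times IDF for every entity in the document."""
    counts = Counter(docs[doc_id])
    return {entity: ef * idf(entity, docs) for entity, ef in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(weight * b.get(entity, 0.0) for entity, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(doc_id, docs, limit=5):
    """Rank the other documents by entity-level similarity to `doc_id`."""
    query = vector(doc_id, docs)
    scored = [(other, cosine(query, vector(other, docs)))
              for other in docs if other != doc_id]
    # Drop documents that share no entity with the query document.
    scored = [(other, score) for other, score in scored if score > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


print(more_like_this("doc-1", DOCS))
```

The same functions work unchanged on entity type identifiers, which would give the Entity Type Frequency variant; the two scores could then be combined with a weighting of choice.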
Future Work
• Semantic More Like This new approach (Graph relations)
• Machine Learning components: Classification, Topic annotation, Clustering
• Semantic facets
• Secured Entity Search
• Image and Media searches
Conclusions
• Better user experience
• More precision in search results
• Closer to human language
Zaizi Headquarters: Brook House, 4th Floor, North Wing, 229-243 Shepherd’s Bush Road, London W6 7AN, United Kingdom. T: (+44) 20 3582 8330
Zaizi Iberia: Calle Gremios 13-15, Edificio Diseño, Planta 1, Oficina 5, 41927 Mairena del Aljarafe, Sevilla, Spain. T: (+34) 666 42 43 64
Zaizi Asia: 50 Flower Road, Colombo 07, Sri Lanka. T: (+94) 112 301 461
Zaizi Singapore: 14 Robinson Road #13-00, Far East Finance Building, Singapore 048545. T: (+65) 3158 5886, F: (+65) 6323 1839
VAT Registration No GB 932 8855 89
Registered in England and Wales with registration number 6440931
www.zaizi.com
Thanks!