ECIR 2014 Industry Day Content Discovery Through Entity Driven Search Alessandro Benedetti http://uk.linkedin.com/in/alexbenedetti Antonio David Perez Morales http://es.linkedin.com/in/adperezmorales 16th April 2014

Content Discovery Through Entity Driven Search

Jan 21, 2017

Page 1: Content Discovery Through Entity Driven Search

ECIR 2014 Industry Day
Content Discovery Through Entity Driven Search

Alessandro Benedetti
http://uk.linkedin.com/in/alexbenedetti

Antonio David Perez Morales
http://es.linkedin.com/in/adperezmorales

16th April 2014

Page 2: Content Discovery Through Entity Driven Search

• Experienced at building and delivering a wide range of enterprise solutions across the whole information life cycle

• Alfresco & Ephesoft certified Platinum Partner

• Red Hat Enterprise Linux Ready Partner

• Crafter & Varnish Gold Partners

• Search Solutions Consultant

• Alfresco Partner of the Year 2012 and 2013

Page 3: Content Discovery Through Entity Driven Search

Working effectively together

Who We Are

Antonio David Pérez Morales

- R&D Senior Engineer
- Master in Engineering and Technology Software
- Digital Identity and Security expert
- Enterprise Search Background
- Semantic, NLP, ML Technologies and Information Retrieval lover
- Apache Stanbol Committer
- Apache contributor

@adperezmorales
http://es.linkedin.com/in/adperezmorales/

Alessandro Benedetti

- R&D Senior Engineer
- Master in Computer Science
- Information Retrieval background
- Enterprise Search specialist
- Semantic, NLP, ML Technologies and Information Retrieval lover

@AlexBenedetti
http://uk.linkedin.com/in/alexbenedetti

Page 4: Content Discovery Through Entity Driven Search

Agenda

• Context

• Problem

• Solution

• Demo

• Future Work

Page 6: Content Discovery Through Entity Driven Search

Zaizi R&D Department

• Giving sense to the content

• Enriching it semantically

• Adding value to ECM/CMS

• More structured content, easy to manage, link and search

• Improving search

• Across different domains, data sources, User Experience

• Machine Learning applied research

• Content Organization – Recommendation Systems

Page 8: Content Discovery Through Entity Driven Search

Enterprise Search Problems

Challenge: Search within Big and Heterogeneous Repositories

• Heterogeneous Data Sources

• Filesystem, DB, ECM/CMS, Email, …

• Unstructured Content

• PDFs, plain text, Word, …

• Documents not linked to each other

• Federated Search needed

• Search across data sources

• Different permissions

• Centralized endpoint

Page 9: Content Discovery Through Entity Driven Search

Current Enterprise Search Weaknesses

• Keyword based

• Low precision

• Ambiguous terms are not resolved from context

• Inaccurate weighting when keywords are combined in a query

Page 11: Content Discovery Through Entity Driven Search

Entity Driven Search

• Moves from keywords to Entities

• More understandable to a human

• Process the unstructured text

• Enrich it

• Build specific indexes

• Use entities and concepts in searches
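
The move from keywords to entities can be illustrated with a small sketch. It assumes the documents have already been enriched, so each one carries the identifiers of the entities found in its text. The sample documents, the entity identifiers and the function names are hypothetical and are not taken from Sensefy.

```python
from collections import defaultdict

# Hypothetical enriched documents: each one lists the entities found in its text.
DOCS = {
    "doc1": {"text": "Apple unveiled a new phone.", "entities": ["Apple_Inc."]},
    "doc2": {"text": "An apple a day keeps the doctor away.", "entities": ["Apple_(fruit)"]},
    "doc3": {"text": "Tim Cook leads Apple.", "entities": ["Apple_Inc.", "Tim_Cook"]},
}


def build_entity_index(docs):
    """Build an inverted index from entity identifier to document ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for entity in doc["entities"]:
            index[entity].add(doc_id)
    return index


def search_by_entity(index, entity):
    """Return the sorted ids of the documents that mention the entity."""
    return sorted(index.get(entity, set()))


entity_index = build_entity_index(DOCS)
print(search_by_entity(entity_index, "Apple_Inc."))     # ['doc1', 'doc3']
print(search_by_entity(entity_index, "Apple_(fruit)"))  # ['doc2']
```

A keyword search for "apple" would return all three documents, while the entity lookup separates the company from the fruit.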

Page 12: Content Discovery Through Entity Driven Search

Sensefy

• Semantic Enterprise Search Engine

• Federated Search

• Evolved User Experience

• Based on cutting-edge Open Source Frameworks

Page 13: Content Discovery Through Entity Driven Search

Architecture

Page 14: Content Discovery Through Entity Driven Search

RedLink

• Semantic Cloud platform

• Providing Software as a Service

• Manage unstructured data

• Extract knowledge and intelligence

• Make sense of information

• Feed into business processes

• Open-Source based components

• Entity Linking using Knowledge Bases

Page 15: Content Discovery Through Entity Driven Search

NLP & Semantic Enrichment

• From unstructured to structured

• NLP Analysis, POS Tagging

• Named Entity Recognition

• Linked Data

• Entity Linking using Knowledge Bases

• Disambiguation

• Indexing in Solr
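
The enrichment steps listed above can be sketched as a toy pipeline. This is only an illustration under strong assumptions: recognition is a dictionary lookup, disambiguation picks the candidate whose context words overlap most with the text, and the result is a plain dictionary shaped like a document that could be sent to an index. The knowledge base, the field names and the `enrich` function are hypothetical; the slides do not describe how the actual platform implements these steps.

```python
import re

# Hypothetical knowledge base: surface form -> candidate entities with context words.
KNOWLEDGE_BASE = {
    "paris": [
        {"id": "Paris_(France)", "type": "City", "context": {"france", "seine", "city"}},
        {"id": "Paris_Hilton", "type": "Person", "context": {"hotel", "celebrity", "heiress"}},
    ],
    "london": [
        {"id": "London", "type": "City", "context": {"uk", "thames", "city"}},
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def enrich(doc_id, text):
    """Recognise, link and disambiguate entities, then build an indexable document."""
    tokens = tokenize(text)
    token_set = set(tokens)
    entities = []
    for token in tokens:
        candidates = KNOWLEDGE_BASE.get(token)
        if not candidates:
            continue
        # Disambiguation: keep the candidate sharing the most context words with the text.
        best = max(candidates, key=lambda c: len(c["context"] & token_set))
        if best["id"] not in [e["id"] for e in entities]:
            entities.append(best)
    return {
        "id": doc_id,
        "text": text,
        "entity_ids": [e["id"] for e in entities],
        "entity_types": sorted({e["type"] for e in entities}),
    }


doc = enrich("doc1", "Paris is the city on the Seine, two hours from London.")
print(doc["entity_ids"])    # ['Paris_(France)', 'London']
print(doc["entity_types"])  # ['City']
```

The resulting dictionary keeps the original text next to the structured entity fields, which is the "from unstructured to structured" step in miniature.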

Page 16: Content Discovery Through Entity Driven Search

Smart Autocomplete

• Multi Phase suggestions

• Closer to natural language query formulation

• Named Entities infix

• Entity types infix

• Multi Language entity type support

• Properties driven query approach
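
A minimal sketch of a two-phase suggester follows, assuming that "infix" means matching the typed text anywhere inside a label. Phase 1 proposes named entities and entity types, with type labels looked up per language. Phase 2 proposes properties once a type has been chosen. All data and function names here are hypothetical and only illustrate the idea.

```python
# Hypothetical suggestion sources; a real system would read these from its indexes.
NAMED_ENTITIES = ["Barack Obama", "Michelle Obama", "Oxford University"]
TYPE_LABELS = {
    "Person": {"en": "Person", "es": "Persona"},
    "University": {"en": "University", "es": "Universidad"},
}
TYPE_PROPERTIES = {
    "Person": ["birth place", "profession"],
    "University": ["location", "founded"],
}


def infix_match(query, candidates):
    """Return the candidates that contain the query anywhere, ignoring case."""
    needle = query.lower()
    return [c for c in candidates if needle in c.lower()]


def suggest(query, lang="en"):
    """Phase 1: suggest named entities and entity types for the typed text."""
    label_to_type = {names[lang]: type_id for type_id, names in TYPE_LABELS.items()}
    matching_labels = infix_match(query, list(label_to_type))
    return {
        "entities": infix_match(query, NAMED_ENTITIES),
        "entity_types": [label_to_type[label] for label in matching_labels],
    }


def suggest_properties(entity_type, query=""):
    """Phase 2: once an entity type is chosen, suggest its properties."""
    return infix_match(query, TYPE_PROPERTIES.get(entity_type, []))


print(suggest("bama"))
print(suggest("pers", lang="es"))
print(suggest_properties("Person", "pla"))  # ['birth place']
```

Chaining the two phases lets a user build a query such as "Person" followed by "birth place", which is closer to a natural language formulation than a bag of keywords.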

Page 17: Content Discovery Through Entity Driven Search

Smart Autocomplete Configuration

• Entity type properties

• Interesting to our use case and scenario

• Properties inheritance through type hierarchy

• Enhance type information from external resources

• Freebase, DBpedia, Custom Data Set
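
Property inheritance through a type hierarchy can be sketched as a walk from a type up to its ancestors, collecting the configured properties along the way. The hierarchy, the property names and the `inherited_properties` function are hypothetical, and the sketch assumes a single-parent hierarchy without cycles.

```python
# Hypothetical single-parent type hierarchy (child -> parent).
TYPE_PARENTS = {"Politician": "Person", "Person": "Thing", "Thing": None}

# Hypothetical properties configured directly on each type.
TYPE_PROPERTIES = {
    "Thing": ["name"],
    "Person": ["birth date", "nationality"],
    "Politician": ["party"],
}


def inherited_properties(entity_type):
    """Collect a type's own properties followed by those of its ancestors."""
    properties = []
    current = entity_type
    while current is not None:
        for prop in TYPE_PROPERTIES.get(current, []):
            if prop not in properties:
                properties.append(prop)
        current = TYPE_PARENTS.get(current)
    return properties


print(inherited_properties("Politician"))
# ['party', 'birth date', 'nationality', 'name']
```

With this arrangement only the properties of interest for a use case need to be configured once, at the most general type where they apply.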

Page 18: Content Discovery Through Entity Driven Search

Semantic Search

• Search by Named Entity

• Search by Entity Type

• Search by Entity Type properties

• Grouping Results by Sense

• Contextualize Results Using Semantic Information
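
Grouping results by sense can be illustrated as follows, assuming each result already carries the disambiguated entity that matched the query. The result records and the `group_by_sense` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical results for the query "jaguar", each tagged with its disambiguated sense.
RESULTS = [
    {"id": "doc1", "title": "Jaguar unveils new electric SUV", "sense": "Jaguar_Cars"},
    {"id": "doc2", "title": "Jaguar population declines in the Amazon", "sense": "Jaguar_(animal)"},
    {"id": "doc3", "title": "Jaguar factory expands", "sense": "Jaguar_Cars"},
]


def group_by_sense(results):
    """Group document ids by the sense of the matched entity, keeping result order."""
    groups = defaultdict(list)
    for result in results:
        groups[result["sense"]].append(result["id"])
    return dict(groups)


print(group_by_sense(RESULTS))
# {'Jaguar_Cars': ['doc1', 'doc3'], 'Jaguar_(animal)': ['doc2']}
```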

Page 19: Content Discovery Through Entity Driven Search

Semantic More Like This

• Search for Similar Documents based on Entities and Entities’ categories

• Similarity Function based on Documents’ Sense

• Not based on text tokens

• Entity Frequency / Inverse Document Frequency

• Entity Type Frequency / Inverse Document Frequency
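
The slides do not give the similarity formula, so the following is only a sketch of one plausible reading. Each document becomes a vector of entity weights, where a weight is the entity's frequency in the document multiplied by a smoothed inverse document frequency, and documents are compared with cosine similarity. The smoothing (`1 + log(N / df)`), the sample data and the function names are assumptions. The same code applies to entity types if the entity lists are replaced by type lists.

```python
import math
from collections import Counter

# Hypothetical entity annotations per document (repeats count as frequency).
DOC_ENTITIES = {
    "doc1": ["Apple_Inc.", "Tim_Cook", "Apple_Inc."],
    "doc2": ["Apple_Inc.", "Steve_Jobs"],
    "doc3": ["Banana", "Apple_(fruit)"],
}


def ef_idf_vectors(doc_entities):
    """Weight each entity by its frequency times a smoothed inverse document frequency."""
    n_docs = len(doc_entities)
    doc_freq = Counter(e for ents in doc_entities.values() for e in set(ents))
    vectors = {}
    for doc_id, ents in doc_entities.items():
        counts = Counter(ents)
        vectors[doc_id] = {
            e: count * (1.0 + math.log(n_docs / doc_freq[e]))
            for e, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * b.get(entity, 0.0) for entity, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def more_like_this(doc_id, vectors):
    """Rank the other documents by entity-based similarity to the given one."""
    target = vectors[doc_id]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != doc_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


vectors = ef_idf_vectors(DOC_ENTITIES)
ranked = more_like_this("doc1", vectors)
print([doc_id for doc_id, _ in ranked])  # ['doc2', 'doc3']
```

Here doc3 scores zero against doc1: the fruit and the company are different entities, although both texts could contain the token "apple".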

Page 22: Content Discovery Through Entity Driven Search

Future Work

• Semantic More Like This new approach (Graph relations)

• Machine Learning components: Classification, Topic annotation, Clustering

• Semantic facets

• Secured Entity Search

• Image and Media searches

Page 23: Content Discovery Through Entity Driven Search

Conclusions

• Better user experience

• More precision in search results

• Closer to human language

Page 24: Content Discovery Through Entity Driven Search

Zaizi Headquarters
Brook House
4th Floor, North Wing
229-243 Shepherd’s Bush Road
London W6 7AN
United Kingdom
T: (+44) 20 3582 8330

Zaizi Iberia
Calle Gremios 13-15, Edificio Diseño
Planta 1, Oficina 5
41927 Mairena del Aljarafe
Sevilla
Spain
T: (+34) 666 42 43 64

Zaizi Asia
50 Flower Road
Colombo 07
Sri Lanka
T: (+94) 112 301 461

Zaizi Singapore
14 Robinson Road #13-00
Far East Finance Building
Singapore 048545
T: (+65) 3158 5886
F: (+65) 6323 1839

VAT Registration No GB 932 8855 89
Registered in England and Wales with registration number 6440931

www.zaizi.com

Thanks!