Challenges for industrial-strength Information Retrieval on Databases R. Cornacchia, M. Hildebrand, A.P. de Vries, F. Dorssers KARS2017 - 21 March 2017, Venice, IT
Challenges for Industrial-strength Information Retrieval on Databases

Apr 08, 2017

Arjen de Vries
Transcript
Page 1: Challenges for Industrial-strength Information Retrieval on Databases

Challenges for industrial-strength

Information Retrieval on Databases

R. Cornacchia, M. Hildebrand, A.P. de Vries, F. Dorssers

KARS2017 - 21 March 2017, Venice, IT

Page 2: Challenges for Industrial-strength Information Retrieval on Databases

○ Since 2010

○ Spin-off of CWI, Amsterdam

○ “Search by Strategy”

About Spinque

Page 3: Challenges for Industrial-strength Information Retrieval on Databases

Outline

1. Search is everywhere

2. Tailored search is expected

3. Tailored search needs modelling

4. Search modelling by information specialists

5. Search modelling needs flexible IR & DB

6. IR on DB: it works

Page 4: Challenges for Industrial-strength Information Retrieval on Databases

Search is everywhere

Real world scenarios

Technical

Desktop

Coding content assistant

Product recommendation

Personalised newsfeed

Page 5: Challenges for Industrial-strength Information Retrieval on Databases

Let’s pick a simple one: autocompletion

iphone 7

iphone 5c

iphone 6s

ipho|

“autocompletion is trivial”

.. not so fast!

Tailored search is expected

Page 6: Challenges for Industrial-strength Information Retrieval on Databases

autocompletion

iphone 7

iphone 5c

iphone 6s

ipho|

Basic - products

○ Any matching term from the index

○ Suggest products

Tailored search is expected

Page 7: Challenges for Industrial-strength Information Retrieval on Databases

autocompletion

iphone 7

iphone 5c

iphone 6 cases

ipho|

Basic - products & categories

○ Any matching term from the index

○ Suggest products & categories

Tailored search is expected

Page 8: Challenges for Industrial-strength Information Retrieval on Databases

autocompletion

iphone 7

iphone 6 cases

iphone 6s

ipho|

Filtered

○ Any matching term from the index

○ “iPhone 5c” out of stock

Tailored search is expected

Page 9: Challenges for Industrial-strength Information Retrieval on Databases

autocompletion

iphone 8

iphone 7

iphone 6 cases

ipho|

Filtered & ranked

○ “iPhone 5c” out of stock

○ “iPhone 8” the most requested

Tailored search is expected
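The "filtered & ranked" behaviour on this slide could be sketched roughly as follows. This is a minimal illustration, not the system presented in the talk: the `Suggestion` type, the `requests` popularity signal, and all catalogue numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    text: str
    in_stock: bool
    requests: int  # hypothetical popularity signal, e.g. recent request count


def suggest(prefix: str, candidates: list[Suggestion], limit: int = 3) -> list[str]:
    """Return up to `limit` in-stock suggestions matching `prefix`, most requested first."""
    prefix = prefix.lower()
    # Filter: keep prefix matches that are in stock.
    matching = [c for c in candidates if c.text.lower().startswith(prefix) and c.in_stock]
    # Rank: most requested first.
    matching.sort(key=lambda c: c.requests, reverse=True)
    return [c.text for c in matching[:limit]]


# Hypothetical catalogue; the numbers are made up for illustration.
CATALOGUE = [
    Suggestion("iphone 7", True, 800),
    Suggestion("iphone 5c", False, 300),
    Suggestion("iphone 6 cases", True, 500),
    Suggestion("iphone 8", True, 1200),
    Suggestion("ipad mini", True, 900),
]

print(suggest("ipho", CATALOGUE))  # ['iphone 8', 'iphone 7', 'iphone 6 cases']
```

Even this small variant needs stock data and request statistics next to the term index, which is where a plain suggester starts to fall short.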

Page 10: Challenges for Industrial-strength Information Retrieval on Databases

autocompletion

iphone cases

iphone adapters

iphone 7

ipho|

Exploratory

○ First suggest categories..

○ .. then products

Tailored search is expected

Page 11: Challenges for Industrial-strength Information Retrieval on Databases

autocompletion

iphone 7 cases

iphone 7 adapters

iphone 8

ipho|

Personalised

○ I already own an “iPhone 7”

○ Suggest compatible accessories

○ Suggest upgrade

Tailored search is expected
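A personalised variant might look like the sketch below, under the assumption that compatibility and upgrade relations are available as simple lookups. The `COMPATIBLE` and `UPGRADE` tables and the `personalised_suggest` function are hypothetical names introduced only for illustration.

```python
# Hypothetical product knowledge: accessories per phone, and upgrade paths.
COMPATIBLE = {
    "iphone 7": ["iphone 7 cases", "iphone 7 adapters"],
    "iphone 6s": ["iphone 6 cases"],
}
UPGRADE = {"iphone 6s": "iphone 7", "iphone 7": "iphone 8"}


def personalised_suggest(prefix, owned, limit=3):
    """Suggest accessories for owned products first, then an upgrade for each."""
    prefix = prefix.lower()
    ranked = []
    for product in owned:
        ranked.extend(COMPATIBLE.get(product, []))
    for product in owned:
        if product in UPGRADE:
            ranked.append(UPGRADE[product])
    seen = set()
    result = []
    for text in ranked:
        # Skip non-matching terms, duplicates, and products the user already owns.
        if text.startswith(prefix) and text not in seen and text not in owned:
            seen.add(text)
            result.append(text)
    return result[:limit]


print(personalised_suggest("ipho", ["iphone 7"]))
# ['iphone 7 cases', 'iphone 7 adapters', 'iphone 8']
```

The ordering rule (accessories before upgrades) is a design choice of this sketch; the slide only states that both kinds of suggestion appear.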

Page 12: Challenges for Industrial-strength Information Retrieval on Databases

What if my search API isn’t enough?

Tailored search needs modelling

iphone 7 cases

iphone 7 adapters

iphone 8

ipho|

<your favourite autocompletion>

○ Out-of-the-box API may fall short

○ Build custom search API

○ Who? How?

http://localhost:8983/solr/suggest?q=ipho

Page 13: Challenges for Industrial-strength Information Retrieval on Databases

How do we build custom search APIs?

Search modelling by information specialists

data modelling, search modelling

Spinque: Empower the information specialist

Page 14: Challenges for Industrial-strength Information Retrieval on Databases

Empowering the information specialist

data modelling, search modelling

Search modelling by information specialists

Page 15: Challenges for Industrial-strength Information Retrieval on Databases

Data modelling

Search modelling needs flexible IR & DB

business transactions social media

Page 16: Challenges for Industrial-strength Information Retrieval on Databases

Search modelling

standard autocompletion: http://spinque/suggest?q=ipho

custom autocompletion: http://spinque/suggest_ranked?q=ipho

Search modelling by information specialists

Page 17: Challenges for Industrial-strength Information Retrieval on Databases

The IR & DB challenge

Search modelling needs flexible IR & DB

○ IR & DB both needed even for trivial tasks

○ Different technologies / focus

○ How / where to integrate task results?

○ Do they stay black boxes?

○ Can we express them in the same platform, and when does this make sense?

http://spinque/suggest_ranked?q=ipho

Page 18: Challenges for Industrial-strength Information Retrieval on Databases

Text retrieval by strategy

Search modelling needs flexible IR & DB

text retrieval.. ..is just another DB query

○ strategy-driven “collection” and “documents”

○ on-demand indexing

○ it takes just standard SQL
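To make the "just another DB query" idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. It assumes a hypothetical `postings` table and a deliberately naive score (summed term frequency); the ranking model actually used in the talk is not specified on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postings (term TEXT, doc_id INTEGER, tf INTEGER)")

# Hypothetical mini collection.
DOCS = {
    1: "red pen with blue cap",
    2: "blue pen blue ink",
    3: "notebook with red cover",
}

# "Indexing" is just filling a relational table of (term, document, frequency).
for doc_id, body in DOCS.items():
    counts = {}
    for term in body.split():
        counts[term] = counts.get(term, 0) + 1
    conn.executemany(
        "INSERT INTO postings VALUES (?, ?, ?)",
        [(term, doc_id, tf) for term, tf in counts.items()],
    )


def search(query):
    """Rank documents by the summed term frequency of the query terms."""
    terms = query.split()
    placeholders = ",".join("?" for _ in terms)
    sql = (
        "SELECT doc_id, SUM(tf) AS score FROM postings "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY doc_id ORDER BY score DESC, doc_id"
    )
    return conn.execute(sql, terms).fetchall()


print(search("blue pen"))  # [(2, 3), (1, 2)]
```

Because the "collection" is an ordinary table, it could equally be defined by a query (for example, only in-stock products) and indexed when first needed.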

Page 19: Challenges for Industrial-strength Information Retrieval on Databases

Graph DB by strategy

Search modelling needs flexible IR & DB

Visual modelling Relational Algebra Graph

subject  property      object
123      name          pen
123      availability  in stock
123      price         9.99

Page 20: Challenges for Industrial-strength Information Retrieval on Databases

Graph DB by strategy

Search modelling needs flexible IR & DB

we want DB & ranking together & seamlessly

what if this.. ..could work on this?

subject  property      object    p
123      name          pen       1.0
123      availability  in stock  0.8
123      price         9.99      1.0

Page 21: Challenges for Industrial-strength Information Retrieval on Databases

Rank. Everything. Always.

Search modelling needs flexible IR & DB

rank products.. ..get ranked orders and customers

Fuhr, Rölleke, 1997, A probabilistic relational algebra for the integration of IR and DB

SQL:

SELECT g.obj, (o.p * g.p) AS p
FROM graph g, ranked_orders o
WHERE g.subj = o.id
  AND g.rel = 'orderedBy';

PRA:

PROJECT [$3] (
  JOIN INDEPENDENT [$1=$1] (
    SELECT [$2='orderedBy'] (g),
    ranked_orders
  )
)
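The SQL query on this slide can be tried out with Python's built-in `sqlite3`. The sketch below assumes the `graph` and `ranked_orders` schemas implied by the query; all rows are hypothetical, and the `ORDER BY` clause is an addition that is not on the slide. Multiplying the two `p` columns corresponds to a join under an independence assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph (subj TEXT, rel TEXT, obj TEXT, p REAL)")
conn.execute("CREATE TABLE ranked_orders (id TEXT, p REAL)")

# Hypothetical triples: every fact in the graph carries a probability p.
conn.executemany("INSERT INTO graph VALUES (?, ?, ?, ?)", [
    ("order1", "orderedBy", "alice", 1.0),
    ("order2", "orderedBy", "bob", 1.0),
    ("order2", "contains", "pen", 1.0),
])
# Hypothetical output of an earlier ranking step over orders.
conn.executemany("INSERT INTO ranked_orders VALUES (?, ?)", [
    ("order1", 0.5),
    ("order2", 0.25),
])

rows = conn.execute("""
    SELECT g.obj, (o.p * g.p) AS p
    FROM graph g, ranked_orders o
    WHERE g.subj = o.id
      AND g.rel = 'orderedBy'
    ORDER BY 2 DESC  -- added here: sort by the combined probability
""").fetchall()

print(rows)  # [('alice', 0.5), ('bob', 0.25)]
```

The ranking of orders propagates to customers without any ranking-specific operator: the scores travel as an ordinary column.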

Page 22: Challenges for Industrial-strength Information Retrieval on Databases

What about efficiency?

IR on DB: it works

1.1M docs, 2.3GB

4-core i7-3770s, 16GB RAM, 256GB SSD

find documents: 20ms

8M lots, 25K auctions (10GB raw data)

VM (8 CPUs) on Xeon E5-2620, 16GB RAM, 256GB SSD

find lots: 150ms

topic

Page 23: Challenges for Industrial-strength Information Retrieval on Databases

What about efficiency?

IR on DB: it works

pre-compute what can be pre-computed.. ..but do it query-driven

○ Index on demand

○ Cache result of relational expressions

○ Algebraic analysis to determine cache
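The caching idea above can be illustrated with a toy evaluator. This is only a sketch under strong assumptions: expressions are nested tuples, the cache is keyed on structural equality, and the operator set (`table`, `select`, `project`) is hypothetical. The algebraic analysis mentioned on the slide would decide what is worth caching far more carefully.

```python
# Relational expressions as nested tuples, e.g.
# ("select", "rel", "orderedBy", ("table", "graph")).
TABLES = {
    "graph": [
        ("order1", "orderedBy", "alice"),
        ("order2", "orderedBy", "bob"),
        ("order2", "contains", "pen"),
    ],
}
COLUMNS = {"subj": 0, "rel": 1, "obj": 2}

cache = {}
evaluations = []  # records which operators were actually computed


def evaluate(expr):
    """Evaluate an expression, reusing cached results of identical (sub)expressions."""
    if expr in cache:
        return cache[expr]
    op = expr[0]
    evaluations.append(op)
    if op == "table":
        result = tuple(TABLES[expr[1]])
    elif op == "select":
        _, column, value, child = expr
        result = tuple(r for r in evaluate(child) if r[COLUMNS[column]] == value)
    elif op == "project":
        _, column, child = expr
        result = tuple(r[COLUMNS[column]] for r in evaluate(child))
    else:
        raise ValueError(f"unknown operator: {op}")
    cache[expr] = result
    return result


# Two queries sharing the same selection: it is computed once, on first demand.
ordered = ("select", "rel", "orderedBy", ("table", "graph"))
customers = evaluate(("project", "obj", ordered))
order_ids = evaluate(("project", "subj", ordered))
print(customers, order_ids, evaluations)
```

Nothing is computed before a query asks for it, and the second query only pays for its own projection.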

Page 24: Challenges for Industrial-strength Information Retrieval on Databases

What about efficiency?

IR on DB: it works

choose it carefully.. ..then enjoy

○ Main benefits of IR on DB

○ IR as a DB optimisation problem

○ No custom extensions, no vendor lock-in

○ Column-store, CPU-friendly DB engine

Hey, we made our join 20% faster. You are welcome.

Page 25: Challenges for Industrial-strength Information Retrieval on Databases

○ If you just need text retrieval on documents

○ A Lucene-like engine will serve you well

○ Information needs tend to be more complex

○ Solve at application-level: common and painful

○ A one-platform approach pays off

IR on DB: when does it make sense?

IR on DB: it works

Page 26: Challenges for Industrial-strength Information Retrieval on Databases

Conclusions

1. Search is everywhere

○ In the real world..

2. Tailored search is expected

○ ..there is no search like another.

3. Tailored search needs modelling

○ Someone will put effort into it..

4. Search modelling by information specialists

○ ..who better than the right person for the job?

5. Search modelling needs flexible IR & DB

○ Who takes care of the low-level details then?

6. IR on DB: it works

○ The right tools. The right architecture.

Page 27: Challenges for Industrial-strength Information Retrieval on Databases

○ Live updates

○ ACID transactions overhead

○ Scale out

○ It’s more than “just an inverted file” to be distributed

○ Even better support for information specialists

○ Strategy auto-tuning

Challenges ahead

Page 28: Challenges for Industrial-strength Information Retrieval on Databases

supporting information specialists

Don’t program search engines,

design them