When worlds collide Metasearching meets central indexes Mike Taylor – mike@indexdata. Index Data – http://indexdata
When worlds collide Metasearching meets central indexes

Jan 16, 2016

Transcript
Page 1: When worlds collide Metasearching meets central indexes

When worlds collide

Metasearching meets central indexes

Mike Taylor – [email protected]

Index Data – http://indexdata.com/

Page 2: When worlds collide Metasearching meets central indexes

Search

When worlds collide: metasearching and central indexes Mike Taylor – [email protected]

Page 3: When worlds collide Metasearching meets central indexes

Search

Page 4: When worlds collide Metasearching meets central indexes

Search

Data

Page 5: When worlds collide Metasearching meets central indexes

Search

Data

Problem solved!

Page 6: When worlds collide Metasearching meets central indexes

Search

Data Data Data

? ?

Page 7: When worlds collide Metasearching meets central indexes

Metasearch

Magic box

Data Data Data Data

Searching

Page 8: When worlds collide Metasearching meets central indexes

Metasearch

Magic box

Data Data Data Data

Searching

360 Search, EHIS (EBSCO), MetaLib

Page 9: When worlds collide Metasearching meets central indexes

Metasearch

Magic box

Data Data Data Data

Searching

360 Search, EHIS (EBSCO), MetaLib

Pazpar2 (open source)

Page 10: When worlds collide Metasearching meets central indexes

Metasearch

Magic box

Data Data Data Data

Searching

Page 11: When worlds collide Metasearching meets central indexes

Metasearch

Magic box

Data Data Data Data

A.K.A. federated search

Searching

Page 12: When worlds collide Metasearching meets central indexes

Metasearch

Magic box

Data Data Data Data

A.K.A. federated search

A.K.A. distributed search

Searching

Page 13: When worlds collide Metasearching meets central indexes

Metasearch

Magic box

Data Data Data Data

A.K.A. federated search

A.K.A. broadcast search

A.K.A. distributed search

Searching

?

Page 14: When worlds collide Metasearching meets central indexes

Back to the sad searcher

Data Data Data

? ?

Page 15: When worlds collide Metasearching meets central indexes

Central index

Data Data Data Data

Fat database

Harvesting

Page 16: When worlds collide Metasearching meets central indexes

Central index

Data Data Data Data

Fat database

Harvesting

Summon, WorldCat, Primo Central

Page 17: When worlds collide Metasearching meets central indexes

Central index

Data Data Data Data

Fat database

Harvesting

Summon, WorldCat, Primo Central

MasterKey

Page 18: When worlds collide Metasearching meets central indexes

Central index

Data Data Data Data

Fat database

Harvesting

A.K.A. local index

Page 19: When worlds collide Metasearching meets central indexes

Central index

Data Data Data Data

Fat database

Harvesting

A.K.A. local index

A.K.A. discovery services

Page 20: When worlds collide Metasearching meets central indexes

Central index

Data Data Data Data

Fat database

Harvesting

A.K.A. local index

A.K.A. vertical search

A.K.A. discovery services

?

Page 21: When worlds collide Metasearching meets central indexes

We need a controlled vocabulary!

Metasearch = Federated search = Distributed search = Broadcast search

Central index = Local index = Discovery services = Vertical search (if you ever heard anything so dumb)

Page 22: When worlds collide Metasearching meets central indexes

Which approach is better?

Central indexing compared with metasearching:

- requires harvesting infrastructure
- requires lots of local storage
- requires co-operation from services to be harvested
- does not have access to all searchable data
- will always be somewhat out of date
- is faster at search time (or SHOULD be)
- allows data to be normalised (e.g. dates extracted)
- allows for better relevance ranking
- can provide pre-baked facets
- may have access to some data that is not searchable

Page 23: When worlds collide Metasearching meets central indexes

Which approach is better?

Page 24: When worlds collide Metasearching meets central indexes

Which approach is better?

Page 25: When worlds collide Metasearching meets central indexes

Which approach is better?

Page 26: When worlds collide Metasearching meets central indexes

Which approach is better?

Let's do both!

Page 27: When worlds collide Metasearching meets central indexes

Magic box

Data Data Data Data

Searching

Data Data Data Data

Fat database

Harvesting

! “Integrated Search”

Page 28: When worlds collide Metasearching meets central indexes

Magic box

Data Data Data Data

Searching

Data Data Data Data

Fat database

Harvesting

! “Integrated Search”

Page 29: When worlds collide Metasearching meets central indexes

Magic box

Data Data Data Data

Searching

Data Data Data Data

Fat database

Harvesting

! “Integrated Search”

Page 30: When worlds collide Metasearching meets central indexes

Magic box

Data Data Data Data

Searching

Data Data Data Data

Fat database

Harvesting

! “Integrated Search”

Page 31: When worlds collide Metasearching meets central indexes

Metasearch hides the complexity

Magic box

Data Data Data Data

Searching

Page 32: When worlds collide Metasearching meets central indexes

Metasearch

Nine tenths under the surface

Magic box

Data Data Data Data

Searching

Page 33: When worlds collide Metasearching meets central indexes

Metasearch

What you see looks beautiful

Magic box

Data Data Data Data

Searching

Page 34: When worlds collide Metasearching meets central indexes

Problems that need solving

A. Problems with pure metasearching

B. How those problems change when you add a central index

Page 35: When worlds collide Metasearching meets central indexes

Problems with metasearching

Examples based on Index Data's suite:

Pazpar2 is a free metasearching engine with a stupid name

http://indexdata.com/pazpar2/

MasterKey is a non-open suite that wraps it

http://indexdata.com/masterkey/

MasterKey is only one way to use Pazpar2

Also integrated into other vendors' UIs.

Page 36: When worlds collide Metasearching meets central indexes

Problems with metasearching #1: No data server at all!

Data is often only in a user-facing Web UI

Must be made available via a standard protocol

Page 37: When worlds collide Metasearching meets central indexes

Problems with metasearching #1: No data server at all!

Data is often only in a user-facing Web UI

Must be made available via a standard protocol

Option 1: build a gateway in Perl

http://indexdata.com/simpleserver/

Page 38: When worlds collide Metasearching meets central indexes

Problems with metasearching #1: No data server at all!

Data is often only in a user-facing Web UI

Must be made available via a standard protocol

Option 1: build a gateway in Perl

http://indexdata.com/simpleserver/

Option 2: MasterKey Connect (non-open)

http://indexdata.com/connector-framework

Page 39: When worlds collide Metasearching meets central indexes

Problems with metasearching #2: data server is crap^H^H^H^Hsuboptimal

Catalogs searchable using ANSI/NISO Z39.50

Support is very nominal in some cases

Page 40: When worlds collide Metasearching meets central indexes

Problems with metasearching #2: data server is crap^H^H^H^Hsuboptimal

Catalogs searchable using ANSI/NISO Z39.50

Support is very nominal in some cases

IRSpy probes behaviour

http://irspy.indexdata.com

MasterKey target profiles describe behaviour

Page 41: When worlds collide Metasearching meets central indexes

Problems with metasearching #3: Data servers don't support relevance

Page 42: When worlds collide Metasearching meets central indexes

Problems with metasearching #3: Data servers don't support relevance

Pazpar2 does its own relevance ranking

(Part of merging/deduplication)
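The idea of ranking inside the merging step can be illustrated with a toy sketch. This is not Pazpar2's actual algorithm: the merge key (normalised title plus author), the term-frequency score, and all function names here are assumptions made purely for illustration.

```python
from collections import defaultdict


def merge_key(record):
    """Build a crude deduplication key from title and author (an assumed rule)."""
    title = "".join(ch for ch in record["title"].lower() if ch.isalnum())
    author = record.get("author", "").lower().strip()
    return (title, author)


def term_frequency_score(record, query_terms):
    """Count occurrences of the query terms in the title (a toy relevance measure)."""
    words = record["title"].lower().split()
    return sum(words.count(term.lower()) for term in query_terms)


def merge_and_rank(result_sets, query_terms):
    """Merge records from several servers, collapse duplicates, rank by score."""
    clusters = defaultdict(list)
    for records in result_sets:
        for record in records:
            clusters[merge_key(record)].append(record)

    merged = []
    for duplicates in clusters.values():
        best = max(term_frequency_score(r, query_terms) for r in duplicates)
        merged.append({"record": duplicates[0], "copies": len(duplicates), "score": best})

    # Highest score first; records with equal scores keep their arrival order.
    merged.sort(key=lambda m: m["score"], reverse=True)
    return merged
```

Because the servers return no scores of their own, everything the ranking uses has to come from the records themselves, which is why it sits naturally alongside deduplication.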

Page 43: When worlds collide Metasearching meets central indexes

Problems with metasearching #4: Data servers don't return facets

Page 44: When worlds collide Metasearching meets central indexes

Problems with metasearching #4: Data servers don't return facets

Pazpar2 calculates its own facets
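As a rough sketch of what calculating facets from retrieved records can look like, the following counts field values across a list of records. The record layout (plain dicts) and the function name are assumptions for illustration, not Pazpar2's internals.

```python
from collections import Counter


def compute_facets(records, fields):
    """Count how often each value occurs in each requested field."""
    facets = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            if value:  # skip records that lack the field
                facets[field][value] += 1
    return facets
```

Note that counts computed this way only cover the records actually fetched, not every hit on the remote server.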

Page 45: When worlds collide Metasearching meets central indexes

There is a lot of magic in the magic box:

Searching
Sorting
Merging
Deduplication
Relevance
Facet generation
Time travel...

Magic box

Data Data Data Data

Page 46: When worlds collide Metasearching meets central indexes

There is a lot of magic in the magic box:

Searching
Sorting
Merging
Deduplication
Relevance
Facet generation
Time travel...

Pazpar2

Data Data Data Data

Remember, our engine is free:

http://indexdata.com/pazpar2/

Page 47: When worlds collide Metasearching meets central indexes

Magic box

Data Data Data Data

Searching

Data Data Data Data

Fat database

Harvesting

! What happens when we add a central index?

Page 48: When worlds collide Metasearching meets central indexes

Problems with integrated search #1: No data server at all!

Data is often only in a user-facing Web UI

Page 49: When worlds collide Metasearching meets central indexes

Problems with integrated search #1: No data server at all!

Data is often only in a user-facing Web UI

Page 50: When worlds collide Metasearching meets central indexes

Problems with integrated search #1: No data server at all!

Data is often only in a user-facing Web UI

You can't harvest Google

Page 51: When worlds collide Metasearching meets central indexes

Problems with integrated search #1: No data server at all!

Data is often only in a user-facing Web UI

You can't harvest Google

You just can't

Page 52: When worlds collide Metasearching meets central indexes

Problems with integrated search #2: data server is crap^H^H^H^Hsuboptimal

Repositories harvestable using OAI-PMH

(an even worse name than pazpar2)

Support is very nominal in some cases

Page 53: When worlds collide Metasearching meets central indexes

Problems with integrated search #2: data server is crap^H^H^H^Hsuboptimal

Repositories harvestable using OAI-PMH (an even worse name than pazpar2)

Support is very nominal in some cases

OAI-PMH client must be very tolerant

Extensive data-cleaning is usually required
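A small sketch of the kind of tolerant data-cleaning a harvester might apply, here extracting a year from messy date strings. The field names, the accepted year range, and the function names are assumptions for illustration; real harvested data will need many more rules.

```python
import re

# A plausible four-digit year (1500-2099) not embedded in a longer run of digits.
YEAR_PATTERN = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")


def extract_year(raw_date):
    """Return the first plausible year found in a free-text date, or None."""
    if not raw_date:
        return None
    match = YEAR_PATTERN.search(raw_date)
    return int(match.group(1)) if match else None


def clean_record(raw):
    """Collapse whitespace, drop empty fields, and add a normalised year."""
    cleaned = {}
    for field, value in raw.items():
        if value is None:
            continue
        value = " ".join(str(value).split())
        if value:
            cleaned[field] = value
    year = extract_year(cleaned.get("date"))
    if year is not None:
        cleaned["year"] = year
    return cleaned
```

The cleaner never raises on odd input; it keeps what it can and leaves out what it cannot interpret, which is the tolerant behaviour the slide asks for.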

Page 54: When worlds collide Metasearching meets central indexes

Problems with integrated search #3: Central index does support relevance

Returned records carry relevance scores

Must be merged with records scored by engine

Requires score normalisation into same range

Existing ordering may be used in merge
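One way to picture the normalisation step is a min-max rescaling of each result set into the range 0 to 1 before merging. The slides do not say which normalisation is used, so min-max is an assumption here, as are the record layout and function names.

```python
def normalise_scores(records):
    """Rescale the scores of one result set into [0, 1] (min-max, an assumed choice)."""
    if not records:
        return []
    scores = [r["score"] for r in records]
    low, high = min(scores), max(scores)
    span = high - low
    # If every score is identical, treat them all as top-ranked.
    return [dict(r, score=(r["score"] - low) / span if span else 1.0) for r in records]


def merge_ranked(*result_sets):
    """Normalise each result set separately, then merge by descending score."""
    merged = [r for result_set in result_sets for r in normalise_scores(result_set)]
    # sorted() is stable, so ties keep the order in which the sets were given.
    return sorted(merged, key=lambda r: r["score"], reverse=True)
```

For example, a central index scoring on a 0 to 12 scale and an engine scoring on a 0 to 8 scale both end up on the same footing, so neither source dominates the merged list simply because of its scale.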

Page 55: When worlds collide Metasearching meets central indexes

Problems with integrated search #3: Central index does support relevance

[Diagram; labels: Unranked #1, Unranked #2, Ranked #1, Ranked #2, Solr, Sort (twice), Merged]

Page 56: When worlds collide Metasearching meets central indexes

Problems with integrated search #4: Central index does return facets

Lists of field values with occurrence counts:

Author: Kernighan 27, Pike 13, Ritchie 7, Thompson 4

Title: C 7, Unix 35, Programming 16

Date: 1977 5, 1978 4, 1979 2, 1981 2

Page 57: When worlds collide Metasearching meets central indexes

Problems with integrated search #4: Central index does return facets

Lists are returned or calculated for each server:

Server 1 (central index) (all facets from 2000 hits)

Cat 68, Dinosaur 162, Fish 145, Frog 19

Server 2 (metasearch) (1000 hits, 100 records)

Cat 7, Dog 10, Dinosaur 87, Fish 23

Page 58: When worlds collide Metasearching meets central indexes

Problems with integrated search #4: Central index does return facets

Metasearched counts normalised by total hit-count

Server 1 (central index) (all facets from 2000 hits)

Cat 68, Dinosaur 162, Fish 145, Frog 19

Server 2 (metasearch) (normalised to 1000 hits)

Cat 70, Dog 100, Dinosaur 870, Fish 230
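The normalisation on this slide amounts to scaling each sampled count by total hits divided by records fetched (here 1000 / 100 = 10). A minimal sketch, with the function and parameter names being assumptions:

```python
def normalise_facet_counts(sample_counts, records_fetched, total_hits):
    """Scale counts seen in the fetched records up to the server's total hit count."""
    factor = total_hits / records_fetched
    return {value: round(count * factor) for value, count in sample_counts.items()}
```

Applied to Server 2's counts (Cat 7, Dog 10, Dinosaur 87, Fish 23) with 100 records fetched out of 1000 hits, this yields the figures shown on the slide. The result is an estimate: it assumes the fetched records are representative of the full result set.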

Page 59: When worlds collide Metasearching meets central indexes

Problems with integrated search #4: Central index does return facets

Facet lists are merged

Servers 1+2 (integrated) (as though for all records in result sets)

Cat 68+70 = 138
Dog 0+100 = 100
Dinosaur 162+870 = 1032
Fish 145+230 = 375
Frog 19+0 = 19
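The merge itself is a per-value sum in which a value missing from one server counts as zero. A minimal sketch using the slide's numbers; the function name is an assumption:

```python
from collections import Counter


def merge_facets(*facet_lists):
    """Sum facet counts across servers; a value absent from a server contributes 0."""
    merged = Counter()
    for facets in facet_lists:
        merged.update(facets)  # Counter.update adds counts rather than replacing them
    return dict(merged)


central_index = {"Cat": 68, "Dinosaur": 162, "Fish": 145, "Frog": 19}
metasearch_normalised = {"Cat": 70, "Dog": 100, "Dinosaur": 870, "Fish": 230}
integrated = merge_facets(central_index, metasearch_normalised)
```

Here `integrated` holds Cat 138, Dog 100, Dinosaur 1032, Fish 375 and Frog 19, matching the merged list on the slide.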

Page 60: When worlds collide Metasearching meets central indexes

Problems with integrated search #4: Central index does return facets

Fringe benefit: facet-count normalisation is also useful when doing pure metasearching.

Servers 1+2 (as though for all records in result sets)

Cat 68+70 = 138
Dog 0+100 = 100
Dinosaur 162+870 = 1032
Fish 145+230 = 375
Frog 19+0 = 19

Page 61: When worlds collide Metasearching meets central indexes

Summary of search issues

Issue             Metasearch solution                   Central index solution
No data server    Build gateways; MasterKey Connect     ---
Bad data server   Probe capabilities; profile targets   Tolerant harvester; data-cleaning
Relevance scores  Magic engine; normalise scores        Ingest from server
Facets            Magic engine; normalise counts        Ingest from server

Page 62: When worlds collide Metasearching meets central indexes

Magic box

Data Data Data Data

Searching

Data Data Data Data

Fat database

Harvesting

Page 63: When worlds collide Metasearching meets central indexes

When worlds collide

Metasearching meets central indexes

Mike Taylor – [email protected]

Index Data – http://indexdata.com/