Hedtek making the difficult simple The Mosaic Search Engine Mark van Harmelen Hedtek Ltd [email protected] hedtek.com
Mosaic Search Engine

Dec 05, 2014

markvanharmelen

The Mosaic search engine is a prototype of a bibliographic search engine with personalisation facilities, produced as part of the JISC-funded Mosaic Project.
Transcript
Page 1: Mosaic Search Engine

Hedtek making the difficult simple

The Mosaic Search Engine

Mark van Harmelen, Hedtek Ltd

[email protected]

Page 2: Mosaic Search Engine

Aim

• Provide a proof of concept that
– Users can have personalised search results according to their place and stage of studies
– Users can adopt other personas or points-of-view to explore academic resources
– We can exploit ‘mass’ attention data as revealed by library circulation information
• So far only working with ISBN-identified books

Page 3: Mosaic Search Engine

[Architecture diagram. Labels: HEI (three shown), each with an anonymise step; circulation data; reading lists; partial Copac records annotated with use and reading list data; build Solr index; Solr; front-end]

Page 4: Mosaic Search Engine

Anonymisation

• Level 1: Current prototype, enables faceting
• Level 2: With extra information, enables “people who borrowed this also borrowed” and “people who borrowed this went on to borrow”
• Anonymisation utility provided
• DPA compliant, can also use fair processing agreements
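The slides do not describe how the anonymisation utility works. As a minimal sketch of one plausible reading of Level 1, the Python below drops the borrower identity from raw loan records and keeps only use counts per item and context, which is the kind of data faceting needs. The `Loan` record, its field names and `aggregate_loans` are all hypothetical and are not taken from the Mosaic utility.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Loan(NamedTuple):
    """One raw circulation record (hypothetical field names)."""
    borrower_id: str
    isbn: str
    institution: str
    course: str
    level: str
    year: int


def aggregate_loans(loans: Iterable[Loan]) -> Counter:
    """Count uses per (isbn, institution, course, level, year).

    The borrower_id is deliberately left out of the key, so the
    output carries no per-person information.
    """
    counts: Counter = Counter()
    for loan in loans:
        key = (loan.isbn, loan.institution, loan.course, loan.level, loan.year)
        counts[key] += 1
    return counts
```

Level 2 features such as “people who borrowed this also borrowed” would need loans to stay linked per borrower in some form, so an aggregation like this one would not be enough for them.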

Page 5: Mosaic Search Engine

Augmenting Solr’s index

• Solr’s search index is loaded with items and any associated use information
• Use information is:
– institution
– course
– progression level
– year of use
– count of number of uses in that year
• Use information enables faceting
• Also add reading list info to items
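To make the idea concrete, here is a rough sketch of what an item document annotated with use information might look like, and how a faceted query string could be built for it. The field names (`institution`, `course`, `level`, `year`, `total_uses`, `reading_list`) and both helper functions are assumptions for illustration; the slides do not give the actual schema. The request parameters `q`, `facet` and `facet.field` are standard Solr parameters.

```python
from urllib.parse import urlencode


def build_item_doc(isbn, title, uses, reading_lists):
    """Fold an item's use records into one Solr-style document.

    `uses` is a list of dicts with the keys institution, course,
    level, year and count (hypothetical shape). Multi-valued
    fields hold the distinct values so they can be faceted on.
    """
    return {
        "id": isbn,
        "title": title,
        "institution": sorted({u["institution"] for u in uses}),
        "course": sorted({u["course"] for u in uses}),
        "level": sorted({u["level"] for u in uses}),
        "year": sorted({u["year"] for u in uses}),
        "total_uses": sum(u["count"] for u in uses),
        "reading_list": sorted(set(reading_lists)),
    }


def facet_query_params(text, facet_fields):
    """Build the query string for a faceted Solr search."""
    params = [("q", text), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)
```

Flattening uses into multi-valued fields loses the link between, say, a course and the year it was used in. A schema with one document per use record would keep that link at the cost of a larger index.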

Page 6: Mosaic Search Engine

[Diagram. Labels: client-side front-end (browser); Solr; OPAC; query; resultset; item query; item data]

Page 7: Mosaic Search Engine

Narrowing and broadening

• Thoughts (NB, ‘thoughts’) of narrowing of choice led to two features to broaden choice
– Don’t believe that the Mosaic demo in itself narrows when used for browsing
• Broadening features
– ‘More like this’ link
– Reading lists

Page 8: Mosaic Search Engine

The Harry Potter ‘problem’ and scale

• The Harry Potter ‘problem’: Balderdash!
– We can control this using Library of Congress subject categories and Dewey Decimal shelfmarks
• Paul Miller raises questions of scale
– Dave Pattern has shown successful use of use data at a single (small) institution
– We want to leverage reasonably large scale: 3.5-4M students in HE, over say the last five years
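The slides do not say how subject categories and shelfmarks would be applied. One possible approach, sketched below under that caveat, is to restrict use-driven results to a Dewey Decimal range so that heavily borrowed fiction cannot crowd out subject material. The item shape (a dict with a `shelfmark` key) and both function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# A Dewey shelfmark is assumed to start with three digits and an
# optional decimal part, e.g. "823.914 ROW".
_DEWEY = re.compile(r"^\s*(\d{3}(?:\.\d+)?)")


def dewey_number(shelfmark: str) -> Optional[float]:
    """Return the leading Dewey number, or None if there is none."""
    match = _DEWEY.match(shelfmark)
    return float(match.group(1)) if match else None


def restrict_to_dewey(items: Iterable[dict], low: float, high: float) -> List[dict]:
    """Keep items whose Dewey number lies in the range [low, high)."""
    kept = []
    for item in items:
        number = dewey_number(item.get("shelfmark", ""))
        if number is not None and low <= number < high:
            kept.append(item)
    return kept
```

For example, `restrict_to_dewey(results, 500, 600)` would keep only items shelved in the 500s (science) and drop anything shelved under literature in the 800s.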

Page 9: Mosaic Search Engine

User context and attention

• Has been relatively simple to parameterise an open source search engine with user context
– Institution, course, progression level, academic year
• This is only part of the user context, can add
– Location
– Attention data, e.g., search history
– Further social search information
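As an illustration of what parameterising a search with user context could look like, the sketch below turns a context into Solr filter queries. `fq` is a standard Solr parameter; the field names and the `context_params` helper are assumptions, and the prototype may well rank or boost by context rather than filter.

```python
from urllib.parse import urlencode

# Hypothetical index fields that mirror the user context on the slide.
CONTEXT_FIELDS = ("institution", "course", "level", "year")


def context_params(text, context):
    """Build a Solr query string narrowed by a user context.

    `context` maps any subset of CONTEXT_FIELDS to a value; fields
    that are absent add no filter. Values are wrapped in double
    quotes without escaping, so they must not contain quotes.
    """
    params = [("q", text)]
    for field in CONTEXT_FIELDS:
        value = context.get(field)
        if value is not None:
            params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)
```

Adopting another persona or point-of-view then amounts to passing a different `context` for the same search text.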

Page 10: Mosaic Search Engine

Disclaimer

• The next slide is independent of any decisions on a pure data approach
– Could be a pure data approach in there
– Or maybe not

Page 11: Mosaic Search Engine

Where is this going? A personal view

• Bind together
– FRBRish catalogue: better search UX and persistent URLs for personalisation purposes
– Mosaic search: personalised/point-of-view search
– Massively parallel search for blindingly fast response times
– Data mining for library ‘stewardship’
• We have prototypes for the first two, and we’re about to start experimenting with parallel search using Hadoop+Lucene

Page 12: Mosaic Search Engine

Building institutional contributions

• Propose union-cat-local: Search in local library
– Mosaic-like search utilises local loan data if it is available
• Two ways to encourage library contribution of loan data (thoughts in progress)
– Narrow: Libraries which contribute loan data to the pool get Mosaic search over the pool
– Broad: Offer the contextual/PoV search everywhere; users will agitate if they don’t see local data

Page 13: Mosaic Search Engine

This is a Just Do It moment

• A national union catalogue with contextual search and local library interfaces
– Relatively cheap to do
– Potentially massive gains for learners, teachers and researchers
– Portends the development of shared services across the library domain and large cost savings
– Doesn’t preclude / agnostic on an open data approach
– Could incorporate a pure data service approach and/or a centralised service

Page 14: Mosaic Search Engine

Questions