Hedtek making the difficult simple The Mosaic Search Engine Mark van Harmelen Hedtek Ltd [email protected] hedtek.com
Dec 05, 2014
The Mosaic Search Engine
Mark van Harmelen, Hedtek Ltd
Aim
• Provide a proof of concept that
  – Users can have personalised search results according to their place and stage of studies
  – Users can adopt other personas or points-of-view to explore academic resources
  – We can exploit ‘mass’ attention data as revealed by library circulation information
• So far only working with ISBN-identified books
[Architecture diagram: each HEI anonymises its circulation data; the anonymised circulation data and reading lists yield partial Copac records annotated with use and reading list data; these are used to build the Solr index, and Solr serves the front-end.]
Anonymisation
• Level 1: Current prototype, enables faceting
• Level 2: With extra information, enables “people who borrowed this also borrowed” and “people who borrowed this went on to borrow”
• Anonymisation utility provided
• DPA compliant, can also use fair processing agreements
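
As a rough illustration of what Level 2 data could enable, the sketch below replaces borrower identifiers with salted hashes and then counts which books share borrowers. The record layout, the salt handling and the `pseudonymise` and `also_borrowed` helpers are assumptions made for this example; they are not the anonymisation utility mentioned above.

```python
import hashlib
from collections import Counter, defaultdict


def pseudonymise(borrower_id, salt):
    """Replace a borrower id with a salted SHA-256 digest (hypothetical scheme)."""
    return hashlib.sha256((salt + borrower_id).encode("utf-8")).hexdigest()


def also_borrowed(loans, isbn, top=3):
    """Rank other ISBNs by how many pseudonymous borrowers they share with `isbn`."""
    by_borrower = defaultdict(set)
    for key, loaned_isbn in loans:
        by_borrower[key].add(loaned_isbn)
    counts = Counter()
    for isbns in by_borrower.values():
        if isbn in isbns:
            counts.update(isbns - {isbn})
    return counts.most_common(top)


# Made-up loan events: (borrower id, ISBN placeholder).
raw = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"), ("u2", "C"), ("u3", "C")]
loans = [(pseudonymise(borrower, "secret"), isbn) for borrower, isbn in raw]
print(also_borrowed(loans, "A"))
```

A “went on to borrow” variant would additionally need loan dates so that later loans can be told apart from earlier ones.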
Augmenting Solr’s index
• Solr’s search index is loaded with items and any associated use information
• Use information is:
  – institution
  – course
  – progression level
  – year of use
  – number of uses in that year
• Use information enables faceting
• Also add reading list info to items
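
The use information listed above can be pictured as one record per item and context. The following is a minimal sketch, assuming loan events arrive as dictionaries; the field names and the `use_records` helper are hypothetical and do not reflect the actual Mosaic index schema.

```python
import json
from collections import Counter


def use_records(loans):
    """Aggregate loan events into yearly counts per context, dropping borrower detail."""
    counts = Counter(
        (loan["isbn"], loan["institution"], loan["course"], loan["level"], loan["year"])
        for loan in loans
    )
    return [
        {
            "isbn": isbn,
            "institution": institution,
            "course": course,
            "progression_level": level,
            "year": year,
            "uses": uses,
        }
        for (isbn, institution, course, level, year), uses in sorted(counts.items())
    ]


# Made-up loan events; the "borrower" field is discarded by the aggregation.
loans = [
    {"isbn": "9780000000001", "institution": "HEI-A", "course": "Physics",
     "level": 1, "year": 2009, "borrower": "u1"},
    {"isbn": "9780000000001", "institution": "HEI-A", "course": "Physics",
     "level": 1, "year": 2009, "borrower": "u2"},
    {"isbn": "9780000000001", "institution": "HEI-A", "course": "Physics",
     "level": 2, "year": 2009, "borrower": "u1"},
]
docs = use_records(loans)
print(json.dumps(docs, indent=2))
```

Records of this shape could then be attached to the matching item before indexing, so that each context field is available as a facet.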
[Diagram: the client-side front-end (browser) sends a query to Solr and receives a result set; it sends an item query to the OPAC and receives item data.]
Narrowing and broadening
• Thoughts (NB, ‘thoughts’) of narrowing of choice led to two features to broaden choice
  – Don’t believe that the Mosaic demo in itself narrows when used for browsing
• Broadening features
  – More like this link
  – Reading lists
The Harry Potter ‘problem’ and scale
• The Harry Potter ‘problem’: Balderdash!
  – We can control this using Library of Congress subject categories and Dewey Decimal shelfmarks
• Paul Miller raises questions of scale
  – Dave Pattern has shown successful use of use data at a single (small) institution
  – We want to leverage reasonably large scale: 3.5-4M students in HE, over say the last five years
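
The slide does not say how subject categories and shelfmarks would be applied to the Harry Potter ‘problem’. One possible reading is to keep only suggestions that share the source item's broad Dewey class, so that a heavily borrowed novel does not surface alongside a physics text. The sketch below assumes that reading; the helper names and sample records are hypothetical.

```python
def dewey_main_class(shelfmark):
    """Return the leading digit of a Dewey shelfmark, e.g. '5' for '530.12'."""
    return shelfmark.strip()[:1]


def within_subject(source_shelfmark, candidates):
    """Keep candidates whose Dewey main class matches the source item's."""
    wanted = dewey_main_class(source_shelfmark)
    return [c for c in candidates if dewey_main_class(c["dewey"]) == wanted]


# Made-up candidate suggestions for an item shelved at 530.1 (physics).
candidates = [
    {"title": "Hypothetical physics text", "dewey": "530.12"},
    {"title": "Hypothetical popular novel", "dewey": "823.914"},
    {"title": "Hypothetical maths text", "dewey": "510"},
]
kept = within_subject("530.1", candidates)
print([c["title"] for c in kept])
```

Matching on the first digit is deliberately coarse; a longer shelfmark prefix or a Library of Congress subject comparison would narrow the filter.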
User context and attention
• Has been relatively simple to parameterise an open source search engine with user context
  – Institution, course, progression level, academic year
• This is only part of the user context, can add
  – Location
  – Attention data, e.g., search history
  – Further social search information
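
To make the idea of parameterising a search with user context concrete, here is a minimal sketch that builds a Solr query string from a context dictionary. `q`, `fq`, `facet`, `facet.field` and `wt` are standard Solr request parameters; the field names, the `contextual_query` helper and the choice of filter queries (rather than, say, boosting) are assumptions for illustration and may not match the Mosaic demo.

```python
from urllib.parse import parse_qs, urlencode


def contextual_query(text, context, facet_fields=("institution", "course")):
    """Build a Solr query string that filters and facets on user context."""
    params = [("q", text), ("wt", "json"), ("facet", "true")]
    for field in facet_fields:
        params.append(("facet.field", field))
    # One filter query per context field; values are quoted as phrases.
    # Escaping of quotes inside values is omitted for brevity.
    for field, value in sorted(context.items()):
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


query_string = contextual_query(
    "quantum mechanics",
    {"course": "Physics", "progression_level": "2"},
)
print(query_string)
print(parse_qs(query_string)["fq"])
```

Adopting another persona or point of view then amounts to passing a different context dictionary with the same search text.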
Disclaimer
• The next slide is independent of any decisions on a pure data approach
  – Could be a pure data approach in there
  – Or maybe not
Where is this going? A personal view
• Bind together
  – FRBRish catalogue: better search UX and persistent URLs for personalisation purposes
  – Mosaic search: personalised/point-of-view search
  – Massively parallel search for blindingly fast response times
  – Data mining for library ‘stewardship’
• We have prototypes for the first two, and we’re about to start experimenting with parallel search using Hadoop+Lucene
Building institutional contributions
• Propose union-cat-local: Search in local library
  – Mosaic-like search utilises local loan data if it is available
• Two ways to encourage library contribution of loan data (thoughts in progress)
  – Narrow: Libraries which contribute loan data to the pool get Mosaic search over the pool
  – Broad: Offer the contextual/PoV search available everywhere; users will agitate if they don’t see local data
This is a Just Do It moment
• A national union catalogue with contextual search and local library interfaces
  – Relatively cheap to do
  – Potentially massive gains for learners, teachers and researchers
  – Portends the development of shared services across the library domain and large cost savings
  – Doesn’t preclude / agnostic on an open data approach
  – Could incorporate a pure data service approach and/or a centralised service
Questions