
Searching Linked Data with Spinque

Michiel Hildebrand, Wouter Alink, Roberto Cornacchia, Arjen de Vries

Search Engines Amsterdam, January 30 2015

background

concept

product

Information Retrieval and DB integration
Cornacchia et al. Flexible and efficient IR using Array Databases. VLDB Journal ’08
Mühleisen et al. Column Stores as an IR Prototyping Tool. ECIR’14 & SIGIR’14

Search by Strategy
Alink et al. Searching CLEF-IP by strategy. CLEF’09
PatOlympics, 2010 and 2011

Tailored access to connected datasets
Koninklijke Bibliotheek, Wageningen Universiteit, Beeld&Geluid, Elsevier, Heineken, ...

Heterogeneous Data

Hang Li et al. A new approach to intranet search based on information extraction. CIKM’05

Complex information needs

SQL

CSV

XML

HTML

OAI

JSON

Heterogeneous University Data

Financial administration (ERP)
Contract administration (CMS)
Contract documents (CMS attachments)
Publication database (Institutional Repository)
Publication documents (Institutional Repository PDFs)
Employee database (address lists, ERP + CMS)
Companies (CMS + ERP + document mentions)
Subsidy database (CMS)
Departments (address lists, CMS)
Web addresses (extracted from documents)
Topic (assigned to publications)
Research programmes (dependent on funding scheme)

Complex information needs

What funding schemes are the primary source of income?
Can we move to Europe when Dutch funding dries up?

Who has active relations with partner X?
“Valorisation”: new national funding requirements
What industry sectors do we depend upon?
How many projects in smart cities? Green energy? Cloud computing? Etc.
How are strategic decisions implemented?
Has the objective “move from Telecom toward ICT” been achieved, and how does it develop over time?

Heterogeneous University Data

Harvest and link data, model as a graph
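The slide only states that harvested data is linked and modelled as a graph. A minimal sketch of that idea, assuming each record is flattened into (subject, predicate, object) triples and indexed in both directions so later strategies can traverse relations either way. The record identifiers, field names and the RDF-style `rdf:type` predicate are hypothetical, not Spinque's actual data model.

```python
from collections import defaultdict

# Hypothetical records from two of the sources listed above
# (CMS contracts and the institutional repository); all names are invented.
contracts = [
    {"id": "contract:17", "title": "Smart grid pilot", "partner": "company:acme"},
]
publications = [
    {"id": "pub:42", "title": "Cloud storage for smart cities",
     "author": "person:jdoe", "project": "contract:17"},
]


def to_triples(records, type_name):
    """Flatten records into (subject, predicate, object) triples."""
    triples = []
    for rec in records:
        subject = rec["id"]
        triples.append((subject, "rdf:type", type_name))
        for key, value in rec.items():
            if key != "id":
                triples.append((subject, key, value))
    return triples


def build_graph(triples):
    """Index triples by subject (outgoing edges) and by object (incoming edges)."""
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))
        in_edges[o].append((p, s))
    return out_edges, in_edges


triples = to_triples(contracts, "Contract") + to_triples(publications, "Publication")
out_edges, in_edges = build_graph(triples)
print(in_edges["contract:17"])  # → [('project', 'pub:42')]
```

Once both sources share one graph, the publication is reachable from the contract even though the two records came from different systems.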

Complex information needs

Search by Strategy

Project by topic

Search in attachments of projects

Search for project contracts (by metadata)

Traverse from attachments to projects & combine results
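A toy sketch of the "project by topic" strategy as four steps. Every modelling choice here is an assumption of ours: a term-overlap score stands in for a real retrieval model, each project keeps its best-scoring attachment, and the two evidence sources are combined with a noisy-OR, which treats them as independent. The data and function names are hypothetical.

```python
# Hypothetical toy data: attachment texts, contract metadata, and the
# attachment -> project links obtained from the graph.
attachments = {
    "att:1": "smart city sensor network deployment",
    "att:2": "annual financial report",
}
contracts = {
    "proj:A": {"keywords": "smart city mobility"},
    "proj:B": {"keywords": "green energy"},
}
attachment_of = {"att:1": "proj:A", "att:2": "proj:B"}


def text_score(query, text):
    """Fraction of query terms present in text (stand-in for a real ranking model)."""
    terms = query.split()
    words = set(text.split())
    return sum(t in words for t in terms) / len(terms)


def project_by_topic(query):
    # Step 1: search in attachments of projects.
    att_scores = {a: text_score(query, t) for a, t in attachments.items()}
    # Step 2: search project contracts by metadata.
    meta_scores = {p: text_score(query, m["keywords"]) for p, m in contracts.items()}
    # Step 3: traverse attachments -> projects, keeping the best attachment per project.
    via_att = {}
    for att, score in att_scores.items():
        proj = attachment_of[att]
        via_att[proj] = max(via_att.get(proj, 0.0), score)
    # Step 4: combine both evidence sources with a noisy-OR (assumes independence).
    combined = {}
    for proj in contracts:
        a = via_att.get(proj, 0.0)
        m = meta_scores.get(proj, 0.0)
        combined[proj] = 1 - (1 - a) * (1 - m)
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


print(project_by_topic("city energy"))  # → [('proj:A', 0.75), ('proj:B', 0.5)]
```

Project A is supported by both its attachment and its metadata, so it outranks project B, which only matches on metadata.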

Topic expert

Search objects about topic

Expand with neighbours in and out

Return related persons, ranked by tf-idf on relations
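One possible reading of "ranked by tf-idf on relations", sketched under assumptions of our own: tf counts a person's relations to the topical objects, and idf is log(N / degree), where N is the number of non-person objects and degree is the person's total number of relations. This down-weights people who are related to everything. The edges and identifiers are hypothetical.

```python
import math

# Hypothetical edges (subject, predicate, object) from the linked data graph.
edges = [
    ("pub:1", "author", "person:ann"),
    ("pub:2", "author", "person:ann"),
    ("pub:2", "author", "person:bob"),
    ("proj:7", "member", "person:bob"),
    ("proj:8", "member", "person:bob"),
    ("proj:9", "member", "person:bob"),
]
# Step 1 (assumed already done): objects that matched the topic search.
topical = {"pub:1", "pub:2"}


def neighbours(node):
    """Step 2: expand a node with both outgoing and incoming neighbours."""
    outgoing = [o for s, _, o in edges if s == node]
    incoming = [s for s, _, o in edges if o == node]
    return outgoing + incoming


def topic_experts(topical_objects):
    """Step 3: rank related persons by a tf-idf score over their relations."""
    n_objects = len({s for s, _, _ in edges})  # non-person nodes in this toy graph
    tf = {}
    for obj in topical_objects:
        for nb in neighbours(obj):
            if nb.startswith("person:"):
                tf[nb] = tf.get(nb, 0) + 1
    scores = {
        person: freq * math.log(n_objects / len(neighbours(person)))
        for person, freq in tf.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = topic_experts(topical)
print([person for person, _ in ranking])  # → ['person:ann', 'person:bob']
```

Bob is related to more objects overall, but most of those are off-topic, so his idf term is small and Ann ranks first.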

Norbert Fuhr, Thomas Rölleke. A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems (1994)
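To illustrate the kind of operators a probabilistic relational algebra provides, here is a minimal Python sketch: relations map tuples to probabilities, a join multiplies probabilities, and projection merges duplicate tuples by independent disjunction. It covers only the independence assumption and is not Spinque's implementation. The relation names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical probabilistic relations: tuple -> probability.
about = {("doc1", "energy"): 0.8, ("doc2", "energy"): 0.5, ("doc2", "cloud"): 0.9}
attachment_of = {("doc1", "projA"): 1.0, ("doc2", "projA"): 1.0}


def select(rel, pred):
    """Keep tuples satisfying pred; probabilities are unchanged."""
    return {t: p for t, p in rel.items() if pred(t)}


def join(r1, r2, i, j):
    """Equi-join on r1[i] == r2[j]; probabilities multiply (independent events)."""
    return {
        t1 + t2: p1 * p2
        for t1, p1 in r1.items()
        for t2, p2 in r2.items()
        if t1[i] == t2[j]
    }


def project(rel, cols):
    """Project onto cols; merge duplicates by independent disjunction."""
    miss = defaultdict(lambda: 1.0)  # probability that no contributing tuple holds
    for t, p in rel.items():
        miss[tuple(t[c] for c in cols)] *= 1.0 - p
    return {key: 1.0 - m for key, m in miss.items()}


# Which projects are about "energy"? Combine evidence from all attachments.
energy_docs = select(about, lambda t: t[1] == "energy")
projects = project(join(energy_docs, attachment_of, 0, 0), [3])
print({k: round(v, 2) for k, v in projects.items()})  # → {('projA',): 0.9}
```

Because every operator consumes and produces weighted relations, they compose freely, which is what lets different search strategies run over one data model and still return ranked results.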

[Architecture diagram: data sources (SQL, CSV, HTML, OAI, XML), indexing pipeline, strategy editor, compiler, API, applications]

Search by Strategy: (visual) modelling of search processes

Rank. Everything. Always.: all-round probabilistic search

Many strategies, one data model: many search engines, one index

Components Supporting the Open Data Exploitation

[Architecture diagram: data sources (SQL, CSV, HTML, OAI, XML), indexing pipeline, strategy editor, compiler, API, applications]

Application front-end: 400 lines of JavaScript

autocompletion

Application back-end: 3 search strategies

location search

location + text search

API Builder for Open Data?

Supporting (search) application developers
Gregory Grefenstette. Search-based applications. 2010
Jamie Callan. Search Engine Support For Software Applications. CIKM 2010 Keynote

Who builds search strategies?
Developers are not IR specialists
Neither are domain specialists

How to handle the schema mess in a heterogeneous dataspace?

Happy alignments are all alike; every unhappy alignment is unhappy in its own way

Jacco van Ossenbruggen 2012 (improvisation on Anna Karenina, Leo Tolstoy 1887)

Alignment strategies

Interactive vocabulary alignment, Jacco van Ossenbruggen, Michiel Hildebrand, Victor de Boer, TPDL 2011

Coming soon

Spinque Alignment Service
Beeld&Geluid, Naturalis, Rijksdienst Cultureel Erfgoed (RCE)

www.spinque.com

michiel@spinque.com

comsode.eu
