Identifying Information Needs by Modelling Collective Query Patterns

K. Elbedweihy, S. Mazumdar, A.E. Cano, S.N. Wrigley, F. Ciravegna
OAK Research Group, Department of Computer Science, University of Sheffield

Jan 21, 2015

Transcript
Page 1: Identifying Information Needs by Modelling Collective Query Patterns

Identifying Information Needs by Modelling Collective Query Patterns

K. Elbedweihy, S. Mazumdar, A.E. Cano, S.N. Wrigley, F. Ciravegna
OAK Research Group, Department of Computer Science, University of Sheffield

Page 2: Identifying Information Needs by Modelling Collective Query Patterns

Information Needs

Information needs: “the set of concepts and properties users refer to while using SPARQL queries.”

Page 3: Identifying Information Needs by Modelling Collective Query Patterns

Information Needs (Cont’d)

PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?manufacturer WHERE {
  <http://dbpedia.org/resource/Acura_ZDX> dbo:manufacturer ?manufacturer .
}

• User’s information needs:
  concept: “http://dbpedia.org.../Automobile”
  property: “dbo:manufacturer”

[Diagram labels: query, type]
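A minimal Python sketch of how the concept and property above might be read off such a query. It is not the authors' implementation: the regular expression only handles simple "<IRI> prefix:name ?var" patterns, and the RESOURCE_TYPES lookup is a hypothetical stand-in for asking the dataset for the type of a resource (the concept URI is kept in the abbreviated form shown on the slide).

```python
import re

# Hypothetical lookup from a resource to its type; in practice this would be
# answered by the dataset. The value is the abbreviated URI from the slide.
RESOURCE_TYPES = {
    "http://dbpedia.org/resource/Acura_ZDX": "http://dbpedia.org.../Automobile",
}

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?manufacturer WHERE {
  <http://dbpedia.org/resource/Acura_ZDX> dbo:manufacturer ?manufacturer .
}
"""

# Matches one simple triple pattern: <subject IRI> prefix:predicate ?variable
TRIPLE = re.compile(r"<([^>]+)>\s+([A-Za-z][\w-]*:[\w-]+)\s+(\?\w+)")


def information_needs(query, types):
    """Return (concept, property) pairs for simple triple patterns."""
    body = query[query.index("{"):]  # skip the PREFIX/SELECT header
    needs = []
    for subject, predicate, _variable in TRIPLE.findall(body):
        # The concept is the type of the queried resource (None if unknown).
        needs.append((types.get(subject), predicate))
    return needs


NEEDS = information_needs(QUERY, RESOURCE_TYPES)
print(NEEDS)
```

A real log would need a full SPARQL parser; the regular expression is only enough for the single pattern on this slide.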

Page 4: Identifying Information Needs by Modelling Collective Query Patterns

Motivation

Saracevic [1997]: “The success or failure of any interactive system and technology is contingent on the extent to which user issues, the human factors, are addressed right from the beginning to the very end…”

Peter Mika [2009]: “Considering the information needs of end users is critical to the success of Semantic Search”

Page 5: Identifying Information Needs by Modelling Collective Query Patterns

Motivation

• understand how to use logs of queries
• identify information needs
• consume such analysis
• better understanding and insight into the data usage

Page 6: Identifying Information Needs by Modelling Collective Query Patterns

Outline

• Introduction
• Related Work
• Approach
 - Formalising Query Logs
 - Analysing Query Logs
 - Consuming Query Log Analyses
• Dataset & Findings

Page 7: Identifying Information Needs by Modelling Collective Query Patterns

Introduction

[Figure: Linked Open Data cloud diagram, as of September 2011. Legend: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content]

295 datasets, 31 billion RDF triples (September 2011)

Page 8: Identifying Information Needs by Modelling Collective Query Patterns

Introduction

Semantic Query Logs

Page 9: Identifying Information Needs by Modelling Collective Query Patterns

Related Work

Analysis for the Web of Documents

• Studying the search behavior of Web users [Silverstein et al. (1999), Jansen and Spink (2005), Jansen et al. (2005) and Spink et al. (2002)].

• Improving the search experience of Web users:
 - Query Recommendations [Baeza-Yates et al. (2004) and Wen et al. (2001)]
 - Query Expansion [Cui et al. (2002a)]

Page 10: Identifying Information Needs by Modelling Collective Query Patterns

Related Work (Cont’d)

Analysis for the Web of Data

• Moller et al. [10] identified patterns of Linked Data usage with respect to different types of agents.

• Arias et al. [1] analyzed the structure of the SPARQL queries to identify the most frequent language elements.

• Kirchberg et al. [8] introduced a new notion of ‘relevance of a LD resource’ as the ‘relationship between traffic and the resource and whether it changes over time windows’.

Page 11: Identifying Information Needs by Modelling Collective Query Patterns

Related Work (Cont’d)

How our work is different:

Our focus is on identifying information needs by modelling query patterns of Linked Data users.

• approach to formalize semantic query log analysis
• set of methods for extracting patterns in the query logs
• visualization of information needs

Page 12: Identifying Information Needs by Modelling Collective Query Patterns

APPROACH  

Page 13: Identifying Information Needs by Modelling Collective Query Patterns

Formalizing Query Logs

• Proposed ontology ‘Qlog’ used to represent the main concepts and relations extracted from a query log entry.

• A log entry follows the Combined Log Format (CLF):
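The example entry from the slide did not survive in this transcript. As an illustration only, the sketch below parses one Combined Log Format line in Python and pulls out the SPARQL query. The SAMPLE entry, the /sparql path and the "query" parameter name are made-up assumptions, not taken from the logs used in the talk, and the resulting dictionary is not the Qlog ontology itself.

```python
import re
from urllib.parse import urlparse, parse_qs

# Combined Log Format fields:
# host ident user [time] "request" status bytes "referer" "user-agent"
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical entry, invented for illustration.
SAMPLE = (
    '192.0.2.1 - - [01/Jan/2011:00:00:00 +0000] '
    '"GET /sparql?query=SELECT+%3Fs+WHERE+%7B+%3Fs+a+%3Fo+%7D HTTP/1.1" '
    '200 1024 "-" "ExampleAgent/1.0"'
)


def parse_entry(line):
    """Split a CLF line into named fields and decode the SPARQL query, if any."""
    match = CLF.match(line)
    if match is None:
        return None  # not a well-formed CLF entry
    entry = match.groupdict()
    parts = entry["request"].split()  # e.g. ["GET", "/sparql?query=...", "HTTP/1.1"]
    url = parts[1] if len(parts) >= 2 else ""
    params = parse_qs(urlparse(url).query)  # also undoes URL encoding
    entry["sparql"] = params.get("query", [None])[0]
    return entry


ENTRY = parse_entry(SAMPLE)
print(ENTRY["host"], ENTRY["time"], ENTRY["sparql"])
```

Each parsed field (host, time, agent, query) is the kind of value a log-entry concept could hold once the entry is represented in an ontology.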

Page 14: Identifying Information Needs by Modelling Collective Query Patterns

Qlog Ontology

[Figure: Qlog ontology, with panels “Log Entry Concepts” and “Query Logs Analysis Concepts”]

Page 15: Identifying Information Needs by Modelling Collective Query Patterns

Analyzing  Query  Logs  

Page 16: Identifying Information Needs by Modelling Collective Query Patterns

Consuming Query Logs Analysis

• How to consume the query logs analysis?

 - Automatic query suggestions

 - Recommender systems

 - Search tools (disambiguation and ranking results)

 - Visualizations (to gain understanding of dataset usage)
   1. Concept Graph
   2. Predicate sequence tree
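To make the first item concrete, here is one way automatic query suggestions could be driven by a log analysis: rank the properties most often queried for a concept. This is a sketch under assumptions, not a method described in the talk; the OBSERVED pairs are hypothetical data.

```python
from collections import Counter, defaultdict

# Hypothetical (concept, property) pairs, as might be mined from a query log.
OBSERVED = [
    ("dbo:Automobile", "dbo:manufacturer"),
    ("dbo:Automobile", "dbo:manufacturer"),
    ("dbo:Automobile", "rdfs:label"),
    ("dbo:Person", "dbo:birthPlace"),
]


def build_index(pairs):
    """Count how often each property was queried for each concept."""
    index = defaultdict(Counter)
    for concept, prop in pairs:
        index[concept][prop] += 1
    return index


def suggest(index, concept, k=2):
    """Return up to k properties most frequently queried for the concept."""
    counts = index.get(concept, Counter())
    return [prop for prop, _count in counts.most_common(k)]


INDEX = build_index(OBSERVED)
print(suggest(INDEX, "dbo:Automobile"))
```

The same counts could also feed a recommender or a ranking function, since all three consumers start from what users collectively asked about.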

Page 17: Identifying Information Needs by Modelling Collective Query Patterns

Consuming Query Logs Analysis: Visualizations

[Figures: Concept Graph; Predicate sequence tree]

Page 18: Identifying Information Needs by Modelling Collective Query Patterns

Steps for Consuming Query Logs Analysis

[Diagram: two pipelines drawing on the Query Logs Knowledge Base (KB)]

A1 Identify Instance Types, A2 Gather Class Size, A3 Build Vis Tables, A4 Render Vis

B1 Identify Predicate Sequence, B2 Build Transition Matrix, B3 Build Vis Tables, B4 Render Vis
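The two intermediate steps, "Gather Class Size" and "Build Transition Matrix", can be sketched in Python as follows. The slide does not define either term, so this assumes that class size is the number of times a concept is referenced in the log, and that the transition matrix holds row-normalised frequencies of one predicate directly following another. The input lists are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical outputs of "Identify Instance Types" and
# "Identify Predicate Sequence"; not data from the study.
INSTANCE_TYPES = ["dbo:Automobile", "dbo:Person", "dbo:Automobile"]
PREDICATE_SEQUENCES = [
    ["rdf:type", "dbo:manufacturer"],
    ["rdf:type", "rdfs:label"],
    ["rdf:type", "dbo:manufacturer", "rdfs:label"],
]


def class_sizes(types):
    """Assumed meaning of class size: how often each concept is referenced."""
    return Counter(types)


def transition_matrix(sequences):
    """First-order transitions between consecutive predicates, row-normalised."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix


SIZES = class_sizes(INSTANCE_TYPES)
MATRIX = transition_matrix(PREDICATE_SEQUENCES)
print(SIZES)
print(MATRIX)
```

Under these assumptions, the class sizes could set node sizes in a concept graph and the matrix rows could give branch weights in a predicate sequence tree.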

Page 19: Identifying Information Needs by Modelling Collective Query Patterns

CASE  STUDY  

Page 20: Identifying Information Needs by Modelling Collective Query Patterns

Dataset

• The data used in this study is made available by the USEWOD2011 data challenge.

• The logs contained around 5 million queries issued to DBpedia over a time period of almost 4 months.

Number of analyzed queries:        4951803
Number of unique triple patterns:  2641098
Number of unique subjects:         1168945
Number of unique predicates:       2003
Number of unique objects:          196221
Number of unique vocabularies:     323

Page 21: Identifying Information Needs by Modelling Collective Query Patterns

Analyzing DBpedia usage patterns

Page 22: Identifying Information Needs by Modelling Collective Query Patterns

Analyzing DBpedia usage patterns (Cont’d)

Page 23: Identifying Information Needs by Modelling Collective Query Patterns

Questions

Questions?