
STI Summit 2011 - Mlr-sm

Nov 11, 2014

Page 1: STI Summit 2011 - Mlr-sm
Page 2: STI Summit 2011 - Mlr-sm

My Semantics of “Train”

<owl:Thing rdf:about="#LevisTrain">
    <rdf:type rdf:resource="#Train"/>
    <rdfs:label>Levi’s Train</rdfs:label>
    <madeOf rdf:resource="#Plastic"/>
</owl:Thing>

<owl:Thing rdf:about="#LevisTrain">
    <rdf:type rdf:resource="#Train"/>
    <rdfs:label>Levi’s Train</rdfs:label>
    <madeOf rdf:resource="#Wood"/>
</owl:Thing>

Page 3: STI Summit 2011 - Mlr-sm

USAGE

A Usage-dependent Life Cycle

• toy train
• made of plastic

Enter the room

• SELECT * WHERE { ?t a:madeOf a:Plastic }

• SELECT * WHERE { ?t b:madeOf b:Wood }

Request to put away the “train”

• toy train
• made of wood

Negotiate understanding

Page 4: STI Summit 2011 - Mlr-sm

Make it less a methodology but support the people to get their “Things” done!

Yet another…

The Maintenance Black Box

METHONTOLOGY
DILIGENT
OTK
NeOn
…

Page 5: STI Summit 2011 - Mlr-sm

Who is hurt by that?

• rather small/simple ontologies
– min. effort for OE
– “under-engineered”

• unknown user requirements

Page 6: STI Summit 2011 - Mlr-sm

Hey “LOD people”, do you think that ontology engineering matters?

Usage-based ontology engineering

Page 7: STI Summit 2011 - Mlr-sm

Publishers of 99% of the datasets do not feel responsible for their data?

Survey ran in October 2010

Survey covering approx. 25% of all cloud datasets

• size
• complexity
• engineering methodology
• …

Publishers of 75% of the datasets do not feel responsible for their data?

Page 8: STI Summit 2011 - Mlr-sm

Concrete Example of Usage-based Approach

digging in log files
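
A minimal sketch of what "digging in log files" could look like, assuming the endpoint writes access logs in Common Log Format and receives queries through a `query` URL parameter. The sample log lines, the `/sparql` path and the function name are made up for illustration and are not taken from the talk.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches the request part of an access-log line, e.g. "GET /sparql?query=... HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|POST) (?P<target>\S+) HTTP/[\d.]+"')


def extract_queries(log_lines):
    """Yield the decoded SPARQL query of every log line that carries one."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request line we understand
        params = parse_qs(urlsplit(match.group("target")).query)
        for query in params.get("query", []):
            yield query


# Hypothetical log excerpt: one SPARQL request and one unrelated page request.
sample_log = [
    '127.0.0.1 - - [01/Oct/2010:10:00:00 +0200] '
    '"GET /sparql?query=SELECT+%2A+WHERE+%7B+%3Ft+a%3AmadeOf+a%3APlastic+%7D HTTP/1.1" 200 512',
    '127.0.0.1 - - [01/Oct/2010:10:00:05 +0200] "GET /index.html HTTP/1.1" 200 1024',
]

queries = list(extract_queries(sample_log))
print(queries)
```

Only the request target is inspected here; queries sent in a POST body would not show up in such a log at all.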

Page 9: STI Summit 2011 - Mlr-sm

USAGE

Usage?

• SELECT * WHERE { ?t a:madeOf a:Plastic }

• SELECT * WHERE { ?t b:madeOf b:Wood }

Request to put away the “train”

Yes*! But beyond?

• What about the future of SPARQL endpoints on the WoD?

* W.r.t. an architecture proposed by a famous “Web-Extremist”

Page 10: STI Summit 2011 - Mlr-sm

You should have a query endpoint!

• You get something valuable out of it which helps you to play your role on the WoD!

Effort Distribution between Publisher and Consumer

Consumer generates/data mines links
Publisher provides links
Links as hints

Somebody-Pays-As-You-Go

The overall data integration effort is split between the data publisher, the data consumer and third parties.

Data Publisher
• publishes data as RDF
• reuses terms from common vocabularies
• sets links and publishes mappings

Third Parties
• set links pointing at your data
• publish mappings to the Web

Data Consumer
• has to do the rest
• using data mining techniques for identity resolution and schema matching

Christian Bizer: Pay-as-you-go Data Integration (21/9/2010)

Page 11: STI Summit 2011 - Mlr-sm

Usage Analysis

• queries
• patterns
• triples
• primitives

zoom in and see details

visualize heat maps
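
The levels listed above (queries, patterns, triples, primitives) can be sketched as a simple counting step. This is only an illustration under stated assumptions: queries contain one flat group of triple patterns, "primitives" is read as the non-variable terms of a triple pattern, and the naive tokenizer stands in for a real SPARQL parser. The function names and the second sample query are hypothetical.

```python
import re
from collections import Counter


def triple_patterns(query):
    """Return the triple patterns of a query with a single flat { ... } block.

    Naive on purpose: splits on whitespace, so literals containing spaces
    and the ';' / ',' shorthands are not supported.
    """
    match = re.search(r"\{(.*)\}", query, re.DOTALL)
    if match is None:
        return []
    tokens = [tok for tok in match.group(1).split() if tok != "."]
    return [tuple(tokens[i:i + 3]) for i in range(0, len(tokens) - 2, 3)]


def usage_counts(queries):
    """Count how often each triple pattern and each primitive is used."""
    patterns = Counter()
    primitives = Counter()
    for query in queries:
        for triple in triple_patterns(query):
            # Variable names do not matter for usage, so collapse them to "?".
            normalized = tuple("?" if term.startswith("?") else term for term in triple)
            patterns[normalized] += 1
            primitives.update(term for term in normalized if term != "?")
    return patterns, primitives


sample_queries = [
    "SELECT * WHERE { ?t a:madeOf a:Plastic }",
    "SELECT * WHERE { ?x a:madeOf a:Plastic . ?x rdfs:label ?l }",  # invented example
    "SELECT * WHERE { ?t b:madeOf b:Wood }",
]

patterns, primitives = usage_counts(sample_queries)
print(patterns.most_common())
print(primitives.most_common())
```

Counts like these are the kind of input a heat map over a vocabulary could be drawn from; the visualization itself is not sketched here.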

Page 12: STI Summit 2011 - Mlr-sm

Some Results (DBpedia Analysis)

Complete analysis can be found at http://page.mi.fu-berlin.de/mluczak/pub/visual-analysis-of-web-of-data-usage-dbpedia33/

missing facts

inconsistent data

• ns:Band ns:knownFor ?x
• ns:Band ns:nationality ?y

• ns:Band ns:instrument ?x
• ns:Band ns:genre ?y
• ns:Band ns:associatedBand ?z
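
One way to surface candidates for missing facts from such patterns, sketched under the assumption that usage analysis yields (class, predicate) pairs seen in queries and that the pairs actually present in the dataset are known. The `present` list below is invented for the example and says nothing about real DBpedia content; the function name is hypothetical.

```python
def unanswered_pairs(queried, present):
    """Return the (class, predicate) pairs that queries ask for
    but that never occur in the dataset, in sorted order."""
    return sorted(set(queried) - set(present))


# Pairs taken from query patterns like the ones listed above.
queried = [
    ("ns:Band", "ns:knownFor"),
    ("ns:Band", "ns:nationality"),
    ("ns:Band", "ns:genre"),
]

# Hypothetical dataset content, invented purely for illustration.
present = [("ns:Band", "ns:genre")]

candidates = unanswered_pairs(queried, present)
print(candidates)
```

Each candidate still needs a human decision: the fact may be missing, or the query may use the vocabulary in a way the publisher never intended.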

Page 13: STI Summit 2011 - Mlr-sm

Some Thoughts about Benefit

• usage analysis helps to acquire new knowledge
– links between data
– external schema

helps to increase the quality of data on the Web

• lightweight approach helps to bootstrap linked data

It is not necessary to automate everything if the result has enough (business) value in a problem domain anyway.

Page 14: STI Summit 2011 - Mlr-sm

Take Away

• LOD vocabularies are specific ontologies and need specific life cycle support

• usage analysis can help to maintain them (and the data)

• this is a benefit for the dataset publisher and the Web of data as a whole

• make it less a methodology

• provide (query) access to your data endpoint and play your role on the WoD

• it is not implicitly necessary to automate things that enable automation

Hey “LOD people”, do you think that dataset maintenance matters?

Markus Luczak-Rösch ([email protected]-berlin.de)
Freie Universität Berlin, Networked Information Systems (www.ag-nbi.de)

Page 15: STI Summit 2011 - Mlr-sm

Actual Addition

• “15.500.000 people in Germany are not willing to use the internet”

– emphasis on the ESWC discussion: bridging the gap (directly or indirectly) between these people and the internet/Web has a high potential to influence societal transformation (they are not going to use a browser or an iPhone and they do not care for semantics)

Source: ARD-Morgenmagazin, 08-07-2011