Page 1: Finding Data Sets

Finding Data Sets

Anja Jentzsch, Freie Universität Berlin

17 April 2012

Tutorial: Practical Cross-Dataset Queries on the Web of Data

WWW2012, Lyon, France


Page 2: Finding Data Sets

Different motivations

• Finding data sets

  • Look for resources to link a data set to

  • Find a data set with relevant data to consume / integrate

• Finding vocabularies

  • Find vocabularies to use to model data sets

  • Find vocabularies to map your existing schema to


Page 3: Finding Data Sets

Different tool types

• Search engines

  • find data sets based on keywords

• Data catalogs / directories

  • explore data sets, with faceted search

• Data marketplaces

  • explore and consume data sets
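
To make the difference between keyword search and faceted search concrete, here is a minimal sketch of faceted exploration over catalog metadata. The records, field names, and the helpers `facet_counts` and `filter_by` are hypothetical and invented for illustration; they are not the API or data of any catalog named in these slides.

```python
from collections import Counter

# Hypothetical catalog records; names, formats and tags are illustrative only.
CATALOG = [
    {"name": "dataset-a", "format": "RDF", "tags": {"geography", "lod"}},
    {"name": "dataset-b", "format": "CSV", "tags": {"government"}},
    {"name": "dataset-c", "format": "RDF", "tags": {"government", "lod"}},
]


def facet_counts(records, field):
    """Count how many records carry each value of a metadata field."""
    counts = Counter()
    for record in records:
        value = record[field]
        # Multi-valued fields (e.g. tags) contribute one count per value.
        values = value if isinstance(value, (set, list)) else [value]
        counts.update(values)
    return counts


def filter_by(records, **facets):
    """Keep only the records that match every selected facet value."""
    def matches(record):
        for field, wanted in facets.items():
            value = record[field]
            if isinstance(value, (set, list)):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return [record for record in records if matches(record)]


# Narrow the catalog step by step, as a faceted UI would.
selection = filter_by(CATALOG, format="RDF", tags="government")
print([record["name"] for record in selection])
print(facet_counts(CATALOG, "format"))
```

A catalog front end would typically show the facet counts next to each filter so that users can see how many data sets remain before they click.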


Page 4: Finding Data Sets

Linked Data Search Engines

• Resource descriptions are published as RDF documents

• RDF search engines index these RDF documents

• The process is similar to how search engines index HTML documents
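
The indexing idea above can be sketched in a few lines. This is a toy illustration only, not how Sindice or any other engine in this tutorial is implemented: it assumes documents arrive as simple N-Triples text, indexes only plain literals, and uses made-up `example.org` URIs and hypothetical helper names (`build_index`, `search`).

```python
import re
from collections import defaultdict

# Hypothetical sample data: two tiny "RDF documents" in N-Triples syntax.
DOCS = {
    "http://example.org/doc/berlin": (
        '<http://example.org/Berlin> '
        '<http://www.w3.org/2000/01/rdf-schema#label> "Berlin" .\n'
        '<http://example.org/Berlin> '
        '<http://example.org/country> <http://example.org/Germany> .\n'
    ),
    "http://example.org/doc/lyon": (
        '<http://example.org/Lyon> '
        '<http://www.w3.org/2000/01/rdf-schema#label> "Lyon city" .\n'
    ),
}

# Simplification: only triples whose object is a plain literal are matched.
LITERAL_TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"\s*\.\s*$')


def build_index(docs):
    """Map each lower-cased literal token to (subject URI, document URL) pairs."""
    index = defaultdict(set)
    for doc_url, body in docs.items():
        for line in body.splitlines():
            match = LITERAL_TRIPLE.match(line)
            if not match:
                continue  # skip triples with URI objects
            subject, _predicate, literal = match.groups()
            for token in re.findall(r"\w+", literal.lower()):
                index[token].add((subject, doc_url))
    return index


def search(index, keyword):
    """Return the resources whose literals contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(DOCS)
print(search(index, "Berlin"))
```

As with an HTML search engine, the result points back to the document in which the resource was described, which is what makes such an index useful for finding link targets.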


Page 5: Finding Data Sets

http://sindice.com

Page 6: Finding Data Sets

http://sindice.com

Page 7: Finding Data Sets

http://sig.ma

Page 8: Finding Data Sets

http://sig.ma

Page 9: Finding Data Sets

http://swoogle.umbc.edu

Page 10: Finding Data Sets

http://kmi-web05.open.ac.uk/WatsonWUI/

Page 11: Finding Data Sets

http://factforge.net

Page 12: Finding Data Sets

http://factforge.net

Page 13: Finding Data Sets

Suitability

• Look for resources to link a data set to

  • Good

• Find a data set with relevant data to consume

  • Maybe good: depends on how the query is expressed

• Find vocabularies to use to model data sets

  • Not good: everything is indexed, too much noise


Page 14: Finding Data Sets

Data catalogs

• Several governments and institutions are opening their catalogs

• http://datacatalogs.org provides a manually curated index of 226 data catalogs


Page 15: Finding Data Sets

http://datacatalogs.org

Page 17: Finding Data Sets

The Data Hub

• Manually curated list of more than 3,500 data sets, at least 326 of them Linked Data sets

• Various metadata for each data set

• Other views over (part of) its content

  • Semantic CKAN (http://semantic.ckan.net)

  • LATC Data Source Inventory

  • LOD Cloud

  • State of the LOD Cloud


Page 18: Finding Data Sets

http://thedatahub.org

Page 20: Finding Data Sets

http://dsi.lod-cloud.net

Page 21: Finding Data Sets

http://lod-cloud.net

Page 22: Finding Data Sets

http://lod-cloud.net/state/

Page 23: Finding Data Sets

http://lod-cloud.net/state

Page 24: Finding Data Sets

Data Marketplaces

• “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable and unified format. Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers.” (http://datamarket.com)


Page 25: Finding Data Sets

Kasabi

• Data domain

  • All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …

• Data population

  • Public datasets

  • User submitted datasets

• Data size

  • 186 data sets

• Data model

  • RDF


Page 26: Finding Data Sets

http://kasabi.com

Page 27: Finding Data Sets

Freebase

• Metaweb (USA), now Google

• Free for 100K read API calls per day (10K write), paid for higher volumes

• Data access

  • REST API

  • Linked Data endpoint (http://rdf.freebase.com)

  • Triple uploader / RDF dumps

• Data tools

  • Web based – schema editor, review queue, viewers, …

  • GridWorks (Google Refine)

    • Exploring, data cleaning, transformation of tabular data

    • Map data to Freebase schema & RDF export (3rd party extension)

Page 28: Finding Data Sets

http://www.freebase.com

Page 30: Finding Data Sets

Linked Open Vocabularies (LOV)

• Initiative similar to the LOD Cloud but focused on vocabularies

• 250+ vocabularies


Page 31: Finding Data Sets

http://labs.mondeca.com/dataset/lov/
