Finding Data Sets
Anja Jentzsch, Freie Universität Berlin
17 April 2012
Tutorial: Practical Cross-Dataset Queries on the Web of Data
WWW2012, Lyon, France
Different motivations
• Finding data sets
  • Look for resources to link a data set to
  • Find a data set with relevant data to consume / integrate
• Finding vocabularies
  • Find vocabularies to use to model data sets
  • Find vocabularies to map your existing schema to
Different tool types
• Search engines
  • find data sets based on keywords
• Data catalogs / directories
  • explore data sets using faceted search
• Data marketplaces
  • explore and consume data sets
Linked Data Search Engines
• The descriptions of resources are published as RDF documents
• RDF search engines index these RDF documents
• Process similar to that of search engines for HTML documents
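The indexing step above can be illustrated with a toy keyword index. This is only a minimal sketch, not how Sindice or any other engine is actually implemented: the `DOCUMENTS` sample data, the `example.org` URIs, and the function names are all hypothetical, and real engines parse RDF syntaxes and rank results rather than returning plain sets.

```python
import re
from collections import defaultdict

# Hypothetical sample data: each "RDF document" (keyed by its URL) is a list
# of (subject, predicate, literal object) triples. All URIs are made up.
DOCUMENTS = {
    "http://example.org/doc/berlin": [
        ("http://example.org/resource/Berlin",
         "http://www.w3.org/2000/01/rdf-schema#label", "Berlin"),
        ("http://example.org/resource/Berlin",
         "http://example.org/ontology/country", "Germany"),
    ],
    "http://example.org/doc/lyon": [
        ("http://example.org/resource/Lyon",
         "http://www.w3.org/2000/01/rdf-schema#label", "Lyon"),
        ("http://example.org/resource/Lyon",
         "http://example.org/ontology/country", "France"),
    ],
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword found in a literal to the subjects it describes."""
    index = defaultdict(set)
    for triples in documents.values():
        for subject, _predicate, literal in triples:
            for token in tokenize(literal):
                index[token].add(subject)
    return index


def search(index, query):
    """Return the subjects whose literals contain every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


INDEX = build_index(DOCUMENTS)
```

Calling `search(INDEX, "Berlin Germany")` returns the one subject whose literals contain both keywords, which mirrors how a keyword query leads to candidate resources to link to.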
Screenshots of Linked Data search engines:
• Sindice: http://sindice.com
• Sig.ma: http://sig.ma
• Swoogle: http://swoogle.umbc.edu
• Watson: http://kmi-web05.open.ac.uk/WatsonWUI/
• FactForge: http://factforge.net
Suitability
• Look for resources to link a data set to
  • Good
• Find a data set with relevant data to consume
  • Maybe good: depends on how the query is expressed
• Find vocabularies to use to model data sets
  • Not good: everything is indexed, too much noise
Data catalogs
• Several governments and institutions are opening their catalogs
• http://datacatalogs.org provides a manually curated index of 226 data catalogs
The Data Hub
• Manually curated list of more than 3,500 data sets, at least 326 of them Linked Data sets
• Various metadata for each data set
• Other views over (part of) its content
  • Semantic CKAN (http://semantic.ckan.net)
  • LATC Data Source Inventory
  • LOD Cloud
  • State of the LOD Cloud
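Because the Data Hub is built on CKAN, a keyword search over its data set metadata can be sketched as follows. This assumes the catalog exposes CKAN's Action API with its `package_search` action; whether http://thedatahub.org still answers at this path is not guaranteed, so the sketch only builds the request URL and parses a hand-written sample response. The helper names and the sample payload are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, keywords, rows=10):
    """Build a CKAN Action API URL for a keyword search over data sets.

    Assumes the catalog serves the Action API under /api/3/action/.
    """
    query = urlencode({"q": keywords, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{query}"


def extract_names(response_text):
    """Return the data set names from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [dataset["name"] for dataset in payload["result"]["results"]]


# Hand-written sample in the shape of a package_search response
# (not real output from any catalog).
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "example-dataset-a", "title": "Example Data Set A"},
            {"name": "example-dataset-b", "title": "Example Data Set B"},
        ],
    },
})

url = build_search_url("http://thedatahub.org", "linked data", rows=5)
names = extract_names(SAMPLE_RESPONSE)
```

Fetching `url` with any HTTP client and passing the response body to `extract_names` would complete the round trip against a live CKAN instance.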
Screenshots of the Data Hub and its views:
• http://thedatahub.org
• http://dsi.lod-cloud.net
• http://lod-cloud.net
• http://lod-cloud.net/state/
Data Marketplaces
• “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable and unified format. Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers.” (http://datamarket.com)
Kasabi
• Data domain
  • All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …
• Data population
  • Public datasets
  • User submitted datasets
• Data size
  • 186 data sets
• Data model
  • RDF
Screenshot: http://kasabi.com
Freebase
• Metaweb (USA), now Google
• Free for 100K read API calls per day (10K write), paid for higher volumes
• Data access
  • REST API
  • Linked Data endpoint (http://rdf.freebase.com)
  • Triple uploader / RDF dumps
• Data tools
  • Web based: schema editor, review queue, viewers, …
  • GridWorks (Google Refine)
    • Exploring, data cleaning, transformation of tabular data
    • Map data to Freebase schema & RDF export (3rd party extension)
Screenshot: http://www.freebase.com
Linked Open Vocabularies (LOV)
• Initiative similar to the LOD Cloud but focused on vocabularies
• 250+ vocabularies
Screenshot: http://labs.mondeca.com/dataset/lov/