Can RDB2RDF Tools Feasibly Expose Large Science Archives for Data Integration? Alasdair J G Gray (University of Glasgow, now Manchester), Norman Gray (Universities of Leicester and Glasgow), Iadh Ounis (University of Glasgow). ESWC 2009 – Crete, 3 June 2009
Transcript
Can RDB2RDF Tools Feasibly Expose Large Science Archives
for Data Integration?
Alasdair J G Gray (University of Glasgow now Manchester)
Norman Gray (Universities of Leicester and Glasgow)
Iadh Ounis (University of Glasgow)
ESWC 2009 – Crete, 3 June 2009
A.J.G. Gray - ESWC 2009 2
Outline
• Motivation: The Virtual Observatory
• Can SPARQL be used to express scientific queries?
• Can existing archives be exposed with semantic tools?
  – Can RDB2RDF tools extract large volumes of data?
A.J.G. Gray - ESWC 2009 3
International Virtual Observatory Alliance
“facilitate the international coordination and collaboration necessary for the development and deployment of the tools, systems and organizational structures necessary to enable the international utilization of astronomical archives as an integrated and interoperating virtual observatory.”
A.J.G. Gray - ESWC 2009 4
Searching for Brown Dwarfs
• Data sets:
  – Near Infrared, 2MASS/UK Infrared Deep Sky Survey
  – Optical, APMCAT/Sloan Digital Sky Survey
• Complex colour/motion selection criteria
• Similar problems
  – Halo White Dwarfs
A.J.G. Gray - ESWC 2009 5
Deep Field Surveys
• Observations in multiple wavelengths
  – Radio to X-Ray
• Searching for new objects
  – Galaxies, stars, etc.
• Requires correlations across many catalogues
  – ISO
  – Hubble
  – SCUBA
  – etc.
A.J.G. Gray - ESWC 2009 6
The Problem
Locate and combine relevant data
• Heterogeneous publishers
  – Archive centres
  – Research labs
• Heterogeneous data
  – Relational
  – XML
  – Files
Virtual Observatory
A.J.G. Gray - ESWC 2009 7
A Data Integration Approach
• Heterogeneous sources
  – Autonomous
  – Local schemas
Is it viable to perform query-driven conversions to facilitate data access from a data model that a scientist is familiar with?
Can RDB2RDF tools feasibly expose large science archives for data integration?
[Architecture diagram; labels: SPARQL query, Mappings, Common Model (RDF), Relational DB (RDB2RDF), XML DB (RDF / XML Conversion)]
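To make the relational-to-RDF step of this approach concrete, here is a minimal sketch of a naive direct mapping: each row becomes a subject URI and each non-key column becomes a predicate. It is an illustration only, not how any particular RDB2RDF tool works. The namespace `http://example.org/archive/`, the table `source`, and its columns are hypothetical and are not the real archive schema.

```python
import sqlite3

# Hypothetical namespace, chosen only for this illustration.
BASE = "http://example.org/archive/"


def rows_to_triples(conn, table, key):
    """Yield (subject, predicate, object) triples for every row of `table`.

    The subject URI is built from the key column; every other non-NULL
    column becomes one predicate/object pair.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column == key or value is None:
                continue
            yield (subject, f"{BASE}{table}#{column}", value)


# Tiny in-memory stand-in for a relational archive (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE source (objID INTEGER PRIMARY KEY, ra REAL, "dec" REAL)')
conn.execute("INSERT INTO source VALUES (1, 10.5, -3.25)")

triples = list(rows_to_triples(conn, "source", "objID"))
for triple in triples:
    print(triple)
```

A mapping like this emits one triple per non-key attribute per row, so the number of triples grows with both the row count and the width of each relation.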
A.J.G. Gray - ESWC 2009 14
Astronomical Test Data Set
• SuperCOSMOS Science Archive (SSA)
  – Data extracted from scans of Schmidt plates
  – Stored in a relational database
  – About 4TB of data, detailing 6.4 billion objects
  – Fairly typical of astronomical data archives
• Schema designed using 20 real queries
• Personal version contains
  – Data for a specific region of the sky
  – About 0.1% of the data
  – About 500MB
A.J.G. Gray - ESWC 2009 15
Analysis of Test Data
• Using personal version
  – About 500MB in size (similar size to related work)
• Organised in 14 relations
  – Number of attributes: 2 to 152
    • 4 relations with more than 20 attributes
  – Number of rows: 3 to 585,560
  – Two views
    • Complex selection criteria in views
Makes this different from business cases and previous work!
A.J.G. Gray - ESWC 2009 16
Is SPARQL expressive enough?
Can the 20 sample queries be expressed in SPARQL?
A.J.G. Gray - ESWC 2009 17
Real Science Queries
Query 5: Find the positions and (B,R,I) magnitudes of all star-like objects within delta mag of 0.2 of the colours of a quasar of redshift 2.5 < z < 3.5
SQL:
SELECT ra, dec, sCorMagB,