China-VO Data Access Service Based on OGSA
Jian Sang
National Astronomical Observatory of China
Chinese Virtual Observatory
IVOA Small Projects Meeting 2003, 11/27/2003
Outline
• VO, Grid and OGSA
• Build the catalog data service
• Build the image mosaic service
• Technical difficulties faced
The Increase Of Astronomical Data
The number of pixels and the data double every year!
[Figure: the total area of astronomical telescopes in m² and the total gigapixels of CCDs]
Challenges
• The volume of data is approaching the petabyte (PB) scale.
• The data is distributed and stored in heterogeneous DBMSs in heterogeneous host environments.
The VO’s Goal
• The VO’s initial goal is to federate existing astronomical data archives and provide standard services for manipulating these data.
HOW TO REACH THIS GOAL?
Grid technology can solve the problem!
What is Grid
• Grid technology has its genesis in metacomputing, but…
• In practice, the Grid is about resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
• Focus on how to enable, maintain and control the sharing of resources to achieve a common goal
What the “Grid” offers:
• Resource management protocols and services that support secure remote access to shared data and computing resources, and the co-allocation of multiple resources.
• Security solutions that support management of credentials and policies.
• Information query protocols and services that provide configuration and status information about resources, organizations and services.
• Data management services that locate and transport datasets between storage systems and applications.
What is OGSA
• The Open Grid Services Architecture (OGSA) represents an evolution towards a Grid system architecture based on Web services concepts and technologies.
• The OGSA integrates key Grid technologies (including the Globus Toolkit) with Web services mechanisms to create a distributed system framework based around the Open Grid Services Infrastructure (OGSI).
In Grids, everything is a service.
The Open Grid Services Architecture
• Service orientation to virtualize resources
• From Web services (everything is a service):
  - Standard interface definition mechanisms: multiple protocol bindings, multiple implementations, local/remote transparency
• Building on Globus Toolkit:
  - Grid service: semantics for service interactions
  - Management of transient instances
  - Factory, Registry, Discovery, other services
  - Reliable and secure transport
• Multiple host environments: J2EE, .NET, C, …
The Structure of Grid Service
Grid service interfaces
Construct The Astronomical Data Grid
The astronomical data service is the most fundamental and important component of the Virtual Observatory.
In terms of data sharing, the VO can be thought of as an astronomical Data Grid.
VO = Astronomical Data Grid
Outline
• VO, Grid and OGSA
• Build the catalog data access service
• Build the image mosaic service
• Difficulties faced
The Classification of Astronomical Data Service
• Astronomical Catalog Service
• Image Mosaic Service
• Spectrum Data Service
• Simulation Data Service
• …
Existing Astronomical Datasets We Have

Class     Dataset Name           Data Amount (zipped)
Catalog   CDS/ADC Catalogs       About 30 GB
Catalog   Other Catalogs         About 120 GB
Survey    RealSky                5 GB
Survey    ROSAT X-ray Survey     10 GB
Survey    BATC                   360 GB
Survey    DSS I                  60 GB
Survey    DSS II                 About 620 GB
Survey    SDSS EDR               30 GB
Survey    SDSS DR1 (part)        65 GB
Survey    2dF 2003 / 2QZ         7 GB
Archive   ROSAT X-ray Point      28 GB
Archive   Einstein X-ray Data    5 GB
Library   ADS                    350 GB
Total                            >1700 GB
Build Catalog Data Service
How do we federate the catalog data into the VO? That is, how do we build a data service using the existing databases and programs?
Define Catalog Service Interface
Some standards we used:
• Input query language: SQL (now), ADQL (planned)
• Output data format: VOTable 1.0
• Catalog resource metadata registry protocol: VOResource 0.9
Input: ADQL query statement
Output: result in VOTable format
This keeps the service interface/API simple.
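As a rough illustration of this interface (query text in, VOTable out), the sketch below wraps a few result rows in a minimal VOTable-style document using only Python's standard library. The function name `rows_to_votable` and the sample columns are hypothetical, and the output is a bare skeleton of the VOTABLE/RESOURCE/TABLE/FIELD/DATA/TABLEDATA nesting, not a schema-validated VOTable 1.0 file.

```python
import xml.etree.ElementTree as ET


def rows_to_votable(fields, rows):
    """Wrap result rows in a minimal VOTable-style XML string.

    fields: list of (name, datatype, ucd) tuples (illustrative metadata).
    rows:   list of tuples, one value per field.
    """
    votable = ET.Element("VOTABLE", version="1.0")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    # One FIELD element per column, carrying the column metadata.
    for name, datatype, ucd in fields:
        ET.SubElement(table, "FIELD", name=name, datatype=datatype, ucd=ucd)
    data = ET.SubElement(table, "DATA")
    tabledata = ET.SubElement(data, "TABLEDATA")
    # One TR per row, one TD per cell.
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


# Sample columns and values are made up for the example.
sample_fields = [("ra", "double", "POS_EQ_RA_MAIN"),
                 ("dec", "double", "POS_EQ_DEC_MAIN")]
sample_rows = [(10.0, 20.0), (11.0, 21.0)]
print(rows_to_votable(sample_fields, sample_rows))
```

Keeping the wrapper separate from the query engine means the same output step can sit behind either a database or a search program.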
How to use existing databases and programs to create a catalog data service
How can we create a catalog data service that understands ADQL and generates results in VOTable format?
We adopt two approaches:
• Reconstruct the existing catalog DBMS
• Encapsulate a search program, such as pmm
The CDS offers search programs for big catalogs such as USNO A2.0…
Catalog data service based on DB
[Diagram: GT3 Interface (ADQL in, VOTable out) → ADQL/SQL Translator and VOTable Wrapper (with catalog metadata) → JDBC (SQL out, ResultSet back) → DBMS]
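The DB-based pipeline (query in, translation to SQL, execution in the DBMS, result set back) can be sketched roughly as follows. This is only an illustration under stated assumptions: `sqlite3` stands in for JDBC and the real DBMS, the table and column names are invented, and the "translator" handles just one construct, rewriting an ADQL-style `SELECT TOP n` into a `LIMIT n` clause. A real ADQL-to-SQL translator would need to cover far more.

```python
import re
import sqlite3


def adql_to_sql(adql):
    """Rewrite 'SELECT TOP n ...' as 'SELECT ... LIMIT n' (toy translator)."""
    match = re.match(r"\s*SELECT\s+TOP\s+(\d+)\s+(.*)", adql,
                     re.IGNORECASE | re.DOTALL)
    if not match:
        return adql.strip()
    limit, rest = match.groups()
    return f"SELECT {rest.strip().rstrip(';')} LIMIT {limit}"


def query_catalog(conn, adql):
    """Translate the query, run it, and return (column names, rows)."""
    cursor = conn.execute(adql_to_sql(adql))
    names = [col[0] for col in cursor.description]
    return names, cursor.fetchall()


# In-memory stand-in for a catalog database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (name TEXT, ra_deg REAL, dec_deg REAL)")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?)",
    [("a", 10.0, 20.0), ("b", 10.5, 20.5), ("c", 200.0, -5.0)],
)

names, rows = query_catalog(
    conn,
    "SELECT TOP 2 name, ra_deg FROM catalog WHERE ra_deg < 100 ORDER BY ra_deg",
)
print(names, rows)
```

The returned column names and rows are what a VOTable wrapper would then serialize for the client.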
Advantages and disadvantages
• Can make full use of the SQL language and implement complex queries.
• DBMSs offer the most powerful functions for data management and maintenance.
• Reconstructing the DBs takes a lot of work.
• For big catalogs, such as USNO B1.0 and 2MASS PSC, query efficiency is low.
[Diagram: entity-relationship model of the catalog database]
Relationships: catalog_table, value_option, table_files, table_coordinate, table_field, field_value, UCD_field, field_link, catalog_metadata, catalog_acronym (roles labelled "has" and "belong").
Entities, with each column followed by its type code:
• catalog: catalog_id (I), path_name (VA80), obsolute_by (I), title (VA200), short_name (VA250), identifier (VA250)
• Coordinate: coord_id (I), RA_id (I), Dec_id (I), epoch (VA20), system (VA10), equinox (VA20), epoch_RA (I), epoch_Dec (I)
• Table: table_id (I), table_name (VA200), property (VA40), description (TXT2000)
• Field_link: link_id (I), content_role (VA40), content_type (VA40), title (VA200), value (VA80), href (VA250), gref (VA250), action (VA250)
• Field: field_id (I), column_name (VA80), original_position (SI), column_type (VA80), option_name (VA80), datatype (VA80), UCD (VA80), unit (VA80), type (VA250), width (I), arraysize (VA80), precision (VA20), description (TXT2000), ref (VA80)
• Option: option_id (I), option_name (VA80), value (VA80)
• Mysql_files: file_id (I), option_name (VA80), db_name (VA80), max_Dec (LF), min_Dec (LF), max_RA (LF), min_RA (LF)
• Value: value_id (I), values_null (VA80), values_type (VA10), invalid (BL), min_value (VA80), min_inclusive (BL), max_value (VA80), max_inclusive (BL)
• metadata: publisher (VA250), publisherID (VA250), creator (VA250), creater_logo (VA250), contributor (VA250), version (VA80), date (D), ref_URL (LVA250), contact_name (VA250), contact_Email (VA250), subject (VA250), description (TXT2000), source (VA40), type (VA250), content_level (VA40), relationship (VA40), relationshipID (VA250), facility (VA250), instrument (VA250), cov_spatial (LVA2000), cov_region (LA2000), cov_spec (VA40), cov_spec_band (VA250), cov_spec_min (LF), cov_spec_max (LF), cov_tem_start (D), cov_tem_stop (D), cov_depth (VA80), cov_obj_den (VA80), cov_obj_count (LI), cov_sky_frac (LF), res_spatial (LF), res_spec (LF), res_temp (LF), UCD (VA80), rights (VA40), uncer_spatial (LF), data_quality (LF), uncer_photo (LF), uncer_spec (LF), uncer_temp (LF)
• UCD: option_name (VA80), description (TXT2000)
• Acronym: name (VA80), acronym (I)
Data service based on search program
[Diagram: GT3 Interface (ADQL in, VOTable out) → ADQL Translator and VOTable Wrapper → JNI/stream (parameters) → search program → Data Files]
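The idea of encapsulating a search program (pass parameters in, read rows back from its output stream) can be sketched as below. This is a stand-in under stated assumptions: it uses `subprocess` instead of JNI, and the "search program" is a tiny inline Python script that just echoes two fake rows. The argument order (RA, Dec, radius) and the whitespace-separated output format are invented for the example and are not those of pmm or any CDS tool.

```python
import subprocess
import sys

# Stand-in for a real positional search program: prints two fake rows
# of "ra dec magnitude" for whatever position it is given.
FAKE_PROGRAM = (
    "import sys; ra, dec, r = sys.argv[1:4]; "
    "print(ra, dec, '12.3'); print(ra, dec, '14.1')"
)


def run_search(ra, dec, radius):
    """Call the external program with parameters and parse its output stream."""
    result = subprocess.run(
        [sys.executable, "-c", FAKE_PROGRAM, str(ra), str(dec), str(radius)],
        capture_output=True,
        text=True,
        check=True,  # raise if the program exits with an error
    )
    # Each non-empty output line becomes one tuple of floats.
    return [
        tuple(float(x) for x in line.split())
        for line in result.stdout.splitlines()
        if line.strip()
    ]


search_rows = run_search(10.0, 20.0, 0.1)
print(search_rows)
```

The parsed rows would then go through the same VOTable wrapping step as results from a database.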
Advantages and disadvantages
• Positional search is quicker than with a DB.
• Only the search functions that the programs provide are available. Many programs offer only positional search, with no statistical functions.
Catalog Access Service Provided by Us

Band       Name         Num of objects   Amount
X-ray      RASS-BSC     18806            0.03 GB
X-ray      RASS-FSC     105924           0.10 GB
optical    USNO B1.0    1045913669       38 GB
optical    USNO A2.0    526280881        7 GB
optical    GSC 2.2.1    455851237        40 GB
optical    GSC 1.2      25241730         1.4 GB
optical    UCAC 1       27425433         >0.5 GB
optical    UCAC 2       48330571         4.5 GB
optical    Tycho2       2539913          0.5 GB
optical    Hipparcos    118218           0.05 GB
infrared   2MASS PSC    470992970        127 GB
infrared   2MASS ESC    1647599          3 GB
radio      NVSS         1773484          0.44 GB
radio      FIRST        811117           0.1 GB
Total: about 110 catalogs, about 220 GB
How to call a Catalog data service
[Diagram: Grid Client, Resource Registry, Data Service Factory, Data Service Instance, Database. The diagram also carries the labels <registry> and "Create Data service".]
1. <Find Factory>
2. <Factory GSH>
3. <create data service>
4. <Data service GSH>
5. <data request (ADQL)>
6. <result (VOTable)>
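The six-step call sequence can be mimicked with plain in-memory objects, as in the sketch below. It is not GT3 or OGSI code: the classes, method names, and the `gsh://example/...` handle strings are all hypothetical, and the "result" is a dictionary rather than a real VOTable. It only shows the order of the interactions between client, registry, factory, and service instance.

```python
class DataServiceInstance:
    """Transient service instance created by a factory (illustrative only)."""

    def __init__(self, handle, rows):
        self.handle = handle
        self._rows = rows

    def query(self, adql):
        # A real instance would translate the query and return a VOTable.
        return {"query": adql, "rows": self._rows}


class DataServiceFactory:
    def __init__(self, handle, rows):
        self.handle = handle
        self._rows = rows
        self._count = 0

    def create_service(self):
        # Each call creates a new instance with its own handle.
        self._count += 1
        return DataServiceInstance(
            f"{self.handle}/instance-{self._count}", self._rows)


class Registry:
    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def find_factory(self, name):
        return self._factories[name]


# Service provider side: the factory registers itself.
registry = Registry()
registry.register(
    "usno-a2", DataServiceFactory("gsh://example/usno-a2", [(1.0, 2.0)]))

# Client side, following the numbered steps.
factory = registry.find_factory("usno-a2")   # steps 1-2: find factory, get its handle
instance = factory.create_service()          # steps 3-4: create instance, get its handle
result = instance.query("SELECT ...")        # steps 5-6: send request, receive result
print(instance.handle, result["rows"])
```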
Use Data Service to build www service for end user
[Diagram: End users reach a web server through web clients over HTTP. The web server acts as a grid client of the data services (backed by MySQL, Oracle 9i, and files) and of the data mining, data processing, and data visualization services, using the services and resources registers.]
End users don't know where the data services are.
Use data service to create other service
Our next task is to build a multi-wavelength cross-identification (MWCI) service based on the catalog data service.
What is multi-wavelength cross-identification?
By cross-identifying datasets through positional coincidence, we can study objects using their properties at different wavelengths.
The steps of multi-wavelength cross-identification
• Cross-identify datasets from different wavelengths within an error radius.
• Divide the result of cross-identification into three situations: one-to-one, one-to-two, one-to-many.
• Choose the one-to-one entries for data mining.
• The other two situations need statistical analysis to determine which source is the true counterpart.
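The first two steps can be sketched with a brute-force positional match. This is a minimal illustration, not the MWCI service itself: the function names, the sample coordinates, and the 0.001 degree error radius are invented, the separation uses the standard haversine formula, and every source is compared against every candidate, which would be far too slow for large catalogs.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_identify(sources, candidates, radius_deg):
    """Group sources by how many candidates fall within the error radius."""
    result = {"one-to-one": {}, "one-to-two": {}, "one-to-many": {}}
    for name, (ra, dec) in sources.items():
        matches = [cname for cname, (cra, cdec) in candidates.items()
                   if angular_separation(ra, dec, cra, cdec) <= radius_deg]
        if len(matches) == 1:
            result["one-to-one"][name] = matches
        elif len(matches) == 2:
            result["one-to-two"][name] = matches
        elif len(matches) > 2:
            result["one-to-many"][name] = matches
        # Sources with no match are simply left out.
    return result


# Made-up positions (degrees) in two hypothetical catalogs.
sources = {"s1": (10.0, 20.0), "s2": (50.0, -30.0), "s3": (120.0, 5.0)}
candidates = {
    "c1": (10.0005, 20.0),
    "c2": (50.0, -30.0005), "c3": (50.0005, -30.0),
    "c4": (120.0, 5.0), "c5": (120.0003, 5.0), "c6": (120.0, 5.0004),
    "c7": (300.0, 0.0),
}
groups = cross_identify(sources, candidates, radius_deg=0.001)
print(groups)
```

Only the one-to-one group would go straight to data mining. The other two groups would be handed to a statistical step that this sketch does not attempt.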
Requirements
• Locate the datasets that users want to use (dataset discovery).
• Cross-match the datasets in heterogeneous DBMSs at different locations effectively and efficiently.
• Find storage resources to store the results.
[Diagram: a user application interacts with the Registry, the MWCI service provider (MWCIFactory, MWCI), the NVSS and 2MASS DataServices, and the storage service provider (storageFactory, storage) in numbered steps 1 to 7.]
Outline
• VO, Grid and OGSA
• Build the catalog data access service
• Build the image mosaic service
• Technical difficulties faced
Build The Image Mosaic Service
• Use DSS-I sky images to build our first image mosaic service.
Definition of the service interface
• Input parameters: 1. RA, 2. Dec, 3. image height, 4. image width
• Transport protocol: GridFTP
• Output data format: FITS
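The four input parameters can be captured in a small request type, as in this sketch. The class name `MosaicRequest`, the field names, and above all the units (degrees for RA and Dec, arcminutes for height and width) are assumptions made for the example; the slides do not state units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MosaicRequest:
    """Hypothetical request for an image cutout; units are assumed."""

    ra_deg: float         # right ascension, assumed degrees in [0, 360)
    dec_deg: float        # declination, assumed degrees in [-90, 90]
    height_arcmin: float  # image height, assumed arcminutes
    width_arcmin: float   # image width, assumed arcminutes

    def __post_init__(self):
        # Reject requests that cannot correspond to a sky position or size.
        if not 0.0 <= self.ra_deg < 360.0:
            raise ValueError("RA must be in [0, 360) degrees")
        if not -90.0 <= self.dec_deg <= 90.0:
            raise ValueError("Dec must be in [-90, 90] degrees")
        if self.height_arcmin <= 0 or self.width_arcmin <= 0:
            raise ValueError("image height and width must be positive")


request = MosaicRequest(ra_deg=83.8, dec_deg=-5.4,
                        height_arcmin=10.0, width_arcmin=10.0)
print(request)
```

Validating the parameters at the interface keeps malformed requests from reaching the image program at all.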
Realization of DSS-I image mosaic service
[Diagram: GT3 Interface → JNI (parameters in, FITS file out) → GetImage → DSS-I image files; the result is transferred by GridFTP]
Outline
• VO, Grid and OGSA
• Build the catalog data access service
• Build the image mosaic service
• Technical difficulties faced
Technical Difficulties
• Service/resource registry and discovery
• ADQL2SQL translator
• Protocol shortcomings
Protocol shortcomings
• Shortcomings of the VOTable 1.0 protocol
1. How to encapsulate the result of a join query?
2. No standard for encapsulating spectrum data
3. The definition of the FIELD element is not strict and is incomplete
• Shortcomings of UCD
1. Cannot express concrete meaning, e.g. “ERROR”: error of what?
2. Incomplete, for example HTMID has no UCD
• Lack of a standard for units
Q & A
?www. .org
Thank You
The Steps of Calling a Data Service
Transparencies for Astro Data Access
• Heterogeneity Transparency
• Name Transparency
• Distribution Transparency
What is Grid Service?
What Is The Data Grid
• DataGrid : A dynamic logical namespace that enables coordinated sharing of heterogeneous distributed storage resources and digital entities based on local and global policies across administrative domains in a virtual enterprise.
• DataGrid
– Logical name space for location independent identifiers
– Abstractions for storage repositories, information repositories, and access APIs
– Latency management
Using a Data Grid – in Abstract
• User asks for data from the data grid (arrow label: "Ask for data")
• The data is found and returned (arrow label: "Data delivered")
• Where & how details are managed by the data grid