Basic Propositions of the RVO Information Infrastructure Project
On behalf of the RVOII project report co-authors
Leonid Kalinichenko, Institute of Informatics Problems of RAS
Dec 28, 2015
RVO Information Infrastructure Project Report
In May 2005 the RVO Information Infrastructure (RVOII) project report was published in Russia as a result of the joint efforts of the Special Astrophysical Observatory of RAS (SAO RAS), the Institute of Astronomy of RAS (INASAN) and the Institute of Informatics Problems of RAS (IPI RAS), supported by a one-year grant of the Russian Foundation for Basic Research (RFBR).
RVOII is aimed at an integrated representation of information in various problem domains of astronomy and at supporting the solving of scientific problems.
The project report contains an analysis of the various kinds of astronomical information resources accumulated around the world and specifically in Russia; an analysis of the technological and architectural recommendations of the IVOA; an analysis of the classes of scientific problems that need the VO facilities; an analysis of the correspondence and sufficiency of the IVOA standards for the identified RVO activities; and an analysis of existing components and services that can be re-used for the RVOII implementation.
Based on this analysis, a structural design of the RVO information infrastructure has been developed. Strategically, the RVOII development program is oriented towards tight coordination with the development of the International VO.
Talk outline
o Objectives of the RVO project
o Representation of problem domains in natural sciences in information systems
o Information Resources in Astronomy; Russian astronomical resources
o Analysis of Projects of Virtual Observatories
o Information Infrastructure Forming Standards
o Classes of astrophysical problems for VO
o Virtual observatory architecture according to IVOA
o Subject mediation infrastructure planned for problem domains representation in RVO
o Information infrastructure of RVO
o AstroGrid as the core of the RVOII infrastructure
o First trial of AstroGrid community centre
o Analysis of possibility of extending of AstroGrid with subject mediation facilities
Objectives of the RVO project (1)
Main objectives of the RVO project:
to provide the Russian astronomical community with facilities for integrating the Russian astronomical resources into the VO;
to provide the Russian astronomical community with facilities for integrated access to the data accumulated in the international astronomical data resources;
to provide the Russian astronomical community with facilities for defining problem domains for solving various classes of astronomical problems, computational facilities, facilities for information analysis and data mining, and facilities for automating scientific research in astronomy;
to support a set of standards agreed with the international community that provide for the interoperability of heterogeneous data and of problem-solving facilities;
Objectives of the RVO project (2)
to develop strategically important classes of astronomical problems based on the VO technology, and to develop processes (workflows) and mediators to support the respective research;
to develop organizational measures, agreed with the international community, for the development and usage of the VO technology in Russia, for coordinating the Astronomical Data Centers in Russia and abroad, and for coordinating research based on the VO technology;
to develop a set of measures for establishing RVO as an important educational resource for the Russian universities;
to form in Russia a sustainable community of astronomers actively using the VO in their scientific research;
to contribute to a high level of research based on VO technology in Russia in the strategically important areas of astronomy.
Subject Domain in Natural Science
[Diagram: structure of a subject domain in natural science. Elements shown: the material system with its definition in natural language; domain terminology and concepts (abstract, methodological, concrete); theories (models) T1, T2, …, Tn, each with a signature and concretizations (attributes, types, classes, processes) and simulators; semantics and interpretations of the constituents of T1…Tn; observable/measurable characteristics of T1 and of T2, …, Tn (attributes, types, classes, procedures); methods and instruments for observation, experimentation, measurement, data analysis and discovery; observations, simulations and measurements for T1, used for explaining and forecasting; simulation; problems, methods of solution, algorithms, programs, workflows.]
From the Report to the President of USA “Computational Science: Ensuring America’s Competitiveness”
The President’s Information Technology Advisory Committee (PITAC) in May 2005 completed the report where it states:
“No single researcher has the skills required to master all the computational and application domain knowledge needed to gather data from databases or experimental devices, create geometric and mathematical models, create new algorithms, implement the algorithms efficiently on modern computers, and visualize and analyze the results. To model such complex systems faithfully requires a multidisciplinary team of specialists, each with complementary expertise and an appreciation of the interdisciplinary aspects of the system, and each supported by a software infrastructure that can leverage specific expertise from multiple domains and integrate the results into a complete application software system.
Computational researchers need enabling, scalable, interoperable application software to conduct examinations of their ideas and data”.
Information Resources in Astronomy
o World-wide resources overview
o Optical surveys and catalogs
o Infrared and radio range surveys
o Archives of observations
o Data centers
o Surveys
o Robotic Telescopes
o Russian astronomical resources
From Tera to Petabytes
The Large Synoptic Survey Telescope (LSST) will survey objects ranging from Earth's vicinity to the edge of the optical universe.
It will reach 24th mag in 10 seconds and will survey up to 14,000 square degrees three times per month. Over a period of years, 30,000 square degrees will be surveyed in multiple bands, and the co-added images will go to 27th magnitude.
It relies on high technology in microelectronics, large optics fabrication and metrology, and software.
Comparing the LSST (8.4 m) telescope with the SDSS, and allowing also for its increased pixel sampling and resolution, the advantage in figure of merit is a factor of close to 200.
Data products will consist of photometric catalogs that are continuously updated during the survey, a moving object database, images in at least 5 bands (updated on a regular schedule), and a huge time-tagged processed image database; the total volume will climb to around 15 Petabytes.
Russian astronomical resources
Main providers of astronomical data in Russia:
Special Astrophysical Observatory of RAS (SAO RAS)
Sternberg Astronomical Institute of the Moscow State University (SAI MSU)
Main (Pulkovo) Astronomical Observatory of RAS (MAO)
Institute of Applied Astronomy (IAA RAS)
Institute of Terrestrial Magnetism, Ionosphere and Radiowave Propagation of the RAS (IZMIRAN)
Institute of Solar-Terrestrial Physics of the Siberian Branch of Russian Academy of Sciences (ISTP SB RAS)
Space Research Institute (IKI) of the RAS
Astronomical Institute of Saint Petersburg State University (AI SPbSU)
Ural State University (USU)
Pushchino Radioastronomical Observatory of Astro Space Center of the LPI RAS (PrAO ASC LPI RAS)
Russian Robotic Telescopes
Russian astronomical resources
Subject                  Number of resources   Number of institutions
Stellar systems          7                     3
Stars                    22                    9
Solar system             21                    8
Sun                      23                    8
Radioastronomy           7                     4
Cosmic rays              4                     3
Multi subject archives   7                     5
TOTAL                    91                    19 (Russia and fSU)
Russian and fSU astronomical data resources classified by subject
Analysis of Projects of Virtual Observatories
o NVO
o AstroGrid
o EURO-VO
EURO-VO Participants
French VO, as represented by the Centre de Données astronomiques de Strasbourg (CDS), Strasbourg, France
European Southern Observatory, Garching, Germany
European Space Agency, Paris, France
UK AstroGrid Consortium, as represented by the University of Edinburgh, Edinburgh, UK
German Astrophysical Virtual Observatory (GAVO), as represented by the Max Planck Institute for Extraterrestrial Physics (MPE), Garching, Germany
Istituto Nazionale di Astrofisica, Rome, Italy
Nederlandse Onderzoekschool voor Astronomie, Leiden, The Netherlands
Laboratorio de Astrofísica Espacial y Física Fundamental, Madrid, Spain
Information Infrastructure Forming Standards
The OAI Protocol for Metadata Harvesting (OAI-PMH) defines a mechanism for harvesting records containing metadata from repositories.
A Web service is defined as a standardized way of integrating Web-based applications using the XML, SOAP, WSDL, and UDDI open standards over an Internet protocol backbone.
Grid technology: Compute/File Grid, Information Grid, Hybrid Grid, Semantic Grids.
The Web Services Resource Framework (WSRF) makes grid resources accessible within a web services architecture.
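As a minimal sketch of what OAI-PMH harvesting looks like from the harvester's side, the snippet below only builds a ListRecords request URL. The verb and the metadataPrefix and from arguments belong to the protocol; the repository address and the helper name list_records_url are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)

url = list_records_url(BASE_URL, from_date="2005-05-01")
print(url)
```

A harvester would issue such requests periodically and parse the XML responses into its own metadata registry.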
Classes of astrophysical problems for VO
o Class of problems solvable by applying database search techniques
o Classes of general problems for VO (cosmology, formation and development of galaxies, formation and evolution of stars, sun and planets, etc.)
o Theoretical research and VO (VirtU – the Virtual Universe project as an example)
o Co-existence of theoretical and observational archives and services in VO
The relationship between the TVO, TOI and AstroGrid
From the Report to the President of USA “Computational Science: Ensuring America’s Competitiveness”
Astrophysical scientific problems mentioned in the PITAC Report:
Discovering Brown Dwarfs via Data Mining
Scientists creating the NVO confirmed the existence of a new brown dwarf in 2003. The discovery was quite unexpected, because it came from data that had been publicly available for at least 18 months. NVO researchers emphasized that a single new brown dwarf is not as scientifically significant as the rapidity of the discovery and the tantalizing hint it offers of the potential of the NVO.
Dark Matter, Dark Energy, and the Structure of the Universe
A team at the University of Illinois has conducted large-scale cosmological computational simulations that show the distribution of cold dark matter in a model of cosmic structure formation incorporating the effects of a cosmological constant (Lambda) on the expansion of the universe. The simulation contained 17 million dark matter particles in a cubic model universe that is 300 million light-years on a side.
Supernova Modeling
The TeraScale Supernova Initiative (TSI) is a national, multi-institution, multidisciplinary collaboration of astrophysicists, nuclear physicists, applied mathematicians, and computer scientists. TSI’s principal goals are to understand the mechanism(s) responsible for the explosions of core collapse supernovae and all the phenomena associated with these stellar explosions.
Requirements for scientific results publishing
To publish means to make data/service products in repositories available through services that are accessible via VO-supplied sites.
To allow independent checks of conclusions based on theoretical results by reproducing certain results.
To allow comparisons with similar results/methodologies or with the corresponding data by observers/theoreticians.
To make theoretical results more easily accessible and understandable for observers.
Journals may require links to actual data products and/or software used in published work.
To allow querying of publications and of real and simulated data products in a uniform manner (joint queries on structured content items and on metadata, covering both observations and publications).
To check observable classes as interpretations of theories (models), and to analyze inconsistencies between observations and theoretical models.
Data Mining as a part of PSE
Two basic classes of models: predictive and descriptive.
Predictive: one of the observational features is chosen as the target. The model provides a way of calculating the target as a function of the rest of the features: Y = F(X1, … , Xn). There are two approaches: classification (predicts the class to which an object may belong, with a certain probability) and regression (predicts a value of the target). Examples: Naïve Bayes, Adaptive Bayes, Support Vector Machines (SVM), regression, searching for essential attributes, etc.
Descriptive: a) clustering by certain similarity criteria (in contrast with classification, the features and classes of the partitioning are unknown); b) associative models (looking for stable associations).
For each model many algorithms exist (classification and regression decision trees, genetic algorithms, neural networks, discriminant analysis, enhanced K-means, O-cluster, association search, etc.).
Technology of data mining: 1) problem statement, 2) data preparation, 3) model development and choice of the algorithm, 4) evaluation and interpretation. Not all models allow interpretation (e.g., neural networks), but models expressed as rules do give a way to interpret the results.
Problem statements are required!
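The difference between the two model classes can be illustrated on toy data. The sketch below is not taken from the report: the two-feature sample and its labels are invented, the predictive part uses a nearest-centroid classifier as a simple stand-in for the methods listed above, and the descriptive part is plain k-means.

```python
import math

def centroid(points):
    """Mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

# Predictive model: the class label is the target, Y = F(X1, X2).
# Toy labelled sample; features and labels are invented for illustration.
train = {
    "star":   [(0.2, 0.1), (0.3, 0.2), (0.1, 0.3)],
    "galaxy": [(1.1, 0.9), (1.3, 1.0), (0.9, 1.2)],
}
labels = list(train)
class_centers = [centroid(train[name]) for name in labels]

def classify(point):
    """Nearest-centroid classifier, a simple stand-in for the listed methods."""
    return labels[nearest(point, class_centers)]

# Descriptive model: no labels are given, groups are found by similarity.
def k_means(points, centers, iterations=10):
    """Plain k-means starting from the given initial centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

unlabelled = train["star"] + train["galaxy"]
centers, groups = k_means(unlabelled, centers=[(0.0, 0.0), (1.0, 1.0)])

print(classify((0.25, 0.15)))
print([len(g) for g in groups])
```

In both cases the code only becomes meaningful once the problem statement fixes which features are measured and what the target or the similarity criterion is.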
VO architecture according to IVOA
o VO architecture overview
o Data Modeling
  o A unified domain model for astronomy, for use in VO
  o Data model for quantity
  o IVOA Observation data model
  o Simple Spectral Data Model
  o Simulation Data Model
o Unified Content Descriptors
o Metadata Registries for VO
o VOTable Format Definition
o Data Access Layer
o DAL Architecture
  o Simple Image Access Protocol Specification
  o Simple Spectral Access Specification
o IVOA Query Language
o IVOA SkyNode Interface
International Virtual Observatory Alliance Partners
AstroGrid (UK) (http://www.astrogrid.org);
Australian Virtual Observatory (http://avo.atnf.csiro.au);
Astrophysical Virtual Observatory (EU) (http://www.euro-vo.org);
Virtual Observatory of China (http://www.china-vo.org);
Canadian Virtual Observatory (http://services.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/cvo/);
German Astrophysical Virtual Observatory (http://www.g-vo.org/);
Hungarian Virtual Observatory (http://hvo.elte.hu/en/);
Italian Data Grid for Astronomical Research (http://wwwas.oat.ts.astro.it/idgar/IDGAR-home.htm);
Japanese Virtual Observatory (http://jvo.nao.ac.jp/);
Korean Virtual Observatory (http://kvo.kao.re.kr/);
National Virtual Observatory (USA) (http://us-vo.org/);
Russian Virtual Observatory (http://www.inasan.rssi.ru/eng/rvo/);
Spanish Virtual Observatory (http://laeff.esa.es/svo/);
Virtual Observatory of India (http://vo.iucaa.ernet.in/~voi/).
IVOA Infrastructure Controversies (just one example)
1. Euro-VO and NVO objectives: how to consolidate them and support with a complete system of standards
2. Controversies in understanding what a Data Centre is (e.g., CDS vs AstroGrid definitions)
3. Absence of a Data Centre concept in the IVOA standards
4. Controversy between the SkyQuery idea and Data Centres
Subject mediation infrastructure for problem domains representation in RVO
o Information sources integration approaches
o Principles of subject mediation
o Subject mediation tools
Information sources integration approaches
Virtual integration:
  Formation of a global schema as a result of integration of a pre-selected set of source schemas (Global as View)
  Global schema is defined independently of existing sources as a subject domain schema (Local as View)
Materialized integration (data warehouses)
Combined methods (GLAV, applying partial materialization)
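The contrast between Global as View and Local as View can be sketched with two toy catalogs. This is only an illustration under simplifying assumptions: the source rows, attribute names and function names are hypothetical, and a real LAV view is a query over the global schema rather than the plain attribute renaming used here.

```python
# Two toy source catalogs with different local schemas (illustrative rows).
source_a = [{"name": "M31", "ra_deg": 10.68, "dec_deg": 41.27}]
source_b = [{"id": "M33", "alpha": 23.46, "delta": 30.66}]

# Global as View: the global relation is written as a view over the
# pre-selected sources, so adding a source means editing this function.
def gav_objects():
    rows = [{"object": r["name"], "ra": r["ra_deg"], "dec": r["dec_deg"]}
            for r in source_a]
    rows += [{"object": r["id"], "ra": r["alpha"], "dec": r["delta"]}
             for r in source_b]
    return rows

# Local as View: the global schema is fixed first; each source is described
# separately in its terms (here, a simple attribute renaming).
GLOBAL_SCHEMA = ("object", "ra", "dec")
lav_views = [
    (source_a, {"object": "name", "ra": "ra_deg", "dec": "dec_deg"}),
    (source_b, {"object": "id", "ra": "alpha", "dec": "delta"}),
]

def lav_objects(views):
    """Answer 'all objects' by rewriting over whatever views are described."""
    return [{g: row[mapping[g]] for g in GLOBAL_SCHEMA}
            for rows, mapping in views for row in rows]

print(gav_objects() == lav_objects(lav_views))
```

Both functions return the same rows here; the difference is where a new source has to be described, in the global view itself (GAV) or in one additional entry of lav_views (LAV).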
There are two fundamentally different approaches to the integrated representation of multiple information resources for a researcher solving scientific problems:
1) moving from resources to a problem (an integrated representation of multiple resources is created independently of the problem) and
2) moving from a problem to the resources (a description of the subject domain of a problem class is created in terms of concepts, data structures, functions and processes of problem solving, and the resources relevant to the problem are mapped into it).
The first approach (used in SkyQuery) is not scalable with respect to the number of resources: the global schema becomes too large for a researcher to survey, and the completeness of information is doubtful.
To implement the second approach, a mediation technology has to be created. The mediator supports the interaction between a researcher and the resources by applying a description of the problem class subject domain (the description of the mediator). The subject mediator approach (a new technology) is considered a part of RVOII.
Subject Mediator Concept
Mediator Definition as a Subject Metainformation Consolidation
For the mediator's scalability, two separate phases of the mediator's functioning are distinguished: consolidation and operational.
• In the consolidation phase the efforts of the scientific community are focused on defining the mediator's subject by declaring its metainformation. The metainformation created in the consolidation phase constitutes a definition of the subject domain of the mediator.
• During the operational phase arbitrary information collections, expressed in terms of the mediator, can be registered at the mediator. Registration is autonomous and can be done by collection providers independently of each other. Users of the mediator know only the metainformation defining the mediator’s subject and formulate their queries in terms of the mediator’s subject.
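The two phases can be pictured with a toy mediator. The sketch below is an illustration only, not the RVOII mediator: the class SubjectMediator, its methods and the sample collections are hypothetical, and registration is reduced to an attribute mapping.

```python
class SubjectMediator:
    """Toy mediator; class and method names are hypothetical."""

    def __init__(self, subject_attributes):
        # Consolidation phase: the community fixes the subject definition.
        self.subject_attributes = tuple(subject_attributes)
        self.collections = []

    def register(self, rows, mapping):
        """Operational phase: a provider registers a collection on its own,
        stating how its local attributes map to the mediator's terms."""
        missing = set(self.subject_attributes) - set(mapping)
        if missing:
            raise ValueError(f"unmapped subject attributes: {sorted(missing)}")
        self.collections.append((rows, mapping))

    def query(self, predicate):
        """Evaluate a predicate written in mediator terms over every
        collection registered up to the moment of the query."""
        results = []
        for rows, mapping in self.collections:
            for row in rows:
                view = {a: row[mapping[a]] for a in self.subject_attributes}
                if predicate(view):
                    results.append(view)
        return results

mediator = SubjectMediator(["object", "magnitude"])
mediator.register([{"name": "A", "vmag": 12.5}, {"name": "B", "vmag": 18.0}],
                  {"object": "name", "magnitude": "vmag"})
bright = lambda obj: obj["magnitude"] < 15
before = mediator.query(bright)

# A second provider registers later, independently of the first one.
mediator.register([{"id": "C", "m": 9.1}], {"object": "id", "magnitude": "m"})
after = mediator.query(bright)
print(len(before), len(after))
```

The same query, written once in the mediator's terms, returns more objects after the second registration, without the user or the first provider changing anything.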
Advantages of subject domain mediation
1. Semantic integration of heterogeneous information collections is achieved.
2. Users need to know only the subject definitions consolidated by a community.
3. Information providers can disseminate their information for integration independently of each other and at any time.
4. Autonomous information collections are fully independent of the mediators and their consolidated metainformation definitions.
5. Users have integrated access to all information registered up to the moment of a query.
6. Mediators form a recursive structure: multiple subjects can be semantically integrated by defining mediators of a higher level.
Subject mediation tools (operational phase)
[Diagram: subject mediation tools in the operational phase. Components shown: Portal (Web Browser, Application Server, Web Pages, Servlets/JSP, EJB/WS); Application Client; Mediator; Oracle 10g Metainformation Repository; Data Repository; Registration Client; Rewriter; Planner; Supervisor; Synth2Oracle; SOAP Wrapper; ADQL2SYFS; Metadata Access; Collections with Collection Adapters; Software Tools with Tool Adapter. Numbered interaction labels (1 to 7, 9) are not reproduced.]
Information infrastructure of the RVO
o Basic principles for the RVO infrastructure
o The RVO layered infrastructure
o Components of RVO
Basic principles for the RVO infrastructure
The basic RVO infrastructural principle is to represent the architecture as a network of interoperating web services (Grid services as soon as a suitable OGSA-DAI or WSRF standard matures). A multilevel hierarchy of services is the basis for the RVO architecture. The handling of remote and virtual data sources should be provided. The core will be a set of simple, low-level services that are easy to implement even for small projects, so the threshold to join the VO will be low. Large data providers may implement more complex, high-speed services as well. Services can be combined into more complex compositions that talk to several services and create more complex results.
Moving processing to the data is another principle, motivated by the large volume of the data and the data-intensive character of VO applications.
A modular architecture that encourages code reuse and composition is another guiding principle for the RVO infrastructure.
The conventional practice of applying the global-as-view approach to data integration in VO projects (e.g., SkyQuery) does not appear to be scalable.
Emphasizing subject mediators to support representation of and access to various subject domains in astronomy is a basic RVO principle.
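The first two principles, simple low-level services that can be composed and processing moved to the data, can be sketched as follows. The services are plain Python functions standing in for remote web services; all names and catalog rows are hypothetical.

```python
# Toy stand-ins for remote services; names and data are hypothetical.
def make_catalog_service(rows):
    """A simple low-level service: it evaluates the filter next to the data
    and returns only matching rows, instead of shipping the whole catalog."""
    def search(min_flux):
        return [r for r in rows if r["flux"] >= min_flux]
    return search

radio_service = make_catalog_service([
    {"object": "X1", "flux": 5.0}, {"object": "X2", "flux": 0.3},
])
optical_service = make_catalog_service([
    {"object": "X1", "flux": 2.0}, {"object": "X3", "flux": 4.0},
])

def cross_identify(services, min_flux):
    """A composed service: calls several simple services and keeps the
    objects that every one of them reports."""
    name_sets = [{r["object"] for r in s(min_flux)} for s in services]
    return sorted(set.intersection(*name_sets))

print(cross_identify([radio_service, optical_service], min_flux=1.0))
```

Each provider only has to implement the small search interface, while the composition logic lives in a separate service that can be reused across providers.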
The RVO layered infrastructure
[Diagram: the RVO layered infrastructure. Layers: Ground Layer, Resource Layer, Data Centers, Virtual Observatory, Researcher/Problem Layer. Components shown: archives, simulations, telescopes, publications; local metadata registries, local catalogs, catalog search, access services, data analysis facilities; searchable metadata registries, catalogs warehouse, integrated catalog search; computational Grid facilities; simulators, mediators, data analysis programs; collaboratory data spaces; workflow support; portals and workflow support; interface for integrated search. Labels: SAO Data Center Infrastructure, INASAN Data Center Infrastructure, RVO Infrastructure.]
Searchable metadata registries at Data Center and Virtual Observatory layers
AstroGrid as the core of the RVOII infrastructure
AstroGrid as the architectural core for implementation of RVOII
Analysis shows that using AstroGrid as the RVOII core supports the implementation of the RVOII principles (modularity of the architecture, grid interoperability of services, re-use and composition of services, development of a multilayered architecture). The following AstroGrid components were found to be directly applicable as the RVOII architecture core:
Registry: metadata-based resource registration and search;
MySpace: management of data spaces shared by researchers and tools;
Workbench: the VO user interface during problem solving;
Community: administration and management of VO users;
JES: the workflow engine;
CEA: construction of interoperable applications (services);
DSA: inclusion of data storage functionality into AstroGrid at the required level of system (task) implementation.
[Diagram: AstroGrid component stack. Layers: Astronomer Interface, Tools, Virtual Observatory Infrastructure, Middleware, Data. Components shown: CLI, Portal, Workbench; Science Application; Dataset Access, Workflow, Registry; VObs Support Services; Community, Resource Discovery; Agent Framework; Data Mining Framework; Visualization Framework; MySpace; Grid & Web Services Middleware; Astronomical Datasets; Auth/Auth Security. Legend: Existing Component, AstroGrid-2 Component, External Component.]
AstroGrid existing and planned components
Community centre in Moscow (IPI RAS) for support of scientific astronomical problem solving over distributed repositories of astronomical information
One of the first steps in implementing RVOII is the installation of a Community centre in Moscow (at IPI RAS) for support of scientific astronomical problem solving over distributed repositories of astronomical information (containing observational data, problem-solving results, and services for data and knowledge analysis). The Centre is positioned at the top layer of RVOII, so that scientists in astronomy can use it immediately for problem solving.
The Centre was created in October 2005 as an installation of AstroGrid (1.1), developed recently in the UK and generously provided by its authors for use in RVO.
First trial: application of AstroGrid for data analysis for the distant galaxy discovery problem
Superposition of radio images contours and optical images in Aladin
RVO facilities as a part of the International VO
[Diagram: RVO facilities within the International VO. Nodes shown: AstroGrid Leicester (Great Britain); AstroGrid Edinburgh (Great Britain); AstroGrid RVO (IPI RAS) with tools for astrophysical problems definition, metainformation of AstroGrid information sources, and tools for management of problem solving; Data Center SAO RAS; Data Center INASAN; Data Center Strasbourg (France); Data Centers in USA; Information Grid.]
Analysis of possibility of extending AstroGrid with subject mediation facilities
Basic preliminary decisions:
Mediators are registered in the Registry as CEA applications;
At the mediator interface, methods for submitting ADQL queries and mediator programs in a subset of the SYNTHESIS language are planned;
CEA applications can be used as functions in the mediator programs;
The results of the mediator programs are represented in the form of VOClass, of which VOTable is a strict subset; the results are stored in MySpace;
The mediator programs can be used as tasks of the AstroGrid workflows;
Adapters are embedded into AstroGrid either by means of the built-in application server for Java applications or by means of the DSA application server;
For the mediator clients, Portal and Workbench can be used at the initial stage; at later stages a specific mediator client based on the ACR capabilities can be developed;
Facilities for calling external applications are planned (e.g., for the data mining facilities of Weka and/or Oracle).
Composed architecture
[Diagram: composed architecture of AstroGrid with the mediator. Components shown: Clients (Portal, Workbench, Mediator client); Registry; JES; MySpace; CEC; Applications (Command-line CEA, Java CEA, Mediator CEA, DSA CEA, Http CEA, SIA); Servers (Aladin, Weka, Mediator, Adapter). Interaction labels: save/load data (VOTable, VOClass); application list; submit workflow; save/load workflow; resolve application; view data (VOTable, VOClass); transmit query, receive result (VOClass). Markers: (m-I), (m-II), (a-I), (a-II).]
Links
RVOII Report
Briukhov D.O., Kalinichenko L.A., Zakharov V.N., Panchuk V.E., Vitkovsky V.V., Zhelenkova O.P., Dluzhnevskaya O.B., Malkov O.Yu., Kovaleva D.A. Information Infrastructure of the Russian Virtual Observatory (RVO). Second Edition. IPI RAN, May 2005.
http://synthesis.ipi.ac.ru/synthesis/publications/rvoii/rvoii.pdf
Announcement of the RVO AstroGrid as a shared-use centre, with registration instructions
http://synthesis.ipi.ac.ru/synthesis/projects/astromedia/astroannounce
BASIC INFORMATION TECHNOLOGY FOR VO IS COMING.
SCIENTIFIC PROBLEM STATEMENTS AND MULTIDISCIPLINARY WORK ON SOLVING THEM WITH THE VO ARE REQUIRED
IVOA Architecture Diagram