Virtual Observatory & Grid Technique ZHAO Yongheng (National Astronomical Observatories of China) CANS2002
Computational Science: The Third Science Branch is Evolving
• In the beginning science was empirical.
• Then theoretical branches evolved.
• Now, we have computational branches.
– Has primarily been simulation
– Growth area: data analysis/visualization of peta-scale instrument data.
• Analysis & Visualization tools
– Help both simulation and instruments.
– Are primitive today.
Computational Science
• Traditional Empirical Science
– Scientist gathers data by direct observation
– Scientist analyzes data
• Computational Science
– Data captured by instruments, or data generated by simulator
– Processed by software
– Placed in a database
– Scientist analyzes database
• Concern: Scalability
Astronomy Data Growth
• In the “old days” astronomers took photos.
• Starting in the 1960’s they began to digitize.
• New instruments are digital (100s of GB/night)
• Detectors are following Moore’s law.
• Data avalanche: data doubles every 2 years
Figure: total area of 3m+ telescopes in the world (m²) and total number of CCD pixels (megapixels), as a function of time. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.
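The growth factors quoted in the figure caption can be turned into doubling times. The short Python sketch below assumes smooth exponential growth over the 25 years, which is a simplification; the function name is illustrative.

```python
from math import log2


def doubling_time_years(growth_factor, span_years):
    """Doubling time implied by a total growth factor over span_years,
    assuming steady exponential growth (a simplifying assumption)."""
    return span_years / log2(growth_factor)


# Growth factors over 25 years, as quoted in the figure caption.
glass = doubling_time_years(30, 25)     # telescope collecting area
pixels = doubling_time_years(3000, 25)  # CCD pixel count

print(f"glass doubles roughly every {glass:.1f} years")
print(f"pixels double roughly every {pixels:.1f} years")
```

Under that assumption the pixel count doubles a little faster than every 2.2 years, which is in line with the "doubles every 2 years" rule of thumb, while collecting area doubles only about every 5 years.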
Universal Access to Astronomy Data
• Astronomers have a few Petabytes now.
– 1 pixel (byte) / sq arc second ~ 4TB
– Multi-spectral, temporal, … → 1PB
• They mine it looking for
– new (kinds of) objects, or more of interesting ones (quasars),
– density variations in 400-D space,
– correlations in 400-D space
• Data doubles every 2 years.
• Data is public after 2 years.
• So, 50% of the data is public.
• Some have private access to 5% more data.
• So: 50% vs 55% access for everyone
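The "50% of the data is public" step follows from the two preceding bullets: if the total volume doubles every 2 years, then the volume that existed 2 years ago (and is now public) is half of today's. A minimal Python sketch of that arithmetic, assuming steady exponential growth; the function name is illustrative.

```python
def public_fraction(doubling_years, embargo_years):
    """Fraction of all data older than the embargo period, assuming the
    total volume grows exponentially with the given doubling time."""
    return 2.0 ** (-embargo_years / doubling_years)


# Values from the slide: doubling every 2 years, public after 2 years.
print(public_fraction(2.0, 2.0))   # 0.5

# A shorter embargo would make a larger share public (hypothetical value).
print(public_fraction(2.0, 1.0))
```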
The Changing Style of Observational Astronomy
• The Old Way:
– Pointed, heterogeneous observations (~ MB - GB)
– Small samples of objects (~ 10^0 - 10^3)
• Now:
– Large, homogeneous sky surveys (multi-TB, ~ 10^6 - 10^9 sources)
– Archives of pointed observations (~ TB)
• Future:
– Multiple, federated sky surveys and archives (~ PB)
– Virtual Observatory
Why Astronomy Data?
• It has no commercial value
– No privacy concerns
– Can freely share results with others
– Great for experimenting with algorithms
• It is real and well documented
– High-dimensional data (with confidence intervals)
– Spatial data
– Temporal data
• Many different instruments from many different places and many different times
• Federation is a goal
• The questions are interesting
– How did the universe form?
• There is a lot of it (petabytes)
Figure panels (sky surveys by band): IRAS 100 µm, ROSAT ~keV, DSS Optical, 2MASS 2 µm, IRAS 25 µm, NVSS 20cm, WENSS 92cm, GB 6cm
Figure labels (observatories and instruments): Chandra, Hubble, MMT, Sub-mm array, VLA, Antarctica sub-mm, Magellan 6.5m, Whipple γ-ray, SIRTF, Oak Ridge, 1.2m CO
Virtual Observatory == World-Wide Telescope
Virtual Observatory
• Premise: Most data is (or could be) online
• So, the Internet is the world’s best telescope:
– It has data on every part of the sky
– In every measured spectral band: optical, x-ray, radio...
– As deep as the best instruments (2 years ago).
– It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no...).
– It’s a smart telescope: links objects and data to literature on them.
Why is VO a Good Scientific Prospect?
• Technological revolutions as the drivers/enablers of the bursts of scientific growth
• Historical examples in astronomy:
– 1960’s: the advent of electronics and access to space
  → Quasars, CMBR, x-ray astronomy, pulsars, GRBs, …
– 1980’s - 1990’s: computers, digital detectors (CCDs etc.)
  → Galaxy formation and evolution, extrasolar planets, CMBR fluctuations, dark matter and energy, GRBs, …
– 2000’s and beyond: information technology
  → The next golden age of discovery in astronomy?
VO is the mechanism to effect this process
Diagram labels:
• Primary Data Providers: Surveys, Observatories, Missions; Survey and Mission Archives
• Secondary Data Providers: Digital libraries
• VO Data Services: Data Mining and Analysis, Target Selection
• Follow-Up Telescopes and Missions
• Results
Figure panels: SDSS (USA); LAMOST (China)
Virtual Observatory & the Public
• The universe at anyone’s fingertips
• Educational activities involving real data
• New discoveries made by schoolchildren
• Interactive exhibits based on archived data
• Astronomy as a motivator for learning about computing
Real Astronomy Experience
Virtual Observatory Challenges
• Size: multi-Petabyte
– 40,000 square degrees is 2 Trillion pixels
– One band (at 1 sq arcsec) 4 Terabytes
– Multi-wavelength 10-100 Terabytes
– Time dimension >> 10 Petabytes
– Need auto parallelism tools
• Unsolved MetaData problem
– Hard to publish data & programs
– How to federate Archives
– Hard to find/understand data & programs
• Current tools inadequate
– new analysis & visualization tools
– Data Federation is problematic
• Transition to the new astronomy
– Sociological issues
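The size figures on this slide can be checked with a back-of-envelope calculation. The Python sketch below is only an estimate under stated assumptions: at exactly 1 pixel per square arcsecond, 40,000 square degrees gives about 0.5 trillion pixels, and the quoted 2 trillion pixels and 4 Terabytes are reproduced if one assumes 0.5 arcsec pixels and 2 bytes per pixel. Neither of those two values is stated in the slides.

```python
ARCSEC_PER_DEGREE = 3600


def sky_pixels(area_sq_deg, pixel_arcsec):
    """Number of square pixels of side pixel_arcsec covering the area."""
    area_sq_arcsec = area_sq_deg * ARCSEC_PER_DEGREE ** 2
    return area_sq_arcsec / pixel_arcsec ** 2


def survey_bytes(area_sq_deg, pixel_arcsec, bytes_per_pixel, bands=1):
    """Raw image volume in bytes for a survey with the given parameters."""
    return sky_pixels(area_sq_deg, pixel_arcsec) * bytes_per_pixel * bands


coarse = sky_pixels(40_000, 1.0)                    # 1 arcsec pixels
fine = sky_pixels(40_000, 0.5)                      # assumed 0.5 arcsec pixels
one_band_tb = survey_bytes(40_000, 0.5, 2) / 1e12   # assumed 2 bytes per pixel

print(f"1.0 arcsec pixels: {coarse:.3e} pixels")
print(f"0.5 arcsec pixels: {fine:.3e} pixels")
print(f"one band: {one_band_tb:.1f} TB")
```

Multiplying by the number of bands and by the number of epochs then gives the multi-wavelength and time-dimension estimates.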
Astronomical Strategies
PROBLEM              SOLUTION
Slow CPU growth      Distributed Computing
Limited storage      Distributed Data
Limited bandwidth    Information Hierarchies - Move only what you need
Data diversity       Interoperability
VO, Grids
Diagram labels: GRID MIDDLEWARE; Visualization; Supercomputer, PC-Cluster; Data-storage, Sensors, Experiments; Internet, networks; Desktop; Mobile Access (credit: Hoffmann, Reinefeld, Putzer)
the Virtual Observatory concept
• Aim to make all archives speak the same language
– all searchable and analysable by the same tools
– all data sources accessible through a uniform interface
– all data held in distributed databases that appear as one
• archives form the Digital Sky
– eventual interface to real observatories
the archive is the sky
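The idea of archives that "speak the same language" can be sketched in Python as follows. This is only an illustration: the class names, the cone-search method and the storage layouts are all hypothetical and do not describe any real VO service. Two archives keep their data in different native forms, but both expose the same query method, so one tool can search them as if they were a single database.

```python
from dataclasses import dataclass
from math import acos, cos, degrees, radians, sin
from typing import Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class Source:
    archive: str
    ra_deg: float
    dec_deg: float


def separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation between two sky positions, in degrees."""
    r1, d1, r2, d2 = (radians(x) for x in (ra1, dec1, ra2, dec2))
    c = sin(d1) * sin(d2) + cos(d1) * cos(d2) * cos(r1 - r2)
    return degrees(acos(max(-1.0, min(1.0, c))))  # clamp against rounding


class Archive(Protocol):
    """The uniform interface: every archive answers the same cone search."""

    def cone_search(self, ra_deg: float, dec_deg: float,
                    radius_deg: float) -> List[Source]: ...


class OpticalArchive:
    """Hypothetical archive storing (RA, Dec) tuples in degrees."""

    def __init__(self, rows: List[Tuple[float, float]]) -> None:
        self._rows = rows

    def cone_search(self, ra_deg, dec_deg, radius_deg):
        return [Source("optical", ra, dec) for ra, dec in self._rows
                if separation_deg(ra_deg, dec_deg, ra, dec) <= radius_deg]


class RadioArchive:
    """Hypothetical archive storing RA in hours; converted on the fly."""

    def __init__(self, rows: List[Dict[str, float]]) -> None:
        self._rows = rows

    def cone_search(self, ra_deg, dec_deg, radius_deg):
        hits = []
        for row in self._rows:
            ra = row["ra_hours"] * 15.0  # 1 hour of RA = 15 degrees
            if separation_deg(ra_deg, dec_deg, ra, row["dec"]) <= radius_deg:
                hits.append(Source("radio", ra, row["dec"]))
        return hits


def federated_cone_search(archives: List[Archive], ra_deg: float,
                          dec_deg: float, radius_deg: float) -> List[Source]:
    """Query every archive through the same interface and merge the hits."""
    results: List[Source] = []
    for archive in archives:
        results.extend(archive.cone_search(ra_deg, dec_deg, radius_deg))
    return results


archives = [
    OpticalArchive([(10.0, 20.0), (50.0, -5.0)]),
    RadioArchive([{"ra_hours": 0.668, "dec": 20.01},
                  {"ra_hours": 12.0, "dec": 0.0}]),
]
hits = federated_cone_search(archives, ra_deg=10.0, dec_deg=20.0, radius_deg=1.0)
for hit in hits:
    print(hit)
```

The caller never sees the unit conversion inside `RadioArchive`; that hiding of per-archive detail behind one interface is the point of the sketch.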
the Grid concept
• shared managed distributed resources
– documents + data + software + storage + cycles + expertise
• network : ability to pass messages
• web : transparent document system
• computational grid : transparent CPU
• datagrid : transparent data access and services
• information grid, knowledge grid ... ?
• Virtual Organisations ?
a supercomputer on your desktop
everybody can be a power user
Three Layer GRID Abstraction
Diagram labels: Information Grid; Knowledge Grid; Computation/Data Grid; Data to Knowledge; Control; Automation; E-Science
What’s needed?
• Scientists: Science Data & Questions
• Plumbers: Database to store data, execute queries
• Miners: Data Mining Algorithms
• Question & Answer, Visualization Tools
obstacles to overcome (1)
• sociology
– need agreed formats for data, metadata, provenance
– need standardised semantics ("ontology")
• internet technology
– need protocols for publishing and exchanging data
– need registry for publishing service availability and semantics
– need method of transmitting authentication/authorisation
– need methods for managing distributed resources
obstacles to overcome (2)
• i/o bottleneck
– need database supercomputers
– need innovative search and analysis algorithms
• network bottleneck
– data centers must provide analysis service
– facility class analysis code needed
shift the results not the data
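A rough estimate shows why shifting results is cheaper than shifting data. The Python sketch below takes the ~4 TB one-band survey size quoted earlier in these slides. The 100 Mbit/s link and the 10 MB result size are hypothetical values chosen for illustration, and protocol overhead is ignored.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Time to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


LINK_BPS = 100e6       # hypothetical 100 Mbit/s link (assumption)
ARCHIVE_BYTES = 4e12   # ~4 TB one-band survey, as quoted in the slides
RESULT_BYTES = 10e6    # hypothetical 10 MB result table (assumption)

move_data = transfer_seconds(ARCHIVE_BYTES, LINK_BPS)
move_result = transfer_seconds(RESULT_BYTES, LINK_BPS)

print(f"ship the data:    {move_data / 86400:.1f} days")
print(f"ship the results: {move_result:.1f} s")
```

With these numbers the full archive takes several days to move, while the result of an analysis run at the data center arrives in under a second.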
Distributed Computing at Work
• Virtual and collaborative exploration of the Universe
                            Total               Last 24 Hours
Users                       3675440             5075
Results received            491854017           1092374
Total CPU time              954229.737 years    1662.448 years
Floating Point Operations   1.502416e+21        4.260259e+18 (49.31 TFLOPs/sec)
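The 49.31 TFLOPs/sec figure is consistent with 4.260259e+18 floating point operations spread evenly over 24 hours. Reading the two numbers that way is an interpretation of the statistics shown here, which the short Python check below makes explicit.

```python
SECONDS_PER_DAY = 24 * 60 * 60

flops_last_24h = 4.260259e18   # floating point operations, from the slide

# Average sustained rate over the day, in TFLOPs/sec (1 TFLOP = 1e12 FLOP).
tflops_per_sec = flops_last_24h / SECONDS_PER_DAY / 1e12

print(f"{tflops_per_sec:.2f} TFLOPs/sec")
```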
National Virtual Observatory Data Grid
Diagram labels:
• 1. Portals and Workbenches
• 2. Knowledge & Resource Management
• 4. Grid: Security, Caching, Replication, Backup, Scheduling
• Bulk Data Analysis, Catalog Analysis, Metadata View, Data View
• Information Discovery, Metadata Delivery, Data Discovery, Data Delivery
• Catalog Mediator, Data Mediator
• Standard Metadata format, Data model, Wire format
• Standard APIs and Protocols
• Catalog/Image Specific Access
• Concept space
• Compute Resources, Catalogs, Data Archives, Derived Collections
AVO STATUS
• AVO approved with EU funds ~2 Million € (total budget ~ 4M €)
• Contract start on 15 November 2001 - 3 Year Phase A study
• 9 NEW POSITIONS for 3 years over 6 institutions - total 18 FTE (~ 50 people)
• Total VO funding AVO+NVO+ASTROGRID = $21 million (US)
• 3 Year target: Build VO 1.0 among the 6 partner archive sets by
– Defining and executing trial science cases
– Defining, developing and deploying new interoperability standards and tools
– Developing and deploying new Grid-based services
Data-Rich Astronomy and Other Fields
• Technical and methodological challenges facing the VO are common to most data-intensive sciences today, and beyond (commerce, industry, finance, etc.)
• Interdisciplinary exchanges (e.g., with physics, biology, earth sciences, etc.): intellectual cross-fertilization, avoiding wasteful duplication of efforts
• Partnerships and collaborations with applied CS/IT are essential, and may lead to significant technological advances
– High-energy physics → WWW! The Grid
– Astronomy (VO) → ???