Virtual Observatory & Grid Technique ZHAO Yongheng (National Astronomical Observatories of China) CANS2002

Dec 18, 2015

Page 1

Virtual Observatory & Grid Technique

ZHAO Yongheng

(National Astronomical Observatories of China)

CANS2002

Page 2

Computational Science: The Third Science Branch is Evolving

• In the beginning science was empirical.
• Then theoretical branches evolved.
• Now, we have computational branches.
  – Has primarily been simulation
  – Growth area: data analysis/visualization of peta-scale instrument data.
• Analysis & Visualization tools
  – Help both simulation and instruments.
  – Are primitive today.

Page 3

Computational Science

• Traditional Empirical Science
  – Scientist gathers data by direct observation
  – Scientist analyzes data
• Computational Science
  – Data captured by instruments, or data generated by simulator
  – Processed by software
  – Placed in a database
  – Scientist analyzes database
• Concern: Scalability
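
The computational-science workflow above (capture, process, place in a database, analyze the database) can be sketched in a few lines. This is only an illustration under stated assumptions: the "instrument" is a random-number generator, the calibration factor and the `objects` table are hypothetical, and an in-memory SQLite database stands in for a real archive.

```python
import random
import sqlite3

def capture(n, seed=0):
    """Stand-in for an instrument or simulator: yields (id, raw_counts) pairs."""
    rng = random.Random(seed)
    for i in range(n):
        yield i, rng.randint(0, 1000)

def process(raw):
    """Stand-in for reduction software: turn raw counts into a calibrated flux."""
    obj_id, counts = raw
    return obj_id, counts * 0.5  # hypothetical calibration factor

def load(rows):
    """Place processed rows in a database (in-memory SQLite for this sketch)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, flux REAL)")
    db.executemany("INSERT INTO objects VALUES (?, ?)", rows)
    db.commit()
    return db

db = load(process(r) for r in capture(1000))

# The scientist queries the database rather than the raw data stream.
(total,) = db.execute("SELECT COUNT(*) FROM objects").fetchone()
(n_bright,) = db.execute("SELECT COUNT(*) FROM objects WHERE flux > 400").fetchone()
print(f"{n_bright} of {total} objects have flux > 400")
```

The scalability concern shows up in the last step: the query is the only part the scientist touches, so the database has to keep answering it as the table grows.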

Page 4

Astronomy Data Growth

• In the “old days” astronomers took photos.
• Starting in the 1960’s they began to digitize.
• New instruments are digital (100s of GB/night)
• Detectors are following Moore’s law.
• Data avalanche: doubles every 2 years

Figure: Total area of 3m+ telescopes in the world in m^2, and total number of CCD pixels in megapixels, as a function of time. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.
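
As a rough check on the "doubles every 2 years" figure, the 25-year growth factors quoted in the caption can be converted to doubling times. The sketch assumes steady exponential growth over the whole period, which the slide does not state; `doubling_time` is a helper name introduced here.

```python
import math

def doubling_time(factor, years):
    """Doubling time implied by growing by `factor` over `years`,
    assuming steady exponential growth (an assumption, not from the slide)."""
    return years * math.log(2) / math.log(factor)

pixels = doubling_time(3000, 25)  # CCD pixels: factor 3000 in 25 years
glass = doubling_time(30, 25)     # telescope area: factor 30 in 25 years

print(f"CCD pixels double every {pixels:.1f} years")
print(f"Telescope area doubles every {glass:.1f} years")
```

Under that assumption the pixel count doubles roughly every 2.2 years, close to the quoted 2 years, while collecting area doubles only about every 5 years.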

Page 5

Universal Access to Astronomy Data

• Astronomers have a few Petabytes now.
  – 1 pixel (byte) / sq arc second ~ 4TB
  – Multi-spectral, temporal, … → 1PB
• They mine it looking for
  – new (kinds of) objects, or more of the interesting ones (quasars)
  – density variations in 400-D space
  – correlations in 400-D space
• Data doubles every 2 years.
• Data is public after 2 years.
• So, 50% of the data is public.
• Some have private access to 5% more data.
• So: 50% vs 55% access for everyone
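
The "50% is public" step follows from the two numbers before it: if the total volume doubles every 2 years, then whatever existed 2 years ago is half of what exists today. A minimal sketch, assuming a fixed doubling time (`public_fraction` is a name introduced here for illustration):

```python
def public_fraction(doubling_years, embargo_years):
    """Fraction of all data ever taken that is older than the embargo period,
    assuming the archive has grown exponentially with a fixed doubling time."""
    return 2.0 ** (-embargo_years / doubling_years)

# Doubling every 2 years, public after 2 years: half of everything is public.
print(public_fraction(2, 2))  # 0.5

# A shorter, hypothetical 1-year embargo would make a larger share public.
print(round(public_fraction(2, 1), 2))
```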

Page 6

The Changing Style of Observational Astronomy

The Old Way:
  – Pointed, heterogeneous observations (~ MB - GB)
  – Small samples of objects (~ 10^0 - 10^3)

Now:
  – Large, homogeneous sky surveys (multi-TB, ~ 10^6 - 10^9 sources)
  – Archives of pointed observations (~ TB)

Future:
  – Multiple, federated sky surveys and archives (~ PB)
  – Virtual Observatory

Page 7

Why Astronomy Data?

• It has no commercial value
  – No privacy concerns
  – Can freely share results with others
  – Great for experimenting with algorithms
• It is real and well documented
  – High-dimensional data (with confidence intervals)
  – Spatial data
  – Temporal data
• Many different instruments from many different places and many different times
• Federation is a goal
• The questions are interesting
  – How did the universe form?
• There is a lot of it (petabytes)

[Figure panels: IRAS 100µ, ROSAT ~keV, DSS Optical, 2MASS 2µ, IRAS 25µ, NVSS 20cm, WENSS 92cm, GB 6cm]

Page 8

Virtual Observatory == World-Wide Telescope

[Figure labels: Chandra, Hubble, MMT, Sub-mm array, VLA, Antarctica sub-mm, Magellan 6.5m, Whipple γ-ray, SIRTF, Oak Ridge, 1.2m CO]

Page 9

Virtual Observatory

• Premise: Most data is (or could be) online
• So, the Internet is the world’s best telescope:
  – It has data on every part of the sky
  – In every measured spectral band: optical, x-ray, radio...
  – As deep as the best instruments (2 years ago).
  – It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no...).
  – It’s a smart telescope: links objects and data to literature on them.

Page 10

Why is VO a Good Scientific Prospect?

• Technological revolutions as the drivers/enablers of the bursts of scientific growth
• Historical examples in astronomy:
  – 1960’s: the advent of electronics and access to space
    Quasars, CMBR, x-ray astronomy, pulsars, GRBs, …
  – 1980’s - 1990’s: computers, digital detectors (CCDs etc.)
    Galaxy formation and evolution, extrasolar planets, CMBR fluctuations, dark matter and energy, GRBs, …
  – 2000’s and beyond: information technology
    The next golden age of discovery in astronomy?

VO is the mechanism to effect this process

Page 11

[Diagram labels: Primary Data Providers (Surveys, Observatories, Missions); Survey and Mission Archives; Follow-Up Telescopes and Missions; Results; VO Data Services (Data Mining and Analysis, Target Selection); Secondary Data Providers (Digital libraries)]

[Plots: SDSS (USA) and LAMOST (China)]

Page 12

Virtual Observatory & the Public

• The universe at anyone’s fingertips

• Educational activities involving real data

• New discoveries made by schoolchildren

• Interactive exhibits based on archived data

• Astronomy as a motivator for learning about computing

Real Astronomy Experience

Page 13

Virtual Observatory Challenges

• Size: multi-Petabyte
  40,000 square degrees is 2 Trillion pixels
  – One band (at 1 sq arcsec) 4 Terabytes
  – Multi-wavelength 10-100 Terabytes
  – Time dimension >> 10 Petabytes
  – Need auto parallelism tools
• Unsolved MetaData problem
  – Hard to publish data & programs
  – How to federate Archives
  – Hard to find/understand data & programs
• Current tools inadequate
  – new analysis & visualization tools
  – Data Federation is problematic
• Transition to the new astronomy
  – Sociological issues
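
The size estimates above depend on the assumed pixel scale and bytes per pixel, which the slide does not spell out. The sketch below works the arithmetic for two cases; the 0.5 arcsec pixel size and 2 bytes per pixel are assumptions chosen here because they reproduce the slide's round numbers, not values taken from the talk.

```python
ARCSEC_PER_DEG = 3600

def sky_pixels(area_sq_deg, pixel_arcsec):
    """Number of square pixels of side `pixel_arcsec` needed to tile `area_sq_deg`."""
    pixels_per_deg = ARCSEC_PER_DEG / pixel_arcsec
    return area_sq_deg * pixels_per_deg ** 2

def terabytes(n_pixels, bytes_per_pixel):
    """Storage in TB (10^12 bytes) for one band."""
    return n_pixels * bytes_per_pixel / 1e12

coarse = sky_pixels(40_000, 1.0)  # one pixel per square arcsecond
fine = sky_pixels(40_000, 0.5)    # assumed 0.5 arcsec pixels

print(f"1.0 arcsec pixels: {coarse:.2e} pixels, {terabytes(coarse, 1):.1f} TB at 1 byte/pixel")
print(f"0.5 arcsec pixels: {fine:.2e} pixels, {terabytes(fine, 2):.1f} TB at 2 bytes/pixel")
```

With one 1-byte pixel per square arcsecond, 40,000 square degrees comes to about 0.5 trillion pixels and 0.5 TB. The "2 Trillion pixels" and "4 Terabytes" figures are what 0.5 arcsec pixels at 2 bytes each would give.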

Page 14

Astronomical Strategies

PROBLEM              SOLUTION
Slow CPU growth      Distributed Computing
Limited storage      Distributed Data
Limited bandwidth    Information Hierarchies
                     - Move only what you need
Data diversity       Interoperability

VO

Page 15

Grids

[Diagram: GRID MIDDLEWARE linking Visualization; Supercomputer, PC-Cluster; Data-storage, Sensors, Experiments; Internet, networks; Desktop; Mobile Access. Credit: Hoffmann, Reinefeld, Putzer]

Page 16

the Virtual Observatory concept

• Aim to make all archives speak the same language
  – all searchable and analysable by the same tools
  – all data sources accessible through a uniform interface
  – all data held in distributed databases that appear as one
  – eventual interface to real observatories

archives form the Digital Sky

the archive is the sky
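
One way to picture "distributed databases that appear as one" is a thin federation layer over archives that each keep their own storage format but expose the same search call. The sketch below is purely illustrative: `OpticalArchive`, `RadioArchive`, `FederatedSky`, the column names, and the simple right-ascension range search (which ignores wrap-around at 360 degrees) are all hypothetical and do not correspond to any real VO protocol.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol

@dataclass(frozen=True)
class Source:
    archive: str
    ra: float   # right ascension, degrees
    dec: float  # declination, degrees

class Archive(Protocol):
    """The uniform interface every archive is assumed to implement."""
    def search(self, ra_min: float, ra_max: float) -> Iterable[Source]: ...

class OpticalArchive:
    """Hypothetical archive that stores rows as (ra, dec) tuples."""
    def __init__(self, rows):
        self.rows = rows
    def search(self, ra_min, ra_max):
        return [Source("optical", ra, dec)
                for ra, dec in self.rows if ra_min <= ra <= ra_max]

class RadioArchive:
    """Hypothetical archive that stores rows as dicts with its own key names."""
    def __init__(self, rows):
        self.rows = rows
    def search(self, ra_min, ra_max):
        return [Source("radio", r["RA_DEG"], r["DEC_DEG"])
                for r in self.rows if ra_min <= r["RA_DEG"] <= ra_max]

class FederatedSky:
    """Presents several archives as a single searchable collection."""
    def __init__(self, archives: Iterable[Archive]):
        self.archives = list(archives)
    def search(self, ra_min, ra_max):
        for archive in self.archives:
            yield from archive.search(ra_min, ra_max)

sky = FederatedSky([
    OpticalArchive([(12.5, -3.0), (150.0, 2.2)]),
    RadioArchive([{"RA_DEG": 15.1, "DEC_DEG": 40.0},
                  {"RA_DEG": 300.0, "DEC_DEG": -10.0}]),
])
hits = list(sky.search(10.0, 20.0))
for hit in hits:
    print(hit)
```

The caller never sees the tuple-versus-dict difference between the two archives, which is the point of a shared interface.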

Page 17

the Grid concept

• shared managed distributed resources
  – documents + data + software + storage + cycles + expertise
• network : ability to pass messages
• web : transparent document system
• computational grid : transparent CPU
• datagrid : transparent data access and services
• information grid, knowledge grid ... ?
• Virtual Organisations ?

a supercomputer on your desktop

everybody can be a power user

Page 18

Three Layer GRID Abstraction

[Diagram: three layers (Information Grid, Knowledge Grid, Computation/Data Grid) with side labels "Data to Knowledge" and "Control"; further labels: Automation, E-Science]

Page 19

What’s needed?

– Scientists: Science Data & Questions
– Plumbers: Database to store data, execute queries
– Miners: Data Mining Algorithms
– Tools: Question & Answer, Visualization

Page 20

obstacles to overcome

• sociology

• internet technology

• i/o bottleneck

• network bottleneck

Page 21

obstacles to overcome (1)

• sociology
  – need agreed formats for data, metadata, provenance
  – need standardised semantics ("ontology")
• internet technology
  – need protocols for publishing and exchanging data
  – need registry for publishing service availability and semantics
  – need method of transmitting authentication/authorisation
  – need methods for managing distributed resources

Page 22

obstacles to overcome (2)

• i/o bottleneck
  – need database supercomputers
  – need innovative search and analysis algorithms
• network bottleneck
  – data centers must provide analysis service
  – facility class analysis code needed

shift the results not the data
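
A back-of-the-envelope comparison shows why results should move instead of data. The 4 TB volume is the one-band figure quoted on the Virtual Observatory Challenges slide; the 100 Mbit/s link speed and the 1 MB result size are assumptions picked only for illustration.

```python
def transfer_seconds(n_bytes, link_bits_per_second):
    """Idealised transfer time: no protocol overhead, link fully available."""
    return n_bytes * 8 / link_bits_per_second

LINK = 100e6       # assumed link speed: 100 Mbit/s
ARCHIVE = 4e12     # one band of the sky, 4 TB (figure quoted in the talk)
RESULT = 1e6       # assumed size of a query result: 1 MB

move_data = transfer_seconds(ARCHIVE, LINK)
move_result = transfer_seconds(RESULT, LINK)

print(f"Shipping the data:   {move_data / 86400:.1f} days")
print(f"Shipping the result: {move_result:.2f} seconds")
```

Under these assumptions the raw data takes days to move while the answer takes a fraction of a second, which is the case for running analysis at the data center.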

Page 23

Distributed Computing at Work

• Virtual and collaborative exploration of the Universe

                            Total               Last 24 Hours
Users                       3675440             5075
Results received            491854017           1092374
Total CPU time              954229.737 years    1662.448 years
Floating Point Operations   1.502416e+21        4.260259e+18 (49.31 TFLOPs/sec)
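
The quoted sustained rate follows directly from the 24-hour operation count: dividing 4.260259e+18 floating point operations by the number of seconds in a day gives the 49.31 TFLOPs/sec figure. A quick check:

```python
SECONDS_PER_DAY = 24 * 60 * 60

flops_last_24h = 4.260259e18          # operations in the last 24 hours, from the slide
rate = flops_last_24h / SECONDS_PER_DAY

print(f"{rate / 1e12:.2f} TFLOP/s")   # 49.31
```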

Page 24

SkyQuery

Won 2nd prize in Microsoft .NET Contest

Page 25

National Virtual Observatory Data Grid

[Diagram, numbered layers 1 to 7, including: 1. Portals and Workbenches; 2. Knowledge & Resource Management; 4. Grid (Security, Caching, Replication, Backup, Scheduling). Other labels: Standard APIs and Protocols; Concept space; Standard Metadata format, Data model, Wire format; Catalog Mediator; Data mediator; Catalog/Image Specific Access; Bulk Data Analysis; Catalog Analysis; Metadata View; Data View; Information Discovery; Metadata delivery; Data Discovery; Data Delivery; Compute Resources; Catalogs; Data Archives; Derived Collections]

Page 26

Page 27

AVO STATUS

• AVO approved with EU funds ~2 Million € (total budget ~ 4M €)
• Contract start on 15 November 2001 - 3 Year Phase A study
• 9 NEW POSITIONS for 3 years over 6 institutions - total 18 FTE (~ 50 people)
• Total VO funding AVO+NVO+ASTROGRID = $21 million (US)
• 3 Year target: Build VO 1.0 among the 6 partner archive sets by
  – Defining and executing trial science cases
  – Defining, developing and deploying new interoperability standards and tools
  – Developing and deploying new Grid-based services

Page 28

Data-Rich Astronomy and Other Fields

• Technical and methodological challenges facing the VO are common to most data-intensive sciences today, and beyond (commerce, industry, finance, etc.)
• Interdisciplinary exchanges (e.g., with physics, biology, earth sciences, etc.): intellectual cross-fertilization, avoiding wasteful duplication of effort
• Partnerships and collaborations with applied CS/IT are essential, and may lead to significant technological advances
  – High-energy physics → WWW! The Grid
  – Astronomy (VO) → ???

Page 29

Scaling the VO Mountain

[Diagram, top to bottom: Discoveries; Data Mining, Visualization; Data Services; Existing Centers and Archives. Marker: "We are here"]

Thank you!