Online Science: The World-Wide Telescope as a Prototype for the New Computational Science. Jim Gray, Microsoft Research (http://research.microsoft.com/~gray); Alex Szalay, Johns Hopkins University
Mar 27, 2015
1
Online Science
The World-Wide Telescope as a Prototype for the New Computational Science
Jim Gray, Microsoft Research
http://research.microsoft.com/~gray
Alex Szalay, Johns Hopkins University
2
The Evolution of Science
• Observational Science
– Scientist gathers data by direct observation
– Scientist analyzes data
• Analytical Science
– Scientist builds analytical model
– Makes predictions
• Computational Science
– Simulates analytical model
– Validates model and makes predictions
• Data Exploration Science
– Data captured by instruments, or data generated by simulator
– Processed by software
– Placed in a database / files
– Scientist analyzes database / files
3
Information Avalanche
• In science, industry, government, …
– better observational instruments, and
– better simulations, are producing a data avalanche
• Examples
– BaBar: grows 1 TB/day (2/3 simulation information, 1/3 observational information)
– CERN: LHC will generate 1 GB/s, ~10 PB/y
– VLBA (NRAO) generates 1 GB/s today
– Pixar: 100 TB/movie
• New emphasis on informatics:
– Capturing, Organizing, Summarizing, Analyzing, Visualizing
Image courtesy C. Meneveau & A. Szalay @ JHU
BaBar, Stanford
Space Telescope
P&E Gene Sequencer, from http://www.genome.uci.edu/
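The data rates quoted in the examples above can be put on a common scale with a little arithmetic. The sketch below converts a sustained GB/s rate to PB/year using decimal units. The `duty_cycle` parameter and the reading that ~10 PB/y reflects an instrument running only part of the year are assumptions for illustration; the slide does not say how the two CERN figures relate.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds


def petabytes_per_year(gb_per_second, duty_cycle=1.0):
    """Convert a GB/s rate into PB/year (decimal units: 1 PB = 1e6 GB).

    duty_cycle is a hypothetical knob: the fraction of the year the
    instrument is actually producing data.
    """
    return gb_per_second * SECONDS_PER_YEAR * duty_cycle / 1e6


# 1 GB/s sustained for a whole year.
continuous = petabytes_per_year(1.0)

# Fraction of the year that would turn 1 GB/s into ~10 PB/y (assumed reading).
implied_duty = 10.0 / continuous

# BaBar's 1 TB/day expressed per year.
babar_tb_per_year = 1.0 * 365.25

print(f"1 GB/s continuous: {continuous:.1f} PB/y")
print(f"duty cycle implied by ~10 PB/y: {implied_duty:.2f}")
print(f"BaBar at 1 TB/day: {babar_tb_per_year:.0f} TB/y")
```

Under these assumptions, a 1 GB/s stream would yield roughly three times the quoted yearly volume if it never paused, which is consistent with an instrument that records only part of the time.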
4
World Wide Telescope
Virtual Observatory
http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/
• Premise: Most data is (or could be) online
• The Internet is the world’s best telescope:
– It has data on every part of the sky
– In every measured spectral band: optical, x-ray, radio, ...
– As deep as the best instruments (2 years ago)
– It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no ...)
– It’s a smart telescope: links objects and data to literature on them
5
Why Astronomy Data?
• It has no commercial value
– No privacy concerns
– Can freely share results with others
– Great for experimenting with algorithms
• It is real and well documented
– High-dimensional data (with confidence intervals)
– Spatial data
– Temporal data
• Many different instruments from many different places and many different times
• Federation is a goal
• There is a lot of it (petabytes)
• Great sandbox for data mining algorithms
– Can share across companies
– University researchers
• Great way to teach both Astronomy and Computational Science
[Figure panels: IRAS 100 µm, ROSAT ~keV, DSS optical, 2MASS 2 µm, IRAS 25 µm, NVSS 20 cm, WENSS 92 cm, GB 6 cm]
6
SkyServer.SDSS.org
• A modern Astronomy archive
– Raw pixel data lives in file servers
– Catalog data (derived objects) lives in a database
– Online query to any and all
• Also used for education
– 150 hours of online Astronomy
– Implicitly teaches data analysis
• Interesting things
– Spatial data search
– Client query interface via Java applet
– Query interface via Emacs
– Popular
– Cloned by other surveys (a template design)
– Web services are the core of it
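To make "catalog data lives in a database" and "spatial data search" concrete, here is a minimal sketch of a cone search over a tiny in-memory catalog. It uses SQLite only as a stand-in database; the table `photo_obj`, its columns, and the sample rows are hypothetical and are not the SkyServer schema or real survey data.

```python
import math
import sqlite3


def unit_vector(ra_deg, dec_deg):
    """Convert (ra, dec) in degrees to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def separation_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees from the dot product of unit vectors."""
    a, b = unit_vector(ra1, dec1), unit_vector(ra2, dec2)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def cone_search(conn, ra0, dec0, radius_deg):
    """Return ids of objects within radius_deg of (ra0, dec0)."""
    # Coarse cut: a declination band the database can filter cheaply.
    rows = conn.execute(
        "SELECT obj_id, ra, dec FROM photo_obj WHERE dec BETWEEN ? AND ?",
        (dec0 - radius_deg, dec0 + radius_deg),
    ).fetchall()
    # Exact cut: true angular distance on the sphere.
    return [obj_id for obj_id, ra, dec in rows
            if separation_deg(ra0, dec0, ra, dec) <= radius_deg]


# Hypothetical catalog: (obj_id, ra in degrees, dec in degrees).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo_obj (obj_id INTEGER, ra REAL, dec REAL)")
conn.executemany(
    "INSERT INTO photo_obj VALUES (?, ?, ?)",
    [(1, 185.0, 0.0), (2, 185.01, 0.005), (3, 185.5, 0.0), (4, 5.0, 0.0)],
)

nearby = sorted(cone_search(conn, 185.0, 0.0, 0.05))
print(nearby)
```

A production archive would replace the declination band with a proper spatial index; the two-step "coarse filter, then exact test" shape is the part this sketch is meant to show.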
7
Federation: SkyQuery.Net
• Combine 4 archives initially
• Just added 6 more
• Send query to portal, portal joins data from archives.
• Problem: want to do multi-step data analysis (not just single query).
• Solution: Allow personal databases on portal
• Problem: some queries are monsters
• Solution: “batch schedule” on portal server; deposits answer in personal database.
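The "portal joins data from archives" step amounts to a positional cross-match: pairing objects from different archives that sit at nearly the same place on the sky. The following is a minimal sketch under stated assumptions. The archive rows, the identifiers, the 2-arcsecond tolerance, and the `personal_db` dictionary standing in for a personal database are all hypothetical, and the nested loop is the simplest possible join, not SkyQuery's actual method.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def cross_match(archive_a, archive_b, tolerance_arcsec):
    """Join two archives on position: keep pairs within the tolerance."""
    matches = []
    for id_a, ra_a, dec_a in archive_a:
        for id_b, ra_b, dec_b in archive_b:
            if separation_arcsec(ra_a, dec_a, ra_b, dec_b) <= tolerance_arcsec:
                matches.append((id_a, id_b))
    return matches


# Hypothetical rows (id, ra in degrees, dec in degrees); not real survey data.
optical = [("opt-1", 185.0000, 0.0000), ("opt-2", 185.2000, 0.1000)]
radio = [("rad-1", 185.0002, 0.0001), ("rad-2", 186.0000, 0.5000)]

# Stand-in for a user's personal database on the portal: the answer is
# deposited under a name so later analysis steps can reuse it.
personal_db = {}
personal_db["my_matches"] = cross_match(optical, radio, tolerance_arcsec=2.0)
print(personal_db["my_matches"])
```

Storing the result under a name is what enables multi-step analysis: a later query can start from `my_matches` instead of repeating the join. The all-pairs loop also hints at why some queries are "monsters" that belong in a batch queue.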
8
[Diagram labels: 2MASS, INT, SDSS, FIRST, SkyQuery Portal, Image Cutout]
SkyQuery Structure
• Each SkyNode publishes
– Schema Web Service
– Database Web Service
• Portal
– Plans query (2 phase)
– Integrates answers
– Is itself a web service
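The structure above can be sketched as two small classes. This is an illustration only: the class and method names are hypothetical, plain method calls stand in for web services, and the sample rows are made up. The slide says only that planning has two phases; the split used here (phase 1 collects cheap counts to order the nodes, phase 2 executes and merges) is an assumption, not a description of SkyQuery's planner.

```python
from dataclasses import dataclass


@dataclass
class SkyNode:
    """Hypothetical stand-in for one archive exposing two services."""
    name: str
    columns: list
    rows: list  # list of dicts keyed by column name

    def schema(self):
        """Schema service: describe what this node holds."""
        return {"name": self.name, "columns": self.columns}

    def count(self, predicate):
        """Database service, cheap form: number of rows that would match."""
        return sum(1 for row in self.rows if predicate(row))

    def query(self, predicate):
        """Database service: return the matching rows."""
        return [row for row in self.rows if predicate(row)]


class Portal:
    """Plans a query over several nodes and integrates their answers."""

    def __init__(self, nodes):
        self.nodes = nodes

    def run(self, predicate):
        # Phase 1 (assumed): ask every node for a count and order the
        # nodes so the smallest result set comes first.
        plan = sorted(self.nodes, key=lambda node: node.count(predicate))
        # Phase 2 (assumed): execute in plan order and merge the rows,
        # tagging each with the node it came from.
        answer = []
        for node in plan:
            for row in node.query(predicate):
                answer.append({"source": node.name, **row})
        return answer


# Made-up rows; only the archive names come from the slides.
sdss = SkyNode("SDSS", ["id", "mag"],
               [{"id": 1, "mag": 17.2}, {"id": 2, "mag": 17.9},
                {"id": 3, "mag": 19.5}])
twomass = SkyNode("2MASS", ["id", "mag"],
                  [{"id": 7, "mag": 15.0}, {"id": 8, "mag": 20.1}])

portal = Portal([sdss, twomass])
answer = portal.run(lambda row: row["mag"] < 18)
print([(row["source"], row["id"]) for row in answer])
```

Because `Portal.run` takes a predicate and returns rows just as a node's `query` does, a portal could itself be wrapped as a node of a larger portal, which is one way to read "is itself a web service".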
9
Information Avalanche: science, business, personal
Astronomy data
– SkyServer: http://SkyServer.SDSS.org
– demo http://skyquery.net/
– pixel space, record space, set space
– Personal SkyServer download http://skyserver.org/myskyserver/
– Mention data mining.
World-Wide Telescope
– Federated web services
– demo http://skyquery.net/
– Other web services
– Interop with Linux/Python/…
Other stuff
– Portal with batch job scheduler
– http://skyservice.pha.jhu.edu/devel/casjobs/