Hrvatska, June 3-rd, 2003
COST Action n. 283 - progress report, June 2003
Computational and Information Infrastructures in the
Astronomical Data GRID
Giuseppe Longo
Chair of Astrophysics
Department of Physical Sciences
University of Napoli “Federico II”, Italy & INFN (Italian Institute for Nuclear Physics)
[email protected]
Chair: Prof. F. Murtagh – Queen University College Belfast
[Image: Hubble Deep Field]
Methodological background:
Id est: is history teaching us something (or isn’t it?)…
Role of Technological Breakthroughs
[Figure: number of discoveries; series: all discoveries, before 1954, after 1954]
Where is (now) the next breakthrough in Astronomy?
Either new channels (better: new information carriers):
• Electromagnetic waves (optical since 1609, other bands since the 1960s)
• Solid samples (1970s ->)
• Gravitational waves (2005 ->)
• Neutrinos (early 1980s ->)
Or leaps in any of:
• Sensitivity
• Spectral range
• Spectral resolution
• Angular resolution
• Time resolution
The iAstro people believe that:
[Diagram labels: Discoveries; Massive data sets; Distributed computing; Massive data mining]
Hardware breakthrough: wide field imaging with CCD Mosaics enables digital surveys
The sky covers about 40,000 sq. deg.
With 0.6 arcsec sampling: 2 × 10^12 pixels
8 TB per band (10 to 100 TB per survey)
About 10 PB if temporal resolution is kept (ca.h for 1 yr … need for 20 yr)
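The pixel and storage figures above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative, and the 4 bytes per pixel is an assumption inferred from the slide's own numbers (8 TB divided by 2 × 10^12 pixels). The exact product for 40,000 sq. deg. at 0.6 arcsec is about 1.44 × 10^12 pixels, which the slide rounds up to 2 × 10^12.

```python
ARCSEC_PER_DEG = 3600.0

def survey_pixels(area_sq_deg, sampling_arcsec):
    """Number of pixels needed to tile an area at a given pixel scale."""
    pixels_per_deg = ARCSEC_PER_DEG / sampling_arcsec
    return area_sq_deg * pixels_per_deg ** 2

def band_volume_tb(n_pixels, bytes_per_pixel=4):
    """Raw data volume for one band, in TB (10^12 bytes).

    bytes_per_pixel=4 is an assumption, not stated on the slide.
    """
    return n_pixels * bytes_per_pixel / 1e12

n_pix = survey_pixels(40_000, 0.6)      # whole sky at 0.6 arcsec sampling
print(f"pixels: {n_pix:.2e}")
print(f"TB per band (exact count): {band_volume_tb(n_pix):.2f}")
print(f"TB per band (slide's rounded 2e12 pixels): {band_volume_tb(2e12):.1f}")
```

Multiplying the per-band volume by the number of bands gives the order of magnitude quoted for a full survey.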
From Traditional to Survey Science
Traditional: Telescope → Data Analysis → Results
Survey-Based: Survey Telescope → Archive → Target Selection / Data Mining → Follow-Up Telescope → Results (Another Survey/Archive?)
Survey science is highly successful and increasingly prominent, but inherently limited by the information content of individual surveys.
What comes next, beyond survey science, is distributed (V.O.) science.
Courtesy of G. Djorgovski
A schematic illustration of the new astronomy
[Diagram labels: Primary Data Providers; Surveys, Observatories, Missions; Survey and Mission Archives; VO; Data Services: Data Mining and Analysis, Target Selection; Secondary Data Providers; Follow-Up Telescopes and Missions; Digital libraries; Results]
Courtesy of G. Djorgovski
[Image panels: Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map, Density Map]
Panchromatic view of the Universe:
Search for the unknown
Offers:
• Different physics
• Global understanding
• Comparison with theory
• New discoveries
New domains of the parameter space: cf. time
Faint, Fast Transients (Tyson et al.)
[Diagram labels (parameter space axes): RA, Dec, Wavelength, Time, Flux, Proper motion, Polarization, Morphology / Surf. Br., Non-EM …]
High dimensionality (N >> 100)
What is the coverage?
Where are the gaps?
Calls for…
• Automatic catalogue extraction
• Spurious feature removal
• Image parametrization and classification
• Data compression
• Multiscale analysis, etc.
[Figure: hours of computer time per night (axis ticks from 0.0000001 to 1000) versus year (1500 to 2000); T2 (Moore) ~ 1.5 years]
Sounds Beautiful ! …. BUT:
Terascale (Petascale?) computing and/or better algorithms are required
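To put the quoted doubling time in numbers, here is a minimal sketch assuming pure exponential growth with T2 = 1.5 years. The function names and the factor of 1000 used in the example are illustrative, not taken from the slides.

```python
import math

def growth_factor(years, doubling_time=1.5):
    """Capacity multiplier after `years` of exponential growth."""
    return 2.0 ** (years / doubling_time)

def years_to_gain(factor, doubling_time=1.5):
    """Years needed to gain a given multiplicative factor."""
    return doubling_time * math.log2(factor)

print(growth_factor(3.0))                  # two doublings in three years
print(f"{years_to_gain(1000):.1f} years")  # time for a 1000-fold gain
```

Under this assumption a thousandfold gain takes roughly fifteen years of hardware progress alone, which is consistent with the slide's call for better algorithms as well.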
In modern data sets: $D_D \gg 10$, $D_S \gg 3$
Data Complexity → Multidimensionality → Discoveries
But the bad news is …
The computational cost of clustering analysis:
K-means: $K \times N \times I \times D$
Expectation Maximisation: $K \times N \times I \times D^2$
Monte Carlo Cross-Validation: $M \times K_{\max}^2 \times N \times I \times D^2$
N = no. of data vectors, D = no. of data dimensions
K = no. of clusters chosen, $K_{\max}$ = max no. of clusters tried
I = no. of iterations, M = no. of Monte Carlo trials/partitions
Some dimensionality reduction methods do exist (e.g., PCA, class prototypes, hierarchical methods, etc.), but more work is needed
Digital sky surveys call for huge increases in computing power
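The scaling laws quoted for the three clustering methods can be compared directly. The sketch below only evaluates the leading-order products from the slide; constant factors are ignored, and the example values of N, D, K, I, M and K_max are hypothetical, chosen only to show the relative sizes.

```python
def kmeans_cost(K, N, I, D):
    """Leading-order operation count for K-means: K * N * I * D."""
    return K * N * I * D

def em_cost(K, N, I, D):
    """Leading-order count for Expectation Maximisation: K * N * I * D^2."""
    return K * N * I * D ** 2

def mccv_cost(M, K_max, N, I, D):
    """Leading-order count for Monte Carlo Cross-Validation:
    M * K_max^2 * N * I * D^2."""
    return M * K_max ** 2 * N * I * D ** 2

# Hypothetical survey-scale values, for illustration only.
N, D, K, I = 10**9, 100, 10, 50
M, K_max = 20, 20

print(f"K-means: {kmeans_cost(K, N, I, D):.1e}")
print(f"EM:      {em_cost(K, N, I, D):.1e}")
print(f"MCCV:    {mccv_cost(M, K_max, N, I, D):.1e}")
```

The quadratic dependence on D is what makes dimensionality reduction attractive: halving D cuts the EM and cross-validation counts by a factor of four.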
“Standard Activities”: all meeting reports and proceedings are on the web
• First and Second MC meetings, Brussels, 11/23/2001 & 2/14-15/2002
• Third MC meeting, Edinburgh, 07/21/2002 (at GGF-5, Global Grid Forum 5)
• Fourth MC meeting & workshop on Multispectral data analysis, and image metadata, Strasbourg, 11/28-29/2002
• Fifth MC meeting & workshop on High/low resolution signal processing, Granada, 02/22-23/2003
• Planned: Sixth MC meeting & workshop on Poisson noise models, Nice, Oct. 2003.
• Planned: Seventh MC meeting & workshop on Data mining & Image analysis in a distributed environment, Capri, Mar. 2004.
Granada, February 2002
Guess who was taking the picture…
Major Orientation of iAstro in early 2003: FP6
• Expressions of Interest filed in summer 2002.
• Participation in Commission Information Days.
• Involvement in several NoEs (sensor fusion, information retrieval, e-education and training, the European virtual observatory, and digital signal processing and data mining in medicine).
• Participation in evaluation panels.
COST 283 proposal for the Marie Curie RTN network
“GridFocus: Data and Information Fusion and Mining in the Context of the DataGrid”
• Submitted early April 2003.
• Participants: iAstro partners in BG, CH, D, E, F, GR, H, I, IRL and UK. Additional partner cluster at the University of Paris Sud.
Multiband and multiple layer image and signal processing as a basic paradigm for the data Grid.
Data mining of visual and other streams, including high performance forensic image data mining.
Empirical and virtual data interfaces.
GridFocus concept based on data dynamics and information thermodynamics
Step 3: experiments to find the optimal architecture, varying the number of inputs, hidden units, patterns in the training set, training epochs, Bayesian cycles and inner loops, etc.
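A search over architecture settings like the one described in Step 3 is often organised as a grid of configurations. The following is a minimal sketch of that idea, not the procedure actually used by the authors: the parameter names, the candidate values and the scoring function are all hypothetical placeholders.

```python
from itertools import product

def iter_configs(search_space):
    """Yield every combination of settings as a dict."""
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))

def best_config(search_space, score):
    """Return the configuration with the highest score."""
    return max(iter_configs(search_space), key=score)

# Hypothetical candidate values for three of the quantities varied in Step 3.
search_space = {
    "n_input": [5, 10],
    "n_hidden": [4, 8, 16],
    "n_epochs": [100, 500],
}

def placeholder_score(cfg):
    # Stand-in for a real validation score (NOT an experimental result);
    # in practice this would train a network and evaluate it on held-out data.
    return -abs(cfg["n_hidden"] - 8) - abs(cfg["n_epochs"] - 100) / 1000

configs = list(iter_configs(search_space))
best = best_config(search_space, placeholder_score)
print(len(configs), "configurations tried; best:", best)
```

The number of configurations grows as the product of the list lengths, so in practice only a few values per parameter are usually tried at once.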
iAstro strategy:
• Advance the state of the art through our workshops and visits. Many of these exchanges presented results in a Special Issue of Neural Networks (Ed. Tagliaferri and Longo, vol. 16, 3-4, 2003). Status: ongoing.
• Define our role vis-à-vis large Framework Programme projects on the virtual observatory, grid, computer vision, etc. through an iAstro White Paper. Status: done in early 2003.
• Spin-off specific targeted actions where greater resources are needed. Status: GridFocus Marie Curie RTN network proposal written and submitted in early 2003; local initiatives
• Next step: spin-out and commercial exploitation of our work through a STREP or IP proposal?
iAstro web pages: http://www.iastro.org
To join the iAstro Mailing List: send a message to: [email protected]