Page 1:

Hrvatska, June 3rd, 2003

COST Action n. 283 - progress report, June 2003

Computational and Information Infrastructures in the Astronomical Data GRID

Giuseppe Longo

Chair of Astrophysics
Department of Physical Sciences
University of Napoli “Federico II”, Italy
& INFN (Italian National Institute for Nuclear Physics)
[email protected]

Chair: Prof. F. Murtagh – Queen's University Belfast

[Image: Hubble Deep Field]

Page 2:

Methodological background:

That is: does history teach us something (or not)?

Role of Technological Breakthroughs

[Figure: number of discoveries; panels: all discoveries, before 1954, after 1954]

Page 3:

Where will the next breakthrough in astronomy come from?

Either new channels (better: new information carriers):

• Electromagnetic waves (optical since 1609, other since 60’s)

• Solid samples (70’s ->)

• Gravitational waves (2005 ->)

• Neutrinos (early 80’s ->)

Or leaps in any of:

• Sensitivity

• Spectral range

• Spectral resolution

• Angular resolution

• Time resolution

Page 4:

The iAstro people believe that:

[Diagram linking: Discoveries; Massive data sets; Distributed computing; Massive data mining]

Hardware breakthrough: wide-field imaging with CCD mosaics enables digital surveys

The sky covers ca. 40,000 sq. deg.

With 0.6 arcsec sampling: 2 x 10^12 pixels

8 TB per band (10 to 100 TB per survey)

Ca. 10 PB keeping temporal resolution (ca.h for 1 yr …need for 20 yr)
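
The pixel count and per-band volume quoted on this slide can be checked with a few lines of arithmetic. The sketch below assumes a square pixel grid and 4 bytes per pixel. The bytes-per-pixel value is an assumption (the slide does not state it), chosen because it reproduces 8 TB per band from 2 x 10^12 pixels. The function names are illustrative only.

```python
ARCSEC_PER_DEG = 3600.0


def pixels_on_sky(area_sq_deg, sampling_arcsec):
    """Number of pixels needed to tile an area at a given pixel scale."""
    pixels_per_deg = ARCSEC_PER_DEG / sampling_arcsec
    return area_sq_deg * pixels_per_deg ** 2


def volume_tb(n_pixels, bytes_per_pixel=4):
    """Raw data volume in TB (1 TB = 10**12 bytes); bytes_per_pixel is assumed."""
    return n_pixels * bytes_per_pixel / 1e12


# 40,000 sq. deg. at 0.6 arcsec sampling, as on the slide
n_pix = pixels_on_sky(40_000, 0.6)
print(f"pixels on the sky: {n_pix:.2e}")
print(f"volume per band from that count: {volume_tb(n_pix):.1f} TB")
print(f"volume per band from the rounded 2e12 pixels: {volume_tb(2e12):.1f} TB")
```

The exact product is somewhat below the rounded 2 x 10^12 shown on the slide, so the quoted figures are best read as order-of-magnitude estimates.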

Page 5:

From Traditional to Survey Science

Traditional: Telescope -> Data Analysis -> Results

Survey-Based: Survey Telescope -> Archive -> Target Selection / Data Mining -> Follow-Up Telescope -> Results

Another Survey/Archive?

Highly successful and increasingly prominent, but inherently limited by the information content of individual surveys. What comes next, beyond survey science, is distributed (V.O.) science.

Courtesy of G. Djorgovski

Page 6:

A schematic illustration of the new astronomy

• Primary Data Providers: Surveys, Observatories, Missions; Survey and Mission Archives
• VO: Data Services; Data Mining and Analysis, Target Selection
• Secondary Data Providers: Follow-Up Telescopes and Missions
• Digital libraries
• Results

Courtesy of G. Djorgovski

Page 7:

Panchromatic view of the Universe:

[Image panels: Radio; Far-Infrared; Visible; Visible + X-ray; Dust Map; Density Map]

Offers:
• Different physics
• Global understanding
• Comparison with theory
• New discoveries

Search for the unknown

New domains of the parameter space: cf. time

Faint, Fast Transients (Tyson et al.)

Page 8:

[Diagram axes: RA, Dec, Wavelength, Time, Flux, Proper motion, Polarization, Morphology / Surf. Br., Non-EM …]

Catalogue space (features; TB)

High dimensionality (N >> 100). What is the coverage? Where are the gaps?

Calls for: feature selection, clustering, statistics, KDD, visualization, etc.

Pixel space (raw data; TB/PB)

Huge data flow, data fusion, need for recalibrations

Calls for: automatic catalogue extraction, spurious feature removal, image parametrization and classification, data compression, multiscale analysis, etc.

Page 9:

[Figure: hours of computer time per night (log scale, 0.0000001 to 1000) versus year (1500 to 2000); doubling time T2 (Moore) ~ 1.5 years]

Sounds beautiful! … BUT:

Terascale (Petascale?) computing and/or better algorithms are required
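
To give a feeling for the doubling time T2 ~ 1.5 years quoted on this slide, the following sketch evaluates the implied growth factor. It assumes pure exponential growth with a constant doubling time. The function names are illustrative and do not come from the presentation.

```python
import math


def moore_growth(years, doubling_time=1.5):
    """Growth factor after `years`, assuming one doubling every `doubling_time` years."""
    return 2.0 ** (years / doubling_time)


def years_to_gain(factor, doubling_time=1.5):
    """Years needed to gain a given factor under the same assumption."""
    return doubling_time * math.log2(factor)


for span in (1.5, 10, 20):
    print(f"{span:>4} yr -> growth factor {moore_growth(span):,.0f}")
print(f"a factor of 1000 takes about {years_to_gain(1000):.1f} yr")
```

Under this assumption hardware alone gains about three orders of magnitude in 15 years, which is why the slide asks for better algorithms as well as more computing power.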

Page 10:

In modern data sets: DD >> 10, DS >> 3

Data complexity -> multidimensionality -> discoveries

But the bad news is …

The computational cost of clustering analysis:

K-means: K x N x I x D

Expectation Maximisation: K x N x I x D^2

Monte Carlo Cross-Validation: M x Kmax^2 x N x I x D^2

N = no. of data vectors, D = no. of data dimensions
K = no. of clusters chosen, Kmax = max no. of clusters tried
I = no. of iterations, M = no. of Monte Carlo trials/partitions

Some dimensionality reduction methods do exist (e.g., PCA, class prototypes, hierarchical methods, etc.), but more work is needed

Digital sky surveys call for huge increases in computing power
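
The scalings on this slide can be turned into rough operation counts. The sketch below reads them as K x N x I x D for K-means, K x N x I x D^2 for Expectation Maximisation and M x Kmax^2 x N x I x D^2 for Monte Carlo Cross-Validation. N and D are taken from the DPOSS example later in the talk (about 5 x 10^6 objects with 50 features each). The values of K, Kmax, I and M are hypothetical and chosen only for illustration.

```python
def kmeans_cost(K, N, I, D):
    """Operation count scaling for K-means: K * N * I * D."""
    return K * N * I * D


def em_cost(K, N, I, D):
    """Operation count scaling for Expectation Maximisation: K * N * I * D^2."""
    return K * N * I * D ** 2


def mccv_cost(M, Kmax, N, I, D):
    """Operation count scaling for Monte Carlo Cross-Validation: M * Kmax^2 * N * I * D^2."""
    return M * Kmax ** 2 * N * I * D ** 2


N, D = 5_000_000, 50            # DPOSS-like catalogue size quoted in the talk
K, Kmax, I, M = 2, 10, 100, 20  # hypothetical settings, not from the slides

print(f"K-means: {kmeans_cost(K, N, I, D):.1e}")
print(f"EM:      {em_cost(K, N, I, D):.1e}")
print(f"MCCV:    {mccv_cost(M, Kmax, N, I, D):.1e}")
```

The quadratic dependence on D is what makes dimensionality reduction attractive: halving the number of features cuts the EM and cross-validation counts by a factor of four.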

Page 12:

“Standard Activities” (all meeting reports and proceedings on the web)

• First and Second MC meetings, Brussels, 11/23/2001 & 2/14-15/2002

• Third MC meeting, Edinburgh, 07/21/2002 (at GGF-5, Global Grid Forum 5)

• Fourth MC meeting & workshop on Multispectral data analysis, and image metadata, Strasbourg, 11/28-29/2002

• Fifth MC meeting & workshop on High/low resolution signal processing, Granada, 02/22-23/2003

• Planned: Sixth MC meeting & workshop on Poisson noise models, Nice, Oct. 2003.

• Planned: Seventh MC meeting & workshop on Data mining & Image analysis in a distributed environment, Capri, Mar. 2004.

Page 13:

Granada, February 2002

Guess who was taking the picture…

Page 14:

Major Orientation of iAstro in early 2003: FP6

• Expressions of Interest filed in summer 2002.

• Participation in Commission Information Days.

• Involvement in several NoEs (sensor fusion, information retrieval, e-education and training, the European virtual observatory, and digital signal processing and data mining in medicine).

• Participation in evaluation panels.

Page 15:

COST 283 proposal for the Marie Curie RTN network

“GridFocus: Data and Information Fusion and Mining in the Context of the DataGrid”

• Submitted early April 2003.

• Participants: iAstro partners in BG, CH, D, E, F, GR, H, I, IRL and UK. Additional partner cluster at the University of Paris Sud.

Multiband and multiple layer image and signal processing as a basic paradigm for the data Grid.

Data mining of visual and other streams, including high performance forensic image data mining.

Empirical and virtual data interfaces.

Page 16:

GridFocus concept based on data dynamics and information thermodynamics

Page 17:

SOMETHING ON SCIENCE….

Page 18:

[Flowchart of the processing pipeline]

• Open / import (header compliant or non compliant)
• Header processing / preprocessing
• Parameter and training options: supervised or unsupervised
• Supervised: parameter options; labeled data; training set preparation; feature selection via unsupervised clustering; MLP, RBF, etc.
• Unsupervised: parameter options; labeled or unlabeled data; label preparation; feature selection via unsupervised clustering; GTM, SOM, fuzzy set, etc.
• INTERPRETATION

Code in C++, parallelized on Beowulf

Used (so far) for:
• Cosmology
• Particle physics (ARGO)
• Gravitational waves (VIRGO)

Page 19:

A standard clustering example: unsupervised S/G classification

Input data: DPOSS catalogue (ca. 5 x 10^6 objects, 50 features each)

SOM (output is a U-Matrix) ~ GTM (output is a PDF)

1. Input data (Tables or strings)

2. Feature selection (backward elimination strategy)

3. Compression of input space and re-design of network

4. Classification

5. Labeling (e.g. 500 well classified objects)

6. …freeze & run on real data
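
Steps 4 to 6 above (classification, labeling with a small set of well classified objects, then running the frozen map) can be sketched with a toy self-organizing map. This is only an illustration under stated assumptions, not the authors' parallel C++ code. It uses synthetic two-feature data instead of DPOSS, a one-dimensional map, linear initialisation, and labels each node from its nearest labelled example. All function names and parameter values are hypothetical.

```python
import math
import random


def make_demo_data(n_per_class=100, seed=1):
    """Synthetic 2-feature 'catalogue': two well-separated blobs (not DPOSS data)."""
    rng = random.Random(seed)
    data, labels = [], []
    for label, centre in (("star", 0.0), ("galaxy", 5.0)):
        for _ in range(n_per_class):
            data.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            labels.append(label)
    return data, labels


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda j: sq_dist(nodes[j], x))


def train_som(data, n_nodes=6, epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D SOM with a Gaussian neighbourhood; returns the node weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    # Linear initialisation: nodes evenly spaced between the data extremes.
    nodes = [[lo[d] + (hi[d] - lo[d]) * j / (n_nodes - 1) for d in range(dim)]
             for j in range(n_nodes)]
    total, step = epochs * len(data), 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighbourhood shrinks over time
            bmu = best_matching_unit(nodes, data[i])
            for j, w in enumerate(nodes):
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (data[i][d] - w[d])
            step += 1
    return nodes


def label_nodes(nodes, labelled_vectors, labelled_names):
    """Give each node the label of its nearest labelled example."""
    node_labels = []
    for w in nodes:
        k = min(range(len(labelled_vectors)),
                key=lambda i: sq_dist(w, labelled_vectors[i]))
        node_labels.append(labelled_names[k])
    return node_labels


def classify(nodes, node_labels, x):
    """Classify x by the label of its best matching unit on the frozen map."""
    return node_labels[best_matching_unit(nodes, x)]


data, labels = make_demo_data()
nodes = train_som(data)                              # step 4: unsupervised training
picked = list(range(0, len(data), 10))               # step 5: small labelled subset
node_labels = label_nodes(nodes, [data[i] for i in picked],
                          [labels[i] for i in picked])
predicted = [classify(nodes, node_labels, x) for x in data]  # step 6: frozen map
accuracy = sum(p == t for p, t in zip(predicted, labels)) / len(labels)
print(f"agreement with true labels: {accuracy:.2f}")
```

The labels are used only after training, which is the point of the unsupervised approach: the map is organised by the data alone and a few hundred trusted objects suffice to name its regions.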

Page 20:

Star/Galaxy classification

Automatic selection of significant features

Unsupervised SOM (DPOSS data)

Page 21:

Labeling

Localization of a set of 500 faint stars

Page 22:

G.T.M. unsupervised clustering; S/G

[Figure panels: stars p.d.f.; galaxies p.d.f.; cumulative p.d.f.]

Page 23:

G.T.M. unsupervised clustering; S/G – CDF Field

5 x 10^5 objects

[Figure panels: stars p.d.f.; galaxies p.d.f.; cumulative p.d.f.]

Page 24:

Photometric redshifts: a mixed case

• Input data set: SDSS-EDR photometric data (galaxies)

• Training/validation/test set: SDSS-EDR spectroscopic subsample

[Flowchart: SDSS-EDR DB; SOM unsup., set construction; SOM supervised, feature selection; MLP supervised, experiments; Best MLP model; SOM unsup., completeness; Reliability Map]

Page 25:

Step 3: experiments to find the optimal architecture, varying the number of inputs, hidden units, patterns in the training set, training epochs, Bayesian cycles and inner loops, etc.

Convergence computed on validation set

Error derived from test set

Robust error: 0.02176
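
The division of roles between the three data sets (fit on the training set, monitor convergence on the validation set, quote the error from the test set) can be sketched as follows. This is a minimal illustration under stated assumptions. It swaps the MLP for a one-variable linear model, uses synthetic data rather than SDSS-EDR, and reports a plain RMS error, since the slide does not define how the robust error was computed. All names and parameter values are hypothetical.

```python
import random


def make_toy_data(n=300, seed=0):
    """Toy (x, y) pairs with y = 2x + 0.1 plus Gaussian noise (synthetic, not SDSS)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.random()
        pairs.append((x, 2.0 * x + 0.1 + rng.gauss(0.0, 0.02)))
    return pairs


def split(pairs, f_train=0.6, f_val=0.2, seed=0):
    """Shuffle and split into training, validation and test sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    a = int(f_train * len(shuffled))
    b = a + int(f_val * len(shuffled))
    return shuffled[:a], shuffled[a:b], shuffled[b:]


def rms_error(w, b, pairs):
    return (sum((w * x + b - y) ** 2 for x, y in pairs) / len(pairs)) ** 0.5


def fit(train, val, lr=0.5, max_epochs=2000, patience=10):
    """Gradient descent on the training set; stop when validation error stalls."""
    w, b = 0.0, 0.0
    best_err, best_w, best_b = rms_error(w, b, val), w, b
    stale = 0
    for _ in range(max_epochs):
        # Full-batch gradient of the mean squared error on the training set.
        gw = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        gb = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        w, b = w - lr * gw, b - lr * gb
        err = rms_error(w, b, val)      # convergence is judged on the validation set
        if err < best_err:
            best_err, best_w, best_b, stale = err, w, b, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_w, best_b


train, val, test = split(make_toy_data())
w, b = fit(train, val)
test_error = rms_error(w, b, test)      # the quoted error comes from the test set only
print(f"w = {w:.3f}, b = {b:.3f}, test RMS error = {test_error:.4f}")
```

Keeping the test set out of both fitting and stopping decisions is what makes the final error an honest estimate of performance on unseen objects.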

Page 26:

iAstro strategy:

• Advance the state of the art through our workshops and visits. Many of these exchanges presented results in a Special Issue of Neural Networks (Eds. Tagliaferri and Longo, vol. 16, 3-4, 2003). Status: ongoing.

• Define our role vis-à-vis large Framework Programme projects on the virtual observatory, grid, computer vision, etc. through an iAstro White Paper. Status: done in early 2003.

• Spin off specific targeted actions where greater resources are needed. Status: GridFocus Marie Curie RTN network proposal written and submitted in early 2003; local initiatives.

• Next step: spin-out and commercial exploitation of our work through a STREP or IP proposal?

Page 27:

Where & how to know more about iAstro:

iAstro web pages: http://www.iastro.org

To join the iAstro Mailing List, send a message to: [email protected]

Thanks to: E.U. & to… Prof. Fedi