Data Management Challenges in Gaia
Jose Hernandez, Alexander Hutton
Gaia Science Operations Centre, ESA/ESAC/SRE-OOO
ADASS XXIV, 5-9 October 2014, Calgary, Canada

Page 2

Outline

• Gaia observing strategy
• Data flow and pipelines
• Data challenges
• Data tracking
• Tools
• Examples

Page 3

Gaia Observing strategy

• Survey mission at L2 (the second Sun-Earth Lagrange point)
• Scan the sky along great circles
• Accumulate the data on board
• Download to Earth every night
• Full sky observed every 6 months
• Repeat for at least 5 years, giving 10 full-sky maps

Page 4

Gaia Observing strategy

Page 5

Gaia: Some numbers after 5 years

• 100 TB of raw data
• We expect to observe 10⁹ sources (this could end up being 2×10⁹)
• Spectra for 2×10⁸ sources
• 80 observations per source on average:
  • 10¹¹ astrometric/photometric observations
  • 2×10¹⁰ spectra
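As a rough order-of-magnitude check, the two totals follow from the per-source average using only the figures on this slide. The second line assumes that the same average of about 80 observations per source also applies to the spectroscopic sample; the slide does not state this explicitly.

$$N_{\mathrm{astro/photo}} \approx 80 \times 10^{9} = 8\times10^{10} \approx 10^{11}$$

$$N_{\mathrm{spectra}} \approx 80 \times 2\times10^{8} = 1.6\times10^{10} \approx 2\times10^{10}$$

If the source count ends up at 2×10⁹, the first total roughly doubles to about 1.6×10¹¹.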

Page 6

Operations

[Figure: operations overview with the Launcher, the Satellite, the New Norcia and Cebreros ground stations, the Mission Operation Centre (MOC) at ESOC, the Science Operation Centre (SOC) at ESAC, and the Data Processing & Analysis Consortium (DPAC)]

Page 7

Data flow

[Figure: data flow between the Launcher, the Satellite, the New Norcia, Cebreros and Malargüe ground stations, the Mission Operation Centre (MOC) at ESOC, the Science Operation Centre (SOC) at ESAC, and the Data Processing & Analysis Consortium (DPAC)]

Page 8

Data flow

Figure courtesy A. Brown, DPAC

Page 9

Data Processing Cycles

[Figure: successive Main Database versions (MDB-00, MDB-01, MDB-02) and the DPCs; data from the MOC (link labelled ≤ 8.5 Mbit/s) feed the daily pipelines at the SOC]
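To give a feel for the daily volumes, here is the upper bound on what the link labelled ≤ 8.5 Mbit/s can carry in one hour, using decimal units and 8 bits per byte. The actual number of contact hours per day is not given on the slide, so this is a per-hour figure only, not a daily total.

$$V_{1\,\mathrm{h}} \le 8.5\ \mathrm{Mbit\,s^{-1}} \times 3600\ \mathrm{s} = 30\,600\ \mathrm{Mbit} = \frac{30\,600}{8}\ \mathrm{MB} = 3\,825\ \mathrm{MB} \approx 3.8\ \mathrm{GB}$$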

Page 10

Some Challenges

• Sheer number of observations
• Ensuring no data loss
• Managing the daily data flow
• Data tracking
• DPCs (Data Processing Centres) are autonomous and geographically distributed

Page 11

Tools: Data Modeling

• Single data model/ICD (Interface Control Document) shared with the DPCs
• On-line MDB Dictionary Tool:
  • Keeps track of versions, changes, …
  • Immediate visibility
  • Automatic generation of data model (DM) classes, DB schema, data consumers, …
• DM evolution controlled by the CCB (Change Control Board)
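To illustrate what "automatic generation of DM classes and DB schema" from a single dictionary can look like, here is a minimal Python sketch. The dictionary format, the hypothetical `AstroObservation` type and its fields, and the type mappings are all assumptions made for illustration; the slide does not describe the MDB Dictionary Tool's actual formats or output.

```python
from dataclasses import make_dataclass

# Hypothetical dictionary entry: one data type, its fields and their types.
DICTIONARY = {
    "AstroObservation": {
        "solutionId": "long",
        "obmtNs": "long",    # hypothetical: on-board time in integer ns
        "sourceId": "long",
        "flux": "double",
    }
}

# Hypothetical mapping from dictionary types to SQL and Python types.
SQL_TYPES = {"long": "BIGINT", "double": "DOUBLE PRECISION", "string": "TEXT"}
PY_TYPES = {"long": int, "double": float, "string": str}


def generate_ddl(type_name, fields):
    """Generate a CREATE TABLE statement from one dictionary entry."""
    columns = ",\n".join(f"    {name} {SQL_TYPES[t]}" for name, t in fields.items())
    return f"CREATE TABLE {type_name} (\n{columns}\n);"


def generate_class(type_name, fields):
    """Generate a Python data class with the same fields as the table."""
    return make_dataclass(type_name, [(name, PY_TYPES[t]) for name, t in fields.items()])


if __name__ == "__main__":
    for name, fields in DICTIONARY.items():
        print(generate_ddl(name, fields))
        cls = generate_class(name, fields)
        print(cls(solutionId=1, obmtNs=0, sourceId=42, flux=1.5))
```

Generating both artefacts from one source means a change approved once (here, by the CCB) reaches the classes and the schema together, so they cannot drift apart.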

Page 12

Data Management and Tracking

• All data are tagged with a barcode
• Called the “Solution Identifier” (solutionId)
• It is just a long (64-bit) number
• Each solutionId has associated metadata

Page 13

Data Tracking: solutionId

• Used to identify data
• Who generated the data, when and where
• Which software version, environment and run number, and at what time
• We also use it to manage the daily data flow
• Related data get the same solutionId; this is a form of “data binning”
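A minimal sketch of the idea: a registry hands out 64-bit identifiers, stores the who/where/what/when metadata for each, and records are grouped ("binned") by the solutionId they carry. The class and field names are hypothetical, and the sequential counter is a simplification; the slides do not say how Gaia actually allocates solutionIds.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone

MAX_LONG = 2**63 - 1  # largest value a signed 64-bit long can hold


@dataclass(frozen=True)
class SolutionMetadata:
    """Hypothetical metadata attached to one solutionId."""
    producer: str          # who generated the data
    site: str              # where
    software_version: str  # what software version
    environment: str
    run_number: int
    created: datetime      # when


class SolutionRegistry:
    """Hypothetical registry issuing 64-bit solutionIds and storing their metadata."""

    def __init__(self):
        self._counter = itertools.count(1)  # simplification: sequential ids
        self._metadata = {}

    def register(self, producer, site, software_version, environment, run_number):
        sid = next(self._counter)
        if sid > MAX_LONG:
            raise OverflowError("solutionId no longer fits in a signed 64-bit long")
        self._metadata[sid] = SolutionMetadata(
            producer, site, software_version, environment, run_number,
            created=datetime.now(timezone.utc),
        )
        return sid

    def describe(self, sid):
        """Return the provenance metadata recorded for a solutionId."""
        return self._metadata[sid]


def bin_by_solution(records):
    """Group records that share a solutionId."""
    bins = defaultdict(list)
    for record in records:
        bins[record["solutionId"]].append(record)
    return dict(bins)
```

Because the identifier is a plain number, it can be stored on every record at negligible cost, while the richer metadata lives once per solutionId.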

Page 14

Data Tracking: solutionId

• Track data provenance:
  • Verify that the correct calibrations get used
  • Find what was affected by incorrect data
  • Remove incorrect data from the pipelines
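The three provenance uses can be sketched with a small dependency table that records, for each solutionId, the solutionIds of its inputs. The table layout and the example ids are hypothetical; the slides do not describe how Gaia stores these links.

```python
from collections import deque

# Hypothetical provenance table: solutionId -> solutionIds of its inputs.
INPUTS = {
    100: [],          # e.g. a calibration solution
    101: [],          # e.g. raw data ingested by the daily pipeline
    200: [100, 101],  # processed with calibration 100
    300: [200],       # derived from 200
    400: [101],       # does not use calibration 100
}


def downstream(bad_id, inputs):
    """Return every solutionId depending, directly or transitively, on bad_id."""
    consumers = {}
    for sid, ins in inputs.items():
        for i in ins:
            consumers.setdefault(i, []).append(sid)
    affected, queue = set(), deque([bad_id])
    while queue:  # breadth-first walk from the bad solution to its consumers
        for child in consumers.get(queue.popleft(), []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


def used_calibration(sid, calib_id, inputs):
    """Return True if calib_id is among the transitive inputs of sid."""
    stack, seen = list(inputs.get(sid, [])), set()
    while stack:
        cur = stack.pop()
        if cur == calib_id:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(inputs.get(cur, []))
    return False


def purge(records, bad_ids):
    """Drop records tagged with any of the given solutionIds."""
    return [r for r in records if r["solutionId"] not in bad_ids]
```

For example, if calibration 100 turns out to be wrong, `{100} | downstream(100, INPUTS)` gives `{100, 200, 300}`, and `purge` removes the records carrying those ids while leaving solution 400 untouched.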

Page 15

Data Integrity and Completeness

• Current numbers (at the time of this talk):
  • 10.4×10⁹ astrometric/photometric observations
  • 1.3×10⁹ spectra
  • 6.3 TB of raw science data received
  • 144 GB of housekeeping data
  • 21 TB generated in the processing
• Typically the daily pipelines write thousands of objects per second

Page 16

Data Integrity and Completeness

• Challenges:
  • Ensuring there are no data leakages
  • Data consistency and completeness, within the pipelines and with respect to the MDB

[Figure: the MOC, the SOC MDB and the DPCs (DPCB, DPCC, DPCG, DPCI, DPCT)]

Page 17

Time Data Binning

• All Gaia data can be related to on-board time, for example:
  • At time x, a source image crosses a CCD
  • At time y, charge injections occur
  • Spacecraft attitude
• Use OBMT (On-Board Mission Timeline) to collapse records with the same time together and count the number of objects per bin
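A minimal sketch of time binning: each record's on-board time is mapped to a fixed-width bin and the records per bin are counted. The 1 s bin width and the representation of OBMT as integer nanoseconds are assumptions for illustration only; the slide gives neither.

```python
from collections import Counter

BIN_WIDTH_NS = 1_000_000_000  # hypothetical bin width: 1 s, with times in integer ns


def bin_index(obmt_ns, width=BIN_WIDTH_NS):
    """Map an on-board time to the index of the fixed-width bin containing it."""
    return obmt_ns // width


def time_bins(obmt_times, width=BIN_WIDTH_NS):
    """Collapse records falling in the same time bin and count them per bin."""
    return Counter(bin_index(t, width) for t in obmt_times)


def time_bins_by_kind(records, width=BIN_WIDTH_NS):
    """Same, but keep record kinds apart, e.g. CCD transits vs charge injections."""
    return Counter((kind, bin_index(t, width)) for kind, t in records)
```

Since every data type shares the same time axis, counts for CCD crossings, charge injections or attitude records can be placed side by side bin by bin, which is what makes the comparisons on the following slides possible.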

Page 18

Time Data Binning

Page 19

Time Data Binning

[Figure: time-binned counts showing Galactic Centre crossings and Galaxy tail crossings, for the preceding (FOV-P) and following (FOV-F) fields of view]

Page 20

Time Data Binning

• Data binning is done on the fly as the pipeline stores the data, with no overhead
• We can then compare the timeline data at different points of the data flow
• We can also check data consistency
• All the checks can be automated, with alarms raised if problems are found
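One way to read "binning on the fly" is that the store call updates a per-bin counter in the same step as the write, so a timeline is always available without a separate pass over the data. The sketch below does that and then compares two timelines, logging a warning for every bin whose counts disagree. The class, the function names and the 1 s bin width are hypothetical; the list append stands in for the real database write.

```python
import logging
from collections import Counter

BIN_WIDTH_NS = 1_000_000_000  # hypothetical: 1 s bins, times in integer ns


class BinningStore:
    """Hypothetical store wrapper that updates time-bin counts as records are written."""

    def __init__(self, name, width=BIN_WIDTH_NS):
        self.name = name
        self.width = width
        self.records = []
        self.bins = Counter()

    def store(self, record):
        self.records.append(record)                      # stands in for the DB write
        self.bins[record["obmt_ns"] // self.width] += 1  # binning in the same call


def compare_timelines(a, b):
    """Return {bin: (count_a, count_b)} for every bin where the counts differ."""
    return {
        k: (a.bins.get(k, 0), b.bins.get(k, 0))
        for k in sorted(set(a.bins) | set(b.bins))
        if a.bins.get(k, 0) != b.bins.get(k, 0)
    }


def check_and_alarm(a, b, logger=logging.getLogger("timeline-check")):
    """Compare two timelines and log one warning (the 'alarm') per mismatched bin."""
    mismatches = compare_timelines(a, b)
    for k, (ca, cb) in mismatches.items():
        logger.warning("bin %d: %s has %d objects, %s has %d", k, a.name, ca, b.name, cb)
    return mismatches
```

For instance, a `BinningStore("SOC MDB")` and a `BinningStore("DPC")` fed the same daily data should produce an empty result; any missing or duplicated records show up as the specific time bins where the counts diverge.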

Page 21

Examples: Omega Centauri

Page 22

Time Data Binning

[Figure: time-binned counts showing a Galactic plane crossing (FOV-P) and the Omega Centauri crossing]

Page 23

Omega Centauri observation

[Figure annotations: 50 s; 100,000 observations]
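Reading the two annotations as about 100,000 observations collected within roughly 50 s of the crossing (our interpretation of the figure), the implied average rate is:

$$\frac{100\,000\ \text{observations}}{50\ \mathrm{s}} = 2\,000\ \text{observations}\,\mathrm{s}^{-1}$$

This is the same order of magnitude as the thousands of objects per second that the daily pipelines typically write.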

Page 24

Omega Centauri observation

Page 25

Questions?

[Image: NGC 1818 in the LMC]