ESM post processing workflow
ESM post processing (PP) tables on http://esm.zmaw.de
• These tables are to hold all information necessary for post processing (PP):
  – What (which code, …) will be filled into the CERA database (and archived (= DISK?))?
  – Which PP of the output is necessary for which code?
  – What are the corresponding metadata (CF names available?, …)?
  – See also: cdo -t <partab> <file> ?
• => Unify and complete the PP tables for the different models (see the following slides)
• => Create or adapt the corresponding CERA code tables (see below: new table 'millenium' or old 'echam5', … ???)
ESM post processing tables on (http://esm.zmaw.de)
• Fields necessary for PP and/or CERA:
  – Code: has to be unique (within a model)
  – Short name: name in the model; used for temporary post-processing files
  – Long name: same as the CF standard name, if one exists (goal: find CF names for all variables!?)
  – Unit, dimension
  – Raw = model output; PP = post-processed; CERA = in the DB!! (difference between PP and CERA??)
  – ATTRB, DISC??
  – + midrange/long-term ARCHIVE??
Post processing steps
1. Write raw output of the component models to the work space (WS) / disk array (DA) (by expid.run)
2. Post-process on the workspace WS (by expid.post)
   1. Regrid, interpolate to pressure levels, … (afterburner)
   2. Convert output to GRIB format, if necessary (CDOs)
   3. tar and szip files (prepare for archiving)
   4. Further post-processing, e.g. monthly means, … (CDOs)
   5. Split raw files into (time series of) codes (for dbfill)
3. Archive output (by expid.arch)
   1. Keep midrange-storage data on disk (DA)
   2. Move long-term-storage data to the archive (AR)
4. Fill post-processed output into CERA (DB) (by expid.dbfill)
5. Quality assurance, controlling, clean-up etc.
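The expid.post part of the steps above (2.1–2.5) can be sketched in shell. The afterburner call (`after`), its namelist, all file names and the `szip` invocation are assumptions for illustration, not the operational IMDI scripts:

```shell
#!/bin/bash
# Sketch of expid.post, steps 2.1-2.5 above (names are placeholders).

postprocess_month () {
  expid=$1; ym=$2                      # e.g. expid=mil0001, ym=200001

  # 2.1 afterburner: spectral -> Gaussian grid, pressure levels, ...
  after "${expid}_${ym}.raw" "ATM_${expid}_${ym}.grb" < after.nml

  # 2.2 convert to GRIB, if the raw output is not GRIB already (CDOs)
  cdo -f grb copy "BOT_${expid}_${ym}.nc" "BOT_${expid}_${ym}.grb"

  # 2.3 tar the files (prepare for archiving), then szip the tarball
  tar cf "${expid}_${ym}.tar" "ATM_${expid}_${ym}.grb" "BOT_${expid}_${ym}.grb"
  szip "${expid}_${ym}.tar"

  # 2.4 further post-processing, e.g. monthly means (CDOs)
  cdo monmean "ATM_${expid}_${ym}.grb" "ATM_${expid}_${ym}.mm.grb"

  # 2.5 split multicode files into one time series per code (for dbfill)
  cdo splitcode "ATM_${expid}_${ym}.mm.grb" "${expid}_${ym}_code"
}
```

expid.arch and expid.dbfill would then pick up the szipped tar file and the per-code files, respectively.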
General scheme

[Diagram: the experiment (model run, expid.run) writes multicode raw files to the workspace WS / disk array DA (mr_out/); expid.post post-processes them, sends post-processed data to the archive AR (long term), and prepares post-processed data for the DB; expid.dbfill transfers the DB files and fills them into the CERA-DB.]
Questions to the modellers
• Fill out the PP and code tables!
• Which output has to be stored where?
  – Archive raw (and post-processed?) files as tar/szip/GRIB files on mid-range (DA) / long-term (/ut) storage? -> 'DISC'?
  – Store time series / monthly means of which codes in the CERA database? -> see tables
  => Which temporary files can be (re)moved, and when?
• Does this change for different experiments?
• Further info the PP (esp. the DB fill) has to 'know'?
Action items for IMDI (SRE)
• Create SRE scripts for each component model (called by expid.post):
  – expid.echam.post (more or less ready?)
  – expid.jsbach.post (open; as echam?)
  – expid.mpiom.post (open)
  – expid.hamocc.post (open; as mpiom?)
• Trigger automatic database filling
• Quality assurance, monitoring, controlling, clean-up etc.
ECHAM

[Diagram: the ECHAM5 run (expid.run) writes raw files to DA; expid.echam.post runs the afterburner to produce monthly multicode GRIB files (ATM.expid.grb, BOT.expid.grb), which are archived to AR; the files are split into codes (code1: 1-code time series, code2: 1-code time series, …) as post-processed data for the DB; expid.dbfill tars the files, transfers them, and fills them into the DB.]
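The "split in codes" box above can be sketched with `cdo splitcode` plus plain concatenation; GRIB records are self-describing, so monthly one-code files can simply be `cat`-ed into a time series per code. The "mil0001"-style names and the file layout are assumptions:

```shell
# Sketch: split monthly multicode GRIB files into one file per code,
# then concatenate the months into one time series per code for dbfill.

split_into_code_series () {
  expid=$1; shift                        # remaining args: monthly GRIB files
  for f in "$@"; do
    cdo splitcode "$f" "${f%.grb}_code"  # -> <month>_code<NNN>.grb per code
  done
  # concatenate the monthly one-code files into run-period time series
  for code in $(ls *_code*.grb | sed -n 's/.*_code\([0-9][0-9]*\)\.grb/\1/p' | sort -u); do
    cat *_code"${code}".grb > "${expid}_code${code}_ts.grb"
  done
}
```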
JSBACH (.post as for ECHAM ?)
MPI-OM

[Diagram: the MPI-OM run (expid.run) writes monthly multicode raw files (EXTRA format) to DA; expid.mpiom.post converts them to GRIB (conv2grib), concatenates the monthly multicode GRIB files into run-period multicode GRIB files, szips them (grib-szip) for the archive AR, and splits them into codes (code1: 1-code time series, code2: 1-code time series, …) as post-processed data for the DB; expid.dbfill transfers the DB files and fills them into the DB.]
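The conv2grib -> concatenate -> szip chain above, as a shell sketch; CDO can read EXTRA input and write GRIB with `-f grb copy`, while the `szip` invocation and the file names are assumptions:

```shell
# Sketch of the MPI-OM chain above: convert monthly multicode EXTRA files
# to GRIB, concatenate them into one run-period multicode file, szip it.

monthly_to_runperiod () {
  expid=$1; shift                            # remaining args: monthly .ext files
  : > "${expid}_runperiod.grb"
  for f in "$@"; do
    cdo -f grb copy "$f" "${f%.ext}.grb"              # conv2grib
    cat "${f%.ext}.grb" >> "${expid}_runperiod.grb"   # concatenate (GRIB)
  done
  szip "${expid}_runperiod.grb"              # -> run-period grib-szip file
}
```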
HAMOCC (.post as for MPI-OM ??; monthly ? raw files in netCDF)
Monitoring and error handling
• Check (automatically) file sizes etc. after each PP step (tar files as well)
• When the output of step m has been checked => write the status ('ok' or 'error') to the corresponding log file
• If an error occurs =>
  – Make sure errors are detected in time and communicated to the responsible persons!
  – What are the necessary actions (stop step m-1?, … ??)
  – Ensure a 'restart' of the workflow once the status is set back to 'ok'
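A minimal sketch of the per-step check described above: after each PP step, verify that the produced file is non-empty and record 'ok' or 'error' in a status log, so the next step (and the operator) can react. The log format is an assumption:

```shell
# check_step <step-name> <produced-file> <status-log>
# Appends "<step> ok" or "<step> error"; returns non-zero on error so the
# caller can stop the workflow and notify the responsible person.

check_step () {
  step=$1; file=$2; log=$3
  if [ -s "$file" ]; then              # file exists and has size > 0
    echo "$step ok"    >> "$log"
  else
    echo "$step error" >> "$log"
    return 1
  fi
}
```

A driver would call e.g. `check_step post ATM.expid.grb expid.status || exit 1`; after the problem is fixed and the status is set back to 'ok', the workflow can restart from that step.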
Synchronous workflow

[Diagram: expid.run, expid.post and expid.dbfill run in parallel along the model-time axis (months); each step reports 'ok' per interval until an ERROR occurs in one of them — then ?? (how do the other steps react?)]
ESM post processing workflow
(Visions of the future ??)
Post processing steps (vision 1)
1. Write model output (time series of codes, means, …) directly from the model run (expid.run) into the database (DB)
2. Optional: archive output on disk/tape (AR)
General scheme (vision 1)

[Diagram: the experiment (model run, expid.run) writes its output via write_sql directly into the DB, in the form in which it should be stored there (e.g. time series of single codes, monthly means); archiving to AR is optional.]
Vision 1 : Counter arguments
• The actual post-processing (esp. the afterburner: convert spectral to regular grid, pressure levels, 'merging' of codes etc.) has to be implemented in the model itself
• What happens if the database filling 'hangs'?
• A database (Oracle) interface must exist on the compute server
• …
Post processing steps (vision 2)
1. Write model raw output (multicode and multi-level, model-specifically formatted and gridded, …) to the workspace (WS) (expid.run)
2. Post-process the data on the DA and prepare it for archiving and database filling (by expid.post)
3. Write these post-processed data from WS / DA into the database (DB) (in expid.post)
4. Optional: archive output on disk/tape (AR) (in expid.post)
General scheme (vision 2)

[Diagram: the experiment (model run, expid.run) writes multicode, model-specific raw files to the workspace WS; expid.post post-processes them, sends post-processed data to the archive AR, prepares post-processed data for the DB and fills it into the DB directly.]
Vision 2 : Preconditions
• The workspace DA must be mounted on a system accessible by the database,
• or, in other words: files should be written directly from the WS into the DB
• Performance?: write and read on the same disk (but isn't that the same problem with file transfer?)
• … further counter arguments??