ESM post processing workflow
ESM post processing (PP) tables on http://esm.zmaw.de
• These tables are to hold all information necessary for post processing (PP):
  – What (which code, …) will be filled into the CERA database (and archived (= DISK?))?
  – Which PP of the output is necessary for which code?
  – What are the corresponding metadata (CF names available?, …)?
  – See also: cdo -t <partab> <file> ?
• => Unify and complete the PP tables for the different models (see the following slides)
• => Create or adapt the corresponding CERA code tables (see below: new table 'millenium' or old 'echam5', … ???)
ESM post processing tables on (http://esm.zmaw.de)
• Fields necessary for PP and/or CERA:
  – Code: has to be unique (within a model)
  – Short name: name in the model; used for temporary post-processing files
  – Long name: same as the CF standard name, if one exists (goal: find CF names for all variables!?)
  – Unit, dimension
  – Raw = model output; PP = post-processed; CERA = in the DB!! (difference between PP and CERA??)
  – ATTRB, DISC??
  – + midrange/long-term ARCHIVE??
Post processing steps
1. Write raw output of the component models to the work space (WS) / disk array (DA) (by expid.run)
2. Post-process on the workspace WS (by expid.post)
   1. Regrid, interpolate to pressure levels, … (afterburner)
   2. Convert output to GRIB format, if necessary (CDOs)
   3. tar and szip files (prepare for archiving)
   4. Further post-processing, e.g. monthly means, … (CDOs)
   5. Split raw files into (time series of) codes (for dbfill)
3. Archive output (by expid.arch)
   1. Keep midrange-storage data on disk (DA)
   2. Move long-term-storage data to the archive (AR)
4. Fill post-processed output into CERA (DB) (by expid.dbfill)
5. Quality assurance, controlling, clean-up etc.
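The expid.post part of the steps above (2.1–2.5) can be sketched in shell. The afterburner call (`after`), its namelist, all file names and the `szip` invocation are assumptions for illustration, not the operational IMDI scripts:

```shell
#!/bin/bash
# Sketch of expid.post, steps 2.1-2.5 above (names are placeholders).

postprocess_month () {
  expid=$1; ym=$2                      # e.g. expid=mil0001, ym=200001

  # 2.1 afterburner: spectral -> Gaussian grid, pressure levels, ...
  after "${expid}_${ym}.raw" "ATM_${expid}_${ym}.grb" < after.nml

  # 2.2 convert to GRIB, if the raw output is not GRIB already (CDOs)
  cdo -f grb copy "BOT_${expid}_${ym}.nc" "BOT_${expid}_${ym}.grb"

  # 2.3 tar the files (prepare for archiving), then szip the tarball
  tar cf "${expid}_${ym}.tar" "ATM_${expid}_${ym}.grb" "BOT_${expid}_${ym}.grb"
  szip "${expid}_${ym}.tar"

  # 2.4 further post-processing, e.g. monthly means (CDOs)
  cdo monmean "ATM_${expid}_${ym}.grb" "ATM_${expid}_${ym}.mm.grb"

  # 2.5 split multicode files into one time series per code (for dbfill)
  cdo splitcode "ATM_${expid}_${ym}.mm.grb" "${expid}_${ym}_code"
}
```

expid.arch and expid.dbfill would then pick up the szipped tar file and the per-code files, respectively.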
General scheme

[Diagram: the experiment (model run, expid.run) writes multicode raw files to the workspace WS / disk array DA (mr_out/); expid.post post-processes them, sends post-processed data to the archive AR (long term), and prepares post-processed data for the DB; expid.dbfill transfers the DB files and fills them into the CERA-DB.]
Questions to the modellers
• Fill out the PP and code tables!
• Which output has to be stored where?
  – Archive raw (and post-processed?) files as tar/szip/GRIB files on mid-range (DA) / long-term (/ut) storage? -> 'DISC'?
  – Store time series / monthly means of which codes in the CERA database? -> see tables
  => Which temporary files can be (re)moved, and when?
• Does this change for different experiments?
• Further info the PP (esp. the DB fill) has to 'know'?
Action items for IMDI (SRE)
• Create SRE scripts for each component model (called by expid.post):
  – expid.echam.post (more or less ready?)
  – expid.jsbach.post (open; as echam?)
  – expid.mpiom.post (open)
  – expid.hamocc.post (open; as mpiom?)
• Trigger automatic database filling
• Quality assurance, monitoring, controlling, clean-up etc.
ECHAM

[Diagram: the ECHAM5 run (expid.run) writes raw files to DA; expid.echam.post runs the afterburner to produce monthly multicode GRIB files (ATM.expid.grb, BOT.expid.grb), which are archived to AR; the files are split into codes (code1: 1-code time series, code2: 1-code time series, …) as post-processed data for the DB; expid.dbfill tars the files, transfers them, and fills them into the DB.]
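The "split in codes" box above can be sketched with `cdo splitcode` plus plain concatenation; GRIB records are self-describing, so monthly one-code files can simply be `cat`-ed into a time series per code. The "mil0001"-style names and the file layout are assumptions:

```shell
# Sketch: split monthly multicode GRIB files into one file per code,
# then concatenate the months into one time series per code for dbfill.

split_into_code_series () {
  expid=$1; shift                        # remaining args: monthly GRIB files
  for f in "$@"; do
    cdo splitcode "$f" "${f%.grb}_code"  # -> <month>_code<NNN>.grb per code
  done
  # concatenate the monthly one-code files into run-period time series
  for code in $(ls *_code*.grb | sed -n 's/.*_code\([0-9][0-9]*\)\.grb/\1/p' | sort -u); do
    cat *_code"${code}".grb > "${expid}_code${code}_ts.grb"
  done
}
```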
JSBACH (.post as for ECHAM ?)
MPI-OM

[Diagram: the MPI-OM run (expid.run) writes monthly multicode raw files (EXTRA format) to DA; expid.mpiom.post converts them to GRIB (conv2grib), concatenates the monthly multicode GRIB files into run-period multicode GRIB files, szips them (grib-szip) for the archive AR, and splits them into codes (code1: 1-code time series, code2: 1-code time series, …) as post-processed data for the DB; expid.dbfill transfers the DB files and fills them into the DB.]
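The conv2grib -> concatenate -> szip chain above, as a shell sketch; CDO can read EXTRA input and write GRIB with `-f grb copy`, while the `szip` invocation and the file names are assumptions:

```shell
# Sketch of the MPI-OM chain above: convert monthly multicode EXTRA files
# to GRIB, concatenate them into one run-period multicode file, szip it.

monthly_to_runperiod () {
  expid=$1; shift                            # remaining args: monthly .ext files
  : > "${expid}_runperiod.grb"
  for f in "$@"; do
    cdo -f grb copy "$f" "${f%.ext}.grb"              # conv2grib
    cat "${f%.ext}.grb" >> "${expid}_runperiod.grb"   # concatenate (GRIB)
  done
  szip "${expid}_runperiod.grb"              # -> run-period grib-szip file
}
```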
HAMOCC (.post as for MPI-OM ??; monthly ? raw files in netCDF)
Monitoring and error handling
• Check (automatically) file sizes etc. after each PP step (tar files as well)
• When the output of step m has been checked => write the status ('ok' or 'error') to the corresponding log file
• If an error occurs =>
  – Make sure errors are detected in time and communicated to the responsible persons!
  – What are the necessary actions (stop step m-1?, … ??)
  – Ensure a 'restart' of the workflow once the status is set back to 'ok'
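A minimal sketch of the per-step check described above: after each PP step, verify that the produced file is non-empty and record 'ok' or 'error' in a status log, so the next step (and the operator) can react. The log format is an assumption:

```shell
# check_step <step-name> <produced-file> <status-log>
# Appends "<step> ok" or "<step> error"; returns non-zero on error so the
# caller can stop the workflow and notify the responsible person.

check_step () {
  step=$1; file=$2; log=$3
  if [ -s "$file" ]; then              # file exists and has size > 0
    echo "$step ok"    >> "$log"
  else
    echo "$step error" >> "$log"
    return 1
  fi
}
```

A driver would call e.g. `check_step post ATM.expid.grb expid.status || exit 1`; after the problem is fixed and the status is set back to 'ok', the workflow can restart from that step.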
Synchronous workflow

[Diagram: expid.run, expid.post and expid.dbfill run in parallel along the model-time axis (months); each step reports 'ok' per interval until an ERROR occurs in one of them — then ?? (how do the other steps react?)]
ESM post processing workflow
(Visions of the future ??)
Post processing steps (vision 1)
1. Write model output (time series of codes, means, …) directly from the model run (expid.run) into the database (DB)
2. Optional: archive output on disk/tape (AR)
General scheme (vision 1)

[Diagram: the experiment (model run, expid.run) writes its output via write_sql directly into the DB, in the form in which it should be stored there (e.g. time series of single codes, monthly means); archiving to AR is optional.]
Vision 1 : Counter arguments
• The actual post-processing (esp. the afterburner: convert spectral to regular grid, pressure levels, 'merging' of codes etc.) has to be implemented in the model itself
• What happens if the database filling 'hangs'?
• A database (Oracle) interface must exist on the compute server
• …
Post processing steps (vision 2)
1. Write model raw output (multicode and multi-level, model-specifically formatted and gridded, …) to the workspace (WS) (expid.run)
2. Post-process the data on the DA and prepare it for archiving and database filling (by expid.post)
3. Write these post-processed data from WS / DA into the database (DB) (in expid.post)
4. Optional: archive output on disk/tape (AR) (in expid.post)
General scheme (vision 2)

[Diagram: the experiment (model run, expid.run) writes multicode, model-specific raw files to the workspace WS; expid.post post-processes them, sends post-processed data to the archive AR, prepares post-processed data for the DB and fills it into the DB directly.]
Vision 2 : Preconditions
• The workspace DA must be mounted on a system accessible by the database,
• or, in other words: files should be written directly from the WS into the DB
• Performance?: write and read on the same disk (but isn't that the same problem with file transfer?)
• … further counter arguments??