LLNL-PRES-679957 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344.

Lawrence Livermore National Security, LLC

Web Services Processing, Application Programming Interface

Charles Doutriaux, Sasha Ames, Tom Maxwell, Dan Duffy, Dean Williams

December 9th, 2015


Overview

As computer power goes up, so does data volume

Scientists generating and analyzing these data are many and dispersed

BUT the scientific need is greater than ever

ESGF solved the first part of the equation: universal, distributed access

Bringing all needed data to your computer or even to your facility is no longer feasible

Now we need to solve the analysis part.


The ESGF-CWT is putting together an infrastructure for WPS.

This talk is about the API part.

The API is twofold:

— Developers: common ground for creating new tools
— Users: a standard way of querying and using resources

Goal: make things as easy as possible for the user, i.e.

— What services are here?
— Can I get their doc?
— Let’s use it

As many decisions as possible are made for the user (but the user can still see and override them).
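The three user questions map naturally onto the standard WPS operations. A minimal sketch, assuming the key-value request style of OGC WPS 1.0.0 and the demo server address quoted elsewhere in this talk; the helper name `wps_request_url` is hypothetical and the server may no longer be reachable:

```python
from urllib.parse import urlencode

# Demo server address taken from this talk; treat it as an example only.
BASE_URL = "http://aims2.llnl.gov:8000/wps/"


def wps_request_url(request, identifier=None, base_url=BASE_URL):
    """Build a WPS key-value GET URL (hypothetical helper, not part of the API)."""
    params = {"service": "wps", "request": request}
    if request != "GetCapabilities":
        # DescribeProcess and Execute carry an explicit protocol version.
        params["version"] = "1.0.0"
    if identifier is not None:
        params["identifier"] = identifier
    return base_url + "?" + urlencode(params)


# "What services are here?"
list_url = wps_request_url("GetCapabilities")
# "Can I get their doc?"
doc_url = wps_request_url("DescribeProcess", identifier="averager")

print(list_url)
print(doc_url)
```

"Let’s use it" then corresponds to an Execute request, which additionally carries the inputs.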

ESGF-CWT Solution


Basic Architecture

Server Side

Services

Client Side

ESGF API


Documented at: https://acme-climate.atlassian.net/wiki/display/ESGF/API+Standards+and+Requirements

First pass, will likely be tweaked/enhanced as more developers and users get involved

Focusing on JSON input data. First problems we’re trying to solve:

— Model Average
— Model Ensemble
— Multi-model Ensemble

Caters to very basic needs so far; it needs to grow as more features are required. Hint: That’s YOU here.

API?





API (excerpts)

http://aims2.llnl.gov:8000/wps/?version=1.0.0&service=wps&request=Execute&identifier=averager&datainputs=[domain={'id':'glbl','longitude':{'start':%20-180.0,%20'end':%20180.0},'time':{'start':'1980','end':'1982'}};variable={'uri':'file://opt/nfs/cwt/uvcdat/latest/share/uvcdat/sample_data/tas_dnm-95a.xml','id':'tas','domain':'glbl'}]
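Assembling such a request by hand is error-prone, so a client would normally build it from structured inputs. The sketch below is one way to do that with the Python standard library. The helper name `build_execute_url` is hypothetical, and serializing the inputs with `json.dumps` (double quotes) is an assumption: the excerpt above uses single quotes, and the demo server's parser may expect that form.

```python
import json
from urllib.parse import urlencode, quote

# Demo server address from the excerpt above.
BASE_URL = "http://aims2.llnl.gov:8000/wps/"


def build_execute_url(identifier, domain, variable, base_url=BASE_URL):
    """Build an Execute URL shaped like the excerpt (hypothetical helper)."""
    # Assumption: inputs are sent as compact, double-quoted JSON.
    parts = [
        "domain=" + json.dumps(domain, separators=(",", ":")),
        "variable=" + json.dumps(variable, separators=(",", ":")),
    ]
    params = {
        "version": "1.0.0",
        "service": "wps",
        "request": "Execute",
        "identifier": identifier,
        "datainputs": "[" + ";".join(parts) + "]",
    }
    # quote_via=quote percent-encodes spaces as %20 rather than "+".
    return base_url + "?" + urlencode(params, quote_via=quote)


# Same domain and variable as in the excerpt.
domain = {
    "id": "glbl",
    "longitude": {"start": -180.0, "end": 180.0},
    "time": {"start": "1980", "end": "1982"},
}
variable = {
    "uri": "file://opt/nfs/cwt/uvcdat/latest/share/uvcdat/sample_data/tas_dnm-95a.xml",
    "id": "tas",
    "domain": "glbl",
}

url = build_execute_url("averager", domain, variable)
print(url)
```

The `variable` entry refers to the domain by its `id`, which is how the excerpt ties the two inputs together.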



http://aims2.llnl.gov:8000

VERY BASIC demo server

— Django-based
— Uses UV-CDAT for computation

Will probably grow into a real, full-blown, pretty server.

Code is at https://github.com/ESGF/wps_cwt. Please fork it and submit as many PRs as possible, and/or use the issue tracker to give us feedback.

Also take a look at what others presenting here have already done. Let’s try to leverage each other’s work.

Where do I start?



Example? (stick this in “process” directory of server)

import os

import cdutil

# esgfcwtProcess and WPSProcess are not imported on the slide; they are
# expected to come from the wps_cwt server code and, presumably, PyWPS.


class Process(esgfcwtProcess):
    def __init__(self):
        """Process initialization"""
        WPSProcess.__init__(
            self,
            # The identifier is this file's name without its extension.
            identifier=os.path.split(__file__)[-1].split('.')[0],
            title='averager',
            version=0.1,
            abstract='Average a variable over a (many) dimension',
            storeSupported='true',
            statusSupported='true')
        # Inputs: the domain to average over and the variable to average.
        self.domain = self.addComplexInput(
            identifier='domain',
            title='domain over which to average',
            formats=[{'mimeType': 'text/json', 'encoding': 'utf-8', 'schema': None}])
        self.dataIn = self.addComplexInput(
            identifier='variable',
            title='variable to average',
            formats=[{'mimeType': 'text/json'}],
            minOccurs=1,
            maxOccurs=1)
        self.download = self.addLiteralInput(
            identifier='download',
            type=bool,
            title='download output',
            default=False)
        # Output: the averaged variable, returned as JSON.
        self.average = self.addComplexOutput(
            identifier='average',
            title='averaged variable',
            formats=[{'mimeType': 'text/json'}])

    def execute(self):
        dataIn = self.loadData()[0]
        data, cdms2keyargs = self.loadVariable(dataIn)
        # Average over every dimension named in the domain, e.g. "(longitude)(time)".
        dims = "".join(["(%s)" % x for x in cdms2keyargs.keys()])
        data = cdutil.averager(data, axis=dims)
        data.id = self.getVariableName(dataIn)
        self.saveVariable(data, self.average, "json")
        return
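The `dims` line in `execute` is the only non-obvious step: it turns the dimension names into the parenthesized axis string passed to `cdutil.averager`. A small standalone illustration follows; the contents of `cdms2keyargs` here are an assumption modeled on the 'glbl' domain from the API excerpt, since the slide does not show what `loadVariable` returns.

```python
# Assumed shape of cdms2keyargs: one entry per dimension named in the domain.
cdms2keyargs = {
    "longitude": (-180.0, 180.0),
    "time": ("1980", "1982"),
}

# Same expression as in execute(): wrap each dimension name in parentheses.
dims = "".join(["(%s)" % x for x in cdms2keyargs.keys()])

print(dims)  # (longitude)(time)
```

With these inputs the variable would be averaged over longitude and time, leaving the remaining dimensions untouched.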


No. The API is designed to be backend-agnostic.

But:

— ESGF-CWT will use UV-CDAT where appropriate
— UV-CDAT will be officially supported and will be part of the “compute node stack”
— No, your preferred application is not guaranteed to be fully supported and/or part of the ESGF stack

Yes, the API team will listen to you even if you do not use UV-CDAT.

But really… You “should” be using it ;) It’s so much simpler, and it makes sense to have everybody using the same tools.

Do I have to use UV-CDAT?


Tom Maxwell -> NASA
Maarten Plieger -> sort of ACME

Is anybody using this?


LOTS!

Tighter integration with ESGF

— result search as URI?
— esgf:// new URI type?

Testing!

— We need some basic dataset to run tests on
— We need a mechanism to document the “correct” solution to a problem

Once this is in place we can move to distributed analysis

— Which nodes carry my diagnostic?
— Which one should I use? (is it close to my data, is it overloaded, etc…)
— Resource management

Multiple implementations of the same diagnostics:

— MPI vs SLURM vs MPI+SLURM vs HADOOP vs SPARK vs combinations, etc…
• Which one to trust?
• Which one is faster for me?
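For the "Testing!" item, one possible shape for documenting a "correct" solution is a small reference case that pairs a dataset with its known answer, so that any implementation (MPI, SPARK, ...) can be checked against it. The sketch below is purely illustrative: the case layout, the names `REFERENCE_CASE` and `check_case`, and the plain-Python mean are all assumptions, not something the ESGF-CWT has defined.

```python
import math

# Hypothetical reference case: a tiny 2x3 field whose mean is known exactly.
REFERENCE_CASE = {
    "description": "unweighted mean of a 2x3 field",
    "data": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "expected": 3.5,
    "rel_tol": 1e-9,
}


def unweighted_mean(field):
    """Stand-in for a diagnostic implementation under test."""
    values = [v for row in field for v in row]
    return sum(values) / len(values)


def check_case(case, func):
    """Return True if func reproduces the documented answer within tolerance."""
    result = func(case["data"])
    return math.isclose(result, case["expected"], rel_tol=case["rel_tol"])


print(check_case(REFERENCE_CASE, unweighted_mean))  # True
```

The same case could be run against each backend to help answer "which one to trust".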

So… What’s next?


Still in its infancy but crystallizing.

The time is NOW: the longer you wait, the harder it will be to get your voice heard.

Summary
