LLNL-PRES-679957
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Web Services Processing, Application Programming Interface
Charles Doutriaux, Sasha Ames, Tom Maxwell, Dan Duffy, Dean Williams
December 9th, 2015
Overview
As computing power grows, so does data volume
Scientists generating and analyzing these data are many and dispersed
BUT the scientific need is greater than ever
ESGF solved the first part of the equation: universal, distributed access
Bringing all needed data to your computer or even to your facility is no longer feasible
Now we need to solve the analysis part.
The ESGF-CWT is putting together an infrastructure for WPS (Web Processing Service)
This talk is about the API part
The API is twofold:
  - Developers: common ground for creating new tools
  - Users: a standard way to query and use resources
Goal: make things as easy as possible for the user, i.e.:
  - What services are available here?
  - Can I get their documentation?
  - Let's use them
As many decisions as possible are made for the user (but these choices remain visible to the user, who can override them)
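The three user questions above map naturally onto the standard OGC WPS 1.0.0 operations: GetCapabilities (what services are here?), DescribeProcess (can I get their documentation?) and Execute (let's use them). The sketch below only builds the key-value-pair GET URLs for these requests with the Python standard library. The endpoint `https://example.org/wps` and the JSON input values are hypothetical placeholders, not ESGF-CWT endpoints or its actual input schema; the `averager` identifier and the `variable`/`domain` input names are taken from the example process.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the address of a real WPS server.
WPS_URL = "https://example.org/wps"


def wps_url(request, **params):
    """Build a WPS 1.0.0 key-value-pair (KVP) GET URL."""
    query = {"service": "WPS", "version": "1.0.0", "request": request}
    query.update(params)
    return WPS_URL + "?" + urlencode(query)


def data_inputs(**inputs):
    """Join Execute inputs as 'name=value;name=value' for the DataInputs parameter."""
    return ";".join("%s=%s" % (name, value) for name, value in inputs.items())


# What services are here?
capabilities_url = wps_url("GetCapabilities")
# Can I get their documentation?
describe_url = wps_url("DescribeProcess", identifier="averager")
# Let's use them. The JSON values below are hypothetical placeholders.
execute_url = wps_url("Execute", identifier="averager",
                      DataInputs=data_inputs(variable='{"id": "tas"}',
                                             domain='{"lat": {"start": -10, "end": 10}}'))

for url in (capabilities_url, describe_url, execute_url):
    print(url)
```

Each URL can be fetched with `urllib.request.urlopen`; a WPS server answers with an XML document (the capabilities list, the process description, or the execute response).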
Example? (place this file in the server's “process” directory)
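import os

import cdutil  # UV-CDAT utilities (provides averager)
from pywps.Process import WPSProcess  # PyWPS 3.x process base class
# Assumed import path for the ESGF-CWT base class (a WPSProcess subclass)
from esgfcwtProcess import esgfcwtProcess


class Process(esgfcwtProcess):
    def __init__(self):
        """Process initialization"""
        # The process identifier is the file name without its extension
        WPSProcess.__init__(self,
                            identifier=os.path.split(__file__)[-1].split('.')[0],
                            title='averager',
                            version='0.1',
                            abstract='Average a variable over one or more dimensions',
                            storeSupported='true',
                            statusSupported='true')
        # JSON description of the domain to average over
        self.domain = self.addComplexInput(identifier='domain',
                                           title='domain over which to average',
                                           formats=[{'mimeType': 'text/json', 'encoding': 'utf-8', 'schema': None}])
        # JSON description of the single variable to average
        self.dataIn = self.addComplexInput(identifier='variable',
                                           title='variable to average',
                                           formats=[{'mimeType': 'text/json'}],
                                           minOccurs=1, maxOccurs=1)
        # Literal flag asking for a downloadable output (not read in execute below)
        self.download = self.addLiteralInput(identifier='download', type=bool,
                                             title='download output', default=False)
        self.average = self.addComplexOutput(identifier='average',
                                             title='averaged variable',
                                             formats=[{'mimeType': 'text/json'}])

    def execute(self):
        # esgfcwtProcess helpers: load the first input variable and its cdms2 selection keywords
        dataIn = self.loadData()[0]
        data, cdms2keyargs = self.loadVariable(dataIn)
        # Average over each axis named in the selection keywords, e.g. "(lat)(lon)"
        dims = "".join(["(%s)" % x for x in cdms2keyargs.keys()])
        data = cdutil.averager(data, axis=dims)
        data.id = self.getVariableName(dataIn)
        self.saveVariable(data, self.average, "json")
        return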
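To make the `dims` step in `execute` concrete, the sketch below builds the same "(name)(name)" axis string from a domain's keys and then averages a NumPy array over those named axes. It is a simplified, unweighted stand-in: `cdutil.averager` works on cdms2 variables and can apply weights (e.g. area weights for latitude), which this sketch does not. The helper names and the (time, lat, lon) example array are hypothetical.

```python
import re

import numpy as np


def axis_string(domain_keys):
    """Build a cdutil-style axis string, e.g. ['lat', 'lon'] -> '(lat)(lon)'."""
    return "".join("(%s)" % name for name in domain_keys)


def unweighted_average(data, axis_names, axes):
    """Average `data` over the axes listed in a '(a)(b)' string, without weights."""
    names = re.findall(r"\((\w+)\)", axes)
    indices = tuple(axis_names.index(name) for name in names)
    return data.mean(axis=indices)


# Hypothetical (time, lat, lon) array and a domain constraining lat and lon
data = np.arange(24, dtype=float).reshape(2, 3, 4)
axes = axis_string({"lat": (-10, 10), "lon": (0, 90)}.keys())
print(axes)  # (lat)(lon)
print(unweighted_average(data, ["time", "lat", "lon"], axes))
```

Only the averaged axes disappear; here the time axis survives, leaving one mean per time step.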
Do I have to use UV-CDAT?
No. The API is designed to be backend agnostic. But:
  - ESGF-CWT will use UV-CDAT where appropriate
  - UV-CDAT will be officially supported and will be part of the “compute node stack”
  - No, your preferred application is not guaranteed to be fully supported and/or part of the ESGF stack
Yes, the API team will listen to you even if you do not use UV-CDAT
But really… You “should” be using it ;) It’s so much simpler, and it makes sense for everybody to use the same tools
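One way to picture "backend agnostic": the user-facing call stays the same, while the computation is delegated to an interchangeable backend, with a default chosen for the user that the user can still override. The sketch below is a hypothetical illustration of that design, not ESGF-CWT code; `AveragerBackend`, `PurePythonBackend`, `ChunkedBackend` and `run_averager` are invented names, and the chunked backend merely stands in for a distributed implementation.

```python
from abc import ABC, abstractmethod
from statistics import mean


class AveragerBackend(ABC):
    """Hypothetical interface that every compute backend would implement."""

    @abstractmethod
    def average(self, values):
        """Return the average of a sequence of numbers."""


class PurePythonBackend(AveragerBackend):
    def average(self, values):
        return mean(values)


class ChunkedBackend(AveragerBackend):
    """Stand-in for a distributed backend: averages per chunk, then combines."""

    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size

    def average(self, values):
        chunks = [values[i:i + self.chunk_size]
                  for i in range(0, len(values), self.chunk_size)]
        # Weight each partial mean by its chunk length so the combined result is exact
        total = sum(mean(chunk) * len(chunk) for chunk in chunks)
        return total / len(values)


BACKENDS = {"python": PurePythonBackend(), "chunked": ChunkedBackend()}


def run_averager(values, backend="python"):
    """Same call for every user; the default backend can be overridden."""
    return BACKENDS[backend].average(values)


print(run_averager([1.0, 2.0, 3.0, 4.0, 5.0]))
print(run_averager([1.0, 2.0, 3.0, 4.0, 5.0], backend="chunked"))
```

Adding support for another tool then means registering one more backend, without changing what users call.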
Is anybody using this?
  - Tom Maxwell -> NASA
  - Maarten Plieger -> sort of ACME
So… What’s next?
LOTS!
Tighter integration with ESGF
  - Search results as URIs?
  - A new esgf:// URI type?
Testing!
  - We need some basic datasets to run tests on
  - We need a mechanism to document the “correct” solution to a problem
Once this is in place, we can move to distributed analysis
  - Which nodes carry my diagnostic?
  - Which one should I use? (Is it close to my data? Is it overloaded? etc.)
  - Resource management
Multiple implementations of the same diagnostics:
  - MPI vs SLURM vs MPI+SLURM vs Hadoop vs Spark, vs combinations, etc.
    • Which one to trust?
    • Which one is faster for me?
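The "which one to trust / which one is faster" questions, together with the need to document a "correct" solution, suggest a simple harness: run every implementation on a shared test dataset, keep only those that match the documented reference within a tolerance, then rank the survivors by measured run time. This is a hypothetical sketch; the three implementations, the tolerance and the timing method are assumptions standing in for real MPI, SLURM or Spark variants of a diagnostic.

```python
import math
import time
from statistics import mean


def reference_average(values):
    """Documented 'correct' solution for the test problem (hypothetical)."""
    return sum(values) / len(values)


def implementation_a(values):
    return mean(values)


def implementation_b(values):
    # Different summation strategy; should still agree within tolerance
    return math.fsum(reversed(values)) / len(values)


def implementation_c(values):
    # Buggy on purpose: drops the last element
    return sum(values[:-1]) / len(values)


def rank_implementations(impls, test_input, rel_tol=1e-9, repeats=5):
    """Keep implementations that match the reference, sorted fastest first."""
    expected = reference_average(test_input)
    trusted = []
    for name, func in impls.items():
        if not math.isclose(func(test_input), expected, rel_tol=rel_tol):
            continue  # disagrees with the documented solution: not trusted
        start = time.perf_counter()
        for _ in range(repeats):
            func(test_input)
        elapsed = (time.perf_counter() - start) / repeats
        trusted.append((elapsed, name))
    return [name for _, name in sorted(trusted)]


test_data = [float(i) for i in range(1, 10001)]
ranking = rank_implementations({"a": implementation_a, "b": implementation_b,
                                "c": implementation_c}, test_data)
print(ranking)
```

The order of the trusted implementations depends on the machine, which is exactly why the ranking is computed per user rather than fixed.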
Still in its infancy, but crystallizing
The time is NOW: the longer you wait, the harder it will be to get