Implementing the Data Access Protocol in Python Dr. Rob De Almeida
Oct 14, 2014
Table of Contents
● History
● Current implementation
  ● Client
  ● Server
  ● Plugins & responses
  ● WSGI & Paste
● Future
History
● pyDAP is a free implementation of the Data Access Protocol written in Python from scratch
● It is the product of naïveté and determination :)
Why Python?
● Object-oriented, high-level programming language that prioritizes programmer effort over computer effort
● Increasing usage in science (CDAT, MayaVi) and web (Google, YouTube)
● Advantages: interpreter, batteries included, easy prototyping, dynamically typed, concise, fun
pyDAP 1.0
● Started in 2003
● “Afternoon project”: client only, downloaded data from ASCII response and worked only with Grids and Arrays
● Reverse-engineering of the protocol
● Should've really been version 0.0.1
pyDAP 1.x
● Binary data using Python's xdrlib
● Server architecture based on a common core that could run as CGI, Twisted or using Python's BaseHTTPServer
pyDAP 2.0
● Complete rewrite, based on the DAP 2.0 specification draft
● Developed during the Google Summer of Code 2005
● Own implementation of XDR
● Server built based on WSGI specification*
● This should've been version 1.0
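To give a feel for what an XDR implementation involves, here is a minimal Python 3 sketch that packs a list of 32-bit integers as big-endian values preceded by the element count written twice, which is the layout assumed here for DAP 2.0 arrays. It uses only the standard `struct` module; the function names are hypothetical and are not pyDAP's API.

```python
import struct


def pack_int32_array(values):
    """Encode ints as big-endian 32-bit values with a doubled length prefix."""
    n = len(values)
    header = struct.pack(">II", n, n)        # element count, sent twice (assumed DAP layout)
    body = struct.pack(">%di" % n, *values)  # XDR integers are 4 bytes, big-endian
    return header + body


def unpack_int32_array(buf):
    """Decode a buffer produced by pack_int32_array back into a list."""
    n, repeated = struct.unpack_from(">II", buf, 0)
    if n != repeated:
        raise ValueError("length prefixes disagree")
    return list(struct.unpack_from(">%di" % n, buf, 8))
```

Other types need their own rules (for example, XDR pads byte strings to a multiple of four bytes), which is where most of the work in a full encoder goes.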
pyDAP 2.1
● Fully buffered server, able to handle infinite datasets
● Automatic discovery of plugins
● Automatic installation of dependencies
● Runs with Python Paste*
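One way a buffered server can cope with an unbounded dataset is to return the response body as a generator, so chunks are produced only as the web server asks for them. The Python 3 sketch below illustrates the idea with a hypothetical WSGI application that never stops yielding; it is not pyDAP's code, and `first_chunks` is only a helper for trying it without a real server.

```python
import itertools
import struct


def endless_counter_app(environ, start_response):
    """WSGI app whose body is a generator: nothing is held in memory at once."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])

    def body():
        # An "infinite dataset": one big-endian int32 per chunk, forever.
        for i in itertools.count():
            yield struct.pack(">i", i % 2**31)

    return body()


def first_chunks(app, n):
    """Call a WSGI app and pull only the first n chunks of its body."""
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    chunks = list(itertools.islice(app({}, start_response), n))
    return seen[0], chunks
```

Because the consumer decides how many chunks to read, a client that constrains its request (or simply disconnects) never forces the server to materialize the whole sequence.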
pyDAP 2.2.5.8
● Released last Friday (2007-02-16)
● Approximately 3k LOC for client and server, including docstrings, comments and its own XDR implementation
● Support for additional plugins (for new data formats) and responses (for new output) that are auto-discoverable
● Stub support for DDX on the client and server
Client
● Based on the httplib2 module
  ● HTTP / HTTPS
  ● Keep Alive
  ● Auth: digest, basic, WSSE, HMAC digest
  ● Caching
  ● Compression: deflate, gzip
● Intuitive interface
Sample client session
>>> from pynetcdf import NetCDFFile
>>> dataset = NetCDFFile("coads.nc")
>>> sst = dataset.variables['SST']
>>> print sst.shape
(12, 90, 180)
>>> print sst.dimensions
('TIME', 'COADSY', 'COADSX')
>>> print sst[0,40,40]
28.0669994354
>>> from dap.client import open
>>> dataset = \
... open("http://server/coads.nc")
>>> sst = dataset['SST']
>>> print sst.shape
(12, 90, 180)
>>> print sst.dimensions
('TIME', 'COADSY', 'COADSX')
>>> print sst[0,40,40]
[[[ 28.06699944]]]
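The remote result keeps all three dimensions because the index is sent to the server as a request for a sub-array rather than applied locally. A plausible way to do this is to translate the Python index into a DAP constraint expression of the form `[start:stride:stop]`, with an inclusive stop. The Python 3 sketch below shows that translation; `hyperslab` is a hypothetical helper, not part of `dap.client`.

```python
def hyperslab(name, index, shape):
    """Turn a Python index into a DAP-style constraint such as SST[0:1:0][40:1:40]."""
    if not isinstance(index, tuple):
        index = (index,)
    # Dimensions that were not indexed are requested in full.
    index = index + (slice(None),) * (len(shape) - len(index))
    parts = []
    for idx, size in zip(index, shape):
        if isinstance(idx, slice):
            start, stop, step = idx.indices(size)
            if step < 1 or stop <= start:
                raise ValueError("only non-empty forward slices are supported")
            last = stop - 1  # DAP stops are inclusive, Python stops are not
        else:
            start = last = idx if idx >= 0 else idx + size
            step = 1
        parts.append("[%d:%d:%d]" % (start, step, last))
    return name + "".join(parts)
```

For the session above, `hyperslab("SST", (0, 40, 40), (12, 90, 180))` gives the expression that would follow the `?` in a request such as `http://server/coads.nc.dods?...`.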
Client usage
● Commonly used to automate downloading data from OPeNDAP servers and storing it in a different format (scripting)
● Dapper-compliance validator for testing servers
Server
● “Writing a server is like writing a client backwards”
● Thin layer between plugins and responses (both auto-discoverable)
● Implemented as a WSGI application*
● Deployed using Paste Deploy*
Plugins and responses
http://localhost:8080/file.nc.das
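In a URL like the one above, the last suffix (`das`) names the response and the rest (`/file.nc`) names the data file that a plugin must open. The Python 3 sketch below shows how thin the layer in between can be. All names are hypothetical: `load_dataset` stands in for whatever plugin handles the file, and `responses` for the installed response functions.

```python
import posixpath


def split_request(path):
    """Split '/file.nc.das' into the file path and the response name."""
    base, ext = posixpath.splitext(path)
    if not ext:
        raise ValueError("no response suffix in %r" % path)
    return base, ext[1:]


def serve(path, load_dataset, responses):
    """Thin layer: a plugin builds the dataset, a response renders it."""
    base, name = split_request(path)
    if name not in responses:
        raise KeyError("unknown response: %s" % name)
    dataset = load_dataset(base)     # plugin side
    return responses[name](dataset)  # response side
```

With this split, adding a data format only touches the plugin side and adding an output format only touches the response side.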
Installing plugins & responses
● pyDAP uses EasyInstall:
  ● easy_install dap.plugins.netcdf
  ● easy_install dap.responses.html
● Easy to create new plugins (for small values of “easy”):
  ● paster create -t dap_plugin myplugin
  ● Generates template with skeleton code
  ● New plugin can be easily distributed
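To make the plugin idea concrete, here is a minimal Python 3 sketch of how a server might pick a plugin for a file: each plugin declares a pattern for the filenames it understands, and the first match wins. The class names, the `extensions` attribute and `find_plugin` are assumptions made for illustration. Discovery of installed plugins is left out, and a plain list stands in for it.

```python
import re


class NetCDFPlugin:
    # Hypothetical: filenames this plugin claims to handle.
    extensions = re.compile(r".*\.(nc|cdf)$", re.IGNORECASE)


class CSVPlugin:
    extensions = re.compile(r".*\.csv$", re.IGNORECASE)


def find_plugin(filename, plugins):
    """Return the first plugin whose pattern matches, or None."""
    for plugin in plugins:
        if plugin.extensions.match(filename):
            return plugin
    return None
```

A new plugin in this scheme is just another class with its own pattern, which is what makes it easy to package and distribute separately.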
Available plugins
● CSV
● netCDF (reference implementation)
● SQL (compatible with most databases but generates “flat” dataset)
● Matlab 4/5
● GrADS grib
● HDF5 and GDAL (experimental)
● grib2? (Rob Cermak)
Available responses
● dds, das, dods
● ASCII variant
● HTML form
● JSON
● WMS / KML
● EditGrid / Google Spreadsheets
● netCDF?
JSON
● Lightweight alternative to XML for data exchange
● Based on a subset of JavaScript
● Easy to parse on the browser
● Parsers and generators for C, C++, C#, Java, Lisp, Lua, Objective C, Perl, PHP, Python, Ruby, Squeak and several other languages
● Coincidentally, also a subset of Python
● JSON == valid Python code
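A quick Python 3 check of the last two points, using a small payload shaped like a dataset with one Int32 array (the payload is only an illustration). It parses the same text once with the standard `json` module and once as a Python literal with `ast.literal_eval`, and the two results agree.

```python
import ast
import json

body = '''{"test": {"attributes": {"NC_GLOBAL": {},
                          "author": "Roberto De Almeida"},
          "type": "Dataset",
          "a": {"type": "Int32",
                "shape": [10],
                "data": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}}}'''

dataset = json.loads(body)["test"]   # parse as JSON
as_python = ast.literal_eval(body)   # parse the same text as a Python literal

a = dataset["a"]
total = sum(a["data"])
```

The overlap is not complete: the JSON literals `true`, `false` and `null` are not Python names, so a real client is safer using a JSON parser than evaluating the text.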
A JSON response
Content-description: dods_json
XDODS-Server: dods/2.0
Content-type: application/json
{"test": {"attributes": {"NC_GLOBAL": {},
"author": "Roberto De Almeida"},
"type": "Dataset",
"a": {"type": "Int32",
"shape": [10],
"data": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}}}
WMS
● Returns maps (images) from requested variables and regions
● Works with geo-referenced grids and sequences
● Layers can be composed together
● Data can be constrained:
  ● /coads.nc.wms?SST // annual mean
  ● /coads.nc.wms?SST[0] // January
WMS example request
http://localhost:8080/netcdf/coads.nc.wms?LAYERS=SST&WIDTH=512
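A request like this is easy to build from a script. The Python 3 sketch below assembles the query string with the standard library; `wms_url` is a hypothetical helper, and only the `LAYERS` and `WIDTH` parameters shown above are used.

```python
from urllib.parse import urlencode


def wms_url(base, **params):
    """Append query parameters to a .wms URL, escaping values as needed."""
    return base + "?" + urlencode(params)


url = wms_url("http://localhost:8080/netcdf/coads.nc.wms", LAYERS="SST", WIDTH=512)
```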
KML
● Generates XML file using the Keyhole Markup Language, pointing to the WMS response
● Nice and simple interface for quickly visualizing data
WSGI
● Python Web Server Gateway Interface
● Simple and universal interface between web servers (like Apache) and web applications (like pyDAP)
● Allows the sharing of middleware between applications (gzip, authentication, caching, etc.)
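The whole interface fits in a few lines. The Python 3 sketch below shows a trivial WSGI application and a piece of middleware that wraps any application to add a response header; the names are illustrative and `call` is only a helper for invoking an app without a web server.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A complete WSGI application: a callable taking environ and start_response."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from the application\n"]


def add_server_header(app, value):
    """Middleware: wraps any WSGI app and appends an XDODS-Server header."""
    def wrapped(environ, start_response):
        def custom_start(status, headers, exc_info=None):
            headers = list(headers) + [("XDODS-Server", value)]
            return start_response(status, headers, exc_info)
        return app(environ, custom_start)
    return wrapped


def call(app):
    """Invoke a WSGI app with a fake request and collect what it returns."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Since the middleware only relies on the WSGI calling convention, the same wrapper works unchanged around any other WSGI application.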
Before WSGI
After WSGI
Paste & Paste Deploy
● Python module that facilitates the development and deployment of web applications
● Allows the deployment of pyDAP using a simple INI file that specifies server, middleware and application configuration
Running a server
[server:main]
use = egg:PasteScript#wsgiutils
host = 127.0.0.1
port = 8080

[filter-app:main]
use = egg:Paste#httpexceptions
next = pyDAP

[app:pyDAP]
use = egg:dap
name = Test DAP server
root = %(here)s/data
verbose = 0
template = %(here)s/template
x-wsgiorg.throw_errors = 1
dap.responses.kml.format = image/png
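The `%(here)s` placeholders are filled in with the directory that contains the INI file. The Python 3 sketch below mimics that substitution with the standard `configparser` module on a shortened copy of the configuration. It only illustrates how the values resolve and is not how Paste Deploy itself is invoked; `load` and the `/srv/dap` directory are made up for the example.

```python
import configparser

INI = """\
[server:main]
use = egg:PasteScript#wsgiutils
host = 127.0.0.1
port = 8080

[app:pyDAP]
use = egg:dap
root = %(here)s/data
template = %(here)s/template
"""


def load(ini_text, here):
    """Parse the INI text, supplying 'here' as a default for interpolation."""
    parser = configparser.ConfigParser(defaults={"here": here})
    parser.read_string(ini_text)
    return parser


config = load(INI, "/srv/dap")
root = config["app:pyDAP"]["root"]
port = config.getint("server:main", "port")
```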
Future
● pyDAP 2.3 almost ready
  ● Dapper compliance
  ● Faster XDR encoding/decoding
  ● Initial support for DDX response and parser
● Build a rich web interface (AJAX) based on JSON + WMS + KML responses
● Not only for pyDAP, but also for other OPeNDAP servers, using pyDAP as a proxy
Acknowledgments
● OPeNDAP for all the support
● James Gallagher for all my questions about the spec on the mailing list
● Everybody who submitted bugs (bonus points for submitting patches!)