Virtual Observatories and Data Interfaces for Atmospheric Science
12th EISCAT International Workshop, Incoherent Scatter Radar School
Swedish Institute of Space Physics, Space Campus, Kiruna, Sweden
Bill Rideout, MIT Haystack, Route 40, Westford, MA, USA, 1-781-981-5624, [email protected]
26 August 2005
Cedar Database
Other data sources
A day in the life of an Atmospheric Scientist
I have done an experiment with my instrument, but now I need to …
– Search numerous websites for data
– Figure out their parameters, units
– Figure out their coordinate system, date format
– Figure out how to determine data quality
– Write code to download data, or (worse) manually download
– Write code to convert to your format
– Finally, do science
How can Virtual Observatories help?
Virtual Observatories – one stop data shopping!
Virtual Observatories
Ideally…
– Provide a single interface to access all data
– Know about all data sources
– Allow simple, powerful searches to discover unknown data sources
– Always get the most up-to-date data
– Use a single set of well-defined parameters
– Provide data in consistent format(s)
– Provide data in consistent coordinates
– Inform the user of contact information and rules-of-the-road for all data
Two approaches
Top down: build an interface
Bottom up: build a standard data source
How do they work?
Top-down approach:
– Accept that all data sources will be forever incompatible
– Build a data model so metadata can be shared
– Build a unique interface to each new data source
– Scales linearly with number of data sources
– Works best with more uniform data (e.g., astronomical images)
Bottom-up approach:
– Standardize data format and semantics
– Standardize data provider API
– Approach taken by Madrigal/Cedar
– Try for community acceptance
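The top-down approach can be pictured as one adapter per data source, each translating that source's native records into a shared data model. The sketch below is only an illustration: the two sources, their field names, and the `Measurement` model are all made up for this example and do not describe any real virtual observatory.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Measurement:
    """Hypothetical common data model shared by every adapter."""
    source: str
    parameter: str
    value: float
    units: str


def adapt_source_a(row: dict) -> Measurement:
    # Made-up source A reports electron temperature in kelvin as "Te_K".
    return Measurement("A", "te", float(row["Te_K"]), "K")


def adapt_source_b(row: dict) -> Measurement:
    # Made-up source B reports it in thousands of kelvin as "etemp".
    return Measurement("B", "te", float(row["etemp"]) * 1000.0, "K")


# One adapter per source: the work grows linearly with the number of sources.
ADAPTERS: Dict[str, Callable[[dict], Measurement]] = {
    "A": adapt_source_a,
    "B": adapt_source_b,
}


def query_all(raw: Dict[str, List[dict]]) -> List[Measurement]:
    """Run every source's rows through that source's own adapter."""
    out: List[Measurement] = []
    for name, rows in raw.items():
        out.extend(ADAPTERS[name](row) for row in rows)
    return out


results = query_all({"A": [{"Te_K": "1500"}], "B": [{"etemp": 1.5}]})
for m in results:
    print(m)
```

In the bottom-up approach the adapters disappear, because every provider already delivers the common format and API.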
What is the Madrigal database?
An open-source, web-based database designed to hold one group’s data
– www.openmadrigal.org has all code and downloads
Built upon the Cedar database format established over 20 years ago
Fundamentally a data source – allows local owners to improve/correct their data
Designed to be used for a wide variety of instruments
New installations always welcome!
Madrigal site (typically a facility with scientists and a Madrigal installation)
↓
Instruments (ground-based, typically with a set location)
↓
Experiments (typically of limited duration, with a single contact)
↓
Experiment Files (represents data from one analysis of the experiment)
↓
Records (measurement over one period of time)
Diagram labels: data shared among all Madrigal sites; data unique to one Madrigal site
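The hierarchy above maps naturally onto nested containers. The following is a minimal sketch of that nesting, not Madrigal's actual classes: every class and field name here is an assumption chosen for illustration, and only the instrument id 30 (Millstone Hill ISR) comes from the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """One measurement over one period of time."""
    start: str
    end: str


@dataclass
class ExperimentFile:
    """Data from one analysis of an experiment."""
    name: str
    records: List[Record] = field(default_factory=list)


@dataclass
class Experiment:
    """Typically of limited duration, with a single contact."""
    name: str
    contact: str
    files: List[ExperimentFile] = field(default_factory=list)


@dataclass
class Instrument:
    """Ground-based, typically with a set location."""
    kinst: int
    name: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class Site:
    """A facility with scientists and a Madrigal installation."""
    name: str
    instruments: List[Instrument] = field(default_factory=list)


def count_records(site: Site) -> int:
    """Walk the hierarchy from the site down to individual records."""
    return sum(
        len(f.records)
        for inst in site.instruments
        for exp in inst.experiments
        for f in exp.files
    )


site = Site("Example site", [
    Instrument(30, "Millstone Hill ISR", [
        Experiment("example experiment", "contact name", [
            ExperimentFile("example.dat", [
                Record("2005-03-19T12:30:00", "2005-03-19T12:31:00"),
                Record("2005-03-19T12:31:00", "2005-03-19T12:32:00"),
            ])
        ])
    ])
])
print(count_records(site))
```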
Madrigal Records
Records (measurement over one period of time)
Three types:
Catalog record
– descriptive information about entire experiment
Header record
– descriptive information about one section of experiment
Data record
– Stores values
– All parameters defined by Cedar Database standard
– Contains 3 parts: Prolog, 1D records, 2D records
Madrigal Data Records
Prolog
– Start and end time
– Instrument id
– Kind of data id
1D records (scalar)
– Single value parameters
2D records (vector)
– Multiple value parameters
– All parameters must have same number of rows
– Meant to allow multiple spatial measurements
– Not meant for time variation – conflicts with Prolog!
Example data record:
– Prolog
– 1D (scalar) – S/N = 2.5
– 2D (vector) – Altitudes = 100, 150, 200, 250, 300, 350
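The three parts of a data record, and the rule that every 2D parameter has the same number of rows, can be sketched as a small data structure. This is an illustrative model only: the class names and the `sn` key are assumptions, while the values (S/N = 2.5, the six altitudes, instrument id 30, kind of data 3408) are taken from the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Prolog:
    """Start and end time, instrument id, kind of data id."""
    start: datetime
    end: datetime
    kinst: int
    kindat: int


@dataclass
class DataRecord:
    prolog: Prolog
    one_d: Dict[str, float] = field(default_factory=dict)        # scalars
    two_d: Dict[str, List[float]] = field(default_factory=dict)  # vectors

    def nrow(self) -> int:
        """Return the common row count, rejecting ragged 2D parameters."""
        lengths = {len(values) for values in self.two_d.values()}
        if len(lengths) > 1:
            raise ValueError("all 2D parameters must have the same number of rows")
        return lengths.pop() if lengths else 0


record = DataRecord(
    Prolog(datetime(2005, 3, 19, 12, 30), datetime(2005, 3, 19, 12, 31), 30, 3408),
    one_d={"sn": 2.5},
    two_d={"gdalt": [100.0, 150.0, 200.0, 250.0, 300.0, 350.0]},
)
print(record.nrow())
```

Because the prolog carries a single start and end time, the rows of a 2D parameter are best read as different positions measured during that one interval.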
Cedar/Madrigal Database
All parameters in file defined
– http://cedarweb.hao.ucar.edu/documents/parameters_list.txt
Ranges of parameters for each instrument
Data stored in one or two 16 bit ints
Special values
– missing
– assumed (error value only)
– knownbad (error value only)
Defined in
– http://cedarweb.hao.ucar.edu/cgi-bin/
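Storing a value in one or two 16-bit integers can be sketched as a coarse integer plus an optional fine "additional increment" integer, each with its own scale factor, with a few integer codes reserved for the special values. The sentinel codes and scale factors below are assumptions made for illustration; the Cedar format documentation is the authority on the real values.

```python
# Assumed sentinel codes, for illustration only.
MISSING, ASSUMED, KNOWNBAD = -32767, -32766, 32767
SPECIAL = {"missing": MISSING, "assumed": ASSUMED, "knownbad": KNOWNBAD}

# Range left over for ordinary data once the sentinels are reserved.
MIN_VALID, MAX_VALID = -32765, 32766


def encode(value, scale, incr_scale):
    """Return (main, increment) 16-bit integers for a value or special string."""
    if isinstance(value, str):
        return SPECIAL[value], 0
    main = int(value / scale)                       # coarse part
    incr = round((value - main * scale) / incr_scale)  # fine remainder
    for n in (main, incr):
        if not MIN_VALID <= n <= MAX_VALID:
            raise OverflowError(f"{value} does not fit in 16-bit storage")
    return main, incr


def decode(main, incr, scale, incr_scale):
    """Invert encode(): return a float, or the special string if one is stored."""
    for name, code in SPECIAL.items():
        if main == code:
            return name
    return main * scale + incr * incr_scale


main, incr = encode(1234.5678, scale=1.0, incr_scale=1e-4)
print(main, incr, decode(main, incr, 1.0, 1e-4))
```

An API that hands the caller plain doubles, as the talk describes for the Python API, would perform this kind of split internally and raise on overflow.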
Cedar Database parameters
[Screenshot: example additional increment parameter]
Cedar parameters - continued
Madrigal contains many “derived only” parameters
– Not included in Cedar standard
– Cannot be stored in Cedar file
New python API hides the existence of additional increment parameters
– All values are doubles
– Exceptions occur on overflow
– More later…
Derived parameters appear to be in file
Assumes information can be derived from records
– Time from prolog
– Position either as 1D or 2D
– Other parameters
Engine determines all parameters that can be derived
Classes of derived parameters
Space, time
– Examples: Local time, shadow height
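One way an engine could determine every derivable parameter is to treat each derivation method as a set of inputs and a set of outputs, then keep applying methods until nothing new appears. The sketch below assumes that design; it is not Madrigal's implementation. The method table is hypothetical, apart from the `getTsygan` inputs and outputs, which follow the example given later in this talk.

```python
from typing import List, Set, Tuple

# (method name, input parameters, output parameters). Hypothetical table.
METHODS: List[Tuple[str, Set[str], Set[str]]] = [
    ("getLocalTime", {"ut1", "ut2", "glon"}, {"slt"}),
    ("getLocalTimeSector", {"slt"}, {"slt_sector"}),
    ("getShadowHeight", {"ut1", "ut2", "gdlat", "glon"}, {"sdwht"}),
    ("getTsygan", {"ut1", "ut2", "gdlat", "glon", "gdalt"},
     {"tsyg_eq_xgsm", "tsyg_eq_ygsm", "tsyg_eq_xgse", "tsyg_eq_ygse"}),
]


def derivable(measured: Set[str]) -> Set[str]:
    """Return every parameter that can be derived from the measured ones."""
    known = set(measured)
    changed = True
    while changed:  # repeat so outputs of one method can feed another
        changed = False
        for _name, inputs, outputs in METHODS:
            if inputs <= known and not outputs <= known:
                known |= outputs
                changed = True
    return known - set(measured)


# Time comes from the prolog; position comes from 1D or 2D parameters.
print(sorted(derivable({"ut1", "ut2", "glon"})))
print(sorted(derivable({"ut1", "ut2", "gdlat", "glon", "gdalt"})))
```

With only time and longitude available, the shadow height and Tsyganenko parameters are not offered, because latitude (and altitude) are missing.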
Remote Access to Madrigal Data
Built on web services
Like the web, available from anywhere on any platform
Complete Matlab and Python API written
More APIs available on request or via contribution
Madrigal Web Services
Simple delimited output via CGI scripts
Not based on SOAP or XmlRpc since no support in languages such as Matlab
CGI arguments and output fully documented at http://www.haystack.edu/madrigal/remoteAPIs.html
Madrigal Web Services – part 2
To write a new API, each method must
– Take input arguments and generate the correct CGI URL
– Parse the delimited text
– Return data to user
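The three steps above can be sketched in a few lines of Python. The script name `exampleService.py`, its arguments, the comma delimiter, and the base URL are placeholders rather than real Madrigal CGI details, which are documented at the address given earlier. The `fetch` argument is injected so the sketch runs without network access.

```python
from urllib.parse import urlencode


def build_cgi_url(base_url, script, **args):
    """Step 1: turn input arguments into a CGI URL."""
    return f"{base_url.rstrip('/')}/{script}?{urlencode(args)}"


def parse_delimited(text, delimiter=","):
    """Step 2: split delimited text into rows of fields, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append([item.strip() for item in line.split(delimiter)])
    return rows


def get_experiments(base_url, kinst, year, fetch):
    """Step 3: return parsed data to the caller.

    fetch is any callable that takes a URL and returns the response text.
    """
    url = build_cgi_url(base_url, "exampleService.py", kinst=kinst, year=year)
    return parse_delimited(fetch(url))


def fake_fetch(url):
    # Stand-in for an HTTP request; returns canned delimited text.
    return "1001, example experiment A\n\n1002, example experiment B\n"


rows = get_experiments("http://example.org/madrigal/", 30, 1998, fake_fetch)
print(rows)
```

In real use, `fetch` could be a thin wrapper around `urllib.request.urlopen` that decodes the response body.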
Matlab MadrigalAPI call
% returns a three dimensional array of double with the dimensions:
%
% [Number of rows, number of parameters requested, number of records]
%
% If error or no data returned, will return error explanation string instead.
data = isprintWeb(eiscat_cgi_url, filename, parms, filterStr);
Simple Matlab example, continued
In real code, use higher-level methods to search for the filename
Entire web interface could be built via remote calls
See http://madrigal.haystack.edu/madrigal/remoteMatlabAPI.html for complete documentation and more examples
import madrigalWeb.madrigalWeb

# create the main object to get all needed info from Madrigal
madrigalUrl = 'http://www.haystack.mit.edu/madrigal'
testData = madrigalWeb.madrigalWeb.MadrigalData(madrigalUrl)

# get all MLH experiments in 1998 (30 is the Millstone Hill instrument id)
expList = testData.getExperiments(30, 1998,1,1,0,0,0, 1998,12,31,23,59,59)
for exp in expList:
    # print out all experiments
    print(exp)

# print list of all files in first experiment
fileList = testData.getExperimentFiles(expList[0].id)
for thisfile in fileList:
    print(thisfile)
Python Remote API
Similar methods to Matlab
Fully documented with examples
Used to implement plotting across multiple sites
Used by SuperDARN to constantly poll for real-time Millstone Hill data
See http://madrigal.haystack.edu/madrigal/remotePythonAPI.html for documentation and more examples
Extending/contributing to Madrigal
Madrigal is completely open source
See www.openmadrigal.org for CVS
All new code is C/Python, with some Tcl.
Extending the Madrigal derivation engine is
Extending the Madrigal derivation engine
Simply a list of methods with input Madrigal parameters and output Madrigal parameters
– int methodName(int inCount, double * inputArr, int
Example – Tsyganenko parameters
/***********************************************************************
* getTsygan derives field line crossing points using Tsyganenko model.
*
* arguments:
*   inCount (num inputs) = 5 (UT1, UT2, GDLAT, GLON, GDALT)
*   inputArr - double array holding:
*     UT1 - UT at record start
*     UT2 - UT at record end
*     GDLAT - geodetic latitude
*     GLON - geodetic longitude
*     GDALT - geodetic altitude
*   outCount (num outputs) = 4
*   outputArr - double array holding:
*     TSYG_EQ_XGSM - X GSM value where field line crosses GSM XY plane
*     TSYG_EQ_YGSM - Y GSM value where field line crosses GSM XY plane
*     TSYG_EQ_XGSE - X GSE value where field line crosses GSE XY plane
*     TSYG_EQ_YGSE - Y GSE value where field line crosses GSE XY plane
*
* Algorithm: See Geopack_2003.f, T01_01.f
* returns - 0 (successful)
*/
int getTsygan(int inCount, double * inputArr, int outCount, double * outputArr, FILE * errFile);
New features of Madrigal 2.4
Plotting (as demonstrated)
Automatic updating of all geophysical data
Capture of user name, email, organization
– Web
– Remote API
Simple python class to create/edit Madrigal files
Simple scripts/API to create experiments, add files, update metadata
Creating files with python – example
"""create a file with two data records"""

import datetime

import madrigal.metadata
import madrigal.cedar

################# sample data #################

kinst = 30      # instrument identifier of Millstone Hill ISR
modexp = 230    # id of mode of experiment
kindat = 3408   # id of kind of data processing
nrow = 5        # all data records have 5 2D rows
SYSTMP = (120.0, 122.0)
TFREQ = (4.4E8, 4.4E8)
GDALT = ((70.0, 100.0, 200.0, 300.0, 400.0),
         (70.0, 100.0, 200.0, 300.0, 400.0))
GDLAT = ((42.0, 42.0, 42.0, 42.0, 42.0),
         (42.0, 42.0, 42.0, 42.0, 42.0))
GLON = ((270.0, 270.0, 270.0, 270.0, 270.0),
        (270.0, 270.0, 270.0, 270.0, 270.0))
TR = (('missing', 1.0, 1.0, 2.3, 3.0),
      ('missing', 1.0, 1.7, 2.4, 3.1))
DTR = (('missing', 'assumed', 'assumed', 0.3, 0.7),
       ('missing', 'assumed', 0.7, 0.4, 0.5))
Creating files with python – part 2
newFile = '/tmp/testCedar.dat'

# create a new Madrigal file
cedarObj = madrigal.cedar.MadrigalCedarFile(newFile, True)

# create all data records - each record lasts one minute
startTime = datetime.datetime(2005, 3, 19, 12, 30, 0, 0)
recTime = datetime.timedelta(0, 60)
for recno in range(2):
    endTime = startTime + recTime
    # the seventh time field is centiseconds, hence microsecond // 10000
    dataRec = madrigal.cedar.MadrigalDataRecord(
        kinst, kindat,
        startTime.year, startTime.month, startTime.day,
        startTime.hour, startTime.minute, startTime.second,
        startTime.microsecond // 10000,
        endTime.year, endTime.month, endTime.day,
        endTime.hour, endTime.minute, endTime.second,
        endTime.microsecond // 10000,
        ('systmp', 'tfreq'),
        ('gdalt', 'gdlat', 'glon', 'tr', 'dtr'),
        nrow)
Creating files with python – part 3
    # set 1d values
    dataRec.set1D('systmp', SYSTMP[recno])
    dataRec.set1D('tfreq', TFREQ[recno])

    # set 2d values
    for n in range(nrow):
        dataRec.set2D('gdalt', n, GDALT[recno][n])
        dataRec.set2D('gdlat', n, GDLAT[recno][n])
        dataRec.set2D('glon', n, GLON[recno][n])
        dataRec.set2D('tr', n, TR[recno][n])
        dataRec.set2D('dtr', n, DTR[recno][n])

    # append new data record
    cedarObj.append(dataRec)

    startTime += recTime

# write new file
cedarObj.write()
Editing files with python
import madrigal.cedar

# read the Madrigal file into memory (orgFile is the path to an existing Madrigal file)
cedarObj = madrigal.cedar.MadrigalCedarFile(orgFile)

# loop through each record, increasing all Ti values by a factor of 1.2
for record in cedarObj:
    # skip header and catalog records
    if record.getType() == 'data':
        # loop through each 2D row
        for row in range(record.getNrow()):
            presentTi = record.get2D('Ti', row)
            # make sure it's not a special string value, e.g. 'missing'
            if not isinstance(presentTi, str):
                record.set2D('Ti', row, presentTi * 1.2)
Creates, edits catalog, header, data records
Hides details of Cedar file formats
– Various flavors of file format
– Use of 16 bit integers to store data
– Use of “additional increment” parameters
See http://madrigal.haystack.edu/madrigal/pythonCedarTutorial.html for complete documentation
Cedar Database
Outgrowth of the Madrigal Database
A central repository
– Data persistence
– Wider variety of data
Has model results/tools
Wider variety of output formats
Data not as actively updated
Does not (yet) derive parameters
Does not separate data by experiment
See
Cedar – select instrument
[Screenshot: select instrument]
Cedar instrument – part 2
[Screenshot: select instrument]
Cedar date – part 1
[Screenshot: select year. In the next three pages you are selecting a starting day. The UI is designed to ensure that only a date with data can be selected.]
Cedar date – part 2
[Screenshot: select month]
Cedar date – part 3
[Screenshot: select starting day; select number of days to view]
Cedar output format
[Screenshot: choose output format; data filtering available (optional)]
Cedar TAB output
[Screenshot: TAB format; by default, shows all measured parameters]
Cedar Database – for more info
More complex examples at http://cedarweb.hao.ucar.edu/documents/dbexamples.html
– http://www-ssc.igpp.ucla.edu/gem/worldmag/index.html
NASA's Space Physics Data Facility
– http://spdf.gsfc.nasa.gov/
And many more…
Virtual Observatory concept beginning to influence data gathering
Future success may depend on standardization
Submit suggestions, or write improvements to