Data Management and Open Access Creating Data Files for Published Figures Josh Stillerman, Martin Greenwald, Mark London, Jason Thomas February, 2016

Data Management and Open Access Creating Data Files for ...library.psfc.mit.edu/publishing/dmp/slides/josh.pdf · Josh Stillerman, Martin Greenwald, Mark London, Jason Thomas February,

Oct 08, 2020

Data Management and Open Access
Creating Data Files for Published Figures

Josh Stillerman, Martin Greenwald, Mark London, Jason Thomas
February, 2016

Publishing Data for Figures

● The DOE requirement is not specific about exactly which data and metadata must be included with published figures.
  – We are interpreting the requirement to be:
    o The actual values plotted in the figure
    o Metadata about those values
      § Name, Description, Units
    o Metadata about how the data are displayed in the figure
      § Labels, Display Parameters
  – The requirement also does not dictate how the data should be stored.
    o File Format / Data Organization …

PSFC Standardized Format

● Choosing a standard file format has several advantages:
  – Easier access for readers of the publication
  – Easier verification for librarians, curators, and sponsors
  – Slower obsolescence, and easier conversion as standards evolve
  – Standard general-purpose tools for browsing and viewing contents
● We have chosen HDF5
  – https://www.hdfgroup.org/HDF5/

PSFC Standardized Schema

● Using a standard file format is good, but not good enough.
  – If all of the data files for figures in PSFC publications were, for example, MS Excel:
    o This would not dictate the organization of labels, rows, and columns in those spreadsheets.
    o In order to interpret one of them, a user would have to open the file interactively and attempt to understand the organization.
  – The same is true for HDF5, so:
    o We have defined a standard HDF5 file organization to represent the data in published figures.
    o Easy access for all consumers (since the files are all the same in structure)
    o Easy to create from the programming languages in use at the PSFC:
      § IDL
      § Python
      § MATLAB
      § This list can be expanded as needed.
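
Because every file shares one organization, a check for missing metadata can be written once and run against any figure's file. The following is a minimal sketch of that idea, not PSFC code: a plain nested dictionary stands in for the HDF5 file so the example needs no third-party library, and the required attribute lists (`REQUIRED_ROOT`, `REQUIRED_AXIS`) and the function `missing_fields` are assumptions chosen for illustration.

```python
# Hypothetical required metadata; these names are illustrative assumptions,
# not the PSFC schema definition.
REQUIRED_ROOT = ("user_fullname", "date", "fig_description")
REQUIRED_AXIS = ("units", "label")


def missing_fields(figure):
    """Return a list of problems found in a figure record.

    `figure` is a nested dict standing in for an HDF5 file:
    root attributes under "attrs", one entry per trace under "groups",
    and one entry per axis under each group's "axes".
    """
    problems = []
    for key in REQUIRED_ROOT:
        if key not in figure.get("attrs", {}):
            problems.append(f"root: missing attribute '{key}'")
    for group_name, group in figure.get("groups", {}).items():
        for axis_name, axis in group.get("axes", {}).items():
            for key in REQUIRED_AXIS:
                if key not in axis.get("attrs", {}):
                    problems.append(
                        f"{group_name}/{axis_name}: missing attribute '{key}'"
                    )
    return problems


# A record with two deliberate omissions: no root 'date', no 'label' on y.
example = {
    "attrs": {"user_fullname": "John Doe", "fig_description": "Bessel Functions"},
    "groups": {
        "J0": {
            "attrs": {"legend": "J0"},
            "axes": {
                "x": {"attrs": {"units": "s", "label": "time (s)"}, "values": [0.0, 0.2]},
                "y": {"attrs": {"units": "m"}, "values": [1.0, 0.99]},
            },
        }
    },
}

for problem in missing_fields(example):
    print(problem)
```

The same check would not be possible for free-form spreadsheets, where the location of each label and unit differs from file to file.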

PSFC Standardized Schema (2)

● One file per figure - the library system will name the file based on the publication's ID
  – Root-level attributes: author, username, date, description, caption…
  – One Group per 'trace' displayed
    o Group-level attributes for this trace:
● One Group per set of data displayed
  – Group-level attributes: name, legend string, plot-information
  – x_data – values for the X axis
    o Units, label
  – y_data – values for the Y axis
    o Units, label
  – z_data – values for the Z axis
    o Units, label
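
The layout described on this slide can be sketched as a nested structure. This is only an illustration under stated assumptions: a nested dictionary stands in for HDF5 groups, datasets, and attributes, the helper names `new_figure` and `add_trace` are hypothetical, and the sample values and the username `jdoe` are made up. With an HDF5 library, each dictionary level would correspond to a group or dataset and each `attrs` dictionary to that object's attributes.

```python
def new_figure(author, username, date, description, caption):
    """Create the root level of a figure record with its attributes."""
    return {
        "attrs": {
            "author": author,
            "username": username,
            "date": date,
            "description": description,
            "caption": caption,
        },
        "groups": {},
    }


def add_trace(figure, name, legend, plot_information, **axes):
    """Add one group for a trace.

    Each axis is passed as a keyword, e.g. x=(values, units, label),
    and is stored under "<axis>_data" with its own attributes.
    """
    group = {
        "attrs": {
            "name": name,
            "legend": legend,
            "plot_information": plot_information,
        }
    }
    for axis, (values, units, label) in axes.items():
        group[f"{axis}_data"] = {
            "values": list(values),
            "attrs": {"units": units, "label": label},
        }
    figure["groups"][name] = group
    return group


# Illustrative values only.
fig = new_figure("John Doe", "jdoe", "2016-02-04", "Bessel functions", "Figure 1")
add_trace(
    fig,
    "J0",
    legend="J0",
    plot_information="black line",
    x=([0.0, 0.2, 0.4], "s", "time (s)"),
    y=([1.0, 0.99, 0.96], "m", "height (m)"),
)

print(sorted(fig["groups"]["J0"]))
```

A three-dimensional trace would pass a `z=(values, units, label)` keyword in the same way.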

Creating data files

● The time to create (or update) the data files is when the figures are being created.
  – At that time, all of the data is available in some programming language.
  – It is much more likely the file will match the figure if it is created at that time.
● APIs are set up to mimic the plotting APIs.
● Files can be created and consumed in any programming language interchangeably.
● Example in IDL
● Example in Python
● Other languages to follow
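
One way to read "APIs are set up to mimic the plotting APIs" is that the call that stores a trace takes the same arguments as the call that draws it, so the two can be issued together while the figure is being made. The sketch below illustrates that pattern with a hypothetical `FigureRecorder` class. It is not the PSFC `hdf5_add` or `h5_data` interface, it keeps traces in memory rather than writing HDF5, and the `draw` callback is a stand-in for a real plotting call.

```python
class FigureRecorder:
    """Hypothetical helper that records each trace as it is plotted."""

    def __init__(self, description, draw=None):
        self.description = description
        self.traces = []   # one entry per plotted trace, in plot order
        self._draw = draw  # optional plotting callback taking (x, y, label)

    def plot(self, x, y, label, x_units="", y_units=""):
        """Draw the trace (if a callback was given) and record the same data."""
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        if self._draw is not None:
            self._draw(x, y, label)
        self.traces.append(
            {
                "label": label,
                "x": list(x),
                "y": list(y),
                "x_units": x_units,
                "y_units": y_units,
            }
        )


# The callback here only remembers what it was asked to draw.
drawn = []
rec = FigureRecorder("demo figure", draw=lambda x, y, label: drawn.append(label))
rec.plot([0, 1, 2], [0, 1, 4], label="squares", x_units="s", y_units="m")
rec.plot([0, 1, 2], [0, 1, 8], label="cubes", x_units="s", y_units="m")

print([t["label"] for t in rec.traces])
```

Because the data are recorded in the same call that draws them, the stored traces cannot drift out of step with what appears in the figure.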

IDL - The figure

IDL

file = 'Fig_1'
fig_description = 'Bessel Functions J0, J1 and J2'
fig_source = 'Phys. Plasmas 17, 1234 2010'
comment = 'This is the way the ball bounces'
user_fullname = 'John Doe'
date = systime(0)

; set up a simple color table (just for plotting)
; indices: 0 black, 1 white, 2 red, 3 blue, 4 green
r = [000, 255, 255, 000, 000]
g = [000, 255, 000, 000, 255]
b = [000, 255, 000, 255, 000]
tvlct, r, g, b

; start a new hdf5 file
hdf5_new, file=file, fig_description=fig_description, fig_source=fig_source, $
          comment=comment, user_fullname=user_fullname, date=date

IDL (2)

x_units = 's'
x_axis = 'time (s)'
x_name = 'measured with a stopwatch'
x_type = 'float'

y_units = 'm'
y_axis = 'height (m)'
y_name = 'measured with a ruler'
y_type = 'float'

legend = 'J0'

; compute and plot the first curve (you'll do this to create the plot file)
x = indgen(100)/5.
y0 = beselj(x, 0)
plot, x, y0, charsize=1.8, title=fig_description, xtitle=x_axis, ytitle=y_axis, color=1
xyouts, /norm, .9, .85, legend, size=1.8

; assumed: set as on the following slides; the result listing shows
; group 'J0' described as 'black line'
group_name = legend
plot_graphics = 'black line'

; add a data group for this trace to the file
hdf5_add, x, y0, file=file, group_name=group_name, $
          x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
          y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
          legend=legend, plot_graphics=plot_graphics

IDL (3)

legend = 'J1'

; compute and plot the second curve
y1 = beselj(x, 1)
oplot, x, y1, color=2
xyouts, /norm, .9, .8, legend, size=1.8, color=2

group_name = legend
plot_graphics = 'red line'

; add a data group for this trace to the file
hdf5_add, x, y1, file=file, group_name=group_name, $
          x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
          y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
          legend=legend, plot_graphics=plot_graphics

IDL (4)

legend = 'J2'

; compute and plot the third curve
y2 = beselj(x, 2)
oplot, x, y2, color=4
xyouts, /norm, .9, .75, legend, size=1.8, color=4

group_name = legend
plot_graphics = 'green line'

; add a data group for this trace to the file
hdf5_add, x, y2, file=file, group_name=group_name, $
          x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
          y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
          legend=legend, plot_graphics=plot_graphics

The Result

<HDF5 file "Fig_1.hdf5" (mode r, 12.4k)> (File) /
  root (Group) /root
    ('user_fullname', 'John Doe')
    ('user_id', 'g')
    ('date', 'Thu Feb 4 13:52:10 2016')
    ('fig_description', 'Bessel Functions J0, J1 and J2')
    ('fig_source', 'Phys. Plasmas 17, 1234 2010')
    ('n_groups', 3)
    J0 (Group) /root/J0
      ('group1plotting information', 'black line')
      ('legend', 'J0')
      x_values (Dataset) /root/J0/x_values len = (100,)
        ('units', 's')
        ('axislabel', 'time (s)')
        ('datatype', 'float')
        ('nx', 100)
      y_values (Dataset) /root/J0/y_values len = (100,)
        ('units', 'm')
        ('axislabel', 'height (m)')
        ('datatype', 'float')
        ('ny', 100)
    J1 (Group) /root/J1
      ('group1plotting information', 'red line')
      ('legend', 'J1')
      x_values (Dataset) /root/J1/x_values len = (100,)
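
A listing like the one above comes from visiting every group and dataset in the file and printing its path and attributes. The sketch below shows that traversal in a hedged form: a nested dictionary stands in for the HDF5 file (so no HDF5 library is needed), the function `walk` and the `"children"` and `"values"` keys are assumptions made for this example, and only the first group of the listing is reproduced, with shortened data.

```python
def walk(name, node, path=""):
    """Yield (path, kind, attrs) for a node and everything beneath it.

    A node that holds "values" is treated as a dataset; any other node
    is treated as a group whose members are listed under "children".
    """
    here = f"{path}/{name}"
    kind = "Dataset" if "values" in node else "Group"
    yield here, kind, node.get("attrs", {})
    for child_name, child in node.get("children", {}).items():
        yield from walk(child_name, child, here)


# Stand-in for part of the listing (values shortened for illustration).
root = {
    "attrs": {"fig_description": "Bessel Functions J0, J1 and J2", "n_groups": 3},
    "children": {
        "J0": {
            "attrs": {"legend": "J0"},
            "children": {
                "x_values": {"attrs": {"units": "s"}, "values": [0.0, 0.2]},
                "y_values": {"attrs": {"units": "m"}, "values": [1.0, 0.99]},
            },
        }
    },
}

for path, kind, attrs in walk("root", root):
    print(path, f"({kind})", attrs)
```

Since every figure file follows the same organization, one traversal of this kind is enough to inspect any of them.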

Python - The figure

Python

from scipy.special import jv
from h5_data import h5_data
# plotting helpers used on the following slides
from numpy import linspace
from matplotlib.pyplot import plot, title, xlabel, ylabel, legend

file_name = 'Fig_4'
fig_description = 'Bessel Functions J0, J1 and J2'
fig_source = 'Phys. Plasmas 17, 1234 2010'
comment = 'This is the way the ball bounces'
user_fullname = 'John Doe'

# Create the data file, with file-level metadata
hdf_file = h5_data("%s.hdf5" % (file_name,),
                   fig_description=fig_description,
                   fig_source=fig_source,
                   comment=comment,
                   user_fullname=user_fullname)

Python (2)

# Draw the first curve
x = linspace(0, 20)
y0 = jv(0, x)
plot(x, y0, '-b', label='J0')
x_units = 's'
x_label = 'time (s)'
y0_units = 'm'
y0_label = 'height (m)'

# Add the first curve to the file
hdf_file.add_dataset('J0', x, y0,
                     legend=None,
                     plot_info='Blue Line',
                     x_units=x_units, x_label=x_label, x_datatype='float',
                     y_units=y0_units, y_label=y0_label, y_datatype='float')

Python (3)

# Draw the second curve
y1 = jv(1, x)
plot(x, y1, '-g', label='J1')
y1_units = 'm'
y1_label = 'height (m)'

# Add the second curve to the file
hdf_file.add_dataset('J1', x, y1,
                     legend=None,
                     plot_info='Green Line',
                     x_units=x_units, x_label=x_label, x_datatype='float',
                     y_units=y1_units, y_label=y1_label, y_datatype='float')

Python (4)

# Draw the third curve
y2 = jv(2, x)
plot(x, y2, '-r', label='J2')
y2_units = 'm'
y2_label = 'height (m)'
title(fig_description)
xlabel(x_label)
ylabel(y0_label)

# Add a legend
legend(loc='upper right')

# Add the third curve to the file
hdf_file.add_dataset('J2', x, y2,
                     legend=None,
                     plot_info='Red Line',
                     x_units=x_units, x_label=x_label, x_datatype='float',
                     y_units=y2_units, y_label=y2_label, y_datatype='float')

The Result

<HDF5 file "Fig_1.hdf5" (mode r, 12.4k)> (File) /
  root (Group) /root
    ('user_fullname', 'John Doe')
    ('user_id', 'g')
    ('date', 'Thu Feb 4 13:52:10 2016')
    ('fig_description', 'Bessel Functions J0, J1 and J2')
    ('fig_source', 'Phys. Plasmas 17, 1234 2010')
    ('n_groups', 3)
    J0 (Group) /root/J0
      ('group1plotting information', 'black line')
      ('legend', 'J0')
      x_values (Dataset) /root/J0/x_values len = (100,)
        ('units', 's')
        ('axislabel', 'time (s)')
        ('datatype', 'float')
        ('nx', 100)
      y_values (Dataset) /root/J0/y_values len = (100,)
        ('units', 'm')
        ('axislabel', 'height (m)')
        ('datatype', 'float')
        ('ny', 100)
    J1 (Group) /root/J1
      ('group1plotting information', 'red line')
      ('legend', 'J1')
      x_values (Dataset) /root/J1/x_values len = (100,)
