Analyzing DMPs to inform research data services
Lessons from the DART Project
IDCC 2016 | Amanda L. Whitmire | http://orcid.org/0000-0003-2429-8879
25 Feb. 2016 @DMPResearch | @AWhitTwit 2
Acknowledgements
Amanda Whitmire | Stanford University Libraries
Jake Carlson | University of Michigan Library
Patricia M. Hswe | Pennsylvania State University Libraries
Susan Wells Parham | Georgia Institute of Technology Library
Brian Westra | University of Oregon Libraries
This project was made possible in part by the Institute of Museum and Library Services, grant number LG-07-13-0328.
DART Team
US Context for DMPs
~23 federal agencies now require a DMP with proposals
DMPTool offers 30 DMP templates
Funding is very limited
DMPs are useful sources of information about researcher knowledge, capabilities, practices & needs*
*caveats, etc.
Levels of data services
The basics: DMP review, workshops, website
Mid-level: dedicated research services, metadata support, facilitate deposit in DRs, consults
High level: infrastructure, data curation
From: Reznik-Zellen, Rebecca C.; Adamick, Jessica; and McGinty, Stephen. (2012). "Tiers of Research Data Support Services." Journal of eScience Librarianship 1(1): Article 5. http://dx.doi.org/10.7191/jeslib.2012.1002
Informed data services development
Survey, DCPs, DMPs
Goal: A tool for consistent & robust analysis of DMPs
Performance levels: Complete / detailed; Addressed issue, but incomplete; Did not address issue

General assessment criteria

Describes what types of data will be captured, created or collected (Directorates: All NSF)
• Complete / detailed: Clearly defines data type(s), e.g. text, spreadsheets, images, 3D models, software, audio files, video files, reports, surveys, patient records, samples, final or intermediate numerical results from theoretical calculations, etc. Also defines data as observational, experimental, simulation, model output or assimilation.
• Addressed issue, but incomplete: Some details about data types are included, but the DMP is missing details or wouldn’t be well understood by someone outside of the project.
• Did not address issue: No details included; fails to adequately describe data types.

Directorate- or division-specific assessment criteria

Describes how data will be collected, captured, or created (whether new observations, results from models, reuse of other data, etc.) (Directorates: GEO AGS, GEO EAR SGP, MPS AST)
• Complete / detailed: Clearly defines how data will be captured or created, including methods, instruments, software, or infrastructure where relevant.
• Addressed issue, but incomplete: Missing some details regarding how some of the data will be produced; makes assumptions about reviewer knowledge of methods or practices.
• Did not address issue: Does not clearly address how data will be captured or created.

Identifies how much data (volume) will be produced (Directorates: GEO EAR SGP, GEO AGS)
• Complete / detailed: Amount of expected data (MB, GB, TB, etc.) is clearly specified.
• Addressed issue, but incomplete: Amount of expected data (GB, TB, etc.) is vaguely specified.
• Did not address issue: Amount of expected data (GB, TB, etc.) is NOT specified.
https://osf.io/kh2y6/
A few results
Data type & format across disciplines (%)
Data types: observational, model results, experimental, qual./quant., geospatial, code, etc.
Data formats: hand-written notes, NetCDF, *.xlsx, *.csv, *.shp, *.shx, *.dbf, *.mp4, R, etc.
Data sharing venues across disciplines (%)
Metadata across disciplines (%)
Use a digital tool for collecting your assessment data
Forces consistency
Produces co-located data
Facilitates analysis
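The three benefits listed can be illustrated with a minimal sketch, assuming a simple Python data model. The names `Level`, `Score` and `percent_by_level` are hypothetical and are not part of the DART rubric or any DART software, and the example ratings are made up, not project results. Restricting ratings to a fixed set of performance levels forces consistency, storing every rating as one record keeps the data co-located, and a per-criterion, per-discipline percentage (like the "across disciplines (%)" charts) becomes a short computation.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The three rubric performance levels (hypothetical encoding)."""
    COMPLETE = "Complete / detailed"
    INCOMPLETE = "Addressed issue, but incomplete"
    NOT_ADDRESSED = "Did not address issue"


@dataclass(frozen=True)
class Score:
    """One reviewer's rating of one DMP on one criterion (hypothetical schema)."""
    dmp_id: str
    discipline: str
    criterion: str
    level: Level  # only Level values allowed, so free-text ratings can't creep in


def percent_by_level(scores, criterion, discipline):
    """Percent of ratings at each level for one criterion within one discipline."""
    subset = [s.level for s in scores
              if s.criterion == criterion and s.discipline == discipline]
    if not subset:
        return {}  # no ratings recorded for this criterion/discipline pair
    counts = Counter(subset)
    # Counter returns 0 for levels that never occur, so every level gets a value
    return {level: 100 * counts[level] / len(subset) for level in Level}


# Illustrative, made-up ratings (not DART data)
example_scores = [
    Score("dmp-01", "GEO", "data types", Level.COMPLETE),
    Score("dmp-02", "GEO", "data types", Level.INCOMPLETE),
    Score("dmp-03", "GEO", "data types", Level.COMPLETE),
    Score("dmp-04", "GEO", "data types", Level.NOT_ADDRESSED),
]

if __name__ == "__main__":
    for level, pct in percent_by_level(example_scores, "data types", "GEO").items():
        print(f"{level.value}: {pct:.0f}%")
```

In practice a form or spreadsheet with locked drop-down values gives the same consistency; the point of the sketch is only that structured, co-located ratings make the analysis step trivial.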
Assess what the DMP guidelines stipulate, not what you think the DMP should include
Ideal DMP guidance vs. actual DMP guidance
“Provide a description of the data you will collect or re-use, including the file types, dataset size, number of expected files or sets, and content. …
Consider the following:
• What data will be generated in the research?
• What data types will you be creating or capturing?
• How will you capture or create the data?
• If you will be using existing data, state this and include how you will obtain it.
• What is the relationship between the data you are collecting and any existing data?
• How will the data be processed?
• What quality assurance & quality control measures will you employ?”
DMPTool guidance on data types
“Types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project.”
General NSF DMP guidance on data types
http://dmpresearch.library.oregonstate.edu/
https://osf.io/kh2y6/
Amanda Whitmire
Thank you!
Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by/4.0/
Creative Commons & the double C in a circle are registered trademarks of Creative Commons in the United States & other countries. Third party marks & brands are property of their respective holders.
Please attribute Amanda Whitmire with a link to this presentation at SlideShare.net