Research Data Management in Canada
Perceptions and Practices Across the Disciplines
Presented by Melissa Cheung, University of Ottawa and Alexandra Cooper, Queen’s University
21 July 2020
Background: The Survey Project
Canadian RDM Survey Consortium
● 2015: Science and Engineering Survey
● 2016: Humanities and Social Sciences Survey
● 2017: Health Sciences and Medicine Survey
● 2018: Consortium reaches 14 institutions
● 2019: National Dataset
● 2020: Analysis of National Dataset
The Survey
● Common survey instrument consisted of 4 sections with questions that reflect the data lifecycle
● Each institution ran surveys individually using their own survey software
  ○ Customized the survey based on institutional characteristics and interest
  ○ Obtained ethics approval from their own research ethics board
Merging datasets
● Recode institutional datasets to match standard codebook
  ○ Regroup rank, funding variables
  ○ Regroup faculty/department into generic list of “field of study”
  ○ Remove text responses from analysis
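The recoding steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the department names, the `other_comments` text column, and the fallback to "Interdisciplinary/Other" for unmapped departments are all hypothetical, since the slides do not show the consortium's actual codebook or tooling.

```python
# Hypothetical department-to-field mapping; the real codebook is not shown in the slides.
FIELD_OF_STUDY = {
    "Department of History": "Arts/Humanities",
    "Civil Engineering": "Engineering",
    "School of Nursing": "Health Science",
}

# Hypothetical free-text columns to drop before analysis.
TEXT_COLUMNS = {"other_comments"}


def recode_record(record, field_map=FIELD_OF_STUDY, text_columns=TEXT_COLUMNS):
    """Return a copy of one response recoded to the shared scheme."""
    # Drop free-text responses.
    recoded = {k: v for k, v in record.items() if k not in text_columns}
    # Regroup the institution-specific department into a generic field of study.
    department = recoded.pop("department", None)
    # Assumption: unmapped departments fall back to "Interdisciplinary/Other".
    recoded["field_of_study"] = field_map.get(department, "Interdisciplinary/Other")
    return recoded


def merge_datasets(datasets):
    """Recode every institutional dataset and stack the rows into one list."""
    merged = []
    for institution, records in datasets.items():
        for record in records:
            row = recode_record(record)
            row["institution"] = institution
            merged.append(row)
    return merged


example = {
    "University A": [{"department": "Civil Engineering", "other_comments": "n/a"}],
    "University B": [{"department": "Unknown Unit"}],
}
merged_rows = merge_datasets(example)
```

Keeping the mapping in a single dictionary means every institution's data passes through the same lookup, which is the point of recoding to one standard codebook.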
[Figure: codebook excerpt (April 2020) for the funding variables, with one variable each for CIHR, SSHRC, CFI and NSERC (for example, FUNSSHRC). Each variable records whether the respondent has used the funding source in the last 5 years or intends to use it in the next 5 years. Survey question: "Which funding sources have you used within the past 5 years, or are planning to apply for in the next 5 years? Please exclude funding earmarked exclusively for operations and infrastructure. Select all that apply." Values: 0 = Not chosen; 1 = Chosen; 99 = Not declared; a separate code marks "Not asked by administering institution". The counts and percentages are not reliably legible in this copy.]
Who participated?
Survey Results
Respondent Field of Study
Graduate students made up 32.5% (n=778) of all survey respondents, and between 20.5% and 43.7% of each disciplinary group included in the survey.
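As a rough check on scale, the share and count above imply an overall sample size. The sketch below does that arithmetic. The total is not stated on the slide, so the result is only an estimate, and the bounds assume that 32.5% was rounded to one decimal place.

```python
grad_count = 778     # graduate student respondents (from the slide)
grad_share = 0.325   # 32.5% of all respondents (from the slide)

# Point estimate of the total number of respondents.
implied_total = grad_count / grad_share

# Assumption: the share was rounded to one decimal place, so the true
# value lies somewhere between 32.45% and 32.55%.
low_total = grad_count / 0.3255
high_total = grad_count / 0.3245

print(round(implied_total))  # 2394
```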
[Figure: horizontal bar chart of the number of respondents (axis 0 to 500) by field of study: Social Science; Arts/Humanities; Science; Engineering; Medicine/Preclinical Sciences; Business/Management, Education, Law; Health Science; Interdisciplinary/Other.]
Funding sources
[Figure: bar chart of funding sources by percentage (%) of respondents (axis 0 to 60): Other; Province; SSHRC; NSERC; None; CIHR; CFI; Industry.]
Current RDM Practices
(reported at the time of the survey)
How are researchers working with and managing their data?
Estimated data storage size for a typical project
[Figure: horizontal bar chart of estimated data storage size by percentage (%) of respondents (axis 0 to 60). Categories: < 50GB (Gigabyte); 50GB to < 500GB; 500GB to < 1000GB; > 1TB (Terabyte); Not sure; Not applicable. Two data labels survive in this copy, 55.1 and 11.6, but the bars they belong to cannot be recovered.]
Where are researchers storing their data?
Note: respondents could select all that applied
[Figure: horizontal bar chart of storage locations by percentage (%) of respondents (axis 0 to 60). Categories: Not sure; Grid/high performance computing (HPC) centre; Other; CD/DVD; External data repository; Hard drive of the instrument/sensor which generates the data; Physical copy retained; Shared drive/university or departmental server; Cloud/web based solution; Flash drive/USB; External hard drive; Computer hard drive; Laptop hard drive.]
Data kept after project
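[Figure: grouped horizontal bar chart of how long data are kept, by percentage (%) of respondents (axis 0 to 60). Categories: I only keep data for the length of the project; Less than 3 years; Between 3-5 years; Between 5-10 years; More than 10 years; Until the data becomes inaccessible or lost. Series: Source/Survey/Raw Data; Intermediate/Working Data; Processed data ready for publication. Two data labels survive in this copy, 20.5 and 19.5, but the bars they belong to cannot be recovered.]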
Researcher Readiness for Tri-Agency RDM Policy
Would researchers be able to meet Tri-Agency RDM policy requirements for data management plans (DMPs) and to deposit data in digital repositories?
Audience poll: We asked researchers if they would be able to draft a DMP as part of a grant application. Guess which of the following statements most researchers said best describes their situation.
A) They do not need help drafting a DMP.
B) They prefer having help and/or guided documentation.
C) They need help and/or guided documentation to draft a DMP.
What if researchers were asked to draft a Data Management Plan (DMP)?
[Figure: horizontal bar chart, percentage (%) of respondents choosing each statement (axis 0 to 60)]
● No help (14%): I would be able to draft a data management plan that would address these types of questions without assistance
● Prefer help (35.7%): I would be able to draft a data management plan but would prefer to have assistance and/or documentation to ensure the success of my application
● Need help (50.3%): I would need assistance and/or guided documentation to appropriately address some or all of the sections
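The three shares reported above (14, 35.7 and 50.3) can be combined to see how many respondents would want or need some form of DMP support. The short sketch below does that arithmetic. The dictionary keys reuse the chart's short labels, and the combined figure is derived here rather than stated on the slide.

```python
# Percentages of respondents, as reported on the chart.
dmp_responses = {"No help": 14.0, "Prefer help": 35.7, "Need help": 50.3}

# Sanity check: the three options should account for all respondents.
total = round(sum(dmp_responses.values()), 1)

# Respondents who would either prefer or need assistance.
want_or_need_help = round(
    dmp_responses["Prefer help"] + dmp_responses["Need help"], 1
)

print(total, want_or_need_help)  # 100.0 86.0
```

Rounding to one decimal place avoids floating-point noise in the sums.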
Data Sharing
[Figure: grouped horizontal bar chart of methods of sharing, Current versus Future, by percentage (%) of respondents (axis 0 to 60). Methods: Other; Institutional repository; Discipline-specific repository; Institutional/personal website; Supplementary materials files to a journal publisher; Online with restricted access; Not sharing/Not planning to share; Personal request only.]
Restrictions or embargoes on data sharing
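[Figure: horizontal bar chart of restrictions or embargoes by percentage (%) of respondents (axis 0 to 60). Categories: Public safety/sensitive nature; Commercial concerns; Other; Plan to file patent; Not sure; Contractual obligations to third party; Intellectual property rights; No restrictions; Privacy or ethical restrictions; Need to publish before sharing. Three data labels survive in this copy (21.8, 30.2 and 30.4), but the bars they belong to cannot be recovered.]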
Restrictions or embargoes on data sharing
● Arts/Humanities and Science more likely to report no restrictions on data sharing
● Engineering and Science most likely to report need to publish first
● All other disciplines cited privacy, confidentiality, or ethics restrictions
● Arts/Humanities divided between need to publish first and privacy, confidentiality, or ethics restrictions
Image by OpenClipart-Vectors from Pixabay
Audience poll: What do you think is the most common reason why researchers would not be willing to share their data?
A) Researchers believe their data should not be shared
B) The data are incomplete or not finished
C) Researchers still wish to derive value from the data
D) Funding body does not require sharing
Reasons for not sharing research data and associated methods/tools
[Figure: horizontal bar chart of reasons for not sharing by percentage (%) of respondents (axis 0 to 60). Categories: Did not know could share; Not required by funding body; Lack technical skills; Not useful to others; Should not be shared; Other; No place to put data; Lack of funding; Lack of standards; Do not hold rights; Insufficient time; Concerns about citation; Willing to share; Privacy/legal/security reasons; Want to derive value; Data are incomplete. Surviving data labels include 10.8, 12.8, 16.2, 22 and 22.6, but the bars they belong to cannot be recovered from this copy.]
Back to our research question
Would researchers be able to meet Tri-Agency RDM policy requirements for data management plans (DMPs) and to deposit data in digital repositories?
Answer: NOPE
(Not based on their RDM practices as reported, anyway)
RDM Services
What supports do researchers want or need to create DMPs for grant applications?
What supports do researchers want or need to deposit their data?
Interest in RDM services
[Figure: stacked horizontal bar chart of interest in RDM services by percentage (%) of respondents (axis 0% to 100%). Series: Interested; Not Interested; Not Applicable. Services: Faculty or Graduate Workshop; Assistance with DMP preparation; Communication about funding/journal requirements; Institutional repository; Personalized consultations; Data storage during active projects; Assistance with preservation/sharing; Finding/accessing data sources; Assistance with metadata creation; External repository; Permanent identifiers/DOIs; Digitization of physical records. Surviving data labels range from 55.3 to 83.6 for the larger segments and from 10 to 31.5 for the smaller ones, but they cannot be matched to individual services in this copy.]
Interest in services: differences by discipline
Field of study                         Most popular RDM service                                   Interested
Engineering                            Assistance with DMP preparation                            82.52%
Arts/Humanities                        Assistance with DMP preparation                            79.88%
Business/Management, Education, Law    Assistance with DMP preparation                            84.38%
Social Sciences                        Assistance with DMP preparation                            80.83%
Medicine/Preclinical Sciences          Communication/info about funding/journal requirements      80.18%
Health Science                         Communication/info about funding/journal requirements      91.51%
Science                                Institutional repository                                   77.36%
Interdisciplinary/Other                Institutional repository                                   84.38%
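For anyone reusing these figures, the rows above can be held as simple records and grouped by service. The sketch below is one illustrative way to do so. The variable names are hypothetical, and the values are copied from the table rather than drawn from the National Dataset.

```python
from collections import defaultdict

# (field of study, most popular service, % interested), copied from the table.
top_service_by_field = [
    ("Engineering", "Assistance with DMP preparation", 82.52),
    ("Arts/Humanities", "Assistance with DMP preparation", 79.88),
    ("Business/Management, Education, Law", "Assistance with DMP preparation", 84.38),
    ("Social Sciences", "Assistance with DMP preparation", 80.83),
    ("Medicine/Preclinical Sciences",
     "Communication/info about funding/journal requirements", 80.18),
    ("Health Science",
     "Communication/info about funding/journal requirements", 91.51),
    ("Science", "Institutional repository", 77.36),
    ("Interdisciplinary/Other", "Institutional repository", 84.38),
]

# Group the fields of study by the service each one ranked first.
fields_by_service = defaultdict(list)
for field, service, interested in top_service_by_field:
    fields_by_service[service].append(field)

# Row with the highest level of interest in its top-ranked service.
highest = max(top_service_by_field, key=lambda row: row[2])

for service, fields in fields_by_service.items():
    print(f"{service}: {len(fields)} field(s)")
```

Grouped this way, assistance with DMP preparation is the top-ranked service in four of the eight fields of study.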
Some Implications
How can the survey results help us with developing RDM services?
Gaps between practices and policy requirements
● Clarification on policies related to data sharing and deposit in digital repositories
  ○ Privacy, confidentiality and ethics
  ○ Funding and journal policies
  ○ Which data should be retained and/or shared
● High demand for assistance with DMPs and institutional repositories for data preservation
But wait, there’s more!
● Plans to publish our findings
● National Dataset will be made publicly available
● Learn more about the project on the Canadian RDM Survey Consortium’s page of Portage’s website
Image by Mohamed Hassan from Pixabay
Thank you! Questions?