SDM Survey Results
Jan 14, 2016
N=46
Q3
Others
Agriculture
Information Science and Management
Enterprise Architecture
Astrophysics
Environmental Policy
Scientific Data Management
N=46
Q4
Other
ORNL-UT/Battelle employee
Technical Manager Research Institute
N=46
Q5
N=43
Q6
Other
IR R&D budgetary data for the agencies
Produce and manage development of information products and analyses that use data, but the data are mostly social science and policy data (not scientific data).
N=43
Q7
Other
User Services staff
Science Data Processing Systems staff
Data Management Policy
Data Preservation Manager
Digital Format Specialist
N=39 N=36
Q8 Q9
#2 Manage Data as Enterprise Asset/Liability
N=37
Q10
Required
Optional
Guidelines on preserving the provenance of data
Association with derived scholarly communication
Provision of life cycle cost and level of services for data management for projects
Auditing (data and security)
Data Security and Catastrophe Planning
Data retention policy
Confidentiality, Integrity, and Access (general and for people with disabilities) requirements/required
Internal & external stakeholders
Establishing the high-level adoption of a Data Lifecycle Management Framework for managing data assets
Security management
Time frame associated with Policy (how often does policy get updated?)
Linkages with other intra- and inter-agency efforts
Policy on using standards
Recognition of the 'resource' impact a policy can have. Policies can be big cost drivers for projects.
long term preservation and archiving
Points of Contact (individuals and agencies responsible)
N=12
Q11 Are there any agency level data management policy elements not on the list which should be? Please list each on a separate row, following the format "element/(required)" or "element/(optional)".
N=38
Q12
N=38
Q13 Are there any program, grant, or research project level data management policy elements not on the list which should be? Please list each on a separate row, following the format "element/(required)" or "element/(optional)".
Required Optional
Grant/contract number
Adherence to standards
Not distinguished as Required/Optional
Documentation on the application and technology that generate and utilize the data
Funding source, if it is different from those listed
Documentation on access tools and related technologies for the data
When data will be available to peers and to the public
Cross-references/links from data sets to/from other types of information (STI documents, project documentation) (2)
What data will be preserved for secondary research versus what will be maintained to validate specific research
Data volumes and growth rate
What is sufficient documentation/metadata for measurement data to be understood by a knowledgeable user
Data migration plans for technology obsolescence
N=36
Q14
If Yes, what would be different?
More stringent data quality, provenance, etc. evaluation; possibly reduced commitment to preservation
When data is available to other users, who is responsible for maintaining data, etc
Data from extramural research should be fully open and made publicly available either: 1) after publication, or 2) after a set amount of time if the data is not published. This would require the creation of a data repository.
More detail on data rights since they are more complex
Any potential conflicts between the agency data management policy and the extramural organization's data management policy would need to be resolved as part of the data management plan for data generated by the extramural organization.
Data from alternative sources would require flexibility to ensure access to all relevant data sources. However, there should be a "minimum" expectation that may not match the host institution's plans.
I would expect greater rigor for extramural than for in-house where in some cases I would expect exploratory work to be carried out.
Managing data can be VERY EXPENSIVE. When the cost to manage the data is more than the cost to generate the data, then there is a problem. The scope of the 'plan' needs to be adaptable.
N=36
Q14
Cont…
I think not all the data management items apply to all cases; however, considering them all and marking items that don’t apply as N/A should be required. Data from extramural research would likely be transferred to an institutional (government) organization for preservation and longer-term management, so the plans should cover when and how such transitions occur.
Only if it is clear that another party/cooperator/funder has the responsibility for data management.
Data management for extramural data may be best handled by the extramural researcher. Links between the in-house research management systems and the extramural system should be developed.
Data management that is external to science: for example, our data management includes descriptive information about our research facilities and the history of research performed at each location. There may be stronger requirements to document methods, landscape, etc., because there would be no in-house understanding upon completion of a project. For example, if a university performs some field tests for us, we may require greater documentation than if the tests were performed on one of our fields.
Links to publications, links to biographies of scientists
N=35
Q15
N=32
Q16 What are the key impediments you see, if any, to managing scientific data as an enterprise asset at your agency? Please indicate the top two barriers by labeling them in rank order, (1) & (2).
N=37
#3 Full Life Cycle Management
Q17 What percent of your efforts/resources are you willing to allocate to documenting, maintaining, and making your data available for reuse?
N=36
Q18 As a producer of data or in thinking about others who produce data, in your community of practice, what would be a reasonable expectation for your organization to maintain and make available the information necessary to support reuse of data?
N=36
Q19
N=36
Q19
Other
The distinction here between 'archive' and 'agency repository' is unclear, but needs to be made clear by the underlying policy.
As established by the relevant document/records schedule for that data.
My project has the responsibility for archiving and distributing NASA’s Earth science data for the foreseeable future as long as active research is being conducted on the data from historical as well as future missions.
This question touches a core issue of data management, at the end of the cycle of development and use by an entity: archive versus library is the likely choice; but it is highly dependent on the content and security of the data.
It depends on the project and contract status. NASA contract data should go to the appropriate NASA archive; other SI data should probably go to agency archives/repositories/museums.
It depends on the data. Data retention policies are domain and application specific.
Don't foresee an end to the time period during which we maintain the data and make them available
Any of the above are possibilities, depending on the value of the data and compliance with laws, regulations, and NARA requirements
N=38
Q20
N=38
Q21
#4 Documentation to Implement…..
N=36
Q22
Other
Author
Taxonomies and other metadata necessary to link the “found” data with other datasets
A statement describing the original intended use of the data.
“Fitness for use”
Known limitations of the data
Data Format definition
Code libraries and code snippets (as examples), and sometimes the code itself
Ancillary information including calibration, validation, etc.
Software toolkits
Sensor calibration data in easy to use (computer readable and easy to parse) format
Geospatial information (location and datum)(2)
Calibration references
Space
Time format (m.d.y h.m.s)
Time as it relates to other events (e.g., two hours after a heavy rain event, two weeks after plowing, etc.)
N=35
Q23
Linking of publication with data source
Federated search across information systems
Standardization of metadata across information systems
Getting permission to access this data
Need to derive metadata from existing metadata
All of the above are cited as reasons for not finding data in our data systems
Find multiple versions with uncertain lineage
Incorrect or misleading metadata
Insufficient information on QA/QC procedure details
Dead links in metadata
N=36
Other
Researcher-submitted data
Initially deciding whether to include those data in the archival collection, which has multiple considerations
Comparison with data from other sources
Q24
N=36
Q25
Retired field book data
Historic baseline information (4):
-Apollo from NASA
-Coastal aerial photos from the 1970s
Data for projects terminated due to lack of funding
People retire and the data they collected are not preserved
Lots of biological species inventories
Survey results which may need to be preserved for later verification of results, or for possible repurposing to new experimental uses
Sometimes, our data managers will come to us with data sets that should be archived at our NASA ES data centers. They are reviewed in a formal process for decisions about whether they should be included.
Manual weather records at a county park that are going to be recycled.
We refer to these as "orphans" where no programmatic sponsor exists. We have had to address several collections falling into this category. There is a group proposing that ICSU CODATA form a group to inventory data needing rescue. The proposal is to be presented next October in South Africa.
Large quantities of phenological data from individual observers. Various examples of data from professors who are nearing retirement.
Historic measurement data that have not been archived
Extramural research data is usually very poorly managed and preserved, and it gets worse over time. It is usually left up to the principal researcher or to university archives that do not have the resources or skill levels to maintain it.
EPA EMAP data, but no program is stepping up to manage the data
Technical Reports
N=36
Q25 Are you aware of scientific data that no longer has a custodian but might need to be preserved for future use?
N=37
PubMed
Search agency awards databases
Research.gov
goodsearch.com
Many of our users find us through related data centers or our own indexes and portals, like GCMD.
USGS seamless server
Maryland Departments of Natural Resources and Environment sites
Various other federal agency sites I have found during previous searches
In astrophysics, the ADS service, going directly to major archives, VAO (NVO)
GCMD
Various domain-specific metadata repositories
Reading the literature relevant to the problem
Q26
Other
PubMed
Search agency awards databases
Goodsearch.com
Portals like GCMD
USGS Seamless Server
In astrophysics, the ADS service, going directly to major archives, VAO (NVO)
Reading the literature relevant to the problem
N=34
Q27
N=26
Q28
N=36
Q29
N=36
Q30
Other
Both raw and embedded
Publication Data
Process the raw data through a pipeline, then use higher-level data; keep the raw data on standby
Do you typically use “raw” data directly in your analyses, or do you use data and information embedded in published and unpublished reports and/or other resources?
N=34
Q31
N=36
N=32
N=12, data stewards only
N=36
N=32 If you are a data steward, how many inquiries do you get per year?
N=36
#5 Controlling Access
Q33
N=20
Q34
Other
In-kind sharing
Promoting data sharing relationships (2)
Senior management directive
Incentive to develop a better product (2)
N=35
Q35
If yes (there are sources that are unavailable, etc.), what makes them difficult to access and evaluate?
Cultural roadblocks for submitting researchers
Insufficient metadata (4)
The external data repositories are not designed well. The data formats used are not intuitive. The data format is too permissive -- it does not require enough fields, and data depositors are expected to fill out the forms themselves. This results in different levels of documentation, and documentation not being filled out properly. There is no QA in the process, and the data curators who operate the repository do not actually use the data.
Kept by researcher and not made publicly available or findable.
Owners treat datasets as proprietary
Datasets are not made accessible online
Confidentiality and privacy policies require extensive negotiations over:
-Gaining access (conflicting information about proprietary access) (5)
-Distribution constraints
-Other intellectual property issues
They may not be up to date or the data set may be large and difficult to get because distribution can be expensive.
Maintained by other agencies/entities
Possessive mentality (‘data hoarding’)
Q36 N=35
N=35
Q36
Continued. . .
Sometimes no caretaker or expert on the data
Sometimes no resources for making the data available (peer-reviewed data to be purchased, copyright and other costs, digitizing, publishing) (5)
Sometimes insufficient means to identify existence of data
Lack of agency response to requests
Security restrictions
N=36
Q37 What categories do you use when marking data (e.g., Official Use Only, OUO)?
Q38
SBU (Sensitive But Unclassified) (6)
Agency Deliberative
CBI (Confidential Business Information) (3)
Contract sensitive
Deprecation
FOUO
OUO (4)
Proprietary (4)
ITAR
Secret
Agency Only
Competition Sensitive
Program Use Only
Regional Use Only
See DoD Directive 5230.24, Distribution Statements on Technical Documents
N=36
Q37 In your community of practice, do you have controls for unclassified but restricted information?
N=36
Q39
N=36
Q40 How easy is it to know about rights and restrictions on the use of data?
N=35
Q41 Setting cultural issues aside, if an agency places formal restrictions or controls on data you need (e.g., unclassified but restricted information such as Confidential Business Information), how hard or easy is it for you to implement the controls imposed by other agencies?
N=36
Q42 Are you aware of the federal initiative to coordinate and manage various categories of Controlled Unclassified Information (CUI)?
N=36
Q43
N=36
Q44
N=34
Q45
Solution Recommendations
Methods registries; equipment metadata; some data grids
EPA’s environmental connector, Science FTP Server
Hosting with non-governmental entities
EPA science subnet
Create a restricted-access data enclave (virtual repository with role-based access control)
Dilute security controls that are routinely implemented (5):
-General (includes access, firewalls, etc.)
-Limits on size of electronic transfers
“Waiver” policy
Negotiations amongst similar data holders
Promote use of sunset dates
#6 Version Control
N=35
Q46
N=35
Q47
N=38
Q48
In planning and managing science projects do agency or program procedures exist for change control on data during the project?
N=33
#7 Preservation Commensurate with Value
Q49
N=31
Q50
N=31
Q51
Other
Rate of change of the contextual environment of the data (i.e. does the data become outdated or outmoded over time)
Operational Program needs/requirements
Reliability, Usability, Integrity, Authenticity
N=33
Q52
Other
Combination of fixed SOP, funding entity, PI, and data stewards/archivist
Peer advisory panels
Established records schedule for that data
Combination of NASA, Data Center managers, and users
Researcher decides “if”; archive decides “how long”
Appraisal archivists
Records management schedules for scientific data
N=33
Q52
In your agency, who decides if and how long data will be retained?
N=34
Q53
If yes, in your opinion, are those records schedules sufficiently granular and precise to address which data sets should be kept and for how long?
Q54
N=34
Q53
Does your agency apply National Archives record retention and disposition schedules to scientific data?
N=31
#8 Knowledge Management Context
Q55
Do you proactively try to make the data generated through your research program available and usable by others?
N=35
Q56
No (explain briefly why not)
No standard policies exist for ensuring that we maintain proper metadata documentation--therefore, I maintain such documentation myself on an ad hoc basis.
Better answer is 'not always'. Many questions are difficult to answer based solely on written documentation, because there is no meaningful way to encode the nuances of metadata in standardized forms.
We strongly attempt to achieve this, but I believe it is the rare occasion when such a level of completeness is achieved.
N=35
Q57
N=31
Q58
Other
Publications
Integrating other relevant information objects is done ad hoc, typically a reflection of the project author's attentiveness to documenting a "complete picture" of the data being developed / assessed.
N=35
Q59
N=36
Q60
Other
DTIC did a preliminary investigation of how DoD datasets (both unclassified and classified) could be made available through a DoD network. The study recommends developing a prototype system to demonstrate its value to the DoD user community and then expanding it in the future. Development of the prototype system is awaiting funding.
Geospatial information (where will data be collected)
Metadata associated with the data
Data dictionaries and glossaries/terminology
END