SDM Survey Results

Jan 14, 2016

farren

Page 1: SDM Survey Results

SDM Survey Results

Page 2: SDM Survey Results

N=46

Q3

Others

Agriculture

Information Science and Management

Enterprise Architecture

Astrophysics

Environmental Policy

Scientific Data Management

Page 3: SDM Survey Results

N=46

Q4

Other

ORNL-UT/Battelle employee

Technical Manager Research Institute

Page 4: SDM Survey Results

N=46

Q5

Page 5: SDM Survey Results

N=43

Q6

Other

IR R&D budgetary data for the agencies

Produce and manage development of information products and analyses that use data, but it is more social science and policy data (not scientific data).

Page 6: SDM Survey Results

N=43

Q7

Other

User Services staff

Science Data Processing Systems staff

Data Management Policy

Data Preservation Manager

Digital Format Specialist

Page 7: SDM Survey Results

N=39 N=36

Q8 Q9

#2 Manage Data as Enterprise Asset/Liability

Page 8: SDM Survey Results

N=37

Q10

Page 9: SDM Survey Results

Required

Optional

Guidelines on preserving the provenance of data

Association with derived scholarly communication

Provision of life cycle cost and level of services for data management for projects

Auditing (data and security)

Data Security and Catastrophe Planning

Data retention policy

Confidentiality, Integrity, and Access (general and for people with disabilities) requirements/required

Internal & external stakeholders

Establishing the high-level adoption of a Data Lifecycle Management Framework for managing data assets

Security management

Time frame associated with Policy (how often does policy get updated?)

Linkages with other intra- and inter-agency efforts

Policy on using standards

Recognition of the 'resource' impact a policy can have. Policies can be big cost drivers to Projects.

long term preservation and archiving

Points of Contact (individuals and agencies responsible)

N=12

Q11 Are there any agency level data management policy elements not on the list which should be? Please list each on a separate row, following the format "element/(required)" or "element/(optional)".

Page 10: SDM Survey Results

N=38

Q12

Page 11: SDM Survey Results

N=38

Q13 Are there any program, grant, or research project level data management policy elements not on the list which should be? Please list each on a separate row, following the format "element/(required)" or "element/(optional)".

Required Optional

Grant/contract number

Adherence to standards

Not distinguished as Required/Optional

Documentation on the application and technology that generate and utilize the data

Funding source, if it is different from those listed

Documentation on access tools and related technologies for the data

When will data be available to peers, to public

Cross-reference/links from data sets to/from other types of information (STI documents, project documentation) (2)

What data will be preserved for secondary research versus what will be maintained to validate specific research

Data volumes and growth rate

What is sufficient documentation/metadata for measurement data to be understood by a knowledgeable user

Data migration plans for technology obsolescence

Page 12: SDM Survey Results

N=36

Q14

If Yes, what would be different?

More stringent data quality, provenance, etc. evaluation; possibly reduced commitment to preservation

When data is available to other users, who is responsible for maintaining data, etc

Data from extramural research should be fully open and made publicly available either: 1) after publication, or 2) after a set amount of time if the data is not published. This would require the creation of a data repository.

More detail on data rights since they are more complex

Any potential conflicts between the agency data management policy and the extramural organization's data management policy would need to be resolved as part of the data management plan for data generated by the extramural organization.

Data from alternative sources would require flexibility to ensure access to all relevant data sources. However, there should be a "minimum" expectation that may not match the host institution's plans.

I would expect greater rigor for extramural than for in-house where in some cases I would expect exploratory work to be carried out.

Managing data can be VERY EXPENSIVE. When the cost to manage the data is more than the cost to generate the data, then there is a problem. The scope of the 'plan' needs to be adaptable.

Page 13: SDM Survey Results

N=36

Q14

Cont…

I think not all the items in the data management plan apply to all cases; however, considering them all and identifying items that don’t apply as N/A should be required. Data from extramural research results would likely be transferred to an institutional (government) organization for preservation and longer term management. So the plans should cover when and how such transitions occur.

Only if it is clear that another party/cooperator/funder has the responsibility for data management.

Data management roles for managing extramural data may be best managed by the extramural researcher. Links between the in-house research management systems and the extramural system should be developed.

Data management that is external to science. For example, our data management includes descriptive information of our research facilities, and history of research performed at location. There may be stronger requirements to document methods, landscape, etc., because there would not be in-house understanding upon completion of a project. For example, if a university performs some field tests for us, we may require greater documentation than if it was performed on one of our fields.

Links to publications, links to biographies of scientists

Page 14: SDM Survey Results

N=35

Q15

Page 15: SDM Survey Results

N=32

Q16 What are the key impediments you see, if any, to managing scientific data as an enterprise asset at your agency? Please indicate the top two barriers by labeling them chronologically, (1) & (2).

Page 16: SDM Survey Results

N=37

#3 Full Life Cycle Management

Q 17 What percent of your efforts/resources are you willing to allocate to documenting, maintaining, and making your data available for reuse?

Page 17: SDM Survey Results

N=36

Q 18 As a producer of data or in thinking about others who produce data, in your community of practice, what would be a reasonable expectation for your organization to maintain and make available the information necessary to support reuse of data?

Page 18: SDM Survey Results

N=36

Q 19

Page 19: SDM Survey Results

N=36

Q 19

Other

The distinction here between 'archive' and 'agency repository' is unclear, but needs to be made clear by the underlying policy.

As established by the established relevant document / records schedule for that data.

My project has the responsibility for archiving and distributing NASA’s Earth science data for the foreseeable future as long as active research is being conducted on the data from historical as well as future missions.

This question touches a core issue of data management, at the end of the cycle of development and use by an entity: archive versus library is the likely choice; but it is highly dependent on the content and security of the data.

It depends on the project and contract status. NASA contract data should go to the appropriate NASA archive. Other SI data probably to agency archives/repositories/museums.

It depends on the data. Data retention policies are domain and application specific.

Don't foresee an end to the time period during which we maintain the data and make them available

Any of the above are possibilities, depending on the value of the data and compliance with laws, regs, and NARA requirements

Page 20: SDM Survey Results

N=38

Q 20

Page 21: SDM Survey Results

N=38

Q21

Page 22: SDM Survey Results

#4 Documentation to Implement…..

N=36

Q 22

Other

Author

Taxonomies and other metadata necessary to link the “found” data with other datasets

A statement describing the original intended use of the data.

"Fitness for use."

Known limitations of the data

Data Format definition

Code libraries and code snippets (as examples) and sometimes the code itself

Ancillary information including calibration, validation, etc.

Software toolkits

Sensor calibration data in easy to use (computer readable and easy to parse) format

Geospatial information (location and datum) (2)

Calibration references

Space

Time format (m.d.y h.m.s)

Time as it relates to other events (e.g., two hours after a heavy rain event, two weeks after plowing, etc.)

Page 23: SDM Survey Results

N=35

Q 23

Linking of publication with data source

Federated search across information systems

Standardization of metadata across information systems

Getting permission to access this data

Need to derive metadata from existing metadata

All of the above are cited as reasons for not finding data in our data systems

Find multiple versions with uncertain lineage

Incorrect or misleading metadata

Insufficient information on QA/QC procedure details

Dead links in metadata

Page 24: SDM Survey Results

N=36

Other

Researcher submitted data

Initially deciding whether to include those data in the archival collection, which has multiple considerations

Comparison with data from other sources

Q 24

Page 25: SDM Survey Results

N=36

Q25

Retired field book data

Historic baseline information (4)
- Apollo from NASA
- Coastal aerial photos from 1970s

Data for projects terminated due to funding

People retire and data collected is not preserved

Lots of biological species inventories

Survey results which may need to be preserved for later verification of results, or for possible repurposing to new experimental uses

Sometimes, our data managers will come to us with data sets that should be archived at our NASA ES data centers. They are reviewed in a formal process for decisions about whether they should be included.

Manual weather records at a county park that are going to be recycled.

We refer to these as "orphans" where no programmatic sponsor exists. We have had to address several collections falling into this category. There is a group proposing that ICSU CODATA form a group to inventory data needing rescuing. The proposal is to be presented next October in South Africa.

Large quantities of phenological data from individual observers. Various examples of data from professors who are nearing retirement.

Historic measurement data that have not been archived

Extra-mural research data is usually very poorly managed and preserved, and gets worse over time. It is usually left up to the principal researcher or a university archives that does not have the resources or skill-levels to maintain it

EPA EMAP data, but no program is stepping up to manage the data

Technical Reports

Page 26: SDM Survey Results

N=36

Q25 Are you aware of scientific data that no longer has a custodian but might need to be preserved for future use?

Page 27: SDM Survey Results

N=37

• PubMed
• Search agency awards databases; Research.gov
• goodsearch.com
• Many of our users find us through related data centers or our own indexes and portals, like GCMD.
• USGS seamless server; Maryland Departments of Natural Resources, Environment sites; various other federal agency sites I have found during previous searches
• In astrophysics, the ADS service, going directly to major archives, VAO (NVO)
• GCMD; various domain-specific metadata repositories
• Reading the literature relevant to the problem

Q26

Other

PubMed

Search agency awards databases

Goodsearch.com

Portals like GCMD

USGS Seamless Server

In astrophysics, the ADS service, going directly to major archives, VAO (NVO)

Reading the literature relevant to the problem

Page 28: SDM Survey Results

N=34

Q27

Page 29: SDM Survey Results

N=26

Q28

Page 30: SDM Survey Results

N=36

Q29

Page 31: SDM Survey Results

N=36

Q30

Other

Both raw and embedded

Publication Data

Pipeline process the raw data, then use higher level data. Keep raw data on standby

Page 32: SDM Survey Results

N=36

Q30

Other

Both raw and embedded

Publication Data

Pipeline process the raw data, then use higher level data. Keep raw data on standby

Do you typically use “raw” data directly in your analyses, or do you use data and information embedded in published and unpublished reports and/or other resources?

Page 33: SDM Survey Results

N=34

Q31

Page 34: SDM Survey Results

N=36

N=32

N=12, data stewards only

Page 35: SDM Survey Results

N=36

N=32 If you are a data steward, how many inquiries do you get per year?

Page 36: SDM Survey Results

N=36

#5 Controlling Access

Q33

Page 37: SDM Survey Results

N=20

Q34

Other

In-kind sharing

Promoting data sharing relationships (2)

Senior management directive

Incentive to develop a better product (2)

Page 38: SDM Survey Results

N=35

Q35

If yes (there are sources that are unavailable, etc.), what makes them difficult to access and evaluate?

Cultural roadblocks for submitting researchers

Insufficient metadata (4)

The external data repositories are not designed well. The data formats used are not intuitive. The data format is too permissive -- it does not require enough fields, and data depositors are expected to fill out the forms themselves. This results in different levels of documentation, and documentation not being filled out properly. There is no QA in the process, and the data curators who operate the repository do not actually use the data.

Kept by researcher and not made publicly available or findable.

Owners treat datasets as proprietary

Datasets are not made accessible online

Confidentiality and privacy policies require extensive negotiations to:
- Gain access (conflicting information about proprietary access) (5)
- Distribution constraints
- Other intellectual property issues

They may not be up to date or the data set may be large and difficult to get because distribution can be expensive.

Maintained by other agencies/entities

Possessive mentality (‘data hoarding’)

Q36 N=35

Page 39: SDM Survey Results

N=35

Q36

Continued. . .

Sometimes no caretaker or expert on the data

Sometimes no resources for making the data available (peer reviewed data to be purchased, copyright and other costs, digitizing, publishing)(5)

Sometimes insufficient means to identify existence of data

Lack of agency response to requests

Security restrictions

Page 40: SDM Survey Results

N=36

Q37 What categories do you use with marking used (e.g. Official Use Only - OUO)?

Q38

SBU (Sensitive But Unclassified) (6)

Agency Deliberative

CBI (Confidential Business Information) (3)

Contract sensitive

Deprecation

FOUO

OUO (4)

Proprietary (4)

ITAR

Secret

Agency Only

Competition Sensitive

Program Use Only

Regional Use Only

See DoD Directive 5230.24, Distribution Statements on Technical Documents

Page 41: SDM Survey Results

N=36

Q37 In your community of practice, do you have controls for unclassified but restricted information?

Page 42: SDM Survey Results

N=36

Q39

Page 43: SDM Survey Results

N=36

Q40 How easy is it to know about rights and restrictions on the use of data?

Page 44: SDM Survey Results

N=35

Q41 Taking cultural issues aside, if an Agency places formal restrictions or controls on data you need (e.g. unclassified but restricted such as Confidential business information), how hard or easy is it for you to implement the controls put on by other agencies?

Page 45: SDM Survey Results

N=36

Q42 Are you aware of the federal initiative to coordinate and manage various categories of unclassified but controlled (CUI) information?

Page 46: SDM Survey Results

N=36

Q43

Page 47: SDM Survey Results

N=36

Q44

Page 48: SDM Survey Results

N=34

Q45

Solution Recommendations

Methods registries; equipment metadata; some data grids

EPA’s environmental connector, Science FTP Server

Hosting with non-governmental entities

EPA science subnet

Create a restricted-access data enclave (virtual repository with role-based access control)

Dilute security controls that are routinely implemented (5)
- General (includes access, firewalls, etc.)
- Limits on size of electronic transfer

“Waiver” policy

Negotiations amongst similar data holders

Promote use of sunset dates

Page 49: SDM Survey Results

#6 Version Control

N=35

Q46

Page 50: SDM Survey Results

N=35

Q47

Page 51: SDM Survey Results

N=38

Q48

In planning and managing science projects do agency or program procedures exist for change control on data during the project?

Page 52: SDM Survey Results

N=33

#7 Preservation Commensurate with Value

Q49

Page 53: SDM Survey Results

N=31

Q50

Page 54: SDM Survey Results

N=31

Q51

Other

Rate of change of the contextual environment of the data (i.e. does the data become outdated or outmoded over time)

Operational Program needs/requirements

Reliability, Usability, Integrity, Authenticity

Page 55: SDM Survey Results

N=33

Q52

Other

Combination of fixed SOP, funding entity, PI, and data stewards/archivist

Peer advisory panels

Established record schedule for that data

Combination of NASA, Data Center managers, and users

Researcher decides "if"; archive decides "how long"

Appraisal archivists

Records management schedules for scientific data

Page 56: SDM Survey Results

N=33

Q52

In your agency, who decides if and how long data will be retained?

Page 57: SDM Survey Results

N=34

Q53

If yes, in your opinion, are those record schedules sufficiently granular and precise to address what and how long data sets should be kept?

Q54

Page 58: SDM Survey Results

N=34

Q53

Does your agency apply National Archives record retention and disposition schedules to scientific data?

Page 59: SDM Survey Results

N=31

#8 Knowledge Management Context

Q55

Do you proactively try to make the data generated through your research program available and usable by others?

Page 60: SDM Survey Results

N=35

Q56

No (explain briefly why not)

No standard policies exist for ensuring that we maintain proper metadata documentation--therefore, I maintain such documentation myself on an ad hoc basis.

Better answer is 'not always'. There are many questions that are difficult to answer based solely on written documentation due to limitations on a meaningful way to encode nuances of metadata in standardized forms.

We strongly attempt to achieve this, but I believe it is the rare occasion when such a level of completeness is achieved.

Page 61: SDM Survey Results

N=35

Q57

Page 62: SDM Survey Results

N=31

Q58

Other

Publications

Integrating other relevant information objects is done ad hoc, typically a reflection of the project author's attentiveness to documenting a "complete picture" of the data being developed / assessed.

Page 63: SDM Survey Results

N=35

Q59

Page 64: SDM Survey Results

N=36

Q60

Other

DTIC did a preliminary investigation on how the DoD datasets (both unclassified and classified) could be made available through a DoD network. The recommendation of the study is to develop a prototype system to demonstrate its value to the DoD user community and then expand it in the future. Await funding for the prototype system development.

Geospatial information (where will data be collected)

Metadata associated with the data

Data dictionaries and glossaries/terminology

Page 65: SDM Survey Results

END