Why does Research Data Matter to Libraries?
John MacColl, University Librarian & Director of Library Services, University of St Andrews
Jisc Research Data Network Meeting, University of St Andrews, 30 November 2016
Happy St Andrew’s Day!
Three sources for this talk
Back to basics
• Is research data an output? Does it validate a researcher?
• What is wrong with the status quo? Peer review is sufficient?
• Libraries deal in graspable objects.
• We have more than enough to do, reinventing ourselves as managers of study spaces.
• But – research data is in the library! Why?
Two new requirements
• Need on campuses for research profile management (pragmatic; cynical?).
• The ‘integrity of science’ argument (purist; Utopian?).
Ithaka UK Survey of Academics 2015
• Substantial increase in number of academics who preserve their data in an institutional or other online repository.
• Corresponding decrease in number preserving their data themselves using commercially or freely available software.
• Humanists and social scientists more likely to build up qualitative data.
Ithaka UK Survey of Academics 2015
• Scientists more likely to collect scientific, quantitative or computational data.
• RLUK academics more likely to build up scientific or computational data.
• Non-RLUK academics more frequently build up qualitative data.
• 80% organise qualitative data on their own computer.
• 30% organise or manage data using institutional or cloud storage.
Ithaka UK Survey of Academics 2015
• Medics and vets use institutional storage the most (53%).
• < 5% of respondents utilise their library in organising or managing data.
• 20% find it difficult to preserve data long-term.
• 50% of respondents find freely available software highly valuable in managing or preserving research data, media or images.
Ithaka UK Survey of Academics 2015
• Next most valuable is a disciplinary or departmental repository at their institution.
• Then their IT dept.
• Then their library.
• When projects end, 60% preserve their research data themselves using commercially or freely available software.
Ithaka UK Survey of Academics 2015
• This has substantially decreased since 2012, while the number using an institutional or other online repository has substantially increased.
• Number using their library for preservation has substantially increased.
• Humanists most likely to do preservation themselves.
Ithaka UK Survey of Academics 2015
• RLUK respondents more likely to use their university’s or another online repository.
• UK respondents more likely than their US peers to use a repository, and less likely to self-preserve.
• EPSRC mandate?
Funder influence
• A good thing?
• Librarians say yes! Drives behaviours that we know don’t come from ‘integrity of science’ argument.
• Academics find it less welcome: this is where we come in?
• But danger of being compliance police.
• Alliance with research policy offices critical.
Integrity of science
• Public scrutiny of evidence.
• Data must be accessible, intelligible, assessable, usable.
• Therefore requires metadata.
• ‘We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable.’
Recommendations
• Universities should recognise data communication for progression & reward.
• Develop a data strategy and curation capacity.
• Be open by default.
• Research Councils should include costs of data preparation and metadata in costs of research.
• Journals should require that underpinning data is accessible as a condition of publication.
However … Jim Scott
• Professor in the schools of Physics & Astronomy and Chemistry.
• Pioneering research on nano-memories used in smartcards (industry worth £100m).
• Unesco medal in October.
• ‘I thought scientists were marching together into the future to help humanity. That is not true; we are as competitive and unscrupulous as used-car dealers.’
Back to the library
• Now we manage from the inside out, as well as the outside in.
• Why? Research reputation, competitiveness, accountability (research profile management).
• Libraries now have a potential relationship with all academic and research staff.
• New role: to capture the local.
• Libraries used to reflect scholarly disciplines back to themselves.
• Still do so, but ensuring that our own institutions’ contributions are maximised. Much less than before is left to chance.
Change is gonna come …
• Public accountability is driving the changes to the system that scientific integrity should be making, or should already have made.
• We see this also in OA publishing – whether RCUK (gold) or Hefce/REF (green).
• Disappointing? Let’s not forget:
  • The academy is intensely conservative.
  • How many years has the OA community striven for change?
• Whatever the reason, we need systems to help us achieve capture.
• So Jisc’s work on shared services for research data management is good news!
• Libraries are in this business, and need robust infrastructure for data storage and management, as well as expertise.
The challenges of data for librarians
• Are books and articles ‘data’?
• Librarians consider data as a primary source; books and articles as secondary (unless constituted as ‘corpora’); catalogues, indexes and search engines as tertiary.
• Data are difficult to separate from the software, equipment, documentation, and knowledge required to use them. There is an elephant in the room.
• When is data data?
• The reusability conundrum: experimental data can be recreated, whereas observational data cannot.
The challenges of data for librarians
• Hidden data: in paper and digital form, in both public and private hands (offices, labs, freezers).
• Uncovering historic data is of use for digital humanities.
• Disciplines have their own ontologies: how to cross-search?
• Disciplines resist simplistic mirroring behaviour.
• Chemistry mixes open and closed models.
• Biotechnologists patent first and publish later.
The challenges of data for librarians
• High-energy physicists are open with publications but do not make data publicly available.
• Biotechnologists restrict publication but openly deposit genome and protein data.
• Scholars in fields that replicate experiments or draw on observational data are positive about data sharing.
• Scholars who only work with their own data won’t standardise their data management practices.
• Where data preparation is labour-intensive, scholars are less inclined to share.
The challenges of data for librarians
• Scarce or novel data (eg new cell lines) less likely to be shared because they are labour-intensive and may yield subsequent data and publications.
• Market value affects willingness to share.
• So does sensitivity (eg human subject records).
• And so does the decision on when data is data (verification).
• Graduate students in high-paradigm fields are more likely to use the same tools and information resources as their advisors.
• Students in low-paradigm fields are more likely to seek new information tools and techniques.
The challenges of data for librarians
• Documents don’t always tally with their underpinning data: terminology is simplified; data collection may be compromised; reporting can be accelerated due to pressure from funders; formatting requirements can affect presentation.
• The relationship can be fuzzy.
• Is this falsification?
Libraries and research data
• Librarians have had an over-simplified view of data and its infrastructure.
• They don’t know what data is (are); but nor do academics outside of their own discipline.
• Royal Society: ‘A realistic means of making data open to the wider public needs to ensure that the data that are most relevant to the public are accessible, intelligible, assessable and usable for the likely purposes of non-specialists. The effort required to do this is far greater than making data available to fellow specialists.’
• A role here for libraries?
Evolving libraries
• Research libraries - a four-element brand:
  • Publications
  • Space
  • Special
  • Capture
• Can libraries develop a layer of meta-understanding, sufficient to describe types and scopes and methods and characteristics at a general level, and so do for data what they have done for many years for the world of publications?
• Can they move from an essentially passive role as capture agents in response to funder requirements, to becoming active, trusted managers of scholarly data?
Thank you!
Have a good meeting.