NISO Working Group Connection LIVE! Research Data Metrics Landscape: An update from the NISO Altmetrics Working Group B: Output Types & Identifiers Monday, November 16, 2015 Presenters: Kristi Holmes, PhD, Director, Galter Health Sciences Library, Northwestern University Mike Taylor, Senior Product Manager, Informetrics, Elsevier Philippe Rocca-Serra, Ph.D., Technical Project Leader, Oxford Tom Demeranville, THOR Senior Project Officer & ORCiD Software Engineer Martin Fenner, Technical Director, DataCite Dr. Sarah Callaghan, Senior Researcher and Project Manager, British Atmospheric Data Centre Dr. Melissa Haendel, Associate Professor, Ontology Development Group, OHSU Library, Dept of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University http:// www.niso.org /news/events/2015/ wg_connections_live / altmetrics_wgb /
Transcript
NISO Working Group Connection LIVE! Research Data Metrics Landscape:
An update from the NISO Altmetrics Working Group B: Output Types & Identifiers
Philippe Rocca-Serra, Ph.D., Technical Project Leader, Oxford
Tom Demeranville, THOR Senior Project Officer & ORCiD Software Engineer
Martin Fenner, Technical Director, DataCite
Dr. Sarah Callaghan, Senior Researcher and Project Manager, British Atmospheric Data Centre
Dr. Melissa Haendel, Associate Professor, Ontology Development Group, OHSU Library, Dept of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University
Usage Stats
• aggregate DataONE usage log files from DataONE member nodes
• parse logs, applying COUNTER rules (double-click intervals, whitelisted user agents)
• two versions of usage stats: COUNTER-compliant, and partially compliant (includes some bots)

Average % not filtered:
• since 2005: COUNTER 63.57%, Partial 63.59%
• this past year: COUNTER 44.88%, Partial 47.05%
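The log-parsing step above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual DataONE pipeline: the record layout, the toy robot denylist, and the helper names are assumptions, though the 30-second "double-click" window follows the COUNTER Code of Practice.

```python
from datetime import datetime, timedelta

# Illustrative log records: (timestamp, user_agent, client_ip, dataset_id).
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)  # COUNTER double-click interval
ROBOT_AGENTS = ("googlebot", "crawler", "spider")  # toy denylist, not the real list

def count_downloads(events):
    """Count COUNTER-style downloads: drop known robots and collapse
    repeat requests for the same dataset by the same client within
    the double-click window."""
    last_seen = {}
    total = 0
    for ts, agent, ip, dataset in sorted(events, key=lambda e: e[0]):
        if any(bot in agent.lower() for bot in ROBOT_AGENTS):
            continue
        key = (ip, dataset)
        prev = last_seen.get(key)
        if prev is not None and ts - prev <= DOUBLE_CLICK_WINDOW:
            last_seen[key] = ts  # repeat click; refresh the timestamp and skip
            continue
        last_seen[key] = ts
        total += 1
    return total
```

Running the partially compliant variant mentioned above would simply mean skipping the robot filter while keeping the double-click rule.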
Future Work
•Collect data citations from CrossRef
•Analyze usage statistics in more detail and provide input to COUNTER and NISO
•Analyze network graph, e.g. linked datasets and second order citations
•Turn research project into service, including integration of client applications for search and reporting
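The first future-work item, collecting data citations from CrossRef, might look like the sketch below, which tallies citation events per dataset DOI from records shaped like Crossref Event Data responses. The field names (`subj_id`, `obj_id`, `relation_type_id`) follow that API, but the input here is a local sample rather than a live call, and the default relation type is an assumption.

```python
from collections import Counter

def tally_data_citations(events, relation="cites"):
    """Count how often each dataset DOI appears as the object of a
    citation relation. `events` mimics Crossref Event Data records."""
    counts = Counter()
    for ev in events:
        if ev.get("relation_type_id") == relation:
            counts[ev["obj_id"]] += 1
    return counts
```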
Introducing the Metadata Model v1
Philippe Rocca-Serra, PhD, University of Oxford e-Research Centre
on behalf of WG3 Metadata WG
Supported by the NIH grant 1U24 AI117966-01 to the University of California, San Diego
A trans-NIH funding initiative established to enable biomedical research as a digital research enterprise:
• Facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable
• Conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data
• A catalog to enable researchers to find and cite research datasets
• Ease the use of community standards to annotate datasets
Lucila Ohno-Machado (PI), Jeff Grethe
Pilot applications that ‘dock’ with the prototype and community-driven activities via Working Groups:
1. BD2K Centers of Excellence Collaboration
2. Data Identifiers Recommendation
3. Metadata Specifications
4. Use Cases and Testing Benchmarks
5. Dataset Citation Metrics
6. Criteria for Being Included in the DDI
7. Machine Actionable Licenses
8. Ranking Algorithm
9. End User Evaluation Criteria
10. Repository Collaboration
11. Outreach Meeting: Repository Operators
12. Standard-driven Curation Best Practices
13. Evaluation of Harvesting and NLP Pilot Projects
All this by August 2017!
Joint effort with BD2K Center for Expanded Data Annotation and Retrieval (CEDAR)
Synergies with BD2K cross-centers Metadata WG (co-chaired by M Musen/CEDAR, G Alter/bioCADDIE) and ELIXIR activities
WG3 Metadata - Goals
Define a set of metadata specifications that support intended capability of the Data Discovery Index prototype - being designed by the bioCADDIE Core Development Team - as outlined in the White Paper
Core metadata, designed to be future-proofed for progressive extensions (phase 1: May–July 2015), followed by a test and implementation phase
Domain specific metadata for more specialized data types (phase 2)
Use cases and competency questions have been used throughout the process to define the appropriate boundaries and level of granularity: which queries will be answered in full, which only partially, and which are out of scope.
WG3 Metadata – work to date
With contributions and comments from several WG3 members and colleagues, in particular: Joan Starr, George Alter, Ian Fore, Kevin Read, Stian Soiland-Reyes, Muhammad Amith, Michel Dumontier…
Contains lists of material reviewed:
• data discovery initiatives and metadata initiatives
• existing meta-models for representing metadata elements
Outlines the approach used to identify metadata descriptors:
• via use cases and competency questions (top-down approach)
• mapping generic and life science-specific metadata schemas (bottom-up approach), listed in the BioSharing collection for bioCADDIE
The results of both approaches have been compared and converged on the core set of metadata.
Bottom-up approach: survey of existing models
• schema.org
• DataCite
• HCLS dataset descriptors
• BioSample
• GEO MINiML
• PRIDE XML
• ISA-Tab/MAGE-TAB
• GA4GH metadata schema
• SRA XML
• BioProject
• CDISC SDM / elements of the BRIDG model
Use Cases and Derived Metadata
A representative set of competency questions was selected from the use cases workshop, the white paper, community submissions, and Phil Bourne. The questions have been abstracted, and key metadata elements have been highlighted, color-coded, and categorized. As the set of core and extended metadata elements is defined, it will become clearer which questions the Data Discovery Index will be able to answer in full and which only in part.
Processing use cases
All use cases on equal footing
• Term binning: Material, Process, Information, Property
• Relation identification
Core metadata elements and initial model
The combined approaches have delivered a set of core metadata elements; progressively, these will be extended to domain-specific ones in phase two, as needed. We aim for maximum coverage of use cases with a minimal number of data elements, but we foresee that not all questions can be answered in full.
Initial Set of Metadata Elements
Everything is on github
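To make the idea of a core element set concrete, a harvester might check candidate records against it before indexing. The element names below are purely illustrative placeholders; the actual bioCADDIE specification is the one published on GitHub.

```python
# Hypothetical core element names — the real list is defined in the
# bioCADDIE metadata specification, not here.
CORE_ELEMENTS = {"identifier", "title", "creators", "dataRepository", "landingPage"}

def missing_core_elements(record):
    """Return which core elements a candidate dataset record lacks —
    the kind of check an ingest pipeline might run before indexing."""
    present = {k for k, v in record.items() if v not in (None, "", [])}
    return CORE_ELEMENTS - present
```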
What’s next?
With this work, phase 1 has been completed and we have entered the evaluation phase. The model will be implemented and tested by the bioCADDIE Development Team with a number of data sources. The results will inform the activities in phase 2, where the metadata elements and the model may be revised, simplified, and/or enriched as needed.
Take Home Message
• primary goal: provide a general-purpose metadata schema that allows harvesting of key experimental and data descriptors from a variety of resources and enables indexing to support data discovery
– relations between authors, datasets, publications and funding sources
– nature of biological signal, nature of perturbation,
Outstanding issues
• prioritizing the use cases
• defining mechanisms to deal with domain-specific, granular data
• moving into phase 2 and devising data ingesters
– ETL activities
– interaction with other modeling efforts
• incorporating feedback from users and developers
Question Time
orcid.org
Contact Info: p. +1-301-922-9062 a. 10411 Motor City Drive, Suite 750, Bethesda, MD 20817 USA
ORCID, Metrics and Project THOR
Tom Demeranville, Senior Technical Officer – Project THOR
NISO Webinar, November 2015
What is ORCID?
• ORCID is an infrastructure that provides unique person identifiers.
• ORCID is a hub for linking identifiers for people with their activities.
• ORCID is researcher-centric, with 1.7 million registered identifiers.
• ORCID records are managed by the researchers themselves.
• ORCID is open source, community-governed and non-profit.
• ORCID has a public API that allows querying of non-private data.
• ORCID has a member API that enables updating and notifications.
• ORCID iDs are associated with over 4 million unique DOIs.
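Because ORCID iDs are plain identifiers rather than metrics, systems that link them mostly need to validate them. An ORCID iD ends in an ISO 7064 MOD 11-2 check digit, as described in ORCID's public documentation; the sketch below recomputes it. The example iD 0000-0002-1825-0097 is the sample ORCID itself publishes.

```python
def orcid_check_digit(base_digits):
    """Compute the ORCID iD check digit (ISO 7064 MOD 11-2).
    `base_digits` is the 15-digit string before the final character."""
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """Validate an iD like '0000-0002-1825-0097' by recomputing its check digit."""
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    return orcid_check_digit(digits[:15]) == digits[15]
```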
347 members, 4 national consortia, over 200 integrations
Members by sector: research inst 68%, publisher 12%, funder 5%, association 6%, repository MEA 3% (remaining 9% unlabeled).
Registrations by region: Europe 58%, Latin America 1%, North America 26%, Pacific 7%, Asia 5%.
What ORCID isn’t
ORCID is not a CRIS system. ORCID is not a researcher profile system. ORCID is not a research activity metadata store.
Research outputs
• ORCID includes links to publications, patents, datasets, software and more.
• ORCID uses the CASRAI Output vocabulary for work types
• ORCID references over 20 other output identifiers (more are being added!)
Other researcher activities
• Peer review
• Education
• Employment
ORCID and Metrics
ORCID doesn’t track metrics – it’s not our focus. ORCID is an enabling infrastructure. ORCID improves the robustness of metrics.
ORCID and Metrics
• ORCID improves the quality of research information and makes gathering and disseminating it easier.
• Other services use ORCID iDs to improve their data.
• ORCID iDs are found in DOI metadata, funder systems, publishers, CRIS systems, national reporting frameworks and more.
• Institutions can discover researcher-curated standard and non-standard outputs, or be notified when they are added.
Project THOR
http://project-thor.eu
An EC-funded H2020 project running 2.5 years
Establish seamless integration between articles, data, and researchers across the research lifecycle
Make persistent identifier use for people and research artefacts the default
Research – deciding what needs to be done
Integration – doing what needs to be done
Outreach – getting others involved
Sustainability – making sure it lasts
Community driven consensus on requirements is needed.
We need a way forward.
THOR will help by convening meetings with all interested parties in the community, including research institutions, funders, datacentres, publishers, standards bodies, existing organisation identifier and other identifier providers.
NISO Working Group Connections LIVE! Research Data Metrics Landscape:
An update from the NISO Altmetrics Working Group B: Output Types & Identifiers
Monday, November 16 from 11:00 a.m. – 1:00 p.m. (ET)
The UK’s Natural Environment Research Council (NERC) funds six data centres which between them have responsibility for the long-term management of NERC's environmental data holdings.
We deal with a variety of environmental measurements, along with the results of model simulations, in:
• Atmospheric science
• Earth sciences
• Earth observation
• Marine science
• Polar science
• Terrestrial & freshwater science, hydrology and bioinformatics
• Space weather
Who are we and why do we care about data?
Data, Reproducibility and Science
Science should be reproducible – other people doing the same experiments in the same way should get the same results.
Observational data is not reproducible (unless you have a time machine!)
Therefore we need to have access to the data to confirm the science is valid! (Image: http://www.flickr.com/photos/31333486@N00/1893012324/sizes/o/in/photostream/)
It used to be “easy”…
Suber cells and mimosa leaves. Robert Hooke, Micrographia, 1665
The Scientific Papers of William Parsons, Third Earl of Rosse 1800-1867
…but datasets have gotten so big, it’s not useful to publish them in hard copy anymore
Hard copy of the Human Genome at the Wellcome Collection
Creating a dataset is hard work!
"Piled Higher and Deeper" by Jorge Cham, www.phdcomics.com
Managing and archiving data so that it’s understandable by other researchers is difficult and time consuming too.
We want to reward researchers for putting that effort in!
Most people have an idea of what a publication is
Some examples of data (just from the Earth Sciences)
1. Time series, some still being updated, e.g. meteorological measurements
2. Large 4D synthesised datasets, e.g. climate, oceanographic, hydrological and numerical weather prediction model data generated on a supercomputer
3. 2D scans, e.g. satellite data, weather radar data
4. 2D snapshots, e.g. cloud camera
5. Traces through a changing medium, e.g. radiosonde launches, aircraft flights, ocean salinity and temperature
6. Datasets consisting of data from multiple instruments as part of the same measurement campaign
Dataset: "Recorded information, regardless of the form or medium on which it may be recorded including writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow, charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data."
(from the U.S. National Institutes of Health (NIH) Grants Policy Statement via DataCite's Best Practice Guide for Data Citation).
In my opinion a dataset is something that is:
• the result of a defined process
• scientifically meaningful
• well-defined (i.e. a clear definition of what is in the dataset and what isn’t)
Example metrics reported to our funders:
• Number of dataset discovery records visible from the NERC data discovery service – reflects compliance with NERC data management policy and how many data sets NERC has.
• Web site visits, quarterly (BADC: 61,600; NEODC: 10,200) – active use and visibility of the data centre; site visits from standard web log analysis systems, such as Webalizer, with sensible web crawler filters applied.
• Queries marked as resolved within the quarter – active use and visibility of the data centre; a query is a request for information, a problem, or an ad hoc data request.
We’re working with DataCite and Thomson Reuters to get data citation counts.
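Citation counting only works if datasets are cited in a recognisable form. DataCite recommends a citation of roughly the shape Creator (PublicationYear). Title. Publisher. Identifier; the helper below assembles such a string from its parts. The function name and input values are illustrative, not a DataCite API.

```python
def data_citation(creator, year, title, publisher, doi):
    """Assemble a DataCite-style dataset citation string.
    The DOI is rendered as a resolvable https://doi.org/ URL."""
    return f"{creator} ({year}). {title}. {publisher}. https://doi.org/{doi}"
```

For example, a hypothetical BADC dataset with DOI 10.5285/example would be cited as "Callaghan, S. (2013). Example atmospheric dataset. NERC British Atmospheric Data Centre. https://doi.org/10.5285/example".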
Altmetrics and social media for data?
Mainly focussing on citation as a first step, as it’s most commonly accepted by researchers.
We have a social media presence, @CEDAnews, mainly used for announcements about service availability.
We definitely want ways of showing our funders that we provide a good service to our users and the research community. And we want to be able to tell our depositors what impact their data has had!
RDA/WDS WG Bibliometrics Survey Results: Mostly Expected
Citations are preferred metrics, downloads next.
Standards are missing. Culture change is needed.
[Bar chart] What do you currently use to evaluate the impact of data? Options: nothing; data citation counts; downloads; social media (likes/shares/tweets); mentions in peer-reviewed papers; hits in search engines; mentions in blogs; bookmarks in Zotero and/or Mendeley; other (please specify).
[Pie chart] Are the methods you use to evaluate impact adequate for your needs? Yes: 31.5%, No: 68.5%.
Other projects in the data metrics space
1. CASRAI data level metrics
2. PLOS Making Data Count
3. NISO altmetrics
4. Jisc Giving Researchers Credit for their Data
Next steps for Bibliometrics for Data WG
Will be based on:
• WG survey results (presented at RDA P4 and P5)
• Spreadsheet of metrics being collected by repositories – still open for contributions! http://bit.ly/1MpyW4K
• Shared results from other projects – understanding the challenges and answering the questions posed in the case statement
• Preliminary analysis of data DOI resolutions
• Supporting and evaluating tools from other projects
• Preliminary guidance for the community – “minimal” rather than “best” practice – get people discussing the issues and coming up with solutions!
Outputs: binary redistribution package (installer); algorithm; data analytic software tool; analysis scripts; data cleaning; APIs; codebook (for content analysis); source code; software to make metadata for libraries, archives and museums; program codes (for modeling); commentary in code (thinking of open source – need to attribute code authors and commentators/enhancers/hackers, who can document what they did and why); computer language (a syntax to describe a set of operations or activities); software patch (set of changes to code to fix bugs, add features, etc.); digital workflow (automated sequence of programs, steps to an outcome); software library (non-stand-alone code that can be incorporated into something larger); software application (computer code that accomplishes something)
VIVO-ISF: Suite of ontologies that integrates and extends community standards
Credit extends beyond the original contribution
• Stacy creates mouse1
• Kristi creates mouse2
• Karen performs RNAseq analysis on mouse1 and mouse2 to generate dataset3, which she subsequently curates and analyzes
• Karen writes publication pmid:12345 about the results of her analysis
• Karen explicitly credits Stacy as an author, but not Kristi.
Credit is connected
Credit to Stacy is asserted, but credit to Kristi can be inferred
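The inference above can be sketched as a walk over a small provenance graph: anyone whose output is in the transitive input closure of the publication is an inferred contributor. The graph encodes the mouse1/mouse2 scenario from the previous slide; the structure and function names are illustrative, not the VIVO-ISF representation.

```python
# Toy provenance graph: each artifact maps to the artifacts it was derived from.
DERIVED_FROM = {
    "pmid:12345": ["dataset3"],
    "dataset3": ["mouse1", "mouse2"],
}
CREATOR = {"mouse1": "Stacy", "mouse2": "Kristi",
           "dataset3": "Karen", "pmid:12345": "Karen"}

def inferred_contributors(artifact):
    """Everyone whose output is in the transitive input closure of `artifact`."""
    seen, stack, people = set(), [artifact], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        people.add(CREATOR[node])
        stack.extend(DERIVED_FROM.get(node, []))
    return people
```

Walking back from pmid:12345 reaches dataset3, then both mice, so Kristi's contribution surfaces even though only Stacy was explicitly credited.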
Introducing openRIF
The Open Research Information Framework
[Diagram: openRIF, encompassing SciENcv, eagle-i, and VIVO-ISF]
Ensuring an openRIF that meets community needs
Data entry – Discovery – Interoperability
A domain configurable suite of ontologies to enable interoperability across systems
A community of developers, tools, data providers, and end-users
Developing a computable research ecosystem
Research information is scattered amongst:
• research networking tools
• citation databases (e.g., PubMed)
• award databases (e.g., NIH RePORTER)
• curated archives (e.g., GenBank)
• text (the research literature), where it is locked up
Map SciENcv data model to VIVO-ISF/openRIF
Enable bi-directional data exchange
Integrate SciENcv, ORCID data into