
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

May 11, 2015


Duncan Hull

Abstract: This talk will describe the use of http://www.citeulike.org to manage and share bibliographic references among 1300 scientists and engineers working at the Sanger Institute (http://www.sanger.ac.uk) and the European Bioinformatics Institute (http://www.ebi.ac.uk), both based on the Wellcome Trust Genome Campus in Cambridge, UK. Using data from the references shared so far, we will illustrate the costs, benefits and adoption of citeulike to create and share bibliographic data on the web.

Presentation from The Influence and Impact of Web 2.0 on Various Applications at the National e-Science Centre, Edinburgh, UK.
Transcript
Page 1: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

EBI is an Outstation of the European Molecular Biology Laboratory.

Bibliography 2.0: A case study from the Wellcome Trust Genome Campus

Dr. Duncan Hull

http://twitter.com/dullhunk

European Bioinformatics Institute, EBI.ac.uk

e-Science workshop: The influence and impact of Web 2.0 on various applications

11th-12th May 2010, Edinburgh

Page 2: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus


Overview

• Introduction: Wellcome Trust Genome Campus
  • The European Bioinformatics Institute (ebi.ac.uk)
  • The Wellcome Trust Sanger Institute (sanger.ac.uk)
  • The Library

• Problem: economics and “freakonomics” of publishing
  • The unintended consequences of “publish or perish”
  • Burying data in publication silos
  • Obscuring identities and obstructing social applications

• Solution? Bibliography 2.0 with citeulike
  • Incentives
  • Disincentives
  • Case study: What we’ve learnt

• Conclusions and future work

Page 3: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Wellcome to the Genome Campus

Home of the European Bioinformatics Institute and the Sanger Institute

Just outside Cambridge, UK

Page 4: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

EBI: a data hub for bioinformatics in Europe

• Literature: ebi.ac.uk/citexplore
• DNA + RNA sequences: ebi.ac.uk/ena
• Genomes: ensembl.org
• Transcriptomes: e.g. ArrayExpress
• Protein structure: ebi.ac.uk/pdbe
• Protein domains, families: ebi.ac.uk/interpro
• Pathways: reactome.org
• Systems: biomodels.net
• Small molecules: ebi.ac.uk/chebi and ebi.ac.uk/chembl
• Protein sequence: uniprot.org
• Protein-protein interactions: ebi.ac.uk/intact

~400 staff (research/services), publishing data on the web

Page 5: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

e.g. Chemical Entities of Biological Interest (ChEBI)

Free database / ontology of 500,000 small molecules (many drugs)

Page 6: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

The Wellcome Trust Sanger Institute

Alex Bateman

~900 Sanger staff (total)

Page 7: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Shared Library

Annual journal subscription budget £500,000 (modest compared to the multi-million-pound journal budgets of university libraries)

More later

Page 8: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus


“People respond to incentives, although not necessarily in ways that are predictable and manifest. Therefore, one of the most powerful laws in the universe is the law of unintended consequences. This applies to schoolteachers and Realtors and crack dealers as well as expectant mothers, sumo wrestlers, bible salesman, and the Ku Klux Klan…”

…and scientists too…

Page 9: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Unintended consequences, an example

• Incentive: “publish or perish”
  • Publications are rewarded with recognition, hiring, promotion, tenure, fame, funding, fortune, prizes, job satisfaction etc.

• Unintended consequences:
  • Valuable data gets damaged, destroyed or “buried” (see later)
  • Inaccessible to data and text mining on the Web
  • Copyright and toll-access journals
  • Luddite scientists
  • Minimal exploitation of social software for sharing data
  • Minimal exploitation of Web 2.0 for sharing data

Page 10: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

• Gene names: e.g. Hexokinase, HK1, HK2, HK3
• Protein names: e.g. Hexokinase, HK1, HK2, HK3
• Chemical names: e.g. Glucose-6-phosphate, G6P, Glu, Gluc
• Author names: e.g. Mark Baker (see next slide)
• Poor precision and recall

Why bury it [data] first and then mine it again?

Barend Mons, Wikiproteins http://proteins.wikiprofessional.org

Which gene did you mean? BMC Bioinformatics. 2005 Jun 7;6:142

DOI:10.1186/1471-2105-6-142
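To make "poor precision and recall" concrete, here is a minimal sketch of how the two measures are computed for a single search. The paper identifiers and the search result below are invented for illustration; they are not data from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = fraction of retrieved items that are relevant
    recall    = fraction of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: papers that are really about the gene HK1 ...
relevant = {"p1", "p2", "p3", "p4"}
# ... versus what a plain text search for "HK1" returns. The name is
# ambiguous, so some matches are about something else (p7, p8, p9),
# and papers that only say "Hexokinase" (p3, p4) are missed.
retrieved = {"p1", "p2", "p7", "p8", "p9"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case, ambiguous names lower precision (wrong papers are returned) and synonyms lower recall (right papers are missed).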

Page 11: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Identity crisis: Mark Baker

http://pubmed.gov?term=Baker+M[author]

http://pubmed.gov?term=Mark+Baker[author]

etc


Until we have unique author identifiers, it is difficult or impossible to reliably find the papers published by a particular person

Open Researcher and Contributor ID http://orcid.org

“Tell me whenever Mark Baker publishes a paper”
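As a rough sketch of why this is hard, the snippet below builds the two PubMed author queries shown on this slide from a single name. The helper names `name_variants` and `pubmed_author_url` are hypothetical, and the two variants are only the ones used above; real author strings vary far more (middle initials, hyphenation, transliteration).

```python
from urllib.parse import urlencode


def name_variants(first, last):
    """Two common ways the same person appears in author fields."""
    return [f"{last} {first[0]}", f"{first} {last}"]


def pubmed_author_url(name):
    """Build a query URL of the form used on the slide.

    safe="[]" keeps the [author] field tag readable instead of
    percent-encoding the brackets.
    """
    return "http://pubmed.gov?" + urlencode({"term": f"{name}[author]"}, safe="[]")


for variant in name_variants("Mark", "Baker"):
    print(pubmed_author_url(variant))
```

Each variant is a separate query with a different result set, and neither distinguishes one Mark Baker from another, which is the gap a unique author identifier is meant to close.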

Page 12: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Social information (need identity for this)

• Socialisation: (e-science > “we-science”)
  • How many other people have read this paper?
  • What are my friends / enemies reading?
  • What other papers did they also read?

• Personalisation (e-science > “me-science”)
  • These are my publications
  • This is my bibliography (stuff I’m reading / have read)
  • Digital libraries “document-centred” rather than “people-centred”

Author name disambiguation in MEDLINE by: Vetle I. Torvik, Neil R. Smalheiser ACM Trans. Knowl. Discov. Data, Vol. 3, No. 3. (2009), pp. 1-29. DOI:10.1145/1552303.1552304
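The socialisation questions above can be answered with very simple counting once each library is tied to an identified person. The sketch below assumes a hypothetical mapping from user to the set of papers in their library; it is not CiteULike's data model or API, just an illustration of the idea.

```python
from collections import Counter


def reader_count(libraries, paper):
    """How many other people have this paper in their library?"""
    return sum(1 for papers in libraries.values() if paper in papers)


def also_read(libraries, paper):
    """What other papers did readers of `paper` also save?

    Returns (paper, count) pairs, most frequently co-saved first.
    """
    counts = Counter()
    for papers in libraries.values():
        if paper in papers:
            counts.update(p for p in papers if p != paper)
    return counts.most_common()


# Hypothetical users and paper identifiers.
libraries = {
    "alice": {"A", "B", "C"},
    "bob": {"A", "B"},
    "carol": {"B", "D"},
}

print(reader_count(libraries, "A"))
print(also_read(libraries, "A"))
```

None of this works if the same person appears under several identities, or several people under one, which is why the slide title says identity is needed first.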


Page 13: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

A solution, citeulike.org?

• http://www.citeulike.org
  • Lack of personalisation of library data
  • Lack of socialisation of library data

• Works a lot like http://www.delicious.com

Page 14: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Click Post to Citeulike

Page 15: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Tag it (optional) e.g. author tags

Page 16: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Journal picks is a group of 40+ invited users on campus, who select interesting papers

Page 17: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

2,016 unique articles in journal picks (less than one year)

3,880,055 unique articles total

Page 18: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Citeulike + ZeitGeist = CiteGeist

http://www.citeulike.org/citegeist

Page 19: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Citeulike incentives

• Selfish scientist (just organise my reference mess)
• What’s popular (interesting stuff, CiteGeist)
• Serendipity (find papers you wouldn’t find normally)
• Increase visibility and PageRank of papers?
• Person-centred access points into the first / second page of Google results
  • e.g. http://www.google.com/search?q=carole+goble
  • has the result below fairly high up the list:
  • http://www.citeulike.org/group/10570/tag/carole-goble

Page 20: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Citeulike disincentives

• Privacy, don’t want to share with rivals
  • (but can make collections private)

• Citeulike might go bust?
  • But Springer sponsored

• Parsers are fragile
  • easily (and deliberately) broken by publishers

• Valuable data in the hands of a commercial company?
  • But Facebook? LinkedIn? Twitter etc?

• No academic reward for using it
  • publication = “finished”

• Social software works best with network effects
  • There are LOTS of other tools that do this…

Page 21: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

And the rest…

www.mendeley.com

www.zotero.org

www.connotea.org

www.mekentosj.com

www.hubmed.org

www.refworks.com

“iTunes for PDF files”

“Last.fm of research”

Page 22: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Giant corporate commercial competitors

• With significant vested financial interests
  • Scopus http://www.scopus.com/
  • ISI WOK http://isiknowledge.com

Wrote a review of these systems: Hull, D., S. R. Pettifer, and D. B. Kell (2008). Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Comput Biol 4 (10), e1000204+. DOI:10.1371/journal.pcbi.1000204

Page 23: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Conclusions

• “Publish or perish” has some unfortunate and unintended consequences in science

• Citeulike is an interesting Web 2.0 tool
  • We’ve had some success using it (typical “long tail”)
  • Weak incentives for use; many cultural barriers to adoption
  • Technical barriers to adoption: many tools, messy data

• Future work
  • Social network analysis, clickthroughs, tag analysis
  • Any other ideas…

• But the times they are a changin’
  • Citeulike or something like it will work much better if/when “publishing” incentives change over time…

Page 24: Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus

Acknowledgements

• Mark Baker for organising this workshop
• EBI, Christoph Steinbeck (laboratory head)
• Carole Goble, University of Manchester
• The Sanger, Alex Bateman, Frances Martin, Tim Hubbard and all the contributors to the Journal Picks group
• Richard Cameron, Kevin Emamy and the rest of the citeulike team
• BBSRC for funding
• Any questions?