Who Are You? Managing collaborative digital identities in bioinformatics with myExperiment. Duncan Hull, Postdoctoral Research Associate, Manchester Biocentre (mib.ac.uk), School of Chemistry, University of Manchester, UK. NETTAB 2009, Catania, Italy, June 2009
Digital identity is fundamental to collaboration in bioinformatics research and development because it enables attribution, contribution and publication to be recorded and quantified. However, current models of identity are often obsolete and struggle to capture both small contributions ("microattribution") and large contributions ("mega-attribution") in science. Without adequate identity mechanisms, the incentive to collaborate can be reduced and the utility of collaborative social tools hindered. Using examples of metabolic pathway analysis with the Taverna workbench and myexperiment.org, this talk will illustrate problems with, and solutions to, identifying scientists accurately and effectively in collaborative bioinformatics networks on the Web.
Transcript
• Many scientists don’t use these tools for serious work (if at all)
• Why?
• It’s complicated but…
Galileo Galilei (1632) Dialogo sopra i due massimi sistemi del mondo
Scientific publishing has worked this way for centuries
• Publishing the main (perhaps only) way of sharing data and communicating:
• “Publish or Perish”
Digital Data Driven Science
• Science is increasingly digital and data-driven
– Scientists’ contributions are increasingly digital
– Not just digital publications in electronic journals…
– wiki edits, software development, workflows, database curation, ontology development, blog posts
– Traditional journal publishing is often inadequate for sharing this kind of data and attributing it to individual people
Burying or Destroying Data and Metadata?
• Publishing can be inadequate, difficult to mine
Barend Mons, Wikiproteins
Why bury it [data] first and then mine it again?
Which gene did you mean? http://pubmed.gov/15941477
BMC Bioinformatics. 2005 Jun 7;6:142.
In other cases important data and metadata gets destroyed completely
(author, title, gene, protein, chemical names etc)
Make digital libraries difficult to use
Defrosting the Digital Library. Hull, Pettifer and Kell. http://www.pubmed.gov/18974831 PLoS Computational Biology 2008 Oct;4(10):e1000204
Double Trouble!
1. Scientists reluctant to share data until published in peer-reviewed journals
2. When they do publish, data often gets badly damaged or destroyed in the process. Digital Identity of people gets especially mangled…
CC licensed double trouble picture by Puck90 http://www.flickr.com/photos/puck90/2480833393/
Digital Identity is currently a mess (part 1)
• One person can be identified by many different URIs
• People who know Paolo can tell the difference
– People who don’t (and software) face a significant challenge to disambiguate
• Digital Identity is a second-class citizen on the Web (see http://www.flickr.com/photos/dullhunk/3618998907/ for a web example)
Attribution would seem to be a simple process and yet it represents a
major, unsolved problem for information science.
Author name disambiguation. Chapter published in Volume 43 (2009) of the Annual Review of Information Science and Technology (ARIST), edited by B. Cronin, available from the publisher Information Today, Inc.
This is just one reaction; there are at least another 1700 in yeast
Refine Workflow:
1. Given SBML file, list all reactions
2. For each reactant, get synonyms (e.g. synonyms of “D-glucose”)
3. Construct PubMed queries and execute them
4. Rank results
5. Display results to user
Workflow itself not rocket science (just a tool that needed to be built)
Services 2 and 4 have been based on other people’s workflows
saved lots of effort re-inventing the wheel
Services 1, 3 and 5 are “private” during prototyping
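The five Refine workflow steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Taverna workflow itself: the SBML fragment is hand-written, the SYNONYMS table is a hypothetical stand-in for the synonym service, the ranking is a naive mention count, and the step that executes queries against PubMed is left out so the sketch runs offline.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written SBML fragment (illustrative only, not the yeast model).
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2">
  <model>
    <listOfSpecies>
      <species id="s1" name="D-glucose"/>
      <species id="s2" name="ATP"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1" name="hexokinase">
        <listOfReactants>
          <speciesReference species="s1"/>
          <speciesReference species="s2"/>
        </listOfReactants>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

# Hypothetical synonym table standing in for the synonym service (step 2).
SYNONYMS = {"D-glucose": ["D-glucose", "dextrose", "grape sugar"]}


def local(tag):
    """Strip the XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def list_reactions(sbml_text):
    """Step 1: map each reaction id to the names of its reactants."""
    root = ET.fromstring(sbml_text)
    names = {e.get("id"): e.get("name")
             for e in root.iter() if local(e.tag) == "species"}
    reactions = {}
    for reaction in (e for e in root.iter() if local(e.tag) == "reaction"):
        reactants = []
        for child in reaction:
            if local(child.tag) == "listOfReactants":
                reactants = [names[ref.get("species")] for ref in child]
        reactions[reaction.get("id")] = reactants
    return reactions


def build_query(reactant):
    """Steps 2-3: OR together the synonyms of one reactant as quoted phrases."""
    terms = SYNONYMS.get(reactant, [reactant])
    return " OR ".join(f'"{t}"' for t in terms)


def rank(abstracts, terms):
    """Step 4: order abstracts by how many synonym mentions they contain."""
    def score(text):
        lowered = text.lower()
        return sum(lowered.count(t.lower()) for t in terms)
    return sorted(abstracts, key=score, reverse=True)


if __name__ == "__main__":
    # Step 5: display the query built for every reactant of every reaction.
    for reaction_id, reactants in list_reactions(SBML).items():
        for reactant in reactants:
            print(reaction_id, reactant, "->", build_query(reactant))
```

Keeping each step as a separate function mirrors the point made on the slide: individual steps can be swapped for someone else's shared component without touching the rest.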
• Of the 661 workflows, 531 are publicly visible whereas 502 are publicly downloadable.
• Of the workflows with restricted access, 3% are entirely private to the contributor; the remainder are shared with individual users and groups chosen by the contributor.
• 69 workflows (over 10%) have been shared, with the owner granting edit permissions to specific users and groups.
• In addition there are 52 instances where users have noted that a workflow is based on another workflow on the site.
• The most viewed workflow has 1566 views.
• There are 50 packs, ranging from tutorial examples to bundles of materials relating to specific experiments.
Some preliminary data: First few months of use
Conclusions
• myExperiment experience so far has been
• Scientists do share data but…
– you need to get digital identity right (still an unsolved problem)
– get digital attribution right
• Allow fine-grained control over what is shared, when, with whom and under what license…
Conclusions: Aristocracy 2.0 or Democracy 2.0?
Web 2.0 Science 1.0 ?
Wisdom of Crowds Wisdom of experts
Lightly filtered information (or not filtered at all)
Heavily filtered information (peer review)
Democratic (“a link is a vote”) and Technocratic (“The geeks shall inherit the earth”)
• Paolo Romano, Rosalba Giugno and the organisers / delegates of NETTAB 2009
• Università degli Studi di Catania (University of Catania) for hosting
• Rete Nazionale di Bioinformatica Oncologica (Italian Network for Oncology Bioinformatics) http://www.rnbio.it for funding
• myExperiment team, led by Dave De Roure and Carole Goble, also Jiten Bhagat, Danius Michaelides, Don Cruickshank, Sergejs Aleksejevs, Paul Fisher (also Kell Group lab members Paul Dobson and Neil Swainston)
• REFINE project, Sophia Ananiadou, Douglas Kell, Steve Pettifer, Jun'ichi Tsujii, Yoshimasa Tsuruoka funded by BBSRC and at http://www.nactem.ac.uk