A community-driven annotation platform for structural genomics
Workshop on the Biological Annotation of Novel Proteins, March 7-8, 2008
Biomedical theme: Central Machinery of Life - proteins conserved in all kingdoms of life
Biological theme: Complete coverage of Thermotoga maritima
Adam Godzik and the JCSG Bioinformatics Team
Science is all about communication
• Since the late XIX century, the dominant way of communicating scientific results has been through peer-reviewed manuscripts
• Pro
– Peer review ensures quality
– Enforces a “publishable unit” – decreases noise in the “communication space”
– Authorship rules ensure proper distribution of credit in a system that is well integrated with the system of promotions and evaluations
• Con
– Significant time lag and additional costs
– Enforces a “publishable unit” – below-the-threshold results are lost
– Not scalable with high-throughput data production
Increasingly, it’s not the only game in town
• Databases and automated annotation protocols
– pro: fast, machine searchable, scalable
– con: difficult to ensure quality and assign credit, put the burden of expertise on the user
• Wikipedia
– pro: harnesses power of community, scalable
– con: unreliable, difficult to ensure quality and assign
Structure determination in PSI centers is done on a semi-automated assembly line
• Joint Center for Structural Genomics
• One of four large scale (production) centers of PSI2
• ~600 structures deposited in the PDB
• Sustained pace of ~15 PDB depositions per month
…and the pace of structure determination far outstrips the pace of our publications…
• Metabolic reconstruction of T. maritima was done in collaboration with the UCSD Systems Biology Lab (Bernard Palsson)
• Model is consistent with all the published experimental data on TM (see Ines Thiele poster)
• First generation model covers 479 genes (1398 are not in the model), 492 metabolites
– Proteins coded by 113 of these genes have been solved (71 at JCSG, 28 at other PSI centers)
– 320 have been modeled
– We know at least approximate structure of ALL the proteins in the reconstruction
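The gene counts quoted above can be turned into coverage fractions with a few lines of arithmetic. The sketch below uses only the figures on the slide; the variable names are illustrative, and the slide does not break down how the genes outside the "solved" and "modeled" counts are covered, so the last value is simply the difference of the quoted numbers.

```python
# Figures quoted on the slide; names are illustrative, not from the JCSG pipeline.
GENES_IN_MODEL = 479
GENES_NOT_IN_MODEL = 1398
SOLVED = 113      # proteins with experimentally solved structures
MODELED = 320     # proteins with modeled structures


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_genes = GENES_IN_MODEL + GENES_NOT_IN_MODEL
model_share = percent(GENES_IN_MODEL, total_genes)   # share of genes in the model
solved_share = percent(SOLVED, GENES_IN_MODEL)       # share of model genes solved
modeled_share = percent(MODELED, GENES_IN_MODEL)     # share of model genes modeled
# Model genes not counted in either quoted figure (difference of slide numbers).
remaining = GENES_IN_MODEL - SOLVED - MODELED

print(f"Genes in model: {GENES_IN_MODEL} of {total_genes} ({model_share}%)")
print(f"Solved: {SOLVED} ({solved_share}%), modeled: {MODELED} ({modeled_share}%)")
print(f"Not counted in either figure: {remaining}")
```

On these numbers the first-generation model covers roughly a quarter of the genes, and about two thirds of the model's genes rely on modeled rather than experimentally solved structures.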
And bring it together to help make sense of the structures and see them in the full context
All available information about a protein on one page
We try to combine automated, database-driven annotations with expert-curated input.
Annotation:
• Feeds from public databases
• Expert-curated information
Content management:
• Wiki-style editing (WYSIWYG editor)
• Page-level access control
• Structured fields + free text
• Instant publication
• Always open for comments and edits
Quality control & authorship:
• Encourage community collaboration
• JCSG scientists & invited peers
• Many authors - no contribution too small
• Lead authors (editors) in charge of releases
TOPSAN content:
Protein Groups
Browsing Options
JCSG: Structures / Structure Notes / TOPSAN
278 of 593 structures have an annotation on TOPSAN
Members of the biological community can utilize PSI structures only when they are aware of them
A functionally well-characterized enzyme that is also a new fold.
PDB ID: 3C8W
TargetDB: 376561
TSRI Administrative Core
Ian Wilson, Marc Elsliger, Gye Won Han, David Marciano, Henry Tien, Xiaoping Dai, Lisa van Veen
Stanford / SSRL Structure Determination Core
Keith Hodgson, Ashley Deacon, Mitchell Miller, Hsiu-Ju (Jessica) Chiu, Debanu Das, Kevin Jin, Abhinav Kumar, Winnie Lam, Silvya Oommachen, Christopher Rife, Scott Talafuse, Christine Trame, Qingping Xu, Henry van den Bedem, Ronald Reyes
The JCSG is supported by the NIH Protein Structure Initiative grant U54 GM074898 from the National Institute of General Medical Sciences (www.nigms.nih.gov).
Thermotoga browser acknowledgments
• Co-PI of the project - Andrei Osterman (the biochemistry side, specific examples)
• The JCSG team - for all the structures, focus on Thermotoga and CML
• Bernard Palsson group and Ines Thiele for work with Thermotoga reconstruction and model simulations
• The JCMM team for structure modeling
• Krzysztof Ginalski and bioinfor server team for assistance with “borderline” predictions
• Ying Zhang (JCMM) - finalizing the metabolic reconstruction, network and fold distribution analysis
• Dana Weekes (JCSG) - first pass on the Thermotoga metabolic reconstruction, TM TOPSAN pages
• Craig Shepherd (JCMM) - network visualization
• Zhanwen Li (JCMM) - modeling and fold assignments