Connecting TOPSAN to Computational Analysis
Christian M Zmasek, Kyle Ellrott, Dana Weekes, Constantina Bakolitsa, John Wooley, Adam Godzik
Joint Center for Structural Genomics
Sanford-Burnham Medical Research Institute, La Jolla, California, USA
University of California, San Diego, La Jolla, California, USA
Joint Center for Molecular Modeling
Overview
• What is TOPSAN?
  – TOPSAN: The Open Protein Structure Annotation Network
  – community-based annotation of protein structures
• “Semantic” TOPSAN
• How to enter machine-readable, structured data
• Example: editor → entry → semantic web
• Different ways to download information
• SPARQL example
• Availability and licenses
• Acknowledgements
What is TOPSAN?
• TOPSAN: The Open Protein Structure Annotation Network
• Tens of thousands of protein structures have been determined by structural genomics (SG) centers, and many more are expected
• While these structures are available in the PDB (Protein Data Bank)…
• … annotations for most of them are limited to one-line PDB titles
• TOPSAN is the first database that specifically focuses on providing extensive annotations for the thousands of structures solved by the SG centers
What is TOPSAN?
• TOPSAN’s main content consists of collaboratively written (“open”) articles/annotations for each solved protein structure
• TOPSAN combines automated elements with human-edited ones
• TOPSAN spans the range of analysis of
  – single proteins
  – characterization of protein families
  – reconstruction of entire genomes
• Articles are created by structural genomics (SG) center staff and over 400 external users, so far covering 7,250 proteins
• TOPSAN is collaborating with PFAM to use JCSG structures to refine existing PFAM families and create new ones
TOPSAN example entry
“Semantic” TOPSAN
• Use the principles of the semantic web to turn TOPSAN into a database that can be:
  – edited
  – searched
  – linked
• TOPSAN content is being made accessible to computational query and analysis via semantic web technologies
Entering machine-readable, structured data with the TOPSAN Protein Syntax (TPS)
• Takes the form subject, predicate, object
• Subject: the protein in question
• Predicate, examples:
  – homologous
  – encoded_by
  – citation
  – member_of
• Object: “direct value” or link to other database
• Example:
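The subject, predicate, object form described above can also be modeled in a few lines of Python. This is only a rough sketch of the triple idea, not actual TPS syntax: the `Triple` type, the `make_triple` and `match` helpers, and the placeholder identifiers (`PROTEIN_A`, `FAMILY_X`, and so on) are all hypothetical and introduced here for illustration. Only the four predicate names come from the list above.

```python
from typing import NamedTuple

# Predicate names taken from the examples listed on the slide.
KNOWN_PREDICATES = {"homologous", "encoded_by", "citation", "member_of"}


class Triple(NamedTuple):
    subject: str    # the protein in question
    predicate: str  # the relation, e.g. "member_of"
    obj: str        # a direct value or a link to another database


def make_triple(subject, predicate, obj):
    """Build a triple, rejecting predicates outside the known set."""
    if predicate not in KNOWN_PREDICATES:
        raise ValueError(f"unknown predicate: {predicate!r}")
    return Triple(subject, predicate, obj)


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Hypothetical placeholder identifiers, not real TOPSAN entries.
triples = [
    make_triple("PROTEIN_A", "member_of", "FAMILY_X"),
    make_triple("PROTEIN_A", "homologous", "PROTEIN_B"),
    make_triple("PROTEIN_B", "member_of", "FAMILY_X"),
]

# Which proteins are annotated as members of FAMILY_X?
family_members = [
    t.subject for t in match(triples, predicate="member_of", obj="FAMILY_X")
]
print(family_members)
```

Storing annotations as uniform triples is what makes them searchable and linkable: a query is just a pattern with some fields left open, as in the `match` call above.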