BioMAS: A Multi-Agent System for Automated Genomic Annotation
Keith Decker, Department of Computer and Information Sciences, University of Delaware
Salim Khan, Ravi Makkena, Gang Situ, Computer & Information Sciences
Dr. Carl Schmidt, Heebal Kim, Animal & Food Sciences
Multitude of analysis algorithms
  Different interfaces, output formats
  Create contingent process plans chaining many analyses together
Individual PIs, working on non-model organisms
  Learn, then hand-navigate sea of DBs and analysis tools
  Easily overwhelmed by new sequence and EST data
  Struggle to make results available usefully to others
Approach: Multi-Agent Information Gathering
Software agents for information retrieval, filtering, integration, analysis, and display
Embody heterogeneous database technology (wrappers, mediators, …)
Deal with dynamic data and changing data sources
Efficient and robust distributed computation (for both info retrieval and analysis)
Deal with issues of data organization and ownership
Natural approach to providing integrated information
  To humans via web
  To other agents via semantic markup [XML/OIL/DAML]
Example: Multi-Agent System for Automated Herpesvirus Annotation
Input: raw sequence data
Output: an annotated database that allows fairly complex queries
  BLAST homologs
  Motifs
  Protein domains [Prodomain records]
  PSORT sub-cellular location predictions
  GO [Gene Ontology] electronic annotation
“Show me all the genes in Marek’s Disease virus with a tyrosine phosphorylation motif and a transmembrane domain value ≥ 2”
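The example query can be read as a conjunction of filters over annotated gene records. The following is a minimal sketch in Python, not the BioMAS query interface: the record layout, the field names (`organism`, `motifs`, `tm_domains`), and the sample records are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GeneAnnotation:
    """Hypothetical annotated gene record; field names are illustrative only."""
    gene_id: str
    organism: str
    motifs: set = field(default_factory=set)
    tm_domains: int = 0  # assumed meaning of "transmembrane domain value"


def query(records, organism, motif, min_tm_domains):
    """Return IDs of genes from one organism that carry a motif and
    have at least min_tm_domains transmembrane domains."""
    return [
        r.gene_id
        for r in records
        if r.organism == organism
        and motif in r.motifs
        and r.tm_domains >= min_tm_domains
    ]


# Made-up sample data, not real annotations.
records = [
    GeneAnnotation("geneA", "Marek's Disease virus", {"tyrosine phosphorylation"}, 3),
    GeneAnnotation("geneB", "Marek's Disease virus", {"tyrosine phosphorylation"}, 1),
    GeneAnnotation("geneC", "Marek's Disease virus", {"N-glycosylation"}, 4),
    GeneAnnotation("geneD", "Other virus", {"tyrosine phosphorylation"}, 2),
]

hits = query(records, "Marek's Disease virus", "tyrosine phosphorylation", 2)
print(hits)
```

Only `geneA` satisfies all three conditions in this toy data set; the other records each fail one of them.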
How does this help?
Automates collection of information from various primary source databases
  If the info changes, it can be updated automatically, and the PI can be notified
Allows various analyses to be done automatically
  Can encode complex (contingent) sequences of info retrieval and linked analyses, and report interesting results only
  New data sources, annotation, and analyses can be applied automatically as they are developed (open system)
Results can be made available on the internet to others, or kept as private data
Much more sophisticated queries than keyword search
  Dynamic menu of keys
  Concept hierarchies (“ontology”) allow more concise queries
  Query planning (e.g., time, resource usage)
Can search across multiple databases (i.e., from other researchers)
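One way a concept hierarchy makes queries more concise: a query for a general concept also matches genes annotated with any more specific term beneath it. The sketch below assumes a toy child-to-parent is-a table and made-up annotations; the term names are illustrative and the hierarchy is not copied from GO or from BioMAS.

```python
# Toy is-a hierarchy (child -> parent). Illustrative terms only.
IS_A = {
    "protein kinase activity": "kinase activity",
    "kinase activity": "transferase activity",
    "transferase activity": "catalytic activity",
}


def ancestors(term):
    """Return the term itself followed by all of its ancestors."""
    chain = [term]
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def genes_matching(concept, annotations):
    """Genes whose annotated term is the concept or one of its descendants."""
    return sorted(
        gene for gene, term in annotations.items() if concept in ancestors(term)
    )


# Made-up annotations: gene -> most specific annotated term.
annotations = {
    "geneA": "protein kinase activity",
    "geneB": "kinase activity",
    "geneC": "catalytic activity",
}

matches = genes_matching("kinase activity", annotations)
print(matches)
```

A single query on "kinase activity" here finds both the gene annotated with that term and the gene annotated with the more specific "protein kinase activity", without the user listing each subterm.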
How does it work?
[Architecture diagram. Recoverable labels: Sequence Addition Applet; User Query Applet; Interface Agents; Information Extraction Agents (GenBank Info Extraction Agent, ProDomain Info Extraction Agent, SwissProt/ProSite Info Extraction Agent); Psort Analysis Wrapper; Local Knowledgebase Management Agent; Local Knowledgebase]
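The division of labor among these components can be sketched roughly as follows. This is a hedged illustration only: the class names, methods, and canned lookup tables are hypothetical stand-ins, and nothing here reflects the actual DECAF agent API or the real remote sources. Each extraction agent hides one source behind a common interface, and a knowledgebase agent merges their answers into one local record per gene.

```python
class ExtractionAgent:
    """Hypothetical wrapper that hides one data source behind a common interface."""

    def __init__(self, source, canned_results):
        self.source = source
        # Stand-in for a real remote lookup (e.g. a database query or tool run).
        self._canned_results = canned_results

    def extract(self, gene_id):
        """Return this source's finding for a gene, keyed by source name."""
        return {self.source: self._canned_results.get(gene_id)}


class KnowledgebaseAgent:
    """Hypothetical manager that merges agent results into a local store."""

    def __init__(self, agents):
        self.agents = agents
        self.kb = {}

    def add_sequence(self, gene_id):
        """Ask every extraction agent about a new gene and store the answers."""
        record = self.kb.setdefault(gene_id, {})
        for agent in self.agents:
            record.update(agent.extract(gene_id))
        return record

    def query(self, predicate):
        """Return IDs of stored genes whose record satisfies the predicate."""
        return sorted(g for g, rec in self.kb.items() if predicate(rec))


# Made-up results standing in for real source data.
genbank = ExtractionAgent("genbank", {"geneA": "hypothetical protein"})
psort = ExtractionAgent("psort", {"geneA": "membrane", "geneB": "nucleus"})

kb = KnowledgebaseAgent([genbank, psort])
kb.add_sequence("geneA")
kb.add_sequence("geneB")

membrane_genes = kb.query(lambda rec: rec.get("psort") == "membrane")
print(membrane_genes)
```

Adding a new source in this sketch means adding one more `ExtractionAgent` to the list, which mirrors the open-system idea of applying new data sources as they become available.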
Conditional effects can be used to model special cases ("exceptions") when applying an operator schema
Resource utilization can be used to model quantitative aspects such as amplification of a signal and feedback and feed-forward loops
Plan re-use: old plans can be inserted into new ones without additional computation (if initial and final conditions are met)
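The plan re-use condition can be illustrated with a small check. This is a sketch under stated assumptions, not O-Plan's mechanism: the `StoredPlan` structure, the fact tuples, and the compartment and molecule names are all hypothetical. A stored plan is treated as reusable when the current state contains everything the plan requires at its start and the plan's final conditions cover the goal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredPlan:
    """Hypothetical record of a previously computed plan."""
    name: str
    initial: frozenset  # facts the plan needs before it starts
    final: frozenset    # facts guaranteed once it finishes
    steps: tuple


def reusable(plan, state, goal):
    """True if the plan can be inserted as-is: its initial conditions hold
    in the current state and its final conditions cover the goal."""
    return plan.initial <= state and goal <= plan.final


# Made-up example facts.
plan = StoredPlan(
    name="import_to_nucleus",
    initial=frozenset({("in", "molX", "cytoplasm")}),
    final=frozenset({("in", "molX", "nucleus")}),
    steps=("transport molX cytoplasm nucleus",),
)
state = frozenset({("in", "molX", "cytoplasm"), ("open", "cytoplasm", "nucleus")})
goal = frozenset({("in", "molX", "nucleus")})

can_reuse = reusable(plan, state, goal)
cannot_reuse = reusable(plan, frozenset(), goal)
print(can_reuse, cannot_reuse)
```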
(Ontologically driven) Operator Schema Example: Transport
(action: transport
  :parameters (?mol - macromolecule, ?compfrom, ?compto - compartment)
  :condition (and (in ?mol ?compfrom) (open ?compfrom ?compto))
  :effects (and (in ?mol ?compto) (not (in ?mol ?compfrom))))
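To make the semantics of the transport schema concrete, here is a minimal STRIPS-style sketch in Python. It is an illustration, not the planner's implementation: the state is assumed to be a set of fact tuples, type checking of the parameters is omitted, and the molecule and compartment names are invented.

```python
def transport(state, mol, comp_from, comp_to):
    """Apply the transport operator to a state (a frozenset of fact tuples).

    Returns the successor state, or None if the conditions do not hold.
    """
    conditions = {("in", mol, comp_from), ("open", comp_from, comp_to)}
    if not conditions <= state:
        return None
    # Effects: delete (in mol comp_from), add (in mol comp_to).
    return (state - {("in", mol, comp_from)}) | {("in", mol, comp_to)}


# Made-up initial state.
state = frozenset({
    ("in", "molX", "cytoplasm"),
    ("open", "cytoplasm", "nucleus"),
})

after = transport(state, "molX", "cytoplasm", "nucleus")
print(sorted(after))

# Applying the same step again fails: molX is no longer in the cytoplasm.
print(transport(after, "molX", "cytoplasm", "nucleus"))
```

The `open` fact is left untouched by the effects, so in this sketch the channel between compartments stays available for later steps.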
RTK-MAPK pathway
Activation of Ras following binding of a hormone (e.g., EGF) to a receptor
RTK-MAPK pathway step: O-Plan Output
Phosphorylation of GRB2 at domain SH2 by the RTK receptor
Summary
Bioinformatics has many features amenable to a multi-agent information gathering approach
BioMAS: Automated Analysis: EST processing to functional annotation ontologies
  DECAF / RETSINA / TÆMS
  GOFigure! and electronic GO annotation
  CoPrDom: Co-Present Domain Analysis
  Signal Transduction Pathway Discovery
BioMAS Future Work
Sophisticated queries are possible, but how to make them available to biologists?
  “Show me all glycoproteins in Marek’s Disease virus with a tyrosine phosphorylation motif and a transmembrane domain value ≥ 2 that are expressed in feather follicles”
Robustness, efficiency, scale, data materialization issues
Automating and integrating more complex analysis processes (using existing software!)
  Estimating physical location of genes by synteny
Integrate new data sources
  Microarray and other gene expression data
  And thus, more analyses: QTL mapping, metabolic pathway learning
New off-site organism databases and analysis agents