Knowledge Enabled Information and Services Science
Semantic (Web) Technologies for Translational Research in Life Sciences
Ohio State University, June 16, 2011
Amit P. Sheth, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Thanks to Kno.e.sis team (Satya, Priti, Rama, and Ajith); Collaborators at CTEGD UGA (Dr. Tarleton, Brent Weatherly), NLM (Olivier Bodenreider), CCRC, UGA (Will York), NCBO/Stanford, CITAR/WSU
Major SW Projects
• OpenPHACTS: A knowledge management project of the Innovative Medicines Initiative (IMI), a unique partnership between the European Community and the European Federation of Pharmaceutical Industries and Associations (EFPIA). http://www.openphacts.org/
• LarKC: develop the Large Knowledge Collider, a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. http://www.larkc.eu/
• NCBO: contribute to collaborative science and translational research. http://bioportal.bioontology.org/
Project 4: Nicotine Dependence
• Why: For understanding the genetic basis of nicotine dependence.
• What: Integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base.
• How: Semantic Web technologies (especially RDF, OWL, and SPARQL) support information integration and make it easy to create semantic mashups (semantically integrated resources).
• Where: NLM (NIH)
• Status: Completed research
Motivation
• NIDA study on nicotine dependency
• List of candidate genes in humans
• Analysis objectives include:
o Find interactions between genes
o Identify active genes (those involved in the maximum number of pathways)
o Identify genes based on anatomical locations
• Requires integration of genome and biological pathway information
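The analysis objectives above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation (which used RDF, OWL, and SPARQL): the gene names, pathway names, and the `RECORDS` list are hypothetical, and the sketch assumes pathway identifiers have already been reconciled across sources.

```python
from collections import defaultdict

# Hypothetical (gene, pathway, source) records standing in for data drawn
# from pathway resources such as Reactome, KEGG, and HumanCyc.
# Assumption: pathway identifiers are already reconciled across sources.
RECORDS = [
    ("GENE_A", "pathway_1", "Reactome"),
    ("GENE_A", "pathway_2", "KEGG"),
    ("GENE_A", "pathway_1", "KEGG"),  # same pathway reported by a second source
    ("GENE_B", "pathway_2", "HumanCyc"),
    ("GENE_C", "pathway_3", "Reactome"),
    ("GENE_A", "pathway_3", "HumanCyc"),
]


def pathways_by_gene(records):
    """Merge records from all sources into gene -> set of distinct pathways."""
    merged = defaultdict(set)
    for gene, pathway, _source in records:
        merged[gene].add(pathway)
    return dict(merged)


def hub_genes(records):
    """Return (max_count, sorted list of genes taking part in the most pathways)."""
    merged = pathways_by_gene(records)
    top = max(len(pathways) for pathways in merged.values())
    return top, sorted(g for g, pathways in merged.items() if len(pathways) == top)


def genes_sharing_pathway(records):
    """Gene pairs that share at least one pathway (a crude proxy for interaction)."""
    merged = pathways_by_gene(records)
    genes = sorted(merged)
    return [
        (a, b)
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
        if merged[a] & merged[b]
    ]


if __name__ == "__main__":
    print(hub_genes(RECORDS))
    print(genes_sharing_pathway(RECORDS))
```

Merging into sets means a pathway reported by two sources is counted once, which is the point of integrating the sources before counting.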
Genome and pathway information integration
[Figure: Entrez Gene integrated with Reactome, KEGG, and HumanCyc (each providing pathway, protein, pmid), Gene Ontology (GO ID), and HomoloGene (HomoloGene ID)]
[Figure: JBI (Journal of Biomedical Informatics)]
[Figure: BioPAX ontology and Entrez Knowledge Model (EKoM)]
Results: Gene Pathway network and Hub Genes involved with Nicotine Dependence
Project 5: T. cruzi SPSE
• Why: For Integrative Parasite Research to help expedite knowledge discovery
• What: Semantics and Services Enabled Problem Solving Environment (PSE) for Trypanosoma cruzi
• Where: Center for Tropical and Emerging Global Diseases (CTEGD), UGA
• Who: Kno.e.sis, UGA, NCBO (Stanford)
• Status: Research prototype, in regular lab use
Research Accomplishments
• SPSE
– Integrated internal data with external databases, such as KEGG, GO, and some datasets on TriTrypDB
– Developed a semantic provenance framework and influenced the W3C community
– SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example:
o Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.
o Give me the gene knockout summaries, both for plasmid construction and strain creation, for all gene knockout targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.
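As a rough illustration of the first example query, the filter below selects proteins that are downregulated in a given life-cycle stage and belong to exactly one pathway. It is a plain-Python sketch over invented records, not how SPSE evaluates queries over its integrated data; the field names, protein identifiers, fold-change values, and the "negative log2 fold change means downregulated" convention are all assumptions.

```python
# Hypothetical expression records; every identifier and value is invented.
PROTEINS = [
    {"id": "protein_1", "stage": "epimastigote", "log2_fold_change": -1.5,
     "pathways": {"pathway_x"}},
    {"id": "protein_2", "stage": "epimastigote", "log2_fold_change": -2.0,
     "pathways": {"pathway_x", "pathway_y"}},
    {"id": "protein_3", "stage": "epimastigote", "log2_fold_change": 0.8,
     "pathways": {"pathway_y"}},
    {"id": "protein_4", "stage": "amastigote", "log2_fold_change": -1.1,
     "pathways": {"pathway_z"}},
]


def downregulated_single_pathway(proteins, stage):
    """IDs of proteins downregulated in `stage` that sit in exactly one pathway.

    Assumption: a negative log2 fold change means "downregulated".
    """
    return [
        p["id"]
        for p in proteins
        if p["stage"] == stage
        and p["log2_fold_change"] < 0
        and len(p["pathways"]) == 1
    ]


if __name__ == "__main__":
    print(downregulated_single_pathway(PROTEINS, "epimastigote"))
```

The second example query adds two more kinds of condition (ortholog presence in one organism and absence in another), which is where a query language over integrated data becomes more practical than hand-written filters.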
Knowledge driven query formulation
Complex queries can also include:
- on-the-fly Web services execution to retrieve additional data
- inference rules to make implicit knowledge explicit
Project 6: HPCO
• Why: collaborative knowledge exploration over scientific literature
• What: An up-to-date knowledge based literature search and exploration framework
• How: Using information extraction, conventional IR, and semantic web technologies for collaborative literature exploration
• Where: AFRL
• Status: Completed research
Focused KB Work Flow (Use case: HPCO)
[Figure: workflow diagram. Labels: Doozer: base hierarchy from Wikipedia; focused pattern-based extraction; HPC keywords; SenseLab neuroscience ontologies; initial KB creation; PubMed abstracts; NLM: rule-based BKR triples; Knoesis: parsing-based NLP triples; meta knowledgebase; enrich knowledge base; final knowledge base]
Triple Extraction Approaches
• Open Extraction
– No fixed number of predetermined entities and predicates
– At Knoesis: NLP (parsing and dependency trees)
• Supervised Extraction
– Predetermined set of entities and predicates
– At Knoesis: Pattern based extraction to connect entities in the base hierarchy using statistical techniques
– At NLM: NLP and rule based approaches
Mapping Triples to Base Hierarchy
• Entities in both subject and object must contain at least one concept from the hierarchy to be mapped to the KB
• Preliminary synonyms based on anchor labels and page redirects in Wikipedia
– Prolactostatin redirects to Dopamine
• Predicates (verbs) and entities are subjected to stemming using WordNet
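A minimal sketch of these mapping rules, under stated assumptions: the `HIERARCHY` set is hypothetical, concepts are single lowercase tokens, and `naive_stem` is a crude suffix stripper standing in for WordNet-based stemming (it is applied to the predicate only here). The one redirect entry comes from the slide.

```python
import re

# Hypothetical base-hierarchy concepts (single lowercase tokens for simplicity).
HIERARCHY = {"dopamine", "prolactin"}

# Synonyms from Wikipedia redirects; this entry is the example on the slide.
REDIRECTS = {"prolactostatin": "dopamine"}


def naive_stem(word):
    """Crude stand-in for WordNet-based stemming: strip a few common suffixes."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def concepts_in(entity):
    """Hierarchy concepts mentioned in an entity phrase, after resolving synonyms."""
    tokens = re.findall(r"[a-z0-9]+", entity.lower())
    resolved = {REDIRECTS.get(token, token) for token in tokens}
    return resolved & HIERARCHY


def map_triple(subject, predicate, obj):
    """Return a normalized triple, or None if either side lacks a hierarchy concept."""
    subject_concepts = concepts_in(subject)
    object_concepts = concepts_in(obj)
    if not subject_concepts or not object_concepts:
        return None  # rule: both subject and object must contain a concept
    return (sorted(subject_concepts), naive_stem(predicate.lower()), sorted(object_concepts))


if __name__ == "__main__":
    print(map_triple("Prolactostatin", "inhibits", "prolactin secretion"))
    print(map_triple("Prolactostatin", "inhibits", "milk production"))
```

The second call returns `None` because its object contains no hierarchy concept, so that triple would not be mapped to the KB.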
Scooner: Full Architecture
New implicit knowledge
VIP Peptide – affects – fear conditioning
Caveat: Each triple above was observed in a different organism (cows, mice, humans), but it is still an interesting hypothesis. Scooner’s contextual browsing makes this clear to the user.
Project 7: Drug Abuse
• Why: To study social trends in pharmaceutical opioid abuse
• What:
– Describe drug users’ knowledge, attitudes, and behaviors related to illicit use of OxyContin®
– Describe temporal patterns of non-medical use of OxyContin® tablets as discussed on Web-based forums
• Where: CITAR (Center for Interventions, Treatment and Addictions Research) at Wright State Univ.
• Status: In progress (recently funded by NIDA)
Project 8: NMR
• Why: Streamline the NMR data processing tasks. Processing NMR experimental data is complex and time consuming.
• What: Providing biologists with tools to effectively process and manage Nuclear Magnetic Resonance (NMR) experimental data.
• How: Use Domain Specific Languages (DSL) to create scientist-friendly abstractions for complex statistical workflows. Use semantics based techniques to store and manage data.
• Where: Air Force Research Lab
• Status: In progress
Motivation
NMR spectroscopy data is complex and requires significant statistical processing before interpretation
- Writing these processes is hard
- They have to run on many different computational platforms
- The data collected has to be shared among multiple parties
A simple NMR spectrum, highlighting peaks that correspond to the presence of specific chemical compounds
A complex NMR spectrum, marked with chemical compound identifiers by human observers.
Project Outline
Identify fundamental operators required for common NMR processing tasks
Use a DSL to provide abstractions for the operators (named SCALE)
Build compilers to generate multiple, cloud-enabled applications
Real time Healthcare Information
• Matching medical requirements with availability of medical resources (Mumbai, India)
– Project HERO: Helpline for Emergency Response Operations
– For patients seeking immediate medical help
• Medical awareness in rural India
– mMitra, info. service during pregnancy and childhood
Future Interoperability Challenge: 360 degree health
[Figure: components of 360-degree health: Clinical Care; Insurance, Financial Aspects; Genetic Tests… Profiles; Follow up, Lifestyle; Clinical Trials; Social Media]
• For each component in 360-degree health care, we have data, processes, knowledge and experience. Interoperability solutions need to encompass all these!
– Possibly the largest growth in data will be in sensors (e.g., Body Area Networks, Biosensors) and social content. Extensive use of mobile phones.
Credit: ece.virginia.edu
Summary
• Semantic Web is an “interoperability technology”
• Semantic Web provides the needed interoperability, and can accommodate all necessary “points of view”
• Linked Data as a way of sharing data is highly promising
• Many examples of viable usage of Semantic Web technologies
• Words of warning about deployment
• Significant research challenges remain as Health presents the most complex domain
Representative References
1. A. Sheth, S. Agrawal, J. Lathem, N. Oldham, H. Wingate, P. Yadav, and K. Gallagher, Active Semantic Electronic Medical Record, Intl Semantic Web Conference, 2006.
2. Satya Sahoo, Olivier Bodenreider, Kelly Zeng, and Amit Sheth, An Experiment in Integrating Large Biomedical Knowledge Resources with RDF: Application to Associating Genotype and Phenotype Information, WWW2007 HCLS Workshop, May 2007.
3. Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit Sheth, From "Glycosyltransferase" to "Congenital Muscular Dystrophy": Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology, Amsterdam: IOS, August 2007, PMID: 17911917, pp. 1260-4.
4. Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner, and Amit P. Sheth, An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence, Journal of Biomedical Informatics, 2008.
5. Cartic Ramakrishnan, Krzysztof J. Kochut, and Amit Sheth, "A Framework for Schema-Driven Relationship Discovery from Unstructured Text", Intl Semantic Web Conference, 2006, pp. 583-596
6. Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies", 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006.
7. Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth, and Krishnaprasad Thirunarayan, "Provenance Context Entity (PaCE): Scalable provenance tracking for scientific RDF data", SSDBM, Heidelberg, Germany, 2010.