
Dec 21, 2015

Page 1:

CIS607, Fall 2004

Semantic Information Integration

Attendees: Vikash Agarwal, Julian M Catchen, Kevin A Huck, Kushal M Koolwal, Paea J Le Pendu, Xiangkui Yao, Xiaofang Zhang

Instructor/Organizer: Dejing Dou

Week 10 (Dec. 1)

Page 2:

Outline

Personal Information Management (PIM)
Semantic Integration in PIM
Medical Informatics and Bioinformatics
Semantic Integration in Biomedical Informatics

Page 3:

Personal Information

Homepages (HTML, XML)
Personal Emails (Text)
Spreadsheets (e.g. Microsoft Excel)
Contact Lists (Text)
Calendar
Publications and Presentations (Word, LaTeX, PowerPoint)
Personal Databases (SQL)
……

Page 4:

Personal Information Management (PIM)

How to organize personal information resources
– They are currently organized by applications and locations.
How to integrate and share the data
– Mostly manually (e.g. copy and paste).
How to search (query)
– E.g. Prof. Wang wants to know which papers his students presented at conferences and the travel expenses charged to grants.

Good news: The development of the Internet, the Web and wireless communication makes personal information accessible from desktops, laptops, palm devices and cellphones.

The problems: Different formats and data structures, and different contents depending on the application.

Page 5:

Association (Relationship)-based PIM

Organize the personal information resources based on their associations (relationships).
– Emails and Contact Lists
– Homepage and Publications
– Calendar and Spreadsheets

Use a domain ontology to define those concepts and store associations (relationships) as mappings.

Develop an integration engine to process the data and queries based on the domain ontology and mappings.
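As a rough illustration of the three steps above (concepts taken from a domain ontology, associations stored as mappings, and an engine that answers queries over them), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the concept names, the relation names (advises, presented, travel_expense), the AssociationStore class and the sample data are hypothetical and are not taken from the system described in the slides.

```python
from collections import defaultdict

# Hypothetical concept names standing in for a domain ontology.
CONCEPTS = {"Person", "Publication", "Expense"}


class AssociationStore:
    """Holds typed items plus the associations (relationships) between them."""

    def __init__(self):
        self.items = {}                # item id -> (concept, attribute dict)
        self.links = defaultdict(set)  # (source id, relation) -> target ids

    def add_item(self, item_id, concept, **attrs):
        if concept not in CONCEPTS:
            raise ValueError(f"unknown concept: {concept}")
        self.items[item_id] = (concept, attrs)

    def associate(self, source, relation, target):
        self.links[(source, relation)].add(target)

    def related(self, source, relation):
        # .get avoids creating empty entries for unknown keys.
        return sorted(self.links.get((source, relation), ()))

    def attr(self, item_id, name):
        return self.items[item_id][1][name]


def build_example():
    """Invented sample data, loosely following the Prof. Wang query."""
    store = AssociationStore()
    store.add_item("wang", "Person", name="Prof. Wang")
    store.add_item("alice", "Person", name="Alice")
    store.add_item("paper1", "Publication", title="Paper A")
    store.add_item("exp1", "Expense", amount=850.0, grant="Grant G")
    store.associate("wang", "advises", "alice")
    store.associate("alice", "presented", "paper1")
    store.associate("paper1", "travel_expense", "exp1")
    return store


def papers_and_expenses(store, advisor):
    """List (student, paper title, total travel cost) by following associations."""
    rows = []
    for student in store.related(advisor, "advises"):
        for paper in store.related(student, "presented"):
            total = sum(store.attr(e, "amount")
                        for e in store.related(paper, "travel_expense"))
            rows.append((store.attr(student, "name"),
                         store.attr(paper, "title"), total))
    return rows
```

Calling papers_and_expenses(build_example(), "wang") walks advisor, student, paper and expense in turn, which is the kind of cross-application query that is otherwise answered by manual copy and paste.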

Page 6:

Association (Relationship)-based PIM (cont’d)

[Diagram. Labels: Domain ontology; Person, Homepage, Contacts, SpreadSheet, Publications, Calendar, Emails; Information Resources (Data); Integration Engine; User; Personal DBs; SQL.]

Page 7:

Main Topics in Association-based PIM

How to integrate structured data and unstructured data
– Databases and spreadsheets are structured; XML and LaTeX are semi-structured.
– Emails, HTML, contacts and Word documents are unstructured text.

How to define the domain ontology. The concepts of different resources use different hierarchies.

How to express the mappings (rules) between different information resources. How can the integration engine use those mappings to integrate data and answer queries?
– Emails and Contact Lists
– Homepage, Publications and Personal Databases
– Calendar and Spreadsheets
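To make the first topic concrete, the sketch below shows one way a structured source (a spreadsheet exported as CSV) and an unstructured one (raw email text) might be reduced to the same kind of record before any mapping is applied. It is only a sketch under stated assumptions: the column names name and email, the "Name <address>" pattern and the function names are all hypothetical choices, not part of the slides.

```python
import csv
import io
import re

# Assumed pattern for "Name <address>" as it commonly appears in email headers.
NAME_ADDR = re.compile(r"([A-Za-z][A-Za-z .]*?)\s*<([\w.+-]+@[\w.-]+)>")


def contacts_from_csv(text):
    """Structured source: every row already has named fields."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["email"].lower(): row["name"] for row in reader}


def senders_from_email(text):
    """Unstructured source: pull name/address pairs out of free text."""
    return {addr.lower(): name.strip()
            for name, addr in NAME_ADDR.findall(text)}


def merge_people(*sources):
    """Group every name variant under one key, the lower-cased address."""
    merged = {}
    for source in sources:
        for addr, name in source.items():
            merged.setdefault(addr, set()).add(name)
    return merged
```

Using the email address as the join key is a design assumption; it lets a Contact Lists entry and an Emails sender be associated even when the names are written differently.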

Page 8:

Bioinformatics and Medical Informatics

What it is

The analysis of biological and medical information using computers and statistical techniques; the science of developing and utilizing computer databases and algorithms to accelerate and enhance biological and medical research.

What it can do
– In genomics, bioinformatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.
– In neuroscience, medical informatics can analyze EEG and MRI data to study the functions of neurons and the human brain.
– In pharmacy, medical informatics can help study drug use and drug interactions.
– In clinical studies, medical informatics (e.g. expert systems) can help study diseases and the treatment of patients.

Page 9:

Good news and problems

Good news
– Most biomedical data has been stored in databases, so it is structured data.
– Statistics-based data mining techniques have been used successfully to find patterns in the data.

Problems in biomedical data integration
– Most biomedical databases were developed locally and are application-oriented, so there is little agreement among their schemas.
– It is difficult for other people, especially people without biomedical knowledge, to understand the schemas.
– Database schemas are not expressive enough to capture the meaning (“semantics”) of the data and of patterns in the data.

Page 10:

Integrating Neuronal Databases

Cooperation with the Yale Medical Informatics Center to integrate the web-based neuronal databases of Senselab (Yale) and CNDB (Cornell).
– Senselab: model and structure information for a particular class of neurons.
– CNDB: experimental data for individual neurons measured on a particular day.

Researchers in Senselab have marked up their data and database schema with EDSP [Marenco et al. 03], an XML specification. Cornell’s researchers have also marked up their data and database schema with another XML dialect.

[Images: structure image; experimental EEG (electroencephalography) data.]

Page 11:

Integrating Neuronal Databases (cont’d)

Get their database schemas from the XML files and transform them into class and property definitions. Find the mapping between these two neuronal database schemas with the help of domain experts (neuroscientists). Merge the two database schemas with bridging axioms, e.g.:

(forall (n - neuron)
   (if (@cndb:funct_area n hippocampal.CA1)
       (@senselab:Neurons @senselab:Hippocampus n)))

We have developed some initial semi-automatic tools and GUIs to help domain experts, such as neuroscientists, map and merge two neuronal database schemas.
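One way to read the bridging axiom above is as a forward rule: whenever a CNDB fact says a neuron's functional area is hippocampal.CA1, a corresponding Senselab fact can be derived. The Python sketch below illustrates that reading only. The tuple encoding of facts, the BridgingAxiom type, the neuron identifiers and the cortex.V1 value are assumptions made for the example; this is not the inference engine used in the project.

```python
from typing import NamedTuple


class BridgingAxiom(NamedTuple):
    """A rule of the form: source_predicate(n, source_value)
    implies target_predicate(target_argument, n)."""
    source_predicate: str
    source_value: str
    target_predicate: str
    target_argument: str


# The axiom from the slide, written in the assumed encoding.
CA1_AXIOM = BridgingAxiom(
    source_predicate="cndb:funct_area",
    source_value="hippocampal.CA1",
    target_predicate="senselab:Neurons",
    target_argument="senselab:Hippocampus",
)

# Hypothetical CNDB facts as (predicate, neuron, value) tuples.
CNDB_FACTS = {
    ("cndb:funct_area", "neuron42", "hippocampal.CA1"),
    ("cndb:funct_area", "neuron43", "cortex.V1"),
}


def apply_axiom(axiom, facts):
    """Derive target-schema facts for every source fact the axiom matches."""
    derived = set()
    for predicate, neuron, value in facts:
        if predicate == axiom.source_predicate and value == axiom.source_value:
            derived.add((axiom.target_predicate, axiom.target_argument, neuron))
    return derived
```

Only neuron42 matches the condition, so apply_axiom(CA1_AXIOM, CNDB_FACTS) yields a single Senselab fact for it, and the other neuron is left untranslated until an axiom covering its area is written.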

Page 12:

Interactive Axiom Composition by Domain Experts

Ontology mapping by similarity matching using dictionaries, e.g. Protein vs. Enzyme.

Axiom production: Allow domain experts to give some concrete examples of how two symbols in different ontologies (database schemas) are related. Generalize the examples into usable bridging axioms, a machine learning approach to generating mapping rules.

Pattern reuse: Because a large number of correspondences can usually be sorted into a small set of patterns, allow domain experts to note and reuse these patterns.

Consistency testing: Detect contradictions among the generated bridging axioms; display the bugs to domain experts and allow the axioms to be edited.
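The first step, similarity matching using dictionaries, can be pictured with a very small lookup-based matcher. This is a sketch only, assuming a hand-written dictionary: the SYNONYMS and KIND_OF tables, their entries and the suggest_mapping function are hypothetical, and a real tool would consult a much larger lexical resource and pass its suggestions to a domain expert for confirmation.

```python
# Assumed dictionary entries, for illustration only.
SYNONYMS = {"neuron": {"nerve cell"}}
KIND_OF = {"enzyme": "protein"}  # assumed entry: an enzyme is a kind of protein


def suggest_mapping(a, b):
    """Suggest how symbol a relates to symbol b, or None if the
    dictionary offers no evidence."""
    a, b = a.lower(), b.lower()
    if a == b:
        return "equivalent"
    if b in SYNONYMS.get(a, set()) or a in SYNONYMS.get(b, set()):
        return "equivalent"
    if KIND_OF.get(a) == b:
        return "subclass"    # a is a kind of b
    if KIND_OF.get(b) == a:
        return "superclass"  # b is a kind of a
    return None
```

For the Protein vs. Enzyme example, suggest_mapping("Enzyme", "Protein") returns "subclass" under the assumed dictionary, which an expert could then accept, reject, or turn into a bridging axiom.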

Page 13:

The mappings between EEG and MRI data

[Images: EEG data acquisition; magnetic resonance imaging (MRI).]

Page 14:

Ontology-based Data Analysis (Mining)

You can consider it as an expert system. At least useful for training purposes.

[Diagram. Labels: EEG, MRI … data; Computational tools; DataR, DataM; OR, OM; Inference Engine.]

What are the features (patterns) of the processed data?

What can the patterns tell us (e.g. any function and disease of the brain)?

Page 15:

Ontology-based Genome DB Mediation

Integrating databases with the domain ontology. The system can process meaningful queries and data based on the mapping rules.

[Diagram. Labels: Query based on domain ontology; Domain Ontology (includes GO); Onto1, Onto2, Onto3; DB1, DB2, DB3, ……; e.g. ZFIN; e.g. another Zebrafish Lab DB; e.g. Human DB.]
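The mediation idea above, a query phrased in domain-ontology terms and answered over several databases through mapping rules, can be sketched as follows. All names here are assumptions for illustration: the property names gene_symbol and organism, the per-database field names, the sample rows and the mediate function are hypothetical, and the rows are not real database contents.

```python
# Hypothetical mapping rules: domain-ontology property -> local field name.
MAPPINGS = {
    "DB1": {"gene_symbol": "symbol", "organism": "species"},
    "DB2": {"gene_symbol": "gene_name", "organism": "taxon"},
}

# Hypothetical rows, each stored in its database's own vocabulary.
SOURCES = {
    "DB1": [{"symbol": "pax2a", "species": "zebrafish"}],
    "DB2": [
        {"gene_name": "PAX2", "taxon": "human"},
        {"gene_name": "pax2a", "taxon": "zebrafish"},
    ],
}


def mediate(query):
    """Answer a query written in domain-ontology terms over every source."""
    results = []
    for db, rows in SOURCES.items():
        mapping = MAPPINGS[db]
        # Rewrite the query into the source's own field names.
        local_query = {mapping[prop]: value for prop, value in query.items()}
        reverse = {field: prop for prop, field in mapping.items()}
        for row in rows:
            if all(row.get(f) == v for f, v in local_query.items()):
                # Translate the matching row back into domain-ontology terms.
                answer = {reverse[f]: v for f, v in row.items()}
                answer["source"] = db
                results.append(answer)
    return results
```

A call such as mediate({"organism": "zebrafish"}) returns one answer from each database in the shared vocabulary, so the user never needs to know that one source calls the field species and the other taxon.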

Page 16:

Genotypes + Environment => Phenotypes

[Diagram. Labels:]
DataG, OG: the features (makeup) of genes
DataP, OP: the observable characteristics produced by the genotype interacting with the environment
DataE, OE: environment features (temperature, pressure, light, ……)
GO (gene ontology): Cellular Component, Molecular Functions, Biological Process
Too many features