BeeSpace Informatics Research ChengXiang (“Cheng”) Zhai Department of Computer Science Institute for Genomic Biology Statistics Graduate School of Library & Information Science University of Illinois at Urbana-Champaign BeeSpace Workshop, May 22, 2009 1

Jan 18, 2018

Page 1

BeeSpace Informatics Research

ChengXiang (“Cheng”) Zhai

Department of Computer Science
Institute for Genomic Biology
Statistics
Graduate School of Library & Information Science
University of Illinois at Urbana-Champaign

BeeSpace Workshop, May 22, 2009

Page 2

Goal of Informatics Research

• Develop general and scalable computational methods to enable
  – Semantic integration of data and information
  – Effective information access and exploration
  – Knowledge discovery
  – Hypothesis formulation and testing

• Reinforcement of research in biology and computer science
  – CS research to automate manual tasks of biologists
  – Biology research to raise new challenges for CS

Page 3

Overview of BeeSpace Technology

[Figure: architecture diagram. Components: Literature Text, Search Engine, Natural Language Understanding (Words/Phrases, Entities, Relations), Meta Data, Relational Database, Text Miner, Function Annotator, Gene Summarizer, Space/Region Manager and Navigation Support, Users. Layers: Content Analysis, Information Access & Exploration, Knowledge Discovery & Hypothesis Testing, Question Answering.]

Page 4

Informatics Research Accomplishments

[Figure: BeeSpace architecture diagram (Literature Text, Search Engine, Natural Language Understanding, Relational Database, Text Miner, Function Annotator, Gene Summarizer, Space/Region Manager, Users), annotated with the accomplishments below.]

• Biomedical information retrieval [Jiang & Zhai 07], [Lu et al. 08]
• Entity/Relation extraction [Jiang & Zhai 06], [Jiang & Zhai 07a], [Jiang & Zhai 07b]
• Topic discovery and interpretation [Mei et al. 06a], [Mei et al. 07a], [Mei et al. 07b], [Chee & Schatz 08]
• Entity/Gene Summarization [Ling et al. 06], [Ling et al. 07], [Ling et al. 08]
• Automatic Function Annotation [He et al. 09/10]

Page 5

Overview of BeeSpace Technology

[Figure: BeeSpace architecture diagram, with its components grouped into the four parts of the talk.]

Part 1. Information Extraction
Part 2. Navigation Support
Part 3. Entity Summarization
Part 4. Function Analysis

Page 6

Part 1. Information Extraction

Page 7

Natural Language Understanding

"…We have cloned and sequenced a cDNA encoding Apis mellifera ultraspiracle (AMUSP) and examined its responses to …"

[Figure: parse of the sentence into noun phrases (NP) and verb phrases (VP), with two mentions tagged as Gene.]

Page 8

Entity & Relation Extraction

Genetic Interaction
Gene X    Gene Y
Bcd       hb
…         …

Expression Location
Gene X    Anatomy Y
Bcd       embryo
Hb        egg
…         …

Lopes FJ et al., 2005 J. Theor. Biol.

Page 9

General Approach: Machine Learning

• Computers learn from labeled examples to compute a function that predicts the labels of new examples
• Examples of predictions
  – Given a phrase, predict whether it is a gene name
  – Given a sentence with two gene names mentioned, predict whether there is a genetic interaction relation
• Many learning methods are available, but training data isn’t always available

Page 10

Extraction Example 1: Gene Name Recognition

"… expression of terminal gap genes is mediated by the local activation of the Torso receptor tyrosine kinase (Tor). At the anterior, terminal gap genes are also activated by the Tor pathway but Bcd contributes to their activation."

[Figure: three candidate phrases in the passage are marked "Gene?".]

Page 11

Features for Recognizing Genes

• Syntactic clues:
  – Capitalization (especially acronyms)
  – Numbers (gene families)
  – Punctuation: -, /, :, etc.
• Contextual clues:
  – Local: surrounding words such as “gene”, “encoding”, “regulation”, “expressed”, etc.
  – Global: same noun phrase occurs several times in the same article
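The clues listed above can be turned into binary features for a candidate token. The following is a minimal sketch, not the BeeSpace feature set: the function name `gene_features`, the cue-word set, and the window size are assumptions made for illustration, and the "global" clue is approximated by counting repeats of the token in the same token list.

```python
import re

# Hypothetical cue words, copied from the slide's examples of local context.
CONTEXT_CUES = {"gene", "encoding", "regulation", "expressed"}

def gene_features(tokens, i, window=2):
    """Return binary features for the candidate token tokens[i]."""
    tok = tokens[i]
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    neighbors = [tokens[j].lower() for j in range(lo, hi) if j != i]
    return {
        # Syntactic clues
        "starts_with_capital": tok[:1].isupper(),
        "is_acronym": len(tok) > 1 and tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_punct": bool(re.search(r"[-/:]", tok)),
        # Contextual clues: local cue words, and repetition in the article
        "near_cue_word": any(n in CONTEXT_CUES for n in neighbors),
        "repeated_in_article": tokens.count(tok) > 1,
    }
```

For example, `gene_features("that CD38 is expressed by both neurons".split(), 1)` flags the capital letter, the digits, and the nearby cue word "expressed".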

Page 12

Maximum Entropy Model for Gene Tagging

• Given an observation (a token or a noun phrase), together with its context, denoted as x
• Predict y ∈ {gene, non-gene}
• Maximum entropy model:

  $P(y \mid x) = K \exp\left(\sum_i \lambda_i f_i(x, y)\right)$

• Typical f:
  – y = gene & candidate phrase starts with a capital letter
  – y = gene & candidate phrase contains digits
• Estimate $\lambda_i$ with training data
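To make the formula concrete, here is a small sketch that evaluates P(y|x) for the two typical features named on the slide. The weights passed in are arbitrary illustrative numbers, not estimated values, and the normalizer K is computed by summing over the two labels; `feature_vector` and `maxent_prob` are hypothetical names.

```python
import math

LABELS = ("gene", "non-gene")

def feature_vector(x, y):
    """The two indicator features f_i(x, y) given as examples on the slide."""
    return [
        1.0 if y == "gene" and x[:1].isupper() else 0.0,
        1.0 if y == "gene" and any(c.isdigit() for c in x) else 0.0,
    ]

def maxent_prob(x, weights):
    """P(y|x) = K * exp(sum_i lambda_i * f_i(x, y)), with K = 1 / sum over labels."""
    scores = {
        y: math.exp(sum(lam * f for lam, f in zip(weights, feature_vector(x, y))))
        for y in LABELS
    }
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}
```

With all weights at zero both labels get probability 0.5; positive weights shift probability toward "gene" whenever the corresponding feature fires.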

Page 13

Special Challenges

• Gene name disambiguation

• Domain adaptation

Page 14

Gene Name Disambiguation

• Gene names can be common English words: for (foraging), in (inturned), similar (sima), yellow (y), black (b)…
• Solution:
  – Disambiguate by looking at the context of the candidate word
  – Train a classifier

Page 15

Discriminative Neighbor Words

Page 16

Sample Disambiguation Results

Candidate words: “foraging”, “for”

"... affect complex behaviors such as locomotion and foraging. The foraging (for) gene encodes a pkg in drosophila melanogaster here we demonstrate a function for the for gene in sensory responsiveness and …"

Scores, in the order printed on the slide: -1.468, +3.359, +5.497, -0.582, +5.980

Candidate word: “black”

"the cuticular melanization phenotype of black flies is rescued by beta-alanine but beta-alanine production by aspartate decarboxylation was reported to be normal in assays of black mutants and although …"

Scores, in the order printed on the slide: -2.780, +9.759

Page 17

Nov 27, 2007

Problem of Domain Overfitting

[Figure: a gene name recognizer scores 54.1% in the ideal setting and 28.1% in the realistic setting. Example fly gene names: wingless, daughterless, eyeless, apexless, …]

Page 18

Solution: Learn Generalizable Features

"…decapentaplegic and wingless are expressed in analogous patterns in each primordium of…"

"…that CD38 is expressed by both neurons and glial cells…"

"…that PABPC5 is expressed in fetal brain and in a range of adult tissues."

Generalizable Feature: “w+2 = expressed”

Page 19

Generalizability-Based Feature Ranking

[Figure: features from the training data, such as "expressed" and "-less", are ranked in four separate lists (ranks 1 to 8). The lists are combined into one ranking in which "expressed" precedes "-less"; the scores 0.125 and 0.167 appear under the combined ranking.]

Page 20

Effectiveness of Domain Adaptation

• Standard learning: Fly + Mouse → Yeast, gene name recognizer 63.3%
• Domain adaptive learning: Fly + Mouse → Yeast, gene name recognizer 75.9%

Page 21

More Results on Domain Adaptation

Exp      Method     Precision  Recall   F1
F+M→Y    Baseline   0.557      0.466    0.508
         Domain     0.575      0.516    0.544
         % Imprv.   +3.2%      +10.7%   +7.1%
F+Y→M    Baseline   0.571      0.335    0.422
         Domain     0.582      0.381    0.461
         % Imprv.   +1.9%      +13.7%   +9.2%
M+Y→F    Baseline   0.583      0.097    0.166
         Domain     0.591      0.139    0.225
         % Imprv.   +1.4%      +43.3%   +35.5%

• Text data from BioCreAtIvE (Medline)
• 3 organisms (Fly, Mouse, Yeast)
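The F1 and "% Imprv." columns can be checked from the precision and recall values. This sketch assumes F1 is the usual harmonic mean of precision and recall and that the improvement is relative to the baseline; the slide does not state either definition, and small differences in the last digit come from the table's rounded inputs.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def relative_improvement(baseline, new):
    """Relative change from baseline to new, as a fraction (0.071 means +7.1%)."""
    return (new - baseline) / baseline
```

For the F+M→Y row, `f1(0.557, 0.466)` is within rounding of the reported 0.508, and `relative_improvement(0.508, 0.544)` is about 0.071.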

Page 22

Extraction Example 2: Genetic Interaction Relation

"Bcd regulates the expression of the maternal and zygotic gene hunchback (hb) that shows a step-like-function expression pattern, in the anterior half of the egg."

[Figure: two mentions in the sentence are tagged as Gene.]

Is there a genetic interaction relation here?

Page 23

Challenges

• No/little training data

• What features to use?

Page 24

Solution: Pseudo Training Data

Gene: Bcd +

"These results uncovered an antagonism between hunchback and bicoid at the anterior pole, whereas the two genes are known to act in concert for most anterior segmented development."

Page 25

Pseudo Training Data Works Reasonably Well

[Figure: precision versus recall plot.]

Using all features works the best

Page 26

Large-Scale Entity/Relation Extraction

• Entity annotation

Entity Type   Resource            Method
Gene          NCBI, FlyBase, …    Dictionary string search + machine learning
Anatomy       FlyBase             Dictionary string search
Chemical      MeSH, Biosis, …     Dictionary string search
Behavior                          “x x behavior” pattern search

• Relation extraction

Relation Type   Method
Regulatory      Pre-defined pattern + machine learning
Expressed In    Co-occurrence + relevant keywords
Gene Behavior   Co-occurrence
Gene Chemical   Co-occurrence
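Dictionary string search followed by co-occurrence, as listed in the tables for this slide, can be sketched as below. This is an illustration under assumptions: the dictionaries are plain lists of names, matching is case-insensitive on word boundaries, and two entities "co-occur" when they appear in the same sentence. The function names are hypothetical.

```python
import re
from itertools import product

def find_mentions(sentence, dictionary):
    """Dictionary string search: names that occur as whole words in the sentence."""
    return [
        name for name in dictionary
        if re.search(r"\b" + re.escape(name) + r"\b", sentence, flags=re.IGNORECASE)
    ]

def cooccurrences(sentences, genes, others):
    """Pairs (gene, other entity) that are mentioned in the same sentence."""
    pairs = set()
    for sentence in sentences:
        for pair in product(find_mentions(sentence, genes),
                            find_mentions(sentence, others)):
            pairs.add(pair)
    return pairs
```

Plain string search would also match "for" used as a preposition, which is why the Gene row of the entity table pairs it with machine learning.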

Page 27

Part 2: Semantic Navigation

Page 28

Space-Region Navigation

[Figure: literature spaces (Bee, Fly, Bird, Behavior, …) and topic regions (Bee Forager, Fly Rover, Bird Singing, …) linked by the operations MAP, EXTRACT and SWITCHING. Spaces and regions can each be combined with Intersection, Union, … and saved as "My Spaces" and "My Regions/Topics".]

Page 29

General Approach: Language Models

• Topic = word distribution

• Modeling text in a space with mixture models of multinomial distributions

• Text Mining = Parameter Estimation + Inferences

• Matching = Compute similarity between word distributions

• Users can “control” a model by specifying topic preferences

Page 30

A Sample Topic & Corresponding Space

Word Distribution (language model):
filaments 0.0410238, muscle 0.0327107, actin 0.0287701, z 0.0221623, filament 0.0169888, myosin 0.0153909, thick 0.00968766, thin 0.00926895, sections 0.00924286, er 0.00890264, band 0.00802833, muscles 0.00789018, antibodies 0.00736094, myofibrils 0.00688588, flight 0.00670859, images 0.00649626

Meaningful labels:
actin filaments, flight muscle, flight muscles

Example documents:
• actin filaments in honeybee-flight muscle move collectively
• arrangement of filaments and cross-links in the bee flight muscle z disk by image analysis of oblique sections
• identification of a connecting filament protein in insect fibrillar flight muscle
• the invertebrate myosin filament subfilament arrangement of the solid filaments of insect flight muscles
• structure of thick filaments from insect flight muscle

Page 31

MAP: Topic/Region → Space

• MAP: Use the topic/region description as a query to search a given space
• Retrieval algorithm:
  – Query word distribution: p(w|Q)
  – Document word distribution: p(w|D)
  – Score a document based on similarity of Q and D

  $\mathrm{score}(Q, D) = -D(Q \,\|\, D) = -\sum_{w \in \mathrm{Vocabulary}} p(w \mid Q) \log \frac{p(w \mid Q)}{p(w \mid D)}$

• Leverage existing retrieval toolkits: Lemur/Indri
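The scoring rule on this slide, negative KL divergence between the query and document word distributions, can be sketched in a few lines. This is not how Lemur/Indri are invoked; it is a standalone illustration, and the additive smoothing constant `mu` is an assumption introduced so that p(w|D) is never zero.

```python
import math
from collections import Counter

def word_dist(text, vocab, mu=0.5):
    """Smoothed unigram distribution over vocab (additive smoothing is an assumption)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}

def kl_score(p_q, p_d):
    """score(Q, D) = -sum_w p(w|Q) * log(p(w|Q) / p(w|D)); higher means more similar."""
    return -sum(pq * math.log(pq / p_d[w]) for w, pq in p_q.items() if pq > 0)
```

Usage: build one vocabulary from the query and all documents, compute `word_dist` for each, and rank documents by `kl_score(p_query, p_doc)` in descending order. A document identical in distribution to the query scores 0, the maximum.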

Page 32

EXTRACT: Space → Topic/Region

• Assume k topics, each being represented by a word distribution
• Use a k-component mixture model to fit the documents in a given space (EM algorithm)
• The estimated k component word distributions are taken as k topic regions

Likelihood:

  $\log p(C \mid \Lambda) = \sum_{D \in C} \sum_{i=1}^{|D|} \log\Big[\lambda\, p(D_i \mid \theta_B) + (1 - \lambda) \sum_{j=1}^{k} \pi_j\, p(D_i \mid \theta_j)\Big]$

Here $D_i$ is the i-th word of document D, $\theta_B$ is a background word distribution with weight $\lambda$, $\theta_j$ is the word distribution of topic j with mixing weight $\pi_j$, and $\Lambda$ is the set of all parameters.

Maximum likelihood estimator: $\Lambda^* = \arg\max_{\Lambda} p(C \mid \Lambda)$

Bayesian estimator: $\Lambda^* = \arg\max_{\Lambda} p(\Lambda \mid C) = \arg\max_{\Lambda} p(C \mid \Lambda)\, p(\Lambda)$
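As a worked instance of the likelihood on this slide, the sketch below evaluates the log-likelihood of a collection under a mixture of one background distribution and k topic distributions. It only evaluates the objective that EM would maximize; it does not run EM. The argument names (`lam`, `theta_b`, `pis`, `thetas`) are hypothetical labels for the background weight, the background distribution, the topic mixing weights, and the topic word distributions.

```python
import math

def log_likelihood(collection, lam, theta_b, pis, thetas):
    """Sum over documents D and word positions i of
    log[lam * p(D_i|theta_b) + (1 - lam) * sum_j pis[j] * p(D_i|thetas[j])]."""
    total = 0.0
    for doc in collection:              # each doc is a list of words
        for w in doc:
            topic_part = sum(pi * theta.get(w, 0.0)
                             for pi, theta in zip(pis, thetas))
            total += math.log(lam * theta_b[w] + (1 - lam) * topic_part)
    return total
```

With a single document `["a", "b"]`, `lam = 0.5`, a uniform background over {a, b}, and one topic that puts all its mass on "a", the two word probabilities are 0.75 and 0.25, so the result is log(0.75) + log(0.25).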

Page 33

User-Controlled Exploration: Sample Topic 1

Prior: labor 0.2, division 0.2

Word distribution:
age 0.0672687, division 0.0551497, labor 0.052136, colony 0.038305, foraging 0.0357817, foragers 0.0236658, workers 0.0191248, task 0.0190672, behavioral 0.0189017, behavior 0.0168805, older 0.0143466, tasks 0.013823, old 0.011839, individual 0.0114329, ages 0.0102134, young 0.00985875, genotypic 0.00963096, social 0.00883439

Page 34

User-Controlled Exploration: Sample Topic 2

Prior: behavioral 0.2, maturation 0.2

Word distribution:
behavioral 0.110674, age 0.0789419, maturation 0.057956, task 0.0318285, division 0.0312101, labor 0.0293371, workers 0.0222682, colony 0.0199028, social 0.0188699, behavior 0.0171008, performance 0.0117176, foragers 0.0110682, genotypic 0.0106029, differences 0.0103761, polyethism 0.00904816, older 0.00808171, plasticity 0.00804363, changes 0.00794045

Page 35

Exploit Prior for Concept Switching

First word distribution:
foraging 0.290076, nectar 0.114508, food 0.106655, forage 0.0734919, colony 0.0660329, pollen 0.0427706, flower 0.0400582, sucrose 0.0334728, source 0.0319787, behavior 0.0283774, individual 0.028029, rate 0.0242806, recruitment 0.0200597, time 0.0197362, reward 0.0196271, task 0.0182461, sitter 0.00604067, rover 0.00582791, rovers 0.00306051

Second word distribution:
foraging 0.142473, foragers 0.0582921, forage 0.0557498, food 0.0393453, nectar 0.03217, colony 0.019416, source 0.0153349, hive 0.0151726, dance 0.013336, forager 0.0127668, information 0.0117961, feeder 0.010944, rate 0.0104752, recruitment 0.00870751, individual 0.0086414, reward 0.00810706, flower 0.00800705, dancing 0.00794827, behavior 0.00789228
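One common way a word-level prior such as "labor 0.2, division 0.2" can steer an estimated topic is to treat it as pseudo-counts added to the observed counts. The slides do not say how the prior enters the estimator, so the sketch below is an assumption: it shows a maximum a posteriori estimate of a single word distribution under a Dirichlet-style prior, with a hypothetical strength parameter `mu`.

```python
def map_estimate(counts, prior, mu=10.0):
    """Word distribution with prior pseudo-counts:
    theta(w) = (c(w) + mu * prior(w)) / (sum of counts + mu * sum of prior)."""
    vocab = set(counts) | set(prior)
    denom = sum(counts.values()) + mu * sum(prior.values())
    return {w: (counts.get(w, 0.0) + mu * prior.get(w, 0.0)) / denom
            for w in vocab}
```

With counts `{"labor": 1, "age": 3}` and the prior `{"labor": 0.2, "division": 0.2}`, the word "division" receives probability mass even though it was never observed, and "labor" is pulled up to the level of "age". A larger `mu` makes the prior dominate.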

Page 36

Part 3: Entity Summarization

Page 37

Automated Gene Summarization?

Multi-Aspect Gene Summary:
• Gene product
• Expression
• Sequence
• Interactions
• Mutations
• General Functions

Page 38

A Two-Stage Approach

Page 39

Text Summary of Gene Abl

Page 40

General Entity Summarizer

• Task: Given any entity and k aspects to summarize, generate a semi-structured summary

• Assumption: Training sentences available for each aspect

• Method:
  – Train a recognizer for each aspect
  – Given an entity, retrieve sentences relevant to the entity
  – Classify each sentence into one of the k aspects
  – Choose the best sentences in each category
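The four method steps can be sketched end to end. This is a toy stand-in, not the BeeSpace summarizer: the "recognizer" for each aspect is just a bag of word counts from the training sentences, retrieval is a whole-word match on a single-token entity name, and all function names are hypothetical.

```python
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def train_recognizers(training):
    """Step 1: one word-count model per aspect (training: aspect -> sentences)."""
    return {aspect: Counter(w for s in sents for w in tokens(s))
            for aspect, sents in training.items()}

def aspect_score(sentence, model):
    """Average relative frequency of the sentence's words under the aspect model."""
    words, total = tokens(sentence), sum(model.values())
    if not words or not total:
        return 0.0
    return sum(model[w] for w in words) / (total * len(words))

def summarize(entity, sentences, recognizers, per_aspect=1):
    # Step 2: retrieve sentences that mention the entity.
    relevant = [s for s in sentences if entity.lower() in tokens(s)]
    # Step 3: assign each sentence to its best-scoring aspect.
    buckets = {aspect: [] for aspect in recognizers}
    for s in relevant:
        best = max(recognizers, key=lambda a: aspect_score(s, recognizers[a]))
        buckets[best].append((aspect_score(s, recognizers[best]), s))
    # Step 4: keep the top-scoring sentences in each aspect.
    return {aspect: [s for _, s in sorted(scored, reverse=True)[:per_aspect]]
            for aspect, scored in buckets.items()}
```

The result is a semi-structured summary: a mapping from each aspect to its selected sentences, with an empty list for aspects that matched nothing.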

Page 41

Further Generalizations

• Task: Given any entity and k pre-specified aspects to summarize, generate a semi-structured summary

• Assumption: Training sentences available for each aspect

• Method:
  – Train a recognizer for each aspect
  – Given an entity, retrieve sentences relevant to the entity
  – Classify each sentence into one of the k aspects
  – Choose the best sentences in each category

New method based on mixture model and regularized optimization

Page 42

Part 4. Function Analysis

Page 43

Annotating Gene Lists: GO Terms vs. Literature Mining

Limitations of GO annotations:
- Labor-intensive
- Limited coverage

Literature Mining:
- Automatic
- Flexible exploration in the entire literature space

Page 44

Overview of Gene List Annotator

[Figure: pipeline with the following stages.]
• Gene group: Bcd, Cad, …, Tll
• For any gene: retrieve its relevant documents (Entrez Gene), giving one document set per gene (Bcd, Cad, Tll)
• For any term: test its significance
• Enriched concepts: Segmentation 56.0, Pattern 34.2, Cell_cycle 25.6, Development 22.1, Regulation 20.4, …
• Interactive analysis

Page 45

Intuition for Literature-based Annotation

Gene                TPI1  GPM1  PGK1  TDH3  TDH2
protein_kinase      0     0     2     0     0
decarboxylase       10    0     10    7     6
protein             39    26    65    44    33
stationary_phase    2     7     3     4     2
energy_metabolism   4     5     5     8     0
oscillation         0     0     0     0     1

Page 46

Likelihood Ratio Test with 2-Poisson Mixture Model

Dataset distribution: Poisson(λ;d)

Reference distribution: Poisson(λ0;d)
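A likelihood ratio test between a dataset rate λ and a reference rate λ0 can be sketched as follows. This is a simplification under stated assumptions: it uses a single Poisson per hypothesis rather than the 2-Poisson mixture named in the slide title, it reads d as document length so that a term's count in a document is Poisson with mean λ·d, and it estimates λ by maximum likelihood from the gene list's documents. The function name is hypothetical.

```python
import math

def poisson_llr(counts, lengths, lam0):
    """2 * [log L(lam_hat) - log L(lam0)] for counts[i] ~ Poisson(lam * lengths[i]).

    lam_hat = sum(counts) / sum(lengths) is the maximum likelihood rate.
    """
    total_c, total_d = sum(counts), sum(lengths)
    lam_hat = total_c / total_d
    if total_c == 0:
        # The term never occurs: the log term vanishes, only the rate term remains.
        return 2.0 * lam0 * total_d
    return 2.0 * (total_c * math.log(lam_hat / lam0) - (lam_hat - lam0) * total_d)
```

The statistic is 0 when the observed rate equals the reference rate and grows as the term becomes over- or under-represented; ranking terms by it, restricted to those with `lam_hat > lam0`, gives a list of enriched terms.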

Page 47

Agreement with GO-based Method

• Gene List: 93 genes up-regulated by the manganese treatment

GO Theme: Related Annotator terms
• neurogenesis: axon guidance, growth cone, commissural axon, proneural gene
• synaptic transmission: synaptic vesicle, neurotransmitter release, synaptic transmission, sodium channel
• cytoskeletal protein: alpha tubulin, actin filament
• cell communication: tight junction, heparan sulfate proteoglycan

Page 48

Discovering Novel Themes

• Gene List: 69 genes up-regulated by the methoprene treatment

Theme: Annotator terms
• muscle: flight muscle, muscle myosin, nonmuscle myosin, light chain, myosin ii, thick filament, thin filament, striated muscle
• synaptic transmission: neurotransmitter release, synaptic transmission, synaptic vesicle
• signaling pathway: notch signal

Page 49

Summary

[Figure: BeeSpace architecture diagram, with its components grouped into the four parts of the talk.]

Part 1. Information Extraction
Part 2. Navigation Support
Part 3. Entity Summarization
Part 4. Function Analysis

Machine Learning + Language Models + Minimum Human Effort

General and scalable, but there’s room for deeper semantics

Page 50

Looking Ahead…

• Knowledge integration, inferences

• Support for hypothesis formulation and testing

Page 51

Exploring Knowledge Space

[Figure: a graph of genes (A1, A1’, A2, A3, A4, A4’, A5) and behaviors (B1, B2, B3, B4) connected by edges labeled isa, Co-occur-fly, Co-occur-mos, Co-occur-bee, Orth-mos, orth and Reg.]

1. X = NeighborOf(B4, Behavior, {co-occur, isa}) → {B1, B2, B3}
2. Y = NeighborOf(X, Gene, {co-occur, orth}) → {A1, A1’, A2, A3}
3. Y = Y + {A5, A6} → {A1, A1’, A2, A3, A5, A6}
4. Z = NeighborOf(Y, Gene, {reg}) → {A4, A4’}

P = PathBetween(Z, B4, {co-occur, reg, isa})
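A NeighborOf operator of the kind used in the queries on this slide can be sketched over a small typed graph. The data layout is an assumption: `types` maps each node to "Gene" or "Behavior", `edges` is a list of (node, relation, node) triples, and edges are treated as undirected. The graph in the test is a made-up toy, since the slide's actual edges did not survive extraction.

```python
def neighbor_of(graph, nodes, node_type, relations):
    """Nodes of node_type linked to any node in `nodes` by one of `relations`."""
    nodes = set(nodes)
    result = set()
    for a, rel, b in graph["edges"]:
        if rel not in relations:
            continue
        # Edges are followed in both directions (assumption).
        for src, dst in ((a, b), (b, a)):
            if src in nodes and dst not in nodes and graph["types"].get(dst) == node_type:
                result.add(dst)
    return result
```

Because the result is a plain set, steps such as "Y = Y + {A5, A6}" are ordinary set unions, and the output of one call can be passed as the `nodes` argument of the next.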

Page 52

Full-Fledged BeeSpace V5

[Figure: diagram with the following elements.]
• Biomedical Literature
  – Entities: Gene, Behavior, Anatomy, Chemical
  – Relations: Orthology, Regulatory interaction, …
• Experiment Data, Analysis: additional entities and relations
• Expert knowledge
• Inferences
• Hypothesis Formulation & Testing

Page 53

Thanks to

Xin He (UIUC)
Jing Jiang (SMU)
Yanen Li (UIUC)
Xu Ling (UIUC)
Yue Lu (UIUC)
Qiaozhu Mei (UIUC/Michigan)
& Bruce Schatz (PI, BeeSpace)

Page 54

Thank You!