Page 1: Causal inference in biomolecular pathways using a Bayesian network approach and an Implicit method

ARTICLE IN PRESS

Journal of Theoretical Biology 253 (2008) 717–724

Contents lists available at ScienceDirect

Journal of Theoretical Biology

journal homepage: www.elsevier.com/locate/yjtbi

0022-5193/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jtbi.2008.04.030

* Corresponding author.
E-mail address: [email protected] (A. Rebai).

Causal inference in biomolecular pathways using a Bayesian network approach and an Implicit method

Hanen Ben Hassen a, Afif Masmoudi b, Ahmed Rebai a,*

a Unit of Bioinformatics and Biostatistics, Centre of Biotechnology of Sfax, Sfax 3038, Tunisia
b Laboratory of Probability and Statistics, Faculty of Sciences of Sfax, Sfax 3038, Tunisia

ARTICLE INFO

Article history:

Received 9 November 2007

Received in revised form 3 March 2008

Accepted 24 April 2008

Available online 4 May 2008

Keywords:

EGFR

Implicit statistics

Bayesian inference

Signaling pathways

Parameters learning


ABSTRACT

We introduce the concept of Implicit networks, which, like Bayesian networks, provide a graphical modelling framework that encodes the joint probability distribution for a set of random variables within a directed acyclic graph. We show that Implicit networks, when used in conjunction with appropriate statistical techniques, are very attractive for understanding and analyzing biological data. In particular, we consider the use of Implicit networks for causal inference in biomolecular pathways. In such pathways, an Implicit network encodes dependencies among variables (proteins, genes), can be trained to learn causal relationships (regulation, interaction) between them, and can then be used to predict the biological response given the status of some key proteins or genes in the network. We show that Implicit networks offer efficient methodologies for learning from observations without prior knowledge and thus provide a good alternative to classical inference in Bayesian networks when priors are missing. We illustrate our approach with an application to simulated data for a simplified signal transduction pathway of the epidermal growth factor receptor (EGFR) protein.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, the statistical modelling of molecular pathways within living cells has attracted increasing interest. In fact, these pathways (including gene regulation, metabolic and signal transduction networks) control many cellular processes, and their disruption or deregulation has been reported to be the primary cause of many complex diseases (particularly cancer). For example, signal transduction pathways allow a cell to sense its environment and to react accordingly. This is achieved through cascades of proteins that interact to convert a signal into a physiological response within coordinated protein networks. Molecular pathways can be formally represented as directed acyclic graphs (DAGs) in which nodes represent genes or proteins and arcs represent the regulation or interaction relationships between them.

In order to analyze these networks, and consequently to improve our understanding of diseases, many statistical approaches have been developed to model molecular pathways. Network approaches based on acyclic graphs originated in the genetic studies of Wright (1921), who developed a method called path analysis. Building on a similar formalism, Pearl (1988) introduced Bayesian networks (BNs), a knowledge representation at the confluence of artificial intelligence and statistics that offers a powerful framework to capture many types of relationships between variables (Heckerman et al., 1995; Heckerman and Breese, 1996; Heckerman, 1997). A BN provides a probabilistic model of the dependencies between a set of variables within a network structure relating them. Both the structure and the conditional probabilities can be inferred from a dataset. The potential of the BN approach to address complex genomic data, including data on molecular pathways, has recently been recognized (Friedman, 2004; Beer and Tavazoie, 2004; Woolf et al., 2005). The Bayesian method holds the promise of answering very interesting questions, since it seems to be one of the best available technologies for taking advantage of the massively parallel analysis of whole-genome data to discover how genes and proteins interact, control each other and align themselves in pathways of activation.

However, in spite of their remarkable power to address inferential processes, all BN learners are slow, both in theory and in practice (Chickering et al., 2004). In theory, a BN is only as useful as its prior knowledge is reliable, but this prior information is not always available. We therefore need a learner that can exploit prior information when it is available, but that can still learn effectively when it is not. In practice, many efforts have been hampered by insufficient data, difficulties in data preprocessing, and large computational demands and complexity. In fact, many BN-learning algorithms require additional information,


notably an ordering of the nodes to reduce the search space (Heckerman et al., 1995).

In this paper, we use the Implicit method recently proposed by Hassairi et al. (2005) to overcome some of the shortcomings associated with Bayesian inference. We then introduce a new framework for parameter estimation in BNs that we call the Implicit network (IN). An IN, like a BN, models causal relationships between variables, but unlike a BN it does not need any prior assumption about the conditional probabilities. Moreover, the theory of INs is simpler than that of BNs, which makes the approach easier to implement.

The outline of the paper is as follows. In Section 2, we briefly present the Implicit method, recall the principles of Implicit inference and give an application of the method to the case of a multinomial distribution. In Section 3, we define and describe the IN approach and show how an IN can be built without priors on the parameters and how the probabilities can be learned for a given network structure. In Section 4, we illustrate the IN approach with an application to simulated data for a simplified signal transduction pathway of the epidermal growth factor receptor (EGFR). Finally, in Section 5, we conclude with a discussion and perspectives for future work.

2. Inference with the Implicit method

2.1. General view of the Implicit method

Our objective is to estimate the phosphorylation state of each protein in the EGFR signal transduction cascade given the observed activity states of the other proteins in the cascade. If we represent this cascade by a network, which is a DAG, then each protein in this graph is a node and the apparent causal relationships between proteins are the directed arrows (the concept of network is further defined in Section 3). This structural representation of the phosphorylation cascade is well suited to applying a BN or an IN methodology. The Implicit method is similar to the Bayesian method, but it does not need any priors to be specified for the parameters. In fact, in the context of Bayesian theory (e.g. Robert, 1994), the unknown parameter $\theta$ in the statistical model is assumed to be a random variable with a known prior distribution. This prior information is used, together with the data, to derive the posterior distribution of $\theta$. The choice of a prior is generally based on preliminary knowledge of the problem. So, the basic idea of Bayesian theory is to consider any parameter $\theta$ as a random variable and to determine its posterior (conditional) distribution given the data and the assumed prior.

Alternatively, the concept of the Implicit distribution was previously proposed by Hassairi et al. (2005) and can be described as a kind of posterior distribution of a parameter given the data. To explain the principle of the Implicit distribution, let us consider a family of probability distributions $\{p(x \mid \theta);\ \theta \in \Theta\}$ parameterized by an unknown parameter $\theta$ in a set $\Theta$, where $x$ is the observed data.

The Implicit distribution $p(\theta \mid x)$ is calculated by multiplying the likelihood function $p(x \mid \theta)$ by a counting measure $\sigma$ if $\Theta$ is a countable set, or by a Lebesgue measure $\sigma$ if $\Theta$ is an open set ($\sigma$ depends only on the topological structure of $\Theta$), and then dividing by the norming constant $c(x) = \int_{\Theta} p(x \mid \theta)\, \sigma(d\theta)$. Therefore, the Implicit distribution is given by

$$p(\theta \mid x) = c(x)^{-1}\, p(x \mid \theta)\, \sigma(\theta)$$

and plays the role of a posterior distribution of $\theta$ given $x$ in the Bayesian method, corresponding to a particular improper prior that depends only on the topology of $\Theta$ (without any statistical assumption). Provided it exists (which holds for most statistical models), the Implicit distribution can be used for the estimation of the parameter $\theta$ following a Bayesian methodology. The Implicit estimator $\hat{\theta}$ of $\theta$ is simply the mean of the Implicit distribution. To avoid any misunderstanding, it is important to emphasize here that the Implicit approach is neither a non-informative Bayesian analysis nor a fiducial-like method, as has been argued by some commentators (Mukhopadhyay, 2006). Readers are referred to the paper of Hassairi et al. (2005) for a presentation of the theoretical foundations of Implicit inference and some selected applications.

2.2. Implicit method in the multinomial case

To illustrate how the Implicit method proceeds, let us consider multinomial sampling, where the observed discrete variable $X = (N_1, \ldots, N_r)$ takes $r$ possible states $x_1, \ldots, x_r$ with probabilities $\theta_1, \ldots, \theta_r$ ($\theta_i = P(N_i = 1)$). The random variable $X$ follows a multinomial distribution with parameters $N = \sum_{i=1}^{r} N_i$ and $\theta = (\theta_2, \ldots, \theta_r)$. Let $D = \{X^{(1)}, \ldots, X^{(N)}\}$ be a set of observations; $N_i$ is the number of occurrences of $x_i$ in $D$, and $\theta_1 = 1 - \sum_{i=2}^{r} \theta_i$.

The likelihood function of this multinomial distribution is given by

$$P(N_1 = n_1, \ldots, N_r = n_r \mid N, \theta) = N! \prod_{i=1}^{r} \frac{\theta_i^{n_i}}{n_i!}.$$

In the following, we show how the Implicit method provides estimates for $\theta$ and $N$.

2.2.1. Implicit estimator of $\theta$ with known $N$

In this subsection, we assume that $N$ is known, so that $\theta$ is the only parameter (vector of parameters) to be estimated. Estimating $\theta$ by the Implicit method consists in determining the Implicit distribution of $\theta$ given $X$ without any prior law for $\theta$. The inference problem is solved by determining a norming constant function $C(X)$ of $X$ such that $P(X \mid \theta)/C(X)$ becomes a probability distribution of $\theta$, that is,

$$\int \frac{P(X \mid \theta)}{C(X)}\, d\theta = 1, \quad \text{and so} \quad C(X) = \int P(X \mid \theta)\, d\theta.$$

In the multinomial case we get

$$C(X) = \int N! \prod_{i=1}^{r} \frac{\theta_i^{N_i}}{N_i!}\, d\theta = \frac{N!}{(N + r - 1)!}.$$

It follows that the Implicit distribution of $\theta$ given $X = (N_1, \ldots, N_r)$ is a Dirichlet distribution with parameters $N_1 + 1, \ldots, N_r + 1$, denoted $\mathrm{Dir}(N_1 + 1, \ldots, N_r + 1)$.

The probability of the next observation being in the state $x_k$ is thus given by

$$P(X^{(N+1)} = x_k \mid D) = \int \theta_k\, \mathrm{Dir}(N_1 + 1, \ldots, N_r + 1)(d\theta) = \frac{N_k + 1}{N + r} = \hat{\theta}_k, \quad k \in \{1, \ldots, r\}, \tag{2.1}$$

where $\hat{\theta}_k$ is the Implicit estimator of $\theta_k$.

It is worth noting that in the classical BN approach the prior distribution of $\theta$ is generally taken to be a Dirichlet distribution $\mathrm{Dir}(\alpha_1, \ldots, \alpha_r)$. This leads to a posterior distribution of $\theta$ that is $\mathrm{Dir}(\alpha_1 + N_1, \ldots, \alpha_r + N_r)$ and to the estimator (e.g. Heckerman, 1997)

$$\hat{\theta}_k = \frac{N_k + \alpha_k}{N + \alpha},$$

where $\alpha = \sum_{k=1}^{r} \alpha_k$. So the Implicit estimator corresponds, in the case where $N$ is known, to the Bayesian estimator with all $\alpha_i$ equal to 1 (uniform prior).

Fig. 1. Basic structure of an IN (the node X3 is the child of the nodes X1 and X2, which are named its parents).
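The known-$N$ estimator (2.1) and its uniform-prior Bayesian counterpart are simple enough to check numerically. The following Python sketch (not part of the original paper; the counts are invented for illustration) computes both and verifies that they coincide when all $\alpha_i = 1$:

```python
from fractions import Fraction

def implicit_estimate_known_N(counts):
    """Implicit estimator for a multinomial with known N (Eq. 2.1):
    theta_k = (N_k + 1) / (N + r)."""
    N, r = sum(counts), len(counts)
    return [Fraction(n_k + 1, N + r) for n_k in counts]

def bayes_dirichlet_estimate(counts, alphas):
    """Posterior-mean estimator under a Dirichlet(alpha_1..alpha_r) prior:
    theta_k = (N_k + alpha_k) / (N + sum(alpha))."""
    N, a = sum(counts), sum(alphas)
    return [Fraction(n_k + a_k, N + a) for n_k, a_k in zip(counts, alphas)]

counts = [30, 50, 20]                 # hypothetical counts N_1..N_3, so N = 100
theta_im = implicit_estimate_known_N(counts)
theta_ub = bayes_dirichlet_estimate(counts, [1, 1, 1])
assert theta_im == theta_ub           # Implicit = Bayesian with uniform prior
print([str(t) for t in theta_im])     # ['31/103', '51/103', '21/103']
```

Exact rational arithmetic (`Fraction`) makes the equality check robust; the estimates automatically sum to 1.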

2.2.2. Implicit estimator of both $N$ and $\theta$

Now suppose that both $N$ and $\theta = (\theta_2, \ldots, \theta_r)$ are unknown parameters to be estimated. In this case we proceed in two steps: first we estimate $N$, and then we use the estimator of $N$ to derive the estimator of $\theta$ by the Implicit method. This means that estimating $N$ is only a way to improve the estimation of $\theta$ and has no interest in itself; $N$ can thus be considered a correction factor.

To estimate $N$ we start by calculating a norming constant $C(X)$ in $X$ such that $P(X \mid N)/C(X)$ is a probability distribution of $N$, that is,

$$\sum_{N \geq 0} \frac{P(X \mid N)}{C(X)} = 1.$$

We find

$$C(X) = \sum_{N \geq 0} N! \prod_{i=1}^{r} \frac{\theta_i^{N_i}}{N_i!} = \bar{N}_1!\, (1 - \theta_1)^{-\bar{N}_1 - 1} \prod_{i=2}^{r} \frac{\theta_i^{N_i}}{N_i!},$$

where $N = \sum_{i=1}^{r} N_i$ and $\bar{N}_1 = N - N_1$. After some straightforward calculations, we get

$$P(N \mid X) = \frac{P(X \mid N)}{C(X)} = C_N^{\bar{N}_1}\, \theta_1^{N - \bar{N}_1} (1 - \theta_1)^{\bar{N}_1 + 1}.$$

So, the Implicit distribution of $N$ given $X = (N_1, \ldots, N_r)$ is a Pascal distribution with parameters $1 - \theta_1$ and $\bar{N}_1 + 1$. In order to estimate $N$ we need to assume that $\theta_1$ is known. In this case, the Implicit estimator $\hat{N}$ of $N$ is the mean of the Pascal distribution

$$\hat{N} = E(N \mid X) = \sum_{N \geq 0} N\, C_N^{\bar{N}_1}\, \theta_1^{N - \bar{N}_1} (1 - \theta_1)^{\bar{N}_1 + 1},$$

which yields

$$\hat{N} = \frac{\bar{N}_1 + 1}{1 - \theta_1} - 1 = \frac{\bar{N}_1 + \theta_1}{1 - \theta_1}. \tag{2.2}$$

In order to estimate $\theta$, we apply (2.1) and obtain

$$\hat{\theta}_k = P(X^{(N+1)} = x_k \mid X) = \frac{N_k + 1}{\hat{N} + r} \quad \text{for } 2 \leq k \leq r.$$

So, by substituting $\hat{N}$ we get

$$\hat{\theta}_k = \frac{(1 - \theta_1)(N_k + 1)}{\bar{N}_1 + \theta_1 + r(1 - \theta_1)}. \tag{2.3}$$

Since the sum of the $\theta_k$ for $k = 1$ to $r$ should be close to 1 in order to guarantee the convergence of the estimation procedure, we need to assume that $\sum_{k=2}^{r} \theta_k \simeq 1 - \theta_1$. From (2.3) we can see that this condition requires $\theta_1 \simeq 1/(r - 1)$ and $\theta_1 \leq 1/(r - 1)$ to be satisfied.

In practice, we can proceed as follows. We start by choosing the number of observations in the dataset, $N_{ob}$. We then calculate the initial estimate of $\theta$ as

$$\hat{\theta}_{k_0} = \max\left\{ \frac{N_k}{N_{ob}}\ ;\ \frac{N_k}{N_{ob}} \leq \frac{1}{r - 1} \text{ and } 1 \leq k \leq r \right\}.$$

By applying (2.2) we obtain the estimator of $N$ as

$$\hat{N} = \frac{\bar{N}_{k_0} + 1}{1 - \hat{\theta}_{k_0}} - 1 = N_{ob} + \frac{N_{k_0}}{\bar{N}_{k_0}}, \quad \text{where } \bar{N}_{k_0} = N_{ob} - N_{k_0}.$$

Finally, we can calculate the probability of a new observation being in the state $x_k$, which is estimated by

$$\hat{\theta}_k = P(X^{(N_{ob}+1)} = x_k \mid D) = \frac{N_k + 1}{\hat{N} + r}, \quad 1 \leq k \leq r \text{ and } k \neq k_0, \tag{2.4}$$

and $\hat{\theta}_{k_0} = 1 - \sum_{k \neq k_0} \hat{\theta}_k$.

This formula allows us to predict the probability of any node in the network, and particularly of the response node, based on a new observation of the status of the other nodes.
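The two-step procedure above can be sketched in Python as follows (a hypothetical illustration, not the authors' R program; it assumes at least two states of the variable are observed, so that $\bar{N}_{k_0} > 0$):

```python
def implicit_estimate(counts):
    """Two-step Implicit estimation for a multinomial variable when N is
    unknown (Eqs. 2.2-2.4): pick k0 as the state whose observed frequency is
    the largest one not exceeding 1/(r-1), estimate N, then estimate theta."""
    n_ob, r = sum(counts), len(counts)
    freqs = [c / n_ob for c in counts]
    # k0: state with the largest frequency satisfying f_k <= 1/(r-1)
    k0 = max((k for k in range(r) if freqs[k] <= 1 / (r - 1)),
             key=lambda k: freqs[k])
    n_bar_k0 = n_ob - counts[k0]                 # \bar N_{k0}, assumed > 0
    n_hat = n_ob + counts[k0] / n_bar_k0         # Eq. (2.2) with theta_{k0} = N_{k0}/N_ob
    theta = [(counts[k] + 1) / (n_hat + r) for k in range(r)]   # Eq. (2.4), k != k0
    theta[k0] = 1.0 - sum(theta[k] for k in range(r) if k != k0)
    return theta

theta = implicit_estimate([30, 50, 20])   # N_ob = 100, r = 3; here k0 is the middle state
```

With these made-up counts, $\hat{N} = 100 + 50/50 = 101$, so the non-$k_0$ estimates are $(N_k + 1)/104$ and $\hat{\theta}_{k_0}$ absorbs the remainder, keeping the estimates summing to 1.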

3. Implicit networks

3.1. Definition

Like a BN, an IN has two components: a DAG and a probability distribution. Nodes in the DAG represent stochastic variables, and arcs represent directed dependencies among variables that are quantified by conditional probability distributions. Formally, an IN can thus be defined as a set of variables $X = \{X_1, \ldots, X_n\}$ with:

(1) a network structure $S$ that encodes conditional independence assertions about the variables in $X$;

(2) a set $P$ of local probability distributions associated with each variable.

Together, these components define the joint probability distribution of $X$. Fig. 1 depicts an example of a simple IN structure with three nodes. In all forthcoming sections we use $X_i$ to denote both the variable and the corresponding node, and $Pa(X_i)$ to denote the parents of node $X_i$ in $S$ as well as the variables corresponding to those parents. The lack of possible arcs in $S$ encodes conditional independencies (Markov condition). In particular, given structure $S$, the joint probability distribution of $X$ is given by the product of all specified conditional probabilities:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)). \tag{3.1}$$

The local probability distributions are the distributions corresponding to the terms in the product of conditional distributions in (3.1). When building an IN without prior knowledge, the probabilities will depend only on the structure of the parameter set. In the following section, we demonstrate how to learn the probabilities in an IN from a dataset.
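The factorization (3.1) is straightforward to state in code. The sketch below uses a hypothetical three-node network shaped like Fig. 1 (the conditional probability values are invented for illustration):

```python
# Hypothetical three-node network shaped like Fig. 1: X1 and X2 are parentless,
# X3 is their child; all variables are binary (0/1).  CPT values are invented.
cpts = {
    "X1": {(): {0: 0.7, 1: 0.3}},                  # P(X1)
    "X2": {(): {0: 0.8, 1: 0.2}},                  # P(X2)
    "X3": {(0, 0): {0: 0.9, 1: 0.1},               # P(X3 | X1, X2)
           (0, 1): {0: 0.4, 1: 0.6},
           (1, 0): {0: 0.3, 1: 0.7},
           (1, 1): {0: 0.1, 1: 0.9}},
}
parents = {"X1": (), "X2": (), "X3": ("X1", "X2")}

def joint_probability(assignment):
    """Eq. (3.1): P(X1, ..., Xn) = product over i of P(Xi | Pa(Xi))."""
    p = 1.0
    for node, table in cpts.items():
        parent_state = tuple(assignment[pa] for pa in parents[node])
        p *= table[parent_state][assignment[node]]
    return p

print(joint_probability({"X1": 1, "X2": 0, "X3": 1}))  # 0.3 * 0.8 * 0.7
```

Summing `joint_probability` over all eight assignments returns 1, as the factorization requires.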

3.2. Learning probabilities in an IN

In recent years, learning graphical probabilistic models such as BNs and Markov networks has become a very active research area (Chrisman, 1996; Krause, 1996; Chickering et al., 2004). The result of learning an IN from a dataset is a DAG that can be used to make quantitative (probabilistic) predictions of outcomes and error estimates.

Let $n$ be the number of nodes in the DAG. Each node $i$ corresponds to a random variable $X_i$ having $r_i$ states:

$$\text{node } 1 \to X_1 \in \{x_1^1, \ldots, x_1^{r_1}\}, \quad \ldots, \quad \text{node } i \to X_i \in \{x_i^1, \ldots, x_i^{r_i}\}, \quad \ldots, \quad \text{node } n \to X_n \in \{x_n^1, \ldots, x_n^{r_n}\}.$$

Let $D = \{X^{(1)}, \ldots, X^{(N_{ob})}\}$ be a dataset and let $N_{ijk}$ be the number of observations in $D$ in which the node $i$ is in the state $k$ ($x_i^k$) and its parents are in the state $j$ ($x^j$). In other words, $N_{ijk}$ is the number of occurrences of the event ($X_i = x_i^k$ and $Pa(X_i) = x^j$) in the dataset $D$.

Fig. 2. Scheme of a simplified EGFR signaling pathway.

The distribution of $X_i$ is multinomial with parameters $N_{ij}$ and $\theta_{ij} = (\theta_{ij2}, \ldots, \theta_{ijr_i})$, where $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$ and $\theta_{ijk} = P(X_i = x_i^k \mid Pa(X_i) = x^j)$ for $k = 1, \ldots, r_i$, with $\sum_{k=1}^{r_i} \theta_{ijk} = 1$. So

$$P(X_i = (N_{ij1}, \ldots, N_{ijr_i}) \mid Pa(X_i) = x^j) = N_{ij}! \prod_{k=1}^{r_i} \frac{\theta_{ijk}^{N_{ijk}}}{N_{ijk}!}.$$

Here $N_{ij}$ and $\theta_{ij}$ are unknown parameters that will be estimated by the Implicit method. Given a network structure $S$, let us denote, for any node $i$, by $N_{ij\,ob}$ the observed number of times its parents are in state $j$, that is, the number of occurrences of $x^j$ in the dataset, and let

$$\hat{\theta}_{ijk_0} = \frac{N_{ijk_0}}{N_{ij\,ob}} = \max\left\{ \frac{N_{ijk}}{N_{ij\,ob}}\ ;\ \frac{N_{ijk}}{N_{ij\,ob}} \leq \frac{1}{r_i - 1} \text{ and } 1 \leq k \leq r_i \right\}.$$

The application of the Implicit method as described in Section 2 gives the following estimates of $N_{ij}$ and $\theta_{ijk}$:

$$\hat{N}_{ij} = N_{ij\,ob} + \frac{N_{ijk_0}}{\bar{N}_{ijk_0}}, \tag{3.2}$$


where $\bar{N}_{ijk_0} = N_{ij\,ob} - N_{ijk_0}$, and

$$\hat{\theta}_{ijk} = \frac{N_{ijk} + 1}{\hat{N}_{ij} + r_i} = \frac{N_{ijk} + 1}{N_{ij\,ob} + N_{ijk_0}/\bar{N}_{ijk_0} + r_i}, \quad 1 \leq k \leq r_i \text{ and } k \neq k_0, \tag{3.3}$$

and $\hat{\theta}_{ijk_0} = 1 - \sum_{k \neq k_0} \hat{\theta}_{ijk}$.

Let $f_{ijk} = N_{ijk}/N_{ij\,ob}$ be the observed frequency of node $i$ being in state $x_i^k$ given that its parents are in state $x^j$; then, from (3.3), we get

$$\hat{\theta}_{ijk} = \frac{f_{ijk} + 1/N_{ij\,ob}}{1 + f_{ijk_0}/\big((1 - f_{ijk_0})\, N_{ij\,ob}\big) + r_i/N_{ij\,ob}}. \tag{3.4}$$

Suppose that for a new observation $X^{(N_{ob}+1)}$ we have, for node $i$, $X_i = x_i^k$ and $Pa(X_i) = x^j$. Then, using (3.1) and (3.3), we can calculate the probability of this observation as

$$P(X^{(N_{ob}+1)} \mid D, S) = \prod_{i=1}^{n} \hat{\theta}_{ijk}.$$

The formula above allows us to make predictions of the states of unobserved nodes given observations on other nodes.
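Putting (3.2) and (3.3) together, learning the conditional probability table of one node from a dataset can be sketched as below (Python rather than the authors' R program; the node names and the small dataset are invented, and each parent configuration is assumed to show at least two distinct child states so that $\bar{N}_{ijk_0} > 0$):

```python
from collections import Counter

def learn_implicit_cpt(child_col, parent_cols, data, r_i):
    """Estimate theta_ijk = P(X_i = k | Pa(X_i) = j) by the Implicit method
    (Eqs. 3.2-3.3), separately for each observed parent configuration j."""
    groups = {}
    for row in data:
        j = tuple(row[c] for c in parent_cols)
        groups.setdefault(j, Counter())[row[child_col]] += 1
    cpt = {}
    for j, cnt in groups.items():
        n_ob = sum(cnt.values())
        counts = [cnt.get(k, 0) for k in range(r_i)]
        freqs = [c / n_ob for c in counts]
        # k0: largest observed frequency not exceeding 1/(r_i - 1)
        k0 = max((k for k in range(r_i) if freqs[k] <= 1 / (r_i - 1)),
                 key=lambda k: freqs[k])
        n_hat = n_ob + counts[k0] / (n_ob - counts[k0])               # Eq. (3.2)
        theta = [(counts[k] + 1) / (n_hat + r_i) for k in range(r_i)]  # Eq. (3.3)
        theta[k0] = 1.0 - sum(t for k, t in enumerate(theta) if k != k0)
        cpt[j] = theta
    return cpt

# Hypothetical dataset: each row records the (0/1) status of three nodes.
data = [{"EGF": 1, "EGFR": 1, "EGFR*": 1}] * 40 + \
       [{"EGF": 1, "EGFR": 1, "EGFR*": 0}] * 10 + \
       [{"EGF": 0, "EGFR": 1, "EGFR*": 0}] * 45 + \
       [{"EGF": 0, "EGFR": 1, "EGFR*": 1}] * 5
cpt = learn_implicit_cpt("EGFR*", ("EGF", "EGFR"), data, r_i=2)
print(cpt[(1, 1)])   # ≈ [11/56, 45/56], i.e. roughly [0.196, 0.804]
```

For the parent state (EGF=1, EGFR=1) there are 50 observations with counts (10, 40), so $k_0$ is the second state, $\hat{N}_{ij} = 50 + 40/10 = 54$, and the first estimate is $(10+1)/(54+2) = 11/56$.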

Table 1. Means of simulated conditional probabilities and standard errors (SE) by the Implicit (IM), uniform Bayesian (UB) and prior Bayesian (PB) methods.

Parameter | Mean: IM   UB       PB(a)    TV(b)  | SE (×100): IM    UB     PB(a)
θ111      | 0.29962  0.30032  0.29996  0.3     | 0.146  0.146  0.073
θ112      | 0.70038  0.69968  0.70004  0.7     | 0.146  0.146  0.073
θ211      | 0.19962  0.20042  0.19991  0.2     | 0.125  0.125  0.063
θ212      | 0.80038  0.79958  0.80009  0.8     | 0.125  0.125  0.063
θ311      | 0.29962  0.30032  0.29996  0.3     | 0.145  0.145  0.072
θ312      | 0.70038  0.69968  0.70004  0.7     | 0.145  0.145  0.072
θ412      | 0.19692  0.20985  0.20000  0.2     | 0.509  0.501  0.030
θ422      | 0.60256  0.59841  0.59998  0.6     | 0.412  0.408  0.051
θ432      | 0.09975  0.10347  0.10003  0.1     | 0.196  0.195  0.039
θ442      | 0.90030  0.89870  0.90005  0.9     | 0.127  0.127  0.045
θ411      | 0.80308  0.79016  0.80000  0.8     | 0.509  0.501  0.029
θ421      | 0.39744  0.40159  0.40002  0.4     | 0.412  0.408  0.051
θ431      | 0.90025  0.89653  0.89997  0.9     | 0.196  0.195  0.039
θ441      | 0.09970  0.10130  0.09995  0.1     | 0.126  0.126  0.045
θ512      | 0.09948  0.10732  0.10003  0.1     | 0.278  0.277  0.029
θ522      | 0.29882  0.30146  0.29999  0.3     | 0.284  0.283  0.060
θ532      | 0.09940  0.10415  0.09999  0.1     | 0.220  0.219  0.035
θ542      | 0.90014  0.89809  0.89997  0.9     | 0.146  0.146  0.045
θ511      | 0.90052  0.89268  0.89997  0.9     | 0.278  0.277  0.029
θ521      | 0.70118  0.69854  0.70001  0.7     | 0.284  0.283  0.059
θ531      | 0.90060  0.89585  0.90001  0.9     | 0.220  0.219  0.035
θ541      | 0.09986  0.10191  0.10003  0.1     | 0.146  0.146  0.045
θ611      | 0.90012  0.89832  0.89997  0.9     | 0.134  0.134  0.045
θ612      | 0.09988  0.10168  0.10003  0.1     | 0.134  0.134  0.045
θ621      | 0.90026  0.89848  0.90002  0.9     | 0.136  0.136  0.046
θ622      | 0.09974  0.10152  0.09998  0.1     | 0.136  0.136  0.046

MSE(c): IM 4.42×10⁻⁵, UB 3.73×10⁻⁴, PB(a) 9.71×10⁻⁹

(a) Estimates obtained using the true values as prior parameters.
(b) True values (TV) used to simulate the data.
(c) Mean squared error, calculated as Σ (θ̂ijk − θijk)².

4. Application to the EGFR signaling pathway

The EGFR protein is a member of the ErbB family of transmembrane tyrosine kinase receptors, which are central components of cellular signaling pathways and are involved in many cellular processes such as cell proliferation, metabolism, survival and apoptosis (Linggi and Carpenter, 2006; Normanno et al., 2006). Several studies have provided evidence that EGFR is involved in the pathogenesis and progression of different carcinoma types. The EGFR protein has three domains: an extracellular domain which binds ligands, a transmembrane domain, and an intracellular domain with tyrosine kinase activity. When a ligand binds to the extracellular domain, two EGFR molecules aggregate to form a dimer. Then the tyrosine kinase domain of one molecule phosphorylates the C-terminal tyrosine residues of the other molecule (see Aifa et al., 2006). This phosphorylation produces binding sites for proteins with SH2 domains, such as Grb2. Grb2 is an adapter protein that binds to the active EGFR, and the complex is a branch point that leads to several signaling pathways through binding to different potential targets. One of these pathways is the Ras/Mitogen-Activated Protein Kinase (MAPK) pathway that induces cell division (see Kholodenko et al., 1999).

In order to model the EGFR signaling pathway and its relationship to human pathologies, we consider in Fig. 2 a simplified structure of the network in which only the following nodes are used: ligand (EGF), receptor (EGFR), receptor-ligand dimer in the active state (EGFR*), adapter protein (Grb2), complex of EGFR and Grb2 (Grb2*), and cellular response through the Ras/MAPK pathway (Ras). The relationships between the variables are as follows:

- The protein expression level of EGF is either high or low (H/L).
- The protein expression level of EGFR is either high or low (H/L).
- The level of the (EGFR:EGF) dimer (denoted EGFR*) can be high or low (H/L), depending on the expression levels of both receptor and ligand.
- The protein expression level of Grb2 (adapter protein) is high or low (H/L).
- The level of the EGFR*/Grb2 protein complex is high or low (H/L), depending on the levels of both EGFR* and Grb2.
- Ras is activated and initiates a cascade of reactions that leads to the cellular response (Yes/No).

We simulated 10,000 datasets of 1000 experiments each. An experiment corresponds to a measure of the status of all proteins in the network, generated according to the prior probabilities in Appendix A and the network structure in Fig. 2. In order to compare our method to the uniform Bayesian approach, we calculated by the Implicit and the uniform Bayesian methods the tables of conditional probabilities (the probabilities θijk of the state of each variable given its parents). An example of a detailed calculation for the two methods is provided in Appendix B.

We used a program implemented in the R language for the simulations and the calculation of parameter estimates (available on request from the authors). Table 1 gives the mean values of θ̂ijk and the standard errors for the IN and BN methods for all nodes of the network. Globally, the concordance between the two approaches is very good. However, when we compare the Implicit method to the Bayesian method based on the true values as priors (the most favorable setting for Bayesian estimation), we see a better precision of the Implicit estimates compared to the Bayesian estimates with uniform priors (MSE IM = 4.42×10⁻⁵ versus MSE UB = 3.73×10⁻⁴). This fact is illustrated in Fig. 3, which shows the differences between the results of the three methods, and in particular that our method differs from, and outperforms, the uniform Bayesian method.


Fig. 3. Mean values of the parameters estimated using 10,000 simulated datasets (t stands for θ, and the parameters are given in the same order as in Table 1; the curves tIM, tUB and tPB correspond to the Implicit, uniform Bayesian and prior Bayesian estimates).


5. Discussion

In this paper, we have described the new concept of Implicit networks (INs) as an alternative approach to Bayesian networks (BNs). For the estimation of the parameters $\theta_{ijk}$ of an IN, we apply the Implicit approach. This method is similar to the Bayesian one, but operates in a natural setting without specifying any prior parameters. Indeed, the Bayesian posterior distribution is obtained by multiplying the likelihood function by a known prior distribution and then dividing by a norming constant; this prior information is thus used, together with the data, to derive the posterior distribution. The idea of the Implicit distribution arises from this Bayesian concept; however, the likelihood function is multiplied by a measure that depends only on the topological structure of the parameter set. From this point of view, we demonstrated that INs are simpler in theory, as they do not require the estimation of a set of prior parameters for the Dirichlet distribution. As is well known, the choice of prior information in Bayesian approaches has always been problematic and is considered by many to be the major weakness of such methods. Very often an expert system is needed to obtain the prior knowledge; this prior is not always available, and even when it is, it is expensive. The IN approach avoids the problem of priors in an elegant manner and leads to more tractable formulas that are easier to implement.

INs thus constitute an original and promising alternative in situations where the use of BNs is recommended but priors are missing or difficult to obtain. This means that INs might become a reference method for many applications in biology, particularly in the modelling of gene regulatory or signaling pathways. INs can also be used in all fields where BNs have been shown to be useful tools, such as gene expression analysis (Friedman, 2004), protein–protein interaction (Jansen et al., 2003) and genetic association studies of multifactorial diseases (Sebastiani et al., 2005).

As we have shown in the example in Section 4, the Implicit method can be efficiently applied to experimental data to infer probabilities in molecular pathways. From a practical point of view, the predictions given by an IN can provide a starting point for drug target selection or drug response prediction. For example, the IN approach presented in the example could be used to model the whole EGFR pathway and to predict the effect of various kinase inhibitors on the physiological response.

In this work, we showed how these probabilities can be learned, for a given network structure, from a dataset without any prior assumption on the probabilities of protein interactions. However, the Implicit approach can also be used to learn the network structure itself. Learning structure is a much harder problem than learning parameters (Chickering et al., 2004), and most approaches address it with a score function that measures the goodness of fit between a structure and the data, then search by appropriate algorithms for the network with the highest score. Many scoring functions have been proposed, based on different principles such as entropy (Herskovits and Cooper, 1990), Bayesian approaches (Buntine, 1991; Cooper and Herskovits, 1992) or the minimum description length (MDL; Lam and Bacchus, 1994). Within the framework of Implicit inference, we can propose an Implicit score function on which network structure inference can be based. This issue will be addressed in a future work.
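The score-based strategy just described can be sketched with a toy MDL-type score. The function below is a minimal illustration (it is not the Implicit score, which is deferred to future work): it scores a candidate structure as the maximum log-likelihood of the data minus a complexity penalty of (number of free parameters / 2) × log N.

```python
import math

def mdl_score(data, parents):
    """Toy MDL score for a candidate DAG over discrete variables:
    log-likelihood of the data minus (free parameters / 2) * log N."""
    n = len(data)
    score = 0.0
    n_params = 0
    for var, pa in parents.items():
        # count N_ijk (parent configuration j, child state k)
        counts, totals = {}, {}
        for row in data:
            j = tuple(row[p] for p in pa)
            counts[(j, row[var])] = counts.get((j, row[var]), 0) + 1
            totals[j] = totals.get(j, 0) + 1
        # maximum-likelihood term: sum over j,k of N_ijk * log(N_ijk / N_ij)
        for (j, _), c in counts.items():
            score += c * math.log(c / totals[j])
        # (r_i - 1) free parameters per observed parent configuration
        r = len({row[var] for row in data})
        n_params += (r - 1) * len(totals)
    return score - 0.5 * n_params * math.log(n)

# A structure encoding the dependence should beat the empty one
# on data where X2 simply copies X1.
data = [{'X1': s, 'X2': s} for s in (0, 0, 0, 0, 1, 1, 1, 1)]
dependent = {'X1': (), 'X2': ('X1',)}
independent = {'X1': (), 'X2': ()}
print(mdl_score(data, dependent) > mdl_score(data, independent))  # True
```

The penalty term is what prevents the search from always preferring denser graphs: every extra arc multiplies the number of parent configurations and hence the parameter count.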

Another interesting issue is the quantitative modelling of signaling pathways (Kholodenko et al., 1999). In this case the nodes are no longer discrete but continuous random variables. BNs have already been generalized to continuous variables or a mixture of discrete and continuous variables (Bøttcher and Dethlefsen, 2003). This is also possible with INs, since the Implicit inference framework accommodates any type of distribution very well (Hassairi et al., 2005).

H. Ben Hassen et al. / Journal of Theoretical Biology 253 (2008) 717–724

In this paper we considered the learning of INs from complete data. However, our method could be generalized to handle incomplete datasets using an iterative process that is similar to the expectation–maximization algorithm (Dempster et al., 1977). This generalization, together with its implementation, is available from the authors and will be the subject of another publication.

Acknowledgments

This work was supported by the Ministry of Higher Education, Research and Technology, Tunisia.

Appendix A. Tables of prior probabilities for the Bayesian network approach

Pr(EGF)

  EGF = H   EGF = L
  0.7       0.3

Pr(EGFR)

  EGFR = H  EGFR = L
  0.8       0.2

Pr(Grb2)

  Grb2 = H  Grb2 = L
  0.7       0.3

Pr(EGFR* | EGF, EGFR)

  EGF  EGFR  Pr(EGFR* = H)
  H    H     0.9
  H    L     0.6
  L    L     0.2
  L    H     0.1

Pr(Grb2* | Grb2, EGFR*)

  Grb2  EGFR*  Pr(Grb2* = H)
  H     H      0.9
  H     L      0.3
  L     L      0.1
  L     H      0.1

Pr(Ras | Grb2*)

  Ras  Grb2*  Pr
  Y    H      0.9
  N    H      0.1
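As a quick check of how these tables combine, the marginal probability that EGFR is activated can be obtained by summing the product Pr(EGF) · Pr(EGFR) · Pr(EGFR* = H | EGF, EGFR) over all parent states, assuming (as in the network of Fig. 2) that EGF and EGFR are root nodes. A few lines of Python (variable names are ours):

```python
# Prior and conditional tables from Appendix A.
p_egf = {'H': 0.7, 'L': 0.3}
p_egfr = {'H': 0.8, 'L': 0.2}
p_egfr_act = {('H', 'H'): 0.9, ('H', 'L'): 0.6,
              ('L', 'L'): 0.2, ('L', 'H'): 0.1}

# Marginalize over the parents EGF and EGFR.
p_active = sum(p_egf[e] * p_egfr[r] * p_egfr_act[(e, r)]
               for e in 'HL' for r in 'HL')
print(round(p_active, 3))  # 0.624
```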

Appendix B. Example of computation of probabilities using the Implicit and Bayesian approaches

We consider a dataset with 100 observations and the network in Fig. 2. We show below the estimation of the parameters for node 1, in the case where node 1 was observed 70 times in state High and 30 times in state Low.

Implicit networks

$i = 1$, $\mathrm{Pa}(X_1) = \emptyset \Rightarrow j = 1$, $k \in \{2 = \mathrm{High},\ 1 = \mathrm{Low}\}$,

$N_{112} = 70$, $N_{111} = 30$, $N_{11}^{ob} = 70 + 30 = 100$.

By the application of formulas (3.2) and (3.3), we get

$\hat{N}_{11} = 100 + \frac{70}{30}$,

$\hat{\theta}_{111} = \frac{30 + 1}{100 + \frac{70}{30} + 2} = 0.297$,

$\hat{\theta}_{112} = 1 - \hat{\theta}_{111} = 0.703$.

Bayesian networks

$i = 1$, $\mathrm{Pa}(X_1) = \emptyset \Rightarrow j = 1$, $k \in \{2 = \mathrm{High},\ 1 = \mathrm{Low}\}$,

$N_{112} = 70$, $N_{111} = 30$, $N_{11} = 70 + 30 = 100$.

By the application of the formula $\hat{\theta}_{ijk} = (N_{ijk} + 1)/(N_{ij} + r_i)$ (Heckerman, 1997): $\hat{\theta}_{111} = (30 + 1)/(100 + 2) = 0.304$, $\hat{\theta}_{112} = (70 + 1)/(100 + 2) = 0.696$.
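The computations in this appendix can be reproduced in a few lines of Python; the Implicit estimate uses the estimated count N̂₁₁ = 100 + 70/30 from formulas (3.2) and (3.3), the Bayesian one a uniform Dirichlet prior:

```python
# Node 1 of the network in Fig. 2: 70 observations High, 30 Low.
n_high, n_low = 70, 30
n_obs = n_high + n_low

# Implicit estimate: the estimated count exceeds the observed
# count by the ratio 70/30 (formulas (3.2)-(3.3) of the paper).
n_hat = n_obs + n_high / n_low
theta_low = (n_low + 1) / (n_hat + 2)
theta_high = 1 - theta_low
print(round(theta_low, 3), round(theta_high, 3))   # 0.297 0.703

# Bayesian estimate with a uniform Dirichlet prior (Heckerman, 1997):
# theta_ijk = (N_ijk + 1) / (N_ij + r_i), with r_i = 2 states here.
b_low = (n_low + 1) / (n_obs + 2)
b_high = (n_high + 1) / (n_obs + 2)
print(round(b_low, 3), round(b_high, 3))           # 0.304 0.696
```

The two approaches give close but not identical estimates; the Implicit denominator is slightly inflated by the ratio of the two counts, which is what removes the need for a prior.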

References

Aifa, S., Miled, N., Frikha, F., Aniba, M.R., Svensson, S.P., Rebai, A., 2006. Electrostatic interactions of peptides flanking the tyrosine kinase domain in the epidermal growth factor receptor provides a model for intracellular dimerization and autophosphorylation. Proteins 62, 1036–1043.

Beer, M.A., Tavazoie, S., 2004. Predicting gene expression from sequence. Cell 117, 185–198.

Bøttcher, S.G., Dethlefsen, C., 2003. Deal: a package for learning Bayesian networks. J. Stat. Software 8, 1–40.

Buntine, W., 1991. Theory refinement on Bayesian networks. In: Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, pp. 52–60.

Chickering, D., Heckerman, D., Meek, C., 2004. Large-sample learning of Bayesian networks is NP-hard. J. Mach. Learn. Res. 5, 1287–1330.

Chrisman, L., 1996. A road map to research on Bayesian networks and other decomposable probabilistic models. Technical Report, School of Computer Science, CMU, Pittsburgh, PA.

Cooper, G.F., Herskovits, E., 1992. A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 9, 309–347.

Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. B 39, 1–38.

Friedman, N., 2004. Inferring cellular networks using probabilistic graphical models. Science 303, 799–805.

Hassairi, A., Masmoudi, A., Kokonendji, C., 2005. Implicit distributions and estimation. Commun. Stat. A—Theory Methods 34 (2), 245–252.

Heckerman, D., 1997. Bayesian networks for data mining. Data Min. Knowl. Discovery 1, 79–119.

Heckerman, D., Breese, J., 1996. Causal independence for probability assessment and inference using Bayesian networks. IEEE Trans. Syst. Man Cybern. 26, 826–831.

Heckerman, D., Geiger, D., Chickering, D.M., 1995. Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20 (3), 197–243.

Herskovits, E., Cooper, G.F., 1990. An entropy-driven system for the construction of probabilistic expert systems from databases. In: Bonissone, P. (Ed.), Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, Cambridge, pp. 54–62.

Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N.J., Chung, S., Emili, A., Snyder, M., Greenblatt, J.F., Gerstein, M., 2003. A Bayesian networks approach for predicting protein–protein interactions from genomic data. Science 302, 449–453.

Kholodenko, B.N., Demin, O.V., Moehren, G., Hoek, J.B., 1999. Quantification of short term signaling by the epidermal growth factor receptor. J. Biol. Chem. 274, 30169–30181.

Krause, P., 1996. Learning probabilistic networks. Technical Report, Philips Research Laboratories, UK.

Lam, W., Bacchus, F., 1994. Learning Bayesian belief networks. An approach based on the MDL principle. Comput. Intell. 10, 269–293.

Linggi, B., Carpenter, G., 2006. ErbB receptors: new insights on mechanisms and biology. Trends Cell Biol. 16, 649–656.

Mukhopadhyay, N., 2006. Some comments on Hassairi et al.'s Implicit distributions and estimation. Commun. Stat. A—Theory Methods 35, 293–297.


Normanno, N., De Luca, A., Bianco, C., Strizzi, L., Mancino, M., Maiello, M.R., Carotenuto, A., De Feo, G., Caponigro, F., Salomon, D.S., 2006. Epidermal growth factor receptor (EGFR) signaling in cancer. Gene 366, 2–16.

Pearl, J., 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA.

Robert, C.P., 1994. The Bayesian Choice: A Decision-Theoretic Motivation. Springer, New York.

Sebastiani, P., Ramoni, M.F., Nolan, V., Baldwin, C.T., Steinberg, M.H., 2005. Genetic dissection and prognostic modeling of overt stroke in sickle cell anemia. Nat. Genet. 37, 435–440.

Woolf, P.J., Prudhomme, W., Daheron, L., Daley, G.Q., Lauffenburger, D.A., 2005. Bayesian analysis of signaling networks governing embryonic stem cell fate decisions. Bioinformatics 21, 741–753.

Wright, S., 1921. Correlation and causation. J. Agric. Res. 20, 557–585.