
Computers in Biology and Medicine 37 (2007) 134–148
www.intl.elsevierhealth.com/journals/cobm

Modelling, property verification and behavioural equivalence of lactose operon regulation☆

Marcelo Cezar Pinto a,∗, Luciana Foss a, José Carlos Merino Mombach b, Leila Ribeiro a

a Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brasil
b Laboratório de Bioinformática e Biologia Computacional, Universidade do Vale do Rio dos Sinos, São Leopoldo, RS, Brasil

Abstract

Understanding biochemical pathways is one of the biggest challenges in molecular biology today. Computer science can contribute to this area by providing formalisms and tools to simulate and analyse pathways. One formalism that is suited for modelling concurrent systems is Milner's Calculus of Communicating Systems (CCS). This paper shows the viability of using CCS to model and reason about biochemical networks. As a case study, we describe the regulation of the lactose operon. After describing this operon formally using CCS, we validate our model by automatically checking some known properties of lactose regulation. Moreover, since biological systems tend to be very complex, we propose to use multiple descriptions of the same system at different levels of abstraction. The compatibility of these multiple views can be assured via mathematical proofs of observational equivalence.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Systems biology; Model checking; Observational equivalence; CCS

1. Introduction

Systems biology is the study of the mechanisms underlying complex biological processes as integrated systems of many, diverse, interacting components. Systems biology involves (a) collection of large sets of experimental data (by high-throughput technologies and/or by mining the literature of reductionist molecular biology and biochemistry), (b) proposal of mathematical models that might account for at least some significant aspects of this data set, (c) accurate computer solution of the mathematical equations to obtain numerical predictions, and (d) assessment of the quality of the model by comparing numerical simulations with the experimental data [1].

Biochemical pathways are one of the most studied topics in systems biology. The behaviour of cells is governed and coordinated by biochemical networks that translate external cues (hormones, growth factors, substances) into adequate biological responses such as cell proliferation, specialization and metabolic control. Metabolic and regulatory pathways are two examples of biochemical networks.

☆ Project partially supported by CNPq (Grant 550042).
∗ Corresponding author.
E-mail addresses: [email protected] (M.C. Pinto), [email protected] (L. Foss), [email protected] (J.C.M. Mombach), [email protected] (L. Ribeiro).
0010-4825/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compbiomed.2006.01.006

Understanding biochemical pathways is central to finding out how life evolves. However, laboratory experiments are typically very time consuming and expensive. An alternative approach is to simulate these systems on computers and to perform laboratory experiments only when the simulations suggest that some expected behaviour might occur. Simulating these networks can answer, for example, whether the concentration of some component inside the cell increases or decreases when the cell is placed in different environments. To simulate and discover properties of these networks in silico, formal models are needed [2]. The most widespread models for simulating biochemical pathways are based on differential equations and their variants [3]. However, these equation-based models make it very difficult to check properties and to relate different networks. Therefore, for many applications, hybrid models such as XS-Systems [4] and symbolic models such as Petri nets [5] and graphs [6] are preferred.

Recent work by Regev et al. suggests that process algebras, like the Calculus of Communicating Systems (CCS) [7] and the π-calculus [8,9], may become valuable tools in modelling and simulation of biological systems where interaction and mobility are important features [10]. The field of process algebras may have an important impact on understanding how biological systems work, providing at the same time a way to describe, manipulate, and analyse them. Ciobanu et al. [11] developed a π-calculus model of the Albers-Post mechanism for ion (Na+ or K+) transport across membranes, and Yildirim and Mackey proposed non-linear differential delay equations to model regulation in the lactose operon and compared them with experimental data [12]. A recent work of Chabrier-Rivier et al. [13] proposed a formal counterpart of Kohn's compilation on mammalian cell-cycle control and the use of Computation Tree Logic (CTL) as a query language for biomolecular networks.

We have used CCS to model biological systems in previous work [14], showing that the regulation of the lactose operon could be faithfully modelled in CCS. Bacteria have a simple general mechanism for coordinating the regulation of genes encoding products that participate in a set of related processes: these genes are clustered on the chromosome and are transcribed together. Many prokaryotic mRNAs are polycistronic (multiple genes on a single transcript), and the single promoter that initiates transcription of the cluster is the site of regulation for expression of all the genes in the cluster. The gene cluster and promoter, plus additional sequences that function together in regulation, are called an operon [15]. Many of the principles of prokaryotic gene expression were first defined by studies of lactose metabolism in Escherichia coli, which can use lactose as its sole carbon source. In 1961, Jacob and Monod published a paper that described how two adjacent genes involved in lactose metabolism were coordinately regulated by a genetic element located at one end of the gene cluster [16]. The genes were those for β-galactosidase, which cleaves lactose to galactose and glucose, and galactoside permease, which transports lactose into the cell. The terms “operon” and “operator” were first introduced in this paper. With the operon model, gene regulation could, for the first time, be considered in molecular terms [15].

In this paper, we revisit our model of lactose operon regulation. In our previous work, we found that even a relatively small biochemical pathway could lead to an extremely large formal model, making it unfeasible to analyse as a whole. The idea here is to first build a very abstract description of the system under analysis and validate it via model checking. Then each component can be refined into a more complex entity. In fact, it is very common in biology to use multiple descriptions of the same system at different levels of abstraction. However, the consistency among these multiple views is typically shown in an ad hoc way. If we can mathematically prove that the refined version is consistent with (or “equivalent” to) the original abstract description of this component, the properties checked for the more abstract system are still valid in the concrete one. To prove this consistency, we use a well-known notion of equivalence from the area of process algebras, namely observational equivalence: two components are equivalent if their interactions with the environment are the same. Thus, we advocate the use of process algebras not only to model and analyse biological pathways, but also to structure the description of a system into different but consistent layers. We point out that our approach is used to make qualitative inferences about a biological system and complements quantitative modelling approaches, such as [3,17–19].

This paper is structured as follows: after a short introduction to the mechanism of regulation of the lactose operon (Section 2) and to CCS (Section 3), we present the modelling of lactose operon regulation at two different abstraction levels (Section 4). Then we show that the more abstract model has the expected properties and that the concrete model is equivalent to the more abstract one (Section 5). Finally, in Section 6 we relate our work to previous work, and in Section 7 we draw some conclusions and outline future research directions.

2. Regulation of lactose operon

The lactose operon contains three genes related to lactose metabolism. The lac Z, Y and A genes encode β-galactosidase, galactoside permease and thiogalactoside transacetylase, respectively. β-Galactosidase converts lactose to galactose and glucose or, by transglycosylation, to allolactose. Galactoside permease transports lactose into the cell, and thiogalactoside transacetylase appears to modify toxic galactosides to facilitate their removal from the cell.

In the absence of lactose, the lac operon genes are repressed; in fact, they are still transcribed at a basal level. This negative regulation is carried out by a molecule called the Lac repressor, which binds to some sites near the start of the operon, blocking the activity of RNA polymerase. These sites are called operators. The operator to which the repressor binds most tightly is named O1. The lac operon has two secondary binding sites for the Lac repressor: O2 and O3. To repress the operon, the Lac repressor binds to both the main operator and one of the two secondary sites.

When cells are provided with lactose, the lac operon is induced. An inducer (signal) molecule binds to a specific site on the Lac repressor, causing a conformational change that results in dissociation of the repressor from the operators. The inducer in the lac operon system is allolactose, an isomer of lactose. When unrepressed, transcription of the lac genes increases, but not to its highest level.

Other factors besides lactose affect the expression of the lac genes, such as the availability of glucose, the preferred energy source of bacteria. Other sugars can serve as the main or sole nutrient, but extra steps are required to prepare them for entry into glycolysis, necessitating the synthesis of additional enzymes. Clearly, expressing the genes for proteins that metabolize sugars such as lactose is wasteful when glucose is abundant.

The lac operon deals with this through positive regulation. A regulation mechanism known as catabolite repression restricts expression of the genes required for catabolism of lactose in the presence of glucose, even when this secondary sugar is also present. The effect of glucose is mediated by cAMP, as a coactivator, and an activator protein known as cAMP receptor protein, or CRP (sometimes called CAP, for catabolite gene activator protein). CRP has binding sites for DNA and cAMP. When glucose is absent, CRP–cAMP binds to a site near the lac promoter and stimulates RNA transcription. CRP–cAMP is therefore a positive regulatory element responsive to glucose levels, whereas the Lac repressor is a negative regulatory element responsive to lactose. The two act in concert. CRP–cAMP has little effect on the lac operon when the Lac repressor is blocking transcription, and dissociation of the repressor from the lac operator has little effect on transcription of the lac operon unless CRP–cAMP is present to facilitate transcription; when CRP is not bound, the wild-type lac promoter is a relatively weak promoter.

Fig. 1. Known properties of lactose operon regulation.

The effect of glucose on CRP is mediated by the cAMP interaction. CRP binds to DNA most avidly when cAMP concentrations are high. In the presence of glucose, the synthesis of cAMP is inhibited and efflux of cAMP from the cell is stimulated. As cAMP declines, CRP binding to DNA declines, thereby decreasing the expression of the lac operon. Strong induction of the lac operon therefore requires both lactose (to inactivate the Lac repressor) and a lowered concentration of glucose (to trigger an increase in cAMP and increase binding of cAMP to CRP) [20].
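The two-signal logic described above can be summarised as a qualitative truth table. The Python sketch below is our own Boolean abstraction and not part of the CCS model: the function name `lac_expression` and the three discrete expression levels are illustrative assumptions, and concentrations and kinetics are ignored entirely.

```python
from enum import Enum


class Expression(Enum):
    BASAL = "basal"    # Lac repressor bound: only basal transcription
    WEAK = "weak"      # repressor released, but little CRP-cAMP: weak promoter
    STRONG = "strong"  # repressor released and CRP-cAMP bound: strong induction


def lac_expression(lactose: bool, glucose: bool) -> Expression:
    """Hypothetical qualitative abstraction of lac operon regulation."""
    repressor_bound = not lactose  # allolactose (derived from lactose) releases the repressor
    crp_camp_bound = not glucose   # low glucose -> high cAMP -> CRP-cAMP binds near the promoter
    if repressor_bound:
        # CRP-cAMP has little effect while the repressor blocks transcription
        return Expression.BASAL
    return Expression.STRONG if crp_camp_bound else Expression.WEAK


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            level = lac_expression(lactose, glucose).value
            print(f"lactose={lactose}, glucose={glucose}: {level}")
```

Only the combination "lactose present, glucose absent" yields strong induction, which matches the statement that strong induction requires both signals.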

Now we can specify some known properties of lactose operon regulation to be verified in our formal model of the system. These properties are listed in Fig. 1.


3. Calculus of communicating systems

CCS [7] is a mathematical formalism designed to describe and analyse the behaviour of interactive components executing in parallel. CCS is a process algebra, where all components of the system can be viewed as processes that can interact via message passing. This interaction is modelled as synchronized communication. A CCS process can be viewed as a black box, which may have a name and has a process interface, consisting of the channel names that this process can use to interact with other processes in its environment. Each of the channels may be either an input or an output channel. The behaviour of a process is given by the actions it can perform. These actions can occur sequentially or in parallel, and there may be non-deterministic choices of which actions shall occur. For example, the interface for a process named OpS (for operator site) may be given by the bind (input) and unbind (output) channels. This process may interact with its environment (that is, other processes) via these channels. Processes that want to interact with OpS must have at least one of the complementary channel names bind (output) or unbind (input).

3.1. Syntax and semantics of CCS

In this subsection, we present the subset of the CCS language that we use in this paper. We assume an infinite set $\mathcal{C}$ of channel names (input channels) and the set $\overline{\mathcal{C}} = \{\overline{c} \mid c \in \mathcal{C}\}$ of complementary channel names (output channels). We let $\mathcal{L} = \mathcal{C} \cup \overline{\mathcal{C}}$ be the set of labels and $Act = \mathcal{L} \cup \{\tau\}$ be the set of actions ($\tau$ denotes a non-observable action). We also assume an infinite set $\mathcal{P}$ of process names.

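Definition 3.1 (Syntax). The collection $\mathcal{E}$ of CCS expressions is given by the following grammar:

$$P, Q ::= K \;\mid\; 0 \;\mid\; \alpha.P \;\mid\; \sum_{i \in I} P_i \;\mid\; P \,|\, Q \;\mid\; P \setminus L$$

where

• $K \in \mathcal{P}$, and $K \stackrel{def}{=} P$ defines the behaviour of process $K$;
• $\alpha \in Act$;
• $I$ is a finite index set;
• $L \subseteq \mathcal{L}$.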

The most basic process is the process 0, which performs no action whatsoever. Another basic construction in CCS is action prefixing $\alpha.P$, which describes the execution of an action $\alpha$ followed by the execution of process $P$. For example, the process $bind.\overline{unbind}.0$ performs an input action on the bind channel, then performs an output action on the unbind channel, and stops. We can introduce a name for a process and use this name in the definition of other processes: $OpS \stackrel{def}{=} bind.\overline{unbind}.OpS$, where the process OpS performs the bind and unbind actions and then behaves like OpS again.

A process may choose the action to be performed among several actions (non-deterministic choice); this is described by the + operator. We can use it to describe the operator sites for lactose, where the Lac repressor can bind to operator site 2 or 3 after binding to operator site 1:

$$LacOpS \stackrel{def}{=} bindO1.(bindO2.unbind.LacOpS + bindO3.unbind.LacOpS),$$

where, after performing the bindO1 action, the process can perform either the bindO2 or the bindO3 action, both followed by unbind actions.

We can describe the behaviour of a Lac repressor protein that is free in a cell, then binds to the operator sites of the lactose operon and after some time is released, as

$$LacR \stackrel{def}{=} free.bindO1.(bindO2.bO12.unbind.LacR + bindO3.bO13.unbind.LacR).$$
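To make the grammar of Definition 3.1 and the example definitions concrete, here is a minimal sketch of the CCS subset as an abstract syntax tree in Python. The encoding is our own assumption: an output action $\overline{a}$ is written as the string "~a", τ as "tau", and a restriction set is given by channel names. OpS follows the text (bind is an input, unbind an output); for LacOpS and LacR the input/output directions are assumed by analogy with OpS, and free, bO12 and bO13 are kept as plain names. The helpers `seq` and `show` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet, List, Tuple, Union

TAU = "tau"  # the silent action; an output action on channel a is encoded as "~a"


@dataclass(frozen=True)
class Nil:        # 0
    pass


@dataclass(frozen=True)
class Const:      # K, a process name with a definition K =def P
    name: str


@dataclass(frozen=True)
class Prefix:     # alpha.P
    action: str
    cont: Proc


@dataclass(frozen=True)
class Sum:        # sum over i in I of P_i
    branches: Tuple[Proc, ...]


@dataclass(frozen=True)
class Par:        # P | Q
    left: Proc
    right: Proc


@dataclass(frozen=True)
class Restrict:   # P \ L
    proc: Proc
    channels: FrozenSet[str]


Proc = Union[Nil, Const, Prefix, Sum, Par, Restrict]


def seq(actions: List[str], end: Proc) -> Proc:
    """Build the prefix chain a1.a2. ... .an.end."""
    for a in reversed(actions):
        end = Prefix(a, end)
    return end


def show(p: Proc) -> str:
    """Print a term in the concrete syntax used in the text."""
    if isinstance(p, Nil):
        return "0"
    if isinstance(p, Const):
        return p.name
    if isinstance(p, Prefix):
        return f"{p.action}.{show(p.cont)}"
    if isinstance(p, Sum):
        return "(" + " + ".join(show(b) for b in p.branches) + ")"
    if isinstance(p, Par):
        return f"({show(p.left)} | {show(p.right)})"
    if isinstance(p, Restrict):
        return show(p.proc) + "\\{" + ", ".join(sorted(p.channels)) + "}"
    raise TypeError(f"not a CCS term: {p!r}")


OPS = seq(["bind", "~unbind"], Const("OpS"))
LAC_OPS = seq(["bindO1"], Sum((seq(["bindO2", "~unbind"], Const("LacOpS")),
                               seq(["bindO3", "~unbind"], Const("LacOpS")))))
LAC_R = seq(["free", "~bindO1"], Sum((seq(["~bindO2", "bO12", "unbind"], Const("LacR")),
                                      seq(["~bindO3", "bO13", "unbind"], Const("LacR")))))

if __name__ == "__main__":
    for name, body in (("OpS", OPS), ("LacOpS", LAC_OPS), ("LacR", LAC_R)):
        print(name, "=def", show(body))
```

Frozen dataclasses are hashable and compare structurally, which is convenient if such terms are later used as states of an LTS.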

The Lac repressor protein (LacR) can only bind to operator sites if it interacts with the lactose operator sites (LacOpS). In order to describe systems consisting of two or more processes executing at the same time (parallel behaviour), and possibly interacting (synchronizing) with each other, CCS offers the parallel composition operator |. The expression LacOpS|LacR describes the system where the LacOpS and LacR processes are running in parallel. They can interact through their complementary channels named bindO1, bindO2, bindO3 and unbind. The LacR and LacOpS processes have the possibility to communicate in the parallel composition LacOpS|LacR, but they are not required to communicate with each other: both processes could use their complementary channels to communicate with other processes in their environment. We can prevent communication with these other processes through the bindO1, bindO2, bindO3 and unbind channels using the restriction operator \, whose aim is to limit the scope of channel names. For instance, by defining (LacOpS|LacR)\{bindO1, bindO2, bindO3, unbind}, we hide the bindO1, bindO2, bindO3 and unbind channels from the environment of the LacOpS|LacR process.

During the execution of a CCS process, each time an action is performed, the state changes; this is called a transition. A transition is described by an input state, an output state and a label α. The input state (output state) is the process state before (after) performing an action. The label α describes the action that was performed. A computation of a process is a sequence of transitions. The behaviour of a CCS process is given by the set of computations that this process can carry out. These computations can be formally described by a labelled transition system (LTS) [21].

Definition 3.2 (Labelled transition system). An LTS is a triple $(S, L, \to)$, where

• $S$ is a set of states;
• $L$ is a set of labels;
• ${\to} \subseteq S \times L \times S$ is a transition relation. We write $s \xrightarrow{a} s'$ for $(s, a, s') \in {\to}$ and $\to^*$ for the reflexive and transitive closure of $\to$.

An LTS describes the evolution of a system during its execution. All states of this system are in $S$, and the relation $\to$ defines how the system can change from one state to another, performing an $L$-labelled transition.

Fig. 2. LTSs for Sys (a) and Spec (b) processes.
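As a small illustration of Definition 3.2, the sketch below writes down the LTS of the OpS process introduced above ($OpS \stackrel{def}{=} bind.\overline{unbind}.OpS$) as an explicit Python triple and computes $\to^*$ by graph search. The state names, the "~unbind" encoding of the output action and the function name are our own choices.

```python
from typing import Dict, Set, Tuple

State = str
Transition = Tuple[State, str, State]

# LTS (S, L, ->) for OpS =def bind.~unbind.OpS ("~unbind" stands for the output action on unbind)
S: Set[State] = {"OpS", "~unbind.OpS"}
L: Set[str] = {"bind", "~unbind"}
ARROW: Set[Transition] = {
    ("OpS", "bind", "~unbind.OpS"),
    ("~unbind.OpS", "~unbind", "OpS"),
}


def reflexive_transitive_closure(states: Set[State], arrow: Set[Transition]) -> Set[Tuple[State, State]]:
    """Pairs (s, s') such that s' is reachable from s in zero or more transitions (labels ignored)."""
    succ: Dict[State, Set[State]] = {s: set() for s in states}
    for src, _label, dst in arrow:
        succ[src].add(dst)
    closure: Set[Tuple[State, State]] = set()
    for start in states:
        seen = {start}  # zero steps: reflexivity
        stack = [start]
        while stack:    # depth-first search: transitivity
            for nxt in succ[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        closure |= {(start, t) for t in seen}
    return closure


if __name__ == "__main__":
    for pair in sorted(reflexive_transitive_closure(S, ARROW)):
        print(pair)
```

Because OpS is cyclic, every state reaches every other one, so here $\to^*$ is the full relation on $S$.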

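Definition 3.3 (Semantics). The operational semantics of CCS is given by the LTS $(Proc, Act, \to)$, where $Proc$ is a set of CCS expressions, $Act$ is a set of actions and $\to$ is defined as follows:

$$
\begin{array}{ll}
\text{(Pref)} & \alpha.P \xrightarrow{\alpha} P \\[2ex]
\text{(Def)} & \dfrac{P \xrightarrow{\alpha} P'}{K \xrightarrow{\alpha} P'} \quad K \stackrel{def}{=} P \\[3ex]
\text{(Choice)} & \dfrac{P_j \xrightarrow{\alpha} P'_j}{\sum_{i \in I} P_i \xrightarrow{\alpha} P'_j} \quad j \in I \\[3ex]
\text{(Restr)} & \dfrac{P \xrightarrow{\alpha} P'}{P \setminus L \xrightarrow{\alpha} P' \setminus L} \quad \alpha, \overline{\alpha} \notin L \\[3ex]
\text{(Par1)} & \dfrac{P \xrightarrow{\alpha} P'}{P \,|\, Q \xrightarrow{\alpha} P' \,|\, Q} \\[3ex]
\text{(Par2)} & \dfrac{Q \xrightarrow{\alpha} Q'}{P \,|\, Q \xrightarrow{\alpha} P \,|\, Q'} \\[3ex]
\text{(Sync)} & \dfrac{P \xrightarrow{a} P' \qquad Q \xrightarrow{\overline{a}} Q'}{P \,|\, Q \xrightarrow{\tau} P' \,|\, Q'}
\end{array}
$$

where $\alpha \in Act$ and $a \in \mathcal{L}$.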

The rules above define how each kind of CCS process can evolve, that is, how it performs a transition. The first rule (Pref) defines a transition of an action-prefixed process. This rule has no premises (written above the line), so the process $\alpha.P$ can always evolve to $P$ by performing the action $\alpha$. The second rule (Def) defines the evolution of a process given by a process name $K$: this process can evolve to $P'$ if the equation $K \stackrel{def}{=} P$ is defined and $P$ can evolve to $P'$. The third rule (Choice) defines the behaviour of the + operator: a list of processes composed by + can evolve to $P'_j$ if some process $P_j$ in that list can evolve to $P'_j$. The fourth rule (Restr) defines the evolution of a process with the restriction operator \. This operator allows the process $P \setminus L$ to evolve to $P' \setminus L$ only if $P$ can evolve to $P'$ by performing an action $\alpha$ such that neither $\alpha$ nor its complement $\overline{\alpha}$ is in $L$ (because the actions in $L$ shall not be observed outside $P$ and $P'$). The last three rules define the evolution of processes composed by the parallel composition operator |. Rules Par1 and Par2 state that two processes running in parallel can evolve independently of each other. The last rule (Sync) defines the synchronization between two processes running in parallel: the processes must perform complementary actions, and the whole process performs an unobservable (silent) action $\tau$.

Based on Definition 3.3, we can describe the behaviour of the process $Sys \stackrel{def}{=} (LacOpS \,|\, LacR) \setminus \{bindO1, bindO2, bindO3, unbind\}$ by the labelled transition system in Fig. 2(a).
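The rules of Definition 3.3 can be executed directly. The following Python sketch is a small illustrative interpreter, not a verification tool: it derives the reachable LTS of Sys by applying the rules to every reachable state, breadth-first. Assumptions: outputs are encoded as "~a"; LacOpS performs bindO1 to bindO3 as inputs and unbind as an output, while LacR performs the complementary actions (the shape of the resulting LTS does not depend on which side carries the overbar); free, bO12 and bO13 are kept as unpaired visible names; and 0 is omitted because Sys never reaches it. All function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple, Union

TAU = "tau"  # silent action; an output on channel a is encoded as "~a"

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Prefix:
    action: str
    cont: Proc

@dataclass(frozen=True)
class Sum:
    branches: Tuple[Proc, ...]

@dataclass(frozen=True)
class Par:
    left: Proc
    right: Proc

@dataclass(frozen=True)
class Restrict:
    proc: Proc
    channels: FrozenSet[str]

Proc = Union[Const, Prefix, Sum, Par, Restrict]  # 0 omitted: it has no transitions

def co(a: str) -> str:
    """Complement of a visible action: a <-> ~a."""
    return a[1:] if a.startswith("~") else "~" + a

def steps(p: Proc, defs: Dict[str, Proc]) -> Iterator[Tuple[str, Proc]]:
    """Yield every (alpha, p2) such that p --alpha--> p2 under the rules of Definition 3.3."""
    if isinstance(p, Prefix):                      # (Pref)
        yield p.action, p.cont
    elif isinstance(p, Const):                     # (Def): K behaves like its defining body
        yield from steps(defs[p.name], defs)
    elif isinstance(p, Sum):                       # (Choice)
        for branch in p.branches:
            yield from steps(branch, defs)
    elif isinstance(p, Restrict):                  # (Restr): neither a nor ~a may be restricted
        for a, q in steps(p.proc, defs):
            if a == TAU or a.lstrip("~") not in p.channels:
                yield a, Restrict(q, p.channels)
    elif isinstance(p, Par):
        left, right = list(steps(p.left, defs)), list(steps(p.right, defs))
        for a, q in left:                          # (Par1)
            yield a, Par(q, p.right)
        for b, r in right:                         # (Par2)
            yield b, Par(p.left, r)
        for a, q in left:                          # (Sync): complementary actions give tau
            for b, r in right:
                if a != TAU and b == co(a):
                    yield TAU, Par(q, r)

def seq(actions: List[str], end: Proc) -> Proc:
    """Build the prefix chain a1.a2. ... .an.end."""
    for a in reversed(actions):
        end = Prefix(a, end)
    return end

DEFS: Dict[str, Proc] = {
    "LacOpS": seq(["bindO1"], Sum((seq(["bindO2", "~unbind"], Const("LacOpS")),
                                   seq(["bindO3", "~unbind"], Const("LacOpS"))))),
    "LacR": seq(["free", "~bindO1"], Sum((seq(["~bindO2", "bO12", "unbind"], Const("LacR")),
                                          seq(["~bindO3", "bO13", "unbind"], Const("LacR"))))),
}
SYS = Restrict(Par(Const("LacOpS"), Const("LacR")),
               frozenset({"bindO1", "bindO2", "bindO3", "unbind"}))

def explore(start: Proc, defs: Dict[str, Proc]) -> Tuple[List[Proc], Set[Tuple[int, str, int]]]:
    """Breadth-first construction of the reachable LTS; states are numbered in discovery order."""
    states, index = [start], {start: 0}
    transitions: Set[Tuple[int, str, int]] = set()
    i = 0
    while i < len(states):
        for a, q in steps(states[i], defs):
            if q not in index:
                index[q] = len(states)
                states.append(q)
            transitions.add((i, a, index[q]))
        i += 1
    return states, transitions

if __name__ == "__main__":
    states, transitions = explore(SYS, DEFS)
    names = ["Sys"] + [f"S{k}" for k in range(1, len(states))]
    for src, a, dst in sorted(transitions):
        print(f"{names[src]} --{a}--> {names[dst]}")
```

Running it yields six states and seven transitions: Sys --free--> S1, S1 --tau--> S2, S2 --tau--> S3, S2 --tau--> S4, S3 --bO12--> S5, S4 --bO13--> S5 and S5 --tau--> Sys. This is the shape implied by the transitions examined for Sys in Section 3.2. States are compared structurally, which is why the unbind synchronization leads back to the initial term Sys.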

3.2. Observational equivalence

Process algebras can be used to describe systems at different levels of abstraction, which can be compared via equivalence notions (that say in which sense the processes do or do not have the same behaviour). We can say that one process describing the implementation of a system and another describing its specification are equivalent if they describe the same behaviour according to the chosen equivalence notion. There are different equivalence notions for CCS processes [7]. Here we use observational equivalence, or weak bisimulation. Observational equivalence is an equivalence relation that allows us to abstract from steps labelled with silent actions in process behaviours and to equate processes that offer the same observable behaviour despite possibly having very different amounts of internal computation. For example, process Sys is observationally equivalent to process $Spec \stackrel{def}{=} free.(\tau.bO12.Spec + \tau.bO13.Spec)$ (the LTSs for these processes can be seen in Fig. 2); that is, both processes perform the same observable actions, in the same order.


To define this notion of equivalence, we introduce a transition relation between processes that relates an input and an output state (process) P and Q when there is a path from P to Q containing at most one observable action.

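Definition 3.4. Let $P$ and $Q$ be CCS processes. We write $P \stackrel{\varepsilon}{\Longrightarrow} Q$ iff there is a (possibly empty) sequence of $\tau$-labelled transitions that leads from $P$ to $Q$. (If the sequence is empty, then $P = Q$.)

For each action $\alpha$, we write $P \stackrel{\alpha}{\Longrightarrow} Q$ iff there are processes $P'$ and $Q'$ such that

$$P \stackrel{\varepsilon}{\Longrightarrow} P' \xrightarrow{\alpha} Q' \stackrel{\varepsilon}{\Longrightarrow} Q.$$

For each action $\alpha$, we use $\hat{\alpha}$ to stand for $\varepsilon$ if $\alpha = \tau$, and for $\alpha$ otherwise.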
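Definition 3.4 is straightforward to compute on a finite LTS: $\stackrel{\varepsilon}{\Longrightarrow}$ is reachability through τ-steps, and $\stackrel{\hat{\alpha}}{\Longrightarrow}$ for a visible $\alpha$ wraps one $\alpha$-step between two such closures. The sketch below assumes the LTSs of Fig. 2 as they can be reconstructed from the transitions used in this section (Sys --free--> S1 --τ--> S2, S2 --τ--> S3 or S4, and so on); the dictionary encoding and the function names `eps_closure` and `weak_hat` are our own.

```python
from typing import Dict, List, Set, Tuple

TAU = "tau"
LTS = Dict[str, List[Tuple[str, str]]]  # state -> list of (action, successor)

# Fig. 2 as reconstructed from the transitions discussed in the text
SYS: LTS = {
    "Sys": [("free", "S1")], "S1": [(TAU, "S2")], "S2": [(TAU, "S3"), (TAU, "S4")],
    "S3": [("bO12", "S5")], "S4": [("bO13", "S5")], "S5": [(TAU, "Sys")],
}
SPEC: LTS = {
    "Spec": [("free", "Q1")], "Q1": [(TAU, "Q2"), (TAU, "Q3")],
    "Q2": [("bO12", "Spec")], "Q3": [("bO13", "Spec")],
}


def eps_closure(lts: LTS, p: str) -> Set[str]:
    """All q with p =eps=> q: reachable through zero or more tau transitions."""
    seen, stack = {p}, [p]
    while stack:
        for a, q in lts.get(stack.pop(), []):
            if a == TAU and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def weak_hat(lts: LTS, p: str, alpha: str) -> Set[str]:
    """All q with p =alpha-hat=> q; alpha-hat is the empty sequence when alpha is tau."""
    before = eps_closure(lts, p)
    if alpha == TAU:
        return before
    return {q for p1 in before for a, q1 in lts.get(p1, []) if a == alpha
            for q in eps_closure(lts, q1)}


if __name__ == "__main__":
    print(sorted(weak_hat(SYS, "Sys", "free")))  # ['S1', 'S2', 'S3', 'S4']
    print(sorted(weak_hat(SYS, "S1", "bO12")))   # ['S5', 'Sys']
    print(sorted(weak_hat(SPEC, "Q1", TAU)))     # ['Q1', 'Q2', 'Q3']
```

Each weak transition listed for Sys and Spec in this section appears in the corresponding set; for instance, $Sys \stackrel{free}{\Longrightarrow} S_3$ holds because S3 is in `weak_hat(SYS, "Sys", "free")`.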

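Using Definition 3.4 for the Sys process, we have, for example, the transitions:

$$
\begin{gathered}
Sys \stackrel{free}{\Longrightarrow} S_1,\quad Sys \stackrel{free}{\Longrightarrow} S_3,\quad Sys \stackrel{free}{\Longrightarrow} S_4,\quad S_1 \stackrel{\varepsilon}{\Longrightarrow} S_2,\\
S_1 \stackrel{\varepsilon}{\Longrightarrow} S_3,\quad S_1 \stackrel{\varepsilon}{\Longrightarrow} S_4,\quad S_1 \stackrel{bO12}{\Longrightarrow} Sys,\quad S_1 \stackrel{bO13}{\Longrightarrow} Sys,\\
S_3 \stackrel{bO12}{\Longrightarrow} Sys,\quad S_4 \stackrel{bO13}{\Longrightarrow} Sys,\quad S_5 \stackrel{\varepsilon}{\Longrightarrow} Sys
\end{gathered}
$$

and for the Spec process, the transitions:

$$
\begin{gathered}
Spec \stackrel{free}{\Longrightarrow} Q_1,\quad Spec \stackrel{free}{\Longrightarrow} Q_2,\quad Spec \stackrel{free}{\Longrightarrow} Q_3,\\
Q_1 \stackrel{bO12}{\Longrightarrow} Spec,\quad Q_1 \stackrel{\varepsilon}{\Longrightarrow} Q_2,\quad Q_3 \stackrel{bO13}{\Longrightarrow} Spec.
\end{gathered}
$$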

Now we can define when two processes are observationally equivalent. Intuitively, they are equivalent if one can simulate the observable sequences of actions of the other and vice versa. This notion is also called weak bisimulation.

Definition 3.5 (Observational equivalence). A binary relation $R$ over the set of states of an LTS is an observational equivalence iff, whenever $s_1 R s_2$ and $\alpha$ is an action:

• if $s_1 \xrightarrow{\alpha} s'_1$, then there is a transition $s_2 \stackrel{\hat{\alpha}}{\Longrightarrow} s'_2$ such that $s'_1 R s'_2$;
• if $s_2 \xrightarrow{\alpha} s'_2$, then there is a transition $s_1 \stackrel{\hat{\alpha}}{\Longrightarrow} s'_1$ such that $s'_1 R s'_2$.

Two states $s$ and $s'$ are observationally equivalent, written $s \approx s'$, iff there is an observational equivalence that relates them.

The idea underlying the definition of this notion of equivalence is that a transition of a process can now be matched by a sequence of transitions from the other that has the same “observational content” and leads to a state that is equivalent to that reached by the first process.

Let us consider the LTSs in Fig. 2. To show that $Sys \approx Spec$, it suffices to exhibit an observational equivalence that contains the pair $(Sys, Spec)$. Consider

$$R = \{(Sys, Spec), (S_1, Q_1), (S_2, Q_1), (S_3, Q_2), (S_4, Q_3), (S_5, Spec)\}.$$

It remains to verify that $R$ is indeed an observational equivalence. Let us examine all possible transitions from the components of each pair:

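• $(Sys, Spec)$
If $Sys \xrightarrow{free} S_1$ then $Spec \stackrel{free}{\Longrightarrow} Q_1$ and $(S_1, Q_1) \in R$.
If $Spec \xrightarrow{free} Q_1$ then $Sys \stackrel{free}{\Longrightarrow} S_1$ and $(S_1, Q_1) \in R$.

• $(S_1, Q_1)$
If $S_1 \xrightarrow{\tau} S_2$ then $Q_1 \stackrel{\varepsilon}{\Longrightarrow} Q_1$ and $(S_2, Q_1) \in R$.
If $Q_1 \xrightarrow{\tau} Q_2$ then $S_1 \stackrel{\varepsilon}{\Longrightarrow} S_3$ and $(S_3, Q_2) \in R$.
If $Q_1 \xrightarrow{\tau} Q_3$ then $S_1 \stackrel{\varepsilon}{\Longrightarrow} S_4$ and $(S_4, Q_3) \in R$.

• $(S_2, Q_1)$
If $S_2 \xrightarrow{\tau} S_3$ then $Q_1 \stackrel{\varepsilon}{\Longrightarrow} Q_2$ and $(S_3, Q_2) \in R$.
If $S_2 \xrightarrow{\tau} S_4$ then $Q_1 \stackrel{\varepsilon}{\Longrightarrow} Q_3$ and $(S_4, Q_3) \in R$.
If $Q_1 \xrightarrow{\tau} Q_2$ then $S_2 \stackrel{\varepsilon}{\Longrightarrow} S_3$ and $(S_3, Q_2) \in R$.
If $Q_1 \xrightarrow{\tau} Q_3$ then $S_2 \stackrel{\varepsilon}{\Longrightarrow} S_4$ and $(S_4, Q_3) \in R$.

• $(S_3, Q_2)$
If $S_3 \xrightarrow{bO12} S_5$ then $Q_2 \stackrel{bO12}{\Longrightarrow} Spec$ and $(S_5, Spec) \in R$.
If $Q_2 \xrightarrow{bO12} Spec$ then $S_3 \stackrel{bO12}{\Longrightarrow} S_5$ and $(S_5, Spec) \in R$.

• $(S_4, Q_3)$
If $S_4 \xrightarrow{bO13} S_5$ then $Q_3 \stackrel{bO13}{\Longrightarrow} Spec$ and $(S_5, Spec) \in R$.
If $Q_3 \xrightarrow{bO13} Spec$ then $S_4 \stackrel{bO13}{\Longrightarrow} S_5$ and $(S_5, Spec) \in R$.

• $(S_5, Spec)$
If $S_5 \xrightarrow{\tau} Sys$ then $Spec \stackrel{\varepsilon}{\Longrightarrow} Spec$ and $(Sys, Spec) \in R$.
If $Spec \xrightarrow{free} Q_1$ then $S_5 \stackrel{free}{\Longrightarrow} S_1$ and $(S_1, Q_1) \in R$.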

Hence we have shown that each pair in R satisfies the conditions given in Definition 3.5, which means that R is an observational equivalence.
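The case analysis above is mechanical, so it can be automated. The sketch below is our own illustration, again assuming the Fig. 2 LTSs as reconstructed from the transitions examined above. It checks both clauses of Definition 3.5 for every pair of R, and it also computes the largest observational equivalence on the combined state space by the standard greatest-fixed-point iteration: start from all pairs and repeatedly delete pairs that violate a clause. This naive iteration is adequate for a handful of states; larger models call for partition-refinement algorithms. Function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

TAU = "tau"
Pair = Tuple[str, str]
# Fig. 2 (a) and (b) in one dictionary: state -> [(action, successor)]
LTS: Dict[str, List[Tuple[str, str]]] = {
    "Sys": [("free", "S1")], "S1": [(TAU, "S2")], "S2": [(TAU, "S3"), (TAU, "S4")],
    "S3": [("bO12", "S5")], "S4": [("bO13", "S5")], "S5": [(TAU, "Sys")],
    "Spec": [("free", "Q1")], "Q1": [(TAU, "Q2"), (TAU, "Q3")],
    "Q2": [("bO12", "Spec")], "Q3": [("bO13", "Spec")],
}
R: Set[Pair] = {("Sys", "Spec"), ("S1", "Q1"), ("S2", "Q1"),
                ("S3", "Q2"), ("S4", "Q3"), ("S5", "Spec")}


def eps_closure(p: str) -> Set[str]:
    """States reachable from p through zero or more tau transitions."""
    seen, stack = {p}, [p]
    while stack:
        for a, q in LTS.get(stack.pop(), []):
            if a == TAU and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def weak_hat(p: str, alpha: str) -> Set[str]:
    """Targets q of p =alpha-hat=> q (Definition 3.4)."""
    before = eps_closure(p)
    if alpha == TAU:
        return before
    return {q for p1 in before for a, q1 in LTS.get(p1, []) if a == alpha
            for q in eps_closure(q1)}


def clauses_hold(rel: Set[Pair], s1: str, s2: str) -> bool:
    """Both clauses of Definition 3.5 for the pair (s1, s2) with respect to rel."""
    forth = all(any((t1, t2) in rel for t2 in weak_hat(s2, a)) for a, t1 in LTS.get(s1, []))
    back = all(any((t1, t2) in rel for t1 in weak_hat(s1, a)) for a, t2 in LTS.get(s2, []))
    return forth and back


def is_observational_equivalence(rel: Set[Pair]) -> bool:
    return all(clauses_hold(rel, s1, s2) for s1, s2 in rel)


def largest_observational_equivalence() -> Set[Pair]:
    """Greatest fixed point: start from all pairs and drop violating pairs until stable."""
    rel = {(s, t) for s in LTS for t in LTS}
    while True:
        bad = {pair for pair in rel if not clauses_hold(rel, *pair)}
        if not bad:
            return rel
        rel -= bad


if __name__ == "__main__":
    print("R is an observational equivalence:", is_observational_equivalence(R))
    print("Sys and Spec related:", ("Sys", "Spec") in largest_observational_equivalence())
```

Dropping $(S_2, Q_1)$ from R makes the check fail at $(S_1, Q_1)$: the step $S_1 \xrightarrow{\tau} S_2$ then has no partner reachable from $Q_1$ that is related to $S_2$. This is exactly the kind of obligation the manual case analysis discharges.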

4. Modelling lactose operon regulation

The cellular concentration of a protein is determined by a delicate balance of at least seven activities, each having several potential points of regulation:

(1) synthesis of the primary RNA transcript (transcription);
(2) posttranscriptional modification of mRNA;
(3) messenger RNA degradation;
(4) protein synthesis (translation);
(5) posttranslational modification of proteins;
(6) protein targeting and transport;
(7) protein degradation.


Fig. 3. Full CCS specification of lactose operon regulation.

Our models focus on the regulation of transcription initiation of the lac operon. To this end, we have made some adaptations in the models to abstract away activities that are not (or are not known to be) directly related to lac regulation, such as RNA polymerase activity, the decrease of glucose level by cellular activity, the production of galactose by metabolism of lactose mediated by β-galactosidase, and the presence of thiogalactoside transacetylase. For a review on the subject see Ref. [22].

As discussed in the introduction, we will present two different views of lac operon regulation: a more abstract one, called the specification, and a more concrete one, called the system. The specification is shown in Section 4.1 and the system in Section 4.2.

Internal synchronizations are done via channels with lowercase names, whereas channels with uppercase names represent output actions. Synchronizations model biological events, and output actions are used as observable actions to allow the verification of properties and the computation of observational equivalence relations.

4.1. Specification process description

Our main specification process (SpecFull) has seven processes running in parallel describing the behaviour of lactose entry into the cell (processes 2–4 in Fig. 3), β-galactosidase activity (processes 5 and 6), allolactose production and consumption (processes 7 and 8), the negative regulation mechanism (process 9), lac operon binding site activities (processes 10–13), positive regulation (processes 14 and 15) and variation of glucose concentration (processes 16 and 17).

Process Ent indicates that, after external lactose enters the cell (ELAC channel), the process behaves as the React process. React allows either an increase in intracellular lactose (ILAC channel) or the consumption of intracellular lactose by the β-galactosidase enzyme (rbeta channel). After an increase of intracellular lactose, the behaviour changes to the React2 process description.

In the React2 process, external lactose can enter the cell (ELAC channel) or intracellular lactose can be consumed by the β-galactosidase enzyme (rbeta channel). The process evolves to React after lactose enters the cell.

The behaviours of β-galactosidase, allolactose, lac operon binding sites and glucose (processes 5 and 6, 7 and 8, 10–13, and 16 and 17 in Fig. 3, respectively) are the same as described in Section 4.2.

The negative regulation of the lac operon is described in process Neg. After the Lac repressor protein binds to the O1 site (BO1 channel), it may bind either to the O2 or to the O3 site (BO2 or BO3 channel).¹ These bindings repress the lac operon (synchronization on the rep channel followed by REP). When the inducer binds to the Lac repressor protein (synchronization on the ballo channel followed by BALLO), the repressor releases the operator sites (UBO1 channel followed by either UBO2 or UBO3) and unrepresses the operon (synchronization on the urep channel followed by UREP). Then the process returns to its initial state (process Neg).

Fig. 4. Full CCS system of lactose operon regulation (part 1).

The positive regulation of the lac operon starts with process Pos. When the glucose level is low (synchronization on the lev and low channels), the cAMP concentration increases (L_to_H channel). Once this coactivator is available, CRP binds to it (BC channel), and the cAMP–CRP complex binds to a site near the lac promoter, stimulating transcription (BS channel and synchronization on the act channel followed by ACT). After that, the process behaves as described in process Act.

When the operon is fully expressed, glucose may be produced and consumed without affecting the transcription rate of the lac operon. When this production increases the glucose concentration (synchronization on the lev and high channels), the cAMP level decreases (H_to_L channel). CRP releases cAMP and the operon site (UBC channel followed by UBS), deactivating transcription (synchronization on the iact channel followed by IACT). The process then behaves as Pos again.

1 The silent actions before them mean that, according to a non-predictable event (silent action), the system evolves to one or the other branch of the non-deterministic choice.

Fig. 5. Full CCS system of lactose operon regulation (part 2).

4.2. System process description

The system model is shown in Figs. 4 and 5. Sometimes we need a qualitative measure of substance concentration (or activity) to choose the right behaviour for it, so we may have more than one process description for each substance in the system.² These descriptions are all related, since each of them can be reached by some channel synchronization.

Our main process is called SystemFull (Fig. 4). It contains all processes relevant to lac regulation running in parallel. The channel names listed in its description (lowercase names) are restricted to the processes inside it. The system starts with lactose outside the cell, no intracellular lactose or allolactose, a few galactoside permease and β-galactosidase enzymes, a high glucose level, a low cAMP concentration, all regulatory sites of the lac operon free, and some CRP and Lac repressor proteins.³

In our model, lactose can be outside or inside the cell. External lactose (Fig. 4) is modelled by process Lactose_out, which can interact with permease (elac channel) and then enter the cell (ilac channel). Intracellular lactose is modelled using three qualitative levels: none, low and high (process descriptions 4–6 in Fig. 4). When lactose is available inside the cell, it can react with the β-galactosidase enzyme (rbeta channel), and its level can decrease. As long as the intracellular lactose level is not high, lactose can enter the cell (ilac channel) and its level can increase.

The Galactoside_permease process (Fig. 4) only allows lactose to enter the cell (elac channel). In our model, activation of the lac operon does not affect it; therefore we do not use concentration levels for it, since the only change is in the rate at which lactose enters the cell at a given time.

β-galactosidase (processes 7 and 8 in Fig. 4) has two levels, low and high, which are affected by activation and repression of the lac operon. The ibeta channel takes it from low to high concentration, and the dbeta channel from high to low. When reacting with lactose (rbeta), this enzyme can produce allolactose (iallo) at low level, or glucose (iglu) and galactose at high level.⁴ Since galactose does not participate in lac regulation, we do not include it in our model.

Allolactose (processes 9 and 10 in Fig. 4) can be present at low concentration in the cell or be absent. When absent, the only action the process can perform is to increase (iallo). When present, besides being produced, it can bind to the Lac repressor (ballo); after binding, its concentration decreases.
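To make the qualitative-level encoding above concrete, the following Python sketch models intracellular lactose (three levels), the preferential product of the rbeta reaction (chosen by the β-galactosidase level) and allolactose (absent or present) as plain step functions. It is a hypothetical illustration based only on the prose, not a translation of the CCS in Fig. 4: the function names, the string encoding of levels and the choice to raise an error for a disabled action are our assumptions.

```python
# Hypothetical, simplified encoding of the qualitative levels described in the
# text; it does not reproduce the CCS processes of Fig. 4.
LACTOSE_LEVELS = ["none", "low", "high"]


def lactose_step(level: str, action: str) -> str:
    """Next intracellular lactose level after `action` ("ilac" or "rbeta")."""
    i = LACTOSE_LEVELS.index(level)
    if action == "ilac" and level != "high":   # entry only while not high
        return LACTOSE_LEVELS[i + 1]
    if action == "rbeta" and level != "none":  # consumption only if present
        return LACTOSE_LEVELS[i - 1]
    raise ValueError(f"{action!r} is not enabled at lactose level {level!r}")


def rbeta_product(beta_level: str) -> str:
    """Preferential product signal of the rbeta reaction for an enzyme level."""
    return {"low": "iallo", "high": "iglu"}[beta_level]


def allolactose_step(present: bool, action: str) -> bool:
    """Allolactose is absent (False) or present at low level (True)."""
    if action == "iallo":               # production is always possible
        return True
    if action == "ballo" and present:   # binding to the Lac repressor consumes it
        return False
    raise ValueError(f"{action!r} is not enabled (present={present})")
```

For example, starting from no intracellular lactose, two ilac steps reach the high level, after which only rbeta is enabled for lactose.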

The Lac repressor can be bound (on) or unbound (off) to the operator sites (processes 11 and 12 in Fig. 4). When unbound, it can bind to O1 (bo1) and to either O2 (bo2) or O3 (bo3). After that, the lac promoter is repressed (rep). When allolactose binds to the Lac repressor (ballo), the repressor unbinds from the operator sites (ubo1, ubo23s and ubo23e) and unrepresses the promoter (urep).

All operator sites (processes 13–15 in Fig. 5) can only bind to the Lac repressor (bo1, bo2 and bo3) and, after that, can only unbind from it (ubo1, ubo23s and ubo23e).

2 Experimental results available in Ref. [20] were used to choose the number of process descriptions for each substance.

3 All processes are qualitative; that is, one process does not represent one unit of a substance.

4 We have restricted the reaction products at low and high levels in our model because we want to include a preferential product according to the enzyme level.

The lac promoter has four states: iu for deactivated and unrepressed, ir for deactivated and repressed, au for activated and unrepressed, and ar for activated and repressed (processes 16–19 in Fig. 5, respectively). These processes can change the concentration of β-galactosidase from low to high (ibeta) after the promoter is activated (act) and unrepressed (urep), and from high to low (dbeta) after it is deactivated (iact) or repressed (rep).⁵

The ibeta synchronization abstracts several biological events into one: all transcription and translation steps between operon activation and β-galactosidase production. We do not increase the concentrations of all lac-related proteins, so that our model keeps only relevant information.
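The promoter's four states and the rule for when β-galactosidase changes level can be summarised as a small state machine. The Python sketch below is a hypothetical reading of the prose above, not the CCS of Fig. 5: representing a state as an (activated, repressed) pair, and emitting ibeta or dbeta exactly when the promoter enters or leaves au, are our assumptions.

```python
from typing import Optional, Tuple

# Hypothetical encoding: a promoter state is the pair (activated, repressed).
State = Tuple[bool, bool]
NAMES = {(False, False): "iu", (False, True): "ir",
         (True, False): "au", (True, True): "ar"}
MAX_EXPRESSION: State = (True, False)  # activated and unrepressed ("au")

# Which component each action sets (None means "left unchanged").
EFFECTS = {"act": (True, None), "iact": (False, None),
           "rep": (None, True), "urep": (None, False)}


def promoter_step(state: State, action: str) -> Tuple[State, Optional[str]]:
    """Apply act/iact/rep/urep and return the new state together with the
    beta-galactosidase signal it triggers: "ibeta", "dbeta" or None."""
    set_act, set_rep = EFFECTS[action]
    new = (state[0] if set_act is None else set_act,
           state[1] if set_rep is None else set_rep)
    if new == state:
        raise ValueError(f"{action!r} does not change promoter {NAMES[state]}")
    if new == MAX_EXPRESSION:
        return new, "ibeta"   # just became activated and unrepressed
    if state == MAX_EXPRESSION:
        return new, "dbeta"   # left "au" by deactivation or repression
    return new, None
```

For instance, act from iu yields au with ibeta, and a subsequent rep yields ar with dbeta.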

CRP can be free in the cell (process 20 in Fig. 5) or bound to the CRP site (process 21 in Fig. 5). When free, it can bind to cAMP (bc); after that, it binds to the CRP site (bs) and activates the lac promoter (act and acte). When bound, it can unbind cAMP (ubc); after that, it unbinds from the CRP site (ubs) and deactivates the promoter (iact and iacte).

The cAMP level can be low or high (processes 22 and 23 in Fig. 5). Changes in the cAMP level depend on the glucose concentration.⁶ So our cAMP processes always query the glucose level (lev). Depending on the answer (low or high), cAMP can change its concentration. If its level is raised (L_to_H), cAMP binds to CRP (bc) and the complex binds to the CRP site (bs) to start activation of the lac operon (acte). If its level is reduced (H_to_L), cAMP unbinds from CRP (ubc), CRP releases its site (ubs) and the lac operon is deactivated (iacte).

Glucose concentration can be at high or low levels (processes 24 and 25 in Fig. 5). The glucose concentration can increase via the β-galactosidase-mediated reaction (iglu), and the cAMP process can query the glucose level (lev followed by high or low). The glucose level can be increased (GLU_L_to_H) or decreased (DGLU). We signal these changes in glucose concentration to facilitate the verification of properties related to the influence of glucose on lac regulation. The decrease of glucose concentration without any apparent reason in process 24 avoids adding more processes to our model for consuming glucose; instead, we abstract energy consumption using the DGLU channel.⁷

5 In fact, there are intermediate levels of transcription (from basal to highest), but for the sake of simplicity we model only the basal rate (low) and the highest rate (high).

6 How the glucose level affects the cAMP level is not yet entirely known [20].

7 The usage of silent actions means that, according to a non-predictable event (silent action), the system evolves to one or the other branch of the non-deterministic choice.

5. Automatic verification of systems

We now want to show that our model of the lac regulation system is related to its specification and that the properties listed in Fig. 1 are satisfied by our model. To verify these claims we used the automatic verification tool Concurrency Workbench of the New Century (CWB-NC) [23]. In Section 5.1, we check our CCS system against properties it should have. Each property is formulated as a logical formula that defines a behaviour the system should or should not exhibit as it executes. Then, in Section 5.2, we check whether our lac regulation system is weakly bisimilar to its specification, that is, whether they have the same observable behaviour.

5.1. Model checking

Model checking is a method to automatically verify finite-state systems. In this approach, we describe the system using CCS (the specification language) and the properties using some temporal logic. Some aspects (or properties) are best checked by exploring the state space of the LTS of the process under consideration, rather than by turning them into equivalence-checking questions. For example, for the process Spec (which describes the behaviour of the Lac repressor interacting with the operator sites of the lactose operon), we may wish to know whether the Lac repressor

“has the possibility to bind to O1 and O2 now” or

“is always released after getting bound”.

In order to check this kind of property, we must be able to describe it in a language with a well-defined syntax and semantics. Once the properties are expressed, we can verify, manually or automatically, whether they hold for a process.

The language used here to describe these properties is called temporal logic. A temporal logic extends classical logic with modalities and with operators that describe how properties persist over time. This lets us reason about properties of different computations at the same time, not just about one computation. Moreover, we can describe properties such as “the property is always possible” or “the action a will eventually happen”, which refer to several states of the computations.

One of the temporal logics supported by the CWB-NC is an enriched form of the modal mu-calculus [24,25] that includes the operators of CTL [26] for processes.

We present a subset of the CWB-NC logic [27] that is used in this subsection to describe the properties of our system.

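Definition 5.1 (Syntax). Given a set of actions $K$, a subset of CTL formulae for processes is given by the following grammar:

$$
\begin{aligned}
\Phi, \Psi ::= {} & \mathit{tt} \mid \mathit{ff} \mid \neg\Phi \mid \Phi \vee \Psi \mid \Phi \wedge \Psi \mid [K]\Phi \mid \langle K\rangle\Phi \mid [[K]]\Phi \mid \langle\langle K\rangle\rangle\Phi \\
\mid {} & E(\Phi\,U\,\Psi) \mid A(\Phi\,U\,\Psi) \mid E(\Phi\,W\,\Psi) \mid A(\Phi\,W\,\Psi) \\
\mid {} & EF\,\Phi \mid EG\,\Phi \mid AF\,\Phi \mid AG\,\Phi
\end{aligned}
$$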

The CTL formulae used in the CWB-NC include the traditional propositional constants ($\mathit{tt}$ and $\mathit{ff}$) and connectives ($\neg$, $\vee$ and $\wedge$), modal operators ($[K]$, $[[K]]$, $\langle K\rangle$ and $\langle\langle K\rangle\rangle$) and temporal operators ($EF$, $EG$, $AF$, $AG$, $E(U)$, $E(W)$, $A(U)$ and $A(W)$).

The propositional and modal formulae make statements about one state (process) in the LTS, whereas the temporal formulae make statements about a sequence of these states. Thus we introduce an auxiliary concept before presenting the formal semantics of these CTL formulae.

Definition 5.2 (Path). Let $T = (Proc, Act, \rightarrow)$ be the LTS of a CCS process, $\alpha_i \in Act$ be actions and $P_0 \in Proc$ be a process. A path from $P_0$ is a sequence of processes $\pi = P_0 P_1 P_2 \ldots$ such that $P_i \xrightarrow{\alpha_{i+1}} P_{i+1}$. A path is maximal in the sense that, if it is finite, then its final process is unable to perform any action. Let $\pi$ denote a path from $P_0$. We use $\pi_i$ to denote the $(i+1)$-th element of $\pi$; i.e., if $\pi = P_0 P_1 P_2 \ldots$ then $\pi_i = P_i$, where $P_i$ is a process. We write $\mathcal{P}(P)$ for the set of all paths from $P$, defined by $\mathcal{P}(P) = \{\pi \in Proc^{*} \mid \pi_0 = P\}$.
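Definition 5.2 can be explored mechanically on a finite LTS. The Python sketch below is a hypothetical helper (not part of CWB-NC) that represents an LTS as a dictionary from each state to its list of (action, successor) pairs and enumerates paths from a start state up to a length bound. Since maximal paths may be infinite, it reports finite maximal paths (those ending in a state with no transitions) separately from prefixes cut off by the bound.

```python
from typing import Dict, Iterator, List, Tuple

# Assumed LTS representation: state -> list of (action, successor) pairs.
LTS = Dict[str, List[Tuple[str, str]]]


def path_prefixes(lts: LTS, start: str,
                  max_len: int) -> Iterator[Tuple[Tuple[str, ...], bool]]:
    """Yield (states, is_maximal) for paths from `start` with at most
    `max_len` states. is_maximal is True when the path ends in a state with
    no transitions (a finite maximal path); otherwise the yielded sequence is
    only a prefix cut at the bound (possibly of an infinite path)."""
    stack = [(start,)]
    while stack:
        path = stack.pop()
        last = path[-1]
        if not lts[last]:
            yield path, True            # deadlocked state: path is maximal
        elif len(path) == max_len:
            yield path, False           # bound reached: prefix only
        else:
            for _action, nxt in lts[last]:
                stack.append(path + (nxt,))
```

With P0 able to move to a deadlocked P1 or to P2, which loops back to P0, a bound of four states yields two finite maximal paths and one truncated prefix.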

The semantics of CTL is defined by a satisfaction relation $\models$. For any modal formula $\Phi$ we define when a process $P$ has, or satisfies, the property $\Phi$, written $P \models \Phi$. If $P$ fails to have the property $\Phi$, we write $P \not\models \Phi$.

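Definition 5.3 (Semantics). Let $(Proc, Act, \rightarrow)$ be the LTS of a CCS process, $P \in Proc$ be a process, $K \subseteq L$ be an action set and $\Phi, \Psi$ be CTL formulae. The satisfaction relation $\models$ is defined by

$$
\begin{aligned}
& P \models \mathit{tt}, \qquad P \not\models \mathit{ff}, \\
& P \models \neg\Phi && \text{iff } P \not\models \Phi, \\
& P \models \Phi \vee \Psi && \text{iff } P \models \Phi \text{ or } P \models \Psi, \\
& P \models \Phi \wedge \Psi && \text{iff } P \models \Phi \text{ and } P \models \Psi, \\
& P \models [K]\Phi && \text{iff } \forall Q \in \{P' \mid P \xrightarrow{\alpha} P' \;\&\; \alpha \in K\}.\; Q \models \Phi, \\
& P \models \langle K\rangle\Phi && \text{iff } \exists Q \in \{P' \mid P \xrightarrow{\alpha} P' \;\&\; \alpha \in K\}.\; Q \models \Phi, \\
& P \models [[K]]\Phi && \text{iff } \forall Q \in \{P' \mid P \overset{\alpha}{\Longrightarrow} P' \;\&\; \alpha \in K\}.\; Q \models \Phi, \\
& P \models \langle\langle K\rangle\rangle\Phi && \text{iff } \exists Q \in \{P' \mid P \overset{\alpha}{\Longrightarrow} P' \;\&\; \alpha \in K\}.\; Q \models \Phi, \\
& P \models E(\Phi\,U\,\Psi) && \text{iff } \exists \pi \in \mathcal{P}(P).\,\big(\exists i \geq 0.\; \pi_i \models \Psi \;\&\; (\forall 0 \leq k < i.\; \pi_k \models \Phi)\big), \\
& P \models A(\Phi\,U\,\Psi) && \text{iff } \forall \pi \in \mathcal{P}(P).\,\big(\exists i \geq 0.\; \pi_i \models \Psi \;\&\; (\forall 0 \leq k < i.\; \pi_k \models \Phi)\big), \\
& P \models E(\Phi\,W\,\Psi) && \text{iff } \exists \pi \in \mathcal{P}(P).\,\big((\exists i \geq 0.\; \pi_i \models \Psi \;\&\; (\forall 0 \leq k < i.\; \pi_k \models \Phi)) \text{ or } (\forall i \geq 0.\; \pi_i \models \Phi)\big), \\
& P \models A(\Phi\,W\,\Psi) && \text{iff } \forall \pi \in \mathcal{P}(P).\,\big((\exists i \geq 0.\; \pi_i \models \Psi \;\&\; (\forall 0 \leq k < i.\; \pi_k \models \Phi)) \text{ or } (\forall i \geq 0.\; \pi_i \models \Phi)\big), \\
& P \models EF\,\Phi && \text{iff } \exists \pi \in \mathcal{P}(P).\,(\exists i \geq 0.\; \pi_i \models \Phi), \\
& P \models EG\,\Phi && \text{iff } \exists \pi \in \mathcal{P}(P).\,(\forall i \geq 0.\; \pi_i \models \Phi), \\
& P \models AF\,\Phi && \text{iff } \forall \pi \in \mathcal{P}(P).\,(\exists i \geq 0.\; \pi_i \models \Phi), \\
& P \models AG\,\Phi && \text{iff } \forall \pi \in \mathcal{P}(P).\,(\forall i \geq 0.\; \pi_i \models \Phi).
\end{aligned}
$$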


The formulae are interpreted with respect to processes (states) in an LTS; thus $\mathit{tt}$ holds for every process and $\mathit{ff}$ for no process. $\neg\Phi$ holds for a process $P$ if $\Phi$ does not hold for $P$, $\Phi \vee \Psi$ holds for $P$ if either $\Phi$ or $\Psi$ does, and $\Phi \wedge \Psi$ holds for $P$ if both $\Phi$ and $\Psi$ do.

A process $P$ satisfies $\langle K\rangle\Phi$ if it has an $\alpha$-labelled transition, for some $\alpha \in K$, leading to a state that satisfies $\Phi$; on the other hand, $P$ satisfies $[K]\Phi$ if all its $\alpha$-labelled transitions with $\alpha \in K$ lead to states that satisfy $\Phi$. If $P$ has no such transition, $P$ trivially satisfies $[K]\Phi$.

The formulae $\langle\langle K\rangle\rangle\Phi$ and $[[K]]\Phi$ are interpreted analogously to $\langle K\rangle\Phi$ and $[K]\Phi$, respectively, except that the $\alpha$-labelled transition may be preceded or followed by $\tau$-labelled transitions, as described in Definition 3.4.
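The weak modalities can be computed from the ordinary transitions by closing over τ steps, since a weak α-transition consists of zero or more τ steps, then α, then zero or more τ steps. The following Python sketch is our illustration, not the CWB-NC implementation. It assumes that the silent action is written "tau", that K contains only visible actions (the treatment of τ ∈ K in Definition 3.4 is not reproduced), and that a formula is given as the set of states satisfying it.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed LTS representation: state -> list of (action, successor) pairs.
LTS = Dict[str, List[Tuple[str, str]]]
TAU = "tau"  # assumed spelling of the silent action


def tau_closure(lts: LTS, states: Iterable[str]) -> Set[str]:
    """All states reachable from `states` by zero or more tau transitions."""
    seen = set(states)
    stack = list(seen)
    while stack:
        u = stack.pop()
        for action, v in lts[u]:
            if action == TAU and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def weak_successors(lts: LTS, p: str, K: Set[str]) -> Set[str]:
    """{P' | p ==a==> P' for some visible a in K}, i.e. tau* a tau*."""
    result: Set[str] = set()
    for p1 in tau_closure(lts, [p]):
        for action, q in lts[p1]:
            if action != TAU and action in K:
                result |= tau_closure(lts, [q])
    return result


def weak_diamond(lts: LTS, p: str, K: Set[str], phi: Set[str]) -> bool:
    """p |= <<K>>phi, where phi is the set of states satisfying the formula."""
    return any(q in phi for q in weak_successors(lts, p, K))


def weak_box(lts: LTS, p: str, K: Set[str], phi: Set[str]) -> bool:
    """p |= [[K]]phi (vacuously true if there is no weak K-transition)."""
    return all(q in phi for q in weak_successors(lts, p, K))
```

In an LTS where S --τ--> S1 --a--> S2 --τ--> S3, the state S has no a-labelled transition, so the strong modality ⟨a⟩tt fails at S while the weak modality ⟨⟨a⟩⟩tt holds.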

$E(\Phi\,U\,\Psi)$ holds for $P$ if, along some path starting in $P$, $\Phi$ is true until a state is reached where $\Psi$ is true; $\Psi$ is required to hold eventually. $E(\Phi\,W\,\Psi)$ is interpreted analogously to $E(\Phi\,U\,\Psi)$, except that $\Psi$ is not required to hold eventually: the path may instead satisfy $\Phi$ in all of its states.

The interpretation of the formulae $A(\Phi\,U\,\Psi)$ and $A(\Phi\,W\,\Psi)$ is the same as that of the previous ones, except that $(\Phi\,U\,\Psi)$ or $(\Phi\,W\,\Psi)$ must hold for every path starting in $P$.

$EF\,\Phi$ holds for $P$ if, along some path from $P$, some state $P'$ satisfies $\Phi$, while $EG\,\Phi$ holds for $P$ if all states along some path from $P$ satisfy $\Phi$.

$AF\,\Phi$ holds for $P$ if, along every path from $P$, some state $P'$ satisfies $\Phi$, while $AG\,\Phi$ holds for $P$ if every state along every path from $P$ satisfies $\Phi$.
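The path-based semantics above can be computed on a finite LTS with the usual fixpoint characterisations of CTL. The following Python sketch is our illustration, not CWB-NC's algorithm. It adapts the standard characterisations to maximal paths that may end in a deadlocked state (for example, $AF\,\Phi$ fails at a deadlocked state that does not satisfy $\Phi$), and it represents each formula by the set of states satisfying it. The toy LTS used with it below is hypothetical: its states and labels were chosen only to agree with the facts about Spec, $Q_1$, $Q_2$ and $Q_3$ used in the verification that follows, and are not taken from Fig. 2(b).

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

# Illustrative checker, not CWB-NC's algorithm; paths are maximal (Def. 5.2).
LTS = Dict[str, List[Tuple[str, str]]]
Sat = FrozenSet[str]  # the set of states that satisfy a formula


def diamond(lts: LTS, K: Iterable[str], phi: Sat) -> Sat:
    """<K>phi: some transition labelled in K leads to a phi-state."""
    ks = set(K)
    return frozenset(s for s, ts in lts.items()
                     if any(a in ks and t in phi for a, t in ts))


def box(lts: LTS, K: Iterable[str], phi: Sat) -> Sat:
    """[K]phi: every transition labelled in K leads to a phi-state."""
    ks = set(K)
    return frozenset(s for s, ts in lts.items()
                     if all(t in phi for a, t in ts if a in ks))


def _ex(lts: LTS, z: Sat) -> Sat:
    """States with at least one successor (any action) in z."""
    return frozenset(s for s, ts in lts.items() if any(t in z for _, t in ts))


def _ax(lts: LTS, z: Sat) -> Sat:
    """States all of whose successors are in z (vacuously true if none)."""
    return frozenset(s for s, ts in lts.items() if all(t in z for _, t in ts))


def _dead(lts: LTS) -> Sat:
    """States without transitions, where a finite maximal path ends."""
    return frozenset(s for s, ts in lts.items() if not ts)


def _fix(f: Callable[[Sat], Sat], start: Sat) -> Sat:
    """Iterate a monotone f from `start` until it stabilises."""
    z = start
    while True:
        nz = f(z)
        if nz == z:
            return z
        z = nz


# Least fixpoints start from the empty set, greatest ones from all states.
def ef(lts: LTS, phi: Sat) -> Sat:
    return _fix(lambda z: phi | _ex(lts, z), frozenset())


def ag(lts: LTS, phi: Sat) -> Sat:
    return _fix(lambda z: phi & _ax(lts, z), frozenset(lts))


def eg(lts: LTS, phi: Sat) -> Sat:
    return _fix(lambda z: phi & (_ex(lts, z) | _dead(lts)), frozenset(lts))


def af(lts: LTS, phi: Sat) -> Sat:
    return _fix(lambda z: phi | (_ax(lts, z) - _dead(lts)), frozenset())


def eu(lts: LTS, phi: Sat, psi: Sat) -> Sat:
    return _fix(lambda z: psi | (phi & _ex(lts, z)), frozenset())


def au(lts: LTS, phi: Sat, psi: Sat) -> Sat:
    return _fix(lambda z: psi | (phi & (_ax(lts, z) - _dead(lts))), frozenset())


def ew(lts: LTS, phi: Sat, psi: Sat) -> Sat:
    return _fix(lambda z: psi | (phi & (_ex(lts, z) | _dead(lts))), frozenset(lts))


def aw(lts: LTS, phi: Sat, psi: Sat) -> Sat:
    return _fix(lambda z: psi | (phi & _ax(lts, z)), frozenset(lts))
```

On a toy LTS with Spec --free--> Q1, Q1 --bO1--> Q2, Q1 --bO1--> Q3, Q2 --bO12--> Spec and Q3 --bO13--> Spec, the state Spec is not in `diamond(lts, {"bO12"}, tt)` but is in `ag(lts, box(lts, {"bO12"}, diamond(lts, {"free"}, tt)))`, where `tt = frozenset(lts)`. This matches the two results derived by hand in the verification that follows.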

Based on the formal definition of the CTL semantics, we can describe the properties listed at the beginning of this subsection and verify whether process Spec satisfies them. These properties can be described by the formulae

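$\langle bO12\rangle\mathit{tt}$ and $AG\,[bO12]\langle free\rangle\mathit{tt}$,

respectively, where $Spec \not\models \langle bO12\rangle\mathit{tt}$ and $Spec \models AG\,[bO12]\langle free\rangle\mathit{tt}$. We can verify them as follows (see the LTS in Fig. 2(b)):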

• By definition of $\langle K\rangle\Phi$, $Spec \models \langle bO12\rangle\mathit{tt}$ holds if and only if there exists $Q \in \{P' \mid Spec \xrightarrow{bO12} P'\}$ such that $Q \models \mathit{tt}$. Because there is no bO12-labelled transition starting in state Spec, $\{P' \mid Spec \xrightarrow{bO12} P'\} = \emptyset$, and thus there is no state $Q$ in this set that satisfies $\mathit{tt}$. Consequently, $Spec \not\models \langle bO12\rangle\mathit{tt}$.
• By definition of $AG\,\Phi$, $Spec \models AG\,[bO12]\langle free\rangle\mathit{tt}$ holds if and only if all processes along all paths from Spec satisfy $[bO12]\langle free\rangle\mathit{tt}$. Thus:
  ◦ $Spec \models [bO12]\langle free\rangle\mathit{tt}$ holds vacuously (by definition of $[K]\Phi$), since there is no bO12-labelled transition starting in state Spec.
  ◦ $Q_1 \models [bO12]\langle free\rangle\mathit{tt}$ holds vacuously (by definition of $[K]\Phi$), since there is no bO12-labelled transition starting in state $Q_1$.
  ◦ By definition of $[K]\Phi$, $Q_2 \models [bO12]\langle free\rangle\mathit{tt}$ holds if and only if $Spec \models \langle free\rangle\mathit{tt}$ holds, since there is a bO12-labelled transition from $Q_2$ to Spec. By definition of $\langle K\rangle\Phi$, $Spec \models \langle free\rangle\mathit{tt}$ holds if and only if $Q_1 \models \mathit{tt}$ holds, since there is a free-labelled transition from Spec to $Q_1$. As $\mathit{tt}$ holds for all processes, $Q_1 \models \mathit{tt}$ holds and, consequently, all previous formulae, including $Q_2 \models [bO12]\langle free\rangle\mathit{tt}$, hold.
  ◦ $Q_3 \models [bO12]\langle free\rangle\mathit{tt}$ holds vacuously (by definition of $[K]\Phi$), since there is no bO12-labelled transition starting in state $Q_3$.
  Consequently, $Spec \models AG\,[bO12]\langle free\rangle\mathit{tt}$.

Fig. 6. CTL formulae.

Fig. 6 shows some selected properties from Fig. 1 written in CTL. These properties were checked, and the results agreed with the expected answers. The remaining properties are omitted because their formulae are similar to those depicted in Fig. 6. Each CTL operator has a meaning that can be translated into an English sentence. The illustrated formulae can be translated as follows:

(A) There exists a state in some computation where, between GLU_L_to_H and DGLU, an H_to_L occurs.

(C) There exists a state in some computation where BC occurs.

(D) Similar to (A). We selected this property because its result differs from that of (A).

(K) There is no computation in which IALLO occurs before RBETA, and there exists a state in some computation where IALLO, possibly preceded or followed by silent actions (τ), occurs after RBETA.

(L) There exists a state in some computation where IALLO, possibly preceded or followed by silent actions (τ), occurs after ILAC.

(U) For all states in all computations, BO2 and BO3 do not occur one after the other and, at some point, they occur between BO1 and UBO1.

In previous work [14], when we tried to verify the properties for our model SystemFull (and its auxiliary descriptions in Figs. 4 and 5), we were faced with the state explosion problem: due to the huge number of generated states, the model checker was unable to verify the desired properties. We dealt with this problem by adapting our model to each property. First, we removed every non-restricted channel from the model in Figs. 4 and 5. Then, for each property, we included only the non-restricted channels related to it. For instance, the model used for property A contains only the GLU_L_to_H, H_to_L and DGLU channels, placed exactly where they appear in our model. This approach, although feasible, involves editing the model for each property to be verified. In the next section, we propose a different approach that enables the verification of properties of large systems.

Fig. 7. Processes Sys1 and Sys2 are observationally equivalent since P1 ≈ P2.

Fig. 8. CCS specification of lactose entrance in cell.

Fig. 9. CCS specification of negative regulation.
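Keeping only the observable channels relevant to one property amounts to hiding all the others. The Python sketch below illustrates this idea on an already-built LTS by renaming unwanted observable actions to τ (written "tau"), relying on the convention stated in Section 4 that upper-case channel names are observable. It is our illustration only, not the procedure used in [14], which edited the CCS model itself and, unlike hiding on a built LTS, can therefore also shrink the state space.

```python
from typing import Dict, List, Set, Tuple

# Assumed LTS representation: state -> list of (action, successor) pairs.
LTS = Dict[str, List[Tuple[str, str]]]
TAU = "tau"  # assumed spelling of the silent action


def is_observable(action: str) -> bool:
    """Assumes the paper's convention: upper-case names are observable."""
    return action != TAU and action[:1].isupper()


def keep_only(lts: LTS, keep: Set[str]) -> LTS:
    """Hide (rename to tau) every observable action that is not in `keep`."""
    return {
        state: [(a if a in keep or not is_observable(a) else TAU, target)
                for a, target in transitions]
        for state, transitions in lts.items()
    }
```

For property A, for example, `keep` would be the set {"GLU_L_to_H", "H_to_L", "DGLU"}, so an observable action such as BO1 becomes silent.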

5.2. Checking observational equivalence relations

We have seen that checking equivalence is a natural approach to establishing the correctness of models of systems described at different levels of abstraction in CCS. Weak bisimulation (or observational equivalence) is preserved by the CCS operators used to assemble our models, in particular parallel composition and restriction (it is not a full congruence, since it is not preserved by the choice operator +). This means that, if a process P1 is observationally equivalent to another process P2, the behaviour of any process that contains P1 in such a context remains observationally the same if we replace P1 by P2, as shown in Fig. 7.
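Observational equivalence between two finite LTSs can be decided by saturating them with weak transitions and then computing strong bisimilarity on the result. The following Python sketch uses this textbook approach with naive partition refinement; it is our illustration, not the algorithm implemented in CWB-NC. It assumes that the silent action is written "tau" and that each LTS maps every state (including deadlocked ones) to its list of (action, successor) pairs.

```python
from collections.abc import Hashable
from typing import Dict, List, Set, Tuple

# Sketch: saturation + naive partition refinement (not CWB-NC's algorithm).
LTS = Dict[Hashable, List[Tuple[str, Hashable]]]
TAU, EPS = "tau", "eps"  # EPS labels the weak "no visible action" move


def _tau_closure(lts: LTS, s: Hashable) -> Set[Hashable]:
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for a, v in lts[u]:
            if a == TAU and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def _saturate(lts: LTS) -> Dict[Hashable, Set[Tuple[str, Hashable]]]:
    """Weak moves: s ==eps==> t (tau*) and s ==a==> t (tau* a tau*)."""
    sat = {}
    for s in lts:
        moves = set()
        for s1 in _tau_closure(lts, s):
            moves.add((EPS, s1))
            for a, t in lts[s1]:
                if a != TAU:
                    moves.update((a, t1) for t1 in _tau_closure(lts, t))
        sat[s] = moves
    return sat


def weakly_bisimilar(lts1: LTS, p: Hashable, lts2: LTS, q: Hashable) -> bool:
    """Decide p ≈ q by partition refinement on the saturated disjoint union."""
    union: LTS = {}
    for tag, lts in ((1, lts1), (2, lts2)):
        for s, ts in lts.items():
            union[(tag, s)] = [(a, (tag, t)) for a, t in ts]
    sat = _saturate(union)
    block = {s: 0 for s in union}
    while True:
        ids: Dict[tuple, int] = {}
        new_block = {
            s: ids.setdefault(
                (block[s], frozenset((a, block[t]) for a, t in sat[s])), len(ids))
            for s in union
        }
        if len(ids) == len(set(block.values())):  # no block was split: stable
            return new_block[(1, p)] == new_block[(2, q)]
        block = new_block
```

As a check, a.0 and τ.a.0 come out weakly bisimilar, while the classic pair a + b and a + τ.b does not, because the τ step in the latter silently discards the option a.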

We now want to show that the SpecFull process is equivalent to SystemFull. However, since the LTS of the SystemFull process could not be built automatically by the tool (because of the state explosion problem), we split our process descriptions into components and show that some processes of the system are observationally equivalent to the corresponding processes of the specification. Our specification (Fig. 3) has processes that describe the behaviour of lactose entrance in the cell (Fig. 8), negative regulation (Fig. 9) and positive regulation (Fig. 10). Thus, we will show that each concrete component is equivalent to its corresponding abstract version.

Fig. 10. CCS specification of positive regulation.

Table 1
Information about equivalence relation verification

Equivalence           Answer   Running time
Entrance              True     < 0.001 s
Negative regulation   True     < 0.001 s
Positive regulation   True     < 0.001 s

Table 2
Automaton size for each process description

Process                  No. of states   No. of transitions   Time to build
SpecEntrance             3               5                    < 0.001 s
SysEntrance              15              32                   < 0.001 s
SpecNegReg               18              19                   < 0.001 s
SysNegReg                26              27                   < 0.001 s
SpecPosReg               14              16                   < 0.001 s
SysPosReg                21              23                   < 0.001 s
Beta_galactosidase_low   10              12                   < 0.001 s
Allolactose_none         2               3                    < 0.001 s
Promoter_iu              7               11                   < 0.001 s
Glucose_high             7               11                   < 0.001 s
SpecFull                 35473           168116               16.312 s
SystemFull               > 1 million     Unknown              > 2 h

The SysEntrance process is described as

(Lactose_out|Galactoside_permease|Lactose_in_none)\{elac, ilac}.

All process descriptions referenced by some process are also included in the system (for example, processes Lactose_in_low and Lactose_in_high).
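The expression above combines parallel composition with restriction. As a rough illustration of how such a term yields an LTS, and of why state counts can multiply (Table 2), the Python sketch below builds the reachable LTS of (P | Q)\R for two finite component LTSs. The notation is our assumption, not CWB-NC's input syntax: a co-name (output) is written with a leading apostrophe, so "'elac" complements "elac", and a synchronization produces "tau".

```python
from collections.abc import Hashable
from typing import Dict, List, Set, Tuple

# Assumed notation: a leading apostrophe marks a co-name (output action).
LTS = Dict[Hashable, List[Tuple[str, Hashable]]]
TAU = "tau"


def co(action: str) -> str:
    """Complementary action: 'x <-> x."""
    return action[1:] if action.startswith("'") else "'" + action


def channel(action: str) -> str:
    """Channel name of an action, without the co-name marker."""
    return action.lstrip("'")


def compose(l1: LTS, p: Hashable, l2: LTS, q: Hashable,
            restricted: Set[str]) -> LTS:
    """Reachable part of (p | q) restricted on the channel names `restricted`."""
    result: LTS = {}
    todo = [(p, q)]
    while todo:
        state = todo.pop()
        if state in result:
            continue
        s1, s2 = state
        moves = [(a, (t1, s2)) for a, t1 in l1[s1]]     # left component moves
        moves += [(b, (s1, t2)) for b, t2 in l2[s2]]    # right component moves
        moves += [(TAU, (t1, t2))                        # handshakes
                  for a, t1 in l1[s1] for b, t2 in l2[s2]
                  if a != TAU and b == co(a)]
        moves = [(a, t) for a, t in moves               # apply restriction
                 if a == TAU or channel(a) not in restricted]
        result[state] = moves
        todo.extend(t for _, t in moves)
    return result
```

Terms with more components, such as SysNegReg, could be built by nesting `compose`, with states becoming nested tuples; without restriction, each added component can multiply the number of reachable states.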

The SysNegReg process is described as

(Lac_repressor_off|Operator1|Operator2|Operator3)\{bo1, bo2, bo3, ubo1, ubo23s, ubo23e}.

And the SysPosReg process is described as

(CRP_off|C_AMP_low)\{acte, iacte, bs, ubs, bc, ubc}.

The answer and the running time of each equivalence verification are shown in Table 1. We also show the size of the automaton built for each process description in Table 2. Note that the process SystemFull took more than 2 h of running time in CWB-NC; after that, CWB-NC exited with a memory allocation failure on a computer with 4 GB of RAM. The automaton was not entirely built, but when memory ran out it already had more than 1 million states.


We have minimized all automata related to the specifications of lac regulation with respect to the observational equivalence relation, so for each specification we have an automaton that is observationally equivalent to it. Fig. 11 shows some of them. All automata start from state 0 (zero).
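One simple way to obtain such a reduced automaton, sketched below in Python, is to group states into weak-bisimilarity classes (saturation followed by partition refinement), build the quotient, and drop τ moves that stay inside a class. The resulting quotient should remain observationally equivalent to the original, since each state is related to its class. This is our illustration, not CWB-NC's minimisation, which may produce a different and possibly smaller automaton; the class numbering here does not guarantee that the start state is 0. The sketch assumes that the silent action is written "tau" and that an LTS maps each state to its list of (action, successor) pairs.

```python
from collections.abc import Hashable
from typing import Dict, List, Set, Tuple

# Illustrative reduction, not CWB-NC's minimisation algorithm.
LTS = Dict[Hashable, List[Tuple[str, Hashable]]]
TAU, EPS = "tau", "eps"


def _tau_closure(lts: LTS, s: Hashable) -> Set[Hashable]:
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for a, v in lts[u]:
            if a == TAU and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def weak_classes(lts: LTS) -> Dict[Hashable, int]:
    """Map each state to the index of its weak-bisimilarity class."""
    sat = {}
    for s in lts:  # saturate: eps moves (tau*) and weak visible moves
        moves = set()
        for s1 in _tau_closure(lts, s):
            moves.add((EPS, s1))
            for a, t in lts[s1]:
                if a != TAU:
                    moves.update((a, t1) for t1 in _tau_closure(lts, t))
        sat[s] = moves
    block = {s: 0 for s in lts}
    while True:  # refine blocks by (old block, weak signature)
        ids: Dict[tuple, int] = {}
        new = {s: ids.setdefault(
                   (block[s], frozenset((a, block[t]) for a, t in sat[s])),
                   len(ids))
               for s in lts}
        if len(ids) == len(set(block.values())):
            return new
        block = new


def quotient(lts: LTS, start: Hashable) -> Tuple[Dict[int, Set[Tuple[str, int]]], int]:
    """Quotient LTS over weak classes; tau moves within one class are dropped."""
    cls = weak_classes(lts)
    q: Dict[int, Set[Tuple[str, int]]] = {c: set() for c in cls.values()}
    for s, ts in lts.items():
        for a, t in ts:
            if not (a == TAU and cls[s] == cls[t]):
                q[cls[s]].add((a, cls[t]))
    return q, cls[start]
```

For example, τ.a.0 reduces to a two-state automaton with a single a-transition, whereas a + τ.b keeps all three of its states because its τ step is not inert.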

6. Related work

Uhrmacher et al. have written a comprehensive review of models for systems biology in a recent paper [28]. They classified the existing models in a three-dimensional space: continuous and discrete; qualitative and quantitative; stochastic and deterministic. In addition, they point out the usage of single and multiple levels of system modelling. We do not cover all types of models reviewed in their work; instead, we have chosen some papers closer to what we propose here.

Yildirim and Mackey used simulation to validate their model of the lac operon, based on non-linear differential delay equations. Because their model is quantitative, it yields much more detailed data. However, this kind of model only allows simulation and the discovery of some steady states of the system; they report that no full stability analysis of the steady states was possible in their model [12]. When we use qualitative models, we lose some accuracy but gain the ability to analyse the system structure.⁸

An approach similar to ours was proposed by Ciobanu et al. for the Albers–Post mechanism of ion transport across the membrane (the Na+ pump). They used the π-calculus, a different process algebra, but they only checked their systems for deadlocks⁹ [11]. The π-calculus has powerful specification mechanisms, like replication and reconfiguration (the interface of a process may change dynamically during execution). However, these often lead to infinite-state systems, and therefore (automatic) model checking and observational equivalence checking can only be used in particular cases.

Chabrier-Rivier and colleagues have modelled Kohn's compilation on mammalian cell-cycle control in a new modelling language [13]. They used CTL to check system properties related to metabolic pathways. Chabrier-Rivier et al. presented the Biochemical Abstract Machine (BIOCHAM) in a later paper [29]. It is based on their previously proposed language and supports modelling, simulation and checking of temporal properties. Another paper from this research group reports the application of machine learning techniques to infer new molecular interaction rules from temporal properties using BIOCHAM [30]. They presented an ad hoc machine learning algorithm to discover bio-molecular interaction rules from a partial model and constraints on the system behaviour. These constraints were expressed in temporal logic with positive (expected properties) and negative (properties to avoid) formulae.

8 This justifies our use of more than one process description to model different concentrations (or activity levels) of a substance.

9 A process deadlocks if its transition system has states with no successors.

Mardare and Priami have advanced the model checking of biological systems by proposing a translation of the Ambient calculus into a Kripke structure using labelled syntax trees. They implemented some algorithms in NuSMV (a model checking tool) to verify properties written in CTL* (an extension of CTL) [31].

Another possible approach to property verification of biological systems is probabilistic model checking, a formal verification technique for the analysis of systems that exhibit stochastic behaviour. Kwiatkowska et al. used the probabilistic model checking tool PRISM to study cell-cycle control in eukaryotes [32]. Calder et al. have used the PRISM model checker to analyse signal transduction pathways. Their example pathway was described using continuous-time Markov chains (CTMC), and the quantitative temporal biological queries were written in continuous stochastic logic (CSL) [33].

Danos and Krivine proposed an extension of CCS called Reversible CCS [34]. The main idea is a backtracking mechanism in which each process records in its memory all the actions it performs. They argue that, as reversibility is the rule in biological interaction, the proposed process algebra could be used for modelling biological systems. However, they provide neither a simulation tool nor a model checker for Reversible CCS.

7. Conclusion

In this paper, we have shown a model of lactose operon regulation using the CCS process algebra. We obtained a formal description for this regulatory system that could be analysed to verify system properties. Moreover, CCS was used to assure compatibility among different views of the same system (via computing observational equivalences).

One of the problems with formalization is how to ensure that the model is a faithful abstraction of the real system. Of course, this cannot be formally proven. There are mainly two complementary ways to validate a model: simulation (or testing) and analysis (for example, model checking). As in software development, simulation typically shows how the system behaves in the simulated cases, but these results usually cannot be safely generalized. In contrast, verification can answer whether or not a system has a particular property, not in one execution but in all possible ones. In this paper, we used automatic verification as a means of validating a formal model. In this sense, each expected property that could be checked strengthened our belief that the proposed model is faithful. A second step, once one is confident in the formal model, would be to use verification (or other kinds of analysis) to obtain new information about the system (trying to prove properties whose answer is unknown). This can provide an inexpensive means of determining which laboratory experiments should be carried out.

To cope with the complexity of the model, we proposed to use a well-established concept of process algebras, namely observational equivalence, to achieve consistency between multiple views of the same components. A system is viewed as a composition of components collaborating to perform some tasks. We then use multiple views of each component, at different levels of abstraction, and prove (automatically) that these views are consistent (observationally equivalent). This means that properties of observable behaviour that are preserved by observational equivalence, once verified for the abstract view, also hold for the more concrete one. Then, to prove properties, instead of using a system with all components completely specified, we use a system in which some components are replaced by their abstract versions. This can reduce the size of the model and improve both readability and the feasibility of property verification. Using equivalences is a good way to achieve a modular construction of systems by refining different components separately. Since biological systems tend to be very complex and large, this approach seems to be an adequate formalization of the existing ad hoc abstraction mechanisms.

Fig. 11. LTSs for observationally equivalent minimized (a) SpecEntrance; (b) SpecPosReg; and (c) SpecNegReg processes.

We can also evaluate whether our model is correct by studying “mutant systems”: we can knock out one of the components of lac operon regulation, check what our model predicts, and compare our results with those reported in the literature. For example, it is known that if the Operator1 site is not used by the Lac repressor, the lac operon will not be repressed; that is, transcription repression will be very low.

To continue our work, some future steps are the inclusion of other kinds of interactions (such as metabolism) and of information about where these interactions occur in the cell. In addition, we want to use the MONET database [35] to translate biological data into CCS (or another process algebra) in a semi-automatic way. This task could be accomplished by user selection of some data sets from MONET; so far, only metabolic pathways are available. If regulatory and signalling pathways were included in MONET, this task could be fully automated.

Although the CCS language is precise, it might not be the most convenient way for biologists to express their models and properties about them. A graphical language tailored to the application field, that is, biological pathways, could improve the acceptability and usage of the proposed approach.

A good model must take all relevant biological information into account and present results that are compatible with those obtained in vitro. The main goal of our work is to analyse biochemical processes. Using CCS, CTL and observational equivalence, we may be able to verify some properties of biological systems and to compare a tentative model of a system with its (possibly) known specification, shedding light on relevant questions about pathways, such as the possibilities of energy generation given a certain substance in the cell and the prospect of feasible and unfeasible chemical reactions in a pathway of a given species. The possibility of having multiple compatible views of the same system at different levels of abstraction within one formalism seems to provide an adequate way to cope with the complexity of biological systems.

References

[1] JigCell Website Glossary. http://jigcell.biol.vt.edu/glossary.html.[2] E.O. Voit, Computational Analysis of Biochemical Systems, Cambridge

University Press, United Kingdom, 2000.[3] H.D. Jong, Modeling and simulation of genetic regulatory systems: a

literature review, J. Comput. Biol. 9 (1) (2002) 67–103.[4] M. Antoniotti, A. Policriti, N. Ugel, B. Mishra, Model building and

model checking for biochemical processes, Cell Biochem. Biophys. 38(2003) 271–286.

[5] V.N. Reddy, Modeling biochemical pathways: a discrete event systems approach, Master's Thesis, University of Maryland, 1994.

[6] N. Lemke, F. Herédia, C.K. Barcellos, A.N. Reis, J.C.M. Mombach, Essentiality and damage in metabolic networks, Bioinformatics 20 (1) (2004) 115–119.

[7] R. Milner, Communication and Concurrency, Prentice-Hall, New York, 1989.

[8] R. Milner, J. Parrow, A calculus for mobile processes I, Inf. Comput. 100 (1992) 1–40.

[9] R. Milner, J. Parrow, D. Walker, A calculus for mobile processes II, Inf. Comput. 100 (1992) 41–77.

[10] A. Regev, W. Silverman, E. Shapiro, Representation and simulation of biochemical processes using the π-calculus process algebra, in: Pacific Symposium on Biocomputing, vol. 6, World Scientific Press, Singapore, 2001, pp. 459–470.

[11] G. Ciobanu, V. Ciubotariu, B. Tanasa, A π-calculus model of the Na pump, Genome Inf. 13 (2002) 469–471.

[12] N. Yildirim, M.C. Mackey, Feedback regulation in the lactose operon: a mathematical modelling study and comparison with experimental data, Biophys. J. 84 (2003) 2841–2851.

[13] N. Chabrier-Rivier, M. Chiaverini, V. Danos, F. Fages, V. Schächter, Modeling and querying biomolecular interaction networks, Theor. Comput. Sci. 325 (2004) 25–44.

[14] M.C. Pinto, L. Foss, J.C.M. Mombach, L. Ribeiro, Modeling and property verification of lactose operon regulation, Lecture Notes in Bioinformatics 3594 (2005) 95–106.

[15] D.L. Nelson, M.M. Cox, Lehninger Principles of Biochemistry, fourth ed., W. H. Freeman, New York, 2004.

[16] F. Jacob, J. Monod, Genetic regulatory mechanisms in the synthesis of proteins, J. Mol. Biol. 3 (1961) 318–389.

[17] P. Wong, S. Gladney, J.D. Keasling, Mathematical model of the lac operon: inducer exclusion, catabolite repression, and diauxic growth on glucose and lactose, Biotechnol. Prog. 13 (2) (1997) 132–143.

[18] P.J.E. Goss, J. Peccoud, Quantitative modeling of stochastic systems in molecular biology by using stochastic Petri nets, Proc. Natl. Acad. Sci. USA 95 (12) (1998) 6750–6755.

[19] P. Lecca, C. Priami, P. Quaglia, B. Rossi, C. Laudanna, G. Constantin, A stochastic process algebra approach to simulation of autoreactive lymphocyte recruitment, Simulation 80 (6) (2004) 273–288.

[20] B. Lewin, Genes VIII, Pearson Prentice Hall, USA, 2004.

[21] R. Keller, Formal verification of parallel programs, Commun. ACM 19 (7) (1976) 371–384.

[22] J.M.G. Vilar, C.C. Guet, S. Leibler, Modeling network dynamics: the lac operon, a case study, J. Cell Biol. 161 (3) (2003) 471–476.

[23] The CWB-NC website. http://www.cs.sunysb.edu/~cwb/.

[24] E.A. Emerson, C.-L. Lei, Efficient model checking in fragments of the propositional mu-calculus, in: Symposium on Logic in Computer Science, 1, 1986, Cambridge, Massachusetts, USA, IEEE Computer Society Press, pp. 459–470.

[25] D. Kozen, Results on the propositional mu-calculus, Theor. Comput. Sci. 27 (3) (1983) 333–354.

[26] E.M. Clarke, E.A. Emerson, A.P. Sistla, Automatic verification of finite-state concurrent systems using temporal logic specifications, ACM Trans. Programming Languages Syst. 8 (2) (1986) 244–263.

[27] R. Cleaveland, J. Parrow, B. Steffen, The concurrency workbench: a semantics-based tool for the verification of concurrent systems, ACM Trans. Programming Languages Syst. 15 (1) (1993) 36–72.

[28] A.M. Uhrmacher, D. Degenring, B.P. Zeigler, Discrete event multi-level models for systems biology, Trans. Comput. Syst. Biol. 1 (2005) 66–89.

[29] N. Chabrier-Rivier, F. Fages, S. Soliman, The biochemical abstract machine BIOCHAM, in: Computational Methods in Systems Biology (CMSB 2004), Lecture Notes in Computer Science, vol. 3082, Springer, Berlin, pp. 172–191.

[30] L. Calzone, N. Chabrier-Rivier, F. Fages, L. Gentils, S. Soliman, Machine learning bio-molecular interactions from temporal logic properties, in: Proceedings of Computational Methods in Systems Biology (CMSB 2005), 2005.

[31] R. Mardare, C. Priami, Logical analysis of biological systems, Fundam. Informaticae 64 (2005) 271–285.

[32] M.Z. Kwiatkowska, G. Norman, D. Parker, Probabilistic model checking in practice: case studies with PRISM, SIGMETRICS Perform. Eval. Rev. 32 (4) (2005) 16–21.

[33] M. Calder, V. Vyshemirsky, D. Gilbert, R. Orton, Analysis of signalling pathways using the PRISM model checker, in: Proceedings of Computational Methods in Systems Biology (CMSB 2005), 2005, pp. 179–190.

[34] V. Danos, J. Krivine, Reversible communicating systems, in: Concurrency Theory (CONCUR 2004), Lecture Notes in Computer Science, vol. 3170, Springer, Berlin, 2004, pp. 292–307.

[35] E. Battistella, J.G.C. Souza, C.K. Barcellos, N. Lemke, J.C.M. Mombach, An integrated model for cellular analysis, Genet. Mol. Res. 4 (2005) 506–513.

Marcelo Cezar Pinto was born in Santa Maria, Brazil, in 1978. He received the degree of Master in Computer Science from the Universidade Estadual de Campinas, Brazil, in 2002. He is currently a Ph.D. student at Universidade Federal do Rio Grande do Sul, Brazil. His main research interests are computational biology, systems biology, formal methods and model checking.

Luciana Foss received her Bachelor degree in Computer Science in 2000 from the Caxias do Sul University, Brazil, and her M.Sc. in Computer Science from Universidade Federal do Rio Grande do Sul (UFRGS) in 2003. She is currently a Ph.D. candidate at UFRGS and her advisors are Leila Ribeiro (UFRGS) and Andrea Corradini (University of Pisa, Italy). Her main research interests include formal specification and verification of concurrent and distributed systems, formal semantics, graph transformation systems and modularization and composition of systems.

José Carlos Merino Mombach was born in Porto Alegre, Brazil, in 1964. He received the degree of Doctor in Sciences from the Universidade Federal do Rio Grande do Sul, Brazil, in 1997. Dr. Mombach has been an Adjunct Professor at the Graduate School in Applied Computing, Universidade do Vale do Rio dos Sinos, Brazil, since 1999. Since 2002 he has been a researcher at the Laboratório de Bioinformática e Biologia Computacional of the Universidade do Vale do Rio dos Sinos. His main research interests are bioinformatics, computational biology, and complex systems.

Leila Ribeiro is Associate Professor at the Theoretical Computer Science Department of the Federal University of Rio Grande do Sul (UFRGS), Brazil. She received her B.Sc. and M.Sc. degrees in Computer Science from UFRGS in 1988 and 1991, respectively, and her Ph.D. degree in Informatics in 1996 from the Technical University of Berlin, Germany. In 1999, in recognition of her contributions to the area of Informatics, she was honoured with the Santista Prize in Informatics in the young scientist category (sponsored by the Brazilian Bunge Foundation). Her main interests are formal methods, software engineering, bioinformatics, and the modelling and analysis of large distributed/concurrent systems.