Pattern storage in gene-protein networks. Ronald Westra, Department of Mathematics, Maastricht University
Feb 02, 2016
1
Pattern storage in
gene-protein networks
Ronald Westra
Department of Mathematics
Maastricht University
2
Items in this Presentation
1. Problem formulation
2. Modeling of gene-protein interactions
3. Information Processing in Gene-Protein Networks
4. Information Storage in Gene-Protein Networks
5. Conclusions
3
1. Problem formulation
How much genome is required for an organism to survive in this World?
Some observations ...
4
Minimal genome sizes

Organism                 Size     Genome    Genes/ORFs   Coding DNA   Lifestyle
Mycoplasma genitalium    500 nm   580 Kbp   477 genes    74%          Obligate parasitic endosymbiont
Nanoarchaeum equitans    400 nm   460 Kbp   487 ORFs     95%          Obligate parasitic endosymbiont
SARS CoV                 100 nm   30 Kbp    5 ORFs       98%          RNA virus (coronavirus)
5
Organisms like Mycoplasma genitalium, Nanoarchaeum equitans, and the SARS coronavirus are able to exhibit a large number of complex and well-tuned behavioural patterns despite an extremely small genome.
A pattern of behaviour here is the adequate conditional sequence of responses of the gene-protein interaction network to an external input: light, oxygen stress, pH, pheromones, and numerous organic and inorganic molecules.
6
Problem formulation
Questions:
* How do gene-protein networks perform computations and how do they process real-time information?
* How is information stored in gene-protein networks?
* How do processing speed, computation power, and storage capacity relate to network properties?
7
CENTRAL THOUGHT [1]
What is the capacity of a gene-protein network to store input-output patterns, where the stimulus is the input and the behaviour is the output?
How does the pattern storage capacity of an organism relate to the size of its genome n, and the number of external stimuli m?
8
CENTRAL THOUGHT [2]
Conjecture:
The task of reverse engineering a gene regulatory network from a time series of m observations is actually identical to the task of storing m patterns in that network.
In the first case an engineer tries to design a network that fits the observations; in the second case Nature selects those networks/organisms that best perform the input-output mapping.
9
Requirements
For studying the pattern storage capacity of a gene-protein interaction system we need:
1. a suitable parametrized formal model
2. a method for fixing the model parameters given a set of input-output patterns
We will visit these items in the following slides ...
10
2. Modeling the Interactions between Genes and Proteins
A prerequisite for the successful reconstruction of gene-protein networks is a suitable model of the dynamics of their interactions.
11
Components in Gene-Protein networks
Genes: ON/OFF-switches
RNA & proteins: vectors of information exchange between genes
External inputs: interact with higher-order proteins
12
General state space dynamics
The evolution of the n-dimensional state-space vector x (gene expressions) depends on the p-dimensional inputs u, the parameters θ, and Gaussian white noise ξ.
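A generic way to write this dependence is sketched below. The function name f and the additive placement of the noise term are assumptions made for illustration; the slide text only names x, u, θ and ξ.

```latex
\dot{\mathbf{x}}(t) = f\bigl(\mathbf{x}(t), \mathbf{u}(t) \mid \theta\bigr) + \boldsymbol{\xi}(t),
\qquad \mathbf{x} \in \mathbb{R}^{n},\quad \mathbf{u} \in \mathbb{R}^{p}
```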
13
[Figure: example of a general-dynamics network topology, showing external inputs, genes/proteins, input-coupling, and interaction-coupling.]
14
The general case is too complex
Strongly dependent on unknown microscopic details
Relevant parameters are unidentified and thus unknown
Therefore approximate interaction potentials and qualitative methods seem appropriate
15
1. Linear stochastic state-space models
Following Yeung et al. 2003 and others
$$\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u} + \boldsymbol{\xi}(t)$$
x : the vector (x1, x2,..., xn) where xi is the relative gene expression of gene i
u : the vector (u1, u2,..., up) where ui is the value of external input i (e.g. a toxic agent)
ξ(t) : white Gaussian noise
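A minimal simulation sketch of the linear stochastic model may help make the roles of A, B, u and the noise concrete. The Euler-Maruyama integrator, the function name `simulate_linear_network`, and the 3-gene matrices are illustrative assumptions, not taken from the presentation.

```python
import numpy as np

def simulate_linear_network(A, B, u, x0, dt=0.01, steps=1000, noise_std=0.0, seed=0):
    """Integrate dx/dt = A x + B u + xi(t) with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    trajectory = np.empty((steps + 1, x.size))
    trajectory[0] = x
    for t in range(steps):
        drift = A @ x + B @ u
        # White Gaussian noise enters with a sqrt(dt) scaling.
        noise = noise_std * np.sqrt(dt) * rng.standard_normal(x.size)
        x = x + dt * drift + noise
        trajectory[t + 1] = x
    return trajectory

# Hypothetical 3-gene cascade driven by one external input (values are made up).
A = np.array([[-1.0, 0.0, 0.0],
              [0.5, -1.0, 0.0],
              [0.0, 0.8, -1.0]])
B = np.array([[1.0], [0.0], [0.0]])
u = np.array([2.0])

traj = simulate_linear_network(A, B, u, x0=np.zeros(3), steps=3000)
# Noise-free fixed point, where A x + B u = 0.
x_steady = -np.linalg.solve(A, B @ u)
print(traj[-1], x_steady)
```

With `noise_std=0.0` the trajectory settles on the fixed point of the deterministic part; setting `noise_std` above zero adds fluctuations around it.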
16
2. Piecewise Linear Models
Following Mestl, Plahte, Omholt 1995 and others
b_il : sum of step functions s+, s–
17
3. More complex non-linear interaction models
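Example: including quadratic terms.
[Equations garbled in extraction. Recoverable structure: a state equation for dx/dt with a linear term in A, a quadratic term x^T R x, and an input term in B, u and a; and a rate equation for da_i/dt with terms in a_i and (1 - a_i) whose coefficients depend on x.]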
18
Our mathematical framework for non-linear gene-protein interactions
[The equations of the preceding slide are repeated here.]
19
3. Information processing in sparse, hierarchic gene-protein networks
Consider a network as described before with only a few connections (= sparse) and where a few genes/proteins control a considerable number of the others (= hierarchic).
20
Information Processing in random sparse Gene-Protein Interactions
[Figure: a random sparse network with n = 64, k = 2, and the largest cluster therein.]
21
Information Processing in random sparse Gene-Protein Interactions
Now consider the information processing time (= #iterations) necessary to reach all nodes (proteins) as a function of the number of connections (= #non-zero elements) in the network.
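The measurement described here can be sketched as a small experiment. The synchronous spreading rule, the choice of node 0 as the starting node, and the helper names below are assumptions for illustration; the presentation does not specify how the iterations were counted.

```python
import random

def random_sparse_network(n, n_connections, seed=0):
    """Directed random network with a fixed number of non-zero couplings."""
    rng = random.Random(seed)
    all_edges = [(i, j) for i in range(n) for j in range(n) if i != j]
    edges = rng.sample(all_edges, n_connections)
    successors = {i: [] for i in range(n)}
    for src, dst in edges:
        successors[src].append(dst)
    return successors

def iterations_to_reach_all(successors, start=0):
    """Spread a signal one step per iteration.

    Returns the number of iterations until every node is reached,
    or None if some node is never reached from `start`.
    """
    reached = {start}
    frontier = {start}
    iterations = 0
    while frontier:
        frontier = {j for i in frontier for j in successors[i]} - reached
        if frontier:
            reached |= frontier
            iterations += 1
    return iterations if len(reached) == len(successors) else None

n = 64
for n_connections in (64, 128, 256, 512, 1024):
    runs = [iterations_to_reach_all(random_sparse_network(n, n_connections, seed=s))
            for s in range(20)]
    finished = [r for r in runs if r is not None]
    mean = sum(finished) / len(finished) if finished else float("nan")
    print(n_connections, len(finished), mean)
```

Each printed row gives the number of connections, how many of the 20 random networks were fully reachable, and the mean number of iterations among those.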
22
phase transition from slow to fast processing
24
4. Memory storage in gene-protein networks
Philosophy: Information is stored in the network topology (weights, sparsity, hierarchy) and the system dynamics.
* Ben-Hur, Siegelmann: Computation in Gene Networks, Chaos, January 2004
* Skarda and Freeman: How brains make chaos in order to make sense of the world, Behavioral and Brain Sciences, Vol. 10, 1987
25
Memory storage in gene-protein networks
We assume a hierarchic, non-symmetric, and sparse gene/protein network (with k out of n possible connections per node) with linear state-space dynamics.
Suppose we want to store M patterns in the network.
26
Linearized form of a subsystem
The first-order linear approximation of the system separates the state vector x and the inputs u:
$$\frac{d\mathbf{x}}{dt} = A\mathbf{x} + B\mathbf{u}$$
27
input-output pattern:
The organism has (evolutionarily) learned to react to an external input u (e.g. toxic agent, viral infection) with a gene-protein activity x(t).
This combination (x,u) is the input-output PATTERN
28
Memory Storage = Network Reconstruction
Using these definitions it is possible to map the problem of pattern storage to the * solved * problem of gene network reconstruction with sparse estimation
29
Information Pattern:
Now, suppose that we have M patterns we want to store in the network:
30
Pattern Storage: method 1.0
The relation between the desired patterns (state derivatives, states and inputs) defines constraints on the matrices A and B, which have to be computed.
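One way to write these constraints, assuming the M patterns are collected column by column into data matrices (the column-stacking convention and the names of the stacked matrices are assumptions for illustration), is:

```latex
X = [\mathbf{x}_1, \dots, \mathbf{x}_M] \in \mathbb{R}^{n \times M}, \qquad
U = [\mathbf{u}_1, \dots, \mathbf{u}_M] \in \mathbb{R}^{p \times M}, \qquad
\dot{X} = [\dot{\mathbf{x}}_1, \dots, \dot{\mathbf{x}}_M] \in \mathbb{R}^{n \times M}

\dot{X} = A X + B U
```

Under this convention the patterns give nM linear equations for the n(n + p) unknown entries of A and B, so for M < n + p the system is underdetermined and an additional criterion such as sparsity is needed to select a solution.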
31
Pattern Storage: method 1.0
Computing the optimal A and B for storing the patterns
The matrices A and B are sparse (most elements are zero).
Using techniques from robust/sparse optimization, this problem can be defined as:
$$\min_{A,B} \|A\|_1 + \|B\|_1 \quad \text{subject to:} \quad \dot{X} = AX + BU$$
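The L1 problem above can be posed as a set of linear programs, one per row of [A B]. The following is a minimal sketch using SciPy's `linprog`; the function name `store_patterns_l1`, the row-wise decomposition, and the random test data are assumptions for illustration and not necessarily the solver used by the author.

```python
import numpy as np
from scipy.optimize import linprog

def store_patterns_l1(X_dot, X, U):
    """Find A, B with X_dot = A X + B U that minimise ||A||_1 + ||B||_1.

    The objective and the constraints decouple over the rows of [A B],
    so each row is solved as an independent linear program.
    """
    n, M = X.shape
    Z = np.vstack([X, U])             # (n + p) x M regressor matrix
    d = Z.shape[0]
    # Split each weight as w = w_plus - w_minus with both parts >= 0,
    # so that sum(w_plus + w_minus) equals the L1 norm at the optimum.
    c = np.ones(2 * d)
    A_eq = np.hstack([Z.T, -Z.T])     # M x 2d
    W = np.zeros((n, d))
    for i in range(n):
        res = linprog(c, A_eq=A_eq, b_eq=X_dot[i], bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(f"row {i}: {res.message}")
        W[i] = res.x[:d] - res.x[d:]
    return W[:, :n], W[:, n:]

# Hypothetical example: a sparse random network and M random patterns.
rng = np.random.default_rng(1)
n, p, M = 20, 5, 12
A_true = np.zeros((n, n))
B_true = np.zeros((n, p))
for i in range(n):
    A_true[i, rng.choice(n, 2, replace=False)] = rng.normal(size=2)
    B_true[i, rng.integers(p)] = rng.normal()
X = rng.normal(size=(n, M))
U = rng.normal(size=(p, M))
X_dot = A_true @ X + B_true @ U

A_hat, B_hat = store_patterns_l1(X_dot, X, U)
print(np.count_nonzero(np.abs(A_hat) > 1e-8), np.count_nonzero(np.abs(B_hat) > 1e-8))
```

The returned matrices reproduce all stored patterns exactly (up to solver tolerance); whether they coincide with the generating matrices depends on the number of patterns relative to the sparsity.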
32
[Figure: number of retrieval errors as a function of the number of nonzero entries k, with M = 150 patterns and N = 50000 genes; the critical value k_C is marked.]
1st order phase transition from error-free memory retrieval
33
[Figure: number of retrieval errors versus M with fixed N = 50000 and k = 10; the critical value k_C is marked.]
1st order phase transition to error-free memory retrieval
34
[Figure: critical number of patterns M_crit versus the problem size N.]
35
Pattern Storage: method 2.0
A pattern corresponds to a converged state of the system, hence:
$$\frac{d\mathbf{x}}{dt} = 0$$
Therefore a sparse system Σ = {A,B} is sought that maps the inputs to the patterns {U,X}, which leads to:
36
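Computing optimal sparse matrices
LP:
$$\{A^*, B^*\} = \arg\min_{A \in \mathbb{R}^{n^2},\ B \in \mathbb{R}^{np}} \lambda \|A\|_1 + (1-\lambda)\|B\|_1$$
subject to:
1. condition for stationary equilibrium: $\dot{X} = AX + BU = 0$
2. condition to avoid A = B = 0: [constraint garbled in extraction; it involves q^T, A, B and the value 1]
3. avoid A = 0 by using degradation of proteins and auto-decay of genes: diag(A) < 0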
37
The sparsity in A and B
The sparsity in the gene/protein interaction matrix A is kA: the number of non-zero elements in A.
This can be scaled to the size of A, N, and we obtain:
pA = kA/N.
Similarly for the input-coupling B:
pB = kB/P.
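As a small illustration, the sparsity measures can be computed directly from given matrices. The sketch below assumes that the scaling divides by the total number of entries of the matrix, so that the result is the fraction of possible connections that are present; the function name `connection_fraction`, the tolerance, and the example matrices are hypothetical.

```python
import numpy as np

def connection_fraction(matrix, tol=1e-9):
    """Return (k, p): the number of non-zero entries and their fraction."""
    matrix = np.asarray(matrix, dtype=float)
    # Entries smaller than tol in magnitude are treated as absent connections.
    k = int(np.count_nonzero(np.abs(matrix) > tol))
    return k, k / matrix.size

# Hypothetical 3-gene network with 2 external inputs.
A = np.array([[-1.0, 0.0, 0.0],
              [0.5, -1.0, 0.0],
              [0.0, 0.0, -1.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])

k_A, p_A = connection_fraction(A)
k_B, p_B = connection_fraction(B)
print(k_A, p_A, k_B, p_B)
```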
38
[Figure: Results. Panels for A (gene-gene) and B (input-gene).]
39
[Figure: panels for A (gene-gene) and B (input-gene).]
40
Sparsity versus the number of stored patterns
There are three distinct regions with different ‘learning’ strategies, separated by second-order phase transitions.
[Figure: sparsity of A (gene-gene) and B (input-gene) versus the number of stored patterns.]
41
sparsity versus the number of stored patterns
Region I: all information is exclusively stored in B.
Region II: information is preferably stored in A.
Region III: no clear preference for A or B.
[Figure: sparsity of A (gene-gene) and B (input-gene) versus the number of stored patterns; annotations mark the highest ‘order’ and the highest ‘disorder’.]
42
sparsity versus the number of stored patterns
I : ‘impulsive’
II : ‘rational’
III : ‘hybrid’.
[Figure: sparsity of A (gene-gene) and B (input-gene) versus the number of stored patterns.]
43
Phase transitions and entropy
The entropy of the macroscopic system relates to the relative fraction of connections pA and pB as:
$$S_A = -p_A \log p_A - (1-p_A)\log(1-p_A)$$
$$S_B = -p_B \log p_B - (1-p_B)\log(1-p_B)$$
As A and B are indiscernible the total entropy is:
$$S_M = S_A + S_B$$
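The macroscopic entropy is the sum of two binary entropies, which can be evaluated as in the sketch below. The function names and the use of the natural logarithm are assumptions; the slide does not state the base of the logarithm.

```python
import math

def binary_entropy(p):
    """-p log p - (1 - p) log(1 - p), with the limit value 0 at p = 0 and p = 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def macroscopic_entropy(p_A, p_B):
    """S_M = S_A + S_B for connection fractions p_A and p_B."""
    return binary_entropy(p_A) + binary_entropy(p_B)

for p_A, p_B in [(0.0, 0.2), (0.1, 0.1), (0.5, 0.5)]:
    print(p_A, p_B, macroscopic_entropy(p_A, p_B))
```

Each term is largest at a connection fraction of one half and vanishes for an empty or a fully connected matrix.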
44
Information entropy
The entropy of the microscopic system A relates to the degree distribution: the number of connections v_i of node i.
$$S = -p_A \log p_A - (1-p_A)\log(1-p_A) - \sum_{i=1}^{N} v_i \log v_i$$
Let P(v) be the probability that a given node has v outgoing connections:
$$\int_0^\infty P(v)\,dv = 1 \quad \text{and} \quad \int_0^\infty v\,P(v)\,dv = p_A$$
$$S = S_M - \int_0^\infty P(v)\, v \log v \, dv$$
45
Information entropy [2]
With P the Laplace distribution, for large networks the average entropy per node, s = S/N, converges to:
[Equation garbled in extraction: an expression in S_M, N, p_A and log p_A involving Euler's constant γ_E.]
With Euler's constant γ_E = 0.5772...
46
Information gain per node
This also allows the computation of the gain in information entropy if one connection is added:
[Equation garbled in extraction; only the symbols s and N survive.]
If this formalism is applied to our network structure we obtain:
47
Information gain per node
Left: the entropy S versus [quantity missing in source] for n = 100, p = 30, based on 1180 observations. Right: the gain in entropy for the same data set.
Again the three learning strategies are clearly visible: {impulsive, rational, hybrid}.
48
Relation between sparsities
Relation between pA = kA/n and pB = kB/p, averaged over 10116 measurements.
49
5. Conclusions
Non-linear time-invariant state-space models for gene-protein networks exhibit a range of complex behaviours for storing input-output patterns in sparse representations.
In this model, information processing (= computing) and pattern storage (= learning) exhibit multiple distinct 1st and 2nd order continuous phase transitions.
There are two second-order phase transitions that divide the network learning into three distinct regions: ‘impulsive’, ‘rational’, and ‘hybrid’.
50
Other members of the transnational University Limburg Bioinformatics Research Team
University of Hasselt (Belgium):
• Goele Hollanders (PhD student)
• Geert Jan Bex
• Marc Gyssens
University of Maastricht (Netherlands):
• Stef Zeemering (PhD student)
• Karl Tuyls
• Ralf Peeters
51
Discussion …
Ronald Westra
Department of Mathematics
Maastricht University