Transcription Networks And The Cell’s Functional Organization


Presenter: Roni Sharf

Topics to be discussed

• Introduction

• Transcription Networks

• Appendix C

• Albert-Laszlo Barabasi & Zoltan N. Oltvai Article

• Conclusions

Introduction and basic concepts

What is a cell?

The cell is an integrated device made of several thousand types of interacting proteins.

A protein is a nanometer-size molecular machine that carries out a specific task with exquisite precision.

A cell knows how to monitor its environment and to calculate which types of proteins, and how much of each, are needed in order to carry out a specific task.

Introduction and basic concepts (cont.)

For example, when a sugar is sensed by the cell, the cell begins to produce proteins that transport the sugar into the cell and make use of it.

Another example: when the cell is damaged, it produces the needed amount of repair proteins to fix the damage.

The information-processing function that determines which proteins are needed, and at what rate, is carried out by transcription networks.

The cognitive problem of the cell

Cells, being very complex, can sense many different signals, such as beneficial nutrients, harmful chemicals, temperature, biological signaling molecules from other cells and more. The cell responds to these signals by producing appropriate proteins that can address the given situation.

In order to represent these environmental states, the cell uses special proteins called transcription factors.

The cognitive problem of the cell (cont.)

Each active transcription factor can bind the DNA to regulate the rate at which specific target genes are read. The genes are read into mRNA, which is then translated into protein, which can act on the environment. The activities of the transcription factors in the cell are considered an internal representation of the environment.

The most important internal representations are of cell survival and growth. Of course there are others as well, which represent cell “starvation” or DNA damage. In each case the appropriate protein response is carried out.

The cognitive problem of the cell (cont.)

Elements of transcription networks

The interaction between transcription factors and genes is described by transcription networks.

Each gene is a stretch of DNA whose sequence encodes the information needed for production of the protein.

Elements of transcription networks (cont.)

Transcription of a gene is the process by which RNA polymerase (in short RNAp) produces mRNA that corresponds to that gene’s coding sequence.

The mRNA is then translated into a protein, also known as the gene product. The rate at which the gene is transcribed, that is, the number of mRNA molecules produced per unit time, is controlled by a regulatory region of DNA, known as the promoter, that precedes the gene.



Whereas RNAp acts on virtually all of the genes, changes in the expression of specific genes are due to transcription factors.

Each transcription factor modulates the transcription rate of a set of target genes. Transcription factors bind specific sites in the promoters of the regulated genes, thus affecting the transcription rate.

When bound, they change the probability per unit time that RNAp binds the promoter and produces an mRNA molecule.


The transcription factors thus affect the rate at which RNAp initiates transcription of the gene.

Transcription factors can act as activators, which increase the transcription rate of the gene.

[Figure: bound activator; increased transcription of gene Y]


Transcription factors can act as repressors, which reduce the transcription rate of the gene.

[Figure: bound repressor; no transcription]


Transcription factor proteins are themselves encoded by genes, which are regulated by other transcription factors, which in turn may be regulated by other transcription factors, and so on. This set of interactions forms a transcription network.

A transcription network describes all of the regulatory transcription interactions in a cell. In the network, the nodes are genes and the edges represent transcriptional regulation of one gene by the protein product of another gene.

An edge X → Y means that the product of gene X is a transcription factor protein that binds the promoter of gene Y to control the rate at which gene Y is transcribed. The inputs to the network are signals that carry information from the environment. For example, a signal can cause X to rapidly shift to its active state, bind the promoter of gene Y, and increase the rate of transcription, leading to increased production of protein Y, as was shown before.


[Figure: a transcription network that represents about 20% of the transcription interactions in the bacterium E. coli]


Properties of a transcription network graph

Each edge in a transcription network corresponds to an interaction in which a transcription factor directly controls the transcription rate of a gene. These interactions can be of two types:

• Activation (positive control) – occurs when a transcription factor increases the rate of transcription when it binds the promoter. The sign on the edge is “+”.

• Repression (negative control) – occurs when a transcription factor reduces the rate of transcription when it binds the promoter. The sign on the edge is “-”.

Properties of a transcription network graph (cont.)

Transcription factors act primarily as either activators or repressors. Nodes that send out edges with mostly “-” signs represent repressors. Activators send out mostly “+” signs. But there can be some target genes for which an activator acts as a repressor, and vice versa.
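The signed-edge description above can be sketched in a few lines of Python. This is only an illustration: the gene names and the toy edge list are hypothetical, and classifying a regulator by the majority sign of its outgoing edges is an assumption made for this sketch.

```python
from collections import defaultdict

# Hypothetical toy network: (regulator, target, sign of the edge).
toy_edges = [
    ("X", "Y", "+"),
    ("X", "Z", "+"),
    ("X", "W", "-"),  # an activator that represses one of its targets
    ("R", "Y", "-"),
    ("R", "R", "-"),  # self-edge: R represses its own gene
]

def classify_regulators(signed_edges):
    """Label each regulator by the majority sign of its outgoing edges."""
    counts = defaultdict(lambda: {"+": 0, "-": 0})
    for regulator, _target, sign in signed_edges:
        counts[regulator][sign] += 1
    roles = {}
    for regulator, c in counts.items():
        if c["+"] > c["-"]:
            roles[regulator] = "activator"
        elif c["-"] > c["+"]:
            roles[regulator] = "repressor"
        else:
            roles[regulator] = "mixed"
    return roles

roles = classify_regulators(toy_edges)
```

Here X sends out mostly “+” edges and R only “-” edges, so they come out as an activator and a repressor respectively.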


The edges also carry numbers that correspond to the strength of the interaction. The strength of the effect of a transcription factor on the transcription rate of its target gene is described by an input function. When X regulates Y (X → Y), the number of molecules of protein Y produced per unit time is a function of the concentration of X in its active form, $X^*$.


Rate of production of Y = $f(X^*)$. The function is monotonic. It is an increasing function when X is an activator and a decreasing function when X is a repressor.

A useful function that describes many real gene input functions is called the Hill function.


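Hill function for activator:

$f(X^*) = \dfrac{\beta X^{*n}}{K^n + X^{*n}}$

where $X^*$ is the concentration of the activator in its active form.

• n – Hill coefficient

• K – activation coefficient

• β – maximal expression level

[Figure: Y promoter activity versus activator concentration]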


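Hill function for repressor:

$f(X^*) = \dfrac{\beta}{1 + (X^*/K)^n}$

where $X^*$ is the concentration of the repressor in its active form.

[Figure: Y promoter activity versus repressor concentration]

• n – Hill coefficient

• K – repression coefficient

• β – maximal expression level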
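A minimal Python sketch of the two Hill input functions, using the standard forms with Hill coefficient n, coefficient K and maximal expression level β. The function names and the numeric values are illustrative assumptions.

```python
def hill_activator(x, beta, K, n):
    """Promoter activity of Y when x is the active activator concentration."""
    return beta * x**n / (K**n + x**n)

def hill_repressor(x, beta, K, n):
    """Promoter activity of Y when x is the active repressor concentration."""
    return beta / (1.0 + (x / K) ** n)

# Illustrative parameters (not taken from any measured promoter).
beta, K, n = 10.0, 2.0, 2

# At x = K both functions give half of the maximal level beta.
half_act = hill_activator(K, beta, K, n)
half_rep = hill_repressor(K, beta, K, n)

# The activator curve rises with x, the repressor curve falls with x.
concentrations = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
activator_curve = [hill_activator(x, beta, K, n) for x in concentrations]
repressor_curve = [hill_repressor(x, beta, K, n) for x in concentrations]
```

A larger n makes the transition around x = K steeper, while K sets the concentration at which half-maximal expression is reached.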

APPENDIX C

Transcription Networks are sparse

Let us look at a network with N nodes. Each node can have an outgoing edge to each of the N-1 other nodes.

Each node can also have a self-edge.

(Cont).

What is the maximal number of edges in a network with N nodes?

$E_{\max} = N + N(N-1) = N^2$

where the first term is the total number of self-edges, and the second term is the total number of outgoing edges from all the nodes to other nodes.

(Cont).

A maximally connected network has a pair of edges, one in each direction, between every two nodes (since the network is a directed graph).

The number of edges actually found in transcription networks, E, is much smaller than the maximum number of edges, $E_{\max} = N^2$. We find that the networks are sparse: $E / E_{\max} \ll 1$. Typically, less than 0.1% of possible edges are found in the network.
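The edge count can be checked with a short Python sketch. The node and edge numbers below are hypothetical round values chosen for illustration, not measurements of a real network.

```python
def max_edges(n_nodes):
    """Self-edges (N) plus directed edges between distinct nodes (N * (N - 1))."""
    return n_nodes + n_nodes * (n_nodes - 1)

def sparsity(n_nodes, n_edges):
    """Fraction of the possible edges that are actually present."""
    return n_edges / max_edges(n_nodes)

# Hypothetical network: 1000 genes and 500 regulatory interactions.
fraction = sparsity(1000, 500)
percent = 100 * fraction
```

With these assumed numbers the network uses 0.05% of its possible edges, which is inside the "less than 0.1%" range quoted above.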

(Cont).

Transcription networks are the product of evolutionary selection. Losing an edge in the network is very easy. A single mutation in the binding site of X in the promoter of Y can cause the loss of the interaction! Therefore, every edge in the network is under evolutionary selection. The sparse nature of the network reflects the fact that only very few and specific interactions, with useful functions, appear in the network.

Long-Tailed output degree and Compact input degree sequences

Basic concepts:

The number of edges that point into a node is called the node’s in-degree.

The out-degree is the number of edges pointing out of a node.

in-degree = 4

out-degree = 3

(Cont).

• Incoming edges to a node correspond to transcription factors that regulate the gene.

• Outgoing edges correspond to the genes regulated by the transcription factor protein that is encoded by the gene corresponding to the node.

(Cont).

We define the mean number of edges per node, also called the mean connectivity of the network, by $\lambda = E/N$. Typically $\lambda$ is on the order of 2 to 10 edges per node.
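As an illustration, in-degree, out-degree and mean connectivity can be computed from a list of directed edges. The toy edge list and the helper names below are hypothetical.

```python
from collections import Counter

# Hypothetical directed edges (regulator, target); ("C", "C") is a self-edge.
toy_edges = [
    ("X", "A"), ("X", "B"), ("X", "C"),
    ("Y", "C"), ("Z", "C"), ("C", "C"),
]

def degree_tables(edge_list):
    """Return (in_degree, out_degree) counters keyed by node name."""
    out_degree = Counter(src for src, _dst in edge_list)
    in_degree = Counter(dst for _src, dst in edge_list)
    return in_degree, out_degree

def mean_connectivity(edge_list):
    """Number of edges E divided by number of nodes N."""
    nodes = {node for edge in edge_list for node in edge}
    return len(edge_list) / len(nodes)

in_degree, out_degree = degree_tables(toy_edges)
mean_k = mean_connectivity(toy_edges)
```

In this toy network gene C is regulated by four transcription factors (in-degree 4) and X regulates three genes (out-degree 3).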

A common feature of transcription networks is that some nodes show much higher out-degrees than the average node.

(Cont).

This shows that not all the nodes have similar degrees! Transcription networks often have many transcription factors that regulate a few genes, fewer nodes that regulate tens of genes, and even fewer nodes that regulate hundreds of genes. The latter are called global regulators, and they usually respond to key environmental signals to control large ensembles of genes.

(Cont).

Thus, the out-degree distribution has a long tail and can be roughly described as a power law. It is only approximately a power law, because it is bounded by the total number of genes N. The long-tailed distribution is sometimes called “scale-free” because there are sets of regulated genes of many different sizes with no typical scale. Nodes with many more connections than the average are called hubs.

(Cont).

In contrast to the long tail of the out-degree distribution, the in-degree distribution is concentrated around its average value. The in-degrees range between zero and a few times the mean connectivity, $\lambda$. There is very little chance of finding a node regulated by 10 or 100 times more inputs than the average node. In other words, the in-degree distribution does not have a long tail, but instead resembles compact distributions, such as the Poisson distribution, whose variance equals its mean (so the standard deviation is the square root of the mean).

(Cont).

Cluster coefficients of transcription networks

An additional statistical property of graphs is the clustering coefficient, which measures whether the neighbors of a given node are connected to each other. We will consider the network as nondirected. A node with k neighbors can be part of at most $k(k-1)/2$ triangles, that is, one for each possible pair of neighboring nodes.

Cont.

The clustering coefficient C is the average number of triangles that a node participates in, divided by this maximal number ($k(k-1)/2$).

Transcription networks have average clustering coefficients larger than those of randomized networks.

Cont.

The clustering coefficient can also be measured as a function of the number of neighbors that each node has, resulting in a clustering sequence C(k).

Often C(k) ~ 1/k. The more neighbors a node has, the lower its clustering coefficient.
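A small sketch of the per-node clustering coefficient, treating the network as nondirected. The toy graph is hypothetical, and assigning a coefficient of 0 to nodes with fewer than two neighbors is a convention assumed here.

```python
from itertools import combinations

def clustering_coefficients(edges):
    """Triangles through each node divided by k * (k - 1) / 2."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # self-edges do not form triangles
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    coefficients = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            coefficients[node] = 0.0  # assumed convention
            continue
        # Each connected pair of neighbors closes one triangle through node.
        triangles = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        coefficients[node] = triangles / (k * (k - 1) / 2)
    return coefficients

# Hypothetical graph: hub H with four neighbors, of which only A and B are linked.
toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")]
coeffs = clustering_coefficients(toy_edges)
average_c = sum(coeffs.values()) / len(coeffs)
```

The hub H has four neighbors but only one of its six possible triangles, so its coefficient is 1/6, while the low-degree node A has coefficient 1. This is the trend that C(k) ~ 1/k describes.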

Understanding The Cell’s Functional Organization

Albert-Laszlo Barabasi & Zoltan N. Oltvai

“A key aim of postgenomic biomedical research is to systematically catalogue all molecules and their interactions within a living cell.”

Introduction

Reductionism has dominated biological research for over a century, and has provided a wealth of knowledge about individual cellular components and their functions.

Nowadays, it is increasingly clear that a discrete biological function can only rarely be attributed to an individual molecule.

Introduction (cont.)

Most biological characteristics arise from complex interactions between the cell’s numerous constituents, such as proteins, DNA, RNA and more.

Therefore, the main focus in the 21st century is to understand the structure and the dynamics of the complex intercellular web of interactions that contribute to the structure and function of a living cell.


In order to do so, scientists developed new technology platforms, such as protein chips or semi-automated yeast two-hybrid screens, that help determine how and when these molecules interact with each other.

[Figure panels: protein chips; P-type ATPases interact with TcPKAr; yeast two-hybrid screen]


Contemporary biology’s major challenge is to map out, understand and model in quantifiable terms the topological and dynamic properties of the various networks that control the behavior of the cell.

Basic network nomenclature

The behaviour of most complex systems, such as the cell, emerges from the activity of many components that interact with each other through pairwise interactions.

At a highly abstract level, we can consider the components as a series of nodes that are connected to each other by links (edges). The nodes and links together form a network (or a graph).

An example:

A graph-theoretic description of a simple pathway (catalysed by Mg2+-dependent enzymes).

* As we can see, the graph is directed.

In the most abstract approach, all interacting metabolites are considered equally.

* As we can see, the graph is undirected.

The degree distribution, P(k), of the metabolic network illustrates its scale-free topology.

* As we can see, the greater the degree, the smaller P(k).

The scaling of the clustering coefficient C(k) with the degree k illustrates the hierarchical architecture of metabolism.

* As we can see, the greater the degree, the smaller C(k).

Yeast protein interaction network. A map of protein–protein interactions, which is based on early yeast two-hybrid measurements, illustrates that a few highly connected nodes (which are also known as hubs) hold the network together.

The largest cluster, which contains ~78% of all proteins, is shown. The colour of a node indicates the phenotypic effect of removing the corresponding protein (red = lethal, green = non-lethal, orange = slow growth, yellow = unknown).

Architectural features of cellular networks

A series of recent findings indicates that the random network model cannot explain the topological properties of real networks. In that model the node degrees follow a Poisson distribution, meaning that most nodes have approximately the same number of links, close to the network’s average degree <k> (where k represents the degree of a given node). The finding was that for many social and technological networks the number of nodes with a given degree follows a power law: the probability that a chosen node has exactly k links follows $P(k) \sim k^{-\gamma}$, where γ is the degree exponent, with its value for most networks being between 2 and 3.

Networks that are characterized by a power-law degree distribution are highly non-uniform: most of the nodes have only a few links.

A few nodes with a very large number of links, which are often called hubs, hold these nodes together.

Networks with a power-law degree distribution are called scale-free. The name indicates the absence of a typical node in the network (one that could be used to characterize the rest of the nodes).

However, scale-free networks could easily be called scale-rich as well, as their main feature is the coexistence of nodes of widely different degrees (scales), from nodes with one or two links to major hubs.
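The contrast between the two degree distributions can be made concrete with a short numerical sketch. The mean degree, the exponent γ = 2.5 and the cut-off k_max below are arbitrary illustrative choices, and truncating the power law at k_max is an assumption made so that it can be normalized.

```python
import math

def poisson_pmf(k, mean):
    """Probability of degree k in a random network with the given mean degree."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def power_law_pmf(k, gamma, k_max):
    """Probability of degree k under P(k) ~ k^(-gamma), truncated to 1..k_max."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm

# Compare how likely a node with 20 links is under each model.
p_random = poisson_pmf(20, mean=2.0)
p_scale_free = power_law_pmf(20, gamma=2.5, k_max=1000)
ratio = p_scale_free / p_random
```

Under these assumptions a node with 20 links is many orders of magnitude more likely in the power-law case, which is why hubs appear in scale-free networks but are practically absent from random ones.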

Cellular networks are scale-free

An important development in the understanding of cellular network architecture was the finding that most networks within the cell approximate a scale-free topology.

The first evidence came from the analysis of metabolism, in which the nodes are metabolites and the links represent enzyme-catalysed biochemical reactions. As many of the reactions are irreversible, metabolic networks are directed.

So, for each metabolite an ‘in’ and an ‘out’ degree can be assigned that denotes the number of reactions that produce or consume it, respectively.

An example of a random network in which all the nodes are connected. The network is undirected.

The degrees of the nodes follow a Poisson distribution.

The clustering coefficient is independent of the node’s degree in a random network.

An example of a scale-free network in which all the nodes are connected.

The degrees of the nodes follow a power law.

The clustering coefficient is independent of the node’s degree in this example.

Evolutionary origin of scale-free networks

Two processes explain the origin of scale-free networks. First, most networks are the result of a growth process, during which new nodes join the system over an extended time period. A good example is the World Wide Web, which has grown from 1 to more than 3 billion web pages over a 10-year period. Second, nodes prefer to connect to nodes that already have many links, a process that is known as preferential attachment.

If a node has many links, new nodes will tend to connect to it with a higher probability. This node will therefore gain new links at a higher rate than its less connected peers and will turn into a hub.
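Growth combined with preferential attachment can be sketched as a small simulation. This is a simplified illustration and not the authors' code: the starting core of m + 1 fully connected nodes, the parameter names and the fixed random seed are all assumptions.

```python
import random

def preferential_attachment(n_nodes, m, seed=0):
    """Grow an undirected network; each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed starting point: a fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per link it has, so a uniform draw
    # from `stubs` selects a node in proportion to its degree.
    stubs = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new_node, target))
            stubs.extend((new_node, target))
    return edges

def degree_list(edges, n_nodes):
    """Degree of every node, indexed by node number."""
    degrees = [0] * n_nodes
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees

network = preferential_attachment(200, m=2)
degrees = degree_list(network, 200)
largest_degree = max(degrees)
```

Sorting `degrees` typically shows a few early nodes with many links and a large majority with only m links, which is the hub formation described above.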

Growth and preferential attachment have a common origin in protein networks that is probably rooted in gene duplication.

Duplicated genes produce identical proteins that interact with the same protein partners. Therefore, each protein that is in contact with a duplicated protein gains an extra link.

Highly connected proteins have a natural advantage: they are more likely to have a link to a duplicated protein than their weakly connected cousins, and therefore they are more likely to gain new links if a randomly selected protein is duplicated. This represents preferential attachment.

Although the role of gene duplication has been shown only for protein interaction networks, it probably explains, with appropriate adjustments, the emergence of the scale-free features in the regulatory and metabolic networks as well.
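The duplication argument can be illustrated with a minimal sketch. The protein names are hypothetical, the network is stored as a map from each protein to its set of partners, and the duplicated protein is assumed to be chosen uniformly at random.

```python
def duplicate_protein(partners, original, copy):
    """Return a new network in which `copy` has the same partners as `original`."""
    updated = {node: set(nbrs) for node, nbrs in partners.items()}
    updated[copy] = set(partners[original])
    for partner in partners[original]:
        updated[partner].add(copy)  # every partner of the original gains a link
    return updated

def link_gain_probability(partners, node):
    """Chance that `node` gains a link when one uniformly chosen protein is
    duplicated: it gains a link exactly when one of its partners is chosen."""
    return len(partners[node]) / len(partners)

# Hypothetical network: H interacts with A, B and C.
toy = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"}}

p_hub = link_gain_probability(toy, "H")
p_leaf = link_gain_probability(toy, "A")
after = duplicate_protein(toy, "A", "A2")
```

Because the gain probability is the degree divided by the number of proteins, it grows linearly with the number of links a protein already has, which is the preferential attachment rule.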

The red node is more likely to connect to node 1 than to node 2.

The red circle is a duplicated gene. As we can see, the duplicated gene gains the same connections as its original.

Network Motifs

Connectivity patterns (sub-graphs) from which the network is composed.

This idea was first presented by Uri Alon and his group, who discovered network motifs in the gene transcription network of the bacterium E. coli.

Later on, motifs were discovered in other natural networks.

Sub-Graphs

All of the 209 bi-fan motifs (a motif with 4 nodes) that are found in the Escherichia coli transcription-regulatory network.

Motifs that share links with other motifs are shown in blue; otherwise they are red.
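Counting bi-fan patterns in a directed network can be sketched as follows. The sketch assumes that a bi-fan is a pair of regulators that both regulate the same pair of target genes, and it counts every such subgraph whether or not the four nodes have additional edges among them; the gene names are hypothetical.

```python
from itertools import combinations
from math import comb

def count_bifans(edges):
    """Count pairs of regulators that share a pair of target genes."""
    targets = {}
    for src, dst in edges:
        if src != dst:  # self-edges are not part of the pattern
            targets.setdefault(src, set()).add(dst)
    total = 0
    for a, b in combinations(sorted(targets), 2):
        shared = (targets[a] & targets[b]) - {a, b}
        # Every pair of shared targets gives one bi-fan for this regulator pair.
        total += comb(len(shared), 2)
    return total

# Hypothetical example: X1 and X2 both regulate Y1 and Y2.
one_bifan = count_bifans([("X1", "Y1"), ("X1", "Y2"), ("X2", "Y1"), ("X2", "Y2")])

# Adding a third shared target Y3 gives three pairs of shared targets.
three_bifans = count_bifans([
    ("X1", "Y1"), ("X1", "Y2"), ("X1", "Y3"),
    ("X2", "Y1"), ("X2", "Y2"), ("X2", "Y3"),
])
```

To decide whether a pattern is a motif, such a count would still have to be compared with the counts obtained in randomized networks.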

Modularity of Cellular Networks

Signatures of hierarchical modularity are present in all cellular networks that have been investigated so far.

An example of a hierarchical network in which all the nodes are connected.

The degrees of the nodes follow a power law.

The clustering coefficient scales as C(k) ~ 1/k.

Robustness of The Network

• Random networks are not generally robust.

• Scale-free networks are robust against accidental failures: even if 80% of randomly selected nodes fail, the remaining 20% still form a compact cluster with a path connecting any two nodes.

• But what happens if a hub is removed?
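The question about hubs can be explored with a toy example that measures the largest connected cluster after removing a node. The six-node hub-and-spoke graph below is hypothetical and far smaller than any real network, so it only illustrates the idea.

```python
def largest_component(nodes, edges, removed=()):
    """Size of the largest connected cluster after deleting `removed` nodes."""
    alive = set(nodes) - set(removed)
    neighbors = {node: set() for node in alive}
    for a, b in edges:
        if a in alive and b in alive:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        # Depth-first search over the cluster that contains `start`.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best

# Hypothetical network: node 0 is a hub; nodes 1 and 2 are also linked.
toy_nodes = range(6)
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]

intact = largest_component(toy_nodes, toy_edges)
without_leaf = largest_component(toy_nodes, toy_edges, removed=[5])
without_hub = largest_component(toy_nodes, toy_edges, removed=[0])
```

Removing a peripheral node barely changes the cluster, while removing the hub breaks the toy network into small pieces. This illustrates why reliance on hubs makes a network sensitive to their loss.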

Future Directions

• Developing new theoretical methods to characterize the network topology

• Enhancing our data collection abilities in order to move beyond our present level of knowledge.

• Integrated study of all the interactions could offer further insights

Conclusions

The cell can be approached from the bottom up, moving from molecules to motifs and modules, or from the top down, starting from the network’s scale-free and hierarchical nature and moving to the organism-specific modules and molecules.

In either case, it must be acknowledged that structure, topology, network usage, robustness and function are deeply interlinked, so we must study them and observe them as a whole.

What is it good for?

UNDERSTANDING THE BIOLOGICAL ASPECTS, NETWORK STRUCTURES AND CHARACTERISTICS OF GENES AND CELLS AS A WHOLE WILL HAVE IMPORTANT IMPLICATIONS FOR THE PRACTICE OF MEDICINE.
