Bioinformatics
PVPSIT, Vijayawada, 30th September 2010
Allam Appa Rao, JNTUK
Dec 25, 2015
Socrates taught us that the essence of science is measuring, counting and weighing, together with reasoning from postulates or axioms.
Knuth
“Science is what we understand well enough to explain to a computer”
Dijkstra: “Computer Science is no more about computers than astronomy is about telescopes.” (http://www.quotationspage.com/quote/788.html)
“If you want to understand life, don't think about vibrant, throbbing gels and oozes, think about information technology.” (Richard Dawkins, University of California, Berkeley, and Oxford University; The Blind Watchmaker, 1986, Norton, p. 112.)
IT
Over the past few decades, rapid developments in molecular research technologies (MRT) and in information technologies (IT) have combined to produce a tremendous amount of information related to molecular biology.
Bioinformatics: Definition
Bioinformatics is the application of information technology and computer science to the field of molecular biology. The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper in 1970 for the study of informatic processes in biotic systems.
Bioinformatics: Applications
Bioinformatics focuses on developing and applying computationally intensive techniques such as pattern recognition, data mining, machine learning algorithms and visualization.
Bioinformatics: Activities
Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them and creating and viewing 3-D models of protein structures.
DM: Definition
Diabetes Mellitus is a condition in which the body either does not produce enough, or does not properly respond to, insulin, a hormone produced in the pancreas.
Bioinformatics: Entailment
Bioinformatics entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.
Bioinformatics: Research
Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and the modelling of evolution.
DM: Epidemic
Diabetes mellitus (DM) affects about 10-15% of Indians and is assuming epidemic proportions; newer, robust therapeutic approaches for both its prevention and its treatment are needed.
Our Work: BDNF
• BDNF is a natural compound present in the human body; hence, therapeutics developed using this molecule are expected to have fewer side effects.
• If BDNF can indeed prevent and ameliorate DM, it would pave the way for newer therapeutic opportunities.
• BDNF as a therapeutic tool for the prevention and treatment of DM.
Protein folding?
The ability of a protein to fold reliably into a pre-determined conformation despite a near infinite number of possibilities is, despite much research, still poorly understood.
The structure of a protein is determined purely by the amino acid sequence, and the structure of the protein determines the function.
The function of a protein depends entirely on the ability of the protein to fold rapidly and reliably to its native structure. Many proteins fold spontaneously into their native structure in aqueous solution.
It has been suggested that for a protein of 100 amino acids, a purely random conformational search would require around 10^29 years, and yet proteins are able to fold on a timescale of milliseconds to seconds.
This suggests that only a small amount of conformational space is sampled during the folding process and this in turn implies the existence of kinetic folding pathways.
This paradox of how proteins fold rapidly and reliably to their native conformation is known as the protein folding problem.
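The order of magnitude quoted above can be checked with a Levinthal-style back-of-the-envelope calculation. The sketch below is illustrative only: it assumes (these are our assumptions, not figures from the source) three conformations per residue and 10^11 conformations sampled per second, and the answer is very sensitive to both numbers.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def random_search_years(n_residues, states_per_residue=3, samples_per_second=1e11):
    """Years needed to visit every conformation at a fixed sampling rate.

    Assumed toy model: each residue independently adopts one of
    `states_per_residue` conformations, so there are
    states_per_residue ** n_residues conformations in total.
    Works in log10 so that large chains do not overflow a float.
    """
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    return 10 ** (log10_seconds - math.log10(SECONDS_PER_YEAR))


years = random_search_years(100)
print(f"{years:.1e} years")  # 1.6e+29 years under these assumptions
```

Changing either assumption by one order of magnitude moves the result by orders of magnitude, which is why published estimates of this search time vary.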
Application of Shannon’s information theory breaks genetics and molecular biology out of the descriptive mode into the quantitative mode
George Gamow (1904-68)
Shannon: Information Flow
Information flow, in an information-theoretic context, is the transfer of information from a variable h to a variable l in a given process. The measure of information flow, $\mathcal{P}$, is defined as the uncertainty before the process started minus the uncertainty after the process terminated. This can be quantified as
$$\mathcal{P} = H(h \mid l) - H(h \mid l')$$
where H(h | l) is the conditional entropy (equivocation) of variable h (before the process started) given the variable l (before the process started), and H(h | l') is the conditional entropy (equivocation) of variable h (before the process started) given the variable l' (the value of variable l after the process finished).
Each conditional entropy can be computed as $H(X \mid Y) = H(X,Y) - H(Y)$, where H(X,Y) is the joint entropy, calculated as follows:
$$H(X,Y) = -\sum_{x}\sum_{y} p(x,y)\,\log_2 p(x,y)$$
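The definitions of conditional entropy, joint entropy and information flow above can be evaluated directly for small discrete distributions. This is a minimal sketch, assuming the joint distributions are given as Python dictionaries keyed by (h, l) pairs; the toy process, in which l starts as a constant and ends as a copy of a fair bit h, is hypothetical.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of an {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Marginal distribution of one coordinate of a {(x, y): p} joint distribution."""
    m = defaultdict(float)
    for pair, p in joint.items():
        m[pair[axis]] += p
    return dict(m)


def conditional_entropy(joint):
    """H(X | Y) = H(X, Y) - H(Y) for a joint distribution keyed by (x, y)."""
    return entropy(joint) - entropy(marginal(joint, 1))


def information_flow(joint_before, joint_after):
    """P = H(h | l) - H(h | l'), with pairs keyed as (h, l) and (h, l')."""
    return conditional_entropy(joint_before) - conditional_entropy(joint_after)


# Hypothetical process: h is a fair bit; l is 0 before and a copy of h after.
before = {(0, 0): 0.5, (1, 0): 0.5}
after = {(0, 0): 0.5, (1, 1): 0.5}
print(information_flow(before, after))  # 1.0
```

Here all uncertainty about h is removed by observing l', so exactly one bit flows from h to l.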
Gene Protein
Information in living organisms
One of the prime characteristics of all living organisms is the information they contain for all operational processes
Braitenberg, a German cyberneticist, has presented evidence ‘that information is an intrinsic part of the essential nature of life.’ The transmission of information plays a fundamental role in everything that lives.
• Without a doubt, the most complex information processing system in existence is the human body. If we take all human information processes together, that is, conscious ones (language) and unconscious ones (information-controlled functions of the organs, hormone system), this involves the processing of 10^24 bits daily.
• This astronomically high figure is higher by a factor of 1,000,000 than the total human knowledge of 10^18 bits stored in all the world’s libraries.
Alan Turing and the Gatlinburg Symposium on Information Theory in Biology
The logic of Turing machines has an isomorphism with the logic of the genetic information system
• Information source: DNA
• Transmission of information: through mRNA, tRNA and rRNA (m/t/r RNA)
• Tasks to be completed: transcription, translation
• Output: protein(s)
Information Theory, Evolution, and the Origin of Life, Hubert P. Yockey, p. 35
Shannon, Turing, Gamow and Rao
H (gene) L (protein)
Information Transfer Process
The word "information" derives from the Latin, informare, which means "to put into form”
Latent
• Existing or present
• but concealed or inactive
Manifest
Readily seen, or understood:
apparent, clear, evident, noticeable, observable
DNA Protein
How does the code work?
• Template for construction of proteins
Inherited disease: broken/damaged DNA → broken/damaged proteins
Viral disease: foreign DNA/RNA → foreign proteins (akin to a computer virus)
Latent Information Manifested Information
Manifestation
Genomic Information
ACGTCCGGCCTTATACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAG…
What good is all this genetic information?
Information Inheritance
• Human beings are endowed with information encoded in the genetic material of inheritance, which controls development, reproduction and self-repair.
• The carrier of this information is the complex structure of DNA.
Anything Computable!
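• Andrey Markov Jr. defined what became known as Markov algorithms, a string-rewriting model of computation (distinct from the hidden Markov models, HMMs, that are widely used in biological sequence analysis)
• Alonzo Church used the lambda calculus to formalize computation
• Kurt Gödel defined recursive functions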
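To make the idea of a Markov algorithm concrete, here is a minimal interpreter sketch: an ordered list of rewriting rules, where at each step the first applicable rule replaces the leftmost occurrence of its pattern. The rule set (converting a binary numeral into unary strokes) is a standard textbook illustration and is not biological; the `max_steps` guard is our own addition, since a Markov algorithm need not terminate.

```python
def run_markov_algorithm(rules, word, max_steps=10_000):
    """Run a Markov algorithm given as an ordered list of rewriting rules.

    Each rule is (pattern, replacement, is_terminal). At every step the first
    rule whose pattern occurs in `word` rewrites its leftmost occurrence; the
    run halts after a terminal rule fires or when no rule applies.
    """
    for _ in range(max_steps):
        for pattern, replacement, is_terminal in rules:
            if pattern in word:
                word = word.replace(pattern, replacement, 1)
                if is_terminal:
                    return word
                break
        else:
            return word  # no rule applies: normal halt
    raise RuntimeError("step limit reached; the algorithm may not terminate")


# Binary numeral -> unary strokes (each '|' counts one).
BINARY_TO_UNARY = [
    ("|0", "0||", False),  # a stroke moving right past a 0 doubles
    ("1", "0|", False),    # a 1 becomes a 0 followed by one stroke
    ("0", "", False),      # finally erase the 0s
]

print(run_markov_algorithm(BINARY_TO_UNARY, "101"))  # |||||
```

Despite this minimal machinery (ordered substitutions on strings), Markov algorithms can compute anything a Turing machine can.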
Genetic Information Flow
• Within the body, genetic information flows from DNA to protein and other products:
– first, by the transcription of portions of the DNA into so-called messenger RNA (mRNA), and
– second, by translation, the assembly of individual amino acids into polypeptides, including proteins.
This flow of information underlies the processes of life and living.
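The two steps of transcription and translation can be sketched in a few lines of Python. This assumes a clean coding-strand DNA sequence (only A, C, G, T, no introns), uses the standard genetic code, and starts translation at the first AUG; real gene expression involves splicing, alternative start sites and much more.

```python
BASES = "UCAG"
# Standard genetic code: amino acids in UCAG x UCAG x UCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def transcribe(coding_strand):
    """mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ccATGGCCTTTTAAgg")
print(mrna)             # CCAUGGCCUUUUAAGG
print(translate(mrna))  # MAF
```

Encoding the code table as a 64-character string keeps it compact; it raises `KeyError` on ambiguous bases such as N, which would need explicit handling in practice.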
Paradigm of 'computational thinking'
Advances in computational power and computational methods have led to the crystallization of the paradigm of 'computational thinking'.
This paradigm of 'computational thinking' takes applications of computer science far beyond mere programming and data management.
This paradigm of 'computational thinking' provides newer methods for understanding complex lifestyle diseases such as diabetes.
The Value of the Right Tool
3.2 Billion Nucleotides
A “Supercomputer” Cluster
The New Challenges in Computer Science
• Promoter recognition in genomic sequences,
• Understanding data from microarray experiments, and
• Accurate prediction of protein folds from sequences
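As a deliberately naive illustration of the first challenge, the sketch below scans a sequence for an exact TATA-box consensus (TATAWAW, with W = A or T), one commonly cited core-promoter motif. The motif choice and the example sequence are assumptions for illustration; practical promoter recognition combines many signals with statistical models and is far harder than this.

```python
import re

# TATAWAW consensus (W = A or T); the zero-width lookahead reports overlapping hits.
TATA_BOX = re.compile(r"(?=TATA[AT]A[AT])")


def find_tata_boxes(sequence):
    """Return 0-based start positions of exact TATA-box consensus matches."""
    return [match.start() for match in TATA_BOX.finditer(sequence.upper())]


print(find_tata_boxes("GGTATAAATGCGTATATAT"))  # [2, 12]
```

An exact-consensus scan like this produces many false positives across a genome, which is precisely why promoter recognition remains a challenge.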
With the widespread availability of nucleotide and amino acid sequences, novel methods for extracting biologically and clinically relevant knowledge have become feasible.
Data are deposited on the Internet in databases such as GeneCards, available at http://www.genecards.org/mirror.shtml.
Further information can be obtained from related sites such as UniProt (http://www.uniprot.org) and Swiss-Prot (http://www.expasy.org/sprot/).
Using FASTA and CLUSTAL_X programs, similarity scores can be calculated to choose
items of interest.
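Similarity scores like these come from sequence alignment. FASTA uses a fast heuristic search and CLUSTAL builds progressive multiple alignments; as a simpler stand-in, the sketch below computes an exact Needleman–Wunsch global alignment score by dynamic programming. The match, mismatch and gap values are arbitrary assumptions, not a substitution matrix such as BLOSUM62.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    score[i][j] holds the best score for aligning a[:i] with b[:j].
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7 (seven matches)
print(global_alignment_score("GATTACA", "GATACA"))   # 4 (six matches, one gap)
```

The table costs O(len(a) x len(b)) time and memory, which is exactly what heuristic tools like FASTA avoid when searching whole databases.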
Further information can be obtained by mining text, either manually or, increasingly, with text-mining resources such as PathBinderH and the GENIA corpus.
Bioinformatics approach to extract information from genes
Computerization is necessary to build databases of chromatography-coupled mass spectrometric analytical data.
On the basis of these proteomic data, proteins whose expression differs between healthy and disease conditions can be identified as biomarkers.
Affinity chromatography is one of the fastest liquid chromatographic
methods for the separation and purification of biomolecules due to its high
molecular specificity.
Proteomics and the tools for identifying proteins and understanding their biochemistry and pathways are at a new stage of development and evaluation.
Computational approach of protein analysis using affinity chromatography: application to proteomics
A rapid and sensitive RP-HPLC method with UV detection (242 nm) for routine
analysis of famciclovir in pharmaceutical formulations was developed.
Chromatography was performed with a mobile phase of methanol and phosphate buffer (50:50, v/v) at a flow rate of 1.0 mL min−1.
Quantitation was accomplished by the internal standard method. The procedure was validated for linearity (correlation coefficient = 0.9999), accuracy, robustness and intermediate precision.
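Linearity is usually judged from a least-squares calibration line and its Pearson correlation coefficient. The sketch below computes slope, intercept and r from paired (concentration, response) data; the example uses made-up, perfectly linear numbers rather than the study's measurements, so r comes out exactly 1.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical, perfectly linear calibration data (response = 2 * conc + 1).
concentrations = [1, 2, 3, 4]
responses = [3, 5, 7, 9]
print(linear_fit(concentrations, responses))  # (2.0, 1.0, 1.0)
```

With an internal standard method, the response would typically be the analyte-to-internal-standard peak-area ratio rather than a raw peak area.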
Experimental design was used for validation of robustness and intermediate
precision.
Contd…
Development and Validation of LC Method for the Determination of Famciclovir in Pharmaceutical
Formulation Using an Experimental Design
To test robustness, three factors were considered: the percentage (v/v) of methanol in the mobile phase, the flow rate and the pH. Flow rate, percentage of organic modifier and pH all had a considerable effect on the response.
For intermediate precision, the variables considered were analyst, equipment and number of days. The RSD value (0.86%, n = 24) indicated acceptable precision of the analytical method.
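The relative standard deviation reported above is the sample standard deviation expressed as a percentage of the mean. A minimal sketch, assuming the sample (n − 1) standard deviation; the three values are hypothetical, not the study's 24 results.

```python
import statistics


def relative_standard_deviation(values):
    """RSD (%) = 100 * sample standard deviation / mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical replicate results.
replicates = [9.0, 10.0, 11.0]
print(relative_standard_deviation(replicates))  # 10.0
```

`statistics.stdev` divides by n − 1; `statistics.pstdev` (dividing by n) would give a slightly smaller RSD for small samples.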
The proposed method was simple, sensitive, precise, accurate and quick, and is useful for routine quality control.
Mathematical Analysis of Diabetes Related Proteins Having High Sequence Complexity
We searched for proteins affecting diabetes, determined the common species in which these proteins were most prevalent, and performed protein composition analysis of those with high sequence complexity.
About 90% of rat genes have counterparts in the mouse and human genomes, which is why we looked for proteins common to the three species (Rat Genome Sequencing Consortium 2004; www.ratbehaviour.org/Ratsmice.htm).
The distribution pattern of the protein variates was examined, and bivariate plots were then drawn.
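The source does not state which composition or complexity measure was used. As an assumed, simple proxy, the sketch below computes amino acid composition and the Shannon entropy of that composition (bits per residue); dedicated tools such as SEG instead use windowed compositional-complexity measures. The example sequences are hypothetical.

```python
import math
from collections import Counter


def composition(protein):
    """Fraction of each residue (one-letter codes) in a protein sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {residue: n / total for residue, n in sorted(counts.items())}


def compositional_entropy(protein):
    """Shannon entropy of the composition in bits per residue (0 = one residue type)."""
    return sum(-f * math.log2(f) for f in composition(protein).values())


print(composition("MAAAC"))           # {'A': 0.6, 'C': 0.2, 'M': 0.2}
print(compositional_entropy("ACDE"))  # 2.0
```

Pairs of such per-protein variates (for example, the fractions of two residue types) are the kind of values that can be plotted against each other in a bivariate plot.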
Contd…
The bivariate plots show similar clustering for Rattus norvegicus and Mus musculus but some variation for Homo sapiens. This is consistent with Rattus norvegicus and Mus musculus being relatively close in the phylogenetic tree (Sridhar GR et al.), which would be expected to give similar clustering.
Proteins lying away from the cluster are outliers because they have different compositional characteristics.
Bioinformatics analysis of diabetic retinopathy using functional protein sequences
Diabetic retinopathy is the leading cause of blindness among patients with
diabetes mellitus.
We evaluated the role of several proteins likely to be involved in diabetic retinopathy by multiple sequence alignment with the ClustalW tool, and constructed a phylogram from functional protein sequences extracted from NCBI.
The phylogram was constructed with the Neighbor-Joining algorithm.
It was observed that aldose reductase and nitric oxide synthase are closely
associated with diabetic retinopathy.
Contd…
Vascular endothelial growth factor, pro-inflammatory cytokines, advanced glycation end products and adhesion molecules, which also play a role in diabetic retinopathy, may do so by modulating the activities of aldose reductase and nitric oxide synthase.
These results imply that methods designed to normalize aldose reductase and
nitric oxide synthase activities could be of significant benefit in the prevention
and treatment of diabetic retinopathy.
COMPUTATIONAL PROTEIN SEQUENCES ANALYSIS FOR DIABETIC RETINOPATHY – A BIO INFORMATICS STUDY
The role of bioinformatics is to aid life scientists in gathering and processing
genomic data to study protein function.
Another important role is to aid researchers at pharmaceutical companies in
making detailed studies of protein structures to facilitate drug design.
The human genome, with about 3 billion nucleotide bases, contains roughly 20,000–25,000 protein-coding genes, many of whose functions are known.
These genes direct the synthesis of different proteins, which differ from one another in their amino acid sequences.
The physiological functions of a protein depend on this sequence.
Contd…
The functions of the protein butyrylcholinesterase (BChE) are largely unknown.
Therefore, its amino acid sequence was compared with the sequences of 29 different proteins using computational techniques.
Close similarity was observed with the protein EST2_human, which suggests similar physiological actions.
Such findings from computational techniques, now considered indispensable, help life scientists to proceed with their work in wet laboratories.
Contd…
The amino acid sequence of BChE was also compared with proteins that act as inhibitors of neovascularisation, and similarity was found with one of these inhibitors. Early onset of diabetic retinopathy is often found in patients who have insufficient BChE in their serum.
This suggests that BChE may act as an inhibitor of neovascularisation
that causes retinopathy.
Bioinformatics analysis of functional protein sequences reveals a role for brain-derived neurotrophic factor in obesity and type 2 diabetes mellitus
Using bioinformatics techniques and sequence analysis algorithms, a comparative study between humans and rodents revealed similarity in the behavior of genes involved in the control of energy homeostasis.
Brain-derived neurotrophic factor (BDNF) modulates the secretion and actions of insulin, leptin, ghrelin, various neurotransmitters and peptides, and pro-inflammatory cytokines involved in energy homeostasis, suggesting that BDNF has a significant role in the pathobiology of obesity and type 2 diabetes mellitus.
Contd…
Based on this evidence, we propose that obesity and type 2 diabetes could be disorders of the brain and that BDNF could serve as a biomarker in predicting their development.
Hence, methods developed to selectively deliver BDNF to appropriate
hypothalamic neurons may form a novel approach in their treatment.
Bioinformatics Analysis of Functional Protein Sequences Reveals a Role for Tumor Necrosis Factor-α and Nitric Oxide in Insulin Resistance Syndrome
Using bioinformatics techniques and sequence analysis algorithms, we identified that tumor necrosis factor-α (TNF-α) and nitric oxide (NO) have a significant role in the pathobiology of insulin resistance syndrome, a condition that is common in subjects with abdominal obesity, hypertension, dyslipidemia, atherosclerosis and coronary heart disease, which are accompanied by endothelial dysfunction due to reduced endothelial nitric oxide generation.
TNF-α has neurotoxic actions, stimulates inducible NO synthase activity, and
modulates the expression of neurotransmitters involved in the control of feeding
and thermogenesis.
Contd…
NO is a neurotransmitter and influences secretion and actions of various
hypothalamic peptides and neuropeptides.
Insulin suppresses the production of TNF-α but stimulates that of endothelial NO.
This close interaction between TNF-α, NO, hypothalamic peptides, and insulin
suggests that regulation of TNF-α and NO production and action could be critical in
the management of insulin resistance syndrome and its associated conditions.
Phylogenetic Tree Construction of Butyrylcholinesterase Sequences in Life Forms
Butyrylcholinesterase is an enzyme with few known physiological functions. It is related to acetylcholine, which has been shown to be expressed in a variety of life forms.
We performed a search using the human butyrylcholinesterase gene (HGNC:983; MIM:177400) and found the sequence in a broad spectrum of organisms, including plants, bacteria and animals.
Butyrylcholinesterase therefore appears to have arisen early in evolution and to have been conserved.
Serum butyrylcholinesterase in type 2 diabetes mellitus: a biochemical and bioinformatics approach
Background
Butyrylcholinesterase is an enzyme that may serve as a marker of metabolic syndrome. We (a) measured its level in persons with diabetes mellitus and (b) constructed a family tree of the enzyme using nucleotide sequences downloaded from NCBI. Butyrylcholinesterase was estimated colorimetrically using a commercially available kit (Randox Lab, UK). Phylogenetic trees were constructed by a distance method (Fitch and Margoliash method) and by the maximum parsimony method.
Contd…
Results
There was a negative correlation between serum total cholesterol and butyrylcholinesterase (-0.407; p < 0.05) and between serum LDL cholesterol and butyrylcholinesterase (-0.435; p < 0.05). There was no statistically significant correlation among the other biochemical parameters. In the evolutionary tree construction, both methods gave similar trees, except for an inversion in the positions of Sus scrofa (M62778) and Oryctolagus cuniculus (M62779) between the Fitch and Margoliash tree and the maximum parsimony tree.
Conclusion
The level of butyrylcholinesterase was inversely related to serum cholesterol; the dendrogram placed sequences from evolutionarily close species near each other.
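Maximum parsimony prefers the tree topology that explains the aligned sequences with the fewest character changes. The sketch below shows only the scoring step, Fitch's small-parsimony algorithm on fixed binary trees written as nested tuples (not to be confused with the Fitch–Margoliash distance method); a full maximum-parsimony search would score many candidate topologies and keep the cheapest. The four short sequences are hypothetical.

```python
def fitch_parsimony(tree):
    """Fitch's small-parsimony pass for one alignment column.

    `tree` is a leaf state (one-character string) or a (left, right) tuple.
    Returns (candidate ancestral states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_changes = fitch_parsimony(tree[0])
    right_states, right_changes = fitch_parsimony(tree[1])
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No shared state: one extra change is needed on one of the two branches.
    return left_states | right_states, left_changes + right_changes + 1


def column(tree, site):
    """Project a tree of aligned leaf sequences onto a single alignment column."""
    if isinstance(tree, str):
        return tree[site]
    return tuple(column(child, site) for child in tree)


def parsimony_score(tree):
    """Total number of changes over all columns (leaf sequences of equal length)."""
    leaf = tree
    while not isinstance(leaf, str):
        leaf = leaf[0]
    return sum(fitch_parsimony(column(tree, s))[1] for s in range(len(leaf)))


# Two candidate topologies for the same four hypothetical sequences.
tree_1 = (("ACGT", "ACGA"), ("TCGA", "TCGG"))
tree_2 = (("ACGT", "TCGA"), ("ACGA", "TCGG"))
print(parsimony_score(tree_1), parsimony_score(tree_2))  # 3 4
```

Here `tree_1` needs fewer changes, so a parsimony search comparing only these two topologies would keep it.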
Alzheimer's disease and Type 2 diabetes mellitus: the cholinesterase connection?
Alzheimer's disease and type 2 diabetes mellitus tend to occur together.
We sought to identify protein(s) common to both conditions that could suggest a possible unifying pathogenic role. Using human neuronal butyrylcholinesterase (AAH08396.1) as the reference protein, we used the BLAST tool for protein-to-protein comparison in humans. We found three groups of sequences among a series of 12, with an E-value between 0–12, common to both Alzheimer's disease and diabetes: butyrylcholinesterase precursor K allele (NP_000046.1), acetylcholinesterase isoform E4-E6 precursor (NP_000656.1), and apoptosis-related acetylcholinesterase (1B41|A).
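BLAST ranks hits by E-value, the number of alignments with at least that score expected by chance in a search of that size, using Karlin–Altschul statistics. The sketch below shows the two equivalent forms of the formula, E = K·m·n·e^(−λS) and E = m·n·2^(−S'); the K and λ values, sequence lengths and raw score are hypothetical placeholders, and real BLAST additionally uses effective (edge-corrected) lengths.

```python
import math


def evalue_from_raw_score(raw_score, m, n, k, lam):
    """Karlin-Altschul expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, k, lam):
    """Normalised bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bit_score(bits, m, n):
    """Equivalent form E = m * n * 2 ** (-S')."""
    return m * n * 2 ** (-bits)


# Hypothetical statistical parameters and search-space sizes, for illustration only.
K, LAMBDA = 0.041, 0.267
query_length, database_length = 600, 1_000_000_000
raw = 150
e1 = evalue_from_raw_score(raw, query_length, database_length, K, LAMBDA)
e2 = evalue_from_bit_score(bit_score(raw, K, LAMBDA), query_length, database_length)
print(f"{e1:.2e} {e2:.2e}")  # both forms give the same E-value
```

Smaller E-values indicate hits less likely to arise by chance; because E scales with database size, the same alignment becomes less significant when searched against a larger database.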
Contd…