Information Technology As A CATALYST in Basic Biological Research
Sudha Bhattacharya
J.N.U., New Delhi
Feb 03, 2016
Mining of gene Sequence Data
Pattern finding in DNA
Specific Example
The Retrotransposons in the Entamoeba histolytica genome
Retrotransposons
Mobile DNA elements
Some insert in a sequence-specific manner
Others are widely distributed
Can disrupt the function of genes, resulting in diseases
What Information can Bioinformatics provide?
I. Defining the element.
II. Where is the element located in the genome?
III. Pattern finding in preinsertion sites.
I. Defining the Element
Its size
Copy number in the genome
Are all copies full length?
Are all copies functional?
To which group does this element belong (DNA transposon, LTR retrotransposon, non-LTR retrotransposon)?
Empty site
Post insertion
Defining the end points of the element by sequence alignment
Constructing a consensus sequence with no mutation
Type of element: deduced by BLAST search, using the sequence of the reconstructed element
(could be truncated)
Reconstructed consensus element
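The consensus step can be sketched as a per-column majority vote over copies that have already been aligned. This is only a minimal illustration, assuming equal-length aligned strings with "-" for gaps; the function name `consensus` and the toy sequences are hypothetical and are not taken from the slides.

```python
from collections import Counter


def consensus(aligned_copies, gap="-"):
    """Majority-vote consensus of equal-length, pre-aligned sequences.

    Columns where the gap character is the most common symbol are dropped,
    so the result contains only bases.
    """
    if not aligned_copies:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_copies[0])
    if any(len(s) != length for s in aligned_copies):
        raise ValueError("aligned sequences must have equal length")

    result = []
    for column in zip(*aligned_copies):
        counts = Counter(base.upper() for base in column)
        base, _ = counts.most_common(1)[0]  # ties: first symbol seen wins
        if base != gap:
            result.append(base)
    return "".join(result)


# Toy copies, each carrying at most one point mutation (made-up data).
copies = ["GCATTA", "GCTTTA", "GCATCA"]
print(consensus(copies))
```

A mutation present in only one copy is outvoted by the other copies, which is why the reconstructed element can be free of the mutations seen in any single genomic copy.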
Consensus structures of EhLINEs/SINEs
Bakre Abhijeet
Genomic abundance of full-length and truncated copies of EhLINEs and EhSINEs.
II. Where is the element located in the genome?
Element Analyzer (ELAN): a tool that searches the genome and locates all the elements.
ELAN
Occurrence of genes and other elements near EhLINEs/SINEs
GENES LOCATED UPSTREAM OF EhLINE 1

GENE NAME                                                                Distance (kb)
E. histolytica Superoxide Dismutase                                      1.917
E. histolytica cysteine proteinase                                       2.86
homolog of Ribosomal protein L7                                          0.7
homolog of unknown protein [A. thaliana]                                 1.3
homolog of coatomer complex                                              1.16
E. histolytica heat shock protein Hsp70                                  1.219
E. histolytica gene for amebapore                                        2.63
homolog of AAA10008879                                                   ~1.9
homolog of putative protein kinase [A. thaliana]                         ~1.7
homolog of NP_660094                                                     1.38
homologous to AB006697                                                   1.1
SSE58 repeat region                                                      0.3-0.8
homologous to T31094                                                     0.3
homologous to Df gene product of D. melanogaster                         0.9
homologous to LRP 16 [M. musculus]                                       ~1.6
homologous to AP003806 [Oryza sativa]                                    1.8
homologous to mitochondrial energy transfer protein [Solanum tuberosum]  ~1.8
Phosphomannose isomerase homolog [A. thaliana]                           0.7
homologous to CAAX prenyl protease [A. thaliana]                         0.25

Average distance at which a gene is located: 1.3 kb
Percentage of hits where an ORF was found: 64%
GENES DOWNSTREAM OF EhRLE                                                                                    DISTANCE FROM THE 3’ END OF ELEMENT (bp)
NP_691986 hypothetical conserved protein [Oceanobacillus iheyensis] gi|23098520|ref|NP_691986.1|[23098520]  500
NP_473455 T-cell activation Rho GTPase-activating protein isoform b [Homo sapiens] gi|21314774|              952
BAC04765 unnamed protein product [Homo sapiens] gi|21755816|                                                 2236
NP_562800 hypothetical protein [Clostridium perfringens] gi|18310866|                                        2718
NP_345179 metallo-beta-lactamase superfamily protein [Streptococcus pneumoniae TIGR4]                        397
NP_070339 long-chain-fatty-acid--CoA ligase (fadD-6) [Archaeoglobus fulgidus]                                781
EAA14243 agCP8299 [Anopheles gambiae str. PEST]                                                              2833
AAM43731 Prestalk protein precursor. [Dictyostelium discoideum]                                              2772
AAM34385 hypothetical protein [Dictyostelium discoideum]                                                     749
XP_124364 similar to RIKEN cDNA 2410043F08 [Mus musculus]                                                    795
NP_473326 predicted using hexExon; MAL3P7.21 (PFC0960c),                                                     1199
Genes located downstream of EhLINE 1
Analysis of the genes both upstream and downstream shows that EhLINE 1 has invaded the genome widely.
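Distances like those tabulated above can be derived from coordinates once the element copies and the annotated genes are located. The sketch below is not ELAN itself; it is a hypothetical helper, assuming forward-strand (start, end) coordinates in bp and made-up positions, that reports the gap to the nearest non-overlapping gene on each side of an element.

```python
def flanking_gene_distances(element, genes):
    """Return (upstream_bp, downstream_bp) gaps to the nearest flanking genes.

    element: (start, end) of the element copy, forward-strand coordinates.
    genes:   iterable of (start, end) gene coordinates on the same contig.
    A side with no gene yields None. Overlapping genes are ignored.
    """
    e_start, e_end = element
    upstream = [e_start - g_end for g_start, g_end in genes if g_end <= e_start]
    downstream = [g_start - e_end for g_start, g_end in genes if g_start >= e_end]
    return (min(upstream) if upstream else None,
            min(downstream) if downstream else None)


# Hypothetical coordinates, for illustration only.
element = (5000, 9800)
genes = [(1000, 3083), (3500, 4300), (10300, 11500), (13000, 14000)]
print(flanking_gene_distances(element, genes))  # (700, 500)
```

For an element on the reverse strand, "upstream" and "downstream" would swap; that case is left out here to keep the sketch short.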
III. Pattern Finding
Although the element inserts in many locations, it has some preferences. What are these?
Preferred sites
The sites that are preferred by the endonuclease for nicking (GCATT)
Amongst these, the sites that have a preferred structure
GCATT ? ?
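Finding every candidate nicking site is a plain motif scan. The following is a minimal sketch, assuming an uppercase or lowercase DNA string over A/C/G/T; the function names and the test sequence are hypothetical. It reports 0-based start positions on the forward strand, with "-" marking sites where GCATT lies on the reverse strand.

```python
def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, motif="GCATT"):
    """List (position, strand) for every occurrence of motif on either strand."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping matches
    return sorted(hits)


# Made-up sequence with one site on each strand.
print(find_sites("TTGCATTCCAATGCAA"))  # [(2, '+'), (9, '-')]
```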
DNA structure criteria tested, based on dinucleotide frequencies
Thymine excess
Bendability
Propeller twist
Stacking energy
Free energy
DNA denaturation energy
Protein-induced deformability
Nucleosome positioning
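Each of these criteria can be scored in the same way: look up a value for every dinucleotide step, smooth it with a sliding window, and average position by position over many aligned preinsertion sites. The sketch below assumes that approach; the function names are hypothetical, and `TOY_TABLE` is a deliberately artificial scale (1.0 for steps made only of A/T, 0.0 otherwise), not a published parameter set. A real analysis would substitute a published dinucleotide table for the property of interest.

```python
from itertools import product

# Artificial scale for illustration only: NOT real physical parameters.
TOY_TABLE = {a + b: float(a in "AT" and b in "AT")
             for a, b in product("ACGT", repeat=2)}


def dinucleotide_profile(seq, table, window=10):
    """Sliding-window mean of a dinucleotide property along seq.

    Raises KeyError if seq contains symbols missing from table (e.g. N).
    """
    seq = seq.upper()
    steps = [table[seq[i:i + 2]] for i in range(len(seq) - 1)]
    return [sum(steps[i:i + window]) / window
            for i in range(len(steps) - window + 1)]


def mean_profile(sequences, table, window=10):
    """Position-wise average profile over equal-length aligned sites."""
    profiles = [dinucleotide_profile(s, table, window) for s in sequences]
    return [sum(col) / len(col) for col in zip(*profiles)]


# Two made-up "preinsertion sites" of equal length.
print(mean_profile(["AATTGC", "GCGCGC"], TOY_TABLE, window=2))
```

Plotting the averaged profile against position relative to the insertion point gives curves of the kind shown for propeller twist, stacking energy, duplex stability and denaturation energy.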
[Figure: four panels, labeled (a) to (d), plotting Propeller Twist, Stacking Energy, Duplex Stability (free energy) and DNA denaturation energy against position, from -42 to 38.]
Computational analysis of preinsertion loci
Conclusion
EhLINEs/SINEs insert in a rigid region that can melt easily and is 10-35 nucleotides upstream of the preferred EN sequence (GCATT).
DNA SCANNER
Identification of insertion hot spots for non-LTR retrotransposons: computational and biochemical application to Entamoeba histolytica
Prabhat K. Mandal (3), Kamal Rawal (1), Ram Ramaswamy (1,2), Alok Bhattacharya (1,3) and Sudha Bhattacharya*
Nucleic Acids Research, 2006, Vol. 00, No. 00, 1–12. doi:10.1093/nar/gkl710
School of Environmental Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi 110 067, India
(1) School of Information Technology, Jawaharlal Nehru University, New Delhi 110 067, India
(2) School of Physical Sciences, Jawaharlal Nehru University, New Delhi 110 067, India
(3) School of Life Sciences, Jawaharlal Nehru University, New Delhi 110 067, India
Received June 26, 2006; Revised August 22, 2006; Accepted September 14, 2006
THANKS!