Predicting MicroRNA Genes and Target Site using Structural and Sequence Features: Machine Learning Approach Malik Yousef Institute of Applied Research,
Post on 02-Jan-2016
Predicting MicroRNA Genes and Target Site using Structural and Sequence Features: Machine Learning Approach
Malik Yousef
• Institute of Applied Research, The Galilee Society, Israel
• Louise Showe Lab, Wistar Institute, UPENN, USA
• Neurocomputation Laboratory of CRI-Haifa
microRNA Precursor
>hsa-mir-1-1 MI0000651
UGGGAAACAUACUUCUUUAUAUGCCCAUAUGGACCUGCUAAGCUAUGGAAUGUAAAGAAGUAUGUAUCUCA
Mature
>hsa-mir-1-1 uggaauguaaagaaguaugua
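To make the relation between the two records concrete, here is a minimal sketch that locates the mature sequence inside the precursor by plain substring search. The two sequences are copied from the slide; the variable names are only illustrative.

```python
# Sequences copied from the slide (RNA alphabet).
precursor = "UGGGAAACAUACUUCUUUAUAUGCCCAUAUGGACCUGCUAAGCUAUGGAAUGUAAAGAAGUAUGUAUCUCA"
mature = "uggaauguaaagaaguaugua".upper()  # upper-case so both use the same alphabet

# 0-based offset of the mature sequence in the precursor, or -1 if absent.
start = precursor.find(mature)
end = start + len(mature)

print(f"precursor length: {len(precursor)} nt")
print(f"mature length:    {len(mature)} nt")
if start >= 0:
    print(f"mature miRNA occupies precursor positions {start + 1}..{end} (1-based)")
else:
    print("mature sequence not found in precursor")
```

The mature sequence sits near the 3' end of this precursor, which is the kind of positional information a window-based predictor has to recover.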
Cancer genomics: Small RNAs with BIG impacts
Paul S. Meltzer, Nature
During the past few years, molecular biologists have been stunned by the discovery of hundreds of genes that encode small RNA molecules.
MicroRNA expression profiles classify human cancers
Jun Lu et al., Nature 2005
MicroRNA in cancer: analysis of microRNA expression in over 300 individuals shows that microRNA profiles could be of value in cancer diagnosis.
• 2 MicroRNAs Promote Spread of Tumor Cells: By blocking the translation of tumor suppressor genes, miRNAs have been shown to facilitate the development of many types of cancer.
• Small RNAs Can Prevent Spread of Breast Cancer: The tiny RNAs prevent the spread of cancer by interfering with the expression of genes that give cancer cells the ability to proliferate and migrate.
• MicroRNAs May Be Key To HIV's Ability To Hide, Evade Drugs
miRNA processing
PART I: MICRORNA PREDICTION
BayesMiRNAfind: Naïve Bayes For miRNA Gene Prediction
873 outside jobs were processed in 2 weeks on the Wistar Bioinformatics Core cluster.
http://bioinfo.wistar.upenn.edu/miRNA/miRNA
One-ClassMirnaFind: One-Class microRNA gene prediction Web Server
http://wotan.wistar.upenn.edu/mirna_one_class
BMC-Algorithms for Molecular Biology
Advantage of our tools
Allows predicting miRNAs for multiple species [Vir-mir db, Li et al 2007]; most of the other tools are species-specific
The input is not limited! (full genome)
Also predicts non-conserved miRNA
The features seem to describe the miRNA class more accurately.
Two-Class: The Computation procedure components
1. Input: Genomic sequences. <cagtaataatctaaaaggacttttatcaacaattatgatattgtatatgcagcatnce>
2. Fold the sequence: A sliding window of 110 nt length is passed along the input sequence. Overlapping stem-loops are removed.
3. Potential stem-loops filter: Extract potential stem-loops. Two lists are generated: the potential positive stem-loops and the potential negative stem-loops.
4. Mature microRNA candidate: Pass a sliding window of 21 nt length to generate candidates. The negative class is generated from the potential negative stem-loops.
5. Naïve Bayes Classifier: Build the classifier, or use the trained model for classification.
6. Naïve Bayes Analyzer: Pick the one with the highest score [chose different name]
7. Naïve Bayes Filter: Naïve Bayes score filter
8. Conservation Filter: Conservation filter
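The window, score and filter steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the BayesMiRNAfind implementation: the folding step and the structural features are left out, dinucleotide counts stand in for the real feature set, and the class `ToyNaiveBayes`, the helper names and the 0.9 threshold are all hypothetical.

```python
import itertools
import math
from collections import Counter


def sliding_windows(seq, size, step=1):
    """Yield (start, window) for each window of `size` nucleotides (steps 2 and 4)."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


class ToyNaiveBayes:
    """Two-class multinomial naive Bayes over k-mer counts, Laplace smoothed.

    Assumption: k-mer counts replace the structural and sequence features
    used by the real tool. Both training lists must be non-empty.
    """

    def __init__(self, k=2, alpha=1.0):
        self.k = k
        self.alpha = alpha
        self.vocab = ["".join(p) for p in itertools.product("ACGU", repeat=k)]

    def _kmers(self, seq):
        seq = seq.upper().replace("T", "U")
        return [seq[i:i + self.k] for i in range(len(seq) - self.k + 1)]

    def fit(self, positives, negatives):
        total = len(positives) + len(negatives)
        self.log_prior, self.log_like = {}, {}
        for label, seqs in (("pos", positives), ("neg", negatives)):
            counts = Counter(km for s in seqs for km in self._kmers(s))
            denom = sum(counts[km] for km in self.vocab) + self.alpha * len(self.vocab)
            self.log_prior[label] = math.log(len(seqs) / total)
            self.log_like[label] = {
                km: math.log((counts[km] + self.alpha) / denom) for km in self.vocab
            }
        return self

    def score(self, seq):
        """Posterior probability of the positive class (step 5)."""
        joint = dict(self.log_prior)
        for km in self._kmers(seq):
            if km in self.log_like["pos"]:  # skip k-mers with ambiguous bases
                for label in joint:
                    joint[label] += self.log_like[label][km]
        diff = joint["neg"] - joint["pos"]
        return 0.0 if diff > 700 else 1.0 / (1.0 + math.exp(diff))


def best_mature_candidate(stem_loop, model, size=21, threshold=0.9):
    """Score every 21-nt window, keep the best (step 6) if it passes the filter (step 7)."""
    scored = [(model.score(w), start, w) for start, w in sliding_windows(stem_loop, size)]
    if not scored:
        return None
    best = max(scored)
    return best if best[0] >= threshold else None


# Toy training data: one positive and one negative 21-mer (illustrative only).
model = ToyNaiveBayes().fit(["UGGAAUGUAAAGAAGUAUGUA"], ["CCCCCCCCCCCCCCCCCCCCC"])
```

Calling `best_mature_candidate(stem_loop, model)` on a candidate stem-loop returns a `(score, start, window)` tuple or `None`. The conservation filter of step 8 is not modelled here.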
BayesMiRNA: Mouse Genome (one strand)
1. Input: Genomic sequence. Number of nucleotides: 209117380 (135/135)
2. Fold the sequence without overlapping: 21974811 (135/135)
3. Potential stem-loops filter: Extract potential stem-loops: 8967363 (135/135)
4. Mature microRNA candidates: 458606474 (135/135)
5. Naïve Bayes Classifier: 698399 (135/135)
6. Naïve Bayes Analyzer: 265935 (135/135)
7. Naïve Bayes Filter
8. Conservation Filter
Out of 212 mature miRNAs from the mouse genome, 135 are on the DNA + strand.
Running on a parallel compute cluster with 100 nodes (http://core.pcbi.upenn.edu/tools/liniactools.html), the whole computation procedure took about 6.5 days to complete.
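The counts on this slide are easier to read as ratios. The sketch below only does arithmetic on the numbers quoted above; the stage labels are paraphrases and the variable names are illustrative.

```python
# Counts copied from the slide; labels are paraphrased.
stages = [
    ("input nucleotides", 209117380),
    ("folded windows without overlap", 21974811),
    ("potential stem-loops", 8967363),
    ("mature microRNA candidates", 458606474),
    ("after naive Bayes classifier", 698399),
    ("after naive Bayes analyzer", 265935),
]
counts = dict(stages)

# Share of known mouse mature miRNAs that lie on the + strand.
plus_strand_fraction = 135 / 212

# Average number of 21-nt candidates generated per potential stem-loop.
candidates_per_stem_loop = (
    counts["mature microRNA candidates"] / counts["potential stem-loops"]
)

# Fraction of candidates that survive the classifier, then the analyzer.
kept_by_classifier = (
    counts["after naive Bayes classifier"] / counts["mature microRNA candidates"]
)
kept_by_analyzer = (
    counts["after naive Bayes analyzer"] / counts["after naive Bayes classifier"]
)

print(f"+ strand share of known miRNAs: {plus_strand_fraction:.1%}")
print(f"candidates per stem-loop:       {candidates_per_stem_loop:.1f}")
print(f"kept by classifier:             {kept_by_classifier:.3%}")
print(f"kept by analyzer:               {kept_by_analyzer:.1%}")
```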
PART II: MICRORNA TARGET SITE PREDICTION
Bioinformatics Journal
miRNA target site prediction
Morten Lindow, miRNA-group, Bioinformatics Centre, University of Copenhagen
Performance of NBmiRTar
[Figure: sensitivity and specificity of NBmiRTar; vertical axis from 0.92 to 0.99, horizontal axis ticks from about 100 to 5000]
Summary of microRNA target prediction
[Diagram labels, in order of appearance: 3’UTR, microRNAs, MiRanda, Naïve Bayes classifier, predicted microRNA targets, orthologs score, classifier, mouse 3’UTR (mavid), sequence alignment, database, human 3’UTR, Miranda score, naïve Bayes score, folding energy, filters]
NBmiRTar
http://wotan.wistar.upenn.edu/NBmiRNAtarget/login.php
Results with Human Known Targets

miRNA  Confirmed  Miranda      Recovery    BayesMirnaTarget  Recovery by       NB-filter
       targets    predictions  by Miranda  predictions       BayesMirnaTarget  0.9
1      1          401592       1/1         87843             0/1               60108
2      2          64984        2/2         34380             2/2               27239
3      2          321312       2/2         80632             2/2               60967
4      2          49556        2/2         24090             1/2               19013
5      1          563477       1/1         259423            1/1               202339
6      1          294255       1/1         84153             1/1               61725
7      1          596411       1/1         92337             1/1               62118
8      1          381933       1/1         54636             0/1               34138
9      1          329770       1/1         42736             1/1               45447
10     1          328120       1/1         73852             1/1               47663
Sum    13         3331410      13          834082            10                620757
• NBmiRTar reduces the number of Miranda predictions by about 75% with a recovery rate of 77%.
• NBmiRTar + NB-filter (threshold 0.9) reduces the number of Miranda predictions by about 81% with a recovery rate of 77%.
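The percentages in the two bullets follow from the totals in the results table. A short check, using only the per-miRNA numbers from that table (the tuple layout and variable names are illustrative):

```python
# One tuple per miRNA, copied from the results table:
# (confirmed targets, Miranda predictions, BayesMirnaTarget predictions,
#  targets recovered by BayesMirnaTarget, predictions after NB-filter 0.9)
rows = [
    (1, 401592, 87843, 0, 60108),
    (2, 64984, 34380, 2, 27239),
    (2, 321312, 80632, 2, 60967),
    (2, 49556, 24090, 1, 19013),
    (1, 563477, 259423, 1, 202339),
    (1, 294255, 84153, 1, 61725),
    (1, 596411, 92337, 1, 62118),
    (1, 381933, 54636, 0, 34138),
    (1, 329770, 42736, 1, 45447),
    (1, 328120, 73852, 1, 47663),
]

# Column-wise sums reproduce the "Sum" row of the table.
confirmed, miranda, bayes, recovered, nb_filtered = (sum(col) for col in zip(*rows))

reduction_bayes = 1 - bayes / miranda         # share of Miranda predictions removed
reduction_filter = 1 - nb_filtered / miranda  # same, after the 0.9 score filter
recovery = recovered / confirmed              # confirmed targets still predicted

print(f"reduction by classifier: {reduction_bayes:.1%}")
print(f"reduction with filter:   {reduction_filter:.1%}")
print(f"recovery rate:           {recovery:.1%}")
```

Note that the table reports recovery only for the classifier output; the recovery figure in the second bullet is taken from the slide as stated.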
Showe Lab Experiment
427 known human mature miRNA
Human 3’UTR: 50 genes (59 TFs)
MiRanda: 2620700 predictions
NBmiRTar: 32199 predictions
390969
Filters: MiRanda 110, NB 0.9, Orthologs
Mouse 3’UTR genes: MiRanda 110, NB 0.9
50 genes that have been shown to be down regulated at the message level after treatment
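One way to picture the filter chain named on this slide (MiRanda 110, NB 0.9, orthologs) is as three successive tests on each prediction. The sketch below is an assumption about how such a chain could be wired, not the NBmiRTar code: the `Prediction` record, the `passes` and `filter_predictions` helpers, the ortholog lookup table and the toy data are all hypothetical, and miRNAs are matched across species by a shared family name for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    mirna: str            # miRNA family name, shared across species (assumption)
    gene: str             # gene whose 3'UTR carries the predicted site
    miranda_score: float
    nb_score: float       # naive Bayes score in [0, 1]


def passes(p, miranda_min=110.0, nb_min=0.9):
    """Score filters: MiRanda score and naive Bayes score thresholds."""
    return p.miranda_score >= miranda_min and p.nb_score >= nb_min


def filter_predictions(human, mouse, orthologs, miranda_min=110.0, nb_min=0.9):
    """Keep human predictions that pass both score filters and whose mouse
    ortholog has a passing prediction for the same miRNA."""
    mouse_hits = {
        (p.mirna, p.gene) for p in mouse if passes(p, miranda_min, nb_min)
    }
    kept = []
    for p in human:
        mouse_gene = orthologs.get(p.gene)  # None when no ortholog is known
        if passes(p, miranda_min, nb_min) and (p.mirna, mouse_gene) in mouse_hits:
            kept.append(p)
    return kept


# Toy data, invented for illustration only.
human = [
    Prediction("miR-1", "GENE_A", 120, 0.95),  # passes everything
    Prediction("miR-1", "GENE_B", 120, 0.50),  # fails the NB filter
    Prediction("miR-2", "GENE_A", 105, 0.99),  # fails the MiRanda filter
]
mouse = [Prediction("miR-1", "Gene_a", 115, 0.92)]
orthologs = {"GENE_A": "Gene_a", "GENE_B": "Gene_b"}

kept = filter_predictions(human, mouse, orthologs)
print(kept)
```

Applying the cheap score thresholds before the ortholog lookup keeps the set comparison small, which matters when the starting point is millions of MiRanda predictions.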
Related publications
• Malik Yousef, Segun Jung, Louise C. Showe and Michael K. Showe. Learning from Positive Examples when the Negative Class is Undetermined - microRNA gene identification. Algorithms for Molecular Biology (accepted, 2008).
• Malik Yousef, Segun Jung, Andrew V. Kossenkov, Louise C. Showe and Michael K. Showe. Naïve Bayes classifier for microRNA target gene identification. Bioinformatics, 15 November 2007; 23: 2987-2992.
• Malik Yousef, Hagit Shatkay, Michael Nebozhyn, Louise C. Showe and Michael K. Showe. Combining Multi-Species Genomic Data for MicroRNA Identification Using Naïve Bayes Classifier. Bioinformatics, Vol. 22, No. 11, p. 1325-1334 (2006).