Predicting MicroRNA Genes and Target Site using Structural and Sequence Features: Machine Learning Approach
Malik Yousef
Institute of Applied Research, The Galilee Society, Israel
Louise Showe Lab, Wistar Institute, UPENN, USA
Neurocomputation Laboratory of CRI-Haifa

Jan 02, 2016

Page 1

Predicting MicroRNA Genes and Target Site using Structural and Sequence Features: Machine Learning Approach

Malik Yousef
• Institute of Applied Research, The Galilee Society, Israel
• Louise Showe Lab, Wistar Institute, UPENN, USA
• Neurocomputation Laboratory of CRI-Haifa

Page 2

microRNA Precursor
>hsa-mir-1-1 MI0000651
UGGGAAACAUACUUCUUUAUAUGCCCAUAUGGACCUGCUAAGCUAUGGAAUGUAAAGAAGUAUGUAUCUCA

Mature
>hsa-mir-1-1
uggaauguaaagaaguaugua
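The relationship between the two sequences above can be checked with a short script. This is only an illustrative sketch: the function name `locate_mature` is an assumption made for this example, and the search is a plain case-insensitive substring match, not part of any tool described in the slides.

```python
# Sequences copied from the slide (hsa-mir-1-1, MI0000651).
PRECURSOR = (
    "UGGGAAACAUACUUCUUUAUAUGCCCAUAUGGACCUGCUAAGCUAUGGAAUGUAAAGAAGUAUGUAUCUCA"
)
MATURE = "uggaauguaaagaaguaugua"


def locate_mature(precursor: str, mature: str) -> int:
    """Return the 0-based start of `mature` inside `precursor`, or -1 if absent.

    Hypothetical helper: the comparison ignores letter case because the slide
    writes the precursor in upper case and the mature sequence in lower case.
    """
    return precursor.upper().find(mature.upper())


start = locate_mature(PRECURSOR, MATURE)
end = start + len(MATURE)
print(f"mature miRNA occupies precursor positions {start + 1}-{end} (1-based)")
```

The mature sequence is a contiguous substring near the 3' end of the precursor, which is why later steps can scan a stem-loop with a fixed-length window.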

Cancer genomics: Small RNAs with BIG impacts. Paul S. Meltzer, Nature.
During the past few years, molecular biologists have been stunned by the discovery of hundreds of genes that encode small RNA molecules.

MicroRNA expression profiles classify human cancers. Jun Lu et al., Nature 2005.

MicroRNA in cancer: analysis of microRNA expression in over 300 individuals shows that microRNA profiles could be of value in cancer diagnosis.

Page 3

• 2 MicroRNAs Promote Spread of Tumor Cells: By blocking the translation of tumor suppressor genes, miRNAs have been shown to facilitate the development of many types of cancer.

• Small RNAs Can Prevent Spread of Breast Cancer: The tiny RNAs prevent the spread of cancer by interfering with the expression of genes that give cancer cells the ability to proliferate and migrate.

• MicroRNAs May Be Key To HIV's Ability To Hide, Evade Drugs

Page 4

miRNA processing

Page 5

PART I: MICRORNA PREDICTION

Page 6

BayesMiRNAfind: Naïve Bayes For miRNA Gene Prediction

873 outside jobs were processed in 2 weeks on the Wistar Bioinformatics Core cluster.

http://bioinfo.wistar.upenn.edu/miRNA/miRNA

Page 7

One-ClassMirnaFind: One-Class microRNA gene prediction Web Server

http://wotan.wistar.upenn.edu/mirna_one_class

BMC-Algorithms for Molecular Biology

Page 8

Advantage of our tools

Allows predicting miRNAs for multiple species [Vir-mir db, Li et al. 2007]; most of the other tools are species-specific.

The input is not limited (full genome).

Also predicts non-conserved miRNAs.

The features seem to describe the miRNA class more accurately.

Page 9

Two-Class: The Computation procedure components

1. Input: Genomic sequences. <cagtaataatctaaaaggacttttatcaacaattatgatattgtatatgcagcatnce>

2. Fold the sequence: A sliding window of 110 nt length is passed along the input sequence. Overlapping stem-loops are removed.

3. Potential stem-loops filter: Extract potential stem-loops. Two lists are generated, the potential positive stem-loops and the potential negative stem-loops.

4. Mature microRNA candidate: A sliding window of 21 nt length is passed along each stem-loop. The negative class is generated from the potential negative stem-loops.

5. Naïve Bayes Classifier: Build the classifier, or use the trained model for classification.

6. Naïve Bayes Analyzer: Pick the candidate with the highest score. [choose different name]

7. Naïve Bayes Filter: Naïve Bayes score filter.

8. Conservation Filter: Conservation filter.
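The windowing in steps 2 and 4 and the "pick the highest score" rule in step 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the BayesMiRNAfind implementation: the window step of 1 nt is assumed (the slide gives only the window lengths), RNA folding is omitted entirely, and `gc_fraction` is a hypothetical stand-in for the Naïve Bayes score.

```python
from typing import Callable, Iterator, Optional, Tuple

STEM_LOOP_WINDOW = 110  # nt, window length from step 2
MATURE_WINDOW = 21      # nt, window length from step 4


def sliding_windows(seq: str, size: int, step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for every full-length window of `size` nt.

    The default step of 1 nt is an assumption; the slides do not state it.
    """
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def best_candidate(
    stem_loop: str,
    score: Callable[[str], float],
    size: int = MATURE_WINDOW,
) -> Optional[Tuple[int, str, float]]:
    """Return (start, candidate, score) of the top-scoring window, or None
    when the stem-loop is shorter than the window."""
    scored = [(start, cand, score(cand)) for start, cand in sliding_windows(stem_loop, size)]
    return max(scored, key=lambda item: item[2]) if scored else None


def gc_fraction(candidate: str) -> float:
    """Hypothetical placeholder score; NOT the Naive Bayes score from the slides."""
    return sum(base in "GCgc" for base in candidate) / len(candidate)
```

A caller would pass each 110 nt window to a folding program, keep the stem-loops, and then call `best_candidate(stem_loop, trained_score)` with a real scoring function in place of `gc_fraction`.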

Page 10

BayesMiRNA: Mouse Genome (one strand)

1. Input: Genomic sequence. Number of nucleotides: 209117380 (135/135)

2. Fold the sequence without overlapping: 21974811 (135/135)

3. Potential stem-loops filter: Extract potential stem-loops: 8967363 (135/135)

4. Mature microRNA candidates: 458606474 (135/135)

5. Naïve Bayes Classifier: 698399 (135/135)

6. Naïve Bayes Analyzer: 265935 (135/135)

7. Naïve Bayes Filter

8. Conservation Filter

Out of 212 mature miRNAs from the mouse genome, 135 are on the DNA + strand.

Running on a parallel compute cluster with 100 nodes (http://core.pcbi.upenn.edu/tools/liniactools.html), the whole computation procedure took about 6.5 days to complete.
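To make the scale of each stage easier to compare, the counts listed on this slide can be turned into stage-to-stage ratios. The sketch below only does arithmetic on the numbers shown above; the stage labels are paraphrases chosen for this example, and `stage_ratios` is a hypothetical helper name.

```python
from typing import List, Tuple

# Counts copied from the slide; labels are paraphrased for readability.
STAGE_COUNTS: List[Tuple[str, int]] = [
    ("input nucleotides", 209_117_380),
    ("folded windows, no overlap", 21_974_811),
    ("potential stem-loops", 8_967_363),
    ("mature microRNA candidates", 458_606_474),
    ("after Naive Bayes classifier", 698_399),
    ("after Naive Bayes analyzer", 265_935),
]


def stage_ratios(stages: List[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Return (stage name, count / previous stage count) for stages 2..n."""
    return [
        (name, count / prev_count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]


for name, ratio in stage_ratios(STAGE_COUNTS):
    print(f"{name}: x{ratio:.4f} relative to the previous stage")
```

The candidate stage grows rather than shrinks because each stem-loop yields many 21 nt windows; the classifier stage is where most of the reduction happens.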

Page 11

PART II: MICRORNA TARGET SITE PREDICTION

Bioinformatics Journal

Page 12

miRNA target site prediction

Morten Lindow, miRNA-group, Bioinformatics Centre, University of Copenhagen

Page 13

Performance of NBmiRTar

[Figure: Sensitivity and Specificity curves; one axis runs from 0.92 to 0.99, the other from 100 to 5000.]
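For readers unfamiliar with the two quantities plotted here, the standard definitions are sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below applies them to made-up counts; the numbers are hypothetical and are not taken from the NBmiRTar evaluation.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real positives that the classifier recovers."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of real negatives that the classifier rejects."""
    return true_neg / (true_neg + false_pos)


# Hypothetical confusion-matrix counts, for illustration only.
print(sensitivity(true_pos=95, false_neg=5))
print(specificity(true_neg=980, false_pos=20))
```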

Page 14

Summary of microRNA target prediction

[Flow diagram. Labels: microRNAs; Database; Human 3' UTR; MiRanda; Naïve Bayes classifier; Predicted microRNA targets; Mouse 3' UTR; Sequence alignment (mavid); Orthologs score; Filters: Miranda score, naïve Bayes score, Folding energy.]

Page 15

NBmiRTar
http://wotan.wistar.upenn.edu/NBmiRNAtarget/login.php

Page 16

Results with Human Known Targets

miRNA  Confirmed  Miranda      Recovery    BayesMirnaTarget  Recovery by       NB-filter
       targets    predictions  by Miranda  predictions       BayesMirnaTarget  0.9
1      1          401592       1/1         87843             0/1               60108
2      2          64984        2/2         34380             2/2               27239
3      2          321312       2/2         80632             2/2               60967
4      2          49556        2/2         24090             1/2               19013
5      1          563477       1/1         259423            1/1               202339
6      1          294255       1/1         84153             1/1               61725
7      1          596411       1/1         92337             1/1               62118
8      1          381933       1/1         54636             0/1               34138
9      1          329770       1/1         42736             1/1               45447
10     1          328120       1/1         73852             1/1               47663
Sum    13         3331410      13/13       834082            10/13             620757

• NBmiRTar reduces Miranda predictions by about 75% with a recovery rate of 77%.

• NBmiRTar + NB-filter (threshold 0.9) reduces Miranda predictions by about 81% with a recovery rate of 77%.
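The two percentages quoted in the bullets follow directly from the "Sum" row of the table. The sketch below reproduces that arithmetic; the variable and function names are chosen for this example only.

```python
# Totals copied from the "Sum" row of the table.
CONFIRMED_TARGETS = 13
MIRANDA_PREDICTIONS = 3_331_410
NBMIRTAR_PREDICTIONS = 834_082
NBMIRTAR_RECOVERED = 10
NB_FILTER_PREDICTIONS = 620_757


def reduction(before: int, after: int) -> float:
    """Fraction of predictions removed when going from `before` to `after`."""
    return 1 - after / before


nbmirtar_reduction = reduction(MIRANDA_PREDICTIONS, NBMIRTAR_PREDICTIONS)
nb_filter_reduction = reduction(MIRANDA_PREDICTIONS, NB_FILTER_PREDICTIONS)
recovery_rate = NBMIRTAR_RECOVERED / CONFIRMED_TARGETS

print(f"NBmiRTar reduction:      {nbmirtar_reduction:.1%}")
print(f"NBmiRTar + NB-filter:    {nb_filter_reduction:.1%}")
print(f"Recovery of known sites: {recovery_rate:.1%}")
```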

Page 17

Showe Lab Experiment

427 known human mature miRNA

50 genes (59 TFs), human 3'UTR

MiRanda: 2620700 Predictions

NBmiRTar 32199 Predictions

390969

Filters: MiRanda 110; NB 0.9; Orthologs (Mouse 3'UTR Genes)

50 genes that have been shown to be down-regulated at the message level after treatment.
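The filter step named on this slide (MiRanda 110, NB 0.9, mouse orthologs) can be sketched as a simple predicate over predicted target sites. Everything beyond the two threshold values is an assumption made for illustration: the `Prediction` record, its field names, the use of "greater than or equal" comparisons, and treating the ortholog check as a boolean flag are not taken from NBmiRTar.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIRANDA_MIN = 110.0  # "MiRanda 110" from the slide; >= comparison is assumed
NB_MIN = 0.9         # "NB 0.9" from the slide; >= comparison is assumed


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record for one predicted miRNA target site."""
    mirna: str
    gene: str
    miranda_score: float
    nb_score: float
    conserved_in_mouse: bool  # stand-in for the mouse 3'UTR ortholog check


def passes_filters(pred: Prediction) -> bool:
    """True when a prediction clears all three filters."""
    return (
        pred.miranda_score >= MIRANDA_MIN
        and pred.nb_score >= NB_MIN
        and pred.conserved_in_mouse
    )


def apply_filters(preds: Iterable[Prediction]) -> List[Prediction]:
    """Keep only the predictions that pass every filter."""
    return [pred for pred in preds if passes_filters(pred)]
```

Calling `apply_filters` on a list of `Prediction` records returns the subset that would remain after all three filters, in the original order.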

Page 18

Related publications

• Malik Yousef, Segun Jung, Louise C. Showe and Michael K. Showe. Learning from Positive Examples when the Negative Class is Undetermined- microRNA gene identification. Algorithms for Molecular Biology (accepted, 2008).

• Malik Yousef, Segun Jung, Andrew V. Kossenkov, Louise C. Showe and Michael K. Showe. Naïve Bayes classifier for microRNA target gene identification. Bioinformatics, 15 November 2007; 23: 2987-2992.

• Malik Yousef, Hagit Shatkay, Michael Nebozhyn, Louise C. Showe and Michael K. Showe. Combining Multi-Species Genomic Data for MicroRNA Identification Using Naïve Bayes Classifier. Bioinformatics, Vol. 22, No. 11, p. 1325-1334 (2006).