Transcription Factor-DNA binding prediction. Tahmina Ahmed, Prosunjit Biswas, Iffat Sharmin Chowdhury, Badri Sampath

Transcription Factor DNA Binding Prediction

Nov 18, 2014

Transcript
  • 1. Transcription Factor-DNA binding prediction. Tahmina Ahmed, Prosunjit Biswas, Iffat Sharmin Chowdhury, Badri Sampath
  • 2. Motivation: Label the unlabeled DNA sequences with a model built by examining the labeled DNA sequences, and gain experience with some real-world machine learning problems.
  • 3. Approaches: (1) K-mer based: fixed-length K-mer, K-mer with mismatches, and K-mer using regular expressions. (2) PWM (position weight matrix) based: MEME and MAST. (3) Combined model: unites both models.
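The two simplest K-mer feature types listed above can be illustrated with a minimal Python sketch. The function names (`kmer_counts`, `mismatch_count`) and the Hamming-distance definition of a "mismatch" are assumptions made for illustration; the slides do not show how the features were actually computed.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def mismatch_count(seq, kmer, max_mismatches=1):
    """Count windows of seq that differ from kmer in at most max_mismatches positions."""
    k = len(kmer)
    return sum(
        hamming(seq[i:i + k], kmer) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )

seq = "ACGTACCA"
exact = kmer_counts(seq, 3)["ACG"]        # exact occurrences of ACG
relaxed = mismatch_count(seq, "ACG", 1)   # occurrences allowing one mismatch
```

Each count can then serve as one attribute of a sequence; allowing mismatches lets near-copies of a k-mer (here `ACC`) contribute to the same attribute.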
  • 4. K-mer Approach Based on Regular Expression. Motivation: 2-mers are the K-mers that appear most often in the sequences, so the emphasis is on 2-mers. Strategy: for any two 2-mers X and Y, generate the regular expressions X(.*)Y and Y(.*)X, and use these regular expressions as candidate attributes.
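The regular-expression strategy can be sketched as follows. This is an illustrative sketch only: it assumes the DNA alphabet `ACGT`, assumes X and Y are distinct 2-mers, and uses a binary "pattern matches / does not match" attribute; the names `two_mers`, `candidate_patterns`, and `regex_features` are hypothetical.

```python
import re
from itertools import permutations, product

ALPHABET = "ACGT"

def two_mers():
    """All 16 possible 2-mers over the DNA alphabet."""
    return ["".join(p) for p in product(ALPHABET, repeat=2)]

def candidate_patterns(kmers):
    """Build X(.*)Y for every ordered pair of distinct k-mers.

    Ordered pairs cover both X(.*)Y and Y(.*)X for each unordered pair.
    """
    return [f"{x}(.*){y}" for x, y in permutations(kmers, 2)]

def regex_features(seq, patterns):
    """Binary attribute vector: 1 if the pattern occurs anywhere in seq, else 0."""
    return [1 if re.search(p, seq) else 0 for p in patterns]

patterns = candidate_patterns(two_mers())
features = regex_features("ACTTGG", ["AC(.*)GG", "GG(.*)AC"])
```

With 16 two-mers this yields 240 candidate patterns, which is why a later attribute-selection step is useful.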
  • 5. Classifier Selection. Fig: Nine classifiers applied to the TF data sets. The algorithms are numbered as follows: (1) Logistic, (2) SMO, (3) NaiveBayes, (4) BayesianLogisticRegression, (5) Kstar, (6) Bagging, (7) LogitBoost, (8) RandomForest, (9) J48. Summary: nine classifiers were applied to 10 data sets, three of which are shown; choosing a single best classifier is not a trivial task; the same classifier behaves differently on different data sets.
  • 6. Change in Accuracy due to Different Classifiers. Fig: Performance of different classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_3 data set. Fig: Performance of different classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_5 data set. Summary: the choice of classifier has a large effect on accuracy; classifiers must be chosen with care.
  • 7. Change in Accuracy due to Different K-mer Length. Fig: Performance of different K-mer lengths (4-mer, 5-mer, 6-mer) on the TF_3 data set. Summary: K-mer length also affects accuracy; finding the single best length is not trivial.
  • 8. Attribute Space Selection. Fig: Performance for different numbers of selected k-mers on the TF_4 data set. Summary: the number of attributes considered also affects accuracy; accuracy increases as more attributes are considered, but beyond a saturation point it decreases.
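One way to vary the number of attributes is to rank k-mers and keep only the top n. The slides do not say which ranking criterion was used, so the sketch below assumes a simple one: the absolute difference between the fraction of positive and of negative sequences that contain the k-mer. The names `presence_fraction` and `top_kmers` are hypothetical.

```python
from collections import Counter

def presence_fraction(seqs, k):
    """Fraction of sequences that contain each k-mer at least once."""
    counts = Counter()
    for s in seqs:
        # A set ensures each k-mer is counted once per sequence.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return {kmer: c / len(seqs) for kmer, c in counts.items()}

def top_kmers(pos, neg, k, n):
    """Return the n k-mers whose presence differs most between pos and neg.

    Assumed criterion (not taken from the slides): |frac_pos - frac_neg|.
    Ties are broken alphabetically so the result is deterministic.
    """
    fp = presence_fraction(pos, k)
    fn = presence_fraction(neg, k)
    score = {m: abs(fp.get(m, 0.0) - fn.get(m, 0.0)) for m in set(fp) | set(fn)}
    return sorted(score, key=lambda m: (-score[m], m))[:n]

selected = top_kmers(["ACGT", "ACGA"], ["TTTT", "TTGT"], k=2, n=3)
```

Sweeping `n` and measuring accuracy at each value is one way to look for the saturation point described in the summary.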
  • 9. PWM based Analysis on Accuracy (TF_1 data set). Fig: J48, minW 6, maxW 15, no. of sites 10. Fig: J48, minW 6, maxW 15, no. of motifs 5. Summary: accuracy increases with more motifs at a fixed no. of sites; accuracy increases with more sites at a fixed no. of motifs; what happens when both are increased?
  • 10. PWM based Analysis. Fig: Accuracy varies with the no. of motifs and the no. of sites. The 1st bar shows the no. of sites, the 2nd bar the no. of motifs, and the 3rd bar the accuracy. The point is that accuracy decreases when both the no. of motifs and the no. of sites are increased.
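For readers unfamiliar with PWMs, the idea behind a PWM-based attribute can be sketched in a few lines of Python. This is a generic log-odds position weight matrix built from aligned sites, not the scoring that MEME or MAST actually perform; the pseudocount, the uniform background of 0.25, and the names `build_pwm` and `best_score` are assumptions for illustration. Sequences are assumed to contain only A, C, G, and T.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per column) from equal-length aligned sites."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for col in range(width):
        column = [s[col] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in ALPHABET
        })
    return pwm

def best_score(seq, pwm):
    """Highest PWM score over all windows of seq; usable as one attribute per motif."""
    w = len(pwm)
    return max(
        sum(pwm[j][seq[i + j]] for j in range(w))
        for i in range(len(seq) - w + 1)
    )

pwm = build_pwm(["ACG", "ACG", "ACT"])
hit = best_score("TTACGTT", pwm)    # contains a site-like window
miss = best_score("TTTTTTT", pwm)   # does not
```

In this picture, the number of motifs sets how many such attributes each sequence gets, and the number of sites sets how many aligned instances each matrix is estimated from.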
  • 11. Extra Work for TF_20. Fig: Flow diagram of building the new model for TF_20 (diagram labels: K-mer, PWM, Sequences identified by both models, Sequences labeled differently, Biased 2-mer Model, Newly identified Sequences, The New Model for TF_20). Summary: we have done some extra work for TF_20.
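The flow diagram did not survive extraction, so the exact procedure is not recoverable. One plausible reading of its labels is that predictions from the K-mer and PWM models are compared, and the sequences are split into those both models label identically and those they label differently. The sketch below shows only that splitting step, under that assumption; `split_by_agreement` and the example IDs are hypothetical.

```python
def split_by_agreement(kmer_labels, pwm_labels):
    """Split sequence IDs by whether two models assign the same label.

    Returns (agreed, disputed): agreed maps ID -> shared label,
    disputed lists IDs on which the models differ or one model has no label.
    """
    agreed, disputed = {}, []
    for seq_id, label in kmer_labels.items():
        if pwm_labels.get(seq_id) == label:
            agreed[seq_id] = label
        else:
            disputed.append(seq_id)
    return agreed, disputed

kmer_pred = {"s1": 1, "s2": 0, "s3": 1}
pwm_pred = {"s1": 1, "s2": 1, "s3": 1}
agreed, disputed = split_by_agreement(kmer_pred, pwm_pred)
```

Under this reading, the disputed sequences would be the ones passed on to the biased 2-mer model.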
  • 12. AUC based on the Feedback (bonus model). Fig: AUC of the 10 data sets based on the last submission. Accuracy improved over the first submission; the PWM model did not give good results.
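As a reminder of what the AUC measures, it can be computed directly as the probability that a randomly chosen positive sequence receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a plain pairwise implementation for small data sets, not the evaluation code used for the submissions; the name `auc` and the example scores are made up for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1; scores: predicted scores, higher means more positive.
    Requires at least one positive and one negative example.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

value = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so it is insensitive to the classification threshold, unlike accuracy.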
  • 13. Participation (columns: Background Study; Working with Tools; Working with Models; Parameter Tuning; Automation)
      Badri Sampath: DNA, RNA, protein; AlignAce, MEME, MAST; PWM; K-mer; Arff writer, MAST output motif writer
      Iffat Sharmin Chowdhury: Protein, Motif, Transcription; Weka, AlignAce, ScanAce; K-mer; PWM; Script for FASTA, Weka
      Prosunjit Biswas: DNA, Transcription; MEME, MAST; K-mer; PWM; Script for RE, for new K-mer model
      Tahmina Ahmed: MEME, MAST, PWM; MEME, MAST, Weka; PWM; K-mer; Script for MEME, MAST
  • 14. Acknowledgment
  • 15. Questions?