2. Motivation
Label unlabeled DNA sequences with a model built by examining labeled DNA sequences, and gain exposure to real-world machine learning problems.
3. Approaches
- K-mer based
  - Fixed-length K-mer
  - K-mer with mismatches
  - Using regular expressions
- PWM based
  - MEME and MAST
- Combined model
  - Unite both models
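The fixed-length K-mer and K-mer-with-mismatches approaches listed above can be illustrated with a minimal Python sketch. It assumes sequences contain only the letters A, C, G, T and uses Hamming distance to define a mismatch; the function names are hypothetical and this is not the code used in the project.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: only unambiguous nucleotides occur


def kmer_counts(seq, k):
    """Fixed-length K-mer: count every overlapping substring of length k."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_counts(seq, k, m):
    """K-mer with mismatches: for every possible k-mer over ALPHABET,
    count the windows of seq that differ from it in at most m positions."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    features = {}
    for letters in product(ALPHABET, repeat=k):
        kmer = "".join(letters)
        features[kmer] = sum(hamming(kmer, w) <= m for w in windows)
    return features
```

For example, `kmer_counts("ACGTACG", 2)` gives AC: 2, CG: 2, GT: 1, TA: 1. With one allowed mismatch, the 2-mer AG is credited for both the AC and CG windows of "ACGT".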
4. K-mer Approach Based on Regular Expressions
Motivation: 2-mers appear most often in the sequences, so we focus mainly on 2-mers.
Strategy:
- For any two 2-mers X and Y, generate the regular expressions X(.*)Y and Y(.*)X.
- Use these regular expressions as candidate attributes.
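The regular-expression strategy can be sketched in Python with the standard `re` module. This assumes X and Y range over all 16 2-mers, including X = Y; under that assumption, collecting X(.*)Y and Y(.*)X for every pair is the same as collecting X(.*)Y for all 256 ordered pairs. Each pattern becomes a binary attribute (present or absent); the function names are hypothetical.

```python
import re
from itertools import product


def regex_attributes(alphabet="ACGT"):
    """Candidate attributes X(.*)Y for every ordered pair of 2-mers (X, Y).

    Ranging over all ordered pairs produces both X(.*)Y and Y(.*)X for each
    unordered pair; X == Y is included here (an assumption)."""
    mers = ["".join(p) for p in product(alphabet, repeat=2)]
    return [f"{x}(.*){y}" for x in mers for y in mers]


def regex_features(seq, patterns):
    """Binary feature vector: 1 if the pattern matches somewhere in seq."""
    compiled = [re.compile(p) for p in patterns]
    return [1 if c.search(seq) else 0 for c in compiled]
```

Note that X and Y cannot overlap in a match: in "ACGT", AC(.*)GT matches, but CG(.*)GT does not, because the two 2-mers would have to share the G.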
5. Classifier Selection
Fig: Around 9 classifiers applied on the TF data sets. Algorithms are numbered as follows: (1) Logistic, (2) SMO, (3) NaiveBayes, (4) BayesianLogisticRegression, (5) KStar, (6) Bagging, (7) LogitBoost, (8) RandomForest, (9) J48.
Summary:
* 9 classifiers were applied on 10 data sets; 3 of them are shown.
* Choosing a single best classifier is not a trivial task.
* The same classifier behaves differently on different data sets.
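The slides do not say how accuracy was measured for each classifier. Assuming k-fold cross-validation (a common setup with Weka), the comparison can be sketched in plain Python: every classifier is a function that takes training data and returns predictions for test data, and each one is scored the same way on each data set. The two toy classifiers, `majority` and `nearest_neighbor`, are hypothetical stand-ins for illustration, not the Weka algorithms listed above.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(fit_predict, X, y, k=10):
    """Pooled accuracy of fit_predict(X_train, y_train, X_test) over k folds.
    No shuffling is done, so shuffle X and y beforehand if they are ordered by class."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X)


def majority(X_train, y_train, X_test):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


def nearest_neighbor(X_train, y_train, X_test):
    """1-NN using Hamming distance on (binary) attribute vectors."""
    def dist(a, b):
        return sum(u != v for u, v in zip(a, b))
    return [y_train[min(range(len(X_train)), key=lambda i: dist(x, X_train[i]))]
            for x in X_test]
```

Scoring every candidate with the same harness on each of the data sets is what makes the observation that the same classifier behaves differently on different data sets measurable.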
6. Change in Accuracy due to Different Classifiers
Fig: Performance of different classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_3 data set
Fig: Performance of different classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_5 data set
Summary:
* The choice of classifier has a large effect on accuracy.
* One has to be careful when choosing a classifier.
7. Change in Accuracy due to Different K-mer Lengths
Fig: Performance of different K-mer lengths (4-mer, 5-mer, 6-mer) on the TF_3 data set
Summary:
* K-mer length also affects accuracy.
* The best length is not obvious; it is difficult to find a single best one.
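One plausible reason why K-mer length matters, not stated on the slide: the number of possible K-mers over the DNA alphabet grows as 4^k, while a sequence of length L contains at most L - k + 1 overlapping K-mers, so longer K-mers produce a much larger and sparser attribute space.

```latex
\[
  N(k) = |\{A, C, G, T\}|^{k} = 4^{k}, \qquad
  N(4) = 256,\quad N(5) = 1024,\quad N(6) = 4096
\]
\[
  \#\{\text{K-mers observed in a sequence of length } L\} \;\le\; L - k + 1
\]
```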
8. Attribute Space Selection
Fig: Performance when selecting different numbers of K-mers as attributes on the TF_4 data set
Summary:
* The number of attributes considered also affects accuracy.
* Accuracy increases as more attributes are considered, but beyond a saturation point it decreases.
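The slide does not say how K-mers were ranked before keeping a given number of them as attributes. As an assumption, a simple sketch ranks each K-mer by the absolute difference between its mean count in positive and in negative sequences and keeps the top n; sweeping n would then trace an accuracy curve like the one described. The function names and the scoring rule are hypothetical.

```python
def kmer_counts(seq, k):
    """Count every overlapping substring of length k."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def select_top_kmers(pos_seqs, neg_seqs, k, n):
    """Rank k-mers by |mean count in positives - mean count in negatives|
    (an assumed criterion) and keep the top n."""
    def mean_counts(seqs):
        totals = {}
        for s in seqs:
            for kmer, c in kmer_counts(s, k).items():
                totals[kmer] = totals.get(kmer, 0) + c
        return {kmer: c / len(seqs) for kmer, c in totals.items()}

    pos, neg = mean_counts(pos_seqs), mean_counts(neg_seqs)
    vocab = set(pos) | set(neg)
    score = {km: abs(pos.get(km, 0.0) - neg.get(km, 0.0)) for km in vocab}
    # Descending score; ties broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda km: (-score[km], km))[:n]
```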
9. PWM-based Analysis of Accuracy (TF_1 data set)
Fig: J48, minW 6, maxW 15, number of sites 10
Fig: J48, minW 6, maxW 15, number of motifs 5
Summary:
* With the number of sites fixed, accuracy increases as the number of motifs increases.
* With the number of motifs fixed, accuracy increases as the number of sites increases.
* What happens when both are increased?
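A minimal sketch of how a position weight matrix turns aligned motif sites into a score for a sequence. MEME and MAST do considerably more (motif discovery, significance values); this only shows log-odds scoring against an assumed uniform background with a pseudocount, and assumes sequences contain only A, C, G, T. The minW, maxW, number of motifs, and number of sites settings above presumably correspond to MEME's -minw, -maxw, -nmotifs and -nsites options. Function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Log2-odds PWM from equal-length aligned sites (one dict per column)."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def best_match(seq, pwm):
    """Return (score, start) of the highest-scoring window of seq."""
    width = len(pwm)
    best_score, best_start = -math.inf, -1
    for start in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[start + j]] for j in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

A sequence's best-match score, or whether it exceeds a threshold, could then serve as an attribute for a classifier such as J48.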
10. PWM-based Analysis
Fig: Accuracy for varying numbers of motifs and sites
* 1st bar: number of sites
* 2nd bar: number of motifs
* 3rd bar: accuracy
* The point is that accuracy decreases when both the number of motifs and the number of sites are increased.
11. Extra Work for TF_20
Fig: Flow diagram of building a new model for TF_20 (elements: K-mer model, PWM model, sequences identified by both models, sequences labeled differently, biased 2-mer model, newly identified sequences, the new model)
Summary:
* We did some extra work for TF_20.
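The flow diagram is only partly recoverable. One possible reading, offered as an assumption: sequences on which the K-mer and PWM models agree keep that label, and sequences the two models label differently are handed to a third model (the diagram mentions a biased 2-mer model). The sketch shows only that routing; `tie_breaker` is a hypothetical stand-in for whatever model was actually used.

```python
def split_by_agreement(seq_ids, kmer_pred, pwm_pred):
    """Separate sequences both models label the same from those they label differently."""
    agreed, disagreed = {}, []
    for sid in seq_ids:
        if kmer_pred[sid] == pwm_pred[sid]:
            agreed[sid] = kmer_pred[sid]
        else:
            disagreed.append(sid)
    return agreed, disagreed


def combine(seq_ids, kmer_pred, pwm_pred, tie_breaker):
    """Keep agreed labels; ask tie_breaker(sid) (hypothetical third model) for the rest."""
    agreed, disagreed = split_by_agreement(seq_ids, kmer_pred, pwm_pred)
    final = dict(agreed)
    for sid in disagreed:
        final[sid] = tie_breaker(sid)
    return final
```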
12. AUC Based on the Feedback (Bonus Model)
Fig: AUC of the 10 data sets based on the last submission
* Accuracy improved over the first submission.
* PWM did not give good results.
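AUC, the metric reported on this slide, can be computed without plotting a ROC curve: it equals the probability that a randomly chosen positive sequence receives a higher score than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). A minimal pairwise sketch, quadratic in the number of sequences, which is fine for illustration; the function name is hypothetical.

```python
def auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, scores [0.9, 0.4, 0.6, 0.2] with labels [1, 1, 0, 0] give 3 winning pairs out of 4, so an AUC of 0.75.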
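13. Participation
- Badri Sampath. Background study: DNA, RNA, protein, motif. Working with tools: AlignAce, MEME, MAST. Working with models: PWM. Parameter tuning: K-mer. Automation: ARFF writer, MAST output writer.
- Iffat Sharmin Chowdhury. Background study: protein, motif, transcription. Working with tools: Weka, AlignAce, ScanAce. Working with models: K-mer. Parameter tuning: PWM. Automation: script for FASTA, Weka.
- Prosunjit Biswas. Background study: DNA, transcription. Working with tools: MEME, MAST. Working with models: K-mer. Parameter tuning: PWM. Automation: script for RE, for new K-mer model.
- Tahmina Ahmed. Background study: MEME, MAST, PWM. Working with tools: MEME, MAST, Weka. Working with models: PWM. Parameter tuning: K-mer. Automation: script for MEME, MAST.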