Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery
Madhav Sigdel, İmren Dinç, Semih Dinç, Madhu S. Sigdel, Marc Pusey, PhD, Ramazan Aygün, PhD
Email: [email protected]
Research Lab, Computer Science Department, University of Alabama in Huntsville
IEEE SoutheastCon 2014, Lexington, KY
Outline
- Background
- Motivation
- Semi-supervised Classification
  - Self-Training
  - Yet Another Two Staged Idea (YATSI)
- Overview of Features
- Experimental Results
- Conclusion
Background
Sample protein crystallization trial images:
- Non-crystals
- Likely-leads
- Crystals
[Figure: model of the robotic system used to collect images]
Image Categories
- Non-crystals: images without crystals (clear drop / precipitates)
- Likely-leads: images with micro-crystals or high-intensity regions without clear shapes
- Crystals: images with different shapes of crystals (needles, plates, 3D crystals)
Related Work
- Protein crystallization classification has been addressed with a variety of algorithms, such as:
  - Support vector machines (SVMs)
  - Decision trees
  - Neural networks, etc.
- Combination of multiple classifiers (Saitoh et al. 08)
- Trend to increase the size of training data to improve classification performance:
  - 79,632 images (Po & Laine 08)
  - 165,351 images (Cumba et al. 10)
Motivation
- Expert labeling is very difficult and time-consuming
- Can we build a reliable classification system using limited labeled images?
- Semi-supervised learning
Semi-supervised Classification
- Combines labeled and unlabeled data to improve the learning model
- Examples of semi-supervised classification:
  - Self-training
  - Yet-Another Two Staged Idea (YATSI)
  - Laplacian SVM, transductive SVM, etc.
- Used for applications such as text classification, spam email detection, software fault detection, etc.
Self-Training
Let L be the set of labeled data and U the set of unlabeled data.
Repeat:
- Train a classifier h with training data L
- Classify data in U with h
- Find the subset U' of U with the most confident predictions
- L ← L ∪ U'; U ← U \ U'
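A minimal sketch of this loop, assuming NumPy arrays and scikit-learn; GaussianNB stands in for the Naïve Bayesian classifier used in the experiments (the slides do not state the tooling), and the threshold c mirrors the confidence levels 0.8 to 0.95 evaluated later:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def self_train(X_l, y_l, X_u, c=0.9, max_iter=10):
        """Grow the labeled set L with confident predictions on U."""
        clf = GaussianNB().fit(X_l, y_l)              # train h on L
        for _ in range(max_iter):
            if len(X_u) == 0:
                break
            proba = clf.predict_proba(X_u)            # classify U with h
            pick = proba.max(axis=1) >= c             # most confident subset U'
            if not pick.any():
                break
            y_new = clf.classes_[proba[pick].argmax(axis=1)]
            X_l = np.vstack([X_l, X_u[pick]])         # L <- L union U'
            y_l = np.concatenate([y_l, y_new])
            X_u = X_u[~pick]                          # U <- U \ U'
            clf = GaussianNB().fit(X_l, y_l)          # retrain on enlarged L
        return clf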
Yet-Another Two Staged Idea (YATSI)
- Uses a supervised classification algorithm and a nearest-neighbor algorithm
- Two stages (sketched below):
  - First stage:
    - Generate a prediction model (M) using the labeled data (L)
    - Find predictions for the unlabeled data (U) using M → pre-labeled data
    - Combine the original labeled data and the pre-labeled data (L + U)
  - Second stage:
    - Apply k-nearest neighbor on (L + U) to determine the actual predictions for the unlabeled instances
Driessens et al. 06
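A minimal sketch of the two stages, under the same scikit-learn assumptions as above; the weighting of pre-labeled instances used in the original YATSI formulation is omitted for brevity:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    def yatsi(X_l, y_l, X_u, k=10):
        # First stage: prediction model M from the labeled data L
        model = GaussianNB().fit(X_l, y_l)
        y_pre = model.predict(X_u)                 # pre-labels for U
        # Combine original labeled data and pre-labeled data (L + U)
        X_all = np.vstack([X_l, X_u])
        y_all = np.concatenate([y_l, y_pre])
        # Second stage: k-NN over L + U gives the final predictions for U
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_all, y_all)
        return knn.predict(X_u)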
[Illustration: k-nearest neighbor classifier. With K = 1 the query point "?" is labeled O; with K = 3 it is labeled X]
[Illustration: YATSI algorithm. The prediction model trained on the labeled data (O/X) assigns pre-labels to the unlabeled points "?"; k-nearest neighbor over the combined set then determines the final labels]
Overview of Features
- 3 thresholding techniques
- 6 intensity features
- 9 blob features
- 3 × (6 + 9) = 45-dimension feature vector (assembly sketched below)
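The slides give only the shape of the feature vector, so the sketch below shows one hypothetical assembly. The three thresholds (Otsu, mean, 90th percentile) and the particular statistics are stand-ins chosen for illustration, not the techniques used in the paper:

    import numpy as np
    from skimage.filters import threshold_otsu
    from skimage.measure import label, regionprops

    def binarize(image, method):
        if method == "otsu":
            t = threshold_otsu(image)
        elif method == "mean":
            t = image.mean()
        else:                                      # "p90": 90th-percentile cut
            t = np.percentile(image, 90)
        return image > t

    def intensity_features(image, mask):           # 6 intensity statistics
        fg = image[mask] if mask.any() else np.zeros(1)
        return [fg.mean(), fg.std(), fg.min(), fg.max(),
                np.median(fg), mask.mean()]

    def blob_features(mask):                       # 9 blob (shape) statistics
        props = regionprops(label(mask))
        areas = [p.area for p in props] or [0]
        eccs = [p.eccentricity for p in props] or [0]
        return [len(props), np.sum(areas), np.max(areas), np.mean(areas),
                np.std(areas), np.min(areas), np.median(areas),
                np.max(eccs), np.mean(eccs)]

    def extract_features(image):
        feats = []
        for method in ("otsu", "mean", "p90"):     # 3 thresholding techniques
            mask = binarize(image, method)
            feats += intensity_features(image, mask)
            feats += blob_features(mask)
        return np.asarray(feats)                   # 3 * (6 + 9) = 45 values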
Dataset
2,250 images: non-crystals, likely-leads, crystals
2-class problem: 67% non-crystals, 33% likely crystals (crystals + likely-leads)
3-class problem: 67% non-crystals, 18% likely-leads, 15% crystals
Experiments - Self-Training
- 2 supervised classifiers:
  - Naïve Bayesian (NB)
  - Sequential Minimal Optimization (SMO)
- Confidence level (c) for the first prediction: c = 0.8, 0.9, 0.95
- Training sizes: 1%, 2%, 5%, 10%, 20%
Experiments - Self-Training
[Results charts]
Experiments - YATSI
- 5 supervised classifiers:
  - Naïve Bayesian (NB)
  - Sequential Minimal Optimization (SMO)
  - Decision tree (J48)
  - Multilayer perceptron (MLP)
  - Random forest (RF)
- Number of nearest neighbors (K): K = 10, 20, 30
- Training sizes: 1%, 2%, 5%, 10%, 20% (evaluation grid sketched below)
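A sketch of the evaluation grid this setup implies, reusing the yatsi() sketch above; the stratified split and the accuracy metric are assumptions about the protocol:

    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    def run_grid(X, y, fractions=(0.01, 0.02, 0.05, 0.10, 0.20), ks=(10, 20, 30)):
        for frac in fractions:
            # Treat a small stratified fraction as labeled, the rest as unlabeled
            X_l, X_u, y_l, y_u = train_test_split(
                X, y, train_size=frac, stratify=y, random_state=0)
            base = GaussianNB().fit(X_l, y_l)          # supervised baseline
            acc_sup = accuracy_score(y_u, base.predict(X_u))
            for k in ks:
                acc_yatsi = accuracy_score(y_u, yatsi(X_l, y_l, X_u, k=k))
                print(f"train={frac:.0%} K={k}: "
                      f"supervised={acc_sup:.3f} YATSI={acc_yatsi:.3f}")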
Experiments - YATSI
[Results charts]
Supervised vs. YATSI
[Comparison charts]
Best Classifiers Comparison
[Comparison chart]
Conclusion
- Compared the performance of two semi-supervised classification techniques: self-training and YATSI
- The Naïve Bayesian (NB) and SMO classifiers benefited from the self-training and YATSI approaches
- J48, multilayer perceptron (MLP), and random forest (RF) did not show improvement from the semi-supervised approaches
- Random forest provided the best classification performance
Future Work
- Investigate active learning in combination with semi-supervised learning
Acknowledgement
National Institutes of Health grant (GM090453)
THANK YOU
Madhav Sigdel, İmren Dinç, Semih Dinç, Madhu Sigdel, Marc Pusey, PhD, Ramazan Aygün, PhD
Email: [email protected]
Research Lab, Computer Science Department, University of Alabama in Huntsville