RUSBoost : A Hybrid Approach to Alleviating Class Imbalance 19. 05. 17 Yongwon Jo Data Mining & Quality Analytics Lab.

RUSBoost : A Hybrid Approach to Alleviating Class Imbalance · dmqa.korea.ac.kr/uploads/seminar/190517 DMQA Open Seminar... · 2019-05-21

Jul 08, 2020

Contents

I. Introduction to Class Imbalance problem

II. How to solve Class Imbalance problem

III. RUSBoost vs. SMOTEBoost

IV. Result of experiments

V. Conclusion

I. Introduction to Class Imbalance problem

Class Imbalance problem

Class imbalance is a classification problem in which one class (positive) has far fewer observations than another class (negative).

This problem arises in many domains.

[Figure: example domains, with labels Heart Disease / Health and Spam / Normal (Mail)]

The plots below show class imbalance that I actually encountered.

[Figure: bar chart of the output quantity divided by the remaining quantity.]

[Figure: bar chart of whether the lot will be put into the next process.]

RandomForestClassifier(max_depth=30, n_estimators=200)

Train dataset -> Accuracy : 0.90089 | F1 : 0.62276 | Recall : 0.98723 | Precision : 0.45484

Validation dataset -> Accuracy : 0.83854 | F1 : 0.19458 | Recall : 0.29088 | Precision : 0.14618
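F1 is the harmonic mean of precision and recall, so it can be recomputed from the two values reported on each line above. The snippet below is a small check under that definition only. The function name `f1_score` is chosen here for illustration and is not taken from the slides.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported on the slide.
train_f1 = f1_score(precision=0.45484, recall=0.98723)
valid_f1 = f1_score(precision=0.14618, recall=0.29088)

print(f"train F1 = {train_f1:.5f}")
print(f"validation F1 = {valid_f1:.5f}")
```

The gap between the two values is the point of the slide: high recall on the training data does not carry over to the validation data.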

Confusion matrix (rows: Actual class, columns: Predicted class)

Actual       0    1    2    3    4    5    6    7    8    9     10
0       515126   86    1    6    8    7    3    2    4    2   8572
1         6900  528   17    2    0    0    2    0    0    2   1585
2         5375  147   20    8    3    0    0    0    1    0   1596
3         4154   50   11    9    1    1    2    0    1    0   1535
4         3808   37    5    5    3    1    0    0    0    0   1436
5         3407   27    1    2    1    0    1    0    1    1   1450
6         2870   16    0    1    0    2    2    1    0    0   1370
7         2928   11    1    0    0    0    1    1    0    0   1464
8         2596   18    0    3    2    0    0    0    1    1   1383
9         2742    7    0    0    0    2    1    0    1    0   1466
10       56073   37    6    0    4    3    4    4    4    4  71017
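To read the confusion matrix class by class, per-class recall (the diagonal count divided by the row total) is a convenient summary. The sketch below assumes that rows are actual classes and columns are predicted classes, which is how the Actual and Predicted labels appear to be laid out on the slide. The variable names are illustrative.

```python
# Confusion matrix copied from the slide.
# Assumption: rows = actual class, columns = predicted class.
confusion = [
    [515126, 86, 1, 6, 8, 7, 3, 2, 4, 2, 8572],
    [6900, 528, 17, 2, 0, 0, 2, 0, 0, 2, 1585],
    [5375, 147, 20, 8, 3, 0, 0, 0, 1, 0, 1596],
    [4154, 50, 11, 9, 1, 1, 2, 0, 1, 0, 1535],
    [3808, 37, 5, 5, 3, 1, 0, 0, 0, 0, 1436],
    [3407, 27, 1, 2, 1, 0, 1, 0, 1, 1, 1450],
    [2870, 16, 0, 1, 0, 2, 2, 1, 0, 0, 1370],
    [2928, 11, 1, 0, 0, 0, 1, 1, 0, 0, 1464],
    [2596, 18, 0, 3, 2, 0, 0, 0, 1, 1, 1383],
    [2742, 7, 0, 0, 0, 2, 1, 0, 1, 0, 1466],
    [56073, 37, 6, 0, 4, 3, 4, 4, 4, 4, 71017],
]

# Recall of class i = correctly predicted count / number of actual observations.
recall_per_class = [row[i] / sum(row) for i, row in enumerate(confusion)]

for label, recall in enumerate(recall_per_class):
    print(f"class {label:2d}: recall = {recall:.4f}")
```

Under that reading, the two large classes (0 and 10) absorb almost all predictions, and the small classes in between are rarely recovered.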

II. How to solve Class Imbalance problem

Sampling Techniques: Over Sampling vs. Under Sampling

Algorithm Techniques: Boosting, AdaBoost, ...

Feature selection Techniques: Lots of feature selection techniques

Source: Class Imbalance Problem in Data Mining: Review, Mr. Rushi Longadge, Ms. Snehlata S. Dongre, Dr. Latesh Malik

Over Sampling

① Repeatedly duplicate instances of the positive class.

② SMOTE : Synthetic Minority Over-sampling Technique

[Figure: positive and negative observations in feature space; panels labeled True and Pred]

SMOTE generates synthetic positive observations as follows.

Select an observation from the minority (positive) class.

Select its k nearest neighbors within the minority class (k is a hyperparameter; k = 3 in the figure).

Create a new minority (positive) observation at an arbitrary point on the straight line between the selected observation and one of its neighbors (interpolation).

After generation is complete, apply a classification algorithm.

[Figure: SMOTE steps in feature space; legend: Positive, Negative, Generated]
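The SMOTE steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference implementation. The function name `smote`, its parameters, and the toy points are assumptions made here, and distances are Euclidean.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Step 1: select an observation from the minority (positive) class.
        base = rng.choice(minority)
        # Step 2: find its k nearest neighbors among the other minority points.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        # Step 3: pick one neighbor and interpolate at a random position
        # on the straight line between the two points.
        neighbor = rng.choice(neighbors)
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority class in a 2-D feature space (hypothetical values).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (2.5, 2.4), (3.0, 1.8)]
generated = smote(minority, n_new=10, k=3)
for point in generated:
    print(point)
```

Because every synthetic point lies on a segment between two existing minority points, the generated data stays inside the region the minority class already occupies.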

Under Sampling

① RUS : Random Under Sampling.

Remove negative (majority) observations randomly.

[Figure: random under sampling in feature space; legend: Positive, Negative, Remove]
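Random under sampling can be sketched as follows. The slides do not say how many majority observations to remove, so the 1:1 target ratio used here is an assumption, as are the function name `random_under_sample` and the toy data.

```python
import random


def random_under_sample(X, y, majority_label=0, seed=0):
    """Randomly drop majority observations until both classes have the same size."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    # Assumption: balance to a 1:1 ratio by keeping as many majority rows
    # as there are minority rows.
    n_keep = min(len(majority_idx), len(minority_idx))
    kept_majority = rng.sample(majority_idx, k=n_keep)
    keep = sorted(kept_majority + minority_idx)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: 10 negative (majority) and 2 positive (minority) observations.
X = [[float(i)] for i in range(12)]
y = [0] * 10 + [1] * 2
X_bal, y_bal = random_under_sample(X, y)
print(X_bal, y_bal)
```

All minority observations are kept; only the majority class loses rows, which is where the information loss discussed in the comparison comes from.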

Comparison of Under Sampling with Over Sampling

Under Sampling

Advantages

① Removing data from the negative (majority) class reduces the size of the training dataset.

② Training a model takes less time than with oversampling techniques.

Disadvantages

① Because observations are removed, part of the available information cannot be used in the modeling process.

Over Sampling

Advantages

① Since observations are not removed, no loss of information occurs.

② Because synthetic observations are interpolated between existing positive (minority) observations, they stay inside the region that class already occupies, so the class boundary does not change.

Disadvantages

① Because it creates additional observations for the positive (minority) class, training takes longer than with undersampling.

Boosting

Boosting is an ensemble method that builds weak models sequentially, each one trained to better classify the observations that earlier models misclassified, and combines them into one predictive model.

Because the weak classifiers are built one after another, training takes a long time, but the ensemble usually performs better than a single classifier (e.g., decision tree, logistic regression).

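AdaBoost (Adaptive Boosting)

AdaBoost is a boosting method that builds weak classifiers one after another, giving misclassified observations larger weights than correctly classified ones so that the next weak classifier focuses on them.

[Figure: step-by-step AdaBoost illustration in feature space; legend: Positive, Negative]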
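The reweighting idea can be made concrete with a small sketch. The slides do not give the update formula, so this uses the common binary formulation with labels +1 and -1, decision stumps as weak classifiers, and the exponential weight update. All function names and the toy data are assumptions made for illustration.

```python
import math


def fit_stump(X, y, w):
    """Return the (error, feature, threshold, sign) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[j] <= t else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best


def stump_predict(stump, x):
    _, j, t, sign = stump
    return sign if x[j] <= t else -sign


def adaboost(X, y, n_rounds=3):
    n = len(X)
    w = [1.0 / n] * n  # start with equal weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak classifier
        ensemble.append((alpha, stump))
        # Misclassified observations get larger weights, correct ones smaller.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single stump can classify correctly.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, n_rounds=3)
print([predict(ensemble, x) for x in X])
```

On this toy data each stump misclassifies two observations, but the weighted vote of three stumps separates all six.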

III. RUSBoost vs. SMOTEBoost

RUSBoost : A Hybrid Approach to Alleviating Class Imbalance

IEEE TSMC (IEEE Transactions on Systems, Man, and Cybernetics)

Hybrid = (Sampling + Algorithm) technique

RUSBoost (RUS + AdaBoost)

[Figure: step-by-step RUSBoost illustration in feature space; legend: Positive, Negative, Removed]

RUSBoost Pseudo code

[Figure: RUSBoost pseudocode listing]
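The pseudocode listing itself did not survive in this transcript, so the following is not the paper's listing. It is a simplified sketch of the hybrid idea (RUS inside each boosting round) using the binary exponential weight update and decision stumps; the function names, the 1:1 sampling ratio, and the toy data are all assumptions.

```python
import math
import random


def fit_stump(X, y, w):
    """Return the (feature, threshold, sign) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for wi, x, yi in zip(w, X, y)
                          if (sign if x[j] <= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]


def stump_predict(stump, x):
    j, t, sign = stump
    return sign if x[j] <= t else -sign


def rusboost(X, y, n_rounds=10, seed=0):
    """y uses +1 for the positive (minority) class and -1 for the negative (majority) class."""
    rng = random.Random(seed)
    n = len(X)
    pos = [i for i in range(n) if y[i] == 1]
    neg = [i for i in range(n) if y[i] == -1]
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        # RUS step: keep every positive, draw an equally sized random subset of negatives.
        idx = pos + rng.sample(neg, k=min(len(neg), len(pos)))
        sub_total = sum(w[i] for i in idx)
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx],
                          [w[i] / sub_total for i in idx])
        # Boosting step: error and weight update use the full training set.
        preds = [stump_predict(stump, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * p) for wi, p, yi in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data: 3 positives on the left, 9 negatives on the right (hypothetical values).
X = [[0.0], [1.0], [2.0]] + [[float(v)] for v in range(10, 19)]
y = [1, 1, 1] + [-1] * 9
ensemble = rusboost(X, y, n_rounds=10)
print([predict(ensemble, x) for x in X])
```

Each weak classifier only ever sees a small balanced sample, which is why training is cheap, while the weights are still tracked for every observation so that no observation is permanently discarded.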

SMOTEBoost : Improving Prediction of the Minority Class in Boosting

European Conference on Principles of Data Mining and Knowledge Discovery, 2003

SMOTEBoost (SMOTE + AdaBoost)

[Figure: step-by-step SMOTEBoost illustration in feature space; legend: Positive, Negative, Generated]

SMOTEBoost Pseudo code

[Figure: SMOTEBoost pseudocode listing]

IV. Result of experiments

Datasets

Experiments were conducted on 15 class-imbalanced datasets.

So that the results do not depend on one particular random split, 10-fold cross validation was repeated 10 times in total.
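The evaluation protocol (10-fold cross validation repeated 10 times) can be sketched as an index generator. Stratifying the folds by class is an assumption added here because it is a common choice for imbalanced data; the slides do not say whether the folds were stratified. The function name and toy labels are illustrative.

```python
import random


def repeated_stratified_kfold(y, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold keeps roughly the original class ratio."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for label in sorted(set(y)):
            members = [i for i, yi in enumerate(y) if yi == label]
            rng.shuffle(members)  # a new random partition on every repetition
            # Deal the shuffled members of each class round-robin into the folds.
            for pos, i in enumerate(members):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train_idx, test_idx


# Toy labels: 20 positive (minority) and 180 negative (majority) observations.
y = [1] * 20 + [0] * 180
splits = list(repeated_stratified_kfold(y))
print(len(splits), "train/test splits")
```

With 10 folds and 10 repetitions, each model configuration is evaluated on 100 train/test splits.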

Multiple Comparison

A multiple comparison test is conducted after the analysis of variance (ANOVA) when its null hypothesis is rejected.

The groups are compared two at a time to test how similar each pair of groups is.

[Figure: pairwise comparison of Group1, Group2, Group3, Group4]
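As a small worked example of comparing groups two at a time, the snippet below only enumerates the pairs that a multiple comparison procedure would test for the four groups named on the slide. It does not run any statistical test, and the variable names are illustrative.

```python
from itertools import combinations

groups = ["Group1", "Group2", "Group3", "Group4"]

# Every unordered pair of groups is one comparison.
pairs = list(combinations(groups, 2))

for a, b in pairs:
    print(f"compare {a} vs {b}")
print(len(pairs), "pairwise comparisons")
```

With four groups there are six pairs, which is why the procedure has to account for testing several hypotheses at once.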

Result

V. Conclusion

Conclusion

You should choose the algorithm that fits your problem situation.

For example, if the data is very large and you are not sure it can be processed in memory, the RUS-based algorithms are recommended.

If the training dataset is very small and the positive (minority) class has few observations, you should use the SMOTE-based algorithms.

Thank you.