Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

FFRI, Inc.

Fourteenforty Research Institute, Inc.

FFRI, Inc. http://www.ffri.jp


Junichi Murakami, Director of Advanced Development

FFRI, Inc.

Preface

• These slides were presented at CSS2013

– http://www.iwsec.org/css/2013/english/index.html

• Please refer to the original paper for detailed data

– http://www.ffri.jp/assets/files/research/research_papers/MWS2013_paper.pdf (written in Japanese, but the figures are the same as in these slides)

• Contact information

– [email protected]

– @FFRI_Research (Twitter)


Agenda

• Background

• Problem

• Scope and purpose

• Experiment 1

• Experiment 2

• Experiment 3

• Discussion

• Conclusion


Background – malware and its detection

• Increasing malware, targeted attacks (unknown malware), malware generators and obfuscators expose the limitations of signature matching

• Other methods: heuristics, cloud reputation, machine learning, big data


Background – Related works

• Features: static information, dynamic information, hybrid

• Algorithms: SVM, naive Bayes, perceptron, etc.

• Evaluation: TPR/FPR, ROC curve, accuracy, precision, etc.

• Prior work mainly focuses on combinations of the factors above

– Feature selection and modification, parameter settings

• Some good results have been reported (TPR: 90% or higher, FPR: 1% or lower)
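The evaluation measures listed above are ratios of confusion-matrix counts. A minimal Python sketch, assuming malware is the positive class; the class name `Confusion` and the counts in the usage lines are hypothetical and not taken from the paper:

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts with malware as the positive class."""
    tp: int  # malware classified as malware
    fn: int  # malware classified as benign (missed)
    fp: int  # benign classified as malware (false alarm)
    tn: int  # benign classified as benign

    def tpr(self) -> float:
        """True positive rate (detection rate): TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def fpr(self) -> float:
        """False positive rate: FP / (FP + TN)."""
        return self.fp / (self.fp + self.tn)

    def accuracy(self) -> float:
        """Fraction of all samples classified correctly."""
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def precision(self) -> float:
        """Fraction of files flagged as malware that really are malware."""
        return self.tp / (self.tp + self.fp)


# Hypothetical counts for illustration only
c = Confusion(tp=920, fn=80, fp=5, tn=995)
print(f"TPR={c.tpr():.3f} FPR={c.fpr():.3f} "
      f"accuracy={c.accuracy():.4f} precision={c.precision():.4f}")
```

TPR and FPR are computed per class, so they do not depend on how many malware and benign files the testing set contains; accuracy and precision do.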


Problem

• General theory of machine learning:

– Classification accuracy declines if the trends (distributions) of training and testing data differ

• Does the same hold for malware and benign files?


Scope and purpose

① Investigate the difference between the similarity distributions of malware and benign files (Experiment-1)

② Investigate how this difference affects classification accuracy (Experiment-2)

③ Based on the results above, confirm the effect of removing testing data whose similarity to the training data is low (Experiment-3)


Experiment-1(1/3)

• Datasets: FFRI Dataset 2013 (malware) and benign files we collected

• Calculated the pairwise similarities within the malware set and within the benign set (Jubatus, MinHash)

• Feature vector: counts of 4-grams of consecutive API calls

– e.g., NtCreateFile_NtWriteFile_NtWriteFile_NtClose: n times

– NtSetInformationFile_NtClose_NtClose_NtOpenMutant: m times

[Figure: pairwise similarity matrix, computed separately for malware and for benign files (upper triangle only)]

     A     B     C     ...
A    -     0.8   0.52  ...
B    -     -     1.0   ...
C    -     -     -     ...
...  -     -     -     -
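The 4-gram feature vector can be sketched as follows, assuming each sample's dynamic-analysis log has already been reduced to a list of API names in call order (parsing the FFRI Dataset 2013 logs is not shown, and the helper name `api_ngram_features` is hypothetical). The keys follow the underscore-joined notation on the slide:

```python
from collections import Counter
from typing import Sequence


def api_ngram_features(calls: Sequence[str], n: int = 4) -> Counter:
    """Count n-grams of consecutive API calls.

    Each n-gram is keyed by joining the API names with "_", as in
    NtCreateFile_NtWriteFile_NtWriteFile_NtClose. A trace shorter than
    n yields an empty Counter.
    """
    grams = ("_".join(calls[i:i + n]) for i in range(len(calls) - n + 1))
    return Counter(grams)


# Hypothetical API-call trace of one sample (illustration only)
trace = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose",
         "NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose"]
features = api_ngram_features(trace)
for gram, count in features.most_common():
    print(gram, count)
```

The eight calls produce five overlapping 4-grams, one of which occurs twice, so the resulting sparse vector has four distinct keys.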


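Experiment-1(2/3)

Grouping malware and benign files based on their similarities

[Figure: malware and benign files are each grouped by a similarity threshold (0.0 - 1.0); a file with no similar file at that threshold is treated as unique]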
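The grouping step can be sketched with a plain MinHash estimate of Jaccard similarity over each file's set of 4-grams. This is a stand-in: Jubatus's own MinHash method may hash or weight features differently, and the helper names, the 128 hash functions, and the `>=` comparison against the threshold are assumptions. Grouping is done within one class at a time (all malware, or all benign files):

```python
import hashlib
from typing import Dict, Iterable, List


def minhash_signature(features: Iterable[str], num_hashes: int = 128) -> List[int]:
    """MinHash signature: for each seeded hash function, keep the minimum
    hash value over the (non-empty) set of features."""
    feats = set(features)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(16, "little")  # blake2b accepts up to 16 salt bytes
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f.encode(), digest_size=8, salt=salt).digest(), "little")
            for f in feats))
    return signature


def estimate_similarity(a: List[int], b: List[int]) -> float:
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def unique_flags(signatures: Dict[str, List[int]], threshold: float) -> Dict[str, bool]:
    """Mark each file as unique (True) when no other file of the same class
    reaches the similarity threshold, otherwise not unique (False)."""
    names = list(signatures)
    has_peer = {name: False for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if estimate_similarity(signatures[a], signatures[b]) >= threshold:
                has_peer[a] = has_peer[b] = True
    return {name: not has_peer[name] for name in names}


# Hypothetical 4-gram sets of three files from the same class
files = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3", "g4"},  # identical behavior to A
    "C": {"g7", "g8", "g9"},        # shares nothing with A or B
}
sigs = {name: minhash_signature(grams) for name, grams in files.items()}
print(unique_flags(sigs, threshold=0.8))
```

Repeating `unique_flags` for each threshold (0.8, 0.85, ..., 1.0) and each class gives the unique / not-unique proportions of the kind compared in Experiment-1(3/3).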


Experiment-1(3/3)

[Figure: proportion (0% - 100%) of unique files (no similar file) and not-unique files (at least one similar file) among benign files and malware, at similarity thresholds 0.8, 0.85, 0.9, 0.95 and 1.0]

It is harder to find similar files for benign files than for malware


Experiment-2(1/3)

• How much does this difference affect the result?

• 50% of the malware and benign files are assigned to a training dataset and the rest to a testing dataset (Jubatus, AROW)

[Figure: the training data (benign and malware) are used to train Jubatus, which then classifies the testing data; TPR and FPR are measured]

TPR: True Positive Rate, FPR: False Positive Rate
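The setup above can be sketched in pure Python: a per-class 50/50 random split and a binary AROW classifier with a diagonal covariance, the usual approximation for sparse features. This is not Jubatus's code; the class and function names, the regularization value `r = 1.0`, and the toy feature vectors are hypothetical:

```python
import random
from typing import Dict, List, Tuple

Features = Dict[str, float]    # e.g. 4-gram -> count
Sample = Tuple[Features, int]  # label: +1 = malware, -1 = benign


class DiagonalArow:
    """Binary AROW with a diagonal covariance (sketch, not Jubatus's code)."""

    def __init__(self, r: float = 1.0) -> None:
        self.r = r                         # regularization parameter
        self.w: Dict[str, float] = {}      # mean weight per feature
        self.sigma: Dict[str, float] = {}  # variance per feature (starts at 1.0)

    def score(self, x: Features) -> float:
        return sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def update(self, x: Features, y: int) -> None:
        margin = y * self.score(x)
        if margin >= 1.0:
            return  # already correct with enough margin: no update
        confidence = sum(self.sigma.get(k, 1.0) * v * v for k, v in x.items())
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - margin) * beta
        for k, v in x.items():
            s = self.sigma.get(k, 1.0)
            self.w[k] = self.w.get(k, 0.0) + alpha * y * s * v
            self.sigma[k] = s - beta * (s * v) ** 2  # shrink variance of seen features

    def classify(self, x: Features) -> int:
        return 1 if self.score(x) >= 0.0 else -1


def split_half(samples: List[Sample], seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Randomly put 50% of each class into training and the rest into testing."""
    rng = random.Random(seed)
    train: List[Sample] = []
    test: List[Sample] = []
    for label in (1, -1):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        half = len(group) // 2
        train += group[:half]
        test += group[half:]
    return train, test


# Hypothetical toy data: one 4-gram per class
malware = [({"NtCreateFile_NtWriteFile_NtWriteFile_NtClose": 3.0}, 1) for _ in range(10)]
benign = [({"NtSetInformationFile_NtClose_NtClose_NtOpenMutant": 2.0}, -1) for _ in range(10)]
train, test = split_half(malware + benign)

clf = DiagonalArow()
for x, y in train:
    clf.update(x, y)

tp = sum(1 for x, y in test if y == 1 and clf.classify(x) == 1)
fn = sum(1 for x, y in test if y == 1 and clf.classify(x) == -1)
fp = sum(1 for x, y in test if y == -1 and clf.classify(x) == 1)
tn = sum(1 for x, y in test if y == -1 and clf.classify(x) == -1)
print(f"TPR={tp / (tp + fn):.3f} FPR={fp / (fp + tn):.3f}")
```

Splitting per class keeps the malware/benign ratio equal in both halves. Note that a file whose 4-grams were never seen in training scores 0 and is classified as malware here; that tie-breaking rule is an arbitrary choice of this sketch.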


Experiment-2(3/3)

The accuracy declines if the trends of the training and testing data differ

TPR and FPR on the testing data (%):

             not unique   unique   difference
TPR (%)      97.996       81.297   -16.699
FPR (%)      0.624        4.49     +3.866


Experiment-3(1/6) – After training

[Figure: training data (benign and malware) separated by the dividing line learned by the classifier; legend: benign(train), malware(train), benign(test), malware(test)]


Experiment-3(2/6) – After classification

[Figure: testing data classified by the dividing line]


[Figure: the same classification result with the misclassified testing data marked as FP and FN relative to the dividing line]


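Experiment-3(3/6) – Low similarity data

[Figure: testing data with low similarity to the training data; some such malware lands on the malware side only by chance (TP, accidentally) while other such malware is misclassified (FN)]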
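The filtering idea of Experiment-3 can be sketched as: compute each test sample's highest similarity to any training sample and leave the sample unclassified when that value is below the threshold. For brevity this sketch uses exact Jaccard similarity over 4-gram sets rather than MinHash; the helper names and the stand-in classifier are hypothetical:

```python
from typing import Callable, List, Optional, Set

FeatureSet = Set[str]  # set of 4-grams observed for one file


def jaccard(a: FeatureSet, b: FeatureSet) -> float:
    """Exact Jaccard similarity |a ∩ b| / |a ∪ b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def max_similarity(sample: FeatureSet, training: List[FeatureSet]) -> float:
    """Highest similarity between the sample and any training sample."""
    return max((jaccard(sample, t) for t in training), default=0.0)


def classify_if_similar(sample: FeatureSet, training: List[FeatureSet], threshold: float,
                        classify: Callable[[FeatureSet], int]) -> Optional[int]:
    """Classify only when similar data has been trained; otherwise return
    None, meaning the sample is filtered out (left unclassified)."""
    if max_similarity(sample, training) < threshold:
        return None
    return classify(sample)


def stand_in(sample: FeatureSet) -> int:
    """Hypothetical stand-in classifier: +1 = malware, -1 = benign."""
    return 1 if "g1" in sample else -1


# Hypothetical training data
training = [{"g1", "g2", "g3", "g4"}, {"g5", "g6", "g7", "g8"}]

print(classify_if_similar({"g1", "g2", "g3", "x"}, training, 0.5, stand_in))  # similar: classified
print(classify_if_similar({"x", "y", "z", "w"}, training, 0.5, stand_in))     # dissimilar: None
```

Checking every training sample is linear in the training-set size per query; with MinHash signatures the same check can be done approximately and faster, which is presumably why a MinHash-based similarity was used in the experiments.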


Experiment-3(4/6) – Effect on TPR

[Figure: number of classified malware (TP and FN, 0 - 1400) and TPR (0.88 - 1.00) versus the similarity threshold used for filtering (0, 0.6 - 1.0)]


Experiment-3(5/6) – Effect on FPR

[Figure: number of classified benign files (TN and FP, 0 - 2500) and FPR (0.000 - 0.014) versus the similarity threshold used for filtering (0, 0.6 - 1.0)]


Experiment-3(6/6)

[Figure: transition of the number of classified data: number of classified data / number of total testing data (0% - 120%) for malware and benign software versus the similarity threshold (0, 0.6 - 1.0)]
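The quantities plotted in Experiment-3(4/6) to (6/6) can be computed from per-sample results while sweeping the threshold. A sketch, assuming each testing sample is recorded with its maximum similarity to the training data, its true label and the classifier's prediction; the record layout and function name are hypothetical, and the data below are made up:

```python
from typing import Dict, List, NamedTuple


class Result(NamedTuple):
    similarity: float  # max similarity of the test sample to the training data
    label: int         # ground truth: +1 = malware, -1 = benign
    predicted: int     # classifier output: +1 = malware, -1 = benign


def sweep(results: List[Result], thresholds: List[float]) -> List[Dict[str, float]]:
    """For each threshold, keep only test samples at least that similar to the
    training data and report TPR, FPR and the fraction of each class classified."""
    n_malware = sum(r.label == 1 for r in results)
    n_benign = len(results) - n_malware
    rows = []
    for t in thresholds:
        kept = [r for r in results if r.similarity >= t]
        tp = sum(r.label == 1 and r.predicted == 1 for r in kept)
        fn = sum(r.label == 1 and r.predicted == -1 for r in kept)
        fp = sum(r.label == -1 and r.predicted == 1 for r in kept)
        tn = sum(r.label == -1 and r.predicted == -1 for r in kept)
        rows.append({
            "threshold": t,
            "tpr": tp / (tp + fn) if tp + fn else float("nan"),
            "fpr": fp / (fp + tn) if fp + tn else float("nan"),
            "malware_coverage": (tp + fn) / n_malware,  # classified / total malware
            "benign_coverage": (fp + tn) / n_benign,    # classified / total benign
        })
    return rows


# Made-up results; the 0.2-similarity malware is a TP only by chance
results = [
    Result(0.95, 1, 1), Result(0.90, 1, 1), Result(0.30, 1, -1), Result(0.20, 1, 1),
    Result(0.97, -1, -1), Result(0.40, -1, 1), Result(0.85, -1, -1),
]
for row in sweep(results, [0.0, 0.8]):
    print(row)
```

In this toy data, raising the threshold from 0 to 0.8 removes the low-similarity samples: TPR on the classified data rises from 0.75 to 1.0 and FPR drops to 0, but only half of the malware is still classified.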


Discussion(1/3)

• In a real scenario:

– We try to classify whether an unknown file/process is benign or not

• If we apply the filtering of Experiment-3:

– A file is classified only if similar data has already been trained

– Otherwise, the file is left unclassified, which results in

• FN if the file is malware

• TN if the file is benign (correct as a result)

• Therefore the problem reduces to the TPR for unique malware (unique malware is likely to go undetected)
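Because unclassified files are never flagged, their effect on the rates a user actually sees can be sketched by folding them back in: unclassified malware counts as FN and unclassified benign files count as TN. The function name and counts below are hypothetical:

```python
from typing import Tuple


def effective_rates(tp: int, fn: int, fp: int, tn: int,
                    unclassified_malware: int, unclassified_benign: int) -> Tuple[float, float]:
    """TPR and FPR over all test files when unclassified files are not flagged:
    unclassified malware counts as FN, unclassified benign files as TN."""
    tpr = tp / (tp + fn + unclassified_malware)
    fpr = fp / (fp + tn + unclassified_benign)
    return tpr, fpr


# Hypothetical counts after filtering
tpr, fpr = effective_rates(tp=90, fn=2, fp=1, tn=97,
                           unclassified_malware=8, unclassified_benign=2)
print(f"TPR={tpr:.3f} FPR={fpr:.3f}")
```

With these counts, TPR on the classified data alone would be 90/92 ≈ 0.978, but it falls to 0.9 once the unclassified (unique) malware is counted, while FPR only improves slightly.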


Discussion(2/3)

• If malware keeps having as many variants as it does now

– ML-based detection works well

• The number of variants ∝ the use of malware generators/obfuscators

• We need to investigate

– Trends in the use of the tools above

– The possibility of anti-machine-learning techniques (evading ML-based detection)


Discussion(3/3)

• How to deal with unclassified (filtered) data

1. Use other feature vectors

2. Enlarge the training dataset (unique → not unique)

3. Use methods other than ML


Conclusion

• The similarity distributions of malware and benign files differ (Experiment-1)

• Accuracy declines if the trends of training and testing data differ (Experiment-2)

• The TPR for unique malware declines when low-similarity data is removed (Experiment-3)

• Continued investigation of trends in malware and related tools is required

• (It might be necessary to develop technology for identifying benign files)
