METAMORPHIC SOFTWARE FOR GOOD AND EVIL Wing Wong & Mark Stamp November 20, 2006

Dec 21, 2015

Page 1

METAMORPHIC SOFTWARE FOR GOOD AND EVIL

Wing Wong & Mark Stamp

November 20, 2006

Page 2

Outline

I. Metamorphic software
    What is it?
    Good and evil uses
II. Metamorphic virus construction kits
III. How effective are metamorphic engines?
    How to compare two pieces of code?
    Similarity of viruses/normal code
IV. Can we detect metamorphic viruses?
    Commercial virus scanners
    HMMs and similarity index
V. Conclusion

Page 3

PART I

Metamorphic Software

Page 4

What is Metamorphic Software?

Software is metamorphic provided
    All copies do the same thing
    Internal structure differs
Today almost all software is cloned
“Good” metamorphic software…
    Mitigate buffer overflow attacks
“Bad” metamorphic software…
    Avoid virus/worm signature detection

Page 5

Metamorphic Software for Good?

Suppose program has a buffer overflow
If we clone the program
    One attack breaks every copy
    Break once, break everywhere (BOBE)
If instead, we have metamorphic copies
    Each copy still has a buffer overflow
    One attack does not work against every copy
    BOBE-resistant
    Analogous to genetic diversity in biology

A little metamorphism does a lot of good!

Page 6

Metamorphic Software for Evil?

Cloned virus/worm can be detected
    Common signature on every copy
    Detect once, detect everywhere (DODE?)
If instead virus/worm is metamorphic
    Each copy has different signature
    Same detection may not work against every copy
    Provides DODE-resistance?
    Analogous to genetic diversity in biology

Effective use of metamorphism here is tricky!

Page 7

Crypto Analogy

Consider WWII ciphers
German Enigma
    Broken by Polish and British cryptanalysts
    Design was (mostly) known to cryptanalysts
Japanese Purple
    Broken by American cryptanalysts
    Design was (mostly) unknown to cryptanalysts

Page 8

Crypto Analogy

Cryptanalysis: break a (known) cipher
Diagnosis: determine how an unknown cipher works (from ciphertext)
Which was the greater achievement, breaking Enigma or Purple?
    Cryptanalysis of Enigma was harder
    Diagnosis of Purple was harder

Can make a reasonable case for either…

Page 9

Crypto Analogy

What does this have to do with metamorphic software?

Suppose the good guys generate metamorphic copies of software
    Bad guys can attack individual copies
    Can bad guys attack all copies?
    If they can diagnose our metamorphic generator, maybe
    But that’s a diagnosis problem…

Page 10

Crypto Analogy

What about the case where bad guys write metamorphic code?
    Metamorphic viruses, for example
Do good guys need to solve the diagnosis problem?
    If so, good guys are in trouble
Not if good guys “only” need to detect the metamorphic code (not diagnose)
Not claiming the good guys’ job is easy
    Just claiming that there is hope…

Page 11

Virus Evolution

Viruses first appeared in the 1980s
    Fred Cohen
Viruses must avoid signature detection
    Virus can alter its “appearance”
Techniques employed
    encryption
    polymorphic
    metamorphic

Page 12

Virus Evolution - Encryption

Virus consists of
    decrypting module (decryptor)
    encrypted virus body
Different encryption key => different virus body signature
Weakness: decryptor can be detected
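The weakness can be seen in a harmless toy sketch. Everything here is an assumption made for illustration and none of it comes from the slides: a one-byte XOR stands in for the cipher, a placeholder string stands in for the virus body, and the made-up bytes in `DECRYPTOR_STUB` stand in for the decryptor.

```python
# Toy illustration only: the "body" is a harmless placeholder string, and the
# XOR cipher, names and byte values are assumptions made for this sketch.
BODY = b"placeholder for a constant virus body"
DECRYPTOR_STUB = b"\x01\x02\x03\x04"  # hypothetical bytes standing in for the decryptor


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key (the same call encrypts and decrypts)."""
    return bytes(b ^ key for b in data)


def make_copy(key: int) -> bytes:
    """Build one copy: constant decryptor, then the key byte, then the encrypted body."""
    return DECRYPTOR_STUB + bytes([key]) + xor_bytes(BODY, key)


copies = [make_copy(key) for key in (0x11, 0x5A, 0xC3)]

# The encrypted bodies all differ, so a signature taken from one body misses the others.
bodies = {copy[len(DECRYPTOR_STUB) + 1:] for copy in copies}

# Every copy still carries the same decryptor bytes, which can serve as a signature.
stub_found = [DECRYPTOR_STUB in copy for copy in copies]
```

Here `bodies` holds three distinct byte strings while `stub_found` is true for every copy, which is the situation the slide describes.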

Page 13

Virus Evolution – Polymorphism

Try to hide signature of decryptor
Can use code emulator to decrypt putative virus dynamically
Decrypted virus body is constant
Once (partially) decrypted, signature detection is possible

Page 14

Virus Evolution – Metamorphism

Change virus body
Mutation techniques:
    permutation of subroutines
    insertion of garbage/jump instructions
    substitution of instructions
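Two of these mutation techniques can be sketched on a toy accumulator language. The instruction set, the interpreter `run` and the mutator `morph` are all hypothetical and exist only for this example; real metamorphic engines work on machine code, and subroutine permutation is not shown.

```python
import random


def run(program, x):
    """Interpret a toy accumulator program and return the final accumulator value."""
    acc = x
    for op, arg in program:
        if op == "add":
            acc += arg
        elif op == "sub":
            acc -= arg
        elif op == "mul":
            acc *= arg
        elif op == "nop":
            pass  # garbage instruction: no effect
        else:
            raise ValueError(f"unknown opcode: {op}")
    return acc


def morph(program, rng):
    """Return a variant with the same behavior but a different opcode sequence."""
    out = []
    for op, arg in program:
        # Garbage insertion: sometimes emit a do-nothing instruction first.
        if rng.random() < 0.5:
            out.append(("nop", 0))
        # Instruction substitution: add n <-> sub -n are equivalent.
        if op == "add" and rng.random() < 0.5:
            out.append(("sub", -arg))
        elif op == "sub" and rng.random() < 0.5:
            out.append(("add", -arg))
        else:
            out.append((op, arg))
    return out


ORIGINAL = [("add", 3), ("mul", 2), ("sub", 5)]
variants = [morph(ORIGINAL, random.Random(seed)) for seed in range(5)]
```

Every entry in `variants` computes the same function as `ORIGINAL`, while the opcode sequences are free to differ from copy to copy.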

Page 15

PART II

Virus Construction Kits

Page 16

Virus Construction Kits – PS-MPC

According to Peter Szor:
“… PS-MPC [Phalcon/Skism Mass-Produced Code generator] uses a generator that effectively works as a code-morphing engine…
… the viruses that PS-MPC generates are not [only] polymorphic, but their decryption routines and structures change in variants…”

Page 17

Virus Construction Kits – G2

From the documentation of G2 (Second Generation virus generator):

“… different viruses may be generated from identical configuration files…”

Page 18

Virus Construction Kits – NGVCK

From the documentation for NGVCK (Next Generation Virus Creation Kit):

“… all created viruses are completely different in structure and opcode…
… impossible to catch all variants with one or more scanstrings.…
… nearly 100% variability of the entire code”

Oh, really?

Page 19

PART III

How Effective Are Metamorphic Engines?

Page 20

How We Compare Two Pieces of Code

[Figure: how two assembly programs, Program X and Program Y, are compared.]
    Opcode sequences: Program X at positions 0 to n-1 (call, pop, mov, sub, …, jmp); Program Y at positions 0 to m-1 (push, mov, sub, and, …, retn)
    Graph of matches: Program X positions plotted against Program Y positions (matching 3 opcodes)
    Graph of real matches: same axes (lines with length > 5)
    Score: score = average % match
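The comparison can be sketched as follows. The slide gives only the window of 3 opcodes, the "length > 5" filter and "score = average % match"; the rest is an assumption of this sketch, in particular that windows must match in order, that line length is counted in opcodes, and that the score averages the fraction of each program covered by surviving lines. The function name `similarity` is hypothetical.

```python
def similarity(x, y, window=3, min_len=5):
    """Similarity of two opcode sequences, in [0, 1], under the assumptions above."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0.0

    # Graph of matches: (i, j) is a point when the window starting at x[i]
    # equals the window starting at y[j].
    points = set()
    for i in range(n - window + 1):
        for j in range(m - window + 1):
            if x[i:i + window] == y[j:j + window]:
                points.add((i, j))

    # Graph of real matches: follow each diagonal line and keep it only if it
    # spans more than min_len opcodes.
    covered_x, covered_y = set(), set()
    for i, j in points:
        if (i - 1, j - 1) in points:
            continue  # not the start of a line
        k = 0
        while (i + k, j + k) in points:
            k += 1
        length = k + window - 1  # opcodes spanned by k consecutive window matches
        if length > min_len:
            covered_x.update(range(i, i + length))
            covered_y.update(range(j, j + length))

    # Score: average of the matched fraction of each program.
    return (len(covered_x) / n + len(covered_y) / m) / 2


prog_x = ["call", "pop", "mov", "sub", "add", "jmp", "retn"]
prog_y = ["push", "pop", "mov", "sub", "xor", "and", "or"]
score_same = similarity(prog_x, prog_x)
score_diff = similarity(prog_x, prog_y)
```

The two programs share one 3-opcode window (pop, mov, sub), but that line is too short to survive the length filter, so it does not count as a real match.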

Page 21

Virus Families – Test Data

Four generators, 45 viruses
    20 viruses by NGVCK
    10 viruses by G2
    10 viruses by VCL32
    5 viruses by MPCGEN

20 normal utility programs from the Cygwin bin directory

Page 22

Similarity within Virus Families – Results

[Figure: similarity score (0.0 to 1.0) plotted against comparison number (0 to 200) for NGVCK viruses and normal files.]

Page 23

Similarity within Virus Families – Results

           NGVCK     G2        VCL32     MPCGEN    Normal
min        0.01493   0.62845   0.34376   0.44964   0.13603
max        0.21018   0.84864   0.92907   0.96568   0.93395
average    0.10087   0.74491   0.60631   0.62704   0.34689

Minimum, maximum, and average similarity scores

Page 24

Similarity within Virus Families – Results

[Figure: bubble chart of maximum similarity score against minimum similarity score for NGVCK, G2, VCL32, MPCGEN and normal files. Size of bubble = average similarity.]

Page 25

Similarity within Virus Families – Results

IDA_NGVCK0 - IDA_NGVCK8 (11.9%)

IDA_G4 - IDA_G7 (75.2%)

Page 26

Similarity within Virus Families – Results

IDA_VCL0 - IDA_VCL9 (60.2%)

IDA_MPC1 - IDA_MPC3 (58.0%)

Page 27

NGVCK Similarity to Virus Families

NGVCK versus other viruses
    0% similar to G2 and MPCGEN viruses
    0 – 5.5% similar to VCL32 viruses (43 out of 100 comparisons have score > 0)
    0 – 1.2% similar to normal files (only 8 out of 400 comparisons have score > 0)

Page 28

NGVCK Metamorphism/Similarity

NGVCK
    By far the highest degree of metamorphism of any kit tested
    Virtually no similarity to other viruses or normal programs
Undetectable???

Page 29

PART IV

Can Metamorphic Viruses Be Detected?

Page 30

Commercial Virus Scanners

Tested three virus scanners
    eTrust version 7.0.405
    avast! antivirus version 4.7
    AVG Anti-Virus version 7.1
Each scanned 37 files
    10 NGVCK viruses
    10 G2 viruses
    10 VCL32 viruses
    7 MPCGEN viruses

Page 31

Commercial Virus Scanners

Results
    eTrust and avast! detected 17 viruses (G2 and MPCGEN)
    AVG detected 27 viruses (G2, MPCGEN and VCL32)
    None of the NGVCK viruses were detected by the scanners tested

Page 32

Virus Detection with HMMs

Use hidden Markov models (HMMs) to represent statistical properties of a set of metamorphic virus variants
    Train the model on family of metamorphic viruses
    Use trained model to determine whether a given program is similar to the viruses the HMM represents

Page 33

Virus Detection with HMMs – Data

Data set
    200 NGVCK viruses (160 for training, 40 for testing)
Comparison set
    40 normal exes from Cygwin
    25 other “non-family” viruses (G2, MPCGEN and VCL32)
25 HMM models generated and tested
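One way a 160/40 split can yield 25 models is sketched below. It assumes five-fold cross-validation combined with N = 2 to 6 hidden states; the slides state only the split sizes and the total of 25 models, so the fold scheme, the range of N and the function name `cross_validation_runs` are assumptions of this sketch.

```python
def cross_validation_runs(num_files=200, num_folds=5, state_counts=range(2, 7)):
    """List one (test set, N) configuration per model to train.

    Assumption: each fold of 40 files is held out once for testing while the
    remaining 160 files are used for training, for every value of N.
    """
    fold_size = num_files // num_folds
    files = list(range(num_files))  # file indices stand in for virus files
    runs = []
    for fold in range(num_folds):
        test = files[fold * fold_size:(fold + 1) * fold_size]
        held_out = set(test)
        train = [f for f in files if f not in held_out]
        for n_states in state_counts:
            runs.append({"test_set": fold, "N": n_states,
                         "train": train, "test": test})
    return runs


runs = cross_validation_runs()
```

With the default arguments this yields 5 test sets times 5 values of N, that is 25 configurations, each with 160 training files and 40 test files.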

Page 34

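Virus Detection with HMMs – Methodology

[Figure: HMM training and classification workflow.]

Training:
    (1) Data set: training set (160 files) and test set (40 files); comparison set: normal programs (40 files) and other viruses (25 files)
    (2) Training on the training set produces the HMM
    (3) Scoring the test set and the comparison set with the HMM gives scores (LLPO), e.g. virus0 -2.0, virus1 -2.3, …, random0 -11.3, …, other0 -8.9
    (4) Threshold determined from the scores

Classifying:
    (1) Scoring: Program A is scored with the HMM
    Score (LLPO) > Threshold ?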
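The thresholding step can be sketched with the four example scores shown in the figure. The midpoint rule in `pick_threshold` is an assumption of this sketch, since the slides do not say how the threshold was chosen, and all function names are hypothetical.

```python
def pick_threshold(family_scores, other_scores):
    """Midpoint between the lowest family score and the highest non-family score.

    Returns None when the two groups overlap and no clean threshold exists.
    The midpoint rule is an assumption, not the method from the slides.
    """
    lowest_family = min(family_scores)
    highest_other = max(other_scores)
    if highest_other >= lowest_family:
        return None
    return (lowest_family + highest_other) / 2


def is_family_virus(llpo, threshold):
    """Classify as in the figure: score (LLPO) > threshold means family virus."""
    return llpo > threshold


def rates(family_scores, other_scores, threshold):
    """Return (detection rate, false positive rate) for a given threshold."""
    detected = sum(is_family_virus(s, threshold) for s in family_scores)
    false_pos = sum(is_family_virus(s, threshold) for s in other_scores)
    return detected / len(family_scores), false_pos / len(other_scores)


family = [-2.0, -2.3]     # virus0, virus1 (example scores from the figure)
others = [-11.3, -8.9]    # random0, other0 (example scores from the figure)
threshold = pick_threshold(family, others)
detection_rate, false_positive_rate = rates(family, others, threshold)
```

For these four scores the groups are well separated, so any threshold between -8.9 and -2.3 classifies all of them correctly.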

Page 35

Virus Detection with HMMs – Results

[Figure: test set 0, N = 2. Score (LLPO), from -160 to 0, plotted against file number (0 to 40) for family viruses and normal files.]

Page 36

Virus Detection with HMMs – Results

Detect some other viruses “for free”

[Figure: test set 0, N = 3. Score (LLPO), from -180 to 0, plotted against file number (0 to 40) for family viruses, non-family viruses and normal files.]

Page 37

Virus Detection with HMMs

Summary of experimental results
    All normal programs distinguished
    VCL32 viruses had scores close to NGVCK family viruses
    With proper threshold, 17 HMM models had 100% detection rate and 10 models had 0% false positive rate
    No significant difference in performance between HMMs with 3 or more hidden states

Page 38

Virus Detection with HMMs – Trained Models

Converged probabilities in HMM matrices may give insight into the features of the represented viruses

We observe
    opcodes grouped into “hidden” states
    most opcodes in one state only
What does this mean?
    We are not sure…

Page 39

Detection via Similarity Index

Straightforward similarity index can be used as detector
    To determine whether a program belongs to the NGVCK virus family, compare it to any randomly chosen NGVCK virus
    NGVCK similarity to non-NGVCK code is small
    Can use this fact to detect metamorphic NGVCK variants

Page 40

Detection via Similarity Index

[Figure: similarity-index workflow.]

Threshold determination:
    Pairwise comparison: Virus V (randomly chosen) against a subset of data set D (randomly chosen): Virus 0, Virus 1, …, Virus X
    Scoring gives similarity scores, e.g. Virus 0 0.035, Virus 1 0.041, …, Virus X 0.189

Classifying:
    Scoring: Program A is compared with Virus V
    Similarity score > Threshold ?
        Yes => family virus
        No => not family virus

Page 41

Detection via Similarity Index

Experiment
    compare 105 programs to one selected NGVCK virus
Results
    100% detection, 0% false positive
    Does not depend on specific NGVCK virus selected

Page 42

PART V

Conclusion

Page 43

Conclusion

Metamorphic generators vary a lot
    NGVCK has highest metamorphism (10% similarity on average)
    Other generators far less effective (roughly 61% to 74% similarity on average)
    Normal files 35% similar, on average
But, NGVCK viruses can be detected!
    NGVCK viruses too different from other viruses and normal programs

Page 44

Conclusion

NGVCK viruses not detected by commercial scanners we tested

Hidden Markov model (HMM) detects NGVCK (and other) viruses with high accuracy

NGVCK viruses also detectable by similarity index

Page 45

Conclusion

All metamorphic viruses tested were detectable because
    High similarity within family and/or
    Too different from normal programs
Effective use of metamorphism by virus/worm requires
    A high degree of metamorphism and similarity to other programs
    This is not trivial!

Page 46

The Bottom Line

Metamorphism for “good”
    Buffer overflow mitigation, BOBE-resistance
    A little metamorphism does a lot of good
Metamorphism for “evil”
    For example, try to evade virus/worm signature detection
    Requires high degree of metamorphism and similarity to normal programs
    Not impossible, but not easy…

Page 47

The Bottom Bottom Line

All too often in security, the advantage lies with the bad guys

For metamorphic software, perhaps the inherent advantage lies with the good guys

Page 48

References

X. Gao, Metamorphic software for buffer overflow mitigation, MS thesis, Dept. of CS, SJSU, 2005
P. Szor, The Art of Computer Virus Research and Defense, Addison-Wesley, 2005
M. Stamp, Information Security: Principles and Practice, Wiley InterScience, 2005
M. Stamp, Applied Cryptanalysis: Breaking Ciphers in the Real World, Wiley, 2007
W. Wong, Analysis and detection of metamorphic computer viruses, MS thesis, Dept. of CS, SJSU, 2006
W. Wong and M. Stamp, Hunting for metamorphic engines, Journal in Computer Virology, Vol. 2, No. 3, 2006, pp. 211-229

Page 49

Appendix

Bonus Material

Page 50

Hidden Markov Models (HMMs)

state machines
transitions between states have fixed probabilities
each state has a probability distribution for observing a set of observation symbols
states = features of the input data
transition and the observation probabilities = statistical properties of features
can “train” an HMM to represent a set of data (in the form of observation sequences)

Page 51

HMM Example – the Occasionally Dishonest Casino

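[Figure: two-state HMM with states Fair and Loaded.]

Roll    Fair    Loaded
1       1/6     1/10
2       1/6     1/10
3       1/6     1/10
4       1/6     1/10
5       1/6     1/10
6       1/6     1/2

Transition probabilities:
    Fair => Fair 0.95
    Fair => Loaded 0.05
    Loaded => Loaded 0.9
    Loaded => Fair 0.1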

Page 52

HMM Example – the Occasionally Dishonest Casino

2 states: fair/loaded
The switch between dice is a Markov process
Outcomes of a roll have different probabilities in each state
If we can only see a sequence of rolls, the state sequence is hidden
    want to understand the underlying Markov process from the observations
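The casino can be simulated in a few lines, which makes the hidden and observed parts concrete. The probabilities are those of the example; starting in the Fair state and the helper names `sample` and `simulate` are assumptions of this sketch.

```python
import random

STATES = ("Fair", "Loaded")

# Transition probabilities between the two dice.
TRANSITION = {
    "Fair": {"Fair": 0.95, "Loaded": 0.05},
    "Loaded": {"Fair": 0.10, "Loaded": 0.90},
}

# Observation probabilities of each face in each state.
EMISSION = {
    "Fair": {face: 1 / 6 for face in range(1, 7)},
    "Loaded": {1: 1 / 10, 2: 1 / 10, 3: 1 / 10, 4: 1 / 10, 5: 1 / 10, 6: 1 / 2},
}


def sample(distribution, rng):
    """Draw one outcome from a {outcome: probability} mapping."""
    outcomes = list(distribution)
    weights = [distribution[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def simulate(num_rolls, rng, start="Fair"):
    """Return (hidden state sequence, observed rolls). The start state is an assumption."""
    state = start
    states, rolls = [], []
    for _ in range(num_rolls):
        states.append(state)
        rolls.append(sample(EMISSION[state], rng))
        state = sample(TRANSITION[state], rng)
    return states, rolls


states, rolls = simulate(300, random.Random(0))
```

An observer sees only `rolls`; `states` is the hidden sequence that the HMM machinery tries to reason about.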

Page 53

HMMs – the Three Problems

1. Find the likelihood of seeing an observation sequence O given a model λ, i.e. P(O | λ)

2. Find an optimal state sequence that could have generated a sequence O

3. Find the model parameters given a sequence O

There exist efficient algorithms to solve the three problems
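As a sketch of two of these algorithms, the code below applies the scaled forward algorithm (Problem 1) and the Viterbi algorithm (one common reading of "optimal" in Problem 2) to the dishonest casino. The uniform initial distribution `PI`, the encoding of faces 1 to 6 as indices 0 to 5, and the function names are assumptions; the slides do not say which sense of "optimal" is intended.

```python
import math

# Dishonest casino parameters: state 0 = Fair, state 1 = Loaded.
A = [[0.95, 0.05], [0.10, 0.90]]               # transition probabilities
B = [[1 / 6] * 6, [1 / 10] * 5 + [1 / 2]]      # observation probabilities
PI = [0.5, 0.5]                                # initial distribution (assumed)


def log_likelihood(obs, A, B, pi):
    """Problem 1: scaled forward algorithm, returns log P(O | model)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)              # rescale to avoid numerical underflow
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


def viterbi(obs, A, B, pi):
    """Problem 2 (Viterbi reading): the single most probable state path."""
    n = len(pi)
    delta = [math.log(pi[i]) + math.log(B[i][obs[0]]) for i in range(n)]
    back = []
    for o in obs[1:]:
        prev, delta, pointers = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + math.log(A[i][j]))
            pointers.append(best)
            delta.append(prev[best] + math.log(A[best][j]) + math.log(B[j][o]))
        back.append(pointers)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for pointers in reversed(back):     # trace the back-pointers to the start
        state = pointers[state]
        path.append(state)
    return path[::-1]


sixes = [5, 5, 5, 5, 5]                 # five rolls of a six
score_per_symbol = log_likelihood(sixes, A, B, PI) / len(sixes)
path = viterbi(sixes, A, B, PI)
```

Dividing the log likelihood by the sequence length gives a length-normalized score, in the spirit of the per-opcode LLPO scores used in the detection experiments.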

Page 54

HMM

Page 55

HMM Application – Determining the Properties of English Text

Given: a large quantity of written English text

Input: a long sequence of observations consisting of 27 symbols (the 26 lower-case letters and the word space)

Train a model to find the most probable parameters (i.e., solve Problem 3)
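Preparing the input might look like the sketch below, which maps raw text onto the 27 symbols. Dropping punctuation and digits, folding case, collapsing runs of whitespace into one word space, and the name `to_observations` are assumptions; the slides specify only the 27-symbol alphabet.

```python
import string

# 27 observation symbols: indices 0-25 are the letters a-z, index 26 is the word space.
ALPHABET = string.ascii_lowercase + " "
SPACE = ALPHABET.index(" ")


def to_observations(text):
    """Convert raw text into a list of symbol indices in the range 0 to 26."""
    symbols = []
    for ch in text.lower():
        if ch.isspace():
            ch = " "                     # treat any whitespace as a word space
        if ch not in ALPHABET:
            continue                     # drop punctuation, digits, etc. (assumption)
        if ch == " " and symbols and symbols[-1] == SPACE:
            continue                     # collapse repeated word spaces
        symbols.append(ALPHABET.index(ch))
    return symbols


observations = to_observations("Ab, c")
```

The resulting list of integers is the observation sequence O that a training routine for Problem 3 would consume.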

Page 56

HMM Application – Initial and Final Observation Probability Distributions

Page 57

HMM Application - Results

Observation probabilities converged, each letter belongs to one of the two hidden states

The two states correspond to consonants and vowels

Can use trained model to score any unknown sequence of letters to determine whether it corresponds to English text (i.e., Problem 1)

Note: no a priori assumption was made
    HMM effectively recovered the statistically significant feature inherent in English

Page 58

HMM Application - Results

Probabilities can be sensibly interpreted for up to n = 12 hidden states

Trained model could be used to detect English text, even if the text is “disguised” by, say, a simple substitution cipher or similar transformation

Page 59

HMMs – The Trained Models

[Figure: observation probability (0.00 to 0.40) of each opcode in state 0, state 1 and state 2. Opcodes shown: pop, retn, push, jb, rcl, jnb, ja, div, adc, ror, shr, rol, add, sar, bound, cmpsb, retf, mov, xor, dec, not, imul, movsb, stosd, lodsw, lodsd, lodsb, in, repe, movsd, fnstenv, cmc, jns, jle, clc, rcr, fild, out.]

Page 60

HMMs – Run Time of Training Process

5 to 38 minutes, depending on number of states N.

[Figure: training time in seconds (0 to 2500) plotted against number of states N (1 to 7), for 500 iterations and 800 iterations.]

Page 61

HMMs – Run Time of Classifying Process

0.008 to 0.4 milliseconds, depending on N and number of opcodes T.

[Figure: scoring time in milliseconds (0 to 0.45) plotted against length of observation sequence T (0 to 1500), for N = 2, 3, 4, 5 and 6.]

Page 62

AVG Anti-Virus Scanning Result