Detection of ASCII Malware

Parbati Kumar Manna

Dr. Sanjay Ranka

Dr. Shigang Chen

Internet Worm and Malware

• Huge damage potential

Infects hundreds of thousands of computers

Costs millions of dollars in damage

Melissa, ILOVEYOU, Code Red, Nimda, Slammer, SoBig, MyDoom

• Mostly exploit buffer overflows

• Propagation is automatic (mostly)

Recent Trends

• Shift in hackers’ mindset

• Malware becoming increasingly evasive and obfuscated

• Emergence of zero-day worms

• Arrival of script kiddies

Motivation for ASCII Attacks

• Prevalence of servers expecting text-only input

• Text-based protocols

• Presumption of text being benign

• Deployment of ASCII filters that let only text through

Can an IDS Detect ASCII Attacks?

• Disassembly-based IDS

All short conditional jump opcodes (0x70–0x7F) are ASCII

Text therefore decodes to a higher proportion of branches

Exponential disassembly cost, since every branch must be followed

High processing overhead for the IDS

• Frequency-based IDS

PAYL is evaded by ASCII worms

Buffer Overflow

• Opcode unavailability

Shellcode requires binary opcodes

Only a few instructions are printable, such as xor, and, sub, cmp

Opcodes must be generated dynamically

• Difficulty in encryption

No backward jump: a negative 8-bit displacement is a byte in 0x80–0xFF, outside ASCII

The same decrypter routine can't be reused for each encrypted block

No one-to-one correspondence between ASCII and binary
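The opcode constraint can be made concrete with a short Python sketch that lists which one-byte IA-32 opcodes fall in the printable range 0x20–0x7E and shows why every printable jump displacement points forward. The opcode table is a hand-written partial excerpt of the one-byte opcode map (an assumption, not disassembler output), and `GROUPS`, `printable_opcodes` and `rel8_displacement` are hypothetical names.

```python
# Printable ASCII: space (0x20) through tilde (0x7E).
PRINTABLE = range(0x20, 0x7F)

# Hand-written, partial excerpt of the one-byte IA-32 opcode map.
# The gaps 0x26, 0x2E, 0x36, 0x3E are segment-override prefixes;
# 0x27, 0x2F, 0x37, 0x3F are DAA, DAS, AAA, AAS.
GROUPS = {
    "and": range(0x20, 0x26),
    "sub": range(0x28, 0x2E),
    "xor": range(0x30, 0x36),
    "cmp": range(0x38, 0x3E),
    "inc/dec": range(0x40, 0x50),
    "push/pop": range(0x50, 0x60),
    "jcc rel8": range(0x70, 0x80),
}


def printable_opcodes():
    """Map each opcode group to the opcode bytes that are printable ASCII."""
    return {name: [b for b in ops if b in PRINTABLE] for name, ops in GROUPS.items()}


def rel8_displacement(byte):
    """Interpret one byte as a signed 8-bit jump displacement."""
    return byte - 0x100 if byte >= 0x80 else byte


if __name__ == "__main__":
    for name, ops in printable_opcodes().items():
        print(f"{name:9}", "".join(chr(b) for b in ops))
    # Every printable displacement is positive, so a printable jump can only
    # go forward: no loops, hence no decrypter loop reused across blocks.
    disps = [rel8_displacement(b) for b in PRINTABLE]
    print("printable rel8 displacements:", min(disps), "to", max(disps))
```

Note that 0x7F (JG) drops out of the printable Jcc group, leaving fifteen conditional jumps.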

Constraints of ASCII Malware

[Figure: ASCII and binary representations]

Creation of ASCII Malware

Buffer Overflow using ASCII

Overflowing a buffer using an ASCII string:

• Opcode unavailability

Dynamic generation of opcodes needs several ASCII instructions for each binary instruction

• Difficulty in encryption

No backward jump means a decrypter block must be hardcoded for each encrypted block

Long sequence of contiguous valid instructions is likely, so MEL is high
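A minimal sketch of how one binary dword can be generated from printable code, using a widely known trick rather than the authors' own generator: zero EAX with two ANDs whose masks share no set bits, then SUB three printable dwords so that 0 - v1 - v2 - v3 equals the target modulo 2^32. It assumes IA-32, EAX as the work register and the full printable range as the allowed alphabet; `sub_operands` and `ascii_push` are hypothetical helpers.

```python
# Printable byte values accepted as operands (an assumption; real encoders
# often restrict themselves to alphanumerics).
ALLOWED = range(0x20, 0x7F)

# Two printable AND masks whose bitwise AND is zero: after
# "and eax, ZERO_A" and "and eax, ZERO_B", EAX is 0 whatever it held.
ZERO_A, ZERO_B = 0x554E4D4A, 0x2A313235  # "JMNU" and "521*" in memory


def _split_byte(t, carry):
    """Find printable a, b, c with a + b + c + carry == t (mod 256)."""
    for a in ALLOWED:
        for b in ALLOWED:
            c = (t - a - b - carry) & 0xFF
            if c in ALLOWED:
                return a, b, c
    raise ValueError("no printable split")  # unreachable for 0x20-0x7E


def sub_operands(target):
    """Printable dwords v1, v2, v3 with 0 - v1 - v2 - v3 == target (mod 2**32)."""
    need = (-target) & 0xFFFFFFFF  # v1 + v2 + v3 must equal this
    words, carry = [0, 0, 0], 0
    for i in range(4):  # least-significant byte first, propagating carries
        parts = _split_byte((need >> (8 * i)) & 0xFF, carry)
        carry = (sum(parts) + carry) >> 8
        for k, byte in enumerate(parts):
            words[k] |= byte << (8 * i)
    return tuple(words)


def ascii_push(target):
    """Printable IA-32 code that pushes the (possibly non-printable) dword target."""
    code = b"%" + ZERO_A.to_bytes(4, "little")  # 0x25: and eax, imm32
    code += b"%" + ZERO_B.to_bytes(4, "little")
    for v in sub_operands(target):
        code += b"-" + v.to_bytes(4, "little")  # 0x2D: sub eax, imm32
    return code + b"P"                          # 0x50: push eax
```

Every call returns 26 printable bytes to place 4 arbitrary bytes on the stack, which illustrates why each binary instruction costs several ASCII instructions.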

Detection of ASCII Malware

What is this MEL?

• Indicates the maximum length of an execution path

Need to disassemble (and execute) from all possible entry points

All branches must be considered

• Abstract payload execution (APE)

Used for binary worms with a sled

Its effectiveness has since dwindled

Maximum Executable Length
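The MEL definition can be sketched as a right-to-left dynamic program over entry points: the run starting at offset i is one instruction plus the run starting where that instruction ends. This is a simplified sketch that assumes a caller-supplied decoder and does not follow branches (which a full MEL computation must consider); `toy_decode` is a hypothetical stand-in, not real x86 decoding.

```python
from typing import Callable, Tuple

# A decoder maps (data, offset) to (instruction length, is_valid).
Decoder = Callable[[bytes, int], Tuple[int, bool]]


def max_executable_length(data: bytes, decode: Decoder) -> int:
    """Longest straight-line run of valid instructions over all entry points.

    Simplification: branches are not followed, and an instruction that runs
    past the end of the data counts as invalid.
    """
    n = len(data)
    mel = [0] * (n + 1)  # mel[i] = valid instructions executed from offset i
    for i in range(n - 1, -1, -1):  # right to left, so mel[i + length] is ready
        length, valid = decode(data, i)
        if valid and i + length <= n:
            mel[i] = 1 + mel[i + length]
    return max(mel)


def toy_decode(data: bytes, i: int) -> Tuple[int, bool]:
    """Hypothetical toy decoder, not real x86 decoding."""
    b = data[i]
    if b in b"lmno":  # 0x6C-0x6F: INS/OUTS, privileged in user mode
        return 1, False
    if b in b"$,4<":  # and/sub/xor/cmp al, imm8: two bytes
        return 2, True
    return 1, True    # everything else: one valid byte
```

With the toy decoder, `max_executable_length(b"hello world", toy_decode)` returns 2, because the letters l and o break every run.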

Benign Text has Low MEL

• Contains characters that correspond to invalid instructions

Privileged instructions (I/O): 'l', 'm', 'n', 'o' (0x6C–0x6F) decode to INS/OUTS

Arbitrary segment selectors

More memory-accessing instructions, which may use uninitialized registers

Long sequence of contiguous valid instructions is unlikely, so MEL is low

Proposed Solution

Question:

• How long is “long”?

• Find the maximum length of a valid instruction sequence

• If it is long enough, the stream contains malware

• Toss a coin n times

• What is the probability that the max distance between two consecutive heads is ?

Probabilistic Analysis

Head (H) → Invalid instruction (I)

Tail (T) → Valid instruction (V)

T H T T H T T T T T H T T T

V I V V I V V V V V I V V V
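The coin-toss analogy maps directly to computing $X_{\max}$ for a stream once each instruction is labelled valid or invalid. The sketch below (hypothetical helper names) reproduces the example sequence; it also counts runs before the first and after the last invalid instruction, a boundary choice the slides do not specify.

```python
# Map coin outcomes to instruction validity, as in the analogy above.
COIN_TO_INSN = str.maketrans({"H": "I", "T": "V"})


def max_inter_head_distance(seq: str, head: str = "I") -> int:
    """Length of the longest run of non-head symbols.

    Runs before the first head and after the last head are included.
    """
    return max(len(run) for run in seq.replace(" ", "").split(head))


coins = "T H T T H T T T T T H T T T"
insns = coins.translate(COIN_TO_INSN)
print(insns)                           # V I V V I V V V V V I V V V
print(max_inter_head_distance(insns))  # 5
```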

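$n$ = number of coin tosses, $p$ = probability of a head

$X_i$ = random variables for the inter-head distances

$X_{\max}$ = maximum inter-head distance

C.D.F. of $X_{\max}$:

$$\Pr[X_{\max} \le x] = \left[1 - p(1-p)^{x}\right]^{n}$$

F.P. rate for a threshold $\tau$:

$$1 - \Pr[X_{\max} \le \tau] = 1 - \left[1 - p(1-p)^{\tau}\right]^{n}$$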
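The F.P. rate formula can be evaluated directly; this sketch uses `math.log1p` and `math.expm1` so that the result stays accurate when $p(1-p)^{\tau}$ is tiny and $n$ is large. The function name and the values of $n$ and $p$ in the demo are illustrative assumptions, not figures from the experiments.

```python
import math


def false_positive_rate(n: int, p: float, tau: float) -> float:
    """F.P. rate = 1 - [1 - p(1-p)^tau]^n, evaluated stably.

    log1p/expm1 avoid cancellation when p(1-p)^tau is tiny and n is large.
    """
    x = p * (1.0 - p) ** tau
    return -math.expm1(n * math.log1p(-x))


if __name__ == "__main__":
    # Illustrative values only: n = 1000 instructions, p = 0.2.
    for tau in (20, 30, 40, 50):
        print(tau, false_positive_rate(1000, 0.2, tau))
```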

For a fixed N = k (exactly k invalid instructions)

For all possible values of N:

Threshold Calculation

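Known: $n$, $p$, $\alpha$ (false positive rate)

Unknown: $\tau$ (max inter-head distance)

Threshold (set the F.P. rate equal to $\alpha$ and solve for $\tau$):

$$\tau = \frac{\log\left(1-(1-\alpha)^{1/n}\right) - \log p}{\log(1-p)}$$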
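Solving the threshold formula numerically takes one line; the sketch below (hypothetical `threshold` helper, illustrative parameter values) takes $n$, $p$ and $\alpha$ and returns $\tau$, which would be rounded up to a whole instruction count in practice. Comparing outputs shows that a larger $n$ or a smaller $p$ yields a larger $\tau$.

```python
import math


def threshold(n: int, p: float, alpha: float) -> float:
    """Solve alpha = 1 - [1 - p(1-p)^tau]^n for tau."""
    # x = 1 - (1 - alpha)^(1/n), computed stably for small alpha / large n
    x = -math.expm1(math.log1p(-alpha) / n)
    return (math.log(x) - math.log(p)) / math.log1p(-p)


if __name__ == "__main__":
    # Illustrative values only: n = 1000 instructions, 1% false positives.
    for p in (0.1, 0.2, 0.5):
        print(p, math.ceil(threshold(1000, p, 0.01)))
```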

Independence Assumption

χ² test contingency table

                Observed                      Expected
                I2 is valid   I2 is invalid   I2 is valid   I2 is invalid
I1 is valid     8960          2797            8922          2835
I1 is invalid   2797          938             2835          900

• Validity of an instruction is an independent event

• All the $X_i$’s are independent (although $\sum_i X_i = n$)
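Assuming the test is Pearson's χ² without continuity correction (the slides do not say which variant was used), the expected counts and the statistic can be recomputed from the observed counts. In this sketch the expected counts round to the Expected columns of the table, and the statistic comes out near 2.71, below the 5% critical value of 3.841 for one degree of freedom, so independence is not rejected at that level. `chi_square_2x2` is a hypothetical helper.

```python
def chi_square_2x2(observed):
    """Pearson chi-square statistic and expected counts for a 2x2 table.

    Expected count = row total * column total / grand total; no Yates
    continuity correction is applied.
    """
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    total = sum(rows)
    expected = [[rows[i] * cols[j] / total for j in range(2)] for i in range(2)]
    stat = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return stat, expected


# Observed counts (rows: I1 valid/invalid, columns: I2 valid/invalid).
OBSERVED = [[8960, 2797], [2797, 938]]
CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, 1 d.o.f., 5% level

stat, expected = chi_square_2x2(OBSERVED)
print(round(stat, 2), stat < CRITICAL_5PCT_DF1)
```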

With increasing $n$, we must choose a larger $\tau$ to keep the same false positive rate

With decreasing $p$, we must choose a larger $\tau$ to keep the same false positive rate

Determine n

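$$n = \frac{\text{number of input characters}}{\text{average instruction size}}$$

$E[I] = E[\text{prefix chain length}] + E[\text{core instruction length}]$, where $E[I]$ is the average instruction size

Both terms are obtained from the character frequencies of the input data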
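A minimal sketch of estimating $n$ from the character frequencies, under loudly stated assumptions: bytes are i.i.d. with the input's frequencies, so the prefix chain length is geometric with mean $q/(1-q)$, and `toy_core_len` is a hypothetical table of core instruction lengths rather than a real x86 length decoder. The prefix set lists the printable x86 prefix bytes; the helper names are hypothetical.

```python
from collections import Counter

# Printable x86 prefix bytes: segment overrides ES '&', CS '.', SS '6',
# DS '>', FS 'd', GS 'e', plus operand-size 'f' and address-size 'g'.
ASCII_PREFIXES = frozenset(b"&.6>defg")


def expected_instruction_length(freq, core_len, prefixes=ASCII_PREFIXES):
    """E[I] = E[prefix chain length] + E[core instruction length].

    Assumes bytes are i.i.d. with frequencies `freq` (byte -> probability),
    so the prefix chain length is geometric with mean q / (1 - q).
    """
    q = sum(f for b, f in freq.items() if b in prefixes)
    if q >= 1.0:
        raise ValueError("input consists only of prefix bytes")
    e_prefix = q / (1.0 - q)
    # Core opcode byte: distribution conditioned on "not a prefix".
    e_core = sum(f * core_len(b) for b, f in freq.items() if b not in prefixes) / (1.0 - q)
    return e_prefix + e_core


def estimate_n(data: bytes, core_len) -> float:
    """n = number of input characters / average instruction size."""
    counts = Counter(data)
    freq = {b: c / len(data) for b, c in counts.items()}
    return len(data) / expected_instruction_length(freq, core_len)


def toy_core_len(b: int) -> int:
    """Hypothetical core lengths: 'op AL, imm8' forms take 2 bytes, others 1."""
    return 2 if b in b"$,4<" else 1
```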

1. Privileged instructions

2. Wrong segment prefix selector

3. Uninitialized memory access

Determine p

Invalid Instructions

Only 1 and 2 can be determined on a standalone basis; 3 depends on register contents at run time
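Categories 1 and 2 can be checked byte by byte, so $p$ can be approximated from character counts. This sketch is simplified: it treats every byte as a potential instruction start, covers category 1 only through the printable string I/O opcodes, and leaves the category 2 byte set to the caller because the slides do not list which segment prefixes count as wrong. `estimate_p` is a hypothetical name.

```python
from collections import Counter

# 0x6C-0x6F ('l', 'm', 'n', 'o') decode to INSB, INSD, OUTSB, OUTSD:
# string I/O, which faults in user mode without I/O privilege.
PRIVILEGED_IO = frozenset(b"lmno")


def estimate_p(data: bytes, extra_invalid=frozenset()) -> float:
    """Fraction of bytes that would start an invalid instruction.

    Simplification: every byte is treated as a potential instruction start.
    `extra_invalid` lets the caller add bytes for category 2 (wrong segment
    prefix), whose exact set is an assumption left to the caller.
    """
    if not data:
        raise ValueError("empty input")
    counts = Counter(data)
    invalid = PRIVILEGED_IO | frozenset(extra_invalid)
    return sum(counts[b] for b in invalid) / len(data)


print(estimate_p(b"hello world"))  # 5 of the 11 bytes are l or o
```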

Experimental Setup

Implementation


• Benign data setup

ASCII stream captured from the live CISE network using Ethereal

• Malicious data setup

An existing framework was used to generate ASCII worms by converting binary worms

• Promising experimental results for maximum valid instruction length

Benign: all maximum values below the threshold

Malicious: values significantly higher than the threshold

Experimental Results (DAWN)

Experimental Results (APE-L)

Contrasting with APE

• Full content examination

• Threshold calculation

• Sled vs. malware

• Exploiting text-specific properties

Multilevel Encryption

Encryption: binary → ASCII → ASCII

Decryption: ASCII → ASCII → binary

Only the decrypter is visible

[Figure: multilevel encryption across Text (0x20–0x3F), Text (0x40–0x5F), Text (0x60–0x7E) and Binary]

Questions

Thank you
