Application Identification in information-poor environments Charalampos Rotsos 02/02/2010 1 What is application identification Current status My work Future plans Open questions

Dec 19, 2015

Page 1

Application Identification in information-poor environments

Charalampos Rotsos

02/02/2010

What is application identification
Current status
My work
Future plans
Open questions

Page 2

Why? E.g., CoS, security, performance analysis

Page 3

Taxonomy of Application identification techniques

• Deep Packet Inspection
– Match payload with well-known protocol signatures
• Statistical Analysis
– Extract network measurements (packet size, packet inter-arrival time) and search for patterns (ML, statistical analysis, etc.)
• Behavioral/Graph Analysis
– Find connection patterns
– Create features based on the connection graph
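The first technique in the taxonomy can be illustrated with a minimal sketch. The signature table and the function name `identify_by_payload` are assumptions made for illustration only; they do not come from the talk, and real DPI engines rely on far larger rule sets.

```python
import re

# Hypothetical, deliberately tiny signature table (illustrative assumption).
# Each pattern is matched against the first bytes of a flow's payload.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r\n"),
    "ssh": re.compile(rb"^SSH-\d\.\d+-"),
    "tls": re.compile(rb"^\x16\x03[\x00-\x04]"),
}


def identify_by_payload(payload: bytes) -> str:
    """Return the first application whose signature matches, else 'unknown'."""
    for app, pattern in SIGNATURES.items():
        if pattern.match(payload):
            return app
    return "unknown"
```

This approach needs access to payload bytes, which is exactly what is unavailable in an information-poor environment such as a NetFlow-only deployment.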

Page 4

Statistical Analysis

Focused on flow features
• Which features are high-quality?
• Which features are computationally simple?

???

Packet-size

Inter-packet-rate

TCP header information

Flow duration
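As a rough sketch of how such flow features might be computed, the snippet below derives a few of them from a list of packets. The `Packet` record and the feature names are hypothetical; the sketch uses mean inter-arrival time as a stand-in for the inter-packet rate and leaves out TCP header information.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds (hypothetical record layout)
    size: int         # packet size in bytes


def flow_features(packets):
    """Compute simple per-flow features from the packets of a single flow."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    pkts = sorted(packets, key=lambda p: p.timestamp)
    sizes = [p.size for p in pkts]
    # Gaps between consecutive packets; empty for a single-packet flow.
    gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    return {
        "packet_count": len(pkts),
        "mean_packet_size": mean(sizes),
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
        "duration": pkts[-1].timestamp - pkts[0].timestamp,
    }
```

All of these features are cheap to compute in one pass over the flow, which speaks to the "computationally simple" question; whether they are high-quality depends on the traffic mix.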

Page 5

Progress so far
• The problem is solved
– 5 packets sufficient to classify a flow
– Achieve at least 90% accuracy on all classes
• But not really…
– Difficult to extract required features
– Identification accuracy
– Temporal stability is awful
– Technical issues:

Page 6

Can we do better?

• Restate the problem.
• Use information that can be extracted from current networks (i.e., SNMP, NetFlow).
• Use better machine learning.
• Define models that bridge the gap between statistical and behavioral properties.

Page 7

Better ML on NetFlow
• Semi-supervised learning on NetFlow data using Bayesian data analysis.
– Better performance than the Bayes classifier in Weka
– Bayesian modeling provides good parameterization
– Efficient reduction of the effect of time dependence of the feature set

– Temporal and spatial decay
– Difficult to find a model that is both accurate and flexible
– NetFlow doesn’t provide clean separation of classes
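The talk does not specify the Bayesian model used, so the sketch below swaps in a simpler, well-known technique to illustrate the semi-supervised idea: a Gaussian naive Bayes classifier fitted with EM, where labeled flows keep fixed labels and unlabeled flows contribute soft labels. All function names and the data layout are assumptions.

```python
import math


def log_gaussian(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def m_step(points, resp, classes, min_var):
    """Re-estimate prior, per-feature means and variances for each class."""
    dims = len(points[0])
    total = sum(sum(r.values()) for r in resp)
    model = {}
    for c in classes:
        weight = sum(r[c] for r in resp)
        means = [sum(r[c] * x[d] for x, r in zip(points, resp)) / weight
                 for d in range(dims)]
        variances = [max(min_var,
                         sum(r[c] * (x[d] - means[d]) ** 2
                             for x, r in zip(points, resp)) / weight)
                     for d in range(dims)]
        model[c] = (weight / total, means, variances)
    return model


def posterior(model, x):
    """Class posterior for one feature vector under the naive Bayes model."""
    log_p = {c: math.log(prior) + sum(log_gaussian(v, m, s)
                                      for v, m, s in zip(x, means, variances))
             for c, (prior, means, variances) in model.items()}
    top = max(log_p.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(lp - top) for c, lp in log_p.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


def fit(labeled, unlabeled, iterations=20, min_var=1e-3):
    """labeled: list of (features, label); unlabeled: list of features.
    Every class must have at least one labeled example."""
    classes = sorted({y for _, y in labeled})
    points = [x for x, _ in labeled] + list(unlabeled)
    fixed = [{c: float(c == y) for c in classes} for _, y in labeled]
    model = m_step([x for x, _ in labeled], fixed, classes, min_var)
    for _ in range(iterations):
        soft = [posterior(model, x) for x in unlabeled]         # E-step
        model = m_step(points, fixed + soft, classes, min_var)  # M-step
    return model
```

A call such as `posterior(fit(labeled, unlabeled), features)` returns a probability per class. Unlike a fully Bayesian treatment, this sketch produces point estimates of the parameters and does nothing about temporal decay.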

Page 8

What is next?

• Richer dataset
– Aggregate flows for ports/hosts/networks
– Increase dimensions by simple feature engineering.
• Better mathematical models
– Incorporate domain-specific knowledge.
– Inference diagram defined by the connection graph.
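One way the aggregation step might look is sketched below: flow records are grouped by destination host and port, and a few derived features are computed per group. The record keys (`src`, `dst`, `dst_port`, `bytes`, `packets`) are a hypothetical layout, not a NetFlow export format.

```python
from collections import defaultdict


def aggregate_by_endpoint(flows):
    """Aggregate flow records per (destination host, destination port)."""
    agg = defaultdict(lambda: {"flows": 0, "bytes": 0, "packets": 0, "peers": set()})
    for f in flows:
        entry = agg[(f["dst"], f["dst_port"])]
        entry["flows"] += 1
        entry["bytes"] += f["bytes"]
        entry["packets"] += f["packets"]
        entry["peers"].add(f["src"])

    # Simple feature engineering on top of the raw counters.
    features = {}
    for key, e in agg.items():
        features[key] = {
            "flows": e["flows"],
            "distinct_peers": len(e["peers"]),
            "mean_bytes_per_flow": e["bytes"] / e["flows"],
            "mean_packet_size": e["bytes"] / e["packets"] if e["packets"] else 0.0,
        }
    return features
```

The same grouping could be keyed on host only or on network prefix to aggregate at the other levels mentioned above.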

Page 9

Inference Diagram

[Diagram: Alice (web browser), Web Server, Bob (web browser)]

• The flows between Alice and the web server are correlated and correspond to the same application.

• The flows Alice - web server and Bob - web server also correspond to the same application.

• Research on application identification hasn’t found a framework to accommodate these observations.

Page 10

Inference Diagram – more difficult

[Diagram: Alice (uses random ports); Bob (uses random ports); Ftp Server – port 22; Web Server – port 80; Database Server – port 1680]

• Computers will run multiple applications in parallel.

• BUT, applications on a particular server will always use a specific port.
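The fixed-server-port observation suggests a simple heuristic, sketched here under stated assumptions: for each flow, treat the (host, port) endpoint that appears in more flow records as the service side, since clients use random ports that rarely repeat. The function name and tuple layout are hypothetical, and the talk does not claim to use this heuristic.

```python
from collections import Counter


def service_endpoint_per_flow(flows):
    """flows: list of (src_ip, src_port, dst_ip, dst_port) tuples.

    Returns, for each flow, the endpoint guessed to be the service side.
    """
    counts = Counter()
    for src_ip, src_port, dst_ip, dst_port in flows:
        counts[(src_ip, src_port)] += 1
        counts[(dst_ip, dst_port)] += 1

    result = []
    for src_ip, src_port, dst_ip, dst_port in flows:
        src, dst = (src_ip, src_port), (dst_ip, dst_port)
        # Ties fall back to the destination endpoint.
        result.append(src if counts[src] > counts[dst] else dst)
    return result
```

Flows that map to the same service endpoint are candidates for sharing an application label, even when the client hosts run several applications in parallel.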

Page 11

A first approach!

• A similar problem can be found in the case of node labeling
– Aggregate flow records over some defined period
– Use a Markov Random Field model for inference propagation
– Apply approximate inference methods (Gibbs sampling, message passing)
– In the end, apply some engineering ideas to refine results
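The inference step above can be sketched with a small Gibbs sampler over a Potts-style Markov Random Field. This is an illustrative assumption about the model form, not the author's implementation: each node has per-label log-scores (for example from a flow classifier), and each edge adds a `coupling` bonus when its two nodes share a label. All names are hypothetical.

```python
import math
import random
from collections import Counter


def gibbs_label_marginals(unary, edges, coupling=1.0, sweeps=500, burn_in=100, seed=0):
    """Estimate per-node label marginals by Gibbs sampling.

    unary: {node: {label: log-score}}, same label set for every node.
    edges: iterable of (node_a, node_b) pairs from the connection graph.
    """
    rng = random.Random(seed)
    neighbors = {n: [] for n in unary}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    labels = sorted(next(iter(unary.values())))
    # Start from the label each node prefers on its own.
    state = {n: max(scores, key=scores.get) for n, scores in unary.items()}
    tallies = {n: Counter() for n in unary}

    for sweep in range(sweeps):
        for node in unary:
            # Conditional log-weight: own score plus agreement with neighbors.
            log_w = [unary[node][l]
                     + coupling * sum(state[m] == l for m in neighbors[node])
                     for l in labels]
            top = max(log_w)
            weights = [math.exp(w - top) for w in log_w]
            state[node] = rng.choices(labels, weights=weights)[0]
        if sweep >= burn_in:
            for node, label in state.items():
                tallies[node][label] += 1

    kept = sweeps - burn_in
    return {n: {l: tallies[n][l] / kept for l in labels} for n in unary}
```

With an ambiguous node connected to two confidently labeled neighbors, the coupling term pulls its marginal toward the neighbors' label. Whether the sampler has converged after a given number of sweeps is not checked here.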

Page 12

Open problems

• Is the model a good approximation?
• What am I classifying and for how long?
• Ports, hosts or networks? Is it possible to do multi-layer analysis?
• Are the approximation techniques converging?

Turning the difficulty to “Eleven”…
• Compute the performance of an individual traffic flow within a VPN… by monitoring alone.

Thank you!!!!