AppSec USA 2014 Denver, Colorado Catch me if you can Machine Learning, VMs, honeypots and more..


Jan 02, 2016


Neal Parker
Transcript
Page 1: Catch me if you can

AppSec USA 2014
Denver, Colorado

Machine Learning, VMs, honeypots and more..

Page 2: Introduction

Anirban Banerjee, Ph.D. CSE, works at CloudFlare

• San Francisco
• Web-malware detection
• Machine learning, scalable systems
• Interface with hosting industry
• Co-founder of StopTheHacker
• Post acquisition at CloudFlare
• Interested in malware detection, RE
• Various talks at HostingCon, Parallels Summit

Page 3: Quick Overview

• StopTheHacker
• CloudFlare
• Web Malware – Existing tools fail
• Web Malware – Attack vectors
• Identification
• Scaling honeypots
• Machine Learning

Page 4: StopTheHacker - CloudFlare

• StopTheHacker
– Founded in 2009
– Funded by NSF
– Identifies, cleans web-malware automatically
– Partners with hosters
– Uses machine learning, pattern matching, AVs, VMs

• CloudFlare
– DDoS protection
– CDN
– WAF
– Cloud solution
– Contribute to NGINX
– Use Lua, Go
– 5->7% of Internet traffic daily

Page 5: Web Malware - Existing Tools Fail

• AVs
– Polymorphic malware
– Checks for AV processes
– Avast, ClamAV, AVG
– Linux versions seem to not be updated as frequently

• Pattern Matching
– Trivial to change code structure
– Trivial to change commands
– Yara, Perl, Grep, Awk
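
To illustrate why pattern matching is easy to evade, here is a minimal sketch. The signature and both PHP samples are hypothetical, made up for this example; they are not taken from the talk or from any real rule set. The two samples do the same thing, but the second assembles the function name at run time, so the literal pattern no longer appears.

```python
import re

# Hypothetical signature of the kind a grep- or Yara-style rule might encode.
SIGNATURE = re.compile(r"eval\s*\(\s*base64_decode\s*\(")


def matches_signature(source: str) -> bool:
    """Return True if the static signature occurs in the source text."""
    return SIGNATURE.search(source) is not None


# Hypothetical sample that the signature was written for.
original = '<?php eval(base64_decode("ZWNobyAxOw==")); ?>'

# Same behavior, different code structure: the function name is built
# from two string pieces, so the literal text is never present.
variant = '<?php $f = "base64" . "_decode"; eval($f("ZWNobyAxOw==")); ?>'

if __name__ == "__main__":
    print(matches_signature(original))  # True
    print(matches_signature(variant))   # False
```

A one-line change on the attacker's side defeats the rule, which is the point the slide makes about code structure and commands being trivial to change.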

Page 6: Web Malware – Attack Vectors

• Via website
– SQL injection
– XSS
– Ads
– 3rd party libraries
– Themes
– Plugins

• Bypass
– FTP creds
– Apache modules
– SEO poisoning

Page 7: Web Malware – Attack Vectors

• Making it a bit harder
– Custom WP packages, e.g. DreamPress
– Auto upgrades
– WAFs
– Proper separation of web server and CMS roles
– End clients must be educated
– *Some* default scanning for *every* site
  • Free to end client
– Web-Malware collaboration group (SBW)

Page 8: Identification - Highlights

Web malware
• High churn
– Iframe targets
– Fast flux networks
– Encoded, encrypted, randomly generated domains
– PHP code changes

Binaries
• Low churn
– Primarily PE32/Win
– Target old IE exploits
– Spyware/adware more than malware
– FTP sniffers, IRC drop
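
One common heuristic for spotting randomly generated domains is the character entropy of the host label. The sketch below is an illustration only: the length cutoff, the entropy threshold, and the sample domains are assumptions chosen for the example, not values from the talk.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_generated(domain: str, threshold: float = 3.5) -> bool:
    """Flag a domain whose first label is long and has high entropy.

    Only the leftmost label is inspected, so "www.example.com" is judged
    on "www". Both the length cutoff (10) and the threshold are assumed
    values that would need tuning on real data.
    """
    label = domain.lower().split(".")[0]
    return len(label) >= 10 and shannon_entropy(label) >= threshold


if __name__ == "__main__":
    for name in ("stopthehacker.com", "xk2q9vbz7hw4tj.info"):
        print(name, looks_generated(name))
```

On its own this produces false positives (CDN hostnames, hashes in subdomains), so it is better treated as one input among several than as a verdict.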

Page 9: Identification – Challenges

Web malware
• Detection is hard
– What is malware? Redirection, binary drop, registry modification..
– PHP, ASP, Shell, Perl, Python, Ruby..
– Malware is smart: UA, Geo IP, time of day, only once per IP..
– Blacklists very outdated
– AVs have very poor catch rate
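
Because malware may serve its payload only to certain User-Agents, one way to probe for it is to request the same URL under several User-Agents and compare what comes back. The sketch below assumes a caller-supplied `fetch(url, user_agent)` function; that interface, the helper names, and the `fake_fetch` stand-in are all hypothetical and exist only for this illustration.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical fetcher signature: (url, user_agent) -> response body.
Fetcher = Callable[[str, str], bytes]


def digests_by_user_agent(fetch: Fetcher, url: str, user_agents: List[str]) -> Dict[str, str]:
    """Fetch the same URL once per User-Agent and hash each response body."""
    return {ua: hashlib.sha256(fetch(url, ua)).hexdigest() for ua in user_agents}


def cloaking_suspected(digests: Dict[str, str]) -> bool:
    """True when at least two User-Agents received different content."""
    return len(set(digests.values())) > 1


def fake_fetch(url: str, user_agent: str) -> bytes:
    """Stand-in for a real HTTP client: serves an iframe only to old IE."""
    if "MSIE" in user_agent:
        return b"<iframe src='http://bad.example/'></iframe>"
    return b"<p>hello</p>"


if __name__ == "__main__":
    agents = ["Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", "curl/7.35.0"]
    print(cloaking_suspected(digests_by_user_agent(fake_fetch, "http://site.example/", agents)))
```

Legitimate dynamic pages also differ between requests, so a mismatch is a reason to look closer rather than proof. The Geo IP, time-of-day, and once-per-IP conditions from the slide would need different source addresses and repeated visits, which this sketch does not cover.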

Page 10: Scaling honeypots

[Architecture diagram. Labels: Bare Metal, OS, Docker, Container; dev.go.com; Bad Hacker, Bad Bot; BLWAF; Front End; Public API; IP, file deposited etc..; Host content, tripwire, analyze binary; WP 3.6.1, 3.7, 2.8, 3.0, Joomla, Drupal, Django – any flavor we want; Cuckoo based VM: Windows binaries and honeypot]

Page 11: Scaling honeypots - Is this better?

Yes
• Docker – common library re-use
• Spawn thousands of instances on one rack
• Any flavor of CMS you like
• Watchdog for file system changes
• Dropped files shipped off to Cuckoo VM
• Complete trace, screenshots with specific IE version
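
The file system watchdog can be approximated by hashing every file under the honeypot's web root and diffing two snapshots. This is a minimal sketch using only the standard library; the function names are hypothetical, and the talk does not say how the actual watchdog is implemented. New or modified files are the ones that would be handed to the sandbox VM.

```python
import hashlib
import os
import tempfile
from typing import Dict, Set


def snapshot(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digests[os.path.relpath(path, root)] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> Set[str]:
    """Files that are new or whose contents differ from the earlier snapshot."""
    return {path for path, digest in after.items() if before.get(path) != digest}


def demo() -> Set[str]:
    """Simulate an attacker dropping a file into a throwaway web root."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "index.php"), "w") as handle:
            handle.write("<?php echo 1; ?>")
        before = snapshot(root)
        with open(os.path.join(root, "shell.php"), "w") as handle:
            handle.write("dropped by attacker")
        return changed_files(before, snapshot(root))


if __name__ == "__main__":
    print(demo())
```

Polling snapshots is simple but misses files that are written and deleted between polls; an event-based watcher would close that gap at the cost of a platform-specific dependency.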

Page 12: Scaling honeypots - Challenges

Constant cat and mouse game
• Rotate IPs, avoid customer IPs
• Juicy target for DDoS (400 Gigs/s +)
• Keep up with new variants
• Malware getting smarter, checks for VM
• Malware targets mobile devices

Page 13: Machine Learning

Helps identify the unseen
• Need a dataset
– Offensive Computing, VirusTotal, blacklists..

• Analyze what is important
– Reduce noise
– More features is not always better
– PCA type experiments
– Use rules of thumb – forests/trees
– scikit-learn/PyBrain/Weka are your friends
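
As a small illustration of "reduce noise", the sketch below drops features that barely vary across samples. This is a variance filter, a much simpler stand-in for the PCA-type experiments the slide mentions, not PCA itself. The feature names, sample values, and threshold are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def informative_features(rows: List[Dict[str, float]], min_variance: float = 0.01) -> List[str]:
    """Return the names of features whose variance across samples exceeds min_variance.

    A feature that is (almost) constant cannot help separate malicious
    from benign samples, so it only adds noise and training cost.
    """
    names = sorted(rows[0].keys())
    return [name for name in names if pvariance([row[name] for row in rows]) > min_variance]


# Hypothetical feature vectors extracted from three PHP files.
samples = [
    {"entropy": 3.1, "eval_count": 0.0, "has_php_tag": 1.0},
    {"entropy": 5.2, "eval_count": 4.0, "has_php_tag": 1.0},
    {"entropy": 4.0, "eval_count": 1.0, "has_php_tag": 1.0},
]

if __name__ == "__main__":
    print(informative_features(samples))  # ['entropy', 'eval_count']
```

Here `has_php_tag` is identical for every sample, so it is discarded. A variance filter says nothing about correlated features, which is where PCA or tree-based feature importances would come in.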

Page 14: Machine Learning

Toolkit strategy
• PyBrain
– Use for clustering, neural network
– Identify what clusters are present

• scikit-learn/Weka
– Use for classification
– Constant retraining needed: high recall, precision
– Feedback loop based system is important
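
Precision and recall are the two numbers the retraining loop has to watch. The sketch below computes them from labeled verdicts and adds a hypothetical `needs_retraining` check; the 0.9 floor is an assumed value for illustration, not a figure from the talk.

```python
from typing import List, Tuple


def precision_recall(actual: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Compute (precision, recall), where True means "malicious".

    precision = TP / (TP + FP): how many flagged samples were really bad.
    recall    = TP / (TP + FN): how many bad samples were actually flagged.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def needs_retraining(precision: float, recall: float, floor: float = 0.9) -> bool:
    """Hypothetical feedback-loop trigger: retrain when either metric drops below the floor."""
    return precision < floor or recall < floor


if __name__ == "__main__":
    truth = [True, True, False, False]
    verdicts = [True, False, True, False]
    p, r = precision_recall(truth, verdicts)
    print(p, r, needs_retraining(p, r))  # 0.5 0.5 True
```

In a feedback loop, `actual` would come from analyst-confirmed labels on recent samples, so that drift caused by new malware variants shows up as a drop in one of the two metrics.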

Page 15: Machine Learning

What is the benefit?
• Fuzzed iframes caught easily
• Fuzzed/encoded PHP/JS caught easily
• Catches ad misbehavior
• Catches binary that is missed by AV but tries to do “obvious” bad things
• Let's move away from signatures
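
A classifier needs structural features rather than literal strings. As one hypothetical example of such a feature, the sketch below lists iframes that are invisible to the visitor (zero or one pixel in size, or hidden via CSS). The class and function names are made up for this illustration, and the talk does not state which features the actual system uses.

```python
from html.parser import HTMLParser
from typing import List


class IframeCollector(HTMLParser):
    """Collect the src of every iframe that is tiny or hidden by inline style."""

    def __init__(self) -> None:
        super().__init__()
        self.suspicious: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "iframe":
            return
        attributes = {name: (value or "") for name, value in attrs}
        tiny = attributes.get("width") in {"0", "1"} or attributes.get("height") in {"0", "1"}
        style = attributes.get("style", "").replace(" ", "").lower()
        hidden = "display:none" in style or "visibility:hidden" in style
        if tiny or hidden:
            self.suspicious.append(attributes.get("src", ""))


def hidden_iframe_sources(html: str) -> List[str]:
    """Return the src values of iframes a visitor would not see."""
    collector = IframeCollector()
    collector.feed(html)
    collector.close()
    return collector.suspicious


if __name__ == "__main__":
    page = (
        '<p>hi</p>'
        '<IFRAME src="http://bad.example/x" width=0 height=0></IFRAME>'
        '<iframe src="/video" width="640" height="360"></iframe>'
    )
    print(hidden_iframe_sources(page))  # ['http://bad.example/x']
```

The count of hidden iframes (and properties of their targets) could feed a classifier alongside other features; because it depends on rendering behavior rather than exact text, fuzzing the markup does not change it as easily as it breaks a signature.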

Page 16: Machine Learning

Is it all roses and honey?
• No – constant retraining needed
• Has to be able to get large dataset
– Features increase, exponential increase in data

• CPU needed
• Near-real-time very hard
• Toolkits are good – but can be better

Page 17: Current Status and Future Plans

Right now
• PyBrain
– Use for clustering, neural network
– Identify what clusters are present

• scikit-learn/Weka
– Use for classification
– Constant retraining needed: high recall, precision
– Feedback loop based system is important

Page 18: Current Status and Future Plans

Future Plans
• Inline ML for WAF
• More focus on mobile malware
• More focus on DDoS malware
• More focus on using ML – traffic anomalies

Page 19: More work needed

The road ahead
• Make VM detection harder
• Use on metal type solution – performance!
• Investigate Go for inline traffic processing
• Potentially open source portions of code
• Automated malware collection at massive scale

Page 20: That’s it folks

Q&A: [email protected]