Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Jan 22, 2018

Page 1: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Machine Learning for Detecting Malware

Talha Obaid Ling Zhou Timothy You Xinlei Cai

MLConf – Atlanta Sep 2017

Email Security

Scripting

Page 2: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Copyright © 2017 Symantec Corporation SYMANTEC PROPRIETARY– Limited Use Only

The Team!

Ling Zhou

Timothy You

Xinlei Cai

Talha Obaid

Page 3: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Machine Learning @ Symantec

• Early adopter of ML in industry

• SRL – Symantec Research Labs

• CAML – Centre for Advanced Machine Learning

• Malware detection, spam identification

• Helped achieve the compounded impact

• Malware polymorphism

https://www.symantec.com/connect/blogs/meet-symantec-labs-industrys-best-kept-secret

Page 4: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Reference: https://www.symantec.com/connect/blogs/machine-learning-not-only-answer

How did I get infected?

Page 5: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Email – as a carrier!

Page 6: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Email is the weapon of choice!

• One in 131 emails contained a malicious link or attachment, the highest rate in five years

• The rate jumped from 1 in 220 emails in 2015 to 1 in 131 emails in 2016

• In 2016, small to medium-sized businesses were the most impacted by phishing attacks, with 1 in 95 emails containing malware

• Emails sent daily in 2017 – 269 billion*

• The typical office worker receives an average of 600 emails per week*

• Blended attacks – email as a carrier for malicious URLs

• Office document files are an effective weapon

• Lighter footprint and hiding in plain sight

Reference:

https://www.symantec.com/security-center/threat-report

* Email Statistics Report, 2017-2021, Radicati Group, February 2017

Page 7: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Worldwide Email Forecast

Worldwide Email User Forecast (M), 2017–2021

                                                      2017    2018    2019    2020    2021
Worldwide Email Users* (M)                           3,718   3,823   3,930   4,037   4,147
% Growth                                                        3%      3%      3%      3%

* Includes both Business and Consumer Email users

Worldwide Daily Email Traffic (B), 2017–2021

                                                      2017    2018    2019    2020    2021
Total Worldwide Emails Sent/Received Per Day (B)     269.0   281.1   293.6   306.4   319.6
% Growth                                                      4.5%    4.4%    4.4%    4.3%

Reference: https://www.radicati.com/wp/wp-content/uploads/2017/01/Email-Statistics-Report-2017-2021-Executive-Summary.pdf

Page 8: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Email: Locky malware delivery vector

Reference:

https://www.symantec.com/security-center/threat-report

http://www.latimes.com/business/technology/la-me-ln-hollywood-hospital-bitcoin-20160217-story.html

https://arstechnica.com/information-technology/2016/02/locky-crypto-ransomware-rides-in-on-malicious-word-document-macro/

• Released in 2016

• Still active in 2017

• “Enable macro if data encoding is incorrect”

• If the user does enable macros, the macros then save and run a binary file that downloads the actual encryption Trojan

• A hospital in Hollywood paid $17,000 in bitcoin to hackers

Page 9: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Scripting Malware – real ones!

Page 10: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Exempli Gratia

AutoClose, Random variable, String split

Page 11: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Fake variable, Fake comment, Fake condition

Page 12: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Multiple functions, String split

Page 13: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

String encryption, Random variable, Function call hidden

Page 14: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

String encryption, Random variable, Multi function, Click event

Page 15: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

String hidden, Fake condition

Page 16: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Machine Learning for hand-written text!

Page 17: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Domain Differences

Programming Language

• Non-Ambiguous

• Deterministic language

• Clear distinction between syntax and semantics

• Semicolons, Tabs vs Spaces, Editor wars

• Identifiers, subroutine calls, imports

• Comments, conventions, notations

• Design patterns

Natural Language

• Ambiguous

• Context-bound languages

• Less distinction between syntax and semantics

• Puns, Rants, Parodies, Imitations

• TF-IDF

• LSTM – Long short-term memory

• Bag of words

Page 18: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Machine Learning Applications – Code!

Automatic Patch Generation by Learning Correct Code, by Fan Long and Martin Rinard

Reference:

https://www.newscientist.com/article/mg23331144-500-ai-learns-to-write-its-own-code-by-stealing-from-other-programs/

http://people.csail.mit.edu/rinard/paper/popl16.pdf

Page 19: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

https://www.forbes.com/sites/adrianbridgwater/2016/03/07/machine-learning-needs-a-human-in-the-loop

https://blogs.technet.microsoft.com/machinelearning/2016/10/17/the-power-of-human-in-the-loop-combine-human-intelligence-with-machine-learning/

Human-In-The-Loop?

Page 20: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

How we do it

Page 21: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Rule ^ ML

Email

Analyze

Inflation

Macro Extraction

Parsing

Feature Extraction
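
The "Inflation" and "Macro Extraction" stages can be pictured with a minimal sketch. It assumes the attachment is an OOXML file (.docm, .xlsm), which is a ZIP container that carries its VBA project in a `vbaProject.bin` part. Reading "Inflation" as unpacking the container is an assumption, and the function names are hypothetical. The sketch stops at locating the part; decoding the VBA source from that binary needs a dedicated parser and is not shown.

```python
import io
import zipfile


def find_vba_parts(attachment: bytes) -> list[str]:
    """Return the names of vbaProject.bin parts inside an OOXML attachment.

    Hypothetical helper: returns an empty list when the bytes are not a ZIP
    container (for example a legacy .doc file, which this sketch ignores).
    """
    if not zipfile.is_zipfile(io.BytesIO(attachment)):
        return []
    with zipfile.ZipFile(io.BytesIO(attachment)) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]


def build_sample() -> bytes:
    """Build a tiny in-memory stand-in for a macro-enabled document."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", "<w:document/>")
        archive.writestr("word/vbaProject.bin", b"\x00placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    print(find_vba_parts(build_sample()))
```

Working on in-memory bytes keeps the step static: nothing is written to disk and nothing is executed.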

Page 22: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Feature Selection (Total 72 Features)

ML_1... ML_12…

ML_2... ML_13…

ML_3... ML_14…*

ML_4... ML_15…

ML_5... ML_16…

ML_6... ML_17…

ML_7… ML_18…

ML_8… ML_19…

ML_9… ML_20…

ML_10… ML_21…*

ML_11… …

Note: Features with (*) can be expanded to the count of each item.

ML_21_1… ML_14_1…

ML_21_2… ML_14_1…

ML_21_3… ML_14_1…

ML_21_4… ML_14_1…

ML_21_5… ML_14_1…

ML_21_1… ML_14_1…

ML_21_1… ML_14_1…

ML_21_1… ML_14_1…

ML_21_1… ML_14_1…

ML_21_1… ML_14_1…

… 29 features … 21 features
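
The note on starred features (one count per item) can be sketched as follows. The slides redact the real feature definitions, so the item list and the `expand_counts` helper are hypothetical; only the `ML_14_<n>` naming pattern comes from the slide.

```python
from collections import Counter


def expand_counts(prefix: str, items: list[str], observed: list[str]) -> dict[str, int]:
    """Expand one multi-valued feature into one count feature per known item.

    `items` fixes the column order, so every sample yields the same columns
    even when an item never occurs in it.
    """
    counts = Counter(observed)
    return {f"{prefix}_{index}": counts[item] for index, item in enumerate(items, start=1)}


if __name__ == "__main__":
    # Hypothetical item vocabulary and the items observed in one macro.
    items = ["Shell", "CreateObject", "Chr"]
    observed = ["Chr", "Chr", "Shell"]
    print(expand_counts("ML_14", items, observed))
```

Fixing the item vocabulary up front keeps the feature vector the same width for every sample, which most classifiers require.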

Page 23: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Optimization

        ML_1… (Composite)   ML_2…     ML_3…     ML_4…     ML_14_3…
1       31469               1245      35        211       0
2       44617               1264      14        171       0
3       33247               1045      14        158       0
…       …                   …         …         …         …
1234    18828               682       29        222       1
…       …                   …         …         …         …
40000   1273048             844       19        151       0

• Treat the ML_1… feature specially, since it is a composite that depends on other features.

• Treat features like ML_14_3… specially, since they are categorical.

Page 24: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Results – very recent ones!

Page 25: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Spam run – from Aug 21 to Aug 27

{

"desc": "Shell call",

"artifact": " Shell \"Explorer.exe \" & strCommande, vbNormalFocus, "

},
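
A finding such as the "Shell call" record above could come from a simple line-level rule. The sketch below is an illustration, not the detection logic used in the talk: the regular expression and the `find_shell_calls` name are assumptions, and only the `desc`/`artifact` record shape is taken from the slide.

```python
import json
import re

# Matches a Shell statement at the start of a line or on the right-hand side
# of an assignment, e.g. `Shell "cmd"`, `Call Shell(...)`, `x = Shell(...)`.
SHELL_CALL = re.compile(r"(?:^|=)\s*(?:Call\s+)?Shell\b", re.IGNORECASE)


def find_shell_calls(macro_source: str) -> list[dict[str, str]]:
    """Return one {"desc", "artifact"} record per line that invokes Shell."""
    return [
        {"desc": "Shell call", "artifact": line}
        for line in macro_source.splitlines()
        if SHELL_CALL.search(line)
    ]


if __name__ == "__main__":
    sample = (
        "Sub AutoOpen()\n"
        ' Shell "Explorer.exe " & strCommande, vbNormalFocus\n'
        "End Sub"
    )
    print(json.dumps(find_shell_calls(sample), indent=2))
```

A rule like this only sees literal `Shell` statements. Obfuscated variants, such as `CreateObject` with a concatenated class name, need the other features.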

Page 26: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Just this morning … 15 Sep 2017

Page 27: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Recently captured…

{

"desc": "Small routine with string manipulation",

"artifact": " Chinook = (AscB(Sumatran_Rhinoceros))"

"artifact": " Tapir = Chinook(Mid(Sand_Lizard, Chipmunk, 1)) - Int(M..."

},

{

"desc": "Small routine with run & Obfuscated object concat & Obfuscated object creation arguments shell & Createobject run one-liner",

"artifact": " CreateObject(Pig + \"Shell\").Run Module1.Ibis(Sea_Dragon, \""

},

Page 28: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

{

"desc": "Obfuscated object variable",

"artifact": "Set miLxhuTjOMrpjvLQQNhstoiWlCkOdozYkasyizjweDRGlKRkgtkgxHZyAoLfJFFaMSFJDNiRekNpWbkbkzhjETbcAtytnDmZxruTFIhTLSCM = CreateObject(ujcYEkvJXWWtqcIKOpdaxorehRVbSNYlQPiQQao"

},

{

"desc": "Obfuscated object creation arguments",

"artifact": "Set qvBvooYSTaFymchvnZIkLUSrhheHIwfYCSyrpgvjePoCKWbhMYoOBOJVcKO = CreateObject(kbUBGIKqbHJyTmAmPbuHSBjqouVxfwCfSfEWfcNXxXYAhCJKXcegnoejsdNMnNKeFdfnieGnOXJvcjJlkKZDSV"

},

{

"desc": "Long obfuscated variable assignment",

"artifact": "ZGwEiLSTkOsQSFcFzZVPMMuHalgKESzgWlohddzbmveToRIxzt"

},

Page 29: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

{

"desc": "Macro with constant manipulation in function call",

"artifact": "dNDfJESUPztgDlcNnWNZLIPsGgXDVndgUDYaarDOIWeCVstlSACjSVcUyLZ = CWvXJUNlxQcbDqNtnmQhCsifqGFBSHE$(327 - 240) & CWvXJUNlxQcbDqNtnmQhCsifqGFBSHE$(324 - 241) & CWvXJUNl…"

},

{

"desc": "Highly random long string found",

"artifact": "mRClEXzmRGxUqDPLJHcHeEMgjtqozQbuXXYIpdNJOtykVB"

},

{

"desc": "Object creation variable identifier",

"artifact": "qvBvooYSTaFymchvnZIkLUSrhheHIwfYCSyrpgvjePoCKWbhMYoOBOJVcKO"

},

{

"desc": "Random subroutine name",

"artifact": "dnHLjlClNBEYNnZihnFPOighaDbyTOUim"

},
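
Findings like "Highly random long string found" and "Random subroutine name" suggest some measure of randomness over identifiers. The slides do not say which measure is used; the sketch below assumes character-level Shannon entropy combined with a length check, and both thresholds are made-up values for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def looks_random(identifier: str, min_length: int = 30, min_entropy: float = 4.0) -> bool:
    """Flag identifiers that are both long and high-entropy (assumed thresholds)."""
    return len(identifier) >= min_length and shannon_entropy(identifier) >= min_entropy


if __name__ == "__main__":
    for name in ("strCommande", "mRClEXzmRGxUqDPLJHcHeEMgjtqozQbuXXYIpdNJOtykVB"):
        print(name, round(shannon_entropy(name), 2), looks_random(name))
```

Entropy alone is a weak signal for short names, which is why the sketch pairs it with a minimum length.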

Page 30: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

{

"desc": "Random identifier with suspicious assignments",

"artifact": "ujcYEkvJXWWtqcIKOpdaxorehRVbSNYlQPiQQaoCIdBbVAdczWFVpbOGsxrmOTqKykcaurtoAaRUmQJgntcvICwoBcYTiBopmrckXChHdQUOKtTcnKzV = Chr$(327 - 240) & Chr$(324 - 241) & Chr$(24…"

},
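
The `Chr$(327 - 240) & Chr$(324 - 241)` pattern in the artifact above hides characters behind constant arithmetic. A static analyzer can fold those constants without running the macro. The sketch below is one assumed way to do it: the regular expression and the `fold_chr_constants` name are hypothetical, and it handles only a single `+` or `-` between two integer literals.

```python
import re

# Chr(...) or Chr$(...) whose argument is `<int> + <int>` or `<int> - <int>`.
CHR_CALL = re.compile(r"\bChr\$?\(\s*(\d+)\s*([+-])\s*(\d+)\s*\)", re.IGNORECASE)


def fold_chr_constants(expression: str) -> str:
    """Evaluate constant Chr() arguments and join the resulting characters."""
    chars = []
    for left, operator, right in CHR_CALL.findall(expression):
        code = int(left) + int(right) if operator == "+" else int(left) - int(right)
        if 0 <= code < 0x110000:  # skip values that are not valid code points
            chars.append(chr(code))
    return "".join(chars)


if __name__ == "__main__":
    # The truncated third call from the slide does not match and is skipped.
    print(fold_chr_constants("x = Chr$(327 - 240) & Chr$(324 - 241) & Chr$(24…"))
```

On the visible part of the artifact this yields `WS`, since 327 - 240 = 87 (`W`) and 324 - 241 = 83 (`S`); the rest of the string is truncated on the slide.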

{

"desc": "Shell/SaveToFile string contains strange variable name",

"artifact": "RhIzeRHLbzssvNwesaErYKfXuynMPZjWdUBgPAZZUnlhknaNjNAQERoHClFgeuvBPWPbMQPsAeXlYymHXZdCZTRMfteev"

},

{

"desc": "File with following name was created and run created",

"artifact": "XABNAGkAYwByAG8AcwBvAGYAdAA=XABxAGIASwBWAEsAdgBsAGgAdwBpAEoAUgBLAC4AZQB4AGUA"

},
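
The artifact in the last record is Base64 text. Decoding it as two back-to-back Base64 segments of UTF-16LE text recovers readable path fragments. The segment-splitting rule (a new segment starts after `=` padding) and the function name are assumptions made for this sketch.

```python
import base64
import re

# One Base64 run followed by optional padding; padding ends the segment.
B64_SEGMENT = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def decode_utf16_segments(artifact: str) -> list[str]:
    """Decode each Base64 segment of `artifact` as UTF-16LE text."""
    return [
        base64.b64decode(segment).decode("utf-16-le")
        for segment in B64_SEGMENT.findall(artifact)
    ]


if __name__ == "__main__":
    artifact = (
        "XABNAGkAYwByAG8AcwBvAGYAdAA="
        "XABxAGIASwBWAEsAdgBsAGgAdwBpAEoAUgBLAC4AZQB4AGUA"
    )
    print(decode_utf16_segments(artifact))
```

For this artifact the two segments decode to `\Microsoft` and `\qbKVKvlhwiJRK.exe`, which matches the record's description of a file being created and run.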

And… we capture a lot more!

Page 31: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Findings & Going Forward …

• A missing artifact used to mean a missed sample – not anymore

• All features contribute to the verdict in unison

• Obfuscation is still a challenge and will remain one

• Identify why a string-typed variable is assigned a byte array

• Identify why an assignment expression is longer than, say, 200 characters

• Keep transitioning the inflation of malware samples from the sandbox to static analysis
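
The question about assignment expressions longer than about 200 characters can be turned into a simple check. This is a rough sketch with assumed details: it splits each line at the first `=`, so it would also pick up comparisons, and it ignores VBA line continuations. The 200-character limit is the figure mentioned in the findings; the function name is hypothetical.

```python
def long_assignments(macro_source: str, max_length: int = 200) -> list[str]:
    """Return the left-hand side of every assignment with an overlong right-hand side."""
    flagged = []
    for line in macro_source.splitlines():
        target, separator, expression = line.partition("=")
        if separator and len(expression.strip()) > max_length:
            flagged.append(target.strip())
    return flagged


if __name__ == "__main__":
    # Build a long concatenation similar in shape to the obfuscated samples.
    payload = " & ".join(['"A"'] * 51)
    sample = 'title = "Invoice"\n' + "payload = " + payload
    print(long_assignments(sample))
```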

Page 32: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Thank You!

Talha Obaid Ling Zhou Timothy You Xinlei Cai

Email Security

Join us! www.symantec.com/about/careers