Pattern Recognition and Applications Lab
University of Cagliari, Italy
Department of Electrical and Electronic Engineering
Android Security and Reverse Engineering
Davide Maiorca
Computer Security, A.Y. 2017/2018
http://pralab.diee.unica.it
Contents
• Introduction
• Android and Malware Basics
– OS Elements
– Application Structure
– Malware Basics
• Dissecting a Malware
– Reading the Manifest
– Exploring DexCode
• Obfuscation
• Machine Learning for Android
• Conclusions
content providers, etc.)
• Libraries (Dalvik Virtual Machine / ART)
• Linux Kernel (low-level system functionalities)
Layer-Based Architecture
Kernel
• Manages processes
• Drivers to physical resources
– Enabled through system calls
• Supports Communication Between Processes (IPC)
– Driver Binder (Intents)
– Sockets
– Binder (Kernel Level)
– Intents (e.g., using an external browser)
– Data management with content providers
• App can (and must) be signed with certificates
• Encrypted File System (Kernel >=3.0, AES 128 bit)
• Memory Error Protections (DEP+ASLR from Android 4.0)
Native Libraries
• Written in C/C++ (do NOT confuse them with the libraries normally used to program apps)
• Interfaced with the Application Framework
– Libc
– SQLite
– OpenGL
• Useful for attackers who want to access memory directly…
• To execute applications, Android uses a specific runtime
• The Dalvik Virtual Machine is used up to Android 4.4
• Since Android 5, it is compulsory to use ART (Android RunTime)
Dalvik Virtual Machine
• Android apps are usually written in Java
• But the Java Virtual Machine (JVM) is typically not efficient!
• Try to run it on a 256 MB RAM Smartphone...
• Dalvik VM improves various elements of the JVM
• Just in Time compilation (instructions are compiled at runtime)
• Dalvik code is obtained through a conversion from Java Bytecode
• Much more efficient (optimized for ARM architectures)!
Dalvik Virtual Machine (2)
Dalvik Bytecode
[Figure: panels labeled Source Code, Java, Dalvik]
Android RunTime (ART)
• Evolution of Dalvik Virtual Machine
• Dalvik bytecode is compiled and transformed to machine code (at install time)
• The application executes machine code (ARM) once installed
– ARM architecture differs from Intel…
• Supports 64 bit processors!
• Speed Boost with respect to Dalvik
• Longer install time…
• …And more space occupied by the app
Application Framework
• An Android app is composed of four essential components
• Activities
– Program screens
– Each screen corresponds to one activity
– An app can have multiple activities
• Services
– Background execution (e.g., listening to mp3)
– No interface components
• Broadcast Receivers
– Components that are activated only when certain events occur
• Content Providers
– Interfaces for sharing data among applications
Application Layer
AndroidManifest.xml
Classes.dex (Bytecode)
Assets
Resource Files (e.g., Layout)
Manifest.xml: information on application components
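Since an APK is a ZIP archive, the elements listed above can be inspected with any ZIP library. The following is a minimal sketch, not part of the original slides: the helper names and the demo archive content are hypothetical, and the archive is built in memory only so that the example is self-contained.

```python
import io
import zipfile


def list_apk_entries(apk):
    """Return the sorted entry names of an APK (file path or file-like object)."""
    with zipfile.ZipFile(apk) as archive:
        return sorted(archive.namelist())


def build_demo_apk():
    """Build a tiny in-memory archive that mimics an APK layout (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"<manifest/>")
        archive.writestr("classes.dex", b"dex\n035\x00")
        archive.writestr("assets/config.bin", b"")
        archive.writestr("res/layout/main.xml", b"<LinearLayout/>")
    buffer.seek(0)
    return buffer


entries = list_apk_entries(build_demo_apk())
for name in entries:
    print(name)
```

On a real sample, list_apk_entries can be given the path of the .apk file instead of the in-memory buffer.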
• A payload is injected inside an app that is then recompressed and signed
• The app is then submitted to a store
• The repackaged app exploits dangerous permissions that are already used by benign apps
• The most used technique because of its simplicity
– AFE (Android Framework For Exploitation - https://github.com/appknox/AFE)
• A lot of scientific research has been done to detect repackaged apps
Update
After installing an app, the user is asked to download an update
The update contacts a malicious URL that drops a malicious app
Sometimes, the malware is directly loaded without being downloaded
This has been used by some of the most popular malware (e.g., BaseBridge and DroidKungFu)
Drive-By Download and Other Creation Techniques
• Drive-by download is used in a lot of malicious apps
• An advertisement in a legitimate app redirects to a malicious URL
• The URL drops a malicious app
• Examples:
– GGTracker
– JIFake
• Malware can obviously be created without resorting to additional techniques
– Spyware
– Malware that uses interfaces similar to legitimate applications (note: they are not repackaged, e.g., FakeNetflix)
– Applications with Hybrid Functionalities (execute both legitimate and malicious actions)
Malware Analysis: Time to Get Serious!
Goals and Tools
• GOAL: Understanding some of the actions performed by a (supposedly)
• START OFF: java -jar apktool decode <application> <output folder>
• You can find the already disassembled app together with these slides (password: infected)
App Output
After Disassembling
• The classes.dex file is decoded into four packages of .smali files (excluding further subpackages)
– .smali is a simplified format to read dexcode
– Two system based (android.annotation / com.android.internal)
– One third party based (org.apache.http.entity.mime) -> related to mail?
– QiN946i7GWDkTRAN.GpGlfNTX6v9V8NGm -> Guess we have a suspicious one here :)
• The Android Manifest (compulsory) is decoded to a readable .xml
• Other .xml files (for now, we overlook them)
Analysis Steps
• Find the app components (from the Manifest)
• DexCode analysis
– Analyze the code of the main activity (i.e., the one with the MAIN intent filter)
– Starting from it, build the call graph of the user-defined methods
– Analyze suspicious strings (e.g., URLs, Unicode strings, etc.) and track them
– Repeat for the other app components (e.g., services)
• Analyze external elements (images, external libraries, etc.)
• Other…? (use other programs, dynamic analysis…)
Analyzing the Manifest
• XML File
• Its structure is mainly based on tags
• Two main parts
– Permission definitions
– Component definitions
• Quite straightforward to read once you get used to it
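The two parts of the Manifest (permissions and components) can also be extracted programmatically once the file has been decoded to plain XML. What follows is a sketch under assumptions: the sample manifest, its package name and the summarize_manifest helper are hypothetical and are not taken from the analyzed app.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical decoded manifest, used only for illustration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <application>
        <activity android:name=".view.MainActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN"/>
                <category android:name="android.intent.category.LAUNCHER"/>
            </intent-filter>
        </activity>
        <service android:name=".Worker"/>
        <receiver android:name=".BootReceiver"/>
    </application>
</manifest>
"""


def summarize_manifest(xml_text):
    """Collect permissions, components and the activities with a MAIN intent filter."""
    root = ET.fromstring(xml_text)
    permissions = [p.get(ANDROID_NS + "name") for p in root.iter("uses-permission")]
    components = {}
    main_activities = []
    for tag in ("activity", "service", "receiver", "provider"):
        names = []
        for node in root.iter(tag):
            name = node.get(ANDROID_NS + "name")
            names.append(name)
            actions = [a.get(ANDROID_NS + "name") for a in node.iter("action")]
            if tag == "activity" and "android.intent.action.MAIN" in actions:
                main_activities.append(name)
        components[tag] = names
    return {"permissions": permissions, "components": components, "main": main_activities}


summary = summarize_manifest(SAMPLE_MANIFEST)
print(summary)
```

The "main" entry gives a natural starting point for the DexCode analysis, since it names the activity carrying the MAIN intent filter.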
• Smali files can be very large
• We cannot analyze every single line of code
• We should set some goals, considering the typical actions of malware
– SMS Send/Stealing
– Suspicious HTTP requests
– Access to contacts/phone status
• In our case, we have to look for:
– Invocations of user-implemented methods/classes
– Suspicious strings
– Suspicious field assignments
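A first triage of large smali files can be automated with a few regular expressions. The sketch below is only one possible approach: the patterns, the triage_smali helper and the sample lines (including the example.invalid URL) are assumptions made for illustration, and a real analysis would need a broader set of rules.

```python
import re

# const-string (or const-string/jumbo) followed by a register and a quoted literal.
CONST_STRING = re.compile(r'const-string(?:/jumbo)?\s+\w+,\s*"(.*)"')
# Very rough indicators of network endpoints: an http(s) scheme or a dotted IPv4 address.
URL_OR_IP = re.compile(r"https?://|\b\d{1,3}(?:\.\d{1,3}){3}\b")
SMS_API = "Landroid/telephony/SmsManager;"


def triage_smali(smali_text):
    """Return line numbers of suspicious strings and SMS-related invocations."""
    findings = {"urls": [], "sms_calls": []}
    for number, raw in enumerate(smali_text.splitlines(), start=1):
        line = raw.strip()
        match = CONST_STRING.match(line)
        if match and URL_OR_IP.search(match.group(1)):
            findings["urls"].append((number, match.group(1)))
        if line.startswith("invoke-") and SMS_API in line:
            findings["sms_calls"].append(number)
    return findings


# Hypothetical smali fragment used only to exercise the function.
SAMPLE = "\n".join([
    ".method public run()V",
    '    const-string v1, "http://example.invalid/gate.php"',
    '    const-string v2, "hello"',
    "    invoke-virtual/range {v0 .. v5}, Landroid/telephony/SmsManager;->"
    "sendTextMessage(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;"
    "Landroid/app/PendingIntent;Landroid/app/PendingIntent;)V",
    ".end method",
])

findings = triage_smali(SAMPLE)
print(findings)
```

The reported line numbers are only entry points: each hit still has to be tracked manually through the registers and methods that use it.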
DexCode Instructions
• A lot of instructions (but way simpler than ARM assembly…)
• For this tutorial, we only need a very simplified set
• Generally: [Instr-Name] [Registers] [Parameters]
– Virtual Registers!
• sget
– Places the content of the parameter (a static field in this case) in the destination register
– E.g., sget-object v4, MyClass.foo (puts the content of the field foo in v4)
• new-instance
– Creates a new instance of a class (e.g., new-instance v4, MyFoo)
• const-string
– Defines a constant string and puts it in a register (e.g., const-string v4, "bla" places the string bla in v4)
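The general shape [Instr-Name] [Registers] [Parameters] can be split mechanically. The sketch below is a simplified illustration that only covers simple instructions such as the ones above (not invocations with a register list in braces); parse_instruction is a hypothetical helper. Note that in actual smali output a field reference is written as LMyClass;->foo:Ljava/lang/String; rather than in the simplified MyClass.foo notation.

```python
import re

# Virtual registers are named v0, v1, ... (locals) or p0, p1, ... (parameters).
REGISTER = re.compile(r"[vp]\d+")


def parse_instruction(line):
    """Split '[Instr-Name] [Registers] [Parameters]' into its three parts."""
    name, _, rest = line.strip().partition(" ")
    registers, parameters = [], []
    # Split on commas only while the operands are still registers, so that a
    # comma inside a string literal is left untouched.
    while rest:
        head, _, tail = rest.partition(",")
        if REGISTER.fullmatch(head.strip()):
            registers.append(head.strip())
            rest = tail.strip()
        else:
            parameters.append(rest.strip())
            break
    return name, registers, parameters


for text in (
    "sget-object v4, LMyClass;->foo:Ljava/lang/String;",
    "new-instance v4, LMyFoo;",
    'const-string v4, "bla, bla"',
    "move-result-object v4",
):
    print(parse_instruction(text))
```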
DexCode Instructions - Invocations
• Invocations can be of multiple types
• In this tutorial, let us consider every invoke as a generic invocation to a method
– Common return types are represented by a letter (e.g., int is I, boolean is Z, void is V…)
– If a method returns a value, the invoke is usually followed by a move-result instruction
– The registers match the method parameters in order, but for instance methods the first register always refers to the object on which the method is invoked
– E.g.: invoke-virtual {v0, v1}, LMyClass;->Foo(I)V
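To get used to the letter-based type notation, it can help to translate a method reference back into readable names. The following sketch decodes references of the form LClass;->name(params)return; the helper names are hypothetical and the code assumes well-formed input.

```python
import re

# Single-letter descriptors of the primitive types.
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}

METHOD_REF = re.compile(r"(L[^;]+;)->([^(]+)\(([^)]*)\)(.+)")


def decode_type(descriptor):
    """Turn 'I', 'Ljava/lang/String;' or '[I' into a readable type name."""
    base = descriptor.lstrip("[")
    dimensions = len(descriptor) - len(base)
    if base.startswith("L") and base.endswith(";"):
        name = base[1:-1].replace("/", ".")
    else:
        name = PRIMITIVES[base]
    return name + "[]" * dimensions


def split_params(params):
    """Split a concatenated parameter list such as 'I[Ljava/lang/String;Z'."""
    types, i = [], 0
    while i < len(params):
        start = i
        while params[i] == "[":
            i += 1
        if params[i] == "L":
            i = params.index(";", i)
        i += 1
        types.append(params[start:i])
    return types


def decode_method_ref(ref):
    """Decode 'LMyClass;->Foo(I)V' into owner, name, parameter and return types."""
    owner, name, params, returns = METHOD_REF.fullmatch(ref).groups()
    return {
        "owner": decode_type(owner),
        "name": name,
        "params": [decode_type(p) for p in split_params(params)],
        "returns": decode_type(returns),
    }


print(decode_method_ref("LMyClass;->Foo(I)V"))
print(decode_method_ref("Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V"))
```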
Inside MainActivity – Finding Invocations…
const/4 v8, 0x0
invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V ** System Class and Method
…
invoke-virtual {p0}, LQiN946i7GWDkTRAN/GpGlfNTX6v9V8NGm/view/MainActivity;->getPackageManager()Landroid/content/pm/PackageManager; ** User-Defined class, but system method! (inheritance)
…
move-result-object v4
.super Landroid/app/Activity; ** Careful about superclasses!
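The distinction shown above (a user-defined class invoking a method that it only inherits from a system superclass) can be approximated automatically. This is a heuristic sketch, not a complete resolver: the list of system prefixes, the classify_invocations helper and the sample class are assumptions, and a method labeled as inherited may still come from a user-defined superclass, so the reported superclass has to be checked.

```python
import re

INVOKE = re.compile(r"invoke-[\w/]+\s+\{[^}]*\},\s*(L[^;]+;)->([^(]+)\(")
# Assumed list of package prefixes treated as system code.
SYSTEM_PREFIXES = ("Landroid/", "Ljava/", "Ljavax/", "Lcom/android/")


def classify_invocations(smali_text):
    """Label each invocation as system, inherited or user-defined."""
    this_class = superclass = None
    local_methods = set()
    invocations = []
    for raw in smali_text.splitlines():
        line = raw.strip()
        if line.startswith(".class "):
            this_class = line.split()[-1]
        elif line.startswith(".super "):
            superclass = line.split()[-1]
        elif line.startswith(".method "):
            # The last token looks like name(params)return; keep the name only.
            local_methods.add(line.split()[-1].split("(")[0])
        else:
            match = INVOKE.match(line)
            if match:
                invocations.append(match.groups())
    results = []
    for owner, method in invocations:
        if owner.startswith(SYSTEM_PREFIXES):
            kind = "system"
        elif owner == this_class and method not in local_methods:
            kind = f"inherited from {superclass}"
        else:
            kind = "user-defined"
        results.append((method, kind))
    return results


# Hypothetical class, modeled on the structure of a typical main activity.
SAMPLE = "\n".join([
    ".class public Lcom/example/demo/MainActivity;",
    ".super Landroid/app/Activity;",
    ".method protected onCreate(Landroid/os/Bundle;)V",
    "    invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V",
    "    invoke-virtual {p0}, Lcom/example/demo/MainActivity;->getPackageManager()Landroid/content/pm/PackageManager;",
    "    invoke-direct {p0}, Lcom/example/demo/MainActivity;->setup()V",
    ".end method",
    ".method private setup()V",
    "    return-void",
    ".end method",
])

labels = classify_invocations(SAMPLE)
for method, kind in labels:
    print(method, "->", kind)
```

Only the invocations labeled as user-defined need to be followed when building the call graph of the app's own code.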
• Smali files have another advantage
• You can change any line of code inside the file
• Then you can reassemble the file with apktool (with the build command)
• This can be very useful if you want, for example, to force the execution of the program towards a direction…
• …or if you want to redirect the app traffic towards your server :)
• But be careful!
• Android has a very precise verification system
• One mistake and the application won’t work anymore!
More Hints
• Static analysis is very useful, but for some files it is not really adequate
– For example, heavily obfuscated files
• Attack code can also be contained in native libraries
– Need to disassemble them and read ARM assembly :(
• Be careful about multithreading!
– Thread usage can often be recognized when the file name has a $ in it
– Some apps can run on multiple threads
– Very hard to understand how they work with static analysis only
• Use dynamic analysis too!
• Systems like TraceDroid can really help (http://tracedroid.few.vu.nl/index.php)
Lessons Learned
• The goal is not analyzing every single line of code, but understanding suspicious actions
• Always keep in mind what the possible intentions of the attacker are
• Be conceptual, draw schemes, trace suspicious variables/fields/methods
• Whenever you find a domain/ip address, try to understand its activity (e.g., where it comes from)
• If you get lost while following a sequence of methods, try another one
• Practice, practice, practice!
Obfuscation
• Ensemble of techniques to make the executable code less readable for a human/machine (without changing its semantics)
• Such techniques mainly act on the classes.dex file, and secondarily on the Manifest.xml
• These techniques are typically used to protect code from being copied…
• …but also to evade detection systems for malware
• To better understand obfuscation, let’s have a look at the classes.dex structure
Classes.dex Structure
Header
String IDS: references to strings
Type IDS: references to strings that represent the types used by methods, classes and attributes
Proto IDS: references to prototypes (return types and parameters)
Field IDS: references to fields (class attributes)
Method IDS: references to methods (define information on methods)
Classes Defs: information about classes
Data: all the executed/referenced data
IDS contain references to data
The code refers to the IDS! You can change a string without disrupting the functionality of the file
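The header is what tells a parser where each of these sections starts and how many entries it holds. The sketch below reads the fixed 112-byte header with the field layout of the Dalvik executable format; the parse_dex_header helper and the synthetic demo header are hypothetical, and the checksum and signature are read but not verified.

```python
import struct

# magic (8 bytes), Adler-32 checksum, SHA-1 signature (20 bytes), then 20 uint32 fields.
HEADER = struct.Struct("<8sI20s20I")
FIELDS = (
    "file_size", "header_size", "endian_tag", "link_size", "link_off", "map_off",
    "string_ids_size", "string_ids_off", "type_ids_size", "type_ids_off",
    "proto_ids_size", "proto_ids_off", "field_ids_size", "field_ids_off",
    "method_ids_size", "method_ids_off", "class_defs_size", "class_defs_off",
    "data_size", "data_off",
)


def parse_dex_header(data):
    """Return the header fields of a classes.dex file as a dictionary."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for a dex header")
    magic, checksum, _signature, *values = HEADER.unpack_from(data)
    if not magic.startswith(b"dex\n"):
        raise ValueError("not a dex file")
    header = dict(zip(FIELDS, values))
    header["version"] = magic[4:7].decode("ascii")
    header["checksum"] = checksum
    return header


# Synthetic header with made-up values, only to exercise the parser.
demo = dict.fromkeys(FIELDS, 0)
demo.update(file_size=112, header_size=112, endian_tag=0x12345678,
            string_ids_size=3, string_ids_off=112)
raw = HEADER.pack(b"dex\n035\x00", 0, bytes(20), *(demo[name] for name in FIELDS))

parsed = parse_dex_header(raw)
print(parsed["version"], parsed["string_ids_size"], hex(parsed["string_ids_off"]))
```

On a real sample, the bytes would come from reading classes.dex out of the APK; the size/offset pairs then locate the String IDS, Type IDS, Proto IDS, Field IDS, Method IDS and Classes Defs sections.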
Static Obfuscation Techniques
• Trivial / Renaming
– Rename classes, methods and attribute names
– Acts on strings only
• String Encryption
– Each string is encrypted and decrypted at runtime with additional methods
• Reflection
– All method/field invocations are replaced with “introspective calls” that have the same effect as the original ones
– Uses Java Reflection API
– Acts on methods and data sections
• Class Encryption
– Classes are encrypted
– They are then decrypted and dynamically loaded at runtime
– Heavy changes to the executable file
Static Obfuscation - Example
Reflection
Original:
this.counter = 10;
With reflection:
Field myCounter = MyClass.class.getDeclaredField("counter");
myCounter.set(this, 10);
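String Encryption, another of the static techniques, can be illustrated in a similar way. The sketch below is a toy written in Python rather than in the app's own language: the XOR scheme, the key and the helper names are assumptions chosen for brevity, and real obfuscators use their own (usually stronger) schemes. The point is only that the readable string disappears from the file and an extra decryption method restores it at runtime.

```python
import base64
from itertools import cycle


def xor_bytes(data, key):
    """XOR every byte of data with the repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def encrypt_string(plain, key):
    """What the obfuscator does offline: replace the literal with an opaque blob."""
    return base64.b64encode(xor_bytes(plain.encode("utf-8"), key)).decode("ascii")


def decrypt_string(blob, key):
    """The additional method injected into the app and called at runtime."""
    return xor_bytes(base64.b64decode(blob), key).decode("utf-8")


KEY = b"k3y"  # hypothetical key embedded in the obfuscated app
original = "http://example.invalid/gate.php"
hidden = encrypt_string(original, KEY)

print(hidden)                       # what a static string search would see
print(decrypt_string(hidden, KEY))  # what the app actually uses at runtime
```

During analysis, spotting a small helper that is called right after every const-string is often the clue that strings have been encrypted this way.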
Machine Learning for Android Malware Detection
• The amount of available malware is progressively increasing
• Signature systems are often inadequate
– Slow updates
– Weak signatures
– False positives
• Research has focused on Machine Learning-based strategies to improve detection
– Detection of never-before-seen samples
– Reducing the number of updates
• Still a lot of research challenges to tackle!
Structure of Detection Systems
Feature Design
• A crucial part is understanding which features should be analyzed by the system
• Such information can include:
– System API calls (Drebin, R-PackDroid)
– Permissions
– IP Addresses
– Filtered Intents
• Some state-of-the-art systems already exist (e.g., DREBIN – Arp et al., NDSS 2014; StormDroid – Chen et al., ASIACCS 2016; R-PackDroid – Maiorca et al., SAC 2017)
• Still a lot of research challenges to tackle!
– More advanced and robust features
– Resilience against obfuscation
– Resilience of the ML Algorithm (Adversarial Machine Learning)
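To make the idea of feature design concrete, the information listed above (permissions, API calls, and so on) can be turned into binary vectors that a learning algorithm can consume. This is a minimal sketch with hypothetical feature names and apps; it is not the implementation of any of the cited systems, and it stops before the training step.

```python
def build_vocabulary(samples):
    """Sorted list of every feature seen in at least one sample."""
    return sorted({feature for sample in samples for feature in sample})


def to_vector(sample, vocabulary):
    """Binary vector: 1 if the app exhibits the feature, 0 otherwise."""
    return [1 if feature in sample else 0 for feature in vocabulary]


# Hypothetical apps described by the features extracted from Manifest and DexCode.
apps = {
    "benign_app": {"permission::INTERNET", "api::android.app.Activity"},
    "suspicious_app": {
        "permission::INTERNET",
        "permission::SEND_SMS",
        "api::android.telephony.SmsManager",
    },
}

vocabulary = build_vocabulary(apps.values())
vectors = {name: to_vector(features, vocabulary) for name, features in apps.items()}

for name, vector in vectors.items():
    print(name, vector)
```

Vectors of this kind are the input on which a classifier would be trained; the obfuscation techniques seen earlier matter here because they can change which features are visible to a static extractor.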
Conclusions
• Mobile malware analysis (and its detection) is a critical topic in Computer Security
• Like malware analysis on x86 architectures, it’s a complex discipline, full of hurdles and traps…
• …but it can also tell a lot about how attacks against mobile platforms are evolving
• The usage of machine learning is also of particular interest in this field
– AV companies are starting to use it more and more!
• A lot of research topics are waiting to be explored…
• Jump in! :)
References
• Marko Gargenta. Learning Android. O’Reilly, 2011.
• Symantec. Internet Security Threat Report, 2017.
• Android Official Documentation. https://developer.android.com/guide/index.html
• D. Maiorca, D. Ariu, I. Corona, M. Aresu, G. Giacinto. Stealth Attacks: An Extended Insight into the Obfuscation Effects on Android Malware. In Computers & Security (Elsevier), 2015.
• D. Maiorca, F. Mercaldo, G. Giacinto, A. Visaggio, F. Martinelli. R-PackDroid: API Based Characterization and Detection of Mobile Ransomware. In ACM SAC 2017.
• A. Demontis, M. Melis, B. Biggio, D. Maiorca, D. Arp, K. Rieck, I. Corona, G. Giacinto, and F. Roli. Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection. In IEEE Trans. Dependable and Secure Computing, In Press.