PEMU: A PIN Highly Compatible Out-of-VM Dynamic Binary Instrumentation Framework
Junyuan Zeng, Yangchun Fu, Zhiqiang Lin
Department of Computer Science, The University of Texas at Dallas
March 15th, 2015
Process-Level DBI (e.g., PIN, VALGRIND)
Process-level DBI frameworks such as PIN and VALGRIND provide rich APIs to analyze user-level binary code execution, but the analysis code is executed inside the VM (i.e., in-VM) with the same privilege as the instrumented process.
No kernel-level instrumentation.
Limited OS support (VALGRIND is Linux-only).

VM-Monitor-Level DBI (e.g., QEMU)
No general DBI APIs.
Table 1. Compatibility Testing with Existing PIN Plugins.
beginning or end of an interval of the execution of a program. We leave the support of these APIs for future work.
5.2 Performance Evaluation
Next, we test the performance of PEMU. We perform two sets of experiments: one measures how much slower PEMU is than vanilla QEMU, and the other how much slower it is than PIN. We directly use the instruction counting plugin described in Fig. 2. Before the execution of each BB, this plugin adds the number of instructions in that BB to an accumulated counter. We test this plugin with the SPEC2006 benchmark programs. Each benchmark program is executed 100 times, and we report the corresponding average.
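The per-BB counting logic can be sketched as follows. This is a minimal, self-contained Python model of the plugin's behavior, not the actual code of Fig. 2 (the real plugin is written against the PIN-style C/C++ API, and all names here are illustrative):

```python
# Hypothetical model of the instruction-counting plugin: an analysis
# routine fires before each basic block (BB) executes and adds the BB's
# statically known instruction count to a global counter.

class BasicBlock:
    def __init__(self, insns):
        self.insns = insns          # instructions belonging to this BB

icount = 0                          # accumulated instruction counter

def on_bb_entry(bb):
    """Analysis routine instrumented before the execution of each BB."""
    global icount
    icount += len(bb.insns)

def run(trace):
    """Simulated execution: the instrumenter invokes the hook per BB."""
    for bb in trace:
        on_bb_entry(bb)
        # ... the original (translated) BB code would execute here ...

run([BasicBlock(["mov", "add", "jmp"]), BasicBlock(["cmp", "jne"])])
print(icount)  # 5
```

The key point is that the analysis routine runs once per BB execution and adds a per-BB constant, rather than instrumenting every single instruction.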
Performance Comparison with vanilla-QEMU. In this experiment, we measure the overhead introduced by PEMU instrumentation. We compare the execution of the benchmarks under PEMU with their execution directly under QEMU without any instrumentation.
We report the detailed experimental results in Table 2. Specifically, the 2nd column shows the total number of instructions executed, and the 3rd and 4th columns report the execution time under QEMU and PEMU (namely, T_Qemu and T_Pemu). We notice that on average 17649.1 million instructions are traced for these benchmarks, and the average slowdown over QEMU is about 4.33X, which we believe is reasonable for practical use. This overhead includes our TRACE Constructor, Code Injector, and the runtime overhead of the analysis routine.
Performance Comparison with PIN. In the second experiment, we compare PEMU against PIN using the same plugin with the same benchmarks. The execution time under PIN is presented in the 6th column, and the comparison between PEMU and PIN in the last column.
We notice that the average slowdown of PEMU relative to PIN is over 83.61X. The main reason is that PIN runs natively while PEMU (based on QEMU) needs extra translation. The largest slowdown comes from 444.namd, which is above 310X. However, we note that even in vanilla QEMU this program exhibits close to 100X slowdown. We carefully examined the reason and found the root cause to be the large number of floating point instructions, which require time-consuming emulation inside QEMU.
It is also interesting to note that for 450.soplex, running in QEMU is faster than running in PIN. The main reason is that this program contains more control flow instructions that jump into the middle of a TRACE, thereby breaking the TRACE. In this case, QEMU (based on BBL disassembling) only re-disassembles the basic blocks that have not yet been disassembled, but PIN (based on TRACE disassembling) re-disassembles the whole trace after a new trace is found. Meanwhile, the running time of this program is relatively short, so the total time is dominated by the disassembling time.
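The cost difference between BBL-granularity and TRACE-granularity disassembly can be illustrated with a toy model. This is a hypothetical sketch, not QEMU's or PIN's actual code cache: both schemes are abstracted to "number of BBs decoded", and a branch into the middle of an already-decoded trace is the interesting event:

```python
# Toy model: a straight-line trace of 4 BBs at "addresses" 0..3.
# BBL scheme (QEMU-style): each BB is decoded at most once, ever.
# TRACE scheme (PIN-style): a control transfer to a new entry address
# creates a fresh trace, re-decoding its tail even if the BBs were seen.

bbl_decoded = set()            # per-BB cache
trace_decoded = set()          # per-trace cache, keyed by entry address
work = {"bbl": 0, "trace": 0}  # BBs decoded under each scheme

def enter(addr, trace_len=4):
    for bb in range(addr, trace_len):       # BBL: decode only unseen BBs
        if bb not in bbl_decoded:
            bbl_decoded.add(bb)
            work["bbl"] += 1
    if addr not in trace_decoded:           # TRACE: new entry => new trace
        trace_decoded.add(addr)
        work["trace"] += trace_len - addr

enter(0)   # first entry: both schemes decode 4 BBs
enter(2)   # branch into the middle: BBL decodes 0 new, TRACE decodes 2
print(work)  # {'bbl': 4, 'trace': 6}
```

With many such mid-trace branches (as in 450.soplex) and a short running time, this redundant re-disassembly dominates and the trace-keyed scheme loses.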
5.3 Memory Cost Evaluation

Since PEMU uses ahead-of-time instrumentation that stores the hooking points to facilitate the instrumentation, we would like to measure how much memory this hooking point table consumes. Again, we evaluate this memory cost with our instruction counting plugin against the SPEC2006 benchmarks. The result is presented in Fig. 7. We notice that the average memory cost is about 9M.
More specifically, as shown in Fig. 7, the maximum memory cost comes from 465.tonto (about 22M) because this program contains the largest number of BBs, resulting in the largest hash table to store the hooking points. More interestingly, 464.h264ref is one of the most time-consuming programs but requires a relatively small hash table. The reason is that this program contains many loops, so certain instructions get executed repeatedly.
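Why loops add time but not memory can be seen from a toy model of the hooking-point table. This is an illustrative sketch with hypothetical names, not PEMU's actual data structure; the point is only that the table holds one entry per distinct BB, independent of how often each BB runs:

```python
# Toy model: hooking-point table keyed by BB start address.
hook_table = {}   # BB start address -> analysis routine to invoke

def instrument_bb(addr, routine):
    # An entry is inserted only the first time a BB is encountered.
    hook_table.setdefault(addr, routine)

def execute(bb_sequence):
    for addr in bb_sequence:
        instrument_bb(addr, "count_insns")
        _ = hook_table[addr]   # the hook fires on *every* execution

# A tight loop: one BB runs a million times, another runs once.
execute([0x400000] * 1_000_000 + [0x400040])
print(len(hook_table))  # 2 -- size tracks distinct BBs, not executions
```

This matches the 464.h264ref observation: long runtime (many hook firings) but a small table (few distinct BBs).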
5.4 Case Studies

We have demonstrated using PEMU to analyze Linux binaries. In fact, our system is cross-OS, which is one of our design goals. To test this, we apply PEMU to analyze Windows binaries in the same way we evaluated it with Linux binaries. In particular, we use a number of anti-PIN binaries during this test.
First, we test how PEMU analyzes software protected by tElock and Safengine Shielden, which are two widely used tools to build anti-analysis software. We apply these protectors to the hostname binary in a Win-XP SP3 machine, with anti-debugging and anti-instrumentation enabled, and produce two anti-analysis
Table 2. Performance compared with vanilla-QEMU and PIN.
[Figure 7: bar chart, y-axis "Size of Hash Table (M)", one bar per SPEC2006 benchmark (401.bzip2 through 999.specrand)]
Figure 7. Memory Cost Comparison with SPEC2006 Benchmarks
hostname binaries. We then use PIN and PEMU to analyze the packed hostname.

More specifically, we developed a simple strace plugin (as shown in Fig. 8) to trace the syscalls executed by the hostname binary. This plugin prints the syscall number at the syscall entry point and the return value at the syscall exit point. We compiled it into a PIN plugin and a PEMU plugin from the same source code. PIN failed on these two tests: both packed programs detected the presence of PIN and exited at an early stage. In contrast, hostname ran successfully on PEMU and displayed the host name.

Figure 8. A cross-OS PEMU plugin to trace the syscall.

In our other case study, we used eXait [13], a benchmark-like tool to test anti-instrumentation techniques. eXait has a plugin architecture, and each technique is implemented as a separate DLL. There are 21 plugins in total. Again we ran PIN with the strace plugin to instrument eXait and the loaded DLLs. We found that 17 anti-instrumentation techniques detected the presence of PIN, but none of them detected the presence of PEMU.

Through these case studies, we show there is a need for out-of-VM PIN alternatives. Also, even though future malware may be able to detect the presence of PEMU, we should be able to add countermeasures against it, given that the source code of PEMU is open.
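The behavior of the strace plugin can be modeled in a few lines. This is a hedged sketch of what Fig. 8 does (print the syscall number at entry and the return value at exit), not the plugin itself; the callback names stand in for the PIN-style syscall-entry/exit registration API:

```python
# Illustrative model of the strace plugin: two callbacks the
# instrumenter fires around each guest system call.

log = []

def on_syscall_entry(sysno):
    log.append(f"syscall({sysno})")      # print syscall number at entry

def on_syscall_exit(retval):
    log.append(f"= {retval}")            # print return value at exit

# Out-of-VM, the framework drives the callbacks around each syscall
# observed in the instrumented process (numbers here are made up):
for sysno, ret in [(4, 13), (1, 0)]:
    on_syscall_entry(sysno)
    on_syscall_exit(ret)

print("\n".join(log))
```

Because the plugin only consumes these callbacks, the same source compiles against both PIN and PEMU, which is how the cross-framework comparison above was run.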
Background
Anti-PIN malware: malware exits when it detects that it is running inside PIN.
Case Studies
tElock, Safengine Shielden: two widely used tools to build anti-analysis software.
eXait: a benchmark-like tool to test anti-instrumentation techniques.

PIN failed to run the testing programs generated by tElock and Safengine Shielden, which detected the presence of PIN and exited at early stages. The testing programs ran successfully on PEMU.

eXait: 17 out of 21 anti-instrumentation techniques detect the presence of PIN; none of them detect the presence of PEMU.
A new dynamic binary code instrumentation framework:
1. PIN-API compatibility
2. Cross-OS (supports both Windows and Linux)
3. Out-of-VM (strong isolation with the analysis routine and
Sanjay Bhansali, Wen-Ke Chen, Stuart de Jong, Andrew Edwards, Ron Murray, Milenko Drinic, Darek Mihocka, and Joe Chau, Framework for instruction-level tracing and analysis of program executions, Proceedings of the 2nd International Conference on Virtual Execution Environments (VEE '06), ACM, 2006, pp. 154–163.

Vasanth Bala, Evelyn Duesterwald, and Sanjeev Banerjia, Dynamo: A transparent dynamic optimization system, Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation (PLDI '00), ACM, 2000, pp. 1–12.

Fabrice Bellard, QEMU, a fast and portable dynamic translator, Proceedings of the USENIX Annual Technical Conference (ATEC '05), USENIX Association, 2005.

Bryan Buck and Jeffrey K. Hollingsworth, An API for runtime code patching, Int. J. High Perform. Comput. Appl. 14 (2000), no. 4, 317–329.

Prashanth P. Bungale and Chi-Keung Luk, PinOS: A programmable framework for whole-system dynamic instrumentation, Proceedings of the 3rd International Conference on Virtual Execution Environments (VEE '07), 2007, pp. 137–147.

Bochs: The open source IA-32 emulation project, 2001, http://bochs.sourceforge.net/.

Derek Bruening, Qin Zhao, and Saman Amarasinghe, Transparent dynamic instrumentation, Proceedings of the 8th ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments (VEE '12), ACM, 2012, pp. 133–144.

Scott W. Devine, Edouard Bugnion, and Mendel Rosenblum, Virtualization system including a virtual machine monitor for a computer with a segmented architecture, United States Patent 6,397,242 (1998).

Yangchun Fu and Zhiqiang Lin, Space traveling across VM: Automatically bridging the semantic gap in virtual machine introspection via online kernel data redirection, Proceedings of the 2012 IEEE Symposium on Security and Privacy (San Francisco, CA), May 2012.

Yangchun Fu and Zhiqiang Lin, Exterior: Using a dual-VM based external shell for guest-OS introspection, configuration, and recovery, Proceedings of the Ninth Annual International Conference on Virtual Execution Environments (Houston, TX), March 2013.

Andrew Henderson, Aravind Prakash, Lok Kwong Yan, Xunchao Hu, Xujiewen Wang, Rundong Zhou, and Heng Yin, Make it work, make it right, make it fast: Building a platform-neutral whole-system dynamic binary analysis platform, Proceedings of the 2014 International Symposium on Software Testing and Analysis (ISSTA 2014), ACM, 2014, pp. 248–258.

Bhushan Jain, Mirza Basim Baig, Dongli Zhang, Donald E. Porter, and Radu Sion, SoK: Introspections on trust and the semantic gap, Proceedings of the 2014 IEEE Symposium on Security and Privacy (SP '14), IEEE Computer Society, 2014, pp. 605–620.

Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood, Pin: Building customized program analysis tools with dynamic instrumentation, Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '05), ACM, 2005, pp. 190–200.

Barton P. Miller and Andrew R. Bernat, Anywhere, any time binary instrumentation, September 2011.

Peter S. Magnusson, Magnus Christensson, Jesper Eskilson, Daniel Forsgren, Gustav Hållberg, Johan Högberg, Fredrik Larsson, Andreas Moestedt, and Bengt Werner, Simics: A full system simulation platform, Computer 35 (2002), no. 2, 50–58.

Nicholas Nethercote and Julian Seward, Valgrind: A program supervision framework, Third Workshop on Runtime Verification (RV '03), 2003.

Nicholas Nethercote and Julian Seward, Valgrind: A framework for heavyweight dynamic binary instrumentation, Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '07), ACM, 2007, pp. 89–100.

Peter Feiner, Angela Demke Brown, and Ashvin Goel, Comprehensive kernel instrumentation via dynamic binary translation, Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '12), 2012.

K. Scott, N. Kumar, S. Velusamy, B. Childers, J. W. Davidson, and M. L. Soffa, Retargetable and reconfigurable software dynamic translation, Proceedings of the International Symposium on Code Generation and Optimization (CGO '03), IEEE Computer Society, 2003, pp. 36–47.

Swaroop Sridhar, Jonathan S. Shapiro, Eric Northup, and Prashanth P. Bungale, HDTrans: An open source, low-level dynamic instrumentation system, Proceedings of the 2nd International Conference on Virtual Execution Environments (VEE '06), ACM, 2006, pp. 175–185.

Ariel Tamches and Barton P. Miller, Fine-grained dynamic instrumentation of commodity operating system kernels, Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI '99), USENIX Association, 1999, pp. 117–130.

Jon Watson, VirtualBox: Bits and bytes masquerading as machines, Linux J. 2008 (2008), no. 166.

Emmett Witchel and Mendel Rosenblum, Embra: Fast and flexible machine simulation, Proceedings of the 1996 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '96), ACM, 1996, pp. 68–79.

Junyuan Zeng, Yangchun Fu, and Zhiqiang Lin, HyperShell: A practical hypervisor layer guest OS shell for automated in-VM management, Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC '14), USENIX Association, 2014, pp. 85–96.

Heng Yin and Dawn Song, TEMU: Binary code analysis via whole-system layered annotative execution, Technical Report UCB/EECS-2010-3, EECS Department, University of California, Berkeley, January 2010.