National School of Computer Sciences
Translation cache policies for dynamic binary translation
Saber FERJANI
TIMA Laboratory - SLS Group
18 April 2013
Saber F. (TIMA SLS) ENSI 18 April 2013
Who am I?
Academic & Professional Background
- 2010-2013: Student at the National School of Computer Sciences, Tunisia.
- 2011/2012: Robotics team leader; took part in many competitions.
- June-July 2011: Intern at Alpha Technology; designed many PCB layouts, including QFP, SO, SMT and through-hole components.
- July-August 2012: Intern at STMicroelectronics; developed software for a hygrometer and an altimeter on the STM32F3 microcontroller.
http://about.me/ferjani
Context
Why?
- Hardware design is taking more and more time.
- Software development should start earlier.
- Instruction Set Simulators (ISS) simulate a processor, called the target, on a machine with a different architecture, called the host.
How?
- Cross compilation.
- Interpretive translation.
- Dynamic binary translation.
Terminology
Simulator: reproduces only the behavior of the system.
Emulator: reproduces the inner workings of the system.
TB: Translated Block.
IR: Intermediate Representation (also called op-code).
Outline
1 Introduction
2 Cache Algorithms
3 QEMU internals
4 Preliminary Results
I- Introduction
QEMU Overview
- Generic, open-source machine emulator and virtualizer.
- Created by Fabrice Bellard in 2003.
- Uses portable dynamic translation.
- Supported targets: x86, arm, mips, sh4, cris, sparc, powerpc, nds32...
QEMU Features
- Just-in-time (JIT) compilation support.
- Self-modifying code support.
- Direct block chaining.
Subject
Problem statement
- Simulation speed depends mainly on how often TBs are reused.
- The current policy simply flushes the entire cache when it is full.
- The translation cache policy needs to be improved in order to maximize TB reuse.
II- Cache Algorithms
Optimal cache algorithm
- Evict the entry that will not be used for the longest time.
- Infeasible in practice, since the future is not known.
First In First Out
- The simplest cache replacement policy.
- Each entry stays in the cache for a fixed duration, regardless of how often it is referenced.
Least Recently Used
- An enhancement of FIFO.
- Each time an entry is referenced, it is moved to the end of the queue, so the entry at the front is the least recently used one.
Least Frequently Used
- Exploits overall popularity rather than temporal locality.
- The least referenced entry is always chosen for eviction.
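The LRU and LFU policies above can be sketched in a few lines of Python. This is only an illustration: the class names, the `access` method and the `hit_ratio` helper are hypothetical, entries are counted rather than sized, and LFU ties are broken by insertion order, which the slides do not specify.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used key once the cache is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # front of the queue = least recently used

    def access(self, key):
        """Return True on a hit, False on a miss (the key is then inserted)."""
        if key in self.entries:
            self.entries.move_to_end(key)  # referenced: move to the end of the queue
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the front of the queue
        self.entries[key] = None
        return False


class LFUCache:
    """Evicts the key with the fewest references (ties: oldest insertion)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # key -> number of references

    def access(self, key):
        """Return True on a hit, False on a miss (the key is then inserted)."""
        if key in self.counts:
            self.counts[key] += 1
            return True
        if len(self.counts) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
        self.counts[key] = 1
        return False


def hit_ratio(cache, trace):
    """Fraction of accesses in `trace` that hit in `cache`."""
    hits = sum(cache.access(key) for key in trace)
    return hits / len(trace)
```

On a trace such as `["A", "B", "A", "C", "B", "A"]` with room for two entries, LFU keeps the popular entry `A` while LRU ends up evicting it, which is the difference between popularity and temporal locality described above.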
III- QEMU internals
Dynamic Binary Translation in QEMU
Block chaining
Flowchart of the execution loop:
- Look up the TB by target PC.
- Cached?
  - no: translate one basic block.
  - yes: use the cached TB.
- Chain the TB to the existing basic block.
- Execute the TB.
- Exception handling.
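The loop in this flowchart can be sketched as a toy model in Python. Nothing here is QEMU code: `translate`, `run`, the `PROGRAM` table and the guest blocks are hypothetical stand-ins, a "translated block" is just a Python callable, chaining is reduced to recording which block followed which, and exception handling is left out.

```python
def translate(guest_block):
    """Stand-in for translation: wrap a guest block in a host callable."""
    def tb(state):
        return guest_block(state)
    return tb


def run(program, entry_pc, state):
    """Execute `program` from `entry_pc`; return (translations, executions, chains)."""
    tb_cache = {}   # target PC -> translated block
    chains = set()  # (previous PC, next PC) pairs, a stand-in for block chaining
    translations = executions = 0
    prev_pc, pc = None, entry_pc
    while pc is not None:
        tb = tb_cache.get(pc)            # look up the TB by target PC
        if tb is None:                   # not cached: translate one basic block
            tb = translate(program[pc])
            tb_cache[pc] = tb
            translations += 1
        if prev_pc is not None:
            chains.add((prev_pc, pc))    # chain it to the previously executed block
        prev_pc, pc = pc, tb(state)      # execute the TB; it returns the next PC
        executions += 1
    return translations, executions, chains


# A tiny guest program: block 0 sets a counter, block 1 loops on it, block 2 halts.
def block0(state):
    state["n"] = 3
    return 1


def block1(state):
    state["n"] -= 1
    return 1 if state["n"] > 0 else 2


def block2(state):
    return None


PROGRAM = {0: block0, 1: block1, 2: block2}
```

Calling `run(PROGRAM, 0, {})` translates each of the three blocks once but executes five blocks in total, because the loop body at PC 1 is reused from the cache.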
Focus on "Translate one basic block"
- Try to allocate space for the TB.
- Success?
  - no: flush the entire translation cache, then allocate space for the TB (cannot fail!).
  - yes: continue.
- Generate op-codes and host code.
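The flush-on-failure behavior in this flowchart can be sketched as follows. The `TranslationCache` class and its methods are hypothetical, and the sketch assumes the size of a TB is known before allocation, which is a simplification.

```python
class TranslationCache:
    """Toy translation cache that is flushed entirely when allocation fails."""

    def __init__(self, capacity):
        self.capacity = capacity  # bytes of host code the buffer can hold
        self.used = 0
        self.tbs = {}             # target PC -> size of the generated code
        self.flushes = 0

    def try_alloc(self, size):
        """Reserve `size` bytes; return False if the buffer is too full."""
        if self.used + size > self.capacity:
            return False
        self.used += size
        return True

    def flush(self):
        """Drop every translated block."""
        self.tbs.clear()
        self.used = 0
        self.flushes += 1

    def translate(self, pc, size):
        """Allocate space for the TB at `pc`, flushing the cache if needed."""
        if not self.try_alloc(size):
            self.flush()
            # On an empty cache the allocation cannot fail,
            # provided a single TB always fits in the buffer.
            allocated = self.try_alloc(size)
            assert allocated, "a single TB must fit in an empty cache"
        self.tbs[pc] = size  # stands in for generating op-codes and host code
```

For example, with a capacity of 100 bytes, translating blocks of 60 and 30 bytes succeeds, and a third block of 40 bytes triggers one flush and becomes the only cached TB.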
Implementation constraints
Variable TB size
In basic cache algorithms, evicting one entry is always enough to bring in another. In our case, the TB size is not only variable but also unknown at allocation time.
Self-modifying code
When the executed code modifies itself, the TB is re-translated into a different space. This results in many memory allocations, while only the last one is needed.
Low overhead
We need to predict whether the overhead of cache replacement stays below the cost of a cache flush; otherwise, we should simply flush the entire cache.
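A minimal sketch of how the variable-size and low-overhead constraints interact, assuming oldest-first eviction and a crude cost model: the `make_room` function and its `max_evictions` threshold are hypothetical, with the threshold standing in for the point at which replacement is predicted to cost more than a flush. It also assumes that `needed` is known and never exceeds `capacity`.

```python
from collections import OrderedDict


def make_room(tbs, capacity, needed, max_evictions):
    """Free space for `needed` bytes in `tbs` (target PC -> size, oldest first).

    Returns "fit" if nothing had to go, "evicted" if the oldest TBs were
    removed, or "flushed" if too many evictions would have been required.
    """
    used = sum(tbs.values())
    if used + needed <= capacity:
        return "fit"
    # With variable sizes, one eviction may not be enough:
    # collect the oldest TBs until the new block fits.
    freed, victims = 0, []
    for pc, size in tbs.items():
        if used - freed + needed <= capacity:
            break
        victims.append(pc)
        freed += size
    if len(victims) > max_evictions:
        tbs.clear()  # replacement predicted to cost more than a flush
        return "flushed"
    for pc in victims:
        del tbs[pc]
    return "evicted"
```

A real policy would need a better cost estimate than a fixed eviction count; the point of the sketch is only that the decision between partial eviction and a full flush has to be made before any entry is removed.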
IV- Preliminary Results
Goals
- Simulate the LRU and LFU algorithms.
- Compare cache hit ratios.
- Evaluate the overhead of each algorithm.
Assumptions
- TB size and cache size are ignored.
- The quota of retained entries is 1/5.
- Cache size is limited only by the number of TBs.
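One possible reading of these assumptions, sketched in Python: when the cache holds `max_tbs` blocks and a new one must be translated, only the best-ranked fraction `quota` of the entries is retained and the rest is dropped. The `simulate` function, its parameters and this interpretation of the quota are assumptions, not the simulator used for the results; a `quota` of 0 reproduces a full flush.

```python
def simulate(trace, max_tbs, policy, quota=0.2):
    """Return the hit ratio of `trace` (a list of target PCs).

    policy: "LRU" ranks TBs by time of last use, "LFU" by reference count.
    """
    last_use, count = {}, {}
    hits = 0
    for time, pc in enumerate(trace):
        if pc in count:
            hits += 1
        else:
            if len(count) >= max_tbs:
                # Cache full: retain only the best-ranked quota of the entries.
                score = last_use if policy == "LRU" else count
                ranked = sorted(count, key=score.get, reverse=True)
                keep = ranked[: int(max_tbs * quota)]
                last_use = {k: last_use[k] for k in keep}
                count = {k: count[k] for k in keep}
            count[pc] = 0
        count[pc] += 1
        last_use[pc] = time
    return hits / len(trace)
```

On a small synthetic trace with one hot block, for instance `["H", "H", "H", "a", "b", "c", "d", "e", "H", "a"]` with `max_tbs=5`, the LFU ranking retains `H` across the partial flush while the LRU ranking retains the most recently translated block instead.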
Execution ratio = executions / translations
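As a small worked example of this ratio, assuming a cache that is never flushed so that each distinct target PC is translated exactly once (the `execution_ratio` helper is hypothetical):

```python
def execution_ratio(trace):
    """Executions per translation for a list of executed target PCs.

    Assumes an unbounded cache: every distinct PC is translated once.
    """
    executions = len(trace)
    translations = len(set(trace))
    return executions / translations
```

A trace of five executed blocks over three distinct PCs, such as `[0, 1, 1, 1, 2]`, gives a ratio of 5/3. Every flush or eviction that forces a re-translation increases the denominator and lowers the ratio.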
LFU cache hit ratio
LRU cache hit ratio
Perspectives
- Find a suitable cache replacement policy that takes the implementation constraints into account.
- Use a dynamically variable quota for retained entries.
- Add a small op-code buffer to optimize re-translation of self-modifying code.
- Divide the translation cache into multiple spaces to optimize partial cache flushes.
Thanks for your attention!
Feel free to ask any questions!