Synthesis of Customized Loop Caches for Core-Based Embedded Systems
Susan Cotterell and Frank Vahid*
Department of Computer Science and Engineering
University of California, Riverside
*Also with the Center for Embedded Computer Systems at UC Irvine
This work was supported in part by the U.S. National Science Foundation and a U.S. Department of Education GAANN Fellowship
2
Introduction
Opportunity to tune the microprocessor architecture to the program
[Figure: traditional vs. core-based microprocessor architecture]
3
Introduction
[Figure: system diagram with Processor, I$, D$, Bridge, JPEG, USB, CCDP P4, Mem]
• I-cache
  – Size
  – Associativity
  – Replacement policy
• JPEG
  – Compression
• Buses
  – Width
  – Bus invert/gray code
4
Introduction
• Memory access can consume 50% of an embedded microprocessor’s system power
  – Caches tend to be power hungry
    • M*CORE: unified cache consumes half of total power (Lee/Moyer/Arends 99)
    • ARM920T: caches consume half of total power (Segars 01)
[Pie chart: share of total power by component]
arm9        25%
SysCtl       3%
CP 15        2%
BIU          8%
PATag RAM    1%
Clocks       4%
Other        4%
D MMU        5%
D Cache     19%
I Cache     25%
I MMU        4%
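As a quick check of the "caches consume about half" claim, the component shares from the chart can be summed. This is a minimal sketch: the dictionary name `power_share` is hypothetical, and the split of the fused labels (for example "arm925%" read as arm9 at 25%) is an assumption, supported only by the fact that the shares then total 100%.

```python
# Component shares in percent of total power, as read from the slide.
# Assumption: fused labels such as "arm925%" split into name + percentage.
power_share = {
    "arm9": 25, "SysCtl": 3, "CP 15": 2, "BIU": 8, "PATag RAM": 1,
    "Clocks": 4, "Other": 4, "D MMU": 5, "D Cache": 19, "I Cache": 25,
    "I MMU": 4,
}

total = sum(power_share.values())

# Combined share of the two caches.
cache_share = power_share["I Cache"] + power_share["D Cache"]

# Instruction-side share alone (I cache plus I MMU).
instr_side = power_share["I Cache"] + power_share["I MMU"]

print(f"total = {total}%")               # total = 100%
print(f"caches = {cache_share}%")        # caches = 44%
print(f"instruction side = {instr_side}%")  # instruction side = 29%
```

Under this reading the two caches account for 44% of total power, and the instruction cache alone matches the arm9 entry at 25%.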
5
Introduction
Advantageous to focus on the instruction fetching subsystem
[Figure: system diagram with Processor, I$, D$, Bridge, JPEG, USB, CCDP P4, Mem]
6
Introduction
• Techniques to reduce instruction fetch power
  – Program Compression
    • Compress only a subset of frequently used instructions (Benini 1999)
    • Compress procedures in a small cache (Kirovski 1997)
    • Lookup table based (Lekatsas 2000)
  – Bus Encoding
    • Increment (Benini 1997)
    • Bus-invert (Stan 1995)
    • Binary/gray code (Mehta 1996)
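Two of the bus encodings listed above are simple enough to sketch. The code below is an illustrative version of the textbook formulations, not the implementation from the cited papers; the function names (`to_gray`, `bus_invert`) and the 4-bit example width are hypothetical. Gray code makes consecutive addresses differ in one bit, and bus-invert sends the complement of a word, plus an extra "invert" line, whenever that toggles fewer wires.

```python
from typing import Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bus_invert(prev_bus: int, value: int, width: int) -> Tuple[int, int]:
    """Return (word driven on the bus, invert line) for the next transfer.

    If more than half of the data lines would toggle relative to the
    word currently on the bus, drive the complement and set invert = 1.
    """
    mask = (1 << width) - 1
    value &= mask
    if hamming(prev_bus & mask, value) > width // 2:
        return (~value) & mask, 1
    return value, 0


# Sequential addresses: plain binary 3 -> 4 toggles three lines,
# while the Gray-coded values toggle exactly one.
print(hamming(3, 4), hamming(to_gray(3), to_gray(4)))  # 3 1

# Bus-invert on a 4-bit bus: 0000 -> 1110 would toggle 3 lines,
# so the complement 0001 is driven with the invert line set.
print(bus_invert(0b0000, 0b1110, 4))  # (1, 1)
```

Gray coding helps most on instruction address buses, where fetches are largely sequential; bus-invert bounds the number of toggling data lines per transfer to half the bus width at the cost of one extra wire.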
7
Introduction
• Techniques to reduce instruction fetch power (cont.)
  – Efficient Cache Design
    • Small buffers: victim, non-temporal, speculative, and penalty to reduce miss rate (Bahar 1998)
    • Memory array partitioning and variation in cache sizes (Ko 1995)
• Required for both methods
• Simulation was bottleneck
• Biggest example only 30 minutes – small program
• Started looking at MediaBench – simulation takes hours
27
Conclusion and Future Work
• Important to tune the architecture to the program
• Simulation methods are slow
  – Presented an equation-based methodology that is faster than the simulation-based methodology previously used
  – Accuracy/fidelity preserved
• Future Work
  – Expand types of tiny caches
  – Look at more benchmarks
    • MediaBench: several hours (up to 48 hours) for our simulations