Tuning of Loop Cache Architectures to Programs in Embedded System Design Susan Cotterell and Frank Vahid* Department of Computer Science and Engineering University of California, Riverside *Also with the Center for Embedded Computer Systems at UC Irvine This work was supported in part by the U.S. National Science Foundation and a U.S. Department of Education GAANN Fellowship
Tuning of Loop Cache Architectures to Programs in Embedded System Design
Susan Cotterell and Frank Vahid*
Department of Computer Science and Engineering
University of California, Riverside
*Also with the Center for Embedded Computer Systems at UC Irvine
This work was supported in part by the U.S. National Science Foundation and a U.S. Department of Education GAANN Fellowship
2
Introduction
Opportunity to tune the microprocessor architecture to the program
Figure: traditional versus core-based microprocessor architecture
3
Introduction
Figure: system diagram with Processor, I$, D$, Bridge, JPEG, USB, CCDP, P4, and Mem blocks
• I-cache
  – Size
  – Associativity
  – Replacement policy
• JPEG
  – Compression
• Buses
  – Width
  – Bus invert/gray code
4
Introduction
• Memory access can consume 50% of an embedded microprocessor’s system power
  – Caches tend to be power hungry
    • M*CORE: unified cache consumes half of total power (Lee/Moyer/Arends 99)
    • ARM920T: caches consume half of total power (Segars 01)
Chart: power breakdown by component
  arm9: 25%
  SysCtl: 3%
  CP15: 2%
  BIU: 8%
  PA Tag RAM: 1%
  Clocks: 4%
  Other: 4%
  D MMU: 5%
  D Cache: 19%
  I Cache: 25%
  I MMU: 4%
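As a quick sanity check on the chart, the short Python sketch below adds up the shares. The component names and percentages are the chart labels as transcribed from the slide (reading "arm925%" as arm9 at 25% and "CP 152%" as CP15 at 2% is an assumption, supported by the values summing to 100). Which components count as "caches" is also an assumption made here for illustration.

```python
# Power shares in percent, as transcribed from the slide's chart.
# The split of fused labels (e.g. "arm9" + "25%") is an assumption.
power_share = {
    "arm9": 25,
    "SysCtl": 3,
    "CP15": 2,
    "BIU": 8,
    "PA Tag RAM": 1,
    "Clocks": 4,
    "Other": 4,
    "D MMU": 5,
    "D Cache": 19,
    "I Cache": 25,
    "I MMU": 4,
}

# All slices together should cover the whole chart.
total = sum(power_share.values())

# Instruction and data caches only.
cache_share = power_share["I Cache"] + power_share["D Cache"]

# Caches plus their MMUs (a broader, assumed definition of the memory side).
cache_and_mmu_share = cache_share + power_share["I MMU"] + power_share["D MMU"]

print(f"total = {total}%")
print(f"caches = {cache_share}%")
print(f"caches + MMUs = {cache_and_mmu_share}%")
```

Under these assumptions the two caches alone account for a little under half of the total, and the instruction cache is the largest single memory-side slice.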
5
Introduction
Advantageous to focus on the instruction fetching subsystem
Figure: system diagram with Processor, USB, I$, D$, Bridge, JPEG, CCDP, P4, and Mem blocks
6
Introduction
• Techniques to reduce instruction fetch power
  – Program compression
    • Compress only a subset of frequently used instructions (Benini 1999)
    • Compress procedures in a small cache (Kirovski 1997)
    • Lookup table based (Lekatsas 2000)
  – Bus encoding
    • Increment (Benini 1997)
    • Bus-invert (Stan 1995)
    • Binary/gray code (Mehta 1996)
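To make the bus-encoding entries above concrete, here is a minimal Python sketch that counts bus-line toggles for plain binary, Gray-coded, and bus-inverted sequences. It is an illustration, not the cited authors' implementations: the function names are hypothetical, the bus-invert variant is simplified (it compares data lines only and assumes the bus idles at zero), and toggle count is used as a rough stand-in for switching power.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def to_gray(n: int) -> int:
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)


def transitions(words) -> int:
    """Total bus-line toggles when the words are driven in order."""
    return sum(hamming(a, b) for a, b in zip(words, words[1:]))


def bus_invert_encode(words, width: int):
    """Simplified bus-invert: return (bus_value, invert_line) pairs.

    If more than half of the data lines would toggle, the complement
    is driven instead and the extra invert line is set to 1.
    """
    mask = (1 << width) - 1
    prev_bus = 0  # assumption: the bus idles at all zeros
    encoded = []
    for w in words:
        if hamming(prev_bus, w) > width // 2:
            bus, inv = ~w & mask, 1
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width: int):
    """Recover the original words from (bus_value, invert_line) pairs."""
    mask = (1 << width) - 1
    return [bus ^ mask if inv else bus for bus, inv in encoded]


def encoded_transitions(encoded) -> int:
    """Toggles on the data lines plus toggles on the invert line."""
    return sum(
        hamming(b0, b1) + (i0 != i1)
        for (b0, i0), (b1, i1) in zip(encoded, encoded[1:])
    )


# Sequential 3-bit addresses: Gray code toggles one line per step.
addresses = list(range(8))
binary_toggles = transitions(addresses)
gray_toggles = transitions([to_gray(a) for a in addresses])

# Worst-case 8-bit data pattern for a plain bus.
data = [0x00, 0xFF, 0x00, 0xFF]
raw_toggles = transitions(data)
encoded = bus_invert_encode(data, width=8)
inverted_toggles = encoded_transitions(encoded)

print(binary_toggles, gray_toggles, raw_toggles, inverted_toggles)
```

Gray coding helps most on sequential address streams such as instruction fetches, while bus-invert bounds the toggles per transfer regardless of the data pattern, at the cost of one extra bus line.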
7
Introduction
• Techniques to reduce instruction fetch power (cont.)
  – Efficient cache design
    • Small buffers: victim, non-temporal, speculative, and penalty to reduce miss rate (Bahar 1998)
    • Memory array partitioning and variation in cache sizes (Ko 1995)