Memory Management Challenges in the Power-Aware Computing Era
Dr. Avi Mendelson, Intel - Mobile Processors Architecture group, [email protected]
and adjunct Professor in the CS and EE departments, Technion Haifa, mendlson@{cs,ee}.technion.ac.il
No Intel proprietary information is disclosed.
Every future estimate or projection is only a speculation.
Responsibility for all opinions and conclusions falls on the author.
resembles Alice through the Looking-Glass: We are looking at the same old problems, but from the other side of the looking glass, and the landscape appears much different...
Ideal process technology allowed us to:
  Double transistor density every 30 months
  Improve transistor speed by 50% every 15-18 months
  Keep the same power density
  Reduce the power of an old architecture, or introduce a new architecture with significant performance improvement at the same power
In reality:
  Process usually is not ideal, and more performance than process scaling provides is needed, so:
  Die size, power, and power density increased over time
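To get a feel for how the ideal scaling rates above compound, the following minimal Python sketch applies them over a five-year window. The window length and the helper name `compound_growth` are illustrative assumptions, not taken from the talk.

```python
def compound_growth(factor_per_period, period_months, horizon_months):
    """Total improvement after horizon_months when a metric improves by
    factor_per_period once every period_months."""
    return factor_per_period ** (horizon_months / period_months)


HORIZON = 60  # months; an arbitrary five-year window chosen for illustration

density = compound_growth(2.0, 30, HORIZON)     # density doubles every 30 months
speed_fast = compound_growth(1.5, 15, HORIZON)  # +50% speed every 15 months
speed_slow = compound_growth(1.5, 18, HORIZON)  # +50% speed every 18 months

print(f"density x{density:.2f}, speed x{speed_slow:.2f} to x{speed_fast:.2f}")
```

Under these rates, five years of ideal scaling would give 4x the transistor density and roughly 4x to 5x the transistor speed at unchanged power density.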
Old Arch       mm (linear)    New Arch       mm (linear)    Ratio
i386C          6.5            i486           11.5           3.1
i486C          9.5            Pentium®       17             3.2
Pentium®       12.2           Pentium® Pro   17.3           2.1
Pentium® III   10.3           Next Gen       ?              2--3
(*) Source: Fred Pollack, Micro-32
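The Ratio column in the table above appears to be a die-area ratio, that is, the square of the ratio of the linear dimensions. That reading is an assumption (the slide does not say so), and the short Python check below only tests it against the three rows that have both linear sizes.

```python
# (old architecture, old linear mm, new architecture, new linear mm, ratio quoted on the slide)
ROWS = [
    ("i386C", 6.5, "i486", 11.5, 3.1),
    ("i486C", 9.5, "Pentium", 17.0, 3.2),
    ("Pentium", 12.2, "Pentium Pro", 17.3, 2.1),
]


def area_ratio(old_mm, new_mm):
    """Area ratio of two square dies with the given edge lengths (assumed square)."""
    return (new_mm / old_mm) ** 2


for old, old_mm, new, new_mm, quoted in ROWS:
    computed = area_ratio(old_mm, new_mm)
    print(f"{old} -> {new}: computed {computed:.2f}, slide says {quoted}")
```

The first two rows reproduce the quoted ratios after rounding; the third comes out near 2.0 rather than 2.1, so the slide's figures may have been derived from unrounded die sizes.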
Traditionally: a new generation always increases power
Compactions: higher performance at lower power
Used to be “one size fits all”: start with high power and shrink for mobile
[Figure: Max Power (Watts), log scale from 1 to 100, by processor generation: i386, i486, Pentium®, Pentium® w/MMX tech., Pentium® Pro, Pentium® II, Pentium® 4]
Power & energy
Power
  Dynamic power: consumed by all transistors that switch
    $P = \alpha C V^{2} f$ - work done per time unit (Watts)
    ($\alpha$: activity, $C$: capacitance, $V$: voltage, $f$: frequency)
  Static power (leakage): consumed by all “inactive transistors” - depends on temperature and voltage.
Energy
  Power consumed during a time period.
Energy efficiency
  $\text{Energy} \times \text{Delay}$ (or $\text{Energy} \times \text{Delay}^{2}$)
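As a worked example of the definitions above, the following Python sketch evaluates dynamic power, energy, and the two energy-efficiency metrics for one task at two operating points. All numeric values and the helper names are hypothetical, and leakage is deliberately ignored, so this is only a sketch of the dynamic-power term.

```python
def dynamic_power(alpha, capacitance, voltage, frequency):
    """Dynamic power in watts: P = alpha * C * V^2 * f."""
    return alpha * capacitance * voltage ** 2 * frequency


def run_task(cycles, alpha, capacitance, voltage, frequency):
    """Return (delay_s, energy_j) for a task that takes `cycles` clock cycles.

    Leakage is ignored (an assumption), so energy = dynamic power * run time.
    """
    delay = cycles / frequency
    energy = dynamic_power(alpha, capacitance, voltage, frequency) * delay
    return delay, energy


# Hypothetical operating point; every value here is made up for illustration.
CYCLES, ALPHA, CAP = 1e9, 0.2, 1e-9
d0, e0 = run_task(CYCLES, ALPHA, CAP, voltage=1.2, frequency=2e9)
# Same task with voltage and frequency both scaled to 80%.
d1, e1 = run_task(CYCLES, ALPHA, CAP, voltage=0.96, frequency=1.6e9)

print(f"power ratio:          {(e1 / d1) / (e0 / d0):.3f}")
print(f"energy ratio:         {e1 / e0:.3f}")
print(f"energy*delay ratio:   {(e1 * d1) / (e0 * d0):.3f}")
print(f"energy*delay^2 ratio: {(e1 * d1 ** 2) / (e0 * d0 ** 2):.3f}")
```

Under these assumptions, scaling voltage and frequency together to 80% cuts power to about 51% and energy to 64%, improves Energy * Delay to 80%, and leaves Energy * Delay² unchanged, which is one reason the choice of metric matters.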
* “New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies”, Fred Pollack, Intel Corp., Micro32 conference keynote, 1999.
Compiler and applications
Currently most cache-related optimizations are based on performance.
  Many of them help energy, since they improve the efficiency of the CPU.
  But they may worsen the max power and power density.
Increasing parallelism is THE key for future systems.
  Speculation may hurt if not done with a high degree of confidence.
  Do we need new programming models, such as transactional memory, for that?
Reducing working sets can help reduce leakage power if the system supports Drowsy or Decay caches.
The program may give the HW and the OS “hints” that can help improve the efficiency of power consumption.
  Response time requirements
If new HW/SW interfaces are defined, we can control the power of the machine at a very fine granularity; e.g., when floating point is not used, shut it down to save leakage power.
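The point about speculation and confidence can be made concrete with a toy expected-value model. This model is an assumption introduced here for illustration and is not from the talk: a correct speculation saves some energy, a wrong one wastes the energy spent on discarded work, and the function names and numbers are hypothetical.

```python
def expected_net_saving(confidence, saved_if_right, wasted_if_wrong):
    """Expected energy saved by speculating, given the probability of being right."""
    return confidence * saved_if_right - (1.0 - confidence) * wasted_if_wrong


def break_even_confidence(saved_if_right, wasted_if_wrong):
    """Confidence above which speculating saves energy on average."""
    return wasted_if_wrong / (saved_if_right + wasted_if_wrong)


# Hypothetical costs in arbitrary energy units: a wrong guess costs 3x what a right one saves.
SAVED, WASTED = 1.0, 3.0
threshold = break_even_confidence(SAVED, WASTED)

print(f"break-even confidence: {threshold:.2f}")
for confidence in (0.6, 0.9):
    net = expected_net_saving(confidence, SAVED, WASTED)
    print(f"confidence {confidence:.1f}: expected net saving {net:+.2f}")
```

In this model the more expensive a misprediction is relative to the benefit, the higher the confidence needed before speculation pays off in energy terms.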
Garbage collection
In many situations, not all the cores in the system will be active.
  An idle processor can be used to perform GC.
  In this case we may want to do the GC at a very fine granularity.
Most CMP architectures share much of the memory hierarchy.
  GC done by one processor may replace the cache content and slow down the execution of the entire system.
  Thus we may want to do the GC at a very coarse granularity in order to limit the overhead.
A new interface between HW and SW may be needed in order to allow new algorithms for GC, or to optimize the execution of the system when using the current ones.
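The shared-cache side of this trade-off can be sketched with a small LRU cache simulation in Python. This is a toy model under stated assumptions, not a measurement: the application repeatedly walks a fixed working set, and each GC invocation, however often it runs, touches as many lines as the whole shared cache holds. All sizes and names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal fully associative LRU cache that tracks which lines are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, address):
        """Touch a line; return True on a hit, False on a miss."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True
        self.lines[address] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def app_misses(gc_interval, passes=64, working_set=32, cache_lines=64, gc_scan_lines=64):
    """Count application misses when GC runs after every gc_interval passes."""
    cache = LRUCache(cache_lines)
    misses = 0
    for p in range(passes):
        for i in range(working_set):
            if not cache.access(("app", i)):
                misses += 1
        if (p + 1) % gc_interval == 0:
            for j in range(gc_scan_lines):  # the GC scan shares the same cache
                cache.access(("gc", j))
    return misses


for interval in (1, 16, 64):
    print(f"GC every {interval:2d} passes: {app_misses(interval)} application misses")
```

In this model every GC invocation evicts the application's working set, so running GC less often (coarser granularity) directly reduces the misses it inflicts on the application. The model says nothing about the idle-core argument for fine granularity, which is why the slide raises the need for a HW/SW interface to balance the two.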
Summary and future directions
The power-aware era impacts all aspects of computer architecture.
It forces the market to “go parallel” and may cause the memory portion of the die to increase over time:
  To take advantage of “cold transistors”
  To reduce memory and IO bandwidth
We may need to start looking at new paradigms for memory usage and HW/SW interfaces.
  At all levels of the machine; e.g., programming models, OS, etc.
  New programming models such as Transactional Memory may become very important in order to allow better parallelism. Do we need to develop a new memory paradigm to support it?
Software will determine whether the new (old) trend becomes a major success or not.
  Increase parallelism (but use speculative execution only with high confidence)
  Control power
  New SW/HW interfaces may be needed.
Intel and AMD both have dual-core processors.
  Intel uses a shared cache architecture and AMD introduced a split cache architecture.
  AMD announced that its next-generation processors will use a shared LLC (last-level cache) as well.
Intel announced that it is working on a four-core processor. Analysts think that AMD is doing the same.
Intel said that it will consider going to 8-way processors only after SW catches up. AMD made a similar announcement.
For servers, analysts claim that Intel is building a 16-way Itanium-based processor for the 2009 time frame.
Power4 has 2 cores and Power5 has 2 cores+SMT. They are considering moving in the near future to 2 cores+SMT for each of them.
Xbox has 3 Power4 cores.
All three companies promise to increase the number of cores at a pace that fits the market’s needs.