Multicore Processors
Raul Queiroz Feitosa
Parts of these slides are from the support material provided by W. Stallings
Multicore Computers 2
Objective
“This chapter provides an overview of multicore systems.”
Stallings
Outline
Hardware Performance Issues
Software Performance Issues
Multicore Organizations
Intel Core Architecture
Hardware performance issues
Chip Density
Microprocessor performance has increased due to
a) Improved organization, e.g., pipelining, superscalar execution, multithreading, …
b) Increased clock frequency
both made possible by
1. Increasing chip density!
By 2015 → 100 billion transistors on a 300 mm² die.
Hardware performance issues
Relative Performance/cycle
Pollack’s rule: performance is roughly proportional to the square root of the increase in complexity
Doubling complexity gives about 40% more performance
2. Diminishing gains with complexity increase!
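The arithmetic behind Pollack’s rule can be checked with a minimal sketch. The function name is hypothetical, and the two-core comparison assumes a perfectly parallel workload, which is an idealization not stated on the slide.

```python
import math


def pollack_relative_performance(complexity_ratio: float) -> float:
    """Relative single-core performance under Pollack's rule.

    Performance is taken as proportional to the square root of complexity,
    so the result is relative to a baseline core of complexity 1.
    """
    if complexity_ratio <= 0:
        raise ValueError("complexity_ratio must be positive")
    return math.sqrt(complexity_ratio)


# One core with twice the complexity of the baseline core.
big_core = pollack_relative_performance(2.0)

# Assumption for illustration only: two baseline cores running a perfectly
# parallel workload, so their throughput simply adds up.
two_small_cores = 2 * pollack_relative_performance(1.0)

print(f"one double-complexity core: {big_core:.2f}x")
print(f"two baseline cores (ideal parallel case): {two_small_cores:.2f}x")
```

Under these assumptions, doubling complexity yields a factor of sqrt(2), roughly 1.41, which is the "about 40%" quoted above; spending the same transistors on a second core can do better only when the software is parallel enough.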
Hardware performance issues
Power
3. Power requirements grow exponentially with chip density and clock frequency!
Hardware performance issues
Increased Complexity
4. Memory transistors have a power density one order of magnitude lower than that of logic.
Software Performance Issues
Small amounts of serial code impact performance
According to Amdahl’s law
\[
\text{Speedup} = \frac{\text{time to execute program on a single processor}}{\text{time to execute program on } N \text{ parallel processors}} = \frac{1}{(1 - f) + \frac{f}{N}}
\]
where f is the fraction of code infinitely parallelizable with no scheduling overhead.
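A minimal sketch of Amdahl’s law as a function, to make the effect of the serial fraction concrete. The function name and the sample values (f = 0.9 and the processor counts) are illustrative assumptions, not taken from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup predicted by Amdahl's law.

    f: fraction of the code that is infinitely parallelizable (0 <= f <= 1)
    n: number of parallel processors (n >= 1)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part (1 - f) is not sped up; the parallel part is divided by n.
    return 1.0 / ((1.0 - f) + f / n)


# Illustrative values: 90% of the code parallelizable.
for n in (2, 8, 64, 1024):
    print(f"N = {n:5d}: speedup = {amdahl_speedup(0.9, n):.2f}")
```

With f = 0.9 the speedup can never exceed 1 / (1 - f) = 10, however many processors are added, which is why even 10% of serial code dominates at large N.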
Software Performance Issues
Small amounts of serial code impact performance
Further losses come from communication, distribution of work and cache coherence overheads
[Figure: speedup curves, labeled by percentage of sequential code]
Software Performance Issues
More recently, software engineers have developed applications that
effectively exploit multiprocessor architectures, e.g., database
applications.
5. New applications exploit multiprocessor architecture!
Multicore Organization
In view of:
1. Increasing chip density,
2. Diminishing gains with complexity increase,
3. Power requirements that grow exponentially with chip density and clock frequency,
4. Memory transistors having a power density an order of magnitude lower than that of logic,
5. Applications that exploit multiprocessor architecture,
what to do with the extra transistors made available by the semiconductor industry?
Multicore Organization
What to do with the extra transistors made available by the semiconductor industry?
Reduce complexity, so that multiple complete processors fit on a single chip
Reduce clock frequency and increase the proportion of the chip occupied by cache to reduce power requirements
Multicore Organization
Main variables in a multicore organization:
Number of processor cores on chip
Number of levels of cache on chip
Amount of shared cache
Multicore Organization Alternatives
ARM11 MPCore: dedicated L1 cache
AMD Opteron: dedicated L2 cache
Intel Core Duo: shared L2 cache
Intel Core i7: shared L3 cache
Private × shared L2 Cache
Advantages of shared L2 Cache
Constructive interference reduces overall miss rate
Data shared by multiple cores not replicated at cache level
With a proper frame replacement algorithm, the amount of shared cache allocated to each core is dynamic
Threads with less locality can have more cache
Easy inter-process communication through shared memory
Cache coherency confined to L1
Advantages of private L2 Cache
Dedicated L2 cache gives each core more rapid access
Shared L3 cache may also improve performance
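The dynamic-allocation advantage of a shared L2 cache can be illustrated with a toy simulation. This is a sketch under stated assumptions: LRU is assumed as the replacement policy, the cache sizes and access pattern are hypothetical, and the class and function names are invented for the example. Core 0 cycles over six blocks (poor locality for a small cache) and core 1 over two blocks.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with LRU replacement that counts misses."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # block -> True, oldest entry first
        self.misses = 0

    def access(self, block) -> None:
        if block in self.lines:
            self.lines.move_to_end(block)  # mark as most recently used
            return
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[block] = True


def build_trace(rounds: int = 60):
    """Interleave accesses: core 0 cycles over 6 blocks, core 1 over 2."""
    trace = []
    for i in range(rounds):
        trace.append((0, ("A", i % 6)))
        trace.append((1, ("B", i % 2)))
    return trace


def misses_private(trace, lines_per_core: int = 4) -> int:
    """Each core has its own L2 of lines_per_core lines."""
    caches = [LRUCache(lines_per_core), LRUCache(lines_per_core)]
    for core, block in trace:
        caches[core].access(block)
    return sum(cache.misses for cache in caches)


def misses_shared(trace, total_lines: int = 8) -> int:
    """Both cores share one L2 with the same total capacity."""
    cache = LRUCache(total_lines)
    for _core, block in trace:
        cache.access(block)
    return cache.misses


trace = build_trace()
print("private L2 misses:", misses_private(trace))
print("shared L2 misses: ", misses_shared(trace))
```

In this setup the private caches thrash on core 0 while half of core 1's cache sits idle, whereas the shared cache lets core 0 use the lines core 1 does not need. The model ignores access latency, which is where the private organization has its advantage.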
Intel Core Architecture
Intel Core Duo uses superscalar cores
Intel Core i7 uses simultaneous multithreading (SMT)
Scales up number of threads supported
4 SMT cores, each supporting 4 threads, appear as 16 cores.
Intel x86 Core Duo Organization
Introduced in 2006
Two x86 superscalar cores, shared L2 cache
Dedicated L1 cache per core implementing MESI protocol
Protocol extended to accommodate multiple chips (SMP)
Thermal control unit per core
Manages chip heat dissipation
Maximizes performance within thermal constraints
If the temperature of a core exceeds a threshold, its clock rate is reduced.
Advanced Programmable Interrupt Controller (APIC)
Interprocessor interrupts between cores
Routes I/O interrupts to appropriate core.
Each APIC includes a timer, so that OS can interrupt the local core.
Intel x86 Core Duo Organization
Power Management Logic
Monitors thermal conditions and CPU activity
Adjusts voltage and power consumption
Can switch individual logic subsystems
2MB shared L2 cache
Dynamic allocation
MESI support for L1 caches
Extended to support multiple Core Duo in SMP
L2 data may be shared between the local cores only or also with external agents
Bus interface
Intel Core i7 Organization
[Figure: Intel Core i7 block diagram. Cores 1 to n, each with 32 KB I&D L1 caches and a 256 KB L2 cache; a shared L3 cache of up to 15 MB; DDR3 memory controllers; QuickPath Interconnect]
Intel Core i7 Organization
Introduced in November 2008, 2nd Generation in January 2011
Up to six x86 SMT processors
Dedicated L2, shared L3 cache
Speculative pre-fetch for caches
On-chip DDR3 memory controller
Three 8-byte channels (192 bits) giving 32 GB/s
No front side bus
Turbo Boost
Clock frequency is incrementally adjusted on demand.
Hardware Virtualization
A facility that allows multiple operating systems to simultaneously share processor resources in a safe and efficient manner
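The memory bandwidth figure for the DDR3 controller can be checked with a short calculation. The channel count and width come from the slide; the transfer rate is an assumption (DDR3-1333, about 1333 million transfers per second per channel), chosen because it reproduces the quoted 32 GB/s.

```python
CHANNELS = 3               # from the slide
BYTES_PER_TRANSFER = 8     # from the slide: 8-byte channels

# Assumption, not stated on the slide: DDR3-1333 memory,
# about 1333 million transfers per second on each channel.
TRANSFERS_PER_SECOND = 1_333_000_000

bus_width_bits = CHANNELS * BYTES_PER_TRANSFER * 8
peak_bytes_per_second = CHANNELS * BYTES_PER_TRANSFER * TRANSFERS_PER_SECOND
peak_gb_per_second = peak_bytes_per_second / 1e9

print(f"total bus width: {bus_width_bits} bits")
print(f"peak bandwidth:  {peak_gb_per_second:.1f} GB/s")
```

This is a peak figure; sustained bandwidth depends on the access pattern and memory timings.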
Intel Core i7 Organization
QuickPath Interconnect (QPI)
Cache-coherent point-to-point link
High-speed communication between processor chips
6.4 G transfers per second, 16 bits per transfer
Dedicated bidirectional pairs
Total bandwidth 25.6 GB/s
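The QPI total follows directly from the numbers above, as this small sketch shows. The constant names are illustrative; the factor of two reflects the dedicated pair of links, one per direction.

```python
TRANSFERS_PER_SECOND = 6_400_000_000  # 6.4 G transfers per second
BITS_PER_TRANSFER = 16                # payload bits moved per transfer
DIRECTIONS = 2                        # one dedicated link in each direction

# Bandwidth of a single direction, converted from bits to bytes.
per_direction_gb_s = TRANSFERS_PER_SECOND * BITS_PER_TRANSFER / 8 / 1e9

# Both directions can be active at the same time.
total_gb_s = per_direction_gb_s * DIRECTIONS

print(f"per direction: {per_direction_gb_s:.1f} GB/s")
print(f"total:         {total_gb_s:.1f} GB/s")
```

So 25.6 GB/s is the sum of both directions; each direction alone carries 12.8 GB/s.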
Intel QPI animated demo
Intel Multicore Processors
Cache Latency
Compare Intel Core Processors
General Information about Intel Processors