8/7/2019 Multicore Processor Report
1. INTRODUCTION

A processor is a unit that reads, decodes, and executes program instructions. Processors were originally developed with only one core. A core is the part of a processor that actually performs:
1. Fetching
2. Decoding
3. Executing
an instruction, as shown in Fig 1.1.
Fig. 1.1 (Single Core Computer)
A multicore processor, by contrast, is an integrated circuit (IC) onto which two or more individual and independent cores are attached on a single die. Placing two or more powerful computing cores on a single processor opens up a world of important new possibilities. Each core has its own complete set of resources and may share on-die cache layers.
A single-core processor can process only one instruction at a time. To improve efficiency, processors commonly use internal pipelines, which allow several instructions to be processed together; however, instructions still enter the pipeline one at a time.
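The fetch-decode-execute cycle described above can be sketched as a toy interpreter loop. This is a minimal illustration only: the LOAD/ADD/STORE instruction set and the accumulator design are invented for this sketch, not taken from any real ISA.

```python
def run_program(program, memory):
    """Toy fetch-decode-execute loop over a made-up accumulator machine."""
    acc, pc = 0, 0
    while pc < len(program):
        instruction = program[pc]   # 1. fetch the next instruction
        op, arg = instruction       # 2. decode it into opcode and operand
        if op == "LOAD":            # 3. execute it
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        pc += 1
    return memory

mem = {"x": 2, "y": 3, "z": 0}
run_program([("LOAD", "x"), ("ADD", "y"), ("STORE", "z")], mem)
# mem["z"] is now 5
```

A single-core processor runs exactly one such loop; the sections below describe how multiple cores run several of them at once.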
Fig. 1.2 (Single Core)
This led to the evolution of the multicore processor: placing two or more powerful computing cores on a single processor opens up a world of important new possibilities for increasing system performance.
Need for multicore processors:
The difficulties in using a single-core CPU gave birth to the multicore processor.
1. It is difficult and costly to push a single core's clock frequency even higher.
Fig 1.3 (Clock Frequency and Performance)
Raising the clock frequency further improves performance but decreases reliability, and it has become increasingly difficult. Doubling the frequency, which in practice also requires raising the supply voltage, causes roughly a fourfold increase in power consumption, since dynamic power is approximated as
Power ≈ Capacitance × Voltage² × Frequency.
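This scaling can be checked numerically with the classic CMOS dynamic-power approximation P ≈ C · V² · f; the values below are normalized illustrations, not measurements of any real chip.

```python
def dynamic_power(capacitance, voltage, frequency):
    # Classic CMOS dynamic-power approximation: P ≈ C * V^2 * f
    return capacitance * voltage ** 2 * frequency

base = dynamic_power(1.0, 1.0, 1.0)
# Doubling frequency alone only doubles dynamic power, but a higher
# frequency typically requires a higher supply voltage; raising V by
# about 41% as well yields roughly the fourfold increase quoted above.
scaled = dynamic_power(1.0, 1.414, 2.0)
ratio = scaled / base  # ≈ 4.0
```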
2. Many new applications are multithreaded.
3. The general trend in computer architecture nowadays is a shift towards parallelism. Deeply pipelined circuits lead to:
a. Heat problems
b. Speed-of-light problems
c. Large design teams being necessary
d. Server farms needing expensive air conditioning
To overcome the above drawbacks of the single-core processor, and to increase the performance of the system without increasing power consumption and with less complexity, the need for the multicore processor arose.
2. PROCESSOR HISTORY
Intel manufactured the first microprocessor, the 4-bit 4004, in the early 1970s; it was basically just a number-crunching machine. Shortly afterwards Intel developed the 8008 and 8080, both 8-bit, and Motorola followed suit with its 6800, which was equivalent to Intel's 8080. The companies then fabricated 16-bit microprocessors: Motorola had its 68000, and Intel the 8086 and 8088; the 8086 would be the basis for Intel's 32-bit 80386 and later its popular Pentium line-up, which were in the first consumer-based PCs. [18, 19] Each generation of processors grew smaller and faster, but dissipated more heat and consumed more power.
Fig 2.1 (Generations of processors)
2.1 MOORE'S LAW
One of the guiding principles of computer architecture is known as Moore's Law. In 1965 Gordon Moore stated that the number of transistors on a chip would roughly double each year. What is often quoted as Moore's Law is Dave House's revision that computer performance will double every 18 months. [20] The graph in Figure 2.2 plots many of the early microprocessors against the number of transistors per chip.
Fig 2.2 (Microprocessors against the number of transistors per chip)
Throughout the 1990s and the earlier part of this decade, microprocessor frequency was synonymous with performance; higher frequency meant a faster, more capable computer. Since processor frequency has reached a plateau, we must now consider other aspects of the overall performance of a system: power consumption, heat dissipation, frequency, and number of cores. Multicore processors often run at slower frequencies but have much better performance than a single-core processor, because two heads are better than one.
3. ARCHITECTURE

The processor cores are implemented on a single die or chip. Each core has its own complete set of resources, and may share the on-die cache layers.
Fig 3.1 (Architecture of a multicore processor)
Core:
The individual processors that are implemented on the integrated die or chip are
called the cores. The core is the part of the processor that actually performs the reading
and executing of instructions.
Register file:
A register file is an array of processor registers in a central processing unit (CPU). Modern integrated-circuit-based register files are usually implemented by way of fast static RAMs with multiple ports. Such RAMs are distinguished by having dedicated read and write ports, whereas ordinary multiported SRAMs will usually read and write through the same ports.
Bus:
The back-side bus connects the processor with cache memory.
Cache:
Closest to the processor is the Level 1 (L1) cache; this is very fast memory used to store data frequently used by the processor. The Level 2 (L2) cache is just off-chip, slower than the L1 cache, but still much faster than main memory; the L2 cache is larger than the L1 cache and used for the same purpose. The cores do not necessarily share the cache.
Cross bar:
A crossbar switch connects multiple inputs to multiple outputs in a matrix manner. Here the crossbar switch connects the system request queue (SRQ) and the integrated memory controller. It directly connects both CPU cores to the HyperTransport link, as well as to the integrated memory controller, for I/O to and from the outside world. Think of it like a train-track switch: signals can pass to or from either core and the outside world, but not at the same time.
HyperTransport link:
HyperTransport is a technology for interconnecting computer processors. It is a bidirectional serial/parallel, high-bandwidth, low-latency point-to-point link, and a replacement for the front-side bus.
Fig 3.2 (Multicore processor implemented with 4 independent processors)
Integrated memory controller:
The memory controller is a digital circuit which manages the flow of data going to and from the main memory.
System request queue:
The System Request Queue provides an interface for the CPU cores to the crossbar, and it
is what keeps things operating smoothly. The System Request Queue manages and
prioritizes both CPU cores' access to the crossbar switch, minimizing contention for the
system bus. The result is a very efficient use of system resources.
3.1 CORE COMPONENTS

Pipeline:
One widely accepted technique for improving the performance of serial software tasks is pipelining. Simply put, pipelining is the process of dividing a serial task into concrete stages that can be executed in assembly-line fashion. In order to gain the most performance increase possible from pipelining, individual stages must be carefully balanced so that no single stage takes much longer to complete than the other stages. A deeper pipeline buys frequency at the expense of an increased cache-miss penalty and lower instructions per clock. A shallow pipeline gives better instructions per clock at the expense of frequency scaling. Maximum frequency per core requires deeper pipelines.
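The trade-offs above can be sketched with a back-of-the-envelope timing model. The stage times below are made up for illustration; the point is that pipelined throughput is limited by the slowest stage, which is why stage balancing matters.

```python
def serial_time(stage_times, n_items):
    """Each item passes through every stage before the next item starts."""
    return n_items * sum(stage_times)

def pipelined_time(stage_times, n_items):
    """Stages overlap: once the pipeline is full, one item completes
    per slowest-stage interval."""
    return sum(stage_times) + (n_items - 1) * max(stage_times)

balanced = [1, 1, 1, 1]      # four equal stages
unbalanced = [1, 1, 4, 1]    # one slow stage dominates throughput

print(serial_time(balanced, 100))      # 400
print(pipelined_time(balanced, 100))   # 103
print(pipelined_time(unbalanced, 100)) # 403 (serial would take 700)
```

With balanced stages, 100 items finish in roughly a quarter of the serial time; one slow stage erases most of that gain.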
Cache:
With the rising gap between processor and memory speed, maximizing on-chip cache capacity is crucial to attaining good performance. Memory system designers employ hierarchies of caches to manage latency. Many of today's multicore processors assume private L1 caches and a shared L2 cache. At some point, however, a single shared L2 cache will require additional levels in the hierarchy. One option designers can consider is implementing a physical hierarchy that consists of multiple clusters, where each cluster consists of a group of processor cores that share an L2 cache. The effectiveness of such a physical hierarchy, however, may depend on how well the applications map to the hierarchy. Cache size buys performance at the expense of die size; larger caches also reduce the miss penalties of deep pipelines.
4. SIMULTANEOUS MULTITHREADING WITH MULTICORE ARCHITECTURE

Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.
In simultaneous multithreading, instructions from more than one thread can be
executing in any given pipeline stage at a time. This is done without great changes to the
basic processor architecture: the main additions needed are the ability to fetch instructions
from multiple threads in a cycle, and a larger register file to hold data from multiple
threads. The number of concurrent threads can be decided by the chip designers, but
practical restrictions on chip complexity have limited the number to two for most SMT implementations.
Without simultaneous multithreading, only a single thread can run at a time, even though the processor contains multiple functional units, i.e. an integer unit and a floating-point unit. With simultaneous multithreading, the two functional units can execute two different threads concurrently. More important, however, two programs can now run simultaneously on a processor without having to be swapped in and out. To induce the
operating system to recognize one processor as two possible execution pipelines, the new
chips were made to appear as two logical processors to the operating system.
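As a software analogy (not the hardware mechanism itself), the sketch below shows two threads of a single process making progress concurrently. The thread names are invented for illustration, and `time.sleep` stands in for the stalls that SMT hardware fills with the other thread's instructions.

```python
import threading
import time

results = {}

def worker(name, delay):
    # Simulated work: while this thread waits, the other thread runs.
    time.sleep(delay)
    results[name] = True

t1 = threading.Thread(target=worker, args=("integer_thread", 0.1))
t2 = threading.Thread(target=worker, args=("float_thread", 0.1))
start = time.monotonic()
t1.start(); t2.start()
t1.join(); t2.join()
elapsed = time.monotonic() - start
# The two threads overlap: wall time is roughly 0.1 s, not the 0.2 s
# that running them one after the other would take.
```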
Fig 4.1 (Early processor: one chip, one core, one executing thread)
Fig 4.2 (Processor with simultaneous multithreading: one chip, one core, two executing threads)
The performance of simultaneous multithreading was limited by the availability of shared
resources to the two executing threads. As a result, SMT Technology cannot approach the
processing throughput of two distinct processors because of the contention for these
shared resources. To achieve greater performance gains on a single chip, a processor would require two or more separate cores, such that each thread would have its own
complete set of execution resources.
Fig 4.3 (Multicore: one chip, many cores, several threads of execution)
In this design, each core has its own execution pipeline. And each core has the resources
required to run without blocking resources needed by the other software threads. While
the example in Fig 4.3 shows a four-core design, there is no inherent limitation in the
number of cores that can be placed on a single chip. The multi-core design enables two or
more cores to run at somewhat slower speeds and at much lower temperatures. The
combined throughput of these cores delivers processing power greater than the maximum
available today on single-core processors and at a much lower level of power
consumption.
5. PERFORMANCE ANALYSIS
A multicore arrangement that provides two or more low-clock-speed cores could be designed to provide excellent performance while minimizing power consumption and delivering lower heat output than configurations that rely on a single high-clock-speed core.
The following example shows how multicore technology could manifest in a
standard server configuration and how multiple low-clock-speed cores could deliver
greater performance than a single high-clock-speed core for networked applications. This
example uses some simple math and basic assumptions about the scaling of multiple
processors and is included for demonstration purposes only. Until multicore processors are
available, scaling and performance can only be estimated based on technical models. The
example described in this article shows one possible method of addressing relative
performance levels as the industry begins to move from platforms based on single-core
processors to platforms based on multicore processors. Other methods are possible, and
actual processor performance and processor scalability are tied to a variety of platform
variables, including the specific configuration and application environment. Several
factors can potentially affect the internal scalability of multiple cores, such as the system
compiler as well as architectural considerations including memory, I/O, front side bus
(FSB), chip set, and so on.
For instance, enterprises can buy a dual-processor server today to provide e-mail,
calendaring, and messaging functions. Dual-processor servers are designed to deliver
excellent price/performance for messaging applications. In our example we use dual 3.6
GHz processors supporting simultaneous multithreading technology.
The following simple example can help explain the relative performance of a low-
clock-speed, dual-core processor versus a high-clock-speed, dual-processor counterpart.
Dual-processor systems available today offer a scalability of roughly 80 percent for the
second processor, depending on the OS, application, compiler, and other factors. That
means the first processor may deliver 100 percent of its processing power, but the second
processor typically suffers some overhead from multiprocessing activities. As a result, the
two processors do not scale linearly; that is, a dual-processor system does not achieve a
200 percent performance increase over a single-processor system, but instead provides
approximately 180 percent of the performance that a single-processor system provides. In
this article, the single-core scalability factor is referred to as external, or socket-to-socket,
scalability. When comparing two single-core processors in two individual sockets, the
dual 3.6 GHz processors would result in an effective performance level of approximately
6.48 GHz (see Figure).
Fig 5.1 (Sample core speed and anticipated total relative power in a system using two
single-core processors)
For multicore processors, administrators must take into account not only socket-to-socket
scalability but also internal, or core-to-core, scalability: the scalability between multiple
cores that reside within the same processor module. In this example, core-to-core
scalability is estimated at 70 percent, meaning that the second core delivers 70 percent of
its processing power.
Thus, in the example system using 2.8 GHz dual-core processors, each dual-core processor would behave more like a 4.76 GHz processor when the performance of the two cores (2.8 GHz plus 1.96 GHz) is combined. For demonstration purposes, this example assumes that, in a server that combines two such dual-core processors within the same system architecture, the socket-to-socket scalability of the two dual-core processors would be similar to that in a server containing two single-core processors: 80 percent scalability. This would lead to an effective performance level of 8.57 GHz (see Fig 5.2).
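The arithmetic of this example can be captured in a small model. This is a sketch of the scaling assumptions used above; the function name and parameter names are ours, not from any published model.

```python
def effective_perf(core_ghz, cores_per_socket, sockets,
                   core_scaling=0.70, socket_scaling=0.80):
    """Relative performance: the first core and first socket count at
    100%; each additional core or socket counts at its scaling factor."""
    per_socket = core_ghz * (1 + core_scaling * (cores_per_socket - 1))
    return per_socket * (1 + socket_scaling * (sockets - 1))

dual_single_core = effective_perf(3.6, 1, 2)  # two 3.6 GHz single-core CPUs
dual_dual_core = effective_perf(2.8, 2, 2)    # two 2.8 GHz dual-core CPUs
# dual_single_core ≈ 6.48 and dual_dual_core ≈ 8.57: the lower-clocked
# dual-core system comes out ahead, matching the figures above.
```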
Fig 5.2 (Sample core speed and anticipated total relative power in a system using two
dual-core processors)
6. MAPPING OF AN APPLICATION TO MULTICORE PROCESSOR

Task parallelism is the concurrent execution of independent tasks in software. On a single-core processor, separate tasks must share the same processor. On a multicore processor, tasks essentially run independently of one another, resulting in more efficient execution.
For mapping first comes Identifying a Parallel Task Implementation. Identifying
the task parallelism in an application is a challenge that, for now, must be tackled
manually. After identifying parallel tasks, mapping and scheduling the tasks across a
multicore system requires careful planning. A four-step process, derived from Software
Decomposition for Multicore Architectures [1], is proposed to guide the design of the
application:
1. Partitioning: Partitioning of a design is intended to expose opportunities for parallel execution. The focus is on defining a large number of small tasks in order to yield a fine-grained decomposition of a problem.
2. Communication: The tasks generated by a partition are intended to execute concurrently but cannot, in general, execute independently. The computation to be performed in one task will typically require data associated with another task. Data must then be transferred between tasks to allow computation to proceed. This information flow is specified in the communication phase of a design.
3. Combining: Decisions made in the partitioning and communication phases are reviewed to identify a grouping that will execute efficiently on the multicore architecture.
4. Mapping: This stage consists of determining where each task is to execute.
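The mapping step (4) can be sketched as a greedy load-balancing scheduler: assign each task, longest first, to the core that currently has the least work. This is an illustrative heuristic of ours, not the method proposed in [1], and the task durations are made up.

```python
import heapq

def map_tasks(task_times, n_cores):
    """Greedy longest-processing-time mapping of tasks onto n_cores."""
    cores = [(0.0, i, []) for i in range(n_cores)]  # (load, core id, tasks)
    heapq.heapify(cores)
    for t in sorted(task_times, reverse=True):
        load, i, tasks = heapq.heappop(cores)        # least-loaded core
        heapq.heappush(cores, (load + t, i, tasks + [t]))
    return sorted(cores, key=lambda c: c[1])

schedule = map_tasks([4, 3, 3, 2, 2, 2], n_cores=2)
makespan = max(load for load, _, _ in schedule)
# 16 time units of work finish in 8 on two cores instead of 16 serially.
```

In practice the combining step (3) would first group fine-grained tasks so that communication between cores stays cheap relative to the work mapped onto each one.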
7. APPLICATIONS
- Database servers
- Web servers (Web commerce)
- Compilers
- Video editing and encoding
- 3D gaming
- Powerful graphics solutions
- Multimedia applications
- Scientific applications, CAD/CAM
- In general, applications with thread-level parallelism
The full effect and advantage of a multicore processor are obtained when it is used together with a multithreading operating system.
Fig 7.1 (two different processes running concurrently in two different cores)
8. ADVANTAGES AND DISADVANTAGES
8.1 ADVANTAGES
(i) Cache coherency: The proximity of multiple CPU cores on the same die allows the cache-coherency circuitry to operate at a much higher clock rate than is possible if the signals have to travel off-chip. Combining equivalent CPUs on a single die significantly improves the performance of cache-snoop (bus-snooping) operations. Put simply, signals between different CPUs travel shorter distances and therefore degrade less. These higher-quality signals allow more data to be sent in a given time period, since individual signals can be shorter and do not need to be repeated as often.
(ii) Improved response time: The largest boost in performance will likely be noticed in improved response time while running CPU-intensive processes, like antivirus scans, ripping/burning media (requiring file conversion), or file searching. For example, if an automatic virus scan runs while a movie is being watched, the application running the movie is far less likely to be starved of processor power, as the antivirus program will be assigned to a different processor core than the one running the movie playback.
(iii) Reduced printed-circuit-board size and power: Assuming that the die can physically fit into the package, multicore CPU designs require much less printed circuit board (PCB) space than multi-chip SMP designs. Also, a dual-core processor uses slightly less power than two coupled single-core processors, principally because of the decreased power required to drive signals external to the chip. Furthermore, the cores share some circuitry, like the L2 cache and the interface to the front-side bus (FSB). In terms of competing technologies for the available silicon die area, multicore design can make use of proven CPU core library designs and produce a product with lower risk of design error than devising a new, wider core design. Also, adding more cache suffers from diminishing returns.
(iv) Multitasking productivity: Users of multicore PCs will experience exceptional performance while executing multiple tasks simultaneously. The ability to run complex, multitasked workloads, such as creating professional digital content while checking and writing e-mails in the foreground, and also running firewall software or downloading audio files from the Web in the background, will allow consumers and workers to do more work in less time.
(v) Enhanced security: PC security can be enhanced because multicore processors can run more sophisticated virus, spam, and hacker protection in the background without performance penalties.
(vi) Cool and quiet: The enhanced performance offered by multicore processors will come without the additional heat and fan noise that would likely accompany performance increases in single-core processor machines.
8.2 DISADVANTAGES
Maximizing the utilization of the computing resources provided by multi-core
processors requires adjustments both to the operating system (OS) support and to existing
application software.
Also, the ability of multi-core processors to increase application performance
depends on the use of multiple threads within applications.
Integration of a multicore chip drives chip-production yields down, and such chips are more difficult to manage thermally than lower-density single-chip designs.
From an architectural point of view, ultimately, single CPU designs may make
better use of the silicon surface area than multiprocessing cores, so a development
commitment to this architecture may carry the risk of obsolescence.
Using a multicore processor to its full potential is another issue. If programmers don't write applications that take advantage of multiple cores, there is no gain, and in some cases there is a loss of performance. Applications need to be written so that different parts can run concurrently. A multicore processor does not deliver n times the performance of a single-core processor, because of socket-to-socket and core-to-core scalability losses.
9. CONCLUSION
In the coming years the trend will move more and more towards multicore processors. The main reason is that they are faster than single-core processors and can still be improved, although they introduce interesting new problems. In the future there will still be some applications for single-core processors, because not every system needs a fast processor. Several new multicore chips are in their design phases, and parallel programming techniques are likely to gain importance.
REFERENCES
[1] L. Hammond, B. A. Nayfeh, K. Olukotun, "A Single-Chip Multiprocessor," IEEE, Sept. 1997.
[2] P. Frost Gorder, "Multicore Processors for Science and Engineering," IEEE CS, March/April 2007.
[3] D. Geer, "Chip Makers Turn to Multicore Processors," Computer, IEEE Computer Society, May 2005.
[4] R. Merritt, "CPU Designers Debate Multi-core Future," EETimes Online, February 2008, http://www.eetimes.com/showArticle.jhtml?articleID=206105179
[5] R. Merritt, "X86 Cuts to the Cores," EETimes Online, September 2007, http://www.eetimes.com/showArticle.jtml?articleID=202100022