Transcript
Slide 1
Unit-6
Slide 2
Contents:
1. Reduced Instruction Set Computers
2. Complex Instruction Set Computers
3. Superscalars
4. Vector Processing
5. Parallel Cluster Computers
6. Distributed Computers
Slide 3
1. RISC (Reduced Instruction Set Computer)
Key features:
- Large number of general-purpose registers, or use of compiler technology to optimize register use
- Limited and simple instruction set
- Emphasis on optimising the instruction pipeline
Driving forces for CISC:
- Software costs far exceed hardware costs
- Increasingly complex high-level languages
- Leads to: large instruction sets, more addressing modes, hardware implementations of HLL statements
Slide 4
Execution Characteristics
- Operations performed
- Operands used
- Execution sequencing
Operations:
- Assignments: movement of data
- Conditional statements (IF, LOOP): sequence control
- Procedure call/return is very time consuming
- Some HLL instructions lead to many machine-code operations
Slide 5
Procedure Calls
- Very time consuming
- Depends on the number of parameters passed
- Depends on the level of nesting
- Most programs do not do a lot of calls followed by lots of returns
- Most variables are local (c.f. locality of reference)
Implications:
- Best support is given by optimising the most used and most time-consuming features
- Large number of registers: operand referencing
- Careful design of pipelines: branch prediction etc.
- Simplified (reduced) instruction set
Slide 6
Large Register File
Software solution:
- Require the compiler to allocate registers
- Allocate based on the most used variables in a given time
- Requires sophisticated program analysis
Hardware solution:
- Have more registers, so more variables will be held in registers
Registers for Local Variables
- Store local scalar variables in registers, reducing memory access
- Every procedure (function) call changes locality
- Parameters must be passed and results returned
- Variables from the calling program must be restored
Slide 7
Register Windows
- Only a few parameters are typically passed
- Limited range of depth of call
- Use multiple small sets of registers
- A call switches to a different set of registers
- A return switches back to the previously used set of registers
Register Windows (cont.)
Three areas within a register set:
- Parameter registers
- Local registers
- Temporary registers
The temporary registers of one set overlap the parameter registers of the next. This allows parameter passing without moving data.
Slide 8
Overlapping Register Windows (circular buffer diagram)
Slide 9
Operation of Circular Buffer
- When a call is made, the current window pointer (CWP) is moved to point at the currently active register window
- If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
- A saved window pointer (SWP) indicates where the next saved window should be restored to
Global Variables
- Allocated by the compiler to memory: inefficient for frequently accessed variables
- Alternative: have a set of registers for global variables
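The call/return behaviour described above can be sketched as a toy model. This is a minimal illustration only: the class, field names, and the four-window size are invented for the sketch, not taken from any real ISA.

```python
class RegisterWindows:
    """Toy model of the register-window circular buffer (illustrative only)."""

    def __init__(self, num_windows=4):
        self.num_windows = num_windows
        self.cwp = 0        # current window pointer
        self.depth = 0      # call nesting depth
        self.spills = 0     # overflows: oldest window saved to memory
        self.restores = 0   # underflows: saved window restored from memory

    def call(self):
        self.depth += 1
        self.cwp = (self.cwp + 1) % self.num_windows
        if self.depth > self.num_windows:
            self.spills += 1    # all windows in use: save the oldest to memory

    def ret(self):
        if self.depth > self.num_windows:
            self.restores += 1  # the window we return to was spilled: reload it
        self.depth -= 1
        self.cwp = (self.cwp - 1) % self.num_windows
```

With 4 windows, six nested calls force two spills to memory, and the matching returns force two restores; shallower call chains never touch memory at all, which is the point of the scheme.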
Slide 10
Registers vs Cache

Large Register File                      | Cache
---------------------------------------- | ----------------------------------------
All local scalars                        | Recently used local scalars
Individual variables                     | Blocks of memory
Compiler-assigned global variables       | Recently used global variables
Save/restore based on procedure nesting  | Save/restore based on caching algorithm
Register addressing                      | Memory addressing
Slide 11
Referencing a Scalar - Window-Based Register File
Referencing a Scalar - Cache
Slide 12
RISC Characteristics
- One instruction per cycle
- Register-to-register operations
- Few, simple addressing modes
- Few, simple instruction formats
- Hardwired design (no microcode)
- Fixed instruction format
- More compile time/effort
Slide 13
RISC Pipelining
Most instructions are register to register, with two phases of execution:
- I: instruction fetch
- E: execute (ALU operation with register input and output)
Load and store need three phases:
- I: instruction fetch
- E: execute (calculate the memory address)
- D: memory (register-to-memory or memory-to-register operation)
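The payoff of the short I/E pipeline can be put in numbers. A quick sketch of the ideal cycle counts (assuming one instruction enters per cycle and no stalls, which real pipelines only approximate):

```python
def serial_cycles(n, stages):
    """Cycles with no pipelining: each instruction runs all its stages alone."""
    return n * stages

def pipelined_cycles(n, stages):
    """Ideal pipeline: one instruction enters per cycle, no stalls assumed."""
    return stages + (n - 1)

# 100 two-phase (I, E) instructions: 200 cycles serial, 101 cycles pipelined,
# so throughput approaches one instruction per cycle as n grows.
```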
Slide 14
Effects of Pipelining
Slide 15
Optimization of Pipelining
- Delayed branch: the branch does not take effect until after execution of the following instruction
- This following instruction occupies the delay slot
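The delay-slot behaviour can be made concrete with a toy interpreter. The instruction format and register names here are invented for the sketch; real delayed-branch machines (and their restrictions on what may sit in the slot) are more involved.

```python
# Toy interpreter with a one-instruction branch delay slot (illustrative).
def run(program):
    regs = {'r1': 0}
    pc = 0
    while 0 <= pc < len(program):
        op = program[pc]
        if op[0] == 'add':
            regs[op[1]] += op[2]
            pc += 1
        elif op[0] == 'jump':
            # Delayed branch: the instruction after the jump (the delay
            # slot) still executes before the branch takes effect.
            slot = program[pc + 1]
            if slot[0] == 'add':
                regs[slot[1]] += slot[2]
            pc = op[1]
    return regs

prog = [
    ('add', 'r1', 1),     # r1 = 1
    ('jump', 3),          # branch to index 3 ...
    ('add', 'r1', 10),    # ... but this delay-slot add still runs: r1 = 11
    ('add', 'r1', 100),   # branch target: r1 = 111
]
```

Running `prog` leaves r1 = 111, showing that the add in the delay slot executed even though it sits "after" the jump in program order.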
Slide 16
CISC and RISC (contd.)
CISC: Complex Instruction Set Computers. Complex instructions involve a large number of steps. In terms of the basic performance equation T = (N x S) / R, if individual instructions perform more complex operations, fewer instructions will be needed, leading to a lower value of N (the dynamic instruction count) but a larger value of S (the average cycles per instruction). Complex instructions combined with pipelining would achieve good performance.
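The N-versus-S trade-off above can be worked through numerically. The instruction counts and cycle counts below are purely illustrative, not measurements:

```python
def exec_time(n, s, r):
    """Basic performance equation T = (N * S) / R:
    N = dynamic instruction count, S = average cycles per instruction,
    R = clock rate in Hz."""
    return n * s / r

# Hypothetical program on a 1 GHz clock:
t_cisc = exec_time(50_000_000, 6, 1_000_000_000)   # fewer, multi-cycle instructions
t_risc = exec_time(200_000_000, 1, 1_000_000_000)  # more, single-cycle instructions
```

Even with four times as many instructions, the single-cycle version finishes sooner here (0.2 s vs 0.3 s), which is exactly the bet the RISC approach makes.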
Slide 17
Complex Instruction Set Computer
Another characteristic of CISC computers is that they have instructions that act directly on memory addresses. For example, ADD L1, L2, L3 takes the contents of M[L1], adds it to the contents of M[L2], and stores the result in location M[L3]. An instruction like this takes three memory access cycles to execute, which makes for a potentially very long instruction execution cycle.
The problems with CISC computers are:
- The complexity of the design may slow down the processor.
- The complexity of the design may result in costly errors in the processor design and implementation.
- Many of the instructions and addressing modes are used rarely.
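The memory-to-memory ADD above can be contrasted with its load/store equivalent. A sketch, with memory modelled as a dict and made-up register names:

```python
def cisc_add(mem, l1, l2, l3):
    """ADD L1, L2, L3 - one complex instruction, three data-memory accesses."""
    mem[l3] = mem[l1] + mem[l2]

def risc_add(mem, regs, l1, l2, l3):
    """The same effect as four simple load/store-style instructions."""
    regs['r1'] = mem[l1]                   # LOAD  r1, L1
    regs['r2'] = mem[l2]                   # LOAD  r2, L2
    regs['r3'] = regs['r1'] + regs['r2']   # ADD   r3, r1, r2  (register-to-register)
    mem[l3] = regs['r3']                   # STORE r3, L3
```

Both perform the same three data-memory accesses, but the RISC version splits them across simple fixed-format instructions that pipeline cleanly, while the CISC version packs them into one long-running instruction.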
Slide 18
Overlapped Register Windows (diagram)
- R0-R9: global registers, common to all procedures
- R10-R15: common to Proc A and Proc D (wrap-around of the circular buffer)
- R16-R25: local to Proc A
- R26-R31: common to Proc A and Proc B
- R32-R41: local to Proc B
- R42-R47: common to Proc B and Proc C
- R48-R57: local to Proc C
- R58-R63: common to Proc C and Proc D
- R64-R73: local to Proc D
Slide 19
Characteristics of RISC
Advantages of RISC:
- VLSI realization
- Computing speed
- Design costs and reliability
- High-level language support
RISC characteristics:
- Relatively few instructions
- Relatively few addressing modes
- Memory access limited to load and store instructions
- All operations done within the registers of the CPU
- Fixed-length, easily decoded instruction format
- Single-cycle instruction execution
- Hardwired rather than microprogrammed control
Slide 20
3. Superscalar Operation
A higher degree of concurrency can be achieved if multiple instruction pipelines are implemented in the processor. This means that multiple functional units are used, creating parallel paths through which different instructions can be executed in parallel. With such an arrangement, it becomes possible to start the execution of several instructions in every clock cycle. This mode of execution is called superscalar operation.
Slide 21
General Superscalar Organization
Superpipelined:
- Many pipeline stages need less than half a clock cycle
- Doubling the internal clock speed gets two tasks done per external clock cycle
Superscalar:
- Allows parallel fetch and execute
Slide 22
Superscalar Vs Superpipeline
Slide 23
Limitations
Instruction-level parallelism is exploited through compiler-based optimisation and hardware techniques, but it is limited by:
- True data dependency: e.g. ADD r1, r2 (r1 := r1 + r2) followed by MOVE r3, r1 (r3 := r1). The second instruction can be fetched and decoded in parallel with the first, but can NOT be executed until the first is finished.
- Procedural dependency: instructions after a branch cannot be executed in parallel with instructions before the branch. Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed, which prevents simultaneous fetches.
- Resource conflicts: two or more instructions requiring access to the same resource at the same time, e.g. two arithmetic instructions. Resources can be duplicated, e.g. two arithmetic units.
- Output dependency
- Antidependency
Slide 24
Effect of Dependencies
Slide 25
4. Vector Computation
Maths problems involving physical processes present particular difficulties for computation:
- Aerodynamics, seismology, meteorology
- Continuous field simulation
- High precision
- Repeated floating-point calculations on large arrays of numbers
Supercomputers handle these types of problem:
- Hundreds of millions of flops
- $10-15 million
- Optimised for calculation rather than multitasking and I/O
- Limited market: research, government agencies, meteorology
Array processor:
- Alternative to the supercomputer
- Configured as a peripheral to a mainframe or minicomputer
- Runs just the vector portion of problems
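The core idea behind vector hardware is that one vector instruction replaces a whole scalar loop over array elements. A conceptual sketch (plain Python stands in for hardware; a real vector unit pipelines or parallelises the per-element adds):

```python
def scalar_add(a, b):
    """Scalar style: one fetch/decode/execute per element."""
    out = []
    for x, y in zip(a, b):   # one ADD issued per loop iteration
        out.append(x + y)
    return out

def vector_add(a, b):
    """Vector style: conceptually a single ADDV over whole operand arrays."""
    return [x + y for x, y in zip(a, b)]
```

Both produce the same result; the win on real hardware is that the vector form pays instruction fetch and decode once for the entire array instead of once per element.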
Slide 26
Approaches to Vector Computation
Slide 27
5. Parallel Cluster
Multiple Processor Organization (Flynn's taxonomy):
- Single instruction, single data stream (SISD)
- Single instruction, multiple data stream (SIMD)
- Multiple instruction, single data stream (MISD)
- Multiple instruction, multiple data stream (MIMD)
Slide 28
Taxonomy of Parallel Processor Architectures
Slide 29
Loosely Coupled - Clusters
- Collection of independent uniprocessors or SMPs
- Interconnected to form a cluster
- Communication via fixed path or network connections
Parallel Organizations - SISD
Organization Classification:
- Time-shared or common bus
- Multiport memory
- Central control unit
Symmetric Multiprocessor Organization
Slide 33
Clusters
- Alternative to SMP
- High performance, high availability
- Server applications
- A group of interconnected whole computers working together as a unified resource
- Illusion of being one machine
- Each computer is called a node
Cluster benefits:
- Absolute scalability
- Incremental scalability
- High availability
- Superior price/performance
Slide 34
Cluster Configurations - Shared Disk
Cluster Configurations - Standby Server, No Shared Disk