Computing System History/Trends + Instruction Set Architecture (ISA) Fundamentals
• Computing Element Choices:
  – Computing Element Programmability
  – Spatial vs. Temporal Computing
  – Main Processor Types/Applications
• General Purpose Processor Generations
• The Von Neumann Computer Model
• CPU Organization (Design)
• Recent Trends in Computer Design/Performance
• Hierarchy of Computer Architecture
• Hardware Description: Register Transfer Notation (RTN)
• Computer Architecture Vs. Computer Organization
• Instruction Set Architecture (ISA):
  – Definition and purpose
  – ISA Specification Requirements
  – Main General Types of Instructions
  – ISA Types and Characteristics
  – Typical ISA Addressing Modes
  – Instruction Set Encoding
  – Instruction Set Architecture Tradeoffs
  – Complex Instruction Set Computer (CISC)
  – Reduced Instruction Set Computer (RISC)
  – Evolution of Instruction Set Architectures
Computing Element Choices
• General Purpose Processors (GPPs): Intended for general purpose computing (desktops, servers, clusters, ...).
• Application-Specific Processors (ASPs): Processors with ISAs and architectural features tailored towards specific application domains.
  – E.g. Digital Signal Processors (DSPs), Network Processors (NPs), Media Processors, Graphics Processing Units (GPUs), Vector Processors (?) ...
• Co-Processors: A hardware (hardwired) implementation of specific algorithms with a limited programming interface (augment GPPs or ASPs).
• Configurable Hardware:
  – Field Programmable Gate Arrays (FPGAs)
  – Configurable array of simple processing elements
• Application Specific Integrated Circuits (ASICs): A custom VLSI hardware solution for a specific computational task.
• The choice of one or more depends on a number of factors, including:
  – Type and complexity of computational algorithm (general purpose vs. specialized)
  – Desired level of flexibility/programmability
  – Performance requirements
  – Development cost/time
  – System cost
  – Power requirements
  – Real-time constraints
The main goal of this course is the study of fundamental design techniques for General Purpose Processors
[Figure: Computing element choices (GPPs, ASPs, Co-Processors, Configurable Hardware, ASICs) compared by Programmability vs. Specialization, Development cost/time, and Performance/Chip Area/Watt (Computational Efficiency), with the selection factors listed above.]
Processor: Programmable computing element that runs programs written using a pre-defined set of instructions.

Main Processor Types/Applications
• General Purpose Computing & General Purpose Processors (GPPs)
  – High performance: In general, faster is always better.
  – RISC or CISC: Intel P4, IBM Power4, SPARC, PowerPC, MIPS ...
  – Used for general purpose software
  – End-user programmable
  – Real-time performance may not be fully predictable (due to dynamic architectural features)
  – Heavyweight, multi-tasking OS: Windows, UNIX
  – Normally, low cost and low power are not requirements (changing)
  – Servers, Workstations, Desktops (PCs), Notebooks, Clusters ...
• Embedded Processing: Embedded processors and processor cores
  – Cost, power, code-size and real-time requirements and constraints
  – Once real-time constraints are met, a faster processor may not be better
  – E.g.: Intel XScale, ARM, 486SX, Hitachi SH7000, NEC V800 ...
  – Often require digital signal processing (DSP) support or other application-specific support (e.g. network, media processing)
  – Single or few specialized programs, known at system design time
  – Not end-user programmable
  – Real-time performance must be fully predictable (avoid dynamic architectural features)
  – Lightweight, often real-time OS or no OS
  – Examples: Cellular phones, consumer electronics ...
• Microcontrollers
  – Extremely code size/cost/power sensitive
  – Single program
  – Small word size: 8 bit common
  – Usually no OS
  – Highest volume processors by far
  – Examples: Control systems, automobiles, industrial control, thermostats, ...
[Figure: the three processor types above arranged along arrows labeled "Increasing Cost/Complexity" and "Increasing Volume".]
Examples of Application-Specific Processors (ASPs)
The Von Neumann Computer Model
• Partitioning of the programmable computing engine into components:
  – Central Processing Unit (CPU): Control Unit (instruction decode, sequencing of operations), Datapath (registers, arithmetic and logic unit, connections, buses ...).
  – Memory: Instruction (program) and operand (data) storage.
  – Input/Output (I/O) sub-system: I/O bus, interfaces, devices.
  – The stored program concept: Instructions from an instruction set are fetched from a common memory and executed one at a time.
[Figure: Computer system: CPU (Control; Datapath: registers, ALU, buses), Memory (instructions, data), and I/O devices (Input, Output).]
Major CPU performance limitation: The Von Neumann computing model implies sequential execution, one instruction at a time.
Another performance limitation: Separation of CPU and memory (the Von Neumann memory bottleneck).
The Program Counter (PC) points to the next instruction to be processed.
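The stored program concept and the sequential PC can be illustrated with a tiny interpreter. This is a minimal sketch in Python, assuming a hypothetical four-instruction accumulator ISA (LOAD, ADD, STORE, HALT) that does not come from the slides; it only shows that instructions and data share one memory and that the PC advances one instruction at a time.

```python
# Minimal sketch of a stored-program (Von Neumann) machine.
# Assumptions (hypothetical, not from the slides): a tiny accumulator ISA with
# opcodes LOAD, ADD, STORE, HALT, and one memory word per list element.

def run(memory):
    """Fetch and execute instructions one at a time from the shared memory."""
    acc = 0   # accumulator register in the datapath
    pc = 0    # Program Counter: address of the next instruction
    while True:
        opcode, addr = memory[pc]   # fetch: instructions come from the same memory as data
        pc += 1                     # sequential execution: advance to the next instruction
        if opcode == "LOAD":
            acc = memory[addr]
        elif opcode == "ADD":
            acc = acc + memory[addr]
        elif opcode == "STORE":
            memory[addr] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

# Addresses 0-3 hold the program, addresses 4-6 hold data: one common memory.
program_and_data = [
    ("LOAD", 4),   # 0: acc <- M[4]
    ("ADD", 5),    # 1: acc <- acc + M[5]
    ("STORE", 6),  # 2: M[6] <- acc
    ("HALT", 0),   # 3: stop (address field unused)
    2,             # 4: operand A
    3,             # 5: operand B
    0,             # 6: result C
]
print(run(program_and_data)[6])  # prints 5
```

Because a program is just data in the same memory, every instruction fetch competes with operand accesses for the CPU-memory path, which is the memory bottleneck mentioned above.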
CPU Organization (Design)
• Datapath Design:
  – Capabilities & performance characteristics of principal Functional Units (FUs) needed by ISA instructions (e.g., Registers, ALU, Shifters, Logic Units, ...).
  – Ways in which these components are interconnected (buses, connections, multiplexors, etc.).
  – How information flows between components.
• Control Unit Design:
  – Logic and means by which such information flow is controlled.
  – Control and coordination of FU operation to realize the targeted Instruction Set Architecture (can be implemented using either a finite state machine or a microprogram).
• Hardware description with a suitable language, possibly using Register Transfer Notation (RTN).
[Figure: Datapath = components & their connections needed by ISA instructions; Control = control/sequencing of operations of datapath components to realize ISA instructions.]
ISA = Instruction Set Architecture. The ISA forms an abstraction layer that sets the requirements for both compiler and CPU designers.
• CPU Core: 1 GHz - 3.8 GHz, 4-way superscalar, RISC or RISC-core (x86):
  – Deep instruction pipelines
  – Dynamic scheduling
  – Multiple FP, integer FUs
  – Dynamic branch prediction
  – Hardware speculation
• Cache hierarchy (L1, L2, L3) and memory bus; all caches non-blocking:
  – L1: 16-128K, 1-2 way set associative (on chip), separate or unified
  – L2: 256K-2M, 4-32 way set associative (on chip), unified
  – L3: 2-16M, 8-32 way set associative (off or on chip), unified
Computer Technology Trends: Evolutionary but Rapid Change
• Processor:
  – 1.5-1.6x performance improvement every year; over 100X performance in the last decade.
• Memory:
  – DRAM capacity: > 2x every 1.5 years; 1000X size in the last decade.
  – Cost per bit: Improves about 25% or more per year.
  – Only 15-25% performance improvement per year.
• Disk:
  – Capacity: > 2X in size every 1.5 years.
  – Cost per bit: Improves about 60% per year.
  – 200X size in the last decade.
  – Only 10% performance improvement per year, due to mechanical limitations.
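Yearly improvement factors compound over a decade. A short Python sketch of the arithmetic for the processor figures: 1.5x per year compounds to about 58x in ten years, while 1.6x per year gives about 110x, so the "over 100X" figure corresponds to the upper end of the quoted yearly range.

```python
# Compound growth: a constant yearly improvement factor applied over several years.
def compound_growth(factor_per_year: float, years: int) -> float:
    """Total improvement after `years` years of `factor_per_year` growth."""
    return factor_per_year ** years

for f in (1.5, 1.6):
    # 1.5x/year gives about 58x per decade; 1.6x/year gives about 110x per decade
    print(f"{f}x per year for 10 years -> {compound_growth(f, 10):.1f}x")
```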
Computer Architecture Vs. Computer Organization
• The term computer architecture is sometimes erroneously restricted to computer instruction set design, with other aspects of computer design called implementation.
• More accurate definitions:
  – Instruction Set Architecture (ISA): The actual programmer-visible instruction set; it serves as the boundary or interface between the software and hardware.
  – Implementation of a machine has two components:
    • Organization: includes the high-level aspects of a computer's design, such as the memory system, the bus structure, and the internal CPU unit, which includes implementations of arithmetic, logic, branching, and data transfer operations.
    • Hardware: Refers to the specifics of the machine, such as detailed logic design and packaging technology.
• In general, Computer Architecture refers to the above three aspects:
  1- Instruction set architecture
  2- Organization
  3- Hardware
Instruction Set Architecture (ISA)
“... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.” – Amdahl, Blaauw, and Brooks, 1964.
The instruction set architecture is concerned with:
• Organization of programmable storage (memory & registers): Includes the amount of addressable memory and number of available registers.
• Data Types & Data Structures: Encodings & representations.
• Instruction Set: What operations are specified.
• Instruction formats and encoding.
• Modes of addressing and accessing data items and instructions
• Exceptional conditions.
Computer Instruction Sets
• Regardless of computer type, CPU structure, or hardware organization, every machine instruction must specify the following:
– Opcode: Which operation to perform. Example: add, load, and branch.
– Where to find the operand or operands, if any: Operands may be contained in CPU registers, main memory, or I/O ports.
– Where to put the result, if there is a result: May be explicitly mentioned or implicit in the opcode.
– Where to find the next instruction: Without an explicit branch, the next instruction is the next one in sequence; for jump or branch instructions, it is at a specified address.
Opcode = Operation Code
Operand locations can be explicitly specified in the instruction or implied.
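The "where to find the next instruction" rule can be sketched as a small function. This is a minimal illustration assuming byte-addressed memory and a known instruction size; the function name and its parameters are hypothetical, not part of any particular ISA.

```python
# Minimal sketch of "where to find the next instruction", assuming
# byte-addressed memory and a known instruction size (hypothetical parameters).
def next_pc(pc: int, instr_size: int, is_branch: bool = False,
            taken: bool = False, target: int = 0) -> int:
    """Return the address of the next instruction to fetch."""
    if is_branch and taken:
        return target          # jump/taken branch: explicitly specified address
    return pc + instr_size     # default: next instruction in sequence

print(next_pc(100, 4))                                          # 104
print(next_pc(100, 4, is_branch=True, taken=True, target=40))   # 40
print(next_pc(100, 4, is_branch=True, taken=False, target=40))  # 104
```

The sequential case is implicit in the opcode (no address field is spent on it), which is why most ISAs only encode a next-instruction address in jump and branch instructions.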
Instruction Set Architecture (ISA) Specification Requirements
[Figure: Instruction cycle: Instruction Fetch → Instruction Decode → Operand Fetch → Execute → Result Store → Next Instruction]
• Instruction Format or Encoding:
  – How is it decoded?
• Location of operands and result (addressing modes):
  – Where other than memory?
  – How many explicit operands?
  – How are memory operands located?
  – Which can or cannot be in memory?
• Data type and size.
• Operations:
  – What are supported?
• Successor instruction:
  – Jumps, conditions, branches.
• Fetch-decode-execute is implicit.
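The six steps of the instruction cycle can be traced in a short interpreter loop. This is a sketch under stated assumptions: instructions are hypothetical tuples (opcode, dest, src1, src2) on a small register file, and only "add", "sub" and "halt" exist; the comments mark where each step of the cycle happens.

```python
# Sketch of the implicit fetch-decode-execute cycle, one step per stage.
# Assumptions (hypothetical): instructions are tuples (opcode, dest, src1, src2)
# operating on a small register file; only "add", "sub", and "halt" exist.

def run_cycle(program, regs):
    trace = []
    pc = 0
    while True:
        instr = program[pc]                 # 1. Instruction Fetch
        trace.append("fetch")
        opcode, dest, src1, src2 = instr    # 2. Instruction Decode
        trace.append("decode")
        if opcode == "halt":
            return regs, trace
        a, b = regs[src1], regs[src2]       # 3. Operand Fetch
        trace.append("operand fetch")
        if opcode == "add":                 # 4. Execute
            result = a + b
        elif opcode == "sub":
            result = a - b
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        trace.append("execute")
        regs[dest] = result                 # 5. Result Store
        trace.append("result store")
        pc += 1                             # 6. Next Instruction (sequential)
        trace.append("next")

regs = {"R1": 0, "R2": 7, "R3": 5}
program = [("add", "R1", "R2", "R3"),   # R1 <- R2 + R3
           ("sub", "R1", "R1", "R3"),   # R1 <- R1 - R3
           ("halt", None, None, None)]
final, trace = run_cycle(program, regs)
print(final["R1"])  # 7: (7 + 5) - 5
print(trace[:6])    # ['fetch', 'decode', 'operand fetch', 'execute', 'result store', 'next']
```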
Types of Instruction Set Architectures According To Operand Memory Addressing Fields
• Memory-To-Memory Machines:
  – Operands obtained from memory and results stored back in memory by any instruction that requires operands.
  – No local CPU registers are used in the CPU datapath.
  – Include:
    • The 4-address Machine.
    • The 3-address Machine.
    • The 2-address Machine.
• The 1-address (Accumulator) Machine:
  – A single local CPU special-purpose register (accumulator) is used as the source of one operand and as the result destination.
• The 0-address or Stack Machine:
  – A push-down stack is used in the CPU.
• General Purpose Register (GPR) Machines:
  – The CPU datapath contains several local general-purpose registers which can be used as operand sources and as result destinations.
  – A large number of possible addressing modes.
  – Load-Store or Register-To-Register Machines: GPR machines where only data movement instructions (loads, stores) can obtain operands from memory and store results to memory.
CISC to RISC observation (load-store simplifies CPU design)
Machine = ISA or CPU targeting a specific ISA type
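Of the machine types above, the 0-address (stack) machine is the least familiar. A minimal Python sketch, assuming hypothetical push/pop/add/mul mnemonics and a stack held in the CPU: arithmetic instructions name no operands at all, and only push and pop carry a memory address.

```python
# Minimal sketch of a 0-address (stack) machine: arithmetic instructions name
# no operands; they pop their sources from, and push the result onto, a
# push-down stack in the CPU. Opcode names are hypothetical.

def run_stack(program, memory):
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":          # push M[addr] onto the stack
            stack.append(memory[instr[1]])
        elif op == "pop":         # pop the top of stack into M[addr]
            memory[instr[1]] = stack.pop()
        elif op == "add":         # 0-address: both operands are implicit
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory

mem = {"A": 2, "B": 3, "C": 4, "D": 0}
# D = (A + B) * C
run_stack([("push", "A"), ("push", "B"), ("add",),
           ("push", "C"), ("mul",), ("pop", "D")], mem)
print(mem["D"])  # 20
```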
Types of Instruction Set Architectures: Memory-To-Memory Machines
The 4-Address Machine/ISA
• No program counter (PC) or other CPU registers are used.
• Instruction encoding has four address fields to specify:
  – Location of first operand.
  – Location of second operand.
  – Place to store the result.
  – Location of next instruction.
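Because every 4-address instruction carries the address of its successor, the machine needs no PC register. A minimal sketch, assuming a hypothetical instruction layout (op, src1, src2, dest, next_addr), a dict as memory, and None as a "halt" marker; the starting address has to be supplied from outside.

```python
# Minimal sketch of a 4-address machine: each instruction names both operand
# addresses, the result address, and the address of the next instruction, so
# no PC register is needed. The instruction layout and opcodes are hypothetical.

def run_4addr(memory, start):
    addr = start
    while addr is not None:          # None marks "halt" in this sketch
        op, src1, src2, dest, next_addr = memory[addr]
        if op == "add":
            memory[dest] = memory[src1] + memory[src2]
        elif op == "sub":
            memory[dest] = memory[src1] - memory[src2]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        addr = next_addr             # the next instruction's location comes from the instruction itself
    return memory

memory = {
    100: ("add", 200, 201, 202, 150),   # M[202] <- M[200] + M[201]; next at 150
    150: ("sub", 202, 201, 203, None),  # M[203] <- M[202] - M[201]; then halt
    200: 9, 201: 4, 202: 0, 203: 0,
}
run_4addr(memory, 100)
print(memory[202], memory[203])  # 13 9
```

Every operand, the result, and the successor are all memory addresses here, which is why such instructions are long and why each one makes several memory accesses.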
Types of Instruction Set Architectures: The 1-address (Accumulator) Machine/ISA
• A single register (accumulator) in the CPU is used as the source of one operand and result destination.
Instruction: add Op1
Meaning: Acc ← Acc + Op1
or, in more precise RTN: Acc ← Acc + M[Op1Addr]; PC ← PC + 4
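The RTN for "add Op1" translates almost line for line into code. A minimal sketch, assuming 4-byte instructions (for example an 8-bit opcode plus a 24-bit address, which is what PC ← PC + 4 implies); the State class and add_op1 function are hypothetical names for illustration.

```python
# Minimal sketch of the accumulator ISA's "add Op1" in RTN form:
#   Acc <- Acc + M[Op1Addr];  PC <- PC + 4
# Assumption: each instruction is 4 bytes (e.g. 8-bit opcode + 24-bit address),
# which is what the PC + 4 step implies; the State class is hypothetical.
from dataclasses import dataclass, field

@dataclass
class State:
    acc: int = 0
    pc: int = 0
    mem: dict = field(default_factory=dict)

def add_op1(s: State, op1_addr: int) -> None:
    s.acc = s.acc + s.mem[op1_addr]   # Acc <- Acc + M[Op1Addr]
    s.pc = s.pc + 4                   # PC  <- PC + 4

s = State(acc=10, pc=0x100, mem={0x2000: 32})
add_op1(s, 0x2000)
print(s.acc, hex(s.pc))  # 42 0x104
```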
Types of Instruction Set Architectures: General Purpose Register (GPR) Machines
• CPU contains several general-purpose registers which can be used as operand sources and result destination.
Eight general purpose registers (GPRs) are assumed here: R1-R8.
[Figure: CPU with Program Counter (PC) and registers R1-R8; memory holding operand Op1 at Op1Addr and the next instruction (Nexti) at NextiAddr; 24-bit memory addresses.]

Instruction: load R8, Op1
Meaning: R8 ← M[Op1Addr]; PC ← PC + 5
Instruction format: opcode load (8 bits) | R8 (3 bits) | Op1Addr, where to find operand 1 (24 bits)
Size = 35 bits = 4.375 bytes, rounded up to 5 bytes

Instruction: add R2, R4, R6
Meaning: R2 ← R4 + R6; PC ← PC + 3
Instruction format: opcode add (8 bits) | destination R2 (3 bits) | operands R4 (3 bits), R6 (3 bits)
Size = 17 bits = 2.125 bytes, rounded up to 3 bytes

Instruction: store R2, Op2
Meaning: M[Op2Addr] ← R2; PC ← PC + 5
Instruction format: opcode store (8 bits) | R2 (3 bits) | destination address Op2Addr (24 bits)
Size = 35 bits = 4.375 bytes, rounded up to 5 bytes

Here the add instruction has three register specifier fields, while the load and store instructions have one register specifier field and one memory address specifier field.
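The GPR instruction sizes follow directly from the field widths. A Python sketch that packs the fields into an integer and rounds the total up to whole bytes; the widths (8-bit opcode, 3-bit register specifiers, 24-bit address) are from the slide, while the opcode numbers, the R1→0 ... R8→7 register mapping, and the sample address are hypothetical.

```python
# Sketch: pack the GPR instruction formats into bit fields and compute their
# sizes. Field widths come from the slide (8-bit opcode, 3-bit register
# specifiers for R1-R8, 24-bit memory address); the opcode numbers and the
# register-number mapping (R1 -> 0 ... R8 -> 7) are hypothetical.
import math

OPCODES = {"load": 0x01, "store": 0x02, "add": 0x03}  # hypothetical values

def reg(name: str) -> int:
    """Map R1..R8 to a 3-bit register number 0..7."""
    n = int(name[1:])
    if not 1 <= n <= 8:
        raise ValueError(f"no such register {name!r}")
    return n - 1

def encode(fields):
    """fields: list of (value, width_in_bits); returns (packed int, total bits)."""
    word, bits = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value too wide")
        word = (word << width) | value
        bits += width
    return word, bits

def size_bytes(bits: int) -> int:
    return math.ceil(bits / 8)   # instructions occupy whole bytes

_, load_bits = encode([(OPCODES["load"], 8), (reg("R8"), 3), (0x001000, 24)])
_, add_bits = encode([(OPCODES["add"], 8), (reg("R2"), 3), (reg("R4"), 3), (reg("R6"), 3)])
print(load_bits, load_bits / 8, size_bytes(load_bits))  # 35 4.375 5
print(add_bits, add_bits / 8, size_bytes(add_bits))     # 17 2.125 3
```

The rounded byte sizes are exactly the PC increments in the RTN: PC ← PC + 5 for load/store and PC ← PC + 3 for add.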
Instruction Set Architecture Tradeoffs
• 3-address machine: Shortest code sequence; a large number of bits per instruction; large number of memory accesses.
• 0-address (stack) machine: Longest code sequence; shortest individual instructions; more complex to program.
• General purpose register (GPR) machine:
  – Addressing is modified by selecting among a small set of registers using a short register address (all new ISAs since 1975).
  – Advantages of GPR:
    • Low number of memory accesses. Faster, since register access is currently still much faster than memory access.
    • Registers are easier for compilers to use.
    • Shorter, simpler instructions.
• Load-Store Machines: GPR machines where memory addresses are only included in data movement instructions (loads/stores) between memory and registers (all new ISAs designed after 1980).
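To make the code-sequence tradeoff concrete, the sketch below writes the statement C = A + B for four ISA types and counts instructions and data memory accesses. The mnemonics are hypothetical and the stack is assumed to live in the CPU. For a single statement every variant makes three data memory accesses; the GPR advantage of fewer memory accesses appears when values stay in registers and are reused across statements.

```python
# Sketch: the statement C = A + B coded for four ISA types, counting
# instructions (code-sequence length) and data memory accesses.
# Mnemonics are hypothetical; the stack is assumed to live in the CPU, so
# a stack "add" touches no memory. The second tuple element is the number
# of memory operands each instruction reads or writes.
SEQUENCES = {
    "3-address (memory-memory)": [("add C, A, B", 3)],
    "1-address (accumulator)":   [("load A", 1), ("add B", 1), ("store C", 1)],
    "0-address (stack)":         [("push A", 1), ("push B", 1), ("add", 0), ("pop C", 1)],
    "load-store (GPR)":          [("load R1, A", 1), ("load R2, B", 1),
                                  ("add R3, R1, R2", 0), ("store R3, C", 1)],
}

def summarize(seq):
    """Return (number of instructions, total data memory accesses)."""
    return len(seq), sum(mem for _, mem in seq)

for isa, seq in SEQUENCES.items():
    n, m = summarize(seq)
    print(f"{isa:28} instructions={n} memory accesses={m}")
```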
ISA Examples
Machine | Number of General Purpose Registers | Architecture | Year
EDSAC
IBM 701
CDC 6600
IBM 360
DEC PDP-8
DEC PDP-11
Intel 8008
Motorola 6800
DEC VAX
Complex Instruction Set Computer (CISC)
• Emphasizes doing more with each instruction:
  – Thus fewer instructions per program (more compact code).
• Motivated by the high cost of memory and hard disk capacity when original CISC architectures were proposed:
  – When M6800 was introduced: 16K RAM = $500, 40M hard disk = $55,000
  – When MC68000 was introduced: 64K RAM = $200, 10M HD = $5,000
• Original CISC architectures evolved with faster, more complex CPU designs, but backward instruction set compatibility had to be maintained (e.g. x86).
• Wide variety of addressing modes:
  – 14 in MC68000, 25 in MC68020
• A number of instruction modes for the location and number of operands:
Example CISC ISAs: Motorola 680X0
• 18 addressing modes:
  – Data register direct.
  – Address register direct.
  – Immediate.
  – Absolute short.
  – Absolute long.
  – Address register indirect.
  – Address register indirect with postincrement.
  – Address register indirect with predecrement.
  – Address register indirect with displacement.
  – Address register indirect with index (8-bit).
  – Address register indirect with index (base).
  – Memory indirect postindexed.
  – Memory indirect preindexed.
  – Program counter indirect with index (8-bit).
  – Program counter indirect with index (base).
  – Program counter indirect with displacement.
  – Program counter memory indirect postindexed.
  – Program counter memory indirect preindexed.
• Operand size:
  – Range from 1 to 32 bits; 1, 2, 4, 8, 10, or 16 bytes.
• Instruction Encoding:
  – Instructions are stored in 16-bit words.
  – The smallest instruction is 2 bytes (one word).
  – The longest instruction is 5 words (10 bytes) in length.
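Several of the 680X0 addressing modes listed above differ only in how the effective address (EA) is formed and whether the address register is updated. A sketch under a simplified model: 32-bit address registers held in a dict, byte-addressed memory, and an operand size in bytes for the increment/decrement modes. Real 680X0 corner cases (e.g. the stack pointer staying word-aligned for byte operands) are deliberately ignored.

```python
# Sketch of effective-address (EA) computation for a few 680X0 addressing
# modes. Assumptions: a simplified model with 32-bit address registers in a
# dict, byte-addressed memory, and an operand size in bytes for the
# increment/decrement modes. Real 680X0 details (e.g. the stack pointer
# staying word-aligned for byte operands) are ignored here.

MASK32 = 0xFFFF_FFFF

def ea_indirect(regs, an):
    """Address register indirect: EA = An."""
    return regs[an]

def ea_postincrement(regs, an, size):
    """Address register indirect with postincrement: EA = An, then An <- An + size."""
    ea = regs[an]
    regs[an] = (regs[an] + size) & MASK32
    return ea

def ea_predecrement(regs, an, size):
    """Address register indirect with predecrement: An <- An - size, then EA = An."""
    regs[an] = (regs[an] - size) & MASK32
    return regs[an]

def ea_displacement(regs, an, d16):
    """Address register indirect with displacement: EA = An + sign-extended 16-bit d16."""
    if d16 & 0x8000:            # sign-extend the 16-bit displacement
        d16 -= 0x10000
    return (regs[an] + d16) & MASK32

regs = {"A0": 0x1000}
print(hex(ea_postincrement(regs, "A0", 4)), hex(regs["A0"]))  # 0x1000 0x1004
print(hex(ea_predecrement(regs, "A0", 2)), hex(regs["A0"]))   # 0x1002 0x1002
print(hex(ea_displacement(regs, "A0", 0xFFFE)))               # 0x1000
```

Postincrement and predecrement pair naturally as pop and push on a stack that grows downward, one reason CISC designs offered them as dedicated modes.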