Transcript
Page 1: Computer Architecture I Lecture Presentation 1

Computer Architecture I

Lecture Notes

Dr. Ali Muhtaroğlu

Fall 2009

METU Northern Cyprus Campus

References:
Patterson & Hennessy, “Computer Organization and Design” (4th Ed.), Morgan Kaufmann, 2008.
Stallings, “Computer Organization & Architecture” (7th Ed.), Pearson, 2006.
Mano & Kime, “Logic and Computer Design Fundamentals” (4th Ed.), Prentice Hall, 2008.
Brown & Vranesic, “Fundamentals of Digital Logic with VHDL Design” (2nd Ed.), McGraw Hill, 2005.
Dr. Patterson’s and Dr. Mary Jane Irwin’s (Penn State) lecture notes.

Page 2: Computer Architecture I Lecture Presentation 1

Introduction

Lecture 1


Page 3: Computer Architecture I Lecture Presentation 1

What computers were…


EDSAC, University of Cambridge, UK, 1949

Page 4: Computer Architecture I Lecture Presentation 1

What computers are…

[Image collage of today’s computing devices: games, sensor nets, cameras, laptops, media players, robots, smart phones, routers, servers, supercomputers, automobiles.]

Page 5: Computer Architecture I Lecture Presentation 1

What is Computer Architecture?

[Diagram: “Application” at the top and “Physics” at the bottom; the gap between them is too large to bridge in one step (but there are exceptions, e.g. the magnetic compass).]

In its broadest definition, computer architecture is the design of the abstraction layers that allow us to implement information processing applications efficiently using available manufacturing technologies.

Page 6: Computer Architecture I Lecture Presentation 1

Abstraction Layers in Modern Computing Systems

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Gates/Register-Transfer Level (RTL)
Circuits
Devices
Physics

The slide marks the original domain of the computer architect (‘50s-’80s) around the middle of this stack, and the broader domain of recent computer architecture (‘90s) extending over more of the layers.

Page 7: Computer Architecture I Lecture Presentation 1

Computer Architecture vs. Computer Organization

• Architecture is those attributes visible to the programmer
  – Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
  – e.g. Is there a multiply instruction?
• Organization is how features are implemented
  – Control signals, interfaces, memory technology.
  – e.g. Is there a hardware multiply unit, or is multiplication done by repeated addition? (see the sketch below)

Caution: You may hear these terms used interchangeably in industry.
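To make the multiply example concrete, here is a minimal C sketch (an illustration, not from the slides): the same architectural multiply can be realized by a hardware multiply unit executing one instruction, or, in a simpler organization, by repeated addition.

    #include <stdio.h>

    /* One multiply operation: on most ISAs this compiles to a single
       multiply instruction, executed by a hardware multiply unit. */
    static unsigned mul_direct(unsigned a, unsigned b) {
        return a * b;
    }

    /* The same result by repeated addition, the way a simpler organization
       (with no hardware multiplier) could implement the same instruction. */
    static unsigned mul_repeated_add(unsigned a, unsigned b) {
        unsigned product = 0;
        while (b-- > 0)
            product += a;          /* add a to the running total, b times */
        return product;
    }

    int main(void) {
        printf("%u %u\n", mul_direct(6, 7), mul_repeated_add(6, 7));  /* 42 42 */
        return 0;
    }

Either organization satisfies the same architecture; they differ only in speed and hardware cost.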

Page 8: Computer Architecture I Lecture Presentation 1

Computer Architecture vs. Computer Organization

• All members of the Intel x86 family share the same basic architecture
• The IBM System/370 family shares the same basic architecture
• This provides backwards code compatibility
  – Not necessarily forward compatibility
• Organization differs between different versions of computers
• Hardware gets cheaper and more compact ⇒ do more in hardware
• Performance, power dissipation, and size factors also drive changes in organization

Page 9: Computer Architecture I Lecture Presentation 1

Historical: ENIAC - background

• Electronic Numerical Integrator And Computer
• Eckert and Mauchly, University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
  – Too late for war effort
• Used until 1955
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second

Page 10: Computer Architecture I Lecture Presentation 1

von Neumann/Turing

• Stored program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing
• Input and output equipment operated by control unit
• Princeton Institute for Advanced Studies
  – IAS
• Completed 1952

Page 11: Computer Architecture I Lecture Presentation 1

IAS - details

• 1,000 x 40-bit words
  – Binary numbers
  – 2 x 20-bit instructions per word (see the sketch below)
• Set of registers (storage in CPU)
  – Memory Buffer Register
  – Memory Address Register
  – Instruction Register
  – Instruction Buffer Register
  – Program Counter
  – Accumulator
  – Multiplier Quotient
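As an aside (not from the slides), a small C sketch of how two 20-bit instructions fit into one 40-bit IAS word; the left/right field layout here is only illustrative.

    #include <stdint.h>
    #include <stdio.h>

    #define INSTR_BITS 20
    #define INSTR_MASK ((1u << INSTR_BITS) - 1)     /* low 20 bits: 0xFFFFF */

    /* Pack two 20-bit instructions into one 40-bit word (held in 64 bits). */
    static uint64_t pack_word(uint32_t left, uint32_t right) {
        return ((uint64_t)(left & INSTR_MASK) << INSTR_BITS) | (right & INSTR_MASK);
    }

    static uint32_t left_instr(uint64_t word)  { return (uint32_t)(word >> INSTR_BITS) & INSTR_MASK; }
    static uint32_t right_instr(uint64_t word) { return (uint32_t)word & INSTR_MASK; }

    int main(void) {
        uint64_t w = pack_word(0xABCDE, 0x12345);
        printf("left=%05X right=%05X\n", (unsigned)left_instr(w), (unsigned)right_instr(w));
        return 0;                       /* prints: left=ABCDE right=12345 */
    }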


Page 13: Computer Architecture I Lecture Presentation 1

Components of a Computer

• 5 classic components of a computer
• Input, Output, Memory, Datapath, Control
• Independent of hardware technology
• Represents the past and the present

Page 14: Computer Architecture I Lecture Presentation 1

Generations of Computers

• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
  – Up to 100 devices on a chip
• Medium scale integration - to 1971
  – 100-3,000 devices on a chip
• Large scale integration - 1971-1977
  – 3,000-100,000 devices on a chip
• Very large scale integration - 1978-1991
  – 100,000-100,000,000 devices on a chip
• Ultra large scale integration - 1991-
  – Over 100,000,000 devices on a chip

Page 15: Computer Architecture I Lecture Presentation 1

Moore’s Law

• Increased density of components on chip
• Gordon Moore – co-founder of Intel
• Number of transistors on a chip will double every year
• Since the 1970s development has slowed a little
  – Number of transistors doubles every 18 months (see the sketch below)
• Cost of a chip has remained almost unchanged

For the same structure/function:
• Higher packing density means shorter electrical paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increase reliability
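A quick arithmetic check of the 18-month doubling rate quoted above (an illustrative sketch, not from the slides):

    #include <math.h>
    #include <stdio.h>

    /* Growth factor implied by doubling the transistor count every 18 months. */
    static double moore_growth(double years) {
        return pow(2.0, years / 1.5);
    }

    int main(void) {
        printf("after 10 years: about %.0fx\n", moore_growth(10.0));   /* ~102x */
        return 0;
    }

So the rule of thumb predicts roughly a 100x increase in transistor count per decade.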

Page 16: Computer Architecture I Lecture Presentation 1

Growth in CPU Transistor Count


Page 17: Computer Architecture I Lecture Presentation 1

Speeding it up

• Pipelining
• On-board cache
• On-board L1 & L2 cache
• Branch prediction
• Data flow analysis
• Speculative execution

Page 18: Computer Architecture I Lecture Presentation 1

Performance Balance

• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor speed

Page 19: Computer Architecture I Lecture Presentation 1

Logic and Memory Performance Gap


Page 20: Computer Architecture I Lecture Presentation 1

Solutions to Logic/Memory Gap

• Increase number of bits retrieved at one time
  – Make DRAM “wider” rather than “deeper”
• Change DRAM interface
  – Cache
• Reduce frequency of memory access (illustrated in the sketch below)
  – More complex cache, and cache on chip
• Increase interconnection bandwidth
  – High-speed buses
  – Hierarchy of buses
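To illustrate how a cache reduces the frequency of memory access (a toy sketch, not from the slides; the 8-line direct-mapped cache and the access pattern are made up for demonstration):

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_LINES 8u                 /* toy direct-mapped cache: 8 lines */

    static unsigned tags[NUM_LINES];
    static bool     valid[NUM_LINES];
    static unsigned hits, misses;

    /* Simulate one memory access; only a miss has to go out to slow DRAM. */
    static void cache_access(unsigned addr) {
        unsigned index = addr % NUM_LINES;     /* which line the block maps to */
        unsigned tag   = addr / NUM_LINES;     /* identifies the block */
        if (valid[index] && tags[index] == tag) {
            hits++;                            /* served on chip */
        } else {
            misses++;                          /* fetch the block from memory */
            valid[index] = true;
            tags[index]  = tag;
        }
    }

    int main(void) {
        /* Re-reading the same 8 addresses: after the first pass every access hits. */
        for (int pass = 0; pass < 4; pass++)
            for (unsigned a = 0; a < 8; a++)
                cache_access(a);
        printf("hits=%u misses=%u\n", hits, misses);   /* hits=24 misses=8 */
        return 0;
    }

Only the 8 misses would have to go out to DRAM; the remaining 24 accesses are served on chip.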

Page 21: Computer Architecture I Lecture Presentation 1

Typical I/O Device Data Rates


Page 22: Computer Architecture I Lecture Presentation 1

I/O Devices

• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle this
• The problem is moving the data
• Solutions:
  – Caching
  – Buffering
  – Higher-speed interconnection buses
  – More elaborate bus structures
  – Multiple-processor configurations

Page 23: Computer Architecture I Lecture Presentation 1

Improvements in Chip Organization and Architecture

• Increase hardware speed of processor
  – Fundamentally due to shrinking logic gate size
    • More gates, packed more tightly, increasing clock rate
    • Propagation time for signals reduced
• Increase size and speed of caches
  – Dedicating part of the processor chip
    • Cache access times drop significantly
• Change processor organization and architecture
  – Increase effective speed of execution
  – Parallelism

Page 24: Computer Architecture I Lecture Presentation 1

Problems with Clock Speed and Logic Density

• Power
  – Power density increases with density of logic and clock speed
  – Dissipating heat becomes difficult
• RC delay
  – Speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting them
  – Delay increases as the RC product increases (see the sketch below)
  – Wire interconnects become thinner, increasing resistance
  – Wires are closer together, increasing capacitance
• Memory latency
  – Memory speeds lag processor speeds
• Solution:
  – More emphasis on organizational and architectural approaches
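A rough numerical illustration of the RC-delay point (the wire values below are hypothetical, chosen only to show the scaling):

    #include <stdio.h>

    int main(void) {
        /* Hypothetical wire parameters, for illustration only. */
        double R = 200.0;       /* wire resistance in ohms */
        double C = 50e-15;      /* wire capacitance in farads (50 fF) */
        double tau = R * C;     /* RC time constant in seconds */

        printf("RC delay ~ %.1f ps\n", tau * 1e12);                  /* 10.0 ps */

        /* Thinner wires raise R, tighter spacing raises C: both grow the delay. */
        printf("2x R, 2x C -> %.1f ps\n", (2 * R) * (2 * C) * 1e12); /* 40.0 ps */
        return 0;
    }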

Page 25: Computer Architecture I Lecture Presentation 1

Intel Microprocessor Performance


Page 26: Computer Architecture I Lecture Presentation 1

Increased Cache Capacity

• Typically two or three levels of cache between processor and main memory
• Chip density increased
  – More cache memory on chip
• Faster cache access
• The Pentium chip devoted about 10% of its chip area to cache
• The Pentium 4 devotes about 50%

Page 27: Computer Architecture I Lecture Presentation 1

More Complex Execution Logic

• Enable parallel execution of instructions
• Pipeline works like an assembly line
  – Different stages of execution of different instructions proceed at the same time along the pipeline
• Superscalar allows multiple pipelines within a single processor
  – Instructions that do not depend on one another can be executed in parallel (see the sketch below)
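A small C fragment (illustrative, not from the slides) showing the kind of independence a superscalar processor can exploit:

    #include <stdio.h>

    int main(void) {
        int a = 2, b = 3, c = 5, d = 7;

        /* Independent: neither statement reads the other's result, so a
           superscalar core can issue them to two pipelines in parallel. */
        int x = a + b;
        int y = c * d;

        /* Dependent: this statement needs x and y, so it must wait for
           both of the earlier results before it can execute. */
        int z = x + y;

        printf("%d %d %d\n", x, y, z);   /* 5 35 40 */
        return 0;
    }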

Page 28: Computer Architecture I Lecture Presentation 1

Diminishing Returns

• Internal organization of processors is complex
  – Can get a great deal of parallelism
  – Further significant increases likely to be relatively modest
• Benefits from cache are reaching a limit
• Increasing clock rate runs into the power dissipation problem
  – Some fundamental physical limits are being reached

Page 29: Computer Architecture I Lecture Presentation 1

Uniprocessor Performance

[Chart: performance relative to the VAX-11/780, 1978-2006, log scale, with growth-rate annotations of 25%/year, 52%/year, and ??%/year. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006.]

• VAX: 25%/year, 1978 to 1986
• RISC + x86: 52%/year, 1986 to 2002
• RISC + x86: ??%/year, 2002 to present
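Compounding those annual rates shows how large the gap became (an illustrative calculation based on the rates quoted on the slide):

    #include <math.h>
    #include <stdio.h>

    /* Cumulative speedup from a constant annual improvement rate. */
    static double cumulative(double rate_per_year, int years) {
        return pow(1.0 + rate_per_year, years);
    }

    int main(void) {
        printf("25%%/year, 1978-1986: ~%.0fx\n", cumulative(0.25, 8));    /* ~6x   */
        printf("52%%/year, 1986-2002: ~%.0fx\n", cumulative(0.52, 16));   /* ~812x */
        return 0;
    }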

Page 30: Computer Architecture I Lecture Presentation 1

Uniprocessor Performance

[Same chart as the previous slide: performance relative to the VAX-11/780, 1978-2006, from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006.]

The end of the uniprocessor era for high-performance computing!

• VAX: 25%/year, 1978 to 1986
• RISC + x86: 52%/year, 1986 to 2002
• RISC + x86: ??%/year, 2002 to present

Page 31: Computer Architecture I Lecture Presentation 1

Déjà vu all over again?

• Multiprocessors imminent in the 1970s, ‘80s, ‘90s, …
• “… today’s processors … are nearing an impasse as technologies approach the speed of light…”
    David Mitchell, The Transputer: The Time Is Now (1989)
• Transputer was premature
  ⇒ Custom multiprocessors tried to beat uniprocessors
  ⇒ Procrastination rewarded: 2X sequential performance / 1.5 years
• “We are dedicating all of our future product development to multicore designs. … This is a sea change in computing”
    Paul Otellini, President, Intel (2004)
• The difference is that all microprocessor companies have switched to multiprocessors (AMD, Intel, IBM, Sun; all new Apples 2+ CPUs)
  ⇒ Procrastination penalized: 2X sequential performance / 5 years
  ⇒ Biggest programming challenge: from 1 to 2 CPUs

Page 32: Computer Architecture I Lecture Presentation 1

Multiple Cores

• Multiple processors on a single chip
  – Large shared cache
• Within a processor, the increase in performance is roughly proportional to the square root of the increase in complexity (see the sketch below)
• If software can use multiple processors, doubling the number of processors almost doubles performance
• So, use two simpler processors on the chip rather than one more complex processor
• With two processors, larger caches are justified
  – Power consumption of memory logic is less than that of processing logic
• Example: IBM POWER4
  – Two cores based on PowerPC
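A back-of-the-envelope comparison of the two design options above, using the square-root rule quoted on the slide (the 2x transistor budget is an assumed example):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Rule of thumb from the slide: single-core performance grows with
           the square root of core complexity (transistor budget). */
        double budget = 2.0;     /* 2x the transistor budget of a baseline core */

        double one_big_core    = sqrt(budget);   /* ~1.41x the baseline */
        double two_small_cores = 2.0 * 1.0;      /* ~2x, if software scales */

        printf("one 2x-complex core: %.2fx\n", one_big_core);
        printf("two baseline cores:  %.2fx (ideal parallel speedup)\n",
               two_small_cores);
        return 0;
    }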

Page 33: Computer Architecture I Lecture Presentation 1

Some physics…
• Dynamic power dissipation is CV²f, where
  – C is the effective switching capacitance while running an application
  – V is the operating voltage
  – f is the switching frequency
• Also, f is roughly proportional to V

Some math…
• Assume a single-core processor with C = 10 nF, V = 1.2 V, f = 2 GHz
  – Power = CV²f = 28.8 W
• What if we get to 2.4 GHz by increasing V to 1.3 V?
  – Power = 40.6 W
• What if we have a 2-core processor operating at 1.6 GHz and 1.1 V?
  – Power = 38.7 W
• What if we now build the original chip to be less complex, i.e. smaller C?
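The slide’s figures can be reproduced directly; a minimal C sketch of the P = CV²f arithmetic (assuming, as the slide’s numbers imply, that each of the two cores contributes the same 10 nF):

    #include <stdio.h>

    /* Dynamic power: P = C * V^2 * f  (C in farads, V in volts, f in hertz). */
    static double dyn_power(double c, double v, double f) {
        return c * v * v * f;
    }

    int main(void) {
        double C = 10e-9;   /* 10 nF effective switching capacitance per core */

        printf("1 core, 1.2 V, 2.0 GHz:  %.1f W\n", dyn_power(C, 1.2, 2.0e9));        /* 28.8 W */
        printf("1 core, 1.3 V, 2.4 GHz:  %.1f W\n", dyn_power(C, 1.3, 2.4e9));        /* 40.6 W */
        printf("2 cores, 1.1 V, 1.6 GHz: %.1f W\n", 2.0 * dyn_power(C, 1.1, 1.6e9));  /* 38.7 W */
        return 0;
    }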

Page 34: Computer Architecture I Lecture Presentation 1

POWER4 Chip Organization


Page 35: Computer Architecture I Lecture Presentation 1

Problems with “sea change”?

• Algorithms, Programming Languages, Compilers, Operating Systems, Architectures, Libraries, … not ready to supply thread-level parallelism or data-level parallelism for 1000 CPUs / chip
• Architectures not ready for 1000 CPUs / chip
  – Unlike instruction-level parallelism, this cannot be solved by computer architects and compiler writers alone, but it also cannot be solved without the participation of architects
• Need a reworking of all the abstraction layers in the computing system stack

Page 36: Computer Architecture I Lecture Presentation 1

Abstraction Layers in Modern Computing Systems

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Gates/Register-Transfer Level (RTL)
Circuits
Devices
Physics

Alongside the original domain of the computer architect (‘50s-’80s) and the domain of recent computer architecture (‘90s), the slide adds the concerns behind the reinvigoration of computer architecture from the mid-2000s onward: parallel computing, security, reliability, power, …

Page 37: Computer Architecture I Lecture Presentation 1

EEE-445

Our goal will be to acquire a good understanding of:
- Computer system components
- Instruction Set Architecture (ISA) design
- Single-cycle and multi-cycle hardware organization/design to support an ISA
- We will do limited exercises on defining hardware through
  - Software emulators
  - VHDL/schematic capture description and simulation

EEE-446 (next term) will complete the missing theoretical pieces to obtain a solid Computer Architecture background:
- Bit slicing
- Arithmetic algorithms and how they are implemented in hardware
- More advanced memory and I/O concepts
- Pipelining and parallel processing concepts

In addition, you will get some practical experience in the lab by applying what you learnt in the EEE-445/446 sequence to design your own CPU.