Page 1
9/1/2016 CS152, Fall 2016
CS152 Computer Architecture and Engineering
Lecture 3 - From CISC to RISC
John Wawrzynek
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~johnw
http://inst.eecs.berkeley.edu/~cs152
Page 2
Last Time in Lecture 2
§ ISA is the hardware/software interface
– Defines set of programmer-visible state
– Defines instruction format (bit encoding) and instruction semantics
– Examples: IBM 360, MIPS, RISC-V, x86, JVM
§ Many possible implementations of one ISA
– 360 implementations: model 30 (c. 1964), z12 (c. 2012)
– x86 implementations: 8086 (c. 1978), 80186, 286, 386, 486, Pentium, Pentium Pro, Pentium-4 (c. 2000), Core 2 Duo, Nehalem, Sandy Bridge, Ivy Bridge, Atom, AMD Athlon, Transmeta Crusoe, SoftPC
– MIPS implementations: R2000, R4000, R10000, R18K, …
– JVM: HotSpot, PicoJava, ARM Jazelle, …
§ Microcoding: a straightforward, methodical way to implement machines with a low logic gate count; it also simplifies the implementation of complex instructions
Page 3
"Iron Law" of Processor Performance
\[ \frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Clock Cycle}} \]
§ Instructions per program depends on compiler technology and ISA
§ Cycles per instruction (CPI) depends on ISA and µarchitecture
§ Time per clock cycle depends on the µarchitecture and base technology
Microarchitecture          CPI   Cycle time
Microcoded                 >1    short
Single-cycle unpipelined   1     long    (this lecture)
Pipelined                  ~1    short
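As a rough illustration of the Iron Law, the sketch below multiplies the three factors for the three microarchitecture styles in the table. The instruction count, CPI values, and cycle times are made-up numbers chosen only to mirror the table's qualitative entries; they are not from the lecture.

```python
def execution_time_ns(instructions, cpi, cycle_time_ns):
    """Iron Law: time/program = instrs/program * cycles/instr * time/cycle."""
    return instructions * cpi * cycle_time_ns

# Hypothetical design points (assumed values, for illustration only).
designs = {
    "microcoded":   {"cpi": 6.0, "cycle_time_ns": 1.0},  # CPI > 1, short cycle
    "single-cycle": {"cpi": 1.0, "cycle_time_ns": 5.0},  # CPI = 1, long cycle
    "pipelined":    {"cpi": 1.2, "cycle_time_ns": 1.0},  # CPI ~ 1, short cycle
}

INSTRUCTIONS = 1_000_000  # assumed dynamic instruction count

times_ns = {
    name: execution_time_ns(INSTRUCTIONS, d["cpi"], d["cycle_time_ns"])
    for name, d in designs.items()
}

for name, t in times_ns.items():
    print(f"{name:>12}: {t / 1e6:.1f} ms")
```

With these assumed numbers the pipelined design wins because it keeps both CPI and cycle time low; different assumptions would change the ranking.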
Page 4
Hardware Elements
§ Combinational circuits
– Mux, Decoder, ALU, ...
§ Synchronous state elements
– Flip-flop, Register, Register file, SRAM, DRAM
– Edge-triggered: data is sampled at the rising edge
[Figure: block symbols for an n-input mux (lg(n)-bit Sel, inputs A0..An-1, output O), a decoder (lg(n)-bit input A, outputs O0..On-1), an ALU (inputs A and B; OpSelect among Add, Sub, ...; And, Or, Xor, Not, ...; GT, LT, EQ, Zero, ...; outputs Result and Comp?), and an edge-triggered flip-flop (D, Q, Clk, En) with its timing diagram]
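The combinational elements above can be thought of as pure functions of their inputs. The following is a minimal behavioral sketch, not a hardware description; the function names, the 32-bit default width, and the small subset of ALU operations are assumptions made for illustration.

```python
def mux(sel, inputs):
    """n-input mux: the lg(n)-bit select picks which input reaches the output."""
    return inputs[sel]

def decoder(a, n):
    """lg(n)-to-n decoder: exactly one of the outputs O0..On-1 is asserted."""
    return [1 if i == a else 0 for i in range(n)]

def alu(op, a, b, width=32):
    """A few of the ALU operations named on the slide, truncated to `width` bits."""
    mask = (1 << width) - 1
    results = {
        "add": a + b,
        "sub": a - b,
        "and": a & b,
        "or":  a | b,
        "xor": a ^ b,
    }
    return results[op] & mask

print(mux(2, [10, 11, 12, 13]))
print(decoder(1, 4))
print(hex(alu("sub", 0, 1)))
```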
Page 5
Register Files
§ Reads are combinational
[Figure: an n-bit register built from flip-flops ff0..ffn-1 (inputs D0..Dn-1, outputs Q0..Qn-1) sharing Clk and En; a 2R+1W register file with ports ReadSel1 (rs1), ReadSel2 (rs2), WriteSel (ws), WriteData (wd), WE (we), Clock, ReadData1 (rd1), ReadData2 (rd2)]
Page 6
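Register File Implementation
§ RISC-V integer instructions have at most 2 register source operands
[Figure: registers reg0..reg31, each 32 bits wide, all fed by wdata and clk; the 5-bit rd together with we generates the per-register write enables; the 5-bit rs1 and rs2 drive the selects that produce the 32-bit rdata1 and rdata2]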
Page 7
A Simple Memory Model
[Figure: "MAGIC RAM" block with inputs Address, WriteData, WriteEnable, Clock and output ReadData]
Reads and writes are always completed in one cycle
• a Read can be done any time (i.e., combinational)
• a Write is performed at the rising clock edge iff the WriteEnable signal is asserted
⇒ the write address and data must be stable at the clock edge
Later in the course we will present a more realistic model of memory
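The read/write rules of this idealized memory can be sketched behaviorally as follows. The class and method names are hypothetical, and returning 0 for never-written locations is an assumption, not something the slide specifies.

```python
class MagicRAM:
    """Idealized memory: combinational reads, writes on the rising clock edge."""

    def __init__(self):
        self.cells = {}  # address -> data; sparse so any address is usable

    def read(self, address):
        # Combinational: the output follows the address with no clock involved.
        # Assumption: locations that were never written read as 0.
        return self.cells.get(address, 0)

    def rising_edge(self, write_enable, address, write_data):
        # State changes only here, and only if WriteEnable is asserted.
        if write_enable:
            self.cells[address] = write_data

ram = MagicRAM()
ram.rising_edge(write_enable=False, address=0x40, write_data=7)  # ignored
before = ram.read(0x40)
ram.rising_edge(write_enable=True, address=0x40, write_data=7)   # performed
after = ram.read(0x40)
print(before, after)
```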
Page 8
Implementing RISC-V
Single-cycle per instruction datapath & control logic
(Similar to MIPS single-cycle processor in CS61C)
Page 9
Instruction Execution Review
Execution of an instruction involves
1. Instruction fetch
2. Decode and register fetch
3. ALU operation
4. Memory operation (optional)
5. Write back (optional)
and compute address of next instruction
Page 10
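Datapath: Reg-Reg ALU Instructions
Instruction format (field, bit positions, width):
  rd       Inst<31:27>   5 bits
  rs1      Inst<26:22>   5 bits
  rs2      Inst<21:17>   5 bits
  func     Inst<16:7>    10 bits
  opcode   Inst<6:0>     7 bits
Semantics: rd ← (rs1) func (rs2)
[Figure: datapath with PC (incremented through an adder with constant 0x4), Inst. Memory (addr, inst), GPRs (rs1 = Inst<26:22>, rs2 = Inst<21:17>, wa = Inst<31:27>, wd, we, rd1, rd2, clk), ALU, ALU Control driven by Inst<16:0> (OpCode), and the RegWriteEn signal; annotation: "RegWrite Timing?"]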
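The field positions above can be exercised with a small encode/decode sketch. It assumes a 32-bit instruction word laid out exactly as on this slide; the helper names and the field values used in the example are arbitrary and do not correspond to real opcodes.

```python
def bits(word, hi, lo):
    """Extract word<hi:lo> as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_reg_reg(inst):
    """Split a reg-reg ALU instruction into the fields shown on the slide."""
    return {
        "rd":     bits(inst, 31, 27),
        "rs1":    bits(inst, 26, 22),
        "rs2":    bits(inst, 21, 17),
        "func":   bits(inst, 16, 7),
        "opcode": bits(inst, 6, 0),
    }

def encode_reg_reg(rd, rs1, rs2, func, opcode):
    """Inverse of decode_reg_reg (no range checking, for brevity)."""
    return (rd << 27) | (rs1 << 22) | (rs2 << 17) | (func << 7) | opcode

# Arbitrary illustrative field values, not a real instruction encoding.
word = encode_reg_reg(rd=3, rs1=1, rs2=2, func=0x155, opcode=0x33)
fields = decode_reg_reg(word)
print(hex(word), fields)
```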
Page 11
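Datapath: Reg-Imm ALU Instructions
Instruction format (field, bit positions, width):
  rd            Inst<31:27>   5 bits
  rs1           Inst<26:22>   5 bits
  immediate12   Inst<21:10>   12 bits
  func          Inst<9:7>     3 bits
  opcode        Inst<6:0>     7 bits
Semantics: rd ← (rs1) op immediate
[Figure: datapath with PC (adder with constant 0x4), Inst. Memory (addr, inst), GPRs (rs1 = inst<26:22>, wa = inst<31:27>, wd, we, rd1, rd2, clk), an ImmSelect block on inst<21:10> controlled by ImmSel, ALU, ALU Control driven by inst<9:0> (OpCode), and RegWriteEn]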
Page 12
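Conflicts in Merging Datapath
Reg-imm format: rd | rs1 | immediate12 | func3 | opcode                        rd ← (rs1) op immediate
Reg-reg format: rd (5) | rs1 (5) | rs2 (5) | func10 (10) | opcode (7)          rd ← (rs1) func (rs2)
Introduce muxes
[Figure: the reg-reg and reg-imm datapaths overlaid, with the conflicting connections marked: Inst<21:17> (rs2) versus Inst<21:10> (ImmSelect input), and Inst<16:0> versus Inst<9:0> into ALU Control; shared parts are PC, adder with 0x4, Inst. Memory, GPRs (rs1 = inst<26:22>, wa = Inst<31:27>), ALU, ImmSel, OpCode, RegWrite]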
Page 13
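Datapath for ALU Instructions
Reg-imm format: rd | rs1 | immediate12 | func3 | opcode                        rd ← (rs1) op immediate
Reg-reg format: rd (5) | rs1 (5) | rs2 (5) | func10 (10) | opcode (7)          rd ← (rs1) func (rs2)
[Figure: merged datapath with PC, adder with 0x4, Inst. Memory, GPRs (rs1 = <26:22>, rs2 = <21:17>, wa = <31:27>, wd, we, rd1, rd2), ImmSelect controlled by ImmSel, an Op2Sel mux choosing Reg / Imm as the second ALU operand, ALU, and ALU Control taking <16:0> and <6:0> (OpCode) and producing FuncSel and RegWriteEn]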
Page 14
Load/Store Instructions
Load format:  rd | rs1 | immediate12 | func3 | opcode
Store format: imm (5) | rs1 (5) | rs2 (5) | imm (7) | func3 (3) | opcode (7)
Addressing mode: (rs1) + displacement
rs1 is the base register; rd is the destination of a Load; rs2 is the data source for a Store
[Figure: ALU-instruction datapath extended with a Data Memory (addr, wdata, rdata, we = MemWrite, clk); the ALU adds the "base" (rd1) and the displacement "disp" (from ImmSelect via Op2Sel) to form addr; rd2 supplies wdata; a WBSel mux chooses ALU / Mem as the register write-back value; control signals ImmSel, OpCode, FuncSel, RegWriteEn, MemWrite]
Page 15
RISC-V Conditional Branches
§ Compare two integer registers for equality (BEQ/BNE), or relative value, signed (BLT/BGE) or unsigned (BLTU/BGEU)
§ 12-bit immediate encodes branch target address as a signed offset from PC, in units of 16 bits (i.e., shift left by 1 then add to PC)
Instruction format for BEQ/BNE, BLT/BGE, BLTU/BGEU (field, bit positions, width):
  imm[11:7]   Inst<31:27>   5 bits
  rs1         Inst<26:22>   5 bits
  rs2         Inst<21:17>   5 bits
  imm[6:0]    Inst<16:10>   7 bits
  func3       Inst<9:7>     3 bits
  opcode      Inst<6:0>     7 bits
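The branch target computation described above (reassemble the split 12-bit immediate, sign-extend, shift left by 1, add to PC) can be sketched as follows. The function names and the 32-bit PC width are assumptions for illustration; the field positions follow the format on this slide.

```python
def sign_extend(value, nbits):
    """Interpret the low `nbits` of value as a two's-complement number."""
    sign = 1 << (nbits - 1)
    return (value & (sign - 1)) - (value & sign)

def branch_target(pc, inst):
    """Taken-branch target: PC + (sign-extended imm12 << 1), assuming a 32-bit PC."""
    imm_hi = (inst >> 27) & 0x1F   # imm[11:7] from Inst<31:27>
    imm_lo = (inst >> 10) & 0x7F   # imm[6:0]  from Inst<16:10>
    imm12 = (imm_hi << 7) | imm_lo
    return (pc + (sign_extend(imm12, 12) << 1)) & 0xFFFFFFFF

# Only the immediate fields are filled in; other fields are left as zero.
forward = branch_target(0x100, 4 << 10)                      # imm12 = +4
backward = branch_target(0x100, (0x1F << 27) | (0x7F << 10))  # imm12 = -1
print(hex(forward), hex(backward))
```

Because the offset counts 16-bit units, an immediate of +4 moves the PC forward by 8 bytes and an immediate of -1 moves it back by 2 bytes.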
Page 16
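Conditional Branches (BEQ/BNE/BLT/BGE/BLTU/BGEU)
[Figure: load/store datapath extended for branches: a second adder (Add) computes the branch target br, a PCSel mux chooses between pc+4 and br as the next PC, and a Br Logic block produces Bcomp? for the control logic; other elements are PC, adder with 0x4, Inst. Memory, GPRs, ImmSelect, ALU, ALU Control, Data Memory, and the signals ImmSel, OpCode, FuncSel, Op2Sel, MemWrite, WBSel, RegWrEn]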
Page 17
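Including Jump and Jalr
[Figure: branch datapath extended for jumps: the PCSel mux now chooses among br, rind, jabs, and pc+4; a WASel mux selects the register write address, with a constant 1 as one of its inputs; other elements are PC, adders (0x4 and branch target), Inst. Memory, GPRs, ImmSelect, ALU, ALU Control, Br Logic (Bcomp?), Data Memory, and the signals ImmSel, OpCode, FuncSel, Op2Sel, MemWrite, WBSel, RegWriteEn]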
Page 18
Hardwired Control is pure Combinational Logic
[Figure: a combinational logic block with inputs opcode and Equal?, and outputs ImmSel, Op2Sel, FuncSel, MemWrite, WBSel, WASel, RegWriteEn, PCSel]
Page 19
ALU Control & Immediate Extension
[Figure: Inst<6:0> (Opcode) feeds a Decode Map that produces ALUop and the selects; FuncSel chooses among (Func, Op, +), where Func comes from Inst<16:7>; ImmSel chooses among (IType12, BsType12, BrType12)]
Page 20
Hardwired Control Table
Opcode     ImmSel     Op2Sel  FuncSel  MemWr  RFWen  WBSel  WASel  PCSel
ALU        *          Reg     Func     no     yes    ALU    rd     pc+4
ALUi       IType12    Imm     Op       no     yes    ALU    rd     pc+4
LW         IType12    Imm     +        no     yes    Mem    rd     pc+4
SW         BsType12   Imm     +        yes    no     *      *      pc+4
BEQtrue    BrType12   *       *        no     no     *      *      br
BEQfalse   BrType12   *       *        no     no     *      *      pc+4
J          *          *       *        no     no     *      *      jabs
JAL        *          *       *        no     yes    PC     X1     jabs
JALR       *          *       *        no     yes    PC     rd     rind
Op2Sel = Reg / Imm    WBSel = ALU / Mem / PC    WASel = rd / X1    PCSel = pc+4 / br / rind / jabs
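Since hardwired control is pure combinational logic, the table can be read as a lookup from opcode (plus the comparison result for branches) to control signals. The sketch below encodes the table rows as data; the dictionary layout and the function name control_signals are illustrative assumptions, and "*" marks a don't-care entry as in the table.

```python
FIELDS = ("ImmSel", "Op2Sel", "FuncSel", "MemWr", "RFWen", "WBSel", "WASel", "PCSel")

# One row per table entry; "*" means the signal value does not matter.
CONTROL_TABLE = {
    "ALU":      ("*",        "Reg", "Func", "no",  "yes", "ALU", "rd", "pc+4"),
    "ALUi":     ("IType12",  "Imm", "Op",   "no",  "yes", "ALU", "rd", "pc+4"),
    "LW":       ("IType12",  "Imm", "+",    "no",  "yes", "Mem", "rd", "pc+4"),
    "SW":       ("BsType12", "Imm", "+",    "yes", "no",  "*",   "*",  "pc+4"),
    "BEQtrue":  ("BrType12", "*",   "*",    "no",  "no",  "*",   "*",  "br"),
    "BEQfalse": ("BrType12", "*",   "*",    "no",  "no",  "*",   "*",  "pc+4"),
    "J":        ("*",        "*",   "*",    "no",  "no",  "*",   "*",  "jabs"),
    "JAL":      ("*",        "*",   "*",    "no",  "yes", "PC",  "X1", "jabs"),
    "JALR":     ("*",        "*",   "*",    "no",  "yes", "PC",  "rd", "rind"),
}

def control_signals(opcode, equal=False):
    """Look up the control word; for BEQ the Equal? input picks the row."""
    if opcode == "BEQ":
        opcode = "BEQtrue" if equal else "BEQfalse"
    return dict(zip(FIELDS, CONTROL_TABLE[opcode]))

print(control_signals("LW"))
print(control_signals("BEQ", equal=True)["PCSel"])
```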
Page 21
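RISC-V Unconditional Jumps
§ 25-bit immediate encodes jump target address as a signed offset from PC, in units of 16 bits (i.e., shift left by 1 then add to PC). A signed 25-bit offset counted in 2-byte units gives a range of +/- 32 MB
§ JAL is a subroutine call that also saves return address (PC+4) in register x1
Instruction format for J and JAL (field, bit positions, width):
  Jump Offset[24:0]   Inst<31:7>   25 bits
  opcode              Inst<6:0>    7 bits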
Page 22
RISC-V Register Indirect Jumps
§ Jumps to target address given by adding 12-bit offset (not shifted by 1 bit) to register rs1: PC ← RF[rs1] + sign-ext(Imm)
§ The return address (PC+4) is written to rd (can be x0 if value not needed)
§ The RDNPC instruction simply writes return address to register rd without jumping (used for dynamic linking)
Instruction format for JALR and RDNPC (field, bit positions, width):
  rd          Inst<31:27>   5 bits
  rs1         Inst<26:22>   5 bits
  Imm[11:0]   Inst<21:10>   12 bits
  func3       Inst<9:7>     3 bits
  opcode      Inst<6:0>     7 bits
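The JALR behavior described above can be sketched at the register-transfer level. The function name, the list-based register file, and the 32-bit width are assumptions for illustration; writes to x0 are discarded because x0 always reads as zero in RISC-V.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit machine

def sign_extend(value, nbits):
    """Interpret the low `nbits` of value as a two's-complement number."""
    sign = 1 << (nbits - 1)
    return (value & (sign - 1)) - (value & sign)

def jalr(pc, regs, rd, rs1, imm12):
    """PC <- RF[rs1] + sign-ext(Imm); rd <- PC + 4. Returns the new PC."""
    # Read rs1 before writing rd so that rd == rs1 behaves correctly.
    target = (regs[rs1] + sign_extend(imm12, 12)) & MASK32
    if rd != 0:  # x0 is hardwired to zero, so the link value is dropped
        regs[rd] = (pc + 4) & MASK32
    return target

regs = [0] * 32
regs[5] = 0x1000
new_pc = jalr(pc=0x200, regs=regs, rd=1, rs1=5, imm12=0xFFC)  # offset = -4
print(hex(new_pc), hex(regs[1]))
```

Note that, unlike the branch and jump offsets, the 12-bit immediate here is used as a byte offset without shifting.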
Page 23
Full RISC-V 1-Stage Datapath (Lab 1)
[Figure: full single-stage RISC-V datapath]
Note: Reg File shown twice for clarity. Immediate select changed.
Page 24
Single-Cycle Hardwired Control
We will assume clock period is sufficiently long for all of the following steps to be "completed":
1. Instruction fetch
2. Decode and register fetch
3. ALU operation
4. Data fetch if required
5. Register write-back setup time
⇒ $t_C > t_{IFetch} + t_{RFetch} + t_{ALU} + t_{DMem} + t_{RWB}$
At the rising edge of the following clock, the PC, register file and memory are updated
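As a worked example of the clock-period constraint, the sketch below sums hypothetical per-step delays to get a lower bound on the cycle time and the corresponding upper bound on clock frequency. The delay values are invented for illustration and are not from the lecture.

```python
# Hypothetical step delays in nanoseconds (assumed values).
delays_ns = {
    "t_IFetch": 2.0,  # instruction fetch
    "t_RFetch": 1.0,  # decode and register fetch
    "t_ALU":    2.0,  # ALU operation
    "t_DMem":   2.0,  # data fetch (needed by loads)
    "t_RWB":    0.5,  # register write-back setup time
}

# The clock period must exceed the sum of all steps on the longest path.
min_period_ns = sum(delays_ns.values())
max_freq_mhz = 1e3 / min_period_ns  # 1000 / period in ns gives MHz

print(f"t_C must exceed {min_period_ns} ns (below about {max_freq_mhz:.1f} MHz)")
```

Every instruction pays for the slowest path in a single-cycle design, even instructions that never touch data memory.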
Page 25
"Iron Law" of Processor Performance
\[ \frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}} \]
§ Instructions per program depends on source code, compiler technology, and ISA
§ Cycles per instruction (CPI) depends on ISA and µarchitecture
§ Time per cycle depends on the µarchitecture and base technology
Page 26
CPI for Microcoded Machine
[Figure: timeline showing Inst 1 (7 cycles), Inst 2 (5 cycles), Inst 3 (10 cycles) along the Time axis]
Total clock cycles = 7 + 5 + 10 = 22
Total instructions = 3
CPI = 22/3 = 7.33
CPI is always an average over a large number of instructions.
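The same CPI arithmetic, written out as a short computation over the three instruction latencies from the slide (the variable names are illustrative):

```python
# Cycles taken by Inst 1, Inst 2, Inst 3 on the microcoded machine.
cycles_per_inst = [7, 5, 10]

total_cycles = sum(cycles_per_inst)
total_insts = len(cycles_per_inst)
cpi = total_cycles / total_insts

print(f"cycles={total_cycles} instructions={total_insts} CPI={cpi:.2f}")
```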
Page 27
Technology Influence
§ When microcode appeared in the 50s, different technologies were used for:
– Logic: Vacuum Tubes
– Main Memory: Magnetic cores
– Read-Only Memory: Diode matrix, punched metal cards, …
§ Logic very expensive compared to ROM or RAM
§ ROM cheaper than RAM
§ ROM much faster than RAM
But seventies brought advances in integrated circuit technology and semiconductor memory…
Page 28
First Microprocessor: Intel 4004, 1971
§ 4-bit accumulator architecture
§ 8 µm pMOS
§ 2,300 transistors
§ 3 x 4 mm²
§ 750 kHz clock
§ 8-16 cycles/inst.
Made possible by new integrated circuit technology
Page 29
Microprocessors in the Seventies
§ Initial target was embedded control
– First micro, 4-bit 4004 from Intel, designed for a desktop printing calculator
– Constrained by what could fit on single chip
– Accumulator architectures, similar to earliest computers
– Hardwired state machine control
§ 8-bit micros (8085, 6800, 6502) used in hobbyist personal computers
– Micral, Altair, TRS-80, Apple-II
– Usually had 16-bit address space (up to 64 KB directly addressable)
– Often came with simple BASIC language interpreter built into ROM or loaded from cassette tape
Page 30
VisiCalc – the first "killer" app for micros
• Microprocessors had little impact on conventional computer market until VisiCalc spreadsheet for Apple-II
• Apple-II used MOS Technology 6502 microprocessor running at 1 MHz
[Figure: Personal Computing ad, 1979]
Floppy disks were originally invented by IBM as a way of shipping IBM 360 microcode patches to customers!
Page 31
DRAM in the Seventies
§ Dramatic progress in semiconductor memory technology
§ 1970, Intel introduces first DRAM, 1 Kbit 1103
§ 1979, Fujitsu introduces 64 Kbit DRAM
⇒ By mid-Seventies, obvious that PCs would soon have > 64 KBytes physical memory
Page 32
Microprocessor Evolution
§ Rapid progress in 70s, fueled by advances in MOSFET technology and expanding markets
§ Intel i432
– Most ambitious seventies' micro; started in 1975, released 1981
– 32-bit capability-based object-oriented architecture
– Instructions variable number of bits long
– Severe performance, complexity, and usability problems
§ Motorola 68000 (1979, 8 MHz, 68,000 transistors)
– Heavily microcoded (and nanocoded)
– 32-bit general-purpose register architecture (24 address pins)
– 8 address registers, 8 data registers
§ Intel 8086 (1978, 8 MHz, 29,000 transistors)
– "Stopgap" 16-bit processor, architected in 10 weeks
– Extended accumulator architecture, assembly-compatible with 8080
– 20-bit addressing through segmented addressing scheme
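For readers unfamiliar with the 8086 segmented scheme mentioned above: a 16-bit segment value is shifted left by 4 bits and added to a 16-bit offset to form a 20-bit physical address. The slide does not spell this out; the sketch below is background on the 8086, with an illustrative function name.

```python
def physical_address_8086(segment, offset):
    """20-bit physical address = (segment * 16 + offset), wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

addr = physical_address_8086(0x1234, 0x0010)
wrapped = physical_address_8086(0xFFFF, 0x0010)  # sum exceeds 20 bits and wraps
print(hex(addr), hex(wrapped))
```

Two 16-bit quantities thus reach a 1 MB address space, at the cost of many segment:offset pairs aliasing the same physical byte.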
Page 33
IBM PC, 1981
§ Hardware
– Team from IBM building PC prototypes in 1979
– Motorola 68000 chosen initially, but 68000 was late
– IBM builds "stopgap" prototypes using 8088 boards from DisplayWriter word processor
– 8088 is 8-bit bus version of 8086 ⇒ allows cheaper system
– Estimated sales of 250,000
– 100,000,000s sold
§ Software
– Microsoft negotiates to provide OS for IBM. Later buys and modifies QDOS from Seattle Computer Products.
§ Open System
– Standard processor, Intel 8088
– Standard interfaces
– Standard OS, MS-DOS
– IBM permits cloning and third-party software
Page 34
[Figure: Personal Computing ad, 11/81]
Page 35
Microprogramming: early Eighties
§ Evolution bred more complex micro-machines
– Complex instruction sets led to need for subroutine and call stacks in µcode
– Need for fixing bugs in control programs was in conflict with read-only nature of µROM
– ⇒ Writable Control Store (WCS) (B1700, QMachine, Intel i432, …)
§ With the advent of VLSI technology, assumptions about ROM & RAM speed became invalid → more complexity
§ Better compilers made complex instructions less important
§ Use of numerous micro-architectural innovations, e.g., pipelining, caches and buffers, made multiple-cycle execution of reg-reg instructions unattractive
Page 36
Analyzing Microcoded Machines
§ John Cocke and group at IBM
– Working on a simple pipelined processor, 801, and advanced compilers inside IBM
– Ported experimental PL.8 compiler to IBM 370, and only used simple register-register and load/store instructions similar to 801
– Code ran faster than other existing compilers that used all 370 instructions! (up to 6 MIPS whereas 2 MIPS considered good before)
§ Emer, Clark, at DEC
– Measured VAX-11/780 using external hardware
– Found it was actually a 0.5 MIPS machine, although usually assumed to be a 1 MIPS machine
– Found 20% of VAX instructions responsible for 60% of microcode, but only account for 0.2% of execution time!
§ VAX 8800
– Control Store: 16K * 147b RAM, Unified Cache: 64K * 8b RAM
– 4.5x more microstore RAM than cache RAM!
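The VAX 8800 comparison can be checked directly from the two RAM sizes quoted above; the variable names are illustrative.

```python
K = 1024

control_store_bits = 16 * K * 147  # 16K words of 147 bits
cache_bits = 64 * K * 8            # 64K bytes of unified cache

ratio = control_store_bits / cache_bits
print(f"control store: {control_store_bits} bits, cache: {cache_bits} bits")
print(f"microstore / cache = {ratio:.2f}x")
```

The exact ratio is 147/32, about 4.6, consistent with the "4.5x" figure on the slide.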
Page 37
IC Technology Changes Tradeoffs
§ Logic, RAM, ROM all implemented using MOS transistors
§ Semiconductor RAM ~ same speed as ROM
Page 38
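Nanocoding
[Figure: the µPC (state) supplies the µaddress to the µcode ROM, which outputs the µcode next-state and a nanoaddress; the nanoaddress indexes the nanoinstruction ROM, whose data output drives the control signals]
Exploits recurring control signal patterns in µcode, e.g.,
ALU0   A ← Reg[rs1] ...
ALUi0  A ← Reg[rs1] ...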
Page 39
From CISC to RISC
§ Use fast RAM to build fast instruction cache of user-visible instructions, not fixed hardware microroutines
– Contents of fast instruction memory change to fit what application needs right now
§ Use simple ISA to enable hardwired pipelined implementation
– Most compiled code only used a few of the available CISC instructions
– Simpler encoding allowed pipelined implementations
§ Further benefit with integration
– In early '80s, could finally fit 32-bit datapath + small caches on a single chip
– No chip crossings in common case allows faster operation
Page 40
Berkeley RISC Chips
RISC-I (1982) contains 44,420 transistors, was fabbed in 5 µm NMOS with a die area of 77 mm², and ran at 1 MHz. This chip is probably the first VLSI RISC.
RISC-II (1983) contains 40,760 transistors, was fabbed in 3 µm NMOS with a die area of 60 mm², and ran at 3 MHz.
Stanford built some too…
Page 41
Summary
§ Microcoding became less attractive as the gap between RAM and ROM speeds narrowed and logic was implemented in the same technology as memory
§ Complex instruction sets are difficult to pipeline, so it was difficult to increase performance as gate count grew
§ Iron Law explains architecture design space
– Trade instructions/program, cycles/instruction, and time/cycle
§ Load-Store RISC ISAs designed for efficient pipelined implementations
– Very similar to vertical microcode
– Inspired by earlier Cray machines (CDC 6600/7600)
§ RISC-V ISA will be used in lectures, problems, and labs
– Berkeley RISC chips: RISC-I, RISC-II, SOAR (RISC-III), SPUR (RISC-IV)
Page 42
Acknowledgements
§ These slides contain material developed and copyright by:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)
§ MIT material derived from course 6.823
§ UCB material derived from course CS252