Page 1
CS152 Computer Architecture and Engineering
CS252 Graduate Computer Architecture
Lecture 2 - Simple Machine Implementations
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
Page 2
Last Time in Lecture 1
§ Computer Architecture >> ISAs and RTL
– CS152 is about interaction of hardware and software, and design of appropriate abstraction layers
§ Technology and Applications shape Computer Architecture
– History provides lessons for the future
§ First 130 years of CompArch, from Babbage to IBM 360
– Move from calculators (no conditionals) to fully programmable machines
– Rapid change started in WWII (mid-1940s), move from electro-mechanical to pure electronic processors
§ Cost of software development becomes a large constraint on architecture (need compatibility)
§ IBM 360 introduces notion of “family of machines” running same ISA but very different implementations
– Six different machines released on same day (April 7, 1964)
– “Future-proofing” for subsequent generations of machine
Page 3
Instruction Set Architecture (ISA)
§ The contract between software and hardware
§ Typically described by giving all the programmer-visible state (registers + memory) plus the semantics of the instructions that operate on that state
§ IBM 360 was first line of machines to separate ISA from implementation (a.k.a. microarchitecture)
§ Many implementations possible for a given ISA
– E.g., Soviets built code-compatible clones of the IBM 360, as did Amdahl after he left IBM.
– E.g. 2: AMD, Intel, VIA processors run the AMD64 ISA
– E.g. 3: many cellphones use the ARM ISA with implementations from many different companies including Apple, Qualcomm, Samsung, Huawei, etc.
§ We use RISC-V as standard ISA in class (www.riscv.org)
– Many companies and open-source projects build RISC-V implementations
Page 4
ISA to Microarchitecture Mapping
§ ISA often designed with particular microarchitectural style in mind, e.g.,
  Accumulator ⇒ hardwired, unpipelined
  CISC ⇒ microcoded
  RISC ⇒ hardwired, pipelined
  VLIW ⇒ fixed-latency in-order parallel pipelines
  JVM ⇒ software interpretation
§ But can be implemented with any microarchitectural style
– Intel Ivy Bridge: hardwired pipelined CISC (x86) machine (with some microcode support)
– Spike: Software-interpreted RISC-V machine
– ARM Jazelle: A hardware JVM processor
– This lecture: a microcoded RISC-V machine
Page 5
Why Learn Microprogramming?
§ To show how to build very small processors with complex ISAs
§ To help you understand where CISC* machines came from
§ Because still used in common machines (x86, IBM 360, PowerPC)
§ As a gentle introduction into machine structures
§ To help understand how technology drove the move to RISC*
* “CISC”/“RISC” names much newer than style of machines they refer to.
Page 6
Control versus Datapath
§ Processor designs can be split between datapath, where numbers are stored and arithmetic operations computed, and control, which sequences operations on datapath
§ Biggest challenge for early computer designers was getting control circuitry correct
§ Maurice Wilkes invented the idea of microprogramming to design the control unit of a processor for EDSAC-II, 1958
– Foreshadowed by Babbage’s “Barrel” and mechanisms in earlier programmable calculators
[Figure: Control unit and Datapath (PC, Inst. Reg., Registers, ALU) connected to Main Memory (Address, Data). Control receives Instruction, Condition?, and Busy?, and drives the Control Lines into the datapath.]
Page 7
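Microcoded CPU
[Figure: Datapath connected to Main Memory (Address, Data); main memory holds user program written in macroinstructions, e.g., x86, RISC-V. A Decoder and the µPC address the Microcode ROM (holds fixed µcode instructions), which outputs Next State and Control Lines. Inputs to the control side are Opcode, Condition, and Busy?]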
Page 8
Technology Influence
§ When microcode appeared in 1950s, different technologies for:
– Logic: Vacuum Tubes
– Main Memory: Magnetic cores
– Read-Only Memory: Diode matrix, punched metal cards, …
§ Logic very expensive compared to ROM or RAM
§ ROM cheaper than RAM
§ ROM much faster than RAM
Page 9
RISC-V ISA
§ New fifth-generation RISC design from UC Berkeley
§ Realistic & complete ISA, but open & small
§ Not over-architected for a certain implementation style
§ Both 32-bit (RV32) and 64-bit (RV64) address-space variants
§ Designed for multiprocessing
§ Efficient instruction encoding
§ Easy to subset/extend for education/research
§ RISC-V spec available on Foundation website and github
§ Increasing momentum with industry adoption
§ Please see CS61C Fall 2017, Lectures 5-7 for RISC-V ISA review: http://inst.eecs.berkeley.edu/~cs61c/fa17/
Page 10
RV32 Processor State
Program counter (pc)
32x32-bit integer registers (x0-x31)
• x0 always contains a 0
32 floating-point (FP) registers (f0-f31)
• each can contain a single- or double-precision FP value (32-bit or 64-bit IEEE FP)
FP status register (fcsr), used for FP rounding mode & exception reporting
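A minimal sketch of the integer part of this state, assuming a plain Python model (the class and method names are hypothetical, and the FP registers and fcsr are left out). It shows one way to keep x0 reading as zero: writes to x0 are simply dropped.

```python
class RV32IntState:
    """Toy model of the RV32 integer state: pc plus x0-x31 (names are illustrative)."""

    def __init__(self) -> None:
        self.pc = 0
        self._x = [0] * 32  # 32 registers, each held as an unsigned 32-bit value

    def read_x(self, index: int) -> int:
        return self._x[index]

    def write_x(self, index: int, value: int) -> None:
        # x0 always contains 0, so a write to it has no effect.
        if index != 0:
            self._x[index] = value & 0xFFFFFFFF  # keep only the low 32 bits
```

Masking on write keeps every register within 32 bits, so a write of -1 reads back as 0xFFFFFFFF.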
Page 11
RISC-V Instruction Encoding
§ Can support variable-length instructions.
§ Base instruction set (RV32) always has fixed 32-bit instructions with lowest two bits = 11₂
§ All branches and jumps have targets at 16-bit granularity (even in base ISA where all instructions are fixed 32 bits)
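The two encoding rules on this slide can be checked with a couple of bit tests. This is a small sketch with hypothetical helper names, not part of any RISC-V tool.

```python
def has_base_length_marker(instr: int) -> bool:
    """True if the lowest two bits are 0b11, as in every base RV32 instruction."""
    return (instr & 0b11) == 0b11


def is_legal_target(byte_address: int) -> bool:
    """Branch and jump targets have 16-bit granularity, so the byte address is even."""
    return byte_address % 2 == 0
```

For example, the 32-bit encoding 0x002081B3 (add x3, x1, x2) passes the first test, while a 16-bit compressed encoding such as 0x4501 does not.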
Page 12
RISC-V Instruction Formats
[Figure: instruction format diagram with labeled fields: Destination Reg., Reg. Source 1, Reg. Source 2, 7-bit opcode field (but low 2 bits = 11₂), Additional opcode bits/immediate]
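The field boundaries are not recoverable from the figure text, so the sketch below takes the bit positions from the RISC-V base ISA (opcode in bits 6:0, rd in 11:7, funct3 in 14:12, rs1 in 19:15, rs2 in 24:20, funct7 in 31:25, I-type immediate in 31:20). The helper names are hypothetical.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract word[hi:lo], inclusive, as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode(word: int) -> dict:
    """Split a 32-bit instruction into the register, opcode, and I-immediate fields."""
    return {
        "opcode": bits(word, 6, 0),
        "rd": bits(word, 11, 7),
        "funct3": bits(word, 14, 12),
        "rs1": bits(word, 19, 15),
        "rs2": bits(word, 24, 20),
        "funct7": bits(word, 31, 25),
        "imm_i": sign_extend(bits(word, 31, 20), 12),  # only meaningful for I-type
    }
```

Decoding 0x002081B3 (add x3, x1, x2) gives rd = 3, rs1 = 1, rs2 = 2, and decoding 0xFFF00093 (addi x1, x0, -1) gives imm_i = -1.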
Page 13
Single-Bus Datapath for Microcoded RISC-V
Microinstructions written as register transfers:
§ MA:=PC means RegSel=PC; RegW=0; RegEn=1; MALd=1
§ B:=Reg[rs2] means RegSel=rs2; RegW=0; RegEn=1; BLd=1
§ Reg[rd]:=A+B means ALUOp=Add; ALUEn=1; RegSel=rd; RegW=1
[Figure: single-bus datapath. Register RAM (Registers and PC; RegSel chooses among 32(PC), rd, rs1, rs2), ALU with input registers A and B, Instruction Reg., Immediate, Mem. Address register, and Main Memory (Address, In Data, Out) share one bus. Control signals: ImmEn, RegEn, ALUEn, MemEn, ALUOp, MemW, ImmSel, RegW, BLd, InstLd, MALd, ALd, RegSel. Signals back to control: Opcode, Condition?, Busy?]
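The three register transfers above can be written out as full control words. In this sketch the signal names come from the datapath figure, while the dictionary layout, the inactive defaults, and the helper names are assumptions made for illustration. It also checks the single-bus property that exactly one enable drives the bus per microinstruction.

```python
# Assumed inactive value for every control signal named in the datapath figure.
DEFAULTS = {
    "RegSel": None, "RegW": 0, "RegEn": 0,
    "ALUOp": None, "ALUEn": 0,
    "ImmSel": None, "ImmEn": 0,
    "MemW": 0, "MemEn": 0,
    "ALd": 0, "BLd": 0, "MALd": 0, "InstLd": 0,
}

# The three register transfers spelled out on the slide.
TRANSFERS = {
    "MA := PC": {"RegSel": "PC", "RegW": 0, "RegEn": 1, "MALd": 1},
    "B := Reg[rs2]": {"RegSel": "rs2", "RegW": 0, "RegEn": 1, "BLd": 1},
    "Reg[rd] := A + B": {"ALUOp": "Add", "ALUEn": 1, "RegSel": "rd", "RegW": 1},
}


def control_word(transfer: str) -> dict:
    """Expand a register transfer into a complete control word."""
    word = dict(DEFAULTS)
    word.update(TRANSFERS[transfer])
    return word


def bus_drivers(word: dict) -> list:
    """List the enables that put a value on the shared bus."""
    return [name for name in ("ImmEn", "RegEn", "ALUEn", "MemEn") if word[name]]
```

In "Reg[rd] := A + B" the ALU drives the bus (ALUEn) and the register file captures it (RegW), so RegEn stays low.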
Page 14
RISC-V Instruction Execution Phases
§ Instruction Fetch
§ Instruction Decode
§ Register Fetch
§ ALU Operations
§ Optional Memory Operations
§ Optional Register Writeback
§ Calculate Next Instruction Address
Page 15
Microcode Sketches (1)
Instruction Fetch:
  MA, A := PC
  PC := A + 4
  wait for memory
  IR := Mem
  dispatch on opcode
ALU:
  A := Reg[rs1]
  B := Reg[rs2]
  Reg[rd] := ALUOp(A, B)
  goto instruction fetch
ALUI:
  A := Reg[rs1]
  B := ImmI // Sign-extend 12b immediate
  Reg[rd] := ALUOp(A, B)
  goto instruction fetch
Page 16
Microcode Sketches (2)
LW:
  A := Reg[rs1]
  B := ImmI // Sign-extend 12b immediate
  MA := A + B
  wait for memory
  Reg[rd] := Mem
  goto instruction fetch
JAL:
  Reg[rd] := A // Store return address
  A := A - 4 // Recover original PC
  B := ImmJ // Jump-style immediate
  PC := A + B
  goto instruction fetch
Branch:
  A := Reg[rs1]
  B := Reg[rs2]
  if (!ALUOp(A, B)) goto instruction fetch // Not taken
  A := PC // Microcode fall through if branch taken
  A := A - 4
  B := ImmB // Branch-style immediate
  PC := A + B
  goto instruction fetch
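The Branch sketch can be followed one register transfer at a time in software. This is a toy model under stated assumptions: the function name and arguments are hypothetical, `cond` stands in for the ALU comparison, and the PC passed in is the value left by instruction fetch (already incremented by 4).

```python
def branch_microcode(pc_after_fetch, reg, rs1, rs2, imm_b, cond):
    """Return the next PC by stepping through the Branch microcode sketch."""
    a = reg[rs1]                    # A := Reg[rs1]
    b = reg[rs2]                    # B := Reg[rs2]
    if not cond(a, b):              # if (!ALUOp(A, B)) goto instruction fetch
        return pc_after_fetch       # not taken: PC already points at the next instruction
    a = pc_after_fetch              # A := PC
    a = a - 4                       # A := A - 4 (address of the branch itself)
    b = imm_b                       # B := ImmB
    return (a + b) & 0xFFFFFFFF     # PC := A + B, wrapped to 32 bits
```

With a branch-if-equal at address 0x100, equal source registers, and an offset of -8, the next PC is 0xF8. With unequal registers it stays at 0x104.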
Page 17
Pure ROM Implementation
§ How many address bits? |µaddress| = |µPC| + |opcode| + 1 + 1 (one bit each for Cond? and Busy?)
§ How many data bits? |data| = |µPC| + |control signals| = |µPC| + 18
§ Total ROM size = 2^|µaddress| x |data|
[Figure: ROM addressed by µPC, Opcode, Cond?, and Busy?; ROM data supplies Next µPC and Control Signals.]
Page 18
Pure ROM Contents
Address                        | Data
µPC     Opcode  Cond?  Busy?   | Control Lines            Next µPC
fetch0  X       X      X       | MA, A := PC              fetch1
fetch1  X       X      1       |                          fetch1
fetch1  X       X      0       | IR := Mem                fetch2
fetch2  ALU     X      X       | PC := A + 4              ALU0
fetch2  ALUI    X      X       | PC := A + 4              ALUI0
fetch2  LW      X      X       | PC := A + 4              LW0
….
ALU0    X       X      X       | A := Reg[rs1]            ALU1
ALU1    X       X      X       | B := Reg[rs2]            ALU2
ALU2    X       X      X       | Reg[rd] := ALUOp(A, B)   fetch0
Page 19
Single-Bus Microcode RISC-V ROM Size
§ Instruction fetch sequence 3 common steps
§ ~12 instruction groups
§ Each group takes ~5 steps (1 for dispatch)
§ Total steps 3 + 12*5 = 63, needs 6 bits for µPC
§ Opcode is 5 bits, ~18 control signals
§ Total size = 2^(6+5+2) x (6+18) = 2^13 x 24 bits = 24 KiB (~25 KB)!
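The sizing arithmetic can be reproduced with a short function. This is a sketch with hypothetical names that uses the slide's own estimates (6-bit µPC, 5-bit opcode, 2 status bits, 18 control signals).

```python
def pure_rom_bits(upc_bits: int, opcode_bits: int, control_signals: int,
                  status_bits: int = 2) -> int:
    """Total bits in a pure-ROM controller: 2^|address| rows of |data| bits."""
    address_bits = upc_bits + opcode_bits + status_bits  # µPC + opcode + Cond? + Busy?
    data_bits = upc_bits + control_signals               # next µPC + control signals
    return (2 ** address_bits) * data_bits


total_bits = pure_rom_bits(upc_bits=6, opcode_bits=5, control_signals=18)
total_kib = total_bits / 8 / 1024
```

Each extra address input doubles the ROM, so removing the opcode and status bits from the address is what shrinks it most.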
Page 20
Reducing Control Store Size
§ Reduce ROM height (# address bits)
– Use external logic to combine input signals
– Reduce # states by grouping opcodes
§ Reduce ROM width (# data bits)
– Restrict µPC encoding (next, dispatch, wait on memory, …)
– Encode control signals (vertical µcoding, nanocoding)
Page 21
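Single-Bus RISC-V Microcode Engine
[Figure: µPC supplies the ROM Address; ROM Data provides Control Signals and a µPCjump field. µPC Jump Logic selects the next µPC from µPC + 1, fetch0, or the Decode of Opcode, using µPCjump, Cond?, and Busy?]
µPCjump = next | spin | fetch | dispatch | ftrue | ffalse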
Page 22
µPC Jump Types
§ next increments µPC
§ spin waits for memory
§ fetch jumps to start of instruction fetch
§ dispatch jumps to start of decoded opcode group
§ ftrue/ffalse jumps to fetch if Cond? true/false
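The jump types can be read as a small next-state function. The sketch below is one interpretation, with assumptions labeled: fetch0 is placed at µPC 0, spin holds the µPC while memory is busy and otherwise advances, and ftrue/ffalse fall through to µPC + 1 when they do not jump. The function and argument names are hypothetical.

```python
FETCH0 = 0  # assumed ROM location of the first instruction-fetch step


def next_upc(upc: int, jump: str, dispatch_target: int, cond: bool, busy: bool) -> int:
    """Compute the next µPC from the µPCjump field and the Cond?/Busy? inputs."""
    if jump == "next":
        return upc + 1
    if jump == "spin":
        return upc if busy else upc + 1      # wait here until memory is done
    if jump == "fetch":
        return FETCH0
    if jump == "dispatch":
        return dispatch_target               # start of the decoded opcode group
    if jump == "ftrue":
        return FETCH0 if cond else upc + 1
    if jump == "ffalse":
        return FETCH0 if not cond else upc + 1
    raise ValueError(f"unknown µPCjump type: {jump}")
```

Because this logic sits outside the ROM, the ROM address needs only the µPC, not the opcode and status bits.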
Page 23
Encoded ROM Contents
Address  | Data
µPC      | Control Lines            Next µPC
fetch0   | MA, A := PC              next
fetch1   | IR := Mem                spin
fetch2   | PC := A + 4              dispatch
ALU0     | A := Reg[rs1]            next
ALU1     | B := Reg[rs2]            next
ALU2     | Reg[rd] := ALUOp(A, B)   fetch
Branch0  | A := Reg[rs1]            next
Branch1  | B := Reg[rs2]            next
Branch2  | A := PC                  ffalse
Branch3  | A := A - 4               next
Branch4  | B := ImmB                next
Branch5  | PC := A + B              fetch
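To see how the encoded ROM sequences one macroinstruction, the fetch and ALU rows can be stepped through in software. This is an illustrative sketch: the list layout, the `trace` helper, and the `busy_cycles` parameter (how many cycles memory stays busy) are assumptions, and only the ALU group is included.

```python
# (label, control lines, µPCjump) for the fetch and ALU rows, in ROM order.
ROM = [
    ("fetch0", "MA, A := PC", "next"),
    ("fetch1", "IR := Mem", "spin"),
    ("fetch2", "PC := A + 4", "dispatch"),
    ("ALU0", "A := Reg[rs1]", "next"),
    ("ALU1", "B := Reg[rs2]", "next"),
    ("ALU2", "Reg[rd] := ALUOp(A, B)", "fetch"),
]
INDEX = {label: i for i, (label, _, _) in enumerate(ROM)}


def trace(opcode_group: str, busy_cycles: int) -> list:
    """Return the µPC labels visited while executing one macroinstruction."""
    upc = INDEX["fetch0"]
    visited = []
    while True:
        label, _, jump = ROM[upc]
        visited.append(label)
        if jump == "next":
            upc += 1
        elif jump == "spin":
            if busy_cycles > 0:
                busy_cycles -= 1          # memory still busy: stay on this step
            else:
                upc += 1
        elif jump == "dispatch":
            upc = INDEX[opcode_group + "0"]
        elif jump == "fetch":
            return visited                # next macroinstruction would start here
```

With memory never busy, an ALU instruction takes six microcode steps. Every busy cycle adds one more visit to fetch1.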
Page 24
CS152 Administrivia
§ Grading clarifications
– You must complete 3/5 labs or get an automatic F regardless of other grades
§ Slip days
– Problem sets have no slip days
– Labs have two free extensions (max one per lab) until next class after due date
– No other extensions without documented emergency
Page 25
CS252 Administrivia
§ CS252 Readings on Website
– Must use Piazza to send private note on each per-paper thread to instructors before midnight Sunday before Monday discussion containing paper report:
  • Write one paragraph on main content of paper including good/bad points of paper
  • Also, 1-3 questions about paper for discussion
  • First two “360 Architecture”, “B5000 Architecture”
§ CS252 Project Timeline
– Proposal due start of class Wed Feb 26th
– One page in PDF format including:
  • project title
  • team members (2 per project)
  • what problem are you trying to solve?
  • what is your approach?
  • infrastructure to be used
  • timeline/milestones
Page 26
Implementing Complex Instructions
Memory-memory add: M[rd] = M[rs1] + M[rs2]
Address  | Data
µPC      | Control Lines         Next µPC
MMA0     | MA := Reg[rs1]        next
MMA1     | A := Mem              spin
MMA2     | MA := Reg[rs2]        next
MMA3     | B := Mem              spin
MMA4     | MA := Reg[rd]         next
MMA5     | Mem := ALUOp(A, B)    spin
MMA6     |                       fetch
Complex instructions usually do not require datapath modifications, only extra space for control program
Very difficult to implement these instructions using a hardwired controller without substantial datapath modifications
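The memory-memory add microcode reuses only the existing MA, A, and B registers. A toy software model makes that visible, with one statement per ROM row. The function name and the dictionary-based register file and memory are assumptions, and the ALU operation is fixed to a 32-bit add.

```python
def memory_memory_add(reg: dict, mem: dict, rd: int, rs1: int, rs2: int) -> None:
    """Perform M[rd] = M[rs1] + M[rs2], following the MMA microcode rows."""
    ma = reg[rs1]                      # MMA0: MA := Reg[rs1]
    a = mem[ma]                        # MMA1: A := Mem (spin until memory responds)
    ma = reg[rs2]                      # MMA2: MA := Reg[rs2]
    b = mem[ma]                        # MMA3: B := Mem
    ma = reg[rd]                       # MMA4: MA := Reg[rd]
    mem[ma] = (a + b) & 0xFFFFFFFF     # MMA5: Mem := ALUOp(A, B), here a 32-bit add
    # MMA6: no datapath action, jump back to instruction fetch
```

For example, with Reg[x1] = 0x10, Reg[x2] = 0x14, Reg[x3] = 0x18 and memory holding 7 and 35 at the two source addresses, the call stores 42 at address 0x18.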
Page 27
Single-Bus Datapath for Microcoded RISC-V
Datapath unchanged for complex instructions!
[Figure: the same single-bus datapath as on Page 13, repeated without modification.]
Page 28
Horizontal vs Vertical µCode
§ Horizontal µcode has wider µinstructions
– Multiple parallel operations per µinstruction
– Fewer microcode steps per macroinstruction
– Sparser encoding ⇒ more bits
§ Vertical µcode has narrower µinstructions
– Typically a single datapath operation per µinstruction
  • separate µinstruction for branches
– More microcode steps per macroinstruction
– More compact ⇒ fewer bits
§ Nanocoding
– Tries to combine best of horizontal and vertical µcode
[Figure: plot with axes # µInstructions and Bits per µInstruction]
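A rough way to compare the two widths, under simplifying assumptions that are not from the slides: a horizontal µinstruction spends one bit per control signal, and a vertical µinstruction carries a single encoded field naming one datapath operation. The function names are hypothetical.

```python
def horizontal_width(num_control_signals: int) -> int:
    """One bit per control signal, so any combination can fire in one µinstruction."""
    return num_control_signals


def vertical_width(num_distinct_operations: int) -> int:
    """One encoded field, just wide enough to name a single operation."""
    return max(1, (num_distinct_operations - 1).bit_length())
```

With the 18 control signals estimated earlier, a horizontal word is 18 bits wide. If the microcode only ever used, say, 12 distinct operations (a made-up figure), a vertical field would need 4 bits, at the cost of more steps per macroinstruction.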
Page 29
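Nanocoding
§ Motorola 68000 had 17-bit µcode containing either 10-bit µjump or 9-bit nanoinstruction pointer
– Nanoinstructions were 68 bits wide, decoded to give 196 control signals
Exploits recurring control signal patterns in µcode, e.g.,
  ALU0   A ← Reg[rs1] ...
  ALUI0  A ← Reg[rs1] ...
[Figure: µPC (state) supplies the µaddress to the µcode ROM, which outputs the µcode next-state and a nanoaddress; the nanoaddress indexes the nanoinstruction ROM, whose data output provides the control signals.]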
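The saving from nanocoding comes from storing each distinct wide control word once and pointing at it from the microcode. The sketch below illustrates that idea on a made-up six-step microprogram. The helper names and the example words are hypothetical, and only the 68-bit nanoinstruction width is taken from the slide.

```python
def nanocode_split(control_words: list) -> tuple:
    """Split a microprogram into narrow pointers plus a table of unique wide words."""
    nano_rom = []
    pointers = []
    for word in control_words:
        if word not in nano_rom:
            nano_rom.append(word)             # store each distinct pattern once
        pointers.append(nano_rom.index(word)) # µcode keeps only the nanoaddress
    return pointers, nano_rom


def storage_bits(control_words: list, width: int) -> tuple:
    """Return (flat bits, nanocoded bits) for control words of the given width."""
    pointers, nano_rom = nanocode_split(control_words)
    pointer_bits = max(1, (len(nano_rom) - 1).bit_length())
    flat = len(control_words) * width
    split = len(pointers) * pointer_bits + len(nano_rom) * width
    return flat, split


# ALU and ALUI share two of their three control words.
program = ["A<-Reg[rs1]", "B<-Reg[rs2]", "Reg[rd]<-ALUOp",
           "A<-Reg[rs1]", "B<-ImmI", "Reg[rd]<-ALUOp"]
flat_bits, nanocoded_bits = storage_bits(program, width=68)
```

Here six 68-bit words shrink to four unique nanoinstructions plus six 2-bit pointers. The benefit grows as more microcode steps share the same control patterns.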
Page 30
Microprogramming in IBM 360
§ Only the fastest models (75 and 95) were hardwired
                          M30     M40     M50     M65
Datapath width (bits)     8       16      32      64
µinst width (bits)        50      52      85      87
µcode size (K µinsts)     4       4       2.75    2.75
µstore technology         CCROS   TCROS   BCROS   BCROS
µstore cycle (ns)         750     625     500     200
memory cycle (ns)         1500    2500    2000    750
Rental fee ($K/month)     4       7       15      35
Page 31
IBM Card-Capacitor Read-Only Storage
[Figure: punched card with metal film, read by fixed sensing plates. Source: IBM Journal, January 1961]
Page 32
Microcode Emulation
§ IBM initially miscalculated the importance of software compatibility with earlier models when introducing the 360 series
§ Honeywell stole some IBM 1401 customers by offering translation software (“Liberator”) for Honeywell H200 series machine
§ IBM retaliated with optional additional microcode for 360 series that could emulate IBM 1401 ISA, later extended for IBM 7000 series
– one popular program on 1401 was a 650 simulator, so some customers ran many 650 programs on emulated 1401s
– i.e., 650 simulated on 1401 emulated on 360
Page 33
Microprogramming thrived in ‘60s and ‘70s
§ Significantly faster ROMs than DRAMs were available
§ For complex instruction sets, datapath and controller were cheaper and simpler
§ New instructions, e.g., floating point, could be supported without datapath modifications
§ Fixing bugs in the controller was easier
§ ISA compatibility across various models could be achieved easily and cheaply
Except for the cheapest and fastest machines, all computers were microprogrammed
Page 34
Microprogramming: early 1980s
§ Evolution bred more complex micro-machines
– Complex instruction sets led to need for subroutine and call stacks in µcode
– Need for fixing bugs in control programs was in conflict with read-only nature of µROM
– → Writable Control Store (WCS) (B1700, QMachine, Intel i432, …)
§ With the advent of VLSI technology, assumptions about ROM & RAM speed became invalid → more complexity
§ Better compilers made complex instructions less important.
§ Use of numerous micro-architectural innovations, e.g., pipelining, caches and buffers, made multiple-cycle execution of reg-reg instructions unattractive
Page 35
VAX 11-780 Microcode
Page 36
Writable Control Store (WCS)
§ Implement control store in RAM not ROM
– MOS SRAM memories now almost as fast as control store (core memories/DRAMs were 2-10x slower)
– Bug-free microprograms difficult to write
§ User-WCS provided as option on several minicomputers
– Allowed users to change microcode for each processor
§ User-WCS failed
– Little or no programming tools support
– Difficult to fit software into small space
– Microcode control tailored to original ISA, less useful for others
– Large WCS part of processor state - expensive context switches
– Protection difficult if user can change microcode
– Virtual memory required restartable microcode
Page 37
Microprogramming is far from extinct
§ Played a crucial role in micros of the Eighties
• DEC µVAX, Motorola 68K series, Intel 286/386
§ Plays an assisting role in most modern micros
– e.g., AMD Zen, Intel Skylake, Intel Atom, IBM PowerPC, …
– Most instructions executed directly, i.e., with hard-wired control
– Infrequently-used and/or complicated instructions invoke microcode
§ Patchable microcode common for post-fabrication bug fixes, e.g. Intel processors load µcode patches at bootup
– Intel had to scramble to resurrect microcode tools and find original microcode engineers to patch Meltdown/Spectre security vulnerabilities
Page 38
Acknowledgements
§ These slides contain material developed and copyright by:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)
§ MIT material derived from course 6.823
§ UCB material derived from course CS252