CS 110 Computer Architecture Lecture 24: More I/O: DMA, Disks, Networking. Instructor: Sören Schwertfeger. http://shtech.org/courses/ca/ School of Information Science and Technology SIST, ShanghaiTech University. Slides based on UC Berkeley's CS61C

May 15, 2022

Page 1: CS 110 Computer Architecture Lecture 24

CS 110 Computer Architecture

Lecture 24: More I/O: DMA, Disks, Networking

Instructor: Sören Schwertfeger

http://shtech.org/courses/ca/

School of Information Science and Technology SIST

ShanghaiTech University

1

Slides based on UC Berkeley's CS61C

Page 2

Virtual Memory Review

2

Page 3

3

Modern Virtual Memory Systems: Illusion of a large, private, uniform store

Protection & Privacy
* Many processes, each with its own private address space and one or more shared address spaces

Demand Paging
* Many processes share DRAM.
* Provides the ability to run programs with a large address space. Pages that aren’t yet allocated, or that don’t fit in DRAM, are swapped to secondary storage.
* Hides differences in machine configurations

The price is address translation on each memory reference

[Figure: the OS and Prog1 share primary memory, which is backed by a swapping store (disk); a TLB caches the VA-to-PA mapping]

Page 4

4

Private (Virtual) Address Space per Program

• Each program has a page table
• The page table contains an entry for each page of the program

[Figure: Prog 1, Prog 2, and Prog 3 each translate the same virtual address VA1 through their own page table to different pages of physical memory, which also holds OS pages and free pages]

Page 5

5

Translation Lookaside Buffers (TLB)

Address translation is very expensive!

In a two-level page table, each reference becomes several memory accesses

Solution: Cache translations in TLB
– TLB hit => Single-Cycle Translation
– TLB miss => Page-Table Walk to refill

[Figure: the virtual address (VPN | offset) is looked up in the TLB, whose entries hold V, R, W, D bits, a tag, and the PPN; on a hit, the physical address is PPN | offset]

(VPN = virtual page number)

(PPN = physical page number)

Page 6

6

Page-Based Virtual-Memory Machine (Hardware Page-Table Walk)

[Figure: a pipeline (PC, Decode, Execute, Memory, Writeback) with an Inst. TLB in front of the Inst. Cache and a Data TLB in front of the Data Cache; each TLB lookup can raise a page fault or protection violation. On a TLB miss, the Hardware Page Table Walker, starting from the Page-Table Base Register, reads page tables through the Memory Controller from Main Memory (DRAM) using physical addresses.]

• Page tables are held in untranslated physical memory

Page 7

7

Address Translation: putting it all together

[Figure: flowchart]
Virtual Address => TLB Lookup (hardware)
– hit => Protection Check (hardware)
   – permitted => Physical Address (to cache)
   – denied => Protection Fault (software) => SEGFAULT (Where?)
– miss => Page Table Walk (hardware or software)
   – the page is in memory => Update TLB (hardware or software)
   – the page is not in memory => Page Fault (software: OS loads page)

Page 8

Review: I/O
• “Memory mapped I/O”: Device control/data registers mapped to CPU address space
• CPU synchronizes with I/O device:
– Polling
– Interrupts
• “Programmed I/O”:
– CPU execs lw/sw instructions for all data movement to/from devices
– CPU spends time doing 2 things:
1. Getting data from device to main memory
2. Using data to compute

8

Page 9

Working with real devices
• “Memory mapped I/O”: Device control/data registers mapped to CPU address space
• CPU synchronizes with I/O device:
– Polling
– Interrupts
• “Programmed I/O” => DMA
– CPU execs lw/sw instructions for all data movement to/from devices
– CPU spends time doing 2 things:
1. Getting data from device to main memory
2. Using data to compute

9

Page 10

Agenda

• Direct Memory Access (DMA)
• Disks
• Networking

10

Page 11

What’s wrong with Programmed I/O?

• Not ideal because…
1. CPU has to execute all transfers, could be doing other work
2. Device speeds don’t align well with CPU speeds
3. Energy cost of using beefy general-purpose CPU where simpler hardware would suffice
• Until now CPU has sole control of main memory

11

Page 12

PIO vs. DMA

12

Page 13

Direct Memory Access (DMA)

• Allows I/O devices to directly read/write main memory
• New Hardware: the DMA Engine
• DMA engine contains registers written by CPU:
– Memory address to place data
– # of bytes
– I/O device #, direction of transfer
– unit of transfer, amount to transfer per burst

13

Page 14

Operation of a DMA Transfer

14

[From Section 5.1.4, Direct Memory Access, in Modern Operating Systems by Andrew S. Tanenbaum, Herbert Bos, 2014]

Page 15

DMA: Incoming Data

1. Receive interrupt from device
2. CPU takes interrupt, begins transfer
– Instructs DMA engine/device to place data @ certain address
3. Device/DMA engine handle the transfer
– CPU is free to execute other things
4. Upon completion, Device/DMA engine interrupt the CPU again

15

Page 16

DMA: Outgoing Data

1. CPU decides to initiate transfer, confirms that external device is ready
2. CPU begins transfer
– Instructs DMA engine/device that data is available @ certain address
3. Device/DMA engine handle the transfer
– CPU is free to execute other things
4. Device/DMA engine interrupt the CPU again to signal completion

16

Page 17

DMA: Some new problems

• Where in the memory hierarchy do we plug in the DMA engine? Two extremes:
– Between L1 and CPU:
• Pro: Free coherency
• Con: Trash the CPU’s working set with transferred data
– Between Last-level cache and main memory:
• Pro: Don’t mess with caches
• Con: Need to explicitly manage coherency

17

Page 18

DMA: Some new problems

• How do we arbitrate between CPU and DMA Engine/Device access to memory? Three options:
– Burst Mode
• Start transfer of data block, CPU cannot access memory in the meantime
– Cycle Stealing Mode
• DMA engine transfers a byte, releases control, then repeats; interleaves processor/DMA engine accesses
– Transparent Mode
• DMA transfer only occurs when CPU is not using the system bus

18

Page 19

Agenda

• Direct Memory Access (DMA)
• Disks
• Networking

19

Page 20

Computer Memory Hierarchy

20

[Figure label: Today]

Page 21

Magnetic Disk – common I/O device
• A kind of computer memory
– Information stored by magnetizing ferrite material on surface of rotating disk
• similar to tape recorder except digital rather than analog data
• A type of non-volatile storage
– retains its value without applying power to disk.
• Two Types of Magnetic Disk
1. Hard Disk Drives (HDD) – faster, more dense, non-removable.
2. Floppy disks – slower, less dense, removable (now replaced by USB “flash drive”).
• Purpose in computer systems (Hard Drive):
1. Working file system + long-term backup for files
2. Secondary “backing store” for main-memory. Large, inexpensive, slow level in the memory hierarchy (virtual memory)

21

Page 22

Photo of Disk Head, Arm, Actuator

[Photo labels: Arm, Head, Spindle]

22

Page 23

Disk Device Terminology

• Several platters, with information recorded magnetically on both surfaces (usually)
• Bits recorded in tracks, which in turn are divided into sectors (e.g., 512 Bytes)
• Actuator moves head (end of arm) over track (“seek”), waits for sector to rotate under head, then reads or writes

[Figure labels: Outer Track, Inner Track, Sector, Actuator, Head, Arm, Platter]

23

Page 24

Hard Drives are Sealed. Why?
• The closer the head to the disk, the smaller the “spot size” and thus the denser the recording.
– Measured in Gbit/in^2
– ~900 Gbit/in^2 is state of the art
– Started out at 2 Kbit/in^2
– ~450,000,000x improvement in ~60 years
• Disks are sealed to keep the dust out.
– Heads are designed to “fly” at around 3-20 nm above the surface of the disk.
– 99.999% of the head/arm weight is supported by the air bearing force (air cushion) developed between the disk and the head.

24

[Figure label: 3-20 nm]

Page 25

Disk Device Performance (1/2)

• Disk Access Time = Seek Time + Rotation Time + Transfer Time + Controller Overhead
– Seek Time = time to position the head assembly at the proper cylinder
– Rotation Time = time for the disk to rotate to the point where the first sectors of the block to access reach the head
– Transfer Time = time taken by the sectors of the block and any gaps between them to rotate past the head

[Figure labels: Platter, Arm, Actuator, Head, Sector, Inner Track, Outer Track, Controller, Spindle]

25

Page 26

Disk Device Performance (2/2)

• Average values to plug into the formula:
• Rotation Time: Average distance of sector from head?
– 1/2 time of a rotation
• 7200 Revolutions Per Minute => 120 Rev/sec
• 1 revolution = 1/120 sec => 8.33 milliseconds
• 1/2 rotation (revolution) => 4.17 ms
• Seek time: Average no. tracks to move arm?
– Number of tracks/3
– Then, seek time = number of tracks moved × time to move across one track

26

Page 27

But wait!

• Performance estimates are different in practice:
• Many disks have on-disk caches, which are completely hidden from the outside world
– On a cache hit, the previous formula is replaced entirely by the on-disk cache access time

27

Page 28

Where does Flash memory come in?
• ~10 years ago: Microdrives and Flash memory (e.g., CompactFlash) went head-to-head
– Both non-volatile (retain contents without power supply)
– Flash benefits: lower power, no crashes (no moving parts, no need to spin µdrives up/down)
– Disk cost = fixed cost of motor + arm mechanics, but actual magnetic media cost very low
– Flash cost = mostly cost/bit of flash chips
– Over time, cost/bit of flash came down, became cost competitive

28

Page 29

Flash Memory / SSD Technology

• NMOS transistor with an additional conductor between gate and source/drain which “traps” electrons. The presence/absence is a 1 or 0
• Memory cells can withstand a limited number of program-erase cycles. Controllers use a technique called wear leveling to distribute writes as evenly as possible across all the flash blocks in the SSD.

29

Page 30

What did Apple put in its iPods?

[Figure: shuffle: Toshiba flash, 2 GB; nano: Samsung flash, 16 GB; classic: Toshiba 1.8-inch HDD, 80, 120, 160 GB; touch: Toshiba flash, 32, 64 GB]

30

Page 31

Flash Memory in Smart Phones

31

iPhone 6: up to 128 GB

Page 32

Flash Memory in Laptops – Solid State Drive (SSD)

32

capacities up to 1 TB

Page 33

HDD vs SSD speed

33

Page 34

34

Page 35

Question
• We have the following disk:
– 15000 Cylinders, 1 ms to cross 1000 Cylinders
– 15000 RPM = 4 ms per rotation
– Want to copy 1 MB, transfer rate of 1000 MB/s
– 1 ms controller processing time
• What is the access time using our model?

Disk Access Time = Seek Time + Rotation Time + Transfer Time + Controller Processing Time

35

A: 10.5 ms   B: 9 ms   C: 8.5 ms   D: 11.4 ms   E: 12 ms

Page 36

Question

• We have the following disk:
– 15000 Cylinders, 1 ms to cross 1000 Cylinders
– 15000 RPM = 4 ms per rotation
– Want to copy 1 MB, transfer rate of 1000 MB/s
– 1 ms controller processing time

• What is the access time?
Seek = # cylinders/3 × time = 15000/3 × 1 ms/1000 cylinders = 5 ms
Rotation = time for ½ rotation = 4 ms / 2 = 2 ms
Transfer = Size / transfer rate = 1 MB / (1000 MB/s) = 1 ms
Controller = 1 ms
Total = 5 + 2 + 1 + 1 = 9 ms

36

Page 37

Agenda

• Direct Memory Access (DMA)
• Disks
• Networking

37

Page 38

Networks: Talking to the Outside World

• Originally sharing I/O devices between computers
– E.g., printers
• Then communicating between computers
– E.g., file transfer protocol
• Then communicating between people
– E.g., e-mail
• Then communicating between networks of computers
– E.g., file sharing, www, …

38

Page 39

The Internet (1962)

• History
– 1963: JCR Licklider, while at DoD’s ARPA, writes a memo describing the desire to connect the computers at various research universities: Stanford, Berkeley, UCLA, ...
– 1969: ARPA deploys 4 “nodes” @ UCLA, SRI, Utah, & UCSB
– 1973: Robert Kahn & Vint Cerf invent TCP, now part of the Internet Protocol Suite
• Internet growth rates
– Exponential since start!

www.computerhistory.org/internet_history
www.greatachievements.org/?id=3736
en.wikipedia.org/wiki/Internet_Protocol_Suite

[Photos: JCR Licklider (“Lick”); Vint Cerf: “Revolutions like this don't come along very often”]

39

Page 40

The World Wide Web (1989)

• “System of interlinked hypertext documents on the Internet”
• History
– 1945: Vannevar Bush describes a hypertext system called “memex” in an article
– 1989: Sir Tim Berners-Lee proposed the Web; in 1990 he implemented the first successful communication between a Hypertext Transfer Protocol (HTTP) client and server over the Internet.
– ~2000: Dot-com entrepreneurs rushed in; in 2001 the bubble burst
• Today: Access anywhere!

en.wikipedia.org/wiki/History_of_the_World_Wide_Web

[Photos: Tim Berners-Lee; the world’s first web server, 1990]

40

Page 41

Shared vs. Switch-Based Networks

• Shared vs. Switched:
• Shared: 1 at a time (CSMA/CD)
• Switched: pairs (“point-to-point” connections) communicate at same time
• Aggregate bandwidth (BW) in switched network is many times that of shared:
• point-to-point faster since no arbitration, simpler interface

[Figure: three Nodes on a Shared medium vs. four Nodes connected through a Crossbar Switch]

41

Page 42

What makes networks work?
• Links connecting switches and/or routers to each other and to computers or devices

[Figure: a computer's network interface connected to a network of switches]

• Ability to name the components and to route packets of information (messages) from a source to a destination
• Layering, redundancy, protocols, and encapsulation as means of abstraction (big idea in Computer Architecture)

42

Page 43

Software Protocol to Send and Receive
• SW Send steps
1: Application copies data to OS buffer
2: OS calculates checksum, starts timer
3: OS sends data to network interface HW and says start
• SW Receive steps
3: OS copies data from network interface HW to OS buffer
2: OS calculates checksum; if OK, sends ACK; if not, deletes message (sender resends when timer expires)
1: If OK, OS copies data to user address space, & signals application to continue

[Figure: packet format. Header: Dest Net ID, Src Net ID, Len, ACK INFO. Payload: CMD/ Address /Data. Trailer: Checksum]

43

Page 44

What does it take to send packets across the globe?
• Bits on wire or air
• Packets on wire or air
• Deliver packets within a single physical network
• Deliver packets across multiple networks
• Ensure the destination received the data
• Create data at the sender and make use of the data at the receiver

Protocols for Networks of Networks?

44

Page 45

Lots to do and at multiple levels!

Use abstraction to cope with complexity of communication

• Hierarchy of layers:
- Application (chat client, game, etc.)
- Transport (TCP, UDP)
- Network (IP)
- Data Link Layer (Ethernet)
- Physical Link (copper, wireless, etc.)

Protocol for Networks of Networks?

45

Page 46

Protocol Family Concept
• Protocol: packet structure and control commands to manage communication
• Protocol families (suites): a set of cooperating protocols that implement the network stack
• Key to protocol families is that communication occurs logically at the same level of the protocol, called peer-to-peer…
…but is implemented via services at the next lower level
• Encapsulation: carry higher level information within lower level “envelope”

46

Page 47

Inspiration…

[Figure: a letter reading “Dear John, Your days are numbered. --Pat”]

• CEO A writes letter to CEO B
– Folds letter and hands it to assistant
• Assistant:
– Puts letter in envelope with CEO B’s full name
– Takes to FedEx
• FedEx Office
– Puts letter in larger envelope
– Puts name and street address on FedEx envelope
– Puts package on FedEx delivery truck
• FedEx delivers to other company

47

Page 48

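The Path of the Letter

[Figure: CEO => Aide => FedEx on the sending side, FedEx Location => Aide => CEO on the receiving side. The Letter (semantic content) travels in an Envelope (identity), which travels in a FedEx Envelope (FE).]

“Peers” on each side understand the same things. No one else needs to.

Lowest level has most packaging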

48

Page 49

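Protocol Family Concept

[Figure: on each side a Message passes down the stack; each layer wraps what it receives in its own header (H) and trailer (T). The Actual path runs down one stack, across the Physical link, and up the other, while each layer's Logical communication is with its peer layer on the other side.]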

49

Each lower level of the stack “encapsulates” information from the layer above by adding a header and trailer.

Page 50

Most Popular Protocol for Network of Networks

• Transmission Control Protocol/Internet Protocol (TCP/IP)
• This protocol family is the basis of the Internet, a WAN (wide area network) protocol
• IP makes best effort to deliver
• Packets can be lost, corrupted
• TCP guarantees delivery
• TCP/IP so popular it is used even when communicating locally: even across homogeneous LAN (local area network)

50

Page 51

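TCP/IP packet, Ethernet packet, protocols

• Application sends message

[Figure: the Message is carried as TCP data behind a TCP Header; that segment is IP Data behind an IP Header; the IP packet is in turn wrapped in an Ethernet header and trailer]

• TCP breaks into 64 KiB segments, adds 20 B header
• IP adds 20 B header, sends to network
• If Ethernet, broken into 1500 B packets with headers, trailers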

51

Page 52

“And in conclusion…”
• I/O gives computers their 5 senses
• I/O speed range is 100-million to one
• Polling vs. Interrupts
• DMA to avoid wasting CPU time on data transfers
• Disks for persistent storage, replaced by flash
• Networks: computer-to-computer I/O
– Protocol suites allow networking of heterogeneous components. Abstraction!!!

52