Input / Output CPS 104 Week 14 lecture 1
Dec 14, 2015
CPS 104 2© Alvin R. Lebeck 1998
Administrivia
• HW 5 Due
• HW 6 Assigned
– Due last day of class
Overview
• I/O devices
– device controller
• Rotational media (disks)
• Device drivers
• Memory Mapped I/O
• Programmed I/O
• Direct Memory Access (DMA)
• I/O bus vs. memory bus
• RAID (if time)
I/O Systems
Time(workload) = Time(CPU) + Time(I/O) - Time(Overlap)
[Figure: I/O system block diagram. Components: Processor, Cache, Memory Bus, Main Memory, I/O Bridge, I/O Bus, Disk Controller (two disks), Graphics Controller (graphics), Network Interface (network), interrupts.]
Why I/O?
• Interactive Apps
• Long term storage (files, data repository)
• Swap for VM
• Many different devices
– character vs. block
– Networks are everywhere!
• 10^6 difference: CPU (10^-9 s) & I/O (10^-3 s)
• Response Time vs Throughput
– Not always another process to execute
• OS hides (some) differences in devices
– same (similar) interface to many devices
• Permits many apps to share one device
Device Drivers
• top-half
– API (open, close, read, write, ioctl)
– I/O Control (IOCTL, device specific arguments)
• bottom-half
– interrupt handler
– communicates with device
– resumes process
• Must have access to user address space and device control registers => runs in kernel mode.
Review: Interrupts and Exceptions
• Unnatural change in control flow
• Interrupt is external event
– devices: disk, network, keyboard, etc.
– clock for timeslicing
– these are useful events, must do something when they occur.
• Exception is often potential problem with program
– segmentation fault
– bus error
– divide by 0
– don’t want my bug to crash the entire machine
– page fault (virtual memory…)
Review: Handling an Interrupt/Exception
• Invoke specific kernel routine based on type of interrupt
– interrupt/exception handler
• Must determine what caused interrupt
– could use software to examine each device
– PC = interrupt_handler
• Vectored Interrupts
– PC = interrupt_table[i]
– kernel initializes table at boot time
• Clear the interrupt
• May return from interrupt (RETT) to different process (e.g, context switch)
[Figure: a user program (ld, add, st, mul, beq, ld, sub, bne) is interrupted; control passes to the interrupt handler and its service routines, then returns with RETT.]
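The vectored dispatch described on this slide can be sketched in Python. This is only an illustration of the table lookup, not how a real kernel is written: the handler names, the interrupt numbers, and the `dispatch` function are all hypothetical.

```python
# Hypothetical handlers and interrupt numbers, for illustration only.
def disk_handler():
    return "disk serviced"

def keyboard_handler():
    return "keyboard serviced"

def clock_handler():
    return "timeslice expired"

# The kernel fills in this table once, at boot time.
interrupt_table = [disk_handler, keyboard_handler, clock_handler]

def dispatch(i):
    """Model of PC = interrupt_table[i]: go straight to handler i."""
    handler = interrupt_table[i]
    return handler()

print(dispatch(1))
```

Compared with a single `interrupt_handler` entry point, no software loop over the devices is needed to find out who raised the interrupt.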
Types of Storage Devices
• Magnetic Disks
• Magnetic Tapes
• CD ROM
• Juke Box (automated tape library, robots)
Magnetic Disks
• Long term nonvolatile storage
• Another slower, less expensive level of memory hierarchy
[Figure: disk geometry showing sector, track, cylinder, head, platter, and arm.]
Disk Access
• Access time = queue + seek + rotational + transfer + overhead
• Seek time
– move arm over track
– average is confusing (startup, slowdown, locality of accesses)
• Rotational latency
– wait for sector to rotate under head
– average = 0.5 rotation / (3600 RPM / 60) = 8.3 ms
• Transfer Time
– f(size, BW bytes/sec)
Disk Access Time Example
• Disk Parameters:
– Transfer size is 8K bytes
– Advertised average seek is 12 ms
– Disk spins at 7200 RPM
– Transfer rate is 4 MB/sec
• Controller overhead is 2 ms
• Assume that disk is idle so no queuing delay
• What is Average Disk Access Time for a Sector?
– Ave seek + ave rot delay + transfer time + controller overhead
– 12 ms + 0.5/(7200 RPM/60) + 8 KB/(4 MB/s) + 2 ms
– 12 + 4.17 + 2 + 2 = 20.17, about 20 ms
• Advertised seek time assumes no locality: real seeks typically take 1/4 to 1/3 of the advertised seek time, so the total drops from 20 ms to about 12 ms
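The arithmetic on this slide can be checked with a few lines of Python. This is a sketch under stated assumptions: the function name `disk_access_ms` is made up for the example, and decimal units are assumed (8 KB = 8000 bytes, 4 MB/s = 4,000,000 bytes/s).

```python
def disk_access_ms(seek_ms, rpm, transfer_bytes, rate_bytes_per_s, overhead_ms):
    """Average access time in ms for an idle disk (no queuing delay)."""
    rotation_ms = 0.5 / (rpm / 60.0) * 1000.0            # half a revolution
    transfer_ms = transfer_bytes / rate_bytes_per_s * 1000.0
    return seek_ms + rotation_ms + transfer_ms + overhead_ms

# Assumption: decimal units (8 KB = 8000 bytes, 4 MB/s = 4e6 bytes/s).
advertised = disk_access_ms(12.0, 7200, 8000, 4e6, 2.0)

# With locality, assume the seek takes 1/3 of the advertised 12 ms.
with_locality = disk_access_ms(12.0 / 3, 7200, 8000, 4e6, 2.0)

print(round(advertised, 2), round(with_locality, 2))
```

The two results correspond to the "20 ms => 12 ms" figures quoted on the slide.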
DRAM as Disk
• Solid state disk, Expanded Storage, NVRAM
• Disk is slow, DRAM is fast => replace Disk with battery backed DRAM
• BUT, Disk is cheap, much cheaper than DRAM
• Network Memory
– fast networks (e.g., Myrinet)
– use DRAM of other workstations as backing store
– Trapeze/GMS project here
Alternative Storage
• CD ROM
– read only: good distribution, archiving
• Magnetic Tape
– Sequential Access
– R-DAT (Rotary-head Digital Audio Tape)
» Helical Scan (angle to tape, high density ~5GB)
– Tera to peta bytes of storage (NASA EOS)
Connecting I/O Devices to CPU/Memory
• Memory Bus
– Short
– Fast
– Known set of components
– Proprietary (don’t release design free)
• Separate I/O Bus (e.g., PCI)
– Standard
– Accept variety of components (w/ different BW performance)
– Long
– Slow
Processor Interface Issues
• Interconnections
– Busses
• Processor interface
– I/O Instructions
– Memory mapped I/O
• I/O Control Structures
– Device Controllers
– Polling/Interrupts
• Data movement
– Programmed I/O / DMA
• Capacity, Access Time, Bandwidth
Device Controllers
[Figure: a Device Controller sits between the Bus and the Device. Its registers are Command, Status (Busy, Done, Error), and Data 0 through Data n-1; it may also raise an interrupt.]
Controller deals with mundane control (e.g., position head, error detection/correction)
Processor communicates with Controller
I/O Instructions
Independent I/O Bus
[Figure: CPU connected to Memory over the memory bus and to two Controllers, each with a Device, over an independent I/O bus.]
Separate I/O instructions (in, out)
Common memory & I/O bus
[Figure: CPU, Memory, and two Controllers, each with a Device, on one shared bus.]
Lines distinguish between I/O and memory transfers
VME bus, Multibus-II, Nubus
40 Mbytes/sec optimistically
10 MIP processor completely saturates the bus!
Memory Mapped I/O
Single Memory & I/O Bus
No Separate I/O Instructions
[Figure: CPU, Memory, and two Controllers, each with a Device, on a single bus, with a physical address map covering ROM, RAM, and I/O.]
[Figure: CPU, $, L2 $, Memory Bus, Memory Bus Adapter (bridge), I/O bus, Controller, Device.]
Issue command through store instruction
Check status with load instruction
Caches?
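A toy model may help show what "no separate I/O instructions" means: the same load and store operations reach either RAM or a device register, depending only on the address. Everything here is hypothetical (the `Bus` class, the address ranges, and the register names); it is a sketch of the address decode, not a model of any real machine.

```python
RAM_SIZE = 0x1000          # hypothetical: addresses below this are RAM
CMD_REG = 0x1000           # hypothetical device command register
STATUS_REG = 0x1004        # hypothetical device status register

class Bus:
    def __init__(self):
        self.ram = {}
        self.last_command = None
        self.done = 0

    def store(self, addr, value):
        if addr < RAM_SIZE:
            self.ram[addr] = value          # ordinary memory write
        elif addr == CMD_REG:
            self.last_command = value       # store issues a device command
            self.done = 1                   # this toy device finishes at once
        else:
            raise ValueError("unmapped address")

    def load(self, addr):
        if addr < RAM_SIZE:
            return self.ram.get(addr, 0)    # ordinary memory read
        if addr == STATUS_REG:
            return self.done                # load checks device status
        raise ValueError("unmapped address")

bus = Bus()
bus.store(0x10, 42)        # goes to RAM
bus.store(CMD_REG, 7)      # same operation, but reaches the device
print(bus.load(0x10), bus.load(STATUS_REG))
```

The "Caches?" question on the slide is not modeled here: on real hardware a cached copy of the status register would go stale, so such addresses are normally accessed uncached.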
Communicating with the processor
• Polling
– can waste time waiting for slow I/O device
– busy wait
– can interleave with useful work
• Interrupts
– interrupt overhead
– interrupt could happen anytime - asynchronous
– no busy wait
Data Movement
• Programmed I/O
– processor has to touch all the data
– too much processor overhead
» for high bandwidth devices (disk, network)
• DMA
– processor sets up transfer(s)
– DMA controller transfers data
– complicates memory system
Programmed I/O & Polling
[Figure: CPU, $, L2 $, Memory Bus, Memory Bus Adapter, I/O bus, Controller, Device.]
[Flowchart: Is the data ready? If no, check again. If yes, load data, then store data. Done? If no, repeat; if yes, finish.]
Busy wait loop is not an efficient way to use the CPU unless the device is very fast!
But checks for I/O completion can be dispersed among computationally intensive code.
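The flowchart can be written out as a short Python loop. This is a minimal sketch against a made-up device: `SlowDevice`, its `delay` parameter, and `programmed_io` are hypothetical names, and the "delay" is counted in status checks rather than real time.

```python
class SlowDevice:
    """Hypothetical device that becomes ready after a few status checks."""
    def __init__(self, data, delay):
        self.data = list(data)
        self.delay = delay
        self.countdown = delay

    def ready(self):
        if self.countdown > 0:
            self.countdown -= 1
            return False
        return True

    def read(self):
        self.countdown = self.delay     # the next item takes time again
        return self.data.pop(0)

def programmed_io(device, count):
    memory, wasted_checks = [], 0
    while len(memory) < count:          # done?
        while not device.ready():       # is the data ready?
            wasted_checks += 1          # busy wait
        memory.append(device.read())    # load data, store data
    return memory, wasted_checks

memory, wasted = programmed_io(SlowDevice(b"abc", delay=5), 3)
print(memory, wasted)
```

The `wasted` counter is the cost the slide warns about; spreading the `ready()` checks through other computation would turn those iterations into useful work.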
Interrupt Driven Data Transfer
[Figure: (1) an I/O interrupt arrives while the user program (add, sub, and, or, nop) runs; (2) the PC is saved; (3) control goes to the interrupt service address; (4) the interrupt service routine (read, store, ..., rti) moves data to memory.]
[Figure: CPU, $, L2 $, Memory Bus, Memory Bus Adapter, I/O bus, Controller, Device.]
User program progress only halted during actual transfer
Interrupt overhead can dominate transfer time.
1000 xfers of 1000 bytes each: 2 usecs for interrupt, 98 usecs for service
Device xfer rate: 10 MB/s => .1 usec/byte => .1 ms for 1000 bytes
Direct Memory Access
Time to do 1000 x 1000 bytes:
1 DMA set-up sequence @ 50 µsec
1 interrupt @ 2 µsec
1 interrupt service sequence @ 48 µsec
.0001 second of CPU time
CPU sends a starting address, direction, and length count to DMAC. Then issues "start".
DMAC provides handshake signals for device controller, and memory addresses and handshake signals for memory.
[Figure: memory-mapped address space from 0 to n containing ROM, RAM, Peripherals, and DMAC.]
[Figure: CPU, $, L2 $, Memory Bus, Memory Bus Adapter, I/O bus, DMA CNTRL.]
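The CPU cost of the two approaches can be compared directly. The sketch below assumes the interrupt-driven figures (2 µsec per interrupt, 98 µsec per service routine) are charged once per transfer, while the DMA figures are charged once for the whole job; the variable names are made up for the example.

```python
US = 1e-6  # one microsecond, in seconds

transfers = 1000

# Interrupt-driven: every transfer costs one interrupt plus its service routine.
interrupt_cpu_s = transfers * (2 * US + 98 * US)

# DMA: one set-up, one completion interrupt, one service sequence in total.
dma_cpu_s = 50 * US + 2 * US + 48 * US

print(interrupt_cpu_s, dma_cpu_s, interrupt_cpu_s / dma_cpu_s)
```

Under these assumptions the interrupt-driven version spends about 0.1 s of CPU time, against the .0001 s quoted for DMA.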
I/O Data Flow
[Figure: data path from application to disk. Application Address Space -> (Memory-to-Memory Copy) -> OS Buffers (>10 MByte) -> (DMA over Peripheral Bus) -> HBA Buffers (1 M - 4 MBytes) -> (Xfer over Disk Channel) -> Track Buffers (32K - 256 KBytes) -> (Xfer over Serial Interface) -> Head/Disk Assembly. Levels: Host Processor, I/O Controller, Embedded Controller, I/O Device.]
Impediment to high performance: multiple copies, complex hierarchy
Communication Networks
Performance limiter is memory system, OS overhead, not HW protocols
[Figure: a node's Processor and Memory share the Peripheral Backplane Bus with a Network Controller (Control Reg. I/F, Net I/F, Request Block, Receive Block, DMA) attached to the Media. Processor Memory holds a list of request blocks (data to be transmitted), a list of receive blocks (data received), and a list of free blocks, all moved by DMA.]
• Send/receive queues in processor memories
• Network controller copies back and forth via DMA
• No host intervention needed
• Interrupt host when message sent or received
Relationship to Processor Architecture
• Virtual memory frustrates DMA
– page faults during DMA?
• Synchronization between controller and CPU
• Caches required for processor performance cause problems for I/O
– Flushing is expensive, I/O pollutes cache
– Solution is borrowed from shared memory multiprocessors: "snooping" (coherent DMA)
• Caches and write buffers
– need uncached and write buffer flush for memory mapped I/O
Bus Arbitration
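Parallel (Centralized) Arbitration
[Figure: each master (M) has its own BR and BG lines to the arbitration unit (A.U.).]
Serial Arbitration (daisy chaining)
[Figure: masters (M) are chained from BGo to BGi and share the BR and BG lines of the arbitration unit (A.U.).]
Self Selection
Collision Detection
BR = Bus Request, BG = Bus Grant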
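Serial (daisy chain) arbitration is easy to model: the grant enters the first master and is passed along until it reaches one that is requesting the bus. The function below is a hypothetical sketch of that idea only; it ignores timing and fairness.

```python
def daisy_chain_grant(requests):
    """Return the index of the master that keeps the grant, or None.

    requests[i] is True if master i asserts BR. The grant enters master 0
    on BGi; a master that is not requesting passes it on through BGo.
    """
    grant_in = any(requests)        # arbiter asserts BG if anyone asserts BR
    for i, wants_bus in enumerate(requests):
        if grant_in and wants_bus:
            return i                # master i keeps the grant
        # otherwise BGo = BGi, and the grant moves down the chain
    return None

print(daisy_chain_grant([False, True, True]))
```

Priority is fixed by position in the chain, so a master far from the arbiter can be starved by busy masters ahead of it.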
Bus Options
Option | High performance | Low cost
Bus width | Separate address & data lines | Multiplex address & data lines
Data width | Wider is faster (e.g., 32 bits) | Narrower is cheaper (e.g., 8 bits)
Transfer size | Multiple words has less bus overhead | Single-word transfer is simpler
Bus masters | Multiple (requires arbitration) | Single master (no arbitration)
Split transaction? | Yes: separate Request and Reply packets gets higher bandwidth (needs multiple masters) | No: continuous connection is cheaper and has lower latency
Clocking | Synchronous | Asynchronous
Asynchronous Handshake
Write Transaction
[Timing diagram: Address, Data, Read, Req., and Ack. lines over t0 to t5. Master Asserts Address, Master Asserts Data, then Next Address. 4 Cycle Handshake.]
t0: Master has obtained control and asserts address, direction, data
Waits a specified amount of time for slaves to decode target
t1: Master asserts request line
t2: Slave asserts ack, indicating data received
t3: Master releases req
t4: Slave releases ack
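The write transaction steps can be traced in code. This sketch only records the order in which req and ack change; the function name `write_handshake` and the trace format are made up for the example, and the decode delay after t0 is not modeled.

```python
def write_handshake(data):
    """Trace req/ack through one asynchronous write; returns (trace, latched)."""
    req = ack = 0
    trace = []
    bus_data = data                  # t0: master asserts address, direction, data
    trace.append(("t0", req, ack))
    req = 1                          # t1: master asserts request line
    trace.append(("t1", req, ack))
    latched = bus_data               # slave captures the data...
    ack = 1                          # t2: ...and asserts ack
    trace.append(("t2", req, ack))
    req = 0                          # t3: master releases req
    trace.append(("t3", req, ack))
    ack = 0                          # t4: slave releases ack
    trace.append(("t4", req, ack))
    return trace, latched

trace, latched = write_handshake(0xAB)
for step in trace:
    print(step)
```

The four signal edges (req up, ack up, req down, ack down) are what makes this a 4 cycle handshake; neither side depends on a shared clock.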
Read Transaction
Time Multiplexed Bus: address and data share lines
[Timing diagram: Address, Data, Read, Req, and Ack lines over t0 to t5. Master Asserts Address, then Next Address. 4 Cycle Handshake.]
t0: Master has obtained control and asserts address, direction
Waits a specified amount of time for slaves to decode target
t1: Master asserts request line
t2: Slave asserts ack, indicating ready to transmit data
t3: Master releases req, data received
t4: Slave releases ack
Manufacturing Advantages of Disk Arrays
Disk Product Families
Conventional: 4 disk designs (14”, 10”, 5.25”, 3.5”), from low end to high end
Disk Array: 1 disk design (3.5”)
Redundant Arrays of Disks
• Files are "striped" across multiple spindles
• Redundancy yields high data availability
– Disks will fail
– Contents reconstructed from data redundantly stored in the array
– Capacity penalty to store it
– Bandwidth penalty to update
• Techniques:
– Mirroring/Shadowing (high capacity cost)
– Horizontal Hamming Codes (overkill)
– Parity & Reed-Solomon Codes
– Failure Prediction (no capacity overhead!)
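Of the techniques listed, parity is the simplest to demonstrate. The sketch below uses byte-wise XOR parity across three made-up data blocks and rebuilds one of them after a simulated failure; the block contents and the `parity` helper are illustrative assumptions, not a description of any particular RAID level.

```python
from functools import reduce

def parity(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data_disks = [b"disk", b"will", b"fail"]      # striped data, one block per disk
parity_disk = parity(data_disks)              # capacity penalty: one extra disk

# Disk 1 fails; rebuild its block from the survivors plus parity.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = parity(survivors)
print(rebuilt)
```

The bandwidth penalty shows up on writes: changing any data block also requires recomputing and rewriting the parity block.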
Summary
• I/O devices
– device controller
• Rotational media (disks)
• Device drivers (two parts)
– help isolate specifics of device
• Memory Mapped I/O
• Programmed I/O
• Direct Memory Access (DMA)
• I/O bus vs. memory bus
• RAID
Homework 6
Interrupt Handler
• MIPS/SPIM program
• Use memory-mapped I/O
• Use interrupts
• Program should:
– Accept keyboard input
» interrupts
– Echo input to terminal
» polling
– Exit if user typed ‘q’
• Programmed I/O?
Terminal Control
• Memory mapped I/O
– use LW, SW
• -mapped_io command line option
• Receiver - input
– ready=1 when data valid
• Transmitter
– ready=1 when ready to print next char
[Figure: terminal registers.
Receiver control (0xffff0000): bit 1 = Interrupt enable, bit 0 = Ready, remaining bits unused
Receiver data (0xffff0004): low 8 bits = Received byte, remaining bits unused
Transmitter control (0xffff0008): bit 1 = Interrupt enable, bit 0 = Ready, remaining bits unused
Transmitter data (0xffff000c): low 8 bits = Transmitted byte, remaining bits unused]
Interrupt Driven I/O
• Set Interrupt Enable = 1
– generates a level 0 interrupt when Ready becomes 1
– if interrupt is enabled in Status Register also
• Run spim with -notrap
– allows you to install interrupt handler
[Figure: Receiver control (0xffff0000): bit 1 = Interrupt enable, bit 0 = Ready, remaining bits unused.]
Status Register
• Bit 0 = interrupt enable
• Bit 8 = allow level 0 interrupts
– terminal input generates level 0 int.
• Coprocessor 0, register 12
– use mfc0, mtc0
• On interrupt, bits 0-5 are shifted left by 2
– disables interrupts and enters kernel mode
• When done servicing interrupt, use rfe to restore
[Figure: Status register. Bits 15-8: Interrupt mask. Bits 5-4: Old kernel/user and interrupt enable. Bits 3-2: Previous kernel/user and interrupt enable. Bits 1-0: Current kernel/user and interrupt enable.]
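The shift on interrupt and the restore on rfe can be tried out on plain integers. This is a sketch, not SPIM code: the function names are invented, and the rfe behavior shown (previous bits copied to current, old bits copied to previous, old bits left unchanged) is an assumption based on the usual MIPS R2000 description rather than something stated on the slide.

```python
LOW6 = 0x3F  # kernel/user and interrupt-enable bits: old, previous, current

def take_interrupt(status):
    """Shift bits 0-5 left by 2; bits 1-0 become 0 (kernel mode, interrupts off)."""
    return (status & ~LOW6) | ((status << 2) & LOW6)

def rfe(status):
    """Assumed rfe behavior: previous -> current, old -> previous, old unchanged."""
    low = status & LOW6
    return (status & ~LOW6) | (low & 0x30) | (low >> 2)

user_mode = 0x0103            # bit 8 mask set, user mode, interrupts enabled
in_handler = take_interrupt(user_mode)
print(hex(in_handler), hex(rfe(in_handler)))
```

Because the old and previous pairs form a small stack, the processor can return to the interrupted mode without the handler having to save these bits itself.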
Cause Register
• Exception code 0000 = external interrupt
– terminal interrupt
[Figure: Cause register. Bits 15-10: Pending interrupts. Bits 5-2: Exception code.]
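An interrupt handler typically starts by pulling these two fields out of the Cause register. The sketch below uses the bit positions shown on the slide; the function name `decode_cause` and the sample value are hypothetical.

```python
def decode_cause(cause):
    """Split a Cause register value into the two fields shown on the slide."""
    pending = (cause >> 10) & 0x3F          # bits 15-10: pending interrupts
    exception_code = (cause >> 2) & 0xF     # bits 5-2: exception code
    return pending, exception_code

pending, code = decode_cause(0x0400)        # hypothetical value: bit 10 set
is_external_interrupt = (code == 0)         # code 0000 = external interrupt
print(pending, code, is_external_interrupt)
```

In a MIPS handler the same extraction would be done with a shift and a mask after reading the register with mfc0.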