University of Texas at Austin
CS352H: Computer Systems Architecture
Topic 13: I/O Systems
November 3, 2009
Don Fussell
Introduction

I/O devices can be characterized by
  Behavior: input, output, or storage
  Partner: human or machine
  Data rate: bytes/sec, transfers/sec
I/O bus connections
I/O System Characteristics
Dependability is important
  Particularly for storage devices
Performance measures
  Latency (response time)
  Throughput (bandwidth)
Desktops & embedded systems
  Mainly interested in response time & diversity of devices
Servers
  Mainly interested in throughput & expandability of devices
Dependability
Fault: failure of a component
  May or may not lead to system failure
Service accomplishment: service delivered as specified
Service interruption: deviation from specified service
A failure moves the system from service accomplishment to service interruption; restoration moves it back
Dependability Measures
Reliability: mean time to failure (MTTF)
Service interruption: mean time to repair (MTTR)
Mean time between failures: MTBF = MTTF + MTTR
Availability = MTTF / (MTTF + MTTR)  (computed in the sketch below)
Improving availability
  Increase MTTF: fault avoidance, fault tolerance, fault forecasting
  Reduce MTTR: improved tools and processes for diagnosis and repair
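A minimal sketch of these measures in C; the MTTF and MTTR values are hypothetical illustrations, not figures from the slides.

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical values, for illustration only */
    double mttf_hours = 1000000.0;  /* mean time to failure */
    double mttr_hours = 24.0;       /* mean time to repair  */

    double mtbf = mttf_hours + mttr_hours;      /* MTBF = MTTF + MTTR         */
    double availability = mttf_hours / mtbf;    /* MTTF / (MTTF + MTTR)       */

    printf("MTBF         = %.0f hours\n", mtbf);
    printf("Availability = %.6f (%.4f%% uptime)\n",
           availability, availability * 100.0);
    return 0;
}
```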
Disk Storage
Nonvolatile, rotating magnetic storage
Disk Sectors and Access
Each sector records
  Sector ID
  Data (512 bytes, 4096 bytes proposed)
  Error correcting code (ECC)
    Used to hide defects and recording errors
  Synchronization fields and gaps
Access to a sector involves
  Queuing delay if other accesses are pending
  Seek: move the heads
  Rotational latency
  Data transfer
  Controller overhead
Disk Access Example
Given
  512B sector, 15,000 rpm, 4ms average seek time, 100MB/s transfer rate, 0.2ms controller overhead, idle disk
Average read time
  4ms seek time
  + ½ rotation / (15,000/60 rotations per sec) = 2ms rotational latency
  + 512B / 100MB/s = 0.005ms transfer time
  + 0.2ms controller delay
  = 6.2ms
If actual average seek time is 1ms
  Average read time = 3.2ms
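A small C sketch of the same calculation, using the parameters given above:

```c
#include <stdio.h>

int main(void) {
    /* Parameters from the example above */
    double sector_bytes  = 512.0;
    double rpm           = 15000.0;
    double avg_seek_ms   = 4.0;
    double transfer_MBps = 100.0;
    double controller_ms = 0.2;

    double rotational_ms = 0.5 / (rpm / 60.0) * 1000.0;               /* half a rotation */
    double transfer_ms   = sector_bytes / (transfer_MBps * 1e6) * 1000.0;

    double read_ms = avg_seek_ms + rotational_ms + transfer_ms + controller_ms;
    printf("Average read time: %.3f ms\n", read_ms);                  /* about 6.2 ms */
    return 0;
}
```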
Disk Performance Issues
Manufacturers quote average seek time
  Based on all possible seeks
  Locality and OS scheduling lead to smaller actual average seek times
Smart disk controllers allocate physical sectors on disk
  Present a logical sector interface to the host
  SCSI, ATA, SATA
Disk drives include caches
  Prefetch sectors in anticipation of access
  Avoid seek and rotational delay
Disk Specs
Flash Storage
Nonvolatile semiconductor storage
  100× – 1000× faster than disk
  Smaller, lower power, more robust
  But more $/GB (between disk and DRAM)
Flash Types
NOR flash: bit cell like a NOR gate
  Random read/write access
  Used for instruction memory in embedded systems
NAND flash: bit cell like a NAND gate
  Denser (bits/area), but block-at-a-time access
  Cheaper per GB
  Used for USB keys, media storage, …
Flash bits wear out after 1000s of accesses
  Not suitable for direct RAM or disk replacement
  Wear leveling: remap data to less-used blocks (see the sketch below)
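A minimal sketch of the wear-leveling idea, assuming a hypothetical flash translation layer that keeps an erase count per physical block and remaps a logical block to the least-worn free physical block on each rewrite:

```c
#include <stdint.h>

#define NUM_BLOCKS 1024

/* Hypothetical bookkeeping kept by a flash translation layer */
static uint32_t erase_count[NUM_BLOCKS];          /* wear per physical block   */
static int      logical_to_physical[NUM_BLOCKS];  /* current mapping           */
static int      in_use[NUM_BLOCKS];               /* physical block allocated? */

/* Call once at start-up: all logical blocks unmapped */
void wear_level_init(void) {
    for (int i = 0; i < NUM_BLOCKS; i++) {
        logical_to_physical[i] = -1;
        in_use[i] = 0;
        erase_count[i] = 0;
    }
}

/* Pick the least-worn free physical block for the next write */
static int least_worn_free_block(void) {
    int best = -1;
    for (int p = 0; p < NUM_BLOCKS; p++) {
        if (!in_use[p] && (best < 0 || erase_count[p] < erase_count[best]))
            best = p;
    }
    return best;
}

/* Remap a logical block before rewriting it, spreading erases evenly */
int wear_level_remap(int logical) {
    int old_phys = logical_to_physical[logical];
    int new_phys = least_worn_free_block();
    if (new_phys < 0) return -1;                   /* no free block available */

    in_use[new_phys] = 1;
    logical_to_physical[logical] = new_phys;

    if (old_phys >= 0) {                           /* recycle the old block   */
        erase_count[old_phys]++;
        in_use[old_phys] = 0;
    }
    return new_phys;
}
```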
Flash Types
Flash Specs
Interconnecting Components
Need interconnections between
  CPU, memory, I/O controllers
Bus: shared communication channel
  Parallel set of wires for data and synchronization of data transfers
  Can become a bottleneck
Performance limited by physical factors
  Wire length, number of connections
More recent alternative: high-speed serial connections with switches
  Like networks
Bus Types
Processor-memory buses
  Short, high speed
  Design is matched to memory organization
I/O buses
  Longer, allowing multiple connections
  Specified by standards for interoperability
  Connect to the processor-memory bus through a bridge
Bus Signals and Synchronization
Data lines
  Carry address and data
  Multiplexed or separate
Control lines
  Indicate data type, synchronize transactions
Synchronous
  Uses a bus clock
Asynchronous
  Uses request/acknowledge control lines for handshaking
I/O Bus Examples

|                     | Firewire          | USB 2.0                     | PCI Express                             | Serial ATA | Serial Attached SCSI |
|---------------------|-------------------|-----------------------------|-----------------------------------------|------------|----------------------|
| Intended use        | External          | External                    | Internal                                | Internal   | External             |
| Devices per channel | 63                | 127                         | 1                                       | 1          | 4                    |
| Data width          | 4                 | 2                           | 2/lane                                  | 4          | 4                    |
| Peak bandwidth      | 50MB/s or 100MB/s | 0.2MB/s, 1.5MB/s, or 60MB/s | 250MB/s/lane (1×, 2×, 4×, 8×, 16×, 32×) | 300MB/s    | 300MB/s              |
| Hot pluggable       | Yes               | Yes                         | Depends                                 | Yes        | Yes                  |
| Max length          | 4.5m              | 5m                          | 0.5m                                    | 1m         | 8m                   |
| Standard            | IEEE 1394         | USB Implementers Forum      | PCI-SIG                                 | SATA-IO    | INCITS TC T10        |
Typical x86 PC I/O System
I/O Management
I/O is mediated by the OS
  Multiple programs share I/O resources
    Need protection and scheduling
  I/O causes asynchronous interrupts
    Same mechanism as exceptions
  I/O programming is fiddly
    OS provides abstractions to programs
I/O Commands
I/O devices are managed by I/O controller hardware
  Transfers data to/from device
  Synchronizes operations with software
Command registers
  Cause the device to do something
Status registers
  Indicate what the device is doing and occurrence of errors
Data registers
  Write: transfer data to a device
  Read: transfer data from a device
I/O Register Mapping
Memory-mapped I/O
  Registers are addressed in the same space as memory
  Address decoder distinguishes between them
  OS uses the address translation mechanism to make them accessible only to the kernel
  (see the sketch below)
I/O instructions
  Separate instructions to access I/O registers
  Can only be executed in kernel mode
  Example: x86
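A minimal sketch of memory-mapped register access in C. The device, register layout, base address, and bit names are hypothetical; a real driver would obtain the mapping from the platform, and access would normally be restricted to kernel mode.

```c
#include <stdint.h>

/* Hypothetical register layout of a simple memory-mapped device.
   'volatile' stops the compiler from caching or reordering accesses. */
struct dev_regs {
    volatile uint32_t command;   /* write: tell the device to do something  */
    volatile uint32_t status;    /* read: what the device is doing / errors */
    volatile uint32_t data;      /* read/write: transfer data               */
};

#define DEV_BASE_ADDR  ((uintptr_t)0x40001000u)   /* hypothetical address  */
#define DEV ((struct dev_regs *)DEV_BASE_ADDR)

#define STATUS_READY   (1u << 0)                  /* hypothetical bits     */
#define STATUS_ERROR   (1u << 1)
#define CMD_START_READ 0x1u                       /* hypothetical command  */

/* Issuing a command is just an ordinary store to the mapped address */
static inline void start_read(void) {
    DEV->command = CMD_START_READ;
}
```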
Polling
Periodically check the I/O status register
  If the device is ready, do the operation
  If error, take action
  (a polling loop is sketched below)
Common in small or low-performance real-time embedded systems
  Predictable timing
  Low hardware cost
In other systems, wastes CPU time
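A minimal polling loop in C; the register pointers and status bits are hypothetical (cf. the memory-mapped sketch above) and would come from the device's documentation.

```c
#include <stdint.h>

#define STATUS_READY (1u << 0)   /* hypothetical status bits */
#define STATUS_ERROR (1u << 1)

/* Spin on the status register until the device is ready or reports an error */
int poll_and_read(volatile uint32_t *status_reg,
                  volatile uint32_t *data_reg,
                  uint32_t *out) {
    for (;;) {
        uint32_t status = *status_reg;     /* spin: burns CPU time       */
        if (status & STATUS_ERROR)
            return -1;                     /* error: take action         */
        if (status & STATUS_READY) {
            *out = *data_reg;              /* device ready: do operation */
            return 0;
        }
    }
}
```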
Interrupts
When a device is ready or an error occurs
  Controller interrupts the CPU
Interrupt is like an exception
  But not synchronized to instruction execution
  Can invoke handler between instructions
  Cause information often identifies the interrupting device
Priority interrupts
  Devices needing more urgent attention get higher priority
  Can interrupt the handler for a lower-priority interrupt
I/O Data Transfer
Polling and interrupt-driven I/O
  CPU transfers data between memory and I/O data registers
  Time-consuming for high-speed devices
Direct memory access (DMA)
  OS provides starting address in memory
  I/O controller transfers to/from memory autonomously
  Controller interrupts on completion or error
  (see the sketch below)
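A minimal sketch of how a driver might program a DMA transfer. The controller, register names, and flag bits are hypothetical, but the shape (give the controller a physical address and length, start it, and handle the completion interrupt) follows the description above.

```c
#include <stdint.h>

/* Hypothetical DMA controller registers (memory-mapped) */
struct dma_regs {
    volatile uint32_t mem_addr;   /* starting physical address in memory    */
    volatile uint32_t length;     /* number of bytes to transfer            */
    volatile uint32_t control;    /* start bit, direction, interrupt enable */
    volatile uint32_t status;     /* done / error flags                     */
};

#define DMA_START      (1u << 0)
#define DMA_IRQ_ENABLE (1u << 1)
#define DMA_DONE       (1u << 0)
#define DMA_ERROR      (1u << 1)

static volatile int transfer_done;    /* set by the interrupt handler */

/* Driver side: hand the controller an address and length, then return.
   The CPU is free to do other work while the controller moves the data. */
void dma_start_read(struct dma_regs *dma, uint32_t phys_addr, uint32_t nbytes) {
    dma->mem_addr = phys_addr;
    dma->length   = nbytes;
    transfer_done = 0;
    dma->control  = DMA_START | DMA_IRQ_ENABLE;
}

/* Interrupt handler: the controller signals completion or error */
void dma_irq_handler(struct dma_regs *dma) {
    if (dma->status & DMA_ERROR) {
        /* report the error to the OS / retry */
    }
    if (dma->status & DMA_DONE) {
        transfer_done = 1;        /* wake whoever is waiting on the buffer */
    }
}
```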
DMA/Cache Interaction
If DMA writes to a memory block that is cached
  Cached copy becomes stale
If a write-back cache has a dirty block, and DMA reads the memory block
  Reads stale data
Need to ensure cache coherence
  Flush blocks from cache if they will be used for DMA
  Or use non-cacheable memory locations for I/O
DMA/VM Interaction
OS uses virtual addresses for memory
  DMA blocks may not be contiguous in physical memory
Should DMA use virtual addresses?
  Would require the controller to do translation
If DMA uses physical addresses
  May need to break transfers into page-sized chunks
  Or chain multiple transfers
  Or allocate contiguous physical pages for DMA
  (a chunking sketch follows)
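A minimal sketch of breaking a virtually contiguous buffer into page-sized, physically contiguous chunks for DMA. The page size and translation function are placeholders; a real OS would walk its own page tables and fill its controller's descriptor format.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

/* One physically contiguous piece of the transfer */
struct dma_chunk {
    uint64_t phys_addr;
    uint32_t length;
};

/* Placeholder: look up the physical address for a virtual address.
   A real kernel would walk its page tables here. */
uint64_t virt_to_phys(const void *vaddr);

/* Split [buf, buf+len) into chunks that never cross a page boundary,
   so the controller can be handed physical addresses. */
size_t build_dma_chunks(const void *buf, size_t len,
                        struct dma_chunk *chunks, size_t max_chunks) {
    uintptr_t vaddr = (uintptr_t)buf;
    size_t n = 0;

    while (len > 0 && n < max_chunks) {
        /* Bytes remaining in the current page */
        uint32_t in_page  = PAGE_SIZE - (uint32_t)(vaddr % PAGE_SIZE);
        uint32_t this_len = len < in_page ? (uint32_t)len : in_page;

        chunks[n].phys_addr = virt_to_phys((const void *)vaddr);
        chunks[n].length    = this_len;
        n++;

        vaddr += this_len;
        len   -= this_len;
    }
    return n;   /* number of chunks built (transfer may need chaining) */
}
```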
Measuring I/O Performance
I/O performance depends on
  Hardware: CPU, memory, controllers, buses
  Software: operating system, database management system, application
  Workload: request rates and patterns
I/O system design can trade off between response time and throughput
  Measurements of throughput are often done with a constrained response time
Transaction Processing Benchmarks
Transactions
  Small data accesses to a DBMS
  Interested in I/O rate, not data rate
Measure throughput
  Subject to response-time limits and failure handling
  ACID (Atomicity, Consistency, Isolation, Durability)
  Overall cost per transaction
Transaction Processing Performance Council (TPC) benchmarks (www.tpc.org)
  TPC-App: B2B application server and web services
  TPC-C: on-line order entry environment
  TPC-E: on-line transaction processing for a brokerage firm
  TPC-H: decision support (business-oriented ad-hoc queries)
File System & Web Benchmarks
SPEC System File System (SFS)
  Synthetic workload for an NFS server, based on monitoring real systems
  Results
    Throughput (operations/sec)
    Response time (average ms/operation)
SPEC Web Server benchmark
  Measures simultaneous user sessions, subject to required throughput/session
  Three workloads: Banking, Ecommerce, and Support
I/O vs. CPU Performance
Amdahl's Law
  Don't neglect I/O performance as parallelism increases compute performance
Example
  Benchmark takes 90s CPU time, 10s I/O time
  Double the number of CPUs every 2 years
  I/O unchanged

| Year | CPU time | I/O time | Elapsed time | % I/O time |
|------|----------|----------|--------------|------------|
| now  | 90s      | 10s      | 100s         | 10%        |
| +2   | 45s      | 10s      | 55s          | 18%        |
| +4   | 23s      | 10s      | 33s          | 31%        |
| +6   | 11s      | 10s      | 21s          | 47%        |
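A small C sketch reproducing the table: CPU time halves every two years, I/O time stays fixed, so the I/O share of elapsed time grows.

```c
#include <stdio.h>

int main(void) {
    double cpu_s = 90.0;            /* initial CPU time    */
    const double io_s = 10.0;       /* I/O time, unchanged */

    for (int year = 0; year <= 6; year += 2) {
        double elapsed = cpu_s + io_s;
        printf("+%d yrs: CPU %5.1fs  I/O %4.1fs  elapsed %6.1fs  I/O share %4.1f%%\n",
               year, cpu_s, io_s, elapsed, 100.0 * io_s / elapsed);
        cpu_s /= 2.0;               /* CPU count doubles every 2 years */
    }
    return 0;
}
```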
RAID
Redundant Array of Inexpensive (Independent) Disks
  Use multiple smaller disks (c.f. one large disk)
  Parallelism improves performance
  Plus extra disk(s) for redundant data storage
Provides a fault-tolerant storage system
  Especially if failed disks can be "hot swapped"
RAID 0
  No redundancy ("AID"?)
  Just stripe data over multiple disks
  But it does improve performance (striping sketched below)
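A minimal sketch of RAID 0 striping arithmetic: which disk and offset a logical block lands on, assuming round-robin striping with one block per stripe unit (the disk count and block numbering are illustrative).

```c
#include <stdint.h>

/* Location of a logical block under RAID 0 striping across ndisks */
struct stripe_loc {
    uint32_t disk;          /* which disk holds the block */
    uint64_t disk_block;    /* block number on that disk  */
};

struct stripe_loc raid0_map(uint64_t logical_block, uint32_t ndisks) {
    struct stripe_loc loc;
    loc.disk       = (uint32_t)(logical_block % ndisks);  /* round-robin */
    loc.disk_block = logical_block / ndisks;
    return loc;
}
/* Consecutive logical blocks land on different disks, so large or
   concurrent accesses can proceed in parallel. */
```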
RAID 1 & 2
RAID 1: mirroring
  N + N disks, replicate data
  Write data to both data disk and mirror disk
  On disk failure, read from mirror
RAID 2: error correcting code (ECC)
  N + E disks (e.g., 10 + 4)
  Split data at bit level across N disks
  Generate E-bit ECC
  Too complex, not used in practice
RAID 3: Bit-Interleaved Parity
N + 1 disks
  Data striped across N disks at byte level
  Redundant disk stores parity
Read access
  Read all disks
Write access
  Generate new parity and update all disks
On failure
  Use parity to reconstruct missing data
Not widely used
RAID 4: Block-Interleaved Parity
N + 1 disks
  Data striped across N disks at block level
  Redundant disk stores parity for a group of blocks
Read access
  Read only the disk holding the required block
Write access
  Just read the disk containing the modified block, and the parity disk
  Calculate new parity, update data disk and parity disk (see the sketch below)
On failure
  Use parity to reconstruct missing data
Not widely used
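A minimal sketch of the small-write parity update and the failure-time reconstruction that RAID 4 (and RAID 5) rely on; the block size and buffers are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096  /* illustrative block size */

/* Small write: new parity = old parity XOR old data XOR new data,
   so only the modified data block and the parity block are touched. */
void update_parity(const uint8_t *old_data, const uint8_t *new_data,
                   uint8_t *parity) {
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}

/* Failure: XOR the surviving blocks in the stripe (including parity)
   to reconstruct the block that lived on the failed disk. */
void reconstruct_block(const uint8_t *surviving[], size_t nsurviving,
                       uint8_t *out) {
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        uint8_t x = 0;
        for (size_t d = 0; d < nsurviving; d++)
            x ^= surviving[d][i];
        out[i] = x;
    }
}
```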
RAID 3 vs RAID 4
RAID 5: Distributed Parity
N + 1 disks
  Like RAID 4, but parity blocks are distributed across disks
  Avoids the parity disk being a bottleneck
Widely used
RAID 6: P + Q Redundancy
N + 2 disks
  Like RAID 5, but two lots of parity
  Greater fault tolerance through more redundancy
Multiple RAID
  More advanced systems give similar fault tolerance with better performance
RAID Summary
RAID can improve performance and availability
  High availability requires hot swapping
Assumes independent disk failures
  Too bad if the building burns down!
See "Hard Disk Performance, Quality and Reliability"
  http://www.pcguide.com/ref/hdd/perf/index.htm
I/O System Design
Satisfying latency requirements
  For time-critical operations
  If the system is unloaded
    Add up the latency of the components
Maximizing throughput
  Find the "weakest link" (lowest-bandwidth component)
  Configure it to operate at its maximum bandwidth
  Balance the remaining components in the system
If the system is loaded, simple analysis is insufficient
  Need to use queuing models or simulation
Server Computers
Applications are increasingly run on servers
  Web search, office apps, virtual worlds, …
Requires large data center servers
  Multiple processors, network connections, massive storage
  Space and power constraints
Server equipment built for 19" racks
  Multiples of 1.75" (1U) high
Rack-Mounted Servers
Sun Fire x4150 1U server
Sun Fire x4150 1U server
  4 cores per processor
  16 × 4GB = 64GB DRAM
I/O System Design Example
Given a Sun Fire x4150 system with
  Workload: 64KB disk reads
    Each I/O op requires 200,000 user-code instructions and 100,000 OS instructions
  Each CPU: 10^9 instructions/sec
  FSB: 10.6 GB/sec peak
  DRAM DDR2 667MHz: 5.336 GB/sec
  PCI-E 8× bus: 8 × 250MB/sec = 2GB/sec
  Disks: 15,000 rpm, 2.9ms avg. seek time, 112MB/sec transfer rate
What I/O rate can be sustained?
  For random reads, and for sequential reads
Design Example (cont)
I/O rate for CPUs
  Per core: 10^9 / (100,000 + 200,000) = 3,333 ops/sec
  8 cores: 26,667 ops/sec
Random reads, I/O rate for disks
  Assume actual seek time is average/4
  Time/op = seek + latency + transfer
          = 2.9ms/4 + 4ms/2 + 64KB/(112MB/s) = 3.3ms
  303 ops/sec per disk, 2424 ops/sec for 8 disks
Sequential reads
  112MB/s / 64KB = 1750 ops/sec per disk
  14,000 ops/sec for 8 disks
Design Example (cont)
PCI-E I/O rate
  2GB/sec / 64KB = 31,250 ops/sec
DRAM I/O rate
  5.336 GB/sec / 64KB = 83,375 ops/sec
FSB I/O rate
  Assume we can sustain half the peak rate
  5.3 GB/sec / 64KB = 81,540 ops/sec per FSB
  163,080 ops/sec for 2 FSBs
Weakest link: disks
  2424 ops/sec random, 14,000 ops/sec sequential
  Other components have ample headroom to accommodate these rates
  (the whole calculation is sketched below)
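A small C sketch of the whole calculation: compute each component's I/O rate for 64KB reads and take the minimum as the sustainable rate. The numbers are those given in the example, with KB/MB/GB treated as powers of ten to match the slides.

```c
#include <stdio.h>

static double min2(double a, double b) { return a < b ? a : b; }

int main(void) {
    const double op_bytes = 64e3;               /* 64KB per I/O operation */

    /* CPUs: 10^9 instr/sec per core, 300,000 instr per I/O op, 8 cores */
    double cpu_rate = 8 * (1e9 / (100e3 + 200e3));

    /* Disks (random): seek/4 + half rotation + transfer, 8 disks */
    double time_per_op = (2.9e-3 / 4) + (4e-3 / 2) + op_bytes / 112e6;
    double disk_random = 8 * (1.0 / time_per_op);

    /* Disks (sequential): limited by transfer rate */
    double disk_seq = 8 * (112e6 / op_bytes);

    /* Buses and memory */
    double pcie_rate = 2e9 / op_bytes;
    double dram_rate = 5.336e9 / op_bytes;
    double fsb_rate  = 2 * ((10.6e9 / 2) / op_bytes);   /* half of peak, 2 FSBs */

    double random_sys = min2(min2(cpu_rate, disk_random),
                             min2(min2(pcie_rate, dram_rate), fsb_rate));
    double seq_sys    = min2(min2(cpu_rate, disk_seq),
                             min2(min2(pcie_rate, dram_rate), fsb_rate));

    /* Both workloads come out disk-limited, as the slide concludes */
    printf("Random reads:     %.0f ops/sec\n", random_sys);
    printf("Sequential reads: %.0f ops/sec\n", seq_sys);
    return 0;
}
```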
Fallacy: Disk Dependability
If a disk manufacturer quotes MTTF as 1,200,000 hr (~140 years)
  A disk will work that long
Wrong: this is the mean time to failure
  What is the distribution of failures?
  What if you have 1000 disks: how many will fail per year?

Annual Failure Rate (AFR) = (8760 hrs/disk per year) / (1,200,000 hrs/failure)
                          = 0.0073 failures per disk per year = 0.73%

So 0.73% × 1000 disks ≈ 7.3 failures expected in a year
Fallacies
Disk failure rates are as specified
  Studies of failure rates in the field
    Schroeder and Gibson: 2% to 4% vs. 0.6% to 0.8%
    Pinheiro, et al.: 1.7% (first year) to 8.6% (third year) vs. 1.5%
  Why?
A 1GB/s interconnect transfers 1GB in one sec
  But what's a GB?
  For bandwidth, use 1GB = 10^9 B
  For storage, use 1GB = 2^30 B ≈ 1.074×10^9 B
  So a "1GB/sec" link moves about 0.93GB of storage in one second
    About 7% error
Pitfall: Offloading to I/O Processors
Overhead of managing an I/O processor request may dominate
  Quicker to do a small operation on the CPU
  But I/O architecture may prevent that
I/O processor may be slower
  Since it's supposed to be simpler
Making it faster makes it into a major system component
  Might need its own coprocessors!
Pitfall: Backing Up to Tape
Magnetic tape used to have advantages
  Removable, high capacity
Advantages eroded by disk technology developments
Makes better sense to replicate data
  E.g., RAID, remote mirroring
Fallacy: Disk Scheduling
Best to let the OS schedule disk accesses
  But modern drives deal with logical block addresses
    Map to physical track, cylinder, sector locations
  Also, blocks are cached by the drive
OS is unaware of physical locations
  Reordering can reduce performance
  Depending on placement and caching
Pitfall: Peak Performance
Peak I/O rates are nearly impossible to achieve
  Usually, some other system component limits performance
  E.g., transfers to memory over a bus
    Collision with DRAM refresh
    Arbitration contention with other bus masters
  E.g., PCI bus: peak bandwidth ~133 MB/sec
    In practice, max 80MB/sec sustainable
Concluding Remarks
I/O performance measures
  Throughput, response time
  Dependability and cost also important
Buses used to connect CPU, memory, I/O controllers
Polling, interrupts, DMA
I/O benchmarks
  TPC, SPECSFS, SPECWeb
RAID
  Improves performance and dependability