EECC551 - Shaaban  #1  Lec # 12  Spring 2004  4-28-2004

Input/Output & System Performance Issues

• System I/O Connection Structure
  – Types of Buses in the system.
• I/O Data Transfer Methods.
• Cache & I/O.
• I/O Performance Metrics.
• Magnetic Disk Characteristics.
• I/O System Modeling Using Queuing Theory.
• Designing an I/O System & System Performance:
  – System performance bottleneck.

In textbook: Ch. 7.1-7.3, 7.7, 7.8
[Figure: CPU core and cache hierarchy (L1/L2/L3) connected to main memory over the memory bus]

• CPU Core: 1 GHz - 3.4 GHz, 4-way superscalar RISC or RISC-core (x86):
  deep instruction pipelines, dynamic scheduling, multiple FP and integer
  FUs, dynamic branch prediction, hardware speculation.
• All caches are non-blocking:
  – L1: 16-128K, 1-2 way set associative (on chip), separate or unified.
  – L2: 256K-2M, 4-32 way set associative (on chip), unified.
  – L3: 2-16M, 8-32 way set associative (on or off chip), unified.
• The memory bus connects the CPU/caches to main memory.
Bus Characteristics

Option          High performance               Low cost/performance
-------------   ----------------------------   ---------------------------
Bus width       Separate address               Multiplexed address
                & data lines                   & data lines
Data width      Wider is faster                Narrower is cheaper
                (e.g., 64 bits)                (e.g., 16 bits)
Transfer size   Multiple words have            Single-word transfer
                less bus overhead              is simpler
Bus masters     Multiple                       Single master
                (requires arbitration)         (no arbitration)
Split           Yes; separate request and      No; a continuous
transaction?    reply packets get higher       connection is cheaper
                bandwidth (needs multiple      and has lower latency
                masters)
I/O Data Transfer Methods

• Programmed I/O (PIO): Polling (for low-speed I/O)
– The I/O device puts its status information in a status register.
– The processor must periodically check the status register.
– The processor is totally in control and does all the work.
– Very wasteful of processor time.
– Used for low-speed I/O devices (mice, keyboards etc.)
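The polling loop described above can be sketched as a toy simulation (the Device class, READY bit, and method names are invented for illustration — real PIO reads hardware status and data registers):

```python
READY = 0x1   # hypothetical "data ready" bit in the status register

class Device:
    """Toy device exposing a status register and a data register."""
    def __init__(self, data):
        self._data = list(data)

    @property
    def status(self):
        return READY if self._data else 0

    def read_data(self):
        return self._data.pop(0)

def poll_read(dev, n):
    """The CPU does all the work: busy-wait on status, then read each item."""
    out = []
    while len(out) < n:
        while not (dev.status & READY):   # wasted CPU cycles while waiting
            pass
        out.append(dev.read_data())
    return out

print(poll_read(Device([1, 2, 3]), 3))    # [1, 2, 3]
```

The inner busy-wait loop is exactly why polling is wasteful: the processor can do nothing else until the device reports ready.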
• Interrupt-Driven I/O (for medium-speed I/O):
  – An interrupt line from the I/O device to the CPU is used to
generate an I/O interrupt indicating that the I/O device needs CPU attention.
– The interrupting device places its identity in an interrupt vector.
– Once an I/O interrupt is detected the current instruction is completed and an I/O interrupt handling routine (by OS) is executed to service the device.
I/O Data Transfer Methods

Direct Memory Access (DMA) (for high-speed I/O):
• Implemented with a specialized controller that transfers data between an I/O
device and memory independent of the processor.
• The DMA controller becomes the bus master and directs reads and writes between itself and memory.
• Interrupts are still used only on completion of the transfer or when an error occurs.
• Low CPU overhead, used in high speed I/O (storage, network interfaces)
• DMA transfer steps:
  – The CPU sets up DMA by supplying device identity, operation,
memory address of source and destination of data, the number of bytes to be transferred.
– The DMA controller starts the operation. When the data is available it transfers the data, including generating memory addresses for data to be transferred.
– Once the DMA transfer is complete, the controller interrupts the processor, which determines whether the entire operation is complete.
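The three steps above can be mimicked with a toy controller object (all names are invented for illustration; a real DMA controller is programmed through memory-mapped device registers):

```python
class DMAController:
    """Toy model of a DMA controller: setup, autonomous transfer, interrupt."""

    def setup(self, src, dst, dst_off, nbytes, on_done):
        # Step 1: the CPU supplies source, destination, byte count,
        # and a completion handler (standing in for the interrupt).
        self.src, self.dst, self.dst_off = src, dst, dst_off
        self.nbytes, self.on_done = nbytes, on_done

    def start(self):
        # Step 2: the controller generates memory addresses itself;
        # the CPU is not involved per byte.
        for i in range(self.nbytes):
            self.dst[self.dst_off + i] = self.src[i]
        # Step 3: interrupt the processor on completion.
        self.on_done()

memory = bytearray(16)
done = []
dma = DMAController()
dma.setup(b"hello", memory, 4, 5, lambda: done.append(True))
dma.start()
print(bytes(memory[4:9]), done)   # b'hello' [True]
```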
I/O Performance Metrics

• Diversity: The variety of I/O devices that can be connected to the system.
• Capacity: The maximum number of I/O devices that can be connected to the system.
• Producer/server Model of I/O: The producer (CPU, human etc.) creates tasks to be performed and places them in a task buffer (queue); the server (I/O device or controller) takes tasks from the queue and performs them.
• I/O Throughput: The maximum data rate that can be transferred to/from an I/O device or sub-system, or the maximum number of I/O tasks or transactions completed by I/O in a certain period of time. Maximized when the task queue is never empty (server always busy).
• I/O Latency or response time: The time an I/O task takes from the time it is placed in the task buffer or queue until the server (I/O system) finishes the task. Includes buffer waiting or queuing time. Minimized when task queue is always empty (no queuing time).
Factors Affecting System I/O Processing Performance
• I/O processing computational requirements:
  – CPU computations available for I/O operations.
  – Operating system I/O processing policies/routines.
  – I/O data transfer method used.
    • CPU cycles needed: Polling >> Interrupt-Driven > DMA
• I/O subsystem performance:
  – Raw performance of I/O devices (e.g., magnetic disk performance).
  – I/O bus capabilities.
  – I/O subsystem organization (e.g., number of devices, array level).
  – Loading level of I/O devices (queuing delay, response time).
• Memory subsystem performance:
  – Available memory bandwidth for I/O operations.
• Operating system policies:
  – File system vs. raw I/O.
  – File cache size and write policy.
I/O System Modeling Using Queuing Theory

• Service time completions vs. waiting time for a busy server: a randomly
  arriving task joins a queue of arbitrary length when the server is busy;
  otherwise it is serviced immediately.
  – Unlimited-length queues are a key simplification.
• A single-server queue: the combination of a servicing facility that
  accommodates one task at a time (the server) plus a waiting area (the
  queue); together called a system.
• The server spends a variable amount of time with customers; the average
  is Timeserver.

  Timesystem = Timequeue + Timeserver

  Timequeue = Lengthqueue x Timeserver
              + Time for the server to complete the current task
  Time for the server to complete the current task
              = Server utilization x remaining service time of the current task

  Lengthqueue = Arrival Rate x Timequeue     (Little's Law)
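The relations above can be sketched in a few lines (a minimal illustration; the function names are my own, not from the slides):

```python
# Minimal sketch of the single-server-queue relations.
# All times are in seconds; function names are illustrative only.

def queue_length(arrival_rate: float, time_queue: float) -> float:
    """Little's Law: mean number waiting = arrival rate x mean queue time."""
    return arrival_rate * time_queue

def time_system(time_queue: float, time_server: float) -> float:
    """Total time in the system = queue wait + service time."""
    return time_queue + time_server

# Example: 10 tasks/s arrive; each waits 5 ms in queue and needs 20 ms of service.
print(queue_length(10.0, 0.005))      # mean tasks in queue
print(time_system(0.005, 0.020))      # mean time in system (seconds)
```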
A Little Queuing Theory: M/G/1 and M/M/1

• Assumptions so far:
  – System in equilibrium.
  – Time between two successive arrivals in line is random.
  – Server can start on the next customer immediately after the prior one finishes.
  – No limit to the queue: works First-In-First-Out.
  – All customers in line must complete; each has average service time Tser.
• This describes “memoryless” or Markovian request arrival (M for
  exponentially random, C = 1), a general service distribution (no
  restrictions), and 1 server: the M/G/1 queue.
• When service times also have C = 1 (exponentially distributed), it is an
  M/M/1 queue.

  Tq = Tser x u x (1 + C) / (2 x (1 – u))      (M/G/1)
     = Tser x u / (1 – u)                      (M/M/1, with C = 1)

  where Tser is the average time to service a customer and u is the server
  utilization.
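As a sketch, the formula can be wrapped in a small helper (the function name is mine; C is the squared coefficient of variation of the service time, following the slide's notation):

```python
def tq(t_ser: float, u: float, c: float = 1.0) -> float:
    """Mean time in queue for an M/G/1 server:
    Tq = Tser * u * (1 + C) / (2 * (1 - u)).
    With C = 1 (exponential service) this reduces to the M/M/1 form
    Tq = Tser * u / (1 - u)."""
    return t_ser * u * (1.0 + c) / (2.0 * (1.0 - u))

# M/M/1 sanity check: Tser = 20 ms and u = 0.2 give Tq = 20 x 0.2/0.8 = 5 ms.
print(tq(0.020, 0.2))   # mean queue wait in seconds
```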
I/O Queuing Performance: An M/M/1 Example

• A processor sends 10 x 8 KB disk I/O requests per second; requests &
service are exponentially distributed, average disk service time = 20 ms
• On average:
  – How utilized is the disk, u?
  – What is the average time spent in the queue, Tq?
  – What is the average response time for a disk request, Tsys?
  – What is the number of requests in the queue, Lq? In the system, Lsys?
• We have:
  r    average number of arriving requests/second = 10
  Tser average time to service a request = 20 ms (0.02 s)
• We obtain:
  u    server utilization: u = r x Tser = 10/s x 0.02s = 0.2 = 20%
  Tq   average time/request in queue = Tser x u / (1 – u)
       = 20 x 0.2/(1 – 0.2) = 20 x 0.25 = 5 ms (0.005 s)
  Tsys average time/request in system: Tsys = Tq + Tser = 25 ms
  Lq   average length of queue: Lq = r x Tq
       = 10/s x 0.005s = 0.05 requests in queue
  Lsys average # of tasks in system: Lsys = r x Tsys = 10/s x 0.025s = 0.25
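The arithmetic above can be checked with a few lines (a sketch; variable names mirror the slide's symbols):

```python
r = 10.0                      # arriving requests per second
t_ser = 0.020                 # average service time: 20 ms

u = r * t_ser                 # utilization: 0.2 (20%)
t_q = t_ser * u / (1 - u)     # M/M/1 queue wait: 5 ms
t_sys = t_q + t_ser           # response time: 25 ms
l_q = r * t_q                 # Little's Law: requests in queue
l_sys = r * t_sys             # Little's Law: requests in system

print(u, t_q, t_sys, l_q, l_sys)
```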
I/O Queuing Performance: An M/M/1 Example

• Previous example with a faster disk: average disk service time = 10 ms.
• The processor still sends 10 x 8 KB disk I/O requests per second; requests &
service are exponentially distributed
• On average:
  – How utilized is the disk, u?
  – What is the average time spent in the queue, Tq?
  – What is the average response time for a disk request, Tsys?
• We have:
  r    average number of arriving requests/second = 10
  Tser average time to service a request = 10 ms (0.01 s)
• We obtain:
u server utilization: u = r x Tser = 10/s x .01s = 0.1 = 10%
  Tq   average time/request in queue = Tser x u / (1 – u)
       = 10 x 0.1/(1 – 0.1) = 10 x 0.111 = 1.11 ms (0.00111 s)
  Tsys average time/request in system: Tsys = Tq + Tser = 1.11 + 10
       = 11.11 ms
• Response time is 25/11.11 = 2.25 times faster even though the new
  service time is only 2 times faster, due to lower queuing time.
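A small sketch comparing both disks confirms the 2.25x figure (the helper function is my own wrapper around the M/M/1 formula):

```python
def response_time(r: float, t_ser: float) -> float:
    """M/M/1 response time: Tsys = Tser * u / (1 - u) + Tser, with u = r * Tser."""
    u = r * t_ser
    return t_ser * u / (1 - u) + t_ser

slow = response_time(10.0, 0.020)   # 20 ms disk: 25 ms response
fast = response_time(10.0, 0.010)   # 10 ms disk: ~11.11 ms response
print(slow / fast)                  # 2.25x improvement from a 2x faster disk
```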
Example: Determining the System Performance Bottleneck
(ignoring I/O queuing delays)

• Assume the following system components:
  – 500 MIPS CPU
  – 16-byte-wide memory system with 100 ns cycle time
  – 200 MB/sec I/O bus
  – 20, 20 MB/sec SCSI-2 buses, with 1 ms controller overhead
  – 5 disks per SCSI bus: 8 ms seek, 7,200 RPM, 6 MB/sec transfer rate
• Other assumptions:
  – All devices are used to 100% utilization and always have average values.
  – Average I/O size is 16 KB.
  – OS uses 10,000 CPU instructions for a disk I/O.
  – Ignore disk/controller queuing delays.
• Since I/O queuing delays are ignored here, 100% disk utilization is allowed.
• What is the average IOPS? What is the average I/O bandwidth?
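One way to locate the bottleneck is to compute each component's maximum IOPS and take the minimum. The sketch below is my own worked arithmetic under the stated assumptions (average rotational latency taken as half a revolution at 7,200 RPM; MB taken as 10^6 bytes), not a solution taken from the slides:

```python
IO_SIZE = 16 * 1024                      # average I/O size: 16 KB in bytes

cpu_iops  = 500e6 / 10_000               # 500 MIPS / 10,000 instructions per I/O
mem_iops  = (16 / 100e-9) / IO_SIZE      # 16 bytes every 100 ns
bus_iops  = 200e6 / IO_SIZE              # 200 MB/sec I/O bus
scsi_time = 1e-3 + IO_SIZE / 20e6        # 1 ms overhead + transfer at 20 MB/sec
scsi_iops = 20 / scsi_time               # 20 SCSI-2 buses
disk_time = 8e-3 + 0.5 * 60 / 7200 + IO_SIZE / 6e6  # seek + avg rotation + transfer
disk_iops = 20 * 5 / disk_time           # 100 disks in total

limit = min(cpu_iops, mem_iops, bus_iops, scsi_iops, disk_iops)
print(limit)                  # ~6,700 IOPS: the disks are the bottleneck
print(limit * IO_SIZE / 1e6)  # ~110 MB/sec average I/O bandwidth
```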
Example: Determining the I/O Bottleneck,
Accounting For I/O Queue Time (M/M/m queue)
• Assume the following system components:
  – 500 MIPS CPU
  – 16-byte-wide memory system with 100 ns cycle time
  – 200 MB/sec I/O bus
  – 20, 20 MB/sec SCSI-2 buses, with 1 ms controller overhead
  – 5 disks per SCSI bus: 8 ms seek, 7,200 RPM, 6 MB/sec transfer rate
• Other assumptions:
  – All devices are used to 60% utilization (i.e., u = 0.6).
  – Treat the I/O system as an M/M/m queue.
  – Requests are assumed spread evenly over all disks.
  – Average I/O size is 16 KB.
  – OS uses 10,000 CPU instructions for a disk I/O.
• What is the average IOPS? What is the average bandwidth?
• What is the average response time per I/O operation?
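The slide asks for M/M/m results; as a rough sketch (my own simplification, not the slide's solution) each disk can be treated as an independent M/M/1 server at u = 0.6, since requests are spread evenly. This slightly over-estimates the wait relative to a true shared M/M/m queue:

```python
u = 0.6                                               # per-device utilization
IO_SIZE = 16 * 1024                                   # 16 KB per I/O
disk_time = 8e-3 + 0.5 * 60 / 7200 + IO_SIZE / 6e6    # per-I/O disk service time (s)

iops = u * 20 * 5 / disk_time        # throughput of 100 disks at 60% utilization
t_q = disk_time * u / (1 - u)        # per-disk M/M/1 queue wait
t_resp = t_q + disk_time             # average response time per I/O

print(iops, iops * IO_SIZE / 1e6)    # ~4,000 IOPS, ~66 MB/sec
print(t_resp)                        # ~37 ms response time
```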