EECC551 - Shaaban, Lec #12, Winter 2000, 2-1-2001

I/O Systems

[Slide figure: a processor and cache connected over a Memory - I/O bus to main memory and to I/O controllers for disks, graphics, and a network interface; the I/O controllers signal the processor via interrupts.]

Time(workload) = Time(CPU) + Time(I/O) - Time(Overlap)
Source: meseec.ce.rit.edu/eecc551-winter2000/551-2-1-2001.pdf (I/O Systems, Rochester Institute of Technology)
Storage Cost Per Megabyte
The price per megabyte of disk storage has been decreasing at about 40% per year, driven by improvements in data density; this is even faster than the price decline for flash memory chips. Recent trends in HDD price per megabyte show an even steeper reduction.
Since the 1980s, smaller form factor disk drives have grown in storage capacity. Today's 3.5-inch form factor drives designed for the entry-server market can store more than 75 GB at the 1.6-inch height using 5 platters.
Drive areal density has increased by a factor of 8.5 million since the first disk drive, IBM's RAMAC, was introduced in 1957. Since 1991, the rate of increase in areal density has accelerated to 60% per year, and since 1997 this rate has further accelerated to an incredible 100% per year.
Magnetic Drive Areal Density Evolution
Main Bus Characteristics

Option              High performance                    Low cost
Bus width           Separate address & data lines       Multiplexed address & data lines
Data width          Wider is faster                     Narrower is cheaper
                    (e.g., 32 bits)                     (e.g., 8 bits)
Transfer size       Multiple-word transfers have        Single-word transfer is simpler
                    less bus overhead
Bus masters         Multiple (requires arbitration)     Single master (no arbitration)
Split transaction?  Yes; separate request and reply     No; a continuous connection is
                    packets get higher bandwidth        cheaper and has lower latency
                    (needs multiple masters)
Obtaining Access to the Bus: Bus Arbitration

Bus arbitration decides which device (bus master) gets the use of the bus next. Several schemes exist:
• A single bus master:
  – All bus requests are controlled by the processor.
• Daisy chain arbitration:
  – A bus grant line runs through the devices from the highest priority to the lowest (priority determined by position on the bus).
  – A high-priority device intercepts the bus grant signal, not allowing a low-priority device to see it (VME bus).
• Centralized, parallel arbitration:
  – Multiple request lines, one for each device.
  – A centralized arbiter chooses a requesting device and notifies it that it is now the bus master.
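As an illustration (not from the lecture), daisy-chain arbitration amounts to a priority scan: the grant enters at the highest-priority device and is absorbed by the first device that is requesting the bus, so lower-priority devices never see it.

```python
def daisy_chain_arbiter(requests):
    """Daisy-chain arbitration sketch. `requests[i]` is True when the
    device at chain position i (position 0 = highest priority) wants
    the bus. The grant propagates down the chain until the first
    requesting device intercepts it and becomes bus master."""
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device   # this device absorbs the grant
    return None             # no requests: the bus stays idle

# Devices 1 and 3 request the bus; device 1, being closer to the
# arbiter on the chain, wins.
assert daisy_chain_arbiter([False, True, False, True]) == 1
```

Note how priority is fixed purely by wiring position, which is exactly why a chatty high-priority device can starve the devices behind it.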
Split-Transaction Bus

• Used when multiple bus masters are present.
• Also known as a pipelined or a packet-switched bus.
• A read transaction is tagged and broken into:
  – A read-request transaction containing the address.
  – A memory-reply transaction that contains the data.
  ⇒ An address on the bus refers to a later memory access.
• The bus is available to other bus masters while a memory operation is in progress.
  ⇒ Higher bus bandwidth, but also higher bus latency.
Asynchronous Bus Operation

• Not clocked; instead self-timed, using handshaking protocols between senders and receivers.
• A bus master performing a write:
  – The master obtains control and asserts address, direction, and data.
  – It waits a specified time for slaves to decode the target.
  – t1: Master asserts request line.
  – t2: Slave asserts ack.
  – t3: Master releases request.
  – t4: Slave releases ack.
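The t1..t4 write handshake can be modeled as a tiny two-signal state machine (a sketch only; real hardware runs master and slave concurrently, each waiting on the other's signal transition rather than on program order):

```python
def handshake_write():
    """Sequential model of the four-phase asynchronous write handshake.
    Each assert checks the precondition the protocol imposes before the
    next signal transition is allowed."""
    req = ack = False
    trace = []
    # The master has already driven address, direction, and data, and
    # waited the decode time; t1: it asserts the request line.
    req = True
    trace.append(("t1", req, ack))
    # t2: the slave, seeing req high, latches the data and asserts ack.
    assert req
    ack = True
    trace.append(("t2", req, ack))
    # t3: the master, seeing ack high, releases request.
    assert ack
    req = False
    trace.append(("t3", req, ack))
    # t4: the slave, seeing req low, releases ack; the bus is free again.
    assert not req
    ack = False
    trace.append(("t4", req, ack))
    return trace
```

Because each transition is triggered by the other side's previous transition, no shared clock is needed, which is the point of self-timing.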
I/O Data Transfer Methods

Direct Memory Access (DMA):
• Implemented with a specialized controller that transfers data between an I/O device and memory independently of the processor.
• The DMA controller becomes the bus master and directs reads and writes between itself and memory.
• Interrupts are still used, but only on completion of the transfer or when an error occurs.
• DMA transfer steps:
  – The CPU sets up the DMA by supplying the device identity, the operation, the memory addresses of the source and destination of the data, and the number of bytes to be transferred.
  – The DMA controller starts the operation. When the data is available it transfers the data, including generating the memory addresses for the data to be transferred.
  – Once the DMA transfer is complete, the controller interrupts the processor, which determines whether the entire operation is complete.
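The three steps above can be sketched in a few lines (purely illustrative; the function and parameter names are mine, and the copy loop stands in for the controller acting as bus master and generating memory addresses itself):

```python
def dma_transfer(memory, src, dst, nbytes, interrupt):
    """Sketch of a DMA transfer. The CPU's only work is the setup at
    the top and the completion handler passed in as `interrupt`."""
    # Step 1: the CPU programs the controller with the operation,
    # source/destination addresses, and byte count.
    ctrl = {"src": src, "dst": dst, "count": nbytes}
    # Step 2: the controller performs the transfer with no CPU
    # involvement, generating each memory address itself.
    for i in range(ctrl["count"]):
        memory[ctrl["dst"] + i] = memory[ctrl["src"] + i]
    # Step 3: the controller raises an interrupt on completion.
    interrupt("dma_done")

mem = list(range(16))
done = []
dma_transfer(mem, src=0, dst=8, nbytes=4, interrupt=done.append)
assert mem[8:12] == [0, 1, 2, 3] and done == ["dma_done"]
```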
I/O Performance Metrics

• Diversity: The variety of I/O devices that can be connected to the system.
• Capacity: The maximum number of I/O devices that can be connected to the system.
• Producer/server model of I/O: The producer (CPU, human, etc.) creates tasks to be performed and places them in a task buffer (queue); the server (I/O device or controller) takes tasks from the queue and performs them.
• I/O throughput: The maximum data rate that can be transferred to/from an I/O device or subsystem, or the maximum number of I/O tasks or transactions completed by the I/O system in a given period of time.
  ⇒ Maximized when the task buffer is never empty.
• I/O latency or response time: The time an I/O task takes from the time it is placed in the task buffer or queue until the server (I/O system) finishes the task. Includes buffer waiting or queuing time.
A Little Queuing Theory

• Given: An I/O system in equilibrium (input rate is equal to output rate) and:
  – Tser: average time to service a task
  – Tq: average time per task in the queue
  – Tsys: average time per task in the system, or the response time; the sum of Tser and Tq
  – r: average number of arriving tasks/second
  – Lser: average number of tasks in service
  – Lq: average length of the queue
  – Lsys: average number of tasks in the system; the sum of Lq and Lser
• Little's Law states: Lsys = r x Tsys
• Server utilization: u = r / service rate = r x Tser
  u must be between 0 and 1; otherwise there would be more tasks arriving than could be serviced.
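These equilibrium relations can be checked with a small numeric sketch (the arrival rate and times below are illustrative, not taken from the lecture):

```python
# Illustrative numbers for the equilibrium relations above.
r    = 40       # average arriving tasks per second (assumed)
Tser = 0.010    # average service time per task, in seconds (assumed)
Tq   = 0.015    # average time per task in the queue, in seconds (assumed)

u    = r * Tser        # server utilization: 0.4
Tsys = Tq + Tser       # response time = queue time + service time: 0.025 s
Lsys = r * Tsys        # Little's Law: average tasks in the system: 1.0
Lq   = r * Tq          # Little's Law applied to the queue alone: 0.6

assert 0 <= u < 1      # otherwise tasks arrive faster than they are served
```

Little's Law needs only the equilibrium assumption, which is why it applies equally to the whole system and to the queue alone.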
• Service time completions vs. waiting time for a busy server: a randomly arriving event joins a queue of arbitrary length when the server is busy; otherwise it is serviced immediately.
  – Unlimited-length queues are the key simplification.
• A single-server queue: the combination of a servicing facility that accommodates one customer at a time (the server) plus a waiting area (the queue); together called a system.
• The server spends a variable amount of time with customers; how do you characterize this variability?
  – Distribution of a random variable: histogram? curve?
A Little Queuing Theory: Average Wait Time

• Calculating the average wait time in the queue, Tq:
  – If the server is busy with something, on average it takes m1(z) = 1/2 x Tser x (1 + C) to complete (the average residual service time, where C is the squared coefficient of variation of the service time).
  – Chance the server is busy = u; the average delay is u x m1(z).
  – All customers in line must complete; each takes Tser on average.

  Tq = u x m1(z) + Lq x Tser = 1/2 x u x Tser x (1 + C) + Lq x Tser
  Tq = 1/2 x u x Tser x (1 + C) + r x Tq x Tser     (Little's Law: Lq = r x Tq)
  Tq = 1/2 x u x Tser x (1 + C) + u x Tq            (u = r x Tser)
  Tq x (1 - u) = Tser x u x (1 + C) / 2
  Tq = Tser x u x (1 + C) / (2 x (1 - u))
• Notation:
  r     average number of arriving customers/second
  Tser  average time to service a customer
  u     server utilization (0..1): u = r x Tser
  Tq    average time/customer in queue
  Lq    average length of queue: Lq = r x Tq
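The final formula of the derivation can be wrapped in a small helper (a sketch; the function name is mine):

```python
def mg1_queue_time(r, Tser, C):
    """Average time in queue from the derivation above:
    Tq = Tser * u * (1 + C) / (2 * (1 - u)), with u = r * Tser and
    C the squared coefficient of variation of the service time."""
    u = r * Tser
    assert 0 <= u < 1, "requires equilibrium: u < 1"
    return Tser * u * (1 + C) / (2 * (1 - u))

# r = 10 requests/s, Tser = 20 ms, exponential service (C = 1): Tq = 5 ms.
assert abs(mg1_queue_time(10, 0.020, C=1) - 0.005) < 1e-9
# A perfectly regular server (C = 0) halves the queuing time: 2.5 ms.
assert abs(mg1_queue_time(10, 0.020, C=0) - 0.0025) < 1e-9
```

The C = 0 case makes the role of variability concrete: half of the M/M/1 wait comes purely from the randomness of service times.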
A Little Queuing Theory: M/G/1 and M/M/1

• Assumptions so far:
  – System in equilibrium.
  – Times between two successive arrivals in line are random.
  – Server can start on the next customer immediately after the prior one finishes.
  – No limit to the queue: works First-In-First-Out.
  – All customers in line must complete; each takes Tser on average.
• Described as "memoryless" or Markovian request arrival (M for C = 1, exponentially random), General service distribution (no restrictions), 1 server: an M/G/1 queue.
• When service times have C = 1, we have an M/M/1 queue:
  Tq = Tser x u x (1 + C) / (2 x (1 - u)) = Tser x u / (1 - u)
• Notation:
  Tser  average time to service a customer
  u     server utilization (0..1): u = r x Tser
  Tq    average time/customer in queue
I/O Queuing Performance: An Example

• A processor sends 10 x 8KB disk I/O requests per second; requests and service are exponentially distributed; average disk service time = 20 ms.
• On average:
  – How utilized is the disk, u?
  – What is the average time spent in the queue, Tq?
  – What is the average response time for a disk request, Tsys?
  – What is the number of requests in the queue, Lq? In the system, Lsys?
• We have:
  r     average number of arriving requests/second = 10
  Tser  average time to service a request = 20 ms (0.02 s)
• We obtain:
  u     server utilization: u = r x Tser = 10/s x 0.02 s = 0.2
  Tq    average time/request in queue = Tser x u / (1 - u)
        = 20 x 0.2 / (1 - 0.2) = 20 x 0.25 = 5 ms (0.005 s)
  Tsys  average time/request in system: Tsys = Tq + Tser = 25 ms
  Lq    average length of queue: Lq = r x Tq = 10/s x 0.005 s = 0.05 requests in queue
  Lsys  average # of tasks in system: Lsys = r x Tsys = 10/s x 0.025 s = 0.25
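The arithmetic above can be replayed in a few lines (exponential service, so the M/M/1 form of Tq applies):

```python
r, Tser = 10, 0.020          # 10 requests/s, 20 ms average service time
u    = r * Tser              # utilization: 0.2
Tq   = Tser * u / (1 - u)    # M/M/1 queue time: 5 ms
Tsys = Tq + Tser             # response time: 25 ms
Lq   = r * Tq                # requests waiting in the queue: 0.05
Lsys = r * Tsys              # requests in the whole system: 0.25
```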
I/O Queuing Performance: An Example (Continued)

• The previous example with a faster disk: average disk service time = 10 ms.
• The processor still sends 10 x 8KB disk I/O requests per second; requests and service are exponentially distributed.
• On average:
  – How utilized is the disk, u?
  – What is the average time spent in the queue, Tq?
  – What is the average response time for a disk request, Tsys?
• We have:
  r     average number of arriving requests/second = 10
  Tser  average time to service a request = 10 ms (0.01 s)
• We obtain:
  u     server utilization: u = r x Tser = 10/s x 0.01 s = 0.1
  Tq    average time/request in queue = Tser x u / (1 - u)
        = 10 x 0.1 / (1 - 0.1) = 10 x 0.11 = 1.11 ms (0.00111 s)
  Tsys  average time/request in system: Tsys = Tq + Tser = 10 + 1.11 = 11.11 ms
• The response time is 25 / 11.11 = 2.25 times faster, even though the new disk's service time is only 2 times faster: queuing time shrinks more than proportionally as utilization drops.
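The faster-disk case, confirming the 25/11.11 = 2.25x response-time improvement quoted above:

```python
r, Tser = 10, 0.010          # same workload, 10 ms average service time
u    = r * Tser              # utilization: 0.1
Tq   = Tser * u / (1 - u)    # M/M/1 queue time: ~1.11 ms
Tsys = Tq + Tser             # response time: ~11.11 ms
speedup = 0.025 / Tsys       # vs. the 25 ms response time of the 20 ms disk
```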