IBM Systems & Technology Group © 2007 IBM Corporation Case Study: Overloaded Chpids Revision 2008-07-29 BKW IBM z/VM Performance Evaluation Brian Wade [email protected]
Trademarks
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml: AS/400, DBE, e-business logo, ESCO, eServer, FICON, IBM, IBM Logo, iSeries, MVS, OS/390, pSeries, RS/6000, S/390, VM/ESA, VSE/ESA, Websphere, xSeries, z/OS, zSeries, z/VM
The following are trademarks or registered trademarks of other companies
Lotus, Notes, and Domino are trademarks or registered trademarks of Lotus Development Corporation
Java and all Java-related trademarks and logos are trademarks of Sun Microsystems, Inc., in the United States and other countries
LINUX is a registered trademark of Linus Torvalds
UNIX is a registered trademark of The Open Group in the United States and other countries.
Microsoft, Windows and Windows NT are registered trademarks of Microsoft Corporation.
SET and Secure Electronic Transaction are trademarks owned by SET Secure Electronic Transaction LLC.
Intel is a registered trademark of Intel Corporation
* All other products may be trademarks or registered trademarks of their respective companies.
NOTES:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
Any proposed use of claims in this presentation outside of the United States must be reviewed by local IBM country counsel prior to such use.
The information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
Customer Configuration
3 z/VM partitions on z900
Each partition has about 25 3390-3 in a 2105-F20
– One 2105-F20 serves all three partitions
– Separate LCUs for each partition
– Each partition has its own four ESCON chpids to the 2105
Customer wants to back up all of these 3390-3 to a 2107 once per week
– N guests concurrently, each running one DDR
Only FICON is available to the 2107
How much FICON capacity is required?
ESCON vs. FICON Express
                 ESCON               FICON Express
Link speed       About 135 Mb/sec    1 Gb/sec (7.5x)
Open exchanges   One I/O at a time   32 I/Os at a time (32x)
IOP              Fast                Faster
Support Staff Claim is….
FICON is so much faster than ESCON, and…
FICON can do >1 I/O at a time, and…
We don’t have so many FICON ports on our switches, so…
Let’s give him only one FICON chpid, and…
He can come back if he thinks he needs more
Ways to check:
– FCX161 LCHANNEL
– FCX108 DEVICE
– FCX232 IOPROCLG
– Open exchange analysis
FCX161 LCHANNEL
FCX161 Run 2008/07/25 17:54:09 LCHANNEL
Channel Load and Channel Busy Distribution
From 2008/07/20 04:10:55
To 2008/07/20 05:18:55
For 4080 Secs 01:08:00 Result of xxxxxxxx Run
_____________________________________________________________________________________________
CHPID Chan-Group
(Hex) Descr Qual Shrd Cur Ave 0-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100
CB ESCON 00 No 33 39 0 0 15 56 22 0 0 1 0 6
BC ESCON 00 No 33 38 0 0 19 54 19 0 0 1 0 6
DA ESCON 00 No 32 38 0 0 22 56 15 0 0 1 0 6
E9 ESCON 00 No 32 38 0 0 22 56 15 0 0 1 0 6
1A FICON 00 Yes 23 24 0 9 84 3 4 0 0 0 0 0
0C OSE 00 Yes 0 0 100 0 0 0 0 0 0 0 0 0
0D OSE 00 Yes 0 0 100 0 0 0 0 0 0 0 0 0
AD ESCON 00 No 0 0 100 0 0 0 0 0 0 0 0 0
These numbers are CPU BUSY ON THE CHANNEL ADAPTER.
- NOT fiber saturation
- NOT I/O concurrency saturation
- So far it doesn’t look too bad.
FCX108 DEVICE
FCX108 Run 2008/07/28 12:10:44 DEVICE
General I/O Device Load and Performance
Mdisk Pa- Req.
Addr Type Label/ID Links ths I/O Avoid Pend Disc Conn Serv Resp CUWt Qued Busy READ
>> All DASD DDR00013 0 4 11.9 .0 5.1 .3 4.1 9.5 9.5 .0 .0 11 100
C800 3390 >DDR00001 0 4 11.8 .0 5.2 .1 4.1 9.4 9.4 .0 .0 11 100
C802 3390 >DDR00003 0 4 11.9 .0 5.2 .1 4.1 9.4 9.4 .0 .0 11 100
C804 3390 >DDR00005 0 4 11.9 .0 5.2 .1 4.1 9.4 9.4 .0 .0 11 100
C806 3390 >DDR00007 0 4 11.9 .0 5.2 .1 4.1 9.4 9.4 .0 .0 11 100
... some sample 2107/FICON targets
D01A 3390 >DDR00027 0 1 11.8 .0 24.6 40.9 9.6 75.1 75.1 .0 .0 89 0
D000 3390 >DDR00001 0 1 11.8 .0 24.6 40.9 9.5 75.0 75.0 .0 .0 89 0
D009 3390 >DDR00010 0 1 11.8 .0 24.4 40.9 9.7 75.0 75.0 .0 .0 89 0
D00E 3390 >DDR00015 0 1 11.8 .0 24.8 41.0 9.1 74.9 74.9 .0 .0 89 0
D008 3390 >DDR00009 0 1 11.9 .0 24.7 40.8 9.2 74.7 74.7 .0 .0 89 0
Uh, this doesn’t look so good…
What FCX108 is Telling Us
ESCON sources
– Possible ESCON contention – PEND ~ 5 msec
– 2105 is doing fine – low DISC
– CONN is indicative of data transfer size (1 I/O at a time)
FICON targets
– FICON looks in trouble – PEND ~ 25 msec
– 2107 is hurting – high DISC time
– Longer CONN than ESCON? But chpid is faster!
Pop quiz
– Why is “Req. Qued” = 0?
FCX232 IOPROCLG
FCX232 Run 2008/07/28 12:10:44 IOPROCLG
I/O Processor Activity by Time
From 2008/07/20 04:10:55
To 2008/07/20 05:18:55
For 4080 Secs 01:08:00 Result of WEB0720 Run
_______________________________________________________________________________
Interval Proc Proc
End Time Number Beg_SSCH I/O_Int %Busy Channel Switch CU Device
>>Mean>> 0 603.0 604.5 21.4 3037 .0 .0 .0
>>Mean>> 1 1815 1814 72.6 2261 .0 .0 .0
>>Mean>> 2 .1 .1 .0 18.1 .0 .0 .0
04:11:55 0 905.1 898.8 100.0 7516 .0 .0 .0
04:11:55 1 2529 2535 100.0 2285 .0 .0 .0
04:11:55 2 .1 .1 .0 .0 .0 .0 .0
04:12:55 0 1017 1011 100.0 7470 .0 .0 .0
04:12:55 1 2876 2882 100.0 2140 .0 .0 .0
04:12:55 2 .2 .4 .0 .0 .0 .0 .0
04:13:55 0 988.9 983.0 100.0 7675 .0 .0 .0
04:13:55 1 2851 2857 100.0 2159 .0 .0 .0
04:13:55 2 .6 .7 .0 .0 .0 .0 .0
You have to know how to interpret this report.
These are not percentages. PERFKIT is misleading.
PERFKIT is telling us about busy conditions the SAPs (channel subsystem) are encountering, per SSCH they handle.
Divide the reported value by 100 to get busy indications per SSCH.
IOP0 is seeing 30.37 channel busy situations per SSCH.
Yes, we are fixing PERFKIT.
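The conversion described above can be sketched in a few lines of Python. This is only an illustration, not Performance Toolkit code: the function name `busy_per_ssch` is hypothetical, and the sample values are the >>Mean>> Channel figures copied from the report above.

```python
def busy_per_ssch(reported_value: float) -> float:
    """Convert an FCX232 busy-column value to busy indications per SSCH.

    Per the note above, the report shows the count scaled by 100,
    so the value is not a percentage.
    """
    return reported_value / 100.0


# >>Mean>> "Channel" column for IOP 0, 1 and 2, from the report above.
mean_channel = {0: 3037, 1: 2261, 2: 18.1}

for iop, value in mean_channel.items():
    print(f"IOP{iop}: {busy_per_ssch(value):.2f} channel-busy indications per SSCH")
```

Read this way, IOP1 handles roughly three times as many SSCHs as IOP0 but still sees more than 22 channel-busy indications for each one.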
Lifetime of an I/O
t = cwait + irwait + disc + conn
– Cwait = time IOP spends waiting for a chpid
– Irwait = time spent waiting for initial response from controller, once a chpid is obtained
– Disc = time the controller spends apart from the channel subsystem (usually cache miss)
– Conn = time the controller spends in session with channel subsystem (usually data transfer)
Our old friend PEND = cwait + irwait
– This is a bit unfortunate because it includes two very different kinds of waits
– But as we will see in this workload, this isn’t much of a hindrance to the analysis
“Exchange time” EXCH = irwait + disc + conn
– Time the operation is open on the chpid
– Analyzing this can be very useful
– Perfkit doesn’t report it directly
– But it’s in the MONWRITE data, for the intrepid or desperate
• Yet another reason why we ask for raw MONWRITE data
ESCON EXCH times typical in this workload
Consolidated device C800 240RES
__________________________WEB0720__________________________
__Time__ ___IOR___ __IRPIO__ __DDPIO__ __CNPIO__ __EXPIO__ __EXPS___
08:12:00 32.033 0.000 0.143 4.124 4.267 136.687
08:13:00 30.850 0.000 0.119 4.136 4.255 131.262
08:14:00 30.817 0.000 0.167 4.130 4.296 132.397
08:15:00 30.667 0.000 0.176 4.120 4.296 131.748
08:16:00 21.900 0.000 0.107 4.114 4.221 92.437
From FCX168 DEVLOG C800:
Interval Mdisk Pa- Req.
End Time Type Label/ID Links ths I/O Avoid Pend Disc Conn Serv Resp CUWt Qued Busy READ
04:11:55 3390 >DDR00001 0 4 32.1 .0 24.1 .1 4.1 28.3 28.9 .0 .0 91 100
04:12:55 3390 >DDR00001 0 4 30.9 .0 24.6 .1 4.1 28.8 28.8 .0 .0 89 100
04:13:55 3390 >DDR00001 0 4 30.8 .0 25.0 .2 4.1 29.3 29.9 .0 .0 90 100
04:14:55 3390 >DDR00001 0 4 31.3 .0 24.0 .2 4.1 28.3 28.3 .0 .0 88 100
04:15:55 3390 >DDR00001 0 4 22.5 .0 18.1 .1 4.1 22.3 22.3 .0 .0 50 100
We see almost no IRPIO but ~ 25 msec of PEND.
Therefore PEND is almost all CWAIT.
These ESCON chpids are too busy.
FICON EXCH times typical in this workload
Consolidated device D01A WBK026
__________________________WEB0720__________________________
__Time__ ___IOR___ __IRPIO__ __DDPIO__ __CNPIO__ __EXPIO__ __EXPS___
08:18:00 13.000 0.857 21.950 12.011 34.818 452.634
08:19:00 12.717 0.956 22.792 11.514 35.262 448.418
08:20:00 13.417 0.993 20.710 11.731 33.434 448.576
08:21:00 12.883 0.900 23.369 11.040 35.309 454.901
From FCX168 DEVLOG for D01A
Interval Mdisk Pa- Req.
End Time Type Label/ID Links ths I/O Avoid Pend Disc Conn Serv Resp CUWt Qued Busy READ
04:18:00 3390 >DDR00027 0 1 13.0 .0 37.1 21.9 12.0 71.0 71.0 .0 .0 92 0
04:19:00 3390 >DDR00027 0 1 12.7 .0 38.9 22.8 11.5 73.2 73.2 .0 .0 93 0
04:20:00 3390 >DDR00027 0 1 13.4 .0 36.5 20.7 11.7 68.9 70.4 .0 .0 92 0
04:20:59 3390 >DDR00027 0 1 12.9 .0 36.7 23.4 11.0 71.1 71.1 .0 .0 92 0
IRtime ~ 1 msec => rest of PEND is cwait => too few FICON chpids.
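The decomposition can be checked against the first interval of each device shown in this deck. The sketch below is a minimal illustration under stated assumptions: it reads IRPIO, DDPIO and CNPIO as irwait, disc and conn per I/O in msec (an interpretation of the column names), takes Pend from the FCX168 DEVLOG rows, and uses a hypothetical `Sample` class that is not part of any IBM tool.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One interval for one device; times in msec per I/O, rate in I/Os per sec."""
    name: str
    io_rate: float  # IOR
    irwait: float   # IRPIO
    disc: float     # DDPIO
    conn: float     # CNPIO
    pend: float     # Pend from FCX168 DEVLOG

    def exch(self) -> float:
        # EXCH = irwait + disc + conn
        return self.irwait + self.disc + self.conn

    def cwait(self) -> float:
        # PEND = cwait + irwait, so cwait = PEND - irwait
        return self.pend - self.irwait

    def exch_per_sec(self) -> float:
        # msec of open-exchange time accumulated per second
        return self.io_rate * self.exch()


samples = [
    Sample("C800 (ESCON)", 32.033, 0.000, 0.143, 4.124, 24.1),
    Sample("D01A (FICON)", 13.000, 0.857, 21.950, 12.011, 37.1),
]

for s in samples:
    print(f"{s.name}: EXCH={s.exch():.3f} msec, cwait={s.cwait():.1f} msec, "
          f"EXCH/sec={s.exch_per_sec():.1f} msec")
```

The computed EXCH values agree with the EXPIO column, and the rate times EXCH product agrees with EXPS to within rounding, which suggests EXPS is exchange time per second in msec.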
What We See So Far
Channel subsystem is really queuing on FICON chpid
Channel subsystem is queuing on ESCON chpids
This queuing is elongating I/O response time
Can we see how things are going on the chpids themselves?
Open Exchange Analysis
For each device,
– EXCHtime/sec = IOs/sec * EXCHtime/io
– EXCHtime/io = IRtime/io + DISC/io + CONN/io
Sum over all devices reachable over the chpids
Divide by number of chpids to get e = EXCHtime/chpid/sec
– On ESCON, 0
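The calculation above can be sketched as follows. This is a minimal illustration, not the tool that produced the reports in this deck: the function name `open_exchange_level` is hypothetical, per-I/O times are assumed to be in msec (hence the division by 1000), and the device list is made-up sample data.

```python
def open_exchange_level(devices, n_chpids):
    """Average number of open exchanges per chpid (the value called e above).

    devices:  iterable of (ios_per_sec, irwait_ms, disc_ms, conn_ms)
    n_chpids: number of chpids the devices are reachable over
    """
    total_ms_per_sec = sum(
        rate * (irwait + disc + conn)
        for rate, irwait, disc, conn in devices
    )
    # msec of exchange time per second, spread over the chpids,
    # then converted from msec/sec to a dimensionless average.
    return total_ms_per_sec / n_chpids / 1000.0


# Hypothetical example: 30 identical devices behind a single chpid.
devices = [(13.0, 0.9, 22.0, 12.0)] * 30
print(round(open_exchange_level(devices, 1), 2))
```

Because e is seconds of exchange time per second per chpid, it can be read as the average number of operations open on each chpid at once.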
ESCON e, typical for this workload
Open Exchange Report, assuming 4 chpids
__Time__ _WEB0720__ __Total___ ___OEX____
08:11:00 186.652 186.652 0.047
08:12:00 4017.602 4017.602 1.004
08:13:00 3944.960 3944.960 0.986
08:14:00 3953.350 3953.350 0.988
08:15:00 3867.998 3867.998 0.967
08:16:00 2753.180 2753.180 0.688
08:17:00 1646.332 1646.332 0.412
08:18:00 1664.512 1664.512 0.416
08:19:00 1625.777 1625.777 0.406
08:20:00 1631.189 1631.189 0.408
08:21:00 1601.634 1601.634 0.400
08:22:00 1630.931 1630.931 0.408
08:23:00 1630.526 1630.526 0.408
FICON e for this workload
Open Exchange Report, assuming 1 chpids
__Time__ _WEB0720__ _GRN0720__ _YEL0720__ __Total___ ___OEX____
08:19:00 13619.693 27151.697 10842.976 51614.366 51.614
08:20:00 13259.164 26382.097 10587.415 50228.676 50.229
08:21:00 13732.412 27513.826 10948.452 52194.690 52.195
08:22:00 13682.709 27134.276 10962.487 51779.473 51.779
08:23:00 13869.775 27277.918 10929.429 52077.122 52.077
08:24:00 13816.064 27432.410 10977.666 52226.140 52.226
08:25:00 13803.868 27340.420 11164.580 52308.868 52.309
08:26:00 14067.490 27749.820 11181.547 52998.857 52.999
08:27:00 14320.785 28666.660 11508.521 54495.966 54.496
08:28:00 14776.981 29448.911 11734.622 55960.514 55.961
08:29:00 14651.085 29432.937 11657.303 55741.325 55.741
08:30:00 13937.165 28424.352 11233.331 53594.848 53.595
08:31:00 14817.231 29654.477 11737.690 56209.397 56.209
08:32:00 16785.233 35103.435 13485.180 65373.847 65.374
08:33:00 21109.491 40675.031 16819.157 78603.680 78.604
Uniformly awful.
Usually don’t want to drive this past about 4 to 6.
PEND is correspondingly awful throughout the run.
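One rough way to compare the OEX column with the 4-to-6 guideline is to ask how many chpids would bring it under a target. The sketch below is an assumption-laden estimate, not a sizing method from this deck: the helper names are hypothetical, and it assumes total exchange time stays the same as chpids are added. Less interleaving should shorten CONN, so the result is likely pessimistic.

```python
import math


def oex(total_ms_per_sec: float, n_chpids: int) -> float:
    """Open exchanges per chpid, as in the OEX column (total is msec/sec)."""
    return total_ms_per_sec / n_chpids / 1000.0


def chpids_for_target(total_ms_per_sec: float, target_oex: float) -> int:
    """Smallest chpid count that keeps OEX at or below target_oex,
    ASSUMING total exchange time does not change as chpids are added."""
    return math.ceil(total_ms_per_sec / 1000.0 / target_oex)


# 08:19:00 row of the report above: WEB0720 + GRN0720 + YEL0720.
total_0819 = 13619.693 + 27151.697 + 10842.976

print(f"OEX with 1 chpid: {oex(total_0819, 1):.3f}")
for target in (4, 6):
    print(f"target OEX {target}: {chpids_for_target(total_0819, target)} chpids")
```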
ESCON CONN < FICON CONN?
I thought CONN was time spent sending data
Shouldn’t FICON time be less? The fiber’s faster.
ESCON: one I/O, then next I/O, then next I/O…
FICON: I/Os are interleaved, like IP packets
CONN is time from beginning of first frame to end of last frame
Even though fiber is faster, FICON CONN is longer
Too much interleaving
Get more chpids
Very Rough Crack at FICON Interleaving Level
ESCON conn per I/O was 4.1 msec
FICON conn per I/O was 11.5 msec
FICON fiber is about 7.5x as fast as ESCON fiber
(11.5 / (4.1 / 7.5)) = 21 interleaving factor
Either way you look at this, the FICON interleaving factor is just way too high in this workload
Use caution with this technique. Because of the prevalence of FICON, controllers often do not disconnect anymore, except for a cache miss.
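The arithmetic above can be written out as a small Python sketch. The function name `interleaving_factor` is hypothetical, and the calculation assumes each I/O moves the same amount of data on the ESCON source and the FICON target, so that the ESCON CONN scaled by the link speed ratio approximates an uncontended FICON CONN.

```python
def interleaving_factor(ficon_conn_ms, escon_conn_ms, speed_ratio):
    """Observed FICON CONN divided by the CONN expected if the same
    transfer ran alone on the faster fiber."""
    expected_ficon_conn_ms = escon_conn_ms / speed_ratio
    return ficon_conn_ms / expected_ficon_conn_ms


factor = interleaving_factor(ficon_conn_ms=11.5, escon_conn_ms=4.1, speed_ratio=7.5)
print(f"expected uncontended FICON CONN: {4.1 / 7.5:.2f} msec")
print(f"interleaving factor: {factor:.0f}")
```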
Remediation Possibilities for This Workload
More FICON chpids to 2107
– Data suggests drop from 70 msec/IO to 35 msec/IO if chpid limitations were remediated
– We don’t know whether this would just make the 2107 DISC situation worse
Spread FICON target volumes over >1 2107
– Would certainly help with DISC time
More ESCON chpids to 2105
– Probably least urgent, because PEND settled down after a few minutes
Consider Metro Mirror aka Synchronous PPRC?
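One plausible reading of the 70 to 35 msec/IO figure is that it removes cwait from the FICON response time. The sketch below checks that reading against the D01A values shown earlier in this deck. It is an interpretation, not necessarily the calculation behind the slide, and it pairs the EXCH report rows with the DEVLOG rows in the order they are listed.

```python
# Per-interval values for device D01A, copied from the FICON EXCH report
# (IRPIO) and the FCX168 DEVLOG (Pend, Resp) shown earlier; all in msec.
intervals = [
    # (irwait, pend, resp)
    (0.857, 37.1, 71.0),
    (0.956, 38.9, 73.2),
    (0.993, 36.5, 70.4),
    (0.900, 36.7, 71.1),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


avg_resp = mean(resp for _, _, resp in intervals)
# cwait = PEND - irwait is the part attributed to waiting for a chpid.
avg_cwait = mean(pend - irwait for irwait, pend, _ in intervals)

print(f"average response time:  {avg_resp:.1f} msec/IO")
print(f"average cwait:          {avg_cwait:.1f} msec/IO")
print(f"response without cwait: {avg_resp - avg_cwait:.1f} msec/IO")
```

This leaves DISC and CONN untouched, so it says nothing about whether relieving the chpid would shift the bottleneck to the 2107.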
How Do You Know When You Have Enough?
No cwait means you have enough chpids
– Probably PEND is a good approximator of this
FCX232 IOPROCLG shows no channel busy hits
CONN ratio is in line with fiber speed ratios
– Be careful of non-data-transfer-time charged to CONN
DISC time will come down as you add controller cache
– Spread data across multiple controllers
Acceptable application response time
Summary
LCHANNEL, aka “channel busy” report, is probably the least useful indicator of “busy-ness”
PEND time is generally indicative of channel subsystem contention, even though it contains both cwait and irwait
FCX232 IOPROCLG does show us busy retries, if we know how to look at it
Open exchange analysis reveals how bad the situation is on the fiber itself