EPV Technologies
z13 Capacity Planning 1
z13 Capacity Planning
Fabio Massimo Ottaviani – EPV Technologies
April 2015
1 Introduction

On January 14th IBM announced its new generation of the mainframe. The new system is simply called IBM z13, while the family model is 2964. Experienced capacity planners know that every new generation of machines provides a major challenge to their skills. They also know that their best friends are the IBM LSPR benchmarks, the IBM zPCR tool, the Measurement Facility counters provided in SMF 113 and an up-to-date performance database. At the time of writing this paper, the only things available are the IBM LSPR benchmarks, so in the first part of this paper, after a quick look at the most important capacity characteristics of the IBM z13, we will start from them to calculate the MIPS capacity of each IBM z13 processor model. We will also compare z13 single-CP capacity and workload variability with previous machine generations.
2 z13 capacity highlights

It's interesting to note that, in some respects, the z13 is more an evolution of the zBC12 than of the zEC12. Like the zBC12, it uses CPC drawers instead of books, and single chip modules (SCM) instead of multi chip modules (MCM). The processor speed is slightly lower than in the previous machine generations, but thanks to the new processor architecture the single processor capacity has increased (see the next chapter).
System               z13               zEC12             z196
Type                 2964              2827              2817
HW Models            N30, N63, N96,    H20, H43, H66,    M15, M32, M49,
                     NC9, NE1          H88, HA1          M66, M80
Cycle rate (GHz)     5.0               5.5               5.2
Max CP               141               101               80
Max LPARs            85                60                60
zAAP support         N                 Y                 Y
Subcapacity models   4xx, 5xx, 6xx     4xx, 5xx, 6xx     4xx, 5xx, 6xx
Entry MIPS           250               240               240
Max MIPS             111,556           78,426            52,286
Entry MSU            31                30                30
Max MSU              13,078            9,194             6,140
Figure 1

In the table above we compare some of the most important capacity characteristics of the z13, zEC12 and z196 machines. From the point of view of CP capacity the magic number is 40%. Compared to the zEC12, the z13 provides:
- about 40% more CPs;
- about 40% more total capacity;
- about 40% more LPARs.
Memory is a different story: the minimum size is 64 GB while the maximum is 10 TB, versus a maximum of 3 TB available with the zEC12. The message behind this is that memory is a critical factor in improving response time and reducing the CPU consumption of modern applications. Early rumours say that IBM will be very aggressive on memory pricing. As usual, subcapacity models (4xx, 5xx and 6xx) are available. The entry point is not very different from the zEC12: about 250 MIPS (31 MSU). As expected, zAAPs are not supported anymore. Finally, the announced SMT (Simultaneous Multi-Threading) revolution has only partially started: IBM decided to provide SMT for zIIPs and IFLs but not for standard CPs.
The reason is that SMT will increase the overall throughput but it will introduce very big challenges from the point of view of single address space performance, variability, measurement and accounting. IBM wisely chose a gentle approach which will allow them and their customers to gain experience with these much less critical resources first.
3 IBM LSPR benchmarks and MIPS values

On the same day as the announcement, a new set of IBM LSPR benchmarks for z/OS 2.1 was published on the web. These benchmarks are available for all the IBM machines, including the z13. However, for all the machines except the z13, the published values are exactly the same as those already available in the z/OS 1.13 benchmark table. As usual, benchmarks for three workload categories are provided:
- LOW RNI (Relative Nest Intensity): this category represents workloads making light use of the memory nest (shared processor caches and memory). This would be similar to the past high scaling primitives.
- AVERAGE RNI: this category represents workloads with an average use of the memory nest hierarchy. This would be similar to the past LoIO-mix workload and is expected to represent the majority of production workloads.
- HIGH RNI: this category represents workloads making heavy use of the memory nest. This would be similar to the past DI-mix workload.
Benchmark values are the ITR ratios between the capacity of each processor model and the capacity of a reference processor model which, as in the z/OS 1.13 table, is the 2094-701. Starting from the published values, shown in the columns with a dark blue header in the figure below, we calculated the capacity of each processor model, in MIPS, by multiplying the benchmark values by the suggested "capacity scaling factor"1. Only the starting and ending rows of the table are shown here; you can find the full list of z13 processor models in Appendix A. Note that a PCI value is also provided. It is very close to the AVG MIPS capacity; the difference is due to the limited precision (only 2 decimals) of the published benchmarks.
1 More information at https://www-304.ibm.com/servers/resourcelink/lib03060.nsf/pages/lsprindex?OpenDocument
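The calculation just described can be sketched in a few lines. The 593.00 MIPS value for the reference 2094-701 is the one zPCR uses (see the zPCR chapter later in this paper); the function name and the example ratio are ours.

```python
# Sketch of the MIPS calculation described above: multiply the published
# LSPR ITR ratio (relative to the 2094-701) by the capacity scaling factor.
REF_2094_701_MIPS = 593.0  # zPCR's estimate for the reference 2094-701

def lspr_mips(itr_ratio: float) -> float:
    """Convert a published LSPR ITR ratio into an estimated MIPS value."""
    return itr_ratio * REF_2094_701_MIPS

# A hypothetical model published with an ITR ratio of 2.00
print(lspr_mips(2.0))  # 1186.0 MIPS
```

Because the published ratios carry only two decimals, MIPS values derived this way can differ slightly from the PCI figures, as noted above.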
In the next figure you can find the MIPS to MSU ratio of every processor model of the last three mainframe generations. You can see that the shape of the graphs is very similar, so IBM has definitely stopped using the machine MSU capacity to make discounts. The ratio is also very stable: it starts at 8.1 (8.0 for the z196) for single-CP processor models and rises until it flattens at 8.5 (from a capacity of about 26,000 MIPS).
Figure 3

[Chart: MIPS/MSU ratio by processor model for the 2817 (z196), 2827 (zEC12) and 2964 (z13) families]
4 Single CP capacity

The graph below shows the maximum capacity of the single CP processor model of each of the last five mainframe generations.
Figure 4
You can see that the relative single-CP capacity improvement is continuing to decline:
- from z9 to z10 it improved by 61%;
- from z10 to z196 it improved by 33%;
- from z196 to zEC12 it improved by 26%;
- from zEC12 to z13 it improved by 12%.
So it really seems that Moore’s Law is approaching its limits for current commercial technologies. Capacity growth is therefore based on an increasing number of processors and, in the near future, on SMT exploitation.
[Figure 4 chart: uniprocessor capacity, full-speed models – 2094-701 (z9) 560 MIPS; 2097-701 (z10) 902; 2817-701 (z196) 1,202; 2827-701 (zEC12) 1,514; 2964-701 (z13) 1,695]
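The generation-over-generation improvements listed above follow directly from the Figure 4 single-CP capacities:

```python
# Single-CP (7x1 model) capacity in MIPS for each generation, from Figure 4.
single_cp_mips = {
    "z9": 560, "z10": 902, "z196": 1202, "zEC12": 1514, "z13": 1695,
}

generations = ["z9", "z10", "z196", "zEC12", "z13"]
for prev, curr in zip(generations, generations[1:]):
    improvement = (single_cp_mips[curr] / single_cp_mips[prev] - 1) * 100
    print(f"{prev} -> {curr}: +{improvement:.0f}%")
```

Running this reproduces the 61%, 33%, 26% and 12% figures quoted in the text.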
5 Capacity variability

The following graph shows a comparison, across the last five mainframe generations, of a variability index calculated using the workload MIPS capacity as follows:

    Variability = (Low RNI MIPS - High RNI MIPS) / Average RNI MIPS
You can see that capacity variability grows very quickly with the number of processors and then flattens. You can also note that, with the exception of the z10, workload variability increases with every new generation.
Figure 5
Very big z13 processor models show a variability higher than 45%. This means that the difference in z13 capacity between LOW RNI and HIGH RNI is 45% of the AVG RNI capacity. So if you consider a 2964-7C2 model (122 CPs), the LOW RNI capacity is 129,161 MIPS, the AVG RNI capacity is 100,124 MIPS and the HIGH RNI capacity is 84,148 MIPS. The difference between LOW RNI and HIGH RNI is about 45,000 MIPS (about 45% of the AVG RNI capacity).
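The figures in this example can be checked with the variability formula above (a quick sketch; the MIPS values are the ones quoted in the text):

```python
def variability_index(low_mips: float, avg_mips: float, high_mips: float) -> float:
    """(Low RNI MIPS - High RNI MIPS) / Average RNI MIPS."""
    return (low_mips - high_mips) / avg_mips

# 122-CP z13 model figures quoted above
v = variability_index(129_161, 100_124, 84_148)
print(f"{v:.1%}")  # prints "45.0%"
```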
[Figure 5 chart: capacity variability by processor family – 2094 (z9), 2097 (z10), 2817 (z196), 2827 (zEC12) and 2964 (z13) models; variability ranging from 0% to 50%]
Figure 6
The graph in Figure 6 shows the absolute capacity difference between LOW RNI and HIGH RNI workloads. The bottom line is that it is more and more important to correctly classify your system workloads in order to use the right benchmark in capacity planning studies. However to do that we need an update of the SMF 113 records to provide hardware measurement facility counters for the z13 machines.
[Figure 6 chart: difference between LOW RNI and HIGH RNI capacity by processor model and mainframe generation – z196, zEC12 and z13]
6 IBM zPCR

A new version (V8.7a) of the free IBM zPCR tool, supporting the z13 machines, is already available on the web. You can download it at: https://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/PRS1381.
Figure 7
zPCR is a "must have" tool for capacity planners. With it you can estimate the capacity of a machine taking into consideration the LPAR configuration, operating system level and workload characteristics. As you can see in Figure 7, the reference CPU is still a 2094-701, estimated at 593.00 MIPS. This is the base of zPCR capacity studies and also the base for the ratio or MIPS estimates provided in the LSPR Multi-Image Capacity table. It's worth noting that the zPCR MIPS values are a bit more precise than the ones you can calculate starting from the LSPR benchmarks, provided in the Appendix to the first part of this paper. A snapshot of the table is provided in Figure 8.
Figure 8
As you can see, zPCR also provides Low-Avg and Avg-High values, which are calculated as a harmonic mean of the Low, Average and High RNI benchmarks. They should be used when workload characteristics are on the border between Low and Average RNI or between Average and High RNI. To understand which benchmark best represents your system workload, you need to collect the hardware measurement facility counters (recorded in SMF 113) and pass them as input to zPCR, which will automatically select the appropriate LSPR benchmark2. A better solution is collecting SMF 113 in a tool, such as EPV for z/OS, and analysing system workload behaviour across multiple days and at different times of the day. Whatever method you choose, the benchmark to use depends on the percentage of misses in the Level 1 cache and on the RNI value of the system workload. Starting from these two values you can classify it by using the rules in Figure 9.
2 The z13 processor cache architecture and the hardware measurement facility counters will be discussed in the third part of this paper.
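The harmonic-mean blending mentioned above can be sketched as follows; the input capacities are hypothetical, and the exact values zPCR publishes may differ slightly due to rounding:

```python
def blended_capacity(mips_a: float, mips_b: float) -> float:
    """Harmonic mean of two adjacent LSPR capacity values (e.g. Low and
    Average RNI), as used for zPCR's Low-Avg and Avg-High columns."""
    return 2.0 / (1.0 / mips_a + 1.0 / mips_b)

# Hypothetical Low and Average RNI capacities for one processor model
print(round(blended_capacity(12000.0, 10000.0)))  # ~10909 MIPS
```

Note that the harmonic mean sits below the arithmetic mean (11,000 here), weighting the result toward the smaller capacity.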
%L1 Miss    RNI             Workload
3% to 6%    > 1.00          High
3% to 6%    0.60 to 1.00    AVG
3% to 6%    < 0.60          Low
> 6%        >= 0.75         High
> 6%        < 0.75          AVG
Figure 9
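The Figure 9 rules can be expressed as a small classification function. The behaviour below 3% L1 misses is not shown in the figure, so this sketch returns None there rather than guessing:

```python
from typing import Optional

def classify_rni(l1_miss_pct: float, rni: float) -> Optional[str]:
    """Pick the LSPR workload category from %L1 Miss and RNI (Figure 9)."""
    if 3.0 <= l1_miss_pct <= 6.0:
        if rni > 1.0:
            return "High"
        if rni >= 0.6:
            return "AVG"
        return "Low"
    if l1_miss_pct > 6.0:
        return "High" if rni >= 0.75 else "AVG"
    return None  # below 3% L1 miss: not covered by Figure 9

print(classify_rni(4.5, 0.8))  # AVG
print(classify_rni(7.0, 0.9))  # High
```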
Besides z13 support, the biggest change introduced in this zPCR version is the ability to take into account the effect of Simultaneous Multi-Threading (SMT) on zIIP engines.
Figure 10
A new option button, "Add SMT benefit to Capacity Results", is provided in the Partition Detail Report. When you click it, a small box appears allowing you to choose whether to apply a capacity increase due to SMT to zIIPs, IFLs or both. The default values are 25% for zIIP and 20% for IFL; of course you can change them.
Figure 11
In Figure 12 you can see the result: the zIIP capacity increased from 1,681 to 2,102 MIPS.
Figure 12
7 z13 Simultaneous Multi-Threading3

7.1 CPU or core?

Originally each processor chip contained a single central processing unit, so the term "CPU" was used to indicate both the chip and the processing unit.

To increase performance, manufacturers started to increase the number of central processing units on a chip, calling them cores. A multi-core chip appears to the operating system (e.g. z/OS) as multiple processing units which can be used by different processes at the same time. This is what is relevant from a measurement and performance analysis perspective.

Mainframe machines have exploited multi-core chips for many years, so we should be accustomed to the term "core". In reality, all mainframe commands, tools, manuals and people still use the term "CPU" to indicate a core.
Figure 13
In the figure above you can see the structure of the z13 PU Single Chip Module (from “IBM z13 Technical Guide”). Eight cores are hosted on the SCM.
3 Most of the content of this chapter has been inspired by “Simultaneous Multithreading and System z” written by Bob Rogers and published in number 3-2014 of Cheryl Watson’s TUNING Letter.
7.2 Advantages and issues of SMT

Mainframe cores process instructions in multiple pipes composed of a number of stages, each performing one step in the processing of an instruction, similar to an assembly line. However, a traditional core can only operate on a single instruction stream, so a big part of the core capacity is normally wasted when an instruction stream gets stalled waiting for a cache miss to be resolved. To address this issue, with the z13 machines IBM decided to start exploiting Simultaneous Multi-Threading (SMT). With SMT, multiple instruction streams can be processed simultaneously, so when a thread is waiting for a cache miss the core can continue doing work on behalf of the other threads. Unfortunately, the additional throughput from SMT does not scale very well with the number of threads, because all the threads on a core share some limited resources (e.g. pipes, processor cache, TLB). We saw in the previous chapter that the default expected increase of zIIP capacity when using SMT-2 (two threads) is only 25% in zPCR.4 As already mentioned, IBM has been very cautious with SMT on the z13: only SMT-2 can be used, and only on zIIPs and IFLs. The reason for this approach is that, while SMT may generally increase the overall throughput, it introduces some important issues.

a. Reduced speed: a thread in an SMT environment is slower than a thread using a dedicated core. The main reason is that the Level 1 and Level 2 caches are shared among the threads; the effect on the application is similar to running on more but slower engines, and the more threads, the stronger the effect.
b. Throughput variability: as discussed in the first part of this paper, variability has been increasing with each new mainframe model as the processor designs get ever more complex. With SMT that variability will increase much more, because the throughput will also depend on the characteristics of the threads sharing the core. If all threads need the whole Level 1 cache, throughput could be even worse than running without SMT. On the other hand, if all threads have a small Level 1 cache footprint, the overall throughput could be up to 100% more (with SMT-2) than running without SMT.
c. zIIP measurements: all the zIIP measurements have to be reviewed. The current CPU timer implementation accounts processor time both when using the processor and when waiting (normally for a Level 1 cache miss); with SMT, the time spent waiting for other threads will be accounted as processor time too. Even zIIP busy may become tricky: if we have only one zIIP core with SMT-2 and only one thread running at 100% busy, we could say that the overall zIIP busy is 50%, because we have another thread to use. But if we assume that activating the second thread yields a maximum throughput increase of about 25%, we should say that zIIP busy is 80%, because by adding 20% (80% * 25%) more work we will reach 100% busy5.
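The zIIP-busy arithmetic in point (c) can be written out as a small sketch; the 25% figure is the zPCR default mentioned earlier, not a guaranteed benefit:

```python
def smt2_core_busy(thread_utilization: float, smt_benefit: float = 0.25) -> float:
    """Estimated busy of one SMT-2 core, measured against the estimated
    maximum throughput of both threads (1 + smt_benefit) rather than
    against a naive thread count of 2."""
    return thread_utilization / (1.0 + smt_benefit)

# One thread 100% busy, second thread idle: the core is already ~80% busy,
# because only ~25% more throughput is available from the second thread.
print(smt2_core_busy(1.0))  # 0.8
```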
4 On P7 machines the average throughput increase with SMT-2 is about 40%; it will probably be about the same on the mainframe.
5 A solution to this issue has been implemented on P7 machines.
7.3 Settings and commands

To activate the SMT-2 function on z/OS, you have to:
- define the PROCVIEW CORE option in LOADxx; if you do not want to use SMT-2, you can omit the PROCVIEW parameter or specify PROCVIEW CPU, which is the default;
- set MT_ZIIP_MODE=2 in IEAOPTxx.
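The two settings described above can be sketched as parmlib fragments (the xx member suffixes are installation choices):

```
LOADxx:     PROCVIEW CORE
IEAOPTxx:   MT_ZIIP_MODE=2
```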
When you define PROCVIEW CORE, you cannot use the word CPU in z/OS commands. You must use CORE instead of CPU. If you want to continue to use CPU in z/OS commands, you have to define PROCVIEW CORE,CPU_OK. This parameter causes z/OS to treat CPU as an acceptable alias for CORE.
Figure 14
You can see that the output of D M=CORE is quite different from the output of D M=CPU6. For each CORE ID there is a range with two IDs: when SMT-2 is used, each thread appears as a logical processor to z/OS, as you can see in the CPU column of CORE ID 0004 (an online zIIP).
6 From “IBM z13 Configuration Setup”.
8 Processor cache architecture

Generally speaking, the zEC12 and z13 architectures are very similar. If the data and instructions to be processed are found in the Level 1 cache (L1) dedicated to each processor7, we have a "cache hit" and the speed of the clock can be well exploited. If the data and instructions cannot be found in L1, the hardware tries to load them from the Level 2 cache (L2), which is still dedicated to each processor, then from the Level 3 cache (L3), which serves all the processors on the same chip, then from the Level 4 cache (L4) of the same book8, from the L4 cache of another book, from local memory or from remote memory, in this order. This is a "cache miss", and clock cycles are lost while waiting for data and instructions to be loaded into the L1 cache. The number of lost cycles depends on the cache level accessed; it can range from a few cycles for L2 to hundreds of cycles for memory.
Figure 15 shows a simplified view of the zEC12 processor cache architecture.
Figure 15
The zEC12 uses up to a maximum of 4 books; each book includes 6 chips and each chip includes 6 processors.
One of the limits of this architecture is that there is no direct communication between L3 caches. Data and instructions moving from one L3 to the other have to pass through the L4 cache which is the coherence manager. So all memory fetches must be in the L4 cache before that data can be used by the processor.
7 There are two L1 caches, one for data and the other for instructions, dedicated to each processor, but for simplicity only one cache is depicted in the figure.
8 L4 serves all the processors in a book.
[Figure 15 diagram: zEC12 cache hierarchy – each 48 MB L3 cache serves six cores with private L1 and L2 caches (+4 more L3 per book); one 384 MB L4 cache per book (+3 books)]
In the z13 cache design, represented in Figure 169, some lines of the L3 cache are not included in the L4 cache. The L4 cache has a non-data inclusive coherent (NIC) directory that has entries pointing to the non-inclusive lines of L3 cache. This design ensures that L3 locally owned lines can be accessed by using the intra L3 node interface without being included in L4.
Figure 16

Another big improvement is in the total amount of cache provided. The new z13 technology increases the size of the cache levels (L1 and L2) without increasing access latency. This has a direct influence on productivity, reducing the number of L1 and L2 misses and allowing better exploitation of the processor speed.
In the z13, up to 8 processors can be served by an L3 cache, so even though the L3 size has increased, the average amount of MB per processor is the same.
Finally, the size of the L4 cache has been hugely increased, from about 1.5 GB to about 5.5 GB per machine. So even though there is a larger number of processors to be supported, the average L4 cache per processor has been doubled in the z13.
A bigger L4 cache has a key role in reducing the accesses to memory which is still much slower even than the L4 cache.
9 Only half of one CPC drawer node is represented.
[Figure 16 diagram: z13 cache hierarchy, half of one CPC drawer node – cores with private L1 and L2 caches, 64 MB L3 caches connected by the intra L3 node interface, and a 480 MB L4 cache with a 224 MB NIC directory per node (+7 nodes)]
The bottom line is always the same: “Workload capacity performance will be quite sensitive to how deep into the memory hierarchy the processor must go to retrieve the workload’s instructions and data for execution. Best performance occurs when the instructions and data are found in the cache(s) nearest the processor so that little time is spent waiting prior to execution; as instructions and data must be retrieved from farther out in the hierarchy, the processor spends more time waiting for their arrival.10” The two main factors determining workload performance are:
- the percentage of L1 misses over total searches;
- the percentage of L1 misses satisfied by each cache level (including memory).
In the next chapter we will show how to calculate them for z13 machines.
10 From IBM Large Systems Performance Reference
9 SMF 113 counters

The CPU Measurement Facility (CPU MF), introduced with the z10 machines, provides the ability to obtain measurements (counters) on processor cache effectiveness. The collected information is recorded in SMF 113 subtype 2 records11. Starting with z/OS 2.1, SMF 113 subtype 1 is also written. IBM stated that subtype 2 will be frozen and all new information will be added to subtype 1. They have already started this process by adding two new counter sections:
- z/OS counters;
- MT counters.
However, at the moment the major advantage of subtype 1 versus subtype 2 is that it provides de-accumulated counters. All the counters and formulas discussed in this chapter apply to both subtypes. The most important groups of counters are:
- basic counters, which should be used to calculate the percentage of L1 misses;
- extended counters, which should be used to calculate the percentage of L1 misses sourced by each cache level and, starting from them, the RNI value.

BASIC COUNTERS

Six metrics are provided in the Basic Counters section.
The meaning of the basic counters is the same whatever the machine model (z10, z196, z114, zBC12, zEC12 or z13). Starting from these measurements, the percentage of L1 misses over total searches can be calculated for z13 machines by using the following formula:

z13 %L1 Miss = ((B2 + B4) / B1) * 100
z13 EXTENDED COUNTERS

The meaning of the extended counters depends on the machine model12. The SMF113_2_CTRVN2 field allows us to identify the model: it is 1 for the z10, 2 for the z196 and z114, 3 for the zEC12 and zBC12, and 4 for the z13. The number of L1 misses sourced by each cache level can be calculated as follows:

- L2d, data sourced from L2 = E133;
11 They are also collected in a USS file written in the HIS started task HOME directory.
- L2i, instructions sourced from L2 = E136;
- L3d, data sourced from L3 = E144 + E145;
- L3i, instructions sourced from L3 = E162 + E163;
- L4Ld, data sourced from L4 Local = E146 + E147 + E148;
- L4Li, instructions sourced from L4 Local = E164 + E165 + E166;
- L4Rd, data sourced from L4 Remote = E149 + E150 + E151 + E152 + E153 + E154 + E155
The following formula allows you to calculate the RNI of a system running on a z13 machine, starting from the extended counters:

z13 RNI = 2.6 x (0.4 x %L3 + 1.6 x %L4L + 3.5 x %L4R + 7.5 x %MEM) / 100

The coefficients (in bold) are used to weight cache and memory accesses, so in the above formula:
- accessing the chip cache (%L3) is weighted 0.4;
- accessing the local book cache (%L4L) is weighted 1.6;
- accessing a remote book cache (%L4R) is weighted 3.5;
- accessing memory (%MEM), including both local and remote book memory, is weighted 7.5;
- an additional coefficient (2.6) is used to adjust the resulting RNI value.
IBM has always stated that these coefficients may change in the future; however, only very small changes have been made to the RNI formulas of previous machines up to now. As already discussed, workload capacity performance is quite sensitive to how deep into the memory hierarchy the processor must go to retrieve the workload's instructions and data. So the higher the RNI, the worse the workload capacity performance will be. In practical terms, the machine will look less powerful to a workload presenting HIGH RNI characteristics than to a workload presenting AVG RNI or LOW RNI characteristics.
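Putting the formulas of this chapter together, the %L1 Miss and RNI calculations can be sketched as follows. B1, B2 and B4 are the basic counters used above; the percentages passed to the RNI function are the shares of L1 misses sourced by each level, derived from the extended counters; the example counter values are hypothetical:

```python
def z13_l1_miss_pct(b1: float, b2: float, b4: float) -> float:
    """z13 %L1 Miss = ((B2 + B4) / B1) * 100."""
    return (b2 + b4) / b1 * 100.0

def z13_rni(pct_l3: float, pct_l4_local: float,
            pct_l4_remote: float, pct_mem: float) -> float:
    """z13 RNI; each argument is the percentage of L1 misses sourced
    by that level (L2 carries an implicit weight of 0)."""
    return 2.6 * (0.4 * pct_l3 + 1.6 * pct_l4_local
                  + 3.5 * pct_l4_remote + 7.5 * pct_mem) / 100.0

# Hypothetical system: of all L1 misses, 20% are sourced from L3,
# 5% from local L4, 1% from remote L4, 0.5% from memory, the rest from L2.
rni = z13_rni(20.0, 5.0, 1.0, 0.5)
print(round(rni, 2))  # 0.6
```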
12 More details in “The CPU-Measurement Facility Extended Counters Definition for z10, z196/z114, zEC12/zBC12 and z13” manual (SA23-2261-03).
By using the %L1 Miss and RNI values together with the rules in Figure 9 (Chapter 6), you can understand which benchmark best represents the workload running in each system.
10 The CPI index

The CPI index represents the average number of cycles needed per instruction. It can be calculated by using the basic counters and the following simple formula:

CPI = B0 / B1

As you can imagine, there is no rule of thumb for the ideal CPI value. However, it's intuitive that to exploit the processor power the CPI value should be as low as possible. Measuring this index on a regular basis will allow you to evaluate the effect of changes in:
- hardware configuration;
- microcode;
- exploitation of HiperDispatch;
- LPAR configuration, such as weights, number of logical processors, number of LPARs, etc.;
- system and subsystem levels;
- workload mixture.
Using this knowledge, in case of performance degradation after a change you will be able to quickly identify the problem and solve it. The figure below shows the effect on the CPI values of moving two production systems (SYSA and SYSB) from z10 to zEC12. The graph refers to the weeks from March to May, which are the systems' peak period every year.
Figure 18
[Figure 18 chart: CPI values in the peak weeks (weeks 10 to 22), z10 vs zEC12 – series: SYSA 2014 (zEC12), SYSB 2014 (zEC12), SYSA 2012 (z10), SYSB 2012 (z10)]
Generally speaking, you should always expect a CPI reduction when moving to a new machine generation; we think this will apply to the z13 too. You can also get a deeper understanding of CPI by splitting it into:
- finite_CPI: cycles needed because the L1 cache is not infinite;
- instruction_complexity_CPI: cycles that would be needed even with an infinite L1 cache.
They can be estimated by using the following simple formulas:
finite_CPI13 = E143 / B1
instruction_complexity_CPI = CPI – finite_CPI
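The CPI split above can be sketched as follows. B0 is the basic cycle counter and B1 the instruction counter, per the CPI formula in this chapter; the counter values in the example are hypothetical:

```python
def cpi_breakdown(b0: float, b1: float, e143: float):
    """Return (CPI, finite_CPI, instruction_complexity_CPI)."""
    cpi = b0 / b1          # average cycles per instruction
    finite = e143 / b1     # cycles with an L1 cache / TLB1 miss in progress
    return cpi, finite, cpi - finite

# Hypothetical counters: 5 billion cycles, 2 billion instructions,
# 2 billion miss-in-progress cycles.
cpi, finite, complexity = cpi_breakdown(b0=5e9, b1=2e9, e143=2e9)
print(cpi, finite, complexity)  # 2.5 1.0 1.5
```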
13 The E143 extended counter provides the number of cycles where a level-1 cache or level-1 TLB miss is in progress.
11 Conclusions

The z13 looks a very powerful machine; its processor cache architecture presents interesting improvements compared to the zEC12. Single processor capacity increases by only 12%, and capacity variability depending on the workload continues to increase. SMT will introduce some throughput improvement but also new challenges, both for IBM and for customers. More information is needed about CPU measurements, which will radically change with SMT. The new SMF 113 subtype 1 record, available since z/OS 2.1, provides de-accumulated counters and introduces new counter sets. At the moment you can use either of the SMF 113 record subtypes to calculate all the indexes relevant for capacity planning activities, such as %L1 Miss, RNI and CPI. However, IBM said they will freeze the "old" subtype 2 records, so you should prepare to switch to subtype 1 as soon as you move to z/OS 2.1.
Appendix A – z13 MIPS table

This table is provided "as is". While EPV Technologies believes the information included in this table to be accurate, EPV Technologies cannot be held responsible for any consequential damages resulting from the application of information contained in this table.