April Release: Performance Review

Dec 17, 2015


Dora Melton
Transcript
Page 1:

April Release: Performance Review

• High-level PPE system diagram

• Definitions

• Current PPE performance summary

• <app1> performance

• <app2> performance

• <app3> performance

• <app4> performance

• May RR expectation list

• Links to test detail

Page 2:

High-level PPE system diagram

Page 3:

Definitions

See details here: <link>

Performance is a property of a software system that indicates its ability to be as powerful, fast, stable, and scalable as required.

Ramp-up test (also known as capacity test): Virtual users are steadily added until performance saturation occurs, that is, until adding more virtual users results in response time growth or system failure rather than growth in transactions processed per second. The goal of the test is to find the saturation point and describe it: how many transactions/sec, how much network traffic/sec, etc. The ramp-up test reveals how powerful the system is.
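One way to locate the saturation point described above is to look for the first ramp-up step where throughput stops growing. The sketch below is only an illustration: the function name, the `min_gain` threshold, and the sample measurements are assumptions, not part of the original test methodology.

```python
from typing import Optional, Sequence, Tuple

Step = Tuple[int, float]  # (virtual users, transactions per second)


def find_saturation(steps: Sequence[Step], min_gain: float = 0.05) -> Optional[Step]:
    """Return the last ramp-up step at which throughput was still growing.

    min_gain is a hypothetical threshold: a step counts as growth only if
    transactions/sec rise by more than this fraction over the previous step.
    """
    for prev, cur in zip(steps, steps[1:]):
        if cur[1] <= prev[1] * (1 + min_gain):
            return prev  # adding users after this step no longer adds throughput
    return None  # no saturation within the measured range


# Made-up ramp-up measurements, for illustration only.
ramp_up = [(10, 5.0), (20, 10.0), (30, 15.0), (40, 19.0), (50, 21.0), (60, 21.2)]
saturation = find_saturation(ramp_up)
print(saturation)
```

A real analysis would likely also check response times and error rates at each step, since the definition treats response time growth and system failure as signs of saturation too.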

Short low-, mid-, and high-load tests: A test runs with a fixed number of virtual users for a specific period of time to measure response times under different load conditions determined from the ramp-up test. These tests depend on the results of the ramp-up test because the “low”, “mid”, and “high” load levels are relative to the saturation level. High load is usually deemed to be 80% of the saturation load. These tests show how fast the system is.
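Because the load levels are defined relative to saturation, they can be derived from the ramp-up result. In this sketch only the 80% figure for high load comes from the definition above; the low and mid fractions, the function name, and the saturation value of 50 users are assumptions chosen for illustration.

```python
from typing import Dict


def load_levels(saturation_users: int, low: float = 0.2,
                mid: float = 0.5, high: float = 0.8) -> Dict[str, int]:
    """Derive virtual-user counts for the fixed-load tests.

    Only the 80% default for high load is taken from the text; the low and
    mid fractions are placeholders for the project's own values.
    """
    fractions = {"low": low, "mid": mid, "high": high}
    return {name: max(1, round(saturation_users * f)) for name, f in fractions.items()}


levels = load_levels(50)  # 50 users at saturation is a made-up figure
print(levels)
```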

Longevity test (also known as soak test or stability test): A very long high-load test of system stability. The longevity test shows how stable the system is.

Rush-hour test: A rapid increase in the number of users logging on to the system (imitating the start of the business day), followed by a rapid user logout (imitating activity at the end of business hours), run against a low-load background. If the project schedule is tight, this test can be replaced with a login storm test. This test shows how stable the system is.

Scalability tests: A series of ramp-up tests conducted against various system configurations (different numbers of application servers, different numbers of CPU cores, etc.). Scalability tests show how scalable the system is, i.e. how its capacity changes as the resources provided increase.
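A common way to summarize such a series is scaling efficiency: the capacity gain divided by the resource gain, relative to the smallest configuration. The sketch below assumes capacity is measured in transactions/sec per server count; the function name and the sample capacities are hypothetical.

```python
from typing import Dict


def scaling_efficiency(capacity_by_servers: Dict[int, float]) -> Dict[int, float]:
    """Compare capacity growth with resource growth, relative to the smallest setup.

    1.0 means capacity grew in proportion to the servers added (linear scaling);
    values well below 1.0 suggest poor scalability.
    """
    base_servers = min(capacity_by_servers)
    base_capacity = capacity_by_servers[base_servers]
    return {
        servers: (capacity / base_capacity) / (servers / base_servers)
        for servers, capacity in sorted(capacity_by_servers.items())
    }


# Made-up saturation capacities (transactions/sec) from three ramp-up tests.
efficiency = scaling_efficiency({1: 10.0, 2: 19.0, 4: 30.0})
print(efficiency)
```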

Page 4:

April Release Summary

Overall: Amber

Positive factors:

• Increased <..> capacity in comparison to the previous cycle (from 16 to 21 transactions/sec)

• Good <..> scalability

• Average of average <..> response time decreased from 1.4 to 1.1 seconds

• <..> capacity per server is 3.3 times better than the requirement

• Stable <..> system behavior during rush-hour tests

• <..> system was stable under high load; the memory leak was not reproduced in the long high-load test

• <..> system capacity increased compared to March RR (it is now back to the level of February RR); the issue with the <..>-side Search cache was not reproduced

• Overall <..> "Meet KPQP" share increased (<..> and News are the main contributors)

• Average of average <..> response time decreased from 0.6 to 0.4 seconds

• Slightly better overall <..> performance compared to the previous <..> tests in PPE (higher capacity)

• Lower average <..> response time for all VSM URLs compared to the previous test (-11%)

• <..> Funds unavailability issue was not reproduced, which led to a 100% KPQP pass status for the "<..> - Funds" domain

• Overall <..> KPQP pass status increased: Search, <..>, and Funds views and sub-component are the main contributors

• Good <..> scalability

<app2>: Amber

<app3>: Amber

<app4>: Amber

<app1>: Amber

Page 5:

April Release Summary

RAG status demotion factors:

• "Authorization failed" errors can cause problems with <..> user logon during rush hours

• "<..> - FI (Debt)" domain <..> showed 15% KPQP fail status (0% KPQP failed in March RR)

• Average response time within the “<..> – FI (Debt)” domain in <..> increased from 1.9 to 3.3 seconds

• <..> system capacity is ~20% worse vs. March RR

• <..> system scalability is poor

• <..> total open defects and issues increased 2.6 times (from 8 to 21)

• <..> memory leak has not been fixed yet

• Overall <..> "Meet KPQP" share decreased significantly (from 83% to 57%); every request group followed the overall trend

• Average of average <..> response time increased from 1.22 to 2.40 seconds

• Disproportionate <..> memory consumption growth leads to system instability when extra load is added after the saturation point

• 40% of all <..> KPQP transactions do not meet target under medium load

• Impact of network traffic volume on <..> performance (due to an environmental issue) was detected

• Number of open <..> defects and issues increased from 8 to 11

• <..> March RR was tested in CP April RR environment

• Instability of critical <..> back-ends: <..> WS and DocumentStore_1

• Instability of <..> back-ends: DidYouKnowService, mstService, NewsSvc_1, ItemService_1, etc.

Overall: Amber

<app2>: Amber

<app3>: Amber

<app4>: Amber

<app1>: Amber
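The before/after pairs quoted on the summary slides can be cross-checked with a small helper that reports both the ratio and the percentage change. The helper name is hypothetical; the input figures are the ones quoted on the slides.

```python
from typing import Dict


def describe_change(before: float, after: float) -> Dict[str, float]:
    """Express a before/after pair as a ratio and as a percentage change."""
    return {
        "ratio": after / before,
        "percent": 100.0 * (after - before) / before,
    }


# Figures quoted on the summary slides.
defects = describe_change(8, 21)         # open defects and issues
capacity = describe_change(16, 21)       # transactions/sec
response = describe_change(1.22, 2.40)   # average of average response time, seconds

print(f"defects: x{defects['ratio']:.1f}")
print(f"capacity: {capacity['percent']:+.0f}%")
print(f"response time: x{response['ratio']:.2f}")
```

For example, 21 / 8 = 2.625, which matches the "2.6 times" increase in open defects and issues reported above.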

Page 6:

<..> General Status

Positive factors:

Increased capacity in comparison to previous cycle (from 16 to 21 transactions/sec)

Good scalability

RAG status demotion factors:

~2% of transactions do not meet KPQP target under low load

"Authorization failed" errors can cause problems with user logon during rush hours

RAG status: Amber

April KPQP Status (chart)

Legend: Meet KPQP (average <= 3 sec); Miss KPQP (3 sec < average <= 5 sec); Fail KPQP (average > 5 sec); Test Fail

Chart values: 94%, 4%, 2%
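The KPQP legend on this slide maps an average response time to a status using fixed 3-second and 5-second limits. A minimal sketch of that rule follows; the function and parameter names are assumptions, and the `test_failed` flag stands in for however a failed test run is actually recorded.

```python
def classify_kpqp(average_sec: float, meet_limit: float = 3.0,
                  fail_limit: float = 5.0, test_failed: bool = False) -> str:
    """Map an average response time (seconds) to a KPQP status from the legend."""
    if test_failed:
        return "Test Fail"   # the test itself did not complete
    if average_sec <= meet_limit:
        return "Meet KPQP"   # average <= 3 sec
    if average_sec <= fail_limit:
        return "Miss KPQP"   # 3 sec < average <= 5 sec
    return "Fail KPQP"       # average > 5 sec


print(classify_kpqp(1.1))
print(classify_kpqp(3.3))
print(classify_kpqp(5.1))
```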

Page 7:

<..> Capacity

[Chart: transactions/sec, network traffic (MB/sec), and % CPU usage]

Page 8:

KPQP Status

Data for Medium Load

[Chart legend: Meet KPQP (average <= 3 sec); Miss KPQP (3 sec < average <= 5 sec); Fail KPQP (average > 5 sec); Test Fail]

Page 9:

<..> Response Times

Data for Medium Load

Average of average response time decreased from 1.4 to 1.1 seconds

Average response time within “FI (Debt)” domain increased from 1.9 to 3.3 seconds

Page 10:

<..> General Status

Positive factors:

Capacity per server 3.3 times better than requirement

Stable system behavior during rush-hour tests

RAG status: Amber

RAG status demotion factors:

System capacity is ~20% worse vs. March RR

System scalability is poor

Total open defects increased 2.6 times (from 8 to 21)

Memory leak has not been fixed yet

<..> April KPQP Status (chart)

Legend: Meet KPQP (average <= target); Miss KPQP (target < average <= 2x target); Fail KPQP (average > 2x target); Test Fail

Chart values: 57%, 19%, 23%, 1%
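For this application the legend is relative to a per-transaction target rather than fixed limits, and the chart reports the share of transactions in each status. The sketch below shows how such shares could be computed; the function names and the sample measurements are made up for illustration and are not taken from the test results.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (average response time in seconds, target in seconds, test failed?)
Sample = Tuple[float, float, bool]


def classify_against_target(average: float, target: float, test_failed: bool = False) -> str:
    """Apply the legend: meet <= target, miss <= 2x target, fail above that."""
    if test_failed:
        return "Test Fail"
    if average <= target:
        return "Meet KPQP"
    if average <= 2 * target:
        return "Miss KPQP"
    return "Fail KPQP"


def kpqp_shares(samples: Iterable[Sample]) -> Dict[str, float]:
    """Return the percentage of transactions that fall into each status."""
    counts = Counter(classify_against_target(a, t, f) for a, t, f in samples)
    total = sum(counts.values())
    return {status: 100.0 * n / total for status, n in counts.items()}


# Made-up measurements for five transactions.
samples = [(0.4, 1.0, False), (1.5, 1.0, False), (2.5, 1.0, False),
           (0.9, 2.0, False), (0.0, 1.0, True)]
shares = kpqp_shares(samples)
print(shares)
```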

Page 11:

<..> Capacity

[Chart: transactions/sec, network traffic (KB/sec), and % CPU usage]

Page 12:

<..> KPQPs

Data for Medium Load

[Chart legend: Meet KPQP (average <= target); Miss KPQP (target < average <= 2x target); Fail KPQP (average > 2x target); Test Fail]

Overall “Meet KPQP” share decreased significantly; every request group followed the overall trend

Page 13:

<..> Response Times

Data for Medium Load

Average of average response times increased from 1.22 to 2.40 seconds

Average response times of every request group followed the overall trend

Page 14:

Open <..> Defects & Issues

[Chart legend: strategic defects; re-opened defects; new defects; strategic env. issues; new env. issues]

Total number of open defects and environmental issues increased from 13 to 28

11 re-opened defects: missing KPQP target, mostly <..> and <..>_Select handlers

2 new defects: missing KPQP target for “5PerCentMetadataUpdatedDownload” and “CC.43Fields.5Instruments”

3 new env issues: 2 Performance Center related + 1 related to <..> server instability

Defects: 337022, 337235, 354507, 354529, 354907, 337243, 336856, 337039, 337034, 337038, 337518

Defects: 401248, 401253

Issues: 698, 1441, 1464

Page 15:

<..> General Status

Positive factors:

System was stable under high load, memory leak was not reproduced in long high-load test

Maximum system capacity increased compared to March RR (it is now back to the level of February RR)

RAG status demotion factors:

Disproportionate memory consumption growth leads to system instability when extra load is added after the saturation point

40% of all KPQP transactions do not meet target under medium load

Network traffic volume impact on <..> performance (due to env issue) was detected

Number of open defects and issues increased from 8 to 11

RAG status: Amber

<..> April KPQP Status (chart)

Legend: Meet KPQP (average <= target); Miss KPQP (target < average <= 2x target); Fail KPQP (average > 2x target); Test Fail

Chart values: 60%, 20%, 20%

Page 16:

<..> Capacity

[Chart: transactions/sec, network traffic (MB/sec), and % CPU usage]

Transactions/sec value returned to the level of February RR – issue with <..>-side Search cache was not reproduced

Page 17:

<..> KPQPs

Data for Medium Load

[Chart legend: Meet KPQP (average <= target); Miss KPQP (target < average <= 2x target); Fail KPQP (average > 2x target); Test Fail]

Overall “Meet KPQP” share increased (<..> and News are contributors)

Page 18:

<..> Response Times

Data for Medium Load

Average of average response time decreased from 0.6 to 0.4 seconds

This is primarily because the <..>-side Search cache issue encountered in March RR testing was not reproduced

<..> and News response times are better now as well

Page 19:

Open <..> Defects & Issues

[Chart legend: strategic defects; re-opened defects; new defects; strategic env. issues; new env. issues]

New env issue (#1567): if network traffic volume is increased from 40 Mbit/sec to 100 Mbit/sec, response times are on average two times worse; the bigger the response, the worse the impact

New defect (#401868): non-proportional memory consumption if saturation level of load is exceeded leads to instability

Re-opened defect (#381090): SaveBinaries.512K transaction response time does not meet KPQP

[Chart labels: Issue #1567; Defect #401868; Defect #381090]

Page 20:

<..> General Status

Positive factors:

Slightly better overall performance compared to the previous <..> tests in PPE (higher capacity)

Lower average response time for all VSM URLs compared to the previous test (-11%)

Funds unavailability issue was not reproduced, which led to a 100% KPQP pass status for the "Funds" domain

Good scalability.

RAG status demotion factors:

Environment is unstable

<..> March RR was tested in CP April RR environment

Instability of critical back-ends: <..>WS and DocumentStore_1

Instability of back-ends: DidYouKnowService, mstService, NewsSvc_1, ItemService_1, etc.

RAG status: Amber

<..> KPQP Status (chart)

Legend: Meet KPQP (average <= 3 sec); Miss KPQP (3 sec < average <= 5 sec); Fail KPQP (average > 5 sec); Test Fail

Chart values: 89%, 9%, 2%

Page 21:

<..> Capacity

[Chart: transactions/sec, network traffic (MB/sec), and % CPU usage]

Page 22:

<..> KPQP Status

Data for Medium Load

[Chart legend: Meet KPQP (average <= 3 sec); Miss KPQP (3 sec < average <= 5 sec); Fail KPQP (average > 5 sec); Test Fail]

Overall KPQP pass status increased: Search, <..>, and Funds views and sub-component are the main contributors

Page 23:

<..> Response Times

Data for Medium Load

Page 24:

May RR Expectation List

Application | What we expect

<..> | <..> implements new caching for equity views, so we expect <..> performance to improve, especially for these views.

<..> | We expect the "HTTP Status-Code=500 (Authorization failed: ..." issue (836) to be fixed, to prevent user login failures, especially during rush hour.

<..> | Moving <..> servers to Win2008. We expect performance to be at least the same as before.

<..> | <..> implements new caching for Equity <..> views, so we expect the performance of these requests to improve.

<..> | Memory leak is expected to be fixed.

<..> | Traffic volume will not affect <..> performance, since the network issue (1567) is claimed to be fixed.

<..> | If <..> is upgraded to <..>, we expect a performance improvement. Also, the <..> >> <..> migration should make it possible to test with a wider universe of instruments (<..>), which currently causes <..> to crash.

Page 25:

Links to Test Detail

<application 1>

• Test results reports: <link>

• End of cycle summary: <link>

<application 2>

• Test results reports: <link>

• End of cycle summary: <link>

<application 3>

• Test results reports: <link>

• End of cycle summary: <link>

<application 4>

• Test results reports: <link>

General status: <link>