Fair Benchmarking and Root Cause Analysis in Mobile Networks
Vaclav Raida, Michael Rindler, Philipp Svoboda, Markus Rupp
Institute of Telecommunications, TU Wien

The Network Perspective

Network-Wide Measurements
- Task: fuse network-wide measurements into a network perspective
- The challenge: turn individual benchmarks into a network performance metric. It is not meaningful to compare the tariff-limited data rate of user A with the tariff-unlimited data rate of user B, indoor with outdoor measurements, or 2G with 4G.
- Common grouping criteria:
  - Indoor / outdoor (detected, e.g., from signal strength)
  - Different UE hardware
  - Different mobile network generations
  - Different tariffs (traffic-shaping detection)
  - Repeated / automated measurements (the majority of tests are conducted by a few devices)

Passively Active Measurements: Users perform active measurements, but we cannot choose when, where, and how they will do so.

Experimental Results

Figure 1: The Time-of-Day Effect: One week (starting on Monday) of static data rate (R) measurements. Every dot represents the mean rate of one test. The time-of-day effect is clearly visible: the rate is highest between ca. 0 and 6 AM (lowest cell load, "cell empty") and lowest in the afternoon and evening (many users active).

Figure 2: The Effect of a Tariff Limit: Time series of a single data rate test conducted with a limited tariff (average data rate R(t) in 101 ms bins over t / ms; the plot marks the beginning and end of the level shift from the network-limited to the tariff-limited rate).
The rate is limited by a traffic shaper (e.g., a leaky bucket). If we detect the rate level shift, we can calculate, for example, the bucket depth (burst size) and the token generation rate, and to some accuracy also obtain an estimate of the capacity (the rate without the tariff limitation).

Figure 3: Tariff-Limit Detector: We use the PAR (peak-to-average ratio) metric to quickly identify tariff limitation. Here we see five different scenarios: different UEs and different indoor locations. The location impacts the signal strength, which impacts the data rate; the data rate is further impacted by the cell load (time-of-day effect). The combination of these factors leads to different overlapping clusters in our PAR vs. R scatter plot. A tariff limitation is revealed by a strong vertical line (the data rate is tariff limited; the remaining clusters are network limited). We can apply the same method to crowdsourced data and detect tariff limits by identifying such vertical lines. (One week of RTR-NetTest¹ measurements with CMPT.)

Mobile Network Benchmarking: A Spatial, Dynamic Challenge

Crowdsourced Look at the Network

Operator (Austria)   DL rate     UL rate      Ping    Quantity
A1                   27 Mbit/s    6.3 Mbit/s  25 ms   24 925
T-Mobile             25 Mbit/s    9.6 Mbit/s  29 ms   13 018
Hutchison Drei       23 Mbit/s   11.0 Mbit/s  37 ms   18 586

Table 1: Comparison of three Austrian operators, carried out by RTR¹.

Figure 4: What looks like an area with bad coverage...

Figure 5: ...is actually just a few "outliers."

Problem: We cannot derive a network or user perspective by simply taking the data rate median in some area, because we do not know whether the low-rate results were caused by the location or by a different factor such as a tariff limitation (left column) or a BS handover (right column). We need to understand the spatial properties together with the network perspective.
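The shaper parameters mentioned above can be estimated from a single rate time series once the level shift has been located. A minimal sketch in Python follows; the function name, the mean-based estimators, and the assumption that the shift boundaries are already known are illustrative simplifications, not the exact method used on the poster:

```python
import numpy as np

def estimate_shaper_params(t_ms, rate_mbit, shift_start_ms, shift_end_ms):
    """Estimate token-bucket parameters from a rate time series
    that shows a level shift (network-limited -> tariff-limited)."""
    pre = rate_mbit[t_ms < shift_start_ms]    # burst phase: bucket still full
    post = rate_mbit[t_ms > shift_end_ms]     # shaped phase: bucket empty
    capacity = pre.mean()       # rough capacity estimate (rate without tariff limit)
    token_rate = post.mean()    # shaped rate ~ token generation rate
    # Bucket depth (burst size): extra data delivered above the token rate
    # before the bucket empties at the start of the level shift.
    bucket_depth_mbit = (capacity - token_rate) * (shift_start_ms / 1000.0)
    # Peak-to-average ratio: the detection metric of Fig. 3.
    par = rate_mbit.max() / rate_mbit.mean()
    return {"capacity": capacity, "token_rate": token_rate,
            "bucket_depth_mbit": bucket_depth_mbit, "par": par}
```

A tariff-limited test combines a short high burst with a long flat tail, which pushes its PAR up and pins its mean rate to the shaped level; this is what produces the vertical line of tariff-limited tests in the PAR vs. R scatter plot.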
Ultimate Goal: Fair Benchmarking and Root Cause Analysis
- Task: fair comparison of mobile networks based on crowdsourced benchmark data
- How:
  - Create the network perspective, extract performance metrics (left column)
  - Create the user perspective: active and controlled measurements (right column)
  - Create a network benchmark by fusing the user and network perspectives (Fig. 6)

Figure 6: Benchmarking Example: Two-dimensional histogram of the tests' RSRP values and data rates R for operator A1, based on RTR¹ open data. The green dashed line shows a rate-signal capacity curve, i.e., for a given signal strength the curve tells us the highest achievable rate. The green solid line represents measurements in a reference cell (in cooperation with the operator A1): only one UE in the cell → no impact of cell load; predefined RSRP level → no fading. The orange and magenta dashed lines show the boundaries obtained from the distributions of the other operators (Hutchison Drei, T-Mobile).

Conclusion: One possible benchmarking method is to compare the capacity curves of different networks.

¹ RTR = Austrian Regulatory Authority for Broadcasting and Telecommunications. Source: https://www.netztest.at

References
[1] M. Rindler, P. Svoboda, M. Rupp, "FLARP, Fast Lightweight Available Rate Probing: Benchmarking Mobile Broadband Networks," ICC 2017, Paris, May 2017.
[2] S. Homayouni, V. Raida, P. Svoboda, "CMPT: A Methodology of Comparing Performance Measurement Tools," ICUMT, Lisbon, October 2016.

The User / Application Perspective

Measurements in Mobile Networks and Other Reactive Setups
- Task: conduct active measurements, gain user-perspective ground truth
- Challenges:
  - Mobility (BS handover)
  - Cell load (capacity shared among users)
  - Cross-traffic (the user's capacity is split among multiple apps)
  - Various network changes

Uncontrollable Influences: The filter criteria in the left column can either be extracted directly from the open database, because they are reported by the measurement tool (UE hardware category, mobile network generation), or they can be derived (tariff limitation from the shape of the data rate curve, indoor / outdoor from the signal strength). On the other hand, rate-decreasing factors such as BS handovers, high cell load, or the user's cross-traffic are difficult or even impossible to reconstruct.

FLARP: Spatial Measurement, Short-Term Monitoring of Capacity

Fast Lightweight Available Rate Probing (FLARP) [1] estimates the available data rate in sub-second time, which allows recording at high spatial granularity.

Figure 7: Repeatable local average performance.

Figure 8: Block diagram of the implemented probing system (the client fetches settings and a probe pattern from the server, performs chirp probing, and uploads the results).

CMPT: Framework for Reference Measurements

Figure 9: Generic CMPT probing setup (a CMPT server with a web interface configures UEs running the CMPT app, which measure against test servers and upload the results).

Figure 10: Screenshots of the CMPT Android app.

The Crowdsourcing Mobile Performance Tool (CMPT) [2] is an Android application developed to perform automated performance measurement tasks. It repeatedly executes predefined experiments with randomized parameters and reports the results to a centralized database. This enables continuous monitoring of the ground truth at predefined static places.
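The repeat-with-randomized-parameters loop that CMPT implements can be sketched as follows. The experiment structure, function name, and parameter names are hypothetical; a real deployment would additionally sleep a randomized interval between rounds and upload each result to the central database:

```python
import random

def run_campaign(experiments, rounds=3, seed=None):
    """Execute predefined experiments repeatedly with randomized
    parameters, collecting results for upload to a central database."""
    rng = random.Random(seed)
    results = []
    for round_idx in range(rounds):
        for exp in experiments:
            # Draw one value per parameter from the predefined choices.
            params = {name: rng.choice(values)
                      for name, values in exp["param_choices"].items()}
            results.append({"experiment": exp["name"],
                            "round": round_idx,
                            "params": params,
                            "result": exp["run"](**params)})
    return results

# Usage: a stub download test whose "result" is just proportional
# to the randomly chosen transfer size.
campaign = [{"name": "dl_test",
             "param_choices": {"size_mb": [1, 2, 4]},
             "run": lambda size_mb: size_mb * 8}]
```

Randomizing the parameters and the schedule avoids always probing the network in the same state, which is what makes the repeated tests usable as a ground truth at static places.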
Acknowledgment
This work was supported by the Austrian Research Promotion Agency (FFG) as part of the Bridge Project No. 850742, "Methodical Solution for Cooperative Hybrid Performance Analytics in Mobile Networks (Mc.Hypa-Miner)."

www.nt.tuwien.ac.at
7th TMA PhD School on Traffic Monitoring and Analysis, Dublin, June 19-20, 2017
[email protected]