FFA pre/post impact analysis of service performance
  Compare service performance after the FFA with performance before it
  If the FFA is successfully trialed and shows the expected performance impact, the change can be rolled out network-wide
  The go/no-go decision is crucial
Challenges: external factors can make the assessment difficult
  Example: a configuration change accidentally co-occurs with strong winds that negatively impact service performance
  Result: unnecessary roll-back of the change, made without knowledge of the impact of the strong winds

Service performance in cellular networks is influenced by several external factors
  Weather (heavy rainfall obstructs radio signals)
  Terrain (mountains, flat surfaces, and tall buildings have different propagation properties)
  User population densities and mobility patterns
  Seasonal changes (foliage or leaves budding)
  Traffic pattern changes (holidays, major events, or trade shows)
  Other network events (outages or maintenance activities in other parts of the network)

Compare performance between study group and control group
  Study group – network elements where the change is implemented
  Control group – network elements without the change
Intuition
  Performance at geographically nearby elements is correlated
  An external factor influences performance at both study and control
  A performance-impacting change at the study group will change the dependency between study and control
Challenges
  Unrelated performance changes in a small number of control group members
  Poor selection of the control group
Litmus solution
  Robust spatial regression algorithm
  Domain-knowledge-guided control group selection
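
The intuition above can be illustrated with a small sketch. This is not the Litmus algorithm, which the slides name but do not spell out. It is a simplified stand-in that assumes one least-squares line per control element, fitted on the pre-change window, and a median across control elements so that a few misbehaving control members cannot dominate the estimate. The function names `fit_line` and `robust_impact` and all numbers are hypothetical.

```python
from statistics import mean, median

def fit_line(x, y):
    """Ordinary least squares fit y ~ a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to the mean of y
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def robust_impact(study_pre, study_post, controls_pre, controls_post):
    """Median, over control elements, of the mean post-change gap between
    observed study performance and the forecast made from that control."""
    gaps = []
    for c_pre, c_post in zip(controls_pre, controls_post):
        a, b = fit_line(c_pre, study_pre)          # learn the pre-change dependency
        forecast = [a + b * c for c in c_post]     # expected study values with no change
        gaps.append(mean([s - f for s, f in zip(study_post, forecast)]))
    return median(gaps)

# Hypothetical data: an external factor lowers every series after the change,
# and the third control element also suffers an unrelated outage.
study_pre, study_post = [10, 12, 11, 13], [10, 11]
controls_pre = [[20, 24, 22, 26], [5, 7, 6, 8], [30, 36, 33, 39]]
controls_post = [[16, 18], [3, 4], [12, 15]]
impact = robust_impact(study_pre, study_post, controls_pre, controls_post)
```

With these made-up numbers the per-control gaps are roughly 2, 2, and 6; the median stays near 2, whereas a plain average would be pulled upward by the one bad control member.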

Study-group only analysis
  Mercury [SIGCOMM’10], PRISM [CoNEXT’11], Spectroscope [NSDI’11], …
  Does not account for the impact of unrelated external factors
A/B testing – also known as split testing or control/treatment
  Popular in web domains for data-driven decision making [KDD’07,’12]
  Web users are randomly exposed to the two variants of an experiment
  Why doesn’t it apply in our context?
    Tight coupling between experiment and assessment
    Control group might be subject to other network events such as changes or unplanned outages
Difference in Differences (DiD)
  Compare the mean/median difference between study and control before and after the change
  Why doesn’t it apply in our context?
    Contamination of forecast due to poor selection of the control group
    Sensitivity to performance changes in a small number of control group members
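
As a worked illustration of the DiD baseline described above, the following minimal sketch computes the change in the study group minus the change in the control group, using either the mean or the median as the aggregate. The function name `diff_in_diff` and the sample values are hypothetical, and the control samples are assumed to be pooled across all control elements.

```python
from statistics import mean, median

def diff_in_diff(study_pre, study_post, control_pre, control_post, agg=mean):
    """(study after - study before) - (control after - control before)."""
    study_change = agg(study_post) - agg(study_pre)
    control_change = agg(control_post) - agg(control_pre)
    return study_change - control_change

# Hypothetical metric where higher is better. Both groups degrade after the
# change (an external factor), but the study group degrades less.
study_pre, study_post = [90, 92, 91, 91], [88, 89, 87, 88]
control_pre, control_post = [80, 82, 81, 81], [76, 77, 75, 76]

did_mean = diff_in_diff(study_pre, study_post, control_pre, control_post)
did_median = diff_in_diff(study_pre, study_post, control_pre, control_post, agg=median)
```

Here the study group drops by 3 and the control group by 5, so the DiD estimate is +2 for both aggregates. Because the control samples are pooled into a single aggregate, a few control elements with unrelated changes can shift the estimate, which is the sensitivity noted above.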

Guidelines for control group selection
  Subject to the same external factors as the study group
  Shares similar properties with the study group, such as geographical proximity or configuration
Control group size
  Not too large: difficult to capture similar impact due to an external factor
  Not too small: lose the benefits of robustness in spatial regression analysis
Attributes for selection
  Geographical distance using latitude/longitude and zip code
  Topological structure of the cellular network
  Configuration settings such as software version or equipment model
Predicates to select the control group
  Uni-variate – single attribute (e.g., LTE cell towers within the same zip code)
  Multi-variate – combination of attributes (e.g., UMTS cell towers with the same RNC and the same OS)
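
The uni-variate and multi-variate predicates above can be sketched as plain functions over an inventory of network elements. Everything here is an assumption made for illustration: the `Element` record, its field names, and the `select_control` helper are hypothetical and not part of Litmus as described in the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    """Hypothetical inventory record for one cell tower."""
    name: str
    technology: str   # e.g. "LTE" or "UMTS"
    zip_code: str
    rnc: str          # parent Radio Network Controller
    os_version: str

def same_zip(candidate, study_element):
    """Uni-variate predicate: a single attribute."""
    return candidate.zip_code == study_element.zip_code

def same_rnc_and_os(candidate, study_element):
    """Multi-variate predicate: a combination of attributes."""
    return (candidate.rnc == study_element.rnc
            and candidate.os_version == study_element.os_version)

def select_control(inventory, study, predicate):
    """Elements outside the study group that match at least one study element."""
    study_names = {s.name for s in study}
    return [e for e in inventory
            if e.name not in study_names
            and any(predicate(e, s) for s in study)]

inventory = [
    Element("A", "UMTS", "07001", "rnc1", "v2"),
    Element("B", "UMTS", "07001", "rnc1", "v2"),
    Element("C", "UMTS", "07002", "rnc1", "v1"),
    Element("D", "UMTS", "07001", "rnc2", "v2"),
]
study = [inventory[0]]
by_zip = select_control(inventory, study, same_zip)
by_rnc_os = select_control(inventory, study, same_rnc_and_os)
```

Keeping predicates as small composable functions makes it easy to tighten or loosen the selection when the resulting control group is too large or too small.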

Evaluation conducted using data collected from operational cellular networks
Lack of complete ground truth makes evaluation extremely challenging
Two-step methodology
  A-priori known changes with assessments by Engineering & Ops, previously conducted manually through visual inspection and analysis
  Synthetic injection of changes into performance time-series at cell towers
Compare Litmus with Difference in Differences (DiD) and study-group only analysis
Accuracy computation
Result summary
  Litmus outperformed study-group only analysis because of its robustness to external factors
  Litmus outperformed DiD because of its robustness to a small number of bad members in the control group
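
The synthetic-injection step and the accuracy computation can be sketched as follows. The slides do not say what form the injected changes took or how accuracy was defined, so this sketch assumes a simple level shift from a given index onward and accuracy as the fraction of cases where a method's verdict matches the known label. The names `inject_level_shift` and `accuracy` are hypothetical.

```python
def inject_level_shift(series, start, delta):
    """Return a copy of the series with `delta` added from index `start` onward."""
    return [v + delta if i >= start else v for i, v in enumerate(series)]

def accuracy(truth, verdicts):
    """Fraction of cases where the verdict equals the known label."""
    if len(truth) != len(verdicts):
        raise ValueError("truth and verdicts must have the same length")
    if not truth:
        raise ValueError("at least one case is required")
    return sum(t == v for t, v in zip(truth, verdicts)) / len(truth)

# Hypothetical usage: degrade a flat series halfway through, then score a
# method's verdicts against the labels implied by the injections.
original = [1.0, 1.0, 1.0, 1.0]
degraded = inject_level_shift(original, start=2, delta=-0.5)

truth = ["degradation", "none", "improvement", "none"]
verdicts = ["degradation", "none", "none", "none"]
score = accuracy(truth, verdicts)
```

Because the injected change is known exactly, each synthetic case comes with its own ground truth, which is what the operational data alone cannot provide.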

Litmus operational experiences
  Litmus is being heavily used for FFA impact assessment in production cellular networks
  Pre/post impact analysis across a wide variety of performance metrics
  Outcome is used for a go or no-go decision for wide-scale deployment of the FFA change

| Change type | Location | Impact expectation | Impact assessment by Litmus | External factor | Go/no-go decision |
|---|---|---|---|---|---|
| Reduce start-up times for data sessions | Radio Network Controller (RNC) | No degradation in voice | Degradation in voice | None | |
| Configuration changes | Mobile Switching Center (MSC) – Voice switch | Improvement in voice | No improvement | Foliage | |
| SON load balancing and neighbor discovery | Cell Towers | Improvement in call connection | Improvement | Hurricane Sandy | |
| Improve cell change success rates | Radio Network Controller (RNC) | Improvement in call retention | No improvement | Traffic pattern changes due to holiday | |

Litmus – an automated tool for robust assessment of changes in cellular networks
  Carefully accounts for external factors such as foliage, weather, holidays, or network events
  New spatial regression algorithm for robust performance comparison of study versus control
  Domain-knowledge-guided control group selection
  Outperforms study-group only analysis and Difference in Differences (DiD)
Operational experiences
  Litmus is being used successfully in go/no-go decisions for wide-scale deployment of changes
  Considerably improved assessment accuracy and analysis time
Future work
  Continue to improve the methodology for control group selection
  Apply to other networks and services such as clouds and data centers
  Extend Litmus to device-specific monitoring – e.g., Apple iPhone, Samsung Galaxy, or Nokia Lumia