Master’s Thesis, Mikko Nieminen Espoo, February 14th, 2006 TROUBLESHOOTING IN LIVE WCDMA NETWORKS Supervisor: Professor Heikki Hämmäinen
Master’s Thesis, Mikko Nieminen
Espoo, February 14th, 2006

TROUBLESHOOTING IN LIVE WCDMA NETWORKS

Supervisor: Professor Heikki Hämmäinen


Background to the Study

• The number of live WCDMA networks is growing quickly.

• The first commercial Third Generation Partnership Project (3GPP) compliant network, J-phone, was opened in December 2002.

• By October 2005, there were 80 live commercial WCDMA networks with nearly 40 million subscribers. By that time, around 140 WCDMA licenses had been awarded, and the license holders had more than 500 million subscribers in their Second Generation (2G) networks.

• Especially in Europe and Asia, WCDMA network deployment after successful field trials and service launches has entered a new critical stage: the phase of network optimisation and network troubleshooting.


Research Problem

• As the number of WCDMA subscribers increases quickly, operators and equipment vendors face big challenges in maintaining and troubleshooting their networks.

– How can one efficiently narrow down the root causes of problems when there is a huge number of subscribers and a large volume of traffic in a live WCDMA network?

– What are the principles for examining fault scenarios and breaking the problem investigation down into logical, manageable pieces?

– Which are the tools and methods that are in practice used in WCDMA network troubleshooting today?

• In order to tackle these questions and challenges, this Thesis presents a Framework for KPI-triggered troubleshooting in live WCDMA networks.

• The applicability of the Framework is demonstrated by applying it to a selection of real troubleshooting cases that have occurred in commercial WCDMA networks.


Scope of the Study

• This study concentrates on the KPI-triggered problems in live WCDMA networks.

• In general, faults can be classified into three categories:

– Critical, which are emergency problems that require immediate action,

– Major (referred to in this study as KPI-triggered problems),

– Minor, which do not affect the services of the network.

• The viewpoint is that of the equipment vendor. The main objective is to create guidelines that help troubleshooting experts and technical support personnel of WCDMA network manufacturers to troubleshoot and narrow problems down following a defined logic.

• This Thesis mainly concentrates on WCDMA network troubleshooting from a Radio Access Network perspective. The reasoning behind this approach is that the UTRAN covers most of the WCDMA specific functionality and intelligence, and therefore also brings the majority of the troubleshooting challenges.


Research Methods

• This Thesis is mainly based on the study of various technical specifications and interviews of WCDMA network troubleshooting experts.

• The main literature sources are the 3GPP specifications of release 99, since the majority of the live WCDMA networks were based on 3GPP release 99 during the writing of this Thesis.

• 3GPP release 4 networks are currently gaining a foothold among live WCDMA networks. However, there are only minor differences in the Radio Access functionality of these two 3GPP specification releases.


Structure of the Thesis

• Introduction to WCDMA Networks

• UTRAN Protocols

• Call Trace Analysis

• Key Performance Indicators

• Framework for KPI-Triggered Troubleshooting

• Cases from Live WCDMA Networks


WCDMA network architecture

[Figure: WCDMA network architecture. UE (ME and USIM); UTRAN (RNCs, each controlling Node Bs that serve several cells); Core Network (MSC/VLR, GMSC, SGSN, GGSN, HLR, AuC, EIR); external networks (PSTN, Internet).]


UTRAN architecture

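[Figure: UTRAN architecture. The User Equipment (UE) connects to the Node Bs over Uu; the Node Bs connect to an RNC over Iub; the RNCs are interconnected over Iur; the RNCs connect to the Core Network (CN) over Iu-CS (3G MSC) and Iu-PS (SGSN).]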


UMTS Bearer Services

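[Figure: UMTS bearer services between UE, RAN, and CN over the Uu and Iu interfaces, showing the Access Stratum and Non-Access Stratum and the RRC SAPs. The Radio Access Bearer consists of the radio bearer service and the Iu bearer service. The signalling connection consists of the RRC connection and the Iu connection.]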


Summary of Protocols (CS user plane)

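[Figure: CS user plane protocol stacks across UE, Node B, RNC, and MSC over the Uu, Iub, and Iu interfaces. Layers are listed from top to bottom.]

UE: CS application and coding, RLC, MAC, WCDMA L1

Node B (towards Uu): WCDMA L1

Node B (towards Iub): FP, AAL2, ATM, PDH/SDH

RNC (towards Iub): RLC, MAC, FP, AAL2, ATM, PDH/SDH

RNC (towards Iu): Iu-UP protocol, AAL2, ATM, PDH/SDH

MSC: CS application and coding, Iu-UP protocol, AAL2, ATM, PDH/SDH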


Summary of Protocols (UE control plane)

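[Figure: UE control plane protocol stacks across UE, Node B, RNC, and CN over the Uu, Iub, and Iu interfaces. Layers are listed from top to bottom.]

UE: NAS, RRC, RLC, MAC, WCDMA L1

Node B (towards Uu): WCDMA L1

Node B (towards Iub): FP, AAL2, ATM, PDH/SDH

RNC (towards Iub): RRC, RLC, MAC, FP, AAL2, ATM, PDH/SDH

RNC (towards Iu): RANAP, SCCP, MTP3b, SSCF-NNI, SSCOP, AAL5, ATM, PDH/SDH

CN: NAS, RANAP, SCCP, MTP3b, SSCF-NNI, SSCOP, AAL5, ATM, PDH/SDH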


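Overview of WCDMA Call Setup

[Figure: call setup phases for MO and MT calls. An MT call starts with Paging. Both call types then proceed through RRC Connection Establishment, Radio Access Bearer Establishment, and User Plane Data Flow.]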


RRC connection establishment (DCH)

[Figure: message sequence between UE, Node B, and RNC.]

1. RRC CONNECTION REQUEST (RRC, UE to RNC)

2. Admission Control (RNC)

3. RADIO LINK SETUP REQUEST (C-NBAP, RNC to Node B)

4. Start RX (Node B)

5. RADIO LINK SETUP RESPONSE (C-NBAP, Node B to RNC)

6. ESTABLISH REQUEST (ALCAP, RNC to Node B)

7. ESTABLISH CONFIRM (ALCAP, Node B to RNC)

8. UPLINK & DOWNLINK SYNC (FP, between RNC and Node B)

9. Start TX (Node B)

10. RRC CONNECTION SETUP (RRC, RNC to UE)

11. L1 SYNCH (UE and Node B)

12. RL RESTORE INDICATION (D-NBAP, Node B to RNC)

13. RRC CONNECTION SETUP COMPLETE (RRC, UE to RNC)
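
When a call trace is analysed, one basic check is to find where the establishment sequence stops. The following is a minimal sketch of that check, not the method used in the Thesis. It assumes the trace has already been decoded into a list of message names, and it only tracks the RRC, NBAP, and ALCAP messages of the sequence (the internal steps and the FP synchronisation are left out). The function name and the list are hypothetical.

```python
from typing import Iterable, Optional

# Signalling messages of the RRC connection establishment (DCH), in order.
# Internal steps (Admission Control, Start RX/TX, L1 SYNCH) and the FP
# synchronisation are omitted; this selection is an assumption of the sketch.
EXPECTED_MESSAGES = [
    "RRC CONNECTION REQUEST",
    "RADIO LINK SETUP REQUEST",
    "RADIO LINK SETUP RESPONSE",
    "ESTABLISH REQUEST",
    "ESTABLISH CONFIRM",
    "RRC CONNECTION SETUP",
    "RL RESTORE INDICATION",
    "RRC CONNECTION SETUP COMPLETE",
]


def first_missing_message(trace: Iterable[str]) -> Optional[str]:
    """Return the first expected message not found in order, or None."""
    remaining = iter(trace)
    for expected in EXPECTED_MESSAGES:
        # Consume the trace until the expected message is seen.
        # Unrelated messages in between are skipped.
        if not any(message == expected for message in remaining):
            return expected
    return None


# Hypothetical trace that stops after RRC CONNECTION SETUP was sent.
incomplete_trace = EXPECTED_MESSAGES[:6]
print(first_missing_message(incomplete_trace))
```

A trace that ends like this one points the investigation at the air interface synchronisation, since the Node B never reported the radio link as restored.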


Protocol Analysers

Company         Product                      Home Country
Nethawk [47]    3G Analyser                  Finland
Agilent [48]    Signaling Analyzer           United States
Tektronix [49]  K15                          United States
Radcom [50]     Performer Analyser           Israel
Acterna [51]    Telecom Protocol Analyzer    United States


RRC Connection Events and KPIs

[Figure: RRC connection events between UE, RNC, and CN. The Setup phase runs from Event 1 to Event 2, the Access phase from Event 2 to Event 3, and the Active phase from Event 3 to Event 4.]

Event 1: RRC CONNECTION REQUEST, RRC_CONN_ATT_EST incremented

Event 2: RRC CONNECTION SETUP, RRC_CONN_ATT_COMP incremented

Event 3: RRC CONNECTION SETUP COMPLETE, RRC_CONN_ACC_COMP incremented

Event 4: IU RELEASE COMMAND, RRC_CONN_ACT_COMP incremented

RRC Setup Complete Rate = (Sum of RRC_CONN_STP_COMP / Sum of RRC_CONN_STP_ATT) x 100 %

RRC Establishment Complete Rate = (Sum of RRC_CONN_ACC_COMP / Sum of RRC_CONN_STP_ATT) x 100 %

RRC Retainability Rate = (Sum of RRC_CONN_ACT_COMP / Sum of RRC_CONN_ACC_COMP) x 100 %
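
The three RRC rates are plain ratios of counter sums. The following is a minimal sketch of the arithmetic. It assumes the counters have already been summed over the measurement period, and it takes retainability as active-phase completions over access-phase completions, mirroring the RAB Retainability Rate. The function names and the example values are hypothetical.

```python
from typing import Mapping, Optional


def percentage(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator as a percentage, or None if undefined."""
    if denominator == 0:
        return None  # No attempts in the period, so the rate is undefined.
    return 100.0 * numerator / denominator


def rrc_kpis(counters: Mapping[str, int]) -> dict:
    """Compute the three RRC rates from summed counter values."""
    return {
        "setup_complete_rate": percentage(
            counters["RRC_CONN_STP_COMP"], counters["RRC_CONN_STP_ATT"]),
        "establishment_complete_rate": percentage(
            counters["RRC_CONN_ACC_COMP"], counters["RRC_CONN_STP_ATT"]),
        "retainability_rate": percentage(
            counters["RRC_CONN_ACT_COMP"], counters["RRC_CONN_ACC_COMP"]),
    }


# Hypothetical counter sums for one measurement period.
example = {
    "RRC_CONN_STP_ATT": 1000,
    "RRC_CONN_STP_COMP": 990,
    "RRC_CONN_ACC_COMP": 980,
    "RRC_CONN_ACT_COMP": 970,
}
for name, value in rrc_kpis(example).items():
    print(f"{name}: {value:.2f} %")
```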


RRC connection Phases

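[Figure: RRC connection phases. Attempts pass through the Setup, Access, and Active phases. Setup phase: Setup Complete, or Setup Failures and Blocking. Access phase: Access Complete, or Access Failures. Active phase: Active Complete (Active Release), or Active Failures (RRC Drop).]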


Other WCDMA network KPIs

RAB Setup Complete Rate = (Sum of RAB_STP_COMP / Sum of RAB_STP_ATT) x 100 %

RAB Establishment Complete Rate = (Sum of RAB_ACC_COMP / Sum of RAB_STP_ATT) x 100 %

RAB Retainability Rate = (Sum of RAB_ACT_COMP / Sum of RAB_ACC_COMP) x 100 %

CSSR = (Sum of RAB_ACC_COMP / Sum of RRC_CONN_STP_ATT) x 100 %

CCSR = (Sum of RAB_ACT_COMP / Sum of RRC_CONN_STP_ATT) x 100 %
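
Each formula works on sums of counters, so per-cell values have to be added up before the ratio is taken. The following is a minimal sketch of that aggregation and of the RAB, CSSR, and CCSR ratios. The per-cell data layout, the function names, and the example values are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def sum_counters(per_cell: Iterable[Mapping[str, int]]) -> Counter:
    """Add up the counters of all cells, name by name."""
    total: Counter = Counter()
    for cell_counters in per_cell:
        total.update(cell_counters)  # Counter.update adds the values.
    return total


def percentage(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator as a percentage, or None if undefined."""
    return None if denominator == 0 else 100.0 * numerator / denominator


def call_kpis(total: Mapping[str, int]) -> dict:
    """Compute the RAB rates, CSSR, and CCSR from summed counters."""
    return {
        "rab_setup_complete_rate": percentage(
            total["RAB_STP_COMP"], total["RAB_STP_ATT"]),
        "rab_establishment_complete_rate": percentage(
            total["RAB_ACC_COMP"], total["RAB_STP_ATT"]),
        "rab_retainability_rate": percentage(
            total["RAB_ACT_COMP"], total["RAB_ACC_COMP"]),
        "cssr": percentage(total["RAB_ACC_COMP"], total["RRC_CONN_STP_ATT"]),
        "ccsr": percentage(total["RAB_ACT_COMP"], total["RRC_CONN_STP_ATT"]),
    }


# Hypothetical counters for two cells under one RNC.
cells = [
    {"RRC_CONN_STP_ATT": 600, "RAB_STP_ATT": 500, "RAB_STP_COMP": 495,
     "RAB_ACC_COMP": 490, "RAB_ACT_COMP": 480},
    {"RRC_CONN_STP_ATT": 400, "RAB_STP_ATT": 300, "RAB_STP_COMP": 297,
     "RAB_ACC_COMP": 294, "RAB_ACT_COMP": 288},
]
for name, value in call_kpis(sum_counters(cells)).items():
    print(f"{name}: {value:.2f} %")
```

Summing first and dividing once weights each cell by its traffic. Averaging per-cell percentages would give a low-traffic cell the same weight as a busy one.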


Fault Classification

A-CRITICAL

Description: Total or major outages that are not avoidable with a workaround solution.

Critical (emergency duty contacted) problems severely affect service, capacity/traffic, billing, and maintenance capabilities and require immediate corrective action, regardless of time of day or day of the week as viewed by the operator.

Examples:

• System restart, all links down

• Simultaneous restarts of active computer units

• More than 50 per cent of traffic handling capacity out of use

• Subscriber related network element functionality is not working

B-MAJOR

Description: The problem leads to degradation of network performance or the fault affects traffic randomly.

Major problems cause conditions that seriously affect system performance, operation, maintenance, and administration and require immediate attention as viewed by the operator. The urgency is less than in critical situations because of a lesser immediate or impending effect on system performance, customers, and the customer’s operation and revenue.

Examples:

• Capacity/quality related functionality is not working as intended

• Problems seriously affecting end user service, but avoidable with a workaround solution

• Configuration changes (network, HW, and SW) are not working as intended

• Subscriber related functions are not working completely

• Performance measurement, alarm management or activation of a new feature fails

• Single restart of computer units

C-MINOR

Description: Minor fault not affecting operation or service quality.

Other problems that the operator does not view as critical or major are considered minor. Minor problems do not significantly impair the functioning of the system or affect the service to customers. These problems are tolerable during system use.

Examples:

• Failures not seriously affecting traffic

• Errors in operating command syntax

• Cosmetic errors in operational commands or statistics output

• Minor errors in documentation
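
The three classes can be read as a short decision rule. The following is a deliberately simplified sketch of that rule, reduced to three yes/no inputs. The inputs and the function name are assumptions made for illustration, and real classification depends on how the operator views the problem.

```python
from enum import Enum


class FaultClass(Enum):
    A_CRITICAL = "A-CRITICAL"
    B_MAJOR = "B-MAJOR"
    C_MINOR = "C-MINOR"


def classify_fault(service_outage: bool,
                   workaround_available: bool,
                   degrades_performance: bool) -> FaultClass:
    """Map three simplified observations to a fault class."""
    # Total or major outage that cannot be avoided with a workaround.
    if service_outage and not workaround_available:
        return FaultClass.A_CRITICAL
    # Serious service problem avoidable with a workaround, or degraded
    # network performance: the KPI-triggered class.
    if service_outage or degrades_performance:
        return FaultClass.B_MAJOR
    # Everything else is tolerable during system use.
    return FaultClass.C_MINOR


# A retainability drop with no outage is handled as a B-MAJOR fault.
print(classify_fault(service_outage=False,
                     workaround_available=False,
                     degrades_performance=True).value)
```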


Framework for KPI-Triggered Troubleshooting

• The Framework is designed for investigating and solving B-MAJOR level, i.e. “KPI-triggered”, faults.

• Before applying the Framework:

– The general alarm status of the network has been checked. No clear network alarms pointing to the root cause of the fault can be detected.

– Traces from the external interfaces of the RNC have been taken with a protocol analyser in order to record the fault scenario. An RNC internal trace has also been taken when the fault took place.

– The basic fault scenario has been analysed and clarified.


[Figure: Framework flowchart for KPI-triggered troubleshooting, with nodes labelled A to R and Yes/No branches between them. The contents of the nodes are listed below.]

Decision points:

• Is the problem new in the operator network?

• New SW, HW, parameters, UE model or feature introduced?

• Perform simulation of the fault with reference conditions. Does the fault still occur?

• Has average network load increased significantly and/or does the problem occur at a specific time of day?

• Is the fault operator specific?

• Perform simulation of the fault in test bed. Does the fault still occur?

Actions:

• Analyse the traces. Investigate fault scope.

• Analyse and investigate the differences between the working and faulty conditions.

• Use RNC Performance Tester to generate load in test bed and perform analysis.

• In case of MVI environment, check IOT results and contact foreign vendor. Investigate own vendor’s default parameters and compare implementation against 3GPP specifications. Compare own default parameters with the default parameters of other vendors. Execute air interface protocol analysis and drive tests.

• Analyse network element and interface specific alarms, parameters, capacity, logs and traces. Take specific actions depending on problem scope (refer to detailed Framework notes).

Fault scopes: Transmission specific, Node B specific, Service specific, RNC specific, CN specific, Country specific, UE specific.


Case: Increased AMR call drop rate

• A decrease in the RAB Retainability Rate KPI for the AMR telephony service had been experienced over the previous three months in an operator network.

• The decrease was around 2% on each RNC compared to the time when the network was performing well.

• Actions that had already been taken with no positive effect:

– Soft reset of all Node Bs and all RNCs

– Hard reset and re-commissioning of Node Bs

– Alarms checked and no major alarms found
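
A KPI-triggered fault of this kind shows up as a drop against a baseline period. The following is a minimal sketch of such a comparison per RNC. It assumes the "around 2%" is read as percentage points of the RAB Retainability Rate. The threshold, the function name, the RNC names, and all values are hypothetical.

```python
from typing import Dict, List


def degraded_rncs(baseline: Dict[str, float],
                  current: Dict[str, float],
                  min_drop: float = 2.0) -> List[str]:
    """List RNCs whose KPI fell by at least min_drop percentage points."""
    return sorted(
        rnc for rnc, baseline_rate in baseline.items()
        # Skip RNCs without a current value instead of guessing one.
        if rnc in current and baseline_rate - current[rnc] >= min_drop
    )


# Hypothetical RAB Retainability Rates (per cent) for the AMR service.
baseline_rates = {"RNC1": 99.2, "RNC2": 99.0, "RNC3": 98.9}
current_rates = {"RNC1": 97.1, "RNC2": 98.8, "RNC3": 96.5}
print(degraded_rncs(baseline_rates, current_rates))
```

A drop that appears on every RNC at once, as in this case, suggests a network-wide cause rather than a single faulty element.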


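Case: Increased AMR call drop rate

[Figure: the path taken through the Framework flowchart in this case, covering nodes A, C, G, and E in four steps (I to IV). The answers shown along the path are Yes, Yes, and No.]

The nodes on the path:

• Is the problem new in the operator network?

• New SW, HW, parameters, UE model or feature introduced?

• Perform simulation of the fault in reference conditions. Does the fault still occur?

• Analyse and investigate the differences between the working and faulty conditions.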


Case: Increased AMR call drop rate

• Solution

– The short-term solution was to change the parameter for planned maximum downlink transmission power of all the Node Bs in the operator network to the default value of 34 dBm. After this change, the problem disappeared in the operator network.

– The long-term solution was to include a fix for the bug in the next software release of the Node B.


Results

• As a result of thorough research conducted for this Thesis, a Framework for KPI-triggered troubleshooting for live WCDMA networks was developed.

• The Framework is mainly targeted at WCDMA network equipment vendors, to help them solve major service-affecting faults occurring in today’s live WCDMA networks.

• Troubleshooting cases from live WCDMA networks were solved using the Framework developed, in order to verify the results and test the applicability and practicality of the Framework.


Assessment of the results

• The applicability and relevance of the troubleshooting Framework were tested against three different fault cases from live WCDMA networks.

• The results were fairly promising since all the cases were successfully solved by utilising the Framework. The Framework was found to be quite practical and suitable for solving KPI-triggered problems in live WCDMA networks.

• However, the Framework was tested with a limited number of cases because of time and resource limitations. More extensive testing and verification with a large number of cases could reveal possible optimisations and improvements to the Framework.

• Still, the basic logic of the Framework held in all the tested cases. The results presented in this study can easily be tested in the future against a larger number of cases in order to verify them with better statistical reliability.


Exploitation of the results

• The results of this study will be used as source material for developing UTRAN troubleshooting competence and creating advanced learning solutions, targeted at troubleshooting experts and customer support engineers of one of the leading WCDMA network equipment vendors.

• Also, the results of the Thesis will be used as input for creating customer documentation for UTRAN troubleshooting.

• There is also an intention to further test the relevance and reliability of the results of this Thesis by applying them in the 24/7 RAN technical support operator service of the equipment vendor in question.


Future Research

• The significance of Performance Indicator based troubleshooting is increasing continuously in live WCDMA networks.

• Once the PI and KPI specifications become more mature, more extensive study of the most relevant Performance Indicators used in WCDMA network troubleshooting is essential.

• Also, there is a need to develop a Framework and logic for solving emergency problems in WCDMA networks.

• As the complexity of telecommunication networks grows, effective and efficient troubleshooting procedures are essential in order to manage the diversity of network technologies and the increasing quality requirements of the operators.