Near Npo Methodology w3mr1ed2d 20080725

Apr 14, 2018

Usman Siddiqui
  • 7/29/2019 Near Npo Methodology w3mr1ed2d 20080725


    QoS Delivery/Methodology

    Network Monitoring (October 2008)

    Nicolas Palumbo


    All Rights Reserved Alcatel-Lucent 2006, #####2 | Presentation Title | Month 2006

    1. QoS Reports Delivery

    2. Investigation Reports delivery

    3. Investigation methodology

    4. Migration Reports/templates delivery

    5. Migration methodology

    6. QoS Alerter

    7. Investigation of problems

    7.1 Bad Attach Setup Success Rate

    7.2 WAC Abnormal Release/Handover exec fail

    7.3 Handover Preparation failure

    7.4 CPU Max = 100% on WAC


    QoS Reports Delivery


    All the reports presented in this document can be imported into W3MR1ed2D. The files to import are delivered at the following link: http://aww.quickplace.alcatel.com/QuickPlace/mnd_pcs-psf/PageLibraryC12570D0006048F2.nsf/h_Index/34298E19D672E336C12574C20054A478/?OpenDocument&Form=h_PageUI

    The QoS report delivery is in part III).

    The migration delivery is in part III bis).

    Four reports are delivered to follow the quality of service per WAC; they are available on the commercial network.

    General remark: when a report shows 2 scales, the left scale applies to the columns and the right scale applies to the lines.

    Rem: these reports are an update of the reports made for W3MR1ed2b, so the report names are kept even though the new release is W3MR1ed2D.


    NP_Mono_W3MR1ed2b_1 gives the global behavior of the WAC:
    - Complete Attach Setup procedure (applicable at BS level also):
        - Ranging procedure
        - Attach Setup success rate
        - Attach Setup duration
    - Sessions (applicable at WAC level only):
        - Maximum number of simultaneous sessions
        - Average duration of sessions
    - Release causes (applicable at BS level also):
        - Done by BS
        - Done by WAC
    - Data traffic (IP User Plane):
        - Data traffic between BS and WAC (applicable at BS level also)
        - Data traffic between WAC and CN (applicable at WAC level only)
    - WAC capacities used (applicable at WAC level only):
        - CPU (max and average)
        - RAM (max and average)


    NP_Mono_traffic_and_radio for global BS behavior (applicable at BS level also):
    - Downlink:
        - Number of slots used per modulation type
        - Number of bytes sent per modulation type
        - CINR distribution and percentage of CINR < 15 dB
        - RSSI distribution and percentage of RSSI < -85 dBm
    - Uplink:
        - Number of slots used per modulation type
        - Number of bytes sent per modulation type
        - Tx Power
        - CINR distribution and percentage of CINR < 15 dB
        - RSSI distribution and percentage of RSSI < -115 dBm


    Investigation Reports delivery


    Three reports are delivered to establish a diagnosis on attach setup procedure failures or abnormal releases, and to decide on the actions to take.

    NP_Mono_attach_setup_WAC, available at WAC level only, shows the failure cases for each step of the attach setup procedure:
    - Bad rate of ranging req compared to ranging CDMA (available at BS level also)
    - T9 expiration: no SBC req after reception of ranging req (available at BS level also)
    - Authentication failure (available at BS level also)
    - PKM failure (available at BS level also)
    - T17 expiration: no REG RSP sent; the CPE failed in the attach procedure but no release was done before the T17 expiration (available at BS level also)
    - RAC rej: rejection by Radio Admission Control. It means a BS has reached its maximum capacity in terms of Service Flows or Service Flows & Bandwidth (depending on the OMC configuration) (available at BS level also)
    - %Attach BS fail: shows whether the problem occurs before the attachment to the BS and so is related to the BS (available at BS level also)
    - %Fail after Attach to the BS: shows whether the problem occurs after the attachment to the BS and so could be a DHCP, MIP or Diameter Relay problem
    - Avg duration: if too high, it can be due to a long DHCP procedure (completely transparent to the WAC) (available at BS level also)
    - %DHCP or MIP fail
    - All the release causes (available at BS level also)


    NP_attach_setup: warning report available at BS level.

    This warning report shows the worst cells for the following cases:
    - Bad rate of ranging req compared to ranging CDMA
    - Connection setup succ rate
    - Highest average duration
    - T9 rate
    - Authentication failure rate
    - PKM failure rate
    - T17 rate
    - RAC rej rate
    - Release cause other failure rate
    - Release cause OVERLOAD rate
    - Attach failure rate with all the causes displayed
    - Attach failure rate before connection to the BS

    NP_Mono_attach_setup: same report as NP_attach_setup, but used to follow one BS over time

    NP_Mono_HO_with_fail: report to investigate the Handover problems


    Investigation methodology

    The goal is to start from a global overview and go deeper in the analysis, in order to:
    - check the root cause of the problem
    - identify the entity concerned by the problem
    - take an action to solve/clarify the problem

    The reports delivered concern the attach setup procedure.

    - Run the report NP_Mono_W3MR1ed2b_1 with periodicity day.
    - If the attach setup procedure performs badly, run the report NP_Mono_attach_setup_WAC with periodicity day for the concerned day. This determines whether the problem is before or after the attachment to the BS. If before, the view shows the reason of the failure.
    - If the problem is before the attachment to the BS, run the warning report NP_attach_setup with periodicity day, giving the day corresponding to the problem. Check, for the concerned view, which cell is the worst. Make sure the concerned cell has enough samples (e.g. 100% failure with 2 requests and 0 success is not meaningful).
    - Run the mono report NP_Mono_attach_setup for the concerned day with periodicity 1/4. This shows whether the problem is occasional or spread over the whole day.
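    The "enough samples" check on the worst cell can be sketched outside NPO, for example on exported counters. This is only an illustration: the CellStats record, the function name and the min_attempts threshold of 100 are assumptions, not part of the delivered reports.

```python
from dataclasses import dataclass


@dataclass
class CellStats:
    name: str
    attempts: int   # attach setup requests counted for the day
    failures: int   # attach setup failures counted for the day


def worst_cells(cells, min_attempts=100, top=5):
    """Rank cells by failure rate, ignoring cells with too few attempts.

    min_attempts is a hypothetical threshold; choose it per network.
    """
    floor = max(min_attempts, 1)  # also protects against division by zero
    significant = [c for c in cells if c.attempts >= floor]
    ranked = sorted(significant, key=lambda c: c.failures / c.attempts, reverse=True)
    return [(c.name, round(100.0 * c.failures / c.attempts, 1)) for c in ranked[:top]]


if __name__ == "__main__":
    sample = [
        CellStats("Cell_A", attempts=2, failures=2),       # 100% failure, but only 2 requests
        CellStats("Cell_B", attempts=5000, failures=4500),
        CellStats("Cell_C", attempts=800, failures=40),
    ]
    print(worst_cells(sample))
```

    With such a filter, a cell showing 100% failure on 2 requests is left out of the ranking instead of appearing as the worst cell.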


    - Re-run the report NP_Mono_attach_setup for the previous days and the concerned day with periodicity 1/4 to check whether the problem is periodic or episodic.
    - On NPO, for the concerned view, right-click and select Properties. In some views, I added comments in the description field that point to specific items to check first.
    - If this is not enough, the analysis above tells when and on which BS/WAC to start the trace.

    FOR HANDOVER

    - Run the report NP_Mono_W3MR1ed2b_2 with periodicity day.
    - Run NP_Mono_HO_with_fail to identify the different failure cases.
    - In case of handover preparation failure for the day concerned:
        - select all the cells of the WAC and drag and drop the indicators HO_WAC_prep_req, _NP_BS_prep_fail, _NP_BS_prep_succ
        - check whether the problem is located on one or a few cells or spread over all the cells


    In case of handover execution failure for the day concerned:
    - select all the cells of the WAC and drag and drop the indicators HO_WAC_intraWAC_exec_req_sBS, HO_WAC_intraWAC_succ_sBS, NP_HO_WAC_intraW_sBS_fail
    - check whether the problem is concentrated on one site or spread over all the sites
    - if the problem is on one site (for example), select the object W-adjacency, select all the adjacencies related to this site and drag and drop the indicators HO_WAC_intraWAC_exec_req_sBS, HO_WAC_intraWAC_succ_sBS, NP_HO_WAC_intraW_sBS_fail; this gives the list of the worst adjacencies
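    The "one site or spread over all the sites" question can also be answered numerically on exported indicator values. The sketch below is an assumption-laden illustration: the row layout (site, execution requests, execution failures), the function names and the 50% concentration threshold are all hypothetical.

```python
from collections import defaultdict


def failure_share_by_site(rows):
    """rows: iterable of (site, exec_req, exec_fail), one row per cell.

    Returns {site: share of all handover execution failures}.
    """
    per_site = defaultdict(int)
    for site, _exec_req, exec_fail in rows:
        per_site[site] += exec_fail
    total = sum(per_site.values())
    if total == 0:
        return {}
    return {site: fails / total for site, fails in per_site.items()}


def is_concentrated(shares, threshold=0.5):
    """True when a single site carries at least `threshold` of all failures.

    The 0.5 default is a hypothetical value, not an NPO setting.
    """
    return any(share >= threshold for share in shares.values())
```

    If is_concentrated returns True, the next step of the slide applies: move to the W-adjacency object for that site.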


    Migration Reports/templates delivery


    Delivery of 5 reports for migration:
    - Migration_WAC: mono-report available at WAC level, to run before OMC migration
    - Migration_BS: multi-report available at BS level only, to run before OMC migration
    - Migration_WAC: to run after WAC migration
    - T_Mono_Evolution_BS: to run after BS migration for one particular BS in case of doubt
    - T_Multi_Migration_BS: to run on all the BS after migration


    Migration methodology


    The migration is done in 2 parts:
    - OMC migration
    - WAC/BS migration

    WAC/BS migration is usually done 2 or 3 days after OMC migration.

    During the OMC migration there is a risk of losing the indicator database partially or completely. Recovering the database takes time, and during this period there is no reference.

    So before OMC migration:
    - Export the customer dictionary for reports/indicators
    - Run the mono-report Migration_WAC for each WAC:
        - with periodicity week, for the 10 previous weeks
        - with periodicity day, for the 21 previous days
        - with periodicity hour, for the 7 previous days
    - Run the multi-report Migration_BS for all the BS:
        - with periodicity day, for the 7 previous days
    - Save the pm_xml files regularly before the OMC migration. As the pm_xml files are kept for 5 days only, and knowing the OMC migration date, the backup must be done regularly (at least during the 10 days before the migration). These files contain all the counters, so information missing from the reports can be recovered from them.
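    The backup cadence described above (files kept 5 days, backups during at least the 10 days before the migration) can be turned into a list of backup days. This is a minimal sketch: the function name, its parameters and the migration date used in the example are hypothetical.

```python
from datetime import date, timedelta


def backup_dates(migration_day, lead_days=10, retention_days=5, every_days=1):
    """List the days on which pm_xml files should be saved before the migration.

    The interval between two backups must not exceed the retention period,
    otherwise some files would be purged before being saved.
    """
    if every_days < 1 or every_days > retention_days:
        raise ValueError("backup interval must be between 1 day and the retention period")
    current = migration_day - timedelta(days=lead_days)
    last = migration_day - timedelta(days=1)
    days = []
    while current <= last:
        days.append(current)
        current += timedelta(days=every_days)
    if days and days[-1] != last:
        days.append(last)  # always take a final backup on the eve of the migration
    return days


if __name__ == "__main__":
    # Hypothetical migration date, used only to illustrate the call.
    for day in backup_dates(date(2008, 7, 25)):
        print(day.isoformat())
```

    A daily backup is the simplest choice; any interval up to the retention period avoids gaps, provided a last backup is taken just before the migration.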


    After OMC migration:
    - Check that NPO is still running
        - if it stays KO for a long time, work with the pm_xml files
    - Check that the customer views/reports/indicators are still present
        - if they are no longer present, re-import them
    - Run the mono and multi reports (used before migration) for the different periodicities and with the same dates to check that they are equivalent
        - if the database is corrupted, the comparison is done with the reports produced before the OMC migration

    Before the WAC/BS migration:
    - Save all the OMC parameters
    - If the OMC migration succeeded, the day before the WAC/BS migration redo the same operations as before the OMC migration:
        - for the mono-report
        - for the multi-report, completed with the missing days


    After the WAC/BS migration:

    First day, reports to run each hour:
    - Mono-report Migration_WAC starting from the previous complete days with periodicity hour (if possible with periodicity 1/4, but this is only meaningful if the Granularity Period is 15 min). To have a good reference, compare the previous hour with the same hour of the previous day (and if possible with the same hour and same day of the previous week).
    - Multi-report T_Multi_Migration_BS with periodicity hour for the current day, to check that all the BS are running and generating traffic (attach setup attempts/successes).
    - Results to send as soon as a BS is not operational or an important degradation is seen.
    - Results of the previous hour for the migrated cells, taking off the BS that are bad compared to before migration (for any reason), to give the list of the bad BS (rem: this can be done by creating a cell zone).
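    The hourly comparison (previous hour against the same hour of the previous day and of the previous week) can be sketched as follows. The function names and the 20% degradation threshold are assumptions for illustration; the source does not define a numeric threshold.

```python
from datetime import datetime, timedelta


def reference_hours(now):
    """Return the start of the last complete hour and of its two reference hours."""
    current_hour_start = now.replace(minute=0, second=0, microsecond=0)
    previous_hour = current_hour_start - timedelta(hours=1)
    return {
        "previous_hour": previous_hour,
        "same_hour_previous_day": previous_hour - timedelta(days=1),
        "same_hour_previous_week": previous_hour - timedelta(weeks=1),
    }


def is_degraded(value, reference, max_relative_drop=0.2):
    """True when `value` dropped by more than `max_relative_drop` versus `reference`.

    max_relative_drop=0.2 is a hypothetical threshold.
    """
    if reference <= 0:
        return False  # no usable reference, nothing to compare against
    return (reference - value) / reference > max_relative_drop
```

    For example, a KPI measured for previous_hour can be tested with is_degraded against the values read for the two reference hours.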


    Second day:
    - Same as first day, with regular checks if no problem has been seen before
    - Compare the previous day with the same day of the previous week: they should be similar
    - Results of the previous days for all the BS
    - Results of the previous days taking off the BS that are bad compared to before migration (for any reason), to give the list of the bad BS (rem: this can be done by creating a cell zone)

    Other days:
    - Regular check for the current day
    - Results of the previous days for all the BS
    - Results of the previous days taking off the bad BS
    - Follow-up of the bad BS if any action has been done

    First week:
    - Comparison of the previous week with the week before
    - QoS follow-up


    QoS Alerter


    Regarding QoS alerters, please check out the dedicated KTS session and the respective slides on the Sleeping, Dead & Lazy cell topic: KTS QoS Alerter link

    Sleeping cell:
    - Description: BS-WAC link down
    - Impact: will trigger a BS reset [the keep alive mechanism will not work]; all users on that BS will be affected and disconnected
    - Workaround: use the WBS monitoring tool to determine/measure the impact

    Dead cell:
    - Description: the WBS stops carrying traffic
    - Impact: connected users cannot pass traffic anymore and new users cannot connect (the BS is not detected by the CPE)
    - Workaround: use the QoS alerter (usage depends on the SW release). If a BS is dead, the workaround has several steps: check the status of the WBS and other alarms; check whether the WBS has no traffic because there are no CPEs in that area; if no reason is found for the BS having no traffic => Reset Telecom => if the problem still exists on the next GP, Reset WBS

    Lazy cell:
    - Description: traffic is ongoing but no new CPE can connect
    - Impact: some mobiles cannot connect to the WBS (this could impact all the mobiles or only a few)
    - Workaround: detection is available in W3.1 Ed2D P2 only. The final QoS alerter formula is under test in a commercial network. As soon as the final formula with the final threshold is available, it will be announced.
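    The three cell states described above can be summarized as a simple decision rule. This is not the QoS alerter formula (which the slide says is still under test): the inputs, the rule order and the "suspect" labels are assumptions made for illustration only.

```python
def classify_cell(link_up, traffic_bytes, attach_attempts, attach_successes):
    """Rough classification of a BS for one Granularity Period.

    Hypothetical rule set:
    - sleeping: the BS-WAC link is down
    - dead-suspect: link up but no traffic at all (may also mean no CPE in the area)
    - lazy-suspect: traffic ongoing, attach attempts seen, none successful
    """
    if not link_up:
        return "sleeping"
    if traffic_bytes == 0:
        return "dead-suspect"
    if attach_attempts > 0 and attach_successes == 0:
        return "lazy-suspect"
    return "ok"
```

    A "dead-suspect" result still needs the manual checks listed in the workaround (alarms, presence of CPEs in the area) before any reset.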

    http://aww.quickplace.alcatel.com/QuickPlace/mnd_pcs-psf/PageLibraryC12570D0006048F2.nsf/h_Index/D07C104AAC22A1B2C12574DA003D8929/?OpenDocument&Form=h_PageUI

    INVESTIGATION OF PROBLEMS


    Bad Attach Setup Success Rate


    Running report NP_Mono_W3MR1ed2b_1, the connection setup success rate is very low (


    Running report NP_Mono_attach_setup_WAC with periodicity day:
    - The view %Attach BS fail shows that the attach setup failure occurs before the attachment to the BS
    - The rates of the listed failure causes are near or equal to 0% (T9, T17, auth_fail, pkm fail)
    - The view displaying all the release causes shows a high level of release cause other

    [Chart: Attach fail before BS attached - WAC: waclu01 (RanCtrl-01-Cluster01.packet1.com) - 07/06/2008 to 07/15/2008 (Working Zone: Global - Medium). Series: setup req, %attach BS fail.]


    [Chart: Attach fail at BS level - WAC: waclu01 (RanCtrl-01-Cluster01.packet1.com) - 07/06/2008 to 07/15/2008 (Working Zone: Global - Medium). Series: overload, other fail, preProv SF fail, PKM exchange fail, auth fail, setup succ, %attach fail, %T17 expired, %RAC rej.]


    Running the warning report NP_attach_setup with periodicity day:
    - In the view Attach fail cause other, we took the worst cell (one with many attempts)

    [Chart: Attach fail cause other - BSCELL - 07/15/2008 (Working Zone: Global - Medium). Series: setup req, %other fail, per cell. Callout: Cell RESTSRIRATU2, Attach_setup_req=73374, 100%.]


    Running the mono report NP_Mono_attach_setup with periodicity 1/4 for the concerned day:
    - In the view Attach fail at BS level, release cause other is present throughout the whole day. That means the problem is permanent.

    [Chart: Attach fail at BS level - BSCELL: Cell_RESTSRIRATU2 (000012a00132) - 07/15/2008 00:00 to 07/16/2008 00:00 (Working Zone: Global - Medium). Series: overload, other fail, preProv SF fail, PKM exchange fail, auth fail, setup succ, %attach fail, %T17 expired, %RAC rej.]


    Checking the view NP_102_Attach_Fail_Other, the description indicates that if the cause other is very high, the problem can come from a bad anonymous identity.

    Running the report NP_Mono_attach_setup_WAC with periodicity hour over several days, we have seen that the problem occurs both during the day and during the night.

    A Wireshark trace has been taken during the night.

    After analysis of the trace (mainly focused on ss_data_ind, which contains the anonymous identity), we have seen a bad anonymous identity: an anonymous identity without the extension @P1 (a screenshot of the Wireshark trace is on the next slide).
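    Once (MAC address, anonymous identity) pairs have been extracted from a trace by whatever means, the check for the missing @P1 extension is easy to automate. The sketch below assumes that a valid identity simply ends with "@P1" (case-sensitive); the function names and the input layout are hypothetical, and the extraction from the trace itself is not covered.

```python
def is_bad_anonymous_identity(identity, expected_extension="@P1"):
    """True when the identity does not end with the expected extension.

    Assumption: a correct anonymous identity ends with "@P1", case-sensitive.
    """
    return not identity.strip().endswith(expected_extension)


def bad_identity_macs(observations, expected_extension="@P1"):
    """observations: iterable of (mac_address, anonymous_identity) pairs.

    Returns the sorted, de-duplicated MAC addresses seen with a bad identity.
    """
    return sorted(
        {mac for mac, identity in observations
         if is_bad_anonymous_identity(identity, expected_extension)}
    )
```

    The resulting list plays the same role as the list of MAC addresses established manually from the trace.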


    After analysis of the whole trace over a period of 1 hour (around 6 GB), a list of MAC addresses with bad anonymous identities has been established.

    3 CPEs of the list (the worst) have been switched off/on (unplug/plug) and after that, the CPEs were working fine.

    That means the problem is not due to a bad configuration but to a bad behavior of the CPEs. This problem will probably reappear on the same CPEs or on other CPEs.

    As it is not possible to systematically take Wireshark traces to get the list of MAC addresses with a bad anonymous identity, another solution is to use the WAC log file CC_callControlThread_x.log.

    On the WAC machine, you have to collect all the CC_callControlThread_x.log files, going in /diagnosis/log/WUM__>/CC_ with corresponding to the more recent date.

    A tool is in preparation to give the list of MAC addresses with a bad anonymous identity and the corresponding cell.

    When the tool is available, this check must be done each day to verify whether new CPEs have a bad anonymous identity.


    WAC Abnormal Release/Handover exec fail


    Running report NP_Mono_W3MR1ed2b_1 on WACLU01 with periodicity week over several weeks shows an increasing number of WAC abnormal releases.

    [Chart: Session Release - WAC: waclu01 (RanCtrl-01-Cluster01.packet1.com) - 2008 Week 23 to 2008 Week 28 (Working Zone: Global - Medium). Series: Rel other, WAC Abnormal Rel, BS Rel, WAC Normal Rel, MS Rel, Session Ended.]


    Running report NP_Mono_W3MR1ed2b_2 on WACLU01 with periodicity week over several weeks shows:
    - a high number of handover executions (intra-WAC handovers; inter-WAC handovers are low)
    - an increase of handover execution failures

    Rem: you can see a case with %HO succ > 100%. It is under investigation. If you see a similar anomaly, let me know (Nicolas Palumbo) so that the problem can be investigated.
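    Periods showing %HO succ above 100% can be spotted automatically on exported values. The sketch assumes that %HO succ is the ratio of execution successes to execution requests, which the source does not state explicitly; the function names and the sample layout are hypothetical.

```python
def ho_success_rate(exec_req, exec_succ):
    """Handover execution success rate in percent, or None without requests.

    Assumption: the rate is successes divided by requests.
    """
    if exec_req == 0:
        return None
    return 100.0 * exec_succ / exec_req


def find_anomalies(samples):
    """samples: iterable of (period_label, exec_req, exec_succ).

    Returns the labels of the periods where successes exceed requests,
    i.e. where the success rate would be above 100%.
    """
    return [label for label, exec_req, exec_succ in samples if exec_succ > exec_req]
```

    Such a list gives the exact periods to report when the anomaly is seen again.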

    [Chart: INTRA WAC HO Exec Proc - WAC: waclu01 (RanCtrl-01-Cluster01.packet1.com) - 2008 Week 23 to 2008 Week 28 (Working Zone: Global - Medium). Series: HO fail, HO succ, %HO succ.]


    We can see another cause of release occurring after handover. After several successful handovers (so no failure), the CPE sends a DHCP Release (No 1774) and the WAC releases the session (No 1775).

    The DHCP release sent by the CPE is seen by the WAC as an abnormal release. The indicator MS_rel_other (Rel other in the legend) is incremented.

    So it is important to track the CPEs that generate DHCP releases. If a DHCP log file exists, it can give the MAC addresses of these CPEs.


    To reduce the ping-pong effect and so the number of handovers, the following actions were done on 22/07/2008:
    - On site TMNSETAPAK, the adjacencies have been removed
    - The hard Hysteresis Margin handover parameter was 2 and has been set to 5 on the following sites:
        - PPRGBONUS
        - TUNEHOTEL
        - DANAUKOTA

    Running the reports NP_Mono_W3MR1ed2b_1 and NP_Mono_W3MR1ed2b_2 with periodicity day for the previous week, we can see an improvement of the main KPIs:
    - Max simultaneous connections ==> much better (multiplied by 2)
    - Average session duration ==> much better (multiplied by 2)
    - Traffic User Plane IP between WAC-BS ==> much better (multiplied by 2)
    - Traffic User Plane IP between WAC-CN ==> much better (multiplied by 2)
    - Handovers (see NP_Mono_W3MR1ed2b_2_day.xls):
        - the number of handover preparations/executions seriously decreased (divided by 4)
        - the handover execution failure rate has dropped from 90% to 50% ==> we have to determine on which cell(s) the problem occurs

    The number of WAC abnormal releases has decreased, but not as much as expected, because handovers are still performed and many handover failures are still present.


    [Chart: Session - WAC: waclu01 - 07/16/2008 To 07/23/2008. Daily values; series: Start, Max (simultaneously opened); two scales, no unit.]

    [Chart: Session Duration - WAC: waclu01 - 07/16/2008 To 07/23/2008. Daily values; series: Total Duration, Avg Duration; both scales in s.]

    [Chart: UP IP WAC-BS - WAC: waclu01 - 07/16/2008 To 07/23/2008. Daily values; series: WAC to BS in bytes, BS to WAC in bytes, WAC to BS in bps, BS to WAC in bps; scales in bytes and bps.]

    [Chart: UP IP WAC-CN - WAC: waclu01 - 07/16/2008 To 07/23/2008. Daily values; series: WAC to CN in bytes, CN to WAC in bytes, WAC to CN in bps, CN to WAC in bps; scales in bytes and bps.]

    Investigation of problems

    WAC Abnormal Release/Handover Exec Fail

    [Chart: Session Release - WAC: waclu01 - 07/16/2008 To 07/23/2008. Daily values; series: Rel other, WAC Abnormal Rel, BS Rel, WAC Normal Rel, MS Rel, Session Ended; no unit.]

    [Chart: Handover Preparation - WAC: waclu01 - 07/16/2008 To 07/23/2008. Daily values; series: Prep fail, Prep succ, %Prep fail, %RAC Rej; scales: no unit and %.]

    [Chart: INTRA WAC HO Exec Proc - WAC: waclu01 - 07/16/2008 To 07/23/2008. Daily values; series: HO fail, HO succ, %HO succ; scales: no unit and %.]

    We can see a small reduction of WAC abnormal release and a big decrease of %HO succ.

    The next step is to see if it is due to one cell or if it is spread over all the cells.

    Investigation of problems

    WAC Abnormal Release/Handover Exec Fail

    On the BS Cell object, select the 3 sites where the hard HO hysteresis margin was modified and check the indicators related to HO and release.

    We can see that the main problem is on site DANAUKOTA. Now we have to select the W-adjacency object and select all the adjacencies related to the DANAUKOTA site (so cells 1, 2 and 3).

    07/23/2008          HO_WAC_intraWAC_exec_req_sBS  HO_WAC_intraWAC_succ_sBS  _NP_HO_WAC_intraWAC_sBS_fail  MS_rel_WAC_abnormal  MS_rel_other
    Cell_DANAUKOTA1     13                            5                         8                             393                  41
    Cell_DANAUKOTA2     113                           20                        93                            15                   26
    Cell_DANAUKOTA3     293                           0                         293                           11                   81
    Cell_HOTELTUNE1     262                           221                       41                            35                   25
    Cell_HOTELTUNE2     280                           247                       33                            41                   29
    Cell_HOTELTUNE3     83                            69                        14                            52                   4
    Cell_PPRSGBONUS1    96                            80                        16                            77                   16
    Cell_PPRSGBONUS2    94                            70                        24                            17                   9
    Cell_PPRSGBONUS3    7                             7                         0                             1                    0
    Sum of the 3 sites  1241                          719                       522                           642                  231
    WACLU1              1460                          791                       669                           762                  732
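    The per-site comparison behind this table can be sketched as follows. This is only an illustration: the counter values are copied from the 07/23/2008 table, while the helper names (site_of, fail_rate) and the rule used to derive a site name from a cell name are assumptions made for the example, not part of the reporting tool.

```python
# Counters copied from the 07/23/2008 BS Cell table:
# (exec requests, successes, failures) per source cell.
cells = {
    "Cell_DANAUKOTA1": (13, 5, 8),
    "Cell_DANAUKOTA2": (113, 20, 93),
    "Cell_DANAUKOTA3": (293, 0, 293),
    "Cell_HOTELTUNE1": (262, 221, 41),
    "Cell_HOTELTUNE2": (280, 247, 33),
    "Cell_HOTELTUNE3": (83, 69, 14),
    "Cell_PPRSGBONUS1": (96, 80, 16),
    "Cell_PPRSGBONUS2": (94, 70, 24),
    "Cell_PPRSGBONUS3": (7, 7, 0),
}


def site_of(cell):
    # Assumption: site name = cell name without the "Cell_" prefix
    # and without the trailing sector digit.
    return cell[len("Cell_"):-1]


def fail_rate(req, fail):
    # Share of handover executions that failed (0 when nothing was requested).
    return fail / req if req else 0.0


# Aggregate requests and failures per site.
site_totals = {}
for cell, (req, succ, fail) in cells.items():
    r, f = site_totals.get(site_of(cell), (0, 0))
    site_totals[site_of(cell)] = (r + req, f + fail)

# Rank the sites from worst to best failure rate.
ranking = sorted(site_totals, key=lambda s: fail_rate(*site_totals[s]), reverse=True)
for site in ranking:
    req, fail = site_totals[site]
    print(f"{site}: {fail}/{req} failed ({fail_rate(req, fail):.0%})")
```

    With these values DANAUKOTA comes out first, with 394 failures out of 419 execution requests.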

    07/23/2008                        HO_WAC_intraWAC_exec_req_sBS  HO_WAC_intraWAC_succ_sBS  _NP_HO_WAC_intraWAC_sBS_fail
    Cell_DANAUKOTA1-Cell_DANAUKOTA2   6                             3                         3
    Cell_DANAUKOTA1-Cell_DANAUKOTA3   7                             2                         5
    Cell_DANAUKOTA2-Cell_DANAUKOTA1   56                            9                         47
    Cell_DANAUKOTA2-Cell_DANAUKOTA3   7                             7                         0
    Cell_DANAUKOTA2-Cell_PPRSGBONUS1  50                            4                         46
    Cell_DANAUKOTA3-Cell_DANAUKOTA1   274                           0                         274
    Cell_DANAUKOTA3-Cell_DANAUKOTA2   10                            0                         10
    Cell_DANAUKOTA3-Cell_TMNSETAPAK2  9                             0                         9

    Conclusion: the HO failures are mainly due to the adjacencies:

    DANAUKOTA3-DANAUKOTA1
    DANAUKOTA2-DANAUKOTA1
    DANAUKOTA2-PPRSGBONUS1
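    As a rough check of this conclusion, the share of failures carried by the three worst adjacencies can be computed from the W-adjacency table. The sketch below uses the 07/23/2008 values; the variable names are chosen for the example only.

```python
# (exec requests, failures) per adjacency, copied from the 07/23/2008 table.
adjacencies = {
    ("Cell_DANAUKOTA1", "Cell_DANAUKOTA2"): (6, 3),
    ("Cell_DANAUKOTA1", "Cell_DANAUKOTA3"): (7, 5),
    ("Cell_DANAUKOTA2", "Cell_DANAUKOTA1"): (56, 47),
    ("Cell_DANAUKOTA2", "Cell_DANAUKOTA3"): (7, 0),
    ("Cell_DANAUKOTA2", "Cell_PPRSGBONUS1"): (50, 46),
    ("Cell_DANAUKOTA3", "Cell_DANAUKOTA1"): (274, 274),
    ("Cell_DANAUKOTA3", "Cell_DANAUKOTA2"): (10, 10),
    ("Cell_DANAUKOTA3", "Cell_TMNSETAPAK2"): (9, 9),
}

# Sort adjacencies by number of failed handover executions, worst first.
by_fail = sorted(adjacencies.items(), key=lambda kv: kv[1][1], reverse=True)

total_fail = sum(fail for _, fail in adjacencies.values())
top3 = by_fail[:3]
top3_fail = sum(fail for _, (_, fail) in top3)

for (source, target), (req, fail) in top3:
    print(f"{source} -> {target}: {fail}/{req} failed")
print(f"Top 3 adjacencies: {top3_fail}/{total_fail} failures ({top3_fail / total_fail:.0%})")
```

    The three adjacencies named in the conclusion account for 367 of the 394 failures listed in the table.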

    Investigation of problems

    WAC Abnormal Release/Handover Exec Fail

    Now that the problem has been localized to these adjacencies, drive tests must be done on DANAUKOTA3-DANAUKOTA1 to understand why 100% of the handovers fail.

    Action 1- On WACLU01, make a WireShark trace during a GP with a high number of handover failures.

    Action 2- CCC: filter the trace on the BS ID corresponding to DANAUKOTA3 as source or target; you will have everything related to this BS. You can then check which MAC address attempts many handovers that end in failure.

    Action 3- Find the physical position of the CPE with the identified MAC address.

    Action 4- Make drive tests to check whether the problem is similar. If not, the CPE must be traced and, if necessary, BS DANAUKOTA3 and DANAUKOTA1 must be traced.

    Action 5- Analyze the traces if necessary.
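    Action 2 can be sketched as a small post-processing step on an exported trace. Everything about the record layout below is hypothetical (the field names mac, source_bs, target_bs and result, the BS identifiers and the MAC addresses are invented for the illustration); a real WireShark export would first have to be mapped to such records.

```python
from collections import Counter


def failing_macs(records, bs_id):
    """Count handover failures per MAC address, keeping only the
    records where bs_id is the source or the target BS."""
    failures = Counter()
    for rec in records:
        involved = bs_id in (rec["source_bs"], rec["target_bs"])
        if involved and rec["result"] == "fail":
            failures[rec["mac"]] += 1
    # Most frequent offenders first.
    return failures.most_common()


# Hypothetical sample records, for illustration only.
sample_records = [
    {"mac": "00:00:00:00:00:01", "source_bs": "BS_DANAUKOTA3", "target_bs": "BS_DANAUKOTA1", "result": "fail"},
    {"mac": "00:00:00:00:00:01", "source_bs": "BS_DANAUKOTA3", "target_bs": "BS_DANAUKOTA1", "result": "fail"},
    {"mac": "00:00:00:00:00:02", "source_bs": "BS_DANAUKOTA2", "target_bs": "BS_DANAUKOTA3", "result": "fail"},
    {"mac": "00:00:00:00:00:03", "source_bs": "BS_DANAUKOTA3", "target_bs": "BS_DANAUKOTA2", "result": "success"},
    {"mac": "00:00:00:00:00:04", "source_bs": "BS_HOTELTUNE1", "target_bs": "BS_HOTELTUNE2", "result": "fail"},
]

offenders = failing_macs(sample_records, "BS_DANAUKOTA3")
for mac, count in offenders:
    print(f"{mac}: {count} failed handovers")
```

    The MAC address at the top of the list is the CPE to locate in Action 3.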

    Handover Preparation failure

    Investigation of problems

    Handover Preparation/Failure

    Running report NP_Mono_W3MR1ed2b_2 on WACLU01 with periodicity week over several weeks, there is:

    a high number of handover preparation requests
    a high number of handover preparation failures

    [Chart: Handover Preparation - WAC: waclu01 ( RanCtrl-01-Cluster01.packet1.com ) - 2008 Week 23 To 2008 Week 28 (Working Zone: Global - Medium). Weekly values from 06/02/2008 to 07/07/2008; series: Prep fail, Prep succ, %Prep fail, %RAC Rej; scales: no unit and %.]

    Investigation of problems

    Handover Preparation/Failure

    WAC in Release W3 doesn't support handover during the attach setup procedure.

    Concerning the high number of handover preparations, the following view shows that HO_BS_fail_empty_BSlist_sBS is very high.

    One cause is a CPE requesting a handover preparation during the attach setup procedure: as WAC W3 doesn't support HO during this phase, the WAC rejects the request, the BS sends the message MOB_BSHO-RSP with cause "empty BS list" to the CPE, and the indicator HO_BS_fail_empty_BSlist_sBS is incremented.

    After the reject, the CPE re-attempts, is rejected again, re-attempts, etc.

    [Chart: Handover Preparation - WAC: waclu01 - 07/16/2008 To 07/23/2008. Daily values; series: Prep ko RTVR flush expired, Prep ko NRT flush expired, Prep ko empty list, Prep succ, %Prep succ; scales: no unit and %.]

    Investigation of problems

    Handover Preparation/Failure

    Logically, if the CPE is on the best cell, there is no reason to ask for a handover to a better cell. As this problem occurs many times, it means the CPE doesn't select the best cell before the attach setup procedure.

    Checking the BS cells with this problem, the worst cells are:

    07/23/2008          HO_WAC_prep_req  _NP_BS_prep_fail  _NP_BS_prep_succ
    Cell_DANAUKOTA1     17               4                 13
    Cell_DANAUKOTA2     231              118               113
    Cell_DANAUKOTA3     890              597               293
    Cell_HOTELTUNE1     846              584               262
    Cell_HOTELTUNE2     949              669               280
    Cell_HOTELTUNE3     228              145               83
    Cell_PPRSGBONUS1    151              55                96
    Cell_PPRSGBONUS2    104              10                94
    Cell_PPRSGBONUS3    7                0                 7

    Because the CPE is not on the right cell, the ranging req vs CDMA rate is directly impacted (poor coverage, and therefore many ranging CDMA codes sent).
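    The per-cell preparation figures of 07/23/2008 can be ranked with a few lines of Python. This is a sketch only: the values are copied from the table, and ranking by number of failures rather than by failure rate is a choice made for the example.

```python
# (preparation requests, preparation failures) per cell, 07/23/2008.
prep = {
    "Cell_DANAUKOTA1": (17, 4),
    "Cell_DANAUKOTA2": (231, 118),
    "Cell_DANAUKOTA3": (890, 597),
    "Cell_HOTELTUNE1": (846, 584),
    "Cell_HOTELTUNE2": (949, 669),
    "Cell_HOTELTUNE3": (228, 145),
    "Cell_PPRSGBONUS1": (151, 55),
    "Cell_PPRSGBONUS2": (104, 10),
    "Cell_PPRSGBONUS3": (7, 0),
}

# Worst cells first, by number of failed preparations.
worst = sorted(prep.items(), key=lambda kv: kv[1][1], reverse=True)

total_req = sum(req for req, _ in prep.values())
total_fail = sum(fail for _, fail in prep.values())

for cell, (req, fail) in worst[:3]:
    print(f"{cell}: {fail}/{req} preparations failed ({fail / req:.0%})")
print(f"All 9 cells: {total_fail}/{total_req} ({total_fail / total_req:.0%})")
```

    Cell_HOTELTUNE2, Cell_DANAUKOTA3 and Cell_HOTELTUNE1 carry most of the preparation failures, so they are the natural candidates for the traces described next.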

    Investigation of problems

    Handover Preparation/Failure

    The first action is to make a WireShark WAC trace to see which CPE (MAC address) is asking for many handover preparations during the attach setup procedure.

    The second action is to make a Radio Trace to follow this MAC address and to check whether or not it is on the best BS Cell.

    CPU Max = 100% on WAC

    Investigation of problems

    CPU Max = 100% on WAC

    Checking the CPU view with periodicity day for a duration of one week, we have:

    [Chart: CPU and RAM used - WAC: waclu01 - 07/16/2008 To 07/23/2008. Daily values; series: CPU_Max, RAM_Max, CPU_Avg, RAM_Avg; both scales in %. Callout on the chart: the 21/07/2008.]

    Investigation of problems

    CPU Max = 100% on WAC

    Checking the CPU view for the 21/07/2008 with periodicity 1/4 hour, we have:

    [Chart: CPU and RAM used - WAC: waclu01 - 07/21/2008 14:00 To 07/21/2008 17:00. Quarter-hour values; series: CPU_Max, RAM_Max, CPU_Avg, RAM_Avg; both scales in %.]

    On 21/07/2008 at 15h58, WAC log traces were collected on WACLU01: a USB key was plugged in and the WAC transferred around 500 MB of log files. It took around 30 min.

    Doing actions on the WAC can increase the CPU load and so have a direct influence on network behavior.

    Investigation of problems

    CPU Max = 100% on WAC

    Coincidence or not: the saving of log files started at 15h58 and ended around 16h28, and WAC01 didn't send the counters (hence the hole in the previous view for the period 15h45-16h00).

    In the following graphs, you can see an increase of the number of max simultaneous sessions, followed by an increase of the release cause Rel Other.

    Next time, when actions are done on the WAC, the 3 previous points must be rechecked, as well as the CPU reaching 100%.

    [Chart: Session - WAC: waclu01 - 07/21/2008 14:00 To 07/21/2008 17:00. Quarter-hour values; series: Start, Max (simultaneously opened); two scales, no unit.]

    [Chart: Session Release - WAC: waclu01 - 07/21/2008 14:00 To 07/21/2008 17:00. Quarter-hour values; series: Rel other, WAC Abnormal Rel, BS Rel, WAC Normal Rel, MS Rel, Session Ended; no unit.]
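    The check for missing counters can be automated. The sketch below assumes that the start times of the received quarter-hour periods are available as a list of datetimes; the function name missing_periods and the sample list are illustrative, built only to reproduce the 15h45-16h00 hole described above.

```python
from datetime import datetime, timedelta


def missing_periods(received, start, end, step=timedelta(minutes=15)):
    """Return the period start times in [start, end) for which
    no counters were received."""
    received = set(received)
    missing = []
    t = start
    while t < end:
        if t not in received:
            missing.append(t)
        t += step
    return missing


start = datetime(2008, 7, 21, 14, 0)
end = datetime(2008, 7, 21, 17, 0)

# Illustrative input: every quarter-hour period except 15:45 was received.
received = [
    start + i * timedelta(minutes=15)
    for i in range(12)
    if start + i * timedelta(minutes=15) != datetime(2008, 7, 21, 15, 45)
]

gaps = missing_periods(received, start, end)
for gap in gaps:
    print("No counters for the period starting at", gap.strftime("%d/%m/%Y %Hh%M"))
```

    Running such a check after each action on the WAC gives a quick way to see whether the action cost one or more granularity periods.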

    www.alcatel-lucent.com