Fault ReportNEW 27-4-11

Apr 08, 2018

ajaymv_4
  • 8/7/2019 Fault ReportNEW 27-4-11

    1/24

    Distributed Fault Management approach for Next Generation Networks

    CHAPTER 1

    INTRODUCTION

    Most architectures of currently deployed network management systems (NMS) for

    telecommunication networks can be characterized as centralized and hierarchical.

    While marking a great achievement in making operation more efficient, such NMS

    solutions require powerful machines because of the complex logic and large amount

    of management information to be processed. Furthermore, they involve costly

    redundancy mechanisms in order to avoid single points of failure. The hierarchy

    additionally introduces different levels of abstraction, from element management of

    individual network elements (NEs) up to business management at the top of the

    Telecommunication Management Network (TMN) pyramid. Management information

    flow in both directions, up and down this static hierarchy, is cascaded, with

    information mapping performed at each layer. From a functional point of view, the

    five FCAPS disciplines are rather separated. Interactions between these disciplines

    typically happen at a higher management layer, or even through a human operator.

    This approach to manage communication networks is very well understood and works

    well for classic telecommunication networks; plenty of technically mature systems

    that incorporate that approach have been on the market for years. The last couple of

    years, however, have revealed several emerging technologies like Voice over IP

    (VoIP) and IP-TV, ubiquitous and pervasive computing, new wireless technologies,

    and various implications of ad-hoc and peer-to-peer (P2P) networks. This results in

    interesting Next Generation Network (NGN) scenarios that have already been (or are

    about to be) realized by network operators.

    Even if these NGN scenarios cover a wide spectrum of use cases and technologies,

    they have a few characteristics in common:

    (1) Large scale - up to 10^6 possibly small nodes,

    (2) Heterogeneity - different HW and SW platforms, different vendors, different

    access and communication protocols, and

    (3) Dynamics - nodes might appear and disappear regularly, and topology changes

    will be the rule rather than the exception.

    In a typical state-of-the-art fault management system the main fault processing logic

    resides in a powerful correlation engine at the top of the processing chain where a lot

    Dr. AIT, Dept. of ISE 2010-2011 1

    of alarms from each network node are received. Using sophisticated statistical

    analysis this engine might find out the most probable root cause for a given sequence

    of alarms. Further, in order to be meaningful to an operator, the information contained

    in this large number of alarms has to be enriched by correlation against databases

    containing topology and other configuration-related information, stored on the central

    server. If the number of emitting nodes and alarms exceeds a certain critical value,

    such a centralized correlation engine will become a bottleneck. Moreover, the

    heterogeneity of NEs increases the complexity of the system. Finally, and maybe the

    most critical issue: if static information on network topology is used for fault

    analysis, the system will inevitably suffer from inconsistencies as soon as this

    topology exhibits dynamic aspects.

    Motivated by these issues, we present a conceptually different fault management

    design that is based on a peer-to-peer (P2P) approach to network management

    suggested in the CELTIC research project Madeira.

    CHAPTER 2

    NETWORK MANAGEMENT GOALS AND REQUIREMENTS

    2.1 OPERATIONAL GOALS

    Proactive monitoring of network infrastructure and service levels.

    Streamline network operations functions through NMS tools optimization.

    Scalability of NMS architecture to support new network technologies such as Multiprotocol Label Switching (MPLS), wireless, quality of service (QoS), and others.

    Increase the ability to detect soft failures at the protocol, hardware, system software, and interface levels.

    Help enable proactive maintenance to be performed by the network operations center (NOC) support team upon detecting faults or performance degradation.

    Help enable intelligent forwarding of network events to the NOC.

    2.2 FUNCTIONAL REQUIREMENTS

    Given the high-level manageability goals outlined above, the following sections

    highlight the functional requirements in specific network management functional

    areas.

    Fault Management

    Fault management encompasses the discipline of identifying faults in a network

    environment. Faults are identified by receiving events such as syslog and Simple

    Network Management Protocol (SNMP) traps from network devices, polling network

    device MIBs, and identifying real or potential error conditions and setting thresholds

    that trigger events. In addition, the NMS should be able to provide event correlation

    as well as reporting and tracking. The NMS used should also provide a northbound

    interface for exporting critical messages to a higher level manager or MoM (manager

    of managers).

    In an ideal environment, the fault manager would collect both syslog and SNMP

    information, filter that information, and pass the filtered data to a MoM for further

    processing. This method helps decrease the amount of data that an end user needs to

    see or react upon. The MoM, in turn, can provide further analysis and automation

    based on the incoming event streams such as verifying down circuits, testing

    connectivity, and opening trouble tickets based on those findings.
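
    The filtering step described above can be sketched as follows. This is only an illustration, not part of any particular NMS: the Event type, the source labels, and the filter_for_mom function are hypothetical names, and the numeric levels follow the usual syslog convention in which a lower number means a more severe event.

```python
from dataclasses import dataclass

# Syslog-style numeric severities: a lower number is more severe.
SEVERITY = {"critical": 2, "error": 3, "warning": 4, "info": 6}


@dataclass(frozen=True)
class Event:
    source: str    # illustrative label: "syslog" or "snmp"
    device: str
    severity: str  # one of the keys of SEVERITY
    message: str


def filter_for_mom(events, max_severity="error"):
    """Return only the events that are at least as severe as max_severity."""
    limit = SEVERITY[max_severity]
    return [e for e in events if SEVERITY[e.severity] <= limit]


events = [
    Event("syslog", "rtr1", "info", "config saved"),
    Event("snmp", "rtr1", "critical", "linkDown ifIndex=3"),
    Event("syslog", "sw2", "warning", "fan speed high"),
    Event("snmp", "sw2", "error", "authenticationFailure"),
]

# Only the critical and error events would be passed on to the MoM.
forwarded = filter_for_mom(events)
```

    In this sketch the fault manager forwards two of the four events, so the MoM and the operator see less data.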

    Unmanaged Events

    Stand-alone fault managers are used to gather event data from devices throughout the

    network and report their findings. They have little to no capability of automating

    reactions based on gathered data. When a message comes into the fault manager, the

    typical course of action is simply to report the fault to a screen being monitored by

    operations personnel.

    Managed Events

    By employing a MoM, your system can react to these events automatically,

    which can drastically reduce downtime in mission-critical networks. For example,

    when an event comes in from the fault manager, the MoM can:

    Verify connectivity to the reported down device/interface by ping, Telnet, or other means.

    Gather information about the device, such as vendor, serial number, location, contact information, circuit IDs, and site IDs, from a device inventory database.

    Attach historical reports gathered from other NMSs, such as bandwidth, CPU, and memory usage.

    Open a trouble ticket automatically and have that ticket prepopulated with important information from the device information database.

    This method would not only relieve operations personnel from having to look up the

    information for an outage, but would save critical time in bringing the fault to a

    resolution.
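
    A minimal sketch of this managed-event flow is shown below. The inventory contents, the handle_event function, and the ticket layout are assumptions made for illustration. The connectivity check is passed in as a function so that the example does not depend on a real ping or Telnet session.

```python
# Hypothetical device inventory, keyed by device name.
INVENTORY = {
    "rtr1": {
        "vendor": "ExampleVendor",
        "site_id": "S-001",
        "contact": "noc@example.com",
    },
}


def handle_event(device, message, is_reachable, inventory=INVENTORY):
    """React to a fault event the way a MoM might.

    Returns None if the device answers the connectivity check (no ticket
    needed), otherwise a ticket dict prepopulated from the inventory.
    """
    if is_reachable(device):
        return None
    ticket = {"device": device, "summary": message}
    ticket.update(inventory.get(device, {}))
    return ticket


# The lambda stands in for a real ping/Telnet check that failed.
ticket = handle_event("rtr1", "interface down", is_reachable=lambda d: False)
```

    The resulting ticket already carries the vendor, site ID, and contact, so operations personnel do not have to look them up.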

    Event Correlation

    Event management encompasses event-correlation and root-cause analysis. It allows

    for multiple input streams from various network devices and environments and, using

    knowledge of the network topology and a sophisticated rule set, attempts to identify

    the source or root cause of a network fault or problem.

    At the top level (MoM), event correlation features should be supported to aggregate

    and correlate incoming alarms. The system needs to have the intelligence to correlate

    event types (SNMP, syslog, and so on) as well as to provide automation of tasks

    based on event criteria.

    Filtering capability should be supported to selectively display relevant alarms.

    The system should be capable of escalating critical alarms based on the number of

    occurrences and time delays in acknowledgement.

    Alarm severity should be customizable based on end-user or operational needs.

    Alarm properties and escalation should be policy based, dependent on the role of the

    device in the network.

    The system should be able to virtually partition the managed network into multiple

    logical entities based on geographical locations.

    The fault management system should support role-based access to fault events based

    on job responsibilities.

    A knowledge base consisting of troubleshooting guidelines or methodologies should

    be part of the fault management system. This is to facilitate rapid problem isolation on

    network-related issues.

    The system should provide integration between the fault and the inventory

    management system to support auto-population of information.

    The system should provide integration between the inventory system and the

    trouble-ticketing system for auto-population of relevant trouble ticket fields.

    The system should provide the flexibility to forward traps and alarms to a different

    location/system for after-hours monitoring.
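
    As one illustration of the escalation requirement in this list (escalating on the number of occurrences and on delays in acknowledgement), consider the sketch below. The Alarm fields and the default thresholds are assumptions chosen for the example, not values taken from a product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    key: str
    first_seen: float                        # seconds since some epoch
    occurrences: int = 1
    acknowledged_at: Optional[float] = None  # None while unacknowledged


def should_escalate(alarm, now, max_occurrences=5, max_ack_delay=900.0):
    """Escalate when the alarm repeated too often or stayed unacknowledged too long."""
    if alarm.occurrences >= max_occurrences:
        return True
    unacknowledged = alarm.acknowledged_at is None
    return unacknowledged and (now - alarm.first_seen) > max_ack_delay
```

    In a policy-based system the two thresholds would depend on the role of the device, for example lower limits for core routers than for access switches.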

    Log Management

    Logging is a critical part of network management. Good logs can help you find

    configuration errors, understand past intrusions, troubleshoot service disruptions, and

    react to probes and scans of your network. Cisco devices have the ability to log a

    great deal of their status.

    Syslog is also a great resource for network compliance, allowing companies to adapt

    quickly to changing regulations such as Sarbanes-Oxley (SOX), Control Objectives

    for Information and related Technology (COBIT), IT Infrastructure Library (ITIL),

    Gramm-Leach-Bliley Financial Modernization Act (GLBA), Visa Card Holder

    Information Security Program (Visa CISP), Payment Card Industry (PCI) Data

    Security Standards, Health Insurance Portability and Accountability Act (HIPAA),

    Committee of Sponsoring Organizations (COSO) of the Treadway Commission, and

    custom regulations.

    Defining all aspects of a syslog server is outside the scope of this document.

    NMS North and Southbound API Interfaces

    Communication between multiple network management systems is extremely

    important for event correlation and data aggregation. Most, if not all, NMSs should be

    able to communicate bidirectionally. This helps ensure the ability to provide correlated

    events as well as the coordination of data sources throughout the network such as

    inventory, access, performance data, and so on.

    CHAPTER 3

    HIERARCHICAL APPROACH TO NETWORK MANAGEMENT

    Layering of network management not only allows NMSs to communicate

    better, it also reduces the number of alerts seen by network operations support staff. At the

    lowest layer, it is nearly impossible to keep up with events displayed from each

    network element reported in the NMS architecture. For example, it is not feasible to

    have someone watching every syslog event that occurs on the network. Instead, you

    rely on systems at the Network Management Layer (NML) to filter through all events

    and show only those events deemed as most important. The Service Management

    Layer (SML), meanwhile, is used to further summarize events from the NML and tie

    multiple network management systems together. A good NMS will also

    provide deduplication of these network events in order to further reduce the number of

    unnecessary messages seen by operations personnel.
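
    Deduplication of this kind can be sketched as follows. The deduplicate function, the tuple layout, and the 60-second window are assumptions made for illustration. Repeats of the same device and message within the window are folded into the first occurrence and counted.

```python
def deduplicate(events, window=60.0):
    """Collapse repeated events.

    events: iterable of (timestamp, device, message), sorted by timestamp.
    Returns a list of [first_timestamp, device, message, count] entries.
    A repeat is folded into an entry if it arrives within `window` seconds
    of that entry's first occurrence.
    """
    latest = {}   # (device, message) -> index of the most recent entry
    result = []
    for ts, device, message in events:
        key = (device, message)
        idx = latest.get(key)
        if idx is not None and ts - result[idx][0] <= window:
            result[idx][3] += 1
        else:
            latest[key] = len(result)
            result.append([ts, device, message, 1])
    return result


raw = [
    (0, "rtr1", "link down"),
    (10, "rtr1", "link down"),
    (20, "sw2", "fan fail"),
    (100, "rtr1", "link down"),
]
summary = deduplicate(raw)
```

    Here the operator would see three entries instead of four, with the first one marked as having occurred twice.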

    The hierarchical model in Figure 1 shows the major components that make up a

    comprehensive NMS system and provides a high-level integration scenario. Cisco

    Advanced Services encourages the adoption of a layered, hierarchical network

    management system. This type of architecture involves data flow and integration of

    multiple NMS tools to be effective. Figure 1 depicts those tool and data relationships.

    Figure 1: Hierarchical Network Model

    The underlying hierarchical philosophy is to get the organization to a basic level of

    integrated network management. The foundation for this architecture comes from the

    Telecommunications Management Network (TMN) (M.3000) model. "TMN provides

    a framework for achieving interconnectivity and communication across heterogeneous

    operations system and telecommunication networks. To achieve this, TMN defines a

    set of interface points for elements which perform the actual communications

    processing (such as a call processing switch) to be accessed by elements, such as

    management workstations, to monitor and control them. The standard interface allows

    elements from different manufacturers to be incorporated into a network under a

    single management control."

    Element Management Layer

    The first level, the Element Management Layer, defines individual network elements

    used in deployment. For each anomaly that occurs in the

    network, multiple devices can be affected by the event and can

    independently alert network management systems that an event has occurred, resulting

    in multiple reports of the same problem.

    Network Management Layer

    In the middle of the diagram is the Network Management Layer. This function takes

    input from multiple elements (which in reality might be different applications),

    correlates the information received from the various sources (also referred to as

    root-cause analysis), and identifies the event that has occurred. The NML provides a

    level of abstraction above the Element Management Layer in that operations

    personnel are not "weeding" through potentially hundreds of Unreachable or Node

    Down alerts but instead are focusing on the actual event such as, "an area-border

    router has failed."
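
    A very small sketch of topology-based root-cause analysis is given below. It assumes that the topology, as seen from the management station, is a tree described by a hypothetical parent map, and that the unreachable nodes are already known. A node is reported as a root cause only if its upstream neighbour is still reachable, which suppresses the alerts of everything behind it.

```python
def root_causes(parent, unreachable):
    """Return the unreachable nodes whose upstream node is still reachable.

    parent: dict mapping each node to its upstream node (tree topology).
    unreachable: set of nodes currently reported as unreachable.
    """
    return sorted(n for n in unreachable if parent.get(n) not in unreachable)


# Hypothetical topology: two area-border routers behind a core node.
parent = {
    "abr1": "core",
    "abr2": "core",
    "r1": "abr1",
    "r2": "abr1",
    "r3": "r2",
}
alerts = {"abr1", "r1", "r2", "r3"}
causes = root_causes(parent, alerts)
```

    With these inputs the four Unreachable alerts reduce to a single event: abr1 has failed. Real correlation engines use richer rule sets and must also cope with meshed topologies, which this sketch does not.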

    Service Management Layer

    At the top of the diagram is the Service Management Layer. This layer is responsible

    for adding intelligence and automation to filtered events, event correlation, and

    communication between databases and incident management systems. The goal is to

    move traditional network management environments and the operations personnel

    from element management (managing individual alerts) to network management

    (managing network events) to service management (managing identified problems).

    As an evolution of TMN, the TeleManagement Forum, or TMF, proposed the

    Telecom Operation Map (TOM) and, more recently, the eTOM. These models

    describe at a high level the processes a telecom operator needs to fulfill to manage its

    network and services infrastructure. Furthermore, TMF has defined the NGOSS

    architecture, which is a technology-agnostic framework for the construction of

    management applications. NGOSS fosters a component-based architecture with

    interfaces between components defined as contracts, a shared information model, and

    the separation of implementation from business logic.

    An interesting implementation of NGOSS concepts is OSS/J, which pursues the

    implementation of a set of APIs, based on J2EE technologies, to allow the integration

    of OSSs. It is also worth mentioning the 3GPP approach to network management,

    which is based on the IRP (Integration Reference Point) concept. The IRP is

    analogous to the TMN Q3 reference point, improving it by defining the information

    models in an implementation-independent UML.

    An important assumption behind TMN and similar frameworks is that network

    elements have limited management capabilities, being focused on their

    communications role. Therefore, management functions are performed externally by

    dedicated systems, while network elements only provide simple management agents

    to allow these external systems to access and manipulate management data. It was

    recognized a considerable time ago, however, that network devices can do much more

    than run a simple agent, being even capable of managing themselves.

    This is the approach taken in IP networks, where nodes are able to perform

    management tasks for routing, signaling, path provisioning, etc. Following this trend,

    the control plane paradigm has emerged in telecom networks, giving more autonomy

    to network elements for certain tasks. Particularly in optical networks, research is

    being conducted to create an optical control plane that enables automated

    multi-vendor network operation. Examples are the ITU's Architecture for

    Automatically Switched Optical Networks (ASON) and the IETF's Generalized

    Multi-Protocol Label Switching (GMPLS). However, decentralized standards are not

    (yet) available for wireless networks. Note that the existence of a control plane does

    not mean that the management plane is no longer necessary, but rather that it must

    adapt to the new scenario, focusing on the tasks for which it is better suited and

    relying on the control plane for other tasks such as routing and signaling.

    Benefits of Hierarchical Layers

    From a practical perspective, integrating these elements involves:

    Assembling a robust set of event correlation rules that consistently and accurately identify the source of an event.

    Opening a trouble ticket in an incident management application that operational personnel begin working on.

    This helps enable an operations organization to:

    Proactively manage the network.

    Identify and correct potential network issues before they become problems.

    Prevent a loss of network connectivity, thus ensuring organizational productivity.

    Focus on the solution instead of the problem.

    However, hierarchical network management also has some disadvantages:

    In the past years, TMN has been the dominant network management

    framework. It promotes a well-known centralized approach which has a

    number of effects on the scalability of the network management application.

    Managing a large network from a single, central point will increase the load of

    the central manager and could create bandwidth bottlenecks on links that are

    close to that central manager.

    Another disadvantage is the lack of flexibility, since the current generation of

    management architectures use static topology data, often based on manually

    generated files.

    Peer to Peer

    In a P2P system, the nodes have a significant or total degree of autonomy from central

    servers. P2P systems enable the utilization of previously unused

    resources such as storage, cycles, or content by tolerating and working with

    the variable connectivity of numerous devices. An overall characteristic of a

    peer-to-peer network is that the nodes can send and receive information in a way that makes

    them both servers and clients, or servents. A distinction is made

    between pure peer-to-peer networks and hybrid peer-to-peer networks, in such a way

    that:

    Pure P2P architectures are completely decentralized: There is no central server or

    router. Each node can issue and respond to requests, or route requests to other nodes.

    In Hybrid P2P architectures, more types of nodes exist: The leaf nodes are nodes

    with an information need or information resource. In other words, they can provide

    information to or request information from other leaf nodes. Another type of node,

    the super peer, has a more server-like role in the network. These nodes provide

    regionally centralized services to the network in order to improve the routing of

    information requests. These nodes are also called directory nodes or ultra peers. Each

    directory node provides directory services for portions of the network and directory

    nodes work in a cooperative manner to cover the whole network.
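
    The cooperation between directory nodes in a hybrid P2P network can be sketched as below. The SuperPeer class and its methods are hypothetical and run in a single process. In a real network the lookup would travel over the network between super peers.

```python
class SuperPeer:
    """A directory node that indexes the resources of its own leaf nodes."""

    def __init__(self, name):
        self.name = name
        self.directory = {}    # resource name -> set of leaf node names
        self.neighbours = []   # other super peers it cooperates with

    def register(self, leaf, resource):
        """Record that a leaf node offers a resource."""
        self.directory.setdefault(resource, set()).add(leaf)

    def lookup(self, resource, _visited=None):
        """Find all leaves offering a resource, asking other super peers too."""
        visited = _visited if _visited is not None else set()
        visited.add(self.name)
        found = set(self.directory.get(resource, ()))
        for peer in self.neighbours:
            if peer.name not in visited:   # avoid asking the same peer twice
                found |= peer.lookup(resource, visited)
        return found


a, b = SuperPeer("a"), SuperPeer("b")
a.neighbours.append(b)
b.neighbours.append(a)
a.register("leaf1", "topology")
b.register("leaf2", "topology")
providers = a.lookup("topology")
```

    Each super peer only stores the directory for its own portion of the network, yet a query to any one of them covers the whole network.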

    Figure 2: Pure Peer-to-Peer (left) and Hybrid Peer-to-Peer (right)

    CHAPTER 4

    MADEIRA ARCHITECTURE

    In an attempt to overcome the shortcomings of the traditional management

    approaches to face the challenges of next generation telecommunication networks,

    Madeira aims to develop a new management framework based on peer-to-peer

    networking concepts. Furthermore, it provides novel technologies for a logically

    meshed Network Management System that facilitates self-management and dynamic

    behavior of nodes within the network. Madeira also takes advantage of the Policy

    Based Management Paradigm that pursues the separation of management logic from

    the actual applications. This logic is then specified as a set of rules or policies that can

    be dynamically fed into the management system allowing a change of its behavior

    without the need of changing the application or even restarting it. Besides the

    innovative architectural framework, the Madeira project will provide interface

    protocols, standards and a reference software implementation and apply it to a

    specific network management scenario. Ultimately, by enabling the management of

    network elements of increasing numbers, heterogeneity and transience, the Madeira

    approach should reduce the Operational Expenses, or OPEX.
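
    The idea of separating management logic from the application can be illustrated with the sketch below. It is not Madeira's policy language: the PolicyEngine class, the "if"/"then" rule layout, and the action names are assumptions made for the example. The point is that the rules are plain data and can be replaced while the application keeps running.

```python
def matches(condition, event):
    """A condition matches when every listed field has the given value."""
    return all(event.get(key) == value for key, value in condition.items())


class PolicyEngine:
    def __init__(self, policies=()):
        self.policies = list(policies)

    def replace_policies(self, policies):
        """Swap in a new rule set without restarting the application."""
        self.policies = list(policies)

    def actions_for(self, event):
        """Return the actions of all policies whose condition matches."""
        return [p["then"] for p in self.policies if matches(p["if"], event)]


engine = PolicyEngine([{"if": {"type": "link_down"}, "then": "raise_alarm"}])
event = {"type": "link_down", "node": "bs7"}
before = engine.actions_for(event)

# New behavior is fed in as data; the engine code is unchanged.
engine.replace_policies([
    {"if": {"type": "link_down"}, "then": "reroute"},
    {"if": {"node": "bs7"}, "then": "notify_cluster_head"},
])
after = engine.actions_for(event)
```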

    Madeira focuses on Fault and Configuration Management functional areas and,

    especially, on the way they can co-operate to solve management problems. In doing

    so, it will act as a complement to traditional management systems. There are many

    management tasks that current network management systems perform well, such as

    Performance Management. For these tasks, a hierarchical approach is entirely

    appropriate. Madeira will investigate the feasibility of distributing management

    responsibilities among peer nodes in order to perform certain tasks more efficiently.

    In other words, the Madeira management approach is applied to management tasks

    that are difficult to carry out using conventional methods, or tasks that can be carried

    out more efficiently using a distributed approach.

    Figure 3: Madeira Architecture

    Based on the approach of applying P2P concepts to the management domain, the

    Madeira architecture has been designed with a number of key principles in mind. The

    most important of these principles is heterogeneity, or the ability of the Madeira

    system to be applied to many management domains and across heterogeneous devices

    and platforms. The main vehicle for this generic management is the usage of policies,

    notifications and applications.

    The Madeira architecture is essentially composed of an Adaptive Management

    Component (AMC) and a Platform. The AMC is the component that manages a given

    node and together many AMCs can orchestrate the overall behavior of the meshed

    network. These AMCs have the ability to exchange and export Network Management

    information between peer management applications and are deployed as an overlay

    network, communicating using the peer-to-peer paradigm. The AMC itself is

    composed of a number of sub-elements which facilitate this management.

    The AMC Core is the primary component or brain of the AMC and, based on

    notifications and policies, orchestrates the services and applications to facilitate the

    required network management function. Services are components that provide some

    functionality required by the AMC Core. Applications provide the actual

    management functionality specific to a particular device. Applications are physically

    divided into parts that run within each AMC but are logically connected through peer

    interactions. For example, each AMC will run some Fault Management (FM) application,

    which together with FM applications running on other nodes constitutes the overall

    FM application of the entire meshed network. Policies provide the generic

    management functionality shared by all AMCs within the same management domain.

    The generic management specified in policies is then mapped to applications.

    Notifications enable inter-AMC communication, via the Madeira platform, about

    events within the mesh network and also provide a means to logically connect distributed

    applications.

    The following groups of services are available in an AMC:

    The Configuration Management and Fault Management services contain the specific

    network management applications. They provide the ability to set up the network,

    react to faults, and perform other FM and CM related tasks. How these tasks

    are performed is described in the section dedicated to the scenario.

    The Northbound Interface is optional. It offers services that communicate with a

    higher layer Operation Support System (OSS) via Web Services. The OSS can, for

    example, retrieve information like network topology, events or alarms. More

    information on the connection with an external OSS will be given in the section

    Connecting to the North.

    The AMC Specific Services offer a base for the Network Management

    Applications. They mainly provide services to communicate with other AMCs. This can

    be either publish-subscribe based, or a direct peer-to-peer connection to another AMC.
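
    The publish-subscribe style of communication can be sketched with a small in-process bus. The NotificationBus class and the topic names are hypothetical. In Madeira the notifications would be carried between AMCs by the platform rather than by direct function calls.

```python
from collections import defaultdict


class NotificationBus:
    """Delivers published notifications to all subscribers of a topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Call every subscriber of the topic; return how many were notified."""
        callbacks = list(self._subscribers.get(topic, ()))
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


bus = NotificationBus()
received = []
bus.subscribe("alarm", received.append)   # e.g. the FM application of one AMC
delivered = bus.publish("alarm", {"node": "amc2", "text": "link lost"})
```

    The publisher does not need to know which AMCs are interested, which suits a network where nodes appear and disappear regularly.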

    CHAPTER 5

    THE MADEIRA SCENARIO

    The goal of the scenario is to prove the capabilities of the Madeira approach to

    deal with real life management problems. It provides a number of challenging

    tasks to test the management approach. To emphasize the strengths of Madeira,

    the problems that arise in the scenario are difficult to solve with traditional

    management approaches, especially with respect to dynamic reconfiguration and

    changing topologies that occur in Wi-Fi networks.

    The scenario focuses on the areas of Configuration Management and Fault

    Management, with an emphasis on the integration between both of them. In the

    scenario, a number of wireless base stations are deployed in such a way that

    wireless equipment (for example, laptops or PDAs) may have coverage from one

    or more base stations. Not every base station has a wired connection to the

    backhaul network, as is the case in a traditional wireless network. Base stations

    directly connected to the backhaul network are called gateways.

    Configuration management

    The rest of the base stations can only use a wireless connection to reach a gateway

    and thus the backhaul network. After deploying the base stations, Madeira

    automatically sets up the wireless meshed network using OLSR (Optimized Link

    State Routing protocol) as the routing algorithm. OLSR is a link-state routing

    protocol that is specifically developed for mobile ad-hoc networks. Based on

    pre-installed policies, base stations are grouped into a number of clusters by the

    Grouping Service. These policies can be based on a number of criteria such as, for

    example, number of nodes per cluster or topological proximity.
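
    A grouping policy that combines both criteria named above (a maximum number of nodes per cluster and topological proximity) could look like the sketch below. This is an illustrative greedy procedure, not the algorithm of the Madeira Grouping Service. The function name and the neighbour-map input are assumptions.

```python
from collections import deque


def group_into_clusters(neighbours, max_size):
    """Greedily group nodes into clusters of at most max_size neighbours.

    neighbours: dict mapping each node to the nodes it can reach directly.
    Clusters are grown breadth-first so that members are topologically close.
    """
    unassigned = set(neighbours)
    clusters = []
    for start in sorted(neighbours):
        if start not in unassigned:
            continue
        cluster, queue = [], deque([start])
        unassigned.discard(start)
        while queue:
            node = queue.popleft()
            cluster.append(node)
            for nb in sorted(neighbours[node]):
                # Only reserve a neighbour if the cluster still has room for it.
                if nb in unassigned and len(cluster) + len(queue) < max_size:
                    unassigned.discard(nb)
                    queue.append(nb)
        clusters.append(cluster)
    return clusters


# Hypothetical chain of five base stations: a - b - c - d - e.
mesh = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
clusters = group_into_clusters(mesh, max_size=2)
```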

    Figure 4 depicts an example topology that could be the result of this process. As

    can be seen, clusters may or may not have direct backhaul connectivity. The network

    elements in a cluster monitor each other and exchange management information on a

    peer-to-peer basis. As mentioned, wireless network equipment that wants to

    Figure 4: Management clusters formed in wireless Mesh Network

    Figure 5: Management Cluster Hierarchy

    use the network is in range of one or more base stations. If the wireless equipment is

    in range of more than one base station, it selects one of them as its preferred base

    station, and uses this connection to reach services on the Internet, for example. If the

    wireless equipment is in range of just one base station, it must select that base station

    as its preferred base station.

    Each cluster has exactly one Cluster Head. Policies are used to elect it, and can

    be based on criteria like load, optimal connectivity or robustness. The Cluster Head is

    responsible for coordination and topology publishing of its cluster. Different levels of

    clustering can exist in Madeira.
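
    A policy-driven Cluster Head election can be sketched as a weighted score over node metrics. The metric names, the weights, and the elect_cluster_head function are assumptions for illustration. The source only states that criteria such as load, connectivity, or robustness can be used.

```python
def elect_cluster_head(nodes, weights):
    """Pick the node with the highest weighted score.

    nodes: dict mapping node name to a dict of metrics.
    weights: dict mapping metric name to its weight; use a negative weight
             for metrics where lower is better (for example load).
    Ties are broken by node name so that every peer elects the same head.
    """
    def score(item):
        name, metrics = item
        total = sum(weight * metrics[metric] for metric, weight in weights.items())
        return (total, name)

    return max(nodes.items(), key=score)[0]


cluster = {
    "bs1": {"load": 0.8, "links": 3},
    "bs2": {"load": 0.2, "links": 2},
    "bs3": {"load": 0.5, "links": 4},
}
# Policy A favours well-connected nodes; policy B only looks at load.
head_a = elect_cluster_head(cluster, {"load": -1.0, "links": 0.5})
head_b = elect_cluster_head(cluster, {"load": -1.0, "links": 0.0})
```

    Changing the weights changes the outcome without touching the election code, which is the effect the policy-based approach aims for.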

    The creation of this hierarchy is also based on policies. As mentioned in the previous

    section dedicated to the architecture, the top level Cluster Head is responsible for

    publishing the topology of the complete network. This can be done to a higher layer

    Operation Support System or another Network Management System for example. The

    cluster hierarchy is the basic management overlay that is used by all management

    functionality in Madeira. It creates a scalable environment for network management.

    The Madeira Configuration Management application is responsible for the

    construction, maintenance and viewing of the topology. Other applications, such as

    Fault Management, use this management overlay to implement their functionality.

    Fault management

    During usage of the network, it is inevitable that unexpected faults occur. When such

    a fault occurs, it is important that appropriate action is undertaken quickly to

    reduce the service impact, and that meaningful information on the fault is presented

    to the operator (in particular in those cases where automatic restoration is not

    possible, or only partly possible). Besides Configuration Management (CM), Madeira

    focuses on Fault Management (FM) and on how CM actions and events are related to FM

    faults and alarms. Correlating CM events with FM faults is an important step in

    discovering the actual cause of a problem in the network.

    Alarms can be generated by two different sources:

    1. Hardware level alarms are generated by the base station in case of a hardware

    fault.

    2. Platform level alarms are generated by either the Directory Service or the CM

    application. The Directory Service can indicate loss of connection with a neighboring

    node, and the CM application can indicate changes in the topology (a node leaves or

    joins a cluster). When a fault occurs, the FM application will receive one or more

    alarms. For example, a hardware level problem may also cause a fault at platform

    level, creating two alarms. These alarms are correlated into a new alarm by FM and

    sent to the Cluster Head. This Cluster Head also performs correlation of the alarm

    with alarms originating from other nodes in order to get a clearer picture of the

    probable cause and possible solution. It can then forward the alarm to a higher

    hierarchy level. This process is repeated until it reaches the Top Level Cluster Head,

    which can notify the Northbound Interface in order to produce an alarm for the

    external OSS. This paper provides a few example scenarios in order to explain the

    basic concepts and functionality of the FM application in Madeira. These scenarios

    describe two faults with similar impact but very different in nature (in the first case a

    node goes down, while in the second there is just a loss of connectivity between two

    nodes), and focus on the way Madeira distinguishes these two cases by correlating

    alarms and CM events at different levels of the management hierarchy.
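
    The first correlation step, in which the FM application on a node folds a hardware
    level alarm and a platform level alarm into one new alarm before sending it to the
    Cluster Head, might look like the sketch below. The record layout and the rule
    "group by suspected element" are assumptions; the source does not describe the
    actual alarm format or correlation rules.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Alarm:
    suspect: str  # element believed to be faulty (hypothetical naming scheme)
    source: str   # "hardware" or "platform"


@dataclass(frozen=True)
class CorrelatedAlarm:
    reporter: str             # node whose FM application produced the alarm
    suspect: str
    sources: Tuple[str, ...]  # alarm levels that contributed, sorted


def correlate_local(reporter: str, alarms: Iterable[Alarm]) -> List[CorrelatedAlarm]:
    """Fold all raw alarms about the same suspect into one correlated alarm."""
    by_suspect: Dict[str, Set[str]] = {}
    for alarm in alarms:
        by_suspect.setdefault(alarm.suspect, set()).add(alarm.source)
    # One outgoing alarm per suspect, in order of first appearance.
    return [
        CorrelatedAlarm(reporter, suspect, tuple(sorted(sources)))
        for suspect, sources in by_suspect.items()
    ]
```

    With this grouping, a hardware alarm and a platform alarm about the same element
    leave the node as a single alarm that records both contributing levels.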

    Base Station E outage

    Figure 6: Base Station Outage

    Figure 6 depicts two Madeira Management Clusters, with node A and node G being

    the Cluster Heads. The solid lines indicate physical OLSR links between nodes. The

    dotted line represents an inter-cluster connection between node E and node F. When

    node E fails, the Directory Service of its one-hop neighbours D and F will notice this.

    Both nodes will notify their Cluster Heads that the link with node E has failed, and

    that therefore E might be faulty. Besides receiving this alarm from node D, Cluster

    Head A also receives a notification from the CM application that node E is no longer

    part of the cluster. It will then send an alarm with this knowledge to a higher

    hierarchy level.

    When the Cluster Head G of node F receives the alarm that node E is not reachable, it

    will also forward this alarm to a higher hierarchy level.

    After receiving the alarms from node A and G, the higher level Cluster Head tries to

    correlate the information with other alarms. Since both alarms contain the same

    knowledge (a possible fault of node E), they will be merged into a single alarm with the

    same content. Afterwards the alarm will be forwarded up the hierarchy pyramid until

    the Top Level Cluster Head notifies the NBI, which will inform the external OSS

    of the failure of node E.

    Link outage between node D and node E

    Figure 7: Link Outage

    In this scenario, shown in Figure 7, the link between node D and node E fails. The

    link between E and F remains intact. Both nodes D and E will receive a notification

    from their Directory Service indicating the neighboring node is no longer reachable.

    Node D concludes E might be faulty and forwards this information to its Cluster Head

    A, which also receives a notification from the CM application that E is no longer part

    of its cluster. After combining this information, it will send an alarm to the next

    hierarchy level, identical to the one in the previous scenario. Besides receiving the

    notification from the Directory Service, node E will recognize that it is no longer

    connected to its cluster head and it will join another cluster (it is assumed E joins the

    cluster containing F and G). After this reconfiguration process E will forward the

    information that D is no longer reachable to its new cluster head G, which will

    additionally receive a notification from the CM application that E has joined its

    cluster and forward this information to the next hierarchy level.

    The higher level Cluster Head of this next hierarchy level receives the alarm from A,

    indicating that node D reported that E is unavailable and probably faulty. It also

    receives a notification from the CM application indicating that node E has joined

    another cluster, and an alarm from G indicating that node E reported that D is

    unavailable and probably faulty. After correlating these three notifications, the Cluster

    Head concludes that a link outage between D and E occurred and either suppresses

    this alarm (if configured to do so) or forwards this knowledge as a minor alarm up the

    hierarchy pyramid until the Top Level Cluster Head and the NBI are reached; the NBI will

    then inform the external OSS of the link outage between D and E.
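
    The difference between the two scenarios can be made concrete with a small sketch
    of the correlation performed at the higher level Cluster Head. The rule used here
    is a simplification assumed for illustration: a suspected node that still reports
    alarms or re-joins a cluster is evidently alive, so the fault must lie in the link.
    The type and function names are hypothetical and do not come from Madeira.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Unreachable:
    """Alarm forwarded by a Cluster Head: `reporter` cannot reach `suspect`."""
    reporter: str
    suspect: str


@dataclass(frozen=True)
class Joined:
    """CM event: `node` has joined another cluster."""
    node: str


def classify(alarms: Iterable[Unreachable],
             cm_events: Iterable[Joined]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Correlate alarms and CM events into a sorted, duplicate-free list of findings."""
    alarms = list(alarms)
    # Assumption: a node that re-joins a cluster or raises an alarm is still up.
    alive = {event.node for event in cm_events} | {a.reporter for a in alarms}
    findings = set()
    for alarm in alarms:
        if alarm.suspect in alive:
            # Both endpoints are up, so only the link between them has failed.
            findings.add(("link outage", tuple(sorted((alarm.reporter, alarm.suspect)))))
        else:
            findings.add(("node outage", (alarm.suspect,)))
    return sorted(findings)


# Scenario 1: D and F both lose E, and E never reports or re-joins.
node_case = classify([Unreachable("D", "E"), Unreachable("F", "E")], [])

# Scenario 2: D and E lose each other, and E joins the cluster of F and G.
link_case = classify([Unreachable("D", "E"), Unreachable("E", "D")], [Joined("E")])
```

    In the first case the two alarms collapse into a single node outage finding for E;
    in the second case the three notifications collapse into a single link outage
    finding for the pair D and E, which the Cluster Head can then suppress or forward
    as a minor alarm.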

    CHAPTER 6

    FUTURE ENHANCEMENT

    Currently the project is focusing on the implementation and integration of the

    different components that encompass the Madeira solution. A prototype management

    system dealing with specific Configuration and Fault management scenarios of Wi-Fi

    networks will be developed and tested on top of a test bed provided by the partners.

    At the time of writing, several aspects have already been implemented. The release of

    a first prototype is planned for the beginning of this year. A second iteration

    demonstrating the main aspects of Madeira will be finished at the end of the project,

    in July 2006. Although it is not explicitly mentioned as being in the scope of Madeira,

    some efforts are being undertaken to research security mechanisms for peer-to-peer

    environments.

    In such environments, where there is no centralized security solution, security and

    trust are important aspects. A possible solution for this could be to use Public Key

    Cryptography to create a web-of-trust, similar to the PGP approach. This should not

    be confused with the FCAPS Security Management area, which focuses on, for example,

    authorized use of the network, data integrity and confidentiality.

    CHAPTER 7

    CONCLUSION

    The Madeira Management framework proposes the use of peer-to-peer techniques to

    fulfill management tasks. The ability to perform self-management and the usage of

    Management Clusters solve scalability issues that are present in current hierarchical

    network management approaches. So-called Adaptive Management Components, or

    AMCs, that run on network elements, are in charge of performing the various

    management tasks. By using the peer-to-peer interface, these AMCs can communicate

    with each other, creating a Management Overlay, in order to execute management

    tasks and make up network management applications.

    Furthermore, because of the peer-to-peer concept, Madeira does not need a

    central server. Besides reducing the operating expenses, this also eliminates the single

    point of failure that exists in traditional systems. Madeira also offers support for a

    higher layer Operation Support System (OSS). Such an OSS can access Madeira via

    the Northbound Interface. This Web Services based interface enables an OSS to

    acquire topology information, receive alarms, introduce policies and perform other

    management tasks. By using a publish/subscribe system for notifications from the

    network, Madeira can notify multiple external systems about events or alarms at the

    same time. In order to prove the feasibility of the Madeira approach, a challenging

    scenario has been identified, dealing with Configuration and Fault Management of

    highly dynamic wireless networks. Based on the Madeira framework, a

    Management System addressing this scenario will be prototyped and tested on a real

    test bed.
