Page 1

11.09.2008 Nagios for service monitoring in GSM-based networks at T-Mobile

NETWAYS Nagios Conference 2008

Using Nagios for service monitoring in GSM-based T-Mobile networks

Page 2

Using Nagios for service monitoring in GSM-based networks at T-Mobile

Introducing Network, Service and Host Management for T-Mobile European Service Operation Centre International Roaming

Frank März


Christian Hirsch


Page 3

T-Mobile ESOC IR

European Service Operation Centre for International Roaming

• Started in 1993, when international roaming was introduced together with Italy
• Today managing roaming services for
  • T-Mobile Deutschland
  • T-Mobile Austria
  • T-Mobile UK
  • T-Mobile Netherlands
  • T-Mobile Czech Rep.
  and supporting T-Mobile national companies in Poland, Slovakia, Croatia, USA, Hungary
• Core team (17) based in Nuremberg, Germany

Page 4

T-Mobile ESOC IR

Tasks

• IREG (GSM Association IR Expert Group)
  • Testing new roaming partners for any type of service
  • Voice roaming, prepaid roaming, data roaming, WLAN, MMS interworking
  • Network troubleshooting
• Roaming Engineering
  • Introducing new roaming and inter-working services
  • Active network testing
  • Network monitoring
• Service Interface Desk
  • Interface desk for roaming partners and carriers
  • Technical support for customer care
  • SIM card management
  • Reporting

Page 5

The most international Nagios implementation

T-Mobile uses 3 Nagios installations to monitor

205 countries in the world

530 foreign networks

every 5 minutes!

Page 6

T-Mobile ESOC IR service monitoring philosophy

• Layer 1: Connectivity (NAGIOS)
  • Between the IP core networks for all packet service roaming partners
  • Between all CS (voice) roaming partners
  • Towards all used equipment
• Layer 2: Performance (partly NAGIOS)
  • Service confirmation
  • Performance data capturing
• Layer 3: Verification
  • Performance data analysis

Page 7

Layer 1 - Connectivity

These active tests are executed at short intervals, so an outage can be recognized immediately.

• In a controlled environment (10%):
  • by standard network management tools (e.g. PING)
• In an uncontrolled environment (90%):
  • by simulated “user” traffic (e.g. SMTP-Mail-From)
  • by simulated “control” traffic (e.g. GTP-Echo)

Ensuring connectivity for service availability:

“Connectivity is the basis for every IT service”
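A simulated "user" traffic check of the SMTP-Mail-From kind could look roughly like the sketch below. It is only an illustration and not the plugin used at T-Mobile: the function names, the mapping of SMTP reply codes to Nagios states and the default port are assumptions. Only the standard Nagios plugin return codes (0 = OK, 1 = WARNING, 2 = CRITICAL) and Python's standard `smtplib` calls are relied on.

```python
import smtplib

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(code: int):
    """Map an SMTP reply code to a Nagios state (assumed mapping)."""
    if 200 <= code < 300:
        return OK, "OK"
    if 400 <= code < 500:          # temporary failure reported by the server
        return WARNING, "WARNING"
    return CRITICAL, "CRITICAL"


def check_smtp_mail_from(host: str, sender: str, port: int = 25,
                         timeout: float = 10.0):
    """Open an SMTP dialog and stop after MAIL FROM; no message is sent."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.ehlo_or_helo_if_needed()
            code, _reply = smtp.mail(sender)
    except (OSError, smtplib.SMTPException) as exc:
        # No connection or a broken dialog means connectivity is lost.
        return CRITICAL, f"SMTP CRITICAL - {exc}"
    status, label = classify(code)
    return status, f"SMTP {label} - MAIL FROM answered with {code}"
```

A plugin wrapper would print the returned message and exit with the returned status, which is how Nagios learns the state of the service.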

Page 8

Layer 2 - Performance monitoring

• Checking system function
  • Does the system provide the service it offers? (e.g. does a DNS server respond to a DNS request?)
• Requesting status information
  • Utilizes network management protocols to gather status information (load, temperature, disk usage)
• Using real user data traffic
  • Capture user traffic and check whether it is correct (protocol analyzer)

Page 9

Layer 3 - Traffic Verification

• Compare performance results over a period of time
  • Different values may indicate a load or bottleneck issue (e.g. compare Round Trip Time values)
• Look at complete call details for a single user
  • Filter for a single user connection in order to find problems on the bit level
• Run statistical analysis on captured network traffic
  • Utilize captured user data for statistical analysis in order to measure success rates and performance (e.g. Create PDP Context Reply Rate)
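The two kinds of analysis named above, a reply rate and an RTT comparison over time, can be sketched as follows. The function names, the input shapes and the factor of 1.5 are assumptions chosen for illustration; the slides do not say how the figures were actually computed.

```python
from statistics import mean


def reply_rate(requests_sent: int, replies_received: int) -> float:
    """Percentage of requests (e.g. Create PDP Context) that got a reply."""
    if requests_sent == 0:
        return 0.0
    return 100.0 * replies_received / requests_sent


def rtt_degraded(baseline_ms, current_ms, factor: float = 1.5) -> bool:
    """True when the mean RTT of the current window exceeds the mean of
    the baseline window by more than `factor` (an assumed threshold)."""
    return mean(current_ms) > factor * mean(baseline_ms)
```

For example, `reply_rate(200, 190)` gives 95.0, and a jump from roughly 41 ms to roughly 88 ms mean RTT would be flagged by `rtt_degraded`.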

Page 10

History of Nagios at T-Mobile ESOC IR

2003
• T-Mobile ESOC IR started testing Nagios with DNS and SNMP checks

2004
• GTP (GPRS Tunnelling Protocol) plugin for Nagios allowed us to simulate a GSM core node (SGSN)

2005
• Support contract with NETWAYS
• Introduced Nagios Grapher
• Including server monitoring
• NRPE design / start of rollout to other T-Mobile networks

2006
• Integrated gateway into SS7 network together with Telesoft Technologies (UK)
• KPI performance monitoring reporting

2007
• International rollout for SS7 gateways

2008
• Nagios 3 on virtual XEN environment

Page 11

Nagios – the perfect match for connectivity checks

Active checks:
• Connectivity check
• Retrieving network data

This requires a solution which is capable of:
• Checking connectivity
• Retrieving network data
• Scheduling these tasks
• Presenting the results and forwarding performance data to other systems
• Sending alarms to external systems

• Very powerful
• Extremely flexible
• It may be complex to manage and likely very expensive.

Not with Nagios

Page 12

Nagios T-Mobile for IP (Nagios Master)

[Diagram: Nagios TMO IP (nagios-master) and NRPEs in local T-Mobile backbone networks]

• IP connectivity monitoring for GPRS / 3G
• Checking MMS interworking (SMTP dialogs towards MMS Centers)
• WLAN roaming (RADIUS authentication)
• Central Nagios server with access to NRPEs in IP core networks in Germany, UK, Netherlands, Austria

Page 13

Nagios TMD

System Monitoring with Nagios

• Routine system health checks such as:
  • hardware health
  • ping
  • load
  • ssh
  • disk space
  • services
  • …
• Performance / capacity monitoring:
  • router traffic
  • RTTs
  • route availability
  • …

Page 14

Nagios T-Mobile SS7

Connectivity checks for voice roaming

• Central Nagios server triggers MAP dialogs on the Telesoft Technologies application server, which runs NRPE
• The application server opens the MAP dialog in the local T-Mobile network

[Diagram: NAGIOS, SS7]

Page 15

Summary of all used Nagios service checks for GSM networks

• Nagios checks “everything” every 5 minutes: over 250,000,000 checks a year
• Connectivity checks for GSM networks
  • Packet roaming: “GTP Echo”
  • MMS interworking: “SMTP dialog”
  • CS roaming: “MAP dialogs”
  • WLAN roaming: RADIUS authentication
• Performance
  • BGP routes to roaming partners
  • BGP peer status to neighbors
  • Interface status for physical links
  • Link usage
  • ftp/sftp connections
  • Server load, users, temperature, disk usage, RAID status, power supply, fans, zombie processes
  • Running processes
  • Log-in (ssh, telnet)
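As a rough cross-check of the yearly figure, the arithmetic below derives how many service checks a 5-minute cycle implies. It assumes that every check runs exactly once per cycle, all year round; the resulting service count is an inference from the slide's numbers, not a figure stated by the authors.

```python
CHECK_INTERVAL_MIN = 5
CHECKS_PER_YEAR_TOTAL = 250_000_000   # "over 250.000.000 checks a year"

# One service checked every 5 minutes: 12 per hour, 24 hours, 365 days.
checks_per_service_per_year = (60 // CHECK_INTERVAL_MIN) * 24 * 365

# Number of distinct service checks this total would correspond to.
implied_services = CHECKS_PER_YEAR_TOTAL / checks_per_service_per_year

print(checks_per_service_per_year)   # 105120
print(round(implied_services))       # 2378
```

Under these assumptions the three installations together would run on the order of 2,400 service checks per cycle.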

Page 16

Technical Realization

Christian Hirsch

PART 2

Page 17

Technical Realization

Special Plugin Design

Page 18

GPRS / 3G Roaming network environment

[Diagram labels: T-Mobile IP network, Nagios, NRPE, DNS, Border Gateway (BG), GRX1, GRX2, Peering Exchange, MNO B IP network, Local tail, CPE, GGSN, DNS; message flow: DNS.req / DNS.res, GTP-Echo.req / GTP-Echo.res]

How check_ggsn works:

• Nagios acts like a GSM network node (SGSN)
• It uses the DNS protocol to resolve the APN (access point name) for IR partners
• The DNS responds with the IP address of the roaming partner's home GGSN
• The NRPE sends a GTP Echo towards the GGSN IP address
• If the GGSN responds, the connectivity is OK
• RTT is displayed in Nagios Grapher; RTT indicates backbone bottlenecks
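The check_ggsn steps (resolve the APN, send a GTP Echo, measure the RTT) can be sketched in Python as below. This is not the original plugin, whose source is not shown in the slides. It assumes GTP version 1 on the GTP-C UDP port 2123, uses the system resolver for the APN lookup, and all function names are hypothetical.

```python
import socket
import struct
import time

GTP_C_PORT = 2123                  # GTPv1 control plane (assumed version)
ECHO_REQUEST, ECHO_RESPONSE = 1, 2


def build_echo_request(seq: int) -> bytes:
    """GTPv1 Echo Request: flags 0x32 (version 1, GTP, sequence present),
    message type 1, length 4 (sequence, N-PDU, next extension), TEID 0."""
    return struct.pack("!BBHIHBB", 0x32, ECHO_REQUEST, 4, 0,
                       seq & 0xFFFF, 0, 0)


def is_echo_response(data: bytes, seq: int) -> bool:
    """True if data is a GTPv1 Echo Response carrying our sequence number."""
    if len(data) < 12:
        return False
    flags, msg_type, _len, _teid, rseq = struct.unpack("!BBHIH", data[:10])
    return (flags >> 5) == 1 and msg_type == ECHO_RESPONSE and rseq == seq


def check_ggsn(apn_fqdn: str, seq: int = 1, timeout: float = 5.0):
    """Return (nagios_status, message); status 0 = OK, 2 = CRITICAL."""
    try:
        ggsn_ip = socket.gethostbyname(apn_fqdn)     # DNS.req / DNS.res
    except OSError as exc:
        return 2, f"GGSN CRITICAL - cannot resolve {apn_fqdn}: {exc}"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_echo_request(seq), (ggsn_ip, GTP_C_PORT))
            data, _addr = sock.recvfrom(4096)        # GTP-Echo.res
        except OSError as exc:
            return 2, f"GGSN CRITICAL - no echo from {ggsn_ip}: {exc}"
        rtt_ms = (time.monotonic() - start) * 1000.0
    if not is_echo_response(data, seq):
        return 2, f"GGSN CRITICAL - unexpected reply from {ggsn_ip}"
    # Text after '|' is Nagios performance data, usable for RTT graphs.
    return 0, f"GGSN OK - {ggsn_ip} answered GTP echo | rtt={rtt_ms:.1f}ms"
```

The RTT is emitted as performance data so that a grapher can plot it over time, which matches the use of RTT as a backbone bottleneck indicator described above.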

Page 19

Voice roaming network environment

[Diagram labels: NAGIOS, signalling gateway, MNO A, SS7 carriers, MNO B]

• NAGIOS interacts with an SS7 gateway which “speaks” GSM MAP (3GPP TS 29.002)
• The gateway was designed by T-Mobile and Telesoft Technologies
• This allows Nagios to simulate GSM functions such as registering to a network or initiating calls or SMS

Page 20

Technical Realization

Nagios 3 on virtual XEN environment

Page 21

Nagios 3 on virtual XEN environment

• Reduced hardware costs
• High availability
• Minimized downtime during scheduled maintenance
• Easy backups
• Reduced power consumption and need for cooling (GREEN IT)

[Diagram: virtual machines nagios-tmd, nagios-master, nagios-ss7]

Page 22

Physical Node Design

Two identical HP ProLiant DL380 G5 nodes (XEN Dom0.0 and XEN Dom0.1), each with:

CPUs  2x Intel Xeon 5160 Dual Core, 3.0 GHz
RAM   8 GB SDRAM
NICs  eth0, eth1, eth2, eth3, eth4, eth5
HD    6 * 146 GB

Partition table (identical on both nodes):

/dev/cciss/c0d0p1  100 MB     /boot
/dev/cciss/c0d0p2  48 GB      /
/dev/cciss/c0d0p3  16 GB      swap
/dev/cciss/c0d0p4  extended
/dev/cciss/c0d0p5  618.76 GB  LVM

Page 23

Virtual Disk Design

[Diagram: two nodes, m4nxhpsrm121 and m4nxhpsrm122, shown in layers]
Physical devices: eth0, eth1, eth2, Bond 0, VLAN, crosslink between the nodes
Physical volumes: /dev/cciss/c0d0p5 (618.76 GB) on each node
Volume groups: /dev/VirtualDomains on each node
Logical volumes: devel, nagios-tmd, nagios-master, nagios-ss7 on each node
Mirrored logical volumes (DRBD resources): drbd1, drbd5, drbd6, drbd7

Page 24

Virtual Network Design

[Diagram: two nodes, m4nxhpsrm121 and m4nxhpsrm122, shown in layers]
Physical layer: eth0, eth1, eth2, Bond 0, VLAN, crosslink (DRBD sync / XEN live migration)
Xen bridges: xenbr0 and virbr0 on each node
Virtual layer: devel, nagios-tmd, nagios-master, nagios-ss7, with guest interfaces eth0 and eth1

Page 25

Nagios 3 on virtual XEN environment

Virtual Node Design

Guest          DRBD resource  NICs        HD     RAM      CPUs
nagios-tmd     drbd5          eth0, eth1  15 GB  1024 MB  2
nagios-master  drbd6          eth0, eth1  15 GB  1024 MB  2
nagios-ss7     drbd7          eth0, eth1  15 GB  1024 MB  2

Partition table (identical on all three guests):

/dev/xvda1  100 MB  /boot
/dev/xvda2  2 GB    swap
/dev/xvda3  13 GB   /
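For orientation, a guest with these parameters might be described by an xm-style Xen domain configuration like the sketch below (such files use Python syntax). Only the guest name, RAM, CPU count, the drbd6 device and the bridge names come from the slides. The bootloader, the `phy:` disk path and the order of the bridges are assumptions, and the actual configuration used at T-Mobile is not shown in the talk.

```python
# Hypothetical xm-style domU definition for the nagios-master guest.
name = "nagios-master"
memory = 1024                      # MB, as on the slide
vcpus = 2                          # as on the slide

# Assumption: the guest boots its own kernel from /boot via pygrub.
bootloader = "/usr/bin/pygrub"

# Assumption: the mirrored DRBD device backs the guest's xvda disk.
disk = ["phy:/dev/drbd6,xvda,w"]

# Two guest NICs (eth0, eth1); which bridge serves which NIC is assumed.
vif = ["bridge=xenbr0", "bridge=virbr0"]
```

Keeping the guest disk on a DRBD-mirrored volume is what makes it possible to start or migrate the guest on either physical node.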

Page 26

Any Questions?

“Now it's time for a live demo…”