 DataPower Troubleshooting TSE-1116 Matthias David Siebler DataPower L3 Team Lead
Impact2012_DataPower Troubleshooting.pdf

Feb 17, 2018

Page 1: Impact2012_DataPower Troubleshooting.pdf

7/23/2019 Impact2012_DataPower Troubleshooting.pdf

http://slidepdf.com/reader/full/impact2012datapower-troubleshootingpdf 1/34

 

DataPower Troubleshooting TSE-1116

Matthias David Siebler 

DataPower L3 Team Lead

Session Number: 1116

Title: Troubleshooting DataPower Appliances in the Field

Abstract: The WebSphere DataPower family of appliances has a wealth of tools available for troubleshooting problems in the field. However, putting all these tools together is a difficult task for even experienced developers of the platform. Customers are often overwhelmed by the volume of data and how to interpret the data properly. This session will describe how to systematically approach troubleshooting several common scenarios.

Track: SOA, Connectivity & Integration

Agenda

Introduction
Must-Gather & Error Reports
Packet Captures
Status Providers
Out-of-Memory (OoM)
Large Debug Logs
Advanced Techniques
Q&A
Summary

Introduction

As a closed system, the primary responsibility for DataPower problem determination lies with the IBM support team.
 - Historically, little enablement of client self-diagnosis and repair has been part of the architecture for DataPower.

To facilitate improvements in problem determination, known as RAS, we are:
 - building our software to provide the data and information required to resolve problems when they occur - FFDC (First Failure Data Capture)
 - defining tools, best practices, and standards that allow the efficient analysis of problems within a product or solution by the system, customer, or IBM Support
 - analyzing problems to continually modify our processes and procedures to improve software quality and prevent problems from occurring.

Must Gather

http://www-01.ibm.com/support/docview.wss?uid=swg7;7;<=>

Error reports contain most status providers
 - Some cannot fit into the error report due to size or time constraints
 - Error report content is continually being updated & improved
 - Reports can be useful even some time after the event
 - Status snapshot after the fact should be augmented by historical trends before the event

Best practice is to have some minimal archives & trend graphs
 - But beware: Do not monitor the boxes to death!
 - All data is orthogonal to the method; via SNMP, CLI, webGUI, etc.

Error Report Analysis

Tools are available on the support website to help parse the data
 - I.B.M. Support Assistant plugins

Audit log will have history of restarts

Grep for errors; have a history of "expected" errors & unexpected
 - e.g. load balancer health checks
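
The "grep for errors" step could be sketched as a small filter that keeps a list of known, expected error patterns and reports only what is left. The bracketed `[category][level]` layout follows the log samples elsewhere in this deck; the sample lines, the pattern list, and the function name are invented for illustration.

```python
import re

# Hypothetical patterns for errors that are expected on this system,
# such as load balancer health checks.
EXPECTED_PATTERNS = [
    re.compile(r"health[- ]?check", re.IGNORECASE),
]

# Matches the "[category][level]" pair that follows the timestamp.
LEVEL_RE = re.compile(r"\[(?P<category>[^\]]+)\]\[(?P<level>[^\]]+)\]")


def unexpected_errors(lines, expected=EXPECTED_PATTERNS):
    """Return error-level lines that match none of the expected patterns."""
    result = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match is None or match.group("level") != "error":
            continue  # not an error-level message
        if any(pattern.search(line) for pattern in expected):
            continue  # a known, expected error
        result.append(line)
    return result


# Invented sample lines, not real appliance output.
sample = [
    "t1 [mpgw][error] mpgw(front): tid(1): health-check connection reset",
    "t2 [mpgw][error] mpgw(front): tid(2): backend connection refused",
    "t3 [mpgw][debug] mpgw(front): tid(3): request received",
]
print(unexpected_errors(sample))
```

Keeping the expected list in one place makes it easy to review what has been declared normal for a given box.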

What to always know

Minimal (every 5 minutes):
 - Memory
 - Load
 - Established TCP connections

Throttler status log target is an easy way to collect all this

Create a dedicated log target, file or syslog, to keep it from rotating away

Logs persist after a crash but status provider data is lost

Know what is typical for your system
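
One way to turn "know what is typical" into something checkable is to compare each new sample against the history already collected. The sketch below is only an illustration: the three-standard-deviation threshold, the function name, and the sample values are assumptions, not part of the session.

```python
from statistics import mean, pstdev


def is_atypical(history, latest, tolerance=3.0):
    """Flag `latest` when it lies more than `tolerance` standard
    deviations from the mean of the earlier samples."""
    if len(history) < 2:
        return False  # not enough data to say what is typical
    centre = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return latest != centre
    return abs(latest - centre) > tolerance * spread


# Invented five-minute samples of established TCP connections.
tcp_history = [120, 131, 118, 125, 129, 122]
print(is_atypical(tcp_history, 127))  # within the usual range
print(is_atypical(tcp_history, 900))  # far outside it
```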

What Else?

All log files:
 - Additional logs not put into the error report will be in "logtemp"
 - Top level & for the specific domain(s); unless too many domains
 - Automated scripts to get files via CLI or SOMA are helpful to build in advance
 - Log files can be under "logstore" if using the log to RAID option

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <dp:request xmlns:dp="http://www.datapower.com/schemas/management">
      <dp:get-file name="logtemp:default-log"/>
    </dp:request>
  </env:Body>
</env:Envelope>
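
A script of the kind recommended above could wrap this SOMA request. The sketch below only builds the request and decodes a reply; sending it to the appliance's XML management interface is left out. The `dp:file` response element carrying base64 content is an assumption about the reply format, and `build_get_file` and `extract_file` are illustrative names.

```python
import base64
import xml.etree.ElementTree as ET

ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
DP_NS = "http://www.datapower.com/schemas/management"

# Keep the same prefixes as the request shown on the slide.
ET.register_namespace("env", ENV_NS)
ET.register_namespace("dp", DP_NS)


def build_get_file(name):
    """Build a SOMA get-file request for the given file name."""
    envelope = ET.Element(f"{{{ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{ENV_NS}}}Body")
    request = ET.SubElement(body, f"{{{DP_NS}}}request")
    ET.SubElement(request, f"{{{DP_NS}}}get-file", {"name": name})
    return ET.tostring(envelope, encoding="unicode")


def extract_file(response_xml):
    """Decode the file content from a reply.

    Assumption: the reply carries the file as base64 text inside
    a dp:file element. Returns None when no such element exists.
    """
    node = ET.fromstring(response_xml).find(f".//{{{DP_NS}}}file")
    if node is None or node.text is None:
        return None
    return base64.b64decode(node.text)


print(build_get_file("logtemp:default-log"))
```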

Status Provider

These provide information about the system
 - E.g. filesystem, environment sensor, domain status…

Information can be accessed through
 - WebGUI
 - CLI
 - SOMA
 - SNMP

Memory statistics

Show load

xi50(config)# show load
 Task Name   Load   Work List   CPU   Memory   File Count
 ---------   ----   ---------   ---   ------   ----------
 main        1      0           0     22       304
 wtx         0      0           0     0        10
 ssh         0      0           0     0        14

Show memory

xi50(config)# show mem
 Memory Usage:     4 %
 Total Memory:     636728 kilobytes
 Used Memory:      2711 kilobytes
 Free Memory:      281736 kilobytes
 Requested Memory: 334870 kilobytes
 Hold Memory:      73347 kilobytes

Memory Statistics

Usage log:

6767<T76;GH [slm][debug] throttle(Throttler): tid(><): Memory(;K6;</<6>=>=kB =K.76K;K> free) Pool(76<7=K<) Ports(7K;G/7=;6) Temporary-FS(</<MB >.;G7>= free) File(OK)

 - Memory: same as "Free Memory" from "show memory"
 - Pool: same as "Hold Memory" from "show memory"
 - Ports: number of free ports (internal structure; "show connections")
 - File: generic test of all filesystems access (4 possible answers)
    - Cannot access filesystem due to low memory
    - Router has too many open files (may need to reload)
    - System has too many open files (may need to reboot)
    - other?
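
For trend graphs, the throttler line can be reduced to a few numbers. The sketch below is a hedged example: the field layout is inferred from the usage-log line above, reading `Memory(a/b kB)` and `Ports(a/b)` as free/total is an assumption, and the sample line uses invented values.

```python
import re

# Field layout inferred from the slide; treat it as an assumption.
THROTTLE_RE = re.compile(
    r"Memory\((?P<mem_free_kb>\d+)/(?P<mem_total_kb>\d+)kB"
    r"[ ,]*(?P<mem_free_pct>[\d.]+)%? free\)\s+"
    r"Pool\((?P<pool>\d+)\)\s+"
    r"Ports\((?P<ports_free>\d+)/(?P<ports_total>\d+)\)\s+"
    r".*?File\((?P<file_status>[^)]*)\)"
)


def parse_throttle(line):
    """Return the throttler fields as a dict, or None if no match."""
    match = THROTTLE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    status = fields.pop("file_status")
    parsed = {
        key: float(value) if "." in value else int(value)
        for key, value in fields.items()
    }
    parsed["file_status"] = status
    return parsed


# Invented sample line, not real appliance output.
sample_line = (
    "ts [slm][debug] throttle(Throttler): tid(46): "
    "Memory(3500000/4000000kB 87.5% free) Pool(100000) "
    "Ports(50/1000) Temporary-FS(4/4MB 100% free) File(OK)"
)
print(parse_throttle(sample_line))
```

Anything other than `OK` in `file_status` would correspond to one of the filesystem answers listed above.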

Discrepancies

What to look for in the status providers:
 - "show tcp"; "show connections" & "show handles"
 - All give slightly different results but roughly map one-to-one
 - If one is out-of-range by an order of magnitude, this could indicate an issue

"show load" vs. "show cpu"
 - Load is an instantaneous measure
 - CPU is averaged
 - Load can jump around a lot; however …
    - Mismatch can indicate "spinning" ports
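
The order-of-magnitude comparison can be sketched as a check of each count against the median of the three. The counts and the function name below are invented, and the factor of 10 simply encodes "an order of magnitude".

```python
from statistics import median


def outliers(counts, factor=10):
    """Return the names whose count is at least `factor` times
    larger or smaller than the median of all counts."""
    mid = max(median(counts.values()), 1)  # avoid comparing against zero
    return sorted(
        name
        for name, value in counts.items()
        if value >= factor * mid or value * factor <= mid
    )


# Invented totals taken from the three status providers.
counts = {"tcp": 410, "connections": 395, "handles": 5200}
print(outliers(counts))
```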

Packet capture

Why packet captures?

Packet Captures

Definitive answer to protocol interoperability.
Not necessarily the same as the Probe!
Now can capture on loopback, VLANs or all interfaces at once.
TCPdump format; viewable by Wireshark etc.

Packet Capture Filtering

Expression format should follow "pcap-filter(7)"
 - http://www.unix.com/man-page/FreeBSD/7/pcap-filter
 - e.g.

Supports basic and advanced filtering capabilities

Provides ability to filter on
 - IP address
 - Port
 - MAC address
 - and many other qualifiers
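
As an illustration of the qualifiers listed above, a filter expression can be assembled from the standard pcap-filter primitives `host`, `port`, and `ether host`. The helper name and the addresses are made up for the example; how the expression is handed to the appliance is not shown.

```python
def build_filter(host=None, port=None, mac=None):
    """Combine the given qualifiers into one pcap-filter expression."""
    parts = []
    if host is not None:
        parts.append(f"host {host}")
    if port is not None:
        parts.append(f"port {int(port)}")
    if mac is not None:
        parts.append(f"ether host {mac}")
    return " and ".join(parts)


# Documentation-range address and an invented MAC address.
print(build_filter(host="192.0.2.10", port=443))
print(build_filter(mac="00:11:22:33:44:55"))
```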

FFDC background packet capture

Always-on background packet capture has low overhead

Captures packets on all interfaces simultaneously

Capture automatically generated when
 - the system experiences an outage, such as a crash
 - user requested - Must-Gather operation

When FFDC triggers report generation
 - information is current

Enables a packet capture to be compressed and stored automatically in an Error Report and optionally sent off-box

Service Probe

Multistep Probe shows the payload as it moves through the processing policy
 - not meant to be on-the-wire

OoM

DataPower does not have virtual memory
 - Pro: performance
 - Con: 4GB is shared by all domains & transactions

First step is to determine trigger of the OoM event
 - Memory leak
 - Traffic spike

Historical graphs are necessary to determine root cause
 - can indicate correlation of memory increase to high load
 - can indicate correlation of memory increase to backend latency

Spikes:
 - An increase in traffic arriving at the device
 - An increase in delay at backends or in sidecalls
 - Can be detected if Throttle status log option is enabled
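
When the historical samples are available as numbers rather than graphs, the correlation mentioned above can be estimated directly. This is a minimal sketch using the Pearson coefficient; the series are invented, and a high coefficient only suggests where to look, it does not prove the cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / sqrt(var_x * var_y)


# Invented samples taken at the same polling times.
memory_pct = [40, 42, 45, 60, 75, 90]
load = [10, 11, 12, 30, 55, 80]
backend_latency_ms = [200, 210, 190, 205, 195, 200]

print(pearson(memory_pct, load))
print(pearson(memory_pct, backend_latency_ms))
```

In this made-up data memory tracks load closely and is nearly independent of backend latency, which would point toward a traffic spike.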

Memory or other resource leaks

Generally must always have a baseline

What can be leaked?
 - Memory
 - File handles/sockets/file descriptors
 - Ports (slightly different from sockets)
 - Inodes (very rare)

Tracing must be turned on before the resource is leaked

Currently leak detection requires a reboot; development is planning for always-on resource leak tracing

Memory logs

Each log message captures a snapshot at that time
 - Not cumulative; can go up & down
 - Not exhaustive; some actions or protocols can allocate memory outside

Added in .=.: units are in bytes

10330116T330708 [memory-report][debug] mpgw(sender): tid(7686): Response finished: memory used 13487866

10303307T121128 [memory-report][debug] mpgw(stp-tp-mpgw): tid(8000)[response][4.61.301.31]: Processing [Rule (stp-tp_policy_rule_3), Action ('stp-tp_policy_rule_3_results_output_0', results()), Input(INPUT), Output(NULL)] finished: memory used 421

Services Memory Status Provider

 

Leak reports

Always best to have a baseline

Tracing must be always on!

Active transactions can cause noise in the data capture; best to turn off traffic if at all possible
 - Also try to capture 10-15% memory growth between snapshots

By default data is captured to FS; the snapshots can be large

CLI option is available if necessary; in some obvious cases can be sufficient
 - Shows top ten users

In some cases leak reports may not be enough; certain types of memory allocations are not captured
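
The 10-15% guideline can be checked before taking the second snapshot. A small sketch, with illustrative function names and invented memory readings in kilobytes:

```python
def growth_percent(before_kb, after_kb):
    """Memory growth between two readings, as a percentage."""
    if before_kb <= 0:
        raise ValueError("baseline reading must be positive")
    return (after_kb - before_kb) * 100.0 / before_kb


def enough_growth(before_kb, after_kb, minimum=10.0):
    """True once growth has reached the lower end of the guideline."""
    return growth_percent(before_kb, after_kb) >= minimum


# Invented readings of used memory at the baseline and later.
print(growth_percent(1_000_000, 1_120_000))
print(enough_growth(1_000_000, 1_050_000))
```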

Scalability concerns

SLM vs. Monitors

Determining scalability requires a methodical and sensible approach

Debugging can be surprisingly tricky

How much traffic is the box actually taking?
 - How many requests?
 - What kind?
 - How big?
 - What actions?

Message Count Monitors

Most accurate method for determining exactly how much traffic a service is processing

More lightweight than SLM

Does not have as many options & features

SLM

Shaping can be used to smooth traffic

Should not be used to hide a broken backend
 - Plan on shaping for a few seconds; not minutes

Reliability should be end-to-end; not hop-by-hop

There is no free lunch! (well maybe in Vegas)

Audit log

Polling for uptime is best practice for monitoring restarts
 - Except when the bit counter wraps

Monitoring the audit log is also useful
 - If the uptime goes down then the box has rebooted; otherwise it reloaded
 - But note this message:

1031013T0723 [eventlog][failure] (SYSTEM:default:*:*): Booting build 1080 on 1033/33/3 33:01:0 count 21. Uptime 720

 - Booting message w/ type failure is just an indication that the audit log has rotated; not that the box has restarted
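
The counter-wrap caveat can be handled by comparing each poll with the value expected from the time elapsed since the previous one. The sketch assumes a 32-bit counter in hundredths of a second, as with SNMP TimeTicks; the slide does not state the counter width, and the function name and tolerance are illustrative.

```python
# Assumption: uptime is a 32-bit tick counter (hundredths of a second).
COUNTER_MODULUS = 2 ** 32


def looks_restarted(previous, current, elapsed, tolerance=500):
    """Return True when `current` cannot be explained by `elapsed`
    ticks passing since `previous`, allowing for counter wrap."""
    expected = (previous + elapsed) % COUNTER_MODULUS
    drift = (current - expected) % COUNTER_MODULUS
    drift = min(drift, COUNTER_MODULUS - drift)
    return drift > tolerance


five_minutes = 30_000  # ticks between polls

# Uptime dropped and the wrap does not explain it: a restart.
print(looks_restarted(1_000_000, 12_000, five_minutes))
# Uptime dropped only because the counter wrapped: no restart.
print(looks_restarted(2 ** 32 - 10_000, 20_000, five_minutes))
```

Whether a detected restart was a reboot or a reload still has to be read from the audit log.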

Log CLI Trigger

Should have the ability to execute any CLI command or CLI script

Needs to match on Log Message ID and message text using an optional regular expression

Examples of ability to execute any CLI command or CLI scripts:
 - start a packet capture on a specific event
 - stop a packet capture on the next occurrence of the same event
 - perform a must-gather Error Report generation

Large Debug Logs

The goal is that we want to capture all possible data.

Debug logging will do that; with some few exceptions:
 - RBM (optional)
 - webGUI (optional)
 - logging about logging (not possible).

In the default domain; create a new log target
 - type file
 - format text
 - timestamp numeric
 - archive rotate
 - event all debug

maximum total is 50MB times 100 = 5 GB
 - Do we have that much?
 - Make sure the space is available!

Large Debug Logs - Onbox

Best practice: log to RAID

Rotate; do not archive files

Better to pull via HTTP rather than push

If push must be used FTP is the preferred approach

The file log target cannot rotate more than once per second
 - minimum size of the log file should be able to contain more than 1 second's worth of data; otherwise you will certainly be losing messages.

Dropped messages are also in the log file: Buffer Overflow: "event(s) lost"
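
The sizing rules for file log targets reduce to two small calculations: total disk space is file size times the number of rotations, and each file must hold more than one second of log data. The message rate and average message size below are invented inputs, and the function names are illustrative.

```python
def total_space_mb(file_size_mb, rotations):
    """Disk space needed when every rotated file is full."""
    return file_size_mb * rotations


def min_file_size_bytes(messages_per_second, avg_message_bytes):
    """Smallest file size that holds more than one second of data,
    since the target cannot rotate more than once per second."""
    return messages_per_second * avg_message_bytes + 1


print(total_space_mb(50, 100))          # the 50MB times 100 case, in MB
print(min_file_size_bytes(20000, 300))  # invented debug logging rate
```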

Large Debug Logs

Always check to make sure they work
 - Check on the log target status
 - Should have zero dropped events

Large Debug Logs - Offbox

Best practice
 - syslog over UDP

Using syslog-tcp may cause bottleneck (if using firmware 4.0. or before)
 - DataPower opens many simultaneous connections
 - Can bring down some servers

Always set a static route to the syslog servers to force outbound traffic over the correct interface

Adding a syslog log target is a lightweight addition to a busy box

Note: UDP syslog may truncate some longer messages
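
To see which messages are at risk over UDP, their encoded length can be compared with a size limit. The slide does not say where truncation happens; the sketch uses the 1024-byte packet limit from RFC 3164 as an assumed threshold, and the function name and messages are invented.

```python
# Assumption: RFC 3164 packet limit; the actual truncation point on
# the appliance or the receiving server may differ.
ASSUMED_LIMIT_BYTES = 1024


def at_risk_of_truncation(messages, limit=ASSUMED_LIMIT_BYTES):
    """Return the messages whose UTF-8 encoding exceeds `limit` bytes."""
    return [m for m in messages if len(m.encode("utf-8")) > limit]


# Invented messages: one short, one far longer than the limit.
messages = ["short message", "x" * 2000]
print(len(at_risk_of_truncation(messages)))
```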

Support Resources

IBM WebSphere DataPower SOA Appliance Handbook

IBM Support Portal for DataPower
 - http://www.ibm.com/support/entry/portal/Overview/Software/WebSphere/WebSphereDataPowerSOAAppliances

developerWorks articles

WebCasts

Forum: https://www.ibm.com/developerworks/forums/forum.jspa?forumID=77>=

User Groups: http://www.websphere.org/websphere/Site?page=ugdetail&groupId=7G;

Feedback?

Comments?

Copyright / Trademarks

IBM Corporation 2012. All Rights Reserved.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.