DataPower Troubleshooting TSE-1116 Matthias David Siebler DataPower L3 Team Lead
7/23/2019 Impact2012_DataPower Troubleshooting.pdf
http://slidepdf.com/reader/full/impact2012datapower-troubleshootingpdf 1/34
Session Number: 1116
Title: Troubleshooting DataPower Appliances in the Field
Abstract: The WebSphere DataPower family of appliances has a
wealth of tools available for troubleshooting problems in the field.
However, putting all these tools together is a difficult task for even
experienced developers of the platform. Customers are often
overwhelmed by the volume of data and how to interpret the data
properly. This session will describe how to systematically approach
troubleshooting several common scenarios.
Track: SOA, Connectivity & Integration
Agenda
Introduction
Must-Gather & Error Reports
Packet Captures
Status Providers
Out-of-Memory (OoM)
Large Debug Logs
Advanced Techniques
Q&A
Summary
Introduction
Because DataPower is a closed system, the primary responsibility for problem determination lies with the
IBM support team.
  - Historically, little enablement of client self-diagnosis and repair has been part of the architecture for
    DataPower.
To facilitate improvements in problem determination, known as RAS (reliability, availability, serviceability), we are:
  - building our software to provide the data and information required to resolve problems when they occur:
    FFDC (First Failure Data Capture)
  - defining tools, best practices, and standards that allow the efficient analysis of problems within a product
    or solution by the system, customer, or IBM Support
  - analyzing problems to continually modify our processes and procedures to improve software quality and
    prevent problems from occurring.
Must Gather
http:55www)67.ibm.com5support5docview.wss8uid9swg7;7;<=>
Error reports contain most status providers
  - Some cannot fit into the error report due to size or time constraints
  - Error report content is continually being updated & improved
  - Reports can be useful even some time after the event
  - A status snapshot taken after the fact should be augmented by historical trends from before the event
Best practice is to have some minimal archives & trend graphs
  - But beware: Do not monitor the boxes to death!
  - The data is the same regardless of the access method: SNMP, CLI, webGUI, etc.
Error Report Analysis
Tools are available on the support website to help parse the data
  - IBM Support Assistant plugins
Audit log will have history of restarts
Grep for errors; keep a history of "expected" errors so they can be told apart from unexpected ones
  - e.g. load balancer health checks
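The grep-and-compare idea can be sketched in a few lines of Python. This is only an illustration: the "[error]" tag, the sample lines, and the "expected" pattern list are assumptions made for the example, not the appliance's documented log format.

```python
import re

# Hypothetical patterns for errors this environment treats as routine noise.
EXPECTED_PATTERNS = [
    re.compile(r"health[- ]?check", re.IGNORECASE),
]

def unexpected_errors(lines, expected=EXPECTED_PATTERNS):
    """Return lines tagged [error] that match none of the expected patterns."""
    result = []
    for line in lines:
        if "[error]" not in line:
            continue  # not an error-level message
        if any(p.search(line) for p in expected):
            continue  # known, routine error
        result.append(line)
    return result

# Made-up sample lines, for illustration only.
sample = [
    "[mpgw][error] load balancer health check: connection reset",
    "[mpgw][error] backend timeout on /orders",
    "[mpgw][info] transaction complete",
]
for line in unexpected_errors(sample):
    print(line)
```

Keeping the expected list in one place makes it easy to review it whenever a new "routine" error shows up.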
What to always know
Minimal: (every 5 minutes)
  - Memory
  - Load
  - Established TCP connections
Throttler status log target is an easy way to collect all this
Create a dedicated log target (file or syslog) to keep it from rotating away
Logs persist after a crash, but status provider data is lost
Know what is typical for your system
What Else?
All log files:
  - Additional logs not put into the error report will be in "logtemp"
  - Top level & for the specific domain(s), unless too many domains
  - Automated scripts to get files via CLI or SOMA are helpful to build in
    advance
  - Log files can be under "logstore" if using the log to RAID option
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <dp:request xmlns:dp="http://www.datapower.com/schemas/management">
      <dp:get-file name="logtemp:///default-log"/>
    </dp:request>
  </env:Body>
</env:Envelope>
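As a starting point for the automated scripts mentioned above, here is a minimal Python sketch that builds the same get-file envelope for an arbitrary file. The function name and the "logtemp:///default-log" file URL form are assumptions for illustration; sending the request to the appliance's XML management interface (endpoint, port, credentials) is deliberately left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
DP_NS = "http://www.datapower.com/schemas/management"

def build_get_file_request(file_url):
    """Return a SOMA get-file envelope (as a string) for the given file URL."""
    # Use the same prefixes as the hand-written request.
    ET.register_namespace("env", SOAP_NS)
    ET.register_namespace("dp", DP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{DP_NS}}}request")
    ET.SubElement(request, f"{{{DP_NS}}}get-file", {"name": file_url})
    return ET.tostring(envelope, encoding="unicode")

# Hypothetical file URL; adjust to the log you actually need.
print(build_get_file_request("logtemp:///default-log"))
```

Generating the envelope rather than hard-coding it keeps one script usable for every log file in the list.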
Status Provider
These provide information about the system
  - E.g. filesystem, environment sensor, domain status...
Information can be accessed through
  - WebGUI
  - CLI
  - SOMA
  - SNMP
Memory statistics
Show load
xi50(config)# show load
Task Name   Load   Work List   CPU   Memory   File Count
---------   ----   ---------   ---   ------   ----------
main        1      0           0     22       304
wtx         0      0           0     0        10
ssh         0      0           0     0        14
Show memory
xi50(config)# show mem
Memory Usage:      4 %
Total Memory:      636728 kilobytes
Used Memory:       2711 kilobytes
Free Memory:       281736 kilobytes
Requested Memory:  334870 kilobytes
Hold Memory:       73347 kilobytes
Memory Statistics
Usage log:
01014T1056Z [slm][debug] throttle(Throttler): tid(94):
Memory(57054/409898kB 87.107579 free) Pool(1041874)
Ports(1756/1850) Temporary-FS(4/4MB 9.56198 free) File(OK)
  - Memory: same as "Free Memory" from "show memory"
  - Pool: same as "Hold Memory" from "show memory"
  - Ports: number of free ports (internal structure; see "show connections")
  - File: generic test of all filesystem access (4 possible answers)
    - Cannot access filesystem due to low memory
    - Router has too many open files (may need to reload)
    - System has too many open files (may need to reboot)
    - other?
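To trend these values over time, the status message can be split into its named fields. The Python sketch below assumes a "Name(value)" layout and assumes the first number in the Memory field is the free amount; the sample numbers are made up for illustration and the function names are hypothetical.

```python
import re

# Assumed layout: a sequence of Name(value...) groups.
FIELD_RE = re.compile(r"(?P<name>[A-Za-z-]+)\((?P<body>[^)]*)\)")

def parse_throttler_status(line):
    """Split a Throttler status message into {field name: raw text in parentheses}."""
    return {m.group("name"): m.group("body") for m in FIELD_RE.finditer(line)}

def free_memory_kb(fields):
    """Return the first number of the Memory field (assumed to be free memory in kB)."""
    return int(fields["Memory"].split("/")[0])

# Made-up sample values, for illustration only.
sample = ("throttle(Throttler): tid(94): Memory(350000/4000000kB 8.75% free) "
          "Pool(100000) Ports(17000/18000) Temporary-FS(200/230MB 86.9% free) File(OK)")
fields = parse_throttler_status(sample)
print(free_memory_kb(fields), fields["File"])
```

Anything other than the normal File value is worth alerting on, since the remaining answers all point to resource exhaustion.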
Discrepancies
What to look for in the status providers:
  - "show tcp", "show connections" & "show handles"
  - All give slightly different results but roughly map one-to-one
  - If one is out of range by an order of magnitude, it could indicate an issue
"show load" vs. "show cpu"
  - Load is an instantaneous measure
  - CPU is averaged
  - Load can jump around a lot; however ...
    - Mismatch can indicate "spinning" ports
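The order-of-magnitude comparison is easy to automate once the three counts have been collected. The sketch below is a hedged illustration: the counts are invented, and comparing each value against the median with a factor of 10 is just one reasonable reading of "out of range by an order of magnitude".

```python
def out_of_range(counts, factor=10):
    """Return the names whose count differs from the median by at least `factor`."""
    values = sorted(counts.values())
    median = values[len(values) // 2]
    flagged = []
    for name, value in counts.items():
        high = max(value, median)
        low = max(min(value, median), 1)  # avoid division by zero
        if high / low >= factor:
            flagged.append(name)
    return flagged

# Made-up counts taken from the three status providers.
sample = {"show tcp": 410, "show connections": 395, "show handles": 5200}
print(out_of_range(sample))
```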
Packet capture
Why packet captures?
Packet Captures
Definitive answer to protocol interoperability.
Not necessarily the same as the Probe!
Now can capture on loopback, VLANs or all interfaces at once.
TCPdump format; viewable by Wireshark etc.
Packet Capture Filtering
Expression format should follow "pcap-filter(7)"
  - http://www.unix.com/man-page/FreeBSD/7/pcap-filter
  - e.g.
Supports basic and advanced filtering capabilities
Provides ability to filter on
  - IP address
  - Port
  - MAC address
  - and many other qualifiers
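A filter on the qualifiers listed above can be composed from the standard pcap-filter primitives "host", "port" and "ether host". The helper below is a small Python sketch with a hypothetical function name; the address is a documentation example, not a real endpoint.

```python
def build_capture_filter(host=None, port=None, mac=None):
    """Join the given qualifiers with 'and' into a pcap-filter expression."""
    parts = []
    if host is not None:
        parts.append(f"host {host}")
    if port is not None:
        parts.append(f"port {int(port)}")
    if mac is not None:
        parts.append(f"ether host {mac}")
    return " and ".join(parts)

# Example: traffic to or from one backend on the HTTPS port.
print(build_capture_filter(host="192.0.2.10", port=443))
```

Narrowing the capture this way keeps the capture file small enough to review comfortably in Wireshark.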
FFDC background packet capture
Always-on background packet capture has low overhead
Captures packets on all interfaces simultaneously
Capture automatically generated when
  - the system experiences an outage, such as a crash
  - user requested: Must-Gather operation
When FFDC triggers report generation
  - information is current
Enables a packet capture to be compressed and stored automatically
in an Error Report and optionally sent off-box
Service Probe
Multistep Probe shows the payload as it moves through the
processing policy
  - not meant to be on-the-wire
OoM
DataPower does not have virtual memory
  - Pro: performance
  - Con: 4GB is shared by all domains & transactions
First step is to determine trigger of the OoM event
  - Memory leak
  - Traffic spike
Historical graphs are necessary to determine root cause
  - can indicate correlation of memory increase to high load
  - can indicate correlation of memory increase to backend latency
Spikes:
  - An increase in traffic arriving at the device
  - An increase in delay at backends or in sidecalls
  - Can be detected if Throttle status log option is enabled
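The correlation check on historical data can be approximated with a plain Pearson coefficient. The Python sketch below is an illustration under stated assumptions: the samples are invented, the 0.8 threshold is arbitrary, and a real analysis would also compare memory against backend latency.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def likely_trigger(memory, load, threshold=0.8):
    """Rough first guess: memory that tracks load suggests a spike, steady growth a leak."""
    if pearson(memory, load) >= threshold:
        return "traffic spike"
    if memory[-1] > memory[0]:
        return "possible leak"
    return "inconclusive"

# Made-up 5-minute samples: memory used (%) and load (%).
print(likely_trigger([40, 42, 70, 90, 45, 41], [10, 12, 60, 85, 14, 11]))
print(likely_trigger([40, 45, 50, 55, 60, 65], [10, 12, 9, 11, 10, 12]))
```

This only gives a first hint; the trend graphs remain the authoritative evidence for root cause.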
Memory or other resource leaks
Generally must always have a baseline
What can be leaked?
  - Memory
  - File handles/sockets/file descriptors
  - Ports (slightly different from sockets)
  - Inodes (very rare)
Tracing must be turned on before the resource is leaked
Currently leak detection requires a reboot; development is planning for
always-on resource leak tracing
Memory logs
Each log message captures a snapshot at that time
  - Not cumulative; can go up & down
  - Not exhaustive; some actions or protocols can allocate memory outside
Added in 3.8.2: units are in bytes
10330116T330708 [memory-report][debug] mpgw(sender): tid(7686): Response
Finished: memory used 13487866
10303307T121128 [memory-report][debug] mpgw(stp-tp-mpgw):
tid(8000)[response][4.61.301.31]: Processing [Rule (stp-tp_policy_rule_3),
Action ('stp-tp_policy_rule_3_results_output_0', results()), Input(INPUT),
Output(NULL)] Finished: memory used 421
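Because each message is a snapshot rather than a running total, a useful summary is the peak value seen per transaction. The Python sketch below assumes each message sits on one line and contains "[memory-report]", "tid(N)" and "memory used N"; the sample lines are invented for illustration.

```python
import re

# Assumed single-line layout: ...[memory-report]... tid(N) ... memory used N
MEMORY_RE = re.compile(r"\[memory-report\].*?tid\((\d+)\).*memory used (\d+)")

def peak_memory_by_transaction(lines):
    """Map transaction ID to the largest 'memory used' snapshot (bytes) seen for it."""
    peaks = {}
    for line in lines:
        match = MEMORY_RE.search(line)
        if not match:
            continue  # not a memory-report message
        tid, used = int(match.group(1)), int(match.group(2))
        peaks[tid] = max(used, peaks.get(tid, 0))
    return peaks

# Made-up messages, for illustration only.
sample = [
    "[memory-report][debug] mpgw(sender): tid(7001): Rule step Finished: memory used 1200",
    "[memory-report][debug] mpgw(sender): tid(7001): Response Finished: memory used 800",
    "[memory-report][debug] mpgw(other): tid(7002): Response Finished: memory used 421",
    "[mpgw][info] unrelated message",
]
print(peak_memory_by_transaction(sample))
```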
Services Memory Status Provider
Leak reports
Always best to have a baseline
Tracing must be always on!
Active transactions can cause noise in the data capture; best to turn off
traffic if at all possible
  - Also try to capture 10-15% memory growth between snapshots
By default data is captured to FS; the snapshots can be large
CLI option is available if necessary; in some obvious cases it can be
sufficient
  - Shows top ten users
In some cases leak reports may not be enough; certain types of
memory allocations are not captured
Scalability concerns
SLM vs. Monitors
Determining scalability requires a methodical and sensible approach
Debugging can be surprisingly tricky
How much traffic is the box actually taking?
  - How many requests?
  - What kind?
  - How big?
  - What actions?
Message Count Monitors
Most accurate method for determining exactly how much traffic a service is processing
More lightweight than SLM
Does not have as many options & features
SLM
Shaping can be used to smooth traffic
Should not be used to hide a broken backend
  - Plan on shaping for a few seconds, not minutes
Reliability should be end-to-end, not hop-by-hop
There is no free lunch! (well maybe in Vegas)
Audit log
Polling for uptime is best practice for monitoring restarts
  - Except when the bit counter wraps
Monitoring the audit log is also useful
  - If the uptime goes down then the box has rebooted; otherwise it reloaded
  - But note this message:
1031013T0723 [eventlog][failure] (SYSTEM:default:*:*): Booting build 1080 on 1033/33/3 33:01:0 count 21. Uptime 720
  - Booting message w/ type failure is just an indication that the audit log has rotated,
    not that the box has restarted
Log CLI Trigger
Should have the ability to execute any CLI command or CLI script
Needs to match on Log Message ID and message text using an optional regular expression
Examples of ability to execute any CLI command or CLI scripts:
  - start a packet capture on a specific event
  - stop a packet capture on the next occurrence of the same event
  - perform a must-gather Error Report generation
Large Debug Logs
The goal is to capture all possible data.
Debug logging will do that, with a few exceptions:
  - RBM (optional)
  - webGUI (optional)
  - logging about logging (not possible).
In the default domain, create a new log target
  - type file
  - format text
  - timestamp numeric
  - archive rotate
  - event all debug
Maximum total is 50MB times 100 = 5 GB
  - Do we have that much?
  - Make sure the space is available!
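The sizing arithmetic above (file size times number of rotations) is worth scripting so it can be rechecked whenever the log target settings change. A minimal Python sketch, with hypothetical function names and a made-up free-space figure:

```python
def max_log_footprint_mb(file_size_mb, rotations):
    """Worst-case space used by a rotating file log target, in MB."""
    return file_size_mb * rotations

def fits(file_size_mb, rotations, free_mb):
    """True if the worst-case footprint fits in the available space."""
    return max_log_footprint_mb(file_size_mb, rotations) <= free_mb

# 50 MB per file, 100 rotations: 5000 MB, i.e. about 5 GB.
print(max_log_footprint_mb(50, 100))
# Hypothetical free space of 4000 MB: the target would not fit.
print(fits(50, 100, 4000))
```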
Large Debug Logs - Onbox
Best practice: log to RAID
Rotate; do not archive files
Better to pull via HTTP rather than push
If push must be used, FTP is the preferred approach
The file log target cannot rotate more than once per second
  - the log file must be large enough to hold more than 1 second's worth of data; otherwise you will certainly be losing messages.
Dropped messages are also reported in the log file: "Buffer Overflow: ... event(s) lost"
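The one-rotation-per-second limit translates into a minimum file size once the message rate is known. The Python sketch below is a rough estimate only: the message rate, average message size and safety factor are assumptions to be replaced with measured values.

```python
import math

def min_log_file_kb(messages_per_second, avg_message_bytes, safety_factor=2.0):
    """Smallest file size (KB, rounded up) that holds more than one second of messages.

    safety_factor must be greater than 1 so the file holds MORE than one second's worth.
    """
    if safety_factor <= 1:
        raise ValueError("safety_factor must be greater than 1")
    bytes_per_second = messages_per_second * avg_message_bytes
    return math.ceil(bytes_per_second * safety_factor / 1024)

# Hypothetical debug-level load: 5000 messages/s at roughly 300 bytes each.
print(min_log_file_kb(5000, 300))
```

If the computed size is close to the configured one, expect dropped-event messages under peak load.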
Large Debug Logs
Always check to make sure they work
  - Check on the log target status
  - Should have zero dropped events
Large Debug Logs - Offbox
Best practice
  - syslog over UDP
Using syslog-tcp may cause a bottleneck (if using firmware 4.0.2 or before)
  - DataPower opens many simultaneous connections
  - Can bring down some servers
Always set a static route to the syslog servers to force outbound traffic
over the correct interface
Adding a syslog log target is a lightweight addition to a busy box
Note: UDP syslog may truncate some longer messages
Support Resources
IBM WebSphere DataPower SOA Appliance Handbook
IBM Support Portal for DataPower
  - http://www.ibm.com/support/entry/portal/Overview/Software/WebSphere/WebSphere_DataPower_SOA_Appliances
developerWorks articles
WebCasts
Forum: https://www.ibm.com/developerworks/forums/forum.jspa?forumID=1198
User Groups: http://www.websphere.org/websphere/Site?page=ugdetail&groupId=165
Feedback?
Comments?
Copyright / Trademarks
IBM Corporation 2012. All Rights Reserved.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.