Integrating kdump into oVirt 3.5
Martin Peřina, Software Engineer at Red Hat
August 26th, 2014
Agenda
● Motivation
● What is kdump?
● What is fence_kdump?
● How is it all coupled together?
● Configuration
● Future features
Host kernel crash on oVirt <= 3.4:
1. The host kernel crashes and the process that gathers crash information starts (this process can take a long time)
2. After some time, the engine detects the host as non-responsive and executes fencing on it
3. If the host is fenced while crash information is being gathered, all crash information is lost
Goal for oVirt 3.5
● Try to detect whether the host is in the kdump flow before fencing is executed
● If the host is in the kdump flow, do not execute fencing and wait for the host to finish gathering its crash information
What is kdump?
● kexec-based kernel crash dumping mechanism (when the standard kernel crashes, a capture kernel is booted)
● dumps memory content of crashed kernel into file on local or remote target
● dumping is executed from capture kernel, crashed kernel memory is preserved
● capture kernel needs reserved memory in standard kernel
How does kdump work?
1. Standard kernel crashes
2. Kexec boots capture kernel
3. Memory dump is executed in capture kernel
4. Memory dump file is stored to specified target
5. Host is rebooted
Kdump configuration
kdump configuration is stored in:
● /etc/kdump.conf
  ● static configuration that can be changed by administrator
● capture kernel initial ramdisk file
  ● created from /etc/kdump.conf on kdump service restart
Sample kdump.conf
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
Kdump requirements
● kexec-tools package, which contains the tools to set up and execute kdump
● crashkernel=MEM_SIZE command line parameter needs to be configured for the standard kernel (enabled by default on RHEL/CentOS; on Fedora the administrator has to enable it)
● kdump service has to be enabled
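The crashkernel requirement can be checked from the kernel command line. The following is a minimal sketch, assuming the command line is read from /proc/cmdline; the function names are hypothetical and not part of kexec-tools or oVirt.

```python
def crashkernel_value(cmdline):
    """Return the value of the crashkernel= parameter, or None if it is absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token[len("crashkernel="):]
    return None


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a Linux host, `crashkernel_value(read_kernel_cmdline())` returns None when no memory is reserved for the capture kernel.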
What is fence_kdump?
● set of command line tools for receiving messages from a dumping host on another, predefined host
● part of fence-agents-kdump package
● it uses UDP protocol for messaging
● it uses port 7410 (can be changed)
● it sends messages every 10 seconds (can be changed)
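The messaging model described above (plain UDP datagrams, sender known only by source address) can be illustrated on the loopback interface. This is only a sketch: the payload is a placeholder, not the real fence_kdump message format, and the helper names are hypothetical. Port 0 lets the operating system pick a free port for the demo; the real tools default to 7410.

```python
import socket


def open_listener(address="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 asks the OS for any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((address, port))
    sock.settimeout(2.0)  # do not block forever if nothing arrives
    return sock


def send_message(dest, payload=b"kdump-placeholder"):
    """Send one datagram; the payload is a placeholder, not the real format."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        sender.sendto(payload, dest)


def receive_one(sock):
    """Return (source IP, payload) of the next datagram."""
    data, (source_ip, _source_port) = sock.recvfrom(1024)
    return source_ip, data
```

The receiver learns nothing about the sender beyond the source IP address, which is why the sender cannot be authenticated.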
Kdump and fence_kdump
/etc/kdump.conf contains two options to set up fence_kdump:
● fence_kdump_nodes
  ● list of hosts to send messages to
● fence_kdump_args
  ● additional parameters for fence_kdump_send
kdump.conf with fence_kdump
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
fence_kdump_nodes mperina.brq.redhat.com
fence_kdump_args -p 7410 -i 5
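A short sketch of how such a file can be read: each non-comment line is an option name followed by its value. The parser below is illustrative only (the function name is hypothetical) and treats lines starting with # as comments.

```python
def parse_kdump_conf(text):
    """Map each option name in kdump.conf-style text to its value string."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        options[parts[0]] = parts[1] if len(parts) > 1 else ""
    return options


SAMPLE = """\
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
fence_kdump_nodes mperina.brq.redhat.com
fence_kdump_args -p 7410 -i 5
"""

conf = parse_kdump_conf(SAMPLE)
nodes = conf["fence_kdump_nodes"].split()  # hosts are separated by spaces
```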
fence_kdump limitations
● fence_kdump destination host(s) have to be predefined and they are part of the capture kernel initial ramdisk
● fence_kdump receiver can detect kdumping for only one host at a time, and it cannot detect that a host has finished kdumping
● fence_kdump messages are sent unencrypted using UDP protocol
● fence_kdump messages are not signed; the sender can be identified only by its source IP address
New fence_kdump listener
● new standalone fence_kdump listener was implemented as a part of oVirt kdump integration
● it can receive messages from multiple kdumping hosts at once
● it can determine that host finished kdumping using timeout from last received message
● it communicates with engine using engine database
● it's executed as a service on the same host as engine
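The idea of tracking several hosts at once and deciding that a host has finished by a timeout since its last message can be sketched as follows. This is not the listener's actual implementation: the class, its methods, and the DUMPING and UNKNOWN status names are assumptions made for illustration, and timestamps are passed in explicitly to keep the example deterministic.

```python
class KdumpSessions:
    """Track the last message time per host and expire silent hosts."""

    def __init__(self, finished_timeout=30.0):
        self.finished_timeout = finished_timeout
        self.last_seen = {}    # host IP -> time of the last received message
        self.finished = set()  # hosts whose kdump flow is considered finished

    def message_received(self, host_ip, now):
        self.finished.discard(host_ip)
        self.last_seen[host_ip] = now

    def expire(self, now):
        """Mark hosts silent for longer than the timeout as finished."""
        done = [ip for ip, seen in self.last_seen.items()
                if now - seen > self.finished_timeout]
        for ip in done:
            del self.last_seen[ip]
            self.finished.add(ip)
        return sorted(done)

    def status(self, host_ip):
        if host_ip in self.last_seen:
            return "DUMPING"
        if host_ip in self.finished:
            return "FINISHED"
        return "UNKNOWN"
```

A real listener would call `expire` periodically and persist the result, since the engine reads the session state from the database.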
Integration – host deploy 1/3
● kdump integration can be enabled for each host by setting an option in the Power Management tab of the Host detail popup in webadmin
● host needs to be redeployed after kdump integration is enabled
● kdump integration is not bound to cluster level, it can be enabled even for < 3.5 cluster levels
Integration – host deploy 2/3
● during host deploy, checks are executed to verify that kdump integration can be enabled:
  ● host kernel has crashkernel=MEM_SIZE option set
  ● correct version of kexec-tools is available
  ● kdump destination address (engine FQDN) can be resolved
● if any of these checks fails, host deploy still finishes successfully, but kdump integration is not configured and a warning is displayed
Integration – host deploy 3/3
● if all checks are successful
  ● fence_kdump options are updated in /etc/kdump.conf
  ● kdump service is restarted
● if kdump integration was not successfully configured during host deploy, administrator can fix the issues manually later and redeploy the host
Host deploy part limitations
● host deploy updates only fence_kdump options in kdump.conf, other options are untouched
● administrator is responsible for manually setting the correct kdump target
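The behaviour described above, replacing only the fence_kdump options and leaving every other line alone, can be sketched as a pure text transformation. The function name is hypothetical, and this is not the code host deploy actually runs.

```python
MANAGED_OPTIONS = ("fence_kdump_nodes", "fence_kdump_args")


def update_fence_kdump_options(text, nodes, args):
    """Return kdump.conf text with only the fence_kdump options replaced."""
    kept = []
    for line in text.splitlines():
        words = line.split(None, 1)
        if words and words[0] in MANAGED_OPTIONS:
            continue  # drop the old value; a fresh one is appended below
        kept.append(line)  # path, core_collector, comments, ... stay untouched
    kept.append("fence_kdump_nodes " + nodes)
    kept.append("fence_kdump_args " + args)
    return "\n".join(kept) + "\n"
```

Because the dump target (for example `path`) is never rewritten, a wrong target stays wrong after deploy, which is why the administrator has to set it.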
fence_kdump listener config
Listener configuration is stored in text files:
● They need to have .conf suffix
● They have to be located under /etc/ovirt-engine/ovirt-fence-kdump-listener.d directory
● They are simple property based text files
Service restart is needed after config files are changed:
systemctl restart ovirt-fence-kdump-listener
Listener config file sample
LISTENER_ADDRESS=0.0.0.0
LISTENER_PORT=7410
HEARTBEAT_INTERVAL=30
SESSION_SYNC_INTERVAL=5
REOPEN_DB_CONNECTION_INTERVAL=30
KDUMP_FINISHED_TIMEOUT=30
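A minimal sketch of reading such property files from a directory. The defaults mirror the sample above; treating # lines as comments and applying files in sorted name order are assumptions, and the function names are hypothetical.

```python
import glob
import os

# Values taken from the sample file above; all values are kept as strings.
DEFAULTS = {
    "LISTENER_ADDRESS": "0.0.0.0",
    "LISTENER_PORT": "7410",
    "HEARTBEAT_INTERVAL": "30",
    "SESSION_SYNC_INTERVAL": "5",
    "REOPEN_DB_CONNECTION_INTERVAL": "30",
    "KDUMP_FINISHED_TIMEOUT": "30",
}


def parse_properties(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def load_listener_config(conf_dir):
    """Merge all *.conf files in conf_dir over the defaults."""
    config = dict(DEFAULTS)
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        with open(path) as f:
            config.update(parse_properties(f.read()))
    return config
```

Files without the .conf suffix are skipped by the glob pattern, matching the suffix requirement.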
fence_kdump listener options 1/3
LISTENER_ADDRESS
● IP address(es) that fence_kdump listener listens on
● It can contain either 0.0.0.0 (default) or one specific IP address
LISTENER_PORT
● port that fence_kdump listener listens on (default 7410)
fence_kdump listener options 2/3
HEARTBEAT_INTERVAL
● Defines the interval in seconds (default 30) of listener's heartbeat updates to database
SESSION_SYNC_INTERVAL
● Defines the interval in seconds (default 5) to synchronize listener's host kdumping sessions in memory to database
fence_kdump listener options 3/3
REOPEN_DB_CONNECTION_INTERVAL
● Defines the interval in seconds (default 30) to reopen database connection which was previously unavailable
KDUMP_FINISHED_TIMEOUT
● Defines the maximum time in seconds since the last message received from a kdumping host, after which the host kdump flow is marked as FINISHED
fence_kdump engine config 1/4
● fence_kdump options which are not related to the listener are stored in the database and can be changed using the engine-config tool
● it's required to restart ovirt-engine (and sometimes also redeploy hosts) after these values are changed
fence_kdump engine config 2/4
FenceKdumpDestinationAddress
● Defines the hostname(s) or IP address(es) to send fence_kdump messages to
● If empty (default), engine FQDN is used
FenceKdumpDestinationPort
● Defines the port (default 7410) to send fence_kdump messages to
fence_kdump engine config 3/4
FenceKdumpMessageInterval
● Defines interval in seconds (default 5) between messages sent by fence_kdump
FenceKdumpListenerTimeout
● Defines the maximum time in seconds (default 90) since the last heartbeat for the fence_kdump listener to be considered alive
fence_kdump engine config 4/4
KdumpStartedTimeout
● Defines the maximum time in seconds (default 30) to wait for the first message from a kdumping host (to detect that the host kdump flow has started)
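How FenceKdumpListenerTimeout and KdumpStartedTimeout could combine into a fencing decision is sketched below. This is an illustration, not the engine's code: the function names and return values are hypothetical, and falling back to regular fencing when the listener is not alive is an assumption.

```python
def listener_alive(last_heartbeat, now, listener_timeout=90.0):
    """The listener counts as alive if its heartbeat is recent enough."""
    return now - last_heartbeat <= listener_timeout


def fencing_decision(now, nonresponsive_since, last_heartbeat,
                     kdump_message_seen,
                     listener_timeout=90.0, kdump_started_timeout=30.0):
    """Return "FENCE", "WAIT" or "SKIP_FENCING" for a non-responsive host."""
    if not listener_alive(last_heartbeat, now, listener_timeout):
        return "FENCE"         # assumption: kdump cannot be detected, so fence
    if kdump_message_seen:
        return "SKIP_FENCING"  # host is kdumping; let it finish the dump
    if now - nonresponsive_since < kdump_started_timeout:
        return "WAIT"          # the first message may still arrive
    return "FENCE"             # no kdump message within the timeout
```

All times are plain numbers in seconds so the function can be exercised without a clock.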
Future features
● Extend kdump to send its flow status as a part of fence_kdump message (starting, dumping, finished, error, ...)
● Extend fence_kdump protocol to:
  ● use message sequence number
  ● include unique host id (not to rely just on IP address)
  ● include HMAC signature for message
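To make the proposed protocol extensions concrete, here is a purely hypothetical message layout with a sequence number, a 16-byte host id, and an HMAC-SHA256 signature. No such format is defined by fence_kdump; the field sizes, the hash choice, and the function names are all assumptions.

```python
import hashlib
import hmac
import struct

# Hypothetical layout: 8-byte sequence number + 16-byte host id, network order.
HEADER = struct.Struct("!Q16s")
DIGEST_SIZE = hashlib.sha256().digest_size


def sign_message(key, sequence, host_id):
    """Build header + HMAC-SHA256 signature over the header."""
    header = HEADER.pack(sequence, host_id)
    return header + hmac.new(key, header, hashlib.sha256).digest()


def verify_message(key, message):
    """Return (sequence, host_id) if the signature is valid, else None."""
    if len(message) != HEADER.size + DIGEST_SIZE:
        return None
    header, signature = message[:HEADER.size], message[HEADER.size:]
    expected = hmac.new(key, header, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    return HEADER.unpack(header)
```

A receiver could additionally reject sequence numbers it has already seen for a given host id, which is what the sequence number would add over the signature alone.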