Kdump

From ArchWiki

Kdump is a standard Linux mechanism for dumping the contents of machine memory when the kernel crashes. Kdump is based on Kexec and uses two kernels: the regular system kernel and the kdump capture kernel (from now on, the kdump kernel). The system kernel is the normal kernel, booted with the crashkernel parameter, which tells it to reserve an amount of physical memory where the kdump kernel will be loaded and executed. The kdump kernel must be loaded in advance because, once the system kernel has crashed, it is broken and there is no reliable way to, for example, read data from disk.

Once a kernel crash happens, the system kernel crash handler uses the Kexec mechanism to boot the kdump kernel in its pre-reserved memory. The memory of the system kernel is preserved across such a kexec boot, so the kdump kernel can access it as it was at the moment of the crash. Once the kdump kernel is booted, the user can read the file /proc/vmcore to access the memory of the crashed system kernel. This crash dump can be saved to disk or copied over the network to another machine for post-mortem investigation.

In server production environments, the system and kdump kernels may differ: the system kernel needs many features and is compiled with many kernel flags and drivers, while the kdump kernel aims to be minimal and use as little memory as possible. For example, it could be compiled without network support if the crash dump is only stored to disk. For desktops, and for non-specific setups in general, the same kernel is used as both the system and the kdump kernel. This means the same kernel code is loaded twice: once as the normal system kernel and once into the reserved memory area, but with different kernel parameters.

Alternatives to setup kdump

The automatic way: kdumpst

Note: kdumpst implicitly depends on GRUB and does not work with other boot loaders, see https://gitlab.freedesktop.org/gpiccoli/kdumpst/-/issues/21

The kdumpst (AUR) tool is an automatic way of loading kdump. It is highly customizable: it defaults to another method of log collection (called pstore), but can easily be set to use kdump by setting USE_PSTORE_RAM=0 in /usr/share/kdumpst.d/00-default. The tool also falls back to kdump if the pstore RAM region is not available.

After installing kdumpst, check the journal; the following message means kdump is loaded: kdumpst: panic kexec loaded successfully. If a kernel crash happens, the kdump will be collected, and on the subsequent boot a message indicates the success of the operation: kdumpst: logs saved in "/var/crash/kdumpst/logs". In that folder, the user will find a lightweight zip blob that includes a dmesg plus some extra data. The vmcore itself is saved in /var/crash/kdumpst/crash. For questions or issues, use the #kdump IRC channel on OFTC or open an issue in the kdumpst repository.

Manual steps

If you prefer to set up kdump manually, follow the guide below.

Compiling kernel

Both the system and kdump kernels require some configuration flags that may not be set by default. Please consult the Kernel Compilation article for more information about compiling a custom kernel in Arch; here we focus on the Kdump-specific configuration. Current default Arch kernel builds already have these flags set. You can verify whether your running kernel has them set by looking in /proc/config.gz.

To build such a kernel, edit the kernel .config file and enable the following configuration options:

.config
CONFIG_DEBUG_INFO=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y
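
The check against /proc/config.gz can be scripted. The following is a minimal sketch, not an official tool: the function name check_kdump_config is made up for illustration, and it assumes GNU gzip (whose -f option passes uncompressed input through unchanged, so a plain .config file works too).

```shell
#!/bin/sh
# Report which kdump-related options are enabled in a kernel config file.
# Usage: check_kdump_config [CONFIG_FILE]   (defaults to /proc/config.gz)
# The function name is hypothetical; it is not part of any package.
check_kdump_config() {
    config="${1:-/proc/config.gz}"
    missing=0
    for opt in CONFIG_DEBUG_INFO CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE; do
        # gzip -cdf decompresses gzip input and copies plain text unchanged
        if gzip -cdf "$config" | grep -q "^${opt}=y\$"; then
            echo "$opt: enabled"
        else
            echo "$opt: MISSING"
            missing=1
        fi
    done
    # Exit status is 0 only if all three options are enabled
    return "$missing"
}
```

Calling check_kdump_config without arguments inspects the running kernel; passing a path such as a .config file in a kernel source tree inspects that file instead.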

Also change the package base name to something like linux-kdump to distinguish the kernel from the default Arch one. Compile the kernel package and install it. Save the uncompressed system kernel binary ./src/linux-X.Y/vmlinux: it contains debug symbols, which you will need later when analyzing the crash.

For reference, details about building a kdump kernel and configuring the kernel parameters for kdump can be found in the kernel Kdump documentation.

Prepare the kdump initramfs

The kdump initramfs should contain the tool for extracting information from the /proc/vmcore file (see #Saving the crashed kernel memory). If you wish to save the artifacts to your root filesystem, it is easiest to modify your default initramfs. For example, if the initramfs is built with mkinitcpio, add the makedumpfile binary to it:

--- mkinitcpio.conf
+++ mkinitcpio-kdump.conf
@@ -11,12 +11,12 @@
 # wish into the CPIO image.  This is run last, so it may be used to
 # override the actual binaries included by a given hook
 # BINARIES are dependency parsed, so you may safely ignore libraries
-BINARIES=()
+BINARIES=(/usr/bin/makedumpfile)

 # FILES
 # This setting is similar to BINARIES above, however, files are added
 # as-is and are not parsed in any way.  This is useful for config files.
-FILES=()
+FILES=(/etc/systemd/system/kdump-save.service)

 # HOOKS
 # This is the most important setting in this file.  The HOOKS control the

Remember to update the mkinitcpio preset file so that a fresh initramfs is rebuilt upon each kernel update.
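
As a rough sketch of such a preset, assuming the custom kernel package is named linux-kdump and the modified configuration was saved as /etc/mkinitcpio-kdump.conf (both paths are assumptions based on the names used on this page; adjust them to your setup):

```shell
# /etc/mkinitcpio.d/linux-kdump.preset (assumed file name)
# Kernel image the initramfs is built for
ALL_kver="/boot/vmlinuz-linux-kdump"

# A single preset that uses the kdump-specific configuration
PRESETS=('default')
default_config="/etc/mkinitcpio-kdump.conf"
default_image="/boot/initramfs-linux-kdump.img"
```

With a preset like this in place, running mkinitcpio -p linux-kdump as root rebuilds the image.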

Setup the kdump kernel

First, reserve memory in the system kernel for loading the kdump kernel. Edit your boot loader configuration and add the crashkernel=[size] kernel parameter.

Depending on the machine and how the kdump kernel was built, 128M to 256M is usually enough. Once everything is set up, it is worth testing whether the chosen size succeeds. Note that the reserved memory is unavailable to the system kernel.
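
As an illustration only, assuming GRUB is the boot loader (other boot loaders use their own configuration files) and picking 256M as an example size, the parameter could be added like this:

```shell
# /etc/default/grub (excerpt); "quiet" stands in for whatever parameters
# are already present on your system, and 256M is only an example size
GRUB_CMDLINE_LINUX_DEFAULT="quiet crashkernel=256M"

# Afterwards, regenerate the GRUB configuration as root:
#   grub-mkconfig -o /boot/grub/grub.cfg
```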

Reboot into your system kernel. To make sure that the kernel was booted with the correct options, check /proc/cmdline, and check /sys/kernel/kexec_crash_size to see whether the memory was indeed reserved. In rare cases the memory reservation fails; if that happens, check dmesg for more information.

Next, tell Kexec which kdump kernel to use. Specify your kernel, initramfs file, root device and other parameters if needed:
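
The two checks can be wrapped in small helpers. This is a sketch with made-up function names; it relies on /sys/kernel/kexec_crash_size reporting the reservation in bytes, with 0 meaning that nothing was reserved.

```shell
#!/bin/sh
# Print the value of the crashkernel= parameter from a kernel command line.
# Usage: crashkernel_param [CMDLINE_FILE]   (defaults to /proc/cmdline)
crashkernel_param() {
    # Put each parameter on its own line, then strip the crashkernel= prefix
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^crashkernel=//p'
}

# Print the size of the reserved crash kernel memory in MiB.
# Usage: crash_reserved_mib [SIZE_FILE]   (defaults to /sys/kernel/kexec_crash_size)
crash_reserved_mib() {
    bytes=$(cat "${1:-/sys/kernel/kexec_crash_size}")
    echo "$((bytes / 1024 / 1024))"
}
```

If crashkernel_param prints nothing, the parameter did not reach the kernel; if crash_reserved_mib prints 0 even though the parameter is present, the reservation failed.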

# kexec -p [/boot/vmlinuz-linux-kdump] --initrd=[/boot/initramfs-linux-kdump.img] --append="root=[root-device] irqpoll nr_cpus=1 reset_devices"

This loads the kdump kernel into the reserved area. Without the -p flag, kexec would boot the kernel right away; with it, the kernel is only loaded into the reserved memory and its boot is postponed until a crash happens.

Note: The parameter nr_cpus=1 restricts the kdump environment to a single CPU, which both saves memory (per-CPU structures consume memory) and is safer, as it reduces the surface for potential concurrency issues. If that option fails for some reason, maxcpus=1 can be used instead. The latter consumes a bit more memory, since it initializes the structures of the other CPUs and then disables every CPU except CPU0, whereas nr_cpus effectively drops the structures of the other CPUs. More information is available in the kernel CPU hotplug docs.

Instead of running kexec manually, you might want to set up a systemd service that runs kexec on boot:

/etc/systemd/system/kdump.service
[Unit]
Description=Load the kdump kernel
After=local-fs.target

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/usr/bin/kexec -p [/boot/vmlinuz-linux-kdump] --initrd=[/boot/initramfs-linux-kdump.img] --append="root=[root-device] irqpoll nr_cpus=1 reset_devices"
ExecStop=/usr/bin/kexec -p -u

[Install]
WantedBy=multi-user.target


Then enable kdump.service.

To check whether the crash kernel is already loaded, run the following command; it prints 1 if a crash kernel is loaded and 0 otherwise:

$ cat /sys/kernel/kexec_crash_loaded
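
For scripts, the same check can be turned into an exit status. The helper below is a small sketch; kdump_loaded is a made-up name, not an existing command.

```shell
#!/bin/sh
# Succeed (exit status 0) only if a crash kernel is currently loaded.
# Usage: kdump_loaded [STATE_FILE]   (defaults to /sys/kernel/kexec_crash_loaded)
kdump_loaded() {
    # The file contains 1 when a crash kernel is loaded, 0 otherwise;
    # a missing file is treated the same as "not loaded"
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}
```

It could be used, for example, as: if kdump_loaded; then echo "kdump kernel is loaded"; else echo "kdump kernel is NOT loaded"; fi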

Testing kdump by crashing the kernel

To test the setup, you can trigger a crash using sysrq.

Warning: a kernel crash may corrupt data on your disks, run it at your own risk!
# sync; echo 1 > /proc/sys/kernel/sysrq; echo c > /proc/sysrq-trigger

Once the crash happens, kexec boots your kdump kernel.

Saving the crashed kernel memory

Once booted into the kdump kernel, the idea is to save the relevant contents of /proc/vmcore for later analysis. Although it is exposed as a file (so it is possible to copy it, as in cp /proc/vmcore /root/vmcore.crashdump), this is not the recommended way. The vmcore is a full copy of system memory, so the file will be 64G in size if your machine has 64G of RAM, for example. It includes all data from loaded userspace processes, as well as free memory. The best way to save it is therefore the makedumpfile utility, which can remove free memory and irrelevant userspace data as well as compress the vmcore. Example usage:

# makedumpfile -z -d 31 /proc/vmcore /root/vmcore.crashdump_compressed

You can also save the dmesg log of the crashed kernel using this command:

# makedumpfile --dump-dmesg /proc/vmcore /root/vmcore.dmesg
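
The value passed to -d is a bitmask of page types to exclude, as documented in makedumpfile(8); 31 sets all five bits and therefore excludes every excludable type. The following sketch decodes a dump level into the excluded page types. The function name describe_dump_level is made up for illustration.

```shell
#!/bin/sh
# List the page types that makedumpfile excludes for a given -d dump level.
# Usage: describe_dump_level LEVEL   (LEVEL is an integer from 0 to 31)
describe_dump_level() {
    level="$1"
    # Each entry is "bit value:page type excluded when that bit is set"
    for entry in \
        "1:zero pages" \
        "2:non-private cache pages" \
        "4:private cache pages" \
        "8:user data pages" \
        "16:free pages"
    do
        bit="${entry%%:*}"
        name="${entry#*:}"
        if [ $(($level & $bit)) -ne 0 ]; then
            echo "$name"
        fi
    done
}
```

For instance, describe_dump_level 17 lists zero pages and free pages, which yields a larger dump than level 31 but keeps cache and user data pages for analysis.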

The following systemd service can be used to automatically save the crash dumps and reboot into the system kernel again:

/etc/systemd/system/kdump-save.service
[Unit]
Description=Save the kernel crash dump after a crash
DefaultDependencies=no
Wants=local-fs.target
After=local-fs.target

[Service]
Type=idle
ExecStart=/bin/sh -c 'mkdir -p /var/crash/ && /usr/bin/makedumpfile -z -d 31 /proc/vmcore "/var/crash/crashdump-$$(date +%%F-%%T)"'
ExecStopPost=/usr/bin/systemctl reboot
UMask=0077
StandardInput=tty-force
StandardOutput=inherit
StandardError=inherit

This service can be started through the kdump kernel command line. To do so, edit the kdump load service as below:

/etc/systemd/system/kdump.service
[Unit]
Description=Load the kdump kernel
After=local-fs.target

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/usr/bin/kexec -p [/boot/vmlinuz-linux-kdump] --initrd=[/boot/initramfs-linux-kdump.img] --append="root=[root-device] irqpoll nr_cpus=1 reset_devices systemd.unit=kdump-save.service"
ExecStop=/usr/bin/kexec -p -u

[Install]
WantedBy=multi-user.target

Early kdump using mkinitcpio

You might encounter a situation where the kernel crashes before the systemd service can be started. In this case, it might be helpful to run kexec as a mkinitcpio hook rather than a service.

First make a copy of your initramfs. This will be used to run the crash kernel.

# cp /boot/initramfs-linux.img /boot/initramfs-linux-crash.img

Next, create the mkinitcpio install file. It lets us build the main initramfs with a copy of the crash initramfs and the kernel image for the crash kernel.

/etc/initcpio/install/kdump
build() {
        add_binary kexec
        add_file /boot/initramfs-linux-crash.img /crash/initramfs.img
        add_file /boot/vmlinuz-linux /crash/vmlinuz
        add_runscript
}

help() {
        cat <<HELPEOF
Installs the crash kernel on boot
HELPEOF
}

Next, create the mkinitcpio hook file. This runs kexec as an early hook, hopefully before anything in the kernel can crash. An important note here is that we boot the crash kernel in emergency mode, because booting it in rescue or normal mode might just lead to the same crash happening again in the crash kernel.

/etc/initcpio/hooks/kdump
run_earlyhook() {
	msg 'Loading crash kernel..'
	if [ -e /crash/vmlinuz ]; then
		if [ -e /crash/initramfs.img ]; then
			kexec -p /crash/vmlinuz --initrd=/crash/initramfs.img --append="root=[root-device] irqpoll nr_cpus=1 reset_devices emergency"
		else
			msg 'No initramfs found'
		fi
	else
		msg 'No vmlinuz found'
	fi
}

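Now rebuild the main initramfs with the new hook. Without -g (or -p), mkinitcpio only performs a dry run and writes no image:

# mkinitcpio -A kdump -g /boot/initramfs-linux.img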

When the crash happens, the crash kernel boots into emergency mode. After entering your password, you will be at a terminal. The first thing to do is make your root filesystem writable:

# mount -o remount,rw /

Now you can save the dump using makedumpfile (see #Saving the crashed kernel memory).

Analyzing the kernel core dump

The best way to study the saved kernel core dump is with tools built for that purpose. The most common one is the gdb-based crash. Run crash as follows:

$ crash vmlinux path/crash.dump

Here, vmlinux should include debug symbols so that more information can be extracted from the saved crash dump.

Follow man crash or [1] for more information about debugging practices.

Another, more recent alternative is drgn, a Python-based, fully scriptable tool for extracting information from the vmcore.

See also