KDUMP GUIDE

kdump is a kernel feature that captures a crash dump when the system or kernel crashes.

To enable kdump, we must reserve some physical RAM. This reserved RAM is used to run the kdump kernel when the main kernel panics or crashes.

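When a kernel panic or crash occurs, the running kernel uses kexec to boot the kdump kernel from the reserved memory. The kdump kernel then copies the contents of memory to a vmcore file on a local or remote disk, and finally reboots the machine.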

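By analyzing the crash dump, we can find the cause, or root cause, of the system failure. If you have operating system support, you can share the crash dump with the vendor for analysis.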

Warning: the stock CentOS 7 kernel is version 3.10. If the kernel has been upgraded to, say, 4.10, kdump will not start even though it is installed!

1.1 Install

Install kdump and update the bootloader:

yum install kexec-tools
grubby --update-kernel=ALL --args="crashkernel=auto"
grep crashkernel /boot/grub2/grub.cfg
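
grub.cfg only shows what the next boot will use. As a small sketch, the following checks whether the kernel that is running right now was booted with a crashkernel= argument. The function name has_crashkernel is made up for this example; /proc/cmdline is the standard place to read the running kernel's command line.

```shell
#!/bin/sh
# has_crashkernel [FILE]: succeed if FILE contains a crashkernel= argument.
# FILE defaults to /proc/cmdline, the command line of the running kernel.
# (has_crashkernel is a hypothetical helper name, not part of kexec-tools.)
has_crashkernel() {
    grep -q 'crashkernel=' "${1:-/proc/cmdline}" 2>/dev/null
}

if has_crashkernel; then
    echo "crashkernel= is set on the running kernel"
else
    echo "crashkernel= is not set on the running kernel; a reboot is still needed"
fi
```

Before the reboot in 1.4 this is expected to report that the argument is not set yet.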

1.2 Configure kdump

  • path: the location on a local file system where the crash dump (vmcore file) is written. 1)
  • core_collector makedumpfile -c: compress the dump data; the -c option enables compression.
  • default reboot: the action taken if saving the dump fails; here the machine reboots.

vim /etc/kdump.conf

path /var/crash
core_collector makedumpfile -c
default reboot
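
The footnote on the dump location recommends free space roughly equal to RAM. The sketch below is one way to check that before relying on kdump. It assumes a Linux system with /proc/meminfo and a POSIX df; the helper names (kdump_path, mem_total_kb, free_kb) are invented for this example.

```shell
#!/bin/sh
# kdump_path CONF: print the value of the "path" directive in CONF,
# or /var/crash (the location used in this guide) when none is set.
kdump_path() {
    p=""
    if [ -r "$1" ]; then
        # Commented-out lines such as "#path ..." do not match $1 == "path".
        p=$(awk '$1 == "path" { print $2; exit }' "$1")
    fi
    echo "${p:-/var/crash}"
}

# mem_total_kb: total RAM in KiB, read from /proc/meminfo.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null
}

# free_kb DIR: free space in KiB on the file system that holds DIR.
free_kb() {
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

dir=$(kdump_path /etc/kdump.conf)
ram=$(mem_total_kb)
free=$(free_kb "$dir")
if [ "${free:-0}" -ge "${ram:-0}" ]; then
    echo "$dir: ${free:-0} KiB free, RAM is ${ram:-0} KiB: enough room for a full dump"
else
    echo "$dir: ${free:-0} KiB free, RAM is ${ram:-0} KiB: a full dump may not fit"
fi
```

With compression enabled the dump is usually smaller than RAM, so this check is on the conservative side.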

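If the screen (IPMI, VGA, etc.) shows nothing while kexec is running, kdump is in fact already at work and the display is simply not being refreshed. The best way to see what is really happening is to attach to the machine's serial console. If that is not possible, try adding the following parameter to '/etc/sysconfig/kdump':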

KDUMP_KEXEC_ARGS="--elf64-core-headers --reset-vga --module=vga16fb.ko"

1.3 Start and enable kdump service

systemctl enable kdump

1.4 Reboot the box

Reboot for the change to take effect: kdump needs the reserved RAM to run, and that RAM is only set aside at boot.

shutdown -r now
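
Once the machine is back up, the reservation can be confirmed from sysfs. This is a minimal sketch, assuming the kernel exposes /sys/kernel/kexec_crash_size (bytes reserved for the crash kernel) and /sys/kernel/kexec_crash_loaded (1 when a crash kernel is loaded); the function names are made up for this example.

```shell
#!/bin/sh
# crash_reserved_bytes [FILE]: print the number of bytes reserved for the
# crash kernel, or 0 if the file cannot be read.
crash_reserved_bytes() {
    cat "${1:-/sys/kernel/kexec_crash_size}" 2>/dev/null || echo 0
}

# crash_kernel_loaded [FILE]: succeed if the file contains 1,
# meaning a crash (kdump) kernel is currently loaded.
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}

size=$(crash_reserved_bytes)
if [ "$size" -gt 0 ]; then
    echo "reserved for the crash kernel: $size bytes"
else
    echo "no memory reserved; check the crashkernel= boot argument"
fi

if crash_kernel_loaded; then
    echo "kdump kernel is loaded"
else
    echo "kdump kernel is not loaded; check 'systemctl status kdump'"
fi
```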

1.5 Test kdump by manually crashing the system

Before crashing your system, check kdump service status:

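# check that memory is reserved and that the service is running
dmesg | grep Reserving
systemctl status kdump
# trigger a crash; this creates a crash dump file (vmcore) under /var/crash
echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
# after the machine has rebooted, list the dump
ls -lR /var/crash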
/var/crash:
total 0
drwxr-xr-x 2 root root 44 Mar 18 17:07 127.0.0.1-2018-03-18-17:00:32

/var/crash/127.0.0.1-2018-03-18-17:00:32:
total 911764
-rw------- 1 root root 933597770 Mar 18 17:07 vmcore
-rw-r--r-- 1 root root     40874 Mar 18 17:00 vmcore-dmesg.txt
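
Each crash gets its own directory, so after several crashes it helps to pick out the newest dump. The following is a sketch only; latest_vmcore is a hypothetical helper, and it assumes the dumps are files named vmcore under /var/crash as in the listing above.

```shell
#!/bin/sh
# latest_vmcore [DIR]: print the path of the most recently modified file
# named vmcore under DIR (default /var/crash); print nothing if none exists.
latest_vmcore() {
    find "${1:-/var/crash}" -type f -name vmcore -exec ls -t {} + 2>/dev/null | head -n 1
}

core=$(latest_vmcore)
if [ -n "$core" ]; then
    echo "newest dump: $core"
else
    echo "no vmcore found under /var/crash"
fi
```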

2.1 Prepare the platform

Use the crash command to analyze and debug crash dumps. Make sure two packages are installed, crash and the kernel-debuginfo that matches the kernel:

yum install crash
yum remove kernel-debuginfo kernel-debuginfo-common-x86_64
yum install --enablerepo=base-debuginfo kernel-debuginfo-$(uname -r)

2.2 Analysis

cd /var/crash
# enter the directory named after the time of the crash
crash vmcore /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
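
The command above uses uname -r, which names the kernel running now. If the kernel has been updated since the crash, the vmlinux must be the one for the kernel that crashed. As a sketch, the release can often be read from the "Linux version" banner in vmcore-dmesg.txt. This assumes the banner is still present in the saved log, which is not guaranteed if the log buffer wrapped; crashed_release is a made-up helper name.

```shell
#!/bin/sh
# crashed_release FILE: print the kernel release that follows the first
# "Linux version" banner in FILE; print nothing if there is no banner.
crashed_release() {
    awk '{
        for (i = 1; i < NF - 1; i++)
            if ($i == "Linux" && $(i + 1) == "version") { print $(i + 2); exit }
    }' "$1"
}

# Run this inside the crash directory, next to vmcore-dmesg.txt.
rel=$(crashed_release vmcore-dmesg.txt 2>/dev/null)
echo "kernel in the dump: ${rel:-unknown}; running kernel: $(uname -r)"
```

If the two differ, install the kernel-debuginfo for the release found in the dump and point crash at that vmlinux instead.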

Useful commands:

  • ps: list the processes that were running when the system crashed;
  • files: view the files that were open when the system crashed;
  • sys: show the system information at the time of the crash;
  • help: show help for a command at the crash prompt; type help <command>.

To get all commands:

crash> help
*              files          mach           repeat         timer
alias          foreach        mod            runq           tree
ascii          fuser          mount          search         union
bt             gdb            net            set            vm
btop           help           p              sig            vtop
dev            ipcs           ps             struct         waitq
dis            irq            pte            swap           whatis
eval           kmem           ptob           sym            wr
exit           list           ptov           sys            q
extend         log            rd             task
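
For a first look at a dump it can be handy to run a fixed set of these commands in one go. The sketch below only writes a command file; the command names come from the list above. Feeding the file to crash with -i (and -s to silence the startup banner) is how I understand the crash options to work, so treat that part as an assumption and check it against the crash man page. write_crash_cmds and report.txt are names invented for this example.

```shell
#!/bin/sh
# write_crash_cmds FILE: write a short list of crash commands to FILE.
# The final "exit" makes crash quit after running them.
write_crash_cmds() {
    cat > "$1" <<'EOF'
sys
bt
ps
files
log
exit
EOF
}

cmds=$(mktemp)
write_crash_cmds "$cmds"
echo "command file written to $cmds"

# Then, inside the crash directory (needs crash and the matching debuginfo):
#   crash -s -i "$cmds" vmcore /usr/lib/debug/lib/modules/$(uname -r)/vmlinux > report.txt
```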


1)
It is recommended that the file system be at least as large as the system's RAM, or at least have free space equal to the size of RAM.
  • Last modified: 2019/04/16 18:31