summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorDavid S. Miller2018-05-14 19:46:05 +0200
committerDavid S. Miller2018-05-14 19:49:33 +0200
commit42b956fdc81a84909932df8fc605490ef577bb3f (patch)
tree33c326162fdade738286b7b48520db15649867e9
parentnet: ipv4: ipconfig: fix unused variable (diff)
parentcxgb4: collect hardware dump in second kernel (diff)
downloadkernel-qcow2-linux-42b956fdc81a84909932df8fc605490ef577bb3f.tar.gz
kernel-qcow2-linux-42b956fdc81a84909932df8fc605490ef577bb3f.tar.xz
kernel-qcow2-linux-42b956fdc81a84909932df8fc605490ef577bb3f.zip
Merge branch 'kernel-add-support-to-collect-hardware-logs-in-crash-recovery-kernel'
Rahul Lakkireddy says: ==================== kernel: add support to collect hardware logs in crash recovery kernel On production servers running variety of workloads over time, kernel panic can happen sporadically after days or even months. It is important to collect as much debug logs as possible to root cause and fix the problem, that may not be easy to reproduce. Snapshot of underlying hardware/firmware state (like register dump, firmware logs, adapter memory, etc.), at the time of kernel panic will be very helpful while debugging the culprit device driver. This series of patches add new generic framework that enable device drivers to collect device specific snapshot of the hardware/firmware state of the underlying device in the crash recovery kernel. In crash recovery kernel, the collected logs are added as elf notes to /proc/vmcore, which is copied by user space scripts for post-analysis. The sequence of actions done by device drivers to append their device specific hardware/firmware logs to /proc/vmcore are as follows: 1. During probe (before hardware is initialized), device drivers register to the vmcore module (via vmcore_add_device_dump()), with callback function, along with buffer size and log name needed for firmware/hardware log collection. 2. vmcore module allocates the buffer with requested size. It adds an elf note and invokes the device driver's registered callback function. 3. Device driver collects all hardware/firmware logs into the buffer and returns control back to vmcore module. The device specific hardware/firmware logs can be seen as elf notes with note type 0x700, as shown below: Displaying notes found at file offset 0x00001000 with length 0x040032c0: Owner Data size Description LINUX 0x02000fec Unknown note type: (0x00000700) LINUX 0x02000fec Unknown note type: (0x00000700) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) VMCOREINFO 0x00000785 Unknown note type: (0x00000000) Patch 1 adds API to vmcore module to allow drivers to register callback to collect the device specific hardware/firmware logs. The logs will be added to /proc/vmcore as elf notes. Patch 2 updates read and mmap logic to append device specific hardware/ firmware logs as elf notes. Patch 3 shows a cxgb4 driver example using the API to collect hardware/firmware logs in crash recovery kernel, before hardware is initialized. Thanks, Rahul --- v8: - Added missing linux/types.h header include. - Removed __vmcore_add_device_dump(). v7: - Removed "CHELSIO" vendor identifier in Elf Note name. Instead, writing "LINUX". - Moved vmcoredd_header to new file include/uapi/linux/vmcore.h - Reworked vmcoredd_header to include Elf Note as part of the header itself. - Removed vmcoredd_get_note_size(). - Renamed vmcoredd_write_note() to vmcoredd_write_header(). - Replaced all "unsigned long" with "unsigned int" for device dump size since max size of Elf Word is u32. v6: - Reworked device dump elf note name to contain vendor identifier. - Added vmcoredd_header that precedes actual dump in the Elf Note. - Device dump's name is moved inside vmcoredd_header. - Added "CHELSIO" string as vendor identifier in the Elf Note name for cxgb4 device dumps. v5: - Removed enabling CONFIG_PROC_VMCORE_DEVICE_DUMP by default and updated help message. v4: - Made __vmcore_add_device_dump() static. - Moved compile check to define vmcore_add_device_dump() to crash_dump.h to fix compilation when vmcore.c is not compiled in. - Convert ---help--- to help in Kconfig as indicated by checkpatch. - Rebased to tip. v3: - Dropped sysfs crashdd module. - Exported dumps as elf notes. Suggested by Eric Biederman <ebiederm@xmission.com>. Added as patch 2 in this version. - Added CONFIG_PROC_VMCORE_DEVICE_DUMP to allow configuring device dump support. - Moved logic related to adding dumps from crashdd to vmcore module. - Rename all crashdd* to vmcoredd*. - Updated comments. v2: - Added ABI Documentation for crashdd. - Directly use octal permission instead of macro. Changes since rfc v2: - Moved exporting crashdd from procfs to sysfs. Suggested by Stephen Hemminger <stephen@networkplumber.org> - Moved code from fs/proc/crashdd.c to fs/crashdd/ directory. - Replaced all proc API with sysfs API and updated comments. - Calling driver callback before creating the binary file under crashdd sysfs. - Changed binary dump file permission from S_IRUSR to S_IRUGO. - Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP. rfc v2: - Collecting logs in 2nd kernel instead of during kernel panic. Suggested by Eric Biederman <ebiederm@xmission.com>. - Added new crashdd module that exports /proc/crashdd/ containing driver's registered hardware/firmware logs in patch 1. - Replaced the API to allow drivers to register their hardware/firmware log collect routine in crash recovery kernel in patch 1. - Updated patch 2 to use the new API in patch 1. ==================== Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-rw-r--r--drivers/net/ethernet/chelsio/cxgb4/cxgb4.h4
-rw-r--r--drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c25
-rw-r--r--drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h3
-rw-r--r--drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c10
-rw-r--r--fs/proc/Kconfig15
-rw-r--r--fs/proc/vmcore.c382
-rw-r--r--include/linux/crash_dump.h18
-rw-r--r--include/linux/kcore.h6
-rw-r--r--include/uapi/linux/elf.h1
-rw-r--r--include/uapi/linux/vmcore.h18
10 files changed, 469 insertions, 13 deletions
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 688f95440af2..01e7aad4ce5b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -50,6 +50,7 @@
#include <linux/net_tstamp.h>
#include <linux/ptp_clock_kernel.h>
#include <linux/ptp_classify.h>
+#include <linux/crash_dump.h>
#include <asm/io.h>
#include "t4_chip_type.h"
#include "cxgb4_uld.h"
@@ -964,6 +965,9 @@ struct adapter {
struct hma_data hma;
struct srq_data *srq;
+
+ /* Dump buffer for collecting logs in kdump kernel */
+ struct vmcoredd_data vmcoredd;
};
/* Support for "sched-class" command to allow a TX Scheduling Class to be
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
index 143686c60234..085691eb2b95 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
@@ -488,3 +488,28 @@ void cxgb4_init_ethtool_dump(struct adapter *adapter)
adapter->eth_dump.version = adapter->params.fw_vers;
adapter->eth_dump.len = 0;
}
+
+static int cxgb4_cudbg_vmcoredd_collect(struct vmcoredd_data *data, void *buf)
+{
+ struct adapter *adap = container_of(data, struct adapter, vmcoredd);
+ u32 len = data->size;
+
+ return cxgb4_cudbg_collect(adap, buf, &len, CXGB4_ETH_DUMP_ALL);
+}
+
+int cxgb4_cudbg_vmcore_add_dump(struct adapter *adap)
+{
+ struct vmcoredd_data *data = &adap->vmcoredd;
+ u32 len;
+
+ len = sizeof(struct cudbg_hdr) +
+ sizeof(struct cudbg_entity_hdr) * CUDBG_MAX_ENTITY;
+ len += CUDBG_DUMP_BUFF_SIZE;
+
+ data->size = len;
+ snprintf(data->dump_name, sizeof(data->dump_name), "%s_%s",
+ cxgb4_driver_name, adap->name);
+ data->vmcoredd_callback = cxgb4_cudbg_vmcoredd_collect;
+
+ return vmcore_add_device_dump(data);
+}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
index ce1ac9a1c878..ef59ba1ed968 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
@@ -41,8 +41,11 @@ enum CXGB4_ETHTOOL_DUMP_FLAGS {
CXGB4_ETH_DUMP_HW = (1 << 1), /* various FW and HW dumps */
};
+#define CXGB4_ETH_DUMP_ALL (CXGB4_ETH_DUMP_MEM | CXGB4_ETH_DUMP_HW)
+
u32 cxgb4_get_dump_length(struct adapter *adap, u32 flag);
int cxgb4_cudbg_collect(struct adapter *adap, void *buf, u32 *buf_size,
u32 flag);
void cxgb4_init_ethtool_dump(struct adapter *adapter);
+int cxgb4_cudbg_vmcore_add_dump(struct adapter *adap);
#endif /* __CXGB4_CUDBG_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index c54fd189d835..07baa5e1a8c5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5558,6 +5558,16 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (err)
goto out_free_adapter;
+ if (is_kdump_kernel()) {
+ /* Collect hardware state and append to /proc/vmcore */
+ err = cxgb4_cudbg_vmcore_add_dump(adapter);
+ if (err) {
+ dev_warn(adapter->pdev_dev,
+ "Fail collecting vmcore device dump, err: %d. Continuing\n",
+ err);
+ err = 0;
+ }
+ }
if (!is_t4(adapter->params.chip)) {
s_qpp = (QUEUESPERPAGEPF0_S +
diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 1ade1206bb89..0eaeb41453f5 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -43,6 +43,21 @@ config PROC_VMCORE
help
Exports the dump image of crashed kernel in ELF format.
+config PROC_VMCORE_DEVICE_DUMP
+ bool "Device Hardware/Firmware Log Collection"
+ depends on PROC_VMCORE
+ default n
+ help
+ After kernel panic, device drivers can collect the device
+ specific snapshot of their hardware or firmware before the
+ underlying devices are initialized in crash recovery kernel.
+ Note that the device driver must be present in the crash
+ recovery kernel's initramfs to collect its underlying device
+ snapshot.
+
+ If you say Y here, the collected device dumps will be added
+ as ELF notes to /proc/vmcore.
+
config PROC_SYSCTL
bool "Sysctl support (/proc/sys)" if EXPERT
depends on PROC_FS
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a45f0af22a60..247c3499e5bd 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -20,6 +20,7 @@
#include <linux/init.h>
#include <linux/crash_dump.h>
#include <linux/list.h>
+#include <linux/mutex.h>
#include <linux/vmalloc.h>
#include <linux/pagemap.h>
#include <linux/uaccess.h>
@@ -38,12 +39,23 @@ static size_t elfcorebuf_sz_orig;
static char *elfnotes_buf;
static size_t elfnotes_sz;
+/* Size of all notes minus the device dump notes */
+static size_t elfnotes_orig_sz;
/* Total size of vmcore file. */
static u64 vmcore_size;
static struct proc_dir_entry *proc_vmcore;
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+/* Device Dump list and mutex to synchronize access to list */
+static LIST_HEAD(vmcoredd_list);
+static DEFINE_MUTEX(vmcoredd_mutex);
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+/* Device Dump Size */
+static size_t vmcoredd_orig_sz;
+
/*
* Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error
* The called function has to take care of module refcounting.
@@ -178,6 +190,77 @@ static int copy_to(void *target, void *src, size_t size, int userbuf)
return 0;
}
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+static int vmcoredd_copy_dumps(void *dst, u64 start, size_t size, int userbuf)
+{
+ struct vmcoredd_node *dump;
+ u64 offset = 0;
+ int ret = 0;
+ size_t tsz;
+ char *buf;
+
+ mutex_lock(&vmcoredd_mutex);
+ list_for_each_entry(dump, &vmcoredd_list, list) {
+ if (start < offset + dump->size) {
+ tsz = min(offset + (u64)dump->size - start, (u64)size);
+ buf = dump->buf + start - offset;
+ if (copy_to(dst, buf, tsz, userbuf)) {
+ ret = -EFAULT;
+ goto out_unlock;
+ }
+
+ size -= tsz;
+ start += tsz;
+ dst += tsz;
+
+ /* Leave now if buffer filled already */
+ if (!size)
+ goto out_unlock;
+ }
+ offset += dump->size;
+ }
+
+out_unlock:
+ mutex_unlock(&vmcoredd_mutex);
+ return ret;
+}
+
+static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
+ u64 start, size_t size)
+{
+ struct vmcoredd_node *dump;
+ u64 offset = 0;
+ int ret = 0;
+ size_t tsz;
+ char *buf;
+
+ mutex_lock(&vmcoredd_mutex);
+ list_for_each_entry(dump, &vmcoredd_list, list) {
+ if (start < offset + dump->size) {
+ tsz = min(offset + (u64)dump->size - start, (u64)size);
+ buf = dump->buf + start - offset;
+ if (remap_vmalloc_range_partial(vma, dst, buf, tsz)) {
+ ret = -EFAULT;
+ goto out_unlock;
+ }
+
+ size -= tsz;
+ start += tsz;
+ dst += tsz;
+
+ /* Leave now if buffer filled already */
+ if (!size)
+ goto out_unlock;
+ }
+ offset += dump->size;
+ }
+
+out_unlock:
+ mutex_unlock(&vmcoredd_mutex);
+ return ret;
+}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
/* Read from the ELF header and then the crash dump. On error, negative value is
* returned otherwise number of bytes read are returned.
*/
@@ -215,10 +298,41 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
if (*fpos < elfcorebuf_sz + elfnotes_sz) {
void *kaddr;
+ /* We add device dumps before other elf notes because the
+ * other elf notes may not fill the elf notes buffer
+ * completely and we will end up with zero-filled data
+ * between the elf notes and the device dumps. Tools will
+ * then try to decode this zero-filled data as valid notes
+ * and we don't want that. Hence, adding device dumps before
+ * the other elf notes ensure that zero-filled data can be
+ * avoided.
+ */
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+ /* Read device dumps */
+ if (*fpos < elfcorebuf_sz + vmcoredd_orig_sz) {
+ tsz = min(elfcorebuf_sz + vmcoredd_orig_sz -
+ (size_t)*fpos, buflen);
+ start = *fpos - elfcorebuf_sz;
+ if (vmcoredd_copy_dumps(buffer, start, tsz, userbuf))
+ return -EFAULT;
+
+ buflen -= tsz;
+ *fpos += tsz;
+ buffer += tsz;
+ acc += tsz;
+
+ /* leave now if filled buffer already */
+ if (!buflen)
+ return acc;
+ }
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+ /* Read remaining elf notes */
tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)*fpos, buflen);
- kaddr = elfnotes_buf + *fpos - elfcorebuf_sz;
+ kaddr = elfnotes_buf + *fpos - elfcorebuf_sz - vmcoredd_orig_sz;
if (copy_to(buffer, kaddr, tsz, userbuf))
return -EFAULT;
+
buflen -= tsz;
*fpos += tsz;
buffer += tsz;
@@ -302,10 +416,8 @@ static const struct vm_operations_struct vmcore_mmap_ops = {
};
/**
- * alloc_elfnotes_buf - allocate buffer for ELF note segment in
- * vmalloc memory
- *
- * @notes_sz: size of buffer
+ * vmcore_alloc_buf - allocate buffer in vmalloc memory
+ * @sizez: size of buffer
*
* If CONFIG_MMU is defined, use vmalloc_user() to allow users to mmap
* the buffer to user-space by means of remap_vmalloc_range().
@@ -313,12 +425,12 @@ static const struct vm_operations_struct vmcore_mmap_ops = {
* If CONFIG_MMU is not defined, use vzalloc() since mmap_vmcore() is
* disabled and there's no need to allow users to mmap the buffer.
*/
-static inline char *alloc_elfnotes_buf(size_t notes_sz)
+static inline char *vmcore_alloc_buf(size_t size)
{
#ifdef CONFIG_MMU
- return vmalloc_user(notes_sz);
+ return vmalloc_user(size);
#else
- return vzalloc(notes_sz);
+ return vzalloc(size);
#endif
}
@@ -446,11 +558,46 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
if (start < elfcorebuf_sz + elfnotes_sz) {
void *kaddr;
+ /* We add device dumps before other elf notes because the
+ * other elf notes may not fill the elf notes buffer
+ * completely and we will end up with zero-filled data
+ * between the elf notes and the device dumps. Tools will
+ * then try to decode this zero-filled data as valid notes
+ * and we don't want that. Hence, adding device dumps before
+ * the other elf notes ensure that zero-filled data can be
+ * avoided. This also ensures that the device dumps and
+ * other elf notes can be properly mmaped at page aligned
+ * address.
+ */
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+ /* Read device dumps */
+ if (start < elfcorebuf_sz + vmcoredd_orig_sz) {
+ u64 start_off;
+
+ tsz = min(elfcorebuf_sz + vmcoredd_orig_sz -
+ (size_t)start, size);
+ start_off = start - elfcorebuf_sz;
+ if (vmcoredd_mmap_dumps(vma, vma->vm_start + len,
+ start_off, tsz))
+ goto fail;
+
+ size -= tsz;
+ start += tsz;
+ len += tsz;
+
+ /* leave now if filled buffer already */
+ if (!size)
+ return 0;
+ }
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+ /* Read remaining elf notes */
tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)start, size);
- kaddr = elfnotes_buf + start - elfcorebuf_sz;
+ kaddr = elfnotes_buf + start - elfcorebuf_sz - vmcoredd_orig_sz;
if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
kaddr, tsz))
goto fail;
+
size -= tsz;
start += tsz;
len += tsz;
@@ -665,7 +812,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
return rc;
*notes_sz = roundup(phdr_sz, PAGE_SIZE);
- *notes_buf = alloc_elfnotes_buf(*notes_sz);
+ *notes_buf = vmcore_alloc_buf(*notes_sz);
if (!*notes_buf)
return -ENOMEM;
@@ -698,6 +845,11 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
+ /* Store the size of all notes. We need this to update the note
+ * header when the device dumps will be added.
+ */
+ elfnotes_orig_sz = phdr.p_memsz;
+
return 0;
}
@@ -851,7 +1003,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
return rc;
*notes_sz = roundup(phdr_sz, PAGE_SIZE);
- *notes_buf = alloc_elfnotes_buf(*notes_sz);
+ *notes_buf = vmcore_alloc_buf(*notes_sz);
if (!*notes_buf)
return -ENOMEM;
@@ -884,6 +1036,11 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
+ /* Store the size of all notes. We need this to update the note
+ * header when the device dumps will be added.
+ */
+ elfnotes_orig_sz = phdr.p_memsz;
+
return 0;
}
@@ -976,8 +1133,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
}
/* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
- struct list_head *vc_list)
+static void set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
+ struct list_head *vc_list)
{
loff_t vmcore_off;
struct vmcore *m;
@@ -1145,6 +1302,202 @@ static int __init parse_crash_elf_headers(void)
return 0;
}
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+/**
+ * vmcoredd_write_header - Write vmcore device dump header at the
+ * beginning of the dump's buffer.
+ * @buf: Output buffer where the note is written
+ * @data: Dump info
+ * @size: Size of the dump
+ *
+ * Fills beginning of the dump's buffer with vmcore device dump header.
+ */
+static void vmcoredd_write_header(void *buf, struct vmcoredd_data *data,
+ u32 size)
+{
+ struct vmcoredd_header *vdd_hdr = (struct vmcoredd_header *)buf;
+
+ vdd_hdr->n_namesz = sizeof(vdd_hdr->name);
+ vdd_hdr->n_descsz = size + sizeof(vdd_hdr->dump_name);
+ vdd_hdr->n_type = NT_VMCOREDD;
+
+ strncpy((char *)vdd_hdr->name, VMCOREDD_NOTE_NAME,
+ sizeof(vdd_hdr->name));
+ memcpy(vdd_hdr->dump_name, data->dump_name, sizeof(vdd_hdr->dump_name));
+}
+
+/**
+ * vmcoredd_update_program_headers - Update all Elf program headers
+ * @elfptr: Pointer to elf header
+ * @elfnotesz: Size of elf notes aligned to page size
+ * @vmcoreddsz: Size of device dumps to be added to elf note header
+ *
+ * Determine type of Elf header (Elf64 or Elf32) and update the elf note size.
+ * Also update the offsets of all the program headers after the elf note header.
+ */
+static void vmcoredd_update_program_headers(char *elfptr, size_t elfnotesz,
+ size_t vmcoreddsz)
+{
+ unsigned char *e_ident = (unsigned char *)elfptr;
+ u64 start, end, size;
+ loff_t vmcore_off;
+ u32 i;
+
+ vmcore_off = elfcorebuf_sz + elfnotesz;
+
+ if (e_ident[EI_CLASS] == ELFCLASS64) {
+ Elf64_Ehdr *ehdr = (Elf64_Ehdr *)elfptr;
+ Elf64_Phdr *phdr = (Elf64_Phdr *)(elfptr + sizeof(Elf64_Ehdr));
+
+ /* Update all program headers */
+ for (i = 0; i < ehdr->e_phnum; i++, phdr++) {
+ if (phdr->p_type == PT_NOTE) {
+ /* Update note size */
+ phdr->p_memsz = elfnotes_orig_sz + vmcoreddsz;
+ phdr->p_filesz = phdr->p_memsz;
+ continue;
+ }
+
+ start = rounddown(phdr->p_offset, PAGE_SIZE);
+ end = roundup(phdr->p_offset + phdr->p_memsz,
+ PAGE_SIZE);
+ size = end - start;
+ phdr->p_offset = vmcore_off + (phdr->p_offset - start);
+ vmcore_off += size;
+ }
+ } else {
+ Elf32_Ehdr *ehdr = (Elf32_Ehdr *)elfptr;
+ Elf32_Phdr *phdr = (Elf32_Phdr *)(elfptr + sizeof(Elf32_Ehdr));
+
+ /* Update all program headers */
+ for (i = 0; i < ehdr->e_phnum; i++, phdr++) {
+ if (phdr->p_type == PT_NOTE) {
+ /* Update note size */
+ phdr->p_memsz = elfnotes_orig_sz + vmcoreddsz;
+ phdr->p_filesz = phdr->p_memsz;
+ continue;
+ }
+
+ start = rounddown(phdr->p_offset, PAGE_SIZE);
+ end = roundup(phdr->p_offset + phdr->p_memsz,
+ PAGE_SIZE);
+ size = end - start;
+ phdr->p_offset = vmcore_off + (phdr->p_offset - start);
+ vmcore_off += size;
+ }
+ }
+}
+
+/**
+ * vmcoredd_update_size - Update the total size of the device dumps and update
+ * Elf header
+ * @dump_size: Size of the current device dump to be added to total size
+ *
+ * Update the total size of all the device dumps and update the Elf program
+ * headers. Calculate the new offsets for the vmcore list and update the
+ * total vmcore size.
+ */
+static void vmcoredd_update_size(size_t dump_size)
+{
+ vmcoredd_orig_sz += dump_size;
+ elfnotes_sz = roundup(elfnotes_orig_sz, PAGE_SIZE) + vmcoredd_orig_sz;
+ vmcoredd_update_program_headers(elfcorebuf, elfnotes_sz,
+ vmcoredd_orig_sz);
+
+ /* Update vmcore list offsets */
+ set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
+
+ vmcore_size = get_vmcore_size(elfcorebuf_sz, elfnotes_sz,
+ &vmcore_list);
+ proc_vmcore->size = vmcore_size;
+}
+
+/**
+ * vmcore_add_device_dump - Add a buffer containing device dump to vmcore
+ * @data: dump info.
+ *
+ * Allocate a buffer and invoke the calling driver's dump collect routine.
+ * Write Elf note at the beginning of the buffer to indicate vmcore device
+ * dump and add the dump to global list.
+ */
+int vmcore_add_device_dump(struct vmcoredd_data *data)
+{
+ struct vmcoredd_node *dump;
+ void *buf = NULL;
+ size_t data_size;
+ int ret;
+
+ if (!data || !strlen(data->dump_name) ||
+ !data->vmcoredd_callback || !data->size)
+ return -EINVAL;
+
+ dump = vzalloc(sizeof(*dump));
+ if (!dump) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ /* Keep size of the buffer page aligned so that it can be mmaped */
+ data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
+ PAGE_SIZE);
+
+ /* Allocate buffer for driver's to write their dumps */
+ buf = vmcore_alloc_buf(data_size);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ vmcoredd_write_header(buf, data, data_size -
+ sizeof(struct vmcoredd_header));
+
+ /* Invoke the driver's dump collection routing */
+ ret = data->vmcoredd_callback(data, buf +
+ sizeof(struct vmcoredd_header));
+ if (ret)
+ goto out_err;
+
+ dump->buf = buf;
+ dump->size = data_size;
+
+ /* Add the dump to driver sysfs list */
+ mutex_lock(&vmcoredd_mutex);
+ list_add_tail(&dump->list, &vmcoredd_list);
+ mutex_unlock(&vmcoredd_mutex);
+
+ vmcoredd_update_size(data_size);
+ return 0;
+
+out_err:
+ if (buf)
+ vfree(buf);
+
+ if (dump)
+ vfree(dump);
+
+ return ret;
+}
+EXPORT_SYMBOL(vmcore_add_device_dump);
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+/* Free all dumps in vmcore device dump list */
+static void vmcore_free_device_dumps(void)
+{
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+ mutex_lock(&vmcoredd_mutex);
+ while (!list_empty(&vmcoredd_list)) {
+ struct vmcoredd_node *dump;
+
+ dump = list_first_entry(&vmcoredd_list, struct vmcoredd_node,
+ list);
+ list_del(&dump->list);
+ vfree(dump->buf);
+ vfree(dump);
+ }
+ mutex_unlock(&vmcoredd_mutex);
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+}
+
/* Init function for vmcore module. */
static int __init vmcore_init(void)
{
@@ -1192,4 +1545,7 @@ void vmcore_cleanup(void)
kfree(m);
}
free_elfcorebuf();
+
+ /* clear vmcore device dump list */
+ vmcore_free_device_dumps();
}
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index f7ac2aa93269..3e4ba9d753c8 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -5,6 +5,7 @@
#include <linux/kexec.h>
#include <linux/proc_fs.h>
#include <linux/elf.h>
+#include <uapi/linux/vmcore.h>
#include <asm/pgtable.h> /* for pgprot_t */
@@ -93,4 +94,21 @@ static inline bool is_kdump_kernel(void) { return 0; }
#endif /* CONFIG_CRASH_DUMP */
extern unsigned long saved_max_pfn;
+
+/* Device Dump information to be filled by drivers */
+struct vmcoredd_data {
+ char dump_name[VMCOREDD_MAX_NAME_BYTES]; /* Unique name of the dump */
+ unsigned int size; /* Size of the dump */
+ /* Driver's registered callback to be invoked to collect dump */
+ int (*vmcoredd_callback)(struct vmcoredd_data *data, void *buf);
+};
+
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+int vmcore_add_device_dump(struct vmcoredd_data *data);
+#else
+static inline int vmcore_add_device_dump(struct vmcoredd_data *data)
+{
+ return -EOPNOTSUPP;
+}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
#endif /* LINUX_CRASHDUMP_H */
diff --git a/include/linux/kcore.h b/include/linux/kcore.h
index 80db19d3a505..8de55e4b5ee9 100644
--- a/include/linux/kcore.h
+++ b/include/linux/kcore.h
@@ -28,6 +28,12 @@ struct vmcore {
loff_t offset;
};
+struct vmcoredd_node {
+ struct list_head list; /* List of dumps */
+ void *buf; /* Buffer containing device's dump */
+ unsigned int size; /* Size of the buffer */
+};
+
#ifdef CONFIG_PROC_KCORE
extern void kclist_add(struct kcore_list *, void *, size_t, int type);
#else
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index e2535d6dcec7..4e12c423b9fe 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -421,6 +421,7 @@ typedef struct elf64_shdr {
#define NT_ARM_SYSTEM_CALL 0x404 /* ARM system call number */
#define NT_ARM_SVE 0x405 /* ARM Scalable Vector Extension registers */
#define NT_ARC_V2 0x600 /* ARCv2 accumulator/extra registers */
+#define NT_VMCOREDD 0x700 /* Vmcore Device Dump Note */
/* Note header in a PT_NOTE section */
typedef struct elf32_note {
diff --git a/include/uapi/linux/vmcore.h b/include/uapi/linux/vmcore.h
new file mode 100644
index 000000000000..022619668e0e
--- /dev/null
+++ b/include/uapi/linux/vmcore.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _UAPI_VMCORE_H
+#define _UAPI_VMCORE_H
+
+#include <linux/types.h>
+
+#define VMCOREDD_NOTE_NAME "LINUX"
+#define VMCOREDD_MAX_NAME_BYTES 44
+
+struct vmcoredd_header {
+ __u32 n_namesz; /* Name size */
+ __u32 n_descsz; /* Content size */
+ __u32 n_type; /* NT_VMCOREDD */
+ __u8 name[8]; /* LINUX\0\0\0 */
+ __u8 dump_name[VMCOREDD_MAX_NAME_BYTES]; /* Device dump's name */
+};
+
+#endif /* _UAPI_VMCORE_H */