powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine check errors.

On a non-recoverable MCE error in kernel space, the Linux kernel panics and the system reboots. On a BMC based system, opal-prd runs as a daemon in the host. Hence, a kernel crash may prevent opal-prd from detecting and analyzing the MCE error. This may leave us in a situation where the faulty memory never gets de-configured and Linux keeps hitting the same MCE error again and again. If this happens at an early stage of kernel initialization, Linux will keep crashing and rebooting in a loop.

This patch fixes the issue by invoking the new opal_cec_reboot2() call with reboot type OPAL_REBOOT_PLATFORM_ERROR to inform BMC/OCC about the error, so that the BMC can collect relevant data for error analysis and decide which component to de-configure before rebooting.

This patch depends on the OPAL patchset posted on the skiboot mailing list at https://lists.ozlabs.org/pipermail/skiboot/2015-July/001771.html, which introduces the opal_cec_reboot2() OPAL call.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
author: Mahesh Salgaonkar 2015-07-31 17:54:38 +0200
committer: Michael Ellerman 2015-08-06 07:10:18 +0200
commit: e784b6499d9cba83b7f3f032b7ee01f7ca96ad91 (patch)
tree: 763f91d5dceef667f79a4f3013a65a278df413b1 /arch/powerpc/platforms/powernv/opal.c
parent: powerpc/powernv: Pull all HMI events before panic. (diff)
1 files changed, 35 insertions, 0 deletions
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index f084afa0e3ba..a2b53f292427 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -441,6 +441,7 @@ static int opal_recover_mce(struct pt_regs *regs,
 int opal_machine_check(struct pt_regs *regs)
 {
 	struct machine_check_event evt;
+	int ret;
 
 	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
 		return 0;
@@ -455,6 +456,40 @@ int opal_machine_check(struct pt_regs *regs)
 
 	if (opal_recover_mce(regs, &evt))
 		return 1;
+
+	/*
+	 * Unrecovered machine check, we are heading down the panic path.
+	 *
+	 * We may have hit this MCE at a very early stage of kernel
+	 * initialization, even before opal-prd has started running. If
+	 * this is the case, then this MCE error may go unnoticed or
+	 * unanalyzed if we go down the panic path. We need to inform
+	 * BMC/OCC about this error so that they can collect relevant
+	 * data for error analysis before rebooting.
+	 * Use opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR) to do so.
+	 * This function may not return on a BMC based system.
+	 */
+	ret = opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR,
+			"Unrecoverable Machine Check exception");
+	if (ret == OPAL_UNSUPPORTED) {
+		pr_emerg("Reboot type %d not supported\n",
+					OPAL_REBOOT_PLATFORM_ERROR);
+	}
+
+	/*
+	 * We reached here. There are three possibilities:
+	 * 1. We are running on a firmware level that does not support
+	 *    opal_cec_reboot2().
+	 * 2. We are running on a firmware level that does not support
+	 *    the OPAL_REBOOT_PLATFORM_ERROR reboot type.
+	 * 3. We are running on an FSP based system that does not need OPAL
+	 *    to trigger a checkstop explicitly for error analysis. The FSP
+	 *    PRD component would already have been notified about this
+	 *    error through other channels.
+	 *
+	 * In any case, let us just fall through. We are heading down
+	 * the panic path anyway.
+	 */
 	return 0;
 }
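
The control flow added to the end of opal_machine_check() can be tried outside the kernel with a small user-space sketch. Everything below is illustrative and not part of the patch: machine_check_tail(), the two stub functions, the reboot2_fn type and the warnings_logged counter are hypothetical names, the firmware call is replaced by a function pointer, pr_emerg() is replaced by fprintf(), and the numeric constants are assumed values that should be checked against the OPAL API headers.

```c
#include <stdio.h>

/* Assumed values for illustration; the OPAL API headers are authoritative. */
#define OPAL_SUCCESS                 0
#define OPAL_UNSUPPORTED           (-7)
#define OPAL_REBOOT_PLATFORM_ERROR   1

/* Hypothetical stand-in for the opal_cec_reboot2() firmware call. */
typedef int (*reboot2_fn)(int reboot_type, const char *diag);

/* Counts how often the "not supported" message was emitted. */
static int warnings_logged;

/* Stub: firmware that rejects the requested reboot type. */
static int stub_reboot2_unsupported(int reboot_type, const char *diag)
{
	(void)reboot_type;
	(void)diag;
	return OPAL_UNSUPPORTED;
}

/* Stub: firmware that accepts the call but returns to the caller. */
static int stub_reboot2_returns(int reboot_type, const char *diag)
{
	(void)reboot_type;
	(void)diag;
	return OPAL_SUCCESS;
}

/*
 * Mirrors the tail of opal_machine_check(): return 1 if the MCE was
 * recovered; otherwise notify firmware, log if the reboot type is
 * unsupported, and return 0 so the caller continues to the panic path.
 */
static int machine_check_tail(int recovered, reboot2_fn reboot2)
{
	int ret;

	if (recovered)
		return 1;

	ret = reboot2(OPAL_REBOOT_PLATFORM_ERROR,
		      "Unrecoverable Machine Check exception");
	if (ret == OPAL_UNSUPPORTED) {
		fprintf(stderr, "Reboot type %d not supported\n",
			OPAL_REBOOT_PLATFORM_ERROR);
		warnings_logged++;
	}

	/* Whatever the firmware returned, fall through to the panic path. */
	return 0;
}
```

Calling machine_check_tail(0, stub_reboot2_unsupported) logs the message and returns 0, and machine_check_tail(0, stub_reboot2_returns) returns 0 silently. The return value of the firmware call only affects logging, never the decision to panic. On a real BMC based system the call may not return at all, which this sketch does not model.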