instruction storage exception handling

Jacques de Laval jacques.delaval at protonmail.com
Wed Oct 27 09:44:17 AEDT 2021


Hi,

We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
T1023 SOC with two e5500 cores, and are running in 32-bit mode:

	CONFIG_PPC32=y
	# CONFIG_PPC64 is not set

	#
	# Processor support
	#
	# CONFIG_PPC_BOOK3S_32 is not set
	CONFIG_PPC_85xx=y
	# CONFIG_PPC_8xx is not set
	# CONFIG_40x is not set
	# CONFIG_44x is not set
	CONFIG_GENERIC_CPU=y
	# CONFIG_E5500_CPU is not set
	# CONFIG_E6500_CPU is not set
	CONFIG_E500=y
	CONFIG_PPC_E500MC=y
	CONFIG_PPC_FPU_REGS=y
	CONFIG_PPC_FPU=y
	CONFIG_FSL_EMB_PERFMON=y
	CONFIG_FSL_EMB_PERF_EVENT=y
	CONFIG_FSL_EMB_PERF_EVENT_E500=y
	CONFIG_BOOKE=y
	CONFIG_FSL_BOOKE=y
	CONFIG_PPC_FSL_BOOK3E=y
	CONFIG_PTE_64BIT=y
	CONFIG_PHYS_64BIT=y
	CONFIG_PPC_MMU_NOHASH=y
	CONFIG_PPC_BOOK3E_MMU=y
	# CONFIG_PMU_SYSFS is not set
	CONFIG_SMP=y
	CONFIG_NR_CPUS=2
	CONFIG_PPC_DOORBELL=y
	# end of Processor support

We compile using 32-bit Bootlin PPC toolchain:

	powerpc-e500mc glibc bleeding-edge 2020.08-1.

When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
process is running, and for debugging purposes our init currently looks like
this:

	int main(int argc, char *argv[]){
		for (int i = 0; ; i++) {
			FILE *fp = fopen("/dev/kmsg", "w");
			if (fp) {
				fprintf(fp, "%d\n", i);
				fclose(fp);
			}
			sleep(1);
		}
		return 0;
	}

When the hangup occur we don't get any output at all from our PID 1.
The last output is from the kernel:

	Run /sbin/init as init process
	  with arguments:
	    /sbin/init
	  with environment:
	    HOME=/
	    TERM=linux
	    kgdboc=ttyS0,115200

When issuing a backtrace on all active cpus we can see that the kernel is
handling an instruction storage exception:

	sysrq: Show backtrace of all active CPUs
	sysrq: CPU0:
	CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
	NIP:  c02aac78 LR: c02aac2c CTR: 00000000
	REGS: c1907d40 TRAP: 0500   Not tainted  (5.14.11)
	MSR:  00029002 <CE,EE,ME>  CR: 82244284  XER: 20000000
	GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
	GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
	GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
	NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
	LR [c02aac2c] handle_mm_fault+0xac/0x11f0
	Call Trace:
	[c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
	[c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
	[c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
	[c1907f10] [c0000988] InstructionStorage+0x108/0x120
	--- interrupt: 400 at 0x65a238
	NIP:  0065a238 LR: 0052f26c CTR: 0052f260
	REGS: c1907f20 TRAP: 0400   Not tainted  (5.14.11)
	MSR:  0002d002 <CE,EE,PR,ME>  CR: 42242284  XER: 00000000
	GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
	GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
	GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
	NIP [0065a238] 0x65a238
	LR [0052f26c] 0x52f26c
	--- interrupt: 400
	Instruction dump:
	60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
	90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc

We have also observed that the CPU is continuously servicing the same interrupt
(north of 140k times per sec), it is not deadlocked.

We have not yet been able to reproduce this behavior under QEMU system
emulation.

When bisecting between 5.10 and 5.14.11 we can see that this behavior started
with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:

	powerpc: remove arguments from fault handler functions

Our best guess that the instruction storage exception is not properly handled
and the kernel is never able to recover from the page fault, but we don't
really know how to proceed. Does anyone have any suggestions or insights?

Regards,
Jacques de Laval


More information about the Linuxppc-dev mailing list