[Bug 205183] New: PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
bugzilla-daemon at bugzilla.kernel.org
Mon Oct 14 02:56:02 AEDT 2019
https://bugzilla.kernel.org/show_bug.cgi?id=205183
Bug ID: 205183
Summary: PPC64: Signal delivery fails with SIGSEGV if between
about 1KB and 4KB bytes of stack remain
Product: Platform Specific/Hardware
Version: 2.5
Kernel Version: 4.19.15 and others
Hardware: PPC-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: PPC-64
Assignee: platform_ppc-64 at kernel-bugs.osdl.org
Reporter: tgl at sss.pgh.pa.us
Regression: No
Created attachment 285487
--> https://bugzilla.kernel.org/attachment.cgi?id=285487&action=edit
stacktest.c
If between about 1K and 4K bytes remain in a process' existing stack
segment, an attempt to deliver a signal for which the process has installed
a handler results in SIGSEGV instead. The kernel should extend the process'
stack so that the signal can be handled, but it does not.
The attached test program illustrates this. It requires a parameter specifying
the amount of stack to consume before sleeping. Wake the process with a
manual kill -USR1. An example of a successful case is:
[tgl at postgresql-fedora ~]$ gcc -g -Wall -O stacktest.c
[tgl at postgresql-fedora ~]$ ./a.out 1240000 &
[1] 7922
[tgl at postgresql-fedora ~]$ cat /proc/7922/maps | grep stack
7fffc9970000-7fffc9aa0000 rw-p 00000000 00:00 0                          [stack]
[tgl at postgresql-fedora ~]$ kill -USR1 7922
[tgl at postgresql-fedora ~]$ signal delivered, stack base 0x7fffc9aa0000 top 0x7fffc9971420 (1240032 used)
[1]+ Done ./a.out 1240000
The above example shows that 0x7fffc9971420 - 0x7fffc9970000 = 5152 bytes
of free stack are enough to deliver the signal. With a slightly larger
parameter, however, delivery fails:
[tgl at postgresql-fedora ~]$ ./a.out 1241000 &
[1] 7941
[tgl at postgresql-fedora ~]$ kill -USR1 7941
[tgl at postgresql-fedora ~]$
[1]+ Segmentation fault (core dumped) ./a.out 1241000
With a still larger parameter, corresponding to just a few hundred bytes left,
it works again. This shows that the kernel does know how to enlarge the stack
in such cases; it has just got a boundary condition wrong somewhere.
On the particular userland toolchain I'm using here, parameters between about
1241000 and 1244000 (free space between about 1200 and 4200 bytes) will show
the error, but you might need to adjust the values a bit on a different system.
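
For readers without the attachment, the core of a test program of this kind
might look like the sketch below. It is NOT the attached stacktest.c, only a
hypothetical reconstruction from the description above: all function names
are made up for illustration, it assumes a downward-growing stack, and it
measures depth from a caller-supplied address rather than from the stack
mapping's base, so its numbers will not match the transcript exactly.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t is the type that is safe to write here. */
static volatile sig_atomic_t got_signal = 0;

static void on_usr1(int signo)
{
    (void) signo;
    got_signal = 1;
}

/* Install the SIGUSR1 handler; it runs on the normal stack (no sigaltstack). */
int install_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Sleep until the handler has run.  There is a small window between the
 * check and pause(); that is acceptable for a manually sent signal.
 */
void wait_for_signal(void)
{
    while (!got_signal)
        pause();
}

/*
 * Recurse until at least `target` bytes lie between `base` (an address
 * taken near the top of the stack) and the current frame, then call
 * `at_depth`.  Returns the depth reached, in bytes.
 * Assumption: the stack grows toward lower addresses.
 */
long consume_stack(const char *base, long target, void (*at_depth)(void))
{
    volatile char pad[256];   /* forces a real, touched frame per level */
    long used = (long) ((uintptr_t) base - (uintptr_t) pad);
    long reached;

    pad[0] = 0;
    if (used < target)
        reached = consume_stack(base, target, at_depth);
    else {
        at_depth();
        reached = used;
    }
    pad[255] = pad[0];        /* use pad after the call: no tail call */
    return reached;
}
```

A main() for this sketch would call install_handler(), take the address of
one of its own local variables as `base`, and call
consume_stack(base, atol(argv[1]), wait_for_signal), after which the process
can be woken with kill -USR1 as in the transcripts above.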
The Postgres project has been chasing errors caused by this bug for months, and
we've seen it happen on a range of PPC64 kernels from 4.4.0 up to 4.19.15, but
not on other architectures, nor on non-Linux PPC64. My colleague Thomas Munro
found a possible explanation in
https://github.com/torvalds/linux/blob/master/arch/powerpc/mm/fault.c#L251
which claims that
* The kernel signal delivery code writes up to about 1.5kB
* below the stack pointer (r1) before decrementing it.
and that seems to be the justification for the "2048" magic number at line 276.
Perhaps that number applies only to PPC32, and PPC64 requires more space? At
the very least, this function's other magic number of 0x100000 seems highly
suspicious in view of the fact that we don't see the bug until the process has
consumed at least 1MB of stack space. (Hence, please use values > 1MB with the
test program.)
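
To make the suspicion concrete, here is a small user-space model of how the
two magic numbers could interact. It is a simplified sketch, not the kernel's
code: the function name, the parameter names, and the exact form of the two
comparisons are assumptions made for illustration, and the real function in
fault.c may apply further conditions that are not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two magic numbers mentioned in the report. */
#define FREE_GROWTH_LIMIT UINT64_C(0x100000)  /* 1MB */
#define BELOW_SP_SLACK    UINT64_C(2048)

/*
 * Hypothetical model: a fault below the current stack mapping is refused
 * only if the stack would then extend more than 1MB below its end AND the
 * faulting address is more than 2048 bytes below the stack pointer.
 *
 *   fault_addr  address being written below the mapped stack
 *   stack_end   highest address of the stack mapping
 *   sp          user stack pointer (r1) at the time of the fault
 */
bool expansion_refused(uint64_t fault_addr, uint64_t stack_end, uint64_t sp)
{
    if (fault_addr + FREE_GROWTH_LIMIT >= stack_end)
        return false;   /* within the first 1MB: no further checks */
    return fault_addr + BELOW_SP_SLACK < sp;
}
```

Under this model, with more than 1MB of stack in use, a write 1500 bytes
below sp is accepted but a write 3000 bytes below sp is refused, whereas the
same 3000-byte write is accepted while the stack is still under 1MB. That
would match the observation that the failure appears only above 1MB, provided
the PPC64 signal frame really is larger than 2048 bytes, which is a
hypothesis here and not something this sketch verifies.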