Problems with swapping in v4.5-rc on POWER
Hugh Dickins
hughd at google.com
Tue Mar 8 22:49:55 AEDT 2016
On Mon, 7 Mar 2016, Michael Ellerman wrote:
> On Fri, 2016-03-04 at 09:58 -0800, Hugh Dickins wrote:
> >
> > The alternative bisection was as unsatisfactory as the first:
> > again it fingered an irrelevant merge (rather than any commit
> > pulled in by that merge) as the bad commit.
> >
> > It seems this issue is too intermittent for bisection to be useful,
> > on my load anyway.
>
> Darn. Thanks for trying.
>
> > The best I can do now is try v4.4 for a couple of days, to verify that
> > still comes out good (rather than the machine going bad coincident with
> > v4.5-rc), then try v4.5-rc7 to verify that that still comes out bad.
>
> Thanks, that would still be helpful.
v4.4 ran under load for 56 hours without any trouble, before I stopped
it to switch kernels. v4.5-rc7 ran for 19.5 hours, then hit the problem
(sigsegv in "as" on this occasion).
>
> > I'll report back on those; but beyond that, I'll have to leave it to you.
>
> I haven't had any luck here :/
>
> Can you give us a more verbose description of your test setup?
I'll be a lot more terse than you'd like, not much time to spare.
If I had a good reproducer, then of course I should specify it exactly
to you; but no, 19.5 hours or 5 hours or a few minutes, that does not
amount to a good reproducer.
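A rough sketch of why such a spread makes a poor reproducer, and why it undermines bisection. It assumes, purely for illustration, that the time to failure on a bad kernel is exponentially distributed; the mean values tried below are guesses taken from the run times quoted here, not measurements, and the function names are made up for this example.

```python
import math


def p_false_good(test_hours, mean_hours_to_failure):
    """Chance that a bad kernel survives a test of the given length.

    Assumes exponentially distributed time to failure (an assumption,
    not something established in this thread).
    """
    return math.exp(-test_hours / mean_hours_to_failure)


def p_clean_bisect(bad_steps, test_hours, mean_hours_to_failure):
    """Chance that every bad commit tested during a bisection is caught."""
    caught = 1.0 - p_false_good(test_hours, mean_hours_to_failure)
    return caught ** bad_steps


if __name__ == "__main__":
    # Hypothetical mean times to failure, in hours.
    for mean in (0.5, 5.0, 19.5):
        miss = p_false_good(12.0, mean)
        print(f"mean {mean:4.1f}h: a 12h run misses a bad kernel with p={miss:.2f}")
```

Under that assumption, if the mean time to failure is anywhere near the 19.5 hours seen on v4.5-rc7, a 12-hour run marks a bad kernel as good more often than not, and a bisection with several bad steps is unlikely to come out clean.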
>
> - G5, which exact model?
/proc/cpuinfo says:

processor : 0
cpu : PPC970MP, altivec supported
clock : 2500.000000MHz
revision : 1.1 (pvr 0044 0101)

processor : 1
cpu : PPC970MP, altivec supported
clock : 2500.000000MHz
revision : 1.1 (pvr 0044 0101)

processor : 2
cpu : PPC970MP, altivec supported
clock : 2500.000000MHz
revision : 1.1 (pvr 0044 0101)

processor : 3
cpu : PPC970MP, altivec supported
clock : 2500.000000MHz
revision : 1.1 (pvr 0044 0101)

timebase : 33333333
platform : PowerMac
model : PowerMac11,2
machine : PowerMac11,2
motherboard : PowerMac11,2 MacRISC4 Power Macintosh
detected as : 337 (PowerMac G5 Dual Core)
pmac flags : 00000000
L2 cache : 1024K unified
pmac-generation : NewWorld
> - 4k pages, no THP.
Yes.
> - how much ram & swap?
I boot with mem=700M, and use 1.5G swap.
> - building linus' tree, make -j ?
Building an old 2.6.24 tree (which had a higher source-to-built ratio
than nowadays; with patches to get it to build with a more recent toolchain,
from openSUSE 13.1); building some config I used to run on that machine.
Building two of them, each make -j20, concurrently: one in tmpfs,
one in 4kB-blocksize ext4 on loop on tmpfs file. But I doubt that
complication is relevant here: sometimes it's the build in tmpfs
that collapses, sometimes the build in ext4, it's fairly even which.
(Do not bother to attempt such a load on linux-next, only on v4.5:
the OOM rework in mmotm has an unsolved problem with order=2 allocations,
which means that such a load will be OOM-killed very quickly.)
> - source and output on tmpfs? (how big?)
One source and output in ext4 on loop on file filling 470M tmpfs.
Other source and output in tmpfs on /tmp which I happen to size at 1300M
(but could be half that). Sizes of course fitted to that source tree
and config I happen to be building.
> - what device is the swap device? (you said SSD I think?)
Old 75G Intel SSD:
ata2.00: ATA-7: INTEL SSDSA2M080G2GN, 2CV102HD, max UDMA/133
> - anything else I've forgotten?
I happen to run with /proc/sys/vm/swappiness 100,
merely because it's swapping that I'm trying to exercise.
I doubt that any of the details above are important: plenty of
swapping is probably the only message (and doing everything in
tmpfs in limited memory is a good way to force plenty of swapping).
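The load described above can be sketched as a dry-run command list. This is only an approximation under stated assumptions: the mount points, the image file name, and the source directory names are placeholders that do not come from this mail, and the swap device is assumed to be enabled already on a kernel booted with mem=700M.

```python
def swap_load_plan(tmp_dir="/tmp", img_fs="/mnt/imgfs", ext4_dir="/mnt/ext4", jobs=20):
    """Return shell commands approximating the described load, as a dry run.

    All paths are hypothetical placeholders; nothing is executed here.
    """
    image = f"{img_fs}/ext4.img"  # hypothetical file backing the loop mount
    src = "linux-2.6.24"          # hypothetical directory name for the old tree
    return [
        # Bias reclaim towards swapping, as in the mail.
        "sysctl -w vm.swappiness=100",
        # First build area: tmpfs on /tmp, sized at 1300M.
        f"mount -t tmpfs -o size=1300M tmpfs {tmp_dir}",
        # Second build area: 4kB-blocksize ext4 on loop on a file in a 470M tmpfs.
        f"mount -t tmpfs -o size=470M tmpfs {img_fs}",
        f"truncate -s 470M {image}",
        f"mkfs.ext4 -F -b 4096 {image}",
        f"mount -o loop {image} {ext4_dir}",
        # Two concurrent builds, each with make -j20.
        f"make -C {tmp_dir}/{src} -j{jobs} &",
        f"make -C {ext4_dir}/{src} -j{jobs} &",
        "wait",
    ]


if __name__ == "__main__":
    print("\n".join(swap_load_plan()))
```

Printing the plan rather than running it keeps the sketch harmless; the commands need root and would have to be adapted to the actual machine before use.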
>
> Oh and can you send us your bisect logs, we can at least trust the bad results
> I think.
Remember that both of these bisections started from 4.5-rc1 as bad,
and f689b742f217, the powerpc merge, as good - since I didn't see a
problem at that commit in 12 hours. But we all suspect that something
in that powerpc merge is in fact the culprit.
git bisect start
# good: [f689b742f217b2ffe7925f8a6521b208ee995309] Merge tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good f689b742f217b2ffe7925f8a6521b208ee995309
# bad: [92e963f50fc74041b5e9e744c330dca48e04f08d] Linux 4.5-rc1
git bisect bad 92e963f50fc74041b5e9e744c330dca48e04f08d
# bad: [7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
git bisect bad 7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015
# bad: [6606b342febfd470b4a33acb73e360eeaca1d9bb] Merge git://www.linux-watchdog.org/linux-watchdog
git bisect bad 6606b342febfd470b4a33acb73e360eeaca1d9bb
# good: [d0021d3bdfe9d551859bca1f58da0e6be8e26043] Merge remote-tracking branch 'asoc/topic/wm8960' into asoc-next
git bisect good d0021d3bdfe9d551859bca1f58da0e6be8e26043
# good: [e3315b439c30c208582ac64e58f0c0d36b83181e] ALSA: oxfw: allocate own address region for SCS.1 series
git bisect good e3315b439c30c208582ac64e58f0c0d36b83181e
# good: [3da834e3e5a4a5d26882955298b55a9ed37a00bc] clk: remove duplicated COMMON_CLK_NXP record from clk/Kconfig
git bisect good 3da834e3e5a4a5d26882955298b55a9ed37a00bc
# bad: [e535d74bc50df2357d3253f8f3ca48c66d0d892a] Merge tag 'docs-4.5' of git://git.lwn.net/linux
git bisect bad e535d74bc50df2357d3253f8f3ca48c66d0d892a
# bad: [4e5448a31d73d0e944b7adb9049438a09bc332cb] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect bad 4e5448a31d73d0e944b7adb9049438a09bc332cb
# good: [b70ce2ab41cb67ab3d661eda078f7c4029bbca95] dts: hisi: fixes no syscon fault when init mdio
git bisect good b70ce2ab41cb67ab3d661eda078f7c4029bbca95
# good: [4a658527271bce43afb1cf4feec89afe6716ca59] xen-netback: delete NAPI instance when queue fails to initialize
git bisect good 4a658527271bce43afb1cf4feec89afe6716ca59
# good: [c6894dec8ea9ae05747124dce98b3b5c2e69b168] bridge: fix lockdep addr_list_lock false positive splat
git bisect good c6894dec8ea9ae05747124dce98b3b5c2e69b168
# good: [36beca6571c941b28b0798667608239731f9bc3a] sparc64: Fix numa node distance initialization
git bisect good 36beca6571c941b28b0798667608239731f9bc3a
# good: [750afbf8ee9c6a1c74a1fe5fc9852146b1d72687] bgmac: Fix reversed test of build_skb() return value.
git bisect good 750afbf8ee9c6a1c74a1fe5fc9852146b1d72687
# good: [5a18d263f8d27418c98b8e8551dadfe975c054e3] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect good 5a18d263f8d27418c98b8e8551dadfe975c054e3
# first bad commit: [4e5448a31d73d0e944b7adb9049438a09bc332cb] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
And then I replayed, taking the davem/net merge as good instead,
on the basis that it had taken longer than usual to hit the issue:
git bisect start
# good: [f689b742f217b2ffe7925f8a6521b208ee995309] Merge tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good f689b742f217b2ffe7925f8a6521b208ee995309
# bad: [92e963f50fc74041b5e9e744c330dca48e04f08d] Linux 4.5-rc1
git bisect bad 92e963f50fc74041b5e9e744c330dca48e04f08d
# bad: [7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
git bisect bad 7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015
# bad: [6606b342febfd470b4a33acb73e360eeaca1d9bb] Merge git://www.linux-watchdog.org/linux-watchdog
git bisect bad 6606b342febfd470b4a33acb73e360eeaca1d9bb
# good: [d0021d3bdfe9d551859bca1f58da0e6be8e26043] Merge remote-tracking branch 'asoc/topic/wm8960' into asoc-next
git bisect good d0021d3bdfe9d551859bca1f58da0e6be8e26043
# good: [e3315b439c30c208582ac64e58f0c0d36b83181e] ALSA: oxfw: allocate own address region for SCS.1 series
git bisect good e3315b439c30c208582ac64e58f0c0d36b83181e
# good: [3da834e3e5a4a5d26882955298b55a9ed37a00bc] clk: remove duplicated COMMON_CLK_NXP record from clk/Kconfig
git bisect good 3da834e3e5a4a5d26882955298b55a9ed37a00bc
# bad: [e535d74bc50df2357d3253f8f3ca48c66d0d892a] Merge tag 'docs-4.5' of git://git.lwn.net/linux
git bisect bad e535d74bc50df2357d3253f8f3ca48c66d0d892a
# good: [4e5448a31d73d0e944b7adb9049438a09bc332cb] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good 4e5448a31d73d0e944b7adb9049438a09bc332cb
# good: [aa13a960fc1bd28cfd8b3aef43e523ade1817a2c] Documentation: cpu-hotplug: Fix sysfs mount instructions
git bisect good aa13a960fc1bd28cfd8b3aef43e523ade1817a2c
# good: [afd8c08446d6503adc1ccd2726a8e27f35d95b79] Documentation: Explain pci=conf1,conf2 more verbosely
git bisect good afd8c08446d6503adc1ccd2726a8e27f35d95b79
# good: [e5b6c1518878e157df4121c1caf70d9c470a6d31] firmware: dmi_scan: Save SMBIOS Type 9 System Slots
git bisect good e5b6c1518878e157df4121c1caf70d9c470a6d31
# good: [ec3fc58b1e7a32cc9f552b306f8dbb4454e83798] thermal: add description for integral_cutoff unit
git bisect good ec3fc58b1e7a32cc9f552b306f8dbb4454e83798
# bad: [ece6267878aed4eadff766112f1079984315d8c8] Merge tag 'clk-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect bad ece6267878aed4eadff766112f1079984315d8c8
# bad: [d45187aaf0e256d23da2f7694a7826524499aa31] Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
git bisect bad d45187aaf0e256d23da2f7694a7826524499aa31
# first bad commit: [d45187aaf0e256d23da2f7694a7826524499aa31] Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
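For anyone comparing the two logs above, a minimal sketch of a helper that extracts the good/bad verdicts from "git bisect log" output and reports the first step at which two logs disagree. The function names are made up for this example, and the parser relies only on the "# good: [sha] subject" and "# bad: [sha] subject" comment lines seen in the logs.

```python
import re

# Matches the comment lines written by "git bisect log", for example:
#   # good: [<40-hex sha>] <subject>
# Lines such as "# first bad commit: ..." do not match and are skipped.
MARK = re.compile(r"^# (good|bad): \[([0-9a-f]{40})\] (.*)$")


def parse_bisect_log(text):
    """Return a list of (verdict, sha, subject) in the order they were marked."""
    marks = []
    for line in text.splitlines():
        m = MARK.match(line.strip())
        if m:
            marks.append(m.groups())
    return marks


def first_divergence(log_a, log_b):
    """Return (step, mark_a, mark_b) for the first differing step, else None.

    Steps are counted from 0. Only the common prefix of the two logs is
    compared; a longer tail in one log is not reported.
    """
    pairs = zip(parse_bisect_log(log_a), parse_bisect_log(log_b))
    for step, (a, b) in enumerate(pairs):
        if a != b:
            return step, a, b
    return None
```

Fed the two logs in this mail, it should point at the davem/net merge (4e5448a31d73), marked bad in the first run and good in the second. Either log can also be fed back to git with "git bisect replay".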
Hugh