coherency issue observed after hotplug on POWER8
Naveen N. Rao
naveen.n.rao at linux.ibm.com
Sat Sep 25 03:17:09 AEST 2021
Hi Cascardo,
Thanks for reporting this.
Thadeu Lima de Souza Cascardo wrote:
> Hi, there.
>
> We have been investigating an issue we have observed on POWER8 POWERNV systems.
> When running the kernel selftests reuseport_bpf_cpu after a CPU hotplug, we see
> crashes, in different forms. [1]
Just to re-confirm: you are only seeing this on P8 powernv, and not in a
P8 guest/LPAR? I haven't been able to reproduce this on a Firestone --
can you share more details about your POWER8 machine?
Also, do you only see this with Ubuntu kernels, or are you also able to
reproduce this with the upstream tree?
>
> I managed to get xmon on that trap, and did some debugging. [2] I tried to dump
> the BPF JIT code, and it looks different when dumped from CPU#0 and CPU#0x9f
> (the one that was hotplugged, offlined, then onlined).
Next time you reproduce this, can you try dumping the SLBs for the cpus
(command 'u' in xmon)?
>
> Here is my partial analysis [3]. Basically, the BPF JIT fills a page with
> invalid instructions (traps, in the ppc64 case), and puts the BPF program at a
> random offset of the page. In the case of the hotplugged CPU, which was the one
> that compiled the program, the page had the expected contents (BPF program
> started at the offset used to run the program). On the other CPU (in many
> cases, CPU #0), the same memory address/page had different contents, with the
> program starting at a different offset.
From [3], I think fp->aux->jit_data can be NULL if there are subprogs.
But, I find it interesting that you don't always see the correct
bpf_func, as reported in comment #25. Can you also try dumping the full
bpf_prog structure (prog/fp) from xmon?
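
For reference, the layout described in the quoted analysis (a page filled
with traps, with the program copied in at a random offset) can be sketched
in userspace C roughly as below. This is only an illustration, not the
kernel's JIT allocator: the sketch_* names, the 4K page size and the
seeded rand() are all assumptions made for the example, and 0x7fe00008 is
used as the ppc64 'trap' filler word.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumptions for illustration only: page size and names are hypothetical. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_PAGE_WORDS (SKETCH_PAGE_SIZE / sizeof(uint32_t))
#define SKETCH_TRAP_INSN  0x7fe00008u   /* ppc64 'trap', used as filler */

/*
 * Fill the whole page with trap instructions, then copy the program in at
 * a random word offset chosen so that it still fits inside the page.
 * Returns the word offset at which the program starts.
 */
static size_t sketch_fill_and_place(uint32_t *page, const uint32_t *prog,
                                    size_t prog_words, unsigned int seed)
{
    size_t i, hole_words, start;

    for (i = 0; i < SKETCH_PAGE_WORDS; i++)
        page[i] = SKETCH_TRAP_INSN;

    /* Number of filler words left once the program is placed. */
    hole_words = SKETCH_PAGE_WORDS - prog_words;

    srand(seed);
    start = (size_t)rand() % (hole_words + 1);

    memcpy(&page[start], prog, prog_words * sizeof(uint32_t));
    return start;
}

/*
 * Return the index of the first word that is not a trap, i.e. where the
 * program appears to start when the page is inspected. Returns
 * SKETCH_PAGE_WORDS if the page contains only traps.
 */
static size_t sketch_find_program(const uint32_t *page)
{
    size_t i;

    for (i = 0; i < SKETCH_PAGE_WORDS; i++)
        if (page[i] != SKETCH_TRAP_INSN)
            break;
    return i;
}
```

With coherent memory, every CPU scanning the page the way
sketch_find_program() does should find the same start offset that
sketch_fill_and_place() returned. The report above is that the two CPUs
disagree on that offset for the same address.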
>
> Is this a case of a bug in the micro-architecture or the firmware when
> doing the hotplug? Can someone chime in?
It's possible that something is going wrong when offlining the cpu. Can
you try booting the kernel with 'powersave=off' and see if the problem
goes away?
>
> Notice that we can't reproduce the same issue on a POWER9 system.
>
> Thanks.
> Cascardo.
>
> [1] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076
> [2] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/29
> [3] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/30
>
- Naveen