coherency issue observed after hotplug on POWER8

Naveen N. Rao naveen.n.rao at linux.ibm.com
Sat Sep 25 03:17:09 AEST 2021


Hi Cascardo,
Thanks for reporting this.


Thadeu Lima de Souza Cascardo wrote:
> Hi, there.
> 
> We have been investigating an issue we have observed on POWER8 POWERNV systems.
> When running the kernel selftests reuseport_bpf_cpu after a CPU hotplug, we see
> crashes, in different forms. [1]

Just to re-confirm: you are only seeing this on P8 powernv, and not in a 
P8 guest/LPAR? I haven't been able to reproduce this on a firestone -- 
can you share more details about your power8 machine?

Also, do you only see this with ubuntu kernels, or are you also able to 
reproduce this with the upstream tree?

> 
> I managed to get xmon on that trap, and did some debugging. [2] I tried to dump
> the BPF JIT code, and it looks different when dumped from CPU#0 and CPU#0x9f
> (the one that was hotplugged, offlined, then onlined).

Next time you reproduce this, can you try dumping the SLBs for the cpus 
(command 'u' in xmon)?

> 
> Here is my partial analysis [3]. Basically, the BPF JIT fills a page with
> invalid instructions (traps, in ppc64 case), and puts the BPF program in a
> random offset of the page. In the case of the hotplugged CPU, which was the one
> that compiled the program, the page had the expected contents (BPF program
> started at the offset used to run the program). On the other CPU (in many
> cases, CPU #0), the same memory address/page had different contents, with the
> program starting at a different offset.

From [3], I think fp->aux->jit_data can be NULL if there are subprogs.
But, I find it interesting that you don't always see the correct 
bpf_func, as reported in comment #25. Can you also try dumping the full 
bpf_prog structure (prog/fp) from xmon?
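
To make the layout described above concrete, here is a rough userspace
sketch in C of a page filled with traps, with the program placed at a
random offset. It is only an illustration under stated assumptions, not
the kernel's JIT allocator: place_prog(), find_prog_start() and
PAGE_BYTES are hypothetical names, the 4 KiB page size is assumed, and
0x7fe00008 is the ppc 'trap' encoding. find_prog_start() shows one way
to compare where the program begins in two dumps of the same page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096u        /* assumed page size, for illustration only */
#define PPC_TRAP   0x7fe00008u  /* ppc 'trap' instruction encoding */
#define PAGE_WORDS (PAGE_BYTES / sizeof(uint32_t))

/*
 * Hypothetical helper: fill the whole page with trap instructions, then
 * copy the program to a random word-aligned offset inside it.
 * Returns the byte offset at which the program was placed.
 */
static size_t place_prog(uint32_t *page, const uint32_t *prog, size_t nwords)
{
    assert(nwords <= PAGE_WORDS);

    for (size_t i = 0; i < PAGE_WORDS; i++)
        page[i] = PPC_TRAP;

    /* Number of possible start slots that still leave room for the program. */
    size_t slots = PAGE_WORDS - nwords + 1;
    size_t start = (size_t)rand() % slots;

    memcpy(&page[start], prog, nwords * sizeof(uint32_t));
    return start * sizeof(uint32_t);
}

/*
 * Hypothetical helper: return the byte offset of the first word that is
 * not a trap, or PAGE_BYTES if the page holds nothing but traps.
 * Running this on dumps taken from two CPUs shows whether they agree on
 * where the program starts.
 */
static size_t find_prog_start(const uint32_t *page)
{
    for (size_t i = 0; i < PAGE_WORDS; i++)
        if (page[i] != PPC_TRAP)
            return i * sizeof(uint32_t);
    return PAGE_BYTES;
}
```

If the dumps from the two cpus are coherent, find_prog_start() should
return the same offset for both, and that offset should match bpf_func.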

> 
> Is this a case of a bug in the micro-architecture or the firmware when 
> doing the hotplug? Can someone chime in?

It's possible that something is going wrong when offlining the cpu. Can 
you try booting the kernel with 'powersave=off' and see if the problem 
goes away?

> 
> Notice that we can't reproduce the same issue on a POWER9 system.
> 
> Thanks.
> Cascardo.
> 
> [1] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076
> [2] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/29
> [3] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/30
> 

- Naveen


