[PATCH] powerpc: Fix data corruption on IPI
Timothy Pearson
tpearson at raptorengineering.com
Wed Nov 15 08:32:06 AEDT 2023
----- Original Message -----
> From: "Michael Ellerman" <mpe at ellerman.id.au>
> To: "Timothy Pearson" <tpearson at raptorengineering.com>, "linuxppc-dev" <linuxppc-dev at lists.ozlabs.org>
> Cc: "Jens Axboe" <axboe at kernel.dk>
> Sent: Tuesday, November 14, 2023 6:14:37 AM
> Subject: Re: [PATCH] powerpc: Fix data corruption on IPI
> Hi Timothy,
>
> Thanks for debugging this, but I'm unclear why this is helping because
> we should already have a full barrier (hwsync) on both the sending and
> receiving side.
>
> More below.
So it looks like we might be dealing with a couple of potentially separate issues here, not sure. This is probably the most infuriating bug I've run across in my career, so please bear with me -- with the amount of active test kernels I have installed at any one time I'm occassionally infecting one with the tests from another. ;)
First, I'm not 100% convinced the code in try_to_wake_up() is race-free on ppc64, since adding a memory barrier between it and the call to kick_process() significantly reduces the frequency of the on-disk corruption. Can you take a look and see if anything stands out as to why that would be important?
The second part of this though is that the barrier only reduces the frequency of the corruption, it does not eliminate the corruption. After some consolidation, what completely eliminates the corruption is a combination of:
* Switching to TWA_SIGNAL_NO_IPI in task_work_add() *
* Adding udelay(1000) in io_uring/rw.c:io_write(), right before the call to io_rw_init_file()
Adding a memory barrier instead of the udelay() doesn't work, nor does adding the delay without switching to NO_IPI.
[1] Keeping TWA_SIGNAL and adding the barrier instruction also works, but this is conceptually simpler to understand as to why it would have an effect at all
More information about the Linuxppc-dev
mailing list