Memory coherency issue with IO thread offloading?

Jens Axboe axboe at kernel.dk
Wed Mar 29 03:38:03 AEDT 2023


On 3/28/23 6:51 AM, Michael Ellerman wrote:
> Jens Axboe <axboe at kernel.dk> writes:
>>>> Can the queueing cause the creation of an IO thread (if one does not
>>>> exist, or all are blocked)?
>>>
>>> Yep
>>>
>>> Since writing this email, I've gone through a lot of different tests.
>>> Here's a rough listing of what I found:
>>>
>>> - As with the hack patch, if I just limit the number of IO thread
>>>   workers to 1, it seems to pass. At least it lasts longer than before,
>>>   passing 1000 iterations.
>>>
>>> - If I pin each IO worker to a single CPU, it also passes.
>>>
>>> - If I liberally sprinkle smp_mb() on the io-wq side, the test still
>>>   fails. I added one before and one after queueing the work item, and
>>>   one before and one after the io-wq worker grabs a work item - the
>>>   full hammer approach. It still fails.
>>>
>>> Puzzling... For the "pin each IO worker to a single CPU" I added some
>>> basic code around trying to ensure that a work item queued on CPU X
>>> would be processed by a worker on CPU X, and to a large degree, this
>>> does happen. But since the work list is a normal list, it's quite
>>> possible that some other worker finishes its work on CPU Y just in time
>>> to grab the one from CPU X. I checked and this does happen in the test
>>> case, yet it still passes. This may be because I got a bit lucky, but
>>> that seems doubtful given thousands of passes of the test case.
>>>
>>> Another theory is that it's perhaps related to an io-wq worker being
>>> rescheduled on a different CPU. Though I'm again puzzled as to why the
>>> smp_mb() sprinkling didn't fix that, then. I'm going to try running the
>>> test case with JUST the io-wq worker pinning, not caring about where
>>> the work is processed, to see if that does anything.
>>
>> Just pinning each worker to whatever CPU it was created on seemingly
>> fixes the issue too. This does not mean that each worker will process
>> work on the CPU on which that work was queued, just that each worker
>> will remain on whatever CPU it originally got created on.
>>
>> Puzzling...
>>
>> Note that it is indeed quite possible that this isn't a ppc issue at
>> all, it just shows up on ppc. It could be page cache related, or it
>> could even be a bug in mariadb itself.
> 
> I tried binary patching every lwsync to hwsync (read/write to full
> barrier) in mariadbd and all the libraries it links. It didn't fix the
> problem.
> 
> I also tried switching all the kernel barriers/spin locks to using a
> hwsync, but that also didn't fix it.
> 
> It's still possible there's somewhere that currently has no barrier at
> all but needs one; the above would only fix the problem if we have a
> read/write barrier that actually needs to be a full barrier.
> 
> 
> I also looked at making all TLB invalidates broadcast, regardless of
> whether we think the thread has only been on a single CPU. That didn't
> help, but I'm not sure I got all places where we do TLB invalidates, so
> I'll look at that some more tomorrow.

Thanks, appreciate your testing! I have no new data points since
yesterday, but the key point from then still seems to be that if an io
worker never reschedules onto a different CPU, then the problem doesn't
occur. This could very well be a page cache issue, if it isn't an issue
on the powerpc side...
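
In case it helps anyone reproduce the experiment, the worker pinning is
conceptually just the below - an illustrative sketch, not the actual diff,
and the function name is made up. The idea is that the worker restricts
itself to whatever CPU it happens to be running on when it starts up,
before it pulls any work:

#include <linux/cpumask.h>
#include <linux/sched.h>
#include <linux/smp.h>

/*
 * Illustrative sketch only: pin the calling io-wq worker to the CPU it
 * is currently running on, so the scheduler can never migrate it to a
 * different CPU afterwards. Would be called by the worker task itself,
 * early in its setup.
 */
static void io_worker_pin_to_current_cpu(void)
{
	int cpu = raw_smp_processor_id();

	set_cpus_allowed_ptr(current, cpumask_of(cpu));
}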

-- 
Jens Axboe


