Memory coherency issue with IO thread offloading?

Christophe Leroy christophe.leroy at csgroup.eu
Fri Mar 24 18:27:25 AEDT 2023


Hi,

Le 23/03/2023 à 19:54, Jens Axboe a écrit :
> Hi,
> 
> I got a report sent to me from mariadb, in where 5.10.158 works fine and
> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
> case. Beware that this email is long, as I'm trying to include
> everything that may be relevant...

Which variant of powerpc ? 32 or 64 bits ? Book3S or BookE ?

Christophe


> 
> The test case in question is pretty simple. On debian testing, do:
> 
> $ sudo apt-get install mariadb-test
> $ cd /usr/share/mysql/mysql-test
> $ ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql  --force encryption.innodb_encryption,innodb,undo0 --repeat=200
> 
> and if it fails, you'll see something like:
> 
> encryption.innodb_encryption 'innodb,undo0' [ 6 pass ]   3120
> encryption.innodb_encryption 'innodb,undo0' [ 7 pass ]   3123
> encryption.innodb_encryption 'innodb,undo0' [ 8 pass ]   3042
> encryption.innodb_encryption 'innodb,undo0' [ 9 fail ]
>          Test ended at 2023-03-23 16:55:17
> 
> CURRENT_TEST: encryption.innodb_encryption
> mysqltest: At line 11: query 'SET @start_global_value = @@global.innodb_encryption_threads' failed: ER_UNKNOWN_SYSTEM_VARIABLE (1193): Unknown system variable 'innodb_encryption_threads'
> 
> The result from queries just before the failure was:
> SET @start_global_value = @@global.innodb_encryption_threads;
> 
>   - saving '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' to '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/'
> ***Warnings generated in error logs during shutdown after running tests: encryption.innodb_encryption
> 
> 2023-03-23 16:55:17 0 [Warning] Plugin 'example_key_management' is of maturity level experimental while the server is stable
> 2023-03-23 16:55:17 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=221]. You may have to recover from a backup.
> 
> where data read was not as expected.
> 
> Now, there are a number of io_uring changes between .158 and .162, as it
> includes the backport that brought 5.10-stable into line with what
> 5.15-stable includes. I'll spare you all the digging I did to vet those
> changes, but the key thing is that it STILL happens on 6.3-git on
> powerpc.
> 
> After ruling out many things, one key difference between 158 and 162 is
> that the former offloaded requests that could not be done nonblocking to
> a kthread, and 162 and newer offloads to an IO thread. An IO thread is
> just a normal thread created from the application submitting IO, the
> only difference is that it never exits to userspace. An IO thread has
> the same mm/files/you-name-it from the original task. It really is the
> same as a userspace thread created by the application The switch to IO
> threads was done exactly because of that, rather than rely on a fragile
> scheme of having the kthread worker assume all sorts of identify from
> the original task. surprises if things were missed. This is what caused
> most of the io_uring security issues in the past.
> 
> The IO that mariadb does in this test is pretty simple - a bunch of
> largish buffered writes with IORING_OP_WRITEV, and some smallish (16K)
> buffered reads with IORING_OP_READV.
> 
> Today I finally gave up and ran a basic experiment, which simply
> offloads the writes to a kthread. Since powerpc has an interesting
> memory coherency model, my suspicion was that the work involved with
> switching MMs for the kthread could just be the main difference here.
> The patch is really dumb and simple - rather than queue the write to an
> IO thread, it just offloads it to a kthread that then does
> kthread_use_mm(), perform write with the same write handler,
> kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would
> fail in 2..20 loops, I've now done 200 and 500 loops and it's fine.
> 
> Which then leads me to the question, what about the IO thread offload
> makes this fail on powerpc (and no other arch I've tested on, including
> x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a
> thread in userspace in the application, and having that thread just
> perform the writes. Is there some magic involved with the kthread mm
> use/unuse that makes this sufficiently consistent on powerpc? I've tried
> any mix of isync()/mb and making the flush_dcache_page() unconditionally
> done in the filemap read/write helpers, and it still falls flat on its
> face with the offload to an IO thread.
> 
> I must clearly be missing something here, which is why I'm emailing the
> powerpc Gods for help :-)
> 


More information about the Linuxppc-dev mailing list