[REGESSION] systemd-oomd overreacting due to PSI changes for Btrfs (was: Re: [PATCH 3/5] btrfs: add manual PSI accounting for compressed reads)

Thorsten Leemhuis linux at leemhuis.info
Fri Nov 4 18:32:22 AEDT 2022


On 03.11.22 23:20, Johannes Weiner wrote:
> On Thu, Nov 03, 2022 at 11:46:52AM +0100, Thorsten Leemhuis wrote:
>> On 15.09.22 11:41, Christoph Hellwig wrote:
>>> btrfs compressed reads try to always read the entire compressed chunk,
>>> even if only a subset is requested.  Currently this is covered by the
>>> magic PSI accounting underneath submit_bio, but that is about to go
>>> away. Instead add manual psi_memstall_{enter,leave} annotations.
>>>
>>> Note that for readahead this really should be using readahead_expand,
>>> but the additionals reads are also done for plain ->read_folio where
>>> readahead_expand can't work, so this overall logic is left as-is for
>>> now.
>>
>> It seems this patch makes systemd-oomd overreact on my day-to-day
>> machine and aggressively kill applications. I'm not the only one that
>> noticed such a behavior with 6.1 pre-releases:
>> https://bugzilla.redhat.com/show_bug.cgi?id=2133829
>> https://bugzilla.redhat.com/show_bug.cgi?id=2134971
> [...]
>> On master as of today (8e5423e991e8) I can trigger the problem within a
>> minute or two. But I fail to trigger it with v6.0.6 or when I revert
>> 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads").
>> And yes, I use btrfs with compression for / and /home/.
> [...]
> 
> Oh, I think I see the bug. We can leak pressure state from the bio
> submission, which causes the task to permanently drive up pressure.

Thx for looking into this.

> Can you try this patch?

It apparently does the trick -- at least my test setup that usually
triggers the bug within a minute or two survived for nearly an hour now, so:

Tested-by: Thorsten Leemhuis <linux at leemhuis.info>

Can you please also add this tag to help future archeologists, as
explained by the kernel docs (for details see
Documentation/process/submitting-patches.rst and
Documentation/process/5.Posting.rst):

Link:
https://lore.kernel.org/r/d20a0a85-e415-cf78-27f9-77dd7a94bc8d@leemhuis.info/

It also will make my regression tracking bot see further postings of
this patch and mark the issue as resolved once the patch lands in mainline.

tia and thx again for the patch!

Ciao, Thorsten

>>From 499e5cab7b39fc4c90a0f96e33cdc03274b316fd Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes at cmpxchg.org>
> Date: Thu, 3 Nov 2022 17:34:31 -0400
> Subject: [PATCH] fs: btrfs: fix leaked psi pressure state
> 
> When psi annotations were added to to btrfs compression reads, the psi
> state tracking over add_ra_bio_pages and btrfs_submit_compressed_read
> was faulty. The task can remain in a stall state after the read. This
> results in incorrectly elevated pressure, which triggers OOM kills.
> 
> pflags record the *previous* memstall state when we enter a new
> one. The code tried to initialize pflags to 1, and then optimize the
> leave call when we either didn't enter a memstall, or were already
> inside a nested stall. However, there can be multiple PageWorkingset
> pages in the bio, at which point it's that path itself that re-enters
> the state and overwrites pflags. This causes us to miss the exit.
> 
> Enter the stall only once if needed, then unwind correctly.
> 
> Reported-by: Thorsten Leemhuis <linux at leemhuis.info>
> Fixes: 4088a47e78f9 btrfs: add manual PSI accounting for compressed reads
> Signed-off-by: Johannes Weiner <hannes at cmpxchg.org>
> ---
>  fs/btrfs/compression.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index f1f051ad3147..e6635fe70067 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -512,7 +512,7 @@ static u64 bio_end_offset(struct bio *bio)
>  static noinline int add_ra_bio_pages(struct inode *inode,
>  				     u64 compressed_end,
>  				     struct compressed_bio *cb,
> -				     unsigned long *pflags)
> +				     int *memstall, unsigned long *pflags)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	unsigned long end_index;
> @@ -581,8 +581,10 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>  			continue;
>  		}
>  
> -		if (PageWorkingset(page))
> +		if (!*memstall && PageWorkingset(page)) {
>  			psi_memstall_enter(pflags);
> +			*memstall = 1;
> +		}
>  
>  		ret = set_page_extent_mapped(page);
>  		if (ret < 0) {
> @@ -670,8 +672,8 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>  	u64 em_len;
>  	u64 em_start;
>  	struct extent_map *em;
> -	/* Initialize to 1 to make skip psi_memstall_leave unless needed */
> -	unsigned long pflags = 1;
> +	unsigned long pflags;
> +	int memstall = 0;
>  	blk_status_t ret;
>  	int ret2;
>  	int i;
> @@ -727,7 +729,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>  		goto fail;
>  	}
>  
> -	add_ra_bio_pages(inode, em_start + em_len, cb, &pflags);
> +	add_ra_bio_pages(inode, em_start + em_len, cb, &memstall, &pflags);
>  
>  	/* include any pages we added in add_ra-bio_pages */
>  	cb->len = bio->bi_iter.bi_size;
> @@ -807,7 +809,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>  		}
>  	}
>  
> -	if (!pflags)
> +	if (memstall)
>  		psi_memstall_leave(&pflags);
>  
>  	if (refcount_dec_and_test(&cb->pending_ios))


More information about the Linux-erofs mailing list