[REGESSION] systemd-oomd overreacting due to PSI changes for Btrfs (was: Re: [PATCH 3/5] btrfs: add manual PSI accounting for compressed reads)

Johannes Weiner hannes at cmpxchg.org
Fri Nov 4 09:20:59 AEDT 2022


On Thu, Nov 03, 2022 at 11:46:52AM +0100, Thorsten Leemhuis wrote:
> Hi Christoph!
> 
> On 15.09.22 11:41, Christoph Hellwig wrote:
> > btrfs compressed reads try to always read the entire compressed chunk,
> > even if only a subset is requested.  Currently this is covered by the
> > magic PSI accounting underneath submit_bio, but that is about to go
> > away. Instead add manual psi_memstall_{enter,leave} annotations.
> > 
> > Note that for readahead this really should be using readahead_expand,
> > but the additionals reads are also done for plain ->read_folio where
> > readahead_expand can't work, so this overall logic is left as-is for
> > now.
> 
> It seems this patch makes systemd-oomd overreact on my day-to-day
> machine and aggressively kill applications. I'm not the only one that
> noticed such a behavior with 6.1 pre-releases:
> https://bugzilla.redhat.com/show_bug.cgi?id=2133829
> https://bugzilla.redhat.com/show_bug.cgi?id=2134971
> 
> I think I have a pretty reliable way to trigger the issue that involves
> starting the apps that I normally use and a VM that I occasionally use,
> which up to now never resulted in such a behaviour.
> 
> On master as of today (8e5423e991e8) I can trigger the problem within a
> minute or two. But I fail to trigger it with v6.0.6 or when I revert
> 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads").
> And yes, I use btrfs with compression for / and /home/.
> 
> See [1] for a log msg from systemd-oomd.
> 
> Note, I had some trouble with bisecting[2]. This series looked
> suspicious, so I removed it completely ontop of master and the problem
> went away. Then I tried reverting only 4088a47e78f9 which helped, too.
> Let me know if you want me to try another combination or need more data.

Oh, I think I see the bug. We can leak pressure state from the bio
submission, which causes the task to permanently drive up pressure.

Can you try this patch?

>From 499e5cab7b39fc4c90a0f96e33cdc03274b316fd Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes at cmpxchg.org>
Date: Thu, 3 Nov 2022 17:34:31 -0400
Subject: [PATCH] fs: btrfs: fix leaked psi pressure state

When psi annotations were added to to btrfs compression reads, the psi
state tracking over add_ra_bio_pages and btrfs_submit_compressed_read
was faulty. The task can remain in a stall state after the read. This
results in incorrectly elevated pressure, which triggers OOM kills.

pflags record the *previous* memstall state when we enter a new
one. The code tried to initialize pflags to 1, and then optimize the
leave call when we either didn't enter a memstall, or were already
inside a nested stall. However, there can be multiple PageWorkingset
pages in the bio, at which point it's that path itself that re-enters
the state and overwrites pflags. This causes us to miss the exit.

Enter the stall only once if needed, then unwind correctly.

Reported-by: Thorsten Leemhuis <linux at leemhuis.info>
Fixes: 4088a47e78f9 btrfs: add manual PSI accounting for compressed reads
Signed-off-by: Johannes Weiner <hannes at cmpxchg.org>
---
 fs/btrfs/compression.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index f1f051ad3147..e6635fe70067 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -512,7 +512,7 @@ static u64 bio_end_offset(struct bio *bio)
 static noinline int add_ra_bio_pages(struct inode *inode,
 				     u64 compressed_end,
 				     struct compressed_bio *cb,
-				     unsigned long *pflags)
+				     int *memstall, unsigned long *pflags)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	unsigned long end_index;
@@ -581,8 +581,10 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			continue;
 		}
 
-		if (PageWorkingset(page))
+		if (!*memstall && PageWorkingset(page)) {
 			psi_memstall_enter(pflags);
+			*memstall = 1;
+		}
 
 		ret = set_page_extent_mapped(page);
 		if (ret < 0) {
@@ -670,8 +672,8 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	u64 em_len;
 	u64 em_start;
 	struct extent_map *em;
-	/* Initialize to 1 to make skip psi_memstall_leave unless needed */
-	unsigned long pflags = 1;
+	unsigned long pflags;
+	int memstall = 0;
 	blk_status_t ret;
 	int ret2;
 	int i;
@@ -727,7 +729,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		goto fail;
 	}
 
-	add_ra_bio_pages(inode, em_start + em_len, cb, &pflags);
+	add_ra_bio_pages(inode, em_start + em_len, cb, &memstall, &pflags);
 
 	/* include any pages we added in add_ra-bio_pages */
 	cb->len = bio->bi_iter.bi_size;
@@ -807,7 +809,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		}
 	}
 
-	if (!pflags)
+	if (memstall)
 		psi_memstall_leave(&pflags);
 
 	if (refcount_dec_and_test(&cb->pending_ios))
-- 
2.38.1


More information about the Linux-erofs mailing list