[REGESSION] systemd-oomd overreacting due to PSI changes for Btrfs (was: Re: [PATCH 3/5] btrfs: add manual PSI accounting for compressed reads)

Thorsten Leemhuis linux at leemhuis.info
Thu Nov 3 21:46:52 AEDT 2022


Hi Christoph!

On 15.09.22 11:41, Christoph Hellwig wrote:
> btrfs compressed reads try to always read the entire compressed chunk,
> even if only a subset is requested.  Currently this is covered by the
> magic PSI accounting underneath submit_bio, but that is about to go
> away. Instead add manual psi_memstall_{enter,leave} annotations.
> 
> Note that for readahead this really should be using readahead_expand,
> but the additionals reads are also done for plain ->read_folio where
> readahead_expand can't work, so this overall logic is left as-is for
> now.

It seems this patch makes systemd-oomd overreact on my day-to-day
machine and aggressively kill applications. I'm not the only one that
noticed such a behavior with 6.1 pre-releases:
https://bugzilla.redhat.com/show_bug.cgi?id=2133829
https://bugzilla.redhat.com/show_bug.cgi?id=2134971

I think I have a pretty reliable way to trigger the issue that involves
starting the apps that I normally use and a VM that I occasionally use,
which up to now never resulted in such a behaviour.

On master as of today (8e5423e991e8) I can trigger the problem within a
minute or two. But I fail to trigger it with v6.0.6 or when I revert
4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads").
And yes, I use btrfs with compression for / and /home/.

See [1] for a log msg from systemd-oomd.

Note, I had some trouble with bisecting[2]. This series looked
suspicious, so I removed it completely ontop of master and the problem
went away. Then I tried reverting only 4088a47e78f9 which helped, too.
Let me know if you want me to try another combination or need more data.

Ciao, Thorsten


[1] just one example:
```
> 10:52:29 t14s systemd-oomd[1261]: Considered 60 cgroups for killing, top candidates were:
> 10:52:29 t14s systemd-oomd[1261]:         Path: /system.slice/packagekit.service
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Pressure Limit: 0.00%
> 10:52:29 t14s systemd-oomd[1261]:                 Pressure: Avg10: 93.66 Avg60: 38.22 Avg300: 9.48 Total: 29s
> 10:52:29 t14s systemd-oomd[1261]:                 Current Memory Usage: 276.9M
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Min: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Low: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Pgscan: 181098
> 10:52:29 t14s systemd-oomd[1261]:                 Last Pgscan: 178926
> 10:52:29 t14s systemd-oomd[1261]:         Path: /system.slice/firewalld.service
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Pressure Limit: 0.00%
> 10:52:29 t14s systemd-oomd[1261]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
> 10:52:29 t14s systemd-oomd[1261]:                 Current Memory Usage: 34.6M
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Min: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Low: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Pgscan: 13035
> 10:52:29 t14s systemd-oomd[1261]:                 Last Pgscan: 12854
> 10:52:29 t14s systemd-oomd[1261]:         Path: /system.slice/sssd-kcm.service
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Pressure Limit: 0.00%
> 10:52:29 t14s systemd-oomd[1261]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 184us
> 10:52:29 t14s systemd-oomd[1261]:                 Current Memory Usage: 32.9M
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Min: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Low: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Pgscan: 7667
> 10:52:29 t14s systemd-oomd[1261]:                 Last Pgscan: 7501
> 10:52:29 t14s systemd-oomd[1261]:         Path: /system.slice/systemd-journald.service
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Pressure Limit: 0.00%
> 10:52:29 t14s systemd-oomd[1261]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 8ms
> 10:52:29 t14s systemd-oomd[1261]:                 Current Memory Usage: 14.5M
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Min: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Low: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Pgscan: 13020
> 10:52:29 t14s systemd-oomd[1261]:                 Last Pgscan: 12914
> 10:52:29 t14s systemd-oomd[1261]:         Path: /system.slice/libvirtd.service
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Pressure Limit: 0.00%
> 10:52:29 t14s systemd-oomd[1261]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
> 10:52:29 t14s systemd-oomd[1261]:                 Current Memory Usage: 18.9M
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Min: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Low: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Pgscan: 12983
> 10:52:29 t14s systemd-oomd[1261]:                 Last Pgscan: 12896
> 10:52:29 t14s systemd-oomd[1261]:         Path: /system.slice/geoclue.service
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Pressure Limit: 0.00%
> 10:52:29 t14s systemd-oomd[1261]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
> 10:52:29 t14s systemd-oomd[1261]:                 Current Memory Usage: 18.0M
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Min: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Low: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Pgscan: 3625
> 10:52:29 t14s systemd-oomd[1261]:                 Last Pgscan: 3550
> 10:52:29 t14s systemd-oomd[1261]:         Path: /system.slice/polkit.service
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Pressure Limit: 0.00%
> 10:52:29 t14s systemd-oomd[1261]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 2ms
> 10:52:29 t14s systemd-oomd[1261]:                 Current Memory Usage: 15.9M
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Min: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Low: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Pgscan: 10664
> 10:52:29 t14s systemd-oomd[1261]:                 Last Pgscan: 10596
> 10:52:29 t14s systemd-oomd[1261]:         Path: /system.slice/NetworkManager.service
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Pressure Limit: 0.00%
> 10:52:29 t14s systemd-oomd[1261]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 3ms
> 10:52:29 t14s systemd-oomd[1261]:                 Current Memory Usage: 6.6M
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Min: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Low: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Pgscan: 2515
> 10:52:29 t14s systemd-oomd[1261]:                 Last Pgscan: 2492
> 10:52:29 t14s systemd-oomd[1261]:         Path: /system.slice/abrt-xorg.service
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Pressure Limit: 0.00%
> 10:52:29 t14s systemd-oomd[1261]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
> 10:52:29 t14s systemd-oomd[1261]:                 Current Memory Usage: 5.2M
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Min: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Low: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Pgscan: 35154
> 10:52:29 t14s systemd-oomd[1261]:                 Last Pgscan: 35131
> 10:52:29 t14s systemd-oomd[1261]:         Path: /system.slice/dbus-broker.service
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Pressure Limit: 0.00%
> 10:52:29 t14s systemd-oomd[1261]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
> 10:52:29 t14s systemd-oomd[1261]:                 Current Memory Usage: 7.5M
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Min: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Memory Low: 0B
> 10:52:29 t14s systemd-oomd[1261]:                 Pgscan: 1183
> 10:52:29 t14s systemd-oomd[1261]:                 Last Pgscan: 1161
> 10:52:29 t14s systemd-oomd[1261]: Killed /system.slice/packagekit.service due to memory pressure for /system.slice being 91.73% > 50.00% for > 20s with reclaim activity
```

[2]

I have no idea what went wrong. 0a78a376ef3c (the last merge before the
one with this series) was fine, but c6ea70604249 (which afaics basically
is the next commit if I understand things right) was not. I tried
reverting it, which should give me the merge base (v6.0-rc2), but it was
broken, too. I guess I must have done something wrong, but I have no
idea what, but I tried again and got the same result. :-/

/me must be missing something and/or not understand git properly...


> Signed-off-by: Christoph Hellwig <hch at lst.de>
> Acked-by: David Sterba <dsterba at suse.com>
> Acked-by: Johannes Weiner <hannes at cmpxchg.org>
> ---
>  fs/btrfs/compression.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index e84d22c5c6a83..370788b9b1249 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -15,6 +15,7 @@
>  #include <linux/string.h>
>  #include <linux/backing-dev.h>
>  #include <linux/writeback.h>
> +#include <linux/psi.h>
>  #include <linux/slab.h>
>  #include <linux/sched/mm.h>
>  #include <linux/log2.h>
> @@ -519,7 +520,8 @@ static u64 bio_end_offset(struct bio *bio)
>   */
>  static noinline int add_ra_bio_pages(struct inode *inode,
>  				     u64 compressed_end,
> -				     struct compressed_bio *cb)
> +				     struct compressed_bio *cb,
> +				     unsigned long *pflags)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	unsigned long end_index;
> @@ -588,6 +590,9 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>  			continue;
>  		}
>  
> +		if (PageWorkingset(page))
> +			psi_memstall_enter(pflags);
> +
>  		ret = set_page_extent_mapped(page);
>  		if (ret < 0) {
>  			unlock_page(page);
> @@ -674,6 +679,8 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>  	u64 em_len;
>  	u64 em_start;
>  	struct extent_map *em;
> +	/* Initialize to 1 to make skip psi_memstall_leave unless needed */
> +	unsigned long pflags = 1;
>  	blk_status_t ret;
>  	int ret2;
>  	int i;
> @@ -729,7 +736,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>  		goto fail;
>  	}
>  
> -	add_ra_bio_pages(inode, em_start + em_len, cb);
> +	add_ra_bio_pages(inode, em_start + em_len, cb, &pflags);
>  
>  	/* include any pages we added in add_ra-bio_pages */
>  	cb->len = bio->bi_iter.bi_size;
> @@ -810,6 +817,9 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>  		}
>  	}
>  
> +	if (!pflags)
> +		psi_memstall_leave(&pflags);
> +
>  	if (refcount_dec_and_test(&cb->pending_ios))
>  		finish_compressed_bio_read(cb);
>  	return;


More information about the Linux-erofs mailing list