[Bug 216368] do_IRQ: stack overflow at boot during btrfs handling on a PowerMac G5 11,2

bugzilla-daemon at kernel.org bugzilla-daemon at kernel.org
Tue Sep 6 03:16:28 AEST 2022


https://bugzilla.kernel.org/show_bug.cgi?id=216368

--- Comment #6 from David Sterba (dsterba at suse.com) ---
(In reply to Christophe Leroy from comment #3)
> This happens when you get an IRQ while being deep into BTRFS handling it
> seems.
> 
> It should be investigated with BTRFS team why the callstack is so deep.

There's nothing strange about the call stack; it contains all the functions
that are expected when handling a page fault, looking up internal structures
and then passing to the block layer to get the bytes from the device.

A deep stack, measured both by the number of functions and by total size, is
normal for filesystems, and we try to keep the size sane. So far we haven't
seen such problems on x86_64: the overall stack size is 16K, and on a debug
kernel about 6K is consumed at maximum (reported by
CONFIG_DEBUG_STACK_USAGE=y and CONFIG_SCHED_STACK_END_CHECK=y); the lowest
value I see in my logs is 10576.

That is with a simple IO stack, i.e. what's below the filesystem; as you can
see, blk-mq, NVMe and DMA also take some stack space, but this does not seem
suspicious either. What could be significant is layering with MD, device
mapper, NFS, or networking.

The first number in each stack trace line is the stack pointer; calculating
what btrfs itself takes:

[c000000019da5cf0] [c0000000004d66b4] .btrfs_submit_bio+0x274/0x5c0
[c000000019da5e00] [c000000000481f44] .btrfs_submit_metadata_bio+0x54/0x110
[c000000019da5e80] [c0000000004bd828] .submit_one_bio+0xb8/0x130
[c000000019da5f00] [c0000000004c84b0] .read_extent_buffer_pages+0x310/0x750
[c000000019da6020] [c000000000481b48] .btrfs_read_extent_buffer+0xd8/0x1b0
[c000000019da60f0] [c00000000048208c] .read_tree_block+0x5c/0x130
[c000000019da6190] [c0000000004609a8] .read_block_for_search+0x2c8/0x410
[c000000019da62b0] [c000000000466a30] .btrfs_search_slot+0x380/0xcf0
[c000000019da6400] [c00000000047adf4] .btrfs_lookup_csum+0x64/0x1d0
[c000000019da64d0] [c00000000047b754] .btrfs_lookup_bio_sums+0x274/0x6e0
[c000000019da6630] [c000000000505d18] .btrfs_submit_compressed_read+0x3b8/0x520
[c000000019da6720] [c0000000004954b4] .btrfs_submit_data_read_bio+0xc4/0xe0
[c000000019da67b0] [c0000000004bd7fc] .submit_one_bio+0x8c/0x130
[c000000019da6830] [c0000000004c4478] .submit_extent_page+0x548/0x590
[c000000019da6980] [c0000000004c4f80] .btrfs_do_readpage+0x330/0x970
[c000000019da6ad0] [c0000000004c67f4] .extent_readahead+0x2b4/0x430
[c000000019da6c70] [c000000000490440] .btrfs_readahead+0x10/0x30

0xc000000019da6c70 - 0xc000000019da5cf0 = 3968

That's on par with my expectation.
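For reference, the same calculation can be done mechanically over the whole
trace. A minimal sketch (plain hex arithmetic, not a kernel tool): the saved
stack pointers below are copied verbatim from the first column of the trace
above; the stack grows down, so the gap between two adjacent saved SPs
approximates the size of one frame.

```python
# Saved SPs from the trace above, innermost (btrfs_submit_bio) first.
sps = [
    0xc000000019da5cf0,  # .btrfs_submit_bio
    0xc000000019da5e00,  # .btrfs_submit_metadata_bio
    0xc000000019da5e80,  # .submit_one_bio
    0xc000000019da5f00,  # .read_extent_buffer_pages
    0xc000000019da6020,  # .btrfs_read_extent_buffer
    0xc000000019da60f0,  # .read_tree_block
    0xc000000019da6190,  # .read_block_for_search
    0xc000000019da62b0,  # .btrfs_search_slot
    0xc000000019da6400,  # .btrfs_lookup_csum
    0xc000000019da64d0,  # .btrfs_lookup_bio_sums
    0xc000000019da6630,  # .btrfs_submit_compressed_read
    0xc000000019da6720,  # .btrfs_submit_data_read_bio
    0xc000000019da67b0,  # .submit_one_bio
    0xc000000019da6830,  # .submit_extent_page
    0xc000000019da6980,  # .btrfs_do_readpage
    0xc000000019da6ad0,  # .extent_readahead
    0xc000000019da6c70,  # .btrfs_readahead
]
# Per-frame gaps between adjacent saved SPs, and the btrfs total.
gaps = [b - a for a, b in zip(sps, sps[1:])]
total = sps[-1] - sps[0]
print(total, max(gaps))  # 3968 416
```

No single frame stands out: the largest gap is 416 bytes, so the 3968-byte
total is spread fairly evenly across the call chain.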

Total stack space is (from syscall to the irq handler):

0xc000000019da7e10 - 0xc000000019da4c00 = 12816

That's getting close to 16K but still a few kilobytes short of overflow; the
IRQ has its own stack (which needs to be set up from the kthread/process
context).
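The same arithmetic on the outermost and innermost saved SPs gives the total
consumption and the remaining headroom (the 16K figure is the overall stack
size mentioned above):

```python
top    = 0xc000000019da7e10  # outermost frame (syscall entry)
bottom = 0xc000000019da4c00  # innermost frame (irq handler)
used = top - bottom
print(used)              # 12816 bytes consumed
print(16 * 1024 - used)  # 3568 bytes of headroom left
```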

As you mention KASAN, that can add some stack consumption due to padding and
alignment, but so far I don't know what exactly the warning is measuring.
Calculating back 3072 bytes from do_IRQ lands at roughly 0xc000000019da5910,
inside blk_mq_flush_plug_list.

I remember some build reports from PPC where, due to a different compiler
being used, function inlining caused increased stack consumption (e.g. due
to aggressive optimizations that unrolled loops too much, using several
additional temporary variables). So that should be investigated too before
blaming btrfs.
