[PATCH 0/2] Allow configurable stack size (especially 32k on PPC64)
Gabriel Paubert
paubert at iram.es
Tue Feb 21 23:51:00 AEDT 2017
On Tue, Feb 21, 2017 at 09:24:38AM +1300, Hamish Martin wrote:
> This patch series adds the ability to configure the THREAD_SHIFT value and
> thereby alter the stack size on powerpc systems. We are particularly interested
> in configuring for a 32k stack on PPC64.
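
[For readers following along: a minimal sketch of how the shift maps to a stack size, assuming the usual power-of-two relation THREAD_SIZE = 1 << THREAD_SHIFT. The helper name thread_size() is made up for illustration and is not a kernel symbol.]

```python
# Sketch only: stack size as a power of two selected by THREAD_SHIFT.
# thread_size() is a hypothetical helper, not part of the kernel.
def thread_size(thread_shift: int) -> int:
    """Return the stack size in bytes for a given shift (1 << shift)."""
    return 1 << thread_shift


# 13 and 14 correspond to the 8k and 16k stacks mentioned in this thread;
# 15 would give the 32k stack the series is aiming for.
for shift in (13, 14, 15):
    print(f"THREAD_SHIFT={shift}: {thread_size(shift) // 1024}k stack")
```

Each increment of the shift doubles the stack, so 32k is one step above the 16k PPC64 value discussed here.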
>
> Using an NXP T2081 (e6500 PPC64 cores) we are observing stack overflows as a
> result of applying a DTS overlay containing some I2C devices. Our scenario is
> an ethernet switch chassis with plug-in cards. The I2C is driven from the T2081
> through a PCA9548 mux on the main board. When we detect insertion of the plugin
> card we schedule work for a call to of_overlay_create() to install a DTS
> overlay for the plugin board. This DTS overlay contains a further PCA9548 mux
> with more devices hanging off it including a PCA9539 GPIO expander. The
> ultimate installed I2C tree is:
>
> T2081 --- PCA9548 MUX --- PCA9548 MUX --- PCA9539 GPIO Expander
>
> When we install the overlay the devices described in the overlay are probed and
> we see a large number of stack frames used as a result. If this is coupled with
> an interrupt happening that requires moderate to high stack use we observe
> stack corruption. Here is an example long stack (from a 4.10-rc8 kernel) that
> does not show corruption but does demonstrate the length and frame sizes
> involved.
>
> Depth Size Location (72 entries)
> ----- ---- --------
> 0) 13872 128 .__raise_softirq_irqoff+0x1c/0x130
> 1) 13744 144 .raise_softirq+0x30/0x70
> 2) 13600 112 .invoke_rcu_core+0x54/0x70
> 3) 13488 336 .rcu_check_callbacks+0x294/0xde0
> 4) 13152 128 .update_process_times+0x40/0x90
> 5) 13024 144 .tick_sched_handle.isra.16+0x40/0xb0
> 6) 12880 144 .tick_sched_timer+0x6c/0xe0
> 7) 12736 272 .__hrtimer_run_queues+0x1a0/0x4b0
> 8) 12464 208 .hrtimer_interrupt+0xe8/0x2a0
> 9) 12256 160 .__timer_interrupt+0xdc/0x330
> 10) 12096 160 .timer_interrupt+0x138/0x190
> 11) 11936 752 exc_0x900_common+0xe0/0xe4
> 12) 11184 128 .ftrace_ops_no_ops+0x11c/0x230
> 13) 11056 176 .ftrace_ops_test.isra.12+0x30/0x50
> 14) 10880 160 .ftrace_ops_no_ops+0xd4/0x230
> 15) 10720 112 ftrace_call+0x4/0x8
> 16) 10608 176 .lock_timer_base+0x3c/0xf0
> 17) 10432 144 .try_to_del_timer_sync+0x2c/0x90
> 18) 10288 128 .del_timer_sync+0x60/0x80
> 19) 10160 256 .schedule_timeout+0x1fc/0x490
> 20) 9904 208 .i2c_wait+0x238/0x290
> 21) 9696 256 .mpc_xfer+0x4e4/0x570
> 22) 9440 208 .__i2c_transfer+0x158/0x6d0
> 23) 9232 192 .pca954x_reg_write+0x70/0x110
> 24) 9040 160 .__i2c_mux_master_xfer+0xb4/0xf0
> 25) 8880 208 .__i2c_transfer+0x158/0x6d0
> 26) 8672 192 .pca954x_reg_write+0x70/0x110
> 27) 8480 144 .pca954x_select_chan+0x68/0xa0
> 28) 8336 160 .__i2c_mux_master_xfer+0x64/0xf0
> 29) 8176 208 .__i2c_transfer+0x158/0x6d0
> 30) 7968 144 .i2c_transfer+0x98/0x130
> 31) 7824 320 .i2c_smbus_xfer_emulated+0x168/0x600
> 32) 7504 208 .i2c_smbus_xfer+0x1c0/0x5d0
> 33) 7296 192 .i2c_smbus_write_byte_data+0x50/0x70
> 34) 7104 144 .pca953x_write_single+0x6c/0xe0
> 35) 6960 192 .pca953x_gpio_direction_output+0xa4/0x160
> 36) 6768 160 ._gpiod_direction_output_raw+0xec/0x460
> 37) 6608 160 .gpiod_hog+0x98/0x250
> 38) 6448 176 .of_gpiochip_add+0xdc/0x1c0
> 39) 6272 256 .gpiochip_add_data+0x4f4/0x8c0
> 40) 6016 144 .devm_gpiochip_add_data+0x64/0xf0
> 41) 5872 208 .pca953x_probe+0x2b4/0x5f0
> 42) 5664 144 .i2c_device_probe+0x224/0x2e0
> 43) 5520 160 .really_probe+0x244/0x380
> 44) 5360 160 .bus_for_each_drv+0x94/0x100
> 45) 5200 160 .__device_attach+0x118/0x160
> 46) 5040 144 .bus_probe_device+0xe8/0x100
> 47) 4896 208 .device_add+0x500/0x6c0
> 48) 4688 144 .i2c_new_device+0x1f8/0x240
> 49) 4544 256 .of_i2c_register_device+0x160/0x280
> 50) 4288 192 .i2c_register_adapter+0x238/0x630
> 51) 4096 208 .i2c_mux_add_adapter+0x3f8/0x540
> 52) 3888 192 .pca954x_probe+0x234/0x370
> 53) 3696 144 .i2c_device_probe+0x224/0x2e0
> 54) 3552 160 .really_probe+0x244/0x380
> 55) 3392 160 .bus_for_each_drv+0x94/0x100
> 56) 3232 160 .__device_attach+0x118/0x160
> 57) 3072 144 .bus_probe_device+0xe8/0x100
> 58) 2928 208 .device_add+0x500/0x6c0
> 59) 2720 144 .i2c_new_device+0x1f8/0x240
> 60) 2576 256 .of_i2c_register_device+0x160/0x280
> 61) 2320 144 .of_i2c_notify+0x12c/0x1d0
> 62) 2176 160 .notifier_call_chain+0x8c/0x100
> 63) 2016 160 .__blocking_notifier_call_chain+0x6c/0xe0
> 64) 1856 208 .__of_changeset_entry_notify+0xd8/0x140
> 65) 1648 192 .__of_changeset_apply+0x7c/0x100
> 66) 1456 272 .of_overlay_create+0x2e0/0x4b0
> 67) 1184 128 .xem2_install_overlay+0x40/0x90
> 68) 1056 176 .process_one_work+0x18c/0x540
> 69) 880 240 .worker_thread+0x98/0x550
> 70) 640 192 .kthread+0x150/0x190
> 71) 448 448 .ret_from_kernel_thread+0x58/0x64
> 13872
>
> Obviously this could be avoided by constant whack-a-mole type activity of
> restructuring code. We have in fact reworked our code from a two overlay
> install to a one overlay install to avoid the worst cases. However, we believe
> there is a more fundamental issue at the heart of the problem that ought to be
> addressed.
>
> In this thread from 2008 (https://lkml.org/lkml/2008/11/17/493) discussing
> similar issues it is observed that the minimum stack frame size for PPC64 is
> 112 bytes compared with 16 bytes for PPC32. We consider that this fact means
> the straight doubling of the 8k PPC32 stack to 16K for PPC64 does not lead to
> an "equitable" situation with regard to stack headroom. The PPC64 system will
> not have an equivalent amount of space to operate in.
Wouldn't it be better to try to switch to the ELF V2 ABI, which has a minimal
frame size of 32 bytes on PPC64?
For now it has only been used for little-endian kernels and applications,
but according to messages that I have seen on the list, switching the kernel
to ELF V2 should be possible.
Gabriel
>
> For instance for a 70 frame stack, the architecture overhead just for the stack
> frames is:
> 70 * 16 bytes = 1120 bytes for PPC32, and
> 70 * 112 bytes = 7840 bytes for PPC64.
> So a simple doubling of the PPC32 stack size leaves us with a shortfall of 5600
> bytes (7840 - (2 * 1120)). In the example the stack frame overhead for PPC32 is
> 1120/8192 = 13.7% of the stack space, whereas for PPC64 it is 7840/16384 =
> 47.9% of the space.
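
[The arithmetic above can be reproduced with a short sketch. The 16, 112 and 32 byte minimum frame sizes and the 70-frame depth come from this thread; the configuration labels and the helper frame_overhead() are assumptions made up for illustration, and the 32k row is the stack size the series proposes, not a measured result.]

```python
FRAMES = 70  # frame count used in the example above

# label -> (minimum frame size in bytes, stack size in bytes)
# Labels are illustrative; "ELFv1" denotes the ABI with the 112-byte minimum.
CONFIGS = {
    "PPC32, 8k stack": (16, 8 * 1024),
    "PPC64 ELFv1, 16k stack": (112, 16 * 1024),
    "PPC64 ELFv2, 16k stack": (32, 16 * 1024),
    "PPC64 ELFv1, 32k stack": (112, 32 * 1024),
}


def frame_overhead(min_frame, stack_size, frames=FRAMES):
    """Return (overhead in bytes, overhead as a percentage of the stack)."""
    overhead = frames * min_frame
    return overhead, 100.0 * overhead / stack_size


for name, (min_frame, stack_size) in CONFIGS.items():
    overhead, pct = frame_overhead(min_frame, stack_size)
    print(f"{name}: {overhead} bytes of frame overhead, {pct:.1f}% of the stack")
```

Note that 70 * 32 = 2240 bytes on a 16k stack is exactly the same fraction as 1120 bytes on an 8k stack, so with 32-byte minimum frames the plain doubling would give PPC64 the same relative overhead as PPC32.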
>
> The aim of this series is to provide the ability for users to configure for
> larger stacks without altering the defaults in a way that would impact existing
> users. However, given the inequity between the PPC32 and PPC64 stacks when
> taking into account the respective minimum stack frame sizes, we believe
> consideration should be given to having a larger default. We would appreciate
> any input or opinions on this issue.