[PATCH][v3] hung_task: Panic after fixed number of hung tasks
Lance Yang
lance.yang at linux.dev
Tue Oct 14 21:59:07 AEDT 2025
On 2025/10/14 17:45, Petr Mladek wrote:
> On Tue 2025-10-14 13:23:58, Lance Yang wrote:
>> Thanks for the patch!
>>
>> I noticed the implementation panics only when N tasks are detected
>> within a single scan, because total_hung_task is reset for each
>> check_hung_uninterruptible_tasks() run.
>
> Great catch!
>
> Does it make sense?
> Is it the intended behavior, please?
>
>> So some suggestions to align the documentation with the code's
>> behavior below :)
>
>> On 2025/10/12 19:50, lirongqing wrote:
>>> From: Li RongQing <lirongqing at baidu.com>
>>>
>>> Currently, when 'hung_task_panic' is enabled, the kernel panics
>>> immediately upon detecting the first hung task. However, some hung
>>> tasks are transient and the system can recover, while others are
>>> persistent and may accumulate progressively.
>
> My understanding is that this patch wanted to do:
>
> + report even temporary stalls
> + panic only when the stall was much longer and likely persistent
>
> Which might make some sense. But the code does something else.
Cool. Sounds good to me!
>
>>> --- a/kernel/hung_task.c
>>> +++ b/kernel/hung_task.c
>>> @@ -229,9 +232,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>>>  	 */
>>>  	sysctl_hung_task_detect_count++;
>>> +	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>>>  	trace_sched_process_hang(t);
>>> -	if (sysctl_hung_task_panic) {
>>> +	if (sysctl_hung_task_panic &&
>>> +	    (total_hung_task >= sysctl_hung_task_panic)) {
>>>  		console_verbose();
>>>  		hung_task_show_lock = true;
>>>  		hung_task_call_panic = true;
>
> I would expect that this patch added another counter, similar to
> sysctl_hung_task_detect_count. It would be incremented only
> once per check when a hung task was detected. And it would
> be cleared (reset) when no hung task was found.
Much cleaner. We could add an internal counter for that, yeah. No need
to expose it to userspace ;)
Petr's suggestion seems to align better with the goal of panicking on
persistent hangs, IMHO. Panic after N consecutive checks with hung tasks.
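
To make the difference concrete, here is a rough userspace model of the two
policies. It is only a sketch: the struct and function names are made up for
illustration and do not come from the patch or from kernel/hung_task.c, and
reusing the panic sysctl value as the threshold N is an assumption.

```c
#include <stdbool.h>

/* Hypothetical internal state, not exposed to userspace. */
struct hung_scan_state {
	/* Number of scans in a row that found at least one hung task. */
	unsigned int consecutive_hung_checks;
};

/*
 * Model of what v3 does today: the count is relative to the start of
 * the current scan, so only a single scan that finds at least
 * panic_threshold hung tasks triggers a panic. 0 disables the panic.
 */
static bool should_panic_per_scan(unsigned int hung_in_this_scan,
				  unsigned int panic_threshold)
{
	return panic_threshold && hung_in_this_scan >= panic_threshold;
}

/*
 * Model of the suggested alternative: bump the counter once per scan
 * that finds any hung task, reset it on a clean scan, and panic after
 * panic_threshold consecutive scans with hung tasks.
 */
static bool should_panic_consecutive(struct hung_scan_state *st,
				     unsigned int hung_in_this_scan,
				     unsigned int panic_threshold)
{
	if (hung_in_this_scan == 0) {
		st->consecutive_hung_checks = 0;
		return false;
	}

	st->consecutive_hung_checks++;
	return panic_threshold &&
	       st->consecutive_hung_checks >= panic_threshold;
}
```

With a threshold of 3, the second model ignores a stall that clears after one
or two scans, while the first model would still panic on a single scan that
happens to find three hung tasks at once.
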
@RongQing does that work for you?