[PATCH][v3] hung_task: Panic after fixed number of hung tasks

Lance Yang lance.yang at linux.dev
Tue Oct 14 21:59:07 AEDT 2025



On 2025/10/14 17:45, Petr Mladek wrote:
> On Tue 2025-10-14 13:23:58, Lance Yang wrote:
>> Thanks for the patch!
>>
>> I noticed the implementation panics only when N tasks are detected
>> within a single scan, because total_hung_task is reset for each
>> check_hung_uninterruptible_tasks() run.
> 
> Great catch!
> 
> Does it make sense?
> Is it the intended behavior, please?
> 
>> So some suggestions to align the documentation with the code's
>> behavior below :)
> 
>> On 2025/10/12 19:50, lirongqing wrote:
>>> From: Li RongQing <lirongqing at baidu.com>
>>>
>>> Currently, when 'hung_task_panic' is enabled, the kernel panics
>>> immediately upon detecting the first hung task. However, some hung
>>> tasks are transient and the system can recover, while others are
>>> persistent and may accumulate progressively.
> 
> My understanding is that this patch wanted to do:
> 
>     + report even temporary stalls
>     + panic only when the stall was much longer and likely persistent
> 
> Which might make some sense. But the code does something else.

Cool. Sounds good to me!

> 
>>> --- a/kernel/hung_task.c
>>> +++ b/kernel/hung_task.c
>>> @@ -229,9 +232,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>>>    	 */
>>>    	sysctl_hung_task_detect_count++;
>>> +	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>>>    	trace_sched_process_hang(t);
>>> -	if (sysctl_hung_task_panic) {
>>> +	if (sysctl_hung_task_panic &&
>>> +			(total_hung_task >= sysctl_hung_task_panic)) {
>>>    		console_verbose();
>>>    		hung_task_show_lock = true;
>>>    		hung_task_call_panic = true;
> 
> I would expect that this patch added another counter, similar to
> sysctl_hung_task_detect_count. It would be incremented only
> once per check when a hung task was detected. And it would
> be cleared (reset) when no hung task was found.

Much cleaner. We could add an internal counter for that, yeah. No need
to expose it to userspace ;)

Petr's suggestion seems to align better with the goal of panicking on
persistent hangs, IMHO: panic only after N consecutive checks that each
find at least one hung task.
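
A rough, untested sketch of the idea, written as a standalone userspace
model rather than a real patch. The struct, the helper name and its
parameters are all made up for illustration; panic_threshold merely stands
in for sysctl_hung_task_panic, and how this would hook into
check_hung_uninterruptible_tasks() is left open:

```c
#include <stdbool.h>

/* Hypothetical internal state, not exposed to userspace. */
struct hung_panic_state {
	/* Number of checks in a row that found at least one hung task. */
	unsigned int consecutive_hung_checks;
};

/*
 * Hypothetical helper, meant to be called once at the end of each scan.
 *
 * @hung_found:      true if this scan detected at least one hung task
 * @panic_threshold: stand-in for sysctl_hung_task_panic (0 = never panic)
 *
 * Returns true when the panic should be triggered.
 */
static bool hung_check_should_panic(struct hung_panic_state *st,
				    bool hung_found,
				    unsigned int panic_threshold)
{
	if (!hung_found) {
		/* A clean scan means the stall was transient: start over. */
		st->consecutive_hung_checks = 0;
		return false;
	}

	/* Count this scan once, no matter how many tasks were hung in it. */
	st->consecutive_hung_checks++;

	return panic_threshold &&
	       st->consecutive_hung_checks >= panic_threshold;
}
```

With a threshold of 3, two bad scans followed by a clean one reset the
count, so it would take three more bad scans in a row to panic.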

@RongQing does that work for you?

