[PATCH v2] powerpc/pseries: Ratelimit EPOW event warnings

Vipin K Parashar vipin at linux.vnet.ibm.com
Thu Jun 25 05:18:20 AEST 2015


On 06/02/2015 10:48 AM, Kamalesh Babulal wrote:
> We print the respective warning after parsing EPOW interrupts,
> prompting user to take action depending upon the severity of the
> event.
>
> Sometimes the same EPOW event warning, such as the one below, can flood
> the kernel log over a period of time. So limit the warnings by using the
> ratelimited variant of pr_err. Also, merge adjacent pr_err/pr_emerg calls
> into a single one to reduce the number of lines printed per warning.
>
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared

These messages are minutes apart, so rate limiting won't help.
One solution could be a flag-based approach: set a flag once an
EPOW condition is detected and check that flag upon receiving EPOW_RESET.
The "issue cleared" message should be logged only if an EPOW was previously
detected, i.e. the flag is found set.
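
To make the flag idea concrete, here is a rough userspace sketch, not a
tested kernel change. The enum is only a stand-in for the action codes in
ras.c, and the names epow_pending and handle_epow() are hypothetical; printf()
stands in for pr_err(). Real code in rtas_parse_epow_errlog() would also have
to decide how to serialize access to the flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the action codes handled in ras.c (illustrative only). */
enum epow_action {
	EPOW_RESET,
	EPOW_WARN_COOLING,
	EPOW_WARN_POWER,
};

/* Hypothetical flag: set while a non-critical EPOW condition is outstanding. */
static bool epow_pending;

/* Returns true if a message was logged for this event, false if suppressed. */
static bool handle_epow(enum epow_action action)
{
	switch (action) {
	case EPOW_RESET:
		/* Nothing was reported earlier, so there is nothing to clear. */
		if (!epow_pending)
			return false;
		epow_pending = false;
		printf("Non critical power or cooling issue cleared\n");
		return true;

	case EPOW_WARN_COOLING:
	case EPOW_WARN_POWER:
		/* Remember the condition so the next EPOW_RESET is logged once. */
		epow_pending = true;
		printf("Non critical power/cooling issue reported by firmware\n");
		return true;
	}

	return false;
}
```

With this, a run of back-to-back EPOW_RESET events logs at most one
"cleared" line per reported condition, no matter how far apart they arrive.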

>
> Signed-off-by: Kamalesh Babulal <kamalesh at linux.vnet.ibm.com>
> Cc: Anshuman Khandual <khandual at linux.vnet.ibm.com>
> Cc: Anton Blanchard <anton at samba.org>
> Cc: Michael Ellerman <mpe at ellerman.id.au>
> ---
> v2 Changes:
>   - Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
>     warnings, based on Michael's comments.
>
>   arch/powerpc/platforms/pseries/ras.c | 17 ++++++++---------
>   1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..3620935 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -145,17 +145,17 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>   
>   	switch (action_code) {
>   	case EPOW_RESET:
> -		pr_err("Non critical power or cooling issue cleared");
> +		pr_err_ratelimited("Non critical power or cooling issue cleared");
>   		break;
>   
>   	case EPOW_WARN_COOLING:
> -		pr_err("Non critical cooling issue reported by firmware");
> -		pr_err("Check RTAS error log for details");
> +		pr_err_ratelimited("Non critical cooling issue reported by firmware,"
> +				   " Check RTAS error log for details");
>   		break;
>   
>   	case EPOW_WARN_POWER:
> -		pr_err("Non critical power issue reported by firmware");
> -		pr_err("Check RTAS error log for details");
> +		pr_err_ratelimited("Non critical power issue reported by firmware,"
> +				   " Check RTAS error log for details");
>   		break;
>   
>   	case EPOW_SYSTEM_SHUTDOWN:
> @@ -169,15 +169,14 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>   
>   	case EPOW_MAIN_ENCLOSURE:
>   	case EPOW_POWER_OFF:
> -		pr_emerg("Critical power/cooling issue reported by firmware");
> -		pr_emerg("Check RTAS error log for details");
> -		pr_emerg("Immediate power off");
> +		pr_emerg("Critical power/cooling issue reported by firmware,"
> +			 " Check RTAS error log for details. Immediate power off");
>   		emergency_sync();
>   		kernel_power_off();
>   		break;
>   
>   	default:
> -		pr_err("Unknown power/cooling event (action code %d)",
> +		pr_err_ratelimited("Unknown power/cooling event (action code %d)",
>   			action_code);
>   	}
>   }
