[PATCH v2] powerpc/pseries: Ratelimit EPOW event warnings
Vipin K Parashar
vipin at linux.vnet.ibm.com
Thu Jun 25 05:18:20 AEST 2015
On 06/02/2015 10:48 AM, Kamalesh Babulal wrote:
> We print the respective warning after parsing EPOW interrupts,
> prompting the user to take action depending upon the severity of the
> event.
>
> Sometimes the same EPOW event warning, such as the one below, can flood
> the kernel log over a period of time. So limit the warnings by using the
> ratelimited variant of pr_err. Also, merge adjacent pr_err/pr_emerg calls
> into a single one to reduce the number of lines printed per warning.
>
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared

These messages are minutes apart, so rate limiting won't help:
pr_err_ratelimited() only suppresses messages that exceed a burst limit
within a short interval (by default, 10 messages per 5 seconds).
One solution could be a flag-based approach: set a flag once an EPOW
condition is detected, and check that flag upon receiving EPOW_RESET.
The "EPOW condition cleared" message should be logged only if an EPOW was
previously detected, i.e. only if the flag is found set.
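
A minimal userspace sketch of the flag-based idea, not the actual ras.c
change. The name epow_pending, the handle_epow() helper, the numeric action
code values and the use of printf() in place of pr_err() are all assumptions
made for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Action codes; the numeric values here are assumed for illustration. */
enum epow_action {
	EPOW_RESET = 0,
	EPOW_WARN_COOLING = 1,
	EPOW_WARN_POWER = 2,
};

/* Hypothetical flag: set while a non-critical EPOW condition is outstanding. */
static bool epow_pending;

/*
 * Hypothetical stand-in for the switch in rtas_parse_epow_errlog().
 * Returns true if a message was logged for this event.
 */
static bool handle_epow(int action_code)
{
	switch (action_code) {
	case EPOW_RESET:
		/* Log "cleared" only if a warning was seen earlier. */
		if (!epow_pending)
			return false;
		epow_pending = false;
		printf("Non critical power or cooling issue cleared\n");
		return true;

	case EPOW_WARN_COOLING:
		epow_pending = true;
		printf("Non critical cooling issue reported by firmware\n");
		return true;

	case EPOW_WARN_POWER:
		epow_pending = true;
		printf("Non critical power issue reported by firmware\n");
		return true;

	default:
		printf("Unknown power/cooling event (action code %d)\n",
		       action_code);
		return true;
	}
}
```

With this, a stream of EPOW_RESET events that is not preceded by a warning
logs nothing, while a warning followed by EPOW_RESET logs exactly one
"cleared" message.
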
>
> Signed-off-by: Kamalesh Babulal <kamalesh at linux.vnet.ibm.com>
> Cc: Anshuman Khandual <khandual at linux.vnet.ibm.com>
> Cc: Anton Blanchard <anton at samba.org>
> Cc: Michael Ellerman <mpe at ellerman.id.au>
> ---
> v2 Changes:
> - Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line
> warnings, based on Michael's comments.
>
> arch/powerpc/platforms/pseries/ras.c | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..3620935 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -145,17 +145,17 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
> switch (action_code) {
> case EPOW_RESET:
> - pr_err("Non critical power or cooling issue cleared");
> + pr_err_ratelimited("Non critical power or cooling issue cleared");
> break;
>
> case EPOW_WARN_COOLING:
> - pr_err("Non critical cooling issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_err_ratelimited("Non critical cooling issue reported by firmware,"
> + " Check RTAS error log for details");
> break;
>
> case EPOW_WARN_POWER:
> - pr_err("Non critical power issue reported by firmware");
> - pr_err("Check RTAS error log for details");
> + pr_err_ratelimited("Non critical power issue reported by firmware,"
> + " Check RTAS error log for details");
> break;
>
> case EPOW_SYSTEM_SHUTDOWN:
> @@ -169,15 +169,14 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
> case EPOW_MAIN_ENCLOSURE:
> case EPOW_POWER_OFF:
> - pr_emerg("Critical power/cooling issue reported by firmware");
> - pr_emerg("Check RTAS error log for details");
> - pr_emerg("Immediate power off");
> + pr_emerg("Critical power/cooling issue reported by firmware,"
> + " Check RTAS error log for details. Immediate power off");
> emergency_sync();
> kernel_power_off();
> break;
>
> default:
> - pr_err("Unknown power/cooling event (action code %d)",
> + pr_err_ratelimited("Unknown power/cooling event (action code %d)",
> action_code);
> }
> }