[RFC PATCH] powerpc/pseries: Ratelimit EPOW event warnings
Michael Ellerman
mpe at ellerman.id.au
Mon Jun 1 21:26:51 AEST 2015
On Thu, 2015-05-28 at 10:03 +0530, Kamalesh Babulal wrote:
> We print the respective warning after parsing EPOW interrupts,
> prompting the user to take action depending upon the severity of the
> event.
>
> Sometimes the same EPOW event warning, such as the one below, can flood
> the kernel log within a very short duration. So limit the message by
> using the ratelimited variant of pr_err.
>
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared

Looking at the timestamps, those are actually all fairly far apart in time,
aren't they? So do we actually see them within a short duration in practice?

It does seem sensible to rate limit them though.

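To make the timestamp question concrete, here is a minimal userspace sketch
of an interval/burst limiter fed with the thirteen timestamps quoted above.
It is only a model, not the kernel's ratelimit code: the names rl_model,
rl_model_allow, rl_model_count, HMS, quoted_ts and quoted_n are all
hypothetical, and the fixed-window behaviour is an assumption made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of an interval/burst limiter. Illustration only; this
 * is not the kernel's ratelimit implementation. */
struct rl_model {
	long interval;	/* window length, in seconds */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages let through in this window */
	bool started;	/* false until the first message arrives */
};

/* Return true if a message arriving at time `now` would be printed. */
bool rl_model_allow(struct rl_model *rl, long now)
{
	assert(rl != NULL);
	if (!rl->started || now - rl->begin >= rl->interval) {
		/* Open a new window and reset the budget. */
		rl->begin = now;
		rl->printed = 0;
		rl->started = true;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/* Count how many of n ascending timestamps get through the limiter. */
size_t rl_model_count(const long *ts, size_t n, long interval, int burst)
{
	struct rl_model rl = { .interval = interval, .burst = burst };
	size_t allowed = 0;

	for (size_t i = 0; i < n; i++)
		if (rl_model_allow(&rl, ts[i]))
			allowed++;
	return allowed;
}

#define HMS(h, m, s) ((h) * 3600L + (m) * 60L + (s))

/* The thirteen timestamps quoted above, as seconds since midnight. */
const long quoted_ts[] = {
	HMS(3, 46, 34), HMS(3, 46, 52), HMS(3, 53, 48), HMS(3, 55, 46),
	HMS(3, 56, 34), HMS(3, 59, 4),  HMS(4, 2, 1),   HMS(4, 4, 24),
	HMS(4, 7, 18),  HMS(4, 13, 4),  HMS(4, 22, 4),  HMS(4, 22, 26),
	HMS(4, 22, 36),
};
const size_t quoted_n = sizeof(quoted_ts) / sizeof(quoted_ts[0]);
```

Assuming a 5 second interval and a burst of 10 (the usual defaults behind
the _ratelimited printk helpers), rl_model_count(quoted_ts, quoted_n, 5, 10)
lets every one of the quoted messages through, because the closest pair is
10 seconds apart. Under those assumptions the limiter would only start
dropping messages for a much denser burst than the log above shows.
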
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..2556bc2 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -145,17 +145,17 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
>  	switch (action_code) {
>  	case EPOW_RESET:
> -		pr_err("Non critical power or cooling issue cleared");
> +		pr_err_ratelimited("Non critical power or cooling issue cleared");
>  		break;
>
>  	case EPOW_WARN_COOLING:
> -		pr_err("Non critical cooling issue reported by firmware");
> -		pr_err("Check RTAS error log for details");
> +		pr_err_ratelimited("Non critical cooling issue reported by firmware");
> +		pr_err_ratelimited("Check RTAS error log for details");
>  		break;
>
>  	case EPOW_WARN_POWER:
> -		pr_err("Non critical power issue reported by firmware");
> -		pr_err("Check RTAS error log for details");
> +		pr_err_ratelimited("Non critical power issue reported by firmware");
> +		pr_err_ratelimited("Check RTAS error log for details");
>  		break;

Those last two could be collapsed onto one line, which would reduce the spam.

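A rough userspace sketch of what collapsing could look like, with one
combined message per warning instead of two separate lines. The enum
model_epow_warn and the helpers epow_warning_msg and log_epow_warning are
hypothetical stand-ins, and the exact wording of the merged message is only
a suggestion; in ras.c the equivalent would presumably be a single
pr_err_ratelimited() call in each case.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two warning action codes; the real
 * EPOW_* definitions live in the kernel, not here. */
enum model_epow_warn { MODEL_WARN_COOLING, MODEL_WARN_POWER };

/* Return a single combined message for the given warning. */
const char *epow_warning_msg(enum model_epow_warn code)
{
	switch (code) {
	case MODEL_WARN_COOLING:
		return "Non critical cooling issue reported by firmware, check RTAS error log for details";
	case MODEL_WARN_POWER:
		return "Non critical power issue reported by firmware, check RTAS error log for details";
	}
	return "";
}

/* Emit one log line per event; returns the number of characters written. */
int log_epow_warning(FILE *out, enum model_epow_warn code)
{
	assert(out != NULL);
	return fprintf(out, "%s\n", epow_warning_msg(code));
}
```

Calling log_epow_warning(stderr, MODEL_WARN_POWER) writes a single line, so
each event costs one log entry rather than two, and one rate-limited call
site covers the whole message.
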
cheers