[PATCH v2 6/7] powerpc/eeh: Allow disabling recovery

Fri Feb 15 16:58:05 AEDT 2019

On Fri, Feb 15, 2019 at 11:48:16AM +1100, Oliver O'Halloran wrote:
> Currently when we detect an error we automatically invoke the EEH recovery
> handler. This can be annoying when debugging EEH problems, or when working
> on EEH itself so this patch adds a debugfs knob that will prevent a
> recovery event from being queued up when an issue is detected.
> 
> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
> ---
>  arch/powerpc/include/asm/eeh.h  |  1 +
>  arch/powerpc/kernel/eeh.c       | 10 ++++++++++
>  arch/powerpc/kernel/eeh_event.c |  9 +++++++++
>  3 files changed, 20 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 478f199d5663..810e05273ad3 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -220,6 +220,7 @@ struct eeh_ops {
>  
>  extern int eeh_subsystem_flags;
>  extern u32 eeh_max_freezes;
> +extern bool eeh_debugfs_no_recover;
>  extern struct eeh_ops *eeh_ops;
>  extern raw_spinlock_t confirm_error_lock;
>  
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 82d22c671c0e..9f20099ce2d9 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -111,6 +111,13 @@ EXPORT_SYMBOL(eeh_subsystem_flags);
>   */
>  u32 eeh_max_freezes = 5;
>  
> +/*
> + * Controls whether a recovery event should be scheduled when an
> + * isolated device is discovered. This is only really useful for
> + * debugging problems with the EEH core.
> + */
> +bool eeh_debugfs_no_recover;
> +
>  /* Platform dependent EEH operations */
>  struct eeh_ops *eeh_ops = NULL;
>  
> @@ -1810,6 +1817,9 @@ static int __init eeh_init_proc(void)
>  					   &eeh_enable_dbgfs_ops);
>  		debugfs_create_u32("eeh_max_freezes", 0600,
>  				powerpc_debugfs_root, &eeh_max_freezes);
> +		debugfs_create_bool("eeh_disable_recovery", 0600,
> +				powerpc_debugfs_root,
> +				&eeh_debugfs_no_recover);
>  		eeh_cache_debugfs_init();
>  #endif
>  	}
> diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
> index 227e57f980df..19837798bb1d 100644
> --- a/arch/powerpc/kernel/eeh_event.c
> +++ b/arch/powerpc/kernel/eeh_event.c
> @@ -126,6 +126,15 @@ int eeh_send_failure_event(struct eeh_pe *pe)
>  	unsigned long flags;
>  	struct eeh_event *event;
>  
> +	/*
> +	 * If we've manually supressed recovery events via debugfs
> +	 * then just drop it on the floor.
> +	 */
> +	if (eeh_debugfs_no_recover) {
> +		pr_err("EEH: Event dropped due to no_recover setting\n");
> +		return 0;
> +	}
> +

I think it might be clearer if you did the 'no recovery' test at the
call sites (I think there are only a few), instead of inside the
function (and then the next patch wouldn't need to add a wrapper).

>  	event = kzalloc(sizeof(*event), GFP_ATOMIC);
>  	if (!event) {
>  		pr_err("EEH: out of memory, event not handled\n");
> -- 
> 2.20.1
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20190215/835d1a03/attachment-0001.sig>