[Skiboot] [PATCH v2 2/2] npu2: hw-procedures: Add check_credits procedure

Reza Arbab arbab at linux.vnet.ibm.com
Wed Nov 22 14:32:06 AEDT 2017


On Wed, Nov 22, 2017 at 01:39:45PM +1100, Michael Neuling wrote:
>On Tue, 2017-11-21 at 17:42 -0600, Reza Arbab wrote:
>> Assert that things are exactly as we expect, because if they aren't, the
>> system will experience a catastrophic failure shortly after the start of
>> link traffic.
>
>Asserting like this is a pretty nuclear option IMHO.
>
>Generally skiboot will just drop the hardware as deconfigured or return an error
>to the caller if things don't work.  Can we do that and punt the reboot problem
>back to higher layers in the stack?

Well, the higher layer in this scenario is a third-party binary kernel 
module, and we want to accommodate existing versions of the driver which 
don't handle this class of error yet.

It's basically a choice between failing loudly with a clear reason at 
boot, or crashing in vague ways at runtime when traffic first passes the 
link. We're finding out that the latter is much harder to diagnose.

I'll spin a v3 which hopefully makes this clearer in the changelog.

And again, this is a mitigation which will only be needed until the 
underlying hw issue causing the unexpected credit situation is resolved.

-- 
Reza Arbab
