[Skiboot] [PATCH v2 2/2] npu2: hw-procedures: Add check_credits procedure
Reza Arbab
arbab at linux.vnet.ibm.com
Wed Nov 22 14:32:06 AEDT 2017
On Wed, Nov 22, 2017 at 01:39:45PM +1100, Michael Neuling wrote:
>On Tue, 2017-11-21 at 17:42 -0600, Reza Arbab wrote:
>> Assert that things are exactly as we expect, because if they aren't, the
>> system will experience a catastrophic failure shortly after the start of
>> link traffic.
>
>Asserting like this is a pretty nuclear option IMHO.
>
>Generally skiboot will just drop the hardware as deconfigured or return an error
>to the caller if things don't work. Can we do that and punt the reboot problem
>back to higher layers in the stack?
Well, the higher layer in this scenario is a third-party binary kernel
module, and we want to accommodate existing versions of the driver which
don't handle this class of error yet.
It's basically a choice between failing loudly with a clear reason at
boot, or crashing in vague ways at runtime when traffic first passes the
link. We're finding out that the latter is much harder to diagnose.
I'll spin a v3 which hopefully makes this more clear in the change log.
And again, this is a mitigation which will only be needed until the
underlying hw issue causing the unexpected credit situation is resolved.
--
Reza Arbab