[PATCH v2] hwmon: (pmbus/ibm-cffps) Add clear_faults debugfs entry

Guenter Roeck linux at roeck-us.net
Fri Mar 18 05:50:53 AEDT 2022


On 3/17/22 09:12, Brandon Wyman wrote:
> On Wed, Mar 16, 2022 at 3:14 PM Guenter Roeck <linux at roeck-us.net> wrote:
>>
>> On 3/16/22 13:03, Brandon Wyman wrote:
>>> On Sun, Mar 13, 2022 at 11:36 PM Guenter Roeck <linux at roeck-us.net> wrote:
>>>>
>>>> On 3/11/22 10:10, Brandon Wyman wrote:
>>>>> Add a clear_faults write-only debugfs entry for the ibm-cffps device
>>>>> driver.
>>>>>
>>>>> Certain IBM power supplies require clearing some latched faults in order
>>>>> to indicate that the fault has indeed been observed/noticed.
>>>>>
>>>>
>>>> That is insufficient, sorry. Please provide the affected power supplies as
>>>> well as the affected faults, and confirm that the problem still exists
>>>> in v5.17-rc6 or later kernels - or, more specifically, in any kernel which
>>>> includes commit 35f165f08950 ("hwmon: (pmbus) Clear pmbus fault/warning
>>>> bits after read").
>>>>
>>>> Thanks,
>>>> Guenter
>>>
>>> Sorry for the delay in responding. I did some testing with commit
>>> 35f165f08950. I could not get that code to send the CLEAR_FAULTS
>>> command to the power supplies.
>>>
>>> I can update the commit message to be more specific about which power
>>> supplies need this CLEAR_FAULTS sent, and which faults. It is observed
>>> with the 1600W power supplies (2B1E model). The faults that latch are
>>> the VIN_UV and INPUT faults in the STATUS_WORD. The corresponding
>>> STATUS_INPUT fault bits are VIN_UV_FAULT and Unit is Off.
>>>
>>
>> The point is that the respective fault bits should be reset when the
>> corresponding alarm attributes are read. This isn't about executing
>> a CLEAR_FAULTS command, but about selectively resetting fault bits
>> while ensuring that faults are reported at least once. Executing
>> CLEAR_FAULTS is a big hammer.
>>
>> With the patch I pointed to in place, input (and other) faults should
>> be reset after the corresponding alarm attributes are read, assuming
>> that the condition no longer exists. If that does not happen, we should
>> fix the problem instead of deploying the big hammer.
>>
>> Thanks,
>> Guenter
> 
> Okay, I see what you are pointing out there. I had been mostly looking
> at the "files" in the debugfs paths. Those do not end up running
> through that pmbus_get_boolean() function, so the individual fault
> clearing was not being attempted. The fault I was interested in
> appears to be associated with in1_lcrti_alarm. Reading that will give
> me a 1 if there is a VIN_UV fault, and then it sends 0x10 to
> STATUS_INPUT. That clears out VIN_UV, but the STATUS_INPUT command was
> returning 0x18. Nothing appears to handle clearing BIT(3), that 0x08
> mask.
> 
> Should there be some kind of define for BIT(3) over in pmbus.h?
> Something like PB_VOLTAGE_OFF? Somehow we need something using that in
> sbit of the attributes. I had a quick hack that just OR'ed BIT(3) with
> BIT(4) for that PB_VOLTAGE_UV_FAULT. That resulted in a clear of both
> bits in STATUS_INPUT, and the faults clearing in STATUS_WORD.
> 
> It is not clear if there should be a separate alarm for that "Unit Off
> For Insufficient Input Voltage", or if the one for in1_lcrit_alarm
> could just be the two bits OR'ed into one mask. I can send a patch
> with a proposal on how to fix this one bit not getting cleared.
> 

We don't have a separate standard attribute. I think the best approach
would be to add a mask for bit 3 and or that mask for lcrit in
vin_limit_attrs with PB_VOLTAGE_UV_FAULT. I'd suggest to name the
define something like PB_VOLTAGE_VIN_OFF or PB_VOLTAGE_VIN_FAULT
to clarify that the bit applies to the input.

Thanks,
Guenter


More information about the openbmc mailing list