[PATCH linux dev-4.10 v3] drivers/hwmon/occ: Add sysfs_notify on error
eajames at linux.vnet.ibm.com
Sat Jul 8 05:36:46 AEST 2017
On 07/07/2017 01:37 PM, Patrick Williams wrote:
> On Fri, Jul 07, 2017 at 04:06:30PM +0930, Joel Stanley wrote:
>> On Fri, Jul 7, 2017 at 12:58 AM, Eddie James <eajames at linux.vnet.ibm.com> wrote:
>>> From: "Edward A. James" <eajames at us.ibm.com>
>>> +void occ_set_error(struct occ *occ, int error)
>>> + occ->error_count++;
>>> + if (occ->error_count > OCC_ERROR_COUNT_THRESHOLD)
>> I think this policy can live in userspace. Instead of counting the
>> errors in the kernel and only reporting after some threshold, report
>> all of the errors.
>> Userspace can then choose what to do with this information,
>> according to the policy of the system.
> This retry / threshold count is in the OCC specification itself and is
> not considered "policy of the system". I believe the specification
> requires that errors under a threshold count not be treated as actual
> errors, and that the threshold count be reset after a successful
> communication. How do we coordinate that with userspace keeping a
> separate error policy?
In addition, there are a variety of different error types reported from
the driver, such as "OCC in safe state for one minute" and "OCCs present
mismatch". Those shouldn't be a part of the error counting, according to
the OCC spec.
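
To make the distinction concrete, here is a rough sketch of the behaviour
described above. It is not the code from the patch: the struct, the helper
names, the "countable" flag and the threshold value of 2 are all made up for
illustration, and the real driver would call sysfs_notify() where this
sketch just returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value; the real threshold is defined by the driver/OCC spec. */
#define OCC_ERROR_COUNT_THRESHOLD 2

/* Illustrative stand-in for the driver's struct occ. */
struct occ_sketch {
	int error_count;	/* consecutive countable failures so far */
	int error;		/* last error actually reported, 0 if none */
};

/*
 * Record a failure. Returns true if the error should be reported to
 * userspace now. Countable errors (e.g. failed transfers) are reported only
 * once the count exceeds the threshold; other error types (e.g. "OCC in
 * safe state", "OCCs present mismatch") bypass the counter entirely.
 */
static bool occ_sketch_set_error(struct occ_sketch *occ, int error,
				 bool countable)
{
	assert(occ != NULL);

	if (!countable) {
		occ->error = error;
		return true;
	}

	occ->error_count++;
	if (occ->error_count > OCC_ERROR_COUNT_THRESHOLD) {
		occ->error = error;
		return true;
	}

	return false;
}

/* A successful communication clears the count and any reported error. */
static void occ_sketch_reset_error(struct occ_sketch *occ)
{
	assert(occ != NULL);

	occ->error_count = 0;
	occ->error = 0;
}
```

With the assumed threshold of 2, the first two consecutive countable
failures are swallowed and the third is reported, while a non-countable
error is reported immediately and leaves the count alone.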