[PATCH linux dev-4.10] drivers/hwmon/occ: Switch to non-blocking for occ reads
Eddie James
eajames at linux.vnet.ibm.com
Wed Sep 20 00:56:38 AEST 2017
On 09/19/2017 02:13 AM, Jeremy Kerr wrote:
> Hi Eddie,
>
>> It's possible for the sbefifo ops to hang entirely if the system goes
>> down (checkstop, etc), meaning that the occ and occ-hwmon drivers may
>> hang during remove().
> How does this address that though? Your comment implies that a thread is
> stuck in occ_drv_read, is that correct?
Correct.
>
> I see that you're now opening in non-blocking mode, but:
>
>> - rc = occ_drv_read(client, (char *)resp, sizeof(*resp));
>> - if (rc < 0)
>> - goto err;
>> + do {
>> + rc = occ_drv_read(client, (char *)resp, sizeof(*resp));
>> + if (rc != -EAGAIN)
>> + break;
>> +
>> + set_current_state(TASK_INTERRUPTIBLE);
>> + schedule_timeout(msecs_to_jiffies(OCC_CMD_INCOMPLETE_MS));
>> + } while (!p9_sbe_occ->cancel);
> - the only way that you hit the !p9_sbe_occ->cancel condition is if
> we've run the ->remove callback, in which case we *definitely* shouldn't
> have threads still referencing the device.
Hmm. In this error case, remove() is called from the sysfs unbind
pathway. I could be wrong, but I don't see anything in the kernel to
block in or return from unbind if there are hwmon files open. So I think
it's possible to get to remove() with threads hanging around.
> There seems to be nothing
> preventing p9_sbe_occ from being free()ed underneath you.
That is true... Most likely we get lucky due to trying to remove hwmon
files before freeing p9_sbe_occ... I'll have to fix this.
Thanks for the review!
Eddie
>
> If this patch works, it suggests that there's either a
> reference-counting issue, or a memory leak.
>
> Regards,
>
>
> Jeremy
>
More information about the openbmc
mailing list