[PATCH linux dev-4.10] drivers/hwmon/occ: Switch to non-blocking for occ reads

Eddie James eajames at linux.vnet.ibm.com
Wed Sep 20 00:56:38 AEST 2017



On 09/19/2017 02:13 AM, Jeremy Kerr wrote:
> Hi Eddie,
>
>> It's possible for the sbefifo ops to hang entirely if the system goes
>> down (checkstop, etc), meaning that the occ and occ-hwmon drivers may
>> hang during remove().
> How does this address that though? Your comment implies that a thread is
> stuck in occ_drv_read, is that correct?

Correct.

>
> I see that you're now opening in non-blocking mode, but:
>
>> -	rc = occ_drv_read(client, (char *)resp, sizeof(*resp));
>> -	if (rc < 0)
>> -		goto err;
>> +	do {
>> +		rc = occ_drv_read(client, (char *)resp, sizeof(*resp));
>> +		if (rc != -EAGAIN)
>> +			break;
>> +
>> +		set_current_state(TASK_INTERRUPTIBLE);
>> +		schedule_timeout(msecs_to_jiffies(OCC_CMD_INCOMPLETE_MS));
>> +	} while (!p9_sbe_occ->cancel);
> - the only way that you hit the !p9_sbe_occ->cancel condition is if
> we've run the ->remove callback, in which case we *definitely* shouldn't
> have threads still referencing the device.

Hmm. In this error case, remove() is called from the sysfs unbind 
pathway. I could be wrong, but I don't see anything in the kernel to 
block in or return from unbind if there are hwmon files open. So I think 
it's possible to get to remove() with threads hanging around.

> There seems to be nothing
> preventing p9_sbe_occ from being free()ed underneath you.

That is true... Most likely we get lucky due to trying to remove hwmon 
files before freeing p9_sbe_occ... I'll have to fix this.

Thanks for the review!
Eddie

>
> If this patch works, it suggests that there's either a
> reference-counting issue, or a memory leak.
>
> Regards,
>
>
> Jeremy
>



More information about the openbmc mailing list