[PATCH linux dev-4.13 1/2] Revert "drivers/hwmon/occ: Add temperature fault attribute and VRM temp alarm"

Eddie James eajames at linux.vnet.ibm.com
Fri Mar 2 09:47:33 AEDT 2018


Sorry, ignore this one. I renamed this patch to hwmon (occ): Remove 
temperature fault attribute and VRM temp alarm and forgot to delete this 
one.


On 03/01/2018 04:44 PM, Eddie James wrote:
> From: "Edward A. James" <eajames at us.ibm.com>
>
> This reverts commit e55423ee10a5057338d24383c00e813436a126ea.
>
> Apologies for pushing this up so early... Userspace applications aren't
> ready for this change. The hwmon polling application cannot accept EGAIN
> yet, and we can't be returning apparent errors if the sensor is
> temporarily unavailable.
>
> Brad Bishop <bradleyb at fuzziesquirrel.com> wrote:
>> userspace retries for a configurable period on eio, etimedout,
>> ebadmsg, eagain, and exnio.  It also exits cleanly on enoent.
>>
>> If any of these errnos don’t go away after the configured number of retry
>> attempts, an error is logged.  Any other errnos and an error is logged
>> immediately. If this list needs improvement I’d love to hear about it.
>>
>> ebadmsg and enxio are observed when i2c devices are unplugged with transfers
>> in various stages of flight.  They occur just before the driver is unbound
>> after the presence gpio toggle on these devices is noticed and processed
>> (killing the hwmon userspace daemon cleanly).
>>
>> eio, etimedout seem to be bus type errors that appear somewhat infrequently,
>> but occur nevertheless.
>>
>> we all know what eagain means...
>>
>> It isn’t so much that the hwmon userspace doesn’t support this change,
>> its just the resulting behavior difference at the other end of the
>> hwmon abi <-> dbus api translation is not optimal, right now.
>>
>> 1 - Without this change, the hwmon userspace reads a value of 0 out of
>> the sysfs attribute and happily reports that as the temp at the DBus level.
>>
>> With this change, the hwmon userspace would read the attribute, get an
>> error and retry for a bit.  After a number of retries, the error is
>> logged and the sensor is marked faulted at a dbus level.
>>
>> The issue is the consumer of the sensor at the dbus level.  Today, the
>> dbus consumer happens to translate the sensor value of zero into the
>> desired behavior.  If we submit this change the sensor will be put
>> into faulted state and the consuming application doesn’t have support
>> for that yet.
>>
>> We have open issues to enhance the application to support faulted sensors,
>> its just not implemented yet.
> OpenBMC-Staging-Count: 1
> Signed-off-by: Eddie James <eajames at us.ibm.com>
> Signed-off-by: Joel Stanley <joel at jms.id.au>
> ---
>   drivers/hwmon/occ/common.c | 33 +++------------------------------
>   1 file changed, 3 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index 7c97a4c..a5c4df0 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -304,24 +304,11 @@ static ssize_t occ_show_temp_2(struct device *dev,
>   		val = get_unaligned_be32(&temp->sensor_id);
>   		break;
>   	case 1:
> -		val = temp->value;
> -		if (val == OCC_TEMP_SENSOR_FAULT)
> -			return -EREMOTEIO;
> -
> -		if (temp->fru_type != OCC_FRU_TYPE_VRM) {
> -			/* sensor not ready */
> -			if (val == 0)
> -				return -EAGAIN;
> -
> -			val *= 1000;	/* millidegrees */
> -		}
> +		val = temp->value * 1000;
>   		break;
>   	case 2:
>   		val = temp->fru_type;
>   		break;
> -	case 3:
> -		val = temp->value == OCC_TEMP_SENSOR_FAULT;
> -		break;
>   	default:
>   		return -EINVAL;
>   	}
> @@ -790,7 +777,7 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>   		num_attrs += (sensors->temp.num_sensors * 2);
>   		break;
>   	case 2:
> -		num_attrs += (sensors->temp.num_sensors * 4);
> +		num_attrs += (sensors->temp.num_sensors * 3);
>   		show_temp = occ_show_temp_2;
>   		break;
>   	default:
> @@ -863,22 +850,13 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>
>   	for (i = 0; i < sensors->temp.num_sensors; ++i) {
>   		s = i + 1;
> -		temp = ((struct temp_sensor_2 *)sensors->temp.data) + i;
>
>   		snprintf(attr->name, sizeof(attr->name), "temp%d_label", s);
>   		attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
>   					     0, i);
>   		attr++;
>
> -		if (sensors->temp.version > 1 &&
> -		    temp->fru_type == OCC_FRU_TYPE_VRM) {
> -			snprintf(attr->name, sizeof(attr->name),
> -				 "temp%d_alarm", s);
> -		} else {
> -			snprintf(attr->name, sizeof(attr->name),
> -				 "temp%d_input", s);
> -		}
> -
> +		snprintf(attr->name, sizeof(attr->name), "temp%d_input", s);
>   		attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
>   					     1, i);
>   		attr++;
> @@ -890,11 +868,6 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>   						     show_temp, NULL, 2, i);
>   			attr++;
>
> -			snprintf(attr->name, sizeof(attr->name),
> -				 "temp%d_fault", s);
> -			attr->sensor = OCC_INIT_ATTR(attr->name, 0444,
> -						     show_temp, NULL, 3, i);
> -			attr++;
>   		}
>   	}
>



More information about the openbmc mailing list