[PATCH linux dev-4.13 1/2] hwmon (occ): Remove temperature fault attribute and VRM temp alarm
Joel Stanley
joel at jms.id.au
Fri Mar 2 11:39:56 AEDT 2018
On Fri, Mar 2, 2018 at 9:14 AM, Eddie James <eajames at linux.vnet.ibm.com> wrote:
> From: "Edward A. James" <eajames at us.ibm.com>
>
> This removes the EAGAIN return for "sensor not ready" and the temperature
> fault and vrm temp alarms attributes.
>
> Apologies for pushing this up so early... Userspace applications aren't
> ready for this change. The hwmon polling application cannot accept EGAIN
> yet, and we can't be returning apparent errors if the sensor is
> temporarily unavailable.
Okay. Brad's email is from October 5th, 2017. That's 148 days ago.
Any ETA on when the userspace plans to get some love?
Cheers,
Joel
>
> Brad Bishop <bradleyb at fuzziesquirrel.com> wrote:
>> userspace retries for a configurable period on eio, etimedout,
>> ebadmsg, eagain, and exnio. It also exits cleanly on enoent.
>>
>> If any of these errnos don’t go away after the configured number of retry
>> attempts, an error is logged. Any other errnos and an error is logged
>> immediately. If this list needs improvement I’d love to hear about it.
>>
>> ebadmsg and enxio are observed when i2c devices are unplugged with transfers
>> in various stages of flight. They occur just before the driver is unbound
>> after the presence gpio toggle on these devices is noticed and processed
>> (killing the hwmon userspace daemon cleanly).
>>
>> eio, etimedout seem to be bus type errors that appear somewhat infrequently,
>> but occur nevertheless.
>>
>> we all know what eagain means...
>>
>> It isn’t so much that the hwmon userspace doesn’t support this change,
>> its just the resulting behavior difference at the other end of the
>> hwmon abi <-> dbus api translation is not optimal, right now.
>>
>> 1 - Without this change, the hwmon userspace reads a value of 0 out of
>> the sysfs attribute and happily reports that as the temp at the DBus level.
>>
>> With this change, the hwmon userspace would read the attribute, get an
>> error and retry for a bit. After a number of retries, the error is
>> logged and the sensor is marked faulted at a dbus level.
>>
>> The issue is the consumer of the sensor at the dbus level. Today, the
>> dbus consumer happens to translate the sensor value of zero into the
>> desired behavior. If we submit this change the sensor will be put
>> into faulted state and the consuming application doesn’t have support
>> for that yet.
>>
>> We have open issues to enhance the application to support faulted sensors,
>> its just not implemented yet.
>
> OpenBMC-Staging-Count: 1
> Signed-off-by: Eddie James <eajames at us.ibm.com>
> Signed-off-by: Joel Stanley <joel at jms.id.au>
> ---
> drivers/hwmon/occ/common.c | 33 +++------------------------------
> 1 file changed, 3 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index 7c97a4c..a5c4df0 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -304,24 +304,11 @@ static ssize_t occ_show_temp_2(struct device *dev,
> val = get_unaligned_be32(&temp->sensor_id);
> break;
> case 1:
> - val = temp->value;
> - if (val == OCC_TEMP_SENSOR_FAULT)
> - return -EREMOTEIO;
> -
> - if (temp->fru_type != OCC_FRU_TYPE_VRM) {
> - /* sensor not ready */
> - if (val == 0)
> - return -EAGAIN;
> -
> - val *= 1000; /* millidegrees */
> - }
> + val = temp->value * 1000;
> break;
> case 2:
> val = temp->fru_type;
> break;
> - case 3:
> - val = temp->value == OCC_TEMP_SENSOR_FAULT;
> - break;
> default:
> return -EINVAL;
> }
> @@ -790,7 +777,7 @@ static int occ_setup_sensor_attrs(struct occ *occ)
> num_attrs += (sensors->temp.num_sensors * 2);
> break;
> case 2:
> - num_attrs += (sensors->temp.num_sensors * 4);
> + num_attrs += (sensors->temp.num_sensors * 3);
> show_temp = occ_show_temp_2;
> break;
> default:
> @@ -863,22 +850,13 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>
> for (i = 0; i < sensors->temp.num_sensors; ++i) {
> s = i + 1;
> - temp = ((struct temp_sensor_2 *)sensors->temp.data) + i;
>
> snprintf(attr->name, sizeof(attr->name), "temp%d_label", s);
> attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
> 0, i);
> attr++;
>
> - if (sensors->temp.version > 1 &&
> - temp->fru_type == OCC_FRU_TYPE_VRM) {
> - snprintf(attr->name, sizeof(attr->name),
> - "temp%d_alarm", s);
> - } else {
> - snprintf(attr->name, sizeof(attr->name),
> - "temp%d_input", s);
> - }
> -
> + snprintf(attr->name, sizeof(attr->name), "temp%d_input", s);
> attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
> 1, i);
> attr++;
> @@ -890,11 +868,6 @@ static int occ_setup_sensor_attrs(struct occ *occ)
> show_temp, NULL, 2, i);
> attr++;
>
> - snprintf(attr->name, sizeof(attr->name),
> - "temp%d_fault", s);
> - attr->sensor = OCC_INIT_ATTR(attr->name, 0444,
> - show_temp, NULL, 3, i);
> - attr++;
> }
> }
>
> --
> 1.8.3.1
>
More information about the openbmc
mailing list