[PATCH linux dev-4.13 1/2] hwmon (occ): Remove temperature fault attribute and VRM temp alarm

Joel Stanley joel at jms.id.au
Fri Mar 2 11:39:56 AEDT 2018


On Fri, Mar 2, 2018 at 9:14 AM, Eddie James <eajames at linux.vnet.ibm.com> wrote:
> From: "Edward A. James" <eajames at us.ibm.com>
>
> This removes the EAGAIN return for "sensor not ready" and the temperature
> fault and vrm temp alarms attributes.
>
> Apologies for pushing this up so early... Userspace applications aren't
> ready for this change. The hwmon polling application cannot accept EGAIN
> yet, and we can't be returning apparent errors if the sensor is
> temporarily unavailable.

Okay. Brad's email is from October 5th, 2017. That's 148 days ago.

Any ETA on when the userspace plans to get some love?

Cheers,

Joel


>
> Brad Bishop <bradleyb at fuzziesquirrel.com> wrote:
>> userspace retries for a configurable period on eio, etimedout,
>> ebadmsg, eagain, and exnio.  It also exits cleanly on enoent.
>>
>> If any of these errnos don’t go away after the configured number of retry
>> attempts, an error is logged.  Any other errnos and an error is logged
>> immediately. If this list needs improvement I’d love to hear about it.
>>
>> ebadmsg and enxio are observed when i2c devices are unplugged with transfers
>> in various stages of flight.  They occur just before the driver is unbound
>> after the presence gpio toggle on these devices is noticed and processed
>> (killing the hwmon userspace daemon cleanly).
>>
>> eio, etimedout seem to be bus type errors that appear somewhat infrequently,
>> but occur nevertheless.
>>
>> we all know what eagain means...
>>
>> It isn’t so much that the hwmon userspace doesn’t support this change,
>> its just the resulting behavior difference at the other end of the
>> hwmon abi <-> dbus api translation is not optimal, right now.
>>
>> 1 - Without this change, the hwmon userspace reads a value of 0 out of
>> the sysfs attribute and happily reports that as the temp at the DBus level.
>>
>> With this change, the hwmon userspace would read the attribute, get an
>> error and retry for a bit.  After a number of retries, the error is
>> logged and the sensor is marked faulted at a dbus level.
>>
>> The issue is the consumer of the sensor at the dbus level.  Today, the
>> dbus consumer happens to translate the sensor value of zero into the
>> desired behavior.  If we submit this change the sensor will be put
>> into faulted state and the consuming application doesn’t have support
>> for that yet.
>>
>> We have open issues to enhance the application to support faulted sensors,
>> its just not implemented yet.
>
> OpenBMC-Staging-Count: 1
> Signed-off-by: Eddie James <eajames at us.ibm.com>
> Signed-off-by: Joel Stanley <joel at jms.id.au>
> ---
>  drivers/hwmon/occ/common.c | 33 +++------------------------------
>  1 file changed, 3 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index 7c97a4c..a5c4df0 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -304,24 +304,11 @@ static ssize_t occ_show_temp_2(struct device *dev,
>                 val = get_unaligned_be32(&temp->sensor_id);
>                 break;
>         case 1:
> -               val = temp->value;
> -               if (val == OCC_TEMP_SENSOR_FAULT)
> -                       return -EREMOTEIO;
> -
> -               if (temp->fru_type != OCC_FRU_TYPE_VRM) {
> -                       /* sensor not ready */
> -                       if (val == 0)
> -                               return -EAGAIN;
> -
> -                       val *= 1000;    /* millidegrees */
> -               }
> +               val = temp->value * 1000;
>                 break;
>         case 2:
>                 val = temp->fru_type;
>                 break;
> -       case 3:
> -               val = temp->value == OCC_TEMP_SENSOR_FAULT;
> -               break;
>         default:
>                 return -EINVAL;
>         }
> @@ -790,7 +777,7 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>                 num_attrs += (sensors->temp.num_sensors * 2);
>                 break;
>         case 2:
> -               num_attrs += (sensors->temp.num_sensors * 4);
> +               num_attrs += (sensors->temp.num_sensors * 3);
>                 show_temp = occ_show_temp_2;
>                 break;
>         default:
> @@ -863,22 +850,13 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>
>         for (i = 0; i < sensors->temp.num_sensors; ++i) {
>                 s = i + 1;
> -               temp = ((struct temp_sensor_2 *)sensors->temp.data) + i;
>
>                 snprintf(attr->name, sizeof(attr->name), "temp%d_label", s);
>                 attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
>                                              0, i);
>                 attr++;
>
> -               if (sensors->temp.version > 1 &&
> -                   temp->fru_type == OCC_FRU_TYPE_VRM) {
> -                       snprintf(attr->name, sizeof(attr->name),
> -                                "temp%d_alarm", s);
> -               } else {
> -                       snprintf(attr->name, sizeof(attr->name),
> -                                "temp%d_input", s);
> -               }
> -
> +               snprintf(attr->name, sizeof(attr->name), "temp%d_input", s);
>                 attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
>                                              1, i);
>                 attr++;
> @@ -890,11 +868,6 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>                                                      show_temp, NULL, 2, i);
>                         attr++;
>
> -                       snprintf(attr->name, sizeof(attr->name),
> -                                "temp%d_fault", s);
> -                       attr->sensor = OCC_INIT_ATTR(attr->name, 0444,
> -                                                    show_temp, NULL, 3, i);
> -                       attr++;
>                 }
>         }
>
> --
> 1.8.3.1
>


More information about the openbmc mailing list