[PATCH linux dev-4.13 1/2] hwmon (occ): Remove temperature fault attribute and VRM temp alarm
Matt Spinler
mspinler at linux.vnet.ibm.com
Fri Mar 9 06:15:58 AEDT 2018
On 3/1/2018 6:39 PM, Joel Stanley wrote:
> On Fri, Mar 2, 2018 at 9:14 AM, Eddie James <eajames at linux.vnet.ibm.com> wrote:
>> From: "Edward A. James" <eajames at us.ibm.com>
>>
>> This removes the EAGAIN return for "sensor not ready" and the temperature
>> fault and vrm temp alarms attributes.
>>
>> Apologies for pushing this up so early... Userspace applications aren't
>> ready for this change. The hwmon polling application cannot accept EGAIN
>> yet, and we can't be returning apparent errors if the sensor is
>> temporarily unavailable.
>
> Okay. Brad's email is from October 5th, 2017. That's 148 days ago.
>
> Any ETA on when the userspace plans to get some love?
>
We have 4 issues in the backlog to address the OCC GPU sensors returning
either a 0 or 0xFF:
https://github.com/openbmc/openbmc/issues/2327
https://github.com/openbmc/openbmc/issues/2329
https://github.com/openbmc/openbmc/issues/2222
https://github.com/openbmc/openbmc/issues/2223
As fan control is already doing the right thing when it reads the values
as they are now, we deprioritized the work quite a bit so it still
isn't scheduled and may not be until later systems.
> Cheers,
>
> Joel
>
>
>>
>> Brad Bishop <bradleyb at fuzziesquirrel.com> wrote:
>>> userspace retries for a configurable period on eio, etimedout,
>>> ebadmsg, eagain, and exnio. It also exits cleanly on enoent.
>>>
>>> If any of these errnos don’t go away after the configured number of retry
>>> attempts, an error is logged. Any other errnos and an error is logged
>>> immediately. If this list needs improvement I’d love to hear about it.
>>>
>>> ebadmsg and enxio are observed when i2c devices are unplugged with transfers
>>> in various stages of flight. They occur just before the driver is unbound
>>> after the presence gpio toggle on these devices is noticed and processed
>>> (killing the hwmon userspace daemon cleanly).
>>>
>>> eio, etimedout seem to be bus type errors that appear somewhat infrequently,
>>> but occur nevertheless.
>>>
>>> we all know what eagain means...
>>>
>>> It isn’t so much that the hwmon userspace doesn’t support this change,
>>> its just the resulting behavior difference at the other end of the
>>> hwmon abi <-> dbus api translation is not optimal, right now.
>>>
>>> 1 - Without this change, the hwmon userspace reads a value of 0 out of
>>> the sysfs attribute and happily reports that as the temp at the DBus level.
>>>
>>> With this change, the hwmon userspace would read the attribute, get an
>>> error and retry for a bit. After a number of retries, the error is
>>> logged and the sensor is marked faulted at a dbus level.
>>>
>>> The issue is the consumer of the sensor at the dbus level. Today, the
>>> dbus consumer happens to translate the sensor value of zero into the
>>> desired behavior. If we submit this change the sensor will be put
>>> into faulted state and the consuming application doesn’t have support
>>> for that yet.
>>>
>>> We have open issues to enhance the application to support faulted sensors,
>>> its just not implemented yet.
>>
>> OpenBMC-Staging-Count: 1
>> Signed-off-by: Eddie James <eajames at us.ibm.com>
>> Signed-off-by: Joel Stanley <joel at jms.id.au>
>> ---
>> drivers/hwmon/occ/common.c | 33 +++------------------------------
>> 1 file changed, 3 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
>> index 7c97a4c..a5c4df0 100644
>> --- a/drivers/hwmon/occ/common.c
>> +++ b/drivers/hwmon/occ/common.c
>> @@ -304,24 +304,11 @@ static ssize_t occ_show_temp_2(struct device *dev,
>> val = get_unaligned_be32(&temp->sensor_id);
>> break;
>> case 1:
>> - val = temp->value;
>> - if (val == OCC_TEMP_SENSOR_FAULT)
>> - return -EREMOTEIO;
>> -
>> - if (temp->fru_type != OCC_FRU_TYPE_VRM) {
>> - /* sensor not ready */
>> - if (val == 0)
>> - return -EAGAIN;
>> -
>> - val *= 1000; /* millidegrees */
>> - }
>> + val = temp->value * 1000;
>> break;
>> case 2:
>> val = temp->fru_type;
>> break;
>> - case 3:
>> - val = temp->value == OCC_TEMP_SENSOR_FAULT;
>> - break;
>> default:
>> return -EINVAL;
>> }
>> @@ -790,7 +777,7 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>> num_attrs += (sensors->temp.num_sensors * 2);
>> break;
>> case 2:
>> - num_attrs += (sensors->temp.num_sensors * 4);
>> + num_attrs += (sensors->temp.num_sensors * 3);
>> show_temp = occ_show_temp_2;
>> break;
>> default:
>> @@ -863,22 +850,13 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>>
>> for (i = 0; i < sensors->temp.num_sensors; ++i) {
>> s = i + 1;
>> - temp = ((struct temp_sensor_2 *)sensors->temp.data) + i;
>>
>> snprintf(attr->name, sizeof(attr->name), "temp%d_label", s);
>> attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
>> 0, i);
>> attr++;
>>
>> - if (sensors->temp.version > 1 &&
>> - temp->fru_type == OCC_FRU_TYPE_VRM) {
>> - snprintf(attr->name, sizeof(attr->name),
>> - "temp%d_alarm", s);
>> - } else {
>> - snprintf(attr->name, sizeof(attr->name),
>> - "temp%d_input", s);
>> - }
>> -
>> + snprintf(attr->name, sizeof(attr->name), "temp%d_input", s);
>> attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_temp, NULL,
>> 1, i);
>> attr++;
>> @@ -890,11 +868,6 @@ static int occ_setup_sensor_attrs(struct occ *occ)
>> show_temp, NULL, 2, i);
>> attr++;
>>
>> - snprintf(attr->name, sizeof(attr->name),
>> - "temp%d_fault", s);
>> - attr->sensor = OCC_INIT_ATTR(attr->name, 0444,
>> - show_temp, NULL, 3, i);
>> - attr++;
>> }
>> }
>>
>> --
>> 1.8.3.1
>>
>
More information about the openbmc
mailing list