Phosphor-hwmon: reduce hwmonio::retries when sensor is Nonfunctional.

Thu Nguyen thu at amperemail.onmicrosoft.com
Thu Dec 24 13:32:14 AEDT 2020


On 12/24/20 08:52, Lei Yu wrote:
> On Wed, Dec 23, 2020 at 11:33 PM Thu Nguyen
> <thu at amperemail.onmicrosoft.com> wrote:
>> On 12/16/20 14:33, Thu Nguyen wrote:
>>> Hi All,
>>>
>>>
>>> I'm working with fan sensors on the Ampere MtJade platform.
>>>
>>> This platform has multiple fans, named FAN3_1, FAN3_2, FAN4_1,
>>> FAN4_2, FAN5_1...
>>>
>>> I added the configuration for those fans to phosphor-hwmon and also
>>> added the option "--enable-update-functional-on-fail" to the
>>> phosphor-hwmon build flags. I'm trying to set a fan's Functional
>>> property to false when the fan is unplugged.
>>>
>>> After flashing the new image to the board, I read the Functional
>>> property of the fans. Reading the D-Bus property takes about 0.05 to
>>> 0.1 seconds:
>>>
>>> root@mtjade:~# time busctl get-property
>>> xyz.openbmc_project.Hwmon-1644477290.Hwmon1
>>> /xyz/openbmc_project/sensors/fan_tach/FAN4_2
>>> xyz.openbmc_project.State.Decorator.OperationalStatus Functional
>>> b true
>>>
>>> real    0m0.078s
>>> user    0m0.002s
>>> sys    0m0.032s
>>> root@mtjade:~# time busctl get-property
>>> xyz.openbmc_project.Hwmon-1644477290.Hwmon1
>>> /xyz/openbmc_project/sensors/fan_tach/FAN3_2
>>> xyz.openbmc_project.State.Decorator.OperationalStatus Functional
>>> b true
>>>
>>>
>>> real    0m0.044s
>>> user    0m0.001s
>>> sys    0m0.034s
>>>
>>> After unplugging one fan (FAN4_2), I can see that Functional of FAN4_2
>>> is set to false as expected, and Functional of the other fans stays
>>> true. But the time to get the D-Bus properties of all fans increases
>>> hugely, even for the working fans.
>>>
>>> ~# time busctl get-property
>>> xyz.openbmc_project.Hwmon-1644477290.Hwmon1
>>> /xyz/openbmc_project/sensors/fan_tach/FAN4_2
>>> xyz.openbmc_project.State.Decorator.OperationalStatus Functional
>>> b false
>>>
>>> real    0m1.189s
>>> user    0m0.001s
>>> sys    0m0.036s
>>>
>>> ~# time busctl get-property
>>> xyz.openbmc_project.Hwmon-1644477290.Hwmon1
>>> /xyz/openbmc_project/sensors/fan_tach/FAN3_2
>>> xyz.openbmc_project.State.Decorator.OperationalStatus Functional
>>> b true
>>>
>>> real    0m3.285s
>>> user    0m0.010s
>>> sys    0m0.028s
>>>
>>> The "ipmitool sdr type 0x4" command also fails because of this
>>> increase.
>>>
>>> ~$ time ipmitool -I lanplus -U root -P 0penBmc -C 17 -H <BMCIP> sdr
>>> type 0x4
>>> FAN3_1           | 25h | ok  | 29.13 | 5100 RPM
>>> FAN3_2           | 28h | ok  | 29.16 | 4700 RPM
>>> FAN4_1           | 2Bh | ns  | 29.19 | No Reading
>>> FAN4_2           | 2Eh | ns  | 29.22 | No Reading
>>> FAN5_1           | 31h | ns  | 29.25 | No Reading
>>> FAN5_2           | 34h | ns  | 29.28 | No Reading
>>> FAN6_1           | 37h | ns  | 29.31 | No Reading
>>> FAN6_2           | 3Ah | ns  | 29.34 | No Reading
>>> FAN7_1           | 3Dh | ns  | 29.37 | No Reading
>>> FAN7_2           | 40h | ns  | 29.40 | No Reading
>>> FAN8_1           | 43h | ns  | 29.43 | No Reading
>>> FAN8_2           | 46h | ns  | 29.46 | No Reading
>>> PSU0_fan1        | F5h | ns  | 29.60 | No Reading
>>> PSU1_fan1        | F6h | ns  | 29.61 | No Reading
>>>
>>> real    2m43.704s
>>> user    0m0.046s
>>> sys    0m0.057s
>>>
>>> The cause of this increase is that when phosphor-hwmon fails to read
>>> a sensor, it keeps retrying the read, with 10 retries and a 100 ms
>>> delay between retries.
>>>
>>> Should we reduce the retry count for non-functional sensors?
> When a fan is unplugged, its "Present" property should be false as well.
> Maybe you could check that property and skip such fans?
>
The sensor D-Bus object does not have a Present property. That property
belongs to the inventory object of phosphor-fan.

If we used the Present property, we would have to map each fan sensor
name to its corresponding inventory object, which would break the
generic nature of phosphor-hwmon.

In my opinion, for devices that support hotplug, such as fans, we should
not retry when a read fails, because there is no difference between a
fan sensor that fails to read and a fan sensor whose fan has been
unplugged.

Is it reasonable to retry reading a failed sensor every 0.1 seconds?
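
To put rough numbers on the retry cost, here is a minimal sketch in
Python. It is for illustration only: phosphor-hwmon itself is C++, and
every name below (worst_case_block_ms, retries_for, read_with_retries)
is hypothetical, not its API. The only values taken from this thread are
the 10 retries and the 100 ms delay. The "reduced" retry count of 0 is
just one possible policy.

```python
import time

# Illustrative model only; none of these names come from phosphor-hwmon.
RETRIES = 10      # retry count quoted in this thread
DELAY_MS = 100    # delay between retries quoted in this thread


def worst_case_block_ms(retries: int, delay_ms: int) -> int:
    """Time spent sleeping between retries when every attempt fails."""
    return retries * delay_ms


def retries_for(functional: bool, normal: int = RETRIES, reduced: int = 0) -> int:
    """Hypothetical policy: full retries only while the sensor is functional."""
    return normal if functional else reduced


def read_with_retries(read_once, retries: int, delay_ms: int, sleep=time.sleep):
    """Call read_once() until it succeeds or the retries are used up."""
    attempt = 0
    while True:
        try:
            return read_once()
        except OSError:
            if attempt >= retries:
                raise  # out of retries: report the failure to the caller
            attempt += 1
            sleep(delay_ms / 1000)  # wait before the next attempt


print(worst_case_block_ms(RETRIES, DELAY_MS))             # 1000
print(worst_case_block_ms(retries_for(False), DELAY_MS))  # 0
```

With these values a single failing read sleeps for about 1 second in
total, which is roughly in line with the 1.189 s measured above for the
unplugged FAN4_2. Dropping the retries for a sensor that is already
marked non-functional would remove that wait in this model.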

>>>
>>> Regards.
>>>
>>> Thu Nguyen
>> Hi All,
>>
>> Any feedback on this?
>>
>> Thu Nguyen,
>>
>


