Fault handling(Threshold exceeds/low) in Fan and NIC sensors
Matt Spinler
mspinler at linux.ibm.com
Sat Nov 21 04:10:25 AEDT 2020
On 11/17/2020 5:59 AM, Kumar Thangavel wrote:
> Hi Ed,
>
> Please find below my response inline.
>
> Thanks,
> Kumar.
>
> -----Original Message-----
> From: Ed Tanous <ed at tanous.net>
> Sent: Monday, November 16, 2020 9:29 PM
> To: Kumar Thangavel <thangavel.k at hcl.com>
> Cc: openbmc at lists.ozlabs.org; Velumani T-ERS,HCLTech <velumanit at hcl.com>; sdasari at fb.com; Patrick Williams <patrickw3 at fb.com>; Patrick Venture <venture at google.com>; Jae Hyun Yoo <jae.hyun.yoo at linux.intel.com>; Vernon Mauery <vernon.mauery at linux.intel.com>; Zhikui Ren <zhikui.ren at intel.com>
> Subject: Re: Fault handling(Threshold exceeds/low) in Fan and NIC sensors
>
> On Mon, Nov 16, 2020 at 5:05 AM Kumar Thangavel <thangavel.k at hcl.com> wrote:
>> Hi Ed,
>>
>> In short, our requirement is to take actions when a fan fails. Those actions are platform-specific.
>>
>> Fan failure: This is based on fan sensors. If a fan sensor's tach value is less than 33%, we consider it a fan failure, and we take actions to reduce the heat produced in the system.
> dbus-sensors and phosphor-pid-control already have mechanisms for handling fan failure in these ways. Take a look at the existing config files, and they'll guide you on what you need to do next.
>
> Kumar: Are you referring to the checkThresholds function in dbus-sensors? High/low threshold levels are handled in that function. Please confirm.
> In that function, we plan to add a service to handle the platform-specific actions.
> We also plan to add a new field in entity manager to identify the particular sensors for which this fault condition should be handled.
I have a need to monitor some temperature sensor thresholds and take various actions, such as creating phosphor-logging event logs and doing soft and hard shutdowns after various delays. In fact, not all of the sensors I need to monitor will be provided by dbus-sensors, but I do need to use data provided by entity manager to tell me things like how long to delay.
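
A minimal sketch of the delayed-escalation logic described above, in C++ (the language dbus-sensors and phosphor-fan are written in). It assumes the threshold has already been asserted for some elapsed time and that the soft and hard shutdown delays have already been read from entity manager. The struct, enum, and function names are hypothetical, and the D-Bus plumbing (watching threshold alarms, calling the logging and power-control services) is left out:

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical per-sensor settings. Reading delays from entity manager is
// the idea from the message above; these field names are illustrative only
// and do not correspond to an existing schema.
struct ThresholdActionConfig
{
    std::chrono::seconds softShutdownDelay; // time asserted before a soft shutdown
    std::chrono::seconds hardShutdownDelay; // time asserted before a hard shutdown
};

enum class Action
{
    LogEvent,     // e.g. create a phosphor-logging event log
    SoftShutdown, // graceful host power-off
    HardShutdown  // immediate power-off
};

// Returns every action that is due once a threshold has stayed asserted
// for `elapsed`. A real daemon would also remember which actions it has
// already taken (so each runs once) and reset when the threshold deasserts.
std::vector<Action> dueActions(const ThresholdActionConfig& cfg,
                               std::chrono::seconds elapsed)
{
    // Precondition: the soft shutdown is expected to come first.
    assert(cfg.softShutdownDelay <= cfg.hardShutdownDelay);

    std::vector<Action> actions{Action::LogEvent}; // log as soon as asserted
    if (elapsed >= cfg.softShutdownDelay)
    {
        actions.push_back(Action::SoftShutdown);
    }
    if (elapsed >= cfg.hardShutdownDelay)
    {
        actions.push_back(Action::HardShutdown);
    }
    return actions;
}
```

Keeping the policy as a pure function of elapsed time makes it easy to test separately from the timers and D-Bus signals that would drive it.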

I don't think dbus-sensors is the appropriate place for this code, since it isn't putting any sensors on D-Bus and won't necessarily be monitoring sensors provided by that repo.

Does anyone have a good idea of where a daemon like this could go? If nowhere else, I could put it in phosphor-fan, even though it isn't fan related, since our platforms will always use the fan-monitor app provided there, which already does similar things for fan errors.
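
The fan-failure rule from Kumar's quoted message (a tach reading under 33% is treated as a failure) comes down to a single comparison. A sketch, assuming the 33% is measured against a reference speed such as the fan's target RPM; `isFanFailed` and its parameters are hypothetical and are not taken from dbus-sensors, phosphor-pid-control, or phosphor-fan:

```cpp
#include <cassert>

// Hypothetical check for the rule quoted in this thread: a fan counts as
// failed when its tach reading is below 33% of a reference speed. Whether
// that reference is the fan's target or maximum RPM is an assumption.
bool isFanFailed(double tachRpm, double referenceRpm,
                 double minFraction = 0.33)
{
    // Precondition: a non-positive reference would make the check meaningless.
    assert(referenceRpm > 0.0);
    return tachRpm < minFraction * referenceRpm;
}
```

A real monitor would probably also require the condition to persist for a while before acting, much like the shutdown delays described above, so that a brief dip during a speed change is not reported as a failure.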
>
>> That is, reduce the heat produced by the hosts, NIC, and other power-consuming modules.
>>
>> dbus-sensors already handles the threshold masking. We just use that threshold masking to take the platform-specific actions.
>>
>> Please let us know if any clarification is needed.
>>
>> Thanks,
>> Kumar.
> P.S. Please don't top-post.
More information about the openbmc mailing list