[RFC] BMC RAS Feature
Supreeth Venkatesh
supreeth.venkatesh at amd.com
Thu Mar 23 11:07:24 AEDT 2023
On 3/22/23 02:10, Lei Yu wrote:
>>> On Tue, 21 Mar 2023 at 20:38, Supreeth Venkatesh
>>> <supreeth.venkatesh at amd.com> wrote:
>>>
>>>
>>> On 3/21/23 05:40, Patrick Williams wrote:
>>> > On Tue, Mar 21, 2023 at 12:14:45AM -0500, Supreeth Venkatesh wrote:
>>> >
>>> >> #### Alternatives Considered
>>> >>
>>> >> In-band mechanisms using System Management Mode (SMM) exist.
>>> >>
>>> >> However, the out-of-band method to gather RAS data is processor specific.
>>> >>
>>> > How does this compare with existing implementations in
>>> > phosphor-debug-collector?
>>> Thanks for your feedback. See below.
>>> > I believe there was some attempt to extend
>>> > P-D-C previously to handle Intel's crashdump behavior.
>>> Intel's crashdump interface uses com.intel.crashdump.
>>> We have implemented com.amd.crashdump based on that reference.
>>> However, can this be made generic?
>>>
>>> PoC below:
>>>
>>> busctl tree com.amd.crashdump
>>>
>>> └─/com
>>> └─/com/amd
>>> └─/com/amd/crashdump
>>> ├─/com/amd/crashdump/0
>>> ├─/com/amd/crashdump/1
>>> ├─/com/amd/crashdump/2
>>> ├─/com/amd/crashdump/3
>>> ├─/com/amd/crashdump/4
>>> ├─/com/amd/crashdump/5
>>> ├─/com/amd/crashdump/6
>>> ├─/com/amd/crashdump/7
>>> ├─/com/amd/crashdump/8
>>> └─/com/amd/crashdump/9
>>>
>>> > The repository
>>> > currently handles IBM's processors, I think, or maybe that is covered by
>>> > openpower-debug-collector.
>>> >
>>> > In any case, I think you should look at the existing D-Bus interfaces
>>> > (and associated Redfish implementation) of these repositories and
>>> > determine if you can use those approaches (or document why not).
>>> I could not find an existing D-Bus interface for RAS in xyz/openbmc_project/.
>>> It would be helpful if you could point me to it.
>>>
>>>
>>> There is an interface for the dumps generated from the host, which can
>>> be used for these kinds of dumps:
>>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/System.interface.yaml
>>>
>>> The fault log also provides similar dumps:
>>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/FaultLog.interface.yaml
>>>
>> Thanks, Dhruvraj. The interface looks useful for the purpose. However,
>> the current bmcweb implementation references com.intel.crashdump in
>> https://github.com/openbmc/bmcweb/blob/master/redfish-core/lib/log_services.hpp:
>> constexpr char const* crashdumpPath = "/com/intel/crashdump";
>>
>> constexpr char const* crashdumpInterface = "com.intel.crashdump";
>> constexpr char const* crashdumpObject = "com.intel.crashdump";
>>
>> Is either of the following exercised in the Redfish LogServices?
>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/System.interface.yaml
>> or
>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/FaultLog.interface.yaml
> In our practice, a plugin `tools/dreport.d/plugins.d/acddump` is added
> to copy the crashdump json file to the dump tarball.
> The crashdump tool (Intel or AMD) could trigger a dump after the
> crashdump is completed, and then we could get a dump entry containing
> the crashdump.
Thanks Lei Yu for your input. We are using Redfish to retrieve the CPER
binary file which can then be passed through a plugin/script for
detailed analysis.
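
As a rough illustration of the first step such an analysis plugin might take, the sketch below checks that a retrieved binary starts with a CPER record header and pulls out a few leading fields. The function and struct names are hypothetical, and the byte offsets are an assumption based on the Common Platform Error Record layout in the UEFI specification (Appendix N); it also assumes the file is a plain CPER record with no wrapper. It needs C++17 for std::optional.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Leading fields of a CPER record header.
// Offsets are assumed from UEFI spec Appendix N; verify before relying on them.
struct CperHeaderInfo
{
    uint16_t revision;
    uint16_t sectionCount;
    uint32_t errorSeverity;
};

// Read an n-byte (n <= 4) little-endian unsigned integer at offset off.
static uint32_t readLe(const std::vector<uint8_t>& buf, std::size_t off,
                       std::size_t n)
{
    uint32_t value = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        value |= static_cast<uint32_t>(buf[off + i]) << (8 * i);
    }
    return value;
}

// Hypothetical helper: returns the parsed fields, or std::nullopt if the
// buffer is too short or the signatures do not match.
std::optional<CperHeaderInfo> parseCperHeader(const std::vector<uint8_t>& buf)
{
    constexpr std::size_t minLen = 16; // bytes covering the fields read here
    if (buf.size() < minLen)
    {
        return std::nullopt;
    }
    // Bytes 0-3: signature start, ASCII "CPER".
    if (std::memcmp(buf.data(), "CPER", 4) != 0)
    {
        return std::nullopt;
    }
    // Bytes 6-9: signature end, 0xFFFFFFFF.
    if (readLe(buf, 6, 4) != 0xFFFFFFFFu)
    {
        return std::nullopt;
    }
    CperHeaderInfo info{};
    info.revision = static_cast<uint16_t>(readLe(buf, 4, 2));
    info.sectionCount = static_cast<uint16_t>(readLe(buf, 10, 2));
    info.errorSeverity = readLe(buf, 12, 4);
    return info;
}
```

A caller would read the file fetched over Redfish into a byte vector, pass it to parseCperHeader, and only continue with section decoding if a value is returned.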
In any case, irrespective of which D-Bus interface we use, we need a
repository that gathers data from AMD processors via APML, as per AMD
design.
APML Spec: https://www.amd.com/system/files/TechDocs/57019-A0-PUB_3.00.zip
Can someone please help create a bmc-ras or amd-debug-collector repository?
There is precedent in the openpower-debug-collector repository used for
OpenPOWER systems.
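
On the earlier question of whether the crashdump naming can be made generic: one possible direction, sketched below, is to derive the D-Bus names from a vendor string instead of hardcoding com.intel.crashdump as in the constants quoted above. This is only an illustration under stated assumptions. CrashdumpNames, makeCrashdumpNames and entryPath are hypothetical names, not existing bmcweb or phosphor APIs, and the com.<vendor>.crashdump pattern is inferred from the two services mentioned in this thread.

```cpp
#include <string>

// Hypothetical container for the three names bmcweb currently hardcodes
// for Intel (crashdumpObject, crashdumpInterface, crashdumpPath).
struct CrashdumpNames
{
    std::string service; // D-Bus service name, e.g. com.amd.crashdump
    std::string iface;   // D-Bus interface name
    std::string path;    // root object path, e.g. /com/amd/crashdump
};

// Build the names for a vendor, assuming the com.<vendor>.crashdump pattern.
CrashdumpNames makeCrashdumpNames(const std::string& vendor)
{
    const std::string name = "com." + vendor + ".crashdump";
    return CrashdumpNames{name, name, "/com/" + vendor + "/crashdump"};
}

// Object path of one crashdump entry, matching the busctl tree layout
// shown in the PoC (/com/amd/crashdump/0 ... /com/amd/crashdump/9).
std::string entryPath(const CrashdumpNames& names, unsigned index)
{
    return names.path + "/" + std::to_string(index);
}
```

For example, entryPath(makeCrashdumpNames("amd"), 3) yields /com/amd/crashdump/3, while makeCrashdumpNames("intel") reproduces the names currently in log_services.hpp. A vendor-neutral xyz.openbmc_project interface would avoid the vendor string entirely, which may be the better long-term answer.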
>
>
> --
> BRs,
> Lei YU