Error handling

chandu chanti chandutmba at gmail.com
Tue Sep 5 00:49:48 AEST 2023


```
+-----------------------------------------------------------------------------+
| +-------------+          dbus calls
+-------------------+ | cmd1  +------------------+
| |   BMCWeb    |  <----------------------------------> |   GPU
| | <---->| GPU Out of band  |
| |   Service   |  (xyz.openbmc_project.Memory.Dimm )   |   Manager
| | cmd2  | Server           |
| |             |                                       |  (backend dbus
 | | <---> | (hardware level) |
| |             |                                       |   service)
 | |       |  service)        |
| +-------------+
+-------------------+ |       +------------------+
|                        BMC
   |
+-----------------------------------------------------------------------------+

```

Memory URI will use the backend D-Bus interface
"xyz.openbmc_project.Memory.Dimm" hosted by the GPU Manager service to
fetch all memory-related properties.
GPU Manager will update these D-Bus properties by sending SMBus commands to
GPU Out-of-Band server.Intermittent failures can occur in SMBus commands.
At a given instance, there is a chance that cmd1 will fail and cmd2 may
work. we want to use null value for failing property in redfish response
instead of failing entire URI
This is inline with redfish specification
https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.18.0.pdf
, refer 9.5 Common data types 9.5.1 Primitive types.
This is our use case.


>Please point to a failing branch in code you'd like handled differently.
Again, you've answered in the abstract
>which isn't helping me understand what you're trying to accomplish.

please refer gerrit MR(https://gerrit.openbmc.org/c/openbmc/bmcweb/+/66368)
We are trying to handle errors in memory URI properties. We have a backend
service named GPU Manager Server
that will handle all backend requests related to memory URI. We will use
different SMBus commands to update these properties.
In this particular URI case, the command for serial number may fail, but
SMBus commands for properties like part number and manufacturer will work
at a given time.

if (serialNumber != nullptr)
{
    asyncResp->res.jsonValue[jsonPtr]["SerialNumber"] =
          mydatatype_t(*serialNumber);
}

if (partNumber != nullptr)
{
    asyncResp->res.jsonValue[jsonPtr]["PartNumber"] =
          mydatatype_t(*partNumber);
}


> If there is an error, the return code should not be 200.  200 OK is
reserved for when there was no error.

Here we are discussing about only one property in dbus interface is
failing. not entire interface.
According to DMTF spec
https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.18.0.pdf,

9.5 Common data types 9.5.1 Primitive types , please refer Table 17
null value, which the service uses when it is unable to determine the
property's value due to an error or
other temporary condition, or if the schema has requirements for using null
for other special conditions.

we want to use null data type in redfish response to represent error
conditions for specific property.

>This is an example for a double data type.  The other types can be done in
similar manners, if the dbus interface allows it.

we checked for other datatypes, we didn't find any implementation details.
We mentioned the same in our previous mail.

>>2. Using default values in D-Bus properties to indicate errors. Is there
a reference implementation available for setting default values
>>for string and integer data types in bmc web, similar to the
implementation of NaN (default value) for the double data type in
cable.hpp.


>Sorry, but this sounds like you plan on not propagating errors to the
user.
>I can't think of a case where that would be the correct thing to do.
>If the internal service failed, bmcweb should report that to the user
through a return code and error property.

In our case, there is no failure in the entire service, intermittent
failures can occur in smbus commands.
We want to use null value for the corresponding property in redfish
response to represent the error.


On Fri, Sep 1, 2023 at 9:27 PM Ed Tanous <edtanous at google.com> wrote:

>
>
> On Fri, Sep 1, 2023 at 7:21 AM chandu chanti <chandutmba at gmail.com> wrote:
>
>>
>> >Overall, I've read the code, and your email, and I really have no idea
>> what's being proposed that doesn't already exist in a much simpler form,
>> >using the existing error handling mechanisms.  While your point that one
>> failing property can fail the whole call is correct,
>> >that's the intent.  If an internal service fails, that needs propagated
>> to the user, in the form of an error code, per the Redfish specification.
>>
>> please refer Table 17 — Primitive data types in
>> https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.18.0.pdf
>> null value, which the service uses when it is unable to determine the
>> property's value due to an error or
>> other temporary condition, or if the schema has requirements for using
>> null for other special conditions.
>>
>>
>> > One thing I don't understand is what code is actually failing in this
>> case?  Can you point to a place where an error is returned that you'd like
>> >handled differently?
>> > Speaking in concrete terms would help to make the generalized change
>> you're attempting here
>>
>> If there is an error in only one property, we want to represent that a
>> specific property is failing in the URI instead of failing the entire URI.
>>
>
> Please point to a failing branch in code you'd like handled differently.
> Again, you've answered in the abstract which isn't helping me understand
> what you're trying to accomplish.
>
>
>>
>>
>> >Please look at error_messages.hpp and error_messages.cpp
>>
>> We cannot have error messages when the HTTP return code is 200.
>>
>
> If there is an error, the return code should not be 200.  200 OK is
> reserved for when there was no error.
>
>
>>
>>
>> > For properties that have a "null" or "invalid" value, this is already
>> implemented, generally with a simple branch.
>> > Example from sensors:
>> >
>> https://github.com/openbmc/bmcweb/blob/99bf026224d6ea60be27bd36793ddfaea5537d2a/redfish-core/lib/sensors.hpp#L934
>>
>> This addresses only the double datatype.
>>
>
> This is an example for a double data type.  The other types can be done in
> similar manners, if the dbus interface allows it.
>
>
>> We are planning to implement it for all data types. Please refer to the
>> following table.
>>
>> >If the dbus data type is incorrect this already sets internal error
>> (500).
>> >Please look at sdbusplus::unpackPropertiesNoThrow usage in bmcweb.
>> >Given that the interface failed, this is the most specific error we can
>> give the user.  We do log the source of the error though.
>>
>> We are considering the case where only one property is failing, not the
>> entire interface. We don't want to fail the entire URI
>> and we had mentioned the same in the very first email.
>>
>
> Sorry, but this sounds like you plan on not propagating errors to the
> user.  I can't think of a case where that would be the correct thing to
> do.  If the internal service failed, bmcweb should report that to the user
> through a return code and error property.
>
>
>>
>> >> 1 . Using exceptions in our backend D-Bus service by throwing
>> exceptions in the D-Bus property get handlers.
>> >>It works fine for the Get property method call. However, when using the
>> Get All method call,
>> >>if one property fails, an error is returned without checking the other
>> properties.
>> >>Is there a way to implement exceptions in GetAll so that even if one
>> property fails, we can still fetch the remaining properties.
>>
>>
>> Error represenation Table for different data types
>> | D-Bus Data Type | Value of the D-Bus Property in case of Error |
>> Property Value as seen on Redfish |
>>
>> |-----------------|---------------------------------------------|----------------------------------|
>> | INT16           | INT_MAX                                     | null
>>                           |
>> | UINT16          | UINT_MAX                                    | null
>>                           |
>> | INT32           | LONG_MAX                                    | null
>>                           |
>> | UINT32          | ULONG_MAX                                   | null
>>                           |
>> | INT64           | LLONG_MAX                                   | null
>>                           |
>> | UINT64          | ULLONG_MAX                                  | null
>>                           |
>> | DOUBLE          | std::numeric_limits<T>::quiet_NaN()        | null
>>                         |
>> | STRING          | BE_ERROR-XXX                                | null
>>                           |
>> | BOOL            | uint8_t {0–false, 1–true, others- error}    | null
>>                           |
>>
>> On Thu, Aug 31, 2023 at 10:31 PM Ed Tanous <edtanous at google.com> wrote:
>>
>>>
>>>
>>> On Thu, Aug 31, 2023 at 4:47 AM chandu chanti <chandutmba at gmail.com>
>>> wrote:
>>>
>>>> Hi team,
>>>>
>>>> We have drafted a proposal for  "Using default values in D-Bus
>>>> properties to indicate errors"  as communicated in previous mail.
>>>>
>>>> Issue with Existing Implementation:
>>>> Error handling is not implemented in the existing bmcweb source.
>>>>
>>>
>>> This statement is verifiably incorrect.  Many discussions have been had
>>> about how we handle Dbus, json, and Redfish errors, and the handling
>>> policies are written down in DBUS_USAGE.md.   Please familiarize yourself
>>> with those previous discussions, and if you believe the conclusion was
>>> incorrect, quote those in your response.
>>>
>>>
>>>> The values fetched from D-Bus are used directly in the Redfish
>>>> response, except for some properties.
>>>>
>>>
>>> While this is true for properties that are intended to be sent directly
>>> to the user, like Asset information, it's not true of all properties.
>>>
>>>
>>>> There was no way to represent error for a given data type on Redfish.
>>>>
>>>
>>> Please look at error_messages.hpp and error_messages.cpp
>>>
>>>
>>>>
>>>>
>>>> New Implementation Proposal:
>>>> We plan to handle errors from backend D-Bus services in bmcweb using a
>>>> custom data type class. We will implement a helper class to create a new
>>>> data type called mydatatype_t. An object of this class will be created with
>>>> the D-Bus property value as input. This object will be assigned to the JSON
>>>> response using the assignment operator. This assignment operation will
>>>> invoke the to_json function with the mydatatype_t object, where D-Bus
>>>> values will be validated based on their data types.
>>>>
>>>
>>>> Data Types and Default Values:
>>>> The following values will be set on the D-Bus property in case of an
>>>> error, based on the data type.
>>>>
>>>
>>> If the dbus data type is incorrect this already sets internal error
>>> (500).  Please look at sdbusplus::unpackPropertiesNoThrow usage in bmcweb.
>>> Given that the interface failed, this is the most specific error we can
>>> give the user.  We do log the source of the error though.
>>>
>>>
>>>> In Redfish, we will validate these properties. If the D-Bus property
>>>> value matches the error value, we will update the property value as null in
>>>> the Redfish JSON response; otherwise, the corresponding value on D-Bus will
>>>> be used.
>>>>
>>>
>>> For properties that have a "null" or "invalid" value, this is already
>>> implemented, generally with a simple branch.
>>> Example from sensors:
>>>
>>> https://github.com/openbmc/bmcweb/blob/99bf026224d6ea60be27bd36793ddfaea5537d2a/redfish-core/lib/sensors.hpp#L934
>>>
>>>
>>>
>>>
>>>>
>>>> *D-Bus*
>>>>
>>>> *data type*
>>>>
>>>> *Value of the D-Bus Property incase of Error *
>>>>
>>>> *Property value as seen on Redfish *
>>>>
>>>> INT16
>>>>
>>>> INT_MAX
>>>>
>>>> null
>>>>
>>>> UINT16
>>>>
>>>> UINT_MAX
>>>>
>>>> null
>>>>
>>>> INT32
>>>>
>>>> LONG_MAX
>>>>
>>>> null
>>>>
>>>> UINT32
>>>>
>>>> ULONG_MAX
>>>>
>>>> null
>>>>
>>>> INT64
>>>>
>>>> LLONG_MAX
>>>>
>>>> null
>>>>
>>>> UINT64
>>>>
>>>> ULLONG_MAX
>>>>
>>>> null
>>>>
>>>> DOUBLE
>>>>
>>>> std::numeric_limits
>>>> <http://en.cppreference.com/w/cpp/types/numeric_limits><T>::quiet_NaN()
>>>>
>>>> null
>>>>
>>>> STRING
>>>>
>>>> BE_ERROR-XXX
>>>>
>>>> null
>>>>
>>>> BOOL
>>>>
>>>> uint8_t
>>>> {0–false , 1–true,
>>>>   others- error}
>>>>
>>>> null
>>>>
>>>>
>>> This table does not match with the Redfish spec on error handling
>>> behavior.  Please familiarize yourself with DSP0266
>>> <https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.18.0.pdf>,
>>> and any updates you make, please reference that specification.
>>>
>>> Also, I have no idea what "BE_ERROR-XXX" is supposed to represent.
>>>
>>>
>>>> In case of error conditions Redfish will report "null" for a property
>>>> irrespective  of the data type.
>>>>
>>>> For error validation, we will implement a custom class, which will
>>>> facilitate error validation for D-Bus values. In instances of errors, a
>>>> null value will be allocated to the JSON response. Example for the uint16_t
>>>> data type.
>>>>
>>>
>>> As written, this seems considerably worse than what we have today, and
>>> goes backwards on things like:
>>> https://github.com/openbmc/bmcweb/issues/176
>>>
>>>
>>>>
>>>> [image: image.png]
>>>>
>>>> [image: image.png]
>>>>
>>>> For more implementation details please refer the following gerrit link,
>>>> we implemented it for redfish URI /redfish/v1/Systems/<str>/Memory/<str>/.
>>>> https://gerrit.openbmc.org/c/openbmc/bmcweb/+/66368
>>>>
>>>> Your feedback and insights are greatly appreciated.
>>>>
>>>> Thanks,
>>>> Chandrasekhar T.
>>>>
>>>
>>>
>>> Overall, I've read the code, and your email, and I really have no idea
>>> what's being proposed that doesn't already exist in a much simpler form,
>>> using the existing error handling mechanisms.  While your point that one
>>> failing property can fail the whole call is correct, that's the intent.  If
>>> an internal service fails, that needs propagated to the user, in the form
>>> of an error code, per the Redfish specification.
>>>
>>> One thing I don't understand is what code is actually failing in this
>>> case?  Can you point to a place where an error is returned that you'd like
>>> handled differently?  Speaking in concrete terms would help to make the
>>> generalized change you're attempting here.
>>>
>>>
>>>>
>>>> On Fri, Aug 4, 2023 at 12:37 PM Shakeeb Pasha <shakeebbk at gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Joseph for the response.
>>>>> >> I didn't see anyone else answer this.  Is your question about how to
>>>>> >> architect or design your D-Bus interfaces?  Is the reference the
>>>>> BMCWeb
>>>>> >> merely to provide context as a consumer of these D-Bus services?  I
>>>>> >> assume so.
>>>>> Question is two pronged, one we wanted to know the best way to
>>>>> propagate error from dbus to bmcweb.
>>>>> Options we tried:
>>>>> 1. Have handlers of each property throw an exception if there was an
>>>>> error. But this did not work well with current bmcweb implementation(get
>>>>> all method call), as it would lead to the entire resource not getting
>>>>> populated even if we have one property failing.
>>>>> 2. Implement one additional associated rc property - and do the error
>>>>> handling bmcweb by looking at that, but this doubled up the number of
>>>>> properties on dbus.
>>>>> 3. Use "default values" per property data type, but this would fail
>>>>> for extreme cases like if the property is u32, and if the default value
>>>>> used was INT_MAX, then it breaks when the property could take value
>>>>> 0xFFFFFFFF as valid value.
>>>>> So the query here was to check if any preference from above options or
>>>>> hear any new approaches as well.
>>>>>
>>>>> The second query was about how to represent this error(per property
>>>>> error) on redfish, our proposal for now is to return "NULL" for any
>>>>> property that might be failing at the backend. Any thoughts on this
>>>>> approach?
>>>>>
>>>>> >> I don't have any special insights.  Are you looking to follow a
>>>>> design
>>>>> >> pattern?  Are you looking for direction from the BMCWeb maintainers?
>>>>>
>>>>> Yes, we are working on a design pattern proposal and will publish it
>>>>> here for review.
>>>>> It would be great if we could get some directions/inputs from the
>>>>> maintainers as well.
>>>>>
>>>>> Thanks,
>>>>> Shakeeb
>>>>>
>>>>> On Thu, Aug 3, 2023 at 10:33 PM Joseph Reynolds <jrey at linux.ibm.com>
>>>>> wrote:
>>>>>
>>>>>> On 7/20/23 2:04 AM, chandu chanti wrote:
>>>>>> > Hi Team, We are planning to handle errors from backend services in
>>>>>> bmc
>>>>>> > web. We are considering the following approaches to implement it,
>>>>>> but
>>>>>> > we are facing some issues. please provide your inputs. 1 . Using
>>>>>> > exceptions in our backend D-Bus service
>>>>>> > ZjQcmQRYFpfptBannerStart
>>>>>> > This Message Is From an Untrusted Sender
>>>>>> > You have not previously corresponded with this sender.
>>>>>> > Report Suspicious
>>>>>> > <
>>>>>> https://us-phishalarm-ewt.proofpoint.com/EWT/v1/PjiDSg!12-vrJJyaRL1Nus7N26ProiLa90y_FB6oawxkmvrT4YcN373bBkdTP-XPRTFLRBygswzt1TwX0wxp5Tel83pR4ZZR-wpxEYJpcKudcTfq2FH6iPLN9Ep4cV_tX4$>
>>>>>>
>>>>>> >
>>>>>> > ZjQcmQRYFpfptBannerEnd
>>>>>> >
>>>>>> > Hi Team,
>>>>>> >
>>>>>> > We are planning to handle errors from backend services in bmc web.
>>>>>> We
>>>>>> > are considering the following approaches to implement it, but we
>>>>>> are
>>>>>> > facing some issues. please provide your inputs.
>>>>>> >
>>>>>>
>>>>>> I didn't see anyone else answer this.  Is your question about how to
>>>>>> architect or design your D-Bus interfaces?  Is the reference the
>>>>>> BMCWeb
>>>>>> merely to provide context as a consumer of these D-Bus services?  I
>>>>>> assume so.
>>>>>>
>>>>>> I don't have any special insights.  Are you looking to follow a
>>>>>> design
>>>>>> pattern?  Are you looking for direction from the BMCWeb maintainers?
>>>>>>
>>>>>> Joseph
>>>>>>
>>>>>> > 1 . Using exceptions in our backend D-Bus service by throwing
>>>>>> > exceptions in the D-Bus property get handlers. It works fine for
>>>>>> the
>>>>>> > Get property method call. However, when using the Get All method
>>>>>> call,
>>>>>> > if one property fails, an error is returned without checking the
>>>>>> other
>>>>>> > properties. Is there a way to implement exceptions in GetAll so
>>>>>> that
>>>>>> > even if one property fails, we can still fetch the remaining
>>>>>> properties.
>>>>>> >
>>>>>> > 2. Using default values in D-Bus properties to indicate errors. Is
>>>>>> > there a reference implementation available for setting default
>>>>>> values
>>>>>> > for string and integer data types in bmc web, similar to the
>>>>>> > implementation of NaN (default value) for the double data type in
>>>>>> > cable.hpp.
>>>>>> >
>>>>>> > 3. Implement associated return code per property on dbus, but this
>>>>>> > would be very inefficient in terms of double the properties on
>>>>>> dbus,
>>>>>> > something we would like to avoid
>>>>>> >
>>>>>> > Thanks
>>>>>> > Chandrasekhar T.
>>>>>> >
>>>>>>
>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ozlabs.org/pipermail/openbmc/attachments/20230904/9a0b37e9/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 37880 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/openbmc/attachments/20230904/9a0b37e9/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 5358 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/openbmc/attachments/20230904/9a0b37e9/attachment-0003.png>


More information about the openbmc mailing list