[External] Re: The incomplete result of mapper GetSubTree/Paths
Lei Yu
yulei.sh at bytedance.com
Tue May 24 12:58:32 AEST 2022
On Tue, May 24, 2022 at 12:19 AM Ed Tanous <edtanous at google.com> wrote:
>
>
>
> On Thu, May 19, 2022 at 8:14 PM Lei Yu <yulei.sh at bytedance.com> wrote:
>>
>> This email is to describe an issue in mapper that the
>> GetSubTree/GetSubTreePaths could return incomplete results when it's
>> doing introspect.
>>
>> Steps to reproduce the issue:
>> 1. Configure phosphor-logging to get max 1000 entries. (with
>> -Derror_info_cap=1000 meson option)
>> 2. Create 1000 logging entries.
>> 3. Call GetSubTreePaths and make sure it gets the correct 1000 entries:
>> # busctl call "xyz.openbmc_project.ObjectMapper"
>> "/xyz/openbmc_project/object_mapper"
>> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
>> /xyz/openbmc_project/logging/entry 0 1
>> xyz.openbmc_project.Logging.Entry | awk '{print $2;}'
>> 1000
>> 4. Restart logging service
>> # systemctl restart xyz.openbmc_project.Logging.service
>> 5. After the service is restarted, call GetSubTreePaths for multiple
>> times in the short time (e.g. within 10 seconds)
>> # busctl call "xyz.openbmc_project.ObjectMapper"
>> "/xyz/openbmc_project/object_mapper"
>> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
>> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
>> '{print $2;}'
>> 47
>> # busctl call "xyz.openbmc_project.ObjectMapper"
>> "/xyz/openbmc_project/object_mapper"
>> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
>> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
>> '{print $2;}'
>> 375
>> # busctl call "xyz.openbmc_project.ObjectMapper"
>> "/xyz/openbmc_project/object_mapper"
>> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
>> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
>> '{print $2;}'
>> 851
>> # busctl call "xyz.openbmc_project.ObjectMapper"
>> "/xyz/openbmc_project/object_mapper"
>> "xyz.openbmc_project.ObjectMapper" GetSubTreePaths sias
>> /xyz/openbmc_project/loggiz.openbmc_project.Logging.Entry | awk
>> '{print $2;}'
>> 1000
>>
>> We can see that the result of GetSubTreePaths is increasing until it gets 1000.
>> This actually happens when mapper is doing introspect of the logging
>> service, and getting more and more objects.
>>
>> The above "incomplete" behavior will impact the logic that depends on
>> the result of GetSubTreePaths.
>> E.g. in ipmid, the "cached SEL" feature depends on the reliable result
>> of GetSubTreePath, to get the number of current logging entries.
>
>
> I'm not really following why this is a problem, and sounds like a bug in the cached SEL feature. Dbus objects can get added or removed at any time, including logging objects, if the cached sel feature isn't handling all the added/removed signals as it should, that really should be fixed. Can you go into more detail about what this actually causes? Is it that logging objects are getting added out of order?
>
As Patrick also mentioned, the issue is that
1. When a service creates 1000 objects _before_ advertising the
service to the DBus.
2. When it advertises itself, the other services are expected to see
there are 1000 objects, _without_ `interfacesAdded` signal.
3. The reality is that mapper gives a random number between 0~1000
during the introspection.
4. The side effect is that if a service is interested and ask mapper
for the object, it gets incomplete result, and there are no
`interfacesAdded` signal at all, and thus it has the incomplete data
of the DBus objets.
>>
>> If
>> it's not correct, ipmid will not know the "missed" entries.
>
>
> What is a "missed" entry in this context?
See the above example, where the `GetSubTreePaths` API gives result of
47, 375, 851, while the expectation is 1000.
>>
>>
>> The question is, should we make sure mapper returns the "stable"
>> result in the above case?
>
>
> I don't think so, at least, when it was built, it wasn't designed to return atomically created results, but given that ANY object on dbus can be added or removed at any time, it's not clear why this is a problem in the mapper itself.
When an object is added or removed, either it emits
interfacesAdded/Removed signal, or it emits nameOwnerChanged signal.
Both will mapper to do the introspect and get the latest objects. And
when it is doing introspection, it is expected *NOT* to return
"incomplete" result. Otherwise services relying on mapper will have
chance to get incomplete view of the DBus objects.
>
>>
>> When it's doing introspect of a service (e.g. nameOwnerChanged), it
>> could throw if the service is not fully introspected, and only return
>> the "correct" result after the service is fully introspected.
>>
>>
>> If mapper could not guarantee the stable result, the service calling
>> mapper will have to add more complex logic to make sure it gets the
>> "full and correct" result.
>
>
> I left comments on your code review as well, but please don't do this. This is going to force retry code into EVERY service that relies on the mapper, at EVERY mapper call site. If the current behavior is truly a problem, I would much rather the mapper support multi-versioned atomically create copies of the dbus-map, and have the mapper expose the "reliable" interface, rather than spreading that throughout the system. For context, bmcweb alone has dozens of call sites to the mapper that would need to be fixed if it were made to be possible for it to fail in this way.
The patch is POC to show the problem and a possible fix. Throwing
`ResourceNotFound` may not be a good fix.
But it shall tell the caller about the status of the result, possibly:
* Throwing an "InIntrospect" exception to indicate it's doing
introspecting on the object path. This is similar to what my patch
does, it's just using a more meaningful exception.
* Add a flag in the result to indicate that the result is not
complete. Then the caller who cares could check the flag, and retry
until it gets a complete result.
More information about the openbmc
mailing list