Adding support for custom SEL records

Tue Oct 25 09:44:37 AEDT 2022

On 21-Oct-2022 03:14 PM, Patrick Williams wrote:
>On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:
>
>> Intel had a requirement to support storing at least 4000 log entries.
>> At the time, we were able to get about 400 entries on D-Bus before D-Bus
>> performance became unusable.
>>
>> That was before dbus-broker, so it could perhaps be better today.
>
>I was surprised that there would be much performance impact to dbus as a
>whole because there should not be any impact to the bus because one
>process decided to host a bunch of objects.  I can understand _that_
>process becoming slower if there are algorithmic problems with it.

I suspect it was the combination of the switch to dbus-broker and the
rewrite of the mapper that raised the number of SEL entries
phosphor-logging is capable of handling from what it was before.

>I did an experiment on an AST2600 system where I modified phosphor-logging
>to support 20k log entries and then created 10k of them.
>
>```
>$ cat meta-facebook/recipes-phosphor/logging/phosphor-logging_%.bbappend
>FILESEXTRAPATHS:prepend := "${THISDIR}/${PN}:"
>
>EXTRA_OEMESON += "-Derror_cap=20000"
>```
>
>What I've found can be summarized as follows:
>
>   - Overall there is no impact to the dbus by creating a large number
>     of log entries.  Interactions with other daemons happen just fine
>     with no performance impact.

Right, I don't expect them to have a meaningful impact once they are
created, other than maybe some sort of memory footprint. That footprint
is not just phosphor-logging's, though; it is magnified by the mapper.

>   - Creating 10k log entries does take a while.  Most of this time is
>     observed (via top) in jffs2 but there is also some peaky objmgr
>     activity.  There might be some improvements to be had there, but I
>     don't think anyone is intending to create 10k events in the span of
>     a minute.

This is really my biggest concern at this point. OpenBMC is already
the slowest-to-boot BMC firmware that I have worked on in the past 10
years, and that is in the face of faster BMC processors and faster RAM.
Delaying the 'fully booted' state of the BMC for this will cause
validation bugs, because the BMC is changing behavior even though it
should be at a stable state.

>   - Dumping all the events from phosphor-logging is slow when there are
>     10k of them.  It took 8-11s.  I didn't have `strace` installed, but
>     it seemed like much of this was coming from `busctl` processing the
>     result and not from the bus transfer itself, but more investigation
>     would be required.

I think more investigation needs to be done here. We should be limited 
by the network, not by accessing the items.

>   - Deleting all 10k of the events timed out at a dbus level, but still
>     succeeded.  Almost all of the time was spent in jffs2.

This would be a lot faster if all the items were in a single file, which 
is a change that could be made independently of whether or not the 
individual log entries are hosted as dbus objects.
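
As a rough sketch of what I mean by a single file (Python only for
brevity; the class name, file name and JSON-lines record layout are all
made up for illustration and are not phosphor-logging's actual on-disk
format), appending goes to one file and deleting everything is a single
unlink no matter how many entries exist:

```python
import json
import os
import tempfile


class SingleFileLog:
    """Hypothetical append-only store: one JSON record per line."""

    def __init__(self, path):
        self.path = path

    def append(self, entry):
        # Append to the one backing file instead of creating a file per entry.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def read_all(self):
        # A missing file simply means the log is empty.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def delete_all(self):
        # Clearing the log removes one file, independent of entry count.
        if os.path.exists(self.path):
            os.remove(self.path)


def demo(count=1000):
    """Write `count` entries, then clear them; report what is on disk."""
    with tempfile.TemporaryDirectory() as d:
        log = SingleFileLog(os.path.join(d, "sel.log"))
        for i in range(count):
            log.append({"id": i, "message": "example event"})
        stored = len(log.read_all())
        files_before = sorted(os.listdir(d))
        log.delete_all()
        files_after = sorted(os.listdir(d))
        return stored, files_before, files_after
```

Whether this actually helps on jffs2 would still need to be measured on
real hardware; the point is only that the storage layout can change
without touching how the entries are exposed on dbus.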

--Vernon