bmcweb logging

Sat Oct 31 08:06:08 AEDT 2020

On 10/30/20 10:17 AM, Ed Tanous wrote:
> On Fri, Oct 30, 2020 at 5:24 AM Ratan Gupta <ratagupt at linux.vnet.ibm.com> wrote:
>> On 10/29/20 9:52 PM, Ed Tanous wrote:
>>> On Thu, Oct 29, 2020 at 8:47 AM Ratan Gupta <ratagupt at linux.vnet.ibm.com> wrote:
>>>> Hi Ed,
>>>>
>>>> I was having issues in getting the logs from bmcweb repo, If I don't
>>>> define the compiler flag(BMCWEB_ENABLE_LOGGING) then it doesn't get me
>>>> any logs in journal(not even error log) and if I define the compiler
>>>> flag then it gets too much debug info.
>>> What's your limit for "too much"?
>> around 30+ traces for each Redfish request.If we enable the bmcweb
>> logging, with that  journal is full of bmcweb traces.
> You've answered what bmcweb does today;  I asked you what the limit
> should be.  5 per request?  10?.  Personally, 30 prints for a single
> request is exactly what I would expect.  There are roughly 30 things
> that happen for every request, any of which can go wrong, and when
> stalls happen, it's good to know what the last thing was.  If you're
> trying to debug new handler code that can assume the core is handling
> things correctly and has no bugs, I could see that as being about 10
> more than you'd need, but as someone that has to debug core
> regressions quite a bit, having that information is quite helpful.
>
>>>> Seems it happens because we have hardcoded the logelevel  and there is
>>>> no way through which we can change except making the change manually in
>>>> the following place.
>>>>
>>>> https://github.com/openbmc/bmcweb/blob/master/http/logging.hpp#L93
>>>>
>>>> Suggestion is we can define another compile time flag for log level and
>>>> use that log level at the above line for type of logging , by default
>>>> logging should be enabled with error log level.
>>> Thusfar, we don't really have a definition of what kind of detail
>>> should be included at each log level, nor are we consistent about
>>> setting the log levels.  If it's important to you, and you want to put
>>> together some patches, we certainly could define the log verbosity,
>>> but we'd also have to come up with better definitions for what level
>>> each log needs to take, and be more consistent about it.
>> We have different log levels in bmcweb like error, critical, info,
>> debug, but what is missing is we don't have some compile time variable
>> to control it( currently it is either full or none.)
> Error critical and info are used roughly interchangeably in code, and
> we don't have a good definition of when to use each one.  It should be
> noted, it's a 1 line fix to set a lower verbosity;  Have you tried
> setting a lower one?  Did it give you the result you were looking for
> on old code?
>
>>> IMO, that level of log granularity was never that valuable to debug,
>>> my thought being that if I'm enabling logging I don't care about
>>> performance, so the more logs the better.  When reproducing a failure,
>>> there doesn't tend to be a lot of logging present, and it's easy
>>> enough to find what you want, but maybe you had a different
>>> experience?
>> Yes, I had a different experience in the recent bug where the failure
>> was intermittent and it was happening only after running the regression,
>> when I look into the log, not a single trace was there.Hence I thought
>> that atleast the logs which is being marked as error(BMCWEB_LOG_ERROR)
>> should have been written to the journal.
>>
>> Say In production if we find some issue and we ask for the journal then
>> journal will not be useful as it will not be having any traces to debug.
> Agreed, but that's not what the bmcweb logging framework is designed
> for, nor is it really in any webservers that I'm aware of.  What it
> sounds like you're looking for is an access log and an error log,
> similar to nginx, which is capable of running all the time and logging
> a minimal subset of a request.  bmcweb doesn't have that today.

+1 for the notion to separate the error logs and audit logs. <turns to a 
slightly different direction>

I believe OpenBMC should have an audit logging service per NIST SP 
800-193 "Platform Firmware Resiliency Guidelines" > "Event Logging".  
These are needed for platform and security administrators.  This log 
would record specific high-level events such as:
- Authentication attempts (requester, account, outcome, etc.).
- Attempted access (such as REST API operations that require privilege 
higher than ReadOnly and operations other than GET or HEAD).
- BMC rebooted.
- Host power state changes.
- Etc. (many more).

Many of these events are generated by BMCWeb, but others come from 
various daemons.  So IMHO this is an OpenBMC-wide opportunity.

Other folks need even more different logs.  For example, see 
https://github.com/ibm-openbmc/dev/issues/2008

- Joseph

References:
NIST.SP.800-193: 
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-193.pdf

>
>>> Long term, my intent was to just get rid of the various verbosities,
>>> and make logging ON or OFF in the code, and downgrade to a single
>>> macro, but if you see a need, we could certainly go the other
>>> direction.
>>>
>>>> If there is a need user can play with new compiler flag for more verbose
>>>> logging.
>>> I don't personally see the need, but if it would help you, I don't
>>> think it adds much complexity to do it, and I'd be happy to review
>>> your patches.
>> Sure, I would be pushing the patch for the same.
> Cool.
>
>>>> Am I missing something in usage of logging infra?
>>>>
>>>> Ratan
>>>>
>>>>
>>> Just to be clear, you're using logging in a debug context, correct?
>>> If we're trying to use this to define an access log, that's a very
>>> different problem space, and the existing logging infrastructure would
>>> be poorly suited for it.
> It'd be helpful if you had answered this, although I'm starting to get
> a sense of what you're looking for.