[Skiboot] [PATCH] nvram: Fix 'missing' nvram on FSP systems.

Thu Nov 30 15:50:47 AEDT 2017

Cyril Bur <cyril.bur at au1.ibm.com> writes:
> On Tue, 2017-11-14 at 11:39 +1100, Oliver wrote:
>> On Tue, Nov 7, 2017 at 5:07 PM, Cyril Bur <cyril.bur at au1.ibm.com> wrote:
>> > commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
>> > from NVRAM rather early. This works fine on BMC based systems as
>> > nvram_init() is actually synchronous. This is not true for FSP systems
>> > and it turns out that the query for the console log level simply
>> > queries blank nvram.
>> > 
>> > The simple fix is to wait for the NVRAM read to complete before
>> > performing any query. Unfortunately it turns out that the fsp-nvram
>> > code does not inform the generic NVRAM layer when the read is complete,
>> > rather, it must be prompted to do so.
>> > 
>> > This patch addresses both these problems. This patch adds a check before
>> > the first read of the NVRAM (for the console log level) that the read
>> > has completed. The fsp-nvram code has been updated to inform the generic
>> > layer as soon as the read completes.
>> > 
>> > The old prompt to the fsp-nvram code has been removed but a check to
>> > ensure that the NVRAM has been loaded remains. It is conservative but
>> > if the NVRAM is not done loading before the host is booted it will not
>> > have an nvram device-tree node which means it won't be able to access
>> > the NVRAM at all, ever, even after the NVRAM has loaded.
>> > 
>> > Signed-off-by: Cyril Bur <cyril.bur at au1.ibm.com>
>> > ---
>> >  core/init.c                  | 11 +++++++++--
>> >  core/nvram-format.c          |  5 +++++
>> >  core/nvram.c                 | 35 +++++++++++++++++++++++++++++++++++
>> >  core/test/run-nvram-format.c |  5 +++++
>> >  hw/fsp/fsp-nvram.c           | 28 ++++------------------------
>> >  include/fsp.h                |  1 -
>> >  include/nvram.h              |  2 ++
>> >  7 files changed, 60 insertions(+), 27 deletions(-)
>> > 
>> > diff --git a/core/init.c b/core/init.c
>> > index 8951e17b..17de9851 100644
>> > --- a/core/init.c
>> > +++ b/core/init.c
>> > @@ -497,7 +497,7 @@ void __noreturn load_and_boot_kernel(bool is_reboot)
>> >                 /* We wait for the nvram read to complete here so we can
>> >                  * grab stuff from there such as the kernel arguments
>> >                  */
>> > -               fsp_nvram_wait_open();
>> > +               nvram_wait_for_load();
>> > 
>> >                 /* Wait for FW VPD data read to complete */
>> >                 fsp_code_update_wait_vpd(true);
>> > @@ -987,7 +987,14 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
>> >         /* preload the IMC catalog dtb */
>> >         imc_catalog_preload();
>> > 
>> > -       /* Set the console level */
>> > +       /*
>> > +        * Set the console level
>> > +        * We need NVRAM to have been setup by now otherwise we'll
>> > +        * just read a zeroed buffer.
>> > +        * Some FSP systems can actually be quite slow to return the
>> > +        * NVRAM.
>> > +        */
>> > +       nvram_wait_for_load();
>> >         console_log_level();
>> > 
>> >         /* Secure/Trusted Boot init. We look for /ibm,secureboot in DT */
>> > diff --git a/core/nvram-format.c b/core/nvram-format.c
>> > index 923098af..655cb8ee 100644
>> > --- a/core/nvram-format.c
>> > +++ b/core/nvram-format.c
>> > @@ -216,6 +216,11 @@ const char *nvram_query(const char *key)
>> >         const char *part_end, *start;
>> >         int key_len = strlen(key);
>> > 
>> > +       if (!nvram_has_loaded()) {
>> > +               prlog(PR_WARNING, "NVRAM: Query before is done loading\n");
>> > +               return NULL;
>> > +       }
>> 
>> I think this should just call nvram_wait_for_load() rather than
>> failing here. The problem
>> with this behaviour is that can end up with nvram_query() failing
>> intermittently on FSP
>> systems since loading the nvram is asynchronous there.
>> 
>> Given that we use nvram_query() for switching debug stuff on and off
>> I'd rather it was
>> as reliable as possible even if it slows booting slightly.
>> 
>
> I think this goes back to - when do we call nvram_wait_for_load()...
> and great... looks like my entire summary email never made it to the
> list... nor can I find it anywhere...
>
> It was a rather long email I don't feel like retyping...
>
> Basically, either we do what Oliver suggests but this hides the (now)
> blocking nature of nvram_query() which I was hesitant to do. I feel
> like in skiboot it should be fairly obvious that something might block.
>
> Another options is we leave what I have, which makes the blocking quite
> obvious but does require the function to be explicitly called before
> nvram_query() and more importantly before launching the bootkernel.
>
> Finally, we put nvram_wait_for_load() at the end of nvram_init(), which
> is a nicer place to have blocking since its obviously an init function.
> The problem being that we might be able to nvram_init() long enough
> before we actually need the nvram and we lose the benefit of the
> asynchronous load. Having said that, as Oliver points out - we use
> nvram_query() to switch debug stuff. We want to do that as early as
> possible, so if there is going to be a blocking nvram_query() right
> after nvram_init(), then it might as well be in nvram_init().
>
> All this only applies to FSP, BMC nvram_init() is actually synchronous
> at the moment (although it doesn't have to be).
>
>
> Stewart, do you have any strong feelings?

I've gone back and forth a bit.... I think the best way would be to
print the warning and then wait for it to be loaded anyway. THat way,
things still work if somebody has to put in early debug and not sort
through the how-we-load-nvram-mess, but we'll know that we need to fix
something before shipping if we do.

-- 
Stewart Smith
OPAL Architect, IBM.