[Skiboot] [PATCH 05/12] qemu: 1ms heartbeat time

Tue Oct 22 13:45:26 AEDT 2019

On Mon, Oct 21, 2019, at 3:13 AM, Cédric Le Goater wrote:
> On 21/10/2019 07:53, Oliver O'Halloran wrote:
> > On Fri, Oct 18, 2019 at 8:56 AM Deb McLemore <debmc at linux.ibm.com> wrote:
> >>
> >> From: Stewart Smith <stewart at linux.ibm.com>
> >>
> >> Signed-off-by: Stewart Smith <stewart at linux.ibm.com>
> >> ---
> >>  platforms/qemu/qemu.c | 9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/platforms/qemu/qemu.c b/platforms/qemu/qemu.c
> >> index 757c086..71be716 100644
> >> --- a/platforms/qemu/qemu.c
> >> +++ b/platforms/qemu/qemu.c
> >> @@ -42,6 +42,14 @@ static bool qemu_probe_powernv9(void)
> >>         return qemu_probe_common("qemu,powernv9");
> >>  }
> >>
> >> +static int qemu_heartbeat_time(void)
> >> +{
> >> +       /*
> >> +        * Fast polling to make up for lack of SBE timers
> >> +        */
> >> +       return 1;
> > 
> > Cedric, is this still required or does qemu model the SBE timer interrupt now?
> 
> QEMU does not model the SBE and I wasn't aware of that patch. 
> 
> Do you have more info on the root problem ?

I do!

So, OPAL_FLASH_READ/WRITE is an OPAL async API call (as in, does the token, OPAL_ASYNC_COMPLETION etc) but the *implementation* of these calls for the flash backend inside skiboot isn't actually asynchronous.

So the call is like this:
1. opal_flash_read(1 megabyte) returns OPAL_ASYNC_COMPLETION
2. poll for completion every 100ms <- one of these takes $number_of_seconds_to_read_1mb_off_flash
3. get completion.

So, step 2 here introduces horrible jitter.

So the fancy *real* async way of doing it would be to chunk up the 1MB read into (say) 64k chunks and then do one of those per poller run. But, in qemu (or mambo, which IIRC is where I was testing things when I wrote this patch), you don't have a way to schedule a timer interrupt to occur "immediately", so you're stuck waiting for the regular poller run occurring either via the heartbeat period *OR* the loop in the kernel flash code that sleeps and waits for the async opal call to complete (which runs the pollers every 100ms or so IIRC).
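
To make the "one chunk per poller run" idea concrete, here is a rough sketch in plain C. It is not skiboot code: struct chunked_read, chunked_read_start() and chunked_read_poll() are names made up for illustration, and a memcpy() from a buffer stands in for the actual flash access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE (64 * 1024)	/* the "(say) 64k" chunk size */

/* Hypothetical state for one in-flight read; not a skiboot structure. */
struct chunked_read {
	const uint8_t *src;	/* stands in for the flash device */
	uint8_t *dst;		/* caller's destination buffer */
	size_t remaining;	/* bytes still to copy */
};

/* Record the request; no data is moved until the poller runs. */
static void chunked_read_start(struct chunked_read *r, const uint8_t *src,
			       uint8_t *dst, size_t len)
{
	r->src = src;
	r->dst = dst;
	r->remaining = len;
}

/*
 * Meant to be called once per poller run. Copies at most one chunk and
 * returns true once the whole request has completed.
 */
static bool chunked_read_poll(struct chunked_read *r)
{
	size_t n = r->remaining < CHUNK_SIZE ? r->remaining : CHUNK_SIZE;

	memcpy(r->dst, r->src, n);
	r->src += n;
	r->dst += n;
	r->remaining -= n;

	return r->remaining == 0;
}
```

Each poller run then blocks for one chunk's worth of flash access instead of the whole 1MB, but the read only makes progress as often as the poller gets run.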

So, effectively, for mambo/qemu, you only run the poller every heartbeat, which means that your 1MB flash read now takes (1MB / chunk size) * heartbeat time, i.e. a long time.
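
For a feel of the numbers, a small back-of-the-envelope helper in C. The function names are made up for illustration, and the result only counts time spent waiting between poller runs, not the flash access itself.

```c
#include <assert.h>
#include <stdint.h>

/* Poller runs needed when one chunk is read per run (rounds up). */
static uint64_t poller_runs(uint64_t total_bytes, uint64_t chunk_bytes)
{
	assert(chunk_bytes != 0);
	return (total_bytes + chunk_bytes - 1) / chunk_bytes;
}

/*
 * Waiting time in milliseconds if the poller only runs once per
 * poll_interval_ms. Ignores how long each chunk takes to read.
 */
static uint64_t read_wait_ms(uint64_t total_bytes, uint64_t chunk_bytes,
			     uint64_t poll_interval_ms)
{
	return poller_runs(total_bytes, chunk_bytes) * poll_interval_ms;
}
```

With a 1MB read and 64k chunks that is 16 poller runs: about 16ms of waiting at a 1ms heartbeat, versus about 1.6 seconds if the poller only runs every 100ms.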

The "simple" "solution" here is to make heartbeat be really really often, and then flash ops when you don't have the SBE timer facility available are slow, but not crazy crazy slow.