[Skiboot] [PATCH v3 06/13] hiomap: Enable Async IPMI messaging

Oliver O'Halloran oohall at gmail.com
Tue Dec 17 12:53:23 AEDT 2019


On Fri, 2019-12-13 at 06:32 -0600, Deb McLemore wrote:
> Comments below:
> 
> On 11/24/19 9:12 PM, Oliver O'Halloran wrote:
> > On Thu, Nov 7, 2019 at 4:25 AM Deb McLemore <debmc at linux.ibm.com> wrote:
> > > *snip*
> From libflash/ipmi-hiomap.h
> 
> /*
>  * we basically check for a quick response
>  * otherwise we catch the updated window in the next cycle
>  */
> #define IPMI_HIOMAP_TICKS 5
> 
> Unless the window is immediately ready (so we have to stall
> doing some logic which is not optimized out), this allows a
> short period of time for a quick response; otherwise we always
> kick back out to linux, and that delay gives the window a
> chance to be moved.
> 
> > wait_time is always going to be zero since the two mftb() calls are
> > going to return pretty similar values and tb_to_msecs() divides the
> > difference by 512000 (tb_hz / 1000).
> > 
> > > +       timeout_counter = 0;
> > > +
> > > +
> > > +       while (wait_time < flash_reserve_ticks) {
> > > +               ++timeout_counter;
> > > +               if (timeout_counter % 4 == 0) {
> > > +                       now = mftb();
> > > +                       wait_time = tb_to_msecs(now - start_time);
> > > +               }
> > > +               if (flash->busy == false) {
> > > +                       break;
> > > +               }
> > > +       }

Please follow convention and put your responses below the thing being
responded to rather than above. Mixing responses above and below just
makes the thread a pain to follow.

> Again, due to optimization issues, we want to stall and actually
> check the value "x" number of times to allow a miss or two on
> the lock to happen; other logic was optimized away.
> 
> > This doesn't make any sense to me. What are you doing here and why?
> > 
> > > -       if (!try_lock(&flash_lock))
> > > +       while (!try_lock(&flash_lock)) {
> > > +               --lock_try_counter;
> > > +               if (lock_try_counter == 0) {
> > > +                       break;
> > > +               }
> > > +       }
> > If you're going to do this sort of thing then use a for loop so
> > the iteration count is set in the same place it's being used.
> > 
> > > +       if (lock_try_counter == 0) {
> > >                 return false;
> > > +       }
> > you should return from inside the loop instead of breaking out and
> > re-checking the break condition
> > 
> > > +
> > > +       /* we have the lock if we got here */
> > > 
> > >         if (!flash->busy) {
> > >                 flash->busy = true;
> > >                 rc = true;
> > > +       } else {
> > > +               /* probably beat a flash_release and grabbed the lock */
> > > +               unlock(&flash_lock);
> > > +               while (!try_lock(&flash_lock)) {
> > > +                       --lock_try_counter;
> > > +                       if (lock_try_counter == 0) {
> > > +                               break;
> > > +                       }
> > > +               }
> > > +               if (lock_try_counter == 0) {
> > > +                       return false;
> > > +               }
> > > +               /* we have the lock if we are here */
> > > +               if (!flash->busy) {
> > > +                       flash->busy = true;
> > > +                       rc = true;
> > > +               }
> > This doesn't make much sense to me either. Why is replicating all the
> > previous logic necessary?
> > 
> > >         }
> > >         unlock(&flash_lock);
> > I'm not sure that we even need to use try_lock() with patch 03/13
> > applied. With that patch the flash's lock should only be held briefly
> > so we should be able to use the normal (blocking) lock() since we
> > don't need to deal with waiting for a 64MB flash read to complete.

^^^^^^^^^^^^^^^^^^

This is what I really wanted a response to.

All the games above with try_lock() and checking the busy flag outside
the lock only make sense to me if we need to avoid being stuck
waiting for a lock in skiboot. The changes in 3/13 result in the lock
only being held briefly since the only thing that needs to be done
under the lock is checking the busy flag. Why not do this instead:

diff --git a/core/flash.c b/core/flash.c
index d97d88edbdd5..8fcc9d46ca59 100644
--- a/core/flash.c
+++ b/core/flash.c
@@ -64,8 +64,7 @@ static bool flash_reserve(struct flash *flash)
 {
 	bool rc = false;
 
-	if (!try_lock(&flash_lock))
-		return false;
+	lock(&flash_lock);
 
 	if (!flash->busy) {
 		flash->busy = true;

That change probably should have been in Cyril's patch. Oh well, people
write bugs sometimes.
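
With that, the whole function collapses to something like the sketch
below (based on the hunks above, not a tested patch):

	static bool flash_reserve(struct flash *flash)
	{
		bool rc = false;

		lock(&flash_lock);

		/* the only work done under the lock is checking the flag */
		if (!flash->busy) {
			flash->busy = true;
			rc = true;
		}

		unlock(&flash_lock);

		return rc;
	}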


> > > @@ -279,12 +340,31 @@ static void flash_poll(struct timer *t __unused, void *data, uint64_t now __unus
> > >                 assert(0);
> > >         }
> > > 
> > > -       if (rc)
> > > -               rc = OPAL_HARDWARE;
> > > +       if (rc == 0) {
> > > +               /*
> > > +                * if we are good to proceed forward
> > > +                * otherwise we may have to try again
> > > +                */
> > > +               flash->async.pos += len;
> > > +               flash->async.buf += len;
> > > +               flash->async.len -= len;
> > > +               /* if we good clear */
> > > +               flash->async.retry_counter = 0;
> > > +               /*
> > > +                * clear the IN_PROGRESS flags
> > > +                * we only need IN_PROGRESS active on missed_cc
> > > +                */
> > > +               flash->bl->flags &= IN_PROGRESS_MASK;
> > > +               /* reset time for scheduling gap */
> > > +               flash->async.in_progress_schedule_delay = FLASH_SCHEDULE_DELAY;
>
> The async flash layer is the one that is managing the work;
> the block layer is a worker which does not know how the
> work is being managed.
> 
> > The default delay should come from the blocklevel rather than being
> > assumed by the user of libflash. The bl has a better idea about what
> > sort of interval is required between ops and what a reasonable timeout
> > is.

What do you mean by "managing"? The general pattern for async IO in
skiboot is:

upper layer:
	1. Sets up a job.
	2. Submits the job to the worker.
	3. Periodically polls the worker until it completes.

worker:
	1. Do all the work you can until a wait is required.
	2. Return a wait time to the upper layer.

All the details of how the job is handled are supposed to be owned by
the worker. The upper layer does not, and should not, know any of the
details of how the job is actually handled. All the upper layer really
wants is a completion code from the worker. This does mean you need to
keep a bit more of the transaction state in the blocklayer, but I think
that's fine.
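
Concretely that tends to look something like the sketch below. The
struct and function names here are invented for illustration; the
timer and message helpers are the ones core/flash.c already uses:

	struct job {
		struct timer	poll_timer;
		uint64_t	token;
		/* ... whatever state the worker needs ... */
	};

	/* worker: do as much as possible, return OPAL_BUSY plus a
	 * back-off (in ms) if a wait is required, or a completion code */
	static int64_t worker_continue(struct job *job, uint64_t *wait_ms);

	/* upper layer: timer callback that polls the worker */
	static void upper_poll(struct timer *t __unused, void *data,
			       uint64_t now __unused)
	{
		struct job *job = data;
		uint64_t wait_ms = 0;
		int64_t rc;

		rc = worker_continue(job, &wait_ms);
		if (rc == OPAL_BUSY) {
			/* the worker, not the caller, picks the delay */
			schedule_timer(&job->poll_timer, msecs_to_tb(wait_ms));
			return;
		}

		/* anything else is the completion code for the OS */
		opal_queue_msg(OPAL_MSG_ASYNC_COMP, NULL, NULL, job->token, rc);
	}

The point is that the wait interval and the retry policy come out of
the worker, not out of a constant hard-coded in the caller.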

> Problem cases are when the async has split the work up into
> chunks.  If a chunk desired to be read straddles the window
> (meaning that only a subset of the chunk is available in the
> current window) we have to bump down the async chunk
> desired to be read so that it will succeed.  Then we can
> acquire a new window from which we can then do subsequent reads
> (again at the async flash layer).
> 
> > I can't parse this, but it seems like it's all blocklevel specific
> > stuff that should be documented there. From the perspective of a
> > libflash user we don't want to know or care about *why* we need to do
> > a async delay, just that we have to.

As mentioned above, I don't see why this can't be handled in the
blocklayer. Splitting a single read because it crosses a window
boundary is an implementation detail of the blocklayer driver and
should be handled there.
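
Something like the following inside the driver, purely illustrative
(none of these names are the real ipmi-hiomap ones):

	/*
	 * Read at most up to the end of the current window; the caller
	 * loops until it has transferred everything it asked for.
	 */
	static int64_t window_read(struct ctx *ctx, uint64_t pos,
				   void *buf, uint64_t *len)
	{
		uint64_t win_end = ctx->win_start + ctx->win_size;
		int64_t rc;

		/* pos not covered by the current window? move it first */
		if (pos < ctx->win_start || pos >= win_end) {
			rc = request_window(ctx, pos);
			if (rc)
				return rc;
			win_end = ctx->win_start + ctx->win_size;
		}

		/* clamp so the read never straddles the window boundary */
		if (pos + *len > win_end)
			*len = win_end - pos;

		return copy_from_window(ctx, pos, buf, *len);
	}

The libflash user never needs to see any of that; it just gets told
how long to wait before polling again.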

> > > @@ -487,7 +596,7 @@ static int64_t opal_flash_op(enum flash_op op, uint64_t id, uint64_t offset,
> > >                 uint64_t buf, uint64_t size, uint64_t token)
> > >  {
> > >         struct flash *flash = NULL;
> > > -       uint64_t len;
> > > +       uint64_t len = 0;
> Yes
> > Is that needed?

Doesn't appear to be. Adding a bit more context to the hunk below:

        len = MIN(size, flash->block_size*10);
        printf("Flash op %d len %llu\n", op, len);
        flash->async.op = op;
        flash->async.token = token;
        flash->async.buf = buf + len; 
        flash->async.len = size - len; 

So it's already explicitly initialised.


> > > @@ -516,6 +625,10 @@ static int64_t opal_flash_op(enum flash_op op, uint64_t id, uint64_t offset,
> > >         flash->async.buf = buf + len;
> > >         flash->async.len = size - len;
> > >         flash->async.pos = offset + len;
> > > +       flash->async.retry_counter = 0;
> > > +       flash->async.in_progress_schedule_delay = FLASH_SCHEDULE_DELAY;
> > > +       flash->bl->flags |= ASYNC_REQUIRED;
> > > +       ++flash->async.transaction_id;
> This is needed.
> > Use post-increment unless you absolutely have to use pre-increment.
> > There's only a few instances where pre-increment is used in skiboot
> > and I'd like to keep it that way.

Why? Is a transaction ID of zero significant in some way? If so, is
that documented?




