[Skiboot] [PATCH v3 06/13] hiomap: Enable Async IPMI messaging

Deb McLemore debmc at linux.ibm.com
Fri Jan 3 03:38:14 AEDT 2020


On 12/16/19 7:53 PM, Oliver O'Halloran wrote:
> On Fri, 2019-12-13 at 06:32 -0600, Deb McLemore wrote:
>> Comments below:
>>
>> On 11/24/19 9:12 PM, Oliver O'Halloran wrote:
>>> On Thu, Nov 7, 2019 at 4:25 AM Deb McLemore <debmc at linux.ibm.com> wrote:
>>>> *snip*
>> From libflash/ipmi-hiomap.h
>>
>> /*
>>  * we basically check for a quick response
>>  * otherwise we catch the updated window in the next cycle
>>  */
>> #define IPMI_HIOMAP_TICKS 5
>>
>> Unless the window is immediately ready, we stall briefly (in logic
>> that is not optimized out) to allow a short period for a quick
>> response; otherwise we always kick back out to Linux, and that delay
>> gives the window time to be moved.
>>
>>> wait_time is always going to be zero since the two mftb() calls are
>>> going to return pretty similar values and tb_to_msecs() divides the
>>> difference by 512000 (tb_hz / 1000).
>>>
>>>> +       timeout_counter = 0;
>>>> +
>>>> +
>>>> +       while (wait_time < flash_reserve_ticks) {
>>>> +               ++timeout_counter;
>>>> +               if (timeout_counter % 4 == 0) {
>>>> +                       now = mftb();
>>>> +                       wait_time = tb_to_msecs(now - start_time);
>>>> +               }
>>>> +               if (flash->busy == false) {
>>>> +                       break;
>>>> +               }
>>>> +       }
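A wait that actually tracks elapsed time would re-read the timebase on
each iteration, something like the sketch below (mftb() and
tb_to_msecs() are the skiboot helpers used above, flash_reserve_ticks is
the patch's timeout, and barrier() stands for a plain compiler barrier
so the busy flag is re-loaded each pass):

	unsigned long start = mftb();

	/* spin until the flash frees up or the reservation interval expires */
	while (flash->busy &&
	       tb_to_msecs(mftb() - start) < flash_reserve_ticks)
		barrier();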
> Please follow convention and put your responses below the thing being
> responded to rather than above. Mixing responses above and below just
> makes the thread a pain to follow.
>
>> Again, because of compiler optimization we want to stall and actually
>> check the value a fixed number of times, allowing a miss or two on
>> the lock to happen; other forms of this logic were optimized away.
>>
>>> This doesn't make any sense to me. What are you doing here and why?
>>>
>>>> -       if (!try_lock(&flash_lock))
>>>> +       while (!try_lock(&flash_lock)) {
>>>> +               --lock_try_counter;
>>>> +               if (lock_try_counter == 0) {
>>>> +                       break;
>>>> +               }
>>>> +       }
>>> If you're going to do this sort of thing then use a for loop, so the
>>> iteration count is set in the same place it's being used.
>>>
>>>> +       if (lock_try_counter == 0) {
>>>>                 return false;
>>>> +       }
>>> you should return from inside the loop instead of breaking out and
>>> re-checking the break condition
>>>
>>>> +
>>>> +       /* we have the lock if we got here */
>>>>
>>>>         if (!flash->busy) {
>>>>                 flash->busy = true;
>>>>                 rc = true;
>>>> +       } else {
>>>> +               /* probably beat a flash_release and grabbed the lock */
>>>> +               unlock(&flash_lock);
>>>> +               while (!try_lock(&flash_lock)) {
>>>> +                       --lock_try_counter;
>>>> +                       if (lock_try_counter == 0) {
>>>> +                               break;
>>>> +                       }
>>>> +               }
>>>> +               if (lock_try_counter == 0) {
>>>> +                       return false;
>>>> +               }
>>>> +               /* we have the lock if we are here */
>>>> +               if (!flash->busy) {
>>>> +                       flash->busy = true;
>>>> +                       rc = true;
>>>> +               }
>>> This doesn't make much sense to me either. Why is replicating all the
>>> previous logic necessary?
>>>
>>>>         }
>>>>         unlock(&flash_lock);
>>> I'm not sure that we even need to use try_lock() with patch 03/13
>>> applied. With that patch the flash's lock should only be held briefly
>>> so we should be able to use the normal (blocking) lock() since we
>>> don't need to deal with waiting for a 64MB flash read to complete.
> ^^^^^^^^^^^^^^^^^^
>
> This is what I really wanted a response to.
>
> All the games above with try_lock() and checking the busy flag outside
> the lock only make sense to me if we need to avoid being stuck
> waiting for a lock in skiboot. The changes in 3/13 result in the lock
> only being held briefly since the only thing that needs to be done
> under the lock is checking the busy flag. Why not do this instead:
>
> diff --git a/core/flash.c b/core/flash.c
> index d97d88edbdd5..8fcc9d46ca59 100644
> --- a/core/flash.c
> +++ b/core/flash.c
> @@ -64,8 +64,7 @@ static bool flash_reserve(struct flash *flash)
>  {
>  	bool rc = false;
>  
> -	if (!try_lock(&flash_lock))
> -		return false;
> +	lock(&flash_lock);
>  
>  	if (!flash->busy) {
>  		flash->busy = true;
>
> That change probably should have been in Cyril's patch. Oh well, people
> write bugs sometimes.

The logic added is for resiliency.  Securing flash_lock does not mean
flash->busy will be clear, so we check the state of the data and allow
some iterations to try to persist with the current operation.
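
If the bounded retry is kept, the shape Oliver suggests (iteration count
declared where it is used, returning from inside the loop) could be a
small helper; this is only a sketch, and FLASH_LOCK_ATTEMPTS is a
made-up name:

	#define FLASH_LOCK_ATTEMPTS	10	/* hypothetical retry bound */

	static bool flash_try_lock_bounded(void)
	{
		int i;

		/* give up after a fixed number of attempts rather than
		 * spinning inside skiboot indefinitely */
		for (i = 0; i < FLASH_LOCK_ATTEMPTS; i++)
			if (try_lock(&flash_lock))
				return true;
		return false;
	}

flash_reserve() then collapses to:

	if (!flash_try_lock_bounded())
		return false;
	/* lock held: check and set flash->busy, then unlock */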

>
>>>> @@ -279,12 +340,31 @@ static void flash_poll(struct timer *t __unused, void *data, uint64_t now __unus
>>>>                 assert(0);
>>>>         }
>>>>
>>>> -       if (rc)
>>>> -               rc = OPAL_HARDWARE;
>>>> +       if (rc == 0) {
>>>> +               /*
>>>> +                * if we are good to proceed forward
>>>> +                * otherwise we may have to try again
>>>> +                */
>>>> +               flash->async.pos += len;
>>>> +               flash->async.buf += len;
>>>> +               flash->async.len -= len;
>>>> +               /* if we good clear */
>>>> +               flash->async.retry_counter = 0;
>>>> +               /*
>>>> +                * clear the IN_PROGRESS flags
>>>> +                * we only need IN_PROGRESS active on missed_cc
>>>> +                */
>>>> +               flash->bl->flags &= IN_PROGRESS_MASK;
>>>> +               /* reset time for scheduling gap */
>>>> +               flash->async.in_progress_schedule_delay = FLASH_SCHEDULE_DELAY;
>> The async flash layer is the one managing the work; the
>> block layer is a worker which does not know how the
>> work is being managed.
>>
>>> The default delay should come from the blocklevel rather than being
>>> assumed by the user of libflash. The bl has a better idea about what
>>> sort of interval is required between ops and what a reasonable timeout
>>> is.
> What do you mean by "managing"? The general pattern for async IO in
> skiboot is:
>
> upper layer:
> 	1. Sets up a job
> 	2. Submit job it to the worker
> 	3. Periodically polls the worker until completed.
>
> worker:
> 	1. Do all the work you can until a wait is required.
> 	2. Return a wait time to the upper layer.
>
> All the details of how the job is handled are supposed to be owned by
> the worker. The upper layer does not, and should not, know any of the
> details of how the job is actually handled. All the upper layer really
> wants is a completion code from the worker. This does mean you need to
> keep a bit more of the transaction state in the blocklayer, but I think
> that's fine.
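
Concretely, the pattern being described is something like the sketch
below.  The struct job, worker_continue() and job_done() names are
illustrative; schedule_timer(), msecs_to_tb() and the OPAL return codes
are the usual skiboot ones:

	/* worker: do all the work possible, then report how long to wait */
	static int64_t worker_continue(struct job *job, uint64_t *wait_ms)
	{
		/* ... advance the job as far as it can go ... */
		if (job->remaining == 0)
			return OPAL_SUCCESS;
		*wait_ms = job->poll_interval_ms; /* worker owns the interval */
		return OPAL_BUSY;
	}

	/* upper layer: periodically poll the worker until it completes */
	static void job_poll(struct timer *t __unused, void *data,
			     uint64_t now __unused)
	{
		struct job *job = data;
		uint64_t wait_ms;
		int64_t rc = worker_continue(job, &wait_ms);

		if (rc == OPAL_BUSY)
			schedule_timer(&job->timer, msecs_to_tb(wait_ms));
		else
			job_done(job, rc); /* pass the completion code up */
	}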
>
>> Problem cases arise when the async layer has split the work up into
>> chunks.  If a chunk to be read straddles the window (meaning that
>> only a subset of the chunk is available in the current window), we
>> have to shrink the requested chunk so that the read will succeed.
>> Then we can acquire a new window from which to do the subsequent
>> reads (again at the async flash layer).
>>
>>> I can't parse this, but it seems like it's all blocklevel specific
>>> stuff that should be documented there. From the perspective of a
>>> libflash user we don't want to know or care about *why* we need to do
>>> a async delay, just that we have to.
> As mentioned above, I don't see why this can't be handled in the
> blocklayer. Splitting a single read because it crosses a window
> boundary is an implementation detail of the blocklayer driver and
> should be handled there.
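
The clamping Oliver describes would live inside the blocklevel driver's
read path, roughly like this sketch (struct hiomap_ctx and the
window-tracking fields are illustrative, not the actual ipmi-hiomap
names):

	static int window_read(struct hiomap_ctx *ctx, uint64_t pos,
			       void *buf, uint64_t len)
	{
		uint64_t win_end = ctx->win_pos + ctx->win_size;

		/* clamp a read that straddles the current window; the
		 * caller's loop issues the remainder after the window
		 * has been moved */
		if (pos + len > win_end)
			len = win_end - pos;

		/* ... copy len bytes from the window at (pos - ctx->win_pos) ... */
		return 0;
	}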

The async design for this feature is to allow an alternative to the
typical skiboot polling, since during the window movement we cannot
allow those operations.  Pushing the transaction state down to the
blocklayer was not the design point of this feature; managing and
communicating the success and failure of each individual worker request
seemed a bit much.

>
>>>> @@ -487,7 +596,7 @@ static int64_t opal_flash_op(enum flash_op op, uint64_t id, uint64_t offset,
>>>>                 uint64_t buf, uint64_t size, uint64_t token)
>>>>  {
>>>>         struct flash *flash = NULL;
>>>> -       uint64_t len;
>>>> +       uint64_t len = 0;
>> Yes
>>> Is that needed?
> Doesn't appear to be. Adding a bit more context to the hunk below:
>
>         len = MIN(size, flash->block_size*10);
>         printf("Flash op %d len %llu\n", op, len);
>         flash->async.op = op;
>         flash->async.token = token;
>         flash->async.buf = buf + len; 
>         flash->async.len = size - len; 
>
> So it's already explicitly initialised.
        [CC]  core/flash.o
In file included from core/flash.c:11:0:
core/flash.c: In function ‘opal_flash_op’:
include/skiboot.h:81:31: error: ‘len’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
 #define prlog(l, f, ...) do { _prlog(l, pr_fmt(f), ##__VA_ARGS__); }
while(0)
                               ^~~~~~
core/flash.c:612:11: note: ‘len’ was declared here
  uint64_t len;
           ^~~
        [CC]  core/sensor.o
cc1: all warnings being treated as errors
make[2]: *** [core/flash.o] Error 1
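
So GCC's flow analysis can't prove that every path assigns len before
the prlog() that prints it, and with -Werror that's fatal.  Initialising
at the declaration, as the patch does, is the simplest way to satisfy
it:

	uint64_t len = 0;	/* keeps -Wmaybe-uninitialized quiet */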

>
>
>>>> @@ -516,6 +625,10 @@ static int64_t opal_flash_op(enum flash_op op, uint64_t id, uint64_t offset,
>>>>         flash->async.buf = buf + len;
>>>>         flash->async.len = size - len;
>>>>         flash->async.pos = offset + len;
>>>> +       flash->async.retry_counter = 0;
>>>> +       flash->async.in_progress_schedule_delay = FLASH_SCHEDULE_DELAY;
>>>> +       flash->bl->flags |= ASYNC_REQUIRED;
>>>> +       ++flash->async.transaction_id;
>> This is needed.
>>> Use post-increment unless you absolutely have to use pre-increment.
>>> There's only a few instances where pre-increment is used in skiboot
>>> and I'd like to keep it that way.
> Why? Is a transaction ID of zero significant in some way? If so, is
> that documented?

The entry of opal_flash_op is the point at which we know a new
transaction is being requested, so we bump the transaction_id on entry
(subsequent continuation paths enter through flash_poll).
