[Skiboot] [PATCH] Add in new OPAL call to flush the L2 and L3 caches.

Alexey Kardashevskiy aik at ozlabs.ru
Wed Nov 14 17:35:24 AEDT 2018



On 14/11/2018 14:32, Alistair Popple wrote:
> On Tuesday, 13 November 2018 7:24:28 PM AEDT Oliver wrote:
>> On Tue, Nov 13, 2018 at 4:28 PM Alistair Popple <alistair at popple.id.au> wrote:
>>>> An async completion might make more sense here. They're a little
>>>> convoluted but the basic process is:
>>>>
>>>> 1) pass an async_token (magic number) in the OPAL call
>>>> 2) schedule a timer inside of OPAL to check on the async job
>>>> 3) return from the opal call with OPAL_ASYNC_COMPLETION
>>>> 4) check on the purge state in the timer's callback function
>>>> 5) when the async job is done, use opal_msg() to send an
>>>> OPAL_MSG_ASYNC_COMP message.
>>>>
>>>> The calling thread in linux can sleep while waiting for the async
>>>> completion message. If you want an example have a look at
>>>> core/i2c.c:opal_i2c_request() in skiboot and the
>>>> drivers/i2c/busses/i2c-opal.c in linux.
>>>
>>> Both Nick and Alexey have asked offline if this could just be made part of
>>> the NPU reset sequence. Technically I think it could be but needing to do
>>> it async complicates things. Oliver do you know if the existing PCI slot
>>> reset code has any mechanism to return something similar to an async
>>> completion?
>>>
>>> There was talk of having the NVIDIA driver be able to call this
>>> sequence directly, but doing it as part of the reset may remove the
>>> need for that.
>> You should be able to put it into the creset function for the npu2's
>> virtual PHB. For normal PHBs creset is a fairly long operation since
>> it requires asserting PERST for half a second or something. However
>> the interface for doing that is slightly different (not sure why,
>> might be an OPALv1 holdover). Rather than returning an async token, it
>> returns a wait time (in ms) and expects you to call OPAL_PCI_POLL
>> repeatedly until you get OPAL_SUCCESS, at which point the reset is
>> done.
> 
> Thanks Oliver, makes sense.
> 
>> I'd say it should be fairly straightforward to do the purge in the
>> npu2 creset(). If the current NPU driver in linux expects to get
>> OPAL_SUCCESS when it does an NPU reset you might be screwed though.
> 
> We don't do anything special here in Linux for the NPU so I'd assume it would 
> work as well as it does for everything else. Alexey, you were hooking up these
> resets for pass-through; do you know whether returning a wait time and
> polling with OPAL_PCI_POLL would work in your call paths?

GPUs are reset from pnv_pci_reset_secondary_bus()+pnv_eeh_bridge_reset(),
which does wait for opal_pci_poll(); in skiboot this is phb4_hreset(), and
that could be the place for the cache purge, but I cannot easily figure out
this state machine in skiboot :-/

NPUs are reset from pcie_flr(), which waits 100ms (Linux then waits for
PCI_COMMAND != 0xffff, but that condition is met immediately).



-- 
Alexey
