[Skiboot] [PATCH skiboot] npu2: Reset NVLinks when resetting a GPU
Alexey Kardashevskiy
aik at ozlabs.ru
Wed Jun 5 17:11:44 AEST 2019
On 03/06/2019 12:04, Stewart Smith wrote:
> Alexey Kardashevskiy <aik at ozlabs.ru> writes:
>> Resetting a V100 GPU brings its NVLinks down and if an NPU tries using
>> those, an HMI occurs. We were lucky not to observe this as the bare metal
>> does not normally reset a GPU and, when passed through, GPUs are usually
>> listed before NPUs in the QEMU command line or Libvirt XML, so NPUs
>> are naturally reset first. However, a simple change of the device order
>> brings HMIs.
>>
>> This defines a bus control filter for a PCI slot with a GPU with NVLinks
>> so when the host system issues secondary bus reset to the slot, it resets
>> associated NVLinks.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>
> Merged to master as of 7c977c734e1c4d3be9a036a075798530d352d8e3. Sorry
> for the delay.
>
> Does this need to also go to stable?
Having to reboot a host because of these HMIs is annoying, so yes, this
should probably go to stable.
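
For anyone reading this without the patch at hand, the mechanism described
in the commit message can be sketched roughly as below. This is a standalone
model and not the skiboot code: struct gpu_slot, struct nvlink,
nvlink_reset() and slot_cfg_write_filter() are hypothetical names made up
for illustration. Only the Bridge Control register offset (0x3E) and the
Secondary Bus Reset bit (0x0040) are taken from the standard PCI bridge
header layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI type 1 (bridge) header values. */
#define BRIDGE_CONTROL_OFFSET      0x3E    /* Bridge Control register */
#define BRIDGE_CTL_SECONDARY_RESET 0x0040  /* bit 6: Secondary Bus Reset */

#define MAX_LINKS 8  /* arbitrary bound for this sketch */

/* Hypothetical stand-in for one NVLink attached to the GPU. */
struct nvlink {
    int id;
    int reset_count;  /* how many times this link has been reset */
};

/* Hypothetical stand-in for a PCI slot holding a GPU with NVLinks. */
struct gpu_slot {
    uint16_t bridge_control;          /* last value written to 0x3E */
    struct nvlink *links[MAX_LINKS];  /* NVLinks associated with the GPU */
    int nr_links;
};

/* Hypothetical link reset; the real firmware would run a reset procedure. */
static void nvlink_reset(struct nvlink *link)
{
    link->reset_count++;
}

/*
 * Filter applied to config space writes aimed at the slot's bridge.
 * When the host asserts Secondary Bus Reset (bit goes 0 -> 1), reset every
 * NVLink associated with the GPU so the NPU does not touch dead links.
 */
static void slot_cfg_write_filter(struct gpu_slot *slot, uint32_t offset,
                                  uint16_t value)
{
    bool was_in_reset, now_in_reset;
    int i;

    if (offset != BRIDGE_CONTROL_OFFSET)
        return;  /* not a Bridge Control write, nothing to do */

    assert(slot->nr_links <= MAX_LINKS);

    was_in_reset = slot->bridge_control & BRIDGE_CTL_SECONDARY_RESET;
    now_in_reset = value & BRIDGE_CTL_SECONDARY_RESET;

    /* Act only on the rising edge, i.e. when reset is first asserted. */
    if (!was_in_reset && now_in_reset) {
        for (i = 0; i < slot->nr_links; i++)
            nvlink_reset(slot->links[i]);
    }

    slot->bridge_control = value;
}
```

In this model the links are reset once per assertion of the reset bit, no
matter where the GPU sits relative to the NPUs in the device order, which is
the property the patch is after.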
--
Alexey