[PATCH v2] powerpc/npu-dma.c: Fix deadlock in mmio_invalidate

Michael Ellerman mpe at ellerman.id.au
Wed Feb 28 21:14:48 AEDT 2018


Alistair Popple <alistair at popple.id.au> writes:

> When sending TLB invalidates to the NPU we need to send extra flushes due
> to a hardware issue. The original implementation would lock all the
> ATSD MMIO registers sequentially before unlocking and relocking each of
> them sequentially to do the extra flush.
>
> This introduced a deadlock: one thread can hold one ATSD register
> while waiting for a second register to be freed, while another thread
> holds that second register and waits for the register held by the
> first thread.
>
> For example if there are two threads and two ATSD registers:
>
> Thread A	Thread B
> Acquire 1
> Acquire 2
> Release 1	Acquire 1
> Wait 1		Wait 2
>
> Both threads will be stuck waiting to acquire a register resulting in an
> RCU stall warning or soft lockup.
>
> This patch solves the deadlock by refactoring the code to ensure registers
> are not released between flushes and to ensure all registers are either
> acquired or released together and in order.
>
> Fixes: bbd5ff50afff ("powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD")
> Signed-off-by: Alistair Popple <alistair at popple.id.au>
>
> ---
>
> v2: Added memory barriers around ->npdev[] and ATSD register acquisition/release

Apologies to Balbir who was standing nearby when I read this patch this
afternoon and copped a bit of a rant as a result ...

But READ_ONCE/WRITE_ONCE are not memory barriers, they are only compiler
barriers.

So the READ_ONCE/WRITE_ONCE usage in here may be necessary, but it's
probably not sufficient; we probably *also* need actual memory barriers.

But I could be wrong, my main gripe is that the locking/ordering in here
is not very obvious.

> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 0a253b64ac5f..2fed4b116e19 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -726,7 +749,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
>  							&nvlink_index)))
>  		return ERR_PTR(-ENODEV);
> -	npu_context->npdev[npu->index][nvlink_index] = npdev;
> +	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], npdev);
  
When you're publishing a struct via a pointer you would typically have a
barrier between the stores that set the fields of the struct, and the
store that publishes the struct. Otherwise the reader can see a
partially setup struct.

I think here the npdev was setup somewhere else, and maybe there has
been an intervening barrier, but it's not clear. A comment at least
would be good.

In general I feel like a spinlock or two could significantly simplify
the locking/ordering in this code, and, given we're doing MMIOs anyway,
would not affect performance.

</rant>

cheers
