[PATCH v6 6/8] powerpc/pmem: Avoid the barrier in flush routines

Dan Williams dan.j.williams at intel.com
Tue Jun 30 11:50:15 AEST 2020


On Mon, Jun 29, 2020 at 1:41 PM Aneesh Kumar K.V
<aneesh.kumar at linux.ibm.com> wrote:
>
> Michal Suchánek <msuchanek at suse.de> writes:
>
> > Hello,
> >
> > On Mon, Jun 29, 2020 at 07:27:20PM +0530, Aneesh Kumar K.V wrote:
> >> nvdimm expects the flush routines to just mark the cache clean. The barrier
> >> that makes the stores globally visible is done in nvdimm_flush().
> >>
> >> Update the papr_scm driver with a simplified nvdimm_flush callback that does
> >> only the required barrier.
> >>
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
> >> ---
> >>  arch/powerpc/lib/pmem.c                   |  6 ------
> >>  arch/powerpc/platforms/pseries/papr_scm.c | 13 +++++++++++++
> >>  2 files changed, 13 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
> >> index 5a61aaeb6930..21210fa676e5 100644
> >> --- a/arch/powerpc/lib/pmem.c
> >> +++ b/arch/powerpc/lib/pmem.c
> >> @@ -19,9 +19,6 @@ static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
> >>
> >>      for (i = 0; i < size >> shift; i++, addr += bytes)
> >>              asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
> >> -
> >> -
> >> -    asm volatile(PPC_PHWSYNC ::: "memory");
> >>  }
> >>
> >>  static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
> >> @@ -34,9 +31,6 @@ static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
> >>
> >>      for (i = 0; i < size >> shift; i++, addr += bytes)
> >>              asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
> >> -
> >> -
> >> -    asm volatile(PPC_PHWSYNC ::: "memory");
> >>  }
> >>
> >>  static inline void clean_pmem_range(unsigned long start, unsigned long stop)
> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> >> index 9c569078a09f..9a9a0766f8b6 100644
> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >> @@ -630,6 +630,18 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >>
> >>      return 0;
> >>  }
> >> +/*
> >> + * We have made sure the pmem writes are done such that before calling this
> >> + * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. Here
> >> + * we just need to add the necessary barrier to make sure the above flushes
> >> + * have updated persistent storage before any data access or data transfer
> >> + * caused by subsequent instructions is initiated.
> >> + */
> >> +static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio *bio)
> >> +{
> >> +    arch_pmem_flush_barrier();
> >> +    return 0;
> >> +}
> >>
> >>  static ssize_t flags_show(struct device *dev,
> >>                        struct device_attribute *attr, char *buf)
> >> @@ -743,6 +755,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> >>      ndr_desc.mapping = &mapping;
> >>      ndr_desc.num_mappings = 1;
> >>      ndr_desc.nd_set = &p->nd_set;
> >> +    ndr_desc.flush = papr_scm_flush_sync;
> >
> > AFAICT currently the only device that implements flush is virtio_pmem.
> > How does the nfit driver get away without implementing flush?
>
> generic_nvdimm_flush does the required barrier for nfit. The reason for
> adding the ndr_desc.flush callback for papr_scm was to avoid the use of
> iomem-based deep flushing (nd_region_data.flush_wpq), which is not
> supported by papr_scm.
>
> BTW we return NULL from ndrd_get_flush_wpq() on power, so the upstream
> code already does the same thing, just in a different way.
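
For reference, the dispatch in drivers/nvdimm/region_devs.c is roughly
the following (paraphrased from memory, not a verbatim quote):

        int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
        {
                int rc = 0;

                /* a driver-supplied callback short-circuits the generic path */
                if (!nd_region->flush)
                        rc = generic_nvdimm_flush(nd_region);
                else if (nd_region->flush(nd_region, bio))
                        rc = -EIO;

                return rc;
        }

So with ndr_desc.flush set, papr_scm never reaches the flush_wpq
handling in generic_nvdimm_flush().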
>
>
> > Also the flush takes arguments that are completely unused but a user of
> > the pmem region must assume they are used, and call flush() on the
> > region rather than arch_pmem_flush_barrier() directly.
>
> The bio argument can help a pmem driver do range-based flushing in the
> pmem_make_request case. If bio is NULL then we must assume a full
> device flush.

The bio argument isn't for range-based flushing; it is for flush
operations that need to complete asynchronously.
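
For example, a device whose flush has to round-trip to a hypervisor or
controller can chain a child bio to the incoming flush bio, kick off the
flush command, and complete the child (and thereby the original bio)
only when the acknowledgement arrives, instead of blocking in the
submission path. A minimal sketch, where all the foo_* names are
hypothetical driver helpers rather than anything in-tree:

        /* completion handler, invoked when the device acks the flush */
        static void foo_flush_done(struct bio *child)
        {
                bio_endio(child);       /* now the chained parent bio can complete */
        }

        static int foo_region_flush(struct nd_region *nd_region, struct bio *bio)
        {
                struct bio *child;

                if (!bio)               /* e.g. a sysfs-initiated full/deep flush */
                        return foo_flush_and_wait(nd_region);

                child = bio_alloc(GFP_ATOMIC, 0);
                if (!child)
                        return -ENOMEM;

                /* the parent bio now completes only after the child does */
                bio_chain(child, bio);

                /* queue the flush; foo_flush_done(child) runs on completion */
                return foo_queue_flush(nd_region, child);
        }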

There's no mechanism for the block layer to communicate range-based
cache flushing; block-device flushing is assumed to cover the device's
entire cache. For pmem that would be the entirety of the cpu cache.
Instead of modeling the cpu cache as a storage-device cache, it is
modeled as page-cache. Once the fs layer writes back page-cache /
cpu-cache, the storage device is only responsible for flushing those
cache writes into the persistence domain.
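
Concretely, the only flush the driver ever sees from the block layer is
a REQ_PREFLUSH / REQ_FUA flag on a bio, with no range attached. From
memory, pmem_make_request() in drivers/nvdimm/pmem.c boils down to
something like this (heavily trimmed, locals and accounting elided):

        static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
        {
                int ret = 0;

                if (bio->bi_opf & REQ_PREFLUSH)         /* "flush your cache", no range */
                        ret = nvdimm_flush(nd_region, bio);

                bio_for_each_segment(bvec, bio, iter) {
                        /* per-segment memcpy_flushcache() / read */
                }

                if (bio->bi_opf & REQ_FUA)              /* write-through semantics */
                        ret = nvdimm_flush(nd_region, bio);

                if (ret)
                        bio->bi_status = errno_to_blk_status(ret);
                bio_endio(bio);
                return BLK_QC_T_NONE;
        }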

Additionally, there is a concept of deep-flush that relegates some
power-fail scenarios to a smaller failure domain. For example, consider
the difference between a write arriving at the head of a device queue
and one that has successfully traversed the device queue to media. The
expectation of pmem applications is that data is persisted once it
reaches the equivalent of the x86 ADR domain; deep-flush covers what
lies past ADR.
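
On nfit platforms that deep-flush is what the flush-hint (WPQ) writes in
generic_nvdimm_flush() are about; roughly (again from memory, the real
code round-robins across flush hint addresses):

        int generic_nvdimm_flush(struct nd_region *nd_region)
        {
                struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
                int i;

                /* order prior cache flushes, then kick the write-pending-queue flush */
                wmb();
                for (i = 0; i < nd_region->ndr_mappings; i++)
                        if (ndrd_get_flush_wpq(ndrd, i, 0))
                                writeq(1, ndrd_get_flush_wpq(ndrd, i, 0));
                wmb();

                return 0;
        }

papr_scm returning NULL from ndrd_get_flush_wpq() means those writeq()s
never happen, which is consistent with there being no deep-flush
equivalent on that platform.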

