[RFC Qemu PATCH v2 0/2] spapr: nvdimm: Asynchronous flush hcall support
Greg Kurz
groug at kaod.org
Tue Dec 22 00:07:59 AEDT 2020
On Mon, 30 Nov 2020 09:16:14 -0600
Shivaprasad G Bhat <sbhat at linux.ibm.com> wrote:
> The nvdimm devices are expected to ensure write persistence during
> power-failure scenarios.
>
> libpmem uses architecture-specific instructions, such as dcbf on POWER,
> to flush cached data to the backend nvdimm device during normal writes.
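For reference, this is the kind of userspace flush being referred to; a minimal
libpmem sketch (the file path and sizes are made up for illustration, build with -lpmem):

    #include <libpmem.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        size_t mapped_len;
        int is_pmem;
        /* Map a file on a DAX filesystem; the path is an illustrative assumption. */
        char *addr = pmem_map_file("/mnt/pmem0/data", 4096,
                                   PMEM_FILE_CREATE, 0666,
                                   &mapped_len, &is_pmem);
        if (!addr) {
            fprintf(stderr, "pmem_map_file: %s\n", pmem_errormsg());
            return 1;
        }
        strcpy(addr, "hello");
        if (is_pmem) {
            /* Flush CPU caches to the media, e.g. dcbf on POWER. */
            pmem_persist(addr, mapped_len);
        } else {
            /* Not real pmem: falls back to msync(). */
            pmem_msync(addr, mapped_len);
        }
        pmem_unmap(addr, mapped_len);
        return 0;
    }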
>
> QEMU virtual nvdimm devices are memory mapped. For file-backed vnvdimms,
> a dcbf in the guest does not translate into an actual flush to the backend
> file on the host. On x86_64 this is addressed by virtio-pmem, which issues
> asynchronous flushes.
>
> On PAPR, the issue is addressed by adding a new hcall through which the
> guest ndctl driver requests an explicit asynchronous flush when the backend
> nvdimm cannot ensure write persistence with dcbf alone. The approach here
> is to advertise, via a device tree property, when the asynchronous flush is
> required; the guest makes the hcall when the property is found, instead of
> relying on dcbf.
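For illustration only, a rough guest-side sketch of that decision; the property
name, hcall number and helper below are assumptions of mine, not the final
semantics (which the cover letter says are still under review):

    /* Hypothetical sketch of the guest driver logic, not the actual papr_scm code.
     * "ibm,hcall-flush-required" and H_SCM_FLUSH are assumed names. */
    static long papr_scm_flush(struct device_node *dn, uint32_t drc_index)
    {
        if (!of_property_read_bool(dn, "ibm,hcall-flush-required")) {
            /* No property: dcbf alone is enough, nothing beyond the CPU flush. */
            return 0;
        }
        /* Property present: ask the hypervisor to flush the backing store. */
        return plpar_hcall_norets(H_SCM_FLUSH, drc_index);
    }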
>
> The first patch adds the necessary asynchronous hcall support infrastructure
> at the DRC level. The second patch implements the flush hcall using that
> infrastructure.
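To make the intended flow concrete, here is a loose sketch of what such DRC-level
infrastructure could look like: a token per request plus a poll path that returns
H_BUSY until the background flush completes. All names below are placeholders,
not the code from the patches:

    /* Illustrative only: per-DRC tracking of in-flight async hcalls.
     * Assumes a QLIST_HEAD(, SpaprDrcAsyncReq) async_reqs field added to SpaprDrc. */
    typedef struct SpaprDrcAsyncReq {
        uint64_t token;                     /* handed back to the guest */
        bool completed;
        target_ulong result;                /* H_SUCCESS or an error code */
        QLIST_ENTRY(SpaprDrcAsyncReq) node;
    } SpaprDrcAsyncReq;

    /* Poll handler shape: the guest calls back with the token until done. */
    static target_ulong drc_async_poll(SpaprDrc *drc, uint64_t token)
    {
        SpaprDrcAsyncReq *req;

        QLIST_FOREACH(req, &drc->async_reqs, node) {
            if (req->token == token) {
                return req->completed ? req->result : H_BUSY;
            }
        }
        return H_PARAMETER;                 /* unknown token */
    }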
>
> Hcall semantics are in review and not final.
>
> A new device property, sync-dax, is added to the nvdimm device. When
> sync-dax is off (the default), the asynchronous flush hcalls are used.
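As a usage note, assuming the property keeps this name, the two behaviours above
would be selected on the command line along the lines of

    -object memory-backend-file,id=mb1,share=on,mem-path=/path/to/backing-file,size=1G \
    -device nvdimm,id=nvdimm1,memdev=mb1,sync-dax=off

with sync-dax=on keeping the plain dcbf-only behaviour.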
>
> With respect to saving on a new QEMU and restoring on an old QEMU, having
> sync-dax default to off (when not specified) causes I/O errors in the guest,
> as the async hcall is not supported on the old QEMU. Since the new hcall
> implementation is only supported on the new pseries machine version, the
> current machine version checks may be sufficient to prevent such migration.
> Please suggest what should be done.
>
First, all requests that are still not completed from the guest POV,
i.e. the hcall hasn't returned H_SUCCESS yet, are state that we should
in theory migrate. In this case, I guess we'd rather want to drain all
pending requests on the source in some pre-save handler.
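Something along these lines, with names made up here (the drain helper, the
vmstate name) just to show the shape of a pre-save drain:

    /* Sketch only: drain in-flight flush requests before the device state
     * is saved, so no pending-request state needs to be migrated. */
    static int spapr_nvdimm_pre_save(void *opaque)
    {
        SpaprDrc *drc = opaque;

        /* Hypothetical helper: block until every queued flush has completed
         * and its result has been consumed by the guest or discarded. */
        spapr_drc_drain_async_hcalls(drc);
        return 0;
    }

    static const VMStateDescription vmstate_spapr_nvdimm_flush = {
        .name = "spapr_nvdimm_flush",
        .version_id = 1,
        .minimum_version_id = 1,
        .pre_save = spapr_nvdimm_pre_save,
        .fields = (VMStateField[]) {
            VMSTATE_END_OF_LIST()
        },
    };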
Then, as explained in another mail, you should enforce stable behavior
for existing machine types with some hw_compat magic.
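For the machine-type side, the usual pattern is a compat property pinning the old
behaviour, e.g. (assuming the last released machine version at the time, and that
forcing sync-dax=on is indeed the stable behaviour wanted for old machine types):

    /* Sketch: keep pre-existing machine types on the dcbf-only behaviour by
     * defaulting sync-dax to "on" for them (hw/core/machine.c). */
    GlobalProperty hw_compat_5_2[] = {
        /* ... existing entries ... */
        { "nvdimm", "sync-dax", "on" },
    };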
> The below demonstration shows the map_sync behavior with sync-dax on & off.
> (https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c)
>
> The pmem0 is from the nvdimm with sync-dax=on, and pmem1 is from the nvdimm with sync-dax=off, mounted as
> /dev/pmem0 on /mnt1 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/pmem1 on /mnt2 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
>
> [root@atest-guest ~]# ./mapsync /mnt1/newfile ----> when sync-dax=on
> [root@atest-guest ~]# ./mapsync /mnt2/newfile ----> when sync-dax=off
> Failed to mmap with Operation not supported
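For readers who don't want to follow the link, the test boils down to something
like this (a condensed sketch, not the exact avocado test source):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef MAP_SHARED_VALIDATE
    #define MAP_SHARED_VALIDATE 0x03
    #endif
    #ifndef MAP_SYNC
    #define MAP_SYNC 0x80000
    #endif

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, 4096) < 0) {
            perror("open/ftruncate");
            return 1;
        }
        /* MAP_SYNC is only allowed when the device guarantees that CPU cache
         * flushes alone make the data persistent. */
        void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (addr == MAP_FAILED) {
            printf("Failed to mmap with %s\n", strerror(errno));
            return 1;
        }
        munmap(addr, 4096);
        close(fd);
        return 0;
    }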
>
> ---
> v1 - https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg06330.html
> Changes from v1
> - Fixed a missed-out unlock
> - using QLIST_FOREACH instead of QLIST_FOREACH_SAFE while generating token
>
> Shivaprasad G Bhat (2):
> spapr: drc: Add support for async hcalls at the drc level
> spapr: nvdimm: Implement async flush hcalls
>
>
> hw/mem/nvdimm.c | 1
> hw/ppc/spapr_drc.c | 146 ++++++++++++++++++++++++++++++++++++++++++++
> hw/ppc/spapr_nvdimm.c | 79 ++++++++++++++++++++++++
> include/hw/mem/nvdimm.h | 10 +++
> include/hw/ppc/spapr.h | 3 +
> include/hw/ppc/spapr_drc.h | 25 ++++++++
> 6 files changed, 263 insertions(+), 1 deletion(-)
>
> --
> Signature
>
>