[RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers

Jordan Niethe jniethe5 at gmail.com
Sat Jun 10 12:09:13 AEST 2023


On Wed, Jun 7, 2023 at 6:27 PM Nicholas Piggin <npiggin at gmail.com> wrote:
[snip]
>
> This is a tour de force in one of these things, so I hate to be
> the "me smash with club" guy, but what if you allocated buffers
> with enough room for all the state (or 99% of cases, in which
> case an overflow would make an hcall)?
>
> What's actually a fast-path that we don't get from the interrupt
> return buffer? Getting and setting a few regs for MMIO emulation?

As it is a vcpu uses four buffers:

- One for registering it's input and output buffers
   This is allocated just large enough for GSID_RUN_OUTPUT_MIN_SIZE,
   GSID_RUN_INPUT and GSID_RUN_OUTPUT.
   Freed once the buffers are registered.
   I suppose we could just make a buffer big enough to be used for the
vcpu run input buffer then have it register its own address.

- One for process and partition table entries
   Because kvmhv_set_ptbl_entry() isn't associated with a vcpu.
   kvmhv_papr_set_ptbl_entry() allocates and frees a minimal sized
buffer on demand.

- The run vcpu input buffer
   Persists over the lifetime of the vcpu after creation. Large enough
to hold all VCPU-wide elements. The same buffer is also reused for:

     * GET state hcalls
     * SET guest wide state hcalls (guest wide can not be passed into
the vcpu run buffer)

- The run vcpu output buffer
   Persists over the lifetime of the vcpu after creation. This is
sized to be GSID_RUN_OUTPUT_MIN_SIZE as returned by the L0.
   It's unlikely that it would be larger than the run vcpu buffer
size, so I guess you could make it that size too. Probably you could
even use the run vcpu input buffer as the vcpu output buffer.

The buffers could all be that max size and could combine the
configuration buffer, input and output buffers, but I feel it's more
understandable like this.

[snip]

>
> The namespaces are a little abbreviated. KVM_PAPR_ might be nice if
> you're calling the API that.

Will we go with KVM_NESTED_V2_ ?

>
> > +
> > +#define GSID_HOST_STATE_SIZE         0x0001 /* Size of Hypervisor Internal Format VCPU state */
> > +#define GSID_RUN_OUTPUT_MIN_SIZE     0x0002 /* Minimum size of the Run VCPU output buffer */
> > +#define GSID_LOGICAL_PVR             0x0003 /* Logical PVR */
> > +#define GSID_TB_OFFSET                       0x0004 /* Timebase Offset */
> > +#define GSID_PARTITION_TABLE         0x0005 /* Partition Scoped Page Table */
> > +#define GSID_PROCESS_TABLE           0x0006 /* Process Table */
>
> > +
> > +#define GSID_RUN_INPUT                       0x0C00 /* Run VCPU Input Buffer */
> > +#define GSID_RUN_OUTPUT                      0x0C01 /* Run VCPU Out Buffer */
> > +#define GSID_VPA                     0x0C02 /* HRA to Guest VCPU VPA */
> > +
> > +#define GSID_GPR(x)                  (0x1000 + (x))
> > +#define GSID_HDEC_EXPIRY_TB          0x1020
> > +#define GSID_NIA                     0x1021
> > +#define GSID_MSR                     0x1022
> > +#define GSID_LR                              0x1023
> > +#define GSID_XER                     0x1024
> > +#define GSID_CTR                     0x1025
> > +#define GSID_CFAR                    0x1026
> > +#define GSID_SRR0                    0x1027
> > +#define GSID_SRR1                    0x1028
> > +#define GSID_DAR                     0x1029
>
> It's a shame you have to rip up all your wrapper functions now to
> shoehorn these in.
>
> If you included names analogous to the reg field names in the kvm
> structures, the wrappers could do macro expansions that get them.
>
> #define __GSID_WRAPPER_dar              GSID_DAR
>
> Or similar.

Before I had something pretty hacky, in the macro accessors I had
along the lines of

     gsid_table[offsetof(vcpu, reg)]

to get the GSID for the register.

We can do the wrapper idea, I just worry if it is getting too magic.

>
> And since of course you have to explicitly enumerate all these, I
> wouldn't mind defining the types and lengths up-front rather than
> down in the type function. You'd like to be able to go through the
> spec and eyeball type, number, size.

Something like
#define KVM_NESTED_V2_GS_NIA (KVM_NESTED_V2_GSID_NIA | VCPU_WIDE |
READ_WRITE | DOUBLE_WORD)
etc
?

>
> [snip]
>
> > +/**
> > + * gsb_paddress() - the physical address of buffer
> > + * @gsb: guest state buffer
> > + *
> > + * Returns the physical address of the buffer.
> > + */
> > +static inline u64 gsb_paddress(struct gs_buff *gsb)
> > +{
> > +     return __pa(gsb_header(gsb));
> > +}
>
> > +/**
> > + * __gse_put_reg() - add a register type guest state element to a buffer
> > + * @gsb: guest state buffer to add element to
> > + * @iden: guest state ID
> > + * @val: host endian value
> > + *
> > + * Adds a register type guest state element. Uses the guest state ID for
> > + * determining the length of the guest element. If the guest state ID has
> > + * bits that can not be set they will be cleared.
> > + */
> > +static inline int __gse_put_reg(struct gs_buff *gsb, u16 iden, u64 val)
> > +{
> > +     val &= gsid_mask(iden);
> > +     if (gsid_size(iden) == sizeof(u64))
> > +             return gse_put_u64(gsb, iden, val);
> > +
> > +     if (gsid_size(iden) == sizeof(u32)) {
> > +             u32 tmp;
> > +
> > +             tmp = (u32)val;
> > +             if (tmp != val)
> > +                     return -EINVAL;
> > +
> > +             return gse_put_u32(gsb, iden, tmp);
> > +     }
> > +     return -EINVAL;
> > +}
>
> There is a clever accessor that derives the length from the type, but
> then you fall back to this.

It's basically just to massage where we have a kvm representation and
guest state buffer representation mismatch:

Like: unsigned long ccr; being 8 bytes and having 4 byte CR in the spec.

>
> > +/**
> > + * gse_put - add a guest state element to a buffer
> > + * @gsb: guest state buffer to add to
> > + * @iden: guest state identity
> > + * @v: generic value
> > + */
> > +#define gse_put(gsb, iden, v)                                        \
> > +     (_Generic((v),                                          \
> > +               u64 : __gse_put_reg,                          \
> > +               long unsigned int : __gse_put_reg,            \
> > +               u32 : __gse_put_reg,                          \
> > +               struct gs_buff_info : gse_put_buff_info,      \
> > +               struct gs_proc_table : gse_put_proc_table,    \
> > +               struct gs_part_table : gse_put_part_table,    \
> > +               vector128 : gse_put_vector128)(gsb, iden, v))
> > +
> > +/**
> > + * gse_get - return the data of a guest state element
> > + * @gsb: guest state element to add to
> > + * @v: generic value pointer to return in
> > + */
> > +#define gse_get(gse, v)                                              \
> > +     (*v = (_Generic((v),                                    \
> > +                     u64 * : __gse_get_reg,                  \
> > +                     unsigned long * : __gse_get_reg,        \
> > +                     u32 * : __gse_get_reg,                  \
> > +                     vector128 * : gse_get_vector128)(gse)))
>
> I don't see the benefit of this. Caller always knows the type doesn't
> it? It seems like the right function could be called directly. It
> makes the calling convention a bit clunky too. I know there's similar
> precedent for uaccess functions, but not sure I like it for this.

The compiler also knows so I just thought I'd save some typing.
I agree it's kind of ugly, happy to drop it.

[snip]
>
> Should all be GPL exports.
>
> Needs more namespace too, I reckon (not just exports but any kernel-wide
> name this short and non-descriptive needs a kvmppc or kvm_papr or
> something).

Will do.

Thanks,
Jordan

>
> Thanks,
> Nick
>


More information about the Linuxppc-dev mailing list