[PATCH kernel v4 08/10] KVM: PPC: Separate TCE validation from update
Alexey Kardashevskiy
aik at ozlabs.ru
Thu Feb 9 19:20:11 AEDT 2017
On 09/02/17 14:51, David Gibson wrote:
> On Tue, Feb 07, 2017 at 06:17:09PM +1100, Alexey Kardashevskiy wrote:
>> For the emulated devices it does not matter much if we get a broken TCE
>> half way handling a TCE list but for VFIO it will matter as it has
>> more chances to fail so we try to do our best and check as much as we
>> can before proceeding.
>>
>> This separates a guest view table update from validation. No change in
>> behavior is expected.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>> ---
>> arch/powerpc/kvm/book3s_64_vio.c | 8 ++++++++
>> arch/powerpc/kvm/book3s_64_vio_hv.c | 8 ++++++--
>> 2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 15df8ae627d9..9a7b7fca5e84 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -282,6 +282,14 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> ret = kvmppc_tce_validate(stt, tce);
>> if (ret != H_SUCCESS)
>> goto unlock_exit;
>> + }
>> +
>> + for (i = 0; i < npages; ++i) {
>> + if (get_user(tce, tces + i)) {
>> + ret = H_TOO_HARD;
>> + goto unlock_exit;
>> + }
>> + tce = be64_to_cpu(tce);
>
> This doesn't look safe. The contents of user memory could change
> between the two get_user()s, meaning that you're no longer guaranteed
> a TCE loaded into kernel has been validated at all.
>
> I think you need to either:
>
> a) Make sure things safe against a bad TCE being loaded into a TCE
> table and move all validation to where the TCE is used, rather
> than loaded
>
> or
> b) Copy the whole set of indirect entries to a temporary in-kernel
> buffer, then validate, then load into the actual TCE table.
Correct :( The problem is I do not know how far I want to go in reverting
the state as it was when I started handling H_PUT_TCE_INDIRECT.
For example, 1 container, 2 IOMMU groups with disabled shared tables, so -
2 tables, 512 TCEs request and TCE#100 does not translate to host physical
address.
To do a) I'll need to remember old content of each hardware table entry as
when I reach TCE#100, I'll need to revert to the initial state which means
I need to write back old TCEs to all affected hardware tables and update
reference counters of all affected preregistered areas. Well, the actual
tables must not have different addresses (BUG_ON? is it worth testing while
writing to hardware tables that values I am replacing are the same in all
tables?) so I can have just a single array of old TCEs from hardware tables
in vcpu.
To do b) I'll need:
1. to have a copy of TCEs from the guest in vcpu, I populate it via
get_user() to make sure they won't change;
2. an array of userspace addresses translated from given TCEs; and in order
to make sure these addresses won't go away, I'll need to reference each
preregistered memory area via mm_iommu_mapped_inc().
When I reach TCE#100, I'll have to revert the change, i.e. call
mm_iommu_mapped_dec().
So I will end up having 2 arrays in a vcpu and simpler reverting code.
Or I can do simpler version of b) which would store guest TCEs in
kvm_vcpu_arch::tces[512] and use them after checking. If a malicious guest
does something bad and I return from H_PUT_TCE_INDIRECT in a middle of
request, some preregistered regions will stay referenced till the guest is
killed or rebooted (and this will prevent memory from unregistering) - but
this means no harm to the host; and with preregistered RAM, there is no
valid reason for H_PUT_TCE_INDIRECT to fail for a good guest.
Which approach to pick?
LoPAPR says:
===
If the TCE parameter represents the logical page address of a page that is
not valid for the calling partition, return
H_Parameter.
===
>>
>> kvmppc_tce_put(stt, entry + i, tce);
>> }
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 918af76ab2b6..f8a54b7c788e 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -237,7 +237,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> {
>> struct kvmppc_spapr_tce_table *stt;
>> long i, ret = H_SUCCESS;
>> - unsigned long tces, entry, ua = 0;
>> + unsigned long tces, entry, tce, ua = 0;
>> unsigned long *rmap = NULL;
>>
>> stt = kvmppc_find_table(vcpu->kvm, liobn);
>> @@ -279,11 +279,15 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> }
>>
>> for (i = 0; i < npages; ++i) {
>> - unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>> + tce = be64_to_cpu(((u64 *)tces)[i]);
>>
>> ret = kvmppc_tce_validate(stt, tce);
>> if (ret != H_SUCCESS)
>> goto unlock_exit;
>> + }
>> +
>> + for (i = 0; i < npages; ++i) {
>> + tce = be64_to_cpu(((u64 *)tces)[i]);
>
> Same problem here.
>
>>
>> kvmppc_tce_put(stt, entry + i, tce);
>> }
>
--
Alexey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 839 bytes
Desc: OpenPGP digital signature
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20170209/ce796d68/attachment.sig>
More information about the Linuxppc-dev
mailing list