[PATCH kernel v4 08/10] KVM: PPC: Separate TCE validation from update

Alexey Kardashevskiy aik at ozlabs.ru
Thu Feb 9 19:20:11 AEDT 2017


On 09/02/17 14:51, David Gibson wrote:
> On Tue, Feb 07, 2017 at 06:17:09PM +1100, Alexey Kardashevskiy wrote:
>> For the emulated devices it does not matter much if we get a broken TCE
>> half way handling a TCE list but for VFIO it will matter as it has
>> more chances to fail so we try to do our best and check as much as we
>> can before proceeding.
>>
>> This separates a guest view table update from validation. No change in
>> behavior is expected.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>> ---
>>  arch/powerpc/kvm/book3s_64_vio.c    | 8 ++++++++
>>  arch/powerpc/kvm/book3s_64_vio_hv.c | 8 ++++++--
>>  2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 15df8ae627d9..9a7b7fca5e84 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -282,6 +282,14 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  		ret = kvmppc_tce_validate(stt, tce);
>>  		if (ret != H_SUCCESS)
>>  			goto unlock_exit;
>> +	}
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		if (get_user(tce, tces + i)) {
>> +			ret = H_TOO_HARD;
>> +			goto unlock_exit;
>> +		}
>> +		tce = be64_to_cpu(tce);
> 
> This doesn't look safe.  The contents of user memory could change
> between the two get_user()s, meaning that you're no longer guaranteed
> a TCE loaded into kernel has been validated at all.
> 
> I think you need to either:
> 
>     a) Make sure things safe against a bad TCE being loaded into a TCE
>     table and move all validation to where the TCE is used, rather
>     than loaded
> 
> or
>     b) Copy the whole set of indirect entries to a temporary in-kernel
>        buffer, then validate, then load into the actual TCE table.


Correct :( The problem is I do not know how far I want to go in reverting
the state as it was when I started handling H_PUT_TCE_INDIRECT.

For example, 1 container, 2 IOMMU groups with disabled shared tables, so -
2 tables, 512 TCEs request and TCE#100 does not translate to host physical
address.


To do a) I'll need to remember old content of each hardware table entry as
when I reach TCE#100, I'll need to revert to the initial state which means
I need to write back old TCEs to all affected hardware tables and update
reference counters of all affected preregistered areas. Well, the actual
tables must not have different addresses (BUG_ON? is it worth testing while
writing to hardware tables that values I am replacing are the same in all
tables?) so I can have just a single array of old TCEs from hardware tables
in vcpu.


To do b) I'll need:

1. to have a copy of TCEs from the guest in vcpu, I populate it via
get_user() to make sure they won't change;
2. an array of userspace addresses translated from given TCEs; and in order
to make sure these addresses won't go away, I'll need to reference each
preregistered memory area via mm_iommu_mapped_inc().

When I reach TCE#100, I'll have to revert the change, i.e. call
mm_iommu_mapped_dec().

So I will end up having 2 arrays in a vcpu and simpler reverting code.


Or I can do simpler version of b) which would store guest TCEs in
kvm_vcpu_arch::tces[512] and use them after checking. If a malicious guest
does something bad and I return from H_PUT_TCE_INDIRECT in a middle of
request, some preregistered regions will stay referenced till the guest is
killed or rebooted (and this will prevent memory from unregistering) - but
this means no harm to the host; and with preregistered RAM, there is no
valid reason for H_PUT_TCE_INDIRECT to fail for a good guest.



Which approach to pick?


LoPAPR says:
===
If the TCE parameter represents the logical page address of a page that is
not valid for the calling partition, return
H_Parameter.
===



>>  
>>  		kvmppc_tce_put(stt, entry + i, tce);
>>  	}
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 918af76ab2b6..f8a54b7c788e 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -237,7 +237,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  {
>>  	struct kvmppc_spapr_tce_table *stt;
>>  	long i, ret = H_SUCCESS;
>> -	unsigned long tces, entry, ua = 0;
>> +	unsigned long tces, entry, tce, ua = 0;
>>  	unsigned long *rmap = NULL;
>>  
>>  	stt = kvmppc_find_table(vcpu->kvm, liobn);
>> @@ -279,11 +279,15 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  	}
>>  
>>  	for (i = 0; i < npages; ++i) {
>> -		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>> +		tce = be64_to_cpu(((u64 *)tces)[i]);
>>  
>>  		ret = kvmppc_tce_validate(stt, tce);
>>  		if (ret != H_SUCCESS)
>>  			goto unlock_exit;
>> +	}
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		tce = be64_to_cpu(((u64 *)tces)[i]);
> 
> Same problem here.
> 
>>  
>>  		kvmppc_tce_put(stt, entry + i, tce);
>>  	}
> 


-- 
Alexey

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 839 bytes
Desc: OpenPGP digital signature
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20170209/ce796d68/attachment.sig>


More information about the Linuxppc-dev mailing list