[PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.

Tue Dec 3 13:15:04 AEDT 2019

On 03/12/2019 13:08, Ram Pai wrote:
> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 02/12/2019 17:45, Ram Pai wrote:
>>> H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as one of
>>> its parameters. One page is dedicated per cpu, for the lifetime of the
>>> kernel for this purpose. On secure VMs, contents of this page, when
>>> accessed by the hypervisor, retrieves encrypted TCE entries.  Hypervisor
>>> needs to know the unencrypted entries, to update the TCE table
>>> accordingly.  There is nothing secret or sensitive about these entries.
>>> Hence share the page with the hypervisor.
>>
>> This unsecures a page in the guest in a random place which creates an
>> additional attack surface which is hard to exploit indeed but
>> nevertheless it is there.
>> A safer option would be not to use the
>> hcall-multi-tce hyperrtas option (which translates FW_FEATURE_MULTITCE
>> in the guest).
> 
> 
> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets invoked
> automatically when IOMMU option is enabled.

It is advertised by QEMU but the guest does not have to use it.

> This happens even
> on a normal VM when IOMMU is enabled.
> 
> 
>>
>> Also what is this for anyway? 
> 
> This is for sending indirect-TCE entries to the hypervisor.
> The hypervisor must be able to read those TCE entries, so that it can 
> use those entires to populate the TCE table with the correct mappings.
> 
>> if I understand things right, you cannot
>> map any random guest memory, you should only be mapping that 64MB-ish
>> bounce buffer array but 1) I do not see that happening (I may have
>> missed it) 2) it should be done once and it takes a little time for
>> whatever memory size we allow for bounce buffers anyway. Thanks,
> 
> Any random guest memory can be shared by the guest. 

Yes but we do not want this to be this random. I thought the whole idea
of swiotlb was to restrict the amount of shared memory to bare minimum,
what do I miss?

> Maybe you are confusing this with the SWIOTLB bounce buffers used by PCI
> devices, to transfer data to the hypervisor?

Is not this for pci+swiotlb? The cover letter suggests it is for
virtio-scsi-_pci_ with 	iommu_platform=on which makes it a normal pci
device just like emulated XHCI. Thanks,

> 
>>
>>
>>>
>>> Signed-off-by: Ram Pai <linuxram at us.ibm.com>
>>> ---
>>>  arch/powerpc/platforms/pseries/iommu.c | 23 ++++++++++++++++++++---
>>>  1 file changed, 20 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index 6ba081d..0720831 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -37,6 +37,7 @@
>>>  #include <asm/mmzone.h>
>>>  #include <asm/plpar_wrappers.h>
>>>  #include <asm/svm.h>
>>> +#include <asm/ultravisor.h>
>>>  
>>>  #include "pseries.h"
>>>  
>>> @@ -179,6 +180,23 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
>>>  
>>>  static DEFINE_PER_CPU(__be64 *, tce_page);
>>>  
>>> +/*
>>> + * Allocate a tce page.  If secure VM, share the page with the hypervisor.
>>> + *
>>> + * NOTE: the TCE page is shared with the hypervisor explicitly and remains
>>> + * shared for the lifetime of the kernel. It is implicitly unshared at kernel
>>> + * shutdown through a UV_UNSHARE_ALL_PAGES ucall.
>>> + */
>>> +static __be64 *alloc_tce_page(void)
>>> +{
>>> +	__be64 *tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> +
>>> +	if (tcep && is_secure_guest())
>>> +		uv_share_page(PHYS_PFN(__pa(tcep)), 1);
>>> +
>>> +	return tcep;
>>> +}
>>> +
>>>  static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>>>  				     long npages, unsigned long uaddr,
>>>  				     enum dma_data_direction direction,
>>> @@ -206,8 +224,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>>>  	 * from iommu_alloc{,_sg}()
>>>  	 */
>>>  	if (!tcep) {
>>> -		tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> -		/* If allocation fails, fall back to the loop implementation */
>>> +		tcep = alloc_tce_page();
>>>  		if (!tcep) {
>>>  			local_irq_restore(flags);
>>>  			return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
>>> @@ -405,7 +422,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
>>>  	tcep = __this_cpu_read(tce_page);
>>>  
>>>  	if (!tcep) {
>>> -		tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
>>> +		tcep = alloc_tce_page();
>>>  		if (!tcep) {
>>>  			local_irq_enable();
>>>  			return -ENOMEM;
>>>
>>
>> -- 
>> Alexey
> 

-- 
Alexey