[RFC PATCH v4 4/7] PCI: Modify resource_alignment to support multiple devices

Yongji Xie xyjxie at linux.vnet.ibm.com
Thu Mar 17 22:28:34 AEDT 2016


On 2016/3/17 0:30, Alex Williamson wrote:
> On Mon,  7 Mar 2016 15:48:35 +0800
> Yongji Xie <xyjxie at linux.vnet.ibm.com> wrote:
>
>> When vfio passthrough a PCI device of which MMIO BARs
>> are smaller than PAGE_SIZE, guest will not handle the
>> mmio accesses to the BARs which leads to mmio emulations
>> in host.
>>
>> This is because vfio will not allow to passthrough one
>> BAR's mmio page which may be shared with other BARs.
>>
>> To solve this performance issue, this patch modifies
>> resource_alignment to support syntax where multiple
>> devices get the same alignment. So we can use something
>> like "pci=resource_alignment=*:*:*.*:noresize" to
>> enforce the alignment of all MMIO BARs to be at least
>> PAGE_SIZE so that one BAR's mmio page would not be
>> shared with other BARs.
>>
>> Signed-off-by: Yongji Xie <xyjxie at linux.vnet.ibm.com>
>> ---
>>   Documentation/kernel-parameters.txt |    2 +
>>   drivers/pci/pci.c                   |   90 ++++++++++++++++++++++++++++++-----
>>   include/linux/pci.h                 |    4 ++
>>   3 files changed, 85 insertions(+), 11 deletions(-)
>>
>> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
>> index 8028631..74b38ab 100644
>> --- a/Documentation/kernel-parameters.txt
>> +++ b/Documentation/kernel-parameters.txt
>> @@ -2918,6 +2918,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>>   				aligned memory resources.
>>   				If <order of align> is not specified,
>>   				PAGE_SIZE is used as alignment.
>> +				<domain>, <bus>, <slot> and <func> can be set to
>> +				"*" which means match all values.
> I don't see anywhere that you're automatically enabling this for your
> platform, so presumably you're expecting users to determine on their
> own that they have a performance problem and hoping that they'll figure
> out that they need to use this option to resolve it.  The first irate
> question you'll get back is why doesn't this happen automatically?

Actually, I just want to make the code simple. Maybe it is
not a good idea to let users enable this option manually.
I will try to fix it like that:

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 6f8065a..6659752 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -30,6 +30,8 @@
  #define PCIBIOS_MIN_IO         0x1000
  #define PCIBIOS_MIN_MEM                0x10000000

+#define PCIBIOS_MIN_ALIGNMENT  PAGE_SIZE
+
  struct pci_dev;

  /* Values for the `which' argument to sys_pciconfig_iobase syscall.  */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index dadd28a..9f644e4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4593,6 +4593,8 @@ EXPORT_SYMBOL_GPL(pci_ignore_hotplug);
  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
  static DEFINE_SPINLOCK(resource_alignment_lock);

+#define DISABLE_ARCH_ALIGNMENT -1
+#define DEFAULT_ALIGNMENT      -2
  /**
   * pci_specified_resource_alignment - get resource alignment specified 
by user.
   * @dev: the PCI device to get
@@ -4609,6 +4611,9 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
         char *p;
         bool invalid = false;

+#ifdef PCIBIOS_MIN_ALIGNMENT
+       align = PCIBIOS_MIN_ALIGNMENT;
+#endif
         spin_lock(&resource_alignment_lock);
         p = resource_alignment_param;
         while (*p) {
@@ -4617,7 +4622,7 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
                                                         p[count] == '@') {
                         p += count + 1;
                 } else {
-                       align_order = -1;
+                       align_order = DEFAULT_ALIGNMENT;
                 }
                 if (p[0] == '*' && p[1] == ':') {
                         seg = -1;
@@ -4673,8 +4678,10 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev,
                         (bus == dev->bus->number || bus == -1) &&
                         (slot == PCI_SLOT(dev->devfn) || slot == -1) &&
                         (func == PCI_FUNC(dev->devfn) || func == -1)) {
-                       if (align_order == -1)
+                       if (align_order == DEFAULT_ALIGNMENT)
                                 align = PAGE_SIZE;
+                       else if (align_order == DISABLE_ARCH_ALIGNMENT)
+                               align = 0;
                         else
                                 align = 1 << align_order;
                         if (!pci_resources_page_aligned &&
>>   				PCI-PCI bridge can be specified, if resource
>>   				windows need to be expanded.
>>   				noresize: Don't change the resources' sizes when
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 760cce5..44ab59f 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -102,6 +102,8 @@ unsigned int pcibios_max_latency = 255;
>>   /* If set, the PCIe ARI capability will not be used. */
>>   static bool pcie_ari_disabled;
>>   
>> +bool pci_resources_page_aligned;
>> +
>>   /**
>>    * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children
>>    * @bus: pointer to PCI bus structure to search
>> @@ -4604,6 +4606,7 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>>   	int seg, bus, slot, func, align_order, count;
>>   	resource_size_t align = 0;
>>   	char *p;
>> +	bool invalid = false;
>>   
>>   	spin_lock(&resource_alignment_lock);
>>   	p = resource_alignment_param;
>> @@ -4615,16 +4618,49 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>>   		} else {
>>   			align_order = -1;
>>   		}
>> -		if (sscanf(p, "%x:%x:%x.%x%n",
>> -			&seg, &bus, &slot, &func, &count) != 4) {
>> +		if (p[0] == '*' && p[1] == ':') {
>> +			seg = -1;
>> +			count = 1;
>> +		} else if (sscanf(p, "%x%n", &seg, &count) != 1 ||
>> +				p[count] != ':') {
>> +			invalid = true;
>> +			break;
>> +		}
>> +		p += count + 1;
>> +		if (*p == '*') {
>> +			bus = -1;
>> +			count = 1;
>> +		} else if (sscanf(p, "%x%n", &bus, &count) != 1) {
>> +			invalid = true;
>> +			break;
>> +		}
>> +		p += count;
>> +		if (*p == '.') {
>> +			slot = bus;
>> +			bus = seg;
>>   			seg = 0;
>> -			if (sscanf(p, "%x:%x.%x%n",
>> -					&bus, &slot, &func, &count) != 3) {
>> -				/* Invalid format */
>> -				printk(KERN_ERR "PCI: Can't parse resource_alignment parameter: %s\n",
>> -					p);
>> +			p++;
>> +		} else if (*p == ':') {
>> +			p++;
>> +			if (p[0] == '*' && p[1] == '.') {
>> +				slot = -1;
>> +				count = 1;
>> +			} else if (sscanf(p, "%x%n", &slot, &count) != 1 ||
>> +					p[count] != '.') {
>> +				invalid = true;
>>   				break;
>>   			}
>> +			p += count + 1;
>> +		} else {
>> +			invalid = true;
>> +			break;
>> +		}
>> +		if (*p == '*') {
>> +			func = -1;
>> +			count = 1;
>> +		} else if (sscanf(p, "%x%n", &func, &count) != 1) {
>> +			invalid = true;
>> +			break;
>>   		}
>>   		p += count;
>>   		if (!strncmp(p, ":noresize", 9)) {
>> @@ -4632,23 +4668,34 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>>   			p += 9;
>>   		} else
>>   			*resize = true;
>> -		if (seg == pci_domain_nr(dev->bus) &&
>> -			bus == dev->bus->number &&
>> -			slot == PCI_SLOT(dev->devfn) &&
>> -			func == PCI_FUNC(dev->devfn)) {
>> +		if ((seg == pci_domain_nr(dev->bus) || seg == -1) &&
>> +			(bus == dev->bus->number || bus == -1) &&
>> +			(slot == PCI_SLOT(dev->devfn) || slot == -1) &&
>> +			(func == PCI_FUNC(dev->devfn) || func == -1)) {
>>   			if (align_order == -1)
>>   				align = PAGE_SIZE;
>>   			else
>>   				align = 1 << align_order;
>> +			if (!pci_resources_page_aligned &&
>> +				(align >= PAGE_SIZE &&
>> +				seg == -1 && bus == -1 &&
>> +				slot == -1 && func == -1))
>> +				pci_resources_page_aligned = true;
>>   			/* Found */
>>   			break;
>>   		}
>>   		if (*p != ';' && *p != ',') {
>>   			/* End of param or invalid format */
>> +			invalid = true;
>>   			break;
>>   		}
>>   		p++;
>>   	}
>> +	if (invalid == true) {
>> +		/* Invalid format */
>> +		printk(KERN_ERR "PCI: Can't parse resource_alignment parameter:%s\n",
>> +				p);
>> +	}
>>   	spin_unlock(&resource_alignment_lock);
>>   	return align;
>>   }
>> @@ -4769,6 +4816,27 @@ static int __init pci_resource_alignment_sysfs_init(void)
>>   }
>>   late_initcall(pci_resource_alignment_sysfs_init);
>>   
>> +/*
>> + * This function checks whether PCI BARs' mmio page will be shared
>> + * with other BARs.
>> + */
>> +bool pci_resources_share_page(struct pci_dev *dev, int resno)
>> +{
>> +	struct resource *res = dev->resource + resno;
>> +
>> +	if (resource_size(res) >= PAGE_SIZE)
>> +		return false;
>> +	if (pci_resources_page_aligned && !(res->start & ~PAGE_MASK) &&
>> +		res->flags & IORESOURCE_MEM) {
>> +		if (res->sibling)
>> +			return (res->sibling->start & ~PAGE_MASK);
>> +		else
>> +			return false;
>> +	}
>> +	return true;
>> +}
>> +EXPORT_SYMBOL_GPL(pci_resources_share_page);
> This should be a separate patch, there's no mention of this part of the
> change at all in the commitlog.  Also, pci_resource_page_aligned is
> only set if we use the magic wildcards to set alignment for all
> devices, it's not set if we use a specific seg:bus:dev.fn.  That's
> not consistent.  Can't we make use of the STARTALIGN flag for this or
> did that get removed when resources were assigned?

I only set pci_resource_page_aligned when using magic wildcards
because I must make sure pci_resources_share_page() can return
correctly when a hotplug event happens.

For example, there is only one half-page resource in the system.
And we use a specific seg:bus:dev.fn to force it to be page aligned.
So we set pci_resource_page_aligned because all devices are page
aligned now. It will return false when we call
pci_resources_share_page() for this resource. But this return value
will be not correct if we hot add a new device which also have
a half-page resource which will share one page with the previous
resource.

> The test itself is not entirely reassuring, I'd like some positive
> indication that the device has been aligned, not simply that it should
> have been and the start alignment appears that it might have happened.
> Apparently you don't trust pci_resources_page_aligned either since you
> still test the start alignment.

Yes, I don't trust pci_resources_page_aligned. Because
resource_alignment will not work in some case, e.g.
PCI_PROBE_ONLY or IORESOURCE_PCI_FIXED is set.

But I think *start* of the resource can give the positive
indication. The resource should not share page with
others if the start of the resource and its sibling are both
page aligned.

>> +
>>   static void pci_no_domains(void)
>>   {
>>   #ifdef CONFIG_PCI_DOMAINS
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 2771625..064a1b6 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1511,6 +1511,10 @@ static inline int pci_get_new_domain_nr(void) { return -ENOSYS; }
>>   	 (pci_resource_end((dev), (bar)) -		\
>>   	  pci_resource_start((dev), (bar)) + 1))
>>   
>> +extern bool pci_resources_page_aligned;
> I'm not seeing why this shouldn't have been static since you're
> providing a function that tests it, there shouldn't really be any
> external consumers.

I made a mistake here. I will fix it in next version.

Thanks,
Yongji Xie.



More information about the Linuxppc-dev mailing list