[PATCH] powerpc-powernv: align BARs to PAGE_SIZE on powernv platform
Alexey Kardashevskiy
aik at ozlabs.ru
Wed Sep 5 10:55:13 EST 2012
On 05/09/12 05:45, Benjamin Herrenschmidt wrote:
> On Tue, 2012-09-04 at 17:36 +1000, Alexey Kardashevskiy wrote:
>> VFIO adds a separate memory region for every BAR and tries
>> to mmap() it to provide direct BAR mapping to the guest.
>> If it succeedes, QEMU registers this address with kvm_set_phys_mem().
>> However it is not always possible because such a BAR should
>> be host page size aligned. In this case VFIO uses "slow" path
>> and emulated BAR access in QEMU.
>>
>> In order to avoid "slow" path, BARs have to be PAGE_SIZE aligned
>> in the host kernel and this is what the patch does.
>>
>> The patch adds powernv platform specific hook which makes all
>> BARs sizes 64K aligned. The pci_reassigndev_resource_alignment()
>> function from drivers/pci/pci.c has been used as a reference.
>>
>> This is purely an optimization patch, the things will work without
>> it, just a bit slower.
>
> It's still bad in more ways that I care to explain...
Well it is right before pci_reassigndev_resource_alignment() which is
common and does the same thing.
> The main one is that you do the "fixup" in a very wrong place anyway and
> it might cause cases of overlapping BARs.
As far as I can tell it may only happen if someone tries to align resource
via kernel command line.
But ok. I trust you :)
> In any case this is wrong. It's a VFIO design bug and needs to be fixed
> there (CC'ing Alex).
It can be fixed in VFIO only if VFIO will stop treating functions
separately and start mapping group's MMIO space as a whole thing. But this
is not going to happen.
The example of the problem is NEC USB PCI which has 3 functions, each has
one BAR, these BARs are 4K aligned and I cannot see how it can be fixed
with 64K page size and VFIO creating memory regions per BAR (not per PHB).
> IE. We need a way to know where the BAR is within a page at which point
> VFIO can still map the page, but can also properly take into account the
> offset.
It is not about VFIO, it is about KVM. I cannot put non-aligned page to
kvm_set_phys_mem(). Cannot understand how we would solve this.
You better discuss it with David, my vocab is weak.
> We also need a way to tell VFIO userspace that it's OK to use the fast
> path for such small BARs. It's not for all host platforms. We know it's
> ok for PowerNV because we know the devices are grouped by PEs and the PE
> granularity is larger than a page but that's not necessarily going to be
> the case on all powerpc platforms that support KVM.
>
> Cheers,
> Ben.
>
>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>> ---
>> arch/powerpc/platforms/powernv/setup.c | 26 ++++++++++++++++++++++++++
>> 1 file changed, 26 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
>> index db1ad1c..331838e 100644
>> --- a/arch/powerpc/platforms/powernv/setup.c
>> +++ b/arch/powerpc/platforms/powernv/setup.c
>> @@ -25,6 +25,7 @@
>> #include <linux/of.h>
>> #include <linux/interrupt.h>
>> #include <linux/bug.h>
>> +#include <linux/pci.h>
>>
>> #include <asm/machdep.h>
>> #include <asm/firmware.h>
>> @@ -179,6 +180,30 @@ static int __init pnv_probe(void)
>> return 1;
>> }
>>
>> +static void pnv_pcibios_fixup_resources(struct pci_dev *pdev)
>> +{
>> + struct resource *r;
>> + int i;
>> +
>> + /*
>> + * Aligning resources to PAGE_SIZE in order to
>> + * support "fast" path for PCI BAR access under VFIO
>> + * which maps every BAR individually to the guest
>> + * so BARs have to be PAGE aligned.
>> + */
>> + for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
>> + r = &pdev->resource[i];
>> + if (!r->flags)
>> + continue;
>> + pr_debug("powernv: %s, aligning BAR#%d %llx..%llx",
>> + pdev->dev.kobj.name, i, r->start, r->end);
>> + r->end = PAGE_ALIGN(r->end - r->start + 1) - 1;
>> + r->start = 0;
>> + r->flags |= IORESOURCE_UNSET;
>> + pr_debug(" to %llx..%llx\n", r->start, r->end);
>> + }
>> +}
>> +
>> define_machine(powernv) {
>> .name = "PowerNV",
>> .probe = pnv_probe,
>> @@ -189,6 +214,7 @@ define_machine(powernv) {
>> .progress = pnv_progress,
>> .power_save = power7_idle,
>> .calibrate_decr = generic_calibrate_decr,
>> + .pcibios_fixup_resources= pnv_pcibios_fixup_resources,
>> #ifdef CONFIG_KEXEC
>> .kexec_cpu_down = pnv_kexec_cpu_down,
>> #endif
>
>
--
Alexey
More information about the Linuxppc-dev
mailing list