[PATCH V3 09/30] arm/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

Russell King (Oracle) linux at armlinux.org.uk
Tue Mar 1 11:31:28 AEDT 2022


On Tue, Mar 01, 2022 at 05:30:41AM +0530, Anshuman Khandual wrote:
> On 2/28/22 4:27 PM, Russell King (Oracle) wrote:
> > On Mon, Feb 28, 2022 at 04:17:32PM +0530, Anshuman Khandual wrote:
> >> This defines and exports a platform specific custom vm_get_page_prot() via
> >> subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX
> >> macros can be dropped which are no longer needed.
> > 
> > What I would really like to know is why having to run _code_ to work out
> > what the page protections need to be is better than looking it up in a
> > table.
> > 
> > Not only is this more expensive in terms of CPU cycles, it also brings
> > additional code size with it.
> > 
> > I'm struggling to see what the benefit is.
> 
> Currently vm_get_page_prot() is also being _run_ to fetch required page
> protection values. Although that is being run in the core MM and from a
> platform perspective __SXXX, __PXXX are just being exported for a table.
> Looking it up in a table (and applying more constructs there after) is
> not much different than a clean switch case statement in terms of CPU
> usage. So this is not more expensive in terms of CPU cycles.

I disagree.

However, let's base this disagreement on some evidence. Here is the
present 32-bit ARM implementation:

00000048 <vm_get_page_prot>:
      48:       e200000f        and     r0, r0, #15
      4c:       e3003000        movw    r3, #0
                        4c: R_ARM_MOVW_ABS_NC   .LANCHOR1
      50:       e3403000        movt    r3, #0
                        50: R_ARM_MOVT_ABS      .LANCHOR1
      54:       e7930100        ldr     r0, [r3, r0, lsl #2]
      58:       e12fff1e        bx      lr

That is five instructions long.

Please show that your new implementation is not more expensive on
32-bit ARM. Please do so by building a 32-bit kernel, and providing
the disassembly.

I think you will find way more than five instructions in your version -
the compiler will have to issue code to decode the protection bits,
probably using a table of branches or absolute PC values, or possibly
the worst case of using multiple comparisons and branches. It then has
to load constants that may be moved using movw on ARMv7, but on
older architectures would have to be created from multiple instructions
or loaded from the literal pool. Then there'll be instructions to load
the address of "user_pgprot", retrieve its value, and bitwise or that.

Therefore, I fail to see how your approach of getting rid of the table
is somehow "better" than what we currently have in terms of the effect
on the resulting code.

If you don't like the __P and __S stuff and two arch_* hooks, you could
move the table into arch code along with vm_get_page_prot() without the
additional unnecessary hooks, while keeping all the benefits of the
table lookup.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


More information about the Linuxppc-dev mailing list