[PATCH 1/6] powerpc/code-patching: Implement generic text patching function
Christophe Leroy
christophe.leroy at csgroup.eu
Tue Sep 20 15:44:08 AEST 2022
Le 20/09/2022 à 04:32, Benjamin Gray a écrit :
> On Mon, 2022-09-19 at 07:16 +0000, Christophe Leroy wrote:
>> Why would it be unpredictable ? Only one page is mapped. If it
>> crosses
>> the boundary, __put_kernel_nofault() will fail in a controled manner.
>> I see no point in doing the check before every write.
>
> Oh I didn't see that get_vm_area automatically adds a guard. You're
> right then, it's redundant. I had assumed there could be a writeable
> mapping directly after.
>
>> And while you are thinking about alignment, don't forget that dcbst
>> and
>> icbi apply on a give cacheline. If your memory crosses a cacheline
>> you
>> may have a problem.
>
> Yeah, though this applies to the existing patch_instruction too (in
> theory; prefixed instructions cannot cross a 64 byte boundary, but the
> ISA does not specify minimum cacheline size). I don't have a nice
> solution right now though. The flush needs to be done on the effective
> address (i.e. text poke address) according to the ISA, but the text
> poke address is only valid within the IRQ save region. So non-prefixed
> instruction patching would either pay for some kind of check, or need
> special casing.
As far as I know, cachelines are minimum 64 bytes on PPC64 aren't they ?
>
> Maybe an "is aligned" flag in a generic __patch_text to make the extra
> flush conditional is good enough?
Well, if the cacheline is already flushed, the operation will be a nop
anyway so it shouldn't cost much to flush both addresses, after all it
is the sync that costs and you'll still have only one.
>
>
> Related to EA based flushing, data patching ought to run 'dcbst' on the
> 'exec_addr' too. So given the icache flush only needs to be applied to
> instruction patching, and data flush only to data patching, I plan to
> move those exec_addr syncs outside of __patch_text to the relevant
> public instruction/data specific entry points.
Why should it run 'dcbst' on the 'exec_addr' at all ? We have not
written anything there.
Anyway, powerpc handles cachelines by physical address, so no matter
which EA you use as far as it is done.
And dcbst is handled as a write by the MMU, so you just can't apply it
on the read-only exec address.
>
>>>>> + case 8:
>>>>> + __put_kernel_nofault(dest, src, u64,
>>>>> failed);
>>>>> + break;
>>>>
>>>> Is case 8 needed for PPC32 ?
>>>
>>> I don't have a particular need for it, but the underlying
>>> __put_kernel_nofault is capable of it so I included it.
>>
>> Well, not including it will allow you to pass the source as a 'long'
>> as
>> mentionned above.
>
> I checked the machine code of a 32 bit build, but it still passes the
> pointer in a register. I also checked all 3 ABI docs and they say a
> pointer is the same size as a long. Could you clarify when a pointer is
> passed on the stack but not a long?
>
> Or do you mean that we could pass the pointed-to data in a register and
> skip the pointer altogether? That seems like a good choice, but
> __put_kernel_nofault takes a pointer to the source and the
> implementation is very complex. I don't think I can safely write the
> modified version we'd need at this point.
Yes I meant to pass the value instead of passing a pointer to the value.
When you pass a pointer to a value, it forces gcc to put that value in
memory, namely in the stack. While when you pass the value directly ,
then it gets passed in a register.
So I think you have to pass the value and only change it to a pointer to
that value at the time you are calling __put_kernel_nofault(). That way
gcc is able to handle it efficiently and most of the time voids going
via memory.
Today raw_patch_instruction() is :
c0017ebc <raw_patch_instruction>:
c0017ebc: 90 83 00 00 stw r4,0(r3)
c0017ec0: 7c 00 18 6c dcbst 0,r3
c0017ec4: 7c 00 04 ac hwsync
c0017ec8: 7c 00 1f ac icbi 0,r3
c0017ecc: 7c 00 04 ac hwsync
c0017ed0: 4c 00 01 2c isync
c0017ed4: 38 60 00 00 li r3,0
c0017ed8: 4e 80 00 20 blr
c0017edc: 38 60 ff ff li r3,-1
c0017ee0: 4e 80 00 20 blr
Here r4 is the value to be written.
With your patch, extract from __patch_text() is:
c0017fdc: 28 05 00 04 cmplwi r5,4
c0017fe0: 41 82 00 74 beq c0018054 <__patch_text+0x98>
c0017fe4: 41 81 00 40 bgt c0018024 <__patch_text+0x68>
c0017fe8: 28 05 00 01 cmplwi r5,1
c0017fec: 41 82 00 74 beq c0018060 <__patch_text+0xa4>
c0017ff0: 28 05 00 02 cmplwi r5,2
c0017ff4: 40 82 00 0c bne c0018000 <__patch_text+0x44>
c0017ff8: a1 24 00 00 lhz r9,0(r4)
c0017ffc: b1 23 00 00 sth r9,0(r3)
c0018000: 7c 00 18 6c dcbst 0,r3
c0018004: 7c 00 04 ac hwsync
c0018008: 2c 06 00 00 cmpwi r6,0
c001800c: 38 60 00 00 li r3,0
c0018010: 4d 82 00 20 beqlr
c0018014: 7c 00 3f ac icbi 0,r7
c0018018: 7c 00 04 ac hwsync
c001801c: 4c 00 01 2c isync
c0018020: 4e 80 00 20 blr
c0018024: 28 05 00 08 cmplwi r5,8
c0018028: 40 a2 ff d8 bne c0018000 <__patch_text+0x44>
c001802c: 81 44 00 00 lwz r10,0(r4)
c0018030: 81 64 00 04 lwz r11,4(r4)
c0018034: 91 43 00 00 stw r10,0(r3)
c0018038: 91 63 00 04 stw r11,4(r3)
c001803c: 7c 00 18 6c dcbst 0,r3
c0018040: 7c 00 04 ac hwsync
c0018044: 2c 06 00 00 cmpwi r6,0
c0018048: 38 60 00 00 li r3,0
c001804c: 4d 82 00 20 beqlr
c0018050: 4b ff ff c4 b c0018014 <__patch_text+0x58>
c0018054: 81 24 00 00 lwz r9,0(r4)
c0018058: 91 23 00 00 stw r9,0(r3)
c001805c: 4b ff ff a4 b c0018000 <__patch_text+0x44>
c0018060: 89 24 00 00 lbz r9,0(r4)
c0018064: 99 23 00 00 stb r9,0(r3)
c0018068: 4b ff ff 98 b c0018000 <__patch_text+0x44>
c001806c: 38 60 ff ff li r3,-1
c0018070: 4e 80 00 20 blr
c0018074: 38 60 ff f2 li r3,-14
c0018078: 4e 80 00 20 blr
So as you can see, r4 now is a memory pointer and the data has to be
loaded from there.
Christophe
More information about the Linuxppc-dev
mailing list