[PATCH 1/6] powerpc/code-patching: Implement generic text patching function

Tue Sep 20 15:44:08 AEST 2022

Le 20/09/2022 à 04:32, Benjamin Gray a écrit :
> On Mon, 2022-09-19 at 07:16 +0000, Christophe Leroy wrote:
>> Why would it be unpredictable ? Only one page is mapped. If it
>> crosses
>> the boundary, __put_kernel_nofault() will fail in a controled manner.
>> I see no point in doing the check before every write.
> 
> Oh I didn't see that get_vm_area automatically adds a guard. You're
> right then, it's redundant. I had assumed there could be a writeable
> mapping directly after.
> 
>> And while you are thinking about alignment, don't forget that dcbst
>> and
>> icbi apply on a give cacheline. If your memory crosses a cacheline
>> you
>> may have a problem.
> 
> Yeah, though this applies to the existing patch_instruction too (in
> theory; prefixed instructions cannot cross a 64 byte boundary, but the
> ISA does not specify minimum cacheline size). I don't have a nice
> solution right now though. The flush needs to be done on the effective
> address (i.e. text poke address) according to the ISA, but the text
> poke address is only valid within the IRQ save region. So non-prefixed
> instruction patching would either pay for some kind of check, or need
> special casing.

As far as I know, cachelines are minimum 64 bytes on PPC64 aren't they ?

> 
> Maybe an "is aligned" flag in a generic __patch_text to make the extra
> flush conditional is good enough?

Well, if the cacheline is already flushed, the operation will be a nop 
anyway so it shouldn't cost much to flush both addresses, after all it 
is the sync that costs and you'll still have only one.

> 
> 
> Related to EA based flushing, data patching ought to run 'dcbst' on the
> 'exec_addr' too. So given the icache flush only needs to be applied to
> instruction patching, and data flush only to data patching, I plan to
> move those exec_addr syncs outside of __patch_text to the relevant
> public instruction/data specific entry points.

Why should it run 'dcbst' on the 'exec_addr' at all ? We have not 
written anything there.

Anyway, powerpc handles cachelines by physical address, so no matter 
which EA you use as far as it is done.

And dcbst is handled as a write by the MMU, so you just can't apply it 
on the read-only exec address.

> 
>>>>> +               case 8:
>>>>> +                       __put_kernel_nofault(dest, src, u64,
>>>>> failed);
>>>>> +                       break;
>>>>
>>>> Is case 8 needed for PPC32 ?
>>>
>>> I don't have a particular need for it, but the underlying
>>> __put_kernel_nofault is capable of it so I included it.
>>
>> Well, not including it will allow you to pass the source as a 'long'
>> as
>> mentionned above.
> 
> I checked the machine code of a 32 bit build, but it still passes the
> pointer in a register. I also checked all 3 ABI docs and they say a
> pointer is the same size as a long. Could you clarify when a pointer is
> passed on the stack but not a long?
> 
> Or do you mean that we could pass the pointed-to data in a register and
> skip the pointer altogether? That seems like a good choice, but
> __put_kernel_nofault takes a pointer to the source and the
> implementation is very complex. I don't think I can safely write the
> modified version we'd need at this point.

Yes I meant to pass the value instead of passing a pointer to the value.
When you pass a pointer to a value, it forces gcc to put that value in 
memory, namely in the stack. While when you pass the value directly , 
then it gets passed in a register.

So I think you have to pass the value and only change it to a pointer to 
that value at the time you are calling __put_kernel_nofault(). That way 
gcc is able to handle it efficiently and most of the time voids going 
via memory.

Today raw_patch_instruction() is :

c0017ebc <raw_patch_instruction>:
c0017ebc:	90 83 00 00 	stw     r4,0(r3)
c0017ec0:	7c 00 18 6c 	dcbst   0,r3
c0017ec4:	7c 00 04 ac 	hwsync
c0017ec8:	7c 00 1f ac 	icbi    0,r3
c0017ecc:	7c 00 04 ac 	hwsync
c0017ed0:	4c 00 01 2c 	isync
c0017ed4:	38 60 00 00 	li      r3,0
c0017ed8:	4e 80 00 20 	blr
c0017edc:	38 60 ff ff 	li      r3,-1
c0017ee0:	4e 80 00 20 	blr

Here r4 is the value to be written.

With your patch, extract from __patch_text() is:

c0017fdc:	28 05 00 04 	cmplwi  r5,4
c0017fe0:	41 82 00 74 	beq     c0018054 <__patch_text+0x98>
c0017fe4:	41 81 00 40 	bgt     c0018024 <__patch_text+0x68>
c0017fe8:	28 05 00 01 	cmplwi  r5,1
c0017fec:	41 82 00 74 	beq     c0018060 <__patch_text+0xa4>
c0017ff0:	28 05 00 02 	cmplwi  r5,2
c0017ff4:	40 82 00 0c 	bne     c0018000 <__patch_text+0x44>
c0017ff8:	a1 24 00 00 	lhz     r9,0(r4)
c0017ffc:	b1 23 00 00 	sth     r9,0(r3)
c0018000:	7c 00 18 6c 	dcbst   0,r3
c0018004:	7c 00 04 ac 	hwsync
c0018008:	2c 06 00 00 	cmpwi   r6,0
c001800c:	38 60 00 00 	li      r3,0
c0018010:	4d 82 00 20 	beqlr
c0018014:	7c 00 3f ac 	icbi    0,r7
c0018018:	7c 00 04 ac 	hwsync
c001801c:	4c 00 01 2c 	isync
c0018020:	4e 80 00 20 	blr
c0018024:	28 05 00 08 	cmplwi  r5,8
c0018028:	40 a2 ff d8 	bne     c0018000 <__patch_text+0x44>
c001802c:	81 44 00 00 	lwz     r10,0(r4)
c0018030:	81 64 00 04 	lwz     r11,4(r4)
c0018034:	91 43 00 00 	stw     r10,0(r3)
c0018038:	91 63 00 04 	stw     r11,4(r3)
c001803c:	7c 00 18 6c 	dcbst   0,r3
c0018040:	7c 00 04 ac 	hwsync
c0018044:	2c 06 00 00 	cmpwi   r6,0
c0018048:	38 60 00 00 	li      r3,0
c001804c:	4d 82 00 20 	beqlr
c0018050:	4b ff ff c4 	b       c0018014 <__patch_text+0x58>
c0018054:	81 24 00 00 	lwz     r9,0(r4)
c0018058:	91 23 00 00 	stw     r9,0(r3)
c001805c:	4b ff ff a4 	b       c0018000 <__patch_text+0x44>
c0018060:	89 24 00 00 	lbz     r9,0(r4)
c0018064:	99 23 00 00 	stb     r9,0(r3)
c0018068:	4b ff ff 98 	b       c0018000 <__patch_text+0x44>
c001806c:	38 60 ff ff 	li      r3,-1
c0018070:	4e 80 00 20 	blr
c0018074:	38 60 ff f2 	li      r3,-14
c0018078:	4e 80 00 20 	blr

So as you can see, r4 now is a memory pointer and the data has to be 
loaded from there.

Christophe