[PATCH] powerpc: workaround clang codegen bug in dcbz

Michael Ellerman mpe at ellerman.id.au
Tue Jul 30 21:17:43 AEST 2019


Arnd Bergmann <arnd at arndb.de> writes:
> On Mon, Jul 29, 2019 at 11:52 PM Segher Boessenkool
> <segher at kernel.crashing.org> wrote:
>> On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote:
>> > For the record:
>> >
>> > https://godbolt.org/z/z57VU7
>> >
>> > This seems consistent with what Michael found so I don't think a revert
>> > is entirely unreasonable.
>>
>> Try this:
>>
>>   https://godbolt.org/z/6_ZfVi
>>
>> This matters in non-trivial loops, for example.  But all current cases
>> where such non-trivial loops are done with cache block instructions are
>> actually written in real assembler already, using two registers.
>> Because performance matters.  Not that I recommend writing code as
>> critical as memset in C with inline asm :-)
>
> Upon a second look, I think the issue is that the "Z" is an input argument
> when it should be an output. clang decides that it can make a copy of the
> input and pass that into the inline asm. This is not the most efficient
> way, but it seems entirely correct according to the constraints.
>
> Changing it to an output "=Z" constraint seems to make it work:
>
> https://godbolt.org/z/FwEqHf
>
> Clang still doesn't use the optimum form, but it passes the correct pointer.

Thanks Arnd. This seems like a better solution.

I'll drop the revert I have staged.

Segher does this look OK to you?

Nathan/Nick, are one of you able to test this with your clang CI?

cheers


More information about the Linuxppc-dev mailing list