[PATCH] powerpc: workaround clang codegen bug in dcbz
Michael Ellerman
mpe at ellerman.id.au
Tue Jul 30 21:17:43 AEST 2019
Arnd Bergmann <arnd at arndb.de> writes:
> On Mon, Jul 29, 2019 at 11:52 PM Segher Boessenkool
> <segher at kernel.crashing.org> wrote:
>> On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote:
>> > For the record:
>> >
>> > https://godbolt.org/z/z57VU7
>> >
>> > This seems consistent with what Michael found so I don't think a revert
>> > is entirely unreasonable.
>>
>> Try this:
>>
>> https://godbolt.org/z/6_ZfVi
>>
>> This matters in non-trivial loops, for example. But all current cases
>> where such non-trivial loops are done with cache block instructions are
>> actually written in real assembler already, using two registers.
>> Because performance matters. Not that I recommend writing code as
>> critical as memset in C with inline asm :-)
>
> Upon a second look, I think the issue is that the "Z" is an input argument
> when it should be an output. clang decides that it can make a copy of the
> input and pass that into the inline asm. This is not the most efficient
> way, but it seems entirely correct according to the constraints.
>
> Changing it to an output "=Z" constraint seems to make it work:
>
> https://godbolt.org/z/FwEqHf
>
> Clang still doesn't use the optimum form, but it passes the correct pointer.
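
For reference, a minimal sketch of the two constraint forms discussed above. This is not the actual kernel patch: the names dcbz_as_input and dcbz_as_output, the ZERO_BLOCK_SIZE value, and the memset fallback for non-PowerPC builds are assumptions added for illustration. The real cache block size is CPU-specific.

```c
#include <string.h>

/* Assumed block size for illustration only; the real value depends on the CPU. */
#define ZERO_BLOCK_SIZE 128

/*
 * "Z" as an input operand: the compiler is told only that the asm reads
 * *addr, so (as described above) it may legally hand the asm a copy of
 * that byte rather than the original location.
 */
static inline void dcbz_as_input(void *addr)
{
#if defined(__powerpc__)
	__asm__ __volatile__("dcbz %y0" : : "Z"(*(unsigned char *)addr) : "memory");
#else
	/* Hypothetical stand-in so the sketch builds on other architectures. */
	memset(addr, 0, ZERO_BLOCK_SIZE);
#endif
}

/*
 * "=Z" as an output operand: the asm is declared to write *addr, so the
 * compiler has to pass the address of the real object.
 */
static inline void dcbz_as_output(void *addr)
{
#if defined(__powerpc__)
	__asm__ __volatile__("dcbz %y0" : "=Z"(*(unsigned char *)addr) : : "memory");
#else
	/* Hypothetical stand-in so the sketch builds on other architectures. */
	memset(addr, 0, ZERO_BLOCK_SIZE);
#endif
}
```

In both forms the pointer passed in is assumed to be aligned to the cache block and to refer to cacheable memory; the only difference is on which side of the operand list the "Z" constraint appears.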
Thanks Arnd. This seems like a better solution.
I'll drop the revert I have staged.
Segher, does this look OK to you?
Nathan/Nick, is either of you able to test this with your clang CI?
cheers