[PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

Thu Aug 26 23:57:52 AEST 2021

Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> Hi!
> 
> On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> This one possibly the branches end up in predictors, whereas conditional 
>> >> trap is always just speculated not to hit. Branches may also have a
>> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> vs 4 per cycle on POWER9).
>> > 
>> > I thought only *taken* branches are just one per cycle?
>> 
>> Taken branches are fetched by the front end at one per cycle (assuming 
>> they hit the BTAC), but all branches have to be executed by BR at one 
>> per cycle
> 
> This is not true.  (Simple) predicted not-taken conditional branches are
> just folded out, never hit the issue queues.  And they are fetched as
> many together as fit in a fetch group, can complete without limits as
> well.

No, they are all dispatched and issue to the BRU for execution. It's 
trivial to construct a test of a lot of not taken branches in a row
and time a loop of it to see it executes at 1 cycle per branch.

> The BTAC is a frontend thing, used for target address prediction.  It
> does not limit execution.

I didn't say it did.

> Correctly predicted simple conditional branches just get their prediction
> validated (and that is not done in the execution units).  Incorrectly
> predicted branches the same, but those cause a redirect and refetch.

How could it validate prediction without issuing? It wouldn't know when
sources are ready.

> 
>> > Internally *all* traps are conditional, in GCC.  It also can optimise
>> > them quite well.  There must be something in the kernel macros that
>> > prevents good optimisation.
>> 
>> I did take a look at it at one point.
>> 
>> One problem is that the kernel needs the address of the trap instruction 
>> to create the entry for it. The other problem is that __builtin_trap 
>> does not return so it can't be used for WARN. LLVM at least seems to 
>> have a __builtin_debugtrap which does return.
> 
> This is <https://gcc.gnu.org/PR99299>.

Aha.

>> The first problem seems like the show stopper though. AFAIKS it would 
>> need a special builtin support that does something to create the table
>> entry, or a guarantee that we could put an inline asm right after the
>> builtin as a recognized pattern and that would give us the instruction
>> following the trap.
> 
> I'm not quite sure what this means.  Can't you always just put a
> 
> bla:	asm("");
> 
> in there, and use the address of "bla"?

Not AFAIKS. Put it where?

> If not, you need to say a lot
> more about what you actually want to do :-/

We need to put the address (or relative offset) of the trap instruction
into an entry in a different section. Basically this:

   __builtin_trap();
   asm ("1:                               \n\t"
        "    .section __bug_table,\"aw\"  \n\t"
        "2:  .4byte 1b - 2b - 4           \n\t"
        "    .previous");

Where the 1: label must follow immediately after the trap instruction, 
and where the asm must be emitted even for the case of no-return traps 
(but anything following the asm could be omitted).

Thanks,
Nick