[PATCH v2] powerpc/mce: fix off by one errors in mce event handling

Michael Ellerman mpe at ellerman.id.au
Tue May 12 19:41:41 AEST 2015


On Tue, 2015-05-12 at 13:23 +1000, Daniel Axtens wrote:
> Before 69111bac42f5 ("powerpc: Replace __get_cpu_var uses"), in
> save_mce_event, index got the value of mce_nest_count, and
> mce_nest_count was incremented *after* index was set.
> 
> However, that patch changed the behaviour so that mce_nest_count was
> incremented *before* setting index.
> 
> This causes an off-by-one error, as get_mce_event computes index as
> mce_nest_count - 1 before reading mce_event.  Thus get_mce_event reads
> bogus data, causing warnings like
> "Machine Check Exception, Unknown event version 0 !"
> and breaking MCE handling.
> 
> Restore the old behaviour and unbreak MCE handling by subtracting one
> from the newly incremented value.
> 
> The same broken change occurred in machine_check_queue_event (which fills
> a queue that is read by machine_check_process_queued_event).  Fix that too,
> which restores the printing of MCE information.
> 
> Fixes: 69111bac42f5 ("powerpc: Replace __get_cpu_var uses")
> CC: stable at vger.kernel.org
> CC: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> CC: Christoph Lameter <cl at linux.com>
> Signed-off-by: Daniel Axtens <dja at axtens.net>
> 
> ---
> 
> The code is still super racy, but this at least unbreaks the common,
> non-reentrant case until we figure out how to fix it properly.
> The proper fix will likely be quite invasive, so it might be worth
> picking this up in stable rather than waiting for that?
> 
> mpe: the generated asm is below
> 
> 0000000000000070 <.save_mce_event>:
>   70:   e9 6d 00 30     ld      r11,48(r13)
>   74:   3d 22 00 00     addis   r9,r2,0
>   78:   39 29 00 00     addi    r9,r9,0
>   7c:   7d 2a 4b 78     mr      r10,r9
>   80:   39 29 00 08     addi    r9,r9,8
>   84:   7d 8a 58 2e     lwzx    r12,r10,r11
>   88:   39 8c 00 01     addi    r12,r12,1
>   8c:   7d 8a 59 2e     stwx    r12,r10,r11
>   90:   e9 0d 00 30     ld      r8,48(r13)
>   94:   7d 4a 40 2e     lwzx    r10,r10,r8
>   98:   39 4a ff ff     addi    r10,r10,-1
>   9c:   2f 8a 00 63     cmpwi   cr7,r10,99
> 
> AIUI, we get the per-cpu area in 70, the addr of mce_nest_count itself
> in 74-7c, then load, incr, store in 84-8c, then we get the address and
> load again in 90-94, then subtract 1 to turn the count back into an
> index, then 9c is the conditional `if (index >= MAX_MC_EVT)'
> 
> I think that was what you expected?

Sort of. I wasn't expecting it to reload the value after the increment. But I
guess that's an artifact of the macros.

Anyway, it's much better than the current code, which is always broken.

cheers
