Suspected regression?

Christophe Leroy christophe.leroy at c-s.fr
Fri Aug 26 22:46:48 AEST 2016


Hi Alessio,

On 26/08/2016 at 04:32, Scott Wood wrote:
> On Tue, 2016-08-23 at 13:34 +0200, Christophe Leroy wrote:
>>
>> On 23/08/2016 at 11:20, Alessio Igor Bogani wrote:
>>>
>>> Hi Christophe,
>>>
>>> Sorry for the delay in replying; I was on vacation.
>>>
>>> On 6 August 2016 at 11:29, christophe leroy <christophe.leroy at c-s.fr>
>>> wrote:
>>>>
>>>> Alessio,
>>>>
>>>>
>>>> On 05/08/2016 at 09:51, Christophe Leroy wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 19/07/2016 at 23:52, Scott Wood wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, 2016-07-19 at 12:00 +0200, Alessio Igor Bogani wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have two boards, an MVME5100 (MPC7410 CPU) and an MVME7100
>>>>>>> (MPC8641D CPU), for which I use the same cross-compiler (ppc7400).
>>>>>>>
>>>>>>> I tested these against kernel HEAD and found that they no longer
>>>>>>> boot (PID 1 crashes).
>>>>>>>
>>>>>>> Bisecting gives this first offending commit:
>>>>>>> 7aef4136566b0539a1a98391181e188905e33401
>>>>>>>
>>>>>>> Removing it from HEAD makes the boards boot properly again.
>>>>>>>
>>>>>>> A third system based on the P2010 isn't affected at all.
>>>>>>>
>>>>>>> Is it a regression, or have I done something wrong?
>>>>>>
>>>>>> I booted both my next branch and Linus's master on MPC8641HPCN and
>>>>>> didn't see this -- though possibly your RFS is doing something
>>>>>> different.  Maybe that's the difference with P2010 as well.
>>>>>>
>>>>>> Is there any way you can debug the cause of the crash?  Or send me a
>>>>>> minimal RFS that demonstrates the problem (ideally with debug symbols
>>>>>> on the userspace binaries)?
>>>>>>
>>>>> I got the following information from Alessio:
>>>>>
>>>>> systemd[1]: Caught <BUS>, core dump failed (child 137, code=killed,
>>>>> status=7/BUS).
>>>>> systemd[1]: Freezing execution.
>>>>>
>>>>>
>>>>> What can generate SIGBUS?
>>>>> And shouldn't we also get some KERN_ERR trace, something like
>>>>> "unhandled signal 7 at ....."?
>>>>>
>>>> As far as I can see, SIGBUS is mainly generated by an alignment
>>>> exception. According to the 7410 Reference Manual, an alignment
>>>> exception can happen in the following cases:
>>>> * An operand of a dcbz instruction is on a page that is write-through or
>>>>   cache-inhibited for a virtual mode access.
>>>> * An attempt to execute a dcbz instruction occurs when the cache is
>>>>   disabled or locked.
>>>>
>>>> Could you try the patch below to check whether the dcbz insn is causing
>>>> the SIGBUS?
>>> Unfortunately that patch doesn't solve the problem.
>>>
>>> Is there a chance that the cache behavior could be set by the board
>>> firmware (PPCBug on the MPC7410 board and MotLoad on the MPC8641D one)?
>>> In that case, what do you suggest I look for?
>> If the removal of dcbz doesn't solve the issue, I don't think it is a
>> cache-related issue.
>> As far as I understand, your init gets a SIGBUS signal, right? Then we
>> must identify the reason for that SIGBUS.
>
> My guess would be errors demand-loading a page via NFS.
>
> One approach might be to hack up the code so that both versions of
> csum_partial_copy_generic() are present, and call both each time.  If the
> results differ or the copied bytes are wrong, then spit out a dump of the
> details.
>
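
The cross-check suggested above could look roughly like the sketch below.
It is written in Python rather than kernel code, and everything in it is
an assumption made for illustration: the names csum_reference,
csum_suspect and cross_check are hypothetical, the 32-byte CACHELINE is
an assumed value, and csum_suspect is only a toy stand-in that byte-rotates
its result for short buffers at odd destination addresses. It is not a
transcription of csum_partial_copy_generic().

```python
import random

CACHELINE = 32  # assumed line size, for illustration only


def swap16(x):
    """Swap the two bytes of a 16-bit value."""
    return ((x & 0xFF) << 8) | (x >> 8)


def csum_reference(data):
    """Plain 16-bit ones' complement sum over big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_suspect(data, dest_addr):
    """Toy stand-in for the version under test (hypothetical): it wrongly
    rotates the result when a short buffer goes to an odd destination."""
    to_line = -dest_addr % CACHELINE  # bytes up to the next line boundary
    if dest_addr & 1 and len(data) < to_line:
        return swap16(csum_reference(data))
    return csum_reference(data)


def cross_check(trials=2000, seed=0):
    """Call both versions on random inputs and collect every mismatch."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        length = rng.randrange(1, 128)
        dest_addr = rng.randrange(0, 64)
        data = bytes(rng.randrange(256) for _ in range(length))
        expected = csum_reference(data)
        got = csum_suspect(data, dest_addr)
        if got != expected:
            mismatches.append((length, dest_addr, expected, got))
    return mismatches
```

Printing the (length, dest_addr, expected, got) tuples returned by
cross_check() shows which lengths and destination alignments trigger a
difference, which is the kind of dump the suggestion asks for.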

Can you try the patch below? I have found that when the packet is
smaller than a cacheline, the destination does not get cache-aligned, so
the result must not be rotated, even when the destination address is odd.

This patch is needed in addition to the previous fix (1bc8b816cb805), as
it fixes a different case.
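
As a rough illustration of why the rotation matters, the Python sketch
below shows the general property of the 16-bit ones' complement sum:
summing the same bytes one byte out of phase gives the byte-swapped
result, so a single 8-bit rotation corrects it. Applying that rotation
when the data was summed in phase corrupts the checksum instead. The
helper names csum16 and swap16 are hypothetical, and this is a model of
the arithmetic only, not of the assembly in the patch.

```python
def swap16(x):
    """Swap the two bytes of a 16-bit value (an 8-bit rotation)."""
    return ((x & 0xFF) << 8) | (x >> 8)


def csum16(data):
    """16-bit ones' complement sum over big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


data = bytes(range(1, 12))

in_phase = csum16(data)                 # bytes summed at their natural offset
out_of_phase = csum16(b"\x00" + data)   # same bytes shifted by one position

# Out of phase: the rotation is needed to recover the in-phase sum.
rotation_needed = swap16(out_of_phase) == in_phase

# In phase: rotating anyway yields a different, wrong value.
rotation_harmful = swap16(in_phase) != in_phase
```

Both rotation_needed and rotation_harmful hold for this input, which is
the distinction between the aligned path and the short-packet path.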

Christophe

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index 68f6862..3971cfb 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -127,18 +127,19 @@ _GLOBAL(csum_partial_copy_generic)
  	stw	r7,12(r1)
  	stw	r8,8(r1)

-	rlwinm	r0,r4,3,0x8
-	rlwnm	r6,r6,r0,0,31	/* odd destination address: rotate one byte */
-	cmplwi	cr7,r0,0	/* is destination address even ? */
  	addic	r12,r6,0
  	addi	r6,r4,-4
  	neg	r0,r4
  	addi	r4,r3,-4
  	andi.	r0,r0,CACHELINE_MASK	/* # bytes to start of cache line */
+	crset	4*cr7+eq
  	beq	58f

  	cmplw	0,r5,r0			/* is this more than total to do? */
  	blt	63f			/* if not much to do */
+	rlwinm	r7,r6,3,0x8
+	rlwnm	r12,r12,r7,0,31	/* odd destination address: rotate one byte */
+	cmplwi	cr7,r7,0	/* is destination address even ? */
  	andi.	r8,r0,3			/* get it word-aligned first */
  	mtctr	r8
  	beq+	61f
-- 


