"Illegal instruction" traps on smp clients - 2.4.19

David Bryan Dave at ThePTRGroup.com
Fri Feb 28 03:37:06 EST 2003


We saw problems like this about a year ago when running SMP on dual PPC7400
systems.

There is an I/O signal called SHD between the processors in a multiprocessing
system.  This signal is used to indicate when a reservation is held on an
address.  The lwarx/stwcx. instruction pair uses reservations to guarantee
atomicity in SMP systems.  (The lwarx/stwcx. instructions are used
extensively in the kernel, particularly in the spinlock routines.)  To enable
use of the SHD signal, the 7400 has to be either in MESI mode or in MEI mode
with the SHD signal explicitly enabled.  These modes are controlled by two
bits in the Memory Subsystem Control Register (MSSCR0).  At reset, MSSCR0
defaults to MEI mode with the SHD signal disabled.  By placing the 7400 in
MESI mode at boot, we solved the problem.
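
To illustrate why the reservation matters, here is a minimal sketch of a
spinlock built on a compare-and-exchange.  This is not the kernel's spinlock
code; the names sketch_lock, sketch_trylock, sketch_lock_acquire and
sketch_unlock are made up for this example.  It uses portable C11 atomics,
which a PowerPC compiler typically lowers to a lwarx/stwcx. retry loop:
lwarx loads the word and sets the reservation, and stwcx. stores only if the
reservation is still held.  If the processors cannot see each other's
reservations correctly, two CPUs may both believe they took the lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not the kernel's spinlock_t. */
typedef struct {
    atomic_uint locked;               /* 0 = free, 1 = held */
} sketch_lock;

/* Try once to take the lock.  Returns 1 on success, 0 if already held. */
static int sketch_trylock(sketch_lock *l)
{
    unsigned expected = 0;

    /* On PowerPC this is typically compiled to lwarx (load and reserve)
     * followed by stwcx. (store only if the reservation is intact). */
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1u,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the lock is taken. */
static void sketch_lock_acquire(sketch_lock *l)
{
    while (!sketch_trylock(l)) {
        /* Wait with plain loads until the lock looks free, then retry. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ;
    }
}

/* Release the lock; the caller must currently hold it. */
static void sketch_unlock(sketch_lock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0u, memory_order_release);
}
```

A caller would wrap a critical section in sketch_lock_acquire() and
sketch_unlock().  The mutual exclusion depends entirely on the conditional
store failing when another processor has touched the lock word.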

Hope this helps,

     T h e   P T R   G r o u p,   I n c.
       ->->->->  ->->->->->   ->->->->
             ->     ->               ->
     ->->->->      ->       ->->->->
    ->            ->       ->      ->
   ->            ->       ->       ->
 Embedded, Real-Time Solutions, and Training

David Bryan                www.ThePTRGroup.com

-----Original Message-----
From: owner-linuxppc-dev at lists.linuxppc.org
[mailto:owner-linuxppc-dev at lists.linuxppc.org]On Behalf Of Rudy
Sent: Thursday, February 27, 2003 9:45 AM
To: linuxppc-dev at lists.linuxppc.org
Subject: "Illegal instruction" traps on smp clients - 2.4.19

	This is a message that was posted last week on linux-smp.
	No responses, so I'm rewriting/reposting here.

	Our configuration uses Linux 2.4.19 from Synergy (derived
	from YellowDog version 2.1). We have several boards
	configured in a server/client relationship. These boards
	contain either 2 or 4 G4 AltiVec PPC processors.  The
	server has an attached disk; the clients are diskless and
	mount their root file system over NFS.

	I am seeing frequent "Illegal instruction" traps on clients
	that run an SMP kernel.  Other symptoms include failure of
	various daemons during startup (syslogd, crond, sshd, etc.).
	Symptoms also occur during rsh/rlogin usage.

	Running a UP kernel on clients works just fine.

	Both SMP and UP kernels work fine on the "server".

	Has anyone else seen this type of problem or something similar?

	This appears to me to be an SMP problem.

	A fix relating to page table/TLB invalidation ordering
	was detailed by Sunil Saxena at
	for the x86 architecture, and these mods seem to have made it
	into 2.4.18.  The ppc arch was not addressed.  I have also
	noticed this problem being addressed starting in 2.5.16.

	It's not really practical for me to use 2.5.xx at this point.

	I am hoping that someone familiar with this code and the
	ppc architecture can verify that this is indeed a problem
	for 2.4.19.

	And then, what can I do about it?  I'm willing to try things
	as my time permits.  I have looked at 2.5.60 memory.c/mmap.c
	and related functions, and trying to port the new methods
	back to 2.4.19 seems to be a rather daunting task.

	Comments, suggestions?

	My background involves writing device drivers for VMS,
	Solaris, and now Linux.

	Any assistance or guidance would be appreciated.


** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/
