[Cbe-oss-dev] Context switching while page fault handling and warning

Benjamin Herrenschmidt benh at kernel.crashing.org
Tue Jun 20 15:13:03 EST 2006


Ok, found one bug at least. I'm not sure yet whether it explains our
problems here, since we are also having trouble with the hard disk on
the machine, but the SPU MFC command queue purge code in the context
save/restore path has a bug:

static inline void wait_purge_complete(struct spu_state *csa,
				       struct spu *spu)
{
	struct spu_priv2 __iomem *priv2 = spu->priv2;

	/* Save, Step 28:
	 *     Poll MFC_CNTL[Ps] until value '11' is read
	 *     (purge complete).
	 */
	POLL_WHILE_FALSE(in_be64(&priv2->mfc_control_RW) &
			 MFC_CNTL_PURGE_DMA_COMPLETE);
}

This exits as soon as _one_ of the two bits that make up
MFC_CNTL_PURGE_DMA_COMPLETE is set, because the masked value is then
already non-zero. One of those bits on its own means "purge in
progress"... which means that we'll happily continue restoring the MFC
while it is still being purged.

The fix is something like:

	POLL_WHILE_FALSE((in_be64(&priv2->mfc_control_RW) &
			  MFC_CNTL_PURGE_DMA_COMPLETE) ==
			 MFC_CNTL_PURGE_DMA_COMPLETE);
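
To make the difference between the two poll conditions above concrete,
here is a minimal user-space sketch. The names STATUS_IN_PROGRESS,
STATUS_COMPLETE, purge_done_any_bit() and purge_done_all_bits() are
hypothetical, and the bit positions are assumed for illustration only;
they are not taken from the hardware documentation or the kernel
headers. It only shows why testing a multi-bit mask for "non-zero" is
not the same as testing it for "all bits set".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-bit status field; values assumed for illustration. */
#define STATUS_IN_PROGRESS	(1ull << 24)	/* one bit set: purge running */
#define STATUS_COMPLETE		(3ull << 24)	/* both bits set: purge done */

/* Shape of the original test: true when ANY bit of the mask is set. */
static bool purge_done_any_bit(uint64_t reg)
{
	return (reg & STATUS_COMPLETE) != 0;
}

/* Shape of the fixed test: true only when ALL bits of the mask are set. */
static bool purge_done_all_bits(uint64_t reg)
{
	return (reg & STATUS_COMPLETE) == STATUS_COMPLETE;
}
```

With a register value equal to STATUS_IN_PROGRESS, the first helper
already reports completion while the second does not; both agree only
once the register reads STATUS_COMPLETE.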

Ben.





