PROBLEM: memory corrupting bug, bisected to 6dda9d55

Mel Gorman mel at csn.ul.ie
Thu Oct 14 01:40:44 EST 2010


On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote:
> (cc linuxppc-dev at lists.ozlabs.org)
> 
> On Mon, 11 Oct 2010 15:30:22 +0100
> Mel Gorman <mel at csn.ul.ie> wrote:
> 
> > On Sat, Oct 09, 2010 at 04:57:18AM -0500, pacman at kosh.dhis.org wrote:
> > > (What a big Cc: list... scripts/get_maintainer.pl made me do it.)
> > > 
> > > This will be a long story with a weak conclusion, sorry about that, but it's
> > > been a long bug-hunt.
> > > 
> > > With recent kernels I've seen a bug that appears to corrupt random 4-byte
> > > chunks of memory. It's not easy to reproduce. It seems to happen only once
> > > per boot, pretty quickly after userspace has gotten started, and sometimes it
> > > doesn't happen at all.
> > > 
> > 
> > A corruption of 4 bytes could be consistent with a pointer value being
> > written to an incorrect location.
> 
> It's corruption of user memory, which is unusual.  I'd be wondering if
> there was a pre-existing bug which 6dda9d55bf545013597 has exposed -
> previously the corruption was hitting something harmless.  Something
> like a missed CPU cache writeback or invalidate operation.
> 

This seems somewhat plausible, although it's hard to tell for sure. But
let's say we had the following situation in memory:

[<----MAX_ORDER_NR_PAGES---->][<----MAX_ORDER_NR_PAGES---->]
INITRD                        memmap array

The initrd gets freed and someone else very early in boot gets allocated in
there. Let's further guess that the struct pages in the memmap area are
managing the page frames where the INITRD was, because it makes the situation
slightly easier to trigger. As pages get freed in the memmap array, we could
reference memory where the initrd used to be while that physical memory is
mapped at two virtual addresses.

CPU A							CPU B
							Reads kernelspace virtual (gets cache line)
Writes userspace virtual (gets different cache line)
							IO, writes buffer destined for userspace (via cache line)
Cache line eviction, writeback to memory

This is somewhat contrived, but I can see how it might happen even on one
CPU, particularly if the L1 cache is virtual and is loose about checking
physical tags.
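
To make the sequence above concrete, here is a minimal sketch of a toy
write-back cache that is indexed and tagged by virtual address only. It is
an illustration under stated assumptions, not a model of any real PPC32
part: the class ToyVirtualCache, the function run_scenario, the addresses,
and the stored values are all hypothetical, and a single cache stands in
for the one-CPU case.

```python
class ToyVirtualCache:
    """Toy write-back cache keyed by virtual address only (no physical tag check)."""

    def __init__(self, page_table, memory):
        self.page_table = page_table  # virtual address -> physical address
        self.memory = memory          # physical address -> stored word
        self.lines = {}               # virtual address -> (value, dirty)

    def read(self, vaddr):
        # On a miss, fill the line from physical memory.
        if vaddr not in self.lines:
            self.lines[vaddr] = (self.memory[self.page_table[vaddr]], False)
        return self.lines[vaddr][0]

    def write(self, vaddr, value):
        # The store stays in the cache and only marks the line dirty.
        self.lines[vaddr] = (value, True)

    def evict(self, vaddr):
        # Dirty lines are written back to whatever physical address they map to.
        value, dirty = self.lines.pop(vaddr)
        if dirty:
            self.memory[self.page_table[vaddr]] = value


def run_scenario():
    # Hypothetical addresses: two virtual aliases of one physical word.
    kernel_va, user_va, phys = 0xC0001000, 0x10002000, 0x1000
    memory = {phys: 0}
    cache = ToyVirtualCache({kernel_va: phys, user_va: phys}, memory)

    cache.read(kernel_va)               # kernel alias gets a cache line
    cache.write(user_va, 0x11111111)    # user alias gets a different line
    cache.write(kernel_va, 0xDEADBEEF)  # stray 4-byte store via the kernel alias

    # Both lines are live, so the two aliases disagree about the same word.
    user_view = cache.read(user_va)
    kernel_view = cache.read(kernel_va)

    cache.evict(user_va)                # user data reaches memory...
    cache.evict(kernel_va)              # ...and is then overwritten
    return user_view, kernel_view, cache.read(user_va)
```

In this toy model userspace first reads back its own value, and only sees
the foreign 4 bytes after both lines have been evicted and the word is
refetched, which would fit corruption that shows up some time after the
offending store. A cache that checked physical tags would keep a single
line for the word, so the two views could not diverge like this.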

> How sensitive/vulnerable is PPC32 to such things?
> 

I cannot tell you specifically, but if the above scenario is in any way
plausible, I believe it would depend on what sort of L1 cache the CPU
has. Maybe this particular version has a virtual cache with no physical
tagging and is depending on the OS not to make virtual aliasing mistakes.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

