[PATCH] Fix corruption error in rh_alloc_fixed()

Guillaume Knispel gknispel at proformatique.com
Mon Dec 15 11:32:51 EST 2008


On Mon, 15 Dec 2008 08:21:05 +1100
Paul Mackerras <paulus at samba.org> wrote:

> Guillaume Knispel writes:
> 
> > On Tue, 09 Dec 2008 09:16:50 -0600
> > Timur Tabi <timur at freescale.com> wrote:
> > 
> > > Guillaume Knispel wrote:
> > > 
> > > > blk = NULL; at the end of the loop is what is done in the more used
> > > > rh_alloc_align(), so for consistency either we change both or we use
> > > > the same construction here.
> > > > I also think that testing for &info->free_list is harder to understand
> > > > because you must have the linked list implementation in your head
> > > > (which a kernel developer should anyway so this is not so important)
> > > 
> > > Fair enough.
> > > 
> > > Acked-by: Timur Tabi <timur at freescale.com>
> > > 
> > 
> > Kumar, can this go into your tree ?
> > (copying the patch under so you have it at hand)
> > 
> > There is an error in rh_alloc_fixed() of the Remote Heap code:
> > If there is at least one free block blk won't be NULL at the end of the
> > search loop, so -ENOMEM won't be returned and the else branch of
> > "if (bs == s || be == e)" will be taken, corrupting the management
> > structures.
> > 
> > Signed-off-by: Guillaume Knispel <gknispel at proformatique.com>
> > ---
> > Fix an error in rh_alloc_fixed() that made allocations succeed when
> > they should fail, and corrupted management structures.
> 
> What's the impact of this?  Can it cause an oops?
> 
> Is it a regression from 2.6.27?  Should we be putting it in 2.6.28?
> 
> Paul.

The problem obviously only affects people who make use of
rh_alloc_fixed(), which is the case when you program an MCC or a QMC
controller of the CPM. Without the patch, cpm_muram_alloc_fixed()
succeeds when it should not, for example when trying to allocate
out-of-range areas or already allocated areas, so it is possible that
buffer descriptors or other control structures used by other
controllers get corrupted.
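
To illustrate the mechanism, here is a simplified user-space sketch of
the search-loop mistake. It is not the actual rheap code: struct
free_blk, find_buggy() and find_fixed() are made-up names, and a plain
singly linked list stands in for the kernel list. The point is only
that the cursor is left non-NULL after an unsuccessful search unless
it is reset at the end of each iteration.

```c
#include <stddef.h>

/* Hypothetical, simplified free block covering [start, start + size). */
struct free_blk {
	unsigned long start;
	unsigned long size;
	struct free_blk *next;
};

/*
 * Buggy search: when no free block contains [s, s + size), blk still
 * points at the last block visited, so a later "blk == NULL" check
 * never reports the failure.
 */
static struct free_blk *find_buggy(struct free_blk *head,
				   unsigned long s, unsigned long size)
{
	struct free_blk *blk = NULL;
	struct free_blk *l;
	unsigned long e = s + size;

	for (l = head; l != NULL; l = l->next) {
		blk = l;
		/* The range must lie entirely inside one free block */
		if (s >= blk->start && e <= blk->start + blk->size)
			break;
	}
	return blk;
}

/*
 * Fixed search: blk is reset at the end of every iteration that did
 * not match, so it is NULL when the loop runs to completion.
 */
static struct free_blk *find_fixed(struct free_blk *head,
				   unsigned long s, unsigned long size)
{
	struct free_blk *blk = NULL;
	struct free_blk *l;
	unsigned long e = s + size;

	for (l = head; l != NULL; l = l->next) {
		blk = l;
		if (s >= blk->start && e <= blk->start + blk->size)
			break;
		blk = NULL;	/* this block does not contain the range */
	}
	return blk;
}
```

With free blocks [0, 100) and [200, 250), a request for [120, 130)
makes find_buggy() return the second block even though it does not
contain the range, while find_fixed() returns NULL so that the caller
can fail the allocation.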

Digging into old Linux versions (as far back as 2.6.9; I haven't
checked earlier ones), the problem seems to have always been present.

Without this patch I experienced oopses (sometimes a panic, sometimes
not) in various unrelated parts of the kernel (probably an indirect
result of either corruption of the rheap management structures or
corruption caused by the CPM using overwritten data), and also saw the
initialization of multi-channel control structures put other
communication controllers out of order.

The only risk I can think of is that it could break some out-of-tree
kernel space code which worked because of luck and a double error - for
example when doing a single DPRam allocation from offset 0 while
leaving an area reserved at the base of the DPRam. So I think it should
be put in 2.6.28.

Guillaume Knispel


