Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,

Segher Boessenkool segher at kernel.crashing.org
Fri Oct 29 06:50:12 EST 2010


> So is it wrong to leave the host controller enabled when the OS is booted?

Yes.  Or, rather, there should be some way for the client to turn off
all dma and interrupt activity; if the client closes the ihandles in
"/chosen", and perhaps calls "quiesce", that should be enough.



> If not, then the error must be in the communication of which memory
> addresses are in use by OF. I've got a node /memory@0 whose "available"
> property looks like this:
>  00000000 00400000
>  00584000 0007c000
>  0092a1d8 00004e28
>  00a2f000 005d1000
>  01800000 0e3fd000
>  0fbffab4 0000054c
> From that list, it looks to me like OF is telling the kernel that it
> should not attempt to use any address above 0xfbffab4+0x54c == 0xfc00000.

The client is allowed to "take over" all memory, if it doesn't call OF
after doing so.  This won't work if some device scribbles on it, as
you have seen.
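
The end-address arithmetic in the quoted "available" list is easy to
check. A minimal sketch in Python; the names AVAILABLE, region_ends and
highest_end are made up for illustration, and the numbers are the
(base, size) pairs from the list quoted above:

```python
# (base, size) pairs copied from the quoted "available" property (hex).
AVAILABLE = [
    (0x00000000, 0x00400000),
    (0x00584000, 0x0007c000),
    (0x0092a1d8, 0x00004e28),
    (0x00a2f000, 0x005d1000),
    (0x01800000, 0x0e3fd000),
    (0x0fbffab4, 0x0000054c),
]


def region_ends(regions):
    """Return the first address past each (base, size) region."""
    return [base + size for base, size in regions]


def highest_end(regions):
    """Return the first address past the highest free region."""
    return max(region_ends(regions))


print(hex(highest_end(AVAILABLE)))  # prints 0xfc00000
```

This only shows where the last free region ends. It says nothing about
which devices are still doing DMA into memory the client has taken over.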

> Later, when the kernel decides it's done using OF, what's supposed to
> happen? It closes stdin, but that doesn't help here since the offending
> device is a bus node, not an input node. It looks to me like the kernel
> makes the assumption that all devices other than stdin and stdout will
> have been deactivated already when the kernel starts, and that this
> assumption has been violated. Who is wrong, from the perspective of the OF
> standard, the assumer or the violator?

The violator.

>> Lovely, incorrect data (it should start with 82002810, i.e.,
>> not relocatable -- it is already an assigned address!).
>
> Now you see how I have trouble relating the docs to the reality...

Yeah :-(

>> This means: 32-bit MMIO address space for bus 0 dev 5 fn 0,
>> first BAR; assigned to address 80000000; size is 1000.
>
> But "address 80000000" is a physical address (I think), so do I need to
> do a map-in on it before using it?

Yes.
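
For reference, the reading of 82002810 quoted above follows from the
phys.hi cell layout in the Open Firmware PCI binding: relocation flag in
bit 31, address space code in bits 25-24, then bus, device, function and
register number. A small Python sketch of that decoding; decode_phys_hi
and SPACE_NAMES are illustrative names, not part of any firmware or
kernel interface:

```python
# Address space codes from the "ss" field of phys.hi.
SPACE_NAMES = {0: "config", 1: "I/O", 2: "32-bit memory", 3: "64-bit memory"}


def decode_phys_hi(phys_hi):
    """Split a PCI phys.hi cell into its fields."""
    return {
        "non_relocatable": bool((phys_hi >> 31) & 1),  # "n" bit
        "prefetchable": bool((phys_hi >> 30) & 1),     # "p" bit
        "space": SPACE_NAMES[(phys_hi >> 24) & 0x3],   # "ss" field
        "bus": (phys_hi >> 16) & 0xFF,
        "device": (phys_hi >> 11) & 0x1F,
        "function": (phys_hi >> 8) & 0x7,
        "register": phys_hi & 0xFF,                    # 0x10 is the first BAR
    }


for name, value in decode_phys_hi(0x82002810).items():
    print(name, value)
```

For 82002810 this gives a non-relocatable 32-bit memory address for bus 0,
device 5, function 0, register 0x10, which matches the description quoted
above.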

>> You could try a boot script like this:
>>
>>
>> dev /pci
>> 0 ffff04 DO 0 i config-w! -100 +LOOP
>> device-end
>>
>>
>> which should disable all PCI devices on all busses, on that
>
> Almost all of my devices are under that PCI node. What will I prove by
> disabling them?

You should put it after "load", and before "go".

It should give you a working system; it's a sledgehammer workaround.
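
To see what the boot script quoted above touches: assuming the usual PCI
binding layout for configuration addresses (bus in bits 23-16, device in
bits 15-11, function in bits 10-8, register in bits 7-0), the loop walks
every bus/device/function and writes 0 to register 4, the command
register, which clears the I/O, memory and bus-master enables. A rough
Python sketch of the same address sequence; config_addresses and split
are illustrative names, and nothing here touches hardware:

```python
def config_addresses(start=0xFFFF04, limit=0, step=-0x100):
    """Yield the addresses visited by `0 ffff04 DO ... -100 +LOOP`."""
    addr = start
    # With a negative step, +LOOP stops once the index would drop
    # below the limit, so the last address visited is 0x000004.
    while addr >= limit:
        yield addr
        addr += step


def split(addr):
    """Return (bus, device, function, register) for a config address."""
    return (addr >> 16) & 0xFF, (addr >> 11) & 0x1F, (addr >> 8) & 0x7, addr & 0xFF


addrs = list(config_addresses())
print(len(addrs))        # number of config-w! writes
print(split(addrs[0]))   # first target
print(split(addrs[-1]))  # last target
```

That is one write per possible function on every possible bus number,
whether or not a device is actually there, hence "sledgehammer".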


Segher


