[PATCH] PPC64: EEH Recovery

Linas Vepstas linas at austin.ibm.com
Tue Dec 14 06:17:55 EST 2004

On Mon, Dec 13, 2004 at 04:22:12PM +1100, Paul Mackerras was heard to remark:
> Linas Vepstas writes:
> > No not at all.  As I said, all I really wanted to do was to save the
> > bars. I actually thought about saving the bars in the device_node, 
> > but I then anticipated that some wise-guy on this mailing list suggest
> > that I save the bars in the pci_dev instead.   So I saved the bars
> > in pci_dev instead.   I did *not* anticipate your comments.  
> Saving the BARs in the pci_dev does indeed seem like the right thing.
> Milton, why did you prefer to put them in the device_node?  

I can't speak for Milton, but I can offer up one reason.  
When the pci device is removed, the pci_device struct is freed
(and if it has bridges/multi-function, then a tree of pci_device 

So, in order to be able to access data in the struct, after
the device delete, I was doing something funky: doing a pci_get 
to make sure the struct wasn't freed until I was done with it.

Of course, the device_node is freed as well, so I have to 
dn_get to make sure it doesn't disappear from under me. But
at least this way, I have to get only one set of objects, not two,
and so this does reduce total complexity slightly.

> Linas, this part of the patch:
> @@ -223,9 +220,9 @@ pci_addr_cache_insert(struct pci_dev *de
>  	while (*p) {
>  		parent = *p;
>  		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
> -		if (alo < piar->addr_lo) {
> +		if (ahi < piar->addr_lo) {
>  			p = &parent->rb_left;
> -		} else if (ahi > piar->addr_hi) {
> +		} else if (alo > piar->addr_hi) {
>  			p = &parent->rb_right;
>  		} else {
>  			if (dev != piar->pcidev ||
> looks like a bug fix, unrelated to the other changes you are making.
> Is that the case?  

Not really.  If the pci subsystem is working correctly, this change has
no effect.  It only makes a differece if PCI addresses were assigned
incorrectly, in which case, the updated code will catch that error.

> Could you explain what is being fixed and why?

The "pci address cache" is a collection of address ranges that can be
used to find the pci device associated with a given address.  In 
a "correctly working system", these adress ranges are non-overlapping. 
(an i/o address can be associated to only one pci dev.)

To be able to find a device quickly, I hold the address ranges in sorted
order.   The above code is merely trying to find the insertion point
for a new address range.   As long as the ranges are non-overlapping,
I can sort by comparing bottom-of-range to bottom-of-range, or by
comparing top-of-previous to bottom-of-next, or comparing midpoints, etc.
Any of these will work.

But if the address ranges overlap, the old code won't notice it,
whereas the patched code will, and print a warning.  
i.e. in order to not overlap, the highest address of one range must 
be less than the lowest address of the next range. 

Since sometimes programmers introduce errors in assigning pci addresses,
the patch will help rpovide an early warning for one type of error.


More information about the Linuxppc64-dev mailing list