Why do we map PCI IO space so late?

Fri Oct 1 01:45:15 EST 2004

Hi Ben-

Good questions :)  First let me clear something up, and forgive me if
I'm telling you stuff you already know.  The ioremap() calls that we do
at boot are _exclusively_ done for PHBs.  Each one creates a mapping
that spans the ranges of the PHB's child buses.  Why do we do this when
drivers can call ioremap() themselves?  Because some drivers still use
inb()/outb(), etc., without remapping their own space.
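
A minimal user-space sketch of why those drivers depend on the PHB
mapping, assuming inb()/outb() simply add the port number to a base
virtual address.  The array, the base pointer, sketch_inb() and
sketch_outb() are hypothetical stand-ins, not the ppc64 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the virtual window that a PHB's boot-time
 * ioremap() establishes; here it is just an array in user space. */
static uint8_t phb_io_window[0x10000];

/* Hypothetical base pointer: the virtual address at which port 0 of the
 * PHB's I/O space is mapped.  NULL means "not mapped yet". */
static volatile uint8_t *pci_io_base;

/* Sketch of outb(): the port number is an offset into a window that was
 * mapped on the driver's behalf; the driver never calls ioremap(). */
static int sketch_outb(uint8_t val, unsigned long port)
{
    if (pci_io_base == NULL || port >= sizeof(phb_io_window))
        return -1;              /* on real hardware this would fault */
    pci_io_base[port] = val;
    return 0;
}

/* Sketch of inb(): same translation, read direction. */
static int sketch_inb(unsigned long port, uint8_t *val)
{
    if (pci_io_base == NULL || port >= sizeof(phb_io_window))
        return -1;
    *val = pci_io_base[port];
    return 0;
}
```

If nothing has mapped the window yet, every port access fails, which
is why the PHB mappings have to exist before such drivers run.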

The short answer to your questions is that I/O DLPAR required these PHB
ioremap() calls to be moved to a later point during boot, so that
imalloc records would be kept.

Here's the long answer.  To dynamically remove a bus (EADS or PHB), we
need to iounmap() the range associated with it.  The iounmap() function
is prototyped in generic code to take one argument, the virtual address
in question.  In order to know the size of the region to unmap, we need
to keep some records of what was ioremap()'ed originally.  The imalloc
subsystem exists to keep these records.
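
To illustrate why the records are needed, here is a toy C model (not
the imalloc implementation): each mapping leaves behind an (address,
size) record, and an iounmap()-style call that receives only the
address looks the size up.  The struct, the fixed-size table and the
function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified record: one entry per mapping,
 * keyed by the virtual address that was handed out. */
struct io_record {
    uintptr_t virt;
    size_t    size;
    int       in_use;
};

#define MAX_RECORDS 16
static struct io_record records[MAX_RECORDS];

/* Remember a mapping of 'size' bytes at 'virt'.  Returns 0 on success,
 * -1 if the table is full. */
static int record_mapping(uintptr_t virt, size_t size)
{
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (!records[i].in_use) {
            records[i] = (struct io_record){ virt, size, 1 };
            return 0;
        }
    }
    return -1;
}

/* iounmap()-style call: only the address is given, so the size must come
 * from a record.  Returns the number of bytes that would be unmapped,
 * or 0 if no record exists for that address. */
static size_t sketch_iounmap(uintptr_t virt)
{
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (records[i].in_use && records[i].virt == virt) {
            records[i].in_use = 0;
            return records[i].size;
        }
    }
    return 0;
}
```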

The ppc64 ioremap() implementation has the limitation that if one calls
it before mem_init_done, no imalloc records are left behind.  If we
remap the PHBs early in boot, we have no way to unmap them (or their
children) at DLPAR remove time.  Does this make sense?  
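
The same kind of toy model, extended with a flag standing in for
mem_init_done, shows the failure mode: a mapping made before the flag
is set leaves no record, so a later unmap of that address has nothing
to go by.  Again, this is an illustrative sketch with hypothetical
names, not the ppc64 ioremap().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for mem_init_done: before it is set, there is (by assumption)
 * no allocator available for records. */
static bool mem_init_done;

struct rec { uintptr_t virt; size_t size; };
static struct rec recs[8];
static int nrecs;

/* The mapping itself would be set up in either case; only the record
 * depends on mem_init_done. */
static void sketch_ioremap(uintptr_t virt, size_t size)
{
    if (mem_init_done && nrecs < 8)
        recs[nrecs++] = (struct rec){ virt, size };
}

/* Returns the size unmapped, or 0 if there is no record to go by. */
static size_t sketch_iounmap(uintptr_t virt)
{
    for (int i = 0; i < nrecs; i++) {
        if (recs[i].virt == virt) {
            size_t size = recs[i].size;
            recs[i] = recs[--nrecs];    /* drop the record */
            return size;
        }
    }
    return 0;
}
```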

As a side note, we didn't similarly defer the remap for ISA, because we
assumed that we'd never want to unmap this range.  I wrote the function
that remaps for ISA, and it's a hack, you're right :)  Suggestions are
welcome.  I would ask why your ISA node doesn't have a "ranges"
property, because I thought some spec made it mandatory.

You asked about ioremap_explicit().  It is used in two ways.  First,
during boot, to remap the necessary regions for PHBs after
mem_init_done.  We've saved off the "physical" range info from the OF
device tree early in boot, and now we explicitly remap starting at
virtual address PHBS_IO_BASE.  Second, we use it to remap the range of
a newly DLPAR-added bus.  You can imagine that when adding an EADS
slot, we need its mappings to exist at exact virtual addresses relative
to the slot's parent PHB, etc.  Hence the creation of ioremap_explicit().
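
A sketch of the address arithmetic behind mapping at exact virtual
addresses, assuming a child bus's I/O window is a sub-range of its
parent PHB's window and keeps the same offset in virtual space.  The
struct io_window type, child_virt() and the example addresses in the
test are hypothetical; PHBS_IO_BASE itself is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a bus's I/O window, as saved from the
 * device tree early in boot. */
struct io_window {
    uint64_t phys;   /* physical start of the I/O window */
    uint64_t size;   /* length in bytes */
    uint64_t virt;   /* virtual address it is (or will be) mapped at */
};

/* A child's window lies inside its parent PHB's window, so its virtual
 * address is fixed by the parent's mapping: same offset from the start.
 * Returns 0 and fills *virt_out, or -1 if the child is not enclosed. */
static int child_virt(const struct io_window *phb,
                      const struct io_window *child, uint64_t *virt_out)
{
    if (child->phys < phb->phys ||
        child->phys + child->size > phb->phys + phb->size)
        return -1;
    *virt_out = phb->virt + (child->phys - phb->phys);
    return 0;
}
```

Because the result is dictated by the parent, a generic ioremap() that
picks its own virtual address would not do here.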

Suggestions on improvements are welcome.  Hope this helps; it's before
lunch and I'm being wordy. :)

Thanks-
John

On Thu, 2004-09-30 at 03:22, Benjamin Herrenschmidt wrote:
> Hi John !
> 
> I was going through some of the PCI setup code while working on
> some bringup stuff, and had an issue which was related to the way
> we do the ioremap'ing of the PCI IO space.
> 
> So the current scenario is:
> 
>  - early (basically at setup_arch() time), we ioremap_explicit only the
> ISA space
> 
>  - later (pcibios_fixup time), we scan all busses and ioremap_explicit
> their various IO spaces.
> 
> I have two problems with that at the moment.
> 
> First, I'm annoyed that during the actual PCI probing, the IO space
> is not mapped. That means that any quirk that needs IO accesses to the
> device will not work. I also wonder under which conditions we might end
> up instantiating a PCI driver as early as PCI probing, and thus crash.
> Also, this is all after console_initcalls(), so that leaves a gap of
> code that runs with PCI IO space not mapped. So far, it ended up being
> mostly OK because our console uses legacy serial drivers that use the
> ISA space, which happens to be mapped early, but that sounds fragile &
> bogus to me. (For the record, I found this while working on a board
> for which the "isa" node didn't have a "ranges" property, so we failed
> to map it early, and thus the serial driver would crash doing IO cycles.)
> Why can't we do the ioremap_explicit right after setting up the PHBs?
> 
> The second thing that annoys me is that we are also doing an
> ioremap_explicit for each p2p bridge IO space, aren't we? I don't fully
> understand the logic here. Aren't those supposed to be fully enclosed by
> their parent PHB IO space, and thus mapped by those?
> 
> Thanks for enlightening me,
> Ben.