RFC: Deprecating io_block_mapping

Dan Malek dan at embeddededge.com
Thu May 26 02:36:30 EST 2005


On May 25, 2005, at 3:04 AM, Benjamin Herrenschmidt wrote:

>> Can one of you explain why this is necessary?  I believe it, I just
>> don't understand.  I think this is one of the abuses of
>> io_block_mapping().  People, myself included, realize some of the
>> caveats implied by calling io_block_mapping().
>
> Well, there are 2 different things here. io_block_mapping "moving"
> ioremap_bot, and my idea of having io_block_mapping "using" it...

It's more complicated than that.  The basic Linux kernel VM map is
kernel_base (usually 0xc0000000), kernel text, kernel data, VM guard,
VM alloc space, then ioremap space.

However, there are "holes" in the VM space that are completely
unused, and this is a precious resource.  The io_block_mapping()
gives us the ability to stick things into those holes.  Usually, we
would configure a system with a 2G user space, then use io_block_mapping()
to allocate the space between 0x80000000 and 0xc0000000.
The ioremap() isn't going to do this, unless we really make this
smarter.  On many systems, this was also the mapping for the
PCI space, so things like virt_to_xxx() were based on the assumptions
of this mapping.  So, if a board port wanted to use the option of
user task space configuration, it would have to also manage these
fixed address spaces accordingly.

This is not as simple as making io_block_mapping() use ioremap VM
space.  We have to find a way of managing all of the free kernel VM
space and ensuring all of the mapping APIs for IO know about and
utilize all of this space.

> Now, my idea is that I dislike the io_block_mapping() interface because
> we have to provide the virtual address. Which means, it forces us to
> create hard coded v->p mappings, and I consider hard coding virtual
> addresses a bad thing (for lots of reasons, including the TASK_SIZE
> one).

Then you'd better get in line behind me in arguing for much better
VM space management in general :-)  Linux is horrible in this regard,
and the replies I get are "...  for efficiency you have to know the use
of the spaces and the proper APIs to manage them ..."


> Thus, I think we could "extend" io_block_mapping() to be able to take
> "0" for virt, and return a virtual address.

But, no one would use that because it doesn't have the proper effect.
If this could be done, we would already be using ioremap().

> Dan's point about io_block_mapping() supposedly "initializing"
> ioremap_bot is bogus, unless I misunderstood him.

I never said that, but if you look at the code, that's exactly what it
does :-)  Any mappings done between the top of user space and the
bottom of the kernel are simply forced in and ignored by the Linux VM.  The
io_block_mapping() is used to allocate BATs and CAMs and make
them available for ioremap() of devices.  It allows us to map various
devices into the ioremap space, take advantage of the efficiency of
BATs or large page mappings, and still have devices use the ioremap()
to find them.

As I keep saying, somehow you have to lay out the virtual to physical
mapping of devices using the efficiency of BATs and CAMs, and still
make the ioremap() interface work.  The device driver just calls
ioremap(), but if you have a smart board set up function, it can set
up an efficient mapping using BATs or CAMs rather than 4k pages
requiring TLB exceptions.

We can either make ioremap() really complex with knowledge of all of
these board configuration options so it can set up the BATs and CAMs,
or we set it all up using some functions (like io_block_mapping) in the
board set up and keep ioremap() a simple function.

The current implementation of io_block_mapping serves two very important
functions.  One is this set up of efficient mapping for ioremap(), and
the other is to utilize the kernel VM space that isn't managed by Linux.
We are currently moving lots of the code to make use of ioremap() rather
than assuming prior mapping, which is a nice thing, but it's costing us
in terms of performance and resource utilization.

Thanks.


	-- Dan



