mapping memory in 0xb space

Sat Oct 2 04:05:12 EST 2004

A question for the rest of you, who haven't been following this thread.
Is there publicly available documentation on the power4 extensions,
specifically the large page support, how it affects the HPT hashing, and
the SLB, including the new instructions for maintaining it in software?
I haven't been able to find anything yet.

On Fri, 1 Oct 2004, David Gibson wrote:

> On Wed, Sep 29, 2004 at 12:14:08AM -0500, Igor Grobman wrote:
> > On Wed, 29 Sep 2004, David Gibson wrote:
> >
> > > On Tue, Sep 28, 2004 at 01:52:16PM -0500, Igor Grobman wrote:
> > > > On Tue, 28 Sep 2004, David Gibson wrote:
> > > >
> > > > >  Recent kernels don't even
> > > > > have VSIDs allocated for the 0xb... region.
> > > >
> > > > Looking at both 2.6.8 and 2.4.21, I don't see a difference in
> > > > get_kernel_vsid() code.
> > >
> > > Ok, *very* recent kernels.  The new VSID algorithm has gone into the
> > > BK tree since 2.6.8.
> >
> > From the description I read, I might be better off using 0xfff.. addresses
> > with that algorithm.  Not a big deal.
>
> Perhaps.  However, there are issues there as well: older kernels have
> the same 41-bit address restriction (maybe somewhat extendable) in the
> 0xf region, just like 0xb.  The new VSID algo gives VSIDs for every
> address above 0xc000000000000000 *except* the very last segment,
> 0xfffffffff0000000-0xffffffffffffffff.

Lucky me!  I'll take a look at what the VSID for the last segment
conflicts with; maybe it will be something unused.  Otherwise I'll have
to think of something else clever.  Right now, I still want my 2.4.21
implementation to work.

> > > > Also, I narrowed it down to
> > > > working (or appearing to work) as long as the highest 5 bits of the page
> > > > index (those that end up as partial index in the HPTE) are zero.  This may
> > > > just be a weird coincidence.
> > >
> > > Could be.
> > >
> > > > > Why on earth do you want to do this?
> > > >
> > > > Good question ;-).  A long long time ago, I posted on this list and
> > > > explained.  Since then, I found what appeared to be a solution, except
> > > > that it appears power4 breaks it.  I am building a tool that allows
> > > > dynamic splicing of code into a running kernel (see
> > > > http://www.paradyn.org/html/kerninst.html).  In order for this to work, I
> > > > need to be able to overwrite a single instruction with a jump to
> > > > spliced-in code.  The target of the jump needs to be within the range (26
> > > > bits).  Therefore, I have a choice of 0xbfff.. addresses with backward
> > > > jumps from 0xc region, or the 0xff.. addresses for absolute jumps.  I
> > > > chose 0xbff.., because I found already-working code, originally written
> > > > for the performance counter interface.  Am I making more sense now?
> > >
> > > Aha!  But this does actually explain the problem - there are only
> > > VSIDs assigned for the first 2^41 bits of each region - so although
> > > there are vsids for 0xb000000000000000-0xb00001ffffffffff, there
> > > aren't any for 0xbff... addresses.  Likewise the Linux pagetables only
> > > cover a 41-bit address range, but that won't matter if you're creating
> > > HPTEs directly.
> >
> > And this is why I avoided explaining fully in my first email :-).  I'd
> > like to solve one problem at a time.  What I said in my initial email
> > is accurate.  Even within the valid VSID range, if the highest 5 bits of
> > the page index are not zero, I get a crash on access (e.g.
> > 0xb00001FFFFF00000, but works on 0xb00001FFF0000000).
>
> Hrm.  Ok.  I'm not sure why that would be.

Here is some more background.  Maybe it will help you think of what's
going wrong here.  I noticed that if I write to the remapped
0xb00001FFF0000000, the changes do not show up at the physical address I
mapped it to.  I then saw that get_free_page() returns a 4K page frame
above 256MB, which means that in reality, it's an address within a
large page.  The SLB entry created by do_slb_bolted likewise has the
large page bit set.  I changed my code to create an HPTE mapping for
the large page, and finally I get a sensible result: changes to the
remapped page show up on the physical page.  Note that even though I
create a mapping for the whole large page, I only write to the 4K chunk
that corresponds to the address returned by get_free_page() -- I do not
want to clobber random memory.

In summary, mapping the first large page of the 0xb00001FFF segment works,
but mapping any other large page within that segment causes a kernel
crash.  There must be something I don't understand about how large pages
fit into the HPT.  Could you point me to documentation on the large page
extensions of power4, and, while we are at it, documentation on the SLB?
So far, I have simply guessed at how it works, based on the code I see
in the kernel.

For what it's worth, here is (roughly) the relevant code I am using:

frame = get_free_page(GFP_KERNEL);
/* Round down so the physical address points at the start of the
   large page that contains the frame. */
pa = (unsigned long)__v2a(frame) & 0xFFFFFFFFFF000000UL;

ea = 0xb00001FFFF000000UL;
vsid = get_kernel_vsid(ea);
va = ( vsid << 28 ) | ( ea & 0xfffffff );
vpn = va >> PAGE_SHIFT;
rpn = pa >> PAGE_SHIFT;
hpteflags = _PAGE_ACCESSED|_PAGE_COHERENT|PP_RWXX;
slot = ppc_md->hpte_insert(vpn, rpn, hpteflags, 1, 1);

/* Only access the relevant 4K chunk within the large page. */
smallpage_offset = (unsigned long)__v2a(frame) - pa;
return ea + smallpage_offset;
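
To make the addresses in this thread easier to compare, the following is a
minimal standalone sketch (user-space C, not kernel code) that splits an
effective address into its offset within a 256MB segment and the page index
for 4K and 16MB pages.  The helper names are illustrative and do not exist in
the kernel.  It only does the arithmetic; it does not explain why the crash
occurs.

```c
#include <stdint.h>

#define SEGMENT_SHIFT     28  /* 256MB segments */
#define SMALL_PAGE_SHIFT  12  /* 4K pages */
#define LARGE_PAGE_SHIFT  24  /* 16MB large pages */

/* Offset of an effective address within its 256MB segment. */
static uint64_t segment_offset(uint64_t ea)
{
    return ea & ((UINT64_C(1) << SEGMENT_SHIFT) - 1);
}

/* 16-bit index of the 4K page within the segment. */
static unsigned small_page_index(uint64_t ea)
{
    return (unsigned)(segment_offset(ea) >> SMALL_PAGE_SHIFT);
}

/* 4-bit index of the 16MB large page within the segment. */
static unsigned large_page_index(uint64_t ea)
{
    return (unsigned)(segment_offset(ea) >> LARGE_PAGE_SHIFT);
}

/* Highest 5 bits of the 16-bit 4K page index. */
static unsigned small_page_index_top5(uint64_t ea)
{
    return small_page_index(ea) >> 11;
}
```

For 0xb00001FFF0000000 both page indexes are zero.  For 0xb00001FFFFF00000
the top five bits of the 4K page index are all set (0x1f) and the large page
index is 0xf.  The ea used in the code above, 0xb00001FFFF000000, also has
large page index 0xf, so it is not the first large page of its segment.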

>
> > As for why I thought 0xbff would work,  I reasoned that
> > since the highest bits are masked out in get_kernel_vsid(), and since
> > nobody else is using the 0xb region, it doesn't matter if I get a VSID
> > that is the same as some other VSID in 0xb region.  However, I did not
> > consider the bug in do_slb_bolted that you describe below.
>
> Yes, with that bug the collision can be with a segment anywhere, not
> just in the 0xb region.

OK, I will deal with this, somehow.  The binary patch idea might just work.
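
As an illustration of the aliasing discussed above, here is a small sketch
of what masking the high bits implies.  It assumes that only the segment
index within the first 2^41 bytes of a region is kept, i.e. 13 bits after
the 28-bit segment offset.  That width is derived from the 41-bit figure
quoted in this thread, not copied from get_kernel_vsid(), and the function
names are hypothetical.  It does not model the do_slb_bolted bug, which
can cause collisions outside the region.

```c
#include <stdint.h>

/* Assumption: 41-bit range per region, 256MB segments => 13 index bits. */
#define SEGMENT_INDEX_MASK 0x1fffu

/* Segment index that survives once the high bits are masked out. */
static unsigned masked_segment_index(uint64_t ea)
{
    return (unsigned)((ea >> 28) & SEGMENT_INDEX_MASK);
}

/* Region is taken to be the top nibble of the address (0xb, 0xc, ...). */
static int same_region(uint64_t a, uint64_t b)
{
    return (a >> 60) == (b >> 60);
}

/* Two distinct segments alias if they differ only in the masked-out bits. */
static int would_alias(uint64_t a, uint64_t b)
{
    return same_region(a, b)
        && (a >> 28) != (b >> 28)
        && masked_segment_index(a) == masked_segment_index(b);
}
```

Under this assumption, a 0xbff.. address such as 0xbfffffffff000000 would
alias the last segment of the valid range, which starts at
0xb00001fff0000000.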

> Though, come to that, you do only need one segment, so it might not be
> that hard to binary patch in branch to some code of your own which
> provides a VSID for that one segment.
>
> > It's starting to sound like an impossible task (at least on non-recent
> > kernels).  I think I might go with a backup suboptimal solution, which
> > involves extra jumps, but at least it might work.
>
> That may be a better idea.

I'd like to avoid this, but if I only have to incur this for the binary
patch to do_slb_bolted, I might be fine.
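
For reference, the 26-bit branch range that drives the choice of addresses
can be sketched as follows.  This is a standalone illustration of the
PowerPC unconditional branch (opcode 18, 24-bit LI field shifted left by
two); the helper names are made up for this example and are not taken from
kerninst or the kernel.

```c
#include <stdint.h>

/* A 26-bit signed displacement reaches 32MB in either direction. */
#define BRANCH_REACH (INT64_C(1) << 25)

/* Relative branch: target minus site must fit in 26 signed bits. */
static int reachable_relative(uint64_t site, uint64_t target)
{
    int64_t disp = (int64_t)(target - site);
    return (disp & 3) == 0 && disp >= -BRANCH_REACH && disp < BRANCH_REACH;
}

/* Absolute branch (AA=1): the target is the sign-extended 26-bit field,
   so only the lowest and the highest 32MB of the address space qualify. */
static int reachable_absolute(uint64_t target)
{
    int64_t t = (int64_t)target;
    return (t & 3) == 0 && t >= -BRANCH_REACH && t < BRANCH_REACH;
}

/* Encode "b" (or "ba" when absolute is nonzero) with LK=0.
   li is the displacement, or the target address for the absolute form. */
static uint32_t encode_branch(int64_t li, int absolute)
{
    return (UINT32_C(18) << 26)
         | ((uint32_t)li & UINT32_C(0x03fffffc))
         | (absolute ? 2u : 0u);
}
```

With these definitions, a site at 0xc000000000000000 can reach
0xbffffffffff00000 with a backward relative branch, and 0xffffffffff000000
is reachable from anywhere with an absolute branch.  Nothing in the valid
0xb00001ff.. range is reachable either way.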

Thanks,
Igor