From mac at melware.de  Tue Sep  3 18:43:57 2002
From: mac at melware.de (Armin Schindler)
Date: Tue, 3 Sep 2002 10:43:57 +0200 (MEST)
Subject: PCI endianess
In-Reply-To: <1028316150.991.50.camel@q.rchland.ibm.com>
Message-ID: <Pine.LNX.4.31.0209031038150.14941-100000@phoenix.one.melware.de>


Hi,

as far as I know, the PCI bus is little endian, even on big endian machines.
Is this true for RS6000 (44p/270) too ?

The reason I'm asking is that, while porting a PCI card driver, I
noticed that I have to convert the byte order when reading or writing
the mapped PCI memory of my card.

Did I miss something here ?

Thanks,
Armin


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From aprasad at in.ibm.com  Tue Sep  3 18:45:53 2002
From: aprasad at in.ibm.com (Anil K Prasad)
Date: Tue, 3 Sep 2002 14:15:53 +0530
Subject: PCI endianess
Message-ID: <OF13F5059E.B16E16F0-ON65256C29.002FE750@in.ibm.com>


>as far as I know, the PCI bus is little endian, even on big endian
machines.
>Is this true for RS6000 (44p/270) too ?

Yes, even on RS6K machines it's little endian.
You should write to PCI mapped registers in little endian byte order.

Regards,
Anil.



From mac at melware.de  Tue Sep  3 19:22:40 2002
From: mac at melware.de (Armin Schindler)
Date: Tue, 3 Sep 2002 11:22:40 +0200 (MEST)
Subject: PCI endianess
In-Reply-To: <OF13F5059E.B16E16F0-ON65256C29.002FE750@in.ibm.com>
Message-ID: <Pine.LNX.4.31.0209031118010.15283-100000@phoenix.one.melware.de>


On Tue, 3 Sep 2002, Anil K Prasad wrote:
> >as far as I know, the PCI bus is little endian, even on big endian
> machines.
> >Is this true for RS6000 (44p/270) too ?
>
> Yes, even on RS6K machines its little endian.
> You should write to PCI mapped register in little endian byte order.

So commands/data to my memory-mapped PCI card need to be converted, right ?
E.g. if the card's memory has a structure like
struct {
  u16 command;
  u32 length;
  u32 data[128];
};

do I need to convert all data written to that structure ?

Armin



From aprasad at in.ibm.com  Tue Sep  3 19:54:48 2002
From: aprasad at in.ibm.com (Anil K Prasad)
Date: Tue, 3 Sep 2002 15:24:48 +0530
Subject: PCI endianess
Message-ID: <OFEA22B462.B0FE1C32-ON65256C29.00350849@in.ibm.com>


>> >as far as I know, the PCI bus is little endian, even on big endian
>> machines.
>> >Is this true for RS6000 (44p/270) too ?
>>
>> Yes, even on RS6K machines its little endian.
>> You should write to PCI mapped register in little endian byte order.

>So commands/data to my memory-mapped PCI card need to be converted, right
?
>E.g. if the cards memory has a structure like
>struct {
>  u16 command;
>  u32 length;
>  u32 data[128];
>}

>I need to convert all data to that structure ?
If you need to copy this data from system memory to card memory (or the
other way), you need to convert it from big-endian to little-endian (or the
reverse). For example, if the structure is filled in like this:

x.command = 0x1234;
x.length = 0x567890AB;

where x is an instance of the above struct, and x is at address 0x10000000
in system memory, then the structure looks like this in system memory:

0x10000000 --> 0x12
0x10000001 --> 0x34 (0x10000002 and 0x10000003 are unused because of
padding)
0x10000004 --> 0x56
0x10000005 --> 0x78
0x10000006 --> 0x90
0x10000007 --> 0xAB


Inside card memory this structure should look like this (let's assume the
address of the structure is 0xf0000000):

0xf0000000 --> 0x34
0xf0000001 --> 0x12
(0xf0000002 and 0xf0000003 are padding)
0xf0000004 --> 0xAB
0xf0000005 --> 0x90
0xf0000006 --> 0x78
0xf0000007 --> 0x56
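
The two byte layouts can be reproduced with a small user-space C sketch.
It is only an illustration, not code from this thread: the helper names
(put_be16, layout_host, layout_card and so on) are made up here, and it
assumes natural alignment, so that length sits at offset 4 in both views
with two padding bytes before it.

```c
#include <stdint.h>
#include <string.h>

/* Offsets assumed from natural alignment of { u16 command; u32 length; ... }:
 * command at 0, two padding bytes at 2-3, length at 4. */
enum { OFF_COMMAND = 0, OFF_LENGTH = 4, HDR_SIZE = 8 };

/* Store most-significant byte first, as a big-endian CPU does. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Store least-significant byte first, as the little-endian card expects. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Header bytes as they sit in big-endian system memory. */
void layout_host(uint8_t buf[HDR_SIZE], uint16_t command, uint32_t length)
{
    memset(buf, 0, HDR_SIZE);            /* padding bytes left as zero */
    put_be16(buf + OFF_COMMAND, command);
    put_be32(buf + OFF_LENGTH, length);
}

/* Header bytes as the little-endian card wants to find them. */
void layout_card(uint8_t buf[HDR_SIZE], uint16_t command, uint32_t length)
{
    memset(buf, 0, HDR_SIZE);
    put_le16(buf + OFF_COMMAND, command);
    put_le32(buf + OFF_LENGTH, length);
}
```

Calling both functions with command 0x1234 and length 0x567890AB and
dumping the buffers shows each field byte-reversed in place, while the
field offsets themselves do not move.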

Regards,
Anil.



From benh at kernel.crashing.org  Wed Sep  4 04:09:36 2002
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 3 Sep 2002 20:09:36 +0200
Subject: PCI endianess
In-Reply-To: <OF13F5059E.B16E16F0-ON65256C29.002FE750@in.ibm.com>
References: <OF13F5059E.B16E16F0-ON65256C29.002FE750@in.ibm.com>
Message-ID: <20020903180936.23346@192.168.4.1>


>>as far as I know, the PCI bus is little endian, even on big endian
>machines.
>>Is this true for RS6000 (44p/270) too ?
>
>Yes, even on RS6K machines its little endian.
>You should write to PCI mapped register in little endian byte order.

Which should be done with the {read,write}{b,w,l} accessors in
asm/io.h which are supposed to do the byteswapping & IO barriers
for you on big endian archs.

Ben.



From mac at melware.de  Wed Sep  4 16:54:17 2002
From: mac at melware.de (Armin Schindler)
Date: Wed, 4 Sep 2002 08:54:17 +0200 (MEST)
Subject: PCI endianess
In-Reply-To: <OFEA22B462.B0FE1C32-ON65256C29.00350849@in.ibm.com>
Message-ID: <Pine.LNX.4.31.0209040846030.23869-100000@phoenix.one.melware.de>


On Tue, 3 Sep 2002, Anil K Prasad wrote:
> >> >as far as I know, the PCI bus is little endian, even on big endian
> >> machines.
> >> >Is this true for RS6000 (44p/270) too ?
> >>
> >> Yes, even on RS6K machines its little endian.
> >> You should write to PCI mapped register in little endian byte order.
>
> >So commands/data to my memory-mapped PCI card need to be converted, right
> ?
> >E.g. if the cards memory has a structure like
> >struct {
> >  u16 command;
> >  u32 length;
> >  u32 data[128];
> >}
>
> >I need to convert all data to that structure ?
> If you need to copy this data from system memory to card memory (or the
> other way), you need to convert it from big-endian to little-endian (or the
> reverse). For example, if the structure is filled in like this:
>
> x.command = 0x1234;
> x.length = 0x567890AB;
>
> where x is an instance of the above struct, and x is at address 0x10000000
> in system memory, then the structure looks like this in system memory:
>
> 0x10000000 --> 0x12
> 0x10000001 --> 0x34 (0x10000002 and 0x10000003 are unused because of
> padding)
> 0x10000004 --> 0x56
> 0x10000005 --> 0x78
> 0x10000006 --> 0x90
> 0x10000007 --> 0xAB
>
>
> Inside card memory this structure should look like this (let's assume the
> address of the structure is 0xf0000000):
>
> 0xf0000000 --> 0x34
> 0xf0000001 --> 0x12
> (0xf0000002 and 0xf0000003 are padding)
> 0xf0000004 --> 0xAB
> 0xf0000005 --> 0x90
> 0xf0000006 --> 0x78
> 0xf0000007 --> 0x56

Do you mean this happens without conversion (e.g. read/write{b,w,l}) ?
Then it is exactly what I need.
If I do on my big endian machine:
 unsigned short val = 0x1234;
 x->command = val;
Is 0x1234 stored as little endian in the card's RAM ?
Since my card needs little endian (the on-card CPU is little endian)
this would be what I need, and changing the existing code wouldn't be
necessary. Or did I understand this completely wrong ?

You wrote that addresses 2 and 3 are not used because of padding, but
I didn't notice this before. My structure needs to be __packed__, which
so far has worked without telling the compiler. Or will GCC on PPC64 do
it differently ?

Armin



From mac at melware.de  Wed Sep  4 17:02:27 2002
From: mac at melware.de (Armin Schindler)
Date: Wed, 4 Sep 2002 09:02:27 +0200 (MEST)
Subject: 32bit kernel
In-Reply-To: <20020903180936.23346@192.168.4.1>
Message-ID: <Pine.LNX.4.31.0209040857110.24023-100000@phoenix.one.melware.de>


Hi,

I use SuSE SLES 7 on my 44p/270. To make porting a driver to
big-endian/64bit a little easier, I want to do it in two steps, and I
thought starting with big-endian but not 64bit would be a good idea.
So the question is: is it possible to use a 32bit Linux kernel (ppc)
on this machine together with SLES 7 ?
What configuration of kernel and yaboot do I need for it ?
If there is documentation for this, I didn't find it.

Thanks for any hints,

Armin



From aprasad at in.ibm.com  Wed Sep  4 18:00:14 2002
From: aprasad at in.ibm.com (Anil K Prasad)
Date: Wed, 4 Sep 2002 13:30:14 +0530
Subject: PCI endianess
Message-ID: <OF01ECB0C5.9EC9320D-ON65256C2A.002B225A@in.ibm.com>


>Do you mean this happens without conversion (e.g. read/write{b,w,l}) ?
>Then it is exactly what I need.
>If I do on my big endian machine:
> unsigned short val = 0x1234;
> x->command = val;
you must do something like this:

out_le16(&x->command, val); for 16 bit data
and out_le32 for 32 bit data.

Using out_le16/32 ensures portability across platforms of different
endianness.


>Is 0x1234 stored as little endian on the cards ram ?
Yes.


>You wrote that addresses 2 and 3 are not used because of padding, but
>I didn't notice this before. My structure needs to be __packed__ which
>does work without telling this the compiler. Or will GCC on PPC64 do it
>different ?

I am not sure about this.
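
One way to settle the padding question on a given compiler is to print
the field offsets. The sketch below is an assumption-laden illustration,
not something posted in this thread: it uses the GCC/Clang
__attribute__((packed)) extension, fixed-width types in place of the
kernel's u16/u32, and made-up struct and function names.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout the compiler picks on its own (natural alignment). */
struct msg_natural {
    uint16_t command;
    uint32_t length;
    uint32_t data[128];
};

/* Same fields with padding suppressed (GCC/Clang extension). */
struct msg_packed {
    uint16_t command;
    uint32_t length;
    uint32_t data[128];
} __attribute__((packed));

/* Offset of length when the compiler is free to insert padding. */
size_t natural_length_offset(void)
{
    return offsetof(struct msg_natural, length);
}

/* Offset of length when padding is suppressed. */
size_t packed_length_offset(void)
{
    return offsetof(struct msg_packed, length);
}
```

On ABIs where a 32-bit integer is 4-byte aligned, which includes the
usual GCC targets for x86 and PowerPC, the natural offset of length is 4
and the packed offset is 2. If the card firmware expects length directly
after command, the structure has to be declared packed explicitly.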

Regards,
Anil.



From mac at melware.de  Wed Sep  4 18:27:44 2002
From: mac at melware.de (Armin Schindler)
Date: Wed, 4 Sep 2002 10:27:44 +0200 (MEST)
Subject: PCI endianess
In-Reply-To: <OF01ECB0C5.9EC9320D-ON65256C2A.002B225A@in.ibm.com>
Message-ID: <Pine.LNX.4.31.0209041025450.24717-100000@phoenix.one.melware.de>


On Wed, 4 Sep 2002, Anil K Prasad wrote:
> >Do you mean this happens without conversion (e.g. read/write{b,w,l}) ?
> >Then it is exactly what I need.
> >If I do on my big endian machine:
> > unsigned short val = 0x1234;
> > x->command = val;
> you must do something like this:
>
> out_le16(&x->command, val); for 16 bit data
> and out_le32 for 32 bit data.
>
> Using these out_le16/32 will ensure portability across different endian
> platforms.
>
>
> >Is 0x1234 stored as little endian on the cards ram ?
> Yes.

Now I'm really confused. "Yes" with or without out_le16/32 ?

When I just do
 x->command = val;
without out_leXX, is 0x1234 stored in the PCI card's RAM in little endian ?

Armin



From aprasad at in.ibm.com  Wed Sep  4 18:33:13 2002
From: aprasad at in.ibm.com (Anil K Prasad)
Date: Wed, 4 Sep 2002 14:03:13 +0530
Subject: PCI endianess
Message-ID: <OF2C299C9A.3E8F5EC2-ON65256C2A.002E6B90@in.ibm.com>


>When I just do
>x->command = val;
>without out_leXX, is 0x1234 stored in the PCI card's RAM in little endian ?

No. A plain store writes the bytes in big-endian order, so the
little-endian card will see 0x3412 in its RAM. You should use out_leXX to
transfer data from system memory to card memory.
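
The effect can be modelled in user space. This is a sketch with made-up
helper names, not kernel code: a big-endian CPU's plain 16-bit store and
a little-endian card's 16-bit load are both simulated with explicit byte
accesses, so the result does not depend on the host it runs on.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t swab16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Model of a plain store by a big-endian CPU: high byte at the low address. */
static void be_cpu_store16(uint8_t *mem, uint16_t v)
{
    mem[0] = (uint8_t)(v >> 8);
    mem[1] = (uint8_t)(v & 0xff);
}

/* Model of a load by the little-endian CPU on the card. */
static uint16_t le_card_load16(const uint8_t *mem)
{
    return (uint16_t)(mem[0] | (mem[1] << 8));
}

/* What the card reads after "x->command = val;" on the host. */
uint16_t card_sees_plain(uint16_t val)
{
    uint8_t mem[2];
    be_cpu_store16(mem, val);
    return le_card_load16(mem);
}

/* What the card reads when the host swaps the bytes before storing. */
uint16_t card_sees_swapped(uint16_t val)
{
    uint8_t mem[2];
    be_cpu_store16(mem, swab16(val));
    return le_card_load16(mem);
}
```

With val = 0x1234 the plain store is read back by the card as 0x3412,
while the swapped store is read back as 0x1234.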

Regards,
Anil.



From bergner at brule.borg.umn.edu  Thu Sep  5 00:29:23 2002
From: bergner at brule.borg.umn.edu (Peter Bergner)
Date: Wed, 4 Sep 2002 09:29:23 -0500
Subject: 32bit kernel
In-Reply-To: <Pine.LNX.4.31.0209040857110.24023-100000@phoenix.one.melware.de>
References: <20020903180936.23346@192.168.4.1> <Pine.LNX.4.31.0209040857110.24023-100000@phoenix.one.melware.de>
Message-ID: <20020904092923.A1002063@brule.borg.umn.edu>


Armin Schindler wrote:
: I use SuSE SLES 7 on my 44p/270 and to do the porting of
: a driver to big-endian/64bit a little bit easier (in two steps),
: I thought beginning with big-endian and not 64bit is a good idea.
: So the question is, is it possible to use a 32bit linux kernel (ppc)
: on this machine together with SLES 7 ?
: What configuration of kernel and yaboot do I need for it ?
: If there is documentation for this, I didn't find it.

A 32-bit kernel should work just fine on that box.  In fact, the
SLES7 install CD's already have a 32-bit POWER3 kernel on them.
You should be able to use the one on that CD.  You don't need
to make any yaboot changes.  Just use the one that was installed.
Note that the 32-bit kernel does not run on *star or POWER4
processors, only the POWER3 processor you have.

Peter



From mac at melware.de  Thu Sep  5 00:42:31 2002
From: mac at melware.de (Armin Schindler)
Date: Wed, 4 Sep 2002 16:42:31 +0200 (MEST)
Subject: 32bit kernel
In-Reply-To: <20020904092923.A1002063@brule.borg.umn.edu>
Message-ID: <Pine.LNX.4.31.0209041640420.28115-100000@phoenix.one.melware.de>


On Wed, 4 Sep 2002, Peter Bergner wrote:
> Armin Schindler wrote:
> : I use SuSE SLES 7 on my 44p/270 and to do the porting of
> : a driver to big-endian/64bit a little bit easier (in two steps),
> : I thought beginning with big-endian and not 64bit is a good idea.
> : So the question is, is it possible to use a 32bit linux kernel (ppc)
> : on this machine together with SLES 7 ?
> : What configuration of kernel and yaboot do I need for it ?
> : If there is documentation for this, I didn't find it.
>
> A 32-bit kernel should work just fine on that box.  In fact, the
> SLES7 install CD's already have a 32-bit POWER3 kernel on them.
> You should be able to use the one on that CD.  You don't need
> to make any yaboot changes.  Just use the one that was installed.
> Note that the 32-bit kernel does not run on *star or POWER4
> processors, only the POWER3 processor you have.

Thanks, that is what I needed to know.
If I want to compile my own kernel, do I just need to set
it to POWER3 ? Which kernel image do I have to use for yaboot ?

Armin



From tinglett at vnet.ibm.com  Thu Sep  5 00:43:04 2002
From: tinglett at vnet.ibm.com (Todd Inglett)
Date: 04 Sep 2002 09:43:04 -0500
Subject: PCI endianess
In-Reply-To: <OF01ECB0C5.9EC9320D-ON65256C2A.002B225A@in.ibm.com>
References: <OF01ECB0C5.9EC9320D-ON65256C2A.002B225A@in.ibm.com>
Message-ID: <1031150588.12740.12.camel@q.rchland.ibm.com>


On Wed, 2002-09-04 at 03:00, Anil K Prasad wrote:
>
> >Do you mean this happens without conversion (e.g. read/write{b,w,l}) ?
> >Then it is exactly what I need.
> >If I do on my big endian machine:
> > unsigned short val = 0x1234;
> > x->command = val;
> you must do something like this:
>
> out_le16(&x->command, val); for 16 bit data
> and out_le32 for 32 bit data.
>
> Using these out_le16/32 will ensure portability across different endian
> platforms.

You want to use the read/write macros that you get when you include
<asm/io.h>.  You don't need to specify the "le" macros because PCI by
definition is little endian.  Therefore you will find that on big endian
machines the macros will byteswap for you.  Thus:

val = readb(addr) will read an 8-bit value
val = readw(addr) will read a 16-bit value (with implied byte swap)
val = readl(addr) will read a 32-bit value (with implied byte swap)

likewise:

writeb(val, addr), writew(val,addr) and writel(val, addr) will write 8,
16 and 32 bit values respectively and they will do the byte swap for
you.

If you want to copy/clear ranges of bytes note also the memset_io,
memcpy_fromio and memcpy_toio macros (these don't do swapping -- they
work at the byte level).

The "addr" for these macros is always an address within a range you got
when you ioremap a particular memory BAR of your PCI adapter.  If you
are using an I/O BAR use the inb, inw, inl, outb, outw or outl macros
instead.

One final point:  an important reason not to write your own variations
of these accessor functions is that you don't know what is required for
access for each individual architecture.  On ppc64 there are barrier
instructions that must be executed for these operations to succeed
correctly.  If "Enhanced Error Handling" is enabled (true under a
hypervisor at least) the "addresses" returned by ioremap are actually
token values that are trivially translated by the io macros.  You don't
want to try to replicate that in your driver.
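
To make the semantics concrete, here is a user-space model of the
accessors applied to the struct discussed earlier in the thread. It is
only a sketch: the model_* functions are hypothetical stand-ins that
mimic the little-endian byte order of readw/readl/writew/writel and the
byte-level behaviour of memcpy_toio, and they deliberately ignore the
barriers and address tokens mentioned above. A real driver should call
the kernel macros on an ioremap'ed address and not reimplement them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of writew(val, addr): 16-bit store in little-endian byte order. */
void model_writew(uint16_t val, uint8_t *addr)
{
    addr[0] = (uint8_t)(val & 0xff);
    addr[1] = (uint8_t)(val >> 8);
}

/* Model of writel(val, addr): 32-bit store in little-endian byte order. */
void model_writel(uint32_t val, uint8_t *addr)
{
    addr[0] = (uint8_t)val;
    addr[1] = (uint8_t)(val >> 8);
    addr[2] = (uint8_t)(val >> 16);
    addr[3] = (uint8_t)(val >> 24);
}

/* Model of readw(addr). */
uint16_t model_readw(const uint8_t *addr)
{
    return (uint16_t)(addr[0] | (addr[1] << 8));
}

/* Model of readl(addr). */
uint32_t model_readl(const uint8_t *addr)
{
    return (uint32_t)addr[0] | ((uint32_t)addr[1] << 8) |
           ((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
}

/* Model of memcpy_toio(): a byte copy with no swapping at all. */
void model_memcpy_toio(uint8_t *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Write the command/length header; offsets 0 and 4 assume natural alignment. */
void send_header(uint8_t *card, uint16_t command, uint32_t length)
{
    model_writew(command, card + 0);
    model_writel(length, card + 4);
}
```

The caller passes plain CPU-order values; the byte order on the card side
is entirely the accessor's business, which is why the same driver source
can work on both little and big endian hosts.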

-todd



From bergner at brule.borg.umn.edu  Thu Sep  5 01:57:30 2002
From: bergner at brule.borg.umn.edu (Peter Bergner)
Date: Wed, 4 Sep 2002 10:57:30 -0500
Subject: 32bit kernel
In-Reply-To: <Pine.LNX.4.31.0209041640420.28115-100000@phoenix.one.melware.de>
References: <20020904092923.A1002063@brule.borg.umn.edu> <Pine.LNX.4.31.0209041640420.28115-100000@phoenix.one.melware.de>
Message-ID: <20020904105730.B1002063@brule.borg.umn.edu>


Armin Schindler wrote:
: Thanks, that is what I needed to know.
: If I want to compile my own kernel, so I just need to set
: it to POWER3 ? Which Kernel-Image do I have to use for yaboot ?

I can't help you with building the ppc32 kernel as I only build
the ppc64 kernel.  However, I think the k_chrp64.rpm package on
SLES7 CD1 (suse/images/k_chrp64.rpm) contains a ppc32 kernel
you can use.

Peter



From mac at melware.de  Fri Sep  6 18:13:05 2002
From: mac at melware.de (Armin Schindler)
Date: Fri, 6 Sep 2002 10:13:05 +0200 (MEST)
Subject: 32bit kernel
In-Reply-To: <20020904105730.B1002063@brule.borg.umn.edu>
Message-ID: <Pine.LNX.4.31.0209061009540.9246-100000@phoenix.one.melware.de>


On Wed, 4 Sep 2002, Peter Bergner wrote:
>
> Armin Schindler wrote:
> : Thanks, that is what I needed to know.
> : If I want to compile my own kernel, so I just need to set
> : it to POWER3 ? Which Kernel-Image do I have to use for yaboot ?
>
> I can't help you with building the ppc32 kernel as I only build
> the ppc64 kernel.  However, I think the kernel located in the
> SLES7 CD1: suse/images/k_chrp64.rpm rpm contains a ppc32 kernel
> you can use.

Thanks, I found out which image to use. I compiled the SuSE 2.4.16 kernel
sources with ARCH=ppc, and the image
linux/arch/ppc/boot/images/zImage.chrp-rs6k works with yaboot on the
44p/270.

Are there known differences between ppc and ppc64 regarding performance ?

Armin



From bergner at brule.borg.umn.edu  Sat Sep  7 00:03:15 2002
From: bergner at brule.borg.umn.edu (Peter Bergner)
Date: Fri, 6 Sep 2002 09:03:15 -0500
Subject: 32bit kernel
In-Reply-To: <Pine.LNX.4.31.0209061009540.9246-100000@phoenix.one.melware.de>
References: <20020904105730.B1002063@brule.borg.umn.edu> <Pine.LNX.4.31.0209061009540.9246-100000@phoenix.one.melware.de>
Message-ID: <20020906090315.A973219@brule.borg.umn.edu>


Armin Schindler wrote:
: Are there known differences between ppc and ppc64 regarding performance ?

Not for your driver testing/porting.  You'll see a difference if your box
has more than 3G of memory as the ppc32 kernel can only access that much
on POWER3 systems (IO lives from 3G-4G).  The ppc64 kernel also gives
each ppc32 app a full 4G of virtual address space to live in/use.
I _think_ the ppc32 kernel restricts that to 2G.

Peter



From anton at samba.org  Mon Sep  9 23:01:28 2002
From: anton at samba.org (Anton Blanchard)
Date: Mon, 9 Sep 2002 23:01:28 +1000
Subject: PCI hotplug + EEH issues
Message-ID: <20020909130128.GC26700@krispykreme>


Hi,

I've almost finished a PCI hotplug driver that works on SMP and LPAR.
There is a bit more work to be done to support DLPAR (i.e. PCI-PCI bridges
appearing and disappearing), but the basics seem to work OK.

One thing that caused problems is the faking up of the device tree node
for function 0 of a PCI-PCI bridge when it's not in our partition. I no
longer do that in my local tree; I've also fixed things to not require
that fake device node.

I've also been experimenting with reworking the EEH enable code. We
should walk all devices and enable EEH before we do the bus walk,
otherwise we touch the devices with EEH disabled.

One question I had was why we are using device nodes everywhere (like
pci config read/write)? With the recent changes in 2.5, it is harder
to get back to the device node, and with a few small changes I no longer
need to convert pci_dev to device node (except for the few times at
boot when we need to fake config reads).

Speaking of which I had a look and the current LPAR behaviour (function
0 not implemented) is in violation of the PCI spec.

Anton


From tinglett at vnet.ibm.com  Tue Sep 10 07:43:05 2002
From: tinglett at vnet.ibm.com (Todd Inglett)
Date: 09 Sep 2002 16:43:05 -0500
Subject: PCI hotplug + EEH issues
In-Reply-To: <20020909130128.GC26700@krispykreme>
References: <20020909130128.GC26700@krispykreme>
Message-ID: <1031607788.12820.115.camel@q.rchland.ibm.com>

On Mon, 2002-09-09 at 08:01, Anton Blanchard wrote:
>
> Hi,
>
> Ive almost finished a PCI hotplug driver that works on SMP and LPAR.
> There is a bit more work to be done to support DLPAR (ie PCI-PCI bridges
> appearing and disappearing), but the basics seem to work OK.

Cool!  Do you have a patch?  I'd like to see this working :)

> One thing that caused problems is the faking up of the device tree node
> for function 0 of a PCI-PCI bridge when its not in our partition. I no
> longer do that in my local tree, Ive also fixed things to not require
> that fake device node.

I've hated this too.  There's got to be a better way to handle it, but
I'm guessing that you moved the logic into the Linux pci read funcs?
I didn't want to special case that code, but I guess it doesn't need to
be fast either.

> Ive also been experimenting with reworking the EEH enable code. We
> should walk all devices and enable EEH before we do the bus walk,
> otherwise we touch the devices with EEH disabled.

I have a fix coded for bug 1197.  I'll attach the patch below.

> One question I had was why we are using device nodes everywhere (like
> pci config read/write)? With the recent changes in 2.5, it is harder
> to get back to device node and with a few small changes I no longer
> need to convert pci_dev to device node. (except for the few times at
> boot when we need to fake config reads).

The main reason was to prevent Linux from accessing devices which have
failed (i.e. BIST).  I also learned last week about the OF "status"
property that we are supposed to honor, and ironically I had just
deleted that bit of code 2 weeks before.  I have that coded and ready to
check in.  It's also just a few lines of code.

So if Linux wants to touch a device that is dead, we need to intercept
it and return bad status.  This is on the config reads, of course.
There are other ways to "flag" this but if we are going to hang arch
specific data in the pci_dev it may as well be the device node.  And if
we are going to need to support early PCI config access (i.e. BIST) we
may as well use the device node as the device "handle" for the config
reads.  So they naturally fell together.

What's changed in 2.5 that has made this harder?  I agree that Linux
fights with connecting arch data into the pci_dev when that arch data is
needed for the config reads.  A more natural approach would be to
abandon the pci driver bus walk and do our own by manufacturing the
pci_dev/bus tree from the device tree.  In fact, we could create the
entire pci_dev tree without doing a single config read!

I've been meaning to try coding this to see how it goes but haven't
found the time.  I'm also not sure what function we would lose nor am I
sure how much smaller the code will get.  We'd of course be dependent on
firmware getting it right, but so far I haven't found any counter
examples -- even with unsupported adapters.  Thoughts?

> Speaking of which I had a look and the current LPAR behaviour (function
> 0 not implemented) is in violation of the PCI spec.

True...but it would be hard to re-wire the systems in the field :(.  I
suppose RTAS could work around it like we do.

Personally, I'd like to rework a lot about the way we do eeh.  For
instance, I don't think we need to encode the mmio addresses since it
doesn't appear we will get a faster way to query eeh status, nor do we
get false positives anyway (at least from the little instrumenting code
I have in there I haven't seen any).  Since false positives never/rarely
happen we can code up handling of an eeh failure to be as slow as we
want.  And the RTAS call will guarantee slowness anyway :).

-todd

-------------- next part --------------
A non-text attachment was scrubbed...
Name: eeh-init.patch
Type: text/x-diff
Size: 3536 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20020909/42fea718/attachment.patch 

From benh at kernel.crashing.org  Wed Sep 11 00:26:00 2002
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 10 Sep 2002 16:26:00 +0200
Subject: PCI hotplug + EEH issues
In-Reply-To: <1031607788.12820.115.camel@q.rchland.ibm.com>
References: <1031607788.12820.115.camel@q.rchland.ibm.com>
Message-ID: <20020910142601.31643@192.168.4.1>


>What's changed in 2.5 that has made this harder?  I agree that Linux
>fights with connecting arch data into the pci_dev when that arch data is
>needed for the config reads.  A more natural approach would be to
>abandon the pci driver bus walk and do our own by manufacturing the
>pci_dev/bus tree from the device tree.  In fact, we could create the
>entire pci_dev tree without doing a single config read!

Heh, that would be interesting ;) The current pci driver bus
walk has some other side effects, at least on pmac; I don't know
if you are affected at all though. For example, it will temporarily
disable both IO and MEM forwarding on any PCI<->PCI bridge during the
scan of devices below that bridge. On things like pmac, where the
interrupt controller may hang behind a PCI<->PCI bridge, that means
there is a small window of time where taking an interrupt will cause
a crash during the PCI bus scan. A similar issue happens with the
BAR scanning code: in order to get the size, the kernel obviously
has to disable the decoding on that BAR by writing F's to it, then
get the size, then write back the original address.
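
The BAR sizing sequence (save, write all ones, read back, restore) can be
sketched in user-space C. The fake_bar structure and its helpers are
hypothetical stand-ins for real config-space accesses, and the decode
assumes a 32-bit memory BAR, whose low four bits are flag bits rather
than address bits. This is an illustration of the technique, not the
kernel's probing code.

```c
#include <stdint.h>

/* Hypothetical model of one 32-bit memory BAR. */
struct fake_bar {
    uint32_t value;   /* current register contents */
    uint32_t size;    /* size of the decoded window, a power of two */
    uint32_t flags;   /* read-only low four bits */
};

/* The device only latches address bits at or above its window size. */
void fake_bar_write(struct fake_bar *b, uint32_t v)
{
    b->value = (v & ~(b->size - 1)) | b->flags;
}

uint32_t fake_bar_read(const struct fake_bar *b)
{
    return b->value;
}

/* Decode the window size from the value read back after writing all ones. */
uint32_t mem_bar_size(uint32_t readback)
{
    uint32_t mask = readback & ~(uint32_t)0xF;   /* drop the flag bits */
    return mask ? ~mask + 1 : 0;                 /* 0: BAR not implemented */
}

/* Full probe: between the two writes the BAR does not decode its address. */
uint32_t probe_bar_size(struct fake_bar *b)
{
    uint32_t saved = fake_bar_read(b);
    uint32_t readback;

    fake_bar_write(b, 0xFFFFFFFFu);   /* old address is no longer decoded */
    readback = fake_bar_read(b);
    fake_bar_write(b, saved);         /* restore the original address */
    return mem_bar_size(readback);
}
```

Anything that touches the device between the first and second write in
probe_bar_size() hits an address the device no longer responds to, which
is the window described above.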

So the above may end up being a problem, either if an interrupt
can happen touching a device that we have disabled because of one
of the above, or some other system service (HV ? RTAS ?).

Using only OF to populate the PCI tree would solve that problem,
though I would have to have PCI domains in the kernel to do that
properly on pmac. Unfortunately, it will also break older pmac's
where OF was so broken it wouldn't probe behind PCI<->PCI bridges
properly...

How close is the OF code between ppc32 and ppc64 ? Would it make
some sense to clean up the internal representation of the device
tree (which is, at least on ppc32, still closely tied to what I
did with BootX) and move most of the OF interface code to some
common location we could share ?

Ben.



From anton at samba.org  Wed Sep 11 01:00:16 2002
From: anton at samba.org (Anton Blanchard)
Date: Wed, 11 Sep 2002 01:00:16 +1000
Subject: PCI hotplug + EEH issues
In-Reply-To: <1031607788.12820.115.camel@q.rchland.ibm.com>
References: <20020909130128.GC26700@krispykreme> <1031607788.12820.115.camel@q.rchland.ibm.com>
Message-ID: <20020910150016.GA26567@krispykreme>


> Cool!  Do you have a patch?  I'd like to see this working :)

Yep, I'll clean it up and mail it out. I spoke to Greg KH and we
should be able to get a few things cleaned up (like moving to
driverfs which will allow us to have a hierarchy to represent pci-pci
bridges etc).

> I've hated this too.  There's got to be a better way to handle it, but
> I'm guessing is that you moved the logic into the Linux pci read funcs?
> I didn't want to special case that code, but I guess it doesn't need to
> be fast either.

Yeah, since we only need it during pci probing, it doesn't have to go
real fast.

> I have a fix coded for bug 1197.  I'll attach the patch below.

Looks good, I'll put it into 2.5 tomorrow.

> What's changed in 2.5 that has made this harder?  I agree that Linux
> fights with connecting arch data into the pci_dev when that arch data is
> needed for the config reads.  A more natural approach would be to
> abandon the pci driver bus walk and do our own by manufacturing the
> pci_dev/bus tree from the device tree.  In fact, we could create the
> entire pci_dev tree without doing a single config read!

The config read/write now gets pci_bus *bus, int devfn, int where. This
means there is yet another mapping we need to take care of.
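
As a rough illustration of that extra mapping, the sketch below turns a
(bus number, devfn) pair into a key and looks up an arch-private handle.
The bit layout of devfn matches the PCI_DEVFN/PCI_SLOT/PCI_FUNC macros in
linux/pci.h, redefined here under local names so the example builds in
user space. The node_map table, cfg_key() and lookup_node() are purely
hypothetical and do not reflect how the ppc64 code actually does it.

```c
#include <stddef.h>
#include <stdint.h>

/* Same encoding as the kernel's PCI_DEVFN, PCI_SLOT and PCI_FUNC. */
#define DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEVFN_SLOT(devfn)  (((devfn) >> 3) & 0x1f)
#define DEVFN_FUNC(devfn)  ((devfn) & 0x07)

/* Hypothetical key: bus number in the high byte, devfn in the low byte. */
uint16_t cfg_key(uint8_t busno, uint8_t devfn)
{
    return (uint16_t)((busno << 8) | devfn);
}

/* Hypothetical table entry tying a (bus, devfn) pair to a device node name. */
struct node_map {
    uint16_t key;
    const char *node_name;
};

/* Linear search; returns NULL when no node is known for the device. */
const char *lookup_node(const struct node_map *map, int n,
                        uint8_t busno, uint8_t devfn)
{
    uint16_t key = cfg_key(busno, devfn);
    int i;

    for (i = 0; i < n; i++)
        if (map[i].key == key)
            return map[i].node_name;
    return NULL;
}
```

A config accessor that receives only bus, devfn and where would need some
lookup of this kind before it could consult per-device firmware data.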

At the moment I'm trying to take a step back and make sure we haven't missed
an easier way of doing this. As I was working through it I realised we
have to do a bunch of config writes early to initialise the BARs on
some machines.

BTW I am surprised that POWER4 OF does not initialise the BARs.

> I've been meaning to try coding this to see how it goes but haven't
> found the time.  I'm also not sure what function we would lose nor am I
> sure how much smaller the code will get.  We'd of course be dependent on
> firmware getting it right, but so far I haven't found any counter
> examples -- even with unsupported adapters.  Thoughts?

Not sure, I haven't thought about this idea much. I'll ask Paul tomorrow.

> Personally, I'd like to rework a lot about the way we do eeh.  For
> instance, I don't think we need to encode the mmio addresses since it
> doesn't appear we will get a faster way to query eeh status, nor do we
> get false positives anyway (at least from the little instrumenting code
> I have in there I haven't seen any).  Since false positives never/rarely
> happen we can code up handling of an eeh failure to be as slow as we
> want  And the RTAS call will guarantee slowness anyway :).

I'm strongly in favour of this :)

Anton


From anton at samba.org  Wed Sep 11 01:03:22 2002
From: anton at samba.org (Anton Blanchard)
Date: Wed, 11 Sep 2002 01:03:22 +1000
Subject: PCI hotplug + EEH issues
In-Reply-To: <20020910142601.31643@192.168.4.1>
References: <1031607788.12820.115.camel@q.rchland.ibm.com> <20020910142601.31643@192.168.4.1>
Message-ID: <20020910150322.GB26567@krispykreme>


> How close is the OF code between ppc32 and ppc64 ? would it make
> some sense to clean up the internal representation of the device
> tree (which is, at least on ppc32, still closely tied to what I
> did with BootX) and move most of the OF interface code to some
> common location we could share ?

Yes please! We should be able to start moving large bits of the OF
interface out to somewhere common. For example the ppc64 interrupt
parsing code is still buggy (it's your original version), whereas on
ppc32 it's fixed.

Anton


From benh at kernel.crashing.org  Wed Sep 11 01:13:22 2002
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 10 Sep 2002 17:13:22 +0200
Subject: PCI hotplug + EEH issues
In-Reply-To: <20020910150322.GB26567@krispykreme>
References: <20020910150322.GB26567@krispykreme>
Message-ID: <20020910151322.26416@192.168.4.1>


>> How close is the OF code between ppc32 and ppc64 ? would it make
>> some sense to clean up the internal representation of the device
>> tree (which is, at least on ppc32, still closely tied to what I
>> did with BootX) and move most of the OF interface code to some
>> common location we could share ?
>
>Yes please! We should be able to start moving large bits of the OF
>interface out to somewhere common. For example the ppc64 interrupt
>parsing code is still buggy (its your original version), whereas on
>ppc32 its fixed.

Ok, so most of prom.c would be janitor'ed & shared, right ?

Where in the kernel tree can we put such common code ?

Ben.



From benh at kernel.crashing.org  Wed Sep 11 04:14:36 2002
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 10 Sep 2002 20:14:36 +0200
Subject: PCI hotplug + EEH issues
In-Reply-To: <1031687783.12740.125.camel@q.rchland.ibm.com>
References: <1031687783.12740.125.camel@q.rchland.ibm.com>
Message-ID: <20020910181436.20293@192.168.4.1>


>Pete Bergner and I have talked about this a lot, but haven't had time to
>go anywhere with it.  What we'd like is to not only share code, but to
>move *all* of the Open Firmware code out of the kernel proper (i.e. into
>a wrapper or probably zImage).  This isn't a new idea, of course, as it
>has been discussed for ppc32 in the past -- I'm just not sure if it has
>gone anywhere there either.

Ok, here we are mixing 2 different things

 - The actual interface to OF, which is currently done in the
kernel during the early init stage and could be moved to a
wrapper

 - The kernel's internal representation of the device-tree along
with related "parsing" functions. This is used during most of the
kernel's life and requires no interaction with OF itself.

>The zImage code would build the device tree as a data structure,
>instantiate RTAS (if RTAS exists), initialize display devices, unpack
>the kernel and pass control to the kernel with birecs for all these
>things.  The kernel would either use the device tree data structure in
>place as-is, or it would copy it and/or alter the structure as it sees
>fit.  Since this code could be identical for ppc32 (in fact, it would
>run in 32-bit even in our hardware) it would be nice to share code -- or
>at least keep the code in sync.

Well, at least some of this code will stay platform specific,
at least the one doing the actual interaction with OF as various
OF implementations tend to differ significantly, especially
regarding the way memory is allocated & mapped.

What I'm thinking about is, at first, sharing the code related
to dealing with the actual device_node data structure (which should
be cleaned up from old pmac cruft) and routines that deal with it
during kernel lifetime, typically things like get_property, find_device,
etc...

While we are at it, I'd also pick a naming convention making these
less prone to collision, like using an "of_" prefix all the time.

>Is this still the idea over in ppc32-land, or has the thinking changed
>over the past year or so? :)

Both ideas are around, but next to no time is available, at least on
my side :( (and my current lack of ppc64 hw doesn't help though
that might be solved sooner or later).

>If the code really is shared I'm not sure where it goes either.  For the
>short term I wouldn't have a problem with just keeping it in sync.  It
>would be annoying, but not difficult.

In ppc32, paulus already did the split between prom_init and prom.c.
The point here is to at least split what is related to live interaction
with OF itself from what later deals with the kernel's internal
representation of the device-tree.

Making sure we agree on these and then sharing prom.c would be a good
first step. Then, bits of ppc32's pci.c dealing with pci<->OF relationship
would benefit from some cleaning & sharing as well, though I'm more and
more thinking about going further and really sharing the whole pci.c.

That leads to an old question about how to share stuffs between arch's ;)

Ben.


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From tinglett at vnet.ibm.com  Wed Sep 11 05:56:22 2002
From: tinglett at vnet.ibm.com (Todd Inglett)
Date: 10 Sep 2002 14:56:22 -0500
Subject: PCI hotplug + EEH issues
In-Reply-To: <20020910151322.26416@192.168.4.1>
References: <20020910150322.GB26567@krispykreme> 
	<20020910151322.26416@192.168.4.1>
Message-ID: <1031687783.12740.125.camel@q.rchland.ibm.com>


On Tue, 2002-09-10 at 10:13, Benjamin Herrenschmidt wrote:
>
> >> How close is the OF code between ppc32 and ppc64 ? would it make
> >> some sense to clean up the internal representation of the device
> >> tree (which is, at least on ppc32, still closely tied to what I
> >> did with BootX) and move most of the OF interface code to some
> >> common location we could share ?
> >
> >Yes please! We should be able to start moving large bits of the OF
> >interface out to somewhere common. For example the ppc64 interrupt
> >parsing code is still buggy (it's your original version), whereas on
> >ppc32 it's fixed.
>
> Ok, so most of prom.c would be janitor'ed & shared, right ?
>
> Where in the kernel tree can we put such common thing ?

Pete Bergner and I have talked about this a lot, but haven't had time to
go anywhere with it.  What we'd like is to not only share code, but to
move *all* of the Open Firmware code out of the kernel proper (i.e. into
a wrapper or probably zImage).  This isn't a new idea, of course, as it
has been discussed for ppc32 in the past -- I'm just not sure if it has
gone anywhere there either.

The zImage code would build the device tree as a data structure,
instantiate RTAS (if RTAS exists), initialize display devices, unpack
the kernel and pass control to the kernel with birecs for all these
things.  The kernel would either use the device tree data structure in
place as-is, or it would copy it and/or alter the structure as it sees
fit.  Since this code could be identical for ppc32 (in fact, it would
run in 32-bit even in our hardware) it would be nice to share code -- or
at least keep the code in sync.

Is this still the idea over in ppc32-land, or has the thinking changed
over the past year or so? :)

If the code really is shared I'm not sure where it goes either.  For the
short term I wouldn't have a problem with just keeping it in sync.  It
would be annoying, but not difficult.

-todd


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From tinglett at vnet.ibm.com  Wed Sep 11 22:28:33 2002
From: tinglett at vnet.ibm.com (Todd Inglett)
Date: 11 Sep 2002 07:28:33 -0500
Subject: PCI hotplug + EEH issues
In-Reply-To: <20020910181436.20293@192.168.4.1>
References: <1031687783.12740.125.camel@q.rchland.ibm.com> 
	<20020910181436.20293@192.168.4.1>
Message-ID: <1031747315.12740.141.camel@q.rchland.ibm.com>


On Tue, 2002-09-10 at 13:14, Benjamin Herrenschmidt wrote:
>
> Ok, here we are mixing 2 different things
>
>  - The actual interface to OF, which is currently done in the
> kernel during the early init stage and could be moved to a
> wrapper
>
>  - The kernel's internal representation of the device-tree along
> with related "parsing" functions. This is used during most of the
> kernel's life and requires no interaction with OF itself.

Yep, agreed.  In both cases I think code could be shared (or at least
identical).  Of course the 2nd would be 64-bit in ppc64, while the first
is 32-bit under both ppc32 and ppc64 (and could be the same object
files, in fact).


> Well, at least some of this code will stay platform specific,
> at least the one doing the actual interaction with OF as various
> OF implementations tend to differ significantly, especially
> regarding the way memory is allocated & mapped.

That's interesting.  I would have guessed the other way around, but I
haven't played with OF much on a mac, either :(.

> What I'm thinking about is, at first, sharing the code related
> to dealing with the actual device_node data structure (which should
> be cleaned up from old pmac cruft) and routines that deal with it
> during kernel lifetime, typically things like get_property, find_device,
> etc...
>
> While we are at it, I'd also pick a naming convention making these
> less prone to collision, like using an "of_" prefix all the time.

Agreed.  This stuff needs janitor work first because it directly affects
the kernel.

[...]
> In ppc32, paulus did the split already between prom_init and prom.c
> The point here is to at least split what is related to live interaction
> with OF itself, and later dealing with the kernel internal representation
> of the device-tree
>
> Making sure we agree on these and then sharing prom.c would be a good
> first step. Then, bits of ppc32's pci.c dealing with pci<->OF relationship
> would benefit from some cleaning & sharing as well, though I'm more and
> more thinking about going further and really sharing the whole pci.c

But now I am confused.  Do you really think we can share prom.c?  Or by
"various OF implementations tend to differ" are you talking only about
stuff that zImage (or yaboot) has to do?  Is allocating memory the only
problem?

> That leads to an old question about how to share stuffs between arch's ;)

Well, in the short run I have no problem just sharing copies of the
code.  Once they are identical, we could just put a comment at the top
of each saying they are supposed to be in sync.  It might not last in the
long run, but probably worth a shot.  Once we get enough code we can
gripe on l-k about it as a non-theoretical problem.

So should I make a stab at defining the of_* functions, or has someone
already done this?  One thing in particular I've never liked are the
find_* funcs that link stuff in ->next lists (or ->allnext lists in case
you need that 2nd query....).

-todd


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From anton at samba.org  Wed Sep 11 23:51:21 2002
From: anton at samba.org (Anton Blanchard)
Date: Wed, 11 Sep 2002 23:51:21 +1000
Subject: PCI hotplug + EEH issues
In-Reply-To: <20020911144957.29571@192.168.4.1>
References: <1031747315.12740.141.camel@q.rchland.ibm.com> <20020911144957.29571@192.168.4.1>
Message-ID: <20020911135121.GA922@krispykreme>


> That reminds me we need to add a rwlock around device-tree functions
> since we can (and will occasionally) write to it (add properties
> is what I have in mind, locking on access to an actual property data
> is beyond the scope).

Yep, I'm going to need to add and remove properties in the pSeries PCI
hotplug driver :)

Anton

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From tinglett at vnet.ibm.com  Wed Sep 11 23:53:07 2002
From: tinglett at vnet.ibm.com (Todd Inglett)
Date: 11 Sep 2002 08:53:07 -0500
Subject: PCI hotplug + EEH issues
In-Reply-To: <20020911144957.29571@192.168.4.1>
References: <1031747315.12740.141.camel@q.rchland.ibm.com> 
	<20020911144957.29571@192.168.4.1>
Message-ID: <1031752388.12820.149.camel@q.rchland.ibm.com>


On Wed, 2002-09-11 at 09:49, Benjamin Herrenschmidt wrote:
[...]
> >But now I am confused.  Do you really think we can share prom.c?  Or by
> >"various OF implementations tend to differ" are you talking only about
> >stuff that zImage (or yaboot) has to do?  Is allocating memory the only
> >problem?
>
> I was talking about eventually keeping zImage/yaboot code split
> (and what's in prom_init.c).
> Allocating memory is one thing, dealing with displays another, well,
> we may be able to keep the same code, I'm just not sure it's worth it
> if moved to a zImage wrapper.
>
> prom.c, that deals with the device-tree during kernel lifetime, and
> other PCI<->OF, OF resource, etc... functions could probably be shared.

Pete just reminded me how prom.c and prom_init.c are split :).  All
along I was thinking we'd have a prom.c and a device_tree.c.  Just got
confused....

[...]
> >So should I make a stab at defining the of_* functions, or has someone
> >already done this?  One thing in particular I've never liked are the
> >find_* funcs that link stuff in ->next lists (or ->allnext lists in case
> >you need that 2nd query....).
>
> Well, they definitely cause interesting locking issues... How would
> you fix that ? Defining an opaque iterator type ? Just passing a
> device_node to start the search from ?

Yeah, I was thinking having a start node would make for the simplest
interface (where NULL means begin at the top, of course).  It's pretty
trivial to traverse the tree -- even when you want to traverse all nodes
-- so I don't see any reason to keep the next and allnext ptrs.
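
A minimal sketch of such a start-node interface, assuming a node that
carries only parent/child/sibling links.  The struct layout, of_root,
of_next_node and of_find_next are hypothetical names for illustration,
not existing kernel code, and locking is left out entirely:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node layout: no ->next or ->allnext links at all. */
struct device_node {
	const char *type;
	struct device_node *parent;
	struct device_node *child;	/* first child */
	struct device_node *sibling;	/* next sibling */
};

/* Hypothetical: set once the tree has been built. */
static struct device_node *of_root;

/* Pre-order successor of dn; NULL means "begin at the top". */
static struct device_node *of_next_node(struct device_node *dn)
{
	if (dn == NULL)
		return of_root;
	if (dn->child)
		return dn->child;
	/* Climb until some ancestor (or dn itself) has a sibling. */
	while (dn && !dn->sibling)
		dn = dn->parent;
	return dn ? dn->sibling : NULL;
}

/* Next node after prev whose type matches, or NULL when done. */
struct device_node *of_find_next(struct device_node *prev, const char *type)
{
	struct device_node *dn = prev;

	while ((dn = of_next_node(dn)) != NULL)
		if (dn->type && strcmp(dn->type, type) == 0)
			return dn;
	return NULL;
}
```

The walk visits every node once in pre-order, so the same helper covers
both the "all nodes" and the "nodes of one type" queries.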

I'm also not sure we need to cache the addresses and interrupts for pci
nodes.  Since access to these is not performance critical (i.e. only
during device init) can't we just have functions that given a specific
node could compute these values?  I've never understood why they needed
to be computed all up front.

> That reminds me we need to add a rwlock around device-tree functions
> since we can (and will occasionally) write to it (add properties
> is what I have in mind, locking on access to an actual property data
> is beyond the scope).

Good thought.  We may even add entire sub-trees during hot plug
operations.

-todd


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From benh at kernel.crashing.org  Thu Sep 12 00:49:57 2002
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 11 Sep 2002 16:49:57 +0200
Subject: PCI hotplug + EEH issues
In-Reply-To: <1031747315.12740.141.camel@q.rchland.ibm.com>
References: <1031747315.12740.141.camel@q.rchland.ibm.com>
Message-ID: <20020911144957.29571@192.168.4.1>


>> Making sure we agree on these and then sharing prom.c would be a good
>> first step. Then, bits of ppc32's pci.c dealing with pci<->OF relationship
>> would benefit from some cleaning & sharing as well, though I'm more and
>> more thinking about going further and really sharing the whole pci.c
>
>But now I am confused.  Do you really think we can share prom.c?  Or by
>"various OF implementations tend to differ" are you talking only about
>stuff that zImage (or yaboot) has to do?  Is allocating memory the only
>problem?

I was talking about eventually keeping zImage/yaboot code split
(and what's in prom_init.c).
Allocating memory is one thing, dealing with displays another, well,
we may be able to keep the same code, I'm just not sure it's worth it
if moved to a zImage wrapper.

prom.c, that deals with the device-tree during kernel lifetime, and
other PCI<->OF, OF resource, etc... functions could probably be shared.

I have no definitive idea on how to implement the interaction between
driverfs and the device-tree, but that also could be shared.

>> That leads to an old question about how to share stuffs between arch's ;)
>
>Well, in the short run I have no problem just sharing copies of the
>code.  Once they are identical, we could just put a comment at the top
>of each saying they are supposed to be in sync.  It might not last in the
>long run, but probably worth a shot.  Once we get enough code we can
>gripe on l-k about it as a non-theoretical problem.

Ok. Well, depending on whether Apple ends up releasing 64-bit boxes or not,
more will have to be shared anyway ;)

>So should I make a stab at defining the of_* functions, or has someone
>already done this?  One thing in particular I've never liked are the
>find_* funcs that link stuff in ->next lists (or ->allnext lists in case
>you need that 2nd query....).

Well, they definitely cause interesting locking issues... How would
you fix that ? Defining an opaque iterator type ? Just passing a
device_node to start the search from ?

That reminds me we need to add a rwlock around device-tree functions
since we can (and will occasionally) write to it (add properties
is what I have in mind, locking on access to an actual property data
is beyond the scope).

Ben.


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From tinglett at vnet.ibm.com  Thu Sep 12 04:11:57 2002
From: tinglett at vnet.ibm.com (Todd Inglett)
Date: 11 Sep 2002 13:11:57 -0500
Subject: PCI hotplug + EEH issues
In-Reply-To: <20020911144957.29571@192.168.4.1>
References: <1031747315.12740.141.camel@q.rchland.ibm.com> 
	<20020911144957.29571@192.168.4.1>
Message-ID: <1031767918.12740.319.camel@q.rchland.ibm.com>


On Wed, 2002-09-11 at 09:49, Benjamin Herrenschmidt wrote:
>
> >So should I make a stab at defining the of_* functions, or has someone
> >already done this?  One thing in particular I've never liked are the
> >find_* funcs that link stuff in ->next lists (or ->allnext lists in case
> >you need that 2nd query....).

The more I think about locking the more fun this gets :).

Here's what I'd expect we'd do to iterate over, say, "pci" nodes:

for (dn = of_find_next(NULL, "pci"); dn; dn = of_find_next(dn, "pci")) {
	/* Do something with dn */
}

I suppose it could be written cleaner like this (same API):

dn = NULL;
while ((dn = of_find_next(dn, "pci")) != NULL) {
	/* Do something with dn */
}

The trouble of course is that dn could be deleted by hotplug (or some
other clever code), so we still need locking.

We could make the caller hold a big device-tree lock, but then the
caller would need to hold the lock for as long as it uses dn.  This
might be ok, but then no code can hold a ptr to the device node for any
length of time.

Another possibility is to use reference counting on the nodes (still
using a lock to traverse from one node to the next and bump the count),
but at least the caller will not need to hold a lock while *using* a
node.  I'm not sure I like the complexity of this idea considering the
details of deleting a node while a traversal is in process somewhere
else.

Thoughts?

-todd


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From benh at kernel.crashing.org  Thu Sep 12 17:31:46 2002
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 12 Sep 2002 09:31:46 +0200
Subject: PCI hotplug + EEH issues
In-Reply-To: <1031767918.12740.319.camel@q.rchland.ibm.com>
References: <1031767918.12740.319.camel@q.rchland.ibm.com>
Message-ID: <20020912073146.7979@192.168.4.1>


>Another possibility is to use reference counting on the nodes (still
>using a lock to traverse from one node to the next and bump the count),
>but at least the caller will not need to hold a lock while *using* a
>node.  I'm not sure I like the complexity of this idea considering the
>details of deleting a node while a traversal is in process somewhere
>else.
>
>Thoughts?

I prefer that reference counting solution. We have one lock protecting
the tree itself from insertion/removal and taken inside the traversal
functions. Any function returning a device_node will return it with
the reference counter incremented (the actual increment being done
with the lock held of course). Callers are expected to call some
kind of of_put_node() function that would then do the de-ref counting
and eventual deletion of the node data structure (pool allocation ?)

(I prefer of_release_node terminology but it seems we have a bunch of
get/put fanatics in the linux playfield ;)
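
A rough userspace sketch of that scheme, under several assumptions: the
tree is flattened to a singly linked list for brevity, tree_lock() and
tree_unlock() are stand-ins for whatever real lock ends up being used,
and every name here (of_add_node, of_find_node_by_type, of_put_node,
of_detach_node, of_nodes_freed) is hypothetical.  The tree itself holds
one reference, so a node is only freed once it has been detached and
the last caller has put it:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct device_node {
	const char *type;
	int refcount;			/* protected by the tree lock */
	struct device_node *next;	/* flat list, for brevity only */
};

static struct device_node *of_list;	/* head of the node list */
static int of_nodes_freed;		/* demo counter, not part of the idea */

/* Stand-ins for a real lock (spinlock, rwlock, ...). */
static int tree_locked;
static void tree_lock(void)   { tree_locked = 1; }
static void tree_unlock(void) { tree_locked = 0; }

/* Insert a node; the tree keeps one reference of its own. */
struct device_node *of_add_node(const char *type)
{
	struct device_node *dn = calloc(1, sizeof(*dn));

	if (!dn)
		return NULL;
	dn->type = type;
	dn->refcount = 1;
	tree_lock();
	dn->next = of_list;
	of_list = dn;
	tree_unlock();
	return dn;
}

/* Return the first matching node with its refcount already bumped. */
struct device_node *of_find_node_by_type(const char *type)
{
	struct device_node *dn;

	tree_lock();
	for (dn = of_list; dn; dn = dn->next)
		if (strcmp(dn->type, type) == 0) {
			dn->refcount++;	/* done with the lock held */
			break;
		}
	tree_unlock();
	return dn;
}

/* Drop one reference; free the node when the last one goes away. */
void of_put_node(struct device_node *dn)
{
	int last;

	tree_lock();
	last = (--dn->refcount == 0);
	tree_unlock();
	if (last) {
		free(dn);
		of_nodes_freed++;
	}
}

/* Unlink a node that is still in the tree, then drop the tree's reference. */
void of_detach_node(struct device_node *dn)
{
	struct device_node **pp;

	tree_lock();
	for (pp = &of_list; *pp; pp = &(*pp)->next)
		if (*pp == dn) {
			*pp = dn->next;
			break;
		}
	tree_unlock();
	of_put_node(dn);
}
```

A caller that found a node before it was detached can keep using it
without holding the lock; the memory goes away only on its final
of_put_node().  Continuing a traversal from a detached node is not
handled here.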

Ben.


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From sopwith at redhat.com  Thu Sep 19 00:48:11 2002
From: sopwith at redhat.com (Elliot Lee)
Date: Wed, 18 Sep 2002 10:48:11 -0400 (EDT)
Subject: binutils/bfd/elf64-ppc.c patch: Add PLTREL24 support
Message-ID: <Pine.LNX.4.44.0209181040560.16914-200000@devserv.devel.redhat.com>

I understand these are good addresses to send binutils ppc64 patches to -
please correct me if wrong...

This patch makes bfd handle the PLTREL24 relocation thingie. I needed it
to make glibc-CVS/sysdeps/powerpc/powerpc64/elf/start.S assemble. I just
did a cut & paste from elf32-ppc.c, hopefully it's sane.

-- Elliot
"Do not trust the horse, Trojans! Whatever it is, I fear the Greeks, even
though they bring gifts."
-------------- next part --------------
? obj
? ppc64-pltrel24.patch
Index: bfd/elf64-ppc.c
===================================================================
RCS file: /cvs/src/src/bfd/elf64-ppc.c,v
retrieving revision 1.63
diff -u -r1.63 elf64-ppc.c
--- bfd/elf64-ppc.c	22 Aug 2002 01:27:19 -0000	1.63
+++ bfd/elf64-ppc.c	18 Sep 2002 14:40:38 -0000
@@ -543,6 +543,22 @@
 	 0xffffffff,		/* dst_mask */
 	 false),		/* pcrel_offset */
 
+  /* 24-bit PC relative relocation to the symbol's procedure linkage table.
+     FIXME: R_PPC64_PLTREL24 not supported.  */
+  HOWTO (R_PPC64_PLTREL24,	/* type */
+	 0,			/* rightshift */
+	 2,			/* size (0 = byte, 1 = short, 2 = long) */
+	 26,			/* bitsize */
+	 true,			/* pc_relative */
+	 0,			/* bitpos */
+	 complain_overflow_signed, /* complain_on_overflow */
+	 bfd_elf_generic_reloc,	/* special_function */
+	 "R_PPC64_PLTREL24",	/* name */
+	 false,			/* partial_inplace */
+	 0,			/* src_mask */
+	 0x3ffffffc,		/* dst_mask */
+	 true),			/* pcrel_offset */
+
   /* 32-bit PC relative relocation to the symbol's procedure linkage table.
      FIXME: R_PPC64_PLTREL32 not supported.  */
   HOWTO (R_PPC64_PLTREL32,	/* type */
@@ -1258,6 +1274,8 @@
       break;
     case BFD_RELOC_PPC_GLOB_DAT:	 ppc_reloc = R_PPC64_GLOB_DAT;
       break;
+    case BFD_RELOC_24_PLT_PCREL:         ppc_reloc = R_PPC64_PLTREL24;
+      break;
     case BFD_RELOC_32_PCREL:		 ppc_reloc = R_PPC64_REL32;
       break;
     case BFD_RELOC_32_PLTOFF:		 ppc_reloc = R_PPC64_PLT32;
@@ -5837,6 +5855,7 @@
 	case R_PPC64_PLTGOT16_LO_DS:
 	case R_PPC64_PLTREL32:
 	case R_PPC64_PLTREL64:
+	case R_PPC64_PLTREL24:
 	  /* These ones haven't been implemented yet.  */
 
 	  (*_bfd_error_handler)
Index: include/elf/ppc.h
===================================================================
RCS file: /cvs/src/src/include/elf/ppc.h,v
retrieving revision 1.8
diff -u -r1.8 ppc.h
--- include/elf/ppc.h	12 Feb 2002 06:31:24 -0000	1.8
+++ include/elf/ppc.h	18 Sep 2002 14:40:39 -0000
@@ -160,6 +160,7 @@
 #define R_PPC64_REL32             R_PPC_REL32
 #define R_PPC64_PLT32             R_PPC_PLT32
 #define R_PPC64_PLTREL32          R_PPC_PLTREL32
+#define R_PPC64_PLTREL24          R_PPC_PLTREL24
 #define R_PPC64_PLT16_LO          R_PPC_PLT16_LO
 #define R_PPC64_PLT16_HI          R_PPC_PLT16_HI
 #define R_PPC64_PLT16_HA          R_PPC_PLT16_HA

From amodra at bigpond.net.au  Thu Sep 19 11:54:51 2002
From: amodra at bigpond.net.au (Alan Modra)
Date: Thu, 19 Sep 2002 11:24:51 +0930
Subject: binutils/bfd/elf64-ppc.c patch: Add PLTREL24 support
In-Reply-To: <Pine.LNX.4.44.0209181040560.16914-200000@devserv.devel.redhat.com>; from sopwith@redhat.com on Wed, Sep 18, 2002 at 10:48:11AM -0400
References: <Pine.LNX.4.44.0209181040560.16914-200000@devserv.devel.redhat.com>
Message-ID: <20020919112451.C14457@bubble.sa.bigpond.net.au>


On Wed, Sep 18, 2002 at 10:48:11AM -0400, Elliot Lee wrote:
> This patch makes bfd handle the PLTREL24 relocation thingie. I needed it
> to make glibc-CVS/sysdeps/powerpc/powerpc64/elf/start.S assemble. I just
> did a cut & paste from elf32-ppc.c, hopefully it's sane.

Hi Elliot,
  PLTREL24 isn't correct for powerpc64.  The powerpc64 plt doesn't
contain code so there is no need for a branch to the plt.  I suspect
you're using the wrong assembler when trying to compile glibc.

--
Alan Modra
IBM OzLabs - Linux Technology Centre

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From anton at samba.org  Thu Sep 19 18:03:06 2002
From: anton at samba.org (Anton Blanchard)
Date: Thu, 19 Sep 2002 18:03:06 +1000
Subject: PCI probe code
Message-ID: <20020919080306.GC6186@krispykreme>


Hi,

I've been looking over our PCI probe code and it could do with a cleanup.
It would be nice to make things simpler before PCI domains and PCI hotplug
go in. A recent bug I found suggests that our BAR reallocation code has
always been buggy and I got to thinking...

Can we assume that OF will always get the BAR allocation correct? If so
then a bunch of code goes away. Even if this is not the case, we should
be able to use the generic hooks a lot more than we currently do
(eg we should probably use pci_assign_unassigned_resources)

The question I have is how does iSeries handle BAR allocation?

Anton

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From benh at kernel.crashing.org  Thu Sep 19 20:15:06 2002
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 19 Sep 2002 12:15:06 +0200
Subject: PCI probe code
In-Reply-To: <20020919080306.GC6186@krispykreme>
References: <20020919080306.GC6186@krispykreme>
Message-ID: <20020919101506.26730@192.168.4.1>


>I've been looking over our PCI probe code and it could do with a cleanup.
>It would be nice to make things simpler before PCI domains and PCI hotplug
>go in. A recent bug I found suggests that our BAR reallocation code has
>always been buggy and I got to thinking...
>
>Can we assume that OF will always get the BAR allocation correct? If so
>then a bunch of code goes away. Even if this is not the case, we should
>be able to use the generic hooks a lot more than we currently do
>(eg we should probably use pci_assign_unassigned_resources)
>
>The question I have is how does iSeries handle BAR allocation?

I've been thinking about that for ppc32 as well, and encountered
various problems.

One of them is that OF will not assign BARs that aren't requested
by a card's OF driver (typically most ATI video cards with OF driver
will not request the IO BAR). The kernel wants all BARs assigned.
That leads into some problems as OF also tends to close the IO
region of PCI<->PCI bridges if no device below them has an assigned
IO BAR. I encountered the case on the XServe where OF didn't assign
the IO BAR of the ATI card behind a PCI<->PCI bridge and closed that
bridge IO region (but kept a memory region of course). The kernel
tried to allocate that IO BAR, but failed because of the lack of the
IO region on the PCI<->PCI bridge. That's the reason why my tree
currently has some code to re-open IO regions, though that code is
probably not generic enough.

There are other issues with the code in setup-bus
(pci_assign_unassigned_resources and friends). Currently, on ppc32 at
least, pcibios_fixup_pbus_ranges() doesn't seem to be correct, thus
causing the setup-bus code to not properly deal with the various
offsets we have on some platforms.

Also, the code in setup-bus makes lots of assumptions about the layout
of resources in a pci_bus structure. Typically, it expects all busses
(including host busses, cardbus ones, etc...) to exactly follow the
layout of a PCI<->PCI bridge, that is resource 0 is IO, resource 1 is
MEM, and resource 2 is MEM+PREFETCH, period.

The result is that it doesn't deal properly with some setups I have
on pmac (dunno about you on chrp here) where a host bus may expose
several discontiguous MEM and no MEM+PREFETCH etc...

It would surely be nice to be able to "fix" setup-bus, though this
is a bit difficult as Ivan seems reluctant to add genericity
to that code (he uses it on alpha afaik). I already had difficulties
getting him to accept that pci_read_bridge_bases should take care of
all 4 resources of the pci_bus structure when dealing with a transparent
bridge (a real one now that we fixed that code); he didn't fix it until
Linus himself told him he was wrong, so....

Ben.


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From sopwith at redhat.com  Thu Sep 19 20:51:39 2002
From: sopwith at redhat.com (Elliot Lee)
Date: Thu, 19 Sep 2002 06:51:39 -0400 (EDT)
Subject: binutils/bfd/elf64-ppc.c patch: Add PLTREL24 support
In-Reply-To: <20020919112451.C14457@bubble.sa.bigpond.net.au>
Message-ID: <Pine.LNX.4.44.0209190615150.23900-201000@devserv.devel.redhat.com>

On Thu, 19 Sep 2002, Alan Modra wrote:

> On Wed, Sep 18, 2002 at 10:48:11AM -0400, Elliot Lee wrote:
> > This patch makes bfd handle the PLTREL24 relocation thingie. I needed it
> > to make glibc-CVS/sysdeps/powerpc/powerpc64/elf/start.S assemble. I just
> > did a cut & paste from elf32-ppc.c, hopefully it's sane.
>
> Hi Elliot,
>   PLTREL24 isn't correct for powerpc64.  The powerpc64 plt doesn't
> contain code so there is no need for a branch to the plt.  I suspect
> you're using the wrong assembler when trying to compile glibc.

If that were true, the patch wouldn't fix the problem ('make install' is
most definitely installing /opt/ppc64-foo/bin/powerpc64-linux-as, not
/usr/bin/as or anything else on my path).

It could just be me on crack playing with glibc when the ppc64 merge
isn't complete - the preprocessed start.s and the command that produced it
are both attached for you to judge.

-- Elliot
"Do not trust the horse, Trojans! Whatever it is, I fear the Greeks, even
though they bring gifts."

-------------- next part --------------
A non-text attachment was scrubbed...
Name: build-start.sh
Type: application/x-sh
Size: 1304 bytes
Desc: 
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20020919/4cdffa46/attachment.sh 
-------------- next part --------------
# 1 "../sysdeps/powerpc/powerpc64/elf/start.S"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "../include/libc-symbols.h" 1
# 56 "../include/libc-symbols.h"
# 1 "/usr/src/libc/build-libc/config.h" 1
# 57 "../include/libc-symbols.h" 2
# 2 "<command line>" 2
# 1 "../sysdeps/powerpc/powerpc64/elf/start.S"
# 20 "../sysdeps/powerpc/powerpc64/elf/start.S"
# 1 "../sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h" 1
# 25 "../sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h"
# 1 "../sysdeps/unix/powerpc/sysdep.h" 1
# 19 "../sysdeps/unix/powerpc/sysdep.h"
# 1 "../sysdeps/unix/sysdep.h" 1
# 19 "../sysdeps/unix/sysdep.h"
# 1 "../sysdeps/generic/sysdep.h" 1
# 20 "../sysdeps/unix/sysdep.h" 2

# 1 "../sysdeps/unix/sysv/linux/sys/syscall.h" 1
# 25 "../sysdeps/unix/sysv/linux/sys/syscall.h"
# 1 "/usr/src/linux-2.4.19/include/asm/unistd.h" 1 3
# 26 "../sysdeps/unix/sysv/linux/sys/syscall.h" 2
# 22 "../sysdeps/unix/sysdep.h" 2
# 20 "../sysdeps/unix/powerpc/sysdep.h" 2
# 1 "../sysdeps/powerpc/sysdep.h" 1
# 21 "../sysdeps/unix/powerpc/sysdep.h" 2
# 26 "../sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h" 2
# 21 "../sysdeps/powerpc/powerpc64/elf/start.S" 2
# 1 "../sysdeps/generic/bp-sym.h" 1
# 22 "../sysdeps/powerpc/powerpc64/elf/start.S" 2


        .section ".rodata"
        .align 3
        .weak _init
        .weak _fini
        .weak ._init
        .weak ._fini
.Lstart_addresses:
        .quad 0

        .quad main
        .quad _init
        .quad _fini

        .size .Lstart_addresses,.-.Lstart_addresses

        .section ".toc","aw"
.L01:
        .tc .Lstart_addresses[TC],.Lstart_addresses
        .section ".text"
.globl _start; .type _start,@function; .align 2; _start:

        mr 9,1

        clrrdi 1,1,4
        li 0,0
        stdu 1,-128(1)
        mtlr 0
        std 0,0(1)


        ld 8,.L01(2)


        b __libc_start_main@plt

.size _start,.-_start


        .section ".data"
        .globl __data_start
__data_start:
.weak data_start ; data_start = __data_start ; .weak .data_start ; .data_start = .__data_start

From amodra at bigpond.net.au  Thu Sep 19 21:09:29 2002
From: amodra at bigpond.net.au (Alan Modra)
Date: Thu, 19 Sep 2002 20:39:29 +0930
Subject: binutils/bfd/elf64-ppc.c patch: Add PLTREL24 support
In-Reply-To: <Pine.LNX.4.44.0209190615150.23900-201000@devserv.devel.redhat.com>; from sopwith@redhat.com on Thu, Sep 19, 2002 at 06:51:39AM -0400
References: <20020919112451.C14457@bubble.sa.bigpond.net.au> <Pine.LNX.4.44.0209190615150.23900-201000@devserv.devel.redhat.com>
Message-ID: <20020919203929.G14457@bubble.sa.bigpond.net.au>


On Thu, Sep 19, 2002 at 06:51:39AM -0400, Elliot Lee wrote:
>         b __libc_start_main@plt

Ought to be

        b .__libc_start_main

For some reason you're getting the 32 bit powerpc version of JUMPTARGET.

--
Alan Modra
IBM OzLabs - Linux Technology Centre

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From peter at bergner.org  Thu Sep 19 23:16:56 2002
From: peter at bergner.org (Peter Bergner)
Date: Thu, 19 Sep 2002 08:16:56 -0500
Subject: binutils/bfd/elf64-ppc.c patch: Add PLTREL24 support
In-Reply-To: <Pine.LNX.4.44.0209190615150.23900-201000@devserv.devel.redhat.com>
References: <20020919112451.C14457@bubble.sa.bigpond.net.au> <Pine.LNX.4.44.0209190615150.23900-201000@devserv.devel.redhat.com>
Message-ID: <20020919081656.A1071347@brule.borg.umn.edu>


: It could just be me on crack playing with glibc when the ppc64 merge
: isn't complete - the preprocessed start.s and the command that produced it
: are both attached for you to judge.

There are important parts of the ppc64 code that haven't been merged yet.
As Alan mentioned in another note, you may be picking up some ppc32
code/headers/defines due to ppc64 not being fully in yet.

We're getting close though!

Peter


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/


From trautman at us.ibm.com  Fri Sep 20 00:29:57 2002
From: trautman at us.ibm.com (Al Trautman)
Date: Thu, 19 Sep 2002 09:29:57 -0500
Subject: PCI probe code
Message-ID: <OF568602C8.34086682-ON86256C39.0048388A@rchland.ibm.com>


>>The question I have is how does iSeries handle BAR allocation?
>>
>>Anton

The Hypervisor assigns the real BARs; however, the values seen in Linux are
meaningless.  That is because Linux cannot access this space directly;
it can only access it via Hypervisor calls.

After the bus walk, there is code that sets up the pci_dev resources with
virtual addresses.  It uses the pci_resource_len that was set up in the
./drivers/pci/pci.c bus walk to calculate the virtual address spacing.
When the drivers use the I/O macros, the virtual address gets mapped to the
device, BAR, and offset for the Hypervisor call.
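
To illustrate the mapping described above, here is a minimal sketch in
plain C.  It is not the iSeries code: the names bar_slot, assign_virt and
decode_virt, the table size, and the starting address are all hypothetical.
It only assumes what the paragraph states, namely that each BAR gets a
made-up virtual range sized by its length, and that an access is later
decoded back to (device, BAR, offset).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one entry per BAR found during the bus walk. */
struct bar_slot {
    uint64_t virt_base;  /* start of the made-up virtual range */
    uint64_t len;        /* BAR length, as pci_resource_len would report */
    int      dev;        /* illustrative device index */
    int      bar;        /* BAR number */
};

#define MAX_SLOTS 64
static struct bar_slot slots[MAX_SLOTS];
static size_t n_slots;
static uint64_t next_virt = 0x1000;  /* arbitrary nonzero start (assumption) */

/* Hand out the next virtual range for one BAR and remember its owner. */
uint64_t assign_virt(int dev, int bar, uint64_t len)
{
    assert(n_slots < MAX_SLOTS && len > 0);
    struct bar_slot *s = &slots[n_slots++];
    s->virt_base = next_virt;
    s->len = len;
    s->dev = dev;
    s->bar = bar;
    next_virt += len;  /* ranges are packed back to back */
    return s->virt_base;
}

/* Translate a virtual address into device, BAR and offset.
 * Returns 0 on success, -1 if the address belongs to no BAR. */
int decode_virt(uint64_t addr, int *dev, int *bar, uint64_t *offset)
{
    for (size_t i = 0; i < n_slots; i++) {
        const struct bar_slot *s = &slots[i];
        if (addr >= s->virt_base && addr - s->virt_base < s->len) {
            *dev = s->dev;
            *bar = s->bar;
            *offset = addr - s->virt_base;
            return 0;
        }
    }
    return -1;
}
```

An I/O macro would call something like decode_virt() and pass the three
results to the Hypervisor call; a real implementation would likely use a
faster lookup than a linear scan.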

Peace, Al

Allan H Trautman (507-253-3508)
Dept. 8DN    Bld 030-2/R115
IBM Rochester, Minnesota




From tinglett at vnet.ibm.com  Sat Sep 21 02:01:45 2002
From: tinglett at vnet.ibm.com (Todd Inglett)
Date: 20 Sep 2002 11:01:45 -0500
Subject: PCI probe code
In-Reply-To: <20020919101506.26730@192.168.4.1>
References: <20020919080306.GC6186@krispykreme> 
	<20020919101506.26730@192.168.4.1>
Message-ID: <1032537713.20072.5.camel@q.rchland.ibm.com>


On Thu, 2002-09-19 at 05:15, Benjamin Herrenschmidt wrote:

> One of them is that OF will not assign BARs that aren't requested
> by a card's OF driver (typically most ATI video cards with OF driver
> will not request the IO BAR). The kernel wants all BARs assigned.

I've never seen this, at least with IBM's firmware.  For example, if I
put in a video capture card for which there is clearly no driver, the BARs
still get assigned properly.

However, it could be possible that this only occurs if a driver *does*
exist.  Maybe the firmware goes down a non-driver path otherwise.

Are you trying to use the bar reallocation code for hot plug?

-todd




From benh at kernel.crashing.org  Sat Sep 21 02:39:32 2002
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 20 Sep 2002 18:39:32 +0200
Subject: PCI probe code
In-Reply-To: <1032537713.20072.5.camel@q.rchland.ibm.com>
References: <1032537713.20072.5.camel@q.rchland.ibm.com>
Message-ID: <20020920163933.30181@192.168.4.1>


>I've never seen this at least with IBM's firmware.  For example, if I
>put in a video capture card where there is clearly no driver the BARs
>still get assigned properly.
>
>However, it could be possible that this only occurs if a driver *does*
>exist.  Maybe the firmware goes down a non-driver path otherwise.

I think it happens because the ATI OF driver (and possibly some
Adaptec OF drivers too) overrides the "reg" property to explicitly
not request the IO space to be assigned.
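
As a rough illustration of what "the kernel wants all BARs assigned"
implies, the sketch below fills in BARs that the firmware left unassigned.
It is not the linuxppc_2_4 code: struct bar_res, assign_missing_bars and
the convention that start == 0 means "unassigned" are assumptions made for
this example.  The one property it relies on is that a PCI BAR has a
power-of-two size and must be aligned to that size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one BAR: len == 0 means the BAR does not exist,
 * start == 0 means firmware did not assign it (assumption). */
struct bar_res {
    uint64_t start;
    uint64_t len;
};

/* Round v up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t v, uint64_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/* Assign every existing but unassigned BAR from the window
 * [*cursor, limit).  Returns the number of BARs assigned, or -1 if
 * the window is too small. */
int assign_missing_bars(struct bar_res *bars, size_t n,
                        uint64_t *cursor, uint64_t limit)
{
    int assigned = 0;

    for (size_t i = 0; i < n; i++) {
        if (bars[i].len == 0 || bars[i].start != 0)
            continue;  /* absent, or already set up by firmware */

        /* BARs are naturally aligned to their own size. */
        uint64_t base = align_up(*cursor, bars[i].len);
        if (base + bars[i].len > limit)
            return -1;

        bars[i].start = base;
        *cursor = base + bars[i].len;
        assigned++;
    }
    return assigned;
}
```

BARs that the firmware already assigned are left untouched, so a pass like
this only affects cards whose OF driver dropped a BAR from "reg".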

>Are you trying to use the bar reallocation code for hot plug?

No, everything I do right now happens at boot time. I pushed my tweak to
bk linuxppc_2_4 today, so you can look at the code if you want.

Ben.

