From david at gibson.dropbear.id.au  Sun Jan  2 09:33:45 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Sun, 2 Jan 2005 09:33:45 +1100
Subject: [PATCH] sparse fixes for cpu feature constants
In-Reply-To: <1104381206.16694.38.camel@localhost.localdomain>
References: <1104381206.16694.38.camel@localhost.localdomain>
Message-ID: <20050101223345.GC2297@zax>

On Wed, Dec 29, 2004 at 10:33:26PM -0600, Nathan Lynch wrote:
> Hi-
> 
> I've been playing around with sparse a little and saw that it gives a
> lot of warnings like this:
> 
> arch/ppc64/mm/init.c:755:35: warning: constant 0x0000020000000000 is so
> big it is long
> 
> It looks like we get such a warning for every expression of the form
> "(cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE)" -- basically,
> every time the code checks for a cpu feature.
> 
> Following is an attempt to clean these up by defining the cpu feature
> constants using the ASM_CONST macro from ppc64's page.h.  I believe this
> is consistent with the intentions for ASM_CONST's use.
> 
> There's some fallout:
> 
> flush_icache_range() was already using ASM_CONST on one of the
> constants, so that is fixed up.
> 
> switch_mm() uses a BEGIN_FTR_SECTION ...
> END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) which gets broken by the change
> since 0x0000000000000008UL winds up in the generated assembly.  I
> couldn't find the BEGIN/END_FTR_SECTION construct used in any other C
> code, so I replaced this with the usual bitwise 'and' conditional (I
> hope someone else will verify that this is equivalent :).
> 
> So, does this look like the right thing to do?  It eliminates 129 sparse
> warnings from a defconfig 2.6.10 build.
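A minimal sketch of the mechanism discussed above. It assumes ASM_CONST follows the conventional pattern: in C it pastes a UL suffix onto the literal, and under __ASSEMBLY__ it leaves the literal bare. Because the C side always adds the suffix, a constant that ends up pasted into inline-assembly text (as END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) does in switch_mm()) carries a UL that the assembler cannot parse. The two feature values come from the mail. The following are illustrative assumptions, not the kernel's definitions: pairing 0x0000020000000000 with CPU_FTR_COHERENT_ICACHE, the cut-down struct cpu_spec, and the cpu_has_feature() helper. The sketch also assumes an LP64 host, as on ppc64.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed ASM_CONST pattern: bare literal for the assembler,
 * UL-suffixed literal for C, so sparse no longer complains that a
 * bare 64-bit constant "is so big it is long". */
#ifdef __ASSEMBLY__
#define ASM_CONST(x)	x
#else
#define __ASM_CONST(x)	x##UL
#define ASM_CONST(x)	__ASM_CONST(x)
#endif

/* Values from the mail; the COHERENT_ICACHE pairing is an assumption. */
#define CPU_FTR_ALTIVEC		ASM_CONST(0x0000000000000008)
#define CPU_FTR_COHERENT_ICACHE	ASM_CONST(0x0000020000000000)

/* Hypothetical cut-down stand-in for the kernel's struct cpu_spec. */
struct cpu_spec {
	unsigned long cpu_features;
};

/* Plain C feature test of the kind used instead of a
 * BEGIN_FTR_SECTION / END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) block:
 * the feature bit is checked at run time with a bitwise AND. */
static int cpu_has_feature(const struct cpu_spec *spec, unsigned long ftr)
{
	assert(spec != NULL);
	return (spec->cpu_features & ftr) != 0;
}
```

With the suffix in place, both constants have type unsigned long, which matches the type of the cpu_features field they are masked against.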

Hurrah!  You beat me to it...

> Index: 2.6.10/include/asm-ppc64/cputable.h
> ===================================================================
> +++ 2.6.10/include/asm-ppc64/cputable.h	2004-12-30 04:04:09.463979408 +0000
> @@ -16,6 +16,7 @@
>  #define __ASM_PPC_CPUTABLE_H
>  
>  #include <linux/config.h>
> +#include <asm/page.h> /* for ASM_CONST */

Have you double checked that this won't cause a nasty #include loop?
The CPU constants are used in quite a few places, as is page.h

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist.  NOT _the_ _other_ _way_
				| _around_!
http://www.ozlabs.org/people/dgibson


From nathanl at austin.ibm.com  Tue Jan  4 00:16:24 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Mon, 03 Jan 2005 07:16:24 -0600
Subject: [PATCH] sparse fixes for cpu feature constants
In-Reply-To: <20050101223345.GC2297@zax>
References: <1104381206.16694.38.camel@localhost.localdomain>
	<20050101223345.GC2297@zax>
Message-ID: <1104758184.15200.6.camel@localhost.localdomain>

On Sun, 2005-01-02 at 09:33 +1100, David Gibson wrote:
> On Wed, Dec 29, 2004 at 10:33:26PM -0600, Nathan Lynch wrote:
> >
> > Index: 2.6.10/include/asm-ppc64/cputable.h
> > ===================================================================
> > +++ 2.6.10/include/asm-ppc64/cputable.h	2004-12-30 04:04:09.463979408 +0000
> > @@ -16,6 +16,7 @@
> >  #define __ASM_PPC_CPUTABLE_H
> >  
> >  #include <linux/config.h>
> > +#include <asm/page.h> /* for ASM_CONST */
> 
> Have you double checked that this won't cause a nasty #include loop?
> The CPU constants are used in quite a few places, as is page.h

I think it's ok -- page.h includes the following:

- linux/config.h, which includes linux/autoconf.h

- asm-ppc64/naca.h, which includes asm-ppc64/types.h and
asm-ppc64/systemcfg.h.

So I don't see any way that cputable.h could be pulled in before
ASM_CONST is defined.

Thanks,
Nathan


From jdl at freescale.com  Tue Jan  4 05:56:39 2005
From: jdl at freescale.com (Jon Loeliger)
Date: Mon, 03 Jan 2005 12:56:39 -0600
Subject: PATCH uninorth3 (G5) agp support
In-Reply-To: <41D00564.6010507@free.fr>
References: <41CEC6B0.5020106@free.fr> <1104137527.5615.20.camel@gaston>
	<41D00564.6010507@free.fr>
Message-ID: <1104778599.14049.64.camel@cashmere.sps.mot.com>

On Mon, 2004-12-27 at 06:51, Jerome Glisse wrote:

>  /* My understanding of UniNorth AGP as of UniNorth rev 1.0x,
>   * revision 1.5 (x4 AGP) may need further changes.
> diff -Naur linux/include/linux/pci_ids.h linux-new/include/linux/pci_ids.h
> --- linux/include/linux/pci_ids.h	2004-12-26 14:40:05.000000000 +0100
> +++ linux-new/include/linux/pci_ids.h	2004-12-27 13:40:50.121003792 +0100
> @@ -842,6 +842,7 @@
>  #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC2	0x0032
>  #define PCI_DEVIEC_ID_APPLE_UNI_N_ATA	0x0033
>  #define PCI_DEVICE_ID_APPLE_UNI_N_AGP2	0x0034
> +#define PCI_DEVICE_ID_APPLE_U3_AGP	0x0059
>  #define PCI_DEVICE_ID_APPLE_IPID_ATA100	0x003b
>  #define PCI_DEVICE_ID_APPLE_KEYLARGO_I	0x003e
>  #define PCI_DEVICE_ID_APPLE_K2_ATA100	0x0043

So, should the symbol for 0x0033 be spelled consistently too?
NB: PCI_DEVIEC_ rather than PCI_DEVICE_

Thanks,
jdl


From david at gibson.dropbear.id.au  Tue Jan  4 11:07:23 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 4 Jan 2005 11:07:23 +1100
Subject: [PATCH] sparse fixes for cpu feature constants
In-Reply-To: <1104758184.15200.6.camel@localhost.localdomain>
References: <1104381206.16694.38.camel@localhost.localdomain>
	<20050101223345.GC2297@zax>
	<1104758184.15200.6.camel@localhost.localdomain>
Message-ID: <20050104000723.GB6745@zax>

On Mon, Jan 03, 2005 at 07:16:24AM -0600, Nathan Lynch wrote:
> On Sun, 2005-01-02 at 09:33 +1100, David Gibson wrote:
> > On Wed, Dec 29, 2004 at 10:33:26PM -0600, Nathan Lynch wrote:
> > >
> > > Index: 2.6.10/include/asm-ppc64/cputable.h
> > > ===================================================================
> > > +++ 2.6.10/include/asm-ppc64/cputable.h	2004-12-30 04:04:09.463979408 +0000
> > > @@ -16,6 +16,7 @@
> > >  #define __ASM_PPC_CPUTABLE_H
> > >  
> > >  #include <linux/config.h>
> > > +#include <asm/page.h> /* for ASM_CONST */
> > 
> > Have you double checked that this won't cause a nasty #include loop?
> > The CPU constants are used in quite a few places, as is page.h
> 
> I think it's ok -- page.h includes the following:
> 
> - linux/config.h, which includes linux/autoconf.h
> 
> - asm-ppc64/naca.h, which includes asm-ppc64/types.h and
> asm-ppc64/systemcfg.h.
> 
> So I don't see any way that cputable.h could be pulled in before
> ASM_CONST is defined.

Ok, sounds good.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist.  NOT _the_ _other_ _way_
				| _around_!
http://www.ozlabs.org/people/dgibson


From sfr at canb.auug.org.au  Tue Jan  4 14:53:56 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 14:53:56 +1100
Subject: PPC64 cleanups 0/11
Message-ID: <20050104145356.4d5333dd.sfr@canb.auug.org.au>

Hi Andrew,

The following series of patches is mainly cleanups of the ppc64 code,
aimed at eliminating the naca structure.  In the end, the naca only
exists for legacy iSeries kernels.  One of the more intrusive parts of
these patches is the renaming of the fields of the lppaca structure to
eliminate another set of StudlyCaps.

These patches (in total) have been built on iSeries, pSeries and pmac and
booted on iSeries and pSeries.

Please apply and send upstream.

-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

From sfr at canb.auug.org.au  Tue Jan  4 15:04:10 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:04:10 +1100
Subject: [PATCH 1/11] PPC64: Consolidate cache sizing variables
In-Reply-To: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
Message-ID: <20050104150410.199b132e.sfr@canb.auug.org.au>

Hi Andrew,

This patch consolidates the variables that define the PPC64 cache sizes
into a single structure (they were split between the naca and systemcfg
structures).  Those that were in the systemcfg structure are left there,
but only because they are exported to user mode through /proc.
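A minimal sketch of how the consolidated structure gets filled in, mirroring the setup.c and iSeries_setup.c hunks in the patch. The field layout is copied from the patch's cache.h. The following are hypothetical: the u32 typedef, the log2_floor() and ppc64_caches_fill() helpers, and the example sizes in the test. log2_floor() reuses the halving loop from iSeries_setup.c, which gives the same result as __ilog2() for the power-of-two line sizes involved.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;	/* stand-in for the kernel's u32 */

/* Field layout copied from the patch's include/asm-ppc64/cache.h. */
struct ppc64_caches {
	u32	dsize;			/* L1 d-cache size */
	u32	dline_size;		/* L1 d-cache line size */
	u32	log_dline_size;
	u32	dlines_per_page;
	u32	isize;			/* L1 i-cache size */
	u32	iline_size;		/* L1 i-cache line size */
	u32	log_iline_size;
	u32	ilines_per_page;
};

/* Same loop as iSeries_setup.c: counts how many times i can be halved
 * before it reaches zero, i.e. floor(log2(i)) for i > 0. */
static u32 log2_floor(u32 i)
{
	u32 n = 0;

	while ((i = i / 2))
		++n;
	return n;
}

/* Hypothetical helper: fill every field, including the derived ones,
 * from the cache sizes and line sizes found at boot. */
static void ppc64_caches_fill(struct ppc64_caches *c, u32 page_size,
			      u32 dsize, u32 dline, u32 isize, u32 iline)
{
	assert(dline != 0 && iline != 0);

	c->dsize = dsize;
	c->dline_size = dline;
	c->log_dline_size = log2_floor(dline);
	c->dlines_per_page = page_size / dline;

	c->isize = isize;
	c->iline_size = iline;
	c->log_iline_size = log2_floor(iline);
	c->ilines_per_page = page_size / iline;
}
```

With 4K pages and 128-byte lines, each page spans 32 lines and the log line size is 7.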

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.1/arch/ppc64/kernel/asm-offsets.c
--- linus-bk/arch/ppc64/kernel/asm-offsets.c	2004-11-26 12:08:51.000000000 +1100
+++ linus-bk-naca.1/arch/ppc64/kernel/asm-offsets.c	2004-12-31 14:52:14.000000000 +1100
@@ -35,6 +35,7 @@
 #include <asm/iSeries/HvLpEvent.h>
 #include <asm/rtas.h>
 #include <asm/cputable.h>
+#include <asm/cache.h>
 
 #define DEFINE(sym, val) \
 	asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@@ -69,12 +70,12 @@
 
 	/* naca */
         DEFINE(PACA, offsetof(struct naca_struct, paca));
-	DEFINE(DCACHEL1LINESIZE, offsetof(struct systemcfg, dCacheL1LineSize));
-        DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct naca_struct, dCacheL1LogLineSize));
-        DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct naca_struct, dCacheL1LinesPerPage));
-        DEFINE(ICACHEL1LINESIZE, offsetof(struct systemcfg, iCacheL1LineSize));
-        DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct naca_struct, iCacheL1LogLineSize));
-        DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct naca_struct, iCacheL1LinesPerPage));
+	DEFINE(DCACHEL1LINESIZE, offsetof(struct ppc64_caches, dline_size));
+	DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_dline_size));
+	DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, dlines_per_page));
+	DEFINE(ICACHEL1LINESIZE, offsetof(struct ppc64_caches, iline_size));
+	DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_iline_size));
+	DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, ilines_per_page));
 	DEFINE(PLATFORM, offsetof(struct systemcfg, platform));
 
 	/* paca */
diff -ruN linus-bk/arch/ppc64/kernel/eeh.c linus-bk-naca.1/arch/ppc64/kernel/eeh.c
--- linus-bk/arch/ppc64/kernel/eeh.c	2004-10-26 16:06:41.000000000 +1000
+++ linus-bk-naca.1/arch/ppc64/kernel/eeh.c	2004-12-31 14:52:14.000000000 +1100
@@ -32,6 +32,7 @@
 #include <asm/machdep.h>
 #include <asm/rtas.h>
 #include <asm/atomic.h>
+#include <asm/systemcfg.h>
 #include "pci.h"
 
 #undef DEBUG
diff -ruN linus-bk/arch/ppc64/kernel/iSeries_setup.c linus-bk-naca.1/arch/ppc64/kernel/iSeries_setup.c
--- linus-bk/arch/ppc64/kernel/iSeries_setup.c	2004-11-12 09:09:48.000000000 +1100
+++ linus-bk-naca.1/arch/ppc64/kernel/iSeries_setup.c	2004-12-31 14:52:14.000000000 +1100
@@ -44,6 +44,7 @@
 #include "iSeries_setup.h"
 #include <asm/naca.h>
 #include <asm/paca.h>
+#include <asm/cache.h>
 #include <asm/sections.h>
 #include <asm/iSeries/LparData.h>
 #include <asm/iSeries/HvCallHpt.h>
@@ -560,33 +561,36 @@
 	unsigned int i, n;
 	unsigned int procIx = get_paca()->lppaca.xDynHvPhysicalProcIndex;
 
-	systemcfg->iCacheL1Size =
-		xIoHriProcessorVpd[procIx].xInstCacheSize * 1024;
-	systemcfg->iCacheL1LineSize =
+	systemcfg->icache_size =
+	ppc64_caches.isize = xIoHriProcessorVpd[procIx].xInstCacheSize * 1024;
+	systemcfg->icache_line_size =
+	ppc64_caches.iline_size =
 		xIoHriProcessorVpd[procIx].xInstCacheOperandSize;
-	systemcfg->dCacheL1Size =
+	systemcfg->dcache_size =
+	ppc64_caches.dsize =
 		xIoHriProcessorVpd[procIx].xDataL1CacheSizeKB * 1024;
-	systemcfg->dCacheL1LineSize =
+	systemcfg->dcache_line_size =
+	ppc64_caches.dline_size =
 		xIoHriProcessorVpd[procIx].xDataCacheOperandSize;
-	naca->iCacheL1LinesPerPage = PAGE_SIZE / systemcfg->iCacheL1LineSize;
-	naca->dCacheL1LinesPerPage = PAGE_SIZE / systemcfg->dCacheL1LineSize;
+	ppc64_caches.ilines_per_page = PAGE_SIZE / ppc64_caches.iline_size;
+	ppc64_caches.dlines_per_page = PAGE_SIZE / ppc64_caches.dline_size;
 
-	i = systemcfg->iCacheL1LineSize;
+	i = ppc64_caches.iline_size;
 	n = 0;
 	while ((i = (i / 2)))
 		++n;
-	naca->iCacheL1LogLineSize = n;
+	ppc64_caches.log_iline_size = n;
 
-	i = systemcfg->dCacheL1LineSize;
+	i = ppc64_caches.dline_size;
 	n = 0;
 	while ((i = (i / 2)))
 		++n;
-	naca->dCacheL1LogLineSize = n;
+	ppc64_caches.log_dline_size = n;
 
 	printk("D-cache line size = %d\n",
-			(unsigned int)systemcfg->dCacheL1LineSize);
+			(unsigned int)ppc64_caches.dline_size);
 	printk("I-cache line size = %d\n",
-			(unsigned int)systemcfg->iCacheL1LineSize);
+			(unsigned int)ppc64_caches.iline_size);
 }
 
 /*
diff -ruN linus-bk/arch/ppc64/kernel/idle.c linus-bk-naca.1/arch/ppc64/kernel/idle.c
--- linus-bk/arch/ppc64/kernel/idle.c	2004-10-27 07:32:57.000000000 +1000
+++ linus-bk-naca.1/arch/ppc64/kernel/idle.c	2004-12-31 14:52:14.000000000 +1100
@@ -32,6 +32,7 @@
 #include <asm/iSeries/HvCall.h>
 #include <asm/iSeries/ItLpQueue.h>
 #include <asm/plpar_wrappers.h>
+#include <asm/systemcfg.h>
 
 extern void power4_idle(void);
 
diff -ruN linus-bk/arch/ppc64/kernel/misc.S linus-bk-naca.1/arch/ppc64/kernel/misc.S
--- linus-bk/arch/ppc64/kernel/misc.S	2004-11-12 09:09:48.000000000 +1100
+++ linus-bk-naca.1/arch/ppc64/kernel/misc.S	2004-12-31 14:52:14.000000000 +1100
@@ -189,6 +189,11 @@
 	isync
 	blr
 
+	.section	".toc","aw"
+PPC64_CACHES:
+	.tc		ppc64_caches[TC],ppc64_caches
+	.section	".text"
+
 /*
  * Write any modified data cache blocks out to memory
  * and invalidate the corresponding instruction cache blocks.
@@ -207,11 +212,8 @@
  * and in some cases i-cache and d-cache line sizes differ from
  * each other.
  */
-	LOADADDR(r10,naca)		/* Get Naca address */
-	ld	r10,0(r10)
-	LOADADDR(r11,systemcfg)		/* Get systemcfg address */
-	ld	r11,0(r11)
-	lwz	r7,DCACHEL1LINESIZE(r11)/* Get cache line size */
+	ld	r10,PPC64_CACHES@toc(r2)
+	lwz	r7,DCACHEL1LINESIZE(r10)/* Get cache line size */
 	addi	r5,r7,-1
 	andc	r6,r3,r5		/* round low to line bdy */
 	subf	r8,r6,r4		/* compute length */
@@ -227,7 +229,7 @@
 
 /* Now invalidate the instruction cache */
 	
-	lwz	r7,ICACHEL1LINESIZE(r11)	/* Get Icache line size */
+	lwz	r7,ICACHEL1LINESIZE(r10)	/* Get Icache line size */
 	addi	r5,r7,-1
 	andc	r6,r3,r5		/* round low to line bdy */
 	subf	r8,r6,r4		/* compute length */
@@ -256,11 +258,8 @@
  * 
  * Different systems have different cache line sizes
  */
-	LOADADDR(r10,naca)		/* Get Naca address */
-	ld	r10,0(r10)
-	LOADADDR(r11,systemcfg)		/* Get systemcfg address */
-	ld	r11,0(r11)
-	lwz	r7,DCACHEL1LINESIZE(r11)	/* Get dcache line size */
+	ld	r10,PPC64_CACHES@toc(r2)
+	lwz	r7,DCACHEL1LINESIZE(r10)	/* Get dcache line size */
 	addi	r5,r7,-1
 	andc	r6,r3,r5		/* round low to line bdy */
 	subf	r8,r6,r4		/* compute length */
@@ -286,11 +285,8 @@
  *    flush all bytes from start to stop-1 inclusive
  */
 _GLOBAL(flush_dcache_phys_range)
-	LOADADDR(r10,naca)		/* Get Naca address */
-	ld	r10,0(r10)
-	LOADADDR(r11,systemcfg)		/* Get systemcfg address */
-	ld	r11,0(r11)
-	lwz	r7,DCACHEL1LINESIZE(r11)	/* Get dcache line size */
+	ld	r10,PPC64_CACHES@toc(r2)
+	lwz	r7,DCACHEL1LINESIZE(r10)	/* Get dcache line size */
 	addi	r5,r7,-1
 	andc	r6,r3,r5		/* round low to line bdy */
 	subf	r8,r6,r4		/* compute length */
@@ -332,13 +328,10 @@
  */
 
 /* Flush the dcache */
-	LOADADDR(r7,naca)
-	ld	r7,0(r7)
-	LOADADDR(r8,systemcfg)			/* Get systemcfg address */
-	ld	r8,0(r8)
+	ld	r7,PPC64_CACHES@toc(r2)
 	clrrdi	r3,r3,12           	    /* Page align */
 	lwz	r4,DCACHEL1LINESPERPAGE(r7)	/* Get # dcache lines per page */
-	lwz	r5,DCACHEL1LINESIZE(r8)		/* Get dcache line size */
+	lwz	r5,DCACHEL1LINESIZE(r7)		/* Get dcache line size */
 	mr	r6,r3
 	mtctr	r4
 0:	dcbst	0,r6
@@ -349,7 +342,7 @@
 /* Now invalidate the icache */	
 
 	lwz	r4,ICACHEL1LINESPERPAGE(r7)	/* Get # icache lines per page */
-	lwz	r5,ICACHEL1LINESIZE(r8)		/* Get icache line size */
+	lwz	r5,ICACHEL1LINESIZE(r7)		/* Get icache line size */
 	mtctr	r4
 1:	icbi	0,r3
 	add	r3,r3,r5
diff -ruN linus-bk/arch/ppc64/kernel/nvram.c linus-bk-naca.1/arch/ppc64/kernel/nvram.c
--- linus-bk/arch/ppc64/kernel/nvram.c	2004-11-16 16:05:10.000000000 +1100
+++ linus-bk-naca.1/arch/ppc64/kernel/nvram.c	2004-12-31 14:52:14.000000000 +1100
@@ -31,6 +31,7 @@
 #include <asm/rtas.h>
 #include <asm/prom.h>
 #include <asm/machdep.h>
+#include <asm/systemcfg.h>
 
 #undef DEBUG_NVRAM
 
diff -ruN linus-bk/arch/ppc64/kernel/pSeries_iommu.c linus-bk-naca.1/arch/ppc64/kernel/pSeries_iommu.c
--- linus-bk/arch/ppc64/kernel/pSeries_iommu.c	2004-11-26 12:08:51.000000000 +1100
+++ linus-bk-naca.1/arch/ppc64/kernel/pSeries_iommu.c	2004-12-31 14:52:14.000000000 +1100
@@ -43,6 +43,7 @@
 #include <asm/machdep.h>
 #include <asm/abs_addr.h>
 #include <asm/plpar_wrappers.h>
+#include <asm/systemcfg.h>
 #include "pci.h"
 
 
diff -ruN linus-bk/arch/ppc64/kernel/pacaData.c linus-bk-naca.1/arch/ppc64/kernel/pacaData.c
--- linus-bk/arch/ppc64/kernel/pacaData.c	2004-11-26 12:08:51.000000000 +1100
+++ linus-bk-naca.1/arch/ppc64/kernel/pacaData.c	2004-12-31 14:52:14.000000000 +1100
@@ -10,6 +10,8 @@
 #include <linux/config.h>
 #include <linux/types.h>
 #include <linux/threads.h>
+#include <linux/module.h>
+
 #include <asm/processor.h>
 #include <asm/ptrace.h>
 #include <asm/page.h>
@@ -20,7 +22,9 @@
 #include <asm/paca.h>
 
 struct naca_struct *naca;
+EXPORT_SYMBOL(naca);
 struct systemcfg *systemcfg;
+EXPORT_SYMBOL(systemcfg);
 
 /* This symbol is provided by the linker - let it fill in the paca
  * field correctly */
diff -ruN linus-bk/arch/ppc64/kernel/pmac_setup.c linus-bk-naca.1/arch/ppc64/kernel/pmac_setup.c
--- linus-bk/arch/ppc64/kernel/pmac_setup.c	2004-10-25 18:18:33.000000000 +1000
+++ linus-bk-naca.1/arch/ppc64/kernel/pmac_setup.c	2004-12-31 14:52:14.000000000 +1100
@@ -70,6 +70,7 @@
 #include <asm/time.h>
 #include <asm/of_device.h>
 #include <asm/lmb.h>
+#include <asm/naca.h>
 
 #include "pmac.h"
 #include "mpic.h"
diff -ruN linus-bk/arch/ppc64/kernel/ppc_ksyms.c linus-bk-naca.1/arch/ppc64/kernel/ppc_ksyms.c
--- linus-bk/arch/ppc64/kernel/ppc_ksyms.c	2004-10-21 07:17:18.000000000 +1000
+++ linus-bk-naca.1/arch/ppc64/kernel/ppc_ksyms.c	2004-12-31 14:52:14.000000000 +1100
@@ -67,7 +67,6 @@
 
 EXPORT_SYMBOL(__down_interruptible);
 EXPORT_SYMBOL(__up);
-EXPORT_SYMBOL(naca);
 EXPORT_SYMBOL(__down);
 #ifdef CONFIG_PPC_ISERIES
 EXPORT_SYMBOL(itLpNaca);
@@ -162,4 +161,3 @@
 EXPORT_SYMBOL(tb_ticks_per_usec);
 EXPORT_SYMBOL(paca);
 EXPORT_SYMBOL(cur_cpu_spec);
-EXPORT_SYMBOL(systemcfg);
diff -ruN linus-bk/arch/ppc64/kernel/rtas-proc.c linus-bk-naca.1/arch/ppc64/kernel/rtas-proc.c
--- linus-bk/arch/ppc64/kernel/rtas-proc.c	2004-10-21 07:17:18.000000000 +1000
+++ linus-bk-naca.1/arch/ppc64/kernel/rtas-proc.c	2004-12-31 14:52:14.000000000 +1100
@@ -31,6 +31,7 @@
 #include <asm/rtas.h>
 #include <asm/machdep.h> /* for ppc_md */
 #include <asm/time.h>
+#include <asm/systemcfg.h>
 
 /* Token for Sensors */
 #define KEY_SWITCH		0x0001
diff -ruN linus-bk/arch/ppc64/kernel/rtas.c linus-bk-naca.1/arch/ppc64/kernel/rtas.c
--- linus-bk/arch/ppc64/kernel/rtas.c	2004-11-26 12:08:51.000000000 +1100
+++ linus-bk-naca.1/arch/ppc64/kernel/rtas.c	2004-12-31 14:52:14.000000000 +1100
@@ -29,6 +29,7 @@
 #include <asm/udbg.h>
 #include <asm/delay.h>
 #include <asm/uaccess.h>
+#include <asm/systemcfg.h>
 
 struct flash_block_list_header rtas_firmware_flash_list = {0, NULL};
 
diff -ruN linus-bk/arch/ppc64/kernel/rtasd.c linus-bk-naca.1/arch/ppc64/kernel/rtasd.c
--- linus-bk/arch/ppc64/kernel/rtasd.c	2004-11-16 16:05:10.000000000 +1100
+++ linus-bk-naca.1/arch/ppc64/kernel/rtasd.c	2004-12-31 14:52:14.000000000 +1100
@@ -26,6 +26,7 @@
 #include <asm/prom.h>
 #include <asm/nvram.h>
 #include <asm/atomic.h>
+#include <asm/systemcfg.h>
 
 #if 0
 #define DEBUG(A...)	printk(KERN_ERR A)
diff -ruN linus-bk/arch/ppc64/kernel/setup.c linus-bk-naca.1/arch/ppc64/kernel/setup.c
--- linus-bk/arch/ppc64/kernel/setup.c	2004-12-14 04:07:06.000000000 +1100
+++ linus-bk-naca.1/arch/ppc64/kernel/setup.c	2004-12-31 16:22:00.000000000 +1100
@@ -54,6 +54,7 @@
 #include <asm/rtas.h>
 #include <asm/iommu.h>
 #include <asm/serial.h>
+#include <asm/cache.h>
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -111,6 +112,8 @@
 int boot_cpuid_phys = 0;
 dev_t boot_dev;
 
+struct ppc64_caches ppc64_caches;
+
 /*
  * These are used in binfmt_elf.c to put aux entries on the stack
  * for each elf executable being started.
@@ -489,15 +492,15 @@
 			lsizep = (u32 *) get_property(np, dc, NULL);
 			if (lsizep != NULL)
 				lsize = *lsizep;
-
 			if (sizep == 0 || lsizep == 0)
 				DBG("Argh, can't find dcache properties ! "
 				    "sizep: %p, lsizep: %p\n", sizep, lsizep);
 
-			systemcfg->dCacheL1Size = size;
-			systemcfg->dCacheL1LineSize = lsize;
-			naca->dCacheL1LogLineSize = __ilog2(lsize);
-			naca->dCacheL1LinesPerPage = PAGE_SIZE/(lsize);
+			systemcfg->dcache_size = ppc64_caches.dsize = size;
+			systemcfg->dcache_line_size =
+				ppc64_caches.dline_size = lsize;
+			ppc64_caches.log_dline_size = __ilog2(lsize);
+			ppc64_caches.dlines_per_page = PAGE_SIZE / lsize;
 
 			size = 0;
 			lsize = cur_cpu_spec->icache_bsize;
@@ -511,11 +514,11 @@
 				DBG("Argh, can't find icache properties ! "
 				    "sizep: %p, lsizep: %p\n", sizep, lsizep);
 
-			systemcfg->iCacheL1Size = size;
-			systemcfg->iCacheL1LineSize = lsize;
-			naca->iCacheL1LogLineSize = __ilog2(lsize);
-			naca->iCacheL1LinesPerPage = PAGE_SIZE/(lsize);
-
+			systemcfg->icache_size = ppc64_caches.isize = size;
+			systemcfg->icache_line_size =
+				ppc64_caches.iline_size = lsize;
+			ppc64_caches.log_iline_size = __ilog2(lsize);
+			ppc64_caches.ilines_per_page = PAGE_SIZE / lsize;
 		}
 	}
 
@@ -664,8 +667,10 @@
 	printk("systemcfg->platform           = 0x%x\n", systemcfg->platform);
 	printk("systemcfg->processorCount     = 0x%lx\n", systemcfg->processorCount);
 	printk("systemcfg->physicalMemorySize = 0x%lx\n", systemcfg->physicalMemorySize);
-	printk("systemcfg->dCacheL1LineSize   = 0x%x\n", systemcfg->dCacheL1LineSize);
-	printk("systemcfg->iCacheL1LineSize   = 0x%x\n", systemcfg->iCacheL1LineSize);
+	printk("ppc64_caches.dcache_line_size = 0x%x\n",
+			ppc64_caches.dline_size);
+	printk("ppc64_caches.icache_line_size = 0x%x\n",
+			ppc64_caches.iline_size);
 	printk("htab_data.htab                = 0x%p\n", htab_data.htab);
 	printk("htab_data.num_ptegs           = 0x%lx\n", htab_data.htab_num_ptegs);
 	printk("-----------------------------------------------------\n");
@@ -1000,8 +1005,8 @@
 	 * Systems with OF can look in the properties on the cpu node(s)
 	 * for a possibly more accurate value.
 	 */
-	dcache_bsize = systemcfg->dCacheL1LineSize; 
-	icache_bsize = systemcfg->iCacheL1LineSize; 
+	dcache_bsize = ppc64_caches.dline_size;
+	icache_bsize = ppc64_caches.iline_size;
 
 	/* reboot on panic */
 	panic_timeout = 180;
diff -ruN linus-bk/arch/ppc64/kernel/sys_ppc32.c linus-bk-naca.1/arch/ppc64/kernel/sys_ppc32.c
--- linus-bk/arch/ppc64/kernel/sys_ppc32.c	2004-10-28 16:57:54.000000000 +1000
+++ linus-bk-naca.1/arch/ppc64/kernel/sys_ppc32.c	2004-12-31 14:52:14.000000000 +1100
@@ -73,6 +73,7 @@
 #include <asm/ppcdebug.h>
 #include <asm/time.h>
 #include <asm/mmu_context.h>
+#include <asm/systemcfg.h>
 
 #include "pci.h"
 
diff -ruN linus-bk/arch/ppc64/kernel/sysfs.c linus-bk-naca.1/arch/ppc64/kernel/sysfs.c
--- linus-bk/arch/ppc64/kernel/sysfs.c	2004-11-16 16:05:10.000000000 +1100
+++ linus-bk-naca.1/arch/ppc64/kernel/sysfs.c	2004-12-31 14:52:14.000000000 +1100
@@ -13,6 +13,7 @@
 #include <asm/cputable.h>
 #include <asm/hvcall.h>
 #include <asm/prom.h>
+#include <asm/systemcfg.h>
 
 
 /* SMT stuff */
diff -ruN linus-bk/arch/ppc64/kernel/time.c linus-bk-naca.1/arch/ppc64/kernel/time.c
--- linus-bk/arch/ppc64/kernel/time.c	2004-10-21 07:17:18.000000000 +1000
+++ linus-bk-naca.1/arch/ppc64/kernel/time.c	2004-12-31 14:52:14.000000000 +1100
@@ -66,6 +66,7 @@
 #include <asm/ppcdebug.h>
 #include <asm/prom.h>
 #include <asm/sections.h>
+#include <asm/systemcfg.h>
 
 void smp_local_timer_interrupt(struct pt_regs *);
 
diff -ruN linus-bk/arch/ppc64/kernel/traps.c linus-bk-naca.1/arch/ppc64/kernel/traps.c
--- linus-bk/arch/ppc64/kernel/traps.c	2004-09-09 09:59:49.000000000 +1000
+++ linus-bk-naca.1/arch/ppc64/kernel/traps.c	2004-12-31 14:52:14.000000000 +1100
@@ -37,6 +37,7 @@
 #include <asm/processor.h>
 #include <asm/ppcdebug.h>
 #include <asm/rtas.h>
+#include <asm/systemcfg.h>
 
 #ifdef CONFIG_PPC_PSERIES
 /* This is true if we are using the firmware NMI handler (typically LPAR) */
diff -ruN linus-bk/include/asm-ppc64/cache.h linus-bk-naca.1/include/asm-ppc64/cache.h
--- linus-bk/include/asm-ppc64/cache.h	2002-08-28 06:04:10.000000000 +1000
+++ linus-bk-naca.1/include/asm-ppc64/cache.h	2004-12-31 14:52:14.000000000 +1100
@@ -7,6 +7,8 @@
 #ifndef __ARCH_PPC64_CACHE_H
 #define __ARCH_PPC64_CACHE_H
 
+#include <asm/types.h>
+
 /* bytes per L1 cache line */
 #define L1_CACHE_SHIFT	7
 #define L1_CACHE_BYTES	(1 << L1_CACHE_SHIFT)
@@ -14,4 +16,21 @@
 #define SMP_CACHE_BYTES L1_CACHE_BYTES
 #define L1_CACHE_SHIFT_MAX 7	/* largest L1 which this arch supports */
 
+#ifndef __ASSEMBLY__
+
+struct ppc64_caches {
+	u32	dsize;			/* L1 d-cache size */
+	u32	dline_size;		/* L1 d-cache line size	*/
+	u32	log_dline_size;
+	u32	dlines_per_page;
+	u32	isize;			/* L1 i-cache size */
+	u32	iline_size;		/* L1 i-cache line size	*/
+	u32	log_iline_size;
+	u32	ilines_per_page;
+};
+
+extern struct ppc64_caches ppc64_caches;
+
+#endif
+
 #endif
diff -ruN linus-bk/include/asm-ppc64/naca.h linus-bk-naca.1/include/asm-ppc64/naca.h
--- linus-bk/include/asm-ppc64/naca.h	2004-09-16 21:51:58.000000000 +1000
+++ linus-bk-naca.1/include/asm-ppc64/naca.h	2004-12-31 14:52:14.000000000 +1100
@@ -16,11 +16,7 @@
 #ifndef __ASSEMBLY__
 
 struct naca_struct {
-	/*==================================================================
-	 * Cache line 1: 0x0000 - 0x007F
-	 * Kernel only data - undefined for user space
-	 *==================================================================
-	 */
+	/* Kernel only data - undefined for user space */
 	void *xItVpdAreas;              /* VPD Data                  0x00 */
 	void *xRamDisk;                 /* iSeries ramdisk           0x08 */
 	u64   xRamDiskSize;		/* In pages                  0x10 */
@@ -32,12 +28,6 @@
 	u64 interrupt_controller;	/* Type of int controller    0x40 */ 
 	u64 unused1;			/* was SLB size in entries   0x48 */
 	u64 pftSize;			/* Log 2 of page table size  0x50 */
-	void *systemcfg;		/* Pointer to systemcfg data 0x58 */
-	u32 dCacheL1LogLineSize;	/* L1 d-cache line size Log2 0x60 */
-	u32 dCacheL1LinesPerPage;	/* L1 d-cache lines / page   0x64 */
-	u32 iCacheL1LogLineSize;	/* L1 i-cache line size Log2 0x68 */
-	u32 iCacheL1LinesPerPage;	/* L1 i-cache lines / page   0x6c */
-	u8  resv0[15];                  /* Reserved           0x71 - 0x7F */
 };
 
 extern struct naca_struct *naca;
diff -ruN linus-bk/include/asm-ppc64/page.h linus-bk-naca.1/include/asm-ppc64/page.h
--- linus-bk/include/asm-ppc64/page.h	2004-10-29 07:03:22.000000000 +1000
+++ linus-bk-naca.1/include/asm-ppc64/page.h	2004-12-31 14:52:14.000000000 +1100
@@ -93,7 +93,7 @@
 
 #ifdef __KERNEL__
 #ifndef __ASSEMBLY__
-#include <asm/naca.h>
+#include <asm/cache.h>
 
 #undef STRICT_MM_TYPECHECKS
 
@@ -106,8 +106,8 @@
 {
 	unsigned long lines, line_size;
 
-	line_size = systemcfg->dCacheL1LineSize; 
-	lines = naca->dCacheL1LinesPerPage;
+	line_size = ppc64_caches.dline_size;
+	lines = ppc64_caches.dlines_per_page;
 
 	__asm__ __volatile__(
 	"mtctr  	%1	# clear_page\n\
diff -ruN linus-bk/include/asm-ppc64/processor.h linus-bk-naca.1/include/asm-ppc64/processor.h
--- linus-bk/include/asm-ppc64/processor.h	2004-12-29 18:05:40.000000000 +1100
+++ linus-bk-naca.1/include/asm-ppc64/processor.h	2004-12-31 15:01:17.000000000 +1100
@@ -19,6 +19,7 @@
 #endif
 #include <asm/ptrace.h>
 #include <asm/types.h>
+#include <asm/systemcfg.h>
 
 /* Machine State Register (MSR) Fields */
 #define MSR_SF_LG	63              /* Enable 64 bit mode */
diff -ruN linus-bk/include/asm-ppc64/systemcfg.h linus-bk-naca.1/include/asm-ppc64/systemcfg.h
--- linus-bk/include/asm-ppc64/systemcfg.h	2004-09-29 08:25:16.000000000 +1000
+++ linus-bk-naca.1/include/asm-ppc64/systemcfg.h	2004-12-31 14:52:14.000000000 +1100
@@ -15,14 +15,6 @@
  * End Change Activity 
  */
 
-
-#ifndef __KERNEL__
-#include <unistd.h>
-#include <fcntl.h>
-#include <sys/mman.h>
-#include <linux/types.h>
-#endif
-
 /*
  * If the major version changes we are incompatible.
  * Minor version changes are a hint.
@@ -50,10 +42,11 @@
 	__u64 tb_update_count;		/* Timebase atomicity ctr	0x50 */
 	__u32 tz_minuteswest;		/* Minutes west of Greenwich	0x58 */
 	__u32 tz_dsttime;		/* Type of dst correction	0x5C */
-	__u32 dCacheL1Size;		/* L1 d-cache size		0x60 */
-	__u32 dCacheL1LineSize;		/* L1 d-cache line size		0x64 */
-	__u32 iCacheL1Size;		/* L1 i-cache size		0x68 */
-	__u32 iCacheL1LineSize;		/* L1 i-cache line size		0x6C */
+	/* next four are no longer used except to be exported to /proc */
+	__u32 dcache_size;		/* L1 d-cache size		0x60 */
+	__u32 dcache_line_size;		/* L1 d-cache line size		0x64 */
+	__u32 icache_size;		/* L1 i-cache size		0x68 */
+	__u32 icache_line_size;		/* L1 i-cache line size		0x6C */
 	__u8  reserved0[3984];		/* Reserve rest of page		0x70 */
 };
 
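The misc.S hunks now load the address of ppc64_caches from the TOC and read fields at offsets that asm-offsets.c generates. The sketch below models this in plain C. The enum stands in for the generated DCACHEL1LINESIZE-style constants. lines_to_flush() reproduces the visible addi/andc/subf set-up. The final round-up-and-shift step does not appear in the hunks and is an assumption, as are the example addresses in the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;	/* stand-in for the kernel's u32 */

/* Same field order as struct ppc64_caches in the patch's cache.h. */
struct ppc64_caches {
	u32 dsize;
	u32 dline_size;
	u32 log_dline_size;
	u32 dlines_per_page;
	u32 isize;
	u32 iline_size;
	u32 log_iline_size;
	u32 ilines_per_page;
};

/* Stand-ins for the constants asm-offsets.c emits via DEFINE():
 * byte offsets that misc.S adds to the ppc64_caches base in lwz. */
enum {
	DCACHEL1LINESIZE     = offsetof(struct ppc64_caches, dline_size),
	DCACHEL1LINESPERPAGE = offsetof(struct ppc64_caches, dlines_per_page),
	ICACHEL1LINESIZE     = offsetof(struct ppc64_caches, iline_size),
	ICACHEL1LINESPERPAGE = offsetof(struct ppc64_caches, ilines_per_page),
};

/* C model of the range set-up in the d-cache flush routines.
 * Assumes dline_size is a power of two and log_dline_size matches it. */
static unsigned long lines_to_flush(const struct ppc64_caches *c,
				    unsigned long start, unsigned long stop)
{
	unsigned long mask = c->dline_size - 1;	/* addi r5,r7,-1 */
	unsigned long low = start & ~mask;	/* andc: round low to line bdy */
	unsigned long len = stop - low;		/* subf: compute length */

	assert(c->dline_size != 0 && (c->dline_size & mask) == 0);
	/* Assumed remaining step (not in the hunks): round up to whole lines. */
	return (len + mask) >> c->log_dline_size;
}
```

All eight fields are u32, so the offsets land on 4-byte steps. That is why a single base register (r10 or r7) now serves for both d-cache and i-cache fields, where the old code needed separate naca and systemcfg pointers.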

From sfr at canb.auug.org.au  Tue Jan  4 15:08:33 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:08:33 +1100
Subject: [PATCH 2/11] PPC64: remove the page table size from the naca
In-Reply-To: <20050104150410.199b132e.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
Message-ID: <20050104150833.5d3f3722.sfr@canb.auug.org.au>

Hi Andrew,

This patch removes the page table size field (pftSize) from the naca and
replaces it with a new global variable, ppc64_pft_size.
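A minimal sketch of what ppc64_pft_size holds, following the prom.c and pSeries_lpar.c hunks below. The value is the log2 of the hash table size in bytes. When firmware does not supply ibm,pft-size, it is derived from RAM: one 128-byte PTEG per two 4K pages. Clearing the table walks 16-byte entries. The lines that round memory up to a power of two are elided in the hunk, so this sketch assumes a conventional round-up. The ilog2_u64(), default_pft_size() and hptab_entries() helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;	/* stand-in for the kernel's u64 */

/* floor(log2(x)) for x > 0; stand-in for the kernel's __ilog2(). */
static unsigned int ilog2_u64(u64 x)
{
	unsigned int n = 0;

	while (x >>= 1)
		++n;
	return n;
}

/* Fallback hash table size, as in early_init_devtree(). */
static u64 default_pft_size(u64 mem_size)
{
	/* Assumed round-up of mem_size to the next power of two. */
	u64 rnd_mem_size = 1ULL << ilog2_u64(mem_size);
	u64 pteg_count;

	if (rnd_mem_size < mem_size)
		rnd_mem_size <<= 1;

	pteg_count = rnd_mem_size >> (12 + 1);	/* # pages / 2 */
	return ilog2_u64(pteg_count << 7);	/* 128-byte PTEGs */
}

/* Entries walked by pSeries_lpar_hptab_clear(): 16 bytes each. */
static u64 hptab_entries(u64 pft_size)
{
	u64 size_bytes = 1ULL << pft_size;

	return size_bytes >> 4;
}
```

For example, 1 GiB of RAM yields a pft size of 24, that is, a 16 MiB hash table (1/64 of memory) holding 2^20 entries.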

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.1/arch/ppc64/kernel/pSeries_lpar.c linus-bk-naca.2/arch/ppc64/kernel/pSeries_lpar.c
--- linus-bk-naca.1/arch/ppc64/kernel/pSeries_lpar.c	2004-12-14 04:07:06.000000000 +1100
+++ linus-bk-naca.2/arch/ppc64/kernel/pSeries_lpar.c	2004-12-31 15:16:48.000000000 +1100
@@ -33,7 +33,6 @@
 #include <asm/mmu_context.h>
 #include <asm/ppcdebug.h>
 #include <asm/iommu.h>
-#include <asm/naca.h>
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include <asm/prom.h>
@@ -368,7 +367,7 @@
 
 static void pSeries_lpar_hptab_clear(void)
 {
-	unsigned long size_bytes = 1UL << naca->pftSize;
+	unsigned long size_bytes = 1UL << ppc64_pft_size;
 	unsigned long hpte_count = size_bytes >> 4;
 	unsigned long dummy1, dummy2;
 	int i;
diff -ruN linus-bk-naca.1/arch/ppc64/kernel/prom.c linus-bk-naca.2/arch/ppc64/kernel/prom.c
--- linus-bk-naca.1/arch/ppc64/kernel/prom.c	2004-11-26 12:08:51.000000000 +1100
+++ linus-bk-naca.2/arch/ppc64/kernel/prom.c	2004-12-31 14:52:56.000000000 +1100
@@ -844,12 +844,12 @@
 
 	/* On LPAR, look for the first ibm,pft-size property for the  hash table size
 	 */
-	if (systemcfg->platform == PLATFORM_PSERIES_LPAR && naca->pftSize == 0) {
+	if (systemcfg->platform == PLATFORM_PSERIES_LPAR && ppc64_pft_size == 0) {
 		u32 *pft_size;
 		pft_size = (u32 *)get_flat_dt_prop(node, "ibm,pft-size", NULL);
 		if (pft_size != NULL) {
 			/* pft_size[0] is the NUMA CEC cookie */
-			naca->pftSize = pft_size[1];
+			ppc64_pft_size = pft_size[1];
 		}
 	}
 
@@ -1018,7 +1018,7 @@
 	initial_boot_params = params;
 
 	/* By default, hash size is not set */
-	naca->pftSize = 0;
+	ppc64_pft_size = 0;
 
 	/* Retreive various informations from the /chosen node of the
 	 * device-tree, including the platform type, initrd location and
@@ -1047,7 +1047,7 @@
 	/* If hash size wasn't obtained above, we calculate it now based on
 	 * the total RAM size
 	 */
-	if (naca->pftSize == 0) {
+	if (ppc64_pft_size == 0) {
 		unsigned long rnd_mem_size, pteg_count;
 
 		/* round mem_size up to next power of 2 */
@@ -1058,10 +1058,10 @@
 		/* # pages / 2 */
 		pteg_count = (rnd_mem_size >> (12 + 1));
 
-		naca->pftSize = __ilog2(pteg_count << 7);
+		ppc64_pft_size = __ilog2(pteg_count << 7);
 	}
 
-	DBG("Hash pftSize: %x\n", (int)naca->pftSize);
+	DBG("Hash pftSize: %x\n", (int)ppc64_pft_size);
 	DBG(" <- early_init_devtree()\n");
 }
 
diff -ruN linus-bk-naca.1/arch/ppc64/kernel/setup.c linus-bk-naca.2/arch/ppc64/kernel/setup.c
--- linus-bk-naca.1/arch/ppc64/kernel/setup.c	2004-12-31 16:22:00.000000000 +1100
+++ linus-bk-naca.2/arch/ppc64/kernel/setup.c	2004-12-31 16:22:49.000000000 +1100
@@ -55,6 +55,7 @@
 #include <asm/iommu.h>
 #include <asm/serial.h>
 #include <asm/cache.h>
+#include <asm/page.h>
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -111,6 +112,7 @@
 int boot_cpuid = 0;
 int boot_cpuid_phys = 0;
 dev_t boot_dev;
+u64 ppc64_pft_size;
 
 struct ppc64_caches ppc64_caches;
 
@@ -660,7 +662,7 @@
 
 	printk("-----------------------------------------------------\n");
 	printk("naca                          = 0x%p\n", naca);
-	printk("naca->pftSize                 = 0x%lx\n", naca->pftSize);
+	printk("ppc64_pft_size                = 0x%lx\n", ppc64_pft_size);
 	printk("naca->debug_switch            = 0x%lx\n", naca->debug_switch);
 	printk("naca->interrupt_controller    = 0x%ld\n", naca->interrupt_controller);
 	printk("systemcfg                     = 0x%p\n", systemcfg);
diff -ruN linus-bk-naca.1/arch/ppc64/mm/hash_utils.c linus-bk-naca.2/arch/ppc64/mm/hash_utils.c
--- linus-bk-naca.1/arch/ppc64/mm/hash_utils.c	2004-10-29 07:03:21.000000000 +1000
+++ linus-bk-naca.2/arch/ppc64/mm/hash_utils.c	2004-12-31 14:52:56.000000000 +1100
@@ -41,7 +41,6 @@
 #include <asm/types.h>
 #include <asm/system.h>
 #include <asm/uaccess.h>
-#include <asm/naca.h>
 #include <asm/machdep.h>
 #include <asm/lmb.h>
 #include <asm/abs_addr.h>
@@ -147,7 +146,7 @@
 	 * Calculate the required size of the htab.  We want the number of
 	 * PTEGs to equal one half the number of real pages.
 	 */ 
-	htab_size_bytes = 1UL << naca->pftSize;
+	htab_size_bytes = 1UL << ppc64_pft_size;
 	pteg_count = htab_size_bytes >> 7;
 
 	/* For debug, make the HTAB 1/8 as big as it normally would be. */
diff -ruN linus-bk-naca.1/include/asm-ppc64/naca.h linus-bk-naca.2/include/asm-ppc64/naca.h
--- linus-bk-naca.1/include/asm-ppc64/naca.h	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-naca.2/include/asm-ppc64/naca.h	2004-12-31 14:52:56.000000000 +1100
@@ -26,8 +26,6 @@
 	u64 log;                        /* Ptr to log buffer         0x30 */
 	u64 serialPortAddr;		/* Phy addr of serial port   0x38 */
 	u64 interrupt_controller;	/* Type of int controller    0x40 */ 
-	u64 unused1;			/* was SLB size in entries   0x48 */
-	u64 pftSize;			/* Log 2 of page table size  0x50 */
 };
 
 extern struct naca_struct *naca;
diff -ruN linus-bk-naca.1/include/asm-ppc64/page.h linus-bk-naca.2/include/asm-ppc64/page.h
--- linus-bk-naca.1/include/asm-ppc64/page.h	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-naca.2/include/asm-ppc64/page.h	2004-12-31 14:52:56.000000000 +1100
@@ -183,6 +183,8 @@
 
 extern int page_is_ram(unsigned long pfn);
 
+extern u64 ppc64_pft_size;		/* Log 2 of page table size */
+
 #endif /* __ASSEMBLY__ */
 
 #ifdef MODULE
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/29fa9153/attachment.pgp 

From sfr at canb.auug.org.au  Tue Jan  4 15:12:29 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:12:29 +1100
Subject: [PATCH 3/11] PPC64: remove interrupt_controller from naca
In-Reply-To: <20050104150833.5d3f3722.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
Message-ID: <20050104151229.521e8083.sfr@canb.auug.org.au>

Hi Andrew,

This patch just moves the interrupt_controller field of the naca into a
global variable, ppc64_interrupt_controller.
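A minimal user-space sketch of how the new global is set and consumed, modelled on the pSeries interrupt-controller discovery loop in this patch. The IC_* values below are hypothetical stand-ins for the real constants in the ppc64 headers, and discover_pic() is an illustrative name; unlike the original loop, the sketch treats a missing "compatible" property (NULL) as unrecognised instead of passing it to strstr().

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical values; the real IC_* constants live in the ppc64 headers. */
#define IC_INVALID	0
#define IC_OPEN_PIC	1
#define IC_PPC_XIC	2

/* Global replacing naca->interrupt_controller. */
uint64_t ppc64_interrupt_controller;

/* Classify the "compatible" property of the first interrupt-controller
 * node, as the pSeries discovery code does. */
static void discover_pic(const char *compatible)
{
	ppc64_interrupt_controller = IC_INVALID;
	if (compatible && strstr(compatible, "open-pic"))
		ppc64_interrupt_controller = IC_OPEN_PIC;
	else if (compatible && strstr(compatible, "ppc-xicp"))
		ppc64_interrupt_controller = IC_PPC_XIC;
	else
		printf("discover_pic: failed to recognize interrupt-controller\n");
}
```

Code elsewhere (SMP ops selection, virq mapping, the i8259 cascade setup) then simply compares ppc64_interrupt_controller against IC_OPEN_PIC or IC_PPC_XIC instead of dereferencing the naca.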

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.2/arch/ppc64/kernel/irq.c linus-bk-naca.3/arch/ppc64/kernel/irq.c
--- linus-bk-naca.2/arch/ppc64/kernel/irq.c	2004-10-21 07:17:18.000000000 +1000
+++ linus-bk-naca.3/arch/ppc64/kernel/irq.c	2004-12-31 14:53:21.000000000 +1100
@@ -65,6 +65,7 @@
 int __irq_offset_value;
 int ppc_spurious_interrupts;
 unsigned long lpevent_count;
+u64 ppc64_interrupt_controller;
 
 int show_interrupts(struct seq_file *p, void *v)
 {
@@ -360,7 +361,7 @@
 	unsigned int virq, first_virq;
 	static int warned;
 
-	if (naca->interrupt_controller == IC_OPEN_PIC)
+	if (ppc64_interrupt_controller == IC_OPEN_PIC)
 		return real_irq;	/* no mapping for openpic (for now) */
 
 	/* don't map interrupts < MIN_VIRT_IRQ */
diff -ruN linus-bk-naca.2/arch/ppc64/kernel/maple_setup.c linus-bk-naca.3/arch/ppc64/kernel/maple_setup.c
--- linus-bk-naca.2/arch/ppc64/kernel/maple_setup.c	2004-10-30 08:33:22.000000000 +1000
+++ linus-bk-naca.3/arch/ppc64/kernel/maple_setup.c	2004-12-31 14:53:21.000000000 +1100
@@ -155,7 +155,7 @@
 	}
 
 	/* Setup interrupt mapping options */
-	naca->interrupt_controller = IC_OPEN_PIC;
+	ppc64_interrupt_controller = IC_OPEN_PIC;
 
 	DBG(" <- maple_init_early\n");
 }
diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pSeries_pci.c linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c
--- linus-bk-naca.2/arch/ppc64/kernel/pSeries_pci.c	2004-11-16 16:05:10.000000000 +1100
+++ linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c	2004-12-31 14:53:21.000000000 +1100
@@ -353,7 +353,7 @@
 	unsigned int *opprop = NULL;
 	struct device_node *root = of_find_node_by_path("/");
 
-	if (naca->interrupt_controller == IC_OPEN_PIC) {
+	if (ppc64_interrupt_controller == IC_OPEN_PIC) {
 		opprop = (unsigned int *)get_property(root,
 				"platform-open-pic", NULL);
 	}
@@ -375,7 +375,7 @@
 		pci_process_bridge_OF_ranges(phb, node);
 		pci_setup_phb_io(phb, index == 0);
 
-		if (naca->interrupt_controller == IC_OPEN_PIC && pSeries_mpic) {
+		if (ppc64_interrupt_controller == IC_OPEN_PIC && pSeries_mpic) {
 			int addr = root_size_cells * (index + 2) - 1;
 			mpic_assign_isu(pSeries_mpic, index, opprop[addr]);
 		}
diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pSeries_setup.c linus-bk-naca.3/arch/ppc64/kernel/pSeries_setup.c
--- linus-bk-naca.2/arch/ppc64/kernel/pSeries_setup.c	2004-12-14 04:07:06.000000000 +1100
+++ linus-bk-naca.3/arch/ppc64/kernel/pSeries_setup.c	2004-12-31 15:22:17.000000000 +1100
@@ -196,7 +196,7 @@
 static void __init pSeries_setup_arch(void)
 {
 	/* Fixup ppc_md depending on the type of interrupt controller */
-	if (naca->interrupt_controller == IC_OPEN_PIC) {
+	if (ppc64_interrupt_controller == IC_OPEN_PIC) {
 		ppc_md.init_IRQ       = pSeries_init_mpic; 
 		ppc_md.get_irq        = mpic_get_irq;
 		/* Allocate the mpic now, so that find_and_init_phbs() can
@@ -308,13 +308,13 @@
 	 * to properly parse the OF interrupt tree & do the virtual irq mapping
 	 */
 	__irq_offset_value = NUM_ISA_INTERRUPTS;
-	naca->interrupt_controller = IC_INVALID;
+	ppc64_interrupt_controller = IC_INVALID;
 	for (np = NULL; (np = of_find_node_by_name(np, "interrupt-controller"));) {
 		typep = (char *)get_property(np, "compatible", NULL);
 		if (strstr(typep, "open-pic"))
-			naca->interrupt_controller = IC_OPEN_PIC;
+			ppc64_interrupt_controller = IC_OPEN_PIC;
 		else if (strstr(typep, "ppc-xicp"))
-			naca->interrupt_controller = IC_PPC_XIC;
+			ppc64_interrupt_controller = IC_PPC_XIC;
 		else
 			printk("initialize_naca: failed to recognize"
 			       " interrupt-controller\n");
diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pSeries_smp.c linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c
--- linus-bk-naca.2/arch/ppc64/kernel/pSeries_smp.c	2004-12-14 04:07:06.000000000 +1100
+++ linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c	2004-12-31 15:22:45.000000000 +1100
@@ -348,7 +348,7 @@
 
 	DBG(" -> smp_init_pSeries()\n");
 
-	if (naca->interrupt_controller == IC_OPEN_PIC)
+	if (ppc64_interrupt_controller == IC_OPEN_PIC)
 		smp_ops = &pSeries_mpic_smp_ops;
 	else
 		smp_ops = &pSeries_xics_smp_ops;
diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pmac_setup.c linus-bk-naca.3/arch/ppc64/kernel/pmac_setup.c
--- linus-bk-naca.2/arch/ppc64/kernel/pmac_setup.c	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-naca.3/arch/ppc64/kernel/pmac_setup.c	2004-12-31 14:53:21.000000000 +1100
@@ -70,7 +70,6 @@
 #include <asm/time.h>
 #include <asm/of_device.h>
 #include <asm/lmb.h>
-#include <asm/naca.h>
 
 #include "pmac.h"
 #include "mpic.h"
@@ -316,7 +315,7 @@
 	}
 
 	/* Setup interrupt mapping options */
-	naca->interrupt_controller = IC_OPEN_PIC;
+	ppc64_interrupt_controller = IC_OPEN_PIC;
 
 	DBG(" <- pmac_init_early\n");
 }
diff -ruN linus-bk-naca.2/arch/ppc64/kernel/prom.c linus-bk-naca.3/arch/ppc64/kernel/prom.c
--- linus-bk-naca.2/arch/ppc64/kernel/prom.c	2004-12-31 14:52:56.000000000 +1100
+++ linus-bk-naca.3/arch/ppc64/kernel/prom.c	2004-12-31 14:53:21.000000000 +1100
@@ -44,7 +44,6 @@
 #include <asm/system.h>
 #include <asm/mmu.h>
 #include <asm/pgtable.h>
-#include <asm/naca.h>
 #include <asm/pci.h>
 #include <asm/iommu.h>
 #include <asm/bootinfo.h>
@@ -557,7 +556,7 @@
 
 	DBG(" -> finish_device_tree\n");
 
-	if (naca->interrupt_controller == IC_INVALID) {
+	if (ppc64_interrupt_controller == IC_INVALID) {
 		DBG("failed to configure interrupt controller type\n");
 		panic("failed to configure interrupt controller type\n");
 	}
diff -ruN linus-bk-naca.2/arch/ppc64/kernel/setup.c linus-bk-naca.3/arch/ppc64/kernel/setup.c
--- linus-bk-naca.2/arch/ppc64/kernel/setup.c	2004-12-31 16:22:49.000000000 +1100
+++ linus-bk-naca.3/arch/ppc64/kernel/setup.c	2004-12-31 16:23:03.000000000 +1100
@@ -664,7 +664,7 @@
 	printk("naca                          = 0x%p\n", naca);
 	printk("ppc64_pft_size                = 0x%lx\n", ppc64_pft_size);
 	printk("naca->debug_switch            = 0x%lx\n", naca->debug_switch);
-	printk("naca->interrupt_controller    = 0x%ld\n", naca->interrupt_controller);
+	printk("ppc64_interrupt_controller    = 0x%ld\n", ppc64_interrupt_controller);
 	printk("systemcfg                     = 0x%p\n", systemcfg);
 	printk("systemcfg->platform           = 0x%x\n", systemcfg->platform);
 	printk("systemcfg->processorCount     = 0x%lx\n", systemcfg->processorCount);
diff -ruN linus-bk-naca.2/arch/ppc64/kernel/xics.c linus-bk-naca.3/arch/ppc64/kernel/xics.c
--- linus-bk-naca.2/arch/ppc64/kernel/xics.c	2004-12-14 04:07:06.000000000 +1100
+++ linus-bk-naca.3/arch/ppc64/kernel/xics.c	2004-12-31 15:24:20.000000000 +1100
@@ -24,7 +24,6 @@
 #include <asm/io.h>
 #include <asm/pgtable.h>
 #include <asm/smp.h>
-#include <asm/naca.h>
 #include <asm/rtas.h>
 #include <asm/xics.h>
 #include <asm/hvcall.h>
@@ -575,7 +574,7 @@
  */
 static int __init xics_setup_i8259(void)
 {
-	if (naca->interrupt_controller == IC_PPC_XIC &&
+	if (ppc64_interrupt_controller == IC_PPC_XIC &&
 	    xics_irq_8259_cascade != -1) {
 		if (request_irq(irq_offset_up(xics_irq_8259_cascade),
 				no_action, 0, "8259 cascade", NULL))
diff -ruN linus-bk-naca.2/include/asm-ppc64/naca.h linus-bk-naca.3/include/asm-ppc64/naca.h
--- linus-bk-naca.2/include/asm-ppc64/naca.h	2004-12-31 14:52:56.000000000 +1100
+++ linus-bk-naca.3/include/asm-ppc64/naca.h	2004-12-31 14:53:21.000000000 +1100
@@ -25,7 +25,6 @@
 	u64 banner;                     /* Ptr to banner string      0x28 */
 	u64 log;                        /* Ptr to log buffer         0x30 */
 	u64 serialPortAddr;		/* Phy addr of serial port   0x38 */
-	u64 interrupt_controller;	/* Type of int controller    0x40 */ 
 };
 
 extern struct naca_struct *naca;
diff -ruN linus-bk-naca.2/include/asm-ppc64/processor.h linus-bk-naca.3/include/asm-ppc64/processor.h
--- linus-bk-naca.2/include/asm-ppc64/processor.h	2004-12-31 15:01:17.000000000 +1100
+++ linus-bk-naca.3/include/asm-ppc64/processor.h	2004-12-31 15:25:17.000000000 +1100
@@ -484,6 +484,7 @@
 #ifdef __KERNEL__
 
 extern int have_of;
+extern u64 ppc64_interrupt_controller;
 
 struct task_struct;
 void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp);

From sfr at canb.auug.org.au  Tue Jan  4 15:19:06 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:19:06 +1100
Subject: [PATCH 4/11] PPC64: remove /proc/ppc64/{naca,paca/xx}
In-Reply-To: <20050104151229.521e8083.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
Message-ID: <20050104151906.6e50f1d2.sfr@canb.auug.org.au>

Hi Andrew,

This patch removes the (unused) /proc entries for the naca and the
(per-cpu) pacas.  It also removes many includes of <asm/naca.h> that are
no longer necessary.

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.3/arch/ppc64/kernel/iSeries_pci.c linus-bk-naca.4/arch/ppc64/kernel/iSeries_pci.c
--- linus-bk-naca.3/arch/ppc64/kernel/iSeries_pci.c	2004-11-16 16:05:10.000000000 +1100
+++ linus-bk-naca.4/arch/ppc64/kernel/iSeries_pci.c	2004-12-10 16:26:54.000000000 +1100
@@ -35,7 +35,6 @@
 #include <asm/machdep.h>
 #include <asm/pci-bridge.h>
 #include <asm/ppcdebug.h>
-#include <asm/naca.h>
 #include <asm/iommu.h>
 
 #include <asm/iSeries/HvCallPci.h>
diff -ruN linus-bk-naca.3/arch/ppc64/kernel/iSeries_proc.c linus-bk-naca.4/arch/ppc64/kernel/iSeries_proc.c
--- linus-bk-naca.3/arch/ppc64/kernel/iSeries_proc.c	2004-10-22 07:00:21.000000000 +1000
+++ linus-bk-naca.4/arch/ppc64/kernel/iSeries_proc.c	2004-12-10 16:26:54.000000000 +1100
@@ -24,7 +24,6 @@
 #include <asm/paca.h>
 #include <asm/processor.h>
 #include <asm/time.h>
-#include <asm/naca.h>
 #include <asm/iSeries/ItLpPaca.h>
 #include <asm/iSeries/ItLpQueue.h>
 #include <asm/iSeries/HvCallXm.h>
diff -ruN linus-bk-naca.3/arch/ppc64/kernel/iSeries_smp.c linus-bk-naca.4/arch/ppc64/kernel/iSeries_smp.c
--- linus-bk-naca.3/arch/ppc64/kernel/iSeries_smp.c	2004-10-30 08:33:22.000000000 +1000
+++ linus-bk-naca.4/arch/ppc64/kernel/iSeries_smp.c	2004-12-10 16:26:54.000000000 +1100
@@ -37,7 +37,6 @@
 #include <asm/pgtable.h>
 #include <asm/io.h>
 #include <asm/smp.h>
-#include <asm/naca.h>
 #include <asm/paca.h>
 #include <asm/iSeries/LparData.h>
 #include <asm/iSeries/HvCall.h>
diff -ruN linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c linus-bk-naca.4/arch/ppc64/kernel/pSeries_pci.c
--- linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c	2004-12-31 14:53:21.000000000 +1100
+++ linus-bk-naca.4/arch/ppc64/kernel/pSeries_pci.c	2004-12-10 16:26:54.000000000 +1100
@@ -36,7 +36,6 @@
 #include <asm/prom.h>
 #include <asm/machdep.h>
 #include <asm/pci-bridge.h>
-#include <asm/naca.h>
 #include <asm/iommu.h>
 #include <asm/rtas.h>
 
diff -ruN linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c linus-bk-naca.4/arch/ppc64/kernel/pSeries_smp.c
--- linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c	2004-12-31 15:22:45.000000000 +1100
+++ linus-bk-naca.4/arch/ppc64/kernel/pSeries_smp.c	2004-12-31 15:27:45.000000000 +1100
@@ -38,7 +38,6 @@
 #include <asm/io.h>
 #include <asm/prom.h>
 #include <asm/smp.h>
-#include <asm/naca.h>
 #include <asm/paca.h>
 #include <asm/time.h>
 #include <asm/ppcdebug.h>
diff -ruN linus-bk-naca.3/arch/ppc64/kernel/pci_dn.c linus-bk-naca.4/arch/ppc64/kernel/pci_dn.c
--- linus-bk-naca.3/arch/ppc64/kernel/pci_dn.c	2004-10-25 18:18:33.000000000 +1000
+++ linus-bk-naca.4/arch/ppc64/kernel/pci_dn.c	2004-12-10 16:26:54.000000000 +1100
@@ -33,7 +33,6 @@
 #include <asm/machdep.h>
 #include <asm/pci-bridge.h>
 #include <asm/ppcdebug.h>
-#include <asm/naca.h>
 #include <asm/iommu.h>
 
 #include "pci.h"
diff -ruN linus-bk-naca.3/arch/ppc64/kernel/proc_ppc64.c linus-bk-naca.4/arch/ppc64/kernel/proc_ppc64.c
--- linus-bk-naca.3/arch/ppc64/kernel/proc_ppc64.c	2004-10-27 07:32:57.000000000 +1000
+++ linus-bk-naca.4/arch/ppc64/kernel/proc_ppc64.c	2004-12-10 16:26:54.000000000 +1100
@@ -25,8 +25,6 @@
 #include <linux/slab.h>
 #include <linux/kernel.h>
 
-#include <asm/naca.h>
-#include <asm/paca.h>
 #include <asm/systemcfg.h>
 #include <asm/rtas.h>
 #include <asm/uaccess.h>
@@ -58,26 +56,6 @@
 #endif
 
 /*
- * NOTE: since paca data is always in flux the values will never be a
- * consistant set.
- */
-static void __init proc_create_paca(struct proc_dir_entry *dir, int num)
-{
-	struct proc_dir_entry *ent;
-	struct paca_struct *lpaca = paca + num;
-	char buf[16];
-
-	sprintf(buf, "%02x", num);
-	ent = create_proc_entry(buf, S_IRUSR, dir);
-	if (ent) {
-		ent->nlink = 1;
-		ent->data = lpaca;
-		ent->size = 4096;
-		ent->proc_fops = &page_map_fops;
-	}
-}
-
-/*
  * Create the ppc64 and ppc64/rtas directories early. This allows us to
  * assume that they have been previously created in drivers.
  */
@@ -104,17 +82,8 @@
 
 static int __init proc_ppc64_init(void)
 {
-	unsigned long i;
 	struct proc_dir_entry *pde;
 
-	pde = create_proc_entry("ppc64/naca", S_IRUSR, NULL);
-	if (!pde)
-		return 1;
-	pde->nlink = 1;
-	pde->data = naca;
-	pde->size = 4096;
-	pde->proc_fops = &page_map_fops;
-
 	pde = create_proc_entry("ppc64/systemcfg", S_IFREG|S_IRUGO, NULL);
 	if (!pde)
 		return 1;
@@ -123,13 +92,6 @@
 	pde->size = 4096;
 	pde->proc_fops = &page_map_fops;
 
-	/* /proc/ppc64/paca/XX -- raw paca contents.  Only readable to root */
-	pde = proc_mkdir("ppc64/paca", NULL);
-	if (!pde)
-		return 1;
-	for_each_cpu(i)
-		proc_create_paca(pde, i);
-
 #ifdef CONFIG_PPC_PSERIES
 	if ((systemcfg->platform & PLATFORM_PSERIES))
 		proc_ppc64_create_ofdt();
diff -ruN linus-bk-naca.3/arch/ppc64/kernel/prom_init.c linus-bk-naca.4/arch/ppc64/kernel/prom_init.c
--- linus-bk-naca.3/arch/ppc64/kernel/prom_init.c	2004-12-08 12:07:34.000000000 +1100
+++ linus-bk-naca.4/arch/ppc64/kernel/prom_init.c	2004-12-10 16:26:54.000000000 +1100
@@ -43,7 +43,6 @@
 #include <asm/system.h>
 #include <asm/mmu.h>
 #include <asm/pgtable.h>
-#include <asm/naca.h>
 #include <asm/pci.h>
 #include <asm/iommu.h>
 #include <asm/bootinfo.h>
diff -ruN linus-bk-naca.3/arch/ppc64/kernel/smp.c linus-bk-naca.4/arch/ppc64/kernel/smp.c
--- linus-bk-naca.3/arch/ppc64/kernel/smp.c	2004-12-14 04:07:06.000000000 +1100
+++ linus-bk-naca.4/arch/ppc64/kernel/smp.c	2004-12-31 15:29:14.000000000 +1100
@@ -41,7 +41,6 @@
 #include <asm/io.h>
 #include <asm/prom.h>
 #include <asm/smp.h>
-#include <asm/naca.h>
 #include <asm/paca.h>
 #include <asm/time.h>
 #include <asm/ppcdebug.h>
diff -ruN linus-bk-naca.3/arch/ppc64/mm/init.c linus-bk-naca.4/arch/ppc64/mm/init.c
--- linus-bk-naca.3/arch/ppc64/mm/init.c	2004-11-04 16:05:08.000000000 +1100
+++ linus-bk-naca.4/arch/ppc64/mm/init.c	2004-12-10 16:26:54.000000000 +1100
@@ -52,7 +52,6 @@
 #include <asm/smp.h>
 #include <asm/machdep.h>
 #include <asm/tlb.h>
-#include <asm/naca.h>
 #include <asm/eeh.h>
 #include <asm/processor.h>
 #include <asm/mmzone.h>
diff -ruN linus-bk-naca.3/arch/ppc64/mm/slb.c linus-bk-naca.4/arch/ppc64/mm/slb.c
--- linus-bk-naca.3/arch/ppc64/mm/slb.c	2004-09-06 10:19:04.000000000 +1000
+++ linus-bk-naca.4/arch/ppc64/mm/slb.c	2004-12-10 16:26:54.000000000 +1100
@@ -19,7 +19,6 @@
 #include <asm/mmu.h>
 #include <asm/mmu_context.h>
 #include <asm/paca.h>
-#include <asm/naca.h>
 #include <asm/cputable.h>
 
 extern void slb_allocate(unsigned long ea);
diff -ruN linus-bk-naca.3/arch/ppc64/mm/stab.c linus-bk-naca.4/arch/ppc64/mm/stab.c
--- linus-bk-naca.3/arch/ppc64/mm/stab.c	2004-09-16 21:51:57.000000000 +1000
+++ linus-bk-naca.4/arch/ppc64/mm/stab.c	2004-12-10 16:26:54.000000000 +1100
@@ -17,7 +17,6 @@
 #include <asm/mmu.h>
 #include <asm/mmu_context.h>
 #include <asm/paca.h>
-#include <asm/naca.h>
 #include <asm/cputable.h>
 
 /* Both the segment table and SLB code uses the following cache */
diff -ruN linus-bk-naca.3/include/asm-ppc64/iSeries/LparData.h linus-bk-naca.4/include/asm-ppc64/iSeries/LparData.h
--- linus-bk-naca.3/include/asm-ppc64/iSeries/LparData.h	2002-09-18 12:00:50.000000000 +1000
+++ linus-bk-naca.4/include/asm-ppc64/iSeries/LparData.h	2004-12-10 16:26:54.000000000 +1100
@@ -24,11 +24,9 @@
 #include <asm/page.h>
 #include <asm/abs_addr.h>
 
-#include <asm/naca.h>
 #include <asm/iSeries/ItLpNaca.h>
 #include <asm/iSeries/ItLpPaca.h>
 #include <asm/iSeries/ItLpRegSave.h>
-#include <asm/paca.h>
 #include <asm/iSeries/HvReleaseData.h>
 #include <asm/iSeries/LparMap.h>
 #include <asm/iSeries/ItVpdAreas.h>

From sfr at canb.auug.org.au  Tue Jan  4 15:23:40 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:23:40 +1100
Subject: [PATCH 5/11] PPC64: remove the paca pointer from the naca
In-Reply-To: <20050104151906.6e50f1d2.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
Message-ID: <20050104152340.67219ccf.sfr@canb.auug.org.au>

Hi Andrew,

The only user of the paca pointer in the naca was some iSeries assembler
in head.S, which used it to find a parameter to pass to some C code
(iSeries_early_setup).  That C code did not even declare that parameter!

Remove the paca pointer.
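For context, asm-offsets.c turned offsetof(struct naca_struct, paca) into the PACA constant that head.S used in `ld r6,PACA(r4)`. The sketch below is a simplified model (naca_sketch is a hypothetical name, and every field is modelled as a 64-bit value, which matches the real pointer size on ppc64) showing where the remaining fields sit once the pointer is gone. Note that the offset comments in naca.h still show the pre-patch values, e.g. 0x20 for debug_switch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of struct naca_struct after this patch
 * (interrupt_controller and pftSize were removed earlier in the series). */
struct naca_sketch {
	uint64_t xItVpdAreas;		/* VPD Data */
	uint64_t xRamDisk;		/* iSeries ramdisk */
	uint64_t xRamDiskSize;		/* In pages */
	uint64_t debug_switch;		/* Debug print control */
	uint64_t banner;		/* Ptr to banner string */
	uint64_t log;			/* Ptr to log buffer */
	uint64_t serialPortAddr;	/* Phy addr of serial port */
};

/* The same offsetof arithmetic asm-offsets.c relies on: with the
 * paca pointer removed, debug_switch moves up from 0x20 to 0x18. */
static size_t debug_switch_offset(void)
{
	return offsetof(struct naca_sketch, debug_switch);
}
```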

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.4/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.5/arch/ppc64/kernel/asm-offsets.c
--- linus-bk-naca.4/arch/ppc64/kernel/asm-offsets.c	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-naca.5/arch/ppc64/kernel/asm-offsets.c	2004-12-10 17:27:14.000000000 +1100
@@ -28,7 +28,6 @@
 #include <asm/pgtable.h>
 #include <asm/processor.h>
 
-#include <asm/naca.h>
 #include <asm/paca.h>
 #include <asm/iSeries/ItLpPaca.h>
 #include <asm/iSeries/ItLpQueue.h>
@@ -68,8 +67,6 @@
 #endif /* CONFIG_ALTIVEC */
 	DEFINE(MM, offsetof(struct task_struct, mm));
 
-	/* naca */
-        DEFINE(PACA, offsetof(struct naca_struct, paca));
 	DEFINE(DCACHEL1LINESIZE, offsetof(struct ppc64_caches, dline_size));
 	DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_dline_size));
 	DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, dlines_per_page));
diff -ruN linus-bk-naca.4/arch/ppc64/kernel/head.S linus-bk-naca.5/arch/ppc64/kernel/head.S
--- linus-bk-naca.4/arch/ppc64/kernel/head.S	2004-11-26 12:08:51.000000000 +1100
+++ linus-bk-naca.5/arch/ppc64/kernel/head.S	2004-12-10 18:40:24.000000000 +1100
@@ -517,12 +517,7 @@
 __start_naca:
 #ifdef CONFIG_PPC_ISERIES
 	.llong itVpdAreas
-#else
-	.llong 0x0
 #endif
-	.llong 0x0
-	.llong 0x0
-	.llong paca
 
 	. = SYSTEMCFG_PHYS_ADDR
 	.globl __end_naca
@@ -1241,6 +1236,7 @@
 #endif
 #endif
 	b 	3b			/* Loop until told to go	 */
+
 #ifdef CONFIG_PPC_ISERIES
 _STATIC(__start_initialization_iSeries)
 	/* Clear out the BSS */
@@ -1278,10 +1274,6 @@
 	SET_REG_TO_CONST(r4, NACA_VIRT_ADDR)
 	std	r4,0(r9)		/* set the naca pointer */
 
-	/* Get the pointer to the segment table */
-	ld	r6,PACA(r4)		/* Get the base paca pointer	*/
-	ld	r4,PACASTABVIRT(r6)
-
 	bl	.iSeries_early_setup
 
 	/* relocation is on at this point */
diff -ruN linus-bk-naca.4/include/asm-ppc64/naca.h linus-bk-naca.5/include/asm-ppc64/naca.h
--- linus-bk-naca.4/include/asm-ppc64/naca.h	2004-12-31 14:53:21.000000000 +1100
+++ linus-bk-naca.5/include/asm-ppc64/naca.h	2004-12-10 18:42:14.000000000 +1100
@@ -11,7 +11,6 @@
  */
 
 #include <asm/types.h>
-#include <asm/systemcfg.h>
 
 #ifndef __ASSEMBLY__
 
@@ -20,7 +19,6 @@
 	void *xItVpdAreas;              /* VPD Data                  0x00 */
 	void *xRamDisk;                 /* iSeries ramdisk           0x08 */
 	u64   xRamDiskSize;		/* In pages                  0x10 */
-	struct paca_struct *paca;	/* Ptr to an array of pacas  0x18 */
 	u64 debug_switch;		/* Debug print control       0x20 */
 	u64 banner;                     /* Ptr to banner string      0x28 */
 	u64 log;                        /* Ptr to log buffer         0x30 */

From sfr at canb.auug.org.au  Tue Jan  4 15:27:05 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:27:05 +1100
Subject: [PATCH 6/11] PPC64: remove serialPortAddr from the naca
In-Reply-To: <20050104152340.67219ccf.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
	<20050104152340.67219ccf.sfr@canb.auug.org.au>
Message-ID: <20050104152705.6030abc5.sfr@canb.auug.org.au>

Hi Andrew,

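The serialPortAddr field of the naca was only used locally, so remove it
and have generic_find_legacy_serial_ports() return the address through a
new physport argument instead.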
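A minimal sketch of the out-parameter pattern this patch adopts. find_legacy_port_sketch() is a hypothetical, simplified stand-in for generic_find_legacy_serial_ports(): instead of walking the device tree, it takes the io_base, register address and speed that walk would find. As in the patch, physport is mandatory and always written, while default_speed may be NULL.

```c
#include <stdint.h>

/* Report the legacy serial port through *physport instead of a global
 * naca field.  io_base, reg_address and spd are assumed inputs standing
 * in for values read from the device tree. */
static void find_legacy_port_sketch(uint64_t io_base, uint64_t reg_address,
				    const unsigned int *spd,
				    uint64_t *physport,
				    unsigned int *default_speed)
{
	*physport = 0;			/* caller must always supply physport */
	if (default_speed)		/* default_speed is optional */
		*default_speed = 0;

	if (io_base != 0) {
		*physport = io_base + reg_address;
		if (default_speed && spd)
			*default_speed = *spd;
	}
}
```

Callers keep the result in a local u64 and test it (`if (physport)`) before mapping the UART with __ioremap(), as the maple and pSeries early-init code does.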

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.5/arch/ppc64/kernel/maple_setup.c linus-bk-naca.6/arch/ppc64/kernel/maple_setup.c
--- linus-bk-naca.5/arch/ppc64/kernel/maple_setup.c	2004-12-31 14:53:21.000000000 +1100
+++ linus-bk-naca.6/arch/ppc64/kernel/maple_setup.c	2004-12-11 00:53:42.000000000 +1100
@@ -75,7 +75,8 @@
 extern void maple_pci_init(void);
 extern void maple_pcibios_fixup(void);
 extern int maple_pci_get_legacy_ide_irq(struct pci_dev *dev, int channel);
-extern void generic_find_legacy_serial_ports(unsigned int *default_speed);
+extern void generic_find_legacy_serial_ports(u64 *physport,
+		unsigned int *default_speed);
 
 
 static void maple_restart(char *cmd)
@@ -129,6 +130,7 @@
 static void __init maple_init_early(void)
 {
 	unsigned int default_speed;
+	u64 physport;
 
 	DBG(" -> maple_init_early\n");
 
@@ -138,14 +140,14 @@
 	hpte_init_native();
 
 	/* Find the serial port */
-       	generic_find_legacy_serial_ports(&default_speed);
+	generic_find_legacy_serial_ports(&physport, &default_speed);
 
-	DBG("naca->serialPortAddr: %lx\n", (long)naca->serialPortAddr);
+	DBG("phys port addr: %lx\n", (long)physport);
 
-	if (naca->serialPortAddr) {
+	if (physport) {
 		void *comport;
 		/* Map the uart for udbg. */
-		comport = (void *)__ioremap(naca->serialPortAddr, 16, _PAGE_NO_CACHE);
+		comport = (void *)__ioremap(physport, 16, _PAGE_NO_CACHE);
 		udbg_init_uart(comport, default_speed);
 
 		ppc_md.udbg_putc = udbg_putc;
diff -ruN linus-bk-naca.5/arch/ppc64/kernel/pSeries_setup.c linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c
--- linus-bk-naca.5/arch/ppc64/kernel/pSeries_setup.c	2004-12-31 15:22:17.000000000 +1100
+++ linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c	2004-12-31 15:35:13.000000000 +1100
@@ -81,7 +81,8 @@
 extern int  pSeries_set_rtc_time(struct rtc_time *rtc_time);
 extern void find_udbg_vterm(void);
 extern void SystemReset_FWNMI(void), MachineCheck_FWNMI(void);	/* from head.S */
-extern void generic_find_legacy_serial_ports(unsigned int *default_speed);
+extern void generic_find_legacy_serial_ports(u64 *physport,
+		unsigned int *default_speed);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
@@ -344,6 +345,7 @@
 	void *comport;
 	int iommu_off = 0;
 	unsigned int default_speed;
+	u64 physport;
 
 	DBG(" -> pSeries_init_early()\n");
 
@@ -357,13 +359,13 @@
 			     get_property(of_chosen, "linux,iommu-off", NULL));
 	}
 
-	generic_find_legacy_serial_ports(&default_speed);
+	generic_find_legacy_serial_ports(&physport, &default_speed);
 
 	if (systemcfg->platform & PLATFORM_LPAR)
 		find_udbg_vterm();
-	else if (naca->serialPortAddr) {
+	else if (physport) {
 		/* Map the uart for udbg. */
-		comport = (void *)__ioremap(naca->serialPortAddr, 16, _PAGE_NO_CACHE);
+		comport = (void *)__ioremap(physport, 16, _PAGE_NO_CACHE);
 		udbg_init_uart(comport, default_speed);
 
 		ppc_md.udbg_putc = udbg_putc;
diff -ruN linus-bk-naca.5/arch/ppc64/kernel/setup.c linus-bk-naca.6/arch/ppc64/kernel/setup.c
--- linus-bk-naca.5/arch/ppc64/kernel/setup.c	2004-12-31 16:24:54.000000000 +1100
+++ linus-bk-naca.6/arch/ppc64/kernel/setup.c	2004-12-31 16:23:30.000000000 +1100
@@ -1154,7 +1154,8 @@
 static struct plat_serial8250_port serial_ports[MAX_LEGACY_SERIAL_PORTS+1];
 static unsigned int old_serial_count;
 
-void __init generic_find_legacy_serial_ports(unsigned int *default_speed)
+void __init generic_find_legacy_serial_ports(u64 *physport,
+		unsigned int *default_speed)
 {
 	struct device_node *np;
 	u32 *sizeprop;
@@ -1172,7 +1173,7 @@
 
 	DBG(" -> generic_find_legacy_serial_port()\n");
 
-	naca->serialPortAddr = 0;
+	*physport = 0;
 	if (default_speed)
 		*default_speed = 0;
 
@@ -1294,7 +1295,7 @@
 				io_base = (io_base << 32) | rangesp[4];
 		}
 		if (io_base != 0) {
-			naca->serialPortAddr = io_base + reg->address;
+			*physport = io_base + reg->address;
 			if (default_speed && spd)
 				*default_speed = *spd;
 		}
diff -ruN linus-bk-naca.5/include/asm-ppc64/naca.h linus-bk-naca.6/include/asm-ppc64/naca.h
--- linus-bk-naca.5/include/asm-ppc64/naca.h	2004-12-10 18:42:14.000000000 +1100
+++ linus-bk-naca.6/include/asm-ppc64/naca.h	2004-12-11 00:03:55.000000000 +1100
@@ -22,7 +22,6 @@
 	u64 debug_switch;		/* Debug print control       0x20 */
 	u64 banner;                     /* Ptr to banner string      0x28 */
 	u64 log;                        /* Ptr to log buffer         0x30 */
-	u64 serialPortAddr;		/* Phy addr of serial port   0x38 */
 };
 
 extern struct naca_struct *naca;

From sfr at canb.auug.org.au  Tue Jan  4 15:31:02 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:31:02 +1100
Subject: [PATCH 7/11] PPC64: remove debug_switch from the naca
In-Reply-To: <20050104152705.6030abc5.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
	<20050104152340.67219ccf.sfr@canb.auug.org.au>
	<20050104152705.6030abc5.sfr@canb.auug.org.au>
Message-ID: <20050104153102.67284491.sfr@canb.auug.org.au>

Hi Andrew,

This patch moves debug_switch from the naca into a global variable,
ppc64_debug_switch.

It also includes a couple of trivial naming tidy-ups.
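The commented-out `PPCDBG_BUSWALK | PPCDBG_PHBINIT | ...` expression in ppcdbg_initialize() suggests the switch is a bitmask of debug message classes. Assuming that, here is a minimal sketch of setting and testing the new global. The DBG_SKETCH_* flag values and both function names are hypothetical; the real PPCDBG_* and PPC_DEBUG_DEFAULT definitions live in <asm/ppcdebug.h>.

```c
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define DBG_SKETCH_BUSWALK	(1ULL << 0)
#define DBG_SKETCH_PHBINIT	(1ULL << 1)
#define DBG_SKETCH_MM		(1ULL << 2)
#define DBG_SKETCH_DEFAULT	0ULL

/* Global replacing naca->debug_switch. */
uint64_t ppc64_debug_switch;

/* Set the default mask plus any extra message classes. */
static void ppcdbg_initialize_sketch(uint64_t extra)
{
	ppc64_debug_switch = DBG_SKETCH_DEFAULT | extra;
}

/* Non-zero if any of the requested message classes is enabled. */
static int ppcdbg_enabled(uint64_t flags)
{
	return (ppc64_debug_switch & flags) != 0;
}
```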

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c linus-bk-naca.7/arch/ppc64/kernel/pSeries_setup.c
--- linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c	2004-12-31 15:35:13.000000000 +1100
+++ linus-bk-naca.7/arch/ppc64/kernel/pSeries_setup.c	2004-12-31 15:39:01.000000000 +1100
@@ -56,7 +56,6 @@
 #include <asm/dma.h>
 #include <asm/machdep.h>
 #include <asm/irq.h>
-#include <asm/naca.h>
 #include <asm/time.h>
 #include <asm/nvram.h>
 
@@ -317,7 +316,7 @@
 		else if (strstr(typep, "ppc-xicp"))
 			ppc64_interrupt_controller = IC_PPC_XIC;
 		else
-			printk("initialize_naca: failed to recognize"
+			printk("pSeries_discover_pic: failed to recognize"
 			       " interrupt-controller\n");
 		break;
 	}
diff -ruN linus-bk-naca.6/arch/ppc64/kernel/setup.c linus-bk-naca.7/arch/ppc64/kernel/setup.c
--- linus-bk-naca.6/arch/ppc64/kernel/setup.c	2004-12-31 16:23:30.000000000 +1100
+++ linus-bk-naca.7/arch/ppc64/kernel/setup.c	2004-12-31 16:25:02.000000000 +1100
@@ -41,7 +41,6 @@
 #include <asm/elf.h>
 #include <asm/machdep.h>
 #include <asm/iSeries/LparData.h>
-#include <asm/naca.h>
 #include <asm/paca.h>
 #include <asm/ppcdebug.h>
 #include <asm/time.h>
@@ -113,6 +112,7 @@
 int boot_cpuid_phys = 0;
 dev_t boot_dev;
 u64 ppc64_pft_size;
+u64 ppc64_debug_switch;
 
 struct ppc64_caches ppc64_caches;
 
@@ -161,7 +161,7 @@
  */
 void __init ppcdbg_initialize(void)
 {
-	naca->debug_switch = PPC_DEBUG_DEFAULT; /* | PPCDBG_BUSWALK | */
+	ppc64_debug_switch = PPC_DEBUG_DEFAULT; /* | PPCDBG_BUSWALK | */
 	/* PPCDBG_PHBINIT | PPCDBG_MM | PPCDBG_MMINIT | PPCDBG_TCEINIT | PPCDBG_TCE */;
 }
 
@@ -399,7 +399,7 @@
 	DBG(" -> early_setup()\n");
 
 	/*
-	 * Fill the default DBG level in naca (do we want to keep
+	 * Fill the default DBG level (do we want to keep
 	 * that old mecanism around forever ?)
 	 */
 	ppcdbg_initialize();
@@ -453,17 +453,17 @@
 
 
 /*
- * Initialize some remaining members of the naca and systemcfg structures
+ * Initialize some remaining members of the ppc64_caches and systemcfg structures
  * (at least until we get rid of them completely). This is mostly some
  * cache informations about the CPU that will be used by cache flush
  * routines and/or provided to userland
  */
-static void __init initialize_naca(void)
+static void __init initialize_cache_info(void)
 {
 	struct device_node *np;
 	unsigned long num_cpus = 0;
 
-	DBG(" -> initialize_naca()\n");
+	DBG(" -> initialize_cache_info()\n");
 
 	for (np = NULL; (np = of_find_node_by_type(np, "cpu"));) {
 		num_cpus += 1;
@@ -530,7 +530,7 @@
 	systemcfg->version.minor = SYSTEMCFG_MINOR;
 	systemcfg->processor = mfspr(SPRN_PVR);
 
-	DBG(" <- initialize_naca()\n");
+	DBG(" <- initialize_cache_info()\n");
 }
 
 static void __init check_for_initrd(void)
@@ -591,7 +591,7 @@
 	unflatten_device_tree();
 
 	/*
-	 * Fill the naca & systemcfg structures with informations
+	 * Fill the ppc64_caches & systemcfg structures with informations
 	 * retreived from the device-tree. Need to be called before
 	 * finish_device_tree() since the later requires some of the
 	 * informations filled up here to properly parse the interrupt
@@ -600,7 +600,7 @@
 	 * routines like flush_icache_range (used by the hash init
 	 * later on).
 	 */
-	initialize_naca();
+	initialize_cache_info();
 
 #ifdef CONFIG_PPC_PSERIES
 	/*
@@ -661,9 +661,8 @@
 	printk("Starting Linux PPC64 %s\n", UTS_RELEASE);
 
 	printk("-----------------------------------------------------\n");
-	printk("naca                          = 0x%p\n", naca);
 	printk("ppc64_pft_size                = 0x%lx\n", ppc64_pft_size);
-	printk("naca->debug_switch            = 0x%lx\n", naca->debug_switch);
+	printk("ppc64_debug_switch            = 0x%lx\n", ppc64_debug_switch);
 	printk("ppc64_interrupt_controller    = 0x%ld\n", ppc64_interrupt_controller);
 	printk("systemcfg                     = 0x%p\n", systemcfg);
 	printk("systemcfg->platform           = 0x%x\n", systemcfg->platform);
diff -ruN linus-bk-naca.6/arch/ppc64/kernel/udbg.c linus-bk-naca.7/arch/ppc64/kernel/udbg.c
--- linus-bk-naca.6/arch/ppc64/kernel/udbg.c	2004-11-22 14:05:02.000000000 +1100
+++ linus-bk-naca.7/arch/ppc64/kernel/udbg.c	2004-12-11 02:31:17.000000000 +1100
@@ -15,7 +15,6 @@
 #include <linux/types.h>
 #include <asm/ppcdebug.h>
 #include <asm/processor.h>
-#include <asm/naca.h>
 #include <asm/uaccess.h>
 #include <asm/machdep.h>
 #include <asm/io.h>
@@ -323,7 +322,7 @@
 /* Special print used by PPCDBG() macro */
 void udbg_ppcdbg(unsigned long debug_flags, const char *fmt, ...)
 {
-	unsigned long active_debugs = debug_flags & naca->debug_switch;
+	unsigned long active_debugs = debug_flags & ppc64_debug_switch;
 
 	if (active_debugs) {
 		va_list ap;
@@ -357,5 +356,5 @@
 
 unsigned long udbg_ifdebug(unsigned long flags)
 {
-	return (flags & naca->debug_switch);
+	return (flags & ppc64_debug_switch);
 }
diff -ruN linus-bk-naca.6/arch/ppc64/xmon/xmon.c linus-bk-naca.7/arch/ppc64/xmon/xmon.c
--- linus-bk-naca.6/arch/ppc64/xmon/xmon.c	2004-11-26 12:08:51.000000000 +1100
+++ linus-bk-naca.7/arch/ppc64/xmon/xmon.c	2004-12-11 02:33:00.000000000 +1100
@@ -26,7 +26,6 @@
 #include <asm/pgtable.h>
 #include <asm/mmu.h>
 #include <asm/mmu_context.h>
-#include <asm/naca.h>
 #include <asm/paca.h>
 #include <asm/ppcdebug.h>
 #include <asm/cputable.h>
@@ -2360,9 +2359,9 @@
 	if (cmd == '\n') {
 		/* show current state */
 		unsigned long i;
-		printf("naca->debug_switch = 0x%lx\n", naca->debug_switch);
+		printf("ppc64_debug_switch = 0x%lx\n", ppc64_debug_switch);
 		for (i = 0; i < PPCDBG_NUM_FLAGS ;i++) {
-			on = PPCDBG_BITVAL(i) & naca->debug_switch;
+			on = PPCDBG_BITVAL(i) & ppc64_debug_switch;
 			printf("%02x %s %12s   ", i, on ? "on " : "off",  trace_names[i] ? trace_names[i] : "");
 			if (((i+1) % 3) == 0)
 				printf("\n");
@@ -2376,7 +2375,7 @@
 			on = (cmd == '+');
 			cmd = inchar();
 			if (cmd == ' ' || cmd == '\n') {  /* Turn on or off based on + or - */
-				naca->debug_switch = on ? PPCDBG_ALL:PPCDBG_NONE;
+				ppc64_debug_switch = on ? PPCDBG_ALL:PPCDBG_NONE;
 				printf("Setting all values to %s...\n", on ? "on" : "off");
 				if (cmd == '\n') return;
 				else cmd = skipbl(); 
@@ -2391,10 +2390,10 @@
 			return;
 		}
 		if (on) {
-			naca->debug_switch |= PPCDBG_BITVAL(val);
+			ppc64_debug_switch |= PPCDBG_BITVAL(val);
 			printf("enable debug %x %s\n", val, trace_names[val] ? trace_names[val] : "");
 		} else {
-			naca->debug_switch &= ~PPCDBG_BITVAL(val);
+			ppc64_debug_switch &= ~PPCDBG_BITVAL(val);
 			printf("disable debug %x %s\n", val, trace_names[val] ? trace_names[val] : "");
 		}
 		cmd = skipbl();
diff -ruN linus-bk-naca.6/include/asm-ppc64/naca.h linus-bk-naca.7/include/asm-ppc64/naca.h
--- linus-bk-naca.6/include/asm-ppc64/naca.h	2004-12-11 00:03:55.000000000 +1100
+++ linus-bk-naca.7/include/asm-ppc64/naca.h	2004-12-11 02:41:18.000000000 +1100
@@ -19,9 +19,6 @@
 	void *xItVpdAreas;              /* VPD Data                  0x00 */
 	void *xRamDisk;                 /* iSeries ramdisk           0x08 */
 	u64   xRamDiskSize;		/* In pages                  0x10 */
-	u64 debug_switch;		/* Debug print control       0x20 */
-	u64 banner;                     /* Ptr to banner string      0x28 */
-	u64 log;                        /* Ptr to log buffer         0x30 */
 };
 
 extern struct naca_struct *naca;
diff -ruN linus-bk-naca.6/include/asm-ppc64/ppcdebug.h linus-bk-naca.7/include/asm-ppc64/ppcdebug.h
--- linus-bk-naca.6/include/asm-ppc64/ppcdebug.h	2004-02-16 08:19:48.000000000 +1100
+++ linus-bk-naca.7/include/asm-ppc64/ppcdebug.h	2004-12-13 12:05:25.000000000 +1100
@@ -16,13 +16,14 @@
  ********************************************************************/
 
 #include <linux/config.h>
+#include <linux/types.h>
 #include <asm/udbg.h>
 #include <stdarg.h>
 
 #define PPCDBG_BITVAL(X)     ((1UL)<<((unsigned long)(X)))
 
 /* Defined below are the bit positions of various debug flags in the
- * debug_switch variable (defined in naca.h).
+ * ppc64_debug_switch variable.
  * -- When adding new values, please enter them into trace names below -- 
  *
  * Values 62 & 63 can be used to stress the hardware page table management
@@ -64,6 +65,8 @@
 
 #define PPCDBG_NUM_FLAGS     64
 
+extern u64 ppc64_debug_switch;
+
 #ifdef WANT_PPCDBG_TAB
 /* A table of debug switch names to allow name lookup in xmon 
  * (and whoever else wants it.

From sfr at canb.auug.org.au  Tue Jan  4 15:34:45 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:34:45 +1100
Subject: [PATCH 8/11] PPC64: remove the naca from all but iSeries
In-Reply-To: <20050104153102.67284491.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
	<20050104152340.67219ccf.sfr@canb.auug.org.au>
	<20050104152705.6030abc5.sfr@canb.auug.org.au>
	<20050104153102.67284491.sfr@canb.auug.org.au>
Message-ID: <20050104153445.3777e689.sfr@canb.auug.org.au>

Hi Andrew,

This patch finally removes the naca from all platforms except legacy
iSeries, and in the process makes it a structure instead of a pointer.

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.7/arch/ppc64/kernel/LparData.c linus-bk-naca.8/arch/ppc64/kernel/LparData.c
--- linus-bk-naca.7/arch/ppc64/kernel/LparData.c	2004-10-26 16:06:41.000000000 +1000
+++ linus-bk-naca.8/arch/ppc64/kernel/LparData.c	2004-12-11 02:49:48.000000000 +1100
@@ -44,7 +44,7 @@
 	0xc8a5d9c4,	/* desc = "HvRD" ebcdic */
 	sizeof(struct HvReleaseData),
 	offsetof(struct naca_struct, xItVpdAreas),
-	(struct naca_struct *)(NACA_VIRT_ADDR), /* 64-bit Naca address */
+	&naca,		/* 64-bit Naca address */
 	0x6000,		/* offset of LparMap within loadarea (see head.S) */
 	0,
 	1,		/* tags inactive       */
diff -ruN linus-bk-naca.7/arch/ppc64/kernel/head.S linus-bk-naca.8/arch/ppc64/kernel/head.S
--- linus-bk-naca.7/arch/ppc64/kernel/head.S	2004-12-10 18:40:24.000000000 +1100
+++ linus-bk-naca.8/arch/ppc64/kernel/head.S	2004-12-11 02:56:12.000000000 +1100
@@ -512,17 +512,15 @@
 	 */
 	. = NACA_PHYS_ADDR
 	.globl __end_interrupts
-	.globl __start_naca
 __end_interrupts:
-__start_naca:
 #ifdef CONFIG_PPC_ISERIES
+	.globl naca
+naca:
 	.llong itVpdAreas
 #endif
 
 	. = SYSTEMCFG_PHYS_ADDR
-	.globl __end_naca
 	.globl __start_systemcfg
-__end_naca:
 __start_systemcfg:
 	. = (SYSTEMCFG_PHYS_ADDR + PAGE_SIZE)
 	.globl __end_systemcfg
@@ -1270,10 +1268,6 @@
 	SET_REG_TO_CONST(r4, SYSTEMCFG_VIRT_ADDR)
 	std	r4,0(r9)		/* set the systemcfg pointer */
 
-	LOADADDR(r9,naca)
-	SET_REG_TO_CONST(r4, NACA_VIRT_ADDR)
-	std	r4,0(r9)		/* set the naca pointer */
-
 	bl	.iSeries_early_setup
 
 	/* relocation is on at this point */
@@ -1873,12 +1867,6 @@
 	li	r27,SYSTEMCFG_PHYS_ADDR
 	std	r27,0(r6)	 	/* set the value of systemcfg	*/
 
-	/* setup the naca pointer which is needed by *tab_initialize	*/
-	LOADADDR(r6,naca)
-	sub	r6,r6,r26		/* addr of the variable naca	*/
-	li	r27,NACA_PHYS_ADDR
-	std	r27,0(r6)	 	/* set the value of naca	*/
-
 #ifdef CONFIG_HMT
 	/* Start up the second thread on cpu 0 */
 	mfspr	r3,PVR
@@ -2015,11 +2003,6 @@
 	SET_REG_TO_CONST(r8, SYSTEMCFG_VIRT_ADDR)
 	std	r8,0(r9)
 
-	/* setup the naca pointer */
-	LOADADDR(r9,naca)
-	SET_REG_TO_CONST(r8, NACA_VIRT_ADDR)
-	std	r8,0(r9)		/* set the value of the naca ptr */
-
 	LOADADDR(r26, boot_cpuid)
 	lwz	r26,0(r26)
 
diff -ruN linus-bk-naca.7/arch/ppc64/kernel/iSeries_setup.c linus-bk-naca.8/arch/ppc64/kernel/iSeries_setup.c
--- linus-bk-naca.7/arch/ppc64/kernel/iSeries_setup.c	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-naca.8/arch/ppc64/kernel/iSeries_setup.c	2004-12-11 02:51:17.000000000 +1100
@@ -314,13 +314,13 @@
 	 * If the init RAM disk has been configured and there is
 	 * a non-zero starting address for it, set it up
 	 */
-	if (naca->xRamDisk) {
-		initrd_start = (unsigned long)__va(naca->xRamDisk);
-		initrd_end = initrd_start + naca->xRamDiskSize * PAGE_SIZE;
+	if (naca.xRamDisk) {
+		initrd_start = (unsigned long)__va(naca.xRamDisk);
+		initrd_end = initrd_start + naca.xRamDiskSize * PAGE_SIZE;
 		initrd_below_start_ok = 1;	// ramdisk in kernel space
 		ROOT_DEV = Root_RAM0;
-		if (((rd_size * 1024) / PAGE_SIZE) < naca->xRamDiskSize)
-			rd_size = (naca->xRamDiskSize * PAGE_SIZE) / 1024;
+		if (((rd_size * 1024) / PAGE_SIZE) < naca.xRamDiskSize)
+			rd_size = (naca.xRamDiskSize * PAGE_SIZE) / 1024;
 	} else
 #endif /* CONFIG_BLK_DEV_INITRD */
 	{
@@ -813,9 +813,9 @@
 	 * Change klimit to take into account any ram disk
 	 * that may be included
 	 */
-	if (naca->xRamDisk)
-		klimit = KERNELBASE + (u64)naca->xRamDisk +
-			(naca->xRamDiskSize * PAGE_SIZE);
+	if (naca.xRamDisk)
+		klimit = KERNELBASE + (u64)naca.xRamDisk +
+			(naca.xRamDiskSize * PAGE_SIZE);
 	else {
 		/*
 		 * No ram disk was included - check and see if there
diff -ruN linus-bk-naca.7/arch/ppc64/kernel/pacaData.c linus-bk-naca.8/arch/ppc64/kernel/pacaData.c
--- linus-bk-naca.7/arch/ppc64/kernel/pacaData.c	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-naca.8/arch/ppc64/kernel/pacaData.c	2004-12-11 02:50:23.000000000 +1100
@@ -18,11 +18,8 @@
 
 #include <asm/iSeries/ItLpPaca.h>
 #include <asm/iSeries/ItLpQueue.h>
-#include <asm/naca.h>
 #include <asm/paca.h>
 
-struct naca_struct *naca;
-EXPORT_SYMBOL(naca);
 struct systemcfg *systemcfg;
 EXPORT_SYMBOL(systemcfg);
 
diff -ruN linus-bk-naca.7/include/asm-ppc64/iSeries/HvReleaseData.h linus-bk-naca.8/include/asm-ppc64/iSeries/HvReleaseData.h
--- linus-bk-naca.7/include/asm-ppc64/iSeries/HvReleaseData.h	2004-01-20 08:20:26.000000000 +1100
+++ linus-bk-naca.8/include/asm-ppc64/iSeries/HvReleaseData.h	2004-12-11 02:52:05.000000000 +1100
@@ -26,6 +26,7 @@
 //   address of the OS's NACA).
 //
 #include <asm/types.h>
+#include <asm/naca.h>
 
 //=============================================================================
 //
diff -ruN linus-bk-naca.7/include/asm-ppc64/naca.h linus-bk-naca.8/include/asm-ppc64/naca.h
--- linus-bk-naca.7/include/asm-ppc64/naca.h	2004-12-11 02:41:18.000000000 +1100
+++ linus-bk-naca.8/include/asm-ppc64/naca.h	2004-12-11 02:54:02.000000000 +1100
@@ -21,12 +21,11 @@
 	u64   xRamDiskSize;		/* In pages                  0x10 */
 };
 
-extern struct naca_struct *naca;
+extern struct naca_struct naca;
 
 #endif /* __ASSEMBLY__ */
 
 #define NACA_PAGE      0x4
 #define NACA_PHYS_ADDR (NACA_PAGE<<PAGE_SHIFT)
-#define NACA_VIRT_ADDR (KERNELBASE+NACA_PHYS_ADDR)
 
 #endif /* _NACA_H */

From sfr at canb.auug.org.au  Tue Jan  4 15:37:40 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:37:40 +1100
Subject: [PATCH 9/11] PPC64: use xPMCRegsInUse
In-Reply-To: <20050104153445.3777e689.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
	<20050104152340.67219ccf.sfr@canb.auug.org.au>
	<20050104152705.6030abc5.sfr@canb.auug.org.au>
	<20050104153102.67284491.sfr@canb.auug.org.au>
	<20050104153445.3777e689.sfr@canb.auug.org.au>
Message-ID: <20050104153740.56622b4f.sfr@canb.auug.org.au>

Hi Andrew,

This fixes an awful piece of code that poked byte offset 0xBB of the
lppaca directly when it could simply have referenced the xPMCRegsInUse
field of the lppaca structure.

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.8/arch/ppc64/kernel/sysfs.c linus-bk-naca.9/arch/ppc64/kernel/sysfs.c
--- linus-bk-naca.8/arch/ppc64/kernel/sysfs.c	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-naca.9/arch/ppc64/kernel/sysfs.c	2004-12-13 14:49:37.000000000 +1100
@@ -14,6 +14,8 @@
 #include <asm/hvcall.h>
 #include <asm/prom.h>
 #include <asm/systemcfg.h>
+#include <asm/paca.h>
+#include <asm/iSeries/ItLpPaca.h>
 
 
 /* SMT stuff */
@@ -154,10 +156,8 @@
 
 #ifdef CONFIG_PPC_PSERIES
 	/* instruct hypervisor to maintain PMCs */
-	if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) {
-		char *ptr = (char *)&paca[smp_processor_id()].lppaca;
-		ptr[0xBB] = 1;
-	}
+	if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR)
+		get_paca()->lppaca.xPMCRegsInUse = 1;
 
 	/*
 	 * On SMT machines we have to set the run latch in the ctrl register

From sfr at canb.auug.org.au  Tue Jan  4 15:40:25 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:40:25 +1100
Subject: [PATCH 10/11] PPC64: move the lppaca defining header file
In-Reply-To: <20050104153740.56622b4f.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
	<20050104152340.67219ccf.sfr@canb.auug.org.au>
	<20050104152705.6030abc5.sfr@canb.auug.org.au>
	<20050104153102.67284491.sfr@canb.auug.org.au>
	<20050104153445.3777e689.sfr@canb.auug.org.au>
	<20050104153740.56622b4f.sfr@canb.auug.org.au>
Message-ID: <20050104154025.63a1b9fb.sfr@canb.auug.org.au>

Hi Andrew,

This patch simply renames asm/iSeries/ItLpPaca.h to asm/lppaca.h, since
the lppaca structure is no longer specific to legacy iSeries.

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.9/arch/ppc64/kernel/LparData.c linus-bk-naca.10/arch/ppc64/kernel/LparData.c
--- linus-bk-naca.9/arch/ppc64/kernel/LparData.c	2004-12-11 02:49:48.000000000 +1100
+++ linus-bk-naca.10/arch/ppc64/kernel/LparData.c	2004-12-13 15:01:55.000000000 +1100
@@ -16,7 +16,7 @@
 #include <asm/naca.h>
 #include <asm/abs_addr.h>
 #include <asm/iSeries/ItLpNaca.h>
-#include <asm/iSeries/ItLpPaca.h>
+#include <asm/lppaca.h>
 #include <asm/iSeries/ItLpRegSave.h>
 #include <asm/paca.h>
 #include <asm/iSeries/HvReleaseData.h>
diff -ruN linus-bk-naca.9/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c
--- linus-bk-naca.9/arch/ppc64/kernel/asm-offsets.c	2004-12-10 17:27:14.000000000 +1100
+++ linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c	2004-12-13 15:02:03.000000000 +1100
@@ -29,7 +29,7 @@
 #include <asm/processor.h>
 
 #include <asm/paca.h>
-#include <asm/iSeries/ItLpPaca.h>
+#include <asm/lppaca.h>
 #include <asm/iSeries/ItLpQueue.h>
 #include <asm/iSeries/HvLpEvent.h>
 #include <asm/rtas.h>
diff -ruN linus-bk-naca.9/arch/ppc64/kernel/iSeries_proc.c linus-bk-naca.10/arch/ppc64/kernel/iSeries_proc.c
--- linus-bk-naca.9/arch/ppc64/kernel/iSeries_proc.c	2004-12-10 16:26:54.000000000 +1100
+++ linus-bk-naca.10/arch/ppc64/kernel/iSeries_proc.c	2004-12-13 15:02:14.000000000 +1100
@@ -24,7 +24,7 @@
 #include <asm/paca.h>
 #include <asm/processor.h>
 #include <asm/time.h>
-#include <asm/iSeries/ItLpPaca.h>
+#include <asm/lppaca.h>
 #include <asm/iSeries/ItLpQueue.h>
 #include <asm/iSeries/HvCallXm.h>
 #include <asm/iSeries/IoHriMainStore.h>
diff -ruN linus-bk-naca.9/arch/ppc64/kernel/lparcfg.c linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c
--- linus-bk-naca.9/arch/ppc64/kernel/lparcfg.c	2004-11-20 12:05:26.000000000 +1100
+++ linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c	2004-12-13 15:02:29.000000000 +1100
@@ -27,7 +27,7 @@
 #include <linux/seq_file.h>
 #include <asm/uaccess.h>
 #include <asm/iSeries/HvLpConfig.h>
-#include <asm/iSeries/ItLpPaca.h>
+#include <asm/lppaca.h>
 #include <asm/iSeries/LparData.h>
 #include <asm/hvcall.h>
 #include <asm/cputable.h>
diff -ruN linus-bk-naca.9/arch/ppc64/kernel/pacaData.c linus-bk-naca.10/arch/ppc64/kernel/pacaData.c
--- linus-bk-naca.9/arch/ppc64/kernel/pacaData.c	2004-12-11 02:50:23.000000000 +1100
+++ linus-bk-naca.10/arch/ppc64/kernel/pacaData.c	2004-12-13 15:02:07.000000000 +1100
@@ -16,7 +16,7 @@
 #include <asm/ptrace.h>
 #include <asm/page.h>
 
-#include <asm/iSeries/ItLpPaca.h>
+#include <asm/lppaca.h>
 #include <asm/iSeries/ItLpQueue.h>
 #include <asm/paca.h>
 
diff -ruN linus-bk-naca.9/arch/ppc64/kernel/sysfs.c linus-bk-naca.10/arch/ppc64/kernel/sysfs.c
--- linus-bk-naca.9/arch/ppc64/kernel/sysfs.c	2004-12-13 14:49:37.000000000 +1100
+++ linus-bk-naca.10/arch/ppc64/kernel/sysfs.c	2004-12-13 15:01:19.000000000 +1100
@@ -15,7 +15,7 @@
 #include <asm/prom.h>
 #include <asm/systemcfg.h>
 #include <asm/paca.h>
-#include <asm/iSeries/ItLpPaca.h>
+#include <asm/lppaca.h>
 
 
 /* SMT stuff */
diff -ruN linus-bk-naca.9/include/asm-ppc64/iSeries/ItLpPaca.h linus-bk-naca.10/include/asm-ppc64/iSeries/ItLpPaca.h
--- linus-bk-naca.9/include/asm-ppc64/iSeries/ItLpPaca.h	2004-01-20 08:20:26.000000000 +1100
+++ linus-bk-naca.10/include/asm-ppc64/iSeries/ItLpPaca.h	1970-01-01 10:00:00.000000000 +1000
@@ -1,134 +0,0 @@
-/*
- * ItLpPaca.h
- * Copyright (C) 2001  Mike Corrigan IBM Corporation
- * 
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- * 
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- * 
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
- */
-#ifndef _ITLPPACA_H
-#define _ITLPPACA_H
-
-//=============================================================================
-//                                   
-//	This control block contains the data that is shared between the 
-//	hypervisor (PLIC) and the OS.
-//    
-//
-//----------------------------------------------------------------------------
-#include <asm/types.h>
-
-struct ItLpPaca
-{
-//=============================================================================
-// CACHE_LINE_1 0x0000 - 0x007F Contains read-only data
-// NOTE: The xDynXyz fields are fields that will be dynamically changed by 
-// PLIC when preparing to bring a processor online or when dispatching a 
-// virtual processor!
-//=============================================================================
-	u32	xDesc;			// Eye catcher 0xD397D781	x00-x03
-	u16	xSize;			// Size of this struct		x04-x05
-	u16	xRsvd1_0;		// Reserved			x06-x07
-	u16	xRsvd1_1:14;		// Reserved			x08-x09
-	u8	xSharedProc:1;		// Shared processor indicator	...
-	u8	xSecondaryThread:1;	// Secondary thread indicator	...
-	volatile u8 xDynProcStatus:8;	// Dynamic Status of this proc	x0A-x0A
-	u8	xSecondaryThreadCnt;	// Secondary thread count	x0B-x0B
-	volatile u16 xDynHvPhysicalProcIndex;// Dynamic HV Physical Proc Index0C-x0D
-	volatile u16 xDynHvLogicalProcIndex;// Dynamic HV Logical Proc Indexx0E-x0F
-	u32	xDecrVal;   		// Value for Decr programming 	x10-x13
-	u32	xPMCVal;       		// Value for PMC regs         	x14-x17
-	volatile u32 xDynHwNodeId;	// Dynamic Hardware Node id	x18-x1B
-	volatile u32 xDynHwProcId;	// Dynamic Hardware Proc Id	x1C-x1F
-	volatile u32 xDynPIR;		// Dynamic ProcIdReg value	x20-x23
-	u32	xDseiData;           	// DSEI data                  	x24-x27
-	u64	xSPRG3;               	// SPRG3 value                	x28-x2F
-	u8	xRsvd1_3[80];		// Reserved			x30-x7F
-   
-//=============================================================================
-// CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data
-//=============================================================================
-	// This Dword contains a byte for each type of interrupt that can occur.  
-	// The IPI is a count while the others are just a binary 1 or 0.
-	union {
-		u64	xAnyInt;
-		struct {
-			u16	xRsvd;		// Reserved - cleared by #mpasmbl
-			u8	xXirrInt;	// Indicates xXirrValue is valid or Immed IO
-			u8	xIpiCnt;	// IPI Count
-			u8	xDecrInt;	// DECR interrupt occurred
-			u8	xPdcInt;	// PDC interrupt occurred
-			u8	xQuantumInt;	// Interrupt quantum reached
-			u8	xOldPlicDeferredExtInt;	// Old PLIC has a deferred XIRR pending
-		} xFields;
-	} xIntDword;
-
-	// Whenever any fields in this Dword are set then PLIC will defer the 
-	// processing of external interrupts.  Note that PLIC will store the 
-	// XIRR directly into the xXirrValue field so that another XIRR will 
-	// not be presented until this one clears.  The layout of the low 
-	// 4-bytes of this Dword is upto SLIC - PLIC just checks whether the 
-	// entire Dword is zero or not.  A non-zero value in the low order 
-	// 2-bytes will result in SLIC being granted the highest thread 
-	// priority upon return.  A 0 will return to SLIC as medium priority.
-	u64	xPlicDeferIntsArea;	// Entire Dword
-
-	// Used to pass the real SRR0/1 from PLIC to SLIC as well as to 
-	// pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid.
-	u64     xSavedSrr0;             // Saved SRR0                   x10-x17
-	u64     xSavedSrr1;             // Saved SRR1                   x18-x1F
-
-	// Used to pass parms from the OS to PLIC for SetAsrAndRfid
-	u64     xSavedGpr3;             // Saved GPR3                   x20-x27
-	u64     xSavedGpr4;             // Saved GPR4                   x28-x2F
-	u64     xSavedGpr5;             // Saved GPR5                   x30-x37
-
-	u8	xRsvd2_1;		// Reserved			x38-x38
-	u8      xCpuCtlsTaskAttributes; // Task attributes for cpuctls  x39-x39
-	u8      xFPRegsInUse;           // FP regs in use               x3A-x3A
-	u8      xPMCRegsInUse;          // PMC regs in use              x3B-x3B
-	volatile u32  xSavedDecr;	// Saved Decr Value             x3C-x3F
-	volatile u64  xEmulatedTimeBase;// Emulated TB for this thread  x40-x47
-	volatile u64  xCurPLICLatency;	// Unaccounted PLIC latency     x48-x4F
-	u64     xTotPLICLatency;        // Accumulated PLIC latency     x50-x57   
-	u64     xWaitStateCycles;       // Wait cycles for this proc    x58-x5F
-	u64     xEndOfQuantum;          // TB at end of quantum         x60-x67
-	u64     xPDCSavedSPRG1;         // Saved SPRG1 for PMC int      x68-x6F
-	u64     xPDCSavedSRR0;          // Saved SRR0 for PMC int       x70-x77
-	volatile u32 xVirtualDecr;	// Virtual DECR for shared procsx78-x7B
-	u16     xSLBCount;              // # of SLBs to maintain        x7C-x7D
-	u8      xIdle;                  // Indicate OS is idle          x7E
-	u8      xRsvd2_2;               // Reserved                     x7F
-
-
-//=============================================================================
-// CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors
-//=============================================================================
-	// This is the xYieldCount.  An "odd" value (low bit on) means that 
-	// the processor is yielded (either because of an OS yield or a PLIC 
-	// preempt).  An even value implies that the processor is currently 
-	// executing.
-	// NOTE: This value will ALWAYS be zero for dedicated processors and 
-	// will NEVER be zero for shared processors (ie, initialized to a 1).
-	volatile u32 xYieldCount;	// PLIC increments each dispatchx00-x03
-	u8	xRsvd3_0[124];		// Reserved                     x04-x7F         
-
-//=============================================================================
-// CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data
-//=============================================================================
-	u8      xPmcSaveArea[256];	// PMC interrupt Area           x00-xFF
-
-
-};
-
-#endif /* _ITLPPACA_H */
diff -ruN linus-bk-naca.9/include/asm-ppc64/iSeries/LparData.h linus-bk-naca.10/include/asm-ppc64/iSeries/LparData.h
--- linus-bk-naca.9/include/asm-ppc64/iSeries/LparData.h	2004-12-10 16:26:54.000000000 +1100
+++ linus-bk-naca.10/include/asm-ppc64/iSeries/LparData.h	2004-12-13 15:03:03.000000000 +1100
@@ -25,7 +25,6 @@
 #include <asm/abs_addr.h>
 
 #include <asm/iSeries/ItLpNaca.h>
-#include <asm/iSeries/ItLpPaca.h>
 #include <asm/iSeries/ItLpRegSave.h>
 #include <asm/iSeries/HvReleaseData.h>
 #include <asm/iSeries/LparMap.h>
diff -ruN linus-bk-naca.9/include/asm-ppc64/lppaca.h linus-bk-naca.10/include/asm-ppc64/lppaca.h
--- linus-bk-naca.9/include/asm-ppc64/lppaca.h	1970-01-01 10:00:00.000000000 +1000
+++ linus-bk-naca.10/include/asm-ppc64/lppaca.h	2004-12-13 15:04:43.000000000 +1100
@@ -0,0 +1,134 @@
+/*
+ * lppaca.h
+ * Copyright (C) 2001  Mike Corrigan IBM Corporation
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ * 
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ * 
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+#ifndef _ASM_LPPACA_H
+#define _ASM_LPPACA_H
+
+//=============================================================================
+//                                   
+//	This control block contains the data that is shared between the 
+//	hypervisor (PLIC) and the OS.
+//    
+//
+//----------------------------------------------------------------------------
+#include <asm/types.h>
+
+struct ItLpPaca
+{
+//=============================================================================
+// CACHE_LINE_1 0x0000 - 0x007F Contains read-only data
+// NOTE: The xDynXyz fields are fields that will be dynamically changed by 
+// PLIC when preparing to bring a processor online or when dispatching a 
+// virtual processor!
+//=============================================================================
+	u32	xDesc;			// Eye catcher 0xD397D781	x00-x03
+	u16	xSize;			// Size of this struct		x04-x05
+	u16	xRsvd1_0;		// Reserved			x06-x07
+	u16	xRsvd1_1:14;		// Reserved			x08-x09
+	u8	xSharedProc:1;		// Shared processor indicator	...
+	u8	xSecondaryThread:1;	// Secondary thread indicator	...
+	volatile u8 xDynProcStatus:8;	// Dynamic Status of this proc	x0A-x0A
+	u8	xSecondaryThreadCnt;	// Secondary thread count	x0B-x0B
+	volatile u16 xDynHvPhysicalProcIndex;// Dynamic HV Physical Proc Index0C-x0D
+	volatile u16 xDynHvLogicalProcIndex;// Dynamic HV Logical Proc Indexx0E-x0F
+	u32	xDecrVal;   		// Value for Decr programming 	x10-x13
+	u32	xPMCVal;       		// Value for PMC regs         	x14-x17
+	volatile u32 xDynHwNodeId;	// Dynamic Hardware Node id	x18-x1B
+	volatile u32 xDynHwProcId;	// Dynamic Hardware Proc Id	x1C-x1F
+	volatile u32 xDynPIR;		// Dynamic ProcIdReg value	x20-x23
+	u32	xDseiData;           	// DSEI data                  	x24-x27
+	u64	xSPRG3;               	// SPRG3 value                	x28-x2F
+	u8	xRsvd1_3[80];		// Reserved			x30-x7F
+   
+//=============================================================================
+// CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data
+//=============================================================================
+	// This Dword contains a byte for each type of interrupt that can occur.  
+	// The IPI is a count while the others are just a binary 1 or 0.
+	union {
+		u64	xAnyInt;
+		struct {
+			u16	xRsvd;		// Reserved - cleared by #mpasmbl
+			u8	xXirrInt;	// Indicates xXirrValue is valid or Immed IO
+			u8	xIpiCnt;	// IPI Count
+			u8	xDecrInt;	// DECR interrupt occurred
+			u8	xPdcInt;	// PDC interrupt occurred
+			u8	xQuantumInt;	// Interrupt quantum reached
+			u8	xOldPlicDeferredExtInt;	// Old PLIC has a deferred XIRR pending
+		} xFields;
+	} xIntDword;
+
+	// Whenever any fields in this Dword are set then PLIC will defer the 
+	// processing of external interrupts.  Note that PLIC will store the 
+	// XIRR directly into the xXirrValue field so that another XIRR will 
+	// not be presented until this one clears.  The layout of the low 
+	// 4-bytes of this Dword is upto SLIC - PLIC just checks whether the 
+	// entire Dword is zero or not.  A non-zero value in the low order 
+	// 2-bytes will result in SLIC being granted the highest thread 
+	// priority upon return.  A 0 will return to SLIC as medium priority.
+	u64	xPlicDeferIntsArea;	// Entire Dword
+
+	// Used to pass the real SRR0/1 from PLIC to SLIC as well as to 
+	// pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid.
+	u64     xSavedSrr0;             // Saved SRR0                   x10-x17
+	u64     xSavedSrr1;             // Saved SRR1                   x18-x1F
+
+	// Used to pass parms from the OS to PLIC for SetAsrAndRfid
+	u64     xSavedGpr3;             // Saved GPR3                   x20-x27
+	u64     xSavedGpr4;             // Saved GPR4                   x28-x2F
+	u64     xSavedGpr5;             // Saved GPR5                   x30-x37
+
+	u8	xRsvd2_1;		// Reserved			x38-x38
+	u8      xCpuCtlsTaskAttributes; // Task attributes for cpuctls  x39-x39
+	u8      xFPRegsInUse;           // FP regs in use               x3A-x3A
+	u8      xPMCRegsInUse;          // PMC regs in use              x3B-x3B
+	volatile u32  xSavedDecr;	// Saved Decr Value             x3C-x3F
+	volatile u64  xEmulatedTimeBase;// Emulated TB for this thread  x40-x47
+	volatile u64  xCurPLICLatency;	// Unaccounted PLIC latency     x48-x4F
+	u64     xTotPLICLatency;        // Accumulated PLIC latency     x50-x57   
+	u64     xWaitStateCycles;       // Wait cycles for this proc    x58-x5F
+	u64     xEndOfQuantum;          // TB at end of quantum         x60-x67
+	u64     xPDCSavedSPRG1;         // Saved SPRG1 for PMC int      x68-x6F
+	u64     xPDCSavedSRR0;          // Saved SRR0 for PMC int       x70-x77
+	volatile u32 xVirtualDecr;	// Virtual DECR for shared procsx78-x7B
+	u16     xSLBCount;              // # of SLBs to maintain        x7C-x7D
+	u8      xIdle;                  // Indicate OS is idle          x7E
+	u8      xRsvd2_2;               // Reserved                     x7F
+
+
+//=============================================================================
+// CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors
+//=============================================================================
+	// This is the xYieldCount.  An "odd" value (low bit on) means that 
+	// the processor is yielded (either because of an OS yield or a PLIC 
+	// preempt).  An even value implies that the processor is currently 
+	// executing.
+	// NOTE: This value will ALWAYS be zero for dedicated processors and 
+	// will NEVER be zero for shared processors (ie, initialized to a 1).
+	volatile u32 xYieldCount;	// PLIC increments each dispatchx00-x03
+	u8	xRsvd3_0[124];		// Reserved                     x04-x7F         
+
+//=============================================================================
+// CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data
+//=============================================================================
+	u8      xPmcSaveArea[256];	// PMC interrupt Area           x00-xFF
+
+
+};
+
+#endif /* _ASM_LPPACA_H */
diff -ruN linus-bk-naca.9/include/asm-ppc64/paca.h linus-bk-naca.10/include/asm-ppc64/paca.h
--- linus-bk-naca.9/include/asm-ppc64/paca.h	2004-12-13 18:05:08.000000000 +1100
+++ linus-bk-naca.10/include/asm-ppc64/paca.h	2004-12-31 15:48:57.000000000 +1100
@@ -18,7 +18,7 @@
 
 #include	<linux/config.h>
 #include	<asm/types.h>
-#include	<asm/iSeries/ItLpPaca.h>
+#include	<asm/lppaca.h>
 #include	<asm/iSeries/ItLpRegSave.h>
 #include	<asm/mmu.h>
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/75de8b11/attachment.pgp 
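The xIntDword union in the new lppaca.h places one byte per interrupt type on top of a single u64. Code can therefore ask "is anything pending?" with one 64-bit load of xAnyInt and only then look at individual bytes. The comment in the header also notes that the IPI byte is a count while the others are 0/1 flags. Below is a minimal sketch of the idea, built on a standalone copy of the union. The names int_dword_sketch and any_interrupt_pending() are hypothetical and do not appear in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone copy of the xIntDword union from lppaca.h. */
union int_dword_sketch {
	uint64_t xAnyInt;		/* all eight bytes seen at once */
	struct {
		uint16_t xRsvd;
		uint8_t  xXirrInt;
		uint8_t  xIpiCnt;	/* a count, not a 0/1 flag */
		uint8_t  xDecrInt;
		uint8_t  xPdcInt;
		uint8_t  xQuantumInt;
		uint8_t  xOldPlicDeferredExtInt;
	} xFields;
};

_Static_assert(sizeof(union int_dword_sketch) == 8, "must stay one doubleword");

/* Any non-zero byte, whichever interrupt it stands for, makes xAnyInt non-zero. */
static int any_interrupt_pending(const union int_dword_sketch *d)
{
	return d->xAnyInt != 0;
}
```

The check does not depend on byte order, because it only tests zero versus non-zero.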

From sfr at canb.auug.org.au  Tue Jan  4 15:43:19 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:43:19 +1100
Subject: [PATCH 11/11] PPC64: remove StudlyCaps from lppaca structure
In-Reply-To: <20050104154025.63a1b9fb.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
	<20050104152340.67219ccf.sfr@canb.auug.org.au>
	<20050104152705.6030abc5.sfr@canb.auug.org.au>
	<20050104153102.67284491.sfr@canb.auug.org.au>
	<20050104153445.3777e689.sfr@canb.auug.org.au>
	<20050104153740.56622b4f.sfr@canb.auug.org.au>
	<20050104154025.63a1b9fb.sfr@canb.auug.org.au>
Message-ID: <20050104154319.505b1197.sfr@canb.auug.org.au>

Hi Andrew,

This patch just renames all the fields (and the structure name) of the
lppaca structure to rid us of some more StudlyCaps.
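A pure rename must leave every field at the same offset. asm-offsets.c hands these offsets to assembly code, and the structure is shared with the hypervisor. One way to check this at compile time is sketched below. It uses a simplified mirror of the renamed struct lppaca: the name lppaca_sketch is hypothetical, the 14+1+1 bitfields at offset 0x08 are collapsed into one u16 so the result does not depend on compiler bitfield packing, and volatile is dropped. It is not the kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of struct lppaca after the rename. */
struct lppaca_sketch {
	/* Cache line 1: 0x000 - 0x07F, read-only data */
	uint32_t desc;				/* 0x00 */
	uint16_t size;				/* 0x04 */
	uint16_t reserved1;			/* 0x06 */
	uint16_t flag_bits;			/* 0x08: reserved2, shared_proc, secondary_thread */
	uint8_t  dyn_proc_status;		/* 0x0A */
	uint8_t  secondary_thread_count;	/* 0x0B */
	uint16_t dyn_hv_phys_proc_index;	/* 0x0C */
	uint16_t dyn_hv_log_proc_index;		/* 0x0E */
	uint32_t decr_val, pmc_val;		/* 0x10, 0x14 */
	uint32_t dyn_hw_node_id, dyn_hw_proc_id;	/* 0x18, 0x1C */
	uint32_t dyn_pir, dsei_data;		/* 0x20, 0x24 */
	uint64_t sprg3;				/* 0x28 */
	uint8_t  reserved3[80];			/* 0x30 - 0x7F */

	/* Cache line 2: 0x080 - 0x0FF, local read-write data */
	union {
		uint64_t any_int;
		struct {
			uint16_t reserved;
			uint8_t  xirr_int, ipi_cnt, decr_int;
			uint8_t  pdc_int, quantum_int, old_plic_deferred_ext_int;
		} fields;
	} int_dword;				/* 0x80 */
	uint64_t plic_defer_ints_area;		/* 0x88 */
	uint64_t saved_srr0, saved_srr1;	/* 0x90, 0x98 */
	uint64_t saved_gpr3, saved_gpr4, saved_gpr5;	/* 0xA0 - 0xB7 */
	uint8_t  reserved4, cpuctls_task_attrs;	/* 0xB8, 0xB9 */
	uint8_t  fpregs_in_use, pmcregs_in_use;	/* 0xBA, 0xBB */
	uint32_t saved_decr;			/* 0xBC */
	uint64_t emulated_time_base;		/* 0xC0 */
	uint64_t cur_plic_latency, tot_plic_latency;	/* 0xC8, 0xD0 */
	uint64_t wait_state_cycles, end_of_quantum;	/* 0xD8, 0xE0 */
	uint64_t pdc_saved_sprg1, pdc_saved_srr0;	/* 0xE8, 0xF0 */
	uint32_t virtual_decr;			/* 0xF8 */
	uint16_t slb_count;			/* 0xFC */
	uint8_t  idle, reserved5;		/* 0xFE, 0xFF */

	/* Cache line 3: 0x100 - 0x17F, shared with other processors */
	uint32_t yield_count;			/* 0x100 */
	uint8_t  reserved6[124];		/* 0x104 - 0x17F */

	/* Cache lines 4-5: 0x180 - 0x27F, PMC interrupt data */
	uint8_t  pmc_save_area[256];
};

/* The offsets exported by asm-offsets.c must not move with the rename. */
_Static_assert(offsetof(struct lppaca_sketch, int_dword.any_int) == 0x80, "LPPACAANYINT");
_Static_assert(offsetof(struct lppaca_sketch, int_dword.fields.decr_int) == 0x84, "LPPACADECRINT");
_Static_assert(offsetof(struct lppaca_sketch, saved_srr0) == 0x90, "LPPACASRR0");
_Static_assert(offsetof(struct lppaca_sketch, saved_srr1) == 0x98, "LPPACASRR1");
_Static_assert(sizeof(struct lppaca_sketch) == 0x280, "five 128-byte cache lines");
```

If a field moves, the static assertions fail the build. The expected offsets were worked out from the per-field comments in lppaca.h.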

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.11/arch/ppc64/kernel/asm-offsets.c
--- linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c	2004-12-13 15:02:03.000000000 +1100
+++ linus-bk-naca.11/arch/ppc64/kernel/asm-offsets.c	2004-12-31 15:51:16.000000000 +1100
@@ -102,10 +102,10 @@
         DEFINE(PACAEMERGSP, offsetof(struct paca_struct, emergency_sp));
 	DEFINE(PACALPPACA, offsetof(struct paca_struct, lppaca));
 	DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id));
-        DEFINE(LPPACASRR0, offsetof(struct ItLpPaca, xSavedSrr0));
-        DEFINE(LPPACASRR1, offsetof(struct ItLpPaca, xSavedSrr1));
-	DEFINE(LPPACAANYINT, offsetof(struct ItLpPaca, xIntDword.xAnyInt));
-	DEFINE(LPPACADECRINT, offsetof(struct ItLpPaca, xIntDword.xFields.xDecrInt));
+	DEFINE(LPPACASRR0, offsetof(struct lppaca, saved_srr0));
+	DEFINE(LPPACASRR1, offsetof(struct lppaca, saved_srr1));
+	DEFINE(LPPACAANYINT, offsetof(struct lppaca, int_dword.any_int));
+	DEFINE(LPPACADECRINT, offsetof(struct lppaca, int_dword.fields.decr_int));
 
 	/* RTAS */
 	DEFINE(RTASBASE, offsetof(struct rtas_t, base));
diff -ruN linus-bk-naca.10/arch/ppc64/kernel/iSeries_setup.c linus-bk-naca.11/arch/ppc64/kernel/iSeries_setup.c
--- linus-bk-naca.10/arch/ppc64/kernel/iSeries_setup.c	2004-12-11 02:51:17.000000000 +1100
+++ linus-bk-naca.11/arch/ppc64/kernel/iSeries_setup.c	2004-12-13 15:31:14.000000000 +1100
@@ -559,7 +559,7 @@
 static void __init setup_iSeries_cache_sizes(void)
 {
 	unsigned int i, n;
-	unsigned int procIx = get_paca()->lppaca.xDynHvPhysicalProcIndex;
+	unsigned int procIx = get_paca()->lppaca.dyn_hv_phys_proc_index;
 
 	systemcfg->icache_size =
 	ppc64_caches.isize = xIoHriProcessorVpd[procIx].xInstCacheSize * 1024;
@@ -656,7 +656,7 @@
 void __init iSeries_setup_arch(void)
 {
 	void *eventStack;
-	unsigned procIx = get_paca()->lppaca.xDynHvPhysicalProcIndex;
+	unsigned procIx = get_paca()->lppaca.dyn_hv_phys_proc_index;
 
 	/* Add an eye catcher and the systemcfg layout version number */
 	strcpy(systemcfg->eye_catcher, "SYSTEMCFG:PPC64");
diff -ruN linus-bk-naca.10/arch/ppc64/kernel/iSeries_smp.c linus-bk-naca.11/arch/ppc64/kernel/iSeries_smp.c
--- linus-bk-naca.10/arch/ppc64/kernel/iSeries_smp.c	2004-12-10 16:26:54.000000000 +1100
+++ linus-bk-naca.11/arch/ppc64/kernel/iSeries_smp.c	2004-12-13 15:29:16.000000000 +1100
@@ -90,7 +90,7 @@
 
 	np = 0;
         for (i=0; i < NR_CPUS; ++i) {
-                if (paca[i].lppaca.xDynProcStatus < 2) {
+                if (paca[i].lppaca.dyn_proc_status < 2) {
 			cpu_set(i, cpu_possible_map);
 			cpu_set(i, cpu_present_map);
 			cpu_set(i, cpu_sibling_map[i]);
@@ -106,7 +106,7 @@
 	unsigned np = 0;
 
 	for (i=0; i < NR_CPUS; ++i) {
-		if (paca[i].lppaca.xDynProcStatus < 2) {
+		if (paca[i].lppaca.dyn_proc_status < 2) {
 			/*paca[i].active = 1;*/
 			++np;
 		}
@@ -120,7 +120,7 @@
 	BUG_ON(nr < 0 || nr >= NR_CPUS);
 
 	/* Verify that our partition has a processor nr */
-	if (paca[nr].lppaca.xDynProcStatus >= 2)
+	if (paca[nr].lppaca.dyn_proc_status >= 2)
 		return;
 
 	/* The processor is currently spinning, waiting
diff -ruN linus-bk-naca.10/arch/ppc64/kernel/idle.c linus-bk-naca.11/arch/ppc64/kernel/idle.c
--- linus-bk-naca.10/arch/ppc64/kernel/idle.c	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-naca.11/arch/ppc64/kernel/idle.c	2004-12-13 16:06:18.000000000 +1100
@@ -67,7 +67,7 @@
 	 * The decrementer stops during the yield.  Force a fake decrementer
 	 * here and let the timer_interrupt code sort out the actual time.
 	 */
-	get_paca()->lppaca.xIntDword.xFields.xDecrInt = 1;
+	get_paca()->lppaca.int_dword.fields.decr_int = 1;
 	process_iSeries_events();
 }
 
@@ -86,7 +86,7 @@
 	lpaca = get_paca();
 
 	while (1) {
-		if (lpaca->lppaca.xSharedProc) {
+		if (lpaca->lppaca.shared_proc) {
 			if (ItLpQueue_isLpIntPending(lpaca->lpqueue_ptr))
 				process_iSeries_events();
 			if (!need_resched())
@@ -173,7 +173,7 @@
 		 * Indicate to the HV that we are idle. Now would be
 		 * a good time to find other work to dispatch.
 		 */
-		lpaca->lppaca.xIdle = 1;
+		lpaca->lppaca.idle = 1;
 
 		oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
 		if (!oldval) {
@@ -194,7 +194,7 @@
 
 				HMT_medium();
 
-				if (!(ppaca->lppaca.xIdle)) {
+				if (!(ppaca->lppaca.idle)) {
 					local_irq_disable();
 
 					/*
@@ -233,7 +233,7 @@
 		}
 
 		HMT_medium();
-		lpaca->lppaca.xIdle = 0;
+		lpaca->lppaca.idle = 0;
 		schedule();
 		if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING)
 			cpu_die();
@@ -251,7 +251,7 @@
 		 * Indicate to the HV that we are idle. Now would be
 		 * a good time to find other work to dispatch.
 		 */
-		lpaca->lppaca.xIdle = 1;
+		lpaca->lppaca.idle = 1;
 
 		while (!need_resched() && !cpu_is_offline(cpu)) {
 			local_irq_disable();
@@ -273,7 +273,7 @@
 		}
 
 		HMT_medium();
-		lpaca->lppaca.xIdle = 0;
+		lpaca->lppaca.idle = 0;
 		schedule();
 		if (cpu_is_offline(smp_processor_id()) &&
 		    system_state == SYSTEM_RUNNING)
@@ -352,7 +352,7 @@
 #ifdef CONFIG_PPC_PSERIES
 	if (systemcfg->platform & PLATFORM_PSERIES) {
 		if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) {
-			if (get_paca()->lppaca.xSharedProc) {
+			if (get_paca()->lppaca.shared_proc) {
 				printk(KERN_INFO "Using shared processor idle loop\n");
 				idle_loop = shared_idle;
 			} else {
diff -ruN linus-bk-naca.10/arch/ppc64/kernel/irq.c linus-bk-naca.11/arch/ppc64/kernel/irq.c
--- linus-bk-naca.10/arch/ppc64/kernel/irq.c	2004-12-31 14:53:21.000000000 +1100
+++ linus-bk-naca.11/arch/ppc64/kernel/irq.c	2004-12-13 15:43:22.000000000 +1100
@@ -259,8 +259,8 @@
 
 	lpaca = get_paca();
 #ifdef CONFIG_SMP
-	if (lpaca->lppaca.xIntDword.xFields.xIpiCnt) {
-		lpaca->lppaca.xIntDword.xFields.xIpiCnt = 0;
+	if (lpaca->lppaca.int_dword.fields.ipi_cnt) {
+		lpaca->lppaca.int_dword.fields.ipi_cnt = 0;
 		iSeries_smp_message_recv(regs);
 	}
 #endif /* CONFIG_SMP */
@@ -270,8 +270,8 @@
 
 	irq_exit();
 
-	if (lpaca->lppaca.xIntDword.xFields.xDecrInt) {
-		lpaca->lppaca.xIntDword.xFields.xDecrInt = 0;
+	if (lpaca->lppaca.int_dword.fields.decr_int) {
+		lpaca->lppaca.int_dword.fields.decr_int = 0;
 		/* Signal a fake decrementer interrupt */
 		timer_interrupt(regs);
 	}
diff -ruN linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c linus-bk-naca.11/arch/ppc64/kernel/lparcfg.c
--- linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c	2004-12-13 15:02:29.000000000 +1100
+++ linus-bk-naca.11/arch/ppc64/kernel/lparcfg.c	2004-12-13 16:00:00.000000000 +1100
@@ -72,7 +72,7 @@
 
 /*
  * For iSeries legacy systems, the PPA purr function is available from the
- * xEmulatedTimeBase field in the paca.
+ * emulated_time_base field in the paca.
  */
 static unsigned long get_purr(void)
 {
@@ -82,11 +82,11 @@
 
 	for_each_cpu(cpu) {
 		lpaca = paca + cpu;
-		sum_purr += lpaca->lppaca.xEmulatedTimeBase;
+		sum_purr += lpaca->lppaca.emulated_time_base;
 
 #ifdef PURR_DEBUG
 		printk(KERN_INFO "get_purr for cpu (%d) has value (%ld) \n",
-			cpu, lpaca->lppaca.xEmulatedTimeBase);
+			cpu, lpaca->lppaca.emulated_time_base);
 #endif
 	}
 	return sum_purr;
@@ -107,7 +107,7 @@
 
 	seq_printf(m, "%s %s \n", MODULE_NAME, MODULE_VERS);
 
-	shared = (int)(lpaca->lppaca_ptr->xSharedProc);
+	shared = (int)(lpaca->lppaca_ptr->shared_proc);
 	seq_printf(m, "serial_number=%c%c%c%c%c%c%c\n",
 		   e2a(xItExtVpdPanel.mfgID[2]),
 		   e2a(xItExtVpdPanel.mfgID[3]),
@@ -395,7 +395,7 @@
 			   (h_resource >> 0 * 8) & 0xffff);
 
 		/* pool related entries are apropriate for shared configs */
-		if (paca[0].lppaca.xSharedProc) {
+		if (paca[0].lppaca.shared_proc) {
 
 			h_pic(&pool_idle_time, &pool_procs);
 
@@ -444,7 +444,7 @@
 	seq_printf(m, "partition_potential_processors=%d\n",
 		   partition_potential_processors);
 
-	seq_printf(m, "shared_processor_mode=%d\n", paca[0].lppaca.xSharedProc);
+	seq_printf(m, "shared_processor_mode=%d\n", paca[0].lppaca.shared_proc);
 
 	return 0;
 }
diff -ruN linus-bk-naca.10/arch/ppc64/kernel/pacaData.c linus-bk-naca.11/arch/ppc64/kernel/pacaData.c
--- linus-bk-naca.10/arch/ppc64/kernel/pacaData.c	2004-12-13 15:02:07.000000000 +1100
+++ linus-bk-naca.11/arch/ppc64/kernel/pacaData.c	2004-12-13 16:05:34.000000000 +1100
@@ -28,7 +28,7 @@
 extern unsigned long __toc_start;
 
 /* The Paca is an array with one entry per processor.  Each contains an 
- * ItLpPaca, which contains the information shared between the 
+ * lppaca, which contains the information shared between the 
  * hypervisor and Linux.  Each also contains an ItLpRegSave area which
  * is used by the hypervisor to save registers.
  * On systems with hardware multi-threading, there are two threads
@@ -61,13 +61,13 @@
 	.cpu_start = (start),		/* Processor start */		    \
 	.hw_cpu_id = 0xffff,						    \
 	.lppaca = {							    \
-		.xDesc = 0xd397d781,	/* "LpPa" */			    \
-		.xSize = sizeof(struct ItLpPaca),			    \
-		.xFPRegsInUse = 1,					    \
-		.xDynProcStatus = 2,					    \
-		.xDecrVal = 0x00ff0000,					    \
-		.xEndOfQuantum = 0xfffffffffffffffful,			    \
-		.xSLBCount = 64,					    \
+		.desc = 0xd397d781,	/* "LpPa" */			    \
+		.size = sizeof(struct lppaca),				    \
+		.dyn_proc_status = 2,					    \
+		.decr_val = 0x00ff0000,					    \
+		.fpregs_in_use = 1,					    \
+		.end_of_quantum = 0xfffffffffffffffful,			    \
+		.slb_count = 64,					    \
 	},								    \
 	EXTRA_INITS((number), (lpq))					    \
 }
diff -ruN linus-bk-naca.10/arch/ppc64/kernel/sysfs.c linus-bk-naca.11/arch/ppc64/kernel/sysfs.c
--- linus-bk-naca.10/arch/ppc64/kernel/sysfs.c	2004-12-13 15:01:19.000000000 +1100
+++ linus-bk-naca.11/arch/ppc64/kernel/sysfs.c	2004-12-13 15:58:30.000000000 +1100
@@ -157,7 +157,7 @@
 #ifdef CONFIG_PPC_PSERIES
 	/* instruct hypervisor to maintain PMCs */
 	if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR)
-		get_paca()->lppaca.xPMCRegsInUse = 1;
+		get_paca()->lppaca.pmcregs_in_use = 1;
 
 	/*
 	 * On SMT machines we have to set the run latch in the ctrl register
diff -ruN linus-bk-naca.10/arch/ppc64/kernel/time.c linus-bk-naca.11/arch/ppc64/kernel/time.c
--- linus-bk-naca.10/arch/ppc64/kernel/time.c	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-naca.11/arch/ppc64/kernel/time.c	2004-12-13 15:43:28.000000000 +1100
@@ -230,7 +230,7 @@
 /*
  * For iSeries shared processors, we have to let the hypervisor
  * set the hardware decrementer.  We set a virtual decrementer
- * in the ItLpPaca and call the hypervisor if the virtual
+ * in the lppaca and call the hypervisor if the virtual
  * decrementer is less than the current value in the hardware
  * decrementer. (almost always the new decrementer value will
  * be greater than the current hardware decementer so the hypervisor
@@ -256,7 +256,7 @@
 	profile_tick(CPU_PROFILING, regs);
 #endif
 
-	lpaca->lppaca.xIntDword.xFields.xDecrInt = 0;
+	lpaca->lppaca.int_dword.fields.decr_int = 0;
 
 	while (lpaca->next_jiffy_update_tb <= (cur_tb = get_tb())) {
 
diff -ruN linus-bk-naca.10/arch/ppc64/lib/locks.c linus-bk-naca.11/arch/ppc64/lib/locks.c
--- linus-bk-naca.10/arch/ppc64/lib/locks.c	2004-09-16 21:51:57.000000000 +1000
+++ linus-bk-naca.11/arch/ppc64/lib/locks.c	2004-12-13 16:08:05.000000000 +1100
@@ -34,7 +34,7 @@
 	holder_cpu = lock_value & 0xffff;
 	BUG_ON(holder_cpu >= NR_CPUS);
 	holder_paca = &paca[holder_cpu];
-	yield_count = holder_paca->lppaca.xYieldCount;
+	yield_count = holder_paca->lppaca.yield_count;
 	if ((yield_count & 1) == 0)
 		return;		/* virtual cpu is currently running */
 	rmb();
@@ -66,7 +66,7 @@
 	holder_cpu = lock_value & 0xffff;
 	BUG_ON(holder_cpu >= NR_CPUS);
 	holder_paca = &paca[holder_cpu];
-	yield_count = holder_paca->lppaca.xYieldCount;
+	yield_count = holder_paca->lppaca.yield_count;
 	if ((yield_count & 1) == 0)
 		return;		/* virtual cpu is currently running */
 	rmb();
diff -ruN linus-bk-naca.10/arch/ppc64/xmon/xmon.c linus-bk-naca.11/arch/ppc64/xmon/xmon.c
--- linus-bk-naca.10/arch/ppc64/xmon/xmon.c	2004-12-11 02:33:00.000000000 +1100
+++ linus-bk-naca.11/arch/ppc64/xmon/xmon.c	2004-12-13 15:50:52.000000000 +1100
@@ -1489,7 +1489,7 @@
 	unsigned long val;
 #ifdef CONFIG_PPC_ISERIES
 	struct paca_struct *ptrPaca = NULL;
-	struct ItLpPaca *ptrLpPaca = NULL;
+	struct lppaca *ptrLpPaca = NULL;
 	struct ItLpRegSave *ptrLpRegSave = NULL;
 #endif
 
@@ -1513,10 +1513,10 @@
 		printf("  Local Processor Control Area (LpPaca): \n");
 		ptrLpPaca = ptrPaca->lppaca_ptr;
 		printf("    Saved Srr0=%.16lx  Saved Srr1=%.16lx \n",
-		       ptrLpPaca->xSavedSrr0, ptrLpPaca->xSavedSrr1);
+		       ptrLpPaca->saved_srr0, ptrLpPaca->saved_srr1);
 		printf("    Saved Gpr3=%.16lx  Saved Gpr4=%.16lx \n",
-		       ptrLpPaca->xSavedGpr3, ptrLpPaca->xSavedGpr4);
-		printf("    Saved Gpr5=%.16lx \n", ptrLpPaca->xSavedGpr5);
+		       ptrLpPaca->saved_gpr3, ptrLpPaca->saved_gpr4);
+		printf("    Saved Gpr5=%.16lx \n", ptrLpPaca->saved_gpr5);
     
 		printf("  Local Processor Register Save Area (LpRegSave): \n");
 		ptrLpRegSave = ptrPaca->reg_save_ptr;
diff -ruN linus-bk-naca.10/include/asm-ppc64/lppaca.h linus-bk-naca.11/include/asm-ppc64/lppaca.h
--- linus-bk-naca.10/include/asm-ppc64/lppaca.h	2004-12-13 15:04:43.000000000 +1100
+++ linus-bk-naca.11/include/asm-ppc64/lppaca.h	2004-12-13 16:09:08.000000000 +1100
@@ -28,7 +28,7 @@
 //----------------------------------------------------------------------------
 #include <asm/types.h>
 
-struct ItLpPaca
+struct lppaca
 {
 //=============================================================================
 // CACHE_LINE_1 0x0000 - 0x007F Contains read-only data
@@ -36,24 +36,24 @@
 // PLIC when preparing to bring a processor online or when dispatching a 
 // virtual processor!
 //=============================================================================
-	u32	xDesc;			// Eye catcher 0xD397D781	x00-x03
-	u16	xSize;			// Size of this struct		x04-x05
-	u16	xRsvd1_0;		// Reserved			x06-x07
-	u16	xRsvd1_1:14;		// Reserved			x08-x09
-	u8	xSharedProc:1;		// Shared processor indicator	...
-	u8	xSecondaryThread:1;	// Secondary thread indicator	...
-	volatile u8 xDynProcStatus:8;	// Dynamic Status of this proc	x0A-x0A
-	u8	xSecondaryThreadCnt;	// Secondary thread count	x0B-x0B
-	volatile u16 xDynHvPhysicalProcIndex;// Dynamic HV Physical Proc Index0C-x0D
-	volatile u16 xDynHvLogicalProcIndex;// Dynamic HV Logical Proc Indexx0E-x0F
-	u32	xDecrVal;   		// Value for Decr programming 	x10-x13
-	u32	xPMCVal;       		// Value for PMC regs         	x14-x17
-	volatile u32 xDynHwNodeId;	// Dynamic Hardware Node id	x18-x1B
-	volatile u32 xDynHwProcId;	// Dynamic Hardware Proc Id	x1C-x1F
-	volatile u32 xDynPIR;		// Dynamic ProcIdReg value	x20-x23
-	u32	xDseiData;           	// DSEI data                  	x24-x27
-	u64	xSPRG3;               	// SPRG3 value                	x28-x2F
-	u8	xRsvd1_3[80];		// Reserved			x30-x7F
+	u32	desc;			// Eye catcher 0xD397D781	x00-x03
+	u16	size;			// Size of this struct		x04-x05
+	u16	reserved1;		// Reserved			x06-x07
+	u16	reserved2:14;		// Reserved			x08-x09
+	u8	shared_proc:1;		// Shared processor indicator	...
+	u8	secondary_thread:1;	// Secondary thread indicator	...
+	volatile u8 dyn_proc_status:8;	// Dynamic Status of this proc	x0A-x0A
+	u8	secondary_thread_count;	// Secondary thread count	x0B-x0B
+	volatile u16 dyn_hv_phys_proc_index;// Dynamic HV Physical Proc Index0C-x0D
+	volatile u16 dyn_hv_log_proc_index;// Dynamic HV Logical Proc Indexx0E-x0F
+	u32	decr_val;   		// Value for Decr programming 	x10-x13
+	u32	pmc_val;       		// Value for PMC regs         	x14-x17
+	volatile u32 dyn_hw_node_id;	// Dynamic Hardware Node id	x18-x1B
+	volatile u32 dyn_hw_proc_id;	// Dynamic Hardware Proc Id	x1C-x1F
+	volatile u32 dyn_pir;		// Dynamic ProcIdReg value	x20-x23
+	u32	dsei_data;           	// DSEI data                  	x24-x27
+	u64	sprg3;               	// SPRG3 value                	x28-x2F
+	u8	reserved3[80];		// Reserved			x30-x7F
    
 //=============================================================================
 // CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data
@@ -61,17 +61,17 @@
 	// This Dword contains a byte for each type of interrupt that can occur.  
 	// The IPI is a count while the others are just a binary 1 or 0.
 	union {
-		u64	xAnyInt;
+		u64	any_int;
 		struct {
-			u16	xRsvd;		// Reserved - cleared by #mpasmbl
-			u8	xXirrInt;	// Indicates xXirrValue is valid or Immed IO
-			u8	xIpiCnt;	// IPI Count
-			u8	xDecrInt;	// DECR interrupt occurred
-			u8	xPdcInt;	// PDC interrupt occurred
-			u8	xQuantumInt;	// Interrupt quantum reached
-			u8	xOldPlicDeferredExtInt;	// Old PLIC has a deferred XIRR pending
-		} xFields;
-	} xIntDword;
+			u16	reserved;	// Reserved - cleared by #mpasmbl
+			u8	xirr_int;	// Indicates xXirrValue is valid or Immed IO
+			u8	ipi_cnt;	// IPI Count
+			u8	decr_int;	// DECR interrupt occurred
+			u8	pdc_int;	// PDC interrupt occurred
+			u8	quantum_int;	// Interrupt quantum reached
+			u8	old_plic_deferred_ext_int;	// Old PLIC has a deferred XIRR pending
+		} fields;
+	} int_dword;
 
 	// Whenever any fields in this Dword are set then PLIC will defer the 
 	// processing of external interrupts.  Note that PLIC will store the 
@@ -81,54 +81,52 @@
 	// entire Dword is zero or not.  A non-zero value in the low order 
 	// 2-bytes will result in SLIC being granted the highest thread 
 	// priority upon return.  A 0 will return to SLIC as medium priority.
-	u64	xPlicDeferIntsArea;	// Entire Dword
+	u64	plic_defer_ints_area;	// Entire Dword
 
 	// Used to pass the real SRR0/1 from PLIC to SLIC as well as to 
 	// pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid.
-	u64     xSavedSrr0;             // Saved SRR0                   x10-x17
-	u64     xSavedSrr1;             // Saved SRR1                   x18-x1F
+	u64	saved_srr0;		// Saved SRR0                   x10-x17
+	u64	saved_srr1;		// Saved SRR1                   x18-x1F
 
 	// Used to pass parms from the OS to PLIC for SetAsrAndRfid
-	u64     xSavedGpr3;             // Saved GPR3                   x20-x27
-	u64     xSavedGpr4;             // Saved GPR4                   x28-x2F
-	u64     xSavedGpr5;             // Saved GPR5                   x30-x37
-
-	u8	xRsvd2_1;		// Reserved			x38-x38
-	u8      xCpuCtlsTaskAttributes; // Task attributes for cpuctls  x39-x39
-	u8      xFPRegsInUse;           // FP regs in use               x3A-x3A
-	u8      xPMCRegsInUse;          // PMC regs in use              x3B-x3B
-	volatile u32  xSavedDecr;	// Saved Decr Value             x3C-x3F
-	volatile u64  xEmulatedTimeBase;// Emulated TB for this thread  x40-x47
-	volatile u64  xCurPLICLatency;	// Unaccounted PLIC latency     x48-x4F
-	u64     xTotPLICLatency;        // Accumulated PLIC latency     x50-x57   
-	u64     xWaitStateCycles;       // Wait cycles for this proc    x58-x5F
-	u64     xEndOfQuantum;          // TB at end of quantum         x60-x67
-	u64     xPDCSavedSPRG1;         // Saved SPRG1 for PMC int      x68-x6F
-	u64     xPDCSavedSRR0;          // Saved SRR0 for PMC int       x70-x77
-	volatile u32 xVirtualDecr;	// Virtual DECR for shared procsx78-x7B
-	u16     xSLBCount;              // # of SLBs to maintain        x7C-x7D
-	u8      xIdle;                  // Indicate OS is idle          x7E
-	u8      xRsvd2_2;               // Reserved                     x7F
+	u64	saved_gpr3;		// Saved GPR3                   x20-x27
+	u64	saved_gpr4;		// Saved GPR4                   x28-x2F
+	u64	saved_gpr5;		// Saved GPR5                   x30-x37
+
+	u8	reserved4;		// Reserved			x38-x38
+	u8	cpuctls_task_attrs;	// Task attributes for cpuctls  x39-x39
+	u8	fpregs_in_use;		// FP regs in use               x3A-x3A
+	u8	pmcregs_in_use;		// PMC regs in use              x3B-x3B
+	volatile u32 saved_decr;	// Saved Decr Value             x3C-x3F
+	volatile u64 emulated_time_base;// Emulated TB for this thread  x40-x47
+	volatile u64 cur_plic_latency;	// Unaccounted PLIC latency     x48-x4F
+	u64	tot_plic_latency;	// Accumulated PLIC latency     x50-x57   
+	u64	wait_state_cycles;	// Wait cycles for this proc    x58-x5F
+	u64	end_of_quantum;		// TB at end of quantum         x60-x67
+	u64	pdc_saved_sprg1;	// Saved SPRG1 for PMC int      x68-x6F
+	u64	pdc_saved_srr0;		// Saved SRR0 for PMC int       x70-x77
+	volatile u32 virtual_decr;	// Virtual DECR for shared procsx78-x7B
+	u16	slb_count;		// # of SLBs to maintain        x7C-x7D
+	u8	idle;			// Indicate OS is idle          x7E
+	u8	reserved5;		// Reserved                     x7F
 
 
 //=============================================================================
 // CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors
 //=============================================================================
-	// This is the xYieldCount.  An "odd" value (low bit on) means that 
+	// This is the yield_count.  An "odd" value (low bit on) means that 
 	// the processor is yielded (either because of an OS yield or a PLIC 
 	// preempt).  An even value implies that the processor is currently 
 	// executing.
 	// NOTE: This value will ALWAYS be zero for dedicated processors and 
 	// will NEVER be zero for shared processors (ie, initialized to a 1).
-	volatile u32 xYieldCount;	// PLIC increments each dispatchx00-x03
-	u8	xRsvd3_0[124];		// Reserved                     x04-x7F         
+	volatile u32 yield_count;	// PLIC increments each dispatchx00-x03
+	u8	reserved6[124];		// Reserved                     x04-x7F         
 
 //=============================================================================
 // CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data
 //=============================================================================
-	u8      xPmcSaveArea[256];	// PMC interrupt Area           x00-xFF
-
-
+	u8	pmc_save_area[256];	// PMC interrupt Area           x00-xFF
 };
 
 #endif /* _ASM_LPPACA_H */
diff -ruN linus-bk-naca.10/include/asm-ppc64/paca.h linus-bk-naca.11/include/asm-ppc64/paca.h
--- linus-bk-naca.10/include/asm-ppc64/paca.h	2004-12-31 15:48:57.000000000 +1100
+++ linus-bk-naca.11/include/asm-ppc64/paca.h	2004-12-31 15:54:35.000000000 +1100
@@ -34,8 +34,8 @@
  *
  * This structure is not directly accessed by firmware or the service
  * processor except for the first two pointers that point to the
- * ItLpPaca area and the ItLpRegSave area for this CPU.  Both the
- * ItLpPaca and ItLpRegSave objects are currently contained within the
+ * lppaca area and the ItLpRegSave area for this CPU.  Both the
+ * lppaca and ItLpRegSave objects are currently contained within the
  * PACA but they do not need to be.
  */
 struct paca_struct {
@@ -50,7 +50,7 @@
 	 * MAGIC: These first two pointers can't be moved - they're
 	 * accessed by the firmware
 	 */
-	struct ItLpPaca *lppaca_ptr;	/* Pointer to LpPaca for PLIC */
+	struct lppaca *lppaca_ptr;	/* Pointer to LpPaca for PLIC */
 	struct ItLpRegSave *reg_save_ptr; /* Pointer to LpRegSave for PLIC */
 
 	/*
@@ -109,7 +109,7 @@
 	 * alignment will suffice to ensure that it doesn't
 	 * cross a page boundary.
 	 */
-	struct ItLpPaca lppaca __attribute__((__aligned__(0x400)));
+	struct lppaca lppaca __attribute__((__aligned__(0x400)));
 #ifdef CONFIG_PPC_ISERIES
 	struct ItLpRegSave reg_save;
 #endif
diff -ruN linus-bk-naca.10/include/asm-ppc64/spinlock.h linus-bk-naca.11/include/asm-ppc64/spinlock.h
--- linus-bk-naca.10/include/asm-ppc64/spinlock.h	2004-09-09 09:59:50.000000000 +1000
+++ linus-bk-naca.11/include/asm-ppc64/spinlock.h	2004-12-13 15:25:23.000000000 +1100
@@ -57,7 +57,7 @@
 
 #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES)
 /* We only yield to the hypervisor if we are in shared processor mode */
-#define SHARED_PROCESSOR (get_paca()->lppaca.xSharedProc)
+#define SHARED_PROCESSOR (get_paca()->lppaca.shared_proc)
 extern void __spin_yield(spinlock_t *lock);
 extern void __rw_yield(rwlock_t *lock);
 #else /* SPLPAR || ISERIES */
diff -ruN linus-bk-naca.10/include/asm-ppc64/time.h linus-bk-naca.11/include/asm-ppc64/time.h
--- linus-bk-naca.10/include/asm-ppc64/time.h	2004-07-05 11:49:20.000000000 +1000
+++ linus-bk-naca.11/include/asm-ppc64/time.h	2004-12-13 16:05:02.000000000 +1100
@@ -78,8 +78,8 @@
 	struct paca_struct *lpaca = get_paca();
 	int cur_dec;
 
-	if (lpaca->lppaca.xSharedProc) {
-		lpaca->lppaca.xVirtualDecr = val;
+	if (lpaca->lppaca.shared_proc) {
+		lpaca->lppaca.virtual_decr = val;
 		cur_dec = get_dec();
 		if (cur_dec > val)
 			HvCall_setVirtualDecr();
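The locks.c hunks above rely on a yield_count convention that fits in a few lines of code. This sketch assumes the hypervisor bumps the count once when a virtual processor is dispatched and once when it yields or is preempted, which is what the odd/even rule in lppaca.h implies. vcpu_is_yielded() and hv_toggle() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Odd (low bit set): yielded or preempted.  Even: currently executing. */
static int vcpu_is_yielded(uint32_t yield_count)
{
	return (yield_count & 1) != 0;
}

/*
 * Hypothetical model of the hypervisor side: one increment per dispatch
 * and one per yield/preempt, so the parity always tracks the state.
 */
static void hv_toggle(uint32_t *yield_count)
{
	(*yield_count)++;
}
```

A spinning CPU samples the lock holder's count and returns early when it is even (the holder is running), just as __spin_yield() does. A dedicated processor's count stays at 0, so its holder never appears yielded.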

From anton at samba.org  Tue Jan  4 16:01:15 2005
From: anton at samba.org (Anton Blanchard)
Date: Tue, 4 Jan 2005 16:01:15 +1100
Subject: [PATCH] ppc64: Clarify rtasd printk
Message-ID: <20050104050115.GG7335@krispykreme.ozlabs.ibm.com>


Hi,

On machines with RTAS but without event-scan support we would
incorrectly claim there was no RTAS on the system.
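The distinction this patch draws can be written as a small decision function. This is a sketch only. rtasd_startup_msg(), its rtas_present argument and SKETCH_UNKNOWN_SERVICE are hypothetical, and the real code checks only the event-scan token. In the quoted context, the line "if (systemcfg->platform & PLATFORM_PSERIES);" ends in a semicolon, so as written the printk that follows runs on every platform. The sketch instead encodes the "only warn if we are on a pSeries box" intent stated in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for RTAS_UNKNOWN_SERVICE. */
#define SKETCH_UNKNOWN_SERVICE (-1)

/*
 * Hypothetical: which message, if any, rtasd should log at startup.
 * A missing "event-scan" token does not imply that RTAS itself is missing.
 */
static const char *rtasd_startup_msg(int rtas_present, int event_scan_token,
				     int is_pseries)
{
	if (event_scan_token != SKETCH_UNKNOWN_SERVICE)
		return NULL;		/* event-scan exists: nothing to report */
	if (!is_pseries)
		return NULL;		/* only warn on pSeries */
	return rtas_present ? "rtasd: no event-scan on system"
			    : "rtasd: no RTAS on system";
}
```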

Signed-off-by: Anton Blanchard <anton at samba.org>

===== rtasd.c 1.34 vs edited =====
--- 1.34/arch/ppc64/kernel/rtasd.c	2004-11-16 14:29:11 +11:00
+++ edited/rtasd.c	2004-12-26 13:36:56 +11:00
@@ -486,7 +486,7 @@
 	/* No RTAS, only warn if we are on a pSeries box  */
 	if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) {
 		if (systemcfg->platform & PLATFORM_PSERIES);
-			printk(KERN_ERR "rtasd: no RTAS on system\n");
+			printk(KERN_ERR "rtasd: no event-scan on system\n");
 		return 1;
 	}
 

From anton at samba.org  Tue Jan  4 16:07:27 2005
From: anton at samba.org (Anton Blanchard)
Date: Tue, 4 Jan 2005 16:07:27 +1100
Subject: [PATCH] ppc64: fix some compiler warnings
Message-ID: <20050104050727.GH7335@krispykreme.ozlabs.ibm.com>


Fix some compiler warnings:

- The first two are spurious gcc warnings, but quieten them up regardless
- Add a missing include
- Use register_sysrq_key instead of __sysrq_put_key_op
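The first two warnings follow a familiar pattern. Judging by the hunks, `flags` is both written and read only when `lock_tlbie` is set, and gcc cannot always prove that the two conditions match, so it reports a possible uninitialized use. Below is a minimal user-space sketch of that pattern. It assumes the real code brackets the `tlbie` with `spin_lock_irqsave()`/`spin_unlock_irqrestore()` under `lock_tlbie`. The `demo_` helpers are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>

/* Hypothetical stand-ins for spin_lock_irqsave()/spin_unlock_irqrestore():
 * a single global "interrupt state" word that is saved and restored. */
static unsigned long irq_state = 1;

static unsigned long demo_save_irqs(void)
{
	unsigned long old = irq_state;

	irq_state = 0;		/* "interrupts off" */
	return old;
}

static void demo_restore_irqs(unsigned long flags)
{
	irq_state = flags;
}

/* flags is only written and only read when lock_tlbie is set; the "= 0"
 * exists purely to silence the spurious may-be-uninitialized warning. */
static int demo_update(int lock_tlbie, int *entry, int val)
{
	unsigned long flags = 0;

	if (lock_tlbie)
		flags = demo_save_irqs();
	*entry = val;		/* stands in for the tlbie sequence */
	if (lock_tlbie)
		demo_restore_irqs(flags);
	return *entry;
}
```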

Signed-off-by: Anton Blanchard <anton at samba.org>

diff -puN arch/ppc64/mm/hash_native.c~remove_compiler_warnings arch/ppc64/mm/hash_native.c
--- gr_work/arch/ppc64/mm/hash_native.c~remove_compiler_warnings	2004-12-25 21:44:00.112288718 -0600
+++ gr_work-anton/arch/ppc64/mm/hash_native.c	2004-12-25 21:44:35.782093438 -0600
@@ -242,7 +242,7 @@ static long native_hpte_updatepp(unsigne
  */
 static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea)
 {
-	unsigned long vsid, va, vpn, flags;
+	unsigned long vsid, va, vpn, flags = 0;
 	long slot;
 	HPTE *hptep;
 	int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE);
diff -puN arch/ppc64/kernel/pSeries_lpar.c~remove_compiler_warnings arch/ppc64/kernel/pSeries_lpar.c
--- gr_work/arch/ppc64/kernel/pSeries_lpar.c~remove_compiler_warnings	2004-12-25 21:44:48.291973925 -0600
+++ gr_work-anton/arch/ppc64/kernel/pSeries_lpar.c	2004-12-25 21:45:08.829912888 -0600
@@ -504,7 +504,7 @@ void pSeries_lpar_flush_hash_range(unsig
 				   int local)
 {
 	int i;
-	unsigned long flags;
+	unsigned long flags = 0;
 	struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch);
 	int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE);
 
diff -puN arch/ppc64/kernel/pSeries_setup.c~remove_compiler_warnings arch/ppc64/kernel/pSeries_setup.c
--- gr_work/arch/ppc64/kernel/pSeries_setup.c~remove_compiler_warnings	2004-12-25 21:46:35.016298826 -0600
+++ gr_work-anton/arch/ppc64/kernel/pSeries_setup.c	2004-12-25 21:47:05.188173311 -0600
@@ -59,6 +59,7 @@
 #include <asm/naca.h>
 #include <asm/time.h>
 #include <asm/nvram.h>
+#include <asm/plpar_wrappers.h>
 
 #include "i8259.h"
 #include <asm/xics.h>
diff -puN arch/ppc64/xmon/start.c~remove_compiler_warnings arch/ppc64/xmon/start.c
--- gr_work/arch/ppc64/xmon/start.c~remove_compiler_warnings	2004-12-25 21:48:27.578625901 -0600
+++ gr_work-anton/arch/ppc64/xmon/start.c	2004-12-25 21:48:55.121385858 -0600
@@ -40,7 +40,7 @@ static struct sysrq_key_op sysrq_xmon_op
 
 static int __init setup_xmon_sysrq(void)
 {
-	__sysrq_put_key_op('x', &sysrq_xmon_op);
+	register_sysrq_key('x', &sysrq_xmon_op);
 	return 0;
 }
 __initcall(setup_xmon_sysrq);
_


From anton at samba.org  Tue Jan  4 16:13:35 2005
From: anton at samba.org (Anton Blanchard)
Date: Tue, 4 Jan 2005 16:13:35 +1100
Subject: [PATCH] ppc64: remove stale prom.h code
Message-ID: <20050104051335.GJ7335@krispykreme.ozlabs.ibm.com>


Remove some stale code in prom.h

Signed-off-by: Anton Blanchard <anton at samba.org>

diff -puN include/asm-ppc64/prom.h~prom_cleanup include/asm-ppc64/prom.h
--- foobar2/include/asm-ppc64/prom.h~prom_cleanup	2005-01-04 16:07:39.113436136 +1100
+++ foobar2-anton/include/asm-ppc64/prom.h	2005-01-04 16:07:39.132434650 +1100
@@ -21,9 +21,6 @@
 #define PTRUNRELOC(x)   ((typeof(x))((unsigned long)(x) + offset))
 #define RELOC(x)        (*PTRRELOC(&(x)))
 
-#define LONG_LSW(X) (((unsigned long)X) & 0xffffffff)
-#define LONG_MSW(X) (((unsigned long)X) >> 32)
-
 /* Definitions used by the flattened device tree */
 #define OF_DT_HEADER		0xd00dfeed	/* 4: version, 4: total size */
 #define OF_DT_BEGIN_NODE	0x1		/* Start node: full name */
@@ -64,8 +61,6 @@ struct boot_param_header
 
 typedef u32 phandle;
 typedef u32 ihandle;
-typedef u32 phandle32;
-typedef u32 ihandle32;
 
 struct address_range {
 	unsigned long space;
@@ -95,13 +90,6 @@ struct isa_range {
 	unsigned int size;
 };
 
-struct of_tce_table {
-	phandle node;
-	unsigned long base;
-	unsigned long size;
-};
-extern struct of_tce_table of_tce_table[];
-
 struct reg_property {
 	unsigned long address;
 	unsigned long size;
@@ -117,19 +105,6 @@ struct reg_property64 {
 	unsigned long size;
 };
 
-struct reg_property_pmac {
-	unsigned int address_hi;
-	unsigned int address_lo;
-	unsigned int size;
-};
-
-struct translation_property {
-	unsigned long virt;
-	unsigned long size;
-	unsigned long phys;
-	unsigned int flags;
-};
-
 struct property {
 	char	*name;
 	int	length;
@@ -160,8 +135,6 @@ struct device_node {
 	int	busno;			/* for pci devices */
 	int	bussubno;		/* for pci devices */
 	int	devfn;			/* for pci devices */
-#define DN_STATUS_BIST_FAILED (1<<0)
-	int	status;			/* Current device status (non-zero is bad) */
 	int	eeh_mode;		/* See eeh.h for possible EEH_MODEs */
 	int	eeh_config_addr;
 	struct  pci_controller *phb;	/* for pci devices */
@@ -244,7 +217,6 @@ extern int of_remove_node(struct device_
 /* Other Prototypes */
 extern unsigned long prom_init(unsigned long, unsigned long, unsigned long,
 	unsigned long, unsigned long);
-extern void relocate_nodes(void);
 extern void finish_device_tree(void);
 extern int device_is_compatible(struct device_node *device, const char *);
 extern int machine_is_compatible(const char *compat);
_


From paulus at samba.org  Tue Jan  4 17:39:00 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 4 Jan 2005 17:39:00 +1100
Subject: [PATCH] PPC64 Simplify timer_interrupt
Message-ID: <16858.14852.673750.729779@cargo.ozlabs.ibm.com>

This patch is from Milton Miller <miltonm at bga.com>.

When the update_process_times() call was moved out of do_timer() for the UP case,
whoever replicated the change here didn't track down where the SMP path hid its
call and just added an #ifndef CONFIG_SMP.

This removes the ifdefs and the indirection of calling a function in another
file (smp.c) only for it to call one function in a third file.
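After the patch, every online CPU calls update_process_times() once per elapsed jiffy, with no CONFIG_SMP split. Only the boot CPU additionally advances global time through do_timer(). That part is presumably guarded by a `cpu == boot_cpuid` check, as the boot_cpuid comment in the hunk suggests. Below is a rough user-space model of that control flow. The counters and the `demo_timer_loop` name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the real kernel calls. */
static int process_time_updates;	/* update_process_times() */
static int global_time_updates;		/* do_timer() and friends */

/* Model of the per-jiffy loop in timer_interrupt() after the patch. */
static void demo_timer_loop(int cpu, int boot_cpuid, bool cpu_offline,
			    int jiffies_elapsed)
{
	for (int i = 0; i < jiffies_elapsed; i++) {
		/* Same path for SMP and UP: no #ifdef CONFIG_SMP any more. */
		if (!cpu_offline)
			process_time_updates++;
		/* Only one CPU advances global time. */
		if (cpu == boot_cpuid)
			global_time_updates++;
	}
}
```

On a UP kernel the CPU is never offline and is always the boot CPU, so both calls happen on every jiffy, matching the old `#ifndef CONFIG_SMP` behaviour.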

Signed-off-by: Milton Miller <miltonm at bga.com>
Signed-off-by: Paul Mackerras <paulus at samba.org>

diff -urN base-2.6/arch/ppc64/kernel/smp.c test/arch/ppc64/kernel/smp.c
--- base-2.6/arch/ppc64/kernel/smp.c	2005-01-04 16:24:21.930503880 +1100
+++ test/arch/ppc64/kernel/smp.c	2005-01-04 17:36:44.569526376 +1100
@@ -156,11 +156,6 @@
 	}
 }
 
-void smp_local_timer_interrupt(struct pt_regs * regs)
-{
-	update_process_times(user_mode(regs));
-}
-
 void smp_message_recv(int msg, struct pt_regs *regs)
 {
 	switch(msg) {
diff -urN base-2.6/arch/ppc64/kernel/time.c test/arch/ppc64/kernel/time.c
--- base-2.6/arch/ppc64/kernel/time.c	2005-01-04 16:27:42.854446184 +1100
+++ test/arch/ppc64/kernel/time.c	2005-01-04 17:36:44.571526072 +1100
@@ -68,8 +68,6 @@
 #include <asm/sections.h>
 #include <asm/systemcfg.h>
 
-void smp_local_timer_interrupt(struct pt_regs *);
-
 u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
 
 EXPORT_SYMBOL(jiffies_64);
@@ -259,8 +257,6 @@
 	lpaca->lppaca.int_dword.fields.decr_int = 0;
 
 	while (lpaca->next_jiffy_update_tb <= (cur_tb = get_tb())) {
-
-#ifdef CONFIG_SMP
 		/*
 		 * We cannot disable the decrementer, so in the period
 		 * between this cpu's being marked offline in cpu_online_map
@@ -269,8 +265,7 @@
 		 * is the case.
 		 */
 		if (!cpu_is_offline(cpu))
-			smp_local_timer_interrupt(regs);
-#endif
+			update_process_times(user_mode(regs));
 		/*
 		 * No need to check whether cpu is offline here; boot_cpuid
 		 * should have been fixed up by now.
@@ -279,9 +274,6 @@
 			write_seqlock(&xtime_lock);
 			tb_last_stamp = lpaca->next_jiffy_update_tb;
 			do_timer(regs);
-#ifndef CONFIG_SMP
-			update_process_times(user_mode(regs));
-#endif
 			timer_sync_xtime( cur_tb );
 			timer_check_rtc();
 			write_sequnlock(&xtime_lock);


From sfr at canb.auug.org.au  Tue Jan  4 22:58:09 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 22:58:09 +1100
Subject: [PATCH] PPC64: use c99 initializers
In-Reply-To: <20050104154319.505b1197.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
	<20050104152340.67219ccf.sfr@canb.auug.org.au>
	<20050104152705.6030abc5.sfr@canb.auug.org.au>
	<20050104153102.67284491.sfr@canb.auug.org.au>
	<20050104153445.3777e689.sfr@canb.auug.org.au>
	<20050104153740.56622b4f.sfr@canb.auug.org.au>
	<20050104154025.63a1b9fb.sfr@canb.auug.org.au>
	<20050104154319.505b1197.sfr@canb.auug.org.au>
Message-ID: <20050104225809.4b265440.sfr@canb.auug.org.au>

Hi Andrew,

This patch is just more cleanup in the ppc64 arch.  It uses C99
initializers for various iSeries structures that are used to pass
information to the hypervisor.  Also, itLpNaca is not used by any code that
could be in a module, so don't export it.
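Designated initializers help in two ways. Each value is tied to a field name, and any member not mentioned (including reserved arrays) is zero-initialized. That is why the positional `{0}` placeholders could simply be dropped. The small illustration below uses a hypothetical structure, not one of the real iSeries layouts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor, loosely in the style of the iSeries structures. */
struct demo_desc {
	unsigned int	desc;		/* eyecatcher */
	unsigned short	size;
	unsigned char	reserved[6];
	unsigned char	format;
	unsigned long	addr;
};

/* Positional: every member in declaration order, reserved ones included. */
static struct demo_desc demo_positional = {
	0xE2D7C3C2, sizeof(struct demo_desc), {0}, 1, 0
};

/* C99 designated: name only what matters; the rest is zero-initialized. */
static struct demo_desc demo_designated = {
	.desc	= 0xE2D7C3C2,
	.size	= sizeof(struct demo_desc),
	.format	= 1,
};
```

Both objects end up with identical member values, but only the designated form stays correct if a reserved field is later added or reordered.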

Built and booted.

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>

Please apply.

P.S. for the StudlyCaps brigade, changing these is on my To Do list. :-)
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-sfr.11/arch/ppc64/kernel/LparData.c linus-bk-sfr.12/arch/ppc64/kernel/LparData.c
--- linus-bk-sfr.11/arch/ppc64/kernel/LparData.c	2004-12-13 15:01:55.000000000 +1100
+++ linus-bk-sfr.12/arch/ppc64/kernel/LparData.c	2005-01-04 18:18:37.000000000 +1100
@@ -41,24 +41,22 @@
  */
 
 struct HvReleaseData hvReleaseData = {
-	0xc8a5d9c4,	/* desc = "HvRD" ebcdic */
-	sizeof(struct HvReleaseData),
-	offsetof(struct naca_struct, xItVpdAreas),
-	&naca,		/* 64-bit Naca address */
-	0x6000,		/* offset of LparMap within loadarea (see head.S) */
-	0,
-	1,		/* tags inactive       */
-	0,		/* 64 bit              */
-	0,		/* shared processors   */
-	0,		/* HMT allowed         */
-	6,		/* TEMP: This allows non-GA driver */
-	4,		/* We are v5r2m0               */
-	3,		/* Min supported PLIC = v5r1m0 */
-	3,		/* Min usable PLIC   = v5r1m0 */
-	{ 0xd3, 0x89, 0x95, 0xa4, /* "Linux 2.4   "*/
-	  0xa7, 0x40, 0xf2, 0x4b,
-	  0xf4, 0x4b, 0xf6, 0xf4 },
-	{0}  
+	.xDesc = 0xc8a5d9c4,	/* "HvRD" ebcdic */
+	.xSize = sizeof(struct HvReleaseData),
+	.xVpdAreasPtrOffset = offsetof(struct naca_struct, xItVpdAreas),
+	.xSlicNacaAddr = &naca,		/* 64-bit Naca address */
+	.xMsNucDataOffset = 0x6000,	/* offset of LparMap within loadarea (see head.S) */
+	.xTagsMode = 1,			/* tags inactive       */
+	.xAddressSize = 0,		/* 64 bit              */
+	.xNoSharedProcs = 0,		/* shared processors   */
+	.xNoHMT = 0,			/* HMT allowed         */
+	.xRsvd2 = 6,			/* TEMP: This allows non-GA driver */
+	.xVrmIndex = 4,			/* We are v5r2m0               */
+	.xMinSupportedPlicVrmIndex = 3,		/* v5r1m0 */
+	.xMinCompatablePlicVrmIndex = 3,	/* v5r1m0 */
+	.xVrmName = { 0xd3, 0x89, 0x95, 0xa4,	/* "Linux 2.4.64" ebcdic */
+		0xa7, 0x40, 0xf2, 0x4b,
+		0xf4, 0x4b, 0xf6, 0xf4 },
 };
 
 extern void SystemReset_Iseries(void);
@@ -80,26 +78,33 @@
 extern void InstructionAccessSLB_Iseries(void);
 	
 struct ItLpNaca itLpNaca = {
-	0xd397d581,	/* desc = "LpNa" ebcdic */
-	0x0400,		/* size of ItLpNaca     */
-	0x0300, 19,	/* offset to int array, # ents */
-	0, 0, 0,	/* Part # of primary, serv, me */
-	0, 0x100,	/* # of LP queues, offset */
-	0, 0, 0,	/* Piranha stuff */
-	{ 0,0,0,0,0 },	/* reserved */
-	0,0,0,0,0,0,0,	/* stuff    */
-	{ 0,0,0,0,0 },	/* reserved */
-	0,		/* reserved */
-	0,		/* VRM index of PLIC */
-	0, 0,		/* min supported, compat SLIC */
-	0,		/* 64-bit addr of load area   */
-	0,		/* chunks for load area  */
-	0, 0,		/* PASE mask, seg table  */
-	{ 0 },		/* 64 reserved bytes  */
-	{ 0 }, 		/* 128 reserved bytes */
-	{ 0 }, 		/* Old LP Queue       */
-	{ 0 }, 		/* 384 reserved bytes */
-	{
+	.xDesc = 0xd397d581,		/* "LpNa" ebcdic */
+	.xSize = 0x0400,		/* size of ItLpNaca */
+	.xIntHdlrOffset = 0x0300,	/* offset to int array */
+	.xMaxIntHdlrEntries = 19,	/* # ents */
+	.xPrimaryLpIndex = 0,		/* Part # of primary */
+	.xServiceLpIndex = 0,		/* Part # of serv */
+	.xLpIndex = 0,			/* Part # of me */
+	.xMaxLpQueues = 0,		/* # of LP queues */
+	.xLpQueueOffset = 0x100,	/* offset of start of LP queues */
+	.xPirEnvironMode = 0,		/* Piranha stuff */
+	.xPirConsoleMode = 0,
+	.xPirDasdMode = 0,
+	.xLparInstalled = 0,
+	.xSysPartitioned = 0,
+	.xHwSyncedTBs = 0,
+	.xIntProcUtilHmt = 0,
+	.xSpVpdFormat = 0,
+	.xIntProcRatio = 0,
+	.xPlicVrmIndex = 0,		/* VRM index of PLIC */
+	.xMinSupportedSlicVrmInd = 0,	/* min supported SLIC */
+	.xMinCompatableSlicVrmInd = 0,	/* min compat SLIC */
+	.xLoadAreaAddr = 0,		/* 64-bit addr of load area */
+	.xLoadAreaChunks = 0,		/* chunks for load area */
+	.xPaseSysCallCRMask = 0,	/* PASE mask */
+	.xSlicSegmentTablePtr = 0,	/* seg table */
+	.xOldLpQueue = { 0 }, 		/* Old LP Queue */
+	.xInterruptHdlr = {
 		(u64)SystemReset_Iseries,	/* 0x100 System Reset */
 		(u64)MachineCheck_Iseries,	/* 0x200 Machine Check */
 		(u64)DataAccess_Iseries,	/* 0x300 Data Access */
@@ -153,10 +158,8 @@
 u64    xRecoveryLogBuffer[32] __attribute__((__section__(".data")));
 
 struct SpCommArea xSpCommArea = {
-	0xE2D7C3C2,
-	1,
-	{0},
-	0, 0, 0, 0, {0}
+	.xDesc = 0xE2D7C3C2,
+	.xFormat = 1,
 };
 
 /* The LparMap data is now located at offset 0x6000 in head.S
@@ -168,22 +171,21 @@
  * offset into the Naca of the pointer to the ItVpdAreas.
  */
 struct ItVpdAreas itVpdAreas = {
-	0xc9a3e5c1,	/* "ItVA" */
-	sizeof( struct ItVpdAreas ),
-	0, 0,
-	26,		/* # VPD array entries */
-	10,		/* # DMA array entries */
-	NR_CPUS*2, maxPhysicalProcessors,	/* Max logical, physical procs */
-	offsetof(struct ItVpdAreas,xPlicDmaToks),/* offset to DMA toks */
-	offsetof(struct ItVpdAreas,xSlicVpdAdrs),/* offset to VPD addrs */
-	offsetof(struct ItVpdAreas,xPlicDmaLens),/* offset to DMA lens */
-	offsetof(struct ItVpdAreas,xSlicVpdLens),/* offset to VPD lens */
-	0,		/* max slot labels */
-	1,		/* max LP queues */
-	{0}, {0},	/* reserved */
-	{0},		/* DMA lengths */
-	{0},		/* DMA tokens */
-	{		/* VPD lengths */
+	.xSlicDesc = 0xc9a3e5c1,		/* "ItVA" */
+	.xSlicSize = sizeof(struct ItVpdAreas),
+	.xSlicVpdEntries = ItVpdMaxEntries,	/* # VPD array entries */
+	.xSlicDmaEntries = ItDmaMaxEntries,	/* # DMA array entries */
+	.xSlicMaxLogicalProcs = NR_CPUS * 2,	/* Max logical procs */
+	.xSlicMaxPhysicalProcs = maxPhysicalProcessors,	/* Max physical procs */
+	.xSlicDmaToksOffset = offsetof(struct ItVpdAreas, xPlicDmaToks),
+	.xSlicVpdAdrsOffset = offsetof(struct ItVpdAreas, xSlicVpdAdrs),
+	.xSlicDmaLensOffset = offsetof(struct ItVpdAreas, xPlicDmaLens),
+	.xSlicVpdLensOffset = offsetof(struct ItVpdAreas, xSlicVpdLens),
+	.xSlicMaxSlotLabels = 0,		/* max slot labels */
+	.xSlicMaxLpQueues = 1,			/* max LP queues */
+	.xPlicDmaLens = { 0 },			/* DMA lengths */
+	.xPlicDmaToks = { 0 },			/* DMA tokens */
+	.xSlicVpdLens = {			/* VPD lengths */
 	        0,0,0,		        /*  0 - 2 */
 		sizeof(xItExtVpdPanel), /*       3 Extended VPD   */
 		sizeof(struct paca_struct),	/*       4 length of Paca  */
@@ -201,7 +203,7 @@
 		sizeof(struct ItLpQueue),/*     23 length of Lp Queue */
 		0,0			/* 24 - 25 */
 		},
-	{			/* VPD addresses */
+	.xSlicVpdAdrs = {			/* VPD addresses */
 		0,0,0,  		/*	 0 -  2 */
 		&xItExtVpdPanel,        /*       3 Extended VPD */
 		&paca[0],		/*       4 first Paca */
diff -ruN linus-bk-sfr.11/arch/ppc64/kernel/ppc_ksyms.c linus-bk-sfr.12/arch/ppc64/kernel/ppc_ksyms.c
--- linus-bk-sfr.11/arch/ppc64/kernel/ppc_ksyms.c	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-sfr.12/arch/ppc64/kernel/ppc_ksyms.c	2005-01-04 18:07:42.000000000 +1100
@@ -68,9 +68,6 @@
 EXPORT_SYMBOL(__down_interruptible);
 EXPORT_SYMBOL(__up);
 EXPORT_SYMBOL(__down);
-#ifdef CONFIG_PPC_ISERIES
-EXPORT_SYMBOL(itLpNaca);
-#endif
 
 EXPORT_SYMBOL(csum_partial);
 EXPORT_SYMBOL(csum_partial_copy_generic);

From sfr at canb.auug.org.au  Tue Jan  4 23:05:08 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 23:05:08 +1100
Subject: [PATCH] PPC64: tidy up the htab_data structure
In-Reply-To: <20050104225809.4b265440.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
	<20050104152340.67219ccf.sfr@canb.auug.org.au>
	<20050104152705.6030abc5.sfr@canb.auug.org.au>
	<20050104153102.67284491.sfr@canb.auug.org.au>
	<20050104153445.3777e689.sfr@canb.auug.org.au>
	<20050104153740.56622b4f.sfr@canb.auug.org.au>
	<20050104154025.63a1b9fb.sfr@canb.auug.org.au>
	<20050104154319.505b1197.sfr@canb.auug.org.au>
	<20050104225809.4b265440.sfr@canb.auug.org.au>
Message-ID: <20050104230508.13dd0df4.sfr@canb.auug.org.au>

Hi Andrew,

More tidying up.

The htab_data structure contained 5 fields, of which two were completely
unused and one other was just kept for printing at boot time.  I have moved
the remaining two into global variables.
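The two surviving globals are all the hash lookup needs. `htab_hash_mask` is `pteg_count - 1`, since the table size is a power of two. The first slot of a group is `(hash & htab_hash_mask) * HPTES_PER_GROUP`, and the secondary group uses `~hash` instead. Below is a stand-alone sketch of that arithmetic. The group size of 8 is an assumption: 16-byte HPTEs in 128-byte PTEGs, consistent with `htab_size_bytes = pteg_count << 7`. The `demo_` names are hypothetical.

```c
#include <assert.h>

#define DEMO_HPTES_PER_GROUP	8UL	/* assumed: 8 x 16-byte HPTEs per PTEG */

static unsigned long demo_htab_hash_mask;

static void demo_init_mask(unsigned long pteg_count)
{
	/* The mask trick only works if pteg_count is a power of two. */
	assert(pteg_count != 0 && (pteg_count & (pteg_count - 1)) == 0);
	demo_htab_hash_mask = pteg_count - 1;
}

/* First HPTE slot of the primary or secondary group for a given hash. */
static unsigned long demo_group_slot(unsigned long hash, int secondary)
{
	if (secondary)
		hash = ~hash;
	return (hash & demo_htab_hash_mask) * DEMO_HPTES_PER_GROUP;
}
```

For example, with 2048 PTEGs a hash of 0x12345 selects group 0x345 (837) as primary and group 2047 - 837 = 1210 as secondary.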

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>

Built and booted on iSeries (which is always lpar) and on pSeries without
partitioning.

Please apply.
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-sfr.12/arch/ppc64/kernel/iSeries_setup.c linus-bk-sfr.13/arch/ppc64/kernel/iSeries_setup.c
--- linus-bk-sfr.12/arch/ppc64/kernel/iSeries_setup.c	2004-12-13 15:31:14.000000000 +1100
+++ linus-bk-sfr.13/arch/ppc64/kernel/iSeries_setup.c	2005-01-04 19:01:54.000000000 +1100
@@ -472,18 +472,16 @@
 	printk("HPT absolute addr = %016lx, size = %dK\n",
 			chunk_to_addr(hptFirstChunk), hptSizeChunks * 256);
 
-	/* Fill in the htab_data structure */
-	/* Fill in size of hashed page table */
+	/* Fill in the hashed page table hash mask */
 	num_ptegs = hptSizePages *
 		(PAGE_SIZE / (sizeof(HPTE) * HPTES_PER_GROUP));
-	htab_data.htab_num_ptegs = num_ptegs;
-	htab_data.htab_hash_mask = num_ptegs - 1;
+	htab_hash_mask = num_ptegs - 1;
 	
 	/*
 	 * The actual hashed page table is in the hypervisor,
 	 * we have no direct access
 	 */
-	htab_data.htab = NULL;
+	htab_address = NULL;
 
 	/*
 	 * Determine if absolute memory has any
diff -ruN linus-bk-sfr.12/arch/ppc64/kernel/pSeries_lpar.c linus-bk-sfr.13/arch/ppc64/kernel/pSeries_lpar.c
--- linus-bk-sfr.12/arch/ppc64/kernel/pSeries_lpar.c	2004-12-31 15:16:48.000000000 +1100
+++ linus-bk-sfr.13/arch/ppc64/kernel/pSeries_lpar.c	2005-01-04 19:00:17.000000000 +1100
@@ -436,7 +436,7 @@
 	hash = hpt_hash(vpn, 0);
 
 	for (j = 0; j < 2; j++) {
-		slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP;
+		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 		for (i = 0; i < HPTES_PER_GROUP; i++) {
 			hpte_dw0.dword0 = pSeries_lpar_hpte_getword0(slot);
 			dw0 = hpte_dw0.dw0;
diff -ruN linus-bk-sfr.12/arch/ppc64/kernel/setup.c linus-bk-sfr.13/arch/ppc64/kernel/setup.c
--- linus-bk-sfr.12/arch/ppc64/kernel/setup.c	2004-12-31 16:24:11.000000000 +1100
+++ linus-bk-sfr.13/arch/ppc64/kernel/setup.c	2005-01-04 18:58:58.000000000 +1100
@@ -55,6 +55,7 @@
 #include <asm/serial.h>
 #include <asm/cache.h>
 #include <asm/page.h>
+#include <asm/mmu.h>
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -90,7 +91,6 @@
 #endif
 
 /* extern void *stab; */
-extern HTAB htab_data;
 extern unsigned long klimit;
 
 extern void mm_init_ppc64(void);
@@ -672,8 +672,8 @@
 			ppc64_caches.dline_size);
 	printk("ppc64_caches.icache_line_size = 0x%x\n",
 			ppc64_caches.iline_size);
-	printk("htab_data.htab                = 0x%p\n", htab_data.htab);
-	printk("htab_data.num_ptegs           = 0x%lx\n", htab_data.htab_num_ptegs);
+	printk("htab_address                  = 0x%p\n", htab_address);
+	printk("htab_hash_mask                = 0x%lx\n", htab_hash_mask);
 	printk("-----------------------------------------------------\n");
 
 	mm_init_ppc64();
diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hash_low.S linus-bk-sfr.13/arch/ppc64/mm/hash_low.S
--- linus-bk-sfr.12/arch/ppc64/mm/hash_low.S	2004-10-14 18:37:37.000000000 +1000
+++ linus-bk-sfr.13/arch/ppc64/mm/hash_low.S	2005-01-04 19:06:24.000000000 +1100
@@ -139,8 +139,8 @@
 	std	r3,STK_PARM(r4)(r1)
 
 	/* Get htab_hash_mask */
-	ld	r4,htab_data at got(2)
-	ld	r27,16(r4)	/* htab_data.htab_hash_mask -> r27 */
+	ld	r4,htab_hash_mask at got(2)
+	ld	r27,0(r4)	/* htab_hash_mask -> r27 */
 
 	/* Check if we may already be in the hashtable, in this case, we
 	 * go to out-of-line code to try to modify the HPTE
diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hash_native.c linus-bk-sfr.13/arch/ppc64/mm/hash_native.c
--- linus-bk-sfr.12/arch/ppc64/mm/hash_native.c	2004-11-16 16:05:10.000000000 +1100
+++ linus-bk-sfr.13/arch/ppc64/mm/hash_native.c	2005-01-04 19:09:45.000000000 +1100
@@ -52,7 +52,7 @@
 			unsigned long hpteflags, int bolted, int large)
 {
 	unsigned long arpn = physRpn_to_absRpn(prpn);
-	HPTE *hptep = htab_data.htab + hpte_group;
+	HPTE *hptep = htab_address + hpte_group;
 	Hpte_dword0 dw0;
 	HPTE lhpte;
 	int i;
@@ -117,7 +117,7 @@
 	slot_offset = mftb() & 0x7;
 
 	for (i = 0; i < HPTES_PER_GROUP; i++) {
-		hptep = htab_data.htab + hpte_group + slot_offset;
+		hptep = htab_address + hpte_group + slot_offset;
 		dw0 = hptep->dw0.dw0;
 
 		if (dw0.v && !dw0.bolted) {
@@ -172,9 +172,9 @@
 	hash = hpt_hash(vpn, 0);
 
 	for (j = 0; j < 2; j++) {
-		slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP;
+		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 		for (i = 0; i < HPTES_PER_GROUP; i++) {
-			hptep = htab_data.htab + slot;
+			hptep = htab_address + slot;
 			dw0 = hptep->dw0.dw0;
 
 			if ((dw0.avpn == (vpn >> 11)) && dw0.v &&
@@ -195,7 +195,7 @@
 static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
 				 unsigned long va, int large, int local)
 {
-	HPTE *hptep = htab_data.htab + slot;
+	HPTE *hptep = htab_address + slot;
 	Hpte_dword0 dw0;
 	unsigned long avpn = va >> 23;
 	int ret = 0;
@@ -254,7 +254,7 @@
 	slot = native_hpte_find(vpn);
 	if (slot == -1)
 		panic("could not find page to bolt\n");
-	hptep = htab_data.htab + slot;
+	hptep = htab_address + slot;
 
 	set_pp_bit(newpp, hptep);
 
@@ -269,7 +269,7 @@
 static void native_hpte_invalidate(unsigned long slot, unsigned long va,
 				    int large, int local)
 {
-	HPTE *hptep = htab_data.htab + slot;
+	HPTE *hptep = htab_address + slot;
 	Hpte_dword0 dw0;
 	unsigned long avpn = va >> 23;
 	unsigned long flags;
@@ -336,10 +336,10 @@
 		secondary = (pte_val(batch->pte[i]) & _PAGE_SECONDARY) >> 15;
 		if (secondary)
 			hash = ~hash;
-		slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP;
+		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 		slot += (pte_val(batch->pte[i]) & _PAGE_GROUP_IX) >> 12;
 
-		hptep = htab_data.htab + slot;
+		hptep = htab_address + slot;
 
 		avpn = va >> 23;
 		if (large)
diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hash_utils.c linus-bk-sfr.13/arch/ppc64/mm/hash_utils.c
--- linus-bk-sfr.12/arch/ppc64/mm/hash_utils.c	2004-12-31 14:52:56.000000000 +1100
+++ linus-bk-sfr.13/arch/ppc64/mm/hash_utils.c	2005-01-04 19:08:37.000000000 +1100
@@ -74,7 +74,8 @@
 extern unsigned long dart_tablebase;
 #endif /* CONFIG_U3_DART */
 
-HTAB htab_data = {NULL, 0, 0, 0, 0};
+HPTE		*htab_address;
+unsigned long	htab_hash_mask;
 
 extern unsigned long _SDR1;
 
@@ -113,7 +114,7 @@
 
 		hash = hpt_hash(vpn, large);
 
-		hpteg = ((hash & htab_data.htab_hash_mask)*HPTES_PER_GROUP);
+		hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
 
 #ifdef CONFIG_PPC_PSERIES
 		if (systemcfg->platform & PLATFORM_LPAR)
@@ -155,12 +156,11 @@
 		htab_size_bytes = pteg_count << 7;
 	}
 
-	htab_data.htab_num_ptegs = pteg_count;
-	htab_data.htab_hash_mask = pteg_count - 1;
+	htab_hash_mask = pteg_count - 1;
 
 	if (systemcfg->platform & PLATFORM_LPAR) {
 		/* Using a hypervisor which owns the htab */
-		htab_data.htab = NULL;
+		htab_address = NULL;
 		_SDR1 = 0; 
 	} else {
 		/* Find storage for the HPT.  Must be contiguous in
@@ -175,7 +175,7 @@
 			ppc64_terminate_msg(0x20, "hpt space");
 			loop_forever();
 		}
-		htab_data.htab = abs_to_virt(table);
+		htab_address = abs_to_virt(table);
 
 		/* htab absolute addr + encoded htabsize */
 		_SDR1 = table + __ilog2(pteg_count) - 11;
@@ -356,7 +356,7 @@
 	secondary = (pte_val(pte) & _PAGE_SECONDARY) >> 15;
 	if (secondary)
 		hash = ~hash;
-	slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP;
+	slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 	slot += (pte_val(pte) & _PAGE_GROUP_IX) >> 12;
 
 	ppc_md.hpte_invalidate(slot, va, huge, local);
diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hugetlbpage.c linus-bk-sfr.13/arch/ppc64/mm/hugetlbpage.c
--- linus-bk-sfr.12/arch/ppc64/mm/hugetlbpage.c	2004-10-29 07:03:21.000000000 +1000
+++ linus-bk-sfr.13/arch/ppc64/mm/hugetlbpage.c	2005-01-04 19:02:45.000000000 +1100
@@ -832,7 +832,7 @@
 		hash = hpt_hash(vpn, 1);
 		if (pte_val(old_pte) & _PAGE_SECONDARY)
 			hash = ~hash;
-		slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP;
+		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 		slot += (pte_val(old_pte) & _PAGE_GROUP_IX) >> 12;
 
 		if (ppc_md.hpte_updatepp(slot, hpteflags, va, 1, local) == -1)
@@ -846,7 +846,7 @@
 		prpn = pte_pfn(old_pte);
 
 repeat:
-		hpte_group = ((hash & htab_data.htab_hash_mask) *
+		hpte_group = ((hash & htab_hash_mask) *
 			      HPTES_PER_GROUP) & ~0x7UL;
 
 		/* Update the linux pte with the HPTE slot */
@@ -863,13 +863,13 @@
 		/* Primary is full, try the secondary */
 		if (unlikely(slot == -1)) {
 			pte_val(new_pte) |= _PAGE_SECONDARY;
-			hpte_group = ((~hash & htab_data.htab_hash_mask) *
+			hpte_group = ((~hash & htab_hash_mask) *
 				      HPTES_PER_GROUP) & ~0x7UL; 
 			slot = ppc_md.hpte_insert(hpte_group, va, prpn,
 						  1, hpteflags, 0, 1);
 			if (slot == -1) {
 				if (mftb() & 0x1)
-					hpte_group = ((hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
+					hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
 
 				ppc_md.hpte_remove(hpte_group);
 				goto repeat;
diff -ruN linus-bk-sfr.12/arch/ppc64/mm/init.c linus-bk-sfr.13/arch/ppc64/mm/init.c
--- linus-bk-sfr.12/arch/ppc64/mm/init.c	2004-12-10 16:26:54.000000000 +1100
+++ linus-bk-sfr.13/arch/ppc64/mm/init.c	2005-01-04 19:03:14.000000000 +1100
@@ -168,7 +168,7 @@
 
 		hash = hpt_hash(vpn, 0);
 
-		hpteg = ((hash & htab_data.htab_hash_mask)*HPTES_PER_GROUP);
+		hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
 
 		/* Panic if a pte grpup is full */
 		if (ppc_md.hpte_insert(hpteg, va, pa >> PAGE_SHIFT, 0,
diff -ruN linus-bk-sfr.12/include/asm-ppc64/mmu.h linus-bk-sfr.13/include/asm-ppc64/mmu.h
--- linus-bk-sfr.12/include/asm-ppc64/mmu.h	2004-10-29 07:03:22.000000000 +1000
+++ linus-bk-sfr.13/include/asm-ppc64/mmu.h	2005-01-04 19:10:32.000000000 +1100
@@ -98,15 +98,8 @@
 #define PP_RXRX 3	/* Supervisor read,       User read */
 
 
-typedef struct {
-	HPTE *		htab;
-	unsigned long	htab_num_ptegs;
-	unsigned long	htab_hash_mask;
-	unsigned long	next_round_robin;
-	unsigned long   last_kernel_address;
-} HTAB;
-
-extern HTAB htab_data;
+extern HPTE *		htab_address;
+extern unsigned long	htab_hash_mask;
 
 static inline unsigned long hpt_hash(unsigned long vpn, int large)
 {

From moilanen at austin.ibm.com  Wed Jan  5 07:30:31 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Tue, 4 Jan 2005 14:30:31 -0600
Subject: [PATCH] xmon breakpoints fix for Power4/5
Message-ID: <20050104143031.62c25338@localhost>

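Looks like xmon hardware instruction breakpoints ("bi") were not working
on Power4/5, which don't have CPU_FTR_IABR.  Here's a fix: fall back to a
software trap breakpoint when the IABR isn't available.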

Tested on Power3 and Power5 boxes.  
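The fix amounts to choosing between the hardware IABR and a software trap, based on the CPU feature bit. Below is a user-space model of that decision. The struct and flag values are hypothetical stand-ins for xmon's. The posted hunk dereferences `bp` in the IABR branch without first checking it. This sketch tests `new_breakpoint()`'s result before using it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the real ones live in xmon.c. */
#define DEMO_BP_IABR_TE	0x1
#define DEMO_BP_IABR	0x2
#define DEMO_BP_TRAP	0x8

struct demo_bpt {
	unsigned long	address;
	unsigned int	enabled;
};

static struct demo_bpt *demo_iabr;	/* the one hardware IABR slot */

/* "bi" command: use the IABR if the CPU has one, else a trap breakpoint. */
static void demo_set_instr_bp(struct demo_bpt *bp, int cpu_has_iabr)
{
	if (bp == NULL)		/* new_breakpoint() may have failed */
		return;
	if (cpu_has_iabr) {
		bp->enabled |= DEMO_BP_IABR | DEMO_BP_IABR_TE;
		demo_iabr = bp;
	} else {
		bp->enabled |= DEMO_BP_TRAP;	/* software trap fallback */
	}
}
```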

Jake

Signed-off-by: Jake Moilanen <moilanen at austin.ibm.com>

---


diff -puN arch/ppc64/xmon/xmon.c~xmon-lpar-bp arch/ppc64/xmon/xmon.c
--- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-lpar-bp	Tue Jan  4 12:44:20 2005
+++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c	Tue Jan  4 14:13:09 2005
@@ -1088,11 +1088,6 @@ bpt_cmds(void)
 		break;
 
 	case 'i':	/* bi - hardware instr breakpoint */
-		if (!(cur_cpu_spec->cpu_features & CPU_FTR_IABR)) {
-			printf("Hardware instruction breakpoint "
-			       "not supported on this cpu\n");
-			break;
-		}
 		if (iabr) {
 			iabr->enabled &= ~(BP_IABR | BP_IABR_TE);
 			iabr = NULL;
@@ -1101,10 +1096,15 @@ bpt_cmds(void)
 			break;
 		if (!check_bp_loc(a))
 			break;
+
 		bp = new_breakpoint(a);
-		if (bp != NULL) {
+
+		if (cur_cpu_spec->cpu_features & CPU_FTR_IABR) {		
 			bp->enabled |= BP_IABR | BP_IABR_TE;
 			iabr = bp;
+		} else {
+			if (bp) 
+				bp->enabled |= BP_TRAP;
 		}
 		break;
 

_


From sjmunroe at us.ibm.com  Wed Jan  5 09:02:04 2005
From: sjmunroe at us.ibm.com (Steve Munroe)
Date: Tue, 4 Jan 2005 16:02:04 -0600
Subject: ppc64 vDSO update
In-Reply-To: <1101094716.13598.39.camel@gaston>
Message-ID: <OF3FD2DB7F.20FBC776-ON86256F7F.00763DF5-86256F7F.00790C47@us.ibm.com>

Benjamin Herrenschmidt <benh at kernel.crashing.org> wrote on 11/21/2004 09:38:36 PM:

> At the URL below, you can find a new version of the ppc64 vDSO patch against
> a recent Linus bk tree. I intend to submit it upstream real soon as the work
> on non-executable stack is waiting for it, though we must first make sure the
> way symbols are exported to userland is ok for glibc.
> 
> http://gate.crashing.org/~benh/ppc64-vdso-20041122.diff
> ...
> 
> (Craig: the signal issue is fixed now, either when building with
> descriptors or without).
> 
> Ben.
> 

Still having problems with VDSO/GLIBC integration. Basically any glibc
make check test that uses signals crashes on both PPC32 and PPC64.

First it seems that glibc is expecting a (fairly normal) DSO image 
including two (2) LOAD entries in the program header. The current PPC64 
kernel vdso images only contain one (1) LOAD entry:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00100000 0x00100000 0x00e10 0x00e10 R E 0x10000
  DYNAMIC        0x000d98 0x00100d98 0x00100d98 0x00078 0x00078 R   0x4
  GNU_EH_FRAME   0x000000 0x00000000 0x00000000 0x00000 0x00000     0x4

This caused problems for the code in libc/elf/rtld.c that attempts to 
extract l_map_start/l_map_end for the vdso:

              else if (ph->p_type == PT_LOAD)
                {
                  if (! l->l_addr)
                    l->l_addr = ph->p_vaddr;
                  else if (ph->p_vaddr + ph->p_memsz >= l->l_map_end)
                    l->l_map_end = ph->p_vaddr + ph->p_memsz;
                  else if ((ph->p_flags & PF_X)
                           && ph->p_vaddr + ph->p_memsz >= l->l_text_end)
                    l->l_text_end = ph->p_vaddr + ph->p_memsz;
                }

This code will set l_addr but not l_map_end or l_text_end, because it
grabs p_vaddr from the 1st and only LOAD entry and then continues the
loop looking for a 2nd LOAD entry (which is not there!). On PPC32 this
causes the "assert (mapend > mapstart)" in __elf_preferred_address to
fail. I hacked around this by removing the "else" from the "else if", but
it just fails later.

The remaining problem is that we are getting into dl_iterate_phdr and taking a
wild branch. This could be from the callback in dl_iterate_phdr and due to
the incomplete nature of our vDSO. This is difficult to debug as the stack
pointer (and TOC pointer on PPC64) are both clobbered by this point and
GDB-6.1 gets totally confused.

Ben: it would be handy if you could update the corefile support to include 
the vdso segments. Also please try a vdso with 2 LOAD segments. 

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center


From moilanen at austin.ibm.com  Wed Jan  5 09:13:54 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Tue, 4 Jan 2005 16:13:54 -0600
Subject: in_be64() assembly
Message-ID: <20050104161354.17f77ce7@localhost>

I'm trying to use in_be64(), but when I build, I get compile errors:

{standard input}: Assembler messages:
{standard input}:5534: Error: syntax error; found `(' but expected `)'
{standard input}:5534: Error: junk at end of line: `(3))'
make[1]: *** [arch/ppc64/xmon/xmon.o] Error 1
make: *** [arch/ppc64/xmon] Error 2
make: *** Waiting for unfinished jobs....

Olof pointed out that in/out_le64 use a "b" operand for the addr.

In in_be64(), when I changed the "m" operand to a "b", the kernel built
fine (although I haven't tried running it yet).  What does the "b"
operand mean?

Patch used below.

Thanks,
Jake

---


diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h
--- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix	Tue Jan  4 15:33:22 2005
+++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h	Tue Jan  4 15:59:50 2005
@@ -372,7 +372,7 @@ static inline unsigned long in_be64(cons
 	unsigned long ret;
 
 	__asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync"
-			     : "=r" (ret) : "m" (*addr));
+			     : "=r" (ret) : "b" (*addr));
 	return ret;
 }
 

_


From amodra at bigpond.net.au  Wed Jan  5 10:31:32 2005
From: amodra at bigpond.net.au (Alan Modra)
Date: Wed, 5 Jan 2005 10:01:32 +1030
Subject: ppc64 vDSO update
In-Reply-To: <OF3FD2DB7F.20FBC776-ON86256F7F.00763DF5-86256F7F.00790C47@us.ibm.com>
References: <1101094716.13598.39.camel@gaston>
	<OF3FD2DB7F.20FBC776-ON86256F7F.00763DF5-86256F7F.00790C47@us.ibm.com>
Message-ID: <20050104233132.GF11457@bubble.modra.org>

On Tue, Jan 04, 2005 at 04:02:04PM -0600, Steve Munroe wrote:
> First it seems that glibc is expecting a (fairly normal) DSO image 
> including two (2) LOAD entries in the program header. The current PPC64 
> kernel vdso images only contain one (1) LOAD entry:
> 
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD           0x000000 0x00100000 0x00100000 0x00e10 0x00e10 R E 0x10000
>   DYNAMIC        0x000d98 0x00100d98 0x00100d98 0x00078 0x00078 R   0x4
>   GNU_EH_FRAME   0x000000 0x00000000 0x00000000 0x00000 0x00000     0x4

There's absolutely nothing wrong with an executable or shared lib having
just one PT_LOAD segment.  It's a glibc bug if ld.so can't handle it.

> This caused problems for the code in libc/elf/rtld.c that attempts to 
> extract l_map_start/l_map_end for the vdso:
> 
>               else if (ph->p_type == PT_LOAD)
>                 {
>                   if (! l->l_addr)
>                     l->l_addr = ph->p_vaddr;
>                   else if (ph->p_vaddr + ph->p_memsz >= l->l_map_end)
>                     l->l_map_end = ph->p_vaddr + ph->p_memsz;
>                   else if ((ph->p_flags & PF_X)
>                            && ph->p_vaddr + ph->p_memsz >= l->l_text_end)
>                     l->l_text_end = ph->p_vaddr + ph->p_memsz;
>                 }
> 
> This code will set l_addr but not l_map_end or l_text_end, because it 
> grabs the p_vaddr from the 1st and only LOAD entry and then continues the 
> loop looking for a 2nd LOAD entry (which is not there!). On PPC32 this 
> causes the "assert (mapend > mapstart)" in __elf_preferred_address to 
> fail. I hacked around this by removing the "else" from the "else if", but 
> it just fails later.

Buggy code.  All the "else" keywords should be removed, i.e.:

		  if (! l->l_addr)
		    l->l_addr = ph->p_vaddr;
		  if (ph->p_vaddr + ph->p_memsz >= l->l_map_end)
		    l->l_map_end = ph->p_vaddr + ph->p_memsz;
		  if ((ph->p_flags & PF_X)
		      && ph->p_vaddr + ph->p_memsz >= l->l_text_end)
		    l->l_text_end = ph->p_vaddr + ph->p_memsz;

> The remaining problem is that we are getting into dl_iterate_phdr and taking a 
> wild branch. This could be from the callback in dl_iterate_phdr and due to 
> the incomplete nature of our vdso. This is difficult to debug, as the stack 
> pointer (and TOC pointer in PPC64) are both clobbered by this point and 
> GDB-6.1 gets totally confused.

I don't know what to suggest, other than brute force debugging by poking
.long 0 over code paths you suspect might be executed.

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre


From paulus at samba.org  Wed Jan  5 10:53:34 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 5 Jan 2005 10:53:34 +1100
Subject: [PATCH] xmon breakpoints fix for Power4/5
In-Reply-To: <20050104143031.62c25338@localhost>
References: <20050104143031.62c25338@localhost>
Message-ID: <16859.11390.511469.875831@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> Looks like xmon breakpoints were not working on Power4/5.  Here's a fix
> to the problem.  

You mean the 'bi' command didn't make a breakpoint?  Just use the 'b'
command instead.  Also you take out the if (bp != NULL) check which is
needed.

Rejected.

Paul.


From linas at austin.ibm.com  Wed Jan  5 11:10:16 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Tue, 4 Jan 2005 18:10:16 -0600
Subject: in_be64() assembly
In-Reply-To: <20050104161354.17f77ce7@localhost>
References: <20050104161354.17f77ce7@localhost>
Message-ID: <20050105001016.GC22274@austin.ibm.com>

On Tue, Jan 04, 2005 at 04:13:54PM -0600, Jake Moilanen was heard to remark:
> 
> diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h
> --- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix	Tue Jan  4 15:33:22 2005
> +++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h	Tue Jan  4 15:59:50 2005
> @@ -372,7 +372,7 @@ static inline unsigned long in_be64(cons
>  	unsigned long ret;
>  
>  	__asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync"
> -			     : "=r" (ret) : "m" (*addr));
> +			     : "=r" (ret) : "b" (*addr));
>  	return ret;
>  }


Very weird.  Why anyone thought that doing a load with a zero offset 
is somehow 'correct' seems strange to me.  The compiler is quite
capable of computing offsets, and I don't see any aliasing issues.
Certainly the 8, 16 and 32-bit versions don't do this kind of 
funny business.

Does the following work?

static inline unsigned long in_be64(const whatever ...)
{
	unsigned long ret;

	__asm__ __volatile__("ld %0,%1; twi 0,%0,0; isync"
			     : "=r" (ret) : "m" (*addr));
	return ret;
}

I suspect in_le64 is also borken,  it should be

"ld %1,%2\n"  

...with

: "=r" (ret) , "=r" (tmp) : "m" (*addr) ,

instead of the b.

out_le64 looks broken in the same way.

--linas


From paulus at samba.org  Wed Jan  5 11:24:44 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 5 Jan 2005 11:24:44 +1100
Subject: in_be64() assembly
In-Reply-To: <20050104161354.17f77ce7@localhost>
References: <20050104161354.17f77ce7@localhost>
Message-ID: <16859.13260.426004.296846@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> In in_be64(), when I changed the "m" operand to a "b", the kernel built
> fine (although I haven't tried running it yet).  What does the "b"
> operand mean?

"b" means the value should be in a "base" register, i.e. any gpr other
than gpr0.

Your patch isn't correct.  We can either do:

	__asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync"
			     : "=r" (ret) : "b" (addr));

(note no "*" before addr) or we can do

	__asm__ __volatile__("ld%U1%X1 %0,%1; twi 0,%0,0; isync"
			     : "=r" (ret) : "m" (*addr));

On the whole I think I prefer the second.

Paul.


From paulus at samba.org  Wed Jan  5 11:35:34 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 5 Jan 2005 11:35:34 +1100
Subject: in_be64() assembly
In-Reply-To: <20050105001016.GC22274@austin.ibm.com>
References: <20050104161354.17f77ce7@localhost>
	<20050105001016.GC22274@austin.ibm.com>
Message-ID: <16859.13910.16173.232170@cargo.ozlabs.ibm.com>

Linas Vepstas writes:

> Very weird.  Why anyone thought that doing a load with a zero offset 
> is somehow 'correct' seems strange to me.  The compiler is quite

It's one of the two addressing modes that PPC has - register + offset
and register + register.

> I suspect in_le64 is also borken,  it should be
> 
> "ld %1,%2\n"  

It and out_le64 are correct as they stand.  They could be rewritten as
"ld%U2%X2 %1,%2" etc.

Paul.


From david at gibson.dropbear.id.au  Wed Jan  5 14:54:28 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Wed, 5 Jan 2005 14:54:28 +1100
Subject: [PPC64] Add performance monitor register information to processor.h
Message-ID: <20050105035428.GC21259@zax>

Andrew, please apply:

Most special purpose registers on the ppc64 have both the SPR number
and the various fields within the register defined in
asm-ppc64/processor.h.  So far that's not true for the performance
counter control registers, MMCR0 and MMCRA.  They have the SPR numbers
defined, but the internal fields are defined in the oprofile code and
(just a few) in traps.c where they're actually used.

This patch moves all the MMCR0 and MMCRA definitions, plus the MSR
performance monitor bit, MSR_PMM, into processor.h.

Index: working-2.6/include/asm-ppc64/processor.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/processor.h	2005-01-05 14:46:10.557311664 +1100
+++ working-2.6/include/asm-ppc64/processor.h	2005-01-05 14:46:12.551274880 +1100
@@ -44,6 +44,7 @@
 #define MSR_DR_LG	4 		/* Data Relocate */
 #define MSR_PE_LG	3		/* Protection Enable */
 #define MSR_PX_LG	2		/* Protection Exclusive Mode */
+#define MSR_PMM_LG	2		/* Performance monitor */
 #define MSR_RI_LG	1		/* Recoverable Exception */
 #define MSR_LE_LG	0 		/* Little Endian */
 
@@ -76,6 +77,7 @@
 #define MSR_DR		__MASK(MSR_DR_LG)	/* Data Relocate */
 #define MSR_PE		__MASK(MSR_PE_LG)	/* Protection Enable */
 #define MSR_PX		__MASK(MSR_PX_LG)	/* Protection Exclusive Mode */
+#define MSR_PMM		__MASK(MSR_PMM_LG)	/* Performance monitor */
 #define MSR_RI		__MASK(MSR_RI_LG)	/* Recoverable Exception */
 #define MSR_LE		__MASK(MSR_LE_LG)	/* Little Endian */
 
@@ -305,6 +307,9 @@
 #define SPRN_SIAR	780
 #define SPRN_SDAR	781
 #define SPRN_MMCRA	786
+#define   MMCRA_SIHV	0x10000000UL /* state of MSR HV when SIAR set */
+#define   MMCRA_SIPR	0x08000000UL /* state of MSR PR when SIAR set */
+#define   MMCRA_SAMPLE_ENABLE 0x00000001UL /* enable sampling */
 #define SPRN_PMC1	787
 #define SPRN_PMC2	788
 #define SPRN_PMC3	789
@@ -314,6 +319,26 @@
 #define SPRN_PMC7	793
 #define SPRN_PMC8	794
 #define SPRN_MMCR0	795
+#define   MMCR0_FC	0x80000000UL /* freeze counters. set to 1 on a perfmon exception */
+#define   MMCR0_FCS	0x40000000UL /* freeze in supervisor state */
+#define   MMCR0_KERNEL_DISABLE MMCR0_FCS
+#define   MMCR0_FCP	0x20000000UL /* freeze in problem state */
+#define   MMCR0_PROBLEM_DISABLE MMCR0_FCP
+#define   MMCR0_FCM1	0x10000000UL /* freeze counters while MSR mark = 1 */
+#define   MMCR0_FCM0	0x08000000UL /* freeze counters while MSR mark = 0 */
+#define   MMCR0_PMXE	0x04000000UL /* performance monitor exception enable */
+#define   MMCR0_FCECE	0x02000000UL /* freeze counters on enabled condition or event */
+/* time base exception enable */
+#define   MMCR0_TBEE	0x00400000UL /* time base exception enable */
+#define   MMCR0_PMC1INTCONTROL	0x00008000UL /* PMC1 count enable*/
+#define   MMCR0_PMCNINTCONTROL	0x00004000UL /* PMCn count enable*/
+#define   MMCR0_TRIGGER	0x00002000UL /* TRIGGER enable */
+#define   MMCR0_PMAO	0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */
+#define   MMCR0_SHRFC	0x00000040UL /* SHRre freeze conditions between threads */
+#define   MMCR0_FCTI	0x00000008UL /* freeze counters in tags inactive mode */
+#define   MMCR0_FCTA	0x00000004UL /* freeze counters in tags active mode */
+#define   MMCR0_FCWAIT	0x00000002UL /* freeze counter in WAIT state */
+#define   MMCR0_FCHV	0x00000001UL /* freeze conditions in hypervisor mode */
 #define SPRN_MMCR1	798
 
 /* Short-hand versions for a number of the above SPRNs */
Index: working-2.6/arch/ppc64/oprofile/op_impl.h
===================================================================
--- working-2.6.orig/arch/ppc64/oprofile/op_impl.h	2005-01-05 14:46:10.558311512 +1100
+++ working-2.6/arch/ppc64/oprofile/op_impl.h	2005-01-05 14:46:12.551274880 +1100
@@ -14,44 +14,6 @@
 
 #define OP_MAX_COUNTER 8
 
-#define MSR_PMM		(1UL << (63 - 61))
-
-/* freeze counters. set to 1 on a perfmon exception */
-#define MMCR0_FC	(1UL << (31 - 0))
-
-/* freeze in supervisor state */
-#define MMCR0_KERNEL_DISABLE (1UL << (31 - 1))
-
-/* freeze in problem state */
-#define MMCR0_PROBLEM_DISABLE (1UL << (31 - 2))
-
-/* freeze counters while MSR mark = 1 */
-#define MMCR0_FCM1	(1UL << (31 - 3))
-
-/* performance monitor exception enable */
-#define MMCR0_PMXE	(1UL << (31 - 5))
-
-/* freeze counters on enabled condition or event */
-#define MMCR0_FCECE	(1UL << (31 - 6))
-
-/* PMC1 count enable*/
-#define MMCR0_PMC1INTCONTROL	(1UL << (31 - 16))
-
-/* PMCn count enable*/
-#define MMCR0_PMCNINTCONTROL	(1UL << (31 - 17))
-
-/* performance monitor alert has occurred, set to 0 after handling exception */
-#define MMCR0_PMAO	(1UL << (31 - 24))
-
-/* state of MSR HV when SIAR set */
-#define MMCRA_SIHV	(1UL << (63 - 35))
-
-/* state of MSR PR when SIAR set */
-#define MMCRA_SIPR	(1UL << (63 - 36))
-
-/* enable sampling */
-#define MMCRA_SAMPLE_ENABLE	(1UL << (63 - 63))
-
 /* Per-counter configuration as set via oprofilefs.  */
 struct op_counter_config {
 	unsigned long valid;
Index: working-2.6/arch/ppc64/kernel/traps.c
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/traps.c	2005-01-05 14:46:10.558311512 +1100
+++ working-2.6/arch/ppc64/kernel/traps.c	2005-01-05 14:46:12.552274728 +1100
@@ -545,9 +545,6 @@
 }
 
 /* Ensure exceptions are disabled */
-#define MMCR0_PMXE      (1UL << (31 - 5))
-#define MMCR0_PMAO      (1UL << (31 - 24))
-
 static void dummy_perf(struct pt_regs *regs)
 {
 	unsigned int mmcr0 = mfspr(SPRN_MMCR0);


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist.  NOT _the_ _other_ _way_
				| _around_!
http://www.ozlabs.org/people/dgibson


From paulus at samba.org  Wed Jan  5 16:09:38 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 5 Jan 2005 16:09:38 +1100
Subject: [PATCH] PPC64 Use newer RTAS call when available
Message-ID: <16859.30354.953245.690482@cargo.ozlabs.ibm.com>

This patch is from Nathan Fontenot <nfont at austin.ibm.com> originally.

The PPC64 EEH code needs a small update to start using the
ibm,read-slot-reset-state2 rtas call if available.  The currently
used ibm,read-slot-reset-state call will be going away on future
machines.

This patch attempts to use the newer rtas call if available and falls
back to the older version otherwise.  This will maintain EEH slot checking
capabilities on all future and current firmware levels.

Signed-off-by: Nathan Fontenot <nfont at austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus at samba.org>

diff -urN base-2.6/arch/ppc64/kernel/eeh.c test/arch/ppc64/kernel/eeh.c
--- base-2.6/arch/ppc64/kernel/eeh.c	2005-01-05 14:29:58.333466400 +1100
+++ test/arch/ppc64/kernel/eeh.c	2005-01-05 15:04:59.937483424 +1100
@@ -96,6 +96,7 @@
 static int ibm_set_eeh_option;
 static int ibm_set_slot_reset;
 static int ibm_read_slot_reset_state;
+static int ibm_read_slot_reset_state2;
 static int ibm_slot_error_detail;
 
 static int eeh_subsystem_enabled;
@@ -408,6 +409,27 @@
 }
 
 /**
+ * read_slot_reset_state - Read the reset state of a device node's slot
+ * @dn: device node to read
+ * @rets: array to return results in
+ */
+static int read_slot_reset_state(struct device_node *dn, int rets[])
+{
+	int token, outputs;
+
+	if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) {
+		token = ibm_read_slot_reset_state2;
+		outputs = 4;
+	} else {
+		token = ibm_read_slot_reset_state;
+		outputs = 3;
+	}
+	
+	return rtas_call(token, 3, outputs, rets, dn->eeh_config_addr, 
+			 BUID_HI(dn->phb->buid), BUID_LO(dn->phb->buid));
+}
+
+/**
  * eeh_panic - call panic() for an eeh event that cannot be handled.
  * The philosophy of this routine is that it is better to panic and
  * halt the OS than it is to risk possible data corruption by
@@ -509,7 +531,7 @@
 int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev)
 {
 	int ret;
-	int rets[2];
+	int rets[3];
 	unsigned long flags;
 	int rc, reset_state;
 	struct eeh_event  *event;
@@ -540,11 +562,8 @@
 		atomic_inc(&eeh_fail_count);
 		if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) {
 			/* re-read the slot reset state */
-			rets[0] = -1;
-			rtas_call(ibm_read_slot_reset_state, 3, 3, rets,
-				  dn->eeh_config_addr,
-				  BUID_HI(dn->phb->buid),
-				  BUID_LO(dn->phb->buid));
+			if (read_slot_reset_state(dn, rets) != 0)
+				rets[0] = -1;	/* reset state unknown */
 			eeh_panic(dev, rets[0]);
 		}
 		return 0;
@@ -557,10 +576,7 @@
 	 * function zero of a multi-function device.
 	 * In any case they must share a common PHB.
 	 */
-	ret = rtas_call(ibm_read_slot_reset_state, 3, 3, rets,
-			dn->eeh_config_addr, BUID_HI(dn->phb->buid),
-			BUID_LO(dn->phb->buid));
-
+	ret = read_slot_reset_state(dn, rets);
 	if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) {
 		__get_cpu_var(false_positives)++;
 		return 0;
@@ -756,6 +772,7 @@
 
 	ibm_set_eeh_option = rtas_token("ibm,set-eeh-option");
 	ibm_set_slot_reset = rtas_token("ibm,set-slot-reset");
+	ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2");
 	ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state");
 	ibm_slot_error_detail = rtas_token("ibm,slot-error-detail");
 

From moilanen at austin.ibm.com  Thu Jan  6 01:42:02 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Wed, 5 Jan 2005 08:42:02 -0600
Subject: [PATCH] xmon breakpoints fix for Power4/5
In-Reply-To: <16859.11390.511469.875831@cargo.ozlabs.ibm.com>
References: <20050104143031.62c25338@localhost>
	<16859.11390.511469.875831@cargo.ozlabs.ibm.com>
Message-ID: <20050105084202.5102b467@localhost>

On Wed, 5 Jan 2005 10:53:34 +1100
Paul Mackerras <paulus at samba.org> wrote:

> Jake Moilanen writes:
> 
> > Looks like xmon breakpoints were not working on Power4/5.  Here's a fix
> > to the problem.  
> 
> You mean the 'bi' command didn't make a breakpoint?  Just use the 'b'
> command instead.  Also you take out the if (bp != NULL) check which is
> needed.

I may have misunderstood what Anton wanted when I talked w/ him
yesterday, but I was under the impression that he wanted 'bi' and 'bd'
fixed for Power4/5/LPAR.  

I pretty much just made 'bi' work like 'b' for Power4/5.  I should have
been a little more explicit when I wrote up the description of the
patch. 

If I misunderstood, please just throw this follow up patch away.  

In the follow-up, I also included the (bp != NULL) check, even though it
should not matter because we reuse the same bp every time.  I do agree
that it should still have the check.  

I will be posting the 'bd' fix for LPAR shortly.

Thanks,
Jake

Signed-off-by: Jake Moilanen <moilanen at austin.ibm.com>

---


diff -puN arch/ppc64/xmon/xmon.c~xmon-lpar-bp arch/ppc64/xmon/xmon.c
--- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-lpar-bp	Wed Jan  5 08:14:09 2005
+++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c	Wed Jan  5 08:15:48 2005
@@ -1050,7 +1050,7 @@ static char *breakpoint_help_string = 
     "b <addr> [cnt]   set breakpoint at given instr addr\n"
     "bc               clear all breakpoints\n"
     "bc <n/addr>      clear breakpoint number n or at addr\n"
-    "bi <addr> [cnt]  set hardware instr breakpoint (broken?)\n"
+    "bi <addr> [cnt]  set hardware instr breakpoint\n"
     "bd <addr> [cnt]  set hardware data breakpoint (broken?)\n"
     "";
 
@@ -1088,11 +1088,6 @@ bpt_cmds(void)
 		break;
 
 	case 'i':	/* bi - hardware instr breakpoint */
-		if (!(cur_cpu_spec->cpu_features & CPU_FTR_IABR)) {
-			printf("Hardware instruction breakpoint "
-			       "not supported on this cpu\n");
-			break;
-		}
 		if (iabr) {
 			iabr->enabled &= ~(BP_IABR | BP_IABR_TE);
 			iabr = NULL;
@@ -1101,11 +1096,16 @@ bpt_cmds(void)
 			break;
 		if (!check_bp_loc(a))
 			break;
+
 		bp = new_breakpoint(a);
-		if (bp != NULL) {
-			bp->enabled |= BP_IABR | BP_IABR_TE;
-			iabr = bp;
+		if (bp) {
+			if (cur_cpu_spec->cpu_features & CPU_FTR_IABR) {		
+				bp->enabled |= BP_IABR | BP_IABR_TE;
+				iabr = bp;
+			} else 
+				bp->enabled |= BP_TRAP;
 		}
+		
 		break;
 
 	case 'c':

_


From moilanen at austin.ibm.com  Thu Jan  6 01:52:19 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Wed, 5 Jan 2005 08:52:19 -0600
Subject: [PATCH] xmon dabr support for LPAR
Message-ID: <20050105085219.5eab02a8@localhost>

Here's xmon DABR support for LPAR.  

I added SETCTRLREG which is a wrapper for setting a controlled register
that will choose to use either an hcall or mtspr depending on what mode
the machine is in.

Thanks,
Jake

Signed-off-by: Jake Moilanen <moilanen at austin.ibm.com>

---


diff -puN arch/ppc64/xmon/xmon.c~xmon-lpar-dabr arch/ppc64/xmon/xmon.c
--- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-lpar-dabr	Wed Jan  5 08:17:07 2005
+++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c	Wed Jan  5 08:23:50 2005
@@ -712,7 +712,7 @@ static void insert_bpts(void)
 static void insert_cpu_bpts(void)
 {
 	if (dabr.enabled)
-		set_dabr(dabr.address | (dabr.enabled & 7));
+		set_controlled_dabr(dabr.address | (dabr.enabled & 7));
 	if (iabr && (cur_cpu_spec->cpu_features & CPU_FTR_IABR))
 		set_iabr(iabr->address
 			 | (iabr->enabled & (BP_IABR|BP_IABR_TE)));
@@ -740,7 +740,7 @@ static void remove_bpts(void)
 
 static void remove_cpu_bpts(void)
 {
-	set_dabr(0);
+	set_controlled_dabr(0);
 	if ((cur_cpu_spec->cpu_features & CPU_FTR_IABR))
 		set_iabr(0);
 }
@@ -1051,7 +1051,7 @@ static char *breakpoint_help_string = 
     "bc               clear all breakpoints\n"
     "bc <n/addr>      clear breakpoint number n or at addr\n"
     "bi <addr> [cnt]  set hardware instr breakpoint\n"
-    "bd <addr> [cnt]  set hardware data breakpoint (broken?)\n"
+    "bd <addr> [cnt]  set hardware data breakpoint\n"
     "";
 
 static void
diff -puN arch/ppc64/xmon/privinst.h~xmon-lpar-dabr arch/ppc64/xmon/privinst.h
--- linux-2.6-bk/arch/ppc64/xmon/privinst.h~xmon-lpar-dabr	Wed Jan  5 08:17:22 2005
+++ linux-2.6-bk-moilanen/arch/ppc64/xmon/privinst.h	Wed Jan  5 08:20:02 2005
@@ -25,6 +25,16 @@ GETREG(cr)
     static inline void set_ ## name (long val) \
 	{ asm volatile ("mtspr " #n ",%0" : : "r" (val)); }
 
+/*
+ * If a register is a controlled resource protected when there
+ * is a hypervisor, then use this command.
+ */
+#define SETCTRLREG(name)	\
+	extern inline void set_lpar_ ##name(long val); \
+	extern inline void set_controlled_ ## name (long val) \
+	{ (systemcfg->platform == PLATFORM_PSERIES_LPAR) ? \
+	  set_lpar_ ##name (val) : set_ ##name (val); }
+
 GSETSPR(0, mq)
 GSETSPR(1, xer)
 GSETSPR(4, rtcu)
@@ -48,6 +58,8 @@ GSETSPR(1009, hid1)
 GSETSPR(1010, iabr)
 GSETSPR(1013, dabr)
 GSETSPR(1023, pir)
+
+SETCTRLREG(dabr)
 
 static inline void store_inst(void *p)
 {
diff -puN arch/ppc64/xmon/start.c~xmon-lpar-dabr arch/ppc64/xmon/start.c
--- linux-2.6-bk/arch/ppc64/xmon/start.c~xmon-lpar-dabr	Wed Jan  5 08:17:49 2005
+++ linux-2.6-bk-moilanen/arch/ppc64/xmon/start.c	Wed Jan  5 08:20:49 2005
@@ -46,6 +46,16 @@ static int __init setup_xmon_sysrq(void)
 __initcall(setup_xmon_sysrq);
 #endif /* CONFIG_MAGIC_SYSRQ */
 
+inline void set_lpar_dabr(long val)
+{
+	int rc;
+	
+	rc = plpar_hcall_norets(H_SET_DABR, val);
+
+	if (rc != H_Success)
+		xmon_printf("Warning: setting DABR failed. rc = %d\n", rc);
+}
+
 int
 xmon_write(void *handle, void *ptr, int nb)
 {

_


From linas at austin.ibm.com  Thu Jan  6 07:27:56 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Wed, 5 Jan 2005 14:27:56 -0600
Subject: [PATCH] PPC64: xmon recursion
Message-ID: <20050105202756.GF22274@austin.ibm.com>


Hi,

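I've had a number of problems with recursive xmon calls, primarily
because xmon_longjmp() was returning incorrectly: its inline asm restored
the link register but never branched to it.  The following patch adds the
missing blr, which fixes this problem.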

Please review and forward upstream.

--linas

Signed-off-by: Linas Vepstas <linas at linas.org>


===== arch/ppc64/xmon/setjmp.c 1.1 vs edited =====
--- 1.1/arch/ppc64/xmon/setjmp.c	2002-02-14 06:14:36 -06:00
+++ edited/arch/ppc64/xmon/setjmp.c	2004-12-14 17:51:29 -06:00
@@ -73,5 +73,6 @@ xmon_longjmp(long *buf, int val)
 	 ld	2,16(%0)\n\
 	 mtlr	0\n\
 	 mr	3,%1\n\
+	 blr	\n\
 	 " : : "r" (buf), "r" (val));
 }


From moilanen at austin.ibm.com  Thu Jan  6 07:45:02 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Wed, 5 Jan 2005 14:45:02 -0600
Subject: [PATCH 0/2] xmon io space read
Message-ID: <20050105144502.56a15bcd@localhost>

These patches allow xmon to read from ioremapped IO space.  

It uses a command very similar to the normal memory read.  I elected to
not reuse the memory read code because I wanted some extra "security"
to help prevent an inadvertent destructive read.

I also had to add a debugger_fault_handler() call in bad_page_fault() to
catch an illegal attempt at hashing a bad page via an hcall.

Patch 1/2: Fix for in_be64()

Patch 2/2: xmon code to read from io space.

Thanks,
Jake


From moilanen at austin.ibm.com  Thu Jan  6 07:52:55 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Wed, 5 Jan 2005 14:52:55 -0600
Subject: [PATCH 1/2] xmon io space read
In-Reply-To: <20050105144502.56a15bcd@localhost>
References: <20050105144502.56a15bcd@localhost>
Message-ID: <20050105145255.41819748@localhost>

Here is the fix suggested by Paulus for in_be64().

Thanks,
Jake

Signed-off-by: Jake Moilanen <moilanen at austin.ibm.com>

---


diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h
--- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix	Tue Jan  4 15:33:22 2005
+++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h	Wed Jan  5 08:08:03 2005
@@ -371,7 +371,7 @@ static inline unsigned long in_be64(cons
 {
 	unsigned long ret;
 
-	__asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync"
+	__asm__ __volatile__("ld%U1%X1 %0,%1; twi 0,%0,0; isync"
 			     : "=r" (ret) : "m" (*addr));
 	return ret;
 }

_


From moilanen at austin.ibm.com  Thu Jan  6 07:57:57 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Wed, 5 Jan 2005 14:57:57 -0600
Subject: [PATCH 2/2] xmon io space read
In-Reply-To: <20050105144502.56a15bcd@localhost>
References: <20050105144502.56a15bcd@localhost>
Message-ID: <20050105145757.62c84c3b@localhost>

Here is the support code for xmon to read IO space.  

It should come in handy to debug driver and bringup issues.

Signed-off-by: Jake Moilanen <moilanen at austin.ibm.com>

---


diff -puN arch/ppc64/xmon/xmon.c~xmon-io-read arch/ppc64/xmon/xmon.c
--- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-io-read	Wed Jan  5 11:50:57 2005
+++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c	Wed Jan  5 14:19:57 2005
@@ -93,6 +93,7 @@ static int mwrite(unsigned long, void *,
 static int handle_fault(struct pt_regs *);
 static void byterev(unsigned char *, int);
 static void memex(void);
+static void iomemex(void);
 static int bsesc(void);
 static void dump(void);
 static void prdump(unsigned long, long);
@@ -175,6 +176,7 @@ Commands:\n\
   di	dump instructions\n\
   df	dump float values\n\
   dd	dump double values\n\
+  i	IO memory dump\n\
   e	print exception information\n\
   f	flush cache\n\
   la	lookup symbol+offset of specified address\n\
@@ -794,6 +796,9 @@ cmds(struct pt_regs *excp)
 				memex();
 			}
 			break;
+		case 'i':
+			iomemex();
+			break;
 		case 'd':
 			dump();
 			break;
@@ -1855,6 +1860,130 @@ memex(void)
 		}
 		adrs += inc;
 	}
+}
+
+static char *iomemex_help_string = 
+    "IO Memory examine command usage:\n"
+    "i addr [size] [options]\n"
+    "  size may include chars from this set:\n"
+    "    1   examine byte  (default)\n"
+    "    2   examine short (2 byte)\n"
+    "    4   examine int   (4 byte)\n"
+    "    8   examine long  (8 byte)\n"
+    "  options may include chars from this set:\n"
+    "    l   little endian (default)\n"
+    "    b   big endian\n"
+    "    a   absolute address - does not add on pci_io_base\n"
+    "NOTE: Defaults to adding on pci_io_base\n"
+    "";
+
+
+#define LE 0
+#define BE 1
+
+static void
+ioread(unsigned long addr, int size, int endiness)
+{
+	int i;
+	long data;
+	
+	if (setjmp(bus_error_jmp) == 0) {
+		catch_memory_errors = 1;
+		sync();
+		switch (size) {
+		case 1:
+			data = in_8((char *)addr);
+			sync();
+			__delay(200);
+			printf("%.16lx: 0x%.2x\n", addr, data);
+			break;
+
+		case 2:
+			data = endiness ? in_be16((short *)addr) : in_le16((short *)addr);
+			sync();
+			__delay(200);
+			printf("%.16lx: 0x%.4x\n", addr, data);
+			break;
+		case 4:
+			data = endiness ? in_be32((int *)addr) : in_le32((int *)addr);
+			sync();
+			__delay(200);
+			printf("%.16lx: 0x%.8x\n", addr, data);
+			break;
+		case 8:
+			data = endiness ? in_be64((long *)addr) : in_le64((long *)addr);
+			sync();
+			__delay(200);
+			printf("%.16lx: 0x%.16x\n", addr, data);
+			break;
+		default:
+			printf("ioread: invalid size (%d)\n", size);
+		}
+	} else {
+		printf("%.16lx: ", addr);
+		for (i = 0; i < size; i++)
+			printf("%s", fault_chars[fault_type]);
+		printf("\n");
+	}
+	
+	catch_memory_errors = 0;
+
+}
+
+static void
+iomemex(void)
+{
+	int size = 1;
+	int cmd;
+	int endiness = LE;
+	int absolute = 0;
+
+	scanhex((void *)&adrs);
+	cmd = skipbl();
+	if (cmd == '?') {
+		printf(iomemex_help_string);
+		return;
+	} else if (cmd == '\n' && !adrs) {
+		printf("pci_io_base: 0x%lx\n", pci_io_base);
+		return;
+	}
+
+	termch = cmd;
+
+	while ((cmd = skipbl()) != '\n') {
+		switch (cmd) {
+		case '1':	size = 1;	break;
+		case '2':	size = 2;	break;
+		case '4':	size = 4;	break;
+		case '8':	size = 8;	break;
+		case 'l':	endiness = LE; break;
+		case 'b':	endiness = BE; break;
+		case 'a':	absolute = 1; break;
+		}
+	} 
+
+	if(size <= 0)
+		size = 1;
+	else if(size > 8)
+		size = 8;
+
+	if (!absolute)
+		adrs += pci_io_base;
+
+	printf("Will attempt to read:\n");
+	printf("address:\t0x%lx\n", adrs);
+	printf("size:\t\t0x%lx\n", size);
+	printf("endiness:\t%s\n", endiness ? "Big" : "Little");
+	printf("Are you sure (Y/n): ");
+	fflush(stdout);
+	flush_input();
+
+	cmd = skipbl();
+
+	if (cmd == 'n')
+		return;
+
+	ioread(adrs, size, endiness);
 }
 
 int
diff -puN arch/ppc64/mm/fault.c~xmon-io-read arch/ppc64/mm/fault.c
--- linux-2.6-bk/arch/ppc64/mm/fault.c~xmon-io-read	Wed Jan  5 13:27:30 2005
+++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c	Wed Jan  5 13:27:38 2005
@@ -297,6 +297,9 @@ void bad_page_fault(struct pt_regs *regs
 		return;
 	}
 
+	if (debugger_fault_handler(regs))
+		return;
+
 	/* kernel has accessed a bad area */
 	die("Kernel access of bad area", regs, sig);
 }

_


From olof at austin.ibm.com  Thu Jan  6 11:07:21 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Wed, 5 Jan 2005 18:07:21 -0600
Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c
Message-ID: <20050106000721.GA20029@austin.ibm.com>

Hi,

This patch renames pci_dma_direct.c to pci_direct_iommu.c to comply with
the naming convention of the other iommu files.

This is part of the iommu cleanup, but is broken out as a separate patch
since, for mainline, a BK rename is more appropriate. Still, we need a
patch to apply to non-BK-based trees (-mm).

Signed-off-by: Olof Johansson <olof at austin.ibm.com>


---

 linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c |   89 ++++++++++++++++++++
 linux-2.5/arch/ppc64/kernel/pci_dma_direct.c        |   89 --------------------
 2 files changed, 89 insertions(+), 89 deletions(-)

diff -L arch/ppc64/kernel/pci_dma_direct.c -puN arch/ppc64/kernel/pci_dma_direct.c~iommu-rename-pci_dma_direct /dev/null
--- linux-2.5/arch/ppc64/kernel/pci_dma_direct.c
+++ /dev/null	2004-12-07 13:25:26.079467688 -0600
@@ -1,89 +0,0 @@
-/*
- * Support for DMA from PCI devices to main memory on
- * machines without an iommu or with directly addressable
- * RAM (typically a pmac with 2Gb of RAM or less)
- *
- * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/kernel.h>
-#include <linux/pci.h>
-#include <linux/delay.h>
-#include <linux/string.h>
-#include <linux/init.h>
-#include <linux/bootmem.h>
-#include <linux/mm.h>
-#include <linux/dma-mapping.h>
-
-#include <asm/sections.h>
-#include <asm/io.h>
-#include <asm/prom.h>
-#include <asm/pci-bridge.h>
-#include <asm/machdep.h>
-#include <asm/pmac_feature.h>
-#include <asm/abs_addr.h>
-
-#include "pci.h"
-
-static void *pci_direct_alloc_consistent(struct pci_dev *hwdev, size_t size,
-				   dma_addr_t *dma_handle)
-{
-	void *ret;
-
-	ret = (void *)__get_free_pages(GFP_ATOMIC, get_order(size));
-	if (ret != NULL) {
-		memset(ret, 0, size);
-		*dma_handle = virt_to_abs(ret);
-	}
-	return ret;
-}
-
-static void pci_direct_free_consistent(struct pci_dev *hwdev, size_t size,
-				 void *vaddr, dma_addr_t dma_handle)
-{
-	free_pages((unsigned long)vaddr, get_order(size));
-}
-
-static dma_addr_t pci_direct_map_single(struct pci_dev *hwdev, void *ptr,
-		size_t size, enum dma_data_direction direction)
-{
-	return virt_to_abs(ptr);
-}
-
-static void pci_direct_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
-		size_t size, enum dma_data_direction direction)
-{
-}
-
-static int pci_direct_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
-		int nents, enum dma_data_direction direction)
-{
-	int i;
-
-	for (i = 0; i < nents; i++, sg++) {
-		sg->dma_address = page_to_phys(sg->page) + sg->offset;
-		sg->dma_length = sg->length;
-	}
-
-	return nents;
-}
-
-static void pci_direct_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
-		int nents, enum dma_data_direction direction)
-{
-}
-
-void __init pci_dma_init_direct(void)
-{
-	pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent;
-	pci_dma_ops.pci_free_consistent = pci_direct_free_consistent;
-	pci_dma_ops.pci_map_single = pci_direct_map_single;
-	pci_dma_ops.pci_unmap_single = pci_direct_unmap_single;
-	pci_dma_ops.pci_map_sg = pci_direct_map_sg;
-	pci_dma_ops.pci_unmap_sg = pci_direct_unmap_sg;
-}
diff -puN /dev/null arch/ppc64/kernel/pci_direct_iommu.c
--- /dev/null	2004-12-07 13:25:26.079467688 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c	2004-12-07 16:17:31.549078536 -0600
@@ -0,0 +1,89 @@
+/*
+ * Support for DMA from PCI devices to main memory on
+ * machines without an iommu or with directly addressable
+ * RAM (typically a pmac with 2Gb of RAM or less)
+ *
+ * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/mm.h>
+#include <linux/dma-mapping.h>
+
+#include <asm/sections.h>
+#include <asm/io.h>
+#include <asm/prom.h>
+#include <asm/pci-bridge.h>
+#include <asm/machdep.h>
+#include <asm/pmac_feature.h>
+#include <asm/abs_addr.h>
+
+#include "pci.h"
+
+static void *pci_direct_alloc_consistent(struct pci_dev *hwdev, size_t size,
+				   dma_addr_t *dma_handle)
+{
+	void *ret;
+
+	ret = (void *)__get_free_pages(GFP_ATOMIC, get_order(size));
+	if (ret != NULL) {
+		memset(ret, 0, size);
+		*dma_handle = virt_to_abs(ret);
+	}
+	return ret;
+}
+
+static void pci_direct_free_consistent(struct pci_dev *hwdev, size_t size,
+				 void *vaddr, dma_addr_t dma_handle)
+{
+	free_pages((unsigned long)vaddr, get_order(size));
+}
+
+static dma_addr_t pci_direct_map_single(struct pci_dev *hwdev, void *ptr,
+		size_t size, enum dma_data_direction direction)
+{
+	return virt_to_abs(ptr);
+}
+
+static void pci_direct_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
+		size_t size, enum dma_data_direction direction)
+{
+}
+
+static int pci_direct_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+		int nents, enum dma_data_direction direction)
+{
+	int i;
+
+	for (i = 0; i < nents; i++, sg++) {
+		sg->dma_address = page_to_phys(sg->page) + sg->offset;
+		sg->dma_length = sg->length;
+	}
+
+	return nents;
+}
+
+static void pci_direct_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+		int nents, enum dma_data_direction direction)
+{
+}
+
+void __init pci_dma_init_direct(void)
+{
+	pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent;
+	pci_dma_ops.pci_free_consistent = pci_direct_free_consistent;
+	pci_dma_ops.pci_map_single = pci_direct_map_single;
+	pci_dma_ops.pci_unmap_single = pci_direct_unmap_single;
+	pci_dma_ops.pci_map_sg = pci_direct_map_sg;
+	pci_dma_ops.pci_unmap_sg = pci_direct_unmap_sg;
+}

_


From olof at austin.ibm.com  Thu Jan  6 11:07:35 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Wed, 5 Jan 2005 18:07:35 -0600
Subject: [PATCH] [PPC64] [2/2] IOMMU cleanups: Main cleanup patch
Message-ID: <20050106000735.GA20079@austin.ibm.com>

Hi,

Earlier cleanup efforts on the ppc64 IOMMU code have mostly targeted
simplifying the allocation schemes and modularising things for the
various platforms. The IOMMU init functions are still a mess. This is
an attempt to clean them up and make them somewhat easier to follow.

The new rules are:

1. iommu_init_early_<arch> is called before any PCI/VIO init is done.
2. The pcibios fixup routines call the iommu_{bus,dev}_setup functions
   as buses and devices are added.
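The two rules boil down to a pair of per-platform hooks: early init only installs them, and the PCI fixup path calls them, bus first and then each device on it, as buses are added. A minimal user-space sketch of that control flow follows; `struct mock_bus`, `struct mock_dev`, `struct mock_machdep` and all the `*_example` functions are hypothetical stand-ins, not the kernel's `struct pci_bus`, `struct pci_dev` or `struct machdep_calls`.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct pci_dev / struct pci_bus. */
struct mock_dev { const char *name; int table; };
struct mock_bus { int number; int table; struct mock_dev *devs; int ndevs; };

/* Hypothetical stand-in for the two new hooks in struct machdep_calls. */
struct mock_machdep {
	void (*iommu_bus_setup)(struct mock_bus *bus);
	void (*iommu_dev_setup)(struct mock_dev *dev);
};

static struct mock_machdep md;
static int current_table = -1;	/* table of the bus being set up (illustrative) */

static void bus_setup_example(struct mock_bus *bus)
{
	bus->table = bus->number;	/* pretend: one table per bus */
	current_table = bus->table;
}

static void dev_setup_example(struct mock_dev *dev)
{
	dev->table = current_table;	/* device inherits its bus's table */
}

/* Null hooks, used when the IOMMU is off. */
static void bus_setup_null(struct mock_bus *bus) { (void)bus; }
static void dev_setup_null(struct mock_dev *dev) { (void)dev; }

/* Rule 1: runs before any bus is scanned and only installs hooks. */
static void iommu_init_early_example(int iommu_off)
{
	if (iommu_off) {
		md.iommu_bus_setup = bus_setup_null;
		md.iommu_dev_setup = dev_setup_null;
	} else {
		md.iommu_bus_setup = bus_setup_example;
		md.iommu_dev_setup = dev_setup_example;
	}
}

/* Rule 2: the fixup path calls the hooks for each new bus, bus first,
 * then every device on it (mirroring the pcibios_fixup_bus() hunk). */
static void fixup_bus_example(struct mock_bus *bus)
{
	int i;

	assert(md.iommu_bus_setup && md.iommu_dev_setup); /* early init ran */
	md.iommu_bus_setup(bus);
	for (i = 0; i < bus->ndevs; i++)
		md.iommu_dev_setup(&bus->devs[i]);
}
```

Installing null hooks when the IOMMU is off (as iommu_init_early_pSeries() does for linux,iommu-off) keeps the fixup path free of platform checks: it can call both hooks unconditionally.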

TCE space allocation has changed somewhat:

* On LPARs, nothing is really different: ibm,dma-window properties are still
  used to determine table sizes.
* On pSeries SMP mode (non-LPAR), the full 2GB of TCE space per PHB is split
  into 256MB chunks, each handed out to one child bus/slot as needed. Since
  the first 256MB is always skipped, this currently caps each PHB at 7 child
  buses, which is more than any machine model I'm aware of uses.
* Exception to the above: pre-POWER4 machines with Python PHBs have a full
  1GB of DMA space allocated at the PHB level, since there are no EADS-level
  tables on such systems.
* PowerMac and Maple still work as before: all buses/slots share one table.
* VIO works as before; ibm,my-dma-window is still used.
* iSeries has barely been touched, apart from the changed unit of the
  it_size field in struct iommu_table (now entries rather than pages).
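The pSeries SMP-mode numbers can be checked with a little arithmetic: 2GB of DMA space, minus the skipped first 256MB, leaves room for exactly 7 windows of 256MB. The sketch below redoes that carving and the new it_offset/it_size units (TCE entries, one per page). It assumes 4KB pages and 8-byte TCEs; the `SKETCH_*` constants, `struct window` and the helper functions are hypothetical and only mirror the logic of iommu_table_setparms() and iommu_table_getparms() in this patch.

```c
#include <assert.h>

/* Assumptions for illustration: 4KB pages, 8-byte TCEs, 2GB per PHB. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_TCE_BYTES  8UL
#define PHB_DMA_SPACE     (1UL << 31)	/* 2GB */
#define CHILD_WINDOW      (1UL << 28)	/* 256MB */

/* Hypothetical stand-in for the iommu_table fields used here. */
struct window {
	unsigned long offset;	/* first TCE entry (it_offset) */
	unsigned long size;	/* number of TCE entries (it_size) */
};

/* Carve the next 256MB window out of a non-Python PHB. As in the patch,
 * the 2GB limit is checked before advancing; the kernel panics where
 * this sketch returns -1. */
static int next_window(unsigned long *base_cur, struct window *w)
{
	assert((*base_cur & (CHILD_WINDOW - 1)) == 0); /* windows stay aligned */
	if (*base_cur + CHILD_WINDOW > PHB_DMA_SPACE)
		return -1;
	w->offset = *base_cur >> SKETCH_PAGE_SHIFT;
	w->size = CHILD_WINDOW >> SKETCH_PAGE_SHIFT;
	*base_cur += CHILD_WINDOW;
	return 0;
}

/* Count the child buses that fit: (2GB - 256MB) / 256MB = 7. */
static int max_child_buses(void)
{
	unsigned long base_cur = CHILD_WINDOW;	/* first 256MB skipped */
	struct window w;
	int n = 0;

	while (next_window(&base_cur, &w) == 0)
		n++;
	return n;
}

/* iSeries: itc_size counts pages of table, it_size now counts entries. */
static unsigned long iseries_entries(unsigned long itc_size_pages)
{
	return itc_size_pages * SKETCH_PAGE_SIZE / SKETCH_TCE_BYTES;
}
```

The Python case is the same arithmetic with a single 1GB window starting at 0, which gives one table of 262144 entries.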

Other things changed:
* The PowerMac and Maple PCI/IOMMU init code has been changed slightly to
  conform to the new init structure.
* pci_dma_direct.c has been renamed to pci_direct_iommu.c to match
  pci_iommu.c (see separate patch).
* Likewise, a couple of the PCI direct init functions have been renamed
  (e.g. pci_dma_init_direct() is now pci_direct_iommu_init()).


Signed-off-by: Olof Johansson <olof at austin.ibm.com>


---

 linux-2.5-olof/arch/ppc64/kernel/Makefile           |    2 
 linux-2.5-olof/arch/ppc64/kernel/iSeries_iommu.c    |   11 
 linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c    |    3 
 linux-2.5-olof/arch/ppc64/kernel/iommu.c            |   21 -
 linux-2.5-olof/arch/ppc64/kernel/maple_pci.c        |    3 
 linux-2.5-olof/arch/ppc64/kernel/maple_setup.c      |    7 
 linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c    |  283 ++++++++++----------
 linux-2.5-olof/arch/ppc64/kernel/pSeries_pci.c      |    5 
 linux-2.5-olof/arch/ppc64/kernel/pSeries_setup.c    |    5 
 linux-2.5-olof/arch/ppc64/kernel/pci.c              |    5 
 linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c |    2 
 linux-2.5-olof/arch/ppc64/kernel/pmac_pci.c         |    2 
 linux-2.5-olof/arch/ppc64/kernel/pmac_setup.c       |    7 
 linux-2.5-olof/arch/ppc64/kernel/prom.c             |   11 
 linux-2.5-olof/arch/ppc64/kernel/u3_iommu.c         |  104 ++++---
 linux-2.5-olof/arch/ppc64/kernel/vio.c              |   18 -
 linux-2.5-olof/drivers/pci/hotplug/rpaphp_pci.c     |    4 
 linux-2.5-olof/include/asm-ppc64/iommu.h            |   13 
 linux-2.5-olof/include/asm-ppc64/machdep.h          |    2 
 linux-2.5-olof/include/asm-ppc64/pci-bridge.h       |    8 
 20 files changed, 265 insertions(+), 251 deletions(-)

diff -puN arch/ppc64/kernel/pci.c~iommu-cleanup arch/ppc64/kernel/pci.c
--- linux-2.5/arch/ppc64/kernel/pci.c~iommu-cleanup	2005-01-05 16:59:18.108168880 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pci.c	2005-01-05 16:59:18.235149576 -0600
@@ -845,6 +845,11 @@ void __devinit pcibios_fixup_bus(struct 
 		pcibios_fixup_device_resources(dev, bus);
 	}
 
+	ppc_md.iommu_bus_setup(bus);
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		ppc_md.iommu_dev_setup(dev);
+
 	if (!pci_probe_only)
 		return;
 
diff -puN include/asm-ppc64/machdep.h~iommu-cleanup include/asm-ppc64/machdep.h
--- linux-2.5/include/asm-ppc64/machdep.h~iommu-cleanup	2005-01-05 16:59:18.112168272 -0600
+++ linux-2.5-olof/include/asm-ppc64/machdep.h	2005-01-05 16:59:18.236149424 -0600
@@ -70,6 +70,8 @@ struct machdep_calls {
 				    long index,
 				    long npages);
 	void		(*tce_flush)(struct iommu_table *tbl);
+	void		(*iommu_dev_setup)(struct pci_dev *dev);
+	void		(*iommu_bus_setup)(struct pci_bus *bus);
 
 	int		(*probe)(int platform);
 	void		(*setup_arch)(void);
diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup arch/ppc64/kernel/pSeries_iommu.c
--- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup	2005-01-05 16:59:18.141163864 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c	2005-01-05 17:08:34.411597904 -0600
@@ -46,6 +46,9 @@
 #include <asm/systemcfg.h>
 #include "pci.h"
 
+#define DBG(fmt...)
+
+extern int is_python(struct device_node *);
 
 static void tce_build_pSeries(struct iommu_table *tbl, long index, 
 			      long npages, unsigned long uaddr, 
@@ -121,7 +124,7 @@ static void tce_build_pSeriesLP(struct i
 	}
 }
 
-DEFINE_PER_CPU(void *, tce_page) = NULL;
+static DEFINE_PER_CPU(void *, tce_page) = NULL;
 
 static void tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 				     long npages, unsigned long uaddr,
@@ -233,85 +236,6 @@ static void tce_freemulti_pSeriesLP(stru
 	}
 }
 
-
-static void iommu_buses_init(void)
-{
-	struct pci_controller *phb, *tmp;
-	struct device_node *dn, *first_dn;
-	int num_slots, num_slots_ilog2;
-	int first_phb = 1;
-	unsigned long tcetable_ilog2;
-
-	/*
-	 * We default to a TCE table that maps 2GB (4MB table, 22 bits),
-	 * however some machines have a 3GB IO hole and for these we
-	 * create a table that maps 1GB (2MB table, 21 bits)
-	 */
-	if (io_hole_start < 0x80000000UL)
-		tcetable_ilog2 = 21;
-	else
-		tcetable_ilog2 = 22;
-
-	/* XXX Should we be using pci_root_buses instead?  -ojn
-	 */
-
-	list_for_each_entry_safe(phb, tmp, &hose_list, list_node) {
-		first_dn = ((struct device_node *)phb->arch_data)->child;
-
-		/* Carve 2GB into the largest dma_window_size possible */
-		for (dn = first_dn, num_slots = 0; dn != NULL; dn = dn->sibling)
-			num_slots++;
-		num_slots_ilog2 = __ilog2(num_slots);
-
-		if ((1<<num_slots_ilog2) != num_slots)
-			num_slots_ilog2++;
-
-		phb->dma_window_size = 1 << (tcetable_ilog2 - num_slots_ilog2);
-
-		/* Reserve 16MB of DMA space on the first PHB.
-		 * We should probably be more careful and use firmware props.
-		 * In reality this space is remapped, not lost.  But we don't
-		 * want to get that smart to handle it -- too much work.
-		 */
-		phb->dma_window_base_cur = first_phb ? (1 << 12) : 0;
-		first_phb = 0;
-
-		for (dn = first_dn; dn != NULL; dn = dn->sibling)
-			iommu_devnode_init_pSeries(dn);
-	}
-}
-
-
-static void iommu_buses_init_lpar(struct list_head *bus_list)
-{
-	struct list_head *ln;
-	struct pci_bus *bus;
-	struct device_node *busdn;
-	unsigned int *dma_window;
-
-	for (ln=bus_list->next; ln != bus_list; ln=ln->next) {
-		bus = pci_bus_b(ln);
-
-		if (bus->self)
-			busdn = pci_device_to_OF_node(bus->self);
-		else
-			busdn = bus->sysdata;   /* must be a phb */
-
-		dma_window = (unsigned int *)get_property(busdn, "ibm,dma-window", NULL);
-		if (dma_window) {
-			/* Bussubno hasn't been copied yet.
-			 * Do it now because iommu_table_setparms_lpar needs it.
-			 */
-			busdn->bussubno = bus->number;
-			iommu_devnode_init_pSeries(busdn);
-		}
-
-		/* look for a window on a bridge even if the PHB had one */
-		iommu_buses_init_lpar(&bus->children);
-	}
-}
-
-
 static void iommu_table_setparms(struct pci_controller *phb,
 				 struct device_node *dn,
 				 struct iommu_table *tbl) 
@@ -336,27 +260,18 @@ static void iommu_table_setparms(struct 
 	tbl->it_busno = phb->bus->number;
 	
 	/* Units of tce entries */
-	tbl->it_offset = phb->dma_window_base_cur;
-	
-	/* Adjust the current table offset to the next
-	 * region.  Measured in TCE entries. Force an
-	 * alignment to the size allotted per IOA. This
-	 * makes it easier to remove the 1st 16MB.
-      	 */
-	phb->dma_window_base_cur += (phb->dma_window_size>>3);
-	phb->dma_window_base_cur &= 
-		~((phb->dma_window_size>>3)-1);
-	
-	/* Set the tce table size - measured in pages */
-	tbl->it_size = ((phb->dma_window_base_cur -
-			 tbl->it_offset) << 3) >> PAGE_SHIFT;
+	tbl->it_offset = phb->dma_window_base_cur >> PAGE_SHIFT;
 	
 	/* Test if we are going over 2GB of DMA space */
-	if (phb->dma_window_base_cur > (1 << 19))
+	if (phb->dma_window_base_cur + phb->dma_window_size > (1L << 31))
 		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n"); 
 	
+	phb->dma_window_base_cur += phb->dma_window_size;
+
+	/* Set the tce table size - measured in entries */
+	tbl->it_size = phb->dma_window_size >> PAGE_SHIFT;
+
 	tbl->it_index = 0;
-	tbl->it_entrysize = sizeof(union tce_entry);
 	tbl->it_blocksize = 16;
 	tbl->it_type = TCE_PCI;
 }
@@ -375,82 +290,174 @@ static void iommu_table_setparms(struct 
  */
 static void iommu_table_setparms_lpar(struct pci_controller *phb,
 				      struct device_node *dn,
-				      struct iommu_table *tbl)
+				      struct iommu_table *tbl,
+				      unsigned int *dma_window)
 {
-	unsigned int *dma_window;
-
-	dma_window = (unsigned int *)get_property(dn, "ibm,dma-window", NULL);
-
 	if (!dma_window)
 		panic("iommu_table_setparms_lpar: device %s has no"
 		      " ibm,dma-window property!\n", dn->full_name);
 
 	tbl->it_busno  = dn->bussubno;
-	tbl->it_size   = (((((unsigned long)dma_window[4] << 32) | 
-			   (unsigned long)dma_window[5]) >> PAGE_SHIFT) << 3) >> PAGE_SHIFT;
-	tbl->it_offset = ((((unsigned long)dma_window[2] << 32) | 
-			   (unsigned long)dma_window[3]) >> 12);
+
+	/* TODO: Parse field size properties properly. */
+	tbl->it_size   = (((unsigned long)dma_window[4] << 32) |
+			   (unsigned long)dma_window[5]) >> PAGE_SHIFT;
+	tbl->it_offset = (((unsigned long)dma_window[2] << 32) |
+			   (unsigned long)dma_window[3]) >> PAGE_SHIFT;
 	tbl->it_base   = 0;
 	tbl->it_index  = dma_window[0];
-	tbl->it_entrysize = sizeof(union tce_entry);
 	tbl->it_blocksize  = 16;
 	tbl->it_type = TCE_PCI;
 }
 
+static void iommu_bus_setup_pSeries(struct pci_bus *bus)
+{
+	struct device_node *dn, *pdn;
+
+	DBG("iommu_bus_setup_pSeries, bus %p, bus->self %p\n", bus, bus->self);
+
+	/* For each (root) bus, we carve up the available DMA space in 256MB
+	 * pieces. Since each piece is used by one (sub) bus/device, that would
+	 * give a maximum of 7 devices per PHB. In most cases, this is plenty.
+	 *
+	 * The exception is on Python PHBs (pre-POWER4). Here we don't have EADS
+	 * bridges below the PHB to allocate the sectioned tables to, so instead
+	 * we allocate a 1GB table at the PHB level.
+	 */
+
+	dn = pci_bus_to_OF_node(bus);
+
+	if (!bus->self) {
+		/* Root bus */
+		if (is_python(dn)) {
+			struct iommu_table *tbl;
+
+			DBG("Python root bus %s\n", bus->name);
+
+			/* 1GB window by default */
+			dn->phb->dma_window_size = 1 << 30;
+			dn->phb->dma_window_base_cur = 0;
+
+			tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL);
+
+			iommu_table_setparms(dn->phb, dn, tbl);
+			dn->iommu_table = iommu_init_table(tbl);
+		} else {
+			/* 256 MB window by default */
+			dn->phb->dma_window_size = 1 << 28;
+			/* always skip the first 256MB */
+			dn->phb->dma_window_base_cur = 1 << 28;
+
+			/* No table at PHB level for non-python PHBs */
+		}
+	} else {
+		pdn = pci_bus_to_OF_node(bus->parent);
+
+		if (!pdn->iommu_table) {
+			struct iommu_table *tbl;
+			/* First child, allocate new table (256MB window) */
+
+			tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL);
+
+			iommu_table_setparms(dn->phb, dn, tbl);
+
+			dn->iommu_table = iommu_init_table(tbl);
+		} else {
+			/* Lower than first child or under python, copy parent table */
+			dn->iommu_table = pdn->iommu_table;
+		}
+	}
+}
+
 
-void iommu_devnode_init_pSeries(struct device_node *dn)
+static void iommu_bus_setup_pSeriesLP(struct pci_bus *bus)
 {
 	struct iommu_table *tbl;
+	struct device_node *dn, *pdn;
+	unsigned int *dma_window = NULL;
 
-	tbl = (struct iommu_table *)kmalloc(sizeof(struct iommu_table), 
-					    GFP_KERNEL);
-	
-	if (systemcfg->platform == PLATFORM_PSERIES_LPAR)
-		iommu_table_setparms_lpar(dn->phb, dn, tbl);
-	else
-		iommu_table_setparms(dn->phb, dn, tbl);
+	dn = pci_bus_to_OF_node(bus);
+
+	/* Find nearest ibm,dma-window, walking up the device tree */
+	for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
+		dma_window = (unsigned int *)get_property(pdn, "ibm,dma-window", NULL);
+		if (dma_window != NULL)
+			break;
+	}
+
+	WARN_ON(dma_window == NULL);
+
+	if (!pdn->iommu_table) {
+		/* Bussubno hasn't been copied yet.
+		 * Do it now because iommu_table_setparms_lpar needs it.
+		 */
+		pdn->bussubno = bus->number;
+
+		tbl = (struct iommu_table *)kmalloc(sizeof(struct iommu_table),
+						    GFP_KERNEL);
 	
-	dn->iommu_table = iommu_init_table(tbl);
+		iommu_table_setparms_lpar(pdn->phb, pdn, tbl, dma_window);
+
+		pdn->iommu_table = iommu_init_table(tbl);
+	}
+
+	if (pdn != dn)
+		dn->iommu_table = pdn->iommu_table;
 }
 
-void iommu_setup_pSeries(void)
+
+static void iommu_dev_setup_pSeries(struct pci_dev *dev)
 {
-	struct pci_dev *dev = NULL;
 	struct device_node *dn, *mydn;
 
-	if (systemcfg->platform == PLATFORM_PSERIES_LPAR)
-		iommu_buses_init_lpar(&pci_root_buses);
-	else
-		iommu_buses_init();
-
-	/* Now copy the iommu_table ptr from the bus devices down to every
+	DBG("iommu_dev_setup_pSeries, dev %p (%s)\n", dev, dev->pretty_name);
+	/* Now copy the iommu_table ptr from the bus device down to the
 	 * pci device_node.  This means get_iommu_table() won't need to search
 	 * up the device tree to find it.
 	 */
-	for_each_pci_dev(dev) {
-		mydn = dn = pci_device_to_OF_node(dev);
+	mydn = dn = pci_device_to_OF_node(dev);
 
-		while (dn && dn->iommu_table == NULL)
-			dn = dn->parent;
-		if (dn)
-			mydn->iommu_table = dn->iommu_table;
-	}
+	while (dn && dn->iommu_table == NULL)
+		dn = dn->parent;
+
+	WARN_ON(!dn);
+
+	if (dn)
+		mydn->iommu_table = dn->iommu_table;
 }
 
+static void iommu_bus_setup_null(struct pci_bus *b) { }
+static void iommu_dev_setup_null(struct pci_dev *d) { }
+
 /* These are called very early. */
-void tce_init_pSeries(void)
+void iommu_init_early_pSeries(void)
 {
-	if (!(systemcfg->platform & PLATFORM_LPAR)) {
+	if (of_chosen && get_property(of_chosen, "linux,iommu-off", NULL)) {
+		/* Direct I/O, IOMMU off */
+		ppc_md.iommu_dev_setup = iommu_dev_setup_null;
+		ppc_md.iommu_bus_setup = iommu_bus_setup_null;
+		pci_direct_iommu_init();
+
+		return;
+	}
+
+	if (systemcfg->platform & PLATFORM_LPAR) {
+		if (cur_cpu_spec->firmware_features & FW_FEATURE_MULTITCE) {
+			ppc_md.tce_build = tce_buildmulti_pSeriesLP;
+			ppc_md.tce_free	 = tce_freemulti_pSeriesLP;
+		} else {
+			ppc_md.tce_build = tce_build_pSeriesLP;
+			ppc_md.tce_free	 = tce_free_pSeriesLP;
+		}
+		ppc_md.iommu_bus_setup = iommu_bus_setup_pSeriesLP;
+	} else {
 		ppc_md.tce_build = tce_build_pSeries;
 		ppc_md.tce_free  = tce_free_pSeries;
-	} else if (cur_cpu_spec->firmware_features & FW_FEATURE_MULTITCE) {
-		ppc_md.tce_build = tce_buildmulti_pSeriesLP;
-		ppc_md.tce_free	 = tce_freemulti_pSeriesLP;
-	} else {
-		ppc_md.tce_build = tce_build_pSeriesLP;
-		ppc_md.tce_free	 = tce_free_pSeriesLP;
+		ppc_md.iommu_bus_setup = iommu_bus_setup_pSeries;
 	}
 
+	ppc_md.iommu_dev_setup = iommu_dev_setup_pSeries;
+
 	pci_iommu_init();
 }
 
diff -puN arch/ppc64/kernel/u3_iommu.c~iommu-cleanup arch/ppc64/kernel/u3_iommu.c
--- linux-2.5/arch/ppc64/kernel/u3_iommu.c~iommu-cleanup	2005-01-05 16:59:18.145163256 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/u3_iommu.c	2005-01-05 16:59:18.242148512 -0600
@@ -91,6 +91,7 @@ static unsigned int *dart; 
 static unsigned int dart_emptyval;
 
 static struct iommu_table iommu_table_u3;
+static int iommu_table_u3_inited;
 static int dart_dirty;
 
 #define DBG(...)
@@ -192,7 +193,6 @@ static int dart_init(struct device_node 
 	unsigned int regword;
 	unsigned int i;
 	unsigned long tmp;
-	struct page *p;
 
 	if (dart_tablebase == 0 || dart_tablesize == 0) {
 		printk(KERN_INFO "U3-DART: table not allocated, using direct DMA\n");
@@ -209,16 +209,15 @@ static int dart_init(struct device_node 
 	 * that to work around what looks like a problem with the HT bridge
 	 * prefetching into invalid pages and corrupting data
 	 */
-	tmp = __get_free_pages(GFP_ATOMIC, 1);
-	if (tmp == 0)
-		panic("U3-DART: Cannot allocate spare page !");
-	dart_emptyval = DARTMAP_VALID |
-		((virt_to_abs(tmp) >> PAGE_SHIFT) & DARTMAP_RPNMASK);
+	tmp = lmb_alloc(PAGE_SIZE, PAGE_SIZE);
+	if (!tmp)
+		panic("U3-DART: Cannot allocate spare page!");
+	dart_emptyval = DARTMAP_VALID | ((tmp >> PAGE_SHIFT) & DARTMAP_RPNMASK);
 
 	/* Map in DART registers. FIXME: Use device node to get base address */
 	dart = ioremap(DART_BASE, 0x7000);
 	if (dart == NULL)
-		panic("U3-DART: Cannot map registers !");
+		panic("U3-DART: Cannot map registers!");
 
 	/* Set initial control register contents: table base, 
 	 * table size and enable bit
@@ -227,7 +226,6 @@ static int dart_init(struct device_node 
 		((dart_tablebase >> PAGE_SHIFT) << DARTCNTL_BASE_SHIFT) |
 		(((dart_tablesize >> PAGE_SHIFT) & DARTCNTL_SIZE_MASK)
 				 << DARTCNTL_SIZE_SHIFT);
-	p = virt_to_page(dart_tablebase);
 	dart_vbase = ioremap(virt_to_abs(dart_tablebase), dart_tablesize);
 
 	/* Fill initial table */
@@ -240,35 +238,67 @@ static int dart_init(struct device_node 
 	/* Invalidate DART to get rid of possible stale TLBs */
 	dart_tlb_invalidate_all();
 
+	printk(KERN_INFO "U3/CPC925 DART IOMMU initialized\n");
+
+	return 0;
+}
+
+static void iommu_table_u3_setup(void)
+{
 	iommu_table_u3.it_busno = 0;
-	
-	/* Units of tce entries */
 	iommu_table_u3.it_offset = 0;
-	
-	/* Set the tce table size - measured in pages */
-	iommu_table_u3.it_size = dart_tablesize >> PAGE_SHIFT;
+	/* it_size is in number of entries */
+	iommu_table_u3.it_size = dart_tablesize / sizeof(u32);
 
 	/* Initialize the common IOMMU code */
 	iommu_table_u3.it_base = (unsigned long)dart_vbase;
 	iommu_table_u3.it_index = 0;
 	iommu_table_u3.it_blocksize = 1;
-	iommu_table_u3.it_entrysize = sizeof(u32);
 	iommu_init_table(&iommu_table_u3);
 
 	/* Reserve the last page of the DART to avoid possible prefetch
 	 * past the DART mapped area
 	 */
-	set_bit(iommu_table_u3.it_mapsize - 1, iommu_table_u3.it_map);
+	set_bit(iommu_table_u3.it_size - 1, iommu_table_u3.it_map);
+}
 
-	printk(KERN_INFO "U3/CPC925 DART IOMMU initialized\n");
+static void iommu_dev_setup_u3(struct pci_dev *dev)
+{
+	struct device_node *dn;
 
-	return 0;
+	/* We only have one iommu table on the mac for now, which makes
+	 * things simple. Setup all PCI devices to point to this table
+	 *
+	 * We must use pci_device_to_OF_node() to make sure that
+	 * we get the real "final" pointer to the device in the
+	 * pci_dev sysdata and not the temporary PHB one
+	 */
+	dn = pci_device_to_OF_node(dev);
+
+	if (dn)
+		dn->iommu_table = &iommu_table_u3;
+}
+
+static void iommu_bus_setup_u3(struct pci_bus *bus)
+{
+	struct device_node *dn;
+
+	if (!iommu_table_u3_inited) {
+		iommu_table_u3_inited = 1;
+		iommu_table_u3_setup();
+	}
+
+	dn = pci_bus_to_OF_node(bus);
+
+	if (dn)
+		dn->iommu_table = &iommu_table_u3;
 }
 
-void iommu_setup_u3(void)
+static void iommu_dev_setup_null(struct pci_dev *dev) { }
+static void iommu_bus_setup_null(struct pci_bus *bus) { }
+
+void iommu_init_early_u3(void)
 {
-	struct pci_controller *phb, *tmp;
-	struct pci_dev *dev = NULL;
 	struct device_node *dn;
 
 	/* Find the DART in the device-tree */
@@ -282,31 +312,23 @@ void iommu_setup_u3(void)
 	ppc_md.tce_flush = dart_flush;
 
 	/* Initialize the DART HW */
-	if (dart_init(dn))
-		return;
+	if (dart_init(dn)) {
+		/* If init failed, use direct iommu and null setup functions */
+		ppc_md.iommu_dev_setup = iommu_dev_setup_null;
+		ppc_md.iommu_bus_setup = iommu_bus_setup_null;
+
+		/* Setup pci_dma ops */
+		pci_direct_iommu_init();
+	} else {
+		ppc_md.iommu_dev_setup = iommu_dev_setup_u3;
+		ppc_md.iommu_bus_setup = iommu_bus_setup_u3;
 
-	/* Setup pci_dma ops */
-	pci_iommu_init();
-
-	/* We only have one iommu table on the mac for now, which makes
-	 * things simple. Setup all PCI devices to point to this table
-	 */
-	for_each_pci_dev(dev) {
-		/* We must use pci_device_to_OF_node() to make sure that
-		 * we get the real "final" pointer to the device in the
-		 * pci_dev sysdata and not the temporary PHB one
-		 */
-		struct device_node *dn = pci_device_to_OF_node(dev);
-		if (dn)
-			dn->iommu_table = &iommu_table_u3;
-	}
-	/* We also make sure we set all PHBs ... */
-	list_for_each_entry_safe(phb, tmp, &hose_list, list_node) {
-		dn = (struct device_node *)phb->arch_data;
-		dn->iommu_table = &iommu_table_u3;
+		/* Setup pci_dma ops */
+		pci_iommu_init();
 	}
 }
 
+
 void __init alloc_u3_dart_table(void)
 {
 	/* Only reserve DART space if machine has more than 2GB of RAM
diff -puN arch/ppc64/kernel/iSeries_iommu.c~iommu-cleanup arch/ppc64/kernel/iSeries_iommu.c
--- linux-2.5/arch/ppc64/kernel/iSeries_iommu.c~iommu-cleanup	2005-01-05 16:59:18.149162648 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/iSeries_iommu.c	2005-01-05 16:59:18.243148360 -0600
@@ -132,11 +132,11 @@ static void iommu_table_getparms(struct 
 	if (parms->itc_size == 0)
 		panic("PCI_DMA: parms->size is zero, parms is 0x%p", parms);
 
-	tbl->it_size = parms->itc_size;
+	/* itc_size is in pages worth of table, it_size is in # of entries */
+	tbl->it_size = (parms->itc_size * PAGE_SIZE) / sizeof(union tce_entry);
 	tbl->it_busno = parms->itc_busno;
 	tbl->it_offset = parms->itc_offset;
 	tbl->it_index = parms->itc_index;
-	tbl->it_entrysize = sizeof(union tce_entry);
 	tbl->it_blocksize = 1;
 	tbl->it_type = TCE_PCI;
 
@@ -160,11 +160,16 @@ void iommu_devnode_init_iSeries(struct i
 		kfree(tbl);
 }
 
+static void iommu_dev_setup_iSeries(struct pci_dev *dev) { }
+static void iommu_bus_setup_iSeries(struct pci_bus *bus) { }
 
-void tce_init_iSeries(void)
+void iommu_init_early_iSeries(void)
 {
 	ppc_md.tce_build = tce_build_iSeries;
 	ppc_md.tce_free  = tce_free_iSeries;
 
+	ppc_md.iommu_dev_setup = iommu_dev_setup_iSeries;
+	ppc_md.iommu_bus_setup = iommu_bus_setup_iSeries;
+
 	pci_iommu_init();
 }
diff -puN drivers/pci/hotplug/rpaphp_pci.c~iommu-cleanup drivers/pci/hotplug/rpaphp_pci.c
--- linux-2.5/drivers/pci/hotplug/rpaphp_pci.c~iommu-cleanup	2005-01-05 16:59:18.154161888 -0600
+++ linux-2.5-olof/drivers/pci/hotplug/rpaphp_pci.c	2005-01-05 16:59:18.245148056 -0600
@@ -25,6 +25,7 @@
 #include <linux/pci.h>
 #include <asm/pci-bridge.h>
 #include <asm/rtas.h>
+#include <asm/machdep.h>
 #include "../pci.h"		/* for pci_add_new_bus */
 
 #include "rpaphp.h"
@@ -168,6 +169,9 @@ rpaphp_fixup_new_pci_devices(struct pci_
 		if (list_empty(&dev->global_list)) {
 			int i;
 			
+			/* Need to setup IOMMU tables */
+			ppc_md.iommu_dev_setup(dev);
+
 			if(fix_bus)
 				pcibios_fixup_device_resources(dev, bus);
 			pci_read_irq_line(dev);
diff -puN arch/ppc64/kernel/pSeries_pci.c~iommu-cleanup arch/ppc64/kernel/pSeries_pci.c
--- linux-2.5/arch/ppc64/kernel/pSeries_pci.c~iommu-cleanup	2005-01-05 16:59:18.158161280 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pSeries_pci.c	2005-01-05 16:59:18.246147904 -0600
@@ -148,7 +148,7 @@ struct pci_ops rtas_pci_ops = {
 	rtas_pci_write_config
 };
 
-static int is_python(struct device_node *dev)
+int is_python(struct device_node *dev)
 {
 	char *model = (char *)get_property(dev, "model", NULL);
 
@@ -554,9 +554,6 @@ void __init pSeries_final_fixup(void)
 	pSeries_request_regions();
 	pci_fix_bus_sysdata();
 
-	if (!of_chosen || !get_property(of_chosen, "linux,iommu-off", NULL))
-		iommu_setup_pSeries();
-
 	pci_addr_cache_build();
 }
 
diff -puN arch/ppc64/kernel/prom.c~iommu-cleanup arch/ppc64/kernel/prom.c
--- linux-2.5/arch/ppc64/kernel/prom.c~iommu-cleanup	2005-01-05 16:59:18.162160672 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/prom.c	2005-01-05 16:59:18.249147448 -0600
@@ -1743,17 +1743,6 @@ static int of_finish_dynamic_node(struct
 		node->devfn = (regs[0] >> 8) & 0xff;
 	}
 
-	/* fixing up iommu_table */
-
-#ifdef CONFIG_PPC_PSERIES
-	if (strcmp(node->name, "pci") == 0 &&
-	    get_property(node, "ibm,dma-window", NULL)) {
-		node->bussubno = node->busno;
-		iommu_devnode_init_pSeries(node);
-	} else
-		node->iommu_table = parent->iommu_table;
-#endif /* CONFIG_PPC_PSERIES */
-
 out:
 	of_node_put(parent);
 	return err;
diff -puN include/asm-ppc64/pci-bridge.h~iommu-cleanup include/asm-ppc64/pci-bridge.h
--- linux-2.5/include/asm-ppc64/pci-bridge.h~iommu-cleanup	2005-01-05 16:59:18.166160064 -0600
+++ linux-2.5-olof/include/asm-ppc64/pci-bridge.h	2005-01-05 16:59:18.250147296 -0600
@@ -79,6 +79,14 @@ static inline struct device_node *pci_de
 		return fetch_dev_dn(dev);
 }
 
+static inline struct device_node *pci_bus_to_OF_node(struct pci_bus *bus)
+{
+	if (bus->self)
+		return pci_device_to_OF_node(bus->self);
+	else
+		return bus->sysdata; /* Must be root bus (PHB) */
+}
+
 extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 					 struct device_node *dev);
 
diff -puN include/asm-ppc64/iommu.h~iommu-cleanup include/asm-ppc64/iommu.h
--- linux-2.5/include/asm-ppc64/iommu.h~iommu-cleanup	2005-01-05 16:59:18.170159456 -0600
+++ linux-2.5-olof/include/asm-ppc64/iommu.h	2005-01-05 16:59:18.252146992 -0600
@@ -69,18 +69,16 @@ union tce_entry {
 
 struct iommu_table {
 	unsigned long  it_busno;     /* Bus number this table belongs to */
-	unsigned long  it_size;      /* Size in pages of iommu table */
+	unsigned long  it_size;      /* Size of iommu table in entries */
 	unsigned long  it_offset;    /* Offset into global table */
 	unsigned long  it_base;      /* mapped address of tce table */
 	unsigned long  it_index;     /* which iommu table this is */
 	unsigned long  it_type;      /* type: PCI or Virtual Bus */
-	unsigned long  it_entrysize; /* Size of an entry in bytes */
 	unsigned long  it_blocksize; /* Entries in each block (cacheline) */
 	unsigned long  it_hint;      /* Hint for next alloc */
 	unsigned long  it_largehint; /* Hint for large allocs */
 	unsigned long  it_halfpoint; /* Breaking point for small/large allocs */
 	spinlock_t     it_lock;      /* Protects it_map */
-	unsigned long  it_mapsize;   /* Size of map in # of entries (bits) */
 	unsigned long *it_map;       /* A simple allocation bitmap for now */
 };
 
@@ -156,14 +154,13 @@ extern dma_addr_t iommu_map_single(struc
 extern void iommu_unmap_single(struct iommu_table *tbl, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction direction);
 
-extern void tce_init_pSeries(void);
-extern void tce_init_iSeries(void);
+extern void iommu_init_early_pSeries(void);
+extern void iommu_init_early_iSeries(void);
+extern void iommu_init_early_u3(void);
 
 extern void pci_iommu_init(void);
-extern void pci_dma_init_direct(void);
+extern void pci_direct_iommu_init(void);
 
 extern void alloc_u3_dart_table(void);
 
-extern int ppc64_iommu_off;
-
 #endif /* _ASM_IOMMU_H */
diff -puN arch/ppc64/kernel/pSeries_setup.c~iommu-cleanup arch/ppc64/kernel/pSeries_setup.c
--- linux-2.5/arch/ppc64/kernel/pSeries_setup.c~iommu-cleanup	2005-01-05 16:59:18.175158696 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pSeries_setup.c	2005-01-05 16:59:18.253146840 -0600
@@ -375,10 +375,7 @@ static void __init pSeries_init_early(vo
 	}
 
 
-	if (iommu_off)
-		pci_dma_init_direct();
-	else
-		tce_init_pSeries();
+	iommu_init_early_pSeries();
 
 	pSeries_discover_pic();
 
diff -puN arch/ppc64/kernel/iSeries_setup.c~iommu-cleanup arch/ppc64/kernel/iSeries_setup.c
--- linux-2.5/arch/ppc64/kernel/iSeries_setup.c~iommu-cleanup	2005-01-05 16:59:18.180157936 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c	2005-01-05 16:59:18.255146536 -0600
@@ -68,7 +68,6 @@ extern void hvlog(char *fmt, ...);
 
 /* Function Prototypes */
 extern void ppcdbg_initialize(void);
-extern void tce_init_iSeries(void);
 
 static void build_iSeries_Memory_Map(void);
 static void setup_iSeries_cache_sizes(void);
@@ -344,7 +343,7 @@ static void __init iSeries_parse_cmdline
 	/*
 	 * Initialize the DMA/TCE management
 	 */
-	tce_init_iSeries();
+	iommu_init_early_iSeries();
 
 	/*
 	 * Initialize the table which translate Linux physical addresses to
diff -puN arch/ppc64/kernel/maple_pci.c~iommu-cleanup arch/ppc64/kernel/maple_pci.c
--- linux-2.5/arch/ppc64/kernel/maple_pci.c~iommu-cleanup	2005-01-05 16:59:18.184157328 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/maple_pci.c	2005-01-05 16:59:18.257146232 -0600
@@ -385,9 +385,6 @@ void __init maple_pcibios_fixup(void)
 	/* Fixup the pci_bus sysdata pointers */
 	pci_fix_bus_sysdata();
 
-	/* Setup the iommu */
-	iommu_setup_u3();
-
 	DBG(" <- maple_pcibios_fixup\n");
 }
 
diff -puN arch/ppc64/kernel/pmac_pci.c~iommu-cleanup arch/ppc64/kernel/pmac_pci.c
--- linux-2.5/arch/ppc64/kernel/pmac_pci.c~iommu-cleanup	2005-01-05 16:59:18.188156720 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pmac_pci.c	2005-01-05 16:59:18.258146080 -0600
@@ -666,8 +666,6 @@ void __init pmac_pcibios_fixup(void)
 		pci_read_irq_line(dev);
 
 	pci_fix_bus_sysdata();
-
-	iommu_setup_u3();
 }
 
 static void __init pmac_fixup_phb_resources(void)
diff -puN arch/ppc64/kernel/pmac_setup.c~iommu-cleanup arch/ppc64/kernel/pmac_setup.c
--- linux-2.5/arch/ppc64/kernel/pmac_setup.c~iommu-cleanup	2005-01-05 16:59:18.194155808 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pmac_setup.c	2005-01-05 16:59:18.309138328 -0600
@@ -166,11 +166,6 @@ void __init pmac_setup_arch(void)
 	pmac_setup_smp();
 #endif
 
-	/* Setup the PCI DMA to "direct" by default. May be overriden
-	 * by iommu later on
-	 */
-	pci_dma_init_direct();
-
 	/* Lookup PCI hosts */
        	pmac_pci_init();
 
@@ -317,6 +312,8 @@ void __init pmac_init_early(void)
 	/* Setup interrupt mapping options */
 	ppc64_interrupt_controller = IC_OPEN_PIC;
 
+	iommu_init_early_u3();
+
 	DBG(" <- pmac_init_early\n");
 }
 
diff -puN arch/ppc64/kernel/maple_setup.c~iommu-cleanup arch/ppc64/kernel/maple_setup.c
--- linux-2.5/arch/ppc64/kernel/maple_setup.c~iommu-cleanup	2005-01-05 16:59:18.199155048 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/maple_setup.c	2005-01-05 16:59:18.309138328 -0600
@@ -111,11 +111,6 @@ void __init maple_setup_arch(void)
 #ifdef CONFIG_SMP
 	smp_ops = &maple_smp_ops;
 #endif
-	/* Setup the PCI DMA to "direct" by default. May be overriden
-	 * by iommu later on
-	 */
-	pci_dma_init_direct();
-
 	/* Lookup PCI hosts */
        	maple_pci_init();
 
@@ -159,6 +154,8 @@ static void __init maple_init_early(void
 	/* Setup interrupt mapping options */
 	ppc64_interrupt_controller = IC_OPEN_PIC;
 
+	iommu_init_early_u3();
+
 	DBG(" <- maple_init_early\n");
 }
 
diff -puN arch/ppc64/kernel/smp.c~iommu-cleanup arch/ppc64/kernel/smp.c
diff -puN arch/ppc64/kernel/vio.c~iommu-cleanup arch/ppc64/kernel/vio.c
--- linux-2.5/arch/ppc64/kernel/vio.c~iommu-cleanup	2005-01-05 16:59:18.207153832 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/vio.c	2005-01-05 16:59:18.311138024 -0600
@@ -158,6 +158,7 @@ void __init iommu_vio_init(void)
 	struct iommu_table *t;
 	struct iommu_table_cb cb;
 	unsigned long cbp;
+	unsigned long itc_entries;
 
 	cb.itc_busno = 255;    /* Bus 255 is the virtual bus */
 	cb.itc_virtbus = 0xff; /* Ask for virtual bus */
@@ -165,12 +166,12 @@ void __init iommu_vio_init(void)
 	cbp = virt_to_abs(&cb);
 	HvCallXm_getTceTableParms(cbp);
 
-	veth_iommu_table.it_size        = cb.itc_size / 2;
+	itc_entries = cb.itc_size * PAGE_SIZE / sizeof(union tce_entry);
+	veth_iommu_table.it_size        = itc_entries / 2;
 	veth_iommu_table.it_busno       = cb.itc_busno;
 	veth_iommu_table.it_offset      = cb.itc_offset;
 	veth_iommu_table.it_index       = cb.itc_index;
 	veth_iommu_table.it_type        = TCE_VB;
-	veth_iommu_table.it_entrysize	= sizeof(union tce_entry);
 	veth_iommu_table.it_blocksize	= 1;
 
 	t = iommu_init_table(&veth_iommu_table);
@@ -178,13 +179,12 @@ void __init iommu_vio_init(void)
 	if (!t)
 		printk("Virtual Bus VETH TCE table failed.\n");
 
-	vio_iommu_table.it_size         = cb.itc_size - veth_iommu_table.it_size;
+	vio_iommu_table.it_size         = itc_entries - veth_iommu_table.it_size;
 	vio_iommu_table.it_busno        = cb.itc_busno;
 	vio_iommu_table.it_offset       = cb.itc_offset +
-		veth_iommu_table.it_size * (PAGE_SIZE/sizeof(union tce_entry));
+					  veth_iommu_table.it_size;
 	vio_iommu_table.it_index        = cb.itc_index;
 	vio_iommu_table.it_type         = TCE_VB;
-	vio_iommu_table.it_entrysize	= sizeof(union tce_entry);
 	vio_iommu_table.it_blocksize	= 1;
 
 	t = iommu_init_table(&vio_iommu_table);
@@ -511,7 +511,6 @@ static struct iommu_table * vio_build_io
 	unsigned int *dma_window;
 	struct iommu_table *newTceTable;
 	unsigned long offset;
-	unsigned long size;
 	int dma_window_property_size;
 
 	dma_window = (unsigned int *) get_property(dev->dev.platform_data, "ibm,my-dma-window", &dma_window_property_size);
@@ -521,21 +520,18 @@ static struct iommu_table * vio_build_io
 
 	newTceTable = (struct iommu_table *) kmalloc(sizeof(struct iommu_table), GFP_KERNEL);
 
-	size = ((dma_window[4] >> PAGE_SHIFT) << 3) >> PAGE_SHIFT;
-
 	/*  There should be some code to extract the phys-encoded offset
 		using prom_n_addr_cells(). However, according to a comment
 		on earlier versions, it's always zero, so we don't bother */
 	offset = dma_window[1] >>  PAGE_SHIFT;
 
-	/* TCE table size - measured in units of pages of tce table */
-	newTceTable->it_size		= size;
+	/* TCE table size - measured in tce entries */
+	newTceTable->it_size		= dma_window[4] >> PAGE_SHIFT;
 	/* offset for VIO should always be 0 */
 	newTceTable->it_offset		= offset;
 	newTceTable->it_busno		= 0;
 	newTceTable->it_index		= (unsigned long)dma_window[0];
 	newTceTable->it_type		= TCE_VB;
-	newTceTable->it_entrysize	= sizeof(union tce_entry);
 
 	return iommu_init_table(newTceTable);
 }
diff -puN arch/ppc64/kernel/iommu.c~iommu-cleanup arch/ppc64/kernel/iommu.c
--- linux-2.5/arch/ppc64/kernel/iommu.c~iommu-cleanup	2005-01-05 16:59:18.211153224 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/iommu.c	2005-01-05 16:59:18.312137872 -0600
@@ -87,7 +87,7 @@ static unsigned long iommu_range_alloc(s
 		start = largealloc ? tbl->it_largehint : tbl->it_hint;
 
 	/* Use only half of the table for small allocs (15 pages or less) */
-	limit = largealloc ? tbl->it_mapsize : tbl->it_halfpoint;
+	limit = largealloc ? tbl->it_size : tbl->it_halfpoint;
 
 	if (largealloc && start < tbl->it_halfpoint)
 		start = tbl->it_halfpoint;
@@ -114,7 +114,7 @@ static unsigned long iommu_range_alloc(s
 			 * Second failure, rescan the other half of the table.
 			 */
 			start = (largealloc ^ pass) ? tbl->it_halfpoint : 0;
-			limit = pass ? tbl->it_mapsize : limit;
+			limit = pass ? tbl->it_size : limit;
 			pass++;
 			goto again;
 		} else {
@@ -194,7 +194,7 @@ static void __iommu_free(struct iommu_ta
 	entry = dma_addr >> PAGE_SHIFT;
 	free_entry = entry - tbl->it_offset;
 
-	if (((free_entry + npages) > tbl->it_mapsize) ||
+	if (((free_entry + npages) > tbl->it_size) ||
 	    (entry < tbl->it_offset)) {
 		if (printk_ratelimit()) {
 			printk(KERN_INFO "iommu_free: invalid entry\n");
@@ -202,7 +202,7 @@ static void __iommu_free(struct iommu_ta
 			printk(KERN_INFO "\tdma_addr  = 0x%lx\n", (u64)dma_addr);
 			printk(KERN_INFO "\tTable     = 0x%lx\n", (u64)tbl);
 			printk(KERN_INFO "\tbus#      = 0x%lx\n", (u64)tbl->it_busno);
-			printk(KERN_INFO "\tmapsize   = 0x%lx\n", (u64)tbl->it_mapsize);
+			printk(KERN_INFO "\tsize      = 0x%lx\n", (u64)tbl->it_size);
 			printk(KERN_INFO "\tstartOff  = 0x%lx\n", (u64)tbl->it_offset);
 			printk(KERN_INFO "\tindex     = 0x%lx\n", (u64)tbl->it_index);
 			WARN_ON(1);
@@ -407,14 +407,11 @@ struct iommu_table *iommu_init_table(str
 	unsigned long sz;
 	static int welcomed = 0;
 
-	/* it_size is in pages, it_mapsize in number of entries */
-	tbl->it_mapsize = (tbl->it_size << PAGE_SHIFT) / tbl->it_entrysize;
-
 	/* Set aside 1/4 of the table for large allocations. */
-	tbl->it_halfpoint = tbl->it_mapsize * 3 / 4;
+	tbl->it_halfpoint = tbl->it_size * 3 / 4;
 
 	/* number of bytes needed for the bitmap */
-	sz = (tbl->it_mapsize + 7) >> 3;
+	sz = (tbl->it_size + 7) >> 3;
 
 	tbl->it_map = (unsigned long *)__get_free_pages(GFP_ATOMIC, get_order(sz));
 	if (!tbl->it_map)
@@ -448,8 +445,8 @@ void iommu_free_table(struct device_node
 	}
 
 	/* verify that table contains no entries */
-	/* it_mapsize is in entries, and we're examining 64 at a time */
-	for (i = 0; i < (tbl->it_mapsize/64); i++) {
+	/* it_size is in entries, and we're examining 64 at a time */
+	for (i = 0; i < (tbl->it_size/64); i++) {
 		if (tbl->it_map[i] != 0) {
 			printk(KERN_WARNING "%s: Unexpected TCEs for %s\n",
 				__FUNCTION__, dn->full_name);
@@ -458,7 +455,7 @@ void iommu_free_table(struct device_node
 	}
 
 	/* calculate bitmap size in bytes */
-	bitmap_sz = (tbl->it_mapsize + 7) / 8;
+	bitmap_sz = (tbl->it_size + 7) / 8;
 
 	/* free bitmap */
 	order = get_order(bitmap_sz);
diff -puN arch/ppc64/kernel/pci_direct_iommu.c~iommu-cleanup arch/ppc64/kernel/pci_direct_iommu.c
--- linux-2.5/arch/ppc64/kernel/pci_direct_iommu.c~iommu-cleanup	2005-01-05 16:59:18.215152616 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c	2005-01-05 16:59:18.312137872 -0600
@@ -78,7 +78,7 @@ static void pci_direct_unmap_sg(struct p
 {
 }
 
-void __init pci_dma_init_direct(void)
+void __init pci_direct_iommu_init(void)
 {
 	pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent;
 	pci_dma_ops.pci_free_consistent = pci_direct_free_consistent;
diff -puN arch/ppc64/kernel/Makefile~iommu-cleanup arch/ppc64/kernel/Makefile
--- linux-2.5/arch/ppc64/kernel/Makefile~iommu-cleanup	2005-01-05 16:59:18.219152008 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/Makefile	2005-01-05 16:59:18.313137720 -0600
@@ -16,7 +16,7 @@ obj-y               :=	setup.o entry.o t
 obj-$(CONFIG_PPC_OF) +=	of_device.o
 
 pci-obj-$(CONFIG_PPC_ISERIES)	+= iSeries_pci.o iSeries_pci_reset.o
-pci-obj-$(CONFIG_PPC_MULTIPLATFORM)	+= pci_dn.o pci_dma_direct.o
+pci-obj-$(CONFIG_PPC_MULTIPLATFORM)	+= pci_dn.o pci_direct_iommu.o
 
 obj-$(CONFIG_PCI)	+= pci.o pci_iommu.o iomap.o $(pci-obj-y)
 

_


From olof at austin.ibm.com  Thu Jan  6 11:26:31 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Wed, 05 Jan 2005 18:26:31 -0600
Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c
In-Reply-To: <20050105162409.5cc9087e.akpm@osdl.org>
References: <20050106000721.GA20029@austin.ibm.com>
	<20050105162409.5cc9087e.akpm@osdl.org>
Message-ID: <41DC85B7.1060000@austin.ibm.com>

Andrew Morton wrote:

>Olof Johansson <olof at austin.ibm.com> wrote:
>  
>
>>This is part of the iommu cleanup, but broken out as a separate patch
>> since for mainline, a BK rename is more appropriate. Still, we need a
>> patch to apply for non-BK-based trees (-mm)
>>    
>>
>
>It's not clear to me what this comment means.  Is this patch for upstream
>merging?
>
>bk is fairly good at detecting when a gnu patch is simply performing a
>rename and will convert it into a `bk mv'.
>
Ah, I didn't know that it was that clever. If so, it's good for upstream 
merging. Otherwise I would have recommended a manual bk mv upstream; 
that's what the comment referred to.


-Olof


From akpm at osdl.org  Thu Jan  6 11:24:09 2005
From: akpm at osdl.org (Andrew Morton)
Date: Wed, 5 Jan 2005 16:24:09 -0800
Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c
In-Reply-To: <20050106000721.GA20029@austin.ibm.com>
References: <20050106000721.GA20029@austin.ibm.com>
Message-ID: <20050105162409.5cc9087e.akpm@osdl.org>

Olof Johansson <olof at austin.ibm.com> wrote:
>
> This is part of the iommu cleanup, but broken out as a separate patch
>  since for mainline, a BK rename is more appropriate. Still, we need a
>  patch to apply for non-BK-based trees (-mm)

It's not clear to me what this comment means.  Is this patch for upstream
merging?

bk is fairly good at detecting when a gnu patch is simply performing a
rename and will convert it into a `bk mv'.


From paulus at samba.org  Thu Jan  6 14:14:44 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 6 Jan 2005 14:14:44 +1100
Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c
In-Reply-To: <20050106000721.GA20029@austin.ibm.com>
References: <20050106000721.GA20029@austin.ibm.com>
Message-ID: <16860.44324.493730.587567@cargo.ozlabs.ibm.com>

Olof Johansson writes:

> This patch renames pci_dma_direct.c to pci_direct_iommu.c to comply with
> the naming convention of the other iommu files.
> 
> This is part of the iommu cleanup, but broken out as a separate patch
> since for mainline, a BK rename is more appropriate. Still, we need a
> patch to apply for non-BK-based trees (-mm)
> 
> Signed-off-by: Olof Johansson <olof at austin.ibm.com>

Acked-by: Paul Mackerras <paulus at samba.org>


From paulus at samba.org  Thu Jan  6 14:15:12 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 6 Jan 2005 14:15:12 +1100
Subject: [PATCH] [PPC64] [2/2] IOMMU cleanups: Main cleanup patch
In-Reply-To: <20050106000735.GA20079@austin.ibm.com>
References: <20050106000735.GA20079@austin.ibm.com>
Message-ID: <16860.44352.242290.624886@cargo.ozlabs.ibm.com>

Olof Johansson writes:

> Earlier cleanup efforts of the ppc64 IOMMU code have mostly been targeted
> at simplifying the allocation schemes and modularising things for the
> various platforms. The IOMMU init functions are still a mess. This is
> an attempt to clean them up and make them somewhat easier to follow.
...
> Signed-off-by: Olof Johansson <olof at austin.ibm.com>

Acked-by: Paul Mackerras <paulus at samba.org>


From sfr at canb.auug.org.au  Thu Jan  6 14:51:02 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Thu, 6 Jan 2005 14:51:02 +1100
Subject: [PATCH] htab code cleanup
Message-ID: <20050106145102.0c3c60ad.sfr@canb.auug.org.au>

Hi all,

This patch just does some small cleanups on the hash page table code:
	- make htab_address static within hash_native.c
	- move some code that depended on CONFIG_PPC_MULTIPLATFORM
	  from hash_utils.c to hash_native.c (one less CONFIG check).
	- clean up includes in hash_utils.c

-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk/arch/ppc64/kernel/iSeries_setup.c linus-bk-sfr.14/arch/ppc64/kernel/iSeries_setup.c
--- linus-bk/arch/ppc64/kernel/iSeries_setup.c	2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14/arch/ppc64/kernel/iSeries_setup.c	2005-01-06 14:37:42.000000000 +1100
@@ -478,12 +478,6 @@
 	htab_hash_mask = num_ptegs - 1;
 	
 	/*
-	 * The actual hashed page table is in the hypervisor,
-	 * we have no direct access
-	 */
-	htab_address = NULL;
-
-	/*
 	 * Determine if absolute memory has any
 	 * holes so that we can interpret the
 	 * access map we get back from the hypervisor
diff -ruN linus-bk/arch/ppc64/kernel/setup.c linus-bk-sfr.14/arch/ppc64/kernel/setup.c
--- linus-bk/arch/ppc64/kernel/setup.c	2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14/arch/ppc64/kernel/setup.c	2005-01-06 14:37:54.000000000 +1100
@@ -673,7 +673,6 @@
 			ppc64_caches.dline_size);
 	printk("ppc64_caches.icache_line_size = 0x%x\n",
 			ppc64_caches.iline_size);
-	printk("htab_address                  = 0x%p\n", htab_address);
 	printk("htab_hash_mask                = 0x%lx\n", htab_hash_mask);
 	printk("-----------------------------------------------------\n");
 
diff -ruN linus-bk/arch/ppc64/mm/hash_native.c linus-bk-sfr.14/arch/ppc64/mm/hash_native.c
--- linus-bk/arch/ppc64/mm/hash_native.c	2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14/arch/ppc64/mm/hash_native.c	2005-01-06 14:37:14.000000000 +1100
@@ -9,6 +9,7 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  */
+#include <linux/config.h>
 #include <linux/spinlock.h>
 #include <linux/bitops.h>
 #include <linux/threads.h>
@@ -22,6 +23,15 @@
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include <asm/cputable.h>
+#include <asm/ppcdebug.h>
+
+#ifdef DEBUG
+#define DBG(fmt...) udbg_printf(fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+static HPTE *htab_address;
 
 #define HPTE_LOCK_BIT 3
 
@@ -410,6 +420,173 @@
 }
 #endif
 
+/*
+ * Note:  pte   --> Linux PTE
+ *        HPTE  --> PowerPC Hashed Page Table Entry
+ *
+ * Execution context:
+ *   htab_initialize is called with the MMU off (of course), but
+ *   the kernel has been copied down to zero so it can directly
+ *   reference global data.  At this point it is very difficult
+ *   to print debug info.
+ *
+ */
+
+#ifdef CONFIG_U3_DART
+extern unsigned long dart_tablebase;
+#endif /* CONFIG_U3_DART */
+extern unsigned long _SDR1;
+
+#define KB (1024)
+#define MB (1024*KB)
+
+static inline void loop_forever(void)
+{
+	volatile unsigned long x = 1;
+	for(;x;x|=1)
+		;
+}
+
+static inline void create_pte_mapping(unsigned long start, unsigned long end,
+				      unsigned long mode, int large)
+{
+	unsigned long addr;
+	unsigned int step;
+
+	if (large)
+		step = 16*MB;
+	else
+		step = 4*KB;
+
+	for (addr = start; addr < end; addr += step) {
+		unsigned long vpn, hash, hpteg;
+		unsigned long vsid = get_kernel_vsid(addr);
+		unsigned long va = (vsid << 28) | (addr & 0xfffffff);
+		int ret;
+
+		if (large)
+			vpn = va >> HPAGE_SHIFT;
+		else
+			vpn = va >> PAGE_SHIFT;
+
+		hash = hpt_hash(vpn, large);
+
+		hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
+
+#ifdef CONFIG_PPC_PSERIES
+		if (systemcfg->platform & PLATFORM_LPAR)
+			ret = pSeries_lpar_hpte_insert(hpteg, va,
+				virt_to_abs(addr) >> PAGE_SHIFT,
+				0, mode, 1, large);
+		else
+#endif /* CONFIG_PPC_PSERIES */
+			ret = native_hpte_insert(hpteg, va,
+				virt_to_abs(addr) >> PAGE_SHIFT,
+				0, mode, 1, large);
+
+		if (ret == -1) {
+			ppc64_terminate_msg(0x20, "create_pte_mapping");
+			loop_forever();
+		}
+	}
+}
+
+void __init htab_initialize(void)
+{
+	unsigned long table, htab_size_bytes;
+	unsigned long pteg_count;
+	unsigned long mode_rw;
+	int i, use_largepages = 0;
+
+	DBG(" -> htab_initialize()\n");
+
+	/*
+	 * Calculate the required size of the htab.  We want the number of
+	 * PTEGs to equal one half the number of real pages.
+	 */ 
+	htab_size_bytes = 1UL << ppc64_pft_size;
+	pteg_count = htab_size_bytes >> 7;
+
+	/* For debug, make the HTAB 1/8 as big as it normally would be. */
+	ifppcdebug(PPCDBG_HTABSIZE) {
+		pteg_count >>= 3;
+		htab_size_bytes = pteg_count << 7;
+	}
+
+	htab_hash_mask = pteg_count - 1;
+
+	if (systemcfg->platform & PLATFORM_LPAR) {
+		/* Using a hypervisor which owns the htab */
+		htab_address = NULL;
+		_SDR1 = 0; 
+	} else {
+		/* Find storage for the HPT.  Must be contiguous in
+		 * the absolute address space.
+		 */
+		table = lmb_alloc(htab_size_bytes, htab_size_bytes);
+
+		DBG("Hash table allocated at %lx, size: %lx\n", table,
+		    htab_size_bytes);
+
+		if ( !table ) {
+			ppc64_terminate_msg(0x20, "hpt space");
+			loop_forever();
+		}
+		htab_address = abs_to_virt(table);
+
+		/* htab absolute addr + encoded htabsize */
+		_SDR1 = table + __ilog2(pteg_count) - 11;
+
+		/* Initialize the HPT with no entries */
+		memset((void *)table, 0, htab_size_bytes);
+	}
+
+	mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX;
+
+	/* On U3 based machines, we need to reserve the DART area and
+	 * _NOT_ map it to avoid cache paradoxes as it's remapped non
+	 * cacheable later on
+	 */
+	if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE)
+		use_largepages = 1;
+
+	/* create bolted the linear mapping in the hash table */
+	for (i=0; i < lmb.memory.cnt; i++) {
+		unsigned long base, size;
+
+		base = lmb.memory.region[i].physbase + KERNELBASE;
+		size = lmb.memory.region[i].size;
+
+		DBG("creating mapping for region: %lx : %lx\n", base, size);
+
+#ifdef CONFIG_U3_DART
+		/* Do not map the DART space. Fortunately, it will be aligned
+		 * in such a way that it will not cross two lmb regions and will
+		 * fit within a single 16Mb page.
+		 * The DART space is assumed to be a full 16Mb region even if we
+		 * only use 2Mb of that space. We will use more of it later for
+		 * AGP GART. We have to use a full 16Mb large page.
+		 */
+		DBG("DART base: %lx\n", dart_tablebase);
+
+		if (dart_tablebase != 0 && dart_tablebase >= base
+		    && dart_tablebase < (base + size)) {
+			if (base != dart_tablebase)
+				create_pte_mapping(base, dart_tablebase, mode_rw,
+						   use_largepages);
+			if ((base + size) > (dart_tablebase + 16*MB))
+				create_pte_mapping(dart_tablebase + 16*MB, base + size,
+						   mode_rw, use_largepages);
+			continue;
+		}
+#endif /* CONFIG_U3_DART */
+		create_pte_mapping(base, base + size, mode_rw, use_largepages);
+	}
+	DBG(" <- htab_initialize()\n");
+}
+#undef KB
+#undef MB
+
 void hpte_init_native(void)
 {
 	ppc_md.hpte_invalidate	= native_hpte_invalidate;
diff -ruN linus-bk/arch/ppc64/mm/hash_utils.c linus-bk-sfr.14/arch/ppc64/mm/hash_utils.c
--- linus-bk/arch/ppc64/mm/hash_utils.c	2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14/arch/ppc64/mm/hash_utils.c	2005-01-06 14:37:27.000000000 +1100
@@ -17,220 +17,29 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  */
-
-#undef DEBUG
-
-#include <linux/config.h>
-#include <linux/spinlock.h>
-#include <linux/errno.h>
+#include <linux/mm.h>
+#include <linux/bitops.h>
+#include <linux/page-flags.h>
 #include <linux/sched.h>
-#include <linux/proc_fs.h>
-#include <linux/stat.h>
-#include <linux/sysctl.h>
-#include <linux/ctype.h>
-#include <linux/cache.h>
-#include <linux/init.h>
+#include <linux/cpumask.h>
+#include <linux/smp.h>
+#include <linux/compiler.h>
+#include <linux/percpu.h>
 #include <linux/signal.h>
 
-#include <asm/ppcdebug.h>
 #include <asm/processor.h>
 #include <asm/pgtable.h>
 #include <asm/mmu.h>
 #include <asm/mmu_context.h>
 #include <asm/page.h>
-#include <asm/types.h>
 #include <asm/system.h>
-#include <asm/uaccess.h>
 #include <asm/machdep.h>
-#include <asm/lmb.h>
-#include <asm/abs_addr.h>
 #include <asm/tlbflush.h>
-#include <asm/io.h>
-#include <asm/eeh.h>
-#include <asm/tlb.h>
 #include <asm/cacheflush.h>
-#include <asm/cputable.h>
-#include <asm/abs_addr.h>
-
-#ifdef DEBUG
-#define DBG(fmt...) udbg_printf(fmt)
-#else
-#define DBG(fmt...)
-#endif
-
-/*
- * Note:  pte   --> Linux PTE
- *        HPTE  --> PowerPC Hashed Page Table Entry
- *
- * Execution context:
- *   htab_initialize is called with the MMU off (of course), but
- *   the kernel has been copied down to zero so it can directly
- *   reference global data.  At this point it is very difficult
- *   to print debug info.
- *
- */
-
-#ifdef CONFIG_U3_DART
-extern unsigned long dart_tablebase;
-#endif /* CONFIG_U3_DART */
+#include <asm/ptrace.h>
 
-HPTE		*htab_address;
 unsigned long	htab_hash_mask;
 
-extern unsigned long _SDR1;
-
-#define KB (1024)
-#define MB (1024*KB)
-
-static inline void loop_forever(void)
-{
-	volatile unsigned long x = 1;
-	for(;x;x|=1)
-		;
-}
-
-#ifdef CONFIG_PPC_MULTIPLATFORM
-static inline void create_pte_mapping(unsigned long start, unsigned long end,
-				      unsigned long mode, int large)
-{
-	unsigned long addr;
-	unsigned int step;
-
-	if (large)
-		step = 16*MB;
-	else
-		step = 4*KB;
-
-	for (addr = start; addr < end; addr += step) {
-		unsigned long vpn, hash, hpteg;
-		unsigned long vsid = get_kernel_vsid(addr);
-		unsigned long va = (vsid << 28) | (addr & 0xfffffff);
-		int ret;
-
-		if (large)
-			vpn = va >> HPAGE_SHIFT;
-		else
-			vpn = va >> PAGE_SHIFT;
-
-		hash = hpt_hash(vpn, large);
-
-		hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
-
-#ifdef CONFIG_PPC_PSERIES
-		if (systemcfg->platform & PLATFORM_LPAR)
-			ret = pSeries_lpar_hpte_insert(hpteg, va,
-				virt_to_abs(addr) >> PAGE_SHIFT,
-				0, mode, 1, large);
-		else
-#endif /* CONFIG_PPC_PSERIES */
-			ret = native_hpte_insert(hpteg, va,
-				virt_to_abs(addr) >> PAGE_SHIFT,
-				0, mode, 1, large);
-
-		if (ret == -1) {
-			ppc64_terminate_msg(0x20, "create_pte_mapping");
-			loop_forever();
-		}
-	}
-}
-
-void __init htab_initialize(void)
-{
-	unsigned long table, htab_size_bytes;
-	unsigned long pteg_count;
-	unsigned long mode_rw;
-	int i, use_largepages = 0;
-
-	DBG(" -> htab_initialize()\n");
-
-	/*
-	 * Calculate the required size of the htab.  We want the number of
-	 * PTEGs to equal one half the number of real pages.
-	 */ 
-	htab_size_bytes = 1UL << ppc64_pft_size;
-	pteg_count = htab_size_bytes >> 7;
-
-	/* For debug, make the HTAB 1/8 as big as it normally would be. */
-	ifppcdebug(PPCDBG_HTABSIZE) {
-		pteg_count >>= 3;
-		htab_size_bytes = pteg_count << 7;
-	}
-
-	htab_hash_mask = pteg_count - 1;
-
-	if (systemcfg->platform & PLATFORM_LPAR) {
-		/* Using a hypervisor which owns the htab */
-		htab_address = NULL;
-		_SDR1 = 0; 
-	} else {
-		/* Find storage for the HPT.  Must be contiguous in
-		 * the absolute address space.
-		 */
-		table = lmb_alloc(htab_size_bytes, htab_size_bytes);
-
-		DBG("Hash table allocated at %lx, size: %lx\n", table,
-		    htab_size_bytes);
-
-		if ( !table ) {
-			ppc64_terminate_msg(0x20, "hpt space");
-			loop_forever();
-		}
-		htab_address = abs_to_virt(table);
-
-		/* htab absolute addr + encoded htabsize */
-		_SDR1 = table + __ilog2(pteg_count) - 11;
-
-		/* Initialize the HPT with no entries */
-		memset((void *)table, 0, htab_size_bytes);
-	}
-
-	mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX;
-
-	/* On U3 based machines, we need to reserve the DART area and
-	 * _NOT_ map it to avoid cache paradoxes as it's remapped non
-	 * cacheable later on
-	 */
-	if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE)
-		use_largepages = 1;
-
-	/* create bolted the linear mapping in the hash table */
-	for (i=0; i < lmb.memory.cnt; i++) {
-		unsigned long base, size;
-
-		base = lmb.memory.region[i].physbase + KERNELBASE;
-		size = lmb.memory.region[i].size;
-
-		DBG("creating mapping for region: %lx : %lx\n", base, size);
-
-#ifdef CONFIG_U3_DART
-		/* Do not map the DART space. Fortunately, it will be aligned
-		 * in such a way that it will not cross two lmb regions and will
-		 * fit within a single 16Mb page.
-		 * The DART space is assumed to be a full 16Mb region even if we
-		 * only use 2Mb of that space. We will use more of it later for
-		 * AGP GART. We have to use a full 16Mb large page.
-		 */
-		DBG("DART base: %lx\n", dart_tablebase);
-
-		if (dart_tablebase != 0 && dart_tablebase >= base
-		    && dart_tablebase < (base + size)) {
-			if (base != dart_tablebase)
-				create_pte_mapping(base, dart_tablebase, mode_rw,
-						   use_largepages);
-			if ((base + size) > (dart_tablebase + 16*MB))
-				create_pte_mapping(dart_tablebase + 16*MB, base + size,
-						   mode_rw, use_largepages);
-			continue;
-		}
-#endif /* CONFIG_U3_DART */
-		create_pte_mapping(base, base + size, mode_rw, use_largepages);
-	}
-	DBG(" <- htab_initialize()\n");
-}
-#undef KB
-#undef MB
-#endif /* CONFIG_PPC_MULTIPLATFORM */
-
 /*
  * Called by asm hashtable.S for doing lazy icache flush
  */
diff -ruN linus-bk/include/asm-ppc64/mmu.h linus-bk-sfr.14/include/asm-ppc64/mmu.h
--- linus-bk/include/asm-ppc64/mmu.h	2005-01-05 17:06:08.000000000 +1100
+++ linus-bk-sfr.14/include/asm-ppc64/mmu.h	2005-01-06 14:36:16.000000000 +1100
@@ -98,7 +98,6 @@
 #define PP_RXRX 3	/* Supervisor read,       User read */
 
 
-extern HPTE *		htab_address;
 extern unsigned long	htab_hash_mask;
 
 static inline unsigned long hpt_hash(unsigned long vpn, int large)

From WJEEHA at pk.ibm.com  Thu Jan  6 22:48:27 2005
From: WJEEHA at pk.ibm.com (Wjeeha Tahir)
Date: Thu, 6 Jan 2005 16:48:27 +0500
Subject: IBM 6400 Printer Driver
Message-ID: <OFBAF0A6A8.DC4240A6-ON45256F81.00404D87-45256F81.004074CD@pk.ibm.com>

Hi,

I need the drivers for the IBM 6400 Line Printer for RedHat Linux 9 and any 
configuration/installation documentation (if possible). I am hoping that this 
forum can help me find them. Thanks in advance,

Kind Regards,
Wjeeha Tahir

From anton at samba.org  Thu Jan  6 23:19:24 2005
From: anton at samba.org (Anton Blanchard)
Date: Thu, 6 Jan 2005 23:19:24 +1100
Subject: [PATCH] xmon breakpoints fix for Power4/5
In-Reply-To: <20050105084202.5102b467@localhost>
References: <20050104143031.62c25338@localhost>
	<16859.11390.511469.875831@cargo.ozlabs.ibm.com>
	<20050105084202.5102b467@localhost>
Message-ID: <20050106121924.GA14239@krispykreme.ozlabs.ibm.com>


> I may have misunderstood what Anton wanted when I talked w/ him
> yesterday, but I was under the impression that he wanted 'bi' and 'bd'
> fixed for Power4/5/LPAR.  

Yep sorry, my fault. I was interested in the data breakpoint stuff you
had written that went through the hypervisor.

Anton


From haveblue at us.ibm.com  Fri Jan  7 03:45:09 2005
From: haveblue at us.ibm.com (Dave Hansen)
Date: Thu, 06 Jan 2005 08:45:09 -0800
Subject: IBM 6400 Printer Driver
In-Reply-To: <OFBAF0A6A8.DC4240A6-ON45256F81.00404D87-45256F81.004074CD@pk.ibm.com>
References: <OFBAF0A6A8.DC4240A6-ON45256F81.00404D87-45256F81.004074CD@pk.ibm.com>
Message-ID: <1105029909.6932.2.camel@localhost>

On Thu, 2005-01-06 at 16:48 +0500, Wjeeha Tahir wrote:
> I need the drivers for IBM 6400 Line Printer for RedHat Linux 9 and
> any configuration/installation document (if possible). I am hoping
> that this forum would help me find the desired. Thanks in advance,

Does the printer have a ppc64 chip in it and run Linux?

-- Dave


From hch at lst.de  Fri Jan  7 03:47:19 2005
From: hch at lst.de (Christoph Hellwig)
Date: Thu, 6 Jan 2005 17:47:19 +0100
Subject: [PATCH] fix pktcdvd linking on ppc64
Message-ID: <20050106164719.GA24751@lst.de>

clear_page uses ppc64_caches, so ppc64_caches needs to be exported for modules such as pktcdvd.


--- 1.99/arch/ppc64/kernel/setup.c	2005-01-05 03:48:16 +01:00
+++ edited/arch/ppc64/kernel/setup.c	2005-01-06 17:51:19 +01:00
@@ -116,6 +116,7 @@
 u64 ppc64_debug_switch;
 
 struct ppc64_caches ppc64_caches;
+EXPORT_SYMBOL_GPL(ppc64_caches);
 
 /*
  * These are used in binfmt_elf.c to put aux entries on the stack


From markus at unixforces.net  Fri Jan  7 04:55:01 2005
From: markus at unixforces.net (Markus Rothe)
Date: Thu, 6 Jan 2005 18:55:01 +0100
Subject: Problems using Apple LCD with 2.6.10
Message-ID: <20050106175501.GA11534@unixforces.net>

Hi,

I'm not sure if this is the correct place for such mails, but I didn't
find another place to post my problem.

My problem is that my LCD doesn't work correctly with the latest (2.6.10)
kernel. It's an Apple Cinema Display connected through the Apple Display
Connector (ADC). There are many "blue lightnings" all over the display; by
"blue lightning" I mean a small set of pixels that turn light blue for about
half a second. The display also flickers from time to time. Both happen in
console mode and when I run Xorg.

This problem is definetly related to the kernel as it does not occure with
kernel 2.6.9.

Markus

From linas at austin.ibm.com  Fri Jan  7 06:24:13 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 6 Jan 2005 13:24:13 -0600
Subject: [PATCH] PPC64: EEH Recovery
Message-ID: <20050106192413.GK22274@austin.ibm.com>


Hi Paul,

The patch below implements hotplug-style EEH error recovery.
It's split into two pieces: a part that needs to be applied to the
PPC64 arch tree, and a part that needs to be applied to the
RPA PHP hotplug tree. The PPC64 part needs to go in first.

Assuming this doesn't generate a round of discussion, please
forward upstream to akpm/torvalds.
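One hunk in the eeh.c part fixes the descent test in pci_addr_cache_insert(): a new range [alo, ahi] belongs in the left subtree only if it ends before the node's range starts (ahi < addr_lo), and in the right subtree only if it starts after the node's range ends (alo > addr_hi); anything else overlaps an existing entry. A minimal userspace sketch of that ordering, using a plain unbalanced binary tree instead of the kernel rbtree; range_node, range_insert() and range_lookup() are hypothetical names.

```c
#include <stdlib.h>

/* One cached I/O range; ranges stored in the tree never overlap. */
struct range_node {
	unsigned long lo, hi;
	int owner;                      /* stand-in for the pci_dev pointer */
	struct range_node *left, *right;
};

/* Insert [lo, hi]. Returns the existing node on overlap, the new node
 * otherwise, or NULL if allocation fails. */
static struct range_node *range_insert(struct range_node **root,
				       unsigned long lo, unsigned long hi,
				       int owner)
{
	struct range_node **p = root;
	struct range_node *n;

	while (*p) {
		n = *p;
		if (hi < n->lo)
			p = &n->left;   /* entirely below this range */
		else if (lo > n->hi)
			p = &n->right;  /* entirely above this range */
		else
			return n;       /* overlaps an existing range */
	}
	n = malloc(sizeof(*n));
	if (!n)
		return NULL;
	n->lo = lo;
	n->hi = hi;
	n->owner = owner;
	n->left = n->right = NULL;
	*p = n;
	return n;
}

/* Find the range containing addr, as an MMIO address lookup would. */
static struct range_node *range_lookup(struct range_node *n,
				       unsigned long addr)
{
	while (n) {
		if (addr < n->lo)
			n = n->left;
		else if (addr > n->hi)
			n = n->right;
		else
			return n;
	}
	return NULL;
}
```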

Signed-off-by: Linas Vepstas <linas at linas.org>


-------------- next part --------------
===== arch/ppc64/kernel/eeh.c 1.41 vs edited =====
--- 1.41/arch/ppc64/kernel/eeh.c	2005-01-06 13:05:42 -06:00
+++ edited/arch/ppc64/kernel/eeh.c	2005-01-06 13:08:03 -06:00
@@ -17,21 +17,19 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
 
-#include <linux/bootmem.h>
+#include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/list.h>
-#include <linux/mm.h>
 #include <linux/notifier.h>
 #include <linux/pci.h>
 #include <linux/proc_fs.h>
 #include <linux/rbtree.h>
 #include <linux/seq_file.h>
-#include <linux/spinlock.h>
+#include <asm/atomic.h>
 #include <asm/eeh.h>
 #include <asm/io.h>
 #include <asm/machdep.h>
 #include <asm/rtas.h>
-#include <asm/atomic.h>
 #include "pci.h"
 
 #undef DEBUG
@@ -89,7 +87,6 @@ static struct notifier_block *eeh_notifi
  * attempts we allow before panicking.
  */
 #define EEH_MAX_FAILS	1000
-static atomic_t eeh_fail_count;
 
 /* RTAS tokens */
 static int ibm_set_eeh_option;
@@ -106,6 +103,10 @@ static spinlock_t slot_errbuf_lock = SPI
 static int eeh_error_buf_size;
 
 /* System monitoring statistics */
+static DEFINE_PER_CPU(unsigned long, no_device);
+static DEFINE_PER_CPU(unsigned long, no_dn);
+static DEFINE_PER_CPU(unsigned long, no_cfg_addr);
+static DEFINE_PER_CPU(unsigned long, ignored_check);
 static DEFINE_PER_CPU(unsigned long, total_mmio_ffs);
 static DEFINE_PER_CPU(unsigned long, false_positives);
 static DEFINE_PER_CPU(unsigned long, ignored_failures);
@@ -224,9 +225,9 @@ pci_addr_cache_insert(struct pci_dev *de
 	while (*p) {
 		parent = *p;
 		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
-		if (alo < piar->addr_lo) {
+		if (ahi < piar->addr_lo) {
 			p = &parent->rb_left;
-		} else if (ahi > piar->addr_hi) {
+		} else if (alo > piar->addr_hi) {
 			p = &parent->rb_right;
 		} else {
 			if (dev != piar->pcidev ||
@@ -244,6 +245,11 @@ pci_addr_cache_insert(struct pci_dev *de
 	piar->addr_hi = ahi;
 	piar->pcidev = dev;
 	piar->flags = flags;
+	
+#ifdef DEBUG 
+	printk (KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", 
+	               alo, ahi, pci_name (dev));
+#endif
 
 	rb_link_node(&piar->rb_node, parent, p);
 	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
@@ -368,6 +374,7 @@ void pci_addr_cache_remove_device(struct
  */
 void __init pci_addr_cache_build(void)
 {
+	struct device_node *dn;
 	struct pci_dev *dev = NULL;
 
 	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
@@ -378,6 +385,14 @@ void __init pci_addr_cache_build(void)
 			continue;
 		}
 		pci_addr_cache_insert_device(dev);
+		
+		/* Save the BARs; firmware doesn't restore these after EEH reset */
+		dn = pci_device_to_OF_node(dev);
+		if (dn) {
+			int i;
+			for (i = 0; i < 16; i++) 
+				pci_read_config_dword(dev, i * 4, &dn->config_space[i]);
+		}
 	}
 
 #ifdef DEBUG
@@ -389,6 +404,32 @@ void __init pci_addr_cache_build(void)
 /* --------------------------------------------------------------- */
 /* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */
 
+void eeh_slot_error_detail (struct device_node *dn, int severity)
+{
+	unsigned long flags;
+	int rc;
+
+	if (!dn) return;
+
+	/* Log the error with the rtas logger */
+	spin_lock_irqsave(&slot_errbuf_lock, flags);
+	memset(slot_errbuf, 0, eeh_error_buf_size);
+
+	rc = rtas_call(ibm_slot_error_detail,
+	               8, 1, NULL, dn->eeh_config_addr,
+	               BUID_HI(dn->phb->buid),
+	               BUID_LO(dn->phb->buid), NULL, 0,
+	               virt_to_phys(slot_errbuf),
+	               eeh_error_buf_size,
+	               severity);
+
+	if (rc == 0)
+		log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0);
+	spin_unlock_irqrestore(&slot_errbuf_lock, flags);
+}
+
+EXPORT_SYMBOL(eeh_slot_error_detail);
+
 /**
  * eeh_register_notifier - Register to find out about EEH events.
  * @nb: notifier block to callback on events
@@ -484,11 +525,9 @@ static void eeh_event_handler(void *dumm
 		       "%s %s\n", event->reset_state,
 		       pci_name(event->dev), pci_pretty_name(event->dev));
 
-		atomic_set(&eeh_fail_count, 0);
-		notifier_call_chain (&eeh_notifier_chain,
-				     EEH_NOTIFY_FREEZE, event);
-
 		__get_cpu_var(slot_resets)++;
+		notifier_call_chain (&eeh_notifier_chain,
+		           EEH_NOTIFY_FREEZE, event);
 
 		pci_dev_put(event->dev);
 		kfree(event);
@@ -496,8 +535,8 @@ static void eeh_event_handler(void *dumm
 }
 
 /**
- * eeh_token_to_phys - convert EEH address token to phys address
- * @token i/o token, should be address in the form 0xE....
+ * eeh_token_to_phys - convert I/O address to phys address
+ * @token i/o address, should be address in the form 0xA....
  */
 static inline unsigned long eeh_token_to_phys(unsigned long token)
 {
@@ -512,6 +551,17 @@ static inline unsigned long eeh_token_to
 	return pa | (token & (PAGE_SIZE-1));
 }
 
+static inline struct pci_dev * eeh_get_pci_dev(struct device_node *dn)
+{
+	struct pci_dev *dev = NULL;
+
+	for_each_pci_dev(dev) {
+		if (pci_device_to_OF_node(dev) == dn)
+			return dev;
+	}
+	return NULL;
+}
+
 /**
  * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze
  * @dn device node
@@ -532,7 +582,7 @@ int eeh_dn_check_failure(struct device_n
 	int ret;
 	int rets[3];
 	unsigned long flags;
-	int rc, reset_state;
+	int reset_state;
 	struct eeh_event  *event;
 
 	__get_cpu_var(total_mmio_ffs)++;
@@ -540,16 +590,20 @@ int eeh_dn_check_failure(struct device_n
 	if (!eeh_subsystem_enabled)
 		return 0;
 
-	if (!dn)
+	if (!dn) {
+		__get_cpu_var(no_dn)++;
 		return 0;
+	}
 
 	/* Access to IO BARs might get this far and still not want checking. */
 	if (!(dn->eeh_mode & EEH_MODE_SUPPORTED) ||
 	    dn->eeh_mode & EEH_MODE_NOCHECK) {
+		__get_cpu_var(ignored_check)++;
 		return 0;
 	}
 
 	if (!dn->eeh_config_addr) {
+		__get_cpu_var(no_cfg_addr)++;
 		return 0;
 	}
 
@@ -558,8 +612,9 @@ int eeh_dn_check_failure(struct device_n
 	 * slot, we know it's bad already, we don't need to check...
 	 */
 	if (dn->eeh_mode & EEH_MODE_ISOLATED) {
-		atomic_inc(&eeh_fail_count);
-		if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) {
+		dn->eeh_freeze_count ++;
+		if (dn->eeh_freeze_count >= EEH_MAX_FAILS) {
+			dump_stack();
 			/* re-read the slot reset state */
 			if (read_slot_reset_state(dn, rets) != 0)
 				rets[0] = -1;	/* reset state unknown */
@@ -581,34 +636,25 @@ int eeh_dn_check_failure(struct device_n
 		return 0;
 	}
 
-	/* prevent repeated reports of this failure */
+	/* Prevent repeated reports of this failure */
 	dn->eeh_mode |= EEH_MODE_ISOLATED;
 
 	reset_state = rets[0];
+	/* Log the error with the rtas logger */
+	if (dn->eeh_freeze_count < EEH_MAX_ALLOWED_FREEZES) {
+		eeh_slot_error_detail (dn, 1 /* Temporary Error */);
+	} else {
+		eeh_slot_error_detail (dn, 2 /* Permanent Error */);
+	}
 
-	spin_lock_irqsave(&slot_errbuf_lock, flags);
-	memset(slot_errbuf, 0, eeh_error_buf_size);
-
-	rc = rtas_call(ibm_slot_error_detail,
-	               8, 1, NULL, dn->eeh_config_addr,
-	               BUID_HI(dn->phb->buid),
-	               BUID_LO(dn->phb->buid), NULL, 0,
-	               virt_to_phys(slot_errbuf),
-	               eeh_error_buf_size,
-	               1 /* Temporary Error */);
-
-	if (rc == 0)
-		log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0);
-	spin_unlock_irqrestore(&slot_errbuf_lock, flags);
-
-	printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n",
-	       rets[0], dn->name, dn->full_name);
 	event = kmalloc(sizeof(*event), GFP_ATOMIC);
 	if (event == NULL) {
-		eeh_panic(dev, reset_state);
+		printk (KERN_ERR "EEH: out of memory, event not handled\n");
 		return 1;
  	}
 
+	if (!dev)
+		dev = eeh_get_pci_dev (dn);
 	event->dev = dev;
 	event->dn = dn;
 	event->reset_state = reset_state;
@@ -634,7 +680,6 @@ EXPORT_SYMBOL(eeh_dn_check_failure);
  * @token i/o token, should be address in the form 0xA....
  * @val value, should be all 1's (XXX why do we need this arg??)
  *
- * Check for an eeh failure at the given token address.
  * Check for an EEH failure at the given token address.  Call this
  * routine if the result of a read was all 0xff's and you want to
  * find out if this is due to an EEH slot freeze event.  This routine
@@ -642,6 +687,7 @@ EXPORT_SYMBOL(eeh_dn_check_failure);
  *
  * Note this routine is safe to call in an interrupt context.
  */
+
 unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
 {
 	unsigned long addr;
@@ -651,8 +697,10 @@ unsigned long eeh_check_failure(const vo
 	/* Finding the phys addr + pci device; this is pretty quick. */
 	addr = eeh_token_to_phys((unsigned long __force) token);
 	dev = pci_get_device_by_addr(addr);
-	if (!dev)
+	if (!dev) {
+		__get_cpu_var(no_device)++;
 		return val;
+	}
 
 	dn = pci_device_to_OF_node(dev);
 	eeh_dn_check_failure (dn, dev);
@@ -663,6 +711,172 @@ unsigned long eeh_check_failure(const vo
 
 EXPORT_SYMBOL(eeh_check_failure);
 
+/* ------------------------------------------------------------- */
+/* The code below deals with error recovery */
+
+void
+rtas_set_slot_reset(struct device_node *dn)
+{
+	int token = rtas_token ("ibm,set-slot-reset");
+	int rc;
+
+	if (token == RTAS_UNKNOWN_SERVICE)
+		return;
+	rc = rtas_call(token,4,1, NULL,
+	               dn->eeh_config_addr,
+	               BUID_HI(dn->phb->buid),
+	               BUID_LO(dn->phb->buid),
+	               1);
+	if (rc) {
+		printk (KERN_WARNING "EEH: Unable to reset the failed slot\n");
+		return;
+	}
+	
+	/* The PCI bus requires that the reset be asserted for at least
+	 * 100 milliseconds. We wait a bit longer 'just in case'.
+	 */
+	msleep (200);
+	
+	rc = rtas_call(token,4,1, NULL,
+	               dn->eeh_config_addr,
+	               BUID_HI(dn->phb->buid),
+	               BUID_LO(dn->phb->buid),
+	               0);
+}
+
+EXPORT_SYMBOL(rtas_set_slot_reset);
+
+void
+rtas_configure_bridge(struct device_node *dn)
+{
+	int token = rtas_token ("ibm,configure-bridge");
+	int rc;
+
+	if (token == RTAS_UNKNOWN_SERVICE)
+		return;
+	rc = rtas_call(token,3,1, NULL,
+	               dn->eeh_config_addr,
+	               BUID_HI(dn->phb->buid),
+	               BUID_LO(dn->phb->buid));
+	if (rc) {
+		printk (KERN_WARNING "EEH: Unable to configure device bridge\n");
+	}
+}
+
+EXPORT_SYMBOL(rtas_configure_bridge);
+
+/* ------------------------------------------------------- */
+/** Save and restore of PCI BARs
+ *
+ * Although firmware sets up BARs during boot, it does not
+ * set up device BARs again after a device reset; it will,
+ * if requested, set up bridge configuration. Thus, we need to
+ * configure the PCI devices ourselves.  Config-space setup is
+ * stored in the PCI structures which are normally deleted during
+ * device removal.  Thus, the "save" routine references the
+ * structures so that they aren't deleted.
+ */
+
+
+struct eeh_cfg_tree
+{
+	struct eeh_cfg_tree *sibling;
+	struct eeh_cfg_tree *child;
+	struct device_node *dn;
+	int is_bridge;
+};
+
+/** 
+ * eeh_save_bars - save the PCI config space info
+ */
+struct eeh_cfg_tree * eeh_save_bars(struct device_node *dn)
+{
+	struct pci_dev *dev;
+	struct eeh_cfg_tree *cnode;
+
+	dev = eeh_get_pci_dev(dn);
+	if (!dev)
+		return NULL;
+	
+	cnode = kmalloc(sizeof(struct eeh_cfg_tree), GFP_KERNEL);
+	if (!cnode) 
+		return NULL;
+	
+	cnode->is_bridge = 0;
+	
+	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) 
+		cnode->is_bridge = 1;
+			  
+	of_node_get(dn);
+	cnode->dn = dn;
+	
+	cnode->sibling = NULL;
+	cnode->child = NULL;
+
+	if (dn->child) {
+		cnode->child = eeh_save_bars (dn->child);
+	}
+	if (dn->sibling) {
+		cnode->sibling = eeh_save_bars (dn->sibling);
+	}
+
+	return cnode;
+}
+EXPORT_SYMBOL(eeh_save_bars);
+
+/**
+ * __restore_bars - Restore the Base Address Registers
+ * Loads the PCI configuration space base address registers, 
+ * the expansion ROM base address, the latency timer, etc.
+ * from the saved values in the device node.
+ */
+static inline void __restore_bars (struct device_node *dn)
+{
+	int i;
+	for (i=4; i<10; i++) {
+		rtas_write_config(dn, i*4, 4, dn->config_space[i]);
+	}
+
+	/* 12 == Expansion ROM Address */
+	rtas_write_config(dn, 12*4, 4, dn->config_space[12]);
+	
+#define SAVED_BYTE(OFF) (((u8 *)(dn->config_space))[(OFF) ^ 3]) /* big-endian byte lane */
+	
+	rtas_write_config (dn, PCI_CACHE_LINE_SIZE, 1, 
+	            SAVED_BYTE(PCI_CACHE_LINE_SIZE));
+	
+	rtas_write_config (dn, PCI_LATENCY_TIMER, 1, 
+	            SAVED_BYTE(PCI_LATENCY_TIMER));
+	
+	rtas_write_config (dn, PCI_INTERRUPT_LINE, 1, 
+	            SAVED_BYTE(PCI_INTERRUPT_LINE));
+}
+
+/** 
+ * eeh_restore_bars - restore the PCI config space info
+ */
+void eeh_restore_bars(struct eeh_cfg_tree *tree)
+{
+	if (!(tree->is_bridge))
+		__restore_bars (tree->dn);
+	
+	if (tree->child)
+		eeh_restore_bars (tree->child);
+
+	if (tree->sibling)
+		eeh_restore_bars (tree->sibling);
+
+	of_node_put (tree->dn);
+	kfree (tree);
+}
+EXPORT_SYMBOL(eeh_restore_bars);
+
+/* ------------------------------------------------------------- */
+/* The code below deals with enabling EEH for devices during  the
+ * early boot sequence.  EEH must be enabled before any PCI probing
+ * can be done.
+ */
+
 struct eeh_early_enable_info {
 	unsigned int buid_hi;
 	unsigned int buid_lo;
@@ -829,7 +1043,9 @@ void eeh_add_device_early(struct device_
 		return;
 	phb = dn->phb;
 	if (NULL == phb || 0 == phb->buid) {
-		printk(KERN_WARNING "EEH: Expected buid but found none\n");
+		printk(KERN_WARNING "EEH: Expected buid but found none for %s\n",
+		                dn->full_name);
+		dump_stack();
 		return;
 	}
 
@@ -848,6 +1064,9 @@ EXPORT_SYMBOL(eeh_add_device_early);
  */
 void eeh_add_device_late(struct pci_dev *dev)
 {
+	int i;
+	struct device_node *dn;
+
 	if (!dev || !eeh_subsystem_enabled)
 		return;
 
@@ -857,6 +1076,11 @@ void eeh_add_device_late(struct pci_dev 
 #endif
 
 	pci_addr_cache_insert_device (dev);
+
+	/* Save the BARs; firmware doesn't restore these after EEH reset */
+	dn = pci_device_to_OF_node(dev);
+	for (i = 0; dn && i < 16; i++)
+		pci_read_config_dword(dev, i * 4, &dn->config_space[i]);
 }
 EXPORT_SYMBOL(eeh_add_device_late);
 
@@ -886,12 +1110,17 @@ static int proc_eeh_show(struct seq_file
 	unsigned int cpu;
 	unsigned long ffs = 0, positives = 0, failures = 0;
 	unsigned long resets = 0;
+	unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;
 
 	for_each_cpu(cpu) {
 		ffs += per_cpu(total_mmio_ffs, cpu);
 		positives += per_cpu(false_positives, cpu);
 		failures += per_cpu(ignored_failures, cpu);
 		resets += per_cpu(slot_resets, cpu);
+		no_dev += per_cpu(no_device, cpu);
+		no_dn += per_cpu(no_dn, cpu);
+		no_cfg += per_cpu(no_cfg_addr, cpu);
+		no_check += per_cpu(ignored_check, cpu);
 	}
 
 	if (0 == eeh_subsystem_enabled) {
@@ -899,13 +1128,17 @@ static int proc_eeh_show(struct seq_file
 		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
 	} else {
 		seq_printf(m, "EEH Subsystem is enabled\n");
-		seq_printf(m, "eeh_total_mmio_ffs=%ld\n"
+		seq_printf(m, 
+				"no device=%ld\n"
+				"no device node=%ld\n"
+				"no config address=%ld\n"
+				"check not wanted=%ld\n"
+				"eeh_total_mmio_ffs=%ld\n"
 			   "eeh_false_positives=%ld\n"
 			   "eeh_ignored_failures=%ld\n"
-			   "eeh_slot_resets=%ld\n"
-				"eeh_fail_count=%d\n",
-			   ffs, positives, failures, resets,
-				eeh_fail_count.counter);
+			   "eeh_slot_resets=%ld\n",
+				no_dev, no_dn, no_cfg, no_check,
+			   ffs, positives, failures, resets);
 	}
 
 	return 0;
===== arch/ppc64/kernel/pSeries_pci.c 1.59 vs edited =====
--- 1.59/arch/ppc64/kernel/pSeries_pci.c	2004-11-15 21:29:10 -06:00
+++ edited/arch/ppc64/kernel/pSeries_pci.c	2005-01-05 13:41:09 -06:00
@@ -102,7 +102,7 @@ static int rtas_pci_read_config(struct p
 	return PCIBIOS_DEVICE_NOT_FOUND;
 }
 
-static int rtas_write_config(struct device_node *dn, int where, int size, u32 val)
+int rtas_write_config(struct device_node *dn, int where, int size, u32 val)
 {
 	unsigned long buid, addr;
 	int ret;
===== include/asm-ppc64/eeh.h 1.23 vs edited =====
--- 1.23/include/asm-ppc64/eeh.h	2004-10-25 18:17:38 -05:00
+++ edited/include/asm-ppc64/eeh.h	2005-01-05 13:47:55 -06:00
@@ -22,8 +22,8 @@
 
 #include <linux/init.h>
 #include <linux/list.h>
-#include <linux/string.h>
 #include <linux/notifier.h>
+#include <linux/string.h>
 
 struct pci_dev;
 struct device_node;
@@ -33,6 +33,10 @@ struct device_node;
 #define EEH_MODE_NOCHECK	(1<<1)
 #define EEH_MODE_ISOLATED	(1<<2)
 
+/* Max number of EEH freezes allowed before we consider the device
+ * to be permanently disabled. */
+#define EEH_MAX_ALLOWED_FREEZES 5
+
 #ifdef CONFIG_PPC_PSERIES
 extern void __init eeh_init(void);
 unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val);
@@ -57,6 +61,34 @@ void eeh_add_device_early(struct device_
 void eeh_add_device_late(struct pci_dev *);
 
 /**
+ * eeh_slot_error_detail -- record an EEH error condition to the log
+ * @severity: 1 if temporary, 2 if permanent failure.
+ *
+ * Obtains the EEH error details from the RTAS subsystem,
+ * and then logs these details with the RTAS error log system.
+ */
+void eeh_slot_error_detail (struct device_node *dn, int severity);
+
+/** 
+ * rtas_set_slot_reset -- unfreeze a frozen slot
+ *
+ * Clear the EEH-frozen condition on a slot.  This routine
+ * does this by asserting the PCI #RST line for about 200
+ * milliseconds; this routine will sleep while the adapter is
+ * being reset.
+ */
+void rtas_set_slot_reset (struct device_node *dn);
+
+/**
+ * rtas_configure_bridge -- firmware initialization of pci bridge
+ * 
+ * Ask the firmware to configure any PCI bridge devices 
+ * located behind the indicated node. Required after a 
+ * pci device reset.
+ */
+void rtas_configure_bridge(struct device_node *dn);
+
+/**
  * eeh_remove_device - undo EEH setup for the indicated pci device
  * @dev: pci device to be removed
  *
@@ -91,6 +123,13 @@ struct eeh_event {
 /** Register to find out about EEH events. */
 int eeh_register_notifier(struct notifier_block *nb);
 int eeh_unregister_notifier(struct notifier_block *nb);
+
+/** Save and restore device configuration info across
+ *  device resets.
+ */
+struct eeh_cfg_tree;
+struct eeh_cfg_tree * eeh_save_bars(struct device_node *dn);
+void eeh_restore_bars(struct eeh_cfg_tree *tree);
 
 /**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
===== include/asm-ppc64/prom.h 1.24 vs edited =====
--- 1.24/include/asm-ppc64/prom.h	2004-11-25 00:42:42 -06:00
+++ edited/include/asm-ppc64/prom.h	2005-01-05 13:41:09 -06:00
@@ -164,8 +164,10 @@ struct device_node {
 	int	status;			/* Current device status (non-zero is bad) */
 	int	eeh_mode;		/* See eeh.h for possible EEH_MODEs */
 	int	eeh_config_addr;
+	int	eeh_freeze_count;   /* number of times this device froze up. */
 	struct  pci_controller *phb;	/* for pci devices */
 	struct	iommu_table *iommu_table;	/* for phb's or bridges */
+	u32      config_space[16]; /* saved PCI config space */
 
 	struct	property *properties;
 	struct	device_node *parent;
===== include/asm-ppc64/rtas.h 1.25 vs edited =====
--- 1.25/include/asm-ppc64/rtas.h	2004-11-25 00:42:42 -06:00
+++ edited/include/asm-ppc64/rtas.h	2005-01-05 13:41:09 -06:00
@@ -241,4 +241,6 @@ extern void rtas_stop_self(void);
 /* RMO buffer reserved for user-space RTAS use */
 extern unsigned long rtas_rmo_buf;
 
+extern int rtas_write_config(struct device_node *dn, int where, int size, u32 val);
+
 #endif /* _PPC64_RTAS_H */
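__restore_bars() rebuilds single-byte registers (cache line size, latency timer, interrupt line) from the dwords saved by pci_read_config_dword(). Because those dwords hold host-order values, reading a byte by casting the u32 array to u8 * selects a different byte on big-endian and little-endian hosts. A shift-based accessor avoids that dependency. A minimal userspace sketch, assuming the same 16-dword layout as config_space[]; the offsets are the standard PCI header offsets (cache line size 0x0c, latency timer 0x0d, interrupt line 0x3c), and cfg_byte() is a hypothetical helper.

```c
#include <stdint.h>

#define CFG_CACHE_LINE_SIZE 0x0c
#define CFG_LATENCY_TIMER   0x0d
#define CFG_INTERRUPT_LINE  0x3c

/* Return config-space byte 'off' from dwords saved as host-order values.
 * Config byte 0 of each dword is its least significant byte, so a shift
 * picks the right byte regardless of host endianness. */
static uint8_t cfg_byte(const uint32_t cfg[16], unsigned int off)
{
	return (uint8_t)(cfg[off / 4] >> (8 * (off % 4)));
}
```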
-------------- next part --------------
===== drivers/pci/hotplug/rpaphp.h 1.11 vs edited =====
--- 1.11/drivers/pci/hotplug/rpaphp.h	2004-10-06 11:43:44 -05:00
+++ edited/drivers/pci/hotplug/rpaphp.h	2005-01-05 13:41:09 -06:00
@@ -126,6 +126,8 @@ extern int register_pci_slot(struct slot
 extern int rpaphp_unconfig_pci_adapter(struct slot *slot);
 extern int rpaphp_get_pci_adapter_status(struct slot *slot, int is_init, u8 * value);
 extern struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev);
+extern void init_eeh_handler (void);
+extern void exit_eeh_handler (void);
 
 /* rpaphp_core.c */
 extern int rpaphp_add_slot(struct device_node *dn);
===== drivers/pci/hotplug/rpaphp_core.c 1.18 vs edited =====
--- 1.18/drivers/pci/hotplug/rpaphp_core.c	2004-10-06 11:43:44 -05:00
+++ edited/drivers/pci/hotplug/rpaphp_core.c	2005-01-05 13:41:09 -06:00
@@ -443,12 +443,18 @@ static int __init rpaphp_init(void)
 {
 	info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 
+	/* Get set to handle EEH events. */
+	init_eeh_handler();
+
 	/* read all the PRA info from the system */
 	return init_rpa();
 }
 
 static void __exit rpaphp_exit(void)
 {
+	/* Let EEH know we are going away. */
+	exit_eeh_handler();
+
 	cleanup_slots();
 }
 
===== drivers/pci/hotplug/rpaphp_pci.c 1.17 vs edited =====
--- 1.17/drivers/pci/hotplug/rpaphp_pci.c	2004-11-18 02:36:18 -06:00
+++ edited/drivers/pci/hotplug/rpaphp_pci.c	2005-01-05 15:30:29 -06:00
@@ -22,8 +22,12 @@
  * Send feedback to <lxie at us.ibm.com>
  *
  */
+#include <linux/delay.h>
+#include <linux/notifier.h>
 #include <linux/pci.h>
+#include <asm/eeh.h>
 #include <asm/pci-bridge.h>
+#include <asm/prom.h>
 #include <asm/rtas.h>
 #include "../pci.h"		/* for pci_add_new_bus */
 
@@ -62,6 +66,7 @@ int rpaphp_claim_resource(struct pci_dev
 		    root ? "Address space collision on" :
 		    "No parent found for",
 		    resource, dtype, pci_name(dev), res->start, res->end);
+		dump_stack();
 	}
 	return err;
 }
@@ -184,6 +189,19 @@ rpaphp_fixup_new_pci_devices(struct pci_
 
 static int rpaphp_pci_config_bridge(struct pci_dev *dev);
 
+static void rpaphp_eeh_add_bus_device(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		eeh_add_device_late(dev);
+		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+			struct pci_bus *subbus = dev->subordinate;
+			if (subbus)
+				rpaphp_eeh_add_bus_device (subbus);
+		}
+	}
+}
+
 /*****************************************************************************
  rpaphp_pci_config_slot() will  configure all devices under the 
  given slot->dn and return the the first pci_dev.
@@ -211,6 +229,8 @@ rpaphp_pci_config_slot(struct device_nod
 		}
 		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) 
 			rpaphp_pci_config_bridge(dev);
+
+		rpaphp_eeh_add_bus_device(bus);
 	}
 	return dev;
 }
@@ -219,7 +239,6 @@ static int rpaphp_pci_config_bridge(stru
 {
 	u8 sec_busno;
 	struct pci_bus *child_bus;
-	struct pci_dev *child_dev;
 
 	dbg("Enter %s:  BRIDGE dev=%s\n", __FUNCTION__, pci_name(dev));
 
@@ -236,11 +255,7 @@ static int rpaphp_pci_config_bridge(stru
 	/* do pci_scan_child_bus */
 	pci_scan_child_bus(child_bus);
 
-	list_for_each_entry(child_dev, &child_bus->devices, bus_list) {
-		eeh_add_device_late(child_dev);
-	}
-
-	 /* fixup new pci devices without touching bus struct */
+	/* Fixup new pci devices without touching bus struct */
 	rpaphp_fixup_new_pci_devices(child_bus, 0);
 
 	/* Make the discovered devices available */
@@ -278,7 +293,7 @@ static void print_slot_pci_funcs(struct 
 	return;
 }
 #else
-static void print_slot_pci_funcs(struct slot *slot)
+static inline void print_slot_pci_funcs(struct slot *slot)
 {
 	return;
 }
@@ -360,7 +375,6 @@ static void rpaphp_eeh_remove_bus_device
 			if (pdev)
 				rpaphp_eeh_remove_bus_device(pdev);
 		}
-
 	}
 	return;
 }
@@ -562,36 +576,154 @@ exit:
 	return retval;
 }
 
-struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev)
+/**
+ * rpaphp_find_slot - find and return the slot holding the device
+ * @dev: pci device for which we want the slot structure.
+ */
+static struct slot *rpaphp_find_slot(struct pci_dev *dev)
 {
-	struct list_head	*tmp, *n;
-	struct slot		*slot;
+	struct list_head *tmp, *n;
+	struct slot	*slot;
 
 	list_for_each_safe(tmp, n, &rpaphp_slot_head) {
 		struct pci_bus *bus;
 		struct list_head *ln;
 
 		slot = list_entry(tmp, struct slot, rpaphp_slot_list);
-		if (slot->bridge == NULL) {
-			if (slot->dev_type == PCI_DEV) {
-				printk(KERN_WARNING "PCI slot missing bridge %s %s \n", 
-				                    slot->name, slot->location);
-			}
+		
+		/* PHB slots don't have bridges */
+		if (slot->bridge == NULL)
 			continue;
-		}
+
+		/* the PCI device could be the PHB itself */
+		if (slot->bridge == dev)
+			return slot;
 
 		bus = slot->bridge->subordinate;
 		if (!bus) {
+			printk (KERN_WARNING "PCI bridge is missing bus: %s %s\n",
+			    pci_name (slot->bridge), pci_pretty_name (slot->bridge));
 			continue;  /* should never happen? */
 		}
+
 		for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) {
-                                struct pci_dev *pdev = pci_dev_b(ln);
-				if (pdev == dev)
-					return slot->hotplug_slot;
+			struct pci_dev *pdev = pci_dev_b(ln);
+			if (pdev == dev)
+				return slot;
 		}
 	}
 
 	return NULL;
 }
 
-EXPORT_SYMBOL_GPL(rpaphp_find_hotplug_slot);
+/* ------------------------------------------------------- */
+/**
+ * handle_eeh_events -- reset a PCI device after hard lockup.
+ *
+ * pSeries systems will isolate a PCI slot if the PCI-Host
+ * bridge detects address or data parity errors, or DMAs
+ * occurring to wild addresses (which are usually due to
+ * bugs in device drivers or in PCI adapter firmware).
+ * Slot isolations also occur if #SERR, #PERR or other misc
+ * PCI-related errors are detected.
+ *
+ * The recovery process consists of unplugging the device driver
+ * (which generates hotplug events to userspace), then issuing
+ * a PCI #RST to the device, then reconfiguring the PCI config
+ * space for all bridges & devices under this slot, and then
+ * finally restarting the device drivers (which causes a second
+ * set of hotplug events to go out to userspace).
+ */
+int handle_eeh_events (struct notifier_block *self, 
+                       unsigned long reason, void *ev)
+{
+	int freeze_count=0;
+	struct eeh_event *event = ev;
+	struct slot *frozen_slot;
+	struct eeh_cfg_tree * saved_bars;
+
+
+	frozen_slot = rpaphp_find_slot(event->dev);
+	if (!frozen_slot)
+	{
+		printk (KERN_ERR 
+			"EEH: Cannot find PCI slot for EEH error! dev=%p dn=%p\n", 
+			event->dev, event->dn);
+		if (event->dev)
+			printk("EEH: above message for pci device %s %s\n", 
+				pci_name(event->dev), pci_pretty_name (event->dev));
+		if (event->dn)
+			printk ("EEH: above message for dn %s\n", event->dn->full_name);
+		return 1;
+	}
+
+	/* Keep a copy of the config space registers */
+	saved_bars = eeh_save_bars(frozen_slot->dn);
+	of_node_get(event->dn);
+	pci_dev_get(event->dev);
+
+	if (frozen_slot->dn->child)
+		freeze_count = frozen_slot->dn->child->eeh_freeze_count;
+	rpaphp_unconfig_pci_adapter (frozen_slot);
+
+	freeze_count ++;
+	if (freeze_count > EEH_MAX_ALLOWED_FREEZES) {
+		/* 
+		 * About 90% of all real-life EEH failures in the field
+		 * are due to poorly seated PCI cards. Only 10% or so are
+		 * due to actual, failed cards 
+		 */
+		printk (KERN_ERR
+			"EEH: device %s:%s has failed %d times\n"
+			"and has been permanently disabled.  Please try reseating\n"
+			"this device or replacing it.\n",
+			pci_name (event->dev),
+			pci_pretty_name (event->dev),
+			freeze_count);
+		goto rdone;
+	}
+	printk (KERN_WARNING
+		"EEH: This device has failed %d times since last reboot: %s:%s\n",
+		freeze_count,
+		pci_name (event->dev),
+		pci_pretty_name (event->dev));
+
+	/* Reset the pci controller. (Asserts RST#; resets config space). 
+	 * Reconfigure bridges and devices */
+	rtas_set_slot_reset (event->dn);
+	rtas_configure_bridge(event->dn);
+	eeh_restore_bars(saved_bars);
+
+	/* Give the system 5 seconds to finish running the user-space
+	 * hotplug scripts, e.g. ifdown for ethernet.  Yes, this is a hack, 
+	 * but if we don't do this, weird things happen.
+	 */
+	ssleep (5);
+
+	rpaphp_enable_pci_slot (frozen_slot);
+
+	/* Store the freeze count with the pci adapter, and not the slot.
+	 * This way, if the device is replaced, the count is cleared.
+	 */
+	if (frozen_slot->dn->child)
+		frozen_slot->dn->child->eeh_freeze_count = freeze_count;
+
+rdone:
+	of_node_put(event->dn);
+	pci_dev_put(event->dev);
+	return 0;
+}
+
+static struct notifier_block eeh_block;
+
+void __init init_eeh_handler (void)
+{
+	eeh_block.notifier_call = handle_eeh_events;
+	eeh_register_notifier (&eeh_block);
+}
+
+void __exit exit_eeh_handler (void)
+{
+	eeh_unregister_notifier (&eeh_block);
+}
+
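The freeze accounting spread across eeh_dn_check_failure() and handle_eeh_events() amounts to a small policy: the error is logged as temporary (severity 1) while the device's freeze count is below EEH_MAX_ALLOWED_FREEZES and as permanent (severity 2) after that, and the hotplug handler resets and re-enables the slot until the incremented count exceeds the limit. A userspace sketch of just that decision logic, with hypothetical names and the limit of 5 taken from the eeh.h hunk; it leaves out the actual RTAS calls and hotplug steps.

```c
#define MAX_ALLOWED_FREEZES 5   /* mirrors EEH_MAX_ALLOWED_FREEZES */

enum eeh_severity { SEV_TEMPORARY = 1, SEV_PERMANENT = 2 };
enum eeh_action { ACTION_RESET_AND_REENABLE, ACTION_DISABLE };

/* Severity reported to the error log for a new freeze, given the
 * number of freezes already recorded for the device. */
static enum eeh_severity freeze_severity(int freeze_count)
{
	return freeze_count < MAX_ALLOWED_FREEZES ? SEV_TEMPORARY
						  : SEV_PERMANENT;
}

/* Recovery decision: count this freeze on top of the previous total,
 * then either reset the slot or leave it disabled. */
static enum eeh_action recovery_action(int previous_count)
{
	if (previous_count + 1 > MAX_ALLOWED_FREEZES)
		return ACTION_DISABLE;
	return ACTION_RESET_AND_REENABLE;
}
```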

From johnrose at austin.ibm.com  Fri Jan  7 07:59:25 2005
From: johnrose at austin.ibm.com (John Rose)
Date: Thu, 06 Jan 2005 14:59:25 -0600
Subject: [PATCH] PPC64: EEH Recovery
In-Reply-To: <20050106192413.GK22274@austin.ibm.com>
References: <20050106192413.GK22274@austin.ibm.com>
Message-ID: <1105045165.22565.20.camel@sinatra.austin.ibm.com>

Hi Linas-

Here are a couple of non-substantive comments on your PCI Hotplug patch:

+               /* PHB slots don't have bridges */
+               if (slot->bridge == NULL)
                        continue;
-               }
+
+               /* the PCI device could be the PHB itself */
+               if (slot->bridge == dev)
+                       return slot;

The PHB case is handled by the first condition.  The second comment
would make more sense if "PHB itself" read "slot itself".
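The lookup under review reduces to three cases: skip slots with no bridge (PHB slots), match when the device is the slot's own bridge, and otherwise scan the bridge's secondary bus. A simplified userspace model of that search, using hypothetical types rather than the rpaphp structures.

```c
#include <stddef.h>

struct dev_sketch;

/* A bus reduced to an array of device pointers. */
struct bus_sketch {
	struct dev_sketch **devices;
	size_t ndevices;
};

struct dev_sketch {
	struct bus_sketch *subordinate;  /* non-NULL for bridges */
};

struct slot_sketch {
	struct dev_sketch *bridge;       /* NULL for PHB slots */
};

/* Return the slot whose bridge is dev, or whose bridge's secondary
 * bus holds dev; NULL if no slot matches. */
static struct slot_sketch *find_slot(struct slot_sketch *slots,
				     size_t nslots,
				     const struct dev_sketch *dev)
{
	size_t i, j;

	for (i = 0; i < nslots; i++) {
		struct slot_sketch *s = &slots[i];
		struct bus_sketch *bus;

		if (s->bridge == NULL)
			continue;               /* PHB slot: no bridge */
		if (s->bridge == dev)
			return s;               /* dev is the slot itself */
		bus = s->bridge->subordinate;
		if (bus == NULL)
			continue;
		for (j = 0; j < bus->ndevices; j++)
			if (bus->devices[j] == dev)
				return s;
	}
	return NULL;
}
```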

-EXPORT_SYMBOL_GPL(rpaphp_find_hotplug_slot);

I suppose we could also make this static and remove it from rpaphp.h.

Thanks-
John


From j.glisse at free.fr  Sat Jan  8 04:37:03 2005
From: j.glisse at free.fr (Jerome Glisse)
Date: Fri, 07 Jan 2005 18:37:03 +0100
Subject: Problems using Apple LCD with 2.6.10
In-Reply-To: <20050106175501.GA11534@unixforces.net>
References: <20050106175501.GA11534@unixforces.net>
Message-ID: <41DEC8BF.6010809@free.fr>

Markus Rothe wrote:

>Hi,
>
>I'm not sure if this is the correct place for such mails, but I didn't
>find another place to post my problem.
>
>My problem is that my LCD doesn't work correctly with the latest (2.6.10)
>kernel. It's an Apple Cinema Display connected through the Apple Display
>Connector (ADC). The problem is that there are many "blue lightnings" all
>over the display. By "blue lightning" I mean a small set of pixels that
>turns light blue for about half a second. My display also flickers
>from time to time. Both happen in console mode and when I run
>Xorg.
>
>This problem is definitely related to the kernel, as it does not occur with
>kernel 2.6.9.
>  
>
What is your graphics card? Radeon? NVIDIA?

best,
Jerome Glisse


From olof at austin.ibm.com  Sat Jan  8 07:00:26 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Fri, 7 Jan 2005 14:00:26 -0600
Subject: [PATCH] [PPC64] Fix iommu cleanup regression
Message-ID: <20050107200026.GA23616@austin.ibm.com>

Hi,

In the recent IOMMU cleanup, the new LPAR code assumes that all PHBs
must have a DMA window assigned to them. On some machines we don't have
a window assigned unless there's an adapter in the slot.

In other words, a PHB without an ibm,dma-window property is not a bug and
must be tolerated. This patch fixes that, and also removes a redundant
check for the dma-window being defined.
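The rule that a missing ibm,dma-window is a normal condition can be modelled outside the kernel: a device inherits the IOMMU table of its nearest ancestor that has one, and "none found" is an ordinary result rather than a WARN_ON or panic. This is a minimal sketch; struct dn_sketch and find_iommu_table() are hypothetical simplifications of struct device_node and the parent walk in iommu_dev_setup_pSeries(), not the kernel's own code.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a device-tree node: only the
 * fields needed to show the lookup are modelled. */
struct dn_sketch {
	const char *name;
	struct dn_sketch *parent;
	void *iommu_table;	/* NULL when no ibm,dma-window was found */
};

/* Walk up from 'dn' to the nearest ancestor that owns an IOMMU table.
 * Returns NULL when no ancestor has one (e.g. a PHB with an empty slot);
 * the caller treats that as "no DMA window", at most logging it. */
static void *find_iommu_table(struct dn_sketch *dn)
{
	while (dn && dn->iommu_table == NULL)
		dn = dn->parent;

	return dn ? dn->iommu_table : NULL;
}
```

The walk mirrors the `while (dn && dn->iommu_table == NULL) dn = dn->parent;` loop in the patch; the only change in behaviour being illustrated is that reaching the root without a table is reported quietly instead of warned about.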

Signed-off-by: Olof Johansson <olof at austin.ibm.com>


---

 linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup-bugfix arch/ppc64/kernel/pSeries_iommu.c
--- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup-bugfix	2005-01-07 12:52:18.960683160 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c	2005-01-07 13:44:19.427300128 -0600
@@ -293,10 +293,6 @@ static void iommu_table_setparms_lpar(st
 				      struct iommu_table *tbl,
 				      unsigned int *dma_window)
 {
-	if (!dma_window)
-		panic("iommu_table_setparms_lpar: device %s has no"
-		      " ibm,dma-window property!\n", dn->full_name);
-
 	tbl->it_busno  = dn->bussubno;
 
 	/* TODO: Parse field size properties properly. */
@@ -385,7 +381,10 @@ static void iommu_bus_setup_pSeriesLP(st
 			break;
 	}
 
-	WARN_ON(dma_window == NULL);
+	if (dma_window == NULL) {
+		DBG("iommu_bus_setup_pSeriesLP: bus %s seems to have no ibm,dma-window property\n", dn->full_name);
+		return;
+	}
 
 	if (!pdn->iommu_table) {
 		/* Bussubno hasn't been copied yet.
@@ -420,10 +419,11 @@ static void iommu_dev_setup_pSeries(stru
 	while (dn && dn->iommu_table == NULL)
 		dn = dn->parent;
 
-	WARN_ON(!dn);
-
-	if (dn)
+	if (dn) {
 		mydn->iommu_table = dn->iommu_table;
+	} else {
+		DBG("iommu_dev_setup_pSeries, dev %p (%s) has no iommu table\n", dev, dev->pretty_name);
+	}
 }
 
 static void iommu_bus_setup_null(struct pci_bus *b) { }

_


From linas at austin.ibm.com  Sat Jan  8 07:09:36 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Fri, 7 Jan 2005 14:09:36 -0600
Subject: [PATCH] PPC64: EEH Recovery
In-Reply-To: <1105045165.22565.20.camel@sinatra.austin.ibm.com>
References: <20050106192413.GK22274@austin.ibm.com>
	<1105045165.22565.20.camel@sinatra.austin.ibm.com>
Message-ID: <20050107200936.GN22274@austin.ibm.com>

On Thu, Jan 06, 2005 at 02:59:25PM -0600, John Rose was heard to remark:
> Hi Linas-
> 
> Here are a couple of non-substantive comments on your PCI Hotplug patch:

OK, thanks, I've tweaked it; it'll be in the next round of updates.

--linas


From markus at unixforces.net  Sat Jan  8 07:13:43 2005
From: markus at unixforces.net (Markus Rothe)
Date: Fri, 7 Jan 2005 21:13:43 +0100
Subject: Problems using Apple LCD with 2.6.10
In-Reply-To: <41DEC8BF.6010809@free.fr>
References: <20050106175501.GA11534@unixforces.net> <41DEC8BF.6010809@free.fr>
Message-ID: <20050107201343.GA10390@unixforces.net>

Jerome Glisse wrote:
> What is your graphics card? Radeon? NVIDIA?

It is a Radeon 9600.

Markus

From zwane at arm.linux.org.uk  Sun Jan  9 15:29:23 2005
From: zwane at arm.linux.org.uk (Zwane Mwaikambo)
Date: Sat, 8 Jan 2005 21:29:23 -0700 (MST)
Subject: [PATCH] PPC64: Move hotplug cpu functions to smp_ops
Message-ID: <Pine.LNX.4.61.0501082114520.13639@montezuma.fsmlabs.com>

This should make it easier to add hotplug CPU support for other PPC64
subarchitectures. The patch is untested but does compile with and without
hotplug CPU support on pSeries and G5 configs. What can get slightly
confusing is that both ppc_md and smp_ops have cpu_die members.
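The dispatch pattern the patch introduces (generic __cpu_disable()/__cpu_die() forwarding to optional per-platform hooks, with -ENOSYS when no hook is installed) can be sketched in plain C. All names here are hypothetical stand-ins for struct smp_ops_t and the pSeries functions; unlike the patch, the sketch also checks smp_ops itself for NULL so it is safe before platform setup runs.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down ops table: only the two hotplug hooks. */
struct smp_ops_sketch {
	int  (*cpu_disable)(void);
	void (*cpu_die)(unsigned int nr);
};

/* Chosen by platform setup code; a NULL hook means "not supported". */
static struct smp_ops_sketch *smp_ops;

/* Generic entry points: forward to the platform hook if one is set. */
int generic_cpu_disable(void)
{
	if (smp_ops && smp_ops->cpu_disable)
		return smp_ops->cpu_disable();

	return -ENOSYS;
}

void generic_cpu_die(unsigned int cpu)
{
	if (smp_ops && smp_ops->cpu_die)
		smp_ops->cpu_die(cpu);
}

/* Example platform implementation (hypothetical). */
static unsigned int last_dead_cpu = ~0u;

static int platform_cpu_disable(void) { return 0; }
static void platform_cpu_die(unsigned int cpu) { last_dead_cpu = cpu; }

static struct smp_ops_sketch platform_ops = {
	.cpu_disable = platform_cpu_disable,
	.cpu_die     = platform_cpu_die,
};
```

A new subarchitecture would then only need to fill in its own two hooks; the generic entry points stay unchanged.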

 arch/ppc64/kernel/pSeries_smp.c |    9 +++++++--
 arch/ppc64/kernel/smp.c         |   16 ++++++++++++++++
 include/asm-ppc64/machdep.h     |    2 ++
 3 files changed, 25 insertions(+), 2 deletions(-)

Signed-off-by: Zwane Mwaikambo <zwane at arm.linux.org.uk>

Index: linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/pSeries_smp.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm1/arch/ppc64/kernel/pSeries_smp.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 pSeries_smp.c
--- linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/pSeries_smp.c	4 Jan 2005 04:03:33 -0000	1.1.1.1
+++ linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/pSeries_smp.c	9 Jan 2005 03:42:19 -0000
@@ -88,7 +88,7 @@ static int query_cpu_stopped(unsigned in
 
 #ifdef CONFIG_HOTPLUG_CPU
 
-int __cpu_disable(void)
+int pSeries_cpu_disable(void)
 {
 	/* FIXME: go put this in a header somewhere */
 	extern void xics_migrate_irqs_away(void);
@@ -106,7 +106,7 @@ int __cpu_disable(void)
 	return 0;
 }
 
-void __cpu_die(unsigned int cpu)
+void pSeries_cpu_die(unsigned int cpu)
 {
 	int tries;
 	int cpu_status;
@@ -355,6 +355,11 @@ void __init smp_init_pSeries(void)
 	else
 		smp_ops = &pSeries_xics_smp_ops;
 
+#ifdef CONFIG_HOTPLUG_CPU
+	smp_ops->cpu_disable = pSeries_cpu_disable;
+	smp_ops->cpu_die = pSeries_cpu_die;
+#endif
+
 	/* Start secondary threads on SMT systems; primary threads
 	 * are already in the running state.
 	 */
Index: linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/smp.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm1/arch/ppc64/kernel/smp.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smp.c
--- linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/smp.c	4 Jan 2005 04:03:33 -0000	1.1.1.1
+++ linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/smp.c	9 Jan 2005 03:48:56 -0000
@@ -557,3 +557,19 @@ void __init smp_cpus_done(unsigned int m
 	 */
 	cpu_present_map = cpu_possible_map;
 }
+
+#ifdef CONFIG_HOTPLUG_CPU
+int __cpu_disable(void)
+{
+	if (smp_ops->cpu_disable)
+		return smp_ops->cpu_disable();
+
+	return -ENOSYS;
+}
+
+void __cpu_die(unsigned int cpu)
+{
+	if (smp_ops->cpu_die)
+		smp_ops->cpu_die(cpu);
+}
+#endif
Index: linux-2.6.10-mm1-ppc64/include/asm-ppc64/machdep.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm1/include/asm-ppc64/machdep.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 machdep.h
--- linux-2.6.10-mm1-ppc64/include/asm-ppc64/machdep.h	4 Jan 2005 04:03:40 -0000	1.1.1.1
+++ linux-2.6.10-mm1-ppc64/include/asm-ppc64/machdep.h	9 Jan 2005 03:50:21 -0000
@@ -31,6 +31,8 @@ struct smp_ops_t {
 	void  (*late_setup_cpu)(int nr);
 	void  (*take_timebase)(void);
 	void  (*give_timebase)(void);
+	int   (*cpu_disable)(void);
+	void  (*cpu_die)(unsigned int nr);
 };
 #endif
 

From anton at samba.org  Sun Jan  9 16:48:34 2005
From: anton at samba.org (Anton Blanchard)
Date: Sun, 9 Jan 2005 16:48:34 +1100
Subject: xtime <-> gettimeofday can get out of sync
Message-ID: <20050109054834.GL14239@krispykreme.ozlabs.ibm.com>


Hi,

I've noticed a problem where xtime and gettimeofday can get out of sync
if interrupts are disabled for too long (e.g. long kernel code paths or
dropping into the debugger for a while).

We correctly replay lost jiffies, but in that loop timer_sync_xtime syncs
the intermediate values of xtime up with the current value of
gettimeofday. So xtime jumps by a bunch, and from then on it is ahead of
gettimeofday and we never resync the two. I guess this is to avoid xtime
going backwards.

The patch below creates a __do_gettimeofday where you can pass in a tb
value and sync the intermediate values of xtime properly.
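The point of passing the timebase value in is that the replay loop can ask "what was the time at this particular jiffy's tick?" instead of always getting "now". The conversion itself can be modelled in userspace. This is a sketch under stated assumptions: XSEC_PER_SEC is taken to be 2^20 (as the 1/2^20 comment in the patch implies), tb_to_xs is read as a 0.64 fixed-point xsec-per-tick factor (inferred from how mulhdu uses it), mulhdu is emulated with the GCC/Clang unsigned __int128 extension, and tb_to_timeval/struct tv_sketch are hypothetical names.

```c
#include <stdint.h>

#define XSEC_PER_SEC (1UL << 20)	/* assumption: 1 xsec = 1/2^20 s */
#define USEC_PER_SEC 1000000UL

/* High 64 bits of a 64x64-bit unsigned product, like ppc64's mulhdu. */
static inline uint64_t mulhdu(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

struct tv_sketch {
	uint64_t tv_sec;
	uint64_t tv_usec;
};

/* Convert a caller-supplied timebase value to sec/usec. */
static void tb_to_timeval(uint64_t tb_val, uint64_t tb_orig_stamp,
			  uint64_t tb_to_xs, uint64_t stamp_xsec,
			  struct tv_sketch *tv)
{
	uint64_t xsec = stamp_xsec + mulhdu(tb_val - tb_orig_stamp, tb_to_xs);
	uint64_t sec = xsec / XSEC_PER_SEC;	/* a shift: XSEC_PER_SEC is 2^20 */

	xsec -= sec * XSEC_PER_SEC;		/* fractional second, in xsec */
	tv->tv_sec = sec;
	tv->tv_usec = (xsec * USEC_PER_SEC) / XSEC_PER_SEC;
}
```

For example, with a 2^21 Hz timebase, tb_to_xs is 2^63 (half an xsec per tick), so 3.5 seconds' worth of ticks yields tv_sec 3 and tv_usec 500000.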

Note that the timer_sync_xtime check only stops the seconds from going
backwards; the ns component still could, couldn't it? Considering this
stuff is hard to get right, should we switch to the time interpolator
stuff? The only problem there is that it might cause trouble for systemcfg
(which exports stuff to do userspace gettimeofday).

Anton

===== arch/ppc64/kernel/time.c 1.44 vs edited =====
--- 1.44/arch/ppc64/kernel/time.c	2005-01-05 13:48:14 +11:00
+++ edited/arch/ppc64/kernel/time.c	2005-01-09 16:37:33 +11:00
@@ -142,16 +142,54 @@
         }
 }
 
+/*
+ * This version of gettimeofday has microsecond resolution.
+ */
+static inline void __do_gettimeofday(struct timeval *tv, unsigned long tb_val)
+{
+	unsigned long sec, usec, tb_ticks;
+	unsigned long xsec, tb_xsec;
+	struct gettimeofday_vars * temp_varp;
+	unsigned long temp_tb_to_xs, temp_stamp_xsec;
+
+	/*
+	 * These calculations are faster (gets rid of divides)
+	 * if done in units of 1/2^20 rather than microseconds.
+	 * The conversion to microseconds at the end is done
+	 * without a divide (and in fact, without a multiply)
+	 */
+	tb_ticks = tb_val - do_gtod.tb_orig_stamp;
+	temp_varp = do_gtod.varp;
+	temp_tb_to_xs = temp_varp->tb_to_xs;
+	temp_stamp_xsec = temp_varp->stamp_xsec;
+	tb_xsec = mulhdu( tb_ticks, temp_tb_to_xs );
+	xsec = temp_stamp_xsec + tb_xsec;
+	sec = xsec / XSEC_PER_SEC;
+	xsec -= sec * XSEC_PER_SEC;
+	usec = (xsec * USEC_PER_SEC)/XSEC_PER_SEC;
+
+	tv->tv_sec = sec;
+	tv->tv_usec = usec;
+}
+
+void do_gettimeofday(struct timeval *tv)
+{
+	__do_gettimeofday(tv, get_tb());
+}
+
+EXPORT_SYMBOL(do_gettimeofday);
+
 /* Synchronize xtime with do_gettimeofday */ 
 
-static __inline__ void timer_sync_xtime( unsigned long cur_tb )
+static inline void timer_sync_xtime(unsigned long cur_tb)
 {
 	struct timeval my_tv;
 
-	if ( cur_tb > next_xtime_sync_tb ) {
+	if (cur_tb > next_xtime_sync_tb) {
 		next_xtime_sync_tb = cur_tb + xtime_sync_interval;
-		do_gettimeofday( &my_tv );
-		if ( xtime.tv_sec <= my_tv.tv_sec ) {
+		__do_gettimeofday(&my_tv, cur_tb);
+
+		if (xtime.tv_sec <= my_tv.tv_sec) {
 			xtime.tv_sec = my_tv.tv_sec;
 			xtime.tv_nsec = my_tv.tv_usec * 1000;
 		}
@@ -274,7 +312,7 @@
 			write_seqlock(&xtime_lock);
 			tb_last_stamp = lpaca->next_jiffy_update_tb;
 			do_timer(regs);
-			timer_sync_xtime( cur_tb );
+			timer_sync_xtime(lpaca->next_jiffy_update_tb);
 			timer_check_rtc();
 			write_sequnlock(&xtime_lock);
 			if ( adjusting_time && (time_adjust == 0) )
@@ -312,36 +350,6 @@
 {
 	return mulhdu(get_tb(), tb_to_ns_scale) << tb_to_ns_shift;
 }
-
-/*
- * This version of gettimeofday has microsecond resolution.
- */
-void do_gettimeofday(struct timeval *tv)
-{
-        unsigned long sec, usec, tb_ticks;
-	unsigned long xsec, tb_xsec;
-	struct gettimeofday_vars * temp_varp;
-	unsigned long temp_tb_to_xs, temp_stamp_xsec;
-
-	/* These calculations are faster (gets rid of divides)
-	 * if done in units of 1/2^20 rather than microseconds.
-	 * The conversion to microseconds at the end is done
-	 * without a divide (and in fact, without a multiply) */
-	tb_ticks = get_tb() - do_gtod.tb_orig_stamp;
-	temp_varp = do_gtod.varp;
-	temp_tb_to_xs = temp_varp->tb_to_xs;
-	temp_stamp_xsec = temp_varp->stamp_xsec;
-	tb_xsec = mulhdu( tb_ticks, temp_tb_to_xs );
-	xsec = temp_stamp_xsec + tb_xsec;
-	sec = xsec / XSEC_PER_SEC;
-	xsec -= sec * XSEC_PER_SEC;
-	usec = (xsec * USEC_PER_SEC)/XSEC_PER_SEC;
-
-        tv->tv_sec = sec;
-        tv->tv_usec = usec;
-}
-
-EXPORT_SYMBOL(do_gettimeofday);
 
 int do_settimeofday(struct timespec *tv)
 {


From j.glisse at gmail.com  Mon Jan 10 02:26:12 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Sun, 9 Jan 2005 16:26:12 +0100
Subject: U3 G5 AGP support patch (v4)
Message-ID: <4240b916050109072621440269@mail.gmail.com>

Hi,

Attached is a patch for the U3 AGP bridge. This one just fixes some typos
from the previous patch (DEVICE instead of DEVIEC...).

Signed-off-by: Jerome Glisse <j.glisse at gmail.com>

best,
Jerome Glisse
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uninorth-patch4
Type: application/octet-stream
Size: 10216 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050109/226dc102/attachment.obj 

From j.glisse at gmail.com  Mon Jan 10 02:40:56 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Sun, 9 Jan 2005 16:40:56 +0100
Subject: Classic PPC specific ASM (CONFIG_6XX)
Message-ID: <4240b916050109074053e328b1@mail.gmail.com>

Hi,

With 2.6.10 I get a compilation error with disable_6xx_mmu.
I guess this is linked to the patch you supplied in December
in arch/ppc/boot/common/util.S.

That patch compiles out disable_6xx_mmu if CONFIG_6XX is
not defined. The problem arises in arch/ppc/boot/simple/misc-prep.c,
where there is no conditional compilation around the call to this function.

Attached is a patch that uses CONFIG_6XX to compile out
the call to this function if the flag is not set.

By the way, there are many compilation warnings related to PPC with 2.6.10;
is anyone looking at fixing them?

Signed-off-by: Jerome Glisse <j.glisse at gmail.com>

best,
Jerome Glisse
-------------- next part --------------
A non-text attachment was scrubbed...
Name: disable_6xx-patch
Type: application/octet-stream
Size: 855 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050109/888cb654/attachment.obj 

From hch at lst.de  Mon Jan 10 03:06:14 2005
From: hch at lst.de (Christoph Hellwig)
Date: Sun, 9 Jan 2005 17:06:14 +0100
Subject: U3 G5 AGP support patch (v4)
In-Reply-To: <4240b916050109072621440269@mail.gmail.com>
References: <4240b916050109072621440269@mail.gmail.com>
Message-ID: <20050109160614.GA22839@lst.de>

+static struct device_node* uninorth_node __pmacdata;
+static u32 __iomem * uninorth_base __pmacdata;

static struct device_node *uninorth_node __pmacdata;
static u32 __iomem *uninorth_base __pmacdata;

+	if(uninorth_rev == 0x21) {

	if (uninorth_rev == 0x21) {

+	if((uninorth_rev >= 0x30) && (uninorth_rev <= 0x33)) {

	if ((uninorth_rev >= 0x30) && (uninorth_rev <= 0x33)) {

+	if (agp_bridge->dev->device == PCI_DEVICE_ID_APPLE_U3_AGP) {
+			/* This is an AGP V3 */
+			agp_device_command(command, TRUE);
+	} else {
+			/* AGP V2 */
+			agp_device_command(command, FALSE);
+	}

double-indentation, also please use 1/0 instead of TRUE/FALSE.

+static struct aper_size_info_32 u3_sizes[8] =
+{
+/*
+ * Not sure that uninorth3 supports that high aperture sizes but it
+ * would strange if it did not :)
+ */

comment before the struct declaration, please, aka

/*
 * Not sure that uninorth3 supports that high aperture sizes but it
 * would strange if it did not :)
 */
static struct aper_size_info_32 u3_sizes[8] = {

+	uninorth_node = of_find_node_by_name(NULL, "uni-n");
+	/* Locate G5 u3 */
+	if (uninorth_node == NULL) {
+		uninorth_node = of_find_node_by_name(NULL, "u3");
+	}

	/* Locate G5 u3 */
	uninorth_node = of_find_node_by_name(NULL, "uni-n");
	if (!uninorth_node)
		uninorth_node = of_find_node_by_name(NULL, "u3");

+	/*
+	 * Set specific functions & values for agp3 controller.
+	 */
+	if (pdev->device == PCI_DEVICE_ID_APPLE_U3_AGP) {
+		uninorth_agp_driver.insert_memory  = uninorth3_insert_memory;
+		uninorth_agp_driver.aperture_sizes = (void *)u3_sizes;
+		uninorth_agp_driver.num_aperture_sizes = 8;

Please declare a separate driver instance instead of overriding.
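One way to read that request: keep one constant driver description per bridge type and pick one at probe time, rather than writing U3 values into the shared uninorth_agp_driver. The sketch below is hypothetical throughout: struct agp_driver_sketch models only the fields the review mentions, the insert functions just return a marker, and SKETCH_DEVICE_ID_U3 is a placeholder, not the real PCI_DEVICE_ID_APPLE_U3_AGP value.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define SKETCH_DEVICE_ID_U3 0x1234	/* placeholder, not the real PCI ID */

/* Hypothetical, cut-down driver descriptor. */
struct agp_driver_sketch {
	const char *name;
	int (*insert_memory)(void);
	const void *aperture_sizes;
	int num_aperture_sizes;
};

static int uninorth_insert_memory(void)  { return 2; }	/* AGP 2.x path */
static int uninorth3_insert_memory(void) { return 3; }	/* U3 / AGP 3.0 path */

static const int uninorth_sizes[7];	/* placeholder contents and count */
static const int u3_sizes[8];		/* 8 entries, as in the patch */

/* One const instance per bridge type: nothing is modified at probe time. */
static const struct agp_driver_sketch uninorth_agp_driver = {
	.name = "uninorth",
	.insert_memory = uninorth_insert_memory,
	.aperture_sizes = uninorth_sizes,
	.num_aperture_sizes = ARRAY_SIZE(uninorth_sizes),
};

static const struct agp_driver_sketch u3_agp_driver = {
	.name = "u3",
	.insert_memory = uninorth3_insert_memory,
	.aperture_sizes = u3_sizes,
	.num_aperture_sizes = ARRAY_SIZE(u3_sizes),
};

/* Probe-time selection by PCI device ID. */
static const struct agp_driver_sketch *pick_driver(unsigned short device)
{
	return device == SKETCH_DEVICE_ID_U3 ? &u3_agp_driver
					     : &uninorth_agp_driver;
}
```

Because both instances are const, a probe of one bridge type can never leave the other's settings modified.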


And asm-ppc64 is still missing an agp.h, no?


From j.glisse at gmail.com  Mon Jan 10 04:46:05 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Sun, 9 Jan 2005 18:46:05 +0100
Subject: U3 G5 AGP support patch (v4)
In-Reply-To: <20050109160614.GA22839@lst.de>
References: <4240b916050109072621440269@mail.gmail.com>
	<20050109160614.GA22839@lst.de>
Message-ID: <4240b91605010909463e44bba8@mail.gmail.com>

> Please declare a separate driver instance instead of overriding.

I hope the new patch follows the coding style? :)
 
> And asm-ppc64 is still missing an agp.h, no?

Maybe; someone with better knowledge may be able to tell us more about that :)
Anyway, BenH told me that there is still a pending issue with AGP and
potential cache aliasing.

best,
Jerome Glisse
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uninorth-patch5
Type: application/octet-stream
Size: 11215 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050109/5cfffcbb/attachment.obj 

From j.glisse at gmail.com  Mon Jan 10 07:41:44 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Sun, 9 Jan 2005 21:41:44 +0100
Subject: U3 G5 AGP support patch (v4)
In-Reply-To: <4240b91605010909463e44bba8@mail.gmail.com>
References: <4240b916050109072621440269@mail.gmail.com>
	<20050109160614.GA22839@lst.de>
	<4240b91605010909463e44bba8@mail.gmail.com>
Message-ID: <4240b91605010912414a5b1b67@mail.gmail.com>

It seems there is a bug somewhere in my AGP patch. I was playing with
r300 (Radeon) and got a hard lockup (I'm quite used to that while
playing with r300, though :()

But after a bit of investigation it seems to be related to AGP. Right now
I am porting an old tool from DRI that tests agpgart and thus AGP. I may
finally need to totally split the U3 driver from the old UniNorth one.

I will take a deeper look to track down the issue. In the meantime, could
someone test AGP and a Radeon R200 on a G5? You will probably lock up your
G5, but it should not burn; at least here I just got some smoke ;)

best,
Jerome Glisse


From paulus at samba.org  Mon Jan 10 08:03:20 2005
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 10 Jan 2005 08:03:20 +1100
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <4240b916050109074053e328b1@mail.gmail.com>
References: <4240b916050109074053e328b1@mail.gmail.com>
Message-ID: <16865.39960.274092.996530@cargo.ozlabs.ibm.com>

Jerome Glisse writes:

> With 2.6.10 i get a compilation error with disable_6xx_mmu

What kind of machine is this?  Could you send me your .config?

I suspect that maybe we aren't defining CONFIG_6XX for PPC970
machines.

Paul.


From david at gibson.dropbear.id.au  Tue Jan 11 02:55:20 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 11 Jan 2005 02:55:20 +1100
Subject: [PPC64] Hugepage bugfix
Message-ID: <20050110155520.GA22101@localhost.localdomain>

Andrew, Linus, please apply:

Fix a stupid unbalanced lock bug in the ppc64 hugepage code.  It led
rapidly to a crash if both CONFIG_HUGETLB_PAGE and CONFIG_PREEMPT were
enabled (even without actually using hugepages at all).
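The shape of the fix (every exit path, including the early "no huge pgdir" case, leaves through a single unlock) can be shown in userspace. This is a sketch only: a pthread mutex stands in for the kernel spinlock, and struct mm_sketch and free_huge_pgdir() are hypothetical simplifications of the mm fields and function touched by the patch.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, cut-down model of the mm fields used by the fix. */
struct mm_sketch {
	pthread_mutex_t page_table_lock;
	void *huge_pgdir;
};

static void free_huge_pgdir(struct mm_sketch *mm)
{
	void *pgdir;

	pthread_mutex_lock(&mm->page_table_lock);

	pgdir = mm->huge_pgdir;
	if (!pgdir)
		goto out;	/* a bare 'return' here would leave the lock held */

	mm->huge_pgdir = NULL;
	free(pgdir);

 out:
	pthread_mutex_unlock(&mm->page_table_lock);
}
```

After either path the lock is free again, which the tests check with pthread_mutex_trylock().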

Signed-off-by: David Gibson <dwg at au1.ibm.com>

Index: working-2.6/arch/ppc64/mm/hugetlbpage.c
===================================================================
--- working-2.6.orig/arch/ppc64/mm/hugetlbpage.c	2005-01-06 10:47:48.000000000 +1100
+++ working-2.6/arch/ppc64/mm/hugetlbpage.c	2005-01-10 15:16:25.142319552 +1100
@@ -745,7 +745,7 @@
 
 	pgdir = mm->context.huge_pgdir;
 	if (! pgdir)
-		return;
+		goto out;
 
 	mm->context.huge_pgdir = NULL;
 
@@ -768,6 +768,7 @@
 	BUG_ON(memcmp(pgdir, empty_zero_page, PAGE_SIZE));
 	kmem_cache_free(zero_cache, pgdir);
 
+ out:
 	spin_unlock(&mm->page_table_lock);
 }
 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist.  NOT _the_ _other_ _way_
				| _around_!
http://www.ozlabs.org/people/dgibson


From wli at holomorphy.com  Mon Jan 10 16:04:41 2005
From: wli at holomorphy.com (William Lee Irwin III)
Date: Sun, 9 Jan 2005 21:04:41 -0800
Subject: [PPC64] Hugepage bugfix
In-Reply-To: <20050110155520.GA22101@localhost.localdomain>
References: <20050110155520.GA22101@localhost.localdomain>
Message-ID: <20050110050441.GA2696@holomorphy.com>

On Tue, Jan 11, 2005 at 02:55:20AM +1100, David Gibson wrote:
> Andrew, Linus, please apply:
> Fix a stupid unbalanced lock bug in the ppc64 hugepage code.  Led
> rapidly to a crash if both CONFIG_HUGETLB_PAGE and CONFIG_PREEMPT were
> enabled (even without actually using hugepages at all).
> Signed-off-by: David Gibson <dwg at au1.ibm.com>

Acked-by: William Irwin <wli at holomorphy.com>


From david at gibson.dropbear.id.au  Tue Jan 11 05:00:04 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 11 Jan 2005 05:00:04 +1100
Subject: [PPC64] Rename perf counter register #defines
Message-ID: <20050110180004.GC22101@localhost.localdomain>

Andrew, please apply:

This patch makes some cleanups to the #defines for various fields in
the MMCR0 performance monitor control register.  Specifically, the
names of a couple of bits are changed so that: a) they are a bit less
cumbersomely long and b) they match the names used in the hardware
documentation.

Signed-off-by: David Gibson <dwg at au1.ibm.com>

Index: working-2.6/include/asm-ppc64/processor.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/processor.h	2005-01-10 16:51:10.625391320 +1100
+++ working-2.6/include/asm-ppc64/processor.h	2005-01-10 16:51:28.771295712 +1100
@@ -331,8 +331,8 @@
 #define   MMCR0_FCECE	0x02000000UL /* freeze counters on enabled condition or event */
 /* time base exception enable */
 #define   MMCR0_TBEE	0x00400000UL /* time base exception enable */
-#define   MMCR0_PMC1INTCONTROL	0x00008000UL /* PMC1 count enable*/
-#define   MMCR0_PMCNINTCONTROL	0x00004000UL /* PMCn count enable*/
+#define   MMCR0_PMC1CE	0x00008000UL /* PMC1 count enable*/
+#define   MMCR0_PMCjCE	0x00004000UL /* PMCj count enable*/
 #define   MMCR0_TRIGGER	0x00002000UL /* TRIGGER enable */
 #define   MMCR0_PMAO	0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */
 #define   MMCR0_SHRFC	0x00000040UL /* SHRre freeze conditions between threads */
Index: working-2.6/arch/ppc64/oprofile/op_model_rs64.c
===================================================================
--- working-2.6.orig/arch/ppc64/oprofile/op_model_rs64.c	2005-01-10 16:51:10.625391320 +1100
+++ working-2.6/arch/ppc64/oprofile/op_model_rs64.c	2005-01-10 16:51:28.772295560 +1100
@@ -119,7 +119,7 @@
 
 	mmcr0 |= MMCR0_FCM1|MMCR0_PMXE|MMCR0_FCECE;
 	/* Only applies to POWER3, but should be safe on RS64 */
-	mmcr0 |= MMCR0_PMC1INTCONTROL|MMCR0_PMCNINTCONTROL;
+	mmcr0 |= MMCR0_PMC1CE|MMCR0_PMCjCE;
 	mtspr(SPRN_MMCR0, mmcr0);
 
 	dbg("setup on cpu %d, mmcr0 %lx\n", smp_processor_id(),
Index: working-2.6/arch/ppc64/oprofile/op_model_power4.c
===================================================================
--- working-2.6.orig/arch/ppc64/oprofile/op_model_power4.c	2005-01-10 16:51:10.626391168 +1100
+++ working-2.6/arch/ppc64/oprofile/op_model_power4.c	2005-01-10 16:51:28.772295560 +1100
@@ -97,7 +97,7 @@
 	mtspr(SPRN_MMCR0, mmcr0);
 
 	mmcr0 |= MMCR0_FCM1|MMCR0_PMXE|MMCR0_FCECE;
-	mmcr0 |= MMCR0_PMC1INTCONTROL|MMCR0_PMCNINTCONTROL;
+	mmcr0 |= MMCR0_PMC1CE|MMCR0_PMCjCE;
 	mtspr(SPRN_MMCR0, mmcr0);
 
 	mtspr(SPRN_MMCR1, mmcr1_val);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist.  NOT _the_ _other_ _way_
				| _around_!
http://www.ozlabs.org/people/dgibson


From david at gibson.dropbear.id.au  Tue Jan 11 05:01:27 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 11 Jan 2005 05:01:27 +1100
Subject: [PPC64] Functions to reserve performance monitor hardware
Message-ID: <20050110180127.GD22101@localhost.localdomain>

Andrew, please apply:

The PPC64 interrupt code includes a hook to call when an exception
from the performance monitor unit occurs.  However, there's no way of
reserving the hook properly, so if more than one bit of code tries to
use it things will get ugly.  Currently oprofile is the only user, but
there are likely to be more in future e.g. perfctr, if and when it
reaches a fit state for merging.

This patch creates functions to reserve and release the performance
monitor hardware (including its interrupt), and makes oprofile use
them.  It also creates a new arch/ppc64/kernel/pmc.c, in which we can
put any future helper functions for handling the performance monitor
counters.
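The reserve/release semantics described above (only one owner at a time, a second reservation fails with -EBUSY, and release restores the dummy handler) can be exercised in a userspace model. This is a sketch, not the patch itself: a pthread mutex replaces the spinlock, an explicit owner token replaces __builtin_return_address(0), regs is an opaque void pointer instead of struct pt_regs, and the GNU `?:` shorthand is written out as a standard conditional.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

typedef void (*perf_irq_t)(void *regs);

/* Default handler; in the kernel it disables PMU exceptions. */
static void dummy_perf(void *regs) { (void)regs; }

static pthread_mutex_t pmc_owner_lock = PTHREAD_MUTEX_INITIALIZER;
static const void *pmc_owner;		/* NULL when the hardware is free */
static perf_irq_t perf_irq = dummy_perf;

int reserve_pmc_hardware(const void *owner, perf_irq_t new_perf_irq)
{
	int err = -EBUSY;

	pthread_mutex_lock(&pmc_owner_lock);
	if (pmc_owner)
		goto out;		/* already reserved by someone else */

	pmc_owner = owner;
	perf_irq = new_perf_irq ? new_perf_irq : dummy_perf;
	err = 0;
 out:
	pthread_mutex_unlock(&pmc_owner_lock);
	return err;
}

void release_pmc_hardware(void)
{
	pthread_mutex_lock(&pmc_owner_lock);
	pmc_owner = NULL;
	perf_irq = dummy_perf;
	pthread_mutex_unlock(&pmc_owner_lock);
}
```

A user such as oprofile would call reserve_pmc_hardware() before touching the counters, treat -EBUSY as "another subsystem owns the PMU", and call release_pmc_hardware() on shutdown.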

Signed-off-by: David Gibson <dwg at au1.ibm.com>

Index: working-2.6/arch/ppc64/kernel/pmc.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/arch/ppc64/kernel/pmc.c	2005-01-10 16:32:49.733411536 +1100
@@ -0,0 +1,65 @@
+/*
+ *  linux/arch/ppc64/kernel/pmc.c
+ *
+ *  Copyright (C) 2004 David Gibson, IBM Corporation.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/config.h>
+#include <linux/errno.h>
+#include <linux/spinlock.h>
+
+#include <asm/processor.h>
+#include <asm/pmc.h>
+
+/* Ensure exceptions are disabled */
+static void dummy_perf(struct pt_regs *regs)
+{
+	unsigned int mmcr0 = mfspr(SPRN_MMCR0);
+
+	mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO);
+	mtspr(SPRN_MMCR0, mmcr0);
+}
+
+static spinlock_t pmc_owner_lock = SPIN_LOCK_UNLOCKED;
+static void *pmc_owner_caller; /* mostly for debugging */
+perf_irq_t perf_irq = dummy_perf;
+
+int reserve_pmc_hardware(perf_irq_t new_perf_irq)
+{
+	int err = -EBUSY;
+
+	spin_lock(&pmc_owner_lock);
+
+	if (pmc_owner_caller) {
+		printk(KERN_WARNING "reserve_pmc_hardware: "
+		       "PMC hardware busy (reserved by caller %p)\n",
+		       pmc_owner_caller);
+		goto out;
+	}
+
+	pmc_owner_caller = __builtin_return_address(0);
+	perf_irq = new_perf_irq ? : dummy_perf;
+
+	err = 0;
+
+ out:
+	spin_unlock(&pmc_owner_lock);
+	return err;
+}
+
+void release_pmc_hardware(void)
+{
+	spin_lock(&pmc_owner_lock);
+
+	WARN_ON(! pmc_owner_caller);
+
+	pmc_owner_caller = NULL;
+	perf_irq = dummy_perf;
+
+	spin_unlock(&pmc_owner_lock);
+}
Index: working-2.6/arch/ppc64/kernel/traps.c
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/traps.c	2005-01-10 10:51:31.000000000 +1100
+++ working-2.6/arch/ppc64/kernel/traps.c	2005-01-10 16:33:43.154412536 +1100
@@ -40,6 +40,7 @@
 #include <asm/rtas.h>
 #include <asm/systemcfg.h>
 #include <asm/machdep.h>
+#include <asm/pmc.h>
 
 #ifdef CONFIG_DEBUGGER
 int (*__debugger)(struct pt_regs *regs);
@@ -449,18 +450,7 @@
 	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
 }
 
-/* Ensure exceptions are disabled */
-static void dummy_perf(struct pt_regs *regs)
-{
-	unsigned int mmcr0 = mfspr(SPRN_MMCR0);
-
-	mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO);
-	mtspr(SPRN_MMCR0, mmcr0);
-}
-
-void (*perf_irq)(struct pt_regs *) = dummy_perf;
-
-EXPORT_SYMBOL(perf_irq);
+extern perf_irq_t perf_irq;
 
 void performance_monitor_exception(struct pt_regs *regs)
 {
Index: working-2.6/include/asm-ppc64/pmc.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/include/asm-ppc64/pmc.h	2005-01-10 15:24:40.217406672 +1100
@@ -0,0 +1,29 @@
+/*
+ * pmc.h
+ * Copyright (C) 2004  David Gibson, IBM Corporation
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ * 
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ * 
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+#ifndef _PPC64_PMC_H
+#define _PPC64_PMC_H
+
+#include <asm/ptrace.h>
+
+typedef void (*perf_irq_t)(struct pt_regs *);
+
+int reserve_pmc_hardware(perf_irq_t new_perf_irq);
+void release_pmc_hardware(void);
+
+#endif /* _PPC64_PMC_H */
Index: working-2.6/arch/ppc64/kernel/Makefile
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/Makefile	2005-01-10 10:51:31.000000000 +1100
+++ working-2.6/arch/ppc64/kernel/Makefile	2005-01-10 15:24:40.218406520 +1100
@@ -11,7 +11,7 @@
 			udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \
 			ptrace32.o signal32.o rtc.o init_task.o \
 			lmb.o cputable.o cpu_setup_power4.o idle_power4.o \
-			iommu.o sysfs.o
+			iommu.o sysfs.o pmc.o
 
 obj-$(CONFIG_PPC_OF) +=	of_device.o
 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist.  NOT _the_ _other_ _way_
				| _around_!
http://www.ozlabs.org/people/dgibson


From anil_411 at yahoo.com  Mon Jan 10 18:49:30 2005
From: anil_411 at yahoo.com (Anil Kumar Prasad)
Date: Sun, 9 Jan 2005 23:49:30 -0800 (PST)
Subject: ioremap of pci region on pSeries LPAR vs SMP
Message-ID: <20050110074930.92901.qmail@web11508.mail.yahoo.com>

Hi,
I am using the SLES9 default kernel (2.6.5). I have a piece
of code where I do an ioremap on a PCI memory region. It
works on a JS20 machine, where Linux runs in partition
mode, but it causes an SLB miss on an SMP box (p630) and
subsequently panics.
On the JS20 I get a VA in IO_REGION (0xE000....), while on
the p630 ioremap returns an address in
EEH_REGION (0xA000...). As soon as I try to dereference
this returned VA on the p630, the kernel crashes (the dump is at
the end of this mail).

I looked in slab.c:slb_allocate(). It doesn't look
like an SLB miss in EEH_REGION will ever get through,
because the REGION_ID check will return a user address.

Did I miss something? Please help.

Thanks a lot,
Anil.
------------------
SMP NR_CPUS=128 NUMA PSERIES 
NIP: D000000000649CB4 XER: 0000000000000000 LR: D000000000649CA4
REGS: c0000003f7897670 TRAP: 0380   Tainted: GF U  (2.6.5-7.97-pseries64)
MSR: 9000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: a000000082060010, DSISR: 0000000002200000
TASK: c00000003f1b5340[5037] 'modprobe' THREAD: c0000003f7894000 CPU: 0
GPR00: 0000000000000001 C0000003F78978F0 D0000000006AEB70 00000000001E8480
GPR04: 0000000000000000 0000000000000004 0000000028088422 0000000000000000
GPR08: 0000000000000000 FFFFFFFFFFFFFFFF C0000000006CAC80 0000000000000080
GPR12: 0000000048004028 C000000000444000 D0000000006A6DD9 D0000000006A6DA8
GPR16: 0000000000000001 0000000000000000 C000000000411770 C000000000411670
GPR20: C000000000411050 D0000000006A6DA8 0000000000001867 0000000000006278
GPR24: 0000000000000001 C0000003F7897A40 C0000001FD158080 C0000001FD158180
GPR28: 0000000000000000 A000000082060010 D0000000006A8F38 C000000000411000
---------------------------------------------------




From anil_411 at yahoo.com  Mon Jan 10 18:49:45 2005
From: anil_411 at yahoo.com (Anil Kumar Prasad)
Date: Sun, 9 Jan 2005 23:49:45 -0800 (PST)
Subject: ioremap of pci region on pSeries LPAR vs SMP
Message-ID: <20050110074945.83609.qmail@web11501.mail.yahoo.com>

Hi,
I am using SLES9 default kernel(2.6.5). I have a piece
of code where i do ioremap on pci memory region. It
works on JS20 machine where linux runs in partition
mode while it causes SLB miss on SMP box(P630) and
subsequently panics. 
On JS20, i get va in IO_REGION (0xE000....) while on
p630  ioremap returns address in
EEH_REGION(0xA000...). As soon as i try to dereference
this returned va on p630, kernel crashes(dump is at
the end of mail).

I looked in slab.c:slb_allocate(). it doesn't look
like that SLB miss in EEH_REGION will ever get through
'us REGION_ID check will return user address.

Did i miss something? Please help.

Thanks a lot,
Anil.
------------------
SMP NR_CPUS=128 NUMA PSERIES
NIP: D000000000649CB4 XER: 0000000000000000 LR: D000000000649CA4
REGS: c0000003f7897670 TRAP: 0380   Tainted: GF U  (2.6.5-7.97-pseries64)
MSR: 9000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: a000000082060010, DSISR: 0000000002200000
TASK: c00000003f1b5340[5037] 'modprobe' THREAD: c0000003f7894000 CPU: 0
GPR00: 0000000000000001 C0000003F78978F0 D0000000006AEB70 00000000001E8480
GPR04: 0000000000000000 0000000000000004 0000000028088422 0000000000000000
GPR08: 0000000000000000 FFFFFFFFFFFFFFFF C0000000006CAC80 0000000000000080
GPR12: 0000000048004028 C000000000444000 D0000000006A6DD9 D0000000006A6DA8
GPR16: 0000000000000001 0000000000000000 C000000000411770 C000000000411670
GPR20: C000000000411050 D0000000006A6DA8 0000000000001867 0000000000006278
GPR24: 0000000000000001 C0000003F7897A40 C0000001FD158080 C0000001FD158180
GPR28: 0000000000000000 A000000082060010 D0000000006A8F38 C000000000411000
---------------------------------------------------




From paulus at samba.org  Mon Jan 10 20:10:59 2005
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 10 Jan 2005 20:10:59 +1100
Subject: ioremap of pci region on pSeries LPAR vs SMP
In-Reply-To: <20050110074930.92901.qmail@web11508.mail.yahoo.com>
References: <20050110074930.92901.qmail@web11508.mail.yahoo.com>
Message-ID: <16866.18083.212727.327170@cargo.ozlabs.ibm.com>

Anil Kumar Prasad writes:

> On JS20, i get va in IO_REGION (0xE000....) while on
> p630  ioremap returns address in
> EEH_REGION(0xA000...). As soon as i try to dereference
> this returned va on p630, kernel crashes(dump is at
> the end of mail).

You shouldn't ever directly dereference the result of ioremap.  You
have to use readb/readw/readl and writeb/writew/writel.

Paul.


From trini at kernel.crashing.org  Tue Jan 11 01:52:19 2005
From: trini at kernel.crashing.org (Tom Rini)
Date: Mon, 10 Jan 2005 07:52:19 -0700
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <16865.39960.274092.996530@cargo.ozlabs.ibm.com>
References: <4240b916050109074053e328b1@mail.gmail.com>
	<16865.39960.274092.996530@cargo.ozlabs.ibm.com>
Message-ID: <20050110145219.GB2226@smtp.west.cox.net>

On Mon, Jan 10, 2005 at 08:03:20AM +1100, Paul Mackerras wrote:
> Jerome Glisse writes:
> 
> > With 2.6.10 i get a compilation error with disable_6xx_mmu
> 
> What kind of machine is this?  Could you send me your .config?
> 
> I suspect that maybe we aren't defining CONFIG_6XX for PPC970
> machines.

Indeed.  It might make most sense to do something like:

Signed-off-by: Tom Rini <trini at kernel.crashing.org>

--- 1.40/arch/ppc/boot/simple/Makefile	2005-01-03 16:49:19 -07:00
+++ edited/arch/ppc/boot/simple/Makefile	2005-01-10 07:51:34 -07:00
@@ -112,11 +112,15 @@
          end-$(pcore)			:= pcore
    cacheflag-$(pcore)			:= -include $(clear_L2_L3)
 
+# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real
+# machine.
+ifeq ($(CONFIG_6xx),y)
       zimage-$(CONFIG_PPC_PREP)		:= zImage-PPLUS
 zimageinitrd-$(CONFIG_PPC_PREP)		:= zImage.initrd-PPLUS
      extra.o-$(CONFIG_PPC_PREP)		:= prepmap.o
         misc-$(CONFIG_PPC_PREP)		+= misc-prep.o mpc10x_memory.o
          end-$(CONFIG_PPC_PREP)		:= prep
+endif
 
          end-$(CONFIG_SANDPOINT)	:= sandpoint
    cacheflag-$(CONFIG_SANDPOINT)	:= -include $(clear_L2_L3)

-- 
Tom Rini
http://gate.crashing.org/~trini/


From hch at lst.de  Tue Jan 11 03:39:15 2005
From: hch at lst.de (Christoph Hellwig)
Date: Mon, 10 Jan 2005 17:39:15 +0100
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <20050110145219.GB2226@smtp.west.cox.net>
References: <4240b916050109074053e328b1@mail.gmail.com>
	<16865.39960.274092.996530@cargo.ozlabs.ibm.com>
	<20050110145219.GB2226@smtp.west.cox.net>
Message-ID: <20050110163914.GA11906@lst.de>

On Mon, Jan 10, 2005 at 07:52:19AM -0700, Tom Rini wrote:
> +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real
> +# machine.

Maybe we should prevent setting PPC_PREP to y for PPC970 instead?


From trini at kernel.crashing.org  Tue Jan 11 03:44:02 2005
From: trini at kernel.crashing.org (Tom Rini)
Date: Mon, 10 Jan 2005 09:44:02 -0700
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <20050110163914.GA11906@lst.de>
References: <4240b916050109074053e328b1@mail.gmail.com>
	<16865.39960.274092.996530@cargo.ozlabs.ibm.com>
	<20050110145219.GB2226@smtp.west.cox.net>
	<20050110163914.GA11906@lst.de>
Message-ID: <20050110164402.GF2226@smtp.west.cox.net>

On Mon, Jan 10, 2005 at 05:39:15PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 10, 2005 at 07:52:19AM -0700, Tom Rini wrote:
> > +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real
> > +# machine.
> 
> Maybe we should prevent setting PPC_PREP to y for PPC970 instead?

I don't know if that'll compile.  It'd be nice because it means we could
try splitting the PREP stuff out of the OpenFirmware (pmac/chrp) stuff
again.

-- 
Tom Rini
http://gate.crashing.org/~trini/


From linas at austin.ibm.com  Tue Jan 11 04:47:16 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Mon, 10 Jan 2005 11:47:16 -0600
Subject: ioremap of pci region on pSeries LPAR vs SMP
In-Reply-To: <16866.18083.212727.327170@cargo.ozlabs.ibm.com>
References: <20050110074930.92901.qmail@web11508.mail.yahoo.com>
	<16866.18083.212727.327170@cargo.ozlabs.ibm.com>
Message-ID: <20050110174716.GW22274@austin.ibm.com>

Hi Paul,

On Mon, Jan 10, 2005 at 08:10:59PM +1100, Paul Mackerras was heard to remark:
> Anil Kumar Prasad writes:
> 
> > On JS20, i get va in IO_REGION (0xE000....) while on
> > p630  ioremap returns address in
> > EEH_REGION(0xA000...). As soon as i try to dereference
> > this returned va on p630, kernel crashes(dump is at
> > the end of mail).
> 
> You shouldn't ever directly dereference the result of ioremap.  You
> have to use readb/readw/readl and writeb/writew/writel.

Paul,

Please note that someone removed the EEH_REGION stuff recently,
October-ish I think.  I don't know why; I thought it was something
you condoned.  And so in the latest kernels, it *is* legal to directly 
dereference the result of ioremap.  That is, Anil wouldn't have seen 
this bug if he'd been running the current BK sources.

Was removing this mechanism the right thing to do?  If so, why?
It seemed like a great way to force everyone to use the 
readb/etc macros.


--linas


From j.glisse at gmail.com  Tue Jan 11 05:14:28 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Mon, 10 Jan 2005 19:14:28 +0100
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <20050110145219.GB2226@smtp.west.cox.net>
References: <4240b916050109074053e328b1@mail.gmail.com>
	<16865.39960.274092.996530@cargo.ozlabs.ibm.com>
	<20050110145219.GB2226@smtp.west.cox.net>
Message-ID: <4240b9160501101014317b8d85@mail.gmail.com>

> Signed-off-by: Tom Rini <trini at kernel.crashing.org>
> 
> --- 1.40/arch/ppc/boot/simple/Makefile  2005-01-03 16:49:19 -07:00
> +++ edited/arch/ppc/boot/simple/Makefile        2005-01-10 07:51:34 -07:00
> @@ -112,11 +112,15 @@
>           end-$(pcore)                  := pcore
>     cacheflag-$(pcore)                  := -include $(clear_L2_L3)
> 
> +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real
> +# machine.
> +ifeq ($(CONFIG_6xx),y)
>        zimage-$(CONFIG_PPC_PREP)                := zImage-PPLUS
>  zimageinitrd-$(CONFIG_PPC_PREP)                := zImage.initrd-PPLUS
>       extra.o-$(CONFIG_PPC_PREP)                := prepmap.o
>          misc-$(CONFIG_PPC_PREP)                += misc-prep.o mpc10x_memory.o
>           end-$(CONFIG_PPC_PREP)                := prep
> +endif
> 
>           end-$(CONFIG_SANDPOINT)       := sandpoint
>     cacheflag-$(CONFIG_SANDPOINT)       := -include $(clear_L2_L3)
> 

This does not compile with this patch. Maybe we also need to define
CONFIG_6xx if PPC970 is selected as the processor?

The errors are:
undefined references to cols, lines, vidmems, scroll, orig_x, orig_y
in functions puts, ClearVideoMemory, putc

Jerome Glisse


From j.glisse at gmail.com  Tue Jan 11 05:16:22 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Mon, 10 Jan 2005 19:16:22 +0100
Subject: U3 G5 AGP support patch (v4)
In-Reply-To: <4240b91605010912414a5b1b67@mail.gmail.com>
References: <4240b916050109072621440269@mail.gmail.com>
	<20050109160614.GA22839@lst.de>
	<4240b91605010909463e44bba8@mail.gmail.com>
	<4240b91605010912414a5b1b67@mail.gmail.com>
Message-ID: <4240b916050110101647cfb8f9@mail.gmail.com>

> It seems there is bug somewhere in my agp patch. I was playing with
> r300 radeon and
> i get a hard lockup (quite used to that while playing with r300 thought :()
> 
> But after a bit of investigation it seems to be related to agp. Right now i am
> porting an old tools from dri that test agpgart & thus agp. I finally may really
> need to totaly split the u3 driver from the old uninorth.
> 
> I will give a deeper look to track down the issue. In the mean time if some
> one could test agp & radeon r200 on a g5. You will certainly lockup your g5
> but it should not burn, at least here i just got some smoke ;)
> 

In the end, this was because I was doing some nasty stuff elsewhere :)
So AGP seems to work well, at least here with some r300
test programs that use AGP :)

best,
Jerome Glisse


From trini at kernel.crashing.org  Tue Jan 11 05:29:41 2005
From: trini at kernel.crashing.org (Tom Rini)
Date: Mon, 10 Jan 2005 11:29:41 -0700
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <4240b9160501101014317b8d85@mail.gmail.com>
References: <4240b916050109074053e328b1@mail.gmail.com>
	<16865.39960.274092.996530@cargo.ozlabs.ibm.com>
	<20050110145219.GB2226@smtp.west.cox.net>
	<4240b9160501101014317b8d85@mail.gmail.com>
Message-ID: <20050110182940.GA3391@smtp.west.cox.net>

On Mon, Jan 10, 2005 at 07:14:28PM +0100, Jerome Glisse wrote:
> > Signed-off-by: Tom Rini <trini at kernel.crashing.org>
> > 
> > --- 1.40/arch/ppc/boot/simple/Makefile  2005-01-03 16:49:19 -07:00
> > +++ edited/arch/ppc/boot/simple/Makefile        2005-01-10 07:51:34 -07:00
> > @@ -112,11 +112,15 @@
> >           end-$(pcore)                  := pcore
> >     cacheflag-$(pcore)                  := -include $(clear_L2_L3)
> > 
> > +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real
> > +# machine.
> > +ifeq ($(CONFIG_6xx),y)
> >        zimage-$(CONFIG_PPC_PREP)                := zImage-PPLUS
> >  zimageinitrd-$(CONFIG_PPC_PREP)                := zImage.initrd-PPLUS
> >       extra.o-$(CONFIG_PPC_PREP)                := prepmap.o
> >          misc-$(CONFIG_PPC_PREP)                += misc-prep.o mpc10x_memory.o
> >           end-$(CONFIG_PPC_PREP)                := prep
> > +endif
> > 
> >           end-$(CONFIG_SANDPOINT)       := sandpoint
> >     cacheflag-$(CONFIG_SANDPOINT)       := -include $(clear_L2_L3)
> > 
> 
> This do not compile with this patch maybe need also to define
> CONFIG_6xx if PPC970 is selected as processor ?

I have a feeling CONFIG_6xx isn't selected for a good reason.  Can you
try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig
and seeing if you can build / boot ?  Thanks.

-- 
Tom Rini
http://gate.crashing.org/~trini/


From j.glisse at gmail.com  Tue Jan 11 05:59:50 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Mon, 10 Jan 2005 19:59:50 +0100
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <20050110182940.GA3391@smtp.west.cox.net>
References: <4240b916050109074053e328b1@mail.gmail.com>
	<16865.39960.274092.996530@cargo.ozlabs.ibm.com>
	<20050110145219.GB2226@smtp.west.cox.net>
	<4240b9160501101014317b8d85@mail.gmail.com>
	<20050110182940.GA3391@smtp.west.cox.net>
Message-ID: <4240b91605011010593d2f3b3d@mail.gmail.com>

> I have a feeling CONFIG_6xx isn't selected for a good reason.  Can you
> try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig
> and seeing if you can build / boot ?  Thanks.
> 
> --
> Tom Rini
> http://gate.crashing.org/~trini/
> 

It seems this flag is linked to many things :) I tried removing the PPC_PREP
bool, but the kernel fails to compile, again with new errors:

arch/ppc/kernel/built-in.o(.init.text+0x610): In function `DoSyscall':
arch/ppc/kernel/entry.S: undefined reference to `prep_init'

arch/ppc/platforms/built-in.o(.pmac.text+0x936): In function `note_bootable_part':
: undefined reference to `boot_dev'

I've attached my config. Someone asked me for it earlier, but
I crashed my system in the meantime, so here it is.

Jerome Glisse
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config-ppc970
Type: application/octet-stream
Size: 27921 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050110/902cf012/attachment.obj 

From trini at kernel.crashing.org  Tue Jan 11 06:12:48 2005
From: trini at kernel.crashing.org (Tom Rini)
Date: Mon, 10 Jan 2005 12:12:48 -0700
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <4240b91605011010593d2f3b3d@mail.gmail.com>
References: <4240b916050109074053e328b1@mail.gmail.com>
	<16865.39960.274092.996530@cargo.ozlabs.ibm.com>
	<20050110145219.GB2226@smtp.west.cox.net>
	<4240b9160501101014317b8d85@mail.gmail.com>
	<20050110182940.GA3391@smtp.west.cox.net>
	<4240b91605011010593d2f3b3d@mail.gmail.com>
Message-ID: <20050110191248.GB3391@smtp.west.cox.net>

On Mon, Jan 10, 2005 at 07:59:50PM +0100, Jerome Glisse wrote:
> > I have a feeling CONFIG_6xx isn't selected for a good reason.  Can you
> > try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig
> > and seeing if you can build / boot ?  Thanks.
> > 
> > --
> > Tom Rini
> > http://gate.crashing.org/~trini/
> > 
> 
> Seems that this flags is linked to many things :) I tried removing PPC_PREP
> bool but the kernel fail to compile with again new errors :
> 

One last thing before we just do what you suggested originally, can you
hack it so that PPC_PREP is still set, but on 970 we still set
CONFIG_6xx?  Thanks again.

-- 
Tom Rini
http://gate.crashing.org/~trini/


From j.glisse at gmail.com  Tue Jan 11 06:31:29 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Mon, 10 Jan 2005 20:31:29 +0100
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <20050110191248.GB3391@smtp.west.cox.net>
References: <4240b916050109074053e328b1@mail.gmail.com>
	<16865.39960.274092.996530@cargo.ozlabs.ibm.com>
	<20050110145219.GB2226@smtp.west.cox.net>
	<4240b9160501101014317b8d85@mail.gmail.com>
	<20050110182940.GA3391@smtp.west.cox.net>
	<4240b91605011010593d2f3b3d@mail.gmail.com>
	<20050110191248.GB3391@smtp.west.cox.net>
Message-ID: <4240b91605011011314bb06814@mail.gmail.com>

On Mon, 10 Jan 2005 12:12:48 -0700, Tom Rini <trini at kernel.crashing.org> wrote:
> On Mon, Jan 10, 2005 at 07:59:50PM +0100, Jerome Glisse wrote:
> > > I have a feeling CONFIG_6xx isn't selected for a good reason.  Can you
> > > try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig
> > > and seeing if you can build / boot ?  Thanks.
> > >
> > > --
> > > Tom Rini
> > > http://gate.crashing.org/~trini/
> > >
> >
> > Seems that this flags is linked to many things :) I tried removing PPC_PREP
> > bool but the kernel fail to compile with again new errors :
> >
> 
> One last thing before we just do what you suggested originally, can you
> hack it so that PPC_PREP is still set, but on 970 we still set
> CONFIG_6xx?  Thanks again.
> 
> --
> Tom Rini
> http://gate.crashing.org/~trini/
> 

This issue must be strongly linked to Murphy's Law.
I got another compile error when I added CONFIG_6xx=y
to my kernel config.

  LD      .tmp_vmlinux1
ld: arch/ppc/kernel/idle_6xx.o: No such file: No such file or directory

Unfortunately, I have to travel (a trip for my studies) and won't
have access to any G5 or PPC970 running Linux until I come back on
Friday or Saturday. I may still be able to read my mail in the meantime.

Does this disable_6xx_mmu function do something that we really need
on PPC970? I haven't had enough time to look at this function and
understand it.

By the way, although I'm pretty sure this is not related, my kernel is
patched with one of my patches (I posted it on this mailing list) that
adds support for the U3 AGP bridge on the G5.

This patch only affects a few files, and if I remember correctly, I
also tested without it, with no success.

Files affected by my patch: pciids.h, uninorth.c (char/driver/agp),
uninorth.h (asm-ppc & asm-ppc64), and some changes in the Kconfig of
char/driver/agp.

One strange thing is that no one except me has reported compilation
errors. Does no one use Linux on a G5? Am I alone? :)

best,
Jerome Glisse


From domen at coderock.org  Tue Jan 11 06:59:58 2005
From: domen at coderock.org (domen at coderock.org)
Date: Mon, 10 Jan 2005 20:59:58 +0100
Subject: [patch 1/1] ppc64: semicolon in rtasd.c
Message-ID: <20050110195959.4D66A1F203@trashy.coderock.org>


Hi.

Comments and indentation suggest this was wrong.

Signed-off-by: Domen Puncer <domen at coderock.org>
---


 kj-domen/arch/ppc64/kernel/rtasd.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c arch/ppc64/kernel/rtasd.c
--- kj/arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c	2005-01-10 18:00:30.000000000 +0100
+++ kj-domen/arch/ppc64/kernel/rtasd.c	2005-01-10 18:00:30.000000000 +0100
@@ -486,7 +486,7 @@ static int __init rtas_init(void)
 
 	/* No RTAS, only warn if we are on a pSeries box  */
 	if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) {
-		if (systemcfg->platform & PLATFORM_PSERIES);
+		if (systemcfg->platform & PLATFORM_PSERIES)
 			printk(KERN_ERR "rtasd: no event-scan on system\n");
 		return 1;
 	}
_
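The bug fixed above is a classic one: the `;` after the `if` condition is an empty statement, so the indented line underneath runs unconditionally. The sketch below reproduces the shape of the rtasd.c condition in plain userspace C; the function names are hypothetical and return 1 when the "warning" would have been printed. GCC's `-Wempty-body` (enabled by `-Wextra`) reports this pattern.

```c
/* Buggy form: "if (cond);" has an empty body, so the condition is
 * evaluated and then ignored. Returns 1 if the "warning" ran. */
static int warn_buggy(int on_pseries)
{
	int warned = 0;

	if (on_pseries);	/* empty statement ends the if here */
		warned = 1;	/* always executed, despite the indentation */
	return warned;
}

/* Fixed form: the assignment really is the body of the if. */
static int warn_fixed(int on_pseries)
{
	int warned = 0;

	if (on_pseries)
		warned = 1;
	return warned;
}
```

With the stray semicolon, a non-pSeries box gets the warning too: `warn_buggy(0)` returns 1, while `warn_fixed(0)` returns 0.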


From hollis at penguinppc.org  Tue Jan 11 07:15:57 2005
From: hollis at penguinppc.org (Hollis Blanchard)
Date: Mon, 10 Jan 2005 20:15:57 +0000
Subject: email message sizes
In-Reply-To: <78DE72FE-631B-11D9-AD26-000A95A0560C@penguinppc.org>
References: <78DE72FE-631B-11D9-AD26-000A95A0560C@penguinppc.org>
Message-ID: <200501102015.57394.hollis@penguinppc.org>

On Monday 10 January 2005 15:22, Hollis Blanchard wrote:
> Hi all, I am one of two people who moderates these mailing lists. On
> occasion, people send large emails to these lists. I am of the opinion
> that 1MB emails should not be mass-mailed, but if you all have no
> problem with that then I will approve them.
>
> So are any of you on modems, or operate near the limits of your mail
> quotas? I'd like to hear comments either way: how large is ok to post
> to these mailing lists?

So far I have received 5 private mails indicating that 100KB is a reasonable 
maximum. If you disagree please speak up...

-Hollis


From paulus at samba.org  Tue Jan 11 08:41:48 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 11 Jan 2005 08:41:48 +1100
Subject: ioremap of pci region on pSeries LPAR vs SMP
In-Reply-To: <20050110174716.GW22274@austin.ibm.com>
References: <20050110074930.92901.qmail@web11508.mail.yahoo.com>
	<16866.18083.212727.327170@cargo.ozlabs.ibm.com>
	<20050110174716.GW22274@austin.ibm.com>
Message-ID: <16866.63132.352016.732484@cargo.ozlabs.ibm.com>

Linas Vepstas writes:

> Please note that someone removed the EEH_REGION stuff recently,
> october-ish I think.  I don't know why, I thought it was something 
> you condoned.  And so in the latest kernels, it *is* legal to directly 
> dereference the result of ioremap. 

It might work, but it's not legal on any architecture.  I thought
there was a file in the Documentation directory explaining that, but I
can't find it now.  Certainly it has been discussed on various mailing
lists in the past.  See for example:

http://uwsg.iu.edu/hypermail/linux/kernel/0007.3/0591.html

On ppc and ppc64, the ioremap return happens to be a valid effective
address, but dereferencing it directly is still not right, since if
you do that you miss out on the barriers you need to ensure that your
loads and stores hit the device in program order.
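The ordering point can be sketched in userspace. `mmio_write32()` and `mmio_read32()` below are hypothetical stand-ins for writel()/readl(), not the kernel's implementation. The volatile access stops the compiler from dropping or merging the access, and GCC's `__sync_synchronize()` (a full memory barrier) stands in for the ppc-specific barrier instructions (such as sync or eieio) that the real accessors use. The real ppc64 accessors also byte-swap, because PCI is little-endian; that part is left out here. A plain `*p = val` through the ioremap cookie gets neither the barrier nor the swap.

```c
#include <stdint.h>

/* Hypothetical writel() stand-in: barrier first, so earlier stores
 * (to the device or to ordinary memory) are ordered before this one. */
static inline void mmio_write32(volatile uint32_t *addr, uint32_t val)
{
	__sync_synchronize();
	*addr = val;
}

/* Hypothetical readl() stand-in: barrier after the load, so later
 * accesses are not performed ahead of it. */
static inline uint32_t mmio_read32(const volatile uint32_t *addr)
{
	uint32_t val = *addr;

	__sync_synchronize();
	return val;
}

/* Example: fill a data register, then ring a doorbell. The barrier in
 * the second mmio_write32() orders the doorbell store after the data
 * store. regs[0] and regs[1] are hypothetical register offsets. */
static void post_command(volatile uint32_t *regs, uint32_t cmd)
{
	mmio_write32(&regs[0], cmd);	/* data register */
	mmio_write32(&regs[1], 1);	/* doorbell register */
}
```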

> Was removing this mechanism the right thing to do?  If so, why?

It was an enormous simplification and Linus was keen to do it.  He
actually looks at our code from time to time now that his desktop
machine is a G5. :)

> It seemed like a great way to force everyone to use the 
> readb/etc macros.

Some architectures do in fact use ioremap cookie poisoning for that
reason.  We could do that as a debug option.
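A debug option of this kind could look roughly like the following sketch. The scheme is hypothetical and is not taken from any particular architecture. The mapping is XORed with a constant before it is handed out, so a direct dereference of the cookie is likely to hit a bogus address and fault, and only the accessor functions undo the XOR. In a kernel, the constant would be chosen so that a poisoned cookie always lands in an unmapped region. The value used here is arbitrary.

```c
#include <stdint.h>

/* Arbitrary illustrative poison value (hypothetical). */
#define IOREMAP_POISON ((uintptr_t)0x5a5a0000u)

/* What a debug ioremap() would return instead of the real mapping. */
static inline void *poison_cookie(void *va)
{
	return (void *)((uintptr_t)va ^ IOREMAP_POISON);
}

/* Only the accessors know how to turn the cookie back into an address. */
static inline volatile uint32_t *unpoison_cookie(const void *cookie)
{
	return (volatile uint32_t *)((uintptr_t)cookie ^ IOREMAP_POISON);
}

static inline uint32_t debug_read32(const void *cookie)
{
	return *unpoison_cookie(cookie);
}

static inline void debug_write32(uint32_t val, void *cookie)
{
	*unpoison_cookie(cookie) = val;
}
```

Drivers that go through the accessors work unchanged. A driver that writes `*(u32 *)cookie` dereferences the poisoned value and, with a suitable constant, faults immediately instead of silently missing its barriers.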

Paul.


From olof at austin.ibm.com  Tue Jan 11 09:23:40 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Mon, 10 Jan 2005 16:23:40 -0600
Subject: [PPC64] Functions to reserve performance monitor hardware
In-Reply-To: <20050110180127.GD22101@localhost.localdomain>
References: <20050110180127.GD22101@localhost.localdomain>
Message-ID: <20050110222340.GA13731@austin.ibm.com>

On Tue, Jan 11, 2005 at 05:01:27AM +1100, David Gibson wrote:

> This patch creates functions to reserve and release the performance
> monitor hardware (including its interrupt), and makes oprofile use
> them.

I don't see where you make oprofile use the functions? op_model_*
changes aren't included in the patch.

> +int reserve_pmc_hardware(perf_irq_t new_perf_irq)
> +{
> +	int err = -EBUSY;;

Keeping an extra semicolon around in case you need one in a hurry? :)

> +	spin_lock(&pmc_owner_lock);
> +
> +	if (pmc_owner_caller) {
> +		printk(KERN_WARNING "reserve_pmc_hardware: "
> +		       "PMC hardware busy (reserved by caller %p)\n",
> +		       pmc_owner_caller);
> +		goto out;
> +	}
> +
> +	pmc_owner_caller = __builtin_return_address(0);
> +	perf_irq = new_perf_irq ? : dummy_perf;
> +
> +	err = 0;

Maybe I'm the only one with such an opinion, but I find it more readable
to set the error code in the error case (if section above) instead of
defaulting to error and clearing it before returning. :)

> +	pmc_owner_caller = NULL;
> +	perf_irq = dummy_perf;
> +
> +	spin_unlock(&pmc_owner_lock);

Current oprofile code has an implicit mb(); after restoring perf_irq. I
think the implied lwsync in spin_unlock is sufficient, but I wanted to
mention it.

How do you expect the function to be used, will there really be users
reserving the hardware without registering the interrupt handler? If
there are no such users then it could be nice to reserve using the
handler instead of the return address.


-Olof


From anton at samba.org  Tue Jan 11 10:00:15 2005
From: anton at samba.org (Anton Blanchard)
Date: Tue, 11 Jan 2005 10:00:15 +1100
Subject: [patch 1/1] ppc64: semicolon in rtasd.c
In-Reply-To: <20050110195959.4D66A1F203@trashy.coderock.org>
References: <20050110195959.4D66A1F203@trashy.coderock.org>
Message-ID: <20050110230015.GB14239@krispykreme.ozlabs.ibm.com>


Nice catch!

Anton

--

From: Domen Puncer <domen at coderock.org>

semicolon in rtasd.c

Signed-off-by: Domen Puncer <domen at coderock.org>
Acked-by: Anton Blanchard <anton at samba.org>

diff -puN arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c arch/ppc64/kernel/rtasd.c
--- kj/arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c	2005-01-10 18:00:30.000000000 +0100
+++ kj-domen/arch/ppc64/kernel/rtasd.c	2005-01-10 18:00:30.000000000 +0100
@@ -486,7 +486,7 @@ static int __init rtas_init(void)
 
 	/* No RTAS, only warn if we are on a pSeries box  */
 	if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) {
-		if (systemcfg->platform & PLATFORM_PSERIES);
+		if (systemcfg->platform & PLATFORM_PSERIES)
 			printk(KERN_ERR "rtasd: no event-scan on system\n");
 		return 1;
 	}
_


From anton at samba.org  Tue Jan 11 11:08:45 2005
From: anton at samba.org (Anton Blanchard)
Date: Tue, 11 Jan 2005 11:08:45 +1100
Subject: ioremap of pci region on pSeries LPAR vs SMP
In-Reply-To: <16866.63132.352016.732484@cargo.ozlabs.ibm.com>
References: <20050110074930.92901.qmail@web11508.mail.yahoo.com>
	<16866.18083.212727.327170@cargo.ozlabs.ibm.com>
	<20050110174716.GW22274@austin.ibm.com>
	<16866.63132.352016.732484@cargo.ozlabs.ibm.com>
Message-ID: <20050111000845.GC14239@krispykreme.ozlabs.ibm.com>

 
Hi,

> It was an enormous simplification and Linus was keen to do it.  He
> actually looks at our code from time to time now that his desktop
> machine is a G5. :)

Roland (the infiniband guy) and Linus were behind it:

http://marc.theaimsgroup.com/?l=linux-kernel&m=109579598620069&w=2

Looks like it was due to __raw_* not having any EEH checks.

As a side note, it's a worry that we don't have IO macros that order but
don't byte-swap. __raw_* (which doesn't order) is going to catch out a
lot of driver writers, I suspect.
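readl() combines two separate properties: ordering, and conversion from PCI little-endian to CPU byte order. __raw_readl() provides neither. The sketch below separates the two so that an "ordered but non-swapping" read exists in its own right. All names are hypothetical. `__sync_synchronize()` is GCC's full barrier, used as a stand-in for the real ppc barriers, and the little-endian decode is written byte-by-byte so it behaves the same on any host.

```c
#include <stdint.h>
#include <string.h>

/* Ordered read, native byte order: no swap (hypothetical). */
static inline uint32_t read32_ordered_native(const volatile uint32_t *addr)
{
	uint32_t v = *addr;

	__sync_synchronize();	/* later accesses wait for this load */
	return v;
}

/* Interpret the in-memory bytes of 'raw' as a little-endian value. */
static inline uint32_t le32_to_host(uint32_t raw)
{
	uint8_t b[4];

	memcpy(b, &raw, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Ordered read with little-endian conversion, i.e. readl()-like. */
static inline uint32_t read32_ordered_le(const volatile uint32_t *addr)
{
	return le32_to_host(read32_ordered_native(addr));
}
```

For a device register holding the bytes 78 56 34 12, `read32_ordered_le()` yields 0x12345678 on any host. On big-endian ppc64, `read32_ordered_native()` yields 0x78563412, with the same ordering guarantee.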

> Some architectures do in fact use ioremap cookie poisoning for that
> reason.  We could do that as a debug option.

I've seen HPC stuff that wants to be able to mmap a PCI card's resources into
userspace. Their hack on ppc64 was to look at the high nibble of the
address and convert it to a non-EEH address if required :)
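The "high nibble" hack amounts to reading the region ID from the top four bits of the effective address and rewriting it. The region values below come from the addresses quoted in this thread: 0xE... for IO_REGION and 0xA... for the old EEH_REGION. The macro and function names are hypothetical. The sketch also assumes that an EEH token differs from the corresponding IO address only in that nibble. That assumption was not established here.

```c
#include <stdint.h>

#define SKETCH_REGION_SHIFT  60		/* top nibble selects the region */
#define SKETCH_IO_REGION_ID  0xEULL	/* 0xE000... addresses */
#define SKETCH_EEH_REGION_ID 0xAULL	/* 0xA000... addresses */

static inline unsigned int region_id(uint64_t ea)
{
	return (unsigned int)(ea >> SKETCH_REGION_SHIFT);
}

/* Replace an EEH-region nibble with the IO-region one; any other
 * address is returned unchanged. */
static inline uint64_t eeh_to_io(uint64_t ea)
{
	if (region_id(ea) != SKETCH_EEH_REGION_ID)
		return ea;
	return (ea & ~(0xFULL << SKETCH_REGION_SHIFT)) |
	       (SKETCH_IO_REGION_ID << SKETCH_REGION_SHIFT);
}
```

Under the stated assumption, the DAR from Anil's oops, 0xa000000082060010, maps to 0xe000000082060010.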

I'm not sure how best to solve the userspace mmap issue, but there are a
few groups wanting that.

Anton


From david at gibson.dropbear.id.au  Tue Jan 11 21:57:07 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 11 Jan 2005 21:57:07 +1100
Subject: [PPC64] Functions to reserve performance monitor hardware
In-Reply-To: <20050110222340.GA13731@austin.ibm.com>
References: <20050110180127.GD22101@localhost.localdomain>
	<20050110222340.GA13731@austin.ibm.com>
Message-ID: <20050111105707.GC28175@localhost.localdomain>

On Mon, Jan 10, 2005 at 04:23:40PM -0600, Olof Johansson wrote:
> On Tue, Jan 11, 2005 at 05:01:27AM +1100, David Gibson wrote:
> 
> > This patch creates functions to reserve and release the performance
> > monitor hardware (including its interrupt), and makes oprofile use
> > them.
> 
> I don't see where you make oprofile use the functions? op_model_*
> changes aren't included in the patch.

Ah, bugger.  I could have sworn I made the changes, wonder where I
managed to drop them.

> > +int reserve_pmc_hardware(perf_irq_t new_perf_irq)
> > +{
> > +	int err = -EBUSY;;
> 
> Keeping an extra semicolon around in case you need one in a hurry? :)

Oh, dear, I clearly wasn't having a good day.

> > +	spin_lock(&pmc_owner_lock);
> > +
> > +	if (pmc_owner_caller) {
> > +		printk(KERN_WARNING "reserve_pmc_hardware: "
> > +		       "PMC hardware busy (reserved by caller %p)\n",
> > +		       pmc_owner_caller);
> > +		goto out;
> > +	}
> > +
> > +	pmc_owner_caller = __builtin_return_address(0);
> > +	perf_irq = new_perf_irq ? : dummy_perf;
> > +
> > +	err = 0;
> 
> Maybe I'm the only one with such an opinion, but I find it more readable
> to set the error code in the error case (if section above) instead of
> defaulting to error and clearing it before returning. :)

Actually, I think I do too, but I've been experimenting with this
style, since it seems to be rather common in the kernel.  Anyway,
revised below.

> > +	pmc_owner_caller = NULL;
> > +	perf_irq = dummy_perf;
> > +
> > +	spin_unlock(&pmc_owner_lock);
> 
> Current oprofile code has an implicit mb(); after restoring perf_irq. I
> think the implied lwsync in spin_unlock is sufficient, but I wanted to
> mention it.

Yes, I did think about that, and figured the barrier in the
spin_unlock() should be sufficient.

> How do you expect the function to be used, will there really be users
> reserving the hardware without registering the interrupt handler? 

I think it's entirely plausible that there could be.  It would seem a
bit yucky for a user that wasn't using interrupts to have to define
their own copy of the dummy_perf() routine.

> If
> there are no such users then it could be nice to reserve using the
> handler instead of the return address.

Well, bear in mind that from the semantics point of view it's only the
non-nullness of the return address that matters, so essentially it's
just a flag.  The rest of the return address is just there for
debugging convenience.
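The reservation protocol discussed above can be tried out in userspace. It consists of a lock, an owner pointer whose non-NULL-ness is the only thing that matters, a handler that falls back to a dummy, and the error code set in the error path. This is an analogue, not the patch itself. A pthread mutex stands in for the spinlock, the caller passes an explicit non-NULL tag instead of `__builtin_return_address(0)`, and all names are hypothetical. It also spells out the GNU `a ?: b` extension as the portable `a ? a : b`.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

typedef void (*handler_t)(int);

static void dummy_handler(int unused) { (void)unused; }

static pthread_mutex_t owner_lock = PTHREAD_MUTEX_INITIALIZER;
static const void *owner;		/* non-NULL means "reserved" */
static handler_t current_handler = dummy_handler;

/* 'caller' must be non-NULL; it only records who holds the hardware. */
static int reserve_hw(const void *caller, handler_t h)
{
	int err = 0;

	pthread_mutex_lock(&owner_lock);
	if (owner) {
		err = -EBUSY;		/* error code set in the error path */
		goto out;
	}
	owner = caller;
	current_handler = h ? h : dummy_handler;	/* GNU: h ?: dummy_handler */
out:
	pthread_mutex_unlock(&owner_lock);	/* unlock also orders the stores */
	return err;
}

static void release_hw(void)
{
	pthread_mutex_lock(&owner_lock);
	owner = NULL;
	current_handler = dummy_handler;
	pthread_mutex_unlock(&owner_lock);
}
```

A second reservation fails with `-EBUSY` until `release_hw()` runs. Passing a NULL handler installs the dummy, which covers the case of a user that does not want interrupts.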

Anyway, patch with the abovementioned stupidities removed follows.
Andrew, please apply:

The PPC64 interrupt code includes a hook to call when an exception
from the performance monitor unit occurs.  However, there's no way of
reserving the hook properly, so if more than one bit of code tries to
use it things will get ugly.  Currently oprofile is the only user, but
there are likely to be more in future e.g. perfctr, if and when it
reaches a fit state for merging.

This patch creates functions to reserve and release the performance
monitor hardware (including its interrupt), and makes oprofile use
them.  It also creates a new arch/ppc64/kernel/pmc.c, in which we can
put any future helper functions for handling the performance monitor
counters.

Signed-off-by: David Gibson <dwg at au1.ibm.com>

Index: working-2.6/arch/ppc64/kernel/pmc.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/arch/ppc64/kernel/pmc.c	2005-01-11 10:37:52.001422584 +1100
@@ -0,0 +1,64 @@
+/*
+ *  linux/arch/ppc64/kernel/pmc.c
+ *
+ *  Copyright (C) 2004 David Gibson, IBM Corporation.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/config.h>
+#include <linux/errno.h>
+#include <linux/spinlock.h>
+
+#include <asm/processor.h>
+#include <asm/pmc.h>
+
+/* Ensure exceptions are disabled */
+static void dummy_perf(struct pt_regs *regs)
+{
+	unsigned int mmcr0 = mfspr(SPRN_MMCR0);
+
+	mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO);
+	mtspr(SPRN_MMCR0, mmcr0);
+}
+
+static spinlock_t pmc_owner_lock = SPIN_LOCK_UNLOCKED;
+static void *pmc_owner_caller; /* mostly for debugging */
+perf_irq_t perf_irq = dummy_perf;
+
+int reserve_pmc_hardware(perf_irq_t new_perf_irq)
+{
+	int err = 0;
+
+	spin_lock(&pmc_owner_lock);
+
+	if (pmc_owner_caller) {
+		printk(KERN_WARNING "reserve_pmc_hardware: "
+		       "PMC hardware busy (reserved by caller %p)\n",
+		       pmc_owner_caller);
+		err = -EBUSY;
+		goto out;
+	}
+
+	pmc_owner_caller = __builtin_return_address(0);
+	perf_irq = new_perf_irq ? : dummy_perf;
+
+ out:
+	spin_unlock(&pmc_owner_lock);
+	return err;
+}
+
+void release_pmc_hardware(void)
+{
+	spin_lock(&pmc_owner_lock);
+
+	WARN_ON(! pmc_owner_caller);
+
+	pmc_owner_caller = NULL;
+	perf_irq = dummy_perf;
+
+	spin_unlock(&pmc_owner_lock);
+}
Index: working-2.6/arch/ppc64/kernel/traps.c
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/traps.c	2005-01-11 10:36:44.555424864 +1100
+++ working-2.6/arch/ppc64/kernel/traps.c	2005-01-11 10:36:46.969324088 +1100
@@ -40,6 +40,7 @@
 #include <asm/rtas.h>
 #include <asm/systemcfg.h>
 #include <asm/machdep.h>
+#include <asm/pmc.h>
 
 #ifdef CONFIG_DEBUGGER
 int (*__debugger)(struct pt_regs *regs);
@@ -449,18 +450,7 @@
 	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
 }
 
-/* Ensure exceptions are disabled */
-static void dummy_perf(struct pt_regs *regs)
-{
-	unsigned int mmcr0 = mfspr(SPRN_MMCR0);
-
-	mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO);
-	mtspr(SPRN_MMCR0, mmcr0);
-}
-
-void (*perf_irq)(struct pt_regs *) = dummy_perf;
-
-EXPORT_SYMBOL(perf_irq);
+extern perf_irq_t perf_irq;
 
 void performance_monitor_exception(struct pt_regs *regs)
 {
Index: working-2.6/include/asm-ppc64/pmc.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/include/asm-ppc64/pmc.h	2005-01-11 10:36:46.970323936 +1100
@@ -0,0 +1,29 @@
+/*
+ * pmc.h
+ * Copyright (C) 2004  David Gibson, IBM Corporation
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ * 
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ * 
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+#ifndef _PPC64_PMC_H
+#define _PPC64_PMC_H
+
+#include <asm/ptrace.h>
+
+typedef void (*perf_irq_t)(struct pt_regs *);
+
+int reserve_pmc_hardware(perf_irq_t new_perf_irq);
+void release_pmc_hardware(void);
+
+#endif /* _PPC64_PMC_H */
Index: working-2.6/arch/ppc64/kernel/Makefile
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/Makefile	2005-01-11 10:36:44.555424864 +1100
+++ working-2.6/arch/ppc64/kernel/Makefile	2005-01-11 10:36:46.970323936 +1100
@@ -11,7 +11,7 @@
 			udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \
 			ptrace32.o signal32.o rtc.o init_task.o \
 			lmb.o cputable.o cpu_setup_power4.o idle_power4.o \
-			iommu.o sysfs.o
+			iommu.o sysfs.o pmc.o
 
 obj-$(CONFIG_PPC_OF) +=	of_device.o
 
Index: working-2.6/arch/ppc64/oprofile/common.c
===================================================================
--- working-2.6.orig/arch/ppc64/oprofile/common.c	2005-01-06 10:47:48.000000000 +1100
+++ working-2.6/arch/ppc64/oprofile/common.c	2005-01-11 10:42:26.788317488 +1100
@@ -15,6 +15,7 @@
 #include <linux/errno.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
+#include <asm/pmc.h>
 
 #include "op_impl.h"
 
@@ -22,9 +23,6 @@
 extern struct op_ppc64_model op_model_power4;
 static struct op_ppc64_model *model;
 
-extern void (*perf_irq)(struct pt_regs *);
-static void (*save_perf_irq)(struct pt_regs *);
-
 static struct op_counter_config ctr[OP_MAX_COUNTER];
 static struct op_system_config sys;
 
@@ -35,11 +33,12 @@
 
 static int op_ppc64_setup(void)
 {
-	/* Install our interrupt handler into the existing hook.  */
-	save_perf_irq = perf_irq;
-	perf_irq = op_handle_interrupt;
+	int err;
 
-	mb();
+	/* Grab the hardware */
+	err = reserve_pmc_hardware(op_handle_interrupt);
+	if (err)
+		return err;
 
 	/* Pre-compute the values to stuff in the hardware registers.  */
 	model->reg_setup(ctr, &sys, model->num_counters);
@@ -52,10 +51,7 @@
 
 static void op_ppc64_shutdown(void)
 {
-	mb();
-
-	/* Remove our interrupt handler. We may be removing this module. */
-	perf_irq = save_perf_irq;
+	release_pmc_hardware();
 }
 
 static void op_ppc64_cpu_start(void *dummy)


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist.  NOT _the_ _other_ _way_
				| _around_!
http://www.ozlabs.org/people/dgibson


From michael at ellerman.id.au  Tue Jan 11 19:43:57 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Tue, 11 Jan 2005 19:43:57 +1100
Subject: [PATCH 2/2] ppc64: Fix iseries_veth module unload race and memory
	leak
Message-ID: <20050111084358.0ABD917DF7@ozlabs.au.ibm.com>

Hi All,

When the iseries_veth driver module is unloaded there is the potential for an
oops and also some memory leakage.

Because the HvLpEvent_unregisterHandler() function did no synchronisation, it
was possible for the handler that was being unregistered to be running on another
CPU *after* HvLpEvent_unregisterHandler() had returned. This could cause the
iseries_veth driver to leave work in the events work queue after the module
had been unloaded. When that work was eventually executed we got an oops.

In addition some of the data structures in the iseries_veth driver were not
being correctly freed when the module was unloaded.

This is the second patch. It makes iseries_veth call flush_scheduled_work()
after we are sure the handler is no longer running, and also fixes the memory leaks.
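
The teardown ordering described above can be modelled in user space. The following is a minimal, single-threaded sketch, not the driver's code: `struct fake_cnx`, `fake_event()`, `flush_work()`, `MAX_LPS` and the other names are hypothetical stand-ins for the connection structure, the hypervisor callback, `flush_scheduled_work()` and `HVMAXARCHITECTEDLPS`. It shows why the connections may only be freed once the handler is unregistered and all queued work has run: each queued item still points at a connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_LPS  4   /* hypothetical stand-in for HVMAXARCHITECTEDLPS */
#define MAX_WORK 16

/* Hypothetical per-partition connection, loosely modelled on
 * struct veth_lpar_connection. */
struct fake_cnx {
	int *msgs;
	int events_handled;
};

static struct fake_cnx *cnx_table[MAX_LPS];
static struct fake_cnx *work_queue[MAX_WORK];  /* queued work items */
static int work_len;
static bool handler_registered = true;

static bool setup_connection(int rlp)
{
	struct fake_cnx *cnx = calloc(1, sizeof(*cnx));

	if (!cnx)
		return false;
	cnx->msgs = calloc(8, sizeof(*cnx->msgs));
	cnx_table[rlp] = cnx;
	return true;
}

/* Hypervisor callback: queues work that will dereference the connection. */
static void fake_event(int rlp)
{
	if (handler_registered && cnx_table[rlp] && work_len < MAX_WORK)
		work_queue[work_len++] = cnx_table[rlp];
}

/* Stand-in for flush_scheduled_work(): run everything still queued. */
static void flush_work(void)
{
	for (int i = 0; i < work_len; i++)
		work_queue[i]->events_handled++;  /* use-after-free if cnx were already freed */
	work_len = 0;
}

static void destroy_connection(int rlp)
{
	struct fake_cnx *cnx;

	assert(rlp >= 0 && rlp < MAX_LPS);
	cnx = cnx_table[rlp];
	if (!cnx)
		return;
	free(cnx->msgs);  /* free(NULL) is a no-op, as is kfree(NULL) */
	free(cnx);
	cnx_table[rlp] = NULL;
}

/* Teardown order: stop, unregister (and wait), flush, then free. */
static void module_cleanup(void)
{
	/* stopping each connection would happen here */
	handler_registered = false;  /* unregister: no new work after this */
	flush_work();                /* old work runs while cnx is still valid */
	for (int i = 0; i < MAX_LPS; i++)
		destroy_connection(i);
}
```

Swapping the last two steps in `module_cleanup()` reproduces the bug class the patch fixes: `flush_work()` would then touch freed connections.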


 iseries_veth.c |   26 ++++++++++++++++++++++----
 1 files changed, 22 insertions(+), 4 deletions(-)


Signed-off-by: Michael Ellerman <michael at ellerman.id.au>

diff -urN 2.6.10-ppc64-stock/drivers/net/iseries_veth.c 2.6.10-ppc64-work/drivers/net/iseries_veth.c
--- 2.6.10-ppc64-stock/drivers/net/iseries_veth.c	2004-12-25 10:14:43.000000000 +1100
+++ 2.6.10-ppc64-work/drivers/net/iseries_veth.c	2005-01-11 18:40:21.811722242 +1100
@@ -642,7 +642,7 @@
 	return 0;
 }
 
-static void veth_destroy_connection(u8 rlp)
+static void veth_stop_connection(u8 rlp)
 {
 	struct veth_lpar_connection *cnx = veth_cnx[rlp];
 
@@ -671,9 +671,18 @@
 				      HvLpEvent_Type_VirtualLan,
 				      cnx->num_ack_events,
 				      NULL, NULL);
+}
+
+static void veth_destroy_connection(u8 rlp)
+{
+	struct veth_lpar_connection *cnx = veth_cnx[rlp];
 
-	if (cnx->msgs)
-		kfree(cnx->msgs);
+	if (! cnx)
+		return;
+
+	kfree(cnx->msgs);
+	kfree(cnx);
+	veth_cnx[rlp] = NULL;
 }
 
 /*
@@ -1375,9 +1384,18 @@
 	vio_unregister_driver(&veth_driver);
 
 	for (i = 0; i < HVMAXARCHITECTEDLPS; ++i)
-		veth_destroy_connection(i);
+		veth_stop_connection(i);
 
 	HvLpEvent_unregisterHandler(HvLpEvent_Type_VirtualLan);
+
+	/* Hypervisor callbacks may have scheduled more work while we
+	 * were destroying connections. Now that we've disconnected from
+	 * the hypervisor make sure everything's finished. */
+	flush_scheduled_work();
+
+	for (i = 0; i < HVMAXARCHITECTEDLPS; ++i)
+		veth_destroy_connection(i);
+	
 }
 module_exit(veth_module_cleanup);
 

From michael at ellerman.id.au  Tue Jan 11 19:43:57 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Tue, 11 Jan 2005 19:43:57 +1100
Subject: [PATCH 1/2] ppc64: Fix iseries_veth module unload race and memory
	leak
Message-ID: <20050111084357.7C3F317DDF@ozlabs.au.ibm.com>

Hi All,

When the iseries_veth driver module is unloaded there is the potential for an
oops and also some memory leakage.

Because the HvLpEvent_unregisterHandler() function did no synchronisation, it
was possible for the handler that was being unregistered to be running on another
CPU *after* HvLpEvent_unregisterHandler() had returned. This could cause the
iseries_veth driver to leave work in the events work queue after the module
had been unloaded. When that work was eventually executed we got an oops.

In addition some of the data structures in the iseries_veth driver were not
being correctly freed when the module was unloaded.

This is the first patch, which makes HvLpEvent_unregisterHandler() work.
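
A user-space analogue of the guarantee this patch adds might look like the sketch below. It is illustrative only and swaps in a different technique: instead of `synchronize_kernel()` (which waits for every CPU to pass through the scheduler), it counts dispatchers currently inside a handler with C11 atomics and waits for that count to drain. `dispatch()`, `register_handler()`, `unregister_handler()`, `record_event()` and `NUM_TYPES` are hypothetical names.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_TYPES 4  /* hypothetical stand-in for HvLpEvent_Type_NumTypes */

typedef void (*event_handler_t)(int event);

static _Atomic(event_handler_t) handlers[NUM_TYPES];
static atomic_int in_flight;  /* dispatchers currently inside a handler */

/* Example handler used by the demo: remembers the last event it saw. */
static int last_event = -1;
static void record_event(int event)
{
	last_event = event;
}

/* Dispatch path: announce ourselves before loading the handler pointer. */
static int dispatch(int type, int event)
{
	event_handler_t h;
	int rc = -1;

	assert(type >= 0 && type < NUM_TYPES);
	atomic_fetch_add(&in_flight, 1);
	h = atomic_load(&handlers[type]);
	if (h) {
		h(event);
		rc = 0;
	}
	atomic_fetch_sub(&in_flight, 1);
	return rc;
}

static int register_handler(int type, event_handler_t h)
{
	if (type < 0 || type >= NUM_TYPES)
		return 1;
	atomic_store(&handlers[type], h);
	return 0;
}

/* Clear the pointer, then wait until no dispatcher can still be running
 * the old handler. Returns 0 on success, 1 for a bad type. */
static int unregister_handler(int type)
{
	if (type < 0 || type >= NUM_TYPES)
		return 1;
	atomic_store(&handlers[type], NULL);
	while (atomic_load(&in_flight) != 0)
		sched_yield();  /* crude stand-in for synchronize_kernel() */
	return 0;
}
```

Because the counter is global, an unregister could in principle be held off by a steady stream of unrelated events; a grace-period mechanism like the kernel's avoids that by waiting only for each CPU to pass a quiescent point once.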

 arch/ppc64/kernel/HvLpEvent.c         |    8 ++++++++
 include/asm-ppc64/iSeries/HvLpEvent.h |    3 +++
 2 files changed, 11 insertions(+)


Signed-off-by: Michael Ellerman <michael at ellerman.id.au>

diff -urN 2.6.10-ppc64-stock/arch/ppc64/kernel/HvLpEvent.c 2.6.10-ppc64-work/arch/ppc64/kernel/HvLpEvent.c
--- 2.6.10-ppc64-stock/arch/ppc64/kernel/HvLpEvent.c	2004-06-16 17:12:51.000000000 +1000
+++ 2.6.10-ppc64-work/arch/ppc64/kernel/HvLpEvent.c	2005-01-10 16:13:33.381994263 +1100
@@ -34,10 +34,18 @@
 int HvLpEvent_unregisterHandler( HvLpEvent_Type eventType )
 {
 	int rc = 1;
+
+	might_sleep();
+
 	if ( eventType < HvLpEvent_Type_NumTypes ) {
 		if ( !lpEventHandlerPaths[eventType] ) {
 			lpEventHandler[eventType] = NULL;
 			rc = 0;
+
+			/* We now sleep until all other CPUs have scheduled. This ensures that
+			 * the deletion is seen by all other CPUs, and that the deleted handler
+			 * isn't still running on another CPU when we return. */
+			synchronize_kernel();
 		}
 	}
 	return rc;
diff -urN 2.6.10-ppc64-stock/include/asm-ppc64/iSeries/HvLpEvent.h 2.6.10-ppc64-work/include/asm-ppc64/iSeries/HvLpEvent.h
--- 2.6.10-ppc64-stock/include/asm-ppc64/iSeries/HvLpEvent.h	2004-02-04 14:44:05.000000000 +1100
+++ 2.6.10-ppc64-work/include/asm-ppc64/iSeries/HvLpEvent.h	2005-01-10 16:11:18.899255131 +1100
@@ -75,6 +75,9 @@
 extern int HvLpEvent_registerHandler( HvLpEvent_Type eventType, LpEventHandler hdlr);
 
 // Unregister a handler for an event type
+//  This call will sleep until the handler being removed is guaranteed to
+//  be no longer executing on any CPU. Do not call with locks held.
+//
 //  returns 0 on success
 //  Unregister will fail if there are any paths open for the type
 extern int HvLpEvent_unregisterHandler( HvLpEvent_Type eventType );


From clark at esteem.com  Wed Jan 12 03:19:53 2005
From: clark at esteem.com (Conn Clark)
Date: Tue, 11 Jan 2005 08:19:53 -0800
Subject: email message sizes
In-Reply-To: <200501102015.57394.hollis@penguinppc.org>
References: <78DE72FE-631B-11D9-AD26-000A95A0560C@penguinppc.org>
	<200501102015.57394.hollis@penguinppc.org>
Message-ID: <41E3FCA9.1060705@esteem.com>

Hollis Blanchard wrote:
> On Monday 10 January 2005 15:22, Hollis Blanchard wrote:
> 
>>Hi all, I am one of two people who moderates these mailing lists. On
>>occasion, people send large emails to these lists. I am of the opinion
>>that 1MB emails should not be mass-mailed, but if you all have no
>>problem with that then I will approve them.
>>
>>So are any of you on modems, or operate near the limits of your mail
>>quotas? I'd like to hear comments either way: how large is ok to post
>>to these mailing lists?
> 
> 
> So far I have received 5 private mails indicating that 100KB is a reasonable 
> maximum. If you disagree please speak up...
> 
> -Hollis

I say 101K because I think it should be 100k and I know I will want to 
send something just over the limit.


-- Conn Clark

*****************************************************************
Give a man a match and you heat him for a moment. Set him on fire
and you'll heat him for life.
*****************************************************************

Conn Clark
Engineering Stooge				clark at esteem.com
Electronic Systems Technology Inc.		www.esteem.com

Stock Ticker Symbol				ELST


From linas at austin.ibm.com  Wed Jan 12 09:17:23 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Tue, 11 Jan 2005 16:17:23 -0600
Subject: ioremap of pci region on pSeries LPAR vs SMP
In-Reply-To: <20050111000845.GC14239@krispykreme.ozlabs.ibm.com>
References: <20050110074930.92901.qmail@web11508.mail.yahoo.com>
	<16866.18083.212727.327170@cargo.ozlabs.ibm.com>
	<20050110174716.GW22274@austin.ibm.com>
	<16866.63132.352016.732484@cargo.ozlabs.ibm.com>
	<20050111000845.GC14239@krispykreme.ozlabs.ibm.com>
Message-ID: <20050111221723.GE23690@austin.ibm.com>

On Tue, Jan 11, 2005 at 11:08:45AM +1100, Anton Blanchard was heard to remark:
> 
> I've seen HPC stuff that wants to be able to mmap a PCI card's resources into
> userspace. Their hack on ppc64 was to look at the high nibble of the
> address and convert it to a non-EEH address if required :)
> 
> I'm not sure how best to solve the userspace mmap issue but there are a
> few groups wanting that.

Somewhat off-topic ... but ...

1) If you design your hardware correctly, there are some amazing things
   you can do (performance-wise) by mmapping PCI card resources into user
   space.  If your hardware is done right, then user corruption can't
   hurt the system. This was the de facto method for getting
   high-performance graphics on IBM RS/6000, SGI, HP and Sun workstations
   many moons ago.

2) There is interest in the virtual I/O community in mmapping
   funky stuff to userspace, but that conversation may be for a
   different day.  The question is (for example) how to build
   a high-performance virtual SCSI server in userspace (without
   kernel pieces), which is a design point some people like.
   Later...

--linas


From linas at austin.ibm.com  Wed Jan 12 09:27:08 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Tue, 11 Jan 2005 16:27:08 -0600
Subject: [PATCH] PPC64: Trivial Cleanup:  EEH_REGION
Message-ID: <20050111222708.GF23690@austin.ibm.com>


Hi Paul, 

Please forward upstream if you agree.

This is a dumb, dorky cleanup patch:
Per the last round of emails, the concept of EEH_REGION is gone,
but a few stubs remained.  This patch removes them.

Note there is some funny business in the SLB code that 
I did not understand, and so I left that alone.  
I'm guessing that it should be cut out as well.

Signed-off-by: Linas Vepstas <linas at linas.org>

--linas


-------------- next part --------------
===== arch/ppc64/mm/hash_utils.c 1.55 vs edited =====
--- 1.55/arch/ppc64/mm/hash_utils.c	2004-10-28 02:39:49 -05:00
+++ edited/arch/ppc64/mm/hash_utils.c	2005-01-10 16:58:40 -06:00
@@ -295,12 +295,6 @@ int hash_page(unsigned long ea, unsigned
 		vsid = get_kernel_vsid(ea);
 		break;
 #if 0
-	case EEH_REGION_ID:
-		/*
-		 * Should only be hit if there is an access to MMIO space
-		 * which is protected by EEH.
-		 * Send the problem up to do_page_fault 
-		 */
 	case KERNEL_REGION_ID:
 		/*
 		 * Should never get here - entire 0xC0... region is bolted.
===== arch/ppc64/mm/slb.c 1.3 vs edited =====
--- 1.3/arch/ppc64/mm/slb.c	2004-09-03 04:08:16 -05:00
+++ edited/arch/ppc64/mm/slb.c	2005-01-10 17:03:36 -06:00
@@ -75,6 +75,8 @@ static void slb_flush_and_rebolt(void)
 		     : "memory");
 }
 
+#define EEHREGIONBASE   ASM_CONST(0xA000000000000000)
+
 /* Flush all user entries from the segment table of the current processor. */
 void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 {
===== include/asm-ppc64/page.h 1.36 vs edited =====
--- 1.36/include/asm-ppc64/page.h	2004-10-28 02:39:49 -05:00
+++ edited/include/asm-ppc64/page.h	2005-01-10 16:59:50 -06:00
@@ -203,10 +203,8 @@ extern int page_is_ram(unsigned long pfn
 #define KERNELBASE      PAGE_OFFSET
 #define VMALLOCBASE     ASM_CONST(0xD000000000000000)
 #define IOREGIONBASE    ASM_CONST(0xE000000000000000)
-#define EEHREGIONBASE   ASM_CONST(0xA000000000000000)
 
 #define IO_REGION_ID       (IOREGIONBASE>>REGION_SHIFT)
-#define EEH_REGION_ID      (EEHREGIONBASE>>REGION_SHIFT)
 #define VMALLOC_REGION_ID  (VMALLOCBASE>>REGION_SHIFT)
 #define KERNEL_REGION_ID   (KERNELBASE>>REGION_SHIFT)
 #define USER_REGION_ID     (0UL)

From david at gibson.dropbear.id.au  Wed Jan 12 11:18:35 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Wed, 12 Jan 2005 11:18:35 +1100
Subject: [PATCH] PPC64: Trivial Cleanup:  EEH_REGION
In-Reply-To: <20050111222708.GF23690@austin.ibm.com>
References: <20050111222708.GF23690@austin.ibm.com>
Message-ID: <20050112001835.GA12816@localhost.localdomain>

On Tue, Jan 11, 2005 at 04:27:08PM -0600, Linas Vepstas wrote:
> 
> Hi Paul, 
> 
> Please forward upstream if you agree.
> 
> This is a dumb, dorky cleanup patch:
> Per the last round of emails, the concept of EEH_REGION is gone,
> but a few stubs remained.  This patch removes them.
> 
> Note there is some funny business in the SLB code that 
> I did not understand, and so I left that alone.  
> I'm guessing that it should be cut out as well.

Yes and no.  The code that's there needs to stay - it's a workaround
for a POWER5 hardware bug - but it doesn't have any real connection to
EEH.  The only reason we use EEHREGIONBASE is that it's a segment
address which will never have anything real mapped into it.
0xFFFFFFFFF0000000 would do just as well.
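
To illustrate the point, the sketch below computes region IDs from the base constants in the page.h hunk above. It assumes `REGION_SHIFT` is 60 (the top four address bits select the region) and that `KERNELBASE` (`PAGE_OFFSET`) is 0xC000000000000000; both are assumptions based on ppc64 headers of this period rather than something stated in this thread, and `region_in_use()` is a hypothetical helper. Under those assumptions, both 0xA000000000000000 and 0xFFFFFFFFF0000000 fall in regions (0xA and 0xF) that never have anything mapped.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the top four bits of an effective address select the
 * region, i.e. REGION_SHIFT is 60. */
#define REGION_SHIFT 60

/* Region bases; KERNELBASE (PAGE_OFFSET) is assumed to be 0xC000...0. */
#define KERNELBASE    UINT64_C(0xC000000000000000)
#define VMALLOCBASE   UINT64_C(0xD000000000000000)
#define IOREGIONBASE  UINT64_C(0xE000000000000000)

static_assert((KERNELBASE >> REGION_SHIFT) == 0xC, "kernel region is 0xC");

static unsigned region_id(uint64_t ea)
{
	return (unsigned)(ea >> REGION_SHIFT);
}

/* Hypothetical helper: is this address in a region Linux actually maps?
 * Region 0 is user space (USER_REGION_ID). */
static int region_in_use(uint64_t ea)
{
	unsigned id = region_id(ea);

	return id == 0 ||
	       id == region_id(KERNELBASE) ||
	       id == region_id(VMALLOCBASE) ||
	       id == region_id(IOREGIONBASE);
}
```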

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist.  NOT _the_ _other_ _way_
				| _around_!
http://www.ozlabs.org/people/dgibson


From ahuja at austin.ibm.com  Wed Jan 12 12:08:13 2005
From: ahuja at austin.ibm.com (Manish Ahuja)
Date: Tue, 11 Jan 2005 19:08:13 -0600
Subject: Collect real process and processor utilization values when
 virtualization is enabled.
Message-ID: <41E4787D.90309@austin.ibm.com>

There is a requirement to collect real usage values of each partition in
an LPAR environment on pSeries as well as iSeries.

This patch enables that feature. The current PURR (Processor Utilization
of Resources Register) value of each processor is stored in a per_cpu
data array. These values are then summed and used to calculate various
numbers for managing LPARs.

The patch also calculates how much real CPU time each process uses and
stores this value in a ppc64-specific struct. The value is needed by CKRM
to do further calculations.


Signed-off-by: Manish Ahuja <ahuja at austin.ibm.com>


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050111/b931e9b9/attachment.txt 

From paulus at samba.org  Wed Jan 12 13:36:56 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 12 Jan 2005 13:36:56 +1100
Subject: Collect real process and processor utilization values when
 virtualization is enabled.
In-Reply-To: <41E4787D.90309@austin.ibm.com>
References: <41E4787D.90309@austin.ibm.com>
Message-ID: <16868.36168.772082.315933@cargo.ozlabs.ibm.com>

Manish Ahuja writes:

> This patch enables that feature. The current purr (processor Utilization
> register ) values of each of the processors is stored in a per_cpu data
> array. this is then summed and used to calculate various numbers for
> managing lpars.

Don't you also need to update purr_data_array in timer_interrupt as
well?  You seem to be doing that only on context switch, which won't
be updated in a timely fashion necessarily (think of a compute-bound
task on a lightly-loaded machine).

> +	for_each_cpu(cpu){
> +		cus = &per_cpu(purr_data_array, cpu);
> +		sum_purr += cus->current_purr;
> +		}

The spacing is wrong here, it should be "for_each_cpu(cpu) {" and the
"}" should be one tab to the left of where it is.

> +/* Used to store Processor Utilization register (purr) values */
> +DECLARE_PER_CPU(struct purr_data, purr_data_array);
> +
> +struct purr_data {
> +        u64 current_purr;  /* Holds the current purr register values */
> +};

Do we really need a struct to store one thing?  Are there other things
you plan to add later?

Paul.
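
For reference, here is the summing loop in the layout Paul asks for ("for_each_cpu(cpu) {" with the closing brace aligned to the loop), as a user-space sketch. `NR_CPUS`, the array standing in for the per-CPU `purr_data_array`, the `for_each_cpu` macro defined here, and `record_purr()` are hypothetical; the kernel's own `for_each_cpu()` walks the possible-CPU map rather than a fixed range.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4  /* hypothetical */

/* Hypothetical stand-in for the kernel iterator. */
#define for_each_cpu(cpu) for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++)

struct purr_data {
	uint64_t current_purr;  /* last PURR value recorded for this CPU */
};

/* Stand-in for the per-CPU purr_data_array. */
static struct purr_data purr_data_array[NR_CPUS];

/* Hypothetical setter used by the demo. */
static void record_purr(int cpu, uint64_t value)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	purr_data_array[cpu].current_purr = value;
}

static uint64_t sum_purr(void)
{
	struct purr_data *cus;
	uint64_t sum = 0;
	int cpu;

	for_each_cpu(cpu) {
		cus = &purr_data_array[cpu];
		sum += cus->current_purr;
	}
	return sum;
}
```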


From akpm at osdl.org  Wed Jan 12 14:51:27 2005
From: akpm at osdl.org (Andrew Morton)
Date: Tue, 11 Jan 2005 19:51:27 -0800
Subject: Collect real process and processor utilization values when
 virtualization is enabled.
In-Reply-To: <41E4787D.90309@austin.ibm.com>
References: <41E4787D.90309@austin.ibm.com>
Message-ID: <20050111195127.23300721.akpm@osdl.org>

Manish Ahuja <ahuja at austin.ibm.com> wrote:
>
> There is a requirement to collect real usage values of each partition in
> LPAR environment on pseries as well as iseries.

What (if any) relationship does this have to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-introduce-cputime.patch ?


From olof at austin.ibm.com  Wed Jan 12 15:06:28 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Tue, 11 Jan 2005 22:06:28 -0600
Subject: Collect real process and processor utilization values when
	virtualization is enabled.
In-Reply-To: <41E4787D.90309@austin.ibm.com>
References: <41E4787D.90309@austin.ibm.com>
Message-ID: <20050112040628.GA13221@austin.ibm.com>

Hi Manish,

On Tue, Jan 11, 2005 at 07:08:13PM -0600, Manish Ahuja wrote:

> The patch also calculates how much real cpu time each process uses and 
> stores this value in a ppc64 specific struct. 

I was going to ask a couple of questions about this and noticed Andrew
Morton's reply pointing at cputime. That answered most of them (how
other archs might be doing it).

> The value is needed by CKRM to do further calculations.

How will CKRM use this? Does it have architecture-specific code to
dig this out of the thread_struct again? Could they use the cputime
interface if we hooked into that instead?

Finally: there are two ways to read PURR on our platform. One is to read
the SPR value, the other to get it from the hypervisor via the H_PURR
call. Do they measure the same thing and stay consistent?


-Olof


From scheel at vnet.ibm.com  Wed Jan 12 19:56:53 2005
From: scheel at vnet.ibm.com (Jeff Scheel)
Date: Wed, 12 Jan 2005 08:56:53 +0000
Subject: Collect real process and processor utilization values when
	virtualization is enabled.
In-Reply-To: <16868.36168.772082.315933@cargo.ozlabs.ibm.com>
References: <41E4787D.90309@austin.ibm.com>
	<16868.36168.772082.315933@cargo.ozlabs.ibm.com>
Message-ID: <1105520213.25534.17.camel@sheepdog.rchland.ibm.com>

On Wed, 2005-01-12 at 02:36, Paul Mackerras wrote:
> Don't you also need to update purr_data_array in timer_interrupt as
> well?  You seem to be doing that only on context switch, which won't
> be updated in a timely fashion necessarily (think of a compute-bound
> task on a lightly-loaded machine).
I agree it doesn't sound like only collecting this data at context
switch does the trick.  If we hook the timer (say in the decr path),
then you need not have the code in context switch.  That is, until you
implement a tickless timer and the decrementer goes away.

> Do we really need a struct to store one thing?  Are there other things
> you plan to add later?
It seems to me that if we tucked the last PURR aside on each decrementer
tick, we could simply let the kernel tasks which need this information
retrieve it as long as we can guarantee atomic update of the values from
the interrupt level.  Then, interfaces like /proc/ppc64/lparcfg can do
summing and other interfaces can use only the processor value if that's
all they need.  Given this, I'd vote for sticking the last PURR value
for a processor in some per-processor structure that exists today, like
the Paca.  Thoughts?
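
Jeff's suggestion could be sketched as follows, assuming a hypothetical `purr_tick()` hook called from the decrementer interrupt with the value just read from the PURR. It keeps one plain 64-bit slot per CPU (no wrapper struct) and uses C11 atomics to express the untorn update and read; on ppc64 an aligned 64-bit load or store is already single-copy atomic, so the kernel would not need more than that. `NR_CPUS` and all function names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 4  /* hypothetical */

/* One 64-bit slot per CPU; in the kernel this could live in an existing
 * per-processor structure such as the paca. */
static _Atomic uint64_t last_purr[NR_CPUS];

/* Hypothetical hook run from the timer (decrementer) interrupt on 'cpu';
 * 'purr' stands in for the value read from the PURR SPR. */
static void purr_tick(int cpu, uint64_t purr)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_store_explicit(&last_purr[cpu], purr, memory_order_relaxed);
}

/* Per-processor reader, for consumers that need only one CPU's value. */
static uint64_t purr_of(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return atomic_load_explicit(&last_purr[cpu], memory_order_relaxed);
}

/* Summing reader, e.g. for an lparcfg-style report. Each slot is read
 * untorn, but the total is not one instantaneous snapshot. */
static uint64_t purr_sum(void)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += purr_of(cpu);
	return sum;
}
```

Keeping the interrupt-side work to a single store keeps the tick path cheap; any summing cost falls on the (rare) readers.
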
-- 
Jeff Scheel (scheel at vnet.ibm.com)


From scheel at vnet.ibm.com  Wed Jan 12 19:47:23 2005
From: scheel at vnet.ibm.com (Jeff Scheel)
Date: Wed, 12 Jan 2005 08:47:23 +0000
Subject: Collect real process and processor utilization values when
	virtualization is enabled.
In-Reply-To: <20050112040628.GA13221@austin.ibm.com>
References: <41E4787D.90309@austin.ibm.com>
	<20050112040628.GA13221@austin.ibm.com>
Message-ID: <1105519643.25534.7.camel@sheepdog.rchland.ibm.com>

On Wed, 2005-01-12 at 04:06, Olof Johansson wrote:
> > The value is needed by CKRM to do further calculations.
> 
> How will CKRM use this? Does it have architecture-specific code to
> dig this out of the thread_struct again? Could they use the cputime
> interface if we hooked into that instead?
The only interface which exists today to retrieve purr is
/proc/ppc64/lparcfg, which provides PURR summed across all processors.
We are working on other means to retrieve more specific data in the near
future.

> Finally: there are two ways to read PURR on our platform. One is to read
> the SPR value, the other to get it from the hypervisor via the H_PURR
> call. Do they measure the same thing and stay consistent?
Olof, you are correct.  We'll want to go directly to the hardware and
avoid the overhead of a hypervisor call.  Only in the instance where the
hypervisor is emulating PURR will we want to use the hypervisor call. 
The "art" is detecting when/if that is occurring.  Dave E. should be
able to help us with this.
-- 
Jeff Scheel (scheel at vnet.ibm.com)


From ahuja at austin.ibm.com  Thu Jan 13 03:30:21 2005
From: ahuja at austin.ibm.com (Manish Ahuja)
Date: Wed, 12 Jan 2005 10:30:21 -0600
Subject: Collect real process and processor utilization values when
 virtualization is enabled.
In-Reply-To: <16868.36168.772082.315933@cargo.ozlabs.ibm.com>
References: <41E4787D.90309@austin.ibm.com>
	<16868.36168.772082.315933@cargo.ozlabs.ibm.com>
Message-ID: <41E5509D.2070102@austin.ibm.com>

Paul Mackerras wrote:

>
>Don't you also need to update purr_data_array in timer_interrupt as
>well?  You seem to be doing that only on context switch, which won't
>be updated in a timely fashion necessarily (think of a compute-bound
>task on a lightly-loaded machine).
>
Yes, I do need to add this in other places to improve the collection times.

I have stepped away from using the old system completely and would
actually like to add more collection points in interrupt routines. This
will also enable me to collect real system time and other data, which I
plan to use elsewhere.

I held that piece back since I saw what Martin and John have been doing.

I will put out a more cohesive patch, but this bit will remain unchanged
alongside the other additions.

>>+	for_each_cpu(cpu)
>>+		cus = &per_cpu(purr_data_array, cpu);
>>+		sum_purr += cus->current_purr;
>>+		}
>
>The spacing is wrong here, it should be "for_each_cpu(cpu) {" and the
>"}" should be one tab to the left of where it is.

Wilco, will fix it.

>>+/* Used to store Processor Utilization register (purr) values */
>>+DECLARE_PER_CPU(struct purr_data, purr_data_array);
>>+
>>+struct purr_data {
>>+        u64 current_purr;  /* Holds the current purr register values */
>>+};
>
>Do we really need a struct to store one thing?  Are there other things
>you plan to add later?

In my prototype there are a few more things. But since the other patch
is not final, I only added the one thing that I knew for sure I wanted.
Adding members and not using them generally gets you knocked on the
head... so...

I definitely plan to add other things.

Manish


From ahuja at austin.ibm.com  Thu Jan 13 03:37:32 2005
From: ahuja at austin.ibm.com (Manish Ahuja)
Date: Wed, 12 Jan 2005 10:37:32 -0600
Subject: Collect real process and processor utilization values when
 virtualization is enabled.
In-Reply-To: <20050112040628.GA13221@austin.ibm.com>
References: <41E4787D.90309@austin.ibm.com>
	<20050112040628.GA13221@austin.ibm.com>
Message-ID: <41E5524C.6000107@austin.ibm.com>


>How will CKRM use this? Does it have architecture-specific code to
>dig this out of the thread_struct again? Could they use the cputime
>interface if we hooked into that instead?

I have provided them my test machine and they are working on setting up
their stuff. Once it becomes clear whether they wish to use the cputime
interface or collect the data directly, I shall provide a small patch to
enable them on ppc64.


From will_schmidt at vnet.ibm.com  Thu Jan 13 04:19:38 2005
From: will_schmidt at vnet.ibm.com (will schmidt)
Date: Wed, 12 Jan 2005 11:19:38 -0600
Subject: Collect real process and processor utilization values
 when	virtualization is enabled.
In-Reply-To: <1105519643.25534.7.camel@sheepdog.rchland.ibm.com>
References: <41E4787D.90309@austin.ibm.com>	<20050112040628.GA13221@austin.ibm.com>
	<1105519643.25534.7.camel@sheepdog.rchland.ibm.com>
Message-ID: <41E55C2A.2030309@vnet.ibm.com>

Jeff Scheel wrote:
> On Wed, 2005-01-12 at 04:06, Olof Johansson wrote:
...

> Olof, you are correct.  We'll want to go directly to the hardware and
> avoid the overhead of a hypervisor call.  Only in the instance where the
> hypervisor is emulating PURR will we want to use the hypervisor call. 
> The "art" is detecting when/if that is occurring.  Dave E. should be
> able to help us with this.

Related to the PURR hcall comments (yes, I already visited the
hcall/mfspr topic once :-) ):
  "While there is an hcall for reading the PURR, and that hcall will
work, it should not be used on [Power5] systems.  On GR and later
processors the OS should be doing an mfspr of PURR directly.  The purpose
of the hcall was for prototyping PHYP/PURR behavior on pre-GR processors."


From paulus at samba.org  Thu Jan 13 21:35:25 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 13 Jan 2005 21:35:25 +1100
Subject: [PATCH] PPC64 Move thread_info flags to its own cache line
Message-ID: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>

This patch fixes a problem I have been seeing since all the preempt
changes went in, which is that ppc64 SMP systems would livelock
randomly if preempt was enabled.

It turns out that what was happening was that one cpu was spinning in
spin_lock_irq (the version at line 215 of kernel/spinlock.c) madly
doing preempt_enable() and preempt_disable() calls.  The other cpu had
the lock and was trying to set the TIF_NEED_RESCHED flag for the task
running on the first cpu.  That is an atomic operation which has to be
retried if another cpu writes to the same cacheline between the load
and the store, which the other cpu was doing every time it did
preempt_enable() or preempt_disable().

I decided to move the thread_info flags field into the next cache
line, since it is the only field that would regularly be modified by
cpus other than the one running the task that owns the thread_info.
(OK, possibly the `cpu' field would be on a rebalance; I don't know the
rebalancing code, but that should be pretty infrequent.)  Thus, moving
the flags field seems like a good idea generally as well as solving the
immediate problem.
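
The effect of the change can be checked at compile time. The struct below is a simplified, hypothetical stand-in for `struct thread_info` (with `restart_block` reduced to an opaque array), and it assumes a 128-byte cache line, as on POWER4 and the G5; C11 `_Alignas` plays the role of the kernel's `____cacheline_aligned_in_smp` attribute.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 128  /* assumed L1 line size (POWER4, G5) */

/* Simplified, hypothetical stand-in for the patched struct thread_info. */
struct thread_info_sketch {
	void *task;
	void *exec_domain;
	int cpu;
	int preempt_count;              /* updated often, by the owning CPU */
	unsigned long restart_block[4]; /* opaque stand-in */
	unsigned char syscall_noerror;
	/* updated atomically by other CPUs: give it a line of its own */
	_Alignas(CACHE_LINE) unsigned long flags;
};

/* Do two members fall in the same cache line? (Assumes the struct itself
 * starts on a line boundary, which its alignment requires.) */
#define SAME_LINE(a, b) \
	(offsetof(struct thread_info_sketch, a) / CACHE_LINE == \
	 offsetof(struct thread_info_sketch, b) / CACHE_LINE)

static_assert(!SAME_LINE(flags, preempt_count),
	      "flags must not share a line with preempt_count");
```

With `flags` on its own line, another CPU's atomic update of `flags` no longer competes with the owning CPU's frequent `preempt_count` stores.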

For the record I am pretty unhappy with the code we use for spin_lock
et al. with preemption turned on (the BUILD_LOCK_OPS stuff in
spinlock.c).  For a start we do the atomic op (_raw_spin_trylock) each
time around the loop.  That is going to be generating a lot of
unnecessary bus (or fabric) traffic.  Instead, after we fail to get
the lock we should poll it with simple loads until we see that it is
clear and then retry the atomic op.  Assuming a reasonable cache
design, the loads won't generate any bus traffic until another cpu
writes to the cacheline containing the lock.
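
The polling scheme described above is usually called test-and-test-and-set. A minimal user-space sketch with C11 atomics follows; `ttas_lock` and its functions are hypothetical names, and unlike the kernel's preemptible lock ops it does not re-enable preemption while waiting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal user-space test-and-test-and-set lock (hypothetical names). */
typedef struct {
	atomic_int locked;  /* 0 = free, 1 = held */
} ttas_lock;

/* The only read-modify-write operation; this is what costs bus traffic. */
static bool ttas_trylock(ttas_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void ttas_acquire(ttas_lock *l)
{
	while (!ttas_trylock(l)) {
		/* Spin on plain loads: they are satisfied from the local
		 * cache until the holder's store invalidates the line. */
		while (atomic_load_explicit(&l->locked, memory_order_relaxed))
			;  /* a cpu_relax()-style hint would go here */
	}
}

static void ttas_release(ttas_lock *l)
{
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```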

Secondly we have lost the __spin_yield call that we had on ppc64,
which is an important optimization when we are running under the
hypervisor.  I can't just put that in cpu_relax because I need to know
which (virtual) cpu is holding the lock, so that I can tell the
hypervisor which virtual cpu to give my time slice to.  That
information is stored in the lock variable, which is why __spin_yield
needs the address of the lock.
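
The reason the yield needs the lock address can be shown with a toy lock whose word records the holder. The encoding here (holder CPU number plus one, zero when free) and `hv_confer_to()`, which only records its argument in place of a real hypervisor call, are assumptions for illustration; the actual ppc64 lock-word format is not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock: the word holds (holder cpu + 1), or 0 when free. */
typedef struct {
	atomic_uint owner;
} yield_lock;

/* Hypothetical stand-in for "give my time slice to virtual cpu N";
 * here it only records the target. */
static int last_conferred = -1;
static void hv_confer_to(int cpu)
{
	last_conferred = cpu;
}

static bool yl_trylock(yield_lock *l, int my_cpu)
{
	unsigned expected = 0;

	return atomic_compare_exchange_strong(&l->owner, &expected,
					      (unsigned)my_cpu + 1);
}

/* Only the lock word says whom to yield to, so this must see the lock;
 * a generic cpu_relax() has no way of knowing. */
static void yl_spin_yield(yield_lock *l)
{
	unsigned owner = atomic_load(&l->owner);

	if (owner != 0)
		hv_confer_to((int)owner - 1);
}

static void yl_lock(yield_lock *l, int my_cpu)
{
	while (!yl_trylock(l, my_cpu))
		yl_spin_yield(l);
}

static void yl_unlock(yield_lock *l)
{
	assert(atomic_load(&l->owner) != 0);
	atomic_store(&l->owner, 0);
}
```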

Signed-off-by: Paul Mackerras <paulus at samba.org>

diff -urN linux-2.5/include/asm-ppc64/thread_info.h test/include/asm-ppc64/thread_info.h
--- linux-2.5/include/asm-ppc64/thread_info.h	2004-12-18 08:35:35.000000000 +1100
+++ test/include/asm-ppc64/thread_info.h	2005-01-13 18:36:24.000000000 +1100
@@ -12,6 +12,7 @@
 
 #ifndef __ASSEMBLY__
 #include <linux/config.h>
+#include <linux/cache.h>
 #include <asm/processor.h>
 #include <asm/page.h>
 #include <linux/stringify.h>
@@ -22,12 +23,13 @@
 struct thread_info {
 	struct task_struct *task;		/* main task structure */
 	struct exec_domain *exec_domain;	/* execution domain */
-	unsigned long	flags;			/* low level flags */
 	int		cpu;			/* cpu we're on */
 	int		preempt_count;
 	struct restart_block restart_block;
 	/* set by force_successful_syscall_return */
 	unsigned char	syscall_noerror;
+	/* low level flags - has atomic operations done on it */
+	unsigned long	flags ____cacheline_aligned_in_smp;
 };
 
 /*
@@ -39,12 +41,12 @@
 {						\
 	.task =		&tsk,			\
 	.exec_domain =	&default_exec_domain,	\
-	.flags =	0,			\
 	.cpu =		0,			\
 	.preempt_count = 1,			\
 	.restart_block = {			\
 		.fn = do_no_restart_syscall,	\
 	},					\
+	.flags =	0,			\
 }
 
 #define init_thread_info	(init_thread_union.thread_info)


From paulus at samba.org  Thu Jan 13 21:37:52 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 13 Jan 2005 21:37:52 +1100
Subject: [PATCH] PPC64 Disable preemption in flush_tlb_pending
Message-ID: <16870.20352.503047.221064@cargo.ozlabs.ibm.com>

The preempt debug stuff found a place where we were using
smp_processor_id() without having preemption disabled, in
flush_tlb_pending.  This patch fixes it by using get_cpu_var and
put_cpu_var instead of the __get_cpu_var variant.
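
The difference between the two accessors can be modelled in user space. This is a hypothetical sketch, not the kernel's per-CPU machinery: `model_preempt_disable()` and `model_preempt_enable()` stand in for the preemption counter, `smp_processor_id_checked()` mimics a DEBUG_PREEMPT-style check that reports a CPU number being read while the task could migrate, and `get_cpu_batch()`/`put_cpu_batch()` play the roles of `get_cpu_var()`/`put_cpu_var()`.

```c
#include <assert.h>

#define NR_CPUS 2  /* hypothetical */

/* Hypothetical model of the task's preemption state. */
static int preempt_count;    /* 0: the task may be preempted and migrate */
static int current_cpu;      /* CPU the task is running on right now */
static int debug_warnings;   /* reports a DEBUG_PREEMPT-style check would make */

static void model_preempt_disable(void)
{
	preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* Like the debug smp_processor_id(): complain if the result could be
 * stale by the time it is used. */
static int smp_processor_id_checked(void)
{
	if (preempt_count == 0)
		debug_warnings++;
	return current_cpu;
}

struct tlb_batch {
	int index;  /* number of pending entries */
};
static struct tlb_batch batches[NR_CPUS];

/* Roles of get_cpu_var()/put_cpu_var(): pin the task, then pick its copy. */
static struct tlb_batch *get_cpu_batch(void)
{
	model_preempt_disable();
	return &batches[smp_processor_id_checked()];
}

static void put_cpu_batch(void)
{
	model_preempt_enable();
}

/* Shape of the fixed flush_tlb_pending(). */
static void flush_tlb_pending_model(void)
{
	struct tlb_batch *batch = get_cpu_batch();

	if (batch->index)
		batch->index = 0;  /* stands in for __flush_tlb_pending(batch) */
	put_cpu_batch();
}

/* Shape of the old code: __get_cpu_var() leaves preemption enabled. */
static struct tlb_batch *old_get_batch(void)
{
	return &batches[smp_processor_id_checked()];
}
```

The old pattern trips the check because nothing stops the task from moving to another CPU between reading the CPU number and using that CPU's batch.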

Signed-off-by: Paul Mackerras <paulus at samba.org>

diff -urN linux-2.5/include/asm-ppc64/tlbflush.h test/include/asm-ppc64/tlbflush.h
--- linux-2.5/include/asm-ppc64/tlbflush.h	2004-06-07 08:25:32.000000000 +1000
+++ test/include/asm-ppc64/tlbflush.h	2005-01-13 19:35:37.000000000 +1100
@@ -32,10 +32,11 @@
 
 static inline void flush_tlb_pending(void)
 {
-	struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch);
+	struct ppc64_tlb_batch *batch = &get_cpu_var(ppc64_tlb_batch);
 
 	if (batch->index)
 		__flush_tlb_pending(batch);
+	put_cpu_var(ppc64_tlb_batch);
 }
 
 #define flush_tlb_mm(mm)			flush_tlb_pending()


From paulus at samba.org  Thu Jan 13 21:41:36 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 13 Jan 2005 21:41:36 +1100
Subject: [PATCH] PPC64 Call preempt_schedule on exception exit
Message-ID: <16870.20576.417821.693961@cargo.ozlabs.ibm.com>

This patch mirrors the recent changes on x86 to call preempt_schedule
rather than schedule in the exception exit path, in the case where the
preempt_count is zero and the TIF_NEED_RESCHED bit is set.

I'm a little concerned that this means that we have a window where
interrupts are enabled and we are on our way into preempt_schedule,
but preempt_count is still zero.  Ingo's proposed preempt_schedule_irq
would fix this, and I think something like that should go in.
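
A C model of the patched exit path may make the control flow easier to follow. It is simplified and hypothetical: the state variables, the `_model` functions and the `PREEMPT_ACTIVE` value are assumptions, and `preempt_schedule_model()` only mirrors the general shape of the generic `preempt_schedule()` of this period (bail out if preemption or interrupts are disabled, otherwise mark `PREEMPT_ACTIVE`, call the scheduler, and repeat while a reschedule is still pending). The comment in the loop marks the window described above.

```c
#include <assert.h>
#include <stdbool.h>

#define PREEMPT_ACTIVE 0x10000000  /* value assumed for this model */

/* Hypothetical model of the state the exit path looks at. */
static int preempt_count;
static bool need_resched;   /* TIF_NEED_RESCHED */
static bool irqs_enabled;
static int schedule_calls;

static void schedule_model(void)
{
	assert(preempt_count & PREEMPT_ACTIVE);
	schedule_calls++;
	need_resched = false;  /* another task ran; request satisfied */
}

/* Simplified preempt_schedule(): refuses to run with a nonzero
 * preempt_count or with interrupts off, which is why the assembly
 * re-enables interrupts before calling it. */
static void preempt_schedule_model(void)
{
	if (preempt_count || !irqs_enabled)
		return;
	do {
		preempt_count = PREEMPT_ACTIVE;
		schedule_model();
		preempt_count = 0;
	} while (need_resched);
}

/* The exception-exit loop of the patched entry.S, written in C. */
static void exit_to_kernel_model(void)
{
	while (preempt_count == 0 && need_resched) {
		irqs_enabled = true;   /* ori r10,r10,MSR_EE; mtmsrd */
		/* window: interrupts on, preempt_count still 0 */
		preempt_schedule_model();
		irqs_enabled = false;  /* disable interrupts again */
	}
}
```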

Signed-off-by: Paul Mackerras <paulus at samba.org>

diff -urN linux-2.5/arch/ppc64/kernel/entry.S test/arch/ppc64/kernel/entry.S
--- linux-2.5/arch/ppc64/kernel/entry.S	2005-01-10 07:54:27.000000000 +1100
+++ test/arch/ppc64/kernel/entry.S	2005-01-13 20:48:36.000000000 +1100
@@ -574,25 +574,22 @@
 	crandc	eq,cr1*4+eq,eq
 	bne	restore
 	/* here we are preempting the current task */
-1:	lis	r0,PREEMPT_ACTIVE at h
-	stw	r0,TI_PREEMPT(r9)
+1:
 #ifdef CONFIG_PPC_ISERIES
 	li	r0,1
 	stb	r0,PACAPROCENABLED(r13)
 #endif
 	ori	r10,r10,MSR_EE
 	mtmsrd	r10,1		/* reenable interrupts */
-	bl	.schedule
+	bl	.preempt_schedule
 	mfmsr	r10
 	clrrdi	r9,r1,THREAD_SHIFT
 	rldicl	r10,r10,48,1	/* disable interrupts again */
-	li	r0,0
 	rotldi	r10,r10,16
 	mtmsrd	r10,1
 	ld	r4,TI_FLAGS(r9)
 	andi.	r0,r4,_TIF_NEED_RESCHED
 	bne	1b
-	stw	r0,TI_PREEMPT(r9)
 	b	restore
 
 user_work:
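Below is a rough C rendering of the new exit loop. It assumes that preempt_schedule() returns immediately when preempt_count is non-zero or interrupts are disabled, which is my reading of the 2.6 scheduler and not something this patch states. That check is why the assembly re-enables MSR_EE before the call. The gap between that mtmsrd and preempt_schedule() raising PREEMPT_ACTIVE is the window described above. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the exit path inspects. */
struct model_thread_info {
	int preempt_count;
	bool need_resched;   /* stands in for _TIF_NEED_RESCHED */
	bool irqs_enabled;   /* stands in for MSR_EE */
	int schedule_calls;
};

/* Stand-in for preempt_schedule(): it refuses to run if the task is
 * non-preemptible or interrupts are off. Otherwise it "switches" and the
 * reschedule request is consumed. The real function also sets and
 * clears PREEMPT_ACTIVE itself, so entry.S no longer has to. */
static void model_preempt_schedule(struct model_thread_info *ti)
{
	if (ti->preempt_count != 0 || !ti->irqs_enabled)
		return;
	ti->schedule_calls++;
	ti->need_resched = false;
}

/* Sketch of the kernel-exit path after the patch (label 1: in entry.S). */
static void model_exception_exit(struct model_thread_info *ti)
{
	if (ti->preempt_count != 0 || !ti->need_resched)
		return;                          /* "bne restore" */
	do {
		ti->irqs_enabled = true;         /* ori r10,r10,MSR_EE; mtmsrd */
		model_preempt_schedule(ti);
		ti->irqs_enabled = false;        /* rldicl/rotldi; mtmsrd */
	} while (ti->need_resched);              /* andi. _TIF_NEED_RESCHED; bne 1b */
}
```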


From paulus at samba.org  Thu Jan 13 21:45:06 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 13 Jan 2005 21:45:06 +1100
Subject: [PATCH] PPC64 can do preempt debug too
Message-ID: <16870.20786.164419.188120@cargo.ozlabs.ibm.com>

This patch enables the DEBUG_PREEMPT config option for PPC64.  I have
this turned on on my desktop G5 and it isn't finding any problems.
(It did find one problem, in flush_tlb_pending(), that I have just
sent a patch for.)

BTW, do we really need to restrict which architectures the config
option is available on?

Signed-off-by: Paul Mackerras <paulus at samba.org>

diff -urN linux-2.5/include/asm-ppc64/smp.h test/include/asm-ppc64/smp.h
--- linux-2.5/include/asm-ppc64/smp.h	2004-11-26 20:40:32.000000000 +1100
+++ test/include/asm-ppc64/smp.h	2005-01-10 19:49:03.000000000 +1100
@@ -38,7 +38,7 @@
 extern void smp_message_recv(int, struct pt_regs *);
 
 
-#define smp_processor_id() (get_paca()->paca_index)
+#define __smp_processor_id() (get_paca()->paca_index)
 #define hard_smp_processor_id() (get_paca()->hw_cpu_id)
 
 extern cpumask_t cpu_sibling_map[NR_CPUS];
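Renaming the ppc64 macro to __smp_processor_id() lets generic code supply a checked smp_processor_id() when DEBUG_PREEMPT is enabled. The sketch below shows what such a check looks for. It assumes the CPU number is only trusted when preemption is disabled, interrupts are disabled, or the task is bound to a single CPU. The kernel's exact rules may differ, and every name here is a hypothetical stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-task state; not the kernel's structures. */
static int preempt_count;
static bool irqs_disabled;
static bool bound_to_one_cpu;        /* e.g. affinity allows a single CPU */
static unsigned int paca_index = 1;  /* stands in for get_paca()->paca_index */
static int debug_warnings;

/* The raw, unchecked accessor, i.e. what the patch renames the
 * ppc64 definition to. */
static unsigned int model___smp_processor_id(void)
{
	return paca_index;
}

/* Checked wrapper in the spirit of DEBUG_PREEMPT: the CPU number is
 * only meaningful if the caller cannot migrate to another CPU. */
static unsigned int model_smp_processor_id(void)
{
	if (preempt_count == 0 && !irqs_disabled && !bound_to_one_cpu) {
		debug_warnings++;
		fprintf(stderr, "BUG: using smp_processor_id() in preemptible code\n");
	}
	return model___smp_processor_id();
}
```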
diff -urN linux-2.5/lib/Kconfig.debug test/lib/Kconfig.debug
--- linux-2.5/lib/Kconfig.debug	2005-01-11 08:57:21.000000000 +1100
+++ test/lib/Kconfig.debug	2005-01-11 09:13:28.000000000 +1100
@@ -50,7 +50,7 @@
 
 config DEBUG_PREEMPT
 	bool "Debug preemptible kernel"
-	depends on PREEMPT && X86
+	depends on PREEMPT && (X86 || PPC64)
 	default y
 	help
 	  If you say Y here then the kernel will use a debug variant of the


From paulus at samba.org  Thu Jan 13 21:47:30 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 13 Jan 2005 21:47:30 +1100
Subject: [PATCH] PPC64 Add PREEMPT_BKL option
Message-ID: <16870.20930.566334.782203@cargo.ozlabs.ibm.com>

This patch adds the PREEMPT_BKL config option for PPC64, shamelessly
stolen from the i386 version.  I have this turned on in the kernel on
my desktop G5 and it seems to be just fine.

Signed-off-by: Paul Mackerras <paulus at samba.org>

diff -urN linux-2.5/arch/ppc64/Kconfig test/arch/ppc64/Kconfig
--- linux-2.5/arch/ppc64/Kconfig	2005-01-11 08:57:19.000000000 +1100
+++ test/arch/ppc64/Kconfig	2005-01-12 20:25:17.000000000 +1100
@@ -231,6 +231,17 @@
 	  Say Y here if you are building a kernel for a desktop, embedded
 	  or real-time system.  Say N if you are unsure.
 
+config PREEMPT_BKL
+	bool "Preempt The Big Kernel Lock"
+	depends on PREEMPT
+	default y
+	help
+	  This option reduces the latency of the kernel by making the
+	  big kernel lock preemptible.
+
+	  Say Y here if you are building a kernel for a desktop system.
+	  Say N if you are unsure.
+
 #
 # Use the generic interrupt handling code in kernel/irq/:
 #
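My understanding is that PREEMPT_BKL turns the big kernel lock into a sleeping semaphore. Nesting is tracked by a per-task depth counter that starts at -1, so only the outermost lock_kernel()/unlock_kernel() pair touches the lock itself. The sketch below models this with a pthread mutex and a thread-local depth. It illustrates the idea and is not lib/kernel_lock.c. It does not model dropping and reacquiring the lock across schedule().

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: a sleeping lock (a semaphore in the kernel, a
 * pthread mutex here) plus a per-task recursion depth starting at -1. */
static pthread_mutex_t kernel_sem = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local int lock_depth = -1;

static void model_lock_kernel(void)
{
	int depth = lock_depth + 1;

	if (depth == 0)                  /* outermost acquisition only */
		pthread_mutex_lock(&kernel_sem);
	lock_depth = depth;
}

static void model_unlock_kernel(void)
{
	assert(lock_depth >= 0);
	if (--lock_depth < 0)            /* dropped the last nesting level */
		pthread_mutex_unlock(&kernel_sem);
}

static int model_kernel_locked(void)
{
	return lock_depth >= 0;
}
```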


From mjw at us.ibm.com  Fri Jan 14 05:46:23 2005
From: mjw at us.ibm.com (Mike Wolf)
Date: Thu, 13 Jan 2005 12:46:23 -0600
Subject: [PATCH] PPC64: 32bit wrapper for ioctls.
Message-ID: <41E6C1FF.4000203@us.ltcfwd.linux.ibm.com>

Hi Paul,
  This patch adds 32-bit wrappers for two ioctls that Java needs.
Assuming this doesn't generate a round of discussion, please
forward upstream to akpm/torvalds.

Signed-off-by: Mike Wolf  mjw at us.ibm.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ioctl32.patch
Type: text/x-patch
Size: 482 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050113/8bc77f67/attachment.bin 

From olof at austin.ibm.com  Fri Jan 14 07:00:48 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Thu, 13 Jan 2005 14:00:48 -0600
Subject: [PATCH] [PPC64] iommu: avoid ISA io space on POWER3
Message-ID: <20050113200048.GA11683@austin.ibm.com>

Hi,

On some systems, the first PCI bus has an ISA I/O hole in the first 16MB. We can't
use this space for DMA addresses on the bus.

On Python-based machines, we'll skip the first 256MB on buses that have the hole,
just as we do on later systems. This means that the first bus will have 768MB of
DMA space shared between the devices on it.

Signed-off-by: Olof Johansson <olof at austin.ibm.com>
Acked-by: Paul Mackerras <paulus at samba.org>


---

 linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c |   19 ++++++++++++++++---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu-iohole arch/ppc64/kernel/pSeries_iommu.c
--- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~iommu-iohole	2005-01-12 16:29:55.000000000 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c	2005-01-12 16:34:57.000000000 -0600
@@ -327,12 +327,25 @@ static void iommu_bus_setup_pSeries(stru
 		/* Root bus */
 		if (is_python(dn)) {
 			struct iommu_table *tbl;
+			unsigned int *iohole;
 
 			DBG("Python root bus %s\n", bus->name);
 
-			/* 1GB window by default */
-			dn->phb->dma_window_size = 1 << 30;
-			dn->phb->dma_window_base_cur = 0;
+			iohole = (unsigned int *)get_property(dn, "io-hole", 0);
+
+			if (iohole) {
+				/* On first bus we need to leave room for the
+				 * ISA address space. Just skip the first 256MB
+				 * altogether. This leaves 768MB for the window.
+				 */
+				DBG("PHB has io-hole, reserving 256MB\n");
+				dn->phb->dma_window_size = 3 << 28;
+				dn->phb->dma_window_base_cur = 1 << 28;
+			} else {
+				/* 1GB window by default */
+				dn->phb->dma_window_size = 1 << 30;
+				dn->phb->dma_window_base_cur = 0;
+			}
 
 			tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL);
 

_
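Here is the window arithmetic from the iommu patch, pulled out into a hypothetical helper so the numbers can be checked. 1 << 28 is 256MB and 3 << 28 is 768MB, so both layouts end at the 1GB boundary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the window choice in the patch:
 * without an "io-hole" property the PHB gets a 1GB window at bus
 * address 0. With one, the first 256MB is skipped and the window
 * covers the remaining 768MB of the first 1GB. */
struct dma_window {
	unsigned long base;   /* first usable bus address */
	unsigned long size;   /* bytes available for DMA mappings */
};

static struct dma_window python_dma_window(bool has_io_hole)
{
	struct dma_window w;

	if (has_io_hole) {
		w.base = 1UL << 28;   /* 256MB: well clear of the 16MB ISA hole */
		w.size = 3UL << 28;   /* 768MB */
	} else {
		w.base = 0;
		w.size = 1UL << 30;   /* 1GB */
	}
	return w;
}
```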


From anton at samba.org  Fri Jan 14 10:51:19 2005
From: anton at samba.org (Anton Blanchard)
Date: Fri, 14 Jan 2005 10:51:19 +1100
Subject: [PATCH] ppc64: Allow EEH to be disabled
Message-ID: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com>


Hi,

I was thinking of sending this upstream. Any thoughts?

Anton

--

Allow EEH to be disabled for pSeries targets, but only if the EMBEDDED
option is enabled.

Signed-off-by: Anton Blanchard <anton at samba.org>

diff -puN arch/ppc64/Kconfig~no-eeh arch/ppc64/Kconfig
--- foobar2/arch/ppc64/Kconfig~no-eeh	2005-01-12 00:34:25.902201644 +1100
+++ foobar2-anton/arch/ppc64/Kconfig	2005-01-12 00:34:25.934199201 +1100
@@ -231,6 +231,11 @@ config PREEMPT
 	  Say Y here if you are building a kernel for a desktop, embedded
 	  or real-time system.  Say N if you are unsure.
 
+config EEH
+	bool "PCI Extended Error Handling (EEH)" if EMBEDDED
+	depends on PPC_PSERIES
+	default y if !EMBEDDED
+
 #
 # Use the generic interrupt handling code in kernel/irq/:
 #
diff -puN arch/ppc64/kernel/Makefile~no-eeh arch/ppc64/kernel/Makefile
--- foobar2/arch/ppc64/kernel/Makefile~no-eeh	2005-01-12 00:34:25.908201186 +1100
+++ foobar2-anton/arch/ppc64/kernel/Makefile	2005-01-12 00:34:25.932199354 +1100
@@ -30,9 +30,10 @@ obj-$(CONFIG_PPC_ISERIES) += iSeries_irq
 obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o
 
 obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \
-			     eeh.o pSeries_nvram.o rtasd.o ras.o \
+			     pSeries_nvram.o rtasd.o ras.o \
 			     xics.o rtas.o pSeries_setup.o pSeries_iommu.o
 
+obj-$(CONFIG_EEH)		+= eeh.o
 obj-$(CONFIG_PROC_FS)		+= proc_ppc64.o
 obj-$(CONFIG_RTAS_FLASH)	+= rtas_flash.o
 obj-$(CONFIG_SMP)		+= smp.o
diff -puN include/asm-ppc64/eeh.h~no-eeh include/asm-ppc64/eeh.h
--- foobar2/include/asm-ppc64/eeh.h~no-eeh	2005-01-12 00:34:25.913200804 +1100
+++ foobar2-anton/include/asm-ppc64/eeh.h	2005-01-12 00:34:25.931199430 +1100
@@ -23,7 +23,6 @@
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/string.h>
-#include <linux/notifier.h>
 
 struct pci_dev;
 struct device_node;
@@ -33,14 +32,18 @@ struct device_node;
 #define EEH_MODE_NOCHECK	(1<<1)
 #define EEH_MODE_ISOLATED	(1<<2)
 
-#ifdef CONFIG_PPC_PSERIES
-extern void __init eeh_init(void);
-unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val);
+#ifdef CONFIG_EEH
+void __init eeh_init(void);
+unsigned long eeh_check_failure(const volatile void __iomem *token,
+				unsigned long val);
 int eeh_dn_check_failure (struct device_node *dn, struct pci_dev *dev);
 void __iomem *eeh_ioremap(unsigned long addr, void __iomem *vaddr);
 void __init pci_addr_cache_build(void);
 #else
+#define eeh_init()
 #define eeh_check_failure(token, val) (val)
+#define eeh_dn_check_failure(dn, dev) (0)
+#define pci_addr_cache_build()
 #endif
 
 /**
@@ -69,8 +72,6 @@ void eeh_remove_device(struct pci_dev *)
 #define EEH_ENABLE		1
 #define EEH_RELEASE_LOADSTORE	2
 #define EEH_RELEASE_DMA		3
-int eeh_set_option(struct pci_dev *dev, int options);
-
 
 /**
  * Notifier event flags.
@@ -89,6 +90,7 @@ struct eeh_event {
 };
 
 /** Register to find out about EEH events. */
+struct notifier_block;
 int eeh_register_notifier(struct notifier_block *nb);
 int eeh_unregister_notifier(struct notifier_block *nb);
 
@@ -194,7 +196,8 @@ static inline void eeh_raw_writeq(u64 va
 #define EEH_CHECK_ALIGN(v,a) \
 	((((unsigned long)(v)) & ((a) - 1)) == 0)
 
-static inline void eeh_memset_io(volatile void __iomem *addr, int c, unsigned long n)
+static inline void eeh_memset_io(volatile void __iomem *addr, int c,
+				 unsigned long n)
 {
 	u32 lc = c;
 	lc |= lc << 8;
_
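The eeh.h change follows the usual pattern for compiling a facility out: stub macros replace the functions, so callers need no #ifdefs of their own. The sketch below shows the idea. It assumes a hypothetical MODEL_CONFIG_EEH symbol in place of CONFIG_EEH and a caller modeled loosely on the eeh_readl() helpers, where only an all-ones read triggers the check. With MODEL_CONFIG_EEH defined, a real implementation would have to be linked in.

```c
#include <assert.h>

#ifdef MODEL_CONFIG_EEH
/* EEH build: the real check lives in a separate object file. */
unsigned long model_eeh_check_failure(const volatile void *token,
				      unsigned long val);
#else
/* EEH compiled out: the check collapses to the value that was read. */
#define model_eeh_check_failure(token, val) (val)
#endif

/* Hypothetical MMIO read helper: an all-ones value may mean the slot
 * was isolated, so only then is the (possibly stubbed) check consulted. */
static unsigned int model_readl(const volatile unsigned int *addr)
{
	unsigned int val = *addr;

	if (val == 0xffffffffu)
		return (unsigned int)model_eeh_check_failure(addr, val);
	return val;
}
```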


From linas at austin.ibm.com  Fri Jan 14 11:31:59 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 13 Jan 2005 18:31:59 -0600
Subject: [PATCH] ppc64: Allow EEH to be disabled
In-Reply-To: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com>
References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com>
Message-ID: <20050114003159.GO23690@austin.ibm.com>

On Fri, Jan 14, 2005 at 10:51:19AM +1100, Anton Blanchard was heard to remark:
> 
> Hi,
> 
> I was thinking of sending this upstream. Any thoughts?

Yes, can you help me get my other patch accepted?

This patch itself looks fine to me, although one could probably move
even more things into the #ifdef region, just to be clean.

--linas


From zwane at arm.linux.org.uk  Fri Jan 14 11:43:39 2005
From: zwane at arm.linux.org.uk (Zwane Mwaikambo)
Date: Thu, 13 Jan 2005 17:43:39 -0700 (MST)
Subject: [PATCH] PPC64 pmac hotplug cpu
Message-ID: <Pine.LNX.4.61.0501122341410.23299@montezuma.fsmlabs.com>

I found the following very handy for use as a reference platform when 
working on i386 hotplug cpu recently.

It's been tested on a G5 system with a cpu going on/offline every second 
and make -j. I've also tried a number of config options to avoid compile 
breakage.

Signed-off-by: Zwane Mwaikambo <zwane at arm.linux.org.uk>

Index: linux-2.6.10-mm3/arch/ppc64/Kconfig
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/Kconfig,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 Kconfig
--- linux-2.6.10-mm3/arch/ppc64/Kconfig	13 Jan 2005 16:27:26 -0000	1.1.1.1
+++ linux-2.6.10-mm3/arch/ppc64/Kconfig	13 Jan 2005 16:35:39 -0000
@@ -305,7 +305,7 @@ source "drivers/pci/Kconfig"
 
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
-	depends on SMP && EXPERIMENTAL && PPC_PSERIES
+	depends on SMP && EXPERIMENTAL && (PPC_PSERIES || PPC_PMAC)
 	select HOTPLUG
 	---help---
 	  Say Y here to be able to turn CPUs off and on.
Index: linux-2.6.10-mm3/arch/ppc64/kernel/idle.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/idle.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 idle.c
--- linux-2.6.10-mm3/arch/ppc64/kernel/idle.c	13 Jan 2005 16:27:26 -0000	1.1.1.1
+++ linux-2.6.10-mm3/arch/ppc64/kernel/idle.c	13 Jan 2005 16:34:24 -0000
@@ -364,7 +364,7 @@ int idle_setup(void)
 		}
 	}
 #endif /* CONFIG_PPC_PSERIES */
-#ifndef CONFIG_PPC_ISERIES
+#if !defined(CONFIG_PPC_ISERIES) && !defined(CONFIG_HOTPLUG_CPU)
 	if (systemcfg->platform == PLATFORM_POWERMAC ||
 	    systemcfg->platform == PLATFORM_MAPLE) {
 		printk(KERN_INFO "Using native/NAP idle loop\n");
Index: linux-2.6.10-mm3/arch/ppc64/kernel/irq.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/irq.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 irq.c
--- linux-2.6.10-mm3/arch/ppc64/kernel/irq.c	13 Jan 2005 16:27:26 -0000	1.1.1.1
+++ linux-2.6.10-mm3/arch/ppc64/kernel/irq.c	13 Jan 2005 23:51:29 -0000
@@ -479,3 +479,31 @@ EXPORT_SYMBOL(do_softirq);
 
 #endif /* CONFIG_IRQSTACKS */
 
+#ifdef CONFIG_HOTPLUG_CPU
+void fixup_irqs(cpumask_t map)
+{
+	unsigned int irq;
+	static int warned;
+
+	for_each_irq(irq) {
+		cpumask_t mask;
+
+		if (irq_desc[irq].status & IRQ_PER_CPU)
+			continue;
+
+		cpus_and(mask, irq_affinity[irq], map);
+		if (any_online_cpu(mask) == NR_CPUS) {
+			printk("Breaking affinity for irq %i\n", irq);
+			mask = map;
+		}
+		if (irq_desc[irq].handler->set_affinity)
+			irq_desc[irq].handler->set_affinity(irq, mask);
+		else if (irq_desc[irq].action && !(warned++))
+			printk("Cannot set affinity for irq %i\n", irq);
+	}
+
+	local_irq_enable();
+	mdelay(1);
+	local_irq_disable();
+}
+#endif
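The mask handling in fixup_irqs() comes down to a bitwise intersection with a fallback. The sketch below uses one bit per CPU in an unsigned long, a hypothetical simplification of cpumask_t. An empty intersection plays the role of any_online_cpu() returning NR_CPUS.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model: CPU sets as bitmasks, bit n = CPU n. */
typedef unsigned long cpumask_model;

/* Returns the mask an IRQ should be retargeted to when only the CPUs
 * in `online` remain. It keeps the existing affinity if any of those
 * CPUs are still available, otherwise it falls back to every online CPU. */
static cpumask_model fixup_irq_mask(unsigned int irq, cpumask_model affinity,
				    cpumask_model online)
{
	cpumask_model mask = affinity & online;   /* cpus_and() */

	if (mask == 0) {                          /* no online CPU left in it */
		printf("Breaking affinity for irq %u\n", irq);
		mask = online;
	}
	return mask;
}
```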
Index: linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 pSeries_setup.c
--- linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c	13 Jan 2005 16:27:27 -0000	1.1.1.1
+++ linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c	13 Jan 2005 20:44:05 -0000
@@ -327,8 +327,9 @@ static  void __init pSeries_discover_pic
 	}
 }
 
-static void pSeries_cpu_die(void)
+static void pSeries_mach_cpu_die(void)
 {
+	idle_task_exit();
 	local_irq_disable();
 	/* Some hardware requires clearing the CPPR, while other hardware does not
 	 * it is safe either way
@@ -606,7 +607,7 @@ struct machdep_calls __initdata pSeries_
 	.power_off		= rtas_power_off,
 	.halt			= rtas_halt,
 	.panic			= rtas_os_term,
-	.cpu_die		= pSeries_cpu_die,
+	.cpu_die		= pSeries_mach_cpu_die,
 	.get_boot_time		= pSeries_get_boot_time,
 	.get_rtc_time		= pSeries_get_rtc_time,
 	.set_rtc_time		= pSeries_set_rtc_time,
Index: linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 pmac.h
--- linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h	13 Jan 2005 16:27:27 -0000	1.1.1.1
+++ linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h	13 Jan 2005 16:34:24 -0000
@@ -8,6 +8,9 @@
  * Declaration for the various functions exported by the
  * pmac_* files. Mostly for use by pmac_setup
  */
+#ifdef CONFIG_HOTPLUG_CPU
+DECLARE_PER_CPU(int, cpu_state);
+#endif
 
 extern void pmac_get_boot_time(struct rtc_time *tm);
 extern void pmac_get_rtc_time(struct rtc_time *tm);
Index: linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 pmac_setup.c
--- linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c	13 Jan 2005 16:27:27 -0000	1.1.1.1
+++ linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c	13 Jan 2005 16:34:24 -0000
@@ -229,6 +229,25 @@ void __pmac pmac_halt(void)
 	pmac_power_off();
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static void pmac_mach_cpu_die(void)
+{
+	unsigned int cpu;
+
+	local_irq_disable();
+	cpu = smp_processor_id();
+	printk(KERN_DEBUG "CPU%d offline\n", cpu);
+	__get_cpu_var(cpu_state) = CPU_DEAD;
+	wmb();
+	while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE)
+		cpu_relax();
+
+	flush_tlb_pending();
+	cpu_set(cpu, cpu_online_map);
+	local_irq_enable();
+}
+#endif
+
 #ifdef CONFIG_BOOTX_TEXT
 static int dummy_getc_poll(void)
 {
@@ -455,5 +474,8 @@ struct machdep_calls __initdata pmac_md 
       	.calibrate_decr		= pmac_calibrate_decr,
 	.feature_call		= pmac_do_feature_call,
 	.progress		= pmac_progress,
-	.check_legacy_ioport	= pmac_check_legacy_ioport
+	.check_legacy_ioport	= pmac_check_legacy_ioport,
+#ifdef CONFIG_HOTPLUG_CPU
+	.cpu_die		= pmac_mach_cpu_die,
+#endif
 };
Index: linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 pmac_smp.c
--- linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c	13 Jan 2005 16:27:27 -0000	1.1.1.1
+++ linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c	14 Jan 2005 00:32:10 -0000
@@ -35,6 +35,7 @@
 #include <linux/spinlock.h>
 #include <linux/errno.h>
 #include <linux/irq.h>
+#include <linux/delay.h>
 
 #include <asm/ptrace.h>
 #include <asm/atomic.h>
@@ -296,6 +297,38 @@ static void __init smp_core99_setup_cpu(
 	}
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+/* State of each CPU during hotplug phases */
+DEFINE_PER_CPU(int, cpu_state) = { 0 };
+
+static int pmac_cpu_disable(void)
+{
+	unsigned int cpu = smp_processor_id();
+
+	if (cpu == boot_cpuid)
+		return -EBUSY;
+
+	systemcfg->processorCount--;
+	cpu_clear(cpu, cpu_online_map);
+	fixup_irqs(cpu_online_map);
+	return 0;
+}
+
+static void pmac_cpu_die(unsigned int cpu)
+{
+	int i;
+	
+	for (i = 0; i < 100; i++) {
+		rmb();
+		if (per_cpu(cpu_state, cpu) == CPU_DEAD)
+			return;
+		msleep(100);
+	}
+	printk(KERN_ERR "CPU%d didn't die...\n", cpu);
+}
+
+#endif
+
 struct smp_ops_t core99_smp_ops __pmacdata = {
 	.message_pass	= smp_mpic_message_pass,
 	.probe		= smp_core99_probe,
@@ -308,4 +341,8 @@ struct smp_ops_t core99_smp_ops __pmacda
 void __init pmac_setup_smp(void)
 {
 	smp_ops = &core99_smp_ops;
+#ifdef CONFIG_HOTPLUG_CPU
+	smp_ops->cpu_disable = pmac_cpu_disable;
+	smp_ops->cpu_die = pmac_cpu_die;
+#endif
 }
Index: linux-2.6.10-mm3/arch/ppc64/kernel/setup.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/setup.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 setup.c
--- linux-2.6.10-mm3/arch/ppc64/kernel/setup.c	13 Jan 2005 16:27:26 -0000	1.1.1.1
+++ linux-2.6.10-mm3/arch/ppc64/kernel/setup.c	13 Jan 2005 21:26:48 -0000
@@ -1345,9 +1345,6 @@ early_param("xmon", early_xmon);
 
 void cpu_die(void)
 {
-	idle_task_exit();
 	if (ppc_md.cpu_die)
 		ppc_md.cpu_die();
-	local_irq_disable();
-	for (;;);
 }
Index: linux-2.6.10-mm3/arch/ppc64/kernel/smp.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/smp.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smp.c
--- linux-2.6.10-mm3/arch/ppc64/kernel/smp.c	13 Jan 2005 16:27:27 -0000	1.1.1.1
+++ linux-2.6.10-mm3/arch/ppc64/kernel/smp.c	14 Jan 2005 00:26:26 -0000
@@ -406,10 +406,39 @@ void __devinit smp_prepare_boot_cpu(void
 	current_set[boot_cpuid] = current->thread_info;
 }
 
+#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PPC_PMAC)
+#include "pmac.h"
+static int cpu_enable(unsigned int cpu)
+{
+	if (systemcfg->platform == PLATFORM_PSERIES_LPAR)
+		return -ENOSYS;
+
+	/* get the target out of its holding state */
+	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+	wmb();
+
+	while (!cpu_online(cpu))
+		cpu_relax();
+
+	fixup_irqs(cpu_online_map);
+	/* counter the irq disable in fixup_irqs */
+	local_irq_enable();
+	return 0;
+}
+#else
+static int cpu_enable(unsigned int cpu)
+{
+	return -ENOSYS;
+}
+#endif
+
 int __devinit __cpu_up(unsigned int cpu)
 {
 	int c;
 
+	if (system_state == SYSTEM_RUNNING && !cpu_enable(cpu))
+		return 0;
+
 	/* At boot, don't bother with non-present cpus -JSCHOPP */
 	if (system_state < SYSTEM_RUNNING && !cpu_present(cpu))
 		return -ENOENT;
Index: linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 sysfs.c
--- linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c	13 Jan 2005 16:27:27 -0000	1.1.1.1
+++ linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c	13 Jan 2005 16:36:23 -0000
@@ -18,7 +18,7 @@
 #include <asm/systemcfg.h>
 #include <asm/paca.h>
 #include <asm/lppaca.h>
-
+#include <asm/machdep.h>
 
 static DEFINE_PER_CPU(struct cpu, cpu_devices);
 
@@ -413,9 +413,7 @@ static int __init topology_init(void)
 		 * CPU.  For instance, the boot cpu might never be valid
 		 * for hotplugging.
 		 */
-#ifdef CONFIG_HOTPLUG_CPU
-		if (systemcfg->platform != PLATFORM_PSERIES_LPAR)
-#endif
+		if (!ppc_md.cpu_die)
 			c->no_control = 1;
 
 		if (cpu_online(cpu) || (c->no_control == 0)) {
Index: linux-2.6.10-mm3/include/asm-ppc64/smp.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm3/include/asm-ppc64/smp.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smp.h
--- linux-2.6.10-mm3/include/asm-ppc64/smp.h	13 Jan 2005 16:27:35 -0000	1.1.1.1
+++ linux-2.6.10-mm3/include/asm-ppc64/smp.h	13 Jan 2005 16:34:24 -0000
@@ -29,7 +29,7 @@
 extern int boot_cpuid;
 extern int boot_cpuid_phys;
 
-extern void cpu_die(void) __attribute__((noreturn));
+extern void cpu_die(void);
 
 #ifdef CONFIG_SMP
 
@@ -37,6 +37,9 @@ extern void smp_send_debugger_break(int 
 struct pt_regs;
 extern void smp_message_recv(int, struct pt_regs *);
 
+#ifdef CONFIG_HOTPLUG_CPU
+extern void fixup_irqs(cpumask_t map);
+#endif
 
 #define smp_processor_id() (get_paca()->paca_index)
 #define hard_smp_processor_id() (get_paca()->hw_cpu_id)
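The pmac hotplug path is a two-party handshake on the per-CPU cpu_state variable:

- The dying CPU clears itself from the online map, stores CPU_DEAD and spins until it sees CPU_UP_PREPARE.
- The controlling CPU polls for CPU_DEAD, up to 100 times at 100ms intervals, and later releases the dying CPU in cpu_enable().

The sketch below models this in user space. A thread plays the dying CPU, and C11 atomics stand in for wmb()/rmb(). The state names and helpers are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical states; the kernel uses CPU_DEAD and CPU_UP_PREPARE. */
enum { MODEL_CPU_ONLINE, MODEL_CPU_DEAD, MODEL_CPU_UP_PREPARE };

static atomic_int cpu_state = MODEL_CPU_ONLINE;
static atomic_bool cpu_online_flag = true;

/* Dying CPU: pmac_cpu_disable() then pmac_mach_cpu_die(). */
static void *dying_cpu(void *unused)
{
	(void)unused;
	atomic_store(&cpu_online_flag, false);     /* cpu_clear(cpu, cpu_online_map) */
	atomic_store(&cpu_state, MODEL_CPU_DEAD);  /* store + wmb() */
	while (atomic_load(&cpu_state) != MODEL_CPU_UP_PREPARE)
		;                                  /* cpu_relax() */
	atomic_store(&cpu_online_flag, true);      /* cpu_set(cpu, cpu_online_map) */
	return NULL;
}

/* Controlling CPU: pmac_cpu_die() polls 100 times, 100ms apart. */
static int wait_for_death(void)
{
	struct timespec delay = { 0, 100 * 1000 * 1000 };

	for (int i = 0; i < 100; i++) {
		if (atomic_load(&cpu_state) == MODEL_CPU_DEAD)
			return 0;
		nanosleep(&delay, NULL);
	}
	return -1;                                 /* "CPU%d didn't die..." */
}

/* Controlling CPU: cpu_enable() releases the parked CPU and waits. */
static void bring_back(void)
{
	atomic_store(&cpu_state, MODEL_CPU_UP_PREPARE);
	while (!atomic_load(&cpu_online_flag))
		;
}
```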


From nathanl at austin.ibm.com  Fri Jan 14 18:05:52 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Fri, 14 Jan 2005 01:05:52 -0600
Subject: [PATCH] use kref for device_node refcounting
Message-ID: <1105686352.4367.4.camel@biclops>

This changes struct device_node and associated code to use the kref
api for object refcounting and freeing.  I've given it some testing on
pSeries with cpu add/remove and verified that the release function
works.  The change is somewhat cosmetic but it does make the code
easier to understand... at least I think so =)

The only real change is that the refcount on all device_nodes is
initialized to 1, and the device node is freed when the refcount
reaches 0 (of_remove_node has the extra "put" to ensure that this
happens).  This lets us get rid of the OF_STALE flag and macros in
prom.h.

Signed-off-by: Nathan Lynch <nathanl at austin.ibm.com>


---


diff -puN arch/ppc64/kernel/prom.c~ppc64-device_node-use-kref arch/ppc64/kernel/prom.c
--- linux-2.6.11-rc1-bk1/arch/ppc64/kernel/prom.c~ppc64-device_node-use-kref	2005-01-13 19:04:09.000000000 -0600
+++ linux-2.6.11-rc1-bk1-nathanl/arch/ppc64/kernel/prom.c	2005-01-14 00:24:04.000000000 -0600
@@ -717,6 +717,7 @@ static unsigned long __init unflatten_dt
 				dad->next->sibling = np;
 			dad->next = np;
 		}
+		kref_init(&np->kref);
 	}
 	while(1) {
 		u32 sz, noff;
@@ -1475,24 +1476,31 @@ EXPORT_SYMBOL(of_get_next_child);
  *	@node:	Node to inc refcount, NULL is supported to
  *		simplify writing of callers
  *
- *	Returns the node itself or NULL if gone.
+ *	Returns node.
  */
 struct device_node *of_node_get(struct device_node *node)
 {
-	if (node && !OF_IS_STALE(node)) {
-		atomic_inc(&node->_users);
-		return node;
-	}
-	return NULL;
+	if (node)
+		kref_get(&node->kref);
+	return node;
 }
 EXPORT_SYMBOL(of_node_get);
 
+static inline struct device_node * kref_to_device_node(struct kref *kref)
+{
+	return container_of(kref, struct device_node, kref);
+}
+
 /**
- *	of_node_cleanup - release a dynamically allocated node
- *	@arg:  Node to be released
+ *	of_node_release - release a dynamically allocated node
+ *	@kref:  kref element of the node to be released
+ *
+ *	In of_node_put() this function is passed to kref_put()
+ *	as the destructor.
  */
-static void of_node_cleanup(struct device_node *node)
+static void of_node_release(struct kref *kref)
 {
+	struct device_node *node = kref_to_device_node(kref);
 	struct property *prop = node->properties;
 
 	if (!OF_IS_DYNAMIC(node))
@@ -1518,19 +1526,8 @@ static void of_node_cleanup(struct devic
  */
 void of_node_put(struct device_node *node)
 {
-	if (!node)
-		return;
-
-	WARN_ON(0 == atomic_read(&node->_users));
-
-	if (OF_IS_STALE(node)) {
-		if (atomic_dec_and_test(&node->_users)) {
-			of_node_cleanup(node);
-			return;
-		}
-	}
-	else
-		atomic_dec(&node->_users);
+	if (node)
+		kref_put(&node->kref, of_node_release);
 }
 EXPORT_SYMBOL(of_node_put);
 
@@ -1773,7 +1770,7 @@ int of_add_node(const char *path, struct
 
 	np->properties = proplist;
 	OF_MARK_DYNAMIC(np);
-	of_node_get(np);
+	kref_init(&np->kref);
 	np->parent = derive_parent(path);
 	if (!np->parent) {
 		kfree(np);
@@ -1808,8 +1805,9 @@ static void of_cleanup_node(struct devic
 }
 
 /*
- * Remove an OF device node from the system.
- * Caller should have already "gotten" np.
+ * "Unplug" a node from the device tree.  The caller must hold
+ * a reference to the node.  The memory associated with the node
+ * is not freed until its refcount goes to zero.
  */
 int of_remove_node(struct device_node *np)
 {
@@ -1827,7 +1825,6 @@ int of_remove_node(struct device_node *n
 	of_cleanup_node(np);
 
 	write_lock(&devtree_lock);
-	OF_MARK_STALE(np);
 	remove_node_proc_entries(np);
 	if (allnodes == np)
 		allnodes = np->allnext;
@@ -1852,6 +1849,7 @@ int of_remove_node(struct device_node *n
 	}
 	write_unlock(&devtree_lock);
 	of_node_put(parent);
+	of_node_put(np); /* Must decrement the refcount */
 	return 0;
 }
 
diff -puN include/asm-ppc64/prom.h~ppc64-device_node-use-kref include/asm-ppc64/prom.h
--- linux-2.6.11-rc1-bk1/include/asm-ppc64/prom.h~ppc64-device_node-use-kref	2005-01-13 19:04:09.000000000 -0600
+++ linux-2.6.11-rc1-bk1-nathanl/include/asm-ppc64/prom.h	2005-01-13 19:04:09.000000000 -0600
@@ -149,18 +149,15 @@ struct device_node {
 	struct  proc_dir_entry *pde;       /* this node's proc directory */
 	struct  proc_dir_entry *name_link; /* name symlink */
 	struct  proc_dir_entry *addr_link; /* addr symlink */
-	atomic_t _users;                 /* reference count */
+	struct  kref kref;
 	unsigned long _flags;
 };
 
 extern struct device_node *of_chosen;
 
 /* flag descriptions */
-#define OF_STALE   0 /* node is slated for deletion */
 #define OF_DYNAMIC 1 /* node and properties were allocated via kmalloc */
 
-#define OF_IS_STALE(x) test_bit(OF_STALE, &x->_flags)
-#define OF_MARK_STALE(x) set_bit(OF_STALE, &x->_flags)
 #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, &x->_flags)
 #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, &x->_flags)
 

_
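The kref patch relies on three rules. The count starts at 1 on init, get increments it, and the put that drops it to zero runs the release function. The sketch below illustrates these rules in user space. It uses hypothetical model_kref_* stand-ins rather than the kernel's own API and signatures, and a node type standing in for struct device_node.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-ins for illustration; not the kernel implementation. */
struct model_kref {
	atomic_int refcount;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void model_kref_init(struct model_kref *k)
{
	atomic_store(&k->refcount, 1);
}

static void model_kref_get(struct model_kref *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Returns 1 if this put dropped the last reference and ran release(). */
static int model_kref_put(struct model_kref *k,
			  void (*release)(struct model_kref *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical node type standing in for struct device_node. */
struct node {
	int id;
	struct model_kref kref;
};

static int released;

/* Counterpart of of_node_release(): recover the node from its kref. */
static void node_release(struct model_kref *k)
{
	struct node *n = container_of(k, struct node, kref);

	released++;
	free(n);
}
```

In the patch's terms, the initial reference taken at kref_init() time is the one that of_remove_node() now drops with its extra of_node_put().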


From arnd at arndb.de  Fri Jan 14 20:28:22 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 14 Jan 2005 10:28:22 +0100
Subject: [PATCH] ppc64: Allow EEH to be disabled
In-Reply-To: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com>
References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com>
Message-ID: <200501141028.23317.arnd@arndb.de>

On Friday 14 January 2005 00:51, Anton Blanchard wrote:
> Hi,
> 
> I was thinking of sending this upstream. Any thoughts?
> 
I'm doing something similar in my private tree and I noticed that
init_pci_config_tokens() is currently called by eeh_init().
If you don't build EEH, init_pci_config_tokens() needs to be called
by pSeries_setup_arch(), which makes more sense anyway.

	Arnd <><



From arnd at arndb.de  Fri Jan 14 20:23:07 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 14 Jan 2005 10:23:07 +0100
Subject: [PATCH] PPC64: 32bit wrapper for ioctls.
In-Reply-To: <41E6C1FF.4000203@us.ltcfwd.linux.ibm.com>
References: <41E6C1FF.4000203@us.ltcfwd.linux.ibm.com>
Message-ID: <200501141023.08156.arnd@arndb.de>

On Thursday 13 January 2005 19:46, Mike Wolf wrote:
> Hi Paul,
>   The patch adds some 32bit wrappers for 2 ioctls that Java needs.
> Assuming this doesn't generate a round of discussion, please
> forward upstream to akpm/torvalds.

Why add them to arch/ppc64? These don't look architecture-specific, so they
should go into include/linux/compat_ioctl.h.

> --- linus-0112.orig/arch/ppc64/kernel/ioctl32.c	2005-01-13 10:35:10.165539000 -0600
> +++ linus-0112/arch/ppc64/kernel/ioctl32.c	2005-01-13 10:51:43.450433277 -0600
> @@ -43,6 +43,8 @@
>  COMPATIBLE_IOCTL(TIOCSTART)
>  COMPATIBLE_IOCTL(TIOCSTOP)
>  COMPATIBLE_IOCTL(TIOCSLTC)
> +COMPATIBLE_IOCTL(TIOCMIWAIT)

Note that TIOCMIWAIT is not a COMPATIBLE_IOCTL but a ULONG_IOCTL. That makes
no difference on ppc64, but if you add it to the generic file, ULONG_IOCTL is
needed for s390x.

	Arnd <><



From arnd at arndb.de  Fri Jan 14 20:50:28 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 14 Jan 2005 10:50:28 +0100
Subject: Collect real process and processor utilization values when
	virtualization is enabled.
In-Reply-To: <20050111195127.23300721.akpm@osdl.org>
References: <41E4787D.90309@austin.ibm.com>
	<20050111195127.23300721.akpm@osdl.org>
Message-ID: <200501141050.29068.arnd@arndb.de>

On Wednesday 12 January 2005 04:51, Andrew Morton wrote:
> Manish Ahuja <ahuja at austin.ibm.com> wrote:
> >
> > There is a requirement to collect real usage values of each partition in
> > LPAR environment on pseries as well as iseries.
> 
> What (if any) relationship does this have to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-introduce-cputime.patch ?

I asked Martin the same thing yesterday, and he said that recording
the PURR value as Manish does is needed to support the cputime statistics,
but is not the complete solution.

Manish, did you look at
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-microsecond-based-cputime-for-s390.patch ?
I think you need to do similar things on top of your patch to really export
steal time etc. to user space.

	Arnd <><

From ahuja at austin.ibm.com  Sat Jan 15 06:18:23 2005
From: ahuja at austin.ibm.com (Manish Ahuja)
Date: Fri, 14 Jan 2005 13:18:23 -0600
Subject: Collect real process and processor utilization values when
 virtualization is enabled.
In-Reply-To: <200501141050.29068.arnd@arndb.de>
References: <41E4787D.90309@austin.ibm.com>
	<20050111195127.23300721.akpm@osdl.org>
	<200501141050.29068.arnd@arndb.de>
Message-ID: <41E81AFF.3020005@austin.ibm.com>

Arnd Bergmann wrote:

>I asked Martin the same thing yesterday, and he said that recording
>the PURR value as Manish does is needed to support the cputime statistics,
>but is not the complete solution.
>
>Manish, did you look at
>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-microsecond-based-cputime-for-s390.patch ?
>I think you need to do similar things on top of your patch to really export
>steal time etc. to user space.
>
>	Arnd <><
>  
>

Yup,

There is another piece that will tie in with Martin's patch. This piece
is needed by the CKRM folks to enable the process accounting feature,
as well as by Jeff Scheel, who uses the output for his calculations.

Manish


From anton at samba.org  Sat Jan 15 10:49:20 2005
From: anton at samba.org (Anton Blanchard)
Date: Sat, 15 Jan 2005 10:49:20 +1100
Subject: [PATCH] ppc64: Allow EEH to be disabled
In-Reply-To: <200501141028.23317.arnd@arndb.de>
References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com>
	<200501141028.23317.arnd@arndb.de>
Message-ID: <20050114234920.GM6309@krispykreme.ozlabs.ibm.com>


Hi,

> I'm doing something similar in my private tree and I noticed that
> init_pci_config_tokens() is currently called by eeh_init().
> If you don't build EEH, init_pci_config_tokens() needs to be called
> by pSeries_setup_arch(), which makes more sense anyway.

Good point :) We also had PCI disabled so never saw this.

Anton


From anton at samba.org  Sat Jan 15 11:00:55 2005
From: anton at samba.org (Anton Blanchard)
Date: Sat, 15 Jan 2005 11:00:55 +1100
Subject: [PATCH] ppc64: lacks definition of MM_VM_SIZE()
In-Reply-To: <1105714076.26551.243.camel@hades.cambridge.redhat.com>
References: <1105714076.26551.243.camel@hades.cambridge.redhat.com>
Message-ID: <20050115000055.GO6309@krispykreme.ozlabs.ibm.com>


David: you have to send me some spare Signed-off-by's :)

Anton

--

From: David Woodhouse <dwmw2 at infradead.org>

We don't set MM_VM_SIZE() on ppc64, so it defaults to TASK_SIZE. This
means that a 32-bit process which ends up in exit_mmap() tearing down a
64-bit mm may call tlb_finish_mmu() with an incorrect (too small) 'end'
argument.

Signed-off-by: Anton Blanchard <anton at samba.org>

===== include/asm-ppc64/processor.h 1.59 vs edited =====
--- 1.59/include/asm-ppc64/processor.h	Tue Jan 11 01:29:24 2005
+++ edited/include/asm-ppc64/processor.h	Fri Jan 14 14:42:44 2005
@@ -537,6 +537,10 @@
 #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
 		TASK_SIZE_USER32 : TASK_SIZE_USER64)
 
+/* We can't actually tell the TASK_SIZE given just the mm, but default
+ * to the 64-bit case to make sure that enough gets cleaned up. */
+#define MM_VM_SIZE(mm)	TASK_SIZE_USER64
+
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */
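To make the failure mode concrete, here is a small userspace model of the difference between a per-caller TASK_SIZE and a per-mm upper bound. It is a sketch under stated assumptions: the two limits are illustrative values (4 GB for 32-bit, a larger assumed 64-bit limit), not constants copied from processor.h, and every `*_sketch` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limits only (assumptions, not copied from processor.h). */
#define SKETCH_TASK_SIZE_USER32 UINT64_C(0x0000000100000000) /* 4 GB */
#define SKETCH_TASK_SIZE_USER64 UINT64_C(0x0000100000000000) /* assumed 64-bit limit */

static_assert(SKETCH_TASK_SIZE_USER32 < SKETCH_TASK_SIZE_USER64,
	      "the 64-bit bound must cover the 32-bit one");

/* Models TASK_SIZE: it depends on the *calling* thread, not on the mm. */
static uint64_t task_size_sketch(bool caller_is_32bit)
{
	return caller_is_32bit ? SKETCH_TASK_SIZE_USER32 : SKETCH_TASK_SIZE_USER64;
}

/* Models the patched MM_VM_SIZE(mm): always the 64-bit upper bound. */
static uint64_t mm_vm_size_sketch(void)
{
	return SKETCH_TASK_SIZE_USER64;
}

/* Does a teardown covering [0, end) reach a mapping starting at addr? */
static bool teardown_covers_sketch(uint64_t end, uint64_t addr)
{
	return addr < end;
}
```

With a 64-bit mm that has a mapping at 8 GB, a 32-bit caller using `task_size_sketch(true)` as the 'end' would stop at 4 GB and miss it, while the patched default covers it; over-estimating is harmless here, which is why the patch errs toward the 64-bit value.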


From dwmw2 at infradead.org  Sat Jan 15 11:31:41 2005
From: dwmw2 at infradead.org (David Woodhouse)
Date: Sat, 15 Jan 2005 00:31:41 +0000
Subject: [PATCH] ppc64: lacks definition of MM_VM_SIZE()
In-Reply-To: <20050115000055.GO6309@krispykreme.ozlabs.ibm.com>
References: <1105714076.26551.243.camel@hades.cambridge.redhat.com>
	<20050115000055.GO6309@krispykreme.ozlabs.ibm.com>
Message-ID: <1105749101.30759.109.camel@baythorne.infradead.org>

On Sat, 2005-01-15 at 11:00 +1100, Anton Blanchard wrote:
> David: you have to send me some spare Signed-off-by's :)

Get Paulus to give you some spares. I'm sure he's losing them.

Signed-off-by: David Woodhouse <dwmw2 at infradead.org>

-- 
dwmw2


From mingo at elte.hu  Sun Jan 16 01:25:37 2005
From: mingo at elte.hu (Ingo Molnar)
Date: Sat, 15 Jan 2005 15:25:37 +0100
Subject: [patch] spin-nicer-2.6.11-rc1-A0
In-Reply-To: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>
References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>
Message-ID: <20050115142537.GD10114@elte.hu>


* Paul Mackerras <paulus at samba.org> wrote:

> This patch fixes a problem I have been seeing since all the preempt
> changes went in, which is that ppc64 SMP systems would livelock
> randomly if preempt was enabled.
> 
> It turns out that what was happening was that one cpu was spinning in
> spin_lock_irq (the version at line 215 of kernel/spinlock.c) madly
> doing preempt_enable() and preempt_disable() calls.  The other cpu had
> the lock and was trying to set the TIF_NEED_RESCHED flag for the task
> running on the first cpu.  That is an atomic operation which has to be
> retried if another cpu writes to the same cacheline between the load
> and the store, which the other cpu was doing every time it did
> preempt_enable() or preempt_disable().

ahh ... indeed. Nice catch.
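The fix Paul describes next (moving `flags` onto its own cache line) is a cure for false sharing: the owner cpu keeps writing `preempt_count` while a remote cpu tries to set a bit in `flags`, and both live on one line. A minimal layout sketch, assuming a 128-byte cache line; `struct ti_sketch` is a hypothetical stand-in, not the real `struct thread_info`.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE 128	/* assumed L1 line size */

/* Fields written only by the owning cpu stay together; 'flags', which other
 * cpus modify (e.g. to set a need-resched bit), starts a new cache line so
 * the remote atomic update no longer competes with preempt_count writes. */
struct ti_sketch {
	void *task;
	int cpu;
	int preempt_count;	/* written on every preempt_enable/disable */
	_Alignas(SKETCH_CACHE_LINE) unsigned long flags;
};

static_assert(offsetof(struct ti_sketch, flags) % SKETCH_CACHE_LINE == 0,
	      "flags must start a cache line");

/* Nonzero if two field offsets share a cache line, assuming the struct
 * itself is cache-line aligned (its _Alignas member guarantees that). */
static int same_line_sketch(size_t a, size_t b)
{
	return a / SKETCH_CACHE_LINE == b / SKETCH_CACHE_LINE;
}
```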

> I decided to move the thread_info flags field into the next cache
> line, since it is the only field that would regularly be modified by
> cpus other than the one running the task that owns the thread_info.
> (OK possibly the `cpu' field would be on a rebalance; I don't know the
> rebalancing code, but that should be pretty infrequent.)  Thus, moving
> the flags field seems like a good idea generally as well as solving
> the immediate problem.
> 
> For the record I am pretty unhappy with the code we use for spin_lock
> et al. with preemption turned on (the BUILD_LOCK_OPS stuff in
> spinlock.c).  For a start we do the atomic op (_raw_spin_trylock) each
> time around the loop.  That is going to be generating a lot of
> unnecessary bus (or fabric) traffic.  Instead, after we fail to get
> the lock we should poll it with simple loads until we see that it is
> clear and then retry the atomic op.  Assuming a reasonable cache
> design, the loads won't generate any bus traffic until another cpu
> writes to the cacheline containing the lock.

agreed. How about the patch below? (tested on x86)

> Secondly we have lost the __spin_yield call that we had on ppc64,
> which is an important optimization when we are running under the
> hypervisor.  I can't just put that in cpu_relax because I need to know
> which (virtual) cpu is holding the lock, so that I can tell the
> hypervisor which virtual cpu to give my time slice to.  That
> information is stored in the lock variable, which is why __spin_yield
> needs the address of the lock.

hm, how about calling __spin_yield() from _raw_spin_trylock(), if the
locking attempt was unsuccessful? This might be slightly incorrect if
the locking attempt is not connected to an actual spin-loop, but we do
have other spin-loops with open-coded trylocks that would benefit from
this optimization too.
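The "poll with plain loads, then retry the atomic op" loop Paul asks for is usually called test-and-test-and-set. A minimal C11 sketch, assuming `<stdatomic.h>`; `sketch_lock_t`, `sketch_spin_lock()` and the `relax` callback are hypothetical names, not kernel API, and the callback stands in for cpu_relax() (or a hypervisor yield hook).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
	atomic_int locked;	/* 0 = free, 1 = held */
} sketch_lock_t;

static bool sketch_trylock(sketch_lock_t *l)
{
	int expected = 0;
	/* The only read-modify-write: it pulls the line in exclusive state. */
	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

static void sketch_spin_unlock(sketch_lock_t *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Returns the number of trylock attempts, for illustration only. */
static unsigned sketch_spin_lock(sketch_lock_t *l, void (*relax)(sketch_lock_t *))
{
	unsigned attempts = 1;

	assert(l != NULL);
	while (!sketch_trylock(l)) {
		/* Spin on plain loads: the line stays shared in our cache and
		 * causes no bus traffic until the holder writes to it. */
		while (atomic_load_explicit(&l->locked, memory_order_relaxed))
			relax(l);
		attempts++;
	}
	return attempts;
}

/* Demo only: pretends the current holder drops the lock while we poll. */
static void simulated_holder_release(sketch_lock_t *l)
{
	sketch_spin_unlock(l);
}
```

An uncontended acquire takes one attempt and never calls `relax`; a contended one polls until the simulated holder releases, then retries the atomic op once.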

	Ingo

Signed-off-by: Ingo Molnar <mingo at elte.hu>

--- linux/kernel/spinlock.c.orig
+++ linux/kernel/spinlock.c
@@ -173,7 +173,7 @@ EXPORT_SYMBOL(_write_lock);
  * (We do this in a function because inlining it would be excessive.)
  */
 
-#define BUILD_LOCK_OPS(op, locktype)					\
+#define BUILD_LOCK_OPS(op, locktype, is_locked_fn)			\
 void __lockfunc _##op##_lock(locktype *lock)				\
 {									\
 	preempt_disable();						\
@@ -183,7 +183,8 @@ void __lockfunc _##op##_lock(locktype *l
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		cpu_relax();						\
+		while (is_locked_fn(lock) && (lock)->break_lock)	\
+			cpu_relax();					\
 		preempt_disable();					\
 	}								\
 }									\
@@ -204,6 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
+		while (spin_is_locked(lock) && (lock)->break_lock)	\
+			cpu_relax();					\
 		cpu_relax();						\
 		preempt_disable();					\
 	}								\
@@ -244,9 +247,9 @@ EXPORT_SYMBOL(_##op##_lock_bh)
  *         _[spin|read|write]_lock_irqsave()
  *         _[spin|read|write]_lock_bh()
  */
-BUILD_LOCK_OPS(spin, spinlock_t);
-BUILD_LOCK_OPS(read, rwlock_t);
-BUILD_LOCK_OPS(write, rwlock_t);
+BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked);
+BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked);
+BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked);
 
 #endif /* CONFIG_PREEMPT */
 

From mingo at elte.hu  Sun Jan 16 01:38:05 2005
From: mingo at elte.hu (Ingo Molnar)
Date: Sat, 15 Jan 2005 15:38:05 +0100
Subject: [patch] spin-nicer-2.6.11-rc1-A0
In-Reply-To: <20050115142537.GD10114@elte.hu>
References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>
	<20050115142537.GD10114@elte.hu>
Message-ID: <20050115143805.GA15041@elte.hu>


* Ingo Molnar <mingo at elte.hu> wrote:

> agreed. How about the patch below? (tested on x86)

updated patch below.

	Ingo

Signed-off-by: Ingo Molnar <mingo at elte.hu>

--- linux/kernel/spinlock.c.orig
+++ linux/kernel/spinlock.c
@@ -173,7 +173,7 @@ EXPORT_SYMBOL(_write_lock);
  * (We do this in a function because inlining it would be excessive.)
  */
 
-#define BUILD_LOCK_OPS(op, locktype)					\
+#define BUILD_LOCK_OPS(op, locktype, is_locked_fn)			\
 void __lockfunc _##op##_lock(locktype *lock)				\
 {									\
 	preempt_disable();						\
@@ -183,7 +183,8 @@ void __lockfunc _##op##_lock(locktype *l
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		cpu_relax();						\
+		while (is_locked_fn(lock) && (lock)->break_lock)	\
+			cpu_relax();					\
 		preempt_disable();					\
 	}								\
 }									\
@@ -204,7 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		cpu_relax();						\
+		while (is_locked_fn(lock) && (lock)->break_lock)	\
+			cpu_relax();					\
 		preempt_disable();					\
 	}								\
 	return flags;							\
@@ -244,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh)
  *         _[spin|read|write]_lock_irqsave()
  *         _[spin|read|write]_lock_bh()
  */
-BUILD_LOCK_OPS(spin, spinlock_t);
-BUILD_LOCK_OPS(read, rwlock_t);
-BUILD_LOCK_OPS(write, rwlock_t);
+BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked);
+BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked);
+BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked);
 
 #endif /* CONFIG_PREEMPT */
 

From mingo at elte.hu  Sun Jan 16 01:00:44 2005
From: mingo at elte.hu (Ingo Molnar)
Date: Sat, 15 Jan 2005 15:00:44 +0100
Subject: [PATCH] PPC64 can do preempt debug too
In-Reply-To: <16870.20786.164419.188120@cargo.ozlabs.ibm.com>
References: <16870.20786.164419.188120@cargo.ozlabs.ibm.com>
Message-ID: <20050115140044.GB10114@elte.hu>


* Paul Mackerras <paulus at samba.org> wrote:

> This patch enables the DEBUG_PREEMPT config option for PPC64.  I have
> this turned on on my desktop G5 and it isn't finding any problems. (It
> did find one problem, in flush_tlb_pending(), that I have just sent a
> patch for.)
> 
> BTW, do we really need to restrict which architectures the config
> option is available on?

in the case of x86 (and x64) i found that there were a fair number of
false positives in arch-level code. But i agree that we should (now)
make the config option available to all architectures - patch against
2.6.11-rc1 below.

	Ingo

Signed-off-by: Ingo Molnar <mingo at elte.hu>

--- linux/lib/Kconfig.debug.orig
+++ linux/lib/Kconfig.debug
@@ -50,7 +50,7 @@ config DEBUG_SLAB
 
 config DEBUG_PREEMPT
 	bool "Debug preemptible kernel"
-	depends on PREEMPT && X86
+	depends on PREEMPT
 	default y
 	help
 	  If you say Y here then the kernel will use a debug variant of the
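Roughly, the debug variant of smp_processor_id() warns when the caller could migrate between reading the cpu number and using it. A simplified model, assuming the escape hatches of the 2.6-era check are a nonzero preempt count, disabled interrupts, or a task bound to a single cpu (an approximation; the real function has further cases). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct cpu_ctx_sketch {
	int preempt_count;	/* > 0 inside preempt_disable() sections */
	bool irqs_disabled;
	bool bound_to_one_cpu;	/* cpus_allowed contains only this cpu */
};

/* True if using the current cpu id is safe here (no migration possible);
 * false is the case where the debug variant would print a warning. */
static bool cpu_id_use_is_safe(const struct cpu_ctx_sketch *c)
{
	assert(c->preempt_count >= 0);
	return c->preempt_count > 0 || c->irqs_disabled || c->bound_to_one_cpu;
}
```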


From mingo at elte.hu  Sun Jan 16 01:04:38 2005
From: mingo at elte.hu (Ingo Molnar)
Date: Sat, 15 Jan 2005 15:04:38 +0100
Subject: [PATCH] PPC64 Call preempt_schedule on exception exit
In-Reply-To: <16870.20576.417821.693961@cargo.ozlabs.ibm.com>
References: <16870.20576.417821.693961@cargo.ozlabs.ibm.com>
Message-ID: <20050115140438.GC10114@elte.hu>


* Paul Mackerras <paulus at samba.org> wrote:

> This patch mirrors the recent changes on x86 to call preempt_schedule
> rather than schedule in the exception exit path, in the case where the
> preempt_count is zero and the TIF_NEED_RESCHED bit is set.
> 
> I'm a little concerned that this means that we have a window where
> interrupts are enabled and we are on our way into preempt_schedule,
> but preempt_count is still zero.  Ingo's proposed preempt_schedule_irq
> would fix this, and I think something like that should go in.

the preempt_schedule_irq() patch is in 2.6.11-rc1-mm1 now, does it look
good to you? ppc64 should be able to call it directly from lowlevel
code.
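The point of a preempt_schedule_irq()-style entry is to raise the preempt count before interrupts are re-enabled, which closes the window Paul describes. A simplified single-threaded model of that ordering, not the code in -mm: the `SKETCH_PREEMPT_ACTIVE` value and all names are hypothetical, and "interrupts" are just a flag.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PREEMPT_ACTIVE 0x10000000	/* hypothetical flag value */

struct sched_state_sketch {
	int preempt_count;
	bool irqs_enabled;
	int need_resched_rounds;	/* how many schedule() calls are needed */
	int schedules;
	bool window_seen;		/* irqs on while preempt_count == 0 */
};

static void irq_enable_sketch(struct sched_state_sketch *s)
{
	s->irqs_enabled = true;
	if (s->preempt_count == 0)
		s->window_seen = true;	/* an interrupt here could nest a preemption */
}

static void schedule_sketch(struct sched_state_sketch *s)
{
	s->schedules++;
	if (s->need_resched_rounds > 0)
		s->need_resched_rounds--;
}

/* Entered with irqs off, returns with irqs off. */
static void preempt_schedule_irq_sketch(struct sched_state_sketch *s)
{
	assert(!s->irqs_enabled);
	do {
		s->preempt_count += SKETCH_PREEMPT_ACTIVE;	/* mark first... */
		irq_enable_sketch(s);				/* ...then enable */
		schedule_sketch(s);
		s->irqs_enabled = false;			/* local_irq_disable() */
		s->preempt_count -= SKETCH_PREEMPT_ACTIVE;
	} while (s->need_resched_rounds > 0);
}

/* Exception-exit check: preempt only if the count is zero and resched is due. */
static void exception_exit_sketch(struct sched_state_sketch *s)
{
	if (s->preempt_count == 0 && s->need_resched_rounds > 0)
		preempt_schedule_irq_sketch(s);
}
```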

	Ingo


From benh at kernel.crashing.org  Sun Jan 16 09:23:13 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 16 Jan 2005 09:23:13 +1100
Subject: [PATCH] PPC64 pmac hotplug cpu
In-Reply-To: <Pine.LNX.4.61.0501122341410.23299@montezuma.fsmlabs.com>
References: <Pine.LNX.4.61.0501122341410.23299@montezuma.fsmlabs.com>
Message-ID: <1105827794.27410.82.camel@gaston>

On Thu, 2005-01-13 at 17:43 -0700, Zwane Mwaikambo wrote:
> I found the following very handy for use as a reference platform when 
> working on i386 hotplug cpu recently.
> 
> It's been tested on a G5 system with a cpu going on/offline every second 
> and make -j. I've also tried a number of config options to avoid compile 
> breakage.

Hi !

Looks good, but you could do even better :) I still want to look at the
proper mechanism to flush the CPU cache on 970, but the idea here is to
flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode)
with the caches clean and MSR:EE off. We can later get it back with a
soft reset.

Ben.


From benh at kernel.crashing.org  Sun Jan 16 09:29:21 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 16 Jan 2005 09:29:21 +1100
Subject: ioremap of pci region on pSeries LPAR vs SMP
In-Reply-To: <20050111221723.GE23690@austin.ibm.com>
References: <20050110074930.92901.qmail@web11508.mail.yahoo.com>
	<16866.18083.212727.327170@cargo.ozlabs.ibm.com>
	<20050110174716.GW22274@austin.ibm.com>
	<16866.63132.352016.732484@cargo.ozlabs.ibm.com>
	<20050111000845.GC14239@krispykreme.ozlabs.ibm.com>
	<20050111221723.GE23690@austin.ibm.com>
Message-ID: <1105828161.27410.84.camel@gaston>

On Tue, 2005-01-11 at 16:17 -0600, Linas Vepstas wrote:
> On Tue, Jan 11, 2005 at 11:08:45AM +1100, Anton Blanchard was heard to remark:
> > 
> > Ive seen HPC stuff that wants to be able to mmap a PCI cards resources into
> > userspace. Their hack on ppc64 was to look at the high nibble of the
> > address and convert it to a non EEH address if required :)
> > 
> > Im not sure how best to solve the userspace mmap issue but there are a
> > few groups wanting that.
> 
> Somewhat off-topic ... but ...
> 
> 1) If you design your hardware correctly, there are some amazing things
>    you can do (performance wise) by mmaping pci card resources into user
>    space.  If your hardware is done right, then user corruption can't 
>    hurt the system. This was the de facto method for getting high 
>    performance graphics on IBM RS/6000, sgi, HP and Sun workstations 
>    many moons ago.

And that's exactly what X does still today on pretty much all
machines :)

> 2) There is interest in the virtual i/o community about mmaping 
>    funky stuff to userspace, but that conversation may be for a 
>    different day.  The question is (for example) how to build
>    a high-performance virtual scsi server in userspace (without
>    kernel pieces) which is a design point some people like.
>    Later...
> 
> --linas
-- 
Benjamin Herrenschmidt <benh at kernel.crashing.org>


From benh at kernel.crashing.org  Sun Jan 16 09:36:37 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 16 Jan 2005 09:36:37 +1100
Subject: [PATCH] htab code cleanup
In-Reply-To: <20050106145102.0c3c60ad.sfr@canb.auug.org.au>
References: <20050106145102.0c3c60ad.sfr@canb.auug.org.au>
Message-ID: <1105828597.27435.88.camel@gaston>

On Thu, 2005-01-06 at 14:51 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> This patch just does some small cleanups on the hash page table code
> 	- make htab_address static within htab_native.c
> 	- move some code that depended on CONFIG_PPC_MULTIPLATFORM
> 	  from htab_utils.c to htab_native.c (one less CONFIG check).
> 	- clean up includes in htab_utils.c

I don't see the point of moving create_pte_mapping() and
htab_initialize() to htab_native.c since it contains code for both
native and non-native...

If you want to get rid of the htab_address, then maybe split
htab_initialize in bits... like htab_native_init() and htab_plpar_init()
for the early ptr setup, that sort of thing ...

Ben.


From benh at kernel.crashing.org  Sun Jan 16 09:44:27 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 16 Jan 2005 09:44:27 +1100
Subject: [PATCH] sparse fixes for cpu feature constants
In-Reply-To: <20050101223345.GC2297@zax>
References: <1104381206.16694.38.camel@localhost.localdomain>
	<20050101223345.GC2297@zax>
Message-ID: <1105829067.27411.92.camel@gaston>

On Sun, 2005-01-02 at 09:33 +1100, David Gibson wrote:

> > switch_mm() uses a BEGIN_FTR_SECTION ...
> > END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) which gets broken by the change
> > since 0x0000000000000008UL winds up in the generated assembly.  I
> > couldn't find the BEGIN/END_FTR_SECTION construct used in any other C
> > code, so I replaced this with the usual bitwise 'and' conditional (I
> > hope someone else will verify that this is equivalent :).
> > 
> > So, does this look like the right thing to do?  It eliminates 129 sparse
> > warnings from a defconfig 2.6.10 build.

Hrm... it's a bit annoying. You are replacing a dynamic patching of the
code with a runtime test... killing a (small, though) optimisation.

There may be other cases where I want to use the CPU feature stuff in
inline assembly..... Not sure what the right fix is, maybe passing the
constant to the asm via the inputs as "i" ...
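The breakage Nathan hit, and why an "i" operand avoids it, can be shown without ppc64 hardware. A sketch assuming GCC/Clang extended asm; the macros are modelled on the C-side ASM_CONST in asm-ppc64/page.h but carry hypothetical `SKETCH_` names. Pasting the macro into asm text drags the `UL` suffix along, whereas an "i" operand is evaluated by the compiler, which prints the value itself when the template references the operand (e.g. as `%0`).

```c
#include <assert.h>
#include <string.h>

/* Modelled on the C-side ASM_CONST: the UL suffix keeps sparse happy. */
#define SKETCH_ASM_CONST_PASTE(x)	x##UL
#define SKETCH_ASM_CONST(x)		SKETCH_ASM_CONST_PASTE(x)
#define SKETCH_CPU_FTR_ALTIVEC		SKETCH_ASM_CONST(0x0000000000000008)

#define SKETCH_STR(x)	#x
#define SKETCH_XSTR(x)	SKETCH_STR(x)

static_assert(SKETCH_CPU_FTR_ALTIVEC == 8, "still a C constant expression");

/* What a macro that pastes the feature bit into asm text would emit. */
static const char *pasted_text(void)
{
	return SKETCH_XSTR(SKETCH_CPU_FTR_ALTIVEC);
}

/* Nonzero if the text ends in a C suffix the assembler cannot parse. */
static int ends_with_ul(const char *s)
{
	size_t n = strlen(s);
	return n >= 2 && strcmp(s + n - 2, "UL") == 0;
}

/* Passing the bit as an "i" operand instead: the empty template only shows
 * that the suffixed C expression is accepted as an immediate. */
static unsigned long feature_via_operand(void)
{
	__asm__ volatile ("" : : "i" (SKETCH_CPU_FTR_ALTIVEC));
	return SKETCH_CPU_FTR_ALTIVEC;
}
```

Whether feature-section macros can be rewritten to take their mask this way is left open in the thread; the sketch only shows that the operand route sidesteps the suffix.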

Ben.


From paulus at samba.org  Sun Jan 16 09:54:22 2005
From: paulus at samba.org (Paul Mackerras)
Date: Sun, 16 Jan 2005 09:54:22 +1100
Subject: [patch] spin-nicer-2.6.11-rc1-A0
In-Reply-To: <20050115143805.GA15041@elte.hu>
References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>
	<20050115142537.GD10114@elte.hu> <20050115143805.GA15041@elte.hu>
Message-ID: <16873.40734.485466.850449@cargo.ozlabs.ibm.com>

Ingo Molnar writes:

> +BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked);
> +BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked);

I don't think this is right - this means that a cpu trying to acquire
a read lock will spin while any other cpu has a read lock.  We need to
invent and use a rwlock_is_write_locked() here.  PPC64 and parisc have
an is_write_locked() already, and it shouldn't be too hard to do one
for the other architectures (i386 wants (signed int)rw->lock <= 0,
most other arches seem to need (signed int)rw->lock < 0).
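The two encodings Paul mentions can be modelled directly. A sketch assuming an i386-style biased counter (bias 0x01000000, each reader subtracts 1, a writer subtracts the whole bias) and a signed counter for the other arches (0 = free, readers count up, a writer stores a negative value); the function names are hypothetical. It also shows why `rwlock_is_locked` is the wrong predicate for a waiting reader: it is true whenever other readers hold the lock.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RW_BIAS 0x01000000	/* assumed i386 bias */

static_assert(SKETCH_RW_BIAS > 0, "read-held values must stay positive");

/* Biased encoding: free == BIAS, n readers == BIAS - n, writer == 0
 * (transiently negative while readers back out). */
static bool is_write_locked_biased(int lock)
{
	return lock <= 0;
}

/* Any holder at all, which is what rwlock_is_locked() answers. */
static bool is_locked_biased(int lock)
{
	return lock != SKETCH_RW_BIAS;
}

/* Signed encoding: free == 0, n readers == n, writer < 0. */
static bool is_write_locked_signed(int lock)
{
	return lock < 0;
}
```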

> +BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked);

This one should be rwlock_is_locked, surely?  Otherwise the compiler
will grizzle about us calling spin_is_locked with a rwlock_t *.

Regards,
Paul.


From paulus at samba.org  Sun Jan 16 14:04:27 2005
From: paulus at samba.org (Paul Mackerras)
Date: Sun, 16 Jan 2005 14:04:27 +1100
Subject: [patch] spin-nicer-2.6.11-rc1-A0
In-Reply-To: <20050115142537.GD10114@elte.hu>
References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>
	<20050115142537.GD10114@elte.hu>
Message-ID: <16873.55739.214904.473407@cargo.ozlabs.ibm.com>

Ingo Molnar writes:

> * Paul Mackerras <paulus at samba.org> wrote:
> 
> > Secondly we have lost the __spin_yield call that we had on ppc64,
> > which is an important optimization when we are running under the
> > hypervisor.  I can't just put that in cpu_relax because I need to know
> > which (virtual) cpu is holding the lock, so that I can tell the
> > hypervisor which virtual cpu to give my time slice to.  That
> > information is stored in the lock variable, which is why __spin_yield
> > needs the address of the lock.
> 
> hm, how about calling __spin_yield() from _raw_spin_trylock(), if the
> locking attempt was unsuccessful? This might be slightly incorrect if
> the locking attempt is not connected to an actual spin-loop, but we do
> have other spin-loops with open-coded trylocks that would benefit from
> this optimization too.

That would help, but we also need to yield while we are polling the
lock until it becomes available.  Otherwise we will only yield once;
if we get another timeslice and the other cpu still hasn't finished
with the lock (or another cpu has got it now), we will spin uselessly
for the whole of our timeslice.  Thus I think we need to yield in the
polling loop, whether or not we also yield in _raw_spin_trylock.
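A sketch of yielding inside the polling loop, assuming (as Paul says) that the lock word records which virtual cpu holds it. `hv_confer_sketch()` is a hypothetical stand-in for the hypervisor call that donates the time slice; here it only counts calls, and a simulated holder frees the lock after a set number of yields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Lock word: 0 = free, otherwise (holder cpu id + 1). */
typedef struct { atomic_uint word; } yield_lock_t;

static unsigned yields_done;
static unsigned release_after;	/* simulated holder frees the lock after N yields */

static void hv_confer_sketch(yield_lock_t *l, unsigned holder_cpu)
{
	(void)holder_cpu;	/* a real call would name this virtual cpu */
	if (++yields_done >= release_after)
		atomic_store_explicit(&l->word, 0, memory_order_release);
}

static bool yield_trylock(yield_lock_t *l, unsigned my_cpu)
{
	unsigned expected = 0;
	return atomic_compare_exchange_strong_explicit(&l->word, &expected, my_cpu + 1,
			memory_order_acquire, memory_order_relaxed);
}

static void yield_spin_lock(yield_lock_t *l, unsigned my_cpu)
{
	assert(my_cpu + 1 != 0);
	while (!yield_trylock(l, my_cpu)) {
		unsigned w;
		/* Yield on every poll, not once per failed trylock, so a
		 * preempted holder keeps receiving our time slices. */
		while ((w = atomic_load_explicit(&l->word, memory_order_relaxed)) != 0)
			hv_confer_sketch(l, w - 1);
	}
}
```

If the holder needs three of our slices to finish, a single yield from the failed trylock would leave us spinning uselessly for the remaining two; yielding per poll hands over all three.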

Regards,
Paul.


From anton at samba.org  Sun Jan 16 16:19:04 2005
From: anton at samba.org (Anton Blanchard)
Date: Sun, 16 Jan 2005 16:19:04 +1100
Subject: ppc64 xics.c: what is smp_threads_ready exactly used for?
In-Reply-To: <20050116043356.GM4274@stusta.de>
References: <20050116043356.GM4274@stusta.de>
Message-ID: <20050116051904.GP6309@krispykreme.ozlabs.ibm.com>

 
Hi,

> during a cleanup, I stumbled upon the following:
> 
> 
> arch/ppc64/kernel/smp.c (in 2.6.11-rc1-mm1) says:
> 
>         /* XXX fix this, xics currently relies on it - Anton */
>         smp_threads_ready = 1;
> 
> 
> arch/ppc64/kernel/xics.c is the _only_ place in the whole kernel where 
> smp_threads_ready is actually used, and this is the _only_ place where 
> smp_threads_ready ever changes its value on ppc64.

It turns out I was about to submit a patch to remove the ppc64 use of
smp_threads_ready. With that patch it makes sense to kill
smp_threads_ready completely.

Anton


From anton at samba.org  Sun Jan 16 16:55:23 2005
From: anton at samba.org (Anton Blanchard)
Date: Sun, 16 Jan 2005 16:55:23 +1100
Subject: [PATCH] ppc64: Remove CONFIG_IRQ_ALL_CPUS
In-Reply-To: <20050116051904.GP6309@krispykreme.ozlabs.ibm.com>
References: <20050116043356.GM4274@stusta.de>
	<20050116051904.GP6309@krispykreme.ozlabs.ibm.com>
Message-ID: <20050116055523.GQ6309@krispykreme.ozlabs.ibm.com>

 
Replace CONFIG_IRQ_ALL_CPUS with a boot option (noirqdistrib). Compile
options aren't much use on a distro kernel. This also removes the ppc64
use of smp_threads_ready.

I considered removing the option completely but we have had problems in
the past with firmware bugs. In those cases the boot option would have
helped. 
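The effect of the patch is that a compile-time #ifdef becomes a runtime flag, and get_irq_server() checks that flag first. A userspace sketch of both pieces, under assumptions: cpumasks are plain bitmaps, hard cpu id equals logical cpu id, the server numbers are made up, and a tiny tokenizer stands in for the kernel's __setup() machinery. All `*_sketch` names are hypothetical.

```c
#include <assert.h>
#include <string.h>

static int distribute_irqs_sketch = 1;

/* Models the __setup("noirqdistrib", ...) handler; 1 = option consumed. */
static int setup_noirqdistrib_sketch(char *str)
{
	(void)str;
	distribute_irqs_sketch = 0;
	return 1;
}

/* Hypothetical mini command-line scan standing in for __setup(). */
static void parse_cmdline_sketch(const char *cmdline)
{
	char buf[256];

	assert(cmdline != NULL);
	strncpy(buf, cmdline, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " "))
		if (strcmp(tok, "noirqdistrib") == 0)
			setup_noirqdistrib_sketch(tok);
}

#define MASK_ALL_SKETCH (~0UL)

/* Mirrors the decision order of the patched get_irq_server(). */
static int get_irq_server_sketch(unsigned long affinity, unsigned long online,
				 int default_server, int default_distrib_server)
{
	unsigned long tmp;
	int cpu = 0;

	if (!distribute_irqs_sketch)
		return default_server;
	if (affinity == MASK_ALL_SKETCH)
		return default_distrib_server;
	tmp = affinity & online;
	if (tmp == 0)
		return default_distrib_server;
	while (!(tmp & 1UL)) {		/* first_cpu(): lowest set bit */
		tmp >>= 1;
		cpu++;
	}
	return cpu;
}
```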

Signed-off-by: Anton Blanchard <anton at samba.org>

===== arch/ppc64/Kconfig 1.76 vs edited =====
--- 1.76/arch/ppc64/Kconfig	2005-01-16 09:31:06 +11:00
+++ edited/arch/ppc64/Kconfig	2005-01-16 16:48:43 +11:00
@@ -186,14 +186,6 @@
 
 	  If you don't know what to do here, say Y.
 
-config IRQ_ALL_CPUS
-	bool "Distribute interrupts on all CPUs by default"
-	depends on SMP && PPC_MULTIPLATFORM
-	help
-	  This option gives the kernel permission to distribute IRQs across
-	  multiple CPUs.  Saying N here will route all IRQs to the first
-	  CPU.
-
 config NR_CPUS
 	int "Maximum number of CPUs (2-128)"
 	range 2 128
===== arch/ppc64/kernel/irq.c 1.74 vs edited =====
--- 1.74/arch/ppc64/kernel/irq.c	2005-01-05 13:48:02 +11:00
+++ edited/arch/ppc64/kernel/irq.c	2005-01-16 16:48:47 +11:00
@@ -62,6 +62,7 @@
 
 extern irq_desc_t irq_desc[NR_IRQS];
 
+int distribute_irqs = 1;
 int __irq_offset_value;
 int ppc_spurious_interrupts;
 unsigned long lpevent_count;
@@ -479,3 +480,10 @@
 
 #endif /* CONFIG_IRQSTACKS */
 
+static int __init setup_noirqdistrib(char *str)
+{
+	distribute_irqs = 0;
+	return 1;
+}
+
+__setup("noirqdistrib", setup_noirqdistrib);
===== arch/ppc64/kernel/mpic.c 1.3 vs edited =====
--- 1.3/arch/ppc64/kernel/mpic.c	2004-11-16 14:29:10 +11:00
+++ edited/arch/ppc64/kernel/mpic.c	2005-01-16 16:48:44 +11:00
@@ -765,10 +765,8 @@
 #ifdef CONFIG_SMP
 	struct mpic *mpic = mpic_primary;
 	unsigned long flags;
-#ifdef CONFIG_IRQ_ALL_CPUS
 	u32 msk = 1 << hard_smp_processor_id();
 	unsigned int i;
-#endif
 
 	BUG_ON(mpic == NULL);
 
@@ -776,16 +774,16 @@
 
 	spin_lock_irqsave(&mpic_lock, flags);
 
-#ifdef CONFIG_IRQ_ALL_CPUS
  	/* let the mpic know we want intrs. default affinity is 0xffffffff
 	 * until changed via /proc. That's how it's done on x86. If we want
 	 * it differently, then we should make sure we also change the default
 	 * values of irq_affinity in irq.c.
  	 */
- 	for (i = 0; i < mpic->num_sources ; i++)
-		mpic_irq_write(i, MPIC_IRQ_DESTINATION,
-			mpic_irq_read(i, MPIC_IRQ_DESTINATION) | msk);
-#endif /* CONFIG_IRQ_ALL_CPUS */
+	if (distribute_irqs) {
+	 	for (i = 0; i < mpic->num_sources ; i++)
+			mpic_irq_write(i, MPIC_IRQ_DESTINATION,
+				mpic_irq_read(i, MPIC_IRQ_DESTINATION) | msk);
+	}
 
 	/* Set current processor priority to 0 */
 	mpic_cpu_write(MPIC_CPU_CURRENT_TASK_PRI, 0);
===== arch/ppc64/kernel/pSeries_smp.c 1.9 vs edited =====
--- 1.9/arch/ppc64/kernel/pSeries_smp.c	2005-01-12 11:42:40 +11:00
+++ edited/arch/ppc64/kernel/pSeries_smp.c	2005-01-16 16:48:44 +11:00
@@ -259,7 +259,6 @@
 	if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR)
 		vpa_init(cpu);
 
-#ifdef CONFIG_IRQ_ALL_CPUS
 	/*
 	 * Put the calling processor into the GIQ.  This is really only
 	 * necessary from a secondary thread as the OF start-cpu interface
@@ -267,7 +266,6 @@
 	 */
 	rtas_set_indicator(GLOBAL_INTERRUPT_QUEUE,
 		(1UL << interrupt_server_size) - 1 - default_distrib_server, 1);
-#endif
 }
 
 static spinlock_t timebase_lock = SPIN_LOCK_UNLOCKED;
===== arch/ppc64/kernel/smp.c 1.104 vs edited =====
--- 1.104/arch/ppc64/kernel/smp.c	2005-01-12 11:42:39 +11:00
+++ edited/arch/ppc64/kernel/smp.c	2005-01-16 16:48:45 +11:00
@@ -526,9 +526,6 @@
 	
 	smp_ops->setup_cpu(boot_cpuid);
 
-	/* XXX fix this, xics currently relies on it - Anton */
-	smp_threads_ready = 1;
-
 	set_cpus_allowed(current, old_mask);
 
 	/*
===== arch/ppc64/kernel/xics.c 1.57 vs edited =====
--- 1.57/arch/ppc64/kernel/xics.c	2005-01-12 11:42:40 +11:00
+++ edited/arch/ppc64/kernel/xics.c	2005-01-16 16:48:45 +11:00
@@ -242,28 +242,24 @@
 static int get_irq_server(unsigned int irq)
 {
 	unsigned int server;
-
-#ifdef CONFIG_IRQ_ALL_CPUS
 	/* For the moment only implement delivery to all cpus or one cpu */
-	if (smp_threads_ready) {
-		cpumask_t cpumask = irq_affinity[irq];
-		cpumask_t tmp = CPU_MASK_NONE;
-		if (cpus_equal(cpumask, CPU_MASK_ALL)) {
-			server = default_distrib_server;
-		} else {
-			cpus_and(tmp, cpu_online_map, cpumask);
+	cpumask_t cpumask = irq_affinity[irq];
+	cpumask_t tmp = CPU_MASK_NONE;
+
+	if (!distribute_irqs)
+		return default_server;
 
-			if (cpus_empty(tmp))
-				server = default_distrib_server;
-			else
-				server = get_hard_smp_processor_id(first_cpu(tmp));
-		}
+	if (cpus_equal(cpumask, CPU_MASK_ALL)) {
+		server = default_distrib_server;
 	} else {
-		server = default_server;
+		cpus_and(tmp, cpu_online_map, cpumask);
+
+		if (cpus_empty(tmp))
+			server = default_distrib_server;
+		else
+			server = get_hard_smp_processor_id(first_cpu(tmp));
 	}
-#else
-	server = default_server;
-#endif
+
 	return server;
 
 }
===== include/asm-ppc64/irq.h 1.11 vs edited =====
--- 1.11/include/asm-ppc64/irq.h	2004-10-23 11:44:19 +10:00
+++ edited/include/asm-ppc64/irq.h	2005-01-16 16:48:47 +11:00
@@ -87,6 +87,8 @@
 	return irq;
 }
 
+extern int distribute_irqs;
+
 struct irqaction;
 struct pt_regs;
 

From bunk at stusta.de  Sun Jan 16 15:33:56 2005
From: bunk at stusta.de (Adrian Bunk)
Date: Sun, 16 Jan 2005 05:33:56 +0100
Subject: ppc64 xics.c: what is smp_threads_ready exactly used for?
Message-ID: <20050116043356.GM4274@stusta.de>

Hi Anton,

during a cleanup, I stumbled upon the following:


arch/ppc64/kernel/smp.c (in 2.6.11-rc1-mm1) says:

        /* XXX fix this, xics currently relies on it - Anton */
        smp_threads_ready = 1;


arch/ppc64/kernel/xics.c is the _only_ place in the whole kernel where 
smp_threads_ready is actually used, and this is the _only_ place where 
smp_threads_ready ever changes its value on ppc64.

I have to admit I'm a bit lost in the sequence of function calls on 
ppc64. Is it possible to make any assumptions about the ordering of the 
assignment and the usage of smp_threads_ready?


TIA
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


From bunk at stusta.de  Sun Jan 16 18:24:39 2005
From: bunk at stusta.de (Adrian Bunk)
Date: Sun, 16 Jan 2005 08:24:39 +0100
Subject: [PATCH] ppc64: Remove CONFIG_IRQ_ALL_CPUS
In-Reply-To: <20050116055523.GQ6309@krispykreme.ozlabs.ibm.com>
References: <20050116043356.GM4274@stusta.de>
	<20050116051904.GP6309@krispykreme.ozlabs.ibm.com>
	<20050116055523.GQ6309@krispykreme.ozlabs.ibm.com>
Message-ID: <20050116072439.GS4274@stusta.de>

On Sun, Jan 16, 2005 at 04:55:23PM +1100, Anton Blanchard wrote:
>  
> Replace CONFIG_IRQ_ALL_CPUS with a boot option (noirqdistrib). Compile
> options aren't much use on a distro kernel. This also removes the ppc64
> use of smp_threads_ready.
>...

Seems perfect to me.  :-)

I'll simply state that, on ppc64, my patch depends on yours.

cu
Adrian



From bunk at stusta.de  Sun Jan 16 16:26:56 2005
From: bunk at stusta.de (Adrian Bunk)
Date: Sun, 16 Jan 2005 06:26:56 +0100
Subject: ppc64 xics.c: what is smp_threads_ready exactly used for?
In-Reply-To: <20050116051904.GP6309@krispykreme.ozlabs.ibm.com>
References: <20050116043356.GM4274@stusta.de>
	<20050116051904.GP6309@krispykreme.ozlabs.ibm.com>
Message-ID: <20050116052655.GN4274@stusta.de>

On Sun, Jan 16, 2005 at 04:19:04PM +1100, Anton Blanchard wrote:
>  
> Hi,

Hi Anton,

> > during a cleanup, I stumbled upon the following:
> > 
> > 
> > arch/ppc64/kernel/smp.c (in 2.6.11-rc1-mm1) says:
> > 
> >         /* XXX fix this, xics currently relies on it - Anton */
> >         smp_threads_ready = 1;
> > 
> > 
> > arch/ppc64/kernel/xics.c is the _only_ place in the whole kernel where 
> > smp_threads_ready is actually used, and this is the _only_ place where 
> > smp_threads_ready ever changes its value on ppc64.
> 
> It turns out I was about to submit a patch to remove the ppc64 use of
> smp_threads_ready. With that patch it makes sense to kill
> smp_threads_ready completely.

I've got a patch ready to remove smp_threads_ready on all architectures.

The only part I still need is how to replace it in xics.c, since this 
is the only read access to this variable on all architectures.

Could you send me this part for inclusion into my patch?

> Anton

TIA
Adrian



From zwane at arm.linux.org.uk  Mon Jan 17 15:37:28 2005
From: zwane at arm.linux.org.uk (Zwane Mwaikambo)
Date: Sun, 16 Jan 2005 21:37:28 -0700 (MST)
Subject: [PATCH] PPC64 pmac hotplug cpu
In-Reply-To: <1105827794.27410.82.camel@gaston>
References: <Pine.LNX.4.61.0501122341410.23299@montezuma.fsmlabs.com>
	<1105827794.27410.82.camel@gaston>
Message-ID: <Pine.LNX.4.61.0501162129380.3010@montezuma.fsmlabs.com>

Hello Ben,

On Sun, 16 Jan 2005, Benjamin Herrenschmidt wrote:

> Looks good, but you could do even better :) I still want to look at the
> proper mechanism to flush the CPU cache on 970, but the idea here is to
> flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode)
> with the caches clean and MSR:EE off. We can later get it back with a
> soft reset.

Thanks for the suggestions! I'll work on getting something together.

	Zwane


From benh at kernel.crashing.org  Mon Jan 17 15:47:46 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 17 Jan 2005 15:47:46 +1100
Subject: [PATCH] PPC64 pmac hotplug cpu
In-Reply-To: <Pine.LNX.4.61.0501162129380.3010@montezuma.fsmlabs.com>
References: <Pine.LNX.4.61.0501122341410.23299@montezuma.fsmlabs.com>
	<1105827794.27410.82.camel@gaston>
	<Pine.LNX.4.61.0501162129380.3010@montezuma.fsmlabs.com>
Message-ID: <1105937266.4534.0.camel@gaston>

On Sun, 2005-01-16 at 21:37 -0700, Zwane Mwaikambo wrote:
> Hello Ben,
> 
> On Sun, 16 Jan 2005, Benjamin Herrenschmidt wrote:
> 
> > Looks good, but you could do even better :) I still want to look at the
> > proper mechanism to flush the CPU cache on 970, but the idea here is to
> > flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode)
> > with the caches clean and MSR:EE off. We can later get it back with a
> > soft reset.
> 
> Thanks for the suggestions! I'll work on getting something together.

Well.. the cache flush part requires some not-really-documented stuff on
the 970, but I'll try to come up with something.

Ben.


From zwane at arm.linux.org.uk  Mon Jan 17 16:35:05 2005
From: zwane at arm.linux.org.uk (Zwane Mwaikambo)
Date: Sun, 16 Jan 2005 22:35:05 -0700 (MST)
Subject: [PATCH] PPC64 pmac hotplug cpu
In-Reply-To: <1105937266.4534.0.camel@gaston>
References: <Pine.LNX.4.61.0501122341410.23299@montezuma.fsmlabs.com> 
	<1105827794.27410.82.camel@gaston>
	<Pine.LNX.4.61.0501162129380.3010@montezuma.fsmlabs.com>
	<1105937266.4534.0.camel@gaston>
Message-ID: <Pine.LNX.4.61.0501162234390.3010@montezuma.fsmlabs.com>

On Mon, 17 Jan 2005, Benjamin Herrenschmidt wrote:

> On Sun, 2005-01-16 at 21:37 -0700, Zwane Mwaikambo wrote:
> > Hello Ben,
> > 
> > On Sun, 16 Jan 2005, Benjamin Herrenschmidt wrote:
> > 
> > > Looks good, but you could do even better :) I still want to look at the
> > > proper mechanism to flush the CPU cache on 970, but the idea here is to
> > > flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode)
> > > with the caches clean and MSR:EE off. We can later get it back with a
> > > soft reset.
> > 
> > Thanks for the suggestions! I'll work on getting something together.
> 
> Well.. the cache flush part requires some not-really-documented stuff on
> the 970, but I'll try to come up with something.

I was waiting for you to say that ;)

Thanks,
	Zwane


From mingo at elte.hu  Mon Jan 17 22:32:17 2005
From: mingo at elte.hu (Ingo Molnar)
Date: Mon, 17 Jan 2005 12:32:17 +0100
Subject: [patch] spin-nicer-2.6.11-rc1-A1
In-Reply-To: <16873.40734.485466.850449@cargo.ozlabs.ibm.com>
References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>
	<20050115142537.GD10114@elte.hu> <20050115143805.GA15041@elte.hu>
	<16873.40734.485466.850449@cargo.ozlabs.ibm.com>
Message-ID: <20050117113217.GA14619@elte.hu>


* Paul Mackerras <paulus at samba.org> wrote:

> > +BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked);
> > +BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked);
> 
> I don't think this is right - this means that a cpu trying to acquire
> a read lock will spin while any other cpu has a read lock.  We need to
> invent and use a rwlock_is_write_locked() here.  PPC64 and parisc have
> an is_write_locked() already, and it shouldn't be too hard to do one
> for the other architectures (i386 wants (signed int)rw->lock <= 0,
> most other arches seem to need (signed int)rw->lock < 0).
> 
> > +BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked);
> 
> This one should be rwlock_is_locked, surely?  Otherwise the compiler
> will grizzle about us calling spin_is_locked with a rwlock_t *.

you are right on both counts. The patch below, on top of current BK,
fixes both problems.

the first fix addresses the missing compiler warning: x86 gave none
because spin_is_locked() is a macro there. i fixed this by renaming the
spinlock field to '->slock', so passing a rwlock_t to spin_is_locked()
no longer compiles. (we could also use inline functions to get type
protection; i chose this solution because it was the easiest to do.)

the second fix is to split rwlock_is_locked() into two functions:

 +/**
 + * read_is_locked - would read_trylock() fail?
 + * @lock: the rwlock in question.
 + */
 +#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0)
 +
 +/**
 + * write_is_locked - would write_trylock() fail?
 + * @lock: the rwlock in question.
 + */
 +#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS)

this canonical naming of them also enabled the elimination of the newly
added 'is_locked_fn' argument to the BUILD_LOCK_OPS macro.
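to see why the read side needs its own test, here is a small user-space
model of the i386 rwlock counter. It is a sketch, not kernel code: it
assumes (as on i386) that an unlocked rwlock holds RW_LOCK_BIAS, that
each reader takes one unit and that a writer takes the whole bias, and
all model_* names are hypothetical. With one reader present,
write_is_locked() is true but read_is_locked() is false, which is
exactly the case where the old rwlock_is_locked() test made a would-be
reader spin needlessly.

```c
#include <stdbool.h>

/* Assumption: as on i386, an unlocked rwlock holds RW_LOCK_BIAS. */
#define RW_LOCK_BIAS 0x01000000

/* Hypothetical, non-atomic user-space model of the i386 rwlock_t. */
typedef struct {
	int lock;
} model_rwlock_t;

/* Same test as read_is_locked() in the patch (atomic_t cast dropped). */
static bool model_read_is_locked(const model_rwlock_t *rw)
{
	return rw->lock <= 0;			/* a writer has taken the bias */
}

/* Same test as write_is_locked(): any reader or writer is present. */
static bool model_write_is_locked(const model_rwlock_t *rw)
{
	return rw->lock != RW_LOCK_BIAS;
}

/* Each reader takes one unit; a writer takes the whole bias. */
static void model_read_acquire(model_rwlock_t *rw)  { rw->lock -= 1; }
static void model_read_release(model_rwlock_t *rw)  { rw->lock += 1; }
static void model_write_acquire(model_rwlock_t *rw) { rw->lock -= RW_LOCK_BIAS; }
static void model_write_release(model_rwlock_t *rw) { rw->lock += RW_LOCK_BIAS; }
```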

the third change updates the other user of rwlock_is_locked() and adds
a migration helper there: architectures that don't have
read/write_is_locked() defined yet will get a #warning message, but the
build will succeed. (except if PREEMPT is enabled - there we really
need them.)

compile and boot-tested on x86, on SMP and UP, PREEMPT and !PREEMPT. 
Non-x86 architectures should work fine, except PREEMPT+SMP builds which
will need the read_is_locked()/write_is_locked() definitions.
!PREEMPT+SMP builds will work fine and will produce a #warning.

	Ingo

Signed-off-by: Ingo Molnar <mingo at elte.hu>

--- linux/kernel/spinlock.c.orig
+++ linux/kernel/spinlock.c
@@ -173,7 +173,7 @@ EXPORT_SYMBOL(_write_lock);
  * (We do this in a function because inlining it would be excessive.)
  */
 
-#define BUILD_LOCK_OPS(op, locktype, is_locked_fn)			\
+#define BUILD_LOCK_OPS(op, locktype)					\
 void __lockfunc _##op##_lock(locktype *lock)				\
 {									\
 	preempt_disable();						\
@@ -183,7 +183,7 @@ void __lockfunc _##op##_lock(locktype *l
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		while (is_locked_fn(lock) && (lock)->break_lock)	\
+		while (op##_is_locked(lock) && (lock)->break_lock)	\
 			cpu_relax();					\
 		preempt_disable();					\
 	}								\
@@ -205,7 +205,7 @@ unsigned long __lockfunc _##op##_lock_ir
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		while (is_locked_fn(lock) && (lock)->break_lock)	\
+		while (op##_is_locked(lock) && (lock)->break_lock)	\
 			cpu_relax();					\
 		preempt_disable();					\
 	}								\
@@ -246,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh)
  *         _[spin|read|write]_lock_irqsave()
  *         _[spin|read|write]_lock_bh()
  */
-BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked);
-BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked);
-BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked);
+BUILD_LOCK_OPS(spin, spinlock_t);
+BUILD_LOCK_OPS(read, rwlock_t);
+BUILD_LOCK_OPS(write, rwlock_t);
 
 #endif /* CONFIG_PREEMPT */
 
--- linux/include/asm-i386/spinlock.h.orig
+++ linux/include/asm-i386/spinlock.h
@@ -15,7 +15,7 @@ asmlinkage int printk(const char * fmt, 
  */
 
 typedef struct {
-	volatile unsigned int lock;
+	volatile unsigned int slock;
 #ifdef CONFIG_DEBUG_SPINLOCK
 	unsigned magic;
 #endif
@@ -43,7 +43,7 @@ typedef struct {
  * We make no fairness assumptions. They have a cost.
  */
 
-#define spin_is_locked(x)	(*(volatile signed char *)(&(x)->lock) <= 0)
+#define spin_is_locked(x)	(*(volatile signed char *)(&(x)->slock) <= 0)
 #define spin_unlock_wait(x)	do { barrier(); } while(spin_is_locked(x))
 
 #define spin_lock_string \
@@ -83,7 +83,7 @@ typedef struct {
 
 #define spin_unlock_string \
 	"movb $1,%0" \
-		:"=m" (lock->lock) : : "memory"
+		:"=m" (lock->slock) : : "memory"
 
 
 static inline void _raw_spin_unlock(spinlock_t *lock)
@@ -101,7 +101,7 @@ static inline void _raw_spin_unlock(spin
 
 #define spin_unlock_string \
 	"xchgb %b0, %1" \
-		:"=q" (oldval), "=m" (lock->lock) \
+		:"=q" (oldval), "=m" (lock->slock) \
 		:"0" (oldval) : "memory"
 
 static inline void _raw_spin_unlock(spinlock_t *lock)
@@ -123,7 +123,7 @@ static inline int _raw_spin_trylock(spin
 	char oldval;
 	__asm__ __volatile__(
 		"xchgb %b0,%1"
-		:"=q" (oldval), "=m" (lock->lock)
+		:"=q" (oldval), "=m" (lock->slock)
 		:"0" (0) : "memory");
 	return oldval > 0;
 }
@@ -138,7 +138,7 @@ static inline void _raw_spin_lock(spinlo
 #endif
 	__asm__ __volatile__(
 		spin_lock_string
-		:"=m" (lock->lock) : : "memory");
+		:"=m" (lock->slock) : : "memory");
 }
 
 static inline void _raw_spin_lock_flags (spinlock_t *lock, unsigned long flags)
@@ -151,7 +151,7 @@ static inline void _raw_spin_lock_flags 
 #endif
 	__asm__ __volatile__(
 		spin_lock_string_flags
-		:"=m" (lock->lock) : "r" (flags) : "memory");
+		:"=m" (lock->slock) : "r" (flags) : "memory");
 }
 
 /*
@@ -186,7 +186,17 @@ typedef struct {
 
 #define rwlock_init(x)	do { *(x) = RW_LOCK_UNLOCKED; } while(0)
 
-#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
+/**
+ * read_is_locked - would read_trylock() fail?
+ * @lock: the rwlock in question.
+ */
+#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0)
+
+/**
+ * write_is_locked - would write_trylock() fail?
+ * @lock: the rwlock in question.
+ */
+#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
 
 /*
  * On x86, we implement read-write locks as a 32-bit counter
--- linux/kernel/exit.c.orig
+++ linux/kernel/exit.c
@@ -861,8 +861,12 @@ task_t fastcall *next_thread(const task_
 #ifdef CONFIG_SMP
 	if (!p->sighand)
 		BUG();
+#ifndef write_is_locked
+# warning please implement read_is_locked()/write_is_locked()!
+# define write_is_locked rwlock_is_locked
+#endif
 	if (!spin_is_locked(&p->sighand->siglock) &&
-				!rwlock_is_locked(&tasklist_lock))
+				!write_is_locked(&tasklist_lock))
 		BUG();
 #endif
 	return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);


From mingo at elte.hu  Mon Jan 17 23:42:09 2005
From: mingo at elte.hu (Ingo Molnar)
Date: Mon, 17 Jan 2005 13:42:09 +0100
Subject: [patch] spin-yield-2.6.11-rc1-A1
In-Reply-To: <16873.55739.214904.473407@cargo.ozlabs.ibm.com>
References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>
	<20050115142537.GD10114@elte.hu>
	<16873.55739.214904.473407@cargo.ozlabs.ibm.com>
Message-ID: <20050117124209.GA20796@elte.hu>


* Paul Mackerras <paulus at samba.org> wrote:

> > hm, how about calling __spin_yield() from _raw_spin_trylock(), if the
> > locking attempt was unsuccessful? This might be slightly incorrect if
> > the locking attempt is not connected to an actual spin-loop, but we do
> > have other spin-loops with open-coded trylocks that would benefit from
> > this optimization too.
> 
> That would help, but we also need to yield while we are polling the
> lock until it becomes available.  Otherwise we will only yield once;
> if we get another timeslice and the other cpu still hasn't finished
> with the lock (or another cpu has got it now), we will spin uselessly
> for the whole of our timeslice.  Thus I think we need to yield in the
> polling loop, whether or not we also yield in _raw_spin_trylock.

ok - how about the (raw) patch below? (on top of BK plus the latest
spin-nicer patch i sent earlier.) It builds/boots on x86 but is untested
on ppc64.

the idea is to make spin_yield() a generic function, with some related
namespace cleanups.
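the shape of the resulting slow path can be sketched in user space with
C11 atomics. This is a hedged illustration, not the kernel code: the
model_* names are hypothetical, preempt_disable()/preempt_enable() are
left out, sched_yield() stands in for locktype##_yield(), and clearing
break_lock once the lock is held is this model's assumption. The point
is that the waiter yields on every iteration of the polling loop, as
Paul asked, rather than only once after a failed trylock.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space lock mirroring the fields the patch touches. */
typedef struct {
	atomic_bool locked;
	atomic_int  break_lock;	/* waiter asks the holder to drop the lock */
} model_spinlock_t;

static bool model_trylock(model_spinlock_t *l)
{
	/* exchange returns the old value: false means we now own it */
	return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static bool model_is_locked(model_spinlock_t *l)
{
	return atomic_load_explicit(&l->locked, memory_order_relaxed);
}

static void model_unlock(model_spinlock_t *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* Try, then poll the lock, yielding on every pass instead of spinning. */
static void model_lock(model_spinlock_t *l)
{
	for (;;) {
		if (model_trylock(l))
			break;
		if (!atomic_load(&l->break_lock))
			atomic_store(&l->break_lock, 1);
		while (model_is_locked(l) && atomic_load(&l->break_lock))
			sched_yield();		/* stand-in for locktype##_yield() */
	}
	atomic_store(&l->break_lock, 0);	/* we hold it; clear the request */
}
```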

	Ingo

Acked-by: Ingo Molnar <mingo at elte.hu>

--- linux/kernel/exit.c.orig
+++ linux/kernel/exit.c
@@ -861,8 +861,12 @@ task_t fastcall *next_thread(const task_
 #ifdef CONFIG_SMP
 	if (!p->sighand)
 		BUG();
+#ifndef write_is_locked
+# warning please implement read_is_locked()/write_is_locked()!
+# define write_is_locked rwlock_is_locked
+#endif
 	if (!spin_is_locked(&p->sighand->siglock) &&
-				!rwlock_is_locked(&tasklist_lock))
+				!write_is_locked(&tasklist_lock))
 		BUG();
 #endif
 	return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);
--- linux/kernel/spinlock.c.orig
+++ linux/kernel/spinlock.c
@@ -173,8 +173,8 @@ EXPORT_SYMBOL(_write_lock);
  * (We do this in a function because inlining it would be excessive.)
  */
 
-#define BUILD_LOCK_OPS(op, locktype, is_locked_fn)			\
-void __lockfunc _##op##_lock(locktype *lock)				\
+#define BUILD_LOCK_OPS(op, locktype)					\
+void __lockfunc _##op##_lock(locktype##_t *lock)			\
 {									\
 	preempt_disable();						\
 	for (;;) {							\
@@ -183,15 +183,15 @@ void __lockfunc _##op##_lock(locktype *l
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		while (is_locked_fn(lock) && (lock)->break_lock)	\
-			cpu_relax();					\
+		while (op##_is_locked(lock) && (lock)->break_lock)	\
+			locktype##_yield(lock);				\
 		preempt_disable();					\
 	}								\
 }									\
 									\
 EXPORT_SYMBOL(_##op##_lock);						\
 									\
-unsigned long __lockfunc _##op##_lock_irqsave(locktype *lock)		\
+unsigned long __lockfunc _##op##_lock_irqsave(locktype##_t *lock)	\
 {									\
 	unsigned long flags;						\
 									\
@@ -205,8 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		while (is_locked_fn(lock) && (lock)->break_lock)	\
-			cpu_relax();					\
+		while (op##_is_locked(lock) && (lock)->break_lock)	\
+			locktype##_yield(lock);				\
 		preempt_disable();					\
 	}								\
 	return flags;							\
@@ -214,14 +214,14 @@ unsigned long __lockfunc _##op##_lock_ir
 									\
 EXPORT_SYMBOL(_##op##_lock_irqsave);					\
 									\
-void __lockfunc _##op##_lock_irq(locktype *lock)			\
+void __lockfunc _##op##_lock_irq(locktype##_t *lock)			\
 {									\
 	_##op##_lock_irqsave(lock);					\
 }									\
 									\
 EXPORT_SYMBOL(_##op##_lock_irq);					\
 									\
-void __lockfunc _##op##_lock_bh(locktype *lock)				\
+void __lockfunc _##op##_lock_bh(locktype##_t *lock)			\
 {									\
 	unsigned long flags;						\
 									\
@@ -246,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh)
  *         _[spin|read|write]_lock_irqsave()
  *         _[spin|read|write]_lock_bh()
  */
-BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked);
-BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked);
-BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked);
+BUILD_LOCK_OPS(spin, spinlock);
+BUILD_LOCK_OPS(read, rwlock);
+BUILD_LOCK_OPS(write, rwlock);
 
 #endif /* CONFIG_PREEMPT */
 
--- linux/include/asm-i386/spinlock.h.orig
+++ linux/include/asm-i386/spinlock.h
@@ -7,6 +7,8 @@
 #include <linux/config.h>
 #include <linux/compiler.h>
 
+#include <asm-generic/spinlock.h>
+
 asmlinkage int printk(const char * fmt, ...)
 	__attribute__ ((format (printf, 1, 2)));
 
@@ -15,7 +17,7 @@ asmlinkage int printk(const char * fmt, 
  */
 
 typedef struct {
-	volatile unsigned int lock;
+	volatile unsigned int slock;
 #ifdef CONFIG_DEBUG_SPINLOCK
 	unsigned magic;
 #endif
@@ -43,7 +45,7 @@ typedef struct {
  * We make no fairness assumptions. They have a cost.
  */
 
-#define spin_is_locked(x)	(*(volatile signed char *)(&(x)->lock) <= 0)
+#define spin_is_locked(x)	(*(volatile signed char *)(&(x)->slock) <= 0)
 #define spin_unlock_wait(x)	do { barrier(); } while(spin_is_locked(x))
 
 #define spin_lock_string \
@@ -83,7 +85,7 @@ typedef struct {
 
 #define spin_unlock_string \
 	"movb $1,%0" \
-		:"=m" (lock->lock) : : "memory"
+		:"=m" (lock->slock) : : "memory"
 
 
 static inline void _raw_spin_unlock(spinlock_t *lock)
@@ -101,7 +103,7 @@ static inline void _raw_spin_unlock(spin
 
 #define spin_unlock_string \
 	"xchgb %b0, %1" \
-		:"=q" (oldval), "=m" (lock->lock) \
+		:"=q" (oldval), "=m" (lock->slock) \
 		:"0" (oldval) : "memory"
 
 static inline void _raw_spin_unlock(spinlock_t *lock)
@@ -123,7 +125,7 @@ static inline int _raw_spin_trylock(spin
 	char oldval;
 	__asm__ __volatile__(
 		"xchgb %b0,%1"
-		:"=q" (oldval), "=m" (lock->lock)
+		:"=q" (oldval), "=m" (lock->slock)
 		:"0" (0) : "memory");
 	return oldval > 0;
 }
@@ -138,7 +140,7 @@ static inline void _raw_spin_lock(spinlo
 #endif
 	__asm__ __volatile__(
 		spin_lock_string
-		:"=m" (lock->lock) : : "memory");
+		:"=m" (lock->slock) : : "memory");
 }
 
 static inline void _raw_spin_lock_flags (spinlock_t *lock, unsigned long flags)
@@ -151,7 +153,7 @@ static inline void _raw_spin_lock_flags 
 #endif
 	__asm__ __volatile__(
 		spin_lock_string_flags
-		:"=m" (lock->lock) : "r" (flags) : "memory");
+		:"=m" (lock->slock) : "r" (flags) : "memory");
 }
 
 /*
@@ -186,7 +188,17 @@ typedef struct {
 
 #define rwlock_init(x)	do { *(x) = RW_LOCK_UNLOCKED; } while(0)
 
-#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
+/**
+ * read_is_locked - would read_trylock() fail?
+ * @lock: the rwlock in question.
+ */
+#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0)
+
+/**
+ * write_is_locked - would write_trylock() fail?
+ * @lock: the rwlock in question.
+ */
+#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
 
 /*
  * On x86, we implement read-write locks as a 32-bit counter


From cfriesen at nortelnetworks.com  Tue Jan 18 02:14:42 2005
From: cfriesen at nortelnetworks.com (Chris Friesen)
Date: Mon, 17 Jan 2005 09:14:42 -0600
Subject: [PATCH] PPC64 pmac hotplug cpu
In-Reply-To: <1105937266.4534.0.camel@gaston>
References: <Pine.LNX.4.61.0501122341410.23299@montezuma.fsmlabs.com>	
	<1105827794.27410.82.camel@gaston>	
	<Pine.LNX.4.61.0501162129380.3010@montezuma.fsmlabs.com>
	<1105937266.4534.0.camel@gaston>
Message-ID: <41EBD662.1080409@nortelnetworks.com>

Benjamin Herrenschmidt wrote:

> Well.. the cache flush part requires some not-really-documented stuff on
> the 970, but I'll try to come up with something.

Details?  We've got a cache-flush routine put together based on the 
documentation that seems to be working, but if there's something else 
that has to be done I'd love to know about it.

Chris


From dhowells at redhat.com  Tue Jan 18 03:27:19 2005
From: dhowells at redhat.com (David Howells)
Date: Mon, 17 Jan 2005 16:27:19 +0000
Subject: [PATCH] Fix kallsyms/insmod/rmmod race
Message-ID: <31453.1105979239@redhat.com>


The attached patch fixes a race between kallsyms and insmod/rmmod.

The problem is this:

 (1) The various kallsyms functions poke around in the module list without any
     locking so that they can be called from the oops handler.

 (2) Although insmod and rmmod use locks to exclude each other, these have no
     effect on the kallsyms function.

 (3) Although rmmod modifies the module state with the machine "stopped", it
     hasn't removed the metadata from the module metadata list, meaning that
     as soon as the machine is "restarted", the metadata can be observed by
     kallsyms.

     It's not possible to say that an item in that list should be ignored if
     its state is marked as inactive - you can't get at the state information
     because you can't trust the metadata in which it is embedded.

     Furthermore, list linkage information is embedded in the metadata too, so
     you can't trust that either...

 (4) kallsyms may be walking the module list without a lock whilst either
     insmod or rmmod is busy changing it. insmod probably isn't a problem
     since nothing is going away, but rmmod is, as it's deleting an entry.

 (5) Therefore nothing that uses these functions can in any way trust any
     pointers to "static" data (such as module symbol names or module names)
     that are returned.

 (6) On ppc64 the problems are exacerbated since the hypervisor may reschedule
     bits of the kernel, making operations that appear adjacent occur a long
     time apart.

This patch fixes the race by only linking/unlinking modules into/from the
master module list with the machine in the "stopped" state. This means that
any "static" information can be trusted as far as the next kernel reschedule
on any given CPU without the need to hold any locks.

However, I'm not sure how this is affected by preemption. I suspect more work
may need to be done in that case, but I'm not entirely sure.

This also means that rmmod has to bump the machine into the stopped state
twice... but since that shouldn't be a common operation, I don't think that's
a problem.

Signed-Off-By: David Howells <dhowells at redhat.com>
---
warthog>diffstat kallsyms-race-2611rc1.diff
 kallsyms.c |   16 ++++++++++++++--
 module.c   |   35 ++++++++++++++++++++++++++++-------
 2 files changed, 42 insertions(+), 9 deletions(-)

diff -uNrp linux-2.6.11-rc1/kernel/kallsyms.c linux-2.6.11-rc1-kallsyms/kernel/kallsyms.c
--- linux-2.6.11-rc1/kernel/kallsyms.c	2005-01-12 19:09:18.000000000 +0000
+++ linux-2.6.11-rc1-kallsyms/kernel/kallsyms.c	2005-01-17 15:33:55.000000000 +0000
@@ -139,13 +139,20 @@ unsigned long kallsyms_lookup_name(const
 	return module_kallsyms_lookup_name(name);
 }
 
-/* Lookup an address.  modname is set to NULL if it's in the kernel. */
+/*
+ * Lookup an address
+ * - modname is set to NULL if it's in the kernel
+ * - we guarantee that the returned name is valid until we reschedule even if
+ *   it resides in a module
+ * - we also guarantee that modname will be valid until rescheduled
+ */
 const char *kallsyms_lookup(unsigned long addr,
 			    unsigned long *symbolsize,
 			    unsigned long *offset,
 			    char **modname, char *namebuf)
 {
 	unsigned long i, low, high, mid;
+	const char *msym;
 
 	/* This kernel should never had been booted. */
 	BUG_ON(!kallsyms_addresses);
@@ -196,7 +203,12 @@ const char *kallsyms_lookup(unsigned lon
 		return namebuf;
 	}
 
-	return module_address_lookup(addr, symbolsize, offset, modname);
+	/* see if it's in a module */
+	msym = module_address_lookup(addr, symbolsize, offset, modname);
+	if (msym)
+		return strncpy(namebuf, msym, KSYM_NAME_LEN);
+
+	return NULL;
 }
 
 /* Replace "%s" in format with address, or returns -errno. */
diff -uNrp linux-2.6.11-rc1/kernel/module.c linux-2.6.11-rc1-kallsyms/kernel/module.c
--- linux-2.6.11-rc1/kernel/module.c	2005-01-12 19:09:18.000000000 +0000
+++ linux-2.6.11-rc1-kallsyms/kernel/module.c	2005-01-17 15:31:42.000000000 +0000
@@ -1072,14 +1072,24 @@ static void mod_kobject_remove(struct mo
 	kobject_unregister(&mod->mkobj.kobj);
 }
 
+/*
+ * unlink the module while the whole machine is stopped with interrupts off
+ * - this defends against kallsyms not taking locks
+ */
+static inline int __unlink_module(void *_mod)
+{
+	struct module *mod = _mod;
+	spin_lock(&modlist_lock);
+	list_del(&mod->list);
+	spin_unlock(&modlist_lock);
+	return 0;
+}
+
 /* Free a module, remove from lists, etc (must hold module mutex). */
 static void free_module(struct module *mod)
 {
 	/* Delete from various lists */
-	spin_lock_irq(&modlist_lock);
-	list_del(&mod->list);
-	spin_unlock_irq(&modlist_lock);
-
+	stop_machine_run(__unlink_module, mod, NR_CPUS);
 	remove_sect_attrs(mod);
 	mod_kobject_remove(mod);
 
@@ -1732,6 +1742,19 @@ static struct module *load_module(void _
 	goto free_hdr;
 }
 
+/*
+ * link the module while the whole machine is stopped with interrupts off
+ * - this defends against kallsyms not taking locks
+ */
+static inline int __link_module(void *_mod)
+{
+	struct module *mod = _mod;
+	spin_lock(&modlist_lock);
+	list_add(&mod->list, &modules);
+	spin_unlock(&modlist_lock);
+	return 0;
+}
+
 /* This is where the real work happens */
 asmlinkage long
 sys_init_module(void __user *umod,
@@ -1766,9 +1789,7 @@ sys_init_module(void __user *umod,
 
 	/* Now sew it into the lists.  They won't access us, since
            strong_try_module_get() will fail. */
-	spin_lock_irq(&modlist_lock);
-	list_add(&mod->list, &modules);
-	spin_unlock_irq(&modlist_lock);
+	stop_machine_run(__link_module, mod, NR_CPUS);
 
 	/* Drop lock so they can recurse */
 	up(&module_mutex);


From willschm at us.ibm.com  Tue Jan 18 03:42:05 2005
From: willschm at us.ibm.com (Will Schmidt)
Date: Mon, 17 Jan 2005 10:42:05 -0600
Subject: question about LMB's size
In-Reply-To: <OF107B9B26.4CE45ECB-ON48256F8C.003964E7-48256F8C.003D0851@cn.ibm.com>
Message-ID: <OF69A2C033.F9ACAAD9-ON86256F8C.005A7B66-86256F8C.005BBF20@us.ibm.com>


Hi,

ipseries-list-bounces at redhat.com wrote on 01/17/2005 05:00:46 AM:
> Hi,
> This is a question about the difference in memory size between the lpar
> and the HMC.
...

> 2. In lpar didolp2: We get the size of memory is 2174672KB.
> [root at didolp2 ~]# cat /proc/meminfo
> MemTotal: 2174672 kB
>
> The question is: 2174672/(32*1024) = 66.36572265625

MemTotal is the amount of usable memory in the partition; it does not
include the memory that holds the kernel code, data, bss and init.

There should be a few other pieces of data that will add up to the numbers
you are looking for.

In the early boot messages, there is a line "SystemCfg->physicalMemorySize =
0x.......".   This value should be precisely what you are trying to
measure.

A bit later in the logs, you can also see a line
"Memory: XXXXk/YYYYk available (###k kernel code, ###k reserved, ###k data,
###k bss, ###k init)".
The YYYYk should also match what you are looking for.
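To make the arithmetic in the question concrete, here is a small C
sketch. It assumes the HMC figure is 2176 MB made up of 68 LMBs of
32 MB each (as the question's 2176/32=68 suggests) and takes MemTotal
from the quoted /proc/meminfo. The gap it computes, 53552 kB (about
52 MB), is the amount that the breakdown on the "Memory:" line would
need to explain.

```c
/*
 * Figures taken from the question. Assumption: the HMC reports 2176 MB
 * for the partition and the LMB size is 32 MB.
 */
#define HMC_MB       2176UL
#define LMB_MB       32UL
#define MEMTOTAL_KB  2174672UL	/* MemTotal from /proc/meminfo */

/* Number of LMBs assigned to the partition: 2176 / 32 = 68. */
static unsigned long lmb_count(void)
{
	return HMC_MB / LMB_MB;
}

/* kB not shown in MemTotal: 2176 * 1024 - 2174672 = 53552. */
static unsigned long kb_not_in_memtotal(void)
{
	return HMC_MB * 1024UL - MEMTOTAL_KB;
}
```

Dividing MemTotal by the LMB size, as the question does, therefore
cannot give a whole number of LMBs.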


>
> whereas 2176/32=68.
>
> 68 != 66.36572265625
>
> --------------------------------------------
> Wang Zhaoyu
>
> Email: wangzyu at cn.ibm.com
> Notes: Zhao Yu Wang/China/Contr/IBM at IBMCN--
> ipseries-list mailing list
> ipseries-list at redhat.com
> https://www.redhat.com/mailman/listinfo/ipseries-list

-Will


From linas at austin.ibm.com  Tue Jan 18 07:14:15 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Mon, 17 Jan 2005 14:14:15 -0600
Subject: [PATCH] PPC64: EEH Recovery
In-Reply-To: <20050106192413.GK22274@austin.ibm.com>
References: <20050106192413.GK22274@austin.ibm.com>
Message-ID: <20050117201415.GA11505@austin.ibm.com>


Andrew,

The attached file describes PCI bus EEH "Extended Error Handling"
concepts and operation;  could you drop this into the kernel
documentation tree, at
linux-2.6/Documentation/powerpc/eeh-pci-error-recovery.txt ?

Signed-off-by: Linas Vepstas <linas at linas.org>

--linas

p.s.  It was not clear to me if the EEH patch previously sent 
(6 January 2005, same subject line) will be wending its way into 
the main Torvalds kernel tree, or not.  I hadn't really gotten
confirmation one way or another.


-------------- next part --------------


                      PCI Bus EEH Error Recovery
                      --------------------------
                           Linas Vepstas
                       <linas at austin.ibm.com>
                          12 January 2005


Overview:
---------
The IBM POWER-based pSeries and iSeries computers include PCI bus 
controller chips that have extended capabilities for detecting and 
reporting a large variety of PCI bus error conditions.  These features 
go under the name of "EEH", for "Extended Error Handling".  The EEH
hardware features allow PCI bus errors to be cleared and a PCI
card to be "rebooted", without also having to reboot the operating
system.  

This is in contrast to traditional PCI error handling, where the 
PCI chip is wired directly to the CPU, and an error would cause 
a CPU machine-check/check-stop condition, halting the CPU entirely. 
Another "traditional" technique is to ignore such errors, which
can lead to data corruption (of either user or kernel data),
hung/unresponsive adapters, or system crashes/lockups.  Thus, 
the idea behind EEH is that the operating system can become more
reliable and robust by protecting it from PCI errors, and giving
the OS the ability to "reboot"/recover individual PCI devices.

Future systems from other vendors, based on the PCI-E specification,
may contain similar features. 


Causes of EEH Errors
--------------------
EEH was originally designed to guard against hardware failure, such 
as PCI cards dying from heat, humidity, dust, vibration and bad 
electrical connections. The vast majority of EEH errors seen in 
"real life" are due to either poorly seated PCI cards, or, 
unfortunately quite commonly, to device driver bugs, device firmware 
bugs, and sometimes PCI card hardware bugs.

The most common software bug is one that causes the device to
attempt to DMA to a location in system memory that has not been 
reserved for DMA access for that card.  Detecting such DMA is a
powerful feature of EEH, as it prevents what would otherwise have
been silent memory corruption caused by the bad DMA.  A number of device driver
bugs have been found and fixed in this way over the past few 
years.  Other possible causes of EEH errors include data or 
address line parity errors (for example, due to poor electrical 
connectivity due to a poorly seated card), and PCI-X split-completion 
errors (due to software, device firmware, or device PCI hardware bugs). 
The vast majority of "true hardware failures" can be cured by
physically removing and re-seating the PCI card.


Detection and Recovery
----------------------
In the following discussion, a generic overview of how to detect 
and recover from EEH errors will be presented. This is followed
by an overview of how the current implementation in the Linux
kernel does it.  The actual implementation is subject to change,
and some of the finer points are still being debated.  These 
may in turn be swayed if or when other architectures implement 
similar functionality.

When a PCI Host Bridge (PHB, the bus controller connecting the 
PCI bus to the system CPU electronics complex) detects a PCI error
condition, it will "isolate" the affected PCI card.  Isolation 
will block all writes (either to the card from the system, or 
from the card to the system), and it will cause all reads to 
return all-ff's (0xff, 0xffff, 0xffffffff for 8/16/32-bit reads).
Isolation applies to PCI memory, I/O space, and PCI config space
accesses alike.  The all-ff's value was chosen because it is the
same value you would get if the device were physically unplugged
from the slot.  Interrupts, however, will continue to be delivered.
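The isolation behaviour described above can be modelled in a few lines
of C. This is only an illustrative sketch: struct model_slot and the
model_* functions are hypothetical, a slot is reduced to a single
32-bit register, and narrower reads simply return its low bytes. Once
the slot is isolated, writes are dropped and every read returns
all-ones for its width.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one PCI slot as seen through its PHB. */
struct model_slot {
	bool     isolated;	/* set once the PHB has detected an error */
	uint32_t reg;		/* a single device register, for illustration */
};

/* Writes to an isolated slot are silently dropped. */
static void model_write32(struct model_slot *s, uint32_t val)
{
	if (!s->isolated)
		s->reg = val;
}

/*
 * Reads of 1, 2 or 4 bytes. An isolated slot returns all-ones for the
 * access width, the same as an empty slot would.
 */
static uint32_t model_read(const struct model_slot *s, unsigned int width)
{
	uint32_t mask = (width >= 4) ? 0xffffffffu : ((1u << (8 * width)) - 1);

	if (s->isolated)
		return mask;
	return s->reg & mask;
}
```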

Detection and recovery are performed with the aid of ppc64 
firmware.  The programming interfaces in the Linux kernel 
into the firmware are referred to as RTAS (Run-Time Abstraction 
Services).  The Linux kernel does not (should not) access
the EEH function in the PCI chipsets directly, primarily because 
there are a number of different chipsets out there, each with 
different interfaces and quirks. The firmware provides a 
uniform abstraction layer that will work with all pSeries 
and iSeries hardware (and be forwards-compatible).

If the OS or device driver suspects that a PCI slot has been 
EEH-isolated, there is a firmware call it can make to determine if 
this is the case. If so, then the device driver should put itself 
into a consistent state (given that it won't be able to complete any 
pending work) and start recovery of the card.  Recovery would
normally consist of resetting the PCI device (asserting the PCI #RST
line for two seconds), followed by setting up the device 
config space (the base address registers (BAR's), latency timer, 
cache line size, interrupt line, and so on).  This is followed by a 
reinitialization of the device driver.  In a worst-case scenario, 
the power to the card can be toggled, at least on hot-plug-capable 
slots.  In principle, layers far above the device driver probably 
do not need to know that the PCI card has been "rebooted" in this 
way; ideally, there should be at most a pause in Ethernet/disk/USB 
I/O while the card is being reset. 

If the card cannot be recovered after three or four resets, the 
kernel/device driver should assume the worst-case scenario, that the 
card has died completely, and report this error to the sysadmin.  
In addition, error messages are reported through RTAS and also through 
syslogd (/var/log/messages) to alert the sysadmin of PCI resets.
The correct way to deal with failed adapters is to use the standard
PCI hotplug tools to remove and replace the dead card.
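The recovery policy described in the last two paragraphs (confirm the
isolation, reset and reinitialize, retry a few times, then give up and
report) can be sketched as follows. This is a hypothetical model, not
kernel or RTAS code: struct model_card simulates a card that needs a
given number of resets before it comes back, the firmware query is
reduced to the isolated flag, and the limit of 3 resets is an
assumption taken from the "three or four" above.

```c
#include <stdbool.h>

#define MODEL_MAX_RESETS 3	/* assumption: text says "three or four" */

/* Hypothetical simulated card; not a kernel structure. */
struct model_card {
	bool isolated;		/* what the firmware query would report */
	int  bad_resets;	/* resets that fail before the card recovers */
	int  resets;		/* resets performed so far */
	bool reported_dead;	/* error reported to the sysadmin */
};

/* One reset + config-space restore + driver re-init; true if it worked. */
static bool model_reset_and_reinit(struct model_card *c)
{
	c->resets++;
	if (c->resets <= c->bad_resets)
		return false;
	c->isolated = false;
	return true;
}

/* Returns true if the card is usable after recovery. */
static bool model_eeh_recover(struct model_card *c)
{
	if (!c->isolated)
		return true;		/* all-ff's was genuine data */

	/* the driver would put itself into a consistent state here */
	for (int i = 0; i < MODEL_MAX_RESETS; i++)
		if (model_reset_and_reinit(c))
			return true;

	c->reported_dead = true;	/* assume the card has died */
	return false;
}
```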


Current PPC64 Linux EEH Implementation
--------------------------------------
At this time, a generic EEH recovery mechanism has been implemented,
so that individual device drivers do not need to be modified to support
EEH recovery.  This generic mechanism piggy-backs on the PCI hotplug
infrastructure,  and percolates events up through the hotplug/udev 
infrastructure.  Following is a detailed description of how this is 
accomplished.

EEH must be enabled in the PHBs very early in the boot process, and
again whenever a PCI slot is hot-plugged. The former is performed by 
eeh_init() in arch/ppc64/kernel/eeh.c, and the latter by
drivers/pci/hotplug/pSeries_pci.c calling into the eeh.c code.
EEH must be enabled before a PCI scan of the device can proceed.
Current Power5 hardware will not work unless EEH is enabled,
although older Power4 hardware can run with it disabled.  Effectively,
EEH can no longer be turned off.  PCI devices *must* be 
registered with the EEH code; the EEH code needs to know about
the I/O address ranges of the PCI device in order to detect an 
error.  Given an arbitrary address, the routine 
pci_get_device_by_addr() will find the pci device associated 
with that address (if any).
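As a rough illustration of the address-to-device lookup, here is a
linear scan over registered address ranges. It is a sketch under
stated assumptions: struct model_range and model_get_device_by_addr()
are hypothetical, a string stands in for the struct pci_dev pointer,
and the kernel's own data structure for this lookup may well differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one device's registered I/O address range. */
struct model_range {
	uint64_t    start;
	uint64_t    end;	/* inclusive */
	const char *dev;	/* stands in for a struct pci_dev pointer */
};

/* Return the device whose range contains addr, or NULL if none does. */
static const char *model_get_device_by_addr(const struct model_range *r,
					   size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= r[i].start && addr <= r[i].end)
			return r[i].dev;
	return NULL;
}
```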

The default include/asm-ppc64/io.h macros readb(), inb(), insb(), 
etc. include a check to see if the i/o read returned all-0xff's.
If so, these make a call to eeh_dn_check_failure(), which in turn
asks the firmware if the all-ff's value is the sign of a true EEH 
error.  If it is not, processing continues as normal.  The grand 
total number of these false alarms or "false positives" can be
seen in /proc/ppc64/eeh (subject to change).  Normally, almost 
all of these occur during boot, when the PCI bus is scanned, where
a large number of 0xff reads are part of the bus scan procedure.
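The check described above can be modeled in a few lines. This is a user-space sketch, not the io.h code: struct slot_model and firmware_slot_is_frozen() are hypothetical stand-ins for eeh_dn_check_failure() and the firmware query it makes. What it shows is that the comparatively expensive firmware call happens only when a read returns 0xff, and that such reads are sorted into false positives and real failures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one PCI slot as seen by firmware. */
struct slot_model {
	bool frozen;                   /* firmware's view: is the slot isolated? */
	unsigned long fw_queries;      /* how often firmware was asked */
	unsigned long false_positives; /* 0xff reads that were not errors */
	unsigned long failures;        /* 0xff reads that were real EEH errors */
};

/* Stand-in for the firmware query (hypothetical). */
static bool firmware_slot_is_frozen(struct slot_model *slot)
{
	slot->fw_queries++;
	return slot->frozen;
}

/* Read one byte; only an all-ones value triggers the firmware check. */
uint8_t checked_readb(const volatile uint8_t *addr, struct slot_model *slot)
{
	uint8_t val = *addr;

	if (val != 0xff)
		return val;            /* fast path: no firmware call */

	if (firmware_slot_is_frozen(slot))
		slot->failures++;      /* real error: recovery would start here */
	else
		slot->false_positives++;
	return val;
}
```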

If a frozen slot is detected, code in arch/ppc64/kernel/eeh.c will 
print a stack trace to syslog (/var/log/messages).  This stack trace 
has proven to be very useful to device-driver authors for finding 
out at what point the EEH error was detected, as the error itself
usually occurs slightly beforehand.

Next, it uses the Linux kernel notifier chain/work queue mechanism to
allow any interested parties to find out about the failure.  Device 
drivers, or other parts of the kernel, can use 
eeh_register_notifier(struct notifier_block *) to find out about EEH 
events.  The event will include a pointer to the pci device, the 
device node and some state info.  Receivers of the event can "do as 
they wish"; the default handler will be described further in this
section.
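To make the notifier mechanism concrete, the following user-space model mirrors the general shape of a 2.6-era struct notifier_block (a callback, a next pointer and a priority) and walks the chain the way a notifier call does, stopping early if a callback sets the stop bit. register_notifier_model(), notify_chain_model(), struct eeh_event_model and count_events() are hypothetical names for this sketch only; a real driver would call eeh_register_notifier() instead.

```c
#include <stddef.h>

/* Modeled on the general layout of the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long action,
			     void *data);
	struct notifier_block *next;
	int priority;
};

#define NOTIFY_DONE      0x0000 /* callback did not care */
#define NOTIFY_OK        0x0001 /* callback handled the event */
#define NOTIFY_STOP_MASK 0x8000 /* stop walking the chain */

/* Hypothetical payload mirroring the text: device, node, state info. */
struct eeh_event_model {
	const char *pci_dev;
	const char *dn;
	int state;
};

static struct notifier_block *eeh_chain;  /* head of the chain */

/* Insert nb, keeping the chain sorted by descending priority. */
void register_notifier_model(struct notifier_block *nb)
{
	struct notifier_block **p = &eeh_chain;

	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
}

/* Deliver one event to each registered callback, in priority order. */
int notify_chain_model(unsigned long action, void *data)
{
	int ret = NOTIFY_DONE;

	for (struct notifier_block *nb = eeh_chain; nb; nb = nb->next) {
		ret = nb->notifier_call(nb, action, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Example receiver (hypothetical): just counts the events it sees. */
int events_seen;

int count_events(struct notifier_block *nb, unsigned long action, void *data)
{
	(void)nb;
	(void)action;
	(void)data;
	events_seen++;
	return NOTIFY_OK;
}
```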

To assist in the recovery of the device, eeh.c exports the
following functions:

rtas_set_slot_reset() -- assert the PCI #RST line for 1/8th of a second
rtas_configure_bridge() -- ask firmware to configure any PCI bridges
   located topologically under the pci slot.
eeh_save_bars() and eeh_restore_bars() -- save and restore the PCI
   config-space info for a device and any devices under it.
 

A handler for the EEH notifier_block events is implemented in
drivers/pci/hotplug/pSeries_pci.c, called handle_eeh_events().
It saves the device BARs and then calls rpaphp_unconfig_pci_adapter().
This last call causes the device driver for the card to be stopped,
which causes hotplug events to go out to user space. This triggers
user-space scripts that might issue commands such as "ifdown eth0"
for ethernet cards, and so on.  This handler then sleeps for 5 seconds,
hoping to give the user-space scripts enough time to complete.
It then resets the PCI card, reconfigures the device BARs, and
any bridges underneath. It then calls rpaphp_enable_pci_slot(),
which restarts the device driver and triggers more user-space
events (for example, calling "ifup eth0" for ethernet cards).
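The ordering of the default handler's steps, in the order the paragraph above gives them, can be summarized as a sketch. Every function below is a hypothetical stub that only records its name; none of them touches hardware, and where the real handler sleeps for about 5 seconds, wait_for_userspace() merely logs.

```c
#include <string.h>

/* Trace buffer so the order of the recovery steps is visible. */
static char trace[256];

static void step(const char *name)
{
	if (trace[0])
		strncat(trace, " > ", sizeof(trace) - strlen(trace) - 1);
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

/* Hypothetical stubs, one per step described in the text. */
static void save_bars(void)          { step("save_bars"); }
static void unconfig_adapter(void)   { step("unconfig"); }  /* driver stops, "remove" events */
static void wait_for_userspace(void) { step("sleep5"); }    /* real handler sleeps ~5 s */
static void slot_reset(void)         { step("reset"); }
static void restore_bars(void)       { step("restore_bars"); }
static void configure_bridges(void)  { step("config_bridges"); }
static void enable_slot(void)        { step("enable"); }    /* driver restarts, "add" events */

/* Run the whole sequence once, starting from an empty trace. */
void handle_eeh_event_model(void)
{
	trace[0] = '\0';
	save_bars();
	unconfig_adapter();
	wait_for_userspace();
	slot_reset();
	restore_bars();
	configure_bridges();
	enable_slot();
}
```

Note that the BARs must be saved before the adapter is unconfigured, since the device's config space is not usable again until after the reset.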


Device Shutdown and User-Space Events
-------------------------------------
This section documents what happens when a pci slot is unconfigured,
focusing on how the device driver gets shut down, and on how the 
events get delivered to user-space scripts.
 
Following is an example sequence of events that causes a device driver's
close function to be called during the first phase of an EEH reset,
using the pcnet32 device driver as the example.

    rpaphp_unconfig_pci_adapter (struct slot *)  // in rpaphp_pci.c
    {
      calls
      pci_remove_bus_device (struct pci_dev *)   // in drivers/pci/remove.c
      {
        calls
        pci_destroy_dev (struct pci_dev *)
        {
          calls
          device_unregister (&dev->dev)          // in drivers/base/core.c
          {
            calls
            device_del (struct device *)
            {
              calls
              bus_remove_device()                // in drivers/base/bus.c
              {
                calls
                device_release_driver()
                {
                  calls
                  struct device_driver->remove() which is just
                  pci_device_remove()            // in drivers/pci/pci_driver.c
                  {
                    calls
                    struct pci_driver->remove() which is just
                    pcnet32_remove_one()         // in drivers/net/pcnet32.c
                    {
                      calls
                      unregister_netdev()        // in net/core/dev.c
                      {
                        calls
                        dev_close()              // in net/core/dev.c
                        {
                          calls dev->stop(),
                          which is just pcnet32_close()  // in pcnet32.c
                          {
                            which does what you wanted
                            to stop the device
                          }
                        }
                      }
                      then frees pcnet32 device driver memory
                    }
                  }
                }
              }
            }
          }
        }
      }
    }


    in drivers/pci/pci_driver.c, 
    struct device_driver->remove() is just pci_device_remove() 
    which calls struct pci_driver->remove() which is pcnet32_remove_one()
    which calls unregister_netdev()  (in net/core/dev.c)
    which calls dev_close()  (in net/core/dev.c) 
    which calls dev->stop() which is pcnet32_close() 
    which then does the appropriate shutdown. 
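The call chain above boils down to generic layers calling into the driver through function pointers. The following toy model reproduces that dispatch pattern in user space: the generic "PCI" layer knows only drv->remove, and the generic "net" layer knows only dev->stop. All toy_* names and the note() tracing helper are hypothetical, and the structs are drastically simplified stand-ins for struct pci_driver and struct net_device.

```c
#include <string.h>

static char calls[256];   /* records the call chain for inspection */

static void note(const char *fn)
{
	strncat(calls, fn, sizeof(calls) - strlen(calls) - 1);
	strncat(calls, ";", sizeof(calls) - strlen(calls) - 1);
}

/* Simplified stand-ins for struct net_device and struct pci_driver. */
struct toy_netdev {
	int up;
	int (*stop)(struct toy_netdev *dev);
};

struct toy_pci_driver {
	void (*remove)(struct toy_netdev *dev);
};

/* Driver-specific close routine (the role pcnet32_close() plays). */
int toy_close(struct toy_netdev *dev)
{
	note("toy_close");
	dev->up = 0;               /* "stop the device" */
	return 0;
}

/* Generic networking layer: reaches the driver only via dev->stop. */
static void toy_dev_close(struct toy_netdev *dev)
{
	note("dev_close");
	if (dev->up)
		dev->stop(dev);
}

static void toy_unregister_netdev(struct toy_netdev *dev)
{
	note("unregister_netdev");
	toy_dev_close(dev);
}

/* Driver-specific remove routine (the role pcnet32_remove_one() plays). */
static void toy_remove_one(struct toy_netdev *dev)
{
	note("remove_one");
	toy_unregister_netdev(dev);
	/* the real driver would free its private memory here */
}

/* Generic PCI layer: knows only the function pointer, not the driver. */
void toy_pci_device_remove(const struct toy_pci_driver *drv,
			   struct toy_netdev *dev)
{
	note("pci_device_remove");
	drv->remove(dev);
}

const struct toy_pci_driver toy_driver = { toy_remove_one };
```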
    
---
Following is the analogous stack trace for events sent to user-space
when the pci device is unconfigured.

rpaphp_unconfig_pci_adapter() {              // in rpaphp_pci.c
  calls
  pci_remove_bus_device (struct pci_dev *) { // in drivers/pci/remove.c
    calls
    pci_destroy_dev (struct pci_dev *) {
      calls
      device_unregister (&dev->dev) {        // in drivers/base/core.c
        calls
        device_del (struct device *dev) {    // in drivers/base/core.c
          calls
          kobject_del() {                    // in lib/kobject.c
            calls
            kobject_hotplug() {              // in lib/kobject.c
              calls
              kset_hotplug() {               // in lib/kobject.c
                calls
                kset->hotplug_ops->hotplug(), which is really just
                a call to
                dev_hotplug() {              // in drivers/base/core.c
                  calls
                  dev->bus->hotplug(), which is really just a call to
                  pci_hotplug() {            // in drivers/pci/hotplug.c
                    which prints device name, etc....
                  }
                }
                then kset_hotplug() calls
                call_usermodehelper() with
                  argv[0] = hotplug_path[], which is "/sbin/hotplug"
                  --> event to userspace
              }
            }
            kobject_del() then calls sysfs_remove_dir(), so that any
            user-space daemon watching sysfs (/sys) would notice
            the delete event.
          }
        }
      }
    }
  }
}
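On the user-space side, /sbin/hotplug is run with the subsystem name as its first argument and the event details in its environment. The sketch below builds such an invocation and looks variables up the way a hotplug script would read $ACTION. struct hotplug_call, build_hotplug_call() and hotplug_getenv() are hypothetical, and the set of environment variables shown is only an approximation; the exact list varied between kernel versions and buses.

```c
#include <stdio.h>
#include <string.h>

#define HOTPLUG_ENV_MAX 5

/* Hypothetical holder for the command line and environment of one event. */
struct hotplug_call {
	const char *argv[3];
	char env_buf[HOTPLUG_ENV_MAX][128];
	const char *envp[HOTPLUG_ENV_MAX + 1];
};

/* Fill in the arguments for a "/sbin/hotplug <subsystem>" invocation. */
void build_hotplug_call(struct hotplug_call *c, const char *subsystem,
			const char *action, const char *devpath,
			unsigned long long seqnum)
{
	size_t sz = sizeof(c->env_buf[0]);
	int n = 0;

	c->argv[0] = "/sbin/hotplug";
	c->argv[1] = subsystem;                 /* e.g. "pci" */
	c->argv[2] = NULL;

	snprintf(c->env_buf[n++], sz, "HOME=/");
	snprintf(c->env_buf[n++], sz, "PATH=/sbin:/bin:/usr/sbin:/usr/bin");
	snprintf(c->env_buf[n++], sz, "ACTION=%s", action);
	snprintf(c->env_buf[n++], sz, "DEVPATH=%s", devpath);
	snprintf(c->env_buf[n++], sz, "SEQNUM=%llu", seqnum);

	for (int i = 0; i < n; i++)
		c->envp[i] = c->env_buf[i];
	c->envp[n] = NULL;                      /* envp is NULL-terminated */
}

/* Return the value of NAME in the environment, or NULL if absent. */
const char *hotplug_getenv(const struct hotplug_call *c, const char *name)
{
	size_t len = strlen(name);

	for (int i = 0; c->envp[i]; i++)
		if (strncmp(c->envp[i], name, len) == 0 && c->envp[i][len] == '=')
			return c->envp[i] + len + 1;
	return NULL;
}
```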
  

Pros and Cons of the Current Design
-----------------------------------
There are several issues with the current EEH software recovery design,
which may be addressed in future revisions.  But first, note that the 
big plus of the current design is that no changes need to be made to 
individual device drivers, so that the current design throws a wide net.
The biggest negative of the design is that it potentially disturbs 
network daemons and file systems that didn't need to be disturbed.

-- A minor complaint is that resetting the network card causes
   back-to-back ifdown/ifup burps in user space that potentially disturb
   network daemons that didn't even need to know that the pci
   card was being rebooted.

-- A more serious concern is that the same reset, for SCSI devices,
   wreaks havoc on mounted file systems.  Scripts cannot unmount a file
   system after the fact without flushing pending buffers, and flushing
   is impossible because I/O has already been stopped.  Thus,
   ideally, the reset should happen at or below the block layer,
   so that the file systems are not disturbed.

   Reiserfs does not tolerate errors returned from the block device.
   Ext3fs seems to be tolerant, retrying reads/writes until it does
   succeed. Both have been only lightly tested in this scenario.

   The SCSI-generic subsystem already has built-in code for performing
   SCSI device resets, SCSI bus resets, and SCSI host-bus-adapter 
   (HBA) resets.  These are cascaded into a chain of attempted 
   resets if a SCSI command fails. These are completely hidden
   from the block layer.  It would be very natural to add an EEH 
   reset into this chain of events.

-- If a SCSI error occurs for the root device, all is lost unless
   the sysadmin had the foresight to run /bin, /sbin, /etc, /var 
   and so on, out of ramdisk/tmpfs.
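The SCSI escalation idea, with an EEH reset added as the last rung, can be sketched as an ordered table of recovery steps tried from least to most disruptive. This is only a model of the proposal: struct recovery_step, escalate(), struct fault and the step functions are hypothetical, and the real SCSI error handler is considerably more involved.

```c
#include <stddef.h>

/* Each recovery step returns nonzero if it cleared the fault. */
typedef int (*recovery_step_fn)(void *ctx);

struct recovery_step {
	const char *name;
	recovery_step_fn fn;
};

/*
 * Try each step in order and stop at the first one that succeeds.
 * Returns the index of the step that recovered the device, or -1.
 */
int escalate(const struct recovery_step *steps, size_t n, void *ctx)
{
	for (size_t i = 0; i < n; i++)
		if (steps[i].fn(ctx))
			return (int)i;
	return -1;
}

/* Hypothetical fault model: the minimum reset level that clears it. */
struct fault {
	int needs_level;
	int attempts;
};

static int try_level(void *ctx, int level)
{
	struct fault *f = ctx;

	f->attempts++;
	return f->needs_level <= level;
}

static int device_reset(void *ctx)   { return try_level(ctx, 0); }
static int bus_reset(void *ctx)      { return try_level(ctx, 1); }
static int host_reset(void *ctx)     { return try_level(ctx, 2); }
static int eeh_slot_reset(void *ctx) { return try_level(ctx, 3); } /* proposed */

const struct recovery_step scsi_ladder[] = {
	{ "device reset",   device_reset },
	{ "bus reset",      bus_reset },
	{ "host reset",     host_reset },
	{ "EEH slot reset", eeh_slot_reset },
};
```

Because the whole ladder runs below the block layer, a fault cleared by any rung, including the EEH one, would never be visible to the mounted file systems.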


Conclusions
-----------
There's forward progress ... 


From nacc at us.ibm.com  Tue Jan 18 10:50:05 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Mon, 17 Jan 2005 15:50:05 -0800
Subject: [PATCH 16/21] ppc64/iSeries_pci_reset: replace schedule_timeout()
	with msleep()
Message-ID: <20050117235005.GY24698@us.ibm.com>

Hi,

Please consider applying. 

Description: Use msleep() instead of schedule_timeout() to guarantee the task
delays as expected. The code is not wrong as is, but I see two benefits to using
msleep(): 1) real time delays (milliseconds) and 2) consistency across the
kernel with respect to longer delays. Change the units of the WaitDelay and
AssertDelay constants accordingly.
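As a sanity check on the new units (AssertTime and DelayTime are in tenths of a second, judging by the old (AssertTime * HZ) / 10 expression), the helpers below compute the new millisecond values and the old jiffy counts so the two can be compared for a given HZ. The function names are hypothetical and exist only for this illustration.

```c
/* New delays in milliseconds; an argument of 0 selects the default. */
unsigned int assert_delay_ms(int assert_time)
{
	return assert_time == 0 ? 500 : (unsigned int)assert_time * 100;
}

unsigned int wait_delay_ms(int delay_time)
{
	return delay_time == 0 ? 3000 : (unsigned int)delay_time * 100;
}

/* The old code's assert delay in jiffies, for comparison. */
unsigned long old_assert_jiffies(int assert_time, unsigned long hz)
{
	return assert_time == 0 ? (5 * hz) / 10
				: ((unsigned long)assert_time * hz) / 10;
}

/* Convert jiffies back to milliseconds (integer, truncating). */
unsigned long jiffies_to_ms_model(unsigned long j, unsigned long hz)
{
	return j * 1000 / hz;
}
```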

Signed-off-by: Nishanth Aravamudan <nacc at us.ibm.com>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/iSeries_pci_reset.c	2005-01-15 16:55:41.000000000 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/iSeries_pci_reset.c	2005-01-15 17:17:54.000000000 -0800
@@ -32,6 +32,7 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/irq.h>
+#include <linux/delay.h>
 
 #include <asm/io.h>
 #include <asm/iSeries/HvCallPci.h>
@@ -49,7 +50,7 @@
 int iSeries_Device_ToggleReset(struct pci_dev *PciDev, int AssertTime,
 		int DelayTime)
 {
-	unsigned long AssertDelay, WaitDelay;
+	unsigned int AssertDelay, WaitDelay;
 	struct iSeries_Device_Node *DeviceNode =
 		(struct iSeries_Device_Node *)PciDev->sysdata;
 
@@ -62,14 +63,14 @@ int iSeries_Device_ToggleReset(struct pc
 	 * Set defaults, Assert is .5 second, Wait is 3 seconds.
 	 */
 	if (AssertTime == 0)
-		AssertDelay = (5 * HZ) / 10;
+		AssertDelay = 500;
 	else
-		AssertDelay = (AssertTime * HZ) / 10;
+		AssertDelay = AssertTime * 100;
 
 	if (DelayTime == 0)
-		WaitDelay = (30 * HZ) / 10;
+		WaitDelay = 3000;
 	else
-		WaitDelay = (DelayTime * HZ) / 10;
+		WaitDelay = DelayTime * 100;
 
 	/*
 	 * Assert reset
@@ -77,8 +78,7 @@ int iSeries_Device_ToggleReset(struct pc
 	DeviceNode->ReturnCode = HvCallPci_setSlotReset(ISERIES_BUS(DeviceNode),
 			0x00, DeviceNode->AgentId, 1);
 	if (DeviceNode->ReturnCode == 0) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(AssertDelay);       /* Sleep for the time */
+		msleep(AssertDelay);			/* Sleep for the time */
 		DeviceNode->ReturnCode =
 			HvCallPci_setSlotReset(ISERIES_BUS(DeviceNode),
 					0x00, DeviceNode->AgentId, 0);
@@ -86,8 +86,7 @@ int iSeries_Device_ToggleReset(struct pc
 		/*
    		 * Wait for device to reset
 		 */
-		set_current_state(TASK_UNINTERRUPTIBLE);  
-		schedule_timeout(WaitDelay);
+		msleep(WaitDelay);
 	}
 	if (DeviceNode->ReturnCode == 0)
 		PCIFR("Slot 0x%04X.%02 Reset\n", ISERIES_BUS(DeviceNode),


From nacc at us.ibm.com  Tue Jan 18 11:15:22 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Mon, 17 Jan 2005 16:15:22 -0800
Subject: [PATCH 17/21] ppc64/pSeries_smp: replace schedule_timeout() with
	msleep()
Message-ID: <20050118001522.GZ24698@us.ibm.com>

Hi,

Please consider applying. 

Description: Use msleep() instead of schedule_timeout() to guarantee the task
delays as expected. The current code is not incorrect, but msleep() is clearer
in terms of the length of delay and helps make the kernel consistent.

Signed-off-by: Nishanth Aravamudan <nacc at us.ibm.com>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/pSeries_smp.c	2005-01-15 16:55:41.000000000 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/pSeries_smp.c	2005-01-15 17:21:12.000000000 -0800
@@ -107,8 +107,7 @@ void pSeries_cpu_die(unsigned int cpu)
 		cpu_status = query_cpu_stopped(pcpu);
 		if (cpu_status == 0 || cpu_status == -1)
 			break;
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(HZ/5);
+		msleep(200);
 	}
 	if (cpu_status != 0) {
 		printk("Querying DEAD? cpu %i (%i) shows %i\n",


From nacc at us.ibm.com  Tue Jan 18 11:18:19 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Mon, 17 Jan 2005 16:18:19 -0800
Subject: [PATCH 18/21] ppc64/rtasd: replace schedule_timeout() with msleep()
Message-ID: <20050118001819.GA24698@us.ibm.com>

Hi,

Please consider applying. 

Description: Replace schedule_timeout() with msleep()/ssleep(). In both cases,
the current code sleeps in TASK_INTERRUPTIBLE but does not account for early
wakeups due to signals being caught; therefore I have used TASK_UNINTERRUPTIBLE
sleeps in both cases. The second sleep is slightly more difficult to convert as
rtas_event_scan_rate is variable. I have left it as an msleep() call, although
ssleep() may be more appropriate.
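The second conversion can differ slightly from the original because of integer truncation. The sketch below (hypothetical helper names, with HZ passed in as a parameter) compares the old jiffy-based delay with the new msleep() argument: with HZ=1000 and a scan rate of 10 they agree at 3000 ms, while with HZ=100 and a scan rate of 7 the old expression works out to 4280 ms against 4285 ms for the new one.

```c
/* Old rtasd delay, in jiffies: half of (60 s / rate). */
unsigned long old_delay_jiffies(unsigned long hz, unsigned long rate)
{
	return (hz * 60 / rate) / 2;
}

/* New rtasd delay, in milliseconds, as passed to msleep(). */
unsigned long new_delay_ms(unsigned long rate)
{
	return 30000 / rate;
}

/* Convert a jiffy count to milliseconds (integer, truncating). */
unsigned long jiffies_to_ms_model(unsigned long j, unsigned long hz)
{
	return j * 1000 / hz;
}
```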

Signed-off-by: Nishanth Aravamudan <nacc at us.ibm.com>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/rtasd.c	2005-01-15 16:55:41.000000000 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/rtasd.c	2005-01-15 17:28:50.000000000 -0800
@@ -19,6 +19,7 @@
 #include <linux/vmalloc.h>
 #include <linux/spinlock.h>
 #include <linux/cpu.h>
+#include <linux/delay.h>
 
 #include <asm/uaccess.h>
 #include <asm/io.h>
@@ -444,8 +445,7 @@ static int rtasd(void *unused)
 		DEBUG("watchdog scheduled on cpu %d\n", smp_processor_id());
 
 		do_event_scan(event_scan);
-		set_current_state(TASK_INTERRUPTIBLE);
-		schedule_timeout(HZ);
+		ssleep(1);
 	}
 	unlock_cpu_hotplug();
 
@@ -466,8 +466,7 @@ static int rtasd(void *unused)
 		 * one second since some machines have problems if we
 		 * call event-scan too quickly). */
 		unlock_cpu_hotplug();
-		set_current_state(TASK_INTERRUPTIBLE);
-		schedule_timeout((HZ*60/rtas_event_scan_rate) / 2);
+		msleep(30000/rtas_event_scan_rate);
 		lock_cpu_hotplug();
 
 		cpu = next_cpu(cpu, cpu_online_map);


From nacc at us.ibm.com  Tue Jan 18 11:20:13 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Mon, 17 Jan 2005 16:20:13 -0800
Subject: [PATCH 19/21] ppc64/smp: replace schedule_timeout() with msleep()
Message-ID: <20050118002013.GB24698@us.ibm.com>

Hi,

Please consider applying. 

Description: Use msleep() instead of schedule_timeout() to guarantee the task
delays as expected. The current code is not incorrect; however using msleep()
encourages using real time-unit sleeps and keeps the kernel consistent.

Signed-off-by: Nishanth Aravamudan <nacc at us.ibm.com>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/smp.c	2005-01-15 16:55:41.000000000 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/smp.c	2005-01-15 17:30:16.000000000 -0800
@@ -459,8 +459,7 @@ int __devinit __cpu_up(unsigned int cpu)
 		 * hotplug case.  Wait five seconds.
 		 */
 		for (c = 25; c && !cpu_callin_map[cpu]; c--) {
-			set_current_state(TASK_UNINTERRUPTIBLE);
-			schedule_timeout(HZ/5);
+			msleep(200);
 		}
 #endif
 

From nacc at us.ibm.com  Tue Jan 18 11:21:30 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Mon, 17 Jan 2005 16:21:30 -0800
Subject: [PATCH 20/21] ppc64/traps: replace schedule_timeout() with ssleep()
Message-ID: <20050118002130.GC24698@us.ibm.com>

Hi,

Please consider applying. 

Description:  Use ssleep() instead of schedule_timeout() to guarantee the task
delays as expected. The current code is not incorrect, but using ssleep()
encourages specifying delays in real time-units and consistency across the
kernel.

Signed-off-by: Nishanth Aravamudan <nacc at us.ibm.com>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/traps.c	2005-01-15 16:55:41.000000000 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/traps.c	2005-01-15 17:30:39.000000000 -0800
@@ -29,6 +29,7 @@
 #include <linux/interrupt.h>
 #include <linux/init.h>
 #include <linux/module.h>
+#include <linux/delay.h>
 #include <asm/kdebug.h>
 
 #include <asm/pgtable.h>
@@ -137,8 +138,7 @@ int die(const char *str, struct pt_regs 
 
 	if (panic_on_oops) {
 		printk(KERN_EMERG "Fatal exception: panic in 5 seconds\n");
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(5 * HZ);
+		ssleep(5);
 		panic("Fatal exception");
 	}
 	do_exit(SIGSEGV);


From benh at kernel.crashing.org  Tue Jan 18 11:49:15 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 18 Jan 2005 11:49:15 +1100
Subject: [PATCH] PPC64 pmac hotplug cpu
In-Reply-To: <41EBD662.1080409@nortelnetworks.com>
References: <Pine.LNX.4.61.0501122341410.23299@montezuma.fsmlabs.com>
	<1105827794.27410.82.camel@gaston>
	<Pine.LNX.4.61.0501162129380.3010@montezuma.fsmlabs.com>
	<1105937266.4534.0.camel@gaston> <41EBD662.1080409@nortelnetworks.com>
Message-ID: <1106009355.4533.19.camel@gaston>

On Mon, 2005-01-17 at 09:14 -0600, Chris Friesen wrote:
> Benjamin Herrenschmidt wrote:
> 
> > Well.. the cache flush part requires some not-really-documentd stuff on
> > the 970, but I'll try to come up with something.
> 
> Details?  We've got a cache-flush routine put together based on the 
> documentation that seems to be working, but if there's something else 
> that has to be done I'd love to know about it.

Well, I don't have all the details at hand right now, but it involves
using SCOM (with appropriate workarounds for CPU SCOM bugs on some
970's) to switch the L2 to direct addressing iirc.

Ben.


From rusty at rustcorp.com.au  Tue Jan 18 13:20:03 2005
From: rusty at rustcorp.com.au (Rusty Russell)
Date: Tue, 18 Jan 2005 13:20:03 +1100
Subject: [PATCH] Fix kallsyms/insmod/rmmod race
In-Reply-To: <31453.1105979239@redhat.com>
References: <31453.1105979239@redhat.com>
Message-ID: <1106014803.30801.22.camel@localhost.localdomain>

On Mon, 2005-01-17 at 16:27 +0000, David Howells wrote:
> The attached patch fixes a race between kallsyms and insmod/rmmod.

Hi David,

	The more I looked at this, the more I warmed to it.  I've known for a
while that people are using kallsyms not for OOPS (eg. /proc/$$/wchan),
so we should provide a "grabs locks" version, but this solution gets
around that nicely, while making life more certain for the oops case,
too.

Good work!
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman


From dhowells at redhat.com  Wed Jan 19 06:44:28 2005
From: dhowells at redhat.com (David Howells)
Date: Tue, 18 Jan 2005 19:44:28 +0000
Subject: [PATCH] Fix kallsyms/insmod/rmmod race 
In-Reply-To: <1106014803.30801.22.camel@localhost.localdomain> 
References: <1106014803.30801.22.camel@localhost.localdomain>
	<31453.1105979239@redhat.com> 
Message-ID: <1561.1106077468@redhat.com>


Rusty Russell <rusty at rustcorp.com.au> wrote:

> 	The more I looked at this, the more I warmed to it.  I've known for a
> while that people are using kallsyms not for OOPS (eg. /proc/$$/wchan),
> so we should provide a "grabs locks" version, but this solution gets
> around that nicely, while making life more certain for the oops case,
> too.


Hmmm... though it works on i386 SMP, it doesn't, however, seem to work on
ppc64 SMP:-/

My pSeries box seems to think that it can't find any symbols from previously
loaded modules, and my Power5 box is quite happy to load modules that depend
on other modules but panics because it can't mount its root fs.

This is very odd, because the patch is simple enough. Is there anything
obvious I've missed that you can see? Or maybe I'm just misunderstanding how
stop_machine_run() works... maybe it can't be called during initialisation.

David


From benh at kernel.crashing.org  Wed Jan 19 13:54:51 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 19 Jan 2005 13:54:51 +1100
Subject: [PATCH] ppc64/ppc: Cleanup PCI skipping
Message-ID: <1106103291.4500.147.camel@gaston>

Hi !

The g5 code has special hooks to "hide" some PCI devices when they are off.

Currently, this code involves calls to match a pci_dev from the Open Firmware
node and similar things, which cause problems with the latest version of my
sungem driver, which wants to do some of this in atomic context.

This patch switches to a list of struct device_node instead, which also ends up
simplifying the code.

Later, I'll go back to manipulating PCI devices in a clean way when Brian King's
PCI blocking patch gets in, but only after I change sungem again to never call
these in atomic context. This is basically a 3-step transition.
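The core of the change is that the skip test becomes a plain pointer comparison against the device-tree node, with no pci_dev lookup. The sketch below models that check in user space; the minimal struct device_node and the k2_should_skip() helper are hypothetical stand-ins, and the real code additionally orders the skiplist update against the clock change with mb().

```c
#include <stddef.h>

/* Minimal stand-in for struct device_node; only identity matters here. */
struct device_node {
	const char *full_name;
};

/* Up to two powered-down K2 devices (GMAC and FireWire in the patch). */
static struct device_node *k2_skiplist[2];

/*
 * Return 1 if config cycles to this node must be skipped because the
 * device's clock is off.  The test is a pointer comparison only: no
 * lookup, no allocation, no locking.
 */
int k2_should_skip(const struct device_node *dn)
{
	for (int i = 0; i < 2; i++)
		if (dn != NULL && k2_skiplist[i] == dn)
			return 1;
	return 0;
}
```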

Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>

Index: linux-work/arch/ppc64/kernel/pmac_feature.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/pmac_feature.c	2004-11-22 11:49:24.000000000 +1100
+++ linux-work/arch/ppc64/kernel/pmac_feature.c	2005-01-19 13:48:25.000000000 +1100
@@ -111,7 +111,7 @@
 static u32 uninorth_rev __pmacdata;
 static void *u3_ht;
 
-extern struct pci_dev *k2_skiplist[2];
+extern struct device_node *k2_skiplist[2];
 
 /*
  * For each motherboard family, we have a table of functions pointers
@@ -160,30 +160,17 @@
 {
 	struct macio_chip* macio = &macio_chips[0];
 	unsigned long flags;
-	struct pci_dev *pdev = NULL;
 
 	if (node == NULL)
 		return -ENODEV;
 
-	/* XXX FIXME: We should fix pci_device_from_OF_node here, and
-	 * get to a real pci_dev or we'll get into trouble with PCI
-	 * domains the day we get overlapping numbers (like if we ever
-	 * decide to show the HT root.
-	 * Note that we only get the slot when value is 0. This is called
-	 * early during boot with value 1 to enable all devices, at which
-	 * point, we don't yet have probed pci_find_slot, so it would fail
-	 * to look for the slot at this point.
-	 */
-	if (!value)
-		pdev = pci_find_slot(node->busno, node->devfn);
-
 	LOCK(flags);
 	if (value) {
 		MACIO_BIS(KEYLARGO_FCR1, K2_FCR1_GMAC_CLK_ENABLE);
 		mb();
 		k2_skiplist[0] = NULL;
 	} else {
-		k2_skiplist[0] = pdev;
+		k2_skiplist[0] = node;
 		mb();
 		MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_GMAC_CLK_ENABLE);
 	}
@@ -198,30 +185,17 @@
 {
 	struct macio_chip* macio = &macio_chips[0];
 	unsigned long flags;
-	struct pci_dev *pdev = NULL;
 
-	/* XXX FIXME: We should fix pci_device_from_OF_node here, and
-	 * get to a real pci_dev or we'll get into trouble with PCI
-	 * domains the day we get overlapping numbers (like if we ever
-	 * decide to show the HT root
-	 * Note that we only get the slot when value is 0. This is called
-	 * early during boot with value 1 to enable all devices, at which
-	 * point, we don't yet have probed pci_find_slot, so it would fail
-	 * to look for the slot at this point.
-	 */
 	if (node == NULL)
 		return -ENODEV;
 
-	if (!value)
-		pdev = pci_find_slot(node->busno, node->devfn);
-
 	LOCK(flags);
 	if (value) {
 		MACIO_BIS(KEYLARGO_FCR1, K2_FCR1_FW_CLK_ENABLE);
 		mb();
 		k2_skiplist[1] = NULL;
 	} else {
-		k2_skiplist[1] = pdev;
+		k2_skiplist[1] = node;
 		mb();
 		MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_FW_CLK_ENABLE);
 	}
Index: linux-work/arch/ppc64/kernel/pmac_pci.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/pmac_pci.c	2005-01-14 08:17:11.000000000 +1100
+++ linux-work/arch/ppc64/kernel/pmac_pci.c	2005-01-19 13:44:50.000000000 +1100
@@ -43,7 +43,7 @@
  * assuming we won't have both UniNorth and Bandit */
 static int has_uninorth;
 static struct pci_controller *u3_agp;
-struct pci_dev *k2_skiplist[2];
+struct device_node *k2_skiplist[2];
 
 static int __init fixup_one_level_bus_range(struct device_node *node, int higher)
 {
@@ -233,15 +233,6 @@
 	struct device_node *busdn, *dn;
 	int i;
 
-	/*
-	 * When a device in K2 is powered down, we die on config
-	 * cycle accesses. Fix that here.
-	 */
-	for (i=0; i<2; i++)
-		if (k2_skiplist[i] && k2_skiplist[i]->bus == bus &&
-		    k2_skiplist[i]->devfn == devfn)
-			return 1;
-
 	/* We only allow config cycles to devices that are in OF device-tree
 	 * as we are apparently having some weird things going on with some
 	 * revs of K2 on recent G5s
@@ -256,6 +247,14 @@
 	if (dn == NULL)
 		return -1;
 
+	/*
+	 * When a device in K2 is powered down, we die on config
+	 * cycle accesses. Fix that here.
+	 */
+	for (i=0; i<2; i++)
+		if (k2_skiplist[i] == dn)
+			return 1;
+
 	return 0;
 }
 
Index: linux-work/arch/ppc/platforms/pmac_feature.c
===================================================================
--- linux-work.orig/arch/ppc/platforms/pmac_feature.c	2005-01-18 17:50:10.000000000 +1100
+++ linux-work/arch/ppc/platforms/pmac_feature.c	2005-01-19 13:46:06.000000000 +1100
@@ -56,7 +56,7 @@
 #endif
 
 extern int powersave_nap;
-extern struct pci_dev *k2_skiplist[2];
+extern struct device_node *k2_skiplist[2];
 
 
 /*
@@ -1328,16 +1328,6 @@
 {
 	struct macio_chip* macio = &macio_chips[0];
 	unsigned long flags;
-	struct pci_dev *pdev;
-	u8 pbus, pid;
-
-	/* XXX FIXME: We should fix pci_device_from_OF_node here, and
-	 * get to a real pci_dev or we'll get into trouble with PCI
-	 * domains the day we get overlapping numbers (like if we ever
-	 * decide to show the HT root
-	 */
-	if (pci_device_from_OF_node(node, &pbus, &pid) == 0)
-		pdev = pci_find_slot(pbus, pid);
 
 	LOCK(flags);
 	if (value) {
@@ -1345,7 +1335,7 @@
 		mb();
 		k2_skiplist[0] = NULL;
 	} else {
-		k2_skiplist[0] = pdev;
+		k2_skiplist[0] = node;
 		mb();
 		MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_GMAC_CLK_ENABLE);
 	}
@@ -1361,16 +1351,6 @@
 {
 	struct macio_chip* macio = &macio_chips[0];
 	unsigned long flags;
-	struct pci_dev *pdev;
-	u8 pbus, pid;
-
-	/* XXX FIXME: We should fix pci_device_from_OF_node here, and
-	 * get to a real pci_dev or we'll get into trouble with PCI
-	 * domains the day we get overlapping numbers (like if we ever
-	 * decide to show the HT root
-	 */
-	if (pci_device_from_OF_node(node, &pbus, &pid) == 0)
-		pdev = pci_find_slot(pbus, pid);
 
 	LOCK(flags);
 	if (value) {
@@ -1378,7 +1358,7 @@
 		mb();
 		k2_skiplist[1] = NULL;
 	} else {
-		k2_skiplist[1] = pdev;
+		k2_skiplist[1] = node;
 		mb();
 		MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_FW_CLK_ENABLE);
 	}
Index: linux-work/arch/ppc/platforms/pmac_pci.c
===================================================================
--- linux-work.orig/arch/ppc/platforms/pmac_pci.c	2005-01-18 17:50:11.000000000 +1100
+++ linux-work/arch/ppc/platforms/pmac_pci.c	2005-01-19 13:46:58.000000000 +1100
@@ -52,7 +52,7 @@
 extern u8 pci_cache_line_size;
 extern int pcibios_assign_bus_offset;
 
-struct pci_dev *k2_skiplist[2];
+struct device_node *k2_skiplist[2];
 
 /*
  * Magic constants for enabling cache coherency in the bandit/PSX bridge.
@@ -325,8 +325,7 @@
 	 * cycle accesses. Fix that here.
 	 */
 	for (i=0; i<2; i++)
-		if (k2_skiplist[i] && k2_skiplist[i]->bus == bus &&
-		    k2_skiplist[i]->devfn == devfn) {
+		if (k2_skiplist[i] == np) { 
 			switch (len) {
 			case 1:
 				*val = 0xff; break;
@@ -375,8 +374,7 @@
 	 * cycle accesses. Fix that here.
 	 */
 	for (i=0; i<2; i++)
-		if (k2_skiplist[i] && k2_skiplist[i]->bus == bus &&
-		    k2_skiplist[i]->devfn == devfn)
+		if (k2_skiplist[i] == np)
 			return PCIBIOS_SUCCESSFUL;
 
 	addr = u3_ht_cfg_access(hose, bus->number, devfn, offset);


From anton at samba.org  Wed Jan 19 15:12:30 2005
From: anton at samba.org (Anton Blanchard)
Date: Wed, 19 Jan 2005 15:12:30 +1100
Subject: [PATCH] ppc64: Minimum hashtable size
Message-ID: <20050119041230.GB21682@krispykreme.ozlabs.ibm.com>


From: Milton Miller <miltonm at bga.com>

We weren't enforcing the minimum hardware MMU hashtable size.
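To see what the new floor means, the sketch below reproduces the sizing arithmetic from the hunk in user space: round memory up to a power of two, allow one PTEG per two 4 KB pages, never fewer than 2^11 PTEGs, and 128 bytes per PTEG (hence the << 7). The helper names are hypothetical. With the floor in place the smallest hash table is 2^18 bytes (256 KB); an 8 MB machine, for example, would otherwise get only 1024 PTEGs.

```c
/* Integer log2 of a nonzero value (index of the highest set bit). */
static unsigned int ilog2_ul(unsigned long x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/*
 * log2 of the hash table size in bytes for mem_size bytes of RAM.
 * Assumes mem_size > 0 and that rounding up fits in an unsigned long.
 */
unsigned int htab_size_log2(unsigned long mem_size)
{
	unsigned long rnd_mem_size = 1UL << ilog2_ul(mem_size);
	unsigned long pteg_count;

	if (rnd_mem_size < mem_size)
		rnd_mem_size <<= 1;                 /* round up to a power of 2 */

	pteg_count = rnd_mem_size >> (12 + 1);  /* # pages / 2 */
	if (pteg_count < (1UL << 11))
		pteg_count = 1UL << 11;             /* the enforced minimum */

	return ilog2_ul(pteg_count << 7);       /* 128-byte PTEGs */
}
```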

Signed-off-by: Milton Miller <miltonm at bga.com>
Signed-off-by: Anton Blanchard <anton at samba.org>

diff -puN arch/ppc64/kernel/prom.c~minimum_hashtable_size arch/ppc64/kernel/prom.c
--- foobar2/arch/ppc64/kernel/prom.c~minimum_hashtable_size	2005-01-19 15:06:47.729610075 +1100
+++ foobar2-anton/arch/ppc64/kernel/prom.c	2005-01-19 15:07:06.577082744 +1100
@@ -1055,7 +1055,7 @@ void __init early_init_devtree(void *par
 			rnd_mem_size <<= 1;
 
 		/* # pages / 2 */
-		pteg_count = (rnd_mem_size >> (12 + 1));
+		pteg_count = max(rnd_mem_size >> (12 + 1), 1UL << 11);
 
 		ppc64_pft_size = __ilog2(pteg_count << 7);
 	}
_


From benh at kernel.crashing.org  Wed Jan 19 15:31:55 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 19 Jan 2005 15:31:55 +1100
Subject: vDSO update
Message-ID: <1106109115.4499.171.camel@gaston>

I posted a new vDSO patch at http://gate.crashing.org/~benh/ppc64-vdso-20050119.diff

Now both the 32-bit and 64-bit vDSOs are linked at "0" and export symbols as offsets
to functions rather than real function symbols (I made them consistent), and I
updated the patch to apply against current Linus bk.

-- 
Benjamin Herrenschmidt <benh at kernel.crashing.org>


From sfr at canb.auug.org.au  Wed Jan 19 15:48:57 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Wed, 19 Jan 2005 15:48:57 +1100
Subject: [PATCH] htab code cleanup
In-Reply-To: <1105828597.27435.88.camel@gaston>
References: <20050106145102.0c3c60ad.sfr@canb.auug.org.au>
	<1105828597.27435.88.camel@gaston>
Message-ID: <20050119154857.7cec8fbb.sfr@canb.auug.org.au>

On Sun, 16 Jan 2005 09:36:37 +1100 Benjamin Herrenschmidt <benh at kernel.crashing.org> wrote:
>
> On Thu, 2005-01-06 at 14:51 +1100, Stephen Rothwell wrote:
> > Hi all,
> > 
> > This patch just does some small clean ups on the hash page table code
> > 	- make htab_address static within htab_native.c
> > 	- move some code that depended on CONFIG_PPC_MULTIPLATFORM
> > 	  from htab_utils.c to htab_native.c (one less CONFIG check).
> > 	- clean up includes in htab_utils.c
> 
> I don't see the point of moving create_pte_mapping() and
> htab_initialize() to htab_native.c since it contains code for both
> native and non-native...
> 
> If you want to get rid of the htab_address, then maybe split
> htab_initialize in bits... like htab_native_init() and htab_plpar_init()
> for the early ptr setup, that sort of thing ...

OK, how about this one, then?  This has been built and booted on
iSeries, pSeries (bare metal and lpar) and a G5 (with and without iommu).

-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk.new/arch/ppc64/kernel/iSeries_setup.c linus-bk-sfr.14.new/arch/ppc64/kernel/iSeries_setup.c
--- linus-bk.new/arch/ppc64/kernel/iSeries_setup.c	2005-01-09 10:05:39.000000000 +1100
+++ linus-bk-sfr.14.new/arch/ppc64/kernel/iSeries_setup.c	2005-01-18 16:46:06.000000000 +1100
@@ -477,12 +477,6 @@
 	htab_hash_mask = num_ptegs - 1;
 	
 	/*
-	 * The actual hashed page table is in the hypervisor,
-	 * we have no direct access
-	 */
-	htab_address = NULL;
-
-	/*
 	 * Determine if absolute memory has any
 	 * holes so that we can interpret the
 	 * access map we get back from the hypervisor
diff -ruN linus-bk.new/arch/ppc64/kernel/setup.c linus-bk-sfr.14.new/arch/ppc64/kernel/setup.c
--- linus-bk.new/arch/ppc64/kernel/setup.c	2005-01-09 10:05:39.000000000 +1100
+++ linus-bk-sfr.14.new/arch/ppc64/kernel/setup.c	2005-01-18 16:46:23.000000000 +1100
@@ -674,7 +674,6 @@
 			ppc64_caches.dline_size);
 	printk("ppc64_caches.icache_line_size = 0x%x\n",
 			ppc64_caches.iline_size);
-	printk("htab_address                  = 0x%p\n", htab_address);
 	printk("htab_hash_mask                = 0x%lx\n", htab_hash_mask);
 	printk("-----------------------------------------------------\n");
 
diff -ruN linus-bk.new/arch/ppc64/mm/Makefile linus-bk-sfr.14.new/arch/ppc64/mm/Makefile
--- linus-bk.new/arch/ppc64/mm/Makefile	2004-09-24 15:23:06.000000000 +1000
+++ linus-bk-sfr.14.new/arch/ppc64/mm/Makefile	2005-01-18 18:28:57.000000000 +1100
@@ -8,4 +8,4 @@
 	slb_low.o slb.o stab.o mmap.o
 obj-$(CONFIG_DISCONTIGMEM) += numa.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-obj-$(CONFIG_PPC_MULTIPLATFORM) += hash_native.o
+obj-$(CONFIG_PPC_MULTIPLATFORM) += hash_multi.o hash_native.o
diff -ruN linus-bk.new/arch/ppc64/mm/hash_multi.c linus-bk-sfr.14.new/arch/ppc64/mm/hash_multi.c
--- linus-bk.new/arch/ppc64/mm/hash_multi.c	1970-01-01 10:00:00.000000000 +1000
+++ linus-bk-sfr.14.new/arch/ppc64/mm/hash_multi.c	2005-01-18 18:27:48.000000000 +1100
@@ -0,0 +1,177 @@
+/*
+ * multiplatform hashtable management.
+ *
+ * SMP scalability work:
+ *    Copyright (C) 2001 Anton Blanchard <anton at au.ibm.com>, IBM
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <linux/config.h>
+
+#include <asm/abs_addr.h>
+#include <asm/machdep.h>
+#include <asm/mmu.h>
+#include <asm/mmu_context.h>
+#include <asm/pgtable.h>
+#include <asm/cputable.h>
+#include <asm/ppcdebug.h>
+#include <asm/page.h>
+#include <asm/systemcfg.h>
+#include <asm/processor.h>
+#include <asm/lmb.h>
+#include <asm/segment.h>
+
+#ifdef DEBUG
+#define DBG(fmt...) udbg_printf(fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+/*
+ * Note:  pte   --> Linux PTE
+ *        HPTE  --> PowerPC Hashed Page Table Entry
+ *
+ * Execution context:
+ *   htab_initialize is called with the MMU off (of course), but
+ *   the kernel has been copied down to zero so it can directly
+ *   reference global data.  At this point it is very difficult
+ *   to print debug info.
+ *
+ */
+
+#ifdef CONFIG_U3_DART
+extern unsigned long dart_tablebase;
+#endif /* CONFIG_U3_DART */
+
+#define KB (1024)
+#define MB (1024*KB)
+
+static inline void loop_forever(void)
+{
+	volatile unsigned long x = 1;
+	for(;x;x|=1)
+		;
+}
+
+static inline void create_pte_mapping(unsigned long start, unsigned long end,
+				      unsigned long mode, int large)
+{
+	unsigned long addr;
+	unsigned int step;
+
+	if (large)
+		step = 16*MB;
+	else
+		step = 4*KB;
+
+	for (addr = start; addr < end; addr += step) {
+		unsigned long vpn, hash, hpteg;
+		unsigned long vsid = get_kernel_vsid(addr);
+		unsigned long va = (vsid << 28) | (addr & 0xfffffff);
+		int ret;
+
+		if (large)
+			vpn = va >> HPAGE_SHIFT;
+		else
+			vpn = va >> PAGE_SHIFT;
+
+		hash = hpt_hash(vpn, large);
+
+		hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
+
+#ifdef CONFIG_PPC_PSERIES
+		if (systemcfg->platform & PLATFORM_LPAR)
+			ret = pSeries_lpar_hpte_insert(hpteg, va,
+				virt_to_abs(addr) >> PAGE_SHIFT,
+				0, mode, 1, large);
+		else
+#endif /* CONFIG_PPC_PSERIES */
+			ret = native_hpte_insert(hpteg, va,
+				virt_to_abs(addr) >> PAGE_SHIFT,
+				0, mode, 1, large);
+
+		if (ret == -1) {
+			ppc64_terminate_msg(0x20, "create_pte_mapping");
+			loop_forever();
+		}
+	}
+}
+
+void __init htab_initialize(void)
+{
+	unsigned long htab_size_bytes;
+	unsigned long pteg_count;
+	unsigned long mode_rw;
+	int i, use_largepages = 0;
+
+	DBG(" -> htab_initialize()\n");
+
+	/*
+	 * Calculate the required size of the htab.  We want the number of
+	 * PTEGs to equal one half the number of real pages.
+	 */ 
+	htab_size_bytes = 1UL << ppc64_pft_size;
+	pteg_count = htab_size_bytes >> 7;
+
+	/* For debug, make the HTAB 1/8 as big as it normally would be. */
+	ifppcdebug(PPCDBG_HTABSIZE) {
+		pteg_count >>= 3;
+		htab_size_bytes = pteg_count << 7;
+	}
+
+	htab_hash_mask = pteg_count - 1;
+
+#ifdef CONFIG_PPC_PSERIES
+	if (!(systemcfg->platform & PLATFORM_LPAR))
+#endif
+		if (native_htab_initialize(htab_size_bytes, pteg_count))
+			loop_forever();
+
+	mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX;
+
+	/* On U3 based machines, we need to reserve the DART area and
+	 * _NOT_ map it to avoid cache paradoxes as it's remapped non
+	 * cacheable later on
+	 */
+	if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE)
+		use_largepages = 1;
+
+	/* create bolted the linear mapping in the hash table */
+	for (i = 0; i < lmb.memory.cnt; i++) {
+		unsigned long base, size;
+
+		base = lmb.memory.region[i].physbase + KERNELBASE;
+		size = lmb.memory.region[i].size;
+
+		DBG("creating mapping for region: %lx : %lx\n", base, size);
+
+#ifdef CONFIG_U3_DART
+		/* Do not map the DART space. Fortunately, it will be aligned
+		 * in such a way that it will not cross two lmb regions and will
+		 * fit within a single 16Mb page.
+		 * The DART space is assumed to be a full 16Mb region even if we
+		 * only use 2Mb of that space. We will use more of it later for
+		 * AGP GART. We have to use a full 16Mb large page.
+		 */
+		DBG("DART base: %lx\n", dart_tablebase);
+
+		if (dart_tablebase != 0 && dart_tablebase >= base
+		    && dart_tablebase < (base + size)) {
+			if (base != dart_tablebase)
+				create_pte_mapping(base, dart_tablebase, mode_rw,
+						   use_largepages);
+			if ((base + size) > (dart_tablebase + 16*MB))
+				create_pte_mapping(dart_tablebase + 16*MB, base + size,
+						   mode_rw, use_largepages);
+			continue;
+		}
+#endif /* CONFIG_U3_DART */
+		create_pte_mapping(base, base + size, mode_rw, use_largepages);
+	}
+	DBG(" <- htab_initialize()\n");
+}
+#undef KB
+#undef MB
diff -ruN linus-bk.new/arch/ppc64/mm/hash_native.c linus-bk-sfr.14.new/arch/ppc64/mm/hash_native.c
--- linus-bk.new/arch/ppc64/mm/hash_native.c	2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14.new/arch/ppc64/mm/hash_native.c	2005-01-18 18:28:13.000000000 +1100
@@ -22,10 +22,21 @@
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include <asm/cputable.h>
+#include <asm/lmb.h>
+#include <asm/ppcdebug.h>
+
+#ifdef DEBUG
+#define DBG(fmt...) udbg_printf(fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+extern unsigned long _SDR1;
 
 #define HPTE_LOCK_BIT 3
 
 static spinlock_t native_tlbie_lock = SPIN_LOCK_UNLOCKED;
+static HPTE *htab_address;
 
 static inline void native_lock_hpte(HPTE *hptep)
 {
@@ -410,6 +421,33 @@
 }
 #endif
 
+int native_htab_initialize(unsigned long htab_size_bytes,
+		unsigned long pteg_count)
+{
+	unsigned long table;
+
+	/* Find storage for the HPT.  Must be contiguous in
+	 * the absolute address space.
+	 */
+	table = lmb_alloc(htab_size_bytes, htab_size_bytes);
+
+	DBG("Hash table allocated at %lx, size: %lx\n", table, htab_size_bytes);
+
+	if (!table) {
+		ppc64_terminate_msg(0x20, "hpt space");
+		return 1;
+	}
+	htab_address = abs_to_virt(table);
+
+	/* htab absolute addr + encoded htabsize */
+	_SDR1 = table + __ilog2(pteg_count) - 11;
+
+	/* Initialize the HPT with no entries */
+	memset((void *)table, 0, htab_size_bytes);
+
+	return 0;
+}
+
 void hpte_init_native(void)
 {
 	ppc_md.hpte_invalidate	= native_hpte_invalidate;
diff -ruN linus-bk.new/arch/ppc64/mm/hash_utils.c linus-bk-sfr.14.new/arch/ppc64/mm/hash_utils.c
--- linus-bk.new/arch/ppc64/mm/hash_utils.c	2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14.new/arch/ppc64/mm/hash_utils.c	2005-01-06 14:37:27.000000000 +1100
@@ -17,220 +17,29 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  */
-
-#undef DEBUG
-
-#include <linux/config.h>
-#include <linux/spinlock.h>
-#include <linux/errno.h>
+#include <linux/mm.h>
+#include <linux/bitops.h>
+#include <linux/page-flags.h>
 #include <linux/sched.h>
-#include <linux/proc_fs.h>
-#include <linux/stat.h>
-#include <linux/sysctl.h>
-#include <linux/ctype.h>
-#include <linux/cache.h>
-#include <linux/init.h>
+#include <linux/cpumask.h>
+#include <linux/smp.h>
+#include <linux/compiler.h>
+#include <linux/percpu.h>
 #include <linux/signal.h>
 
-#include <asm/ppcdebug.h>
 #include <asm/processor.h>
 #include <asm/pgtable.h>
 #include <asm/mmu.h>
 #include <asm/mmu_context.h>
 #include <asm/page.h>
-#include <asm/types.h>
 #include <asm/system.h>
-#include <asm/uaccess.h>
 #include <asm/machdep.h>
-#include <asm/lmb.h>
-#include <asm/abs_addr.h>
 #include <asm/tlbflush.h>
-#include <asm/io.h>
-#include <asm/eeh.h>
-#include <asm/tlb.h>
 #include <asm/cacheflush.h>
-#include <asm/cputable.h>
-#include <asm/abs_addr.h>
-
-#ifdef DEBUG
-#define DBG(fmt...) udbg_printf(fmt)
-#else
-#define DBG(fmt...)
-#endif
-
-/*
- * Note:  pte   --> Linux PTE
- *        HPTE  --> PowerPC Hashed Page Table Entry
- *
- * Execution context:
- *   htab_initialize is called with the MMU off (of course), but
- *   the kernel has been copied down to zero so it can directly
- *   reference global data.  At this point it is very difficult
- *   to print debug info.
- *
- */
-
-#ifdef CONFIG_U3_DART
-extern unsigned long dart_tablebase;
-#endif /* CONFIG_U3_DART */
+#include <asm/ptrace.h>
 
-HPTE		*htab_address;
 unsigned long	htab_hash_mask;
 
-extern unsigned long _SDR1;
-
-#define KB (1024)
-#define MB (1024*KB)
-
-static inline void loop_forever(void)
-{
-	volatile unsigned long x = 1;
-	for(;x;x|=1)
-		;
-}
-
-#ifdef CONFIG_PPC_MULTIPLATFORM
-static inline void create_pte_mapping(unsigned long start, unsigned long end,
-				      unsigned long mode, int large)
-{
-	unsigned long addr;
-	unsigned int step;
-
-	if (large)
-		step = 16*MB;
-	else
-		step = 4*KB;
-
-	for (addr = start; addr < end; addr += step) {
-		unsigned long vpn, hash, hpteg;
-		unsigned long vsid = get_kernel_vsid(addr);
-		unsigned long va = (vsid << 28) | (addr & 0xfffffff);
-		int ret;
-
-		if (large)
-			vpn = va >> HPAGE_SHIFT;
-		else
-			vpn = va >> PAGE_SHIFT;
-
-		hash = hpt_hash(vpn, large);
-
-		hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
-
-#ifdef CONFIG_PPC_PSERIES
-		if (systemcfg->platform & PLATFORM_LPAR)
-			ret = pSeries_lpar_hpte_insert(hpteg, va,
-				virt_to_abs(addr) >> PAGE_SHIFT,
-				0, mode, 1, large);
-		else
-#endif /* CONFIG_PPC_PSERIES */
-			ret = native_hpte_insert(hpteg, va,
-				virt_to_abs(addr) >> PAGE_SHIFT,
-				0, mode, 1, large);
-
-		if (ret == -1) {
-			ppc64_terminate_msg(0x20, "create_pte_mapping");
-			loop_forever();
-		}
-	}
-}
-
-void __init htab_initialize(void)
-{
-	unsigned long table, htab_size_bytes;
-	unsigned long pteg_count;
-	unsigned long mode_rw;
-	int i, use_largepages = 0;
-
-	DBG(" -> htab_initialize()\n");
-
-	/*
-	 * Calculate the required size of the htab.  We want the number of
-	 * PTEGs to equal one half the number of real pages.
-	 */ 
-	htab_size_bytes = 1UL << ppc64_pft_size;
-	pteg_count = htab_size_bytes >> 7;
-
-	/* For debug, make the HTAB 1/8 as big as it normally would be. */
-	ifppcdebug(PPCDBG_HTABSIZE) {
-		pteg_count >>= 3;
-		htab_size_bytes = pteg_count << 7;
-	}
-
-	htab_hash_mask = pteg_count - 1;
-
-	if (systemcfg->platform & PLATFORM_LPAR) {
-		/* Using a hypervisor which owns the htab */
-		htab_address = NULL;
-		_SDR1 = 0; 
-	} else {
-		/* Find storage for the HPT.  Must be contiguous in
-		 * the absolute address space.
-		 */
-		table = lmb_alloc(htab_size_bytes, htab_size_bytes);
-
-		DBG("Hash table allocated at %lx, size: %lx\n", table,
-		    htab_size_bytes);
-
-		if ( !table ) {
-			ppc64_terminate_msg(0x20, "hpt space");
-			loop_forever();
-		}
-		htab_address = abs_to_virt(table);
-
-		/* htab absolute addr + encoded htabsize */
-		_SDR1 = table + __ilog2(pteg_count) - 11;
-
-		/* Initialize the HPT with no entries */
-		memset((void *)table, 0, htab_size_bytes);
-	}
-
-	mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX;
-
-	/* On U3 based machines, we need to reserve the DART area and
-	 * _NOT_ map it to avoid cache paradoxes as it's remapped non
-	 * cacheable later on
-	 */
-	if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE)
-		use_largepages = 1;
-
-	/* create bolted the linear mapping in the hash table */
-	for (i=0; i < lmb.memory.cnt; i++) {
-		unsigned long base, size;
-
-		base = lmb.memory.region[i].physbase + KERNELBASE;
-		size = lmb.memory.region[i].size;
-
-		DBG("creating mapping for region: %lx : %lx\n", base, size);
-
-#ifdef CONFIG_U3_DART
-		/* Do not map the DART space. Fortunately, it will be aligned
-		 * in such a way that it will not cross two lmb regions and will
-		 * fit within a single 16Mb page.
-		 * The DART space is assumed to be a full 16Mb region even if we
-		 * only use 2Mb of that space. We will use more of it later for
-		 * AGP GART. We have to use a full 16Mb large page.
-		 */
-		DBG("DART base: %lx\n", dart_tablebase);
-
-		if (dart_tablebase != 0 && dart_tablebase >= base
-		    && dart_tablebase < (base + size)) {
-			if (base != dart_tablebase)
-				create_pte_mapping(base, dart_tablebase, mode_rw,
-						   use_largepages);
-			if ((base + size) > (dart_tablebase + 16*MB))
-				create_pte_mapping(dart_tablebase + 16*MB, base + size,
-						   mode_rw, use_largepages);
-			continue;
-		}
-#endif /* CONFIG_U3_DART */
-		create_pte_mapping(base, base + size, mode_rw, use_largepages);
-	}
-	DBG(" <- htab_initialize()\n");
-}
-#undef KB
-#undef MB
-#endif /* CONFIG_PPC_MULTIPLATFORM */
-
 /*
  * Called by asm hashtable.S for doing lazy icache flush
  */
diff -ruN linus-bk.new/include/asm-ppc64/mmu.h linus-bk-sfr.14.new/include/asm-ppc64/mmu.h
--- linus-bk.new/include/asm-ppc64/mmu.h	2005-01-05 17:06:08.000000000 +1100
+++ linus-bk-sfr.14.new/include/asm-ppc64/mmu.h	2005-01-06 14:36:16.000000000 +1100
@@ -98,7 +98,6 @@
 #define PP_RXRX 3	/* Supervisor read,       User read */
 
 
-extern HPTE *		htab_address;
 extern unsigned long	htab_hash_mask;
 
 static inline unsigned long hpt_hash(unsigned long vpn, int large)
diff -ruN linus-bk.new/include/asm-ppc64/pgtable.h linus-bk-sfr.14.new/include/asm-ppc64/pgtable.h
--- linus-bk.new/include/asm-ppc64/pgtable.h	2005-01-02 12:05:23.000000000 +1100
+++ linus-bk-sfr.14.new/include/asm-ppc64/pgtable.h	2005-01-18 17:37:43.000000000 +1100
@@ -523,6 +523,9 @@
 extern long native_hpte_insert(unsigned long hpte_group, unsigned long va,
 			       unsigned long prpn, int secondary,
 			       unsigned long hpteflags, int bolted, int large);
+extern int native_htab_initialize(unsigned long htab_size_bytes,
+				  unsigned long pteg_count);
+
 
 /*
  * find_linux_pte returns the address of a linux pte for a given 
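
The sizing arithmetic and the DART hole handling that this patch moves into hash_multi.c can be checked in isolation. What follows is a user-space sketch, not kernel code. The names htab_geom, htab_geometry(), split_around_dart() and struct range are hypothetical. ilog2_ul() stands in for the kernel's __ilog2(). The 128-byte PTEG size is inferred from the `>> 7` shift in htab_initialize(), and the minimum of 2^11 PTEGs is inferred from the `- 11` in the _SDR1 encoding.

```c
#include <assert.h>

#define MB (1024UL * 1024UL)
#define DART_SIZE (16 * MB)	/* the patch treats the DART as one full 16MB page */

/* Stand-in for the kernel's __ilog2(): floor(log2(x)) for x > 0. */
static unsigned int ilog2_ul(unsigned long x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* Hypothetical container for the values htab_initialize() derives. */
struct htab_geom {
	unsigned long size_bytes;	/* 1UL << ppc64_pft_size */
	unsigned long pteg_count;	/* one PTEG is 128 bytes, hence >> 7 */
	unsigned long hash_mask;	/* pteg_count - 1, selects a PTEG */
	unsigned long sdr1;		/* table address + encoded HTABSIZE */
};

static struct htab_geom htab_geometry(unsigned int pft_size, unsigned long table)
{
	struct htab_geom g;

	g.size_bytes = 1UL << pft_size;
	g.pteg_count = g.size_bytes >> 7;
	g.hash_mask = g.pteg_count - 1;
	/* The encoding below assumes at least 2^11 PTEGs (a 256KB table). */
	assert(g.pteg_count >= 2048);
	/* lmb_alloc(size, size) aligns the table to its own size, so the
	 * low bits of the address are free to hold the size field. */
	assert((table & (g.size_bytes - 1)) == 0);
	g.sdr1 = table + ilog2_ul(g.pteg_count) - 11;
	return g;
}

struct range {
	unsigned long start, end;
};

/* Mirrors the CONFIG_U3_DART branch: cover [base, base + size) but skip
 * the 16MB DART page.  Returns how many ranges were written (0..2). */
static int split_around_dart(unsigned long base, unsigned long size,
			     unsigned long dart, struct range out[2])
{
	int n = 0;

	if (dart == 0 || dart < base || dart >= base + size) {
		out[0].start = base;
		out[0].end = base + size;
		return 1;
	}
	if (base != dart) {
		out[n].start = base;
		out[n].end = dart;
		n++;
	}
	if (base + size > dart + DART_SIZE) {
		out[n].start = dart + DART_SIZE;
		out[n].end = base + size;
		n++;
	}
	return n;
}
```

With a 16MB table (ppc64_pft_size = 24) this gives 131072 PTEGs, a hash mask of 0x1ffff and an HTABSIZE field of 6. A region that starts exactly at the DART and is only 16MB long produces no mapping at all, just as the `continue` in the loop skips it.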

From wangzyu at cn.ibm.com  Wed Jan 19 15:50:39 2005
From: wangzyu at cn.ibm.com (Zhao Yu Wang)
Date: Wed, 19 Jan 2005 12:50:39 +0800
Subject: question about LMB's size
In-Reply-To: <OF69A2C033.F9ACAAD9-ON86256F8C.005A7B66-86256F8C.005BBF20@us.ibm.com>
Message-ID: <OF3731450D.CC3ED7AE-ON48256F8E.0019A3BE-48256F8E.001B2596@cn.ibm.com>


Hi Will,
  Thanks

>> Hi,
>> This is a question about the difference in memory size between the lpar
>> and the HMC.
>>...

>> 2. In lpar didolp2: We get the size of memory is 2174672KB.
>> [root at didolp2 ~]# cat /proc/meminfo
>> MemTotal: 2174672 kB
>>
>> The question is: 2174672/(32*1024) = 66.36572265625

>MemTotal is the amount of free memory in the partition, which does not
>include the memory that holds the kernel code, (bss, data, init).

>There should be a few other pieces of data that will add up to the numbers
>you are looking for.

>in early boot messages, there is a line "SystemCfg->physicalMemorySize =
>0x.......".   This value should be precisely what you are trying to
>measure.

>A bit later in the logs, you can also see a line
>"Memory: XXXXk/YYYYk available (###k kernel code, ###k reserved, ###k data,
>###k bss, ###k init)".
>the YYYYk should also match what you are looking for.

If the system was booted several days ago, the boot log is no longer
available.  Is there any other way to get the physical memory size from
within the lpar?

Could the OS provide a way to obtain the real memory size?  It would help
with dynamically reassigning resources between several partitions
according to their load.
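
The calculation in the question can be reproduced on a running system by reading /proc/meminfo instead of the boot log. What follows is a minimal user-space sketch. memtotal_kb(), UNIT_KB, whole_units() and leftover_kb() are hypothetical names, and the 32MB unit is just the 32*1024 kB divisor used above. As Will points out, MemTotal excludes the kernel's code, data, bss and init memory, so the result is not expected to come out as a whole number of units.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the "MemTotal:" line in a buffer laid out like
 * /proc/meminfo and return its value in kB, or -1 if it is missing. */
static long memtotal_kb(const char *meminfo)
{
	const char *p = strstr(meminfo, "MemTotal:");
	long kb;

	/* The space in the format skips any run of blanks after the colon. */
	if (p == NULL || sscanf(p, "MemTotal: %ld kB", &kb) != 1)
		return -1;
	return kb;
}

/* One unit of memory in kB: the 32*1024 divisor from the question. */
#define UNIT_KB (32L * 1024L)

/* Whole 32MB units covered by a kB value, and what is left over. */
static long whole_units(long kb) { return kb / UNIT_KB; }
static long leftover_kb(long kb) { return kb % UNIT_KB; }
```

Given the MemTotal line from the question, memtotal_kb() returns 2174672, which is 66 whole 32MB units plus 11984 kB.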


Thanks & Best regards,

--------------------------------------------
Wang Zhaoyu

Email: wangzyu at cn.ibm.com
Notes: Zhao Yu Wang/China/Contr/IBM at IBMCN

From sfr at canb.auug.org.au  Wed Jan 19 16:47:53 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Wed, 19 Jan 2005 16:47:53 +1100
Subject: [PATCH] PPC64: remove some unused iSeries functions
Message-ID: <20050119164753.5af63cc5.sfr@canb.auug.org.au>

Hi Linus, Andrew,

This patch removes some unused stuff from PPC64 iSeries:
	- asm-ppc64/iSeries/iSeries_VpdInfo.h
	- iSeries_GetLocationData()
	- LocationData structure
	- device_Location()

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>

-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk/arch/ppc64/kernel/iSeries_VpdInfo.c linus-bk-sfr.16/arch/ppc64/kernel/iSeries_VpdInfo.c
--- linus-bk/arch/ppc64/kernel/iSeries_VpdInfo.c	2004-04-01 06:59:36.000000000 +1000
+++ linus-bk-sfr.16/arch/ppc64/kernel/iSeries_VpdInfo.c	2005-01-19 16:36:40.000000000 +1100
@@ -36,7 +36,6 @@
 #include <asm/iSeries/HvTypes.h>
 #include <asm/iSeries/mf.h>
 #include <asm/iSeries/LparData.h>
-//#include <asm/iSeries/iSeries_VpdInfo.h>
 #include <asm/iSeries/iSeries_pci.h>
 #include "pci.h"
 
@@ -85,30 +84,6 @@
 #define SLOT_ENTRY_SIZE   16
 
 /*
- * Bus, Card, Board, FrameId, CardLocation.
- */
-LocationData* iSeries_GetLocationData(struct pci_dev *PciDev)
-{
-	struct iSeries_Device_Node *DevNode =
-		(struct iSeries_Device_Node *)PciDev->sysdata;
-	LocationData *LocationPtr =
-		(LocationData *)kmalloc(LOCATION_DATA_SIZE, GFP_KERNEL);
-
-	if (LocationPtr == NULL) {
-		printk("PCI: LocationData area allocation failed!\n");
-		return NULL;
-	}
-	memset(LocationPtr, 0, LOCATION_DATA_SIZE);
-	LocationPtr->Bus = ISERIES_BUS(DevNode);
-	LocationPtr->Board = DevNode->Board;
-	LocationPtr->FrameId = DevNode->FrameId;
-	LocationPtr->Card = PCI_SLOT(DevNode->DevFn);
-	strcpy(&LocationPtr->CardLocation[0], &DevNode->CardLocation[0]);
-	return LocationPtr;
-}
-EXPORT_SYMBOL(iSeries_GetLocationData);
-
-/*
  * Formats the device information.
  * - Pass in pci_dev* pointer to the device.
  * - Pass in buffer to place the data.  Danger here is the buffer must
@@ -149,18 +124,6 @@
 }
 
 /*
- * Build a character string of the device location, Frame  1, Card  C10
- */
-int device_Location(struct pci_dev *PciDev, char *BufPtr)
-{
-	struct iSeries_Device_Node *DevNode =
-		(struct iSeries_Device_Node *)PciDev->sysdata;
-	return sprintf(BufPtr, "PCI: Bus%3d, AgentId%3d, Vendor %04X, Location %s",
-		       DevNode->DsaAddr.Dsa.busNumber, DevNode->AgentId,
-		       DevNode->Vendor, DevNode->Location);
-}
-
-/*
  * Parse the Slot Area
  */
 void iSeries_Parse_SlotArea(SlotMap *MapPtr, int MapLen,
diff -ruN linus-bk/include/asm-ppc64/iSeries/iSeries_VpdInfo.h linus-bk-sfr.16/include/asm-ppc64/iSeries/iSeries_VpdInfo.h
--- linus-bk/include/asm-ppc64/iSeries/iSeries_VpdInfo.h	2002-02-14 23:14:36.000000000 +1100
+++ linus-bk-sfr.16/include/asm-ppc64/iSeries/iSeries_VpdInfo.h	1970-01-01 10:00:00.000000000 +1000
@@ -1,56 +0,0 @@
-#ifndef _ISERIES_VPDINFO_H
-#define _ISERIES_VPDINFO_H
-/************************************************************************/
-/* File iSeries_VpdInfo.h created by Allan Trautman Feb 08 2001.        */
-/************************************************************************/
-/* This code supports the location data fon on the IBM iSeries systems. */
-/* Copyright (C) 20yy  <Allan H Trautman> <IBM Corp>                    */
-/*                                                                      */
-/* This program is free software; you can redistribute it and/or modify */
-/* it under the terms of the GNU General Public License as published by */
-/* the Free Software Foundation; either version 2 of the License, or    */
-/* (at your option) any later version.                                  */
-/*                                                                      */
-/* This program is distributed in the hope that it will be useful,      */ 
-/* but WITHOUT ANY WARRANTY; without even the implied warranty of       */
-/* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the        */
-/* GNU General Public License for more details.                         */
-/*                                                                      */
-/* You should have received a copy of the GNU General Public License    */ 
-/* along with this program; if not, write to the:                       */
-/* Free Software Foundation, Inc.,                                      */ 
-/* 59 Temple Place, Suite 330,                                          */ 
-/* Boston, MA  02111-1307  USA                                          */
-/************************************************************************/
-/* Change Activity:                                                     */
-/*   Created, Feg  8, 2001                                              */
-/*   Reformated for Card, March 8, 2001                                 */
-/* End Change Activity                                                  */
-/************************************************************************/
-
-struct pci_dev; 		/* Forward Declare                      */
-/************************************************************************/
-/* Location Data extracted from the VPD list and device info.           */
-/************************************************************************/
-struct LocationDataStruct {	/* Location data structure for device   */
-	u16  Bus;		/* iSeries Bus Number		    0x00*/
-	u16  Board;		/* iSeries Board                    0x02*/
-	u8   FrameId;		/* iSeries spcn Frame Id            0x04*/
-	u8   PhbId;		/* iSeries Phb Location             0x05*/
-	u16  Card;		/* iSeries Card Slot                0x06*/
-	char CardLocation[4];	/* Char format of planar vpd        0x08*/
-	u8   AgentId;		/* iSeries AgentId                  0x0C*/
-	u8   SecondaryAgentId;	/* iSeries Secondary Agent Id       0x0D*/
-	u8   LinuxBus;		/* Linux Bus Number                 0x0E*/
-	u8   LinuxDevFn;	/* Linux Device Function            0x0F*/
-};
-typedef struct LocationDataStruct  LocationData;
-#define LOCATION_DATA_SIZE      16
-
-/************************************************************************/
-/* Protypes                                                             */
-/************************************************************************/
-extern LocationData* iSeries_GetLocationData(struct pci_dev* PciDev);
-extern int           iSeries_Device_Information(struct pci_dev*,char*, int);
-
-#endif /* _ISERIES_VPDINFO_H */
diff -ruN linus-bk/include/asm-ppc64/iSeries/iSeries_pci.h linus-bk-sfr.16/include/asm-ppc64/iSeries/iSeries_pci.h
--- linus-bk/include/asm-ppc64/iSeries/iSeries_pci.h	2004-04-01 06:59:37.000000000 +1000
+++ linus-bk-sfr.16/include/asm-ppc64/iSeries/iSeries_pci.h	2005-01-19 16:33:01.000000000 +1100
@@ -102,27 +102,9 @@
 };
 
 /************************************************************************/
-/* Location Data extracted from the VPD list and device info.           */
-/************************************************************************/
-
-struct LocationDataStruct { 	/* Location data structure for device  */
-	u16  Bus;               /* iSeries Bus Number              0x00*/
-	u16  Board;             /* iSeries Board                   0x02*/
-	u8   FrameId;           /* iSeries spcn Frame Id           0x04*/
-	u8   PhbId;             /* iSeries Phb Location            0x05*/
-	u8   AgentId;           /* iSeries AgentId                 0x06*/
-	u8   Card;
-	char CardLocation[4];      
-};
-
-typedef struct LocationDataStruct  LocationData;
-#define LOCATION_DATA_SIZE      48
-
-/************************************************************************/
 /* Functions                                                            */
 /************************************************************************/
 
-extern LocationData* iSeries_GetLocationData(struct pci_dev* PciDev);
 extern int           iSeries_Device_Information(struct pci_dev*,char*, int);
 extern void          iSeries_Get_Location_Code(struct iSeries_Device_Node*);
 extern int           iSeries_Device_ToggleReset(struct pci_dev* PciDev, int AssertTime, int DelayTime);

From paulus at samba.org  Wed Jan 19 17:06:05 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 19 Jan 2005 17:06:05 +1100
Subject: [PATCH] PPC64: EEH Recovery
In-Reply-To: <20050117201415.GA11505@austin.ibm.com>
References: <20050106192413.GK22274@austin.ibm.com>
	<20050117201415.GA11505@austin.ibm.com>
Message-ID: <16877.63693.915740.385920@cargo.ozlabs.ibm.com>

Linas Vepstas writes:

> p.s.  It was not clear to me if the EEH patch previously sent 
> (6 January 2005, same subject line) will be wending its way into 
> the main Torvalds kernel tree, or not.  I hadn't really gotten
> confirmation one way or another.

I'm not really totally happy with it yet, on a number of fronts:

1. You're adding more PCI-specific stuff to the device_node struct,
   which I don't like.  I would prefer that the device_node tree
   contains basically just what we get from OF, and that we have a
   separate struct for storing ppc64-specific information for each PCI
   device.  Fixing that is outside the scope of your patch, though.

2. I don't see why the device nodes for the PCI subtree being reset
   would go away, and thus I don't see the need for your eeh_cfg_tree
   struct.

3. Is there a good reason why we can't use the assigned-addresses
   property on the relevant device tree nodes to tell us what to set
   the BARs to?

4. I think the 5 second sleep is quite bogus, and shows that we have
   the flow of control wrong.  In particular I think it should be a
   userland write to a sysfs file that kicks off the restart process
   rather than it just happening after 5 seconds.  Anyway, what
   process or thread is executing that 5 second sleep?  Is it keventd
   or something?

5. AFAICS userland will get an unplug notification for the device, but
   nothing to indicate that it is due to an EEH slot isolation event.  I
   think userland should be told about EEH events.

Regards,
Paul.


From mingo at elte.hu  Wed Jan 19 18:44:04 2005
From: mingo at elte.hu (Ingo Molnar)
Date: Wed, 19 Jan 2005 08:44:04 +0100
Subject: [patch] spin-yield-2.6.11-rc1-A1
In-Reply-To: <20050117124209.GA20796@elte.hu>
References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>
	<20050115142537.GD10114@elte.hu>
	<16873.55739.214904.473407@cargo.ozlabs.ibm.com>
	<20050117124209.GA20796@elte.hu>
Message-ID: <20050119074404.GA26768@elte.hu>


* Ingo Molnar <mingo at elte.hu> wrote:

> ok - how about the (raw) patch below? (on top of BK plus the latest
> spin-nicer patch i sent earlier.) It builds/boots on x86 but is
> untested on ppc64.
> 
> the idea is to make spin_yield() a generic function, with some related
> namespace cleanups.

wrong patch... Full patch against BK-curr attached.
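
After the BUILD_LOCK_OPS change, the op name selects the `*_is_locked()` predicate, and the lock type name selects both `locktype##_t` and the `*_yield()` hook. What follows is a user-space sketch of that token pasting. The lock types, the BUILD_WAIT_OP macro and the `yields` counter are hypothetical stand-ins, and the wait is bounded so that the single-threaded demo terminates. read_is_locked() copies the i386 definition in the patch. write_is_locked() is modelled as `lock != RW_LOCK_BIAS`, on the assumption that it matches the old rwlock_is_locked().

```c
#include <assert.h>

/* Toy lock types: user-space stand-ins, not the kernel's definitions. */
typedef struct { int slock; } spinlock_t;	/* 1 = free, <= 0 = held */
typedef struct { int lock; } rwlock_t;		/* i386-style bias counter */

#define RW_LOCK_BIAS 0x01000000	/* readers subtract 1, a writer the whole bias */

/* "Would the trylock fail?" predicates, one per op name. */
#define spin_is_locked(l)	((l)->slock <= 0)
#define read_is_locked(l)	((l)->lock <= 0)		/* a writer holds it */
#define write_is_locked(l)	((l)->lock != RW_LOCK_BIAS)	/* anyone holds it */

/* Yield hooks in the spirit of the asm-generic fallback; instead of
 * calling cpu_relax() they only count how often they were invoked. */
static int yields;
#define spinlock_yield(l)	((void)(l), yields++)
#define rwlock_yield(l)		((void)(l), yields++)

/* Same pasting scheme as the new BUILD_LOCK_OPS(op, locktype). */
#define BUILD_WAIT_OP(op, locktype)					\
static int op##_wait(locktype##_t *lock, int max_spins)		\
{									\
	int spins = 0;							\
									\
	while (op##_is_locked(lock) && spins < max_spins) {		\
		locktype##_yield(lock);					\
		spins++;						\
	}								\
	return spins;							\
}

BUILD_WAIT_OP(spin, spinlock)	/* spin_wait(spinlock_t *, int) */
BUILD_WAIT_OP(read, rwlock)	/* read_wait(rwlock_t *, int) */
BUILD_WAIT_OP(write, rwlock)	/* write_wait(rwlock_t *, int) */
```

With one reader holding the toy rwlock, read_wait() returns immediately while write_wait() keeps yielding. Separate read_is_locked() and write_is_locked() predicates are what make that distinction possible, where a single rwlock_is_locked() could not.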

	Ingo

Signed-off-by: Ingo Molnar <mingo at elte.hu>

--- linux/kernel/exit.c.orig
+++ linux/kernel/exit.c
@@ -861,8 +861,12 @@ task_t fastcall *next_thread(const task_
 #ifdef CONFIG_SMP
 	if (!p->sighand)
 		BUG();
+#ifndef write_is_locked
+# warning please implement read_is_locked()/write_is_locked()!
+# define write_is_locked rwlock_is_locked
+#endif
 	if (!spin_is_locked(&p->sighand->siglock) &&
-				!rwlock_is_locked(&tasklist_lock))
+				!write_is_locked(&tasklist_lock))
 		BUG();
 #endif
 	return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);
--- linux/kernel/spinlock.c.orig
+++ linux/kernel/spinlock.c
@@ -173,8 +173,8 @@ EXPORT_SYMBOL(_write_lock);
  * (We do this in a function because inlining it would be excessive.)
  */
 
-#define BUILD_LOCK_OPS(op, locktype, is_locked_fn)			\
-void __lockfunc _##op##_lock(locktype *lock)				\
+#define BUILD_LOCK_OPS(op, locktype)					\
+void __lockfunc _##op##_lock(locktype##_t *lock)			\
 {									\
 	preempt_disable();						\
 	for (;;) {							\
@@ -183,15 +183,15 @@ void __lockfunc _##op##_lock(locktype *l
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		while (is_locked_fn(lock) && (lock)->break_lock)	\
-			cpu_relax();					\
+		while (op##_is_locked(lock) && (lock)->break_lock)	\
+			locktype##_yield(lock);				\
 		preempt_disable();					\
 	}								\
 }									\
 									\
 EXPORT_SYMBOL(_##op##_lock);						\
 									\
-unsigned long __lockfunc _##op##_lock_irqsave(locktype *lock)		\
+unsigned long __lockfunc _##op##_lock_irqsave(locktype##_t *lock)	\
 {									\
 	unsigned long flags;						\
 									\
@@ -205,8 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		while (is_locked_fn(lock) && (lock)->break_lock)	\
-			cpu_relax();					\
+		while (op##_is_locked(lock) && (lock)->break_lock)	\
+			locktype##_yield(lock);				\
 		preempt_disable();					\
 	}								\
 	return flags;							\
@@ -214,14 +214,14 @@ unsigned long __lockfunc _##op##_lock_ir
 									\
 EXPORT_SYMBOL(_##op##_lock_irqsave);					\
 									\
-void __lockfunc _##op##_lock_irq(locktype *lock)			\
+void __lockfunc _##op##_lock_irq(locktype##_t *lock)			\
 {									\
 	_##op##_lock_irqsave(lock);					\
 }									\
 									\
 EXPORT_SYMBOL(_##op##_lock_irq);					\
 									\
-void __lockfunc _##op##_lock_bh(locktype *lock)				\
+void __lockfunc _##op##_lock_bh(locktype##_t *lock)			\
 {									\
 	unsigned long flags;						\
 									\
@@ -246,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh)
  *         _[spin|read|write]_lock_irqsave()
  *         _[spin|read|write]_lock_bh()
  */
-BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked);
-BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked);
-BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked);
+BUILD_LOCK_OPS(spin, spinlock);
+BUILD_LOCK_OPS(read, rwlock);
+BUILD_LOCK_OPS(write, rwlock);
 
 #endif /* CONFIG_PREEMPT */
 
--- linux/arch/ppc64/lib/locks.c.orig
+++ linux/arch/ppc64/lib/locks.c
@@ -23,7 +23,7 @@
 /* waiting for a spinlock... */
 #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES)
 
-void __spin_yield(spinlock_t *lock)
+void spinlock_yield(spinlock_t *lock)
 {
 	unsigned int lock_value, holder_cpu, yield_count;
 	struct paca_struct *holder_paca;
@@ -54,7 +54,7 @@ void __spin_yield(spinlock_t *lock)
  * This turns out to be the same for read and write locks, since
  * we only know the holder if it is write-locked.
  */
-void __rw_yield(rwlock_t *rw)
+void rwlock_yield(rwlock_t *rw)
 {
 	int lock_value;
 	unsigned int holder_cpu, yield_count;
@@ -87,7 +87,7 @@ void spin_unlock_wait(spinlock_t *lock)
 	while (lock->lock) {
 		HMT_low();
 		if (SHARED_PROCESSOR)
-			__spin_yield(lock);
+			spinlock_yield(lock);
 	}
 	HMT_medium();
 }
--- linux/include/asm-ia64/spinlock.h.orig
+++ linux/include/asm-ia64/spinlock.h
@@ -17,6 +17,8 @@
 #include <asm/intrinsics.h>
 #include <asm/system.h>
 
+#include <asm-generic/spinlock.h>
+
 typedef struct {
 	volatile unsigned int lock;
 #ifdef CONFIG_PREEMPT
--- linux/include/asm-generic/spinlock.h.orig
+++ linux/include/asm-generic/spinlock.h
@@ -0,0 +1,11 @@
+#ifndef _ASM_GENERIC_SPINLOCK_H
+#define _ASM_GENERIC_SPINLOCK_H
+
+/*
+ * Virtual platforms might use these to
+ * yield to specific virtual CPUs:
+ */
+#define spinlock_yield(lock)	cpu_relax()
+#define rwlock_yield(lock)	cpu_relax()
+
+#endif /* _ASM_GENERIC_SPINLOCK_H */
--- linux/include/linux/spinlock.h.orig
+++ linux/include/linux/spinlock.h
@@ -202,10 +202,12 @@ typedef struct {
 #define _raw_spin_lock(lock)	do { (void)(lock); } while(0)
 #define spin_is_locked(lock)	((void)(lock), 0)
 #define _raw_spin_trylock(lock)	(((void)(lock), 1))
-#define spin_unlock_wait(lock)	(void)(lock);
+#define spin_unlock_wait(lock)	(void)(lock)
 #define _raw_spin_unlock(lock) do { (void)(lock); } while(0)
 #endif /* CONFIG_DEBUG_SPINLOCK */
 
+#define spinlock_yield(lock)	(void)(lock)
+
 /* RW spinlocks: No debug version */
 
 #if (__GNUC__ > 2)
@@ -224,6 +226,8 @@ typedef struct {
 #define _raw_read_trylock(lock) ({ (void)(lock); (1); })
 #define _raw_write_trylock(lock) ({ (void)(lock); (1); })
 
+#define rwlock_yield(lock)	(void)(lock)
+
 #define _spin_trylock(lock)	({preempt_disable(); _raw_spin_trylock(lock) ? \
 				1 : ({preempt_enable(); 0;});})
 
--- linux/include/asm-i386/spinlock.h.orig
+++ linux/include/asm-i386/spinlock.h
@@ -15,7 +15,7 @@ asmlinkage int printk(const char * fmt, 
  */
 
 typedef struct {
-	volatile unsigned int lock;
+	volatile unsigned int slock;
 #ifdef CONFIG_DEBUG_SPINLOCK
 	unsigned magic;
 #endif
@@ -43,7 +43,7 @@ typedef struct {
  * We make no fairness assumptions. They have a cost.
  */
 
-#define spin_is_locked(x)	(*(volatile signed char *)(&(x)->lock) <= 0)
+#define spin_is_locked(x)	(*(volatile signed char *)(&(x)->slock) <= 0)
 #define spin_unlock_wait(x)	do { barrier(); } while(spin_is_locked(x))
 
 #define spin_lock_string \
@@ -83,7 +83,7 @@ typedef struct {
 
 #define spin_unlock_string \
 	"movb $1,%0" \
-		:"=m" (lock->lock) : : "memory"
+		:"=m" (lock->slock) : : "memory"
 
 
 static inline void _raw_spin_unlock(spinlock_t *lock)
@@ -101,7 +101,7 @@ static inline void _raw_spin_unlock(spin
 
 #define spin_unlock_string \
 	"xchgb %b0, %1" \
-		:"=q" (oldval), "=m" (lock->lock) \
+		:"=q" (oldval), "=m" (lock->slock) \
 		:"0" (oldval) : "memory"
 
 static inline void _raw_spin_unlock(spinlock_t *lock)
@@ -123,7 +123,7 @@ static inline int _raw_spin_trylock(spin
 	char oldval;
 	__asm__ __volatile__(
 		"xchgb %b0,%1"
-		:"=q" (oldval), "=m" (lock->lock)
+		:"=q" (oldval), "=m" (lock->slock)
 		:"0" (0) : "memory");
 	return oldval > 0;
 }
@@ -138,7 +138,7 @@ static inline void _raw_spin_lock(spinlo
 #endif
 	__asm__ __volatile__(
 		spin_lock_string
-		:"=m" (lock->lock) : : "memory");
+		:"=m" (lock->slock) : : "memory");
 }
 
 static inline void _raw_spin_lock_flags (spinlock_t *lock, unsigned long flags)
@@ -151,7 +151,7 @@ static inline void _raw_spin_lock_flags 
 #endif
 	__asm__ __volatile__(
 		spin_lock_string_flags
-		:"=m" (lock->lock) : "r" (flags) : "memory");
+		:"=m" (lock->slock) : "r" (flags) : "memory");
 }
 
 /*
@@ -186,7 +186,17 @@ typedef struct {
 
 #define rwlock_init(x)	do { *(x) = RW_LOCK_UNLOCKED; } while(0)
 
-#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
+/**
+ * read_is_locked - would read_trylock() fail?
+ * @lock: the rwlock in question.
+ */
+#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0)
+
+/**
+ * write_is_locked - would write_trylock() fail?
+ * @lock: the rwlock in question.
+ */
+#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
 
 /*
  * On x86, we implement read-write locks as a 32-bit counter
--- linux/include/asm-ppc64/spinlock.h.orig
+++ linux/include/asm-ppc64/spinlock.h
@@ -64,11 +64,11 @@ static __inline__ void _raw_spin_unlock(
 #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES)
 /* We only yield to the hypervisor if we are in shared processor mode */
 #define SHARED_PROCESSOR (get_paca()->lppaca.shared_proc)
-extern void __spin_yield(spinlock_t *lock);
-extern void __rw_yield(rwlock_t *lock);
+extern void spinlock_yield(spinlock_t *lock);
+extern void rwlock_yield(rwlock_t *lock);
 #else /* SPLPAR || ISERIES */
-#define __spin_yield(x)	barrier()
-#define __rw_yield(x)	barrier()
+#define spinlock_yield(x)	barrier()
+#define rwlock_yield(x)	barrier()
 #define SHARED_PROCESSOR	0
 #endif
 extern void spin_unlock_wait(spinlock_t *lock);
@@ -109,7 +109,7 @@ static void __inline__ _raw_spin_lock(sp
 		do {
 			HMT_low();
 			if (SHARED_PROCESSOR)
-				__spin_yield(lock);
+				spinlock_yield(lock);
 		} while (likely(lock->lock != 0));
 		HMT_medium();
 	}
@@ -127,7 +127,7 @@ static void __inline__ _raw_spin_lock_fl
 		do {
 			HMT_low();
 			if (SHARED_PROCESSOR)
-				__spin_yield(lock);
+				spinlock_yield(lock);
 		} while (likely(lock->lock != 0));
 		HMT_medium();
 		local_irq_restore(flags_dis);
@@ -201,7 +201,7 @@ static void __inline__ _raw_read_lock(rw
 		do {
 			HMT_low();
 			if (SHARED_PROCESSOR)
-				__rw_yield(rw);
+				rwlock_yield(rw);
 		} while (likely(rw->lock < 0));
 		HMT_medium();
 	}
@@ -258,7 +258,7 @@ static void __inline__ _raw_write_lock(r
 		do {
 			HMT_low();
 			if (SHARED_PROCESSOR)
-				__rw_yield(rw);
+				rwlock_yield(rw);
 		} while (likely(rw->lock != 0));
 		HMT_medium();
 	}
--- linux/include/asm-x86_64/spinlock.h.orig
+++ linux/include/asm-x86_64/spinlock.h
@@ -6,6 +6,8 @@
 #include <asm/page.h>
 #include <linux/config.h>
 
+#include <asm-generic/spinlock.h>
+
 extern int printk(const char * fmt, ...)
 	__attribute__ ((format (printf, 1, 2)));
 
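The i386 hunk above splits the old rwlock_is_locked() into read_is_locked() and write_is_locked(). The two differ because a reader holding the lock blocks writers but not other readers. Below is a minimal user-space model of the i386 counter, assuming RW_LOCK_BIAS is 0x01000000 as in the i386 headers of this period. The model_* helpers are hypothetical, non-atomic stand-ins for the real lock code, and the sketch uses a plain read where the patch uses atomic_read().

```c
/* Assumed i386 value: an unlocked rwlock holds exactly this count. */
#define RW_LOCK_BIAS 0x01000000

/* Hypothetical user-space stand-in for the i386 rwlock_t. */
typedef struct { int lock; } rwlock_t;

/* Same tests as the patch: would read_trylock()/write_trylock() fail? */
#define read_is_locked(x)  ((x)->lock <= 0)             /* a writer holds it */
#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS)  /* anyone holds it */

/* Simplified, non-atomic models of the counter updates. */
static void model_read_lock(rwlock_t *rw)    { rw->lock -= 1; }
static void model_read_unlock(rwlock_t *rw)  { rw->lock += 1; }
static void model_write_lock(rwlock_t *rw)   { rw->lock -= RW_LOCK_BIAS; }
static void model_write_unlock(rwlock_t *rw) { rw->lock += RW_LOCK_BIAS; }
```

With one reader inside, write_is_locked() is true but read_is_locked() is false. That is exactly the case a single rwlock_is_locked() test cannot express.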

From mingo at elte.hu  Wed Jan 19 18:55:00 2005
From: mingo at elte.hu (Ingo Molnar)
Date: Wed, 19 Jan 2005 08:55:00 +0100
Subject: [patch] spin-yield-2.6.11-rc1-A1
In-Reply-To: <20050119074404.GA26768@elte.hu>
References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>
	<20050115142537.GD10114@elte.hu>
	<16873.55739.214904.473407@cargo.ozlabs.ibm.com>
	<20050117124209.GA20796@elte.hu> <20050119074404.GA26768@elte.hu>
Message-ID: <20050119075500.GA26880@elte.hu>


* Ingo Molnar <mingo at elte.hu> wrote:

> > ok - how about the (raw) patch below? (on top of BK plus the latest
> > spin-nicer patch i sent earlier.) It builds/boots on x86 but is
> > untested on ppc64.
> > 
> > the idea is to make spin_yield() a generic function, with some related
> > namespace cleanups.
> 
> wrong patch... Full patch against BK-curr attached.

the one below builds/boots as well ...

	Ingo

Signed-off-by: Ingo Molnar <mingo at elte.hu>

--- linux/kernel/exit.c.orig
+++ linux/kernel/exit.c
@@ -861,8 +861,12 @@ task_t fastcall *next_thread(const task_
 #ifdef CONFIG_SMP
 	if (!p->sighand)
 		BUG();
+#ifndef write_is_locked
+# warning please implement read_is_locked()/write_is_locked()!
+# define write_is_locked rwlock_is_locked
+#endif
 	if (!spin_is_locked(&p->sighand->siglock) &&
-				!rwlock_is_locked(&tasklist_lock))
+				!write_is_locked(&tasklist_lock))
 		BUG();
 #endif
 	return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);
--- linux/kernel/spinlock.c.orig
+++ linux/kernel/spinlock.c
@@ -173,8 +173,8 @@ EXPORT_SYMBOL(_write_lock);
  * (We do this in a function because inlining it would be excessive.)
  */
 
-#define BUILD_LOCK_OPS(op, locktype, is_locked_fn)			\
-void __lockfunc _##op##_lock(locktype *lock)				\
+#define BUILD_LOCK_OPS(op, locktype)					\
+void __lockfunc _##op##_lock(locktype##_t *lock)			\
 {									\
 	preempt_disable();						\
 	for (;;) {							\
@@ -183,15 +183,15 @@ void __lockfunc _##op##_lock(locktype *l
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		while (is_locked_fn(lock) && (lock)->break_lock)	\
-			cpu_relax();					\
+		while (op##_is_locked(lock) && (lock)->break_lock)	\
+			locktype##_yield(lock);				\
 		preempt_disable();					\
 	}								\
 }									\
 									\
 EXPORT_SYMBOL(_##op##_lock);						\
 									\
-unsigned long __lockfunc _##op##_lock_irqsave(locktype *lock)		\
+unsigned long __lockfunc _##op##_lock_irqsave(locktype##_t *lock)	\
 {									\
 	unsigned long flags;						\
 									\
@@ -205,8 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		while (is_locked_fn(lock) && (lock)->break_lock)	\
-			cpu_relax();					\
+		while (op##_is_locked(lock) && (lock)->break_lock)	\
+			locktype##_yield(lock);				\
 		preempt_disable();					\
 	}								\
 	return flags;							\
@@ -214,14 +214,14 @@ unsigned long __lockfunc _##op##_lock_ir
 									\
 EXPORT_SYMBOL(_##op##_lock_irqsave);					\
 									\
-void __lockfunc _##op##_lock_irq(locktype *lock)			\
+void __lockfunc _##op##_lock_irq(locktype##_t *lock)			\
 {									\
 	_##op##_lock_irqsave(lock);					\
 }									\
 									\
 EXPORT_SYMBOL(_##op##_lock_irq);					\
 									\
-void __lockfunc _##op##_lock_bh(locktype *lock)				\
+void __lockfunc _##op##_lock_bh(locktype##_t *lock)			\
 {									\
 	unsigned long flags;						\
 									\
@@ -246,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh)
  *         _[spin|read|write]_lock_irqsave()
  *         _[spin|read|write]_lock_bh()
  */
-BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked);
-BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked);
-BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked);
+BUILD_LOCK_OPS(spin, spinlock);
+BUILD_LOCK_OPS(read, rwlock);
+BUILD_LOCK_OPS(write, rwlock);
 
 #endif /* CONFIG_PREEMPT */
 
--- linux/arch/ppc64/lib/locks.c.orig
+++ linux/arch/ppc64/lib/locks.c
@@ -23,7 +23,7 @@
 /* waiting for a spinlock... */
 #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES)
 
-void __spin_yield(spinlock_t *lock)
+void spinlock_yield(spinlock_t *lock)
 {
 	unsigned int lock_value, holder_cpu, yield_count;
 	struct paca_struct *holder_paca;
@@ -54,7 +54,7 @@ void __spin_yield(spinlock_t *lock)
  * This turns out to be the same for read and write locks, since
  * we only know the holder if it is write-locked.
  */
-void __rw_yield(rwlock_t *rw)
+void rwlock_yield(rwlock_t *rw)
 {
 	int lock_value;
 	unsigned int holder_cpu, yield_count;
@@ -87,7 +87,7 @@ void spin_unlock_wait(spinlock_t *lock)
 	while (lock->lock) {
 		HMT_low();
 		if (SHARED_PROCESSOR)
-			__spin_yield(lock);
+			spinlock_yield(lock);
 	}
 	HMT_medium();
 }
--- linux/include/asm-ia64/spinlock.h.orig
+++ linux/include/asm-ia64/spinlock.h
@@ -17,6 +17,8 @@
 #include <asm/intrinsics.h>
 #include <asm/system.h>
 
+#include <asm-generic/spinlock.h>
+
 typedef struct {
 	volatile unsigned int lock;
 #ifdef CONFIG_PREEMPT
--- linux/include/asm-generic/spinlock.h.orig
+++ linux/include/asm-generic/spinlock.h
@@ -0,0 +1,11 @@
+#ifndef _ASM_GENERIC_SPINLOCK_H
+#define _ASM_GENERIC_SPINLOCK_H
+
+/*
+ * Virtual platforms might use these to
+ * yield to specific virtual CPUs:
+ */
+#define spinlock_yield(lock)	cpu_relax()
+#define rwlock_yield(lock)	cpu_relax()
+
+#endif /* _ASM_GENERIC_SPINLOCK_H */
--- linux/include/linux/spinlock.h.orig
+++ linux/include/linux/spinlock.h
@@ -202,10 +202,12 @@ typedef struct {
 #define _raw_spin_lock(lock)	do { (void)(lock); } while(0)
 #define spin_is_locked(lock)	((void)(lock), 0)
 #define _raw_spin_trylock(lock)	(((void)(lock), 1))
-#define spin_unlock_wait(lock)	(void)(lock);
+#define spin_unlock_wait(lock)	(void)(lock)
 #define _raw_spin_unlock(lock) do { (void)(lock); } while(0)
 #endif /* CONFIG_DEBUG_SPINLOCK */
 
+#define spinlock_yield(lock)	(void)(lock)
+
 /* RW spinlocks: No debug version */
 
 #if (__GNUC__ > 2)
@@ -224,6 +226,8 @@ typedef struct {
 #define _raw_read_trylock(lock) ({ (void)(lock); (1); })
 #define _raw_write_trylock(lock) ({ (void)(lock); (1); })
 
+#define rwlock_yield(lock)	(void)(lock)
+
 #define _spin_trylock(lock)	({preempt_disable(); _raw_spin_trylock(lock) ? \
 				1 : ({preempt_enable(); 0;});})
 
--- linux/include/asm-i386/spinlock.h.orig
+++ linux/include/asm-i386/spinlock.h
@@ -7,6 +7,8 @@
 #include <linux/config.h>
 #include <linux/compiler.h>
 
+#include <asm-generic/spinlock.h>
+
 asmlinkage int printk(const char * fmt, ...)
 	__attribute__ ((format (printf, 1, 2)));
 
@@ -15,7 +17,7 @@ asmlinkage int printk(const char * fmt, 
  */
 
 typedef struct {
-	volatile unsigned int lock;
+	volatile unsigned int slock;
 #ifdef CONFIG_DEBUG_SPINLOCK
 	unsigned magic;
 #endif
@@ -43,7 +45,7 @@ typedef struct {
  * We make no fairness assumptions. They have a cost.
  */
 
-#define spin_is_locked(x)	(*(volatile signed char *)(&(x)->lock) <= 0)
+#define spin_is_locked(x)	(*(volatile signed char *)(&(x)->slock) <= 0)
 #define spin_unlock_wait(x)	do { barrier(); } while(spin_is_locked(x))
 
 #define spin_lock_string \
@@ -83,7 +85,7 @@ typedef struct {
 
 #define spin_unlock_string \
 	"movb $1,%0" \
-		:"=m" (lock->lock) : : "memory"
+		:"=m" (lock->slock) : : "memory"
 
 
 static inline void _raw_spin_unlock(spinlock_t *lock)
@@ -101,7 +103,7 @@ static inline void _raw_spin_unlock(spin
 
 #define spin_unlock_string \
 	"xchgb %b0, %1" \
-		:"=q" (oldval), "=m" (lock->lock) \
+		:"=q" (oldval), "=m" (lock->slock) \
 		:"0" (oldval) : "memory"
 
 static inline void _raw_spin_unlock(spinlock_t *lock)
@@ -123,7 +125,7 @@ static inline int _raw_spin_trylock(spin
 	char oldval;
 	__asm__ __volatile__(
 		"xchgb %b0,%1"
-		:"=q" (oldval), "=m" (lock->lock)
+		:"=q" (oldval), "=m" (lock->slock)
 		:"0" (0) : "memory");
 	return oldval > 0;
 }
@@ -138,7 +140,7 @@ static inline void _raw_spin_lock(spinlo
 #endif
 	__asm__ __volatile__(
 		spin_lock_string
-		:"=m" (lock->lock) : : "memory");
+		:"=m" (lock->slock) : : "memory");
 }
 
 static inline void _raw_spin_lock_flags (spinlock_t *lock, unsigned long flags)
@@ -151,7 +153,7 @@ static inline void _raw_spin_lock_flags 
 #endif
 	__asm__ __volatile__(
 		spin_lock_string_flags
-		:"=m" (lock->lock) : "r" (flags) : "memory");
+		:"=m" (lock->slock) : "r" (flags) : "memory");
 }
 
 /*
@@ -186,7 +188,17 @@ typedef struct {
 
 #define rwlock_init(x)	do { *(x) = RW_LOCK_UNLOCKED; } while(0)
 
-#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
+/**
+ * read_is_locked - would read_trylock() fail?
+ * @lock: the rwlock in question.
+ */
+#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0)
+
+/**
+ * write_is_locked - would write_trylock() fail?
+ * @lock: the rwlock in question.
+ */
+#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
 
 /*
  * On x86, we implement read-write locks as a 32-bit counter
--- linux/include/asm-ppc64/spinlock.h.orig
+++ linux/include/asm-ppc64/spinlock.h
@@ -64,11 +64,11 @@ static __inline__ void _raw_spin_unlock(
 #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES)
 /* We only yield to the hypervisor if we are in shared processor mode */
 #define SHARED_PROCESSOR (get_paca()->lppaca.shared_proc)
-extern void __spin_yield(spinlock_t *lock);
-extern void __rw_yield(rwlock_t *lock);
+extern void spinlock_yield(spinlock_t *lock);
+extern void rwlock_yield(rwlock_t *lock);
 #else /* SPLPAR || ISERIES */
-#define __spin_yield(x)	barrier()
-#define __rw_yield(x)	barrier()
+#define spinlock_yield(x)	barrier()
+#define rwlock_yield(x)	barrier()
 #define SHARED_PROCESSOR	0
 #endif
 extern void spin_unlock_wait(spinlock_t *lock);
@@ -109,7 +109,7 @@ static void __inline__ _raw_spin_lock(sp
 		do {
 			HMT_low();
 			if (SHARED_PROCESSOR)
-				__spin_yield(lock);
+				spinlock_yield(lock);
 		} while (likely(lock->lock != 0));
 		HMT_medium();
 	}
@@ -127,7 +127,7 @@ static void __inline__ _raw_spin_lock_fl
 		do {
 			HMT_low();
 			if (SHARED_PROCESSOR)
-				__spin_yield(lock);
+				spinlock_yield(lock);
 		} while (likely(lock->lock != 0));
 		HMT_medium();
 		local_irq_restore(flags_dis);
@@ -201,7 +201,7 @@ static void __inline__ _raw_read_lock(rw
 		do {
 			HMT_low();
 			if (SHARED_PROCESSOR)
-				__rw_yield(rw);
+				rwlock_yield(rw);
 		} while (likely(rw->lock < 0));
 		HMT_medium();
 	}
@@ -258,7 +258,7 @@ static void __inline__ _raw_write_lock(r
 		do {
 			HMT_low();
 			if (SHARED_PROCESSOR)
-				__rw_yield(rw);
+				rwlock_yield(rw);
 		} while (likely(rw->lock != 0));
 		HMT_medium();
 	}
--- linux/include/asm-x86_64/spinlock.h.orig
+++ linux/include/asm-x86_64/spinlock.h
@@ -6,6 +6,8 @@
 #include <asm/page.h>
 #include <linux/config.h>
 
+#include <asm-generic/spinlock.h>
+
 extern int printk(const char * fmt, ...)
 	__attribute__ ((format (printf, 1, 2)));
 
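The reworked BUILD_LOCK_OPS(op, locktype) derives three names by token pasting: locktype##_t for the lock type, op##_is_locked for the "still held?" predicate, and locktype##_yield for the per-architecture yield hook. The sketch below is stripped down to the wait loop so it can run in user space. It leaves out preemption, irq flags, trylock and EXPORT_SYMBOL. All types, predicates and yield hooks here are hypothetical stand-ins, and the toy yields simply "release" the lock so that each loop terminates.

```c
/* Hypothetical user-space stand-ins so the macro can be compiled and run. */
typedef struct { int locked; int break_lock; } spinlock_t;
typedef struct { int readers; int writer; int break_lock; } rwlock_t;

static int yields;	/* how many times a *_yield() hook ran */

#define spin_is_locked(l)	((l)->locked != 0)
#define read_is_locked(l)	((l)->writer != 0)
#define write_is_locked(l)	((l)->writer != 0 || (l)->readers != 0)

/* Toy yield hooks: count the call, then pretend the holder released it. */
static void spinlock_yield(spinlock_t *l) { yields++; l->locked = 0; }
static void rwlock_yield(rwlock_t *l) { yields++; l->writer = 0; l->readers = 0; }

/* Same pasting scheme as the patch: op selects the predicate,
 * locktype selects both the type and the yield hook. */
#define BUILD_WAIT_OPS(op, locktype)					\
static void wait_##op(locktype##_t *lock)				\
{									\
	while (op##_is_locked(lock) && (lock)->break_lock)		\
		locktype##_yield(lock);					\
}

BUILD_WAIT_OPS(spin, spinlock)	/* spin_is_locked + spinlock_yield */
BUILD_WAIT_OPS(read, rwlock)	/* read_is_locked + rwlock_yield */
BUILD_WAIT_OPS(write, rwlock)	/* write_is_locked + rwlock_yield */
```

Because the predicate name is built from op and the yield name from locktype, read and write share rwlock_yield() but get different is_locked tests. An architecture without a hypervisor can fall back to the cpu_relax() default from asm-generic/spinlock.h.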

From nfont at austin.ibm.com  Thu Jan 20 03:00:22 2005
From: nfont at austin.ibm.com (Nathan Fontenot)
Date: Wed, 19 Jan 2005 10:00:22 -0600
Subject: [PATCH] PPC64: EEH Recovery
In-Reply-To: <16877.63693.915740.385920@cargo.ozlabs.ibm.com>
References: <20050106192413.GK22274@austin.ibm.com>	<20050117201415.GA11505@austin.ibm.com>
	<16877.63693.915740.385920@cargo.ozlabs.ibm.com>
Message-ID: <41EE8416.502@austin.ibm.com>


Paul Mackerras wrote:

> 5. AFAICS userland will get an unplug notification for the device, but
>    nothing to indicate that is due to an EEH slot isolation event.  I
>    think userland should be told about EEH events.
> 

Currently there is a way for userland to determine if a hotplug event 
they receive is due to an EEH slot isolation event.  It's not very 
pretty and requires the rtas_errd daemon to be running.

The RTAS event generated from the EEH event is logged to 
/var/log/platform by rtas_errd.  Userland scripts would have to search 
the file for a recent EEH event matching their device to make this 
determination.  This isn't as nice as a direct notification but is what 
we have at this point.

-- 
Nathan Fontenot


From willschm at us.ibm.com  Thu Jan 20 08:50:20 2005
From: willschm at us.ibm.com (Will Schmidt)
Date: Wed, 19 Jan 2005 15:50:20 -0600
Subject: question about LMB's size
In-Reply-To: <OF3731450D.CC3ED7AE-ON48256F8E.0019A3BE-48256F8E.001B2596@cn.ibm.com>
Message-ID: <OFA3AEE647.D179A43E-ON86256F8E.0070C978-86256F8E.0077F780@us.ibm.com>


Hi,

Zhao Yu Wang <wangzyu at cn.ibm.com> wrote on 01/18/2005 10:50:39 PM:

> Hi,Will
> Thanks
>
> >in early boot messages, there is a line "SystemCfg->physicalMemorySize =
> >0x.......".   This value should be precisely what you are trying to
> >measure.
>
> >A bit later in the logs, you can also see a line
> >"Memory: XXXXk/YYYYk available (###k kernel code, ###k reserved, ###k data,
> >###k bss, ###k init)".
> >the YYYYk should also match what you are looking for.
>
> If the system booted several days ago, the boot log is no longer
> available. Is there any other method to get the physical memory size
> of the LPAR?

You should be able to find a copy of the early boot log somewhere in
/var/log, typically /var/log/boot.msg or /var/log/dmesg. The set of files
in /var/log varies with the distro and kernel level.
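If a saved boot log is available, the total can be extracted from the "Memory: XXXXk/YYYYk available ..." line quoted above. This is a minimal sketch that assumes the line has exactly that shape. total_mem_kb_from_boot_line() is a hypothetical helper, and any numbers fed to it in testing are illustrative only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given one boot-log line of the form
 *   "Memory: XXXXk/YYYYk available (...)"
 * return YYYY (the total, in kB), or -1 if the line does not match.
 */
static long total_mem_kb_from_boot_line(const char *line)
{
	const char *p = strstr(line, "Memory: ");	/* tolerates dmesg prefixes */
	long avail_kb, total_kb;

	if (p == NULL)
		return -1;
	/* %ld stops at the 'k'; the literals "k/" and "k" must then match. */
	if (sscanf(p, "Memory: %ldk/%ldk", &avail_kb, &total_kb) != 2)
		return -1;
	return total_kb;
}
```

Running each line of /var/log/boot.msg through this function and keeping the first non-negative result would recover the YYYYk figure.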

>
> Could the OS provide a method to obtain the real memory size? It would
> help to dynamically reassign resources according to the load across
> several partitions.

My recommendation is that you stick with the values reported via the
HMC commands. There might be an RMC command on the Linux side that can
obtain the value from the HMC, but I am not positive of that.


>
>
> Thanks & Best regards,
>
> --------------------------------------------
> Wang Zhaoyu
>
> Email: wangzyu at cn.ibm.com
> Notes: Zhao Yu Wang/China/Contr/IBM at IBMCN

-Will


From benh at kernel.crashing.org  Thu Jan 20 18:33:09 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 20 Jan 2005 18:33:09 +1100
Subject: ppc64 vDSO update
Message-ID: <1106206389.5294.82.camel@gaston>

Latest update for the ppc64 vDSO. Yesterday's patch had build issues (some
bits were missing from the patch file). This one also fixes a time
management issue and incorrect eh_frame_hdr sections.
 
http://gate.crashing.org/~benh/ppc64-vdso-20050120.diff

Will be sent upstream soon.

Ben.


From linas at austin.ibm.com  Fri Jan 21 09:39:16 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 20 Jan 2005 16:39:16 -0600
Subject: [PATCH] PPC64: EEH Recovery
In-Reply-To: <16877.63693.915740.385920@cargo.ozlabs.ibm.com>
References: <20050106192413.GK22274@austin.ibm.com>
	<20050117201415.GA11505@austin.ibm.com>
	<16877.63693.915740.385920@cargo.ozlabs.ibm.com>
Message-ID: <20050120223916.GJ9140@austin.ibm.com>


On Wed, Jan 19, 2005 at 05:06:05PM +1100, Paul Mackerras was heard to remark:
> Linas Vepstas writes:
> 
> > p.s.  It was not clear to me if the EEH patch previously sent 
> > (6 January 2005, same subject line) will be wending its way into 
> > the main Torvalds kernel tree, or not.  I hadn't really gotten
> > confirmation one way or another.
> 
> I'm not really totally happy with it yet, on a number of fronts:
> 
> 1. You're adding more PCI-specific stuff to the device_node struct,
>    which I don't like.  I would prefer that the device_node tree
>    contains basically just what we get from OF, and that we have a
>    separate struct for storing ppc64-specific information for each PCI
>    device.  Fixing that is outside the scope of your patch, though.

I wrote this down on my to-do list.  It's the sort of thing that
evaporates from my consciousness when other things come along,
but I'll give it a shot.

> 2. I don't see why the device nodes for the PCI subtree being reset
>    would go away, and thus I don't see the need for your eeh_cfg_tree
>    struct.

It's not the reset, it's the hot-plug remove.  The hot plug code assumes
that you are going to physically remove the device from the slot, so
it removes the device_node as part of the "unconfig".  

Of course, I found this out only after performing a null-pointer deref.
Not only does the node go away, but all of the various pointers it holds
are zeroed in the process.  

The cfg tree holds on to those pointers, so that I wouldn't have to
muck with the device_node removal code to do something tricky.

> 3. Is there a good reason why we can't use the assigned-addresses
>    property on the relevant device tree nodes to tell us what to set
>    the BARs to?

Yes, the reason is that after a reset, that property doesn't hold any 
decent data.   I discussed this with the firmware developers, and their
response was that it is the kernel's responsibility to compute 
(or save/restore) such values.  (Except for bridges, which they will do for us).

> 4. I think the 5 second sleep is quite bogus, and shows that we have
>    the flow of control wrong.  

:)  Yes, well, indeed it is.  Don't look at me, not my idea.

> In particular I think it should be a
>    userland write to a sysfs file that kicks off the restart process
>    rather than it just happening after 5 seconds.  Anyway, what
>    process or thread is executing that 5 second sleep?  Is it keventd
>    or something?

It's a workqueue.

> 5. AFAICS userland will get an unplug notification for the device, but
>    nothing to indicate that is due to an EEH slot isolation event.  I
>    think userland should be told about EEH events.

In principle, I'd agree. In practice, this would seem to require changes
or additions or enhancements to udev that I don't quite understand, as
well as potential changes to udev scripts.  Maybe I don't understand
sysfs sufficiently well.  I am very tempted to punt on this, and wait 
for the Intel-backed PCI-E code to get to this point, and then do whatever 
they're doing.

--linas


From linas at austin.ibm.com  Fri Jan 21 09:48:12 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 20 Jan 2005 16:48:12 -0600
Subject: [PATCH] PPC64: EEH Recovery
In-Reply-To: <16877.63693.915740.385920@cargo.ozlabs.ibm.com>
References: <20050106192413.GK22274@austin.ibm.com>
	<20050117201415.GA11505@austin.ibm.com>
	<16877.63693.915740.385920@cargo.ozlabs.ibm.com>
Message-ID: <20050120224812.GK9140@austin.ibm.com>

On Wed, Jan 19, 2005 at 05:06:05PM +1100, Paul Mackerras was heard to remark:
> Linas Vepstas writes:
> 
> > p.s.  It was not clear to me if the EEH patch previously sent 
> > (6 January 2005, same subject line) will be wending its way into 
> > the main Torvalds kernel tree, or not.  I hadn't really gotten
> > confirmation one way or another.
> 
> I'm not really totally happy with it yet, on a number of fronts:

[...]

I forgot to mention: while I agree with some/many of these points,
especially with regard to recovery, I'd also like to note that the
patch was mailed in two independent parts:  

-- a number of generic infrastructure routines, all in a ppc64 patch, and
-- the code that actually performs the recovery, as a patch to 
   the drivers/pci/hotplug subsystem.

While the actual recovery code is controversial (e.g. no support for
SCSI recovery), I'd like to at least get the generic infrastructure
pieces in.

--linas


From paulus at samba.org  Fri Jan 21 13:50:50 2005
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 21 Jan 2005 13:50:50 +1100
Subject: [PATCH] PPC64: EEH Recovery
In-Reply-To: <20050120223916.GJ9140@austin.ibm.com>
References: <20050106192413.GK22274@austin.ibm.com>
	<20050117201415.GA11505@austin.ibm.com>
	<16877.63693.915740.385920@cargo.ozlabs.ibm.com>
	<20050120223916.GJ9140@austin.ibm.com>
Message-ID: <16880.28170.976516.285336@cargo.ozlabs.ibm.com>

Linas Vepstas writes:

> > 2. I don't see why the device nodes for the PCI subtree being reset
> >    would go away, and thus I don't see the need for your eeh_cfg_tree
> >    struct.
> 
> It's not the reset, it's the hot-plug remove.  The hot plug code assumes
> that you are going to physically remove the device from the slot, so
> it removes the device_node as part of the "unconfig".  

OK, I missed that.  It seems a bit bogus to me.  Could you point me at
where in the code this happens?

> > 3. Is there a good reason why we can't use the assigned-addresses
> >    property on the relevant device tree nodes to tell us what to set
> >    the BARs to?
> 
> Yes, the reason is that after a reset, that property doesn't hold any 
> decent data.   I discussed this with the firmware developers, and their
> response was that it is the kernel's responsibility to compute 
> (or save/restore) such values.  (Except for bridges, which they will do for us).

The not holding any decent data is a consequence of the device nodes
getting thrown away, isn't it?  I fail to see how resetting the device
can of itself affect our copy of the device tree.

> > In particular I think it should be a
> >    userland write to a sysfs file that kicks off the restart process
> >    rather than it just happening after 5 seconds.  Anyway, what
> >    process or thread is executing that 5 second sleep?  Is it keventd
> >    or something?
> 
> It's a workqueue.

Which gets run in keventd's context.  In other words, no other
workqueues will get run during the 5 second sleep, or at least not on
that cpu.

Paul.
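Paul's point about keventd is head-of-line blocking. Work items that share one worker run one after another, so an item that sleeps delays everything queued behind it. The toy single-worker queue below shows the effect in user space. It is not the kernel workqueue API, all names are hypothetical, and a 50 ms nanosleep() stands in for the 5 second sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

typedef void (*work_fn)(void);

#define MAX_ITEMS 8
static struct timespec started[MAX_ITEMS];	/* when each item began */

/* One worker runs every item strictly in order, like items sharing keventd. */
static void run_queue(work_fn *items, int n)
{
	for (int i = 0; i < n && i < MAX_ITEMS; i++) {
		clock_gettime(CLOCK_MONOTONIC, &started[i]);
		items[i]();
	}
}

/* Stand-in for the EEH recovery item's sleep. */
static void sleepy_item(void)
{
	struct timespec d = { 0, 50 * 1000000L };	/* 50 ms */
	nanosleep(&d, NULL);
}

static void quick_item(void) { }

static long ms_between(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1000L +
	       (b->tv_nsec - a->tv_nsec) / 1000000L;
}
```

When quick_item is queued behind sleepy_item, it cannot start until roughly 50 ms later, even though it has no work of its own to wait for.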


From anton at samba.org  Fri Jan 21 16:40:43 2005
From: anton at samba.org (Anton Blanchard)
Date: Fri, 21 Jan 2005 16:40:43 +1100
Subject: [PATCH] ppc64: limit segment tables on UP kernels
Message-ID: <20050121054043.GA10563@krispykreme.ozlabs.ibm.com>


We were allocating 48 segment tables on UP kernels. Remove them and save
192kB of kernel memory on UP builds.

Anton

Signed-off-by: Anton Blanchard <anton at samba.org>

diff -puN arch/ppc64/kernel/head.S~limit_stab_on_up arch/ppc64/kernel/head.S
--- foobar2/arch/ppc64/kernel/head.S~limit_stab_on_up	2005-01-19 15:16:28.987107097 +1100
+++ foobar2-anton/arch/ppc64/kernel/head.S	2005-01-19 15:16:29.009105597 +1100
@@ -2145,10 +2145,12 @@ swapper_pg_dir:
 ioremap_dir:
 	.space	4096
 
+#ifdef CONFIG_SMP
 /* 1 page segment table per cpu (max 48, cpu0 allocated at STAB0_PHYS_ADDR) */
 	.globl	stab_array
 stab_array:
 	.space	4096 * 48
+#endif
 	
 /*
  * This space gets a copy of optional info passed to us by the bootstrap
_
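The saving Anton quotes is the size of stab_array: 48 one-page (4096-byte) segment tables, 4096 * 48 = 196608 bytes = 192 kB. Below is a small C sketch of the same arithmetic and of the CONFIG_SMP gating. The macro and function names are hypothetical and do not come from head.S.

```c
/* 1 page segment table per cpu, up to 48 cpus (per the head.S comment). */
#define STAB_PAGE_BYTES	4096UL
#define STAB_MAX_CPUS	48UL

#ifdef CONFIG_SMP
/* Only SMP builds reserve the array; UP builds drop it entirely. */
static char stab_array[STAB_PAGE_BYTES * STAB_MAX_CPUS];
#endif

/* Bytes the array occupies, i.e. what a UP build saves. */
static unsigned long stab_array_bytes(void)
{
	return STAB_PAGE_BYTES * STAB_MAX_CPUS;	/* 196608 bytes = 192 kB */
}
```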


From j.glisse at gmail.com  Fri Jan 21 22:22:10 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Fri, 21 Jan 2005 12:22:10 +0100
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <20050120231442.GE2626@smtp.west.cox.net>
References: <4240b916050109074053e328b1@mail.gmail.com>
	<16865.39960.274092.996530@cargo.ozlabs.ibm.com>
	<20050110145219.GB2226@smtp.west.cox.net>
	<4240b9160501101014317b8d85@mail.gmail.com>
	<20050110182940.GA3391@smtp.west.cox.net>
	<4240b91605011010593d2f3b3d@mail.gmail.com>
	<20050110191248.GB3391@smtp.west.cox.net>
	<4240b91605011011314bb06814@mail.gmail.com>
	<4240b91605011211101ed322a8@mail.gmail.com>
	<20050120231442.GE2626@smtp.west.cox.net>
Message-ID: <4240b916050121032230b9c5dc@mail.gmail.com>

On Thu, 20 Jan 2005 16:14:42 -0700, Tom Rini <trini at kernel.crashing.org> wrote:
> On Wed, Jan 12, 2005 at 08:10:58PM +0100, Jerome Glisse wrote:
> 
> > Wanted to know what is going on with CONFIG_6xx?  You will use
> > my patch or do you have another better way ? :)
> 
> Can you resend it please?
> 

Here is another version (the previous one used #ifdef to comment
out the function call, but I read somewhere that this doesn't follow
the coding guidelines). Anyway, I think my patch is an ugly hack.

Signed-off-by: Jerome Glisse <j.glisse at gmail.com>

best,
Jerome Glisse


diff -Naur linux/arch/ppc/boot/simple/misc-prep.c linux-2.6.10/arch/ppc/boot/simple/misc-prep.c
--- linux/arch/ppc/boot/simple/misc-prep.c	2004-12-24 22:33:51.000000000 +0100
+++ linux-2.6.10/arch/ppc/boot/simple/misc-prep.c	2005-01-21 12:09:50.976426672 +0100
@@ -34,7 +34,11 @@
 extern void serial_fixups(void);
 extern struct bi_record *decompress_kernel(unsigned long load_addr,
 		int num_words, unsigned long cksum);
+#ifdef CONFIG_6XX
 extern void disable_6xx_mmu(void);
+#else
+void disable_6xx_mmu(void) {}
+#endif
 extern unsigned long mpc10x_get_mem_size(void);
 
 static void


From geert at linux-m68k.org  Fri Jan 21 23:36:14 2005
From: geert at linux-m68k.org (Geert Uytterhoeven)
Date: Fri, 21 Jan 2005 13:36:14 +0100 (MET)
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <4240b916050121032230b9c5dc@mail.gmail.com>
References: <4240b916050109074053e328b1@mail.gmail.com>
	<16865.39960.274092.996530@cargo.ozlabs.ibm.com>
	<20050110145219.GB2226@smtp.west.cox.net>
	<4240b9160501101014317b8d85@mail.gmail.com>
	<20050110182940.GA3391@smtp.west.cox.net>
	<4240b91605011010593d2f3b3d@mail.gmail.com>
	<20050110191248.GB3391@smtp.west.cox.net>
	<4240b91605011011314bb06814@mail.gmail.com>
	<4240b91605011211101ed322a8@mail.gmail.com>
	<20050120231442.GE2626@smtp.west.cox.net>
	<4240b916050121032230b9c5dc@mail.gmail.com>
Message-ID: <Pine.GSO.4.61.0501211335450.1075@waterleaf.sonytel.be>

On Fri, 21 Jan 2005, Jerome Glisse wrote:
> On Thu, 20 Jan 2005 16:14:42 -0700, Tom Rini <trini at kernel.crashing.org> wrote:
> > On Wed, Jan 12, 2005 at 08:10:58PM +0100, Jerome Glisse wrote:
> > 
> > > Wanted to know what is going on with CONFIG_6xx?  You will use
> > > my patch or do you have another better way ? :)
> > 
> > Can you resend it please?
> > 
> 
> Here is another version (the previous one used #ifdef to comment
> out the function call, but I read somewhere that this doesn't follow
> the coding guidelines). Anyway, I think my patch is an ugly hack.
> 
> Signed-off-by: Jerome Glisse <j.glisse at gmail.com>
> 
> best,
> Jerome Glisse
> 
> 
> 
> diff -Naur linux/arch/ppc/boot/simple/misc-prep.c linux-2.6.10/arch/ppc/boot/simple/misc-prep.c
> --- linux/arch/ppc/boot/simple/misc-prep.c	2004-12-24 22:33:51.000000000 +0100
> +++ linux-2.6.10/arch/ppc/boot/simple/misc-prep.c	2005-01-21 12:09:50.976426672 +0100
> @@ -34,7 +34,11 @@
>  extern void serial_fixups(void);
>  extern struct bi_record *decompress_kernel(unsigned long load_addr,
>  		int num_words, unsigned long cksum);
> +#ifdef CONFIG_6XX
>  extern void disable_6xx_mmu(void);
> +#else
> +void disable_6xx_mmu(void) {}
   ^^^^^^^^^^^^^^^^^^^^
You better make this one static inline.
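With Geert's suggestion applied, the stub might look like the sketch below. A bare #elif needs a controlling expression, so #else is used for the non-6xx branch. Presumably the motivation for static inline is that an empty static inline stub keeps the call site free of #ifdefs and gives the compiler nothing to emit. A plain empty definition would instead create an externally visible function in misc-prep.c. The caller and flag below are hypothetical and exist only to exercise the stub.

```c
#ifdef CONFIG_6XX
extern void disable_6xx_mmu(void);
#else
/* Non-6xx parts have no 6xx MMU to disable; the call compiles away. */
static inline void disable_6xx_mmu(void) {}
#endif

/* Hypothetical stand-in for the boot code's unconditional call site. */
static int boot_cleanup_ran;

static void hypothetical_boot_cleanup(void)
{
	disable_6xx_mmu();	/* no #ifdef needed here */
	boot_cleanup_ran = 1;
}
```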

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


From paulus at samba.org  Sat Jan 22 16:25:21 2005
From: paulus at samba.org (Paul Mackerras)
Date: Sat, 22 Jan 2005 16:25:21 +1100
Subject: [PATCH 18/21] ppc64/rtasd: replace schedule_timeout() with
	msleep()
In-Reply-To: <20050118001819.GA24698@us.ibm.com>
References: <20050118001819.GA24698@us.ibm.com>
Message-ID: <16881.58305.424934.884018@cargo.ozlabs.ibm.com>

Nishanth Aravamudan writes:

> Description: Replace schedule_timeout() with msleep()/ssleep(). In both cases,
> the current code sleeps in TASK_INTERRUPTIBLE but does not account for early
> wakeups due to signals being caught; therefore I have used TASK_UNINTERRUPTIBLE
> sleeps in both cases. The second sleep is slightly more difficult to convert as
> rtas_event_scan_rate is variable. I have left it as a msleep() call, although
> ssleep() may be more appropriate.

You have a good point about signals, but I don't like the way that
this will elevate the load average by 1 the whole time.  We need to
fix this properly instead.

Paul.
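Both positions reflect a real trade-off. Under Linux, tasks in uninterruptible sleep count toward the load average, while an interruptible sleep can return early when a signal is caught. In user space the same early-return issue appears with nanosleep(), which fails with EINTR and reports the unslept time. A loop that resumes with the remainder stays interruptible yet still honors the full interval. This is a user-space analogy only, not the fix rtasd needs in the kernel. sleep_full_ms() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for the full interval even if caught signals interrupt nanosleep(). */
static int sleep_full_ms(long ms)
{
	struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
	struct timespec rem;

	while (nanosleep(&req, &rem) == -1) {
		if (errno != EINTR)
			return -1;	/* a real error, e.g. EINVAL */
		req = rem;		/* woken early by a signal: sleep the rest */
	}
	return 0;
}
```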


From paulus at samba.org  Sat Jan 22 20:11:46 2005
From: paulus at samba.org (Paul Mackerras)
Date: Sat, 22 Jan 2005 20:11:46 +1100
Subject: [PATCH 2/2] xmon io space read
In-Reply-To: <20050105145757.62c84c3b@localhost>
References: <20050105144502.56a15bcd@localhost>
	<20050105145757.62c84c3b@localhost>
Message-ID: <16882.6354.955035.976749@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> Here is the support code for xmon to read IO space.  

I would prefer to see that as a variant of the 'm' command, i.e. you
would type 'mi 3f8' to look at serial port registers, etc.
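The suggestion amounts to a one-letter modifier on the existing 'm' command. A minimal, hypothetical parser for that syntax is sketched below; `struct mem_cmd` and `parse_mem_cmd()` are invented for illustration and are not xmon's actual command loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result of parsing an xmon-style memory command. */
struct mem_cmd {
	int io_space;		/* 1 for "mi", 0 for plain "m" */
	unsigned long addr;	/* memory address or I/O port */
};

/*
 * Parse "m <hexaddr>" or "mi <hexaddr>".  Returns 0 on success, -1 if
 * the line is not a memory command or the address is missing.
 */
static int parse_mem_cmd(const char *line, struct mem_cmd *out)
{
	char *end;

	assert(line && out);
	if (line[0] != 'm')
		return -1;
	line++;
	out->io_space = (*line == 'i');	/* 'i' selects I/O space */
	if (out->io_space)
		line++;
	while (*line == ' ')
		line++;
	out->addr = strtoul(line, &end, 16);
	return end == line ? -1 : 0;	/* no hex digits consumed */
}
```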

Paul.


From nish.aravamudan at gmail.com  Sun Jan 23 05:49:48 2005
From: nish.aravamudan at gmail.com (Nish Aravamudan)
Date: Sat, 22 Jan 2005 10:49:48 -0800
Subject: [KJ] Re: [PATCH 18/21] ppc64/rtasd: replace schedule_timeout()
	with msleep()
In-Reply-To: <16881.58305.424934.884018@cargo.ozlabs.ibm.com>
References: <20050118001819.GA24698@us.ibm.com>
	<16881.58305.424934.884018@cargo.ozlabs.ibm.com>
Message-ID: <29495f1d05012210497ee384b3@mail.gmail.com>

On Sat, 22 Jan 2005 16:25:21 +1100, Paul Mackerras <paulus at samba.org> wrote:
> Nishanth Aravamudan writes:
> 
> > Description: Replace schedule_timeout() with msleep()/ssleep(). In both cases,
> > the current code sleeps in TASK_INTERRUPTIBLE but does not account for early
> > wakeups due to signals being caught; therefore I have used TASK_UNINTERRUPTIBLE
> > sleeps in both cases. The second sleep is slightly more difficult to convert as
> > rtas_event_scan_rate is variable. I have left it as a msleep() call, although
> > ssleep() may be more appropriate.
> 
> You have a good point about signals, but I don't like the way that
> this will elevate the load average by 1 the whole time.  We need to
> fix this properly instead.

Ideally, we should fix the load average calculation :) It just seems
counterintuitive to me that people would use a less correct
sleep-state just to prevent the load average from going up. But I
understand your motivation, so it's ok. Just an FYI/FWIW, it seems
most other driver authors/maintainers have been somewhat ok with using
TASK_UNINTERRUPTIBLE via msleep()/ssleep(), just because of the time
units difference (which is just so much easier to understand).
Admittedly, it's going to be a long time before HZ is completely out
of the kernel (at least the way it is used today to calculate
delays/timeouts -- there are ~4000 lines referencing HZ throughout
the kernel), but these patches *are* the first step.
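The units point can be made concrete: schedule_timeout() takes a count of jiffies, so every caller has to convert, whereas msleep() takes milliseconds directly. The conversion below rounds up so a sleep is never shorter than requested, which is the intent of msecs_to_jiffies(); HZ = 250 is only an example value for this sketch, and the real helper has HZ-specific branches.

```c
#include <assert.h>

/* Example tick rate for this sketch only. */
#define HZ 250

/* Milliseconds to jiffies, rounding up. */
static unsigned long ms_to_jiffies(unsigned int ms)
{
	return ((unsigned long)ms * HZ + 999) / 1000;
}
```

With HZ = 250 one jiffy is 4 ms, so a 10 ms request becomes 3 jiffies (12 ms) rather than 2 (8 ms).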

What exactly do you mean by fixing it properly? Do you want to deal with
signals? It doesn't seem like the code should fail if a signal hits,
but you could save the signal state, block all signals, sleep
interruptibly (so the load average isn't raised) and then restore all signals
on wake-up. I would also add a comment to the effect that
TASK_UNINTERRUPTIBLE would be acceptable, if the loadavg calculation
changes; just so another Janitor (once the calc. does change) could go
through and change it to msleep() / ssleep() then.
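A userspace POSIX analogue of the save/block/sleep/restore idea, for illustration only. The in-kernel version would presumably operate on the task's own blocked-signal mask instead; that code is not shown here. Note that SIGKILL and SIGSTOP cannot be blocked this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>

/*
 * Block every signal, sleep for 'ms' milliseconds, then restore the
 * caller's mask (pending signals are delivered at that point).
 * Returns 0 on success, -1 on error.
 */
static int sleep_ms_signals_blocked(long ms)
{
	sigset_t all, saved;
	struct timespec req = { .tv_sec = ms / 1000,
				.tv_nsec = (ms % 1000) * 1000000L };
	struct timespec rem;
	int ret = 0;

	assert(ms >= 0);
	sigfillset(&all);
	if (sigprocmask(SIG_SETMASK, &all, &saved) != 0)
		return -1;

	/* With signals blocked EINTR should not occur, but retry anyway. */
	while (nanosleep(&req, &rem) != 0) {
		if (errno != EINTR) {
			ret = -1;
			break;
		}
		req = rem;
	}

	if (sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
		ret = -1;
	return ret;
}
```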

Thanks,
Nish


From anton at samba.org  Sun Jan 23 15:27:33 2005
From: anton at samba.org (Anton Blanchard)
Date: Sun, 23 Jan 2005 15:27:33 +1100
Subject: [PATCH] ppc64: Allow EEH to be disabled
In-Reply-To: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com>
References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com>
Message-ID: <20050123042733.GA5920@krispykreme.ozlabs.ibm.com>

 
Allow EEH to be disabled for pSeries targets, but only if the EMBEDDED
option is enabled. This version incorporates some suggestions from
Arnd Bergmann and Linas Vepstas.

Signed-off-by: Anton Blanchard <anton at samba.org>

===== arch/ppc64/Kconfig 1.77 vs edited =====
--- 1.77/arch/ppc64/Kconfig	2005-01-21 15:56:33 +11:00
+++ edited/arch/ppc64/Kconfig	2005-01-23 15:15:19 +11:00
@@ -234,6 +234,11 @@
 	  Say Y here if you are building a kernel for a desktop system.
 	  Say N if you are unsure.
 
+config EEH
+	bool "PCI Extended Error Handling (EEH)" if EMBEDDED
+	depends on PPC_PSERIES
+	default y if !EMBEDDED
+
 #
 # Use the generic interrupt handling code in kernel/irq/:
 #
===== arch/ppc64/kernel/Makefile 1.58 vs edited =====
--- 1.58/arch/ppc64/kernel/Makefile	2005-01-08 16:43:52 +11:00
+++ edited/arch/ppc64/kernel/Makefile	2005-01-23 15:15:20 +11:00
@@ -30,9 +30,10 @@
 obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o
 
 obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \
-			     eeh.o pSeries_nvram.o rtasd.o ras.o \
+			     pSeries_nvram.o rtasd.o ras.o \
 			     xics.o rtas.o pSeries_setup.o pSeries_iommu.o
 
+obj-$(CONFIG_EEH)		+= eeh.o
 obj-$(CONFIG_PROC_FS)		+= proc_ppc64.o
 obj-$(CONFIG_RTAS_FLASH)	+= rtas_flash.o
 obj-$(CONFIG_SMP)		+= smp.o
===== arch/ppc64/kernel/eeh.c 1.43 vs edited =====
--- 1.43/arch/ppc64/kernel/eeh.c	2005-01-21 16:02:09 +11:00
+++ edited/arch/ppc64/kernel/eeh.c	2005-01-23 15:15:23 +11:00
@@ -764,8 +764,6 @@
 	struct device_node *phb, *np;
 	struct eeh_early_enable_info info;
 
-	init_pci_config_tokens();
-
 	np = of_find_node_by_path("/rtas");
 	if (np == NULL)
 		return;
===== arch/ppc64/kernel/pSeries_setup.c 1.66 vs edited =====
--- 1.66/arch/ppc64/kernel/pSeries_setup.c	2005-01-21 16:02:10 +11:00
+++ edited/arch/ppc64/kernel/pSeries_setup.c	2005-01-23 15:15:22 +11:00
@@ -40,7 +40,6 @@
 #include <linux/adb.h>
 #include <linux/module.h>
 #include <linux/delay.h>
-
 #include <linux/irq.h>
 #include <linux/seq_file.h>
 #include <linux/root_dev.h>
@@ -59,13 +58,12 @@
 #include <asm/time.h>
 #include <asm/nvram.h>
 #include <asm/plpar_wrappers.h>
-
-#include "i8259.h"
 #include <asm/xics.h>
-#include <asm/ppcdebug.h>
 #include <asm/cputable.h>
 
+#include "i8259.h"
 #include "mpic.h"
+#include "pci.h"
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -73,7 +71,6 @@
 #define DBG(fmt...)
 #endif
 
-extern void find_and_init_phbs(void);
 extern void pSeries_final_fixup(void);
 
 extern void pSeries_get_boot_time(struct rtc_time *rtc_time);
@@ -87,10 +84,6 @@
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
-unsigned long  virtPython0Facilities = 0;  // python0 facility area (memory mapped io) (64-bit format) VIRTUAL address.
-
-extern unsigned long loops_per_jiffy;
-
 extern unsigned long ppc_proc_freq;
 extern unsigned long ppc_tb_freq;
 
@@ -230,7 +223,7 @@
 	fwnmi_init();
 
 	/* Find and initialize PCI host bridges */
-	/* iSeries needs to be done much later. */
+	init_pci_config_tokens();
 	eeh_init();
 	find_and_init_phbs();
 
===== include/asm-ppc64/eeh.h 1.23 vs edited =====
--- 1.23/include/asm-ppc64/eeh.h	2004-10-26 09:17:38 +10:00
+++ edited/include/asm-ppc64/eeh.h	2005-01-23 15:15:21 +11:00
@@ -20,28 +20,28 @@
 #ifndef _PPC64_EEH_H
 #define _PPC64_EEH_H
 
+#include <linux/config.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/string.h>
-#include <linux/notifier.h>
 
 struct pci_dev;
 struct device_node;
+struct device_node;
+struct notifier_block;
+
+#ifdef CONFIG_EEH
 
 /* Values for eeh_mode bits in device_node */
 #define EEH_MODE_SUPPORTED	(1<<0)
 #define EEH_MODE_NOCHECK	(1<<1)
 #define EEH_MODE_ISOLATED	(1<<2)
 
-#ifdef CONFIG_PPC_PSERIES
-extern void __init eeh_init(void);
-unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val);
-int eeh_dn_check_failure (struct device_node *dn, struct pci_dev *dev);
-void __iomem *eeh_ioremap(unsigned long addr, void __iomem *vaddr);
+void __init eeh_init(void);
+unsigned long eeh_check_failure(const volatile void __iomem *token,
+				unsigned long val);
+int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev);
 void __init pci_addr_cache_build(void);
-#else
-#define eeh_check_failure(token, val) (val)
-#endif
 
 /**
  * eeh_add_device_early
@@ -52,7 +52,6 @@
  * device (including config space i/o).  Call eeh_add_device_late
  * to finish the eeh setup for this device.
  */
-struct device_node;
 void eeh_add_device_early(struct device_node *);
 void eeh_add_device_late(struct pci_dev *);
 
@@ -69,8 +68,6 @@
 #define EEH_ENABLE		1
 #define EEH_RELEASE_LOADSTORE	2
 #define EEH_RELEASE_DMA		3
-int eeh_set_option(struct pci_dev *dev, int options);
-
 
 /**
  * Notifier event flags.
@@ -107,6 +104,18 @@
  */
 #define EEH_IO_ERROR_VALUE(size)	(~0U >> ((4 - (size)) * 8))
 
+#else
+#define eeh_init()
+#define eeh_check_failure(token, val) (val)
+#define eeh_dn_check_failure(dn, dev) (0)
+#define pci_addr_cache_build()
+#define eeh_add_device_early(dn)
+#define eeh_add_device_late(dev)
+#define eeh_remove_device(dev)
+#define EEH_POSSIBLE_ERROR(val, type) (0)
+#define EEH_IO_ERROR_VALUE(size) (-1UL)
+#endif
+
 /* 
  * MMIO read/write operations with EEH support.
  */
@@ -194,7 +203,8 @@
 #define EEH_CHECK_ALIGN(v,a) \
 	((((unsigned long)(v)) & ((a) - 1)) == 0)
 
-static inline void eeh_memset_io(volatile void __iomem *addr, int c, unsigned long n)
+static inline void eeh_memset_io(volatile void __iomem *addr, int c,
+				 unsigned long n)
 {
 	u32 lc = c;
 	lc |= lc << 8;
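Two small pieces of arithmetic in the eeh.h hunk can be checked in isolation. EEH_IO_ERROR_VALUE(size), kept as context above, builds an all-ones value of 1, 2 or 4 bytes (assuming a 32-bit unsigned int). eeh_memset_io() begins by replicating the fill byte (`u32 lc = c; lc |= lc << 8;`); the second shift by 16 and the masking of c to a byte are assumptions of this sketch, since the rest of the function is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* From the eeh.h hunk above: all-ones value for a 1, 2 or 4 byte read. */
#define EEH_IO_ERROR_VALUE(size)	(~0U >> ((4 - (size)) * 8))

/* Replicate a fill byte across a 32-bit word for word-sized stores. */
static uint32_t replicate_byte(int c)
{
	uint32_t lc = (uint8_t)c;	/* memset-style: use only the low byte */

	lc |= lc << 8;			/* 0x000000ab -> 0x0000abab */
	lc |= lc << 16;			/* 0x0000abab -> 0xabababab */
	return lc;
}
```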


From anton at samba.org  Sun Jan 23 15:34:23 2005
From: anton at samba.org (Anton Blanchard)
Date: Sun, 23 Jan 2005 15:34:23 +1100
Subject: [PATCH] ppc64: disable some boot wrapper debug
Message-ID: <20050123043423.GB5920@krispykreme.ozlabs.ibm.com>


Hi,

The debug information in the boot wrapper can be quite verbose (it
prints an entry for every address it attempts to claim). Disable it.
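Switching `#define DEBUG` to `#undef DEBUG` removes such trace output at compile time. The rest of boot/main.c is not shown, so the DBG() macro and try_claims() loop below are hypothetical; they only illustrate how trace calls compile away when DEBUG is undefined.

```c
#include <assert.h>
#include <stdio.h>

#undef DEBUG		/* "#define DEBUG" would print one line per attempt */

#ifdef DEBUG
#define DBG(fmt, ...)	printf(fmt, __VA_ARGS__)
#else
#define DBG(fmt, ...)	do { } while (0)	/* arguments not evaluated */
#endif

/* Hypothetical stand-in for a claim loop; returns the next address. */
static unsigned long try_claims(unsigned long base, unsigned long step, int n)
{
	unsigned long addr = base;
	int i;

	assert(n > 0);
	for (i = 0; i < n; i++, addr += step)
		DBG("trying to claim 0x%lx\n", addr);
	return addr;
}
```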

Anton

Signed-off-by: Anton Blanchard <anton at samba.org>

diff -puN arch/ppc64/boot/main.c~disable_boot_debug arch/ppc64/boot/main.c
--- foobar2/arch/ppc64/boot/main.c~disable_boot_debug	2005-01-23 13:34:05.555656631 +1100
+++ foobar2-anton/arch/ppc64/boot/main.c	2005-01-23 13:34:05.577655139 +1100
@@ -73,7 +73,7 @@ void *stdin;
 void *stdout;
 void *stderr;
 
-#define DEBUG
+#undef DEBUG
 
 static unsigned long claim_base = PROG_START;
 
_


From anton at samba.org  Sun Jan 23 15:48:48 2005
From: anton at samba.org (Anton Blanchard)
Date: Sun, 23 Jan 2005 15:48:48 +1100
Subject: [PATCH] ppc64: Problem disabling SYSVIPC
Message-ID: <20050123044848.GC5920@krispykreme.ozlabs.ibm.com>


Hi,

The kernel wouldn't link when SYSVIPC was disabled. x86-64 was already
defining a cond_syscall for sys32_ipc; instead of duplicating it in the
ppc64 port, move it into the arch-specific portion of kernel/sys_ni.c.
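cond_syscall() works by making the named syscall a weak symbol that falls back to sys_ni_syscall(), which returns -ENOSYS, when no real definition is linked in. The sketch below reproduces that idea in userspace with GCC's `weak` and `alias` attributes (ELF targets only). The `demo_` names are hypothetical, and the kernel's own macro is arch-specific.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the kernel's sys_ni_syscall(): "not implemented". */
long demo_sys_ni_syscall(void)
{
	return -ENOSYS;
}

/*
 * Roughly what cond_syscall(sys32_ipc) arranges: a weak alias that is
 * overridden if a strong definition is linked in (e.g. when the real
 * function is compiled because CONFIG_SYSVIPC is set).
 */
long demo_sys32_ipc(void) __attribute__((weak, alias("demo_sys_ni_syscall")));
```

Because the real sys32_ipc() is now wrapped in `#ifdef CONFIG_SYSVIPC`, a single cond_syscall() in kernel/sys_ni.c covers both ppc64 and x86-64.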

Anton

Signed-off-by: Anton Blanchard <anton at samba.org>

diff -puN kernel/sys_ni.c~fix_config_sysvipc2 kernel/sys_ni.c
--- foobar2/kernel/sys_ni.c~fix_config_sysvipc2	2005-01-12 00:17:55.800846282 +1100
+++ foobar2-anton/kernel/sys_ni.c	2005-01-12 00:18:59.720579810 +1100
@@ -81,4 +81,4 @@ cond_syscall(compat_sys_socketcall)
 cond_syscall(sys_pciconfig_read)
 cond_syscall(sys_pciconfig_write)
 cond_syscall(sys_pciconfig_iobase)
-
+cond_syscall(sys32_ipc)
diff -puN arch/ppc64/kernel/sys_ppc32.c~fix_config_sysvipc2 arch/ppc64/kernel/sys_ppc32.c
--- foobar2/arch/ppc64/kernel/sys_ppc32.c~fix_config_sysvipc2	2005-01-12 00:18:09.526904432 +1100
+++ foobar2-anton/arch/ppc64/kernel/sys_ppc32.c	2005-01-12 00:18:25.130082960 +1100
@@ -492,6 +492,7 @@ asmlinkage long sys32_settimeofday(struc
 	return do_sys_settimeofday(tv ? &kts : NULL, tz ? &ktz : NULL);
 }
 
+#ifdef CONFIG_SYSVIPC
 long sys32_ipc(u32 call, u32 first, u32 second, u32 third, compat_uptr_t ptr,
 	       u32 fifth)
 {
@@ -556,6 +557,7 @@ long sys32_ipc(u32 call, u32 first, u32 
 
 	return -ENOSYS;
 }
+#endif
 
 /* Note: it is necessary to treat out_fd and in_fd as unsigned ints, 
  * with the corresponding cast to a signed int to insure that the 
diff -puN arch/x86_64/ia32/sys_ia32.c~fix_config_sysvipc2 arch/x86_64/ia32/sys_ia32.c
--- foobar2/arch/x86_64/ia32/sys_ia32.c~fix_config_sysvipc2	2005-01-12 00:18:46.324623956 +1100
+++ foobar2-anton/arch/x86_64/ia32/sys_ia32.c	2005-01-12 00:18:52.356042193 +1100
@@ -1082,8 +1082,6 @@ long sys32_lookup_dcookie(u32 addr_low, 
 	return sys_lookup_dcookie(((u64)addr_high << 32) | addr_low, buf, len);
 }
 
-cond_syscall(sys32_ipc)
-
 static int __init ia32_init (void)
 {
 	printk("IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $\n");  
_


From anton at samba.org  Sun Jan 23 16:36:52 2005
From: anton at samba.org (Anton Blanchard)
Date: Sun, 23 Jan 2005 16:36:52 +1100
Subject: [PATCH] ppc64: Enable virtual ethernet and virtual scsi
Message-ID: <20050123053652.GE5920@krispykreme.ozlabs.ibm.com>


Enable the virtual ethernet and virtual scsi drivers in the pseries
config. Since our root device may be on either, we need them compiled in
(unless we play initrd tricks).

Signed-off-by: Anton Blanchard <anton at samba.org>

===== arch/ppc64/configs/pSeries_defconfig 1.10 vs edited =====
--- 1.10/arch/ppc64/configs/pSeries_defconfig	2004-11-27 22:20:13 +11:00
+++ edited/arch/ppc64/configs/pSeries_defconfig	2005-01-23 16:26:07 +11:00
@@ -268,7 +268,7 @@
 # CONFIG_SCSI_FUTURE_DOMAIN is not set
 # CONFIG_SCSI_GDTH is not set
 # CONFIG_SCSI_IPS is not set
-CONFIG_SCSI_IBMVSCSI=m
+CONFIG_SCSI_IBMVSCSI=y
 # CONFIG_SCSI_INIA100 is not set
 CONFIG_SCSI_SYM53C8XX_2=y
 CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0
@@ -492,7 +492,7 @@
 #
 # CONFIG_NET_TULIP is not set
 # CONFIG_HP100 is not set
-CONFIG_IBMVETH=m
+CONFIG_IBMVETH=y
 CONFIG_NET_PCI=y
 CONFIG_PCNET32=y
 # CONFIG_AMD8111_ETH is not set


From linas at austin.ibm.com  Tue Jan 25 10:04:53 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Mon, 24 Jan 2005 17:04:53 -0600
Subject: saving & analyzing (by the bootloader) kernel boot log buffer
	on "vanilla" Linux (2.6) usable for 8xx ppc
In-Reply-To: <313680C9A886D511A06000204840E1CF0A64754F@whq-msgusr-02.pit.comms.marconi.com>
References: <313680C9A886D511A06000204840E1CF0A64754F@whq-msgusr-02.pit.comms.marconi.com>
Message-ID: <20050124230453.GN9140@austin.ibm.com>

On Sat, Jan 22, 2005 at 06:26:43AM -0500, Povolotsky, Alexander was heard to remark:
> I would suggest you CONSIDER implementing it - it would help with early
> debugging when the serial console is not working and no "live" output
> is available - I am in such a situation right now!

Are you perchance seeing "Warning: unable to open an initial console."
on ppc64? If so, I am debugging that right now, and hope to have a patch
soon.

--linas


From anton at samba.org  Wed Jan 26 00:59:30 2005
From: anton at samba.org (Anton Blanchard)
Date: Wed, 26 Jan 2005 00:59:30 +1100
Subject: [PATCH] ppc64: mask lower bits in tlbie
Message-ID: <20050125135930.GH5920@krispykreme.ozlabs.ibm.com>


Hi,

We weren't masking the lower bits of the VA in a tlbie(l) instruction.
While most CPUs ignore these bits, we should play it safe and follow the spec.
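The masking in the patched __tlbie() can be checked with a quick calculation. The page sizes here (4K base pages and 16M large pages, i.e. PAGE_SHIFT = 12 and HPAGE_SHIFT = 24) are assumptions for this sketch, and the code only computes the operand; it does not issue the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page sizes: 4K base pages, 16M large pages. */
#define PAGE_SHIFT	12
#define HPAGE_SHIFT	24
#define PAGE_MASK	(~((1ULL << PAGE_SHIFT) - 1))
#define HPAGE_MASK	(~((1ULL << HPAGE_SHIFT) - 1))

/* VA operand as computed by the patched __tlbie() before the tlbie. */
static uint64_t tlbie_va(uint64_t va, int large)
{
	va &= ~(0xffffULL << 48);		/* clear top 16 bits */
	va &= large ? HPAGE_MASK : PAGE_MASK;	/* clear page-offset bits */
	return va;
}
```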

Anton

Signed-off-by: Anton Blanchard <anton at samba.org>

diff -puN include/asm-ppc64/mmu.h~fix_tlbie include/asm-ppc64/mmu.h
--- gr_work/include/asm-ppc64/mmu.h~fix_tlbie	2005-01-12 22:54:35.098404315 -0600
+++ gr_work-anton/include/asm-ppc64/mmu.h	2005-01-12 22:54:35.107402890 -0600
@@ -122,10 +122,13 @@ static inline void __tlbie(unsigned long
 	/* clear top 16 bits, non SLS segment */
 	va &= ~(0xffffULL << 48);
 
-	if (large)
+	if (large) {
+		va &= HPAGE_MASK;
 		asm volatile("tlbie %0,1" : : "r"(va) : "memory");
-	else
+	} else {
+		va &= PAGE_MASK;
 		asm volatile("tlbie %0,0" : : "r"(va) : "memory");
+	}
 }
 
 static inline void tlbie(unsigned long va, int large)
@@ -139,6 +142,7 @@ static inline void __tlbiel(unsigned lon
 {
 	/* clear top 16 bits, non SLS segment */
 	va &= ~(0xffffULL << 48);
+	va &= PAGE_MASK;
 
 	/* 
 	 * Thanks to Alan Modra we are now able to use machine specific 
_


From nathanl at austin.ibm.com  Wed Jan 26 11:22:01 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Tue, 25 Jan 2005 18:22:01 -0600
Subject: [PATCH] show -1 for physical_id of non-present cpus
Message-ID: <1106698921.9091.4.camel@pants.austin.ibm.com>


Make the physical_id cpu attribute on ppc64 show -1 instead of 65535
for non-present cpus.

Signed-off-by: Nathan Lynch <nathanl at austin.ibm.com>


---


diff -puN arch/ppc64/kernel/sysfs.c~cpu-physical_id-signed arch/ppc64/kernel/sysfs.c
--- linux-2.6.11-rc2-bk2/arch/ppc64/kernel/sysfs.c~cpu-physical_id-signed	2005-01-24 21:29:57.000000000 -0600
+++ linux-2.6.11-rc2-bk2-nathanl/arch/ppc64/kernel/sysfs.c	2005-01-25 09:41:15.000000000 -0600
@@ -387,7 +387,7 @@ static ssize_t show_physical_id(struct s
 {
 	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
 
-	return sprintf(buf, "%u\n", get_hard_smp_processor_id(cpu->sysdev.id));
+	return sprintf(buf, "%hd\n", get_hard_smp_processor_id(cpu->sysdev.id));
 }
 static SYSDEV_ATTR(physical_id, 0444, show_physical_id, NULL);
 

_


From olof at austin.ibm.com  Wed Jan 26 15:11:43 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Tue, 25 Jan 2005 22:11:43 -0600
Subject: [PATCH] show -1 for physical_id of non-present cpus
In-Reply-To: <1106698921.9091.4.camel@pants.austin.ibm.com>
References: <1106698921.9091.4.camel@pants.austin.ibm.com>
Message-ID: <41F7187F.9070602@austin.ibm.com>

Nathan Lynch wrote:

>Make the physical_id cpu attribute on ppc64 show -1 instead of 65535
>for non-present cpus.
>

Good catch.

I'm not sure if I prefer your patch or just switching hw_cpu_id to a s16 
and using %d. Either way is fine with me.
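The two options can be compared in userspace. Assuming the hardware id is held in a 16-bit unsigned field with 0xffff marking a non-present cpu, `%u` prints 65535, `%hd` converts the value to short before printing and yields -1 on two's-complement targets, and a signed 16-bit field printed with `%d` yields -1 without relying on that conversion. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Current behaviour: 0xffff prints as 65535. */
static int format_id_unsigned(char *buf, size_t len, uint16_t id)
{
	return snprintf(buf, len, "%u\n", (unsigned int)id);
}

/* Nathan's patch: %hd converts the promoted value to short, giving -1. */
static int format_id_hd(char *buf, size_t len, uint16_t id)
{
	return snprintf(buf, len, "%hd\n", id);
}

/* Olof's alternative: store the id as s16 and print with %d. */
static int format_id_s16(char *buf, size_t len, int16_t id)
{
	return snprintf(buf, len, "%d\n", id);
}
```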


-Olof


From nathanl at austin.ibm.com  Wed Jan 26 15:41:03 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Tue, 25 Jan 2005 22:41:03 -0600
Subject: [PATCH] show -1 for physical_id of non-present cpus
In-Reply-To: <41F7187F.9070602@austin.ibm.com>
References: <1106698921.9091.4.camel@pants.austin.ibm.com>
	<41F7187F.9070602@austin.ibm.com>
Message-ID: <1106714463.9855.16.camel@localhost.localdomain>

On Tue, 2005-01-25 at 22:11 -0600, Olof Johansson wrote:
> Nathan Lynch wrote:
> 
> >Make the physical_id cpu attribute on ppc64 show -1 instead of 65535
> >for non-present cpus.
> >
> 
> Good catch.
> 
> I'm not sure if I prefer your patch or just switching hw_cpu_id to a s16 
> and using %d. Either way is fine with me.

fwiw, I plan to make the issue moot eventually by having only present
cpus show up in sysfs, but that's not going to happen in time for
2.6.11.

Nathan


From nathanl at austin.ibm.com  Wed Jan 26 16:06:31 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Tue, 25 Jan 2005 23:06:31 -0600
Subject: [RFC/PATCH 1/2] use notifier chain for device node addition and
	removal
Message-ID: <1106715991.9855.22.camel@localhost.localdomain>

This patch attempts to clean up the code which handles changes to the
Open Firmware device tree during PCI hotplug or DLPAR operations by
replacing the explicit fixups (e.g. of_finish_dynamic_node,
of_cleanup_node) with a notifier call chain.  It doesn't make all that
much of a dent in the ugliness -- note that I've simply folded
of_finish_dynamic_node into a high-priority notifier block while
leaving most of the function intact.
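To see how the priority field and NOTIFY_BAD interact in this design, here is a userspace re-creation modeled on the 2.6 notifier_chain_register() and notifier_call_chain(). Registration keeps the list sorted by descending priority (equal priorities go after existing entries), and the call loop stops as soon as a callback returns a value with NOTIFY_STOP_MASK set. The constants follow linux/notifier.h; chain_register(), chain_call() and the demo_ callbacks are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000
#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
	int priority;
};

/* Insert nb so the list stays sorted by descending priority. */
static int chain_register(struct notifier_block **list,
			  struct notifier_block *nb)
{
	while (*list && nb->priority <= (*list)->priority)
		list = &(*list)->next;
	nb->next = *list;
	*list = nb;
	return 0;
}

/* Call each block in order; stop early if one sets NOTIFY_STOP_MASK. */
static int chain_call(struct notifier_block **list,
		      unsigned long action, void *data)
{
	int ret = NOTIFY_DONE;
	struct notifier_block *nb;

	for (nb = *list; nb; nb = nb->next) {
		ret = nb->notifier_call(nb, action, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Demo callbacks: append a tag to a log so the call order is visible. */
static char demo_log[8];
static int demo_len;

static int demo_early(struct notifier_block *nb, unsigned long action,
		      void *data)
{
	(void)nb; (void)data;
	demo_log[demo_len++] = 'E';
	return action == 2 ? NOTIFY_BAD : NOTIFY_OK;	/* veto action 2 */
}

static int demo_late(struct notifier_block *nb, unsigned long action,
		     void *data)
{
	(void)nb; (void)action; (void)data;
	demo_log[demo_len++] = 'L';
	return NOTIFY_OK;
}

static struct notifier_block *demo_chain;
static struct notifier_block demo_late_nb = { .notifier_call = demo_late };
static struct notifier_block demo_early_nb = {
	.notifier_call = demo_early,
	.priority = 10,		/* runs before default-priority blocks */
};
```

This is why of_reconfig_nb, at priority 10, runs ahead of the default-priority (0) PCI, IOMMU and SMP blocks, and why of_add_node() can compare the chain's return value against NOTIFY_BAD.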

My ulterior motive here is that I want to be notified when processor
device nodes are added to the system, and I don't want to add yet more
special-case code to prom.c.  I'll be following up with a patch for this.

We could probably go further with the notifier chain approach, even to
the point of moving of_finish_dynamic_node and friends to a separate
module which could be config'd out for non-pSeries builds.

I haven't tested this with anything but adding and removing processors
from a Power5 partition, btw.  I'd appreciate any other testing
(e.g. PCI, VIO).

Thoughts?

Signed-off-by: Nathan Lynch <nathanl at austin.ibm.com>


---


diff -puN arch/ppc64/kernel/pSeries_iommu.c~of-dlpar-notifier arch/ppc64/kernel/pSeries_iommu.c
--- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/pSeries_iommu.c~of-dlpar-notifier	2005-01-25 22:56:46.000000000 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/pSeries_iommu.c	2005-01-25 22:56:46.000000000 -0600
@@ -34,6 +34,7 @@
 #include <linux/string.h>
 #include <linux/pci.h>
 #include <linux/dma-mapping.h>
+#include <linux/notifier.h>
 #include <asm/io.h>
 #include <asm/prom.h>
 #include <asm/rtas.h>
@@ -439,6 +440,29 @@ static void iommu_dev_setup_pSeries(stru
 	}
 }
 
+static int iommu_of_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *_node)
+{
+	struct device_node *node = (struct device_node *)_node;
+	int err = NOTIFY_DONE;
+
+	switch (action) {
+	case OF_RECONFIG_REMOVE:
+		if (node->iommu_table &&
+		    get_property(node, "ibm,dma-window", NULL)) {
+			iommu_free_table(node);
+			err = NOTIFY_OK;
+		}
+		break;
+	default:
+		break;
+	}
+	return err;
+}
+
+static struct notifier_block iommu_of_reconfig_nb = {
+	.notifier_call = iommu_of_reconfig_notifier,
+};
+
 static void iommu_bus_setup_null(struct pci_bus *b) { }
 static void iommu_dev_setup_null(struct pci_dev *d) { }
 
@@ -471,6 +495,8 @@ void iommu_init_early_pSeries(void)
 
 	ppc_md.iommu_dev_setup = iommu_dev_setup_pSeries;
 
+	register_of_reconfig_notifier(&iommu_of_reconfig_nb);
+
 	pci_iommu_init();
 }
 
diff -puN arch/ppc64/kernel/pci_dn.c~of-dlpar-notifier arch/ppc64/kernel/pci_dn.c
--- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/pci_dn.c~of-dlpar-notifier	2005-01-25 22:56:46.000000000 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/pci_dn.c	2005-01-25 22:56:46.000000000 -0600
@@ -23,6 +23,7 @@
 #include <linux/pci.h>
 #include <linux/string.h>
 #include <linux/init.h>
+#include <linux/notifier.h>
 
 #include <asm/io.h>
 #include <asm/prom.h>
@@ -158,6 +159,25 @@ struct device_node *fetch_dev_dn(struct 
 }
 EXPORT_SYMBOL(fetch_dev_dn);
 
+static int pci_of_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *_node)
+{
+	struct device_node *node = (struct device_node *)_node;
+	int err = NOTIFY_OK;
+
+	switch (action) {
+	case OF_RECONFIG_ADD:
+		update_dn_pci_info(node, node->parent->phb);
+		break;
+	default:
+		err = NOTIFY_DONE;
+		break;
+	}
+	return err;
+}
+
+static struct notifier_block pci_of_reconfig_nb = {
+	.notifier_call = pci_of_reconfig_notifier,
+};
 
 /*
  * Actually initialize the phbs.
@@ -170,4 +190,7 @@ void __init pci_devs_phb_init(void)
 	/* This must be done first so the device nodes have valid pci info! */
 	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
 		pci_devs_phb_init_dynamic(phb);
+
+	if (systemcfg->platform & PLATFORM_PSERIES)
+		register_of_reconfig_notifier(&pci_of_reconfig_nb);
 }
diff -puN arch/ppc64/kernel/prom.c~of-dlpar-notifier arch/ppc64/kernel/prom.c
--- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/prom.c~of-dlpar-notifier	2005-01-25 22:56:46.000000000 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/prom.c	2005-01-25 22:56:46.000000000 -0600
@@ -32,6 +32,7 @@
 #include <linux/delay.h>
 #include <linux/initrd.h>
 #include <linux/bitops.h>
+#include <linux/notifier.h>
 #include <asm/prom.h>
 #include <asm/rtas.h>
 #include <asm/lmb.h>
@@ -1671,7 +1672,6 @@ static int of_finish_dynamic_node_interr
 static int of_finish_dynamic_node(struct device_node *node)
 {
 	struct device_node *parent = of_get_parent(node);
-	u32 *regs;
 	int err = 0;
 	phandle *ibm_phandle;
 
@@ -1726,25 +1726,53 @@ static int of_finish_dynamic_node(struct
 		err = of_finish_dynamic_node_interrupts(node);
 		if (err) goto out;
 	}
+out:
+	of_node_put(parent);
+	return err;
+}
 
-	/* now do the rough equivalent of update_dn_pci_info, this
-	 * probably is not correct for phb's, but should work for
-	 * IOAs and slots.
-	 */
+static struct notifier_block *of_reconfig_chain;
+
+int register_of_reconfig_notifier(struct notifier_block *nb)
+{
+	return notifier_chain_register(&of_reconfig_chain, nb);
+}
 
-	node->phb = parent->phb;
+void unregister_of_reconfig_notifier(struct notifier_block *nb)
+{
+	notifier_chain_unregister(&of_reconfig_chain, nb);
+}
 
-	regs = (u32 *)get_property(node, "reg", NULL);
-	if (regs) {
-		node->busno = (regs[0] >> 16) & 0xff;
-		node->devfn = (regs[0] >> 8) & 0xff;
-	}
+static int of_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *_node)
+{
+	struct device_node *node = (struct device_node *)_node;
+	int err = NOTIFY_OK;
 
-out:
-	of_node_put(parent);
+	switch (action) {
+	case OF_RECONFIG_ADD:
+		if (of_finish_dynamic_node(node))
+			err = NOTIFY_BAD;
+		break;
+	default:
+		err = NOTIFY_DONE;
+		break;
+	}
 	return err;
 }
 
+static struct notifier_block of_reconfig_nb = {
+	.notifier_call = of_reconfig_notifier,
+	.priority = 10, /* This one needs to run first */
+};
+
+static int __init of_reconfig_setup(void)
+{
+	if (systemcfg->platform & PLATFORM_PSERIES)
+		register_of_reconfig_notifier(&of_reconfig_nb);
+	return 0;
+}
+__initcall(of_reconfig_setup);
+
 /*
  * Given a path and a property list, construct an OF device node, add
  * it to the device tree and global list, and place it in
@@ -1778,9 +1806,11 @@ int of_add_node(const char *path, struct
 		return -EINVAL; /* could also be ENOMEM, though */
 	}
 
-	if (0 != (err = of_finish_dynamic_node(np))) {
+	err = notifier_call_chain(&of_reconfig_chain, OF_RECONFIG_ADD, np);
+	if (err == NOTIFY_BAD) {
+		printk(KERN_WARNING "Failed to add device node %s\n", path);
 		kfree(np);
-		return err;
+		return -EINVAL;
 	}
 
 	write_lock(&devtree_lock);
@@ -1798,15 +1828,6 @@ int of_add_node(const char *path, struct
 }
 
 /*
- * Prepare an OF node for removal from system
- */
-static void of_cleanup_node(struct device_node *np)
-{
-	if (np->iommu_table && get_property(np, "ibm,dma-window", NULL))
-		iommu_free_table(np);
-}
-
-/*
  * "Unplug" a node from the device tree.  The caller must hold
  * a reference to the node.  The memory associated with the node
  * is not freed until its refcount goes to zero.
@@ -1814,6 +1835,7 @@ static void of_cleanup_node(struct devic
 int of_remove_node(struct device_node *np)
 {
 	struct device_node *parent, *child;
+	int err;
 
 	parent = of_get_parent(np);
 	if (!parent)
@@ -1824,7 +1846,9 @@ int of_remove_node(struct device_node *n
 		return -EBUSY;
 	}
 
-	of_cleanup_node(np);
+	err = notifier_call_chain(&of_reconfig_chain, OF_RECONFIG_REMOVE, np);
+	if (err == NOTIFY_BAD)
+		return -EBUSY;
 
 	write_lock(&devtree_lock);
 	remove_node_proc_entries(np);
diff -puN include/asm-ppc64/prom.h~of-dlpar-notifier include/asm-ppc64/prom.h
--- linux-2.6.11-rc2-mm1/include/asm-ppc64/prom.h~of-dlpar-notifier	2005-01-25 22:56:46.000000000 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/include/asm-ppc64/prom.h	2005-01-25 22:56:46.000000000 -0600
@@ -211,6 +211,14 @@ extern void of_node_put(struct device_no
 extern int of_add_node(const char *path, struct property *proplist);
 extern int of_remove_node(struct device_node *np);
 
+/* For notification of device node addition and removal */
+extern int register_of_reconfig_notifier(struct notifier_block *nb);
+extern void unregister_of_reconfig_notifier(struct notifier_block *nb);
+
+/* Notification codes for users of the above */
+#define OF_RECONFIG_ADD       0x0001
+#define OF_RECONFIG_REMOVE    0x0002
+
 /* Other Prototypes */
 extern unsigned long prom_init(unsigned long, unsigned long, unsigned long,
 	unsigned long, unsigned long);

_


From nathanl at austin.ibm.com  Wed Jan 26 16:11:05 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Tue, 25 Jan 2005 23:11:05 -0600
Subject: [RFC/PATCH 2/2] handle cpu device node addition and removal
In-Reply-To: <1106715991.9855.22.camel@localhost.localdomain>
References: <1106715991.9855.22.camel@localhost.localdomain>
Message-ID: <1106716265.9855.26.camel@localhost.localdomain>

Using the notifier chain in a previous patch, handle addition and
removal of processors on pSeries LPAR.  The new notifier call updates
cpu_present_map and sets hw_cpu_id in the paca appropriately.  Note
that we must handle more than one cpu being added or going away to
account for SMT processors.
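The slot search in pSeries_add_processor() can be modeled with a single 64-bit word standing in for cpumask_t (so at most 64 cpus, an assumption of this sketch). It walks aligned blocks of nthreads cpus, as the cpus_shift_left() loop does, and returns the first block lying entirely in possible-but-not-present. The loop stops before a block would run past the top of the mask. find_cpu_slots() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the mask of logical cpus to mark present for a processor with
 * nthreads threads, or 0 if no aligned run of free slots exists.
 * Assumes present is a subset of possible.
 */
static uint64_t find_cpu_slots(uint64_t possible, uint64_t present,
			       int nthreads)
{
	uint64_t candidate = possible ^ present;  /* possible, not present */
	int start;

	assert(nthreads > 0 && nthreads < 64);
	if (candidate == 0)
		return 0;

	for (start = 0; start + nthreads <= 64; start += nthreads) {
		uint64_t block = ((1ULL << nthreads) - 1) << start;

		if ((block & ~candidate) == 0)	/* block is a subset */
			return block;
	}
	return 0;
}
```

For example, with 8 possible cpus and cpus 0-1 present, a 2-thread processor lands on cpus 2-3 and a 4-thread one on cpus 4-7.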

This allows us to stop abusing cpu_present_map, and lets us get rid of
find_physical_cpu_to_start, which has always been a bit dodgy.

The code which updates cpu_present_map I plan to move to the generic
hotplug cpu code someday, but I think this is a good intermediate
step for now.

Tested on Power5.
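In the pSeries_smp_notifier() callback in this patch, the OF_RECONFIG_REMOVE case has no `break` before `default:`, so control falls through and err is reset to NOTIFY_DONE after every removal. As written, pSeries_remove_processor() always returns 0, so this is latent, but a NOTIFY_BAD from that case could never reach of_remove_node(). A sketch of the dispatch with an explicit break per case follows; the fake_ result variables are hypothetical stand-ins for the add/remove helpers.

```c
#include <assert.h>

#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_BAD		(0x8000 | 0x0002)
#define OF_RECONFIG_ADD		0x0001
#define OF_RECONFIG_REMOVE	0x0002

/* Hypothetical results of pSeries_add/remove_processor() (0 = success). */
static int fake_add_result, fake_remove_result;

static int smp_notifier_sketch(unsigned long action)
{
	int err = NOTIFY_OK;

	switch (action) {
	case OF_RECONFIG_ADD:
		if (fake_add_result)
			err = NOTIFY_BAD;
		break;
	case OF_RECONFIG_REMOVE:
		if (fake_remove_result)
			err = NOTIFY_BAD;
		break;		/* without this, default: overwrites err */
	default:
		err = NOTIFY_DONE;
		break;
	}
	return err;
}
```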

Signed-off-by: Nathan Lynch <nathanl at austin.ibm.com>


---


diff -puN arch/ppc64/kernel/pSeries_smp.c~cpu-dlpar-notifier arch/ppc64/kernel/pSeries_smp.c
--- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/pSeries_smp.c~cpu-dlpar-notifier	2005-01-25 22:57:15.000000000 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/pSeries_smp.c	2005-01-25 22:57:15.000000000 -0600
@@ -27,6 +27,7 @@
 #include <linux/err.h>
 #include <linux/sysdev.h>
 #include <linux/cpu.h>
+#include <linux/notifier.h>
 
 #include <asm/ptrace.h>
 #include <asm/atomic.h>
@@ -125,54 +126,6 @@ void pSeries_cpu_die(unsigned int cpu)
 	paca[cpu].cpu_start = 0;
 }
 
-/* Search all cpu device nodes for an offline logical cpu.  If a
- * device node has a "ibm,my-drc-index" property (meaning this is an
- * LPAR), paranoid-check whether we own the cpu.  For each "thread"
- * of a cpu, if it is offline and has the same hw index as before,
- * grab that in preference.
- */
-static unsigned int find_physical_cpu_to_start(unsigned int old_hwindex)
-{
-	struct device_node *np = NULL;
-	unsigned int best = -1U;
-
-	while ((np = of_find_node_by_type(np, "cpu"))) {
-		int nr_threads, len;
-		u32 *index = (u32 *)get_property(np, "ibm,my-drc-index", NULL);
-		u32 *tid = (u32 *)
-			get_property(np, "ibm,ppc-interrupt-server#s", &len);
-
-		if (!tid)
-			tid = (u32 *)get_property(np, "reg", &len);
-
-		if (!tid)
-			continue;
-
-		/* If there is a drc-index, make sure that we own
-		 * the cpu.
-		 */
-		if (index) {
-			int state;
-			int rc = rtas_get_sensor(9003, *index, &state);
-			if (rc != 0 || state != 1)
-				continue;
-		}
-
-		nr_threads = len / sizeof(u32);
-
-		while (nr_threads--) {
-			if (0 == query_cpu_stopped(tid[nr_threads])) {
-				best = tid[nr_threads];
-				if (best == old_hwindex)
-					goto out;
-			}
-		}
-	}
-out:
-	of_node_put(np);
-	return best;
-}
-
 /**
  * smp_startup_cpu() - start the given cpu
  *
@@ -189,25 +142,16 @@ static inline int __devinit smp_startup_
 	int status;
 	unsigned long start_here = __pa((u32)*((unsigned long *)
 					       pSeries_secondary_smp_init));
-	unsigned int pcpu;
+	unsigned int pcpu = get_hard_smp_processor_id(lcpu);
 
 	/* At boot time the cpus are already spinning in hold
 	 * loops, so nothing to do. */
  	if (system_state < SYSTEM_RUNNING)
 		return 1;
 
-	pcpu = find_physical_cpu_to_start(get_hard_smp_processor_id(lcpu));
-	if (pcpu == -1U) {
-		printk(KERN_INFO "No more cpus available, failing\n");
-		return 0;
-	}
-
 	/* Fixup atomic count: it exited inside IRQ handler. */
 	paca[lcpu].__current->thread_info->preempt_count	= 0;
 
-	/* At boot this is done in prom.c. */
-	paca[lcpu].hw_cpu_id = pcpu;
-
 	status = rtas_call(rtas_token("start-cpu"), 3, 1, NULL,
 			   pcpu, start_here, lcpu);
 	if (status != 0) {
@@ -324,6 +268,116 @@ static struct smp_ops_t pSeries_xics_smp
 	.setup_cpu	= smp_xics_setup_cpu,
 };
 
+/*
+ * Update cpu_present_map and paca for a new cpu node.  Would like to
+ * move parts of this to generic code so that hotplug events are
+ * generated for each new cpu, but this is needed for now.
+ */
+static int pSeries_add_processor(struct device_node *node)
+{
+	unsigned int cpu;
+	cpumask_t candidate_map, tmp = CPU_MASK_NONE;
+	int err = 0, len, nthreads, i;
+	u32 *intserv;
+
+	intserv = (u32 *)get_property(node, "ibm,ppc-interrupt-server#s",
+								&len);
+	if (!intserv)
+		goto out;
+	nthreads = len / sizeof(u32);
+	for (i = 0; i < nthreads; i ++)
+		cpu_set(i, tmp);
+
+	lock_cpu_hotplug();
+
+	cpus_xor(candidate_map, cpu_possible_map, cpu_present_map);
+	err = -EINVAL;
+	if (cpus_empty(candidate_map))
+		goto out_unlock;
+
+	while (!cpus_empty(tmp))
+		if (cpus_subset(tmp, candidate_map))
+			/* Found a range where we can insert the new cpu(s) */
+			break;
+		else
+			cpus_shift_left(tmp, tmp, nthreads);
+
+	if (cpus_empty(tmp)) {
+		printk(KERN_INFO "Unable to find space in cpu_present_map for"
+		       " processor %s with %d thread(s)\n", node->name,
+		       nthreads);
+		goto out_unlock;
+	}
+
+	for_each_cpu_mask(cpu, tmp) {
+		BUG_ON(cpu_isset(cpu, cpu_present_map));
+		cpu_set(cpu, cpu_present_map);
+		set_hard_smp_processor_id(cpu, *intserv++);
+	}
+	err = 0;
+out_unlock:
+	unlock_cpu_hotplug();
+out:
+	return err;
+}
+
+/*
+ * Update present map for a cpu node which is going away, and set the
+ * "hard" id in the paca(s) to -1 to be consistent with boot time
+ * convention for non-present cpus.
+ */
+static int pSeries_remove_processor(struct device_node *node)
+{
+	unsigned int cpu;
+	int len, nthreads, i;
+	u32 *intserv = (u32 *)get_property(node, "ibm,ppc-interrupt-server#s",
+								&len);
+	if (!intserv)
+		return 0;
+
+	nthreads = len / sizeof(u32);
+
+	lock_cpu_hotplug();
+	for (i = 0; i < nthreads; i++) {
+		for_each_present_cpu(cpu) {
+			if (get_hard_smp_processor_id(cpu) == intserv[i]) {
+				BUG_ON(cpu_online(cpu));
+				cpu_clear(cpu, cpu_present_map);
+				set_hard_smp_processor_id(cpu, -1);
+				break;
+			}
+		}
+		if (cpu == NR_CPUS)
+			printk(KERN_WARNING "Could not find cpu to remove "
+			       "with physical id 0x%x\n", intserv[i]);
+	}
+	unlock_cpu_hotplug();
+	return 0;
+}
+
+static int pSeries_smp_notifier(struct notifier_block *nb, unsigned long action, void *_node)
+{
+	struct device_node *node = _node;
+	int err = NOTIFY_OK;
+
+	switch (action) {
+	case OF_RECONFIG_ADD:
+		if (pSeries_add_processor(node))
+			err = NOTIFY_BAD;
+		break;
+	case OF_RECONFIG_REMOVE:
+		if (pSeries_remove_processor(node))
+			err = NOTIFY_BAD;
+	default:
+		err = NOTIFY_DONE;
+	}
+	return err;
+}
+
+static struct notifier_block pSeries_smp_nb = {
+	.notifier_call = pSeries_smp_notifier,
+};
+
 /* This is called very early */
 void __init smp_init_pSeries(void)
 {
@@ -362,6 +416,9 @@ void __init smp_init_pSeries(void)
 		smp_ops->take_timebase = pSeries_take_timebase;
 	}
 
+	if (systemcfg->platform == PLATFORM_PSERIES_LPAR)
+		register_of_reconfig_notifier(&pSeries_smp_nb);
+
 	DBG(" <- smp_init_pSeries()\n");
 }
 
diff -puN arch/ppc64/kernel/smp.c~cpu-dlpar-notifier arch/ppc64/kernel/smp.c
--- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/smp.c~cpu-dlpar-notifier	2005-01-25 22:57:15.000000000 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/smp.c	2005-01-25 22:57:15.000000000 -0600
@@ -526,14 +526,6 @@ void __init smp_cpus_done(unsigned int m
 	smp_ops->setup_cpu(boot_cpuid);
 
 	set_cpus_allowed(current, old_mask);
-
-	/*
-	 * We know at boot the maximum number of cpus we can add to
-	 * a partition and set cpu_possible_map accordingly. cpu_present_map
-	 * needs to match for the hotplug code to allow us to hot add
-	 * any offline cpus.
-	 */
-	cpu_present_map = cpu_possible_map;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU

_


From dwmw2 at infradead.org  Thu Jan 27 05:45:40 2005
From: dwmw2 at infradead.org (David Woodhouse)
Date: Wed, 26 Jan 2005 18:45:40 +0000
Subject: Syscall auditing on ppc64 lacks correct return codes.
Message-ID: <1106765140.19262.27.camel@hades.cambridge.redhat.com>

We were pretending that every syscall returned zero. Don't do that.

===== arch/ppc64/kernel/entry.S 1.51 vs edited =====
--- 1.51/arch/ppc64/kernel/entry.S	Thu Jan 13 09:48:36 2005
+++ edited/arch/ppc64/kernel/entry.S	Thu Jan 20 16:14:50 2005
@@ -231,6 +231,7 @@
 syscall_exit_trace:
 	std	r3,GPR3(r1)
 	bl	.save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.do_syscall_trace_leave
 	REST_NVGPRS(r1)
 	ld	r3,GPR3(r1)
@@ -324,6 +325,7 @@
 	ld	r4,TI_FLAGS(r4)
 	andi.	r4,r4,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP)
 	beq+	81f
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.do_syscall_trace_leave
 81:	b	.ret_from_except
 
===== arch/ppc64/kernel/ptrace.c 1.13 vs edited =====
--- 1.13/arch/ppc64/kernel/ptrace.c	Fri Dec 17 08:09:09 2004
+++ edited/arch/ppc64/kernel/ptrace.c	Thu Jan 20 16:24:12 2005
@@ -313,10 +313,10 @@
 		do_syscall_trace();
 }
 
-void do_syscall_trace_leave(void)
+void do_syscall_trace_leave(struct pt_regs *regs)
 {
 	if (unlikely(current->audit_context))
-		audit_syscall_exit(current, 0);	/* FIXME: pass pt_regs */
+		audit_syscall_exit(current, regs->result);
 
 	if ((test_thread_flag(TIF_SYSCALL_TRACE)
 	     || test_thread_flag(TIF_SINGLESTEP))


-- 
dwmw2


From paulus at samba.org  Thu Jan 27 13:27:01 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 27 Jan 2005 13:27:01 +1100
Subject: [PATCH] show -1 for physical_id of non-present cpus
In-Reply-To: <41F7187F.9070602@austin.ibm.com>
References: <1106698921.9091.4.camel@pants.austin.ibm.com>
	<41F7187F.9070602@austin.ibm.com>
Message-ID: <16888.20853.816824.41795@cargo.ozlabs.ibm.com>

Olof Johansson writes:

> Nathan Lynch wrote:
> 
> >Make the physical_id cpu attribute on ppc64 show -1 instead of 65535
> >for non-present cpus.
> >
> 
> Good catch.
> 
> I'm not sure if I prefer your patch or just switching hw_cpu_id to a s16 
> and using %d. Either way is fine with me.

Changing hw_cpu_id to a signed quantity sounds cleaner to me.

Paul.


From dhowells at redhat.com  Fri Jan 28 01:02:42 2005
From: dhowells at redhat.com (David Howells)
Date: Thu, 27 Jan 2005 14:02:42 +0000
Subject: [PATCH] Fix kallsyms/insmod/rmmod race 
In-Reply-To: <1561.1106077468@redhat.com> 
References: <1561.1106077468@redhat.com>
	<1106014803.30801.22.camel@localhost.localdomain>
	<31453.1105979239@redhat.com> 
Message-ID: <3244.1106834562@redhat.com>

David Howells <dhowells at redhat.com> wrote:

> Rusty Russell <rusty at rustcorp.com.au> wrote:
> 
> > 	The more I looked at this, the more I warmed to it.  I've known for a
> > while that people are using kallsyms not for OOPS (eg. /proc/$$/wchan),
> > so we should provide a "grabs locks" version, but this solution gets
> > around that nicely, while making life more certain for the oops case,
> > too.
> 
> Hmmm... though it works on i386 SMP, it doesn't, however, seem to work on
> ppc64 SMP:-/
> 
> My pSeries box seems to think that it can't find any symbols from previously
> loaded modules, and my Power5 box is quite happy to load modules that depend
> on other modules but panics because it can't mount its root fs.

It turns out that the patch works; userspace was misbehaving. The stripped-down
shell running as init (pid 1) wasn't taking into account that its wait() calls
would also return for exiting kernel threads, and so it ended up trying to load
several modules at once, some of which required their dependency modules to be
loaded first.

David


From dhowells at redhat.com  Fri Jan 28 01:08:07 2005
From: dhowells at redhat.com (David Howells)
Date: Thu, 27 Jan 2005 14:08:07 +0000
Subject: [PATCH] Fix kallsyms/insmod/rmmod race [try #2]
In-Reply-To: <31453.1105979239@redhat.com> 
References: <31453.1105979239@redhat.com> 
Message-ID: <3880.1106834887@redhat.com>


The attached patch fixes a race between kallsyms and insmod/rmmod.

The problem is this:

 (1) The various kallsyms functions poke around in the module list without any
     locking so that they can be called from the oops handler.

 (2) Although insmod and rmmod use locks to exclude each other, these have no
     effect on the kallsyms function.

 (3) Although rmmod modifies the module state with the machine "stopped", it
     hasn't removed the metadata from the module metadata list, meaning that
     as soon as the machine is "restarted", the metadata can be observed by
     kallsyms.

     It's not possible to say that an item in that list should be ignored if
     its state is marked as inactive - you can't get at the state information
     because you can't trust the metadata in which it is embedded.

     Furthermore, list linkage information is embedded in the metadata too, so
     you can't trust that either...

 (4) kallsyms may be walking the module list without a lock whilst either
     insmod or rmmod is busy changing it. insmod probably isn't a problem
     since nothing is going away, but rmmod is, as it's deleting an entry.

 (5) Therefore nothing that uses these functions can in any way trust any
     pointers to "static" data (such as module symbol names or module names)
     that are returned.

 (6) On ppc64 the problems are exacerbated since the hypervisor may reschedule
     bits of the kernel, making operations that appear adjacent occur a long
     time apart.

This patch fixes the race by only linking/unlinking modules into/from the
master module list with the machine in the "stopped" state. This means that
any "static" information can be trusted as far as the next kernel reschedule
on any given CPU without the need to hold any locks.

However, I'm not sure how this is affected by preemption. I suspect more work
may need to be done in that case, but I'm not entirely sure.

This also means that rmmod has to bump the machine into the stopped state
twice... but since that shouldn't be a common operation, I don't think that's
a problem.

I've amended this patch to not get spinlocks whilst in the machine locked
state - there's no point as nothing else can be holding spinlocks.

Signed-Off-By: David Howells <dhowells at redhat.com>
---
warthog>diffstat kallsyms-race-2611rc1.diff
 kallsyms.c |   16 ++++++++++++++--
 module.c   |   31 ++++++++++++++++++++++++-------
 2 files changed, 38 insertions(+), 9 deletions(-)

diff -uNrp linux-2.6.11-rc1/kernel/kallsyms.c linux-2.6.11-rc1-kallsyms/kernel/kallsyms.c
--- linux-2.6.11-rc1/kernel/kallsyms.c	2005-01-12 19:09:18.000000000 +0000
+++ linux-2.6.11-rc1-kallsyms/kernel/kallsyms.c	2005-01-17 15:33:55.000000000 +0000
@@ -139,13 +139,20 @@ unsigned long kallsyms_lookup_name(const
 	return module_kallsyms_lookup_name(name);
 }
 
-/* Lookup an address.  modname is set to NULL if it's in the kernel. */
+/*
+ * Lookup an address
+ * - modname is set to NULL if it's in the kernel
+ * - we guarantee that the returned name is valid until we reschedule even if
+ *   it resides in a module
+ * - we also guarantee that modname will be valid until rescheduled
+ */
 const char *kallsyms_lookup(unsigned long addr,
 			    unsigned long *symbolsize,
 			    unsigned long *offset,
 			    char **modname, char *namebuf)
 {
 	unsigned long i, low, high, mid;
+	const char *msym;
 
 	/* This kernel should never had been booted. */
 	BUG_ON(!kallsyms_addresses);
@@ -196,7 +203,12 @@ const char *kallsyms_lookup(unsigned lon
 		return namebuf;
 	}
 
-	return module_address_lookup(addr, symbolsize, offset, modname);
+	/* see if it's in a module */
+	msym = module_address_lookup(addr, symbolsize, offset, modname);
+	if (msym)
+		return strncpy(namebuf, msym, KSYM_NAME_LEN);
+
+	return NULL;
 }
 
 /* Replace "%s" in format with address, or returns -errno. */
diff -uNrp linux-2.6.11-rc1/kernel/module.c linux-2.6.11-rc1-kallsyms/kernel/module.c
--- linux-2.6.11-rc1/kernel/module.c	2005-01-12 19:09:18.000000000 +0000
+++ linux-2.6.11-rc1-kallsyms/kernel/module.c	2005-01-27 14:06:22.857054758 +0000
@@ -1072,14 +1072,22 @@ static void mod_kobject_remove(struct mo
 	kobject_unregister(&mod->mkobj.kobj);
 }
 
+/*
+ * unlink the module while the whole machine is stopped with interrupts off
+ * - this defends against kallsyms not taking locks
+ */
+static inline int __unlink_module(void *_mod)
+{
+	struct module *mod = _mod;
+	list_del(&mod->list);
+	return 0;
+}
+
 /* Free a module, remove from lists, etc (must hold module mutex). */
 static void free_module(struct module *mod)
 {
 	/* Delete from various lists */
-	spin_lock_irq(&modlist_lock);
-	list_del(&mod->list);
-	spin_unlock_irq(&modlist_lock);
-
+	stop_machine_run(__unlink_module, mod, NR_CPUS);
 	remove_sect_attrs(mod);
 	mod_kobject_remove(mod);
 
@@ -1732,6 +1740,17 @@ static struct module *load_module(void _
 	goto free_hdr;
 }
 
+/*
+ * link the module while the whole machine is stopped with interrupts off
+ * - this defends against kallsyms not taking locks
+ */
+static inline int __link_module(void *_mod)
+{
+	struct module *mod = _mod;
+	list_add(&mod->list, &modules);
+	return 0;
+}
+
 /* This is where the real work happens */
 asmlinkage long
 sys_init_module(void __user *umod,
@@ -1766,9 +1785,7 @@ sys_init_module(void __user *umod,
 
 	/* Now sew it into the lists.  They won't access us, since
            strong_try_module_get() will fail. */
-	spin_lock_irq(&modlist_lock);
-	list_add(&mod->list, &modules);
-	spin_unlock_irq(&modlist_lock);
+	stop_machine_run(__link_module, mod, NR_CPUS);
 
 	/* Drop lock so they can recurse */
 	up(&module_mutex);


From moilanen at austin.ibm.com  Fri Jan 28 03:24:04 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Thu, 27 Jan 2005 10:24:04 -0600
Subject: [PATCH] iSeries buildbreak fix
Message-ID: <20050127102404.07b57cd4.moilanen@austin.ibm.com>

Looks like a build break on iSeries after the xmon-dabr patch:

	arch/ppc64/xmon/xmon.c:632: undefined reference to `.plpar_hcall_norets'

Since iSeries cannot use xmon, a simple fix is to turn it off.

Jake

Signed-off-by: Jake Moilanen <moilanen at austin.ibm.com>

---


diff -puN arch/ppc64/Kconfig.debug~xmon-off-iSeries arch/ppc64/Kconfig.debug
--- linux-2.6-bk/arch/ppc64/Kconfig.debug~xmon-off-iSeries	Thu Jan 27 10:15:00 2005
+++ linux-2.6-bk-moilanen/arch/ppc64/Kconfig.debug	Thu Jan 27 10:16:23 2005
@@ -34,7 +34,7 @@ config DEBUGGER
 
 config XMON
 	bool "Include xmon kernel debugger"
-	depends on DEBUGGER
+	depends on DEBUGGER && !PPC_ISERIES
 	help
 	  Include in-kernel hooks for the xmon kernel monitor/debugger.
 	  Unless you are intending to debug the kernel, say N here.

_


From nathanl at austin.ibm.com  Fri Jan 28 08:26:01 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Thu, 27 Jan 2005 15:26:01 -0600
Subject: [PATCH] show -1 for physical_id of non-present cpus
In-Reply-To: <16888.20853.816824.41795@cargo.ozlabs.ibm.com>
References: <1106698921.9091.4.camel@pants.austin.ibm.com>
	<41F7187F.9070602@austin.ibm.com>
	<16888.20853.816824.41795@cargo.ozlabs.ibm.com>
Message-ID: <1106861161.8962.7.camel@pants.austin.ibm.com>

On Thu, 2005-01-27 at 13:27 +1100, Paul Mackerras wrote:
> Olof Johansson writes:
> 
> > Nathan Lynch wrote:
> > 
> > >Make the physical_id cpu attribute on ppc64 show -1 instead of 65535
> > >for non-present cpus.
> > >
> > 
> > Good catch.
> > 
> > I'm not sure if I prefer your patch or just switching hw_cpu_id to a s16 
> > and using %d. Either way is fine with me.
> 
> Changing hw_cpu_id to a signed quantity sounds cleaner to me.

OK.

Make the physical_id cpu sysfs attribute on ppc64 show -1 instead of
65535 for non-present cpus.

Signed-off-by: Nathan Lynch <nathanl at austin.ibm.com>

---


diff -puN arch/ppc64/kernel/sysfs.c~make-cpu-physical_id-signed arch/ppc64/kernel/sysfs.c
--- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/sysfs.c~make-cpu-physical_id-signed	2005-01-27 15:03:16.000000000 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/sysfs.c	2005-01-27 15:05:12.000000000 -0600
@@ -387,7 +387,7 @@ static ssize_t show_physical_id(struct s
 {
 	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
 
-	return sprintf(buf, "%u\n", get_hard_smp_processor_id(cpu->sysdev.id));
+	return sprintf(buf, "%d\n", get_hard_smp_processor_id(cpu->sysdev.id));
 }
 static SYSDEV_ATTR(physical_id, 0444, show_physical_id, NULL);
 
diff -puN include/asm-ppc64/paca.h~make-cpu-physical_id-signed include/asm-ppc64/paca.h
--- linux-2.6.11-rc2-mm1/include/asm-ppc64/paca.h~make-cpu-physical_id-signed	2005-01-27 15:04:14.000000000 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/include/asm-ppc64/paca.h	2005-01-27 15:04:51.000000000 -0600
@@ -68,7 +68,7 @@ struct paca_struct {
 	u64 stab_real;			/* Absolute address of segment table */
 	u64 stab_addr;			/* Virtual address of segment table */
 	void *emergency_sp;		/* pointer to emergency stack */
-	u16 hw_cpu_id;			/* Physical processor number */
+	s16 hw_cpu_id;			/* Physical processor number */
 	u8 cpu_start;			/* At startup, processor spins until */
 					/* this becomes non-zero. */
 

_


From nathanl at austin.ibm.com  Fri Jan 28 09:23:45 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Thu, 27 Jan 2005 16:23:45 -0600
Subject: [PATCH] use _smp_processor_id() in idle loops
Message-ID: <1106864625.8962.11.camel@pants.austin.ibm.com>


With 2.6.11-rc2-mm1 and 2.6-bk kernels with CONFIG_DEBUG_PREEMPT I'm
seeing lots of smp_processor_id warnings from the idle loops:

BUG: using smp_processor_id() in preemptible [00000001] code:
swapper/0
caller is .dedicated_idle+0x64/0x228
Call Trace:
[c0000000004a3c50] [ffffffffffffffff] 0xffffffffffffffff (unreliable)
[c0000000004a3cd0] [c0000000001d179c] .smp_processor_id+0x154/0x168
[c0000000004a3d90] [c00000000000f990] .dedicated_idle+0x64/0x228
[c0000000004a3e80] [c00000000000fce0] .cpu_idle+0x34/0x4c
[c0000000004a3f00] [c00000000003a908] .start_secondary+0x10c/0x150
[c0000000004a3f90] [c00000000000bd28] .enable_64b_mode+0x0/0x28

This patch replaces smp_processor_id() with _smp_processor_id() in the
idle loop code, since we know the idle thread can't jump to a
different cpu.

Signed-off-by: Nathan Lynch <nathanl at austin.ibm.com>


---


diff -puN arch/ppc64/kernel/idle.c~kill-idle-loop-smp_processor_id-warnings arch/ppc64/kernel/idle.c
--- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/idle.c~kill-idle-loop-smp_processor_id-warnings	2005-01-27 16:14:31.000000000 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/idle.c	2005-01-27 16:14:31.000000000 -0600
@@ -122,7 +122,7 @@ static int iSeries_idle(void)
 static int default_idle(void)
 {
 	long oldval;
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu = _smp_processor_id();
 
 	while (1) {
 		oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
@@ -164,7 +164,7 @@ int dedicated_idle(void)
 	struct paca_struct *lpaca = get_paca(), *ppaca;
 	unsigned long start_snooze;
 	unsigned long *smt_snooze_delay = &__get_cpu_var(smt_snooze_delay);
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu = _smp_processor_id();
 
 	ppaca = &paca[cpu ^ 1];
 
@@ -244,7 +244,7 @@ int dedicated_idle(void)
 static int shared_idle(void)
 {
 	struct paca_struct *lpaca = get_paca();
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu = _smp_processor_id();
 
 	while (1) {
 		/*
@@ -275,8 +275,7 @@ static int shared_idle(void)
 		HMT_medium();
 		lpaca->lppaca.idle = 0;
 		schedule();
-		if (cpu_is_offline(smp_processor_id()) &&
-		    system_state == SYSTEM_RUNNING)
+		if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING)
 			cpu_die();
 	}
 

_


From nathanl at austin.ibm.com  Fri Jan 28 10:07:54 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Thu, 27 Jan 2005 17:07:54 -0600
Subject: [RFC/PATCH 2/2] handle cpu device node addition and removal
In-Reply-To: <1106716265.9855.26.camel@localhost.localdomain>
References: <1106715991.9855.22.camel@localhost.localdomain>
	<1106716265.9855.26.camel@localhost.localdomain>
Message-ID: <1106867274.8962.14.camel@pants.austin.ibm.com>

On Tue, 2005-01-25 at 23:11 -0600, Nathan Lynch wrote:
> Using the notifier chain in a previous patch, handle addition and
> removal of processors on pSeries LPAR.  The new notifier call updates
> cpu_present_map and sets hw_cpu_id in the paca appropriately.  Note
> that we must handle more than one cpu being added or going away to
> account for SMT processors.
> 
> This allows us to stop abusing cpu_present_map, and lets us get rid of
> find_physical_cpu_to_start, which has always been a bit dodgy.
> 
> The code which updates cpu_present_map I plan to move to the generic
> hotplug cpu code someday, but I think this is a good intermediate
> step for now.
> 
> Tested on Power5.

Hmm, just noticed that this does not allow us to online secondary
threads when booting with smt-enabled=off.  Will need to respin this
one.

Nathan


From rusty at rustcorp.com.au  Fri Jan 28 11:42:02 2005
From: rusty at rustcorp.com.au (Rusty Russell)
Date: Fri, 28 Jan 2005 11:42:02 +1100
Subject: [PATCH] Fix kallsyms/insmod/rmmod race [try #2]
In-Reply-To: <3880.1106834887@redhat.com>
References: <31453.1105979239@redhat.com>   <3880.1106834887@redhat.com>
Message-ID: <1106872922.18360.9.camel@localhost.localdomain>

On Thu, 2005-01-27 at 14:08 +0000, David Howells wrote:
> Signed-Off-By: David Howells <dhowells at redhat.com>

Excellent.  Thanks David!

Rusty.
-- 

A bad analogy is like a leaky screwdriver -- Richard Braakman


From brking at us.ibm.com  Sat Jan 29 01:56:17 2005
From: brking at us.ibm.com (brking at us.ibm.com)
Date: Fri, 28 Jan 2005 08:56:17 -0600
Subject: [PATCH 1/2] pci: Arch hook to determine config space size
Message-ID: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com>


When working with a PCI-X Mode 2 adapter on a PCI-X Mode 1 PPC64
system, the current code used to determine the config space size
of a device results in a PCI Master abort and an EEH error, resulting
in the device being taken offline. This patch adds the ability for
arch specific code to override part of the config space size
determination to fix this.

Signed-off-by: Brian King <brking at us.ibm.com>
---

 linux-2.6.11-rc2-bk5-bjking1/drivers/pci/probe.c |    4 ++++
 1 files changed, 4 insertions(+)

diff -puN drivers/pci/probe.c~pci_arch_cfg_space_size drivers/pci/probe.c
--- linux-2.6.11-rc2-bk5/drivers/pci/probe.c~pci_arch_cfg_space_size	2005-01-27 16:56:46.000000000 -0600
+++ linux-2.6.11-rc2-bk5-bjking1/drivers/pci/probe.c	2005-01-27 16:56:46.000000000 -0600
@@ -627,6 +627,8 @@ static void pci_release_dev(struct devic
 	kfree(pci_dev);
 }
 
+int __attribute__ ((weak)) pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
+
 /**
  * pci_cfg_space_size - get the configuration space size of the PCI device.
  *
@@ -653,6 +655,8 @@ static int pci_cfg_space_size(struct pci
 			goto fail;
 	}
 
+	if (!pcibios_exp_cfg_space(dev))
+		goto fail;
 	if (pci_read_config_dword(dev, 256, &status) != PCIBIOS_SUCCESSFUL)
 		goto fail;
 	if (status == 0xffffffff)
_


From brking at us.ibm.com  Sat Jan 29 01:56:24 2005
From: brking at us.ibm.com (brking at us.ibm.com)
Date: Fri, 28 Jan 2005 08:56:24 -0600
Subject: [PATCH 2/2] ppc64: Arch hook to determine config space size
Message-ID: <200501281456.j0SEuPRF017696@d01av04.pok.ibm.com>


When working with a PCI-X Mode 2 adapter on a PCI-X Mode 1 PPC64
system, the current code used to determine the config space size
of a device results in a PCI Master abort and an EEH error, resulting
in the device being taken offline. This patch adds a ppc64
override to query OF to determine if the system and PHB support
PCI-X mode 2.

Signed-off-by: Brian King <brking at us.ibm.com>
---

 linux-2.6.11-rc2-bk5-bjking1/arch/ppc64/kernel/pSeries_pci.c |   18 +++++++++++
 1 files changed, 18 insertions(+)

diff -puN arch/ppc64/kernel/pSeries_pci.c~ppc64_arch_cfg_space_size arch/ppc64/kernel/pSeries_pci.c
--- linux-2.6.11-rc2-bk5/arch/ppc64/kernel/pSeries_pci.c~ppc64_arch_cfg_space_size	2005-01-27 16:57:03.000000000 -0600
+++ linux-2.6.11-rc2-bk5-bjking1/arch/ppc64/kernel/pSeries_pci.c	2005-01-27 16:57:48.000000000 -0600
@@ -583,3 +583,21 @@ static void fixup_winbond_82c105(struct 
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105,
 			 fixup_winbond_82c105);
+
+int pcibios_exp_cfg_space(struct pci_dev *dev)
+{
+	int *type;
+	struct device_node *dn;
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+	if (!hose)
+		return 0;
+
+	dn = (struct device_node *) hose->arch_data;
+	type = (int *)get_property(dn, "ibm,pci-config-space-type", NULL);
+
+	if (type && *type == 1)
+		return 1;
+
+	return 0;
+}
_


From hch at infradead.org  Sat Jan 29 05:52:34 2005
From: hch at infradead.org (Christoph Hellwig)
Date: Fri, 28 Jan 2005 18:52:34 +0000
Subject: [PATCH 1/2] pci: Arch hook to determine config space size
In-Reply-To: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com>
Message-ID: <20050128185234.GB21760@infradead.org>

> +int __attribute__ ((weak)) pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }

 - prototypes belong to headers
 - weak linkage is the perfect way for total obfuscation

please make this a regular arch hook
> Please read the FAQ at  http://www.tux.org/lkml/
---end quoted text---


From olof at austin.ibm.com  Sat Jan 29 07:09:01 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Fri, 28 Jan 2005 14:09:01 -0600
Subject: [PATCH] PPC64: p615 IOMMU fix
Message-ID: <20050128200901.GA8615@austin.ibm.com>

Hi,

pSeries p615 happens to have a bus hierarchy where the IDE controller for
the built-in CD is connected directly to the PHB without an intermediate
EADS bridge. The new iommu/bus setup code assumed that all systems
with EADS will have all devices under them, so this resulted in the IDE
controller not having an iommu table allocated.

To avoid this, always allocate a small table at the PHB level. It will
never be used for regular devices, and it's allocated out of the 256MB
that we previously skipped.

Signed-off-by: Olof Johansson <olof at austin.ibm.com>


---

 linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c |   34 ++++++++++++++++-------
 1 files changed, 25 insertions(+), 9 deletions(-)

diff -puN arch/ppc64/kernel/pSeries_iommu.c~p615-iommu arch/ppc64/kernel/pSeries_iommu.c
--- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~p615-iommu	2005-01-28 14:04:48.971761000 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c	2005-01-28 14:04:48.984759024 -0600
@@ -309,6 +309,7 @@ static void iommu_table_setparms_lpar(st
 static void iommu_bus_setup_pSeries(struct pci_bus *bus)
 {
 	struct device_node *dn, *pdn;
+	struct iommu_table *tbl;
 
 	DBG("iommu_bus_setup_pSeries, bus %p, bus->self %p\n", bus, bus->self);
 
@@ -326,7 +327,6 @@ static void iommu_bus_setup_pSeries(stru
 	if (!bus->self) {
 		/* Root bus */
 		if (is_python(dn)) {
-			struct iommu_table *tbl;
 			unsigned int *iohole;
 
 			DBG("Python root bus %s\n", bus->name);
@@ -352,19 +352,35 @@ static void iommu_bus_setup_pSeries(stru
 			iommu_table_setparms(dn->phb, dn, tbl);
 			dn->iommu_table = iommu_init_table(tbl);
 		} else {
-			/* 256 MB window by default */
-			dn->phb->dma_window_size = 1 << 28;
-			/* always skip the first 256MB */
-			dn->phb->dma_window_base_cur = 1 << 28;
+			/* Do a 128MB table at root. This is used for the IDE
+			 * controller on some SMP-mode POWER4 machines. It
+			 * doesn't hurt to allocate it on other machines
+			 * -- it'll just be unused since new tables are
+			 * allocated on the EADS level.
+			 *
+			 * Allocate at offset 128MB to avoid having to deal
+			 * with ISA holes; 128MB table for IDE is plenty.
+			 */
+			dn->phb->dma_window_size = 1 << 27;
+			dn->phb->dma_window_base_cur = 1 << 27;
+
+			tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL);
 
-			/* No table at PHB level for non-python PHBs */
+			iommu_table_setparms(dn->phb, dn, tbl);
+			dn->iommu_table = iommu_init_table(tbl);
+
+			/* All child buses have 256MB tables */
+			dn->phb->dma_window_size = 1 << 28;
 		}
 	} else {
 		pdn = pci_bus_to_OF_node(bus->parent);
 
-		if (!pdn->iommu_table) {
+		if (!bus->parent->self && !is_python(pdn)) {
 			struct iommu_table *tbl;
-			/* First child, allocate new table (256MB window) */
+			/* First child and not python means this is the EADS
+			 * level. Allocate new table for this slot with 256MB
+			 * window.
+			 */
 
 			tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL);
 
@@ -372,7 +388,7 @@ static void iommu_bus_setup_pSeries(stru
 
 			dn->iommu_table = iommu_init_table(tbl);
 		} else {
-			/* Lower than first child or under python, copy parent table */
+			/* Lower than first child or under python, use parent table */
 			dn->iommu_table = pdn->iommu_table;
 		}
 	}

_


From mvolaski at aecom.yu.edu  Sat Jan 29 12:37:21 2005
From: mvolaski at aecom.yu.edu (Maurice Volaski)
Date: Fri, 28 Jan 2005 20:37:21 -0500
Subject: CONFIG_THERM_PM72 is missing from .config from recent kernels
 (2.6.10, 2.6.11)
Message-ID: <a06200729be208e66f54d@[129.98.90.227]>

CONFIG_THERM_PM72 is required for thermal management on at least some
Macs, most notably the PowerMac G5. Without it, the computer runs its
fans at full speed and is very loud.

It's missing from .config in at least a few releases of recent 
kernels (2.6.10, 2.6.11).

Does anyone know why?
-- 

Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


From mvolaski at aecom.yu.edu  Sat Jan 29 12:26:08 2005
From: mvolaski at aecom.yu.edu (Maurice Volaski)
Date: Fri, 28 Jan 2005 20:26:08 -0500
Subject: Recent kernels may freeze dual 2.5 GHz PowerMac G5s
Message-ID: <a06200732be2096a1e32e@[129.98.90.227]>

Posted here FYI...

>The patch below works. Thanks.
>
>>Maurice Volaski writes:
>>  > I am running Gentoo with a fresh 2.6.11-r1. I have all the kernel
>>  > debugging options turned on. Occasionally, I can get past the boot
>>  > process, but half the time it freezes somewhere along the way. If
>>  > not, I do get to boot, it doesn't take very long for it to freeze.
>>
>>Did 2.6.10 work Ok? Try the patch below, it fixes 2.6.11-rc1 boot
>>lockups on both my Beige G3 (locks up in ADB driver) and my G4 eMac
>>(locks up in radeonfb).
>>
>>--- linux-2.6.11-rc1/init/main.c.~1~	2005-01-15 03:30:25.000000000 +0100
>>+++ linux-2.6.11-rc1/init/main.c	2005-01-15 03:31:44.000000000 +0100
>>@@ -377,7 +377,7 @@ static void noinline rest_init(void)
>>  	 * Re-enable preemption but disable interrupts to make sure
>>  	 * we dont get preempted until we schedule() in cpu_idle().
>>  	 */
>>-	local_irq_disable();
>>+//	local_irq_disable();
>>  	preempt_enable_no_resched();
>>  	unlock_kernel();
>>  	cpu_idle();


--

Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


From greg at kroah.com  Sat Jan 29 15:06:47 2005
From: greg at kroah.com (Greg KH)
Date: Fri, 28 Jan 2005 20:06:47 -0800
Subject: [PATCH 1/2] pci: Arch hook to determine config space size
In-Reply-To: <20050128185234.GB21760@infradead.org>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com>
	<20050128185234.GB21760@infradead.org>
Message-ID: <20050129040647.GA6261@kroah.com>

On Fri, Jan 28, 2005 at 06:52:34PM +0000, Christoph Hellwig wrote:
> > +int __attribute__ ((weak)) pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> 
>  - prototypes belong to headers
>  - weak linkage is the perfect way for total obfuscation
> 
> please make this a regular arch hook

I agree.  Also, when sending PCI related patches, please cc the
linux-pci mailing list.

thanks,

greg k-h


From nathanl at austin.ibm.com  Sun Jan 30 09:24:19 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Sat, 29 Jan 2005 16:24:19 -0600
Subject: [PATCH] use _smp_processor_id() in idle loops
In-Reply-To: <1106864625.8962.11.camel@pants.austin.ibm.com>
References: <1106864625.8962.11.camel@pants.austin.ibm.com>
Message-ID: <1107037459.31457.4.camel@biclops>

On Thu, 2005-01-27 at 16:23 -0600, Nathan Lynch wrote:
> With 2.6.11-rc2-mm1 and 2.6-bk kernels with CONFIG_DEBUG_PREEMPT I'm
> seeing lots of smp_processor_id warnings from the idle loops:
> 
> BUG: using smp_processor_id() in preemptible [00000001] code:
> swapper/0
> caller is .dedicated_idle+0x64/0x228
> Call Trace:
> [c0000000004a3c50] [ffffffffffffffff] 0xffffffffffffffff (unreliable)
> [c0000000004a3cd0] [c0000000001d179c] .smp_processor_id+0x154/0x168
> [c0000000004a3d90] [c00000000000f990] .dedicated_idle+0x64/0x228
> [c0000000004a3e80] [c00000000000fce0] .cpu_idle+0x34/0x4c
> [c0000000004a3f00] [c00000000003a908] .start_secondary+0x10c/0x150
> [c0000000004a3f90] [c00000000000bd28] .enable_64b_mode+0x0/0x28

This appears to be fixed in 2.6.11-rc2-mm2, so I guess my patch isn't
necessary now.

Nathan


From anton at samba.org  Sun Jan 30 09:49:03 2005
From: anton at samba.org (Anton Blanchard)
Date: Sun, 30 Jan 2005 09:49:03 +1100
Subject: [PATCH] use _smp_processor_id() in idle loops
In-Reply-To: <1107037459.31457.4.camel@biclops>
References: <1106864625.8962.11.camel@pants.austin.ibm.com>
	<1107037459.31457.4.camel@biclops>
Message-ID: <20050129224903.GD8654@krispykreme.ozlabs.ibm.com>

 
> This appears to be fixed in 2.6.11-rc2-mm2, so I guess my patch isn't
> necessary now.

FYI I saw some warnings in kprobes when it was called out of a
pagefault. From memory it was kprobe_running().

Anton


From mvolaski at aecom.yu.edu  Sun Jan 30 10:41:19 2005
From: mvolaski at aecom.yu.edu (Maurice Volaski)
Date: Sat, 29 Jan 2005 18:41:19 -0500
Subject: [gentoo-ppc-dev] CONFIG_THERM_PM72 is missing from .config
 from recent kernels (2.6.10, 2.6.11)
In-Reply-To: <20050129103057.GA27803@hansmi.ch>
References: <a06200736be209a45bd65@[129.98.90.227]>
	<20050129103057.GA27803@hansmi.ch>
Message-ID: <a0620073cbe21c8047634@[129.98.90.227]>

>Hello Maurice
>
>>  It's missing from .config in at least a few releases of recent
>>  kernels (2.6.10, 2.6.11).
>
>Definitely not true, at least for ppc32.

Note that..

1) I looked only at official kernel source code
and
2) I looked only at a few releases, not every patchset.
and
3) I looked only at the resulting .config file after preparing it 
with make menuconfig.

>Linux g5 2.6.10-gentoo-r6-g5 #6 SMP Wed Jan 26 23:05:05 CET 2005 ppc
>PPC970, altivec supported PowerMac7,2 GNU/Linux

From what I can tell, the .config file is built up from different 
files. I just looked at gentoo-dev-sources for this version and it 
is, in fact, present for ppc64 in
/usr/src/linux-2.6.10-gentoo-r6/arch/ppc64/defconfig

That suggests the mechanism that generates the .config files is not 
working right under certain circumstances related to the 64bit G5.
-- 

Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


From benh at kernel.crashing.org  Mon Jan 31 10:21:13 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 31 Jan 2005 10:21:13 +1100
Subject: [gentoo-ppc-dev] CONFIG_THERM_PM72 is missing from .config
	from recent kernels (2.6.10, 2.6.11)
In-Reply-To: <a0620073cbe21c8047634@[129.98.90.227]>
References: <a06200736be209a45bd65@[129.98.90.227]>
	<20050129103057.GA27803@hansmi.ch>
	<a0620073cbe21c8047634@[129.98.90.227]>
Message-ID: <1107127273.5713.13.camel@gaston>

On Sat, 2005-01-29 at 18:41 -0500, Maurice Volaski wrote:

> From what I can tell, the .config file is built up from different 
> files. I just looked at gentoo-dev-sources for this version and it 
> is, in fact, present for ppc64 in
> /usr/src/linux-2.6.10-gentoo-r6/arch/ppc64/defconfig
> 
> That suggests the mechanism that generates the .config files is not 
> working right under certain circumstances related to the 64bit G5.

The default config for G5s is arch/ppc64/configs/g5_defconfig, there is
only one for 64 bits. 32 bits on G5s is unsupported (and will probably
not work with more recent machines).

Ben.


From benh at kernel.crashing.org  Mon Jan 31 16:41:13 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 31 Jan 2005 16:41:13 +1100
Subject: [PATCH] ppc64: Move systemcfg out of head.S
Message-ID: <1107150074.5713.55.camel@gaston>

Hi !

The "systemcfg" data structure in the ppc64 kernel used to be defined at a
hard-coded page number in the kernel image. This is not necessary
(at least not any more) and could be a problem for future developments. This
patch removes that constraint, which also simplifies various bits of assembly
in head.S that were dealing with it.
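The patch below drops the `reserved0[3984]` padding from `struct systemcfg` and instead keeps the structure in a page-aligned union with a page-sized byte array. As a rough user-space illustration of that trick (C11, assuming 4 KiB pages; `SKETCH_PAGE_SIZE` and `struct systemcfg_sketch` are hypothetical stand-ins, and `_Alignas` takes the place of the kernel's `__page_aligned`):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: in the kernel, PAGE_SIZE comes from
 * <asm/page.h> and struct systemcfg from <asm/systemcfg.h>. */
#define SKETCH_PAGE_SIZE 4096

struct systemcfg_sketch {
	uint64_t tb_orig_stamp;		/* timebase at last stamp */
	uint64_t stamp_xsec;		/* time at tb_orig_stamp */
	uint32_t tb_update_count;	/* update counter */
};

/* The page[] member pads the union to exactly one page, and the
 * alignment specifier puts it on a page boundary, so the structure no
 * longer needs explicit padding or a fixed slot in the kernel image. */
static _Alignas(SKETCH_PAGE_SIZE) union {
	struct systemcfg_sketch data;
	uint8_t page[SKETCH_PAGE_SIZE];
} systemcfg_store;

struct systemcfg_sketch *systemcfg = &systemcfg_store.data;

_Static_assert(sizeof(systemcfg_store) == SKETCH_PAGE_SIZE,
	       "systemcfg storage must be exactly one page");
```

Because the whole page belongs to the union, it can later be handed out as a single page (as the vDSO patch does for its last page) without anything else sharing it.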

This is the first step of a deeper cleanup of systemcfg definition and usage (and
ultimately its removal in its current incarnation).

Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>

Index: linux-work/arch/ppc64/kernel/head.S
===================================================================
--- linux-work.orig/arch/ppc64/kernel/head.S	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/head.S	2005-01-31 16:19:44.000000000 +1100
@@ -517,16 +517,7 @@
 	.globl naca
 naca:
 	.llong itVpdAreas
-#endif
-
-	. = SYSTEMCFG_PHYS_ADDR
-	.globl __start_systemcfg
-__start_systemcfg:
-	. = (SYSTEMCFG_PHYS_ADDR + PAGE_SIZE)
-	.globl __end_systemcfg
-__end_systemcfg:
 
-#ifdef CONFIG_PPC_ISERIES
 	/*
 	 * The iSeries LPAR map is at this fixed address
 	 * so that the HvReleaseData structure can address
@@ -536,6 +527,8 @@
 	 * VSID generation algorithm.  See include/asm/mmu_context.h.
 	 */
 
+	. = 0x4800
+
 	.llong	2		/* # ESIDs to be mapped by hypervisor	 */
 	.llong	1		/* # memory ranges to be mapped by hypervisor */
 	.llong	STAB0_PAGE	/* Page # of segment table within load area	*/
@@ -1264,10 +1257,6 @@
 	addi	r2,r2,0x4000
 	addi	r2,r2,0x4000
 
-	LOADADDR(r9,systemcfg)
-	SET_REG_TO_CONST(r4, SYSTEMCFG_VIRT_ADDR)
-	std	r4,0(r9)		/* set the systemcfg pointer */
-
 	bl	.iSeries_early_setup
 
 	/* relocation is on at this point */
@@ -1772,7 +1761,7 @@
 	sc				/* HvCall_setASR */
 #else
 	/* set the ASR */
-	li	r3,SYSTEMCFG_PHYS_ADDR	/* r3 = ptr to systemcfg	 */
+	ld	r3,systemcfg@got(r2)	/* r3 = ptr to systemcfg	 */
 	lwz	r3,PLATFORM(r3)		/* r3 = platform flags		 */
 	cmpldi 	r3,PLATFORM_PSERIES_LPAR
 	bne	98f
@@ -1861,12 +1850,6 @@
 	ori	r6,r6,MSR_RI
 	mtmsrd	r6			/* RI on */
 
-	/* setup the systemcfg pointer which is needed by *tab_initialize	*/
-	LOADADDR(r6,systemcfg)
-	sub	r6,r6,r26		/* addr of the variable systemcfg */
-	li	r27,SYSTEMCFG_PHYS_ADDR
-	std	r27,0(r6)	 	/* set the value of systemcfg	*/
-
 #ifdef CONFIG_HMT
 	/* Start up the second thread on cpu 0 */
 	mfspr	r3,PVR
@@ -1941,7 +1924,7 @@
 	/* set the ASR */
 	ld	r3,PACASTABREAL(r13)
 	ori	r4,r3,1			/* turn on valid bit		 */
-	li	r3,SYSTEMCFG_PHYS_ADDR	/* r3 = ptr to systemcfg */
+	ld	r3,systemcfg@got(r2)	/* r3 = ptr to systemcfg */
 	lwz	r3,PLATFORM(r3)		/* r3 = platform flags */
 	cmpldi 	r3,PLATFORM_PSERIES_LPAR
 	bne	98f
@@ -1960,7 +1943,7 @@
 	mtasr	r4			/* set the stab location	*/
 99:
 	/* Set SDR1 (hash table pointer) */
-	li	r3,SYSTEMCFG_PHYS_ADDR	/* r3 = ptr to systemcfg */
+	ld	r3,systemcfg@got(r2)	/* r3 = ptr to systemcfg */
 	lwz	r3,PLATFORM(r3)		/* r3 = platform flags */
 	/* Test if bit 0 is set (LPAR bit) */
 	andi.	r3,r3,0x1
@@ -1998,11 +1981,6 @@
 	li	r3,0
 	bl	.do_cpu_ftr_fixups
 
-	/* setup the systemcfg pointer */
-	LOADADDR(r9,systemcfg)
-	SET_REG_TO_CONST(r8, SYSTEMCFG_VIRT_ADDR)
-	std	r8,0(r9)
-
 	LOADADDR(r26, boot_cpuid)
 	lwz	r26,0(r26)
 
Index: linux-work/arch/ppc64/kernel/pacaData.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/pacaData.c	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/pacaData.c	2005-01-31 15:56:55.000000000 +1100
@@ -20,9 +20,14 @@
 #include <asm/iSeries/ItLpQueue.h>
 #include <asm/paca.h>
 
-struct systemcfg *systemcfg;
+static union {
+	struct systemcfg	data;
+	u8			page[PAGE_SIZE];
+} systemcfg_store __page_aligned;
+struct systemcfg *systemcfg = &systemcfg_store.data;
 EXPORT_SYMBOL(systemcfg);
 
+
 /* This symbol is provided by the linker - let it fill in the paca
  * field correctly */
 extern unsigned long __toc_start;
Index: linux-work/arch/ppc64/kernel/proc_ppc64.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/proc_ppc64.c	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/proc_ppc64.c	2005-01-31 15:56:55.000000000 +1100
@@ -89,7 +89,7 @@
 		return 1;
 	pde->nlink = 1;
 	pde->data = systemcfg;
-	pde->size = 4096;
+	pde->size = PAGE_SIZE;
 	pde->proc_fops = &page_map_fops;
 
 #ifdef CONFIG_PPC_PSERIES
Index: linux-work/include/asm-ppc64/systemcfg.h
===================================================================
--- linux-work.orig/include/asm-ppc64/systemcfg.h	2005-01-31 14:18:44.000000000 +1100
+++ linux-work/include/asm-ppc64/systemcfg.h	2005-01-31 15:56:55.000000000 +1100
@@ -47,7 +47,6 @@
 	__u32 dcache_line_size;		/* L1 d-cache line size		0x64 */
 	__u32 icache_size;		/* L1 i-cache size		0x68 */
 	__u32 icache_line_size;		/* L1 i-cache line size		0x6C */
-	__u8  reserved0[3984];		/* Reserve rest of page		0x70 */
 };
 
 #ifdef __KERNEL__
@@ -56,8 +55,4 @@
 
 #endif /* __ASSEMBLY__ */
 
-#define SYSTEMCFG_PAGE      0x5
-#define SYSTEMCFG_PHYS_ADDR (SYSTEMCFG_PAGE<<PAGE_SHIFT)
-#define SYSTEMCFG_VIRT_ADDR (KERNELBASE+SYSTEMCFG_PHYS_ADDR)
-
 #endif /* _SYSTEMCFG_H */
Index: linux-work/arch/ppc64/kernel/LparData.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/LparData.c	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/LparData.c	2005-01-31 16:19:50.000000000 +1100
@@ -45,7 +45,7 @@
 	.xSize = sizeof(struct HvReleaseData),
 	.xVpdAreasPtrOffset = offsetof(struct naca_struct, xItVpdAreas),
 	.xSlicNacaAddr = &naca,		/* 64-bit Naca address */
-	.xMsNucDataOffset = 0x6000,	/* offset of LparMap within loadarea (see head.S) */
+	.xMsNucDataOffset = 0x4800,	/* offset of LparMap within loadarea (see head.S) */
 	.xTagsMode = 1,			/* tags inactive       */
 	.xAddressSize = 0,		/* 64 bit              */
 	.xNoSharedProcs = 0,		/* shared processors   */


From benh at kernel.crashing.org  Mon Jan 31 17:04:06 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 31 Jan 2005 17:04:06 +1100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline
Message-ID: <1107151447.5712.81.camel@gaston>

Hi !

This is a rather large patch. See the notes below for possible backward compatibility
issues. (Note: it depends on "ppc64: Move systemcfg out of head.S" being applied.)

This patch adds to the ppc64 kernel a virtual .so (vDSO) that is mapped into every
process space, similar to the x86 vsyscall page. However, the implementation is
very different (and doesn't use the gate area mechanism). Actually, it contains two
implementations, a 32-bit one and a 64-bit one.

These vDSOs are currently mapped at 0x100000 (+1MB) when possible (when no process
load section is already there). In the future, we can randomize that address,
or even imagine having a special phdr entry letting apps that want finer control
over their address space put it elsewhere (or not at all).

The implementation adds a hook to binfmt_elf to let the architecture add a real VMA
to the process space instead of using the gate area mechanism. That mechanism wasn't
very suitable for ppc: we couldn't just "shove" PTE entries mapping kernel addresses
into userland without expensive changes to our hash table management. Instead, I
made the vDSO a normal VMA which, additionally, means it supports copy-on-write
semantics if made writable via ptrace/mprotect, thus allowing breakpoints in the
vDSO code.

The current implementation of the vDSOs contains the signal trampolines with
appropriate DWARF information, which enables us to use non-executable stacks
(patches to come later), along with a few more functions that we hope glibc will
soon make good use of (this is the "hard" part now :). Note that the symbols
exposed by the vDSO aren't "normal" function symbols; apps can't be expected to
link against them directly. Both vDSOs are linked as if they were loaded at 0,
and the symbols just contain offsets to the various functions. This is done on
purpose to avoid a relocation step (ppc64 functions normally have descriptors with
absolute addresses in them). When glibc uses those functions, it's expected to use its
own trampolines that know how to reach them.
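Because the symbols hold offsets from a link address of 0, finding a vDSO function only requires adding the base of the mapping, which is also what the signal code in this patch does with `current->thread.vdso_base + vdso32_rt_sigtramp`. A minimal sketch, assuming a simplified, hypothetical symbol array in place of a real walk of `.dynsym`/`.dynstr`; it only computes the address and does not build the ppc64 function descriptor or trampoline a caller would still need:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified symbol entry; a real loader would walk the
 * vDSO's .dynsym/.dynstr sections (Elf32_Sym/Elf64_Sym) instead. */
struct vdso_sym_sketch {
	const char *name;
	uint64_t    offset;	/* st_value: offset from a link base of 0 */
};

/* Return the user address of 'name' in a vDSO mapped at 'vdso_base',
 * or 0 if the symbol is not present. Since the vDSO is linked at 0,
 * no relocation is needed: the address is simply base + offset. */
static uintptr_t vdso_lookup(uintptr_t vdso_base,
			     const struct vdso_sym_sketch *syms, size_t nsyms,
			     const char *name)
{
	for (size_t i = 0; i < nsyms; i++)
		if (strcmp(syms[i].name, name) == 0)
			return vdso_base + (uintptr_t)syms[i].offset;
	return 0;
}
```

With the vDSO at the default 0x100000, a symbol at (made-up) offset 0x400 resolves to 0x100400; moving or randomizing the mapping later changes only `vdso_base`.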

In some cases, the vDSO contains several versions of a given function (for various
CPUs); the kernel will "patch" the symbol table at boot to make it point to the
appropriate one transparently.
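The selection step can be sketched as matching the CPU's PVR against a mask/value table shaped like `vdso_patches[]` in `vdso.c` below. This sketch only chooses which implementation name applies; the actual rewrite of the symbol table entry is not shown, and `vdso_pick_impl()` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as struct vdso_patch_def in the patch. */
struct vdso_patch_def {
	uint32_t    pvr_mask, pvr_value;
	const char *gen_name;	/* generic symbol */
	const char *fix_name;	/* CPU-specific replacement */
};

static const struct vdso_patch_def vdso_patches[] = {
	{ 0xffff0000, 0x003a0000,		/* POWER5 */
	  "__kernel_sync_dicache", "__kernel_sync_dicache_p5" },
	{ 0xffff0000, 0x003b0000,		/* POWER5 */
	  "__kernel_sync_dicache", "__kernel_sync_dicache_p5" },
};

/* Return the implementation that 'gen_name' should point at on a CPU
 * with the given PVR, or 'gen_name' itself if no entry matches. */
static const char *vdso_pick_impl(uint32_t pvr, const char *gen_name)
{
	size_t n = sizeof(vdso_patches) / sizeof(vdso_patches[0]);

	for (size_t i = 0; i < n; i++) {
		const struct vdso_patch_def *p = &vdso_patches[i];

		if ((pvr & p->pvr_mask) == p->pvr_value &&
		    strcmp(p->gen_name, gen_name) == 0)
			return p->fix_name;
	}
	return gen_name;
}
```

Keeping the generic version as the fallback means an unknown CPU always gets the safe, full implementation.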
 
What is currently implemented is:

 -  int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz);

 This is a fully userland implementation of gettimeofday, with no barriers and no
 locks, providing results 100% equivalent to the syscall version.

 - void __kernel_sync_dicache(unsigned long start, unsigned long end)

 This function syncs the data and instruction caches (for making data executable).
 Userland loaders are expected to use it instead of doing it themselves, as
 the kernel will provide optimized versions for the current CPU. Currently, the
 vDSO provides a full one for all CPUs prior to POWER5 and a nop one for POWER5,
 which implements hardware snooping at the L1 level. In the future, an intermediate
 implementation may be done for the POWER4 and 970, which don't need the "dcbst"
 loop (the L1D cache is write-through on those).

 - void *__kernel_get_syscall_map(unsigned int *syscall_count) ;
 
 Returns a pointer to a map of implemented syscalls on the currently running
 kernel. Unlike the kernel bitops, the map is agnostic to the size of "long": it
 stores bits from top to bottom so that memory contains a linear bitmap. To check
 for syscall N, test bit (0x80000000 >> (N & 0x1f)) of the 32-bit int at index N >> 5.
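A userland check against the syscall map could look like the following sketch. It mirrors the bit layout used by `setup_syscall_map()` in this patch; the helper names are hypothetical, and treating `syscall_count` as the number of syscall slots the map covers is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mark syscall n as implemented: bit 31 of word 0 is syscall 0, so the
 * map reads as a linear bitmap regardless of the size of "long". */
static void syscall_map_set(uint32_t *map, unsigned int n)
{
	map[n >> 5] |= UINT32_C(0x80000000) >> (n & 0x1f);
}

/* Test for syscall n in a map such as the one returned by
 * __kernel_get_syscall_map(); 'count' is assumed to be the number of
 * syscall slots covered by the map. */
static bool syscall_map_test(const uint32_t *map, unsigned int count,
			     unsigned int n)
{
	if (n >= count)
		return false;
	return (map[n >> 5] & (UINT32_C(0x80000000) >> (n & 0x1f))) != 0;
}
```

For example, syscall 33 lives in word 1 (33 >> 5) at mask 0x40000000 (0x80000000 >> 1).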


Note about backward compatibility issues: a bug in the ppc64 libgcc unwinder
makes it unable to unwind stacks properly across signals if the signal trampoline
isn't on the stack. This has been fixed in CVS for gcc 4.0 and will soon be fixed on
the stable branch, but the problem exists with all currently used versions.

That means that until glibc gets the patch to enable its use of the vDSO symbols
for the DWARF unwinder (a rather trivial patch that will hopefully be pushed to glibc
CVS soon), unwinding from a signal handler will not work for 64-bit applications.

I consider this a non-issue, though: a patch is about to be produced, which can
easily get pushed to "live" distros like debian, gentoo, fedora, etc... soon enough
(it unfortunately breaks compatibility with kernels below 2.4.20 as our signal stack
layout changed, crap crap crap); there are few 64-bit applications out there
(except on gentoo); it's only really an issue with C++ code relying on throwing
exceptions out of signal handlers (extremely rare, it seems); and "release" distros
like SLES or RHEL will probably have the vDSO-enabled glibc _and_ the unwinder fix
by the time they release a version with a 2.6.11 or 2.6.12 kernel anyway :)

So far, I have yet to see an app failing because of that...

Finally, many many many thanks to Alan Modra for writing the DWARF information of
the signal handlers and debugging the libgcc issues !

Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>

Index: linux-work/arch/ppc64/Makefile
===================================================================
--- linux-work.orig/arch/ppc64/Makefile	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/Makefile	2005-01-31 16:25:55.000000000 +1100
@@ -53,6 +53,8 @@
 
 libs-y				+= arch/ppc64/lib/
 core-y				+= arch/ppc64/kernel/
+core-y				+= arch/ppc64/kernel/vdso32/
+core-y				+= arch/ppc64/kernel/vdso64/
 core-y				+= arch/ppc64/mm/
 core-$(CONFIG_XMON)		+= arch/ppc64/xmon/
 drivers-$(CONFIG_OPROFILE)	+= arch/ppc64/oprofile/
Index: linux-work/arch/ppc64/kernel/asm-offsets.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/asm-offsets.c	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/asm-offsets.c	2005-01-31 16:25:56.000000000 +1100
@@ -22,6 +22,7 @@
 #include <linux/types.h>
 #include <linux/mman.h>
 #include <linux/mm.h>
+#include <linux/time.h>
 #include <linux/hardirq.h>
 #include <asm/io.h>
 #include <asm/page.h>
@@ -35,6 +36,8 @@
 #include <asm/rtas.h>
 #include <asm/cputable.h>
 #include <asm/cache.h>
+#include <asm/systemcfg.h>
+#include <asm/compat.h>
 
 #define DEFINE(sym, val) \
 	asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@@ -167,5 +170,24 @@
 	DEFINE(CPU_SPEC_FEATURES, offsetof(struct cpu_spec, cpu_features));
 	DEFINE(CPU_SPEC_SETUP, offsetof(struct cpu_spec, cpu_setup));
 
+	/* systemcfg offsets for use by vdso */
+	DEFINE(CFG_TB_ORIG_STAMP, offsetof(struct systemcfg, tb_orig_stamp));
+	DEFINE(CFG_TB_TICKS_PER_SEC, offsetof(struct systemcfg, tb_ticks_per_sec));
+	DEFINE(CFG_TB_TO_XS, offsetof(struct systemcfg, tb_to_xs));
+	DEFINE(CFG_STAMP_XSEC, offsetof(struct systemcfg, stamp_xsec));
+	DEFINE(CFG_TB_UPDATE_COUNT, offsetof(struct systemcfg, tb_update_count));
+	DEFINE(CFG_TZ_MINUTEWEST, offsetof(struct systemcfg, tz_minuteswest));
+	DEFINE(CFG_TZ_DSTTIME, offsetof(struct systemcfg, tz_dsttime));
+	DEFINE(CFG_SYSCALL_MAP32, offsetof(struct systemcfg, syscall_map_32));
+	DEFINE(CFG_SYSCALL_MAP64, offsetof(struct systemcfg, syscall_map_64));
+
+	/* timeval/timezone offsets for use by vdso */
+	DEFINE(TVAL64_TV_SEC, offsetof(struct timeval, tv_sec));
+	DEFINE(TVAL64_TV_USEC, offsetof(struct timeval, tv_usec));
+	DEFINE(TVAL32_TV_SEC, offsetof(struct compat_timeval, tv_sec));
+	DEFINE(TVAL32_TV_USEC, offsetof(struct compat_timeval, tv_usec));
+	DEFINE(TZONE_TZ_MINWEST, offsetof(struct timezone, tz_minuteswest));
+	DEFINE(TZONE_TZ_DSTTIME, offsetof(struct timezone, tz_dsttime));
+
 	return 0;
 }
Index: linux-work/arch/ppc64/kernel/Makefile
===================================================================
--- linux-work.orig/arch/ppc64/kernel/Makefile	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/Makefile	2005-01-31 16:25:56.000000000 +1100
@@ -11,7 +11,7 @@
 			udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \
 			ptrace32.o signal32.o rtc.o init_task.o \
 			lmb.o cputable.o cpu_setup_power4.o idle_power4.o \
-			iommu.o sysfs.o
+			iommu.o sysfs.o vdso.o
 
 obj-$(CONFIG_PPC_OF) +=	of_device.o
 
Index: linux-work/arch/ppc64/kernel/signal32.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/signal32.c	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/signal32.c	2005-01-31 16:25:56.000000000 +1100
@@ -31,6 +31,7 @@
 #include <asm/ppcdebug.h>
 #include <asm/unistd.h>
 #include <asm/cacheflush.h>
+#include <asm/vdso.h>
 
 #define DEBUG_SIG 0
 
@@ -656,18 +657,24 @@
 
 	/* Save user registers on the stack */
 	frame = &rt_sf->uc.uc_mcontext;
-	if (save_user_regs(regs, frame, __NR_rt_sigreturn))
-		goto badframe;
-
 	if (put_user(regs->gpr[1], (unsigned long __user *)newsp))
 		goto badframe;
+
+	if (vdso32_rt_sigtramp && current->thread.vdso_base) {
+		if (save_user_regs(regs, frame, 0))
+			goto badframe;
+		regs->link = current->thread.vdso_base + vdso32_rt_sigtramp;
+	} else {
+		if (save_user_regs(regs, frame, __NR_rt_sigreturn))
+			goto badframe;
+		regs->link = (unsigned long) frame->tramp;
+	}
 	regs->gpr[1] = (unsigned long) newsp;
 	regs->gpr[3] = sig;
 	regs->gpr[4] = (unsigned long) &rt_sf->info;
 	regs->gpr[5] = (unsigned long) &rt_sf->uc;
 	regs->gpr[6] = (unsigned long) rt_sf;
 	regs->nip = (unsigned long) ka->sa.sa_handler;
-	regs->link = (unsigned long) frame->tramp;
 	regs->trap = 0;
 	regs->result = 0;
 
@@ -825,8 +832,15 @@
 	    || __put_user(sig, &sc->signal))
 		goto badframe;
 
-	if (save_user_regs(regs, &frame->mctx, __NR_sigreturn))
-		goto badframe;
+	if (vdso32_sigtramp && current->thread.vdso_base) {
+		if (save_user_regs(regs, &frame->mctx, 0))
+			goto badframe;
+		regs->link = current->thread.vdso_base + vdso32_sigtramp;
+	} else {
+		if (save_user_regs(regs, &frame->mctx, __NR_sigreturn))
+			goto badframe;
+		regs->link = (unsigned long) frame->mctx.tramp;
+	}
 
 	if (put_user(regs->gpr[1], (unsigned long __user *)newsp))
 		goto badframe;
@@ -834,7 +848,6 @@
 	regs->gpr[3] = sig;
 	regs->gpr[4] = (unsigned long) sc;
 	regs->nip = (unsigned long) ka->sa.sa_handler;
-	regs->link = (unsigned long) frame->mctx.tramp;
 	regs->trap = 0;
 	regs->result = 0;
 
Index: linux-work/arch/ppc64/kernel/setup.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/setup.c	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/setup.c	2005-01-31 16:25:56.000000000 +1100
@@ -990,6 +990,34 @@
 }
 
 /*
+ * Called from setup_arch to initialize the bitmap of available
+ * syscalls in the systemcfg page
+ */
+void __init setup_syscall_map(void)
+{
+	unsigned int i, count64 = 0, count32 = 0;
+	extern unsigned long *sys_call_table;
+	extern unsigned long *sys_call_table32;
+	extern unsigned long sys_ni_syscall;
+
+
+	for (i = 0; i < __NR_syscalls; i++) {
+		if (sys_call_table[i] == sys_ni_syscall)
+			continue;
+		count64++;
+		systemcfg->syscall_map_64[i >> 5] |= 0x80000000UL >> (i & 0x1f);
+	}
+	for (i = 0; i < __NR_syscalls; i++) {
+		if (sys_call_table32[i] == sys_ni_syscall)
+			continue;
+		count32++;
+		systemcfg->syscall_map_32[i >> 5] |= 0x80000000UL >> (i & 0x1f);
+	}
+	printk(KERN_INFO "Syscall map setup, %d 32 bits and %d 64 bits syscalls\n",
+	       count32, count64);
+}
+
+/*
  * Called into from start_kernel, after lock_kernel has been called.
  * Initializes bootmem, which is unsed to manage page allocation until
  * mem_init is called.
@@ -1027,6 +1055,9 @@
 	/* set up the bootmem stuff with available memory */
 	do_init_bootmem();
 
+	/* initialize the syscall map in systemcfg */
+	setup_syscall_map();
+
 	ppc_md.setup_arch();
 
 	/* Select the correct idle loop for the platform. */
Index: linux-work/arch/ppc64/kernel/signal.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/signal.c	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/signal.c	2005-01-31 16:25:56.000000000 +1100
@@ -34,6 +34,7 @@
 #include <asm/ppcdebug.h>
 #include <asm/unistd.h>
 #include <asm/cacheflush.h>
+#include <asm/vdso.h>
 
 #define DEBUG_SIG 0
 
@@ -426,10 +427,14 @@
 		goto badframe;
 
 	/* Set up to return from userspace. */
-	err |= setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]);
-	if (err)
-		goto badframe;
-
+	if (vdso64_rt_sigtramp && current->thread.vdso_base) {
+		regs->link = current->thread.vdso_base + vdso64_rt_sigtramp;
+	} else {
+		err |= setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]);
+		if (err)
+			goto badframe;
+		regs->link = (unsigned long) &frame->tramp[0];
+	}
 	funct_desc_ptr = (func_descr_t __user *) ka->sa.sa_handler;
 
 	/* Allocate a dummy caller frame for the signal handler. */
@@ -438,7 +443,6 @@
 
 	/* Set up "regs" so we "return" to the signal handler. */
 	err |= get_user(regs->nip, &funct_desc_ptr->entry);
-	regs->link = (unsigned long) &frame->tramp[0];
 	regs->gpr[1] = newsp;
 	err |= get_user(regs->gpr[2], &funct_desc_ptr->toc);
 	regs->gpr[3] = signr;
Index: linux-work/arch/ppc64/kernel/smp.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/smp.c	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/smp.c	2005-01-31 16:25:56.000000000 +1100
@@ -383,7 +383,7 @@
 	 * For now we leave it which means the time can be some
 	 * number of msecs off until someone does a settimeofday()
 	 */
-	do_gtod.tb_orig_stamp = tb_last_stamp;
+	do_gtod.varp->tb_orig_stamp = tb_last_stamp;
 	systemcfg->tb_orig_stamp = tb_last_stamp;
 #endif
 
Index: linux-work/arch/ppc64/kernel/time.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/time.c	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/kernel/time.c	2005-01-31 16:25:56.000000000 +1100
@@ -86,8 +86,6 @@
 unsigned long tb_ticks_per_jiffy;
 unsigned long tb_ticks_per_usec = 100; /* sane default */
 unsigned long tb_ticks_per_sec;
-unsigned long next_xtime_sync_tb;
-unsigned long xtime_sync_interval;
 unsigned long tb_to_xs;
 unsigned      tb_to_us;
 unsigned long processor_freq;
@@ -158,8 +156,8 @@
 	 * The conversion to microseconds at the end is done
 	 * without a divide (and in fact, without a multiply)
 	 */
-	tb_ticks = tb_val - do_gtod.tb_orig_stamp;
 	temp_varp = do_gtod.varp;
+	tb_ticks = tb_val - temp_varp->tb_orig_stamp;
 	temp_tb_to_xs = temp_varp->tb_to_xs;
 	temp_stamp_xsec = temp_varp->stamp_xsec;
 	tb_xsec = mulhdu( tb_ticks, temp_tb_to_xs );
@@ -185,17 +183,55 @@
 {
 	struct timeval my_tv;
 
-	if (cur_tb > next_xtime_sync_tb) {
-		next_xtime_sync_tb = cur_tb + xtime_sync_interval;
-		__do_gettimeofday(&my_tv, cur_tb);
-
-		if (xtime.tv_sec <= my_tv.tv_sec) {
-			xtime.tv_sec = my_tv.tv_sec;
-			xtime.tv_nsec = my_tv.tv_usec * 1000;
-		}
+	__do_gettimeofday(&my_tv, cur_tb);
+
+	if (xtime.tv_sec <= my_tv.tv_sec) {
+		xtime.tv_sec = my_tv.tv_sec;
+		xtime.tv_nsec = my_tv.tv_usec * 1000;
 	}
 }
 
+/*
+ * When the timebase - tb_orig_stamp gets too big, we do a manipulation
+ * between tb_orig_stamp and stamp_xsec. The goal here is to keep the
+ * difference tb - tb_orig_stamp small enough to always fit inside a
+ * 32 bits number. This is a requirement of our fast 32 bits userland
+ * implementation in the vdso. If we "miss" a call to this function
+ * (interrupt latency, CPU locked in a spinlock, ...) and we end up
+ * with a too big difference, then the vdso will fallback to calling
+ * the syscall
+ */ 
+static __inline__ void timer_recalc_offset(unsigned long cur_tb)
+{
+	struct gettimeofday_vars * temp_varp;
+	unsigned temp_idx;
+	unsigned long offset, new_stamp_xsec, new_tb_orig_stamp;
+
+	if (((cur_tb - do_gtod.varp->tb_orig_stamp) & 0x80000000u) == 0)
+		return;
+
+	temp_idx = (do_gtod.var_idx == 0);
+	temp_varp = &do_gtod.vars[temp_idx];
+
+	new_tb_orig_stamp = cur_tb;
+	offset = new_tb_orig_stamp - do_gtod.varp->tb_orig_stamp;
+	new_stamp_xsec = do_gtod.varp->stamp_xsec + mulhdu(offset, do_gtod.varp->tb_to_xs);
+	
+	temp_varp->tb_to_xs = do_gtod.varp->tb_to_xs;
+	temp_varp->tb_orig_stamp = new_tb_orig_stamp;
+	temp_varp->stamp_xsec = new_stamp_xsec;
+	mb();
+	do_gtod.varp = temp_varp;
+	do_gtod.var_idx = temp_idx;
+
+	++(systemcfg->tb_update_count);
+	wmb();
+	systemcfg->tb_orig_stamp = new_tb_orig_stamp;
+	systemcfg->stamp_xsec = new_stamp_xsec;
+	wmb();
+	++(systemcfg->tb_update_count);
+}
+
 #ifdef CONFIG_SMP
 unsigned long profile_pc(struct pt_regs *regs)
 {
@@ -311,6 +347,7 @@
 		if (cpu == boot_cpuid) {
 			write_seqlock(&xtime_lock);
 			tb_last_stamp = lpaca->next_jiffy_update_tb;
+			timer_recalc_offset(lpaca->next_jiffy_update_tb);
 			do_timer(regs);
 			timer_sync_xtime(lpaca->next_jiffy_update_tb);
 			timer_check_rtc();
@@ -398,7 +435,9 @@
 	time_maxerror = NTP_PHASE_LIMIT;
 	time_esterror = NTP_PHASE_LIMIT;
 
-	delta_xsec = mulhdu( (tb_last_stamp-do_gtod.tb_orig_stamp), do_gtod.varp->tb_to_xs );
+	delta_xsec = mulhdu( (tb_last_stamp-do_gtod.varp->tb_orig_stamp),
+			     do_gtod.varp->tb_to_xs );
+
 	new_xsec = (new_nsec * XSEC_PER_SEC) / NSEC_PER_SEC;
 	new_xsec += new_sec * XSEC_PER_SEC;
 	if ( new_xsec > delta_xsec ) {
@@ -411,7 +450,7 @@
 		 * before 1970 ... eg. we booted ten days ago, and we are setting
 		 * the time to Jan 5, 1970 */
 		do_gtod.varp->stamp_xsec = new_xsec;
-		do_gtod.tb_orig_stamp = tb_last_stamp;
+		do_gtod.varp->tb_orig_stamp = tb_last_stamp;
 		systemcfg->stamp_xsec = new_xsec;
 		systemcfg->tb_orig_stamp = tb_last_stamp;
 	}
@@ -464,9 +503,9 @@
 	xtime.tv_sec = mktime(tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
 			      tm.tm_hour, tm.tm_min, tm.tm_sec);
 	tb_last_stamp = get_tb();
-	do_gtod.tb_orig_stamp = tb_last_stamp;
 	do_gtod.varp = &do_gtod.vars[0];
 	do_gtod.var_idx = 0;
+	do_gtod.varp->tb_orig_stamp = tb_last_stamp;
 	do_gtod.varp->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC;
 	do_gtod.tb_ticks_per_sec = tb_ticks_per_sec;
 	do_gtod.varp->tb_to_xs = tb_to_xs;
@@ -477,9 +516,6 @@
 	systemcfg->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC;
 	systemcfg->tb_to_xs = tb_to_xs;
 
-	xtime_sync_interval = tb_ticks_per_sec - (tb_ticks_per_sec/8);
-	next_xtime_sync_tb = tb_last_stamp + xtime_sync_interval;
-
 	time_freq = 0;
 
 	xtime.tv_nsec = 0;
@@ -584,12 +620,12 @@
 	   stamp_xsec which is the time (in 1/2^20 second units) corresponding to tb_orig_stamp.  This 
 	   new value of stamp_xsec compensates for the change in frequency (implied by the new tb_to_xs)
 	   which guarantees that the current time remains the same */ 
-	tb_ticks = get_tb() - do_gtod.tb_orig_stamp;
+	write_seqlock_irqsave( &xtime_lock, flags );
+	tb_ticks = get_tb() - do_gtod.varp->tb_orig_stamp;
 	div128_by_32( 1024*1024, 0, new_tb_ticks_per_sec, &divres );
 	new_tb_to_xs = divres.result_low;
 	new_xsec = mulhdu( tb_ticks, new_tb_to_xs );
 
-	write_seqlock_irqsave( &xtime_lock, flags );
 	old_xsec = mulhdu( tb_ticks, do_gtod.varp->tb_to_xs );
 	new_stamp_xsec = do_gtod.varp->stamp_xsec + old_xsec - new_xsec;
 
@@ -597,16 +633,12 @@
 	   values in do_gettimeofday.  We alternate the copies and as long as a reasonable time elapses between
 	   changes, there will never be inconsistent values.  ntpd has a minimum of one minute between updates */
 
-	if (do_gtod.var_idx == 0) {
-		temp_varp = &do_gtod.vars[1];
-		temp_idx  = 1;
-	}
-	else {
-		temp_varp = &do_gtod.vars[0];
-		temp_idx  = 0;
-	}
+	temp_idx = (do_gtod.var_idx == 0);
+	temp_varp = &do_gtod.vars[temp_idx];
+
 	temp_varp->tb_to_xs = new_tb_to_xs;
 	temp_varp->stamp_xsec = new_stamp_xsec;
+	temp_varp->tb_orig_stamp = do_gtod.varp->tb_orig_stamp;
 	mb();
 	do_gtod.varp = temp_varp;
 	do_gtod.var_idx = temp_idx;
Index: linux-work/arch/ppc64/kernel/vdso.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso.c	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,614 @@
+/*
+ *  linux/arch/ppc64/kernel/vdso.c
+ *
+ *    Copyright (C) 2004 Benjamin Herrenschmidt, IBM Corp.
+ *			 <benh at kernel.crashing.org>
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/smp_lock.h>
+#include <linux/stddef.h>
+#include <linux/unistd.h>
+#include <linux/slab.h>
+#include <linux/user.h>
+#include <linux/elf.h>
+#include <linux/security.h>
+#include <linux/bootmem.h>
+
+#include <asm/pgtable.h>
+#include <asm/system.h>
+#include <asm/processor.h>
+#include <asm/mmu.h>
+#include <asm/mmu_context.h>
+#include <asm/machdep.h>
+#include <asm/cputable.h>
+#include <asm/sections.h>
+#include <asm/vdso.h>
+
+#undef DEBUG
+
+#ifdef DEBUG
+#define DBG(fmt...) printk(fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+
+/*
+ * The vDSOs themselves are here
+ */ 
+extern char vdso64_start, vdso64_end;
+extern char vdso32_start, vdso32_end;
+
+static void *vdso64_kbase = &vdso64_start;
+static void *vdso32_kbase = &vdso32_start;
+
+unsigned int vdso64_pages;
+unsigned int vdso32_pages;
+
+/* Signal trampolines user addresses */
+
+unsigned long vdso64_rt_sigtramp;
+unsigned long vdso32_sigtramp;
+unsigned long vdso32_rt_sigtramp;
+
+/* Format of the patch table */
+struct vdso_patch_def
+{
+	u32		pvr_mask, pvr_value;
+	const char	*gen_name;
+	const char	*fix_name;
+};
+
+/* Table of functions to patch based on the CPU type/revision
+ *
+ * TODO: Improve by adding whole lists for each entry
+ */
+static struct vdso_patch_def vdso_patches[] = {
+	{
+		0xffff0000, 0x003a0000,		/* POWER5 */
+		"__kernel_sync_dicache", "__kernel_sync_dicache_p5"
+	},
+	{
+		0xffff0000, 0x003b0000,		/* POWER5 */
+		"__kernel_sync_dicache", "__kernel_sync_dicache_p5"
+	},
+};
+
+/*
+ * Some infos carried around for each of them during parsing at
+ * boot time.
+ */
+struct lib32_elfinfo
+{
+	Elf32_Ehdr	*hdr;		/* ptr to ELF */
+	Elf32_Sym	*dynsym;	/* ptr to .dynsym section */
+	unsigned long	dynsymsize;	/* size of .dynsym section */
+	char		*dynstr;	/* ptr to .dynstr section */
+	unsigned long	text;		/* offset of .text section in .so */
+};
+
+struct lib64_elfinfo
+{
+	Elf64_Ehdr	*hdr;
+	Elf64_Sym	*dynsym;
+	unsigned long	dynsymsize;
+	char		*dynstr;
+	unsigned long	text;
+};
+
+
+#ifdef __DEBUG
+static void dump_one_vdso_page(struct page *pg, struct page *upg)
+{
+	printk("kpg: %p (c:%d,f:%08lx)", __va(page_to_pfn(pg) << PAGE_SHIFT),
+	       page_count(pg),
+	       pg->flags);
+	if (upg/* && pg != upg*/) {
+		printk(" upg: %p (c:%d,f:%08lx)", __va(page_to_pfn(upg) << PAGE_SHIFT),
+		       page_count(upg),
+		       upg->flags);
+	}
+	printk("\n");
+}
+
+static void dump_vdso_pages(struct vm_area_struct * vma)
+{
+	int i;
+
+	if (!vma || test_thread_flag(TIF_32BIT)) {
+		printk("vDSO32 @ %016lx:\n", (unsigned long)vdso32_kbase);
+		for (i=0; i<vdso32_pages; i++) {
+			struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
+			struct page *upg = (vma && vma->vm_mm) ?
+				follow_page(vma->vm_mm, vma->vm_start + i*PAGE_SIZE, 0)
+				: NULL;
+			dump_one_vdso_page(pg, upg);
+		}
+	}
+	if (!vma || !test_thread_flag(TIF_32BIT)) {
+		printk("vDSO64 @ %016lx:\n", (unsigned long)vdso64_kbase);
+		for (i=0; i<vdso64_pages; i++) {
+			struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
+			struct page *upg = (vma && vma->vm_mm) ?
+				follow_page(vma->vm_mm, vma->vm_start + i*PAGE_SIZE, 0)
+				: NULL;
+			dump_one_vdso_page(pg, upg);
+		}
+	}
+}
+#endif /* __DEBUG */
+
+/*
+ * Keep a dummy vma_close for now; it will prevent VMA merging.
+ */
+static void vdso_vma_close(struct vm_area_struct * vma)
+{
+}
+
+/*
+ * Our nopage() function maps in the actual vDSO kernel pages. They will
+ * be mapped read-only by do_no_page() and possibly COW'ed later, either
+ * right away on an initial write access or by do_wp_page().
+ */
+static struct page * vdso_vma_nopage(struct vm_area_struct * vma,
+				     unsigned long address, int *type)
+{
+	unsigned long offset = address - vma->vm_start;
+	struct page *pg;
+	void *vbase = test_thread_flag(TIF_32BIT) ? vdso32_kbase : vdso64_kbase;
+
+	DBG("vdso_vma_nopage(current: %s, address: %016lx, off: %lx)\n",
+	    current->comm, address, offset);
+
+	if (address < vma->vm_start || address >= vma->vm_end)
+		return NOPAGE_SIGBUS;
+
+	/*
+	 * The last page is systemcfg and needs special handling: no
+	 * get_page() as this is a reserved page
+	 */
+	if ((vma->vm_end - address) <= PAGE_SIZE)
+		return virt_to_page(systemcfg);
+
+	pg = virt_to_page(vbase + offset);
+	get_page(pg);
+	DBG(" ->page count: %d\n", page_count(pg));
+
+	return pg;
+}
+
+static struct vm_operations_struct vdso_vmops = {
+	.close	= vdso_vma_close,
+	.nopage	= vdso_vma_nopage,
+};
+
+/*
+ * This is called from binfmt_elf; we create the special VMA for the
+ * vDSO and insert it into the mm struct tree
+ */
+int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+	unsigned long vdso_pages;
+	unsigned long vdso_base;
+
+	if (test_thread_flag(TIF_32BIT)) {
+		vdso_pages = vdso32_pages;
+		vdso_base = VDSO32_MBASE;
+	} else {
+		vdso_pages = vdso64_pages;
+		vdso_base = VDSO64_MBASE;
+	}
+
+	/* The vDSO had a problem and was disabled; just don't "enable" it
+	 * for the process
+	 */
+	if (vdso_pages == 0) {
+		current->thread.vdso_base = 0;
+		return 0;
+	}
+	vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
+	if (vma == NULL)
+		return -ENOMEM;
+	if (security_vm_enough_memory(vdso_pages)) {
+		kmem_cache_free(vm_area_cachep, vma);
+		return -ENOMEM;
+	}
+	memset(vma, 0, sizeof(*vma));
+
+	/*
+	 * Pick a base address for the vDSO in process space. We have a default
+	 * base of 1MB, to which a random offset of up to 1MB may be added.
+	 * XXX: Add possibility for a program header to specify that location
+	 */
+	current->thread.vdso_base = vdso_base;
+	/*  + ((unsigned long)vma & 0x000ff000); */
+
+	vma->vm_mm = mm;
+	vma->vm_start = current->thread.vdso_base;
+
+	/*
+	 * the VMA size is one page more than the vDSO since systemcfg
+	 * is mapped in the last one
+	 */
+	vma->vm_end = vma->vm_start + ((vdso_pages + 1) << PAGE_SHIFT);
+
+	/*
+	 * Our VMA flags don't have VM_WRITE, so by default the process isn't
+	 * allowed to write those pages.
+	 * gdb can break that with the ptrace interface, and thus trigger COW on
+	 * those pages, but it's then your responsibility never to do that on the
+	 * "data" page of the vDSO, or you'll stop getting kernel updates and your
+	 * nice userland gettimeofday will be totally dead. It's fine to use that
+	 * for setting breakpoints in the vDSO code pages, though.
+	 */
+	vma->vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+	vma->vm_flags |= mm->def_flags;
+	vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+	vma->vm_ops = &vdso_vmops;
+
+	down_write(&mm->mmap_sem);
+	insert_vm_struct(mm, vma);
+	mm->total_vm += (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+	up_write(&mm->mmap_sem);
+
+	return 0;
+}
+
+static void * __init find_section32(Elf32_Ehdr *ehdr, const char *secname,
+				  unsigned long *size)
+{
+	Elf32_Shdr *sechdrs;
+	unsigned int i;
+	char *secnames;
+
+	/* Grab section headers and strings so we can tell who is who */
+	sechdrs = (void *)ehdr + ehdr->e_shoff;
+	secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset;
+
+	/* Find the section they want */
+	for (i = 1; i < ehdr->e_shnum; i++) {
+		if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) {
+			if (size)
+				*size = sechdrs[i].sh_size;
+			return (void *)ehdr + sechdrs[i].sh_offset;
+		}
+	}
+	if (size)
+		*size = 0;
+	return NULL;
+}
+
+static void * __init find_section64(Elf64_Ehdr *ehdr, const char *secname,
+				  unsigned long *size)
+{
+	Elf64_Shdr *sechdrs;
+	unsigned int i;
+	char *secnames;
+
+	/* Grab section headers and strings so we can tell who is who */
+	sechdrs = (void *)ehdr + ehdr->e_shoff;
+	secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset;
+
+	/* Find the section they want */
+	for (i = 1; i < ehdr->e_shnum; i++) {
+		if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) {
+			if (size)
+				*size = sechdrs[i].sh_size;
+			return (void *)ehdr + sechdrs[i].sh_offset;
+		}
+	}
+	if (size)
+		*size = 0;
+	return NULL;
+}
+
+static Elf32_Sym * __init find_symbol32(struct lib32_elfinfo *lib, const char *symname)
+{
+	unsigned int i;
+	char name[32], *c;
+
+	for (i = 0; i < (lib->dynsymsize / sizeof(Elf32_Sym)); i++) {
+		if (lib->dynsym[i].st_name == 0)
+			continue;
+		strlcpy(name, lib->dynstr + lib->dynsym[i].st_name, 32);
+		c = strchr(name, '@');
+		if (c)
+			*c = 0;
+		if (strcmp(symname, name) == 0)
+			return &lib->dynsym[i];
+	}
+	return NULL;
+}
+
+static Elf64_Sym * __init find_symbol64(struct lib64_elfinfo *lib, const char *symname)
+{
+	unsigned int i;
+	char name[32], *c;
+
+	for (i = 0; i < (lib->dynsymsize / sizeof(Elf64_Sym)); i++) {
+		if (lib->dynsym[i].st_name == 0)
+			continue;
+		strlcpy(name, lib->dynstr + lib->dynsym[i].st_name, 32);
+		c = strchr(name, '@');
+		if (c)
+			*c = 0;
+		if (strcmp(symname, name) == 0)
+			return &lib->dynsym[i];
+	}
+	return NULL;
+}
+
+/* Note that we assume the section is .text and the symbol is relative to
+ * the library base
+ */
+static unsigned long __init find_function32(struct lib32_elfinfo *lib, const char *symname)
+{
+	Elf32_Sym *sym = find_symbol32(lib, symname);
+
+	if (sym == NULL) {
+		printk(KERN_WARNING "vDSO32: function %s not found !\n", symname);
+		return 0;
+	}
+	return sym->st_value - VDSO32_LBASE;
+}
+
+/* Note that we assume the section is .text and the symbol is relative to
+ * the library base
+ */
+static unsigned long __init find_function64(struct lib64_elfinfo *lib, const char *symname)
+{
+	Elf64_Sym *sym = find_symbol64(lib, symname);
+
+	if (sym == NULL) {
+		printk(KERN_WARNING "vDSO64: function %s not found !\n", symname);
+		return 0;
+	}
+#ifdef VDS64_HAS_DESCRIPTORS
+	return *((u64 *)(vdso64_kbase + sym->st_value - VDSO64_LBASE)) - VDSO64_LBASE;
+#else
+	return sym->st_value - VDSO64_LBASE;
+#endif
+}
+
+
+static __init int vdso_do_find_sections(struct lib32_elfinfo *v32,
+					struct lib64_elfinfo *v64)
+{
+	void *sect;
+
+	/*
+	 * Locate symbol tables & text section
+	 */
+
+	v32->dynsym = find_section32(v32->hdr, ".dynsym", &v32->dynsymsize);
+	v32->dynstr = find_section32(v32->hdr, ".dynstr", NULL);
+	if (v32->dynsym == NULL || v32->dynstr == NULL) {
+		printk(KERN_ERR "vDSO32: a required symbol section was not found\n");
+		return -1;
+	}
+	sect = find_section32(v32->hdr, ".text", NULL);
+	if (sect == NULL) {
+		printk(KERN_ERR "vDSO32: the .text section was not found\n");
+		return -1;
+	}
+	v32->text = sect - vdso32_kbase;
+
+	v64->dynsym = find_section64(v64->hdr, ".dynsym", &v64->dynsymsize);
+	v64->dynstr = find_section64(v64->hdr, ".dynstr", NULL);
+	if (v64->dynsym == NULL || v64->dynstr == NULL) {
+		printk(KERN_ERR "vDSO64: a required symbol section was not found\n");
+		return -1;
+	}
+	sect = find_section64(v64->hdr, ".text", NULL);
+	if (sect == NULL) {
+		printk(KERN_ERR "vDSO64: the .text section was not found\n");
+		return -1;
+	}
+	v64->text = sect - vdso64_kbase;
+
+	return 0;
+}
+
+static __init void vdso_setup_trampolines(struct lib32_elfinfo *v32,
+					  struct lib64_elfinfo *v64)
+{
+	/*
+	 * Find signal trampolines
+	 */
+
+	vdso64_rt_sigtramp	= find_function64(v64, "__kernel_sigtramp_rt64");
+	vdso32_sigtramp		= find_function32(v32, "__kernel_sigtramp32");
+	vdso32_rt_sigtramp	= find_function32(v32, "__kernel_sigtramp_rt32");
+}
+
+static __init int vdso_fixup_datapage(struct lib32_elfinfo *v32,
+				       struct lib64_elfinfo *v64)
+{
+	Elf32_Sym *sym32;
+	Elf64_Sym *sym64;
+
+	sym32 = find_symbol32(v32, "__kernel_datapage_offset");
+	if (sym32 == NULL) {
+		printk(KERN_ERR "vDSO32: Can't find symbol __kernel_datapage_offset !\n");
+		return -1;
+	}
+	*((int *)(vdso32_kbase + (sym32->st_value - VDSO32_LBASE))) =
+		(vdso32_pages << PAGE_SHIFT) - (sym32->st_value - VDSO32_LBASE);
+
+	sym64 = find_symbol64(v64, "__kernel_datapage_offset");
+	if (sym64 == NULL) {
+		printk(KERN_ERR "vDSO64: Can't find symbol __kernel_datapage_offset !\n");
+		return -1;
+	}
+	*((int *)(vdso64_kbase + sym64->st_value - VDSO64_LBASE)) =
+		(vdso64_pages << PAGE_SHIFT) - (sym64->st_value - VDSO64_LBASE);
+
+	return 0;
+}
+
+static int vdso_do_func_patch32(struct lib32_elfinfo *v32,
+				struct lib64_elfinfo *v64,
+				const char *orig, const char *fix)
+{
+	Elf32_Sym *sym32_gen, *sym32_fix;
+
+	sym32_gen = find_symbol32(v32, orig);
+	if (sym32_gen == NULL) {
+		printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", orig);
+		return -1;
+	}
+	sym32_fix = find_symbol32(v32, fix);
+	if (sym32_fix == NULL) {
+		printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", fix);
+		return -1;
+	}
+	sym32_gen->st_value = sym32_fix->st_value;
+	sym32_gen->st_size = sym32_fix->st_size;
+	sym32_gen->st_info = sym32_fix->st_info;
+	sym32_gen->st_other = sym32_fix->st_other;
+	sym32_gen->st_shndx = sym32_fix->st_shndx;
+
+	return 0;
+}
+
+static int vdso_do_func_patch64(struct lib32_elfinfo *v32,
+				struct lib64_elfinfo *v64,
+				const char *orig, const char *fix)
+{
+	Elf64_Sym *sym64_gen, *sym64_fix;
+
+	sym64_gen = find_symbol64(v64, orig);
+	if (sym64_gen == NULL) {
+		printk(KERN_ERR "vDSO64: Can't find symbol %s !\n", orig);
+		return -1;
+	}
+	sym64_fix = find_symbol64(v64, fix);
+	if (sym64_fix == NULL) {
+		printk(KERN_ERR "vDSO64: Can't find symbol %s !\n", fix);
+		return -1;
+	}
+	sym64_gen->st_value = sym64_fix->st_value;
+	sym64_gen->st_size = sym64_fix->st_size;
+	sym64_gen->st_info = sym64_fix->st_info;
+	sym64_gen->st_other = sym64_fix->st_other;
+	sym64_gen->st_shndx = sym64_fix->st_shndx;
+
+	return 0;
+}
+
+static __init int vdso_fixup_alt_funcs(struct lib32_elfinfo *v32,
+				       struct lib64_elfinfo *v64)
+{
+	u32 pvr;
+	int i;
+
+	pvr = mfspr(SPRN_PVR);
+	for (i = 0; i < ARRAY_SIZE(vdso_patches); i++) {
+		struct vdso_patch_def *patch = &vdso_patches[i];
+		int match = (pvr & patch->pvr_mask) == patch->pvr_value;
+
+		DBG("patch %d (mask: %x, pvr: %x) : %s\n",
+		    i, patch->pvr_mask, patch->pvr_value, match ? "match" : "skip");
+
+		if (!match)
+			continue;
+
+		DBG("replacing %s with %s...\n", patch->gen_name, patch->fix_name);
+
+		/*
+		 * Patch the 32-bit and 64-bit symbols. Note that we do not patch
+		 * the "." symbol on 64-bit. It would be easy to do, but doesn't
+		 * seem to be necessary; patching the OPD symbol is enough.
+		 */
+		vdso_do_func_patch32(v32, v64, patch->gen_name, patch->fix_name);
+		vdso_do_func_patch64(v32, v64, patch->gen_name, patch->fix_name);
+	}
+
+	return 0;
+}
+
+
+static __init int vdso_setup(void)
+{
+	struct lib32_elfinfo	v32;
+	struct lib64_elfinfo	v64;
+
+	v32.hdr = vdso32_kbase;
+	v64.hdr = vdso64_kbase;
+
+	if (vdso_do_find_sections(&v32, &v64))
+		return -1;
+
+	if (vdso_fixup_datapage(&v32, &v64))
+		return -1;
+
+	if (vdso_fixup_alt_funcs(&v32, &v64))
+		return -1;
+
+	vdso_setup_trampolines(&v32, &v64);
+
+	return 0;
+}
+
+void __init vdso_init(void)
+{
+	int i;
+
+	vdso64_pages = (&vdso64_end - &vdso64_start) >> PAGE_SHIFT;
+	vdso32_pages = (&vdso32_end - &vdso32_start) >> PAGE_SHIFT;
+
+	DBG("vdso64_kbase: %p, 0x%x pages, vdso32_kbase: %p, 0x%x pages\n",
+	       vdso64_kbase, vdso64_pages, vdso32_kbase, vdso32_pages);
+
+	/*
+	 * Initialize the vDSO images in memory, that is, do the necessary
+	 * fixups of vDSO symbols, locate trampolines, etc.
+	 */
+	if (vdso_setup()) {
+		printk(KERN_ERR "vDSO setup failure, not enabled !\n");
+		/* XXX should free pages here ? */
+		vdso64_pages = vdso32_pages = 0;
+		return;
+	}
+
+	/* Make sure pages are in the correct state */
+	for (i = 0; i < vdso64_pages; i++) {
+		struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
+		ClearPageReserved(pg);
+		get_page(pg);
+	}
+	for (i = 0; i < vdso32_pages; i++) {
+		struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
+		ClearPageReserved(pg);
+		get_page(pg);
+	}
+}
+
+int in_gate_area_no_task(unsigned long addr)
+{
+	return 0;
+}
+
+int in_gate_area(struct task_struct *task, unsigned long addr)
+{
+	return 0;
+}
+
+struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+{
+	return NULL;
+}
+
Index: linux-work/include/asm-ppc64/processor.h
===================================================================
--- linux-work.orig/include/asm-ppc64/processor.h	2005-01-31 14:18:44.000000000 +1100
+++ linux-work/include/asm-ppc64/processor.h	2005-01-31 16:25:56.000000000 +1100
@@ -544,8 +544,8 @@
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */
-#define TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(STACK_TOP_USER32 / 4))
-#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(STACK_TOP_USER64 / 4))
+#define TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(TASK_SIZE_USER32 / 4))
+#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(TASK_SIZE_USER64 / 4))
 
 #define TASK_UNMAPPED_BASE ((test_thread_flag(TIF_32BIT)||(ppcdebugset(PPCDBG_BINFMT_32ADDR))) ? \
 		TASK_UNMAPPED_BASE_USER32 : TASK_UNMAPPED_BASE_USER64 )
@@ -562,7 +562,8 @@
 	double		fpr[32];	/* Complete floating point set */
 	unsigned long	fpscr;		/* Floating point status (plus pad) */
 	unsigned long	fpexc_mode;	/* Floating-point exception mode */
-	unsigned long	pad[3];		/* was saved_msr, saved_softe */
+	unsigned long	pad[2];		/* was saved_msr, saved_softe */
+	unsigned long	vdso_base;	/* base of the vDSO library */
 #ifdef CONFIG_ALTIVEC
 	/* Complete AltiVec register set */
 	vector128	vr[32] __attribute((aligned(16)));
Index: linux-work/include/asm-ppc64/systemcfg.h
===================================================================
--- linux-work.orig/include/asm-ppc64/systemcfg.h	2005-01-31 15:56:55.000000000 +1100
+++ linux-work/include/asm-ppc64/systemcfg.h	2005-01-31 16:25:56.000000000 +1100
@@ -20,10 +20,14 @@
  * Minor version changes are a hint.
  */
 #define SYSTEMCFG_MAJOR 1
-#define SYSTEMCFG_MINOR 0
+#define SYSTEMCFG_MINOR 1
 
 #ifndef __ASSEMBLY__
 
+#include <linux/unistd.h>
+
+#define SYSCALL_MAP_SIZE      ((__NR_syscalls + 31) / 32)
+
 struct systemcfg {
 	__u8  eye_catcher[16];		/* Eyecatcher: SYSTEMCFG:PPC64	0x00 */
 	struct {			/* Systemcfg version numbers	     */
@@ -47,6 +51,8 @@
 	__u32 dcache_line_size;		/* L1 d-cache line size		0x64 */
 	__u32 icache_size;		/* L1 i-cache size		0x68 */
 	__u32 icache_line_size;		/* L1 i-cache line size		0x6C */
+	__u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of available 64-bit syscalls 0x70 */
+	__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of available 32-bit syscalls */
 };
 
 #ifdef __KERNEL__
Index: linux-work/include/asm-ppc64/a.out.h
===================================================================
--- linux-work.orig/include/asm-ppc64/a.out.h	2005-01-31 14:18:44.000000000 +1100
+++ linux-work/include/asm-ppc64/a.out.h	2005-01-31 16:25:56.000000000 +1100
@@ -30,14 +30,11 @@
 
 #ifdef __KERNEL__
 
-#define STACK_TOP_USER64 (TASK_SIZE_USER64)
+#define STACK_TOP_USER64 TASK_SIZE_USER64
+#define STACK_TOP_USER32 TASK_SIZE_USER32
 
-/* Give 32-bit user space a full 4G address space to live in. */
-#define STACK_TOP_USER32 (TASK_SIZE_USER32)
-
-#define STACK_TOP ((test_thread_flag(TIF_32BIT) || \
-		(ppcdebugset(PPCDBG_BINFMT_32ADDR))) ? \
-		STACK_TOP_USER32 : STACK_TOP_USER64)
+#define STACK_TOP (test_thread_flag(TIF_32BIT) ? \
+		   STACK_TOP_USER32 : STACK_TOP_USER64)
 
 #endif /* __KERNEL__ */
 
Index: linux-work/include/asm-ppc64/elf.h
===================================================================
--- linux-work.orig/include/asm-ppc64/elf.h	2005-01-31 14:18:44.000000000 +1100
+++ linux-work/include/asm-ppc64/elf.h	2005-01-31 16:25:56.000000000 +1100
@@ -238,10 +238,20 @@
 /* A special ignored type value for PPC, for glibc compatibility.  */
 #define AT_IGNOREPPC		22
 
+/* The vDSO location. We have to use the same value as x86 for glibc's
+ * sake :-)
+ */
+#define AT_SYSINFO_EHDR		33
+
 extern int dcache_bsize;
 extern int icache_bsize;
 extern int ucache_bsize;
 
+/* We do have an arch_setup_additional_pages for vDSO matters */
+#define ARCH_HAS_SETUP_ADDITIONAL_PAGES
+struct linux_binprm;
+extern int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack);
+
 /*
  * The requirements here are:
  * - keep the final alignment of sp (sp & 0xf)
@@ -260,6 +270,8 @@
 	NEW_AUX_ENT(AT_DCACHEBSIZE, dcache_bsize);			\
 	NEW_AUX_ENT(AT_ICACHEBSIZE, icache_bsize);			\
 	NEW_AUX_ENT(AT_UCACHEBSIZE, ucache_bsize);			\
+	/* vDSO base */							\
+	NEW_AUX_ENT(AT_SYSINFO_EHDR, current->thread.vdso_base);	\
  } while (0)
 
 /* PowerPC64 relocations defined by the ABIs */
Index: linux-work/include/asm-ppc64/time.h
===================================================================
--- linux-work.orig/include/asm-ppc64/time.h	2005-01-31 14:18:44.000000000 +1100
+++ linux-work/include/asm-ppc64/time.h	2005-01-31 16:25:56.000000000 +1100
@@ -43,10 +43,10 @@
 struct gettimeofday_vars {
 	unsigned long tb_to_xs;
 	unsigned long stamp_xsec;
+	unsigned long tb_orig_stamp;
 };
 
 struct gettimeofday_struct {
-	unsigned long tb_orig_stamp;
 	unsigned long tb_ticks_per_sec;
 	struct gettimeofday_vars vars[2];
 	struct gettimeofday_vars * volatile varp;
Index: linux-work/fs/binfmt_elf.c
===================================================================
--- linux-work.orig/fs/binfmt_elf.c	2005-01-31 14:18:24.000000000 +1100
+++ linux-work/fs/binfmt_elf.c	2005-01-31 16:25:56.000000000 +1100
@@ -772,6 +772,14 @@
 		goto out_free_dentry;
 	}
 	
+#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
+	retval = arch_setup_additional_pages(bprm, executable_stack);
+	if (retval < 0) {
+		send_sig(SIGKILL, current, 0);
+		goto out_free_dentry;
+	}
+#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
+
 	current->mm->start_stack = bprm->p;
 
 	/* Now we do a little grungy work by mmaping the ELF image into
Index: linux-work/include/asm-ppc64/page.h
===================================================================
--- linux-work.orig/include/asm-ppc64/page.h	2005-01-31 14:18:44.000000000 +1100
+++ linux-work/include/asm-ppc64/page.h	2005-01-31 16:25:56.000000000 +1100
@@ -185,6 +185,9 @@
 
 extern u64 ppc64_pft_size;		/* Log 2 of page table size */
 
+/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */
+#define __HAVE_ARCH_GATE_AREA		1
+
 #endif /* __ASSEMBLY__ */
 
 #ifdef MODULE
Index: linux-work/include/asm-ppc64/vdso.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/include/asm-ppc64/vdso.h	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,83 @@
+#ifndef __PPC64_VDSO_H__
+#define __PPC64_VDSO_H__
+
+#ifdef __KERNEL__
+
+/* Default link addresses for the vDSOs */
+#define VDSO32_LBASE	0
+#define VDSO64_LBASE	0
+
+/* Default map addresses */
+#define VDSO32_MBASE	0x100000
+#define VDSO64_MBASE	0x100000
+
+#define VDSO_VERSION_STRING	LINUX_2.6.11
+
+/* Define if 64 bits VDSO has procedure descriptors */
+#undef VDS64_HAS_DESCRIPTORS
+
+#ifndef __ASSEMBLY__
+
+extern unsigned int vdso64_pages;
+extern unsigned int vdso32_pages;
+
+/* Offsets relative to thread->vdso_base */
+extern unsigned long vdso64_rt_sigtramp;
+extern unsigned long vdso32_sigtramp;
+extern unsigned long vdso32_rt_sigtramp;
+
+extern void vdso_init(void);
+
+#else /* __ASSEMBLY__ */
+
+#ifdef __VDSO64__
+#ifdef VDS64_HAS_DESCRIPTORS
+#define V_FUNCTION_BEGIN(name)		\
+	.globl name;			\
+	.section ".opd","a";		\
+	.align 3;			\
+	name:				\
+	.quad .name,.TOC.@tocbase,0;	\
+	.previous;			\
+	.globl .name;			\
+	.type .name,@function;		\
+	.name:				\
+
+#define V_FUNCTION_END(name)		\
+	.size .name,.-.name;
+
+#define V_LOCAL_FUNC(name) (.name)
+
+#else /* VDS64_HAS_DESCRIPTORS */
+
+#define V_FUNCTION_BEGIN(name)		\
+	.globl name;			\
+	name:				\
+
+#define V_FUNCTION_END(name)		\
+	.size name,.-name;
+
+#define V_LOCAL_FUNC(name) (name)
+
+#endif /* VDS64_HAS_DESCRIPTORS */
+#endif /* __VDSO64__ */
+
+#ifdef __VDSO32__
+
+#define V_FUNCTION_BEGIN(name)		\
+	.globl name;			\
+	.type name,@function;		\
+	name:				\
+
+#define V_FUNCTION_END(name)		\
+	.size name,.-name;
+
+#define V_LOCAL_FUNC(name) (name)
+
+#endif /* __VDSO32__ */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __KERNEL__ */
+
+#endif /* __PPC64_VDSO_H__ */
Index: linux-work/arch/ppc64/mm/init.c
===================================================================
--- linux-work.orig/arch/ppc64/mm/init.c	2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/mm/init.c	2005-01-31 16:25:56.000000000 +1100
@@ -62,6 +62,7 @@
 #include <asm/system.h>
 #include <asm/iommu.h>
 #include <asm/abs_addr.h>
+#include <asm/vdso.h>
 
 int mem_init_done;
 unsigned long ioremap_bot = IMALLOC_BASE;
@@ -743,6 +744,8 @@
 #ifdef CONFIG_PPC_ISERIES
 	iommu_vio_init();
 #endif
+	/* Initialize the vDSO */
+	vdso_init();
 }
 
 /*
Index: linux-work/arch/ppc64/kernel/vdso32/gettimeofday.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso32/gettimeofday.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,139 @@
+/*
+ * Userland implementation of gettimeofday() for 32 bits processes in a
+ * ppc64 kernel for use in the vDSO
+ *
+ * Copyright (C) 2004 Benjamin Herrenschmidt (benh at kernel.crashing.org), IBM Corp.
+ *  
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <linux/config.h>
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+#include <asm/vdso.h>
+#include <asm/offsets.h>
+#include <asm/unistd.h>
+
+	.text
+/*
+ * Exact prototype of gettimeofday
+ *
+ * int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz);
+ *
+ */
+V_FUNCTION_BEGIN(__kernel_gettimeofday)
+  .cfi_startproc
+	mflr	r12
+  .cfi_register lr,r12
+
+	mr	r10,r3			/* r10 saves tv */
+	mr	r11,r4			/* r11 saves tz */
+	bl	__get_datapage@local	/* get data page */
+	mr	r9, r3			/* datapage ptr in r9 */
+	bl	__do_get_xsec@local	/* get xsec from tb & kernel */
+	bne-	2f			/* out of line -> do syscall */
+
+	/* seconds are xsec >> 20 */
+	rlwinm	r5,r4,12,20,31
+	rlwimi	r5,r3,12,0,19
+	stw	r5,TVAL32_TV_SEC(r10)
+
+	/* get remaining xsec and convert to usec. we scale
+	 * up remaining xsec by 12 bits and get the top 32 bits
+	 * of the multiplication
+	 */
+	rlwinm	r5,r4,12,0,19
+	lis	r6,1000000@h
+	ori	r6,r6,1000000@l
+	mulhwu	r5,r5,r6
+	stw	r5,TVAL32_TV_USEC(r10)
+
+	cmpli	cr0,r11,0		/* check if tz is NULL */
+	beq	1f
+	lwz	r4,CFG_TZ_MINUTEWEST(r9)/* fill tz */
+	lwz	r5,CFG_TZ_DSTTIME(r9)
+	stw	r4,TZONE_TZ_MINWEST(r11)
+	stw	r5,TZONE_TZ_DSTTIME(r11)
+
+1:	mtlr	r12
+	blr
+
+2:	mr	r3,r10
+	mr	r4,r11
+	li	r0,__NR_gettimeofday
+	sc
+	b	1b
+  .cfi_endproc
+V_FUNCTION_END(__kernel_gettimeofday)
+
+/*
+ * This is the core of gettimeofday(). It returns the xsec
+ * value in r3 & r4 and expects the datapage ptr (not clobbered)
+ * in r9. Clobbers r0,r4,r5,r6,r7,r8 and cr0.
+ */
+__do_get_xsec:
+  .cfi_startproc
+	/* Check for update count & load values. We use the low
+	 * order 32 bits of the update count
+	 */
+1:	lwz	r8,(CFG_TB_UPDATE_COUNT+4)(r9)
+	andi.	r0,r8,1			/* pending update ? loop */
+	bne-	1b
+	xor	r0,r8,r8		/* create dependency */
+	add	r9,r9,r0
+
+	/* Load orig stamp (offset to TB) */
+	lwz	r5,CFG_TB_ORIG_STAMP(r9)
+	lwz	r6,(CFG_TB_ORIG_STAMP+4)(r9)
+
+	/* Get a stable TB value */
+2:	mftbu	r3
+	mftbl	r4
+	mftbu	r0
+	cmpl	cr0,r3,r0
+	bne-	2b
+
+	/* Subtract the tb orig stamp. If the high part is non-zero, we jump to
+	 * the slow path, which calls the syscall. If it's ok, then we have our
+	 * 32-bit tb_ticks value in r7
+	 */
+	subfc	r7,r6,r4
+	subfe.	r0,r5,r3
+	bne-	3f
+
+	/* Load scale factor & do multiplication */
+	lwz	r5,CFG_TB_TO_XS(r9)	/* load values */
+	lwz	r6,(CFG_TB_TO_XS+4)(r9)
+	mulhwu	r4,r7,r5
+	mulhwu	r6,r7,r6
+	mullw	r0,r7,r5
+	addc	r6,r6,r0
+
+	/* At this point, we have the scaled xsec value in r4 + XER:CA
+	 * we load & add the stamp since epoch
+	 */
+	lwz	r5,CFG_STAMP_XSEC(r9)
+	lwz	r6,(CFG_STAMP_XSEC+4)(r9)
+	adde	r4,r4,r6
+	addze	r3,r5
+
+	/* We now have our result in r3,r4. We create a fake dependency
+	 * on that result and re-check the counter
+	 */
+	xor	r0,r4,r4
+	add	r9,r9,r0
+	lwz	r0,(CFG_TB_UPDATE_COUNT+4)(r9)
+	cmpl	cr0,r8,r0		/* check if updated */
+	bne-	1b
+
+	/* Warning! The caller expects CR:EQ to be set to indicate a
+	 * successful calculation (so it won't fall back to the syscall
+	 * method). We have overridden that CR bit in the counter check,
+	 * but fortunately, the loop exit condition _is_ CR:EQ set, so
+	 * we can exit safely here. If you change this code, be careful
+	 * of that side effect.
+	 */
+3:	blr
+  .cfi_endproc
Index: linux-work/arch/ppc64/kernel/vdso32/sigtramp.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso32/sigtramp.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,300 @@
+/*
+ * Signal trampolines for 32 bits processes in a ppc64 kernel for
+ * use in the vDSO
+ *
+ * Copyright (C) 2004 Benjamin Herrenschmidt (benh at kernel.crashing.org), IBM Corp.
+ * Copyright (C) 2004 Alan Modra (amodra at au.ibm.com), IBM Corp.
+ *  
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <linux/config.h>
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+#include <asm/unistd.h>
+#include <asm/vdso.h>
+
+	.text
+
+/* The nop here is a hack.  The dwarf2 unwind routines subtract 1 from
+   the return address to get an address in the middle of the presumed
+   call instruction.  Since we don't have a call here, we artificially
+   extend the range covered by the unwind info by adding a nop before
+   the real start.  */
+	nop
+V_FUNCTION_BEGIN(__kernel_sigtramp32)
+.Lsig_start = . - 4
+	li	r0,__NR_sigreturn
+	sc
+.Lsig_end:
+V_FUNCTION_END(__kernel_sigtramp32)
+
+.Lsigrt_start:
+	nop
+V_FUNCTION_BEGIN(__kernel_sigtramp_rt32)
+	li	r0,__NR_rt_sigreturn
+	sc
+.Lsigrt_end:
+V_FUNCTION_END(__kernel_sigtramp_rt32)
+
+	.section .eh_frame,"a",@progbits
+
+/* Register r1 can be found at offset 4 of a pt_regs structure.
+   A pointer to the pt_regs is stored in memory at the old sp plus PTREGS.  */
+#define cfa_save \
+  .byte 0x0f;			/* DW_CFA_def_cfa_expression */		\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x71; .sleb128 PTREGS;	/*     DW_OP_breg1 */			\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .byte 0x23; .uleb128 RSIZE;	/*     DW_OP_plus_uconst */		\
+  .byte 0x06;			/*     DW_OP_deref */			\
+9:
+
+/* Register REGNO can be found at offset OFS of a pt_regs structure.
+   A pointer to the pt_regs is stored in memory at the old sp plus PTREGS.  */
+#define rsave(regno, ofs) \
+  .byte 0x10;			/* DW_CFA_expression */			\
+  .uleb128 regno;		/*   regno */				\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x71; .sleb128 PTREGS;	/*     DW_OP_breg1 */			\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .ifne ofs;								\
+    .byte 0x23; .uleb128 ofs;	/*     DW_OP_plus_uconst */		\
+  .endif;								\
+9:
+
+/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16
+   of the VMX reg struct.  The VMX reg struct is at offset VREGS of
+   the pt_regs struct.  This macro is for REGNO == 0, and contains
+   'subroutines' that the other macros jump to.  */
+#define vsave_msr0(regno) \
+  .byte 0x10;			/* DW_CFA_expression */			\
+  .uleb128 regno + 77;		/*   regno */				\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x30 + regno;		/*     DW_OP_lit0 */			\
+2:									\
+  .byte 0x40;			/*     DW_OP_lit16 */			\
+  .byte 0x1e;			/*     DW_OP_mul */			\
+3:									\
+  .byte 0x71; .sleb128 PTREGS;	/*     DW_OP_breg1 */			\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .byte 0x12;			/*     DW_OP_dup */			\
+  .byte 0x23;			/*     DW_OP_plus_uconst */		\
+    .uleb128 33*RSIZE;		/*       msr offset */			\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .byte 0x0c; .long 1 << 25;	/*     DW_OP_const4u */			\
+  .byte 0x1a;			/*     DW_OP_and */			\
+  .byte 0x12;			/*     DW_OP_dup, ret 0 if bra taken */	\
+  .byte 0x30;			/*     DW_OP_lit0 */			\
+  .byte 0x29;			/*     DW_OP_eq */			\
+  .byte 0x28; .short 0x7fff;	/*     DW_OP_bra to end */		\
+  .byte 0x13;			/*     DW_OP_drop, pop the 0 */		\
+  .byte 0x23; .uleb128 VREGS;	/*     DW_OP_plus_uconst */		\
+  .byte 0x22;			/*     DW_OP_plus */			\
+  .byte 0x2f; .short 0x7fff;	/*     DW_OP_skip to end */		\
+9:
+
+/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16
+   of the VMX reg struct.  REGNO is 1 thru 31.  */
+#define vsave_msr1(regno) \
+  .byte 0x10;			/* DW_CFA_expression */			\
+  .uleb128 regno + 77;		/*   regno */				\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x30 + regno;		/*     DW_OP_lit n */			\
+  .byte 0x2f; .short 2b - 9f;	/*     DW_OP_skip */			\
+9:
+
+/* If msr bit 1<<25 is set, then VMX register REGNO is at offset OFS of
+   the VMX save block.  */
+#define vsave_msr2(regno, ofs) \
+  .byte 0x10;			/* DW_CFA_expression */			\
+  .uleb128 regno + 77;		/*   regno */				\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x0a; .short ofs;	/*     DW_OP_const2u */			\
+  .byte 0x2f; .short 3b - 9f;	/*     DW_OP_skip */			\
+9:
+
+/* VMX register REGNO is at offset OFS of the VMX save area.  */
+#define vsave(regno, ofs) \
+  .byte 0x10;			/* DW_CFA_expression */			\
+  .uleb128 regno + 77;		/*   regno */				\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x71; .sleb128 PTREGS;	/*     DW_OP_breg1 */			\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .byte 0x23; .uleb128 VREGS;	/*     DW_OP_plus_uconst */		\
+  .byte 0x23; .uleb128 ofs;	/*     DW_OP_plus_uconst */		\
+9:
+
+/* This is where the pt_regs pointer can be found on the stack.  */
+#define PTREGS 64+28
+
+/* Size of regs.  */
+#define RSIZE 4
+
+/* This is the offset of the VMX regs.  */
+#define VREGS 48*RSIZE+34*8
+
+/* Describe where general purpose regs are saved.  */
+#define EH_FRAME_GEN \
+  cfa_save;								\
+  rsave ( 0,  0*RSIZE);							\
+  rsave ( 2,  2*RSIZE);							\
+  rsave ( 3,  3*RSIZE);							\
+  rsave ( 4,  4*RSIZE);							\
+  rsave ( 5,  5*RSIZE);							\
+  rsave ( 6,  6*RSIZE);							\
+  rsave ( 7,  7*RSIZE);							\
+  rsave ( 8,  8*RSIZE);							\
+  rsave ( 9,  9*RSIZE);							\
+  rsave (10, 10*RSIZE);							\
+  rsave (11, 11*RSIZE);							\
+  rsave (12, 12*RSIZE);							\
+  rsave (13, 13*RSIZE);							\
+  rsave (14, 14*RSIZE);							\
+  rsave (15, 15*RSIZE);							\
+  rsave (16, 16*RSIZE);							\
+  rsave (17, 17*RSIZE);							\
+  rsave (18, 18*RSIZE);							\
+  rsave (19, 19*RSIZE);							\
+  rsave (20, 20*RSIZE);							\
+  rsave (21, 21*RSIZE);							\
+  rsave (22, 22*RSIZE);							\
+  rsave (23, 23*RSIZE);							\
+  rsave (24, 24*RSIZE);							\
+  rsave (25, 25*RSIZE);							\
+  rsave (26, 26*RSIZE);							\
+  rsave (27, 27*RSIZE);							\
+  rsave (28, 28*RSIZE);							\
+  rsave (29, 29*RSIZE);							\
+  rsave (30, 30*RSIZE);							\
+  rsave (31, 31*RSIZE);							\
+  rsave (67, 32*RSIZE);		/* ap, used as temp for nip */		\
+  rsave (65, 36*RSIZE);		/* lr */				\
+  rsave (70, 38*RSIZE)		/* cr */
+
+/* Describe where the FP regs are saved.  */
+#define EH_FRAME_FP \
+  rsave (32, 48*RSIZE +  0*8);						\
+  rsave (33, 48*RSIZE +  1*8);						\
+  rsave (34, 48*RSIZE +  2*8);						\
+  rsave (35, 48*RSIZE +  3*8);						\
+  rsave (36, 48*RSIZE +  4*8);						\
+  rsave (37, 48*RSIZE +  5*8);						\
+  rsave (38, 48*RSIZE +  6*8);						\
+  rsave (39, 48*RSIZE +  7*8);						\
+  rsave (40, 48*RSIZE +  8*8);						\
+  rsave (41, 48*RSIZE +  9*8);						\
+  rsave (42, 48*RSIZE + 10*8);						\
+  rsave (43, 48*RSIZE + 11*8);						\
+  rsave (44, 48*RSIZE + 12*8);						\
+  rsave (45, 48*RSIZE + 13*8);						\
+  rsave (46, 48*RSIZE + 14*8);						\
+  rsave (47, 48*RSIZE + 15*8);						\
+  rsave (48, 48*RSIZE + 16*8);						\
+  rsave (49, 48*RSIZE + 17*8);						\
+  rsave (50, 48*RSIZE + 18*8);						\
+  rsave (51, 48*RSIZE + 19*8);						\
+  rsave (52, 48*RSIZE + 20*8);						\
+  rsave (53, 48*RSIZE + 21*8);						\
+  rsave (54, 48*RSIZE + 22*8);						\
+  rsave (55, 48*RSIZE + 23*8);						\
+  rsave (56, 48*RSIZE + 24*8);						\
+  rsave (57, 48*RSIZE + 25*8);						\
+  rsave (58, 48*RSIZE + 26*8);						\
+  rsave (59, 48*RSIZE + 27*8);						\
+  rsave (60, 48*RSIZE + 28*8);						\
+  rsave (61, 48*RSIZE + 29*8);						\
+  rsave (62, 48*RSIZE + 30*8);						\
+  rsave (63, 48*RSIZE + 31*8)
+
+/* Describe where the VMX regs are saved.  */
+#ifdef CONFIG_ALTIVEC
+#define EH_FRAME_VMX \
+  vsave_msr0 ( 0);							\
+  vsave_msr1 ( 1);							\
+  vsave_msr1 ( 2);							\
+  vsave_msr1 ( 3);							\
+  vsave_msr1 ( 4);							\
+  vsave_msr1 ( 5);							\
+  vsave_msr1 ( 6);							\
+  vsave_msr1 ( 7);							\
+  vsave_msr1 ( 8);							\
+  vsave_msr1 ( 9);							\
+  vsave_msr1 (10);							\
+  vsave_msr1 (11);							\
+  vsave_msr1 (12);							\
+  vsave_msr1 (13);							\
+  vsave_msr1 (14);							\
+  vsave_msr1 (15);							\
+  vsave_msr1 (16);							\
+  vsave_msr1 (17);							\
+  vsave_msr1 (18);							\
+  vsave_msr1 (19);							\
+  vsave_msr1 (20);							\
+  vsave_msr1 (21);							\
+  vsave_msr1 (22);							\
+  vsave_msr1 (23);							\
+  vsave_msr1 (24);							\
+  vsave_msr1 (25);							\
+  vsave_msr1 (26);							\
+  vsave_msr1 (27);							\
+  vsave_msr1 (28);							\
+  vsave_msr1 (29);							\
+  vsave_msr1 (30);							\
+  vsave_msr1 (31);							\
+  vsave_msr2 (33, 32*16+12);						\
+  vsave      (32, 32*16)
+#else
+#define EH_FRAME_VMX
+#endif
+
+.Lcie:
+	.long .Lcie_end - .Lcie_start
+.Lcie_start:
+	.long 0			/* CIE ID */
+	.byte 1			/* Version number */
+	.string "zR"		/* NUL-terminated augmentation string */
+	.uleb128 4		/* Code alignment factor */
+	.sleb128 -4		/* Data alignment factor */
+	.byte 67		/* Return address register column, ap */
+	.uleb128 1		/* Augmentation value length */
+	.byte 0x1b		/* DW_EH_PE_pcrel | DW_EH_PE_sdata4. */
+	.byte 0x0c,1,0		/* DW_CFA_def_cfa: r1 ofs 0 */
+	.balign 4
+.Lcie_end:
+
+	.long .Lfde0_end - .Lfde0_start
+.Lfde0_start:
+	.long .Lfde0_start - .Lcie	/* CIE pointer. */
+	.long .Lsig_start - .		/* PC start, length */
+	.long .Lsig_end - .Lsig_start
+	.uleb128 0			/* Augmentation */
+	EH_FRAME_GEN
+	EH_FRAME_FP
+	EH_FRAME_VMX
+	.balign 4
+.Lfde0_end:
+
+/* We have a different stack layout for rt_sigreturn.  */
+#undef PTREGS
+#define PTREGS 64+16+128+20+28
+
+	.long .Lfde1_end - .Lfde1_start
+.Lfde1_start:
+	.long .Lfde1_start - .Lcie	/* CIE pointer. */
+	.long .Lsigrt_start - .		/* PC start, length */
+	.long .Lsigrt_end - .Lsigrt_start
+	.uleb128 0			/* Augmentation */
+	EH_FRAME_GEN
+	EH_FRAME_FP
+	EH_FRAME_VMX
+	.balign 4
+.Lfde1_end:
Index: linux-work/arch/ppc64/kernel/vdso32/vdso32_wrapper.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso32/vdso32_wrapper.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,12 @@
+#include <linux/init.h>
+
+	.section ".data"
+
+	.globl vdso32_start, vdso32_end
+	.balign 4096
+vdso32_start:
+	.incbin "arch/ppc64/kernel/vdso32/vdso32.so"
+	.balign 4096
+vdso32_end:
+
+	.previous
Index: linux-work/arch/ppc64/kernel/vdso64/vdso64.lds.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso64/vdso64.lds.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,110 @@
+/*
+ * This is the infamous ld script for the 64-bit vDSO
+ * library
+ */
+#include <asm/vdso.h>
+
+OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
+OUTPUT_ARCH(powerpc:common64)
+ENTRY(_start)
+
+SECTIONS
+{
+  . = VDSO64_LBASE + SIZEOF_HEADERS;
+  .hash           : { *(.hash) }		:text
+  .dynsym         : { *(.dynsym) }
+  .dynstr         : { *(.dynstr) }
+  .gnu.version    : { *(.gnu.version) }
+  .gnu.version_d  : { *(.gnu.version_d) }
+  .gnu.version_r  : { *(.gnu.version_r) }
+
+  . = ALIGN (16);
+  .text           :
+  {
+    *(.text .stub .text.* .gnu.linkonce.t.*)
+    *(.sfpr .glink)
+  }
+  PROVIDE (__etext = .);
+  PROVIDE (_etext = .);
+  PROVIDE (etext = .);
+ 
+  /* Other stuff is appended to the text segment: */
+  .rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
+  .rodata1        : { *(.rodata1) }
+  .eh_frame_hdr   : { *(.eh_frame_hdr) }	:text	:eh_frame_hdr
+  .eh_frame       : { KEEP (*(.eh_frame)) }	:text
+  .gcc_except_table   : { *(.gcc_except_table) }
+
+  .opd           ALIGN(8) : { KEEP (*(.opd)) }
+  .got		 ALIGN(8) : { *(.got .toc) }
+  .rela.dyn	 ALIGN(8) : { *(.rela.dyn) }
+
+  .dynamic        : { *(.dynamic) }		:text	:dynamic
+
+  _end = .;
+  PROVIDE (end = .);
+
+  /* Stabs debugging sections are here too
+   */
+  .stab          0 : { *(.stab) }
+  .stabstr       0 : { *(.stabstr) }
+  .stab.excl     0 : { *(.stab.excl) }
+  .stab.exclstr  0 : { *(.stab.exclstr) }
+  .stab.index    0 : { *(.stab.index) }
+  .stab.indexstr 0 : { *(.stab.indexstr) }
+  .comment       0 : { *(.comment) }
+  /* DWARF debug sections.
+     Symbols in the DWARF debugging sections are relative to the beginning
+     of the section so we begin them at 0.  */
+  /* DWARF 1 */
+  .debug          0 : { *(.debug) }
+  .line           0 : { *(.line) }
+  /* GNU DWARF 1 extensions */
+  .debug_srcinfo  0 : { *(.debug_srcinfo) }
+  .debug_sfnames  0 : { *(.debug_sfnames) }
+  /* DWARF 1.1 and DWARF 2 */
+  .debug_aranges  0 : { *(.debug_aranges) }
+  .debug_pubnames 0 : { *(.debug_pubnames) }
+  /* DWARF 2 */
+  .debug_info     0 : { *(.debug_info .gnu.linkonce.wi.*) }
+  .debug_abbrev   0 : { *(.debug_abbrev) }
+  .debug_line     0 : { *(.debug_line) }
+  .debug_frame    0 : { *(.debug_frame) }
+  .debug_str      0 : { *(.debug_str) }
+  .debug_loc      0 : { *(.debug_loc) }
+  .debug_macinfo  0 : { *(.debug_macinfo) }
+  /* SGI/MIPS DWARF 2 extensions */
+  .debug_weaknames 0 : { *(.debug_weaknames) }
+  .debug_funcnames 0 : { *(.debug_funcnames) }
+  .debug_typenames 0 : { *(.debug_typenames) }
+  .debug_varnames  0 : { *(.debug_varnames) }
+
+  /DISCARD/ : { *(.note.GNU-stack) }
+  /DISCARD/ : { *(.branch_lt) }
+  /DISCARD/ : { *(.data .data.* .gnu.linkonce.d.*) }
+  /DISCARD/ : { *(.bss .sbss .dynbss .dynsbss) }
+}
+
+PHDRS
+{
+  text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */
+  dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
+  eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't recognize the name */
+}
+
+/*
+ * This controls what symbols we export from the DSO.
+ */
+VERSION
+{
+  VDSO_VERSION_STRING {
+    global:
+	__kernel_datapage_offset; /* Has to be there for the kernel to find it */
+	__kernel_get_syscall_map;
+	__kernel_gettimeofday;
+	__kernel_sync_dicache;
+	__kernel_sync_dicache_p5;
+	__kernel_sigtramp_rt64;
+    local: *;
+  };
+}
Index: linux-work/arch/ppc64/kernel/vdso64/vdso64_wrapper.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso64/vdso64_wrapper.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,12 @@
+#include <linux/init.h>
+
+	.section ".data"
+
+	.globl vdso64_start, vdso64_end
+	.balign 4096
+vdso64_start:
+	.incbin "arch/ppc64/kernel/vdso64/vdso64.so"
+	.balign 4096
+vdso64_end:
+
+	.previous
Index: linux-work/arch/ppc64/kernel/vdso32/datapage.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso32/datapage.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,68 @@
+/*
+ * Access to the shared data page by the vDSO & syscall map
+ *
+ * Copyright (C) 2004 Benjamin Herrenschmidt (benh at kernel.crashing.org), IBM Corp.
+ *  
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/config.h>
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+#include <asm/offsets.h>
+#include <asm/unistd.h>
+#include <asm/vdso.h>
+
+	.text
+V_FUNCTION_BEGIN(__get_datapage)
+  .cfi_startproc
+	/* We don't want this symbol exposed or overridable, as we want
+	 * other objects to be able to bl directly to it
+	 */
+	.protected __get_datapage
+	.hidden __get_datapage
+
+	mflr	r0
+  .cfi_register lr,r0
+
+	bcl	20,31,1f
+	.global	__kernel_datapage_offset;
+__kernel_datapage_offset:
+	.long	0
+1:
+	mflr	r3
+	mtlr	r0
+	lwz	r0,0(r3)
+	add	r3,r0,r3
+	blr
+  .cfi_endproc
+V_FUNCTION_END(__get_datapage)
+
+/*
+ * void *__kernel_get_syscall_map(unsigned int *syscall_count);
+ *
+ * Returns a pointer to the syscall map. Unlike kernel bitops, the map
+ * does not depend on the size of "long": bits are stored from top to
+ * bottom, so memory holds a linear bitmap. To check for syscall N, test
+ * bit (0x80000000 >> (N & 0x1f)) of the 32-bit int at index N >> 5.
+ * If syscall_count is non-NULL, __NR_syscalls is stored through it.
+ */
+V_FUNCTION_BEGIN(__kernel_get_syscall_map)
+  .cfi_startproc
+	mflr	r12
+  .cfi_register lr,r12
+
+	mr	r4,r3
+	bl	__get_datapage@local
+	mtlr	r12
+	addi	r3,r3,CFG_SYSCALL_MAP32
+	cmpli	cr0,r4,0
+	beqlr
+	li	r0,__NR_syscalls
+	stw	r0,0(r4)
+	blr
+  .cfi_endproc
+V_FUNCTION_END(__kernel_get_syscall_map)
Index: linux-work/arch/ppc64/kernel/vdso32/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso32/Makefile	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,50 @@
+# Choose compiler
+#
+# XXX FIXME: We probably want to enforce using a biarch compiler by default
+#             and thus use (CC) with -m64, while letting the user pass a
+#             CROSS32_COMPILE prefix if wanted. Same goes for the zImage
+#             wrappers
+#
+
+CROSS32_COMPILE ?=
+
+CROSS32CC		:= $(CROSS32_COMPILE)gcc
+CROSS32AS		:= $(CROSS32_COMPILE)as
+
+# List of files in the vDSO; these must be assembly only for now
+
+src-vdso32 = sigtramp.S gettimeofday.S datapage.S cacheflush.S
+
+# Build rules
+
+obj-vdso32 := $(addsuffix .o, $(basename $(src-vdso32)))
+targets := $(obj-vdso32) vdso32.so
+obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
+src-vdso32 := $(addprefix $(src)/, $(src-vdso32))
+
+
+EXTRA_CFLAGS := -shared -s -fno-common -fno-builtin
+EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso32.so.1
+EXTRA_AFLAGS := -D__VDSO32__ -s
+
+obj-y += vdso32_wrapper.o
+extra-y += vdso32.lds
+CPPFLAGS_vdso32.lds += -P -C -U$(ARCH)
+
+# Force dependency: make can't see files pulled in by .incbin
+$(obj)/vdso32_wrapper.o : $(obj)/vdso32.so
+
+# Link rule for the .so file; the .lds must be listed first, as $^ follows -Wl,-T
+$(obj)/vdso32.so: $(src)/vdso32.lds $(obj-vdso32)
+	$(call if_changed,vdso32ld)
+
+# assembly rules for the .S files
+$(obj-vdso32): %.o: %.S
+	$(call if_changed_dep,vdso32as)
+
+# actual build commands
+quiet_cmd_vdso32ld = VDSO32L $@
+      cmd_vdso32ld = $(CROSS32CC) $(c_flags) -Wl,-T $^ -o $@
+quiet_cmd_vdso32as = VDSO32A $@
+      cmd_vdso32as = $(CROSS32CC) $(a_flags) -c -o $@ $<
+
Index: linux-work/arch/ppc64/kernel/vdso64/gettimeofday.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso64/gettimeofday.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,91 @@
+/*
+ * Userland implementation of gettimeofday() for 64-bit processes in a
+ * ppc64 kernel for use in the vDSO
+ *
+ * Copyright (C) 2004 Benjamin Herrenschmidt (benh at kernel.crashing.org),
+ *                    IBM Corp.
+ *  
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <linux/config.h>
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+#include <asm/vdso.h>
+#include <asm/offsets.h>
+
+	.text
+/*
+ * Exact prototype of gettimeofday
+ *
+ * int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz);
+ *
+ */
+V_FUNCTION_BEGIN(__kernel_gettimeofday)
+  .cfi_startproc
+	mflr	r12
+  .cfi_register lr,r12
+
+	mr	r11,r3			/* r11 holds tv */
+	mr	r10,r4			/* r10 holds tz */
+	bl	V_LOCAL_FUNC(__get_datapage)		/* get data page */
+	bl	V_LOCAL_FUNC(__do_get_xsec)		/* get xsec from tb & kernel */
+	lis     r7,15			/* r7 = 1000000 = USEC_PER_SEC */
+	ori     r7,r7,16960
+	rldicl  r5,r4,44,20		/* r5 = sec = xsec / XSEC_PER_SEC */
+	rldicr  r6,r5,20,43		/* r6 = sec * XSEC_PER_SEC */
+	std	r5,TVAL64_TV_SEC(r11)	/* store sec in tv */
+	subf	r0,r6,r4		/* r0 = xsec = (xsec - r6) */
+	mulld   r0,r0,r7		/* usec = (xsec * USEC_PER_SEC) / XSEC_PER_SEC */
+	rldicl  r0,r0,44,20
+	cmpldi	cr0,r10,0		/* check if tz is NULL */
+	std	r0,TVAL64_TV_USEC(r11)	/* store usec in tv */
+	beq	1f
+	lwz	r4,CFG_TZ_MINUTEWEST(r3)	/* fill tz */
+	lwz	r5,CFG_TZ_DSTTIME(r3)
+	stw	r4,TZONE_TZ_MINWEST(r10)
+	stw	r5,TZONE_TZ_DSTTIME(r10)
+1:	mtlr	r12
+	li	r3,0			/* always success */
+	blr
+  .cfi_endproc
+V_FUNCTION_END(__kernel_gettimeofday)
+
+
+/*
+ * This is the core of gettimeofday(). It returns the xsec
+ * value in r4 and expects the datapage pointer (not clobbered)
+ * in r3. Clobbers r0, r4, r5, r6, r7, r8 and r9.
+ */
+V_FUNCTION_BEGIN(__do_get_xsec)
+  .cfi_startproc
+	/* check for update count & load values */
+1:	ld	r7,CFG_TB_UPDATE_COUNT(r3)
+	andi.	r0,r7,1			/* pending update ? loop */
+	bne-	1b
+	xor	r0,r7,r7		/* create dependency */
+	add	r3,r3,r0
+
+	/* Get TB & offset it */
+	mftb	r8
+	ld	r9,CFG_TB_ORIG_STAMP(r3)
+	subf	r8,r9,r8
+
+	/* Scale result */
+	ld	r5,CFG_TB_TO_XS(r3)
+	mulhdu	r8,r8,r5
+
+	/* Add stamp since epoch */
+	ld	r6,CFG_STAMP_XSEC(r3)
+	add	r4,r6,r8
+
+	xor	r0,r4,r4
+	add	r3,r3,r0
+	ld	r0,CFG_TB_UPDATE_COUNT(r3)
+	cmpld	cr0,r0,r7		/* check if updated */
+	bne-	1b
+	blr
+  .cfi_endproc
+V_FUNCTION_END(__do_get_xsec)
Index: linux-work/arch/ppc64/kernel/vdso64/datapage.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso64/datapage.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,68 @@
+/*
+ * Access to the shared data page by the vDSO & syscall map
+ *
+ * Copyright (C) 2004 Benjamin Herrenschmidt (benh at kernel.crashing.org), IBM Corp.
+ *  
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/config.h>
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+#include <asm/offsets.h>
+#include <asm/unistd.h>
+#include <asm/vdso.h>
+
+	.text
+V_FUNCTION_BEGIN(__get_datapage)
+  .cfi_startproc
+	/* We don't want this symbol exposed or overridable, as we want
+	 * other objects to be able to bl directly to it
+	 */
+	.protected __get_datapage
+	.hidden __get_datapage
+
+	mflr	r0
+  .cfi_register lr,r0
+
+	bcl	20,31,1f
+	.global	__kernel_datapage_offset;
+__kernel_datapage_offset:
+	.long	0
+1:
+	mflr	r3
+	mtlr	r0
+	lwz	r0,0(r3)
+	add	r3,r0,r3
+	blr
+  .cfi_endproc
+V_FUNCTION_END(__get_datapage)
+
+/*
+ * void *__kernel_get_syscall_map(unsigned int *syscall_count);
+ *
+ * Returns a pointer to the syscall map. Unlike kernel bitops, the map
+ * does not depend on the size of "long": bits are stored from top to
+ * bottom, so memory holds a linear bitmap. To check for syscall N, test
+ * bit (0x80000000 >> (N & 0x1f)) of the 32-bit int at index N >> 5.
+ * If syscall_count is non-NULL, __NR_syscalls is stored through it.
+ */
+V_FUNCTION_BEGIN(__kernel_get_syscall_map)
+  .cfi_startproc
+	mflr	r12
+  .cfi_register lr,r12
+
+	mr	r4,r3
+	bl	V_LOCAL_FUNC(__get_datapage)
+	mtlr	r12
+	addi	r3,r3,CFG_SYSCALL_MAP64
+	cmpli	cr0,r4,0
+	beqlr
+	li	r0,__NR_syscalls
+	stw	r0,0(r4)
+	blr
+  .cfi_endproc
+V_FUNCTION_END(__kernel_get_syscall_map)
Index: linux-work/arch/ppc64/kernel/vdso64/sigtramp.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso64/sigtramp.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,294 @@
+/*
+ * Signal trampoline for 64-bit processes in a ppc64 kernel for
+ * use in the vDSO
+ *
+ * Copyright (C) 2004 Benjamin Herrenschmidt (benh at kernel.crashing.org), IBM Corp.
+ * Copyright (C) 2004 Alan Modra (amodra at au.ibm.com), IBM Corp.
+ *  
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <linux/config.h>
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+#include <asm/unistd.h>
+#include <asm/vdso.h>
+	
+	.text
+
+/* The nop here is a hack.  The dwarf2 unwind routines subtract 1 from
+   the return address to get an address in the middle of the presumed
+   call instruction.  Since we don't have a call here, we artificially
+   extend the range covered by the unwind info by padding before the
+   real start.  */
+	nop
+	.balign 8
+V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
+.Lsigrt_start = . - 4
+	addi	r1, r1, __SIGNAL_FRAMESIZE
+	li	r0,__NR_rt_sigreturn
+	sc
+.Lsigrt_end:
+V_FUNCTION_END(__kernel_sigtramp_rt64)
+/* The ".balign 8" above and the following zeros mimic the old stack
+   trampoline layout.  The last magic value is the ucontext pointer,
+   chosen in such a way that older libgcc unwind code returns a zero
+   for a sigcontext pointer.  */
+	.long 0,0,0
+	.quad 0,-21*8
+
+/* Register r1 can be found at offset 8 of a pt_regs structure.
+   A pointer to the pt_regs is stored in memory at the old sp plus PTREGS.  */
+#define cfa_save \
+  .byte 0x0f;			/* DW_CFA_def_cfa_expression */		\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x71; .sleb128 PTREGS;	/*     DW_OP_breg1 */			\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .byte 0x23; .uleb128 RSIZE;	/*     DW_OP_plus_uconst */		\
+  .byte 0x06;			/*     DW_OP_deref */			\
+9:
+
+/* Register REGNO can be found at offset OFS of a pt_regs structure.
+   A pointer to the pt_regs is stored in memory at the old sp plus PTREGS.  */
+#define rsave(regno, ofs) \
+  .byte 0x10;			/* DW_CFA_expression */			\
+  .uleb128 regno;		/*   regno */				\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x71; .sleb128 PTREGS;	/*     DW_OP_breg1 */			\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .ifne ofs;								\
+    .byte 0x23; .uleb128 ofs;	/*     DW_OP_plus_uconst */		\
+  .endif;								\
+9:
+
+/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16
+   of the VMX reg struct.  A pointer to the VMX reg struct is at VREGS in
+   the pt_regs struct.  This macro is for REGNO == 0, and contains
+   'subroutines' that the other macros jump to.  */
+#define vsave_msr0(regno) \
+  .byte 0x10;			/* DW_CFA_expression */			\
+  .uleb128 regno + 77;		/*   regno */				\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x30 + regno;		/*     DW_OP_lit0 */			\
+2:									\
+  .byte 0x40;			/*     DW_OP_lit16 */			\
+  .byte 0x1e;			/*     DW_OP_mul */			\
+3:									\
+  .byte 0x71; .sleb128 PTREGS;	/*     DW_OP_breg1 */			\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .byte 0x12;			/*     DW_OP_dup */			\
+  .byte 0x23;			/*     DW_OP_plus_uconst */		\
+    .uleb128 33*RSIZE;		/*       msr offset */			\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .byte 0x0c; .long 1 << 25;	/*     DW_OP_const4u */			\
+  .byte 0x1a;			/*     DW_OP_and */			\
+  .byte 0x12;			/*     DW_OP_dup, ret 0 if bra taken */	\
+  .byte 0x30;			/*     DW_OP_lit0 */			\
+  .byte 0x29;			/*     DW_OP_eq */			\
+  .byte 0x28; .short 0x7fff;	/*     DW_OP_bra to end */		\
+  .byte 0x13;			/*     DW_OP_drop, pop the 0 */		\
+  .byte 0x23; .uleb128 VREGS;	/*     DW_OP_plus_uconst */		\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .byte 0x22;			/*     DW_OP_plus */			\
+  .byte 0x2f; .short 0x7fff;	/*     DW_OP_skip to end */		\
+9:
+
+/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16
+   of the VMX reg struct.  REGNO is 1 thru 31.  */
+#define vsave_msr1(regno) \
+  .byte 0x10;			/* DW_CFA_expression */			\
+  .uleb128 regno + 77;		/*   regno */				\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x30 + regno;		/*     DW_OP_lit n */			\
+  .byte 0x2f; .short 2b - 9f;	/*     DW_OP_skip */			\
+9:
+
+/* If msr bit 1<<25 is set, then VMX register REGNO is at offset OFS of
+   the VMX save block.  */
+#define vsave_msr2(regno, ofs) \
+  .byte 0x10;			/* DW_CFA_expression */			\
+  .uleb128 regno + 77;		/*   regno */				\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x0a; .short ofs;	/*     DW_OP_const2u */			\
+  .byte 0x2f; .short 3b - 9f;	/*     DW_OP_skip */			\
+9:
+
+/* VMX register REGNO is at offset OFS of the VMX save area.  */
+#define vsave(regno, ofs) \
+  .byte 0x10;			/* DW_CFA_expression */			\
+  .uleb128 regno + 77;		/*   regno */				\
+  .uleb128 9f - 1f;		/*   length */				\
+1:									\
+  .byte 0x71; .sleb128 PTREGS;	/*     DW_OP_breg1 */			\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .byte 0x23; .uleb128 VREGS;	/*     DW_OP_plus_uconst */		\
+  .byte 0x06;			/*     DW_OP_deref */			\
+  .byte 0x23; .uleb128 ofs;	/*     DW_OP_plus_uconst */		\
+9:
+
+/* This is where the pt_regs pointer can be found on the stack.  */
+#define PTREGS 128+168+56
+
+/* Size of regs.  */
+#define RSIZE 8
+
+/* This is the offset of the VMX reg pointer.  */
+#define VREGS 48*RSIZE+33*8
+
+/* Describe where general purpose regs are saved.  */
+#define EH_FRAME_GEN \
+  cfa_save;								\
+  rsave ( 0,  0*RSIZE);							\
+  rsave ( 2,  2*RSIZE);							\
+  rsave ( 3,  3*RSIZE);							\
+  rsave ( 4,  4*RSIZE);							\
+  rsave ( 5,  5*RSIZE);							\
+  rsave ( 6,  6*RSIZE);							\
+  rsave ( 7,  7*RSIZE);							\
+  rsave ( 8,  8*RSIZE);							\
+  rsave ( 9,  9*RSIZE);							\
+  rsave (10, 10*RSIZE);							\
+  rsave (11, 11*RSIZE);							\
+  rsave (12, 12*RSIZE);							\
+  rsave (13, 13*RSIZE);							\
+  rsave (14, 14*RSIZE);							\
+  rsave (15, 15*RSIZE);							\
+  rsave (16, 16*RSIZE);							\
+  rsave (17, 17*RSIZE);							\
+  rsave (18, 18*RSIZE);							\
+  rsave (19, 19*RSIZE);							\
+  rsave (20, 20*RSIZE);							\
+  rsave (21, 21*RSIZE);							\
+  rsave (22, 22*RSIZE);							\
+  rsave (23, 23*RSIZE);							\
+  rsave (24, 24*RSIZE);							\
+  rsave (25, 25*RSIZE);							\
+  rsave (26, 26*RSIZE);							\
+  rsave (27, 27*RSIZE);							\
+  rsave (28, 28*RSIZE);							\
+  rsave (29, 29*RSIZE);							\
+  rsave (30, 30*RSIZE);							\
+  rsave (31, 31*RSIZE);							\
+  rsave (67, 32*RSIZE);		/* ap, used as temp for nip */		\
+  rsave (65, 36*RSIZE);		/* lr */				\
+  rsave (70, 38*RSIZE)		/* cr */
+
+/* Describe where the FP regs are saved.  */
+#define EH_FRAME_FP \
+  rsave (32, 48*RSIZE +  0*8);						\
+  rsave (33, 48*RSIZE +  1*8);						\
+  rsave (34, 48*RSIZE +  2*8);						\
+  rsave (35, 48*RSIZE +  3*8);						\
+  rsave (36, 48*RSIZE +  4*8);						\
+  rsave (37, 48*RSIZE +  5*8);						\
+  rsave (38, 48*RSIZE +  6*8);						\
+  rsave (39, 48*RSIZE +  7*8);						\
+  rsave (40, 48*RSIZE +  8*8);						\
+  rsave (41, 48*RSIZE +  9*8);						\
+  rsave (42, 48*RSIZE + 10*8);						\
+  rsave (43, 48*RSIZE + 11*8);						\
+  rsave (44, 48*RSIZE + 12*8);						\
+  rsave (45, 48*RSIZE + 13*8);						\
+  rsave (46, 48*RSIZE + 14*8);						\
+  rsave (47, 48*RSIZE + 15*8);						\
+  rsave (48, 48*RSIZE + 16*8);						\
+  rsave (49, 48*RSIZE + 17*8);						\
+  rsave (50, 48*RSIZE + 18*8);						\
+  rsave (51, 48*RSIZE + 19*8);						\
+  rsave (52, 48*RSIZE + 20*8);						\
+  rsave (53, 48*RSIZE + 21*8);						\
+  rsave (54, 48*RSIZE + 22*8);						\
+  rsave (55, 48*RSIZE + 23*8);						\
+  rsave (56, 48*RSIZE + 24*8);						\
+  rsave (57, 48*RSIZE + 25*8);						\
+  rsave (58, 48*RSIZE + 26*8);						\
+  rsave (59, 48*RSIZE + 27*8);						\
+  rsave (60, 48*RSIZE + 28*8);						\
+  rsave (61, 48*RSIZE + 29*8);						\
+  rsave (62, 48*RSIZE + 30*8);						\
+  rsave (63, 48*RSIZE + 31*8)
+
+/* Describe where the VMX regs are saved.  */
+#ifdef CONFIG_ALTIVEC
+#define EH_FRAME_VMX \
+  vsave_msr0 ( 0);							\
+  vsave_msr1 ( 1);							\
+  vsave_msr1 ( 2);							\
+  vsave_msr1 ( 3);							\
+  vsave_msr1 ( 4);							\
+  vsave_msr1 ( 5);							\
+  vsave_msr1 ( 6);							\
+  vsave_msr1 ( 7);							\
+  vsave_msr1 ( 8);							\
+  vsave_msr1 ( 9);							\
+  vsave_msr1 (10);							\
+  vsave_msr1 (11);							\
+  vsave_msr1 (12);							\
+  vsave_msr1 (13);							\
+  vsave_msr1 (14);							\
+  vsave_msr1 (15);							\
+  vsave_msr1 (16);							\
+  vsave_msr1 (17);							\
+  vsave_msr1 (18);							\
+  vsave_msr1 (19);							\
+  vsave_msr1 (20);							\
+  vsave_msr1 (21);							\
+  vsave_msr1 (22);							\
+  vsave_msr1 (23);							\
+  vsave_msr1 (24);							\
+  vsave_msr1 (25);							\
+  vsave_msr1 (26);							\
+  vsave_msr1 (27);							\
+  vsave_msr1 (28);							\
+  vsave_msr1 (29);							\
+  vsave_msr1 (30);							\
+  vsave_msr1 (31);							\
+  vsave_msr2 (33, 32*16+12);						\
+  vsave      (32, 33*16)
+#else
+#define EH_FRAME_VMX
+#endif
+
+	.section .eh_frame,"a",@progbits
+.Lcie:
+	.long .Lcie_end - .Lcie_start
+.Lcie_start:
+	.long 0			/* CIE ID */
+	.byte 1			/* Version number */
+	.string "zR"		/* NUL-terminated augmentation string */
+	.uleb128 4		/* Code alignment factor */
+	.sleb128 -8		/* Data alignment factor */
+	.byte 67		/* Return address register column, ap */
+	.uleb128 1		/* Augmentation value length */
+	.byte 0x14		/* DW_EH_PE_pcrel | DW_EH_PE_udata8. */
+	.byte 0x0c,1,0		/* DW_CFA_def_cfa: r1 ofs 0 */
+	.balign 8
+.Lcie_end:
+
+	.long .Lfde0_end - .Lfde0_start
+.Lfde0_start:
+	.long .Lfde0_start - .Lcie	/* CIE pointer. */
+	.quad .Lsigrt_start - .		/* PC start, length */
+	.quad .Lsigrt_end - .Lsigrt_start
+	.uleb128 0			/* Augmentation */
+	EH_FRAME_GEN
+	EH_FRAME_FP
+	EH_FRAME_VMX
+# Do we really need to describe the frame at this point?  i.e. will
+# we ever have some call chain that returns somewhere past the addi?
+# I don't think so, since gcc doesn't support async signals.
+#	.byte 0x41		/* DW_CFA_advance_loc 1*4 */
+#undef PTREGS
+#define PTREGS 168+56
+#	EH_FRAME_GEN
+#	EH_FRAME_FP
+#	EH_FRAME_VMX
+	.balign 8
+.Lfde0_end:
Index: linux-work/arch/ppc64/kernel/vdso64/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso64/Makefile	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,37 @@
+# List of files in the vDSO; these must be assembly only for now
+
+src-vdso64 = sigtramp.S gettimeofday.S datapage.S cacheflush.S
+
+# Build rules
+
+obj-vdso64 := $(addsuffix .o, $(basename $(src-vdso64)))
+targets := $(obj-vdso64) vdso64.so
+obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
+src-vdso64 := $(addprefix $(src)/, $(src-vdso64))
+
+EXTRA_CFLAGS := -shared -s -fno-common -fno-builtin
+EXTRA_CFLAGS +=  -nostdlib -Wl,-soname=linux-vdso64.so.1
+EXTRA_AFLAGS := -D__VDSO64__ -s
+
+obj-y += vdso64_wrapper.o
+extra-y += vdso64.lds
+CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
+
+# Force dependency: make can't see files pulled in by .incbin
+$(obj)/vdso64_wrapper.o : $(obj)/vdso64.so
+
+# Link rule for the .so file; the .lds must be listed first, as $^ follows -Wl,-T
+$(obj)/vdso64.so: $(src)/vdso64.lds $(obj-vdso64)
+	$(call if_changed,vdso64ld)
+
+# assembly rules for the .S files
+$(obj-vdso64): %.o: %.S
+	$(call if_changed_dep,vdso64as)
+
+# actual build commands
+quiet_cmd_vdso64ld = VDSO64L $@
+      cmd_vdso64ld = $(CC) $(c_flags) -Wl,-T $^ -o $@
+quiet_cmd_vdso64as = VDSO64A $@
+      cmd_vdso64as = $(CC) $(a_flags) -c -o $@ $<
+
+
Index: linux-work/arch/ppc64/kernel/vdso32/vdso32.lds.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso32/vdso32.lds.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,111 @@
+
+/*
+ * This is the infamous ld script for the 32-bit vDSO
+ * library
+ */
+#include <asm/vdso.h>
+
+/* Default link addresses for the vDSOs */
+OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc")
+OUTPUT_ARCH(powerpc:common)
+ENTRY(_start)
+
+SECTIONS
+{
+  . = VDSO32_LBASE + SIZEOF_HEADERS;
+  .hash           : { *(.hash) }			:text
+  .dynsym         : { *(.dynsym) }
+  .dynstr         : { *(.dynstr) }
+  .gnu.version    : { *(.gnu.version) }
+  .gnu.version_d  : { *(.gnu.version_d) }
+  .gnu.version_r  : { *(.gnu.version_r) }
+
+  . = ALIGN (16);
+  .text :
+  {
+    *(.text .stub .text.* .gnu.linkonce.t.*)
+  }
+  PROVIDE (__etext = .);
+  PROVIDE (_etext = .);
+  PROVIDE (etext = .);
+
+  /* Other stuff is appended to the text segment: */
+  .rodata		: { *(.rodata .rodata.* .gnu.linkonce.r.*) }
+  .rodata1		: { *(.rodata1) }
+
+  .eh_frame_hdr		: { *(.eh_frame_hdr) }		:text	:eh_frame_hdr
+  .eh_frame		: { KEEP (*(.eh_frame)) }	:text
+  .gcc_except_table	: { *(.gcc_except_table) }
+  .fixup		: { *(.fixup) }
+
+  .got ALIGN(4)		: { *(.got.plt) *(.got) }
+
+  .dynamic		: { *(.dynamic) }		:text	:dynamic
+
+  _end = .;
+  __end = .;
+  PROVIDE (end = .);
+
+
+  /* Stabs debugging sections are here too
+   */
+  .stab 0 : { *(.stab) }
+  .stabstr 0 : { *(.stabstr) }
+  .stab.excl 0 : { *(.stab.excl) }
+  .stab.exclstr 0 : { *(.stab.exclstr) }
+  .stab.index 0 : { *(.stab.index) }
+  .stab.indexstr 0 : { *(.stab.indexstr) }
+  .comment 0 : { *(.comment) }
+  .debug 0 : { *(.debug) }
+  .line 0 : { *(.line) }
+
+  .debug_srcinfo 0 : { *(.debug_srcinfo) }
+  .debug_sfnames 0 : { *(.debug_sfnames) }
+
+  .debug_aranges 0 : { *(.debug_aranges) }
+  .debug_pubnames 0 : { *(.debug_pubnames) }
+
+  .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) }
+  .debug_abbrev 0 : { *(.debug_abbrev) }
+  .debug_line 0 : { *(.debug_line) }
+  .debug_frame 0 : { *(.debug_frame) }
+  .debug_str 0 : { *(.debug_str) }
+  .debug_loc 0 : { *(.debug_loc) }
+  .debug_macinfo 0 : { *(.debug_macinfo) }
+
+  .debug_weaknames 0 : { *(.debug_weaknames) }
+  .debug_funcnames 0 : { *(.debug_funcnames) }
+  .debug_typenames 0 : { *(.debug_typenames) }
+  .debug_varnames 0 : { *(.debug_varnames) }
+
+  /DISCARD/ : { *(.note.GNU-stack) }
+  /DISCARD/ : { *(.data .data.* .gnu.linkonce.d.* .sdata*) }
+  /DISCARD/ : { *(.bss .sbss .dynbss .dynsbss) }
+}
+
+
+PHDRS
+{
+  text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */
+  dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
+  eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */
+}
+
+
+/*
+ * This controls what symbols we export from the DSO.
+ */
+VERSION
+{
+  VDSO_VERSION_STRING {
+    global:
+	__kernel_datapage_offset; /* Has to be there for the kernel to find it */
+	__kernel_get_syscall_map;
+	__kernel_gettimeofday;
+	__kernel_sync_dicache;
+	__kernel_sync_dicache_p5;
+	__kernel_sigtramp32;
+	__kernel_sigtramp_rt32;
+    local: *;
+  };
+}
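The PHDRS block in the vdso32 linker script spells segment flags and one segment type as raw numbers. The C sketch below shows how those numbers decode. It assumes a system `<elf.h>` (as shipped with glibc) that defines `PF_R`, `PF_W`, `PF_X` and `PT_GNU_EH_FRAME`. The helper `phdr_flags_str` is a hypothetical name used only for illustration and is not part of the kernel.

```c
#include <elf.h>

/* Compile-time checks that the raw numbers in the vDSO PHDRS block match
 * the symbolic ELF constants from <elf.h> (assumed glibc-style header). */
_Static_assert((PF_R | PF_X) == 5, "FLAGS(5) on the text segment is PF_R|PF_X");
_Static_assert(PF_R == 4, "FLAGS(4) on the dynamic segment is PF_R");
_Static_assert(PT_GNU_EH_FRAME == 0x6474e550,
               "eh_frame_hdr segment type is PT_GNU_EH_FRAME");

/* Hypothetical helper: render a p_flags value as an "RWX"-style string.
 * Bit layout: PF_X = 1, PF_W = 2, PF_R = 4, so the low three bits index
 * directly into the table. */
const char *phdr_flags_str(unsigned int flags)
{
	static const char *const names[8] = {
		"---", "--X", "-W-", "-WX", "R--", "R-X", "RW-", "RWX"
	};
	return names[flags & (PF_R | PF_W | PF_X)];
}
```

With these definitions, `phdr_flags_str(5)` yields `"R-X"` for the text segment and `phdr_flags_str(4)` yields `"R--"` for the dynamic segment. The vDSO therefore has no writable segment, which is consistent with the script discarding `.data` and `.bss`.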
Index: linux-work/arch/ppc64/kernel/vdso32/cacheflush.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso32/cacheflush.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,65 @@
+/*
+ * vDSO provided cache flush routines
+ *
+ * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org),
+ *                    IBM Corp.
+ *  
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <linux/config.h>
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+#include <asm/vdso.h>
+#include <asm/offsets.h>
+
+	.text
+
+/*
+ * Default "generic" version of __kernel_sync_dicache.
+ *
+ * void __kernel_sync_dicache(unsigned long start, unsigned long end)
+ *
+ * Flushes the data cache & invalidate the instruction cache for the
+ * provided range [start, end[
+ *
+ * Note: all CPUs supported by this kernel have a 128 bytes cache
+ * line size so we don't have to peek that info from the datapage
+ */
+V_FUNCTION_BEGIN(__kernel_sync_dicache)
+  .cfi_startproc
+	li	r5,127
+	andc	r6,r3,r5		/* round low to line bdy */
+	subf	r8,r6,r4		/* compute length */
+	add	r8,r8,r5		/* ensure we get enough */
+	srwi.	r8,r8,7			/* compute line count */
+	beqlr				/* nothing to do? */
+	mtctr	r8
+	mr	r3,r6
+1:	dcbst	0,r3
+	addi	r3,r3,128
+	bdnz	1b
+	sync
+	mtctr	r8
+1:	icbi	0,r6
+	addi	r6,r6,128
+	bdnz	1b
+	isync
+	blr
+  .cfi_endproc
+V_FUNCTION_END(__kernel_sync_dicache)
+
+
+/*
+ * POWER5 version of __kernel_sync_dicache
+ */
+V_FUNCTION_BEGIN(__kernel_sync_dicache_p5)
+  .cfi_startproc
+	sync
+	isync
+	blr
+  .cfi_endproc
+V_FUNCTION_END(__kernel_sync_dicache_p5)
+
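The line-count arithmetic at the top of `__kernel_sync_dicache` is easy to misread in assembly. The C sketch below restates it under the routine's own stated assumption of a fixed 128-byte cache line. It assumes `end >= start` and uses full-width `unsigned long` arithmetic, whereas `srwi.` only shifts the low 32 bits; the two agree for any range shorter than 4 GB. The helper names `sync_first_line` and `sync_line_count` are hypothetical.

```c
/* Cache line size hard-coded by __kernel_sync_dicache (128 bytes). */
#define SYNC_LINE_SIZE  128UL
#define SYNC_LINE_SHIFT 7                      /* log2(128): the srwi. shift  */
#define SYNC_LINE_MASK  (SYNC_LINE_SIZE - 1)   /* 127: the value loaded in r5 */

/* First line address flushed: start rounded down to a line boundary,
 * mirroring "andc r6,r3,r5". */
unsigned long sync_first_line(unsigned long start)
{
	return start & ~SYNC_LINE_MASK;
}

/* Number of iterations of the dcbst and icbi loops for [start, end):
 * (end - aligned_start + 127) >> 7, mirroring subf / add / srwi.
 * A result of 0 corresponds to the early "beqlr" return. */
unsigned long sync_line_count(unsigned long start, unsigned long end)
{
	unsigned long len = end - sync_first_line(start);

	return (len + SYNC_LINE_MASK) >> SYNC_LINE_SHIFT;
}
```

For example, a two-byte range from 0x107f to 0x1081 straddles the 0x1080 boundary. It therefore needs two lines: the aligned start is 0x1000 and (0x81 + 127) >> 7 = 2. An empty range returns 0, so both loops are skipped. The POWER5 variant skips this loop entirely and only issues `sync; isync`.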
Index: linux-work/arch/ppc64/kernel/vdso64/cacheflush.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso64/cacheflush.S	2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,64 @@
+/*
+ * vDSO provided cache flush routines
+ *
+ * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org),
+ *                    IBM Corp.
+ *  
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <linux/config.h>
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+#include <asm/vdso.h>
+#include <asm/offsets.h>
+
+	.text
+
+/*
+ * Default "generic" version of __kernel_sync_dicache.
+ *
+ * void __kernel_sync_dicache(unsigned long start, unsigned long end)
+ *
+ * Flushes the data cache & invalidate the instruction cache for the
+ * provided range [start, end[
+ *
+ * Note: all CPUs supported by this kernel have a 128 bytes cache
+ * line size so we don't have to peek that info from the datapage
+ */
+V_FUNCTION_BEGIN(__kernel_sync_dicache)
+  .cfi_startproc
+	li	r5,127
+	andc	r6,r3,r5		/* round low to line bdy */
+	subf	r8,r6,r4		/* compute length */
+	add	r8,r8,r5		/* ensure we get enough */
+	srwi.	r8,r8,7			/* compute line count */
+	beqlr				/* nothing to do? */
+	mtctr	r8
+	mr	r3,r6
+1:	dcbst	0,r3
+	addi	r3,r3,128
+	bdnz	1b
+	sync
+	mtctr	r8
+1:	icbi	0,r6
+	addi	r6,r6,128
+	bdnz	1b
+	isync
+	blr
+  .cfi_endproc
+V_FUNCTION_END(__kernel_sync_dicache)
+
+
+/*
+ * POWER5 version of __kernel_sync_dicache
+ */
+V_FUNCTION_BEGIN(__kernel_sync_dicache_p5)
+  .cfi_startproc
+	sync
+	isync
+	blr
+  .cfi_endproc
+V_FUNCTION_END(__kernel_sync_dicache_p5)
Index: linux-work/arch/ppc64/kernel/head.S
===================================================================
--- linux-work.orig/arch/ppc64/kernel/head.S	2005-01-31 16:19:44.000000000 +1100
+++ linux-work/arch/ppc64/kernel/head.S	2005-01-31 16:25:56.000000000 +1100
@@ -54,7 +54,6 @@
  * 0x0100 - 0x2fff : pSeries Interrupt prologs
  * 0x3000 - 0x3fff : Interrupt support
  * 0x4000 - 0x4fff : NACA
- * 0x5000 - 0x5fff : SystemCfg
  * 0x6000	   : iSeries and common interrupt prologs
  * 0x9000 - 0x9fff : Initial segment table
  */


From sfr at canb.auug.org.au  Mon Jan 31 17:39:44 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Mon, 31 Jan 2005 17:39:44 +1100
Subject: [PATCH] replace last usage of vio dma mapping routines
Message-ID: <20050131173944.7aa7f206.sfr@canb.auug.org.au>

Hi all,

This patch replaces the last remaining uses of the vio DMA mapping routines
(vio_map_single/vio_unmap_single) in ibmveth with the equivalent generic
DMA mapping routines (dma_map_single/dma_unmap_single).

Signed-off-by: Stephen Rothwell <sfr at canb.auug.org.au>

-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruNp linus-bk/drivers/net/ibmveth.c linus-bk-vio.1/drivers/net/ibmveth.c
--- linus-bk/drivers/net/ibmveth.c	2004-12-08 04:06:06.000000000 +1100
+++ linus-bk-vio.1/drivers/net/ibmveth.c	2005-01-31 16:45:28.000000000 +1100
@@ -218,7 +218,8 @@ static void ibmveth_replenish_buffer_poo
 		ibmveth_assert(index != IBM_VETH_INVALID_MAP);
 		ibmveth_assert(pool->skbuff[index] == NULL);
 
-		dma_addr = vio_map_single(adapter->vdev, skb->data, pool->buff_size, DMA_FROM_DEVICE);
+		dma_addr = dma_map_single(&adapter->vdev->dev, skb->data,
+				pool->buff_size, DMA_FROM_DEVICE);
 
 		pool->free_map[free_index] = IBM_VETH_INVALID_MAP;
 		pool->dma_addr[index] = dma_addr;
@@ -238,7 +239,9 @@ static void ibmveth_replenish_buffer_poo
 			pool->free_map[free_index] = IBM_VETH_INVALID_MAP;
 			pool->skbuff[index] = NULL;
 			pool->consumer_index--;
-			vio_unmap_single(adapter->vdev, pool->dma_addr[index], pool->buff_size, DMA_FROM_DEVICE);
+			dma_unmap_single(&adapter->vdev->dev,
+					pool->dma_addr[index], pool->buff_size,
+					DMA_FROM_DEVICE);
 			dev_kfree_skb_any(skb);
 			adapter->replenish_add_buff_failure++;
 			break;
@@ -299,7 +302,7 @@ static void ibmveth_free_buffer_pool(str
 		for(i = 0; i < pool->size; ++i) {
 			struct sk_buff *skb = pool->skbuff[i];
 			if(skb) {
-				vio_unmap_single(adapter->vdev,
+				dma_unmap_single(&adapter->vdev->dev,
 						 pool->dma_addr[i],
 						 pool->buff_size,
 						 DMA_FROM_DEVICE);
@@ -337,7 +340,7 @@ static void ibmveth_remove_buffer_from_p
 
 	adapter->rx_buff_pool[pool].skbuff[index] = NULL;
 
-	vio_unmap_single(adapter->vdev,
+	dma_unmap_single(&adapter->vdev->dev,
 			 adapter->rx_buff_pool[pool].dma_addr[index],
 			 adapter->rx_buff_pool[pool].buff_size,
 			 DMA_FROM_DEVICE);
@@ -408,7 +411,9 @@ static void ibmveth_cleanup(struct ibmve
 {
 	if(adapter->buffer_list_addr != NULL) {
 		if(!dma_mapping_error(adapter->buffer_list_dma)) {
-			vio_unmap_single(adapter->vdev, adapter->buffer_list_dma, 4096, DMA_BIDIRECTIONAL);
+			dma_unmap_single(&adapter->vdev->dev,
+					adapter->buffer_list_dma, 4096,
+					DMA_BIDIRECTIONAL);
 			adapter->buffer_list_dma = DMA_ERROR_CODE;
 		}
 		free_page((unsigned long)adapter->buffer_list_addr);
@@ -417,7 +422,9 @@ static void ibmveth_cleanup(struct ibmve
 
 	if(adapter->filter_list_addr != NULL) {
 		if(!dma_mapping_error(adapter->filter_list_dma)) {
-			vio_unmap_single(adapter->vdev, adapter->filter_list_dma, 4096, DMA_BIDIRECTIONAL);
+			dma_unmap_single(&adapter->vdev->dev,
+					adapter->filter_list_dma, 4096,
+					DMA_BIDIRECTIONAL);
 			adapter->filter_list_dma = DMA_ERROR_CODE;
 		}
 		free_page((unsigned long)adapter->filter_list_addr);
@@ -426,7 +433,10 @@ static void ibmveth_cleanup(struct ibmve
 
 	if(adapter->rx_queue.queue_addr != NULL) {
 		if(!dma_mapping_error(adapter->rx_queue.queue_dma)) {
-			vio_unmap_single(adapter->vdev, adapter->rx_queue.queue_dma, adapter->rx_queue.queue_len, DMA_BIDIRECTIONAL);
+			dma_unmap_single(&adapter->vdev->dev,
+					adapter->rx_queue.queue_dma,
+					adapter->rx_queue.queue_len,
+					DMA_BIDIRECTIONAL);
 			adapter->rx_queue.queue_dma = DMA_ERROR_CODE;
 		}
 		kfree(adapter->rx_queue.queue_addr);
@@ -472,9 +482,13 @@ static int ibmveth_open(struct net_devic
 		return -ENOMEM;
 	}
 
-	adapter->buffer_list_dma = vio_map_single(adapter->vdev, adapter->buffer_list_addr, 4096, DMA_BIDIRECTIONAL);
-	adapter->filter_list_dma = vio_map_single(adapter->vdev, adapter->filter_list_addr, 4096, DMA_BIDIRECTIONAL);
-	adapter->rx_queue.queue_dma = vio_map_single(adapter->vdev, adapter->rx_queue.queue_addr, adapter->rx_queue.queue_len, DMA_BIDIRECTIONAL);
+	adapter->buffer_list_dma = dma_map_single(&adapter->vdev->dev,
+			adapter->buffer_list_addr, 4096, DMA_BIDIRECTIONAL);
+	adapter->filter_list_dma = dma_map_single(&adapter->vdev->dev,
+			adapter->filter_list_addr, 4096, DMA_BIDIRECTIONAL);
+	adapter->rx_queue.queue_dma = dma_map_single(&adapter->vdev->dev,
+			adapter->rx_queue.queue_addr,
+			adapter->rx_queue.queue_len, DMA_BIDIRECTIONAL);
 
 	if((dma_mapping_error(adapter->buffer_list_dma) ) ||
 	   (dma_mapping_error(adapter->filter_list_dma)) ||
@@ -644,7 +658,7 @@ static int ibmveth_start_xmit(struct sk_
 
 	/* map the initial fragment */
 	desc[0].fields.length  = nfrags ? skb->len - skb->data_len : skb->len;
-	desc[0].fields.address = vio_map_single(adapter->vdev, skb->data,
+	desc[0].fields.address = dma_map_single(&adapter->vdev->dev, skb->data,
 					desc[0].fields.length, DMA_TO_DEVICE);
 	desc[0].fields.valid   = 1;
 
@@ -662,7 +676,7 @@ static int ibmveth_start_xmit(struct sk_
 	while(curfrag--) {
 		skb_frag_t *frag = &skb_shinfo(skb)->frags[curfrag];
 		desc[curfrag+1].fields.address
-			= vio_map_single(adapter->vdev,
+			= dma_map_single(&adapter->vdev->dev,
 				page_address(frag->page) + frag->page_offset,
 				frag->size, DMA_TO_DEVICE);
 		desc[curfrag+1].fields.length = frag->size;
@@ -674,7 +688,7 @@ static int ibmveth_start_xmit(struct sk_
 			adapter->stats.tx_dropped++;
 			/* Free all the mappings we just created */
 			while(curfrag < nfrags) {
-				vio_unmap_single(adapter->vdev,
+				dma_unmap_single(&adapter->vdev->dev,
 						 desc[curfrag+1].fields.address,
 						 desc[curfrag+1].fields.length,
 						 DMA_TO_DEVICE);
@@ -714,7 +728,9 @@ static int ibmveth_start_xmit(struct sk_
 	}
 
 	do {
-		vio_unmap_single(adapter->vdev, desc[nfrags].fields.address, desc[nfrags].fields.length, DMA_TO_DEVICE);
+		dma_unmap_single(&adapter->vdev->dev,
+				desc[nfrags].fields.address,
+				desc[nfrags].fields.length, DMA_TO_DEVICE);
 	} while(--nfrags >= 0);
 
 	dev_kfree_skb(skb);

From olh at suse.de  Mon Jan 31 19:52:45 2005
From: olh at suse.de (Olaf Hering)
Date: Mon, 31 Jan 2005 09:52:45 +0100
Subject: IDE oops with 2.6.11rc2
Message-ID: <20050131085245.GA26443@suse.de>


I get this with current Linus tree, on a p630.


Linux version 2.6.11-rc2-pseries64 (olaf at pomegranate) (gcc version 3.3.3 (SuSE Linux)) #4 SMP Mon Jan 31 09:40:20 CET 2005
Kernel command line:
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
W82C105: IDE controller at PCI slot 0000:00:03.1
W82C105: chipset revision 5
W82C105: 100% native mode on irq 102
    ide0: BM-DMA at 0xf040-0xf047<3> CPU: -- Error, unable to allocateW82C105 DMA table(s).
    ide1: BM-DMA at 0xf048-0xf04f<3> CPU: -- Error, unable to allocateW82C105 DMA table(s).
hda: HL-DT-ST CD-ROM GCR-8480B, ATAPI CD/DVD-ROM drive
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA PSERIES
Modules linked in:
NIP: C00000000028E248 XER: 00000000 LR: C000000000284E7C CTR: 0000000000000015
REGS: c0000000041e3620 TRAP: 0300   Not tainted  (2.6.11-rc2-pseries64)
MSR: 9000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 24004048
DAR: 0000000000000000 DSISR: 0000000040000000
TASK: c0000001fe7927f0[1] 'swapper' THREAD: c0000000041e0000 CPU: 0
GPR00: C000000000602B40 C0000000041E38A0 C000000000619990 C0000000006CB468
GPR04: 000000000000000C 0000000000000015 C0000000041E39E8 E00000000000F042
GPR08: 0000000000000001 0000000000000000 03000000A0000000 0000000000000000
GPR12: 000000003FD228A8 C000000000491C00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000003A10000 0000000003E3CA10
GPR20: 0000000003E3CA10 BFFFFFFFFC5F0000 0000000000000001 000000000000000C
GPR24: 0000000000000001 0000000000000044 C0000000006CB468 C0000000006CB358
GPR28: C0000000006CB468 0000000000000000 C00000000052F8F0 C0000000041E38A0
NIP [c00000000028e248] .ide_config_drive_speed+0x32c/0x6e0
LR [c000000000284e7c] .config_for_pio+0x1b8/0x240
Call Trace:
[c0000000041e38a0] [0000000040000000] .__start+0x4000000040000000/0x8 (unreliable)
[c0000000041e3970] [c000000000284e7c] .config_for_pio+0x1b8/0x240
[c0000000041e3a30] [c000000000284fb4] .tune_sl82c105+0x2c/0x54
[c0000000041e3ac0] [c0000000002926bc] .probe_hwif+0x9e4/0x9ec
[c0000000041e3b90] [c000000000293414] .probe_hwif_init_with_fixup+0x2c/0xe0
[c0000000041e3c20] [c000000000296e20] .ide_setup_pci_device+0x74/0xd8
[c0000000041e3cc0] [c00000000028495c] .sl82c105_init_one+0x24/0x40
[c0000000041e3d40] [c000000000426ffc] .ide_scan_pcidev+0xb0/0x10c
[c0000000041e3dd0] [c0000000004270a0] .ide_scan_pcibus+0x48/0x120
[c0000000041e3e70] [c000000000426f18] .ide_init+0x80/0xb4
[c0000000041e3f00] [c00000000000c390] .init+0x1d4/0x3f4
[c0000000041e3f90] [c000000000014388] .kernel_thread+0x4c/0x6c
Instruction dump:
887a00dd e8090000 f8410028 60630002 e9690010 e8490008 7c0903a6 4e800421
e8410028 e97a0090 4bfffdd4 e93b06a0 <e8090000> f8410028 60000000 e9690010
 <0>Kernel panic - not syncing: Attempted to kill init!


From olh at suse.de  Mon Jan 31 20:02:26 2005
From: olh at suse.de (Olaf Hering)
Date: Mon, 31 Jan 2005 10:02:26 +0100
Subject: IDE oops with 2.6.11rc2
In-Reply-To: <20050131085245.GA26443@suse.de>
References: <20050131085245.GA26443@suse.de>
Message-ID: <20050131090226.GA27127@suse.de>

 On Mon, Jan 31, Olaf Hering wrote:

> 
> I get this with current Linus tree, on a p630.

ppc64-p615-iommu-fix.patch helps