From david at gibson.dropbear.id.au  Sun Jan  2 09:33:45 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Sun, 2 Jan 2005 09:33:45 +1100
Subject: [PATCH] sparse fixes for cpu feature constants
In-Reply-To: <1104381206.16694.38.camel@localhost.localdomain>
References: <1104381206.16694.38.camel@localhost.localdomain>
Message-ID: <20050101223345.GC2297@zax>

On Wed, Dec 29, 2004 at 10:33:26PM -0600, Nathan Lynch wrote:
> Hi-
>
> I've been playing around with sparse a little and saw that it gives a
> lot of warnings like this:
>
> arch/ppc64/mm/init.c:755:35: warning: constant 0x0000020000000000 is so
> big it is long
>
> It looks like we get such a warning for every expression of the form
> "(cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE)" -- basically,
> every time the code checks for a cpu feature.
>
> Following is an attempt to clean these up by defining the cpu feature
> constants using the ASM_CONST macro from ppc64's page.h.  I believe this
> is consistent with the intentions for ASM_CONST's use.
>
> There's some fallout:
>
> flush_icache_range() was already using ASM_CONST on one of the
> constants, so that is fixed up.
>
> switch_mm() uses a BEGIN_FTR_SECTION ...
> END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) which gets broken by the change
> since 0x0000000000000008UL winds up in the generated assembly.  I
> couldn't find the BEGIN/END_FTR_SECTION construct used in any other C
> code, so I replaced this with the usual bitwise 'and' conditional (I
> hope someone else will verify that this is equivalent :).
>
> So, does this look like the right thing to do?  It eliminates 129 sparse
> warnings from a defconfig 2.6.10 build.

Hurrah!  You beat me to it...
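
For readers who have not met ASM_CONST, the idea discussed above can be sketched roughly as follows. This is only a sketch under stated assumptions, not the patch: the macro is assumed to paste a UL suffix onto the constant when compiled as C and to leave the bare number alone for the assembler. The intermediate macro ASM_CONST_SUFFIX and the helper has_feature() are hypothetical names for illustration. The two constant values are the ones quoted in the message, and a 64-bit unsigned long (as on ppc64) is assumed.

```c
#include <assert.h>

/* Assumed shape of ASM_CONST: the assembler sees the bare number,
 * C code sees the same number with a UL suffix pasted on, so the
 * constant is explicitly unsigned long. */
#ifdef __ASSEMBLY__
#define ASM_CONST(x) x
#else
#define ASM_CONST_SUFFIX(x) x##UL	/* hypothetical helper macro */
#define ASM_CONST(x) ASM_CONST_SUFFIX(x)
#endif

/* Values as quoted in the thread; assumes 64-bit unsigned long. */
#define CPU_FTR_ALTIVEC		ASM_CONST(0x0000000000000008)
#define CPU_FTR_COHERENT_ICACHE	ASM_CONST(0x0000020000000000)

/* Hypothetical helper: the usual bitwise 'and' feature test. */
static int has_feature(unsigned long cpu_features, unsigned long feature)
{
	assert(feature != 0);	/* a feature mask must have a bit set */
	return (cpu_features & feature) != 0;
}
```

Because the suffix is applied only on the C side, a construct that pastes the constant into generated assembly (as switch_mm() did) would see the UL form, which is the breakage described above.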
> Index: 2.6.10/include/asm-ppc64/cputable.h
> ===================================================================
> +++ 2.6.10/include/asm-ppc64/cputable.h 2004-12-30 04:04:09.463979408 +0000
> @@ -16,6 +16,7 @@
>  #define __ASM_PPC_CPUTABLE_H
>
>  #include
> +#include /* for ASM_CONST */

Have you double checked that this won't cause a nasty #include loop?
The CPU constants are used in quite a few places, as is page.h

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist.  NOT _the_ _other_ _way_
                                | _around_!
http://www.ozlabs.org/people/dgibson

From nathanl at austin.ibm.com  Tue Jan  4 00:16:24 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Mon, 03 Jan 2005 07:16:24 -0600
Subject: [PATCH] sparse fixes for cpu feature constants
In-Reply-To: <20050101223345.GC2297@zax>
References: <1104381206.16694.38.camel@localhost.localdomain>
	<20050101223345.GC2297@zax>
Message-ID: <1104758184.15200.6.camel@localhost.localdomain>

On Sun, 2005-01-02 at 09:33 +1100, David Gibson wrote:
> On Wed, Dec 29, 2004 at 10:33:26PM -0600, Nathan Lynch wrote:
> >
> > Index: 2.6.10/include/asm-ppc64/cputable.h
> > ===================================================================
> > +++ 2.6.10/include/asm-ppc64/cputable.h 2004-12-30 04:04:09.463979408 +0000
> > @@ -16,6 +16,7 @@
> >  #define __ASM_PPC_CPUTABLE_H
> >
> >  #include
> > +#include /* for ASM_CONST */
>
> Have you double checked that this won't cause a nasty #include loop?
> The CPU constants are used in quite a few places, as is page.h

I think it's ok -- page.h includes the following:

- linux/config.h, which includes linux/autoconf.h

- asm-ppc64/naca.h, which includes asm-ppc64/types.h and
  asm-ppc64/systemcfg.h.

So I don't see any way that cputable.h could be pulled in before
ASM_CONST is defined.
Thanks,
Nathan

From jdl at freescale.com  Tue Jan  4 05:56:39 2005
From: jdl at freescale.com (Jon Loeliger)
Date: Mon, 03 Jan 2005 12:56:39 -0600
Subject: PATCH uninorth3 (G5) agp support
In-Reply-To: <41D00564.6010507@free.fr>
References: <41CEC6B0.5020106@free.fr> <1104137527.5615.20.camel@gaston>
	<41D00564.6010507@free.fr>
Message-ID: <1104778599.14049.64.camel@cashmere.sps.mot.com>

On Mon, 2004-12-27 at 06:51, Jerome Glisse wrote:

>  /* My understanding of UniNorth AGP as of UniNorth rev 1.0x,
>   * revision 1.5 (x4 AGP) may need further changes.
> diff -Naur linux/include/linux/pci_ids.h linux-new/include/linux/pci_ids.h
> --- linux/include/linux/pci_ids.h 2004-12-26 14:40:05.000000000 +0100
> +++ linux-new/include/linux/pci_ids.h 2004-12-27 13:40:50.121003792 +0100
> @@ -842,6 +842,7 @@
>  #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC2 0x0032
>  #define PCI_DEVIEC_ID_APPLE_UNI_N_ATA 0x0033
>  #define PCI_DEVICE_ID_APPLE_UNI_N_AGP2 0x0034
> +#define PCI_DEVICE_ID_APPLE_U3_AGP 0x0059
>  #define PCI_DEVICE_ID_APPLE_IPID_ATA100 0x003b
>  #define PCI_DEVICE_ID_APPLE_KEYLARGO_I 0x003e
>  #define PCI_DEVICE_ID_APPLE_K2_ATA100 0x0043

So, did 0x0033's symbol need to be spelled consistently too?
NB: PCI_DEVIEC_

Thanks,
jdl

From david at gibson.dropbear.id.au  Tue Jan  4 11:07:23 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 4 Jan 2005 11:07:23 +1100
Subject: [PATCH] sparse fixes for cpu feature constants
In-Reply-To: <1104758184.15200.6.camel@localhost.localdomain>
References: <1104381206.16694.38.camel@localhost.localdomain>
	<20050101223345.GC2297@zax>
	<1104758184.15200.6.camel@localhost.localdomain>
Message-ID: <20050104000723.GB6745@zax>

On Mon, Jan 03, 2005 at 07:16:24AM -0600, Nathan Lynch wrote:
> On Sun, 2005-01-02 at 09:33 +1100, David Gibson wrote:
> > On Wed, Dec 29, 2004 at 10:33:26PM -0600, Nathan Lynch wrote:
> > >
> > > Index: 2.6.10/include/asm-ppc64/cputable.h
> > > ===================================================================
> > > +++ 2.6.10/include/asm-ppc64/cputable.h 2004-12-30 04:04:09.463979408 +0000
> > > @@ -16,6 +16,7 @@
> > >  #define __ASM_PPC_CPUTABLE_H
> > >
> > >  #include
> > > +#include /* for ASM_CONST */
> >
> > Have you double checked that this won't cause a nasty #include loop?
> > The CPU constants are used in quite a few places, as is page.h
>
> I think it's ok -- page.h includes the following:
>
> - linux/config.h, which includes linux/autoconf.h
>
> - asm-ppc64/naca.h, which includes asm-ppc64/types.h and
> asm-ppc64/systemcfg.h.
>
> So I don't see any way that cputable.h could be pulled in before
> ASM_CONST is defined.

Ok, sounds good.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist.  NOT _the_ _other_ _way_
                                | _around_!
http://www.ozlabs.org/people/dgibson

From sfr at canb.auug.org.au  Tue Jan  4 14:53:56 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 14:53:56 +1100
Subject: PPC64 cleanups 0/11
Message-ID: <20050104145356.4d5333dd.sfr@canb.auug.org.au>

Hi Andrew,

The following series of patches are mainly just cleanups of the ppc64 code
in order to eliminate the naca structure.
In the end, the naca only exists for legacy iSeries kernels.

One of the more intrusive parts of these patches is the renaming of the
fields of the lppaca structure to eliminate another set of StudlyCaps.

These patches (in total) have been built on iSeries, pSeries and pmac and
booted on iSeries and pSeries.

Please apply and send upstream.
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

From sfr at canb.auug.org.au  Tue Jan  4 15:04:10 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:04:10 +1100
Subject: [PATCH 1/11] PPC64: Consolidate cache sizing variables
In-Reply-To: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
Message-ID: <20050104150410.199b132e.sfr@canb.auug.org.au>

Hi Andrew,

This patch consolidates the variables that define the PPC64 cache sizes
into a single structure (they were in the naca and the systemcfg
structures).  Those that were in the systemcfg structure are left there
just because they are exported to user mode through /proc.
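
As a rough illustration of what the consolidated structure holds and how its derived fields relate to the line size, here is a minimal user-space sketch. The field names follow the struct ppc64_caches added by this patch. Everything else is an assumption made for the example: uint32_t stands in for the kernel's u32, SKETCH_PAGE_SIZE is an assumed 4096-byte page, ilog2_u32() stands in for __ilog2(), and caches_init() is a hypothetical helper, not a function in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Same fields as the patch's struct ppc64_caches. */
struct ppc64_caches {
	uint32_t dsize;			/* L1 d-cache size */
	uint32_t dline_size;		/* L1 d-cache line size */
	uint32_t log_dline_size;
	uint32_t dlines_per_page;
	uint32_t isize;			/* L1 i-cache size */
	uint32_t iline_size;		/* L1 i-cache line size */
	uint32_t log_iline_size;
	uint32_t ilines_per_page;
};

/* Stand-in for __ilog2(): floor(log2(v)) by repeated halving,
 * the same loop the iSeries setup code uses. */
static uint32_t ilog2_u32(uint32_t v)
{
	uint32_t n = 0;

	while ((v = v / 2))
		++n;
	return n;
}

/* Hypothetical helper: fill in all eight fields from the four
 * values that firmware or the device tree would supply. */
static void caches_init(struct ppc64_caches *c,
			uint32_t dsize, uint32_t dline_size,
			uint32_t isize, uint32_t iline_size)
{
	assert(dline_size != 0 && iline_size != 0);

	c->dsize = dsize;
	c->dline_size = dline_size;
	c->log_dline_size = ilog2_u32(dline_size);
	c->dlines_per_page = SKETCH_PAGE_SIZE / dline_size;

	c->isize = isize;
	c->iline_size = iline_size;
	c->log_iline_size = ilog2_u32(iline_size);
	c->ilines_per_page = SKETCH_PAGE_SIZE / iline_size;
}
```

Keeping all eight values in one object is what lets the assembly side load a single base address and read every field at a fixed offset.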
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.1/arch/ppc64/kernel/asm-offsets.c --- linus-bk/arch/ppc64/kernel/asm-offsets.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/asm-offsets.c 2004-12-31 14:52:14.000000000 +1100 @@ -35,6 +35,7 @@ #include #include #include +#include #define DEFINE(sym, val) \ asm volatile("\n->" #sym " %0 " #val : : "i" (val)) @@ -69,12 +70,12 @@ /* naca */ DEFINE(PACA, offsetof(struct naca_struct, paca)); - DEFINE(DCACHEL1LINESIZE, offsetof(struct systemcfg, dCacheL1LineSize)); - DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct naca_struct, dCacheL1LogLineSize)); - DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct naca_struct, dCacheL1LinesPerPage)); - DEFINE(ICACHEL1LINESIZE, offsetof(struct systemcfg, iCacheL1LineSize)); - DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct naca_struct, iCacheL1LogLineSize)); - DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct naca_struct, iCacheL1LinesPerPage)); + DEFINE(DCACHEL1LINESIZE, offsetof(struct ppc64_caches, dline_size)); + DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_dline_size)); + DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, dlines_per_page)); + DEFINE(ICACHEL1LINESIZE, offsetof(struct ppc64_caches, iline_size)); + DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_iline_size)); + DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, ilines_per_page)); DEFINE(PLATFORM, offsetof(struct systemcfg, platform)); /* paca */ diff -ruN linus-bk/arch/ppc64/kernel/eeh.c linus-bk-naca.1/arch/ppc64/kernel/eeh.c --- linus-bk/arch/ppc64/kernel/eeh.c 2004-10-26 16:06:41.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/eeh.c 2004-12-31 14:52:14.000000000 +1100 @@ -32,6 +32,7 @@ #include #include #include +#include #include "pci.h" #undef DEBUG diff -ruN linus-bk/arch/ppc64/kernel/iSeries_setup.c 
linus-bk-naca.1/arch/ppc64/kernel/iSeries_setup.c --- linus-bk/arch/ppc64/kernel/iSeries_setup.c 2004-11-12 09:09:48.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/iSeries_setup.c 2004-12-31 14:52:14.000000000 +1100 @@ -44,6 +44,7 @@ #include "iSeries_setup.h" #include #include +#include #include #include #include @@ -560,33 +561,36 @@ unsigned int i, n; unsigned int procIx = get_paca()->lppaca.xDynHvPhysicalProcIndex; - systemcfg->iCacheL1Size = - xIoHriProcessorVpd[procIx].xInstCacheSize * 1024; - systemcfg->iCacheL1LineSize = + systemcfg->icache_size = + ppc64_caches.isize = xIoHriProcessorVpd[procIx].xInstCacheSize * 1024; + systemcfg->icache_line_size = + ppc64_caches.iline_size = xIoHriProcessorVpd[procIx].xInstCacheOperandSize; - systemcfg->dCacheL1Size = + systemcfg->dcache_size = + ppc64_caches.dsize = xIoHriProcessorVpd[procIx].xDataL1CacheSizeKB * 1024; - systemcfg->dCacheL1LineSize = + systemcfg->dcache_line_size = + ppc64_caches.dline_size = xIoHriProcessorVpd[procIx].xDataCacheOperandSize; - naca->iCacheL1LinesPerPage = PAGE_SIZE / systemcfg->iCacheL1LineSize; - naca->dCacheL1LinesPerPage = PAGE_SIZE / systemcfg->dCacheL1LineSize; + ppc64_caches.ilines_per_page = PAGE_SIZE / ppc64_caches.iline_size; + ppc64_caches.dlines_per_page = PAGE_SIZE / ppc64_caches.dline_size; - i = systemcfg->iCacheL1LineSize; + i = ppc64_caches.iline_size; n = 0; while ((i = (i / 2))) ++n; - naca->iCacheL1LogLineSize = n; + ppc64_caches.log_iline_size = n; - i = systemcfg->dCacheL1LineSize; + i = ppc64_caches.dline_size; n = 0; while ((i = (i / 2))) ++n; - naca->dCacheL1LogLineSize = n; + ppc64_caches.log_dline_size = n; printk("D-cache line size = %d\n", - (unsigned int)systemcfg->dCacheL1LineSize); + (unsigned int)ppc64_caches.dline_size); printk("I-cache line size = %d\n", - (unsigned int)systemcfg->iCacheL1LineSize); + (unsigned int)ppc64_caches.iline_size); } /* diff -ruN linus-bk/arch/ppc64/kernel/idle.c linus-bk-naca.1/arch/ppc64/kernel/idle.c --- 
linus-bk/arch/ppc64/kernel/idle.c 2004-10-27 07:32:57.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/idle.c 2004-12-31 14:52:14.000000000 +1100 @@ -32,6 +32,7 @@ #include #include #include +#include extern void power4_idle(void); diff -ruN linus-bk/arch/ppc64/kernel/misc.S linus-bk-naca.1/arch/ppc64/kernel/misc.S --- linus-bk/arch/ppc64/kernel/misc.S 2004-11-12 09:09:48.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/misc.S 2004-12-31 14:52:14.000000000 +1100 @@ -189,6 +189,11 @@ isync blr + .section ".toc","aw" +PPC64_CACHES: + .tc ppc64_caches[TC],ppc64_caches + .section ".text" + /* * Write any modified data cache blocks out to memory * and invalidate the corresponding instruction cache blocks. @@ -207,11 +212,8 @@ * and in some cases i-cache and d-cache line sizes differ from * each other. */ - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) - LOADADDR(r11,systemcfg) /* Get systemcfg address */ - ld r11,0(r11) - lwz r7,DCACHEL1LINESIZE(r11)/* Get cache line size */ + ld r10,PPC64_CACHES at toc(r2) + lwz r7,DCACHEL1LINESIZE(r10)/* Get cache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -227,7 +229,7 @@ /* Now invalidate the instruction cache */ - lwz r7,ICACHEL1LINESIZE(r11) /* Get Icache line size */ + lwz r7,ICACHEL1LINESIZE(r10) /* Get Icache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -256,11 +258,8 @@ * * Different systems have different cache line sizes */ - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) - LOADADDR(r11,systemcfg) /* Get systemcfg address */ - ld r11,0(r11) - lwz r7,DCACHEL1LINESIZE(r11) /* Get dcache line size */ + ld r10,PPC64_CACHES at toc(r2) + lwz r7,DCACHEL1LINESIZE(r10) /* Get dcache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -286,11 +285,8 @@ * flush all bytes from start to stop-1 inclusive */ 
_GLOBAL(flush_dcache_phys_range) - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) - LOADADDR(r11,systemcfg) /* Get systemcfg address */ - ld r11,0(r11) - lwz r7,DCACHEL1LINESIZE(r11) /* Get dcache line size */ + ld r10,PPC64_CACHES at toc(r2) + lwz r7,DCACHEL1LINESIZE(r10) /* Get dcache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -332,13 +328,10 @@ */ /* Flush the dcache */ - LOADADDR(r7,naca) - ld r7,0(r7) - LOADADDR(r8,systemcfg) /* Get systemcfg address */ - ld r8,0(r8) + ld r7,PPC64_CACHES at toc(r2) clrrdi r3,r3,12 /* Page align */ lwz r4,DCACHEL1LINESPERPAGE(r7) /* Get # dcache lines per page */ - lwz r5,DCACHEL1LINESIZE(r8) /* Get dcache line size */ + lwz r5,DCACHEL1LINESIZE(r7) /* Get dcache line size */ mr r6,r3 mtctr r4 0: dcbst 0,r6 @@ -349,7 +342,7 @@ /* Now invalidate the icache */ lwz r4,ICACHEL1LINESPERPAGE(r7) /* Get # icache lines per page */ - lwz r5,ICACHEL1LINESIZE(r8) /* Get icache line size */ + lwz r5,ICACHEL1LINESIZE(r7) /* Get icache line size */ mtctr r4 1: icbi 0,r3 add r3,r3,r5 diff -ruN linus-bk/arch/ppc64/kernel/nvram.c linus-bk-naca.1/arch/ppc64/kernel/nvram.c --- linus-bk/arch/ppc64/kernel/nvram.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/nvram.c 2004-12-31 14:52:14.000000000 +1100 @@ -31,6 +31,7 @@ #include #include #include +#include #undef DEBUG_NVRAM diff -ruN linus-bk/arch/ppc64/kernel/pSeries_iommu.c linus-bk-naca.1/arch/ppc64/kernel/pSeries_iommu.c --- linus-bk/arch/ppc64/kernel/pSeries_iommu.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/pSeries_iommu.c 2004-12-31 14:52:14.000000000 +1100 @@ -43,6 +43,7 @@ #include #include #include +#include #include "pci.h" diff -ruN linus-bk/arch/ppc64/kernel/pacaData.c linus-bk-naca.1/arch/ppc64/kernel/pacaData.c --- linus-bk/arch/ppc64/kernel/pacaData.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/pacaData.c 2004-12-31 
14:52:14.000000000 +1100 @@ -10,6 +10,8 @@ #include #include #include +#include + #include #include #include @@ -20,7 +22,9 @@ #include struct naca_struct *naca; +EXPORT_SYMBOL(naca); struct systemcfg *systemcfg; +EXPORT_SYMBOL(systemcfg); /* This symbol is provided by the linker - let it fill in the paca * field correctly */ diff -ruN linus-bk/arch/ppc64/kernel/pmac_setup.c linus-bk-naca.1/arch/ppc64/kernel/pmac_setup.c --- linus-bk/arch/ppc64/kernel/pmac_setup.c 2004-10-25 18:18:33.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/pmac_setup.c 2004-12-31 14:52:14.000000000 +1100 @@ -70,6 +70,7 @@ #include #include #include +#include #include "pmac.h" #include "mpic.h" diff -ruN linus-bk/arch/ppc64/kernel/ppc_ksyms.c linus-bk-naca.1/arch/ppc64/kernel/ppc_ksyms.c --- linus-bk/arch/ppc64/kernel/ppc_ksyms.c 2004-10-21 07:17:18.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/ppc_ksyms.c 2004-12-31 14:52:14.000000000 +1100 @@ -67,7 +67,6 @@ EXPORT_SYMBOL(__down_interruptible); EXPORT_SYMBOL(__up); -EXPORT_SYMBOL(naca); EXPORT_SYMBOL(__down); #ifdef CONFIG_PPC_ISERIES EXPORT_SYMBOL(itLpNaca); @@ -162,4 +161,3 @@ EXPORT_SYMBOL(tb_ticks_per_usec); EXPORT_SYMBOL(paca); EXPORT_SYMBOL(cur_cpu_spec); -EXPORT_SYMBOL(systemcfg); diff -ruN linus-bk/arch/ppc64/kernel/rtas-proc.c linus-bk-naca.1/arch/ppc64/kernel/rtas-proc.c --- linus-bk/arch/ppc64/kernel/rtas-proc.c 2004-10-21 07:17:18.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/rtas-proc.c 2004-12-31 14:52:14.000000000 +1100 @@ -31,6 +31,7 @@ #include #include /* for ppc_md */ #include +#include /* Token for Sensors */ #define KEY_SWITCH 0x0001 diff -ruN linus-bk/arch/ppc64/kernel/rtas.c linus-bk-naca.1/arch/ppc64/kernel/rtas.c --- linus-bk/arch/ppc64/kernel/rtas.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/rtas.c 2004-12-31 14:52:14.000000000 +1100 @@ -29,6 +29,7 @@ #include #include #include +#include struct flash_block_list_header rtas_firmware_flash_list = {0, NULL}; diff 
-ruN linus-bk/arch/ppc64/kernel/rtasd.c linus-bk-naca.1/arch/ppc64/kernel/rtasd.c --- linus-bk/arch/ppc64/kernel/rtasd.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/rtasd.c 2004-12-31 14:52:14.000000000 +1100 @@ -26,6 +26,7 @@ #include #include #include +#include #if 0 #define DEBUG(A...) printk(KERN_ERR A) diff -ruN linus-bk/arch/ppc64/kernel/setup.c linus-bk-naca.1/arch/ppc64/kernel/setup.c --- linus-bk/arch/ppc64/kernel/setup.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/setup.c 2004-12-31 16:22:00.000000000 +1100 @@ -54,6 +54,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -111,6 +112,8 @@ int boot_cpuid_phys = 0; dev_t boot_dev; +struct ppc64_caches ppc64_caches; + /* * These are used in binfmt_elf.c to put aux entries on the stack * for each elf executable being started. @@ -489,15 +492,15 @@ lsizep = (u32 *) get_property(np, dc, NULL); if (lsizep != NULL) lsize = *lsizep; - if (sizep == 0 || lsizep == 0) DBG("Argh, can't find dcache properties ! " "sizep: %p, lsizep: %p\n", sizep, lsizep); - systemcfg->dCacheL1Size = size; - systemcfg->dCacheL1LineSize = lsize; - naca->dCacheL1LogLineSize = __ilog2(lsize); - naca->dCacheL1LinesPerPage = PAGE_SIZE/(lsize); + systemcfg->dcache_size = ppc64_caches.dsize = size; + systemcfg->dcache_line_size = + ppc64_caches.dline_size = lsize; + ppc64_caches.log_dline_size = __ilog2(lsize); + ppc64_caches.dlines_per_page = PAGE_SIZE / lsize; size = 0; lsize = cur_cpu_spec->icache_bsize; @@ -511,11 +514,11 @@ DBG("Argh, can't find icache properties ! 
" "sizep: %p, lsizep: %p\n", sizep, lsizep); - systemcfg->iCacheL1Size = size; - systemcfg->iCacheL1LineSize = lsize; - naca->iCacheL1LogLineSize = __ilog2(lsize); - naca->iCacheL1LinesPerPage = PAGE_SIZE/(lsize); - + systemcfg->icache_size = ppc64_caches.isize = size; + systemcfg->icache_line_size = + ppc64_caches.iline_size = lsize; + ppc64_caches.log_iline_size = __ilog2(lsize); + ppc64_caches.ilines_per_page = PAGE_SIZE / lsize; } } @@ -664,8 +667,10 @@ printk("systemcfg->platform = 0x%x\n", systemcfg->platform); printk("systemcfg->processorCount = 0x%lx\n", systemcfg->processorCount); printk("systemcfg->physicalMemorySize = 0x%lx\n", systemcfg->physicalMemorySize); - printk("systemcfg->dCacheL1LineSize = 0x%x\n", systemcfg->dCacheL1LineSize); - printk("systemcfg->iCacheL1LineSize = 0x%x\n", systemcfg->iCacheL1LineSize); + printk("ppc64_caches.dcache_line_size = 0x%x\n", + ppc64_caches.dline_size); + printk("ppc64_caches.icache_line_size = 0x%x\n", + ppc64_caches.iline_size); printk("htab_data.htab = 0x%p\n", htab_data.htab); printk("htab_data.num_ptegs = 0x%lx\n", htab_data.htab_num_ptegs); printk("-----------------------------------------------------\n"); @@ -1000,8 +1005,8 @@ * Systems with OF can look in the properties on the cpu node(s) * for a possibly more accurate value. 
*/ - dcache_bsize = systemcfg->dCacheL1LineSize; - icache_bsize = systemcfg->iCacheL1LineSize; + dcache_bsize = ppc64_caches.dline_size; + icache_bsize = ppc64_caches.iline_size; /* reboot on panic */ panic_timeout = 180; diff -ruN linus-bk/arch/ppc64/kernel/sys_ppc32.c linus-bk-naca.1/arch/ppc64/kernel/sys_ppc32.c --- linus-bk/arch/ppc64/kernel/sys_ppc32.c 2004-10-28 16:57:54.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/sys_ppc32.c 2004-12-31 14:52:14.000000000 +1100 @@ -73,6 +73,7 @@ #include #include #include +#include #include "pci.h" diff -ruN linus-bk/arch/ppc64/kernel/sysfs.c linus-bk-naca.1/arch/ppc64/kernel/sysfs.c --- linus-bk/arch/ppc64/kernel/sysfs.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/sysfs.c 2004-12-31 14:52:14.000000000 +1100 @@ -13,6 +13,7 @@ #include #include #include +#include /* SMT stuff */ diff -ruN linus-bk/arch/ppc64/kernel/time.c linus-bk-naca.1/arch/ppc64/kernel/time.c --- linus-bk/arch/ppc64/kernel/time.c 2004-10-21 07:17:18.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/time.c 2004-12-31 14:52:14.000000000 +1100 @@ -66,6 +66,7 @@ #include #include #include +#include void smp_local_timer_interrupt(struct pt_regs *); diff -ruN linus-bk/arch/ppc64/kernel/traps.c linus-bk-naca.1/arch/ppc64/kernel/traps.c --- linus-bk/arch/ppc64/kernel/traps.c 2004-09-09 09:59:49.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/traps.c 2004-12-31 14:52:14.000000000 +1100 @@ -37,6 +37,7 @@ #include #include #include +#include #ifdef CONFIG_PPC_PSERIES /* This is true if we are using the firmware NMI handler (typically LPAR) */ diff -ruN linus-bk/include/asm-ppc64/cache.h linus-bk-naca.1/include/asm-ppc64/cache.h --- linus-bk/include/asm-ppc64/cache.h 2002-08-28 06:04:10.000000000 +1000 +++ linus-bk-naca.1/include/asm-ppc64/cache.h 2004-12-31 14:52:14.000000000 +1100 @@ -7,6 +7,8 @@ #ifndef __ARCH_PPC64_CACHE_H #define __ARCH_PPC64_CACHE_H +#include + /* bytes per L1 cache line */ #define L1_CACHE_SHIFT 
7 #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT) @@ -14,4 +16,21 @@ #define SMP_CACHE_BYTES L1_CACHE_BYTES #define L1_CACHE_SHIFT_MAX 7 /* largest L1 which this arch supports */ +#ifndef __ASSEMBLY__ + +struct ppc64_caches { + u32 dsize; /* L1 d-cache size */ + u32 dline_size; /* L1 d-cache line size */ + u32 log_dline_size; + u32 dlines_per_page; + u32 isize; /* L1 i-cache size */ + u32 iline_size; /* L1 i-cache line size */ + u32 log_iline_size; + u32 ilines_per_page; +}; + +extern struct ppc64_caches ppc64_caches; + +#endif + #endif diff -ruN linus-bk/include/asm-ppc64/naca.h linus-bk-naca.1/include/asm-ppc64/naca.h --- linus-bk/include/asm-ppc64/naca.h 2004-09-16 21:51:58.000000000 +1000 +++ linus-bk-naca.1/include/asm-ppc64/naca.h 2004-12-31 14:52:14.000000000 +1100 @@ -16,11 +16,7 @@ #ifndef __ASSEMBLY__ struct naca_struct { - /*================================================================== - * Cache line 1: 0x0000 - 0x007F - * Kernel only data - undefined for user space - *================================================================== - */ + /* Kernel only data - undefined for user space */ void *xItVpdAreas; /* VPD Data 0x00 */ void *xRamDisk; /* iSeries ramdisk 0x08 */ u64 xRamDiskSize; /* In pages 0x10 */ @@ -32,12 +28,6 @@ u64 interrupt_controller; /* Type of int controller 0x40 */ u64 unused1; /* was SLB size in entries 0x48 */ u64 pftSize; /* Log 2 of page table size 0x50 */ - void *systemcfg; /* Pointer to systemcfg data 0x58 */ - u32 dCacheL1LogLineSize; /* L1 d-cache line size Log2 0x60 */ - u32 dCacheL1LinesPerPage; /* L1 d-cache lines / page 0x64 */ - u32 iCacheL1LogLineSize; /* L1 i-cache line size Log2 0x68 */ - u32 iCacheL1LinesPerPage; /* L1 i-cache lines / page 0x6c */ - u8 resv0[15]; /* Reserved 0x71 - 0x7F */ }; extern struct naca_struct *naca; diff -ruN linus-bk/include/asm-ppc64/page.h linus-bk-naca.1/include/asm-ppc64/page.h --- linus-bk/include/asm-ppc64/page.h 2004-10-29 07:03:22.000000000 +1000 +++ 
linus-bk-naca.1/include/asm-ppc64/page.h 2004-12-31 14:52:14.000000000 +1100 @@ -93,7 +93,7 @@ #ifdef __KERNEL__ #ifndef __ASSEMBLY__ -#include +#include #undef STRICT_MM_TYPECHECKS @@ -106,8 +106,8 @@ { unsigned long lines, line_size; - line_size = systemcfg->dCacheL1LineSize; - lines = naca->dCacheL1LinesPerPage; + line_size = ppc64_caches.dline_size; + lines = ppc64_caches.dlines_per_page; __asm__ __volatile__( "mtctr %1 # clear_page\n\ diff -ruN linus-bk/include/asm-ppc64/processor.h linus-bk-naca.1/include/asm-ppc64/processor.h --- linus-bk/include/asm-ppc64/processor.h 2004-12-29 18:05:40.000000000 +1100 +++ linus-bk-naca.1/include/asm-ppc64/processor.h 2004-12-31 15:01:17.000000000 +1100 @@ -19,6 +19,7 @@ #endif #include #include +#include /* Machine State Register (MSR) Fields */ #define MSR_SF_LG 63 /* Enable 64 bit mode */ diff -ruN linus-bk/include/asm-ppc64/systemcfg.h linus-bk-naca.1/include/asm-ppc64/systemcfg.h --- linus-bk/include/asm-ppc64/systemcfg.h 2004-09-29 08:25:16.000000000 +1000 +++ linus-bk-naca.1/include/asm-ppc64/systemcfg.h 2004-12-31 14:52:14.000000000 +1100 @@ -15,14 +15,6 @@ * End Change Activity */ - -#ifndef __KERNEL__ -#include -#include -#include -#include -#endif - /* * If the major version changes we are incompatible. * Minor version changes are a hint. 
@@ -50,10 +42,11 @@ __u64 tb_update_count; /* Timebase atomicity ctr 0x50 */ __u32 tz_minuteswest; /* Minutes west of Greenwich 0x58 */ __u32 tz_dsttime; /* Type of dst correction 0x5C */ - __u32 dCacheL1Size; /* L1 d-cache size 0x60 */ - __u32 dCacheL1LineSize; /* L1 d-cache line size 0x64 */ - __u32 iCacheL1Size; /* L1 i-cache size 0x68 */ - __u32 iCacheL1LineSize; /* L1 i-cache line size 0x6C */ + /* next four are no longer used except to be exported to /proc */ + __u32 dcache_size; /* L1 d-cache size 0x60 */ + __u32 dcache_line_size; /* L1 d-cache line size 0x64 */ + __u32 icache_size; /* L1 i-cache size 0x68 */ + __u32 icache_line_size; /* L1 i-cache line size 0x6C */ __u8 reserved0[3984]; /* Reserve rest of page 0x70 */ }; -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/ee27c294/attachment.pgp From sfr at canb.auug.org.au Tue Jan 4 15:08:33 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:08:33 +1100 Subject: [PATCH 2/11] PPC64: remove the page table size from the naca In-Reply-To: <20050104150410.199b132e.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> Message-ID: <20050104150833.5d3f3722.sfr@canb.auug.org.au> Hi Andrew, This patch just removes the page table size field from the naca (and makes it ppc64_pft_size instead). 
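
The arithmetic that ppc64_pft_size feeds into (see the prom.c and hash_utils.c hunks of this patch) can be sketched in user space as below. This is a sketch under assumptions rather than the kernel code: ilog2_u64() stands in for __ilog2(), the helper names default_pft_size(), htab_size_bytes() and htab_pteg_count() are hypothetical, and the round-up step is my reading of the "round mem_size up to next power of 2" comment, since the diff context does not show that code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for __ilog2(): floor(log2(v)), 0 for v == 0. */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		++n;
	return n;
}

/* Hypothetical helper: default log2 of the hash table size in
 * bytes when firmware gave no ibm,pft-size value. */
static uint64_t default_pft_size(uint64_t mem_size)
{
	uint64_t rnd_mem_size, pteg_count;

	assert(mem_size != 0);

	/* round mem_size up to next power of 2 (assumed implementation) */
	rnd_mem_size = 1ULL << ilog2_u64(mem_size);
	if (rnd_mem_size < mem_size)
		rnd_mem_size <<= 1;

	/* # pages / 2, with 4K pages (shift of 12) */
	pteg_count = rnd_mem_size >> (12 + 1);

	/* each PTEG is 128 bytes, hence the shift by 7 */
	return ilog2_u64(pteg_count << 7);
}

/* Hash table size in bytes, as in htab_size_bytes = 1UL << ppc64_pft_size. */
static uint64_t htab_size_bytes(uint64_t pft_size)
{
	return 1ULL << pft_size;
}

/* Number of PTEGs, as in pteg_count = htab_size_bytes >> 7. */
static uint64_t htab_pteg_count(uint64_t pft_size)
{
	return htab_size_bytes(pft_size) >> 7;
}
```

For 1GB of RAM this gives a size of 24, that is a 16MB hash table holding 131072 PTEGs, one for every two 4K pages.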
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.1/arch/ppc64/kernel/pSeries_lpar.c linus-bk-naca.2/arch/ppc64/kernel/pSeries_lpar.c --- linus-bk-naca.1/arch/ppc64/kernel/pSeries_lpar.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.2/arch/ppc64/kernel/pSeries_lpar.c 2004-12-31 15:16:48.000000000 +1100 @@ -33,7 +33,6 @@ #include #include #include -#include #include #include #include @@ -368,7 +367,7 @@ static void pSeries_lpar_hptab_clear(void) { - unsigned long size_bytes = 1UL << naca->pftSize; + unsigned long size_bytes = 1UL << ppc64_pft_size; unsigned long hpte_count = size_bytes >> 4; unsigned long dummy1, dummy2; int i; diff -ruN linus-bk-naca.1/arch/ppc64/kernel/prom.c linus-bk-naca.2/arch/ppc64/kernel/prom.c --- linus-bk-naca.1/arch/ppc64/kernel/prom.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.2/arch/ppc64/kernel/prom.c 2004-12-31 14:52:56.000000000 +1100 @@ -844,12 +844,12 @@ /* On LPAR, look for the first ibm,pft-size property for the hash table size */ - if (systemcfg->platform == PLATFORM_PSERIES_LPAR && naca->pftSize == 0) { + if (systemcfg->platform == PLATFORM_PSERIES_LPAR && ppc64_pft_size == 0) { u32 *pft_size; pft_size = (u32 *)get_flat_dt_prop(node, "ibm,pft-size", NULL); if (pft_size != NULL) { /* pft_size[0] is the NUMA CEC cookie */ - naca->pftSize = pft_size[1]; + ppc64_pft_size = pft_size[1]; } } @@ -1018,7 +1018,7 @@ initial_boot_params = params; /* By default, hash size is not set */ - naca->pftSize = 0; + ppc64_pft_size = 0; /* Retreive various informations from the /chosen node of the * device-tree, including the platform type, initrd location and @@ -1047,7 +1047,7 @@ /* If hash size wasn't obtained above, we calculate it now based on * the total RAM size */ - if (naca->pftSize == 0) { + if (ppc64_pft_size == 0) { unsigned long rnd_mem_size, pteg_count; /* round mem_size up to next power of 2 */ @@ -1058,10 +1058,10 @@ /* # 
pages / 2 */ pteg_count = (rnd_mem_size >> (12 + 1)); - naca->pftSize = __ilog2(pteg_count << 7); + ppc64_pft_size = __ilog2(pteg_count << 7); } - DBG("Hash pftSize: %x\n", (int)naca->pftSize); + DBG("Hash pftSize: %x\n", (int)ppc64_pft_size); DBG(" <- early_init_devtree()\n"); } diff -ruN linus-bk-naca.1/arch/ppc64/kernel/setup.c linus-bk-naca.2/arch/ppc64/kernel/setup.c --- linus-bk-naca.1/arch/ppc64/kernel/setup.c 2004-12-31 16:22:00.000000000 +1100 +++ linus-bk-naca.2/arch/ppc64/kernel/setup.c 2004-12-31 16:22:49.000000000 +1100 @@ -55,6 +55,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -111,6 +112,7 @@ int boot_cpuid = 0; int boot_cpuid_phys = 0; dev_t boot_dev; +u64 ppc64_pft_size; struct ppc64_caches ppc64_caches; @@ -660,7 +662,7 @@ printk("-----------------------------------------------------\n"); printk("naca = 0x%p\n", naca); - printk("naca->pftSize = 0x%lx\n", naca->pftSize); + printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); printk("naca->debug_switch = 0x%lx\n", naca->debug_switch); printk("naca->interrupt_controller = 0x%ld\n", naca->interrupt_controller); printk("systemcfg = 0x%p\n", systemcfg); diff -ruN linus-bk-naca.1/arch/ppc64/mm/hash_utils.c linus-bk-naca.2/arch/ppc64/mm/hash_utils.c --- linus-bk-naca.1/arch/ppc64/mm/hash_utils.c 2004-10-29 07:03:21.000000000 +1000 +++ linus-bk-naca.2/arch/ppc64/mm/hash_utils.c 2004-12-31 14:52:56.000000000 +1100 @@ -41,7 +41,6 @@ #include #include #include -#include #include #include #include @@ -147,7 +146,7 @@ * Calculate the required size of the htab. We want the number of * PTEGs to equal one half the number of real pages. */ - htab_size_bytes = 1UL << naca->pftSize; + htab_size_bytes = 1UL << ppc64_pft_size; pteg_count = htab_size_bytes >> 7; /* For debug, make the HTAB 1/8 as big as it normally would be. 
*/ diff -ruN linus-bk-naca.1/include/asm-ppc64/naca.h linus-bk-naca.2/include/asm-ppc64/naca.h --- linus-bk-naca.1/include/asm-ppc64/naca.h 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.2/include/asm-ppc64/naca.h 2004-12-31 14:52:56.000000000 +1100 @@ -26,8 +26,6 @@ u64 log; /* Ptr to log buffer 0x30 */ u64 serialPortAddr; /* Phy addr of serial port 0x38 */ u64 interrupt_controller; /* Type of int controller 0x40 */ - u64 unused1; /* was SLB size in entries 0x48 */ - u64 pftSize; /* Log 2 of page table size 0x50 */ }; extern struct naca_struct *naca; diff -ruN linus-bk-naca.1/include/asm-ppc64/page.h linus-bk-naca.2/include/asm-ppc64/page.h --- linus-bk-naca.1/include/asm-ppc64/page.h 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.2/include/asm-ppc64/page.h 2004-12-31 14:52:56.000000000 +1100 @@ -183,6 +183,8 @@ extern int page_is_ram(unsigned long pfn); +extern u64 ppc64_pft_size; /* Log 2 of page table size */ + #endif /* __ASSEMBLY__ */ #ifdef MODULE -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/29fa9153/attachment.pgp From sfr at canb.auug.org.au Tue Jan 4 15:12:29 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:12:29 +1100 Subject: [PATCH 3/11] PPC64: remove interrupt_controller from naca In-Reply-To: <20050104150833.5d3f3722.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> Message-ID: <20050104151229.521e8083.sfr@canb.auug.org.au> Hi Andrew, This patch just moves the interrupt_controller field of the naca into a global variable. 
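
A small user-space sketch of the pattern, for illustration only. The global's name and the "open-pic" / "ppc-xicp" strings come from this patch. The numeric values of the IC_* constants, the NULL check, and the helper classify_interrupt_controller() are assumptions of the sketch and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values; the patch only uses the names. */
enum {
	IC_INVALID  = 0,
	IC_OPEN_PIC = 1,
	IC_PPC_XIC  = 2
};

/* The former naca field, now a plain global as in the patch. */
static uint64_t ppc64_interrupt_controller = IC_INVALID;

/* Hypothetical helper: map a device-tree "compatible" string to
 * a controller type using the same substring tests as the
 * pSeries setup code. The NULL check is an addition of this sketch. */
static int classify_interrupt_controller(const char *compatible)
{
	if (compatible == NULL)
		return IC_INVALID;
	if (strstr(compatible, "open-pic"))
		return IC_OPEN_PIC;
	if (strstr(compatible, "ppc-xicp"))
		return IC_PPC_XIC;
	return IC_INVALID;
}

/* Hypothetical helper: record the detected controller type. */
static void set_interrupt_controller(int ic)
{
	assert(ic == IC_INVALID || ic == IC_OPEN_PIC || ic == IC_PPC_XIC);
	ppc64_interrupt_controller = (uint64_t)ic;
}
```

Callers then test the global directly, for example ppc64_interrupt_controller == IC_OPEN_PIC, with no naca pointer dereference.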

Signed-off-by: Stephen Rothwell
--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.2/arch/ppc64/kernel/irq.c linus-bk-naca.3/arch/ppc64/kernel/irq.c
--- linus-bk-naca.2/arch/ppc64/kernel/irq.c 2004-10-21 07:17:18.000000000 +1000
+++ linus-bk-naca.3/arch/ppc64/kernel/irq.c 2004-12-31 14:53:21.000000000 +1100
@@ -65,6 +65,7 @@
 int __irq_offset_value;
 int ppc_spurious_interrupts;
 unsigned long lpevent_count;
+u64 ppc64_interrupt_controller;
 int show_interrupts(struct seq_file *p, void *v)
 {
@@ -360,7 +361,7 @@
 unsigned int virq, first_virq;
 static int warned;
- if (naca->interrupt_controller == IC_OPEN_PIC)
+ if (ppc64_interrupt_controller == IC_OPEN_PIC)
 return real_irq; /* no mapping for openpic (for now) */
 /* don't map interrupts < MIN_VIRT_IRQ */
diff -ruN linus-bk-naca.2/arch/ppc64/kernel/maple_setup.c linus-bk-naca.3/arch/ppc64/kernel/maple_setup.c
--- linus-bk-naca.2/arch/ppc64/kernel/maple_setup.c 2004-10-30 08:33:22.000000000 +1000
+++ linus-bk-naca.3/arch/ppc64/kernel/maple_setup.c 2004-12-31 14:53:21.000000000 +1100
@@ -155,7 +155,7 @@
 }
 /* Setup interrupt mapping options */
- naca->interrupt_controller = IC_OPEN_PIC;
+ ppc64_interrupt_controller = IC_OPEN_PIC;
 DBG(" <- maple_init_early\n");
 }
diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pSeries_pci.c linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c
--- linus-bk-naca.2/arch/ppc64/kernel/pSeries_pci.c 2004-11-16 16:05:10.000000000 +1100
+++ linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c 2004-12-31 14:53:21.000000000 +1100
@@ -353,7 +353,7 @@
 unsigned int *opprop = NULL;
 struct device_node *root = of_find_node_by_path("/");
- if (naca->interrupt_controller == IC_OPEN_PIC) {
+ if (ppc64_interrupt_controller == IC_OPEN_PIC) {
 opprop = (unsigned int *)get_property(root, "platform-open-pic", NULL);
 }
@@ -375,7 +375,7 @@
 pci_process_bridge_OF_ranges(phb, node);
 pci_setup_phb_io(phb, index == 0);
- if (naca->interrupt_controller == IC_OPEN_PIC &&
pSeries_mpic) { + if (ppc64_interrupt_controller == IC_OPEN_PIC && pSeries_mpic) { int addr = root_size_cells * (index + 2) - 1; mpic_assign_isu(pSeries_mpic, index, opprop[addr]); } diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pSeries_setup.c linus-bk-naca.3/arch/ppc64/kernel/pSeries_setup.c --- linus-bk-naca.2/arch/ppc64/kernel/pSeries_setup.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/pSeries_setup.c 2004-12-31 15:22:17.000000000 +1100 @@ -196,7 +196,7 @@ static void __init pSeries_setup_arch(void) { /* Fixup ppc_md depending on the type of interrupt controller */ - if (naca->interrupt_controller == IC_OPEN_PIC) { + if (ppc64_interrupt_controller == IC_OPEN_PIC) { ppc_md.init_IRQ = pSeries_init_mpic; ppc_md.get_irq = mpic_get_irq; /* Allocate the mpic now, so that find_and_init_phbs() can @@ -308,13 +308,13 @@ * to properly parse the OF interrupt tree & do the virtual irq mapping */ __irq_offset_value = NUM_ISA_INTERRUPTS; - naca->interrupt_controller = IC_INVALID; + ppc64_interrupt_controller = IC_INVALID; for (np = NULL; (np = of_find_node_by_name(np, "interrupt-controller"));) { typep = (char *)get_property(np, "compatible", NULL); if (strstr(typep, "open-pic")) - naca->interrupt_controller = IC_OPEN_PIC; + ppc64_interrupt_controller = IC_OPEN_PIC; else if (strstr(typep, "ppc-xicp")) - naca->interrupt_controller = IC_PPC_XIC; + ppc64_interrupt_controller = IC_PPC_XIC; else printk("initialize_naca: failed to recognize" " interrupt-controller\n"); diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pSeries_smp.c linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c --- linus-bk-naca.2/arch/ppc64/kernel/pSeries_smp.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c 2004-12-31 15:22:45.000000000 +1100 @@ -348,7 +348,7 @@ DBG(" -> smp_init_pSeries()\n"); - if (naca->interrupt_controller == IC_OPEN_PIC) + if (ppc64_interrupt_controller == IC_OPEN_PIC) smp_ops = &pSeries_mpic_smp_ops; else smp_ops = 
&pSeries_xics_smp_ops; diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pmac_setup.c linus-bk-naca.3/arch/ppc64/kernel/pmac_setup.c --- linus-bk-naca.2/arch/ppc64/kernel/pmac_setup.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/pmac_setup.c 2004-12-31 14:53:21.000000000 +1100 @@ -70,7 +70,6 @@ #include #include #include -#include #include "pmac.h" #include "mpic.h" @@ -316,7 +315,7 @@ } /* Setup interrupt mapping options */ - naca->interrupt_controller = IC_OPEN_PIC; + ppc64_interrupt_controller = IC_OPEN_PIC; DBG(" <- pmac_init_early\n"); } diff -ruN linus-bk-naca.2/arch/ppc64/kernel/prom.c linus-bk-naca.3/arch/ppc64/kernel/prom.c --- linus-bk-naca.2/arch/ppc64/kernel/prom.c 2004-12-31 14:52:56.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/prom.c 2004-12-31 14:53:21.000000000 +1100 @@ -44,7 +44,6 @@ #include #include #include -#include #include #include #include @@ -557,7 +556,7 @@ DBG(" -> finish_device_tree\n"); - if (naca->interrupt_controller == IC_INVALID) { + if (ppc64_interrupt_controller == IC_INVALID) { DBG("failed to configure interrupt controller type\n"); panic("failed to configure interrupt controller type\n"); } diff -ruN linus-bk-naca.2/arch/ppc64/kernel/setup.c linus-bk-naca.3/arch/ppc64/kernel/setup.c --- linus-bk-naca.2/arch/ppc64/kernel/setup.c 2004-12-31 16:22:49.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/setup.c 2004-12-31 16:23:03.000000000 +1100 @@ -664,7 +664,7 @@ printk("naca = 0x%p\n", naca); printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); printk("naca->debug_switch = 0x%lx\n", naca->debug_switch); - printk("naca->interrupt_controller = 0x%ld\n", naca->interrupt_controller); + printk("ppc64_interrupt_controller = 0x%ld\n", ppc64_interrupt_controller); printk("systemcfg = 0x%p\n", systemcfg); printk("systemcfg->platform = 0x%x\n", systemcfg->platform); printk("systemcfg->processorCount = 0x%lx\n", systemcfg->processorCount); diff -ruN linus-bk-naca.2/arch/ppc64/kernel/xics.c 
linus-bk-naca.3/arch/ppc64/kernel/xics.c --- linus-bk-naca.2/arch/ppc64/kernel/xics.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/xics.c 2004-12-31 15:24:20.000000000 +1100 @@ -24,7 +24,6 @@ #include #include #include -#include #include #include #include @@ -575,7 +574,7 @@ */ static int __init xics_setup_i8259(void) { - if (naca->interrupt_controller == IC_PPC_XIC && + if (ppc64_interrupt_controller == IC_PPC_XIC && xics_irq_8259_cascade != -1) { if (request_irq(irq_offset_up(xics_irq_8259_cascade), no_action, 0, "8259 cascade", NULL)) diff -ruN linus-bk-naca.2/include/asm-ppc64/naca.h linus-bk-naca.3/include/asm-ppc64/naca.h --- linus-bk-naca.2/include/asm-ppc64/naca.h 2004-12-31 14:52:56.000000000 +1100 +++ linus-bk-naca.3/include/asm-ppc64/naca.h 2004-12-31 14:53:21.000000000 +1100 @@ -25,7 +25,6 @@ u64 banner; /* Ptr to banner string 0x28 */ u64 log; /* Ptr to log buffer 0x30 */ u64 serialPortAddr; /* Phy addr of serial port 0x38 */ - u64 interrupt_controller; /* Type of int controller 0x40 */ }; extern struct naca_struct *naca; diff -ruN linus-bk-naca.2/include/asm-ppc64/processor.h linus-bk-naca.3/include/asm-ppc64/processor.h --- linus-bk-naca.2/include/asm-ppc64/processor.h 2004-12-31 15:01:17.000000000 +1100 +++ linus-bk-naca.3/include/asm-ppc64/processor.h 2004-12-31 15:25:17.000000000 +1100 @@ -484,6 +484,7 @@ #ifdef __KERNEL__ extern int have_of; +extern u64 ppc64_interrupt_controller; struct task_struct; void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp); -------------- next part -------------- A non-text attachment was scrubbed... Name: 00000000.mimetmp Type: application/pgp-signature Size: 190 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/d720c248/attachment.pgp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/d720c248/attachment-0001.pgp

From sfr at canb.auug.org.au Tue Jan 4 15:19:06 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:19:06 +1100
Subject: [PATCH 4/11] PPC64: remove /proc/ppc64/{naca,paca/xx}
In-Reply-To: <20050104151229.521e8083.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au>
Message-ID: <20050104151906.6e50f1d2.sfr@canb.auug.org.au>

Hi Andrew,

This patch removes the (unused) /proc entries for the naca and the (per cpu) pacas. Also it removes a lot of no longer necessary includes of .

Signed-off-by: Stephen Rothwell
--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.3/arch/ppc64/kernel/iSeries_pci.c linus-bk-naca.4/arch/ppc64/kernel/iSeries_pci.c
--- linus-bk-naca.3/arch/ppc64/kernel/iSeries_pci.c 2004-11-16 16:05:10.000000000 +1100
+++ linus-bk-naca.4/arch/ppc64/kernel/iSeries_pci.c 2004-12-10 16:26:54.000000000 +1100
@@ -35,7 +35,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
diff -ruN linus-bk-naca.3/arch/ppc64/kernel/iSeries_proc.c linus-bk-naca.4/arch/ppc64/kernel/iSeries_proc.c
--- linus-bk-naca.3/arch/ppc64/kernel/iSeries_proc.c 2004-10-22 07:00:21.000000000 +1000
+++ linus-bk-naca.4/arch/ppc64/kernel/iSeries_proc.c 2004-12-10 16:26:54.000000000 +1100
@@ -24,7 +24,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
diff -ruN linus-bk-naca.3/arch/ppc64/kernel/iSeries_smp.c linus-bk-naca.4/arch/ppc64/kernel/iSeries_smp.c
--- linus-bk-naca.3/arch/ppc64/kernel/iSeries_smp.c 2004-10-30 08:33:22.000000000 +1000
+++ linus-bk-naca.4/arch/ppc64/kernel/iSeries_smp.c 2004-12-10 16:26:54.000000000 +1100
@@ -37,7 +37,6
@@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c linus-bk-naca.4/arch/ppc64/kernel/pSeries_pci.c --- linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c 2004-12-31 14:53:21.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/kernel/pSeries_pci.c 2004-12-10 16:26:54.000000000 +1100 @@ -36,7 +36,6 @@ #include #include #include -#include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c linus-bk-naca.4/arch/ppc64/kernel/pSeries_smp.c --- linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c 2004-12-31 15:22:45.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/kernel/pSeries_smp.c 2004-12-31 15:27:45.000000000 +1100 @@ -38,7 +38,6 @@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/kernel/pci_dn.c linus-bk-naca.4/arch/ppc64/kernel/pci_dn.c --- linus-bk-naca.3/arch/ppc64/kernel/pci_dn.c 2004-10-25 18:18:33.000000000 +1000 +++ linus-bk-naca.4/arch/ppc64/kernel/pci_dn.c 2004-12-10 16:26:54.000000000 +1100 @@ -33,7 +33,6 @@ #include #include #include -#include #include #include "pci.h" diff -ruN linus-bk-naca.3/arch/ppc64/kernel/proc_ppc64.c linus-bk-naca.4/arch/ppc64/kernel/proc_ppc64.c --- linus-bk-naca.3/arch/ppc64/kernel/proc_ppc64.c 2004-10-27 07:32:57.000000000 +1000 +++ linus-bk-naca.4/arch/ppc64/kernel/proc_ppc64.c 2004-12-10 16:26:54.000000000 +1100 @@ -25,8 +25,6 @@ #include #include -#include -#include #include #include #include @@ -58,26 +56,6 @@ #endif /* - * NOTE: since paca data is always in flux the values will never be a - * consistant set. 
- */ -static void __init proc_create_paca(struct proc_dir_entry *dir, int num) -{ - struct proc_dir_entry *ent; - struct paca_struct *lpaca = paca + num; - char buf[16]; - - sprintf(buf, "%02x", num); - ent = create_proc_entry(buf, S_IRUSR, dir); - if (ent) { - ent->nlink = 1; - ent->data = lpaca; - ent->size = 4096; - ent->proc_fops = &page_map_fops; - } -} - -/* * Create the ppc64 and ppc64/rtas directories early. This allows us to * assume that they have been previously created in drivers. */ @@ -104,17 +82,8 @@ static int __init proc_ppc64_init(void) { - unsigned long i; struct proc_dir_entry *pde; - pde = create_proc_entry("ppc64/naca", S_IRUSR, NULL); - if (!pde) - return 1; - pde->nlink = 1; - pde->data = naca; - pde->size = 4096; - pde->proc_fops = &page_map_fops; - pde = create_proc_entry("ppc64/systemcfg", S_IFREG|S_IRUGO, NULL); if (!pde) return 1; @@ -123,13 +92,6 @@ pde->size = 4096; pde->proc_fops = &page_map_fops; - /* /proc/ppc64/paca/XX -- raw paca contents. Only readable to root */ - pde = proc_mkdir("ppc64/paca", NULL); - if (!pde) - return 1; - for_each_cpu(i) - proc_create_paca(pde, i); - #ifdef CONFIG_PPC_PSERIES if ((systemcfg->platform & PLATFORM_PSERIES)) proc_ppc64_create_ofdt(); diff -ruN linus-bk-naca.3/arch/ppc64/kernel/prom_init.c linus-bk-naca.4/arch/ppc64/kernel/prom_init.c --- linus-bk-naca.3/arch/ppc64/kernel/prom_init.c 2004-12-08 12:07:34.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/kernel/prom_init.c 2004-12-10 16:26:54.000000000 +1100 @@ -43,7 +43,6 @@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/kernel/smp.c linus-bk-naca.4/arch/ppc64/kernel/smp.c --- linus-bk-naca.3/arch/ppc64/kernel/smp.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/kernel/smp.c 2004-12-31 15:29:14.000000000 +1100 @@ -41,7 +41,6 @@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/mm/init.c linus-bk-naca.4/arch/ppc64/mm/init.c 
--- linus-bk-naca.3/arch/ppc64/mm/init.c 2004-11-04 16:05:08.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/mm/init.c 2004-12-10 16:26:54.000000000 +1100 @@ -52,7 +52,6 @@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/mm/slb.c linus-bk-naca.4/arch/ppc64/mm/slb.c --- linus-bk-naca.3/arch/ppc64/mm/slb.c 2004-09-06 10:19:04.000000000 +1000 +++ linus-bk-naca.4/arch/ppc64/mm/slb.c 2004-12-10 16:26:54.000000000 +1100 @@ -19,7 +19,6 @@ #include #include #include -#include #include extern void slb_allocate(unsigned long ea); diff -ruN linus-bk-naca.3/arch/ppc64/mm/stab.c linus-bk-naca.4/arch/ppc64/mm/stab.c --- linus-bk-naca.3/arch/ppc64/mm/stab.c 2004-09-16 21:51:57.000000000 +1000 +++ linus-bk-naca.4/arch/ppc64/mm/stab.c 2004-12-10 16:26:54.000000000 +1100 @@ -17,7 +17,6 @@ #include #include #include -#include #include /* Both the segment table and SLB code uses the following cache */ diff -ruN linus-bk-naca.3/include/asm-ppc64/iSeries/LparData.h linus-bk-naca.4/include/asm-ppc64/iSeries/LparData.h --- linus-bk-naca.3/include/asm-ppc64/iSeries/LparData.h 2002-09-18 12:00:50.000000000 +1000 +++ linus-bk-naca.4/include/asm-ppc64/iSeries/LparData.h 2004-12-10 16:26:54.000000000 +1100 @@ -24,11 +24,9 @@ #include #include -#include #include #include #include -#include #include #include #include -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/c799e8f4/attachment.pgp

From sfr at canb.auug.org.au Tue Jan 4 15:23:40 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:23:40 +1100
Subject: [PATCH 5/11] PPC64: remove the paca pointer from the naca
In-Reply-To: <20050104151906.6e50f1d2.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au>
Message-ID: <20050104152340.67219ccf.sfr@canb.auug.org.au>

Hi Andrew,

The only place that was using the paca pointer that was in the naca was some assembler that used it to find a parameter to pass to some C code. That C code did not even declare that parameter! Remove the paca pointer.

Signed-off-by: Stephen Rothwell
--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.4/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.5/arch/ppc64/kernel/asm-offsets.c
--- linus-bk-naca.4/arch/ppc64/kernel/asm-offsets.c 2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-naca.5/arch/ppc64/kernel/asm-offsets.c 2004-12-10 17:27:14.000000000 +1100
@@ -28,7 +28,6 @@
 #include
 #include
-#include
 #include
 #include
 #include
@@ -68,8 +67,6 @@
 #endif /* CONFIG_ALTIVEC */
 DEFINE(MM, offsetof(struct task_struct, mm));
- /* naca */
- DEFINE(PACA, offsetof(struct naca_struct, paca));
 DEFINE(DCACHEL1LINESIZE, offsetof(struct ppc64_caches, dline_size));
 DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_dline_size));
 DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, dlines_per_page));
diff -ruN linus-bk-naca.4/arch/ppc64/kernel/head.S linus-bk-naca.5/arch/ppc64/kernel/head.S
--- linus-bk-naca.4/arch/ppc64/kernel/head.S 2004-11-26
12:08:51.000000000 +1100 +++ linus-bk-naca.5/arch/ppc64/kernel/head.S 2004-12-10 18:40:24.000000000 +1100 @@ -517,12 +517,7 @@ __start_naca: #ifdef CONFIG_PPC_ISERIES .llong itVpdAreas -#else - .llong 0x0 #endif - .llong 0x0 - .llong 0x0 - .llong paca . = SYSTEMCFG_PHYS_ADDR .globl __end_naca @@ -1241,6 +1236,7 @@ #endif #endif b 3b /* Loop until told to go */ + #ifdef CONFIG_PPC_ISERIES _STATIC(__start_initialization_iSeries) /* Clear out the BSS */ @@ -1278,10 +1274,6 @@ SET_REG_TO_CONST(r4, NACA_VIRT_ADDR) std r4,0(r9) /* set the naca pointer */ - /* Get the pointer to the segment table */ - ld r6,PACA(r4) /* Get the base paca pointer */ - ld r4,PACASTABVIRT(r6) - bl .iSeries_early_setup /* relocation is on at this point */ diff -ruN linus-bk-naca.4/include/asm-ppc64/naca.h linus-bk-naca.5/include/asm-ppc64/naca.h --- linus-bk-naca.4/include/asm-ppc64/naca.h 2004-12-31 14:53:21.000000000 +1100 +++ linus-bk-naca.5/include/asm-ppc64/naca.h 2004-12-10 18:42:14.000000000 +1100 @@ -11,7 +11,6 @@ */ #include -#include #ifndef __ASSEMBLY__ @@ -20,7 +19,6 @@ void *xItVpdAreas; /* VPD Data 0x00 */ void *xRamDisk; /* iSeries ramdisk 0x08 */ u64 xRamDiskSize; /* In pages 0x10 */ - struct paca_struct *paca; /* Ptr to an array of pacas 0x18 */ u64 debug_switch; /* Debug print control 0x20 */ u64 banner; /* Ptr to banner string 0x28 */ u64 log; /* Ptr to log buffer 0x30 */ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/3e8e9116/attachment.pgp

From sfr at canb.auug.org.au Tue Jan 4 15:27:05 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:27:05 +1100
Subject: [PATCH 6/11] PPC64: remove serialPortAddr from the naca
In-Reply-To: <20050104152340.67219ccf.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au>
Message-ID: <20050104152705.6030abc5.sfr@canb.auug.org.au>

Hi Andrew,

The serialPortAddr field of the naca was only being used locally, so remove it.

Signed-off-by: Stephen Rothwell
--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.5/arch/ppc64/kernel/maple_setup.c linus-bk-naca.6/arch/ppc64/kernel/maple_setup.c
--- linus-bk-naca.5/arch/ppc64/kernel/maple_setup.c 2004-12-31 14:53:21.000000000 +1100
+++ linus-bk-naca.6/arch/ppc64/kernel/maple_setup.c 2004-12-11 00:53:42.000000000 +1100
@@ -75,7 +75,8 @@
 extern void maple_pci_init(void);
 extern void maple_pcibios_fixup(void);
 extern int maple_pci_get_legacy_ide_irq(struct pci_dev *dev, int channel);
-extern void generic_find_legacy_serial_ports(unsigned int *default_speed);
+extern void generic_find_legacy_serial_ports(u64 *physport,
+ unsigned int *default_speed);
 static void maple_restart(char *cmd)
@@ -129,6 +130,7 @@
 static void __init maple_init_early(void)
 {
 unsigned int default_speed;
+ u64 physport;
 DBG(" -> maple_init_early\n");
@@ -138,14 +140,14 @@
 hpte_init_native();
 /* Find the serial port */
- generic_find_legacy_serial_ports(&default_speed);
+ generic_find_legacy_serial_ports(&physport, &default_speed);
-
DBG("naca->serialPortAddr: %lx\n", (long)naca->serialPortAddr); + DBG("phys port addr: %lx\n", (long)physport); - if (naca->serialPortAddr) { + if (physport) { void *comport; /* Map the uart for udbg. */ - comport = (void *)__ioremap(naca->serialPortAddr, 16, _PAGE_NO_CACHE); + comport = (void *)__ioremap(physport, 16, _PAGE_NO_CACHE); udbg_init_uart(comport, default_speed); ppc_md.udbg_putc = udbg_putc; diff -ruN linus-bk-naca.5/arch/ppc64/kernel/pSeries_setup.c linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c --- linus-bk-naca.5/arch/ppc64/kernel/pSeries_setup.c 2004-12-31 15:22:17.000000000 +1100 +++ linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c 2004-12-31 15:35:13.000000000 +1100 @@ -81,7 +81,8 @@ extern int pSeries_set_rtc_time(struct rtc_time *rtc_time); extern void find_udbg_vterm(void); extern void SystemReset_FWNMI(void), MachineCheck_FWNMI(void); /* from head.S */ -extern void generic_find_legacy_serial_ports(unsigned int *default_speed); +extern void generic_find_legacy_serial_ports(u64 *physport, + unsigned int *default_speed); int fwnmi_active; /* TRUE if an FWNMI handler is present */ @@ -344,6 +345,7 @@ void *comport; int iommu_off = 0; unsigned int default_speed; + u64 physport; DBG(" -> pSeries_init_early()\n"); @@ -357,13 +359,13 @@ get_property(of_chosen, "linux,iommu-off", NULL)); } - generic_find_legacy_serial_ports(&default_speed); + generic_find_legacy_serial_ports(&physport, &default_speed); if (systemcfg->platform & PLATFORM_LPAR) find_udbg_vterm(); - else if (naca->serialPortAddr) { + else if (physport) { /* Map the uart for udbg. 
*/ - comport = (void *)__ioremap(naca->serialPortAddr, 16, _PAGE_NO_CACHE); + comport = (void *)__ioremap(physport, 16, _PAGE_NO_CACHE); udbg_init_uart(comport, default_speed); ppc_md.udbg_putc = udbg_putc; diff -ruN linus-bk-naca.5/arch/ppc64/kernel/setup.c linus-bk-naca.6/arch/ppc64/kernel/setup.c --- linus-bk-naca.5/arch/ppc64/kernel/setup.c 2004-12-31 16:24:54.000000000 +1100 +++ linus-bk-naca.6/arch/ppc64/kernel/setup.c 2004-12-31 16:23:30.000000000 +1100 @@ -1154,7 +1154,8 @@ static struct plat_serial8250_port serial_ports[MAX_LEGACY_SERIAL_PORTS+1]; static unsigned int old_serial_count; -void __init generic_find_legacy_serial_ports(unsigned int *default_speed) +void __init generic_find_legacy_serial_ports(u64 *physport, + unsigned int *default_speed) { struct device_node *np; u32 *sizeprop; @@ -1172,7 +1173,7 @@ DBG(" -> generic_find_legacy_serial_port()\n"); - naca->serialPortAddr = 0; + *physport = 0; if (default_speed) *default_speed = 0; @@ -1294,7 +1295,7 @@ io_base = (io_base << 32) | rangesp[4]; } if (io_base != 0) { - naca->serialPortAddr = io_base + reg->address; + *physport = io_base + reg->address; if (default_speed && spd) *default_speed = *spd; } diff -ruN linus-bk-naca.5/include/asm-ppc64/naca.h linus-bk-naca.6/include/asm-ppc64/naca.h --- linus-bk-naca.5/include/asm-ppc64/naca.h 2004-12-10 18:42:14.000000000 +1100 +++ linus-bk-naca.6/include/asm-ppc64/naca.h 2004-12-11 00:03:55.000000000 +1100 @@ -22,7 +22,6 @@ u64 debug_switch; /* Debug print control 0x20 */ u64 banner; /* Ptr to banner string 0x28 */ u64 log; /* Ptr to log buffer 0x30 */ - u64 serialPortAddr; /* Phy addr of serial port 0x38 */ }; extern struct naca_struct *naca; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/e06bdbb4/attachment.pgp

From sfr at canb.auug.org.au Tue Jan 4 15:31:02 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:31:02 +1100
Subject: [PATCH 7/11] PPC64: remove debug_switch from the naca
In-Reply-To: <20050104152705.6030abc5.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au>
Message-ID: <20050104153102.67284491.sfr@canb.auug.org.au>

Hi Andrew,

The patch moves the debug_switch from the naca to a global variable. Also, a couple of trivial naming tidy ups.

Signed-off-by: Stephen Rothwell
--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c linus-bk-naca.7/arch/ppc64/kernel/pSeries_setup.c
--- linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c 2004-12-31 15:35:13.000000000 +1100
+++ linus-bk-naca.7/arch/ppc64/kernel/pSeries_setup.c 2004-12-31 15:39:01.000000000 +1100
@@ -56,7 +56,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
@@ -317,7 +316,7 @@
 else if (strstr(typep, "ppc-xicp"))
 ppc64_interrupt_controller = IC_PPC_XIC;
 else
- printk("initialize_naca: failed to recognize"
+ printk("pSeries_discover_pic: failed to recognize"
 " interrupt-controller\n");
 break;
 }
diff -ruN linus-bk-naca.6/arch/ppc64/kernel/setup.c linus-bk-naca.7/arch/ppc64/kernel/setup.c
--- linus-bk-naca.6/arch/ppc64/kernel/setup.c 2004-12-31 16:23:30.000000000 +1100
+++ linus-bk-naca.7/arch/ppc64/kernel/setup.c 2004-12-31 16:25:02.000000000 +1100
@@ -41,7 +41,6 @@
 #include
 #include
 #include
-#include #include #include #include @@ -113,6 +112,7 @@ int boot_cpuid_phys = 0; dev_t boot_dev; u64 ppc64_pft_size; +u64 ppc64_debug_switch; struct ppc64_caches ppc64_caches; @@ -161,7 +161,7 @@ */ void __init ppcdbg_initialize(void) { - naca->debug_switch = PPC_DEBUG_DEFAULT; /* | PPCDBG_BUSWALK | */ + ppc64_debug_switch = PPC_DEBUG_DEFAULT; /* | PPCDBG_BUSWALK | */ /* PPCDBG_PHBINIT | PPCDBG_MM | PPCDBG_MMINIT | PPCDBG_TCEINIT | PPCDBG_TCE */; } @@ -399,7 +399,7 @@ DBG(" -> early_setup()\n"); /* - * Fill the default DBG level in naca (do we want to keep + * Fill the default DBG level (do we want to keep * that old mecanism around forever ?) */ ppcdbg_initialize(); @@ -453,17 +453,17 @@ /* - * Initialize some remaining members of the naca and systemcfg structures + * Initialize some remaining members of the ppc64_caches and systemcfg structures * (at least until we get rid of them completely). This is mostly some * cache informations about the CPU that will be used by cache flush * routines and/or provided to userland */ -static void __init initialize_naca(void) +static void __init initialize_cache_info(void) { struct device_node *np; unsigned long num_cpus = 0; - DBG(" -> initialize_naca()\n"); + DBG(" -> initialize_cache_info()\n"); for (np = NULL; (np = of_find_node_by_type(np, "cpu"));) { num_cpus += 1; @@ -530,7 +530,7 @@ systemcfg->version.minor = SYSTEMCFG_MINOR; systemcfg->processor = mfspr(SPRN_PVR); - DBG(" <- initialize_naca()\n"); + DBG(" <- initialize_cache_info()\n"); } static void __init check_for_initrd(void) @@ -591,7 +591,7 @@ unflatten_device_tree(); /* - * Fill the naca & systemcfg structures with informations + * Fill the ppc64_caches & systemcfg structures with informations * retreived from the device-tree. Need to be called before * finish_device_tree() since the later requires some of the * informations filled up here to properly parse the interrupt @@ -600,7 +600,7 @@ * routines like flush_icache_range (used by the hash init * later on). 
*/ - initialize_naca(); + initialize_cache_info(); #ifdef CONFIG_PPC_PSERIES /* @@ -661,9 +661,8 @@ printk("Starting Linux PPC64 %s\n", UTS_RELEASE); printk("-----------------------------------------------------\n"); - printk("naca = 0x%p\n", naca); printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); - printk("naca->debug_switch = 0x%lx\n", naca->debug_switch); + printk("ppc64_debug_switch = 0x%lx\n", ppc64_debug_switch); printk("ppc64_interrupt_controller = 0x%ld\n", ppc64_interrupt_controller); printk("systemcfg = 0x%p\n", systemcfg); printk("systemcfg->platform = 0x%x\n", systemcfg->platform); diff -ruN linus-bk-naca.6/arch/ppc64/kernel/udbg.c linus-bk-naca.7/arch/ppc64/kernel/udbg.c --- linus-bk-naca.6/arch/ppc64/kernel/udbg.c 2004-11-22 14:05:02.000000000 +1100 +++ linus-bk-naca.7/arch/ppc64/kernel/udbg.c 2004-12-11 02:31:17.000000000 +1100 @@ -15,7 +15,6 @@ #include #include #include -#include #include #include #include @@ -323,7 +322,7 @@ /* Special print used by PPCDBG() macro */ void udbg_ppcdbg(unsigned long debug_flags, const char *fmt, ...) 
{ - unsigned long active_debugs = debug_flags & naca->debug_switch; + unsigned long active_debugs = debug_flags & ppc64_debug_switch; if (active_debugs) { va_list ap; @@ -357,5 +356,5 @@ unsigned long udbg_ifdebug(unsigned long flags) { - return (flags & naca->debug_switch); + return (flags & ppc64_debug_switch); } diff -ruN linus-bk-naca.6/arch/ppc64/xmon/xmon.c linus-bk-naca.7/arch/ppc64/xmon/xmon.c --- linus-bk-naca.6/arch/ppc64/xmon/xmon.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.7/arch/ppc64/xmon/xmon.c 2004-12-11 02:33:00.000000000 +1100 @@ -26,7 +26,6 @@ #include #include #include -#include #include #include #include @@ -2360,9 +2359,9 @@ if (cmd == '\n') { /* show current state */ unsigned long i; - printf("naca->debug_switch = 0x%lx\n", naca->debug_switch); + printf("ppc64_debug_switch = 0x%lx\n", ppc64_debug_switch); for (i = 0; i < PPCDBG_NUM_FLAGS ;i++) { - on = PPCDBG_BITVAL(i) & naca->debug_switch; + on = PPCDBG_BITVAL(i) & ppc64_debug_switch; printf("%02x %s %12s ", i, on ? "on " : "off", trace_names[i] ? trace_names[i] : ""); if (((i+1) % 3) == 0) printf("\n"); @@ -2376,7 +2375,7 @@ on = (cmd == '+'); cmd = inchar(); if (cmd == ' ' || cmd == '\n') { /* Turn on or off based on + or - */ - naca->debug_switch = on ? PPCDBG_ALL:PPCDBG_NONE; + ppc64_debug_switch = on ? PPCDBG_ALL:PPCDBG_NONE; printf("Setting all values to %s...\n", on ? "on" : "off"); if (cmd == '\n') return; else cmd = skipbl(); @@ -2391,10 +2390,10 @@ return; } if (on) { - naca->debug_switch |= PPCDBG_BITVAL(val); + ppc64_debug_switch |= PPCDBG_BITVAL(val); printf("enable debug %x %s\n", val, trace_names[val] ? trace_names[val] : ""); } else { - naca->debug_switch &= ~PPCDBG_BITVAL(val); + ppc64_debug_switch &= ~PPCDBG_BITVAL(val); printf("disable debug %x %s\n", val, trace_names[val] ? 
trace_names[val] : ""); } cmd = skipbl(); diff -ruN linus-bk-naca.6/include/asm-ppc64/naca.h linus-bk-naca.7/include/asm-ppc64/naca.h --- linus-bk-naca.6/include/asm-ppc64/naca.h 2004-12-11 00:03:55.000000000 +1100 +++ linus-bk-naca.7/include/asm-ppc64/naca.h 2004-12-11 02:41:18.000000000 +1100 @@ -19,9 +19,6 @@ void *xItVpdAreas; /* VPD Data 0x00 */ void *xRamDisk; /* iSeries ramdisk 0x08 */ u64 xRamDiskSize; /* In pages 0x10 */ - u64 debug_switch; /* Debug print control 0x20 */ - u64 banner; /* Ptr to banner string 0x28 */ - u64 log; /* Ptr to log buffer 0x30 */ }; extern struct naca_struct *naca; diff -ruN linus-bk-naca.6/include/asm-ppc64/ppcdebug.h linus-bk-naca.7/include/asm-ppc64/ppcdebug.h --- linus-bk-naca.6/include/asm-ppc64/ppcdebug.h 2004-02-16 08:19:48.000000000 +1100 +++ linus-bk-naca.7/include/asm-ppc64/ppcdebug.h 2004-12-13 12:05:25.000000000 +1100 @@ -16,13 +16,14 @@ ********************************************************************/ #include +#include #include #include #define PPCDBG_BITVAL(X) ((1UL)<<((unsigned long)(X))) /* Defined below are the bit positions of various debug flags in the - * debug_switch variable (defined in naca.h). + * ppc64_debug_switch variable. * -- When adding new values, please enter them into trace names below -- * * Values 62 & 63 can be used to stress the hardware page table management @@ -64,6 +65,8 @@ #define PPCDBG_NUM_FLAGS 64 +extern u64 ppc64_debug_switch; + #ifdef WANT_PPCDBG_TAB /* A table of debug switch names to allow name lookup in xmon * (and whoever else wants it.
From sfr at canb.auug.org.au Tue Jan 4 15:34:45 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:34:45 +1100 Subject: [PATCH 8/11] PPC64: remove the naca from all but iSeries In-Reply-To: <20050104153102.67284491.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> <20050104153102.67284491.sfr@canb.auug.org.au> Message-ID: <20050104153445.3777e689.sfr@canb.auug.org.au> Hi Andrew, This patch finally removes the naca from all architectures except legacy iSeries and in the process makes it a structure instead of a pointer.
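As a rough illustration of what "a structure instead of a pointer" means for callers, here is a small user-space sketch. It is not the kernel code: struct naca_sketch, SKETCH_PAGE_SIZE, naca_ptr and the two helper functions are made-up stand-ins, and only the three field names are taken from the patch. With a pointer, early boot code has to store an address into the variable before anyone can use it; with a plain object, the linker places it and callers write naca.field (or &naca where an address is wanted).

```c
#include <assert.h>
#include <stdint.h>

/* Cut-down stand-in for struct naca_struct (hypothetical; field names
 * follow the patch, everything else is an assumption for illustration). */
struct naca_sketch {
	void *xItVpdAreas;	/* VPD data */
	void *xRamDisk;		/* ramdisk address */
	uint64_t xRamDiskSize;	/* in pages */
};

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for the sketch */

/* Before: a global pointer that something must initialise at boot. */
struct naca_sketch naca_storage;
struct naca_sketch *naca_ptr = &naca_storage;

/* After: the object itself is the global, nothing to set up. */
struct naca_sketch naca;

/* Old-style access through the pointer. */
uint64_t ramdisk_bytes_old(void)
{
	return naca_ptr->xRamDiskSize * SKETCH_PAGE_SIZE;
}

/* New-style access to the object directly. */
uint64_t ramdisk_bytes_new(void)
{
	return naca.xRamDiskSize * SKETCH_PAGE_SIZE;
}
```

Both helpers compute the same thing; the only difference is that the second needs no pointer to have been set up first, which is why the head.S hunks that stored NACA_VIRT_ADDR into naca can go away.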
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.7/arch/ppc64/kernel/LparData.c linus-bk-naca.8/arch/ppc64/kernel/LparData.c --- linus-bk-naca.7/arch/ppc64/kernel/LparData.c 2004-10-26 16:06:41.000000000 +1000 +++ linus-bk-naca.8/arch/ppc64/kernel/LparData.c 2004-12-11 02:49:48.000000000 +1100 @@ -44,7 +44,7 @@ 0xc8a5d9c4, /* desc = "HvRD" ebcdic */ sizeof(struct HvReleaseData), offsetof(struct naca_struct, xItVpdAreas), - (struct naca_struct *)(NACA_VIRT_ADDR), /* 64-bit Naca address */ + &naca, /* 64-bit Naca address */ 0x6000, /* offset of LparMap within loadarea (see head.S) */ 0, 1, /* tags inactive */ diff -ruN linus-bk-naca.7/arch/ppc64/kernel/head.S linus-bk-naca.8/arch/ppc64/kernel/head.S --- linus-bk-naca.7/arch/ppc64/kernel/head.S 2004-12-10 18:40:24.000000000 +1100 +++ linus-bk-naca.8/arch/ppc64/kernel/head.S 2004-12-11 02:56:12.000000000 +1100 @@ -512,17 +512,15 @@ */ . = NACA_PHYS_ADDR .globl __end_interrupts - .globl __start_naca __end_interrupts: -__start_naca: #ifdef CONFIG_PPC_ISERIES + .globl naca +naca: .llong itVpdAreas #endif . = SYSTEMCFG_PHYS_ADDR - .globl __end_naca .globl __start_systemcfg -__end_naca: __start_systemcfg: . 
= (SYSTEMCFG_PHYS_ADDR + PAGE_SIZE) .globl __end_systemcfg @@ -1270,10 +1268,6 @@ SET_REG_TO_CONST(r4, SYSTEMCFG_VIRT_ADDR) std r4,0(r9) /* set the systemcfg pointer */ - LOADADDR(r9,naca) - SET_REG_TO_CONST(r4, NACA_VIRT_ADDR) - std r4,0(r9) /* set the naca pointer */ - bl .iSeries_early_setup /* relocation is on at this point */ @@ -1873,12 +1867,6 @@ li r27,SYSTEMCFG_PHYS_ADDR std r27,0(r6) /* set the value of systemcfg */ - /* setup the naca pointer which is needed by *tab_initialize */ - LOADADDR(r6,naca) - sub r6,r6,r26 /* addr of the variable naca */ - li r27,NACA_PHYS_ADDR - std r27,0(r6) /* set the value of naca */ - #ifdef CONFIG_HMT /* Start up the second thread on cpu 0 */ mfspr r3,PVR @@ -2015,11 +2003,6 @@ SET_REG_TO_CONST(r8, SYSTEMCFG_VIRT_ADDR) std r8,0(r9) - /* setup the naca pointer */ - LOADADDR(r9,naca) - SET_REG_TO_CONST(r8, NACA_VIRT_ADDR) - std r8,0(r9) /* set the value of the naca ptr */ - LOADADDR(r26, boot_cpuid) lwz r26,0(r26) diff -ruN linus-bk-naca.7/arch/ppc64/kernel/iSeries_setup.c linus-bk-naca.8/arch/ppc64/kernel/iSeries_setup.c --- linus-bk-naca.7/arch/ppc64/kernel/iSeries_setup.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.8/arch/ppc64/kernel/iSeries_setup.c 2004-12-11 02:51:17.000000000 +1100 @@ -314,13 +314,13 @@ * If the init RAM disk has been configured and there is * a non-zero starting address for it, set it up */ - if (naca->xRamDisk) { - initrd_start = (unsigned long)__va(naca->xRamDisk); - initrd_end = initrd_start + naca->xRamDiskSize * PAGE_SIZE; + if (naca.xRamDisk) { + initrd_start = (unsigned long)__va(naca.xRamDisk); + initrd_end = initrd_start + naca.xRamDiskSize * PAGE_SIZE; initrd_below_start_ok = 1; // ramdisk in kernel space ROOT_DEV = Root_RAM0; - if (((rd_size * 1024) / PAGE_SIZE) < naca->xRamDiskSize) - rd_size = (naca->xRamDiskSize * PAGE_SIZE) / 1024; + if (((rd_size * 1024) / PAGE_SIZE) < naca.xRamDiskSize) + rd_size = (naca.xRamDiskSize * PAGE_SIZE) / 1024; } else #endif /* 
CONFIG_BLK_DEV_INITRD */ { @@ -813,9 +813,9 @@ * Change klimit to take into account any ram disk * that may be included */ - if (naca->xRamDisk) - klimit = KERNELBASE + (u64)naca->xRamDisk + - (naca->xRamDiskSize * PAGE_SIZE); + if (naca.xRamDisk) + klimit = KERNELBASE + (u64)naca.xRamDisk + + (naca.xRamDiskSize * PAGE_SIZE); else { /* * No ram disk was included - check and see if there diff -ruN linus-bk-naca.7/arch/ppc64/kernel/pacaData.c linus-bk-naca.8/arch/ppc64/kernel/pacaData.c --- linus-bk-naca.7/arch/ppc64/kernel/pacaData.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.8/arch/ppc64/kernel/pacaData.c 2004-12-11 02:50:23.000000000 +1100 @@ -18,11 +18,8 @@ #include #include -#include #include -struct naca_struct *naca; -EXPORT_SYMBOL(naca); struct systemcfg *systemcfg; EXPORT_SYMBOL(systemcfg); diff -ruN linus-bk-naca.7/include/asm-ppc64/iSeries/HvReleaseData.h linus-bk-naca.8/include/asm-ppc64/iSeries/HvReleaseData.h --- linus-bk-naca.7/include/asm-ppc64/iSeries/HvReleaseData.h 2004-01-20 08:20:26.000000000 +1100 +++ linus-bk-naca.8/include/asm-ppc64/iSeries/HvReleaseData.h 2004-12-11 02:52:05.000000000 +1100 @@ -26,6 +26,7 @@ // address of the OS's NACA). 
// #include +#include //============================================================================= // diff -ruN linus-bk-naca.7/include/asm-ppc64/naca.h linus-bk-naca.8/include/asm-ppc64/naca.h --- linus-bk-naca.7/include/asm-ppc64/naca.h 2004-12-11 02:41:18.000000000 +1100 +++ linus-bk-naca.8/include/asm-ppc64/naca.h 2004-12-11 02:54:02.000000000 +1100 @@ -21,12 +21,11 @@ u64 xRamDiskSize; /* In pages 0x10 */ }; -extern struct naca_struct *naca; +extern struct naca_struct naca; #endif /* __ASSEMBLY__ */ #define NACA_PAGE 0x4 #define NACA_PHYS_ADDR (NACA_PAGE< References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> <20050104153102.67284491.sfr@canb.auug.org.au> <20050104153445.3777e689.sfr@canb.auug.org.au> Message-ID: <20050104153740.56622b4f.sfr@canb.auug.org.au> Hi Andrew, This fixes an awful piece of code that could have just referenced xPMCRegsInUse in the lppaca structure.
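To see why the old ptr[0xBB] store and the named field hit the same byte, here is a user-space sketch. It assumes the offset comments in the header are accurate (cache line 2 starts at 0x80 and xPMCRegsInUse sits at x3B within it, so 0x80 + 0x3B = 0xBB). struct lppaca_sketch and the helper names are hypothetical; the real structure has many named fields where this one has padding arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for the lppaca layout: only the sizes matter here. */
struct lppaca_sketch {
	uint8_t cache_line_1[0x80];	/* 0x00-0x7F: read-only data */
	uint8_t line_2_head[0x3B];	/* cache line 2, fields x00-x3A */
	uint8_t xPMCRegsInUse;		/* cache line 2, x3B */
};

/* Old style: poke a magic byte offset into the structure. */
void set_pmc_by_offset(struct lppaca_sketch *l)
{
	char *ptr = (char *)l;

	ptr[0xBB] = 1;
}

/* New style: name the field and let the compiler find the offset. */
void set_pmc_by_name(struct lppaca_sketch *l)
{
	l->xPMCRegsInUse = 1;
}

/* Offset of the field as the compiler sees it. */
size_t pmc_offset(void)
{
	return offsetof(struct lppaca_sketch, xPMCRegsInUse);
}
```

The named form also keeps working if the layout ever changes, which the hard-coded 0xBB would not.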
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.8/arch/ppc64/kernel/sysfs.c linus-bk-naca.9/arch/ppc64/kernel/sysfs.c --- linus-bk-naca.8/arch/ppc64/kernel/sysfs.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.9/arch/ppc64/kernel/sysfs.c 2004-12-13 14:49:37.000000000 +1100 @@ -14,6 +14,8 @@ #include #include #include +#include +#include /* SMT stuff */ @@ -154,10 +156,8 @@ #ifdef CONFIG_PPC_PSERIES /* instruct hypervisor to maintain PMCs */ - if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) { - char *ptr = (char *)&paca[smp_processor_id()].lppaca; - ptr[0xBB] = 1; - } + if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) + get_paca()->lppaca.xPMCRegsInUse = 1; /* * On SMT machines we have to set the run latch in the ctrl register From sfr at canb.auug.org.au Tue Jan 4 15:40:25 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:40:25 +1100 Subject: [PATCH 10/11] PPC64: move the lppaca defining header file In-Reply-To: <20050104153740.56622b4f.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> <20050104153102.67284491.sfr@canb.auug.org.au> <20050104153445.3777e689.sfr@canb.auug.org.au> <20050104153740.56622b4f.sfr@canb.auug.org.au> Message-ID: <20050104154025.63a1b9fb.sfr@canb.auug.org.au> Hi Andrew, This patch just renames asm/iSeries/ItLpPaca.h to asm/lppaca.h as
the lppaca structure is no longer just legacy iSeries specific. Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.9/arch/ppc64/kernel/LparData.c linus-bk-naca.10/arch/ppc64/kernel/LparData.c --- linus-bk-naca.9/arch/ppc64/kernel/LparData.c 2004-12-11 02:49:48.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/LparData.c 2004-12-13 15:01:55.000000000 +1100 @@ -16,7 +16,7 @@ #include #include #include -#include +#include #include #include #include diff -ruN linus-bk-naca.9/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c --- linus-bk-naca.9/arch/ppc64/kernel/asm-offsets.c 2004-12-10 17:27:14.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c 2004-12-13 15:02:03.000000000 +1100 @@ -29,7 +29,7 @@ #include #include -#include +#include #include #include #include diff -ruN linus-bk-naca.9/arch/ppc64/kernel/iSeries_proc.c linus-bk-naca.10/arch/ppc64/kernel/iSeries_proc.c --- linus-bk-naca.9/arch/ppc64/kernel/iSeries_proc.c 2004-12-10 16:26:54.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/iSeries_proc.c 2004-12-13 15:02:14.000000000 +1100 @@ -24,7 +24,7 @@ #include #include #include -#include +#include #include #include #include diff -ruN linus-bk-naca.9/arch/ppc64/kernel/lparcfg.c linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c --- linus-bk-naca.9/arch/ppc64/kernel/lparcfg.c 2004-11-20 12:05:26.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c 2004-12-13 15:02:29.000000000 +1100 @@ -27,7 +27,7 @@ #include #include #include -#include +#include #include #include #include diff -ruN linus-bk-naca.9/arch/ppc64/kernel/pacaData.c linus-bk-naca.10/arch/ppc64/kernel/pacaData.c --- linus-bk-naca.9/arch/ppc64/kernel/pacaData.c 2004-12-11 02:50:23.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/pacaData.c 2004-12-13 15:02:07.000000000 +1100 @@ -16,7 +16,7 @@ #include #include -#include +#include #include #include 
diff -ruN linus-bk-naca.9/arch/ppc64/kernel/sysfs.c linus-bk-naca.10/arch/ppc64/kernel/sysfs.c --- linus-bk-naca.9/arch/ppc64/kernel/sysfs.c 2004-12-13 14:49:37.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/sysfs.c 2004-12-13 15:01:19.000000000 +1100 @@ -15,7 +15,7 @@ #include #include #include -#include +#include /* SMT stuff */ diff -ruN linus-bk-naca.9/include/asm-ppc64/iSeries/ItLpPaca.h linus-bk-naca.10/include/asm-ppc64/iSeries/ItLpPaca.h --- linus-bk-naca.9/include/asm-ppc64/iSeries/ItLpPaca.h 2004-01-20 08:20:26.000000000 +1100 +++ linus-bk-naca.10/include/asm-ppc64/iSeries/ItLpPaca.h 1970-01-01 10:00:00.000000000 +1000 @@ -1,134 +0,0 @@ -/* - * ItLpPaca.h - * Copyright (C) 2001 Mike Corrigan IBM Corporation - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ -#ifndef _ITLPPACA_H -#define _ITLPPACA_H - -//============================================================================= -// -// This control block contains the data that is shared between the -// hypervisor (PLIC) and the OS. 
-// -// -//---------------------------------------------------------------------------- -#include - -struct ItLpPaca -{ -//============================================================================= -// CACHE_LINE_1 0x0000 - 0x007F Contains read-only data -// NOTE: The xDynXyz fields are fields that will be dynamically changed by -// PLIC when preparing to bring a processor online or when dispatching a -// virtual processor! -//============================================================================= - u32 xDesc; // Eye catcher 0xD397D781 x00-x03 - u16 xSize; // Size of this struct x04-x05 - u16 xRsvd1_0; // Reserved x06-x07 - u16 xRsvd1_1:14; // Reserved x08-x09 - u8 xSharedProc:1; // Shared processor indicator ... - u8 xSecondaryThread:1; // Secondary thread indicator ... - volatile u8 xDynProcStatus:8; // Dynamic Status of this proc x0A-x0A - u8 xSecondaryThreadCnt; // Secondary thread count x0B-x0B - volatile u16 xDynHvPhysicalProcIndex;// Dynamic HV Physical Proc Index0C-x0D - volatile u16 xDynHvLogicalProcIndex;// Dynamic HV Logical Proc Indexx0E-x0F - u32 xDecrVal; // Value for Decr programming x10-x13 - u32 xPMCVal; // Value for PMC regs x14-x17 - volatile u32 xDynHwNodeId; // Dynamic Hardware Node id x18-x1B - volatile u32 xDynHwProcId; // Dynamic Hardware Proc Id x1C-x1F - volatile u32 xDynPIR; // Dynamic ProcIdReg value x20-x23 - u32 xDseiData; // DSEI data x24-x27 - u64 xSPRG3; // SPRG3 value x28-x2F - u8 xRsvd1_3[80]; // Reserved x30-x7F - -//============================================================================= -// CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data -//============================================================================= - // This Dword contains a byte for each type of interrupt that can occur. - // The IPI is a count while the others are just a binary 1 or 0. 
- union { - u64 xAnyInt; - struct { - u16 xRsvd; // Reserved - cleared by #mpasmbl - u8 xXirrInt; // Indicates xXirrValue is valid or Immed IO - u8 xIpiCnt; // IPI Count - u8 xDecrInt; // DECR interrupt occurred - u8 xPdcInt; // PDC interrupt occurred - u8 xQuantumInt; // Interrupt quantum reached - u8 xOldPlicDeferredExtInt; // Old PLIC has a deferred XIRR pending - } xFields; - } xIntDword; - - // Whenever any fields in this Dword are set then PLIC will defer the - // processing of external interrupts. Note that PLIC will store the - // XIRR directly into the xXirrValue field so that another XIRR will - // not be presented until this one clears. The layout of the low - // 4-bytes of this Dword is upto SLIC - PLIC just checks whether the - // entire Dword is zero or not. A non-zero value in the low order - // 2-bytes will result in SLIC being granted the highest thread - // priority upon return. A 0 will return to SLIC as medium priority. - u64 xPlicDeferIntsArea; // Entire Dword - - // Used to pass the real SRR0/1 from PLIC to SLIC as well as to - // pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid. 
- u64 xSavedSrr0; // Saved SRR0 x10-x17 - u64 xSavedSrr1; // Saved SRR1 x18-x1F - - // Used to pass parms from the OS to PLIC for SetAsrAndRfid - u64 xSavedGpr3; // Saved GPR3 x20-x27 - u64 xSavedGpr4; // Saved GPR4 x28-x2F - u64 xSavedGpr5; // Saved GPR5 x30-x37 - - u8 xRsvd2_1; // Reserved x38-x38 - u8 xCpuCtlsTaskAttributes; // Task attributes for cpuctls x39-x39 - u8 xFPRegsInUse; // FP regs in use x3A-x3A - u8 xPMCRegsInUse; // PMC regs in use x3B-x3B - volatile u32 xSavedDecr; // Saved Decr Value x3C-x3F - volatile u64 xEmulatedTimeBase;// Emulated TB for this thread x40-x47 - volatile u64 xCurPLICLatency; // Unaccounted PLIC latency x48-x4F - u64 xTotPLICLatency; // Accumulated PLIC latency x50-x57 - u64 xWaitStateCycles; // Wait cycles for this proc x58-x5F - u64 xEndOfQuantum; // TB at end of quantum x60-x67 - u64 xPDCSavedSPRG1; // Saved SPRG1 for PMC int x68-x6F - u64 xPDCSavedSRR0; // Saved SRR0 for PMC int x70-x77 - volatile u32 xVirtualDecr; // Virtual DECR for shared procsx78-x7B - u16 xSLBCount; // # of SLBs to maintain x7C-x7D - u8 xIdle; // Indicate OS is idle x7E - u8 xRsvd2_2; // Reserved x7F - - -//============================================================================= -// CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors -//============================================================================= - // This is the xYieldCount. An "odd" value (low bit on) means that - // the processor is yielded (either because of an OS yield or a PLIC - // preempt). An even value implies that the processor is currently - // executing. - // NOTE: This value will ALWAYS be zero for dedicated processors and - // will NEVER be zero for shared processors (ie, initialized to a 1). 
- volatile u32 xYieldCount; // PLIC increments each dispatchx00-x03 - u8 xRsvd3_0[124]; // Reserved x04-x7F - -//============================================================================= -// CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data -//============================================================================= - u8 xPmcSaveArea[256]; // PMC interrupt Area x00-xFF - - -}; - -#endif /* _ITLPPACA_H */ diff -ruN linus-bk-naca.9/include/asm-ppc64/iSeries/LparData.h linus-bk-naca.10/include/asm-ppc64/iSeries/LparData.h --- linus-bk-naca.9/include/asm-ppc64/iSeries/LparData.h 2004-12-10 16:26:54.000000000 +1100 +++ linus-bk-naca.10/include/asm-ppc64/iSeries/LparData.h 2004-12-13 15:03:03.000000000 +1100 @@ -25,7 +25,6 @@ #include #include -#include #include #include #include diff -ruN linus-bk-naca.9/include/asm-ppc64/lppaca.h linus-bk-naca.10/include/asm-ppc64/lppaca.h --- linus-bk-naca.9/include/asm-ppc64/lppaca.h 1970-01-01 10:00:00.000000000 +1000 +++ linus-bk-naca.10/include/asm-ppc64/lppaca.h 2004-12-13 15:04:43.000000000 +1100 @@ -0,0 +1,134 @@ +/* + * lppaca.h + * Copyright (C) 2001 Mike Corrigan IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +#ifndef _ASM_LPPACA_H +#define _ASM_LPPACA_H + +//============================================================================= +// +// This control block contains the data that is shared between the +// hypervisor (PLIC) and the OS. +// +// +//---------------------------------------------------------------------------- +#include + +struct ItLpPaca +{ +//============================================================================= +// CACHE_LINE_1 0x0000 - 0x007F Contains read-only data +// NOTE: The xDynXyz fields are fields that will be dynamically changed by +// PLIC when preparing to bring a processor online or when dispatching a +// virtual processor! +//============================================================================= + u32 xDesc; // Eye catcher 0xD397D781 x00-x03 + u16 xSize; // Size of this struct x04-x05 + u16 xRsvd1_0; // Reserved x06-x07 + u16 xRsvd1_1:14; // Reserved x08-x09 + u8 xSharedProc:1; // Shared processor indicator ... + u8 xSecondaryThread:1; // Secondary thread indicator ... 
+ volatile u8 xDynProcStatus:8; // Dynamic Status of this proc x0A-x0A + u8 xSecondaryThreadCnt; // Secondary thread count x0B-x0B + volatile u16 xDynHvPhysicalProcIndex;// Dynamic HV Physical Proc Index0C-x0D + volatile u16 xDynHvLogicalProcIndex;// Dynamic HV Logical Proc Indexx0E-x0F + u32 xDecrVal; // Value for Decr programming x10-x13 + u32 xPMCVal; // Value for PMC regs x14-x17 + volatile u32 xDynHwNodeId; // Dynamic Hardware Node id x18-x1B + volatile u32 xDynHwProcId; // Dynamic Hardware Proc Id x1C-x1F + volatile u32 xDynPIR; // Dynamic ProcIdReg value x20-x23 + u32 xDseiData; // DSEI data x24-x27 + u64 xSPRG3; // SPRG3 value x28-x2F + u8 xRsvd1_3[80]; // Reserved x30-x7F + +//============================================================================= +// CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data +//============================================================================= + // This Dword contains a byte for each type of interrupt that can occur. + // The IPI is a count while the others are just a binary 1 or 0. + union { + u64 xAnyInt; + struct { + u16 xRsvd; // Reserved - cleared by #mpasmbl + u8 xXirrInt; // Indicates xXirrValue is valid or Immed IO + u8 xIpiCnt; // IPI Count + u8 xDecrInt; // DECR interrupt occurred + u8 xPdcInt; // PDC interrupt occurred + u8 xQuantumInt; // Interrupt quantum reached + u8 xOldPlicDeferredExtInt; // Old PLIC has a deferred XIRR pending + } xFields; + } xIntDword; + + // Whenever any fields in this Dword are set then PLIC will defer the + // processing of external interrupts. Note that PLIC will store the + // XIRR directly into the xXirrValue field so that another XIRR will + // not be presented until this one clears. The layout of the low + // 4-bytes of this Dword is upto SLIC - PLIC just checks whether the + // entire Dword is zero or not. A non-zero value in the low order + // 2-bytes will result in SLIC being granted the highest thread + // priority upon return. 
A 0 will return to SLIC as medium priority. + u64 xPlicDeferIntsArea; // Entire Dword + + // Used to pass the real SRR0/1 from PLIC to SLIC as well as to + // pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid. + u64 xSavedSrr0; // Saved SRR0 x10-x17 + u64 xSavedSrr1; // Saved SRR1 x18-x1F + + // Used to pass parms from the OS to PLIC for SetAsrAndRfid + u64 xSavedGpr3; // Saved GPR3 x20-x27 + u64 xSavedGpr4; // Saved GPR4 x28-x2F + u64 xSavedGpr5; // Saved GPR5 x30-x37 + + u8 xRsvd2_1; // Reserved x38-x38 + u8 xCpuCtlsTaskAttributes; // Task attributes for cpuctls x39-x39 + u8 xFPRegsInUse; // FP regs in use x3A-x3A + u8 xPMCRegsInUse; // PMC regs in use x3B-x3B + volatile u32 xSavedDecr; // Saved Decr Value x3C-x3F + volatile u64 xEmulatedTimeBase;// Emulated TB for this thread x40-x47 + volatile u64 xCurPLICLatency; // Unaccounted PLIC latency x48-x4F + u64 xTotPLICLatency; // Accumulated PLIC latency x50-x57 + u64 xWaitStateCycles; // Wait cycles for this proc x58-x5F + u64 xEndOfQuantum; // TB at end of quantum x60-x67 + u64 xPDCSavedSPRG1; // Saved SPRG1 for PMC int x68-x6F + u64 xPDCSavedSRR0; // Saved SRR0 for PMC int x70-x77 + volatile u32 xVirtualDecr; // Virtual DECR for shared procsx78-x7B + u16 xSLBCount; // # of SLBs to maintain x7C-x7D + u8 xIdle; // Indicate OS is idle x7E + u8 xRsvd2_2; // Reserved x7F + + +//============================================================================= +// CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors +//============================================================================= + // This is the xYieldCount. An "odd" value (low bit on) means that + // the processor is yielded (either because of an OS yield or a PLIC + // preempt). An even value implies that the processor is currently + // executing. + // NOTE: This value will ALWAYS be zero for dedicated processors and + // will NEVER be zero for shared processors (ie, initialized to a 1). 
+ volatile u32 xYieldCount; // PLIC increments each dispatchx00-x03 + u8 xRsvd3_0[124]; // Reserved x04-x7F + +//============================================================================= +// CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data +//============================================================================= + u8 xPmcSaveArea[256]; // PMC interrupt Area x00-xFF + + +}; + +#endif /* _ASM_LPPACA_H */ diff -ruN linus-bk-naca.9/include/asm-ppc64/paca.h linus-bk-naca.10/include/asm-ppc64/paca.h --- linus-bk-naca.9/include/asm-ppc64/paca.h 2004-12-13 18:05:08.000000000 +1100 +++ linus-bk-naca.10/include/asm-ppc64/paca.h 2004-12-31 15:48:57.000000000 +1100 @@ -18,7 +18,7 @@ #include #include -#include +#include #include #include From sfr at canb.auug.org.au Tue Jan 4 15:43:19 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:43:19 +1100 Subject: [PATCH 11/11] PPC64: remove StudlyCaps from lppaca structure In-Reply-To: <20050104154025.63a1b9fb.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> <20050104153102.67284491.sfr@canb.auug.org.au> <20050104153445.3777e689.sfr@canb.auug.org.au> <20050104153740.56622b4f.sfr@canb.auug.org.au> <20050104154025.63a1b9fb.sfr@canb.auug.org.au> Message-ID: <20050104154319.505b1197.sfr@canb.auug.org.au> Hi Andrew, This patch just renames all the fields (and the structure name) of the lppaca structure to rid us of some more
StudlyCaps. Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.11/arch/ppc64/kernel/asm-offsets.c --- linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c 2004-12-13 15:02:03.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/asm-offsets.c 2004-12-31 15:51:16.000000000 +1100 @@ -102,10 +102,10 @@ DEFINE(PACAEMERGSP, offsetof(struct paca_struct, emergency_sp)); DEFINE(PACALPPACA, offsetof(struct paca_struct, lppaca)); DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id)); - DEFINE(LPPACASRR0, offsetof(struct ItLpPaca, xSavedSrr0)); - DEFINE(LPPACASRR1, offsetof(struct ItLpPaca, xSavedSrr1)); - DEFINE(LPPACAANYINT, offsetof(struct ItLpPaca, xIntDword.xAnyInt)); - DEFINE(LPPACADECRINT, offsetof(struct ItLpPaca, xIntDword.xFields.xDecrInt)); + DEFINE(LPPACASRR0, offsetof(struct lppaca, saved_srr0)); + DEFINE(LPPACASRR1, offsetof(struct lppaca, saved_srr1)); + DEFINE(LPPACAANYINT, offsetof(struct lppaca, int_dword.any_int)); + DEFINE(LPPACADECRINT, offsetof(struct lppaca, int_dword.fields.decr_int)); /* RTAS */ DEFINE(RTASBASE, offsetof(struct rtas_t, base)); diff -ruN linus-bk-naca.10/arch/ppc64/kernel/iSeries_setup.c linus-bk-naca.11/arch/ppc64/kernel/iSeries_setup.c --- linus-bk-naca.10/arch/ppc64/kernel/iSeries_setup.c 2004-12-11 02:51:17.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/iSeries_setup.c 2004-12-13 15:31:14.000000000 +1100 @@ -559,7 +559,7 @@ static void __init setup_iSeries_cache_sizes(void) { unsigned int i, n; - unsigned int procIx = get_paca()->lppaca.xDynHvPhysicalProcIndex; + unsigned int procIx = get_paca()->lppaca.dyn_hv_phys_proc_index; systemcfg->icache_size = ppc64_caches.isize = xIoHriProcessorVpd[procIx].xInstCacheSize * 1024; @@ -656,7 +656,7 @@ void __init iSeries_setup_arch(void) { void *eventStack; - unsigned procIx = get_paca()->lppaca.xDynHvPhysicalProcIndex; + unsigned procIx =
get_paca()->lppaca.dyn_hv_phys_proc_index; /* Add an eye catcher and the systemcfg layout version number */ strcpy(systemcfg->eye_catcher, "SYSTEMCFG:PPC64"); diff -ruN linus-bk-naca.10/arch/ppc64/kernel/iSeries_smp.c linus-bk-naca.11/arch/ppc64/kernel/iSeries_smp.c --- linus-bk-naca.10/arch/ppc64/kernel/iSeries_smp.c 2004-12-10 16:26:54.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/iSeries_smp.c 2004-12-13 15:29:16.000000000 +1100 @@ -90,7 +90,7 @@ np = 0; for (i=0; i < NR_CPUS; ++i) { - if (paca[i].lppaca.xDynProcStatus < 2) { + if (paca[i].lppaca.dyn_proc_status < 2) { cpu_set(i, cpu_possible_map); cpu_set(i, cpu_present_map); cpu_set(i, cpu_sibling_map[i]); @@ -106,7 +106,7 @@ unsigned np = 0; for (i=0; i < NR_CPUS; ++i) { - if (paca[i].lppaca.xDynProcStatus < 2) { + if (paca[i].lppaca.dyn_proc_status < 2) { /*paca[i].active = 1;*/ ++np; } @@ -120,7 +120,7 @@ BUG_ON(nr < 0 || nr >= NR_CPUS); /* Verify that our partition has a processor nr */ - if (paca[nr].lppaca.xDynProcStatus >= 2) + if (paca[nr].lppaca.dyn_proc_status >= 2) return; /* The processor is currently spinning, waiting diff -ruN linus-bk-naca.10/arch/ppc64/kernel/idle.c linus-bk-naca.11/arch/ppc64/kernel/idle.c --- linus-bk-naca.10/arch/ppc64/kernel/idle.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/idle.c 2004-12-13 16:06:18.000000000 +1100 @@ -67,7 +67,7 @@ * The decrementer stops during the yield. Force a fake decrementer * here and let the timer_interrupt code sort out the actual time. */ - get_paca()->lppaca.xIntDword.xFields.xDecrInt = 1; + get_paca()->lppaca.int_dword.fields.decr_int = 1; process_iSeries_events(); } @@ -86,7 +86,7 @@ lpaca = get_paca(); while (1) { - if (lpaca->lppaca.xSharedProc) { + if (lpaca->lppaca.shared_proc) { if (ItLpQueue_isLpIntPending(lpaca->lpqueue_ptr)) process_iSeries_events(); if (!need_resched()) @@ -173,7 +173,7 @@ * Indicate to the HV that we are idle. Now would be * a good time to find other work to dispatch. 
*/ - lpaca->lppaca.xIdle = 1; + lpaca->lppaca.idle = 1; oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED); if (!oldval) { @@ -194,7 +194,7 @@ HMT_medium(); - if (!(ppaca->lppaca.xIdle)) { + if (!(ppaca->lppaca.idle)) { local_irq_disable(); /* @@ -233,7 +233,7 @@ } HMT_medium(); - lpaca->lppaca.xIdle = 0; + lpaca->lppaca.idle = 0; schedule(); if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING) cpu_die(); @@ -251,7 +251,7 @@ * Indicate to the HV that we are idle. Now would be * a good time to find other work to dispatch. */ - lpaca->lppaca.xIdle = 1; + lpaca->lppaca.idle = 1; while (!need_resched() && !cpu_is_offline(cpu)) { local_irq_disable(); @@ -273,7 +273,7 @@ } HMT_medium(); - lpaca->lppaca.xIdle = 0; + lpaca->lppaca.idle = 0; schedule(); if (cpu_is_offline(smp_processor_id()) && system_state == SYSTEM_RUNNING) @@ -352,7 +352,7 @@ #ifdef CONFIG_PPC_PSERIES if (systemcfg->platform & PLATFORM_PSERIES) { if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) { - if (get_paca()->lppaca.xSharedProc) { + if (get_paca()->lppaca.shared_proc) { printk(KERN_INFO "Using shared processor idle loop\n"); idle_loop = shared_idle; } else { diff -ruN linus-bk-naca.10/arch/ppc64/kernel/irq.c linus-bk-naca.11/arch/ppc64/kernel/irq.c --- linus-bk-naca.10/arch/ppc64/kernel/irq.c 2004-12-31 14:53:21.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/irq.c 2004-12-13 15:43:22.000000000 +1100 @@ -259,8 +259,8 @@ lpaca = get_paca(); #ifdef CONFIG_SMP - if (lpaca->lppaca.xIntDword.xFields.xIpiCnt) { - lpaca->lppaca.xIntDword.xFields.xIpiCnt = 0; + if (lpaca->lppaca.int_dword.fields.ipi_cnt) { + lpaca->lppaca.int_dword.fields.ipi_cnt = 0; iSeries_smp_message_recv(regs); } #endif /* CONFIG_SMP */ @@ -270,8 +270,8 @@ irq_exit(); - if (lpaca->lppaca.xIntDword.xFields.xDecrInt) { - lpaca->lppaca.xIntDword.xFields.xDecrInt = 0; + if (lpaca->lppaca.int_dword.fields.decr_int) { + lpaca->lppaca.int_dword.fields.decr_int = 0; /* Signal a fake decrementer interrupt */ 
timer_interrupt(regs); } diff -ruN linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c linus-bk-naca.11/arch/ppc64/kernel/lparcfg.c --- linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c 2004-12-13 15:02:29.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/lparcfg.c 2004-12-13 16:00:00.000000000 +1100 @@ -72,7 +72,7 @@ /* * For iSeries legacy systems, the PPA purr function is available from the - * xEmulatedTimeBase field in the paca. + * emulated_time_base field in the paca. */ static unsigned long get_purr(void) { @@ -82,11 +82,11 @@ for_each_cpu(cpu) { lpaca = paca + cpu; - sum_purr += lpaca->lppaca.xEmulatedTimeBase; + sum_purr += lpaca->lppaca.emulated_time_base; #ifdef PURR_DEBUG printk(KERN_INFO "get_purr for cpu (%d) has value (%ld) \n", - cpu, lpaca->lppaca.xEmulatedTimeBase); + cpu, lpaca->lppaca.emulated_time_base); #endif } return sum_purr; @@ -107,7 +107,7 @@ seq_printf(m, "%s %s \n", MODULE_NAME, MODULE_VERS); - shared = (int)(lpaca->lppaca_ptr->xSharedProc); + shared = (int)(lpaca->lppaca_ptr->shared_proc); seq_printf(m, "serial_number=%c%c%c%c%c%c%c\n", e2a(xItExtVpdPanel.mfgID[2]), e2a(xItExtVpdPanel.mfgID[3]), @@ -395,7 +395,7 @@ (h_resource >> 0 * 8) & 0xffff); /* pool related entries are apropriate for shared configs */ - if (paca[0].lppaca.xSharedProc) { + if (paca[0].lppaca.shared_proc) { h_pic(&pool_idle_time, &pool_procs); @@ -444,7 +444,7 @@ seq_printf(m, "partition_potential_processors=%d\n", partition_potential_processors); - seq_printf(m, "shared_processor_mode=%d\n", paca[0].lppaca.xSharedProc); + seq_printf(m, "shared_processor_mode=%d\n", paca[0].lppaca.shared_proc); return 0; } diff -ruN linus-bk-naca.10/arch/ppc64/kernel/pacaData.c linus-bk-naca.11/arch/ppc64/kernel/pacaData.c --- linus-bk-naca.10/arch/ppc64/kernel/pacaData.c 2004-12-13 15:02:07.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/pacaData.c 2004-12-13 16:05:34.000000000 +1100 @@ -28,7 +28,7 @@ extern unsigned long __toc_start; /* The Paca is an array with one entry per 
processor. Each contains an - * ItLpPaca, which contains the information shared between the + * lppaca, which contains the information shared between the * hypervisor and Linux. Each also contains an ItLpRegSave area which * is used by the hypervisor to save registers. * On systems with hardware multi-threading, there are two threads @@ -61,13 +61,13 @@ .cpu_start = (start), /* Processor start */ \ .hw_cpu_id = 0xffff, \ .lppaca = { \ - .xDesc = 0xd397d781, /* "LpPa" */ \ - .xSize = sizeof(struct ItLpPaca), \ - .xFPRegsInUse = 1, \ - .xDynProcStatus = 2, \ - .xDecrVal = 0x00ff0000, \ - .xEndOfQuantum = 0xfffffffffffffffful, \ - .xSLBCount = 64, \ + .desc = 0xd397d781, /* "LpPa" */ \ + .size = sizeof(struct lppaca), \ + .dyn_proc_status = 2, \ + .decr_val = 0x00ff0000, \ + .fpregs_in_use = 1, \ + .end_of_quantum = 0xfffffffffffffffful, \ + .slb_count = 64, \ }, \ EXTRA_INITS((number), (lpq)) \ } diff -ruN linus-bk-naca.10/arch/ppc64/kernel/sysfs.c linus-bk-naca.11/arch/ppc64/kernel/sysfs.c --- linus-bk-naca.10/arch/ppc64/kernel/sysfs.c 2004-12-13 15:01:19.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/sysfs.c 2004-12-13 15:58:30.000000000 +1100 @@ -157,7 +157,7 @@ #ifdef CONFIG_PPC_PSERIES /* instruct hypervisor to maintain PMCs */ if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) - get_paca()->lppaca.xPMCRegsInUse = 1; + get_paca()->lppaca.pmcregs_in_use = 1; /* * On SMT machines we have to set the run latch in the ctrl register diff -ruN linus-bk-naca.10/arch/ppc64/kernel/time.c linus-bk-naca.11/arch/ppc64/kernel/time.c --- linus-bk-naca.10/arch/ppc64/kernel/time.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/time.c 2004-12-13 15:43:28.000000000 +1100 @@ -230,7 +230,7 @@ /* * For iSeries shared processors, we have to let the hypervisor * set the hardware decrementer. 
We set a virtual decrementer - * in the ItLpPaca and call the hypervisor if the virtual + * in the lppaca and call the hypervisor if the virtual * decrementer is less than the current value in the hardware * decrementer. (almost always the new decrementer value will * be greater than the current hardware decementer so the hypervisor @@ -256,7 +256,7 @@ profile_tick(CPU_PROFILING, regs); #endif - lpaca->lppaca.xIntDword.xFields.xDecrInt = 0; + lpaca->lppaca.int_dword.fields.decr_int = 0; while (lpaca->next_jiffy_update_tb <= (cur_tb = get_tb())) { diff -ruN linus-bk-naca.10/arch/ppc64/lib/locks.c linus-bk-naca.11/arch/ppc64/lib/locks.c --- linus-bk-naca.10/arch/ppc64/lib/locks.c 2004-09-16 21:51:57.000000000 +1000 +++ linus-bk-naca.11/arch/ppc64/lib/locks.c 2004-12-13 16:08:05.000000000 +1100 @@ -34,7 +34,7 @@ holder_cpu = lock_value & 0xffff; BUG_ON(holder_cpu >= NR_CPUS); holder_paca = &paca[holder_cpu]; - yield_count = holder_paca->lppaca.xYieldCount; + yield_count = holder_paca->lppaca.yield_count; if ((yield_count & 1) == 0) return; /* virtual cpu is currently running */ rmb(); @@ -66,7 +66,7 @@ holder_cpu = lock_value & 0xffff; BUG_ON(holder_cpu >= NR_CPUS); holder_paca = &paca[holder_cpu]; - yield_count = holder_paca->lppaca.xYieldCount; + yield_count = holder_paca->lppaca.yield_count; if ((yield_count & 1) == 0) return; /* virtual cpu is currently running */ rmb(); diff -ruN linus-bk-naca.10/arch/ppc64/xmon/xmon.c linus-bk-naca.11/arch/ppc64/xmon/xmon.c --- linus-bk-naca.10/arch/ppc64/xmon/xmon.c 2004-12-11 02:33:00.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/xmon/xmon.c 2004-12-13 15:50:52.000000000 +1100 @@ -1489,7 +1489,7 @@ unsigned long val; #ifdef CONFIG_PPC_ISERIES struct paca_struct *ptrPaca = NULL; - struct ItLpPaca *ptrLpPaca = NULL; + struct lppaca *ptrLpPaca = NULL; struct ItLpRegSave *ptrLpRegSave = NULL; #endif @@ -1513,10 +1513,10 @@ printf(" Local Processor Control Area (LpPaca): \n"); ptrLpPaca = ptrPaca->lppaca_ptr; printf(" Saved 
Srr0=%.16lx Saved Srr1=%.16lx \n", - ptrLpPaca->xSavedSrr0, ptrLpPaca->xSavedSrr1); + ptrLpPaca->saved_srr0, ptrLpPaca->saved_srr1); printf(" Saved Gpr3=%.16lx Saved Gpr4=%.16lx \n", - ptrLpPaca->xSavedGpr3, ptrLpPaca->xSavedGpr4); - printf(" Saved Gpr5=%.16lx \n", ptrLpPaca->xSavedGpr5); + ptrLpPaca->saved_gpr3, ptrLpPaca->saved_gpr4); + printf(" Saved Gpr5=%.16lx \n", ptrLpPaca->saved_gpr5); printf(" Local Processor Register Save Area (LpRegSave): \n"); ptrLpRegSave = ptrPaca->reg_save_ptr; diff -ruN linus-bk-naca.10/include/asm-ppc64/lppaca.h linus-bk-naca.11/include/asm-ppc64/lppaca.h --- linus-bk-naca.10/include/asm-ppc64/lppaca.h 2004-12-13 15:04:43.000000000 +1100 +++ linus-bk-naca.11/include/asm-ppc64/lppaca.h 2004-12-13 16:09:08.000000000 +1100 @@ -28,7 +28,7 @@ //---------------------------------------------------------------------------- #include -struct ItLpPaca +struct lppaca { //============================================================================= // CACHE_LINE_1 0x0000 - 0x007F Contains read-only data @@ -36,24 +36,24 @@ // PLIC when preparing to bring a processor online or when dispatching a // virtual processor! //============================================================================= - u32 xDesc; // Eye catcher 0xD397D781 x00-x03 - u16 xSize; // Size of this struct x04-x05 - u16 xRsvd1_0; // Reserved x06-x07 - u16 xRsvd1_1:14; // Reserved x08-x09 - u8 xSharedProc:1; // Shared processor indicator ... - u8 xSecondaryThread:1; // Secondary thread indicator ... 
- volatile u8 xDynProcStatus:8; // Dynamic Status of this proc x0A-x0A - u8 xSecondaryThreadCnt; // Secondary thread count x0B-x0B - volatile u16 xDynHvPhysicalProcIndex;// Dynamic HV Physical Proc Index0C-x0D - volatile u16 xDynHvLogicalProcIndex;// Dynamic HV Logical Proc Indexx0E-x0F - u32 xDecrVal; // Value for Decr programming x10-x13 - u32 xPMCVal; // Value for PMC regs x14-x17 - volatile u32 xDynHwNodeId; // Dynamic Hardware Node id x18-x1B - volatile u32 xDynHwProcId; // Dynamic Hardware Proc Id x1C-x1F - volatile u32 xDynPIR; // Dynamic ProcIdReg value x20-x23 - u32 xDseiData; // DSEI data x24-x27 - u64 xSPRG3; // SPRG3 value x28-x2F - u8 xRsvd1_3[80]; // Reserved x30-x7F + u32 desc; // Eye catcher 0xD397D781 x00-x03 + u16 size; // Size of this struct x04-x05 + u16 reserved1; // Reserved x06-x07 + u16 reserved2:14; // Reserved x08-x09 + u8 shared_proc:1; // Shared processor indicator ... + u8 secondary_thread:1; // Secondary thread indicator ... + volatile u8 dyn_proc_status:8; // Dynamic Status of this proc x0A-x0A + u8 secondary_thread_count; // Secondary thread count x0B-x0B + volatile u16 dyn_hv_phys_proc_index;// Dynamic HV Physical Proc Index0C-x0D + volatile u16 dyn_hv_log_proc_index;// Dynamic HV Logical Proc Indexx0E-x0F + u32 decr_val; // Value for Decr programming x10-x13 + u32 pmc_val; // Value for PMC regs x14-x17 + volatile u32 dyn_hw_node_id; // Dynamic Hardware Node id x18-x1B + volatile u32 dyn_hw_proc_id; // Dynamic Hardware Proc Id x1C-x1F + volatile u32 dyn_pir; // Dynamic ProcIdReg value x20-x23 + u32 dsei_data; // DSEI data x24-x27 + u64 sprg3; // SPRG3 value x28-x2F + u8 reserved3[80]; // Reserved x30-x7F //============================================================================= // CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data @@ -61,17 +61,17 @@ // This Dword contains a byte for each type of interrupt that can occur. // The IPI is a count while the others are just a binary 1 or 0. 
union { - u64 xAnyInt; + u64 any_int; struct { - u16 xRsvd; // Reserved - cleared by #mpasmbl - u8 xXirrInt; // Indicates xXirrValue is valid or Immed IO - u8 xIpiCnt; // IPI Count - u8 xDecrInt; // DECR interrupt occurred - u8 xPdcInt; // PDC interrupt occurred - u8 xQuantumInt; // Interrupt quantum reached - u8 xOldPlicDeferredExtInt; // Old PLIC has a deferred XIRR pending - } xFields; - } xIntDword; + u16 reserved; // Reserved - cleared by #mpasmbl + u8 xirr_int; // Indicates xXirrValue is valid or Immed IO + u8 ipi_cnt; // IPI Count + u8 decr_int; // DECR interrupt occurred + u8 pdc_int; // PDC interrupt occurred + u8 quantum_int; // Interrupt quantum reached + u8 old_plic_deferred_ext_int; // Old PLIC has a deferred XIRR pending + } fields; + } int_dword; // Whenever any fields in this Dword are set then PLIC will defer the // processing of external interrupts. Note that PLIC will store the @@ -81,54 +81,52 @@ // entire Dword is zero or not. A non-zero value in the low order // 2-bytes will result in SLIC being granted the highest thread // priority upon return. A 0 will return to SLIC as medium priority. - u64 xPlicDeferIntsArea; // Entire Dword + u64 plic_defer_ints_area; // Entire Dword // Used to pass the real SRR0/1 from PLIC to SLIC as well as to // pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid. 
- u64 xSavedSrr0; // Saved SRR0 x10-x17 - u64 xSavedSrr1; // Saved SRR1 x18-x1F + u64 saved_srr0; // Saved SRR0 x10-x17 + u64 saved_srr1; // Saved SRR1 x18-x1F // Used to pass parms from the OS to PLIC for SetAsrAndRfid - u64 xSavedGpr3; // Saved GPR3 x20-x27 - u64 xSavedGpr4; // Saved GPR4 x28-x2F - u64 xSavedGpr5; // Saved GPR5 x30-x37 - - u8 xRsvd2_1; // Reserved x38-x38 - u8 xCpuCtlsTaskAttributes; // Task attributes for cpuctls x39-x39 - u8 xFPRegsInUse; // FP regs in use x3A-x3A - u8 xPMCRegsInUse; // PMC regs in use x3B-x3B - volatile u32 xSavedDecr; // Saved Decr Value x3C-x3F - volatile u64 xEmulatedTimeBase;// Emulated TB for this thread x40-x47 - volatile u64 xCurPLICLatency; // Unaccounted PLIC latency x48-x4F - u64 xTotPLICLatency; // Accumulated PLIC latency x50-x57 - u64 xWaitStateCycles; // Wait cycles for this proc x58-x5F - u64 xEndOfQuantum; // TB at end of quantum x60-x67 - u64 xPDCSavedSPRG1; // Saved SPRG1 for PMC int x68-x6F - u64 xPDCSavedSRR0; // Saved SRR0 for PMC int x70-x77 - volatile u32 xVirtualDecr; // Virtual DECR for shared procsx78-x7B - u16 xSLBCount; // # of SLBs to maintain x7C-x7D - u8 xIdle; // Indicate OS is idle x7E - u8 xRsvd2_2; // Reserved x7F + u64 saved_gpr3; // Saved GPR3 x20-x27 + u64 saved_gpr4; // Saved GPR4 x28-x2F + u64 saved_gpr5; // Saved GPR5 x30-x37 + + u8 reserved4; // Reserved x38-x38 + u8 cpuctls_task_attrs; // Task attributes for cpuctls x39-x39 + u8 fpregs_in_use; // FP regs in use x3A-x3A + u8 pmcregs_in_use; // PMC regs in use x3B-x3B + volatile u32 saved_decr; // Saved Decr Value x3C-x3F + volatile u64 emulated_time_base;// Emulated TB for this thread x40-x47 + volatile u64 cur_plic_latency; // Unaccounted PLIC latency x48-x4F + u64 tot_plic_latency; // Accumulated PLIC latency x50-x57 + u64 wait_state_cycles; // Wait cycles for this proc x58-x5F + u64 end_of_quantum; // TB at end of quantum x60-x67 + u64 pdc_saved_sprg1; // Saved SPRG1 for PMC int x68-x6F + u64 pdc_saved_srr0; // Saved SRR0 for PMC 
int x70-x77 + volatile u32 virtual_decr; // Virtual DECR for shared procsx78-x7B + u16 slb_count; // # of SLBs to maintain x7C-x7D + u8 idle; // Indicate OS is idle x7E + u8 reserved5; // Reserved x7F //============================================================================= // CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors //============================================================================= - // This is the xYieldCount. An "odd" value (low bit on) means that + // This is the yield_count. An "odd" value (low bit on) means that // the processor is yielded (either because of an OS yield or a PLIC // preempt). An even value implies that the processor is currently // executing. // NOTE: This value will ALWAYS be zero for dedicated processors and // will NEVER be zero for shared processors (ie, initialized to a 1). - volatile u32 xYieldCount; // PLIC increments each dispatchx00-x03 - u8 xRsvd3_0[124]; // Reserved x04-x7F + volatile u32 yield_count; // PLIC increments each dispatchx00-x03 + u8 reserved6[124]; // Reserved x04-x7F //============================================================================= // CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data //============================================================================= - u8 xPmcSaveArea[256]; // PMC interrupt Area x00-xFF - - + u8 pmc_save_area[256]; // PMC interrupt Area x00-xFF }; #endif /* _ASM_LPPACA_H */ diff -ruN linus-bk-naca.10/include/asm-ppc64/paca.h linus-bk-naca.11/include/asm-ppc64/paca.h --- linus-bk-naca.10/include/asm-ppc64/paca.h 2004-12-31 15:48:57.000000000 +1100 +++ linus-bk-naca.11/include/asm-ppc64/paca.h 2004-12-31 15:54:35.000000000 +1100 @@ -34,8 +34,8 @@ * * This structure is not directly accessed by firmware or the service * processor except for the first two pointers that point to the - * ItLpPaca area and the ItLpRegSave area for this CPU. 
Both the - * ItLpPaca and ItLpRegSave objects are currently contained within the + * lppaca area and the ItLpRegSave area for this CPU. Both the + * lppaca and ItLpRegSave objects are currently contained within the * PACA but they do not need to be. */ struct paca_struct { @@ -50,7 +50,7 @@ * MAGIC: These first two pointers can't be moved - they're * accessed by the firmware */ - struct ItLpPaca *lppaca_ptr; /* Pointer to LpPaca for PLIC */ + struct lppaca *lppaca_ptr; /* Pointer to LpPaca for PLIC */ struct ItLpRegSave *reg_save_ptr; /* Pointer to LpRegSave for PLIC */ /* @@ -109,7 +109,7 @@ * alignment will suffice to ensure that it doesn't * cross a page boundary. */ - struct ItLpPaca lppaca __attribute__((__aligned__(0x400))); + struct lppaca lppaca __attribute__((__aligned__(0x400))); #ifdef CONFIG_PPC_ISERIES struct ItLpRegSave reg_save; #endif diff -ruN linus-bk-naca.10/include/asm-ppc64/spinlock.h linus-bk-naca.11/include/asm-ppc64/spinlock.h --- linus-bk-naca.10/include/asm-ppc64/spinlock.h 2004-09-09 09:59:50.000000000 +1000 +++ linus-bk-naca.11/include/asm-ppc64/spinlock.h 2004-12-13 15:25:23.000000000 +1100 @@ -57,7 +57,7 @@ #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES) /* We only yield to the hypervisor if we are in shared processor mode */ -#define SHARED_PROCESSOR (get_paca()->lppaca.xSharedProc) +#define SHARED_PROCESSOR (get_paca()->lppaca.shared_proc) extern void __spin_yield(spinlock_t *lock); extern void __rw_yield(rwlock_t *lock); #else /* SPLPAR || ISERIES */ diff -ruN linus-bk-naca.10/include/asm-ppc64/time.h linus-bk-naca.11/include/asm-ppc64/time.h --- linus-bk-naca.10/include/asm-ppc64/time.h 2004-07-05 11:49:20.000000000 +1000 +++ linus-bk-naca.11/include/asm-ppc64/time.h 2004-12-13 16:05:02.000000000 +1100 @@ -78,8 +78,8 @@ struct paca_struct *lpaca = get_paca(); int cur_dec; - if (lpaca->lppaca.xSharedProc) { - lpaca->lppaca.xVirtualDecr = val; + if (lpaca->lppaca.shared_proc) { + lpaca->lppaca.virtual_decr = val; 
 		cur_dec = get_dec();
 		if (cur_dec > val)
 			HvCall_setVirtualDecr();
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/90f4646f/attachment.pgp

From anton at samba.org Tue Jan 4 16:01:15 2005
From: anton at samba.org (Anton Blanchard)
Date: Tue, 4 Jan 2005 16:01:15 +1100
Subject: [PATCH] ppc64: Clarify rtasd printk
Message-ID: <20050104050115.GG7335@krispykreme.ozlabs.ibm.com>

Hi,

On machines with RTAS but without event-scan support we would
incorrectly claim there was no RTAS on the system.

Signed-off-by: Anton Blanchard

===== rtasd.c 1.34 vs edited =====
--- 1.34/arch/ppc64/kernel/rtasd.c	2004-11-16 14:29:11 +11:00
+++ edited/rtasd.c	2004-12-26 13:36:56 +11:00
@@ -486,7 +486,7 @@
 	/* No RTAS, only warn if we are on a pSeries box */
 	if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) {
 		if (systemcfg->platform & PLATFORM_PSERIES);
-			printk(KERN_ERR "rtasd: no RTAS on system\n");
+			printk(KERN_ERR "rtasd: no event-scan on system\n");
 		return 1;
 	}

From anton at samba.org Tue Jan 4 16:07:27 2005
From: anton at samba.org (Anton Blanchard)
Date: Tue, 4 Jan 2005 16:07:27 +1100
Subject: [PATCH] ppc64: fix some compiler warnings
Message-ID: <20050104050727.GH7335@krispykreme.ozlabs.ibm.com>

Fix some compiler warnings:

- The first two are spurious gcc warnings, but quieten them up regardless
- Add a missing include
- Use register_sysrq_key instead of __sysrq_put_key_op

Signed-off-by: Anton Blanchard

diff -puN arch/ppc64/mm/hash_native.c~remove_compiler_warnings arch/ppc64/mm/hash_native.c
--- gr_work/arch/ppc64/mm/hash_native.c~remove_compiler_warnings	2004-12-25 21:44:00.112288718 -0600
+++ gr_work-anton/arch/ppc64/mm/hash_native.c	2004-12-25 21:44:35.782093438 -0600
@@ -242,7 +242,7 @@ static long native_hpte_updatepp(unsigne
  */
 static void native_hpte_updateboltedpp(unsigned long newpp,
unsigned long ea) { - unsigned long vsid, va, vpn, flags; + unsigned long vsid, va, vpn, flags = 0; long slot; HPTE *hptep; int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); diff -puN arch/ppc64/kernel/pSeries_lpar.c~remove_compiler_warnings arch/ppc64/kernel/pSeries_lpar.c --- gr_work/arch/ppc64/kernel/pSeries_lpar.c~remove_compiler_warnings 2004-12-25 21:44:48.291973925 -0600 +++ gr_work-anton/arch/ppc64/kernel/pSeries_lpar.c 2004-12-25 21:45:08.829912888 -0600 @@ -504,7 +504,7 @@ void pSeries_lpar_flush_hash_range(unsig int local) { int i; - unsigned long flags; + unsigned long flags = 0; struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch); int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); diff -puN arch/ppc64/kernel/pSeries_setup.c~remove_compiler_warnings arch/ppc64/kernel/pSeries_setup.c --- gr_work/arch/ppc64/kernel/pSeries_setup.c~remove_compiler_warnings 2004-12-25 21:46:35.016298826 -0600 +++ gr_work-anton/arch/ppc64/kernel/pSeries_setup.c 2004-12-25 21:47:05.188173311 -0600 @@ -59,6 +59,7 @@ #include #include #include +#include #include "i8259.h" #include diff -puN arch/ppc64/xmon/start.c~remove_compiler_warnings arch/ppc64/xmon/start.c --- gr_work/arch/ppc64/xmon/start.c~remove_compiler_warnings 2004-12-25 21:48:27.578625901 -0600 +++ gr_work-anton/arch/ppc64/xmon/start.c 2004-12-25 21:48:55.121385858 -0600 @@ -40,7 +40,7 @@ static struct sysrq_key_op sysrq_xmon_op static int __init setup_xmon_sysrq(void) { - __sysrq_put_key_op('x', &sysrq_xmon_op); + register_sysrq_key('x', &sysrq_xmon_op); return 0; } __initcall(setup_xmon_sysrq); _ From anton at samba.org Tue Jan 4 16:13:35 2005 From: anton at samba.org (Anton Blanchard) Date: Tue, 4 Jan 2005 16:13:35 +1100 Subject: [PATCH] ppc64: remove stale prom.h code Message-ID: <20050104051335.GJ7335@krispykreme.ozlabs.ibm.com> Remove some stale code in prom.h Signed-off-by: Anton Blanchard diff -puN include/asm-ppc64/prom.h~prom_cleanup 
include/asm-ppc64/prom.h --- foobar2/include/asm-ppc64/prom.h~prom_cleanup 2005-01-04 16:07:39.113436136 +1100 +++ foobar2-anton/include/asm-ppc64/prom.h 2005-01-04 16:07:39.132434650 +1100 @@ -21,9 +21,6 @@ #define PTRUNRELOC(x) ((typeof(x))((unsigned long)(x) + offset)) #define RELOC(x) (*PTRRELOC(&(x))) -#define LONG_LSW(X) (((unsigned long)X) & 0xffffffff) -#define LONG_MSW(X) (((unsigned long)X) >> 32) - /* Definitions used by the flattened device tree */ #define OF_DT_HEADER 0xd00dfeed /* 4: version, 4: total size */ #define OF_DT_BEGIN_NODE 0x1 /* Start node: full name */ @@ -64,8 +61,6 @@ struct boot_param_header typedef u32 phandle; typedef u32 ihandle; -typedef u32 phandle32; -typedef u32 ihandle32; struct address_range { unsigned long space; @@ -95,13 +90,6 @@ struct isa_range { unsigned int size; }; -struct of_tce_table { - phandle node; - unsigned long base; - unsigned long size; -}; -extern struct of_tce_table of_tce_table[]; - struct reg_property { unsigned long address; unsigned long size; @@ -117,19 +105,6 @@ struct reg_property64 { unsigned long size; }; -struct reg_property_pmac { - unsigned int address_hi; - unsigned int address_lo; - unsigned int size; -}; - -struct translation_property { - unsigned long virt; - unsigned long size; - unsigned long phys; - unsigned int flags; -}; - struct property { char *name; int length; @@ -160,8 +135,6 @@ struct device_node { int busno; /* for pci devices */ int bussubno; /* for pci devices */ int devfn; /* for pci devices */ -#define DN_STATUS_BIST_FAILED (1<<0) - int status; /* Current device status (non-zero is bad) */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; struct pci_controller *phb; /* for pci devices */ @@ -244,7 +217,6 @@ extern int of_remove_node(struct device_ /* Other Prototypes */ extern unsigned long prom_init(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); -extern void relocate_nodes(void); extern void finish_device_tree(void); 
 extern int device_is_compatible(struct device_node *device, const char *);
 extern int machine_is_compatible(const char *compat);
_

From paulus at samba.org Tue Jan 4 17:39:00 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 4 Jan 2005 17:39:00 +1100
Subject: [PATCH] PPC64 Simplify timer_interrupt
Message-ID: <16858.14852.673750.729779@cargo.ozlabs.ibm.com>

This patch is from Milton Miller.

When the update_process_times call was moved out of do_timer for the
UP case, the replicator didn't track down the hiding and just added
ifndef SMP. This removes the ifdefs and the indirection of calling
another file for one function in a third file.

Signed-off-by: Milton Miller
Signed-off-by: Paul Mackerras

diff -urN base-2.6/arch/ppc64/kernel/smp.c test/arch/ppc64/kernel/smp.c
--- base-2.6/arch/ppc64/kernel/smp.c	2005-01-04 16:24:21.930503880 +1100
+++ test/arch/ppc64/kernel/smp.c	2005-01-04 17:36:44.569526376 +1100
@@ -156,11 +156,6 @@
 	}
 }
 
-void smp_local_timer_interrupt(struct pt_regs * regs)
-{
-	update_process_times(user_mode(regs));
-}
-
 void smp_message_recv(int msg, struct pt_regs *regs)
 {
 	switch(msg) {
diff -urN base-2.6/arch/ppc64/kernel/time.c test/arch/ppc64/kernel/time.c
--- base-2.6/arch/ppc64/kernel/time.c	2005-01-04 16:27:42.854446184 +1100
+++ test/arch/ppc64/kernel/time.c	2005-01-04 17:36:44.571526072 +1100
@@ -68,8 +68,6 @@
 #include
 #include
 
-void smp_local_timer_interrupt(struct pt_regs *);
-
 u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
 
 EXPORT_SYMBOL(jiffies_64);
@@ -259,8 +257,6 @@
 	lpaca->lppaca.int_dword.fields.decr_int = 0;
 
 	while (lpaca->next_jiffy_update_tb <= (cur_tb = get_tb())) {
-
-#ifdef CONFIG_SMP
 		/*
 		 * We cannot disable the decrementer, so in the period
 		 * between this cpu's being marked offline in cpu_online_map
@@ -269,8 +265,7 @@
 		 * is the case.
 		 */
 		if (!cpu_is_offline(cpu))
-			smp_local_timer_interrupt(regs);
-#endif
+			update_process_times(user_mode(regs));
 		/*
 		 * No need to check whether cpu is offline here; boot_cpuid
 		 * should have been fixed up by now.
@@ -279,9 +274,6 @@
 			write_seqlock(&xtime_lock);
 			tb_last_stamp = lpaca->next_jiffy_update_tb;
 			do_timer(regs);
-#ifndef CONFIG_SMP
-			update_process_times(user_mode(regs));
-#endif
 			timer_sync_xtime( cur_tb );
 			timer_check_rtc();
 			write_sequnlock(&xtime_lock);

From sfr at canb.auug.org.au Tue Jan 4 22:58:09 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 22:58:09 +1100
Subject: [PATCH] PPC64: use c99 initializers
In-Reply-To: <20050104154319.505b1197.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
	<20050104152340.67219ccf.sfr@canb.auug.org.au>
	<20050104152705.6030abc5.sfr@canb.auug.org.au>
	<20050104153102.67284491.sfr@canb.auug.org.au>
	<20050104153445.3777e689.sfr@canb.auug.org.au>
	<20050104153740.56622b4f.sfr@canb.auug.org.au>
	<20050104154025.63a1b9fb.sfr@canb.auug.org.au>
	<20050104154319.505b1197.sfr@canb.auug.org.au>
Message-ID: <20050104225809.4b265440.sfr@canb.auug.org.au>

Hi Andrew,

This patch is just more clean up in the ppc64 arch. It uses c99
initializers for various iSeries structures that are used to pass
information to the hypervisor. Also itLpNaca is not used by any code
that could be in a module, so don't export it.

Built and booted.

Signed-off-by: Stephen Rothwell

Please apply.

P.S. for the StudlyCaps brigade, changing these is on my To Do list.
:-) -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-sfr.11/arch/ppc64/kernel/LparData.c linus-bk-sfr.12/arch/ppc64/kernel/LparData.c --- linus-bk-sfr.11/arch/ppc64/kernel/LparData.c 2004-12-13 15:01:55.000000000 +1100 +++ linus-bk-sfr.12/arch/ppc64/kernel/LparData.c 2005-01-04 18:18:37.000000000 +1100 @@ -41,24 +41,22 @@ */ struct HvReleaseData hvReleaseData = { - 0xc8a5d9c4, /* desc = "HvRD" ebcdic */ - sizeof(struct HvReleaseData), - offsetof(struct naca_struct, xItVpdAreas), - &naca, /* 64-bit Naca address */ - 0x6000, /* offset of LparMap within loadarea (see head.S) */ - 0, - 1, /* tags inactive */ - 0, /* 64 bit */ - 0, /* shared processors */ - 0, /* HMT allowed */ - 6, /* TEMP: This allows non-GA driver */ - 4, /* We are v5r2m0 */ - 3, /* Min supported PLIC = v5r1m0 */ - 3, /* Min usable PLIC = v5r1m0 */ - { 0xd3, 0x89, 0x95, 0xa4, /* "Linux 2.4 "*/ - 0xa7, 0x40, 0xf2, 0x4b, - 0xf4, 0x4b, 0xf6, 0xf4 }, - {0} + .xDesc = 0xc8a5d9c4, /* "HvRD" ebcdic */ + .xSize = sizeof(struct HvReleaseData), + .xVpdAreasPtrOffset = offsetof(struct naca_struct, xItVpdAreas), + .xSlicNacaAddr = &naca, /* 64-bit Naca address */ + .xMsNucDataOffset = 0x6000, /* offset of LparMap within loadarea (see head.S) */ + .xTagsMode = 1, /* tags inactive */ + .xAddressSize = 0, /* 64 bit */ + .xNoSharedProcs = 0, /* shared processors */ + .xNoHMT = 0, /* HMT allowed */ + .xRsvd2 = 6, /* TEMP: This allows non-GA driver */ + .xVrmIndex = 4, /* We are v5r2m0 */ + .xMinSupportedPlicVrmIndex = 3, /* v5r1m0 */ + .xMinCompatablePlicVrmIndex = 3, /* v5r1m0 */ + .xVrmName = { 0xd3, 0x89, 0x95, 0xa4, /* "Linux 2.4.64" ebcdic */ + 0xa7, 0x40, 0xf2, 0x4b, + 0xf4, 0x4b, 0xf6, 0xf4 }, }; extern void SystemReset_Iseries(void); @@ -80,26 +78,33 @@ extern void InstructionAccessSLB_Iseries(void); struct ItLpNaca itLpNaca = { - 0xd397d581, /* desc = "LpNa" ebcdic */ - 0x0400, /* size of ItLpNaca */ - 0x0300, 19, /* offset to int array, # ents */ - 0, 
0, 0, /* Part # of primary, serv, me */ - 0, 0x100, /* # of LP queues, offset */ - 0, 0, 0, /* Piranha stuff */ - { 0,0,0,0,0 }, /* reserved */ - 0,0,0,0,0,0,0, /* stuff */ - { 0,0,0,0,0 }, /* reserved */ - 0, /* reserved */ - 0, /* VRM index of PLIC */ - 0, 0, /* min supported, compat SLIC */ - 0, /* 64-bit addr of load area */ - 0, /* chunks for load area */ - 0, 0, /* PASE mask, seg table */ - { 0 }, /* 64 reserved bytes */ - { 0 }, /* 128 reserved bytes */ - { 0 }, /* Old LP Queue */ - { 0 }, /* 384 reserved bytes */ - { + .xDesc = 0xd397d581, /* "LpNa" ebcdic */ + .xSize = 0x0400, /* size of ItLpNaca */ + .xIntHdlrOffset = 0x0300, /* offset to int array */ + .xMaxIntHdlrEntries = 19, /* # ents */ + .xPrimaryLpIndex = 0, /* Part # of primary */ + .xServiceLpIndex = 0, /* Part # of serv */ + .xLpIndex = 0, /* Part # of me */ + .xMaxLpQueues = 0, /* # of LP queues */ + .xLpQueueOffset = 0x100, /* offset of start of LP queues */ + .xPirEnvironMode = 0, /* Piranha stuff */ + .xPirConsoleMode = 0, + .xPirDasdMode = 0, + .xLparInstalled = 0, + .xSysPartitioned = 0, + .xHwSyncedTBs = 0, + .xIntProcUtilHmt = 0, + .xSpVpdFormat = 0, + .xIntProcRatio = 0, + .xPlicVrmIndex = 0, /* VRM index of PLIC */ + .xMinSupportedSlicVrmInd = 0, /* min supported SLIC */ + .xMinCompatableSlicVrmInd = 0, /* min compat SLIC */ + .xLoadAreaAddr = 0, /* 64-bit addr of load area */ + .xLoadAreaChunks = 0, /* chunks for load area */ + .xPaseSysCallCRMask = 0, /* PASE mask */ + .xSlicSegmentTablePtr = 0, /* seg table */ + .xOldLpQueue = { 0 }, /* Old LP Queue */ + .xInterruptHdlr = { (u64)SystemReset_Iseries, /* 0x100 System Reset */ (u64)MachineCheck_Iseries, /* 0x200 Machine Check */ (u64)DataAccess_Iseries, /* 0x300 Data Access */ @@ -153,10 +158,8 @@ u64 xRecoveryLogBuffer[32] __attribute__((__section__(".data"))); struct SpCommArea xSpCommArea = { - 0xE2D7C3C2, - 1, - {0}, - 0, 0, 0, 0, {0} + .xDesc = 0xE2D7C3C2, + .xFormat = 1, }; /* The LparMap data is now located at offset 0x6000 in 
head.S @@ -168,22 +171,21 @@ * offset into the Naca of the pointer to the ItVpdAreas. */ struct ItVpdAreas itVpdAreas = { - 0xc9a3e5c1, /* "ItVA" */ - sizeof( struct ItVpdAreas ), - 0, 0, - 26, /* # VPD array entries */ - 10, /* # DMA array entries */ - NR_CPUS*2, maxPhysicalProcessors, /* Max logical, physical procs */ - offsetof(struct ItVpdAreas,xPlicDmaToks),/* offset to DMA toks */ - offsetof(struct ItVpdAreas,xSlicVpdAdrs),/* offset to VPD addrs */ - offsetof(struct ItVpdAreas,xPlicDmaLens),/* offset to DMA lens */ - offsetof(struct ItVpdAreas,xSlicVpdLens),/* offset to VPD lens */ - 0, /* max slot labels */ - 1, /* max LP queues */ - {0}, {0}, /* reserved */ - {0}, /* DMA lengths */ - {0}, /* DMA tokens */ - { /* VPD lengths */ + .xSlicDesc = 0xc9a3e5c1, /* "ItVA" */ + .xSlicSize = sizeof(struct ItVpdAreas), + .xSlicVpdEntries = ItVpdMaxEntries, /* # VPD array entries */ + .xSlicDmaEntries = ItDmaMaxEntries, /* # DMA array entries */ + .xSlicMaxLogicalProcs = NR_CPUS * 2, /* Max logical procs */ + .xSlicMaxPhysicalProcs = maxPhysicalProcessors, /* Max physical procs */ + .xSlicDmaToksOffset = offsetof(struct ItVpdAreas, xPlicDmaToks), + .xSlicVpdAdrsOffset = offsetof(struct ItVpdAreas, xSlicVpdAdrs), + .xSlicDmaLensOffset = offsetof(struct ItVpdAreas, xPlicDmaLens), + .xSlicVpdLensOffset = offsetof(struct ItVpdAreas, xSlicVpdLens), + .xSlicMaxSlotLabels = 0, /* max slot labels */ + .xSlicMaxLpQueues = 1, /* max LP queues */ + .xPlicDmaLens = { 0 }, /* DMA lengths */ + .xPlicDmaToks = { 0 }, /* DMA tokens */ + .xSlicVpdLens = { /* VPD lengths */ 0,0,0, /* 0 - 2 */ sizeof(xItExtVpdPanel), /* 3 Extended VPD */ sizeof(struct paca_struct), /* 4 length of Paca */ @@ -201,7 +203,7 @@ sizeof(struct ItLpQueue),/* 23 length of Lp Queue */ 0,0 /* 24 - 25 */ }, - { /* VPD addresses */ + .xSlicVpdAdrs = { /* VPD addresses */ 0,0,0, /* 0 - 2 */ &xItExtVpdPanel, /* 3 Extended VPD */ &paca[0], /* 4 first Paca */ diff -ruN linus-bk-sfr.11/arch/ppc64/kernel/ppc_ksyms.c 
linus-bk-sfr.12/arch/ppc64/kernel/ppc_ksyms.c
--- linus-bk-sfr.11/arch/ppc64/kernel/ppc_ksyms.c	2004-12-31 14:52:14.000000000 +1100
+++ linus-bk-sfr.12/arch/ppc64/kernel/ppc_ksyms.c	2005-01-04 18:07:42.000000000 +1100
@@ -68,9 +68,6 @@
 EXPORT_SYMBOL(__down_interruptible);
 EXPORT_SYMBOL(__up);
 EXPORT_SYMBOL(__down);
-#ifdef CONFIG_PPC_ISERIES
-EXPORT_SYMBOL(itLpNaca);
-#endif
 
 EXPORT_SYMBOL(csum_partial);
 EXPORT_SYMBOL(csum_partial_copy_generic);
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/7cbb34e2/attachment.pgp

From sfr at canb.auug.org.au Tue Jan 4 23:05:08 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 23:05:08 +1100
Subject: [PATCH] PPC64: tidy up the htab_data structure
In-Reply-To: <20050104225809.4b265440.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
	<20050104150410.199b132e.sfr@canb.auug.org.au>
	<20050104150833.5d3f3722.sfr@canb.auug.org.au>
	<20050104151229.521e8083.sfr@canb.auug.org.au>
	<20050104151906.6e50f1d2.sfr@canb.auug.org.au>
	<20050104152340.67219ccf.sfr@canb.auug.org.au>
	<20050104152705.6030abc5.sfr@canb.auug.org.au>
	<20050104153102.67284491.sfr@canb.auug.org.au>
	<20050104153445.3777e689.sfr@canb.auug.org.au>
	<20050104153740.56622b4f.sfr@canb.auug.org.au>
	<20050104154025.63a1b9fb.sfr@canb.auug.org.au>
	<20050104154319.505b1197.sfr@canb.auug.org.au>
	<20050104225809.4b265440.sfr@canb.auug.org.au>
Message-ID: <20050104230508.13dd0df4.sfr@canb.auug.org.au>

Hi Andrew,

More tidying up. The htab_data structure contained 5 fields, of which
two were completely unused and one other was just kept for printing at
boot time. I have made the remaining two into global variables.

Signed-off-by: Stephen Rothwell

Built and booted on iSeries (which is always lpar) and on pSeries
without partitioning.

Please apply.
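For readers following along, here is a rough user-space sketch of how a
global hash mask like the one this patch introduces is used to pick a PTE
group. It is not the kernel code: set_hash_mask() and hash_to_slot() are
hypothetical helper names, the value 8 for HPTES_PER_GROUP is an assumption
made for illustration, and the sketch assumes the number of PTE groups is a
power of two so that "num_ptegs - 1" works as a bit mask.

```c
#include <assert.h>

/* Assumed for illustration: eight HPTEs per PTE group. */
#define HPTES_PER_GROUP 8

/* Stand-in for the kernel global of the same name. */
static unsigned long htab_hash_mask;

/*
 * Hypothetical helper: derive the mask from the number of PTE groups.
 * The mask only behaves like "hash % num_ptegs" when num_ptegs is a
 * power of two, so check that here.
 */
static void set_hash_mask(unsigned long num_ptegs)
{
	assert(num_ptegs != 0 && (num_ptegs & (num_ptegs - 1)) == 0);
	htab_hash_mask = num_ptegs - 1;
}

/*
 * Hypothetical helper: index of the first HPTE slot in the group
 * selected by hash, mirroring the expression
 * (hash & htab_hash_mask) * HPTES_PER_GROUP used in the patch.
 */
static unsigned long hash_to_slot(unsigned long hash)
{
	return (hash & htab_hash_mask) * HPTES_PER_GROUP;
}
```

With the mask held in one global, callers need only the single AND and
multiply shown in hash_to_slot(), which is presumably why the separate
htab_num_ptegs field could be dropped.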
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-sfr.12/arch/ppc64/kernel/iSeries_setup.c linus-bk-sfr.13/arch/ppc64/kernel/iSeries_setup.c --- linus-bk-sfr.12/arch/ppc64/kernel/iSeries_setup.c 2004-12-13 15:31:14.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/kernel/iSeries_setup.c 2005-01-04 19:01:54.000000000 +1100 @@ -472,18 +472,16 @@ printk("HPT absolute addr = %016lx, size = %dK\n", chunk_to_addr(hptFirstChunk), hptSizeChunks * 256); - /* Fill in the htab_data structure */ - /* Fill in size of hashed page table */ + /* Fill in the hashed page table hash mask */ num_ptegs = hptSizePages * (PAGE_SIZE / (sizeof(HPTE) * HPTES_PER_GROUP)); - htab_data.htab_num_ptegs = num_ptegs; - htab_data.htab_hash_mask = num_ptegs - 1; + htab_hash_mask = num_ptegs - 1; /* * The actual hashed page table is in the hypervisor, * we have no direct access */ - htab_data.htab = NULL; + htab_address = NULL; /* * Determine if absolute memory has any diff -ruN linus-bk-sfr.12/arch/ppc64/kernel/pSeries_lpar.c linus-bk-sfr.13/arch/ppc64/kernel/pSeries_lpar.c --- linus-bk-sfr.12/arch/ppc64/kernel/pSeries_lpar.c 2004-12-31 15:16:48.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/kernel/pSeries_lpar.c 2005-01-04 19:00:17.000000000 +1100 @@ -436,7 +436,7 @@ hash = hpt_hash(vpn, 0); for (j = 0; j < 2; j++) { - slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; for (i = 0; i < HPTES_PER_GROUP; i++) { hpte_dw0.dword0 = pSeries_lpar_hpte_getword0(slot); dw0 = hpte_dw0.dw0; diff -ruN linus-bk-sfr.12/arch/ppc64/kernel/setup.c linus-bk-sfr.13/arch/ppc64/kernel/setup.c --- linus-bk-sfr.12/arch/ppc64/kernel/setup.c 2004-12-31 16:24:11.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/kernel/setup.c 2005-01-04 18:58:58.000000000 +1100 @@ -55,6 +55,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) 
udbg_printf(fmt) @@ -90,7 +91,6 @@ #endif /* extern void *stab; */ -extern HTAB htab_data; extern unsigned long klimit; extern void mm_init_ppc64(void); @@ -672,8 +672,8 @@ ppc64_caches.dline_size); printk("ppc64_caches.icache_line_size = 0x%x\n", ppc64_caches.iline_size); - printk("htab_data.htab = 0x%p\n", htab_data.htab); - printk("htab_data.num_ptegs = 0x%lx\n", htab_data.htab_num_ptegs); + printk("htab_address = 0x%p\n", htab_address); + printk("htab_hash_mask = 0x%lx\n", htab_hash_mask); printk("-----------------------------------------------------\n"); mm_init_ppc64(); diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hash_low.S linus-bk-sfr.13/arch/ppc64/mm/hash_low.S --- linus-bk-sfr.12/arch/ppc64/mm/hash_low.S 2004-10-14 18:37:37.000000000 +1000 +++ linus-bk-sfr.13/arch/ppc64/mm/hash_low.S 2005-01-04 19:06:24.000000000 +1100 @@ -139,8 +139,8 @@ std r3,STK_PARM(r4)(r1) /* Get htab_hash_mask */ - ld r4,htab_data at got(2) - ld r27,16(r4) /* htab_data.htab_hash_mask -> r27 */ + ld r4,htab_hash_mask at got(2) + ld r27,0(r4) /* htab_hash_mask -> r27 */ /* Check if we may already be in the hashtable, in this case, we * go to out-of-line code to try to modify the HPTE diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hash_native.c linus-bk-sfr.13/arch/ppc64/mm/hash_native.c --- linus-bk-sfr.12/arch/ppc64/mm/hash_native.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/mm/hash_native.c 2005-01-04 19:09:45.000000000 +1100 @@ -52,7 +52,7 @@ unsigned long hpteflags, int bolted, int large) { unsigned long arpn = physRpn_to_absRpn(prpn); - HPTE *hptep = htab_data.htab + hpte_group; + HPTE *hptep = htab_address + hpte_group; Hpte_dword0 dw0; HPTE lhpte; int i; @@ -117,7 +117,7 @@ slot_offset = mftb() & 0x7; for (i = 0; i < HPTES_PER_GROUP; i++) { - hptep = htab_data.htab + hpte_group + slot_offset; + hptep = htab_address + hpte_group + slot_offset; dw0 = hptep->dw0.dw0; if (dw0.v && !dw0.bolted) { @@ -172,9 +172,9 @@ hash = hpt_hash(vpn, 0); for (j = 0; j < 2; j++) { - 
slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; for (i = 0; i < HPTES_PER_GROUP; i++) { - hptep = htab_data.htab + slot; + hptep = htab_address + slot; dw0 = hptep->dw0.dw0; if ((dw0.avpn == (vpn >> 11)) && dw0.v && @@ -195,7 +195,7 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp, unsigned long va, int large, int local) { - HPTE *hptep = htab_data.htab + slot; + HPTE *hptep = htab_address + slot; Hpte_dword0 dw0; unsigned long avpn = va >> 23; int ret = 0; @@ -254,7 +254,7 @@ slot = native_hpte_find(vpn); if (slot == -1) panic("could not find page to bolt\n"); - hptep = htab_data.htab + slot; + hptep = htab_address + slot; set_pp_bit(newpp, hptep); @@ -269,7 +269,7 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long va, int large, int local) { - HPTE *hptep = htab_data.htab + slot; + HPTE *hptep = htab_address + slot; Hpte_dword0 dw0; unsigned long avpn = va >> 23; unsigned long flags; @@ -336,10 +336,10 @@ secondary = (pte_val(batch->pte[i]) & _PAGE_SECONDARY) >> 15; if (secondary) hash = ~hash; - slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; slot += (pte_val(batch->pte[i]) & _PAGE_GROUP_IX) >> 12; - hptep = htab_data.htab + slot; + hptep = htab_address + slot; avpn = va >> 23; if (large) diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hash_utils.c linus-bk-sfr.13/arch/ppc64/mm/hash_utils.c --- linus-bk-sfr.12/arch/ppc64/mm/hash_utils.c 2004-12-31 14:52:56.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/mm/hash_utils.c 2005-01-04 19:08:37.000000000 +1100 @@ -74,7 +74,8 @@ extern unsigned long dart_tablebase; #endif /* CONFIG_U3_DART */ -HTAB htab_data = {NULL, 0, 0, 0, 0}; +HPTE *htab_address; +unsigned long htab_hash_mask; extern unsigned long _SDR1; @@ -113,7 +114,7 @@ hash = hpt_hash(vpn, large); - hpteg = ((hash & htab_data.htab_hash_mask)*HPTES_PER_GROUP); + hpteg = ((hash & htab_hash_mask) * 
HPTES_PER_GROUP); #ifdef CONFIG_PPC_PSERIES if (systemcfg->platform & PLATFORM_LPAR) @@ -155,12 +156,11 @@ htab_size_bytes = pteg_count << 7; } - htab_data.htab_num_ptegs = pteg_count; - htab_data.htab_hash_mask = pteg_count - 1; + htab_hash_mask = pteg_count - 1; if (systemcfg->platform & PLATFORM_LPAR) { /* Using a hypervisor which owns the htab */ - htab_data.htab = NULL; + htab_address = NULL; _SDR1 = 0; } else { /* Find storage for the HPT. Must be contiguous in @@ -175,7 +175,7 @@ ppc64_terminate_msg(0x20, "hpt space"); loop_forever(); } - htab_data.htab = abs_to_virt(table); + htab_address = abs_to_virt(table); /* htab absolute addr + encoded htabsize */ _SDR1 = table + __ilog2(pteg_count) - 11; @@ -356,7 +356,7 @@ secondary = (pte_val(pte) & _PAGE_SECONDARY) >> 15; if (secondary) hash = ~hash; - slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; slot += (pte_val(pte) & _PAGE_GROUP_IX) >> 12; ppc_md.hpte_invalidate(slot, va, huge, local); diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hugetlbpage.c linus-bk-sfr.13/arch/ppc64/mm/hugetlbpage.c --- linus-bk-sfr.12/arch/ppc64/mm/hugetlbpage.c 2004-10-29 07:03:21.000000000 +1000 +++ linus-bk-sfr.13/arch/ppc64/mm/hugetlbpage.c 2005-01-04 19:02:45.000000000 +1100 @@ -832,7 +832,7 @@ hash = hpt_hash(vpn, 1); if (pte_val(old_pte) & _PAGE_SECONDARY) hash = ~hash; - slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; slot += (pte_val(old_pte) & _PAGE_GROUP_IX) >> 12; if (ppc_md.hpte_updatepp(slot, hpteflags, va, 1, local) == -1) @@ -846,7 +846,7 @@ prpn = pte_pfn(old_pte); repeat: - hpte_group = ((hash & htab_data.htab_hash_mask) * + hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; /* Update the linux pte with the HPTE slot */ @@ -863,13 +863,13 @@ /* Primary is full, try the secondary */ if (unlikely(slot == -1)) { pte_val(new_pte) |= _PAGE_SECONDARY; - hpte_group = ((~hash & 
htab_data.htab_hash_mask) * + hpte_group = ((~hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; slot = ppc_md.hpte_insert(hpte_group, va, prpn, 1, hpteflags, 0, 1); if (slot == -1) { if (mftb() & 0x1) - hpte_group = ((hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; + hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; ppc_md.hpte_remove(hpte_group); goto repeat; diff -ruN linus-bk-sfr.12/arch/ppc64/mm/init.c linus-bk-sfr.13/arch/ppc64/mm/init.c --- linus-bk-sfr.12/arch/ppc64/mm/init.c 2004-12-10 16:26:54.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/mm/init.c 2005-01-04 19:03:14.000000000 +1100 @@ -168,7 +168,7 @@ hash = hpt_hash(vpn, 0); - hpteg = ((hash & htab_data.htab_hash_mask)*HPTES_PER_GROUP); + hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); /* Panic if a pte grpup is full */ if (ppc_md.hpte_insert(hpteg, va, pa >> PAGE_SHIFT, 0, diff -ruN linus-bk-sfr.12/include/asm-ppc64/mmu.h linus-bk-sfr.13/include/asm-ppc64/mmu.h --- linus-bk-sfr.12/include/asm-ppc64/mmu.h 2004-10-29 07:03:22.000000000 +1000 +++ linus-bk-sfr.13/include/asm-ppc64/mmu.h 2005-01-04 19:10:32.000000000 +1100 @@ -98,15 +98,8 @@ #define PP_RXRX 3 /* Supervisor read, User read */ -typedef struct { - HPTE * htab; - unsigned long htab_num_ptegs; - unsigned long htab_hash_mask; - unsigned long next_round_robin; - unsigned long last_kernel_address; -} HTAB; - -extern HTAB htab_data; +extern HPTE * htab_address; +extern unsigned long htab_hash_mask; static inline unsigned long hpt_hash(unsigned long vpn, int large) { -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/a5b39bfc/attachment.pgp From moilanen at austin.ibm.com Wed Jan 5 07:30:31 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 4 Jan 2005 14:30:31 -0600 Subject: [PATCH] xmon breakpoints fix for Power4/5 Message-ID: <20050104143031.62c25338@localhost> Looks like xmon breakpoints were not working on Power4/5. Here's a fix to the problem. Tested on Power3 and Power5 boxes. Jake Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/xmon/xmon.c~xmon-lpar-bp arch/ppc64/xmon/xmon.c --- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-lpar-bp Tue Jan 4 12:44:20 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c Tue Jan 4 14:13:09 2005 @@ -1088,11 +1088,6 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ - if (!(cur_cpu_spec->cpu_features & CPU_FTR_IABR)) { - printf("Hardware instruction breakpoint " - "not supported on this cpu\n"); - break; - } if (iabr) { iabr->enabled &= ~(BP_IABR | BP_IABR_TE); iabr = NULL; @@ -1101,10 +1096,15 @@ bpt_cmds(void) break; if (!check_bp_loc(a)) break; + bp = new_breakpoint(a); - if (bp != NULL) { + + if (cur_cpu_spec->cpu_features & CPU_FTR_IABR) { bp->enabled |= BP_IABR | BP_IABR_TE; iabr = bp; + } else { + if (bp) + bp->enabled |= BP_TRAP; } break; _ From sjmunroe at us.ibm.com Wed Jan 5 09:02:04 2005 From: sjmunroe at us.ibm.com (Steve Munroe) Date: Tue, 4 Jan 2005 16:02:04 -0600 Subject: ppc64 vDSO update In-Reply-To: <1101094716.13598.39.camel@gaston> Message-ID: Benjamin Herrenschmidt wrote on 11/21/2004 09:38:36 PM: > At the URL below, you can find a new version of the ppc64 vDSO patch against > a recent Linus bk tree. I intend to submit it upstream real soon as the work > on non-executable stack is waiting for it, though we must first make sure the > way symbols are exported to userland is ok for glibc. 
>
> http://gate.crashing.org/~benh/ppc64-vdso-20041122.diff
> ...
>
> (Craig: the signal issue is fixed now, either when building with
> descriptors or without).
>
> Ben.
>

Still having problems with VDSO/GLIBC integration. Basically any glibc
make check test that uses signals is a space shot for both PPC32/PPC64.

First it seems that glibc is expecting a (fairly normal) DSO image
including two (2) LOAD entries in the program header. The current PPC64
kernel vdso images only contain one (1) LOAD entry:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00100000 0x00100000 0x00e10 0x00e10 R E 0x10000
  DYNAMIC        0x000d98 0x00100d98 0x00100d98 0x00078 0x00078 R   0x4
  GNU_EH_FRAME   0x000000 0x00000000 0x00000000 0x00000 0x00000     0x4

This caused problems for the code in libc/elf/rtld.c that attempts to
extract l_map_start/l_map_end for the vdso:

    else if (ph->p_type == PT_LOAD)
      {
        if (! l->l_addr)
          l->l_addr = ph->p_vaddr;
        else if (ph->p_vaddr + ph->p_memsz >= l->l_map_end)
          l->l_map_end = ph->p_vaddr + ph->p_memsz;
        else if ((ph->p_flags & PF_X)
                 && ph->p_vaddr + ph->p_memsz >= l->l_text_end)
          l->l_text_end = ph->p_vaddr + ph->p_memsz;
      }

This code will set l_addr but not l_map_end or l_text_end because it
grabbed the p_vaddr from the 1st and only LOAD entry then continues the
loop looking for the 2nd LOAD entry (which is not there!). On PPC32 this
causes the "assert (mapend > mapstart)" in __elf_preferred_address to
fail. I hacked around this by removing the "else" from the "else if" but
it just fails later.

The remaining problem is we are getting into dl_iterate_phdr and taking a
wild branch. This could be from the callback in dl_iterate_phdr and due to
the incomplete nature of our vdso. This is difficult to debug as the stack
pointer (and TOC pointer in PPC64) are both clobbered by this point and
GDB-6.1 gets totally confused.

Ben: it would be handy if you could update the corefile support to
include the vdso segments.
Also please try a vdso with 2 LOAD segments. Steven J. Munroe Linux on Power Toolchain Architect IBM Corporation, Linux Technology Center From moilanen at austin.ibm.com Wed Jan 5 09:13:54 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 4 Jan 2005 16:13:54 -0600 Subject: in_be64() assembly Message-ID: <20050104161354.17f77ce7@localhost> I'm trying to use in_be64() and when I build, I get compile errors: {standard input}: Assembler messages: {standard input}:5534: Error: syntax error; found `(' but expected `)' {standard input}:5534: Error: junk at end of line: `(3))' make[1]: *** [arch/ppc64/xmon/xmon.o] Error 1 make: *** [arch/ppc64/xmon] Error 2 make: *** Waiting for unfinished jobs.... Olof pointed out that in/out_le64 use a "b" operand for the addr. In in_be64(), when I changed the "m" operand to a "b", the kernel built fine (although I haven't tried running it yet). What does the "b" operand mean? Patch used below. Thanks, Jake --- diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h --- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix Tue Jan 4 15:33:22 2005 +++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h Tue Jan 4 15:59:50 2005 @@ -372,7 +372,7 @@ static inline unsigned long in_be64(cons unsigned long ret; __asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync" - : "=r" (ret) : "m" (*addr)); + : "=r" (ret) : "b" (*addr)); return ret; } _ From amodra at bigpond.net.au Wed Jan 5 10:31:32 2005 From: amodra at bigpond.net.au (Alan Modra) Date: Wed, 5 Jan 2005 10:01:32 +1030 Subject: ppc64 vDSO update In-Reply-To: References: <1101094716.13598.39.camel@gaston> Message-ID: <20050104233132.GF11457@bubble.modra.org> On Tue, Jan 04, 2005 at 04:02:04PM -0600, Steve Munroe wrote: > First it seems that glibc is expecting a (fairly normal) DSO image > including two (2) LOAD entries in the program header.
The current PPC64 > kernel vdso images only contain one (1) LOAD entry: > > Program Headers: > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align > LOAD 0x000000 0x00100000 0x00100000 0x00e10 0x00e10 R E > 0x10000 > DYNAMIC 0x000d98 0x00100d98 0x00100d98 0x00078 0x00078 R 0x4 > GNU_EH_FRAME 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4 There's absolutely nothing wrong with an executable or shared lib having just one PT_LOAD segment. It's a glibc bug if ld.so can't handle it. > This caused problems for the code in libc/elf/rtld.c that attempts to > extract l_map_start/l_map_end for the vdso: > > else if (ph->p_type == PT_LOAD) > { > if (! l->l_addr) > l->l_addr = ph->p_vaddr; > else if (ph->p_vaddr + ph->p_memsz >= l->l_map_end) > l->l_map_end = ph->p_vaddr + ph->p_memsz; > else if ((ph->p_flags & PF_X) > && ph->p_vaddr + ph->p_memsz >= l->l_text_end) > l->l_text_end = ph->p_vaddr + ph->p_memsz; > } > > This code will set l_addr but not l_map_end or l_text_end because it > grabbed the p_vaddr from the 1st and only LOAD entry then continues the > loop looking for the 2nd LOAD entry (which is not there!). On PPC32 this > causes the "assert (mapend > mapstart)" in __elf_preferred_address to > fail. I hacked around this by removing the "else" from the "else if" but > it just fails later. Buggy code. All the "else" keywords should be removed. ie. if (! l->l_addr) l->l_addr = ph->p_vaddr; if (ph->p_vaddr + ph->p_memsz >= l->l_map_end) l->l_map_end = ph->p_vaddr + ph->p_memsz; if ((ph->p_flags & PF_X) && ph->p_vaddr + ph->p_memsz >= l->l_text_end) l->l_text_end = ph->p_vaddr + ph->p_memsz; > The remaining problem is we are getting into dl_iterate_phdr and taking a > wild branch. This could be from the callback in dl_iterate_phdr and due to > the incomplete nature of our vdso. This is difficult to debug as the stack > pointer (and TOC pointer in PPC64) are both clobbered by this point and > GDB-6.1 gets totally confused.
I don't know what to suggest, other than brute force debugging by poking .long 0 over code paths you suspect might be executed. -- Alan Modra IBM OzLabs - Linux Technology Centre From paulus at samba.org Wed Jan 5 10:53:34 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 5 Jan 2005 10:53:34 +1100 Subject: [PATCH] xmon breakpoints fix for Power4/5 In-Reply-To: <20050104143031.62c25338@localhost> References: <20050104143031.62c25338@localhost> Message-ID: <16859.11390.511469.875831@cargo.ozlabs.ibm.com> Jake Moilanen writes: > Looks like xmon breakpoints were not working on Power4/5. Here's a fix > to the problem. You mean the 'bi' command didn't make a breakpoint? Just use the 'b' command instead. Also you take out the if (bp != NULL) check which is needed. Rejected. Paul. From linas at austin.ibm.com Wed Jan 5 11:10:16 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Tue, 4 Jan 2005 18:10:16 -0600 Subject: in_be64() assembly In-Reply-To: <20050104161354.17f77ce7@localhost> References: <20050104161354.17f77ce7@localhost> Message-ID: <20050105001016.GC22274@austin.ibm.com> On Tue, Jan 04, 2005 at 04:13:54PM -0600, Jake Moilanen was heard to remark: > > diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h > --- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix Tue Jan 4 15:33:22 2005 > +++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h Tue Jan 4 15:59:50 2005 > @@ -372,7 +372,7 @@ static inline unsigned long in_be64(cons > unsigned long ret; > > __asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync" > - : "=r" (ret) : "m" (*addr)); > + : "=r" (ret) : "b" (*addr)); > return ret; > } Very weird. Why anyone thought that doing a load with a zero offset is somehow 'correct' seems strange to me. The compiler is quite capable of computing offsets, and I don't see any aliasing issues. Certainly the 8, 16 and 32-bit versions don't do this kind of funny business. Does the following work? static inline unsigned long in_be64(const whatever ...)
{
	unsigned long ret;

	__asm__ __volatile__("ld %0,%1; twi 0,%0,0; isync"
			     : "=r" (ret) : "m" (*addr));
	return ret;
}

I suspect in_le64 is also borken, it should be

	"ld %1,%2\n"

...with

	: "=r" (ret) , "=r" (tmp) : "m" (*addr) ,

instead of the b. out_le64 looks broken in the same way.

--linas

From paulus at samba.org Wed Jan 5 11:24:44 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 5 Jan 2005 11:24:44 +1100
Subject: in_be64() assembly
In-Reply-To: <20050104161354.17f77ce7@localhost>
References: <20050104161354.17f77ce7@localhost>
Message-ID: <16859.13260.426004.296846@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> In in_be64(), when I changed the "m" operand to a "b", the kernel built
> fine (although I haven't tried running it yet). What does the "b"
> operand mean?

"b" means the value should be in a "base" register, i.e. any gpr other
than gpr0.

Your patch isn't correct. We can either do:

	__asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync"
			     : "=r" (ret) : "b" (addr));

(note no "*" before addr) or we can do

	__asm__ __volatile__("ld%U1%X1 %0,%1; twi 0,%0,0; isync"
			     : "=r" (ret) : "m" (*addr));

On the whole I think I prefer the second.

Paul.

From paulus at samba.org Wed Jan 5 11:35:34 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 5 Jan 2005 11:35:34 +1100
Subject: in_be64() assembly
In-Reply-To: <20050105001016.GC22274@austin.ibm.com>
References: <20050104161354.17f77ce7@localhost>
	<20050105001016.GC22274@austin.ibm.com>
Message-ID: <16859.13910.16173.232170@cargo.ozlabs.ibm.com>

Linas Vepstas writes:

> Very weird. Why anyone thought that doing a load with a zero offset
> is somehow 'correct' seems strange to me. The compiler is quite

It's one of the two addressing modes that PPC has - register + offset
and register + register.

> I suspect in_le64 is also borken, it should be
>
> "ld %1,%2\n"

It and out_le64 are correct as they stand. They could be rewritten as
"ld%U2%X2 %1,%2" etc.

Paul.
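
[Illustrative note, not part of any posted patch.] The two operand styles discussed in the in_be64() thread can be put side by side in a minimal sketch. The helper names load64_base_reg and load64_mem_operand are hypothetical, the "twi 0,%0,0; isync" sequence is left out to keep the focus on the constraints, the "memory" clobber in the first variant is an addition that does not appear in the thread, and the non-PowerPC branch is only there so the sketch builds on other targets.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; these are not the
 * kernel's in_be64().
 */

/*
 * Variant 1: "b" constraint. The asm is handed the pointer value itself
 * in a base register (any GPR except r0) and dereferences it as 0(%1).
 */
static inline uint64_t load64_base_reg(const volatile uint64_t *addr)
{
#if defined(__powerpc64__)
	uint64_t ret;

	__asm__ __volatile__("ld %0,0(%1)"
			     : "=r" (ret)
			     : "b" (addr)	/* addr, not *addr */
			     : "memory");	/* assumption: tell the compiler memory is read */
	return ret;
#else
	return *addr;	/* plain load so the sketch builds elsewhere */
#endif
}

/*
 * Variant 2: "m" constraint. The asm is handed the memory operand and
 * the compiler picks the addressing mode; %U1 and %X1 append the "u"
 * (update) or "x" (indexed) suffix to the mnemonic when needed.
 */
static inline uint64_t load64_mem_operand(const volatile uint64_t *addr)
{
#if defined(__powerpc64__)
	uint64_t ret;

	__asm__ __volatile__("ld%U1%X1 %0,%1"
			     : "=r" (ret)
			     : "m" (*addr));
	return ret;
#else
	return *addr;
#endif
}
```

Writing "b" (*addr) instead, as in the first posted patch, would place the loaded value in the base register and then use that value as an address, which is presumably why it assembled cleanly but was called incorrect.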
From david at gibson.dropbear.id.au Wed Jan 5 14:54:28 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 5 Jan 2005 14:54:28 +1100 Subject: [PPC64] Add performance monitor register information to processor.h Message-ID: <20050105035428.GC21259@zax> Andrew, please apply: Most special purpose registers on the ppc64 have both the SPR number, and the various fields within the register defined in asm-ppc64/processor.h. So far that's not true for the performance counter control registers, MMCR0 and MMCRA. They have the SPR numbers defined, but the internal fields are defined in the oprofile code and (just a few) in traps.c where they're actually used. This patch moves all the MMCR0 and MMCRA definitions, plus the MSR performance monitor bit, MSR_PMM, into processor.h. Index: working-2.6/include/asm-ppc64/processor.h =================================================================== --- working-2.6.orig/include/asm-ppc64/processor.h 2005-01-05 14:46:10.557311664 +1100 +++ working-2.6/include/asm-ppc64/processor.h 2005-01-05 14:46:12.551274880 +1100 @@ -44,6 +44,7 @@ #define MSR_DR_LG 4 /* Data Relocate */ #define MSR_PE_LG 3 /* Protection Enable */ #define MSR_PX_LG 2 /* Protection Exclusive Mode */ +#define MSR_PMM_LG 2 /* Performance monitor */ #define MSR_RI_LG 1 /* Recoverable Exception */ #define MSR_LE_LG 0 /* Little Endian */ @@ -76,6 +77,7 @@ #define MSR_DR __MASK(MSR_DR_LG) /* Data Relocate */ #define MSR_PE __MASK(MSR_PE_LG) /* Protection Enable */ #define MSR_PX __MASK(MSR_PX_LG) /* Protection Exclusive Mode */ +#define MSR_PMM __MASK(MSR_PMM_LG) /* Performance monitor */ #define MSR_RI __MASK(MSR_RI_LG) /* Recoverable Exception */ #define MSR_LE __MASK(MSR_LE_LG) /* Little Endian */ @@ -305,6 +307,9 @@ #define SPRN_SIAR 780 #define SPRN_SDAR 781 #define SPRN_MMCRA 786 +#define MMCRA_SIHV 0x10000000UL /* state of MSR HV when SIAR set */ +#define MMCRA_SIPR 0x08000000UL /* state of MSR PR when SIAR set */ +#define MMCRA_SAMPLE_ENABLE 
0x00000001UL /* enable sampling */ #define SPRN_PMC1 787 #define SPRN_PMC2 788 #define SPRN_PMC3 789 @@ -314,6 +319,26 @@ #define SPRN_PMC7 793 #define SPRN_PMC8 794 #define SPRN_MMCR0 795 +#define MMCR0_FC 0x80000000UL /* freeze counters. set to 1 on a perfmon exception */ +#define MMCR0_FCS 0x40000000UL /* freeze in supervisor state */ +#define MMCR0_KERNEL_DISABLE MMCR0_FCS +#define MMCR0_FCP 0x20000000UL /* freeze in problem state */ +#define MMCR0_PROBLEM_DISABLE MMCR0_FCP +#define MMCR0_FCM1 0x10000000UL /* freeze counters while MSR mark = 1 */ +#define MMCR0_FCM0 0x08000000UL /* freeze counters while MSR mark = 0 */ +#define MMCR0_PMXE 0x04000000UL /* performance monitor exception enable */ +#define MMCR0_FCECE 0x02000000UL /* freeze counters on enabled condition or event */ +/* time base exception enable */ +#define MMCR0_TBEE 0x00400000UL /* time base exception enable */ +#define MMCR0_PMC1INTCONTROL 0x00008000UL /* PMC1 count enable*/ +#define MMCR0_PMCNINTCONTROL 0x00004000UL /* PMCn count enable*/ +#define MMCR0_TRIGGER 0x00002000UL /* TRIGGER enable */ +#define MMCR0_PMAO 0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */ +#define MMCR0_SHRFC 0x00000040UL /* SHRre freeze conditions between threads */ +#define MMCR0_FCTI 0x00000008UL /* freeze counters in tags inactive mode */ +#define MMCR0_FCTA 0x00000004UL /* freeze counters in tags active mode */ +#define MMCR0_FCWAIT 0x00000002UL /* freeze counter in WAIT state */ +#define MMCR0_FCHV 0x00000001UL /* freeze conditions in hypervisor mode */ #define SPRN_MMCR1 798 /* Short-hand versions for a number of the above SPRNs */ Index: working-2.6/arch/ppc64/oprofile/op_impl.h =================================================================== --- working-2.6.orig/arch/ppc64/oprofile/op_impl.h 2005-01-05 14:46:10.558311512 +1100 +++ working-2.6/arch/ppc64/oprofile/op_impl.h 2005-01-05 14:46:12.551274880 +1100 @@ -14,44 +14,6 @@ #define OP_MAX_COUNTER 8 -#define 
MSR_PMM (1UL << (63 - 61)) - -/* freeze counters. set to 1 on a perfmon exception */ -#define MMCR0_FC (1UL << (31 - 0)) - -/* freeze in supervisor state */ -#define MMCR0_KERNEL_DISABLE (1UL << (31 - 1)) - -/* freeze in problem state */ -#define MMCR0_PROBLEM_DISABLE (1UL << (31 - 2)) - -/* freeze counters while MSR mark = 1 */ -#define MMCR0_FCM1 (1UL << (31 - 3)) - -/* performance monitor exception enable */ -#define MMCR0_PMXE (1UL << (31 - 5)) - -/* freeze counters on enabled condition or event */ -#define MMCR0_FCECE (1UL << (31 - 6)) - -/* PMC1 count enable*/ -#define MMCR0_PMC1INTCONTROL (1UL << (31 - 16)) - -/* PMCn count enable*/ -#define MMCR0_PMCNINTCONTROL (1UL << (31 - 17)) - -/* performance monitor alert has occurred, set to 0 after handling exception */ -#define MMCR0_PMAO (1UL << (31 - 24)) - -/* state of MSR HV when SIAR set */ -#define MMCRA_SIHV (1UL << (63 - 35)) - -/* state of MSR PR when SIAR set */ -#define MMCRA_SIPR (1UL << (63 - 36)) - -/* enable sampling */ -#define MMCRA_SAMPLE_ENABLE (1UL << (63 - 63)) - /* Per-counter configuration as set via oprofilefs. */ struct op_counter_config { unsigned long valid; Index: working-2.6/arch/ppc64/kernel/traps.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/traps.c 2005-01-05 14:46:10.558311512 +1100 +++ working-2.6/arch/ppc64/kernel/traps.c 2005-01-05 14:46:12.552274728 +1100 @@ -545,9 +545,6 @@ } /* Ensure exceptions are disabled */ -#define MMCR0_PMXE (1UL << (31 - 5)) -#define MMCR0_PMAO (1UL << (31 - 24)) - static void dummy_perf(struct pt_regs *regs) { unsigned int mmcr0 = mfspr(SPRN_MMCR0); -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! 
http://www.ozlabs.org/people/dgibson From paulus at samba.org Wed Jan 5 16:09:38 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 5 Jan 2005 16:09:38 +1100 Subject: [PATCH] PPC64 Use newer RTAS call when available Message-ID: <16859.30354.953245.690482@cargo.ozlabs.ibm.com> This patch is from Nathan Fontenot originally. The PPC64 EEH code needs a small update to start using the ibm,read-slot-reset-state2 rtas call if available. The currently used ibm,read-slot-reset-state call will be going away on future machines. This patch attempts to use the newer rtas call if available and falls back to the older version otherwise. This will maintain EEH slot checking capabilities on all future and current firmware levels. Signed-off-by: Nathan Fontenot Signed-off-by: Paul Mackerras diff -urN base-2.6/arch/ppc64/kernel/eeh.c test/arch/ppc64/kernel/eeh.c --- base-2.6/arch/ppc64/kernel/eeh.c 2005-01-05 14:29:58.333466400 +1100 +++ test/arch/ppc64/kernel/eeh.c 2005-01-05 15:04:59.937483424 +1100 @@ -96,6 +96,7 @@ static int ibm_set_eeh_option; static int ibm_set_slot_reset; static int ibm_read_slot_reset_state; +static int ibm_read_slot_reset_state2; static int ibm_slot_error_detail; static int eeh_subsystem_enabled; @@ -408,6 +409,27 @@ } /** + * read_slot_reset_state - Read the reset state of a device node's slot + * @dn: device node to read + * @rets: array to return results in + */ +static int read_slot_reset_state(struct device_node *dn, int rets[]) +{ + int token, outputs; + + if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { + token = ibm_read_slot_reset_state2; + outputs = 4; + } else { + token = ibm_read_slot_reset_state; + outputs = 3; + } + + return rtas_call(token, 3, outputs, rets, dn->eeh_config_addr, + BUID_HI(dn->phb->buid), BUID_LO(dn->phb->buid)); +} + +/** * eeh_panic - call panic() for an eeh event that cannot be handled.
* The philosophy of this routine is that it is better to panic and * halt the OS than it is to risk possible data corruption by @@ -509,7 +531,7 @@ int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) { int ret; - int rets[2]; + int rets[3]; unsigned long flags; int rc, reset_state; struct eeh_event *event; @@ -540,11 +562,8 @@ atomic_inc(&eeh_fail_count); if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { /* re-read the slot reset state */ - rets[0] = -1; - rtas_call(ibm_read_slot_reset_state, 3, 3, rets, - dn->eeh_config_addr, - BUID_HI(dn->phb->buid), - BUID_LO(dn->phb->buid)); + if (read_slot_reset_state(dn, rets) != 0) + rets[0] = -1; /* reset state unknown */ eeh_panic(dev, rets[0]); } return 0; @@ -557,10 +576,7 @@ * function zero of a multi-function device. * In any case they must share a common PHB. */ - ret = rtas_call(ibm_read_slot_reset_state, 3, 3, rets, - dn->eeh_config_addr, BUID_HI(dn->phb->buid), - BUID_LO(dn->phb->buid)); - + ret = read_slot_reset_state(dn, rets); if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) { __get_cpu_var(false_positives)++; return 0; @@ -756,6 +772,7 @@ ibm_set_eeh_option = rtas_token("ibm,set-eeh-option"); ibm_set_slot_reset = rtas_token("ibm,set-slot-reset"); + ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2"); ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); From moilanen at austin.ibm.com Thu Jan 6 01:42:02 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 5 Jan 2005 08:42:02 -0600 Subject: [PATCH] xmon breakpoints fix for Power4/5 In-Reply-To: <16859.11390.511469.875831@cargo.ozlabs.ibm.com> References: <20050104143031.62c25338@localhost> <16859.11390.511469.875831@cargo.ozlabs.ibm.com> Message-ID: <20050105084202.5102b467@localhost> On Wed, 5 Jan 2005 10:53:34 +1100 Paul Mackerras wrote: > Jake Moilanen writes: > > > Looks like xmon breakpoints were not working on 
Power4/5. Here's a fix > > to the problem. > > You mean the 'bi' command didn't make a breakpoint? Just use the 'b' > command instead. Also you take out the if (bp != NULL) check which is > needed. I may have misunderstood what Anton wanted when I talked w/ him yesterday, but I was under the impression that he wanted 'bi' and 'bd' fixed for Power4/5/LPAR. I pretty much just made 'bi' work like 'b' for Power4/5. I should have been a little more explicit when I wrote up the description of the patch. If I misunderstood, please just throw this follow-up patch away. In the follow-up, I also included the (bp != NULL) check even though it should not matter because we reuse the same bp every time. I do agree that it should still have the check. I will be posting the 'bd' fix for LPAR shortly. Thanks, Jake Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/xmon/xmon.c~xmon-lpar-bp arch/ppc64/xmon/xmon.c --- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-lpar-bp Wed Jan 5 08:14:09 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c Wed Jan 5 08:15:48 2005 @@ -1050,7 +1050,7 @@ static char *breakpoint_help_string = "b [cnt] set breakpoint at given instr addr\n" "bc clear all breakpoints\n" "bc clear breakpoint number n or at addr\n" - "bi [cnt] set hardware instr breakpoint (broken?)\n" + "bi [cnt] set hardware instr breakpoint\n" "bd [cnt] set hardware data breakpoint (broken?)\n" ""; @@ -1088,11 +1088,6 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ - if (!(cur_cpu_spec->cpu_features & CPU_FTR_IABR)) { - printf("Hardware instruction breakpoint " - "not supported on this cpu\n"); - break; - } if (iabr) { iabr->enabled &= ~(BP_IABR | BP_IABR_TE); iabr = NULL; @@ -1101,11 +1096,16 @@ bpt_cmds(void) break; if (!check_bp_loc(a)) break; + bp = new_breakpoint(a); - if (bp != NULL) { - bp->enabled |= BP_IABR | BP_IABR_TE; - iabr = bp; + if (bp) { + if (cur_cpu_spec->cpu_features & CPU_FTR_IABR) { + bp->enabled |= BP_IABR | BP_IABR_TE; + iabr = bp; + } else +
bp->enabled |= BP_TRAP; } + break; case 'c': _ From moilanen at austin.ibm.com Thu Jan 6 01:52:19 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 5 Jan 2005 08:52:19 -0600 Subject: [PATCH] xmon dabr support for LPAR Message-ID: <20050105085219.5eab02a8@localhost> Here's xmon DABR support for LPAR. I added SETCTRLREG which is a wrapper for setting a controlled register that will choose to use either an hcall or mtspr depending on what mode the machine is in. Thanks, Jake Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/xmon/xmon.c~xmon-lpar-dabr arch/ppc64/xmon/xmon.c --- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-lpar-dabr Wed Jan 5 08:17:07 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c Wed Jan 5 08:23:50 2005 @@ -712,7 +712,7 @@ static void insert_bpts(void) static void insert_cpu_bpts(void) { if (dabr.enabled) - set_dabr(dabr.address | (dabr.enabled & 7)); + set_controlled_dabr(dabr.address | (dabr.enabled & 7)); if (iabr && (cur_cpu_spec->cpu_features & CPU_FTR_IABR)) set_iabr(iabr->address | (iabr->enabled & (BP_IABR|BP_IABR_TE))); @@ -740,7 +740,7 @@ static void remove_bpts(void) static void remove_cpu_bpts(void) { - set_dabr(0); + set_controlled_dabr(0); if ((cur_cpu_spec->cpu_features & CPU_FTR_IABR)) set_iabr(0); } @@ -1051,7 +1051,7 @@ static char *breakpoint_help_string = "bc clear all breakpoints\n" "bc clear breakpoint number n or at addr\n" "bi [cnt] set hardware instr breakpoint\n" - "bd [cnt] set hardware data breakpoint (broken?)\n" + "bd [cnt] set hardware data breakpoint\n" ""; static void diff -puN arch/ppc64/xmon/privinst.h~xmon-lpar-dabr arch/ppc64/xmon/privinst.h --- linux-2.6-bk/arch/ppc64/xmon/privinst.h~xmon-lpar-dabr Wed Jan 5 08:17:22 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/privinst.h Wed Jan 5 08:20:02 2005 @@ -25,6 +25,16 @@ GETREG(cr) static inline void set_ ## name (long val) \ { asm volatile ("mtspr " #n ",%0" : : "r" (val)); } +/* + * If a register is a controlled resource protected when there + * 
is a hypervisor, then use this command. + */ +#define SETCTRLREG(name) \ + extern inline void set_lpar_ ##name(long val); \ + extern inline void set_controlled_ ## name (long val) \ + { (systemcfg->platform == PLATFORM_PSERIES_LPAR) ? \ + set_lpar_ ##name (val) : set_ ##name (val); } + GSETSPR(0, mq) GSETSPR(1, xer) GSETSPR(4, rtcu) @@ -48,6 +58,8 @@ GSETSPR(1009, hid1) GSETSPR(1010, iabr) GSETSPR(1013, dabr) GSETSPR(1023, pir) + +SETCTRLREG(dabr) static inline void store_inst(void *p) { diff -puN arch/ppc64/xmon/start.c~xmon-lpar-dabr arch/ppc64/xmon/start.c --- linux-2.6-bk/arch/ppc64/xmon/start.c~xmon-lpar-dabr Wed Jan 5 08:17:49 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/start.c Wed Jan 5 08:20:49 2005 @@ -46,6 +46,16 @@ static int __init setup_xmon_sysrq(void) __initcall(setup_xmon_sysrq); #endif /* CONFIG_MAGIC_SYSRQ */ +inline void set_lpar_dabr(long val) +{ + int rc; + + rc = plpar_hcall_norets(H_SET_DABR, val); + + if (rc != H_Success) + xmon_printf("Warning: setting DABR failed. rc = %d\n", rc); +} + int xmon_write(void *handle, void *ptr, int nb) { _ From linas at austin.ibm.com Thu Jan 6 07:27:56 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Wed, 5 Jan 2005 14:27:56 -0600 Subject: [PATCH] PPC64: xmon recursion Message-ID: <20050105202756.GF22274@austin.ibm.com> Hi, I've had a number of problems with recursive xmon calls, primarily because xmon_longjmp was returning incorrectly. The following patch fixes this problem. Please review and forward upstream. 
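A minimal sketch of the contract a longjmp-style routine is expected to honour, assuming the standard C setjmp()/longjmp() as a stand-in for xmon's own assembly: control must resume at the setjmp() site, with the value supplied by the caller. The names fault_jmp, last_fault, simulate_fault and guarded_access are hypothetical and do not come from xmon.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-in for a monitor's fault recovery point. */
static jmp_buf fault_jmp;
/* Fault code recorded before jumping; setjmp()'s result is only tested. */
static volatile int last_fault;

/* Simulates a fault taken while the guarded region is running. */
static void simulate_fault(int code)
{
    assert(code != 0);          /* longjmp() maps 0 to 1, so 0 is not used */
    last_fault = code;
    longjmp(fault_jmp, code);   /* does not return to this function */
}

/* Returns 0 if the guarded region completed, otherwise the fault code. */
int guarded_access(int fault_code)
{
    if (setjmp(fault_jmp) == 0) {
        if (fault_code != 0)
            simulate_fault(fault_code);
        return 0;               /* normal path, no fault taken */
    }
    return last_fault;          /* resumed here via longjmp() */
}
```

If the jump did not land back at the setjmp() site, guarded_access() could not report the fault, which is roughly the failure mode a recursive monitor entry would hit.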
--linas Signed-off-by: Linas Vepstas ===== arch/ppc64/xmon/setjmp.c 1.1 vs edited ===== --- 1.1/arch/ppc64/xmon/setjmp.c 2002-02-14 06:14:36 -06:00 +++ edited/arch/ppc64/xmon/setjmp.c 2004-12-14 17:51:29 -06:00 @@ -73,5 +73,6 @@ xmon_longjmp(long *buf, int val) ld 2,16(%0)\n\ mtlr 0\n\ mr 3,%1\n\ + blr \n\ " : : "r" (buf), "r" (val)); } From moilanen at austin.ibm.com Thu Jan 6 07:45:02 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 5 Jan 2005 14:45:02 -0600 Subject: [PATCH 0/2] xmon io space read Message-ID: <20050105144502.56a15bcd@localhost> These patches allow xmon to read from ioremapped IO space. It uses a command very similar to the normal memory read. I elected to not reuse the memory read code because I wanted some extra "security" to help prevent an inadvertent destructive read. I had to also add a debugger_fault_handler() in bad_page_fault() to catch an illegal attempt at hashing a bad page via a hcall. Patch 1/2: Fix for in_be64() Patch 2/2: xmon code to read from io space. Thanks, Jake From moilanen at austin.ibm.com Thu Jan 6 07:52:55 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 5 Jan 2005 14:52:55 -0600 Subject: [PATCH 1/2] xmon io space read In-Reply-To: <20050105144502.56a15bcd@localhost> References: <20050105144502.56a15bcd@localhost> Message-ID: <20050105145255.41819748@localhost> Here is the fix suggested by Paulus for in_be64(). 
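For context, a hedged sketch of why the operand form matters, assuming GCC's PowerPC inline-asm behaviour: with an "m" constraint the operand is already a complete memory reference, so wrapping it as 0(%1) is malformed, while the %U1 and %X1 modifiers let the compiler select the update or indexed form of the load when it needs one. The helper names sketch_io_load64 and sketch_io_load64_selfcheck are hypothetical, this is not the kernel's in_be64(), and the non-PowerPC branch is only a fallback so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: loads one 64-bit word from the given location. */
static inline uint64_t sketch_io_load64(const volatile uint64_t *addr)
{
    uint64_t ret;

#if defined(__powerpc64__)
    /* "m" makes %1 a full memory operand; %U1/%X1 append u/x when needed. */
    __asm__ __volatile__("ld%U1%X1 %0,%1; twi 0,%0,0; isync"
                         : "=r" (ret) : "m" (*addr));
#else
    /* Fallback for other architectures: a plain volatile load. */
    ret = *addr;
#endif
    return ret;
}

/* Small self-check: the helper returns what was stored in memory. */
int sketch_io_load64_selfcheck(void)
{
    uint64_t word = UINT64_C(0x0123456789abcdef);

    assert(sketch_io_load64(&word) == word);
    return 0;
}
```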
Thanks, Jake Signed-off-by: Jake Moilanen --- diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h --- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix Tue Jan 4 15:33:22 2005 +++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h Wed Jan 5 08:08:03 2005 @@ -371,7 +371,7 @@ static inline unsigned long in_be64(cons { unsigned long ret; - __asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync" + __asm__ __volatile__("ld%U1%X1 %0,%1; twi 0,%0,0; isync" : "=r" (ret) : "m" (*addr)); return ret; } _ From moilanen at austin.ibm.com Thu Jan 6 07:57:57 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 5 Jan 2005 14:57:57 -0600 Subject: [PATCH 2/2] xmon io space read In-Reply-To: <20050105144502.56a15bcd@localhost> References: <20050105144502.56a15bcd@localhost> Message-ID: <20050105145757.62c84c3b@localhost> Here is the support code for xmon to read IO space. It should come in handy to debug driver and bringup issues. Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/xmon/xmon.c~xmon-io-read arch/ppc64/xmon/xmon.c --- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-io-read Wed Jan 5 11:50:57 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c Wed Jan 5 14:19:57 2005 @@ -93,6 +93,7 @@ static int mwrite(unsigned long, void *, static int handle_fault(struct pt_regs *); static void byterev(unsigned char *, int); static void memex(void); +static void iomemex(void); static int bsesc(void); static void dump(void); static void prdump(unsigned long, long); @@ -175,6 +176,7 @@ Commands:\n\ di dump instructions\n\ df dump float values\n\ dd dump double values\n\ + i IO memory dump\n\ e print exception information\n\ f flush cache\n\ la lookup symbol+offset of specified address\n\ @@ -794,6 +796,9 @@ cmds(struct pt_regs *excp) memex(); } break; + case 'i': + iomemex(); + break; case 'd': dump(); break; @@ -1855,6 +1860,130 @@ memex(void) } adrs += inc; } +} + +static char *iomemex_help_string = + "IO Memory examine command usage:\n" + "i addr [size] [options]\n" + " 
size may include chars from this set:\n" + " 1 examine byte (default)\n" + " 2 examine short (2 byte)\n" + " 4 examine int (4 byte)\n" + " 8 examine long (8 byte)\n" + " options may include chars from this set:\n" + " l little endian (default)\n" + " b big endian\n" + " a absolute address - does not add on pci_io_base\n" + "NOTE: Defaults to adding on pci_io_base\n" + ""; + + +#define LE 0 +#define BE 1 + +static void +ioread(unsigned long addr, int size, int endiness) +{ + int i; + long data; + + if (setjmp(bus_error_jmp) == 0) { + catch_memory_errors = 1; + sync(); + switch (size) { + case 1: + data = in_8((char *)addr); + sync(); + __delay(200); + printf("%.16lx: 0x%.2x\n", addr, data); + break; + + case 2: + data = endiness ? in_be16((short *)addr) : in_le16((short *)addr); + sync(); + __delay(200); + printf("%.16lx: 0x%.4x\n", addr, data); + break; + case 4: + data = endiness ? in_be32((int *)addr) : in_le32((int *)addr); + sync(); + __delay(200); + printf("%.16lx: 0x%.8x\n", addr, data); + break; + case 8: + data = endiness ? 
in_be64((long *)addr) : in_le64((long *)addr); + sync(); + __delay(200); + printf("%.16lx: 0x%.16x\n", addr, data); + break; + default: + printf("ioread: invalid size (%d)\n", size); + } + } else { + printf("%.16lx: ", addr); + for (i = 0; i < size; i++) + printf("%s", fault_chars[fault_type]); + printf("\n"); + } + + catch_memory_errors = 0; + +} + +static void +iomemex(void) +{ + int size = 1; + int cmd; + int endiness = LE; + int absolute = 0; + + scanhex((void *)&adrs); + cmd = skipbl(); + if (cmd == '?') { + printf(iomemex_help_string); + return; + } else if (cmd == '\n' && !adrs) { + printf("pci_io_base: 0x%lx\n", pci_io_base); + return; + } + + termch = cmd; + + while ((cmd = skipbl()) != '\n') { + switch (cmd) { + case '1': size = 1; break; + case '2': size = 2; break; + case '4': size = 4; break; + case '8': size = 8; break; + case 'l': endiness = LE; break; + case 'b': endiness = BE; break; + case 'a': absolute = 1; break; + } + } + + if(size <= 0) + size = 1; + else if(size > 8) + size = 8; + + if (!absolute) + adrs += pci_io_base; + + printf("Will attempt to read:\n"); + printf("address:\t0x%lx\n", adrs); + printf("size:\t\t0x%lx\n", size); + printf("endiness:\t%s\n", endiness ? 
"Big" : "Little"); + printf("Are you sure (Y/n): "); + fflush(stdout); + flush_input(); + + cmd = skipbl(); + + if (cmd == 'n') + return; + + ioread(adrs, size, endiness); } int diff -puN arch/ppc64/mm/fault.c~xmon-io-read arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~xmon-io-read Wed Jan 5 13:27:30 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c Wed Jan 5 13:27:38 2005 @@ -297,6 +297,9 @@ void bad_page_fault(struct pt_regs *regs return; } + if (debugger_fault_handler(regs)) + return; + /* kernel has accessed a bad area */ die("Kernel access of bad area", regs, sig); } _ From olof at austin.ibm.com Thu Jan 6 11:07:21 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 5 Jan 2005 18:07:21 -0600 Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c Message-ID: <20050106000721.GA20029@austin.ibm.com> Hi, This patch renames pci_dma_direct.c to pci_direct_iommu.c to comply to the naming convention of the other iommu files. This is part of the iommu cleanup, but broken out as a separate patch since for mainline, a BK rename is more appropriate. 
Still, we need a patch to apply for non-BK-based trees (-mm) Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c | 89 ++++++++++++++++++++ linux-2.5/arch/ppc64/kernel/pci_dma_direct.c | 89 -------------------- 2 files changed, 89 insertions(+), 89 deletions(-) diff -L arch/ppc64/kernel/pci_dma_direct.c -puN arch/ppc64/kernel/pci_dma_direct.c~iommu-rename-pci_dma_direct /dev/null --- linux-2.5/arch/ppc64/kernel/pci_dma_direct.c +++ /dev/null 2004-12-07 13:25:26.079467688 -0600 @@ -1,89 +0,0 @@ -/* - * Support for DMA from PCI devices to main memory on - * machines without an iommu or with directly addressable - * RAM (typically a pmac with 2Gb of RAM or less) - * - * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ - -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include - -#include "pci.h" - -static void *pci_direct_alloc_consistent(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle) -{ - void *ret; - - ret = (void *)__get_free_pages(GFP_ATOMIC, get_order(size)); - if (ret != NULL) { - memset(ret, 0, size); - *dma_handle = virt_to_abs(ret); - } - return ret; -} - -static void pci_direct_free_consistent(struct pci_dev *hwdev, size_t size, - void *vaddr, dma_addr_t dma_handle) -{ - free_pages((unsigned long)vaddr, get_order(size)); -} - -static dma_addr_t pci_direct_map_single(struct pci_dev *hwdev, void *ptr, - size_t size, enum dma_data_direction direction) -{ - return virt_to_abs(ptr); -} - -static void pci_direct_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, - size_t size, enum dma_data_direction direction) -{ -} - -static int pci_direct_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, enum dma_data_direction direction) -{ - int i; - - for (i = 0; i < nents; i++, sg++) { - sg->dma_address = page_to_phys(sg->page) + sg->offset; - sg->dma_length = sg->length; - } - - return nents; -} - -static void pci_direct_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, enum dma_data_direction direction) -{ -} - -void __init pci_dma_init_direct(void) -{ - pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent; - pci_dma_ops.pci_free_consistent = pci_direct_free_consistent; - pci_dma_ops.pci_map_single = pci_direct_map_single; - pci_dma_ops.pci_unmap_single = pci_direct_unmap_single; - pci_dma_ops.pci_map_sg = pci_direct_map_sg; - pci_dma_ops.pci_unmap_sg = pci_direct_unmap_sg; -} diff -puN /dev/null arch/ppc64/kernel/pci_direct_iommu.c --- /dev/null 2004-12-07 13:25:26.079467688 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c 2004-12-07 16:17:31.549078536 -0600 @@ -0,0 +1,89 @@ +/* + * Support for DMA from 
PCI devices to main memory on + * machines without an iommu or with directly addressable + * RAM (typically a pmac with 2Gb of RAM or less) + * + * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "pci.h" + +static void *pci_direct_alloc_consistent(struct pci_dev *hwdev, size_t size, + dma_addr_t *dma_handle) +{ + void *ret; + + ret = (void *)__get_free_pages(GFP_ATOMIC, get_order(size)); + if (ret != NULL) { + memset(ret, 0, size); + *dma_handle = virt_to_abs(ret); + } + return ret; +} + +static void pci_direct_free_consistent(struct pci_dev *hwdev, size_t size, + void *vaddr, dma_addr_t dma_handle) +{ + free_pages((unsigned long)vaddr, get_order(size)); +} + +static dma_addr_t pci_direct_map_single(struct pci_dev *hwdev, void *ptr, + size_t size, enum dma_data_direction direction) +{ + return virt_to_abs(ptr); +} + +static void pci_direct_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction direction) +{ +} + +static int pci_direct_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ + int i; + + for (i = 0; i < nents; i++, sg++) { + sg->dma_address = page_to_phys(sg->page) + sg->offset; + sg->dma_length = sg->length; + } + + return nents; +} + +static void pci_direct_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ +} + +void __init pci_dma_init_direct(void) +{ + pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent; + pci_dma_ops.pci_free_consistent = 
pci_direct_free_consistent; + pci_dma_ops.pci_map_single = pci_direct_map_single; + pci_dma_ops.pci_unmap_single = pci_direct_unmap_single; + pci_dma_ops.pci_map_sg = pci_direct_map_sg; + pci_dma_ops.pci_unmap_sg = pci_direct_unmap_sg; +} _ From olof at austin.ibm.com Thu Jan 6 11:07:35 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 5 Jan 2005 18:07:35 -0600 Subject: [PATCH] [PPC64] [2/2] IOMMU cleanups: Main cleanup patch Message-ID: <20050106000735.GA20079@austin.ibm.com> Hi, Earlier cleanup efforts of the ppc64 IOMMU code have mostly been targeted at simplifying the allocation schemes and modularising things for the various platforms. The IOMMU init functions are still a mess. This is an attempt to clean them up and make them somewhat easier to follow. The new rules are: 1. iommu_init_early_ is called before any PCI/VIO init is done 2. The pcibios fixup routines will call the iommu_{bus,dev}_setup functions appropriately as devices are added. TCE space allocation has changed somewhat: * On LPARs, nothing is really different. ibm,dma-window properties are still used to determine table sizes. * On pSeries SMP-mode (non-LPAR), the full TCE space per PHB is split up in 256MB chunks, each handed out to one child bus/slot as needed. This makes current max 7 child buses per PHB, something we're currently below on all machine models I'm aware of. * Exception to the above: Pre-POWER4 machines with Python PHBs have a full GB of DMA space allocated at the PHB level, since there are no EADS-level tables on such systems. * PowerMac and Maple still work like before: all buses/slots share one table. * VIO works like before, ibm,my-dma-window is used like before. * iSeries has not been touched much at all, besides the changed unit of the it_size variable in struct iommu_table. 
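The non-LPAR pSeries arithmetic above can be sketched as follows. This is a minimal illustration, assuming 4KB pages, a 2GB DMA space per PHB, 256MB windows and a skipped first 256MB (the constants used in the pSeries_iommu.c hunk of this patch). All sketch_* names are hypothetical, and where the sketch returns -1 the kernel code panics instead.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12              /* assumption: 4KB pages */
#define SKETCH_WINDOW_SIZE  (1ULL << 28)    /* 256MB per child bus/slot */
#define SKETCH_DMA_LIMIT    (1ULL << 31)    /* 2GB of DMA space per PHB */

/* Hypothetical per-PHB bookkeeping, modelled on dma_window_base_cur. */
struct sketch_phb {
    uint64_t window_size;
    uint64_t base_cur;
};

static void sketch_phb_init(struct sketch_phb *phb)
{
    phb->window_size = SKETCH_WINDOW_SIZE;
    phb->base_cur = SKETCH_WINDOW_SIZE;     /* first 256MB is always skipped */
}

/*
 * Hands out the next window. Returns 0 and fills in the table offset and
 * size (both in TCE entries) on success, or -1 when the PHB is full.
 */
static int sketch_window_alloc(struct sketch_phb *phb,
                               uint64_t *it_offset, uint64_t *it_size)
{
    if (phb->base_cur + phb->window_size > SKETCH_DMA_LIMIT)
        return -1;
    *it_offset = phb->base_cur >> SKETCH_PAGE_SHIFT;
    *it_size = phb->window_size >> SKETCH_PAGE_SHIFT;
    phb->base_cur += phb->window_size;
    return 0;
}

/* Counts how many child windows fit under one PHB. */
int sketch_max_child_windows(void)
{
    struct sketch_phb phb;
    uint64_t off, size;
    int n = 0;

    sketch_phb_init(&phb);
    while (sketch_window_alloc(&phb, &off, &size) == 0)
        n++;
    assert(n > 0);
    return n;
}
```

Under these assumptions eight 256MB chunks fit in 2GB and the first is skipped, which leaves the 7 child buses per PHB mentioned above.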
Other things changed: * Powermac and maple PCI/IOMMU inits have been changed a bit to conform to the new init structure * pci_dma_direct.c has been renamed pci_direct_iommu.c to match pci_iommu.c (see separate patch) * Likewise, a couple of the pci direct init functions have been renamed. Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/kernel/Makefile | 2 linux-2.5-olof/arch/ppc64/kernel/iSeries_iommu.c | 11 linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c | 3 linux-2.5-olof/arch/ppc64/kernel/iommu.c | 21 - linux-2.5-olof/arch/ppc64/kernel/maple_pci.c | 3 linux-2.5-olof/arch/ppc64/kernel/maple_setup.c | 7 linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c | 283 ++++++++++---------- linux-2.5-olof/arch/ppc64/kernel/pSeries_pci.c | 5 linux-2.5-olof/arch/ppc64/kernel/pSeries_setup.c | 5 linux-2.5-olof/arch/ppc64/kernel/pci.c | 5 linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c | 2 linux-2.5-olof/arch/ppc64/kernel/pmac_pci.c | 2 linux-2.5-olof/arch/ppc64/kernel/pmac_setup.c | 7 linux-2.5-olof/arch/ppc64/kernel/prom.c | 11 linux-2.5-olof/arch/ppc64/kernel/u3_iommu.c | 104 ++++--- linux-2.5-olof/arch/ppc64/kernel/vio.c | 18 - linux-2.5-olof/drivers/pci/hotplug/rpaphp_pci.c | 4 linux-2.5-olof/include/asm-ppc64/iommu.h | 13 linux-2.5-olof/include/asm-ppc64/machdep.h | 2 linux-2.5-olof/include/asm-ppc64/pci-bridge.h | 8 20 files changed, 265 insertions(+), 251 deletions(-) diff -puN arch/ppc64/kernel/pci.c~iommu-cleanup arch/ppc64/kernel/pci.c --- linux-2.5/arch/ppc64/kernel/pci.c~iommu-cleanup 2005-01-05 16:59:18.108168880 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pci.c 2005-01-05 16:59:18.235149576 -0600 @@ -845,6 +845,11 @@ void __devinit pcibios_fixup_bus(struct pcibios_fixup_device_resources(dev, bus); } + ppc_md.iommu_bus_setup(bus); + + list_for_each_entry(dev, &bus->devices, bus_list) + ppc_md.iommu_dev_setup(dev); + if (!pci_probe_only) return; diff -puN include/asm-ppc64/machdep.h~iommu-cleanup include/asm-ppc64/machdep.h --- 
linux-2.5/include/asm-ppc64/machdep.h~iommu-cleanup 2005-01-05 16:59:18.112168272 -0600 +++ linux-2.5-olof/include/asm-ppc64/machdep.h 2005-01-05 16:59:18.236149424 -0600 @@ -70,6 +70,8 @@ struct machdep_calls { long index, long npages); void (*tce_flush)(struct iommu_table *tbl); + void (*iommu_dev_setup)(struct pci_dev *dev); + void (*iommu_bus_setup)(struct pci_bus *bus); int (*probe)(int platform); void (*setup_arch)(void); diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup arch/ppc64/kernel/pSeries_iommu.c --- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup 2005-01-05 16:59:18.141163864 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c 2005-01-05 17:08:34.411597904 -0600 @@ -46,6 +46,9 @@ #include #include "pci.h" +#define DBG(fmt...) + +extern int is_python(struct device_node *); static void tce_build_pSeries(struct iommu_table *tbl, long index, long npages, unsigned long uaddr, @@ -121,7 +124,7 @@ static void tce_build_pSeriesLP(struct i } } -DEFINE_PER_CPU(void *, tce_page) = NULL; +static DEFINE_PER_CPU(void *, tce_page) = NULL; static void tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages, unsigned long uaddr, @@ -233,85 +236,6 @@ static void tce_freemulti_pSeriesLP(stru } } - -static void iommu_buses_init(void) -{ - struct pci_controller *phb, *tmp; - struct device_node *dn, *first_dn; - int num_slots, num_slots_ilog2; - int first_phb = 1; - unsigned long tcetable_ilog2; - - /* - * We default to a TCE table that maps 2GB (4MB table, 22 bits), - * however some machines have a 3GB IO hole and for these we - * create a table that maps 1GB (2MB table, 21 bits) - */ - if (io_hole_start < 0x80000000UL) - tcetable_ilog2 = 21; - else - tcetable_ilog2 = 22; - - /* XXX Should we be using pci_root_buses instead? 
-ojn - */ - - list_for_each_entry_safe(phb, tmp, &hose_list, list_node) { - first_dn = ((struct device_node *)phb->arch_data)->child; - - /* Carve 2GB into the largest dma_window_size possible */ - for (dn = first_dn, num_slots = 0; dn != NULL; dn = dn->sibling) - num_slots++; - num_slots_ilog2 = __ilog2(num_slots); - - if ((1<dma_window_size = 1 << (tcetable_ilog2 - num_slots_ilog2); - - /* Reserve 16MB of DMA space on the first PHB. - * We should probably be more careful and use firmware props. - * In reality this space is remapped, not lost. But we don't - * want to get that smart to handle it -- too much work. - */ - phb->dma_window_base_cur = first_phb ? (1 << 12) : 0; - first_phb = 0; - - for (dn = first_dn; dn != NULL; dn = dn->sibling) - iommu_devnode_init_pSeries(dn); - } -} - - -static void iommu_buses_init_lpar(struct list_head *bus_list) -{ - struct list_head *ln; - struct pci_bus *bus; - struct device_node *busdn; - unsigned int *dma_window; - - for (ln=bus_list->next; ln != bus_list; ln=ln->next) { - bus = pci_bus_b(ln); - - if (bus->self) - busdn = pci_device_to_OF_node(bus->self); - else - busdn = bus->sysdata; /* must be a phb */ - - dma_window = (unsigned int *)get_property(busdn, "ibm,dma-window", NULL); - if (dma_window) { - /* Bussubno hasn't been copied yet. - * Do it now because iommu_table_setparms_lpar needs it. - */ - busdn->bussubno = bus->number; - iommu_devnode_init_pSeries(busdn); - } - - /* look for a window on a bridge even if the PHB had one */ - iommu_buses_init_lpar(&bus->children); - } -} - - static void iommu_table_setparms(struct pci_controller *phb, struct device_node *dn, struct iommu_table *tbl) @@ -336,27 +260,18 @@ static void iommu_table_setparms(struct tbl->it_busno = phb->bus->number; /* Units of tce entries */ - tbl->it_offset = phb->dma_window_base_cur; - - /* Adjust the current table offset to the next - * region. Measured in TCE entries. Force an - * alignment to the size allotted per IOA. 
This - * makes it easier to remove the 1st 16MB. - */ - phb->dma_window_base_cur += (phb->dma_window_size>>3); - phb->dma_window_base_cur &= - ~((phb->dma_window_size>>3)-1); - - /* Set the tce table size - measured in pages */ - tbl->it_size = ((phb->dma_window_base_cur - - tbl->it_offset) << 3) >> PAGE_SHIFT; + tbl->it_offset = phb->dma_window_base_cur >> PAGE_SHIFT; /* Test if we are going over 2GB of DMA space */ - if (phb->dma_window_base_cur > (1 << 19)) + if (phb->dma_window_base_cur + phb->dma_window_size > (1L << 31)) panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n"); + phb->dma_window_base_cur += phb->dma_window_size; + + /* Set the tce table size - measured in entries */ + tbl->it_size = phb->dma_window_size >> PAGE_SHIFT; + tbl->it_index = 0; - tbl->it_entrysize = sizeof(union tce_entry); tbl->it_blocksize = 16; tbl->it_type = TCE_PCI; } @@ -375,82 +290,174 @@ static void iommu_table_setparms(struct */ static void iommu_table_setparms_lpar(struct pci_controller *phb, struct device_node *dn, - struct iommu_table *tbl) + struct iommu_table *tbl, + unsigned int *dma_window) { - unsigned int *dma_window; - - dma_window = (unsigned int *)get_property(dn, "ibm,dma-window", NULL); - if (!dma_window) panic("iommu_table_setparms_lpar: device %s has no" " ibm,dma-window property!\n", dn->full_name); tbl->it_busno = dn->bussubno; - tbl->it_size = (((((unsigned long)dma_window[4] << 32) | - (unsigned long)dma_window[5]) >> PAGE_SHIFT) << 3) >> PAGE_SHIFT; - tbl->it_offset = ((((unsigned long)dma_window[2] << 32) | - (unsigned long)dma_window[3]) >> 12); + + /* TODO: Parse field size properties properly. 
*/ + tbl->it_size = (((unsigned long)dma_window[4] << 32) | + (unsigned long)dma_window[5]) >> PAGE_SHIFT; + tbl->it_offset = (((unsigned long)dma_window[2] << 32) | + (unsigned long)dma_window[3]) >> PAGE_SHIFT; tbl->it_base = 0; tbl->it_index = dma_window[0]; - tbl->it_entrysize = sizeof(union tce_entry); tbl->it_blocksize = 16; tbl->it_type = TCE_PCI; } +static void iommu_bus_setup_pSeries(struct pci_bus *bus) +{ + struct device_node *dn, *pdn; + + DBG("iommu_bus_setup_pSeries, bus %p, bus->self %p\n", bus, bus->self); + + /* For each (root) bus, we carve up the available DMA space in 256MB + * pieces. Since each piece is used by one (sub) bus/device, that would + * give a maximum of 7 devices per PHB. In most cases, this is plenty. + * + * The exception is on Python PHBs (pre-POWER4). Here we don't have EADS + * bridges below the PHB to allocate the sectioned tables to, so instead + * we allocate a 1GB table at the PHB level. + */ + + dn = pci_bus_to_OF_node(bus); + + if (!bus->self) { + /* Root bus */ + if (is_python(dn)) { + struct iommu_table *tbl; + + DBG("Python root bus %s\n", bus->name); + + /* 1GB window by default */ + dn->phb->dma_window_size = 1 << 30; + dn->phb->dma_window_base_cur = 0; + + tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL); + + iommu_table_setparms(dn->phb, dn, tbl); + dn->iommu_table = iommu_init_table(tbl); + } else { + /* 256 MB window by default */ + dn->phb->dma_window_size = 1 << 28; + /* always skip the first 256MB */ + dn->phb->dma_window_base_cur = 1 << 28; + + /* No table at PHB level for non-python PHBs */ + } + } else { + pdn = pci_bus_to_OF_node(bus->parent); + + if (!pdn->iommu_table) { + struct iommu_table *tbl; + /* First child, allocate new table (256MB window) */ + + tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL); + + iommu_table_setparms(dn->phb, dn, tbl); + + dn->iommu_table = iommu_init_table(tbl); + } else { + /* Lower than first child or under python, copy parent table */ + dn->iommu_table = 
pdn->iommu_table; + } + } +} + -void iommu_devnode_init_pSeries(struct device_node *dn) +static void iommu_bus_setup_pSeriesLP(struct pci_bus *bus) { struct iommu_table *tbl; + struct device_node *dn, *pdn; + unsigned int *dma_window = NULL; - tbl = (struct iommu_table *)kmalloc(sizeof(struct iommu_table), - GFP_KERNEL); - - if (systemcfg->platform == PLATFORM_PSERIES_LPAR) - iommu_table_setparms_lpar(dn->phb, dn, tbl); - else - iommu_table_setparms(dn->phb, dn, tbl); + dn = pci_bus_to_OF_node(bus); + + /* Find nearest ibm,dma-window, walking up the device tree */ + for (pdn = dn; pdn != NULL; pdn = pdn->parent) { + dma_window = (unsigned int *)get_property(pdn, "ibm,dma-window", NULL); + if (dma_window != NULL) + break; + } + + WARN_ON(dma_window == NULL); + + if (!pdn->iommu_table) { + /* Bussubno hasn't been copied yet. + * Do it now because iommu_table_setparms_lpar needs it. + */ + pdn->bussubno = bus->number; + + tbl = (struct iommu_table *)kmalloc(sizeof(struct iommu_table), + GFP_KERNEL); - dn->iommu_table = iommu_init_table(tbl); + iommu_table_setparms_lpar(pdn->phb, pdn, tbl, dma_window); + + pdn->iommu_table = iommu_init_table(tbl); + } + + if (pdn != dn) + dn->iommu_table = pdn->iommu_table; } -void iommu_setup_pSeries(void) + +static void iommu_dev_setup_pSeries(struct pci_dev *dev) { - struct pci_dev *dev = NULL; struct device_node *dn, *mydn; - if (systemcfg->platform == PLATFORM_PSERIES_LPAR) - iommu_buses_init_lpar(&pci_root_buses); - else - iommu_buses_init(); - - /* Now copy the iommu_table ptr from the bus devices down to every + DBG("iommu_dev_setup_pSeries, dev %p (%s)\n", dev, dev->pretty_name); + /* Now copy the iommu_table ptr from the bus device down to the * pci device_node. This means get_iommu_table() won't need to search * up the device tree to find it. 
*/ - for_each_pci_dev(dev) { - mydn = dn = pci_device_to_OF_node(dev); + mydn = dn = pci_device_to_OF_node(dev); - while (dn && dn->iommu_table == NULL) - dn = dn->parent; - if (dn) - mydn->iommu_table = dn->iommu_table; - } + while (dn && dn->iommu_table == NULL) + dn = dn->parent; + + WARN_ON(!dn); + + if (dn) + mydn->iommu_table = dn->iommu_table; } +static void iommu_bus_setup_null(struct pci_bus *b) { } +static void iommu_dev_setup_null(struct pci_dev *d) { } + /* These are called very early. */ -void tce_init_pSeries(void) +void iommu_init_early_pSeries(void) { - if (!(systemcfg->platform & PLATFORM_LPAR)) { + if (of_chosen && get_property(of_chosen, "linux,iommu-off", NULL)) { + /* Direct I/O, IOMMU off */ + ppc_md.iommu_dev_setup = iommu_dev_setup_null; + ppc_md.iommu_bus_setup = iommu_bus_setup_null; + pci_direct_iommu_init(); + + return; + } + + if (systemcfg->platform & PLATFORM_LPAR) { + if (cur_cpu_spec->firmware_features & FW_FEATURE_MULTITCE) { + ppc_md.tce_build = tce_buildmulti_pSeriesLP; + ppc_md.tce_free = tce_freemulti_pSeriesLP; + } else { + ppc_md.tce_build = tce_build_pSeriesLP; + ppc_md.tce_free = tce_free_pSeriesLP; + } + ppc_md.iommu_bus_setup = iommu_bus_setup_pSeriesLP; + } else { ppc_md.tce_build = tce_build_pSeries; ppc_md.tce_free = tce_free_pSeries; - } else if (cur_cpu_spec->firmware_features & FW_FEATURE_MULTITCE) { - ppc_md.tce_build = tce_buildmulti_pSeriesLP; - ppc_md.tce_free = tce_freemulti_pSeriesLP; - } else { - ppc_md.tce_build = tce_build_pSeriesLP; - ppc_md.tce_free = tce_free_pSeriesLP; + ppc_md.iommu_bus_setup = iommu_bus_setup_pSeries; } + ppc_md.iommu_dev_setup = iommu_dev_setup_pSeries; + pci_iommu_init(); } diff -puN arch/ppc64/kernel/u3_iommu.c~iommu-cleanup arch/ppc64/kernel/u3_iommu.c --- linux-2.5/arch/ppc64/kernel/u3_iommu.c~iommu-cleanup 2005-01-05 16:59:18.145163256 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/u3_iommu.c 2005-01-05 16:59:18.242148512 -0600 @@ -91,6 +91,7 @@ static unsigned int *dart; static 
unsigned int dart_emptyval; static struct iommu_table iommu_table_u3; +static int iommu_table_u3_inited; static int dart_dirty; #define DBG(...) @@ -192,7 +193,6 @@ static int dart_init(struct device_node unsigned int regword; unsigned int i; unsigned long tmp; - struct page *p; if (dart_tablebase == 0 || dart_tablesize == 0) { printk(KERN_INFO "U3-DART: table not allocated, using direct DMA\n"); @@ -209,16 +209,15 @@ static int dart_init(struct device_node * that to work around what looks like a problem with the HT bridge * prefetching into invalid pages and corrupting data */ - tmp = __get_free_pages(GFP_ATOMIC, 1); - if (tmp == 0) - panic("U3-DART: Cannot allocate spare page !"); - dart_emptyval = DARTMAP_VALID | - ((virt_to_abs(tmp) >> PAGE_SHIFT) & DARTMAP_RPNMASK); + tmp = lmb_alloc(PAGE_SIZE, PAGE_SIZE); + if (!tmp) + panic("U3-DART: Cannot allocate spare page!"); + dart_emptyval = DARTMAP_VALID | ((tmp >> PAGE_SHIFT) & DARTMAP_RPNMASK); /* Map in DART registers. FIXME: Use device node to get base address */ dart = ioremap(DART_BASE, 0x7000); if (dart == NULL) - panic("U3-DART: Cannot map registers !"); + panic("U3-DART: Cannot map registers!"); /* Set initial control register contents: table base, * table size and enable bit @@ -227,7 +226,6 @@ static int dart_init(struct device_node ((dart_tablebase >> PAGE_SHIFT) << DARTCNTL_BASE_SHIFT) | (((dart_tablesize >> PAGE_SHIFT) & DARTCNTL_SIZE_MASK) << DARTCNTL_SIZE_SHIFT); - p = virt_to_page(dart_tablebase); dart_vbase = ioremap(virt_to_abs(dart_tablebase), dart_tablesize); /* Fill initial table */ @@ -240,35 +238,67 @@ static int dart_init(struct device_node /* Invalidate DART to get rid of possible stale TLBs */ dart_tlb_invalidate_all(); + printk(KERN_INFO "U3/CPC925 DART IOMMU initialized\n"); + + return 0; +} + +static void iommu_table_u3_setup(void) +{ iommu_table_u3.it_busno = 0; - - /* Units of tce entries */ iommu_table_u3.it_offset = 0; - - /* Set the tce table size - measured in pages */ - 
iommu_table_u3.it_size = dart_tablesize >> PAGE_SHIFT; + /* it_size is in number of entries */ + iommu_table_u3.it_size = dart_tablesize / sizeof(u32); /* Initialize the common IOMMU code */ iommu_table_u3.it_base = (unsigned long)dart_vbase; iommu_table_u3.it_index = 0; iommu_table_u3.it_blocksize = 1; - iommu_table_u3.it_entrysize = sizeof(u32); iommu_init_table(&iommu_table_u3); /* Reserve the last page of the DART to avoid possible prefetch * past the DART mapped area */ - set_bit(iommu_table_u3.it_mapsize - 1, iommu_table_u3.it_map); + set_bit(iommu_table_u3.it_size - 1, iommu_table_u3.it_map); +} - printk(KERN_INFO "U3/CPC925 DART IOMMU initialized\n"); +static void iommu_dev_setup_u3(struct pci_dev *dev) +{ + struct device_node *dn; - return 0; + /* We only have one iommu table on the mac for now, which makes + * things simple. Setup all PCI devices to point to this table + * + * We must use pci_device_to_OF_node() to make sure that + * we get the real "final" pointer to the device in the + * pci_dev sysdata and not the temporary PHB one + */ + dn = pci_device_to_OF_node(dev); + + if (dn) + dn->iommu_table = &iommu_table_u3; +} + +static void iommu_bus_setup_u3(struct pci_bus *bus) +{ + struct device_node *dn; + + if (!iommu_table_u3_inited) { + iommu_table_u3_inited = 1; + iommu_table_u3_setup(); + } + + dn = pci_bus_to_OF_node(bus); + + if (dn) + dn->iommu_table = &iommu_table_u3; } -void iommu_setup_u3(void) +static void iommu_dev_setup_null(struct pci_dev *dev) { } +static void iommu_bus_setup_null(struct pci_bus *bus) { } + +void iommu_init_early_u3(void) { - struct pci_controller *phb, *tmp; - struct pci_dev *dev = NULL; struct device_node *dn; /* Find the DART in the device-tree */ @@ -282,31 +312,23 @@ void iommu_setup_u3(void) ppc_md.tce_flush = dart_flush; /* Initialize the DART HW */ - if (dart_init(dn)) - return; + if (dart_init(dn)) { + /* If init failed, use direct iommu and null setup functions */ + ppc_md.iommu_dev_setup = 
iommu_dev_setup_null; + ppc_md.iommu_bus_setup = iommu_bus_setup_null; + + /* Setup pci_dma ops */ + pci_direct_iommu_init(); + } else { + ppc_md.iommu_dev_setup = iommu_dev_setup_u3; + ppc_md.iommu_bus_setup = iommu_bus_setup_u3; - /* Setup pci_dma ops */ - pci_iommu_init(); - - /* We only have one iommu table on the mac for now, which makes - * things simple. Setup all PCI devices to point to this table - */ - for_each_pci_dev(dev) { - /* We must use pci_device_to_OF_node() to make sure that - * we get the real "final" pointer to the device in the - * pci_dev sysdata and not the temporary PHB one - */ - struct device_node *dn = pci_device_to_OF_node(dev); - if (dn) - dn->iommu_table = &iommu_table_u3; - } - /* We also make sure we set all PHBs ... */ - list_for_each_entry_safe(phb, tmp, &hose_list, list_node) { - dn = (struct device_node *)phb->arch_data; - dn->iommu_table = &iommu_table_u3; + /* Setup pci_dma ops */ + pci_iommu_init(); } } + void __init alloc_u3_dart_table(void) { /* Only reserve DART space if machine has more than 2GB of RAM diff -puN arch/ppc64/kernel/iSeries_iommu.c~iommu-cleanup arch/ppc64/kernel/iSeries_iommu.c --- linux-2.5/arch/ppc64/kernel/iSeries_iommu.c~iommu-cleanup 2005-01-05 16:59:18.149162648 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/iSeries_iommu.c 2005-01-05 16:59:18.243148360 -0600 @@ -132,11 +132,11 @@ static void iommu_table_getparms(struct if (parms->itc_size == 0) panic("PCI_DMA: parms->size is zero, parms is 0x%p", parms); - tbl->it_size = parms->itc_size; + /* itc_size is in pages worth of table, it_size is in # of entries */ + tbl->it_size = (parms->itc_size * PAGE_SIZE) / sizeof(union tce_entry); tbl->it_busno = parms->itc_busno; tbl->it_offset = parms->itc_offset; tbl->it_index = parms->itc_index; - tbl->it_entrysize = sizeof(union tce_entry); tbl->it_blocksize = 1; tbl->it_type = TCE_PCI; @@ -160,11 +160,16 @@ void iommu_devnode_init_iSeries(struct i kfree(tbl); } +static void iommu_dev_setup_iSeries(struct pci_dev 
*dev) { } +static void iommu_bus_setup_iSeries(struct pci_bus *bus) { } -void tce_init_iSeries(void) +void iommu_init_early_iSeries(void) { ppc_md.tce_build = tce_build_iSeries; ppc_md.tce_free = tce_free_iSeries; + ppc_md.iommu_dev_setup = iommu_dev_setup_iSeries; + ppc_md.iommu_bus_setup = iommu_bus_setup_iSeries; + pci_iommu_init(); } diff -puN drivers/pci/hotplug/rpaphp_pci.c~iommu-cleanup drivers/pci/hotplug/rpaphp_pci.c --- linux-2.5/drivers/pci/hotplug/rpaphp_pci.c~iommu-cleanup 2005-01-05 16:59:18.154161888 -0600 +++ linux-2.5-olof/drivers/pci/hotplug/rpaphp_pci.c 2005-01-05 16:59:18.245148056 -0600 @@ -25,6 +25,7 @@ #include #include #include +#include #include "../pci.h" /* for pci_add_new_bus */ #include "rpaphp.h" @@ -168,6 +169,9 @@ rpaphp_fixup_new_pci_devices(struct pci_ if (list_empty(&dev->global_list)) { int i; + /* Need to setup IOMMU tables */ + ppc_md.iommu_dev_setup(dev); + if(fix_bus) pcibios_fixup_device_resources(dev, bus); pci_read_irq_line(dev); diff -puN arch/ppc64/kernel/pSeries_pci.c~iommu-cleanup arch/ppc64/kernel/pSeries_pci.c --- linux-2.5/arch/ppc64/kernel/pSeries_pci.c~iommu-cleanup 2005-01-05 16:59:18.158161280 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_pci.c 2005-01-05 16:59:18.246147904 -0600 @@ -148,7 +148,7 @@ struct pci_ops rtas_pci_ops = { rtas_pci_write_config }; -static int is_python(struct device_node *dev) +int is_python(struct device_node *dev) { char *model = (char *)get_property(dev, "model", NULL); @@ -554,9 +554,6 @@ void __init pSeries_final_fixup(void) pSeries_request_regions(); pci_fix_bus_sysdata(); - if (!of_chosen || !get_property(of_chosen, "linux,iommu-off", NULL)) - iommu_setup_pSeries(); - pci_addr_cache_build(); } diff -puN arch/ppc64/kernel/prom.c~iommu-cleanup arch/ppc64/kernel/prom.c --- linux-2.5/arch/ppc64/kernel/prom.c~iommu-cleanup 2005-01-05 16:59:18.162160672 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/prom.c 2005-01-05 16:59:18.249147448 -0600 @@ -1743,17 +1743,6 @@ static int 
of_finish_dynamic_node(struct node->devfn = (regs[0] >> 8) & 0xff; } - /* fixing up iommu_table */ - -#ifdef CONFIG_PPC_PSERIES - if (strcmp(node->name, "pci") == 0 && - get_property(node, "ibm,dma-window", NULL)) { - node->bussubno = node->busno; - iommu_devnode_init_pSeries(node); - } else - node->iommu_table = parent->iommu_table; -#endif /* CONFIG_PPC_PSERIES */ - out: of_node_put(parent); return err; diff -puN include/asm-ppc64/pci-bridge.h~iommu-cleanup include/asm-ppc64/pci-bridge.h --- linux-2.5/include/asm-ppc64/pci-bridge.h~iommu-cleanup 2005-01-05 16:59:18.166160064 -0600 +++ linux-2.5-olof/include/asm-ppc64/pci-bridge.h 2005-01-05 16:59:18.250147296 -0600 @@ -79,6 +79,14 @@ static inline struct device_node *pci_de return fetch_dev_dn(dev); } +static inline struct device_node *pci_bus_to_OF_node(struct pci_bus *bus) +{ + if (bus->self) + return pci_device_to_OF_node(bus->self); + else + return bus->sysdata; /* Must be root bus (PHB) */ +} + extern void pci_process_bridge_OF_ranges(struct pci_controller *hose, struct device_node *dev); diff -puN include/asm-ppc64/iommu.h~iommu-cleanup include/asm-ppc64/iommu.h --- linux-2.5/include/asm-ppc64/iommu.h~iommu-cleanup 2005-01-05 16:59:18.170159456 -0600 +++ linux-2.5-olof/include/asm-ppc64/iommu.h 2005-01-05 16:59:18.252146992 -0600 @@ -69,18 +69,16 @@ union tce_entry { struct iommu_table { unsigned long it_busno; /* Bus number this table belongs to */ - unsigned long it_size; /* Size in pages of iommu table */ + unsigned long it_size; /* Size of iommu table in entries */ unsigned long it_offset; /* Offset into global table */ unsigned long it_base; /* mapped address of tce table */ unsigned long it_index; /* which iommu table this is */ unsigned long it_type; /* type: PCI or Virtual Bus */ - unsigned long it_entrysize; /* Size of an entry in bytes */ unsigned long it_blocksize; /* Entries in each block (cacheline) */ unsigned long it_hint; /* Hint for next alloc */ unsigned long it_largehint; /* Hint for 
large allocs */ unsigned long it_halfpoint; /* Breaking point for small/large allocs */ spinlock_t it_lock; /* Protects it_map */ - unsigned long it_mapsize; /* Size of map in # of entries (bits) */ unsigned long *it_map; /* A simple allocation bitmap for now */ }; @@ -156,14 +154,13 @@ extern dma_addr_t iommu_map_single(struc extern void iommu_unmap_single(struct iommu_table *tbl, dma_addr_t dma_handle, size_t size, enum dma_data_direction direction); -extern void tce_init_pSeries(void); -extern void tce_init_iSeries(void); +extern void iommu_init_early_pSeries(void); +extern void iommu_init_early_iSeries(void); +extern void iommu_init_early_u3(void); extern void pci_iommu_init(void); -extern void pci_dma_init_direct(void); +extern void pci_direct_iommu_init(void); extern void alloc_u3_dart_table(void); -extern int ppc64_iommu_off; - #endif /* _ASM_IOMMU_H */ diff -puN arch/ppc64/kernel/pSeries_setup.c~iommu-cleanup arch/ppc64/kernel/pSeries_setup.c --- linux-2.5/arch/ppc64/kernel/pSeries_setup.c~iommu-cleanup 2005-01-05 16:59:18.175158696 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_setup.c 2005-01-05 16:59:18.253146840 -0600 @@ -375,10 +375,7 @@ static void __init pSeries_init_early(vo } - if (iommu_off) - pci_dma_init_direct(); - else - tce_init_pSeries(); + iommu_init_early_pSeries(); pSeries_discover_pic(); diff -puN arch/ppc64/kernel/iSeries_setup.c~iommu-cleanup arch/ppc64/kernel/iSeries_setup.c --- linux-2.5/arch/ppc64/kernel/iSeries_setup.c~iommu-cleanup 2005-01-05 16:59:18.180157936 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c 2005-01-05 16:59:18.255146536 -0600 @@ -68,7 +68,6 @@ extern void hvlog(char *fmt, ...); /* Function Prototypes */ extern void ppcdbg_initialize(void); -extern void tce_init_iSeries(void); static void build_iSeries_Memory_Map(void); static void setup_iSeries_cache_sizes(void); @@ -344,7 +343,7 @@ static void __init iSeries_parse_cmdline /* * Initialize the DMA/TCE management */ - tce_init_iSeries(); + 
iommu_init_early_iSeries(); /* * Initialize the table which translate Linux physical addresses to diff -puN arch/ppc64/kernel/maple_pci.c~iommu-cleanup arch/ppc64/kernel/maple_pci.c --- linux-2.5/arch/ppc64/kernel/maple_pci.c~iommu-cleanup 2005-01-05 16:59:18.184157328 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/maple_pci.c 2005-01-05 16:59:18.257146232 -0600 @@ -385,9 +385,6 @@ void __init maple_pcibios_fixup(void) /* Fixup the pci_bus sysdata pointers */ pci_fix_bus_sysdata(); - /* Setup the iommu */ - iommu_setup_u3(); - DBG(" <- maple_pcibios_fixup\n"); } diff -puN arch/ppc64/kernel/pmac_pci.c~iommu-cleanup arch/ppc64/kernel/pmac_pci.c --- linux-2.5/arch/ppc64/kernel/pmac_pci.c~iommu-cleanup 2005-01-05 16:59:18.188156720 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pmac_pci.c 2005-01-05 16:59:18.258146080 -0600 @@ -666,8 +666,6 @@ void __init pmac_pcibios_fixup(void) pci_read_irq_line(dev); pci_fix_bus_sysdata(); - - iommu_setup_u3(); } static void __init pmac_fixup_phb_resources(void) diff -puN arch/ppc64/kernel/pmac_setup.c~iommu-cleanup arch/ppc64/kernel/pmac_setup.c --- linux-2.5/arch/ppc64/kernel/pmac_setup.c~iommu-cleanup 2005-01-05 16:59:18.194155808 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pmac_setup.c 2005-01-05 16:59:18.309138328 -0600 @@ -166,11 +166,6 @@ void __init pmac_setup_arch(void) pmac_setup_smp(); #endif - /* Setup the PCI DMA to "direct" by default. 
May be overriden - * by iommu later on - */ - pci_dma_init_direct(); - /* Lookup PCI hosts */ pmac_pci_init(); @@ -317,6 +312,8 @@ void __init pmac_init_early(void) /* Setup interrupt mapping options */ ppc64_interrupt_controller = IC_OPEN_PIC; + iommu_init_early_u3(); + DBG(" <- pmac_init_early\n"); } diff -puN arch/ppc64/kernel/maple_setup.c~iommu-cleanup arch/ppc64/kernel/maple_setup.c --- linux-2.5/arch/ppc64/kernel/maple_setup.c~iommu-cleanup 2005-01-05 16:59:18.199155048 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/maple_setup.c 2005-01-05 16:59:18.309138328 -0600 @@ -111,11 +111,6 @@ void __init maple_setup_arch(void) #ifdef CONFIG_SMP smp_ops = &maple_smp_ops; #endif - /* Setup the PCI DMA to "direct" by default. May be overriden - * by iommu later on - */ - pci_dma_init_direct(); - /* Lookup PCI hosts */ maple_pci_init(); @@ -159,6 +154,8 @@ static void __init maple_init_early(void /* Setup interrupt mapping options */ ppc64_interrupt_controller = IC_OPEN_PIC; + iommu_init_early_u3(); + DBG(" <- maple_init_early\n"); } diff -puN arch/ppc64/kernel/smp.c~iommu-cleanup arch/ppc64/kernel/smp.c diff -puN arch/ppc64/kernel/vio.c~iommu-cleanup arch/ppc64/kernel/vio.c --- linux-2.5/arch/ppc64/kernel/vio.c~iommu-cleanup 2005-01-05 16:59:18.207153832 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/vio.c 2005-01-05 16:59:18.311138024 -0600 @@ -158,6 +158,7 @@ void __init iommu_vio_init(void) struct iommu_table *t; struct iommu_table_cb cb; unsigned long cbp; + unsigned long itc_entries; cb.itc_busno = 255; /* Bus 255 is the virtual bus */ cb.itc_virtbus = 0xff; /* Ask for virtual bus */ @@ -165,12 +166,12 @@ void __init iommu_vio_init(void) cbp = virt_to_abs(&cb); HvCallXm_getTceTableParms(cbp); - veth_iommu_table.it_size = cb.itc_size / 2; + itc_entries = cb.itc_size * PAGE_SIZE / sizeof(union tce_entry); + veth_iommu_table.it_size = itc_entries / 2; veth_iommu_table.it_busno = cb.itc_busno; veth_iommu_table.it_offset = cb.itc_offset; veth_iommu_table.it_index = 
cb.itc_index; veth_iommu_table.it_type = TCE_VB; - veth_iommu_table.it_entrysize = sizeof(union tce_entry); veth_iommu_table.it_blocksize = 1; t = iommu_init_table(&veth_iommu_table); @@ -178,13 +179,12 @@ void __init iommu_vio_init(void) if (!t) printk("Virtual Bus VETH TCE table failed.\n"); - vio_iommu_table.it_size = cb.itc_size - veth_iommu_table.it_size; + vio_iommu_table.it_size = itc_entries - veth_iommu_table.it_size; vio_iommu_table.it_busno = cb.itc_busno; vio_iommu_table.it_offset = cb.itc_offset + - veth_iommu_table.it_size * (PAGE_SIZE/sizeof(union tce_entry)); + veth_iommu_table.it_size; vio_iommu_table.it_index = cb.itc_index; vio_iommu_table.it_type = TCE_VB; - vio_iommu_table.it_entrysize = sizeof(union tce_entry); vio_iommu_table.it_blocksize = 1; t = iommu_init_table(&vio_iommu_table); @@ -511,7 +511,6 @@ static struct iommu_table * vio_build_io unsigned int *dma_window; struct iommu_table *newTceTable; unsigned long offset; - unsigned long size; int dma_window_property_size; dma_window = (unsigned int *) get_property(dev->dev.platform_data, "ibm,my-dma-window", &dma_window_property_size); @@ -521,21 +520,18 @@ static struct iommu_table * vio_build_io newTceTable = (struct iommu_table *) kmalloc(sizeof(struct iommu_table), GFP_KERNEL); - size = ((dma_window[4] >> PAGE_SHIFT) << 3) >> PAGE_SHIFT; - /* There should be some code to extract the phys-encoded offset using prom_n_addr_cells(). 
However, according to a comment on earlier versions, it's always zero, so we don't bother */ offset = dma_window[1] >> PAGE_SHIFT; - /* TCE table size - measured in units of pages of tce table */ - newTceTable->it_size = size; + /* TCE table size - measured in tce entries */ + newTceTable->it_size = dma_window[4] >> PAGE_SHIFT; /* offset for VIO should always be 0 */ newTceTable->it_offset = offset; newTceTable->it_busno = 0; newTceTable->it_index = (unsigned long)dma_window[0]; newTceTable->it_type = TCE_VB; - newTceTable->it_entrysize = sizeof(union tce_entry); return iommu_init_table(newTceTable); } diff -puN arch/ppc64/kernel/iommu.c~iommu-cleanup arch/ppc64/kernel/iommu.c --- linux-2.5/arch/ppc64/kernel/iommu.c~iommu-cleanup 2005-01-05 16:59:18.211153224 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/iommu.c 2005-01-05 16:59:18.312137872 -0600 @@ -87,7 +87,7 @@ static unsigned long iommu_range_alloc(s start = largealloc ? tbl->it_largehint : tbl->it_hint; /* Use only half of the table for small allocs (15 pages or less) */ - limit = largealloc ? tbl->it_mapsize : tbl->it_halfpoint; + limit = largealloc ? tbl->it_size : tbl->it_halfpoint; if (largealloc && start < tbl->it_halfpoint) start = tbl->it_halfpoint; @@ -114,7 +114,7 @@ static unsigned long iommu_range_alloc(s * Second failure, rescan the other half of the table. */ start = (largealloc ^ pass) ? tbl->it_halfpoint : 0; - limit = pass ? tbl->it_mapsize : limit; + limit = pass ? 
tbl->it_size : limit; pass++; goto again; } else { @@ -194,7 +194,7 @@ static void __iommu_free(struct iommu_ta entry = dma_addr >> PAGE_SHIFT; free_entry = entry - tbl->it_offset; - if (((free_entry + npages) > tbl->it_mapsize) || + if (((free_entry + npages) > tbl->it_size) || (entry < tbl->it_offset)) { if (printk_ratelimit()) { printk(KERN_INFO "iommu_free: invalid entry\n"); @@ -202,7 +202,7 @@ static void __iommu_free(struct iommu_ta printk(KERN_INFO "\tdma_addr = 0x%lx\n", (u64)dma_addr); printk(KERN_INFO "\tTable = 0x%lx\n", (u64)tbl); printk(KERN_INFO "\tbus# = 0x%lx\n", (u64)tbl->it_busno); - printk(KERN_INFO "\tmapsize = 0x%lx\n", (u64)tbl->it_mapsize); + printk(KERN_INFO "\tsize = 0x%lx\n", (u64)tbl->it_size); printk(KERN_INFO "\tstartOff = 0x%lx\n", (u64)tbl->it_offset); printk(KERN_INFO "\tindex = 0x%lx\n", (u64)tbl->it_index); WARN_ON(1); @@ -407,14 +407,11 @@ struct iommu_table *iommu_init_table(str unsigned long sz; static int welcomed = 0; - /* it_size is in pages, it_mapsize in number of entries */ - tbl->it_mapsize = (tbl->it_size << PAGE_SHIFT) / tbl->it_entrysize; - /* Set aside 1/4 of the table for large allocations. 
*/ - tbl->it_halfpoint = tbl->it_mapsize * 3 / 4; + tbl->it_halfpoint = tbl->it_size * 3 / 4; /* number of bytes needed for the bitmap */ - sz = (tbl->it_mapsize + 7) >> 3; + sz = (tbl->it_size + 7) >> 3; tbl->it_map = (unsigned long *)__get_free_pages(GFP_ATOMIC, get_order(sz)); if (!tbl->it_map) @@ -448,8 +445,8 @@ void iommu_free_table(struct device_node } /* verify that table contains no entries */ - /* it_mapsize is in entries, and we're examining 64 at a time */ - for (i = 0; i < (tbl->it_mapsize/64); i++) { + /* it_size is in entries, and we're examining 64 at a time */ + for (i = 0; i < (tbl->it_size/64); i++) { if (tbl->it_map[i] != 0) { printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", __FUNCTION__, dn->full_name); @@ -458,7 +455,7 @@ void iommu_free_table(struct device_node } /* calculate bitmap size in bytes */ - bitmap_sz = (tbl->it_mapsize + 7) / 8; + bitmap_sz = (tbl->it_size + 7) / 8; /* free bitmap */ order = get_order(bitmap_sz); diff -puN arch/ppc64/kernel/pci_direct_iommu.c~iommu-cleanup arch/ppc64/kernel/pci_direct_iommu.c --- linux-2.5/arch/ppc64/kernel/pci_direct_iommu.c~iommu-cleanup 2005-01-05 16:59:18.215152616 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c 2005-01-05 16:59:18.312137872 -0600 @@ -78,7 +78,7 @@ static void pci_direct_unmap_sg(struct p { } -void __init pci_dma_init_direct(void) +void __init pci_direct_iommu_init(void) { pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent; pci_dma_ops.pci_free_consistent = pci_direct_free_consistent; diff -puN arch/ppc64/kernel/Makefile~iommu-cleanup arch/ppc64/kernel/Makefile --- linux-2.5/arch/ppc64/kernel/Makefile~iommu-cleanup 2005-01-05 16:59:18.219152008 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/Makefile 2005-01-05 16:59:18.313137720 -0600 @@ -16,7 +16,7 @@ obj-y := setup.o entry.o t obj-$(CONFIG_PPC_OF) += of_device.o pci-obj-$(CONFIG_PPC_ISERIES) += iSeries_pci.o iSeries_pci_reset.o -pci-obj-$(CONFIG_PPC_MULTIPLATFORM) += pci_dn.o pci_dma_direct.o 
+pci-obj-$(CONFIG_PPC_MULTIPLATFORM) += pci_dn.o pci_direct_iommu.o
 obj-$(CONFIG_PCI) += pci.o pci_iommu.o iomap.o $(pci-obj-y)
_

From olof at austin.ibm.com Thu Jan 6 11:26:31 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Wed, 05 Jan 2005 18:26:31 -0600
Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c
In-Reply-To: <20050105162409.5cc9087e.akpm@osdl.org>
References: <20050106000721.GA20029@austin.ibm.com> <20050105162409.5cc9087e.akpm@osdl.org>
Message-ID: <41DC85B7.1060000@austin.ibm.com>

Andrew Morton wrote:

>Olof Johansson wrote:
>
>
>>This is part of the iommu cleanup, but broken out as a separate patch
>> since for mainline, a BK rename is more appropriate. Still, we need a
>> patch to apply for non-BK-based trees (-mm)
>>
>>
>
>It's not clear to me what this comment means. Is this patch for upstream
>merging?
>
>bk is fairly good at detecting when a gnu patch is simply performing a
>rename and will convert it into a `bk mv'.
>
Ah, I didn't know that it was that clever. If so, it's good for upstream
merging. Otherwise I would have recommended a manual bk mv upstream,
that's what the comment referred to.

-Olof

From akpm at osdl.org Thu Jan 6 11:24:09 2005
From: akpm at osdl.org (Andrew Morton)
Date: Wed, 5 Jan 2005 16:24:09 -0800
Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c
In-Reply-To: <20050106000721.GA20029@austin.ibm.com>
References: <20050106000721.GA20029@austin.ibm.com>
Message-ID: <20050105162409.5cc9087e.akpm@osdl.org>

Olof Johansson wrote:
>
> This is part of the iommu cleanup, but broken out as a separate patch
> since for mainline, a BK rename is more appropriate. Still, we need a
> patch to apply for non-BK-based trees (-mm)

It's not clear to me what this comment means. Is this patch for upstream
merging?

bk is fairly good at detecting when a gnu patch is simply performing a
rename and will convert it into a `bk mv'.
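
A note on the it_size unit change in the [2/2] cleanup patch from this
thread: the patch stops storing the table size in pages of table and
stores the number of entries instead, so each caller converts once and
the allocation bitmap size and the halfpoint are derived directly from
it_size. What follows is only a sketch of that arithmetic in user-space
C, not kernel code. The function names and the sample sizes (4096-byte
pages, 8-byte entries) are assumptions made for illustration; the kernel
itself uses PAGE_SIZE and sizeof(union tce_entry).

```c
#include <assert.h>

/*
 * Hypothetical helper: convert "pages worth of table" (the old it_size
 * unit) into a number of entries (the new it_size unit).
 * page_size and entry_size are passed in because the real values live
 * in kernel headers that this sketch does not include.
 */
unsigned long tce_pages_to_entries(unsigned long pages,
				   unsigned long page_size,
				   unsigned long entry_size)
{
	assert(entry_size != 0);
	return (pages * page_size) / entry_size;
}

/* Bytes needed for a bitmap with one bit per table entry, rounded up. */
unsigned long iommu_bitmap_bytes(unsigned long entries)
{
	return (entries + 7) >> 3;
}

/*
 * First entry of the region set aside for large allocations:
 * the last quarter of the table, as in iommu_init_table().
 */
unsigned long iommu_halfpoint(unsigned long entries)
{
	return entries * 3 / 4;
}
```

With the assumed sizes, one page of table holds 512 entries, which needs
a 64-byte bitmap and places the start of the large-allocation region at
entry 384.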
From paulus at samba.org Thu Jan 6 14:14:44 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 6 Jan 2005 14:14:44 +1100
Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c
In-Reply-To: <20050106000721.GA20029@austin.ibm.com>
References: <20050106000721.GA20029@austin.ibm.com>
Message-ID: <16860.44324.493730.587567@cargo.ozlabs.ibm.com>

Olof Johansson writes:

> This patch renames pci_dma_direct.c to pci_direct_iommu.c to comply to
> the naming convention of the other iommu files.
>
> This is part of the iommu cleanup, but broken out as a separate patch
> since for mainline, a BK rename is more appropriate. Still, we need a
> patch to apply for non-BK-based trees (-mm)
>
> Signed-off-by: Olof Johansson

Acked-by: Paul Mackerras

From paulus at samba.org Thu Jan 6 14:15:12 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 6 Jan 2005 14:15:12 +1100
Subject: [PATCH] [PPC64] [2/2] IOMMU cleanups: Main cleanup patch
In-Reply-To: <20050106000735.GA20079@austin.ibm.com>
References: <20050106000735.GA20079@austin.ibm.com>
Message-ID: <16860.44352.242290.624886@cargo.ozlabs.ibm.com>

Olof Johansson writes:

> Earlier cleanup efforts of the ppc64 IOMMU code have mostly been targeted
> at simplifying the allocation schemes and modularising things for the
> various platforms. The IOMMU init functions are still a mess. This is
> an attempt to clean them up and make them somewhat easier to follow.
...
> Signed-off-by: Olof Johansson

Acked-by: Paul Mackerras

From sfr at canb.auug.org.au Thu Jan 6 14:51:02 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Thu, 6 Jan 2005 14:51:02 +1100
Subject: [PATCH] htab code cleanup
Message-ID: <20050106145102.0c3c60ad.sfr@canb.auug.org.au>

Hi all,

This patch just does some small clean ups on the hash page table code
  - make htab_address static within hash_native.c
  - move some code that depended on CONFIG_PPC_MULTIPLATFORM
    from hash_utils.c to hash_native.c (one less CONFIG check).
  - clean up includes in hash_utils.c

--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk/arch/ppc64/kernel/iSeries_setup.c linus-bk-sfr.14/arch/ppc64/kernel/iSeries_setup.c
--- linus-bk/arch/ppc64/kernel/iSeries_setup.c 2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14/arch/ppc64/kernel/iSeries_setup.c 2005-01-06 14:37:42.000000000 +1100
@@ -478,12 +478,6 @@
 	htab_hash_mask = num_ptegs - 1;
 
 	/*
-	 * The actual hashed page table is in the hypervisor,
-	 * we have no direct access
-	 */
-	htab_address = NULL;
-
-	/*
 	 * Determine if absolute memory has any
 	 * holes so that we can interpret the
 	 * access map we get back from the hypervisor
diff -ruN linus-bk/arch/ppc64/kernel/setup.c linus-bk-sfr.14/arch/ppc64/kernel/setup.c
--- linus-bk/arch/ppc64/kernel/setup.c 2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14/arch/ppc64/kernel/setup.c 2005-01-06 14:37:54.000000000 +1100
@@ -673,7 +673,6 @@
 		ppc64_caches.dline_size);
 	printk("ppc64_caches.icache_line_size = 0x%x\n",
 		ppc64_caches.iline_size);
-	printk("htab_address = 0x%p\n", htab_address);
 	printk("htab_hash_mask = 0x%lx\n", htab_hash_mask);
 	printk("-----------------------------------------------------\n");
diff -ruN linus-bk/arch/ppc64/mm/hash_native.c linus-bk-sfr.14/arch/ppc64/mm/hash_native.c
--- linus-bk/arch/ppc64/mm/hash_native.c 2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14/arch/ppc64/mm/hash_native.c 2005-01-06 14:37:14.000000000 +1100
@@ -9,6 +9,7 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  */
+#include
 #include
 #include
 #include
@@ -22,6 +23,15 @@
 #include
 #include
 #include
+#include
+
+#ifdef DEBUG
+#define DBG(fmt...) udbg_printf(fmt)
+#else
+#define DBG(fmt...)
+#endif + +static HPTE *htab_address; #define HPTE_LOCK_BIT 3 @@ -410,6 +420,173 @@ } #endif +/* + * Note: pte --> Linux PTE + * HPTE --> PowerPC Hashed Page Table Entry + * + * Execution context: + * htab_initialize is called with the MMU off (of course), but + * the kernel has been copied down to zero so it can directly + * reference global data. At this point it is very difficult + * to print debug info. + * + */ + +#ifdef CONFIG_U3_DART +extern unsigned long dart_tablebase; +#endif /* CONFIG_U3_DART */ +extern unsigned long _SDR1; + +#define KB (1024) +#define MB (1024*KB) + +static inline void loop_forever(void) +{ + volatile unsigned long x = 1; + for(;x;x|=1) + ; +} + +static inline void create_pte_mapping(unsigned long start, unsigned long end, + unsigned long mode, int large) +{ + unsigned long addr; + unsigned int step; + + if (large) + step = 16*MB; + else + step = 4*KB; + + for (addr = start; addr < end; addr += step) { + unsigned long vpn, hash, hpteg; + unsigned long vsid = get_kernel_vsid(addr); + unsigned long va = (vsid << 28) | (addr & 0xfffffff); + int ret; + + if (large) + vpn = va >> HPAGE_SHIFT; + else + vpn = va >> PAGE_SHIFT; + + hash = hpt_hash(vpn, large); + + hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); + +#ifdef CONFIG_PPC_PSERIES + if (systemcfg->platform & PLATFORM_LPAR) + ret = pSeries_lpar_hpte_insert(hpteg, va, + virt_to_abs(addr) >> PAGE_SHIFT, + 0, mode, 1, large); + else +#endif /* CONFIG_PPC_PSERIES */ + ret = native_hpte_insert(hpteg, va, + virt_to_abs(addr) >> PAGE_SHIFT, + 0, mode, 1, large); + + if (ret == -1) { + ppc64_terminate_msg(0x20, "create_pte_mapping"); + loop_forever(); + } + } +} + +void __init htab_initialize(void) +{ + unsigned long table, htab_size_bytes; + unsigned long pteg_count; + unsigned long mode_rw; + int i, use_largepages = 0; + + DBG(" -> htab_initialize()\n"); + + /* + * Calculate the required size of the htab. We want the number of + * PTEGs to equal one half the number of real pages. 
+ */ + htab_size_bytes = 1UL << ppc64_pft_size; + pteg_count = htab_size_bytes >> 7; + + /* For debug, make the HTAB 1/8 as big as it normally would be. */ + ifppcdebug(PPCDBG_HTABSIZE) { + pteg_count >>= 3; + htab_size_bytes = pteg_count << 7; + } + + htab_hash_mask = pteg_count - 1; + + if (systemcfg->platform & PLATFORM_LPAR) { + /* Using a hypervisor which owns the htab */ + htab_address = NULL; + _SDR1 = 0; + } else { + /* Find storage for the HPT. Must be contiguous in + * the absolute address space. + */ + table = lmb_alloc(htab_size_bytes, htab_size_bytes); + + DBG("Hash table allocated at %lx, size: %lx\n", table, + htab_size_bytes); + + if ( !table ) { + ppc64_terminate_msg(0x20, "hpt space"); + loop_forever(); + } + htab_address = abs_to_virt(table); + + /* htab absolute addr + encoded htabsize */ + _SDR1 = table + __ilog2(pteg_count) - 11; + + /* Initialize the HPT with no entries */ + memset((void *)table, 0, htab_size_bytes); + } + + mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX; + + /* On U3 based machines, we need to reserve the DART area and + * _NOT_ map it to avoid cache paradoxes as it's remapped non + * cacheable later on + */ + if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + use_largepages = 1; + + /* create bolted the linear mapping in the hash table */ + for (i=0; i < lmb.memory.cnt; i++) { + unsigned long base, size; + + base = lmb.memory.region[i].physbase + KERNELBASE; + size = lmb.memory.region[i].size; + + DBG("creating mapping for region: %lx : %lx\n", base, size); + +#ifdef CONFIG_U3_DART + /* Do not map the DART space. Fortunately, it will be aligned + * in such a way that it will not cross two lmb regions and will + * fit within a single 16Mb page. + * The DART space is assumed to be a full 16Mb region even if we + * only use 2Mb of that space. We will use more of it later for + * AGP GART. We have to use a full 16Mb large page. 
+ */ + DBG("DART base: %lx\n", dart_tablebase); + + if (dart_tablebase != 0 && dart_tablebase >= base + && dart_tablebase < (base + size)) { + if (base != dart_tablebase) + create_pte_mapping(base, dart_tablebase, mode_rw, + use_largepages); + if ((base + size) > (dart_tablebase + 16*MB)) + create_pte_mapping(dart_tablebase + 16*MB, base + size, + mode_rw, use_largepages); + continue; + } +#endif /* CONFIG_U3_DART */ + create_pte_mapping(base, base + size, mode_rw, use_largepages); + } + DBG(" <- htab_initialize()\n"); +} +#undef KB +#undef MB + void hpte_init_native(void) { ppc_md.hpte_invalidate = native_hpte_invalidate; diff -ruN linus-bk/arch/ppc64/mm/hash_utils.c linus-bk-sfr.14/arch/ppc64/mm/hash_utils.c --- linus-bk/arch/ppc64/mm/hash_utils.c 2005-01-05 17:06:07.000000000 +1100 +++ linus-bk-sfr.14/arch/ppc64/mm/hash_utils.c 2005-01-06 14:37:27.000000000 +1100 @@ -17,220 +17,29 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ - -#undef DEBUG - -#include -#include -#include +#include +#include +#include #include -#include -#include -#include -#include -#include -#include +#include +#include +#include +#include #include -#include #include #include #include #include #include -#include #include -#include #include -#include -#include #include -#include -#include -#include #include -#include -#include - -#ifdef DEBUG -#define DBG(fmt...) udbg_printf(fmt) -#else -#define DBG(fmt...) -#endif - -/* - * Note: pte --> Linux PTE - * HPTE --> PowerPC Hashed Page Table Entry - * - * Execution context: - * htab_initialize is called with the MMU off (of course), but - * the kernel has been copied down to zero so it can directly - * reference global data. At this point it is very difficult - * to print debug info. 
- * - */ - -#ifdef CONFIG_U3_DART -extern unsigned long dart_tablebase; -#endif /* CONFIG_U3_DART */ +#include -HPTE *htab_address; unsigned long htab_hash_mask; -extern unsigned long _SDR1; - -#define KB (1024) -#define MB (1024*KB) - -static inline void loop_forever(void) -{ - volatile unsigned long x = 1; - for(;x;x|=1) - ; -} - -#ifdef CONFIG_PPC_MULTIPLATFORM -static inline void create_pte_mapping(unsigned long start, unsigned long end, - unsigned long mode, int large) -{ - unsigned long addr; - unsigned int step; - - if (large) - step = 16*MB; - else - step = 4*KB; - - for (addr = start; addr < end; addr += step) { - unsigned long vpn, hash, hpteg; - unsigned long vsid = get_kernel_vsid(addr); - unsigned long va = (vsid << 28) | (addr & 0xfffffff); - int ret; - - if (large) - vpn = va >> HPAGE_SHIFT; - else - vpn = va >> PAGE_SHIFT; - - hash = hpt_hash(vpn, large); - - hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); - -#ifdef CONFIG_PPC_PSERIES - if (systemcfg->platform & PLATFORM_LPAR) - ret = pSeries_lpar_hpte_insert(hpteg, va, - virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); - else -#endif /* CONFIG_PPC_PSERIES */ - ret = native_hpte_insert(hpteg, va, - virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); - - if (ret == -1) { - ppc64_terminate_msg(0x20, "create_pte_mapping"); - loop_forever(); - } - } -} - -void __init htab_initialize(void) -{ - unsigned long table, htab_size_bytes; - unsigned long pteg_count; - unsigned long mode_rw; - int i, use_largepages = 0; - - DBG(" -> htab_initialize()\n"); - - /* - * Calculate the required size of the htab. We want the number of - * PTEGs to equal one half the number of real pages. - */ - htab_size_bytes = 1UL << ppc64_pft_size; - pteg_count = htab_size_bytes >> 7; - - /* For debug, make the HTAB 1/8 as big as it normally would be. 
*/ - ifppcdebug(PPCDBG_HTABSIZE) { - pteg_count >>= 3; - htab_size_bytes = pteg_count << 7; - } - - htab_hash_mask = pteg_count - 1; - - if (systemcfg->platform & PLATFORM_LPAR) { - /* Using a hypervisor which owns the htab */ - htab_address = NULL; - _SDR1 = 0; - } else { - /* Find storage for the HPT. Must be contiguous in - * the absolute address space. - */ - table = lmb_alloc(htab_size_bytes, htab_size_bytes); - - DBG("Hash table allocated at %lx, size: %lx\n", table, - htab_size_bytes); - - if ( !table ) { - ppc64_terminate_msg(0x20, "hpt space"); - loop_forever(); - } - htab_address = abs_to_virt(table); - - /* htab absolute addr + encoded htabsize */ - _SDR1 = table + __ilog2(pteg_count) - 11; - - /* Initialize the HPT with no entries */ - memset((void *)table, 0, htab_size_bytes); - } - - mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX; - - /* On U3 based machines, we need to reserve the DART area and - * _NOT_ map it to avoid cache paradoxes as it's remapped non - * cacheable later on - */ - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) - use_largepages = 1; - - /* create bolted the linear mapping in the hash table */ - for (i=0; i < lmb.memory.cnt; i++) { - unsigned long base, size; - - base = lmb.memory.region[i].physbase + KERNELBASE; - size = lmb.memory.region[i].size; - - DBG("creating mapping for region: %lx : %lx\n", base, size); - -#ifdef CONFIG_U3_DART - /* Do not map the DART space. Fortunately, it will be aligned - * in such a way that it will not cross two lmb regions and will - * fit within a single 16Mb page. - * The DART space is assumed to be a full 16Mb region even if we - * only use 2Mb of that space. We will use more of it later for - * AGP GART. We have to use a full 16Mb large page. 
- */ - DBG("DART base: %lx\n", dart_tablebase); - - if (dart_tablebase != 0 && dart_tablebase >= base - && dart_tablebase < (base + size)) { - if (base != dart_tablebase) - create_pte_mapping(base, dart_tablebase, mode_rw, - use_largepages); - if ((base + size) > (dart_tablebase + 16*MB)) - create_pte_mapping(dart_tablebase + 16*MB, base + size, - mode_rw, use_largepages); - continue; - } -#endif /* CONFIG_U3_DART */ - create_pte_mapping(base, base + size, mode_rw, use_largepages); - } - DBG(" <- htab_initialize()\n"); -} -#undef KB -#undef MB -#endif /* CONFIG_PPC_MULTIPLATFORM */ - /* * Called by asm hashtable.S for doing lazy icache flush */ diff -ruN linus-bk/include/asm-ppc64/mmu.h linus-bk-sfr.14/include/asm-ppc64/mmu.h --- linus-bk/include/asm-ppc64/mmu.h 2005-01-05 17:06:08.000000000 +1100 +++ linus-bk-sfr.14/include/asm-ppc64/mmu.h 2005-01-06 14:36:16.000000000 +1100 @@ -98,7 +98,6 @@ #define PP_RXRX 3 /* Supervisor read, User read */ -extern HPTE * htab_address; extern unsigned long htab_hash_mask; static inline unsigned long hpt_hash(unsigned long vpn, int large) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050106/86e40190/attachment.pgp From WJEEHA at pk.ibm.com Thu Jan 6 22:48:27 2005 From: WJEEHA at pk.ibm.com (Wjeeha Tahir) Date: Thu, 6 Jan 2005 16:48:27 +0500 Subject: IBM 6400 Printer Driver Message-ID: Hi, I need the drivers for IBM 6400 Line Printer for RedHat Linux 9 and any configuration/installation document (if possible). I am hoping that this forum would help me find the desired. Thanks in advance, Kind Regards, Wjeeha Tahir -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050106/79b23142/attachment.htm From anton at samba.org Thu Jan 6 23:19:24 2005 From: anton at samba.org (Anton Blanchard) Date: Thu, 6 Jan 2005 23:19:24 +1100 Subject: [PATCH] xmon breakpoints fix for Power4/5 In-Reply-To: <20050105084202.5102b467@localhost> References: <20050104143031.62c25338@localhost> <16859.11390.511469.875831@cargo.ozlabs.ibm.com> <20050105084202.5102b467@localhost> Message-ID: <20050106121924.GA14239@krispykreme.ozlabs.ibm.com> > I may have misunderstood what Anton wanted when I talked w/ him > yesterday, but I was under the impression that he wanted 'bi' and 'bd' > fixed for Power4/5/LPAR. Yep sorry, my fault. I was interested in the data breakpoint stuff you had written that went through the hypervisor. Anton From haveblue at us.ibm.com Fri Jan 7 03:45:09 2005 From: haveblue at us.ibm.com (Dave Hansen) Date: Thu, 06 Jan 2005 08:45:09 -0800 Subject: IBM 6400 Printer Driver In-Reply-To: References: Message-ID: <1105029909.6932.2.camel@localhost> On Thu, 2005-01-06 at 16:48 +0500, Wjeeha Tahir wrote: > I need the drivers for IBM 6400 Line Printer for RedHat Linux 9 and > any configuration/installation document (if possible). I am hoping > that this forum would help me find the desired. Thanks in advance, Does the printer have a ppc64 chip in it and run Linux? -- Dave From hch at lst.de Fri Jan 7 03:47:19 2005 From: hch at lst.de (Christoph Hellwig) Date: Thu, 6 Jan 2005 17:47:19 +0100 Subject: [PATCH] fix pktcdvd linking on ppc64 Message-ID: <20050106164719.GA24751@lst.de> clear_page uses ppc64_caches so it needs to be exported. 
--- 1.99/arch/ppc64/kernel/setup.c 2005-01-05 03:48:16 +01:00 +++ edited/arch/ppc64/kernel/setup.c 2005-01-06 17:51:19 +01:00 @@ -116,6 +116,7 @@ u64 ppc64_debug_switch; struct ppc64_caches ppc64_caches; +EXPORT_SYMBOL_GPL(ppc64_caches); /* * These are used in binfmt_elf.c to put aux entries on the stack From markus at unixforces.net Fri Jan 7 04:55:01 2005 From: markus at unixforces.net (Markus Rothe) Date: Thu, 6 Jan 2005 18:55:01 +0100 Subject: Problems using Apple LCD with 2.6.10 Message-ID: <20050106175501.GA11534@unixforces.net> Hi, I'm not sure if this is the correct place for such mails, but I didn't find another place to post my problem. My problem is that my LCD doesn't work correctly with the latest (2.6.10) kernel. It's an Apple Cinema Display connected through the Apple Display Connector (ADC). The problem is that there are many "blue lightnings" all over the display. By blue lightning I mean a small set of pixels which turn light blue for about half a second. My display also flickers from time to time. Both happen in console mode and when I run Xorg. This problem is definitely related to the kernel, as it does not occur with kernel 2.6.9. Markus From linas at austin.ibm.com Fri Jan 7 06:24:13 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 6 Jan 2005 13:24:13 -0600 Subject: [PATCH] PPC64: EEH Recovery Message-ID: <20050106192413.GK22274@austin.ibm.com> Hi Paul, The patch below implements hotplug style EEH error recovery. It's split into two pieces: a part that needs to be applied to the PPC64 arch tree, and a part that needs to be applied to the RPA PHP hotplug tree. The PPC64 part needs to go in first.
Assuming this doesn't generate a round of discussion, please forward upstream to akpm/torvalds. Signed-off-by: Linas Vepstas -------------- next part -------------- ===== arch/ppc64/kernel/eeh.c 1.41 vs edited ===== --- 1.41/arch/ppc64/kernel/eeh.c 2005-01-06 13:05:42 -06:00 +++ edited/arch/ppc64/kernel/eeh.c 2005-01-06 13:08:03 -06:00 @@ -17,21 +17,19 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#include +#include #include #include -#include #include #include #include #include #include -#include +#include #include #include #include #include -#include #include "pci.h" #undef DEBUG @@ -89,7 +87,6 @@ static struct notifier_block *eeh_notifi * attempts we allow before panicking. */ #define EEH_MAX_FAILS 1000 -static atomic_t eeh_fail_count; /* RTAS tokens */ static int ibm_set_eeh_option; @@ -106,6 +103,10 @@ static spinlock_t slot_errbuf_lock = SPI static int eeh_error_buf_size; /* System monitoring statistics */ +static DEFINE_PER_CPU(unsigned long, no_device); +static DEFINE_PER_CPU(unsigned long, no_dn); +static DEFINE_PER_CPU(unsigned long, no_cfg_addr); +static DEFINE_PER_CPU(unsigned long, ignored_check); static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); static DEFINE_PER_CPU(unsigned long, false_positives); static DEFINE_PER_CPU(unsigned long, ignored_failures); @@ -224,9 +225,9 @@ pci_addr_cache_insert(struct pci_dev *de while (*p) { parent = *p; piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (alo < piar->addr_lo) { + if (ahi < piar->addr_lo) { p = &parent->rb_left; - } else if (ahi > piar->addr_hi) { + } else if (alo > piar->addr_hi) { p = &parent->rb_right; } else { if (dev != piar->pcidev || @@ -244,6 +245,11 @@ pci_addr_cache_insert(struct pci_dev *de piar->addr_hi = ahi; piar->pcidev = dev; piar->flags = flags; + +#ifdef DEBUG + printk (KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif rb_link_node(&piar->rb_node, parent, p); rb_insert_color(&piar->rb_node, 
&pci_io_addr_cache_root.rb_root); @@ -368,6 +374,7 @@ void pci_addr_cache_remove_device(struct */ void __init pci_addr_cache_build(void) { + struct device_node *dn; struct pci_dev *dev = NULL; spin_lock_init(&pci_io_addr_cache_root.piar_lock); @@ -378,6 +385,14 @@ void __init pci_addr_cache_build(void) continue; } pci_addr_cache_insert_device(dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + if (dn) { + int i; + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); + } } #ifdef DEBUG @@ -389,6 +404,32 @@ void __init pci_addr_cache_build(void) /* --------------------------------------------------------------- */ /* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ +void eeh_slot_error_detail (struct device_node *dn, int severity) +{ + unsigned long flags; + int rc; + + if (!dn) return; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); +} + +EXPORT_SYMBOL(eeh_slot_error_detail); + /** * eeh_register_notifier - Register to find out about EEH events. 
* @nb: notifier block to callback on events @@ -484,11 +525,9 @@ static void eeh_event_handler(void *dumm "%s %s\n", event->reset_state, pci_name(event->dev), pci_pretty_name(event->dev)); - atomic_set(&eeh_fail_count, 0); - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); - __get_cpu_var(slot_resets)++; + notifier_call_chain (&eeh_notifier_chain, + EEH_NOTIFY_FREEZE, event); pci_dev_put(event->dev); kfree(event); @@ -496,8 +535,8 @@ static void eeh_event_handler(void *dumm } /** - * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xE.... + * eeh_token_to_phys - convert I/O address to phys address + * @token i/o address, should be address in the form 0xA.... */ static inline unsigned long eeh_token_to_phys(unsigned long token) { @@ -512,6 +551,17 @@ static inline unsigned long eeh_token_to return pa | (token & (PAGE_SIZE-1)); } +static inline struct pci_dev * eeh_get_pci_dev(struct device_node *dn) +{ + struct pci_dev *dev = NULL; + + for_each_pci_dev(dev) { + if (pci_device_to_OF_node(dev) == dn) + return dev; + } + return NULL; +} + /** * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze * @dn device node @@ -532,7 +582,7 @@ int eeh_dn_check_failure(struct device_n int ret; int rets[3]; unsigned long flags; - int rc, reset_state; + int reset_state; struct eeh_event *event; __get_cpu_var(total_mmio_ffs)++; @@ -540,16 +590,20 @@ int eeh_dn_check_failure(struct device_n if (!eeh_subsystem_enabled) return 0; - if (!dn) + if (!dn) { + __get_cpu_var(no_dn)++; return 0; + } /* Access to IO BARs might get this far and still not want checking. */ if (!(dn->eeh_mode & EEH_MODE_SUPPORTED) || dn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; return 0; } if (!dn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; return 0; } @@ -558,8 +612,9 @@ int eeh_dn_check_failure(struct device_n * slot, we know it's bad already, we don't need to check... 
*/ if (dn->eeh_mode & EEH_MODE_ISOLATED) { - atomic_inc(&eeh_fail_count); - if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { + dn->eeh_freeze_count ++; + if (dn->eeh_freeze_count >= EEH_MAX_FAILS) { + dump_stack(); /* re-read the slot reset state */ if (read_slot_reset_state(dn, rets) != 0) rets[0] = -1; /* reset state unknown */ @@ -581,34 +636,25 @@ int eeh_dn_check_failure(struct device_n return 0; } - /* prevent repeated reports of this failure */ + /* Prevent repeated reports of this failure */ dn->eeh_mode |= EEH_MODE_ISOLATED; reset_state = rets[0]; + /* Log the error with the rtas logger */ + if (dn->eeh_freeze_count < EEH_MAX_ALLOWED_FREEZES) { + eeh_slot_error_detail (dn, 1 /* Temporary Error */); + } else { + eeh_slot_error_detail (dn, 2 /* Permanent Error */); + } - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, dn->eeh_config_addr, - BUID_HI(dn->phb->buid), - BUID_LO(dn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - 1 /* Temporary Error */); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); - - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); event = kmalloc(sizeof(*event), GFP_ATOMIC); if (event == NULL) { - eeh_panic(dev, reset_state); + printk (KERN_ERR "EEH: out of memory, event not handled\n"); return 1; } + if (!dev) + dev = eeh_get_pci_dev (dn); event->dev = dev; event->dn = dn; event->reset_state = reset_state; @@ -634,7 +680,6 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * @token i/o token, should be address in the form 0xA.... * @val value, should be all 1's (XXX why do we need this arg??) * - * Check for an eeh failure at the given token address. * Check for an EEH failure at the given token address. 
Call this * routine if the result of a read was all 0xff's and you want to * find out if this is due to an EEH slot freeze event. This routine @@ -642,6 +687,7 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * * Note this routine is safe to call in an interrupt context. */ + unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) { unsigned long addr; @@ -651,8 +697,10 @@ unsigned long eeh_check_failure(const vo /* Finding the phys addr + pci device; this is pretty quick. */ addr = eeh_token_to_phys((unsigned long __force) token); dev = pci_get_device_by_addr(addr); - if (!dev) + if (!dev) { + __get_cpu_var(no_device)++; return val; + } dn = pci_device_to_OF_node(dev); eeh_dn_check_failure (dn, dev); @@ -663,6 +711,172 @@ unsigned long eeh_check_failure(const vo EXPORT_SYMBOL(eeh_check_failure); +/* ------------------------------------------------------------- */ +/* The code below deals with error recovery */ + +void +rtas_set_slot_reset(struct device_node *dn) +{ + int token = rtas_token ("ibm,set-slot-reset"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,4,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), + 1); + if (rc) { + printk (KERN_WARNING "EEH: Unable to reset the failed slot\n"); + return; + } + + /* The PCI bus requires that the reset be held high for at least + * a 100 milliseconds. We wait a bit longer 'just in case'. 
+ */ + msleep (200); + + rc = rtas_call(token,4,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), + 0); +} + +EXPORT_SYMBOL(rtas_set_slot_reset); + +void +rtas_configure_bridge(struct device_node *dn) +{ + int token = rtas_token ("ibm,configure-bridge"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,3,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid)); + if (rc) { + printk (KERN_WARNING "EEH: Unable to configure device bridge\n"); + } +} + +EXPORT_SYMBOL(rtas_configure_bridge); + +/* ------------------------------------------------------- */ +/** Save and restore of PCI BARs + * + * Although firmware will set up BARs during boot, it doesn't + * set up device BAR's after a device reset, although it will, + * if requested, set up bridge configuration. Thus, we need to + * configure the PCI devices ourselves. Config-space setup is + * stored in the PCI structures which are normally deleted during + * device removal. Thus, the "save" routine references the + * structures so that they aren't deleted. 
+ */ + + +struct eeh_cfg_tree +{ + struct eeh_cfg_tree *sibling; + struct eeh_cfg_tree *child; + struct device_node *dn; + int is_bridge; +}; + +/** + * eeh_save_bars - save the PCI config space info + */ +struct eeh_cfg_tree * eeh_save_bars(struct device_node *dn) +{ + struct pci_dev *dev; + struct eeh_cfg_tree *cnode; + + dev = eeh_get_pci_dev(dn); + if (!dev) + return NULL; + + cnode = kmalloc(sizeof(struct eeh_cfg_tree), GFP_KERNEL); + if (!cnode) + return NULL; + + cnode->is_bridge = 0; + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + cnode->is_bridge = 1; + + of_node_get(dn); + cnode->dn = dn; + + cnode->sibling = NULL; + cnode->child = NULL; + + if (dn->child) { + cnode->child = eeh_save_bars (dn->child); + } + if (dn->sibling) { + cnode->sibling = eeh_save_bars (dn->sibling); + } + + return cnode; +} +EXPORT_SYMBOL(eeh_save_bars); + +/** + * __restore_bars - Restore the Base Address Registers + * Loads the PCI configuration space base address registers, + * the expansion ROM base address, the latency timer, and etc. + * from the saved values in the device node. 
+ */ +static inline void __restore_bars (struct device_node *dn) +{ + int i; + for (i=4; i<10; i++) { + rtas_write_config(dn, i*4, 4, dn->config_space[i]); + } + + /* 12 == Expansion ROM Address */ + rtas_write_config(dn, 12*4, 4, dn->config_space[12]); + +#define SAVED_BYTE(OFF) (((u8 *)(dn->config_space))[OFF]) + + rtas_write_config (dn, PCI_CACHE_LINE_SIZE, 1, + SAVED_BYTE(PCI_CACHE_LINE_SIZE)); + + rtas_write_config (dn, PCI_LATENCY_TIMER, 1, + SAVED_BYTE(PCI_LATENCY_TIMER)); + + rtas_write_config (dn, PCI_INTERRUPT_LINE, 1, + SAVED_BYTE(PCI_INTERRUPT_LINE)); +} + +/** + * eeh_restore_bars - restore the PCI config space info + */ +void eeh_restore_bars(struct eeh_cfg_tree *tree) +{ + if (!(tree->is_bridge)) + __restore_bars (tree->dn); + + if (tree->child) + eeh_restore_bars (tree->child); + + if (tree->sibling) + eeh_restore_bars (tree->sibling); + + of_node_put (tree->dn); + kfree (tree); +} +EXPORT_SYMBOL(eeh_restore_bars); + +/* ------------------------------------------------------------- */ +/* The code below deals with enabling EEH for devices during the + * early boot sequence. EEH must be enabled before any PCI probing + * can be done. 
+ */ + struct eeh_early_enable_info { unsigned int buid_hi; unsigned int buid_lo; @@ -829,7 +1043,9 @@ void eeh_add_device_early(struct device_ return; phb = dn->phb; if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none\n"); + printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", + dn->full_name); + dump_stack(); return; } @@ -848,6 +1064,9 @@ EXPORT_SYMBOL(eeh_add_device_early); */ void eeh_add_device_late(struct pci_dev *dev) { + int i; + struct device_node *dn; + if (!dev || !eeh_subsystem_enabled) return; @@ -857,6 +1076,11 @@ void eeh_add_device_late(struct pci_dev #endif pci_addr_cache_insert_device (dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); } EXPORT_SYMBOL(eeh_add_device_late); @@ -886,12 +1110,17 @@ static int proc_eeh_show(struct seq_file unsigned int cpu; unsigned long ffs = 0, positives = 0, failures = 0; unsigned long resets = 0; + unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; for_each_cpu(cpu) { ffs += per_cpu(total_mmio_ffs, cpu); positives += per_cpu(false_positives, cpu); failures += per_cpu(ignored_failures, cpu); resets += per_cpu(slot_resets, cpu); + no_dev += per_cpu(no_device, cpu); + no_dn += per_cpu(no_dn, cpu); + no_cfg += per_cpu(no_cfg_addr, cpu); + no_check += per_cpu(ignored_check, cpu); } if (0 == eeh_subsystem_enabled) { @@ -899,13 +1128,17 @@ static int proc_eeh_show(struct seq_file seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); } else { seq_printf(m, "EEH Subsystem is enabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n" + seq_printf(m, + "no device=%ld\n" + "no device node=%ld\n" + "no config address=%ld\n" + "check not wanted=%ld\n" + "eeh_total_mmio_ffs=%ld\n" "eeh_false_positives=%ld\n" "eeh_ignored_failures=%ld\n" - "eeh_slot_resets=%ld\n" - "eeh_fail_count=%d\n", - ffs, positives, failures, resets, - 
eeh_fail_count.counter); + "eeh_slot_resets=%ld\n", + no_dev, no_dn, no_cfg, no_check, + ffs, positives, failures, resets); } return 0; ===== arch/ppc64/kernel/pSeries_pci.c 1.59 vs edited ===== --- 1.59/arch/ppc64/kernel/pSeries_pci.c 2004-11-15 21:29:10 -06:00 +++ edited/arch/ppc64/kernel/pSeries_pci.c 2005-01-05 13:41:09 -06:00 @@ -102,7 +102,7 @@ static int rtas_pci_read_config(struct p return PCIBIOS_DEVICE_NOT_FOUND; } -static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +int rtas_write_config(struct device_node *dn, int where, int size, u32 val) { unsigned long buid, addr; int ret; ===== include/asm-ppc64/eeh.h 1.23 vs edited ===== --- 1.23/include/asm-ppc64/eeh.h 2004-10-25 18:17:38 -05:00 +++ edited/include/asm-ppc64/eeh.h 2005-01-05 13:47:55 -06:00 @@ -22,8 +22,8 @@ #include #include -#include #include +#include struct pci_dev; struct device_node; @@ -33,6 +33,10 @@ struct device_node; #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) +/* Max number of EEH freezes allowed before we consider the device + * to be permanently disabled. */ +#define EEH_MAX_ALLOWED_FREEZES 5 + #ifdef CONFIG_PPC_PSERIES extern void __init eeh_init(void); unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val); @@ -57,6 +61,34 @@ void eeh_add_device_early(struct device_ void eeh_add_device_late(struct pci_dev *); /** + * eeh_slot_error_detail -- record and EEH error condition to the log + * @severity: 1 if temporary, 2 if permanent failure. + * + * Obtains the the EEH error details from the RTAS subsystem, + * and then logs these details with the RTAS error log system. + */ +void eeh_slot_error_detail (struct device_node *dn, int severity); + +/** + * rtas_set_slot_reset -- unfreeze a frozen slot + * + * Clear the EEH-frozen condition on a slot. This routine + * does this by asserting the PCI #RST line for 1/8th of + * a second; this routine will sleep while the adapter is + * being reset. 
+ */ +void rtas_set_slot_reset (struct device_node *dn); + +/** + * rtas_configure_bridge -- firmware initialization of pci bridge + * + * Ask the firmware to configure any PCI bridge devices + * located behind the indicated node. Required after a + * pci device reset. + */ +void rtas_configure_bridge(struct device_node *dn); + +/** * eeh_remove_device - undo EEH setup for the indicated pci device * @dev: pci device to be removed * @@ -91,6 +123,13 @@ struct eeh_event { /** Register to find out about EEH events. */ int eeh_register_notifier(struct notifier_block *nb); int eeh_unregister_notifier(struct notifier_block *nb); + +/** Save and restore device configuration info across + * device resets. + */ +struct eeh_cfg_tree; +struct eeh_cfg_tree * eeh_save_bars(struct device_node *dn); +void eeh_restore_bars(struct eeh_cfg_tree *tree); /** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. ===== include/asm-ppc64/prom.h 1.24 vs edited ===== --- 1.24/include/asm-ppc64/prom.h 2004-11-25 00:42:42 -06:00 +++ edited/include/asm-ppc64/prom.h 2005-01-05 13:41:09 -06:00 @@ -164,8 +164,10 @@ struct device_node { int status; /* Current device status (non-zero is bad) */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; + int eeh_freeze_count; /* number of times this device froze up. 
*/ struct pci_controller *phb; /* for pci devices */ struct iommu_table *iommu_table; /* for phb's or bridges */ + u32 config_space[16]; /* saved PCI config space */ struct property *properties; struct device_node *parent; ===== include/asm-ppc64/rtas.h 1.25 vs edited ===== --- 1.25/include/asm-ppc64/rtas.h 2004-11-25 00:42:42 -06:00 +++ edited/include/asm-ppc64/rtas.h 2005-01-05 13:41:09 -06:00 @@ -241,4 +241,6 @@ extern void rtas_stop_self(void); /* RMO buffer reserved for user-space RTAS use */ extern unsigned long rtas_rmo_buf; +extern int rtas_write_config(struct device_node *dn, int where, int size, u32 val); + #endif /* _PPC64_RTAS_H */ -------------- next part -------------- ===== drivers/pci/hotplug/rpaphp.h 1.11 vs edited ===== --- 1.11/drivers/pci/hotplug/rpaphp.h 2004-10-06 11:43:44 -05:00 +++ edited/drivers/pci/hotplug/rpaphp.h 2005-01-05 13:41:09 -06:00 @@ -126,6 +126,8 @@ extern int register_pci_slot(struct slot extern int rpaphp_unconfig_pci_adapter(struct slot *slot); extern int rpaphp_get_pci_adapter_status(struct slot *slot, int is_init, u8 * value); extern struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev); +extern void init_eeh_handler (void); +extern void exit_eeh_handler (void); /* rpaphp_core.c */ extern int rpaphp_add_slot(struct device_node *dn); ===== drivers/pci/hotplug/rpaphp_core.c 1.18 vs edited ===== --- 1.18/drivers/pci/hotplug/rpaphp_core.c 2004-10-06 11:43:44 -05:00 +++ edited/drivers/pci/hotplug/rpaphp_core.c 2005-01-05 13:41:09 -06:00 @@ -443,12 +443,18 @@ static int __init rpaphp_init(void) { info(DRIVER_DESC " version: " DRIVER_VERSION "\n"); + /* Get set to handle EEH events. */ + init_eeh_handler(); + /* read all the PRA info from the system */ return init_rpa(); } static void __exit rpaphp_exit(void) { + /* Let EEH know we are going away. 
*/ + exit_eeh_handler(); + cleanup_slots(); } ===== drivers/pci/hotplug/rpaphp_pci.c 1.17 vs edited ===== --- 1.17/drivers/pci/hotplug/rpaphp_pci.c 2004-11-18 02:36:18 -06:00 +++ edited/drivers/pci/hotplug/rpaphp_pci.c 2005-01-05 15:30:29 -06:00 @@ -22,8 +22,12 @@ * Send feedback to * */ +#include +#include #include +#include #include +#include #include #include "../pci.h" /* for pci_add_new_bus */ @@ -62,6 +66,7 @@ int rpaphp_claim_resource(struct pci_dev root ? "Address space collision on" : "No parent found for", resource, dtype, pci_name(dev), res->start, res->end); + dump_stack(); } return err; } @@ -184,6 +189,19 @@ rpaphp_fixup_new_pci_devices(struct pci_ static int rpaphp_pci_config_bridge(struct pci_dev *dev); +static void rpaphp_eeh_add_bus_device(struct pci_bus *bus) +{ + struct pci_dev *dev; + list_for_each_entry(dev, &bus->devices, bus_list) { + eeh_add_device_late(dev); + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + struct pci_bus *subbus = dev->subordinate; + if (bus) + rpaphp_eeh_add_bus_device (subbus); + } + } +} + /***************************************************************************** rpaphp_pci_config_slot() will configure all devices under the given slot->dn and return the the first pci_dev. 
@@ -211,6 +229,8 @@ rpaphp_pci_config_slot(struct device_nod } if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) rpaphp_pci_config_bridge(dev); + + rpaphp_eeh_add_bus_device(bus); } return dev; } @@ -219,7 +239,6 @@ static int rpaphp_pci_config_bridge(stru { u8 sec_busno; struct pci_bus *child_bus; - struct pci_dev *child_dev; dbg("Enter %s: BRIDGE dev=%s\n", __FUNCTION__, pci_name(dev)); @@ -236,11 +255,7 @@ static int rpaphp_pci_config_bridge(stru /* do pci_scan_child_bus */ pci_scan_child_bus(child_bus); - list_for_each_entry(child_dev, &child_bus->devices, bus_list) { - eeh_add_device_late(child_dev); - } - - /* fixup new pci devices without touching bus struct */ + /* Fixup new pci devices without touching bus struct */ rpaphp_fixup_new_pci_devices(child_bus, 0); /* Make the discovered devices available */ @@ -278,7 +293,7 @@ static void print_slot_pci_funcs(struct return; } #else -static void print_slot_pci_funcs(struct slot *slot) +static inline void print_slot_pci_funcs(struct slot *slot) { return; } @@ -360,7 +375,6 @@ static void rpaphp_eeh_remove_bus_device if (pdev) rpaphp_eeh_remove_bus_device(pdev); } - } return; } @@ -562,36 +576,154 @@ exit: return retval; } -struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev) +/** + * rpaphp_find_slot - find and return the slot holding the device + * @dev: pci device for which we want the slot structure. 
+ */ +static struct slot *rpaphp_find_slot(struct pci_dev *dev) { - struct list_head *tmp, *n; - struct slot *slot; + struct list_head *tmp, *n; + struct slot *slot; list_for_each_safe(tmp, n, &rpaphp_slot_head) { struct pci_bus *bus; struct list_head *ln; slot = list_entry(tmp, struct slot, rpaphp_slot_list); - if (slot->bridge == NULL) { - if (slot->dev_type == PCI_DEV) { - printk(KERN_WARNING "PCI slot missing bridge %s %s \n", - slot->name, slot->location); - } + + /* PHB slots don't have bridges */ + if (slot->bridge == NULL) continue; - } + + /* the PCI device could be the PHB itself */ + if (slot->bridge == dev) + return slot; bus = slot->bridge->subordinate; if (!bus) { + printk (KERN_WARNING "PCI bridge is missing bus: %s %s\n", + pci_name (slot->bridge), pci_pretty_name (slot->bridge)); continue; /* should never happen? */ } + for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { - struct pci_dev *pdev = pci_dev_b(ln); - if (pdev == dev) - return slot->hotplug_slot; + struct pci_dev *pdev = pci_dev_b(ln); + if (pdev == dev) + return slot; } } return NULL; } -EXPORT_SYMBOL_GPL(rpaphp_find_hotplug_slot); +/* ------------------------------------------------------- */ +/** + * handle_eeh_events -- reset a PCI device after hard lockup. + * + * pSeries systems will isolate a PCI slot if the PCI-Host + * bridge detects address or data parity errors, DMA's + * occuring to wild addresses (which usually happen due to + * bugs in device drivers or in PCI adapter firmware). + * Slot isolations also occur if #SERR, #PERR or other misc + * PCI-related errors are detected. + * + * Recovery process consists of unplugging the device driver + * (which generated hotplug events to userspace), then issuing + * a PCI #RST to the device, then reconfiguring the PCI config + * space for all bridges & devices under this slot, and then + * finally restarting the device drivers (which cause a second + * set of hotplug events to go out to userspace). 
+ */ +int handle_eeh_events (struct notifier_block *self, + unsigned long reason, void *ev) +{ + int freeze_count=0; + struct eeh_event *event = ev; + struct slot *frozen_slot; + struct eeh_cfg_tree * saved_bars; + +debug=1; + frozen_slot = rpaphp_find_slot(event->dev); + if (!frozen_slot) + { + printk (KERN_ERR + "EEH: Cannot find PCI slot for EEH error! dev=%p dn=%p\n", + event->dev, event->dn); + if (event->dev) + printk("EEH: above message for pci device %s %s\n", + pci_name(event->dev), pci_pretty_name (event->dev)); + if (event->dn) + printk ("EEH: above message for dn %s\n", event->dn->full_name); + return 1; + } + + /* Keep a copy of the config space registers */ + saved_bars = eeh_save_bars(frozen_slot->dn); + of_node_get(event->dn); + pci_dev_get(event->dev); + + if (frozen_slot->dn->child) + freeze_count = frozen_slot->dn->child->eeh_freeze_count; + rpaphp_unconfig_pci_adapter (frozen_slot); + + freeze_count ++; + if (freeze_count > EEH_MAX_ALLOWED_FREEZES) { + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards + */ + printk (KERN_ERR + "EEH: device %s:%s has failed %d times \n" + "and has been permanently disabled. Please try reseating\n" + "this device or replacing it.\n", + pci_name (event->dev), + pci_pretty_name (event->dev), + freeze_count); + goto rdone; + } + printk (KERN_WARNING + "EEH: This device has failed %d times since last reoobt: %s:%s\n", + freeze_count, + pci_name (event->dev), + pci_pretty_name (event->dev)); + + /* Reset the pci controller. (Asserts RST#; resets config space). + * Reconfigure bridges and devices */ + rtas_set_slot_reset (event->dn); + rtas_configure_bridge(event->dn); + eeh_restore_bars(saved_bars); + + /* Give the system 5 seconds to finish running the user-space + * hotplug scripts, e.g. ifdown for ethernet. Yes, this is a hack, + * but if we don't do this, weird things happen. 
+ */ + ssleep (5); + + rpaphp_enable_pci_slot (frozen_slot); + + /* Store the freeze count with the pci adapter, and not the slot. + * This way, if the device is replaced, the count is cleared. + */ + if (frozen_slot->dn->child) + frozen_slot->dn->child->eeh_freeze_count = freeze_count; + +rdone: + of_node_put(event->dn); + pci_dev_put(event->dev); + return 0; +} + +static struct notifier_block eeh_block; + +void __init init_eeh_handler (void) +{ + eeh_block.notifier_call = handle_eeh_events; + eeh_register_notifier (&eeh_block); +} + +void __exit exit_eeh_handler (void) +{ + eeh_unregister_notifier (&eeh_block); +} + From johnrose at austin.ibm.com Fri Jan 7 07:59:25 2005 From: johnrose at austin.ibm.com (John Rose) Date: Thu, 06 Jan 2005 14:59:25 -0600 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <20050106192413.GK22274@austin.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> Message-ID: <1105045165.22565.20.camel@sinatra.austin.ibm.com> Hi Linas- Here are a couple of non-substantive comments on your PCI Hotplug patch: + /* PHB slots don't have bridges */ + if (slot->bridge == NULL) continue; - } + + /* the PCI device could be the PHB itself */ + if (slot->bridge == dev) + return slot; The PHB case is handled by the first condition. The second comment would make more sense if "PHB itself" read "slot itself". -EXPORT_SYMBOL_GPL(rpaphp_find_hotplug_slot); I suppose we could also make this static and remove it from rpaphp.h. Thanks- John From j.glisse at free.fr Sat Jan 8 04:37:03 2005 From: j.glisse at free.fr (Jerome Glisse) Date: Fri, 07 Jan 2005 18:37:03 +0100 Subject: Problems using Apple LCD with 2.6.10 In-Reply-To: <20050106175501.GA11534@unixforces.net> References: <20050106175501.GA11534@unixforces.net> Message-ID: <41DEC8BF.6010809@free.fr> Markus Rothe wrote: >Hi, > >I'm not sure if this is the correct place for such mails, but I didn't >found another place to post my problem. 
>
>My problem is that my LCD doesn't work correctly with the latest (2.6.10)
>kernel. It's an Apple Cinema Display connected through the Apple Display
>Connector (ADC). The problem is that there are many "blue lightnings" all
>over the display. By blue lightning I mean a small set of pixels which
>turn light blue for about half a second. My display also flickers
>from time to time. Both happen in console mode and when I run
>Xorg.
>
>This problem is definitely related to the kernel, as it does not occur with
>kernel 2.6.9.
>
>

What is your graphics card? Radeon? Nvidia?

best,
Jerome Glisse

From olof at austin.ibm.com Sat Jan 8 07:00:26 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Fri, 7 Jan 2005 14:00:26 -0600
Subject: [PATCH] [PPC64] Fix iommu cleanup regression
Message-ID: <20050107200026.GA23616@austin.ibm.com>

Hi,

In the recent IOMMU cleanup, the new LPAR code assumes that every PHB
has a DMA window assigned to it. On some machines a window is not
assigned unless there's an adapter in the slot. In other words, a PHB
without an ibm,dma-window property is not a bug and must be tolerated.

This patch fixes that, and also removes a redundant check for the
dma-window being defined.
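As a rough illustration of the behaviour change described above, here is a userspace sketch, not the kernel code. struct bus_node and bus_setup() are made-up names; the only point is that a missing window is reported and skipped instead of being treated as fatal.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the device-node data; not the kernel struct. */
struct bus_node {
	const char *full_name;
	const unsigned int *dma_window;	/* NULL when the property is absent */
	int table_ready;
};

/* Returns 1 if a table was set up, 0 if the bus was skipped. */
static int bus_setup(struct bus_node *bn)
{
	if (bn->dma_window == NULL) {
		/* Not an error: an empty slot may legitimately lack a window. */
		fprintf(stderr, "bus %s has no ibm,dma-window property, skipping\n",
			bn->full_name);
		return 0;
	}

	bn->table_ready = 1;	/* placeholder for the real table setup */
	return 1;
}
```

A caller would simply carry on to the next bus when bus_setup() returns 0.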
Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup-bugfix arch/ppc64/kernel/pSeries_iommu.c --- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup-bugfix 2005-01-07 12:52:18.960683160 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c 2005-01-07 13:44:19.427300128 -0600 @@ -293,10 +293,6 @@ static void iommu_table_setparms_lpar(st struct iommu_table *tbl, unsigned int *dma_window) { - if (!dma_window) - panic("iommu_table_setparms_lpar: device %s has no" - " ibm,dma-window property!\n", dn->full_name); - tbl->it_busno = dn->bussubno; /* TODO: Parse field size properties properly. */ @@ -385,7 +381,10 @@ static void iommu_bus_setup_pSeriesLP(st break; } - WARN_ON(dma_window == NULL); + if (dma_window == NULL) { + DBG("iommu_bus_setup_pSeriesLP: bus %s seems to have no ibm,dma-window property\n", dn->full_name); + return; + } if (!pdn->iommu_table) { /* Bussubno hasn't been copied yet. 
@@ -420,10 +419,11 @@ static void iommu_dev_setup_pSeries(stru while (dn && dn->iommu_table == NULL) dn = dn->parent; - WARN_ON(!dn); - - if (dn) + if (dn) { mydn->iommu_table = dn->iommu_table; + } else { + DBG("iommu_dev_setup_pSeries, dev %p (%s) has no iommu table\n", dev, dev->pretty_name); + } } static void iommu_bus_setup_null(struct pci_bus *b) { } _ From linas at austin.ibm.com Sat Jan 8 07:09:36 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Fri, 7 Jan 2005 14:09:36 -0600 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <1105045165.22565.20.camel@sinatra.austin.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> <1105045165.22565.20.camel@sinatra.austin.ibm.com> Message-ID: <20050107200936.GN22274@austin.ibm.com> On Thu, Jan 06, 2005 at 02:59:25PM -0600, John Rose was heard to remark: > Hi Linas- > > Here are a couple of non-substantive comments on your PCI Hotplug patch: OK, thanks, I've tweaked it, I'll be in the next round of updates. --linas From markus at unixforces.net Sat Jan 8 07:13:43 2005 From: markus at unixforces.net (Markus Rothe) Date: Fri, 7 Jan 2005 21:13:43 +0100 Subject: Problems using Apple LCD with 2.6.10 In-Reply-To: <41DEC8BF.6010809@free.fr> References: <20050106175501.GA11534@unixforces.net> <41DEC8BF.6010809@free.fr> Message-ID: <20050107201343.GA10390@unixforces.net> Jerome Glisse wrote: > What is your graphics card ? radeon ? nvidia ? It is a Radeon 9600. Markus -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050107/6b316468/attachment.pgp From zwane at arm.linux.org.uk Sun Jan 9 15:29:23 2005 From: zwane at arm.linux.org.uk (Zwane Mwaikambo) Date: Sat, 8 Jan 2005 21:29:23 -0700 (MST) Subject: [PATCH] PPC64: Move hotplug cpu functions to smp_ops Message-ID: This should allow for easier adding of hotplug cpu support for other PPC64 subarchs. The patch is untested but does compile with and without hotplug cpu on pSeries and G5 configs. What can get slightly confusing is the fact that both ppc_md and smp_ops have cpu_die members. arch/ppc64/kernel/pSeries_smp.c | 9 +++++++-- arch/ppc64/kernel/smp.c | 16 ++++++++++++++++ include/asm-ppc64/machdep.h | 2 ++ 3 files changed, 25 insertions(+), 2 deletions(-) Signed-off-by: Zwane Mwaikambo Index: linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/pSeries_smp.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm1/arch/ppc64/kernel/pSeries_smp.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 pSeries_smp.c --- linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/pSeries_smp.c 4 Jan 2005 04:03:33 -0000 1.1.1.1 +++ linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/pSeries_smp.c 9 Jan 2005 03:42:19 -0000 @@ -88,7 +88,7 @@ static int query_cpu_stopped(unsigned in #ifdef CONFIG_HOTPLUG_CPU -int __cpu_disable(void) +int pSeries_cpu_disable(void) { /* FIXME: go put this in a header somewhere */ extern void xics_migrate_irqs_away(void); @@ -106,7 +106,7 @@ int __cpu_disable(void) return 0; } -void __cpu_die(unsigned int cpu) +void pSeries_cpu_die(unsigned int cpu) { int tries; int cpu_status; @@ -355,6 +355,11 @@ void __init smp_init_pSeries(void) else smp_ops = &pSeries_xics_smp_ops; +#ifdef CONFIG_HOTPLUG_CPU + smp_ops->cpu_disable = pSeries_cpu_disable; + smp_ops->cpu_die = pSeries_cpu_die; +#endif + /* Start secondary threads on SMT systems; primary 
threads * are already in the running state. */ Index: linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/smp.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm1/arch/ppc64/kernel/smp.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 smp.c --- linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/smp.c 4 Jan 2005 04:03:33 -0000 1.1.1.1 +++ linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/smp.c 9 Jan 2005 03:48:56 -0000 @@ -557,3 +557,19 @@ void __init smp_cpus_done(unsigned int m */ cpu_present_map = cpu_possible_map; } + +#ifdef CONFIG_HOTPLUG_CPU +int __cpu_disable(void) +{ + if (smp_ops->cpu_disable) + return smp_ops->cpu_disable(); + + return -ENOSYS; +} + +void __cpu_die(unsigned int cpu) +{ + if (smp_ops->cpu_die) + smp_ops->cpu_die(cpu); +} +#endif Index: linux-2.6.10-mm1-ppc64/include/asm-ppc64/machdep.h =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm1/include/asm-ppc64/machdep.h,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 machdep.h --- linux-2.6.10-mm1-ppc64/include/asm-ppc64/machdep.h 4 Jan 2005 04:03:40 -0000 1.1.1.1 +++ linux-2.6.10-mm1-ppc64/include/asm-ppc64/machdep.h 9 Jan 2005 03:50:21 -0000 @@ -31,6 +31,8 @@ struct smp_ops_t { void (*late_setup_cpu)(int nr); void (*take_timebase)(void); void (*give_timebase)(void); + int (*cpu_disable)(void); + void (*cpu_die)(unsigned int nr); }; #endif From anton at samba.org Sun Jan 9 16:48:34 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 9 Jan 2005 16:48:34 +1100 Subject: xtime <-> gettimeofday can get out of sync Message-ID: <20050109054834.GL14239@krispykreme.ozlabs.ibm.com> Hi, Ive noticed a problem where xtime and gettimeofday could get out of sync if interrupts are disabled for too long (eg long kernel code paths or dropping into the debugger for a while). 
We correctly replay lost jiffies but in that loop time_sync_xtime syncs the intermediate values of xtime up with the current value of gettimeofday. So xtime jumps by a bunch and from then on it is ahead of gettimeofday and we never resync the two. I guess this is to avoid xtime going backwards. The patch below creates a __do_gettimeofday where you can pass in a tb value and sync the intermediate values of xtime properly. Note that the time_sync_xtime check only stops the seconds from going backwards, the ns component still could couldnt it? Considering this stuff is hard to get right, should we switch to the time interpolator stuff? The only problem there is it might be trouble for systemcfg (which exports stuff to do userspace gettimeofday). Anton ===== arch/ppc64/kernel/time.c 1.44 vs edited ===== --- 1.44/arch/ppc64/kernel/time.c 2005-01-05 13:48:14 +11:00 +++ edited/arch/ppc64/kernel/time.c 2005-01-09 16:37:33 +11:00 @@ -142,16 +142,54 @@ } } +/* + * This version of gettimeofday has microsecond resolution. + */ +static inline void __do_gettimeofday(struct timeval *tv, unsigned long tb_val) +{ + unsigned long sec, usec, tb_ticks; + unsigned long xsec, tb_xsec; + struct gettimeofday_vars * temp_varp; + unsigned long temp_tb_to_xs, temp_stamp_xsec; + + /* + * These calculations are faster (gets rid of divides) + * if done in units of 1/2^20 rather than microseconds. 
+ * The conversion to microseconds at the end is done + * without a divide (and in fact, without a multiply) + */ + tb_ticks = tb_val - do_gtod.tb_orig_stamp; + temp_varp = do_gtod.varp; + temp_tb_to_xs = temp_varp->tb_to_xs; + temp_stamp_xsec = temp_varp->stamp_xsec; + tb_xsec = mulhdu( tb_ticks, temp_tb_to_xs ); + xsec = temp_stamp_xsec + tb_xsec; + sec = xsec / XSEC_PER_SEC; + xsec -= sec * XSEC_PER_SEC; + usec = (xsec * USEC_PER_SEC)/XSEC_PER_SEC; + + tv->tv_sec = sec; + tv->tv_usec = usec; +} + +void do_gettimeofday(struct timeval *tv) +{ + __do_gettimeofday(tv, get_tb()); +} + +EXPORT_SYMBOL(do_gettimeofday); + /* Synchronize xtime with do_gettimeofday */ -static __inline__ void timer_sync_xtime( unsigned long cur_tb ) +static inline void timer_sync_xtime(unsigned long cur_tb) { struct timeval my_tv; - if ( cur_tb > next_xtime_sync_tb ) { + if (cur_tb > next_xtime_sync_tb) { next_xtime_sync_tb = cur_tb + xtime_sync_interval; - do_gettimeofday( &my_tv ); - if ( xtime.tv_sec <= my_tv.tv_sec ) { + __do_gettimeofday(&my_tv, cur_tb); + + if (xtime.tv_sec <= my_tv.tv_sec) { xtime.tv_sec = my_tv.tv_sec; xtime.tv_nsec = my_tv.tv_usec * 1000; } @@ -274,7 +312,7 @@ write_seqlock(&xtime_lock); tb_last_stamp = lpaca->next_jiffy_update_tb; do_timer(regs); - timer_sync_xtime( cur_tb ); + timer_sync_xtime(lpaca->next_jiffy_update_tb); timer_check_rtc(); write_sequnlock(&xtime_lock); if ( adjusting_time && (time_adjust == 0) ) @@ -312,36 +350,6 @@ { return mulhdu(get_tb(), tb_to_ns_scale) << tb_to_ns_shift; } - -/* - * This version of gettimeofday has microsecond resolution. - */ -void do_gettimeofday(struct timeval *tv) -{ - unsigned long sec, usec, tb_ticks; - unsigned long xsec, tb_xsec; - struct gettimeofday_vars * temp_varp; - unsigned long temp_tb_to_xs, temp_stamp_xsec; - - /* These calculations are faster (gets rid of divides) - * if done in units of 1/2^20 rather than microseconds. 
- * The conversion to microseconds at the end is done - * without a divide (and in fact, without a multiply) */ - tb_ticks = get_tb() - do_gtod.tb_orig_stamp; - temp_varp = do_gtod.varp; - temp_tb_to_xs = temp_varp->tb_to_xs; - temp_stamp_xsec = temp_varp->stamp_xsec; - tb_xsec = mulhdu( tb_ticks, temp_tb_to_xs ); - xsec = temp_stamp_xsec + tb_xsec; - sec = xsec / XSEC_PER_SEC; - xsec -= sec * XSEC_PER_SEC; - usec = (xsec * USEC_PER_SEC)/XSEC_PER_SEC; - - tv->tv_sec = sec; - tv->tv_usec = usec; -} - -EXPORT_SYMBOL(do_gettimeofday); int do_settimeofday(struct timespec *tv) { From j.glisse at gmail.com Mon Jan 10 02:26:12 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Sun, 9 Jan 2005 16:26:12 +0100 Subject: U3 G5 AGP support patch (v4) Message-ID: <4240b916050109072621440269@mail.gmail.com> Hi, Attached is a patch for the U3 agp bridge. This one just fix some typo from the previous patch. (DEVICE instead of DEVIEC...). Signed-off-by: Jerome Glisse best, Jerome Glisse -------------- next part -------------- A non-text attachment was scrubbed... Name: uninorth-patch4 Type: application/octet-stream Size: 10216 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050109/226dc102/attachment.obj From j.glisse at gmail.com Mon Jan 10 02:40:56 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Sun, 9 Jan 2005 16:40:56 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) Message-ID: <4240b916050109074053e328b1@mail.gmail.com> Hi, With 2.6.10 i get a compilation error with disable_6xx_mmu i guess this is linked with the patch you supplied in december in arch/ppc/boot/common/util.S Patch which comment disable_6xx_mmu if flags CONFIG_6XX not defined. The problem arise in arch/ppc/boot/simple/misc-prep.c where there is no conditional compilation for this function. Attached is a patch that use the flags CONFIG_6XX to comment out call to this function if flags not set. 
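The attached patch was scrubbed from the archive, so the following is only a guess at the shape of the fix described above, written as a small userspace sketch. early_boot_setup() and the counter are hypothetical names; disable_6xx_mmu() here is a stub standing in for the assembly routine.

```c
#include <stdio.h>

/* Uncomment to model a classic 6xx build; a PPC970 build leaves it unset. */
/* #define CONFIG_6XX 1 */

static int mmu_disable_calls;

#ifdef CONFIG_6XX
/* Stub standing in for the assembly routine; only built for 6xx. */
static void disable_6xx_mmu(void)
{
	mmu_disable_calls++;
}
#endif

/* Hypothetical caller modelled on the boot wrapper's early setup. */
static void early_boot_setup(void)
{
#ifdef CONFIG_6XX
	disable_6xx_mmu();	/* only referenced when the symbol is built */
#endif
	puts("early setup done");
}
```

With the guard on both the definition and the call site, a configuration without CONFIG_6XX no longer references an undefined symbol.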
By the way, there are many compilation warnings related to PPC with
2.6.10; is anyone looking into correcting them?

Signed-off-by: Jerome Glisse

best,
Jerome Glisse
-------------- next part --------------
A non-text attachment was scrubbed...
Name: disable_6xx-patch
Type: application/octet-stream
Size: 855 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050109/888cb654/attachment.obj

From hch at lst.de Mon Jan 10 03:06:14 2005
From: hch at lst.de (Christoph Hellwig)
Date: Sun, 9 Jan 2005 17:06:14 +0100
Subject: U3 G5 AGP support patch (v4)
In-Reply-To: <4240b916050109072621440269@mail.gmail.com>
References: <4240b916050109072621440269@mail.gmail.com>
Message-ID: <20050109160614.GA22839@lst.de>

+static struct device_node* uninorth_node __pmacdata;
+static u32 __iomem * uninorth_base __pmacdata;

static struct device_node *uninorth_node __pmacdata;
static u32 __iomem *uninorth_base __pmacdata;

+	if(uninorth_rev == 0x21) {

	if (uninorth_rev == 0x21) {

+	if((uninorth_rev >= 0x30) && (uninorth_rev <= 0x33)) {

	if ((uninorth_rev >= 0x30) && (uninorth_rev <= 0x33)) {

+	if (agp_bridge->dev->device == PCI_DEVICE_ID_APPLE_U3_AGP) {
+		/* This is an AGP V3 */
+		agp_device_command(command, TRUE);
+	} else {
+		/* AGP V2 */
+		agp_device_command(command, FALSE);
+	}

double-indentation, also please use 1/0 instead of TRUE/FALSE.

+static struct aper_size_info_32 u3_sizes[8] =
+{
+/*
+ * Not sure that uninorth3 supports that high aperture sizes but it
+ * would strange if it did not :)
+ */

comment before the struct declaration, please, aka

/*
 * Not sure that uninorth3 supports that high aperture sizes but it
 * would strange if it did not :)
 */
static struct aper_size_info_32 u3_sizes[8] = {

+	uninorth_node = of_find_node_by_name(NULL, "uni-n");
+	/* Locate G5 u3 */
+	if (uninorth_node == NULL) {
+		uninorth_node = of_find_node_by_name(NULL, "u3");
+	}

	/* Locate G5 u3 */
	uninorth_node = of_find_node_by_name(NULL, "uni-n");
	if (!uninorth_node)
		uninorth_node = of_find_node_by_name(NULL, "u3");

+	/*
+	 * Set specific functions & values for agp3 controller.
+	 */
+	if (pdev->device == PCI_DEVICE_ID_APPLE_U3_AGP) {
+		uninorth_agp_driver.insert_memory = uninorth3_insert_memory;
+		uninorth_agp_driver.aperture_sizes = (void *)u3_sizes;
+		uninorth_agp_driver.num_aperture_sizes = 8;

Please declare a separate driver instance instead of overriding.

And asm-ppc64 is still missing an agp.h, no?

From j.glisse at gmail.com Mon Jan 10 04:46:05 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Sun, 9 Jan 2005 18:46:05 +0100
Subject: U3 G5 AGP support patch (v4)
In-Reply-To: <20050109160614.GA22839@lst.de>
References: <4240b916050109072621440269@mail.gmail.com> <20050109160614.GA22839@lst.de>
Message-ID: <4240b91605010909463e44bba8@mail.gmail.com>

> Please declare a separate driver instance instead of overriding.

I hope the new patch follows the coding style? :)

> And asm-ppc64 is still missing an agp.h, no?

Maybe; someone with better knowledge may be able to tell us more about
that :) Anyway, BenH tells me that there is still a pending issue with
AGP and potential cache aliasing.

best,
Jerome Glisse
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uninorth-patch5
Type: application/octet-stream
Size: 11215 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050109/5cfffcbb/attachment.obj

From j.glisse at gmail.com Mon Jan 10 07:41:44 2005
From: j.glisse at gmail.com (Jerome Glisse)
Date: Sun, 9 Jan 2005 21:41:44 +0100
Subject: U3 G5 AGP support patch (v4)
In-Reply-To: <4240b91605010909463e44bba8@mail.gmail.com>
References: <4240b916050109072621440269@mail.gmail.com> <20050109160614.GA22839@lst.de> <4240b91605010909463e44bba8@mail.gmail.com>
Message-ID: <4240b91605010912414a5b1b67@mail.gmail.com>

It seems there is a bug somewhere in my AGP patch. I was playing with
an r300 Radeon and got a hard lockup (I am quite used to that when
playing with r300, though :(). After a bit of investigation it seems
to be related to AGP. Right now I am porting an old tool from DRI that
tests agpgart and thus AGP. I may, in the end, really need to split the
U3 driver completely from the old uninorth one. I will take a deeper
look to track down the issue.

In the meantime, could someone test AGP with a Radeon r200 on a G5?
You will certainly lock up your G5, but it should not burn; at least
here I just got some smoke ;)

best,
Jerome Glisse

From paulus at samba.org Mon Jan 10 08:03:20 2005
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 10 Jan 2005 08:03:20 +1100
Subject: Classic PPC specific ASM (CONFIG_6XX)
In-Reply-To: <4240b916050109074053e328b1@mail.gmail.com>
References: <4240b916050109074053e328b1@mail.gmail.com>
Message-ID: <16865.39960.274092.996530@cargo.ozlabs.ibm.com>

Jerome Glisse writes:

> With 2.6.10 i get a compilation error with disable_6xx_mmu

What kind of machine is this? Could you send me your .config?

I suspect that maybe we aren't defining CONFIG_6XX for PPC970
machines.

Paul.
From david at gibson.dropbear.id.au Tue Jan 11 02:55:20 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 11 Jan 2005 02:55:20 +1100 Subject: [PPC64] Hugepage bugfix Message-ID: <20050110155520.GA22101@localhost.localdomain> Andrew, Linus, please apply: Fix a stupid unbalanced lock bug in the ppc64 hugepage code. Lead rapidly to a crash if both CONFIG_HUGETLB_PAGE and CONFIG_PREEMPT were enabled (even without actually using hugepages at all). Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/ppc64/mm/hugetlbpage.c 2005-01-06 10:47:48.000000000 +1100 +++ working-2.6/arch/ppc64/mm/hugetlbpage.c 2005-01-10 15:16:25.142319552 +1100 @@ -745,7 +745,7 @@ pgdir = mm->context.huge_pgdir; if (! pgdir) - return; + goto out; mm->context.huge_pgdir = NULL; @@ -768,6 +768,7 @@ BUG_ON(memcmp(pgdir, empty_zero_page, PAGE_SIZE)); kmem_cache_free(zero_cache, pgdir); + out: spin_unlock(&mm->page_table_lock); } -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From wli at holomorphy.com Mon Jan 10 16:04:41 2005 From: wli at holomorphy.com (William Lee Irwin III) Date: Sun, 9 Jan 2005 21:04:41 -0800 Subject: [PPC64] Hugepage bugfix In-Reply-To: <20050110155520.GA22101@localhost.localdomain> References: <20050110155520.GA22101@localhost.localdomain> Message-ID: <20050110050441.GA2696@holomorphy.com> On Tue, Jan 11, 2005 at 02:55:20AM +1100, David Gibson wrote: > Andrew, Linus, please apply: > Fix a stupid unbalanced lock bug in the ppc64 hugepage code. Lead > rapidly to a crash if both CONFIG_HUGETLB_PAGE and CONFIG_PREEMPT were > enabled (even without actually using hugepages at all). 
> Signed-off-by: David Gibson Acked-by: William Irwin From david at gibson.dropbear.id.au Tue Jan 11 05:00:04 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 11 Jan 2005 05:00:04 +1100 Subject: [PPC64] Rename perf counter register #defines Message-ID: <20050110180004.GC22101@localhost.localdomain> Andrew, please apply: This patch makes some cleanups to the #defines for various fields in the MMCR0 performance monitor control register. Specifically, the names of a couple of bits are changed so that: a) they are a bit less cumbersomely long and b) they match the names used in the hardware documentation. Signed-off-by: David Gibson Index: working-2.6/include/asm-ppc64/processor.h =================================================================== --- working-2.6.orig/include/asm-ppc64/processor.h 2005-01-10 16:51:10.625391320 +1100 +++ working-2.6/include/asm-ppc64/processor.h 2005-01-10 16:51:28.771295712 +1100 @@ -331,8 +331,8 @@ #define MMCR0_FCECE 0x02000000UL /* freeze counters on enabled condition or event */ /* time base exception enable */ #define MMCR0_TBEE 0x00400000UL /* time base exception enable */ -#define MMCR0_PMC1INTCONTROL 0x00008000UL /* PMC1 count enable*/ -#define MMCR0_PMCNINTCONTROL 0x00004000UL /* PMCn count enable*/ +#define MMCR0_PMC1CE 0x00008000UL /* PMC1 count enable*/ +#define MMCR0_PMCjCE 0x00004000UL /* PMCj count enable*/ #define MMCR0_TRIGGER 0x00002000UL /* TRIGGER enable */ #define MMCR0_PMAO 0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */ #define MMCR0_SHRFC 0x00000040UL /* SHRre freeze conditions between threads */ Index: working-2.6/arch/ppc64/oprofile/op_model_rs64.c =================================================================== --- working-2.6.orig/arch/ppc64/oprofile/op_model_rs64.c 2005-01-10 16:51:10.625391320 +1100 +++ working-2.6/arch/ppc64/oprofile/op_model_rs64.c 2005-01-10 16:51:28.772295560 +1100 @@ -119,7 +119,7 @@ mmcr0 |= 
MMCR0_FCM1|MMCR0_PMXE|MMCR0_FCECE; /* Only applies to POWER3, but should be safe on RS64 */ - mmcr0 |= MMCR0_PMC1INTCONTROL|MMCR0_PMCNINTCONTROL; + mmcr0 |= MMCR0_PMC1CE|MMCR0_PMCjCE; mtspr(SPRN_MMCR0, mmcr0); dbg("setup on cpu %d, mmcr0 %lx\n", smp_processor_id(), Index: working-2.6/arch/ppc64/oprofile/op_model_power4.c =================================================================== --- working-2.6.orig/arch/ppc64/oprofile/op_model_power4.c 2005-01-10 16:51:10.626391168 +1100 +++ working-2.6/arch/ppc64/oprofile/op_model_power4.c 2005-01-10 16:51:28.772295560 +1100 @@ -97,7 +97,7 @@ mtspr(SPRN_MMCR0, mmcr0); mmcr0 |= MMCR0_FCM1|MMCR0_PMXE|MMCR0_FCECE; - mmcr0 |= MMCR0_PMC1INTCONTROL|MMCR0_PMCNINTCONTROL; + mmcr0 |= MMCR0_PMC1CE|MMCR0_PMCjCE; mtspr(SPRN_MMCR0, mmcr0); mtspr(SPRN_MMCR1, mmcr1_val); -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Tue Jan 11 05:01:27 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 11 Jan 2005 05:01:27 +1100 Subject: [PPC64] Functions to reserve performance monitor hardware Message-ID: <20050110180127.GD22101@localhost.localdomain> Andrew, please apply: The PPC64 interrupt code includes a hook to call when an exception from the performance monitor unit occurs. However, there's no way of reserving the hook properly, so if more than one bit of code tries to use it things will get ugly. Currently oprofile is the only user, but there are likely to be more in future e.g. perfctr, if and when it reaches a fit state for merging. This patch creates functions to reserve and release the performance monitor hardware (including its interrupt), and makes oprofile use them. It also creates a new arch/ppc64/kernel/pmc.c, in which we can put any future helper functions for handling the performance monitor counters. 
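To make the reserve/release idea concrete, here is a userspace model of the pattern, not the kernel implementation. It assumes a C11 atomic_flag as a stand-in for the kernel spinlock and a plain void * in place of struct pt_regs; pmc_reserved, pmc_lock() and pmc_unlock() are names invented for the sketch.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*perf_irq_t)(void *regs);	/* void * stands in for pt_regs */

/* Default handler used while nobody owns the hardware. */
static void dummy_perf(void *regs)
{
	(void)regs;
}

static atomic_flag pmc_owner_lock = ATOMIC_FLAG_INIT;	/* models a spinlock */
static int pmc_reserved;
static perf_irq_t perf_irq = dummy_perf;

static void pmc_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&pmc_owner_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void pmc_unlock(void)
{
	atomic_flag_clear_explicit(&pmc_owner_lock, memory_order_release);
}

/* Returns 0 on success, -EBUSY if another user already holds the PMC. */
int reserve_pmc_hardware(perf_irq_t new_perf_irq)
{
	int err = -EBUSY;

	pmc_lock();
	if (!pmc_reserved) {
		pmc_reserved = 1;
		perf_irq = new_perf_irq ? new_perf_irq : dummy_perf;
		err = 0;
	}
	pmc_unlock();
	return err;
}

void release_pmc_hardware(void)
{
	pmc_lock();
	pmc_reserved = 0;
	perf_irq = dummy_perf;	/* fall back to the default handler */
	pmc_unlock();
}

/* Exception entry point: dispatches to whoever currently owns the hook. */
void performance_monitor_exception(void *regs)
{
	perf_irq(regs);
}
```

A second caller that tries to reserve while the first still holds the hardware gets -EBUSY and has to back off, which is the behaviour the description above asks for.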
Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/kernel/pmc.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/ppc64/kernel/pmc.c 2005-01-10 16:32:49.733411536 +1100 @@ -0,0 +1,65 @@ +/* + * linux/arch/ppc64/kernel/pmc.c + * + * Copyright (C) 2004 David Gibson, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include + +#include +#include + +/* Ensure exceptions are disabled */ +static void dummy_perf(struct pt_regs *regs) +{ + unsigned int mmcr0 = mfspr(SPRN_MMCR0); + + mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO); + mtspr(SPRN_MMCR0, mmcr0); +} + +static spinlock_t pmc_owner_lock = SPIN_LOCK_UNLOCKED; +static void *pmc_owner_caller; /* mostly for debugging */ +perf_irq_t perf_irq = dummy_perf; + +int reserve_pmc_hardware(perf_irq_t new_perf_irq) +{ + int err = -EBUSY;; + + spin_lock(&pmc_owner_lock); + + if (pmc_owner_caller) { + printk(KERN_WARNING "reserve_pmc_hardware: " + "PMC hardware busy (reserved by caller %p)\n", + pmc_owner_caller); + goto out; + } + + pmc_owner_caller = __builtin_return_address(0); + perf_irq = new_perf_irq ? : dummy_perf; + + err = 0; + + out: + spin_unlock(&pmc_owner_lock); + return err; +} + +void release_pmc_hardware(void) +{ + spin_lock(&pmc_owner_lock); + + WARN_ON(! 
pmc_owner_caller); + + pmc_owner_caller = NULL; + perf_irq = dummy_perf; + + spin_unlock(&pmc_owner_lock); +} Index: working-2.6/arch/ppc64/kernel/traps.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/traps.c 2005-01-10 10:51:31.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/traps.c 2005-01-10 16:33:43.154412536 +1100 @@ -40,6 +40,7 @@ #include #include #include +#include #ifdef CONFIG_DEBUGGER int (*__debugger)(struct pt_regs *regs); @@ -449,18 +450,7 @@ die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT); } -/* Ensure exceptions are disabled */ -static void dummy_perf(struct pt_regs *regs) -{ - unsigned int mmcr0 = mfspr(SPRN_MMCR0); - - mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO); - mtspr(SPRN_MMCR0, mmcr0); -} - -void (*perf_irq)(struct pt_regs *) = dummy_perf; - -EXPORT_SYMBOL(perf_irq); +extern perf_irq_t perf_irq; void performance_monitor_exception(struct pt_regs *regs) { Index: working-2.6/include/asm-ppc64/pmc.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-ppc64/pmc.h 2005-01-10 15:24:40.217406672 +1100 @@ -0,0 +1,29 @@ +/* + * pmc.h + * Copyright (C) 2004 David Gibson, IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +#ifndef _PPC64_PMC_H +#define _PPC64_PMC_H + +#include + +typedef void (*perf_irq_t)(struct pt_regs *); + +int reserve_pmc_hardware(perf_irq_t new_perf_irq); +void release_pmc_hardware(void); + +#endif /* _PPC64_PMC_H */ Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2005-01-10 10:51:31.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/Makefile 2005-01-10 15:24:40.218406520 +1100 @@ -11,7 +11,7 @@ udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \ ptrace32.o signal32.o rtc.o init_task.o \ lmb.o cputable.o cpu_setup_power4.o idle_power4.o \ - iommu.o sysfs.o + iommu.o sysfs.o pmc.o obj-$(CONFIG_PPC_OF) += of_device.o -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From anil_411 at yahoo.com Mon Jan 10 18:49:30 2005 From: anil_411 at yahoo.com (Anil Kumar Prasad) Date: Sun, 9 Jan 2005 23:49:30 -0800 (PST) Subject: ioremap of pci region on pSeries LPAR vs SMP Message-ID: <20050110074930.92901.qmail@web11508.mail.yahoo.com> Hi, I am using SLES9 default kernel(2.6.5). I have a piece of code where i do ioremap on pci memory region. It works on JS20 machine where linux runs in partition mode while it causes SLB miss on SMP box(P630) and subsequently panics. On JS20, i get va in IO_REGION (0xE000....) while on p630 ioremap returns address in EEH_REGION(0xA000...). As soon as i try to dereference this returned va on p630, kernel crashes(dump is at the end of mail). I looked in slab.c:slb_allocate(). it doesn't look like that SLB miss in EEH_REGION will ever get through 'us REGION_ID check will return user address. 
Did i miss something? Please help. Thanks a lot, Anil. ------------------ SMP NR_CPUS=128 NUMA PSERIES NIP: D000000000649CB4 XER: 0000000000000000 LR: D000000000649CA4 REGS: c0000003f7897670 TRAP: 0380 Tainted: GF U (2.6.5-7.97-pseries64) MSR: 9000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 DAR: a000000082060010, DSISR: 0000000002200000 TASK: c00000003f1b5340[5037] 'modprobe' THREAD: c0000003f7894000 CPU: 0 GPR00: 0000000000000001 C0000003F78978F0 D0000000006AEB70 00000000001E8480 GPR04: 0000000000000000 0000000000000004 0000000028088422 0000000000000000 GPR08: 0000000000000000 FFFFFFFFFFFFFFFF C0000000006CAC80 0000000000000080 GPR12: 0000000048004028 C000000000444000 D0000000006A6DD9 D0000000006A6DA8 GPR16: 0000000000000001 0000000000000000 C000000000411770 C000000000411670 GPR20: C000000000411050 D0000000006A6DA8 0000000000001867 0000000000006278 GPR24: 0000000000000001 C0000003F7897A40 C0000001FD158080 C0000001FD158180 GPR28: 0000000000000000 A000000082060010 D0000000006A8F38 C000000000411000 ---------------------------------------------------
From paulus at samba.org Mon Jan 10 20:10:59 2005 From: paulus at samba.org (Paul Mackerras) Date: Mon, 10 Jan 2005 20:10:59 +1100 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <20050110074930.92901.qmail@web11508.mail.yahoo.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> Message-ID: <16866.18083.212727.327170@cargo.ozlabs.ibm.com> Anil Kumar Prasad writes: > On JS20, i get va in IO_REGION (0xE000....) while on > p630 ioremap returns address in > EEH_REGION(0xA000...). As soon as i try to dereference > this returned va on p630, kernel crashes(dump is at > the end of mail).
You shouldn't ever directly dereference the result of ioremap. You have to use readb/readw/readl and writeb/writew/writel. Paul. From trini at kernel.crashing.org Tue Jan 11 01:52:19 2005 From: trini at kernel.crashing.org (Tom Rini) Date: Mon, 10 Jan 2005 07:52:19 -0700 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <16865.39960.274092.996530@cargo.ozlabs.ibm.com> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> Message-ID: <20050110145219.GB2226@smtp.west.cox.net> On Mon, Jan 10, 2005 at 08:03:20AM +1100, Paul Mackerras wrote: > Jerome Glisse writes: > > > With 2.6.10 i get a compilation error with disable_6xx_mmu > > What kind of machine is this? Could you send me your .config? > > I suspect that maybe we aren't defining CONFIG_6XX for PPC970 > machines. Indeed. It might make most sense to do something like: Signed-off-by: Tom Rini --- 1.40/arch/ppc/boot/simple/Makefile 2005-01-03 16:49:19 -07:00 +++ edited/arch/ppc/boot/simple/Makefile 2005-01-10 07:51:34 -07:00 @@ -112,11 +112,15 @@ end-$(pcore) := pcore cacheflag-$(pcore) := -include $(clear_L2_L3) +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real +# machine. 
+ifeq ($(CONFIG_6xx),y) zimage-$(CONFIG_PPC_PREP) := zImage-PPLUS zimageinitrd-$(CONFIG_PPC_PREP) := zImage.initrd-PPLUS extra.o-$(CONFIG_PPC_PREP) := prepmap.o misc-$(CONFIG_PPC_PREP) += misc-prep.o mpc10x_memory.o end-$(CONFIG_PPC_PREP) := prep +endif end-$(CONFIG_SANDPOINT) := sandpoint cacheflag-$(CONFIG_SANDPOINT) := -include $(clear_L2_L3) -- Tom Rini http://gate.crashing.org/~trini/ From hch at lst.de Tue Jan 11 03:39:15 2005 From: hch at lst.de (Christoph Hellwig) Date: Mon, 10 Jan 2005 17:39:15 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050110145219.GB2226@smtp.west.cox.net> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> Message-ID: <20050110163914.GA11906@lst.de> On Mon, Jan 10, 2005 at 07:52:19AM -0700, Tom Rini wrote: > +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real > +# machine. Maybe we should prevent setting PPC_PREP to y for PPC970 instead? From trini at kernel.crashing.org Tue Jan 11 03:44:02 2005 From: trini at kernel.crashing.org (Tom Rini) Date: Mon, 10 Jan 2005 09:44:02 -0700 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050110163914.GA11906@lst.de> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <20050110163914.GA11906@lst.de> Message-ID: <20050110164402.GF2226@smtp.west.cox.net> On Mon, Jan 10, 2005 at 05:39:15PM +0100, Christoph Hellwig wrote: > On Mon, Jan 10, 2005 at 07:52:19AM -0700, Tom Rini wrote: > > +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real > > +# machine. > > Maybe we should prevent setting PPC_PREP to y for PPC970 instead? I don't know if that'll compile. It'd be nice because it means we could try splitting the PREP stuff out of the OpenFirmware (pmac/chrp) stuff again. 
-- Tom Rini http://gate.crashing.org/~trini/ From linas at austin.ibm.com Tue Jan 11 04:47:16 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Mon, 10 Jan 2005 11:47:16 -0600 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <16866.18083.212727.327170@cargo.ozlabs.ibm.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> <16866.18083.212727.327170@cargo.ozlabs.ibm.com> Message-ID: <20050110174716.GW22274@austin.ibm.com> Hi Paul, On Mon, Jan 10, 2005 at 08:10:59PM +1100, Paul Mackerras was heard to remark: > Anil Kumar Prasad writes: > > > On JS20, i get va in IO_REGION (0xE000....) while on > > p630 ioremap returns address in > > EEH_REGION(0xA000...). As soon as i try to dereference > > this returned va on p630, kernel crashes(dump is at > > the end of mail). > > You shouldn't ever directly dereference the result of ioremap. You > have to use readb/readw/readl and writeb/writew/writel. Paul, Please note that someone removed the EEH_REGION stuff recently, october-ish I think. I don't know why, I thought it was something you condoned. And so in the latest kernels, it *is* legal to directly dereference the result of ioremap. That is, Anil wouldn't have seen this bug if he'd been running the current BK sources. Was removing this mechanism the right thing to do? If so, why? It seemed like a great way to force everyone to use the readb/etc macros. 
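As an aside for anyone reading the archive: the disagreement here is whether the value ioremap() returns is a pointer or an opaque cookie. The following is only a user-space sketch of the cookie view, not kernel code. model_ioremap(), model_readl(), model_writel() and COOKIE_POISON are names invented for illustration, the "device" is a plain array, and __sync_synchronize() (a GCC/Clang builtin) merely stands in for whatever ordering the real accessors provide on a given architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical poison: flip bits in the top nibble so the cookie is
 * never equal to the real address it was derived from. */
#define COOKIE_POISON ((uintptr_t)0xA << (sizeof(uintptr_t) * 8 - 4))

typedef uintptr_t io_cookie_t;

/* Stand-in for a device's register window. */
static uint32_t fake_regs[4];

/* Hand out an opaque cookie instead of a directly usable pointer. */
static io_cookie_t model_ioremap(uint32_t *regs)
{
	return (uintptr_t)regs ^ COOKIE_POISON;
}

/* Only the accessors know how to recover the real address. */
static volatile uint32_t *cookie_to_ptr(io_cookie_t cookie, unsigned int index)
{
	return (volatile uint32_t *)(cookie ^ COOKIE_POISON) + index;
}

static uint32_t model_readl(io_cookie_t cookie, unsigned int index)
{
	uint32_t val = *cookie_to_ptr(cookie, index);

	__sync_synchronize();	/* later accesses stay after this load */
	return val;
}

static void model_writel(uint32_t val, io_cookie_t cookie, unsigned int index)
{
	__sync_synchronize();	/* earlier accesses stay before this store */
	*cookie_to_ptr(cookie, index) = val;
}
```

A caller would write something like `io_cookie_t c = model_ioremap(fake_regs); model_writel(1, c, 0);`. Dereferencing `c` directly is the mistake the poison is meant to expose, and it would also skip the barriers in the accessors.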
--linas From j.glisse at gmail.com Tue Jan 11 05:14:28 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Mon, 10 Jan 2005 19:14:28 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050110145219.GB2226@smtp.west.cox.net> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> Message-ID: <4240b9160501101014317b8d85@mail.gmail.com> > Signed-off-by: Tom Rini > > --- 1.40/arch/ppc/boot/simple/Makefile 2005-01-03 16:49:19 -07:00 > +++ edited/arch/ppc/boot/simple/Makefile 2005-01-10 07:51:34 -07:00 > @@ -112,11 +112,15 @@ > end-$(pcore) := pcore > cacheflag-$(pcore) := -include $(clear_L2_L3) > > +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real > +# machine. > +ifeq ($(CONFIG_6xx),y) > zimage-$(CONFIG_PPC_PREP) := zImage-PPLUS > zimageinitrd-$(CONFIG_PPC_PREP) := zImage.initrd-PPLUS > extra.o-$(CONFIG_PPC_PREP) := prepmap.o > misc-$(CONFIG_PPC_PREP) += misc-prep.o mpc10x_memory.o > end-$(CONFIG_PPC_PREP) := prep > +endif > > end-$(CONFIG_SANDPOINT) := sandpoint > cacheflag-$(CONFIG_SANDPOINT) := -include $(clear_L2_L3) > This do not compile with this patch maybe need also to define CONFIG_6xx if PPC970 is selected as processor ? The errors are: undefined reference for cols, lines, vidmems, scroll, orig_x, orig_y in functions puts, ClearVideoMemory, putc Jerome Glisse From j.glisse at gmail.com Tue Jan 11 05:16:22 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Mon, 10 Jan 2005 19:16:22 +0100 Subject: U3 G5 AGP support patch (v4) In-Reply-To: <4240b91605010912414a5b1b67@mail.gmail.com> References: <4240b916050109072621440269@mail.gmail.com> <20050109160614.GA22839@lst.de> <4240b91605010909463e44bba8@mail.gmail.com> <4240b91605010912414a5b1b67@mail.gmail.com> Message-ID: <4240b916050110101647cfb8f9@mail.gmail.com> > It seems there is bug somewhere in my agp patch. 
I was playing with > r300 radeon and > i get a hard lockup (quite used to that while playing with r300 thought :() > > But after a bit of investigation it seems to be related to agp. Right now i am > porting an old tools from dri that test agpgart & thus agp. I finally may really > need to totaly split the u3 driver from the old uninorth. > > I will give a deeper look to track down the issue. In the mean time if some > one could test agp & radeon r200 on a g5. You will certainly lockup your g5 > but it should not burn, at least here i just got some smoke ;) > Finally this was because i was doing some nasty stuff elsewhere :) Thus AGP seems to work well, at least over here with some r300 test program using agp :) best, Jerome Glisse From trini at kernel.crashing.org Tue Jan 11 05:29:41 2005 From: trini at kernel.crashing.org (Tom Rini) Date: Mon, 10 Jan 2005 11:29:41 -0700 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <4240b9160501101014317b8d85@mail.gmail.com> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <4240b9160501101014317b8d85@mail.gmail.com> Message-ID: <20050110182940.GA3391@smtp.west.cox.net> On Mon, Jan 10, 2005 at 07:14:28PM +0100, Jerome Glisse wrote: > > Signed-off-by: Tom Rini > > > > --- 1.40/arch/ppc/boot/simple/Makefile 2005-01-03 16:49:19 -07:00 > > +++ edited/arch/ppc/boot/simple/Makefile 2005-01-10 07:51:34 -07:00 > > @@ -112,11 +112,15 @@ > > end-$(pcore) := pcore > > cacheflag-$(pcore) := -include $(clear_L2_L3) > > > > +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real > > +# machine. 
> > +ifeq ($(CONFIG_6xx),y) > > zimage-$(CONFIG_PPC_PREP) := zImage-PPLUS > > zimageinitrd-$(CONFIG_PPC_PREP) := zImage.initrd-PPLUS > > extra.o-$(CONFIG_PPC_PREP) := prepmap.o > > misc-$(CONFIG_PPC_PREP) += misc-prep.o mpc10x_memory.o > > end-$(CONFIG_PPC_PREP) := prep > > +endif > > > > end-$(CONFIG_SANDPOINT) := sandpoint > > cacheflag-$(CONFIG_SANDPOINT) := -include $(clear_L2_L3) > > > > This do not compile with this patch maybe need also to define > CONFIG_6xx if PPC970 is selected as processor ? I have a feeling CONFIG_6xx isn't selected for a good reason. Can you try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig and seeing if you can build / boot ? Thanks. -- Tom Rini http://gate.crashing.org/~trini/ From j.glisse at gmail.com Tue Jan 11 05:59:50 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Mon, 10 Jan 2005 19:59:50 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050110182940.GA3391@smtp.west.cox.net> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <4240b9160501101014317b8d85@mail.gmail.com> <20050110182940.GA3391@smtp.west.cox.net> Message-ID: <4240b91605011010593d2f3b3d@mail.gmail.com> > I have a feeling CONFIG_6xx isn't selected for a good reason. Can you > try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig > and seeing if you can build / boot ? Thanks. > > -- > Tom Rini > http://gate.crashing.org/~trini/ > Seems that this flags is linked to many things :) I tried removing PPC_PREP bool but the kernel fail to compile with again new errors : arch/ppc/kernel/built-in.o(.init.text+0x610): In function `DoSyscall': arch/ppc/kernel/entry.S: undefined reference to `prep_init' arch/ppc/platforms/built-in.o(.pmac.text+0x936): In function 'note_bootable_part': : undefined reference to `boot_dev' I attach my config, someone asked me for that previously but i crashed my system since, thus here it is. 
Jerome Glisse -------------- next part -------------- A non-text attachment was scrubbed... Name: config-ppc970 Type: application/octet-stream Size: 27921 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050110/902cf012/attachment.obj From trini at kernel.crashing.org Tue Jan 11 06:12:48 2005 From: trini at kernel.crashing.org (Tom Rini) Date: Mon, 10 Jan 2005 12:12:48 -0700 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <4240b91605011010593d2f3b3d@mail.gmail.com> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <4240b9160501101014317b8d85@mail.gmail.com> <20050110182940.GA3391@smtp.west.cox.net> <4240b91605011010593d2f3b3d@mail.gmail.com> Message-ID: <20050110191248.GB3391@smtp.west.cox.net> On Mon, Jan 10, 2005 at 07:59:50PM +0100, Jerome Glisse wrote: > > I have a feeling CONFIG_6xx isn't selected for a good reason. Can you > > try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig > > and seeing if you can build / boot ? Thanks. > > > > -- > > Tom Rini > > http://gate.crashing.org/~trini/ > > > > Seems that this flags is linked to many things :) I tried removing PPC_PREP > bool but the kernel fail to compile with again new errors : > One last thing before we just do what you suggested originally, can you hack it so that PPC_PREP is still set, but on 970 we still set CONFIG_6xx? Thanks again. 
-- Tom Rini http://gate.crashing.org/~trini/ From j.glisse at gmail.com Tue Jan 11 06:31:29 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Mon, 10 Jan 2005 20:31:29 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050110191248.GB3391@smtp.west.cox.net> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <4240b9160501101014317b8d85@mail.gmail.com> <20050110182940.GA3391@smtp.west.cox.net> <4240b91605011010593d2f3b3d@mail.gmail.com> <20050110191248.GB3391@smtp.west.cox.net> Message-ID: <4240b91605011011314bb06814@mail.gmail.com> On Mon, 10 Jan 2005 12:12:48 -0700, Tom Rini wrote: > On Mon, Jan 10, 2005 at 07:59:50PM +0100, Jerome Glisse wrote: > > > I have a feeling CONFIG_6xx isn't selected for a good reason. Can you > > > try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig > > > and seeing if you can build / boot ? Thanks. > > > > > > -- > > > Tom Rini > > > http://gate.crashing.org/~trini/ > > > > > > > Seems that this flags is linked to many things :) I tried removing PPC_PREP > > bool but the kernel fail to compile with again new errors : > > > > One last thing before we just do what you suggested originally, can you > hack it so that PPC_PREP is still set, but on 970 we still set > CONFIG_6xx? Thanks again. > > -- > Tom Rini > http://gate.crashing.org/~trini/ > This issue must be strongly linked with Murphy's Law. I got another compile error when I add CONFIG_6xx=y to my kernel config. LD .tmp_vmlinux1 ld: arch/ppc/kernel/idle_6xx.o: No such file: No such file or directory Unfortunately I have to travel (a trip for my studies) and I won't have access to any G5 or PPC970 with Linux on it until I come back Friday or Saturday. I may still be able to read my mail until then. Does this disable_6xx_mmu function do something that we should really have on PPC970?
I haven't had enough time to look at this function and understand it. By the way, even though I'm pretty sure this is not related, my kernel is patched with one of my patches (I posted it on this mailing list) that adds support for the U3 AGP bridge on the G5. This patch only affects a few files and, if I remember well, I have tested without it too, with no success. Files affected by my patch: pciids.h, uninorth.c (char/driver/agp), uninorth.h (asm-ppc&64), and some changes in the Kconfig of (char/driver/agp). One strange thing is that no one except me reports errors on compilation? Does no one use Linux on a G5, am I alone :) ? best, Jerome Glisse From domen at coderock.org Tue Jan 11 06:59:58 2005 From: domen at coderock.org (domen at coderock.org) Date: Mon, 10 Jan 2005 20:59:58 +0100 Subject: [patch 1/1] ppc64: semicolon in rtasd.c Message-ID: <20050110195959.4D66A1F203@trashy.coderock.org> Hi. Comments and indentation suggest this was wrong. Signed-off-by: Domen Puncer --- kj-domen/arch/ppc64/kernel/rtasd.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) diff -puN arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c arch/ppc64/kernel/rtasd.c --- kj/arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c 2005-01-10 18:00:30.000000000 +0100 +++ kj-domen/arch/ppc64/kernel/rtasd.c 2005-01-10 18:00:30.000000000 +0100 @@ -486,7 +486,7 @@ static int __init rtas_init(void) /* No RTAS, only warn if we are on a pSeries box */ if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) { - if (systemcfg->platform & PLATFORM_PSERIES); + if (systemcfg->platform & PLATFORM_PSERIES) printk(KERN_ERR "rtasd: no event-scan on system\n"); return 1; } _ From hollis at penguinppc.org Tue Jan 11 07:15:57 2005 From: hollis at penguinppc.org (Hollis Blanchard) Date: Mon, 10 Jan 2005 20:15:57 +0000 Subject: email message sizes In-Reply-To: <78DE72FE-631B-11D9-AD26-000A95A0560C@penguinppc.org> References: <78DE72FE-631B-11D9-AD26-000A95A0560C@penguinppc.org> Message-ID: <200501102015.57394.hollis@penguinppc.org> On
Monday 10 January 2005 15:22, Hollis Blanchard wrote: > Hi all, I am one of two people who moderates these mailing lists. On > occasion, people send large emails to these lists. I am of the opinion > that 1MB emails should not be mass-mailed, but if you all have no > problem with that then I will approve them. > > So are any of you on modems, or operate near the limits of your mail > quotas? I'd like to hear comments either way: how large is ok to post > to these mailing lists? So far I have received 5 private mails indicating that 100KB is a reasonable maximum. If you disagree please speak up... -Hollis From paulus at samba.org Tue Jan 11 08:41:48 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 11 Jan 2005 08:41:48 +1100 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <20050110174716.GW22274@austin.ibm.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> <16866.18083.212727.327170@cargo.ozlabs.ibm.com> <20050110174716.GW22274@austin.ibm.com> Message-ID: <16866.63132.352016.732484@cargo.ozlabs.ibm.com> Linas Vepstas writes: > Please note that someone removed the EEH_REGION stuff recently, > october-ish I think. I don't know why, I thought it was something > you condoned. And so in the latest kernels, it *is* legal to directly > dereference the result of ioremap. It might work, but it's not legal on any architecture. I thought there was a file in the Documentation directory explaining that, but I can't find it now. Certainly it has been discussed on various mailing lists in the past. See for example: http://uwsg.iu.edu/hypermail/linux/kernel/0007.3/0591.html On ppc and ppc64, the ioremap return happens to be a valid effective address, but dereferencing it directly is still not right, since if you do that you miss out on the barriers you need to ensure that your loads and stores hit the device in program order. > Was removing this mechanism the right thing to do? If so, why? 
It was an enormous simplification and Linus was keen to do it. He actually looks at our code from time to time now that his desktop machine is a G5. :) > It seemed like a great way to force everyone to use the > readb/etc macros. Some architectures do in fact use ioremap cookie poisoning for that reason. We could do that as a debug option. Paul. From olof at austin.ibm.com Tue Jan 11 09:23:40 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Mon, 10 Jan 2005 16:23:40 -0600 Subject: [PPC64] Functions to reserve performance monitor hardware In-Reply-To: <20050110180127.GD22101@localhost.localdomain> References: <20050110180127.GD22101@localhost.localdomain> Message-ID: <20050110222340.GA13731@austin.ibm.com> On Tue, Jan 11, 2005 at 05:01:27AM +1100, David Gibson wrote: > This patch creates functions to reserve and release the performance > monitor hardware (including its interrupt), and makes oprofile use > them. I don't see where you make oprofile use the functions? op_model_* changes aren't included in the patch. > +int reserve_pmc_hardware(perf_irq_t new_perf_irq) > +{ > + int err = -EBUSY;; Keeping an extra semicolon around in case you need one in a hurry? :) > + spin_lock(&pmc_owner_lock); > + > + if (pmc_owner_caller) { > + printk(KERN_WARNING "reserve_pmc_hardware: " > + "PMC hardware busy (reserved by caller %p)\n", > + pmc_owner_caller); > + goto out; > + } > + > + pmc_owner_caller = __builtin_return_address(0); > + perf_irq = new_perf_irq ? : dummy_perf; > + > + err = 0; Maybe I'm the only one with such an opinion, but I find it more readable to set the error code in the error case (if section above) instead of defaulting to error and clearing it before returning. :) > + pmc_owner_caller = NULL; > + perf_irq = dummy_perf; > + > + spin_unlock(&pmc_owner_lock); Current oprofile code has an implicit mb(); after restoring perf_irq. I think the implied lwsync in spin_unlock is sufficient, but I wanted to mention it. 
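For readers who want to poke at the reserve/release pattern under review, here is a rough user-space model. It is a sketch under stated assumptions, not the patch itself: model_reserve(), model_release(), pmc_lock() and pmc_unlock() are invented names, a C11 atomic_flag spinlock stands in for the kernel spinlock, and a plain int replaces the saved return address. It sets the error code inside the error branch, as suggested above, and the release ordering on unlock is the user-space analogue of the barrier question raised here.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*perf_handler_t)(void);

/* Default handler used whenever nobody has registered one. */
static void dummy_handler(void)
{
}

static atomic_flag owner_lock = ATOMIC_FLAG_INIT;
static int reserved;				/* protected by owner_lock */
static perf_handler_t handler = dummy_handler;	/* protected by owner_lock */

static void pmc_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&owner_lock,
						 memory_order_acquire))
		;	/* spin until the flag is free */
}

static void pmc_unlock(void)
{
	/* Release ordering: stores made under the lock are visible
	 * to whoever takes the lock next. */
	atomic_flag_clear_explicit(&owner_lock, memory_order_release);
}

static int model_reserve(perf_handler_t new_handler)
{
	int err = 0;

	pmc_lock();

	if (reserved) {
		err = -EBUSY;	/* error code set where the error occurs */
		goto out;
	}

	reserved = 1;
	handler = new_handler ? new_handler : dummy_handler;

out:
	pmc_unlock();
	return err;
}

static void model_release(void)
{
	pmc_lock();
	reserved = 0;
	handler = dummy_handler;
	pmc_unlock();
}
```

A second model_reserve() call before model_release() returns -EBUSY, and passing NULL leaves the dummy handler installed, which covers a user that wants the hardware but no interrupt.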
How do you expect the function to be used, will there really be users reserving the hardware without registering the interrupt handler? If there are no such users then it could be nice to reserve using the handler instead of the return address. -Olof From anton at samba.org Tue Jan 11 10:00:15 2005 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Jan 2005 10:00:15 +1100 Subject: [patch 1/1] ppc64: semicolon in rtasd.c In-Reply-To: <20050110195959.4D66A1F203@trashy.coderock.org> References: <20050110195959.4D66A1F203@trashy.coderock.org> Message-ID: <20050110230015.GB14239@krispykreme.ozlabs.ibm.com> Nice catch! Anton -- From: Domen Puncer semicolon in rtasd.c Signed-off-by: Domen Puncer Acked-by: Anton Blanchard diff -puN arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c arch/ppc64/kernel/rtasd.c --- kj/arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c 2005-01-10 18:00:30.000000000 +0100 +++ kj-domen/arch/ppc64/kernel/rtasd.c 2005-01-10 18:00:30.000000000 +0100 @@ -486,7 +486,7 @@ static int __init rtas_init(void) /* No RTAS, only warn if we are on a pSeries box */ if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) { - if (systemcfg->platform & PLATFORM_PSERIES); + if (systemcfg->platform & PLATFORM_PSERIES) printk(KERN_ERR "rtasd: no event-scan on system\n"); return 1; } _ From anton at samba.org Tue Jan 11 11:08:45 2005 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Jan 2005 11:08:45 +1100 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <16866.63132.352016.732484@cargo.ozlabs.ibm.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> <16866.18083.212727.327170@cargo.ozlabs.ibm.com> <20050110174716.GW22274@austin.ibm.com> <16866.63132.352016.732484@cargo.ozlabs.ibm.com> Message-ID: <20050111000845.GC14239@krispykreme.ozlabs.ibm.com> Hi, > It was an enormous simplification and Linus was keen to do it. He > actually looks at our code from time to time now that his desktop > machine is a G5. 
:) Roland (the infiniband guy) and Linus were behind it: http://marc.theaimsgroup.com/?l=linux-kernel&m=109579598620069&w=2 Looks like it was due to __raw_* not having any EEH checks. As a side note its a worry that we dont have IO macros that order but dont byte swap. __raw_* (which doesnt order) is going to catch out a lot of driver writers I suspect. > Some architectures do in fact use ioremap cookie poisoning for that > reason. We could do that as a debug option. Ive seen HPC stuff that wants to be able to mmap a PCI cards resources into userspace. Their hack on ppc64 was to look at the high nibble of the address and convert it to a non EEH address if required :) Im not sure how best to solve the userspace mmap issue but there are a few groups wanting that. Anton From david at gibson.dropbear.id.au Tue Jan 11 21:57:07 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 11 Jan 2005 21:57:07 +1100 Subject: [PPC64] Functions to reserve performance monitor hardware In-Reply-To: <20050110222340.GA13731@austin.ibm.com> References: <20050110180127.GD22101@localhost.localdomain> <20050110222340.GA13731@austin.ibm.com> Message-ID: <20050111105707.GC28175@localhost.localdomain> On Mon, Jan 10, 2005 at 04:23:40PM -0600, Olof Johansson wrote: > On Tue, Jan 11, 2005 at 05:01:27AM +1100, David Gibson wrote: > > > This patch creates functions to reserve and release the performance > > monitor hardware (including its interrupt), and makes oprofile use > > them. > > I don't see where you make oprofile use the functions? op_model_* > changes aren't included in the patch. Ah, bugger. I could have sworn I made the changes, wonder where I managed to drop them. > > +int reserve_pmc_hardware(perf_irq_t new_perf_irq) > > +{ > > + int err = -EBUSY;; > > Keeping an extra semicolon around in case you need one in a hurry? :) Oh, dear, I clearly wasn't having a good day. 
> > + spin_lock(&pmc_owner_lock); > > + > > + if (pmc_owner_caller) { > > + printk(KERN_WARNING "reserve_pmc_hardware: " > > + "PMC hardware busy (reserved by caller %p)\n", > > + pmc_owner_caller); > > + goto out; > > + } > > + > > + pmc_owner_caller = __builtin_return_address(0); > > + perf_irq = new_perf_irq ? : dummy_perf; > > + > > + err = 0; > > Maybe I'm the only one with such an opinion, but I find it more readable > to set the error code in the error case (if section above) instead of > defaulting to error and clearing it before returning. :) Actually, I think I do to, but I've been experimenting with this style, since it seems to be rather common in the kernel. Anyway, revised below. > > + pmc_owner_caller = NULL; > > + perf_irq = dummy_perf; > > + > > + spin_unlock(&pmc_owner_lock); > > Current oprofile code has an implicit mb(); after restoring perf_irq. I > think the implied lwsync in spin_unlock is sufficient, but I wanted to > mention it. Yes, I did think about that, and figured the barrier in the spin_unlock() should be sufficient. > How do you expect the function to be used, will there really be users > reserving the hardware without registering the interrupt handler? I think it's entirely plausible that there could be. It would seem a bit yucky for a user that wasn't using interrupts to have to define their own copy of the dummy_perf() routine. > If > there are no such users then it could be nice to reserve using the > handler instead of the return address. Well, bear in mind that from the semantics point of view it's only the non-nullness of the return address that matters, so essentially it's just a flag. The rest of the return address is just there for debugging convenience. Anyway, patch with the abovementioned stupidities removed follows. Andrew, please apply: The PPC64 interrupt code includes a hook to call when an exception from the performance monitor unit occurs. 
However, there's no way of reserving the hook properly, so if more than one bit of code tries to use it things will get ugly. Currently oprofile is the only user, but there are likely to be more in future e.g. perfctr, if and when it reaches a fit state for merging.

This patch creates functions to reserve and release the performance monitor hardware (including its interrupt), and makes oprofile use them. It also creates a new arch/ppc64/kernel/pmc.c, in which we can put any future helper functions for handling the performance monitor counters.

Signed-off-by: David Gibson

Index: working-2.6/arch/ppc64/kernel/pmc.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/arch/ppc64/kernel/pmc.c	2005-01-11 10:37:52.001422584 +1100
@@ -0,0 +1,64 @@
+/*
+ *  linux/arch/ppc64/kernel/pmc.c
+ *
+ *  Copyright (C) 2004 David Gibson, IBM Corporation.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#include
+#include
+#include
+
+#include
+#include
+
+/* Ensure exceptions are disabled */
+static void dummy_perf(struct pt_regs *regs)
+{
+	unsigned int mmcr0 = mfspr(SPRN_MMCR0);
+
+	mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO);
+	mtspr(SPRN_MMCR0, mmcr0);
+}
+
+static spinlock_t pmc_owner_lock = SPIN_LOCK_UNLOCKED;
+static void *pmc_owner_caller; /* mostly for debugging */
+perf_irq_t perf_irq = dummy_perf;
+
+int reserve_pmc_hardware(perf_irq_t new_perf_irq)
+{
+	int err = 0;
+
+	spin_lock(&pmc_owner_lock);
+
+	if (pmc_owner_caller) {
+		printk(KERN_WARNING "reserve_pmc_hardware: "
+		       "PMC hardware busy (reserved by caller %p)\n",
+		       pmc_owner_caller);
+		err = -EBUSY;
+		goto out;
+	}
+
+	pmc_owner_caller = __builtin_return_address(0);
+	perf_irq = new_perf_irq ? : dummy_perf;
+
+ out:
+	spin_unlock(&pmc_owner_lock);
+	return err;
+}
+
+void release_pmc_hardware(void)
+{
+	spin_lock(&pmc_owner_lock);
+
+	WARN_ON(! pmc_owner_caller);
+
+	pmc_owner_caller = NULL;
+	perf_irq = dummy_perf;
+
+	spin_unlock(&pmc_owner_lock);
+}
Index: working-2.6/arch/ppc64/kernel/traps.c
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/traps.c	2005-01-11 10:36:44.555424864 +1100
+++ working-2.6/arch/ppc64/kernel/traps.c	2005-01-11 10:36:46.969324088 +1100
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_DEBUGGER
 int (*__debugger)(struct pt_regs *regs);
@@ -449,18 +450,7 @@
 	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
 }
 
-/* Ensure exceptions are disabled */
-static void dummy_perf(struct pt_regs *regs)
-{
-	unsigned int mmcr0 = mfspr(SPRN_MMCR0);
-
-	mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO);
-	mtspr(SPRN_MMCR0, mmcr0);
-}
-
-void (*perf_irq)(struct pt_regs *) = dummy_perf;
-
-EXPORT_SYMBOL(perf_irq);
+extern perf_irq_t perf_irq;
 
 void performance_monitor_exception(struct pt_regs *regs)
 {
Index: working-2.6/include/asm-ppc64/pmc.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/include/asm-ppc64/pmc.h	2005-01-11 10:36:46.970323936 +1100
@@ -0,0 +1,29 @@
+/*
+ * pmc.h
+ * Copyright (C) 2004 David Gibson, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#ifndef _PPC64_PMC_H
+#define _PPC64_PMC_H
+
+#include
+
+typedef void (*perf_irq_t)(struct pt_regs *);
+
+int reserve_pmc_hardware(perf_irq_t new_perf_irq);
+void release_pmc_hardware(void);
+
+#endif /* _PPC64_PMC_H */
Index: working-2.6/arch/ppc64/kernel/Makefile
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/Makefile	2005-01-11 10:36:44.555424864 +1100
+++ working-2.6/arch/ppc64/kernel/Makefile	2005-01-11 10:36:46.970323936 +1100
@@ -11,7 +11,7 @@
 			udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \
 			ptrace32.o signal32.o rtc.o init_task.o \
 			lmb.o cputable.o cpu_setup_power4.o idle_power4.o \
-			iommu.o sysfs.o
+			iommu.o sysfs.o pmc.o
 
 obj-$(CONFIG_PPC_OF) +=	of_device.o
 
Index: working-2.6/arch/ppc64/oprofile/common.c
===================================================================
--- working-2.6.orig/arch/ppc64/oprofile/common.c	2005-01-06 10:47:48.000000000 +1100
+++ working-2.6/arch/ppc64/oprofile/common.c	2005-01-11 10:42:26.788317488 +1100
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 
 #include "op_impl.h"
 
@@ -22,9 +23,6 @@
 extern struct op_ppc64_model op_model_power4;
 static struct op_ppc64_model *model;
 
-extern void (*perf_irq)(struct pt_regs *);
-static void (*save_perf_irq)(struct pt_regs *);
-
 static struct op_counter_config ctr[OP_MAX_COUNTER];
 static struct op_system_config sys;
 
@@ -35,11 +33,12 @@
 
 static int op_ppc64_setup(void)
 {
-	/* Install our interrupt handler into the existing hook. */
-	save_perf_irq = perf_irq;
-	perf_irq = op_handle_interrupt;
+	int err;
 
-	mb();
+	/* Grab the hardware */
+	err = reserve_pmc_hardware(op_handle_interrupt);
+	if (err)
+		return err;
 
 	/* Pre-compute the values to stuff in the hardware registers. */
 	model->reg_setup(ctr, &sys, model->num_counters);
@@ -52,10 +51,7 @@
 
 static void op_ppc64_shutdown(void)
 {
-	mb();
-
-	/* Remove our interrupt handler. We may be removing this module. */
-	perf_irq = save_perf_irq;
+	release_pmc_hardware();
 }
 
 static void op_ppc64_cpu_start(void *dummy)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist. NOT _the_ _other_ _way_
				| _around_!
http://www.ozlabs.org/people/dgibson

From michael at ellerman.id.au Tue Jan 11 19:43:57 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Tue, 11 Jan 2005 19:43:57 +1100
Subject: [PATCH 2/2] ppc64: Fix iseries_veth module unload race and memory leak
Message-ID: <20050111084358.0ABD917DF7@ozlabs.au.ibm.com>

Hi All,

When the iseries_veth driver module is unloaded there is the potential for an oops and also some memory leakage.

Because the HvLpEvent_unregisterHandler() function did no synchronisation, it was possible for the handler that was being unregistered to be running on another CPU *after* HvLpEvent_unregisterHandler() had returned. This could cause the iseries_veth driver to leave work in the events work queue after the module had been unloaded. When that work was eventually executed we got an oops.

In addition some of the data structures in the iseries_veth driver were not being correctly freed when the module was unloaded.

This is the second patch, we make iseries_veth call flush_scheduled_work() after we are sure the handler is no longer running, and also fix the memory leaks.
iseries_veth.c | 26 ++++++++++++++++++++++---- 1 files changed, 22 insertions(+), 4 deletions(-) Signed-off-by: Michael Ellerman diff -urN 2.6.10-ppc64-stock/drivers/net/iseries_veth.c 2.6.10-ppc64-work/drivers/net/iseries_veth.c --- 2.6.10-ppc64-stock/drivers/net/iseries_veth.c 2004-12-25 10:14:43.000000000 +1100 +++ 2.6.10-ppc64-work/drivers/net/iseries_veth.c 2005-01-11 18:40:21.811722242 +1100 @@ -642,7 +642,7 @@ return 0; } -static void veth_destroy_connection(u8 rlp) +static void veth_stop_connection(u8 rlp) { struct veth_lpar_connection *cnx = veth_cnx[rlp]; @@ -671,9 +671,18 @@ HvLpEvent_Type_VirtualLan, cnx->num_ack_events, NULL, NULL); +} + +static void veth_destroy_connection(u8 rlp) +{ + struct veth_lpar_connection *cnx = veth_cnx[rlp]; - if (cnx->msgs) - kfree(cnx->msgs); + if (! cnx) + return; + + kfree(cnx->msgs); + kfree(cnx); + veth_cnx[rlp] = NULL; } /* @@ -1375,9 +1384,18 @@ vio_unregister_driver(&veth_driver); for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) - veth_destroy_connection(i); + veth_stop_connection(i); HvLpEvent_unregisterHandler(HvLpEvent_Type_VirtualLan); + + /* Hypervisor callbacks may have scheduled more work while we + * were destroying connections. Now that we've disconnected from + * the hypervisor make sure everything's finished. */ + flush_scheduled_work(); + + for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) + veth_destroy_connection(i); + } module_exit(veth_module_cleanup); From michael at ellerman.id.au Tue Jan 11 19:43:57 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 11 Jan 2005 19:43:57 +1100 Subject: [PATCH 1/2] ppc64: Fix iseries_veth module unload race and memory leak Message-ID: <20050111084357.7C3F317DDF@ozlabs.au.ibm.com> Hi All, When the iseries_veth driver module is unloaded there is the potential for an oops and also some memory leakage. 
Because the HvLpEvent_unregisterHandler() function did no synchronisation, it was possible for the handler that was being unregistered to be running on another CPU *after* HvLpEvent_unregisterHandler() had returned. This could cause the iseries_veth driver to leave work in the events work queue after the module had been unloaded. When that work was eventually executed we got an oops. In addition some of the data structures in the iseries_veth driver were not being correctly freed when the module was unloaded. This is the first patch, which makes HvLpEvent_unregisterHandler() work. arch/ppc64/kernel/HvLpEvent.c | 8 ++++++++ include/asm-ppc64/iSeries/HvLpEvent.h | 3 +++ 2 files changed, 11 insertions(+) Signed-off-by: Michael Ellerman diff -urN 2.6.10-ppc64-stock/arch/ppc64/kernel/HvLpEvent.c 2.6.10-ppc64-work/arch/ppc64/kernel/HvLpEvent.c --- 2.6.10-ppc64-stock/arch/ppc64/kernel/HvLpEvent.c 2004-06-16 17:12:51.000000000 +1000 +++ 2.6.10-ppc64-work/arch/ppc64/kernel/HvLpEvent.c 2005-01-10 16:13:33.381994263 +1100 @@ -34,10 +34,18 @@ int HvLpEvent_unregisterHandler( HvLpEvent_Type eventType ) { int rc = 1; + + might_sleep(); + if ( eventType < HvLpEvent_Type_NumTypes ) { if ( !lpEventHandlerPaths[eventType] ) { lpEventHandler[eventType] = NULL; rc = 0; + + /* We now sleep until all other CPUs have scheduled. This ensures that + * the deletion is seen by all other CPUs, and that the deleted handler + * isn't still running on another CPU when we return. 
*/ + synchronize_kernel(); } } return rc; diff -urN 2.6.10-ppc64-stock/include/asm-ppc64/iSeries/HvLpEvent.h 2.6.10-ppc64-work/include/asm-ppc64/iSeries/HvLpEvent.h --- 2.6.10-ppc64-stock/include/asm-ppc64/iSeries/HvLpEvent.h 2004-02-04 14:44:05.000000000 +1100 +++ 2.6.10-ppc64-work/include/asm-ppc64/iSeries/HvLpEvent.h 2005-01-10 16:11:18.899255131 +1100 @@ -75,6 +75,9 @@ extern int HvLpEvent_registerHandler( HvLpEvent_Type eventType, LpEventHandler hdlr); // Unregister a handler for an event type +// This call will sleep until the handler being removed is guaranteed to +// be no longer executing on any CPU. Do not call with locks held. +// // returns 0 on success // Unregister will fail if there are any paths open for the type extern int HvLpEvent_unregisterHandler( HvLpEvent_Type eventType ); From clark at esteem.com Wed Jan 12 03:19:53 2005 From: clark at esteem.com (Conn Clark) Date: Tue, 11 Jan 2005 08:19:53 -0800 Subject: email message sizes In-Reply-To: <200501102015.57394.hollis@penguinppc.org> References: <78DE72FE-631B-11D9-AD26-000A95A0560C@penguinppc.org> <200501102015.57394.hollis@penguinppc.org> Message-ID: <41E3FCA9.1060705@esteem.com> Hollis Blanchard wrote: > On Monday 10 January 2005 15:22, Hollis Blanchard wrote: > >>Hi all, I am one of two people who moderates these mailing lists. On >>occasion, people send large emails to these lists. I am of the opinion >>that 1MB emails should not be mass-mailed, but if you all have no >>problem with that then I will approve them. >> >>So are any of you on modems, or operate near the limits of your mail >>quotas? I'd like to hear comments either way: how large is ok to post >>to these mailing lists? > > > So far I have received 5 private mails indicating that 100KB is a reasonable > maximum. If you disagree please speak up... > > -Hollis I say 101K because I think it should be 100k and I know I will want to send something just over the limit. 
-- Conn Clark ***************************************************************** Give a man a match and you heat him for a moment. Set him on fire and you'll heat him for life. ***************************************************************** Conn Clark Engineering Stooge clark at esteem.com Electronic Systems Technology Inc. www.esteem.com Stock Ticker Symbol ELST From linas at austin.ibm.com Wed Jan 12 09:17:23 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Tue, 11 Jan 2005 16:17:23 -0600 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <20050111000845.GC14239@krispykreme.ozlabs.ibm.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> <16866.18083.212727.327170@cargo.ozlabs.ibm.com> <20050110174716.GW22274@austin.ibm.com> <16866.63132.352016.732484@cargo.ozlabs.ibm.com> <20050111000845.GC14239@krispykreme.ozlabs.ibm.com> Message-ID: <20050111221723.GE23690@austin.ibm.com> On Tue, Jan 11, 2005 at 11:08:45AM +1100, Anton Blanchard was heard to remark: > > I've seen HPC stuff that wants to be able to mmap a PCI card's resources into > userspace. Their hack on ppc64 was to look at the high nibble of the > address and convert it to a non EEH address if required :) > > I'm not sure how best to solve the userspace mmap issue but there are a > few groups wanting that. Somewhat off-topic ... but ... 1) If you design your hardware correctly, there are some amazing things you can do (performance wise) by mmapping PCI card resources into user space. If your hardware is done right, then user corruption can't hurt the system. This was the de facto method for getting high performance graphics on IBM RS/6000, SGI, HP and Sun workstations many moons ago. 2) There is interest in the virtual i/o community about mmapping funky stuff to userspace, but that conversation may be for a different day.
The question is (for example) how to build a high-performance virtual scsi server in userspace (without kernel pieces) which is a design point some people like. Later... --linas From linas at austin.ibm.com Wed Jan 12 09:27:08 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Tue, 11 Jan 2005 16:27:08 -0600 Subject: [PATCH] PPC64: Trivial Cleanup: EEH_REGION Message-ID: <20050111222708.GF23690@austin.ibm.com> Hi Paul, Please forward upstream if you agree. This is a dumb, dorky cleanup patch: Per last round of emails, the concept of EEH _REGION is gone, but a few stubs remained. This patch removes them. Note there is some funny business in the SLB code that I did not understand, and so I left that alone. I'm guessing that it should be cut out as well. Signed-off-by: Linas Vepstas --linas -------------- next part -------------- ===== arch/ppc64/mm/hash_utils.c 1.55 vs edited ===== --- 1.55/arch/ppc64/mm/hash_utils.c 2004-10-28 02:39:49 -05:00 +++ edited/arch/ppc64/mm/hash_utils.c 2005-01-10 16:58:40 -06:00 @@ -295,12 +295,6 @@ int hash_page(unsigned long ea, unsigned vsid = get_kernel_vsid(ea); break; #if 0 - case EEH_REGION_ID: - /* - * Should only be hit if there is an access to MMIO space - * which is protected by EEH. - * Send the problem up to do_page_fault - */ case KERNEL_REGION_ID: /* * Should never get here - entire 0xC0... region is bolted. ===== arch/ppc64/mm/slb.c 1.3 vs edited ===== --- 1.3/arch/ppc64/mm/slb.c 2004-09-03 04:08:16 -05:00 +++ edited/arch/ppc64/mm/slb.c 2005-01-10 17:03:36 -06:00 @@ -75,6 +75,8 @@ static void slb_flush_and_rebolt(void) : "memory"); } +#define EEHREGIONBASE ASM_CONST(0xA000000000000000) + /* Flush all user entries from the segment table of the current processor. 
*/ void switch_slb(struct task_struct *tsk, struct mm_struct *mm) { ===== include/asm-ppc64/page.h 1.36 vs edited ===== --- 1.36/include/asm-ppc64/page.h 2004-10-28 02:39:49 -05:00 +++ edited/include/asm-ppc64/page.h 2005-01-10 16:59:50 -06:00 @@ -203,10 +203,8 @@ extern int page_is_ram(unsigned long pfn #define KERNELBASE PAGE_OFFSET #define VMALLOCBASE ASM_CONST(0xD000000000000000) #define IOREGIONBASE ASM_CONST(0xE000000000000000) -#define EEHREGIONBASE ASM_CONST(0xA000000000000000) #define IO_REGION_ID (IOREGIONBASE>>REGION_SHIFT) -#define EEH_REGION_ID (EEHREGIONBASE>>REGION_SHIFT) #define VMALLOC_REGION_ID (VMALLOCBASE>>REGION_SHIFT) #define KERNEL_REGION_ID (KERNELBASE>>REGION_SHIFT) #define USER_REGION_ID (0UL) From david at gibson.dropbear.id.au Wed Jan 12 11:18:35 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 12 Jan 2005 11:18:35 +1100 Subject: [PATCH] PPC64: Trivial Cleanup: EEH_REGION In-Reply-To: <20050111222708.GF23690@austin.ibm.com> References: <20050111222708.GF23690@austin.ibm.com> Message-ID: <20050112001835.GA12816@localhost.localdomain> On Tue, Jan 11, 2005 at 04:27:08PM -0600, Linas Vepstas wrote: > > Hi Paul, > > Please forward upstream if you agree. > > This is a dumb, dorky cleanup patch: > Per last round of emails, the concept of EEH _REGION is gone, > but a few stubs remained. This patch removes them. > > Note there is some funny business in the SLB code that > I did not understand, and so I left that alone. > I'm guessing that it should be cut out as well. Yes and no. The code that's there needs to stay - it's a workaround for a POWER5 hardware bug - but it doesn't have any real connection to EEH. The only reason we use EEHREGIONBASE is that it's a segment address which will never have anything real mapped into it. 0xFFFFFFFFF0000000 would do just as well. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! 
http://www.ozlabs.org/people/dgibson From ahuja at austin.ibm.com Wed Jan 12 12:08:13 2005 From: ahuja at austin.ibm.com (Manish Ahuja) Date: Tue, 11 Jan 2005 19:08:13 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled. Message-ID: <41E4787D.90309@austin.ibm.com> There is a requirement to collect real usage values of each partition in LPAR environment on pseries as well as iseries. This patch enables that feature. The current purr (processor Utilization register ) values of each of the processors is stored in a per_cpu data array. this is then summed and used to calculate various numbers for managing lpars. The patch also calculates how much real cpu time each process uses and stores this value in a ppc64 specific struct. The value is needed by CKRM to do further calculations. Signed-off-by: Manish Ahuja -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050111/b931e9b9/attachment.txt From paulus at samba.org Wed Jan 12 13:36:56 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 12 Jan 2005 13:36:56 +1100 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <41E4787D.90309@austin.ibm.com> References: <41E4787D.90309@austin.ibm.com> Message-ID: <16868.36168.772082.315933@cargo.ozlabs.ibm.com> Manish Ahuja writes: > This patch enables that feature. The current purr (processor Utilization > register ) > values of each of the processors is stored in a per_cpu data array. this > is then > summed and used to calculate various numbers for managing lpars. Don't you also need to update purr_data_array in timer_interrupt as well? You seem to be doing that only on context switch, which won't be updated in a timely fashion necessarily (think of a compute-bound task on a lightly-loaded machine). 
> + for_each_cpu(cpu){ > + cus = &per_cpu(purr_data_array, cpu); > + sum_purr += cus->current_purr; > + } The spacing is wrong here, it should be "for_each_cpu(cpu) {" and the "}" should be one tab to the left of where it is. > +/* Used to store Processor Utilization register (purr) values */ > +DECLARE_PER_CPU(struct purr_data, purr_data_array); > + > +struct purr_data { > + u64 current_purr; /* Holds the current purr register values */ > +}; Do we really need a struct to store one thing? Are there other things you plan to add later? Paul. From akpm at osdl.org Wed Jan 12 14:51:27 2005 From: akpm at osdl.org (Andrew Morton) Date: Tue, 11 Jan 2005 19:51:27 -0800 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <41E4787D.90309@austin.ibm.com> References: <41E4787D.90309@austin.ibm.com> Message-ID: <20050111195127.23300721.akpm@osdl.org> Manish Ahuja wrote: > > There is a requirement to collect real usage values of each partition in > LPAR environment > on pseries as well as iseries. What (if any) relationship does this have to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-introduce-cputime.patch ? From olof at austin.ibm.com Wed Jan 12 15:06:28 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 11 Jan 2005 22:06:28 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <41E4787D.90309@austin.ibm.com> References: <41E4787D.90309@austin.ibm.com> Message-ID: <20050112040628.GA13221@austin.ibm.com> Hi Manish, On Tue, Jan 11, 2005 at 07:08:13PM -0600, Manish Ahuja wrote: > The patch also calculates how much real cpu time each process uses and > stores this value in a ppc64 specific struct. I was going to ask a couple of questions about this and noticed Andrew Morton's reply pointing at cputime. That answered most of them (how other archs might be doing it). 
> The value is needed by CKRM to do further calculations. How will CKRM use this? Does it have architecture-specific code to dig this out of the thread_struct again? Could they use the cputime interface if we hooked into that instead? Finally: There's two ways to read PURR on our platform: One is to read the SPR value, the other to get it from the hypervisor via the H_PURR call. Do they measure the same thing and stay consistent? -Olof From scheel at vnet.ibm.com Wed Jan 12 19:56:53 2005 From: scheel at vnet.ibm.com (Jeff Scheel) Date: Wed, 12 Jan 2005 08:56:53 +0000 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <16868.36168.772082.315933@cargo.ozlabs.ibm.com> References: <41E4787D.90309@austin.ibm.com> <16868.36168.772082.315933@cargo.ozlabs.ibm.com> Message-ID: <1105520213.25534.17.camel@sheepdog.rchland.ibm.com> On Wed, 2005-01-12 at 02:36, Paul Mackerras wrote: > Don't you also need to update purr_data_array in timer_interrupt as > well? You seem to be doing that only on context switch, which won't > be updated in a timely fashion necessarily (think of a compute-bound > task on a lightly-loaded machine). I agree it doesn't sound like only collecting this data at context switch does the trick. If we hook the timer (say in the decr path), then you need not have the code in context switch. That is until you implement a tickless timer and decr goes away. > Do we really need a struct to store one thing? Are there other things > you plan to add later? It seems to me that if we tucked the last PURR aside on each decrementer tick, we could simply let the kernel tasks which need this information retrieve it as long as we can guarantee atomic update of the values from the interrupt level. Then, interfaces like /proc/ppc64/lparcfg can do summing and other interfaces can use only the processor value if that's all they need. 
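The idea in the paragraph above (stash the last PURR reading per processor on each tick, then let readers take either one processor's value or the sum) can be sketched in user-space C11 as follows. This is only an illustration under stated assumptions: `NR_CPUS_SKETCH`, `last_purr`, `purr_tick()`, `purr_read()` and `purr_sum()` are hypothetical names, the PURR reading is passed in by the caller rather than read from the SPR, and a C11 atomic stands in for whatever the kernel would use to guarantee an untorn 64-bit update from interrupt level.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical CPU count for the sketch; not the kernel's NR_CPUS. */
#define NR_CPUS_SKETCH 4

/* Last PURR snapshot per processor (zero-initialised static storage). */
static _Atomic uint64_t last_purr[NR_CPUS_SKETCH];

/* Called from the (hypothetical) per-CPU tick path with a fresh reading. */
static void purr_tick(unsigned int cpu, uint64_t purr_now)
{
	assert(cpu < NR_CPUS_SKETCH);
	/* A single atomic store, so readers never see a half-written value. */
	atomic_store_explicit(&last_purr[cpu], purr_now, memory_order_relaxed);
}

/* Reader that only cares about one processor. */
static uint64_t purr_read(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	return atomic_load_explicit(&last_purr[cpu], memory_order_relaxed);
}

/* Reader that wants the partition-wide total, lparcfg-style. */
static uint64_t purr_sum(void)
{
	uint64_t sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += purr_read(cpu);
	return sum;
}
```

The writer never sums and the readers never write, so the tick path stays a single store per processor.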
Given this, I'd vote for sticking the last PURR value for a processor in some per-processor structure that exists today like the Paca. Thoughts? -- Jeff Scheel (scheel at vnet.ibm.com) From scheel at vnet.ibm.com Wed Jan 12 19:47:23 2005 From: scheel at vnet.ibm.com (Jeff Scheel) Date: Wed, 12 Jan 2005 08:47:23 +0000 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <20050112040628.GA13221@austin.ibm.com> References: <41E4787D.90309@austin.ibm.com> <20050112040628.GA13221@austin.ibm.com> Message-ID: <1105519643.25534.7.camel@sheepdog.rchland.ibm.com> On Wed, 2005-01-12 at 04:06, Olof Johansson wrote: > > The value is needed by CKRM to do further calculations. > > How will CKRM use this? Does it have architecture-specific code to > dig this out of the thread_struct again? Could they use the cputime > interface if we hooked into that instead? The only interface which exists today to retrieve PURR is /proc/ppc64/lparcfg, which provides PURR summed across all processors. We are working on other means to retrieve more specific data in the near future. > Finally: There's two ways to read PURR on our platform: One is to read > the SPR value, the other to get it from the hypervisor via the H_PURR > call. Do they measure the same thing and stay consistent? Olof, you are correct. We'll want to go directly to the hardware and avoid the overhead of a hypervisor call. Only in the instance where the hypervisor is emulating PURR will we want to use the hypervisor call. The "art" is detecting when/if that is occurring. Dave E. should be able to help us with this. -- Jeff Scheel (scheel at vnet.ibm.com) From ahuja at austin.ibm.com Thu Jan 13 03:30:21 2005 From: ahuja at austin.ibm.com (Manish Ahuja) Date: Wed, 12 Jan 2005 10:30:21 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled.
In-Reply-To: <16868.36168.772082.315933@cargo.ozlabs.ibm.com> References: <41E4787D.90309@austin.ibm.com> <16868.36168.772082.315933@cargo.ozlabs.ibm.com> Message-ID: <41E5509D.2070102@austin.ibm.com> Paul Mackerras wrote: > >Don't you also need to update purr_data_array in timer_interrupt as >well? You seem to be doing that only on context switch, which won't >be updated in a timely fashion necessarily (think of a compute-bound >task on a lightly-loaded machine). > > > Yes, I do need to add this in other places to improve the collection times. I have stepped away from using the old system completely and would actually like to add more collection points in interrupt routines. This will also enable me to collect real system time and other data which i plan to use at other places. I held that piece back since I saw what martin and john have been doing. Will put a more cohesive patch out. But this bit will remain unchanged with the other additions. >>+ for_each_cpu(cpu) >>+ cus = &per_cpu(purr_data_array, cpu); >>+ sum_purr += cus->current_purr; >>+ } >> >> > >The spacing is wrong here, it should be "for_each_cpu(cpu) {" and the >"}" should be one tab to the left of where it is. > > > Wilco .. will fix it .. >>+/* Used to store Processor Utilization register (purr) values */ >>+DECLARE_PER_CPU(struct purr_data, purr_data_array); >>+ >>+struct purr_data { >>+ u64 current_purr; /* Holds the current purr register values */ >>+}; >> >> > >Do we really need a struct to store one thing? Are there other things >you plan to add later? > > > In my prototype there are a few more things. But since the other patch is not final, I only added the one thing, that I knew for sure I wanted. Having other members and not using them, generally gets you knocked on the head... so... I definitely plan to add other things.. 
Manish From ahuja at austin.ibm.com Thu Jan 13 03:37:32 2005 From: ahuja at austin.ibm.com (Manish Ahuja) Date: Wed, 12 Jan 2005 10:37:32 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <20050112040628.GA13221@austin.ibm.com> References: <41E4787D.90309@austin.ibm.com> <20050112040628.GA13221@austin.ibm.com> Message-ID: <41E5524C.6000107@austin.ibm.com> >How will CKRM use this? Does it have architecture-specific code to >dig this out of the thread_struct again? Could they use the cputime >interface if we hooked into that instead? > > > I have provided them my test machine and they are working on setting up their stuff and as things get clear on whether they wish to use cputime interface or collect directly, I shall accordingly provide a small patch to enable them on ppc64. From will_schmidt at vnet.ibm.com Thu Jan 13 04:19:38 2005 From: will_schmidt at vnet.ibm.com (will schmidt) Date: Wed, 12 Jan 2005 11:19:38 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <1105519643.25534.7.camel@sheepdog.rchland.ibm.com> References: <41E4787D.90309@austin.ibm.com> <20050112040628.GA13221@austin.ibm.com> <1105519643.25534.7.camel@sheepdog.rchland.ibm.com> Message-ID: <41E55C2A.2030309@vnet.ibm.com> Jeff Scheel wrote: > On Wed, 2005-01-12 at 04:06, Olof Johansson wrote: ... > Olof, you are correct. We'll want to go directly to the hardware and > avoid the overhead of a hypervisor call. Only is the instance where the > hypervisor is emulating PURR will we want to use the hypervisor call. > The "art" is detecting when/if that is occurring. Dave E. should be > able to help us with this. Related to the PURR hcall comments. (Yeah, I already visited that hcall/mfspr topic once.. :-) ) "While there is an hcall for reading the purr, and that hcall will work, it should not be used on [Power5] systems. 
on GR and later processors the OS should be doing a mfspr PURR directly. The purpose of the hcall was for prototyping PHYP/PURR behavior on pre-GR processors. " From paulus at samba.org Thu Jan 13 21:35:25 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Jan 2005 21:35:25 +1100 Subject: [PATCH] PPC64 Move thread_info flags to its own cache line Message-ID: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> This patch fixes a problem I have been seeing since all the preempt changes went in, which is that ppc64 SMP systems would livelock randomly if preempt was enabled. It turns out that what was happening was that one cpu was spinning in spin_lock_irq (the version at line 215 of kernel/spinlock.c) madly doing preempt_enable() and preempt_disable() calls. The other cpu had the lock and was trying to set the TIF_NEED_RESCHED flag for the task running on the first cpu. That is an atomic operation which has to be retried if another cpu writes to the same cacheline between the load and the store, which the other cpu was doing every time it did preempt_enable() or preempt_disable(). I decided to move the thread_info flags field into the next cache line, since it is the only field that would regularly be modified by cpus other than the one running the task that owns the thread_info. (OK possibly the `cpu' field would be on a rebalance; I don't know the rebalancing code, but that should be pretty infrequent.) Thus, moving the flags field seems like a good idea generally as well as solving the immediate problem. For the record I am pretty unhappy with the code we use for spin_lock et al. with preemption turned on (the BUILD_LOCK_OPS stuff in spinlock.c). For a start we do the atomic op (_raw_spin_trylock) each time around the loop. That is going to be generating a lot of unnecessary bus (or fabric) traffic. Instead, after we fail to get the lock we should poll it with simple loads until we see that it is clear and then retry the atomic op. 
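The polling scheme suggested above (try the atomic op once, then spin on plain loads until the lock looks free, then retry) is commonly called test-and-test-and-set. A minimal user-space sketch in C11 follows. It is an illustration only: `struct ttas_lock` and the `ttas_*` functions are made-up names, not the kernel's BUILD_LOCK_OPS code, and the sketch ignores preemption and any hypervisor yield.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for the sketch; not the kernel's spinlock_t. */
struct ttas_lock {
	atomic_bool locked;
};

static inline void ttas_init(struct ttas_lock *l)
{
	atomic_init(&l->locked, false);
}

/* One atomic attempt. Returns true if we took the lock. */
static inline bool ttas_trylock(struct ttas_lock *l)
{
	return !atomic_exchange_explicit(&l->locked, true,
					 memory_order_acquire);
}

static inline void ttas_acquire(struct ttas_lock *l)
{
	while (!ttas_trylock(l)) {
		/*
		 * The atomic op failed: wait with plain loads, which can be
		 * satisfied from the local cache until the holder writes.
		 */
		while (atomic_load_explicit(&l->locked, memory_order_relaxed))
			;
	}
}

static inline void ttas_release(struct ttas_lock *l)
{
	/* Releasing a lock that is not held would be a caller bug. */
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
	atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Only the exchange in `ttas_trylock()` writes to the lock word; the inner loop only reads it, so waiters do not keep pulling the cache line away from the holder.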
Assuming a reasonable cache design, the loads won't generate any bus traffic until another cpu writes to the cacheline containing the lock. Secondly we have lost the __spin_yield call that we had on ppc64, which is an important optimization when we are running under the hypervisor. I can't just put that in cpu_relax because I need to know which (virtual) cpu is holding the lock, so that I can tell the hypervisor which virtual cpu to give my time slice to. That information is stored in the lock variable, which is why __spin_yield needs the address of the lock. Signed-off-by: Paul Mackerras diff -urN linux-2.5/include/asm-ppc64/thread_info.h test/include/asm-ppc64/thread_info.h --- linux-2.5/include/asm-ppc64/thread_info.h 2004-12-18 08:35:35.000000000 +1100 +++ test/include/asm-ppc64/thread_info.h 2005-01-13 18:36:24.000000000 +1100 @@ -12,6 +12,7 @@ #ifndef __ASSEMBLY__ #include +#include #include #include #include @@ -22,12 +23,13 @@ struct thread_info { struct task_struct *task; /* main task structure */ struct exec_domain *exec_domain; /* execution domain */ - unsigned long flags; /* low level flags */ int cpu; /* cpu we're on */ int preempt_count; struct restart_block restart_block; /* set by force_successful_syscall_return */ unsigned char syscall_noerror; + /* low level flags - has atomic operations done on it */ + unsigned long flags ____cacheline_aligned_in_smp; }; /* @@ -39,12 +41,12 @@ { \ .task = &tsk, \ .exec_domain = &default_exec_domain, \ - .flags = 0, \ .cpu = 0, \ .preempt_count = 1, \ .restart_block = { \ .fn = do_no_restart_syscall, \ }, \ + .flags = 0, \ } #define init_thread_info (init_thread_union.thread_info) From paulus at samba.org Thu Jan 13 21:37:52 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Jan 2005 21:37:52 +1100 Subject: [PATCH] PPC64 Disable preemption in flush_tlb_pending Message-ID: <16870.20352.503047.221064@cargo.ozlabs.ibm.com> The preempt debug stuff found a place where we were using smp_processor_id() without 
having preemption disabled, in flush_tlb_pending. This patch fixes it by using get_cpu_var and put_cpu_var instead of the __get_cpu_var variant. Signed-off-by: Paul Mackerras diff -urN linux-2.5/include/asm-ppc64/tlbflush.h test/include/asm-ppc64/tlbflush.h --- linux-2.5/include/asm-ppc64/tlbflush.h 2004-06-07 08:25:32.000000000 +1000 +++ test/include/asm-ppc64/tlbflush.h 2005-01-13 19:35:37.000000000 +1100 @@ -32,10 +32,11 @@ static inline void flush_tlb_pending(void) { - struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch); + struct ppc64_tlb_batch *batch = &get_cpu_var(ppc64_tlb_batch); if (batch->index) __flush_tlb_pending(batch); + put_cpu_var(ppc64_tlb_batch); } #define flush_tlb_mm(mm) flush_tlb_pending() From paulus at samba.org Thu Jan 13 21:41:36 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Jan 2005 21:41:36 +1100 Subject: [PATCH] PPC64 Call preempt_schedule on exception exit Message-ID: <16870.20576.417821.693961@cargo.ozlabs.ibm.com> This patch mirrors the recent changes on x86 to call preempt_schedule rather than schedule in the exception exit path, in the case where the preempt_count is zero and the TIF_NEED_RESCHED bit is set. I'm a little concerned that this means that we have a window where interrupts are enabled and we are on our way into preempt_schedule, but preempt_count is still zero. Ingo's proposed preempt_schedule_irq would fix this, and I think something like that should go in. 
Signed-off-by: Paul Mackerras diff -urN linux-2.5/arch/ppc64/kernel/entry.S test/arch/ppc64/kernel/entry.S --- linux-2.5/arch/ppc64/kernel/entry.S 2005-01-10 07:54:27.000000000 +1100 +++ test/arch/ppc64/kernel/entry.S 2005-01-13 20:48:36.000000000 +1100 @@ -574,25 +574,22 @@ crandc eq,cr1*4+eq,eq bne restore /* here we are preempting the current task */ -1: lis r0,PREEMPT_ACTIVE at h - stw r0,TI_PREEMPT(r9) +1: #ifdef CONFIG_PPC_ISERIES li r0,1 stb r0,PACAPROCENABLED(r13) #endif ori r10,r10,MSR_EE mtmsrd r10,1 /* reenable interrupts */ - bl .schedule + bl .preempt_schedule mfmsr r10 clrrdi r9,r1,THREAD_SHIFT rldicl r10,r10,48,1 /* disable interrupts again */ - li r0,0 rotldi r10,r10,16 mtmsrd r10,1 ld r4,TI_FLAGS(r9) andi. r0,r4,_TIF_NEED_RESCHED bne 1b - stw r0,TI_PREEMPT(r9) b restore user_work: From paulus at samba.org Thu Jan 13 21:45:06 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Jan 2005 21:45:06 +1100 Subject: [PATCH] PPC64 can do preempt debug too Message-ID: <16870.20786.164419.188120@cargo.ozlabs.ibm.com> This patch enables the DEBUG_PREEMPT config option for PPC64. I have this turned on on my desktop G5 and it isn't finding any problems. (It did find one problem, in flush_tlb_pending(), that I have just sent a patch for.) BTW, do we really need to restrict which architectures the config option is available on? 
Signed-off-by: Paul Mackerras diff -urN linux-2.5/include/asm-ppc64/smp.h test/include/asm-ppc64/smp.h --- linux-2.5/include/asm-ppc64/smp.h 2004-11-26 20:40:32.000000000 +1100 +++ test/include/asm-ppc64/smp.h 2005-01-10 19:49:03.000000000 +1100 @@ -38,7 +38,7 @@ extern void smp_message_recv(int, struct pt_regs *); -#define smp_processor_id() (get_paca()->paca_index) +#define __smp_processor_id() (get_paca()->paca_index) #define hard_smp_processor_id() (get_paca()->hw_cpu_id) extern cpumask_t cpu_sibling_map[NR_CPUS]; diff -urN linux-2.5/lib/Kconfig.debug test/lib/Kconfig.debug --- linux-2.5/lib/Kconfig.debug 2005-01-11 08:57:21.000000000 +1100 +++ test/lib/Kconfig.debug 2005-01-11 09:13:28.000000000 +1100 @@ -50,7 +50,7 @@ config DEBUG_PREEMPT bool "Debug preemptible kernel" - depends on PREEMPT && X86 + depends on PREEMPT && (X86 || PPC64) default y help If you say Y here then the kernel will use a debug variant of the From paulus at samba.org Thu Jan 13 21:47:30 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Jan 2005 21:47:30 +1100 Subject: [PATCH] PPC64 Add PREEMPT_BKL option Message-ID: <16870.20930.566334.782203@cargo.ozlabs.ibm.com> This patch adds the PREEMPT_BKL config option for PPC64, shamelessly stolen from the i386 version. I have this turned on in the kernel on my desktop G5 and it seems to be just fine. Signed-off-by: Paul Mackerras diff -urN linux-2.5/arch/ppc64/Kconfig test/arch/ppc64/Kconfig --- linux-2.5/arch/ppc64/Kconfig 2005-01-11 08:57:19.000000000 +1100 +++ test/arch/ppc64/Kconfig 2005-01-12 20:25:17.000000000 +1100 @@ -231,6 +231,17 @@ Say Y here if you are building a kernel for a desktop, embedded or real-time system. Say N if you are unsure. +config PREEMPT_BKL + bool "Preempt The Big Kernel Lock" + depends on PREEMPT + default y + help + This option reduces the latency of the kernel by making the + big kernel lock preemptible. + + Say Y here if you are building a kernel for a desktop system. + Say N if you are unsure. 
+
 #
 # Use the generic interrupt handling code in kernel/irq/:
 #

From mjw at us.ibm.com Fri Jan 14 05:46:23 2005
From: mjw at us.ibm.com (Mike Wolf)
Date: Thu, 13 Jan 2005 12:46:23 -0600
Subject: [PATCH] PPC64: 32bit wrapper for ioctls.
Message-ID: <41E6C1FF.4000203@us.ltcfwd.linux.ibm.com>

Hi Paul,

The patch adds some 32bit wrappers for 2 ioctls that Java needs.
Assuming this doesn't generate a round of discussion, please forward
upstream to akpm/torvalds.

Signed-off-by: Mike Wolf mjw at us.ibm.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ioctl32.patch
Type: text/x-patch
Size: 482 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050113/8bc77f67/attachment.bin

From olof at austin.ibm.com Fri Jan 14 07:00:48 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Thu, 13 Jan 2005 14:00:48 -0600
Subject: [PATCH] [PPC64] iommu: avoid ISA io space on POWER3
Message-ID: <20050113200048.GA11683@austin.ibm.com>

Hi,

On some systems, the first PCI bus has an ISA I/O hole at the first
16MB. We can't use this space for DMA addresses on the bus.

On Python-based machines, we'll skip the first 256MB on buses that
have the hole, just as we do on later systems.

This means that the first bus will have 768MB of DMA space shared
between the devices on it.

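As a quick sanity check of those numbers, here is a minimal user-space sketch, not the kernel code. The struct, the MB macro and the function names are made up for illustration; only the shift values (1 << 28, 3 << 28 and 1 << 30) are taken from the patch that follows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two PHB fields that the patch sets. */
struct dma_window {
	uint32_t base;	/* models dma_window_base_cur */
	uint32_t size;	/* models dma_window_size */
};

#define MB (UINT32_C(1) << 20)

/* Choose the window as described: skip 256MB when the bus has an io-hole. */
static struct dma_window pick_window(int has_io_hole)
{
	struct dma_window w;

	if (has_io_hole) {
		w.base = UINT32_C(1) << 28;	/* 256MB reserved below the window */
		w.size = UINT32_C(3) << 28;	/* 768MB left for DMA */
	} else {
		w.base = 0;
		w.size = UINT32_C(1) << 30;	/* the default 1GB window */
	}
	return w;
}

/* First address past the end of the window. */
static uint64_t window_end(struct dma_window w)
{
	return (uint64_t)w.base + w.size;
}

/* Both layouts end at the same 1GB boundary; only the start moves. */
static void check_windows(void)
{
	assert(pick_window(1).base == 256 * MB);
	assert(pick_window(1).size == 768 * MB);
	assert(window_end(pick_window(1)) == window_end(pick_window(0)));
}
```

In other words, the io-hole case keeps the same upper bound as the default window and simply gives up the bottom quarter of it.
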
Signed-off-by: Olof Johansson
Acked-by: Paul Mackerras

---

 linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c |   19 ++++++++++++++++---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu-iohole arch/ppc64/kernel/pSeries_iommu.c
--- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~iommu-iohole	2005-01-12 16:29:55.000000000 -0600
+++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c	2005-01-12 16:34:57.000000000 -0600
@@ -327,12 +327,25 @@ static void iommu_bus_setup_pSeries(stru
 		/* Root bus */
 		if (is_python(dn)) {
 			struct iommu_table *tbl;
+			unsigned int *iohole;
 
 			DBG("Python root bus %s\n", bus->name);
 
-			/* 1GB window by default */
-			dn->phb->dma_window_size = 1 << 30;
-			dn->phb->dma_window_base_cur = 0;
+			iohole = (unsigned int *)get_property(dn, "io-hole", 0);
+
+			if (iohole) {
+				/* On first bus we need to leave room for the
+				 * ISA address space. Just skip the first 256MB
+				 * alltogether. This leaves 768MB for the window.
+				 */
+				DBG("PHB has io-hole, reserving 256MB\n");
+				dn->phb->dma_window_size = 3 << 28;
+				dn->phb->dma_window_base_cur = 1 << 28;
+			} else {
+				/* 1GB window by default */
+				dn->phb->dma_window_size = 1 << 30;
+				dn->phb->dma_window_base_cur = 0;
+			}
 
 			tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL);
 
_

From anton at samba.org Fri Jan 14 10:51:19 2005
From: anton at samba.org (Anton Blanchard)
Date: Fri, 14 Jan 2005 10:51:19 +1100
Subject: [PATCH] ppc64: Allow EEH to be disabled
Message-ID: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com>

Hi,

I was thinking of sending this upstream. Any thoughts?

Anton

--

Allow EEH to be disabled for pSeries targets, but only if the EMBEDDED
option is enabled.

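The header side of the patch below relies on a common pattern: when a feature is compiled out, its entry points become macros that expand to a harmless value, so callers need no #ifdefs of their own. A minimal user-space sketch of that idea follows. It assumes CONFIG_EEH is just a preprocessor symbol, drops the kernel's __iomem annotation, and uses a made-up caller (read_checked); only the eeh_check_failure() stub mirrors the patch.

```c
#include <assert.h>

/*
 * CONFIG_EEH is treated here as a plain preprocessor symbol; building with
 * -DCONFIG_EEH would require a real eeh_check_failure() to link against.
 */
#ifdef CONFIG_EEH
unsigned long eeh_check_failure(const volatile void *token, unsigned long val);
#else
/* EEH compiled out: the check collapses to the value that was read. */
#define eeh_check_failure(token, val) (val)
#endif

/* Hypothetical caller, loosely modelled on an MMIO read helper. */
static unsigned long read_checked(const volatile unsigned long *addr)
{
	unsigned long val;

	assert(addr);
	val = *addr;
	return eeh_check_failure(addr, val);
}
```

With the symbol undefined, read_checked() compiles down to a plain load, which is presumably the point of making the option available to EMBEDDED builds.
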
Signed-off-by: Anton Blanchard diff -puN arch/ppc64/Kconfig~no-eeh arch/ppc64/Kconfig --- foobar2/arch/ppc64/Kconfig~no-eeh 2005-01-12 00:34:25.902201644 +1100 +++ foobar2-anton/arch/ppc64/Kconfig 2005-01-12 00:34:25.934199201 +1100 @@ -231,6 +231,11 @@ config PREEMPT Say Y here if you are building a kernel for a desktop, embedded or real-time system. Say N if you are unsure. +config EEH + bool "PCI Extended Error Handling (EEH)" if EMBEDDED + depends on PPC_PSERIES + default y if !EMBEDDED + # # Use the generic interrupt handling code in kernel/irq/: # diff -puN arch/ppc64/kernel/Makefile~no-eeh arch/ppc64/kernel/Makefile --- foobar2/arch/ppc64/kernel/Makefile~no-eeh 2005-01-12 00:34:25.908201186 +1100 +++ foobar2-anton/arch/ppc64/kernel/Makefile 2005-01-12 00:34:25.932199354 +1100 @@ -30,9 +30,10 @@ obj-$(CONFIG_PPC_ISERIES) += iSeries_irq obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \ - eeh.o pSeries_nvram.o rtasd.o ras.o \ + pSeries_nvram.o rtasd.o ras.o \ xics.o rtas.o pSeries_setup.o pSeries_iommu.o +obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o obj-$(CONFIG_SMP) += smp.o diff -puN include/asm-ppc64/eeh.h~no-eeh include/asm-ppc64/eeh.h --- foobar2/include/asm-ppc64/eeh.h~no-eeh 2005-01-12 00:34:25.913200804 +1100 +++ foobar2-anton/include/asm-ppc64/eeh.h 2005-01-12 00:34:25.931199430 +1100 @@ -23,7 +23,6 @@ #include #include #include -#include struct pci_dev; struct device_node; @@ -33,14 +32,18 @@ struct device_node; #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) -#ifdef CONFIG_PPC_PSERIES -extern void __init eeh_init(void); -unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val); +#ifdef CONFIG_EEH +void __init eeh_init(void); +unsigned long eeh_check_failure(const volatile void __iomem *token, + unsigned long val); int eeh_dn_check_failure 
(struct device_node *dn, struct pci_dev *dev); void __iomem *eeh_ioremap(unsigned long addr, void __iomem *vaddr); void __init pci_addr_cache_build(void); #else +#define eeh_init() #define eeh_check_failure(token, val) (val) +#define eeh_dn_check_failure(dn, dev) (0) +#define pci_addr_cache_build() #endif /** @@ -69,8 +72,6 @@ void eeh_remove_device(struct pci_dev *) #define EEH_ENABLE 1 #define EEH_RELEASE_LOADSTORE 2 #define EEH_RELEASE_DMA 3 -int eeh_set_option(struct pci_dev *dev, int options); - /** * Notifier event flags. @@ -89,6 +90,7 @@ struct eeh_event { }; /** Register to find out about EEH events. */ +struct notifier_block; int eeh_register_notifier(struct notifier_block *nb); int eeh_unregister_notifier(struct notifier_block *nb); @@ -194,7 +196,8 @@ static inline void eeh_raw_writeq(u64 va #define EEH_CHECK_ALIGN(v,a) \ ((((unsigned long)(v)) & ((a) - 1)) == 0) -static inline void eeh_memset_io(volatile void __iomem *addr, int c, unsigned long n) +static inline void eeh_memset_io(volatile void __iomem *addr, int c, + unsigned long n) { u32 lc = c; lc |= lc << 8; _ From linas at austin.ibm.com Fri Jan 14 11:31:59 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 13 Jan 2005 18:31:59 -0600 Subject: [PATCH] ppc64: Allow EEH to be disabled In-Reply-To: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> Message-ID: <20050114003159.GO23690@austin.ibm.com> On Fri, Jan 14, 2005 at 10:51:19AM +1100, Anton Blanchard was heard to remark: > > Hi, > > I was thinking of sending this upstream. Any thoughts? Yes, can you help me get my other patch accepted? (This patch, though, looks fine to me). (Although one could probably move even more things into the #ifdef region, just to be clean.) 
--linas From zwane at arm.linux.org.uk Fri Jan 14 11:43:39 2005 From: zwane at arm.linux.org.uk (Zwane Mwaikambo) Date: Thu, 13 Jan 2005 17:43:39 -0700 (MST) Subject: [PATCH] PPC64 pmac hotplug cpu Message-ID: I found the following very handy for use as a reference platform when working on i386 hotplug cpu recently. It's been tested on a G5 system with a cpu going on/offline every second and make -j. I've also tried a number of config options to avoid compile breakage. Signed-off-by: Zwane Mwaikambo Index: linux-2.6.10-mm3/arch/ppc64/Kconfig =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/Kconfig,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 Kconfig --- linux-2.6.10-mm3/arch/ppc64/Kconfig 13 Jan 2005 16:27:26 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/Kconfig 13 Jan 2005 16:35:39 -0000 @@ -305,7 +305,7 @@ source "drivers/pci/Kconfig" config HOTPLUG_CPU bool "Support for hot-pluggable CPUs" - depends on SMP && EXPERIMENTAL && PPC_PSERIES + depends on SMP && EXPERIMENTAL && (PPC_PSERIES || PPC_PMAC) select HOTPLUG ---help--- Say Y here to be able to turn CPUs off and on. 
Index: linux-2.6.10-mm3/arch/ppc64/kernel/idle.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/idle.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 idle.c --- linux-2.6.10-mm3/arch/ppc64/kernel/idle.c 13 Jan 2005 16:27:26 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/idle.c 13 Jan 2005 16:34:24 -0000 @@ -364,7 +364,7 @@ int idle_setup(void) } } #endif /* CONFIG_PPC_PSERIES */ -#ifndef CONFIG_PPC_ISERIES +#if !defined(CONFIG_PPC_ISERIES) && !defined(CONFIG_HOTPLUG_CPU) if (systemcfg->platform == PLATFORM_POWERMAC || systemcfg->platform == PLATFORM_MAPLE) { printk(KERN_INFO "Using native/NAP idle loop\n"); Index: linux-2.6.10-mm3/arch/ppc64/kernel/irq.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/irq.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 irq.c --- linux-2.6.10-mm3/arch/ppc64/kernel/irq.c 13 Jan 2005 16:27:26 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/irq.c 13 Jan 2005 23:51:29 -0000 @@ -479,3 +479,31 @@ EXPORT_SYMBOL(do_softirq); #endif /* CONFIG_IRQSTACKS */ +#ifdef CONFIG_HOTPLUG_CPU +void fixup_irqs(cpumask_t map) +{ + unsigned int irq; + static int warned; + + for_each_irq(irq) { + cpumask_t mask; + + if (irq_desc[irq].status & IRQ_PER_CPU) + continue; + + cpus_and(mask, irq_affinity[irq], map); + if (any_online_cpu(mask) == NR_CPUS) { + printk("Breaking affinity for irq %i\n", irq); + mask = map; + } + if (irq_desc[irq].handler->set_affinity) + irq_desc[irq].handler->set_affinity(irq, mask); + else if (irq_desc[irq].action && !(warned++)) + printk("Cannot set affinity for irq %i\n", irq); + } + + local_irq_enable(); + mdelay(1); + local_irq_disable(); +} +#endif Index: linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c =================================================================== RCS file: 
/home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 pSeries_setup.c --- linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c 13 Jan 2005 20:44:05 -0000 @@ -327,8 +327,9 @@ static void __init pSeries_discover_pic } } -static void pSeries_cpu_die(void) +static void pSeries_mach_cpu_die(void) { + idle_task_exit(); local_irq_disable(); /* Some hardware requires clearing the CPPR, while other hardware does not * it is safe either way @@ -606,7 +607,7 @@ struct machdep_calls __initdata pSeries_ .power_off = rtas_power_off, .halt = rtas_halt, .panic = rtas_os_term, - .cpu_die = pSeries_cpu_die, + .cpu_die = pSeries_mach_cpu_die, .get_boot_time = pSeries_get_boot_time, .get_rtc_time = pSeries_get_rtc_time, .set_rtc_time = pSeries_set_rtc_time, Index: linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 pmac.h --- linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h 13 Jan 2005 16:34:24 -0000 @@ -8,6 +8,9 @@ * Declaration for the various functions exported by the * pmac_* files. 
Mostly for use by pmac_setup */ +#ifdef CONFIG_HOTPLUG_CPU +DECLARE_PER_CPU(int, cpu_state); +#endif extern void pmac_get_boot_time(struct rtc_time *tm); extern void pmac_get_rtc_time(struct rtc_time *tm); Index: linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 pmac_setup.c --- linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c 13 Jan 2005 16:34:24 -0000 @@ -229,6 +229,25 @@ void __pmac pmac_halt(void) pmac_power_off(); } +#ifdef CONFIG_HOTPLUG_CPU +static void pmac_mach_cpu_die(void) +{ + unsigned int cpu; + + local_irq_disable(); + cpu = smp_processor_id(); + printk(KERN_DEBUG "CPU%d offline\n", cpu); + __get_cpu_var(cpu_state) = CPU_DEAD; + wmb(); + while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE) + cpu_relax(); + + flush_tlb_pending(); + cpu_set(cpu, cpu_online_map); + local_irq_enable(); +} +#endif + #ifdef CONFIG_BOOTX_TEXT static int dummy_getc_poll(void) { @@ -455,5 +474,8 @@ struct machdep_calls __initdata pmac_md .calibrate_decr = pmac_calibrate_decr, .feature_call = pmac_do_feature_call, .progress = pmac_progress, - .check_legacy_ioport = pmac_check_legacy_ioport + .check_legacy_ioport = pmac_check_legacy_ioport, +#ifdef CONFIG_HOTPLUG_CPU + .cpu_die = pmac_mach_cpu_die, +#endif }; Index: linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 pmac_smp.c --- linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c 14 Jan 2005 00:32:10 -0000 @@ -35,6 +35,7 @@ #include #include #include +#include #include #include @@ -296,6 +297,38 
@@ static void __init smp_core99_setup_cpu( } } +#ifdef CONFIG_HOTPLUG_CPU +/* State of each CPU during hotplug phases */ +DEFINE_PER_CPU(int, cpu_state) = { 0 }; + +static int pmac_cpu_disable(void) +{ + unsigned int cpu = smp_processor_id(); + + if (cpu == boot_cpuid) + return -EBUSY; + + systemcfg->processorCount--; + cpu_clear(cpu, cpu_online_map); + fixup_irqs(cpu_online_map); + return 0; +} + +static void pmac_cpu_die(unsigned int cpu) +{ + int i; + + for (i = 0; i < 100; i++) { + rmb(); + if (per_cpu(cpu_state, cpu) == CPU_DEAD) + return; + msleep(100); + } + printk(KERN_ERR "CPU%d didn't die...\n", cpu); +} + +#endif + struct smp_ops_t core99_smp_ops __pmacdata = { .message_pass = smp_mpic_message_pass, .probe = smp_core99_probe, @@ -308,4 +341,8 @@ struct smp_ops_t core99_smp_ops __pmacda void __init pmac_setup_smp(void) { smp_ops = &core99_smp_ops; +#ifdef CONFIG_HOTPLUG_CPU + smp_ops->cpu_disable = pmac_cpu_disable; + smp_ops->cpu_die = pmac_cpu_die; +#endif } Index: linux-2.6.10-mm3/arch/ppc64/kernel/setup.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/setup.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 setup.c --- linux-2.6.10-mm3/arch/ppc64/kernel/setup.c 13 Jan 2005 16:27:26 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/setup.c 13 Jan 2005 21:26:48 -0000 @@ -1345,9 +1345,6 @@ early_param("xmon", early_xmon); void cpu_die(void) { - idle_task_exit(); if (ppc_md.cpu_die) ppc_md.cpu_die(); - local_irq_disable(); - for (;;); } Index: linux-2.6.10-mm3/arch/ppc64/kernel/smp.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/smp.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 smp.c --- linux-2.6.10-mm3/arch/ppc64/kernel/smp.c 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/smp.c 14 Jan 2005 00:26:26 -0000 @@ -406,10 +406,39 @@ void __devinit 
smp_prepare_boot_cpu(void current_set[boot_cpuid] = current->thread_info; } +#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PPC_PMAC) +#include "pmac.h" +static int cpu_enable(unsigned int cpu) +{ + if (systemcfg->platform == PLATFORM_PSERIES_LPAR) + return -ENOSYS; + + /* get the target out of it's holding state */ + per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; + wmb(); + + while (!cpu_online(cpu)) + cpu_relax(); + + fixup_irqs(cpu_online_map); + /* counter the irq disable in fixup_irqs */ + local_irq_enable(); + return 0; +} +#else +static int cpu_enable(unsigned int cpu) +{ + return -ENOSYS; +} +#endif + int __devinit __cpu_up(unsigned int cpu) { int c; + if (system_state == SYSTEM_RUNNING && !cpu_enable(cpu)) + return 0; + /* At boot, don't bother with non-present cpus -JSCHOPP */ if (system_state < SYSTEM_RUNNING && !cpu_present(cpu)) return -ENOENT; Index: linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 sysfs.c --- linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c 13 Jan 2005 16:36:23 -0000 @@ -18,7 +18,7 @@ #include #include #include - +#include static DEFINE_PER_CPU(struct cpu, cpu_devices); @@ -413,9 +413,7 @@ static int __init topology_init(void) * CPU. For instance, the boot cpu might never be valid * for hotplugging. 
*/ -#ifdef CONFIG_HOTPLUG_CPU - if (systemcfg->platform != PLATFORM_PSERIES_LPAR) -#endif + if (!ppc_md.cpu_die) c->no_control = 1; if (cpu_online(cpu) || (c->no_control == 0)) { Index: linux-2.6.10-mm3/include/asm-ppc64/smp.h =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/include/asm-ppc64/smp.h,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 smp.h --- linux-2.6.10-mm3/include/asm-ppc64/smp.h 13 Jan 2005 16:27:35 -0000 1.1.1.1 +++ linux-2.6.10-mm3/include/asm-ppc64/smp.h 13 Jan 2005 16:34:24 -0000 @@ -29,7 +29,7 @@ extern int boot_cpuid; extern int boot_cpuid_phys; -extern void cpu_die(void) __attribute__((noreturn)); +extern void cpu_die(void); #ifdef CONFIG_SMP @@ -37,6 +37,9 @@ extern void smp_send_debugger_break(int struct pt_regs; extern void smp_message_recv(int, struct pt_regs *); +#ifdef CONFIG_HOTPLUG_CPU +extern void fixup_irqs(cpumask_t map); +#endif #define smp_processor_id() (get_paca()->paca_index) #define hard_smp_processor_id() (get_paca()->hw_cpu_id) From nathanl at austin.ibm.com Fri Jan 14 18:05:52 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Fri, 14 Jan 2005 01:05:52 -0600 Subject: [PATCH] use kref for device_node refcounting Message-ID: <1105686352.4367.4.camel@biclops> This changes struct device_node and associated code to use the kref api for object refcounting and freeing. I've given it some testing on pSeries with cpu add/remove and verified that the release function works. The change is somewhat cosmetic but it does make the code easier to understand... at least I think so =) The only real change is that the refcount on all device_nodes is initialized at 1, and the device node is freed when the refcount reaches 0 (of_remove_node has the extra "put" to ensure that this happens). This lets us get rid of the OF_STALE flag and macros in prom.h. 
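To make the lifetime rule concrete, here is a minimal user-space model of the scheme described above: the count starts at 1 when a node is created, users take and drop references, and the final put (the extra one done on removal) frees the node. This is only a sketch under stated assumptions: kref_model, node_model and their helpers are hypothetical names, and the counter is a plain int rather than the atomic one the kernel's kref uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, non-atomic stand-in for the kernel's struct kref. */
struct kref_model {
	int refcount;
};

static void kref_model_init(struct kref_model *k)
{
	k->refcount = 1;		/* the creator holds the first reference */
}

static void kref_model_get(struct kref_model *k)
{
	assert(k->refcount > 0);	/* taking a ref on a dead object is a bug */
	k->refcount++;
}

/* Drop one reference; call release() and return 1 when the count hits zero. */
static int kref_model_put(struct kref_model *k,
			  void (*release)(struct kref_model *))
{
	if (--k->refcount == 0) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical node type with the counter embedded, like device_node. */
struct node_model {
	int id;
	struct kref_model kref;
};

static int nodes_released;	/* counts release calls, for demonstration */

static void node_release(struct kref_model *k)
{
	/* Same idea as container_of(): recover the enclosing object. */
	struct node_model *n = (struct node_model *)
		((char *)k - offsetof(struct node_model, kref));

	free(n);
	nodes_released++;
}

static struct node_model *node_add(int id)
{
	struct node_model *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->id = id;
	kref_model_init(&n->kref);
	return n;
}
```

A caller that does a get followed by a put leaves the count at 1, so the node survives until the removal path drops the initial reference. That is the role the extra of_node_put() in of_remove_node plays.
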
Signed-off-by: Nathan Lynch --- diff -puN arch/ppc64/kernel/prom.c~ppc64-device_node-use-kref arch/ppc64/kernel/prom.c --- linux-2.6.11-rc1-bk1/arch/ppc64/kernel/prom.c~ppc64-device_node-use-kref 2005-01-13 19:04:09.000000000 -0600 +++ linux-2.6.11-rc1-bk1-nathanl/arch/ppc64/kernel/prom.c 2005-01-14 00:24:04.000000000 -0600 @@ -717,6 +717,7 @@ static unsigned long __init unflatten_dt dad->next->sibling = np; dad->next = np; } + kref_init(&np->kref); } while(1) { u32 sz, noff; @@ -1475,24 +1476,31 @@ EXPORT_SYMBOL(of_get_next_child); * @node: Node to inc refcount, NULL is supported to * simplify writing of callers * - * Returns the node itself or NULL if gone. + * Returns node. */ struct device_node *of_node_get(struct device_node *node) { - if (node && !OF_IS_STALE(node)) { - atomic_inc(&node->_users); - return node; - } - return NULL; + if (node) + kref_get(&node->kref); + return node; } EXPORT_SYMBOL(of_node_get); +static inline struct device_node * kref_to_device_node(struct kref *kref) +{ + return container_of(kref, struct device_node, kref); +} + /** - * of_node_cleanup - release a dynamically allocated node - * @arg: Node to be released + * of_node_release - release a dynamically allocated node + * @kref: kref element of the node to be released + * + * In of_node_put() this function is passed to kref_put() + * as the destructor. 
*/ -static void of_node_cleanup(struct device_node *node) +static void of_node_release(struct kref *kref) { + struct device_node *node = kref_to_device_node(kref); struct property *prop = node->properties; if (!OF_IS_DYNAMIC(node)) @@ -1518,19 +1526,8 @@ static void of_node_cleanup(struct devic */ void of_node_put(struct device_node *node) { - if (!node) - return; - - WARN_ON(0 == atomic_read(&node->_users)); - - if (OF_IS_STALE(node)) { - if (atomic_dec_and_test(&node->_users)) { - of_node_cleanup(node); - return; - } - } - else - atomic_dec(&node->_users); + if (node) + kref_put(&node->kref, of_node_release); } EXPORT_SYMBOL(of_node_put); @@ -1773,7 +1770,7 @@ int of_add_node(const char *path, struct np->properties = proplist; OF_MARK_DYNAMIC(np); - of_node_get(np); + kref_init(&np->kref); np->parent = derive_parent(path); if (!np->parent) { kfree(np); @@ -1808,8 +1805,9 @@ static void of_cleanup_node(struct devic } /* - * Remove an OF device node from the system. - * Caller should have already "gotten" np. + * "Unplug" a node from the device tree. The caller must hold + * a reference to the node. The memory associated with the node + * is not freed until its refcount goes to zero. 
*/ int of_remove_node(struct device_node *np) { @@ -1827,7 +1825,6 @@ int of_remove_node(struct device_node *n of_cleanup_node(np); write_lock(&devtree_lock); - OF_MARK_STALE(np); remove_node_proc_entries(np); if (allnodes == np) allnodes = np->allnext; @@ -1852,6 +1849,7 @@ int of_remove_node(struct device_node *n } write_unlock(&devtree_lock); of_node_put(parent); + of_node_put(np); /* Must decrement the refcount */ return 0; } diff -puN include/asm-ppc64/prom.h~ppc64-device_node-use-kref include/asm-ppc64/prom.h --- linux-2.6.11-rc1-bk1/include/asm-ppc64/prom.h~ppc64-device_node-use-kref 2005-01-13 19:04:09.000000000 -0600 +++ linux-2.6.11-rc1-bk1-nathanl/include/asm-ppc64/prom.h 2005-01-13 19:04:09.000000000 -0600 @@ -149,18 +149,15 @@ struct device_node { struct proc_dir_entry *pde; /* this node's proc directory */ struct proc_dir_entry *name_link; /* name symlink */ struct proc_dir_entry *addr_link; /* addr symlink */ - atomic_t _users; /* reference count */ + struct kref kref; unsigned long _flags; }; extern struct device_node *of_chosen; /* flag descriptions */ -#define OF_STALE 0 /* node is slated for deletion */ #define OF_DYNAMIC 1 /* node and properties were allocated via kmalloc */ -#define OF_IS_STALE(x) test_bit(OF_STALE, &x->_flags) -#define OF_MARK_STALE(x) set_bit(OF_STALE, &x->_flags) #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, &x->_flags) #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, &x->_flags) _ From arnd at arndb.de Fri Jan 14 20:28:22 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 14 Jan 2005 10:28:22 +0100 Subject: [PATCH] ppc64: Allow EEH to be disabled In-Reply-To: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> Message-ID: <200501141028.23317.arnd@arndb.de> On Freedag 14 Januar 2005 00:51, Anton Blanchard wrote: > Hi, > > I was thinking of sending this upstream. Any thoughts? 
>

I'm doing something similar in my private tree and I noticed that
init_pci_config_tokens() is currently called by eeh_init().
If you don't build EEH, init_pci_config_tokens() needs to be called
by pSeries_setup_arch(), which makes more sense anyway.

	Arnd <><
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: signature
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050114/8f823a0e/attachment.pgp

From arnd at arndb.de Fri Jan 14 20:23:07 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 14 Jan 2005 10:23:07 +0100
Subject: [PATCH] PPC64: 32bit wrapper for ioctls.
In-Reply-To: <41E6C1FF.4000203@us.ltcfwd.linux.ibm.com>
References: <41E6C1FF.4000203@us.ltcfwd.linux.ibm.com>
Message-ID: <200501141023.08156.arnd@arndb.de>

On Thursday 13 January 2005 19:46, Mike Wolf wrote:
> Hi Paul,
>   The patch adds some 32bit wrappers for 2 ioctls that Java needs.
> Assuming this doesn't generate a round of discussion, please
> forward upstream to akpm/torvalds.

Why add them to arch/ppc64? These don't look architecture specific,
so they should go into include/linux/compat_ioctl.h.

> --- linus-0112.orig/arch/ppc64/kernel/ioctl32.c 2005-01-13 10:35:10.165539000 -0600
> +++ linus-0112/arch/ppc64/kernel/ioctl32.c      2005-01-13 10:51:43.450433277 -0600
> @@ -43,6 +43,8 @@
>  COMPATIBLE_IOCTL(TIOCSTART)
>  COMPATIBLE_IOCTL(TIOCSTOP)
>  COMPATIBLE_IOCTL(TIOCSLTC)
> +COMPATIBLE_IOCTL(TIOCMIWAIT)

Note that TIOCMIWAIT is not COMPATIBLE_IOCTL, but ULONG_IOCTL. It
doesn't make a difference for ppc64, but if you add it to the generic
file, that is needed for s390x.

	Arnd <><
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: signature
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050114/54e8467a/attachment.pgp

From arnd at arndb.de Fri Jan 14 20:50:28 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 14 Jan 2005 10:50:28 +0100
Subject: Collect real process and processor utilization values when virtualization is enabled.
In-Reply-To: <20050111195127.23300721.akpm@osdl.org>
References: <41E4787D.90309@austin.ibm.com> <20050111195127.23300721.akpm@osdl.org>
Message-ID: <200501141050.29068.arnd@arndb.de>

On Wednesday 12 January 2005 04:51, Andrew Morton wrote:
> Manish Ahuja wrote:
> >
> > There is a requirement to collect real usage values of each partition in
> > LPAR environment
> > on pseries as well as iseries.
>
> What (if any) relationship does this have to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-introduce-cputime.patch ?

I asked Martin the same thing yesterday, and he said that recording
the purr value like Manish does is needed to support the cputime statistics,
but this is not the complete solution.

Manish, did you look at
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-microsecond-based-cputime-for-s390.patch ?
I think you need to do similar things on top of your patch to really export
steal time etc. to user space.

	Arnd <><
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: signature
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050114/44dce463/attachment.pgp

From ahuja at austin.ibm.com Sat Jan 15 06:18:23 2005
From: ahuja at austin.ibm.com (Manish Ahuja)
Date: Fri, 14 Jan 2005 13:18:23 -0600
Subject: Collect real process and processor utilization values when virtualization is enabled.
In-Reply-To: <200501141050.29068.arnd@arndb.de>
References: <41E4787D.90309@austin.ibm.com> <20050111195127.23300721.akpm@osdl.org> <200501141050.29068.arnd@arndb.de>
Message-ID: <41E81AFF.3020005@austin.ibm.com>

Arnd Bergmann wrote:

>I asked Martin the same thing yesterday, and he said that recording
>the purr value like Manish does is needed to support the cputime statistics,
>but this is not the complete solution.
>
>Manish, did you look at
>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-microsecond-based-cputime-for-s390.patch ?
>I think you need to do similar things on top of your patch to really export
>steal time etc. to user space.
>
>	Arnd <><
>
>

Yup,

There is another piece that will tie in with Martin's patch. This piece
is needed by the CKRM folks to enable the process accounting feature, as
well as by Jeff Scheel, since he uses the output for his calculations.

Manish

From anton at samba.org Sat Jan 15 10:49:20 2005
From: anton at samba.org (Anton Blanchard)
Date: Sat, 15 Jan 2005 10:49:20 +1100
Subject: [PATCH] ppc64: Allow EEH to be disabled
In-Reply-To: <200501141028.23317.arnd@arndb.de>
References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> <200501141028.23317.arnd@arndb.de>
Message-ID: <20050114234920.GM6309@krispykreme.ozlabs.ibm.com>

Hi,

> I'm doing something similar in my private tree and I noticed that
> init_pci_config_tokens() is currently called by eeh_init().
> If you don't build EEH, init_pci_config_tokens() needs to be called
> by pSeries_setup_arch(), which makes more sense anyway.

Good point :) We also had PCI disabled so never saw this.

Anton

From anton at samba.org Sat Jan 15 11:00:55 2005
From: anton at samba.org (Anton Blanchard)
Date: Sat, 15 Jan 2005 11:00:55 +1100
Subject: [PATCH] ppc64: lacks definition of MM_VM_SIZE()
In-Reply-To: <1105714076.26551.243.camel@hades.cambridge.redhat.com>
References: <1105714076.26551.243.camel@hades.cambridge.redhat.com>
Message-ID: <20050115000055.GO6309@krispykreme.ozlabs.ibm.com>

David: you have to send me some spare Signed-off-by's :)

Anton

--

From: David Woodhouse

We don't set MM_VM_SIZE() on ppc64, so it defaults to TASK_SIZE. Which
means a 32-bit process ending up in exit_mmap() to kill a 64-bit mm may
call tlb_finish_mmu() with an incorrect 'end' argument.

Signed-off-by: Anton Blanchard

===== include/asm-ppc64/processor.h 1.59 vs edited =====
--- 1.59/include/asm-ppc64/processor.h	Tue Jan 11 01:29:24 2005
+++ edited/include/asm-ppc64/processor.h	Fri Jan 14 14:42:44 2005
@@ -537,6 +537,10 @@
 #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
 		TASK_SIZE_USER32 : TASK_SIZE_USER64)
 
+/* We can't actually tell the TASK_SIZE given just the mm, but default
+ * to the 64-bit case to make sure that enough gets cleaned up. */
+#define MM_VM_SIZE(mm)	TASK_SIZE_USER64
+
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */

From dwmw2 at infradead.org Sat Jan 15 11:31:41 2005
From: dwmw2 at infradead.org (David Woodhouse)
Date: Sat, 15 Jan 2005 00:31:41 +0000
Subject: [PATCH] ppc64: lacks definition of MM_VM_SIZE()
In-Reply-To: <20050115000055.GO6309@krispykreme.ozlabs.ibm.com>
References: <1105714076.26551.243.camel@hades.cambridge.redhat.com> <20050115000055.GO6309@krispykreme.ozlabs.ibm.com>
Message-ID: <1105749101.30759.109.camel@baythorne.infradead.org>

On Sat, 2005-01-15 at 11:00 +1100, Anton Blanchard wrote:
> David: you have to send me some spare Signed-off-by's :)

Get Paulus to give you some spares. I'm sure he's losing them.

Signed-off-by: David Woodhouse -- dwmw2 From mingo at elte.hu Sun Jan 16 01:25:37 2005 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 15 Jan 2005 15:25:37 +0100 Subject: [patch] spin-nicer-2.6.11-rc1-A0 In-Reply-To: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> Message-ID: <20050115142537.GD10114@elte.hu> * Paul Mackerras wrote: > This patch fixes a problem I have been seeing since all the preempt > changes went in, which is that ppc64 SMP systems would livelock > randomly if preempt was enabled. > > It turns out that what was happening was that one cpu was spinning in > spin_lock_irq (the version at line 215 of kernel/spinlock.c) madly > doing preempt_enable() and preempt_disable() calls. The other cpu had > the lock and was trying to set the TIF_NEED_RESCHED flag for the task > running on the first cpu. That is an atomic operation which has to be > retried if another cpu writes to the same cacheline between the load > and the store, which the other cpu was doing every time it did > preempt_enable() or preempt_disable(). ahh ... indeed. Nice catch. > I decided to move the thread_info flags field into the next cache > line, since it is the only field that would regularly be modified by > cpus other than the one running the task that owns the thread_info. > (OK possibly the `cpu' field would be on a rebalance; I don't know the > rebalancing code, but that should be pretty infrequent.) Thus, moving > the flags field seems like a good idea generally as well as solving > the immediate problem. > > For the record I am pretty unhappy with the code we use for spin_lock > et al. with preemption turned on (the BUILD_LOCK_OPS stuff in > spinlock.c). For a start we do the atomic op (_raw_spin_trylock) each > time around the loop. That is going to be generating a lot of > unnecessary bus (or fabric) traffic. 
Instead, after we fail to get > the lock we should poll it with simple loads until we see that it is > clear and then retry the atomic op. Assuming a reasonable cache > design, the loads won't generate any bus traffic until another cpu > writes to the cacheline containing the lock. agreed. How about the patch below? (tested on x86) > Secondly we have lost the __spin_yield call that we had on ppc64, > which is an important optimization when we are running under the > hypervisor. I can't just put that in cpu_relax because I need to know > which (virtual) cpu is holding the lock, so that I can tell the > hypervisor which virtual cpu to give my time slice to. That > information is stored in the lock variable, which is why __spin_yield > needs the address of the lock. hm, how about calling __spin_yield() from _raw_spin_trylock(), if the locking attempt was unsuccessful? This might be slightly incorrect if the locking attempt is not connected to an actual spin-loop, but we do have other spin-loops with open-coded trylocks that would benefit from this optimization too. Ingo Signed-off-by: Ingo Molnar --- linux/kernel/spinlock.c.orig +++ linux/kernel/spinlock.c @@ -173,7 +173,7 @@ EXPORT_SYMBOL(_write_lock); * (We do this in a function because inlining it would be excessive.) 
*/ -#define BUILD_LOCK_OPS(op, locktype) \ +#define BUILD_LOCK_OPS(op, locktype, is_locked_fn) \ void __lockfunc _##op##_lock(locktype *lock) \ { \ preempt_disable(); \ @@ -183,7 +183,8 @@ void __lockfunc _##op##_lock(locktype *l preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - cpu_relax(); \ + while (is_locked_fn(lock) && (lock)->break_lock) \ + cpu_relax(); \ preempt_disable(); \ } \ } \ @@ -204,6 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ + while (spin_is_locked(lock) && (lock)->break_lock) \ + cpu_relax(); \ cpu_relax(); \ preempt_disable(); \ } \ @@ -244,9 +247,9 @@ EXPORT_SYMBOL(_##op##_lock_bh) * _[spin|read|write]_lock_irqsave() * _[spin|read|write]_lock_bh() */ -BUILD_LOCK_OPS(spin, spinlock_t); -BUILD_LOCK_OPS(read, rwlock_t); -BUILD_LOCK_OPS(write, rwlock_t); +BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); +BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); +BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); #endif /* CONFIG_PREEMPT */ From mingo at elte.hu Sun Jan 16 01:38:05 2005 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 15 Jan 2005 15:38:05 +0100 Subject: [patch] spin-nicer-2.6.11-rc1-A0 In-Reply-To: <20050115142537.GD10114@elte.hu> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> Message-ID: <20050115143805.GA15041@elte.hu> * Ingo Molnar wrote: > agreed. How about the patch below? (tested on x86) updated patch below. Ingo Signed-off-by: Ingo Molnar --- linux/kernel/spinlock.c.orig +++ linux/kernel/spinlock.c @@ -173,7 +173,7 @@ EXPORT_SYMBOL(_write_lock); * (We do this in a function because inlining it would be excessive.) 
*/ -#define BUILD_LOCK_OPS(op, locktype) \ +#define BUILD_LOCK_OPS(op, locktype, is_locked_fn) \ void __lockfunc _##op##_lock(locktype *lock) \ { \ preempt_disable(); \ @@ -183,7 +183,8 @@ void __lockfunc _##op##_lock(locktype *l preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - cpu_relax(); \ + while (is_locked_fn(lock) && (lock)->break_lock) \ + cpu_relax(); \ preempt_disable(); \ } \ } \ @@ -204,7 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - cpu_relax(); \ + while (is_locked_fn(lock) && (lock)->break_lock) \ + cpu_relax(); \ preempt_disable(); \ } \ return flags; \ @@ -244,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh) * _[spin|read|write]_lock_irqsave() * _[spin|read|write]_lock_bh() */ -BUILD_LOCK_OPS(spin, spinlock_t); -BUILD_LOCK_OPS(read, rwlock_t); -BUILD_LOCK_OPS(write, rwlock_t); +BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); +BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); +BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); #endif /* CONFIG_PREEMPT */ From mingo at elte.hu Sun Jan 16 01:00:44 2005 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 15 Jan 2005 15:00:44 +0100 Subject: [PATCH] PPC64 can do preempt debug too In-Reply-To: <16870.20786.164419.188120@cargo.ozlabs.ibm.com> References: <16870.20786.164419.188120@cargo.ozlabs.ibm.com> Message-ID: <20050115140044.GB10114@elte.hu> * Paul Mackerras wrote: > This patch enables the DEBUG_PREEMPT config option for PPC64. I have > this turned on on my desktop G5 and it isn't finding any problems. (It > did find one problem, in flush_tlb_pending(), that I have just sent a > patch for.) > > BTW, do we really need to restrict which architectures the config > option is available on? in the case of x86 (and x64) i found that there were a fair number of false positives in arch-level code. But i agree that we should (now) make the config option available to all architectures - patch against 2.6.11-rc1 below. 
Ingo Signed-off-by: Ingo Molnar --- linux/lib/Kconfig.debug.orig +++ linux/lib/Kconfig.debug @@ -50,7 +50,7 @@ config DEBUG_SLAB config DEBUG_PREEMPT bool "Debug preemptible kernel" - depends on PREEMPT && X86 + depends on PREEMPT default y help If you say Y here then the kernel will use a debug variant of the From mingo at elte.hu Sun Jan 16 01:04:38 2005 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 15 Jan 2005 15:04:38 +0100 Subject: [PATCH] PPC64 Call preempt_schedule on exception exit In-Reply-To: <16870.20576.417821.693961@cargo.ozlabs.ibm.com> References: <16870.20576.417821.693961@cargo.ozlabs.ibm.com> Message-ID: <20050115140438.GC10114@elte.hu> * Paul Mackerras wrote: > This patch mirrors the recent changes on x86 to call preempt_schedule > rather than schedule in the exception exit path, in the case where the > preempt_count is zero and the TIF_NEED_RESCHED bit is set. > > I'm a little concerned that this means that we have a window where > interrupts are enabled and we are on our way into preempt_schedule, > but preempt_count is still zero. Ingo's proposed preempt_schedule_irq > would fix this, and I think something like that should go in. the preempt_schedule_irq() patch is in 2.6.11-rc1-mm1 now, does it look good to you? ppc64 should be able to call it directly from lowlevel code. Ingo From benh at kernel.crashing.org Sun Jan 16 09:23:13 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 16 Jan 2005 09:23:13 +1100 Subject: [PATCH] PPC64 pmac hotplug cpu In-Reply-To: References: Message-ID: <1105827794.27410.82.camel@gaston> On Thu, 2005-01-13 at 17:43 -0700, Zwane Mwaikambo wrote: > I found the following very handy for use as a reference platform when > working on i386 hotplug cpu recently. > > It's been tested on a G5 system with a cpu going on/offline every second > and make -j. I've also tried a number of config options to avoid compile > breakage. Hi ! 
Looks good, but you could do even better :) I still want to look at the proper mechanism to flush the CPU cache on 970, but the idea here is to flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode) with the caches clean and MSR:EE off. We can later get it back with a soft reset. Ben. From benh at kernel.crashing.org Sun Jan 16 09:29:21 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 16 Jan 2005 09:29:21 +1100 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <20050111221723.GE23690@austin.ibm.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> <16866.18083.212727.327170@cargo.ozlabs.ibm.com> <20050110174716.GW22274@austin.ibm.com> <16866.63132.352016.732484@cargo.ozlabs.ibm.com> <20050111000845.GC14239@krispykreme.ozlabs.ibm.com> <20050111221723.GE23690@austin.ibm.com> Message-ID: <1105828161.27410.84.camel@gaston> On Tue, 2005-01-11 at 16:17 -0600, Linas Vepstas wrote: > On Tue, Jan 11, 2005 at 11:08:45AM +1100, Anton Blanchard was heard to remark: > > > > I've seen HPC stuff that wants to be able to mmap a PCI card's resources into > > userspace. Their hack on ppc64 was to look at the high nibble of the > > address and convert it to a non EEH address if required :) > > > > I'm not sure how best to solve the userspace mmap issue but there are a > > few groups wanting that. > > Somewhat off-topic ... but ... > > 1) If you design your hardware correctly, there are some amazing things > you can do (performance wise) by mmaping pci card resources into user > space. If your hardware is done right, then user corruption can't > hurt the system. This was the de facto method for getting high > performance graphics on IBM RS/6000, sgi, HP and Sun workstations > many moons ago. And that's exactly what X does still today on pretty much all machines :) > 2) There is interest in the virtual i/o community about mmaping > funky stuff to userspace, but that conversation may be for a > different day.
The question is (for example) how to build > a high-performance virtual scsi server in userspace (without > kernel pieces) which is a design point some people like. > Later... > > --linas > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev -- Benjamin Herrenschmidt From benh at kernel.crashing.org Sun Jan 16 09:36:37 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 16 Jan 2005 09:36:37 +1100 Subject: [PATCH] htab code cleanup In-Reply-To: <20050106145102.0c3c60ad.sfr@canb.auug.org.au> References: <20050106145102.0c3c60ad.sfr@canb.auug.org.au> Message-ID: <1105828597.27435.88.camel@gaston> On Thu, 2005-01-06 at 14:51 +1100, Stephen Rothwell wrote: > Hi all, > > This patch just does some small clean ups on the hash page table code > - make htab_address static with in htab_native.c > - move some code that depended on CONFIG_PPC_MULTIPLATFORM > from htab_utils.c to htab_native.c (on less CONFIG check). > - clean up includes in htab_utils.c I don't see the point of moving create_pte_mapping() and htab_initialize() to htab_native.c since it contains code for both native and non-native... If you want to get rid of the htab_address, then maybe split htab_initialize in bits... like htab_native_init() and htab_plpar_init() for the early ptr setup, that sort of thing ... Ben. From benh at kernel.crashing.org Sun Jan 16 09:44:27 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 16 Jan 2005 09:44:27 +1100 Subject: [PATCH] sparse fixes for cpu feature constants In-Reply-To: <20050101223345.GC2297@zax> References: <1104381206.16694.38.camel@localhost.localdomain> <20050101223345.GC2297@zax> Message-ID: <1105829067.27411.92.camel@gaston> On Sun, 2005-01-02 at 09:33 +1100, David Gibson wrote: > > switch_mm() uses a BEGIN_FTR_SECTION ... 
> > END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) which gets broken by the change > > since 0x0000000000000008UL winds up in the generated assembly. I > > couldn't find the BEGIN/END_FTR_SECTION construct used in any other C > > code, so I replaced this with the usual bitwise 'and' conditional (I > > hope someone else will verify that this is equivalent :). > > > > So, does this look like the right thing to do? It eliminates 129 sparse > > warnings from a defconfig 2.6.10 build. Hrm... it's a bit annoying. You are replacing a dynamic patching of the code with a runtime test... killing a (small, though) optimisation. There may be other cases where I want to use the CPU feature stuff in inline assembly..... Not sure what the right fix is, maybe passing the constant to the asm via the inputs as "i" ... Ben. From paulus at samba.org Sun Jan 16 09:54:22 2005 From: paulus at samba.org (Paul Mackerras) Date: Sun, 16 Jan 2005 09:54:22 +1100 Subject: [patch] spin-nicer-2.6.11-rc1-A0 In-Reply-To: <20050115143805.GA15041@elte.hu> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> <20050115143805.GA15041@elte.hu> Message-ID: <16873.40734.485466.850449@cargo.ozlabs.ibm.com> Ingo Molnar writes: > +BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); > +BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); I don't think this is right - this means that a cpu trying to acquire a read lock will spin while any other cpu has a read lock. We need to invent and use a rwlock_is_write_locked() here. PPC64 and parisc have an is_write_locked() already, and it shouldn't be too hard to do one for the other architectures (i386 wants (signed int)rw->lock <= 0, most other arches seem to need (signed int)rw->lock < 0). > +BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); This one should be rwlock_is_locked, surely? Otherwise the compiler will grizzle about us calling spin_is_locked with a rwlock_t *. Regards, Paul.
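[Archive note] The point above, that a waiting reader should poll only for a writer while a waiting writer must poll for any holder, can be sketched with a toy biased-counter lock in the style the thread attributes to i386. This is a single-threaded, non-atomic illustration only; every toy_* name and TOY_RW_BIAS are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bias value: the counter starts here when nobody holds the lock. */
#define TOY_RW_BIAS 0x01000000

/* Toy rwlock: plain int, no atomics, for illustrating the predicates only. */
typedef struct { int32_t count; } toy_rwlock_t;

static inline void toy_init(toy_rwlock_t *l)       { l->count = TOY_RW_BIAS; }
/* Each reader takes one unit; a writer takes the whole bias. */
static inline void toy_add_reader(toy_rwlock_t *l) { l->count -= 1; }
static inline void toy_add_writer(toy_rwlock_t *l) { l->count -= TOY_RW_BIAS; }

/* A new reader has to wait only when a writer holds the lock (count <= 0). */
static inline bool toy_read_would_block(const toy_rwlock_t *l)
{
    return l->count <= 0;
}

/* A new writer has to wait when anyone, reader or writer, holds the lock. */
static inline bool toy_write_would_block(const toy_rwlock_t *l)
{
    return l->count != TOY_RW_BIAS;
}

/* Small self-check showing why one shared predicate is wrong for readers. */
void toy_selfcheck(void)
{
    toy_rwlock_t l;

    toy_init(&l);
    toy_add_reader(&l);
    assert(!toy_read_would_block(&l));  /* readers can share the lock */
    assert(toy_write_would_block(&l));  /* a writer still has to wait */
}
```

With a single "is locked" test for both cases, the reader in toy_selfcheck() would spin needlessly behind another reader, which is the behaviour objected to above.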
From paulus at samba.org Sun Jan 16 14:04:27 2005 From: paulus at samba.org (Paul Mackerras) Date: Sun, 16 Jan 2005 14:04:27 +1100 Subject: [patch] spin-nicer-2.6.11-rc1-A0 In-Reply-To: <20050115142537.GD10114@elte.hu> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> Message-ID: <16873.55739.214904.473407@cargo.ozlabs.ibm.com> Ingo Molnar writes: > * Paul Mackerras wrote: > > > Secondly we have lost the __spin_yield call that we had on ppc64, > > which is an important optimization when we are running under the > > hypervisor. I can't just put that in cpu_relax because I need to know > > which (virtual) cpu is holding the lock, so that I can tell the > > hypervisor which virtual cpu to give my time slice to. That > > information is stored in the lock variable, which is why __spin_yield > > needs the address of the lock. > > hm, how about calling __spin_yield() from _raw_spin_trylock(), if the > locking attempt was unsuccessful? This might be slightly incorrect if > the locking attempt is not connected to an actual spin-loop, but we do > have other spin-loops with open-coded trylocks that would benefit from > this optimization too. That would help, but we also need to yield while we are polling the lock until it becomes available. Otherwise we will only yield once; if we get another timeslice and the other cpu still hasn't finished with the lock (or another cpu has got it now), we will spin uselessly for the whole of our timeslice. Thus I think we need to yield in the polling loop, whether or not we also yield in _raw_spin_trylock. Regards, Paul. From anton at samba.org Sun Jan 16 16:19:04 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 16 Jan 2005 16:19:04 +1100 Subject: ppc64 xics.c: what is smp_threads_ready exactly used for? 
In-Reply-To: <20050116043356.GM4274@stusta.de> References: <20050116043356.GM4274@stusta.de> Message-ID: <20050116051904.GP6309@krispykreme.ozlabs.ibm.com> Hi, > during a cleanup, I stumbled upon the following: > > > arch/ppc64/kernel/smp.c (in 2.6.11-rc1-mm1) says: > > /* XXX fix this, xics currently relies on it - Anton */ > smp_threads_ready = 1; > > > arch/ppc64/kernel/xics.c is the _only_ place in the whole kernel where > smp_threads_ready is actually used, and this is the _only_ place where > smp_threads_ready ever changes it's value on ppc64. It turns out I was about to submit a patch to remove the ppc64 use of smp_threads_ready. With that patch it makes sense to kill smp_threads_ready completely. Anton From anton at samba.org Sun Jan 16 16:55:23 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 16 Jan 2005 16:55:23 +1100 Subject: [PATCH] ppc64: Remove CONFIG_IRQ_ALL_CPUS In-Reply-To: <20050116051904.GP6309@krispykreme.ozlabs.ibm.com> References: <20050116043356.GM4274@stusta.de> <20050116051904.GP6309@krispykreme.ozlabs.ibm.com> Message-ID: <20050116055523.GQ6309@krispykreme.ozlabs.ibm.com> Replace CONFIG_IRQ_ALL_CPUS with a boot option (noirqdistrib). Compile options arent much use on a distro kernel. This also removes the ppc64 use of smp_threads_ready. I considered removing the option completely but we have had problems in the past with firmware bugs. In those cases the boot option would have helped. Signed-off-by: Anton Blanchard ===== arch/ppc64/Kconfig 1.76 vs edited ===== --- 1.76/arch/ppc64/Kconfig 2005-01-16 09:31:06 +11:00 +++ edited/arch/ppc64/Kconfig 2005-01-16 16:48:43 +11:00 @@ -186,14 +186,6 @@ If you don't know what to do here, say Y. -config IRQ_ALL_CPUS - bool "Distribute interrupts on all CPUs by default" - depends on SMP && PPC_MULTIPLATFORM - help - This option gives the kernel permission to distribute IRQs across - multiple CPUs. Saying N here will route all IRQs to the first - CPU. 
- config NR_CPUS int "Maximum number of CPUs (2-128)" range 2 128 ===== arch/ppc64/kernel/irq.c 1.74 vs edited ===== --- 1.74/arch/ppc64/kernel/irq.c 2005-01-05 13:48:02 +11:00 +++ edited/arch/ppc64/kernel/irq.c 2005-01-16 16:48:47 +11:00 @@ -62,6 +62,7 @@ extern irq_desc_t irq_desc[NR_IRQS]; +int distribute_irqs = 1; int __irq_offset_value; int ppc_spurious_interrupts; unsigned long lpevent_count; @@ -479,3 +480,10 @@ #endif /* CONFIG_IRQSTACKS */ +static int __init setup_noirqdistrib(char *str) +{ + distribute_irqs = 0; + return 1; +} + +__setup("noirqdistrib", setup_noirqdistrib); ===== arch/ppc64/kernel/mpic.c 1.3 vs edited ===== --- 1.3/arch/ppc64/kernel/mpic.c 2004-11-16 14:29:10 +11:00 +++ edited/arch/ppc64/kernel/mpic.c 2005-01-16 16:48:44 +11:00 @@ -765,10 +765,8 @@ #ifdef CONFIG_SMP struct mpic *mpic = mpic_primary; unsigned long flags; -#ifdef CONFIG_IRQ_ALL_CPUS u32 msk = 1 << hard_smp_processor_id(); unsigned int i; -#endif BUG_ON(mpic == NULL); @@ -776,16 +774,16 @@ spin_lock_irqsave(&mpic_lock, flags); -#ifdef CONFIG_IRQ_ALL_CPUS /* let the mpic know we want intrs. default affinity is 0xffffffff * until changed via /proc. That's how it's done on x86. If we want * it differently, then we should make sure we also change the default * values of irq_affinity in irq.c. 
*/ - for (i = 0; i < mpic->num_sources ; i++) - mpic_irq_write(i, MPIC_IRQ_DESTINATION, - mpic_irq_read(i, MPIC_IRQ_DESTINATION) | msk); -#endif /* CONFIG_IRQ_ALL_CPUS */ + if (distribute_irqs) { + for (i = 0; i < mpic->num_sources ; i++) + mpic_irq_write(i, MPIC_IRQ_DESTINATION, + mpic_irq_read(i, MPIC_IRQ_DESTINATION) | msk); + } /* Set current processor priority to 0 */ mpic_cpu_write(MPIC_CPU_CURRENT_TASK_PRI, 0); ===== arch/ppc64/kernel/pSeries_smp.c 1.9 vs edited ===== --- 1.9/arch/ppc64/kernel/pSeries_smp.c 2005-01-12 11:42:40 +11:00 +++ edited/arch/ppc64/kernel/pSeries_smp.c 2005-01-16 16:48:44 +11:00 @@ -259,7 +259,6 @@ if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) vpa_init(cpu); -#ifdef CONFIG_IRQ_ALL_CPUS /* * Put the calling processor into the GIQ. This is really only * necessary from a secondary thread as the OF start-cpu interface @@ -267,7 +266,6 @@ */ rtas_set_indicator(GLOBAL_INTERRUPT_QUEUE, (1UL << interrupt_server_size) - 1 - default_distrib_server, 1); -#endif } static spinlock_t timebase_lock = SPIN_LOCK_UNLOCKED; ===== arch/ppc64/kernel/smp.c 1.104 vs edited ===== --- 1.104/arch/ppc64/kernel/smp.c 2005-01-12 11:42:39 +11:00 +++ edited/arch/ppc64/kernel/smp.c 2005-01-16 16:48:45 +11:00 @@ -526,9 +526,6 @@ smp_ops->setup_cpu(boot_cpuid); - /* XXX fix this, xics currently relies on it - Anton */ - smp_threads_ready = 1; - set_cpus_allowed(current, old_mask); /* ===== arch/ppc64/kernel/xics.c 1.57 vs edited ===== --- 1.57/arch/ppc64/kernel/xics.c 2005-01-12 11:42:40 +11:00 +++ edited/arch/ppc64/kernel/xics.c 2005-01-16 16:48:45 +11:00 @@ -242,28 +242,24 @@ static int get_irq_server(unsigned int irq) { unsigned int server; - -#ifdef CONFIG_IRQ_ALL_CPUS /* For the moment only implement delivery to all cpus or one cpu */ - if (smp_threads_ready) { - cpumask_t cpumask = irq_affinity[irq]; - cpumask_t tmp = CPU_MASK_NONE; - if (cpus_equal(cpumask, CPU_MASK_ALL)) { - server = default_distrib_server; - } else { - cpus_and(tmp, cpu_online_map, 
cpumask); + cpumask_t cpumask = irq_affinity[irq]; + cpumask_t tmp = CPU_MASK_NONE; + + if (!distribute_irqs) + return default_server; - if (cpus_empty(tmp)) - server = default_distrib_server; - else - server = get_hard_smp_processor_id(first_cpu(tmp)); - } + if (cpus_equal(cpumask, CPU_MASK_ALL)) { + server = default_distrib_server; } else { - server = default_server; + cpus_and(tmp, cpu_online_map, cpumask); + + if (cpus_empty(tmp)) + server = default_distrib_server; + else + server = get_hard_smp_processor_id(first_cpu(tmp)); } -#else - server = default_server; -#endif + return server; } ===== include/asm-ppc64/irq.h 1.11 vs edited ===== --- 1.11/include/asm-ppc64/irq.h 2004-10-23 11:44:19 +10:00 +++ edited/include/asm-ppc64/irq.h 2005-01-16 16:48:47 +11:00 @@ -87,6 +87,8 @@ return irq; } +extern int distribute_irqs; + struct irqaction; struct pt_regs; From bunk at stusta.de Sun Jan 16 15:33:56 2005 From: bunk at stusta.de (Adrian Bunk) Date: Sun, 16 Jan 2005 05:33:56 +0100 Subject: ppc64 xics.c: what is smp_threads_ready exactly used for? Message-ID: <20050116043356.GM4274@stusta.de> Hi Anton, during a cleanup, I stumbled upon the following: arch/ppc64/kernel/smp.c (in 2.6.11-rc1-mm1) says: /* XXX fix this, xics currently relies on it - Anton */ smp_threads_ready = 1; arch/ppc64/kernel/xics.c is the _only_ place in the whole kernel where smp_threads_ready is actually used, and this is the _only_ place where smp_threads_ready ever changes it's value on ppc64. I have to admit I'm a bit lost in the sequence of function calls on ppc64. Is it possible to make any assumptions about the ordering of the assignment and the usage of smp_threads_ready? TIA Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. 
Buck - Dragon Seed From bunk at stusta.de Sun Jan 16 18:24:39 2005 From: bunk at stusta.de (Adrian Bunk) Date: Sun, 16 Jan 2005 08:24:39 +0100 Subject: [PATCH] ppc64: Remove CONFIG_IRQ_ALL_CPUS In-Reply-To: <20050116055523.GQ6309@krispykreme.ozlabs.ibm.com> References: <20050116043356.GM4274@stusta.de> <20050116051904.GP6309@krispykreme.ozlabs.ibm.com> <20050116055523.GQ6309@krispykreme.ozlabs.ibm.com> Message-ID: <20050116072439.GS4274@stusta.de> On Sun, Jan 16, 2005 at 04:55:23PM +1100, Anton Blanchard wrote: > > Replace CONFIG_IRQ_ALL_CPUS with a boot option (noirqdistrib). Compile > options arent much use on a distro kernel. This also removes the ppc64 > use of smp_threads_ready. >... Seems perfect for me. :-) I'll simply state that my patch depends on ppc64 on your patch. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From bunk at stusta.de Sun Jan 16 16:26:56 2005 From: bunk at stusta.de (Adrian Bunk) Date: Sun, 16 Jan 2005 06:26:56 +0100 Subject: ppc64 xics.c: what is smp_threads_ready exactly used for? In-Reply-To: <20050116051904.GP6309@krispykreme.ozlabs.ibm.com> References: <20050116043356.GM4274@stusta.de> <20050116051904.GP6309@krispykreme.ozlabs.ibm.com> Message-ID: <20050116052655.GN4274@stusta.de> On Sun, Jan 16, 2005 at 04:19:04PM +1100, Anton Blanchard wrote: > > Hi, Hi Anton, > > during a cleanup, I stumbled upon the following: > > > > > > arch/ppc64/kernel/smp.c (in 2.6.11-rc1-mm1) says: > > > > /* XXX fix this, xics currently relies on it - Anton */ > > smp_threads_ready = 1; > > > > > > arch/ppc64/kernel/xics.c is the _only_ place in the whole kernel where > > smp_threads_ready is actually used, and this is the _only_ place where > > smp_threads_ready ever changes it's value on ppc64. > > It turns out I was about to submit a patch to remove the ppc64 use of > smp_threads_ready. 
With that patch it makes sense to kill > smp_threads_ready completely. I've got a patch ready to remove smp_threads_ready on all architectures. The only part I still need is how to replace it in xics.c, since this is the only read access to this variable on all architectures. Could you send me this part for inclusion into my patch? > Anton TIA Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From zwane at arm.linux.org.uk Mon Jan 17 15:37:28 2005 From: zwane at arm.linux.org.uk (Zwane Mwaikambo) Date: Sun, 16 Jan 2005 21:37:28 -0700 (MST) Subject: [PATCH] PPC64 pmac hotplug cpu In-Reply-To: <1105827794.27410.82.camel@gaston> References: <1105827794.27410.82.camel@gaston> Message-ID: Hello Ben, On Sun, 16 Jan 2005, Benjamin Herrenschmidt wrote: > Looks good, but you could do even better :) I still want to look at the > proper mechanism to flush the CPU cache on 970, but the idea here is to > flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode) > with the caches clean and MSR:EE off. We can later get it back with a > soft reset. Thanks for the suggestions! I'll work on getting something together. Zwane From benh at kernel.crashing.org Mon Jan 17 15:47:46 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 17 Jan 2005 15:47:46 +1100 Subject: [PATCH] PPC64 pmac hotplug cpu In-Reply-To: References: <1105827794.27410.82.camel@gaston> Message-ID: <1105937266.4534.0.camel@gaston> On Sun, 2005-01-16 at 21:37 -0700, Zwane Mwaikambo wrote: > Hello Ben, > > On Sun, 16 Jan 2005, Benjamin Herrenschmidt wrote: > > > Looks good, but you could do even better :) I still want to look at the > > proper mechanism to flush the CPU cache on 970, but the idea here is to > > flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode) > > with the caches clean and MSR:EE off.
We can later get it back with a > > soft reset. > > Thanks for the suggestions! I'll work on getting something together. Well.. the cache flush part requires some not-really-documented stuff on the 970, but I'll try to come up with something. Ben. From zwane at arm.linux.org.uk Mon Jan 17 16:35:05 2005 From: zwane at arm.linux.org.uk (Zwane Mwaikambo) Date: Sun, 16 Jan 2005 22:35:05 -0700 (MST) Subject: [PATCH] PPC64 pmac hotplug cpu In-Reply-To: <1105937266.4534.0.camel@gaston> References: <1105827794.27410.82.camel@gaston> <1105937266.4534.0.camel@gaston> Message-ID: On Mon, 17 Jan 2005, Benjamin Herrenschmidt wrote: > On Sun, 2005-01-16 at 21:37 -0700, Zwane Mwaikambo wrote: > > Hello Ben, > > > > On Sun, 16 Jan 2005, Benjamin Herrenschmidt wrote: > > > > > Looks good, but you could do even better :) I still want to look at the > > > proper mechanism to flush the CPU cache on 970, but the idea here is to > > > flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode) > > > with the caches clean and MSR:EE off. We can later get it back with a > > > soft reset. > > > > Thanks for the suggestions! I'll work on getting something together. > > Well.. the cache flush part requires some not-really-documented stuff on > the 970, but I'll try to come up with something.
I was waiting for you to say that ;) Thanks, Zwane From mingo at elte.hu Mon Jan 17 22:32:17 2005 From: mingo at elte.hu (Ingo Molnar) Date: Mon, 17 Jan 2005 12:32:17 +0100 Subject: [patch] spin-nicer-2.6.11-rc1-A1 In-Reply-To: <16873.40734.485466.850449@cargo.ozlabs.ibm.com> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> <20050115143805.GA15041@elte.hu> <16873.40734.485466.850449@cargo.ozlabs.ibm.com> Message-ID: <20050117113217.GA14619@elte.hu> * Paul Mackerras wrote: > > +BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); > > +BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); > > I don't think this is right - this means that a cpu trying to acquire > a read lock will spin while any other cpu has a read lock. We need to > invent and use a rwlock_is_write_locked() here. PPC64 and parisc have > an is_write_locked() already, and it shouldn't be too hard to do one > for the other architectures (i386 wants (signed int)rw->lock <= 0, > most other arches seem to need (signed int)rw->lock < 0). > > > +BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); > > This one should be rwlock_is_locked, surely? Otherwise the compiler > will grizzle about us calling spin_is_locked with a rwlock_t *. you are right on both counts. The patch below, ontop of current BK, fixes both problems. the first fix is that there was no compiler warning on x86 because it uses macros - i fixed this by changing the spinlock field to be '->slock'. (we could also use inline functions to get type protection, i chose this solution because it was the easiest to do.) the second fix is to split rwlock_is_locked() into two functions: +/** + * read_is_locked - would read_trylock() fail? + * @lock: the rwlock in question. + */ +#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0) + +/** + * write_is_locked - would write_trylock() fail? + * @lock: the rwlock in question. 
+ */ +#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS) this canonical naming of them also enabled the elimination of the newly added 'is_locked_fn' argument to the BUILD_LOCK_OPS macro. the third change was to change the other user of rwlock_is_locked(), and to put a migration helper there: architectures that dont have read/write_is_locked defined yet will get a #warning message but the build will succeed. (except if PREEMPT is enabled - there we really need.) compile and boot-tested on x86, on SMP and UP, PREEMPT and !PREEMPT. Non-x86 architectures should work fine, except PREEMPT+SMP builds which will need the read_is_locked()/write_is_locked() definitions. !PREEMPT+SMP builds will work fine and will produce a #warning. Ingo Signed-off-by: Ingo Molnar --- linux/kernel/spinlock.c.orig +++ linux/kernel/spinlock.c @@ -173,7 +173,7 @@ EXPORT_SYMBOL(_write_lock); * (We do this in a function because inlining it would be excessive.) */ -#define BUILD_LOCK_OPS(op, locktype, is_locked_fn) \ +#define BUILD_LOCK_OPS(op, locktype) \ void __lockfunc _##op##_lock(locktype *lock) \ { \ preempt_disable(); \ @@ -183,7 +183,7 @@ void __lockfunc _##op##_lock(locktype *l preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - while (is_locked_fn(lock) && (lock)->break_lock) \ + while (op##_is_locked(lock) && (lock)->break_lock) \ cpu_relax(); \ preempt_disable(); \ } \ @@ -205,7 +205,7 @@ unsigned long __lockfunc _##op##_lock_ir preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - while (is_locked_fn(lock) && (lock)->break_lock) \ + while (op##_is_locked(lock) && (lock)->break_lock) \ cpu_relax(); \ preempt_disable(); \ } \ @@ -246,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh) * _[spin|read|write]_lock_irqsave() * _[spin|read|write]_lock_bh() */ -BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); -BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); -BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); +BUILD_LOCK_OPS(spin, spinlock_t); 
+BUILD_LOCK_OPS(read, rwlock_t); +BUILD_LOCK_OPS(write, rwlock_t); #endif /* CONFIG_PREEMPT */ --- linux/include/asm-i386/spinlock.h.orig +++ linux/include/asm-i386/spinlock.h @@ -15,7 +15,7 @@ asmlinkage int printk(const char * fmt, */ typedef struct { - volatile unsigned int lock; + volatile unsigned int slock; #ifdef CONFIG_DEBUG_SPINLOCK unsigned magic; #endif @@ -43,7 +43,7 @@ typedef struct { * We make no fairness assumptions. They have a cost. */ -#define spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0) +#define spin_is_locked(x) (*(volatile signed char *)(&(x)->slock) <= 0) #define spin_unlock_wait(x) do { barrier(); } while(spin_is_locked(x)) #define spin_lock_string \ @@ -83,7 +83,7 @@ typedef struct { #define spin_unlock_string \ "movb $1,%0" \ - :"=m" (lock->lock) : : "memory" + :"=m" (lock->slock) : : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -101,7 +101,7 @@ static inline void _raw_spin_unlock(spin #define spin_unlock_string \ "xchgb %b0, %1" \ - :"=q" (oldval), "=m" (lock->lock) \ + :"=q" (oldval), "=m" (lock->slock) \ :"0" (oldval) : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -123,7 +123,7 @@ static inline int _raw_spin_trylock(spin char oldval; __asm__ __volatile__( "xchgb %b0,%1" - :"=q" (oldval), "=m" (lock->lock) + :"=q" (oldval), "=m" (lock->slock) :"0" (0) : "memory"); return oldval > 0; } @@ -138,7 +138,7 @@ static inline void _raw_spin_lock(spinlo #endif __asm__ __volatile__( spin_lock_string - :"=m" (lock->lock) : : "memory"); + :"=m" (lock->slock) : : "memory"); } static inline void _raw_spin_lock_flags (spinlock_t *lock, unsigned long flags) @@ -151,7 +151,7 @@ static inline void _raw_spin_lock_flags #endif __asm__ __volatile__( spin_lock_string_flags - :"=m" (lock->lock) : "r" (flags) : "memory"); + :"=m" (lock->slock) : "r" (flags) : "memory"); } /* @@ -186,7 +186,17 @@ typedef struct { #define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0) -#define 
rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS) +/** + * read_is_locked - would read_trylock() fail? + * @lock: the rwlock in question. + */ +#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0) + +/** + * write_is_locked - would write_trylock() fail? + * @lock: the rwlock in question. + */ +#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS) /* * On x86, we implement read-write locks as a 32-bit counter --- linux/kernel/exit.c.orig +++ linux/kernel/exit.c @@ -861,8 +861,12 @@ task_t fastcall *next_thread(const task_ #ifdef CONFIG_SMP if (!p->sighand) BUG(); +#ifndef write_is_locked +# warning please implement read_is_locked()/write_is_locked()! +# define write_is_locked rwlock_is_locked +#endif if (!spin_is_locked(&p->sighand->siglock) && - !rwlock_is_locked(&tasklist_lock)) + !write_is_locked(&tasklist_lock)) BUG(); #endif return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID); From mingo at elte.hu Mon Jan 17 23:42:09 2005 From: mingo at elte.hu (Ingo Molnar) Date: Mon, 17 Jan 2005 13:42:09 +0100 Subject: [patch] spin-yield-2.6.11-rc1-A1 In-Reply-To: <16873.55739.214904.473407@cargo.ozlabs.ibm.com> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> <16873.55739.214904.473407@cargo.ozlabs.ibm.com> Message-ID: <20050117124209.GA20796@elte.hu> * Paul Mackerras wrote: > > hm, how about calling __spin_yield() from _raw_spin_trylock(), if the > > locking attempt was unsuccessful? This might be slightly incorrect if > > the locking attempt is not connected to an actual spin-loop, but we do > > have other spin-loops with open-coded trylocks that would benefit from > > this optimization too. > > That would help, but we also need to yield while we are polling the > lock until it becomes available. 
Otherwise we will only yield once; > if we get another timeslice and the other cpu still hasn't finished > with the lock (or another cpu has got it now), we will spin uselessly > for the whole of our timeslice. Thus I think we need to yield in the > polling loop, whether or not we also yield in _raw_spin_trylock. ok - how about the (raw) patch below? (ontop of BK plus the latest spin-nicer patch i sent earlier.) It builds/boots on x86 but is untested on ppc64. the idea is to make spin_yield() a generic function, with some related namespace cleanups. Ingo Acked-by: Ingo Molnar --- linux/kernel/exit.c.orig +++ linux/kernel/exit.c @@ -861,8 +861,12 @@ task_t fastcall *next_thread(const task_ #ifdef CONFIG_SMP if (!p->sighand) BUG(); +#ifndef write_is_locked +# warning please implement read_is_locked()/write_is_locked()! +# define write_is_locked rwlock_is_locked +#endif if (!spin_is_locked(&p->sighand->siglock) && - !rwlock_is_locked(&tasklist_lock)) + !write_is_locked(&tasklist_lock)) BUG(); #endif return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID); --- linux/kernel/spinlock.c.orig +++ linux/kernel/spinlock.c @@ -173,8 +173,8 @@ EXPORT_SYMBOL(_write_lock); * (We do this in a function because inlining it would be excessive.) 
*/ -#define BUILD_LOCK_OPS(op, locktype, is_locked_fn) \ -void __lockfunc _##op##_lock(locktype *lock) \ +#define BUILD_LOCK_OPS(op, locktype) \ +void __lockfunc _##op##_lock(locktype##_t *lock) \ { \ preempt_disable(); \ for (;;) { \ @@ -183,15 +183,15 @@ void __lockfunc _##op##_lock(locktype *l preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - while (is_locked_fn(lock) && (lock)->break_lock) \ - cpu_relax(); \ + while (op##_is_locked(lock) && (lock)->break_lock) \ + locktype##_yield(lock); \ preempt_disable(); \ } \ } \ \ EXPORT_SYMBOL(_##op##_lock); \ \ -unsigned long __lockfunc _##op##_lock_irqsave(locktype *lock) \ +unsigned long __lockfunc _##op##_lock_irqsave(locktype##_t *lock) \ { \ unsigned long flags; \ \ @@ -205,8 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - while (is_locked_fn(lock) && (lock)->break_lock) \ - cpu_relax(); \ + while (op##_is_locked(lock) && (lock)->break_lock) \ + locktype##_yield(lock); \ preempt_disable(); \ } \ return flags; \ @@ -214,14 +214,14 @@ unsigned long __lockfunc _##op##_lock_ir \ EXPORT_SYMBOL(_##op##_lock_irqsave); \ \ -void __lockfunc _##op##_lock_irq(locktype *lock) \ +void __lockfunc _##op##_lock_irq(locktype##_t *lock) \ { \ _##op##_lock_irqsave(lock); \ } \ \ EXPORT_SYMBOL(_##op##_lock_irq); \ \ -void __lockfunc _##op##_lock_bh(locktype *lock) \ +void __lockfunc _##op##_lock_bh(locktype##_t *lock) \ { \ unsigned long flags; \ \ @@ -246,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh) * _[spin|read|write]_lock_irqsave() * _[spin|read|write]_lock_bh() */ -BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); -BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); -BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); +BUILD_LOCK_OPS(spin, spinlock); +BUILD_LOCK_OPS(read, rwlock); +BUILD_LOCK_OPS(write, rwlock); #endif /* CONFIG_PREEMPT */ --- linux/include/asm-i386/spinlock.h.orig +++ linux/include/asm-i386/spinlock.h @@ -7,6 +7,8 @@ 
#include #include +#include + asmlinkage int printk(const char * fmt, ...) __attribute__ ((format (printf, 1, 2))); @@ -15,7 +17,7 @@ asmlinkage int printk(const char * fmt, */ typedef struct { - volatile unsigned int lock; + volatile unsigned int slock; #ifdef CONFIG_DEBUG_SPINLOCK unsigned magic; #endif @@ -43,7 +45,7 @@ typedef struct { * We make no fairness assumptions. They have a cost. */ -#define spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0) +#define spin_is_locked(x) (*(volatile signed char *)(&(x)->slock) <= 0) #define spin_unlock_wait(x) do { barrier(); } while(spin_is_locked(x)) #define spin_lock_string \ @@ -83,7 +85,7 @@ typedef struct { #define spin_unlock_string \ "movb $1,%0" \ - :"=m" (lock->lock) : : "memory" + :"=m" (lock->slock) : : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -101,7 +103,7 @@ static inline void _raw_spin_unlock(spin #define spin_unlock_string \ "xchgb %b0, %1" \ - :"=q" (oldval), "=m" (lock->lock) \ + :"=q" (oldval), "=m" (lock->slock) \ :"0" (oldval) : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -123,7 +125,7 @@ static inline int _raw_spin_trylock(spin char oldval; __asm__ __volatile__( "xchgb %b0,%1" - :"=q" (oldval), "=m" (lock->lock) + :"=q" (oldval), "=m" (lock->slock) :"0" (0) : "memory"); return oldval > 0; } @@ -138,7 +140,7 @@ static inline void _raw_spin_lock(spinlo #endif __asm__ __volatile__( spin_lock_string - :"=m" (lock->lock) : : "memory"); + :"=m" (lock->slock) : : "memory"); } static inline void _raw_spin_lock_flags (spinlock_t *lock, unsigned long flags) @@ -151,7 +153,7 @@ static inline void _raw_spin_lock_flags #endif __asm__ __volatile__( spin_lock_string_flags - :"=m" (lock->lock) : "r" (flags) : "memory"); + :"=m" (lock->slock) : "r" (flags) : "memory"); } /* @@ -186,7 +188,17 @@ typedef struct { #define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0) -#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS) +/** + * read_is_locked - 
would read_trylock() fail? + * @lock: the rwlock in question. + */ +#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0) + +/** + * write_is_locked - would write_trylock() fail? + * @lock: the rwlock in question. + */ +#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS) /* * On x86, we implement read-write locks as a 32-bit counter From cfriesen at nortelnetworks.com Tue Jan 18 02:14:42 2005 From: cfriesen at nortelnetworks.com (Chris Friesen) Date: Mon, 17 Jan 2005 09:14:42 -0600 Subject: [PATCH] PPC64 pmac hotplug cpu In-Reply-To: <1105937266.4534.0.camel@gaston> References: <1105827794.27410.82.camel@gaston> <1105937266.4534.0.camel@gaston> Message-ID: <41EBD662.1080409@nortelnetworks.com> Benjamin Herrenschmidt wrote: > Well.. the cache flush part requires some not-really-documentd stuff on > the 970, but I'll try to come up with something. Details? We've got a cache-flush routine put together based on the documentation that seems to be working, but if there's something else that has to be done I'd love to know about it. Chris From dhowells at redhat.com Tue Jan 18 03:27:19 2005 From: dhowells at redhat.com (David Howells) Date: Mon, 17 Jan 2005 16:27:19 +0000 Subject: [PATCH] Fix kallsyms/insmod/rmmod race Message-ID: <31453.1105979239@redhat.com> The attached patch fixes a race between kallsyms and insmod/rmmod. The problem is this: (1) The various kallsyms functions poke around in the module list without any locking so that they can be called from the oops handler. (2) Although insmod and rmmod use locks to exclude each other, these have no effect on the kallsyms function. (3) Although rmmod modifies the module state with the machine "stopped", it hasn't removed the metadata from the module metadata list, meaning that as soon as the machine is "restarted", the metadata can be observed by kallsyms. 
It's not possible to say that an item in that list should be ignored if
its state is marked as inactive - you can't get at the state information
because you can't trust the metadata in which it is embedded.
Furthermore, list linkage information is embedded in the metadata too,
so you can't trust that either...

 (4) kallsyms may be walking the module list without a lock whilst
     either insmod or rmmod is busy changing it. insmod probably isn't a
     problem since nothing is going away, but rmmod is, as it's deleting
     an entry.

 (5) Therefore nothing that uses these functions can in any way trust
     any pointers to "static" data (such as module symbol names or
     module names) that are returned.

 (6) On ppc64 the problems are exacerbated since the hypervisor may
     reschedule bits of the kernel, making operations that appear
     adjacent occur a long time apart.

This patch fixes the race by only linking/unlinking modules into/from
the master module list with the machine in the "stopped" state. This
means that any "static" information can be trusted as far as the next
kernel reschedule on any given CPU without the need to hold any locks.

However, I'm not sure how this is affected by preemption. I suspect more
work may need to be done in that case, but I'm not entirely sure.

This also means that rmmod has to bump the machine into the stopped
state twice... but since that shouldn't be a common operation, I don't
think that's a problem.
Signed-Off-By: David Howells --- warthog>diffstat kallsyms-race-2611rc1.diff kallsyms.c | 16 ++++++++++++++-- module.c | 35 ++++++++++++++++++++++++++++------- 2 files changed, 42 insertions(+), 9 deletions(-) diff -uNrp linux-2.6.11-rc1/kernel/kallsyms.c linux-2.6.11-rc1-kallsyms/kernel/kallsyms.c --- linux-2.6.11-rc1/kernel/kallsyms.c 2005-01-12 19:09:18.000000000 +0000 +++ linux-2.6.11-rc1-kallsyms/kernel/kallsyms.c 2005-01-17 15:33:55.000000000 +0000 @@ -139,13 +139,20 @@ unsigned long kallsyms_lookup_name(const return module_kallsyms_lookup_name(name); } -/* Lookup an address. modname is set to NULL if it's in the kernel. */ +/* + * Lookup an address + * - modname is set to NULL if it's in the kernel + * - we guarantee that the returned name is valid until we reschedule even if + * it resides in a module + * - we also guarantee that modname will be valid until rescheduled + */ const char *kallsyms_lookup(unsigned long addr, unsigned long *symbolsize, unsigned long *offset, char **modname, char *namebuf) { unsigned long i, low, high, mid; + const char *msym; /* This kernel should never had been booted. */ BUG_ON(!kallsyms_addresses); @@ -196,7 +203,12 @@ const char *kallsyms_lookup(unsigned lon return namebuf; } - return module_address_lookup(addr, symbolsize, offset, modname); + /* see if it's in a module */ + msym = module_address_lookup(addr, symbolsize, offset, modname); + if (msym) + return strncpy(namebuf, msym, KSYM_NAME_LEN); + + return NULL; } /* Replace "%s" in format with address, or returns -errno. 
*/ diff -uNrp linux-2.6.11-rc1/kernel/module.c linux-2.6.11-rc1-kallsyms/kernel/module.c --- linux-2.6.11-rc1/kernel/module.c 2005-01-12 19:09:18.000000000 +0000 +++ linux-2.6.11-rc1-kallsyms/kernel/module.c 2005-01-17 15:31:42.000000000 +0000 @@ -1072,14 +1072,24 @@ static void mod_kobject_remove(struct mo kobject_unregister(&mod->mkobj.kobj); } +/* + * unlink the module with the whole machine is stopped with interrupts off + * - this defends against kallsyms not taking locks + */ +static inline int __unlink_module(void *_mod) +{ + struct module *mod = _mod; + spin_lock(&modlist_lock); + list_del(&mod->list); + spin_unlock(&modlist_lock); + return 0; +} + /* Free a module, remove from lists, etc (must hold module mutex). */ static void free_module(struct module *mod) { /* Delete from various lists */ - spin_lock_irq(&modlist_lock); - list_del(&mod->list); - spin_unlock_irq(&modlist_lock); - + stop_machine_run(__unlink_module, mod, NR_CPUS); remove_sect_attrs(mod); mod_kobject_remove(mod); @@ -1732,6 +1742,19 @@ static struct module *load_module(void _ goto free_hdr; } +/* + * link the module with the whole machine is stopped with interrupts off + * - this defends against kallsyms not taking locks + */ +static inline int __link_module(void *_mod) +{ + struct module *mod = _mod; + spin_lock(&modlist_lock); + list_add(&mod->list, &modules); + spin_unlock(&modlist_lock); + return 0; +} + /* This is where the real work happens */ asmlinkage long sys_init_module(void __user *umod, @@ -1766,9 +1789,7 @@ sys_init_module(void __user *umod, /* Now sew it into the lists. They won't access us, since strong_try_module_get() will fail. 
*/ - spin_lock_irq(&modlist_lock); - list_add(&mod->list, &modules); - spin_unlock_irq(&modlist_lock); + stop_machine_run(__link_module, mod, NR_CPUS); /* Drop lock so they can recurse */ up(&module_mutex); From willschm at us.ibm.com Tue Jan 18 03:42:05 2005 From: willschm at us.ibm.com (Will Schmidt) Date: Mon, 17 Jan 2005 10:42:05 -0600 Subject: question about LMB's size In-Reply-To: Message-ID: Hi, ipseries-list-bounces at redhat.com wrote on 01/17/2005 05:00:46 AM: > Hi, > This is a question about the different of memory size between lpar and HMC. ... > 2. In lpar didolp2: We get the size of memory is 2174672KB. > [root at didolp2 ~]# cat /proc/meminfo > MemTotal: 2174672 kB > > The question is: 2174672/(32*1024) = 66.36572265625 MemTotal is the amount of free memory in the partition, which does not include the memory that holds the kernel code, (bss, data, init). There should be a few other pieces of data that will add up to the numbers you are looking for. in early boot messages, there is a line "SystemCfg->physicalMemorySize = 0x.......". This value should be precisely what you are trying to measure. A bit later in the logs, you can also see a line "Memory: XXXXk/YYYYk available (###k kernel code, ###k reserved, ###k data, ###k bss, ###k init). the YYYYk should also match what you are looking for. > > whereas 2176/32=68. 
> > 68 != 66.36572265625 > > -------------------------------------------- > Wang Zhaoyu > > Email: wangzyu at cn.ibm.com > Notes: Zhao Yu Wang/China/Contr/IBM at IBMCN-- > ipseries-list mailing list > ipseries-list at redhat.com > https://www.redhat.com/mailman/listinfo/ipseries-list -Will From linas at austin.ibm.com Tue Jan 18 07:14:15 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Mon, 17 Jan 2005 14:14:15 -0600 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <20050106192413.GK22274@austin.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> Message-ID: <20050117201415.GA11505@austin.ibm.com> Andrew, The attached file describes PCI bus EEH "Extended Error Handling" concepts and operation; could you drop this into the kernel documentation tree, at linux-2.6/Documentation/powerpc/eeh-pci-error-recovery.txt ? Signed-off-by: Linas Vepstas --linas p.s. It was not clear to me if the EEH patch previously sent (6 January 2005, same subject line) will be wending its way into the main Torvalds kernel tree, or not. I hadn't really gotten confirmation one way or another. -------------- next part -------------- PCI Bus EEH Error Recovery -------------------------- Linas Vepstas 12 January 2005 Overview: --------- The IBM POWER-based pSeries and iSeries computers include PCI bus controller chips that have extended capabilities for detecting and reporting a large variety of PCI bus error conditions. These features go under the name of "EEH", for "Extended Error Handling". The EEH hardware features allow PCI bus errors to be cleared and a PCI card to be "rebooted", without also having to reboot the operating system. This is in contrast to traditional PCI error handling, where the PCI chip is wired directly to the CPU, and an error would cause a CPU machine-check/check-stop condition, halting the CPU entirely. 
Another "traditional" technique is to ignore such errors, which can lead
to corruption of both user data and kernel data, hung/unresponsive
adapters, or system crashes/lockups. Thus, the idea behind EEH is that
the operating system can become more reliable and robust by protecting
it from PCI errors, and giving the OS the ability to "reboot"/recover
individual PCI devices.

Future systems from other vendors, based on the PCI-E specification,
may contain similar features.


Causes of EEH Errors
--------------------
EEH was originally designed to guard against hardware failure, such
as PCI cards dying from heat, humidity, dust, vibration and bad
electrical connections. The vast majority of EEH errors seen in
"real life" are due to either poorly seated PCI cards, or,
unfortunately quite commonly, due to device driver bugs, device firmware
bugs, and sometimes PCI card hardware bugs.

The most common software bug is one that causes the device to
attempt to DMA to a location in system memory that has not been
reserved for DMA access for that card. Catching this is a powerful
feature, as it prevents what would otherwise have been silent memory
corruption caused by the bad DMA. A number of device driver
bugs have been found and fixed in this way over the past few
years. Other possible causes of EEH errors include data or
address line parity errors (for example, due to poor electrical
connectivity due to a poorly seated card), and PCI-X split-completion
errors (due to software, device firmware, or device PCI hardware bugs).
The vast majority of "true hardware failures" can be cured by
physically removing and re-seating the PCI card.


Detection and Recovery
----------------------
In the following discussion, a generic overview of how to detect
and recover from EEH errors will be presented. This is followed
by an overview of how the current implementation in the Linux
kernel does it.
The actual implementation is subject to change, and some of the finer
points are still being debated. These may in turn be swayed if or when
other architectures implement similar functionality.

When a PCI Host Bridge (PHB, the bus controller connecting the
PCI bus to the system CPU electronics complex) detects a PCI error
condition, it will "isolate" the affected PCI card. Isolation
will block all writes (either to the card from the system, or
from the card to the system), and it will cause all reads to
return all-ff's (0xff, 0xffff, 0xffffffff for 8/16/32-bit reads).
This value was chosen because it is the same value you would
get if the device were physically unplugged from the slot.
This includes access to PCI memory, I/O space, and PCI config
space. Interrupts, however, will continue to be delivered.

Detection and recovery are performed with the aid of ppc64
firmware. The programming interfaces in the Linux kernel
into the firmware are referred to as RTAS (Run-Time Abstraction
Services). The Linux kernel does not (should not) access
the EEH function in the PCI chipsets directly, primarily because
there are a number of different chipsets out there, each with
different interfaces and quirks. The firmware provides a
uniform abstraction layer that will work with all pSeries
and iSeries hardware (and be forwards-compatible).

If the OS or device driver suspects that a PCI slot has been
EEH-isolated, there is a firmware call it can make to determine if
this is the case. If so, then the device driver should put itself
into a consistent state (given that it won't be able to complete any
pending work) and start recovery of the card. Recovery normally
would consist of resetting the PCI device (holding the PCI #RST
line high for two seconds), followed by setting up the device
config space (the base address registers (BAR's), latency timer,
cache line size, interrupt line, and so on). This is followed by a
reinitialization of the device driver.
In a worst-case scenario, the power to the card can be toggled, at least
on hot-plug-capable slots. In principle, layers far above the device
driver probably do not need to know that the PCI card has been
"rebooted" in this way; ideally, there should be at most a pause in
Ethernet/disk/USB I/O while the card is being reset.

If the card cannot be recovered after three or four resets, the
kernel/device driver should assume the worst-case scenario, that the
card has died completely, and report this error to the sysadmin.
In addition, error messages are reported through RTAS and also through
syslogd (/var/log/messages) to alert the sysadmin of PCI resets.
The correct way to deal with failed adapters is to use the standard
PCI hotplug tools to remove and replace the dead card.


Current PPC64 Linux EEH Implementation
--------------------------------------
At this time, a generic EEH recovery mechanism has been implemented,
so that individual device drivers do not need to be modified to support
EEH recovery. This generic mechanism piggy-backs on the PCI hotplug
infrastructure, and percolates events up through the hotplug/udev
infrastructure. Following is a detailed description of how this is
accomplished.

EEH must be enabled in the PHB's very early during the boot process,
and again whenever a PCI slot is hot-plugged. The former is performed by
eeh_init() in arch/ppc64/kernel/eeh.c, and the latter by
drivers/pci/hotplug/pSeries_pci.c calling in to the eeh.c code.
EEH must be enabled before a PCI scan of the device can proceed.
Current Power5 hardware will not work unless EEH is enabled, although
older Power4 can run with it disabled. Effectively, EEH can no longer
be turned off. PCI devices *must* be registered with the EEH code; the
EEH code needs to know about the I/O address ranges of the PCI device
in order to detect an error. Given an arbitrary address, the routine
pci_get_device_by_addr() will find the pci device associated
with that address (if any).
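
As a rough illustration of the two ideas above (reads from an isolated
slot come back as all-ff's, and an address has to be mapped back to a
registered device before an error can be attributed to it), here is a
minimal user-space sketch in C. It is not the kernel's implementation:
struct io_range, lookup_dev_by_addr(), is_all_ones() and eeh_candidate()
are hypothetical names invented for this example, and the linear table
scan is an assumption, not a description of how pci_get_device_by_addr()
is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one registered I/O address range of a PCI device. */
struct io_range {
	uint64_t start;		/* first address of the range */
	uint64_t end;		/* last address of the range, inclusive */
	const char *dev;	/* illustrative device label */
};

/* Return the label of the device owning addr, or NULL if none is registered. */
const char *lookup_dev_by_addr(const struct io_range *tbl, size_t n,
			       uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= tbl[i].start && addr <= tbl[i].end)
			return tbl[i].dev;
	return NULL;
}

/* True if a read of the given width (1, 2 or 4 bytes) returned all-ff's. */
int is_all_ones(uint32_t val, unsigned int width)
{
	uint32_t mask;

	assert(width == 1 || width == 2 || width == 4);
	mask = (width == 4) ? 0xffffffffu : ((1u << (8 * width)) - 1u);
	return (val & mask) == mask;
}

/*
 * An all-ff's read from a registered address is only a *candidate* EEH
 * event; per the text, firmware still has to confirm that the slot is
 * really isolated.
 */
int eeh_candidate(const struct io_range *tbl, size_t n,
		  uint64_t addr, uint32_t val, unsigned int width)
{
	return is_all_ones(val, width) &&
	       lookup_dev_by_addr(tbl, n, addr) != NULL;
}
```

The sketch deliberately stops at "candidate": a legitimate register can
read as all-ff's too, which is why the firmware query is needed before
any recovery is started.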
The default include/asm-ppc64/io.h macros readb(), inb(), insb(),
etc. include a check to see if the i/o read returned all-0xff's.
If so, these make a call to eeh_dn_check_failure(), which in turn
asks the firmware if the all-ff's value is the sign of a true EEH
error. If it is not, processing continues as normal. The grand
total number of these false alarms or "false positives" can be
seen in /proc/ppc64/eeh (subject to change). Normally, almost
all of these occur during boot, when the PCI bus is scanned, where
a large number of 0xff reads are part of the bus scan procedure.

If a frozen slot is detected, code in arch/ppc64/kernel/eeh.c will
print a stack trace to syslog (/var/log/messages). This stack trace
has proven to be very useful to device-driver authors for finding
out at what point the EEH error was detected, as the error itself
usually occurs slightly beforehand.

Next, it uses the Linux kernel notifier chain/work queue mechanism to
allow any interested parties to find out about the failure. Device
drivers, or other parts of the kernel, can use
eeh_register_notifier(struct notifier_block *) to find out about EEH
events. The event will include a pointer to the pci device, the
device node and some state info. Receivers of the event can "do as
they wish"; the default handler will be described further in this
section.

To assist in the recovery of the device, eeh.c exports the
following functions:

rtas_set_slot_reset() -- assert the PCI #RST line for 1/8th of a second
rtas_configure_bridge() -- ask firmware to configure any PCI bridges
   located topologically under the pci slot.
eeh_save_bars() and eeh_restore_bars(): save and restore the PCI
   config-space info for a device and any devices under it.

A handler for the EEH notifier_block events is implemented in
drivers/pci/hotplug/pSeries_pci.c, called handle_eeh_events().
It saves the device BAR's and then calls rpaphp_unconfig_pci_adapter().
This last call causes the device driver for the card to be stopped, which causes hotplug events to go out to user space. This triggers user-space scripts that might issue commands such as "ifdown eth0" for ethernet cards, and so on. This handler then sleeps for 5 seconds, hoping to give the user-space scripts enough time to complete. It then resets the PCI card, reconfigures the device BAR's, and any bridges underneath. It then calls rpaphp_enable_pci_slot(), which restarts the device driver and triggers more user-space events (for example, calling "ifup eth0" for ethernet cards). Device Shutdown and User-Space Events ------------------------------------- This section documents what happens when a pci slot is unconfigured, focusing on how the device driver gets shut down, and on how the events get delivered to user-space scripts. Following is an example sequence of events that cause a device driver close function to be called during the first phase of an EEH reset. The following sequence is an example of the pcnet32 device driver. 

    rpaphp_unconfig_pci_adapter (struct slot *)  // in rpaphp_pci.c
    {
      calls
      pci_remove_bus_device (struct pci_dev *) // in /drivers/pci/remove.c
      {
        calls
        pci_destroy_dev (struct pci_dev *)
        {
          calls
          device_unregister (&dev->dev) // in /drivers/base/core.c
          {
            calls
            device_del (struct device *)
            {
              calls
              bus_remove_device() // in /drivers/base/bus.c
              {
                calls
                device_release_driver()
                {
                  calls
                  struct device_driver->remove() which is just
                  pci_device_remove()  // in /drivers/pci/pci_driver.c
                  {
                    calls
                    struct pci_driver->remove() which is just
                    pcnet32_remove_one() // in /drivers/net/pcnet32.c
                    {
                      calls
                      unregister_netdev() // in /net/core/dev.c
                      {
                        calls
                        dev_close()  // in /net/core/dev.c
                        {
                          calls dev->stop();
                          which is just pcnet32_close() // in pcnet32.c
                          {
                            which does what you wanted
                            to stop the device
                          }
                        }
                      }
                      which
                      frees pcnet32 device driver memory
                    }
    }}}}}}


    in drivers/pci/pci_driver.c,
    struct device_driver->remove() is just pci_device_remove()
    which calls struct pci_driver->remove() which is pcnet32_remove_one()
    which calls unregister_netdev()  (in net/core/dev.c)
    which calls dev_close()  (in net/core/dev.c)
    which calls dev->stop() which is pcnet32_close()
    which then does the appropriate shutdown.

---
Following is the analogous stack trace for events sent to user-space
when the pci device is unconfigured.

    rpaphp_unconfig_pci_adapter() {             // in rpaphp_pci.c
      calls
      pci_remove_bus_device (struct pci_dev *) { // in /drivers/pci/remove.c
        calls
        pci_destroy_dev (struct pci_dev *) {
          calls
          device_unregister (&dev->dev) {        // in /drivers/base/core.c
            calls
            device_del(struct device * dev) {    // in /drivers/base/core.c
              calls
              kobject_del() {                    // in /lib/kobject.c
                calls
                kobject_hotplug() {              // in /lib/kobject.c
                  calls
                  kset_hotplug() {               // in /lib/kobject.c
                    calls
                    kset->hotplug_ops->hotplug() which is really just
                    a call to
                    dev_hotplug() {              // in /drivers/base/core.c
                      calls
                      dev->bus->hotplug() which is really just a call to
                      pci_hotplug () {           // in drivers/pci/hotplug.c
                        which prints device name, etc....
                      }
                    }
                    then kset_hotplug() calls
                    call_usermodehelper () with
                       argv[0]=hotplug_path[] which is "/sbin/hotplug"
                    --> event to userspace,
                  }
                }
                kobject_del() then calls sysfs_remove_dir(), which would
                trigger any user-space daemon that was watching /sysfs,
                and notice the delete event.


Pro's and Con's of the Current Design
-------------------------------------
There are several issues with the current EEH software recovery design,
which may be addressed in future revisions. But first, note that the
big plus of the current design is that no changes need to be made to
individual device drivers, so that the current design throws a wide net.
The biggest negative of the design is that it potentially disturbs
network daemons and file systems that didn't need to be disturbed.

-- A minor complaint is that resetting the network card causes
   user-space back-to-back ifdown/ifup burps that potentially disturb
   network daemons, that didn't need to even know that the pci
   card was being rebooted.

-- A more serious concern is that the same reset, for SCSI devices,
   causes havoc to mounted file systems. Scripts cannot post-facto
   unmount a file system without flushing pending buffers, but this
   is impossible, because I/O has already been stopped. Thus,
   ideally, the reset should happen at or below the block layer,
   so that the file systems are not disturbed.

   Reiserfs does not tolerate errors returned from the block device.
   Ext3fs seems to be tolerant, retrying reads/writes until it does
   succeed. Both have been only lightly tested in this scenario.

   The SCSI-generic subsystem already has built-in code for performing
   SCSI device resets, SCSI bus resets, and SCSI host-bus-adapter
   (HBA) resets. These are cascaded into a chain of attempted
   resets if a SCSI command fails. These are completely hidden
   from the block layer. It would be very natural to add an EEH
   reset into this chain of events.
-- If a SCSI error occurs for the root device, all is lost unless the sysadmin had the foresight to run /bin, /sbin, /etc, /var and so on, out of ramdisk/tmpfs. Conclusions ----------- There's forward progress ... From nacc at us.ibm.com Tue Jan 18 10:50:05 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Mon, 17 Jan 2005 15:50:05 -0800 Subject: [PATCH 16/21] ppc64/iSeries_pci_reset: replace schedule_timeout() with msleep() Message-ID: <20050117235005.GY24698@us.ibm.com> Hi, Please consider applying. Description: Use msleep() instead of schedule_timeout() to guarantee the task delays as expected. The code is not wrong as is, but I see two benefits to using msleep(): 1) real time delays (milliseconds) and 2) consistency across the kernel with respect to longer delays. Change the units of the WaitDelay and AssertDelay constants accordingly. Signed-off-by: Nishanth Aravamudan --- 2.6.11-rc1-kj-v/arch/ppc64/kernel/iSeries_pci_reset.c 2005-01-15 16:55:41.000000000 -0800 +++ 2.6.11-rc1-kj/arch/ppc64/kernel/iSeries_pci_reset.c 2005-01-15 17:17:54.000000000 -0800 @@ -32,6 +32,7 @@ #include #include #include +#include #include #include @@ -49,7 +50,7 @@ int iSeries_Device_ToggleReset(struct pci_dev *PciDev, int AssertTime, int DelayTime) { - unsigned long AssertDelay, WaitDelay; + unsigned int AssertDelay, WaitDelay; struct iSeries_Device_Node *DeviceNode = (struct iSeries_Device_Node *)PciDev->sysdata; @@ -62,14 +63,14 @@ int iSeries_Device_ToggleReset(struct pc * Set defaults, Assert is .5 second, Wait is 3 seconds. 
*/ if (AssertTime == 0) - AssertDelay = (5 * HZ) / 10; + AssertDelay = 500; else - AssertDelay = (AssertTime * HZ) / 10; + AssertDelay = AssertTime * 100; if (DelayTime == 0) - WaitDelay = (30 * HZ) / 10; + WaitDelay = 3000; else - WaitDelay = (DelayTime * HZ) / 10; + WaitDelay = DelayTime * 100; /* * Assert reset @@ -77,8 +78,7 @@ int iSeries_Device_ToggleReset(struct pc DeviceNode->ReturnCode = HvCallPci_setSlotReset(ISERIES_BUS(DeviceNode), 0x00, DeviceNode->AgentId, 1); if (DeviceNode->ReturnCode == 0) { - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(AssertDelay); /* Sleep for the time */ + msleep(AssertDelay); /* Sleep for the time */ DeviceNode->ReturnCode = HvCallPci_setSlotReset(ISERIES_BUS(DeviceNode), 0x00, DeviceNode->AgentId, 0); @@ -86,8 +86,7 @@ int iSeries_Device_ToggleReset(struct pc /* * Wait for device to reset */ - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(WaitDelay); + msleep(WaitDelay); } if (DeviceNode->ReturnCode == 0) PCIFR("Slot 0x%04X.%02 Reset\n", ISERIES_BUS(DeviceNode), From nacc at us.ibm.com Tue Jan 18 11:15:22 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Mon, 17 Jan 2005 16:15:22 -0800 Subject: [PATCH 17/21] ppc64/pSeries_smp: replace schedule_timeout() with msleep() Message-ID: <20050118001522.GZ24698@us.ibm.com> Hi, Please consider applying. Description: Use msleep() instead of schedule_timeout() to guarantee the task delays as expected. The current code is not incorrect, but msleep() is clearer in terms of the length of delay and helps make the kernel consistent. 
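
For readers checking the numbers: the old delay in this patch was HZ/5
timer ticks, and HZ is the number of ticks per second, so the delay is
one fifth of a second whatever HZ is configured to. That is where the
200 passed to msleep() comes from. The arithmetic can be sketched in
plain C as below; ticks_to_ms() is a hypothetical helper written for
this note, not a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a delay expressed in timer ticks into
 * milliseconds, given the tick rate hz (ticks per second).
 */
unsigned int ticks_to_ms(unsigned int ticks, unsigned int hz)
{
	assert(hz > 0);
	/* widen before multiplying so large tick counts do not overflow */
	return (unsigned int)(((unsigned long long)ticks * 1000ULL) / hz);
}
```

With this helper, ticks_to_ms(hz / 5, hz) gives 200 for hz values of
100, 250 and 1000 alike, which is the point of stating the delay in
milliseconds rather than in ticks.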
Signed-off-by: Nishanth Aravamudan

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/pSeries_smp.c	2005-01-15 16:55:41.000000000 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/pSeries_smp.c	2005-01-15 17:21:12.000000000 -0800
@@ -107,8 +107,7 @@ void pSeries_cpu_die(unsigned int cpu)
 		cpu_status = query_cpu_stopped(pcpu);
 		if (cpu_status == 0 || cpu_status == -1)
 			break;
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(HZ/5);
+		msleep(200);
 	}
 	if (cpu_status != 0) {
 		printk("Querying DEAD? cpu %i (%i) shows %i\n",

From nacc at us.ibm.com Tue Jan 18 11:18:19 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Mon, 17 Jan 2005 16:18:19 -0800
Subject: [PATCH 18/21] ppc64/rtasd: replace schedule_timeout() with msleep()
Message-ID: <20050118001819.GA24698@us.ibm.com>

Hi,

Please consider applying.

Description: Replace schedule_timeout() with msleep()/ssleep(). In both
cases, the current code sleeps in TASK_INTERRUPTIBLE but does not
account for early wakeups due to signals being caught; therefore I have
used TASK_UNINTERRUPTIBLE sleeps in both cases. The second sleep is
slightly more difficult to convert as rtas_event_scan_rate is variable.
I have left it as a msleep() call, although ssleep() may be more
appropriate.

Signed-off-by: Nishanth Aravamudan

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/rtasd.c	2005-01-15 16:55:41.000000000 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/rtasd.c	2005-01-15 17:28:50.000000000 -0800
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -444,8 +445,7 @@ static int rtasd(void *unused)
 		DEBUG("watchdog scheduled on cpu %d\n", smp_processor_id());
 		do_event_scan(event_scan);
-		set_current_state(TASK_INTERRUPTIBLE);
-		schedule_timeout(HZ);
+		ssleep(1);
 	}
 	unlock_cpu_hotplug();
@@ -466,8 +466,7 @@ static int rtasd(void *unused)
 		 * one second since some machines have problems if we
 		 * call event-scan too quickly).
 		 */
 		unlock_cpu_hotplug();
-		set_current_state(TASK_INTERRUPTIBLE);
-		schedule_timeout((HZ*60/rtas_event_scan_rate) / 2);
+		msleep(30000/rtas_event_scan_rate);
 		lock_cpu_hotplug();
 		cpu = next_cpu(cpu, cpu_online_map);

From nacc at us.ibm.com Tue Jan 18 11:20:13 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Mon, 17 Jan 2005 16:20:13 -0800
Subject: [PATCH 19/21] ppc64/smp: replace schedule_timeout() with msleep()
Message-ID: <20050118002013.GB24698@us.ibm.com>

Hi,

Please consider applying.

Description: Use msleep() instead of schedule_timeout() to guarantee the
task delays as expected. The current code is not incorrect; however
using msleep() encourages using real time-unit sleeps and keeps the
kernel consistent.

Signed-off-by: Nishanth Aravamudan

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/smp.c	2005-01-15 16:55:41.000000000 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/smp.c	2005-01-15 17:30:16.000000000 -0800
@@ -459,8 +459,7 @@ int __devinit __cpu_up(unsigned int cpu)
 		 * hotplug case. Wait five seconds.
 		 */
 		for (c = 25; c && !cpu_callin_map[cpu]; c--) {
-			set_current_state(TASK_UNINTERRUPTIBLE);
-			schedule_timeout(HZ/5);
+			msleep(200);
 		}
 #endif

From nacc at us.ibm.com Tue Jan 18 11:21:30 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Mon, 17 Jan 2005 16:21:30 -0800
Subject: [PATCH 20/21] ppc64/traps: replace schedule_timeout() with ssleep()
Message-ID: <20050118002130.GC24698@us.ibm.com>

Hi,

Please consider applying.

Description: Use ssleep() instead of schedule_timeout() to guarantee the
task delays as expected. The current code is not incorrect, but using
ssleep() encourages specifying delays in real time-units and consistency
across the kernel.

Signed-off-by: Nishanth Aravamudan

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/traps.c	2005-01-15 16:55:41.000000000 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/traps.c	2005-01-15 17:30:39.000000000 -0800
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -137,8 +138,7 @@ int die(const char *str, struct pt_regs
 	if (panic_on_oops) {
 		printk(KERN_EMERG "Fatal exception: panic in 5 seconds\n");
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(5 * HZ);
+		ssleep(5);
 		panic("Fatal exception");
 	}
 	do_exit(SIGSEGV);

From benh at kernel.crashing.org Tue Jan 18 11:49:15 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 18 Jan 2005 11:49:15 +1100
Subject: [PATCH] PPC64 pmac hotplug cpu
In-Reply-To: <41EBD662.1080409@nortelnetworks.com>
References: <1105827794.27410.82.camel@gaston> <1105937266.4534.0.camel@gaston> <41EBD662.1080409@nortelnetworks.com>
Message-ID: <1106009355.4533.19.camel@gaston>

On Mon, 2005-01-17 at 09:14 -0600, Chris Friesen wrote:
> Benjamin Herrenschmidt wrote:
>
> > Well.. the cache flush part requires some not-really-documented stuff on
> > the 970, but I'll try to come up with something.
>
> Details? We've got a cache-flush routine put together based on the
> documentation that seems to be working, but if there's something else
> that has to be done I'd love to know about it.

Well, I don't have all the details at hand right now, but it involves
using SCOM (with appropriate workarounds for CPU SCOM bugs on some
970's) to switch the L2 to direct addressing iirc.

Ben.

From rusty at rustcorp.com.au Tue Jan 18 13:20:03 2005
From: rusty at rustcorp.com.au (Rusty Russell)
Date: Tue, 18 Jan 2005 13:20:03 +1100
Subject: [PATCH] Fix kallsyms/insmod/rmmod race
In-Reply-To: <31453.1105979239@redhat.com>
References: <31453.1105979239@redhat.com>
Message-ID: <1106014803.30801.22.camel@localhost.localdomain>

On Mon, 2005-01-17 at 16:27 +0000, David Howells wrote:
> The attached patch fixes a race between kallsyms and insmod/rmmod.

Hi David,

The more I looked at this, the more I warmed to it. I've known for a
while that people are using kallsyms not for OOPS (eg. /proc/$$/wchan),
so we should provide a "grabs locks" version, but this solution gets
around that nicely, while making life more certain for the oops case,
too.

Good work!
Rusty.
--
A bad analogy is like a leaky screwdriver -- Richard Braakman

From dhowells at redhat.com Wed Jan 19 06:44:28 2005
From: dhowells at redhat.com (David Howells)
Date: Tue, 18 Jan 2005 19:44:28 +0000
Subject: [PATCH] Fix kallsyms/insmod/rmmod race
In-Reply-To: <1106014803.30801.22.camel@localhost.localdomain>
References: <1106014803.30801.22.camel@localhost.localdomain> <31453.1105979239@redhat.com>
Message-ID: <1561.1106077468@redhat.com>

Rusty Russell wrote:

> The more I looked at this, the more I warmed to it. I've known for a
> while that people are using kallsyms not for OOPS (eg. /proc/$$/wchan),
> so we should provide a "grabs locks" version, but this solution gets
> around that nicely, while making life more certain for the oops case,
> too.

Hmmm... though it works on i386 SMP, it doesn't, however, seem to work
on ppc64 SMP:-/

My pSeries box seems to think that it can't find any symbols from
previously loaded modules, and my Power5 box is quite happy to load
modules that depend on other modules but panics because it can't mount
its root fs.

This is very odd, because the patch is simple enough. Is there anything
obvious I've missed that you can see?
Or maybe I'm just misunderstanding how stop_machine_run() works... maybe
it can't be called during initialisation.

David

From benh at kernel.crashing.org Wed Jan 19 13:54:51 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 19 Jan 2005 13:54:51 +1100
Subject: [PATCH] ppc64/ppc: Cleanup PCI skipping
Message-ID: <1106103291.4500.147.camel@gaston>

Hi !

The g5 code has special hooks to "hide" some PCI devices when they are
off. Currently, this code involves some calls to match a pci_dev from
the open firmware node and such things that are causing some problems
with the latest version of my sungem driver, which wants to do some of
this in atomic contexts.

This patch moves that to a list of struct device_node instead, which
also ends up simplifying the code. Later, I'll go back to manipulating
PCI devices in a clean way when Brian King's PCI blocking patch gets in,
but only after I change sungem again to never call these in atomic
context. This is a 3 step transition basically.

Signed-off-by: Benjamin Herrenschmidt

Index: linux-work/arch/ppc64/kernel/pmac_feature.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/pmac_feature.c	2004-11-22 11:49:24.000000000 +1100
+++ linux-work/arch/ppc64/kernel/pmac_feature.c	2005-01-19 13:48:25.000000000 +1100
@@ -111,7 +111,7 @@
 static u32 uninorth_rev __pmacdata;
 static void *u3_ht;
-extern struct pci_dev *k2_skiplist[2];
+extern struct device_node *k2_skiplist[2];
 /*
  * For each motherboard family, we have a table of functions pointers
@@ -160,30 +160,17 @@
 {
 	struct macio_chip* macio = &macio_chips[0];
 	unsigned long flags;
-	struct pci_dev *pdev = NULL;
 	if (node == NULL)
 		return -ENODEV;
-	/* XXX FIXME: We should fix pci_device_from_OF_node here, and
-	 * get to a real pci_dev or we'll get into trouble with PCI
-	 * domains the day we get overlapping numbers (like if we ever
-	 * decide to show the HT root.
-	 * Note that we only get the slot when value is 0. This is called
-	 * early during boot with value 1 to enable all devices, at which
-	 * point, we don't yet have probed pci_find_slot, so it would fail
-	 * to look for the slot at this point.
-	 */
-	if (!value)
-		pdev = pci_find_slot(node->busno, node->devfn);
-
 	LOCK(flags);
 	if (value) {
 		MACIO_BIS(KEYLARGO_FCR1, K2_FCR1_GMAC_CLK_ENABLE);
 		mb();
 		k2_skiplist[0] = NULL;
 	} else {
-		k2_skiplist[0] = pdev;
+		k2_skiplist[0] = node;
 		mb();
 		MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_GMAC_CLK_ENABLE);
 	}
@@ -198,30 +185,17 @@
 {
 	struct macio_chip* macio = &macio_chips[0];
 	unsigned long flags;
-	struct pci_dev *pdev = NULL;
-	/* XXX FIXME: We should fix pci_device_from_OF_node here, and
-	 * get to a real pci_dev or we'll get into trouble with PCI
-	 * domains the day we get overlapping numbers (like if we ever
-	 * decide to show the HT root
-	 * Note that we only get the slot when value is 0. This is called
-	 * early during boot with value 1 to enable all devices, at which
-	 * point, we don't yet have probed pci_find_slot, so it would fail
-	 * to look for the slot at this point.
-	 */
 	if (node == NULL)
 		return -ENODEV;
-	if (!value)
-		pdev = pci_find_slot(node->busno, node->devfn);
-
 	LOCK(flags);
 	if (value) {
 		MACIO_BIS(KEYLARGO_FCR1, K2_FCR1_FW_CLK_ENABLE);
 		mb();
 		k2_skiplist[1] = NULL;
 	} else {
-		k2_skiplist[1] = pdev;
+		k2_skiplist[1] = node;
 		mb();
 		MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_FW_CLK_ENABLE);
 	}
Index: linux-work/arch/ppc64/kernel/pmac_pci.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/pmac_pci.c	2005-01-14 08:17:11.000000000 +1100
+++ linux-work/arch/ppc64/kernel/pmac_pci.c	2005-01-19 13:44:50.000000000 +1100
@@ -43,7 +43,7 @@
  * assuming we won't have both UniNorth and Bandit */
 static int has_uninorth;
 static struct pci_controller *u3_agp;
-struct pci_dev *k2_skiplist[2];
+struct device_node *k2_skiplist[2];
 static int __init fixup_one_level_bus_range(struct device_node *node, int higher)
 {
@@ -233,15 +233,6 @@
 	struct device_node *busdn, *dn;
 	int i;
-	/*
-	 * When a device in K2 is powered down, we die on config
-	 * cycle accesses. Fix that here.
-	 */
-	for (i=0; i<2; i++)
-		if (k2_skiplist[i] && k2_skiplist[i]->bus == bus &&
-		    k2_skiplist[i]->devfn == devfn)
-			return 1;
-
 	/* We only allow config cycles to devices that are in OF device-tree
 	 * as we are apparently having some weird things going on with some
 	 * revs of K2 on recent G5s
@@ -256,6 +247,14 @@
 	if (dn == NULL)
 		return -1;
+	/*
+	 * When a device in K2 is powered down, we die on config
+	 * cycle accesses. Fix that here.
+	 */
+	for (i=0; i<2; i++)
+		if (k2_skiplist[i] == dn)
+			return 1;
+
 	return 0;
 }
Index: linux-work/arch/ppc/platforms/pmac_feature.c
===================================================================
--- linux-work.orig/arch/ppc/platforms/pmac_feature.c	2005-01-18 17:50:10.000000000 +1100
+++ linux-work/arch/ppc/platforms/pmac_feature.c	2005-01-19 13:46:06.000000000 +1100
@@ -56,7 +56,7 @@
 #endif
 extern int powersave_nap;
-extern struct pci_dev *k2_skiplist[2];
+extern struct device_node *k2_skiplist[2];
 /*
@@ -1328,16 +1328,6 @@
 {
 	struct macio_chip* macio = &macio_chips[0];
 	unsigned long flags;
-	struct pci_dev *pdev;
-	u8 pbus, pid;
-
-	/* XXX FIXME: We should fix pci_device_from_OF_node here, and
-	 * get to a real pci_dev or we'll get into trouble with PCI
-	 * domains the day we get overlapping numbers (like if we ever
-	 * decide to show the HT root
-	 */
-	if (pci_device_from_OF_node(node, &pbus, &pid) == 0)
-		pdev = pci_find_slot(pbus, pid);
 	LOCK(flags);
 	if (value) {
@@ -1345,7 +1335,7 @@
 		mb();
 		k2_skiplist[0] = NULL;
 	} else {
-		k2_skiplist[0] = pdev;
+		k2_skiplist[0] = node;
 		mb();
 		MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_GMAC_CLK_ENABLE);
 	}
@@ -1361,16 +1351,6 @@
 {
 	struct macio_chip* macio = &macio_chips[0];
 	unsigned long flags;
-	struct pci_dev *pdev;
-	u8 pbus, pid;
-
-	/* XXX FIXME: We should fix pci_device_from_OF_node here, and
-	 * get to a real pci_dev or we'll get into trouble with PCI
-	 * domains the day we get overlapping numbers (like if we ever
-	 * decide to show the HT root
-	 */
-	if (pci_device_from_OF_node(node, &pbus, &pid) == 0)
-		pdev = pci_find_slot(pbus, pid);
 	LOCK(flags);
 	if (value) {
@@ -1378,7 +1358,7 @@
 		mb();
 		k2_skiplist[1] = NULL;
 	} else {
-		k2_skiplist[1] = pdev;
+		k2_skiplist[1] = node;
 		mb();
 		MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_FW_CLK_ENABLE);
 	}
Index: linux-work/arch/ppc/platforms/pmac_pci.c
===================================================================
--- linux-work.orig/arch/ppc/platforms/pmac_pci.c	2005-01-18 17:50:11.000000000 +1100
+++ linux-work/arch/ppc/platforms/pmac_pci.c	2005-01-19 13:46:58.000000000 +1100
@@ -52,7 +52,7 @@
 extern u8 pci_cache_line_size;
 extern int pcibios_assign_bus_offset;
-struct pci_dev *k2_skiplist[2];
+struct device_node *k2_skiplist[2];
 /*
  * Magic constants for enabling cache coherency in the bandit/PSX bridge.
@@ -325,8 +325,7 @@
 	 * cycle accesses. Fix that here.
 	 */
 	for (i=0; i<2; i++)
-		if (k2_skiplist[i] && k2_skiplist[i]->bus == bus &&
-		    k2_skiplist[i]->devfn == devfn) {
+		if (k2_skiplist[i] == np) {
 			switch (len) {
 			case 1:
 				*val = 0xff; break;
@@ -375,8 +374,7 @@
 	 * cycle accesses. Fix that here.
 	 */
 	for (i=0; i<2; i++)
-		if (k2_skiplist[i] && k2_skiplist[i]->bus == bus &&
-		    k2_skiplist[i]->devfn == devfn)
+		if (k2_skiplist[i] == np)
 			return PCIBIOS_SUCCESSFUL;
 	addr = u3_ht_cfg_access(hose, bus->number, devfn, offset);

From anton at samba.org Wed Jan 19 15:12:30 2005
From: anton at samba.org (Anton Blanchard)
Date: Wed, 19 Jan 2005 15:12:30 +1100
Subject: [PATCH] ppc64: Minimum hashtable size
Message-ID: <20050119041230.GB21682@krispykreme.ozlabs.ibm.com>

From: Milton Miller

We weren't enforcing the minimum hardware MMU hashtable size.

Signed-off-by: Milton Miller
Signed-off-by: Anton Blanchard

diff -puN arch/ppc64/kernel/prom.c~minimum_hashtable_size arch/ppc64/kernel/prom.c
--- foobar2/arch/ppc64/kernel/prom.c~minimum_hashtable_size	2005-01-19 15:06:47.729610075 +1100
+++ foobar2-anton/arch/ppc64/kernel/prom.c	2005-01-19 15:07:06.577082744 +1100
@@ -1055,7 +1055,7 @@ void __init early_init_devtree(void *par
 		rnd_mem_size <<= 1;
 		/* # pages / 2 */
-		pteg_count = (rnd_mem_size >> (12 + 1));
+		pteg_count = max(rnd_mem_size >> (12 + 1), 1UL << 11);
 		ppc64_pft_size = __ilog2(pteg_count << 7);
 	}
_

From benh at kernel.crashing.org Wed Jan 19 15:31:55 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 19 Jan 2005 15:31:55 +1100
Subject: vDSO update
Message-ID: <1106109115.4499.171.camel@gaston>

I posted a new vDSO patch at
http://gate.crashing.org/~benh/ppc64-vdso-20050119.diff

Now, both 32 and 64 bits vDSO's are linked at "0" and export symbols as
offsets to functions and not real function symbols (I made them
consistent) and updated the patch to apply against current Linus bk.

--
Benjamin Herrenschmidt

From sfr at canb.auug.org.au Wed Jan 19 15:48:57 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Wed, 19 Jan 2005 15:48:57 +1100
Subject: [PATCH] htab code cleanup
In-Reply-To: <1105828597.27435.88.camel@gaston>
References: <20050106145102.0c3c60ad.sfr@canb.auug.org.au> <1105828597.27435.88.camel@gaston>
Message-ID: <20050119154857.7cec8fbb.sfr@canb.auug.org.au>

On Sun, 16 Jan 2005 09:36:37 +1100 Benjamin Herrenschmidt wrote:
>
> On Thu, 2005-01-06 at 14:51 +1100, Stephen Rothwell wrote:
> > Hi all,
> >
> > This patch just does some small clean ups on the hash page table code
> > 	- make htab_address static within htab_native.c
> > 	- move some code that depended on CONFIG_PPC_MULTIPLATFORM
> > 	  from htab_utils.c to htab_native.c (one less CONFIG check).
> > 	- clean up includes in htab_utils.c
>
> I don't see the point of moving create_pte_mapping() and
> htab_initialize() to htab_native.c since it contains code for both
> native and non-native...
>
> If you want to get rid of the htab_address, then maybe split
> htab_initialize in bits... like htab_native_init() and htab_plpar_init()
> for the early ptr setup, that sort of thing ...

OK, how about this one, then? This has been built and booted on iSeries,
pSeries (bare metal and lpar) and a G5 (with and without iommu).

--
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk.new/arch/ppc64/kernel/iSeries_setup.c linus-bk-sfr.14.new/arch/ppc64/kernel/iSeries_setup.c
--- linus-bk.new/arch/ppc64/kernel/iSeries_setup.c	2005-01-09 10:05:39.000000000 +1100
+++ linus-bk-sfr.14.new/arch/ppc64/kernel/iSeries_setup.c	2005-01-18 16:46:06.000000000 +1100
@@ -477,12 +477,6 @@
 	htab_hash_mask = num_ptegs - 1;
 	/*
-	 * The actual hashed page table is in the hypervisor,
-	 * we have no direct access
-	 */
-	htab_address = NULL;
-
-	/*
 	 * Determine if absolute memory has any
 	 * holes so that we can interpret the
 	 * access map we get back from the hypervisor
diff -ruN linus-bk.new/arch/ppc64/kernel/setup.c linus-bk-sfr.14.new/arch/ppc64/kernel/setup.c
--- linus-bk.new/arch/ppc64/kernel/setup.c	2005-01-09 10:05:39.000000000 +1100
+++ linus-bk-sfr.14.new/arch/ppc64/kernel/setup.c	2005-01-18 16:46:23.000000000 +1100
@@ -674,7 +674,6 @@
 			ppc64_caches.dline_size);
 	printk("ppc64_caches.icache_line_size = 0x%x\n",
 			ppc64_caches.iline_size);
-	printk("htab_address = 0x%p\n", htab_address);
 	printk("htab_hash_mask = 0x%lx\n", htab_hash_mask);
 	printk("-----------------------------------------------------\n");
diff -ruN linus-bk.new/arch/ppc64/mm/Makefile linus-bk-sfr.14.new/arch/ppc64/mm/Makefile
--- linus-bk.new/arch/ppc64/mm/Makefile	2004-09-24 15:23:06.000000000 +1000
+++ linus-bk-sfr.14.new/arch/ppc64/mm/Makefile	2005-01-18 18:28:57.000000000 +1100
@@ -8,4 +8,4 @@
 	slb_low.o slb.o stab.o mmap.o
 obj-$(CONFIG_DISCONTIGMEM) += numa.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-obj-$(CONFIG_PPC_MULTIPLATFORM) += hash_native.o
+obj-$(CONFIG_PPC_MULTIPLATFORM) += hash_multi.o hash_native.o
diff -ruN linus-bk.new/arch/ppc64/mm/hash_multi.c linus-bk-sfr.14.new/arch/ppc64/mm/hash_multi.c
--- linus-bk.new/arch/ppc64/mm/hash_multi.c	1970-01-01 10:00:00.000000000 +1000
+++ linus-bk-sfr.14.new/arch/ppc64/mm/hash_multi.c	2005-01-18 18:27:48.000000000 +1100
@@ -0,0 +1,177 @@
+/*
+ * multiplatform hashtable management.
+ *
+ * SMP scalability work:
+ *    Copyright (C) 2001 Anton Blanchard , IBM
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#ifdef DEBUG
+#define DBG(fmt...) udbg_printf(fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+/*
+ * Note:  pte   --> Linux PTE
+ *        HPTE  --> PowerPC Hashed Page Table Entry
+ *
+ * Execution context:
+ *   htab_initialize is called with the MMU off (of course), but
+ *   the kernel has been copied down to zero so it can directly
+ *   reference global data. At this point it is very difficult
+ *   to print debug info.
+ *
+ */
+
+#ifdef CONFIG_U3_DART
+extern unsigned long dart_tablebase;
+#endif /* CONFIG_U3_DART */
+
+#define KB (1024)
+#define MB (1024*KB)
+
+static inline void loop_forever(void)
+{
+	volatile unsigned long x = 1;
+	for(;x;x|=1)
+		;
+}
+
+static inline void create_pte_mapping(unsigned long start, unsigned long end,
+				      unsigned long mode, int large)
+{
+	unsigned long addr;
+	unsigned int step;
+
+	if (large)
+		step = 16*MB;
+	else
+		step = 4*KB;
+
+	for (addr = start; addr < end; addr += step) {
+		unsigned long vpn, hash, hpteg;
+		unsigned long vsid = get_kernel_vsid(addr);
+		unsigned long va = (vsid << 28) | (addr & 0xfffffff);
+		int ret;
+
+		if (large)
+			vpn = va >> HPAGE_SHIFT;
+		else
+			vpn = va >> PAGE_SHIFT;
+
+		hash = hpt_hash(vpn, large);
+
+		hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
+
+#ifdef CONFIG_PPC_PSERIES
+		if (systemcfg->platform & PLATFORM_LPAR)
+			ret = pSeries_lpar_hpte_insert(hpteg, va,
+				virt_to_abs(addr) >> PAGE_SHIFT,
+				0, mode, 1, large);
+		else
+#endif /* CONFIG_PPC_PSERIES */
+			ret = native_hpte_insert(hpteg, va,
+				virt_to_abs(addr) >> PAGE_SHIFT,
+				0, mode, 1, large);
+
+		if (ret == -1) {
+			ppc64_terminate_msg(0x20, "create_pte_mapping");
+			loop_forever();
+		}
+	}
+}
+
+void __init htab_initialize(void)
+{
+	unsigned long htab_size_bytes;
+	unsigned long pteg_count;
+	unsigned long mode_rw;
+	int i, use_largepages = 0;
+
+	DBG(" -> htab_initialize()\n");
+
+	/*
+	 * Calculate the required size of the htab. We want the number of
+	 * PTEGs to equal one half the number of real pages.
+	 */
+	htab_size_bytes = 1UL << ppc64_pft_size;
+	pteg_count = htab_size_bytes >> 7;
+
+	/* For debug, make the HTAB 1/8 as big as it normally would be. */
+	ifppcdebug(PPCDBG_HTABSIZE) {
+		pteg_count >>= 3;
+		htab_size_bytes = pteg_count << 7;
+	}
+
+	htab_hash_mask = pteg_count - 1;
+
+#ifdef CONFIG_PPC_PSERIES
+	if (!(systemcfg->platform & PLATFORM_LPAR))
+#endif
+		if (native_htab_initialize(htab_size_bytes, pteg_count))
+			loop_forever();
+
+	mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX;
+
+	/* On U3 based machines, we need to reserve the DART area and
+	 * _NOT_ map it to avoid cache paradoxes as it's remapped non
+	 * cacheable later on
+	 */
+	if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE)
+		use_largepages = 1;
+
+	/* create bolted the linear mapping in the hash table */
+	for (i = 0; i < lmb.memory.cnt; i++) {
+		unsigned long base, size;
+
+		base = lmb.memory.region[i].physbase + KERNELBASE;
+		size = lmb.memory.region[i].size;
+
+		DBG("creating mapping for region: %lx : %lx\n", base, size);
+
+#ifdef CONFIG_U3_DART
+		/* Do not map the DART space. Fortunately, it will be aligned
+		 * in such a way that it will not cross two lmb regions and will
+		 * fit within a single 16Mb page.
+		 * The DART space is assumed to be a full 16Mb region even if we
+		 * only use 2Mb of that space. We will use more of it later for
+		 * AGP GART. We have to use a full 16Mb large page.
+		 */
+		DBG("DART base: %lx\n", dart_tablebase);
+
+		if (dart_tablebase != 0 && dart_tablebase >= base
+		    && dart_tablebase < (base + size)) {
+			if (base != dart_tablebase)
+				create_pte_mapping(base, dart_tablebase, mode_rw,
+						   use_largepages);
+			if ((base + size) > (dart_tablebase + 16*MB))
+				create_pte_mapping(dart_tablebase + 16*MB, base + size,
+						   mode_rw, use_largepages);
+			continue;
+		}
+#endif /* CONFIG_U3_DART */
+		create_pte_mapping(base, base + size, mode_rw, use_largepages);
+	}
+	DBG(" <- htab_initialize()\n");
+}
+#undef KB
+#undef MB
diff -ruN linus-bk.new/arch/ppc64/mm/hash_native.c linus-bk-sfr.14.new/arch/ppc64/mm/hash_native.c
--- linus-bk.new/arch/ppc64/mm/hash_native.c	2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14.new/arch/ppc64/mm/hash_native.c	2005-01-18 18:28:13.000000000 +1100
@@ -22,10 +22,21 @@
 #include
 #include
 #include
+#include
+#include
+
+#ifdef DEBUG
+#define DBG(fmt...) udbg_printf(fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+extern unsigned long _SDR1;
 #define HPTE_LOCK_BIT 3
 static spinlock_t native_tlbie_lock = SPIN_LOCK_UNLOCKED;
+static HPTE *htab_address;
 static inline void native_lock_hpte(HPTE *hptep)
 {
@@ -410,6 +421,33 @@
 }
 #endif
+int native_htab_initialize(unsigned long htab_size_bytes,
+			   unsigned long pteg_count)
+{
+	unsigned long table;
+
+	/* Find storage for the HPT. Must be contiguous in
+	 * the absolute address space.
+	 */
+	table = lmb_alloc(htab_size_bytes, htab_size_bytes);
+
+	DBG("Hash table allocated at %lx, size: %lx\n", table, htab_size_bytes);
+
+	if (!table) {
+		ppc64_terminate_msg(0x20, "hpt space");
+		return 1;
+	}
+	htab_address = abs_to_virt(table);
+
+	/* htab absolute addr + encoded htabsize */
+	_SDR1 = table + __ilog2(pteg_count) - 11;
+
+	/* Initialize the HPT with no entries */
+	memset((void *)table, 0, htab_size_bytes);
+
+	return 0;
+}
+
 void hpte_init_native(void)
 {
 	ppc_md.hpte_invalidate = native_hpte_invalidate;
diff -ruN linus-bk.new/arch/ppc64/mm/hash_utils.c linus-bk-sfr.14.new/arch/ppc64/mm/hash_utils.c
--- linus-bk.new/arch/ppc64/mm/hash_utils.c	2005-01-05 17:06:07.000000000 +1100
+++ linus-bk-sfr.14.new/arch/ppc64/mm/hash_utils.c	2005-01-06 14:37:27.000000000 +1100
@@ -17,220 +17,29 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  */
-
-#undef DEBUG
-
-#include
-#include
-#include
+#include
+#include
+#include
 #include
-#include
-#include
-#include
-#include
-#include
-#include
+#include
+#include
+#include
+#include
 #include
-#include
 #include
 #include
 #include
 #include
 #include
-#include
 #include
-#include
 #include
-#include
-#include
 #include
-#include
-#include
-#include
 #include
-#include
-#include
-
-#ifdef DEBUG
-#define DBG(fmt...) udbg_printf(fmt)
-#else
-#define DBG(fmt...)
-#endif
-
-/*
- * Note:  pte   --> Linux PTE
- *        HPTE  --> PowerPC Hashed Page Table Entry
- *
- * Execution context:
- *   htab_initialize is called with the MMU off (of course), but
- *   the kernel has been copied down to zero so it can directly
- *   reference global data. At this point it is very difficult
- *   to print debug info.
- *
- */
-
-#ifdef CONFIG_U3_DART
-extern unsigned long dart_tablebase;
-#endif /* CONFIG_U3_DART */
+#include
-HPTE *htab_address;
 unsigned long htab_hash_mask;
-extern unsigned long _SDR1;
-
-#define KB (1024)
-#define MB (1024*KB)
-
-static inline void loop_forever(void)
-{
-	volatile unsigned long x = 1;
-	for(;x;x|=1)
-		;
-}
-
-#ifdef CONFIG_PPC_MULTIPLATFORM
-static inline void create_pte_mapping(unsigned long start, unsigned long end,
-				      unsigned long mode, int large)
-{
-	unsigned long addr;
-	unsigned int step;
-
-	if (large)
-		step = 16*MB;
-	else
-		step = 4*KB;
-
-	for (addr = start; addr < end; addr += step) {
-		unsigned long vpn, hash, hpteg;
-		unsigned long vsid = get_kernel_vsid(addr);
-		unsigned long va = (vsid << 28) | (addr & 0xfffffff);
-		int ret;
-
-		if (large)
-			vpn = va >> HPAGE_SHIFT;
-		else
-			vpn = va >> PAGE_SHIFT;
-
-		hash = hpt_hash(vpn, large);
-
-		hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
-
-#ifdef CONFIG_PPC_PSERIES
-		if (systemcfg->platform & PLATFORM_LPAR)
-			ret = pSeries_lpar_hpte_insert(hpteg, va,
-				virt_to_abs(addr) >> PAGE_SHIFT,
-				0, mode, 1, large);
-		else
-#endif /* CONFIG_PPC_PSERIES */
-			ret = native_hpte_insert(hpteg, va,
-				virt_to_abs(addr) >> PAGE_SHIFT,
-				0, mode, 1, large);
-
-		if (ret == -1) {
-			ppc64_terminate_msg(0x20, "create_pte_mapping");
-			loop_forever();
-		}
-	}
-}
-
-void __init htab_initialize(void)
-{
-	unsigned long table, htab_size_bytes;
-	unsigned long pteg_count;
-	unsigned long mode_rw;
-	int i, use_largepages = 0;
-
-	DBG(" -> htab_initialize()\n");
-
-	/*
-	 * Calculate the required size of the htab. We want the number of
-	 * PTEGs to equal one half the number of real pages.
-	 */
-	htab_size_bytes = 1UL << ppc64_pft_size;
-	pteg_count = htab_size_bytes >> 7;
-
-	/* For debug, make the HTAB 1/8 as big as it normally would be. */
-	ifppcdebug(PPCDBG_HTABSIZE) {
-		pteg_count >>= 3;
-		htab_size_bytes = pteg_count << 7;
-	}
-
-	htab_hash_mask = pteg_count - 1;
-
-	if (systemcfg->platform & PLATFORM_LPAR) {
-		/* Using a hypervisor which owns the htab */
-		htab_address = NULL;
-		_SDR1 = 0;
-	} else {
-		/* Find storage for the HPT. Must be contiguous in
-		 * the absolute address space.
-		 */
-		table = lmb_alloc(htab_size_bytes, htab_size_bytes);
-
-		DBG("Hash table allocated at %lx, size: %lx\n", table,
-		    htab_size_bytes);
-
-		if ( !table ) {
-			ppc64_terminate_msg(0x20, "hpt space");
-			loop_forever();
-		}
-		htab_address = abs_to_virt(table);
-
-		/* htab absolute addr + encoded htabsize */
-		_SDR1 = table + __ilog2(pteg_count) - 11;
-
-		/* Initialize the HPT with no entries */
-		memset((void *)table, 0, htab_size_bytes);
-	}
-
-	mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX;
-
-	/* On U3 based machines, we need to reserve the DART area and
-	 * _NOT_ map it to avoid cache paradoxes as it's remapped non
-	 * cacheable later on
-	 */
-	if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE)
-		use_largepages = 1;
-
-	/* create bolted the linear mapping in the hash table */
-	for (i=0; i < lmb.memory.cnt; i++) {
-		unsigned long base, size;
-
-		base = lmb.memory.region[i].physbase + KERNELBASE;
-		size = lmb.memory.region[i].size;
-
-		DBG("creating mapping for region: %lx : %lx\n", base, size);
-
-#ifdef CONFIG_U3_DART
-		/* Do not map the DART space. Fortunately, it will be aligned
-		 * in such a way that it will not cross two lmb regions and will
-		 * fit within a single 16Mb page.
-		 * The DART space is assumed to be a full 16Mb region even if we
-		 * only use 2Mb of that space. We will use more of it later for
-		 * AGP GART. We have to use a full 16Mb large page.
-		 */
-		DBG("DART base: %lx\n", dart_tablebase);
-
-		if (dart_tablebase != 0 && dart_tablebase >= base
-		    && dart_tablebase < (base + size)) {
-			if (base != dart_tablebase)
-				create_pte_mapping(base, dart_tablebase, mode_rw,
-