From david at gibson.dropbear.id.au Sun Jan 2 09:33:45 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Sun, 2 Jan 2005 09:33:45 +1100
Subject: [PATCH] sparse fixes for cpu feature constants
In-Reply-To: <1104381206.16694.38.camel@localhost.localdomain>
References: <1104381206.16694.38.camel@localhost.localdomain>
Message-ID: <20050101223345.GC2297@zax>

On Wed, Dec 29, 2004 at 10:33:26PM -0600, Nathan Lynch wrote:
> Hi-
>
> I've been playing around with sparse a little and saw that it gives a
> lot of warnings like this:
>
> arch/ppc64/mm/init.c:755:35: warning: constant 0x0000020000000000 is so
> big it is long
>
> It looks like we get such a warning for every expression of the form
> "(cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE)" -- basically,
> every time the code checks for a cpu feature.
>
> Following is an attempt to clean these up by defining the cpu feature
> constants using the ASM_CONST macro from ppc64's page.h. I believe this
> is consistent with the intentions for ASM_CONST's use.
>
> There's some fallout:
>
> flush_icache_range() was already using ASM_CONST on one of the
> constants, so that is fixed up.
>
> switch_mm() uses a BEGIN_FTR_SECTION ...
> END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) which gets broken by the change
> since 0x0000000000000008UL winds up in the generated assembly. I
> couldn't find the BEGIN/END_FTR_SECTION construct used in any other C
> code, so I replaced this with the usual bitwise 'and' conditional (I
> hope someone else will verify that this is equivalent :).
>
> So, does this look like the right thing to do? It eliminates 129 sparse
> warnings from a defconfig 2.6.10 build.

Hurrah! You beat me to it...
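For readers unfamiliar with the macro, here is a minimal sketch of how an ASM_CONST-style macro lets one definition serve both C and assembly. The macro body is modelled on ppc64's page.h and should be checked against the real header; pairing the two constants from the thread with these names is an assumption, `struct cpu_spec` is cut down to one field, and `cpu_has_feature()` is a hypothetical helper standing in for the open-coded bitwise-and test. It assumes an LP64 target such as ppc64.

```c
#include <assert.h>

/*
 * Sketch of an ASM_CONST-style macro, modelled on ppc64's page.h.
 * Assembly sees the bare literal; C gets a UL suffix pasted on, so a
 * 64-bit constant is typed unsigned long and sparse stops warning
 * that it "is so big it is long".
 */
#ifdef __ASSEMBLY__
#define ASM_CONST(x) x
#else
#define __ASM_CONST(x) x##UL
#define ASM_CONST(x) __ASM_CONST(x)
#endif

/* Illustrative pairing of names and values from the thread (assumed). */
#define CPU_FTR_ALTIVEC         ASM_CONST(0x0000000000000008)
#define CPU_FTR_COHERENT_ICACHE ASM_CONST(0x0000020000000000)

/* On LP64 the suffixed constant has exactly the width of unsigned long. */
static_assert(sizeof(CPU_FTR_COHERENT_ICACHE) == sizeof(unsigned long),
	      "feature bits should be unsigned long");

/* Reduced, hypothetical stand-in for the kernel's struct cpu_spec. */
struct cpu_spec {
	unsigned long cpu_features;
};

/* Hypothetical helper: the plain bitwise-and feature test that replaces
 * BEGIN/END_FTR_SECTION in C code such as switch_mm(). */
static int cpu_has_feature(const struct cpu_spec *spec, unsigned long ftr)
{
	return (spec->cpu_features & ftr) != 0;
}
```

Because the suffix is pasted on whenever `__ASSEMBLY__` is undefined, a C-side macro that passes the constant through into inline assembly would presumably emit `0x0000000000000008UL` verbatim, which matches the switch_mm() breakage described above.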
> Index: 2.6.10/include/asm-ppc64/cputable.h
> ===================================================================
> +++ 2.6.10/include/asm-ppc64/cputable.h 2004-12-30 04:04:09.463979408 +0000
> @@ -16,6 +16,7 @@
> #define __ASM_PPC_CPUTABLE_H
>
> #include
> +#include /* for ASM_CONST */

Have you double checked that this won't cause a nasty #include loop?
The CPU constants are used in quite a few places, as is page.h

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist. NOT _the_ _other_ _way_
                                | _around_!
http://www.ozlabs.org/people/dgibson

From nathanl at austin.ibm.com Tue Jan 4 00:16:24 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Mon, 03 Jan 2005 07:16:24 -0600
Subject: [PATCH] sparse fixes for cpu feature constants
In-Reply-To: <20050101223345.GC2297@zax>
References: <1104381206.16694.38.camel@localhost.localdomain> <20050101223345.GC2297@zax>
Message-ID: <1104758184.15200.6.camel@localhost.localdomain>

On Sun, 2005-01-02 at 09:33 +1100, David Gibson wrote:
> On Wed, Dec 29, 2004 at 10:33:26PM -0600, Nathan Lynch wrote:
> >
> > Index: 2.6.10/include/asm-ppc64/cputable.h
> > ===================================================================
> > +++ 2.6.10/include/asm-ppc64/cputable.h 2004-12-30 04:04:09.463979408 +0000
> > @@ -16,6 +16,7 @@
> > #define __ASM_PPC_CPUTABLE_H
> >
> > #include
> > +#include /* for ASM_CONST */
>
> Have you double checked that this won't cause a nasty #include loop?
> The CPU constants are used in quite a few places, as is page.h

I think it's ok -- page.h includes the following:

- linux/config.h, which includes linux/autoconf.h

- asm-ppc64/naca.h, which includes asm-ppc64/types.h and
asm-ppc64/systemcfg.h.

So I don't see any way that cputable.h could be pulled in before
ASM_CONST is defined.
Thanks,
Nathan

From jdl at freescale.com Tue Jan 4 05:56:39 2005
From: jdl at freescale.com (Jon Loeliger)
Date: Mon, 03 Jan 2005 12:56:39 -0600
Subject: PATCH uninorth3 (G5) agp support
In-Reply-To: <41D00564.6010507@free.fr>
References: <41CEC6B0.5020106@free.fr> <1104137527.5615.20.camel@gaston> <41D00564.6010507@free.fr>
Message-ID: <1104778599.14049.64.camel@cashmere.sps.mot.com>

On Mon, 2004-12-27 at 06:51, Jerome Glisse wrote:
> /* My understanding of UniNorth AGP as of UniNorth rev 1.0x,
> * revision 1.5 (x4 AGP) may need further changes.
> diff -Naur linux/include/linux/pci_ids.h linux-new/include/linux/pci_ids.h
> --- linux/include/linux/pci_ids.h 2004-12-26 14:40:05.000000000 +0100
> +++ linux-new/include/linux/pci_ids.h 2004-12-27 13:40:50.121003792 +0100
> @@ -842,6 +842,7 @@
> #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC2 0x0032
> #define PCI_DEVIEC_ID_APPLE_UNI_N_ATA 0x0033
> #define PCI_DEVICE_ID_APPLE_UNI_N_AGP2 0x0034
> +#define PCI_DEVICE_ID_APPLE_U3_AGP 0x0059
> #define PCI_DEVICE_ID_APPLE_IPID_ATA100 0x003b
> #define PCI_DEVICE_ID_APPLE_KEYLARGO_I 0x003e
> #define PCI_DEVICE_ID_APPLE_K2_ATA100 0x0043

So, did 0x0033's symbol need to be spelled consistently too?
NB: PCI_DEVIEC_

Thanks,
jdl

From david at gibson.dropbear.id.au Tue Jan 4 11:07:23 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 4 Jan 2005 11:07:23 +1100
Subject: [PATCH] sparse fixes for cpu feature constants
In-Reply-To: <1104758184.15200.6.camel@localhost.localdomain>
References: <1104381206.16694.38.camel@localhost.localdomain> <20050101223345.GC2297@zax> <1104758184.15200.6.camel@localhost.localdomain>
Message-ID: <20050104000723.GB6745@zax>

On Mon, Jan 03, 2005 at 07:16:24AM -0600, Nathan Lynch wrote:
> On Sun, 2005-01-02 at 09:33 +1100, David Gibson wrote:
> > On Wed, Dec 29, 2004 at 10:33:26PM -0600, Nathan Lynch wrote:
> > >
> > > Index: 2.6.10/include/asm-ppc64/cputable.h
> > > ===================================================================
> > > +++ 2.6.10/include/asm-ppc64/cputable.h 2004-12-30 04:04:09.463979408 +0000
> > > @@ -16,6 +16,7 @@
> > > #define __ASM_PPC_CPUTABLE_H
> > >
> > > #include
> > > +#include /* for ASM_CONST */
> >
> > Have you double checked that this won't cause a nasty #include loop?
> > The CPU constants are used in quite a few places, as is page.h
>
> I think it's ok -- page.h includes the following:
>
> - linux/config.h, which includes linux/autoconf.h
>
> - asm-ppc64/naca.h, which includes asm-ppc64/types.h and
> asm-ppc64/systemcfg.h.
>
> So I don't see any way that cputable.h could be pulled in before
> ASM_CONST is defined.

Ok, sounds good.

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist. NOT _the_ _other_ _way_
                                | _around_!
http://www.ozlabs.org/people/dgibson

From sfr at canb.auug.org.au Tue Jan 4 14:53:56 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 14:53:56 +1100
Subject: PPC64 cleanups 0/11
Message-ID: <20050104145356.4d5333dd.sfr@canb.auug.org.au>

Hi Andrew,

The following series of patches are mainly just cleanups of the ppc64
code in order to eliminate the naca structure.
In the end, the naca only exists for legacy iseries kernels.

One of the more intrusive parts of these patches is the renaming of the
fields of the lppaca structure to eliminate another set of StudlyCaps.

These patches (in total) have been built on iSeries, pSeries and pmac
and booted on iSeries and pSeries.

Please apply and send upstream.

--
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/340c1852/attachment.pgp

From sfr at canb.auug.org.au Tue Jan 4 15:04:10 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:04:10 +1100
Subject: [PATCH 1/11] PPC64: Consolidate cache sizing variables
In-Reply-To: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au>
Message-ID: <20050104150410.199b132e.sfr@canb.auug.org.au>

Hi Andrew,

This patch consolidates the variables that define the PPC64 cache sizes
into a single structure (they were in the naca and the systemcfg
structures). Those that were in the systemcfg structure are left there
just because they are exported to user mode through /proc.
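A minimal sketch of how the derived fields of the new `struct ppc64_caches` follow from the raw cache sizes. The struct layout is copied from the cache.h hunk of this patch with `u32` spelled `uint32_t`; the 4 KiB page size, `log2_u32()` and the `fill_caches()` helper are assumptions for illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages (PAGE_SHIFT 12) for this sketch. */
#define SKETCH_PAGE_SIZE 4096u

/* Same fields as struct ppc64_caches in include/asm-ppc64/cache.h. */
struct ppc64_caches {
	uint32_t dsize;           /* L1 d-cache size */
	uint32_t dline_size;      /* L1 d-cache line size */
	uint32_t log_dline_size;
	uint32_t dlines_per_page;
	uint32_t isize;           /* L1 i-cache size */
	uint32_t iline_size;      /* L1 i-cache line size */
	uint32_t log_iline_size;
	uint32_t ilines_per_page;
};

/* floor(log2(n)) for n > 0, using the same halving loop that
 * iSeries_setup.c uses instead of __ilog2(). */
static uint32_t log2_u32(uint32_t n)
{
	uint32_t log = 0;

	while ((n = n / 2))
		++log;
	return log;
}

/* Hypothetical helper: fill every derived field from the raw sizes,
 * as the pSeries and iSeries setup paths do in the patch. */
static void fill_caches(struct ppc64_caches *c, uint32_t dsize,
			uint32_t dline, uint32_t isize, uint32_t iline)
{
	assert(dline != 0 && iline != 0);
	c->dsize = dsize;
	c->dline_size = dline;
	c->log_dline_size = log2_u32(dline);
	c->dlines_per_page = SKETCH_PAGE_SIZE / dline;
	c->isize = isize;
	c->iline_size = iline;
	c->log_iline_size = log2_u32(iline);
	c->ilines_per_page = SKETCH_PAGE_SIZE / iline;
}
```

With 128-byte lines this yields a log2 line size of 7 and 32 lines per page. Precomputing these values is what lets the misc.S flush loops read them with a single `lwz` from one structure.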
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.1/arch/ppc64/kernel/asm-offsets.c --- linus-bk/arch/ppc64/kernel/asm-offsets.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/asm-offsets.c 2004-12-31 14:52:14.000000000 +1100 @@ -35,6 +35,7 @@ #include #include #include +#include #define DEFINE(sym, val) \ asm volatile("\n->" #sym " %0 " #val : : "i" (val)) @@ -69,12 +70,12 @@ /* naca */ DEFINE(PACA, offsetof(struct naca_struct, paca)); - DEFINE(DCACHEL1LINESIZE, offsetof(struct systemcfg, dCacheL1LineSize)); - DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct naca_struct, dCacheL1LogLineSize)); - DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct naca_struct, dCacheL1LinesPerPage)); - DEFINE(ICACHEL1LINESIZE, offsetof(struct systemcfg, iCacheL1LineSize)); - DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct naca_struct, iCacheL1LogLineSize)); - DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct naca_struct, iCacheL1LinesPerPage)); + DEFINE(DCACHEL1LINESIZE, offsetof(struct ppc64_caches, dline_size)); + DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_dline_size)); + DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, dlines_per_page)); + DEFINE(ICACHEL1LINESIZE, offsetof(struct ppc64_caches, iline_size)); + DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_iline_size)); + DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, ilines_per_page)); DEFINE(PLATFORM, offsetof(struct systemcfg, platform)); /* paca */ diff -ruN linus-bk/arch/ppc64/kernel/eeh.c linus-bk-naca.1/arch/ppc64/kernel/eeh.c --- linus-bk/arch/ppc64/kernel/eeh.c 2004-10-26 16:06:41.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/eeh.c 2004-12-31 14:52:14.000000000 +1100 @@ -32,6 +32,7 @@ #include #include #include +#include #include "pci.h" #undef DEBUG diff -ruN linus-bk/arch/ppc64/kernel/iSeries_setup.c 
linus-bk-naca.1/arch/ppc64/kernel/iSeries_setup.c --- linus-bk/arch/ppc64/kernel/iSeries_setup.c 2004-11-12 09:09:48.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/iSeries_setup.c 2004-12-31 14:52:14.000000000 +1100 @@ -44,6 +44,7 @@ #include "iSeries_setup.h" #include #include +#include #include #include #include @@ -560,33 +561,36 @@ unsigned int i, n; unsigned int procIx = get_paca()->lppaca.xDynHvPhysicalProcIndex; - systemcfg->iCacheL1Size = - xIoHriProcessorVpd[procIx].xInstCacheSize * 1024; - systemcfg->iCacheL1LineSize = + systemcfg->icache_size = + ppc64_caches.isize = xIoHriProcessorVpd[procIx].xInstCacheSize * 1024; + systemcfg->icache_line_size = + ppc64_caches.iline_size = xIoHriProcessorVpd[procIx].xInstCacheOperandSize; - systemcfg->dCacheL1Size = + systemcfg->dcache_size = + ppc64_caches.dsize = xIoHriProcessorVpd[procIx].xDataL1CacheSizeKB * 1024; - systemcfg->dCacheL1LineSize = + systemcfg->dcache_line_size = + ppc64_caches.dline_size = xIoHriProcessorVpd[procIx].xDataCacheOperandSize; - naca->iCacheL1LinesPerPage = PAGE_SIZE / systemcfg->iCacheL1LineSize; - naca->dCacheL1LinesPerPage = PAGE_SIZE / systemcfg->dCacheL1LineSize; + ppc64_caches.ilines_per_page = PAGE_SIZE / ppc64_caches.iline_size; + ppc64_caches.dlines_per_page = PAGE_SIZE / ppc64_caches.dline_size; - i = systemcfg->iCacheL1LineSize; + i = ppc64_caches.iline_size; n = 0; while ((i = (i / 2))) ++n; - naca->iCacheL1LogLineSize = n; + ppc64_caches.log_iline_size = n; - i = systemcfg->dCacheL1LineSize; + i = ppc64_caches.dline_size; n = 0; while ((i = (i / 2))) ++n; - naca->dCacheL1LogLineSize = n; + ppc64_caches.log_dline_size = n; printk("D-cache line size = %d\n", - (unsigned int)systemcfg->dCacheL1LineSize); + (unsigned int)ppc64_caches.dline_size); printk("I-cache line size = %d\n", - (unsigned int)systemcfg->iCacheL1LineSize); + (unsigned int)ppc64_caches.iline_size); } /* diff -ruN linus-bk/arch/ppc64/kernel/idle.c linus-bk-naca.1/arch/ppc64/kernel/idle.c --- 
linus-bk/arch/ppc64/kernel/idle.c 2004-10-27 07:32:57.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/idle.c 2004-12-31 14:52:14.000000000 +1100 @@ -32,6 +32,7 @@ #include #include #include +#include extern void power4_idle(void); diff -ruN linus-bk/arch/ppc64/kernel/misc.S linus-bk-naca.1/arch/ppc64/kernel/misc.S --- linus-bk/arch/ppc64/kernel/misc.S 2004-11-12 09:09:48.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/misc.S 2004-12-31 14:52:14.000000000 +1100 @@ -189,6 +189,11 @@ isync blr + .section ".toc","aw" +PPC64_CACHES: + .tc ppc64_caches[TC],ppc64_caches + .section ".text" + /* * Write any modified data cache blocks out to memory * and invalidate the corresponding instruction cache blocks. @@ -207,11 +212,8 @@ * and in some cases i-cache and d-cache line sizes differ from * each other. */ - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) - LOADADDR(r11,systemcfg) /* Get systemcfg address */ - ld r11,0(r11) - lwz r7,DCACHEL1LINESIZE(r11)/* Get cache line size */ + ld r10,PPC64_CACHES at toc(r2) + lwz r7,DCACHEL1LINESIZE(r10)/* Get cache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -227,7 +229,7 @@ /* Now invalidate the instruction cache */ - lwz r7,ICACHEL1LINESIZE(r11) /* Get Icache line size */ + lwz r7,ICACHEL1LINESIZE(r10) /* Get Icache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -256,11 +258,8 @@ * * Different systems have different cache line sizes */ - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) - LOADADDR(r11,systemcfg) /* Get systemcfg address */ - ld r11,0(r11) - lwz r7,DCACHEL1LINESIZE(r11) /* Get dcache line size */ + ld r10,PPC64_CACHES at toc(r2) + lwz r7,DCACHEL1LINESIZE(r10) /* Get dcache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -286,11 +285,8 @@ * flush all bytes from start to stop-1 inclusive */ 
_GLOBAL(flush_dcache_phys_range) - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) - LOADADDR(r11,systemcfg) /* Get systemcfg address */ - ld r11,0(r11) - lwz r7,DCACHEL1LINESIZE(r11) /* Get dcache line size */ + ld r10,PPC64_CACHES at toc(r2) + lwz r7,DCACHEL1LINESIZE(r10) /* Get dcache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -332,13 +328,10 @@ */ /* Flush the dcache */ - LOADADDR(r7,naca) - ld r7,0(r7) - LOADADDR(r8,systemcfg) /* Get systemcfg address */ - ld r8,0(r8) + ld r7,PPC64_CACHES at toc(r2) clrrdi r3,r3,12 /* Page align */ lwz r4,DCACHEL1LINESPERPAGE(r7) /* Get # dcache lines per page */ - lwz r5,DCACHEL1LINESIZE(r8) /* Get dcache line size */ + lwz r5,DCACHEL1LINESIZE(r7) /* Get dcache line size */ mr r6,r3 mtctr r4 0: dcbst 0,r6 @@ -349,7 +342,7 @@ /* Now invalidate the icache */ lwz r4,ICACHEL1LINESPERPAGE(r7) /* Get # icache lines per page */ - lwz r5,ICACHEL1LINESIZE(r8) /* Get icache line size */ + lwz r5,ICACHEL1LINESIZE(r7) /* Get icache line size */ mtctr r4 1: icbi 0,r3 add r3,r3,r5 diff -ruN linus-bk/arch/ppc64/kernel/nvram.c linus-bk-naca.1/arch/ppc64/kernel/nvram.c --- linus-bk/arch/ppc64/kernel/nvram.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/nvram.c 2004-12-31 14:52:14.000000000 +1100 @@ -31,6 +31,7 @@ #include #include #include +#include #undef DEBUG_NVRAM diff -ruN linus-bk/arch/ppc64/kernel/pSeries_iommu.c linus-bk-naca.1/arch/ppc64/kernel/pSeries_iommu.c --- linus-bk/arch/ppc64/kernel/pSeries_iommu.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/pSeries_iommu.c 2004-12-31 14:52:14.000000000 +1100 @@ -43,6 +43,7 @@ #include #include #include +#include #include "pci.h" diff -ruN linus-bk/arch/ppc64/kernel/pacaData.c linus-bk-naca.1/arch/ppc64/kernel/pacaData.c --- linus-bk/arch/ppc64/kernel/pacaData.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/pacaData.c 2004-12-31 
14:52:14.000000000 +1100 @@ -10,6 +10,8 @@ #include #include #include +#include + #include #include #include @@ -20,7 +22,9 @@ #include struct naca_struct *naca; +EXPORT_SYMBOL(naca); struct systemcfg *systemcfg; +EXPORT_SYMBOL(systemcfg); /* This symbol is provided by the linker - let it fill in the paca * field correctly */ diff -ruN linus-bk/arch/ppc64/kernel/pmac_setup.c linus-bk-naca.1/arch/ppc64/kernel/pmac_setup.c --- linus-bk/arch/ppc64/kernel/pmac_setup.c 2004-10-25 18:18:33.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/pmac_setup.c 2004-12-31 14:52:14.000000000 +1100 @@ -70,6 +70,7 @@ #include #include #include +#include #include "pmac.h" #include "mpic.h" diff -ruN linus-bk/arch/ppc64/kernel/ppc_ksyms.c linus-bk-naca.1/arch/ppc64/kernel/ppc_ksyms.c --- linus-bk/arch/ppc64/kernel/ppc_ksyms.c 2004-10-21 07:17:18.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/ppc_ksyms.c 2004-12-31 14:52:14.000000000 +1100 @@ -67,7 +67,6 @@ EXPORT_SYMBOL(__down_interruptible); EXPORT_SYMBOL(__up); -EXPORT_SYMBOL(naca); EXPORT_SYMBOL(__down); #ifdef CONFIG_PPC_ISERIES EXPORT_SYMBOL(itLpNaca); @@ -162,4 +161,3 @@ EXPORT_SYMBOL(tb_ticks_per_usec); EXPORT_SYMBOL(paca); EXPORT_SYMBOL(cur_cpu_spec); -EXPORT_SYMBOL(systemcfg); diff -ruN linus-bk/arch/ppc64/kernel/rtas-proc.c linus-bk-naca.1/arch/ppc64/kernel/rtas-proc.c --- linus-bk/arch/ppc64/kernel/rtas-proc.c 2004-10-21 07:17:18.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/rtas-proc.c 2004-12-31 14:52:14.000000000 +1100 @@ -31,6 +31,7 @@ #include #include /* for ppc_md */ #include +#include /* Token for Sensors */ #define KEY_SWITCH 0x0001 diff -ruN linus-bk/arch/ppc64/kernel/rtas.c linus-bk-naca.1/arch/ppc64/kernel/rtas.c --- linus-bk/arch/ppc64/kernel/rtas.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/rtas.c 2004-12-31 14:52:14.000000000 +1100 @@ -29,6 +29,7 @@ #include #include #include +#include struct flash_block_list_header rtas_firmware_flash_list = {0, NULL}; diff 
-ruN linus-bk/arch/ppc64/kernel/rtasd.c linus-bk-naca.1/arch/ppc64/kernel/rtasd.c --- linus-bk/arch/ppc64/kernel/rtasd.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/rtasd.c 2004-12-31 14:52:14.000000000 +1100 @@ -26,6 +26,7 @@ #include #include #include +#include #if 0 #define DEBUG(A...) printk(KERN_ERR A) diff -ruN linus-bk/arch/ppc64/kernel/setup.c linus-bk-naca.1/arch/ppc64/kernel/setup.c --- linus-bk/arch/ppc64/kernel/setup.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/setup.c 2004-12-31 16:22:00.000000000 +1100 @@ -54,6 +54,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -111,6 +112,8 @@ int boot_cpuid_phys = 0; dev_t boot_dev; +struct ppc64_caches ppc64_caches; + /* * These are used in binfmt_elf.c to put aux entries on the stack * for each elf executable being started. @@ -489,15 +492,15 @@ lsizep = (u32 *) get_property(np, dc, NULL); if (lsizep != NULL) lsize = *lsizep; - if (sizep == 0 || lsizep == 0) DBG("Argh, can't find dcache properties ! " "sizep: %p, lsizep: %p\n", sizep, lsizep); - systemcfg->dCacheL1Size = size; - systemcfg->dCacheL1LineSize = lsize; - naca->dCacheL1LogLineSize = __ilog2(lsize); - naca->dCacheL1LinesPerPage = PAGE_SIZE/(lsize); + systemcfg->dcache_size = ppc64_caches.dsize = size; + systemcfg->dcache_line_size = + ppc64_caches.dline_size = lsize; + ppc64_caches.log_dline_size = __ilog2(lsize); + ppc64_caches.dlines_per_page = PAGE_SIZE / lsize; size = 0; lsize = cur_cpu_spec->icache_bsize; @@ -511,11 +514,11 @@ DBG("Argh, can't find icache properties ! 
" "sizep: %p, lsizep: %p\n", sizep, lsizep); - systemcfg->iCacheL1Size = size; - systemcfg->iCacheL1LineSize = lsize; - naca->iCacheL1LogLineSize = __ilog2(lsize); - naca->iCacheL1LinesPerPage = PAGE_SIZE/(lsize); - + systemcfg->icache_size = ppc64_caches.isize = size; + systemcfg->icache_line_size = + ppc64_caches.iline_size = lsize; + ppc64_caches.log_iline_size = __ilog2(lsize); + ppc64_caches.ilines_per_page = PAGE_SIZE / lsize; } } @@ -664,8 +667,10 @@ printk("systemcfg->platform = 0x%x\n", systemcfg->platform); printk("systemcfg->processorCount = 0x%lx\n", systemcfg->processorCount); printk("systemcfg->physicalMemorySize = 0x%lx\n", systemcfg->physicalMemorySize); - printk("systemcfg->dCacheL1LineSize = 0x%x\n", systemcfg->dCacheL1LineSize); - printk("systemcfg->iCacheL1LineSize = 0x%x\n", systemcfg->iCacheL1LineSize); + printk("ppc64_caches.dcache_line_size = 0x%x\n", + ppc64_caches.dline_size); + printk("ppc64_caches.icache_line_size = 0x%x\n", + ppc64_caches.iline_size); printk("htab_data.htab = 0x%p\n", htab_data.htab); printk("htab_data.num_ptegs = 0x%lx\n", htab_data.htab_num_ptegs); printk("-----------------------------------------------------\n"); @@ -1000,8 +1005,8 @@ * Systems with OF can look in the properties on the cpu node(s) * for a possibly more accurate value. 
*/ - dcache_bsize = systemcfg->dCacheL1LineSize; - icache_bsize = systemcfg->iCacheL1LineSize; + dcache_bsize = ppc64_caches.dline_size; + icache_bsize = ppc64_caches.iline_size; /* reboot on panic */ panic_timeout = 180; diff -ruN linus-bk/arch/ppc64/kernel/sys_ppc32.c linus-bk-naca.1/arch/ppc64/kernel/sys_ppc32.c --- linus-bk/arch/ppc64/kernel/sys_ppc32.c 2004-10-28 16:57:54.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/sys_ppc32.c 2004-12-31 14:52:14.000000000 +1100 @@ -73,6 +73,7 @@ #include #include #include +#include #include "pci.h" diff -ruN linus-bk/arch/ppc64/kernel/sysfs.c linus-bk-naca.1/arch/ppc64/kernel/sysfs.c --- linus-bk/arch/ppc64/kernel/sysfs.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/sysfs.c 2004-12-31 14:52:14.000000000 +1100 @@ -13,6 +13,7 @@ #include #include #include +#include /* SMT stuff */ diff -ruN linus-bk/arch/ppc64/kernel/time.c linus-bk-naca.1/arch/ppc64/kernel/time.c --- linus-bk/arch/ppc64/kernel/time.c 2004-10-21 07:17:18.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/time.c 2004-12-31 14:52:14.000000000 +1100 @@ -66,6 +66,7 @@ #include #include #include +#include void smp_local_timer_interrupt(struct pt_regs *); diff -ruN linus-bk/arch/ppc64/kernel/traps.c linus-bk-naca.1/arch/ppc64/kernel/traps.c --- linus-bk/arch/ppc64/kernel/traps.c 2004-09-09 09:59:49.000000000 +1000 +++ linus-bk-naca.1/arch/ppc64/kernel/traps.c 2004-12-31 14:52:14.000000000 +1100 @@ -37,6 +37,7 @@ #include #include #include +#include #ifdef CONFIG_PPC_PSERIES /* This is true if we are using the firmware NMI handler (typically LPAR) */ diff -ruN linus-bk/include/asm-ppc64/cache.h linus-bk-naca.1/include/asm-ppc64/cache.h --- linus-bk/include/asm-ppc64/cache.h 2002-08-28 06:04:10.000000000 +1000 +++ linus-bk-naca.1/include/asm-ppc64/cache.h 2004-12-31 14:52:14.000000000 +1100 @@ -7,6 +7,8 @@ #ifndef __ARCH_PPC64_CACHE_H #define __ARCH_PPC64_CACHE_H +#include + /* bytes per L1 cache line */ #define L1_CACHE_SHIFT 
7 #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT) @@ -14,4 +16,21 @@ #define SMP_CACHE_BYTES L1_CACHE_BYTES #define L1_CACHE_SHIFT_MAX 7 /* largest L1 which this arch supports */ +#ifndef __ASSEMBLY__ + +struct ppc64_caches { + u32 dsize; /* L1 d-cache size */ + u32 dline_size; /* L1 d-cache line size */ + u32 log_dline_size; + u32 dlines_per_page; + u32 isize; /* L1 i-cache size */ + u32 iline_size; /* L1 i-cache line size */ + u32 log_iline_size; + u32 ilines_per_page; +}; + +extern struct ppc64_caches ppc64_caches; + +#endif + #endif diff -ruN linus-bk/include/asm-ppc64/naca.h linus-bk-naca.1/include/asm-ppc64/naca.h --- linus-bk/include/asm-ppc64/naca.h 2004-09-16 21:51:58.000000000 +1000 +++ linus-bk-naca.1/include/asm-ppc64/naca.h 2004-12-31 14:52:14.000000000 +1100 @@ -16,11 +16,7 @@ #ifndef __ASSEMBLY__ struct naca_struct { - /*================================================================== - * Cache line 1: 0x0000 - 0x007F - * Kernel only data - undefined for user space - *================================================================== - */ + /* Kernel only data - undefined for user space */ void *xItVpdAreas; /* VPD Data 0x00 */ void *xRamDisk; /* iSeries ramdisk 0x08 */ u64 xRamDiskSize; /* In pages 0x10 */ @@ -32,12 +28,6 @@ u64 interrupt_controller; /* Type of int controller 0x40 */ u64 unused1; /* was SLB size in entries 0x48 */ u64 pftSize; /* Log 2 of page table size 0x50 */ - void *systemcfg; /* Pointer to systemcfg data 0x58 */ - u32 dCacheL1LogLineSize; /* L1 d-cache line size Log2 0x60 */ - u32 dCacheL1LinesPerPage; /* L1 d-cache lines / page 0x64 */ - u32 iCacheL1LogLineSize; /* L1 i-cache line size Log2 0x68 */ - u32 iCacheL1LinesPerPage; /* L1 i-cache lines / page 0x6c */ - u8 resv0[15]; /* Reserved 0x71 - 0x7F */ }; extern struct naca_struct *naca; diff -ruN linus-bk/include/asm-ppc64/page.h linus-bk-naca.1/include/asm-ppc64/page.h --- linus-bk/include/asm-ppc64/page.h 2004-10-29 07:03:22.000000000 +1000 +++ 
linus-bk-naca.1/include/asm-ppc64/page.h 2004-12-31 14:52:14.000000000 +1100 @@ -93,7 +93,7 @@ #ifdef __KERNEL__ #ifndef __ASSEMBLY__ -#include +#include #undef STRICT_MM_TYPECHECKS @@ -106,8 +106,8 @@ { unsigned long lines, line_size; - line_size = systemcfg->dCacheL1LineSize; - lines = naca->dCacheL1LinesPerPage; + line_size = ppc64_caches.dline_size; + lines = ppc64_caches.dlines_per_page; __asm__ __volatile__( "mtctr %1 # clear_page\n\ diff -ruN linus-bk/include/asm-ppc64/processor.h linus-bk-naca.1/include/asm-ppc64/processor.h --- linus-bk/include/asm-ppc64/processor.h 2004-12-29 18:05:40.000000000 +1100 +++ linus-bk-naca.1/include/asm-ppc64/processor.h 2004-12-31 15:01:17.000000000 +1100 @@ -19,6 +19,7 @@ #endif #include #include +#include /* Machine State Register (MSR) Fields */ #define MSR_SF_LG 63 /* Enable 64 bit mode */ diff -ruN linus-bk/include/asm-ppc64/systemcfg.h linus-bk-naca.1/include/asm-ppc64/systemcfg.h --- linus-bk/include/asm-ppc64/systemcfg.h 2004-09-29 08:25:16.000000000 +1000 +++ linus-bk-naca.1/include/asm-ppc64/systemcfg.h 2004-12-31 14:52:14.000000000 +1100 @@ -15,14 +15,6 @@ * End Change Activity */ - -#ifndef __KERNEL__ -#include -#include -#include -#include -#endif - /* * If the major version changes we are incompatible. * Minor version changes are a hint. 
@@ -50,10 +42,11 @@ __u64 tb_update_count; /* Timebase atomicity ctr 0x50 */ __u32 tz_minuteswest; /* Minutes west of Greenwich 0x58 */ __u32 tz_dsttime; /* Type of dst correction 0x5C */ - __u32 dCacheL1Size; /* L1 d-cache size 0x60 */ - __u32 dCacheL1LineSize; /* L1 d-cache line size 0x64 */ - __u32 iCacheL1Size; /* L1 i-cache size 0x68 */ - __u32 iCacheL1LineSize; /* L1 i-cache line size 0x6C */ + /* next four are no longer used except to be exported to /proc */ + __u32 dcache_size; /* L1 d-cache size 0x60 */ + __u32 dcache_line_size; /* L1 d-cache line size 0x64 */ + __u32 icache_size; /* L1 i-cache size 0x68 */ + __u32 icache_line_size; /* L1 i-cache line size 0x6C */ __u8 reserved0[3984]; /* Reserve rest of page 0x70 */ };

From sfr at canb.auug.org.au Tue Jan 4 15:08:33 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:08:33 +1100
Subject: [PATCH 2/11] PPC64: remove the page table size from the naca
In-Reply-To: <20050104150410.199b132e.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au>
Message-ID: <20050104150833.5d3f3722.sfr@canb.auug.org.au>

Hi Andrew,

This patch just removes the page table size field from the naca (and
makes it ppc64_pft_size instead).
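The fallback hash-table sizing that now lands in `ppc64_pft_size` can be checked with a little arithmetic. This is a minimal sketch, assuming 4 KiB pages, 128-byte PTEGs and 16-byte HPTEs (the `>> (12 + 1)`, `<< 7`, `>> 7` and `>> 4` shifts in the prom.c, hash_utils.c and pSeries_lpar.c hunks). The round-up-to-a-power-of-two step is not visible in the diff and is reconstructed here, and `ilog2_u64()` is a portable stand-in for the kernel's `__ilog2()`.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(x)) for x > 0; portable stand-in for the kernel's __ilog2(). */
static unsigned int ilog2_u64(uint64_t x)
{
	unsigned int log = 0;

	while (x >>= 1)
		++log;
	return log;
}

/*
 * Fallback used when firmware supplies no ibm,pft-size: one 128-byte
 * PTEG (<< 7) for every two 4 KiB pages (>> (12 + 1)) of RAM, after
 * rounding RAM up to a power of two.  The round-up is an assumption;
 * that part of prom.c is not shown in the diff.
 */
static uint64_t pft_size_from_ram(uint64_t mem_size)
{
	uint64_t rnd_mem_size, pteg_count;

	assert(mem_size != 0);
	rnd_mem_size = 1ULL << ilog2_u64(mem_size);
	if (rnd_mem_size < mem_size)
		rnd_mem_size <<= 1;                 /* next power of 2 */
	pteg_count = rnd_mem_size >> (12 + 1);      /* # pages / 2 */
	return ilog2_u64(pteg_count << 7);          /* log2(table bytes) */
}

/* Consumers convert the log2 size back into bytes and entry counts. */
static uint64_t htab_bytes(uint64_t pft_size) { return 1ULL << pft_size; }
static uint64_t htab_ptegs(uint64_t pft_size) { return htab_bytes(pft_size) >> 7; }
static uint64_t htab_hptes(uint64_t pft_size) { return htab_bytes(pft_size) >> 4; }
```

For 1 GiB of RAM this gives 2^17 PTEGs, a 16 MiB hash table, and a `ppc64_pft_size` of 24; 3 GiB rounds up to 4 GiB and gives 26.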
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.1/arch/ppc64/kernel/pSeries_lpar.c linus-bk-naca.2/arch/ppc64/kernel/pSeries_lpar.c --- linus-bk-naca.1/arch/ppc64/kernel/pSeries_lpar.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.2/arch/ppc64/kernel/pSeries_lpar.c 2004-12-31 15:16:48.000000000 +1100 @@ -33,7 +33,6 @@ #include #include #include -#include #include #include #include @@ -368,7 +367,7 @@ static void pSeries_lpar_hptab_clear(void) { - unsigned long size_bytes = 1UL << naca->pftSize; + unsigned long size_bytes = 1UL << ppc64_pft_size; unsigned long hpte_count = size_bytes >> 4; unsigned long dummy1, dummy2; int i; diff -ruN linus-bk-naca.1/arch/ppc64/kernel/prom.c linus-bk-naca.2/arch/ppc64/kernel/prom.c --- linus-bk-naca.1/arch/ppc64/kernel/prom.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.2/arch/ppc64/kernel/prom.c 2004-12-31 14:52:56.000000000 +1100 @@ -844,12 +844,12 @@ /* On LPAR, look for the first ibm,pft-size property for the hash table size */ - if (systemcfg->platform == PLATFORM_PSERIES_LPAR && naca->pftSize == 0) { + if (systemcfg->platform == PLATFORM_PSERIES_LPAR && ppc64_pft_size == 0) { u32 *pft_size; pft_size = (u32 *)get_flat_dt_prop(node, "ibm,pft-size", NULL); if (pft_size != NULL) { /* pft_size[0] is the NUMA CEC cookie */ - naca->pftSize = pft_size[1]; + ppc64_pft_size = pft_size[1]; } } @@ -1018,7 +1018,7 @@ initial_boot_params = params; /* By default, hash size is not set */ - naca->pftSize = 0; + ppc64_pft_size = 0; /* Retreive various informations from the /chosen node of the * device-tree, including the platform type, initrd location and @@ -1047,7 +1047,7 @@ /* If hash size wasn't obtained above, we calculate it now based on * the total RAM size */ - if (naca->pftSize == 0) { + if (ppc64_pft_size == 0) { unsigned long rnd_mem_size, pteg_count; /* round mem_size up to next power of 2 */ @@ -1058,10 +1058,10 @@ /* # 
pages / 2 */ pteg_count = (rnd_mem_size >> (12 + 1)); - naca->pftSize = __ilog2(pteg_count << 7); + ppc64_pft_size = __ilog2(pteg_count << 7); } - DBG("Hash pftSize: %x\n", (int)naca->pftSize); + DBG("Hash pftSize: %x\n", (int)ppc64_pft_size); DBG(" <- early_init_devtree()\n"); } diff -ruN linus-bk-naca.1/arch/ppc64/kernel/setup.c linus-bk-naca.2/arch/ppc64/kernel/setup.c --- linus-bk-naca.1/arch/ppc64/kernel/setup.c 2004-12-31 16:22:00.000000000 +1100 +++ linus-bk-naca.2/arch/ppc64/kernel/setup.c 2004-12-31 16:22:49.000000000 +1100 @@ -55,6 +55,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -111,6 +112,7 @@ int boot_cpuid = 0; int boot_cpuid_phys = 0; dev_t boot_dev; +u64 ppc64_pft_size; struct ppc64_caches ppc64_caches; @@ -660,7 +662,7 @@ printk("-----------------------------------------------------\n"); printk("naca = 0x%p\n", naca); - printk("naca->pftSize = 0x%lx\n", naca->pftSize); + printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); printk("naca->debug_switch = 0x%lx\n", naca->debug_switch); printk("naca->interrupt_controller = 0x%ld\n", naca->interrupt_controller); printk("systemcfg = 0x%p\n", systemcfg); diff -ruN linus-bk-naca.1/arch/ppc64/mm/hash_utils.c linus-bk-naca.2/arch/ppc64/mm/hash_utils.c --- linus-bk-naca.1/arch/ppc64/mm/hash_utils.c 2004-10-29 07:03:21.000000000 +1000 +++ linus-bk-naca.2/arch/ppc64/mm/hash_utils.c 2004-12-31 14:52:56.000000000 +1100 @@ -41,7 +41,6 @@ #include #include #include -#include #include #include #include @@ -147,7 +146,7 @@ * Calculate the required size of the htab. We want the number of * PTEGs to equal one half the number of real pages. */ - htab_size_bytes = 1UL << naca->pftSize; + htab_size_bytes = 1UL << ppc64_pft_size; pteg_count = htab_size_bytes >> 7; /* For debug, make the HTAB 1/8 as big as it normally would be. 
*/ diff -ruN linus-bk-naca.1/include/asm-ppc64/naca.h linus-bk-naca.2/include/asm-ppc64/naca.h --- linus-bk-naca.1/include/asm-ppc64/naca.h 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.2/include/asm-ppc64/naca.h 2004-12-31 14:52:56.000000000 +1100 @@ -26,8 +26,6 @@ u64 log; /* Ptr to log buffer 0x30 */ u64 serialPortAddr; /* Phy addr of serial port 0x38 */ u64 interrupt_controller; /* Type of int controller 0x40 */ - u64 unused1; /* was SLB size in entries 0x48 */ - u64 pftSize; /* Log 2 of page table size 0x50 */ }; extern struct naca_struct *naca; diff -ruN linus-bk-naca.1/include/asm-ppc64/page.h linus-bk-naca.2/include/asm-ppc64/page.h --- linus-bk-naca.1/include/asm-ppc64/page.h 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.2/include/asm-ppc64/page.h 2004-12-31 14:52:56.000000000 +1100 @@ -183,6 +183,8 @@ extern int page_is_ram(unsigned long pfn); +extern u64 ppc64_pft_size; /* Log 2 of page table size */ + #endif /* __ASSEMBLY__ */ #ifdef MODULE

From sfr at canb.auug.org.au Tue Jan 4 15:12:29 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 4 Jan 2005 15:12:29 +1100
Subject: [PATCH 3/11] PPC64: remove interrupt_controller from naca
In-Reply-To: <20050104150833.5d3f3722.sfr@canb.auug.org.au>
References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au>
Message-ID: <20050104151229.521e8083.sfr@canb.auug.org.au>

Hi Andrew,

This patch just moves the interrupt_controller field of the naca into a
global variable.
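The detection logic that now writes `ppc64_interrupt_controller` instead of `naca->interrupt_controller` can be sketched on its own. The `IC_*` names and the two `compatible` substrings come from the patch. The numeric values of the `IC_*` constants, the array-of-strings stand-in for walking "interrupt-controller" device-tree nodes, the NULL check and the name `detect_interrupt_controller()` are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Names from the patch; numeric values assumed for this sketch. */
enum { IC_INVALID = 0, IC_OPEN_PIC = 1, IC_PPC_XIC = 2 };

/* A plain global replacing naca->interrupt_controller. */
uint64_t ppc64_interrupt_controller;

/*
 * Hypothetical stand-in for the pSeries_setup.c loop: each element is
 * the "compatible" string of one interrupt-controller node.  In this
 * sketch a later recognised node overrides an earlier one, and nodes
 * without the property are skipped.
 */
static void detect_interrupt_controller(const char *const compatible[], size_t n)
{
	size_t i;

	assert(compatible != NULL || n == 0);
	ppc64_interrupt_controller = IC_INVALID;
	for (i = 0; i < n; i++) {
		const char *typep = compatible[i];

		if (typep == NULL)
			continue;
		if (strstr(typep, "open-pic"))
			ppc64_interrupt_controller = IC_OPEN_PIC;
		else if (strstr(typep, "ppc-xicp"))
			ppc64_interrupt_controller = IC_PPC_XIC;
	}
}
```

Consumers such as the virtual-irq mapping in irq.c then only need to compare the global against `IC_OPEN_PIC`, with no pointer dereference through the naca.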
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.2/arch/ppc64/kernel/irq.c linus-bk-naca.3/arch/ppc64/kernel/irq.c --- linus-bk-naca.2/arch/ppc64/kernel/irq.c 2004-10-21 07:17:18.000000000 +1000 +++ linus-bk-naca.3/arch/ppc64/kernel/irq.c 2004-12-31 14:53:21.000000000 +1100 @@ -65,6 +65,7 @@ int __irq_offset_value; int ppc_spurious_interrupts; unsigned long lpevent_count; +u64 ppc64_interrupt_controller; int show_interrupts(struct seq_file *p, void *v) { @@ -360,7 +361,7 @@ unsigned int virq, first_virq; static int warned; - if (naca->interrupt_controller == IC_OPEN_PIC) + if (ppc64_interrupt_controller == IC_OPEN_PIC) return real_irq; /* no mapping for openpic (for now) */ /* don't map interrupts < MIN_VIRT_IRQ */ diff -ruN linus-bk-naca.2/arch/ppc64/kernel/maple_setup.c linus-bk-naca.3/arch/ppc64/kernel/maple_setup.c --- linus-bk-naca.2/arch/ppc64/kernel/maple_setup.c 2004-10-30 08:33:22.000000000 +1000 +++ linus-bk-naca.3/arch/ppc64/kernel/maple_setup.c 2004-12-31 14:53:21.000000000 +1100 @@ -155,7 +155,7 @@ } /* Setup interrupt mapping options */ - naca->interrupt_controller = IC_OPEN_PIC; + ppc64_interrupt_controller = IC_OPEN_PIC; DBG(" <- maple_init_early\n"); } diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pSeries_pci.c linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c --- linus-bk-naca.2/arch/ppc64/kernel/pSeries_pci.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c 2004-12-31 14:53:21.000000000 +1100 @@ -353,7 +353,7 @@ unsigned int *opprop = NULL; struct device_node *root = of_find_node_by_path("/"); - if (naca->interrupt_controller == IC_OPEN_PIC) { + if (ppc64_interrupt_controller == IC_OPEN_PIC) { opprop = (unsigned int *)get_property(root, "platform-open-pic", NULL); } @@ -375,7 +375,7 @@ pci_process_bridge_OF_ranges(phb, node); pci_setup_phb_io(phb, index == 0); - if (naca->interrupt_controller == IC_OPEN_PIC && 
pSeries_mpic) { + if (ppc64_interrupt_controller == IC_OPEN_PIC && pSeries_mpic) { int addr = root_size_cells * (index + 2) - 1; mpic_assign_isu(pSeries_mpic, index, opprop[addr]); } diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pSeries_setup.c linus-bk-naca.3/arch/ppc64/kernel/pSeries_setup.c --- linus-bk-naca.2/arch/ppc64/kernel/pSeries_setup.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/pSeries_setup.c 2004-12-31 15:22:17.000000000 +1100 @@ -196,7 +196,7 @@ static void __init pSeries_setup_arch(void) { /* Fixup ppc_md depending on the type of interrupt controller */ - if (naca->interrupt_controller == IC_OPEN_PIC) { + if (ppc64_interrupt_controller == IC_OPEN_PIC) { ppc_md.init_IRQ = pSeries_init_mpic; ppc_md.get_irq = mpic_get_irq; /* Allocate the mpic now, so that find_and_init_phbs() can @@ -308,13 +308,13 @@ * to properly parse the OF interrupt tree & do the virtual irq mapping */ __irq_offset_value = NUM_ISA_INTERRUPTS; - naca->interrupt_controller = IC_INVALID; + ppc64_interrupt_controller = IC_INVALID; for (np = NULL; (np = of_find_node_by_name(np, "interrupt-controller"));) { typep = (char *)get_property(np, "compatible", NULL); if (strstr(typep, "open-pic")) - naca->interrupt_controller = IC_OPEN_PIC; + ppc64_interrupt_controller = IC_OPEN_PIC; else if (strstr(typep, "ppc-xicp")) - naca->interrupt_controller = IC_PPC_XIC; + ppc64_interrupt_controller = IC_PPC_XIC; else printk("initialize_naca: failed to recognize" " interrupt-controller\n"); diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pSeries_smp.c linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c --- linus-bk-naca.2/arch/ppc64/kernel/pSeries_smp.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c 2004-12-31 15:22:45.000000000 +1100 @@ -348,7 +348,7 @@ DBG(" -> smp_init_pSeries()\n"); - if (naca->interrupt_controller == IC_OPEN_PIC) + if (ppc64_interrupt_controller == IC_OPEN_PIC) smp_ops = &pSeries_mpic_smp_ops; else smp_ops = 
&pSeries_xics_smp_ops; diff -ruN linus-bk-naca.2/arch/ppc64/kernel/pmac_setup.c linus-bk-naca.3/arch/ppc64/kernel/pmac_setup.c --- linus-bk-naca.2/arch/ppc64/kernel/pmac_setup.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/pmac_setup.c 2004-12-31 14:53:21.000000000 +1100 @@ -70,7 +70,6 @@ #include #include #include -#include #include "pmac.h" #include "mpic.h" @@ -316,7 +315,7 @@ } /* Setup interrupt mapping options */ - naca->interrupt_controller = IC_OPEN_PIC; + ppc64_interrupt_controller = IC_OPEN_PIC; DBG(" <- pmac_init_early\n"); } diff -ruN linus-bk-naca.2/arch/ppc64/kernel/prom.c linus-bk-naca.3/arch/ppc64/kernel/prom.c --- linus-bk-naca.2/arch/ppc64/kernel/prom.c 2004-12-31 14:52:56.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/prom.c 2004-12-31 14:53:21.000000000 +1100 @@ -44,7 +44,6 @@ #include #include #include -#include #include #include #include @@ -557,7 +556,7 @@ DBG(" -> finish_device_tree\n"); - if (naca->interrupt_controller == IC_INVALID) { + if (ppc64_interrupt_controller == IC_INVALID) { DBG("failed to configure interrupt controller type\n"); panic("failed to configure interrupt controller type\n"); } diff -ruN linus-bk-naca.2/arch/ppc64/kernel/setup.c linus-bk-naca.3/arch/ppc64/kernel/setup.c --- linus-bk-naca.2/arch/ppc64/kernel/setup.c 2004-12-31 16:22:49.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/setup.c 2004-12-31 16:23:03.000000000 +1100 @@ -664,7 +664,7 @@ printk("naca = 0x%p\n", naca); printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); printk("naca->debug_switch = 0x%lx\n", naca->debug_switch); - printk("naca->interrupt_controller = 0x%ld\n", naca->interrupt_controller); + printk("ppc64_interrupt_controller = 0x%ld\n", ppc64_interrupt_controller); printk("systemcfg = 0x%p\n", systemcfg); printk("systemcfg->platform = 0x%x\n", systemcfg->platform); printk("systemcfg->processorCount = 0x%lx\n", systemcfg->processorCount); diff -ruN linus-bk-naca.2/arch/ppc64/kernel/xics.c 
linus-bk-naca.3/arch/ppc64/kernel/xics.c --- linus-bk-naca.2/arch/ppc64/kernel/xics.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.3/arch/ppc64/kernel/xics.c 2004-12-31 15:24:20.000000000 +1100 @@ -24,7 +24,6 @@ #include #include #include -#include #include #include #include @@ -575,7 +574,7 @@ */ static int __init xics_setup_i8259(void) { - if (naca->interrupt_controller == IC_PPC_XIC && + if (ppc64_interrupt_controller == IC_PPC_XIC && xics_irq_8259_cascade != -1) { if (request_irq(irq_offset_up(xics_irq_8259_cascade), no_action, 0, "8259 cascade", NULL)) diff -ruN linus-bk-naca.2/include/asm-ppc64/naca.h linus-bk-naca.3/include/asm-ppc64/naca.h --- linus-bk-naca.2/include/asm-ppc64/naca.h 2004-12-31 14:52:56.000000000 +1100 +++ linus-bk-naca.3/include/asm-ppc64/naca.h 2004-12-31 14:53:21.000000000 +1100 @@ -25,7 +25,6 @@ u64 banner; /* Ptr to banner string 0x28 */ u64 log; /* Ptr to log buffer 0x30 */ u64 serialPortAddr; /* Phy addr of serial port 0x38 */ - u64 interrupt_controller; /* Type of int controller 0x40 */ }; extern struct naca_struct *naca; diff -ruN linus-bk-naca.2/include/asm-ppc64/processor.h linus-bk-naca.3/include/asm-ppc64/processor.h --- linus-bk-naca.2/include/asm-ppc64/processor.h 2004-12-31 15:01:17.000000000 +1100 +++ linus-bk-naca.3/include/asm-ppc64/processor.h 2004-12-31 15:25:17.000000000 +1100 @@ -484,6 +484,7 @@ #ifdef __KERNEL__ extern int have_of; +extern u64 ppc64_interrupt_controller; struct task_struct; void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp);
From sfr at canb.auug.org.au Tue Jan 4 15:19:06 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:19:06 +1100 Subject: [PATCH 4/11] PPC64: remove /proc/ppc64/{naca,paca/xx} In-Reply-To: <20050104151229.521e8083.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> Message-ID: <20050104151906.6e50f1d2.sfr@canb.auug.org.au> Hi Andrew, This patch removes the (unused) /proc entries for the naca and the (per cpu) pacas. Also it removes a lot of no longer necessary includes of . Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.3/arch/ppc64/kernel/iSeries_pci.c linus-bk-naca.4/arch/ppc64/kernel/iSeries_pci.c --- linus-bk-naca.3/arch/ppc64/kernel/iSeries_pci.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/kernel/iSeries_pci.c 2004-12-10 16:26:54.000000000 +1100 @@ -35,7 +35,6 @@ #include #include #include -#include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/kernel/iSeries_proc.c linus-bk-naca.4/arch/ppc64/kernel/iSeries_proc.c --- linus-bk-naca.3/arch/ppc64/kernel/iSeries_proc.c 2004-10-22 07:00:21.000000000 +1000 +++ linus-bk-naca.4/arch/ppc64/kernel/iSeries_proc.c 2004-12-10 16:26:54.000000000 +1100 @@ -24,7 +24,6 @@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/kernel/iSeries_smp.c linus-bk-naca.4/arch/ppc64/kernel/iSeries_smp.c --- linus-bk-naca.3/arch/ppc64/kernel/iSeries_smp.c 2004-10-30 08:33:22.000000000 +1000 +++ linus-bk-naca.4/arch/ppc64/kernel/iSeries_smp.c 2004-12-10 16:26:54.000000000 +1100 @@ -37,7 +37,6
@@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c linus-bk-naca.4/arch/ppc64/kernel/pSeries_pci.c --- linus-bk-naca.3/arch/ppc64/kernel/pSeries_pci.c 2004-12-31 14:53:21.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/kernel/pSeries_pci.c 2004-12-10 16:26:54.000000000 +1100 @@ -36,7 +36,6 @@ #include #include #include -#include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c linus-bk-naca.4/arch/ppc64/kernel/pSeries_smp.c --- linus-bk-naca.3/arch/ppc64/kernel/pSeries_smp.c 2004-12-31 15:22:45.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/kernel/pSeries_smp.c 2004-12-31 15:27:45.000000000 +1100 @@ -38,7 +38,6 @@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/kernel/pci_dn.c linus-bk-naca.4/arch/ppc64/kernel/pci_dn.c --- linus-bk-naca.3/arch/ppc64/kernel/pci_dn.c 2004-10-25 18:18:33.000000000 +1000 +++ linus-bk-naca.4/arch/ppc64/kernel/pci_dn.c 2004-12-10 16:26:54.000000000 +1100 @@ -33,7 +33,6 @@ #include #include #include -#include #include #include "pci.h" diff -ruN linus-bk-naca.3/arch/ppc64/kernel/proc_ppc64.c linus-bk-naca.4/arch/ppc64/kernel/proc_ppc64.c --- linus-bk-naca.3/arch/ppc64/kernel/proc_ppc64.c 2004-10-27 07:32:57.000000000 +1000 +++ linus-bk-naca.4/arch/ppc64/kernel/proc_ppc64.c 2004-12-10 16:26:54.000000000 +1100 @@ -25,8 +25,6 @@ #include #include -#include -#include #include #include #include @@ -58,26 +56,6 @@ #endif /* - * NOTE: since paca data is always in flux the values will never be a - * consistant set. 
- */ -static void __init proc_create_paca(struct proc_dir_entry *dir, int num) -{ - struct proc_dir_entry *ent; - struct paca_struct *lpaca = paca + num; - char buf[16]; - - sprintf(buf, "%02x", num); - ent = create_proc_entry(buf, S_IRUSR, dir); - if (ent) { - ent->nlink = 1; - ent->data = lpaca; - ent->size = 4096; - ent->proc_fops = &page_map_fops; - } -} - -/* * Create the ppc64 and ppc64/rtas directories early. This allows us to * assume that they have been previously created in drivers. */ @@ -104,17 +82,8 @@ static int __init proc_ppc64_init(void) { - unsigned long i; struct proc_dir_entry *pde; - pde = create_proc_entry("ppc64/naca", S_IRUSR, NULL); - if (!pde) - return 1; - pde->nlink = 1; - pde->data = naca; - pde->size = 4096; - pde->proc_fops = &page_map_fops; - pde = create_proc_entry("ppc64/systemcfg", S_IFREG|S_IRUGO, NULL); if (!pde) return 1; @@ -123,13 +92,6 @@ pde->size = 4096; pde->proc_fops = &page_map_fops; - /* /proc/ppc64/paca/XX -- raw paca contents. Only readable to root */ - pde = proc_mkdir("ppc64/paca", NULL); - if (!pde) - return 1; - for_each_cpu(i) - proc_create_paca(pde, i); - #ifdef CONFIG_PPC_PSERIES if ((systemcfg->platform & PLATFORM_PSERIES)) proc_ppc64_create_ofdt(); diff -ruN linus-bk-naca.3/arch/ppc64/kernel/prom_init.c linus-bk-naca.4/arch/ppc64/kernel/prom_init.c --- linus-bk-naca.3/arch/ppc64/kernel/prom_init.c 2004-12-08 12:07:34.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/kernel/prom_init.c 2004-12-10 16:26:54.000000000 +1100 @@ -43,7 +43,6 @@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/kernel/smp.c linus-bk-naca.4/arch/ppc64/kernel/smp.c --- linus-bk-naca.3/arch/ppc64/kernel/smp.c 2004-12-14 04:07:06.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/kernel/smp.c 2004-12-31 15:29:14.000000000 +1100 @@ -41,7 +41,6 @@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/mm/init.c linus-bk-naca.4/arch/ppc64/mm/init.c 
--- linus-bk-naca.3/arch/ppc64/mm/init.c 2004-11-04 16:05:08.000000000 +1100 +++ linus-bk-naca.4/arch/ppc64/mm/init.c 2004-12-10 16:26:54.000000000 +1100 @@ -52,7 +52,6 @@ #include #include #include -#include #include #include #include diff -ruN linus-bk-naca.3/arch/ppc64/mm/slb.c linus-bk-naca.4/arch/ppc64/mm/slb.c --- linus-bk-naca.3/arch/ppc64/mm/slb.c 2004-09-06 10:19:04.000000000 +1000 +++ linus-bk-naca.4/arch/ppc64/mm/slb.c 2004-12-10 16:26:54.000000000 +1100 @@ -19,7 +19,6 @@ #include #include #include -#include #include extern void slb_allocate(unsigned long ea); diff -ruN linus-bk-naca.3/arch/ppc64/mm/stab.c linus-bk-naca.4/arch/ppc64/mm/stab.c --- linus-bk-naca.3/arch/ppc64/mm/stab.c 2004-09-16 21:51:57.000000000 +1000 +++ linus-bk-naca.4/arch/ppc64/mm/stab.c 2004-12-10 16:26:54.000000000 +1100 @@ -17,7 +17,6 @@ #include #include #include -#include #include /* Both the segment table and SLB code uses the following cache */ diff -ruN linus-bk-naca.3/include/asm-ppc64/iSeries/LparData.h linus-bk-naca.4/include/asm-ppc64/iSeries/LparData.h --- linus-bk-naca.3/include/asm-ppc64/iSeries/LparData.h 2002-09-18 12:00:50.000000000 +1000 +++ linus-bk-naca.4/include/asm-ppc64/iSeries/LparData.h 2004-12-10 16:26:54.000000000 +1100 @@ -24,11 +24,9 @@ #include #include -#include #include #include #include -#include #include #include #include
From sfr at canb.auug.org.au Tue Jan 4 15:23:40 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:23:40 +1100 Subject: [PATCH 5/11] PPC64: remove the paca pointer form the naca In-Reply-To: <20050104151906.6e50f1d2.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> Message-ID: <20050104152340.67219ccf.sfr@canb.auug.org.au> Hi Andrew, The only place that was using the paca pointer that was in the naca was some assembler that used it to find a parameter to pass to some C code. That C code did not even declare that parameter! Remove the paca pointer. Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.4/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.5/arch/ppc64/kernel/asm-offsets.c --- linus-bk-naca.4/arch/ppc64/kernel/asm-offsets.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.5/arch/ppc64/kernel/asm-offsets.c 2004-12-10 17:27:14.000000000 +1100 @@ -28,7 +28,6 @@ #include #include -#include #include #include #include @@ -68,8 +67,6 @@ #endif /* CONFIG_ALTIVEC */ DEFINE(MM, offsetof(struct task_struct, mm)); - /* naca */ - DEFINE(PACA, offsetof(struct naca_struct, paca)); DEFINE(DCACHEL1LINESIZE, offsetof(struct ppc64_caches, dline_size)); DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_dline_size)); DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, dlines_per_page)); diff -ruN linus-bk-naca.4/arch/ppc64/kernel/head.S linus-bk-naca.5/arch/ppc64/kernel/head.S --- linus-bk-naca.4/arch/ppc64/kernel/head.S 2004-11-26
12:08:51.000000000 +1100 +++ linus-bk-naca.5/arch/ppc64/kernel/head.S 2004-12-10 18:40:24.000000000 +1100 @@ -517,12 +517,7 @@ __start_naca: #ifdef CONFIG_PPC_ISERIES .llong itVpdAreas -#else - .llong 0x0 #endif - .llong 0x0 - .llong 0x0 - .llong paca . = SYSTEMCFG_PHYS_ADDR .globl __end_naca @@ -1241,6 +1236,7 @@ #endif #endif b 3b /* Loop until told to go */ + #ifdef CONFIG_PPC_ISERIES _STATIC(__start_initialization_iSeries) /* Clear out the BSS */ @@ -1278,10 +1274,6 @@ SET_REG_TO_CONST(r4, NACA_VIRT_ADDR) std r4,0(r9) /* set the naca pointer */ - /* Get the pointer to the segment table */ - ld r6,PACA(r4) /* Get the base paca pointer */ - ld r4,PACASTABVIRT(r6) - bl .iSeries_early_setup /* relocation is on at this point */ diff -ruN linus-bk-naca.4/include/asm-ppc64/naca.h linus-bk-naca.5/include/asm-ppc64/naca.h --- linus-bk-naca.4/include/asm-ppc64/naca.h 2004-12-31 14:53:21.000000000 +1100 +++ linus-bk-naca.5/include/asm-ppc64/naca.h 2004-12-10 18:42:14.000000000 +1100 @@ -11,7 +11,6 @@ */ #include -#include #ifndef __ASSEMBLY__ @@ -20,7 +19,6 @@ void *xItVpdAreas; /* VPD Data 0x00 */ void *xRamDisk; /* iSeries ramdisk 0x08 */ u64 xRamDiskSize; /* In pages 0x10 */ - struct paca_struct *paca; /* Ptr to an array of pacas 0x18 */ u64 debug_switch; /* Debug print control 0x20 */ u64 banner; /* Ptr to banner string 0x28 */ u64 log; /* Ptr to log buffer 0x30 */
From sfr at canb.auug.org.au Tue Jan 4 15:27:05 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:27:05 +1100 Subject: [PATCH 6/11] PPC64: remove serialPortAddr from the naca In-Reply-To: <20050104152340.67219ccf.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> Message-ID: <20050104152705.6030abc5.sfr@canb.auug.org.au> Hi Andrew, The serialPortAddr field of the naca was only being used locally, remove it. Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.5/arch/ppc64/kernel/maple_setup.c linus-bk-naca.6/arch/ppc64/kernel/maple_setup.c --- linus-bk-naca.5/arch/ppc64/kernel/maple_setup.c 2004-12-31 14:53:21.000000000 +1100 +++ linus-bk-naca.6/arch/ppc64/kernel/maple_setup.c 2004-12-11 00:53:42.000000000 +1100 @@ -75,7 +75,8 @@ extern void maple_pci_init(void); extern void maple_pcibios_fixup(void); extern int maple_pci_get_legacy_ide_irq(struct pci_dev *dev, int channel); -extern void generic_find_legacy_serial_ports(unsigned int *default_speed); +extern void generic_find_legacy_serial_ports(u64 *physport, + unsigned int *default_speed); static void maple_restart(char *cmd) @@ -129,6 +130,7 @@ static void __init maple_init_early(void) { unsigned int default_speed; + u64 physport; DBG(" -> maple_init_early\n"); @@ -138,14 +140,14 @@ hpte_init_native(); /* Find the serial port */ - generic_find_legacy_serial_ports(&default_speed); + generic_find_legacy_serial_ports(&physport, &default_speed); -
DBG("naca->serialPortAddr: %lx\n", (long)naca->serialPortAddr); + DBG("phys port addr: %lx\n", (long)physport); - if (naca->serialPortAddr) { + if (physport) { void *comport; /* Map the uart for udbg. */ - comport = (void *)__ioremap(naca->serialPortAddr, 16, _PAGE_NO_CACHE); + comport = (void *)__ioremap(physport, 16, _PAGE_NO_CACHE); udbg_init_uart(comport, default_speed); ppc_md.udbg_putc = udbg_putc; diff -ruN linus-bk-naca.5/arch/ppc64/kernel/pSeries_setup.c linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c --- linus-bk-naca.5/arch/ppc64/kernel/pSeries_setup.c 2004-12-31 15:22:17.000000000 +1100 +++ linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c 2004-12-31 15:35:13.000000000 +1100 @@ -81,7 +81,8 @@ extern int pSeries_set_rtc_time(struct rtc_time *rtc_time); extern void find_udbg_vterm(void); extern void SystemReset_FWNMI(void), MachineCheck_FWNMI(void); /* from head.S */ -extern void generic_find_legacy_serial_ports(unsigned int *default_speed); +extern void generic_find_legacy_serial_ports(u64 *physport, + unsigned int *default_speed); int fwnmi_active; /* TRUE if an FWNMI handler is present */ @@ -344,6 +345,7 @@ void *comport; int iommu_off = 0; unsigned int default_speed; + u64 physport; DBG(" -> pSeries_init_early()\n"); @@ -357,13 +359,13 @@ get_property(of_chosen, "linux,iommu-off", NULL)); } - generic_find_legacy_serial_ports(&default_speed); + generic_find_legacy_serial_ports(&physport, &default_speed); if (systemcfg->platform & PLATFORM_LPAR) find_udbg_vterm(); - else if (naca->serialPortAddr) { + else if (physport) { /* Map the uart for udbg. 
*/ - comport = (void *)__ioremap(naca->serialPortAddr, 16, _PAGE_NO_CACHE); + comport = (void *)__ioremap(physport, 16, _PAGE_NO_CACHE); udbg_init_uart(comport, default_speed); ppc_md.udbg_putc = udbg_putc; diff -ruN linus-bk-naca.5/arch/ppc64/kernel/setup.c linus-bk-naca.6/arch/ppc64/kernel/setup.c --- linus-bk-naca.5/arch/ppc64/kernel/setup.c 2004-12-31 16:24:54.000000000 +1100 +++ linus-bk-naca.6/arch/ppc64/kernel/setup.c 2004-12-31 16:23:30.000000000 +1100 @@ -1154,7 +1154,8 @@ static struct plat_serial8250_port serial_ports[MAX_LEGACY_SERIAL_PORTS+1]; static unsigned int old_serial_count; -void __init generic_find_legacy_serial_ports(unsigned int *default_speed) +void __init generic_find_legacy_serial_ports(u64 *physport, + unsigned int *default_speed) { struct device_node *np; u32 *sizeprop; @@ -1172,7 +1173,7 @@ DBG(" -> generic_find_legacy_serial_port()\n"); - naca->serialPortAddr = 0; + *physport = 0; if (default_speed) *default_speed = 0; @@ -1294,7 +1295,7 @@ io_base = (io_base << 32) | rangesp[4]; } if (io_base != 0) { - naca->serialPortAddr = io_base + reg->address; + *physport = io_base + reg->address; if (default_speed && spd) *default_speed = *spd; } diff -ruN linus-bk-naca.5/include/asm-ppc64/naca.h linus-bk-naca.6/include/asm-ppc64/naca.h --- linus-bk-naca.5/include/asm-ppc64/naca.h 2004-12-10 18:42:14.000000000 +1100 +++ linus-bk-naca.6/include/asm-ppc64/naca.h 2004-12-11 00:03:55.000000000 +1100 @@ -22,7 +22,6 @@ u64 debug_switch; /* Debug print control 0x20 */ u64 banner; /* Ptr to banner string 0x28 */ u64 log; /* Ptr to log buffer 0x30 */ - u64 serialPortAddr; /* Phy addr of serial port 0x38 */ }; extern struct naca_struct *naca;
From sfr at canb.auug.org.au Tue Jan 4 15:31:02 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:31:02 +1100 Subject: [PATCH 7/11] PPC64: remove debug_switch from the naca In-Reply-To: <20050104152705.6030abc5.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> Message-ID: <20050104153102.67284491.sfr@canb.auug.org.au> Hi Andrew, The patch moves the debug_switch from the naca to a global variable. Also, a couple of trivial naming tidy ups. Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c linus-bk-naca.7/arch/ppc64/kernel/pSeries_setup.c --- linus-bk-naca.6/arch/ppc64/kernel/pSeries_setup.c 2004-12-31 15:35:13.000000000 +1100 +++ linus-bk-naca.7/arch/ppc64/kernel/pSeries_setup.c 2004-12-31 15:39:01.000000000 +1100 @@ -56,7 +56,6 @@ #include #include #include -#include #include #include @@ -317,7 +316,7 @@ else if (strstr(typep, "ppc-xicp")) ppc64_interrupt_controller = IC_PPC_XIC; else - printk("initialize_naca: failed to recognize" + printk("pSeries_discover_pic: failed to recognize" " interrupt-controller\n"); break; } diff -ruN linus-bk-naca.6/arch/ppc64/kernel/setup.c linus-bk-naca.7/arch/ppc64/kernel/setup.c --- linus-bk-naca.6/arch/ppc64/kernel/setup.c 2004-12-31 16:23:30.000000000 +1100 +++ linus-bk-naca.7/arch/ppc64/kernel/setup.c 2004-12-31 16:25:02.000000000 +1100 @@ -41,7 +41,6 @@ #include #include #include
-#include #include #include #include @@ -113,6 +112,7 @@ int boot_cpuid_phys = 0; dev_t boot_dev; u64 ppc64_pft_size; +u64 ppc64_debug_switch; struct ppc64_caches ppc64_caches; @@ -161,7 +161,7 @@ */ void __init ppcdbg_initialize(void) { - naca->debug_switch = PPC_DEBUG_DEFAULT; /* | PPCDBG_BUSWALK | */ + ppc64_debug_switch = PPC_DEBUG_DEFAULT; /* | PPCDBG_BUSWALK | */ /* PPCDBG_PHBINIT | PPCDBG_MM | PPCDBG_MMINIT | PPCDBG_TCEINIT | PPCDBG_TCE */; } @@ -399,7 +399,7 @@ DBG(" -> early_setup()\n"); /* - * Fill the default DBG level in naca (do we want to keep + * Fill the default DBG level (do we want to keep * that old mecanism around forever ?) */ ppcdbg_initialize(); @@ -453,17 +453,17 @@ /* - * Initialize some remaining members of the naca and systemcfg structures + * Initialize some remaining members of the ppc64_caches and systemcfg structures * (at least until we get rid of them completely). This is mostly some * cache informations about the CPU that will be used by cache flush * routines and/or provided to userland */ -static void __init initialize_naca(void) +static void __init initialize_cache_info(void) { struct device_node *np; unsigned long num_cpus = 0; - DBG(" -> initialize_naca()\n"); + DBG(" -> initialize_cache_info()\n"); for (np = NULL; (np = of_find_node_by_type(np, "cpu"));) { num_cpus += 1; @@ -530,7 +530,7 @@ systemcfg->version.minor = SYSTEMCFG_MINOR; systemcfg->processor = mfspr(SPRN_PVR); - DBG(" <- initialize_naca()\n"); + DBG(" <- initialize_cache_info()\n"); } static void __init check_for_initrd(void) @@ -591,7 +591,7 @@ unflatten_device_tree(); /* - * Fill the naca & systemcfg structures with informations + * Fill the ppc64_caches & systemcfg structures with informations * retreived from the device-tree. Need to be called before * finish_device_tree() since the later requires some of the * informations filled up here to properly parse the interrupt @@ -600,7 +600,7 @@ * routines like flush_icache_range (used by the hash init * later on). 
*/ - initialize_naca(); + initialize_cache_info(); #ifdef CONFIG_PPC_PSERIES /* @@ -661,9 +661,8 @@ printk("Starting Linux PPC64 %s\n", UTS_RELEASE); printk("-----------------------------------------------------\n"); - printk("naca = 0x%p\n", naca); printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); - printk("naca->debug_switch = 0x%lx\n", naca->debug_switch); + printk("ppc64_debug_switch = 0x%lx\n", ppc64_debug_switch); printk("ppc64_interrupt_controller = 0x%ld\n", ppc64_interrupt_controller); printk("systemcfg = 0x%p\n", systemcfg); printk("systemcfg->platform = 0x%x\n", systemcfg->platform); diff -ruN linus-bk-naca.6/arch/ppc64/kernel/udbg.c linus-bk-naca.7/arch/ppc64/kernel/udbg.c --- linus-bk-naca.6/arch/ppc64/kernel/udbg.c 2004-11-22 14:05:02.000000000 +1100 +++ linus-bk-naca.7/arch/ppc64/kernel/udbg.c 2004-12-11 02:31:17.000000000 +1100 @@ -15,7 +15,6 @@ #include #include #include -#include #include #include #include @@ -323,7 +322,7 @@ /* Special print used by PPCDBG() macro */ void udbg_ppcdbg(unsigned long debug_flags, const char *fmt, ...) 
{ - unsigned long active_debugs = debug_flags & naca->debug_switch; + unsigned long active_debugs = debug_flags & ppc64_debug_switch; if (active_debugs) { va_list ap; @@ -357,5 +356,5 @@ unsigned long udbg_ifdebug(unsigned long flags) { - return (flags & naca->debug_switch); + return (flags & ppc64_debug_switch); } diff -ruN linus-bk-naca.6/arch/ppc64/xmon/xmon.c linus-bk-naca.7/arch/ppc64/xmon/xmon.c --- linus-bk-naca.6/arch/ppc64/xmon/xmon.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.7/arch/ppc64/xmon/xmon.c 2004-12-11 02:33:00.000000000 +1100 @@ -26,7 +26,6 @@ #include #include #include -#include #include #include #include @@ -2360,9 +2359,9 @@ if (cmd == '\n') { /* show current state */ unsigned long i; - printf("naca->debug_switch = 0x%lx\n", naca->debug_switch); + printf("ppc64_debug_switch = 0x%lx\n", ppc64_debug_switch); for (i = 0; i < PPCDBG_NUM_FLAGS ;i++) { - on = PPCDBG_BITVAL(i) & naca->debug_switch; + on = PPCDBG_BITVAL(i) & ppc64_debug_switch; printf("%02x %s %12s ", i, on ? "on " : "off", trace_names[i] ? trace_names[i] : ""); if (((i+1) % 3) == 0) printf("\n"); @@ -2376,7 +2375,7 @@ on = (cmd == '+'); cmd = inchar(); if (cmd == ' ' || cmd == '\n') { /* Turn on or off based on + or - */ - naca->debug_switch = on ? PPCDBG_ALL:PPCDBG_NONE; + ppc64_debug_switch = on ? PPCDBG_ALL:PPCDBG_NONE; printf("Setting all values to %s...\n", on ? "on" : "off"); if (cmd == '\n') return; else cmd = skipbl(); @@ -2391,10 +2390,10 @@ return; } if (on) { - naca->debug_switch |= PPCDBG_BITVAL(val); + ppc64_debug_switch |= PPCDBG_BITVAL(val); printf("enable debug %x %s\n", val, trace_names[val] ? trace_names[val] : ""); } else { - naca->debug_switch &= ~PPCDBG_BITVAL(val); + ppc64_debug_switch &= ~PPCDBG_BITVAL(val); printf("disable debug %x %s\n", val, trace_names[val] ? 
trace_names[val] : ""); } cmd = skipbl(); diff -ruN linus-bk-naca.6/include/asm-ppc64/naca.h linus-bk-naca.7/include/asm-ppc64/naca.h --- linus-bk-naca.6/include/asm-ppc64/naca.h 2004-12-11 00:03:55.000000000 +1100 +++ linus-bk-naca.7/include/asm-ppc64/naca.h 2004-12-11 02:41:18.000000000 +1100 @@ -19,9 +19,6 @@ void *xItVpdAreas; /* VPD Data 0x00 */ void *xRamDisk; /* iSeries ramdisk 0x08 */ u64 xRamDiskSize; /* In pages 0x10 */ - u64 debug_switch; /* Debug print control 0x20 */ - u64 banner; /* Ptr to banner string 0x28 */ - u64 log; /* Ptr to log buffer 0x30 */ }; extern struct naca_struct *naca; diff -ruN linus-bk-naca.6/include/asm-ppc64/ppcdebug.h linus-bk-naca.7/include/asm-ppc64/ppcdebug.h --- linus-bk-naca.6/include/asm-ppc64/ppcdebug.h 2004-02-16 08:19:48.000000000 +1100 +++ linus-bk-naca.7/include/asm-ppc64/ppcdebug.h 2004-12-13 12:05:25.000000000 +1100 @@ -16,13 +16,14 @@ ********************************************************************/ #include +#include #include #include #define PPCDBG_BITVAL(X) ((1UL)<<((unsigned long)(X))) /* Defined below are the bit positions of various debug flags in the - * debug_switch variable (defined in naca.h). + * ppc64_debug_switch variable. * -- When adding new values, please enter them into trace names below -- * * Values 62 & 63 can be used to stress the hardware page table management @@ -64,6 +65,8 @@ #define PPCDBG_NUM_FLAGS 64 +extern u64 ppc64_debug_switch; + #ifdef WANT_PPCDBG_TAB /* A table of debug switch names to allow name lookup in xmon * (and whoever else wants it. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/a19eb844/attachment.pgp From sfr at canb.auug.org.au Tue Jan 4 15:34:45 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:34:45 +1100 Subject: [PATCH 8/11] PPC64: remove the naca from all but iSeries In-Reply-To: <20050104153102.67284491.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> <20050104153102.67284491.sfr@canb.auug.org.au> Message-ID: <20050104153445.3777e689.sfr@canb.auug.org.au> Hi Andrew, This patch finally removes the naca from all architectures except legacy iSeries and in the process makes it a structure instead of a pointer. 
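What "a structure instead of a pointer" means for callers can be seen in a small user-space sketch. It rests on a few assumptions:

- The host is LP64.
- Pages are 4 KiB (`SKETCH_PAGE_SIZE` is a made-up name).
- `struct naca_struct` is cut down to the three fields the patch 7 hunk leaves in naca.h.
- `sketch_rd_size()` is a hypothetical helper that mirrors the rd_size check from the iSeries_setup.c hunk.

It is an illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* Reduced copy of struct naca_struct as it stands after patch 7. */
struct naca_struct {
	void *xItVpdAreas;      /* VPD Data          0x00 */
	void *xRamDisk;         /* iSeries ramdisk   0x08 */
	uint64_t xRamDiskSize;  /* In pages          0x10 */
};

/* On an LP64 host the offsets match the comments in naca.h. */
static_assert(offsetof(struct naca_struct, xRamDisk) == 0x08,
	      "xRamDisk offset");
static_assert(offsetof(struct naca_struct, xRamDiskSize) == 0x10,
	      "xRamDiskSize offset");

/* After patch 8 the naca is an object: callers write naca.field, and
 * no startup code has to store an address into a pointer first. */
struct naca_struct naca;

/* Hypothetical helper mirroring the rd_size check in iSeries_setup.c:
 * grow rd_size (in KiB) when it cannot hold the loaded ramdisk. */
unsigned long sketch_rd_size(unsigned long rd_size)
{
	if (naca.xRamDisk &&
	    ((rd_size * 1024) / SKETCH_PAGE_SIZE) < naca.xRamDiskSize)
		rd_size = (naca.xRamDiskSize * SKETCH_PAGE_SIZE) / 1024;
	return rd_size;
}
```

In the real patch, the head.S code that stored NACA_VIRT_ADDR or NACA_PHYS_ADDR into the old pointer is deleted. For iSeries, the `naca` symbol is instead defined directly in head.S.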
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.7/arch/ppc64/kernel/LparData.c linus-bk-naca.8/arch/ppc64/kernel/LparData.c --- linus-bk-naca.7/arch/ppc64/kernel/LparData.c 2004-10-26 16:06:41.000000000 +1000 +++ linus-bk-naca.8/arch/ppc64/kernel/LparData.c 2004-12-11 02:49:48.000000000 +1100 @@ -44,7 +44,7 @@ 0xc8a5d9c4, /* desc = "HvRD" ebcdic */ sizeof(struct HvReleaseData), offsetof(struct naca_struct, xItVpdAreas), - (struct naca_struct *)(NACA_VIRT_ADDR), /* 64-bit Naca address */ + &naca, /* 64-bit Naca address */ 0x6000, /* offset of LparMap within loadarea (see head.S) */ 0, 1, /* tags inactive */ diff -ruN linus-bk-naca.7/arch/ppc64/kernel/head.S linus-bk-naca.8/arch/ppc64/kernel/head.S --- linus-bk-naca.7/arch/ppc64/kernel/head.S 2004-12-10 18:40:24.000000000 +1100 +++ linus-bk-naca.8/arch/ppc64/kernel/head.S 2004-12-11 02:56:12.000000000 +1100 @@ -512,17 +512,15 @@ */ . = NACA_PHYS_ADDR .globl __end_interrupts - .globl __start_naca __end_interrupts: -__start_naca: #ifdef CONFIG_PPC_ISERIES + .globl naca +naca: .llong itVpdAreas #endif . = SYSTEMCFG_PHYS_ADDR - .globl __end_naca .globl __start_systemcfg -__end_naca: __start_systemcfg: . 
= (SYSTEMCFG_PHYS_ADDR + PAGE_SIZE) .globl __end_systemcfg @@ -1270,10 +1268,6 @@ SET_REG_TO_CONST(r4, SYSTEMCFG_VIRT_ADDR) std r4,0(r9) /* set the systemcfg pointer */ - LOADADDR(r9,naca) - SET_REG_TO_CONST(r4, NACA_VIRT_ADDR) - std r4,0(r9) /* set the naca pointer */ - bl .iSeries_early_setup /* relocation is on at this point */ @@ -1873,12 +1867,6 @@ li r27,SYSTEMCFG_PHYS_ADDR std r27,0(r6) /* set the value of systemcfg */ - /* setup the naca pointer which is needed by *tab_initialize */ - LOADADDR(r6,naca) - sub r6,r6,r26 /* addr of the variable naca */ - li r27,NACA_PHYS_ADDR - std r27,0(r6) /* set the value of naca */ - #ifdef CONFIG_HMT /* Start up the second thread on cpu 0 */ mfspr r3,PVR @@ -2015,11 +2003,6 @@ SET_REG_TO_CONST(r8, SYSTEMCFG_VIRT_ADDR) std r8,0(r9) - /* setup the naca pointer */ - LOADADDR(r9,naca) - SET_REG_TO_CONST(r8, NACA_VIRT_ADDR) - std r8,0(r9) /* set the value of the naca ptr */ - LOADADDR(r26, boot_cpuid) lwz r26,0(r26) diff -ruN linus-bk-naca.7/arch/ppc64/kernel/iSeries_setup.c linus-bk-naca.8/arch/ppc64/kernel/iSeries_setup.c --- linus-bk-naca.7/arch/ppc64/kernel/iSeries_setup.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.8/arch/ppc64/kernel/iSeries_setup.c 2004-12-11 02:51:17.000000000 +1100 @@ -314,13 +314,13 @@ * If the init RAM disk has been configured and there is * a non-zero starting address for it, set it up */ - if (naca->xRamDisk) { - initrd_start = (unsigned long)__va(naca->xRamDisk); - initrd_end = initrd_start + naca->xRamDiskSize * PAGE_SIZE; + if (naca.xRamDisk) { + initrd_start = (unsigned long)__va(naca.xRamDisk); + initrd_end = initrd_start + naca.xRamDiskSize * PAGE_SIZE; initrd_below_start_ok = 1; // ramdisk in kernel space ROOT_DEV = Root_RAM0; - if (((rd_size * 1024) / PAGE_SIZE) < naca->xRamDiskSize) - rd_size = (naca->xRamDiskSize * PAGE_SIZE) / 1024; + if (((rd_size * 1024) / PAGE_SIZE) < naca.xRamDiskSize) + rd_size = (naca.xRamDiskSize * PAGE_SIZE) / 1024; } else #endif /* 
CONFIG_BLK_DEV_INITRD */ { @@ -813,9 +813,9 @@ * Change klimit to take into account any ram disk * that may be included */ - if (naca->xRamDisk) - klimit = KERNELBASE + (u64)naca->xRamDisk + - (naca->xRamDiskSize * PAGE_SIZE); + if (naca.xRamDisk) + klimit = KERNELBASE + (u64)naca.xRamDisk + + (naca.xRamDiskSize * PAGE_SIZE); else { /* * No ram disk was included - check and see if there diff -ruN linus-bk-naca.7/arch/ppc64/kernel/pacaData.c linus-bk-naca.8/arch/ppc64/kernel/pacaData.c --- linus-bk-naca.7/arch/ppc64/kernel/pacaData.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.8/arch/ppc64/kernel/pacaData.c 2004-12-11 02:50:23.000000000 +1100 @@ -18,11 +18,8 @@ #include #include -#include #include -struct naca_struct *naca; -EXPORT_SYMBOL(naca); struct systemcfg *systemcfg; EXPORT_SYMBOL(systemcfg); diff -ruN linus-bk-naca.7/include/asm-ppc64/iSeries/HvReleaseData.h linus-bk-naca.8/include/asm-ppc64/iSeries/HvReleaseData.h --- linus-bk-naca.7/include/asm-ppc64/iSeries/HvReleaseData.h 2004-01-20 08:20:26.000000000 +1100 +++ linus-bk-naca.8/include/asm-ppc64/iSeries/HvReleaseData.h 2004-12-11 02:52:05.000000000 +1100 @@ -26,6 +26,7 @@ // address of the OS's NACA). 
// #include +#include //============================================================================= // diff -ruN linus-bk-naca.7/include/asm-ppc64/naca.h linus-bk-naca.8/include/asm-ppc64/naca.h --- linus-bk-naca.7/include/asm-ppc64/naca.h 2004-12-11 02:41:18.000000000 +1100 +++ linus-bk-naca.8/include/asm-ppc64/naca.h 2004-12-11 02:54:02.000000000 +1100 @@ -21,12 +21,11 @@ u64 xRamDiskSize; /* In pages 0x10 */ }; -extern struct naca_struct *naca; +extern struct naca_struct naca; #endif /* __ASSEMBLY__ */ #define NACA_PAGE 0x4 #define NACA_PHYS_ADDR (NACA_PAGE< References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> <20050104153102.67284491.sfr@canb.auug.org.au> <20050104153445.3777e689.sfr@canb.auug.org.au> Message-ID: <20050104153740.56622b4f.sfr@canb.auug.org.au> Hi Andrew, This fixes an awful piece of code that could have just referenced xPMCRegsInUse in the lppaca structure.
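The magic offset 0xBB in the removed sysfs.c code is where xPMCRegsInUse lives. The ItLpPaca comments put it at x3B within CACHE_LINE_2, and that cache line starts at 0x80, so 0x80 + 0x3B = 0xBB.

The C11 sketch below checks that arithmetic against a hypothetical, reduced mirror of the structure:

- The first cache line is collapsed into padding.
- `saved_srr0`, `saved_srr1`, `fpregs_in_use` and `pmcregs_in_use` use the patch 11 renames. The other field names are made up in the same style.
- It assumes the natural alignment an LP64 compiler gives these types. It is not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced mirror of struct lppaca (ItLpPaca before patch 11). */
struct lppaca_sketch {
	uint8_t  line1[0x80];             /* CACHE_LINE_1        0x00-0x7F */
	uint64_t any_int;                 /* interrupt dword     0x80 */
	uint64_t plic_defer_ints_area;    /*                     0x88 */
	uint64_t saved_srr0;              /*                     0x90 */
	uint64_t saved_srr1;              /*                     0x98 */
	uint64_t saved_gpr3;              /*                     0xA0 */
	uint64_t saved_gpr4;              /*                     0xA8 */
	uint64_t saved_gpr5;              /*                     0xB0 */
	uint8_t  rsvd2_1;                 /*                     0xB8 */
	uint8_t  cpuctls_task_attributes; /*                     0xB9 */
	uint8_t  fpregs_in_use;           /*                     0xBA */
	uint8_t  pmcregs_in_use;          /* x3B in line 2 =     0xBB */
};

static_assert(offsetof(struct lppaca_sketch, pmcregs_in_use) == 0xBB,
	      "PMC flag must sit at byte 0xBB");

/* Old style: poke a raw byte offset into the structure. */
void sketch_set_pmc_by_offset(struct lppaca_sketch *lp)
{
	char *ptr = (char *)lp;
	ptr[0xBB] = 1;
}

/* New style: name the field and let the compiler find it. */
void sketch_set_pmc_by_field(struct lppaca_sketch *lp)
{
	lp->pmcregs_in_use = 1;
}
```

Both helpers write the same byte. Only the second keeps working, and stays readable, if the layout ever changes.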
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.8/arch/ppc64/kernel/sysfs.c linus-bk-naca.9/arch/ppc64/kernel/sysfs.c --- linus-bk-naca.8/arch/ppc64/kernel/sysfs.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.9/arch/ppc64/kernel/sysfs.c 2004-12-13 14:49:37.000000000 +1100 @@ -14,6 +14,8 @@ #include #include #include +#include +#include /* SMT stuff */ @@ -154,10 +156,8 @@ #ifdef CONFIG_PPC_PSERIES /* instruct hypervisor to maintain PMCs */ - if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) { - char *ptr = (char *)&paca[smp_processor_id()].lppaca; - ptr[0xBB] = 1; - } + if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) + get_paca()->lppaca.xPMCRegsInUse = 1; /* * On SMT machines we have to set the run latch in the ctrl register -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/ea16f4d5/attachment.pgp From sfr at canb.auug.org.au Tue Jan 4 15:40:25 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:40:25 +1100 Subject: [PATCH 10/11] PPC64: move the lppaca defining header file In-Reply-To: <20050104153740.56622b4f.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> <20050104153102.67284491.sfr@canb.auug.org.au> <20050104153445.3777e689.sfr@canb.auug.org.au> <20050104153740.56622b4f.sfr@canb.auug.org.au> Message-ID: <20050104154025.63a1b9fb.sfr@canb.auug.org.au> Hi Andrew, This patch just renames asm/iSeries/ItLpPaca.h to asm/lppaca.h as 
the lppaca structure is no longer just legacy iSeries specific. Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.9/arch/ppc64/kernel/LparData.c linus-bk-naca.10/arch/ppc64/kernel/LparData.c --- linus-bk-naca.9/arch/ppc64/kernel/LparData.c 2004-12-11 02:49:48.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/LparData.c 2004-12-13 15:01:55.000000000 +1100 @@ -16,7 +16,7 @@ #include #include #include -#include +#include #include #include #include diff -ruN linus-bk-naca.9/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c --- linus-bk-naca.9/arch/ppc64/kernel/asm-offsets.c 2004-12-10 17:27:14.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c 2004-12-13 15:02:03.000000000 +1100 @@ -29,7 +29,7 @@ #include #include -#include +#include #include #include #include diff -ruN linus-bk-naca.9/arch/ppc64/kernel/iSeries_proc.c linus-bk-naca.10/arch/ppc64/kernel/iSeries_proc.c --- linus-bk-naca.9/arch/ppc64/kernel/iSeries_proc.c 2004-12-10 16:26:54.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/iSeries_proc.c 2004-12-13 15:02:14.000000000 +1100 @@ -24,7 +24,7 @@ #include #include #include -#include +#include #include #include #include diff -ruN linus-bk-naca.9/arch/ppc64/kernel/lparcfg.c linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c --- linus-bk-naca.9/arch/ppc64/kernel/lparcfg.c 2004-11-20 12:05:26.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c 2004-12-13 15:02:29.000000000 +1100 @@ -27,7 +27,7 @@ #include #include #include -#include +#include #include #include #include diff -ruN linus-bk-naca.9/arch/ppc64/kernel/pacaData.c linus-bk-naca.10/arch/ppc64/kernel/pacaData.c --- linus-bk-naca.9/arch/ppc64/kernel/pacaData.c 2004-12-11 02:50:23.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/pacaData.c 2004-12-13 15:02:07.000000000 +1100 @@ -16,7 +16,7 @@ #include #include -#include +#include #include #include 
diff -ruN linus-bk-naca.9/arch/ppc64/kernel/sysfs.c linus-bk-naca.10/arch/ppc64/kernel/sysfs.c --- linus-bk-naca.9/arch/ppc64/kernel/sysfs.c 2004-12-13 14:49:37.000000000 +1100 +++ linus-bk-naca.10/arch/ppc64/kernel/sysfs.c 2004-12-13 15:01:19.000000000 +1100 @@ -15,7 +15,7 @@ #include #include #include -#include +#include /* SMT stuff */ diff -ruN linus-bk-naca.9/include/asm-ppc64/iSeries/ItLpPaca.h linus-bk-naca.10/include/asm-ppc64/iSeries/ItLpPaca.h --- linus-bk-naca.9/include/asm-ppc64/iSeries/ItLpPaca.h 2004-01-20 08:20:26.000000000 +1100 +++ linus-bk-naca.10/include/asm-ppc64/iSeries/ItLpPaca.h 1970-01-01 10:00:00.000000000 +1000 @@ -1,134 +0,0 @@ -/* - * ItLpPaca.h - * Copyright (C) 2001 Mike Corrigan IBM Corporation - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ -#ifndef _ITLPPACA_H -#define _ITLPPACA_H - -//============================================================================= -// -// This control block contains the data that is shared between the -// hypervisor (PLIC) and the OS. 
-// -// -//---------------------------------------------------------------------------- -#include - -struct ItLpPaca -{ -//============================================================================= -// CACHE_LINE_1 0x0000 - 0x007F Contains read-only data -// NOTE: The xDynXyz fields are fields that will be dynamically changed by -// PLIC when preparing to bring a processor online or when dispatching a -// virtual processor! -//============================================================================= - u32 xDesc; // Eye catcher 0xD397D781 x00-x03 - u16 xSize; // Size of this struct x04-x05 - u16 xRsvd1_0; // Reserved x06-x07 - u16 xRsvd1_1:14; // Reserved x08-x09 - u8 xSharedProc:1; // Shared processor indicator ... - u8 xSecondaryThread:1; // Secondary thread indicator ... - volatile u8 xDynProcStatus:8; // Dynamic Status of this proc x0A-x0A - u8 xSecondaryThreadCnt; // Secondary thread count x0B-x0B - volatile u16 xDynHvPhysicalProcIndex;// Dynamic HV Physical Proc Index0C-x0D - volatile u16 xDynHvLogicalProcIndex;// Dynamic HV Logical Proc Indexx0E-x0F - u32 xDecrVal; // Value for Decr programming x10-x13 - u32 xPMCVal; // Value for PMC regs x14-x17 - volatile u32 xDynHwNodeId; // Dynamic Hardware Node id x18-x1B - volatile u32 xDynHwProcId; // Dynamic Hardware Proc Id x1C-x1F - volatile u32 xDynPIR; // Dynamic ProcIdReg value x20-x23 - u32 xDseiData; // DSEI data x24-x27 - u64 xSPRG3; // SPRG3 value x28-x2F - u8 xRsvd1_3[80]; // Reserved x30-x7F - -//============================================================================= -// CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data -//============================================================================= - // This Dword contains a byte for each type of interrupt that can occur. - // The IPI is a count while the others are just a binary 1 or 0. 
- union { - u64 xAnyInt; - struct { - u16 xRsvd; // Reserved - cleared by #mpasmbl - u8 xXirrInt; // Indicates xXirrValue is valid or Immed IO - u8 xIpiCnt; // IPI Count - u8 xDecrInt; // DECR interrupt occurred - u8 xPdcInt; // PDC interrupt occurred - u8 xQuantumInt; // Interrupt quantum reached - u8 xOldPlicDeferredExtInt; // Old PLIC has a deferred XIRR pending - } xFields; - } xIntDword; - - // Whenever any fields in this Dword are set then PLIC will defer the - // processing of external interrupts. Note that PLIC will store the - // XIRR directly into the xXirrValue field so that another XIRR will - // not be presented until this one clears. The layout of the low - // 4-bytes of this Dword is upto SLIC - PLIC just checks whether the - // entire Dword is zero or not. A non-zero value in the low order - // 2-bytes will result in SLIC being granted the highest thread - // priority upon return. A 0 will return to SLIC as medium priority. - u64 xPlicDeferIntsArea; // Entire Dword - - // Used to pass the real SRR0/1 from PLIC to SLIC as well as to - // pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid. 
- u64 xSavedSrr0; // Saved SRR0 x10-x17 - u64 xSavedSrr1; // Saved SRR1 x18-x1F - - // Used to pass parms from the OS to PLIC for SetAsrAndRfid - u64 xSavedGpr3; // Saved GPR3 x20-x27 - u64 xSavedGpr4; // Saved GPR4 x28-x2F - u64 xSavedGpr5; // Saved GPR5 x30-x37 - - u8 xRsvd2_1; // Reserved x38-x38 - u8 xCpuCtlsTaskAttributes; // Task attributes for cpuctls x39-x39 - u8 xFPRegsInUse; // FP regs in use x3A-x3A - u8 xPMCRegsInUse; // PMC regs in use x3B-x3B - volatile u32 xSavedDecr; // Saved Decr Value x3C-x3F - volatile u64 xEmulatedTimeBase;// Emulated TB for this thread x40-x47 - volatile u64 xCurPLICLatency; // Unaccounted PLIC latency x48-x4F - u64 xTotPLICLatency; // Accumulated PLIC latency x50-x57 - u64 xWaitStateCycles; // Wait cycles for this proc x58-x5F - u64 xEndOfQuantum; // TB at end of quantum x60-x67 - u64 xPDCSavedSPRG1; // Saved SPRG1 for PMC int x68-x6F - u64 xPDCSavedSRR0; // Saved SRR0 for PMC int x70-x77 - volatile u32 xVirtualDecr; // Virtual DECR for shared procsx78-x7B - u16 xSLBCount; // # of SLBs to maintain x7C-x7D - u8 xIdle; // Indicate OS is idle x7E - u8 xRsvd2_2; // Reserved x7F - - -//============================================================================= -// CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors -//============================================================================= - // This is the xYieldCount. An "odd" value (low bit on) means that - // the processor is yielded (either because of an OS yield or a PLIC - // preempt). An even value implies that the processor is currently - // executing. - // NOTE: This value will ALWAYS be zero for dedicated processors and - // will NEVER be zero for shared processors (ie, initialized to a 1). 
- volatile u32 xYieldCount; // PLIC increments each dispatchx00-x03 - u8 xRsvd3_0[124]; // Reserved x04-x7F - -//============================================================================= -// CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data -//============================================================================= - u8 xPmcSaveArea[256]; // PMC interrupt Area x00-xFF - - -}; - -#endif /* _ITLPPACA_H */ diff -ruN linus-bk-naca.9/include/asm-ppc64/iSeries/LparData.h linus-bk-naca.10/include/asm-ppc64/iSeries/LparData.h --- linus-bk-naca.9/include/asm-ppc64/iSeries/LparData.h 2004-12-10 16:26:54.000000000 +1100 +++ linus-bk-naca.10/include/asm-ppc64/iSeries/LparData.h 2004-12-13 15:03:03.000000000 +1100 @@ -25,7 +25,6 @@ #include #include -#include #include #include #include diff -ruN linus-bk-naca.9/include/asm-ppc64/lppaca.h linus-bk-naca.10/include/asm-ppc64/lppaca.h --- linus-bk-naca.9/include/asm-ppc64/lppaca.h 1970-01-01 10:00:00.000000000 +1000 +++ linus-bk-naca.10/include/asm-ppc64/lppaca.h 2004-12-13 15:04:43.000000000 +1100 @@ -0,0 +1,134 @@ +/* + * lppaca.h + * Copyright (C) 2001 Mike Corrigan IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +#ifndef _ASM_LPPACA_H +#define _ASM_LPPACA_H + +//============================================================================= +// +// This control block contains the data that is shared between the +// hypervisor (PLIC) and the OS. +// +// +//---------------------------------------------------------------------------- +#include + +struct ItLpPaca +{ +//============================================================================= +// CACHE_LINE_1 0x0000 - 0x007F Contains read-only data +// NOTE: The xDynXyz fields are fields that will be dynamically changed by +// PLIC when preparing to bring a processor online or when dispatching a +// virtual processor! +//============================================================================= + u32 xDesc; // Eye catcher 0xD397D781 x00-x03 + u16 xSize; // Size of this struct x04-x05 + u16 xRsvd1_0; // Reserved x06-x07 + u16 xRsvd1_1:14; // Reserved x08-x09 + u8 xSharedProc:1; // Shared processor indicator ... + u8 xSecondaryThread:1; // Secondary thread indicator ... 
+ volatile u8 xDynProcStatus:8; // Dynamic Status of this proc x0A-x0A + u8 xSecondaryThreadCnt; // Secondary thread count x0B-x0B + volatile u16 xDynHvPhysicalProcIndex;// Dynamic HV Physical Proc Index0C-x0D + volatile u16 xDynHvLogicalProcIndex;// Dynamic HV Logical Proc Indexx0E-x0F + u32 xDecrVal; // Value for Decr programming x10-x13 + u32 xPMCVal; // Value for PMC regs x14-x17 + volatile u32 xDynHwNodeId; // Dynamic Hardware Node id x18-x1B + volatile u32 xDynHwProcId; // Dynamic Hardware Proc Id x1C-x1F + volatile u32 xDynPIR; // Dynamic ProcIdReg value x20-x23 + u32 xDseiData; // DSEI data x24-x27 + u64 xSPRG3; // SPRG3 value x28-x2F + u8 xRsvd1_3[80]; // Reserved x30-x7F + +//============================================================================= +// CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data +//============================================================================= + // This Dword contains a byte for each type of interrupt that can occur. + // The IPI is a count while the others are just a binary 1 or 0. + union { + u64 xAnyInt; + struct { + u16 xRsvd; // Reserved - cleared by #mpasmbl + u8 xXirrInt; // Indicates xXirrValue is valid or Immed IO + u8 xIpiCnt; // IPI Count + u8 xDecrInt; // DECR interrupt occurred + u8 xPdcInt; // PDC interrupt occurred + u8 xQuantumInt; // Interrupt quantum reached + u8 xOldPlicDeferredExtInt; // Old PLIC has a deferred XIRR pending + } xFields; + } xIntDword; + + // Whenever any fields in this Dword are set then PLIC will defer the + // processing of external interrupts. Note that PLIC will store the + // XIRR directly into the xXirrValue field so that another XIRR will + // not be presented until this one clears. The layout of the low + // 4-bytes of this Dword is upto SLIC - PLIC just checks whether the + // entire Dword is zero or not. A non-zero value in the low order + // 2-bytes will result in SLIC being granted the highest thread + // priority upon return. 
A 0 will return to SLIC as medium priority. + u64 xPlicDeferIntsArea; // Entire Dword + + // Used to pass the real SRR0/1 from PLIC to SLIC as well as to + // pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid. + u64 xSavedSrr0; // Saved SRR0 x10-x17 + u64 xSavedSrr1; // Saved SRR1 x18-x1F + + // Used to pass parms from the OS to PLIC for SetAsrAndRfid + u64 xSavedGpr3; // Saved GPR3 x20-x27 + u64 xSavedGpr4; // Saved GPR4 x28-x2F + u64 xSavedGpr5; // Saved GPR5 x30-x37 + + u8 xRsvd2_1; // Reserved x38-x38 + u8 xCpuCtlsTaskAttributes; // Task attributes for cpuctls x39-x39 + u8 xFPRegsInUse; // FP regs in use x3A-x3A + u8 xPMCRegsInUse; // PMC regs in use x3B-x3B + volatile u32 xSavedDecr; // Saved Decr Value x3C-x3F + volatile u64 xEmulatedTimeBase;// Emulated TB for this thread x40-x47 + volatile u64 xCurPLICLatency; // Unaccounted PLIC latency x48-x4F + u64 xTotPLICLatency; // Accumulated PLIC latency x50-x57 + u64 xWaitStateCycles; // Wait cycles for this proc x58-x5F + u64 xEndOfQuantum; // TB at end of quantum x60-x67 + u64 xPDCSavedSPRG1; // Saved SPRG1 for PMC int x68-x6F + u64 xPDCSavedSRR0; // Saved SRR0 for PMC int x70-x77 + volatile u32 xVirtualDecr; // Virtual DECR for shared procsx78-x7B + u16 xSLBCount; // # of SLBs to maintain x7C-x7D + u8 xIdle; // Indicate OS is idle x7E + u8 xRsvd2_2; // Reserved x7F + + +//============================================================================= +// CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors +//============================================================================= + // This is the xYieldCount. An "odd" value (low bit on) means that + // the processor is yielded (either because of an OS yield or a PLIC + // preempt). An even value implies that the processor is currently + // executing. + // NOTE: This value will ALWAYS be zero for dedicated processors and + // will NEVER be zero for shared processors (ie, initialized to a 1). 
+ volatile u32 xYieldCount; // PLIC increments each dispatchx00-x03 + u8 xRsvd3_0[124]; // Reserved x04-x7F + +//============================================================================= +// CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data +//============================================================================= + u8 xPmcSaveArea[256]; // PMC interrupt Area x00-xFF + + +}; + +#endif /* _ASM_LPPACA_H */ diff -ruN linus-bk-naca.9/include/asm-ppc64/paca.h linus-bk-naca.10/include/asm-ppc64/paca.h --- linus-bk-naca.9/include/asm-ppc64/paca.h 2004-12-13 18:05:08.000000000 +1100 +++ linus-bk-naca.10/include/asm-ppc64/paca.h 2004-12-31 15:48:57.000000000 +1100 @@ -18,7 +18,7 @@ #include #include -#include +#include #include #include -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050104/75de8b11/attachment.pgp From sfr at canb.auug.org.au Tue Jan 4 15:43:19 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 15:43:19 +1100 Subject: [PATCH 11/11] PPC64: remove StudlyCaps from lppaca structure In-Reply-To: <20050104154025.63a1b9fb.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> <20050104153102.67284491.sfr@canb.auug.org.au> <20050104153445.3777e689.sfr@canb.auug.org.au> <20050104153740.56622b4f.sfr@canb.auug.org.au> <20050104154025.63a1b9fb.sfr@canb.auug.org.au> Message-ID: <20050104154319.505b1197.sfr@canb.auug.org.au> Hi Andrew, This patch just renames all the fields (and the structure name) of the lppaca structure to rid us of some more 
StudlyCaps. Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.11/arch/ppc64/kernel/asm-offsets.c --- linus-bk-naca.10/arch/ppc64/kernel/asm-offsets.c 2004-12-13 15:02:03.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/asm-offsets.c 2004-12-31 15:51:16.000000000 +1100 @@ -102,10 +102,10 @@ DEFINE(PACAEMERGSP, offsetof(struct paca_struct, emergency_sp)); DEFINE(PACALPPACA, offsetof(struct paca_struct, lppaca)); DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id)); - DEFINE(LPPACASRR0, offsetof(struct ItLpPaca, xSavedSrr0)); - DEFINE(LPPACASRR1, offsetof(struct ItLpPaca, xSavedSrr1)); - DEFINE(LPPACAANYINT, offsetof(struct ItLpPaca, xIntDword.xAnyInt)); - DEFINE(LPPACADECRINT, offsetof(struct ItLpPaca, xIntDword.xFields.xDecrInt)); + DEFINE(LPPACASRR0, offsetof(struct lppaca, saved_srr0)); + DEFINE(LPPACASRR1, offsetof(struct lppaca, saved_srr1)); + DEFINE(LPPACAANYINT, offsetof(struct lppaca, int_dword.any_int)); + DEFINE(LPPACADECRINT, offsetof(struct lppaca, int_dword.fields.decr_int)); /* RTAS */ DEFINE(RTASBASE, offsetof(struct rtas_t, base)); diff -ruN linus-bk-naca.10/arch/ppc64/kernel/iSeries_setup.c linus-bk-naca.11/arch/ppc64/kernel/iSeries_setup.c --- linus-bk-naca.10/arch/ppc64/kernel/iSeries_setup.c 2004-12-11 02:51:17.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/iSeries_setup.c 2004-12-13 15:31:14.000000000 +1100 @@ -559,7 +559,7 @@ static void __init setup_iSeries_cache_sizes(void) { unsigned int i, n; - unsigned int procIx = get_paca()->lppaca.xDynHvPhysicalProcIndex; + unsigned int procIx = get_paca()->lppaca.dyn_hv_phys_proc_index; systemcfg->icache_size = ppc64_caches.isize = xIoHriProcessorVpd[procIx].xInstCacheSize * 1024; @@ -656,7 +656,7 @@ void __init iSeries_setup_arch(void) { void *eventStack; - unsigned procIx = get_paca()->lppaca.xDynHvPhysicalProcIndex; + unsigned procIx =
get_paca()->lppaca.dyn_hv_phys_proc_index; /* Add an eye catcher and the systemcfg layout version number */ strcpy(systemcfg->eye_catcher, "SYSTEMCFG:PPC64"); diff -ruN linus-bk-naca.10/arch/ppc64/kernel/iSeries_smp.c linus-bk-naca.11/arch/ppc64/kernel/iSeries_smp.c --- linus-bk-naca.10/arch/ppc64/kernel/iSeries_smp.c 2004-12-10 16:26:54.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/iSeries_smp.c 2004-12-13 15:29:16.000000000 +1100 @@ -90,7 +90,7 @@ np = 0; for (i=0; i < NR_CPUS; ++i) { - if (paca[i].lppaca.xDynProcStatus < 2) { + if (paca[i].lppaca.dyn_proc_status < 2) { cpu_set(i, cpu_possible_map); cpu_set(i, cpu_present_map); cpu_set(i, cpu_sibling_map[i]); @@ -106,7 +106,7 @@ unsigned np = 0; for (i=0; i < NR_CPUS; ++i) { - if (paca[i].lppaca.xDynProcStatus < 2) { + if (paca[i].lppaca.dyn_proc_status < 2) { /*paca[i].active = 1;*/ ++np; } @@ -120,7 +120,7 @@ BUG_ON(nr < 0 || nr >= NR_CPUS); /* Verify that our partition has a processor nr */ - if (paca[nr].lppaca.xDynProcStatus >= 2) + if (paca[nr].lppaca.dyn_proc_status >= 2) return; /* The processor is currently spinning, waiting diff -ruN linus-bk-naca.10/arch/ppc64/kernel/idle.c linus-bk-naca.11/arch/ppc64/kernel/idle.c --- linus-bk-naca.10/arch/ppc64/kernel/idle.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/idle.c 2004-12-13 16:06:18.000000000 +1100 @@ -67,7 +67,7 @@ * The decrementer stops during the yield. Force a fake decrementer * here and let the timer_interrupt code sort out the actual time. */ - get_paca()->lppaca.xIntDword.xFields.xDecrInt = 1; + get_paca()->lppaca.int_dword.fields.decr_int = 1; process_iSeries_events(); } @@ -86,7 +86,7 @@ lpaca = get_paca(); while (1) { - if (lpaca->lppaca.xSharedProc) { + if (lpaca->lppaca.shared_proc) { if (ItLpQueue_isLpIntPending(lpaca->lpqueue_ptr)) process_iSeries_events(); if (!need_resched()) @@ -173,7 +173,7 @@ * Indicate to the HV that we are idle. Now would be * a good time to find other work to dispatch. 
*/ - lpaca->lppaca.xIdle = 1; + lpaca->lppaca.idle = 1; oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED); if (!oldval) { @@ -194,7 +194,7 @@ HMT_medium(); - if (!(ppaca->lppaca.xIdle)) { + if (!(ppaca->lppaca.idle)) { local_irq_disable(); /* @@ -233,7 +233,7 @@ } HMT_medium(); - lpaca->lppaca.xIdle = 0; + lpaca->lppaca.idle = 0; schedule(); if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING) cpu_die(); @@ -251,7 +251,7 @@ * Indicate to the HV that we are idle. Now would be * a good time to find other work to dispatch. */ - lpaca->lppaca.xIdle = 1; + lpaca->lppaca.idle = 1; while (!need_resched() && !cpu_is_offline(cpu)) { local_irq_disable(); @@ -273,7 +273,7 @@ } HMT_medium(); - lpaca->lppaca.xIdle = 0; + lpaca->lppaca.idle = 0; schedule(); if (cpu_is_offline(smp_processor_id()) && system_state == SYSTEM_RUNNING) @@ -352,7 +352,7 @@ #ifdef CONFIG_PPC_PSERIES if (systemcfg->platform & PLATFORM_PSERIES) { if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) { - if (get_paca()->lppaca.xSharedProc) { + if (get_paca()->lppaca.shared_proc) { printk(KERN_INFO "Using shared processor idle loop\n"); idle_loop = shared_idle; } else { diff -ruN linus-bk-naca.10/arch/ppc64/kernel/irq.c linus-bk-naca.11/arch/ppc64/kernel/irq.c --- linus-bk-naca.10/arch/ppc64/kernel/irq.c 2004-12-31 14:53:21.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/irq.c 2004-12-13 15:43:22.000000000 +1100 @@ -259,8 +259,8 @@ lpaca = get_paca(); #ifdef CONFIG_SMP - if (lpaca->lppaca.xIntDword.xFields.xIpiCnt) { - lpaca->lppaca.xIntDword.xFields.xIpiCnt = 0; + if (lpaca->lppaca.int_dword.fields.ipi_cnt) { + lpaca->lppaca.int_dword.fields.ipi_cnt = 0; iSeries_smp_message_recv(regs); } #endif /* CONFIG_SMP */ @@ -270,8 +270,8 @@ irq_exit(); - if (lpaca->lppaca.xIntDword.xFields.xDecrInt) { - lpaca->lppaca.xIntDword.xFields.xDecrInt = 0; + if (lpaca->lppaca.int_dword.fields.decr_int) { + lpaca->lppaca.int_dword.fields.decr_int = 0; /* Signal a fake decrementer interrupt */ 
timer_interrupt(regs); } diff -ruN linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c linus-bk-naca.11/arch/ppc64/kernel/lparcfg.c --- linus-bk-naca.10/arch/ppc64/kernel/lparcfg.c 2004-12-13 15:02:29.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/lparcfg.c 2004-12-13 16:00:00.000000000 +1100 @@ -72,7 +72,7 @@ /* * For iSeries legacy systems, the PPA purr function is available from the - * xEmulatedTimeBase field in the paca. + * emulated_time_base field in the paca. */ static unsigned long get_purr(void) { @@ -82,11 +82,11 @@ for_each_cpu(cpu) { lpaca = paca + cpu; - sum_purr += lpaca->lppaca.xEmulatedTimeBase; + sum_purr += lpaca->lppaca.emulated_time_base; #ifdef PURR_DEBUG printk(KERN_INFO "get_purr for cpu (%d) has value (%ld) \n", - cpu, lpaca->lppaca.xEmulatedTimeBase); + cpu, lpaca->lppaca.emulated_time_base); #endif } return sum_purr; @@ -107,7 +107,7 @@ seq_printf(m, "%s %s \n", MODULE_NAME, MODULE_VERS); - shared = (int)(lpaca->lppaca_ptr->xSharedProc); + shared = (int)(lpaca->lppaca_ptr->shared_proc); seq_printf(m, "serial_number=%c%c%c%c%c%c%c\n", e2a(xItExtVpdPanel.mfgID[2]), e2a(xItExtVpdPanel.mfgID[3]), @@ -395,7 +395,7 @@ (h_resource >> 0 * 8) & 0xffff); /* pool related entries are apropriate for shared configs */ - if (paca[0].lppaca.xSharedProc) { + if (paca[0].lppaca.shared_proc) { h_pic(&pool_idle_time, &pool_procs); @@ -444,7 +444,7 @@ seq_printf(m, "partition_potential_processors=%d\n", partition_potential_processors); - seq_printf(m, "shared_processor_mode=%d\n", paca[0].lppaca.xSharedProc); + seq_printf(m, "shared_processor_mode=%d\n", paca[0].lppaca.shared_proc); return 0; } diff -ruN linus-bk-naca.10/arch/ppc64/kernel/pacaData.c linus-bk-naca.11/arch/ppc64/kernel/pacaData.c --- linus-bk-naca.10/arch/ppc64/kernel/pacaData.c 2004-12-13 15:02:07.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/pacaData.c 2004-12-13 16:05:34.000000000 +1100 @@ -28,7 +28,7 @@ extern unsigned long __toc_start; /* The Paca is an array with one entry per 
processor. Each contains an - * ItLpPaca, which contains the information shared between the + * lppaca, which contains the information shared between the * hypervisor and Linux. Each also contains an ItLpRegSave area which * is used by the hypervisor to save registers. * On systems with hardware multi-threading, there are two threads @@ -61,13 +61,13 @@ .cpu_start = (start), /* Processor start */ \ .hw_cpu_id = 0xffff, \ .lppaca = { \ - .xDesc = 0xd397d781, /* "LpPa" */ \ - .xSize = sizeof(struct ItLpPaca), \ - .xFPRegsInUse = 1, \ - .xDynProcStatus = 2, \ - .xDecrVal = 0x00ff0000, \ - .xEndOfQuantum = 0xfffffffffffffffful, \ - .xSLBCount = 64, \ + .desc = 0xd397d781, /* "LpPa" */ \ + .size = sizeof(struct lppaca), \ + .dyn_proc_status = 2, \ + .decr_val = 0x00ff0000, \ + .fpregs_in_use = 1, \ + .end_of_quantum = 0xfffffffffffffffful, \ + .slb_count = 64, \ }, \ EXTRA_INITS((number), (lpq)) \ } diff -ruN linus-bk-naca.10/arch/ppc64/kernel/sysfs.c linus-bk-naca.11/arch/ppc64/kernel/sysfs.c --- linus-bk-naca.10/arch/ppc64/kernel/sysfs.c 2004-12-13 15:01:19.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/sysfs.c 2004-12-13 15:58:30.000000000 +1100 @@ -157,7 +157,7 @@ #ifdef CONFIG_PPC_PSERIES /* instruct hypervisor to maintain PMCs */ if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) - get_paca()->lppaca.xPMCRegsInUse = 1; + get_paca()->lppaca.pmcregs_in_use = 1; /* * On SMT machines we have to set the run latch in the ctrl register diff -ruN linus-bk-naca.10/arch/ppc64/kernel/time.c linus-bk-naca.11/arch/ppc64/kernel/time.c --- linus-bk-naca.10/arch/ppc64/kernel/time.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/kernel/time.c 2004-12-13 15:43:28.000000000 +1100 @@ -230,7 +230,7 @@ /* * For iSeries shared processors, we have to let the hypervisor * set the hardware decrementer. 
We set a virtual decrementer - * in the ItLpPaca and call the hypervisor if the virtual + * in the lppaca and call the hypervisor if the virtual * decrementer is less than the current value in the hardware * decrementer. (almost always the new decrementer value will * be greater than the current hardware decementer so the hypervisor @@ -256,7 +256,7 @@ profile_tick(CPU_PROFILING, regs); #endif - lpaca->lppaca.xIntDword.xFields.xDecrInt = 0; + lpaca->lppaca.int_dword.fields.decr_int = 0; while (lpaca->next_jiffy_update_tb <= (cur_tb = get_tb())) { diff -ruN linus-bk-naca.10/arch/ppc64/lib/locks.c linus-bk-naca.11/arch/ppc64/lib/locks.c --- linus-bk-naca.10/arch/ppc64/lib/locks.c 2004-09-16 21:51:57.000000000 +1000 +++ linus-bk-naca.11/arch/ppc64/lib/locks.c 2004-12-13 16:08:05.000000000 +1100 @@ -34,7 +34,7 @@ holder_cpu = lock_value & 0xffff; BUG_ON(holder_cpu >= NR_CPUS); holder_paca = &paca[holder_cpu]; - yield_count = holder_paca->lppaca.xYieldCount; + yield_count = holder_paca->lppaca.yield_count; if ((yield_count & 1) == 0) return; /* virtual cpu is currently running */ rmb(); @@ -66,7 +66,7 @@ holder_cpu = lock_value & 0xffff; BUG_ON(holder_cpu >= NR_CPUS); holder_paca = &paca[holder_cpu]; - yield_count = holder_paca->lppaca.xYieldCount; + yield_count = holder_paca->lppaca.yield_count; if ((yield_count & 1) == 0) return; /* virtual cpu is currently running */ rmb(); diff -ruN linus-bk-naca.10/arch/ppc64/xmon/xmon.c linus-bk-naca.11/arch/ppc64/xmon/xmon.c --- linus-bk-naca.10/arch/ppc64/xmon/xmon.c 2004-12-11 02:33:00.000000000 +1100 +++ linus-bk-naca.11/arch/ppc64/xmon/xmon.c 2004-12-13 15:50:52.000000000 +1100 @@ -1489,7 +1489,7 @@ unsigned long val; #ifdef CONFIG_PPC_ISERIES struct paca_struct *ptrPaca = NULL; - struct ItLpPaca *ptrLpPaca = NULL; + struct lppaca *ptrLpPaca = NULL; struct ItLpRegSave *ptrLpRegSave = NULL; #endif @@ -1513,10 +1513,10 @@ printf(" Local Processor Control Area (LpPaca): \n"); ptrLpPaca = ptrPaca->lppaca_ptr; printf(" Saved 
Srr0=%.16lx Saved Srr1=%.16lx \n", - ptrLpPaca->xSavedSrr0, ptrLpPaca->xSavedSrr1); + ptrLpPaca->saved_srr0, ptrLpPaca->saved_srr1); printf(" Saved Gpr3=%.16lx Saved Gpr4=%.16lx \n", - ptrLpPaca->xSavedGpr3, ptrLpPaca->xSavedGpr4); - printf(" Saved Gpr5=%.16lx \n", ptrLpPaca->xSavedGpr5); + ptrLpPaca->saved_gpr3, ptrLpPaca->saved_gpr4); + printf(" Saved Gpr5=%.16lx \n", ptrLpPaca->saved_gpr5); printf(" Local Processor Register Save Area (LpRegSave): \n"); ptrLpRegSave = ptrPaca->reg_save_ptr; diff -ruN linus-bk-naca.10/include/asm-ppc64/lppaca.h linus-bk-naca.11/include/asm-ppc64/lppaca.h --- linus-bk-naca.10/include/asm-ppc64/lppaca.h 2004-12-13 15:04:43.000000000 +1100 +++ linus-bk-naca.11/include/asm-ppc64/lppaca.h 2004-12-13 16:09:08.000000000 +1100 @@ -28,7 +28,7 @@ //---------------------------------------------------------------------------- #include -struct ItLpPaca +struct lppaca { //============================================================================= // CACHE_LINE_1 0x0000 - 0x007F Contains read-only data @@ -36,24 +36,24 @@ // PLIC when preparing to bring a processor online or when dispatching a // virtual processor! //============================================================================= - u32 xDesc; // Eye catcher 0xD397D781 x00-x03 - u16 xSize; // Size of this struct x04-x05 - u16 xRsvd1_0; // Reserved x06-x07 - u16 xRsvd1_1:14; // Reserved x08-x09 - u8 xSharedProc:1; // Shared processor indicator ... - u8 xSecondaryThread:1; // Secondary thread indicator ... 
- volatile u8 xDynProcStatus:8; // Dynamic Status of this proc x0A-x0A - u8 xSecondaryThreadCnt; // Secondary thread count x0B-x0B - volatile u16 xDynHvPhysicalProcIndex;// Dynamic HV Physical Proc Index0C-x0D - volatile u16 xDynHvLogicalProcIndex;// Dynamic HV Logical Proc Indexx0E-x0F - u32 xDecrVal; // Value for Decr programming x10-x13 - u32 xPMCVal; // Value for PMC regs x14-x17 - volatile u32 xDynHwNodeId; // Dynamic Hardware Node id x18-x1B - volatile u32 xDynHwProcId; // Dynamic Hardware Proc Id x1C-x1F - volatile u32 xDynPIR; // Dynamic ProcIdReg value x20-x23 - u32 xDseiData; // DSEI data x24-x27 - u64 xSPRG3; // SPRG3 value x28-x2F - u8 xRsvd1_3[80]; // Reserved x30-x7F + u32 desc; // Eye catcher 0xD397D781 x00-x03 + u16 size; // Size of this struct x04-x05 + u16 reserved1; // Reserved x06-x07 + u16 reserved2:14; // Reserved x08-x09 + u8 shared_proc:1; // Shared processor indicator ... + u8 secondary_thread:1; // Secondary thread indicator ... + volatile u8 dyn_proc_status:8; // Dynamic Status of this proc x0A-x0A + u8 secondary_thread_count; // Secondary thread count x0B-x0B + volatile u16 dyn_hv_phys_proc_index;// Dynamic HV Physical Proc Index0C-x0D + volatile u16 dyn_hv_log_proc_index;// Dynamic HV Logical Proc Indexx0E-x0F + u32 decr_val; // Value for Decr programming x10-x13 + u32 pmc_val; // Value for PMC regs x14-x17 + volatile u32 dyn_hw_node_id; // Dynamic Hardware Node id x18-x1B + volatile u32 dyn_hw_proc_id; // Dynamic Hardware Proc Id x1C-x1F + volatile u32 dyn_pir; // Dynamic ProcIdReg value x20-x23 + u32 dsei_data; // DSEI data x24-x27 + u64 sprg3; // SPRG3 value x28-x2F + u8 reserved3[80]; // Reserved x30-x7F //============================================================================= // CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data @@ -61,17 +61,17 @@ // This Dword contains a byte for each type of interrupt that can occur. // The IPI is a count while the others are just a binary 1 or 0. 
union { - u64 xAnyInt; + u64 any_int; struct { - u16 xRsvd; // Reserved - cleared by #mpasmbl - u8 xXirrInt; // Indicates xXirrValue is valid or Immed IO - u8 xIpiCnt; // IPI Count - u8 xDecrInt; // DECR interrupt occurred - u8 xPdcInt; // PDC interrupt occurred - u8 xQuantumInt; // Interrupt quantum reached - u8 xOldPlicDeferredExtInt; // Old PLIC has a deferred XIRR pending - } xFields; - } xIntDword; + u16 reserved; // Reserved - cleared by #mpasmbl + u8 xirr_int; // Indicates xXirrValue is valid or Immed IO + u8 ipi_cnt; // IPI Count + u8 decr_int; // DECR interrupt occurred + u8 pdc_int; // PDC interrupt occurred + u8 quantum_int; // Interrupt quantum reached + u8 old_plic_deferred_ext_int; // Old PLIC has a deferred XIRR pending + } fields; + } int_dword; // Whenever any fields in this Dword are set then PLIC will defer the // processing of external interrupts. Note that PLIC will store the @@ -81,54 +81,52 @@ // entire Dword is zero or not. A non-zero value in the low order // 2-bytes will result in SLIC being granted the highest thread // priority upon return. A 0 will return to SLIC as medium priority. - u64 xPlicDeferIntsArea; // Entire Dword + u64 plic_defer_ints_area; // Entire Dword // Used to pass the real SRR0/1 from PLIC to SLIC as well as to // pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid. 
- u64 xSavedSrr0; // Saved SRR0 x10-x17 - u64 xSavedSrr1; // Saved SRR1 x18-x1F + u64 saved_srr0; // Saved SRR0 x10-x17 + u64 saved_srr1; // Saved SRR1 x18-x1F // Used to pass parms from the OS to PLIC for SetAsrAndRfid - u64 xSavedGpr3; // Saved GPR3 x20-x27 - u64 xSavedGpr4; // Saved GPR4 x28-x2F - u64 xSavedGpr5; // Saved GPR5 x30-x37 - - u8 xRsvd2_1; // Reserved x38-x38 - u8 xCpuCtlsTaskAttributes; // Task attributes for cpuctls x39-x39 - u8 xFPRegsInUse; // FP regs in use x3A-x3A - u8 xPMCRegsInUse; // PMC regs in use x3B-x3B - volatile u32 xSavedDecr; // Saved Decr Value x3C-x3F - volatile u64 xEmulatedTimeBase;// Emulated TB for this thread x40-x47 - volatile u64 xCurPLICLatency; // Unaccounted PLIC latency x48-x4F - u64 xTotPLICLatency; // Accumulated PLIC latency x50-x57 - u64 xWaitStateCycles; // Wait cycles for this proc x58-x5F - u64 xEndOfQuantum; // TB at end of quantum x60-x67 - u64 xPDCSavedSPRG1; // Saved SPRG1 for PMC int x68-x6F - u64 xPDCSavedSRR0; // Saved SRR0 for PMC int x70-x77 - volatile u32 xVirtualDecr; // Virtual DECR for shared procsx78-x7B - u16 xSLBCount; // # of SLBs to maintain x7C-x7D - u8 xIdle; // Indicate OS is idle x7E - u8 xRsvd2_2; // Reserved x7F + u64 saved_gpr3; // Saved GPR3 x20-x27 + u64 saved_gpr4; // Saved GPR4 x28-x2F + u64 saved_gpr5; // Saved GPR5 x30-x37 + + u8 reserved4; // Reserved x38-x38 + u8 cpuctls_task_attrs; // Task attributes for cpuctls x39-x39 + u8 fpregs_in_use; // FP regs in use x3A-x3A + u8 pmcregs_in_use; // PMC regs in use x3B-x3B + volatile u32 saved_decr; // Saved Decr Value x3C-x3F + volatile u64 emulated_time_base;// Emulated TB for this thread x40-x47 + volatile u64 cur_plic_latency; // Unaccounted PLIC latency x48-x4F + u64 tot_plic_latency; // Accumulated PLIC latency x50-x57 + u64 wait_state_cycles; // Wait cycles for this proc x58-x5F + u64 end_of_quantum; // TB at end of quantum x60-x67 + u64 pdc_saved_sprg1; // Saved SPRG1 for PMC int x68-x6F + u64 pdc_saved_srr0; // Saved SRR0 for PMC 
int x70-x77 + volatile u32 virtual_decr; // Virtual DECR for shared procsx78-x7B + u16 slb_count; // # of SLBs to maintain x7C-x7D + u8 idle; // Indicate OS is idle x7E + u8 reserved5; // Reserved x7F //============================================================================= // CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors //============================================================================= - // This is the xYieldCount. An "odd" value (low bit on) means that + // This is the yield_count. An "odd" value (low bit on) means that // the processor is yielded (either because of an OS yield or a PLIC // preempt). An even value implies that the processor is currently // executing. // NOTE: This value will ALWAYS be zero for dedicated processors and // will NEVER be zero for shared processors (ie, initialized to a 1). - volatile u32 xYieldCount; // PLIC increments each dispatchx00-x03 - u8 xRsvd3_0[124]; // Reserved x04-x7F + volatile u32 yield_count; // PLIC increments each dispatchx00-x03 + u8 reserved6[124]; // Reserved x04-x7F //============================================================================= // CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data //============================================================================= - u8 xPmcSaveArea[256]; // PMC interrupt Area x00-xFF - - + u8 pmc_save_area[256]; // PMC interrupt Area x00-xFF }; #endif /* _ASM_LPPACA_H */ diff -ruN linus-bk-naca.10/include/asm-ppc64/paca.h linus-bk-naca.11/include/asm-ppc64/paca.h --- linus-bk-naca.10/include/asm-ppc64/paca.h 2004-12-31 15:48:57.000000000 +1100 +++ linus-bk-naca.11/include/asm-ppc64/paca.h 2004-12-31 15:54:35.000000000 +1100 @@ -34,8 +34,8 @@ * * This structure is not directly accessed by firmware or the service * processor except for the first two pointers that point to the - * ItLpPaca area and the ItLpRegSave area for this CPU. 
Both the - * ItLpPaca and ItLpRegSave objects are currently contained within the + * lppaca area and the ItLpRegSave area for this CPU. Both the + * lppaca and ItLpRegSave objects are currently contained within the * PACA but they do not need to be. */ struct paca_struct { @@ -50,7 +50,7 @@ * MAGIC: These first two pointers can't be moved - they're * accessed by the firmware */ - struct ItLpPaca *lppaca_ptr; /* Pointer to LpPaca for PLIC */ + struct lppaca *lppaca_ptr; /* Pointer to LpPaca for PLIC */ struct ItLpRegSave *reg_save_ptr; /* Pointer to LpRegSave for PLIC */ /* @@ -109,7 +109,7 @@ * alignment will suffice to ensure that it doesn't * cross a page boundary. */ - struct ItLpPaca lppaca __attribute__((__aligned__(0x400))); + struct lppaca lppaca __attribute__((__aligned__(0x400))); #ifdef CONFIG_PPC_ISERIES struct ItLpRegSave reg_save; #endif diff -ruN linus-bk-naca.10/include/asm-ppc64/spinlock.h linus-bk-naca.11/include/asm-ppc64/spinlock.h --- linus-bk-naca.10/include/asm-ppc64/spinlock.h 2004-09-09 09:59:50.000000000 +1000 +++ linus-bk-naca.11/include/asm-ppc64/spinlock.h 2004-12-13 15:25:23.000000000 +1100 @@ -57,7 +57,7 @@ #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES) /* We only yield to the hypervisor if we are in shared processor mode */ -#define SHARED_PROCESSOR (get_paca()->lppaca.xSharedProc) +#define SHARED_PROCESSOR (get_paca()->lppaca.shared_proc) extern void __spin_yield(spinlock_t *lock); extern void __rw_yield(rwlock_t *lock); #else /* SPLPAR || ISERIES */ diff -ruN linus-bk-naca.10/include/asm-ppc64/time.h linus-bk-naca.11/include/asm-ppc64/time.h --- linus-bk-naca.10/include/asm-ppc64/time.h 2004-07-05 11:49:20.000000000 +1000 +++ linus-bk-naca.11/include/asm-ppc64/time.h 2004-12-13 16:05:02.000000000 +1100 @@ -78,8 +78,8 @@ struct paca_struct *lpaca = get_paca(); int cur_dec; - if (lpaca->lppaca.xSharedProc) { - lpaca->lppaca.xVirtualDecr = val; + if (lpaca->lppaca.shared_proc) { + lpaca->lppaca.virtual_decr = val; 
cur_dec = get_dec(); if (cur_dec > val) HvCall_setVirtualDecr(); From anton at samba.org Tue Jan 4 16:01:15 2005 From: anton at samba.org (Anton Blanchard) Date: Tue, 4 Jan 2005 16:01:15 +1100 Subject: [PATCH] ppc64: Clarify rtasd printk Message-ID: <20050104050115.GG7335@krispykreme.ozlabs.ibm.com> Hi, On machines with RTAS but without event-scan support we would incorrectly claim there was no RTAS on the system. Signed-off-by: Anton Blanchard ===== rtasd.c 1.34 vs edited ===== --- 1.34/arch/ppc64/kernel/rtasd.c 2004-11-16 14:29:11 +11:00 +++ edited/rtasd.c 2004-12-26 13:36:56 +11:00 @@ -486,7 +486,7 @@ /* No RTAS, only warn if we are on a pSeries box */ if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) { if (systemcfg->platform & PLATFORM_PSERIES); - printk(KERN_ERR "rtasd: no RTAS on system\n"); + printk(KERN_ERR "rtasd: no event-scan on system\n"); return 1; } From anton at samba.org Tue Jan 4 16:07:27 2005 From: anton at samba.org (Anton Blanchard) Date: Tue, 4 Jan 2005 16:07:27 +1100 Subject: [PATCH] ppc64: fix some compiler warnings Message-ID: <20050104050727.GH7335@krispykreme.ozlabs.ibm.com> Fix some compiler warnings: - The first two are spurious gcc warnings, but quieten them up regardless - Add a missing include - Use register_sysrq_key instead of __sysrq_put_key_op Signed-off-by: Anton Blanchard diff -puN arch/ppc64/mm/hash_native.c~remove_compiler_warnings arch/ppc64/mm/hash_native.c --- gr_work/arch/ppc64/mm/hash_native.c~remove_compiler_warnings 2004-12-25 21:44:00.112288718 -0600 +++ gr_work-anton/arch/ppc64/mm/hash_native.c 2004-12-25 21:44:35.782093438 -0600 @@ -242,7 +242,7 @@ static long native_hpte_updatepp(unsigne */ static void native_hpte_updateboltedpp(unsigned long newpp,
unsigned long ea) { - unsigned long vsid, va, vpn, flags; + unsigned long vsid, va, vpn, flags = 0; long slot; HPTE *hptep; int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); diff -puN arch/ppc64/kernel/pSeries_lpar.c~remove_compiler_warnings arch/ppc64/kernel/pSeries_lpar.c --- gr_work/arch/ppc64/kernel/pSeries_lpar.c~remove_compiler_warnings 2004-12-25 21:44:48.291973925 -0600 +++ gr_work-anton/arch/ppc64/kernel/pSeries_lpar.c 2004-12-25 21:45:08.829912888 -0600 @@ -504,7 +504,7 @@ void pSeries_lpar_flush_hash_range(unsig int local) { int i; - unsigned long flags; + unsigned long flags = 0; struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch); int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); diff -puN arch/ppc64/kernel/pSeries_setup.c~remove_compiler_warnings arch/ppc64/kernel/pSeries_setup.c --- gr_work/arch/ppc64/kernel/pSeries_setup.c~remove_compiler_warnings 2004-12-25 21:46:35.016298826 -0600 +++ gr_work-anton/arch/ppc64/kernel/pSeries_setup.c 2004-12-25 21:47:05.188173311 -0600 @@ -59,6 +59,7 @@ #include #include #include +#include #include "i8259.h" #include diff -puN arch/ppc64/xmon/start.c~remove_compiler_warnings arch/ppc64/xmon/start.c --- gr_work/arch/ppc64/xmon/start.c~remove_compiler_warnings 2004-12-25 21:48:27.578625901 -0600 +++ gr_work-anton/arch/ppc64/xmon/start.c 2004-12-25 21:48:55.121385858 -0600 @@ -40,7 +40,7 @@ static struct sysrq_key_op sysrq_xmon_op static int __init setup_xmon_sysrq(void) { - __sysrq_put_key_op('x', &sysrq_xmon_op); + register_sysrq_key('x', &sysrq_xmon_op); return 0; } __initcall(setup_xmon_sysrq); _ From anton at samba.org Tue Jan 4 16:13:35 2005 From: anton at samba.org (Anton Blanchard) Date: Tue, 4 Jan 2005 16:13:35 +1100 Subject: [PATCH] ppc64: remove stale prom.h code Message-ID: <20050104051335.GJ7335@krispykreme.ozlabs.ibm.com> Remove some stale code in prom.h Signed-off-by: Anton Blanchard diff -puN include/asm-ppc64/prom.h~prom_cleanup 
include/asm-ppc64/prom.h --- foobar2/include/asm-ppc64/prom.h~prom_cleanup 2005-01-04 16:07:39.113436136 +1100 +++ foobar2-anton/include/asm-ppc64/prom.h 2005-01-04 16:07:39.132434650 +1100 @@ -21,9 +21,6 @@ #define PTRUNRELOC(x) ((typeof(x))((unsigned long)(x) + offset)) #define RELOC(x) (*PTRRELOC(&(x))) -#define LONG_LSW(X) (((unsigned long)X) & 0xffffffff) -#define LONG_MSW(X) (((unsigned long)X) >> 32) - /* Definitions used by the flattened device tree */ #define OF_DT_HEADER 0xd00dfeed /* 4: version, 4: total size */ #define OF_DT_BEGIN_NODE 0x1 /* Start node: full name */ @@ -64,8 +61,6 @@ struct boot_param_header typedef u32 phandle; typedef u32 ihandle; -typedef u32 phandle32; -typedef u32 ihandle32; struct address_range { unsigned long space; @@ -95,13 +90,6 @@ struct isa_range { unsigned int size; }; -struct of_tce_table { - phandle node; - unsigned long base; - unsigned long size; -}; -extern struct of_tce_table of_tce_table[]; - struct reg_property { unsigned long address; unsigned long size; @@ -117,19 +105,6 @@ struct reg_property64 { unsigned long size; }; -struct reg_property_pmac { - unsigned int address_hi; - unsigned int address_lo; - unsigned int size; -}; - -struct translation_property { - unsigned long virt; - unsigned long size; - unsigned long phys; - unsigned int flags; -}; - struct property { char *name; int length; @@ -160,8 +135,6 @@ struct device_node { int busno; /* for pci devices */ int bussubno; /* for pci devices */ int devfn; /* for pci devices */ -#define DN_STATUS_BIST_FAILED (1<<0) - int status; /* Current device status (non-zero is bad) */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; struct pci_controller *phb; /* for pci devices */ @@ -244,7 +217,6 @@ extern int of_remove_node(struct device_ /* Other Prototypes */ extern unsigned long prom_init(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); -extern void relocate_nodes(void); extern void finish_device_tree(void); 
extern int device_is_compatible(struct device_node *device, const char *); extern int machine_is_compatible(const char *compat); _ From paulus at samba.org Tue Jan 4 17:39:00 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 4 Jan 2005 17:39:00 +1100 Subject: [PATCH] PPC64 Simplify timer_interrupt Message-ID: <16858.14852.673750.729779@cargo.ozlabs.ibm.com> This patch is from Milton Miller . When the update_process_times call was moved out of do_timer for the UP case, the replicator didn't track down the hiding and just added ifndef SMP. This removes the ifdefs and the indirection of calling another file for one function in a third file. Signed-off-by: Milton Miller Signed-off-by: Paul Mackerras diff -urN base-2.6/arch/ppc64/kernel/smp.c test/arch/ppc64/kernel/smp.c --- base-2.6/arch/ppc64/kernel/smp.c 2005-01-04 16:24:21.930503880 +1100 +++ test/arch/ppc64/kernel/smp.c 2005-01-04 17:36:44.569526376 +1100 @@ -156,11 +156,6 @@ } } -void smp_local_timer_interrupt(struct pt_regs * regs) -{ - update_process_times(user_mode(regs)); -} - void smp_message_recv(int msg, struct pt_regs *regs) { switch(msg) { diff -urN base-2.6/arch/ppc64/kernel/time.c test/arch/ppc64/kernel/time.c --- base-2.6/arch/ppc64/kernel/time.c 2005-01-04 16:27:42.854446184 +1100 +++ test/arch/ppc64/kernel/time.c 2005-01-04 17:36:44.571526072 +1100 @@ -68,8 +68,6 @@ #include #include -void smp_local_timer_interrupt(struct pt_regs *); - u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES; EXPORT_SYMBOL(jiffies_64); @@ -259,8 +257,6 @@ lpaca->lppaca.int_dword.fields.decr_int = 0; while (lpaca->next_jiffy_update_tb <= (cur_tb = get_tb())) { - -#ifdef CONFIG_SMP /* * We cannot disable the decrementer, so in the period * between this cpu's being marked offline in cpu_online_map @@ -269,8 +265,7 @@ * is the case. 
*/ if (!cpu_is_offline(cpu)) - smp_local_timer_interrupt(regs); -#endif + update_process_times(user_mode(regs)); /* * No need to check whether cpu is offline here; boot_cpuid * should have been fixed up by now. @@ -279,9 +274,6 @@ write_seqlock(&xtime_lock); tb_last_stamp = lpaca->next_jiffy_update_tb; do_timer(regs); -#ifndef CONFIG_SMP - update_process_times(user_mode(regs)); -#endif timer_sync_xtime( cur_tb ); timer_check_rtc(); write_sequnlock(&xtime_lock); From sfr at canb.auug.org.au Tue Jan 4 22:58:09 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 22:58:09 +1100 Subject: [PATCH] PPC64: use c99 initializers In-Reply-To: <20050104154319.505b1197.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> <20050104153102.67284491.sfr@canb.auug.org.au> <20050104153445.3777e689.sfr@canb.auug.org.au> <20050104153740.56622b4f.sfr@canb.auug.org.au> <20050104154025.63a1b9fb.sfr@canb.auug.org.au> <20050104154319.505b1197.sfr@canb.auug.org.au> Message-ID: <20050104225809.4b265440.sfr@canb.auug.org.au> Hi Andrew, This patch is just more clean up in the ppc64 arch. It uses c99 initializers for various iSeries structures that are used to pass information to the hypervisor. Also itLpNaca is not used by any code that could be in a module, so don't export it. Built and booted. Signed-off-by: Stephen Rothwell Please apply. P.S. for the StudlyCaps brigade, changing these is on my To Do list. 
:-) -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-sfr.11/arch/ppc64/kernel/LparData.c linus-bk-sfr.12/arch/ppc64/kernel/LparData.c --- linus-bk-sfr.11/arch/ppc64/kernel/LparData.c 2004-12-13 15:01:55.000000000 +1100 +++ linus-bk-sfr.12/arch/ppc64/kernel/LparData.c 2005-01-04 18:18:37.000000000 +1100 @@ -41,24 +41,22 @@ */ struct HvReleaseData hvReleaseData = { - 0xc8a5d9c4, /* desc = "HvRD" ebcdic */ - sizeof(struct HvReleaseData), - offsetof(struct naca_struct, xItVpdAreas), - &naca, /* 64-bit Naca address */ - 0x6000, /* offset of LparMap within loadarea (see head.S) */ - 0, - 1, /* tags inactive */ - 0, /* 64 bit */ - 0, /* shared processors */ - 0, /* HMT allowed */ - 6, /* TEMP: This allows non-GA driver */ - 4, /* We are v5r2m0 */ - 3, /* Min supported PLIC = v5r1m0 */ - 3, /* Min usable PLIC = v5r1m0 */ - { 0xd3, 0x89, 0x95, 0xa4, /* "Linux 2.4 "*/ - 0xa7, 0x40, 0xf2, 0x4b, - 0xf4, 0x4b, 0xf6, 0xf4 }, - {0} + .xDesc = 0xc8a5d9c4, /* "HvRD" ebcdic */ + .xSize = sizeof(struct HvReleaseData), + .xVpdAreasPtrOffset = offsetof(struct naca_struct, xItVpdAreas), + .xSlicNacaAddr = &naca, /* 64-bit Naca address */ + .xMsNucDataOffset = 0x6000, /* offset of LparMap within loadarea (see head.S) */ + .xTagsMode = 1, /* tags inactive */ + .xAddressSize = 0, /* 64 bit */ + .xNoSharedProcs = 0, /* shared processors */ + .xNoHMT = 0, /* HMT allowed */ + .xRsvd2 = 6, /* TEMP: This allows non-GA driver */ + .xVrmIndex = 4, /* We are v5r2m0 */ + .xMinSupportedPlicVrmIndex = 3, /* v5r1m0 */ + .xMinCompatablePlicVrmIndex = 3, /* v5r1m0 */ + .xVrmName = { 0xd3, 0x89, 0x95, 0xa4, /* "Linux 2.4.64" ebcdic */ + 0xa7, 0x40, 0xf2, 0x4b, + 0xf4, 0x4b, 0xf6, 0xf4 }, }; extern void SystemReset_Iseries(void); @@ -80,26 +78,33 @@ extern void InstructionAccessSLB_Iseries(void); struct ItLpNaca itLpNaca = { - 0xd397d581, /* desc = "LpNa" ebcdic */ - 0x0400, /* size of ItLpNaca */ - 0x0300, 19, /* offset to int array, # ents */ - 0, 
0, 0, /* Part # of primary, serv, me */ - 0, 0x100, /* # of LP queues, offset */ - 0, 0, 0, /* Piranha stuff */ - { 0,0,0,0,0 }, /* reserved */ - 0,0,0,0,0,0,0, /* stuff */ - { 0,0,0,0,0 }, /* reserved */ - 0, /* reserved */ - 0, /* VRM index of PLIC */ - 0, 0, /* min supported, compat SLIC */ - 0, /* 64-bit addr of load area */ - 0, /* chunks for load area */ - 0, 0, /* PASE mask, seg table */ - { 0 }, /* 64 reserved bytes */ - { 0 }, /* 128 reserved bytes */ - { 0 }, /* Old LP Queue */ - { 0 }, /* 384 reserved bytes */ - { + .xDesc = 0xd397d581, /* "LpNa" ebcdic */ + .xSize = 0x0400, /* size of ItLpNaca */ + .xIntHdlrOffset = 0x0300, /* offset to int array */ + .xMaxIntHdlrEntries = 19, /* # ents */ + .xPrimaryLpIndex = 0, /* Part # of primary */ + .xServiceLpIndex = 0, /* Part # of serv */ + .xLpIndex = 0, /* Part # of me */ + .xMaxLpQueues = 0, /* # of LP queues */ + .xLpQueueOffset = 0x100, /* offset of start of LP queues */ + .xPirEnvironMode = 0, /* Piranha stuff */ + .xPirConsoleMode = 0, + .xPirDasdMode = 0, + .xLparInstalled = 0, + .xSysPartitioned = 0, + .xHwSyncedTBs = 0, + .xIntProcUtilHmt = 0, + .xSpVpdFormat = 0, + .xIntProcRatio = 0, + .xPlicVrmIndex = 0, /* VRM index of PLIC */ + .xMinSupportedSlicVrmInd = 0, /* min supported SLIC */ + .xMinCompatableSlicVrmInd = 0, /* min compat SLIC */ + .xLoadAreaAddr = 0, /* 64-bit addr of load area */ + .xLoadAreaChunks = 0, /* chunks for load area */ + .xPaseSysCallCRMask = 0, /* PASE mask */ + .xSlicSegmentTablePtr = 0, /* seg table */ + .xOldLpQueue = { 0 }, /* Old LP Queue */ + .xInterruptHdlr = { (u64)SystemReset_Iseries, /* 0x100 System Reset */ (u64)MachineCheck_Iseries, /* 0x200 Machine Check */ (u64)DataAccess_Iseries, /* 0x300 Data Access */ @@ -153,10 +158,8 @@ u64 xRecoveryLogBuffer[32] __attribute__((__section__(".data"))); struct SpCommArea xSpCommArea = { - 0xE2D7C3C2, - 1, - {0}, - 0, 0, 0, 0, {0} + .xDesc = 0xE2D7C3C2, + .xFormat = 1, }; /* The LparMap data is now located at offset 0x6000 in 
head.S @@ -168,22 +171,21 @@ * offset into the Naca of the pointer to the ItVpdAreas. */ struct ItVpdAreas itVpdAreas = { - 0xc9a3e5c1, /* "ItVA" */ - sizeof( struct ItVpdAreas ), - 0, 0, - 26, /* # VPD array entries */ - 10, /* # DMA array entries */ - NR_CPUS*2, maxPhysicalProcessors, /* Max logical, physical procs */ - offsetof(struct ItVpdAreas,xPlicDmaToks),/* offset to DMA toks */ - offsetof(struct ItVpdAreas,xSlicVpdAdrs),/* offset to VPD addrs */ - offsetof(struct ItVpdAreas,xPlicDmaLens),/* offset to DMA lens */ - offsetof(struct ItVpdAreas,xSlicVpdLens),/* offset to VPD lens */ - 0, /* max slot labels */ - 1, /* max LP queues */ - {0}, {0}, /* reserved */ - {0}, /* DMA lengths */ - {0}, /* DMA tokens */ - { /* VPD lengths */ + .xSlicDesc = 0xc9a3e5c1, /* "ItVA" */ + .xSlicSize = sizeof(struct ItVpdAreas), + .xSlicVpdEntries = ItVpdMaxEntries, /* # VPD array entries */ + .xSlicDmaEntries = ItDmaMaxEntries, /* # DMA array entries */ + .xSlicMaxLogicalProcs = NR_CPUS * 2, /* Max logical procs */ + .xSlicMaxPhysicalProcs = maxPhysicalProcessors, /* Max physical procs */ + .xSlicDmaToksOffset = offsetof(struct ItVpdAreas, xPlicDmaToks), + .xSlicVpdAdrsOffset = offsetof(struct ItVpdAreas, xSlicVpdAdrs), + .xSlicDmaLensOffset = offsetof(struct ItVpdAreas, xPlicDmaLens), + .xSlicVpdLensOffset = offsetof(struct ItVpdAreas, xSlicVpdLens), + .xSlicMaxSlotLabels = 0, /* max slot labels */ + .xSlicMaxLpQueues = 1, /* max LP queues */ + .xPlicDmaLens = { 0 }, /* DMA lengths */ + .xPlicDmaToks = { 0 }, /* DMA tokens */ + .xSlicVpdLens = { /* VPD lengths */ 0,0,0, /* 0 - 2 */ sizeof(xItExtVpdPanel), /* 3 Extended VPD */ sizeof(struct paca_struct), /* 4 length of Paca */ @@ -201,7 +203,7 @@ sizeof(struct ItLpQueue),/* 23 length of Lp Queue */ 0,0 /* 24 - 25 */ }, - { /* VPD addresses */ + .xSlicVpdAdrs = { /* VPD addresses */ 0,0,0, /* 0 - 2 */ &xItExtVpdPanel, /* 3 Extended VPD */ &paca[0], /* 4 first Paca */ diff -ruN linus-bk-sfr.11/arch/ppc64/kernel/ppc_ksyms.c 
linus-bk-sfr.12/arch/ppc64/kernel/ppc_ksyms.c --- linus-bk-sfr.11/arch/ppc64/kernel/ppc_ksyms.c 2004-12-31 14:52:14.000000000 +1100 +++ linus-bk-sfr.12/arch/ppc64/kernel/ppc_ksyms.c 2005-01-04 18:07:42.000000000 +1100 @@ -68,9 +68,6 @@ EXPORT_SYMBOL(__down_interruptible); EXPORT_SYMBOL(__up); EXPORT_SYMBOL(__down); -#ifdef CONFIG_PPC_ISERIES -EXPORT_SYMBOL(itLpNaca); -#endif EXPORT_SYMBOL(csum_partial); EXPORT_SYMBOL(csum_partial_copy_generic); From sfr at canb.auug.org.au Tue Jan 4 23:05:08 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 4 Jan 2005 23:05:08 +1100 Subject: [PATCH] PPC64: tidy up the htab_data structure In-Reply-To: <20050104225809.4b265440.sfr@canb.auug.org.au> References: <20050104145356.4d5333dd.sfr@canb.auug.org.au> <20050104150410.199b132e.sfr@canb.auug.org.au> <20050104150833.5d3f3722.sfr@canb.auug.org.au> <20050104151229.521e8083.sfr@canb.auug.org.au> <20050104151906.6e50f1d2.sfr@canb.auug.org.au> <20050104152340.67219ccf.sfr@canb.auug.org.au> <20050104152705.6030abc5.sfr@canb.auug.org.au> <20050104153102.67284491.sfr@canb.auug.org.au> <20050104153445.3777e689.sfr@canb.auug.org.au> <20050104153740.56622b4f.sfr@canb.auug.org.au> <20050104154025.63a1b9fb.sfr@canb.auug.org.au> <20050104154319.505b1197.sfr@canb.auug.org.au> <20050104225809.4b265440.sfr@canb.auug.org.au> Message-ID: <20050104230508.13dd0df4.sfr@canb.auug.org.au> Hi Andrew, More tidying up. The htab_data structure contained 5 fields of which two were completely unused and one other was just kept for printing at boot time. I have moved the remaining two into global variables. Signed-off-by: Stephen Rothwell Built and booted on iSeries (which is always lpar) and on pSeries without partitioning. Please apply.
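A minimal standalone sketch of how a hash mask like the new `htab_hash_mask` global is derived and then used to pick a PTE group. It mirrors the arithmetic in the iSeries setup and hash lookup code, but the constants are assumptions for illustration, not values taken from the kernel headers: a 16-byte HPTE, 8 HPTEs per group, and 4 KB pages. The function names `compute_num_ptegs` and `hpte_group_slot` are hypothetical. Masking with `num_ptegs - 1` stands in for a modulus only if the number of PTE groups is a power of two.

```c
/* Assumed layout (illustrative, not from the kernel headers): a hash PTE
 * is two 64-bit doublewords and a PTE group holds eight of them. */
#define HPTE_BYTES      16UL
#define HPTES_PER_GROUP 8UL
#define PAGE_BYTES      4096UL

/* Stand-in for the global that replaces htab_data.htab_hash_mask. */
unsigned long htab_hash_mask;

/* Hypothetical helper: number of PTE groups in a hashed page table
 * occupying hpt_size_pages pages (PTEGs per page times pages). */
unsigned long compute_num_ptegs(unsigned long hpt_size_pages)
{
	return hpt_size_pages * (PAGE_BYTES / (HPTE_BYTES * HPTES_PER_GROUP));
}

/* Hypothetical helper: index of the first HPTE in the group selected by
 * a hash value. The AND behaves like "hash % num_ptegs" only when
 * num_ptegs is a power of two. */
unsigned long hpte_group_slot(unsigned long hash)
{
	return (hash & htab_hash_mask) * HPTES_PER_GROUP;
}
```

Under these assumptions a 64-page table holds 32 groups per page, so `compute_num_ptegs(64)` is 2048, `htab_hash_mask` becomes `0x7ff`, and `hpte_group_slot(0x12345)` selects group `0x345` (837), whose first HPTE is at index 6696.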
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-sfr.12/arch/ppc64/kernel/iSeries_setup.c linus-bk-sfr.13/arch/ppc64/kernel/iSeries_setup.c --- linus-bk-sfr.12/arch/ppc64/kernel/iSeries_setup.c 2004-12-13 15:31:14.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/kernel/iSeries_setup.c 2005-01-04 19:01:54.000000000 +1100 @@ -472,18 +472,16 @@ printk("HPT absolute addr = %016lx, size = %dK\n", chunk_to_addr(hptFirstChunk), hptSizeChunks * 256); - /* Fill in the htab_data structure */ - /* Fill in size of hashed page table */ + /* Fill in the hashed page table hash mask */ num_ptegs = hptSizePages * (PAGE_SIZE / (sizeof(HPTE) * HPTES_PER_GROUP)); - htab_data.htab_num_ptegs = num_ptegs; - htab_data.htab_hash_mask = num_ptegs - 1; + htab_hash_mask = num_ptegs - 1; /* * The actual hashed page table is in the hypervisor, * we have no direct access */ - htab_data.htab = NULL; + htab_address = NULL; /* * Determine if absolute memory has any diff -ruN linus-bk-sfr.12/arch/ppc64/kernel/pSeries_lpar.c linus-bk-sfr.13/arch/ppc64/kernel/pSeries_lpar.c --- linus-bk-sfr.12/arch/ppc64/kernel/pSeries_lpar.c 2004-12-31 15:16:48.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/kernel/pSeries_lpar.c 2005-01-04 19:00:17.000000000 +1100 @@ -436,7 +436,7 @@ hash = hpt_hash(vpn, 0); for (j = 0; j < 2; j++) { - slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; for (i = 0; i < HPTES_PER_GROUP; i++) { hpte_dw0.dword0 = pSeries_lpar_hpte_getword0(slot); dw0 = hpte_dw0.dw0; diff -ruN linus-bk-sfr.12/arch/ppc64/kernel/setup.c linus-bk-sfr.13/arch/ppc64/kernel/setup.c --- linus-bk-sfr.12/arch/ppc64/kernel/setup.c 2004-12-31 16:24:11.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/kernel/setup.c 2005-01-04 18:58:58.000000000 +1100 @@ -55,6 +55,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) 
udbg_printf(fmt) @@ -90,7 +91,6 @@ #endif /* extern void *stab; */ -extern HTAB htab_data; extern unsigned long klimit; extern void mm_init_ppc64(void); @@ -672,8 +672,8 @@ ppc64_caches.dline_size); printk("ppc64_caches.icache_line_size = 0x%x\n", ppc64_caches.iline_size); - printk("htab_data.htab = 0x%p\n", htab_data.htab); - printk("htab_data.num_ptegs = 0x%lx\n", htab_data.htab_num_ptegs); + printk("htab_address = 0x%p\n", htab_address); + printk("htab_hash_mask = 0x%lx\n", htab_hash_mask); printk("-----------------------------------------------------\n"); mm_init_ppc64(); diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hash_low.S linus-bk-sfr.13/arch/ppc64/mm/hash_low.S --- linus-bk-sfr.12/arch/ppc64/mm/hash_low.S 2004-10-14 18:37:37.000000000 +1000 +++ linus-bk-sfr.13/arch/ppc64/mm/hash_low.S 2005-01-04 19:06:24.000000000 +1100 @@ -139,8 +139,8 @@ std r3,STK_PARM(r4)(r1) /* Get htab_hash_mask */ - ld r4,htab_data@got(2) - ld r27,16(r4) /* htab_data.htab_hash_mask -> r27 */ + ld r4,htab_hash_mask@got(2) + ld r27,0(r4) /* htab_hash_mask -> r27 */ /* Check if we may already be in the hashtable, in this case, we * go to out-of-line code to try to modify the HPTE diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hash_native.c linus-bk-sfr.13/arch/ppc64/mm/hash_native.c --- linus-bk-sfr.12/arch/ppc64/mm/hash_native.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/mm/hash_native.c 2005-01-04 19:09:45.000000000 +1100 @@ -52,7 +52,7 @@ unsigned long hpteflags, int bolted, int large) { unsigned long arpn = physRpn_to_absRpn(prpn); - HPTE *hptep = htab_data.htab + hpte_group; + HPTE *hptep = htab_address + hpte_group; Hpte_dword0 dw0; HPTE lhpte; int i; @@ -117,7 +117,7 @@ slot_offset = mftb() & 0x7; for (i = 0; i < HPTES_PER_GROUP; i++) { - hptep = htab_data.htab + hpte_group + slot_offset; + hptep = htab_address + hpte_group + slot_offset; dw0 = hptep->dw0.dw0; if (dw0.v && !dw0.bolted) { @@ -172,9 +172,9 @@ hash = hpt_hash(vpn, 0); for (j = 0; j < 2; j++) { -
slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; for (i = 0; i < HPTES_PER_GROUP; i++) { - hptep = htab_data.htab + slot; + hptep = htab_address + slot; dw0 = hptep->dw0.dw0; if ((dw0.avpn == (vpn >> 11)) && dw0.v && @@ -195,7 +195,7 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp, unsigned long va, int large, int local) { - HPTE *hptep = htab_data.htab + slot; + HPTE *hptep = htab_address + slot; Hpte_dword0 dw0; unsigned long avpn = va >> 23; int ret = 0; @@ -254,7 +254,7 @@ slot = native_hpte_find(vpn); if (slot == -1) panic("could not find page to bolt\n"); - hptep = htab_data.htab + slot; + hptep = htab_address + slot; set_pp_bit(newpp, hptep); @@ -269,7 +269,7 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long va, int large, int local) { - HPTE *hptep = htab_data.htab + slot; + HPTE *hptep = htab_address + slot; Hpte_dword0 dw0; unsigned long avpn = va >> 23; unsigned long flags; @@ -336,10 +336,10 @@ secondary = (pte_val(batch->pte[i]) & _PAGE_SECONDARY) >> 15; if (secondary) hash = ~hash; - slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; slot += (pte_val(batch->pte[i]) & _PAGE_GROUP_IX) >> 12; - hptep = htab_data.htab + slot; + hptep = htab_address + slot; avpn = va >> 23; if (large) diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hash_utils.c linus-bk-sfr.13/arch/ppc64/mm/hash_utils.c --- linus-bk-sfr.12/arch/ppc64/mm/hash_utils.c 2004-12-31 14:52:56.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/mm/hash_utils.c 2005-01-04 19:08:37.000000000 +1100 @@ -74,7 +74,8 @@ extern unsigned long dart_tablebase; #endif /* CONFIG_U3_DART */ -HTAB htab_data = {NULL, 0, 0, 0, 0}; +HPTE *htab_address; +unsigned long htab_hash_mask; extern unsigned long _SDR1; @@ -113,7 +114,7 @@ hash = hpt_hash(vpn, large); - hpteg = ((hash & htab_data.htab_hash_mask)*HPTES_PER_GROUP); + hpteg = ((hash & htab_hash_mask) * 
HPTES_PER_GROUP); #ifdef CONFIG_PPC_PSERIES if (systemcfg->platform & PLATFORM_LPAR) @@ -155,12 +156,11 @@ htab_size_bytes = pteg_count << 7; } - htab_data.htab_num_ptegs = pteg_count; - htab_data.htab_hash_mask = pteg_count - 1; + htab_hash_mask = pteg_count - 1; if (systemcfg->platform & PLATFORM_LPAR) { /* Using a hypervisor which owns the htab */ - htab_data.htab = NULL; + htab_address = NULL; _SDR1 = 0; } else { /* Find storage for the HPT. Must be contiguous in @@ -175,7 +175,7 @@ ppc64_terminate_msg(0x20, "hpt space"); loop_forever(); } - htab_data.htab = abs_to_virt(table); + htab_address = abs_to_virt(table); /* htab absolute addr + encoded htabsize */ _SDR1 = table + __ilog2(pteg_count) - 11; @@ -356,7 +356,7 @@ secondary = (pte_val(pte) & _PAGE_SECONDARY) >> 15; if (secondary) hash = ~hash; - slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; slot += (pte_val(pte) & _PAGE_GROUP_IX) >> 12; ppc_md.hpte_invalidate(slot, va, huge, local); diff -ruN linus-bk-sfr.12/arch/ppc64/mm/hugetlbpage.c linus-bk-sfr.13/arch/ppc64/mm/hugetlbpage.c --- linus-bk-sfr.12/arch/ppc64/mm/hugetlbpage.c 2004-10-29 07:03:21.000000000 +1000 +++ linus-bk-sfr.13/arch/ppc64/mm/hugetlbpage.c 2005-01-04 19:02:45.000000000 +1100 @@ -832,7 +832,7 @@ hash = hpt_hash(vpn, 1); if (pte_val(old_pte) & _PAGE_SECONDARY) hash = ~hash; - slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; slot += (pte_val(old_pte) & _PAGE_GROUP_IX) >> 12; if (ppc_md.hpte_updatepp(slot, hpteflags, va, 1, local) == -1) @@ -846,7 +846,7 @@ prpn = pte_pfn(old_pte); repeat: - hpte_group = ((hash & htab_data.htab_hash_mask) * + hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; /* Update the linux pte with the HPTE slot */ @@ -863,13 +863,13 @@ /* Primary is full, try the secondary */ if (unlikely(slot == -1)) { pte_val(new_pte) |= _PAGE_SECONDARY; - hpte_group = ((~hash & 
htab_data.htab_hash_mask) * + hpte_group = ((~hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; slot = ppc_md.hpte_insert(hpte_group, va, prpn, 1, hpteflags, 0, 1); if (slot == -1) { if (mftb() & 0x1) - hpte_group = ((hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; + hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; ppc_md.hpte_remove(hpte_group); goto repeat; diff -ruN linus-bk-sfr.12/arch/ppc64/mm/init.c linus-bk-sfr.13/arch/ppc64/mm/init.c --- linus-bk-sfr.12/arch/ppc64/mm/init.c 2004-12-10 16:26:54.000000000 +1100 +++ linus-bk-sfr.13/arch/ppc64/mm/init.c 2005-01-04 19:03:14.000000000 +1100 @@ -168,7 +168,7 @@ hash = hpt_hash(vpn, 0); - hpteg = ((hash & htab_data.htab_hash_mask)*HPTES_PER_GROUP); + hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); /* Panic if a pte grpup is full */ if (ppc_md.hpte_insert(hpteg, va, pa >> PAGE_SHIFT, 0, diff -ruN linus-bk-sfr.12/include/asm-ppc64/mmu.h linus-bk-sfr.13/include/asm-ppc64/mmu.h --- linus-bk-sfr.12/include/asm-ppc64/mmu.h 2004-10-29 07:03:22.000000000 +1000 +++ linus-bk-sfr.13/include/asm-ppc64/mmu.h 2005-01-04 19:10:32.000000000 +1100 @@ -98,15 +98,8 @@ #define PP_RXRX 3 /* Supervisor read, User read */ -typedef struct { - HPTE * htab; - unsigned long htab_num_ptegs; - unsigned long htab_hash_mask; - unsigned long next_round_robin; - unsigned long last_kernel_address; -} HTAB; - -extern HTAB htab_data; +extern HPTE * htab_address; +extern unsigned long htab_hash_mask; static inline unsigned long hpt_hash(unsigned long vpn, int large) { -------------- next part -------------- A non-text attachment was scrubbed... 
From moilanen at austin.ibm.com Wed Jan 5 07:30:31 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 4 Jan 2005 14:30:31 -0600 Subject: [PATCH] xmon breakpoints fix for Power4/5 Message-ID: <20050104143031.62c25338@localhost> Looks like xmon breakpoints were not working on Power4/5. Here's a fix to the problem. Tested on Power3 and Power5 boxes. Jake Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/xmon/xmon.c~xmon-lpar-bp arch/ppc64/xmon/xmon.c --- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-lpar-bp Tue Jan 4 12:44:20 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c Tue Jan 4 14:13:09 2005 @@ -1088,11 +1088,6 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ - if (!(cur_cpu_spec->cpu_features & CPU_FTR_IABR)) { - printf("Hardware instruction breakpoint " - "not supported on this cpu\n"); - break; - } if (iabr) { iabr->enabled &= ~(BP_IABR | BP_IABR_TE); iabr = NULL; @@ -1101,10 +1096,15 @@ bpt_cmds(void) break; if (!check_bp_loc(a)) break; + bp = new_breakpoint(a); - if (bp != NULL) { + + if (cur_cpu_spec->cpu_features & CPU_FTR_IABR) { bp->enabled |= BP_IABR | BP_IABR_TE; iabr = bp; + } else { + if (bp) + bp->enabled |= BP_TRAP; } break; _ From sjmunroe at us.ibm.com Wed Jan 5 09:02:04 2005 From: sjmunroe at us.ibm.com (Steve Munroe) Date: Tue, 4 Jan 2005 16:02:04 -0600 Subject: ppc64 vDSO update In-Reply-To: <1101094716.13598.39.camel@gaston> Message-ID: Benjamin Herrenschmidt wrote on 11/21/2004 09:38:36 PM: > At the URL below, you can find a new version of the ppc64 vDSO patch against > a recent Linus bk tree. I intend to submit it upstream real soon as the work > on non-executable stack is waiting for it, though we must first make sure the > way symbols are exported to userland is ok for glibc.
>
> http://gate.crashing.org/~benh/ppc64-vdso-20041122.diff
> ...
>
> (Craig: the signal issue is fixed now, either when building with
> descriptors or without).
>
> Ben.
>

Still having problems with VDSO/GLIBC integration. Basically any glibc make check test that uses signals is a space shot for both PPC32/PPC64.

First it seems that glibc is expecting a (fairly normal) DSO image including two (2) LOAD entries in the program header. The current PPC64 kernel vdso images only contain one (1) LOAD entry:

  Program Headers:
    Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
    LOAD           0x000000 0x00100000 0x00100000 0x00e10 0x00e10 R E 0x10000
    DYNAMIC        0x000d98 0x00100d98 0x00100d98 0x00078 0x00078 R   0x4
    GNU_EH_FRAME   0x000000 0x00000000 0x00000000 0x00000 0x00000     0x4

This caused problems for the code in libc/elf/rtld.c that attempts to extract l_map_start/l_map_end for the vdso:

      else if (ph->p_type == PT_LOAD)
        {
          if (! l->l_addr)
            l->l_addr = ph->p_vaddr;
          else if (ph->p_vaddr + ph->p_memsz >= l->l_map_end)
            l->l_map_end = ph->p_vaddr + ph->p_memsz;
          else if ((ph->p_flags & PF_X)
                   && ph->p_vaddr + ph->p_memsz >= l->l_text_end)
            l->l_text_end = ph->p_vaddr + ph->p_memsz;
        }

This code will set l_addr but not l_map_end or l_text_end, because it grabs the p_vaddr from the 1st and only LOAD entry and then continues the loop looking for the 2nd LOAD entry (which is not there!). On PPC32 this causes the "assert (mapend > mapstart)" in __elf_preferred_address to fail. I hacked around this by removing the "else" from the "else if", but it just fails later.

The remaining problem is that we are getting into dl_iterate_phdr and taking a wild branch. This could be from the callback in dl_iterate_phdr and due to the incomplete nature of our vdso. This is difficult to debug, as the stack pointer (and TOC pointer in PPC64) are both clobbered by this point and GDB-6.1 gets totally confused.

Ben: it would be handy if you could update the corefile support to include the vdso segments.
Also please try a vdso with 2 LOAD segments.

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center

From moilanen at austin.ibm.com Wed Jan 5 09:13:54 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Tue, 4 Jan 2005 16:13:54 -0600
Subject: in_be64() assembly
Message-ID: <20050104161354.17f77ce7@localhost>

I'm trying to use in_be64(), and when I build I get compile errors:

  {standard input}: Assembler messages:
  {standard input}:5534: Error: syntax error; found `(' but expected `)'
  {standard input}:5534: Error: junk at end of line: `(3))'
  make[1]: *** [arch/ppc64/xmon/xmon.o] Error 1
  make: *** [arch/ppc64/xmon] Error 2
  make: *** Waiting for unfinished jobs....

Olof pointed out that in/out_le64 use a "b" operand for the addr. In in_be64(), when I changed the "m" operand to a "b", the kernel built fine (although I haven't tried running it yet). What does the "b" operand mean?

Patch used below.

Thanks,
Jake

---

diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h
--- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix Tue Jan 4 15:33:22 2005
+++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h Tue Jan 4 15:59:50 2005
@@ -372,7 +372,7 @@ static inline unsigned long in_be64(cons
 	unsigned long ret;
 	__asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync"
-			     : "=r" (ret) : "m" (*addr));
+			     : "=r" (ret) : "b" (*addr));
 	return ret;
 }
_

From amodra at bigpond.net.au Wed Jan 5 10:31:32 2005
From: amodra at bigpond.net.au (Alan Modra)
Date: Wed, 5 Jan 2005 10:01:32 +1030
Subject: ppc64 vDSO update
In-Reply-To:
References: <1101094716.13598.39.camel@gaston>
Message-ID: <20050104233132.GF11457@bubble.modra.org>

On Tue, Jan 04, 2005 at 04:02:04PM -0600, Steve Munroe wrote:
> First it seems that glibc is expecting a (fairly normal) DSO image
> including two (2) LOAD entries in the program header.
> The current PPC64 kernel vdso images only contain one (1) LOAD entry:
>
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD           0x000000 0x00100000 0x00100000 0x00e10 0x00e10 R E 0x10000
>   DYNAMIC        0x000d98 0x00100d98 0x00100d98 0x00078 0x00078 R   0x4
>   GNU_EH_FRAME   0x000000 0x00000000 0x00000000 0x00000 0x00000     0x4

There's absolutely nothing wrong with an executable or shared lib having just one PT_LOAD segment. It's a glibc bug if ld.so can't handle it.

> This caused problems for the code in libc/elf/rtld.c that attempts to
> extract l_map_start/l_map_end for the vdso:
>
>       else if (ph->p_type == PT_LOAD)
>         {
>           if (! l->l_addr)
>             l->l_addr = ph->p_vaddr;
>           else if (ph->p_vaddr + ph->p_memsz >= l->l_map_end)
>             l->l_map_end = ph->p_vaddr + ph->p_memsz;
>           else if ((ph->p_flags & PF_X)
>                    && ph->p_vaddr + ph->p_memsz >= l->l_text_end)
>             l->l_text_end = ph->p_vaddr + ph->p_memsz;
>         }
>
> This code will set l_addr but not l_map_end or l_text_end because it
> grabbed the p_vaddr from the 1st and only LOAD entry then continue the
> loop looking for the 2nd LOAD entry (which is not there!). On PPC32 this
> causes the "assert (mapend > mapstart)" in __elf_preferred_address to
> fail. I hacked around this by removing the "else" from the "else if" but
> it just fails later.

Buggy code. All the "else" keywords should be removed, i.e.

          if (! l->l_addr)
            l->l_addr = ph->p_vaddr;
          if (ph->p_vaddr + ph->p_memsz >= l->l_map_end)
            l->l_map_end = ph->p_vaddr + ph->p_memsz;
          if ((ph->p_flags & PF_X)
              && ph->p_vaddr + ph->p_memsz >= l->l_text_end)
            l->l_text_end = ph->p_vaddr + ph->p_memsz;

> The remaining problem is we are getting into dl_iterate_phdr and taking a
> wild branch. This could be from the callback in dl_iterate_phdr and due to
> the incomplete nature of our vsdo. This is difficult to debug as the stack
> point (and TOC pointer in PPC64) are both clobbered by this point and
> GDB-6.1 gets totally confused.
I don't know what to suggest, other than brute force debugging by poking .long 0 over code paths you suspect might be executed.

--
Alan Modra
IBM OzLabs - Linux Technology Centre

From paulus at samba.org Wed Jan 5 10:53:34 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 5 Jan 2005 10:53:34 +1100
Subject: [PATCH] xmon breakpoints fix for Power4/5
In-Reply-To: <20050104143031.62c25338@localhost>
References: <20050104143031.62c25338@localhost>
Message-ID: <16859.11390.511469.875831@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> Looks like xmon breakpoints were not working on Power4/5. Here's a fix
> to the problem.

You mean the 'bi' command didn't make a breakpoint? Just use the 'b' command instead. Also you take out the if (bp != NULL) check which is needed.

Rejected.

Paul.

From linas at austin.ibm.com Wed Jan 5 11:10:16 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Tue, 4 Jan 2005 18:10:16 -0600
Subject: in_be64() assembly
In-Reply-To: <20050104161354.17f77ce7@localhost>
References: <20050104161354.17f77ce7@localhost>
Message-ID: <20050105001016.GC22274@austin.ibm.com>

On Tue, Jan 04, 2005 at 04:13:54PM -0600, Jake Moilanen was heard to remark:
>
> diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h
> --- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix Tue Jan 4 15:33:22 2005
> +++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h Tue Jan 4 15:59:50 2005
> @@ -372,7 +372,7 @@ static inline unsigned long in_be64(cons
>  	unsigned long ret;
>
>  	__asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync"
> -			     : "=r" (ret) : "m" (*addr));
> +			     : "=r" (ret) : "b" (*addr));
>  	return ret;
>  }

Very weird. Why anyone thought that doing a load with a zero offset is somehow 'correct' is a mystery to me. The compiler is quite capable of computing offsets, and I don't see any aliasing issues. Certainly the 8, 16 and 32-bit versions don't do this kind of funny business.

Does the following work?

static inline unsigned long in_be64(const whatever ...)
{
	unsigned long ret;

	__asm__ __volatile__("ld %0,%1; twi 0,%0,0; isync"
			     : "=r" (ret) : "m" (*addr));
	return ret;
}

I suspect in_le64 is also borken, it should be

	"ld %1,%2\n"
	...
	with : "=r" (ret), "=r" (tmp) : "m" (*addr),

instead of the b. out_le64 looks broken in the same way.

--linas

From paulus at samba.org Wed Jan 5 11:24:44 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 5 Jan 2005 11:24:44 +1100
Subject: in_be64() assembly
In-Reply-To: <20050104161354.17f77ce7@localhost>
References: <20050104161354.17f77ce7@localhost>
Message-ID: <16859.13260.426004.296846@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> In in_be64(), when changed the "m" operand to a "b", the kernel built
> fine (although I haven't tried running it yet). What does the "b"
> operand mean?

"b" means the value should be in a "base" register, i.e. any gpr other than gpr0.

Your patch isn't correct. We can either do:

	__asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync"
			     : "=r" (ret) : "b" (addr));

(note no "*" before addr) or we can do

	__asm__ __volatile__("ld%U1%X1 %0,%1; twi 0,%0,0; isync"
			     : "=r" (ret) : "m" (*addr));

On the whole I think I prefer the second.

Paul.

From paulus at samba.org Wed Jan 5 11:35:34 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 5 Jan 2005 11:35:34 +1100
Subject: in_be64() assembly
In-Reply-To: <20050105001016.GC22274@austin.ibm.com>
References: <20050104161354.17f77ce7@localhost> <20050105001016.GC22274@austin.ibm.com>
Message-ID: <16859.13910.16173.232170@cargo.ozlabs.ibm.com>

Linas Vepstas writes:

> Very weird. Why anyone thought that doing a load with a zero offset
> is somehow 'correct' seems strange to me. The compiler is quite

It's one of the two addressing modes that PPC has - register + offset and register + register.

> I suspect in_le64 is also borken, it should be
>
> "ld %1,%2\n"

It and out_le64 are correct as they stand. They could be rewritten as "ld%U2%X2 %1,%2" etc.

Paul.
From david at gibson.dropbear.id.au Wed Jan 5 14:54:28 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 5 Jan 2005 14:54:28 +1100 Subject: [PPC64] Add performance monitor register information to processor.h Message-ID: <20050105035428.GC21259@zax> Andrew, please apply: Most special purpose registers on the ppc64 have both the SPR number, and the various fields within the register defined in asm-ppc64/processor.h. So far that's not true for the performance counter control registers, MMCR0 and MMCRA. They have the SPR numbers defined, but the internal fields are defined in the oprofile code and (just a few) in traps.c where they're actually used. This patch moves all the MMCR0 and MMCRA definitions, plus the MSR performance monitor bit, MSR_PMM, into processor.h. Index: working-2.6/include/asm-ppc64/processor.h =================================================================== --- working-2.6.orig/include/asm-ppc64/processor.h 2005-01-05 14:46:10.557311664 +1100 +++ working-2.6/include/asm-ppc64/processor.h 2005-01-05 14:46:12.551274880 +1100 @@ -44,6 +44,7 @@ #define MSR_DR_LG 4 /* Data Relocate */ #define MSR_PE_LG 3 /* Protection Enable */ #define MSR_PX_LG 2 /* Protection Exclusive Mode */ +#define MSR_PMM_LG 2 /* Performance monitor */ #define MSR_RI_LG 1 /* Recoverable Exception */ #define MSR_LE_LG 0 /* Little Endian */ @@ -76,6 +77,7 @@ #define MSR_DR __MASK(MSR_DR_LG) /* Data Relocate */ #define MSR_PE __MASK(MSR_PE_LG) /* Protection Enable */ #define MSR_PX __MASK(MSR_PX_LG) /* Protection Exclusive Mode */ +#define MSR_PMM __MASK(MSR_PMM_LG) /* Performance monitor */ #define MSR_RI __MASK(MSR_RI_LG) /* Recoverable Exception */ #define MSR_LE __MASK(MSR_LE_LG) /* Little Endian */ @@ -305,6 +307,9 @@ #define SPRN_SIAR 780 #define SPRN_SDAR 781 #define SPRN_MMCRA 786 +#define MMCRA_SIHV 0x10000000UL /* state of MSR HV when SIAR set */ +#define MMCRA_SIPR 0x08000000UL /* state of MSR PR when SIAR set */ +#define MMCRA_SAMPLE_ENABLE 
0x00000001UL /* enable sampling */ #define SPRN_PMC1 787 #define SPRN_PMC2 788 #define SPRN_PMC3 789 @@ -314,6 +319,26 @@ #define SPRN_PMC7 793 #define SPRN_PMC8 794 #define SPRN_MMCR0 795 +#define MMCR0_FC 0x80000000UL /* freeze counters. set to 1 on a perfmon exception */ +#define MMCR0_FCS 0x40000000UL /* freeze in supervisor state */ +#define MMCR0_KERNEL_DISABLE MMCR0_FCS +#define MMCR0_FCP 0x20000000UL /* freeze in problem state */ +#define MMCR0_PROBLEM_DISABLE MMCR0_FCP +#define MMCR0_FCM1 0x10000000UL /* freeze counters while MSR mark = 1 */ +#define MMCR0_FCM0 0x08000000UL /* freeze counters while MSR mark = 0 */ +#define MMCR0_PMXE 0x04000000UL /* performance monitor exception enable */ +#define MMCR0_FCECE 0x02000000UL /* freeze counters on enabled condition or event */ +/* time base exception enable */ +#define MMCR0_TBEE 0x00400000UL /* time base exception enable */ +#define MMCR0_PMC1INTCONTROL 0x00008000UL /* PMC1 count enable*/ +#define MMCR0_PMCNINTCONTROL 0x00004000UL /* PMCn count enable*/ +#define MMCR0_TRIGGER 0x00002000UL /* TRIGGER enable */ +#define MMCR0_PMAO 0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */ +#define MMCR0_SHRFC 0x00000040UL /* SHRre freeze conditions between threads */ +#define MMCR0_FCTI 0x00000008UL /* freeze counters in tags inactive mode */ +#define MMCR0_FCTA 0x00000004UL /* freeze counters in tags active mode */ +#define MMCR0_FCWAIT 0x00000002UL /* freeze counter in WAIT state */ +#define MMCR0_FCHV 0x00000001UL /* freeze conditions in hypervisor mode */ #define SPRN_MMCR1 798 /* Short-hand versions for a number of the above SPRNs */ Index: working-2.6/arch/ppc64/oprofile/op_impl.h =================================================================== --- working-2.6.orig/arch/ppc64/oprofile/op_impl.h 2005-01-05 14:46:10.558311512 +1100 +++ working-2.6/arch/ppc64/oprofile/op_impl.h 2005-01-05 14:46:12.551274880 +1100 @@ -14,44 +14,6 @@ #define OP_MAX_COUNTER 8 -#define 
MSR_PMM (1UL << (63 - 61)) - -/* freeze counters. set to 1 on a perfmon exception */ -#define MMCR0_FC (1UL << (31 - 0)) - -/* freeze in supervisor state */ -#define MMCR0_KERNEL_DISABLE (1UL << (31 - 1)) - -/* freeze in problem state */ -#define MMCR0_PROBLEM_DISABLE (1UL << (31 - 2)) - -/* freeze counters while MSR mark = 1 */ -#define MMCR0_FCM1 (1UL << (31 - 3)) - -/* performance monitor exception enable */ -#define MMCR0_PMXE (1UL << (31 - 5)) - -/* freeze counters on enabled condition or event */ -#define MMCR0_FCECE (1UL << (31 - 6)) - -/* PMC1 count enable*/ -#define MMCR0_PMC1INTCONTROL (1UL << (31 - 16)) - -/* PMCn count enable*/ -#define MMCR0_PMCNINTCONTROL (1UL << (31 - 17)) - -/* performance monitor alert has occurred, set to 0 after handling exception */ -#define MMCR0_PMAO (1UL << (31 - 24)) - -/* state of MSR HV when SIAR set */ -#define MMCRA_SIHV (1UL << (63 - 35)) - -/* state of MSR PR when SIAR set */ -#define MMCRA_SIPR (1UL << (63 - 36)) - -/* enable sampling */ -#define MMCRA_SAMPLE_ENABLE (1UL << (63 - 63)) - /* Per-counter configuration as set via oprofilefs. */ struct op_counter_config { unsigned long valid; Index: working-2.6/arch/ppc64/kernel/traps.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/traps.c 2005-01-05 14:46:10.558311512 +1100 +++ working-2.6/arch/ppc64/kernel/traps.c 2005-01-05 14:46:12.552274728 +1100 @@ -545,9 +545,6 @@ } /* Ensure exceptions are disabled */ -#define MMCR0_PMXE (1UL << (31 - 5)) -#define MMCR0_PMAO (1UL << (31 - 24)) - static void dummy_perf(struct pt_regs *regs) { unsigned int mmcr0 = mfspr(SPRN_MMCR0); -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! 
http://www.ozlabs.org/people/dgibson

From paulus at samba.org Wed Jan 5 16:09:38 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 5 Jan 2005 16:09:38 +1100
Subject: [PATCH] PPC64 Use newer RTAS call when available
Message-ID: <16859.30354.953245.690482@cargo.ozlabs.ibm.com>

This patch is from Nathan Fontenot originally.

The PPC64 EEH code needs a small update to start using the ibm,read-slot-reset-state2 rtas call if available. The currently used ibm,read-slot-reset-state call will be going away on future machines. This patch uses the newer rtas call if it is available and falls back to the older version otherwise. This maintains EEH slot checking capabilities on all current and future firmware levels.

Signed-off-by: Nathan Fontenot
Signed-off-by: Paul Mackerras

diff -urN base-2.6/arch/ppc64/kernel/eeh.c test/arch/ppc64/kernel/eeh.c --- base-2.6/arch/ppc64/kernel/eeh.c 2005-01-05 14:29:58.333466400 +1100 +++ test/arch/ppc64/kernel/eeh.c 2005-01-05 15:04:59.937483424 +1100 @@ -96,6 +96,7 @@ static int ibm_set_eeh_option; static int ibm_set_slot_reset; static int ibm_read_slot_reset_state; +static int ibm_read_slot_reset_state2; static int ibm_slot_error_detail; static int eeh_subsystem_enabled; @@ -408,6 +409,27 @@ } /** + * read_slot_reset_state - Read the reset state of a device node's slot + * @dn: device node to read + * @rets: array to return results in + */ +static int read_slot_reset_state(struct device_node *dn, int rets[]) +{ + int token, outputs; + + if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { + token = ibm_read_slot_reset_state2; + outputs = 4; + } else { + token = ibm_read_slot_reset_state; + outputs = 3; + } + + return rtas_call(token, 3, outputs, rets, dn->eeh_config_addr, + BUID_HI(dn->phb->buid), BUID_LO(dn->phb->buid)); +} + +/** * eeh_panic - call panic() for an eeh event that cannot be handled.
* The philosophy of this routine is that it is better to panic and * halt the OS than it is to risk possible data corruption by @@ -509,7 +531,7 @@ int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) { int ret; - int rets[2]; + int rets[3]; unsigned long flags; int rc, reset_state; struct eeh_event *event; @@ -540,11 +562,8 @@ atomic_inc(&eeh_fail_count); if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { /* re-read the slot reset state */ - rets[0] = -1; - rtas_call(ibm_read_slot_reset_state, 3, 3, rets, - dn->eeh_config_addr, - BUID_HI(dn->phb->buid), - BUID_LO(dn->phb->buid)); + if (read_slot_reset_state(dn, rets) != 0) + rets[0] = -1; /* reset state unknown */ eeh_panic(dev, rets[0]); } return 0; @@ -557,10 +576,7 @@ * function zero of a multi-function device. * In any case they must share a common PHB. */ - ret = rtas_call(ibm_read_slot_reset_state, 3, 3, rets, - dn->eeh_config_addr, BUID_HI(dn->phb->buid), - BUID_LO(dn->phb->buid)); - + ret = read_slot_reset_state(dn, rets); if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) { __get_cpu_var(false_positives)++; return 0; @@ -756,6 +772,7 @@ ibm_set_eeh_option = rtas_token("ibm,set-eeh-option"); ibm_set_slot_reset = rtas_token("ibm,set-slot-reset"); + ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2"); ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); From moilanen at austin.ibm.com Thu Jan 6 01:42:02 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 5 Jan 2005 08:42:02 -0600 Subject: [PATCH] xmon breakpoints fix for Power4/5 In-Reply-To: <16859.11390.511469.875831@cargo.ozlabs.ibm.com> References: <20050104143031.62c25338@localhost> <16859.11390.511469.875831@cargo.ozlabs.ibm.com> Message-ID: <20050105084202.5102b467@localhost> On Wed, 5 Jan 2005 10:53:34 +1100 Paul Mackerras wrote: > Jake Moilanen writes: > > > Looks like xmon breakpoints were not working on 
Power4/5. Here's a fix
> > to the problem.
>
> You mean the 'bi' command didn't make a breakpoint? Just use the 'b'
> command instead. Also you take out the if (bp != NULL) check which is
> needed.

I may have misunderstood what Anton wanted when I talked w/ him yesterday, but I was under the impression that he wanted 'bi' and 'bd' fixed for Power4/5/LPAR. I pretty much just made 'bi' work like 'b' for Power4/5. I should have been a little more explicit when I wrote up the description of the patch. If I misunderstood, please just throw this follow-up patch away.

In the follow-up, I also restored the (bp != NULL) check, even though it should not matter because we reuse the same bp every time. I do agree that it should still have the check.

I will be posting the 'bd' fix for LPAR shortly.

Thanks,
Jake

Signed-off-by: Jake Moilanen

--- diff -puN arch/ppc64/xmon/xmon.c~xmon-lpar-bp arch/ppc64/xmon/xmon.c --- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-lpar-bp Wed Jan 5 08:14:09 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c Wed Jan 5 08:15:48 2005 @@ -1050,7 +1050,7 @@ static char *breakpoint_help_string = "b [cnt] set breakpoint at given instr addr\n" "bc clear all breakpoints\n" "bc clear breakpoint number n or at addr\n" - "bi [cnt] set hardware instr breakpoint (broken?)\n" + "bi [cnt] set hardware instr breakpoint\n" "bd [cnt] set hardware data breakpoint (broken?)\n" ""; @@ -1088,11 +1088,6 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ - if (!(cur_cpu_spec->cpu_features & CPU_FTR_IABR)) { - printf("Hardware instruction breakpoint " - "not supported on this cpu\n"); - break; - } if (iabr) { iabr->enabled &= ~(BP_IABR | BP_IABR_TE); iabr = NULL; @@ -1101,11 +1096,16 @@ bpt_cmds(void) break; if (!check_bp_loc(a)) break; + bp = new_breakpoint(a); - if (bp != NULL) { - bp->enabled |= BP_IABR | BP_IABR_TE; - iabr = bp; + if (bp) { + if (cur_cpu_spec->cpu_features & CPU_FTR_IABR) { + bp->enabled |= BP_IABR | BP_IABR_TE; + iabr = bp; + } else +
bp->enabled |= BP_TRAP; } + break; case 'c': _ From moilanen at austin.ibm.com Thu Jan 6 01:52:19 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 5 Jan 2005 08:52:19 -0600 Subject: [PATCH] xmon dabr support for LPAR Message-ID: <20050105085219.5eab02a8@localhost> Here's xmon DABR support for LPAR. I added SETCTRLREG which is a wrapper for setting a controlled register that will choose to use either an hcall or mtspr depending on what mode the machine is in. Thanks, Jake Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/xmon/xmon.c~xmon-lpar-dabr arch/ppc64/xmon/xmon.c --- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-lpar-dabr Wed Jan 5 08:17:07 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c Wed Jan 5 08:23:50 2005 @@ -712,7 +712,7 @@ static void insert_bpts(void) static void insert_cpu_bpts(void) { if (dabr.enabled) - set_dabr(dabr.address | (dabr.enabled & 7)); + set_controlled_dabr(dabr.address | (dabr.enabled & 7)); if (iabr && (cur_cpu_spec->cpu_features & CPU_FTR_IABR)) set_iabr(iabr->address | (iabr->enabled & (BP_IABR|BP_IABR_TE))); @@ -740,7 +740,7 @@ static void remove_bpts(void) static void remove_cpu_bpts(void) { - set_dabr(0); + set_controlled_dabr(0); if ((cur_cpu_spec->cpu_features & CPU_FTR_IABR)) set_iabr(0); } @@ -1051,7 +1051,7 @@ static char *breakpoint_help_string = "bc clear all breakpoints\n" "bc clear breakpoint number n or at addr\n" "bi [cnt] set hardware instr breakpoint\n" - "bd [cnt] set hardware data breakpoint (broken?)\n" + "bd [cnt] set hardware data breakpoint\n" ""; static void diff -puN arch/ppc64/xmon/privinst.h~xmon-lpar-dabr arch/ppc64/xmon/privinst.h --- linux-2.6-bk/arch/ppc64/xmon/privinst.h~xmon-lpar-dabr Wed Jan 5 08:17:22 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/privinst.h Wed Jan 5 08:20:02 2005 @@ -25,6 +25,16 @@ GETREG(cr) static inline void set_ ## name (long val) \ { asm volatile ("mtspr " #n ",%0" : : "r" (val)); } +/* + * If a register is a controlled resource protected when there + * 
is a hypervisor, then use this command. + */ +#define SETCTRLREG(name) \ + extern inline void set_lpar_ ##name(long val); \ + extern inline void set_controlled_ ## name (long val) \ + { (systemcfg->platform == PLATFORM_PSERIES_LPAR) ? \ + set_lpar_ ##name (val) : set_ ##name (val); } + GSETSPR(0, mq) GSETSPR(1, xer) GSETSPR(4, rtcu) @@ -48,6 +58,8 @@ GSETSPR(1009, hid1) GSETSPR(1010, iabr) GSETSPR(1013, dabr) GSETSPR(1023, pir) + +SETCTRLREG(dabr) static inline void store_inst(void *p) { diff -puN arch/ppc64/xmon/start.c~xmon-lpar-dabr arch/ppc64/xmon/start.c --- linux-2.6-bk/arch/ppc64/xmon/start.c~xmon-lpar-dabr Wed Jan 5 08:17:49 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/start.c Wed Jan 5 08:20:49 2005 @@ -46,6 +46,16 @@ static int __init setup_xmon_sysrq(void) __initcall(setup_xmon_sysrq); #endif /* CONFIG_MAGIC_SYSRQ */ +inline void set_lpar_dabr(long val) +{ + int rc; + + rc = plpar_hcall_norets(H_SET_DABR, val); + + if (rc != H_Success) + xmon_printf("Warning: setting DABR failed. rc = %d\n", rc); +} + int xmon_write(void *handle, void *ptr, int nb) { _ From linas at austin.ibm.com Thu Jan 6 07:27:56 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Wed, 5 Jan 2005 14:27:56 -0600 Subject: [PATCH] PPC64: xmon recursion Message-ID: <20050105202756.GF22274@austin.ibm.com> Hi, I've had a number of problems with recursive xmon calls, primarily because longjmp was returning incorrectly. The following patch fixes this problem. Please review and forward upstream.
--linas Signed-off-by: Linas Vepstas ===== arch/ppc64/xmon/setjmp.c 1.1 vs edited ===== --- 1.1/arch/ppc64/xmon/setjmp.c 2002-02-14 06:14:36 -06:00 +++ edited/arch/ppc64/xmon/setjmp.c 2004-12-14 17:51:29 -06:00 @@ -73,5 +73,6 @@ xmon_longjmp(long *buf, int val) ld 2,16(%0)\n\ mtlr 0\n\ mr 3,%1\n\ + blr \n\ " : : "r" (buf), "r" (val)); } From moilanen at austin.ibm.com Thu Jan 6 07:45:02 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 5 Jan 2005 14:45:02 -0600 Subject: [PATCH 0/2] xmon io space read Message-ID: <20050105144502.56a15bcd@localhost> These patches allow xmon to read from ioremapped IO space. It uses a command very similar to the normal memory read. I elected to not reuse the memory read code because I wanted some extra "security" to help prevent an inadvertent destructive read. I had to also add a debugger_fault_handler() in bad_page_fault() to catch an illegal attempt at hashing a bad page via a hcall. Patch 1/2: Fix for in_be64() Patch 2/2: xmon code to read from io space. Thanks, Jake From moilanen at austin.ibm.com Thu Jan 6 07:52:55 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 5 Jan 2005 14:52:55 -0600 Subject: [PATCH 1/2] xmon io space read In-Reply-To: <20050105144502.56a15bcd@localhost> References: <20050105144502.56a15bcd@localhost> Message-ID: <20050105145255.41819748@localhost> Here is the fix suggested by Paulus for in_be64(). 
Thanks, Jake Signed-off-by: Jake Moilanen --- diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h --- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix Tue Jan 4 15:33:22 2005 +++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h Wed Jan 5 08:08:03 2005 @@ -371,7 +371,7 @@ static inline unsigned long in_be64(cons { unsigned long ret; - __asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync" + __asm__ __volatile__("ld%U1%X1 %0,%1; twi 0,%0,0; isync" : "=r" (ret) : "m" (*addr)); return ret; } _ From moilanen at austin.ibm.com Thu Jan 6 07:57:57 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 5 Jan 2005 14:57:57 -0600 Subject: [PATCH 2/2] xmon io space read In-Reply-To: <20050105144502.56a15bcd@localhost> References: <20050105144502.56a15bcd@localhost> Message-ID: <20050105145757.62c84c3b@localhost> Here is the support code for xmon to read IO space. It should come in handy to debug driver and bringup issues. Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/xmon/xmon.c~xmon-io-read arch/ppc64/xmon/xmon.c --- linux-2.6-bk/arch/ppc64/xmon/xmon.c~xmon-io-read Wed Jan 5 11:50:57 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/xmon/xmon.c Wed Jan 5 14:19:57 2005 @@ -93,6 +93,7 @@ static int mwrite(unsigned long, void *, static int handle_fault(struct pt_regs *); static void byterev(unsigned char *, int); static void memex(void); +static void iomemex(void); static int bsesc(void); static void dump(void); static void prdump(unsigned long, long); @@ -175,6 +176,7 @@ Commands:\n\ di dump instructions\n\ df dump float values\n\ dd dump double values\n\ + i IO memory dump\n\ e print exception information\n\ f flush cache\n\ la lookup symbol+offset of specified address\n\ @@ -794,6 +796,9 @@ cmds(struct pt_regs *excp) memex(); } break; + case 'i': + iomemex(); + break; case 'd': dump(); break; @@ -1855,6 +1860,130 @@ memex(void) } adrs += inc; } +} + +static char *iomemex_help_string = + "IO Memory examine command usage:\n" + "i addr [size] [options]\n" + " 
size may include chars from this set:\n" + " 1 examine byte (default)\n" + " 2 examine short (2 byte)\n" + " 4 examine int (4 byte)\n" + " 8 examine long (8 byte)\n" + " options may include chars from this set:\n" + " l little endian (default)\n" + " b big endian\n" + " a absolute address - does not add on pci_io_base\n" + "NOTE: Defaults to adding on pci_io_base\n" + ""; + + +#define LE 0 +#define BE 1 + +static void +ioread(unsigned long addr, int size, int endiness) +{ + int i; + long data; + + if (setjmp(bus_error_jmp) == 0) { + catch_memory_errors = 1; + sync(); + switch (size) { + case 1: + data = in_8((char *)addr); + sync(); + __delay(200); + printf("%.16lx: 0x%.2x\n", addr, data); + break; + + case 2: + data = endiness ? in_be16((short *)addr) : in_le16((short *)addr); + sync(); + __delay(200); + printf("%.16lx: 0x%.4x\n", addr, data); + break; + case 4: + data = endiness ? in_be32((int *)addr) : in_le32((int *)addr); + sync(); + __delay(200); + printf("%.16lx: 0x%.8x\n", addr, data); + break; + case 8: + data = endiness ? 
in_be64((long *)addr) : in_le64((long *)addr); + sync(); + __delay(200); + printf("%.16lx: 0x%.16x\n", addr, data); + break; + default: + printf("ioread: invalid size (%d)\n", size); + } + } else { + printf("%.16lx: ", addr); + for (i = 0; i < size; i++) + printf("%s", fault_chars[fault_type]); + printf("\n"); + } + + catch_memory_errors = 0; + +} + +static void +iomemex(void) +{ + int size = 1; + int cmd; + int endiness = LE; + int absolute = 0; + + scanhex((void *)&adrs); + cmd = skipbl(); + if (cmd == '?') { + printf(iomemex_help_string); + return; + } else if (cmd == '\n' && !adrs) { + printf("pci_io_base: 0x%lx\n", pci_io_base); + return; + } + + termch = cmd; + + while ((cmd = skipbl()) != '\n') { + switch (cmd) { + case '1': size = 1; break; + case '2': size = 2; break; + case '4': size = 4; break; + case '8': size = 8; break; + case 'l': endiness = LE; break; + case 'b': endiness = BE; break; + case 'a': absolute = 1; break; + } + } + + if(size <= 0) + size = 1; + else if(size > 8) + size = 8; + + if (!absolute) + adrs += pci_io_base; + + printf("Will attempt to read:\n"); + printf("address:\t0x%lx\n", adrs); + printf("size:\t\t0x%lx\n", size); + printf("endiness:\t%s\n", endiness ? 
"Big" : "Little"); + printf("Are you sure (Y/n): "); + fflush(stdout); + flush_input(); + + cmd = skipbl(); + + if (cmd == 'n') + return; + + ioread(adrs, size, endiness); } int diff -puN arch/ppc64/mm/fault.c~xmon-io-read arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~xmon-io-read Wed Jan 5 13:27:30 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c Wed Jan 5 13:27:38 2005 @@ -297,6 +297,9 @@ void bad_page_fault(struct pt_regs *regs return; } + if (debugger_fault_handler(regs)) + return; + /* kernel has accessed a bad area */ die("Kernel access of bad area", regs, sig); } _ From olof at austin.ibm.com Thu Jan 6 11:07:21 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 5 Jan 2005 18:07:21 -0600 Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c Message-ID: <20050106000721.GA20029@austin.ibm.com> Hi, This patch renames pci_dma_direct.c to pci_direct_iommu.c to comply with the naming convention of the other iommu files. This is part of the iommu cleanup, but broken out as a separate patch since for mainline, a BK rename is more appropriate.
Still, we need a patch to apply for non-BK-based trees (-mm) Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c | 89 ++++++++++++++++++++ linux-2.5/arch/ppc64/kernel/pci_dma_direct.c | 89 -------------------- 2 files changed, 89 insertions(+), 89 deletions(-) diff -L arch/ppc64/kernel/pci_dma_direct.c -puN arch/ppc64/kernel/pci_dma_direct.c~iommu-rename-pci_dma_direct /dev/null --- linux-2.5/arch/ppc64/kernel/pci_dma_direct.c +++ /dev/null 2004-12-07 13:25:26.079467688 -0600 @@ -1,89 +0,0 @@ -/* - * Support for DMA from PCI devices to main memory on - * machines without an iommu or with directly addressable - * RAM (typically a pmac with 2Gb of RAM or less) - * - * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ - -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include - -#include "pci.h" - -static void *pci_direct_alloc_consistent(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle) -{ - void *ret; - - ret = (void *)__get_free_pages(GFP_ATOMIC, get_order(size)); - if (ret != NULL) { - memset(ret, 0, size); - *dma_handle = virt_to_abs(ret); - } - return ret; -} - -static void pci_direct_free_consistent(struct pci_dev *hwdev, size_t size, - void *vaddr, dma_addr_t dma_handle) -{ - free_pages((unsigned long)vaddr, get_order(size)); -} - -static dma_addr_t pci_direct_map_single(struct pci_dev *hwdev, void *ptr, - size_t size, enum dma_data_direction direction) -{ - return virt_to_abs(ptr); -} - -static void pci_direct_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, - size_t size, enum dma_data_direction direction) -{ -} - -static int pci_direct_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, enum dma_data_direction direction) -{ - int i; - - for (i = 0; i < nents; i++, sg++) { - sg->dma_address = page_to_phys(sg->page) + sg->offset; - sg->dma_length = sg->length; - } - - return nents; -} - -static void pci_direct_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, enum dma_data_direction direction) -{ -} - -void __init pci_dma_init_direct(void) -{ - pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent; - pci_dma_ops.pci_free_consistent = pci_direct_free_consistent; - pci_dma_ops.pci_map_single = pci_direct_map_single; - pci_dma_ops.pci_unmap_single = pci_direct_unmap_single; - pci_dma_ops.pci_map_sg = pci_direct_map_sg; - pci_dma_ops.pci_unmap_sg = pci_direct_unmap_sg; -} diff -puN /dev/null arch/ppc64/kernel/pci_direct_iommu.c --- /dev/null 2004-12-07 13:25:26.079467688 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c 2004-12-07 16:17:31.549078536 -0600 @@ -0,0 +1,89 @@ +/* + * Support for DMA from 
PCI devices to main memory on + * machines without an iommu or with directly addressable + * RAM (typically a pmac with 2Gb of RAM or less) + * + * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "pci.h" + +static void *pci_direct_alloc_consistent(struct pci_dev *hwdev, size_t size, + dma_addr_t *dma_handle) +{ + void *ret; + + ret = (void *)__get_free_pages(GFP_ATOMIC, get_order(size)); + if (ret != NULL) { + memset(ret, 0, size); + *dma_handle = virt_to_abs(ret); + } + return ret; +} + +static void pci_direct_free_consistent(struct pci_dev *hwdev, size_t size, + void *vaddr, dma_addr_t dma_handle) +{ + free_pages((unsigned long)vaddr, get_order(size)); +} + +static dma_addr_t pci_direct_map_single(struct pci_dev *hwdev, void *ptr, + size_t size, enum dma_data_direction direction) +{ + return virt_to_abs(ptr); +} + +static void pci_direct_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction direction) +{ +} + +static int pci_direct_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ + int i; + + for (i = 0; i < nents; i++, sg++) { + sg->dma_address = page_to_phys(sg->page) + sg->offset; + sg->dma_length = sg->length; + } + + return nents; +} + +static void pci_direct_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ +} + +void __init pci_dma_init_direct(void) +{ + pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent; + pci_dma_ops.pci_free_consistent = 
pci_direct_free_consistent; + pci_dma_ops.pci_map_single = pci_direct_map_single; + pci_dma_ops.pci_unmap_single = pci_direct_unmap_single; + pci_dma_ops.pci_map_sg = pci_direct_map_sg; + pci_dma_ops.pci_unmap_sg = pci_direct_unmap_sg; +} _ From olof at austin.ibm.com Thu Jan 6 11:07:35 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 5 Jan 2005 18:07:35 -0600 Subject: [PATCH] [PPC64] [2/2] IOMMU cleanups: Main cleanup patch Message-ID: <20050106000735.GA20079@austin.ibm.com> Hi, Earlier cleanup efforts of the ppc64 IOMMU code have mostly been targeted at simplifying the allocation schemes and modularising things for the various platforms. The IOMMU init functions are still a mess. This is an attempt to clean them up and make them somewhat easier to follow. The new rules are: 1. iommu_init_early_ is called before any PCI/VIO init is done 2. The pcibios fixup routines will call the iommu_{bus,dev}_setup functions appropriately as devices are added. TCE space allocation has changed somewhat: * On LPARs, nothing is really different. ibm,dma-window properties are still used to determine table sizes. * On pSeries SMP-mode (non-LPAR), the full TCE space per PHB is split into 256MB chunks, each handed out to one child bus/slot as needed. Since the first 256MB chunk is always skipped, this gives a current maximum of 7 child buses per PHB, which is more than any machine model I'm aware of needs. * Exception to the above: Pre-POWER4 machines with Python PHBs have a full GB of DMA space allocated at the PHB level, since there are no EADS-level tables on such systems. * PowerMac and Maple still work like before: all buses/slots share one table. * VIO works like before, ibm,my-dma-window is used like before. * iSeries has not been touched much at all, besides the changed unit of the it_size variable in struct iommu_table.
Other things changed: * Powermac and maple PCI/IOMMU inits have been changed a bit to conform to the new init structure * pci_dma_direct.c has been renamed pci_direct_iommu.c to match pci_iommu.c (see separate patch) * Likewise, a couple of the pci direct init functions have been renamed. Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/kernel/Makefile | 2 linux-2.5-olof/arch/ppc64/kernel/iSeries_iommu.c | 11 linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c | 3 linux-2.5-olof/arch/ppc64/kernel/iommu.c | 21 - linux-2.5-olof/arch/ppc64/kernel/maple_pci.c | 3 linux-2.5-olof/arch/ppc64/kernel/maple_setup.c | 7 linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c | 283 ++++++++++---------- linux-2.5-olof/arch/ppc64/kernel/pSeries_pci.c | 5 linux-2.5-olof/arch/ppc64/kernel/pSeries_setup.c | 5 linux-2.5-olof/arch/ppc64/kernel/pci.c | 5 linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c | 2 linux-2.5-olof/arch/ppc64/kernel/pmac_pci.c | 2 linux-2.5-olof/arch/ppc64/kernel/pmac_setup.c | 7 linux-2.5-olof/arch/ppc64/kernel/prom.c | 11 linux-2.5-olof/arch/ppc64/kernel/u3_iommu.c | 104 ++++--- linux-2.5-olof/arch/ppc64/kernel/vio.c | 18 - linux-2.5-olof/drivers/pci/hotplug/rpaphp_pci.c | 4 linux-2.5-olof/include/asm-ppc64/iommu.h | 13 linux-2.5-olof/include/asm-ppc64/machdep.h | 2 linux-2.5-olof/include/asm-ppc64/pci-bridge.h | 8 20 files changed, 265 insertions(+), 251 deletions(-) diff -puN arch/ppc64/kernel/pci.c~iommu-cleanup arch/ppc64/kernel/pci.c --- linux-2.5/arch/ppc64/kernel/pci.c~iommu-cleanup 2005-01-05 16:59:18.108168880 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pci.c 2005-01-05 16:59:18.235149576 -0600 @@ -845,6 +845,11 @@ void __devinit pcibios_fixup_bus(struct pcibios_fixup_device_resources(dev, bus); } + ppc_md.iommu_bus_setup(bus); + + list_for_each_entry(dev, &bus->devices, bus_list) + ppc_md.iommu_dev_setup(dev); + if (!pci_probe_only) return; diff -puN include/asm-ppc64/machdep.h~iommu-cleanup include/asm-ppc64/machdep.h --- 
linux-2.5/include/asm-ppc64/machdep.h~iommu-cleanup 2005-01-05 16:59:18.112168272 -0600 +++ linux-2.5-olof/include/asm-ppc64/machdep.h 2005-01-05 16:59:18.236149424 -0600 @@ -70,6 +70,8 @@ struct machdep_calls { long index, long npages); void (*tce_flush)(struct iommu_table *tbl); + void (*iommu_dev_setup)(struct pci_dev *dev); + void (*iommu_bus_setup)(struct pci_bus *bus); int (*probe)(int platform); void (*setup_arch)(void); diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup arch/ppc64/kernel/pSeries_iommu.c --- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup 2005-01-05 16:59:18.141163864 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c 2005-01-05 17:08:34.411597904 -0600 @@ -46,6 +46,9 @@ #include #include "pci.h" +#define DBG(fmt...) + +extern int is_python(struct device_node *); static void tce_build_pSeries(struct iommu_table *tbl, long index, long npages, unsigned long uaddr, @@ -121,7 +124,7 @@ static void tce_build_pSeriesLP(struct i } } -DEFINE_PER_CPU(void *, tce_page) = NULL; +static DEFINE_PER_CPU(void *, tce_page) = NULL; static void tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages, unsigned long uaddr, @@ -233,85 +236,6 @@ static void tce_freemulti_pSeriesLP(stru } } - -static void iommu_buses_init(void) -{ - struct pci_controller *phb, *tmp; - struct device_node *dn, *first_dn; - int num_slots, num_slots_ilog2; - int first_phb = 1; - unsigned long tcetable_ilog2; - - /* - * We default to a TCE table that maps 2GB (4MB table, 22 bits), - * however some machines have a 3GB IO hole and for these we - * create a table that maps 1GB (2MB table, 21 bits) - */ - if (io_hole_start < 0x80000000UL) - tcetable_ilog2 = 21; - else - tcetable_ilog2 = 22; - - /* XXX Should we be using pci_root_buses instead? 
-ojn - */ - - list_for_each_entry_safe(phb, tmp, &hose_list, list_node) { - first_dn = ((struct device_node *)phb->arch_data)->child; - - /* Carve 2GB into the largest dma_window_size possible */ - for (dn = first_dn, num_slots = 0; dn != NULL; dn = dn->sibling) - num_slots++; - num_slots_ilog2 = __ilog2(num_slots); - - if ((1<dma_window_size = 1 << (tcetable_ilog2 - num_slots_ilog2); - - /* Reserve 16MB of DMA space on the first PHB. - * We should probably be more careful and use firmware props. - * In reality this space is remapped, not lost. But we don't - * want to get that smart to handle it -- too much work. - */ - phb->dma_window_base_cur = first_phb ? (1 << 12) : 0; - first_phb = 0; - - for (dn = first_dn; dn != NULL; dn = dn->sibling) - iommu_devnode_init_pSeries(dn); - } -} - - -static void iommu_buses_init_lpar(struct list_head *bus_list) -{ - struct list_head *ln; - struct pci_bus *bus; - struct device_node *busdn; - unsigned int *dma_window; - - for (ln=bus_list->next; ln != bus_list; ln=ln->next) { - bus = pci_bus_b(ln); - - if (bus->self) - busdn = pci_device_to_OF_node(bus->self); - else - busdn = bus->sysdata; /* must be a phb */ - - dma_window = (unsigned int *)get_property(busdn, "ibm,dma-window", NULL); - if (dma_window) { - /* Bussubno hasn't been copied yet. - * Do it now because iommu_table_setparms_lpar needs it. - */ - busdn->bussubno = bus->number; - iommu_devnode_init_pSeries(busdn); - } - - /* look for a window on a bridge even if the PHB had one */ - iommu_buses_init_lpar(&bus->children); - } -} - - static void iommu_table_setparms(struct pci_controller *phb, struct device_node *dn, struct iommu_table *tbl) @@ -336,27 +260,18 @@ static void iommu_table_setparms(struct tbl->it_busno = phb->bus->number; /* Units of tce entries */ - tbl->it_offset = phb->dma_window_base_cur; - - /* Adjust the current table offset to the next - * region. Measured in TCE entries. Force an - * alignment to the size allotted per IOA. 
This - * makes it easier to remove the 1st 16MB. - */ - phb->dma_window_base_cur += (phb->dma_window_size>>3); - phb->dma_window_base_cur &= - ~((phb->dma_window_size>>3)-1); - - /* Set the tce table size - measured in pages */ - tbl->it_size = ((phb->dma_window_base_cur - - tbl->it_offset) << 3) >> PAGE_SHIFT; + tbl->it_offset = phb->dma_window_base_cur >> PAGE_SHIFT; /* Test if we are going over 2GB of DMA space */ - if (phb->dma_window_base_cur > (1 << 19)) + if (phb->dma_window_base_cur + phb->dma_window_size > (1L << 31)) panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n"); + phb->dma_window_base_cur += phb->dma_window_size; + + /* Set the tce table size - measured in entries */ + tbl->it_size = phb->dma_window_size >> PAGE_SHIFT; + tbl->it_index = 0; - tbl->it_entrysize = sizeof(union tce_entry); tbl->it_blocksize = 16; tbl->it_type = TCE_PCI; } @@ -375,82 +290,174 @@ static void iommu_table_setparms(struct */ static void iommu_table_setparms_lpar(struct pci_controller *phb, struct device_node *dn, - struct iommu_table *tbl) + struct iommu_table *tbl, + unsigned int *dma_window) { - unsigned int *dma_window; - - dma_window = (unsigned int *)get_property(dn, "ibm,dma-window", NULL); - if (!dma_window) panic("iommu_table_setparms_lpar: device %s has no" " ibm,dma-window property!\n", dn->full_name); tbl->it_busno = dn->bussubno; - tbl->it_size = (((((unsigned long)dma_window[4] << 32) | - (unsigned long)dma_window[5]) >> PAGE_SHIFT) << 3) >> PAGE_SHIFT; - tbl->it_offset = ((((unsigned long)dma_window[2] << 32) | - (unsigned long)dma_window[3]) >> 12); + + /* TODO: Parse field size properties properly. 
*/ + tbl->it_size = (((unsigned long)dma_window[4] << 32) | + (unsigned long)dma_window[5]) >> PAGE_SHIFT; + tbl->it_offset = (((unsigned long)dma_window[2] << 32) | + (unsigned long)dma_window[3]) >> PAGE_SHIFT; tbl->it_base = 0; tbl->it_index = dma_window[0]; - tbl->it_entrysize = sizeof(union tce_entry); tbl->it_blocksize = 16; tbl->it_type = TCE_PCI; } +static void iommu_bus_setup_pSeries(struct pci_bus *bus) +{ + struct device_node *dn, *pdn; + + DBG("iommu_bus_setup_pSeries, bus %p, bus->self %p\n", bus, bus->self); + + /* For each (root) bus, we carve up the available DMA space in 256MB + * pieces. Since each piece is used by one (sub) bus/device, that would + * give a maximum of 7 devices per PHB. In most cases, this is plenty. + * + * The exception is on Python PHBs (pre-POWER4). Here we don't have EADS + * bridges below the PHB to allocate the sectioned tables to, so instead + * we allocate a 1GB table at the PHB level. + */ + + dn = pci_bus_to_OF_node(bus); + + if (!bus->self) { + /* Root bus */ + if (is_python(dn)) { + struct iommu_table *tbl; + + DBG("Python root bus %s\n", bus->name); + + /* 1GB window by default */ + dn->phb->dma_window_size = 1 << 30; + dn->phb->dma_window_base_cur = 0; + + tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL); + + iommu_table_setparms(dn->phb, dn, tbl); + dn->iommu_table = iommu_init_table(tbl); + } else { + /* 256 MB window by default */ + dn->phb->dma_window_size = 1 << 28; + /* always skip the first 256MB */ + dn->phb->dma_window_base_cur = 1 << 28; + + /* No table at PHB level for non-python PHBs */ + } + } else { + pdn = pci_bus_to_OF_node(bus->parent); + + if (!pdn->iommu_table) { + struct iommu_table *tbl; + /* First child, allocate new table (256MB window) */ + + tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL); + + iommu_table_setparms(dn->phb, dn, tbl); + + dn->iommu_table = iommu_init_table(tbl); + } else { + /* Lower than first child or under python, copy parent table */ + dn->iommu_table = 
pdn->iommu_table; + } + } +} + -void iommu_devnode_init_pSeries(struct device_node *dn) +static void iommu_bus_setup_pSeriesLP(struct pci_bus *bus) { struct iommu_table *tbl; + struct device_node *dn, *pdn; + unsigned int *dma_window = NULL; - tbl = (struct iommu_table *)kmalloc(sizeof(struct iommu_table), - GFP_KERNEL); - - if (systemcfg->platform == PLATFORM_PSERIES_LPAR) - iommu_table_setparms_lpar(dn->phb, dn, tbl); - else - iommu_table_setparms(dn->phb, dn, tbl); + dn = pci_bus_to_OF_node(bus); + + /* Find nearest ibm,dma-window, walking up the device tree */ + for (pdn = dn; pdn != NULL; pdn = pdn->parent) { + dma_window = (unsigned int *)get_property(pdn, "ibm,dma-window", NULL); + if (dma_window != NULL) + break; + } + + WARN_ON(dma_window == NULL); + + if (!pdn->iommu_table) { + /* Bussubno hasn't been copied yet. + * Do it now because iommu_table_setparms_lpar needs it. + */ + pdn->bussubno = bus->number; + + tbl = (struct iommu_table *)kmalloc(sizeof(struct iommu_table), + GFP_KERNEL); - dn->iommu_table = iommu_init_table(tbl); + iommu_table_setparms_lpar(pdn->phb, pdn, tbl, dma_window); + + pdn->iommu_table = iommu_init_table(tbl); + } + + if (pdn != dn) + dn->iommu_table = pdn->iommu_table; } -void iommu_setup_pSeries(void) + +static void iommu_dev_setup_pSeries(struct pci_dev *dev) { - struct pci_dev *dev = NULL; struct device_node *dn, *mydn; - if (systemcfg->platform == PLATFORM_PSERIES_LPAR) - iommu_buses_init_lpar(&pci_root_buses); - else - iommu_buses_init(); - - /* Now copy the iommu_table ptr from the bus devices down to every + DBG("iommu_dev_setup_pSeries, dev %p (%s)\n", dev, dev->pretty_name); + /* Now copy the iommu_table ptr from the bus device down to the * pci device_node. This means get_iommu_table() won't need to search * up the device tree to find it. 
*/ - for_each_pci_dev(dev) { - mydn = dn = pci_device_to_OF_node(dev); + mydn = dn = pci_device_to_OF_node(dev); - while (dn && dn->iommu_table == NULL) - dn = dn->parent; - if (dn) - mydn->iommu_table = dn->iommu_table; - } + while (dn && dn->iommu_table == NULL) + dn = dn->parent; + + WARN_ON(!dn); + + if (dn) + mydn->iommu_table = dn->iommu_table; } +static void iommu_bus_setup_null(struct pci_bus *b) { } +static void iommu_dev_setup_null(struct pci_dev *d) { } + /* These are called very early. */ -void tce_init_pSeries(void) +void iommu_init_early_pSeries(void) { - if (!(systemcfg->platform & PLATFORM_LPAR)) { + if (of_chosen && get_property(of_chosen, "linux,iommu-off", NULL)) { + /* Direct I/O, IOMMU off */ + ppc_md.iommu_dev_setup = iommu_dev_setup_null; + ppc_md.iommu_bus_setup = iommu_bus_setup_null; + pci_direct_iommu_init(); + + return; + } + + if (systemcfg->platform & PLATFORM_LPAR) { + if (cur_cpu_spec->firmware_features & FW_FEATURE_MULTITCE) { + ppc_md.tce_build = tce_buildmulti_pSeriesLP; + ppc_md.tce_free = tce_freemulti_pSeriesLP; + } else { + ppc_md.tce_build = tce_build_pSeriesLP; + ppc_md.tce_free = tce_free_pSeriesLP; + } + ppc_md.iommu_bus_setup = iommu_bus_setup_pSeriesLP; + } else { ppc_md.tce_build = tce_build_pSeries; ppc_md.tce_free = tce_free_pSeries; - } else if (cur_cpu_spec->firmware_features & FW_FEATURE_MULTITCE) { - ppc_md.tce_build = tce_buildmulti_pSeriesLP; - ppc_md.tce_free = tce_freemulti_pSeriesLP; - } else { - ppc_md.tce_build = tce_build_pSeriesLP; - ppc_md.tce_free = tce_free_pSeriesLP; + ppc_md.iommu_bus_setup = iommu_bus_setup_pSeries; } + ppc_md.iommu_dev_setup = iommu_dev_setup_pSeries; + pci_iommu_init(); } diff -puN arch/ppc64/kernel/u3_iommu.c~iommu-cleanup arch/ppc64/kernel/u3_iommu.c --- linux-2.5/arch/ppc64/kernel/u3_iommu.c~iommu-cleanup 2005-01-05 16:59:18.145163256 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/u3_iommu.c 2005-01-05 16:59:18.242148512 -0600 @@ -91,6 +91,7 @@ static unsigned int *dart; static 
unsigned int dart_emptyval; static struct iommu_table iommu_table_u3; +static int iommu_table_u3_inited; static int dart_dirty; #define DBG(...) @@ -192,7 +193,6 @@ static int dart_init(struct device_node unsigned int regword; unsigned int i; unsigned long tmp; - struct page *p; if (dart_tablebase == 0 || dart_tablesize == 0) { printk(KERN_INFO "U3-DART: table not allocated, using direct DMA\n"); @@ -209,16 +209,15 @@ static int dart_init(struct device_node * that to work around what looks like a problem with the HT bridge * prefetching into invalid pages and corrupting data */ - tmp = __get_free_pages(GFP_ATOMIC, 1); - if (tmp == 0) - panic("U3-DART: Cannot allocate spare page !"); - dart_emptyval = DARTMAP_VALID | - ((virt_to_abs(tmp) >> PAGE_SHIFT) & DARTMAP_RPNMASK); + tmp = lmb_alloc(PAGE_SIZE, PAGE_SIZE); + if (!tmp) + panic("U3-DART: Cannot allocate spare page!"); + dart_emptyval = DARTMAP_VALID | ((tmp >> PAGE_SHIFT) & DARTMAP_RPNMASK); /* Map in DART registers. FIXME: Use device node to get base address */ dart = ioremap(DART_BASE, 0x7000); if (dart == NULL) - panic("U3-DART: Cannot map registers !"); + panic("U3-DART: Cannot map registers!"); /* Set initial control register contents: table base, * table size and enable bit @@ -227,7 +226,6 @@ static int dart_init(struct device_node ((dart_tablebase >> PAGE_SHIFT) << DARTCNTL_BASE_SHIFT) | (((dart_tablesize >> PAGE_SHIFT) & DARTCNTL_SIZE_MASK) << DARTCNTL_SIZE_SHIFT); - p = virt_to_page(dart_tablebase); dart_vbase = ioremap(virt_to_abs(dart_tablebase), dart_tablesize); /* Fill initial table */ @@ -240,35 +238,67 @@ static int dart_init(struct device_node /* Invalidate DART to get rid of possible stale TLBs */ dart_tlb_invalidate_all(); + printk(KERN_INFO "U3/CPC925 DART IOMMU initialized\n"); + + return 0; +} + +static void iommu_table_u3_setup(void) +{ iommu_table_u3.it_busno = 0; - - /* Units of tce entries */ iommu_table_u3.it_offset = 0; - - /* Set the tce table size - measured in pages */ - 
iommu_table_u3.it_size = dart_tablesize >> PAGE_SHIFT; + /* it_size is in number of entries */ + iommu_table_u3.it_size = dart_tablesize / sizeof(u32); /* Initialize the common IOMMU code */ iommu_table_u3.it_base = (unsigned long)dart_vbase; iommu_table_u3.it_index = 0; iommu_table_u3.it_blocksize = 1; - iommu_table_u3.it_entrysize = sizeof(u32); iommu_init_table(&iommu_table_u3); /* Reserve the last page of the DART to avoid possible prefetch * past the DART mapped area */ - set_bit(iommu_table_u3.it_mapsize - 1, iommu_table_u3.it_map); + set_bit(iommu_table_u3.it_size - 1, iommu_table_u3.it_map); +} - printk(KERN_INFO "U3/CPC925 DART IOMMU initialized\n"); +static void iommu_dev_setup_u3(struct pci_dev *dev) +{ + struct device_node *dn; - return 0; + /* We only have one iommu table on the mac for now, which makes + * things simple. Setup all PCI devices to point to this table + * + * We must use pci_device_to_OF_node() to make sure that + * we get the real "final" pointer to the device in the + * pci_dev sysdata and not the temporary PHB one + */ + dn = pci_device_to_OF_node(dev); + + if (dn) + dn->iommu_table = &iommu_table_u3; +} + +static void iommu_bus_setup_u3(struct pci_bus *bus) +{ + struct device_node *dn; + + if (!iommu_table_u3_inited) { + iommu_table_u3_inited = 1; + iommu_table_u3_setup(); + } + + dn = pci_bus_to_OF_node(bus); + + if (dn) + dn->iommu_table = &iommu_table_u3; } -void iommu_setup_u3(void) +static void iommu_dev_setup_null(struct pci_dev *dev) { } +static void iommu_bus_setup_null(struct pci_bus *bus) { } + +void iommu_init_early_u3(void) { - struct pci_controller *phb, *tmp; - struct pci_dev *dev = NULL; struct device_node *dn; /* Find the DART in the device-tree */ @@ -282,31 +312,23 @@ void iommu_setup_u3(void) ppc_md.tce_flush = dart_flush; /* Initialize the DART HW */ - if (dart_init(dn)) - return; + if (dart_init(dn)) { + /* If init failed, use direct iommu and null setup functions */ + ppc_md.iommu_dev_setup = 
iommu_dev_setup_null; + ppc_md.iommu_bus_setup = iommu_bus_setup_null; + + /* Setup pci_dma ops */ + pci_direct_iommu_init(); + } else { + ppc_md.iommu_dev_setup = iommu_dev_setup_u3; + ppc_md.iommu_bus_setup = iommu_bus_setup_u3; - /* Setup pci_dma ops */ - pci_iommu_init(); - - /* We only have one iommu table on the mac for now, which makes - * things simple. Setup all PCI devices to point to this table - */ - for_each_pci_dev(dev) { - /* We must use pci_device_to_OF_node() to make sure that - * we get the real "final" pointer to the device in the - * pci_dev sysdata and not the temporary PHB one - */ - struct device_node *dn = pci_device_to_OF_node(dev); - if (dn) - dn->iommu_table = &iommu_table_u3; - } - /* We also make sure we set all PHBs ... */ - list_for_each_entry_safe(phb, tmp, &hose_list, list_node) { - dn = (struct device_node *)phb->arch_data; - dn->iommu_table = &iommu_table_u3; + /* Setup pci_dma ops */ + pci_iommu_init(); } } + void __init alloc_u3_dart_table(void) { /* Only reserve DART space if machine has more than 2GB of RAM diff -puN arch/ppc64/kernel/iSeries_iommu.c~iommu-cleanup arch/ppc64/kernel/iSeries_iommu.c --- linux-2.5/arch/ppc64/kernel/iSeries_iommu.c~iommu-cleanup 2005-01-05 16:59:18.149162648 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/iSeries_iommu.c 2005-01-05 16:59:18.243148360 -0600 @@ -132,11 +132,11 @@ static void iommu_table_getparms(struct if (parms->itc_size == 0) panic("PCI_DMA: parms->size is zero, parms is 0x%p", parms); - tbl->it_size = parms->itc_size; + /* itc_size is in pages worth of table, it_size is in # of entries */ + tbl->it_size = (parms->itc_size * PAGE_SIZE) / sizeof(union tce_entry); tbl->it_busno = parms->itc_busno; tbl->it_offset = parms->itc_offset; tbl->it_index = parms->itc_index; - tbl->it_entrysize = sizeof(union tce_entry); tbl->it_blocksize = 1; tbl->it_type = TCE_PCI; @@ -160,11 +160,16 @@ void iommu_devnode_init_iSeries(struct i kfree(tbl); } +static void iommu_dev_setup_iSeries(struct pci_dev 
*dev) { } +static void iommu_bus_setup_iSeries(struct pci_bus *bus) { } -void tce_init_iSeries(void) +void iommu_init_early_iSeries(void) { ppc_md.tce_build = tce_build_iSeries; ppc_md.tce_free = tce_free_iSeries; + ppc_md.iommu_dev_setup = iommu_dev_setup_iSeries; + ppc_md.iommu_bus_setup = iommu_bus_setup_iSeries; + pci_iommu_init(); } diff -puN drivers/pci/hotplug/rpaphp_pci.c~iommu-cleanup drivers/pci/hotplug/rpaphp_pci.c --- linux-2.5/drivers/pci/hotplug/rpaphp_pci.c~iommu-cleanup 2005-01-05 16:59:18.154161888 -0600 +++ linux-2.5-olof/drivers/pci/hotplug/rpaphp_pci.c 2005-01-05 16:59:18.245148056 -0600 @@ -25,6 +25,7 @@ #include #include #include +#include #include "../pci.h" /* for pci_add_new_bus */ #include "rpaphp.h" @@ -168,6 +169,9 @@ rpaphp_fixup_new_pci_devices(struct pci_ if (list_empty(&dev->global_list)) { int i; + /* Need to setup IOMMU tables */ + ppc_md.iommu_dev_setup(dev); + if(fix_bus) pcibios_fixup_device_resources(dev, bus); pci_read_irq_line(dev); diff -puN arch/ppc64/kernel/pSeries_pci.c~iommu-cleanup arch/ppc64/kernel/pSeries_pci.c --- linux-2.5/arch/ppc64/kernel/pSeries_pci.c~iommu-cleanup 2005-01-05 16:59:18.158161280 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_pci.c 2005-01-05 16:59:18.246147904 -0600 @@ -148,7 +148,7 @@ struct pci_ops rtas_pci_ops = { rtas_pci_write_config }; -static int is_python(struct device_node *dev) +int is_python(struct device_node *dev) { char *model = (char *)get_property(dev, "model", NULL); @@ -554,9 +554,6 @@ void __init pSeries_final_fixup(void) pSeries_request_regions(); pci_fix_bus_sysdata(); - if (!of_chosen || !get_property(of_chosen, "linux,iommu-off", NULL)) - iommu_setup_pSeries(); - pci_addr_cache_build(); } diff -puN arch/ppc64/kernel/prom.c~iommu-cleanup arch/ppc64/kernel/prom.c --- linux-2.5/arch/ppc64/kernel/prom.c~iommu-cleanup 2005-01-05 16:59:18.162160672 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/prom.c 2005-01-05 16:59:18.249147448 -0600 @@ -1743,17 +1743,6 @@ static int 
of_finish_dynamic_node(struct node->devfn = (regs[0] >> 8) & 0xff; } - /* fixing up iommu_table */ - -#ifdef CONFIG_PPC_PSERIES - if (strcmp(node->name, "pci") == 0 && - get_property(node, "ibm,dma-window", NULL)) { - node->bussubno = node->busno; - iommu_devnode_init_pSeries(node); - } else - node->iommu_table = parent->iommu_table; -#endif /* CONFIG_PPC_PSERIES */ - out: of_node_put(parent); return err; diff -puN include/asm-ppc64/pci-bridge.h~iommu-cleanup include/asm-ppc64/pci-bridge.h --- linux-2.5/include/asm-ppc64/pci-bridge.h~iommu-cleanup 2005-01-05 16:59:18.166160064 -0600 +++ linux-2.5-olof/include/asm-ppc64/pci-bridge.h 2005-01-05 16:59:18.250147296 -0600 @@ -79,6 +79,14 @@ static inline struct device_node *pci_de return fetch_dev_dn(dev); } +static inline struct device_node *pci_bus_to_OF_node(struct pci_bus *bus) +{ + if (bus->self) + return pci_device_to_OF_node(bus->self); + else + return bus->sysdata; /* Must be root bus (PHB) */ +} + extern void pci_process_bridge_OF_ranges(struct pci_controller *hose, struct device_node *dev); diff -puN include/asm-ppc64/iommu.h~iommu-cleanup include/asm-ppc64/iommu.h --- linux-2.5/include/asm-ppc64/iommu.h~iommu-cleanup 2005-01-05 16:59:18.170159456 -0600 +++ linux-2.5-olof/include/asm-ppc64/iommu.h 2005-01-05 16:59:18.252146992 -0600 @@ -69,18 +69,16 @@ union tce_entry { struct iommu_table { unsigned long it_busno; /* Bus number this table belongs to */ - unsigned long it_size; /* Size in pages of iommu table */ + unsigned long it_size; /* Size of iommu table in entries */ unsigned long it_offset; /* Offset into global table */ unsigned long it_base; /* mapped address of tce table */ unsigned long it_index; /* which iommu table this is */ unsigned long it_type; /* type: PCI or Virtual Bus */ - unsigned long it_entrysize; /* Size of an entry in bytes */ unsigned long it_blocksize; /* Entries in each block (cacheline) */ unsigned long it_hint; /* Hint for next alloc */ unsigned long it_largehint; /* Hint for 
large allocs */ unsigned long it_halfpoint; /* Breaking point for small/large allocs */ spinlock_t it_lock; /* Protects it_map */ - unsigned long it_mapsize; /* Size of map in # of entries (bits) */ unsigned long *it_map; /* A simple allocation bitmap for now */ }; @@ -156,14 +154,13 @@ extern dma_addr_t iommu_map_single(struc extern void iommu_unmap_single(struct iommu_table *tbl, dma_addr_t dma_handle, size_t size, enum dma_data_direction direction); -extern void tce_init_pSeries(void); -extern void tce_init_iSeries(void); +extern void iommu_init_early_pSeries(void); +extern void iommu_init_early_iSeries(void); +extern void iommu_init_early_u3(void); extern void pci_iommu_init(void); -extern void pci_dma_init_direct(void); +extern void pci_direct_iommu_init(void); extern void alloc_u3_dart_table(void); -extern int ppc64_iommu_off; - #endif /* _ASM_IOMMU_H */ diff -puN arch/ppc64/kernel/pSeries_setup.c~iommu-cleanup arch/ppc64/kernel/pSeries_setup.c --- linux-2.5/arch/ppc64/kernel/pSeries_setup.c~iommu-cleanup 2005-01-05 16:59:18.175158696 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_setup.c 2005-01-05 16:59:18.253146840 -0600 @@ -375,10 +375,7 @@ static void __init pSeries_init_early(vo } - if (iommu_off) - pci_dma_init_direct(); - else - tce_init_pSeries(); + iommu_init_early_pSeries(); pSeries_discover_pic(); diff -puN arch/ppc64/kernel/iSeries_setup.c~iommu-cleanup arch/ppc64/kernel/iSeries_setup.c --- linux-2.5/arch/ppc64/kernel/iSeries_setup.c~iommu-cleanup 2005-01-05 16:59:18.180157936 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c 2005-01-05 16:59:18.255146536 -0600 @@ -68,7 +68,6 @@ extern void hvlog(char *fmt, ...); /* Function Prototypes */ extern void ppcdbg_initialize(void); -extern void tce_init_iSeries(void); static void build_iSeries_Memory_Map(void); static void setup_iSeries_cache_sizes(void); @@ -344,7 +343,7 @@ static void __init iSeries_parse_cmdline /* * Initialize the DMA/TCE management */ - tce_init_iSeries(); + 
iommu_init_early_iSeries(); /* * Initialize the table which translate Linux physical addresses to diff -puN arch/ppc64/kernel/maple_pci.c~iommu-cleanup arch/ppc64/kernel/maple_pci.c --- linux-2.5/arch/ppc64/kernel/maple_pci.c~iommu-cleanup 2005-01-05 16:59:18.184157328 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/maple_pci.c 2005-01-05 16:59:18.257146232 -0600 @@ -385,9 +385,6 @@ void __init maple_pcibios_fixup(void) /* Fixup the pci_bus sysdata pointers */ pci_fix_bus_sysdata(); - /* Setup the iommu */ - iommu_setup_u3(); - DBG(" <- maple_pcibios_fixup\n"); } diff -puN arch/ppc64/kernel/pmac_pci.c~iommu-cleanup arch/ppc64/kernel/pmac_pci.c --- linux-2.5/arch/ppc64/kernel/pmac_pci.c~iommu-cleanup 2005-01-05 16:59:18.188156720 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pmac_pci.c 2005-01-05 16:59:18.258146080 -0600 @@ -666,8 +666,6 @@ void __init pmac_pcibios_fixup(void) pci_read_irq_line(dev); pci_fix_bus_sysdata(); - - iommu_setup_u3(); } static void __init pmac_fixup_phb_resources(void) diff -puN arch/ppc64/kernel/pmac_setup.c~iommu-cleanup arch/ppc64/kernel/pmac_setup.c --- linux-2.5/arch/ppc64/kernel/pmac_setup.c~iommu-cleanup 2005-01-05 16:59:18.194155808 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pmac_setup.c 2005-01-05 16:59:18.309138328 -0600 @@ -166,11 +166,6 @@ void __init pmac_setup_arch(void) pmac_setup_smp(); #endif - /* Setup the PCI DMA to "direct" by default. 
May be overriden - * by iommu later on - */ - pci_dma_init_direct(); - /* Lookup PCI hosts */ pmac_pci_init(); @@ -317,6 +312,8 @@ void __init pmac_init_early(void) /* Setup interrupt mapping options */ ppc64_interrupt_controller = IC_OPEN_PIC; + iommu_init_early_u3(); + DBG(" <- pmac_init_early\n"); } diff -puN arch/ppc64/kernel/maple_setup.c~iommu-cleanup arch/ppc64/kernel/maple_setup.c --- linux-2.5/arch/ppc64/kernel/maple_setup.c~iommu-cleanup 2005-01-05 16:59:18.199155048 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/maple_setup.c 2005-01-05 16:59:18.309138328 -0600 @@ -111,11 +111,6 @@ void __init maple_setup_arch(void) #ifdef CONFIG_SMP smp_ops = &maple_smp_ops; #endif - /* Setup the PCI DMA to "direct" by default. May be overriden - * by iommu later on - */ - pci_dma_init_direct(); - /* Lookup PCI hosts */ maple_pci_init(); @@ -159,6 +154,8 @@ static void __init maple_init_early(void /* Setup interrupt mapping options */ ppc64_interrupt_controller = IC_OPEN_PIC; + iommu_init_early_u3(); + DBG(" <- maple_init_early\n"); } diff -puN arch/ppc64/kernel/smp.c~iommu-cleanup arch/ppc64/kernel/smp.c diff -puN arch/ppc64/kernel/vio.c~iommu-cleanup arch/ppc64/kernel/vio.c --- linux-2.5/arch/ppc64/kernel/vio.c~iommu-cleanup 2005-01-05 16:59:18.207153832 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/vio.c 2005-01-05 16:59:18.311138024 -0600 @@ -158,6 +158,7 @@ void __init iommu_vio_init(void) struct iommu_table *t; struct iommu_table_cb cb; unsigned long cbp; + unsigned long itc_entries; cb.itc_busno = 255; /* Bus 255 is the virtual bus */ cb.itc_virtbus = 0xff; /* Ask for virtual bus */ @@ -165,12 +166,12 @@ void __init iommu_vio_init(void) cbp = virt_to_abs(&cb); HvCallXm_getTceTableParms(cbp); - veth_iommu_table.it_size = cb.itc_size / 2; + itc_entries = cb.itc_size * PAGE_SIZE / sizeof(union tce_entry); + veth_iommu_table.it_size = itc_entries / 2; veth_iommu_table.it_busno = cb.itc_busno; veth_iommu_table.it_offset = cb.itc_offset; veth_iommu_table.it_index = 
cb.itc_index; veth_iommu_table.it_type = TCE_VB; - veth_iommu_table.it_entrysize = sizeof(union tce_entry); veth_iommu_table.it_blocksize = 1; t = iommu_init_table(&veth_iommu_table); @@ -178,13 +179,12 @@ void __init iommu_vio_init(void) if (!t) printk("Virtual Bus VETH TCE table failed.\n"); - vio_iommu_table.it_size = cb.itc_size - veth_iommu_table.it_size; + vio_iommu_table.it_size = itc_entries - veth_iommu_table.it_size; vio_iommu_table.it_busno = cb.itc_busno; vio_iommu_table.it_offset = cb.itc_offset + - veth_iommu_table.it_size * (PAGE_SIZE/sizeof(union tce_entry)); + veth_iommu_table.it_size; vio_iommu_table.it_index = cb.itc_index; vio_iommu_table.it_type = TCE_VB; - vio_iommu_table.it_entrysize = sizeof(union tce_entry); vio_iommu_table.it_blocksize = 1; t = iommu_init_table(&vio_iommu_table); @@ -511,7 +511,6 @@ static struct iommu_table * vio_build_io unsigned int *dma_window; struct iommu_table *newTceTable; unsigned long offset; - unsigned long size; int dma_window_property_size; dma_window = (unsigned int *) get_property(dev->dev.platform_data, "ibm,my-dma-window", &dma_window_property_size); @@ -521,21 +520,18 @@ static struct iommu_table * vio_build_io newTceTable = (struct iommu_table *) kmalloc(sizeof(struct iommu_table), GFP_KERNEL); - size = ((dma_window[4] >> PAGE_SHIFT) << 3) >> PAGE_SHIFT; - /* There should be some code to extract the phys-encoded offset using prom_n_addr_cells(). 
However, according to a comment on earlier versions, it's always zero, so we don't bother */ offset = dma_window[1] >> PAGE_SHIFT; - /* TCE table size - measured in units of pages of tce table */ - newTceTable->it_size = size; + /* TCE table size - measured in tce entries */ + newTceTable->it_size = dma_window[4] >> PAGE_SHIFT; /* offset for VIO should always be 0 */ newTceTable->it_offset = offset; newTceTable->it_busno = 0; newTceTable->it_index = (unsigned long)dma_window[0]; newTceTable->it_type = TCE_VB; - newTceTable->it_entrysize = sizeof(union tce_entry); return iommu_init_table(newTceTable); } diff -puN arch/ppc64/kernel/iommu.c~iommu-cleanup arch/ppc64/kernel/iommu.c --- linux-2.5/arch/ppc64/kernel/iommu.c~iommu-cleanup 2005-01-05 16:59:18.211153224 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/iommu.c 2005-01-05 16:59:18.312137872 -0600 @@ -87,7 +87,7 @@ static unsigned long iommu_range_alloc(s start = largealloc ? tbl->it_largehint : tbl->it_hint; /* Use only half of the table for small allocs (15 pages or less) */ - limit = largealloc ? tbl->it_mapsize : tbl->it_halfpoint; + limit = largealloc ? tbl->it_size : tbl->it_halfpoint; if (largealloc && start < tbl->it_halfpoint) start = tbl->it_halfpoint; @@ -114,7 +114,7 @@ static unsigned long iommu_range_alloc(s * Second failure, rescan the other half of the table. */ start = (largealloc ^ pass) ? tbl->it_halfpoint : 0; - limit = pass ? tbl->it_mapsize : limit; + limit = pass ? 
tbl->it_size : limit; pass++; goto again; } else { @@ -194,7 +194,7 @@ static void __iommu_free(struct iommu_ta entry = dma_addr >> PAGE_SHIFT; free_entry = entry - tbl->it_offset; - if (((free_entry + npages) > tbl->it_mapsize) || + if (((free_entry + npages) > tbl->it_size) || (entry < tbl->it_offset)) { if (printk_ratelimit()) { printk(KERN_INFO "iommu_free: invalid entry\n"); @@ -202,7 +202,7 @@ static void __iommu_free(struct iommu_ta printk(KERN_INFO "\tdma_addr = 0x%lx\n", (u64)dma_addr); printk(KERN_INFO "\tTable = 0x%lx\n", (u64)tbl); printk(KERN_INFO "\tbus# = 0x%lx\n", (u64)tbl->it_busno); - printk(KERN_INFO "\tmapsize = 0x%lx\n", (u64)tbl->it_mapsize); + printk(KERN_INFO "\tsize = 0x%lx\n", (u64)tbl->it_size); printk(KERN_INFO "\tstartOff = 0x%lx\n", (u64)tbl->it_offset); printk(KERN_INFO "\tindex = 0x%lx\n", (u64)tbl->it_index); WARN_ON(1); @@ -407,14 +407,11 @@ struct iommu_table *iommu_init_table(str unsigned long sz; static int welcomed = 0; - /* it_size is in pages, it_mapsize in number of entries */ - tbl->it_mapsize = (tbl->it_size << PAGE_SHIFT) / tbl->it_entrysize; - /* Set aside 1/4 of the table for large allocations. 
*/ - tbl->it_halfpoint = tbl->it_mapsize * 3 / 4; + tbl->it_halfpoint = tbl->it_size * 3 / 4; /* number of bytes needed for the bitmap */ - sz = (tbl->it_mapsize + 7) >> 3; + sz = (tbl->it_size + 7) >> 3; tbl->it_map = (unsigned long *)__get_free_pages(GFP_ATOMIC, get_order(sz)); if (!tbl->it_map) @@ -448,8 +445,8 @@ void iommu_free_table(struct device_node } /* verify that table contains no entries */ - /* it_mapsize is in entries, and we're examining 64 at a time */ - for (i = 0; i < (tbl->it_mapsize/64); i++) { + /* it_size is in entries, and we're examining 64 at a time */ + for (i = 0; i < (tbl->it_size/64); i++) { if (tbl->it_map[i] != 0) { printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", __FUNCTION__, dn->full_name); @@ -458,7 +455,7 @@ void iommu_free_table(struct device_node } /* calculate bitmap size in bytes */ - bitmap_sz = (tbl->it_mapsize + 7) / 8; + bitmap_sz = (tbl->it_size + 7) / 8; /* free bitmap */ order = get_order(bitmap_sz); diff -puN arch/ppc64/kernel/pci_direct_iommu.c~iommu-cleanup arch/ppc64/kernel/pci_direct_iommu.c --- linux-2.5/arch/ppc64/kernel/pci_direct_iommu.c~iommu-cleanup 2005-01-05 16:59:18.215152616 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pci_direct_iommu.c 2005-01-05 16:59:18.312137872 -0600 @@ -78,7 +78,7 @@ static void pci_direct_unmap_sg(struct p { } -void __init pci_dma_init_direct(void) +void __init pci_direct_iommu_init(void) { pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent; pci_dma_ops.pci_free_consistent = pci_direct_free_consistent; diff -puN arch/ppc64/kernel/Makefile~iommu-cleanup arch/ppc64/kernel/Makefile --- linux-2.5/arch/ppc64/kernel/Makefile~iommu-cleanup 2005-01-05 16:59:18.219152008 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/Makefile 2005-01-05 16:59:18.313137720 -0600 @@ -16,7 +16,7 @@ obj-y := setup.o entry.o t obj-$(CONFIG_PPC_OF) += of_device.o pci-obj-$(CONFIG_PPC_ISERIES) += iSeries_pci.o iSeries_pci_reset.o -pci-obj-$(CONFIG_PPC_MULTIPLATFORM) += pci_dn.o pci_dma_direct.o 
+pci-obj-$(CONFIG_PPC_MULTIPLATFORM) += pci_dn.o pci_direct_iommu.o obj-$(CONFIG_PCI) += pci.o pci_iommu.o iomap.o $(pci-obj-y) _ From olof at austin.ibm.com Thu Jan 6 11:26:31 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 05 Jan 2005 18:26:31 -0600 Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c In-Reply-To: <20050105162409.5cc9087e.akpm@osdl.org> References: <20050106000721.GA20029@austin.ibm.com> <20050105162409.5cc9087e.akpm@osdl.org> Message-ID: <41DC85B7.1060000@austin.ibm.com> Andrew Morton wrote: >Olof Johansson wrote: > > >>This is part of the iommu cleanup, but broken out as a separate patch >> since for mainline, a BK rename is more appropriate. Still, we need a >> patch to apply for non-BK-based trees (-mm) >> >> > >It's not clear to me what this comment means. Is this patch for upstream >merging? > >bk is fairly good at detecting when a gnu patch is simply performing a >rename and will convert it into a `bk mv'. > Ah, I didn't know that it was that clever. If so, it's good for upstream merging. Otherwise I would have recommended a manual bk mv upstream; that's what the comment referred to. -Olof From akpm at osdl.org Thu Jan 6 11:24:09 2005 From: akpm at osdl.org (Andrew Morton) Date: Wed, 5 Jan 2005 16:24:09 -0800 Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c In-Reply-To: <20050106000721.GA20029@austin.ibm.com> References: <20050106000721.GA20029@austin.ibm.com> Message-ID: <20050105162409.5cc9087e.akpm@osdl.org> Olof Johansson wrote: > > This is part of the iommu cleanup, but broken out as a separate patch > since for mainline, a BK rename is more appropriate. Still, we need a > patch to apply for non-BK-based trees (-mm) It's not clear to me what this comment means. Is this patch for upstream merging? bk is fairly good at detecting when a gnu patch is simply performing a rename and will convert it into a `bk mv'.
From paulus at samba.org Thu Jan 6 14:14:44 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 6 Jan 2005 14:14:44 +1100 Subject: [PATCH] [PPC64] [1/2] IOMMU cleanups: rename pci_dma_direct.c In-Reply-To: <20050106000721.GA20029@austin.ibm.com> References: <20050106000721.GA20029@austin.ibm.com> Message-ID: <16860.44324.493730.587567@cargo.ozlabs.ibm.com> Olof Johansson writes: > This patch renames pci_dma_direct.c to pci_direct_iommu.c to comply with > the naming convention of the other iommu files. > > This is part of the iommu cleanup, but broken out as a separate patch > since for mainline, a BK rename is more appropriate. Still, we need a > patch to apply for non-BK-based trees (-mm) > > Signed-off-by: Olof Johansson Acked-by: Paul Mackerras From paulus at samba.org Thu Jan 6 14:15:12 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 6 Jan 2005 14:15:12 +1100 Subject: [PATCH] [PPC64] [2/2] IOMMU cleanups: Main cleanup patch In-Reply-To: <20050106000735.GA20079@austin.ibm.com> References: <20050106000735.GA20079@austin.ibm.com> Message-ID: <16860.44352.242290.624886@cargo.ozlabs.ibm.com> Olof Johansson writes: > Earlier cleanup efforts of the ppc64 IOMMU code have mostly been targeted > at simplifying the allocation schemes and modularising things for the > various platforms. The IOMMU init functions are still a mess. This is > an attempt to clean them up and make them somewhat easier to follow. ... > Signed-off-by: Olof Johansson Acked-by: Paul Mackerras From sfr at canb.auug.org.au Thu Jan 6 14:51:02 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Thu, 6 Jan 2005 14:51:02 +1100 Subject: [PATCH] htab code cleanup Message-ID: <20050106145102.0c3c60ad.sfr@canb.auug.org.au> Hi all,

This patch just does some small cleanups on the hash page table code:

- make htab_address static within hash_native.c
- move some code that depended on CONFIG_PPC_MULTIPLATFORM from hash_utils.c to hash_native.c (one less CONFIG check)
- clean up includes in hash_utils.c

-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk/arch/ppc64/kernel/iSeries_setup.c linus-bk-sfr.14/arch/ppc64/kernel/iSeries_setup.c --- linus-bk/arch/ppc64/kernel/iSeries_setup.c 2005-01-05 17:06:07.000000000 +1100 +++ linus-bk-sfr.14/arch/ppc64/kernel/iSeries_setup.c 2005-01-06 14:37:42.000000000 +1100 @@ -478,12 +478,6 @@ htab_hash_mask = num_ptegs - 1; /* - * The actual hashed page table is in the hypervisor, - * we have no direct access - */ - htab_address = NULL; - - /* * Determine if absolute memory has any * holes so that we can interpret the * access map we get back from the hypervisor diff -ruN linus-bk/arch/ppc64/kernel/setup.c linus-bk-sfr.14/arch/ppc64/kernel/setup.c --- linus-bk/arch/ppc64/kernel/setup.c 2005-01-05 17:06:07.000000000 +1100 +++ linus-bk-sfr.14/arch/ppc64/kernel/setup.c 2005-01-06 14:37:54.000000000 +1100 @@ -673,7 +673,6 @@ ppc64_caches.dline_size); printk("ppc64_caches.icache_line_size = 0x%x\n", ppc64_caches.iline_size); - printk("htab_address = 0x%p\n", htab_address); printk("htab_hash_mask = 0x%lx\n", htab_hash_mask); printk("-----------------------------------------------------\n"); diff -ruN linus-bk/arch/ppc64/mm/hash_native.c linus-bk-sfr.14/arch/ppc64/mm/hash_native.c --- linus-bk/arch/ppc64/mm/hash_native.c 2005-01-05 17:06:07.000000000 +1100 +++ linus-bk-sfr.14/arch/ppc64/mm/hash_native.c 2005-01-06 14:37:14.000000000 +1100 @@ -9,6 +9,7 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ +#include #include #include #include @@ -22,6 +23,15 @@ #include #include #include +#include + +#ifdef DEBUG +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...)
+#endif + +static HPTE *htab_address; #define HPTE_LOCK_BIT 3 @@ -410,6 +420,173 @@ } #endif +/* + * Note: pte --> Linux PTE + * HPTE --> PowerPC Hashed Page Table Entry + * + * Execution context: + * htab_initialize is called with the MMU off (of course), but + * the kernel has been copied down to zero so it can directly + * reference global data. At this point it is very difficult + * to print debug info. + * + */ + +#ifdef CONFIG_U3_DART +extern unsigned long dart_tablebase; +#endif /* CONFIG_U3_DART */ +extern unsigned long _SDR1; + +#define KB (1024) +#define MB (1024*KB) + +static inline void loop_forever(void) +{ + volatile unsigned long x = 1; + for(;x;x|=1) + ; +} + +static inline void create_pte_mapping(unsigned long start, unsigned long end, + unsigned long mode, int large) +{ + unsigned long addr; + unsigned int step; + + if (large) + step = 16*MB; + else + step = 4*KB; + + for (addr = start; addr < end; addr += step) { + unsigned long vpn, hash, hpteg; + unsigned long vsid = get_kernel_vsid(addr); + unsigned long va = (vsid << 28) | (addr & 0xfffffff); + int ret; + + if (large) + vpn = va >> HPAGE_SHIFT; + else + vpn = va >> PAGE_SHIFT; + + hash = hpt_hash(vpn, large); + + hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); + +#ifdef CONFIG_PPC_PSERIES + if (systemcfg->platform & PLATFORM_LPAR) + ret = pSeries_lpar_hpte_insert(hpteg, va, + virt_to_abs(addr) >> PAGE_SHIFT, + 0, mode, 1, large); + else +#endif /* CONFIG_PPC_PSERIES */ + ret = native_hpte_insert(hpteg, va, + virt_to_abs(addr) >> PAGE_SHIFT, + 0, mode, 1, large); + + if (ret == -1) { + ppc64_terminate_msg(0x20, "create_pte_mapping"); + loop_forever(); + } + } +} + +void __init htab_initialize(void) +{ + unsigned long table, htab_size_bytes; + unsigned long pteg_count; + unsigned long mode_rw; + int i, use_largepages = 0; + + DBG(" -> htab_initialize()\n"); + + /* + * Calculate the required size of the htab. We want the number of + * PTEGs to equal one half the number of real pages. 
+ */ + htab_size_bytes = 1UL << ppc64_pft_size; + pteg_count = htab_size_bytes >> 7; + + /* For debug, make the HTAB 1/8 as big as it normally would be. */ + ifppcdebug(PPCDBG_HTABSIZE) { + pteg_count >>= 3; + htab_size_bytes = pteg_count << 7; + } + + htab_hash_mask = pteg_count - 1; + + if (systemcfg->platform & PLATFORM_LPAR) { + /* Using a hypervisor which owns the htab */ + htab_address = NULL; + _SDR1 = 0; + } else { + /* Find storage for the HPT. Must be contiguous in + * the absolute address space. + */ + table = lmb_alloc(htab_size_bytes, htab_size_bytes); + + DBG("Hash table allocated at %lx, size: %lx\n", table, + htab_size_bytes); + + if ( !table ) { + ppc64_terminate_msg(0x20, "hpt space"); + loop_forever(); + } + htab_address = abs_to_virt(table); + + /* htab absolute addr + encoded htabsize */ + _SDR1 = table + __ilog2(pteg_count) - 11; + + /* Initialize the HPT with no entries */ + memset((void *)table, 0, htab_size_bytes); + } + + mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX; + + /* On U3 based machines, we need to reserve the DART area and + * _NOT_ map it to avoid cache paradoxes as it's remapped non + * cacheable later on + */ + if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + use_largepages = 1; + + /* create bolted the linear mapping in the hash table */ + for (i=0; i < lmb.memory.cnt; i++) { + unsigned long base, size; + + base = lmb.memory.region[i].physbase + KERNELBASE; + size = lmb.memory.region[i].size; + + DBG("creating mapping for region: %lx : %lx\n", base, size); + +#ifdef CONFIG_U3_DART + /* Do not map the DART space. Fortunately, it will be aligned + * in such a way that it will not cross two lmb regions and will + * fit within a single 16Mb page. + * The DART space is assumed to be a full 16Mb region even if we + * only use 2Mb of that space. We will use more of it later for + * AGP GART. We have to use a full 16Mb large page. 
+ */ + DBG("DART base: %lx\n", dart_tablebase); + + if (dart_tablebase != 0 && dart_tablebase >= base + && dart_tablebase < (base + size)) { + if (base != dart_tablebase) + create_pte_mapping(base, dart_tablebase, mode_rw, + use_largepages); + if ((base + size) > (dart_tablebase + 16*MB)) + create_pte_mapping(dart_tablebase + 16*MB, base + size, + mode_rw, use_largepages); + continue; + } +#endif /* CONFIG_U3_DART */ + create_pte_mapping(base, base + size, mode_rw, use_largepages); + } + DBG(" <- htab_initialize()\n"); +} +#undef KB +#undef MB + void hpte_init_native(void) { ppc_md.hpte_invalidate = native_hpte_invalidate; diff -ruN linus-bk/arch/ppc64/mm/hash_utils.c linus-bk-sfr.14/arch/ppc64/mm/hash_utils.c --- linus-bk/arch/ppc64/mm/hash_utils.c 2005-01-05 17:06:07.000000000 +1100 +++ linus-bk-sfr.14/arch/ppc64/mm/hash_utils.c 2005-01-06 14:37:27.000000000 +1100 @@ -17,220 +17,29 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ - -#undef DEBUG - -#include -#include -#include +#include +#include +#include #include -#include -#include -#include -#include -#include -#include +#include +#include +#include +#include #include -#include #include #include #include #include #include -#include #include -#include #include -#include -#include #include -#include -#include -#include #include -#include -#include - -#ifdef DEBUG -#define DBG(fmt...) udbg_printf(fmt) -#else -#define DBG(fmt...) -#endif - -/* - * Note: pte --> Linux PTE - * HPTE --> PowerPC Hashed Page Table Entry - * - * Execution context: - * htab_initialize is called with the MMU off (of course), but - * the kernel has been copied down to zero so it can directly - * reference global data. At this point it is very difficult - * to print debug info. 
- * - */ - -#ifdef CONFIG_U3_DART -extern unsigned long dart_tablebase; -#endif /* CONFIG_U3_DART */ +#include -HPTE *htab_address; unsigned long htab_hash_mask; -extern unsigned long _SDR1; - -#define KB (1024) -#define MB (1024*KB) - -static inline void loop_forever(void) -{ - volatile unsigned long x = 1; - for(;x;x|=1) - ; -} - -#ifdef CONFIG_PPC_MULTIPLATFORM -static inline void create_pte_mapping(unsigned long start, unsigned long end, - unsigned long mode, int large) -{ - unsigned long addr; - unsigned int step; - - if (large) - step = 16*MB; - else - step = 4*KB; - - for (addr = start; addr < end; addr += step) { - unsigned long vpn, hash, hpteg; - unsigned long vsid = get_kernel_vsid(addr); - unsigned long va = (vsid << 28) | (addr & 0xfffffff); - int ret; - - if (large) - vpn = va >> HPAGE_SHIFT; - else - vpn = va >> PAGE_SHIFT; - - hash = hpt_hash(vpn, large); - - hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); - -#ifdef CONFIG_PPC_PSERIES - if (systemcfg->platform & PLATFORM_LPAR) - ret = pSeries_lpar_hpte_insert(hpteg, va, - virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); - else -#endif /* CONFIG_PPC_PSERIES */ - ret = native_hpte_insert(hpteg, va, - virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); - - if (ret == -1) { - ppc64_terminate_msg(0x20, "create_pte_mapping"); - loop_forever(); - } - } -} - -void __init htab_initialize(void) -{ - unsigned long table, htab_size_bytes; - unsigned long pteg_count; - unsigned long mode_rw; - int i, use_largepages = 0; - - DBG(" -> htab_initialize()\n"); - - /* - * Calculate the required size of the htab. We want the number of - * PTEGs to equal one half the number of real pages. - */ - htab_size_bytes = 1UL << ppc64_pft_size; - pteg_count = htab_size_bytes >> 7; - - /* For debug, make the HTAB 1/8 as big as it normally would be. 
*/ - ifppcdebug(PPCDBG_HTABSIZE) { - pteg_count >>= 3; - htab_size_bytes = pteg_count << 7; - } - - htab_hash_mask = pteg_count - 1; - - if (systemcfg->platform & PLATFORM_LPAR) { - /* Using a hypervisor which owns the htab */ - htab_address = NULL; - _SDR1 = 0; - } else { - /* Find storage for the HPT. Must be contiguous in - * the absolute address space. - */ - table = lmb_alloc(htab_size_bytes, htab_size_bytes); - - DBG("Hash table allocated at %lx, size: %lx\n", table, - htab_size_bytes); - - if ( !table ) { - ppc64_terminate_msg(0x20, "hpt space"); - loop_forever(); - } - htab_address = abs_to_virt(table); - - /* htab absolute addr + encoded htabsize */ - _SDR1 = table + __ilog2(pteg_count) - 11; - - /* Initialize the HPT with no entries */ - memset((void *)table, 0, htab_size_bytes); - } - - mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX; - - /* On U3 based machines, we need to reserve the DART area and - * _NOT_ map it to avoid cache paradoxes as it's remapped non - * cacheable later on - */ - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) - use_largepages = 1; - - /* create bolted the linear mapping in the hash table */ - for (i=0; i < lmb.memory.cnt; i++) { - unsigned long base, size; - - base = lmb.memory.region[i].physbase + KERNELBASE; - size = lmb.memory.region[i].size; - - DBG("creating mapping for region: %lx : %lx\n", base, size); - -#ifdef CONFIG_U3_DART - /* Do not map the DART space. Fortunately, it will be aligned - * in such a way that it will not cross two lmb regions and will - * fit within a single 16Mb page. - * The DART space is assumed to be a full 16Mb region even if we - * only use 2Mb of that space. We will use more of it later for - * AGP GART. We have to use a full 16Mb large page. 
- */ - DBG("DART base: %lx\n", dart_tablebase); - - if (dart_tablebase != 0 && dart_tablebase >= base - && dart_tablebase < (base + size)) { - if (base != dart_tablebase) - create_pte_mapping(base, dart_tablebase, mode_rw, - use_largepages); - if ((base + size) > (dart_tablebase + 16*MB)) - create_pte_mapping(dart_tablebase + 16*MB, base + size, - mode_rw, use_largepages); - continue; - } -#endif /* CONFIG_U3_DART */ - create_pte_mapping(base, base + size, mode_rw, use_largepages); - } - DBG(" <- htab_initialize()\n"); -} -#undef KB -#undef MB -#endif /* CONFIG_PPC_MULTIPLATFORM */ - /* * Called by asm hashtable.S for doing lazy icache flush */ diff -ruN linus-bk/include/asm-ppc64/mmu.h linus-bk-sfr.14/include/asm-ppc64/mmu.h --- linus-bk/include/asm-ppc64/mmu.h 2005-01-05 17:06:08.000000000 +1100 +++ linus-bk-sfr.14/include/asm-ppc64/mmu.h 2005-01-06 14:36:16.000000000 +1100 @@ -98,7 +98,6 @@ #define PP_RXRX 3 /* Supervisor read, User read */ -extern HPTE * htab_address; extern unsigned long htab_hash_mask; static inline unsigned long hpt_hash(unsigned long vpn, int large) From WJEEHA at pk.ibm.com Thu Jan 6 22:48:27 2005 From: WJEEHA at pk.ibm.com (Wjeeha Tahir) Date: Thu, 6 Jan 2005 16:48:27 +0500 Subject: IBM 6400 Printer Driver Message-ID: Hi, I need the drivers for IBM 6400 Line Printer for RedHat Linux 9 and any configuration/installation document (if possible). I am hoping that this forum would help me find the desired. Thanks in advance, Kind Regards, Wjeeha Tahir
From anton at samba.org Thu Jan 6 23:19:24 2005 From: anton at samba.org (Anton Blanchard) Date: Thu, 6 Jan 2005 23:19:24 +1100 Subject: [PATCH] xmon breakpoints fix for Power4/5 In-Reply-To: <20050105084202.5102b467@localhost> References: <20050104143031.62c25338@localhost> <16859.11390.511469.875831@cargo.ozlabs.ibm.com> <20050105084202.5102b467@localhost> Message-ID: <20050106121924.GA14239@krispykreme.ozlabs.ibm.com> > I may have misunderstood what Anton wanted when I talked w/ him > yesterday, but I was under the impression that he wanted 'bi' and 'bd' > fixed for Power4/5/LPAR. Yep sorry, my fault. I was interested in the data breakpoint stuff you had written that went through the hypervisor. Anton From haveblue at us.ibm.com Fri Jan 7 03:45:09 2005 From: haveblue at us.ibm.com (Dave Hansen) Date: Thu, 06 Jan 2005 08:45:09 -0800 Subject: IBM 6400 Printer Driver In-Reply-To: References: Message-ID: <1105029909.6932.2.camel@localhost> On Thu, 2005-01-06 at 16:48 +0500, Wjeeha Tahir wrote: > I need the drivers for IBM 6400 Line Printer for RedHat Linux 9 and > any configuration/installation document (if possible). I am hoping > that this forum would help me find the desired. Thanks in advance, Does the printer have a ppc64 chip in it and run Linux? -- Dave From hch at lst.de Fri Jan 7 03:47:19 2005 From: hch at lst.de (Christoph Hellwig) Date: Thu, 6 Jan 2005 17:47:19 +0100 Subject: [PATCH] fix pktcdvd linking on ppc64 Message-ID: <20050106164719.GA24751@lst.de> clear_page uses ppc64_caches so it needs to be exported.
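Why an export is needed here is, presumably, that ppc64's clear_page() at this point was a static inline in include/asm-ppc64/page.h that reads the cache geometry from ppc64_caches and zeroes the page one data-cache line at a time with dcbz. This is an assumption based on the 2.6-era headers; the mail does not say so. Because an inline body is expanded into every caller, a module such as pktcdvd would end up referencing ppc64_caches directly, and a module can only resolve symbols that the kernel exports. What follows is a minimal userspace sketch of that loop. The field names dline_size and dlines_per_page are assumed, memset() stands in for dcbz, and struct cache_info and clear_page_sketch() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct ppc64_caches; only the two
 * fields a clear_page()-style loop needs are modelled (names assumed). */
struct cache_info {
	unsigned long dline_size;      /* bytes per data-cache line */
	unsigned long dlines_per_page; /* data-cache lines per page */
};

/* The kernel fills its global in at boot. Here it is initialised
 * statically for a 4 KiB page with 128-byte lines (assumed values). */
struct cache_info caches = { 128, 4096 / 128 };

/* Zero one page, one cache line at a time. The ppc64 code issues one
 * dcbz per line; memset() is a portable stand-in for that instruction.
 * Any caller of this function depends on the global "caches", which is
 * why a module using an inline version needs the symbol exported. */
static void clear_page_sketch(void *addr)
{
	unsigned char *p = addr;
	unsigned long i;

	for (i = 0; i < caches.dlines_per_page; i++) {
		memset(p, 0, caches.dline_size);
		p += caches.dline_size;
	}
}
```

Reading the geometry at run time rather than hard-coding 128-byte lines lets the same kernel image adapt to processors with different cache line sizes. The cost is the symbol dependency that the patch below resolves with EXPORT_SYMBOL_GPL.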
--- 1.99/arch/ppc64/kernel/setup.c 2005-01-05 03:48:16 +01:00 +++ edited/arch/ppc64/kernel/setup.c 2005-01-06 17:51:19 +01:00 @@ -116,6 +116,7 @@ u64 ppc64_debug_switch; struct ppc64_caches ppc64_caches; +EXPORT_SYMBOL_GPL(ppc64_caches); /* * These are used in binfmt_elf.c to put aux entries on the stack From markus at unixforces.net Fri Jan 7 04:55:01 2005 From: markus at unixforces.net (Markus Rothe) Date: Thu, 6 Jan 2005 18:55:01 +0100 Subject: Problems using Apple LCD with 2.6.10 Message-ID: <20050106175501.GA11534@unixforces.net> Hi, I'm not sure if this is the correct place for such mails, but I didn't find another place to post my problem. My problem is that my LCD doesn't work correctly with the latest (2.6.10) kernel. It's an Apple Cinema Display connected through the Apple Display Connector (ADC). The problem is that there are many "blue lightnings" all over the display. By "blue lightning" I mean a small set of pixels that turn light blue for about half a second. My display also flickers from time to time. Both happen in console mode and when I run Xorg. This problem is definitely related to the kernel, as it does not occur with kernel 2.6.9. Markus From linas at austin.ibm.com Fri Jan 7 06:24:13 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 6 Jan 2005 13:24:13 -0600 Subject: [PATCH] PPC64: EEH Recovery Message-ID: <20050106192413.GK22274@austin.ibm.com> Hi Paul, The patch below implements hotplug-style EEH error recovery. It's split into two pieces: one that needs to be applied to the PPC64 arch tree, and one that needs to be applied to the RPA PHP hotplug tree. The PPC64 part needs to go in first.
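The slot-recovery policy in the rpaphp half of this patch (handle_eeh_events()) comes down to a simple rule. The adapter's freezes are counted. Once the count exceeds EEH_MAX_ALLOWED_FREEZES (5 in the patch), the slot is left disabled. Otherwise, the slot is reset, the saved config space is restored, and the slot is re-enabled. Below is a simplified userspace model of that decision only. struct adapter, handle_freeze() and enum eeh_action are hypothetical names. Where the real code makes RTAS calls and hotplug operations, this sketch only flips a flag. Unlike the patch, which writes the count back only on the re-enable path, the model always keeps the incremented count.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ALLOWED_FREEZES 5 /* mirrors EEH_MAX_ALLOWED_FREEZES in the patch */

/* Hypothetical per-adapter state. The patch keeps the count in the
 * adapter's device_node (eeh_freeze_count), so a replaced card starts
 * again from zero. */
struct adapter {
	int freeze_count;
	bool enabled;
};

enum eeh_action { EEH_RESET_AND_REENABLE, EEH_DISABLE_PERMANENTLY };

/* Called once per slot-freeze event. The driver is unplugged first, and
 * the updated count then decides whether the slot is brought back. */
static enum eeh_action handle_freeze(struct adapter *a)
{
	a->enabled = false;  /* models rpaphp_unconfig_pci_adapter() */
	a->freeze_count++;

	if (a->freeze_count > MAX_ALLOWED_FREEZES)
		return EEH_DISABLE_PERMANENTLY;  /* leave the slot unplugged */

	/* Models: assert PCI RST#, let firmware configure bridges, restore
	 * the saved BARs, then re-enable the slot so drivers probe again. */
	a->enabled = true;
	return EEH_RESET_AND_REENABLE;
}
```

With this rule, an adapter that freezes five times is recovered five times and is disabled on the sixth freeze. According to the comment in the patch, most field failures come from poorly seated cards, so recovering a few times before giving up is a reasonable trade-off.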
Assuming this doesn't generate a round of discussion, please forward upstream to akpm/torvalds. Signed-off-by: Linas Vepstas -------------- next part -------------- ===== arch/ppc64/kernel/eeh.c 1.41 vs edited ===== --- 1.41/arch/ppc64/kernel/eeh.c 2005-01-06 13:05:42 -06:00 +++ edited/arch/ppc64/kernel/eeh.c 2005-01-06 13:08:03 -06:00 @@ -17,21 +17,19 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#include +#include #include #include -#include #include #include #include #include #include -#include +#include #include #include #include #include -#include #include "pci.h" #undef DEBUG @@ -89,7 +87,6 @@ static struct notifier_block *eeh_notifi * attempts we allow before panicking. */ #define EEH_MAX_FAILS 1000 -static atomic_t eeh_fail_count; /* RTAS tokens */ static int ibm_set_eeh_option; @@ -106,6 +103,10 @@ static spinlock_t slot_errbuf_lock = SPI static int eeh_error_buf_size; /* System monitoring statistics */ +static DEFINE_PER_CPU(unsigned long, no_device); +static DEFINE_PER_CPU(unsigned long, no_dn); +static DEFINE_PER_CPU(unsigned long, no_cfg_addr); +static DEFINE_PER_CPU(unsigned long, ignored_check); static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); static DEFINE_PER_CPU(unsigned long, false_positives); static DEFINE_PER_CPU(unsigned long, ignored_failures); @@ -224,9 +225,9 @@ pci_addr_cache_insert(struct pci_dev *de while (*p) { parent = *p; piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (alo < piar->addr_lo) { + if (ahi < piar->addr_lo) { p = &parent->rb_left; - } else if (ahi > piar->addr_hi) { + } else if (alo > piar->addr_hi) { p = &parent->rb_right; } else { if (dev != piar->pcidev || @@ -244,6 +245,11 @@ pci_addr_cache_insert(struct pci_dev *de piar->addr_hi = ahi; piar->pcidev = dev; piar->flags = flags; + +#ifdef DEBUG + printk (KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif rb_link_node(&piar->rb_node, parent, p); rb_insert_color(&piar->rb_node, 
&pci_io_addr_cache_root.rb_root); @@ -368,6 +374,7 @@ void pci_addr_cache_remove_device(struct */ void __init pci_addr_cache_build(void) { + struct device_node *dn; struct pci_dev *dev = NULL; spin_lock_init(&pci_io_addr_cache_root.piar_lock); @@ -378,6 +385,14 @@ void __init pci_addr_cache_build(void) continue; } pci_addr_cache_insert_device(dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + if (dn) { + int i; + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); + } } #ifdef DEBUG @@ -389,6 +404,32 @@ void __init pci_addr_cache_build(void) /* --------------------------------------------------------------- */ /* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ +void eeh_slot_error_detail (struct device_node *dn, int severity) +{ + unsigned long flags; + int rc; + + if (!dn) return; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); +} + +EXPORT_SYMBOL(eeh_slot_error_detail); + /** * eeh_register_notifier - Register to find out about EEH events. 
* @nb: notifier block to callback on events @@ -484,11 +525,9 @@ static void eeh_event_handler(void *dumm "%s %s\n", event->reset_state, pci_name(event->dev), pci_pretty_name(event->dev)); - atomic_set(&eeh_fail_count, 0); - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); - __get_cpu_var(slot_resets)++; + notifier_call_chain (&eeh_notifier_chain, + EEH_NOTIFY_FREEZE, event); pci_dev_put(event->dev); kfree(event); @@ -496,8 +535,8 @@ static void eeh_event_handler(void *dumm } /** - * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xE.... + * eeh_token_to_phys - convert I/O address to phys address + * @token i/o address, should be address in the form 0xA.... */ static inline unsigned long eeh_token_to_phys(unsigned long token) { @@ -512,6 +551,17 @@ static inline unsigned long eeh_token_to return pa | (token & (PAGE_SIZE-1)); } +static inline struct pci_dev * eeh_get_pci_dev(struct device_node *dn) +{ + struct pci_dev *dev = NULL; + + for_each_pci_dev(dev) { + if (pci_device_to_OF_node(dev) == dn) + return dev; + } + return NULL; +} + /** * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze * @dn device node @@ -532,7 +582,7 @@ int eeh_dn_check_failure(struct device_n int ret; int rets[3]; unsigned long flags; - int rc, reset_state; + int reset_state; struct eeh_event *event; __get_cpu_var(total_mmio_ffs)++; @@ -540,16 +590,20 @@ int eeh_dn_check_failure(struct device_n if (!eeh_subsystem_enabled) return 0; - if (!dn) + if (!dn) { + __get_cpu_var(no_dn)++; return 0; + } /* Access to IO BARs might get this far and still not want checking. */ if (!(dn->eeh_mode & EEH_MODE_SUPPORTED) || dn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; return 0; } if (!dn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; return 0; } @@ -558,8 +612,9 @@ int eeh_dn_check_failure(struct device_n * slot, we know it's bad already, we don't need to check... 
*/ if (dn->eeh_mode & EEH_MODE_ISOLATED) { - atomic_inc(&eeh_fail_count); - if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { + dn->eeh_freeze_count ++; + if (dn->eeh_freeze_count >= EEH_MAX_FAILS) { + dump_stack(); /* re-read the slot reset state */ if (read_slot_reset_state(dn, rets) != 0) rets[0] = -1; /* reset state unknown */ @@ -581,34 +636,25 @@ int eeh_dn_check_failure(struct device_n return 0; } - /* prevent repeated reports of this failure */ + /* Prevent repeated reports of this failure */ dn->eeh_mode |= EEH_MODE_ISOLATED; reset_state = rets[0]; + /* Log the error with the rtas logger */ + if (dn->eeh_freeze_count < EEH_MAX_ALLOWED_FREEZES) { + eeh_slot_error_detail (dn, 1 /* Temporary Error */); + } else { + eeh_slot_error_detail (dn, 2 /* Permanent Error */); + } - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, dn->eeh_config_addr, - BUID_HI(dn->phb->buid), - BUID_LO(dn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - 1 /* Temporary Error */); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); - - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); event = kmalloc(sizeof(*event), GFP_ATOMIC); if (event == NULL) { - eeh_panic(dev, reset_state); + printk (KERN_ERR "EEH: out of memory, event not handled\n"); return 1; } + if (!dev) + dev = eeh_get_pci_dev (dn); event->dev = dev; event->dn = dn; event->reset_state = reset_state; @@ -634,7 +680,6 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * @token i/o token, should be address in the form 0xA.... * @val value, should be all 1's (XXX why do we need this arg??) * - * Check for an eeh failure at the given token address. * Check for an EEH failure at the given token address. 
Call this * routine if the result of a read was all 0xff's and you want to * find out if this is due to an EEH slot freeze event. This routine @@ -642,6 +687,7 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * * Note this routine is safe to call in an interrupt context. */ + unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) { unsigned long addr; @@ -651,8 +697,10 @@ unsigned long eeh_check_failure(const vo /* Finding the phys addr + pci device; this is pretty quick. */ addr = eeh_token_to_phys((unsigned long __force) token); dev = pci_get_device_by_addr(addr); - if (!dev) + if (!dev) { + __get_cpu_var(no_device)++; return val; + } dn = pci_device_to_OF_node(dev); eeh_dn_check_failure (dn, dev); @@ -663,6 +711,172 @@ unsigned long eeh_check_failure(const vo EXPORT_SYMBOL(eeh_check_failure); +/* ------------------------------------------------------------- */ +/* The code below deals with error recovery */ + +void +rtas_set_slot_reset(struct device_node *dn) +{ + int token = rtas_token ("ibm,set-slot-reset"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,4,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), + 1); + if (rc) { + printk (KERN_WARNING "EEH: Unable to reset the failed slot\n"); + return; + } + + /* The PCI bus requires that the reset be held high for at least + * a 100 milliseconds. We wait a bit longer 'just in case'. 
+ */ + msleep (200); + + rc = rtas_call(token,4,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), + 0); +} + +EXPORT_SYMBOL(rtas_set_slot_reset); + +void +rtas_configure_bridge(struct device_node *dn) +{ + int token = rtas_token ("ibm,configure-bridge"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,3,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid)); + if (rc) { + printk (KERN_WARNING "EEH: Unable to configure device bridge\n"); + } +} + +EXPORT_SYMBOL(rtas_configure_bridge); + +/* ------------------------------------------------------- */ +/** Save and restore of PCI BARs + * + * Although firmware will set up BARs during boot, it doesn't + * set up device BAR's after a device reset, although it will, + * if requested, set up bridge configuration. Thus, we need to + * configure the PCI devices ourselves. Config-space setup is + * stored in the PCI structures which are normally deleted during + * device removal. Thus, the "save" routine references the + * structures so that they aren't deleted. 
+ */ + + +struct eeh_cfg_tree +{ + struct eeh_cfg_tree *sibling; + struct eeh_cfg_tree *child; + struct device_node *dn; + int is_bridge; +}; + +/** + * eeh_save_bars - save the PCI config space info + */ +struct eeh_cfg_tree * eeh_save_bars(struct device_node *dn) +{ + struct pci_dev *dev; + struct eeh_cfg_tree *cnode; + + dev = eeh_get_pci_dev(dn); + if (!dev) + return NULL; + + cnode = kmalloc(sizeof(struct eeh_cfg_tree), GFP_KERNEL); + if (!cnode) + return NULL; + + cnode->is_bridge = 0; + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + cnode->is_bridge = 1; + + of_node_get(dn); + cnode->dn = dn; + + cnode->sibling = NULL; + cnode->child = NULL; + + if (dn->child) { + cnode->child = eeh_save_bars (dn->child); + } + if (dn->sibling) { + cnode->sibling = eeh_save_bars (dn->sibling); + } + + return cnode; +} +EXPORT_SYMBOL(eeh_save_bars); + +/** + * __restore_bars - Restore the Base Address Registers + * Loads the PCI configuration space base address registers, + * the expansion ROM base address, the latency timer, and etc. + * from the saved values in the device node. 
+ */ +static inline void __restore_bars (struct device_node *dn) +{ + int i; + for (i=4; i<10; i++) { + rtas_write_config(dn, i*4, 4, dn->config_space[i]); + } + + /* 12 == Expansion ROM Address */ + rtas_write_config(dn, 12*4, 4, dn->config_space[12]); + +#define SAVED_BYTE(OFF) (((u8 *)(dn->config_space))[OFF]) + + rtas_write_config (dn, PCI_CACHE_LINE_SIZE, 1, + SAVED_BYTE(PCI_CACHE_LINE_SIZE)); + + rtas_write_config (dn, PCI_LATENCY_TIMER, 1, + SAVED_BYTE(PCI_LATENCY_TIMER)); + + rtas_write_config (dn, PCI_INTERRUPT_LINE, 1, + SAVED_BYTE(PCI_INTERRUPT_LINE)); +} + +/** + * eeh_restore_bars - restore the PCI config space info + */ +void eeh_restore_bars(struct eeh_cfg_tree *tree) +{ + if (!(tree->is_bridge)) + __restore_bars (tree->dn); + + if (tree->child) + eeh_restore_bars (tree->child); + + if (tree->sibling) + eeh_restore_bars (tree->sibling); + + of_node_put (tree->dn); + kfree (tree); +} +EXPORT_SYMBOL(eeh_restore_bars); + +/* ------------------------------------------------------------- */ +/* The code below deals with enabling EEH for devices during the + * early boot sequence. EEH must be enabled before any PCI probing + * can be done. 
+ */ + struct eeh_early_enable_info { unsigned int buid_hi; unsigned int buid_lo; @@ -829,7 +1043,9 @@ void eeh_add_device_early(struct device_ return; phb = dn->phb; if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none\n"); + printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", + dn->full_name); + dump_stack(); return; } @@ -848,6 +1064,9 @@ EXPORT_SYMBOL(eeh_add_device_early); */ void eeh_add_device_late(struct pci_dev *dev) { + int i; + struct device_node *dn; + if (!dev || !eeh_subsystem_enabled) return; @@ -857,6 +1076,11 @@ void eeh_add_device_late(struct pci_dev #endif pci_addr_cache_insert_device (dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); } EXPORT_SYMBOL(eeh_add_device_late); @@ -886,12 +1110,17 @@ static int proc_eeh_show(struct seq_file unsigned int cpu; unsigned long ffs = 0, positives = 0, failures = 0; unsigned long resets = 0; + unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; for_each_cpu(cpu) { ffs += per_cpu(total_mmio_ffs, cpu); positives += per_cpu(false_positives, cpu); failures += per_cpu(ignored_failures, cpu); resets += per_cpu(slot_resets, cpu); + no_dev += per_cpu(no_device, cpu); + no_dn += per_cpu(no_dn, cpu); + no_cfg += per_cpu(no_cfg_addr, cpu); + no_check += per_cpu(ignored_check, cpu); } if (0 == eeh_subsystem_enabled) { @@ -899,13 +1128,17 @@ static int proc_eeh_show(struct seq_file seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); } else { seq_printf(m, "EEH Subsystem is enabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n" + seq_printf(m, + "no device=%ld\n" + "no device node=%ld\n" + "no config address=%ld\n" + "check not wanted=%ld\n" + "eeh_total_mmio_ffs=%ld\n" "eeh_false_positives=%ld\n" "eeh_ignored_failures=%ld\n" - "eeh_slot_resets=%ld\n" - "eeh_fail_count=%d\n", - ffs, positives, failures, resets, - 
eeh_fail_count.counter); + "eeh_slot_resets=%ld\n", + no_dev, no_dn, no_cfg, no_check, + ffs, positives, failures, resets); } return 0; ===== arch/ppc64/kernel/pSeries_pci.c 1.59 vs edited ===== --- 1.59/arch/ppc64/kernel/pSeries_pci.c 2004-11-15 21:29:10 -06:00 +++ edited/arch/ppc64/kernel/pSeries_pci.c 2005-01-05 13:41:09 -06:00 @@ -102,7 +102,7 @@ static int rtas_pci_read_config(struct p return PCIBIOS_DEVICE_NOT_FOUND; } -static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +int rtas_write_config(struct device_node *dn, int where, int size, u32 val) { unsigned long buid, addr; int ret; ===== include/asm-ppc64/eeh.h 1.23 vs edited ===== --- 1.23/include/asm-ppc64/eeh.h 2004-10-25 18:17:38 -05:00 +++ edited/include/asm-ppc64/eeh.h 2005-01-05 13:47:55 -06:00 @@ -22,8 +22,8 @@ #include #include -#include #include +#include struct pci_dev; struct device_node; @@ -33,6 +33,10 @@ struct device_node; #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) +/* Max number of EEH freezes allowed before we consider the device + * to be permanently disabled. */ +#define EEH_MAX_ALLOWED_FREEZES 5 + #ifdef CONFIG_PPC_PSERIES extern void __init eeh_init(void); unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val); @@ -57,6 +61,34 @@ void eeh_add_device_early(struct device_ void eeh_add_device_late(struct pci_dev *); /** + * eeh_slot_error_detail -- record and EEH error condition to the log + * @severity: 1 if temporary, 2 if permanent failure. + * + * Obtains the the EEH error details from the RTAS subsystem, + * and then logs these details with the RTAS error log system. + */ +void eeh_slot_error_detail (struct device_node *dn, int severity); + +/** + * rtas_set_slot_reset -- unfreeze a frozen slot + * + * Clear the EEH-frozen condition on a slot. This routine + * does this by asserting the PCI #RST line for 1/8th of + * a second; this routine will sleep while the adapter is + * being reset. 
+ */ +void rtas_set_slot_reset (struct device_node *dn); + +/** + * rtas_configure_bridge -- firmware initialization of pci bridge + * + * Ask the firmware to configure any PCI bridge devices + * located behind the indicated node. Required after a + * pci device reset. + */ +void rtas_configure_bridge(struct device_node *dn); + +/** * eeh_remove_device - undo EEH setup for the indicated pci device * @dev: pci device to be removed * @@ -91,6 +123,13 @@ struct eeh_event { /** Register to find out about EEH events. */ int eeh_register_notifier(struct notifier_block *nb); int eeh_unregister_notifier(struct notifier_block *nb); + +/** Save and restore device configuration info across + * device resets. + */ +struct eeh_cfg_tree; +struct eeh_cfg_tree * eeh_save_bars(struct device_node *dn); +void eeh_restore_bars(struct eeh_cfg_tree *tree); /** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. ===== include/asm-ppc64/prom.h 1.24 vs edited ===== --- 1.24/include/asm-ppc64/prom.h 2004-11-25 00:42:42 -06:00 +++ edited/include/asm-ppc64/prom.h 2005-01-05 13:41:09 -06:00 @@ -164,8 +164,10 @@ struct device_node { int status; /* Current device status (non-zero is bad) */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; + int eeh_freeze_count; /* number of times this device froze up. 
*/ struct pci_controller *phb; /* for pci devices */ struct iommu_table *iommu_table; /* for phb's or bridges */ + u32 config_space[16]; /* saved PCI config space */ struct property *properties; struct device_node *parent; ===== include/asm-ppc64/rtas.h 1.25 vs edited ===== --- 1.25/include/asm-ppc64/rtas.h 2004-11-25 00:42:42 -06:00 +++ edited/include/asm-ppc64/rtas.h 2005-01-05 13:41:09 -06:00 @@ -241,4 +241,6 @@ extern void rtas_stop_self(void); /* RMO buffer reserved for user-space RTAS use */ extern unsigned long rtas_rmo_buf; +extern int rtas_write_config(struct device_node *dn, int where, int size, u32 val); + #endif /* _PPC64_RTAS_H */ -------------- next part -------------- ===== drivers/pci/hotplug/rpaphp.h 1.11 vs edited ===== --- 1.11/drivers/pci/hotplug/rpaphp.h 2004-10-06 11:43:44 -05:00 +++ edited/drivers/pci/hotplug/rpaphp.h 2005-01-05 13:41:09 -06:00 @@ -126,6 +126,8 @@ extern int register_pci_slot(struct slot extern int rpaphp_unconfig_pci_adapter(struct slot *slot); extern int rpaphp_get_pci_adapter_status(struct slot *slot, int is_init, u8 * value); extern struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev); +extern void init_eeh_handler (void); +extern void exit_eeh_handler (void); /* rpaphp_core.c */ extern int rpaphp_add_slot(struct device_node *dn); ===== drivers/pci/hotplug/rpaphp_core.c 1.18 vs edited ===== --- 1.18/drivers/pci/hotplug/rpaphp_core.c 2004-10-06 11:43:44 -05:00 +++ edited/drivers/pci/hotplug/rpaphp_core.c 2005-01-05 13:41:09 -06:00 @@ -443,12 +443,18 @@ static int __init rpaphp_init(void) { info(DRIVER_DESC " version: " DRIVER_VERSION "\n"); + /* Get set to handle EEH events. */ + init_eeh_handler(); + /* read all the PRA info from the system */ return init_rpa(); } static void __exit rpaphp_exit(void) { + /* Let EEH know we are going away. 
*/ + exit_eeh_handler(); + cleanup_slots(); } ===== drivers/pci/hotplug/rpaphp_pci.c 1.17 vs edited ===== --- 1.17/drivers/pci/hotplug/rpaphp_pci.c 2004-11-18 02:36:18 -06:00 +++ edited/drivers/pci/hotplug/rpaphp_pci.c 2005-01-05 15:30:29 -06:00 @@ -22,8 +22,12 @@ * Send feedback to * */ +#include +#include #include +#include #include +#include #include #include "../pci.h" /* for pci_add_new_bus */ @@ -62,6 +66,7 @@ int rpaphp_claim_resource(struct pci_dev root ? "Address space collision on" : "No parent found for", resource, dtype, pci_name(dev), res->start, res->end); + dump_stack(); } return err; } @@ -184,6 +189,19 @@ rpaphp_fixup_new_pci_devices(struct pci_ static int rpaphp_pci_config_bridge(struct pci_dev *dev); +static void rpaphp_eeh_add_bus_device(struct pci_bus *bus) +{ + struct pci_dev *dev; + list_for_each_entry(dev, &bus->devices, bus_list) { + eeh_add_device_late(dev); + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + struct pci_bus *subbus = dev->subordinate; + if (bus) + rpaphp_eeh_add_bus_device (subbus); + } + } +} + /***************************************************************************** rpaphp_pci_config_slot() will configure all devices under the given slot->dn and return the the first pci_dev. 
@@ -211,6 +229,8 @@ rpaphp_pci_config_slot(struct device_nod } if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) rpaphp_pci_config_bridge(dev); + + rpaphp_eeh_add_bus_device(bus); } return dev; } @@ -219,7 +239,6 @@ static int rpaphp_pci_config_bridge(stru { u8 sec_busno; struct pci_bus *child_bus; - struct pci_dev *child_dev; dbg("Enter %s: BRIDGE dev=%s\n", __FUNCTION__, pci_name(dev)); @@ -236,11 +255,7 @@ static int rpaphp_pci_config_bridge(stru /* do pci_scan_child_bus */ pci_scan_child_bus(child_bus); - list_for_each_entry(child_dev, &child_bus->devices, bus_list) { - eeh_add_device_late(child_dev); - } - - /* fixup new pci devices without touching bus struct */ + /* Fixup new pci devices without touching bus struct */ rpaphp_fixup_new_pci_devices(child_bus, 0); /* Make the discovered devices available */ @@ -278,7 +293,7 @@ static void print_slot_pci_funcs(struct return; } #else -static void print_slot_pci_funcs(struct slot *slot) +static inline void print_slot_pci_funcs(struct slot *slot) { return; } @@ -360,7 +375,6 @@ static void rpaphp_eeh_remove_bus_device if (pdev) rpaphp_eeh_remove_bus_device(pdev); } - } return; } @@ -562,36 +576,154 @@ exit: return retval; } -struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev) +/** + * rpaphp_find_slot - find and return the slot holding the device + * @dev: pci device for which we want the slot structure. 
+ */ +static struct slot *rpaphp_find_slot(struct pci_dev *dev) { - struct list_head *tmp, *n; - struct slot *slot; + struct list_head *tmp, *n; + struct slot *slot; list_for_each_safe(tmp, n, &rpaphp_slot_head) { struct pci_bus *bus; struct list_head *ln; slot = list_entry(tmp, struct slot, rpaphp_slot_list); - if (slot->bridge == NULL) { - if (slot->dev_type == PCI_DEV) { - printk(KERN_WARNING "PCI slot missing bridge %s %s \n", - slot->name, slot->location); - } + + /* PHB slots don't have bridges */ + if (slot->bridge == NULL) continue; - } + + /* the PCI device could be the PHB itself */ + if (slot->bridge == dev) + return slot; bus = slot->bridge->subordinate; if (!bus) { + printk (KERN_WARNING "PCI bridge is missing bus: %s %s\n", + pci_name (slot->bridge), pci_pretty_name (slot->bridge)); continue; /* should never happen? */ } + for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { - struct pci_dev *pdev = pci_dev_b(ln); - if (pdev == dev) - return slot->hotplug_slot; + struct pci_dev *pdev = pci_dev_b(ln); + if (pdev == dev) + return slot; } } return NULL; } -EXPORT_SYMBOL_GPL(rpaphp_find_hotplug_slot); +/* ------------------------------------------------------- */ +/** + * handle_eeh_events -- reset a PCI device after hard lockup. + * + * pSeries systems will isolate a PCI slot if the PCI-Host + * bridge detects address or data parity errors, DMA's + * occuring to wild addresses (which usually happen due to + * bugs in device drivers or in PCI adapter firmware). + * Slot isolations also occur if #SERR, #PERR or other misc + * PCI-related errors are detected. + * + * Recovery process consists of unplugging the device driver + * (which generated hotplug events to userspace), then issuing + * a PCI #RST to the device, then reconfiguring the PCI config + * space for all bridges & devices under this slot, and then + * finally restarting the device drivers (which cause a second + * set of hotplug events to go out to userspace). 
+ */ +int handle_eeh_events (struct notifier_block *self, + unsigned long reason, void *ev) +{ + int freeze_count=0; + struct eeh_event *event = ev; + struct slot *frozen_slot; + struct eeh_cfg_tree * saved_bars; + +debug=1; + frozen_slot = rpaphp_find_slot(event->dev); + if (!frozen_slot) + { + printk (KERN_ERR + "EEH: Cannot find PCI slot for EEH error! dev=%p dn=%p\n", + event->dev, event->dn); + if (event->dev) + printk("EEH: above message for pci device %s %s\n", + pci_name(event->dev), pci_pretty_name (event->dev)); + if (event->dn) + printk ("EEH: above message for dn %s\n", event->dn->full_name); + return 1; + } + + /* Keep a copy of the config space registers */ + saved_bars = eeh_save_bars(frozen_slot->dn); + of_node_get(event->dn); + pci_dev_get(event->dev); + + if (frozen_slot->dn->child) + freeze_count = frozen_slot->dn->child->eeh_freeze_count; + rpaphp_unconfig_pci_adapter (frozen_slot); + + freeze_count ++; + if (freeze_count > EEH_MAX_ALLOWED_FREEZES) { + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards + */ + printk (KERN_ERR + "EEH: device %s:%s has failed %d times \n" + "and has been permanently disabled. Please try reseating\n" + "this device or replacing it.\n", + pci_name (event->dev), + pci_pretty_name (event->dev), + freeze_count); + goto rdone; + } + printk (KERN_WARNING + "EEH: This device has failed %d times since last reoobt: %s:%s\n", + freeze_count, + pci_name (event->dev), + pci_pretty_name (event->dev)); + + /* Reset the pci controller. (Asserts RST#; resets config space). + * Reconfigure bridges and devices */ + rtas_set_slot_reset (event->dn); + rtas_configure_bridge(event->dn); + eeh_restore_bars(saved_bars); + + /* Give the system 5 seconds to finish running the user-space + * hotplug scripts, e.g. ifdown for ethernet. Yes, this is a hack, + * but if we don't do this, weird things happen. 
+ */ + ssleep (5); + + rpaphp_enable_pci_slot (frozen_slot); + + /* Store the freeze count with the pci adapter, and not the slot. + * This way, if the device is replaced, the count is cleared. + */ + if (frozen_slot->dn->child) + frozen_slot->dn->child->eeh_freeze_count = freeze_count; + +rdone: + of_node_put(event->dn); + pci_dev_put(event->dev); + return 0; +} + +static struct notifier_block eeh_block; + +void __init init_eeh_handler (void) +{ + eeh_block.notifier_call = handle_eeh_events; + eeh_register_notifier (&eeh_block); +} + +void __exit exit_eeh_handler (void) +{ + eeh_unregister_notifier (&eeh_block); +} + From johnrose at austin.ibm.com Fri Jan 7 07:59:25 2005 From: johnrose at austin.ibm.com (John Rose) Date: Thu, 06 Jan 2005 14:59:25 -0600 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <20050106192413.GK22274@austin.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> Message-ID: <1105045165.22565.20.camel@sinatra.austin.ibm.com> Hi Linas- Here are a couple of non-substantive comments on your PCI Hotplug patch: + /* PHB slots don't have bridges */ + if (slot->bridge == NULL) continue; - } + + /* the PCI device could be the PHB itself */ + if (slot->bridge == dev) + return slot; The PHB case is handled by the first condition. The second comment would make more sense if "PHB itself" read "slot itself". -EXPORT_SYMBOL_GPL(rpaphp_find_hotplug_slot); I suppose we could also make this static and remove it from rpaphp.h. Thanks- John From j.glisse at free.fr Sat Jan 8 04:37:03 2005 From: j.glisse at free.fr (Jerome Glisse) Date: Fri, 07 Jan 2005 18:37:03 +0100 Subject: Problems using Apple LCD with 2.6.10 In-Reply-To: <20050106175501.GA11534@unixforces.net> References: <20050106175501.GA11534@unixforces.net> Message-ID: <41DEC8BF.6010809@free.fr> Markus Rothe wrote: >Hi, > >I'm not sure if this is the correct place for such mails, but I didn't >found another place to post my problem. 
> >My problem is that my LCD doesn't work correctly with latest (2.6.10) >kernel. It's an Apple Cinema Display connected through the Apple Display >Connector (ADC). The problem is that there are many "blue lightnings" all >over the display. With blue lightning I mean a small set of pixels which >turn into light blue for about half a second. And my display also flickers >from time to time. Both happens when running console mode and if I run >Xorg. > >This problem is definetly related to the kernel as it does not occure with >kernel 2.6.9. > > What is your graphics card ? radeon ? nvidia ? best, Jerome Glisse From olof at austin.ibm.com Sat Jan 8 07:00:26 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Fri, 7 Jan 2005 14:00:26 -0600 Subject: [PATCH] [PPC64] Fix iommu cleanup regression Message-ID: <20050107200026.GA23616@austin.ibm.com> Hi, In the recent IOMMU cleanup, the new LPAR code assumes that every PHB has a DMA window assigned to it. On some machines we don't have a window assigned unless there's an adapter in the slot. In other words, a PHB without an ibm,dma-window property is not a bug and must be tolerated. This patch fixes that, and also removes a redundant check for the dma-window being defined.
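The tolerate-and-skip behaviour described above can be illustrated with a small user-space sketch. Everything in it is hypothetical: `fake_node` stands in for the kernel's `struct device_node`, and `bus_setup_sketch()` only models the control flow of the patched bus setup (log the missing property and return, rather than `panic()` or `WARN_ON()`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct device_node, reduced to the fields
 * this sketch needs. dma_window is NULL when the node has no
 * "ibm,dma-window" property (for example, an empty slot). */
struct fake_node {
	const char *full_name;
	const unsigned int *dma_window;
};

/* Returns 1 if an IOMMU table would be set up, 0 if the bus is skipped. */
static int bus_setup_sketch(const struct fake_node *dn)
{
	assert(dn != NULL);

	if (dn->dma_window == NULL) {
		/* Not a bug: report it and leave the bus without a table. */
		fprintf(stderr, "bus %s seems to have no ibm,dma-window property\n",
			dn->full_name);
		return 0;
	}

	/* The real code would allocate and initialise the iommu_table here. */
	return 1;
}
```

The design point is that a missing window is a normal configuration, so it is reported at debug level instead of being treated as fatal.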
Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup-bugfix arch/ppc64/kernel/pSeries_iommu.c --- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~iommu-cleanup-bugfix 2005-01-07 12:52:18.960683160 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c 2005-01-07 13:44:19.427300128 -0600 @@ -293,10 +293,6 @@ static void iommu_table_setparms_lpar(st struct iommu_table *tbl, unsigned int *dma_window) { - if (!dma_window) - panic("iommu_table_setparms_lpar: device %s has no" - " ibm,dma-window property!\n", dn->full_name); - tbl->it_busno = dn->bussubno; /* TODO: Parse field size properties properly. */ @@ -385,7 +381,10 @@ static void iommu_bus_setup_pSeriesLP(st break; } - WARN_ON(dma_window == NULL); + if (dma_window == NULL) { + DBG("iommu_bus_setup_pSeriesLP: bus %s seems to have no ibm,dma-window property\n", dn->full_name); + return; + } if (!pdn->iommu_table) { /* Bussubno hasn't been copied yet. 
@@ -420,10 +419,11 @@ static void iommu_dev_setup_pSeries(stru while (dn && dn->iommu_table == NULL) dn = dn->parent; - WARN_ON(!dn); - - if (dn) + if (dn) { mydn->iommu_table = dn->iommu_table; + } else { + DBG("iommu_dev_setup_pSeries, dev %p (%s) has no iommu table\n", dev, dev->pretty_name); + } } static void iommu_bus_setup_null(struct pci_bus *b) { } _ From linas at austin.ibm.com Sat Jan 8 07:09:36 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Fri, 7 Jan 2005 14:09:36 -0600 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <1105045165.22565.20.camel@sinatra.austin.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> <1105045165.22565.20.camel@sinatra.austin.ibm.com> Message-ID: <20050107200936.GN22274@austin.ibm.com> On Thu, Jan 06, 2005 at 02:59:25PM -0600, John Rose was heard to remark: > Hi Linas- > > Here are a couple of non-substantive comments on your PCI Hotplug patch: OK, thanks, I've tweaked it; it'll be in the next round of updates. --linas From markus at unixforces.net Sat Jan 8 07:13:43 2005 From: markus at unixforces.net (Markus Rothe) Date: Fri, 7 Jan 2005 21:13:43 +0100 Subject: Problems using Apple LCD with 2.6.10 In-Reply-To: <41DEC8BF.6010809@free.fr> References: <20050106175501.GA11534@unixforces.net> <41DEC8BF.6010809@free.fr> Message-ID: <20050107201343.GA10390@unixforces.net> Jerome Glisse wrote: > What is your graphics card ? radeon ? nvidia ? It is a Radeon 9600. Markus -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050107/6b316468/attachment.pgp From zwane at arm.linux.org.uk Sun Jan 9 15:29:23 2005 From: zwane at arm.linux.org.uk (Zwane Mwaikambo) Date: Sat, 8 Jan 2005 21:29:23 -0700 (MST) Subject: [PATCH] PPC64: Move hotplug cpu functions to smp_ops Message-ID: This should allow for easier adding of hotplug cpu support for other PPC64 subarchs. The patch is untested but does compile with and without hotplug cpu on pSeries and G5 configs. What can get slightly confusing is the fact that both ppc_md and smp_ops have cpu_die members. arch/ppc64/kernel/pSeries_smp.c | 9 +++++++-- arch/ppc64/kernel/smp.c | 16 ++++++++++++++++ include/asm-ppc64/machdep.h | 2 ++ 3 files changed, 25 insertions(+), 2 deletions(-) Signed-off-by: Zwane Mwaikambo Index: linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/pSeries_smp.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm1/arch/ppc64/kernel/pSeries_smp.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 pSeries_smp.c --- linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/pSeries_smp.c 4 Jan 2005 04:03:33 -0000 1.1.1.1 +++ linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/pSeries_smp.c 9 Jan 2005 03:42:19 -0000 @@ -88,7 +88,7 @@ static int query_cpu_stopped(unsigned in #ifdef CONFIG_HOTPLUG_CPU -int __cpu_disable(void) +int pSeries_cpu_disable(void) { /* FIXME: go put this in a header somewhere */ extern void xics_migrate_irqs_away(void); @@ -106,7 +106,7 @@ int __cpu_disable(void) return 0; } -void __cpu_die(unsigned int cpu) +void pSeries_cpu_die(unsigned int cpu) { int tries; int cpu_status; @@ -355,6 +355,11 @@ void __init smp_init_pSeries(void) else smp_ops = &pSeries_xics_smp_ops; +#ifdef CONFIG_HOTPLUG_CPU + smp_ops->cpu_disable = pSeries_cpu_disable; + smp_ops->cpu_die = pSeries_cpu_die; +#endif + /* Start secondary threads on SMT systems; primary 
threads * are already in the running state. */ Index: linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/smp.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm1/arch/ppc64/kernel/smp.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 smp.c --- linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/smp.c 4 Jan 2005 04:03:33 -0000 1.1.1.1 +++ linux-2.6.10-mm1-ppc64/arch/ppc64/kernel/smp.c 9 Jan 2005 03:48:56 -0000 @@ -557,3 +557,19 @@ void __init smp_cpus_done(unsigned int m */ cpu_present_map = cpu_possible_map; } + +#ifdef CONFIG_HOTPLUG_CPU +int __cpu_disable(void) +{ + if (smp_ops->cpu_disable) + return smp_ops->cpu_disable(); + + return -ENOSYS; +} + +void __cpu_die(unsigned int cpu) +{ + if (smp_ops->cpu_die) + smp_ops->cpu_die(cpu); +} +#endif Index: linux-2.6.10-mm1-ppc64/include/asm-ppc64/machdep.h =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm1/include/asm-ppc64/machdep.h,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 machdep.h --- linux-2.6.10-mm1-ppc64/include/asm-ppc64/machdep.h 4 Jan 2005 04:03:40 -0000 1.1.1.1 +++ linux-2.6.10-mm1-ppc64/include/asm-ppc64/machdep.h 9 Jan 2005 03:50:21 -0000 @@ -31,6 +31,8 @@ struct smp_ops_t { void (*late_setup_cpu)(int nr); void (*take_timebase)(void); void (*give_timebase)(void); + int (*cpu_disable)(void); + void (*cpu_die)(unsigned int nr); }; #endif From anton at samba.org Sun Jan 9 16:48:34 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 9 Jan 2005 16:48:34 +1100 Subject: xtime <-> gettimeofday can get out of sync Message-ID: <20050109054834.GL14239@krispykreme.ozlabs.ibm.com> Hi, I've noticed a problem where xtime and gettimeofday could get out of sync if interrupts are disabled for too long (e.g. long kernel code paths or dropping into the debugger for a while).
We correctly replay lost jiffies, but in that loop timer_sync_xtime syncs the intermediate values of xtime up to the current value of gettimeofday. So xtime jumps ahead by a bunch, and from then on it is ahead of gettimeofday and we never resync the two. I guess this is to avoid xtime going backwards. The patch below creates a __do_gettimeofday into which you can pass a tb value, so the intermediate values of xtime are synced properly. Note that the timer_sync_xtime check only stops the seconds from going backwards; the ns component still could, couldn't it? Considering this stuff is hard to get right, should we switch to the time interpolator stuff? The only problem there is that it might cause trouble for systemcfg (which exports the data used to do gettimeofday in userspace). Anton ===== arch/ppc64/kernel/time.c 1.44 vs edited ===== --- 1.44/arch/ppc64/kernel/time.c 2005-01-05 13:48:14 +11:00 +++ edited/arch/ppc64/kernel/time.c 2005-01-09 16:37:33 +11:00 @@ -142,16 +142,54 @@ } } +/* + * This version of gettimeofday has microsecond resolution. + */ +static inline void __do_gettimeofday(struct timeval *tv, unsigned long tb_val) +{ + unsigned long sec, usec, tb_ticks; + unsigned long xsec, tb_xsec; + struct gettimeofday_vars * temp_varp; + unsigned long temp_tb_to_xs, temp_stamp_xsec; + + /* + * These calculations are faster (gets rid of divides) + * if done in units of 1/2^20 rather than microseconds.
+ * The conversion to microseconds at the end is done + * without a divide (and in fact, without a multiply) + */ + tb_ticks = tb_val - do_gtod.tb_orig_stamp; + temp_varp = do_gtod.varp; + temp_tb_to_xs = temp_varp->tb_to_xs; + temp_stamp_xsec = temp_varp->stamp_xsec; + tb_xsec = mulhdu( tb_ticks, temp_tb_to_xs ); + xsec = temp_stamp_xsec + tb_xsec; + sec = xsec / XSEC_PER_SEC; + xsec -= sec * XSEC_PER_SEC; + usec = (xsec * USEC_PER_SEC)/XSEC_PER_SEC; + + tv->tv_sec = sec; + tv->tv_usec = usec; +} + +void do_gettimeofday(struct timeval *tv) +{ + __do_gettimeofday(tv, get_tb()); +} + +EXPORT_SYMBOL(do_gettimeofday); + /* Synchronize xtime with do_gettimeofday */ -static __inline__ void timer_sync_xtime( unsigned long cur_tb ) +static inline void timer_sync_xtime(unsigned long cur_tb) { struct timeval my_tv; - if ( cur_tb > next_xtime_sync_tb ) { + if (cur_tb > next_xtime_sync_tb) { next_xtime_sync_tb = cur_tb + xtime_sync_interval; - do_gettimeofday( &my_tv ); - if ( xtime.tv_sec <= my_tv.tv_sec ) { + __do_gettimeofday(&my_tv, cur_tb); + + if (xtime.tv_sec <= my_tv.tv_sec) { xtime.tv_sec = my_tv.tv_sec; xtime.tv_nsec = my_tv.tv_usec * 1000; } @@ -274,7 +312,7 @@ write_seqlock(&xtime_lock); tb_last_stamp = lpaca->next_jiffy_update_tb; do_timer(regs); - timer_sync_xtime( cur_tb ); + timer_sync_xtime(lpaca->next_jiffy_update_tb); timer_check_rtc(); write_sequnlock(&xtime_lock); if ( adjusting_time && (time_adjust == 0) ) @@ -312,36 +350,6 @@ { return mulhdu(get_tb(), tb_to_ns_scale) << tb_to_ns_shift; } - -/* - * This version of gettimeofday has microsecond resolution. - */ -void do_gettimeofday(struct timeval *tv) -{ - unsigned long sec, usec, tb_ticks; - unsigned long xsec, tb_xsec; - struct gettimeofday_vars * temp_varp; - unsigned long temp_tb_to_xs, temp_stamp_xsec; - - /* These calculations are faster (gets rid of divides) - * if done in units of 1/2^20 rather than microseconds. 
- * The conversion to microseconds at the end is done - * without a divide (and in fact, without a multiply) */ - tb_ticks = get_tb() - do_gtod.tb_orig_stamp; - temp_varp = do_gtod.varp; - temp_tb_to_xs = temp_varp->tb_to_xs; - temp_stamp_xsec = temp_varp->stamp_xsec; - tb_xsec = mulhdu( tb_ticks, temp_tb_to_xs ); - xsec = temp_stamp_xsec + tb_xsec; - sec = xsec / XSEC_PER_SEC; - xsec -= sec * XSEC_PER_SEC; - usec = (xsec * USEC_PER_SEC)/XSEC_PER_SEC; - - tv->tv_sec = sec; - tv->tv_usec = usec; -} - -EXPORT_SYMBOL(do_gettimeofday); int do_settimeofday(struct timespec *tv) { From j.glisse at gmail.com Mon Jan 10 02:26:12 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Sun, 9 Jan 2005 16:26:12 +0100 Subject: U3 G5 AGP support patch (v4) Message-ID: <4240b916050109072621440269@mail.gmail.com> Hi, Attached is a patch for the U3 AGP bridge. This one just fixes some typos from the previous patch (DEVICE instead of DEVIEC...). Signed-off-by: Jerome Glisse best, Jerome Glisse -------------- next part -------------- A non-text attachment was scrubbed... Name: uninorth-patch4 Type: application/octet-stream Size: 10216 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050109/226dc102/attachment.obj From j.glisse at gmail.com Mon Jan 10 02:40:56 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Sun, 9 Jan 2005 16:40:56 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) Message-ID: <4240b916050109074053e328b1@mail.gmail.com> Hi, With 2.6.10 I get a compilation error with disable_6xx_mmu. I guess this is linked to the patch you supplied in December in arch/ppc/boot/common/util.S, which comments out disable_6xx_mmu if CONFIG_6XX is not defined. The problem arises in arch/ppc/boot/simple/misc-prep.c, where the call to this function is not conditionally compiled. Attached is a patch that uses CONFIG_6XX to comment out the call to this function if the flag is not set.
By the way, there are many compilation warnings related to PPC with 2.6.10; is anyone looking into fixing them? Signed-off-by: Jerome Glisse best, Jerome Glisse -------------- next part -------------- A non-text attachment was scrubbed... Name: disable_6xx-patch Type: application/octet-stream Size: 855 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050109/888cb654/attachment.obj From hch at lst.de Mon Jan 10 03:06:14 2005 From: hch at lst.de (Christoph Hellwig) Date: Sun, 9 Jan 2005 17:06:14 +0100 Subject: U3 G5 AGP support patch (v4) In-Reply-To: <4240b916050109072621440269@mail.gmail.com> References: <4240b916050109072621440269@mail.gmail.com> Message-ID: <20050109160614.GA22839@lst.de> +static struct device_node* uninorth_node __pmacdata; +static u32 __iomem * uninorth_base __pmacdata; static struct device_node *uninorth_node __pmacdata; static u32 __iomem *uninorth_base __pmacdata; + if(uninorth_rev == 0x21) { if (uninorth_rev == 0x21) { + if((uninorth_rev >= 0x30) && (uninorth_rev <= 0x33)) { if ((uninorth_rev >= 0x30) && (uninorth_rev <= 0x33)) { + if (agp_bridge->dev->device == PCI_DEVICE_ID_APPLE_U3_AGP) { + /* This is an AGP V3 */ + agp_device_command(command, TRUE); + } else { + /* AGP V2 */ + agp_device_command(command, FALSE); + } double-indentation, also please use 1/0 instead of TRUE/FALSE.
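Following the TRUE/FALSE remark, the two branches can also collapse into a single call that passes a plain 1 or 0. The sketch below is hypothetical throughout: `agp_device_command_stub()` only mirrors the two-argument shape of the `agp_device_command(command, ...)` call in the quoted patch, and `FAKE_U3_AGP_ID` is a placeholder, not the real value of `PCI_DEVICE_ID_APPLE_U3_AGP`.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder device ID; not the real PCI_DEVICE_ID_APPLE_U3_AGP value. */
#define FAKE_U3_AGP_ID 0x1234u

/* Records the last mode requested so the behaviour can be checked. */
static int last_agp_v3 = -1;

/* Hypothetical stub shaped like the agp_device_command() call above. */
static void agp_device_command_stub(uint32_t command, int agp_v3)
{
	(void)command;
	last_agp_v3 = agp_v3;
}

/* One call instead of an if/else: 1 for the AGP V3 (U3) bridge,
 * 0 for AGP V2, matching the comments in the quoted patch. */
static void uninorth_command_sketch(uint16_t device_id, uint32_t command)
{
	agp_device_command_stub(command, device_id == FAKE_U3_AGP_ID);
}
```

Whether to keep the explicit branch with its "AGP V3"/"AGP V2" comments or collapse it like this is a style call; either way the flag values are plain 1 and 0.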
+static struct aper_size_info_32 u3_sizes[8] = +{ +/* + * Not sure that uninorth3 supports that high aperture sizes but it + * would be strange if it did not :) + */ comment before the struct declaration, please, i.e. /* * Not sure that uninorth3 supports that high aperture sizes but it * would be strange if it did not :) */ static struct aper_size_info_32 u3_sizes[8] = { + uninorth_node = of_find_node_by_name(NULL, "uni-n"); + /* Locate G5 u3 */ + if (uninorth_node == NULL) { + uninorth_node = of_find_node_by_name(NULL, "u3"); + } /* Locate G5 u3 */ uninorth_node = of_find_node_by_name(NULL, "uni-n"); if (!uninorth_node) uninorth_node = of_find_node_by_name(NULL, "u3"); + /* + * Set specific functions & values for agp3 controller. + */ + if (pdev->device == PCI_DEVICE_ID_APPLE_U3_AGP) { + uninorth_agp_driver.insert_memory = uninorth3_insert_memory; + uninorth_agp_driver.aperture_sizes = (void *)u3_sizes; + uninorth_agp_driver.num_aperture_sizes = 8; Please declare a separate driver instance instead of overriding. And asm-ppc64 is still missing an agp.h, no? From j.glisse at gmail.com Mon Jan 10 04:46:05 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Sun, 9 Jan 2005 18:46:05 +0100 Subject: U3 G5 AGP support patch (v4) In-Reply-To: <20050109160614.GA22839@lst.de> References: <4240b916050109072621440269@mail.gmail.com> <20050109160614.GA22839@lst.de> Message-ID: <4240b91605010909463e44bba8@mail.gmail.com> > Please declare a separate driver instance instead of overriding. I hope the new patch follows the coding style. :) > And asm-ppc64 is still missing an agp.h, no? Maybe someone with better knowledge can tell us more about that :) Anyway, BenH tells me there is still a pending issue with AGP and potential cache aliasing. best, Jerome Glisse -------------- next part -------------- A non-text attachment was scrubbed...
Name: uninorth-patch5 Type: application/octet-stream Size: 11215 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050109/5cfffcbb/attachment.obj From j.glisse at gmail.com Mon Jan 10 07:41:44 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Sun, 9 Jan 2005 21:41:44 +0100 Subject: U3 G5 AGP support patch (v4) In-Reply-To: <4240b91605010909463e44bba8@mail.gmail.com> References: <4240b916050109072621440269@mail.gmail.com> <20050109160614.GA22839@lst.de> <4240b91605010909463e44bba8@mail.gmail.com> Message-ID: <4240b91605010912414a5b1b67@mail.gmail.com> It seems there is a bug somewhere in my AGP patch. I was playing with an r300 Radeon and got a hard lockup (I'm quite used to that while playing with r300, though :(). But after a bit of investigation it seems to be related to AGP. Right now I am porting an old tool from DRI that tests agpgart, and thus AGP. I may finally really need to split the U3 driver completely from the old UniNorth one. I will take a deeper look to track down the issue. In the meantime, could someone test AGP with a Radeon r200 on a G5? You will certainly lock up your G5, but it should not burn; at least here I just got some smoke ;) best, Jerome Glisse From paulus at samba.org Mon Jan 10 08:03:20 2005 From: paulus at samba.org (Paul Mackerras) Date: Mon, 10 Jan 2005 08:03:20 +1100 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <4240b916050109074053e328b1@mail.gmail.com> References: <4240b916050109074053e328b1@mail.gmail.com> Message-ID: <16865.39960.274092.996530@cargo.ozlabs.ibm.com> Jerome Glisse writes: > With 2.6.10 i get a compilation error with disable_6xx_mmu What kind of machine is this? Could you send me your .config? I suspect that maybe we aren't defining CONFIG_6XX for PPC970 machines. Paul.
From david at gibson.dropbear.id.au Tue Jan 11 02:55:20 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 11 Jan 2005 02:55:20 +1100 Subject: [PPC64] Hugepage bugfix Message-ID: <20050110155520.GA22101@localhost.localdomain> Andrew, Linus, please apply: Fix a stupid unbalanced lock bug in the ppc64 hugepage code. Lead rapidly to a crash if both CONFIG_HUGETLB_PAGE and CONFIG_PREEMPT were enabled (even without actually using hugepages at all). Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/ppc64/mm/hugetlbpage.c 2005-01-06 10:47:48.000000000 +1100 +++ working-2.6/arch/ppc64/mm/hugetlbpage.c 2005-01-10 15:16:25.142319552 +1100 @@ -745,7 +745,7 @@ pgdir = mm->context.huge_pgdir; if (! pgdir) - return; + goto out; mm->context.huge_pgdir = NULL; @@ -768,6 +768,7 @@ BUG_ON(memcmp(pgdir, empty_zero_page, PAGE_SIZE)); kmem_cache_free(zero_cache, pgdir); + out: spin_unlock(&mm->page_table_lock); } -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From wli at holomorphy.com Mon Jan 10 16:04:41 2005 From: wli at holomorphy.com (William Lee Irwin III) Date: Sun, 9 Jan 2005 21:04:41 -0800 Subject: [PPC64] Hugepage bugfix In-Reply-To: <20050110155520.GA22101@localhost.localdomain> References: <20050110155520.GA22101@localhost.localdomain> Message-ID: <20050110050441.GA2696@holomorphy.com> On Tue, Jan 11, 2005 at 02:55:20AM +1100, David Gibson wrote: > Andrew, Linus, please apply: > Fix a stupid unbalanced lock bug in the ppc64 hugepage code. Lead > rapidly to a crash if both CONFIG_HUGETLB_PAGE and CONFIG_PREEMPT were > enabled (even without actually using hugepages at all). 
> Signed-off-by: David Gibson Acked-by: William Irwin From david at gibson.dropbear.id.au Tue Jan 11 05:00:04 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 11 Jan 2005 05:00:04 +1100 Subject: [PPC64] Rename perf counter register #defines Message-ID: <20050110180004.GC22101@localhost.localdomain> Andrew, please apply: This patch makes some cleanups to the #defines for various fields in the MMCR0 performance monitor control register. Specifically, the names of a couple of bits are changed so that: a) they are a bit less cumbersomely long and b) they match the names used in the hardware documentation. Signed-off-by: David Gibson Index: working-2.6/include/asm-ppc64/processor.h =================================================================== --- working-2.6.orig/include/asm-ppc64/processor.h 2005-01-10 16:51:10.625391320 +1100 +++ working-2.6/include/asm-ppc64/processor.h 2005-01-10 16:51:28.771295712 +1100 @@ -331,8 +331,8 @@ #define MMCR0_FCECE 0x02000000UL /* freeze counters on enabled condition or event */ /* time base exception enable */ #define MMCR0_TBEE 0x00400000UL /* time base exception enable */ -#define MMCR0_PMC1INTCONTROL 0x00008000UL /* PMC1 count enable*/ -#define MMCR0_PMCNINTCONTROL 0x00004000UL /* PMCn count enable*/ +#define MMCR0_PMC1CE 0x00008000UL /* PMC1 count enable*/ +#define MMCR0_PMCjCE 0x00004000UL /* PMCj count enable*/ #define MMCR0_TRIGGER 0x00002000UL /* TRIGGER enable */ #define MMCR0_PMAO 0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */ #define MMCR0_SHRFC 0x00000040UL /* SHRre freeze conditions between threads */ Index: working-2.6/arch/ppc64/oprofile/op_model_rs64.c =================================================================== --- working-2.6.orig/arch/ppc64/oprofile/op_model_rs64.c 2005-01-10 16:51:10.625391320 +1100 +++ working-2.6/arch/ppc64/oprofile/op_model_rs64.c 2005-01-10 16:51:28.772295560 +1100 @@ -119,7 +119,7 @@ mmcr0 |= 
MMCR0_FCM1|MMCR0_PMXE|MMCR0_FCECE; /* Only applies to POWER3, but should be safe on RS64 */ - mmcr0 |= MMCR0_PMC1INTCONTROL|MMCR0_PMCNINTCONTROL; + mmcr0 |= MMCR0_PMC1CE|MMCR0_PMCjCE; mtspr(SPRN_MMCR0, mmcr0); dbg("setup on cpu %d, mmcr0 %lx\n", smp_processor_id(), Index: working-2.6/arch/ppc64/oprofile/op_model_power4.c =================================================================== --- working-2.6.orig/arch/ppc64/oprofile/op_model_power4.c 2005-01-10 16:51:10.626391168 +1100 +++ working-2.6/arch/ppc64/oprofile/op_model_power4.c 2005-01-10 16:51:28.772295560 +1100 @@ -97,7 +97,7 @@ mtspr(SPRN_MMCR0, mmcr0); mmcr0 |= MMCR0_FCM1|MMCR0_PMXE|MMCR0_FCECE; - mmcr0 |= MMCR0_PMC1INTCONTROL|MMCR0_PMCNINTCONTROL; + mmcr0 |= MMCR0_PMC1CE|MMCR0_PMCjCE; mtspr(SPRN_MMCR0, mmcr0); mtspr(SPRN_MMCR1, mmcr1_val); -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Tue Jan 11 05:01:27 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 11 Jan 2005 05:01:27 +1100 Subject: [PPC64] Functions to reserve performance monitor hardware Message-ID: <20050110180127.GD22101@localhost.localdomain> Andrew, please apply: The PPC64 interrupt code includes a hook to call when an exception from the performance monitor unit occurs. However, there's no way of reserving the hook properly, so if more than one bit of code tries to use it things will get ugly. Currently oprofile is the only user, but there are likely to be more in future e.g. perfctr, if and when it reaches a fit state for merging. This patch creates functions to reserve and release the performance monitor hardware (including its interrupt), and makes oprofile use them. It also creates a new arch/ppc64/kernel/pmc.c, in which we can put any future helper functions for handling the performance monitor counters. 
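The one-owner-at-a-time rule described above can be modelled in user space. This is a sketch, not the patch: it swaps the kernel spinlock for a C11 compare-and-swap, takes an explicit owner token where the patch records `__builtin_return_address(0)`, uses `assert()` where the patch uses `WARN_ON()`, and its `dummy_perf` does nothing instead of clearing MMCR0 bits. The `_sketch` function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct pt_regs;			/* opaque: handlers only receive a pointer */
typedef void (*perf_irq_t)(struct pt_regs *);

/* Default handler; the kernel's version also masks PMU exceptions. */
static void dummy_perf(struct pt_regs *regs)
{
	(void)regs;
}

/* Non-NULL while some client owns the PMC hardware. */
static _Atomic(const void *) pmc_owner;
static perf_irq_t perf_irq = dummy_perf;

/* Claim the hardware for `owner`; fails with -EBUSY if already taken. */
static int reserve_pmc_sketch(const void *owner, perf_irq_t handler)
{
	const void *expected = NULL;

	assert(owner != NULL);
	if (!atomic_compare_exchange_strong(&pmc_owner, &expected, owner))
		return -EBUSY;

	/* A NULL handler falls back to the default, as in the patch. */
	perf_irq = handler ? handler : dummy_perf;
	return 0;
}

/* Give the hardware back and restore the default handler. */
static void release_pmc_sketch(void)
{
	/* Stands in for the patch's WARN_ON(!pmc_owner_caller). */
	assert(atomic_load(&pmc_owner) != NULL);
	perf_irq = dummy_perf;
	atomic_store(&pmc_owner, NULL);
}
```

A client such as oprofile would reserve the hardware on start-up, passing its interrupt handler, and release it on shutdown; any second client gets -EBUSY until then.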
Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/kernel/pmc.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/ppc64/kernel/pmc.c 2005-01-10 16:32:49.733411536 +1100 @@ -0,0 +1,65 @@ +/* + * linux/arch/ppc64/kernel/pmc.c + * + * Copyright (C) 2004 David Gibson, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include + +#include +#include + +/* Ensure exceptions are disabled */ +static void dummy_perf(struct pt_regs *regs) +{ + unsigned int mmcr0 = mfspr(SPRN_MMCR0); + + mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO); + mtspr(SPRN_MMCR0, mmcr0); +} + +static spinlock_t pmc_owner_lock = SPIN_LOCK_UNLOCKED; +static void *pmc_owner_caller; /* mostly for debugging */ +perf_irq_t perf_irq = dummy_perf; + +int reserve_pmc_hardware(perf_irq_t new_perf_irq) +{ + int err = -EBUSY;; + + spin_lock(&pmc_owner_lock); + + if (pmc_owner_caller) { + printk(KERN_WARNING "reserve_pmc_hardware: " + "PMC hardware busy (reserved by caller %p)\n", + pmc_owner_caller); + goto out; + } + + pmc_owner_caller = __builtin_return_address(0); + perf_irq = new_perf_irq ? : dummy_perf; + + err = 0; + + out: + spin_unlock(&pmc_owner_lock); + return err; +} + +void release_pmc_hardware(void) +{ + spin_lock(&pmc_owner_lock); + + WARN_ON(! 
pmc_owner_caller); + + pmc_owner_caller = NULL; + perf_irq = dummy_perf; + + spin_unlock(&pmc_owner_lock); +} Index: working-2.6/arch/ppc64/kernel/traps.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/traps.c 2005-01-10 10:51:31.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/traps.c 2005-01-10 16:33:43.154412536 +1100 @@ -40,6 +40,7 @@ #include #include #include +#include #ifdef CONFIG_DEBUGGER int (*__debugger)(struct pt_regs *regs); @@ -449,18 +450,7 @@ die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT); } -/* Ensure exceptions are disabled */ -static void dummy_perf(struct pt_regs *regs) -{ - unsigned int mmcr0 = mfspr(SPRN_MMCR0); - - mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO); - mtspr(SPRN_MMCR0, mmcr0); -} - -void (*perf_irq)(struct pt_regs *) = dummy_perf; - -EXPORT_SYMBOL(perf_irq); +extern perf_irq_t perf_irq; void performance_monitor_exception(struct pt_regs *regs) { Index: working-2.6/include/asm-ppc64/pmc.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-ppc64/pmc.h 2005-01-10 15:24:40.217406672 +1100 @@ -0,0 +1,29 @@ +/* + * pmc.h + * Copyright (C) 2004 David Gibson, IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +#ifndef _PPC64_PMC_H +#define _PPC64_PMC_H + +#include + +typedef void (*perf_irq_t)(struct pt_regs *); + +int reserve_pmc_hardware(perf_irq_t new_perf_irq); +void release_pmc_hardware(void); + +#endif /* _PPC64_PMC_H */ Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2005-01-10 10:51:31.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/Makefile 2005-01-10 15:24:40.218406520 +1100 @@ -11,7 +11,7 @@ udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \ ptrace32.o signal32.o rtc.o init_task.o \ lmb.o cputable.o cpu_setup_power4.o idle_power4.o \ - iommu.o sysfs.o + iommu.o sysfs.o pmc.o obj-$(CONFIG_PPC_OF) += of_device.o -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From anil_411 at yahoo.com Mon Jan 10 18:49:30 2005 From: anil_411 at yahoo.com (Anil Kumar Prasad) Date: Sun, 9 Jan 2005 23:49:30 -0800 (PST) Subject: ioremap of pci region on pSeries LPAR vs SMP Message-ID: <20050110074930.92901.qmail@web11508.mail.yahoo.com> Hi, I am using SLES9 default kernel(2.6.5). I have a piece of code where i do ioremap on pci memory region. It works on JS20 machine where linux runs in partition mode while it causes SLB miss on SMP box(P630) and subsequently panics. On JS20, i get va in IO_REGION (0xE000....) while on p630 ioremap returns address in EEH_REGION(0xA000...). As soon as i try to dereference this returned va on p630, kernel crashes(dump is at the end of mail). I looked in slab.c:slb_allocate(). it doesn't look like that SLB miss in EEH_REGION will ever get through 'us REGION_ID check will return user address. 
Did i miss something? Please help. Thanks a lot, Anil. ------------------ SMP NR_CPUS=128 NUMA PSERIES NIP: D000000000649CB4 XER: 0000000000000000 LR: D000000000649CA4 REGS: c0000003f7897670 TRAP: 0380 Tainted: GF U (2.6.5-7.97-pseries64) MSR: 9000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 DAR: a000000082060010, DSISR: 0000000002200000 TASK: c00000003f1b5340[5037] 'modprobe' THREAD: c0000003f7894000 CPU: 0 GPR00: 0000000000000001 C0000003F78978F0 D0000000006AEB70 00000000001E8480 GPR04: 0000000000000000 0000000000000004 0000000028088422 0000000000000000 GPR08: 0000000000000000 FFFFFFFFFFFFFFFF C0000000006CAC80 0000000000000080 GPR12: 0000000048004028 C000000000444000 D0000000006A6DD9 D0000000006A6DA8 GPR16: 0000000000000001 0000000000000000 C000000000411770 C000000000411670 GPR20: C000000000411050 D0000000006A6DA8 0000000000001867 0000000000006278 GPR24: 0000000000000001 C0000003F7897A40 C0000001FD158080 C0000001FD158180 GPR28: 0000000000000000 A000000082060010 D0000000006A8F38 C000000000411000 --------------------------------------------------- __________________________________ Do you Yahoo!? Yahoo! Mail - Find what you need with new enhanced search. http://info.mail.yahoo.com/mail_250 From anil_411 at yahoo.com Mon Jan 10 18:49:45 2005 From: anil_411 at yahoo.com (Anil Kumar Prasad) Date: Sun, 9 Jan 2005 23:49:45 -0800 (PST) Subject: ioremap of pci region on pSeries LPAR vs SMP Message-ID: <20050110074945.83609.qmail@web11501.mail.yahoo.com> Hi, I am using SLES9 default kernel(2.6.5). I have a piece of code where i do ioremap on pci memory region. It works on JS20 machine where linux runs in partition mode while it causes SLB miss on SMP box(P630) and subsequently panics. On JS20, i get va in IO_REGION (0xE000....) while on p630 ioremap returns address in EEH_REGION(0xA000...). As soon as i try to dereference this returned va on p630, kernel crashes(dump is at the end of mail). I looked in slab.c:slb_allocate(). 
it doesn't look like that SLB miss in EEH_REGION will ever get through 'us REGION_ID check will return user address. Did i miss something? Please help. Thanks a lot, Anil. ------------------ SMP NR_CPUS=128 NUMA PSERIES NIP: D000000000649CB4 XER: 0000000000000000 LR: D000000000649CA4 REGS: c0000003f7897670 TRAP: 0380 Tainted: GF U (2.6.5-7.97-pseries64) MSR: 9000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 DAR: a000000082060010, DSISR: 0000000002200000 TASK: c00000003f1b5340[5037] 'modprobe' THREAD: c0000003f7894000 CPU: 0 GPR00: 0000000000000001 C0000003F78978F0 D0000000006AEB70 00000000001E8480 GPR04: 0000000000000000 0000000000000004 0000000028088422 0000000000000000 GPR08: 0000000000000000 FFFFFFFFFFFFFFFF C0000000006CAC80 0000000000000080 GPR12: 0000000048004028 C000000000444000 D0000000006A6DD9 D0000000006A6DA8 GPR16: 0000000000000001 0000000000000000 C000000000411770 C000000000411670 GPR20: C000000000411050 D0000000006A6DA8 0000000000001867 0000000000006278 GPR24: 0000000000000001 C0000003F7897A40 C0000001FD158080 C0000001FD158180 GPR28: 0000000000000000 A000000082060010 D0000000006A8F38 C000000000411000 --------------------------------------------------- __________________________________ Do you Yahoo!? Yahoo! Mail - Helps protect you from nasty viruses. http://promotions.yahoo.com/new_mail From paulus at samba.org Mon Jan 10 20:10:59 2005 From: paulus at samba.org (Paul Mackerras) Date: Mon, 10 Jan 2005 20:10:59 +1100 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <20050110074930.92901.qmail@web11508.mail.yahoo.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> Message-ID: <16866.18083.212727.327170@cargo.ozlabs.ibm.com> Anil Kumar Prasad writes: > On JS20, i get va in IO_REGION (0xE000....) while on > p630 ioremap returns address in > EEH_REGION(0xA000...). As soon as i try to dereference > this returned va on p630, kernel crashes(dump is at > the end of mail). 
You shouldn't ever directly dereference the result of ioremap. You have to use readb/readw/readl and writeb/writew/writel. Paul. From trini at kernel.crashing.org Tue Jan 11 01:52:19 2005 From: trini at kernel.crashing.org (Tom Rini) Date: Mon, 10 Jan 2005 07:52:19 -0700 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <16865.39960.274092.996530@cargo.ozlabs.ibm.com> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> Message-ID: <20050110145219.GB2226@smtp.west.cox.net> On Mon, Jan 10, 2005 at 08:03:20AM +1100, Paul Mackerras wrote: > Jerome Glisse writes: > > > With 2.6.10 i get a compilation error with disable_6xx_mmu > > What kind of machine is this? Could you send me your .config? > > I suspect that maybe we aren't defining CONFIG_6XX for PPC970 > machines. Indeed. It might make most sense to do something like: Signed-off-by: Tom Rini --- 1.40/arch/ppc/boot/simple/Makefile 2005-01-03 16:49:19 -07:00 +++ edited/arch/ppc/boot/simple/Makefile 2005-01-10 07:51:34 -07:00 @@ -112,11 +112,15 @@ end-$(pcore) := pcore cacheflag-$(pcore) := -include $(clear_L2_L3) +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real +# machine. 
+ifeq ($(CONFIG_6xx),y) zimage-$(CONFIG_PPC_PREP) := zImage-PPLUS zimageinitrd-$(CONFIG_PPC_PREP) := zImage.initrd-PPLUS extra.o-$(CONFIG_PPC_PREP) := prepmap.o misc-$(CONFIG_PPC_PREP) += misc-prep.o mpc10x_memory.o end-$(CONFIG_PPC_PREP) := prep +endif end-$(CONFIG_SANDPOINT) := sandpoint cacheflag-$(CONFIG_SANDPOINT) := -include $(clear_L2_L3) -- Tom Rini http://gate.crashing.org/~trini/ From hch at lst.de Tue Jan 11 03:39:15 2005 From: hch at lst.de (Christoph Hellwig) Date: Mon, 10 Jan 2005 17:39:15 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050110145219.GB2226@smtp.west.cox.net> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> Message-ID: <20050110163914.GA11906@lst.de> On Mon, Jan 10, 2005 at 07:52:19AM -0700, Tom Rini wrote: > +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real > +# machine. Maybe we should prevent setting PPC_PREP to y for PPC970 instead? From trini at kernel.crashing.org Tue Jan 11 03:44:02 2005 From: trini at kernel.crashing.org (Tom Rini) Date: Mon, 10 Jan 2005 09:44:02 -0700 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050110163914.GA11906@lst.de> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <20050110163914.GA11906@lst.de> Message-ID: <20050110164402.GF2226@smtp.west.cox.net> On Mon, Jan 10, 2005 at 05:39:15PM +0100, Christoph Hellwig wrote: > On Mon, Jan 10, 2005 at 07:52:19AM -0700, Tom Rini wrote: > > +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real > > +# machine. > > Maybe we should prevent setting PPC_PREP to y for PPC970 instead? I don't know if that'll compile. It'd be nice because it means we could try splitting the PREP stuff out of the OpenFirmware (pmac/chrp) stuff again. 
-- Tom Rini http://gate.crashing.org/~trini/ From linas at austin.ibm.com Tue Jan 11 04:47:16 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Mon, 10 Jan 2005 11:47:16 -0600 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <16866.18083.212727.327170@cargo.ozlabs.ibm.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> <16866.18083.212727.327170@cargo.ozlabs.ibm.com> Message-ID: <20050110174716.GW22274@austin.ibm.com> Hi Paul, On Mon, Jan 10, 2005 at 08:10:59PM +1100, Paul Mackerras was heard to remark: > Anil Kumar Prasad writes: > > > On JS20, i get va in IO_REGION (0xE000....) while on > > p630 ioremap returns address in > > EEH_REGION(0xA000...). As soon as i try to dereference > > this returned va on p630, kernel crashes(dump is at > > the end of mail). > > You shouldn't ever directly dereference the result of ioremap. You > have to use readb/readw/readl and writeb/writew/writel. Paul, Please note that someone removed the EEH_REGION stuff recently, october-ish I think. I don't know why, I thought it was something you condoned. And so in the latest kernels, it *is* legal to directly dereference the result of ioremap. That is, Anil wouldn't have seen this bug if he'd been running the current BK sources. Was removing this mechanism the right thing to do? If so, why? It seemed like a great way to force everyone to use the readb/etc macros. 
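A minimal userspace model of the idea Linas describes: if the value handed back by ioremap() cannot be used as a plain pointer, only the accessors can reach the registers. Everything here is hypothetical (fake_ioremap(), fake_readl(), fake_writel(), IO_POISON and the explicit offset argument are not kernel APIs). The accessors convert from little-endian because PCI device registers are little-endian; the real ppc64 readl()/writel() additionally issue I/O barriers, which this model leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical poison bit: the top bit of an address.  Flipping it makes a
 * directly dereferenced cookie fault on x86-64 (non-canonical address). */
#define IO_POISON ((uintptr_t)1 << (sizeof(uintptr_t) * 8 - 1))

/* An integer type, so an accidental "*cookie" does not even compile. */
typedef uintptr_t io_cookie_t;

/* Little-endian <-> host conversion; the byte swap is its own inverse,
 * so the same helper serves both directions. */
static uint32_t le32_convert(uint32_t v)
{
    const uint16_t probe = 1;
    if (*(const uint8_t *)&probe == 1)      /* little-endian host: no-op */
        return v;
    return (v >> 24) | ((v >> 8) & 0xff00u) |
           ((v << 8) & 0xff0000u) | (v << 24);
}

/* Hypothetical stand-in for ioremap(): hand out a poisoned cookie. */
static io_cookie_t fake_ioremap(volatile uint32_t *regs)
{
    assert(((uintptr_t)regs & 3) == 0);     /* registers are 32-bit aligned */
    return (uintptr_t)regs ^ IO_POISON;
}

/* Hypothetical stand-ins for readl()/writel(): only they undo the poison.
 * Each performs a single 32-bit volatile access. */
static uint32_t fake_readl(io_cookie_t cookie, uintptr_t offset)
{
    volatile uint32_t *reg =
        (volatile uint32_t *)((cookie ^ IO_POISON) + offset);
    return le32_convert(*reg);
}

static void fake_writel(uint32_t val, io_cookie_t cookie, uintptr_t offset)
{
    volatile uint32_t *reg =
        (volatile uint32_t *)((cookie ^ IO_POISON) + offset);
    *reg = le32_convert(val);
}
```

Flipping the bit with XOR rather than OR keeps the round trip exact even if the host address already had that bit set. The price is an extra operation on every access, which is one reason to offer poisoning only as a debug option.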
--linas From j.glisse at gmail.com Tue Jan 11 05:14:28 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Mon, 10 Jan 2005 19:14:28 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050110145219.GB2226@smtp.west.cox.net> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> Message-ID: <4240b9160501101014317b8d85@mail.gmail.com> > Signed-off-by: Tom Rini > > --- 1.40/arch/ppc/boot/simple/Makefile 2005-01-03 16:49:19 -07:00 > +++ edited/arch/ppc/boot/simple/Makefile 2005-01-10 07:51:34 -07:00 > @@ -112,11 +112,15 @@ > end-$(pcore) := pcore > cacheflag-$(pcore) := -include $(clear_L2_L3) > > +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real > +# machine. > +ifeq ($(CONFIG_6xx),y) > zimage-$(CONFIG_PPC_PREP) := zImage-PPLUS > zimageinitrd-$(CONFIG_PPC_PREP) := zImage.initrd-PPLUS > extra.o-$(CONFIG_PPC_PREP) := prepmap.o > misc-$(CONFIG_PPC_PREP) += misc-prep.o mpc10x_memory.o > end-$(CONFIG_PPC_PREP) := prep > +endif > > end-$(CONFIG_SANDPOINT) := sandpoint > cacheflag-$(CONFIG_SANDPOINT) := -include $(clear_L2_L3) > This do not compile with this patch maybe need also to define CONFIG_6xx if PPC970 is selected as processor ? The errors are: undefined reference for cols, lines, vidmems, scroll, orig_x, orig_y in functions puts, ClearVideoMemory, putc Jerome Glisse From j.glisse at gmail.com Tue Jan 11 05:16:22 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Mon, 10 Jan 2005 19:16:22 +0100 Subject: U3 G5 AGP support patch (v4) In-Reply-To: <4240b91605010912414a5b1b67@mail.gmail.com> References: <4240b916050109072621440269@mail.gmail.com> <20050109160614.GA22839@lst.de> <4240b91605010909463e44bba8@mail.gmail.com> <4240b91605010912414a5b1b67@mail.gmail.com> Message-ID: <4240b916050110101647cfb8f9@mail.gmail.com> > It seems there is bug somewhere in my agp patch. 
I was playing with > r300 radeon and > i get a hard lockup (quite used to that while playing with r300 thought :() > > But after a bit of investigation it seems to be related to agp. Right now i am > porting an old tools from dri that test agpgart & thus agp. I finally may really > need to totaly split the u3 driver from the old uninorth. > > I will give a deeper look to track down the issue. In the mean time if some > one could test agp & radeon r200 on a g5. You will certainly lockup your g5 > but it should not burn, at least here i just got some smoke ;) > Finally this was because i was doing some nasty stuff elsewhere :) Thus AGP seems to work well, at least over here with some r300 test program using agp :) best, Jerome Glisse From trini at kernel.crashing.org Tue Jan 11 05:29:41 2005 From: trini at kernel.crashing.org (Tom Rini) Date: Mon, 10 Jan 2005 11:29:41 -0700 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <4240b9160501101014317b8d85@mail.gmail.com> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <4240b9160501101014317b8d85@mail.gmail.com> Message-ID: <20050110182940.GA3391@smtp.west.cox.net> On Mon, Jan 10, 2005 at 07:14:28PM +0100, Jerome Glisse wrote: > > Signed-off-by: Tom Rini > > > > --- 1.40/arch/ppc/boot/simple/Makefile 2005-01-03 16:49:19 -07:00 > > +++ edited/arch/ppc/boot/simple/Makefile 2005-01-10 07:51:34 -07:00 > > @@ -112,11 +112,15 @@ > > end-$(pcore) := pcore > > cacheflag-$(pcore) := -include $(clear_L2_L3) > > > > +# PPC_PREP can be set to y on a PPC970 configuration, which isn't a real > > +# machine. 
> > +ifeq ($(CONFIG_6xx),y) > > zimage-$(CONFIG_PPC_PREP) := zImage-PPLUS > > zimageinitrd-$(CONFIG_PPC_PREP) := zImage.initrd-PPLUS > > extra.o-$(CONFIG_PPC_PREP) := prepmap.o > > misc-$(CONFIG_PPC_PREP) += misc-prep.o mpc10x_memory.o > > end-$(CONFIG_PPC_PREP) := prep > > +endif > > > > end-$(CONFIG_SANDPOINT) := sandpoint > > cacheflag-$(CONFIG_SANDPOINT) := -include $(clear_L2_L3) > > > > This do not compile with this patch maybe need also to define > CONFIG_6xx if PPC970 is selected as processor ? I have a feeling CONFIG_6xx isn't selected for a good reason. Can you try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig and seeing if you can build / boot ? Thanks. -- Tom Rini http://gate.crashing.org/~trini/ From j.glisse at gmail.com Tue Jan 11 05:59:50 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Mon, 10 Jan 2005 19:59:50 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050110182940.GA3391@smtp.west.cox.net> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <4240b9160501101014317b8d85@mail.gmail.com> <20050110182940.GA3391@smtp.west.cox.net> Message-ID: <4240b91605011010593d2f3b3d@mail.gmail.com> > I have a feeling CONFIG_6xx isn't selected for a good reason. Can you > try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig > and seeing if you can build / boot ? Thanks. > > -- > Tom Rini > http://gate.crashing.org/~trini/ > Seems that this flags is linked to many things :) I tried removing PPC_PREP bool but the kernel fail to compile with again new errors : arch/ppc/kernel/built-in.o(.init.text+0x610): In function `DoSyscall': arch/ppc/kernel/entry.S: undefined reference to `prep_init' arch/ppc/platforms/built-in.o(.pmac.text+0x936): In function 'note_bootable_part': : undefined reference to `boot_dev' I attach my config, someone asked me for that previously but i crashed my system since, thus here it is. 
Jerome Glisse -------------- next part -------------- A non-text attachment was scrubbed... Name: config-ppc970 Type: application/octet-stream Size: 27921 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050110/902cf012/attachment.obj From trini at kernel.crashing.org Tue Jan 11 06:12:48 2005 From: trini at kernel.crashing.org (Tom Rini) Date: Mon, 10 Jan 2005 12:12:48 -0700 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <4240b91605011010593d2f3b3d@mail.gmail.com> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <4240b9160501101014317b8d85@mail.gmail.com> <20050110182940.GA3391@smtp.west.cox.net> <4240b91605011010593d2f3b3d@mail.gmail.com> Message-ID: <20050110191248.GB3391@smtp.west.cox.net> On Mon, Jan 10, 2005 at 07:59:50PM +0100, Jerome Glisse wrote: > > I have a feeling CONFIG_6xx isn't selected for a good reason. Can you > > try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig > > and seeing if you can build / boot ? Thanks. > > > > -- > > Tom Rini > > http://gate.crashing.org/~trini/ > > > > Seems that this flags is linked to many things :) I tried removing PPC_PREP > bool but the kernel fail to compile with again new errors : > One last thing before we just do what you suggested originally, can you hack it so that PPC_PREP is still set, but on 970 we still set CONFIG_6xx? Thanks again. 
-- Tom Rini http://gate.crashing.org/~trini/ From j.glisse at gmail.com Tue Jan 11 06:31:29 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Mon, 10 Jan 2005 20:31:29 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050110191248.GB3391@smtp.west.cox.net> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <4240b9160501101014317b8d85@mail.gmail.com> <20050110182940.GA3391@smtp.west.cox.net> <4240b91605011010593d2f3b3d@mail.gmail.com> <20050110191248.GB3391@smtp.west.cox.net> Message-ID: <4240b91605011011314bb06814@mail.gmail.com> On Mon, 10 Jan 2005 12:12:48 -0700, Tom Rini wrote: > On Mon, Jan 10, 2005 at 07:59:50PM +0100, Jerome Glisse wrote: > > > I have a feeling CONFIG_6xx isn't selected for a good reason. Can you > > > try, as a kludge, removing define_bool PPC_PREP from arch/ppc/Kconfig > > > and seeing if you can build / boot ? Thanks. > > > > > > -- > > > Tom Rini > > > http://gate.crashing.org/~trini/ > > > > > > > Seems that this flags is linked to many things :) I tried removing PPC_PREP > > bool but the kernel fail to compile with again new errors : > > > > One last thing before we just do what you suggested originally, can you > hack it so that PPC_PREP is still set, but on 970 we still set > CONFIG_6xx? Thanks again. > > -- > Tom Rini > http://gate.crashing.org/~trini/ > This issue must be strongly linked with Murphy's Law. I got another compile error when I added CONFIG_6xx=y to my kernel config. LD .tmp_vmlinux1 ld: arch/ppc/kernel/idle_6xx.o: No such file: No such file or directory Unfortunately I've got to leave (a trip for my studies) and I won't have access to any G5 or PPC970 running Linux until I come back on Friday or Saturday. I may still be able to read my mail until then. Does this disable_6xx_mmu function do something that we should really have on PPC970?
I haven't had enough time to look at this function and understand it. By the way, even though I'm pretty sure this is not related, my kernel is patched with one of my patches (I posted it on this mailing list) that adds support for the U3 AGP bridge on G5. This patch only affects a few files and, if I remember well, I have tested without it too with no success. Files affected by my patch: pciids.h, uninorth.c (char/driver/agp), uninorth.h (asm-ppc&64), and some changes in Kconfig of (char/driver/agp). One strange thing is that no one except me reports errors on compilation. Does no one else use Linux on a G5, am I alone :) ? best, Jerome Glisse From domen at coderock.org Tue Jan 11 06:59:58 2005 From: domen at coderock.org (domen at coderock.org) Date: Mon, 10 Jan 2005 20:59:58 +0100 Subject: [patch 1/1] ppc64: semicolon in rtasd.c Message-ID: <20050110195959.4D66A1F203@trashy.coderock.org> Hi. Comments and indentation suggest this was wrong. Signed-off-by: Domen Puncer --- kj-domen/arch/ppc64/kernel/rtasd.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) diff -puN arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c arch/ppc64/kernel/rtasd.c --- kj/arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c 2005-01-10 18:00:30.000000000 +0100 +++ kj-domen/arch/ppc64/kernel/rtasd.c 2005-01-10 18:00:30.000000000 +0100 @@ -486,7 +486,7 @@ static int __init rtas_init(void) /* No RTAS, only warn if we are on a pSeries box */ if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) { - if (systemcfg->platform & PLATFORM_PSERIES); + if (systemcfg->platform & PLATFORM_PSERIES) printk(KERN_ERR "rtasd: no event-scan on system\n"); return 1; } _ From hollis at penguinppc.org Tue Jan 11 07:15:57 2005 From: hollis at penguinppc.org (Hollis Blanchard) Date: Mon, 10 Jan 2005 20:15:57 +0000 Subject: email message sizes In-Reply-To: <78DE72FE-631B-11D9-AD26-000A95A0560C@penguinppc.org> References: <78DE72FE-631B-11D9-AD26-000A95A0560C@penguinppc.org> Message-ID: <200501102015.57394.hollis@penguinppc.org> On
Monday 10 January 2005 15:22, Hollis Blanchard wrote: > Hi all, I am one of two people who moderates these mailing lists. On > occasion, people send large emails to these lists. I am of the opinion > that 1MB emails should not be mass-mailed, but if you all have no > problem with that then I will approve them. > > So are any of you on modems, or operate near the limits of your mail > quotas? I'd like to hear comments either way: how large is ok to post > to these mailing lists? So far I have received 5 private mails indicating that 100KB is a reasonable maximum. If you disagree please speak up... -Hollis From paulus at samba.org Tue Jan 11 08:41:48 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 11 Jan 2005 08:41:48 +1100 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <20050110174716.GW22274@austin.ibm.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> <16866.18083.212727.327170@cargo.ozlabs.ibm.com> <20050110174716.GW22274@austin.ibm.com> Message-ID: <16866.63132.352016.732484@cargo.ozlabs.ibm.com> Linas Vepstas writes: > Please note that someone removed the EEH_REGION stuff recently, > october-ish I think. I don't know why, I thought it was something > you condoned. And so in the latest kernels, it *is* legal to directly > dereference the result of ioremap. It might work, but it's not legal on any architecture. I thought there was a file in the Documentation directory explaining that, but I can't find it now. Certainly it has been discussed on various mailing lists in the past. See for example: http://uwsg.iu.edu/hypermail/linux/kernel/0007.3/0591.html On ppc and ppc64, the ioremap return happens to be a valid effective address, but dereferencing it directly is still not right, since if you do that you miss out on the barriers you need to ensure that your loads and stores hit the device in program order. > Was removing this mechanism the right thing to do? If so, why? 
It was an enormous simplification and Linus was keen to do it. He actually looks at our code from time to time now that his desktop machine is a G5. :) > It seemed like a great way to force everyone to use the > readb/etc macros. Some architectures do in fact use ioremap cookie poisoning for that reason. We could do that as a debug option. Paul. From olof at austin.ibm.com Tue Jan 11 09:23:40 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Mon, 10 Jan 2005 16:23:40 -0600 Subject: [PPC64] Functions to reserve performance monitor hardware In-Reply-To: <20050110180127.GD22101@localhost.localdomain> References: <20050110180127.GD22101@localhost.localdomain> Message-ID: <20050110222340.GA13731@austin.ibm.com> On Tue, Jan 11, 2005 at 05:01:27AM +1100, David Gibson wrote: > This patch creates functions to reserve and release the performance > monitor hardware (including its interrupt), and makes oprofile use > them. I don't see where you make oprofile use the functions? op_model_* changes aren't included in the patch. > +int reserve_pmc_hardware(perf_irq_t new_perf_irq) > +{ > + int err = -EBUSY;; Keeping an extra semicolon around in case you need one in a hurry? :) > + spin_lock(&pmc_owner_lock); > + > + if (pmc_owner_caller) { > + printk(KERN_WARNING "reserve_pmc_hardware: " > + "PMC hardware busy (reserved by caller %p)\n", > + pmc_owner_caller); > + goto out; > + } > + > + pmc_owner_caller = __builtin_return_address(0); > + perf_irq = new_perf_irq ? : dummy_perf; > + > + err = 0; Maybe I'm the only one with such an opinion, but I find it more readable to set the error code in the error case (if section above) instead of defaulting to error and clearing it before returning. :) > + pmc_owner_caller = NULL; > + perf_irq = dummy_perf; > + > + spin_unlock(&pmc_owner_lock); Current oprofile code has an implicit mb(); after restoring perf_irq. I think the implied lwsync in spin_unlock is sufficient, but I wanted to mention it. 
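To make the locking discussion concrete, here is a minimal userspace model of the reserve/release pair under review. It is a sketch under stated assumptions, not the kernel code: a C11 atomic_flag spinlock stands in for spinlock_t, the owner tag is passed in explicitly instead of coming from __builtin_return_address(0), and reserve_pmc(), release_pmc(), pmc_lock() and pmc_unlock() are hypothetical names. In this model the release ordering on unlock plays the part that the lwsync in spin_unlock() plays in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

struct pt_regs;                                /* opaque, as in the kernel */
typedef void (*perf_irq_t)(struct pt_regs *);

/* Default handler.  The kernel's dummy_perf() clears the PMU exception
 * enables in MMCR0; this model just does nothing. */
static void dummy_perf(struct pt_regs *regs) { (void)regs; }

static atomic_flag pmc_owner_lock = ATOMIC_FLAG_INIT;
static void *pmc_owner;                        /* non-NULL means "reserved" */
static perf_irq_t perf_irq = dummy_perf;

static void pmc_lock(void)
{
    /* Acquire: spin until this caller is the one that set the flag. */
    while (atomic_flag_test_and_set_explicit(&pmc_owner_lock,
                                             memory_order_acquire))
        ;
}

static void pmc_unlock(void)
{
    /* Release: stores made under the lock (e.g. to perf_irq) are visible
     * to the next locker before it sees the lock as free. */
    atomic_flag_clear_explicit(&pmc_owner_lock, memory_order_release);
}

/* Hypothetical analogue of reserve_pmc_hardware(), with the error code set
 * in the error branch as suggested above. */
static int reserve_pmc(void *owner, perf_irq_t handler)
{
    int err = 0;

    pmc_lock();
    if (pmc_owner) {
        fprintf(stderr, "PMC hardware busy (reserved by %p)\n", pmc_owner);
        err = -EBUSY;
    } else {
        pmc_owner = owner;
        perf_irq = handler ? handler : dummy_perf;
    }
    pmc_unlock();
    return err;
}

/* Hypothetical analogue of release_pmc_hardware(). */
static void release_pmc(void)
{
    pmc_lock();
    assert(pmc_owner != NULL);                 /* plays the part of WARN_ON() */
    pmc_owner = NULL;
    perf_irq = dummy_perf;
    pmc_unlock();
}
```

A caller that passes a NULL handler still gets exclusive ownership, with dummy_perf installed, which matches the case of a user that does not need interrupts.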
How do you expect the function to be used, will there really be users reserving the hardware without registering the interrupt handler? If there are no such users then it could be nice to reserve using the handler instead of the return address. -Olof From anton at samba.org Tue Jan 11 10:00:15 2005 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Jan 2005 10:00:15 +1100 Subject: [patch 1/1] ppc64: semicolon in rtasd.c In-Reply-To: <20050110195959.4D66A1F203@trashy.coderock.org> References: <20050110195959.4D66A1F203@trashy.coderock.org> Message-ID: <20050110230015.GB14239@krispykreme.ozlabs.ibm.com> Nice catch! Anton -- From: Domen Puncer semicolon in rtasd.c Signed-off-by: Domen Puncer Acked-by: Anton Blanchard diff -puN arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c arch/ppc64/kernel/rtasd.c --- kj/arch/ppc64/kernel/rtasd.c~typo-arch_ppc64_kernel_rtasd.c 2005-01-10 18:00:30.000000000 +0100 +++ kj-domen/arch/ppc64/kernel/rtasd.c 2005-01-10 18:00:30.000000000 +0100 @@ -486,7 +486,7 @@ static int __init rtas_init(void) /* No RTAS, only warn if we are on a pSeries box */ if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) { - if (systemcfg->platform & PLATFORM_PSERIES); + if (systemcfg->platform & PLATFORM_PSERIES) printk(KERN_ERR "rtasd: no event-scan on system\n"); return 1; } _ From anton at samba.org Tue Jan 11 11:08:45 2005 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Jan 2005 11:08:45 +1100 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <16866.63132.352016.732484@cargo.ozlabs.ibm.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> <16866.18083.212727.327170@cargo.ozlabs.ibm.com> <20050110174716.GW22274@austin.ibm.com> <16866.63132.352016.732484@cargo.ozlabs.ibm.com> Message-ID: <20050111000845.GC14239@krispykreme.ozlabs.ibm.com> Hi, > It was an enormous simplification and Linus was keen to do it. He > actually looks at our code from time to time now that his desktop > machine is a G5. 
:) Roland (the infiniband guy) and Linus were behind it: http://marc.theaimsgroup.com/?l=linux-kernel&m=109579598620069&w=2 Looks like it was due to __raw_* not having any EEH checks. As a side note it's a worry that we don't have IO macros that order but don't byte swap. __raw_* (which doesn't order) is going to catch out a lot of driver writers I suspect. > Some architectures do in fact use ioremap cookie poisoning for that > reason. We could do that as a debug option. I've seen HPC stuff that wants to be able to mmap a PCI card's resources into userspace. Their hack on ppc64 was to look at the high nibble of the address and convert it to a non-EEH address if required :) I'm not sure how best to solve the userspace mmap issue but there are a few groups wanting that. Anton From david at gibson.dropbear.id.au Tue Jan 11 21:57:07 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 11 Jan 2005 21:57:07 +1100 Subject: [PPC64] Functions to reserve performance monitor hardware In-Reply-To: <20050110222340.GA13731@austin.ibm.com> References: <20050110180127.GD22101@localhost.localdomain> <20050110222340.GA13731@austin.ibm.com> Message-ID: <20050111105707.GC28175@localhost.localdomain> On Mon, Jan 10, 2005 at 04:23:40PM -0600, Olof Johansson wrote: > On Tue, Jan 11, 2005 at 05:01:27AM +1100, David Gibson wrote: > > > This patch creates functions to reserve and release the performance > > monitor hardware (including its interrupt), and makes oprofile use > > them. > > I don't see where you make oprofile use the functions? op_model_* > changes aren't included in the patch. Ah, bugger. I could have sworn I made the changes, wonder where I managed to drop them. > > +int reserve_pmc_hardware(perf_irq_t new_perf_irq) > > +{ > > + int err = -EBUSY;; > > Keeping an extra semicolon around in case you need one in a hurry? :) Oh, dear, I clearly wasn't having a good day.
> > + spin_lock(&pmc_owner_lock); > > + > > + if (pmc_owner_caller) { > > + printk(KERN_WARNING "reserve_pmc_hardware: " > > + "PMC hardware busy (reserved by caller %p)\n", > > + pmc_owner_caller); > > + goto out; > > + } > > + > > + pmc_owner_caller = __builtin_return_address(0); > > + perf_irq = new_perf_irq ? : dummy_perf; > > + > > + err = 0; > > Maybe I'm the only one with such an opinion, but I find it more readable > to set the error code in the error case (if section above) instead of > defaulting to error and clearing it before returning. :) Actually, I think I do too, but I've been experimenting with this style, since it seems to be rather common in the kernel. Anyway, revised below. > > + pmc_owner_caller = NULL; > > + perf_irq = dummy_perf; > > + > > + spin_unlock(&pmc_owner_lock); > > Current oprofile code has an implicit mb(); after restoring perf_irq. I > think the implied lwsync in spin_unlock is sufficient, but I wanted to > mention it. Yes, I did think about that, and figured the barrier in the spin_unlock() should be sufficient. > How do you expect the function to be used, will there really be users > reserving the hardware without registering the interrupt handler? I think it's entirely plausible that there could be. It would seem a bit yucky for a user that wasn't using interrupts to have to define their own copy of the dummy_perf() routine. > If > there are no such users then it could be nice to reserve using the > handler instead of the return address. Well, bear in mind that from the semantics point of view it's only the non-nullness of the return address that matters, so essentially it's just a flag. The rest of the return address is just there for debugging convenience. Anyway, patch with the abovementioned stupidities removed follows. Andrew, please apply: The PPC64 interrupt code includes a hook to call when an exception from the performance monitor unit occurs.
However, there's no way of reserving the hook properly, so if more than one bit of code tries to use it things will get ugly. Currently oprofile is the only user, but there are likely to be more in future e.g. perfctr, if and when it reaches a fit state for merging. This patch creates functions to reserve and release the performance monitor hardware (including its interrupt), and makes oprofile use them. It also creates a new arch/ppc64/kernel/pmc.c, in which we can put any future helper functions for handling the performance monitor counters. Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/kernel/pmc.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/ppc64/kernel/pmc.c 2005-01-11 10:37:52.001422584 +1100 @@ -0,0 +1,64 @@ +/* + * linux/arch/ppc64/kernel/pmc.c + * + * Copyright (C) 2004 David Gibson, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include + +#include +#include + +/* Ensure exceptions are disabled */ +static void dummy_perf(struct pt_regs *regs) +{ + unsigned int mmcr0 = mfspr(SPRN_MMCR0); + + mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO); + mtspr(SPRN_MMCR0, mmcr0); +} + +static spinlock_t pmc_owner_lock = SPIN_LOCK_UNLOCKED; +static void *pmc_owner_caller; /* mostly for debugging */ +perf_irq_t perf_irq = dummy_perf; + +int reserve_pmc_hardware(perf_irq_t new_perf_irq) +{ + int err = 0; + + spin_lock(&pmc_owner_lock); + + if (pmc_owner_caller) { + printk(KERN_WARNING "reserve_pmc_hardware: " + "PMC hardware busy (reserved by caller %p)\n", + pmc_owner_caller); + err = -EBUSY; + goto out; + } + + pmc_owner_caller = __builtin_return_address(0); + perf_irq = new_perf_irq ? 
: dummy_perf; + + out: + spin_unlock(&pmc_owner_lock); + return err; +} + +void release_pmc_hardware(void) +{ + spin_lock(&pmc_owner_lock); + + WARN_ON(! pmc_owner_caller); + + pmc_owner_caller = NULL; + perf_irq = dummy_perf; + + spin_unlock(&pmc_owner_lock); +} Index: working-2.6/arch/ppc64/kernel/traps.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/traps.c 2005-01-11 10:36:44.555424864 +1100 +++ working-2.6/arch/ppc64/kernel/traps.c 2005-01-11 10:36:46.969324088 +1100 @@ -40,6 +40,7 @@ #include #include #include +#include #ifdef CONFIG_DEBUGGER int (*__debugger)(struct pt_regs *regs); @@ -449,18 +450,7 @@ die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT); } -/* Ensure exceptions are disabled */ -static void dummy_perf(struct pt_regs *regs) -{ - unsigned int mmcr0 = mfspr(SPRN_MMCR0); - - mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO); - mtspr(SPRN_MMCR0, mmcr0); -} - -void (*perf_irq)(struct pt_regs *) = dummy_perf; - -EXPORT_SYMBOL(perf_irq); +extern perf_irq_t perf_irq; void performance_monitor_exception(struct pt_regs *regs) { Index: working-2.6/include/asm-ppc64/pmc.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-ppc64/pmc.h 2005-01-11 10:36:46.970323936 +1100 @@ -0,0 +1,29 @@ +/* + * pmc.h + * Copyright (C) 2004 David Gibson, IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +#ifndef _PPC64_PMC_H +#define _PPC64_PMC_H + +#include + +typedef void (*perf_irq_t)(struct pt_regs *); + +int reserve_pmc_hardware(perf_irq_t new_perf_irq); +void release_pmc_hardware(void); + +#endif /* _PPC64_PMC_H */ Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2005-01-11 10:36:44.555424864 +1100 +++ working-2.6/arch/ppc64/kernel/Makefile 2005-01-11 10:36:46.970323936 +1100 @@ -11,7 +11,7 @@ udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \ ptrace32.o signal32.o rtc.o init_task.o \ lmb.o cputable.o cpu_setup_power4.o idle_power4.o \ - iommu.o sysfs.o + iommu.o sysfs.o pmc.o obj-$(CONFIG_PPC_OF) += of_device.o Index: working-2.6/arch/ppc64/oprofile/common.c =================================================================== --- working-2.6.orig/arch/ppc64/oprofile/common.c 2005-01-06 10:47:48.000000000 +1100 +++ working-2.6/arch/ppc64/oprofile/common.c 2005-01-11 10:42:26.788317488 +1100 @@ -15,6 +15,7 @@ #include #include #include +#include #include "op_impl.h" @@ -22,9 +23,6 @@ extern struct op_ppc64_model op_model_power4; static struct op_ppc64_model *model; -extern void (*perf_irq)(struct pt_regs *); -static void (*save_perf_irq)(struct pt_regs *); - static struct op_counter_config ctr[OP_MAX_COUNTER]; static struct op_system_config sys; @@ -35,11 +33,12 @@ static int op_ppc64_setup(void) { - /* Install our interrupt handler into the existing hook. */ - save_perf_irq = perf_irq; - perf_irq = op_handle_interrupt; + int err; - mb(); + /* Grab the hardware */ + err = reserve_pmc_hardware(op_handle_interrupt); + if (err) + return err; /* Pre-compute the values to stuff in the hardware registers. 
*/ model->reg_setup(ctr, &sys, model->num_counters); @@ -52,10 +51,7 @@ static void op_ppc64_shutdown(void) { - mb(); - - /* Remove our interrupt handler. We may be removing this module. */ - perf_irq = save_perf_irq; + release_pmc_hardware(); } static void op_ppc64_cpu_start(void *dummy) -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From michael at ellerman.id.au Tue Jan 11 19:43:57 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 11 Jan 2005 19:43:57 +1100 Subject: [PATCH 2/2] ppc64: Fix iseries_veth module unload race and memory leak Message-ID: <20050111084358.0ABD917DF7@ozlabs.au.ibm.com> Hi All, When the iseries_veth driver module is unloaded there is the potential for an oops and also some memory leakage. Because the HvLpEvent_unregisterHandler() function did no synchronisation, it was possible for the handler that was being unregistered to be running on another CPU *after* HvLpEvent_unregisterHandler() had returned. This could cause the iseries_veth driver to leave work in the events work queue after the module had been unloaded. When that work was eventually executed we got an oops. In addition some of the data structures in the iseries_veth driver were not being correctly freed when the module was unloaded. This is the second patch, we make iseries_veth call flush_scheduled_work() after we are sure the handler is no longer running, and also fix the memory leaks. 
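The ordering in this second patch can be modelled in ordinary user-space C: stop every connection, unregister the handler, flush the shared work queue, and only then free the connection structures. This is a minimal sketch, not the driver's code. `struct conn`, `schedule_work_for()`, `flush_work()` and the fixed-size queue are hypothetical stand-ins for `veth_lpar_connection`, the kernel's events work queue and `flush_scheduled_work()`; none of them are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_CONNS 4   /* hypothetical; stands in for HVMAXARCHITECTEDLPS */
#define MAX_WORK  16

/* Hypothetical stand-in for struct veth_lpar_connection. */
struct conn {
	int *msgs;      /* per-connection buffer, freed on destroy */
	bool stopped;   /* set by the "stop" phase */
};

static struct conn *conns[MAX_CONNS];

/* Hypothetical deferred-work queue standing in for the events work queue.
 * Each queued item holds a pointer to the connection it will touch. */
static struct conn *work_queue[MAX_WORK];
static int work_pending;
static int work_ran;

static void schedule_work_for(struct conn *c)
{
	assert(work_pending < MAX_WORK);
	work_queue[work_pending++] = c;
}

/* Run every queued item. Each item dereferences its connection, so the
 * connection must still be allocated when this runs. */
static void flush_work(void)
{
	for (int i = 0; i < work_pending; i++) {
		struct conn *c = work_queue[i];
		work_ran += c->stopped ? 1 : 0;
	}
	work_pending = 0;
}

/* Phase 1: quiesce the connection but keep its memory alive. */
static void stop_connection(int i)
{
	if (conns[i])
		conns[i]->stopped = true;
}

/* Phase 2: free the memory; safe only once no work can reference it. */
static void destroy_connection(int i)
{
	struct conn *c = conns[i];

	if (!c)
		return;
	free(c->msgs);   /* free(NULL) is a no-op, as kfree(NULL) is */
	free(c);
	conns[i] = NULL;
}

static void module_cleanup(void)
{
	for (int i = 0; i < MAX_CONNS; i++)
		stop_connection(i);
	/* The handler would be unregistered here. */
	flush_work();    /* drain work queued by late callbacks */
	for (int i = 0; i < MAX_CONNS; i++)
		destroy_connection(i);
}
```

Swapping the flush and the destroy loop in `module_cleanup()` would make `flush_work()` dereference freed memory, which is the user-space analogue of the oops described above.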
iseries_veth.c | 26 ++++++++++++++++++++++---- 1 files changed, 22 insertions(+), 4 deletions(-) Signed-off-by: Michael Ellerman diff -urN 2.6.10-ppc64-stock/drivers/net/iseries_veth.c 2.6.10-ppc64-work/drivers/net/iseries_veth.c --- 2.6.10-ppc64-stock/drivers/net/iseries_veth.c 2004-12-25 10:14:43.000000000 +1100 +++ 2.6.10-ppc64-work/drivers/net/iseries_veth.c 2005-01-11 18:40:21.811722242 +1100 @@ -642,7 +642,7 @@ return 0; } -static void veth_destroy_connection(u8 rlp) +static void veth_stop_connection(u8 rlp) { struct veth_lpar_connection *cnx = veth_cnx[rlp]; @@ -671,9 +671,18 @@ HvLpEvent_Type_VirtualLan, cnx->num_ack_events, NULL, NULL); +} + +static void veth_destroy_connection(u8 rlp) +{ + struct veth_lpar_connection *cnx = veth_cnx[rlp]; - if (cnx->msgs) - kfree(cnx->msgs); + if (! cnx) + return; + + kfree(cnx->msgs); + kfree(cnx); + veth_cnx[rlp] = NULL; } /* @@ -1375,9 +1384,18 @@ vio_unregister_driver(&veth_driver); for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) - veth_destroy_connection(i); + veth_stop_connection(i); HvLpEvent_unregisterHandler(HvLpEvent_Type_VirtualLan); + + /* Hypervisor callbacks may have scheduled more work while we + * were destroying connections. Now that we've disconnected from + * the hypervisor make sure everything's finished. */ + flush_scheduled_work(); + + for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) + veth_destroy_connection(i); + } module_exit(veth_module_cleanup); From michael at ellerman.id.au Tue Jan 11 19:43:57 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 11 Jan 2005 19:43:57 +1100 Subject: [PATCH 1/2] ppc64: Fix iseries_veth module unload race and memory leak Message-ID: <20050111084357.7C3F317DDF@ozlabs.au.ibm.com> Hi All, When the iseries_veth driver module is unloaded there is the potential for an oops and also some memory leakage. 
Because the HvLpEvent_unregisterHandler() function did no synchronisation, it was possible for the handler that was being unregistered to be running on another CPU *after* HvLpEvent_unregisterHandler() had returned. This could cause the iseries_veth driver to leave work in the events work queue after the module had been unloaded. When that work was eventually executed we got an oops. In addition some of the data structures in the iseries_veth driver were not being correctly freed when the module was unloaded. This is the first patch, which makes HvLpEvent_unregisterHandler() work. arch/ppc64/kernel/HvLpEvent.c | 8 ++++++++ include/asm-ppc64/iSeries/HvLpEvent.h | 3 +++ 2 files changed, 11 insertions(+) Signed-off-by: Michael Ellerman diff -urN 2.6.10-ppc64-stock/arch/ppc64/kernel/HvLpEvent.c 2.6.10-ppc64-work/arch/ppc64/kernel/HvLpEvent.c --- 2.6.10-ppc64-stock/arch/ppc64/kernel/HvLpEvent.c 2004-06-16 17:12:51.000000000 +1000 +++ 2.6.10-ppc64-work/arch/ppc64/kernel/HvLpEvent.c 2005-01-10 16:13:33.381994263 +1100 @@ -34,10 +34,18 @@ int HvLpEvent_unregisterHandler( HvLpEvent_Type eventType ) { int rc = 1; + + might_sleep(); + if ( eventType < HvLpEvent_Type_NumTypes ) { if ( !lpEventHandlerPaths[eventType] ) { lpEventHandler[eventType] = NULL; rc = 0; + + /* We now sleep until all other CPUs have scheduled. This ensures that + * the deletion is seen by all other CPUs, and that the deleted handler + * isn't still running on another CPU when we return. 
*/ + synchronize_kernel(); } } return rc; diff -urN 2.6.10-ppc64-stock/include/asm-ppc64/iSeries/HvLpEvent.h 2.6.10-ppc64-work/include/asm-ppc64/iSeries/HvLpEvent.h --- 2.6.10-ppc64-stock/include/asm-ppc64/iSeries/HvLpEvent.h 2004-02-04 14:44:05.000000000 +1100 +++ 2.6.10-ppc64-work/include/asm-ppc64/iSeries/HvLpEvent.h 2005-01-10 16:11:18.899255131 +1100 @@ -75,6 +75,9 @@ extern int HvLpEvent_registerHandler( HvLpEvent_Type eventType, LpEventHandler hdlr); // Unregister a handler for an event type +// This call will sleep until the handler being removed is guaranteed to +// be no longer executing on any CPU. Do not call with locks held. +// // returns 0 on success // Unregister will fail if there are any paths open for the type extern int HvLpEvent_unregisterHandler( HvLpEvent_Type eventType ); From clark at esteem.com Wed Jan 12 03:19:53 2005 From: clark at esteem.com (Conn Clark) Date: Tue, 11 Jan 2005 08:19:53 -0800 Subject: email message sizes In-Reply-To: <200501102015.57394.hollis@penguinppc.org> References: <78DE72FE-631B-11D9-AD26-000A95A0560C@penguinppc.org> <200501102015.57394.hollis@penguinppc.org> Message-ID: <41E3FCA9.1060705@esteem.com> Hollis Blanchard wrote: > On Monday 10 January 2005 15:22, Hollis Blanchard wrote: > >>Hi all, I am one of two people who moderates these mailing lists. On >>occasion, people send large emails to these lists. I am of the opinion >>that 1MB emails should not be mass-mailed, but if you all have no >>problem with that then I will approve them. >> >>So are any of you on modems, or operate near the limits of your mail >>quotas? I'd like to hear comments either way: how large is ok to post >>to these mailing lists? > > > So far I have received 5 private mails indicating that 100KB is a reasonable > maximum. If you disagree please speak up... > > -Hollis I say 101K because I think it should be 100k and I know I will want to send something just over the limit. 
-- Conn Clark ***************************************************************** Give a man a match and you heat him for a moment. Set him on fire and you'll heat him for life. ***************************************************************** Conn Clark Engineering Stooge clark at esteem.com Electronic Systems Technology Inc. www.esteem.com Stock Ticker Symbol ELST From linas at austin.ibm.com Wed Jan 12 09:17:23 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Tue, 11 Jan 2005 16:17:23 -0600 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <20050111000845.GC14239@krispykreme.ozlabs.ibm.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> <16866.18083.212727.327170@cargo.ozlabs.ibm.com> <20050110174716.GW22274@austin.ibm.com> <16866.63132.352016.732484@cargo.ozlabs.ibm.com> <20050111000845.GC14239@krispykreme.ozlabs.ibm.com> Message-ID: <20050111221723.GE23690@austin.ibm.com> On Tue, Jan 11, 2005 at 11:08:45AM +1100, Anton Blanchard was heard to remark: > > Ive seen HPC stuff that wants to be able to mmap a PCI cards resources into > userspace. Their hack on ppc64 was to look at the high nibble of the > address and convert it to a non EEH address if required :) > > Im not sure how best to solve the userspace mmap issue but there are a > few groups wanting that. Somewhat off-topic ... but ... 1) If you design your hardware correctly, there are some amazing things you can do (performance-wise) by mmapping PCI card resources into user space. If your hardware is done right, then user corruption can't hurt the system. This was the de facto method for getting high-performance graphics on IBM RS/6000, SGI, HP and Sun workstations many moons ago. 2) There is interest in the virtual I/O community in mmapping funky stuff to userspace, but that conversation may be for a different day.
The question is (for example) how to build a high-performance virtual scsi server in userspace (without kernel pieces) which is a design point some people like. Later... --linas From linas at austin.ibm.com Wed Jan 12 09:27:08 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Tue, 11 Jan 2005 16:27:08 -0600 Subject: [PATCH] PPC64: Trivial Cleanup: EEH_REGION Message-ID: <20050111222708.GF23690@austin.ibm.com> Hi Paul, Please forward upstream if you agree. This is a dumb, dorky cleanup patch: Per last round of emails, the concept of EEH _REGION is gone, but a few stubs remained. This patch removes them. Note there is some funny business in the SLB code that I did not understand, and so I left that alone. I'm guessing that it should be cut out as well. Signed-off-by: Linas Vepstas --linas -------------- next part -------------- ===== arch/ppc64/mm/hash_utils.c 1.55 vs edited ===== --- 1.55/arch/ppc64/mm/hash_utils.c 2004-10-28 02:39:49 -05:00 +++ edited/arch/ppc64/mm/hash_utils.c 2005-01-10 16:58:40 -06:00 @@ -295,12 +295,6 @@ int hash_page(unsigned long ea, unsigned vsid = get_kernel_vsid(ea); break; #if 0 - case EEH_REGION_ID: - /* - * Should only be hit if there is an access to MMIO space - * which is protected by EEH. - * Send the problem up to do_page_fault - */ case KERNEL_REGION_ID: /* * Should never get here - entire 0xC0... region is bolted. ===== arch/ppc64/mm/slb.c 1.3 vs edited ===== --- 1.3/arch/ppc64/mm/slb.c 2004-09-03 04:08:16 -05:00 +++ edited/arch/ppc64/mm/slb.c 2005-01-10 17:03:36 -06:00 @@ -75,6 +75,8 @@ static void slb_flush_and_rebolt(void) : "memory"); } +#define EEHREGIONBASE ASM_CONST(0xA000000000000000) + /* Flush all user entries from the segment table of the current processor. 
*/ void switch_slb(struct task_struct *tsk, struct mm_struct *mm) { ===== include/asm-ppc64/page.h 1.36 vs edited ===== --- 1.36/include/asm-ppc64/page.h 2004-10-28 02:39:49 -05:00 +++ edited/include/asm-ppc64/page.h 2005-01-10 16:59:50 -06:00 @@ -203,10 +203,8 @@ extern int page_is_ram(unsigned long pfn #define KERNELBASE PAGE_OFFSET #define VMALLOCBASE ASM_CONST(0xD000000000000000) #define IOREGIONBASE ASM_CONST(0xE000000000000000) -#define EEHREGIONBASE ASM_CONST(0xA000000000000000) #define IO_REGION_ID (IOREGIONBASE>>REGION_SHIFT) -#define EEH_REGION_ID (EEHREGIONBASE>>REGION_SHIFT) #define VMALLOC_REGION_ID (VMALLOCBASE>>REGION_SHIFT) #define KERNEL_REGION_ID (KERNELBASE>>REGION_SHIFT) #define USER_REGION_ID (0UL) From david at gibson.dropbear.id.au Wed Jan 12 11:18:35 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 12 Jan 2005 11:18:35 +1100 Subject: [PATCH] PPC64: Trivial Cleanup: EEH_REGION In-Reply-To: <20050111222708.GF23690@austin.ibm.com> References: <20050111222708.GF23690@austin.ibm.com> Message-ID: <20050112001835.GA12816@localhost.localdomain> On Tue, Jan 11, 2005 at 04:27:08PM -0600, Linas Vepstas wrote: > > Hi Paul, > > Please forward upstream if you agree. > > This is a dumb, dorky cleanup patch: > Per last round of emails, the concept of EEH _REGION is gone, > but a few stubs remained. This patch removes them. > > Note there is some funny business in the SLB code that > I did not understand, and so I left that alone. > I'm guessing that it should be cut out as well. Yes and no. The code that's there needs to stay - it's a workaround for a POWER5 hardware bug - but it doesn't have any real connection to EEH. The only reason we use EEHREGIONBASE is that it's a segment address which will never have anything real mapped into it. 0xFFFFFFFFF0000000 would do just as well. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! 
http://www.ozlabs.org/people/dgibson From ahuja at austin.ibm.com Wed Jan 12 12:08:13 2005 From: ahuja at austin.ibm.com (Manish Ahuja) Date: Tue, 11 Jan 2005 19:08:13 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled. Message-ID: <41E4787D.90309@austin.ibm.com> There is a requirement to collect real usage values of each partition in LPAR environment on pseries as well as iseries. This patch enables that feature. The current purr (processor Utilization register ) values of each of the processors is stored in a per_cpu data array. this is then summed and used to calculate various numbers for managing lpars. The patch also calculates how much real cpu time each process uses and stores this value in a ppc64 specific struct. The value is needed by CKRM to do further calculations. Signed-off-by: Manish Ahuja -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050111/b931e9b9/attachment.txt From paulus at samba.org Wed Jan 12 13:36:56 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 12 Jan 2005 13:36:56 +1100 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <41E4787D.90309@austin.ibm.com> References: <41E4787D.90309@austin.ibm.com> Message-ID: <16868.36168.772082.315933@cargo.ozlabs.ibm.com> Manish Ahuja writes: > This patch enables that feature. The current purr (processor Utilization > register ) > values of each of the processors is stored in a per_cpu data array. this > is then > summed and used to calculate various numbers for managing lpars. Don't you also need to update purr_data_array in timer_interrupt as well? You seem to be doing that only on context switch, which won't be updated in a timely fashion necessarily (think of a compute-bound task on a lightly-loaded machine). 
> + for_each_cpu(cpu){ > + cus = &per_cpu(purr_data_array, cpu); > + sum_purr += cus->current_purr; > + } The spacing is wrong here, it should be "for_each_cpu(cpu) {" and the "}" should be one tab to the left of where it is. > +/* Used to store Processor Utilization register (purr) values */ > +DECLARE_PER_CPU(struct purr_data, purr_data_array); > + > +struct purr_data { > + u64 current_purr; /* Holds the current purr register values */ > +}; Do we really need a struct to store one thing? Are there other things you plan to add later? Paul. From akpm at osdl.org Wed Jan 12 14:51:27 2005 From: akpm at osdl.org (Andrew Morton) Date: Tue, 11 Jan 2005 19:51:27 -0800 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <41E4787D.90309@austin.ibm.com> References: <41E4787D.90309@austin.ibm.com> Message-ID: <20050111195127.23300721.akpm@osdl.org> Manish Ahuja wrote: > > There is a requirement to collect real usage values of each partition in > LPAR environment > on pseries as well as iseries. What (if any) relationship does this have to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-introduce-cputime.patch ? From olof at austin.ibm.com Wed Jan 12 15:06:28 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 11 Jan 2005 22:06:28 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <41E4787D.90309@austin.ibm.com> References: <41E4787D.90309@austin.ibm.com> Message-ID: <20050112040628.GA13221@austin.ibm.com> Hi Manish, On Tue, Jan 11, 2005 at 07:08:13PM -0600, Manish Ahuja wrote: > The patch also calculates how much real cpu time each process uses and > stores this value in a ppc64 specific struct. I was going to ask a couple of questions about this and noticed Andrew Morton's reply pointing at cputime. That answered most of them (how other archs might be doing it). 
> The value is needed by CKRM to do further calculations. How will CKRM use this? Does it have architecture-specific code to dig this out of the thread_struct again? Could they use the cputime interface if we hooked into that instead? Finally: There's two ways to read PURR on our platform: One is to read the SPR value, the other to get it from the hypervisor via the H_PURR call. Do they measure the same thing and stay consistent? -Olof From scheel at vnet.ibm.com Wed Jan 12 19:56:53 2005 From: scheel at vnet.ibm.com (Jeff Scheel) Date: Wed, 12 Jan 2005 08:56:53 +0000 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <16868.36168.772082.315933@cargo.ozlabs.ibm.com> References: <41E4787D.90309@austin.ibm.com> <16868.36168.772082.315933@cargo.ozlabs.ibm.com> Message-ID: <1105520213.25534.17.camel@sheepdog.rchland.ibm.com> On Wed, 2005-01-12 at 02:36, Paul Mackerras wrote: > Don't you also need to update purr_data_array in timer_interrupt as > well? You seem to be doing that only on context switch, which won't > be updated in a timely fashion necessarily (think of a compute-bound > task on a lightly-loaded machine). I agree it doesn't sound like only collecting this data at context switch does the trick. If we hook the timer (say in the decr path), then you need not have the code in context switch. That is until you implement a tickless timer and decr goes away. > Do we really need a struct to store one thing? Are there other things > you plan to add later? It seems to me that if we tucked the last PURR aside on each decrementer tick, we could simply let the kernel tasks which need this information retrieve it as long as we can guarantee atomic update of the values from the interrupt level. Then, interfaces like /proc/ppc64/lparcfg can do summing and other interfaces can use only the processor value if that's all they need. 
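The scheme Jeff outlines can be sketched in user-space C11: each CPU's tick stores its latest PURR reading in a per-CPU slot with a single atomic store, and an lparcfg-style reader sums the slots. This is a hypothetical illustration, not the posted patch. `NR_CPUS_SKETCH`, `last_purr`, `purr_tick()` and `purr_sum()` are invented names, and the `purr` argument stands for the value the real code would read from the PURR SPR. The loop uses the brace placement Paul asked for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count for the sketch */

/* One slot per CPU, in the spirit of a PACA field or a per-CPU variable.
 * _Atomic makes each 64-bit store indivisible with respect to readers
 * running on other threads. */
static _Atomic uint64_t last_purr[NR_CPUS_SKETCH];

/* Called from the (simulated) decrementer tick on `cpu`. */
static void purr_tick(int cpu, uint64_t purr)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	atomic_store_explicit(&last_purr[cpu], purr, memory_order_relaxed);
}

/* What an lparcfg-style reader would do: sum the latest snapshots. */
static uint64_t purr_sum(void)
{
	uint64_t sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		sum += atomic_load_explicit(&last_purr[cpu],
					    memory_order_relaxed);
	}
	return sum;
}
```

Because each tick overwrites the previous snapshot, a reader that only needs one processor's value can load that slot directly instead of summing.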
Given this, I'd vote for sticking the last PURR value for a processor in some per-processor structure that exists today, like the Paca. Thoughts? -- Jeff Scheel (scheel at vnet.ibm.com) From scheel at vnet.ibm.com Wed Jan 12 19:47:23 2005 From: scheel at vnet.ibm.com (Jeff Scheel) Date: Wed, 12 Jan 2005 08:47:23 +0000 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <20050112040628.GA13221@austin.ibm.com> References: <41E4787D.90309@austin.ibm.com> <20050112040628.GA13221@austin.ibm.com> Message-ID: <1105519643.25534.7.camel@sheepdog.rchland.ibm.com> On Wed, 2005-01-12 at 04:06, Olof Johansson wrote: > > The value is needed by CKRM to do further calculations. > > How will CKRM use this? Does it have architecture-specific code to > dig this out of the thread_struct again? Could they use the cputime > interface if we hooked into that instead? The only interface which exists today to retrieve PURR is /proc/ppc64/lparcfg, which provides PURR summed across all processors. We are working on other means to retrieve more specific data in the near future. > Finally: There's two ways to read PURR on our platform: One is to read > the SPR value, the other to get it from the hypervisor via the H_PURR > call. Do they measure the same thing and stay consistent? Olof, you are correct. We'll want to go directly to the hardware and avoid the overhead of a hypervisor call. Only in the instance where the hypervisor is emulating PURR will we want to use the hypervisor call. The "art" is detecting when/if that is occurring. Dave E. should be able to help us with this. -- Jeff Scheel (scheel at vnet.ibm.com) From ahuja at austin.ibm.com Thu Jan 13 03:30:21 2005 From: ahuja at austin.ibm.com (Manish Ahuja) Date: Wed, 12 Jan 2005 10:30:21 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled.
In-Reply-To: <16868.36168.772082.315933@cargo.ozlabs.ibm.com> References: <41E4787D.90309@austin.ibm.com> <16868.36168.772082.315933@cargo.ozlabs.ibm.com> Message-ID: <41E5509D.2070102@austin.ibm.com> Paul Mackerras wrote: > >Don't you also need to update purr_data_array in timer_interrupt as >well? You seem to be doing that only on context switch, which won't >be updated in a timely fashion necessarily (think of a compute-bound >task on a lightly-loaded machine). > > > Yes, I do need to add this in other places to improve the collection times. I have stepped away from using the old system completely and would actually like to add more collection points in interrupt routines. This will also enable me to collect real system time and other data which i plan to use at other places. I held that piece back since I saw what martin and john have been doing. Will put a more cohesive patch out. But this bit will remain unchanged with the other additions. >>+ for_each_cpu(cpu) >>+ cus = &per_cpu(purr_data_array, cpu); >>+ sum_purr += cus->current_purr; >>+ } >> >> > >The spacing is wrong here, it should be "for_each_cpu(cpu) {" and the >"}" should be one tab to the left of where it is. > > > Wilco .. will fix it .. >>+/* Used to store Processor Utilization register (purr) values */ >>+DECLARE_PER_CPU(struct purr_data, purr_data_array); >>+ >>+struct purr_data { >>+ u64 current_purr; /* Holds the current purr register values */ >>+}; >> >> > >Do we really need a struct to store one thing? Are there other things >you plan to add later? > > > In my prototype there are a few more things. But since the other patch is not final, I only added the one thing, that I knew for sure I wanted. Having other members and not using them, generally gets you knocked on the head... so... I definitely plan to add other things.. 
Manish From ahuja at austin.ibm.com Thu Jan 13 03:37:32 2005 From: ahuja at austin.ibm.com (Manish Ahuja) Date: Wed, 12 Jan 2005 10:37:32 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <20050112040628.GA13221@austin.ibm.com> References: <41E4787D.90309@austin.ibm.com> <20050112040628.GA13221@austin.ibm.com> Message-ID: <41E5524C.6000107@austin.ibm.com> >How will CKRM use this? Does it have architecture-specific code to >dig this out of the thread_struct again? Could they use the cputime >interface if we hooked into that instead? > > > I have provided them my test machine and they are working on setting up their stuff and as things get clear on whether they wish to use cputime interface or collect directly, I shall accordingly provide a small patch to enable them on ppc64. From will_schmidt at vnet.ibm.com Thu Jan 13 04:19:38 2005 From: will_schmidt at vnet.ibm.com (will schmidt) Date: Wed, 12 Jan 2005 11:19:38 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <1105519643.25534.7.camel@sheepdog.rchland.ibm.com> References: <41E4787D.90309@austin.ibm.com> <20050112040628.GA13221@austin.ibm.com> <1105519643.25534.7.camel@sheepdog.rchland.ibm.com> Message-ID: <41E55C2A.2030309@vnet.ibm.com> Jeff Scheel wrote: > On Wed, 2005-01-12 at 04:06, Olof Johansson wrote: ... > Olof, you are correct. We'll want to go directly to the hardware and > avoid the overhead of a hypervisor call. Only is the instance where the > hypervisor is emulating PURR will we want to use the hypervisor call. > The "art" is detecting when/if that is occurring. Dave E. should be > able to help us with this. Related to the PURR hcall comments. (Yeah, I already visited that hcall/mfspr topic once.. :-) ) "While there is an hcall for reading the purr, and that hcall will work, it should not be used on [Power5] systems. 
on GR and later processors the OS should be doing a mfspr PURR directly. The purpose of the hcall was for prototyping PHYP/PURR behavior on pre-GR processors. " From paulus at samba.org Thu Jan 13 21:35:25 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Jan 2005 21:35:25 +1100 Subject: [PATCH] PPC64 Move thread_info flags to its own cache line Message-ID: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> This patch fixes a problem I have been seeing since all the preempt changes went in, which is that ppc64 SMP systems would livelock randomly if preempt was enabled. It turns out that what was happening was that one cpu was spinning in spin_lock_irq (the version at line 215 of kernel/spinlock.c) madly doing preempt_enable() and preempt_disable() calls. The other cpu had the lock and was trying to set the TIF_NEED_RESCHED flag for the task running on the first cpu. That is an atomic operation which has to be retried if another cpu writes to the same cacheline between the load and the store, which the other cpu was doing every time it did preempt_enable() or preempt_disable(). I decided to move the thread_info flags field into the next cache line, since it is the only field that would regularly be modified by cpus other than the one running the task that owns the thread_info. (OK possibly the `cpu' field would be on a rebalance; I don't know the rebalancing code, but that should be pretty infrequent.) Thus, moving the flags field seems like a good idea generally as well as solving the immediate problem. For the record I am pretty unhappy with the code we use for spin_lock et al. with preemption turned on (the BUILD_LOCK_OPS stuff in spinlock.c). For a start we do the atomic op (_raw_spin_trylock) each time around the loop. That is going to be generating a lot of unnecessary bus (or fabric) traffic. Instead, after we fail to get the lock we should poll it with simple loads until we see that it is clear and then retry the atomic op. 
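The load-then-retry scheme Paul describes is the classic test-and-test-and-set lock. Here is a minimal user-space sketch with C11 atomics. It is not the kernel's `_raw_spin_trylock` or `BUILD_LOCK_OPS` code: `ttas_lock_t` and its functions are hypothetical, and the sketch leaves out both preemption handling and the `__spin_yield` hypervisor hint that Paul goes on to mention.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for the sketch: 0 = free, 1 = held. */
typedef struct {
	atomic_int locked;
} ttas_lock_t;

/* The only read-modify-write; this is the operation that needs the
 * cache line exclusively and so generates coherence traffic. */
static bool ttas_trylock(ttas_lock_t *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void ttas_lock(ttas_lock_t *l)
{
	while (!ttas_trylock(l)) {
		/* Spin on plain loads. On a typical coherent cache the line
		 * stays shared locally until the holder writes it, so this
		 * loop adds no bus traffic. A real implementation would put
		 * cpu_relax() or a hypervisor yield here. */
		while (atomic_load_explicit(&l->locked, memory_order_relaxed))
			;
	}
}

static void ttas_unlock(ttas_lock_t *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```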
Assuming a reasonable cache design, the loads won't generate any bus traffic until another cpu writes to the cacheline containing the lock. Secondly we have lost the __spin_yield call that we had on ppc64, which is an important optimization when we are running under the hypervisor. I can't just put that in cpu_relax because I need to know which (virtual) cpu is holding the lock, so that I can tell the hypervisor which virtual cpu to give my time slice to. That information is stored in the lock variable, which is why __spin_yield needs the address of the lock. Signed-off-by: Paul Mackerras diff -urN linux-2.5/include/asm-ppc64/thread_info.h test/include/asm-ppc64/thread_info.h --- linux-2.5/include/asm-ppc64/thread_info.h 2004-12-18 08:35:35.000000000 +1100 +++ test/include/asm-ppc64/thread_info.h 2005-01-13 18:36:24.000000000 +1100 @@ -12,6 +12,7 @@ #ifndef __ASSEMBLY__ #include +#include #include #include #include @@ -22,12 +23,13 @@ struct thread_info { struct task_struct *task; /* main task structure */ struct exec_domain *exec_domain; /* execution domain */ - unsigned long flags; /* low level flags */ int cpu; /* cpu we're on */ int preempt_count; struct restart_block restart_block; /* set by force_successful_syscall_return */ unsigned char syscall_noerror; + /* low level flags - has atomic operations done on it */ + unsigned long flags ____cacheline_aligned_in_smp; }; /* @@ -39,12 +41,12 @@ { \ .task = &tsk, \ .exec_domain = &default_exec_domain, \ - .flags = 0, \ .cpu = 0, \ .preempt_count = 1, \ .restart_block = { \ .fn = do_no_restart_syscall, \ }, \ + .flags = 0, \ } #define init_thread_info (init_thread_union.thread_info) From paulus at samba.org Thu Jan 13 21:37:52 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Jan 2005 21:37:52 +1100 Subject: [PATCH] PPC64 Disable preemption in flush_tlb_pending Message-ID: <16870.20352.503047.221064@cargo.ozlabs.ibm.com> The preempt debug stuff found a place where we were using smp_processor_id() without 
having preemption disabled, in flush_tlb_pending. This patch fixes it by using get_cpu_var and put_cpu_var instead of the __get_cpu_var variant. Signed-off-by: Paul Mackerras diff -urN linux-2.5/include/asm-ppc64/tlbflush.h test/include/asm-ppc64/tlbflush.h --- linux-2.5/include/asm-ppc64/tlbflush.h 2004-06-07 08:25:32.000000000 +1000 +++ test/include/asm-ppc64/tlbflush.h 2005-01-13 19:35:37.000000000 +1100 @@ -32,10 +32,11 @@ static inline void flush_tlb_pending(void) { - struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch); + struct ppc64_tlb_batch *batch = &get_cpu_var(ppc64_tlb_batch); if (batch->index) __flush_tlb_pending(batch); + put_cpu_var(ppc64_tlb_batch); } #define flush_tlb_mm(mm) flush_tlb_pending() From paulus at samba.org Thu Jan 13 21:41:36 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Jan 2005 21:41:36 +1100 Subject: [PATCH] PPC64 Call preempt_schedule on exception exit Message-ID: <16870.20576.417821.693961@cargo.ozlabs.ibm.com> This patch mirrors the recent changes on x86 to call preempt_schedule rather than schedule in the exception exit path, in the case where the preempt_count is zero and the TIF_NEED_RESCHED bit is set. I'm a little concerned that this means that we have a window where interrupts are enabled and we are on our way into preempt_schedule, but preempt_count is still zero. Ingo's proposed preempt_schedule_irq would fix this, and I think something like that should go in. 
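The control flow of the patched exit path can be paraphrased in C to show where the window Paul worries about sits. This is a hypothetical model, not kernel code: the globals stand in for MSR_EE, TI_PREEMPT and TIF_NEED_RESCHED, `resched_requests` is a counter so the loop can be exercised more than once, and the `PREEMPT_ACTIVE` value is assumed. `preempt_schedule_stub()` only follows the rough shape of `preempt_schedule()`, which sets PREEMPT_ACTIVE itself; that is why the assembly no longer stores it.

```c
#include <assert.h>
#include <stdbool.h>

#define PREEMPT_ACTIVE 0x10000000   /* value assumed for the sketch */

/* Hypothetical stand-ins for MSR_EE, TI_PREEMPT and TIF_NEED_RESCHED. */
static bool irqs_on;
static int preempt_count;
static int resched_requests;   /* > 0 means TIF_NEED_RESCHED is set */
static int schedule_calls;

static void schedule_stub(void)
{
	assert(preempt_count & PREEMPT_ACTIVE);   /* caller marked itself */
	schedule_calls++;
	resched_requests--;   /* each pass satisfies one request */
}

/* Rough shape of preempt_schedule(): bail out if not preemptible. */
static void preempt_schedule_stub(void)
{
	if (preempt_count || !irqs_on)
		return;
	preempt_count = PREEMPT_ACTIVE;
	schedule_stub();
	preempt_count = 0;
}

/* C paraphrase of the patched exception-exit sequence in entry.S. */
static void exception_exit(void)
{
	irqs_on = false;
	if (preempt_count != 0 || resched_requests == 0)
		return;   /* nothing to do: "b restore" */
	do {
		irqs_on = true;   /* mtmsrd: re-enable interrupts */
		/* The window: interrupts are on but preempt_count is still
		 * 0 until preempt_schedule() sets PREEMPT_ACTIVE. */
		preempt_schedule_stub();
		irqs_on = false;  /* disable again, then re-check the flag */
	} while (resched_requests > 0);
}
```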
Signed-off-by: Paul Mackerras diff -urN linux-2.5/arch/ppc64/kernel/entry.S test/arch/ppc64/kernel/entry.S --- linux-2.5/arch/ppc64/kernel/entry.S 2005-01-10 07:54:27.000000000 +1100 +++ test/arch/ppc64/kernel/entry.S 2005-01-13 20:48:36.000000000 +1100 @@ -574,25 +574,22 @@ crandc eq,cr1*4+eq,eq bne restore /* here we are preempting the current task */ -1: lis r0,PREEMPT_ACTIVE at h - stw r0,TI_PREEMPT(r9) +1: #ifdef CONFIG_PPC_ISERIES li r0,1 stb r0,PACAPROCENABLED(r13) #endif ori r10,r10,MSR_EE mtmsrd r10,1 /* reenable interrupts */ - bl .schedule + bl .preempt_schedule mfmsr r10 clrrdi r9,r1,THREAD_SHIFT rldicl r10,r10,48,1 /* disable interrupts again */ - li r0,0 rotldi r10,r10,16 mtmsrd r10,1 ld r4,TI_FLAGS(r9) andi. r0,r4,_TIF_NEED_RESCHED bne 1b - stw r0,TI_PREEMPT(r9) b restore user_work: From paulus at samba.org Thu Jan 13 21:45:06 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Jan 2005 21:45:06 +1100 Subject: [PATCH] PPC64 can do preempt debug too Message-ID: <16870.20786.164419.188120@cargo.ozlabs.ibm.com> This patch enables the DEBUG_PREEMPT config option for PPC64. I have this turned on on my desktop G5 and it isn't finding any problems. (It did find one problem, in flush_tlb_pending(), that I have just sent a patch for.) BTW, do we really need to restrict which architectures the config option is available on? 
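The patch renames the ppc64 accessor to `__smp_processor_id()` so that, as we understand the generic code of this period, a DEBUG_PREEMPT build can wrap it in a checking `smp_processor_id()`. The user-space model below shows the idea and why the `get_cpu_var()`/`put_cpu_var()` change in `flush_tlb_pending()` silences the check. It is a simplified, hypothetical sketch: every name here is invented, the warning text is illustrative, and the real check has more exemptions than this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the task's preempt_count, the MSR_EE state
 * and the CPU number that the PACA-based accessor would return. */
static int preempt_count;
static bool irqs_disabled;
static int this_cpu = 1;
static int warnings;

/* Plays the role of __smp_processor_id(). */
static int arch_processor_id(void)
{
	return this_cpu;
}

/* Simplified DEBUG_PREEMPT check: the CPU number is only stable if the
 * task cannot migrate, i.e. preemption or interrupts are disabled. The
 * real check also exempts cases such as tasks bound to a single CPU. */
static int debug_processor_id(void)
{
	if (preempt_count == 0 && !irqs_disabled) {
		warnings++;
		fprintf(stderr, "BUG: processor id used in preemptible code\n");
	}
	return arch_processor_id();
}

/* Shape of get_cpu_var()/put_cpu_var(): disable preemption around the
 * per-CPU access, as the flush_tlb_pending() fix does. */
static int get_cpu_sketch(void)
{
	preempt_count++;
	return debug_processor_id();
}

static void put_cpu_sketch(void)
{
	preempt_count--;
}
```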
Signed-off-by: Paul Mackerras diff -urN linux-2.5/include/asm-ppc64/smp.h test/include/asm-ppc64/smp.h --- linux-2.5/include/asm-ppc64/smp.h 2004-11-26 20:40:32.000000000 +1100 +++ test/include/asm-ppc64/smp.h 2005-01-10 19:49:03.000000000 +1100 @@ -38,7 +38,7 @@ extern void smp_message_recv(int, struct pt_regs *); -#define smp_processor_id() (get_paca()->paca_index) +#define __smp_processor_id() (get_paca()->paca_index) #define hard_smp_processor_id() (get_paca()->hw_cpu_id) extern cpumask_t cpu_sibling_map[NR_CPUS]; diff -urN linux-2.5/lib/Kconfig.debug test/lib/Kconfig.debug --- linux-2.5/lib/Kconfig.debug 2005-01-11 08:57:21.000000000 +1100 +++ test/lib/Kconfig.debug 2005-01-11 09:13:28.000000000 +1100 @@ -50,7 +50,7 @@ config DEBUG_PREEMPT bool "Debug preemptible kernel" - depends on PREEMPT && X86 + depends on PREEMPT && (X86 || PPC64) default y help If you say Y here then the kernel will use a debug variant of the From paulus at samba.org Thu Jan 13 21:47:30 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Jan 2005 21:47:30 +1100 Subject: [PATCH] PPC64 Add PREEMPT_BKL option Message-ID: <16870.20930.566334.782203@cargo.ozlabs.ibm.com> This patch adds the PREEMPT_BKL config option for PPC64, shamelessly stolen from the i386 version. I have this turned on in the kernel on my desktop G5 and it seems to be just fine. Signed-off-by: Paul Mackerras diff -urN linux-2.5/arch/ppc64/Kconfig test/arch/ppc64/Kconfig --- linux-2.5/arch/ppc64/Kconfig 2005-01-11 08:57:19.000000000 +1100 +++ test/arch/ppc64/Kconfig 2005-01-12 20:25:17.000000000 +1100 @@ -231,6 +231,17 @@ Say Y here if you are building a kernel for a desktop, embedded or real-time system. Say N if you are unsure. +config PREEMPT_BKL + bool "Preempt The Big Kernel Lock" + depends on PREEMPT + default y + help + This option reduces the latency of the kernel by making the + big kernel lock preemptible. + + Say Y here if you are building a kernel for a desktop system. + Say N if you are unsure. 
+ # # Use the generic interrupt handling code in kernel/irq/: # From mjw at us.ibm.com Fri Jan 14 05:46:23 2005 From: mjw at us.ibm.com (Mike Wolf) Date: Thu, 13 Jan 2005 12:46:23 -0600 Subject: [PATCH] PPC64: 32bit wrapper for ioctls. Message-ID: <41E6C1FF.4000203@us.ltcfwd.linux.ibm.com> Hi Paul, The patch adds some 32bit wrappers for 2 ioctls that Java needs. Assuming this doesn't generate a round of discussion, please forward upstream to akpm/torvalds. Signed-off-by: Mike Wolf mjw at us.ibm.com -------------- next part -------------- A non-text attachment was scrubbed... Name: ioctl32.patch Type: text/x-patch Size: 482 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050113/8bc77f67/attachment.bin From olof at austin.ibm.com Fri Jan 14 07:00:48 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Thu, 13 Jan 2005 14:00:48 -0600 Subject: [PATCH] [PPC64] iommu: avoid ISA io space on POWER3 Message-ID: <20050113200048.GA11683@austin.ibm.com> Hi, On some systems, the first PCI bus has an ISA I/O hole at the first 16MB. We can't use this space for DMA addresses on the bus. On Python-based machines, we'll skip the first 256MB on buses that have the hole, just as we do on later systems. This means that the first bus will have 768MB of DMA space shared between the devices on it.
Signed-off-by: Olof Johansson Acked-by: Paul Mackerras --- linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c | 19 ++++++++++++++++--- 1 files changed, 16 insertions(+), 3 deletions(-) diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu-iohole arch/ppc64/kernel/pSeries_iommu.c --- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~iommu-iohole 2005-01-12 16:29:55.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c 2005-01-12 16:34:57.000000000 -0600 @@ -327,12 +327,25 @@ static void iommu_bus_setup_pSeries(stru /* Root bus */ if (is_python(dn)) { struct iommu_table *tbl; + unsigned int *iohole; DBG("Python root bus %s\n", bus->name); - /* 1GB window by default */ - dn->phb->dma_window_size = 1 << 30; - dn->phb->dma_window_base_cur = 0; + iohole = (unsigned int *)get_property(dn, "io-hole", 0); + + if (iohole) { + /* On first bus we need to leave room for the + * ISA address space. Just skip the first 256MB + * alltogether. This leaves 768MB for the window. + */ + DBG("PHB has io-hole, reserving 256MB\n"); + dn->phb->dma_window_size = 3 << 28; + dn->phb->dma_window_base_cur = 1 << 28; + } else { + /* 1GB window by default */ + dn->phb->dma_window_size = 1 << 30; + dn->phb->dma_window_base_cur = 0; + } tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL); _ From anton at samba.org Fri Jan 14 10:51:19 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 14 Jan 2005 10:51:19 +1100 Subject: [PATCH] ppc64: Allow EEH to be disabled Message-ID: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> Hi, I was thinking of sending this upstream. Any thoughts? Anton -- Allow EEH to be disabled for pSeries targets, but only if the EMBEDDED option is enabled. 
Signed-off-by: Anton Blanchard diff -puN arch/ppc64/Kconfig~no-eeh arch/ppc64/Kconfig --- foobar2/arch/ppc64/Kconfig~no-eeh 2005-01-12 00:34:25.902201644 +1100 +++ foobar2-anton/arch/ppc64/Kconfig 2005-01-12 00:34:25.934199201 +1100 @@ -231,6 +231,11 @@ config PREEMPT Say Y here if you are building a kernel for a desktop, embedded or real-time system. Say N if you are unsure. +config EEH + bool "PCI Extended Error Handling (EEH)" if EMBEDDED + depends on PPC_PSERIES + default y if !EMBEDDED + # # Use the generic interrupt handling code in kernel/irq/: # diff -puN arch/ppc64/kernel/Makefile~no-eeh arch/ppc64/kernel/Makefile --- foobar2/arch/ppc64/kernel/Makefile~no-eeh 2005-01-12 00:34:25.908201186 +1100 +++ foobar2-anton/arch/ppc64/kernel/Makefile 2005-01-12 00:34:25.932199354 +1100 @@ -30,9 +30,10 @@ obj-$(CONFIG_PPC_ISERIES) += iSeries_irq obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \ - eeh.o pSeries_nvram.o rtasd.o ras.o \ + pSeries_nvram.o rtasd.o ras.o \ xics.o rtas.o pSeries_setup.o pSeries_iommu.o +obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o obj-$(CONFIG_SMP) += smp.o diff -puN include/asm-ppc64/eeh.h~no-eeh include/asm-ppc64/eeh.h --- foobar2/include/asm-ppc64/eeh.h~no-eeh 2005-01-12 00:34:25.913200804 +1100 +++ foobar2-anton/include/asm-ppc64/eeh.h 2005-01-12 00:34:25.931199430 +1100 @@ -23,7 +23,6 @@ #include #include #include -#include struct pci_dev; struct device_node; @@ -33,14 +32,18 @@ struct device_node; #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) -#ifdef CONFIG_PPC_PSERIES -extern void __init eeh_init(void); -unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val); +#ifdef CONFIG_EEH +void __init eeh_init(void); +unsigned long eeh_check_failure(const volatile void __iomem *token, + unsigned long val); int eeh_dn_check_failure 
(struct device_node *dn, struct pci_dev *dev); void __iomem *eeh_ioremap(unsigned long addr, void __iomem *vaddr); void __init pci_addr_cache_build(void); #else +#define eeh_init() #define eeh_check_failure(token, val) (val) +#define eeh_dn_check_failure(dn, dev) (0) +#define pci_addr_cache_build() #endif /** @@ -69,8 +72,6 @@ void eeh_remove_device(struct pci_dev *) #define EEH_ENABLE 1 #define EEH_RELEASE_LOADSTORE 2 #define EEH_RELEASE_DMA 3 -int eeh_set_option(struct pci_dev *dev, int options); - /** * Notifier event flags. @@ -89,6 +90,7 @@ struct eeh_event { }; /** Register to find out about EEH events. */ +struct notifier_block; int eeh_register_notifier(struct notifier_block *nb); int eeh_unregister_notifier(struct notifier_block *nb); @@ -194,7 +196,8 @@ static inline void eeh_raw_writeq(u64 va #define EEH_CHECK_ALIGN(v,a) \ ((((unsigned long)(v)) & ((a) - 1)) == 0) -static inline void eeh_memset_io(volatile void __iomem *addr, int c, unsigned long n) +static inline void eeh_memset_io(volatile void __iomem *addr, int c, + unsigned long n) { u32 lc = c; lc |= lc << 8; _ From linas at austin.ibm.com Fri Jan 14 11:31:59 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 13 Jan 2005 18:31:59 -0600 Subject: [PATCH] ppc64: Allow EEH to be disabled In-Reply-To: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> Message-ID: <20050114003159.GO23690@austin.ibm.com> On Fri, Jan 14, 2005 at 10:51:19AM +1100, Anton Blanchard was heard to remark: > > Hi, > > I was thinking of sending this upstream. Any thoughts? Yes, can you help me get my other patch accepted? (This patch, though, looks fine to me). (Although one could probably move even more things into the #ifdef region, just to be clean.) 
--linas From zwane at arm.linux.org.uk Fri Jan 14 11:43:39 2005 From: zwane at arm.linux.org.uk (Zwane Mwaikambo) Date: Thu, 13 Jan 2005 17:43:39 -0700 (MST) Subject: [PATCH] PPC64 pmac hotplug cpu Message-ID: I found the following very handy for use as a reference platform when working on i386 hotplug cpu recently. It's been tested on a G5 system with a cpu going on/offline every second and make -j. I've also tried a number of config options to avoid compile breakage. Signed-off-by: Zwane Mwaikambo Index: linux-2.6.10-mm3/arch/ppc64/Kconfig =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/Kconfig,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 Kconfig --- linux-2.6.10-mm3/arch/ppc64/Kconfig 13 Jan 2005 16:27:26 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/Kconfig 13 Jan 2005 16:35:39 -0000 @@ -305,7 +305,7 @@ source "drivers/pci/Kconfig" config HOTPLUG_CPU bool "Support for hot-pluggable CPUs" - depends on SMP && EXPERIMENTAL && PPC_PSERIES + depends on SMP && EXPERIMENTAL && (PPC_PSERIES || PPC_PMAC) select HOTPLUG ---help--- Say Y here to be able to turn CPUs off and on. 
Index: linux-2.6.10-mm3/arch/ppc64/kernel/idle.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/idle.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 idle.c --- linux-2.6.10-mm3/arch/ppc64/kernel/idle.c 13 Jan 2005 16:27:26 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/idle.c 13 Jan 2005 16:34:24 -0000 @@ -364,7 +364,7 @@ int idle_setup(void) } } #endif /* CONFIG_PPC_PSERIES */ -#ifndef CONFIG_PPC_ISERIES +#if !defined(CONFIG_PPC_ISERIES) && !defined(CONFIG_HOTPLUG_CPU) if (systemcfg->platform == PLATFORM_POWERMAC || systemcfg->platform == PLATFORM_MAPLE) { printk(KERN_INFO "Using native/NAP idle loop\n"); Index: linux-2.6.10-mm3/arch/ppc64/kernel/irq.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/irq.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 irq.c --- linux-2.6.10-mm3/arch/ppc64/kernel/irq.c 13 Jan 2005 16:27:26 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/irq.c 13 Jan 2005 23:51:29 -0000 @@ -479,3 +479,31 @@ EXPORT_SYMBOL(do_softirq); #endif /* CONFIG_IRQSTACKS */ +#ifdef CONFIG_HOTPLUG_CPU +void fixup_irqs(cpumask_t map) +{ + unsigned int irq; + static int warned; + + for_each_irq(irq) { + cpumask_t mask; + + if (irq_desc[irq].status & IRQ_PER_CPU) + continue; + + cpus_and(mask, irq_affinity[irq], map); + if (any_online_cpu(mask) == NR_CPUS) { + printk("Breaking affinity for irq %i\n", irq); + mask = map; + } + if (irq_desc[irq].handler->set_affinity) + irq_desc[irq].handler->set_affinity(irq, mask); + else if (irq_desc[irq].action && !(warned++)) + printk("Cannot set affinity for irq %i\n", irq); + } + + local_irq_enable(); + mdelay(1); + local_irq_disable(); +} +#endif Index: linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c =================================================================== RCS file: 
/home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 pSeries_setup.c --- linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/pSeries_setup.c 13 Jan 2005 20:44:05 -0000 @@ -327,8 +327,9 @@ static void __init pSeries_discover_pic } } -static void pSeries_cpu_die(void) +static void pSeries_mach_cpu_die(void) { + idle_task_exit(); local_irq_disable(); /* Some hardware requires clearing the CPPR, while other hardware does not * it is safe either way @@ -606,7 +607,7 @@ struct machdep_calls __initdata pSeries_ .power_off = rtas_power_off, .halt = rtas_halt, .panic = rtas_os_term, - .cpu_die = pSeries_cpu_die, + .cpu_die = pSeries_mach_cpu_die, .get_boot_time = pSeries_get_boot_time, .get_rtc_time = pSeries_get_rtc_time, .set_rtc_time = pSeries_set_rtc_time, Index: linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 pmac.h --- linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/pmac.h 13 Jan 2005 16:34:24 -0000 @@ -8,6 +8,9 @@ * Declaration for the various functions exported by the * pmac_* files. 
Mostly for use by pmac_setup */ +#ifdef CONFIG_HOTPLUG_CPU +DECLARE_PER_CPU(int, cpu_state); +#endif extern void pmac_get_boot_time(struct rtc_time *tm); extern void pmac_get_rtc_time(struct rtc_time *tm); Index: linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 pmac_setup.c --- linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/pmac_setup.c 13 Jan 2005 16:34:24 -0000 @@ -229,6 +229,25 @@ void __pmac pmac_halt(void) pmac_power_off(); } +#ifdef CONFIG_HOTPLUG_CPU +static void pmac_mach_cpu_die(void) +{ + unsigned int cpu; + + local_irq_disable(); + cpu = smp_processor_id(); + printk(KERN_DEBUG "CPU%d offline\n", cpu); + __get_cpu_var(cpu_state) = CPU_DEAD; + wmb(); + while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE) + cpu_relax(); + + flush_tlb_pending(); + cpu_set(cpu, cpu_online_map); + local_irq_enable(); +} +#endif + #ifdef CONFIG_BOOTX_TEXT static int dummy_getc_poll(void) { @@ -455,5 +474,8 @@ struct machdep_calls __initdata pmac_md .calibrate_decr = pmac_calibrate_decr, .feature_call = pmac_do_feature_call, .progress = pmac_progress, - .check_legacy_ioport = pmac_check_legacy_ioport + .check_legacy_ioport = pmac_check_legacy_ioport, +#ifdef CONFIG_HOTPLUG_CPU + .cpu_die = pmac_mach_cpu_die, +#endif }; Index: linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 pmac_smp.c --- linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/pmac_smp.c 14 Jan 2005 00:32:10 -0000 @@ -35,6 +35,7 @@ #include #include #include +#include #include #include @@ -296,6 +297,38 
@@ static void __init smp_core99_setup_cpu( } } +#ifdef CONFIG_HOTPLUG_CPU +/* State of each CPU during hotplug phases */ +DEFINE_PER_CPU(int, cpu_state) = { 0 }; + +static int pmac_cpu_disable(void) +{ + unsigned int cpu = smp_processor_id(); + + if (cpu == boot_cpuid) + return -EBUSY; + + systemcfg->processorCount--; + cpu_clear(cpu, cpu_online_map); + fixup_irqs(cpu_online_map); + return 0; +} + +static void pmac_cpu_die(unsigned int cpu) +{ + int i; + + for (i = 0; i < 100; i++) { + rmb(); + if (per_cpu(cpu_state, cpu) == CPU_DEAD) + return; + msleep(100); + } + printk(KERN_ERR "CPU%d didn't die...\n", cpu); +} + +#endif + struct smp_ops_t core99_smp_ops __pmacdata = { .message_pass = smp_mpic_message_pass, .probe = smp_core99_probe, @@ -308,4 +341,8 @@ struct smp_ops_t core99_smp_ops __pmacda void __init pmac_setup_smp(void) { smp_ops = &core99_smp_ops; +#ifdef CONFIG_HOTPLUG_CPU + smp_ops->cpu_disable = pmac_cpu_disable; + smp_ops->cpu_die = pmac_cpu_die; +#endif } Index: linux-2.6.10-mm3/arch/ppc64/kernel/setup.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/setup.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 setup.c --- linux-2.6.10-mm3/arch/ppc64/kernel/setup.c 13 Jan 2005 16:27:26 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/setup.c 13 Jan 2005 21:26:48 -0000 @@ -1345,9 +1345,6 @@ early_param("xmon", early_xmon); void cpu_die(void) { - idle_task_exit(); if (ppc_md.cpu_die) ppc_md.cpu_die(); - local_irq_disable(); - for (;;); } Index: linux-2.6.10-mm3/arch/ppc64/kernel/smp.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/smp.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 smp.c --- linux-2.6.10-mm3/arch/ppc64/kernel/smp.c 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/smp.c 14 Jan 2005 00:26:26 -0000 @@ -406,10 +406,39 @@ void __devinit 
smp_prepare_boot_cpu(void current_set[boot_cpuid] = current->thread_info; } +#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PPC_PMAC) +#include "pmac.h" +static int cpu_enable(unsigned int cpu) +{ + if (systemcfg->platform == PLATFORM_PSERIES_LPAR) + return -ENOSYS; + + /* get the target out of it's holding state */ + per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; + wmb(); + + while (!cpu_online(cpu)) + cpu_relax(); + + fixup_irqs(cpu_online_map); + /* counter the irq disable in fixup_irqs */ + local_irq_enable(); + return 0; +} +#else +static int cpu_enable(unsigned int cpu) +{ + return -ENOSYS; +} +#endif + int __devinit __cpu_up(unsigned int cpu) { int c; + if (system_state == SYSTEM_RUNNING && !cpu_enable(cpu)) + return 0; + /* At boot, don't bother with non-present cpus -JSCHOPP */ if (system_state < SYSTEM_RUNNING && !cpu_present(cpu)) return -ENOENT; Index: linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 sysfs.c --- linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c 13 Jan 2005 16:27:27 -0000 1.1.1.1 +++ linux-2.6.10-mm3/arch/ppc64/kernel/sysfs.c 13 Jan 2005 16:36:23 -0000 @@ -18,7 +18,7 @@ #include #include #include - +#include static DEFINE_PER_CPU(struct cpu, cpu_devices); @@ -413,9 +413,7 @@ static int __init topology_init(void) * CPU. For instance, the boot cpu might never be valid * for hotplugging. 
*/ -#ifdef CONFIG_HOTPLUG_CPU - if (systemcfg->platform != PLATFORM_PSERIES_LPAR) -#endif + if (!ppc_md.cpu_die) c->no_control = 1; if (cpu_online(cpu) || (c->no_control == 0)) { Index: linux-2.6.10-mm3/include/asm-ppc64/smp.h =================================================================== RCS file: /home/cvsroot/linux-2.6.10-mm3/include/asm-ppc64/smp.h,v retrieving revision 1.1.1.1 diff -u -p -B -r1.1.1.1 smp.h --- linux-2.6.10-mm3/include/asm-ppc64/smp.h 13 Jan 2005 16:27:35 -0000 1.1.1.1 +++ linux-2.6.10-mm3/include/asm-ppc64/smp.h 13 Jan 2005 16:34:24 -0000 @@ -29,7 +29,7 @@ extern int boot_cpuid; extern int boot_cpuid_phys; -extern void cpu_die(void) __attribute__((noreturn)); +extern void cpu_die(void); #ifdef CONFIG_SMP @@ -37,6 +37,9 @@ extern void smp_send_debugger_break(int struct pt_regs; extern void smp_message_recv(int, struct pt_regs *); +#ifdef CONFIG_HOTPLUG_CPU +extern void fixup_irqs(cpumask_t map); +#endif #define smp_processor_id() (get_paca()->paca_index) #define hard_smp_processor_id() (get_paca()->hw_cpu_id) From nathanl at austin.ibm.com Fri Jan 14 18:05:52 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Fri, 14 Jan 2005 01:05:52 -0600 Subject: [PATCH] use kref for device_node refcounting Message-ID: <1105686352.4367.4.camel@biclops> This changes struct device_node and associated code to use the kref api for object refcounting and freeing. I've given it some testing on pSeries with cpu add/remove and verified that the release function works. The change is somewhat cosmetic but it does make the code easier to understand... at least I think so =) The only real change is that the refcount on all device_nodes is initialized at 1, and the device node is freed when the refcount reaches 0 (of_remove_node has the extra "put" to ensure that this happens). This lets us get rid of the OF_STALE flag and macros in prom.h. 
Signed-off-by: Nathan Lynch --- diff -puN arch/ppc64/kernel/prom.c~ppc64-device_node-use-kref arch/ppc64/kernel/prom.c --- linux-2.6.11-rc1-bk1/arch/ppc64/kernel/prom.c~ppc64-device_node-use-kref 2005-01-13 19:04:09.000000000 -0600 +++ linux-2.6.11-rc1-bk1-nathanl/arch/ppc64/kernel/prom.c 2005-01-14 00:24:04.000000000 -0600 @@ -717,6 +717,7 @@ static unsigned long __init unflatten_dt dad->next->sibling = np; dad->next = np; } + kref_init(&np->kref); } while(1) { u32 sz, noff; @@ -1475,24 +1476,31 @@ EXPORT_SYMBOL(of_get_next_child); * @node: Node to inc refcount, NULL is supported to * simplify writing of callers * - * Returns the node itself or NULL if gone. + * Returns node. */ struct device_node *of_node_get(struct device_node *node) { - if (node && !OF_IS_STALE(node)) { - atomic_inc(&node->_users); - return node; - } - return NULL; + if (node) + kref_get(&node->kref); + return node; } EXPORT_SYMBOL(of_node_get); +static inline struct device_node * kref_to_device_node(struct kref *kref) +{ + return container_of(kref, struct device_node, kref); +} + /** - * of_node_cleanup - release a dynamically allocated node - * @arg: Node to be released + * of_node_release - release a dynamically allocated node + * @kref: kref element of the node to be released + * + * In of_node_put() this function is passed to kref_put() + * as the destructor. 
*/ -static void of_node_cleanup(struct device_node *node) +static void of_node_release(struct kref *kref) { + struct device_node *node = kref_to_device_node(kref); struct property *prop = node->properties; if (!OF_IS_DYNAMIC(node)) @@ -1518,19 +1526,8 @@ static void of_node_cleanup(struct devic */ void of_node_put(struct device_node *node) { - if (!node) - return; - - WARN_ON(0 == atomic_read(&node->_users)); - - if (OF_IS_STALE(node)) { - if (atomic_dec_and_test(&node->_users)) { - of_node_cleanup(node); - return; - } - } - else - atomic_dec(&node->_users); + if (node) + kref_put(&node->kref, of_node_release); } EXPORT_SYMBOL(of_node_put); @@ -1773,7 +1770,7 @@ int of_add_node(const char *path, struct np->properties = proplist; OF_MARK_DYNAMIC(np); - of_node_get(np); + kref_init(&np->kref); np->parent = derive_parent(path); if (!np->parent) { kfree(np); @@ -1808,8 +1805,9 @@ static void of_cleanup_node(struct devic } /* - * Remove an OF device node from the system. - * Caller should have already "gotten" np. + * "Unplug" a node from the device tree. The caller must hold + * a reference to the node. The memory associated with the node + * is not freed until its refcount goes to zero. 
*/ int of_remove_node(struct device_node *np) { @@ -1827,7 +1825,6 @@ int of_remove_node(struct device_node *n of_cleanup_node(np); write_lock(&devtree_lock); - OF_MARK_STALE(np); remove_node_proc_entries(np); if (allnodes == np) allnodes = np->allnext; @@ -1852,6 +1849,7 @@ int of_remove_node(struct device_node *n } write_unlock(&devtree_lock); of_node_put(parent); + of_node_put(np); /* Must decrement the refcount */ return 0; } diff -puN include/asm-ppc64/prom.h~ppc64-device_node-use-kref include/asm-ppc64/prom.h --- linux-2.6.11-rc1-bk1/include/asm-ppc64/prom.h~ppc64-device_node-use-kref 2005-01-13 19:04:09.000000000 -0600 +++ linux-2.6.11-rc1-bk1-nathanl/include/asm-ppc64/prom.h 2005-01-13 19:04:09.000000000 -0600 @@ -149,18 +149,15 @@ struct device_node { struct proc_dir_entry *pde; /* this node's proc directory */ struct proc_dir_entry *name_link; /* name symlink */ struct proc_dir_entry *addr_link; /* addr symlink */ - atomic_t _users; /* reference count */ + struct kref kref; unsigned long _flags; }; extern struct device_node *of_chosen; /* flag descriptions */ -#define OF_STALE 0 /* node is slated for deletion */ #define OF_DYNAMIC 1 /* node and properties were allocated via kmalloc */ -#define OF_IS_STALE(x) test_bit(OF_STALE, &x->_flags) -#define OF_MARK_STALE(x) set_bit(OF_STALE, &x->_flags) #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, &x->_flags) #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, &x->_flags) _ From arnd at arndb.de Fri Jan 14 20:28:22 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 14 Jan 2005 10:28:22 +0100 Subject: [PATCH] ppc64: Allow EEH to be disabled In-Reply-To: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> Message-ID: <200501141028.23317.arnd@arndb.de> On Friday 14 January 2005 00:51, Anton Blanchard wrote: > Hi, > > I was thinking of sending this upstream. Any thoughts?
> I'm doing something similar in my private tree and I noticed that init_pci_config_tokens() is currently called by eeh_init(). If you don't build EEH, init_pci_config_tokens() needs to be called by pSeries_setup_arch(), which makes more sense anyway. Arnd <>< From arnd at arndb.de Fri Jan 14 20:23:07 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 14 Jan 2005 10:23:07 +0100 Subject: [PATCH] PPC64: 32bit wrapper for ioctls. In-Reply-To: <41E6C1FF.4000203@us.ltcfwd.linux.ibm.com> References: <41E6C1FF.4000203@us.ltcfwd.linux.ibm.com> Message-ID: <200501141023.08156.arnd@arndb.de> On Thursday 13 January 2005 19:46, Mike Wolf wrote: > Hi Paul, > The patch adds some 32bit wrappers for 2 ioctls that Java needs. > Assuming this doesn't generate a round of discussion, please > forward upstream to akpm/torvalds. Why add them to arch/ppc64? These don't look architecture specific, so they should go into include/linux/compat_ioctl.h. > --- linus-0112.orig/arch/ppc64/kernel/ioctl32.c 2005-01-13 10:35:10.165539000 -0600 > +++ linus-0112/arch/ppc64/kernel/ioctl32.c 2005-01-13 10:51:43.450433277 -0600 > @@ -43,6 +43,8 @@ > COMPATIBLE_IOCTL(TIOCSTART) > COMPATIBLE_IOCTL(TIOCSTOP) > COMPATIBLE_IOCTL(TIOCSLTC) > +COMPATIBLE_IOCTL(TIOCMIWAIT) Note that TIOCMIWAIT is not COMPATIBLE_IOCTL, but ULONG_IOCTL. It doesn't make a difference for ppc64, but if you add it to the generic file that is needed for s390x. Arnd <>< From arnd at arndb.de Fri Jan 14 20:50:28 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 14 Jan 2005 10:50:28 +0100 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <20050111195127.23300721.akpm@osdl.org> References: <41E4787D.90309@austin.ibm.com> <20050111195127.23300721.akpm@osdl.org> Message-ID: <200501141050.29068.arnd@arndb.de> On Wednesday 12 January 2005 04:51, Andrew Morton wrote: > Manish Ahuja wrote: > > > > There is a requirement to collect real usage values of each partition in > > LPAR environment > > on pseries as well as iseries. > > What (if any) relationship does this have to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-introduce-cputime.patch ? I asked Martin the same thing yesterday, and he said that that recording the purr value like Manish does is needed to support the cputime statistics, but this is not the complete solution. Manish, did you look at ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-microsecond-based-cputime-for-s390.patch ? I think you need to do similar things on top of you patch to really export steal time etc. to user space. Arnd <>< From ahuja at austin.ibm.com Sat Jan 15 06:18:23 2005 From: ahuja at austin.ibm.com (Manish Ahuja) Date: Fri, 14 Jan 2005 13:18:23 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled.
In-Reply-To: <200501141050.29068.arnd@arndb.de> References: <41E4787D.90309@austin.ibm.com> <20050111195127.23300721.akpm@osdl.org> <200501141050.29068.arnd@arndb.de> Message-ID: <41E81AFF.3020005@austin.ibm.com> Arnd Bergmann wrote: >I asked Martin the same thing yesterday, and he said that that recording >the purr value like Manish does is needed to support the cputime statistics, >but this is not the complete solution. > >Manish, did you look at >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm3/broken-out/cputime-microsecond-based-cputime-for-s390.patch ? >I think you need to do similar things on top of you patch to really export >steal time etc. to user space. > > Arnd <>< > > Yup, There is another piece that will tie in with Martin's patch. This piece is needed by the CKRM folks to enable process accounting feature as well as by Jeff Scheel since he uses the output for his calculations. Manish From anton at samba.org Sat Jan 15 10:49:20 2005 From: anton at samba.org (Anton Blanchard) Date: Sat, 15 Jan 2005 10:49:20 +1100 Subject: [PATCH] ppc64: Allow EEH to be disabled In-Reply-To: <200501141028.23317.arnd@arndb.de> References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> <200501141028.23317.arnd@arndb.de> Message-ID: <20050114234920.GM6309@krispykreme.ozlabs.ibm.com> Hi, > I'm doing something similar in my private tree and I noticed that > init_pci_config_tokens() is currently called by eeh_init(). > If you don't build EEH, init_pci_config_tokens() needs to be called > by pSeries_setup_arch(), which makes more sense anyway. Good point :) We also had PCI disabled so never saw this. 
Anton From anton at samba.org Sat Jan 15 11:00:55 2005 From: anton at samba.org (Anton Blanchard) Date: Sat, 15 Jan 2005 11:00:55 +1100 Subject: [PATCH] ppc64: lacks definition of MM_VM_SIZE() In-Reply-To: <1105714076.26551.243.camel@hades.cambridge.redhat.com> References: <1105714076.26551.243.camel@hades.cambridge.redhat.com> Message-ID: <20050115000055.GO6309@krispykreme.ozlabs.ibm.com> David: you have to send me some spare Signed-off-by's :) Anton -- From: David Woodhouse We don't set MM_VM_SIZE() on ppc64, so it defaults to TASK_SIZE. Which means a 32-bit process ending up in exit_mmap() to kill a 64-bit mm may call tlb_finish_mmu() with an incorrect 'end' argument. Signed-off-by: Anton Blanchard ===== include/asm-ppc64/processor.h 1.59 vs edited ===== --- 1.59/include/asm-ppc64/processor.h Tue Jan 11 01:29:24 2005 +++ edited/include/asm-ppc64/processor.h Fri Jan 14 14:42:44 2005 @@ -537,6 +537,10 @@ #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \ TASK_SIZE_USER32 : TASK_SIZE_USER64) +/* We can't actually tell the TASK_SIZE given just the mm, but default + * to the 64-bit case to make sure that enough gets cleaned up. */ +#define MM_VM_SIZE(mm) TASK_SIZE_USER64 + /* This decides where the kernel will search for a free chunk of vm * space during mmap's. */ From dwmw2 at infradead.org Sat Jan 15 11:31:41 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Sat, 15 Jan 2005 00:31:41 +0000 Subject: [PATCH] ppc64: lacks definition of MM_VM_SIZE() In-Reply-To: <20050115000055.GO6309@krispykreme.ozlabs.ibm.com> References: <1105714076.26551.243.camel@hades.cambridge.redhat.com> <20050115000055.GO6309@krispykreme.ozlabs.ibm.com> Message-ID: <1105749101.30759.109.camel@baythorne.infradead.org> On Sat, 2005-01-15 at 11:00 +1100, Anton Blanchard wrote: > David: you have to send me some spare Signed-off-by's :) Get Paulus to give you some spares. I'm sure he's losing them. 
Signed-off-by: David Woodhouse -- dwmw2 From mingo at elte.hu Sun Jan 16 01:25:37 2005 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 15 Jan 2005 15:25:37 +0100 Subject: [patch] spin-nicer-2.6.11-rc1-A0 In-Reply-To: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> Message-ID: <20050115142537.GD10114@elte.hu> * Paul Mackerras wrote: > This patch fixes a problem I have been seeing since all the preempt > changes went in, which is that ppc64 SMP systems would livelock > randomly if preempt was enabled. > > It turns out that what was happening was that one cpu was spinning in > spin_lock_irq (the version at line 215 of kernel/spinlock.c) madly > doing preempt_enable() and preempt_disable() calls. The other cpu had > the lock and was trying to set the TIF_NEED_RESCHED flag for the task > running on the first cpu. That is an atomic operation which has to be > retried if another cpu writes to the same cacheline between the load > and the store, which the other cpu was doing every time it did > preempt_enable() or preempt_disable(). ahh ... indeed. Nice catch. > I decided to move the thread_info flags field into the next cache > line, since it is the only field that would regularly be modified by > cpus other than the one running the task that owns the thread_info. > (OK possibly the `cpu' field would be on a rebalance; I don't know the > rebalancing code, but that should be pretty infrequent.) Thus, moving > the flags field seems like a good idea generally as well as solving > the immediate problem. > > For the record I am pretty unhappy with the code we use for spin_lock > et al. with preemption turned on (the BUILD_LOCK_OPS stuff in > spinlock.c). For a start we do the atomic op (_raw_spin_trylock) each > time around the loop. That is going to be generating a lot of > unnecessary bus (or fabric) traffic. 
Instead, after we fail to get > the lock we should poll it with simple loads until we see that it is > clear and then retry the atomic op. Assuming a reasonable cache > design, the loads won't generate any bus traffic until another cpu > writes to the cacheline containing the lock. agreed. How about the patch below? (tested on x86) > Secondly we have lost the __spin_yield call that we had on ppc64, > which is an important optimization when we are running under the > hypervisor. I can't just put that in cpu_relax because I need to know > which (virtual) cpu is holding the lock, so that I can tell the > hypervisor which virtual cpu to give my time slice to. That > information is stored in the lock variable, which is why __spin_yield > needs the address of the lock. hm, how about calling __spin_yield() from _raw_spin_trylock(), if the locking attempt was unsuccessful? This might be slightly incorrect if the locking attempt is not connected to an actual spin-loop, but we do have other spin-loops with open-coded trylocks that would benefit from this optimization too. Ingo Signed-off-by: Ingo Molnar --- linux/kernel/spinlock.c.orig +++ linux/kernel/spinlock.c @@ -173,7 +173,7 @@ EXPORT_SYMBOL(_write_lock); * (We do this in a function because inlining it would be excessive.) 
*/ -#define BUILD_LOCK_OPS(op, locktype) \ +#define BUILD_LOCK_OPS(op, locktype, is_locked_fn) \ void __lockfunc _##op##_lock(locktype *lock) \ { \ preempt_disable(); \ @@ -183,7 +183,8 @@ void __lockfunc _##op##_lock(locktype *l preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - cpu_relax(); \ + while (is_locked_fn(lock) && (lock)->break_lock) \ + cpu_relax(); \ preempt_disable(); \ } \ } \ @@ -204,6 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ + while (spin_is_locked(lock) && (lock)->break_lock) \ + cpu_relax(); \ cpu_relax(); \ preempt_disable(); \ } \ @@ -244,9 +247,9 @@ EXPORT_SYMBOL(_##op##_lock_bh) * _[spin|read|write]_lock_irqsave() * _[spin|read|write]_lock_bh() */ -BUILD_LOCK_OPS(spin, spinlock_t); -BUILD_LOCK_OPS(read, rwlock_t); -BUILD_LOCK_OPS(write, rwlock_t); +BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); +BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); +BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); #endif /* CONFIG_PREEMPT */ From mingo at elte.hu Sun Jan 16 01:38:05 2005 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 15 Jan 2005 15:38:05 +0100 Subject: [patch] spin-nicer-2.6.11-rc1-A0 In-Reply-To: <20050115142537.GD10114@elte.hu> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> Message-ID: <20050115143805.GA15041@elte.hu> * Ingo Molnar wrote: > agreed. How about the patch below? (tested on x86) updated patch below. Ingo Signed-off-by: Ingo Molnar --- linux/kernel/spinlock.c.orig +++ linux/kernel/spinlock.c @@ -173,7 +173,7 @@ EXPORT_SYMBOL(_write_lock); * (We do this in a function because inlining it would be excessive.) 
*/ -#define BUILD_LOCK_OPS(op, locktype) \ +#define BUILD_LOCK_OPS(op, locktype, is_locked_fn) \ void __lockfunc _##op##_lock(locktype *lock) \ { \ preempt_disable(); \ @@ -183,7 +183,8 @@ void __lockfunc _##op##_lock(locktype *l preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - cpu_relax(); \ + while (is_locked_fn(lock) && (lock)->break_lock) \ + cpu_relax(); \ preempt_disable(); \ } \ } \ @@ -204,7 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - cpu_relax(); \ + while (is_locked_fn(lock) && (lock)->break_lock) \ + cpu_relax(); \ preempt_disable(); \ } \ return flags; \ @@ -244,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh) * _[spin|read|write]_lock_irqsave() * _[spin|read|write]_lock_bh() */ -BUILD_LOCK_OPS(spin, spinlock_t); -BUILD_LOCK_OPS(read, rwlock_t); -BUILD_LOCK_OPS(write, rwlock_t); +BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); +BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); +BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); #endif /* CONFIG_PREEMPT */ From mingo at elte.hu Sun Jan 16 01:00:44 2005 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 15 Jan 2005 15:00:44 +0100 Subject: [PATCH] PPC64 can do preempt debug too In-Reply-To: <16870.20786.164419.188120@cargo.ozlabs.ibm.com> References: <16870.20786.164419.188120@cargo.ozlabs.ibm.com> Message-ID: <20050115140044.GB10114@elte.hu> * Paul Mackerras wrote: > This patch enables the DEBUG_PREEMPT config option for PPC64. I have > this turned on on my desktop G5 and it isn't finding any problems. (It > did find one problem, in flush_tlb_pending(), that I have just sent a > patch for.) > > BTW, do we really need to restrict which architectures the config > option is available on? in the case of x86 (and x64) i found that there were a fair number of false positives in arch-level code. But i agree that we should (now) make the config option available to all architectures - patch against 2.6.11-rc1 below. 
	Ingo

Signed-off-by: Ingo Molnar

--- linux/lib/Kconfig.debug.orig
+++ linux/lib/Kconfig.debug
@@ -50,7 +50,7 @@ config DEBUG_SLAB
 
 config DEBUG_PREEMPT
 	bool "Debug preemptible kernel"
-	depends on PREEMPT && X86
+	depends on PREEMPT
 	default y
 	help
 	  If you say Y here then the kernel will use a debug variant of the

From mingo at elte.hu Sun Jan 16 01:04:38 2005
From: mingo at elte.hu (Ingo Molnar)
Date: Sat, 15 Jan 2005 15:04:38 +0100
Subject: [PATCH] PPC64 Call preempt_schedule on exception exit
In-Reply-To: <16870.20576.417821.693961@cargo.ozlabs.ibm.com>
References: <16870.20576.417821.693961@cargo.ozlabs.ibm.com>
Message-ID: <20050115140438.GC10114@elte.hu>

* Paul Mackerras wrote:

> This patch mirrors the recent changes on x86 to call preempt_schedule
> rather than schedule in the exception exit path, in the case where the
> preempt_count is zero and the TIF_NEED_RESCHED bit is set.
>
> I'm a little concerned that this means that we have a window where
> interrupts are enabled and we are on our way into preempt_schedule,
> but preempt_count is still zero. Ingo's proposed preempt_schedule_irq
> would fix this, and I think something like that should go in.

the preempt_schedule_irq() patch is in 2.6.11-rc1-mm1 now, does it look
good to you? ppc64 should be able to call it directly from lowlevel
code.

	Ingo

From benh at kernel.crashing.org Sun Jan 16 09:23:13 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 16 Jan 2005 09:23:13 +1100
Subject: [PATCH] PPC64 pmac hotplug cpu
In-Reply-To: 
References: 
Message-ID: <1105827794.27410.82.camel@gaston>

On Thu, 2005-01-13 at 17:43 -0700, Zwane Mwaikambo wrote:
> I found the following very handy for use as a reference platform when
> working on i386 hotplug cpu recently.
>
> It's been tested on a G5 system with a cpu going on/offline every second
> and make -j. I've also tried a number of config options to avoid compile
> breakage.

Hi !
Looks good, but you could do even better :) I still want to look at the proper mecanism to flush the CPU cache on 970, but the idea here is to flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode) with the caches clean and MSR:EE off. We can later get it back with a soft reset. Ben. From benh at kernel.crashing.org Sun Jan 16 09:29:21 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 16 Jan 2005 09:29:21 +1100 Subject: ioremap of pci region on pSeries LPAR vs SMP In-Reply-To: <20050111221723.GE23690@austin.ibm.com> References: <20050110074930.92901.qmail@web11508.mail.yahoo.com> <16866.18083.212727.327170@cargo.ozlabs.ibm.com> <20050110174716.GW22274@austin.ibm.com> <16866.63132.352016.732484@cargo.ozlabs.ibm.com> <20050111000845.GC14239@krispykreme.ozlabs.ibm.com> <20050111221723.GE23690@austin.ibm.com> Message-ID: <1105828161.27410.84.camel@gaston> On Tue, 2005-01-11 at 16:17 -0600, Linas Vepstas wrote: > On Tue, Jan 11, 2005 at 11:08:45AM +1100, Anton Blanchard was heard to remark: > > > > Ive seen HPC stuff that wants to be able to mmap a PCI cards resources into > > userspace. Their hack on ppc64 was to look at the high nibble of the > > address and convert it to a non EEH address if required :) > > > > Im not sure how best to solve the userspace mmap issue but there are a > > few groups wanting that. > > Somewhat off-topic ... but ... > > 1) If you design your hardware correctly, there are some amazing things > you can do (performance wise) by mmaping pci card resources into user > space. If your hardwares is done right, then user corruption can't > hurt the system. This was the defacto method for getting high > performance graphics on IBM RS/6000, sgi, HP and Sun workstations > many moons ago. And that's exactly what X does still today on pretty much all machines :) > 2) There is interest in the virtual i/o community about mmaping > funky stuff to userspace, but that conversation may be for a > different day. 
The question is (for example) how to build > a high-performance virtual scsi server in userspace (without > kernel pieces) which is a design point some people like. > Later... > > --linas > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev -- Benjamin Herrenschmidt From benh at kernel.crashing.org Sun Jan 16 09:36:37 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 16 Jan 2005 09:36:37 +1100 Subject: [PATCH] htab code cleanup In-Reply-To: <20050106145102.0c3c60ad.sfr@canb.auug.org.au> References: <20050106145102.0c3c60ad.sfr@canb.auug.org.au> Message-ID: <1105828597.27435.88.camel@gaston> On Thu, 2005-01-06 at 14:51 +1100, Stephen Rothwell wrote: > Hi all, > > This patch just does some small clean ups on the hash page table code > - make htab_address static with in htab_native.c > - move some code that depended on CONFIG_PPC_MULTIPLATFORM > from htab_utils.c to htab_native.c (on less CONFIG check). > - clean up includes in htab_utils.c I don't see the point of moving create_pte_mapping() and htab_initialize() to htab_native.c since it contains code for both native and non-native... If you want to get rid of the htab_address, then maybe split htab_initialize in bits... like htab_native_init() and htab_plpar_init() for the early ptr setup, that sort of thing ... Ben. From benh at kernel.crashing.org Sun Jan 16 09:44:27 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 16 Jan 2005 09:44:27 +1100 Subject: [PATCH] sparse fixes for cpu feature constants In-Reply-To: <20050101223345.GC2297@zax> References: <1104381206.16694.38.camel@localhost.localdomain> <20050101223345.GC2297@zax> Message-ID: <1105829067.27411.92.camel@gaston> On Sun, 2005-01-02 at 09:33 +1100, David Gibson wrote: > > switch_mm() uses a BEGIN_FTR_SECTION ... 
> > END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) which gets broken by the change
> > since 0x0000000000000008UL winds up in the generated assembly. I
> > couldn't find the BEGIN/END_FTR_SECTION construct used in any other C
> > code, so I replaced this with the usual bitwise 'and' conditional (I
> > hope someone else will verify that this is equivalent :).
> >
> > So, does this look like the right thing to do? It eliminates 129 sparse
> > warnings from a defconfig 2.6.10 build.

Hrm... it's a bit annoying. You are replacing a dynamic patching of the
code by a runtime test... killing a (small tho) optimisation. There may
be other cases where I want to use the CPU feature stuff in inline
assembly.....

Not sure what the right fix is, maybe passing the constant to the asm
via the inputs as "i" ...

Ben.

From paulus at samba.org Sun Jan 16 09:54:22 2005
From: paulus at samba.org (Paul Mackerras)
Date: Sun, 16 Jan 2005 09:54:22 +1100
Subject: [patch] spin-nicer-2.6.11-rc1-A0
In-Reply-To: <20050115143805.GA15041@elte.hu>
References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com>
	<20050115142537.GD10114@elte.hu>
	<20050115143805.GA15041@elte.hu>
Message-ID: <16873.40734.485466.850449@cargo.ozlabs.ibm.com>

Ingo Molnar writes:

> +BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked);
> +BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked);

I don't think this is right - this means that a cpu trying to acquire
a read lock will spin while any other cpu has a read lock. We need to
invent and use a rwlock_is_write_locked() here. PPC64 and parisc have
an is_write_locked() already, and it shouldn't be too hard to do one
for the other architectures (i386 wants (signed int)rw->lock <= 0,
most other arches seem to need (signed int)rw->lock < 0).

> +BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked);

This one should be rwlock_is_locked, surely? Otherwise the compiler
will grizzle about us calling spin_is_locked with a rwlock_t *.

Regards,
Paul.
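The distinction Paul draws between "a writer holds the lock" and "anyone holds the lock" can be seen in a user-space model of an i386-style biased rwlock counter. This is a sketch built on C11 <stdatomic.h>, not the kernel implementation. All sketch_* names are hypothetical, and SKETCH_RW_LOCK_BIAS is only meant to match i386's RW_LOCK_BIAS. Each reader takes one unit of the bias and a writer takes all of it. So "a writer holds it" is counter <= 0 (the i386 test Paul quotes), while "anyone holds it" is counter != bias. The acquire loops follow the spin-nicer idea: after a failed atomic attempt, poll with plain loads before retrying.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an i386-style biased rwlock counter; the bias is
 * meant to match i386's RW_LOCK_BIAS.  Illustration only. */
#define SKETCH_RW_LOCK_BIAS 0x01000000

typedef struct {
	atomic_int lock;	/* == BIAS when free, <= 0 while a writer holds it */
} sketch_rwlock_t;

void sketch_rwlock_init(sketch_rwlock_t *rw)
{
	atomic_init(&rw->lock, SKETCH_RW_LOCK_BIAS);
}

/* Would a read-trylock fail?  Only when a writer holds the lock. */
bool sketch_read_is_locked(sketch_rwlock_t *rw)
{
	return atomic_load_explicit(&rw->lock, memory_order_relaxed) <= 0;
}

/* Would a write-trylock fail?  When anyone, reader or writer, holds it. */
bool sketch_write_is_locked(sketch_rwlock_t *rw)
{
	return atomic_load_explicit(&rw->lock, memory_order_relaxed) !=
	       SKETCH_RW_LOCK_BIAS;
}

bool sketch_read_trylock(sketch_rwlock_t *rw)
{
	/* Each reader takes one unit; fail (and undo) if the old value was <= 0. */
	if (atomic_fetch_sub_explicit(&rw->lock, 1, memory_order_acquire) > 0)
		return true;
	atomic_fetch_add_explicit(&rw->lock, 1, memory_order_relaxed);
	return false;
}

bool sketch_write_trylock(sketch_rwlock_t *rw)
{
	/* A writer takes the whole bias and only succeeds if nobody held any. */
	if (atomic_fetch_sub_explicit(&rw->lock, SKETCH_RW_LOCK_BIAS,
				      memory_order_acquire) == SKETCH_RW_LOCK_BIAS)
		return true;
	atomic_fetch_add_explicit(&rw->lock, SKETCH_RW_LOCK_BIAS,
				  memory_order_relaxed);
	return false;
}

void sketch_read_unlock(sketch_rwlock_t *rw)
{
	atomic_fetch_add_explicit(&rw->lock, 1, memory_order_release);
}

void sketch_write_unlock(sketch_rwlock_t *rw)
{
	atomic_fetch_add_explicit(&rw->lock, SKETCH_RW_LOCK_BIAS,
				  memory_order_release);
}

/* Reader: after a failed atomic attempt, poll with plain loads until no
 * *writer* holds the lock.  Polling on sketch_write_is_locked() here would
 * make readers wait behind other readers, the problem Paul describes. */
void sketch_read_lock(sketch_rwlock_t *rw)
{
	while (!sketch_read_trylock(rw))
		while (sketch_read_is_locked(rw))
			; /* the kernel would call cpu_relax() here */
}

/* Writer: must wait until nobody at all holds the lock. */
void sketch_write_lock(sketch_rwlock_t *rw)
{
	while (!sketch_write_trylock(rw))
		while (sketch_write_is_locked(rw))
			; /* cpu_relax() */
}
```

The model behaves as follows with a read lock held: sketch_read_is_locked() stays false, so another reader may enter. sketch_write_is_locked() is true, so a writer must wait.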
From paulus at samba.org Sun Jan 16 14:04:27 2005 From: paulus at samba.org (Paul Mackerras) Date: Sun, 16 Jan 2005 14:04:27 +1100 Subject: [patch] spin-nicer-2.6.11-rc1-A0 In-Reply-To: <20050115142537.GD10114@elte.hu> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> Message-ID: <16873.55739.214904.473407@cargo.ozlabs.ibm.com> Ingo Molnar writes: > * Paul Mackerras wrote: > > > Secondly we have lost the __spin_yield call that we had on ppc64, > > which is an important optimization when we are running under the > > hypervisor. I can't just put that in cpu_relax because I need to know > > which (virtual) cpu is holding the lock, so that I can tell the > > hypervisor which virtual cpu to give my time slice to. That > > information is stored in the lock variable, which is why __spin_yield > > needs the address of the lock. > > hm, how about calling __spin_yield() from _raw_spin_trylock(), if the > locking attempt was unsuccessful? This might be slightly incorrect if > the locking attempt is not connected to an actual spin-loop, but we do > have other spin-loops with open-coded trylocks that would benefit from > this optimization too. That would help, but we also need to yield while we are polling the lock until it becomes available. Otherwise we will only yield once; if we get another timeslice and the other cpu still hasn't finished with the lock (or another cpu has got it now), we will spin uselessly for the whole of our timeslice. Thus I think we need to yield in the polling loop, whether or not we also yield in _raw_spin_trylock. Regards, Paul. From anton at samba.org Sun Jan 16 16:19:04 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 16 Jan 2005 16:19:04 +1100 Subject: ppc64 xics.c: what is smp_threads_ready exactly used for? 
In-Reply-To: <20050116043356.GM4274@stusta.de>
References: <20050116043356.GM4274@stusta.de>
Message-ID: <20050116051904.GP6309@krispykreme.ozlabs.ibm.com>

Hi,

> during a cleanup, I stumbled upon the following:
>
>
> arch/ppc64/kernel/smp.c (in 2.6.11-rc1-mm1) says:
>
>	/* XXX fix this, xics currently relies on it - Anton */
>	smp_threads_ready = 1;
>
>
> arch/ppc64/kernel/xics.c is the _only_ place in the whole kernel where
> smp_threads_ready is actually used, and this is the _only_ place where
> smp_threads_ready ever changes it's value on ppc64.

It turns out I was about to submit a patch to remove the ppc64 use of
smp_threads_ready. With that patch it makes sense to kill
smp_threads_ready completely.

Anton

From anton at samba.org Sun Jan 16 16:55:23 2005
From: anton at samba.org (Anton Blanchard)
Date: Sun, 16 Jan 2005 16:55:23 +1100
Subject: [PATCH] ppc64: Remove CONFIG_IRQ_ALL_CPUS
In-Reply-To: <20050116051904.GP6309@krispykreme.ozlabs.ibm.com>
References: <20050116043356.GM4274@stusta.de>
	<20050116051904.GP6309@krispykreme.ozlabs.ibm.com>
Message-ID: <20050116055523.GQ6309@krispykreme.ozlabs.ibm.com>

Replace CONFIG_IRQ_ALL_CPUS with a boot option (noirqdistrib). Compile
options aren't much use on a distro kernel. This also removes the ppc64
use of smp_threads_ready.

I considered removing the option completely but we have had problems in
the past with firmware bugs. In those cases the boot option would have
helped.

Signed-off-by: Anton Blanchard

===== arch/ppc64/Kconfig 1.76 vs edited =====
--- 1.76/arch/ppc64/Kconfig	2005-01-16 09:31:06 +11:00
+++ edited/arch/ppc64/Kconfig	2005-01-16 16:48:43 +11:00
@@ -186,14 +186,6 @@
 	  If you don't know what to do here, say Y.
 
-config IRQ_ALL_CPUS
-	bool "Distribute interrupts on all CPUs by default"
-	depends on SMP && PPC_MULTIPLATFORM
-	help
-	  This option gives the kernel permission to distribute IRQs across
-	  multiple CPUs. Saying N here will route all IRQs to the first
-	  CPU.
- config NR_CPUS int "Maximum number of CPUs (2-128)" range 2 128 ===== arch/ppc64/kernel/irq.c 1.74 vs edited ===== --- 1.74/arch/ppc64/kernel/irq.c 2005-01-05 13:48:02 +11:00 +++ edited/arch/ppc64/kernel/irq.c 2005-01-16 16:48:47 +11:00 @@ -62,6 +62,7 @@ extern irq_desc_t irq_desc[NR_IRQS]; +int distribute_irqs = 1; int __irq_offset_value; int ppc_spurious_interrupts; unsigned long lpevent_count; @@ -479,3 +480,10 @@ #endif /* CONFIG_IRQSTACKS */ +static int __init setup_noirqdistrib(char *str) +{ + distribute_irqs = 0; + return 1; +} + +__setup("noirqdistrib", setup_noirqdistrib); ===== arch/ppc64/kernel/mpic.c 1.3 vs edited ===== --- 1.3/arch/ppc64/kernel/mpic.c 2004-11-16 14:29:10 +11:00 +++ edited/arch/ppc64/kernel/mpic.c 2005-01-16 16:48:44 +11:00 @@ -765,10 +765,8 @@ #ifdef CONFIG_SMP struct mpic *mpic = mpic_primary; unsigned long flags; -#ifdef CONFIG_IRQ_ALL_CPUS u32 msk = 1 << hard_smp_processor_id(); unsigned int i; -#endif BUG_ON(mpic == NULL); @@ -776,16 +774,16 @@ spin_lock_irqsave(&mpic_lock, flags); -#ifdef CONFIG_IRQ_ALL_CPUS /* let the mpic know we want intrs. default affinity is 0xffffffff * until changed via /proc. That's how it's done on x86. If we want * it differently, then we should make sure we also change the default * values of irq_affinity in irq.c. 
*/ - for (i = 0; i < mpic->num_sources ; i++) - mpic_irq_write(i, MPIC_IRQ_DESTINATION, - mpic_irq_read(i, MPIC_IRQ_DESTINATION) | msk); -#endif /* CONFIG_IRQ_ALL_CPUS */ + if (distribute_irqs) { + for (i = 0; i < mpic->num_sources ; i++) + mpic_irq_write(i, MPIC_IRQ_DESTINATION, + mpic_irq_read(i, MPIC_IRQ_DESTINATION) | msk); + } /* Set current processor priority to 0 */ mpic_cpu_write(MPIC_CPU_CURRENT_TASK_PRI, 0); ===== arch/ppc64/kernel/pSeries_smp.c 1.9 vs edited ===== --- 1.9/arch/ppc64/kernel/pSeries_smp.c 2005-01-12 11:42:40 +11:00 +++ edited/arch/ppc64/kernel/pSeries_smp.c 2005-01-16 16:48:44 +11:00 @@ -259,7 +259,6 @@ if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) vpa_init(cpu); -#ifdef CONFIG_IRQ_ALL_CPUS /* * Put the calling processor into the GIQ. This is really only * necessary from a secondary thread as the OF start-cpu interface @@ -267,7 +266,6 @@ */ rtas_set_indicator(GLOBAL_INTERRUPT_QUEUE, (1UL << interrupt_server_size) - 1 - default_distrib_server, 1); -#endif } static spinlock_t timebase_lock = SPIN_LOCK_UNLOCKED; ===== arch/ppc64/kernel/smp.c 1.104 vs edited ===== --- 1.104/arch/ppc64/kernel/smp.c 2005-01-12 11:42:39 +11:00 +++ edited/arch/ppc64/kernel/smp.c 2005-01-16 16:48:45 +11:00 @@ -526,9 +526,6 @@ smp_ops->setup_cpu(boot_cpuid); - /* XXX fix this, xics currently relies on it - Anton */ - smp_threads_ready = 1; - set_cpus_allowed(current, old_mask); /* ===== arch/ppc64/kernel/xics.c 1.57 vs edited ===== --- 1.57/arch/ppc64/kernel/xics.c 2005-01-12 11:42:40 +11:00 +++ edited/arch/ppc64/kernel/xics.c 2005-01-16 16:48:45 +11:00 @@ -242,28 +242,24 @@ static int get_irq_server(unsigned int irq) { unsigned int server; - -#ifdef CONFIG_IRQ_ALL_CPUS /* For the moment only implement delivery to all cpus or one cpu */ - if (smp_threads_ready) { - cpumask_t cpumask = irq_affinity[irq]; - cpumask_t tmp = CPU_MASK_NONE; - if (cpus_equal(cpumask, CPU_MASK_ALL)) { - server = default_distrib_server; - } else { - cpus_and(tmp, cpu_online_map, 
cpumask); + cpumask_t cpumask = irq_affinity[irq]; + cpumask_t tmp = CPU_MASK_NONE; + + if (!distribute_irqs) + return default_server; - if (cpus_empty(tmp)) - server = default_distrib_server; - else - server = get_hard_smp_processor_id(first_cpu(tmp)); - } + if (cpus_equal(cpumask, CPU_MASK_ALL)) { + server = default_distrib_server; } else { - server = default_server; + cpus_and(tmp, cpu_online_map, cpumask); + + if (cpus_empty(tmp)) + server = default_distrib_server; + else + server = get_hard_smp_processor_id(first_cpu(tmp)); } -#else - server = default_server; -#endif + return server; } ===== include/asm-ppc64/irq.h 1.11 vs edited ===== --- 1.11/include/asm-ppc64/irq.h 2004-10-23 11:44:19 +10:00 +++ edited/include/asm-ppc64/irq.h 2005-01-16 16:48:47 +11:00 @@ -87,6 +87,8 @@ return irq; } +extern int distribute_irqs; + struct irqaction; struct pt_regs; From bunk at stusta.de Sun Jan 16 15:33:56 2005 From: bunk at stusta.de (Adrian Bunk) Date: Sun, 16 Jan 2005 05:33:56 +0100 Subject: ppc64 xics.c: what is smp_threads_ready exactly used for? Message-ID: <20050116043356.GM4274@stusta.de> Hi Anton, during a cleanup, I stumbled upon the following: arch/ppc64/kernel/smp.c (in 2.6.11-rc1-mm1) says: /* XXX fix this, xics currently relies on it - Anton */ smp_threads_ready = 1; arch/ppc64/kernel/xics.c is the _only_ place in the whole kernel where smp_threads_ready is actually used, and this is the _only_ place where smp_threads_ready ever changes it's value on ppc64. I have to admit I'm a bit lost in the sequence of function calls on ppc64. Is it possible to make any assumptions about the ordering of the assignment and the usage of smp_threads_ready? TIA Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. 
Buck - Dragon Seed From bunk at stusta.de Sun Jan 16 18:24:39 2005 From: bunk at stusta.de (Adrian Bunk) Date: Sun, 16 Jan 2005 08:24:39 +0100 Subject: [PATCH] ppc64: Remove CONFIG_IRQ_ALL_CPUS In-Reply-To: <20050116055523.GQ6309@krispykreme.ozlabs.ibm.com> References: <20050116043356.GM4274@stusta.de> <20050116051904.GP6309@krispykreme.ozlabs.ibm.com> <20050116055523.GQ6309@krispykreme.ozlabs.ibm.com> Message-ID: <20050116072439.GS4274@stusta.de> On Sun, Jan 16, 2005 at 04:55:23PM +1100, Anton Blanchard wrote: > > Replace CONFIG_IRQ_ALL_CPUS with a boot option (noirqdistrib). Compile > options arent much use on a distro kernel. This also removes the ppc64 > use of smp_threads_ready. >... Seems perfect for me. :-) I'll simply state that my patch depends on ppc64 on your patch. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From bunk at stusta.de Sun Jan 16 16:26:56 2005 From: bunk at stusta.de (Adrian Bunk) Date: Sun, 16 Jan 2005 06:26:56 +0100 Subject: ppc64 xics.c: what is smp_threads_ready exactly used for? In-Reply-To: <20050116051904.GP6309@krispykreme.ozlabs.ibm.com> References: <20050116043356.GM4274@stusta.de> <20050116051904.GP6309@krispykreme.ozlabs.ibm.com> Message-ID: <20050116052655.GN4274@stusta.de> On Sun, Jan 16, 2005 at 04:19:04PM +1100, Anton Blanchard wrote: > > Hi, Hi Anton, > > during a cleanup, I stumbled upon the following: > > > > > > arch/ppc64/kernel/smp.c (in 2.6.11-rc1-mm1) says: > > > > /* XXX fix this, xics currently relies on it - Anton */ > > smp_threads_ready = 1; > > > > > > arch/ppc64/kernel/xics.c is the _only_ place in the whole kernel where > > smp_threads_ready is actually used, and this is the _only_ place where > > smp_threads_ready ever changes it's value on ppc64. > > It turns out I was about to submit a patch to remove the ppc64 use of > smp_threads_ready. 
With that patch it makes sense to kill > smp_threads_ready completely. I've got a patch ready to remove smp_threads_ready on all architectures. The only part I still need is how to replace it in xics.c, since this is the only read access to this variable on all architectures. Could you send me this part for inclusion into my patch? > Anton TIA Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From zwane at arm.linux.org.uk Mon Jan 17 15:37:28 2005 From: zwane at arm.linux.org.uk (Zwane Mwaikambo) Date: Sun, 16 Jan 2005 21:37:28 -0700 (MST) Subject: [PATCH] PPC64 pmac hotplug cpu In-Reply-To: <1105827794.27410.82.camel@gaston> References: <1105827794.27410.82.camel@gaston> Message-ID: Hello Ben, On Sun, 16 Jan 2005, Benjamin Herrenschmidt wrote: > Looks good, but you could do even better :) I still want to look at the > proper mecanism to flush the CPU cache on 970, but the idea here is to > flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode) > with the caches clean and MSR:EE off.
We can later get it back with a > > soft reset. > > Thanks for the suggestions! I'll work on getting something together. Well.. the cache flush part requires some not-really-documentd stuff on the 970, but I'll try to come up with something. Ben. From zwane at arm.linux.org.uk Mon Jan 17 16:35:05 2005 From: zwane at arm.linux.org.uk (Zwane Mwaikambo) Date: Sun, 16 Jan 2005 22:35:05 -0700 (MST) Subject: [PATCH] PPC64 pmac hotplug cpu In-Reply-To: <1105937266.4534.0.camel@gaston> References: <1105827794.27410.82.camel@gaston> <1105937266.4534.0.camel@gaston> Message-ID: On Mon, 17 Jan 2005, Benjamin Herrenschmidt wrote: > On Sun, 2005-01-16 at 21:37 -0700, Zwane Mwaikambo wrote: > > Hello Ben, > > > > On Sun, 16 Jan 2005, Benjamin Herrenschmidt wrote: > > > > > Looks good, but you could do even better :) I still want to look at the > > > proper mecanism to flush the CPU cache on 970, but the idea here is to > > > flush it, and put the CPU into a NAP loop (the 970 has no SLEEP mode) > > > with the caches clean and MSR:EE off. We can later get it back with a > > > soft reset. > > > > Thanks for the suggestions! I'll work on getting something together. > > Well.. the cache flush part requires some not-really-documentd stuff on > the 970, but I'll try to come up with something. 
I was waiting for you to say that ;) Thanks, Zwane From mingo at elte.hu Mon Jan 17 22:32:17 2005 From: mingo at elte.hu (Ingo Molnar) Date: Mon, 17 Jan 2005 12:32:17 +0100 Subject: [patch] spin-nicer-2.6.11-rc1-A1 In-Reply-To: <16873.40734.485466.850449@cargo.ozlabs.ibm.com> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> <20050115143805.GA15041@elte.hu> <16873.40734.485466.850449@cargo.ozlabs.ibm.com> Message-ID: <20050117113217.GA14619@elte.hu> * Paul Mackerras wrote: > > +BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); > > +BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); > > I don't think this is right - this means that a cpu trying to acquire > a read lock will spin while any other cpu has a read lock. We need to > invent and use a rwlock_is_write_locked() here. PPC64 and parisc have > an is_write_locked() already, and it shouldn't be too hard to do one > for the other architectures (i386 wants (signed int)rw->lock <= 0, > most other arches seem to need (signed int)rw->lock < 0). > > > +BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); > > This one should be rwlock_is_locked, surely? Otherwise the compiler > will grizzle about us calling spin_is_locked with a rwlock_t *. you are right on both counts. The patch below, ontop of current BK, fixes both problems. the first fix is that there was no compiler warning on x86 because it uses macros - i fixed this by changing the spinlock field to be '->slock'. (we could also use inline functions to get type protection, i chose this solution because it was the easiest to do.) the second fix is to split rwlock_is_locked() into two functions: +/** + * read_is_locked - would read_trylock() fail? + * @lock: the rwlock in question. + */ +#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0) + +/** + * write_is_locked - would write_trylock() fail? + * @lock: the rwlock in question. 
+ */
+#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS)

this canonical naming of them also enabled the elimination of the newly
added 'is_locked_fn' argument to the BUILD_LOCK_OPS macro.

the third change was to change the other user of rwlock_is_locked(), and
to put a migration helper there: architectures that don't have
read/write_is_locked defined yet will get a #warning message but the
build will succeed. (except if PREEMPT is enabled - there we really need
them.)

compile and boot-tested on x86, on SMP and UP, PREEMPT and !PREEMPT.
Non-x86 architectures should work fine, except PREEMPT+SMP builds which
will need the read_is_locked()/write_is_locked() definitions.
!PREEMPT+SMP builds will work fine and will produce a #warning.

	Ingo

Signed-off-by: Ingo Molnar

--- linux/kernel/spinlock.c.orig
+++ linux/kernel/spinlock.c
@@ -173,7 +173,7 @@ EXPORT_SYMBOL(_write_lock);
  * (We do this in a function because inlining it would be excessive.)
  */
 
-#define BUILD_LOCK_OPS(op, locktype, is_locked_fn)			\
+#define BUILD_LOCK_OPS(op, locktype)					\
 void __lockfunc _##op##_lock(locktype *lock)				\
 {									\
 	preempt_disable();						\
@@ -183,7 +183,7 @@ void __lockfunc _##op##_lock(locktype *l
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		while (is_locked_fn(lock) && (lock)->break_lock)	\
+		while (op##_is_locked(lock) && (lock)->break_lock)	\
 			cpu_relax();					\
 		preempt_disable();					\
 	}								\
@@ -205,7 +205,7 @@ unsigned long __lockfunc _##op##_lock_ir
 		preempt_enable();					\
 		if (!(lock)->break_lock)				\
 			(lock)->break_lock = 1;				\
-		while (is_locked_fn(lock) && (lock)->break_lock)	\
+		while (op##_is_locked(lock) && (lock)->break_lock)	\
 			cpu_relax();					\
 		preempt_disable();					\
 	}								\
@@ -246,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh)
  * _[spin|read|write]_lock_irqsave()
  * _[spin|read|write]_lock_bh()
  */
-BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked);
-BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked);
-BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked);
+BUILD_LOCK_OPS(spin, spinlock_t);
+BUILD_LOCK_OPS(read, rwlock_t); +BUILD_LOCK_OPS(write, rwlock_t); #endif /* CONFIG_PREEMPT */ --- linux/include/asm-i386/spinlock.h.orig +++ linux/include/asm-i386/spinlock.h @@ -15,7 +15,7 @@ asmlinkage int printk(const char * fmt, */ typedef struct { - volatile unsigned int lock; + volatile unsigned int slock; #ifdef CONFIG_DEBUG_SPINLOCK unsigned magic; #endif @@ -43,7 +43,7 @@ typedef struct { * We make no fairness assumptions. They have a cost. */ -#define spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0) +#define spin_is_locked(x) (*(volatile signed char *)(&(x)->slock) <= 0) #define spin_unlock_wait(x) do { barrier(); } while(spin_is_locked(x)) #define spin_lock_string \ @@ -83,7 +83,7 @@ typedef struct { #define spin_unlock_string \ "movb $1,%0" \ - :"=m" (lock->lock) : : "memory" + :"=m" (lock->slock) : : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -101,7 +101,7 @@ static inline void _raw_spin_unlock(spin #define spin_unlock_string \ "xchgb %b0, %1" \ - :"=q" (oldval), "=m" (lock->lock) \ + :"=q" (oldval), "=m" (lock->slock) \ :"0" (oldval) : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -123,7 +123,7 @@ static inline int _raw_spin_trylock(spin char oldval; __asm__ __volatile__( "xchgb %b0,%1" - :"=q" (oldval), "=m" (lock->lock) + :"=q" (oldval), "=m" (lock->slock) :"0" (0) : "memory"); return oldval > 0; } @@ -138,7 +138,7 @@ static inline void _raw_spin_lock(spinlo #endif __asm__ __volatile__( spin_lock_string - :"=m" (lock->lock) : : "memory"); + :"=m" (lock->slock) : : "memory"); } static inline void _raw_spin_lock_flags (spinlock_t *lock, unsigned long flags) @@ -151,7 +151,7 @@ static inline void _raw_spin_lock_flags #endif __asm__ __volatile__( spin_lock_string_flags - :"=m" (lock->lock) : "r" (flags) : "memory"); + :"=m" (lock->slock) : "r" (flags) : "memory"); } /* @@ -186,7 +186,17 @@ typedef struct { #define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0) -#define 
rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS) +/** + * read_is_locked - would read_trylock() fail? + * @lock: the rwlock in question. + */ +#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0) + +/** + * write_is_locked - would write_trylock() fail? + * @lock: the rwlock in question. + */ +#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS) /* * On x86, we implement read-write locks as a 32-bit counter --- linux/kernel/exit.c.orig +++ linux/kernel/exit.c @@ -861,8 +861,12 @@ task_t fastcall *next_thread(const task_ #ifdef CONFIG_SMP if (!p->sighand) BUG(); +#ifndef write_is_locked +# warning please implement read_is_locked()/write_is_locked()! +# define write_is_locked rwlock_is_locked +#endif if (!spin_is_locked(&p->sighand->siglock) && - !rwlock_is_locked(&tasklist_lock)) + !write_is_locked(&tasklist_lock)) BUG(); #endif return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID); From mingo at elte.hu Mon Jan 17 23:42:09 2005 From: mingo at elte.hu (Ingo Molnar) Date: Mon, 17 Jan 2005 13:42:09 +0100 Subject: [patch] spin-yield-2.6.11-rc1-A1 In-Reply-To: <16873.55739.214904.473407@cargo.ozlabs.ibm.com> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> <16873.55739.214904.473407@cargo.ozlabs.ibm.com> Message-ID: <20050117124209.GA20796@elte.hu> * Paul Mackerras wrote: > > hm, how about calling __spin_yield() from _raw_spin_trylock(), if the > > locking attempt was unsuccessful? This might be slightly incorrect if > > the locking attempt is not connected to an actual spin-loop, but we do > > have other spin-loops with open-coded trylocks that would benefit from > > this optimization too. > > That would help, but we also need to yield while we are polling the > lock until it becomes available. 
Otherwise we will only yield once; > if we get another timeslice and the other cpu still hasn't finished > with the lock (or another cpu has got it now), we will spin uselessly > for the whole of our timeslice. Thus I think we need to yield in the > polling loop, whether or not we also yield in _raw_spin_trylock. ok - how about the (raw) patch below? (ontop of BK plus the latest spin-nicer patch i sent earlier.) It builds/boots on x86 but is untested on ppc64. the idea is to make spin_yield() a generic function, with some related namespace cleanups. Ingo Acked-by: Ingo Molnar --- linux/kernel/exit.c.orig +++ linux/kernel/exit.c @@ -861,8 +861,12 @@ task_t fastcall *next_thread(const task_ #ifdef CONFIG_SMP if (!p->sighand) BUG(); +#ifndef write_is_locked +# warning please implement read_is_locked()/write_is_locked()! +# define write_is_locked rwlock_is_locked +#endif if (!spin_is_locked(&p->sighand->siglock) && - !rwlock_is_locked(&tasklist_lock)) + !write_is_locked(&tasklist_lock)) BUG(); #endif return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID); --- linux/kernel/spinlock.c.orig +++ linux/kernel/spinlock.c @@ -173,8 +173,8 @@ EXPORT_SYMBOL(_write_lock); * (We do this in a function because inlining it would be excessive.) 
*/ -#define BUILD_LOCK_OPS(op, locktype, is_locked_fn) \ -void __lockfunc _##op##_lock(locktype *lock) \ +#define BUILD_LOCK_OPS(op, locktype) \ +void __lockfunc _##op##_lock(locktype##_t *lock) \ { \ preempt_disable(); \ for (;;) { \ @@ -183,15 +183,15 @@ void __lockfunc _##op##_lock(locktype *l preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - while (is_locked_fn(lock) && (lock)->break_lock) \ - cpu_relax(); \ + while (op##_is_locked(lock) && (lock)->break_lock) \ + locktype##_yield(lock); \ preempt_disable(); \ } \ } \ \ EXPORT_SYMBOL(_##op##_lock); \ \ -unsigned long __lockfunc _##op##_lock_irqsave(locktype *lock) \ +unsigned long __lockfunc _##op##_lock_irqsave(locktype##_t *lock) \ { \ unsigned long flags; \ \ @@ -205,8 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - while (is_locked_fn(lock) && (lock)->break_lock) \ - cpu_relax(); \ + while (op##_is_locked(lock) && (lock)->break_lock) \ + locktype##_yield(lock); \ preempt_disable(); \ } \ return flags; \ @@ -214,14 +214,14 @@ unsigned long __lockfunc _##op##_lock_ir \ EXPORT_SYMBOL(_##op##_lock_irqsave); \ \ -void __lockfunc _##op##_lock_irq(locktype *lock) \ +void __lockfunc _##op##_lock_irq(locktype##_t *lock) \ { \ _##op##_lock_irqsave(lock); \ } \ \ EXPORT_SYMBOL(_##op##_lock_irq); \ \ -void __lockfunc _##op##_lock_bh(locktype *lock) \ +void __lockfunc _##op##_lock_bh(locktype##_t *lock) \ { \ unsigned long flags; \ \ @@ -246,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh) * _[spin|read|write]_lock_irqsave() * _[spin|read|write]_lock_bh() */ -BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); -BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); -BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); +BUILD_LOCK_OPS(spin, spinlock); +BUILD_LOCK_OPS(read, rwlock); +BUILD_LOCK_OPS(write, rwlock); #endif /* CONFIG_PREEMPT */ --- linux/include/asm-i386/spinlock.h.orig +++ linux/include/asm-i386/spinlock.h @@ -7,6 +7,8 @@ 
#include #include +#include + asmlinkage int printk(const char * fmt, ...) __attribute__ ((format (printf, 1, 2))); @@ -15,7 +17,7 @@ asmlinkage int printk(const char * fmt, */ typedef struct { - volatile unsigned int lock; + volatile unsigned int slock; #ifdef CONFIG_DEBUG_SPINLOCK unsigned magic; #endif @@ -43,7 +45,7 @@ typedef struct { * We make no fairness assumptions. They have a cost. */ -#define spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0) +#define spin_is_locked(x) (*(volatile signed char *)(&(x)->slock) <= 0) #define spin_unlock_wait(x) do { barrier(); } while(spin_is_locked(x)) #define spin_lock_string \ @@ -83,7 +85,7 @@ typedef struct { #define spin_unlock_string \ "movb $1,%0" \ - :"=m" (lock->lock) : : "memory" + :"=m" (lock->slock) : : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -101,7 +103,7 @@ static inline void _raw_spin_unlock(spin #define spin_unlock_string \ "xchgb %b0, %1" \ - :"=q" (oldval), "=m" (lock->lock) \ + :"=q" (oldval), "=m" (lock->slock) \ :"0" (oldval) : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -123,7 +125,7 @@ static inline int _raw_spin_trylock(spin char oldval; __asm__ __volatile__( "xchgb %b0,%1" - :"=q" (oldval), "=m" (lock->lock) + :"=q" (oldval), "=m" (lock->slock) :"0" (0) : "memory"); return oldval > 0; } @@ -138,7 +140,7 @@ static inline void _raw_spin_lock(spinlo #endif __asm__ __volatile__( spin_lock_string - :"=m" (lock->lock) : : "memory"); + :"=m" (lock->slock) : : "memory"); } static inline void _raw_spin_lock_flags (spinlock_t *lock, unsigned long flags) @@ -151,7 +153,7 @@ static inline void _raw_spin_lock_flags #endif __asm__ __volatile__( spin_lock_string_flags - :"=m" (lock->lock) : "r" (flags) : "memory"); + :"=m" (lock->slock) : "r" (flags) : "memory"); } /* @@ -186,7 +188,17 @@ typedef struct { #define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0) -#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS) +/** + * read_is_locked - 
would read_trylock() fail? + * @lock: the rwlock in question. + */ +#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0) + +/** + * write_is_locked - would write_trylock() fail? + * @lock: the rwlock in question. + */ +#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS) /* * On x86, we implement read-write locks as a 32-bit counter From cfriesen at nortelnetworks.com Tue Jan 18 02:14:42 2005 From: cfriesen at nortelnetworks.com (Chris Friesen) Date: Mon, 17 Jan 2005 09:14:42 -0600 Subject: [PATCH] PPC64 pmac hotplug cpu In-Reply-To: <1105937266.4534.0.camel@gaston> References: <1105827794.27410.82.camel@gaston> <1105937266.4534.0.camel@gaston> Message-ID: <41EBD662.1080409@nortelnetworks.com> Benjamin Herrenschmidt wrote: > Well.. the cache flush part requires some not-really-documentd stuff on > the 970, but I'll try to come up with something. Details? We've got a cache-flush routine put together based on the documentation that seems to be working, but if there's something else that has to be done I'd love to know about it. Chris From dhowells at redhat.com Tue Jan 18 03:27:19 2005 From: dhowells at redhat.com (David Howells) Date: Mon, 17 Jan 2005 16:27:19 +0000 Subject: [PATCH] Fix kallsyms/insmod/rmmod race Message-ID: <31453.1105979239@redhat.com> The attached patch fixes a race between kallsyms and insmod/rmmod. The problem is this: (1) The various kallsyms functions poke around in the module list without any locking so that they can be called from the oops handler. (2) Although insmod and rmmod use locks to exclude each other, these have no effect on the kallsyms function. (3) Although rmmod modifies the module state with the machine "stopped", it hasn't removed the metadata from the module metadata list, meaning that as soon as the machine is "restarted", the metadata can be observed by kallsyms. 
It's not possible to say that an item in that list should be ignored if its state is marked as inactive - you can't get at the state information because you can't trust the metadata in which it is embedded. Furthermore, list linkage information is embedded in the metadata too, so you can't trust that either... (4) kallsyms may be walking the module list without a lock whilst either insmod or rmmod are busy changing it. insmod probably isn't a problem since nothing is going away, but rmmod is as it's deleting an entry. (5) Therefore nothing that uses these functions can in any way trust any pointers to "static" data (such as module symbol names or module names) that are returned. (6) On ppc64 the problems are exacerbated since the hypervisor may reschedule bits of the kernel, making operations that appear adjacent occur a long time apart. This patch fixes the race by only linking/unlinking modules into/from the master module list with the machine in the "stopped" state. This means that any "static" information can be trusted as far as the next kernel reschedule on any given CPU without the need to hold any locks. However, I'm not sure how this is affected by preemption. I suspect more work may need to be done in that case, but I'm not entirely sure. This also means that rmmod has to bump the machine into the stopped state twice... but since that shouldn't be a common operation, I don't think that's a problem.
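The kallsyms.c hunk in the patch that follows stops handing callers a pointer into module memory: the result of module_address_lookup() is copied into the caller's namebuf with strncpy(), so the caller keeps nothing that points into a module that might later be unloaded. Below is a minimal userspace sketch of that copy-out pattern. copy_sym_name() is a hypothetical name, the value 127 for KSYM_NAME_LEN is used here for illustration, and the buffer is assumed to be KSYM_NAME_LEN + 1 bytes so it can always hold the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for the kernel's KSYM_NAME_LEN. */
#define KSYM_NAME_LEN 127

/*
 * Copy a symbol name that lives in memory we do not own (for example a
 * module's string table) into a caller-owned buffer of KSYM_NAME_LEN + 1
 * bytes. Afterwards the caller holds no pointer into the module, so the
 * result stays valid even if the module's memory is later reused.
 * Returns namebuf, or NULL when there is no name to copy.
 */
static const char *copy_sym_name(char *namebuf, const char *msym)
{
	assert(namebuf != NULL);
	if (msym == NULL)
		return NULL;
	strncpy(namebuf, msym, KSYM_NAME_LEN);
	/* strncpy() does not NUL-terminate when the source is too long. */
	namebuf[KSYM_NAME_LEN] = '\0';
	return namebuf;
}
```

With `char buf[KSYM_NAME_LEN + 1]`, `copy_sym_name(buf, msym)` returns `buf`, and overwriting the memory behind `msym` afterwards does not change what `buf` holds. The explicit terminator matters because strncpy() leaves the destination unterminated when the source is KSYM_NAME_LEN characters or longer.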
Signed-Off-By: David Howells --- warthog>diffstat kallsyms-race-2611rc1.diff kallsyms.c | 16 ++++++++++++++-- module.c | 35 ++++++++++++++++++++++++++++------- 2 files changed, 42 insertions(+), 9 deletions(-) diff -uNrp linux-2.6.11-rc1/kernel/kallsyms.c linux-2.6.11-rc1-kallsyms/kernel/kallsyms.c --- linux-2.6.11-rc1/kernel/kallsyms.c 2005-01-12 19:09:18.000000000 +0000 +++ linux-2.6.11-rc1-kallsyms/kernel/kallsyms.c 2005-01-17 15:33:55.000000000 +0000 @@ -139,13 +139,20 @@ unsigned long kallsyms_lookup_name(const return module_kallsyms_lookup_name(name); } -/* Lookup an address. modname is set to NULL if it's in the kernel. */ +/* + * Lookup an address + * - modname is set to NULL if it's in the kernel + * - we guarantee that the returned name is valid until we reschedule even if + * it resides in a module + * - we also guarantee that modname will be valid until rescheduled + */ const char *kallsyms_lookup(unsigned long addr, unsigned long *symbolsize, unsigned long *offset, char **modname, char *namebuf) { unsigned long i, low, high, mid; + const char *msym; /* This kernel should never had been booted. */ BUG_ON(!kallsyms_addresses); @@ -196,7 +203,12 @@ const char *kallsyms_lookup(unsigned lon return namebuf; } - return module_address_lookup(addr, symbolsize, offset, modname); + /* see if it's in a module */ + msym = module_address_lookup(addr, symbolsize, offset, modname); + if (msym) + return strncpy(namebuf, msym, KSYM_NAME_LEN); + + return NULL; } /* Replace "%s" in format with address, or returns -errno. 
*/ diff -uNrp linux-2.6.11-rc1/kernel/module.c linux-2.6.11-rc1-kallsyms/kernel/module.c --- linux-2.6.11-rc1/kernel/module.c 2005-01-12 19:09:18.000000000 +0000 +++ linux-2.6.11-rc1-kallsyms/kernel/module.c 2005-01-17 15:31:42.000000000 +0000 @@ -1072,14 +1072,24 @@ static void mod_kobject_remove(struct mo kobject_unregister(&mod->mkobj.kobj); } +/* + * unlink the module with the whole machine is stopped with interrupts off + * - this defends against kallsyms not taking locks + */ +static inline int __unlink_module(void *_mod) +{ + struct module *mod = _mod; + spin_lock(&modlist_lock); + list_del(&mod->list); + spin_unlock(&modlist_lock); + return 0; +} + /* Free a module, remove from lists, etc (must hold module mutex). */ static void free_module(struct module *mod) { /* Delete from various lists */ - spin_lock_irq(&modlist_lock); - list_del(&mod->list); - spin_unlock_irq(&modlist_lock); - + stop_machine_run(__unlink_module, mod, NR_CPUS); remove_sect_attrs(mod); mod_kobject_remove(mod); @@ -1732,6 +1742,19 @@ static struct module *load_module(void _ goto free_hdr; } +/* + * link the module with the whole machine is stopped with interrupts off + * - this defends against kallsyms not taking locks + */ +static inline int __link_module(void *_mod) +{ + struct module *mod = _mod; + spin_lock(&modlist_lock); + list_add(&mod->list, &modules); + spin_unlock(&modlist_lock); + return 0; +} + /* This is where the real work happens */ asmlinkage long sys_init_module(void __user *umod, @@ -1766,9 +1789,7 @@ sys_init_module(void __user *umod, /* Now sew it into the lists. They won't access us, since strong_try_module_get() will fail. 
*/ - spin_lock_irq(&modlist_lock); - list_add(&mod->list, &modules); - spin_unlock_irq(&modlist_lock); + stop_machine_run(__link_module, mod, NR_CPUS); /* Drop lock so they can recurse */ up(&module_mutex); From willschm at us.ibm.com Tue Jan 18 03:42:05 2005 From: willschm at us.ibm.com (Will Schmidt) Date: Mon, 17 Jan 2005 10:42:05 -0600 Subject: question about LMB's size In-Reply-To: Message-ID: Hi, ipseries-list-bounces at redhat.com wrote on 01/17/2005 05:00:46 AM: > Hi, > This is a question about the different of memory size between lpar and HMC. ... > 2. In lpar didolp2: We get the size of memory is 2174672KB. > [root at didolp2 ~]# cat /proc/meminfo > MemTotal: 2174672 kB > > The question is: 2174672/(32*1024) = 66.36572265625 MemTotal is the amount of usable memory in the partition, which does not include the memory that holds the kernel image (code, bss, data, init). There should be a few other pieces of data that will add up to the numbers you are looking for. In early boot messages, there is a line "SystemCfg->physicalMemorySize = 0x.......". This value should be precisely what you are trying to measure. A bit later in the logs, you can also see a line "Memory: XXXXk/YYYYk available (###k kernel code, ###k reserved, ###k data, ###k bss, ###k init)". The YYYYk should also match what you are looking for. > > whereas 2176/32=68.
> > 68 != 66.36572265625 > > -------------------------------------------- > Wang Zhaoyu > > Email: wangzyu at cn.ibm.com > Notes: Zhao Yu Wang/China/Contr/IBM at IBMCN-- > ipseries-list mailing list > ipseries-list at redhat.com > https://www.redhat.com/mailman/listinfo/ipseries-list -Will From linas at austin.ibm.com Tue Jan 18 07:14:15 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Mon, 17 Jan 2005 14:14:15 -0600 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <20050106192413.GK22274@austin.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> Message-ID: <20050117201415.GA11505@austin.ibm.com> Andrew, The attached file describes PCI bus EEH "Extended Error Handling" concepts and operation; could you drop this into the kernel documentation tree, at linux-2.6/Documentation/powerpc/eeh-pci-error-recovery.txt ? Signed-off-by: Linas Vepstas --linas p.s. It was not clear to me if the EEH patch previously sent (6 January 2005, same subject line) will be wending its way into the main Torvalds kernel tree, or not. I hadn't really gotten confirmation one way or another. -------------- next part -------------- PCI Bus EEH Error Recovery -------------------------- Linas Vepstas 12 January 2005 Overview: --------- The IBM POWER-based pSeries and iSeries computers include PCI bus controller chips that have extended capabilities for detecting and reporting a large variety of PCI bus error conditions. These features go under the name of "EEH", for "Extended Error Handling". The EEH hardware features allow PCI bus errors to be cleared and a PCI card to be "rebooted", without also having to reboot the operating system. This is in contrast to traditional PCI error handling, where the PCI chip is wired directly to the CPU, and an error would cause a CPU machine-check/check-stop condition, halting the CPU entirely. 
Another "traditional" technique is to ignore such errors, which can lead to corruption of both user data and kernel data, hung/unresponsive adapters, or system crashes/lockups. Thus, the idea behind EEH is that the operating system can become more reliable and robust by protecting it from PCI errors, and giving the OS the ability to "reboot"/recover individual PCI devices. Future systems from other vendors, based on the PCI-E specification, may contain similar features. Causes of EEH Errors -------------------- EEH was originally designed to guard against hardware failure, such as PCI cards dying from heat, humidity, dust, vibration and bad electrical connections. The vast majority of EEH errors seen in "real life" are due to either poorly seated PCI cards, or, unfortunately quite commonly, to device driver bugs, device firmware bugs, and sometimes PCI card hardware bugs. The most common software bug is one that causes the device to attempt to DMA to a location in system memory that has not been reserved for DMA access for that card. Catching such accesses is a powerful feature, as it prevents what would otherwise have been silent memory corruption caused by the bad DMA. A number of device driver bugs have been found and fixed in this way over the past few years. Other possible causes of EEH errors include data or address line parity errors (for example, due to poor electrical connectivity from a poorly seated card), and PCI-X split-completion errors (due to software, device firmware, or device PCI hardware bugs). The vast majority of "true hardware failures" can be cured by physically removing and re-seating the PCI card. Detection and Recovery ---------------------- In the following discussion, a generic overview of how to detect and recover from EEH errors will be presented. This is followed by an overview of how the current implementation in the Linux kernel does it.
The actual implementation is subject to change, and some of the finer points are still being debated. These may in turn be swayed if or when other architectures implement similar functionality. When a PCI Host Bridge (PHB, the bus controller connecting the PCI bus to the system CPU electronics complex) detects a PCI error condition, it will "isolate" the affected PCI card. Isolation will block all writes (either to the card from the system, or from the card to the system), and it will cause all reads to return all-ff's (0xff, 0xffff, 0xffffffff for 8/16/32-bit reads). This applies to accesses to PCI memory, I/O space, and PCI config space alike. The all-ff's value was chosen because it is the same value you would get if the device were physically unplugged from the slot. Interrupts, however, will continue to be delivered. Detection and recovery are performed with the aid of ppc64 firmware. The programming interfaces from the Linux kernel into the firmware are referred to as RTAS (Run-Time Abstraction Services). The Linux kernel does not (should not) access the EEH function in the PCI chipsets directly, primarily because there are a number of different chipsets out there, each with different interfaces and quirks. The firmware provides a uniform abstraction layer that will work with all pSeries and iSeries hardware (and be forwards-compatible). If the OS or device driver suspects that a PCI slot has been EEH-isolated, there is a firmware call it can make to determine if this is the case. If so, then the device driver should put itself into a consistent state (given that it won't be able to complete any pending work) and start recovery of the card. Recovery would normally consist of resetting the PCI device (asserting the PCI #RST line for two seconds), followed by setting up the device config space (the base address registers (BAR's), latency timer, cache line size, interrupt line, and so on). This is followed by a reinitialization of the device driver.
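As described above, reads from an isolated slot return all ones for the access width. Because a healthy device can legitimately return the same value, an all-ones read is only a trigger for asking the firmware, not proof of isolation. A minimal C sketch of that width-dependent test follows; read_looks_isolated() is a hypothetical helper, not the code in include/asm-ppc64/io.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a value read from a PCI device with the given access
 * width (8, 16 or 32 bits) is all ones: 0xff, 0xffff or 0xffffffff.
 * An isolated (or unplugged) device returns exactly this, but so can a
 * healthy register, so a true result means "ask the firmware whether the
 * slot is really frozen", never "the slot is frozen".
 */
static bool read_looks_isolated(uint32_t value, unsigned int width_bits)
{
	uint32_t all_ones;

	assert(width_bits == 8 || width_bits == 16 || width_bits == 32);
	all_ones = (width_bits == 32) ? UINT32_MAX
				      : ((UINT32_C(1) << width_bits) - 1);
	return value == all_ones;
}
```

Note that the comparison depends on the width: 0xff is all ones for a byte read but an ordinary value for a 16-bit read.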
In a worst-case scenario, the power to the card can be toggled, at least on hot-plug-capable slots. In principle, layers far above the device driver probably do not need to know that the PCI card has been "rebooted" in this way; ideally, there should be at most a pause in Ethernet/disk/USB I/O while the card is being reset. If the card cannot be recovered after three or four resets, the kernel/device driver should assume the worst-case scenario, that the card has died completely, and report this error to the sysadmin. In addition, error messages are reported through RTAS and also through syslogd (/var/log/messages) to alert the sysadmin of PCI resets. The correct way to deal with failed adapters is to use the standard PCI hotplug tools to remove and replace the dead card. Current PPC64 Linux EEH Implementation -------------------------------------- At this time, a generic EEH recovery mechanism has been implemented, so that individual device drivers do not need to be modified to support EEH recovery. This generic mechanism piggy-backs on the PCI hotplug infrastructure, and percolates events up through the hotplug/udev infrastructure. Following is a detailed description of how this is accomplished. EEH must be enabled in the PHBs very early during the boot process, and again whenever a PCI slot is hot-plugged. The former is performed by eeh_init() in arch/ppc64/kernel/eeh.c, and the latter by drivers/pci/hotplug/pSeries_pci.c calling into the eeh.c code. EEH must be enabled before a PCI scan of the device can proceed. Current Power5 hardware will not work unless EEH is enabled, although older Power4 hardware can run with it disabled. Effectively, EEH can no longer be turned off. PCI devices *must* be registered with the EEH code; the EEH code needs to know about the I/O address ranges of the PCI device in order to detect an error. Given an arbitrary address, the routine pci_get_device_by_addr() will find the pci device associated with that address (if any).
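The lookup that pci_get_device_by_addr() performs can be pictured as a search over the I/O ranges each device registered with the EEH code. The sketch below only illustrates that idea under stated assumptions: struct io_range and find_dev_by_addr() are hypothetical, a device is represented by a name string rather than a struct pci_dev, and the ranges are scanned linearly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one I/O range registered for a device. */
struct io_range {
	uint64_t start;   /* first address of the range */
	uint64_t end;     /* last address of the range (inclusive) */
	const char *dev;  /* stand-in for a struct pci_dev pointer */
};

/*
 * Return the device whose registered range contains addr, or NULL if no
 * registered device claims that address.
 */
static const char *find_dev_by_addr(const struct io_range *ranges,
				    size_t n, uint64_t addr)
{
	assert(ranges != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (addr >= ranges[i].start && addr <= ranges[i].end)
			return ranges[i].dev;
	return NULL;
}
```

A linear scan is the simplest correct choice for a handful of ranges; with many devices, keeping the ranges sorted would allow a binary search or tree lookup instead. An address that no device registered yields NULL, which is why unregistered devices cannot have their errors detected.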
The default include/asm-ppc64/io.h macros readb(), inb(), insb(), etc. include a check to see if the i/o read returned all-0xff's. If so, these make a call to eeh_dn_check_failure(), which in turn asks the firmware if the all-ff's value is the sign of a true EEH error. If it is not, processing continues as normal. The total number of these false alarms or "false positives" can be seen in /proc/ppc64/eeh (subject to change). Normally, almost all of these occur during boot, when the PCI bus is scanned, where a large number of 0xff reads are part of the bus scan procedure. If a frozen slot is detected, code in arch/ppc64/kernel/eeh.c will print a stack trace to syslog (/var/log/messages). This stack trace has proven to be very useful to device-driver authors for finding out at what point the EEH error was detected, as the error itself usually occurs slightly beforehand. Next, it uses the Linux kernel notifier chain/work queue mechanism to allow any interested parties to find out about the failure. Device drivers, or other parts of the kernel, can use eeh_register_notifier(struct notifier_block *) to find out about EEH events. The event will include a pointer to the pci device, the device node and some state info. Receivers of the event can "do as they wish"; the default handler will be described further in this section. To assist in the recovery of the device, eeh.c exports the following functions: rtas_set_slot_reset() -- assert the PCI #RST line for 1/8th of a second rtas_configure_bridge() -- ask firmware to configure any PCI bridges located topologically under the pci slot. eeh_save_bars() and eeh_restore_bars(): save and restore the PCI config-space info for a device and any devices under it. A handler for the EEH notifier_block events is implemented in drivers/pci/hotplug/pSeries_pci.c, called handle_eeh_events(). It saves the device BAR's and then calls rpaphp_unconfig_pci_adapter().
This last call causes the device driver for the card to be stopped, which causes hotplug events to go out to user space. This triggers user-space scripts that might issue commands such as "ifdown eth0" for ethernet cards, and so on. This handler then sleeps for 5 seconds, hoping to give the user-space scripts enough time to complete. It then resets the PCI card and reconfigures the device BAR's and any bridges underneath. It then calls rpaphp_enable_pci_slot(), which restarts the device driver and triggers more user-space events (for example, calling "ifup eth0" for ethernet cards). Device Shutdown and User-Space Events ------------------------------------- This section documents what happens when a pci slot is unconfigured, focusing on how the device driver gets shut down, and on how the events get delivered to user-space scripts. Following is an example sequence of events that cause a device driver close function to be called during the first phase of an EEH reset, using the pcnet32 device driver as the example.
rpaphp_unconfig_pci_adapter (struct slot *) // in rpaphp_pci.c { calls pci_remove_bus_device (struct pci_dev *) // in /drivers/pci/remove.c { calls pci_destroy_dev (struct pci_dev *) { calls device_unregister (&dev->dev) // in /drivers/base/core.c { calls device_del (struct device *) { calls bus_remove_device() // in /drivers/base/bus.c { calls device_release_driver() { calls struct device_driver->remove() which is just pci_device_remove() // in /drivers/pci/pci_driver.c { calls struct pci_driver->remove() which is just pcnet32_remove_one() // in /drivers/net/pcnet32.c { calls unregister_netdev() // in /net/core/dev.c { calls dev_close() // in /net/core/dev.c { calls dev->stop(); which is just pcnet32_close() // in pcnet32.c { which does what you wanted to stop the device } } } which frees pcnet32 device driver memory } }}}}}} in drivers/pci/pci_driver.c, struct device_driver->remove() is just pci_device_remove() which calls struct pci_driver->remove() which is pcnet32_remove_one() which calls unregister_netdev() (in net/core/dev.c) which calls dev_close() (in net/core/dev.c) which calls dev->stop() which is pcnet32_close() which then does the appropriate shutdown. --- Following is the analogous stack trace for events sent to user-space when the pci device is unconfigured. rpaphp_unconfig_pci_adapter() { // in rpaphp_pci.c calls pci_remove_bus_device (struct pci_dev *) { // in /drivers/pci/remove.c calls pci_destroy_dev (struct pci_dev *) { calls device_unregister (&dev->dev) { // in /drivers/base/core.c calls device_del(struct device * dev) { // in /drivers/base/core.c calls kobject_del() { //in /lib/kobject.c calls kobject_hotplug() { // in /lib/kobject.c calls kset_hotplug() { // in /lib/kobject.c calls kset->hotplug_ops->hotplug() which is really just a call to dev_hotplug() { // in /drivers/base/core.c calls dev->bus->hotplug() which is really just a call to pci_hotplug () { // in drivers/pci/hotplug.c which prints device name, etc....
} } then kset_hotplug() calls call_usermodehelper () with argv[0]=hotplug_path[] which is "/sbin/hotplug" --> event to userspace, } } kobject_del() then calls sysfs_remove_dir(), which would allow any user-space daemon that was watching sysfs to notice the delete event. Pro's and Con's of the Current Design ------------------------------------- There are several issues with the current EEH software recovery design, which may be addressed in future revisions. But first, note that the big plus of the current design is that no changes need to be made to individual device drivers, so that the current design throws a wide net. The biggest negative of the design is that it potentially disturbs network daemons and file systems that didn't need to be disturbed. -- A minor complaint is that resetting the network card causes user-space back-to-back ifdown/ifup burps that potentially disturb network daemons that didn't even need to know that the pci card was being rebooted. -- A more serious concern is that the same reset, for SCSI devices, wreaks havoc on mounted file systems. Scripts cannot unmount a file system after the fact without flushing pending buffers, and flushing is impossible because I/O has already been stopped. Thus, ideally, the reset should happen at or below the block layer, so that the file systems are not disturbed. Reiserfs does not tolerate errors returned from the block device. Ext3fs seems to be tolerant, retrying reads/writes until it does succeed. Both have been only lightly tested in this scenario. The SCSI-generic subsystem already has built-in code for performing SCSI device resets, SCSI bus resets, and SCSI host-bus-adapter (HBA) resets. These are cascaded into a chain of attempted resets if a SCSI command fails. These are completely hidden from the block layer. It would be very natural to add an EEH reset into this chain of events.
-- If a SCSI error occurs for the root device, all is lost unless the sysadmin had the foresight to run /bin, /sbin, /etc, /var and so on, out of ramdisk/tmpfs. Conclusions ----------- There's forward progress ... From nacc at us.ibm.com Tue Jan 18 10:50:05 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Mon, 17 Jan 2005 15:50:05 -0800 Subject: [PATCH 16/21] ppc64/iSeries_pci_reset: replace schedule_timeout() with msleep() Message-ID: <20050117235005.GY24698@us.ibm.com> Hi, Please consider applying. Description: Use msleep() instead of schedule_timeout() to guarantee the task delays as expected. The code is not wrong as is, but I see two benefits to using msleep(): 1) real time delays (milliseconds) and 2) consistency across the kernel with respect to longer delays. Change the units of the WaitDelay and AssertDelay constants accordingly. Signed-off-by: Nishanth Aravamudan --- 2.6.11-rc1-kj-v/arch/ppc64/kernel/iSeries_pci_reset.c 2005-01-15 16:55:41.000000000 -0800 +++ 2.6.11-rc1-kj/arch/ppc64/kernel/iSeries_pci_reset.c 2005-01-15 17:17:54.000000000 -0800 @@ -32,6 +32,7 @@ #include #include #include +#include #include #include @@ -49,7 +50,7 @@ int iSeries_Device_ToggleReset(struct pci_dev *PciDev, int AssertTime, int DelayTime) { - unsigned long AssertDelay, WaitDelay; + unsigned int AssertDelay, WaitDelay; struct iSeries_Device_Node *DeviceNode = (struct iSeries_Device_Node *)PciDev->sysdata; @@ -62,14 +63,14 @@ int iSeries_Device_ToggleReset(struct pc * Set defaults, Assert is .5 second, Wait is 3 seconds. 
*/ if (AssertTime == 0) - AssertDelay = (5 * HZ) / 10; + AssertDelay = 500; else - AssertDelay = (AssertTime * HZ) / 10; + AssertDelay = AssertTime * 100; if (DelayTime == 0) - WaitDelay = (30 * HZ) / 10; + WaitDelay = 3000; else - WaitDelay = (DelayTime * HZ) / 10; + WaitDelay = DelayTime * 100; /* * Assert reset @@ -77,8 +78,7 @@ int iSeries_Device_ToggleReset(struct pc DeviceNode->ReturnCode = HvCallPci_setSlotReset(ISERIES_BUS(DeviceNode), 0x00, DeviceNode->AgentId, 1); if (DeviceNode->ReturnCode == 0) { - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(AssertDelay); /* Sleep for the time */ + msleep(AssertDelay); /* Sleep for the time */ DeviceNode->ReturnCode = HvCallPci_setSlotReset(ISERIES_BUS(DeviceNode), 0x00, DeviceNode->AgentId, 0); @@ -86,8 +86,7 @@ int iSeries_Device_ToggleReset(struct pc /* * Wait for device to reset */ - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(WaitDelay); + msleep(WaitDelay); } if (DeviceNode->ReturnCode == 0) PCIFR("Slot 0x%04X.%02 Reset\n", ISERIES_BUS(DeviceNode), From nacc at us.ibm.com Tue Jan 18 11:15:22 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Mon, 17 Jan 2005 16:15:22 -0800 Subject: [PATCH 17/21] ppc64/pSeries_smp: replace schedule_timeout() with msleep() Message-ID: <20050118001522.GZ24698@us.ibm.com> Hi, Please consider applying. Description: Use msleep() instead of schedule_timeout() to guarantee the task delays as expected. The current code is not incorrect, but msleep() is clearer in terms of the length of delay and helps make the kernel consistent. 
Signed-off-by: Nishanth Aravamudan --- 2.6.11-rc1-kj-v/arch/ppc64/kernel/pSeries_smp.c 2005-01-15 16:55:41.000000000 -0800 +++ 2.6.11-rc1-kj/arch/ppc64/kernel/pSeries_smp.c 2005-01-15 17:21:12.000000000 -0800 @@ -107,8 +107,7 @@ void pSeries_cpu_die(unsigned int cpu) cpu_status = query_cpu_stopped(pcpu); if (cpu_status == 0 || cpu_status == -1) break; - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(HZ/5); + msleep(200); } if (cpu_status != 0) { printk("Querying DEAD? cpu %i (%i) shows %i\n", From nacc at us.ibm.com Tue Jan 18 11:18:19 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Mon, 17 Jan 2005 16:18:19 -0800 Subject: [PATCH 18/21] ppc64/rtasd: replace schedule_timeout() with msleep() Message-ID: <20050118001819.GA24698@us.ibm.com> Hi, Please consider applying. Description: Replace schedule_timeout() with msleep()/ssleep(). In both cases, the current code sleeps in TASK_INTERRUPTIBLE but does not account for early wakeups due to signals being caught; therefore I have used TASK_UNINTERRUPTIBLE sleeps in both cases. The second sleep is slightly more difficult to convert as rtas_event_scan_rate is variable. I have left it as a msleep() call, although ssleep() may be more appropriate. Signed-off-by: Nishanth Aravamudan --- 2.6.11-rc1-kj-v/arch/ppc64/kernel/rtasd.c 2005-01-15 16:55:41.000000000 -0800 +++ 2.6.11-rc1-kj/arch/ppc64/kernel/rtasd.c 2005-01-15 17:28:50.000000000 -0800 @@ -19,6 +19,7 @@ #include #include #include +#include #include #include @@ -444,8 +445,7 @@ static int rtasd(void *unused) DEBUG("watchdog scheduled on cpu %d\n", smp_processor_id()); do_event_scan(event_scan); - set_current_state(TASK_INTERRUPTIBLE); - schedule_timeout(HZ); + ssleep(1); } unlock_cpu_hotplug(); @@ -466,8 +466,7 @@ static int rtasd(void *unused) * one second since some machines have problems if we * call event-scan too quickly). 
*/ unlock_cpu_hotplug(); - set_current_state(TASK_INTERRUPTIBLE); - schedule_timeout((HZ*60/rtas_event_scan_rate) / 2); + msleep(30000/rtas_event_scan_rate); lock_cpu_hotplug(); cpu = next_cpu(cpu, cpu_online_map); From nacc at us.ibm.com Tue Jan 18 11:20:13 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Mon, 17 Jan 2005 16:20:13 -0800 Subject: [PATCH 19/21] ppc64/smp: replace schedule_timeout() with msleep() Message-ID: <20050118002013.GB24698@us.ibm.com> Hi, Please consider applying. Description: Use msleep() instead of schedule_timeout() to guarantee the task delays as expected. The current code is not incorrect; however using msleep() encourages using real time-unit sleeps and keeps the kernel consistent. Signed-off-by: Nishanth Aravamudan --- 2.6.11-rc1-kj-v/arch/ppc64/kernel/smp.c 2005-01-15 16:55:41.000000000 -0800 +++ 2.6.11-rc1-kj/arch/ppc64/kernel/smp.c 2005-01-15 17:30:16.000000000 -0800 @@ -459,8 +459,7 @@ int __devinit __cpu_up(unsigned int cpu) * hotplug case. Wait five seconds. */ for (c = 25; c && !cpu_callin_map[cpu]; c--) { - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(HZ/5); + msleep(200); } #endif From nacc at us.ibm.com Tue Jan 18 11:21:30 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Mon, 17 Jan 2005 16:21:30 -0800 Subject: [PATCH 20/21] ppc64/traps: replace schedule_timeout() with ssleep() Message-ID: <20050118002130.GC24698@us.ibm.com> Hi, Please consider applying. Description: Use ssleep() instead of schedule_timeout() to guarantee the task delays as expected. The current code is not incorrect, but using ssleep() encourages specifying delays in real time-units and consistency across the kernel. 
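The delay arithmetic in patches 18 to 20 can be reproduced in a short user-space C sketch. The helper names are hypothetical. The sketch assumes rtas_event_scan_rate is a positive number of scans per minute, which is what the 60-second numerator in the old expression suggests. The rtasd change is not purely a unit change: the old code slept in TASK_INTERRUPTIBLE, while msleep() and ssleep() sleep uninterruptibly, as the patch 18 description notes.

```c
#include <assert.h>

/*
 * Hypothetical user-space helpers (not kernel APIs) that reproduce the
 * arithmetic of the rtasd and smp conversions.
 */

/* Old rtasd delay in jiffies: half of (60 s / rtas_event_scan_rate). */
static unsigned long rtasd_old_jiffies(unsigned long hz, unsigned long rate)
{
	assert(rate > 0);
	return (hz * 60 / rate) / 2;
}

/* New rtasd delay in milliseconds, as passed to msleep(). */
static unsigned long rtasd_new_ms(unsigned long rate)
{
	assert(rate > 0);
	return 30000 / rate;
}

/* Worst-case total wait of the __cpu_up() polling loop: 25 x msleep(200). */
static unsigned long cpu_up_total_ms(void)
{
	unsigned long total = 0;

	for (int c = 25; c; c--)
		total += 200;
	return total;
}
```

With HZ = 1000 and a scan rate of 4, both forms give 7500 ms. With HZ = 100 and a rate of 7, integer division makes the old code sleep 428 jiffies (4280 ms), while msleep(30000/7) requests 4285 ms, so the two forms can differ by a few milliseconds. The __cpu_up() loop still waits about five seconds in total, matching its comment, and ssleep(5) in die() replaces 5 * HZ jiffies directly.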
Signed-off-by: Nishanth Aravamudan --- 2.6.11-rc1-kj-v/arch/ppc64/kernel/traps.c 2005-01-15 16:55:41.000000000 -0800 +++ 2.6.11-rc1-kj/arch/ppc64/kernel/traps.c 2005-01-15 17:30:39.000000000 -0800 @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -137,8 +138,7 @@ int die(const char *str, struct pt_regs if (panic_on_oops) { printk(KERN_EMERG "Fatal exception: panic in 5 seconds\n"); - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(5 * HZ); + ssleep(5); panic("Fatal exception"); } do_exit(SIGSEGV); From benh at kernel.crashing.org Tue Jan 18 11:49:15 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 18 Jan 2005 11:49:15 +1100 Subject: [PATCH] PPC64 pmac hotplug cpu In-Reply-To: <41EBD662.1080409@nortelnetworks.com> References: <1105827794.27410.82.camel@gaston> <1105937266.4534.0.camel@gaston> <41EBD662.1080409@nortelnetworks.com> Message-ID: <1106009355.4533.19.camel@gaston> On Mon, 2005-01-17 at 09:14 -0600, Chris Friesen wrote: > Benjamin Herrenschmidt wrote: > > > Well.. the cache flush part requires some not-really-documentd stuff on > > the 970, but I'll try to come up with something. > > Details? We've got a cache-flush routine put together based on the > documentation that seems to be working, but if there's something else > that has to be done I'd love to know about it. Well, I don't have all the details at hand right now, but it involves using SCOM (with appropriate workarounds for CPU SCOM bugs on some 970's) to switch the L2 to direct addressing iirc. Ben. 
From rusty at rustcorp.com.au Tue Jan 18 13:20:03 2005 From: rusty at rustcorp.com.au (Rusty Russell) Date: Tue, 18 Jan 2005 13:20:03 +1100 Subject: [PATCH] Fix kallsyms/insmod/rmmod race In-Reply-To: <31453.1105979239@redhat.com> References: <31453.1105979239@redhat.com> Message-ID: <1106014803.30801.22.camel@localhost.localdomain> On Mon, 2005-01-17 at 16:27 +0000, David Howells wrote: > The attached patch fixes a race between kallsyms and insmod/rmmod. Hi David, The more I looked at this, the more I warmed to it. I've known for a while that people are using kallsyms not for OOPS (eg. /proc/$$/wchan), so we should provide a "grabs locks" version, but this solution gets around that nicely, while making life more certain for the oops case, too. Good work! Rusty. -- A bad analogy is like a leaky screwdriver -- Richard Braakman From dhowells at redhat.com Wed Jan 19 06:44:28 2005 From: dhowells at redhat.com (David Howells) Date: Tue, 18 Jan 2005 19:44:28 +0000 Subject: [PATCH] Fix kallsyms/insmod/rmmod race In-Reply-To: <1106014803.30801.22.camel@localhost.localdomain> References: <1106014803.30801.22.camel@localhost.localdomain> <31453.1105979239@redhat.com> Message-ID: <1561.1106077468@redhat.com> Rusty Russell wrote: > The more I looked at this, the more I warmed to it. I've known for a > while that people are using kallsyms not for OOPS (eg. /proc/$$/wchan), > so we should provide a "grabs locks" version, but this solution gets > around that nicely, while making life more certain for the oops case, > too. Hmmm... though it works on i386 SMP, it doesn't, however, seem to work on ppc64 SMP:-/ My pSeries box seems to think that it can't find any symbols from previously loaded modules, and my Power5 box is quite happy to load modules that depend on other modules but panics because it can't mount its root fs. This is very odd, because the patch is simple enough. Is there anything obvious I've missed that you can see? 
Or maybe I'm just misunderstanding how stop_machine_run() works... maybe it can't be called during initialisation. David From benh at kernel.crashing.org Wed Jan 19 13:54:51 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 19 Jan 2005 13:54:51 +1100 Subject: [PATCH] ppc64/ppc: Cleanup PCI skipping Message-ID: <1106103291.4500.147.camel@gaston> Hi ! The g5 code has special hooks to "hide" some PCI devices when they are off. Currently, this code involves some calls to match a pci_dev from the open firmware node and such things that are causing some problems with the latest version of my sungem driver who wants to do some of this in atomic contexts. This patch moves that to a list of struct device_node instead, which also ends up simplifying the code. Later, I'll go back to manipulating PCI devices in a clean way when Brian King's PCI blocking patch gets in, but only after I change sungem again to never call these in atomic context. This is a 3 step transition basically Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/pmac_feature.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pmac_feature.c 2004-11-22 11:49:24.000000000 +1100 +++ linux-work/arch/ppc64/kernel/pmac_feature.c 2005-01-19 13:48:25.000000000 +1100 @@ -111,7 +111,7 @@ static u32 uninorth_rev __pmacdata; static void *u3_ht; -extern struct pci_dev *k2_skiplist[2]; +extern struct device_node *k2_skiplist[2]; /* * For each motherboard family, we have a table of functions pointers @@ -160,30 +160,17 @@ { struct macio_chip* macio = &macio_chips[0]; unsigned long flags; - struct pci_dev *pdev = NULL; if (node == NULL) return -ENODEV; - /* XXX FIXME: We should fix pci_device_from_OF_node here, and - * get to a real pci_dev or we'll get into trouble with PCI - * domains the day we get overlapping numbers (like if we ever - * decide to show the HT root. - * Note that we only get the slot when value is 0. 
This is called - * early during boot with value 1 to enable all devices, at which - * point, we don't yet have probed pci_find_slot, so it would fail - * to look for the slot at this point. - */ - if (!value) - pdev = pci_find_slot(node->busno, node->devfn); - LOCK(flags); if (value) { MACIO_BIS(KEYLARGO_FCR1, K2_FCR1_GMAC_CLK_ENABLE); mb(); k2_skiplist[0] = NULL; } else { - k2_skiplist[0] = pdev; + k2_skiplist[0] = node; mb(); MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_GMAC_CLK_ENABLE); } @@ -198,30 +185,17 @@ { struct macio_chip* macio = &macio_chips[0]; unsigned long flags; - struct pci_dev *pdev = NULL; - /* XXX FIXME: We should fix pci_device_from_OF_node here, and - * get to a real pci_dev or we'll get into trouble with PCI - * domains the day we get overlapping numbers (like if we ever - * decide to show the HT root - * Note that we only get the slot when value is 0. This is called - * early during boot with value 1 to enable all devices, at which - * point, we don't yet have probed pci_find_slot, so it would fail - * to look for the slot at this point. 
- */ if (node == NULL) return -ENODEV; - if (!value) - pdev = pci_find_slot(node->busno, node->devfn); - LOCK(flags); if (value) { MACIO_BIS(KEYLARGO_FCR1, K2_FCR1_FW_CLK_ENABLE); mb(); k2_skiplist[1] = NULL; } else { - k2_skiplist[1] = pdev; + k2_skiplist[1] = node; mb(); MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_FW_CLK_ENABLE); } Index: linux-work/arch/ppc64/kernel/pmac_pci.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pmac_pci.c 2005-01-14 08:17:11.000000000 +1100 +++ linux-work/arch/ppc64/kernel/pmac_pci.c 2005-01-19 13:44:50.000000000 +1100 @@ -43,7 +43,7 @@ * assuming we won't have both UniNorth and Bandit */ static int has_uninorth; static struct pci_controller *u3_agp; -struct pci_dev *k2_skiplist[2]; +struct device_node *k2_skiplist[2]; static int __init fixup_one_level_bus_range(struct device_node *node, int higher) { @@ -233,15 +233,6 @@ struct device_node *busdn, *dn; int i; - /* - * When a device in K2 is powered down, we die on config - * cycle accesses. Fix that here. - */ - for (i=0; i<2; i++) - if (k2_skiplist[i] && k2_skiplist[i]->bus == bus && - k2_skiplist[i]->devfn == devfn) - return 1; - /* We only allow config cycles to devices that are in OF device-tree * as we are apparently having some weird things going on with some * revs of K2 on recent G5s @@ -256,6 +247,14 @@ if (dn == NULL) return -1; + /* + * When a device in K2 is powered down, we die on config + * cycle accesses. Fix that here. 
+ */ + for (i=0; i<2; i++) + if (k2_skiplist[i] == dn) + return 1; + return 0; } Index: linux-work/arch/ppc/platforms/pmac_feature.c =================================================================== --- linux-work.orig/arch/ppc/platforms/pmac_feature.c 2005-01-18 17:50:10.000000000 +1100 +++ linux-work/arch/ppc/platforms/pmac_feature.c 2005-01-19 13:46:06.000000000 +1100 @@ -56,7 +56,7 @@ #endif extern int powersave_nap; -extern struct pci_dev *k2_skiplist[2]; +extern struct device_node *k2_skiplist[2]; /* @@ -1328,16 +1328,6 @@ { struct macio_chip* macio = &macio_chips[0]; unsigned long flags; - struct pci_dev *pdev; - u8 pbus, pid; - - /* XXX FIXME: We should fix pci_device_from_OF_node here, and - * get to a real pci_dev or we'll get into trouble with PCI - * domains the day we get overlapping numbers (like if we ever - * decide to show the HT root - */ - if (pci_device_from_OF_node(node, &pbus, &pid) == 0) - pdev = pci_find_slot(pbus, pid); LOCK(flags); if (value) { @@ -1345,7 +1335,7 @@ mb(); k2_skiplist[0] = NULL; } else { - k2_skiplist[0] = pdev; + k2_skiplist[0] = node; mb(); MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_GMAC_CLK_ENABLE); } @@ -1361,16 +1351,6 @@ { struct macio_chip* macio = &macio_chips[0]; unsigned long flags; - struct pci_dev *pdev; - u8 pbus, pid; - - /* XXX FIXME: We should fix pci_device_from_OF_node here, and - * get to a real pci_dev or we'll get into trouble with PCI - * domains the day we get overlapping numbers (like if we ever - * decide to show the HT root - */ - if (pci_device_from_OF_node(node, &pbus, &pid) == 0) - pdev = pci_find_slot(pbus, pid); LOCK(flags); if (value) { @@ -1378,7 +1358,7 @@ mb(); k2_skiplist[1] = NULL; } else { - k2_skiplist[1] = pdev; + k2_skiplist[1] = node; mb(); MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_FW_CLK_ENABLE); } Index: linux-work/arch/ppc/platforms/pmac_pci.c =================================================================== --- linux-work.orig/arch/ppc/platforms/pmac_pci.c 2005-01-18 17:50:11.000000000 +1100 
+++ linux-work/arch/ppc/platforms/pmac_pci.c 2005-01-19 13:46:58.000000000 +1100 @@ -52,7 +52,7 @@ extern u8 pci_cache_line_size; extern int pcibios_assign_bus_offset; -struct pci_dev *k2_skiplist[2]; +struct device_node *k2_skiplist[2]; /* * Magic constants for enabling cache coherency in the bandit/PSX bridge. @@ -325,8 +325,7 @@ * cycle accesses. Fix that here. */ for (i=0; i<2; i++) - if (k2_skiplist[i] && k2_skiplist[i]->bus == bus && - k2_skiplist[i]->devfn == devfn) { + if (k2_skiplist[i] == np) { switch (len) { case 1: *val = 0xff; break; @@ -375,8 +374,7 @@ * cycle accesses. Fix that here. */ for (i=0; i<2; i++) - if (k2_skiplist[i] && k2_skiplist[i]->bus == bus && - k2_skiplist[i]->devfn == devfn) + if (k2_skiplist[i] == np) return PCIBIOS_SUCCESSFUL; addr = u3_ht_cfg_access(hose, bus->number, devfn, offset); From anton at samba.org Wed Jan 19 15:12:30 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 19 Jan 2005 15:12:30 +1100 Subject: [PATCH] ppc64: Minimum hashtable size Message-ID: <20050119041230.GB21682@krispykreme.ozlabs.ibm.com> From: Milton Miller We weren't enforcing the minimum hardware MMU hashtable size.
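The sizing rule this patch enforces can be illustrated with a user-space C sketch. It assumes the memory size is first rounded up to a power of two, as the "rnd_mem_size <<= 1" context line suggests, and that each PTEG is 128 bytes, matching the "pteg_count << 7" in the diff. ilog2_ul(), roundup_pow2(), htab_pft_size() and sdr1_htabsize() are hypothetical stand-ins, not kernel functions. The SDR1 encoding mirrors the "_SDR1 = table + __ilog2(pteg_count) - 11" line in the htab cleanup patch.

```c
#include <assert.h>

/* Floor of log2(x) for x > 0; stands in for the kernel's __ilog2(). */
static unsigned int ilog2_ul(unsigned long x)
{
	unsigned int r = 0;

	assert(x > 0);
	while (x >>= 1)
		r++;
	return r;
}

/* Round x (> 0) up to the next power of two. */
static unsigned long roundup_pow2(unsigned long x)
{
	unsigned long r = 1UL << ilog2_ul(x);

	return r < x ? r << 1 : r;
}

/*
 * One PTEG per two 4 KB pages of rounded memory (>> 13), but never fewer
 * than 2^11 PTEGs. Each PTEG is 128 bytes (8 HPTEs x 16 bytes).
 * Returns log2 of the hash table size in bytes (ppc64_pft_size).
 */
static unsigned int htab_pft_size(unsigned long mem_bytes)
{
	unsigned long rnd_mem_size = roundup_pow2(mem_bytes);
	unsigned long pteg_count = rnd_mem_size >> (12 + 1);

	if (pteg_count < (1UL << 11))
		pteg_count = 1UL << 11;
	return ilog2_ul(pteg_count << 7);
}

/* HTABSIZE value encoded into SDR1 for a table of 2^pft_size bytes. */
static unsigned int sdr1_htabsize(unsigned int pft_size)
{
	unsigned long pteg_count = (1UL << pft_size) >> 7;

	return ilog2_ul(pteg_count) - 11;
}
```

With the clamp, any machine with 16 MB or less of rounded memory gets the 2^11-PTEG minimum: a 256 KB table (ppc64_pft_size of 18), which encodes as an SDR1 HTABSIZE of 0. A 2 GB machine gets 2^18 PTEGs, a 32 MB table.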
Signed-off-by: Milton Miller Signed-off-by: Anton Blanchard diff -puN arch/ppc64/kernel/prom.c~minimum_hashtable_size arch/ppc64/kernel/prom.c --- foobar2/arch/ppc64/kernel/prom.c~minimum_hashtable_size 2005-01-19 15:06:47.729610075 +1100 +++ foobar2-anton/arch/ppc64/kernel/prom.c 2005-01-19 15:07:06.577082744 +1100 @@ -1055,7 +1055,7 @@ void __init early_init_devtree(void *par rnd_mem_size <<= 1; /* # pages / 2 */ - pteg_count = (rnd_mem_size >> (12 + 1)); + pteg_count = max(rnd_mem_size >> (12 + 1), 1UL << 11); ppc64_pft_size = __ilog2(pteg_count << 7); } _ From benh at kernel.crashing.org Wed Jan 19 15:31:55 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 19 Jan 2005 15:31:55 +1100 Subject: vDSO update Message-ID: <1106109115.4499.171.camel@gaston> I posted a new vDSO patch at http://gate.crashing.org/~benh/ppc64-vdso-20050119.diff Now, both the 32-bit and 64-bit vDSOs are linked at "0" and export symbols as offsets to functions rather than as real function symbols (I made them consistent), and I updated the patch to apply against current Linus bk. -- Benjamin Herrenschmidt From sfr at canb.auug.org.au Wed Jan 19 15:48:57 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 19 Jan 2005 15:48:57 +1100 Subject: [PATCH] htab code cleanup In-Reply-To: <1105828597.27435.88.camel@gaston> References: <20050106145102.0c3c60ad.sfr@canb.auug.org.au> <1105828597.27435.88.camel@gaston> Message-ID: <20050119154857.7cec8fbb.sfr@canb.auug.org.au> On Sun, 16 Jan 2005 09:36:37 +1100 Benjamin Herrenschmidt wrote: > > On Thu, 2005-01-06 at 14:51 +1100, Stephen Rothwell wrote: > > Hi all, > > > > This patch just does some small cleanups on the hash page table code > > - make htab_address static within htab_native.c > > - move some code that depended on CONFIG_PPC_MULTIPLATFORM > > from htab_utils.c to htab_native.c (one less CONFIG check).
> > - clean up includes in htab_utils.c > > I don't see the point of moving create_pte_mapping() and > htab_initialize() to htab_native.c since it contains code for both > native and non-native... > > If you want to get rid of the htab_address, then maybe split > htab_initialize in bits... like htab_native_init() and htab_plpar_init() > for the early ptr setup, that sort of thing ... OK, how about this one, then? This has been built and booted on iSeries, pSeries (bare metal and lpar) and a G5 (with and without iommu). -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk.new/arch/ppc64/kernel/iSeries_setup.c linus-bk-sfr.14.new/arch/ppc64/kernel/iSeries_setup.c --- linus-bk.new/arch/ppc64/kernel/iSeries_setup.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk-sfr.14.new/arch/ppc64/kernel/iSeries_setup.c 2005-01-18 16:46:06.000000000 +1100 @@ -477,12 +477,6 @@ htab_hash_mask = num_ptegs - 1; /* - * The actual hashed page table is in the hypervisor, - * we have no direct access - */ - htab_address = NULL; - - /* * Determine if absolute memory has any * holes so that we can interpret the * access map we get back from the hypervisor diff -ruN linus-bk.new/arch/ppc64/kernel/setup.c linus-bk-sfr.14.new/arch/ppc64/kernel/setup.c --- linus-bk.new/arch/ppc64/kernel/setup.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk-sfr.14.new/arch/ppc64/kernel/setup.c 2005-01-18 16:46:23.000000000 +1100 @@ -674,7 +674,6 @@ ppc64_caches.dline_size); printk("ppc64_caches.icache_line_size = 0x%x\n", ppc64_caches.iline_size); - printk("htab_address = 0x%p\n", htab_address); printk("htab_hash_mask = 0x%lx\n", htab_hash_mask); printk("-----------------------------------------------------\n"); diff -ruN linus-bk.new/arch/ppc64/mm/Makefile linus-bk-sfr.14.new/arch/ppc64/mm/Makefile --- linus-bk.new/arch/ppc64/mm/Makefile 2004-09-24 15:23:06.000000000 +1000 +++ linus-bk-sfr.14.new/arch/ppc64/mm/Makefile 2005-01-18 18:28:57.000000000 +1100 @@ 
-8,4 +8,4 @@ slb_low.o slb.o stab.o mmap.o obj-$(CONFIG_DISCONTIGMEM) += numa.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o -obj-$(CONFIG_PPC_MULTIPLATFORM) += hash_native.o +obj-$(CONFIG_PPC_MULTIPLATFORM) += hash_multi.o hash_native.o diff -ruN linus-bk.new/arch/ppc64/mm/hash_multi.c linus-bk-sfr.14.new/arch/ppc64/mm/hash_multi.c --- linus-bk.new/arch/ppc64/mm/hash_multi.c 1970-01-01 10:00:00.000000000 +1000 +++ linus-bk-sfr.14.new/arch/ppc64/mm/hash_multi.c 2005-01-18 18:27:48.000000000 +1100 @@ -0,0 +1,177 @@ +/* + * multiplatform hashtable management. + * + * SMP scalability work: + * Copyright (C) 2001 Anton Blanchard , IBM + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef DEBUG +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...) +#endif + +/* + * Note: pte --> Linux PTE + * HPTE --> PowerPC Hashed Page Table Entry + * + * Execution context: + * htab_initialize is called with the MMU off (of course), but + * the kernel has been copied down to zero so it can directly + * reference global data. At this point it is very difficult + * to print debug info. 
+ * + */ + +#ifdef CONFIG_U3_DART +extern unsigned long dart_tablebase; +#endif /* CONFIG_U3_DART */ + +#define KB (1024) +#define MB (1024*KB) + +static inline void loop_forever(void) +{ + volatile unsigned long x = 1; + for(;x;x|=1) + ; +} + +static inline void create_pte_mapping(unsigned long start, unsigned long end, + unsigned long mode, int large) +{ + unsigned long addr; + unsigned int step; + + if (large) + step = 16*MB; + else + step = 4*KB; + + for (addr = start; addr < end; addr += step) { + unsigned long vpn, hash, hpteg; + unsigned long vsid = get_kernel_vsid(addr); + unsigned long va = (vsid << 28) | (addr & 0xfffffff); + int ret; + + if (large) + vpn = va >> HPAGE_SHIFT; + else + vpn = va >> PAGE_SHIFT; + + hash = hpt_hash(vpn, large); + + hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); + +#ifdef CONFIG_PPC_PSERIES + if (systemcfg->platform & PLATFORM_LPAR) + ret = pSeries_lpar_hpte_insert(hpteg, va, + virt_to_abs(addr) >> PAGE_SHIFT, + 0, mode, 1, large); + else +#endif /* CONFIG_PPC_PSERIES */ + ret = native_hpte_insert(hpteg, va, + virt_to_abs(addr) >> PAGE_SHIFT, + 0, mode, 1, large); + + if (ret == -1) { + ppc64_terminate_msg(0x20, "create_pte_mapping"); + loop_forever(); + } + } +} + +void __init htab_initialize(void) +{ + unsigned long htab_size_bytes; + unsigned long pteg_count; + unsigned long mode_rw; + int i, use_largepages = 0; + + DBG(" -> htab_initialize()\n"); + + /* + * Calculate the required size of the htab. We want the number of + * PTEGs to equal one half the number of real pages. + */ + htab_size_bytes = 1UL << ppc64_pft_size; + pteg_count = htab_size_bytes >> 7; + + /* For debug, make the HTAB 1/8 as big as it normally would be. 
*/ + ifppcdebug(PPCDBG_HTABSIZE) { + pteg_count >>= 3; + htab_size_bytes = pteg_count << 7; + } + + htab_hash_mask = pteg_count - 1; + +#ifdef CONFIG_PPC_PSERIES + if (!(systemcfg->platform & PLATFORM_LPAR)) +#endif + if (native_htab_initialize(htab_size_bytes, pteg_count)) + loop_forever(); + + mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX; + + /* On U3 based machines, we need to reserve the DART area and + * _NOT_ map it to avoid cache paradoxes as it's remapped non + * cacheable later on + */ + if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + use_largepages = 1; + + /* create bolted the linear mapping in the hash table */ + for (i = 0; i < lmb.memory.cnt; i++) { + unsigned long base, size; + + base = lmb.memory.region[i].physbase + KERNELBASE; + size = lmb.memory.region[i].size; + + DBG("creating mapping for region: %lx : %lx\n", base, size); + +#ifdef CONFIG_U3_DART + /* Do not map the DART space. Fortunately, it will be aligned + * in such a way that it will not cross two lmb regions and will + * fit within a single 16Mb page. + * The DART space is assumed to be a full 16Mb region even if we + * only use 2Mb of that space. We will use more of it later for + * AGP GART. We have to use a full 16Mb large page. 
+ */ + DBG("DART base: %lx\n", dart_tablebase); + + if (dart_tablebase != 0 && dart_tablebase >= base + && dart_tablebase < (base + size)) { + if (base != dart_tablebase) + create_pte_mapping(base, dart_tablebase, mode_rw, + use_largepages); + if ((base + size) > (dart_tablebase + 16*MB)) + create_pte_mapping(dart_tablebase + 16*MB, base + size, + mode_rw, use_largepages); + continue; + } +#endif /* CONFIG_U3_DART */ + create_pte_mapping(base, base + size, mode_rw, use_largepages); + } + DBG(" <- htab_initialize()\n"); +} +#undef KB +#undef MB diff -ruN linus-bk.new/arch/ppc64/mm/hash_native.c linus-bk-sfr.14.new/arch/ppc64/mm/hash_native.c --- linus-bk.new/arch/ppc64/mm/hash_native.c 2005-01-05 17:06:07.000000000 +1100 +++ linus-bk-sfr.14.new/arch/ppc64/mm/hash_native.c 2005-01-18 18:28:13.000000000 +1100 @@ -22,10 +22,21 @@ #include #include #include +#include +#include + +#ifdef DEBUG +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...) +#endif + +extern unsigned long _SDR1; #define HPTE_LOCK_BIT 3 static spinlock_t native_tlbie_lock = SPIN_LOCK_UNLOCKED; +static HPTE *htab_address; static inline void native_lock_hpte(HPTE *hptep) { @@ -410,6 +421,33 @@ } #endif +int native_htab_initialize(unsigned long htab_size_bytes, + unsigned long pteg_count) +{ + unsigned long table; + + /* Find storage for the HPT. Must be contiguous in + * the absolute address space. 
+ */ + table = lmb_alloc(htab_size_bytes, htab_size_bytes); + + DBG("Hash table allocated at %lx, size: %lx\n", table, htab_size_bytes); + + if (!table) { + ppc64_terminate_msg(0x20, "hpt space"); + return 1; + } + htab_address = abs_to_virt(table); + + /* htab absolute addr + encoded htabsize */ + _SDR1 = table + __ilog2(pteg_count) - 11; + + /* Initialize the HPT with no entries */ + memset((void *)table, 0, htab_size_bytes); + + return 0; +} + void hpte_init_native(void) { ppc_md.hpte_invalidate = native_hpte_invalidate; diff -ruN linus-bk.new/arch/ppc64/mm/hash_utils.c linus-bk-sfr.14.new/arch/ppc64/mm/hash_utils.c --- linus-bk.new/arch/ppc64/mm/hash_utils.c 2005-01-05 17:06:07.000000000 +1100 +++ linus-bk-sfr.14.new/arch/ppc64/mm/hash_utils.c 2005-01-06 14:37:27.000000000 +1100 @@ -17,220 +17,29 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ - -#undef DEBUG - -#include -#include -#include +#include +#include +#include #include -#include -#include -#include -#include -#include -#include +#include +#include +#include +#include #include -#include #include #include #include #include #include -#include #include -#include #include -#include -#include #include -#include -#include -#include #include -#include -#include - -#ifdef DEBUG -#define DBG(fmt...) udbg_printf(fmt) -#else -#define DBG(fmt...) -#endif - -/* - * Note: pte --> Linux PTE - * HPTE --> PowerPC Hashed Page Table Entry - * - * Execution context: - * htab_initialize is called with the MMU off (of course), but - * the kernel has been copied down to zero so it can directly - * reference global data. At this point it is very difficult - * to print debug info. 
- * - */ - -#ifdef CONFIG_U3_DART -extern unsigned long dart_tablebase; -#endif /* CONFIG_U3_DART */ +#include -HPTE *htab_address; unsigned long htab_hash_mask; -extern unsigned long _SDR1; - -#define KB (1024) -#define MB (1024*KB) - -static inline void loop_forever(void) -{ - volatile unsigned long x = 1; - for(;x;x|=1) - ; -} - -#ifdef CONFIG_PPC_MULTIPLATFORM -static inline void create_pte_mapping(unsigned long start, unsigned long end, - unsigned long mode, int large) -{ - unsigned long addr; - unsigned int step; - - if (large) - step = 16*MB; - else - step = 4*KB; - - for (addr = start; addr < end; addr += step) { - unsigned long vpn, hash, hpteg; - unsigned long vsid = get_kernel_vsid(addr); - unsigned long va = (vsid << 28) | (addr & 0xfffffff); - int ret; - - if (large) - vpn = va >> HPAGE_SHIFT; - else - vpn = va >> PAGE_SHIFT; - - hash = hpt_hash(vpn, large); - - hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); - -#ifdef CONFIG_PPC_PSERIES - if (systemcfg->platform & PLATFORM_LPAR) - ret = pSeries_lpar_hpte_insert(hpteg, va, - virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); - else -#endif /* CONFIG_PPC_PSERIES */ - ret = native_hpte_insert(hpteg, va, - virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); - - if (ret == -1) { - ppc64_terminate_msg(0x20, "create_pte_mapping"); - loop_forever(); - } - } -} - -void __init htab_initialize(void) -{ - unsigned long table, htab_size_bytes; - unsigned long pteg_count; - unsigned long mode_rw; - int i, use_largepages = 0; - - DBG(" -> htab_initialize()\n"); - - /* - * Calculate the required size of the htab. We want the number of - * PTEGs to equal one half the number of real pages. - */ - htab_size_bytes = 1UL << ppc64_pft_size; - pteg_count = htab_size_bytes >> 7; - - /* For debug, make the HTAB 1/8 as big as it normally would be. 
*/ - ifppcdebug(PPCDBG_HTABSIZE) { - pteg_count >>= 3; - htab_size_bytes = pteg_count << 7; - } - - htab_hash_mask = pteg_count - 1; - - if (systemcfg->platform & PLATFORM_LPAR) { - /* Using a hypervisor which owns the htab */ - htab_address = NULL; - _SDR1 = 0; - } else { - /* Find storage for the HPT. Must be contiguous in - * the absolute address space. - */ - table = lmb_alloc(htab_size_bytes, htab_size_bytes); - - DBG("Hash table allocated at %lx, size: %lx\n", table, - htab_size_bytes); - - if ( !table ) { - ppc64_terminate_msg(0x20, "hpt space"); - loop_forever(); - } - htab_address = abs_to_virt(table); - - /* htab absolute addr + encoded htabsize */ - _SDR1 = table + __ilog2(pteg_count) - 11; - - /* Initialize the HPT with no entries */ - memset((void *)table, 0, htab_size_bytes); - } - - mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX; - - /* On U3 based machines, we need to reserve the DART area and - * _NOT_ map it to avoid cache paradoxes as it's remapped non - * cacheable later on - */ - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) - use_largepages = 1; - - /* create bolted the linear mapping in the hash table */ - for (i=0; i < lmb.memory.cnt; i++) { - unsigned long base, size; - - base = lmb.memory.region[i].physbase + KERNELBASE; - size = lmb.memory.region[i].size; - - DBG("creating mapping for region: %lx : %lx\n", base, size); - -#ifdef CONFIG_U3_DART - /* Do not map the DART space. Fortunately, it will be aligned - * in such a way that it will not cross two lmb regions and will - * fit within a single 16Mb page. - * The DART space is assumed to be a full 16Mb region even if we - * only use 2Mb of that space. We will use more of it later for - * AGP GART. We have to use a full 16Mb large page. 
- */ - DBG("DART base: %lx\n", dart_tablebase); - - if (dart_tablebase != 0 && dart_tablebase >= base - && dart_tablebase < (base + size)) { - if (base != dart_tablebase) - create_pte_mapping(base, dart_tablebase, mode_rw, - use_largepages); - if ((base + size) > (dart_tablebase + 16*MB)) - create_pte_mapping(dart_tablebase + 16*MB, base + size, - mode_rw, use_largepages); - continue; - } -#endif /* CONFIG_U3_DART */ - create_pte_mapping(base, base + size, mode_rw, use_largepages); - } - DBG(" <- htab_initialize()\n"); -} -#undef KB -#undef MB -#endif /* CONFIG_PPC_MULTIPLATFORM */ - /* * Called by asm hashtable.S for doing lazy icache flush */ diff -ruN linus-bk.new/include/asm-ppc64/mmu.h linus-bk-sfr.14.new/include/asm-ppc64/mmu.h --- linus-bk.new/include/asm-ppc64/mmu.h 2005-01-05 17:06:08.000000000 +1100 +++ linus-bk-sfr.14.new/include/asm-ppc64/mmu.h 2005-01-06 14:36:16.000000000 +1100 @@ -98,7 +98,6 @@ #define PP_RXRX 3 /* Supervisor read, User read */ -extern HPTE * htab_address; extern unsigned long htab_hash_mask; static inline unsigned long hpt_hash(unsigned long vpn, int large) diff -ruN linus-bk.new/include/asm-ppc64/pgtable.h linus-bk-sfr.14.new/include/asm-ppc64/pgtable.h --- linus-bk.new/include/asm-ppc64/pgtable.h 2005-01-02 12:05:23.000000000 +1100 +++ linus-bk-sfr.14.new/include/asm-ppc64/pgtable.h 2005-01-18 17:37:43.000000000 +1100 @@ -523,6 +523,9 @@ extern long native_hpte_insert(unsigned long hpte_group, unsigned long va, unsigned long prpn, int secondary, unsigned long hpteflags, int bolted, int large); +extern int native_htab_initialize(unsigned long htab_size_bytes, + unsigned long pteg_count); + /* * find_linux_pte returns the address of a linux pte for a given -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050119/49581523/attachment.pgp From wangzyu at cn.ibm.com Wed Jan 19 15:50:39 2005 From: wangzyu at cn.ibm.com (Zhao Yu Wang) Date: Wed, 19 Jan 2005 12:50:39 +0800 Subject: question about LMB's size In-Reply-To: Message-ID: Hi Will, Thanks >> Hi, >> This is a question about the difference in memory size between lpar and HMC. >>... >> 2. In lpar didolp2: We get a memory size of 2174672KB. >> [root at didolp2 ~]# cat /proc/meminfo >> MemTotal: 2174672 kB >> >> The question is: 2174672/(32*1024) = 66.36572265625 >MemTotal is the amount of free memory in the partition, which does not >include the memory that holds the kernel code, (bss, data, init). >There should be a few other pieces of data that will add up to the numbers >you are looking for. >in early boot messages, there is a line "SystemCfg->physicalMemorySize = >0x.......". This value should be precisely what you are trying to >measure. >A bit later in the logs, you can also see a line >"Memory: XXXXk/YYYYk available (###k kernel code, ###k reserved, ###k data, >###k bss, ###k init). >the YYYYk should also match what you are looking for. If the system was booted several days ago, the boot log is no longer available. Is there any other way to get the physical memory size from within the lpar? Could the OS provide a way to obtain the real memory size? That would help with dynamically reassigning resources among several partitions according to load. Thanks & Best regards, -------------------------------------------- Wang Zhaoyu Email: wangzyu at cn.ibm.com Notes: Zhao Yu Wang/China/Contr/IBM at IBMCN -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050119/f183befb/attachment.htm From sfr at canb.auug.org.au Wed Jan 19 16:47:53 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 19 Jan 2005 16:47:53 +1100 Subject: [PATCH] PPC64: remove some unused iSeries functions Message-ID: <20050119164753.5af63cc5.sfr@canb.auug.org.au> Hi Linus, Andrew, This patch removes some unused stuff from PPC64 iSeries: - asm-ppc64/iSeries/iSeries_VpdInfo.h - iSeries_GetLocationData() - LocationData structure - device_Location() Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk/arch/ppc64/kernel/iSeries_VpdInfo.c linus-bk-sfr.16/arch/ppc64/kernel/iSeries_VpdInfo.c --- linus-bk/arch/ppc64/kernel/iSeries_VpdInfo.c 2004-04-01 06:59:36.000000000 +1000 +++ linus-bk-sfr.16/arch/ppc64/kernel/iSeries_VpdInfo.c 2005-01-19 16:36:40.000000000 +1100 @@ -36,7 +36,6 @@ #include #include #include -//#include #include #include "pci.h" @@ -85,30 +84,6 @@ #define SLOT_ENTRY_SIZE 16 /* - * Bus, Card, Board, FrameId, CardLocation. - */ -LocationData* iSeries_GetLocationData(struct pci_dev *PciDev) -{ - struct iSeries_Device_Node *DevNode = - (struct iSeries_Device_Node *)PciDev->sysdata; - LocationData *LocationPtr = - (LocationData *)kmalloc(LOCATION_DATA_SIZE, GFP_KERNEL); - - if (LocationPtr == NULL) { - printk("PCI: LocationData area allocation failed!\n"); - return NULL; - } - memset(LocationPtr, 0, LOCATION_DATA_SIZE); - LocationPtr->Bus = ISERIES_BUS(DevNode); - LocationPtr->Board = DevNode->Board; - LocationPtr->FrameId = DevNode->FrameId; - LocationPtr->Card = PCI_SLOT(DevNode->DevFn); - strcpy(&LocationPtr->CardLocation[0], &DevNode->CardLocation[0]); - return LocationPtr; -} -EXPORT_SYMBOL(iSeries_GetLocationData); - -/* * Formats the device information. * - Pass in pci_dev* pointer to the device. * - Pass in buffer to place the data. 
Danger here is the buffer must @@ -149,18 +124,6 @@ } /* - * Build a character string of the device location, Frame 1, Card C10 - */ -int device_Location(struct pci_dev *PciDev, char *BufPtr) -{ - struct iSeries_Device_Node *DevNode = - (struct iSeries_Device_Node *)PciDev->sysdata; - return sprintf(BufPtr, "PCI: Bus%3d, AgentId%3d, Vendor %04X, Location %s", - DevNode->DsaAddr.Dsa.busNumber, DevNode->AgentId, - DevNode->Vendor, DevNode->Location); -} - -/* * Parse the Slot Area */ void iSeries_Parse_SlotArea(SlotMap *MapPtr, int MapLen, diff -ruN linus-bk/include/asm-ppc64/iSeries/iSeries_VpdInfo.h linus-bk-sfr.16/include/asm-ppc64/iSeries/iSeries_VpdInfo.h --- linus-bk/include/asm-ppc64/iSeries/iSeries_VpdInfo.h 2002-02-14 23:14:36.000000000 +1100 +++ linus-bk-sfr.16/include/asm-ppc64/iSeries/iSeries_VpdInfo.h 1970-01-01 10:00:00.000000000 +1000 @@ -1,56 +0,0 @@ -#ifndef _ISERIES_VPDINFO_H -#define _ISERIES_VPDINFO_H -/************************************************************************/ -/* File iSeries_VpdInfo.h created by Allan Trautman Feb 08 2001. */ -/************************************************************************/ -/* This code supports the location data fon on the IBM iSeries systems. */ -/* Copyright (C) 20yy */ -/* */ -/* This program is free software; you can redistribute it and/or modify */ -/* it under the terms of the GNU General Public License as published by */ -/* the Free Software Foundation; either version 2 of the License, or */ -/* (at your option) any later version. */ -/* */ -/* This program is distributed in the hope that it will be useful, */ -/* but WITHOUT ANY WARRANTY; without even the implied warranty of */ -/* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the */ -/* GNU General Public License for more details. 
*/ -/* */ -/* You should have received a copy of the GNU General Public License */ -/* along with this program; if not, write to the: */ -/* Free Software Foundation, Inc., */ -/* 59 Temple Place, Suite 330, */ -/* Boston, MA 02111-1307 USA */ -/************************************************************************/ -/* Change Activity: */ -/* Created, Feg 8, 2001 */ -/* Reformated for Card, March 8, 2001 */ -/* End Change Activity */ -/************************************************************************/ - -struct pci_dev; /* Forward Declare */ -/************************************************************************/ -/* Location Data extracted from the VPD list and device info. */ -/************************************************************************/ -struct LocationDataStruct { /* Location data structure for device */ - u16 Bus; /* iSeries Bus Number 0x00*/ - u16 Board; /* iSeries Board 0x02*/ - u8 FrameId; /* iSeries spcn Frame Id 0x04*/ - u8 PhbId; /* iSeries Phb Location 0x05*/ - u16 Card; /* iSeries Card Slot 0x06*/ - char CardLocation[4]; /* Char format of planar vpd 0x08*/ - u8 AgentId; /* iSeries AgentId 0x0C*/ - u8 SecondaryAgentId; /* iSeries Secondary Agent Id 0x0D*/ - u8 LinuxBus; /* Linux Bus Number 0x0E*/ - u8 LinuxDevFn; /* Linux Device Function 0x0F*/ -}; -typedef struct LocationDataStruct LocationData; -#define LOCATION_DATA_SIZE 16 - -/************************************************************************/ -/* Protypes */ -/************************************************************************/ -extern LocationData* iSeries_GetLocationData(struct pci_dev* PciDev); -extern int iSeries_Device_Information(struct pci_dev*,char*, int); - -#endif /* _ISERIES_VPDINFO_H */ diff -ruN linus-bk/include/asm-ppc64/iSeries/iSeries_pci.h linus-bk-sfr.16/include/asm-ppc64/iSeries/iSeries_pci.h --- linus-bk/include/asm-ppc64/iSeries/iSeries_pci.h 2004-04-01 06:59:37.000000000 +1000 +++ linus-bk-sfr.16/include/asm-ppc64/iSeries/iSeries_pci.h 
2005-01-19 16:33:01.000000000 +1100 @@ -102,27 +102,9 @@ }; /************************************************************************/ -/* Location Data extracted from the VPD list and device info. */ -/************************************************************************/ - -struct LocationDataStruct { /* Location data structure for device */ - u16 Bus; /* iSeries Bus Number 0x00*/ - u16 Board; /* iSeries Board 0x02*/ - u8 FrameId; /* iSeries spcn Frame Id 0x04*/ - u8 PhbId; /* iSeries Phb Location 0x05*/ - u8 AgentId; /* iSeries AgentId 0x06*/ - u8 Card; - char CardLocation[4]; -}; - -typedef struct LocationDataStruct LocationData; -#define LOCATION_DATA_SIZE 48 - -/************************************************************************/ /* Functions */ /************************************************************************/ -extern LocationData* iSeries_GetLocationData(struct pci_dev* PciDev); extern int iSeries_Device_Information(struct pci_dev*,char*, int); extern void iSeries_Get_Location_Code(struct iSeries_Device_Node*); extern int iSeries_Device_ToggleReset(struct pci_dev* PciDev, int AssertTime, int DelayTime); -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050119/cd8071d1/attachment.pgp From paulus at samba.org Wed Jan 19 17:06:05 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 19 Jan 2005 17:06:05 +1100 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <20050117201415.GA11505@austin.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> <20050117201415.GA11505@austin.ibm.com> Message-ID: <16877.63693.915740.385920@cargo.ozlabs.ibm.com> Linas Vepstas writes: > p.s. It was not clear to me if the EEH patch previously sent > (6 January 2005, same subject line) will be wending its way into > the main Torvalds kernel tree, or not. 
I hadn't really gotten > confirmation one way or another. I'm not really totally happy with it yet, on a number of fronts: 1. You're adding more PCI-specific stuff to the device_node struct, which I don't like. I would prefer that the device_node tree contains basically just what we get from OF, and that we have a separate struct for storing ppc64-specific information for each PCI device. Fixing that is outside the scope of your patch, though. 2. I don't see why the device nodes for the PCI subtree being reset would go away, and thus I don't see the need for your eeh_cfg_tree struct. 3. Is there a good reason why we can't use the assigned-addresses property on the relevant device tree nodes to tell us what to set the BARs to? 4. I think the 5 second sleep is quite bogus, and shows that we have the flow of control wrong. In particular I think it should be a userland write to a sysfs file that kicks off the restart process rather than it just happening after 5 seconds. Anyway, what process or thread is executing that 5 second sleep? Is it keventd or something? 5. AFAICS userland will get an unplug notification for the device, but nothing to indicate that is due to an EEH slot isolation event. I think userland should be told about EEH events. Regards, Paul. From mingo at elte.hu Wed Jan 19 18:44:04 2005 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 19 Jan 2005 08:44:04 +0100 Subject: [patch] spin-yield-2.6.11-rc1-A1 In-Reply-To: <20050117124209.GA20796@elte.hu> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> <16873.55739.214904.473407@cargo.ozlabs.ibm.com> <20050117124209.GA20796@elte.hu> Message-ID: <20050119074404.GA26768@elte.hu> * Ingo Molnar wrote: > ok - how about the (raw) patch below? (ontop of BK plus the latest > spin-nicer patch i sent earlier.) It builds/boots on x86 but is > untested on ppc64. > > the idea is to make spin_yield() a generic function, with some related > namespace cleanups. wrong patch... 
Full patch against BK-curr attached. Ingo Signed-off-by: Ingo Molnar --- linux/kernel/exit.c.orig +++ linux/kernel/exit.c @@ -861,8 +861,12 @@ task_t fastcall *next_thread(const task_ #ifdef CONFIG_SMP if (!p->sighand) BUG(); +#ifndef write_is_locked +# warning please implement read_is_locked()/write_is_locked()! +# define write_is_locked rwlock_is_locked +#endif if (!spin_is_locked(&p->sighand->siglock) && - !rwlock_is_locked(&tasklist_lock)) + !write_is_locked(&tasklist_lock)) BUG(); #endif return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID); --- linux/kernel/spinlock.c.orig +++ linux/kernel/spinlock.c @@ -173,8 +173,8 @@ EXPORT_SYMBOL(_write_lock); * (We do this in a function because inlining it would be excessive.) */ -#define BUILD_LOCK_OPS(op, locktype, is_locked_fn) \ -void __lockfunc _##op##_lock(locktype *lock) \ +#define BUILD_LOCK_OPS(op, locktype) \ +void __lockfunc _##op##_lock(locktype##_t *lock) \ { \ preempt_disable(); \ for (;;) { \ @@ -183,15 +183,15 @@ void __lockfunc _##op##_lock(locktype *l preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - while (is_locked_fn(lock) && (lock)->break_lock) \ - cpu_relax(); \ + while (op##_is_locked(lock) && (lock)->break_lock) \ + locktype##_yield(lock); \ preempt_disable(); \ } \ } \ \ EXPORT_SYMBOL(_##op##_lock); \ \ -unsigned long __lockfunc _##op##_lock_irqsave(locktype *lock) \ +unsigned long __lockfunc _##op##_lock_irqsave(locktype##_t *lock) \ { \ unsigned long flags; \ \ @@ -205,8 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - while (is_locked_fn(lock) && (lock)->break_lock) \ - cpu_relax(); \ + while (op##_is_locked(lock) && (lock)->break_lock) \ + locktype##_yield(lock); \ preempt_disable(); \ } \ return flags; \ @@ -214,14 +214,14 @@ unsigned long __lockfunc _##op##_lock_ir \ EXPORT_SYMBOL(_##op##_lock_irqsave); \ \ -void __lockfunc _##op##_lock_irq(locktype *lock) \ +void __lockfunc 
_##op##_lock_irq(locktype##_t *lock) \ { \ _##op##_lock_irqsave(lock); \ } \ \ EXPORT_SYMBOL(_##op##_lock_irq); \ \ -void __lockfunc _##op##_lock_bh(locktype *lock) \ +void __lockfunc _##op##_lock_bh(locktype##_t *lock) \ { \ unsigned long flags; \ \ @@ -246,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh) * _[spin|read|write]_lock_irqsave() * _[spin|read|write]_lock_bh() */ -BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); -BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); -BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); +BUILD_LOCK_OPS(spin, spinlock); +BUILD_LOCK_OPS(read, rwlock); +BUILD_LOCK_OPS(write, rwlock); #endif /* CONFIG_PREEMPT */ --- linux/arch/ppc64/lib/locks.c.orig +++ linux/arch/ppc64/lib/locks.c @@ -23,7 +23,7 @@ /* waiting for a spinlock... */ #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES) -void __spin_yield(spinlock_t *lock) +void spinlock_yield(spinlock_t *lock) { unsigned int lock_value, holder_cpu, yield_count; struct paca_struct *holder_paca; @@ -54,7 +54,7 @@ void __spin_yield(spinlock_t *lock) * This turns out to be the same for read and write locks, since * we only know the holder if it is write-locked. 
*/ -void __rw_yield(rwlock_t *rw) +void rwlock_yield(rwlock_t *rw) { int lock_value; unsigned int holder_cpu, yield_count; @@ -87,7 +87,7 @@ void spin_unlock_wait(spinlock_t *lock) while (lock->lock) { HMT_low(); if (SHARED_PROCESSOR) - __spin_yield(lock); + spinlock_yield(lock); } HMT_medium(); } --- linux/include/asm-ia64/spinlock.h.orig +++ linux/include/asm-ia64/spinlock.h @@ -17,6 +17,8 @@ #include #include +#include + typedef struct { volatile unsigned int lock; #ifdef CONFIG_PREEMPT --- linux/include/asm-generic/spinlock.h.orig +++ linux/include/asm-generic/spinlock.h @@ -0,0 +1,11 @@ +#ifndef _ASM_GENERIC_SPINLOCK_H +#define _ASM_GENERIC_SPINLOCK_H + +/* + * Virtual platforms might use these to + * yield to specific virtual CPUs: + */ +#define spinlock_yield(lock) cpu_relax() +#define rwlock_yield(lock) cpu_relax() + +#endif /* _ASM_GENERIC_SPINLOCK_H */ --- linux/include/linux/spinlock.h.orig +++ linux/include/linux/spinlock.h @@ -202,10 +202,12 @@ typedef struct { #define _raw_spin_lock(lock) do { (void)(lock); } while(0) #define spin_is_locked(lock) ((void)(lock), 0) #define _raw_spin_trylock(lock) (((void)(lock), 1)) -#define spin_unlock_wait(lock) (void)(lock); +#define spin_unlock_wait(lock) (void)(lock) #define _raw_spin_unlock(lock) do { (void)(lock); } while(0) #endif /* CONFIG_DEBUG_SPINLOCK */ +#define spinlock_yield(lock) (void)(lock) + /* RW spinlocks: No debug version */ #if (__GNUC__ > 2) @@ -224,6 +226,8 @@ typedef struct { #define _raw_read_trylock(lock) ({ (void)(lock); (1); }) #define _raw_write_trylock(lock) ({ (void)(lock); (1); }) +#define rwlock_yield(lock) (void)(lock) + #define _spin_trylock(lock) ({preempt_disable(); _raw_spin_trylock(lock) ? 
\ 1 : ({preempt_enable(); 0;});}) --- linux/include/asm-i386/spinlock.h.orig +++ linux/include/asm-i386/spinlock.h @@ -15,7 +15,7 @@ asmlinkage int printk(const char * fmt, */ typedef struct { - volatile unsigned int lock; + volatile unsigned int slock; #ifdef CONFIG_DEBUG_SPINLOCK unsigned magic; #endif @@ -43,7 +43,7 @@ typedef struct { * We make no fairness assumptions. They have a cost. */ -#define spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0) +#define spin_is_locked(x) (*(volatile signed char *)(&(x)->slock) <= 0) #define spin_unlock_wait(x) do { barrier(); } while(spin_is_locked(x)) #define spin_lock_string \ @@ -83,7 +83,7 @@ typedef struct { #define spin_unlock_string \ "movb $1,%0" \ - :"=m" (lock->lock) : : "memory" + :"=m" (lock->slock) : : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -101,7 +101,7 @@ static inline void _raw_spin_unlock(spin #define spin_unlock_string \ "xchgb %b0, %1" \ - :"=q" (oldval), "=m" (lock->lock) \ + :"=q" (oldval), "=m" (lock->slock) \ :"0" (oldval) : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -123,7 +123,7 @@ static inline int _raw_spin_trylock(spin char oldval; __asm__ __volatile__( "xchgb %b0,%1" - :"=q" (oldval), "=m" (lock->lock) + :"=q" (oldval), "=m" (lock->slock) :"0" (0) : "memory"); return oldval > 0; } @@ -138,7 +138,7 @@ static inline void _raw_spin_lock(spinlo #endif __asm__ __volatile__( spin_lock_string - :"=m" (lock->lock) : : "memory"); + :"=m" (lock->slock) : : "memory"); } static inline void _raw_spin_lock_flags (spinlock_t *lock, unsigned long flags) @@ -151,7 +151,7 @@ static inline void _raw_spin_lock_flags #endif __asm__ __volatile__( spin_lock_string_flags - :"=m" (lock->lock) : "r" (flags) : "memory"); + :"=m" (lock->slock) : "r" (flags) : "memory"); } /* @@ -186,7 +186,17 @@ typedef struct { #define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0) -#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS) +/** + * read_is_locked - would 
read_trylock() fail? + * @lock: the rwlock in question. + */ +#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0) + +/** + * write_is_locked - would write_trylock() fail? + * @lock: the rwlock in question. + */ +#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS) /* * On x86, we implement read-write locks as a 32-bit counter --- linux/include/asm-ppc64/spinlock.h.orig +++ linux/include/asm-ppc64/spinlock.h @@ -64,11 +64,11 @@ static __inline__ void _raw_spin_unlock( #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES) /* We only yield to the hypervisor if we are in shared processor mode */ #define SHARED_PROCESSOR (get_paca()->lppaca.shared_proc) -extern void __spin_yield(spinlock_t *lock); -extern void __rw_yield(rwlock_t *lock); +extern void spinlock_yield(spinlock_t *lock); +extern void rwlock_yield(rwlock_t *lock); #else /* SPLPAR || ISERIES */ -#define __spin_yield(x) barrier() -#define __rw_yield(x) barrier() +#define spinlock_yield(x) barrier() +#define rwlock_yield(x) barrier() #define SHARED_PROCESSOR 0 #endif extern void spin_unlock_wait(spinlock_t *lock); @@ -109,7 +109,7 @@ static void __inline__ _raw_spin_lock(sp do { HMT_low(); if (SHARED_PROCESSOR) - __spin_yield(lock); + spinlock_yield(lock); } while (likely(lock->lock != 0)); HMT_medium(); } @@ -127,7 +127,7 @@ static void __inline__ _raw_spin_lock_fl do { HMT_low(); if (SHARED_PROCESSOR) - __spin_yield(lock); + spinlock_yield(lock); } while (likely(lock->lock != 0)); HMT_medium(); local_irq_restore(flags_dis); @@ -201,7 +201,7 @@ static void __inline__ _raw_read_lock(rw do { HMT_low(); if (SHARED_PROCESSOR) - __rw_yield(rw); + rwlock_yield(rw); } while (likely(rw->lock < 0)); HMT_medium(); } @@ -258,7 +258,7 @@ static void __inline__ _raw_write_lock(r do { HMT_low(); if (SHARED_PROCESSOR) - __rw_yield(rw); + rwlock_yield(rw); } while (likely(rw->lock != 0)); HMT_medium(); } --- linux/include/asm-x86_64/spinlock.h.orig +++ linux/include/asm-x86_64/spinlock.h @@ -6,6 
+6,8 @@ #include #include +#include + extern int printk(const char * fmt, ...) __attribute__ ((format (printf, 1, 2))); From mingo at elte.hu Wed Jan 19 18:55:00 2005 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 19 Jan 2005 08:55:00 +0100 Subject: [patch] spin-yield-2.6.11-rc1-A1 In-Reply-To: <20050119074404.GA26768@elte.hu> References: <16870.20205.389208.213989@cargo.ozlabs.ibm.com> <20050115142537.GD10114@elte.hu> <16873.55739.214904.473407@cargo.ozlabs.ibm.com> <20050117124209.GA20796@elte.hu> <20050119074404.GA26768@elte.hu> Message-ID: <20050119075500.GA26880@elte.hu> * Ingo Molnar wrote: > > ok - how about the (raw) patch below? (ontop of BK plus the latest > > spin-nicer patch i sent earlier.) It builds/boots on x86 but is > > untested on ppc64. > > > > the idea is to make spin_yield() a generic function, with some related > > namespace cleanups. > > wrong patch... Full patch against BK-curr attached. the one below builds/boots as well ... Ingo Signed-off-by: Ingo Molnar --- linux/kernel/exit.c.orig +++ linux/kernel/exit.c @@ -861,8 +861,12 @@ task_t fastcall *next_thread(const task_ #ifdef CONFIG_SMP if (!p->sighand) BUG(); +#ifndef write_is_locked +# warning please implement read_is_locked()/write_is_locked()! +# define write_is_locked rwlock_is_locked +#endif if (!spin_is_locked(&p->sighand->siglock) && - !rwlock_is_locked(&tasklist_lock)) + !write_is_locked(&tasklist_lock)) BUG(); #endif return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID); --- linux/kernel/spinlock.c.orig +++ linux/kernel/spinlock.c @@ -173,8 +173,8 @@ EXPORT_SYMBOL(_write_lock); * (We do this in a function because inlining it would be excessive.) 
*/ -#define BUILD_LOCK_OPS(op, locktype, is_locked_fn) \ -void __lockfunc _##op##_lock(locktype *lock) \ +#define BUILD_LOCK_OPS(op, locktype) \ +void __lockfunc _##op##_lock(locktype##_t *lock) \ { \ preempt_disable(); \ for (;;) { \ @@ -183,15 +183,15 @@ void __lockfunc _##op##_lock(locktype *l preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - while (is_locked_fn(lock) && (lock)->break_lock) \ - cpu_relax(); \ + while (op##_is_locked(lock) && (lock)->break_lock) \ + locktype##_yield(lock); \ preempt_disable(); \ } \ } \ \ EXPORT_SYMBOL(_##op##_lock); \ \ -unsigned long __lockfunc _##op##_lock_irqsave(locktype *lock) \ +unsigned long __lockfunc _##op##_lock_irqsave(locktype##_t *lock) \ { \ unsigned long flags; \ \ @@ -205,8 +205,8 @@ unsigned long __lockfunc _##op##_lock_ir preempt_enable(); \ if (!(lock)->break_lock) \ (lock)->break_lock = 1; \ - while (is_locked_fn(lock) && (lock)->break_lock) \ - cpu_relax(); \ + while (op##_is_locked(lock) && (lock)->break_lock) \ + locktype##_yield(lock); \ preempt_disable(); \ } \ return flags; \ @@ -214,14 +214,14 @@ unsigned long __lockfunc _##op##_lock_ir \ EXPORT_SYMBOL(_##op##_lock_irqsave); \ \ -void __lockfunc _##op##_lock_irq(locktype *lock) \ +void __lockfunc _##op##_lock_irq(locktype##_t *lock) \ { \ _##op##_lock_irqsave(lock); \ } \ \ EXPORT_SYMBOL(_##op##_lock_irq); \ \ -void __lockfunc _##op##_lock_bh(locktype *lock) \ +void __lockfunc _##op##_lock_bh(locktype##_t *lock) \ { \ unsigned long flags; \ \ @@ -246,9 +246,9 @@ EXPORT_SYMBOL(_##op##_lock_bh) * _[spin|read|write]_lock_irqsave() * _[spin|read|write]_lock_bh() */ -BUILD_LOCK_OPS(spin, spinlock_t, spin_is_locked); -BUILD_LOCK_OPS(read, rwlock_t, rwlock_is_locked); -BUILD_LOCK_OPS(write, rwlock_t, spin_is_locked); +BUILD_LOCK_OPS(spin, spinlock); +BUILD_LOCK_OPS(read, rwlock); +BUILD_LOCK_OPS(write, rwlock); #endif /* CONFIG_PREEMPT */ --- linux/arch/ppc64/lib/locks.c.orig +++ linux/arch/ppc64/lib/locks.c @@ -23,7 +23,7 @@ /* 
waiting for a spinlock... */ #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES) -void __spin_yield(spinlock_t *lock) +void spinlock_yield(spinlock_t *lock) { unsigned int lock_value, holder_cpu, yield_count; struct paca_struct *holder_paca; @@ -54,7 +54,7 @@ void __spin_yield(spinlock_t *lock) * This turns out to be the same for read and write locks, since * we only know the holder if it is write-locked. */ -void __rw_yield(rwlock_t *rw) +void rwlock_yield(rwlock_t *rw) { int lock_value; unsigned int holder_cpu, yield_count; @@ -87,7 +87,7 @@ void spin_unlock_wait(spinlock_t *lock) while (lock->lock) { HMT_low(); if (SHARED_PROCESSOR) - __spin_yield(lock); + spinlock_yield(lock); } HMT_medium(); } --- linux/include/asm-ia64/spinlock.h.orig +++ linux/include/asm-ia64/spinlock.h @@ -17,6 +17,8 @@ #include #include +#include + typedef struct { volatile unsigned int lock; #ifdef CONFIG_PREEMPT --- linux/include/asm-generic/spinlock.h.orig +++ linux/include/asm-generic/spinlock.h @@ -0,0 +1,11 @@ +#ifndef _ASM_GENERIC_SPINLOCK_H +#define _ASM_GENERIC_SPINLOCK_H + +/* + * Virtual platforms might use these to + * yield to specific virtual CPUs: + */ +#define spinlock_yield(lock) cpu_relax() +#define rwlock_yield(lock) cpu_relax() + +#endif /* _ASM_GENERIC_SPINLOCK_H */ --- linux/include/linux/spinlock.h.orig +++ linux/include/linux/spinlock.h @@ -202,10 +202,12 @@ typedef struct { #define _raw_spin_lock(lock) do { (void)(lock); } while(0) #define spin_is_locked(lock) ((void)(lock), 0) #define _raw_spin_trylock(lock) (((void)(lock), 1)) -#define spin_unlock_wait(lock) (void)(lock); +#define spin_unlock_wait(lock) (void)(lock) #define _raw_spin_unlock(lock) do { (void)(lock); } while(0) #endif /* CONFIG_DEBUG_SPINLOCK */ +#define spinlock_yield(lock) (void)(lock) + /* RW spinlocks: No debug version */ #if (__GNUC__ > 2) @@ -224,6 +226,8 @@ typedef struct { #define _raw_read_trylock(lock) ({ (void)(lock); (1); }) #define _raw_write_trylock(lock) ({ (void)(lock); 
(1); }) +#define rwlock_yield(lock) (void)(lock) + #define _spin_trylock(lock) ({preempt_disable(); _raw_spin_trylock(lock) ? \ 1 : ({preempt_enable(); 0;});}) --- linux/include/asm-i386/spinlock.h.orig +++ linux/include/asm-i386/spinlock.h @@ -7,6 +7,8 @@ #include #include +#include + asmlinkage int printk(const char * fmt, ...) __attribute__ ((format (printf, 1, 2))); @@ -15,7 +17,7 @@ asmlinkage int printk(const char * fmt, */ typedef struct { - volatile unsigned int lock; + volatile unsigned int slock; #ifdef CONFIG_DEBUG_SPINLOCK unsigned magic; #endif @@ -43,7 +45,7 @@ typedef struct { * We make no fairness assumptions. They have a cost. */ -#define spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0) +#define spin_is_locked(x) (*(volatile signed char *)(&(x)->slock) <= 0) #define spin_unlock_wait(x) do { barrier(); } while(spin_is_locked(x)) #define spin_lock_string \ @@ -83,7 +85,7 @@ typedef struct { #define spin_unlock_string \ "movb $1,%0" \ - :"=m" (lock->lock) : : "memory" + :"=m" (lock->slock) : : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -101,7 +103,7 @@ static inline void _raw_spin_unlock(spin #define spin_unlock_string \ "xchgb %b0, %1" \ - :"=q" (oldval), "=m" (lock->lock) \ + :"=q" (oldval), "=m" (lock->slock) \ :"0" (oldval) : "memory" static inline void _raw_spin_unlock(spinlock_t *lock) @@ -123,7 +125,7 @@ static inline int _raw_spin_trylock(spin char oldval; __asm__ __volatile__( "xchgb %b0,%1" - :"=q" (oldval), "=m" (lock->lock) + :"=q" (oldval), "=m" (lock->slock) :"0" (0) : "memory"); return oldval > 0; } @@ -138,7 +140,7 @@ static inline void _raw_spin_lock(spinlo #endif __asm__ __volatile__( spin_lock_string - :"=m" (lock->lock) : : "memory"); + :"=m" (lock->slock) : : "memory"); } static inline void _raw_spin_lock_flags (spinlock_t *lock, unsigned long flags) @@ -151,7 +153,7 @@ static inline void _raw_spin_lock_flags #endif __asm__ __volatile__( spin_lock_string_flags - :"=m" (lock->lock) : "r" (flags) 
: "memory"); + :"=m" (lock->slock) : "r" (flags) : "memory"); } /* @@ -186,7 +188,17 @@ typedef struct { #define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0) -#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS) +/** + * read_is_locked - would read_trylock() fail? + * @lock: the rwlock in question. + */ +#define read_is_locked(x) (atomic_read((atomic_t *)&(x)->lock) <= 0) + +/** + * write_is_locked - would write_trylock() fail? + * @lock: the rwlock in question. + */ +#define write_is_locked(x) ((x)->lock != RW_LOCK_BIAS) /* * On x86, we implement read-write locks as a 32-bit counter --- linux/include/asm-ppc64/spinlock.h.orig +++ linux/include/asm-ppc64/spinlock.h @@ -64,11 +64,11 @@ static __inline__ void _raw_spin_unlock( #if defined(CONFIG_PPC_SPLPAR) || defined(CONFIG_PPC_ISERIES) /* We only yield to the hypervisor if we are in shared processor mode */ #define SHARED_PROCESSOR (get_paca()->lppaca.shared_proc) -extern void __spin_yield(spinlock_t *lock); -extern void __rw_yield(rwlock_t *lock); +extern void spinlock_yield(spinlock_t *lock); +extern void rwlock_yield(rwlock_t *lock); #else /* SPLPAR || ISERIES */ -#define __spin_yield(x) barrier() -#define __rw_yield(x) barrier() +#define spinlock_yield(x) barrier() +#define rwlock_yield(x) barrier() #define SHARED_PROCESSOR 0 #endif extern void spin_unlock_wait(spinlock_t *lock); @@ -109,7 +109,7 @@ static void __inline__ _raw_spin_lock(sp do { HMT_low(); if (SHARED_PROCESSOR) - __spin_yield(lock); + spinlock_yield(lock); } while (likely(lock->lock != 0)); HMT_medium(); } @@ -127,7 +127,7 @@ static void __inline__ _raw_spin_lock_fl do { HMT_low(); if (SHARED_PROCESSOR) - __spin_yield(lock); + spinlock_yield(lock); } while (likely(lock->lock != 0)); HMT_medium(); local_irq_restore(flags_dis); @@ -201,7 +201,7 @@ static void __inline__ _raw_read_lock(rw do { HMT_low(); if (SHARED_PROCESSOR) - __rw_yield(rw); + rwlock_yield(rw); } while (likely(rw->lock < 0)); HMT_medium(); } @@ -258,7 +258,7 @@ 
static void __inline__ _raw_write_lock(r do { HMT_low(); if (SHARED_PROCESSOR) - __rw_yield(rw); + rwlock_yield(rw); } while (likely(rw->lock != 0)); HMT_medium(); } --- linux/include/asm-x86_64/spinlock.h.orig +++ linux/include/asm-x86_64/spinlock.h @@ -6,6 +6,8 @@ #include #include +#include + extern int printk(const char * fmt, ...) __attribute__ ((format (printf, 1, 2))); From nfont at austin.ibm.com Thu Jan 20 03:00:22 2005 From: nfont at austin.ibm.com (Nathan Fontenot) Date: Wed, 19 Jan 2005 10:00:22 -0600 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <16877.63693.915740.385920@cargo.ozlabs.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> <20050117201415.GA11505@austin.ibm.com> <16877.63693.915740.385920@cargo.ozlabs.ibm.com> Message-ID: <41EE8416.502@austin.ibm.com> Paul Mackerras wrote: > 5. AFAICS userland will get an unplug notification for the device, but > nothing to indicate that is due to an EEH slot isolation event. I > think userland should be told about EEH events. > Currently there is a way for userland to determine if a hotplug event they receive is due to an EEH slot isolation event. It's not very pretty and requires the rtas_errd daemon to be running. The RTAS event generated from the EEH event is logged to /var/log/platform by rtas_errd. Userland scripts would have to search the file for a recent EEH event matching their device to make this determination. This isn't as nice as a direct notification but is what we have at this point. -- Nathan Fontenot From willschm at us.ibm.com Thu Jan 20 08:50:20 2005 From: willschm at us.ibm.com (Will Schmidt) Date: Wed, 19 Jan 2005 15:50:20 -0600 Subject: question about LMB's size In-Reply-To: Message-ID: Hi, Zhao Yu Wang wrote on 01/18/2005 10:50:39 PM: > Hi,Will > Thanks > > >in early boot messages, there is a line "SystemCfg->physicalMemorySize = > >0x.......". This value should be precisely what you are trying to > >measure. 
> > >A bit later in the logs, you can also see a line > >"Memory: XXXXk/YYYYk available (###k kernel code, ###k reserved, ###k data, > >###k bss, ###k init). > >the YYYYk should also match what you are looking for. > > If the system boot up several days before, the boot log is not > available at this time. Whether there has any other method to get > the physical memory from lpar. You should be able to find a copy of the early boot log somewhere in /var/log, either /var/log/boot.msg or /var/log/dmesg. Depending on the distro or kernel level, the assortment of files in /var/log varies. > > Could the OS provide a method to obtain the real memory. It will > help to dynamic reassign resource according by the load between > several partition. My recommendation is that you stick with the values that are reported via the HMC commands. There might be an RMC command on the Linux side that can obtain the value from the HMC, but I am not positive of that. > > > Thanks & Best regards, > > -------------------------------------------- > Wang Zhaoyu > > Email: wangzyu at cn.ibm.com > Notes: Zhao Yu Wang/China/Contr/IBM at IBMCN -Will From benh at kernel.crashing.org Thu Jan 20 18:33:09 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 20 Jan 2005 18:33:09 +1100 Subject: ppc64 vDSO update Message-ID: <1106206389.5294.82.camel@gaston> Latest update for the ppc64 vDSO. Yesterday's patch had build issues (some bits were missing from the patch file). This also fixes a time management issue and incorrect eh_frame_hdr sections. http://gate.crashing.org/~benh/ppc64-vdso-20050120.diff Will be sent upstream soon. Ben.
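Returning to Will Schmidt's suggestion in the "question about LMB's size" thread: once a saved copy of the boot log is found, the partition's total memory can be recovered from the "Memory: XXXXk/YYYYk available (...)" line. The C sketch below is illustrative only. It assumes the line has exactly the shape Will quotes (the real message may differ between kernel versions), and the helper names parse_memory_line() and find_total_memory_kb() are hypothetical, not part of any existing tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse a boot-log line shaped like
 *   "Memory: XXXXk/YYYYk available (...)"
 * (format assumed from the description in this thread; it may differ
 * between kernel versions). Stores XXXX in *avail_kb and YYYY in
 * *total_kb and returns 1 when both numbers are found, otherwise 0.
 */
int parse_memory_line(const char *line, unsigned long *avail_kb,
		      unsigned long *total_kb)
{
	/* Skip any prefix, such as a "<6>" log level, before the message. */
	const char *p = strstr(line, "Memory: ");

	if (p == NULL)
		return 0;
	return sscanf(p, "Memory: %luk/%luk", avail_kb, total_kb) == 2;
}

/*
 * Hypothetical helper: scan a saved log stream (for example a copy such
 * as /var/log/boot.msg or /var/log/dmesg; the file name varies by
 * distribution) and return 1 with YYYY from the first matching line,
 * or 0 if no such line is present.
 */
int find_total_memory_kb(FILE *fp, unsigned long *total_kb)
{
	char line[512];
	unsigned long avail_kb;

	while (fgets(line, sizeof(line), fp) != NULL) {
		if (parse_memory_line(line, &avail_kb, total_kb))
			return 1;
	}
	return 0;
}
```

As Will notes, MemTotal in /proc/meminfo excludes the memory holding the kernel, so it is the YYYYk figure, not MemTotal, that should be compared with the value the HMC reports.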
From linas at austin.ibm.com Fri Jan 21 09:39:16 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 20 Jan 2005 16:39:16 -0600 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <16877.63693.915740.385920@cargo.ozlabs.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> <20050117201415.GA11505@austin.ibm.com> <16877.63693.915740.385920@cargo.ozlabs.ibm.com> Message-ID: <20050120223916.GJ9140@austin.ibm.com> On Wed, Jan 19, 2005 at 05:06:05PM +1100, Paul Mackerras was heard to remark: > Linas Vepstas writes: > > > p.s. It was not clear to me if the EEH patch previously sent > > (6 January 2005, same subject line) will be wending its way into > > the main Torvalds kernel tree, or not. I hadn't really gotten > > confirmation one way or another. > > I'm not really totally happy with it yet, on a number of fronts: > > 1. You're adding more PCI-specific stuff to the device_node struct, > which I don't like. I would prefer that the device_node tree > contains basically just what we get from OF, and that we have a > separate struct for storing ppc64-specific information for each PCI > device. Fixing that is outside the scope of your patch, though. I wrote this down on my to-do list. It's the sort of thing that evaporates from my consciousness when other things come along, but I'll give it a shot. > 2. I don't see why the device nodes for the PCI subtree being reset > would go away, and thus I don't see the need for your eeh_cfg_tree > struct. It's not the reset, it's the hot-plug remove. The hot plug code assumes that you are going to physically remove the device from the slot, so it removes the device_node as part of the "unconfig". Of course, I found this out only after performing a null-pointer deref. Not only does the node go away, but all of the various pointers it holds are zeroed in the process. The cfg tree holds on to those pointers, so that I wouldn't have to muck with the device_node removal code to do something tricky. > 3.
Is there a good reason why we can't use the assigned-addresses > property on the relevant device tree nodes to tell us what to set > the BARs to? Yes, the reason is that after a reset, that property doesn't hold any decent data. I discussed this with the firmware developers, and their response was that it is the kernel's responsibility to compute (or save/restore) such values. (Except for bridges, which they will do for us). > 4. I think the 5 second sleep is quite bogus, and shows that we have > the flow of control wrong. :) Yes, well, indeed it is. Don't look at me, not my idea. > In particular I think it should be a > userland write to a sysfs file that kicks off the restart process > rather than it just happening after 5 seconds. Anyway, what > process or thread is executing that 5 second sleep? Is it keventd > or something? It's a workqueue. > 5. AFAICS userland will get an unplug notification for the device, but > nothing to indicate that is due to an EEH slot isolation event. I > think userland should be told about EEH events. In principle, I'd agree. In practice, this would seem to require changes or additions or enhancements to udev that I don't quite understand, as well as potential changes to udev scripts. Maybe I don't understand sysfs sufficiently well. I am very tempted to punt on this, and wait for the Intel-backed PCI-E code to get to this point, and then do whatever they're doing. --linas From linas at austin.ibm.com Fri Jan 21 09:48:12 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 20 Jan 2005 16:48:12 -0600 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <16877.63693.915740.385920@cargo.ozlabs.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> <20050117201415.GA11505@austin.ibm.com> <16877.63693.915740.385920@cargo.ozlabs.ibm.com> Message-ID: <20050120224812.GK9140@austin.ibm.com> On Wed, Jan 19, 2005 at 05:06:05PM +1100, Paul Mackerras was heard to remark: > Linas Vepstas writes: > > > p.s.
It was not clear to me if the EEH patch previously sent > > (6 January 2005, same subject line) will be wending its way into > > the main Torvalds kernel tree, or not. I hadn't really gotten > > confirmation one way or another. > > I'm not really totally happy with it yet, on a number of fronts: [...] I forgot to mention: while I agree with some/many of these points, especially with regard to recovery, I'd also like to note that the patch was mailed in two independent parts: -- a number of generic infrastructure routines, all in a ppc64 patch, and -- the code that actually performs the recovery, as a patch to the drivers/pci/hotplug subsystem. While the actual recovery code is controversial (e.g. no support for scsi recovery), I'd like to at least get the generic infrastructure pieces in. --linas From paulus at samba.org Fri Jan 21 13:50:50 2005 From: paulus at samba.org (Paul Mackerras) Date: Fri, 21 Jan 2005 13:50:50 +1100 Subject: [PATCH] PPC64: EEH Recovery In-Reply-To: <20050120223916.GJ9140@austin.ibm.com> References: <20050106192413.GK22274@austin.ibm.com> <20050117201415.GA11505@austin.ibm.com> <16877.63693.915740.385920@cargo.ozlabs.ibm.com> <20050120223916.GJ9140@austin.ibm.com> Message-ID: <16880.28170.976516.285336@cargo.ozlabs.ibm.com> Linas Vepstas writes: > > 2. I don't see why the device nodes for the PCI subtree being reset > > would go away, and thus I don't see the need for your eeh_cfg_tree > > struct. > > Its not the reset, its the hot-plug remove. The hot plug code assumes > that you are going to physically remove the device from the slot, so > it removes the device_node as part of the "unconfig". OK, I missed that. It seems a bit bogus to me. Could you point me at where in the code this happens? > > 3. Is there a good reason why we can't use the assigned-addresses > > property on the relevant device tree nodes to tell us what to set > > the BARs to? > > Yes, the reason is that after a reset, that property doesn't hold any > decent data.
I discussed this with the firmware developers, and thier > response was that it is the kernel's responsibility to compute > (or save/restore) such values. (Except for bridges, which they will do for us). The not holding any decent data is a consequence of the device nodes getting thrown away, isn't it? I fail to see how resetting the device can of itself affect our copy of the device tree. > > In particular I think it should be a > > userland write to a sysfs file that kicks off the restart process > > rather than it just happening after 5 seconds. Anyway, what > > process or thread is executing that 5 second sleep? Is it keventd > > or something? > > Its a workqueue. Which get run in keventd's context. In other words no other workqueues will get run during the 5 second sleep, or at least not on that cpu. Paul. From anton at samba.org Fri Jan 21 16:40:43 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 21 Jan 2005 16:40:43 +1100 Subject: [PATCH] ppc64: limit segment tables on UP kernels Message-ID: <20050121054043.GA10563@krispykreme.ozlabs.ibm.com> We were allocating 48 segment tables on UP kernels. Remove them and save 192kB of kernel memory on UP builds. 
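The 192kB saving quoted above matches the `stab_array` reservation in head.S, `.space 4096 * 48`, which is one 4096-byte page per possible cpu. The patch moves that reservation under `#ifdef CONFIG_SMP`:

```latex
48 \times 4096\,\text{B} = 196608\,\text{B} = \frac{196608}{1024}\,\text{KiB} = 192\,\text{KiB}
```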
Anton Signed-off-by: Anton Blanchard diff -puN arch/ppc64/kernel/head.S~limit_stab_on_up arch/ppc64/kernel/head.S --- foobar2/arch/ppc64/kernel/head.S~limit_stab_on_up 2005-01-19 15:16:28.987107097 +1100 +++ foobar2-anton/arch/ppc64/kernel/head.S 2005-01-19 15:16:29.009105597 +1100 @@ -2145,10 +2145,12 @@ swapper_pg_dir: ioremap_dir: .space 4096 +#ifdef CONFIG_SMP /* 1 page segment table per cpu (max 48, cpu0 allocated at STAB0_PHYS_ADDR) */ .globl stab_array stab_array: .space 4096 * 48 +#endif /* * This space gets a copy of optional info passed to us by the bootstrap _ From j.glisse at gmail.com Fri Jan 21 22:22:10 2005 From: j.glisse at gmail.com (Jerome Glisse) Date: Fri, 21 Jan 2005 12:22:10 +0100 Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <20050120231442.GE2626@smtp.west.cox.net> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <4240b9160501101014317b8d85@mail.gmail.com> <20050110182940.GA3391@smtp.west.cox.net> <4240b91605011010593d2f3b3d@mail.gmail.com> <20050110191248.GB3391@smtp.west.cox.net> <4240b91605011011314bb06814@mail.gmail.com> <4240b91605011211101ed322a8@mail.gmail.com> <20050120231442.GE2626@smtp.west.cox.net> Message-ID: <4240b916050121032230b9c5dc@mail.gmail.com> On Thu, 20 Jan 2005 16:14:42 -0700, Tom Rini wrote: > On Wed, Jan 12, 2005 at 08:10:58PM +0100, Jerome Glisse wrote: > > > Wanted to know what is going on with CONFIG_6xx? You will use > > my patch or do you have another better way ? :) > > Can you resend it please? > Here is another version (the previous one used #ifdef to comment out the function call, but I read somewhere that this doesn't follow the coding guidelines). Anyway, I think my patch is an ugly hack.
Signed-off-by: Jerome Glisse best, Jerome Glisse diff -Naur linux/arch/ppc/boot/simple/misc-prep.c linux-2.6.10/arch/ppc/boot/simple/misc-prep.c --- linux/arch/ppc/boot/simple/misc-prep.c 2004-12-24 22:33:51.000000000 +0100 +++ linux-2.6.10/arch/ppc/boot/simple/misc-prep.c 2005-01-21 12:09:50.976426672 +0100 @@ -34,7 +34,11 @@ extern void serial_fixups(void); extern struct bi_record *decompress_kernel(unsigned long load_addr, int num_words, unsigned long cksum); +#ifdef CONFIG_6XX extern void disable_6xx_mmu(void); +#elif +void disable_6xx_mmu(void) {} +#endif extern unsigned long mpc10x_get_mem_size(void); static void From geert at linux-m68k.org Fri Jan 21 23:36:14 2005 From: geert at linux-m68k.org (Geert Uytterhoeven) Date: Fri, 21 Jan 2005 13:36:14 +0100 (MET) Subject: Classic PPC specific ASM (CONFIG_6XX) In-Reply-To: <4240b916050121032230b9c5dc@mail.gmail.com> References: <4240b916050109074053e328b1@mail.gmail.com> <16865.39960.274092.996530@cargo.ozlabs.ibm.com> <20050110145219.GB2226@smtp.west.cox.net> <4240b9160501101014317b8d85@mail.gmail.com> <20050110182940.GA3391@smtp.west.cox.net> <4240b91605011010593d2f3b3d@mail.gmail.com> <20050110191248.GB3391@smtp.west.cox.net> <4240b91605011011314bb06814@mail.gmail.com> <4240b91605011211101ed322a8@mail.gmail.com> <20050120231442.GE2626@smtp.west.cox.net> <4240b916050121032230b9c5dc@mail.gmail.com> Message-ID: On Fri, 21 Jan 2005, Jerome Glisse wrote: > On Thu, 20 Jan 2005 16:14:42 -0700, Tom Rini wrote: > > On Wed, Jan 12, 2005 at 08:10:58PM +0100, Jerome Glisse wrote: > > > > > Wanted to know what is going on with CONFIG_6xx? You will use > > > my patch or do you have another better way ? :) > > > > Can you resend it please? > > > > Here is another version (the previous one used ifdef to comment > out function call but i read somewhere that this doesn't follow > codeguideline). Anyway i think that my patch is a ugly hack. 
> > Signed-off-by: Jerome Glisse > > best, > Jerome Glisse > > > > diff -Naur linux/arch/ppc/boot/simple/misc-prep.c > linux-2.6.10/arch/ppc/boot/simple/misc-prep.c > --- linux/arch/ppc/boot/simple/misc-prep.c 2004-12-24 22:33:51.000000000 +0100 > +++ linux-2.6.10/arch/ppc/boot/simple/misc-prep.c 2005-01-21 > 12:09:50.976426672 +0100 > @@ -34,7 +34,11 @@ > extern void serial_fixups(void); > extern struct bi_record *decompress_kernel(unsigned long load_addr, > int num_words, unsigned long cksum); > +#ifdef CONFIG_6XX > extern void disable_6xx_mmu(void); > +#elif > +void disable_6xx_mmu(void) {} ^^^^^^^^^^^^^^^^^^^^ You better make this one static inline. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds From paulus at samba.org Sat Jan 22 16:25:21 2005 From: paulus at samba.org (Paul Mackerras) Date: Sat, 22 Jan 2005 16:25:21 +1100 Subject: [PATCH 18/21] ppc64/rtasd: replace schedule_timeout() with msleep() In-Reply-To: <20050118001819.GA24698@us.ibm.com> References: <20050118001819.GA24698@us.ibm.com> Message-ID: <16881.58305.424934.884018@cargo.ozlabs.ibm.com> Nishanth Aravamudan writes: > Description: Replace schedule_timeout() with msleep()/ssleep(). In both cases, > the current code sleeps in TASK_INTERRUPTIBLE but does not account for early > wakeups due to signals being caught; therefore I have used TASK_UNINTERRUPTIBLE > sleeps in both cases. The second sleep is slightly more difficult to convert as > rtas_event_scan_rate is variable. I have left it as a msleep() call, although > ssleep() may be more appropriate. You have a good point about signals, but I don't like the way that this will elevate the load average by 1 the whole time. We need to fix this properly instead. Paul. 
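Back in the CONFIG_6XX boot-wrapper thread, Geert suggests turning the fallback into a `static inline` stub. Here is a minimal sketch of that pattern: declare the real function when the option is set, otherwise provide an empty inline stub so the call site needs no `#ifdef`. The fallback branch must use `#else`. A bare `#elif` with no expression, as in the posted diff, is a preprocessor error whenever it is evaluated, which is exactly the case where `CONFIG_6XX` is not set. The caller below is hypothetical, and `CONFIG_6XX` stands for whatever the build system defines.

```c
#include <assert.h>

#ifdef CONFIG_6XX
/* The real implementation is provided elsewhere (6xx-specific code). */
extern void disable_6xx_mmu(void);
#else
/* No 6xx MMU to turn off: an empty static inline stub compiles away
 * completely and does not emit a global symbol into every user. */
static inline void disable_6xx_mmu(void) { }
#endif

/* Hypothetical boot-wrapper step: the call is unconditional. */
static int hypothetical_boot_step(void)
{
	disable_6xx_mmu();
	return 0;
}
```

A plain non-static `void disable_6xx_mmu(void) {}` would also compile. However, if it ever lived in a header included by more than one file, it would produce duplicate definitions at link time, which `static inline` avoids.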
From paulus at samba.org Sat Jan 22 20:11:46 2005 From: paulus at samba.org (Paul Mackerras) Date: Sat, 22 Jan 2005 20:11:46 +1100 Subject: [PATCH 2/2] xmon io space read In-Reply-To: <20050105145757.62c84c3b@localhost> References: <20050105144502.56a15bcd@localhost> <20050105145757.62c84c3b@localhost> Message-ID: <16882.6354.955035.976749@cargo.ozlabs.ibm.com> Jake Moilanen writes: > Here is the support code for xmon to read IO space. I would prefer to see that as a variant of the 'm' command, i.e. you would type 'mi 3f8' to look at serial port registers, etc. Paul. From nish.aravamudan at gmail.com Sun Jan 23 05:49:48 2005 From: nish.aravamudan at gmail.com (Nish Aravamudan) Date: Sat, 22 Jan 2005 10:49:48 -0800 Subject: [KJ] Re: [PATCH 18/21] ppc64/rtasd: replace schedule_timeout() with msleep() In-Reply-To: <16881.58305.424934.884018@cargo.ozlabs.ibm.com> References: <20050118001819.GA24698@us.ibm.com> <16881.58305.424934.884018@cargo.ozlabs.ibm.com> Message-ID: <29495f1d05012210497ee384b3@mail.gmail.com> On Sat, 22 Jan 2005 16:25:21 +1100, Paul Mackerras wrote: > Nishanth Aravamudan writes: > > > Description: Replace schedule_timeout() with msleep()/ssleep(). In both cases, > > the current code sleeps in TASK_INTERRUPTIBLE but does not account for early > > wakeups due to signals being caught; therefore I have used TASK_UNINTERRUPTIBLE > > sleeps in both cases. The second sleep is slightly more difficult to convert as > > rtas_event_scan_rate is variable. I have left it as a msleep() call, although > > ssleep() may be more appropriate. > > You have a good point about signals, but I don't like the way that > this will elevate the load average by 1 the whole time. We need to > fix this properly instead. Ideally, we should fix the load average calculation :) It just seems counterintuitive to me that people would use a less correct sleep-state just to prevent the load average from going up. But I understand your motivation, so it's ok. 
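Much of the appeal of msleep() in this thread is its unit: it takes milliseconds, whereas schedule_timeout() takes jiffies, whose length depends on HZ. The conversion can be sketched as a round-up division so that a requested delay is never shortened. This is a simplified stand-in written for illustration, not the kernel's msecs_to_jiffies() implementation, and HZ is passed as a parameter here.

```c
#include <assert.h>

/*
 * Convert a delay in milliseconds to timer ticks at a given tick rate,
 * rounding up so the resulting sleep is never shorter than requested.
 * Simplified sketch: ignores overflow of ms * hz for huge delays.
 */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + 999) / 1000;
}
```

With this rounding, a 5000 ms sleep is 5000 ticks at HZ=1000 but 1250 ticks at HZ=250. A 1 ms sleep at HZ=100 still waits one full tick (10 ms) rather than zero.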
Just an FYI/FWIW, it seems most other driver authors/maintainers have been somewhat ok with using TASK_UNINTERRUPTIBLE via msleep()/ssleep(), just because of the time units difference (which is just so much easier to understand). Admittedly, it's going to be a long time before HZ is completely out of the kernel (at least the way it is used today to calculate delays/timeouts -- there are a total of ~4000 lines using HZ throughout the kernel), but these patches *are* the first step. What exactly do you mean by fixing it properly? Do you want to deal with signals? It doesn't seem like the code should fail if a signal hits, but you could save the signal state, block all signals, sleep interruptibly (so the load average is not raised) and then restore all signals on wake-up. I would also add a comment to the effect that TASK_UNINTERRUPTIBLE would be acceptable if the loadavg calculation changes, so that another Janitor (once the calc. does change) could go through and change it to msleep() / ssleep() then. Thanks, Nish From anton at samba.org Sun Jan 23 15:27:33 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 23 Jan 2005 15:27:33 +1100 Subject: [PATCH] ppc64: Allow EEH to be disabled In-Reply-To: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> References: <20050113235119.GD6309@krispykreme.ozlabs.ibm.com> Message-ID: <20050123042733.GA5920@krispykreme.ozlabs.ibm.com> Allow EEH to be disabled for pSeries targets, but only if the EMBEDDED option is enabled. This version incorporates some suggestions from Arnd Bergmann and Linas Vepstas. Signed-off-by: Anton Blanchard ===== arch/ppc64/Kconfig 1.77 vs edited ===== --- 1.77/arch/ppc64/Kconfig 2005-01-21 15:56:33 +11:00 +++ edited/arch/ppc64/Kconfig 2005-01-23 15:15:19 +11:00 @@ -234,6 +234,11 @@ Say Y here if you are building a kernel for a desktop system. Say N if you are unsure.
+config EEH + bool "PCI Extended Error Handling (EEH)" if EMBEDDED + depends on PPC_PSERIES + default y if !EMBEDDED + # # Use the generic interrupt handling code in kernel/irq/: # ===== arch/ppc64/kernel/Makefile 1.58 vs edited ===== --- 1.58/arch/ppc64/kernel/Makefile 2005-01-08 16:43:52 +11:00 +++ edited/arch/ppc64/kernel/Makefile 2005-01-23 15:15:20 +11:00 @@ -30,9 +30,10 @@ obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \ - eeh.o pSeries_nvram.o rtasd.o ras.o \ + pSeries_nvram.o rtasd.o ras.o \ xics.o rtas.o pSeries_setup.o pSeries_iommu.o +obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o obj-$(CONFIG_SMP) += smp.o ===== arch/ppc64/kernel/eeh.c 1.43 vs edited ===== --- 1.43/arch/ppc64/kernel/eeh.c 2005-01-21 16:02:09 +11:00 +++ edited/arch/ppc64/kernel/eeh.c 2005-01-23 15:15:23 +11:00 @@ -764,8 +764,6 @@ struct device_node *phb, *np; struct eeh_early_enable_info info; - init_pci_config_tokens(); - np = of_find_node_by_path("/rtas"); if (np == NULL) return; ===== arch/ppc64/kernel/pSeries_setup.c 1.66 vs edited ===== --- 1.66/arch/ppc64/kernel/pSeries_setup.c 2005-01-21 16:02:10 +11:00 +++ edited/arch/ppc64/kernel/pSeries_setup.c 2005-01-23 15:15:22 +11:00 @@ -40,7 +40,6 @@ #include #include #include - #include #include #include @@ -59,13 +58,12 @@ #include #include #include - -#include "i8259.h" #include -#include #include +#include "i8259.h" #include "mpic.h" +#include "pci.h" #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -73,7 +71,6 @@ #define DBG(fmt...) 
#endif -extern void find_and_init_phbs(void); extern void pSeries_final_fixup(void); extern void pSeries_get_boot_time(struct rtc_time *rtc_time); @@ -87,10 +84,6 @@ int fwnmi_active; /* TRUE if an FWNMI handler is present */ -unsigned long virtPython0Facilities = 0; // python0 facility area (memory mapped io) (64-bit format) VIRTUAL address. - -extern unsigned long loops_per_jiffy; - extern unsigned long ppc_proc_freq; extern unsigned long ppc_tb_freq; @@ -230,7 +223,7 @@ fwnmi_init(); /* Find and initialize PCI host bridges */ - /* iSeries needs to be done much later. */ + init_pci_config_tokens(); eeh_init(); find_and_init_phbs(); ===== include/asm-ppc64/eeh.h 1.23 vs edited ===== --- 1.23/include/asm-ppc64/eeh.h 2004-10-26 09:17:38 +10:00 +++ edited/include/asm-ppc64/eeh.h 2005-01-23 15:15:21 +11:00 @@ -20,28 +20,28 @@ #ifndef _PPC64_EEH_H #define _PPC64_EEH_H +#include #include #include #include -#include struct pci_dev; struct device_node; +struct device_node; +struct notifier_block; + +#ifdef CONFIG_EEH /* Values for eeh_mode bits in device_node */ #define EEH_MODE_SUPPORTED (1<<0) #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) -#ifdef CONFIG_PPC_PSERIES -extern void __init eeh_init(void); -unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val); -int eeh_dn_check_failure (struct device_node *dn, struct pci_dev *dev); -void __iomem *eeh_ioremap(unsigned long addr, void __iomem *vaddr); +void __init eeh_init(void); +unsigned long eeh_check_failure(const volatile void __iomem *token, + unsigned long val); +int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev); void __init pci_addr_cache_build(void); -#else -#define eeh_check_failure(token, val) (val) -#endif /** * eeh_add_device_early @@ -52,7 +52,6 @@ * device (including config space i/o). Call eeh_add_device_late * to finish the eeh setup for this device. 
*/ -struct device_node; void eeh_add_device_early(struct device_node *); void eeh_add_device_late(struct pci_dev *); @@ -69,8 +68,6 @@ #define EEH_ENABLE 1 #define EEH_RELEASE_LOADSTORE 2 #define EEH_RELEASE_DMA 3 -int eeh_set_option(struct pci_dev *dev, int options); - /** * Notifier event flags. @@ -107,6 +104,18 @@ */ #define EEH_IO_ERROR_VALUE(size) (~0U >> ((4 - (size)) * 8)) +#else +#define eeh_init() +#define eeh_check_failure(token, val) (val) +#define eeh_dn_check_failure(dn, dev) (0) +#define pci_addr_cache_build() +#define eeh_add_device_early(dn) +#define eeh_add_device_late(dev) +#define eeh_remove_device(dev) +#define EEH_POSSIBLE_ERROR(val, type) (0) +#define EEH_IO_ERROR_VALUE(size) (-1UL) +#endif + /* * MMIO read/write operations with EEH support. */ @@ -194,7 +203,8 @@ #define EEH_CHECK_ALIGN(v,a) \ ((((unsigned long)(v)) & ((a) - 1)) == 0) -static inline void eeh_memset_io(volatile void __iomem *addr, int c, unsigned long n) +static inline void eeh_memset_io(volatile void __iomem *addr, int c, + unsigned long n) { u32 lc = c; lc |= lc << 8; From anton at samba.org Sun Jan 23 15:34:23 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 23 Jan 2005 15:34:23 +1100 Subject: [PATCH] ppc64: disable some boot wrapper debug Message-ID: <20050123043423.GB5920@krispykreme.ozlabs.ibm.com> Hi, The debug information in the boot wrapper can be quite verbose (it prints an entry for every address it attempts to claim). Disable it. 
Anton Signed-off-by: Anton Blanchard diff -puN arch/ppc64/boot/main.c~disable_boot_debug arch/ppc64/boot/main.c --- foobar2/arch/ppc64/boot/main.c~disable_boot_debug 2005-01-23 13:34:05.555656631 +1100 +++ foobar2-anton/arch/ppc64/boot/main.c 2005-01-23 13:34:05.577655139 +1100 @@ -73,7 +73,7 @@ void *stdin; void *stdout; void *stderr; -#define DEBUG +#undef DEBUG static unsigned long claim_base = PROG_START; _ From anton at samba.org Sun Jan 23 15:48:48 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 23 Jan 2005 15:48:48 +1100 Subject: [PATCH] ppc64: Problem disabling SYSVIPC Message-ID: <20050123044848.GC5920@krispykreme.ozlabs.ibm.com> Hi, The kernel wouldn't link when SYSVIPC was disabled. x86-64 was already defining a cond_syscall; instead of duplicating it in the ppc64 port, move it into the arch-specific portion of kernel/sys_ni.c. Anton Signed-off-by: Anton Blanchard diff -puN kernel/sys_ni.c~fix_config_sysvipc2 kernel/sys_ni.c --- foobar2/kernel/sys_ni.c~fix_config_sysvipc2 2005-01-12 00:17:55.800846282 +1100 +++ foobar2-anton/kernel/sys_ni.c 2005-01-12 00:18:59.720579810 +1100 @@ -81,4 +81,4 @@ cond_syscall(compat_sys_socketcall) cond_syscall(sys_pciconfig_read) cond_syscall(sys_pciconfig_write) cond_syscall(sys_pciconfig_iobase) - +cond_syscall(sys32_ipc) diff -puN arch/ppc64/kernel/sys_ppc32.c~fix_config_sysvipc2 arch/ppc64/kernel/sys_ppc32.c --- foobar2/arch/ppc64/kernel/sys_ppc32.c~fix_config_sysvipc2 2005-01-12 00:18:09.526904432 +1100 +++ foobar2-anton/arch/ppc64/kernel/sys_ppc32.c 2005-01-12 00:18:25.130082960 +1100 @@ -492,6 +492,7 @@ asmlinkage long sys32_settimeofday(struc return do_sys_settimeofday(tv ? &kts : NULL, tz ?
&ktz : NULL); } +#ifdef CONFIG_SYSVIPC long sys32_ipc(u32 call, u32 first, u32 second, u32 third, compat_uptr_t ptr, u32 fifth) { @@ -556,6 +557,7 @@ long sys32_ipc(u32 call, u32 first, u32 return -ENOSYS; } +#endif /* Note: it is necessary to treat out_fd and in_fd as unsigned ints, * with the corresponding cast to a signed int to insure that the diff -puN arch/x86_64/ia32/sys_ia32.c~fix_config_sysvipc2 arch/x86_64/ia32/sys_ia32.c --- foobar2/arch/x86_64/ia32/sys_ia32.c~fix_config_sysvipc2 2005-01-12 00:18:46.324623956 +1100 +++ foobar2-anton/arch/x86_64/ia32/sys_ia32.c 2005-01-12 00:18:52.356042193 +1100 @@ -1082,8 +1082,6 @@ long sys32_lookup_dcookie(u32 addr_low, return sys_lookup_dcookie(((u64)addr_high << 32) | addr_low, buf, len); } -cond_syscall(sys32_ipc) - static int __init ia32_init (void) { printk("IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $\n"); _ From anton at samba.org Sun Jan 23 16:36:52 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 23 Jan 2005 16:36:52 +1100 Subject: [PATCH] ppc64: Enable virtual ethernet and virtual scsi Message-ID: <20050123053652.GE5920@krispykreme.ozlabs.ibm.com> Enable the virtual ethernet and virtual scsi drivers in the pseries config. Since our root device may be on either we need them compiled in (unless we play initrd tricks). 
Signed-off-by: Anton Blanchard ===== arch/ppc64/configs/pSeries_defconfig 1.10 vs edited ===== --- 1.10/arch/ppc64/configs/pSeries_defconfig 2004-11-27 22:20:13 +11:00 +++ edited/arch/ppc64/configs/pSeries_defconfig 2005-01-23 16:26:07 +11:00 @@ -268,7 +268,7 @@ # CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set -CONFIG_SCSI_IBMVSCSI=m +CONFIG_SCSI_IBMVSCSI=y # CONFIG_SCSI_INIA100 is not set CONFIG_SCSI_SYM53C8XX_2=y CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0 @@ -492,7 +492,7 @@ # # CONFIG_NET_TULIP is not set # CONFIG_HP100 is not set -CONFIG_IBMVETH=m +CONFIG_IBMVETH=y CONFIG_NET_PCI=y CONFIG_PCNET32=y # CONFIG_AMD8111_ETH is not set From linas at austin.ibm.com Tue Jan 25 10:04:53 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Mon, 24 Jan 2005 17:04:53 -0600 Subject: saving & analyzing (by the bootloader) kernel boot log buffer on "vanilla" Linux (2.6) usable for 8xx ppc In-Reply-To: <313680C9A886D511A06000204840E1CF0A64754F@whq-msgusr-02.pit.comms.marconi.com> References: <313680C9A886D511A06000204840E1CF0A64754F@whq-msgusr-02.pit.comms.marconi.com> Message-ID: <20050124230453.GN9140@austin.ibm.com> On Sat, Jan 22, 2005 at 06:26:43AM -0500, Povolotsky, Alexander was heard to remark: > I would suggest CONSIDER implementing - it would help for early debugging > when serial console > is not working and no "live"output is available - I am in such situation > right now ! Are you perchance seeing "Warning: unable to open an initial console." on ppc64? If so, I am debugging that right now, and hope to have a patch soon. --linas From anton at samba.org Wed Jan 26 00:59:30 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 26 Jan 2005 00:59:30 +1100 Subject: [PATCH] ppc64: mask lower bits in tlbie Message-ID: <20050125135930.GH5920@krispykreme.ozlabs.ibm.com> Hi, We weren't masking the lower bits of the VA in a tlbie(l) instruction.
While most CPUs ignore this we should play it safe and follow the spec. Anton Signed-off-by: Anton Blanchard diff -puN include/asm-ppc64/mmu.h~fix_tlbie include/asm-ppc64/mmu.h --- gr_work/include/asm-ppc64/mmu.h~fix_tlbie 2005-01-12 22:54:35.098404315 -0600 +++ gr_work-anton/include/asm-ppc64/mmu.h 2005-01-12 22:54:35.107402890 -0600 @@ -122,10 +122,13 @@ static inline void __tlbie(unsigned long /* clear top 16 bits, non SLS segment */ va &= ~(0xffffULL << 48); - if (large) + if (large) { + va &= HPAGE_MASK; asm volatile("tlbie %0,1" : : "r"(va) : "memory"); - else + } else { + va &= PAGE_MASK; asm volatile("tlbie %0,0" : : "r"(va) : "memory"); + } } static inline void tlbie(unsigned long va, int large) @@ -139,6 +142,7 @@ static inline void __tlbiel(unsigned lon { /* clear top 16 bits, non SLS segment */ va &= ~(0xffffULL << 48); + va &= PAGE_MASK; /* * Thanks to Alan Modra we are now able to use machine specific _ From nathanl at austin.ibm.com Wed Jan 26 11:22:01 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 25 Jan 2005 18:22:01 -0600 Subject: [PATCH] show -1 for physical_id of non-present cpus Message-ID: <1106698921.9091.4.camel@pants.austin.ibm.com> Make the physical_id cpu attribute on ppc64 show -1 instead of 65535 for non-present cpus. 
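The fix comes down to how a 16-bit hardware CPU id of 0xffff, the value that shows up as 65535 for non-present cpus, is formatted. The small sketch below assumes the id is held in an unsigned 16-bit type, as the 65535 output and Olof's "switch hw_cpu_id to a s16" remark suggest. It compares the old `%u`, the patch's `%hd`, and Olof's alternative of a signed 16-bit value printed with `%d`. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 16-bit hardware CPU id three ways. */
static void format_ids(uint16_t hw_id, char *as_u, char *as_hd,
		       char *as_s16, size_t len)
{
	assert(as_u != NULL && as_hd != NULL && as_s16 != NULL);

	/* Old behaviour: unsigned, so 0xffff prints as 65535. */
	snprintf(as_u, len, "%u", (unsigned int)hw_id);

	/* Patch: %hd reinterprets the value as a signed short. Converting
	 * 0xffff to a signed 16-bit type yields -1 on two's-complement
	 * targets such as ppc64. */
	snprintf(as_hd, len, "%hd", (short)hw_id);

	/* Olof's alternative: store the id as s16 and print with %d. */
	snprintf(as_s16, len, "%d", (int)(int16_t)hw_id);
}
```

Both signed variants print -1 for the "no cpu" marker and leave ordinary ids such as 3 unchanged. The s16 route fixes the type once instead of relying on the format string at each print site.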
Signed-off-by: Nathan Lynch --- diff -puN arch/ppc64/kernel/sysfs.c~cpu-physical_id-signed arch/ppc64/kernel/sysfs.c --- linux-2.6.11-rc2-bk2/arch/ppc64/kernel/sysfs.c~cpu-physical_id-signed 2005-01-24 21:29:57.000000000 -0600 +++ linux-2.6.11-rc2-bk2-nathanl/arch/ppc64/kernel/sysfs.c 2005-01-25 09:41:15.000000000 -0600 @@ -387,7 +387,7 @@ static ssize_t show_physical_id(struct s { struct cpu *cpu = container_of(dev, struct cpu, sysdev); - return sprintf(buf, "%u\n", get_hard_smp_processor_id(cpu->sysdev.id)); + return sprintf(buf, "%hd\n", get_hard_smp_processor_id(cpu->sysdev.id)); } static SYSDEV_ATTR(physical_id, 0444, show_physical_id, NULL); _ From olof at austin.ibm.com Wed Jan 26 15:11:43 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 25 Jan 2005 22:11:43 -0600 Subject: [PATCH] show -1 for physical_id of non-present cpus In-Reply-To: <1106698921.9091.4.camel@pants.austin.ibm.com> References: <1106698921.9091.4.camel@pants.austin.ibm.com> Message-ID: <41F7187F.9070602@austin.ibm.com> Nathan Lynch wrote: >Make the physical_id cpu attribute on ppc64 show -1 instead of 65535 >for non-present cpus. > Good catch. I'm not sure if I prefer your patch or just switching hw_cpu_id to a s16 and using %d. Either way is fine with me. -Olof From nathanl at austin.ibm.com Wed Jan 26 15:41:03 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 25 Jan 2005 22:41:03 -0600 Subject: [PATCH] show -1 for physical_id of non-present cpus In-Reply-To: <41F7187F.9070602@austin.ibm.com> References: <1106698921.9091.4.camel@pants.austin.ibm.com> <41F7187F.9070602@austin.ibm.com> Message-ID: <1106714463.9855.16.camel@localhost.localdomain> On Tue, 2005-01-25 at 22:11 -0600, Olof Johansson wrote: > Nathan Lynch wrote: > > >Make the physical_id cpu attribute on ppc64 show -1 instead of 65535 > >for non-present cpus. > > > > Good catch. > > I'm not sure if I prefer your patch or just switching hw_cpu_id to a s16 > and using %d. Either way is fine with me. 
fwiw, I plan to make the issue moot eventually by having only present cpus show up in sysfs, but that's not going to happen in time for 2.6.11. Nathan From nathanl at austin.ibm.com Wed Jan 26 16:06:31 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 25 Jan 2005 23:06:31 -0600 Subject: [RFC/PATCH 1/2] use notifier chain for device node addition and removal Message-ID: <1106715991.9855.22.camel@localhost.localdomain> This patch attempts to clean up the code which handles changes to the Open Firmware device tree during PCI hotplug or DLPAR operations by replacing the explicit fixups (e.g. of_finish_dynamic_node, of_cleanup_node) with a notifier call chain. It doesn't make all that much of a dent in the ugliness -- note that I've simply folded of_finish_dynamic_node into a high-priority notifier block while leaving most of the function intact. My ulterior motive here is that I want to be notified when processor device nodes are added to the system, and I don't want to add yet more special-case code to prom.c. I'll be following up with a patch for this. We could probably go further with the notifier chain approach, even to the point of moving of_finish_dynamic_node and friends to a separate module which could be config'd out for non-pSeries builds. I haven't tested this with anything but adding and removing processors from a Power5 partition, btw. I'd appreciate any other testing (e.g. PCI, VIO). Thoughts?
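The RFC depends on two properties of a notifier call chain. First, blocks with a higher `.priority` run first: the of_finish_dynamic_node block registers with priority 10, while the PCI and IOMMU blocks leave it at the default of 0. Second, a `NOTIFY_BAD` result stops the walk, which is what `of_add_node()` checks before giving up. The sketch below is a simplified, single-threaded userland model of that ordering and early stop. It is not the kernel's notifier implementation. The constants are modelled on the kernel's names, and `chain_register`, `chain_call` and the callbacks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes modelled on the kernel's notifier conventions. */
#define NOTIFY_DONE      0x0000	/* not interested in this event */
#define NOTIFY_OK        0x0001	/* handled successfully */
#define NOTIFY_STOP_MASK 0x8000	/* do not call further blocks */
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)	/* failure, stop */

/* Hypothetical action codes standing in for OF_RECONFIG_ADD/REMOVE. */
#define RECONFIG_ADD    1
#define RECONFIG_REMOVE 2

struct nb {
	int (*call)(struct nb *self, unsigned long action, void *data);
	int priority;		/* higher values are called earlier */
	struct nb *next;
};

/* Insert n keeping the chain sorted by descending priority; among equal
 * priorities, earlier registrations stay ahead of later ones. */
static void chain_register(struct nb **head, struct nb *n)
{
	assert(head != NULL && n != NULL);
	while (*head != NULL && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Call each block in order and stop as soon as one sets NOTIFY_STOP_MASK.
 * The last return value is handed back to the caller. */
static int chain_call(struct nb *head, unsigned long action, void *data)
{
	int ret = NOTIFY_DONE;

	for (; head != NULL; head = head->next) {
		ret = head->call(head, action, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Hypothetical callbacks that record the order in which they run. */
static char call_log[8];
static int log_len;

static void log_call(char c)
{
	if (log_len < (int)sizeof(call_log))
		call_log[log_len++] = c;
}

/* Stands in for the priority-10 of_finish_dynamic_node block;
 * a non-zero *(int *)data simulates a node that fails to initialise. */
static int finish_node_cb(struct nb *self, unsigned long action, void *data)
{
	(void)self;
	if (action != RECONFIG_ADD)
		return NOTIFY_DONE;
	log_call('F');
	return *(int *)data ? NOTIFY_BAD : NOTIFY_OK;
}

/* Stands in for a default-priority block such as the PCI fixup. */
static int pci_fixup_cb(struct nb *self, unsigned long action, void *data)
{
	(void)self;
	(void)data;
	if (action != RECONFIG_ADD)
		return NOTIFY_DONE;
	log_call('P');
	return NOTIFY_OK;
}
```

Suppose the PCI-style block is registered first and the priority-10 block second. The calls still run in the order F then P. A simulated failure in the first block returns NOTIFY_BAD without ever reaching the second block, mirroring the check in of_add_node().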
Signed-off-by: Nathan Lynch --- diff -puN arch/ppc64/kernel/pSeries_iommu.c~of-dlpar-notifier arch/ppc64/kernel/pSeries_iommu.c --- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/pSeries_iommu.c~of-dlpar-notifier 2005-01-25 22:56:46.000000000 -0600 +++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/pSeries_iommu.c 2005-01-25 22:56:46.000000000 -0600 @@ -34,6 +34,7 @@ #include #include #include +#include #include #include #include @@ -439,6 +440,29 @@ static void iommu_dev_setup_pSeries(stru } } +static int iommu_of_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *_node) +{ + struct device_node *node = (struct device_node *)_node; + int err = NOTIFY_DONE; + + switch (action) { + case OF_RECONFIG_REMOVE: + if (node->iommu_table && + get_property(node, "ibm,dma-window", NULL)) { + iommu_free_table(node); + err = NOTIFY_OK; + } + break; + default: + break; + } + return err; +} + +static struct notifier_block iommu_of_reconfig_nb = { + .notifier_call = iommu_of_reconfig_notifier, +}; + static void iommu_bus_setup_null(struct pci_bus *b) { } static void iommu_dev_setup_null(struct pci_dev *d) { } @@ -471,6 +495,8 @@ void iommu_init_early_pSeries(void) ppc_md.iommu_dev_setup = iommu_dev_setup_pSeries; + register_of_reconfig_notifier(&iommu_of_reconfig_nb); + pci_iommu_init(); } diff -puN arch/ppc64/kernel/pci_dn.c~of-dlpar-notifier arch/ppc64/kernel/pci_dn.c --- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/pci_dn.c~of-dlpar-notifier 2005-01-25 22:56:46.000000000 -0600 +++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/pci_dn.c 2005-01-25 22:56:46.000000000 -0600 @@ -23,6 +23,7 @@ #include #include #include +#include #include #include @@ -158,6 +159,25 @@ struct device_node *fetch_dev_dn(struct } EXPORT_SYMBOL(fetch_dev_dn); +static int pci_of_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *_node) +{ + struct device_node *node = (struct device_node *)_node; + int err = NOTIFY_OK; + + switch (action) { + case OF_RECONFIG_ADD: + 
update_dn_pci_info(node, node->parent->phb); + break; + default: + err = NOTIFY_DONE; + break; + } + return err; +} + +static struct notifier_block pci_of_reconfig_nb = { + .notifier_call = pci_of_reconfig_notifier, +}; /* * Actually initialize the phbs. @@ -170,4 +190,7 @@ void __init pci_devs_phb_init(void) /* This must be done first so the device nodes have valid pci info! */ list_for_each_entry_safe(phb, tmp, &hose_list, list_node) pci_devs_phb_init_dynamic(phb); + + if (systemcfg->platform & PLATFORM_PSERIES) + register_of_reconfig_notifier(&pci_of_reconfig_nb); } diff -puN arch/ppc64/kernel/prom.c~of-dlpar-notifier arch/ppc64/kernel/prom.c --- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/prom.c~of-dlpar-notifier 2005-01-25 22:56:46.000000000 -0600 +++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/prom.c 2005-01-25 22:56:46.000000000 -0600 @@ -32,6 +32,7 @@ #include #include #include +#include #include #include #include @@ -1671,7 +1672,6 @@ static int of_finish_dynamic_node_interr static int of_finish_dynamic_node(struct device_node *node) { struct device_node *parent = of_get_parent(node); - u32 *regs; int err = 0; phandle *ibm_phandle; @@ -1726,25 +1726,53 @@ static int of_finish_dynamic_node(struct err = of_finish_dynamic_node_interrupts(node); if (err) goto out; } +out: + of_node_put(parent); + return err; +} - /* now do the rough equivalent of update_dn_pci_info, this - * probably is not correct for phb's, but should work for - * IOAs and slots. 
- */ +static struct notifier_block *of_reconfig_chain; + +int register_of_reconfig_notifier(struct notifier_block *nb) +{ + return notifier_chain_register(&of_reconfig_chain, nb); +} - node->phb = parent->phb; +void unregister_of_reconfig_notifier(struct notifier_block *nb) +{ + notifier_chain_unregister(&of_reconfig_chain, nb); +} - regs = (u32 *)get_property(node, "reg", NULL); - if (regs) { - node->busno = (regs[0] >> 16) & 0xff; - node->devfn = (regs[0] >> 8) & 0xff; - } +static int of_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *_node) +{ + struct device_node *node = (struct device_node *)_node; + int err = NOTIFY_OK; -out: - of_node_put(parent); + switch (action) { + case OF_RECONFIG_ADD: + if (of_finish_dynamic_node(node)) + err = NOTIFY_BAD; + break; + default: + err = NOTIFY_DONE; + break; + } return err; } +static struct notifier_block of_reconfig_nb = { + .notifier_call = of_reconfig_notifier, + .priority = 10, /* This one needs to run first */ +}; + +static int __init of_reconfig_setup(void) +{ + if (systemcfg->platform & PLATFORM_PSERIES) + register_of_reconfig_notifier(&of_reconfig_nb); + return 0; +} +__initcall(of_reconfig_setup); + /* * Given a path and a property list, construct an OF device node, add * it to the device tree and global list, and place it in @@ -1778,9 +1806,11 @@ int of_add_node(const char *path, struct return -EINVAL; /* could also be ENOMEM, though */ } - if (0 != (err = of_finish_dynamic_node(np))) { + err = notifier_call_chain(&of_reconfig_chain, OF_RECONFIG_ADD, np); + if (err == NOTIFY_BAD) { + printk(KERN_WARNING "Failed to add device node %s\n", path); kfree(np); - return err; + return -EINVAL; } write_lock(&devtree_lock); @@ -1798,15 +1828,6 @@ int of_add_node(const char *path, struct } /* - * Prepare an OF node for removal from system - */ -static void of_cleanup_node(struct device_node *np) -{ - if (np->iommu_table && get_property(np, "ibm,dma-window", NULL)) - iommu_free_table(np); -} - -/* 
* "Unplug" a node from the device tree. The caller must hold * a reference to the node. The memory associated with the node * is not freed until its refcount goes to zero. @@ -1814,6 +1835,7 @@ static void of_cleanup_node(struct devic int of_remove_node(struct device_node *np) { struct device_node *parent, *child; + int err; parent = of_get_parent(np); if (!parent) @@ -1824,7 +1846,9 @@ int of_remove_node(struct device_node *n return -EBUSY; } - of_cleanup_node(np); + err = notifier_call_chain(&of_reconfig_chain, OF_RECONFIG_REMOVE, np); + if (err == NOTIFY_BAD) + return -EBUSY; write_lock(&devtree_lock); remove_node_proc_entries(np); diff -puN include/asm-ppc64/prom.h~of-dlpar-notifier include/asm-ppc64/prom.h --- linux-2.6.11-rc2-mm1/include/asm-ppc64/prom.h~of-dlpar-notifier 2005-01-25 22:56:46.000000000 -0600 +++ linux-2.6.11-rc2-mm1-nathanl/include/asm-ppc64/prom.h 2005-01-25 22:56:46.000000000 -0600 @@ -211,6 +211,14 @@ extern void of_node_put(struct device_no extern int of_add_node(const char *path, struct property *proplist); extern int of_remove_node(struct device_node *np); +/* For notification of device node addition and removal */ +extern int register_of_reconfig_notifier(struct notifier_block *nb); +extern void unregister_of_reconfig_notifier(struct notifier_block *nb); + +/* Notification codes for users of the above */ +#define OF_RECONFIG_ADD 0x0001 +#define OF_RECONFIG_REMOVE 0x0002 + /* Other Prototypes */ extern unsigned long prom_init(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); _ From nathanl at austin.ibm.com Wed Jan 26 16:11:05 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 25 Jan 2005 23:11:05 -0600 Subject: [RFC/PATCH 2/2] handle cpu device node addition and removal In-Reply-To: <1106715991.9855.22.camel@localhost.localdomain> References: <1106715991.9855.22.camel@localhost.localdomain> Message-ID: <1106716265.9855.26.camel@localhost.localdomain> Using the notifier chain in a previous patch, 
handle addition and removal of processors on pSeries LPAR. The new notifier call updates cpu_present_map and sets hw_cpu_id in the paca appropriately. Note that we must handle more than one cpu being added or going away to account for SMT processors. This allows us to stop abusing cpu_present_map, and lets us get rid of find_physical_cpu_to_start, which has always been a bit dodgy. The code which updates cpu_present_map I plan to move to the generic hotplug cpu code someday, but I think this is a good intermediate step for now. Tested on Power5. Signed-off-by: Nathan Lynch --- diff -puN arch/ppc64/kernel/pSeries_smp.c~cpu-dlpar-notifier arch/ppc64/kernel/pSeries_smp.c --- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/pSeries_smp.c~cpu-dlpar-notifier 2005-01-25 22:57:15.000000000 -0600 +++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/pSeries_smp.c 2005-01-25 22:57:15.000000000 -0600 @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -125,54 +126,6 @@ void pSeries_cpu_die(unsigned int cpu) paca[cpu].cpu_start = 0; } -/* Search all cpu device nodes for an offline logical cpu. If a - * device node has a "ibm,my-drc-index" property (meaning this is an - * LPAR), paranoid-check whether we own the cpu. For each "thread" - * of a cpu, if it is offline and has the same hw index as before, - * grab that in preference. - */ -static unsigned int find_physical_cpu_to_start(unsigned int old_hwindex) -{ - struct device_node *np = NULL; - unsigned int best = -1U; - - while ((np = of_find_node_by_type(np, "cpu"))) { - int nr_threads, len; - u32 *index = (u32 *)get_property(np, "ibm,my-drc-index", NULL); - u32 *tid = (u32 *) - get_property(np, "ibm,ppc-interrupt-server#s", &len); - - if (!tid) - tid = (u32 *)get_property(np, "reg", &len); - - if (!tid) - continue; - - /* If there is a drc-index, make sure that we own - * the cpu. 
- */ - if (index) { - int state; - int rc = rtas_get_sensor(9003, *index, &state); - if (rc != 0 || state != 1) - continue; - } - - nr_threads = len / sizeof(u32); - - while (nr_threads--) { - if (0 == query_cpu_stopped(tid[nr_threads])) { - best = tid[nr_threads]; - if (best == old_hwindex) - goto out; - } - } - } -out: - of_node_put(np); - return best; -} - /** * smp_startup_cpu() - start the given cpu * @@ -189,25 +142,16 @@ static inline int __devinit smp_startup_ int status; unsigned long start_here = __pa((u32)*((unsigned long *) pSeries_secondary_smp_init)); - unsigned int pcpu; + unsigned int pcpu = get_hard_smp_processor_id(lcpu); /* At boot time the cpus are already spinning in hold * loops, so nothing to do. */ if (system_state < SYSTEM_RUNNING) return 1; - pcpu = find_physical_cpu_to_start(get_hard_smp_processor_id(lcpu)); - if (pcpu == -1U) { - printk(KERN_INFO "No more cpus available, failing\n"); - return 0; - } - /* Fixup atomic count: it exited inside IRQ handler. */ paca[lcpu].__current->thread_info->preempt_count = 0; - /* At boot this is done in prom.c. */ - paca[lcpu].hw_cpu_id = pcpu; - status = rtas_call(rtas_token("start-cpu"), 3, 1, NULL, pcpu, start_here, lcpu); if (status != 0) { @@ -324,6 +268,116 @@ static struct smp_ops_t pSeries_xics_smp .setup_cpu = smp_xics_setup_cpu, }; +/* + * Update cpu_present_map and paca for a new cpu node. Would like to + * move parts of this to generic code so that hotplug events are + * generated for each new cpu, but this is needed for now. 
+ */ +static int pSeries_add_processor(struct device_node *node) +{ + unsigned int cpu; + cpumask_t candidate_map, tmp = CPU_MASK_NONE; + int err = 0, len, nthreads, i; + u32 *intserv; + + intserv = (u32 *)get_property(node, "ibm,ppc-interrupt-server#s", + &len); + if (!intserv) + goto out; + nthreads = len / sizeof(u32); + for (i = 0; i < nthreads; i ++) + cpu_set(i, tmp); + + lock_cpu_hotplug(); + + cpus_xor(candidate_map, cpu_possible_map, cpu_present_map); + err = -EINVAL; + if (cpus_empty(candidate_map)) + goto out_unlock; + + while (!cpus_empty(tmp)) + if (cpus_subset(tmp, candidate_map)) + /* Found a range where we can insert the new cpu(s) */ + break; + else + cpus_shift_left(tmp, tmp, nthreads); + + if (cpus_empty(tmp)) { + printk(KERN_INFO "Unable to find space in cpu_present_map for" + " processor %s with %d thread(s)\n", node->name, + nthreads); + goto out_unlock; + } + + for_each_cpu_mask(cpu, tmp) { + BUG_ON(cpu_isset(cpu, cpu_present_map)); + cpu_set(cpu, cpu_present_map); + set_hard_smp_processor_id(cpu, *intserv++); + } + err = 0; +out_unlock: + unlock_cpu_hotplug(); +out: + return err; +} + +/* + * Update present map for a cpu node which is going away, and set the + * "hard" id in the paca(s) to -1 to be consistent with boot time + * convention for non-present cpus. 
+ */ +static int pSeries_remove_processor(struct device_node *node) +{ + unsigned int cpu; + int len, nthreads, i; + u32 *intserv = (u32 *)get_property(node, "ibm,ppc-interrupt-server#s", + &len); + if (!intserv) + return 0; + + nthreads = len / sizeof(u32); + + lock_cpu_hotplug(); + for (i = 0; i < nthreads; i++) { + for_each_present_cpu(cpu) { + if (get_hard_smp_processor_id(cpu) == intserv[i]) { + BUG_ON(cpu_online(cpu)); + cpu_clear(cpu, cpu_present_map); + set_hard_smp_processor_id(cpu, -1); + break; + } + } + if (cpu == NR_CPUS) + printk(KERN_WARNING "Could not find cpu to remove " + "with physical id 0x%x\n", intserv[i]); + } + unlock_cpu_hotplug(); + return 0; +} + +static int pSeries_smp_notifier(struct notifier_block *nb, unsigned long action, void *_node) +{ + struct device_node *node = _node; + int err = NOTIFY_OK; + + switch (action) { + case OF_RECONFIG_ADD: + if (pSeries_add_processor(node)) + err = NOTIFY_BAD; + break; + case OF_RECONFIG_REMOVE: + if (pSeries_remove_processor(node)) + err = NOTIFY_BAD; + default: + err = NOTIFY_DONE; + } + return err; +} + +static struct notifier_block pSeries_smp_nb = { + .notifier_call = pSeries_smp_notifier, +}; + /* This is called very early */ void __init smp_init_pSeries(void) { @@ -362,6 +416,9 @@ void __init smp_init_pSeries(void) smp_ops->take_timebase = pSeries_take_timebase; } + if (systemcfg->platform == PLATFORM_PSERIES_LPAR) + register_of_reconfig_notifier(&pSeries_smp_nb); + DBG(" <- smp_init_pSeries()\n"); } diff -puN arch/ppc64/kernel/smp.c~cpu-dlpar-notifier arch/ppc64/kernel/smp.c --- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/smp.c~cpu-dlpar-notifier 2005-01-25 22:57:15.000000000 -0600 +++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/smp.c 2005-01-25 22:57:15.000000000 -0600 @@ -526,14 +526,6 @@ void __init smp_cpus_done(unsigned int m smp_ops->setup_cpu(boot_cpuid); set_cpus_allowed(current, old_mask); - - /* - * We know at boot the maximum number of cpus we can add to - * a partition and set 
cpu_possible_map accordingly. cpu_present_map - * needs to match for the hotplug code to allow us to hot add - * any offline cpus. - */ - cpu_present_map = cpu_possible_map; } #ifdef CONFIG_HOTPLUG_CPU _ From dwmw2 at infradead.org Thu Jan 27 05:45:40 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Wed, 26 Jan 2005 18:45:40 +0000 Subject: Syscall auditing on ppc64 lacks correct return codes. Message-ID: <1106765140.19262.27.camel@hades.cambridge.redhat.com> We were pretending that every syscall returned zero. Don't do that. ===== arch/ppc64/kernel/entry.S 1.51 vs edited ===== --- 1.51/arch/ppc64/kernel/entry.S Thu Jan 13 09:48:36 2005 +++ edited/arch/ppc64/kernel/entry.S Thu Jan 20 16:14:50 2005 @@ -231,6 +231,7 @@ syscall_exit_trace: std r3,GPR3(r1) bl .save_nvgprs + addi r3,r1,STACK_FRAME_OVERHEAD bl .do_syscall_trace_leave REST_NVGPRS(r1) ld r3,GPR3(r1) @@ -324,6 +325,7 @@ ld r4,TI_FLAGS(r4) andi. r4,r4,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP) beq+ 81f + addi r3,r1,STACK_FRAME_OVERHEAD bl .do_syscall_trace_leave 81: b .ret_from_except ===== arch/ppc64/kernel/ptrace.c 1.13 vs edited ===== --- 1.13/arch/ppc64/kernel/ptrace.c Fri Dec 17 08:09:09 2004 +++ edited/arch/ppc64/kernel/ptrace.c Thu Jan 20 16:24:12 2005 @@ -313,10 +313,10 @@ do_syscall_trace(); } -void do_syscall_trace_leave(void) +void do_syscall_trace_leave(struct pt_regs *regs) { if (unlikely(current->audit_context)) - audit_syscall_exit(current, 0); /* FIXME: pass pt_regs */ + audit_syscall_exit(current, regs->result); if ((test_thread_flag(TIF_SYSCALL_TRACE) || test_thread_flag(TIF_SINGLESTEP)) -- dwmw2 From paulus at samba.org Thu Jan 27 13:27:01 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 27 Jan 2005 13:27:01 +1100 Subject: [PATCH] show -1 for physical_id of non-present cpus In-Reply-To: <41F7187F.9070602@austin.ibm.com> References: <1106698921.9091.4.camel@pants.austin.ibm.com> <41F7187F.9070602@austin.ibm.com> Message-ID: <16888.20853.816824.41795@cargo.ozlabs.ibm.com> Olof 
Johansson writes: > Nathan Lynch wrote: > > >Make the physical_id cpu attribute on ppc64 show -1 instead of 65535 > >for non-present cpus. > > > > Good catch. > > I'm not sure if I prefer your patch or just switching hw_cpu_id to a s16 > and using %d. Either way is fine with me. Changing hw_cpu_id to a signed quantity sounds cleaner to me. Paul. From dhowells at redhat.com Fri Jan 28 01:02:42 2005 From: dhowells at redhat.com (David Howells) Date: Thu, 27 Jan 2005 14:02:42 +0000 Subject: [PATCH] Fix kallsyms/insmod/rmmod race In-Reply-To: <1561.1106077468@redhat.com> References: <1561.1106077468@redhat.com> <1106014803.30801.22.camel@localhost.localdomain> <31453.1105979239@redhat.com> Message-ID: <3244.1106834562@redhat.com> David Howells wrote: > Rusty Russell wrote: > > > The more I looked at this, the more I warmed to it. I've known for a > > while that people are using kallsyms not for OOPS (eg. /proc/$$/wchan), > > so we should provide a "grabs locks" version, but this solution gets > > around that nicely, while making life more certain for the oops case, > > too. > > Hmmm... though it works on i386 SMP, it doesn't, however, seem to work on > ppc64 SMP:-/ > > My pSeries box seems to think that it can't find any symbols from previously > loaded modules, and my Power5 box is quite happy to load modules that depend > on other modules but panics because it can't mount its root fs. Turns out that the patch works. Userspace was being bad. The stripped down shell running as init (pid #1) wasn't taking into account that it would get notification of kernel threads exiting when it called wait(), and so ended up trying to load several modules at once, some of which required dependency modules loading first. 
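The userspace bug described above comes down to wait(), which reaps whichever child exits first. For process 1 that can include children it never started itself (in this case, exiting kernel threads), so the shell concluded a module load had finished when it had not. A minimal sketch of the usual fix, assuming a simple shell that runs one command at a time: wait for the specific pid with waitpid(). Here a child that just calls _exit(code) stands in for a command such as insmod; the helper names are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that immediately exits with `code` (a stand-in for a
 * command such as insmod).  Returns the child's pid, or -1 on error. */
static pid_t spawn_child(int code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(code & 0xff);
    return pid;
}

/* Wait for one specific child, retrying if interrupted by a signal.
 * Unlike wait(), this never reaps some *other* child that happens to
 * exit first (for init, that includes reparented processes). */
static int wait_for_child(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Run one "command" to completion; return its exit status or -1. */
static int run_and_wait(int code)
{
    pid_t pid = spawn_child(code);

    return pid < 0 ? -1 : wait_for_child(pid);
}
```

With this structure, an unrelated child exiting while a command runs is simply left for a later reap instead of being mistaken for the command's completion.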
David From dhowells at redhat.com Fri Jan 28 01:08:07 2005 From: dhowells at redhat.com (David Howells) Date: Thu, 27 Jan 2005 14:08:07 +0000 Subject: [PATCH] Fix kallsyms/insmod/rmmod race [try #2] In-Reply-To: <31453.1105979239@redhat.com> References: <31453.1105979239@redhat.com> Message-ID: <3880.1106834887@redhat.com> The attached patch fixes a race between kallsyms and insmod/rmmod. The problem is this: (1) The various kallsyms functions poke around in the module list without any locking so that they can be called from the oops handler. (2) Although insmod and rmmod use locks to exclude each other, these have no effect on the kallsyms functions. (3) Although rmmod modifies the module state with the machine "stopped", it hasn't removed the metadata from the module metadata list, meaning that as soon as the machine is "restarted", the metadata can be observed by kallsyms. It's not possible to say that an item in that list should be ignored if its state is marked as inactive - you can't get at the state information because you can't trust the metadata in which it is embedded. Furthermore, list linkage information is embedded in the metadata too, so you can't trust that either... (4) kallsyms may be walking the module list without a lock whilst either insmod or rmmod are busy changing it. insmod probably isn't a problem since nothing is going away, but rmmod is, as it's deleting an entry. (5) Therefore nothing that uses these functions can in any way trust any pointers to "static" data (such as module symbol names or module names) that are returned. (6) On ppc64 the problems are exacerbated since the hypervisor may reschedule bits of the kernel, making operations that appear adjacent occur a long time apart. This patch fixes the race by only linking/unlinking modules into/from the master module list with the machine in the "stopped" state.
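Point (5) is also why the patched kallsyms_lookup() in the diff copies a module symbol's name into the caller-supplied namebuf (via strncpy() bounded by KSYM_NAME_LEN) rather than returning a pointer into module memory. A small user-space sketch of that copy-out pattern follows; NAME_LEN and copy_out_name() are hypothetical stand-ins, and the explicit terminator is a choice made in this sketch because strncpy() does not NUL-terminate when the source is at least as long as the limit.

```c
#include <stddef.h>
#include <string.h>

#define NAME_LEN 127   /* stand-in for KSYM_NAME_LEN; value assumed */

/* Copy a symbol name that lives in memory we do not own (for example a
 * module that may be unloaded) into the caller's buffer, so the returned
 * pointer stays valid afterwards.  namebuf must hold NAME_LEN + 1 bytes;
 * the copy is always NUL-terminated. */
static const char *copy_out_name(const char *sym, char namebuf[NAME_LEN + 1])
{
    if (!sym)
        return NULL;
    strncpy(namebuf, sym, NAME_LEN);
    namebuf[NAME_LEN] = '\0';   /* strncpy may leave it unterminated */
    return namebuf;
}
```

The caller ends up holding only its own buffer, so freeing or reusing the module memory later cannot invalidate the name it was given.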
This means that any "static" information can be trusted as far as the next kernel reschedule on any given CPU without the need to hold any locks. However, I'm not sure how this is affected by preemption. I suspect more work may need to be done in that case, but I'm not entirely sure. This also means that rmmod has to bump the machine into the stopped state twice... but since that shouldn't be a common operation, I don't think that's a problem. I've amended this patch to not get spinlocks whilst in the machine locked state - there's no point as nothing else can be holding spinlocks. Signed-Off-By: David Howells --- warthog>diffstat kallsyms-race-2611rc1.diff kallsyms.c | 16 ++++++++++++++-- module.c | 31 ++++++++++++++++++++++++------- 2 files changed, 38 insertions(+), 9 deletions(-) diff -uNrp linux-2.6.11-rc1/kernel/kallsyms.c linux-2.6.11-rc1-kallsyms/kernel/kallsyms.c --- linux-2.6.11-rc1/kernel/kallsyms.c 2005-01-12 19:09:18.000000000 +0000 +++ linux-2.6.11-rc1-kallsyms/kernel/kallsyms.c 2005-01-17 15:33:55.000000000 +0000 @@ -139,13 +139,20 @@ unsigned long kallsyms_lookup_name(const return module_kallsyms_lookup_name(name); } -/* Lookup an address. modname is set to NULL if it's in the kernel. */ +/* + * Lookup an address + * - modname is set to NULL if it's in the kernel + * - we guarantee that the returned name is valid until we reschedule even if + * it resides in a module + * - we also guarantee that modname will be valid until rescheduled + */ const char *kallsyms_lookup(unsigned long addr, unsigned long *symbolsize, unsigned long *offset, char **modname, char *namebuf) { unsigned long i, low, high, mid; + const char *msym; /* This kernel should never had been booted. 
*/ BUG_ON(!kallsyms_addresses); @@ -196,7 +203,12 @@ const char *kallsyms_lookup(unsigned lon return namebuf; } - return module_address_lookup(addr, symbolsize, offset, modname); + /* see if it's in a module */ + msym = module_address_lookup(addr, symbolsize, offset, modname); + if (msym) + return strncpy(namebuf, msym, KSYM_NAME_LEN); + + return NULL; } /* Replace "%s" in format with address, or returns -errno. */ diff -uNrp linux-2.6.11-rc1/kernel/module.c linux-2.6.11-rc1-kallsyms/kernel/module.c --- linux-2.6.11-rc1/kernel/module.c 2005-01-12 19:09:18.000000000 +0000 +++ linux-2.6.11-rc1-kallsyms/kernel/module.c 2005-01-27 14:06:22.857054758 +0000 @@ -1072,14 +1072,22 @@ static void mod_kobject_remove(struct mo kobject_unregister(&mod->mkobj.kobj); } +/* + * unlink the module with the whole machine is stopped with interrupts off + * - this defends against kallsyms not taking locks + */ +static inline int __unlink_module(void *_mod) +{ + struct module *mod = _mod; + list_del(&mod->list); + return 0; +} + /* Free a module, remove from lists, etc (must hold module mutex). */ static void free_module(struct module *mod) { /* Delete from various lists */ - spin_lock_irq(&modlist_lock); - list_del(&mod->list); - spin_unlock_irq(&modlist_lock); - + stop_machine_run(__unlink_module, mod, NR_CPUS); remove_sect_attrs(mod); mod_kobject_remove(mod); @@ -1732,6 +1740,17 @@ static struct module *load_module(void _ goto free_hdr; } +/* + * link the module with the whole machine is stopped with interrupts off + * - this defends against kallsyms not taking locks + */ +static inline int __link_module(void *_mod) +{ + struct module *mod = _mod; + list_add(&mod->list, &modules); + return 0; +} + /* This is where the real work happens */ asmlinkage long sys_init_module(void __user *umod, @@ -1766,9 +1785,7 @@ sys_init_module(void __user *umod, /* Now sew it into the lists. They won't access us, since strong_try_module_get() will fail. 
*/ - spin_lock_irq(&modlist_lock); - list_add(&mod->list, &modules); - spin_unlock_irq(&modlist_lock); + stop_machine_run(__link_module, mod, NR_CPUS); /* Drop lock so they can recurse */ up(&module_mutex); From moilanen at austin.ibm.com Fri Jan 28 03:24:04 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Thu, 27 Jan 2005 10:24:04 -0600 Subject: [PATCH] iSeries buildbreak fix Message-ID: <20050127102404.07b57cd4.moilanen@austin.ibm.com> Looks like a build break on iSeries after the xmon-dabr patch: arch/ppc64/xmon/xmon.c:632: undefined reference to `.plpar_hcall_norets' Since iSeries cannot use xmon, a simple fix is to turn it off. Jake Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/Kconfig.debug~xmon-off-iSeries arch/ppc64/Kconfig.debug --- linux-2.6-bk/arch/ppc64/Kconfig.debug~xmon-off-iSeries Thu Jan 27 10:15:00 2005 +++ linux-2.6-bk-moilanen/arch/ppc64/Kconfig.debug Thu Jan 27 10:16:23 2005 @@ -34,7 +34,7 @@ config DEBUGGER config XMON bool "Include xmon kernel debugger" - depends on DEBUGGER + depends on DEBUGGER && !PPC_ISERIES help Include in-kernel hooks for the xmon kernel monitor/debugger. Unless you are intending to debug the kernel, say N here. _ From nathanl at austin.ibm.com Fri Jan 28 08:26:01 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Thu, 27 Jan 2005 15:26:01 -0600 Subject: [PATCH] show -1 for physical_id of non-present cpus In-Reply-To: <16888.20853.816824.41795@cargo.ozlabs.ibm.com> References: <1106698921.9091.4.camel@pants.austin.ibm.com> <41F7187F.9070602@austin.ibm.com> <16888.20853.816824.41795@cargo.ozlabs.ibm.com> Message-ID: <1106861161.8962.7.camel@pants.austin.ibm.com> On Thu, 2005-01-27 at 13:27 +1100, Paul Mackerras wrote: > Olof Johansson writes: > > > Nathan Lynch wrote: > > > > >Make the physical_id cpu attribute on ppc64 show -1 instead of 65535 > > >for non-present cpus. > > > > > > > Good catch. > > > > I'm not sure if I prefer your patch or just switching hw_cpu_id to a s16 > > and using %d. 
Either way is fine with me. > > Changing hw_cpu_id to a signed quantity sounds cleaner to me. OK. Make the physical_id cpu sysfs attribute on ppc64 show -1 instead of 65535 for non-present cpus. Signed-off-by: Nathan Lynch --- diff -puN arch/ppc64/kernel/sysfs.c~make-cpu-physical_id-signed arch/ppc64/kernel/sysfs.c --- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/sysfs.c~make-cpu-physical_id-signed 2005-01-27 15:03:16.000000000 -0600 +++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/sysfs.c 2005-01-27 15:05:12.000000000 -0600 @@ -387,7 +387,7 @@ static ssize_t show_physical_id(struct s { struct cpu *cpu = container_of(dev, struct cpu, sysdev); - return sprintf(buf, "%u\n", get_hard_smp_processor_id(cpu->sysdev.id)); + return sprintf(buf, "%d\n", get_hard_smp_processor_id(cpu->sysdev.id)); } static SYSDEV_ATTR(physical_id, 0444, show_physical_id, NULL); diff -puN include/asm-ppc64/paca.h~make-cpu-physical_id-signed include/asm-ppc64/paca.h --- linux-2.6.11-rc2-mm1/include/asm-ppc64/paca.h~make-cpu-physical_id-signed 2005-01-27 15:04:14.000000000 -0600 +++ linux-2.6.11-rc2-mm1-nathanl/include/asm-ppc64/paca.h 2005-01-27 15:04:51.000000000 -0600 @@ -68,7 +68,7 @@ struct paca_struct { u64 stab_real; /* Absolute address of segment table */ u64 stab_addr; /* Virtual address of segment table */ void *emergency_sp; /* pointer to emergency stack */ - u16 hw_cpu_id; /* Physical processor number */ + s16 hw_cpu_id; /* Physical processor number */ u8 cpu_start; /* At startup, processor spins until */ /* this becomes non-zero. 
*/ _ From nathanl at austin.ibm.com Fri Jan 28 09:23:45 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Thu, 27 Jan 2005 16:23:45 -0600 Subject: [PATCH] use _smp_processor_id() in idle loops Message-ID: <1106864625.8962.11.camel@pants.austin.ibm.com> With 2.6.11-rc2-mm1 and 2.6-bk kernels with CONFIG_DEBUG_PREEMPT I'm seeing lots of smp_processor_id warnings from the idle loops: BUG: using smp_processor_id() in preemptible [00000001] code: swapper/0 caller is .dedicated_idle+0x64/0x228 Call Trace: [c0000000004a3c50] [ffffffffffffffff] 0xffffffffffffffff (unreliable) [c0000000004a3cd0] [c0000000001d179c] .smp_processor_id+0x154/0x168 [c0000000004a3d90] [c00000000000f990] .dedicated_idle+0x64/0x228 [c0000000004a3e80] [c00000000000fce0] .cpu_idle+0x34/0x4c [c0000000004a3f00] [c00000000003a908] .start_secondary+0x10c/0x150 [c0000000004a3f90] [c00000000000bd28] .enable_64b_mode+0x0/0x28 This patch replaces smp_processor_id() with _smp_processor_id() in the idle loop code, since we know the idle thread can't jump to a different cpu. 
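The warning comes from the CONFIG_DEBUG_PREEMPT build of smp_processor_id(), which complains when it cannot see why the caller would stay on the same CPU between reading the CPU number and using it. Below is a simplified user-space model of that kind of check; the struct, its fields and the exact list of exemptions are assumptions for this sketch, not the kernel's implementation. It shows how an idle loop running with preemption enabled, interrupts on and no visible single-CPU binding trips the check even though, as the patch notes, it never migrates. That false positive is what calling the unchecked _smp_processor_id() avoids.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the state such a check consults. */
struct task_model {
    int preempt_count;          /* > 0 means preemption is disabled */
    bool irqs_disabled;         /* local interrupts off */
    unsigned long cpus_allowed; /* bitmask of CPUs the task may run on */
};

/* Returns true if using the CPU id "looks safe" to the model, i.e. the
 * caller visibly cannot move to another CPU.  Returning false is where a
 * debug build in this model would warn. */
static bool cpu_id_use_looks_safe(const struct task_model *t, int this_cpu)
{
    if (t->preempt_count > 0)
        return true;                        /* cannot be preempted */
    if (t->irqs_disabled)
        return true;                        /* cannot be rescheduled */
    if (t->cpus_allowed == (1UL << this_cpu))
        return true;                        /* pinned to this CPU only */
    return false;
}
```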
Signed-off-by: Nathan Lynch --- diff -puN arch/ppc64/kernel/idle.c~kill-idle-loop-smp_processor_id-warnings arch/ppc64/kernel/idle.c --- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/idle.c~kill-idle-loop-smp_processor_id-warnings 2005-01-27 16:14:31.000000000 -0600 +++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/idle.c 2005-01-27 16:14:31.000000000 -0600 @@ -122,7 +122,7 @@ static int iSeries_idle(void) static int default_idle(void) { long oldval; - unsigned int cpu = smp_processor_id(); + unsigned int cpu = _smp_processor_id(); while (1) { oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED); @@ -164,7 +164,7 @@ int dedicated_idle(void) struct paca_struct *lpaca = get_paca(), *ppaca; unsigned long start_snooze; unsigned long *smt_snooze_delay = &__get_cpu_var(smt_snooze_delay); - unsigned int cpu = smp_processor_id(); + unsigned int cpu = _smp_processor_id(); ppaca = &paca[cpu ^ 1]; @@ -244,7 +244,7 @@ int dedicated_idle(void) static int shared_idle(void) { struct paca_struct *lpaca = get_paca(); - unsigned int cpu = smp_processor_id(); + unsigned int cpu = _smp_processor_id(); while (1) { /* @@ -275,8 +275,7 @@ static int shared_idle(void) HMT_medium(); lpaca->lppaca.idle = 0; schedule(); - if (cpu_is_offline(smp_processor_id()) && - system_state == SYSTEM_RUNNING) + if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING) cpu_die(); } _ From nathanl at austin.ibm.com Fri Jan 28 10:07:54 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Thu, 27 Jan 2005 17:07:54 -0600 Subject: [RFC/PATCH 2/2] handle cpu device node addition and removal In-Reply-To: <1106716265.9855.26.camel@localhost.localdomain> References: <1106715991.9855.22.camel@localhost.localdomain> <1106716265.9855.26.camel@localhost.localdomain> Message-ID: <1106867274.8962.14.camel@pants.austin.ibm.com> On Tue, 2005-01-25 at 23:11 -0600, Nathan Lynch wrote: > Using the notifier chain in a previous patch, handle addition and > removal of processors on pSeries LPAR. 
The new notifier call updates > cpu_present_map and sets hw_cpu_id in the paca appropriately. Note > that we must handle more than one cpu being added or going away to > account for SMT processors. > > This allows us to stop abusing cpu_present_map, and lets us get rid of > find_physical_cpu_to_start, which has always been a bit dodgy. > > The code which updates cpu_present_map I plan to move to the generic > hotplug cpu code someday, but I think this is a good intermediate > step for now. > > Tested on Power5. Hmm, just noticed that this does not allow us to online secondary threads when booting with smt-enabled=off. Will need to respin this one. Nathan From rusty at rustcorp.com.au Fri Jan 28 11:42:02 2005 From: rusty at rustcorp.com.au (Rusty Russell) Date: Fri, 28 Jan 2005 11:42:02 +1100 Subject: [PATCH] Fix kallsyms/insmod/rmmod race [try #2] In-Reply-To: <3880.1106834887@redhat.com> References: <31453.1105979239@redhat.com> <3880.1106834887@redhat.com> Message-ID: <1106872922.18360.9.camel@localhost.localdomain> On Thu, 2005-01-27 at 14:08 +0000, David Howells wrote: > Signed-Off-By: David Howells Excellent. Thanks David! Rusty. -- A bad analogy is like a leaky screwdriver -- Richard Braakman From brking at us.ibm.com Sat Jan 29 01:56:17 2005 From: brking at us.ibm.com (brking at us.ibm.com) Date: Fri, 28 Jan 2005 08:56:17 -0600 Subject: [PATCH 1/2] pci: Arch hook to determine config space size Message-ID: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> When working with a PCI-X Mode 2 adapter on a PCI-X Mode 1 PPC64 system, the current code used to determine the config space size of a device results in a PCI Master abort and an EEH error, resulting in the device being taken offline. This patch adds the ability for arch specific code to override part of the config space size determination to fix this. 
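The probe.c hunk that follows places the new hook in front of the config read at offset 256, ahead of the existing check for an all-ones result. A user-space sketch of that decision flow, under stated assumptions: ext_capable is a hypothetical stand-in for the capability checks earlier in pci_cfg_space_size() that the hunk does not show, and the device model and helper names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_SPACE_SIZE      256   /* conventional PCI config space */
#define CFG_SPACE_EXP_SIZE  4096  /* extended config space */

/* Hypothetical device model for this sketch. */
struct dev_model {
    bool ext_capable;       /* device claims extended config space */
    bool platform_allows;   /* what the arch hook would answer */
    uint32_t dword_at_256;  /* value a config read at offset 256 returns */
    int reads_at_256;       /* how often offset 256 was actually probed */
};

/* Stand-in for the arch hook: may offset 256 be probed at all? */
static int arch_exp_cfg_space(struct dev_model *d)
{
    return d->platform_allows;
}

static int cfg_space_size(struct dev_model *d)
{
    uint32_t status;

    if (!d->ext_capable)
        return CFG_SPACE_SIZE;
    /* Ask the platform first: on the failing system, the read below is
     * what caused the master abort and EEH error. */
    if (!arch_exp_cfg_space(d))
        return CFG_SPACE_SIZE;
    d->reads_at_256++;
    status = d->dword_at_256;
    if (status == 0xffffffffu)   /* all ones: nothing answered */
        return CFG_SPACE_SIZE;
    return CFG_SPACE_EXP_SIZE;
}
```

The property being tested is that a platform answering "no" never reaches the offset-256 read at all.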
Signed-off-by: Brian King --- linux-2.6.11-rc2-bk5-bjking1/drivers/pci/probe.c | 4 ++++ 1 files changed, 4 insertions(+) diff -puN drivers/pci/probe.c~pci_arch_cfg_space_size drivers/pci/probe.c --- linux-2.6.11-rc2-bk5/drivers/pci/probe.c~pci_arch_cfg_space_size 2005-01-27 16:56:46.000000000 -0600 +++ linux-2.6.11-rc2-bk5-bjking1/drivers/pci/probe.c 2005-01-27 16:56:46.000000000 -0600 @@ -627,6 +627,8 @@ static void pci_release_dev(struct devic kfree(pci_dev); } +int __attribute__ ((weak)) pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; } + /** * pci_cfg_space_size - get the configuration space size of the PCI device. * @@ -653,6 +655,8 @@ static int pci_cfg_space_size(struct pci goto fail; } + if (!pcibios_exp_cfg_space(dev)) + goto fail; if (pci_read_config_dword(dev, 256, &status) != PCIBIOS_SUCCESSFUL) goto fail; if (status == 0xffffffff) _ From brking at us.ibm.com Sat Jan 29 01:56:24 2005 From: brking at us.ibm.com (brking at us.ibm.com) Date: Fri, 28 Jan 2005 08:56:24 -0600 Subject: [PATCH 2/2] ppc64: Arch hook to determine config space size Message-ID: <200501281456.j0SEuPRF017696@d01av04.pok.ibm.com> When working with a PCI-X Mode 2 adapter on a PCI-X Mode 1 PPC64 system, the current code used to determine the config space size of a device results in a PCI Master abort and an EEH error, resulting in the device being taken offline. This patch adds a ppc64 override to query OF to determine if the system and PHB support PCI-X mode 2. 
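The ppc64 override in the diff allows extended config space only when the PHB's device node carries "ibm,pci-config-space-type" with value 1, and answers no when there is no host bridge or no such property. The sketch below mirrors that rule against a hypothetical flat property table. The kernel code uses get_property() on the PHB's device_node instead, and storing native ints here is a simplification of device-tree cells.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of a PHB's device-tree properties. */
struct prop_model {
    const char *name;
    int value;
};

static const int *find_int_prop(const struct prop_model *props, size_t n,
                                const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(props[i].name, name) == 0)
            return &props[i].value;
    return NULL;
}

/* Same rule as the override: 1 only if the property exists and equals 1;
 * a missing PHB (props == NULL) or missing property means 0. */
static int phb_allows_exp_cfg_space(const struct prop_model *props, size_t n)
{
    const int *type;

    if (!props)
        return 0;
    type = find_int_prop(props, n, "ibm,pci-config-space-type");
    return (type && *type == 1) ? 1 : 0;
}
```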
Signed-off-by: Brian King --- linux-2.6.11-rc2-bk5-bjking1/arch/ppc64/kernel/pSeries_pci.c | 18 +++++++++++ 1 files changed, 18 insertions(+) diff -puN arch/ppc64/kernel/pSeries_pci.c~ppc64_arch_cfg_space_size arch/ppc64/kernel/pSeries_pci.c --- linux-2.6.11-rc2-bk5/arch/ppc64/kernel/pSeries_pci.c~ppc64_arch_cfg_space_size 2005-01-27 16:57:03.000000000 -0600 +++ linux-2.6.11-rc2-bk5-bjking1/arch/ppc64/kernel/pSeries_pci.c 2005-01-27 16:57:48.000000000 -0600 @@ -583,3 +583,21 @@ static void fixup_winbond_82c105(struct } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105, fixup_winbond_82c105); + +int pcibios_exp_cfg_space(struct pci_dev *dev) +{ + int *type; + struct device_node *dn; + struct pci_controller *hose = pci_bus_to_host(dev->bus); + + if (!hose) + return 0; + + dn = (struct device_node *) hose->arch_data; + type = (int *)get_property(dn, "ibm,pci-config-space-type", NULL); + + if (type && *type == 1) + return 1; + + return 0; +} _ From hch at infradead.org Sat Jan 29 05:52:34 2005 From: hch at infradead.org (Christoph Hellwig) Date: Fri, 28 Jan 2005 18:52:34 +0000 Subject: [PATCH 1/2] pci: Arch hook to determine config space size In-Reply-To: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> Message-ID: <20050128185234.GB21760@infradead.org> > +int __attribute__ ((weak)) pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; } - prototypes belong to headers - weak linkage is the perfect way for total obfuscation please make this a regular arch hook > Please read the FAQ at http://www.tux.org/lkml/ ---end quoted text--- From olof at austin.ibm.com Sat Jan 29 07:09:01 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Fri, 28 Jan 2005 14:09:01 -0600 Subject: [PATCH] PPC64: p615 IOMMU fix Message-ID: <20050128200901.GA8615@austin.ibm.com> Hi, pSeries p615 happens to have a bus hierarchy where the IDE controller for the built-in CD is connected directly to the PHB
without an intermediate EADS bridge. The new iommu/bus setup code assumed that all systems with EADS will have all devices under them, so this resulted in the IDE controller not having an iommu table allocated. To avoid this, always allocate a small table at the PHB level. It will never be used for regular devices, and it's allocated out of the 256MB that we previously skipped. Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c | 34 ++++++++++++++++------- 1 files changed, 25 insertions(+), 9 deletions(-) diff -puN arch/ppc64/kernel/pSeries_iommu.c~p615-iommu arch/ppc64/kernel/pSeries_iommu.c --- linux-2.5/arch/ppc64/kernel/pSeries_iommu.c~p615-iommu 2005-01-28 14:04:48.971761000 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_iommu.c 2005-01-28 14:04:48.984759024 -0600 @@ -309,6 +309,7 @@ static void iommu_table_setparms_lpar(st static void iommu_bus_setup_pSeries(struct pci_bus *bus) { struct device_node *dn, *pdn; + struct iommu_table *tbl; DBG("iommu_bus_setup_pSeries, bus %p, bus->self %p\n", bus, bus->self); @@ -326,7 +327,6 @@ static void iommu_bus_setup_pSeries(stru if (!bus->self) { /* Root bus */ if (is_python(dn)) { - struct iommu_table *tbl; unsigned int *iohole; DBG("Python root bus %s\n", bus->name); @@ -352,19 +352,35 @@ static void iommu_bus_setup_pSeries(stru iommu_table_setparms(dn->phb, dn, tbl); dn->iommu_table = iommu_init_table(tbl); } else { - /* 256 MB window by default */ - dn->phb->dma_window_size = 1 << 28; - /* always skip the first 256MB */ - dn->phb->dma_window_base_cur = 1 << 28; + /* Do a 128MB table at root. This is used for the IDE + * controller on some SMP-mode POWER4 machines. It + * doesn't hurt to allocate it on other machines + * -- it'll just be unused since new tables are + * allocated on the EADS level. + * + * Allocate at offset 128MB to avoid having to deal + * with ISA holes; 128MB table for IDE is plenty. 
+ */ + dn->phb->dma_window_size = 1 << 27; + dn->phb->dma_window_base_cur = 1 << 27; + + tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL); - /* No table at PHB level for non-python PHBs */ + iommu_table_setparms(dn->phb, dn, tbl); + dn->iommu_table = iommu_init_table(tbl); + + /* All child buses have 256MB tables */ + dn->phb->dma_window_size = 1 << 28; } } else { pdn = pci_bus_to_OF_node(bus->parent); - if (!pdn->iommu_table) { + if (!bus->parent->self && !is_python(pdn)) { struct iommu_table *tbl; - /* First child, allocate new table (256MB window) */ + /* First child and not python means this is the EADS + * level. Allocate new table for this slot with 256MB + * window. + */ tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL); @@ -372,7 +388,7 @@ static void iommu_bus_setup_pSeries(stru dn->iommu_table = iommu_init_table(tbl); } else { - /* Lower than first child or under python, copy parent table */ + /* Lower than first child or under python, use parent table */ dn->iommu_table = pdn->iommu_table; } } _ From mvolaski at aecom.yu.edu Sat Jan 29 12:37:21 2005 From: mvolaski at aecom.yu.edu (Maurice Volaski) Date: Fri, 28 Jan 2005 20:37:21 -0500 Subject: CONFIG_THERM_PM72 is missing from .config from recent kernels (2.6.10, 2.6.11) Message-ID: CONFIG_THERM_PM72 is required for thermal management in at least Macs, most notably the PowerMac G5. Without it, the computer will run its fans at the max and is very loud. It's missing from .config in at least a few releases of recent kernels (2.6.10, 2.6.11). Does anyone know why? -- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University From mvolaski at aecom.yu.edu Sat Jan 29 12:26:08 2005 From: mvolaski at aecom.yu.edu (Maurice Volaski) Date: Fri, 28 Jan 2005 20:26:08 -0500 Subject: Recent kernels may freeze dual 2.5 GHz PowerMac G5s Message-ID: Posted here FYI... >The patch below works. Thanks. 
> >>Maurice Volaski writes: >> > I am running Gentoo with a fresh 2.6.11-r1. I have all the kernel >> > debugging options turned on. Occasionally, I can get past the boot >> > process, but half the time it freezes somewhere along the way. If >> > not, I do get to boot, it doesn't take very long for it to freeze. >> >>Did 2.6.10 work Ok? Try the patch below, it fixes 2.6.11-rc1 boot >>lockups on both my Beige G3 (locks up in ADB driver) and my G4 eMac >>(locks up in radeonfb). >> >>--- linux-2.6.11-rc1/init/main.c.~1~ 2005-01-15 03:30:25.000000000 +0100 >>+++ linux-2.6.11-rc1/init/main.c 2005-01-15 03:31:44.000000000 +0100 >>@@ -377,7 +377,7 @@ static void noinline rest_init(void) >> * Re-enable preemption but disable interrupts to make sure >> * we dont get preempted until we schedule() in cpu_idle(). >> */ >>- local_irq_disable(); >>+// local_irq_disable(); >> preempt_enable_no_resched(); >> unlock_kernel(); >> cpu_idle(); -- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University -- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University From greg at kroah.com Sat Jan 29 15:06:47 2005 From: greg at kroah.com (Greg KH) Date: Fri, 28 Jan 2005 20:06:47 -0800 Subject: [PATCH 1/2] pci: Arch hook to determine config space size In-Reply-To: <20050128185234.GB21760@infradead.org> References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050128185234.GB21760@infradead.org> Message-ID: <20050129040647.GA6261@kroah.com> On Fri, Jan 28, 2005 at 06:52:34PM +0000, Christoph Hellwig wrote: > > +int __attribute__ ((weak)) pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; } > > - prototypes belong to headers > - weak linkage is the perfect way for total obsfucation > > please make this a regular arch hook I agree. Also, when sending PCI related patches, please cc the linux-pci mailing list. 
thanks, greg k-h From nathanl at austin.ibm.com Sun Jan 30 09:24:19 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Sat, 29 Jan 2005 16:24:19 -0600 Subject: [PATCH] use _smp_processor_id() in idle loops In-Reply-To: <1106864625.8962.11.camel@pants.austin.ibm.com> References: <1106864625.8962.11.camel@pants.austin.ibm.com> Message-ID: <1107037459.31457.4.camel@biclops> On Thu, 2005-01-27 at 16:23 -0600, Nathan Lynch wrote: > With 2.6.11-rc2-mm1 and 2.6-bk kernels with CONFIG_DEBUG_PREEMPT I'm > seeing lots of smp_processor_id warnings from the idle loops: > > BUG: using smp_processor_id() in preemptible [00000001] code: > swapper/0 > caller is .dedicated_idle+0x64/0x228 > Call Trace: > [c0000000004a3c50] [ffffffffffffffff] 0xffffffffffffffff (unreliable) > [c0000000004a3cd0] [c0000000001d179c] .smp_processor_id+0x154/0x168 > [c0000000004a3d90] [c00000000000f990] .dedicated_idle+0x64/0x228 > [c0000000004a3e80] [c00000000000fce0] .cpu_idle+0x34/0x4c > [c0000000004a3f00] [c00000000003a908] .start_secondary+0x10c/0x150 > [c0000000004a3f90] [c00000000000bd28] .enable_64b_mode+0x0/0x28 This appears to be fixed in 2.6.11-rc2-mm2, so I guess my patch isn't necessary now. Nathan From anton at samba.org Sun Jan 30 09:49:03 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 30 Jan 2005 09:49:03 +1100 Subject: [PATCH] use _smp_processor_id() in idle loops In-Reply-To: <1107037459.31457.4.camel@biclops> References: <1106864625.8962.11.camel@pants.austin.ibm.com> <1107037459.31457.4.camel@biclops> Message-ID: <20050129224903.GD8654@krispykreme.ozlabs.ibm.com> > This appears to be fixed in 2.6.11-rc2-mm2, so I guess my patch isn't > necessary now. FYI I saw some warnings in kprobes when it was called out of a pagefault. From memory it was kprobe_running(). 
Anton From mvolaski at aecom.yu.edu Sun Jan 30 10:41:19 2005 From: mvolaski at aecom.yu.edu (Maurice Volaski) Date: Sat, 29 Jan 2005 18:41:19 -0500 Subject: [gentoo-ppc-dev] CONFIG_THERM_PM72 is missing from .config from recent kernels (2.6.10, 2.6.11) In-Reply-To: <20050129103057.GA27803@hansmi.ch> References: <20050129103057.GA27803@hansmi.ch> Message-ID: >Hello Maurice > >> It's missing from .config in at least a few releases of recent >> kernels (2.6.10, 2.6.11). > >Definitly not true, at least for ppc32. Note that.. 1) I looked only at official kernel source code and 2) I looked only at a few releases, not every patchset. and 3) I looked only at the resulting .config file after preparing it with make menuconfig. >Linux g5 2.6.10-gentoo-r6-g5 #6 SMP Wed Jan 26 23:05:05 CET 2005 ppc >PPC970, altivec supported PowerMac7,2 GNU/Linux From what I can tell, the .config file is built up from different files. I just looked at gentoo-dev-sources for this version and it is, in fact, present for ppc64 in /usr/src/linux-2.6.10-gentoo-r6/arch/ppc64/defconfig That suggests the mechanism that generates the .config files is not working right under certain circumstances related to the 64bit G5. -- Maurice Volaski, mvolaski at aecom.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University From benh at kernel.crashing.org Mon Jan 31 10:21:13 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 31 Jan 2005 10:21:13 +1100 Subject: [gentoo-ppc-dev] CONFIG_THERM_PM72 is missing from .config from recent kernels (2.6.10, 2.6.11) In-Reply-To: References: <20050129103057.GA27803@hansmi.ch> Message-ID: <1107127273.5713.13.camel@gaston> On Sat, 2005-01-29 at 18:41 -0500, Maurice Volaski wrote: > From what I can tell, the .config file is built up from different > files. 
I just looked at gentoo-dev-sources for this version and it > is, in fact, present for ppc64 in > /usr/src/linux-2.6.10-gentoo-r6/arch/ppc64/defconfig > > That suggests the mechanism that generates the .config files is not > working right under certain circumstances related to the 64bit G5. The default config for G5s is arch/ppc64/configs/g5_defconfig; there is only one for 64 bits. 32 bits on G5s is unsupported (and will probably not work with more recent machines). Ben. From benh at kernel.crashing.org Mon Jan 31 16:41:13 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 31 Jan 2005 16:41:13 +1100 Subject: [PATCH] ppc64: Move systemcfg out of head.S Message-ID: <1107150074.5713.55.camel@gaston> Hi ! The "systemcfg" data structure in the ppc64 kernel is something that used to be defined to be at a hard-coded page number in the kernel image. This is not necessary (at least not any more) and is a possible problem with future developments. This patch removes that constraint, which also simplifies various bits of assembly in head.S that were dealing with it. This is the first step of a deeper cleanup of systemcfg definition and usage (and ultimately removal in its current incarnation). Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/head.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/head.S 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/head.S 2005-01-31 16:19:44.000000000 +1100 @@ -517,16 +517,7 @@ .globl naca naca: .llong itVpdAreas -#endif - - . = SYSTEMCFG_PHYS_ADDR - .globl __start_systemcfg -__start_systemcfg: - . = (SYSTEMCFG_PHYS_ADDR + PAGE_SIZE) - .globl __end_systemcfg -__end_systemcfg: -#ifdef CONFIG_PPC_ISERIES /* * The iSeries LPAR map is at this fixed address * so that the HvReleaseData structure can address @@ -536,6 +527,8 @@ * VSID generation algorithm. See include/asm/mmu_context.h. */ + .
= 0x4800 + .llong 2 /* # ESIDs to be mapped by hypervisor */ .llong 1 /* # memory ranges to be mapped by hypervisor */ .llong STAB0_PAGE /* Page # of segment table within load area */ @@ -1264,10 +1257,6 @@ addi r2,r2,0x4000 addi r2,r2,0x4000 - LOADADDR(r9,systemcfg) - SET_REG_TO_CONST(r4, SYSTEMCFG_VIRT_ADDR) - std r4,0(r9) /* set the systemcfg pointer */ - bl .iSeries_early_setup /* relocation is on at this point */ @@ -1772,7 +1761,7 @@ sc /* HvCall_setASR */ #else /* set the ASR */ - li r3,SYSTEMCFG_PHYS_ADDR /* r3 = ptr to systemcfg */ + ld r3,systemcfg@got(r2) /* r3 = ptr to systemcfg */ lwz r3,PLATFORM(r3) /* r3 = platform flags */ cmpldi r3,PLATFORM_PSERIES_LPAR bne 98f @@ -1861,12 +1850,6 @@ ori r6,r6,MSR_RI mtmsrd r6 /* RI on */ - /* setup the systemcfg pointer which is needed by *tab_initialize */ - LOADADDR(r6,systemcfg) - sub r6,r6,r26 /* addr of the variable systemcfg */ - li r27,SYSTEMCFG_PHYS_ADDR - std r27,0(r6) /* set the value of systemcfg */ - #ifdef CONFIG_HMT /* Start up the second thread on cpu 0 */ mfspr r3,PVR @@ -1941,7 +1924,7 @@ /* set the ASR */ ld r3,PACASTABREAL(r13) ori r4,r3,1 /* turn on valid bit */ - li r3,SYSTEMCFG_PHYS_ADDR /* r3 = ptr to systemcfg */ + ld r3,systemcfg@got(r2) /* r3 = ptr to systemcfg */ lwz r3,PLATFORM(r3) /* r3 = platform flags */ cmpldi r3,PLATFORM_PSERIES_LPAR bne 98f @@ -1960,7 +1943,7 @@ mtasr r4 /* set the stab location */ 99: /* Set SDR1 (hash table pointer) */ - li r3,SYSTEMCFG_PHYS_ADDR /* r3 = ptr to systemcfg */ + ld r3,systemcfg@got(r2) /* r3 = ptr to systemcfg */ lwz r3,PLATFORM(r3) /* r3 = platform flags */ /* Test if bit 0 is set (LPAR bit) */ andi.
r3,r3,0x1 @@ -1998,11 +1981,6 @@ li r3,0 bl .do_cpu_ftr_fixups - /* setup the systemcfg pointer */ - LOADADDR(r9,systemcfg) - SET_REG_TO_CONST(r8, SYSTEMCFG_VIRT_ADDR) - std r8,0(r9) - LOADADDR(r26, boot_cpuid) lwz r26,0(r26) Index: linux-work/arch/ppc64/kernel/pacaData.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pacaData.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/pacaData.c 2005-01-31 15:56:55.000000000 +1100 @@ -20,9 +20,14 @@ #include #include -struct systemcfg *systemcfg; +static union { + struct systemcfg data; + u8 page[PAGE_SIZE]; +} systemcfg_store __page_aligned; +struct systemcfg *systemcfg = &systemcfg_store.data; EXPORT_SYMBOL(systemcfg); + /* This symbol is provided by the linker - let it fill in the paca * field correctly */ extern unsigned long __toc_start; Index: linux-work/arch/ppc64/kernel/proc_ppc64.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/proc_ppc64.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/proc_ppc64.c 2005-01-31 15:56:55.000000000 +1100 @@ -89,7 +89,7 @@ return 1; pde->nlink = 1; pde->data = systemcfg; - pde->size = 4096; + pde->size = PAGE_SIZE; pde->proc_fops = &page_map_fops; #ifdef CONFIG_PPC_PSERIES Index: linux-work/include/asm-ppc64/systemcfg.h =================================================================== --- linux-work.orig/include/asm-ppc64/systemcfg.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/systemcfg.h 2005-01-31 15:56:55.000000000 +1100 @@ -47,7 +47,6 @@ __u32 dcache_line_size; /* L1 d-cache line size 0x64 */ __u32 icache_size; /* L1 i-cache size 0x68 */ __u32 icache_line_size; /* L1 i-cache line size 0x6C */ - __u8 reserved0[3984]; /* Reserve rest of page 0x70 */ }; #ifdef __KERNEL__ @@ -56,8 +55,4 @@ #endif /* __ASSEMBLY__ */ -#define SYSTEMCFG_PAGE 0x5 -#define SYSTEMCFG_PHYS_ADDR (SYSTEMCFG_PAGE< Hi ! 
This is a rather large patch. See the notes below for possible backward compatibility issues. (Note: it depends on "ppc64: Move systemcfg out of head.S" being applied.)

This patch adds to the ppc64 kernel a virtual .so (vDSO) that is mapped into every process space, similar to the x86 vsyscall page. However, the implementation is very different (and doesn't use the gate area mechanism). Actually, it contains two implementations, a 32-bit one and a 64-bit one. These vDSOs are currently mapped at 0x100000 (+1MB) when possible (when a process load section isn't already there). In the future, we can randomize that address, or even imagine having a special phdr entry letting apps that want finer control over their address space put it elsewhere (or not at all).

The implementation adds a hook to binfmt_elf to let the architecture add a real VMA to the process space instead of using the gate area mechanism. That mechanism wasn't very suitable for ppc: we couldn't just "shove" PTE entries mapping kernel addresses into userland without expensive changes to our hash table management. Instead, I made the vDSO a normal VMA, which additionally means it supports copy-on-write semantics if made writable via ptrace/mprotect, thus allowing breakpoints in the vDSO code.

The current implementation of the vDSOs contains the signal trampolines with appropriate DWARF information, which enables us to use non-executable stacks (patches to come later), along with a few more functions that we hope glibc will soon make good use of (this is the "hard" part now :)

Note that the symbols exposed by the vDSO aren't "normal" function symbols; apps can't be expected to link against them directly. Both vDSOs are seen as if they were linked at 0, and the symbols just contain offsets to the various functions. This is done on purpose to avoid a relocation step (ppc64 functions normally have descriptors with absolute addresses in them).
When glibc uses those functions, it's expected to use its own trampolines that know how to reach them. In some cases, the vDSO contains several versions of a given function (for various CPUs); the kernel will "patch" the symbol table at boot to make it point to the appropriate one transparently.

What is currently implemented is:

- int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz);

  This is a fully userland implementation of gettimeofday, with no barriers and no locks, providing results 100% equivalent to the syscall version.

- void __kernel_sync_dicache(unsigned long start, unsigned long end);

  This function syncs the data and instruction caches (for making data executable). Userland loaders are expected to use it instead of doing this themselves, as the kernel will provide optimized versions for the current CPU. Currently, the vDSO provides a full version for all CPUs prior to POWER5 and a nop one for POWER5, which implements hardware snooping at the L1 level. In the future, an intermediate implementation may be done for the POWER4 and 970, which don't need the "dcbst" loop (the L1D cache is write-through on those).

- void *__kernel_get_syscall_map(unsigned int *syscall_count);

  Returns a pointer to a map of the syscalls implemented by the currently running kernel. Unlike kernel bitops, the map is agnostic to the size of "long": it stores bits from top to bottom so that memory actually contains a linear bitmap. Check for syscall N by testing bit (0x80000000 >> (N & 0x1f)) of the 32-bit int at index N >> 5.

Note about backward compatibility issues: a bug in the ppc64 libgcc unwinder makes it unable to unwind stacks properly across signals if the signal trampoline isn't on the stack. This has been fixed in CVS for gcc 4.0 and will soon be on the stable branch, but the problem exists with all currently used versions.
That means that until glibc gets the patch enabling its use of the vDSO symbols for the DWARF unwinder (a rather trivial patch that will hopefully be pushed to glibc CVS soon), unwinding from a signal handler will not work for 64-bit applications. I consider this a non-issue, though, because:

- a patch is about to be produced, which can easily get pushed to "live" distros like Debian, Gentoo, Fedora, etc. soon enough (unfortunately it breaks compatibility with kernels below 2.4.20, as our signal stack layout changed, crap crap crap);
- there are few 64-bit applications out there (except on Gentoo);
- it's only really an issue with C++ code relying on throwing exceptions out of signal handlers (extremely rare, it seems);
- "release" distros like SLES or RHEL will probably have the vDSO-enabled glibc _and_ the unwinder fix by the time they release a version with a 2.6.11 or 2.6.12 kernel anyway :)

So far, I have yet to see an app failing because of that...

Finally, many many many thanks to Alan Modra for writing the DWARF information for the signal handlers and debugging the libgcc issues!
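The syscall map layout described above (bit 0x80000000 >> (N & 0x1f) of the 32-bit word at index N >> 5) can be illustrated with a small, self-contained C sketch. The helper names syscall_map_set() and syscall_map_test() are hypothetical and not part of the vDSO interface. The setter uses the same expression as setup_syscall_map() in the setup.c hunk of the patch. The tester shows what a userland caller might do with the pointer returned by __kernel_get_syscall_map(), assuming the caller already knows how many 32-bit words the map holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the vDSO syscall map layout: syscall N lives in 32-bit word
 * N >> 5, at bit (0x80000000 >> (N & 0x1f)), i.e. counted from the most
 * significant bit. Helper names are hypothetical.
 */

/* Mark syscall n as implemented (same expression as setup_syscall_map()). */
static void syscall_map_set(uint32_t *map, size_t nwords, unsigned int n)
{
	assert((n >> 5) < nwords);
	map[n >> 5] |= 0x80000000u >> (n & 0x1f);
}

/* Return 1 if syscall n is marked as implemented, 0 otherwise (including
 * when n lies beyond the nwords words the caller says the map holds). */
static int syscall_map_test(const uint32_t *map, size_t nwords, unsigned int n)
{
	if ((n >> 5) >= nwords)
		return 0;
	return (map[n >> 5] & (0x80000000u >> (n & 0x1f))) != 0;
}
```

For example, marking syscalls 0 and 33 in a two-word map sets word 0 to 0x80000000 and word 1 to 0x40000000. Because each word is read as a 32-bit int, the map looks the same to 32-bit and 64-bit processes, which is the reason the cover letter gives for not using the long-based kernel bitops.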
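The time.c changes later in this patch keep tb_to_xs, stamp_xsec and tb_orig_stamp so that the current time can be derived from the timebase with a single high multiply, without dividing by the timebase frequency. The sketch below reproduces that arithmetic in portable C. It rests on two assumptions. First, as the patch's own comments state, one xsec is 1/2^20 second. Second, tb_to_xs is the value that div128_by_32(1024*1024, 0, tb_ticks_per_sec, ...) produces, namely 2^84 / tb_ticks_per_sec. mulhdu() is reimplemented here using the GCC/Clang unsigned __int128 extension. The names struct tv_sketch, compute_tb_to_xs(), tb_to_timeval() and needs_rebase() are hypothetical. This is not the vDSO's actual (assembly) implementation.

```c
#include <assert.h>
#include <stdint.h>

/* One xsec is 1/2^20 second, as stated in the time.c comments. */
#define XSEC_PER_SEC (1ULL << 20)
#define USEC_PER_SEC 1000000ULL

/* High 64 bits of a 64x64-bit product, like the kernel's mulhdu().
 * Relies on the GCC/Clang unsigned __int128 extension. */
static uint64_t mulhdu(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* tb_to_xs = (2^20 << 64) / tb_ticks_per_sec. The result only fits in
 * 64 bits when the timebase runs faster than 2^20 ticks per second. */
static uint64_t compute_tb_to_xs(uint64_t tb_ticks_per_sec)
{
	assert(tb_ticks_per_sec > XSEC_PER_SEC);
	return (uint64_t)(((unsigned __int128)XSEC_PER_SEC << 64) / tb_ticks_per_sec);
}

struct tv_sketch {
	uint64_t tv_sec;
	uint64_t tv_usec;
};

/* Convert a timebase reading to seconds and microseconds. */
static struct tv_sketch tb_to_timeval(uint64_t tb, uint64_t tb_orig_stamp,
				      uint64_t stamp_xsec, uint64_t tb_to_xs)
{
	/* Elapsed ticks scaled to xsec with one high multiply. */
	uint64_t xsec = stamp_xsec + mulhdu(tb - tb_orig_stamp, tb_to_xs);
	struct tv_sketch tv;

	tv.tv_sec = xsec / XSEC_PER_SEC;
	/* Remaining fraction of a second, rescaled from 1/2^20 s to us. */
	tv.tv_usec = ((xsec % XSEC_PER_SEC) * USEC_PER_SEC) / XSEC_PER_SEC;
	return tv;
}

/* Mirrors the test in timer_recalc_offset(): rebase once bit 31 of the
 * elapsed tick count is set, so the 32-bit vDSO path keeps a small delta. */
static int needs_rebase(uint64_t cur_tb, uint64_t tb_orig_stamp)
{
	return ((cur_tb - tb_orig_stamp) & 0x80000000u) != 0;
}
```

For instance, with a hypothetical timebase of 2^24 ticks per second, tb_to_xs is 2^60. In that case, 3.5 seconds' worth of ticks (58720256) converts to tv_sec == 3 and tv_usec == 500000.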
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/Makefile =================================================================== --- linux-work.orig/arch/ppc64/Makefile 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/Makefile 2005-01-31 16:25:55.000000000 +1100 @@ -53,6 +53,8 @@ libs-y += arch/ppc64/lib/ core-y += arch/ppc64/kernel/ +core-y += arch/ppc64/kernel/vdso32/ +core-y += arch/ppc64/kernel/vdso64/ core-y += arch/ppc64/mm/ core-$(CONFIG_XMON) += arch/ppc64/xmon/ drivers-$(CONFIG_OPROFILE) += arch/ppc64/oprofile/ Index: linux-work/arch/ppc64/kernel/asm-offsets.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/asm-offsets.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/asm-offsets.c 2005-01-31 16:25:56.000000000 +1100 @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -35,6 +36,8 @@ #include #include #include +#include +#include #define DEFINE(sym, val) \ asm volatile("\n->" #sym " %0 " #val : : "i" (val)) @@ -167,5 +170,24 @@ DEFINE(CPU_SPEC_FEATURES, offsetof(struct cpu_spec, cpu_features)); DEFINE(CPU_SPEC_SETUP, offsetof(struct cpu_spec, cpu_setup)); + /* systemcfg offsets for use by vdso */ + DEFINE(CFG_TB_ORIG_STAMP, offsetof(struct systemcfg, tb_orig_stamp)); + DEFINE(CFG_TB_TICKS_PER_SEC, offsetof(struct systemcfg, tb_ticks_per_sec)); + DEFINE(CFG_TB_TO_XS, offsetof(struct systemcfg, tb_to_xs)); + DEFINE(CFG_STAMP_XSEC, offsetof(struct systemcfg, stamp_xsec)); + DEFINE(CFG_TB_UPDATE_COUNT, offsetof(struct systemcfg, tb_update_count)); + DEFINE(CFG_TZ_MINUTEWEST, offsetof(struct systemcfg, tz_minuteswest)); + DEFINE(CFG_TZ_DSTTIME, offsetof(struct systemcfg, tz_dsttime)); + DEFINE(CFG_SYSCALL_MAP32, offsetof(struct systemcfg, syscall_map_32)); + DEFINE(CFG_SYSCALL_MAP64, offsetof(struct systemcfg, syscall_map_64)); + + /* timeval/timezone offsets for use by vdso */ + DEFINE(TVAL64_TV_SEC, offsetof(struct 
timeval, tv_sec)); + DEFINE(TVAL64_TV_USEC, offsetof(struct timeval, tv_usec)); + DEFINE(TVAL32_TV_SEC, offsetof(struct compat_timeval, tv_sec)); + DEFINE(TVAL32_TV_USEC, offsetof(struct compat_timeval, tv_usec)); + DEFINE(TZONE_TZ_MINWEST, offsetof(struct timezone, tz_minuteswest)); + DEFINE(TZONE_TZ_DSTTIME, offsetof(struct timezone, tz_dsttime)); + return 0; } Index: linux-work/arch/ppc64/kernel/Makefile =================================================================== --- linux-work.orig/arch/ppc64/kernel/Makefile 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/Makefile 2005-01-31 16:25:56.000000000 +1100 @@ -11,7 +11,7 @@ udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \ ptrace32.o signal32.o rtc.o init_task.o \ lmb.o cputable.o cpu_setup_power4.o idle_power4.o \ - iommu.o sysfs.o + iommu.o sysfs.o vdso.o obj-$(CONFIG_PPC_OF) += of_device.o Index: linux-work/arch/ppc64/kernel/signal32.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/signal32.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/signal32.c 2005-01-31 16:25:56.000000000 +1100 @@ -31,6 +31,7 @@ #include #include #include +#include #define DEBUG_SIG 0 @@ -656,18 +657,24 @@ /* Save user registers on the stack */ frame = &rt_sf->uc.uc_mcontext; - if (save_user_regs(regs, frame, __NR_rt_sigreturn)) - goto badframe; - if (put_user(regs->gpr[1], (unsigned long __user *)newsp)) goto badframe; + + if (vdso32_rt_sigtramp && current->thread.vdso_base) { + if (save_user_regs(regs, frame, 0)) + goto badframe; + regs->link = current->thread.vdso_base + vdso32_rt_sigtramp; + } else { + if (save_user_regs(regs, frame, __NR_rt_sigreturn)) + goto badframe; + regs->link = (unsigned long) frame->tramp; + } regs->gpr[1] = (unsigned long) newsp; regs->gpr[3] = sig; regs->gpr[4] = (unsigned long) &rt_sf->info; regs->gpr[5] = (unsigned long) &rt_sf->uc; regs->gpr[6] = (unsigned long) rt_sf; regs->nip = (unsigned long) 
ka->sa.sa_handler; - regs->link = (unsigned long) frame->tramp; regs->trap = 0; regs->result = 0; @@ -825,8 +832,15 @@ || __put_user(sig, &sc->signal)) goto badframe; - if (save_user_regs(regs, &frame->mctx, __NR_sigreturn)) - goto badframe; + if (vdso32_sigtramp && current->thread.vdso_base) { + if (save_user_regs(regs, &frame->mctx, 0)) + goto badframe; + regs->link = current->thread.vdso_base + vdso32_sigtramp; + } else { + if (save_user_regs(regs, &frame->mctx, __NR_sigreturn)) + goto badframe; + regs->link = (unsigned long) frame->mctx.tramp; + } if (put_user(regs->gpr[1], (unsigned long __user *)newsp)) goto badframe; @@ -834,7 +848,6 @@ regs->gpr[3] = sig; regs->gpr[4] = (unsigned long) sc; regs->nip = (unsigned long) ka->sa.sa_handler; - regs->link = (unsigned long) frame->mctx.tramp; regs->trap = 0; regs->result = 0; Index: linux-work/arch/ppc64/kernel/setup.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/setup.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/setup.c 2005-01-31 16:25:56.000000000 +1100 @@ -990,6 +990,34 @@ } /* + * Called from setup_arch to initialize the bitmap of available + * syscalls in the systemcfg page + */ +void __init setup_syscall_map(void) +{ + unsigned int i, count64 = 0, count32 = 0; + extern unsigned long *sys_call_table; + extern unsigned long *sys_call_table32; + extern unsigned long sys_ni_syscall; + + + for (i = 0; i < __NR_syscalls; i++) { + if (sys_call_table[i] == sys_ni_syscall) + continue; + count64++; + systemcfg->syscall_map_64[i >> 5] |= 0x80000000UL >> (i & 0x1f); + } + for (i = 0; i < __NR_syscalls; i++) { + if (sys_call_table32[i] == sys_ni_syscall) + continue; + count32++; + systemcfg->syscall_map_32[i >> 5] |= 0x80000000UL >> (i & 0x1f); + } + printk(KERN_INFO "Syscall map setup, %d 32 bits and %d 64 bits syscalls\n", + count32, count64); +} + +/* * Called into from start_kernel, after lock_kernel has been called. 
* Initializes bootmem, which is unsed to manage page allocation until * mem_init is called. @@ -1027,6 +1055,9 @@ /* set up the bootmem stuff with available memory */ do_init_bootmem(); + /* initialize the syscall map in systemcfg */ + setup_syscall_map(); + ppc_md.setup_arch(); /* Select the correct idle loop for the platform. */ Index: linux-work/arch/ppc64/kernel/signal.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/signal.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/signal.c 2005-01-31 16:25:56.000000000 +1100 @@ -34,6 +34,7 @@ #include #include #include +#include #define DEBUG_SIG 0 @@ -426,10 +427,14 @@ goto badframe; /* Set up to return from userspace. */ - err |= setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]); - if (err) - goto badframe; - + if (vdso64_rt_sigtramp && current->thread.vdso_base) { + regs->link = current->thread.vdso_base + vdso64_rt_sigtramp; + } else { + err |= setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]); + if (err) + goto badframe; + regs->link = (unsigned long) &frame->tramp[0]; + } funct_desc_ptr = (func_descr_t __user *) ka->sa.sa_handler; /* Allocate a dummy caller frame for the signal handler. */ @@ -438,7 +443,6 @@ /* Set up "regs" so we "return" to the signal handler. 
*/ err |= get_user(regs->nip, &funct_desc_ptr->entry); - regs->link = (unsigned long) &frame->tramp[0]; regs->gpr[1] = newsp; err |= get_user(regs->gpr[2], &funct_desc_ptr->toc); regs->gpr[3] = signr; Index: linux-work/arch/ppc64/kernel/smp.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/smp.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/smp.c 2005-01-31 16:25:56.000000000 +1100 @@ -383,7 +383,7 @@ * For now we leave it which means the time can be some * number of msecs off until someone does a settimeofday() */ - do_gtod.tb_orig_stamp = tb_last_stamp; + do_gtod.varp->tb_orig_stamp = tb_last_stamp; systemcfg->tb_orig_stamp = tb_last_stamp; #endif Index: linux-work/arch/ppc64/kernel/time.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/time.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/time.c 2005-01-31 16:25:56.000000000 +1100 @@ -86,8 +86,6 @@ unsigned long tb_ticks_per_jiffy; unsigned long tb_ticks_per_usec = 100; /* sane default */ unsigned long tb_ticks_per_sec; -unsigned long next_xtime_sync_tb; -unsigned long xtime_sync_interval; unsigned long tb_to_xs; unsigned tb_to_us; unsigned long processor_freq; @@ -158,8 +156,8 @@ * The conversion to microseconds at the end is done * without a divide (and in fact, without a multiply) */ - tb_ticks = tb_val - do_gtod.tb_orig_stamp; temp_varp = do_gtod.varp; + tb_ticks = tb_val - temp_varp->tb_orig_stamp; temp_tb_to_xs = temp_varp->tb_to_xs; temp_stamp_xsec = temp_varp->stamp_xsec; tb_xsec = mulhdu( tb_ticks, temp_tb_to_xs ); @@ -185,17 +183,55 @@ { struct timeval my_tv; - if (cur_tb > next_xtime_sync_tb) { - next_xtime_sync_tb = cur_tb + xtime_sync_interval; - __do_gettimeofday(&my_tv, cur_tb); - - if (xtime.tv_sec <= my_tv.tv_sec) { - xtime.tv_sec = my_tv.tv_sec; - xtime.tv_nsec = my_tv.tv_usec * 1000; - } + __do_gettimeofday(&my_tv, cur_tb); + + if 
(xtime.tv_sec <= my_tv.tv_sec) { + xtime.tv_sec = my_tv.tv_sec; + xtime.tv_nsec = my_tv.tv_usec * 1000; } } +/* + * When the timebase - tb_orig_stamp gets too big, we do a manipulation + * between tb_orig_stamp and stamp_xsec. The goal here is to keep the + * difference tb - tb_orig_stamp small enough to always fit inside a + * 32 bits number. This is a requirement of our fast 32 bits userland + * implementation in the vdso. If we "miss" a call to this function + * (interrupt latency, CPU locked in a spinlock, ...) and we end up + * with a too big difference, then the vdso will fallback to calling + * the syscall + */ +static __inline__ void timer_recalc_offset(unsigned long cur_tb) +{ + struct gettimeofday_vars * temp_varp; + unsigned temp_idx; + unsigned long offset, new_stamp_xsec, new_tb_orig_stamp; + + if (((cur_tb - do_gtod.varp->tb_orig_stamp) & 0x80000000u) == 0) + return; + + temp_idx = (do_gtod.var_idx == 0); + temp_varp = &do_gtod.vars[temp_idx]; + + new_tb_orig_stamp = cur_tb; + offset = new_tb_orig_stamp - do_gtod.varp->tb_orig_stamp; + new_stamp_xsec = do_gtod.varp->stamp_xsec + mulhdu(offset, do_gtod.varp->tb_to_xs); + + temp_varp->tb_to_xs = do_gtod.varp->tb_to_xs; + temp_varp->tb_orig_stamp = new_tb_orig_stamp; + temp_varp->stamp_xsec = new_stamp_xsec; + mb(); + do_gtod.varp = temp_varp; + do_gtod.var_idx = temp_idx; + + ++(systemcfg->tb_update_count); + wmb(); + systemcfg->tb_orig_stamp = new_tb_orig_stamp; + systemcfg->stamp_xsec = new_stamp_xsec; + wmb(); + ++(systemcfg->tb_update_count); +} + #ifdef CONFIG_SMP unsigned long profile_pc(struct pt_regs *regs) { @@ -311,6 +347,7 @@ if (cpu == boot_cpuid) { write_seqlock(&xtime_lock); tb_last_stamp = lpaca->next_jiffy_update_tb; + timer_recalc_offset(lpaca->next_jiffy_update_tb); do_timer(regs); timer_sync_xtime(lpaca->next_jiffy_update_tb); timer_check_rtc(); @@ -398,7 +435,9 @@ time_maxerror = NTP_PHASE_LIMIT; time_esterror = NTP_PHASE_LIMIT; - delta_xsec = mulhdu( 
(tb_last_stamp-do_gtod.tb_orig_stamp), do_gtod.varp->tb_to_xs ); + delta_xsec = mulhdu( (tb_last_stamp-do_gtod.varp->tb_orig_stamp), + do_gtod.varp->tb_to_xs ); + new_xsec = (new_nsec * XSEC_PER_SEC) / NSEC_PER_SEC; new_xsec += new_sec * XSEC_PER_SEC; if ( new_xsec > delta_xsec ) { @@ -411,7 +450,7 @@ * before 1970 ... eg. we booted ten days ago, and we are setting * the time to Jan 5, 1970 */ do_gtod.varp->stamp_xsec = new_xsec; - do_gtod.tb_orig_stamp = tb_last_stamp; + do_gtod.varp->tb_orig_stamp = tb_last_stamp; systemcfg->stamp_xsec = new_xsec; systemcfg->tb_orig_stamp = tb_last_stamp; } @@ -464,9 +503,9 @@ xtime.tv_sec = mktime(tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec); tb_last_stamp = get_tb(); - do_gtod.tb_orig_stamp = tb_last_stamp; do_gtod.varp = &do_gtod.vars[0]; do_gtod.var_idx = 0; + do_gtod.varp->tb_orig_stamp = tb_last_stamp; do_gtod.varp->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC; do_gtod.tb_ticks_per_sec = tb_ticks_per_sec; do_gtod.varp->tb_to_xs = tb_to_xs; @@ -477,9 +516,6 @@ systemcfg->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC; systemcfg->tb_to_xs = tb_to_xs; - xtime_sync_interval = tb_ticks_per_sec - (tb_ticks_per_sec/8); - next_xtime_sync_tb = tb_last_stamp + xtime_sync_interval; - time_freq = 0; xtime.tv_nsec = 0; @@ -584,12 +620,12 @@ stamp_xsec which is the time (in 1/2^20 second units) corresponding to tb_orig_stamp. 
This new value of stamp_xsec compensates for the change in frequency (implied by the new tb_to_xs) which guarantees that the current time remains the same */ - tb_ticks = get_tb() - do_gtod.tb_orig_stamp; + write_seqlock_irqsave( &xtime_lock, flags ); + tb_ticks = get_tb() - do_gtod.varp->tb_orig_stamp; div128_by_32( 1024*1024, 0, new_tb_ticks_per_sec, &divres ); new_tb_to_xs = divres.result_low; new_xsec = mulhdu( tb_ticks, new_tb_to_xs ); - write_seqlock_irqsave( &xtime_lock, flags ); old_xsec = mulhdu( tb_ticks, do_gtod.varp->tb_to_xs ); new_stamp_xsec = do_gtod.varp->stamp_xsec + old_xsec - new_xsec; @@ -597,16 +633,12 @@ values in do_gettimeofday. We alternate the copies and as long as a reasonable time elapses between changes, there will never be inconsistent values. ntpd has a minimum of one minute between updates */ - if (do_gtod.var_idx == 0) { - temp_varp = &do_gtod.vars[1]; - temp_idx = 1; - } - else { - temp_varp = &do_gtod.vars[0]; - temp_idx = 0; - } + temp_idx = (do_gtod.var_idx == 0); + temp_varp = &do_gtod.vars[temp_idx]; + temp_varp->tb_to_xs = new_tb_to_xs; temp_varp->stamp_xsec = new_stamp_xsec; + temp_varp->tb_orig_stamp = do_gtod.varp->tb_orig_stamp; mb(); do_gtod.varp = temp_varp; do_gtod.var_idx = temp_idx; Index: linux-work/arch/ppc64/kernel/vdso.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso.c 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,614 @@ +/* + * linux/arch/ppc64/kernel/vdso.c + * + * Copyright (C) 2004 Benjamin Herrenschmidt, IBM Corp. + * + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#undef DEBUG + +#ifdef DEBUG +#define DBG(fmt...) printk(fmt) +#else +#define DBG(fmt...) +#endif + + +/* + * The vDSOs themselves are here + */ +extern char vdso64_start, vdso64_end; +extern char vdso32_start, vdso32_end; + +static void *vdso64_kbase = &vdso64_start; +static void *vdso32_kbase = &vdso32_start; + +unsigned int vdso64_pages; +unsigned int vdso32_pages; + +/* Signal trampolines user addresses */ + +unsigned long vdso64_rt_sigtramp; +unsigned long vdso32_sigtramp; +unsigned long vdso32_rt_sigtramp; + +/* Format of the patch table */ +struct vdso_patch_def +{ + u32 pvr_mask, pvr_value; + const char *gen_name; + const char *fix_name; +}; + +/* Table of functions to patch based on the CPU type/revision + * + * TODO: Improve by adding whole lists for each entry + */ +static struct vdso_patch_def vdso_patches[] = { + { + 0xffff0000, 0x003a0000, /* POWER5 */ + "__kernel_sync_dicache", "__kernel_sync_dicache_p5" + }, + { + 0xffff0000, 0x003b0000, /* POWER5 */ + "__kernel_sync_dicache", "__kernel_sync_dicache_p5" + }, +}; + +/* + * Some info is carried around for each of them during parsing at + * boot time.
+ */ +struct lib32_elfinfo +{ + Elf32_Ehdr *hdr; /* ptr to ELF */ + Elf32_Sym *dynsym; /* ptr to .dynsym section */ + unsigned long dynsymsize; /* size of .dynsym section */ + char *dynstr; /* ptr to .dynstr section */ + unsigned long text; /* offset of .text section in .so */ +}; + +struct lib64_elfinfo +{ + Elf64_Ehdr *hdr; + Elf64_Sym *dynsym; + unsigned long dynsymsize; + char *dynstr; + unsigned long text; +}; + + +#ifdef __DEBUG +static void dump_one_vdso_page(struct page *pg, struct page *upg) +{ + printk("kpg: %p (c:%d,f:%08lx)", __va(page_to_pfn(pg) << PAGE_SHIFT), + page_count(pg), + pg->flags); + if (upg/* && pg != upg*/) { + printk(" upg: %p (c:%d,f:%08lx)", __va(page_to_pfn(upg) << PAGE_SHIFT), + page_count(upg), + upg->flags); + } + printk("\n"); +} + +static void dump_vdso_pages(struct vm_area_struct * vma) +{ + int i; + + if (!vma || test_thread_flag(TIF_32BIT)) { + printk("vDSO32 @ %016lx:\n", (unsigned long)vdso32_kbase); + for (i=0; ivm_mm) ? + follow_page(vma->vm_mm, vma->vm_start + i*PAGE_SIZE, 0) + : NULL; + dump_one_vdso_page(pg, upg); + } + } + if (!vma || !test_thread_flag(TIF_32BIT)) { + printk("vDSO64 @ %016lx:\n", (unsigned long)vdso64_kbase); + for (i=0; ivm_mm) ? + follow_page(vma->vm_mm, vma->vm_start + i*PAGE_SIZE, 0) + : NULL; + dump_one_vdso_page(pg, upg); + } + } +} +#endif /* DEBUG */ + +/* + * Keep a dummy vma_close for now, it will prevent VMA merging. + */ +static void vdso_vma_close(struct vm_area_struct * vma) +{ +} + +/* + * Our nopage() function, maps in the actual vDSO kernel pages, they will + * be mapped read-only by do_no_page(), and eventually COW'ed, either + * right away for an initial write access, or by do_wp_page(). + */ +static struct page * vdso_vma_nopage(struct vm_area_struct * vma, + unsigned long address, int *type) +{ + unsigned long offset = address - vma->vm_start; + struct page *pg; + void *vbase = test_thread_flag(TIF_32BIT) ? 
vdso32_kbase : vdso64_kbase; + + DBG("vdso_vma_nopage(current: %s, address: %016lx, off: %lx)\n", + current->comm, address, offset); + + if (address < vma->vm_start || address >= vma->vm_end) + return NOPAGE_SIGBUS; + + /* + * The last page is systemcfg and needs special handling: no get_page() as + * this is a reserved page + */ + if ((vma->vm_end - address) <= PAGE_SIZE) + return virt_to_page(systemcfg); + + pg = virt_to_page(vbase + offset); + get_page(pg); + DBG(" ->page count: %d\n", page_count(pg)); + + return pg; +} + +static struct vm_operations_struct vdso_vmops = { + .close = vdso_vma_close, + .nopage = vdso_vma_nopage, +}; + +/* + * This is called from binfmt_elf; we create the special vma for the + * vDSO and insert it into the mm struct tree + */ +int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack) +{ + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; + unsigned long vdso_pages; + unsigned long vdso_base; + + if (test_thread_flag(TIF_32BIT)) { + vdso_pages = vdso32_pages; + vdso_base = VDSO32_MBASE; + } else { + vdso_pages = vdso64_pages; + vdso_base = VDSO64_MBASE; + } + + /* If the vDSO has a problem and was disabled, just don't "enable" it for the + * process + */ + if (vdso_pages == 0) { + current->thread.vdso_base = 0; + return 0; + } + vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); + if (vma == NULL) + return -ENOMEM; + if (security_vm_enough_memory(vdso_pages)) { + kmem_cache_free(vm_area_cachep, vma); + return -ENOMEM; + } + memset(vma, 0, sizeof(*vma)); + + /* + * pick a base address for the vDSO in process space. We have a default + * base of 1Mb to which we could add a random offset of up to 1Mb.
+ * XXX: Add possibility for a program header to specify that location + */ + current->thread.vdso_base = vdso_base; + /* + ((unsigned long)vma & 0x000ff000); */ + + vma->vm_mm = mm; + vma->vm_start = current->thread.vdso_base; + + /* + * the VMA size is one page more than the vDSO since systemcfg + * is mapped in the last one + */ + vma->vm_end = vma->vm_start + ((vdso_pages + 1) << PAGE_SHIFT); + + /* + * our vma flags don't have VM_WRITE so by default, the process isn't allowed + * to write those pages. + * gdb can break that through the ptrace interface, and thus trigger COW on those + * pages but it's then your responsibility to never do that on the "data" page + * of the vDSO or you'll stop getting kernel updates and your nice userland + * gettimeofday will be totally dead. It's fine to use that for setting + * breakpoints in the vDSO code pages though + */ + vma->vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + vma->vm_flags |= mm->def_flags; + vma->vm_page_prot = protection_map[vma->vm_flags & 0x7]; + vma->vm_ops = &vdso_vmops; + + down_write(&mm->mmap_sem); + insert_vm_struct(mm, vma); + mm->total_vm += (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; + up_write(&mm->mmap_sem); + + return 0; +} + +static void * __init find_section32(Elf32_Ehdr *ehdr, const char *secname, + unsigned long *size) +{ + Elf32_Shdr *sechdrs; + unsigned int i; + char *secnames; + + /* Grab section headers and strings so we can tell who is who */ + sechdrs = (void *)ehdr + ehdr->e_shoff; + secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset; + + /* Find the section they want */ + for (i = 1; i < ehdr->e_shnum; i++) { + if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) { + if (size) + *size = sechdrs[i].sh_size; + return (void *)ehdr + sechdrs[i].sh_offset; + } + } + if (size) + *size = 0; + return NULL; +} + +static void * __init find_section64(Elf64_Ehdr *ehdr, const char *secname, + unsigned long *size) +{ + Elf64_Shdr *sechdrs; + unsigned int i; + char *secnames;
+ + /* Grab section headers and strings so we can tell who is who */ + sechdrs = (void *)ehdr + ehdr->e_shoff; + secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset; + + /* Find the section they want */ + for (i = 1; i < ehdr->e_shnum; i++) { + if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) { + if (size) + *size = sechdrs[i].sh_size; + return (void *)ehdr + sechdrs[i].sh_offset; + } + } + if (size) + *size = 0; + return NULL; +} + +static Elf32_Sym * __init find_symbol32(struct lib32_elfinfo *lib, const char *symname) +{ + unsigned int i; + char name[32], *c; + + for (i = 0; i < (lib->dynsymsize / sizeof(Elf32_Sym)); i++) { + if (lib->dynsym[i].st_name == 0) + continue; + strlcpy(name, lib->dynstr + lib->dynsym[i].st_name, 32); + c = strchr(name, '@'); + if (c) + *c = 0; + if (strcmp(symname, name) == 0) + return &lib->dynsym[i]; + } + return NULL; +} + +static Elf64_Sym * __init find_symbol64(struct lib64_elfinfo *lib, const char *symname) +{ + unsigned int i; + char name[32], *c; + + for (i = 0; i < (lib->dynsymsize / sizeof(Elf64_Sym)); i++) { + if (lib->dynsym[i].st_name == 0) + continue; + strlcpy(name, lib->dynstr + lib->dynsym[i].st_name, 32); + c = strchr(name, '@'); + if (c) + *c = 0; + if (strcmp(symname, name) == 0) + return &lib->dynsym[i]; + } + return NULL; +} + +/* Note that we assume the section is .text and the symbol is relative to + * the library base + */ +static unsigned long __init find_function32(struct lib32_elfinfo *lib, const char *symname) +{ + Elf32_Sym *sym = find_symbol32(lib, symname); + + if (sym == NULL) { + printk(KERN_WARNING "vDSO32: function %s not found !\n", symname); + return 0; + } + return sym->st_value - VDSO32_LBASE; +} + +/* Note that we assume the section is .text and the symbol is relative to + * the library base + */ +static unsigned long __init find_function64(struct lib64_elfinfo *lib, const char *symname) +{ + Elf64_Sym *sym = find_symbol64(lib, symname); + + if (sym == NULL) { + 
printk(KERN_WARNING "vDSO64: function %s not found !\n", symname); + return 0; + } +#ifdef VDS64_HAS_DESCRIPTORS + return *((u64 *)(vdso64_kbase + sym->st_value - VDSO64_LBASE)) - VDSO64_LBASE; +#else + return sym->st_value - VDSO64_LBASE; +#endif +} + + +static __init int vdso_do_find_sections(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + void *sect; + + /* + * Locate symbol tables & text section + */ + + v32->dynsym = find_section32(v32->hdr, ".dynsym", &v32->dynsymsize); + v32->dynstr = find_section32(v32->hdr, ".dynstr", NULL); + if (v32->dynsym == NULL || v32->dynstr == NULL) { + printk(KERN_ERR "vDSO32: a required symbol section was not found\n"); + return -1; + } + sect = find_section32(v32->hdr, ".text", NULL); + if (sect == NULL) { + printk(KERN_ERR "vDSO32: the .text section was not found\n"); + return -1; + } + v32->text = sect - vdso32_kbase; + + v64->dynsym = find_section64(v64->hdr, ".dynsym", &v64->dynsymsize); + v64->dynstr = find_section64(v64->hdr, ".dynstr", NULL); + if (v64->dynsym == NULL || v64->dynstr == NULL) { + printk(KERN_ERR "vDSO64: a required symbol section was not found\n"); + return -1; + } + sect = find_section64(v64->hdr, ".text", NULL); + if (sect == NULL) { + printk(KERN_ERR "vDSO64: the .text section was not found\n"); + return -1; + } + v64->text = sect - vdso64_kbase; + + return 0; +} + +static __init void vdso_setup_trampolines(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + /* + * Find signal trampolines + */ + + vdso64_rt_sigtramp = find_function64(v64, "__kernel_sigtramp_rt64"); + vdso32_sigtramp = find_function32(v32, "__kernel_sigtramp32"); + vdso32_rt_sigtramp = find_function32(v32, "__kernel_sigtramp_rt32"); +} + +static __init int vdso_fixup_datapage(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + Elf32_Sym *sym32; + Elf64_Sym *sym64; + + sym32 = find_symbol32(v32, "__kernel_datapage_offset"); + if (sym32 == NULL) { + printk(KERN_ERR "vDSO32: Can't find symbol 
__kernel_datapage_offset !\n"); + return -1; + } + *((int *)(vdso32_kbase + (sym32->st_value - VDSO32_LBASE))) = + (vdso32_pages << PAGE_SHIFT) - (sym32->st_value - VDSO32_LBASE); + + sym64 = find_symbol64(v64, "__kernel_datapage_offset"); + if (sym64 == NULL) { + printk(KERN_ERR "vDSO64: Can't find symbol __kernel_datapage_offset !\n"); + return -1; + } + *((int *)(vdso64_kbase + sym64->st_value - VDSO64_LBASE)) = + (vdso64_pages << PAGE_SHIFT) - (sym64->st_value - VDSO64_LBASE); + + return 0; +} + +static int vdso_do_func_patch32(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64, + const char *orig, const char *fix) +{ + Elf32_Sym *sym32_gen, *sym32_fix; + + sym32_gen = find_symbol32(v32, orig); + if (sym32_gen == NULL) { + printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", orig); + return -1; + } + sym32_fix = find_symbol32(v32, fix); + if (sym32_fix == NULL) { + printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", fix); + return -1; + } + sym32_gen->st_value = sym32_fix->st_value; + sym32_gen->st_size = sym32_fix->st_size; + sym32_gen->st_info = sym32_fix->st_info; + sym32_gen->st_other = sym32_fix->st_other; + sym32_gen->st_shndx = sym32_fix->st_shndx; + + return 0; +} + +static int vdso_do_func_patch64(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64, + const char *orig, const char *fix) +{ + Elf64_Sym *sym64_gen, *sym64_fix; + + sym64_gen = find_symbol64(v64, orig); + if (sym64_gen == NULL) { + printk(KERN_ERR "vDSO64: Can't find symbol %s !\n", orig); + return -1; + } + sym64_fix = find_symbol64(v64, fix); + if (sym64_fix == NULL) { + printk(KERN_ERR "vDSO64: Can't find symbol %s !\n", fix); + return -1; + } + sym64_gen->st_value = sym64_fix->st_value; + sym64_gen->st_size = sym64_fix->st_size; + sym64_gen->st_info = sym64_fix->st_info; + sym64_gen->st_other = sym64_fix->st_other; + sym64_gen->st_shndx = sym64_fix->st_shndx; + + return 0; +} + +static __init int vdso_fixup_alt_funcs(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + u32 
pvr; + int i; + + pvr = mfspr(SPRN_PVR); + for (i = 0; i < ARRAY_SIZE(vdso_patches); i++) { + struct vdso_patch_def *patch = &vdso_patches[i]; + int match = (pvr & patch->pvr_mask) == patch->pvr_value; + + DBG("patch %d (mask: %x, pvr: %x) : %s\n", + i, patch->pvr_mask, patch->pvr_value, match ? "match" : "skip"); + + if (!match) + continue; + + DBG("replacing %s with %s...\n", patch->gen_name, patch->fix_name); + + /* + * Patch the 32 bits and 64 bits symbols. Note that we do not patch + * the "." symbol on 64 bits. It would be easy to do, but doesn't + * seem to be necessary, patching the OPD symbol is enough. + */ + vdso_do_func_patch32(v32, v64, patch->gen_name, patch->fix_name); + vdso_do_func_patch64(v32, v64, patch->gen_name, patch->fix_name); + } + + return 0; +} + + +static __init int vdso_setup(void) +{ + struct lib32_elfinfo v32; + struct lib64_elfinfo v64; + + v32.hdr = vdso32_kbase; + v64.hdr = vdso64_kbase; + + if (vdso_do_find_sections(&v32, &v64)) + return -1; + + if (vdso_fixup_datapage(&v32, &v64)) + return -1; + + if (vdso_fixup_alt_funcs(&v32, &v64)) + return -1; + + vdso_setup_trampolines(&v32, &v64); + + return 0; +} + +void __init vdso_init(void) +{ + int i; + + vdso64_pages = (&vdso64_end - &vdso64_start) >> PAGE_SHIFT; + vdso32_pages = (&vdso32_end - &vdso32_start) >> PAGE_SHIFT; + + DBG("vdso64_kbase: %p, 0x%x pages, vdso32_kbase: %p, 0x%x pages\n", + vdso64_kbase, vdso64_pages, vdso32_kbase, vdso32_pages); + + /* + * Initialize the vDSO images in memory, that is do necessary + * fixups of vDSO symbols, locate trampolines, etc... + */ + if (vdso_setup()) { + printk(KERN_ERR "vDSO setup failure, not enabled !\n"); + /* XXX should free pages here ? 
*/ + vdso64_pages = vdso32_pages = 0; + return; + } + + /* Make sure pages are in the correct state */ + for (i = 0; i < vdso64_pages; i++) { + struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE); + ClearPageReserved(pg); + get_page(pg); + } + for (i = 0; i < vdso32_pages; i++) { + struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE); + ClearPageReserved(pg); + get_page(pg); + } +} + +int in_gate_area_no_task(unsigned long addr) +{ + return 0; +} + +int in_gate_area(struct task_struct *task, unsigned long addr) +{ + return 0; +} + +struct vm_area_struct *get_gate_vma(struct task_struct *tsk) +{ + return NULL; +} + Index: linux-work/include/asm-ppc64/processor.h =================================================================== --- linux-work.orig/include/asm-ppc64/processor.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/processor.h 2005-01-31 16:25:56.000000000 +1100 @@ -544,8 +544,8 @@ /* This decides where the kernel will search for a free chunk of vm * space during mmap's. */ -#define TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(STACK_TOP_USER32 / 4)) -#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(STACK_TOP_USER64 / 4)) +#define TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(TASK_SIZE_USER32 / 4)) +#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(TASK_SIZE_USER64 / 4)) #define TASK_UNMAPPED_BASE ((test_thread_flag(TIF_32BIT)||(ppcdebugset(PPCDBG_BINFMT_32ADDR))) ? 
\ TASK_UNMAPPED_BASE_USER32 : TASK_UNMAPPED_BASE_USER64 ) @@ -562,7 +562,8 @@ double fpr[32]; /* Complete floating point set */ unsigned long fpscr; /* Floating point status (plus pad) */ unsigned long fpexc_mode; /* Floating-point exception mode */ - unsigned long pad[3]; /* was saved_msr, saved_softe */ + unsigned long pad[2]; /* was saved_msr, saved_softe */ + unsigned long vdso_base; /* base of the vDSO library */ #ifdef CONFIG_ALTIVEC /* Complete AltiVec register set */ vector128 vr[32] __attribute((aligned(16))); Index: linux-work/include/asm-ppc64/systemcfg.h =================================================================== --- linux-work.orig/include/asm-ppc64/systemcfg.h 2005-01-31 15:56:55.000000000 +1100 +++ linux-work/include/asm-ppc64/systemcfg.h 2005-01-31 16:25:56.000000000 +1100 @@ -20,10 +20,14 @@ * Minor version changes are a hint. */ #define SYSTEMCFG_MAJOR 1 -#define SYSTEMCFG_MINOR 0 +#define SYSTEMCFG_MINOR 1 #ifndef __ASSEMBLY__ +#include + +#define SYSCALL_MAP_SIZE ((__NR_syscalls + 31) / 32) + struct systemcfg { __u8 eye_catcher[16]; /* Eyecatcher: SYSTEMCFG:PPC64 0x00 */ struct { /* Systemcfg version numbers */ @@ -47,6 +51,8 @@ __u32 dcache_line_size; /* L1 d-cache line size 0x64 */ __u32 icache_size; /* L1 i-cache size 0x68 */ __u32 icache_line_size; /* L1 i-cache line size 0x6C */ + __u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of available syscalls 0x70 */ + __u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of available syscalls */ }; #ifdef __KERNEL__ Index: linux-work/include/asm-ppc64/a.out.h =================================================================== --- linux-work.orig/include/asm-ppc64/a.out.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/a.out.h 2005-01-31 16:25:56.000000000 +1100 @@ -30,14 +30,11 @@ #ifdef __KERNEL__ -#define STACK_TOP_USER64 (TASK_SIZE_USER64) +#define STACK_TOP_USER64 TASK_SIZE_USER64 +#define STACK_TOP_USER32 TASK_SIZE_USER32 -/* Give 32-bit user space a full 4G address space 
to live in. */ -#define STACK_TOP_USER32 (TASK_SIZE_USER32) - -#define STACK_TOP ((test_thread_flag(TIF_32BIT) || \ - (ppcdebugset(PPCDBG_BINFMT_32ADDR))) ? \ - STACK_TOP_USER32 : STACK_TOP_USER64) +#define STACK_TOP (test_thread_flag(TIF_32BIT) ? \ + STACK_TOP_USER32 : STACK_TOP_USER64) #endif /* __KERNEL__ */ Index: linux-work/include/asm-ppc64/elf.h =================================================================== --- linux-work.orig/include/asm-ppc64/elf.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/elf.h 2005-01-31 16:25:56.000000000 +1100 @@ -238,10 +238,20 @@ /* A special ignored type value for PPC, for glibc compatibility. */ #define AT_IGNOREPPC 22 +/* The vDSO location. We have to use the same value as x86 for glibc's + * sake :-) + */ +#define AT_SYSINFO_EHDR 33 + extern int dcache_bsize; extern int icache_bsize; extern int ucache_bsize; +/* We do have an arch_setup_additional_pages for vDSO matters */ +#define ARCH_HAS_SETUP_ADDITIONAL_PAGES +struct linux_binprm; +extern int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack); + /* * The requirements here are: * - keep the final alignment of sp (sp & 0xf) @@ -260,6 +270,8 @@ NEW_AUX_ENT(AT_DCACHEBSIZE, dcache_bsize); \ NEW_AUX_ENT(AT_ICACHEBSIZE, icache_bsize); \ NEW_AUX_ENT(AT_UCACHEBSIZE, ucache_bsize); \ + /* vDSO base */ \ + NEW_AUX_ENT(AT_SYSINFO_EHDR, current->thread.vdso_base); \ } while (0) /* PowerPC64 relocations defined by the ABIs */ Index: linux-work/include/asm-ppc64/time.h =================================================================== --- linux-work.orig/include/asm-ppc64/time.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/time.h 2005-01-31 16:25:56.000000000 +1100 @@ -43,10 +43,10 @@ struct gettimeofday_vars { unsigned long tb_to_xs; unsigned long stamp_xsec; + unsigned long tb_orig_stamp; }; struct gettimeofday_struct { - unsigned long tb_orig_stamp; unsigned long tb_ticks_per_sec; struct gettimeofday_vars 
vars[2]; struct gettimeofday_vars * volatile varp; Index: linux-work/fs/binfmt_elf.c =================================================================== --- linux-work.orig/fs/binfmt_elf.c 2005-01-31 14:18:24.000000000 +1100 +++ linux-work/fs/binfmt_elf.c 2005-01-31 16:25:56.000000000 +1100 @@ -772,6 +772,14 @@ goto out_free_dentry; } +#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES + retval = arch_setup_additional_pages(bprm, executable_stack); + if (retval < 0) { + send_sig(SIGKILL, current, 0); + goto out_free_dentry; + } +#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */ + current->mm->start_stack = bprm->p; /* Now we do a little grungy work by mmaping the ELF image into Index: linux-work/include/asm-ppc64/page.h =================================================================== --- linux-work.orig/include/asm-ppc64/page.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/page.h 2005-01-31 16:25:56.000000000 +1100 @@ -185,6 +185,9 @@ extern u64 ppc64_pft_size; /* Log 2 of page table size */ +/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */ +#define __HAVE_ARCH_GATE_AREA 1 + #endif /* __ASSEMBLY__ */ #ifdef MODULE Index: linux-work/include/asm-ppc64/vdso.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/include/asm-ppc64/vdso.h 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,83 @@ +#ifndef __PPC64_VDSO_H__ +#define __PPC64_VDSO_H__ + +#ifdef __KERNEL__ + +/* Default link addresses for the vDSOs */ +#define VDSO32_LBASE 0 +#define VDSO64_LBASE 0 + +/* Default map addresses */ +#define VDSO32_MBASE 0x100000 +#define VDSO64_MBASE 0x100000 + +#define VDSO_VERSION_STRING LINUX_2.6.11 + +/* Define if 64 bits VDSO has procedure descriptors */ +#undef VDS64_HAS_DESCRIPTORS + +#ifndef __ASSEMBLY__ + +extern unsigned int vdso64_pages; +extern unsigned int vdso32_pages; + +/* Offsets relative to thread->vdso_base */ +extern unsigned long vdso64_rt_sigtramp; +extern
unsigned long vdso32_sigtramp; +extern unsigned long vdso32_rt_sigtramp; + +extern void vdso_init(void); + +#else /* __ASSEMBLY__ */ + +#ifdef __VDSO64__ +#ifdef VDS64_HAS_DESCRIPTORS +#define V_FUNCTION_BEGIN(name) \ + .globl name; \ + .section ".opd","a"; \ + .align 3; \ + name: \ + .quad .name,.TOC.@tocbase,0; \ + .previous; \ + .globl .name; \ + .type .name,@function; \ + .name: \ + +#define V_FUNCTION_END(name) \ + .size .name,.-.name; + +#define V_LOCAL_FUNC(name) (.name) + +#else /* VDS64_HAS_DESCRIPTORS */ + +#define V_FUNCTION_BEGIN(name) \ + .globl name; \ + name: \ + +#define V_FUNCTION_END(name) \ + .size name,.-name; + +#define V_LOCAL_FUNC(name) (name) + +#endif /* VDS64_HAS_DESCRIPTORS */ +#endif /* __VDSO64__ */ + +#ifdef __VDSO32__ + +#define V_FUNCTION_BEGIN(name) \ + .globl name; \ + .type name,@function; \ + name: \ + +#define V_FUNCTION_END(name) \ + .size name,.-name; + +#define V_LOCAL_FUNC(name) (name) + +#endif /* __VDSO32__ */ + +#endif /* __ASSEMBLY__ */ + +#endif /* __KERNEL__ */ + +#endif /* __PPC64_VDSO_H__ */ Index: linux-work/arch/ppc64/mm/init.c =================================================================== --- linux-work.orig/arch/ppc64/mm/init.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/mm/init.c 2005-01-31 16:25:56.000000000 +1100 @@ -62,6 +62,7 @@ #include #include #include +#include int mem_init_done; unsigned long ioremap_bot = IMALLOC_BASE; @@ -743,6 +744,8 @@ #ifdef CONFIG_PPC_ISERIES iommu_vio_init(); #endif + /* Initialize the vDSO */ + vdso_init(); } /* Index: linux-work/arch/ppc64/kernel/vdso32/gettimeofday.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/gettimeofday.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,139 @@ +/* + * Userland implementation of gettimeofday() for 32 bits processes in a + * ppc64 kernel for use in the vDSO + * + * Copyright (C) 2004 Benjamin
Herrenschmidt (benh at kernel.crashing.org), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include +#include + + .text +/* + * Exact prototype of gettimeofday + * + * int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz); + * + */ +V_FUNCTION_BEGIN(__kernel_gettimeofday) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r10,r3 /* r10 saves tv */ + mr r11,r4 /* r11 saves tz */ + bl __get_datapage@local /* get data page */ + mr r9, r3 /* datapage ptr in r9 */ + bl __do_get_xsec@local /* get xsec from tb & kernel */ + bne- 2f /* out of line -> do syscall */ + + /* seconds are xsec >> 20 */ + rlwinm r5,r4,12,20,31 + rlwimi r5,r3,12,0,19 + stw r5,TVAL32_TV_SEC(r10) + + /* get remaining xsec and convert to usec. we scale + * up remaining xsec by 12 bits and get the top 32 bits + * of the multiplication + */ + rlwinm r5,r4,12,0,19 + lis r6,1000000@h + ori r6,r6,1000000@l + mulhwu r5,r5,r6 + stw r5,TVAL32_TV_USEC(r10) + + cmpli cr0,r11,0 /* check if tz is NULL */ + beq 1f + lwz r4,CFG_TZ_MINUTEWEST(r9)/* fill tz */ + lwz r5,CFG_TZ_DSTTIME(r9) + stw r4,TZONE_TZ_MINWEST(r11) + stw r5,TZONE_TZ_DSTTIME(r11) + +1: mtlr r12 + blr + +2: mr r3,r10 + mr r4,r11 + li r0,__NR_gettimeofday + sc + b 1b + .cfi_endproc +V_FUNCTION_END(__kernel_gettimeofday) + +/* + * This is the core of gettimeofday(); it returns the xsec + * value in r3 & r4 and expects the datapage ptr (non clobbered) + * in r9. clobbers r0,r4,r5,r6,r7,r8 +*/ +__do_get_xsec: + .cfi_startproc + /* Check for update count & load values. We use the low + * order 32 bits of the update count + */ +1: lwz r8,(CFG_TB_UPDATE_COUNT+4)(r9) + andi. r0,r8,1 /* pending update ?
loop */ + bne- 1b + xor r0,r8,r8 /* create dependency */ + add r9,r9,r0 + + /* Load orig stamp (offset to TB) */ + lwz r5,CFG_TB_ORIG_STAMP(r9) + lwz r6,(CFG_TB_ORIG_STAMP+4)(r9) + + /* Get a stable TB value */ +2: mftbu r3 + mftbl r4 + mftbu r0 + cmpl cr0,r3,r0 + bne- 2b + + /* Subtract tb orig stamp. If the high part is non-zero, we jump to the + * slow path which calls the syscall. If it's ok, then we have our 32 bits + * tb_ticks value in r7 + */ + subfc r7,r6,r4 + subfe. r0,r5,r3 + bne- 3f + + /* Load scale factor & do multiplication */ + lwz r5,CFG_TB_TO_XS(r9) /* load values */ + lwz r6,(CFG_TB_TO_XS+4)(r9) + mulhwu r4,r7,r5 + mulhwu r6,r7,r6 + mullw r0,r7,r5 + addc r6,r6,r0 + + /* At this point, we have the scaled xsec value in r4 + XER:CA + * we load & add the stamp since epoch + */ + lwz r5,CFG_STAMP_XSEC(r9) + lwz r6,(CFG_STAMP_XSEC+4)(r9) + adde r4,r4,r6 + addze r3,r5 + + /* We now have our result in r3,r4. We create a fake dependency + * on that result and re-check the counter + */ + xor r0,r4,r4 + add r9,r9,r0 + lwz r0,(CFG_TB_UPDATE_COUNT+4)(r9) + cmpl cr0,r8,r0 /* check if updated */ + bne- 1b + + /* Warning ! The caller expects CR:EQ to be set to indicate a + * successful calculation (so it won't fall back to the syscall + * method). We have overridden that CR bit in the counter check, + * but fortunately, the loop exit condition _is_ CR:EQ set, so + * we can exit safely here. If you change this code, be careful + * of that side effect. + */ +3: blr + .cfi_endproc Index: linux-work/arch/ppc64/kernel/vdso32/sigtramp.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/sigtramp.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,300 @@ +/* + * Signal trampolines for 32 bits processes in a ppc64 kernel for + * use in the vDSO + * + * Copyright (C) 2004 Benjamin Herrenschmidt (benh at kernel.crashing.org), IBM Corp.
+ * Copyright (C) 2004 Alan Modra (amodra at au.ibm.com), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include + + .text + +/* The nop here is a hack. The dwarf2 unwind routines subtract 1 from + the return address to get an address in the middle of the presumed + call instruction. Since we don't have a call here, we artificially + extend the range covered by the unwind info by adding a nop before + the real start. */ + nop +V_FUNCTION_BEGIN(__kernel_sigtramp32) +.Lsig_start = . - 4 + li r0,__NR_sigreturn + sc +.Lsig_end: +V_FUNCTION_END(__kernel_sigtramp32) + +.Lsigrt_start: + nop +V_FUNCTION_BEGIN(__kernel_sigtramp_rt32) + li r0,__NR_rt_sigreturn + sc +.Lsigrt_end: +V_FUNCTION_END(__kernel_sigtramp_rt32) + + .section .eh_frame,"a",@progbits + +/* Register r1 can be found at offset 4 of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. */ +#define cfa_save \ + .byte 0x0f; /* DW_CFA_def_cfa_expression */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 RSIZE; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ +9: + +/* Register REGNO can be found at offset OFS of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS.
*/ +#define rsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .ifne ofs; \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ + .endif; \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. The VMX reg struct is at offset VREGS of + the pt_regs struct. This macro is for REGNO == 0, and contains + 'subroutines' that the other macros jump to. */ +#define vsave_msr0(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit0 */ \ +2: \ + .byte 0x40; /* DW_OP_lit16 */ \ + .byte 0x1e; /* DW_OP_mul */ \ +3: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x12; /* DW_OP_dup */ \ + .byte 0x23; /* DW_OP_plus_uconst */ \ + .uleb128 33*RSIZE; /* msr offset */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x0c; .long 1 << 25; /* DW_OP_const4u */ \ + .byte 0x1a; /* DW_OP_and */ \ + .byte 0x12; /* DW_OP_dup, ret 0 if bra taken */ \ + .byte 0x30; /* DW_OP_lit0 */ \ + .byte 0x29; /* DW_OP_eq */ \ + .byte 0x28; .short 0x7fff; /* DW_OP_bra to end */ \ + .byte 0x13; /* DW_OP_drop, pop the 0 */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x22; /* DW_OP_plus */ \ + .byte 0x2f; .short 0x7fff; /* DW_OP_skip to end */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. REGNO is 1 thru 31. */ +#define vsave_msr1(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit n */ \ + .byte 0x2f; .short 2b - 9f; /* DW_OP_skip */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset OFS of + the VMX save block. 
*/ +#define vsave_msr2(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x0a; .short ofs; /* DW_OP_const2u */ \ + .byte 0x2f; .short 3b - 9f; /* DW_OP_skip */ \ +9: + +/* VMX register REGNO is at offset OFS of the VMX save area. */ +#define vsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ +9: + +/* This is where the pt_regs pointer can be found on the stack. */ +#define PTREGS 64+28 + +/* Size of regs. */ +#define RSIZE 4 + +/* This is the offset of the VMX regs. */ +#define VREGS 48*RSIZE+34*8 + +/* Describe where general purpose regs are saved. */ +#define EH_FRAME_GEN \ + cfa_save; \ + rsave ( 0, 0*RSIZE); \ + rsave ( 2, 2*RSIZE); \ + rsave ( 3, 3*RSIZE); \ + rsave ( 4, 4*RSIZE); \ + rsave ( 5, 5*RSIZE); \ + rsave ( 6, 6*RSIZE); \ + rsave ( 7, 7*RSIZE); \ + rsave ( 8, 8*RSIZE); \ + rsave ( 9, 9*RSIZE); \ + rsave (10, 10*RSIZE); \ + rsave (11, 11*RSIZE); \ + rsave (12, 12*RSIZE); \ + rsave (13, 13*RSIZE); \ + rsave (14, 14*RSIZE); \ + rsave (15, 15*RSIZE); \ + rsave (16, 16*RSIZE); \ + rsave (17, 17*RSIZE); \ + rsave (18, 18*RSIZE); \ + rsave (19, 19*RSIZE); \ + rsave (20, 20*RSIZE); \ + rsave (21, 21*RSIZE); \ + rsave (22, 22*RSIZE); \ + rsave (23, 23*RSIZE); \ + rsave (24, 24*RSIZE); \ + rsave (25, 25*RSIZE); \ + rsave (26, 26*RSIZE); \ + rsave (27, 27*RSIZE); \ + rsave (28, 28*RSIZE); \ + rsave (29, 29*RSIZE); \ + rsave (30, 30*RSIZE); \ + rsave (31, 31*RSIZE); \ + rsave (67, 32*RSIZE); /* ap, used as temp for nip */ \ + rsave (65, 36*RSIZE); /* lr */ \ + rsave (70, 38*RSIZE) /* cr */ + +/* Describe where the FP regs are saved. 
*/ +#define EH_FRAME_FP \ + rsave (32, 48*RSIZE + 0*8); \ + rsave (33, 48*RSIZE + 1*8); \ + rsave (34, 48*RSIZE + 2*8); \ + rsave (35, 48*RSIZE + 3*8); \ + rsave (36, 48*RSIZE + 4*8); \ + rsave (37, 48*RSIZE + 5*8); \ + rsave (38, 48*RSIZE + 6*8); \ + rsave (39, 48*RSIZE + 7*8); \ + rsave (40, 48*RSIZE + 8*8); \ + rsave (41, 48*RSIZE + 9*8); \ + rsave (42, 48*RSIZE + 10*8); \ + rsave (43, 48*RSIZE + 11*8); \ + rsave (44, 48*RSIZE + 12*8); \ + rsave (45, 48*RSIZE + 13*8); \ + rsave (46, 48*RSIZE + 14*8); \ + rsave (47, 48*RSIZE + 15*8); \ + rsave (48, 48*RSIZE + 16*8); \ + rsave (49, 48*RSIZE + 17*8); \ + rsave (50, 48*RSIZE + 18*8); \ + rsave (51, 48*RSIZE + 19*8); \ + rsave (52, 48*RSIZE + 20*8); \ + rsave (53, 48*RSIZE + 21*8); \ + rsave (54, 48*RSIZE + 22*8); \ + rsave (55, 48*RSIZE + 23*8); \ + rsave (56, 48*RSIZE + 24*8); \ + rsave (57, 48*RSIZE + 25*8); \ + rsave (58, 48*RSIZE + 26*8); \ + rsave (59, 48*RSIZE + 27*8); \ + rsave (60, 48*RSIZE + 28*8); \ + rsave (61, 48*RSIZE + 29*8); \ + rsave (62, 48*RSIZE + 30*8); \ + rsave (63, 48*RSIZE + 31*8) + +/* Describe where the VMX regs are saved. 
*/ +#ifdef CONFIG_ALTIVEC +#define EH_FRAME_VMX \ + vsave_msr0 ( 0); \ + vsave_msr1 ( 1); \ + vsave_msr1 ( 2); \ + vsave_msr1 ( 3); \ + vsave_msr1 ( 4); \ + vsave_msr1 ( 5); \ + vsave_msr1 ( 6); \ + vsave_msr1 ( 7); \ + vsave_msr1 ( 8); \ + vsave_msr1 ( 9); \ + vsave_msr1 (10); \ + vsave_msr1 (11); \ + vsave_msr1 (12); \ + vsave_msr1 (13); \ + vsave_msr1 (14); \ + vsave_msr1 (15); \ + vsave_msr1 (16); \ + vsave_msr1 (17); \ + vsave_msr1 (18); \ + vsave_msr1 (19); \ + vsave_msr1 (20); \ + vsave_msr1 (21); \ + vsave_msr1 (22); \ + vsave_msr1 (23); \ + vsave_msr1 (24); \ + vsave_msr1 (25); \ + vsave_msr1 (26); \ + vsave_msr1 (27); \ + vsave_msr1 (28); \ + vsave_msr1 (29); \ + vsave_msr1 (30); \ + vsave_msr1 (31); \ + vsave_msr2 (33, 32*16+12); \ + vsave (32, 32*16) +#else +#define EH_FRAME_VMX +#endif + +.Lcie: + .long .Lcie_end - .Lcie_start +.Lcie_start: + .long 0 /* CIE ID */ + .byte 1 /* Version number */ + .string "zR" /* NUL-terminated augmentation string */ + .uleb128 4 /* Code alignment factor */ + .sleb128 -4 /* Data alignment factor */ + .byte 67 /* Return address register column, ap */ + .uleb128 1 /* Augmentation value length */ + .byte 0x1b /* DW_EH_PE_pcrel | DW_EH_PE_sdata4. */ + .byte 0x0c,1,0 /* DW_CFA_def_cfa: r1 ofs 0 */ + .balign 4 +.Lcie_end: + + .long .Lfde0_end - .Lfde0_start +.Lfde0_start: + .long .Lfde0_start - .Lcie /* CIE pointer. */ + .long .Lsig_start - . /* PC start, length */ + .long .Lsig_end - .Lsig_start + .uleb128 0 /* Augmentation */ + EH_FRAME_GEN + EH_FRAME_FP + EH_FRAME_VMX + .balign 4 +.Lfde0_end: + +/* We have a different stack layout for rt_sigreturn. */ +#undef PTREGS +#define PTREGS 64+16+128+20+28 + + .long .Lfde1_end - .Lfde1_start +.Lfde1_start: + .long .Lfde1_start - .Lcie /* CIE pointer. */ + .long .Lsigrt_start - . 
/* PC start, length */ + .long .Lsigrt_end - .Lsigrt_start + .uleb128 0 /* Augmentation */ + EH_FRAME_GEN + EH_FRAME_FP + EH_FRAME_VMX + .balign 4 +.Lfde1_end: Index: linux-work/arch/ppc64/kernel/vdso32/vdso32_wrapper.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/vdso32_wrapper.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,12 @@ +#include + + .section ".data" + + .globl vdso32_start, vdso32_end + .balign 4096 +vdso32_start: + .incbin "arch/ppc64/kernel/vdso32/vdso32.so" + .balign 4096 +vdso32_end: + + .previous Index: linux-work/arch/ppc64/kernel/vdso64/vdso64.lds.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/vdso64.lds.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,110 @@ +/* + * This is the infamous ld script for the 64 bits vdso + * library + */ +#include + +OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc") +OUTPUT_ARCH(powerpc:common64) +ENTRY(_start) + +SECTIONS +{ + . = VDSO64_LBASE + SIZEOF_HEADERS; + .hash : { *(.hash) } :text + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + + . 
= ALIGN (16); + .text : + { + *(.text .stub .text.* .gnu.linkonce.t.*) + *(.sfpr .glink) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + + /* Other stuff is appended to the text segment: */ + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr + .eh_frame : { KEEP (*(.eh_frame)) } :text + .gcc_except_table : { *(.gcc_except_table) } + + .opd ALIGN(8) : { KEEP (*(.opd)) } + .got ALIGN(8) : { *(.got .toc) } + .rela.dyn ALIGN(8) : { *(.rela.dyn) } + + .dynamic : { *(.dynamic) } :text :dynamic + + _end = .; + PROVIDE (end = .); + + /* Stabs debugging sections are here too + */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sectio/ns. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + + /DISCARD/ : { *(.note.GNU-stack) } + /DISCARD/ : { *(.branch_lt) } + /DISCARD/ : { *(.data .data.* .gnu.linkonce.d.*) } + /DISCARD/ : { *(.bss .sbss .dynbss .dynsbss) } +} + +PHDRS +{ + text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */ + dynamic PT_DYNAMIC FLAGS(4); /* PF_R */ + eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */ +} + +/* + * This controls what symbols we export from the DSO. 
+ */ +VERSION +{ + VDSO_VERSION_STRING { + global: + __kernel_datapage_offset; /* Has to be there for the kernel to find it */ + __kernel_get_syscall_map; + __kernel_gettimeofday; + __kernel_sync_dicache; + __kernel_sync_dicache_p5; + __kernel_sigtramp_rt64; + local: *; + }; +} Index: linux-work/arch/ppc64/kernel/vdso64/vdso64_wrapper.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/vdso64_wrapper.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,12 @@ +#include + + .section ".data" + + .globl vdso64_start, vdso64_end + .balign 4096 +vdso64_start: + .incbin "arch/ppc64/kernel/vdso64/vdso64.so" + .balign 4096 +vdso64_end: + + .previous Index: linux-work/arch/ppc64/kernel/vdso32/datapage.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/datapage.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,68 @@ +/* + * Access to the shared data page by the vDSO & syscall map + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include + + .text +V_FUNCTION_BEGIN(__get_datapage) + .cfi_startproc + /* We don't want that exposed or overridable as we want other objects + * to be able to bl directly to here + */ + .protected __get_datapage + .hidden __get_datapage + + mflr r0 + .cfi_register lr,r0 + + bcl 20,31,1f + .global __kernel_datapage_offset; +__kernel_datapage_offset: + .long 0 +1: + mflr r3 + mtlr r0 + lwz r0,0(r3) + add r3,r0,r3 + blr + .cfi_endproc +V_FUNCTION_END(__get_datapage) + +/* + * void *__kernel_get_syscall_map(unsigned int *syscall_count) ; + * + * returns a pointer to the syscall map. the map is agnostic to the + * size of "long", unlike kernel bitops, it stores bits from top to + * bottom so that memory actually contains a linear bitmap + * check for syscall N by testing bit (0x80000000 >> (N & 0x1f)) of + * 32 bits int at N >> 5. + */ +V_FUNCTION_BEGIN(__kernel_get_syscall_map) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r4,r3 + bl __get_datapage@local + mtlr r12 + addi r3,r3,CFG_SYSCALL_MAP32 + cmpli cr0,r4,0 + beqlr + li r0,__NR_syscalls + stw r0,0(r4) + blr + .cfi_endproc +V_FUNCTION_END(__kernel_get_syscall_map) Index: linux-work/arch/ppc64/kernel/vdso32/Makefile =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/Makefile 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,50 @@ +# Choose compiler +# +# XXX FIXME: We probably want to enforce using a biarch compiler by default +# and thus use (CC) with -m64, while letting the user pass a +# CROSS32_COMPILE prefix if wanted.
Same goes for the zImage +# wrappers +# + +CROSS32_COMPILE ?= + +CROSS32CC := $(CROSS32_COMPILE)gcc +CROSS32AS := $(CROSS32_COMPILE)as + +# List of files in the vdso, has to be asm only for now + +src-vdso32 = sigtramp.S gettimeofday.S datapage.S cacheflush.S + +# Build rules + +obj-vdso32 := $(addsuffix .o, $(basename $(src-vdso32))) +targets := $(obj-vdso32) vdso32.so +obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32)) +src-vdso32 := $(addprefix $(src)/, $(src-vdso32)) + + +EXTRA_CFLAGS := -shared -s -fno-common -fno-builtin +EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso32.so.1 +EXTRA_AFLAGS := -D__VDSO32__ -s + +obj-y += vdso32_wrapper.o +extra-y += vdso32.lds +CPPFLAGS_vdso32.lds += -P -C -U$(ARCH) + +# Force dependency (incbin is bad) +$(obj)/vdso32_wrapper.o : $(obj)/vdso32.so + +# link rule for the .so file, .lds has to be first +$(obj)/vdso32.so: $(src)/vdso32.lds $(obj-vdso32) + $(call if_changed,vdso32ld) + +# assembly rules for the .S files +$(obj-vdso32): %.o: %.S + $(call if_changed_dep,vdso32as) + +# actual build commands +quiet_cmd_vdso32ld = VDSO32L $@ + cmd_vdso32ld = $(CROSS32CC) $(c_flags) -Wl,-T $^ -o $@ +quiet_cmd_vdso32as = VDSO32A $@ + cmd_vdso32as = $(CROSS32CC) $(a_flags) -c -o $@ $< + Index: linux-work/arch/ppc64/kernel/vdso64/gettimeofday.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/gettimeofday.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,91 @@ +/* + * Userland implementation of gettimeofday() for 64 bits processes in a + * ppc64 kernel for use in the vDSO + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), + * IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ +#include +#include +#include +#include +#include + + .text +/* + * Exact prototype of gettimeofday + * + * int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz); + * + */ +V_FUNCTION_BEGIN(__kernel_gettimeofday) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r11,r3 /* r11 holds tv */ + mr r10,r4 /* r10 holds tz */ + bl V_LOCAL_FUNC(__get_datapage) /* get data page */ + bl V_LOCAL_FUNC(__do_get_xsec) /* get xsec from tb & kernel */ + lis r7,15 /* r7 = 1000000 = USEC_PER_SEC */ + ori r7,r7,16960 + rldicl r5,r4,44,20 /* r5 = sec = xsec / XSEC_PER_SEC */ + rldicr r6,r5,20,43 /* r6 = sec * XSEC_PER_SEC */ + std r5,TVAL64_TV_SEC(r11) /* store sec in tv */ + subf r0,r6,r4 /* r0 = xsec = (xsec - r6) */ + mulld r0,r0,r7 /* usec = (xsec * USEC_PER_SEC) / XSEC_PER_SEC */ + rldicl r0,r0,44,20 + cmpldi cr0,r10,0 /* check if tz is NULL */ + std r0,TVAL64_TV_USEC(r11) /* store usec in tv */ + beq 1f + lwz r4,CFG_TZ_MINUTEWEST(r3)/* fill tz */ + lwz r5,CFG_TZ_DSTTIME(r3) + stw r4,TZONE_TZ_MINWEST(r10) + stw r5,TZONE_TZ_DSTTIME(r10) +1: mtlr r12 + li r3,0 /* always success */ + blr + .cfi_endproc +V_FUNCTION_END(__kernel_gettimeofday) + + +/* + * This is the core of gettimeofday(), it returns the xsec + * value in r4 and expects the datapage ptr (non clobbered) + * in r3. clobbers r0,r4,r5,r6,r7,r8 +*/ +V_FUNCTION_BEGIN(__do_get_xsec) + .cfi_startproc + /* check for update count & load values */ +1: ld r7,CFG_TB_UPDATE_COUNT(r3) + andi. r0,r4,1 /* pending update ? 
loop */ + bne- 1b + xor r0,r4,r4 /* create dependency */ + add r3,r3,r0 + + /* Get TB & offset it */ + mftb r8 + ld r9,CFG_TB_ORIG_STAMP(r3) + subf r8,r9,r8 + + /* Scale result */ + ld r5,CFG_TB_TO_XS(r3) + mulhdu r8,r8,r5 + + /* Add stamp since epoch */ + ld r6,CFG_STAMP_XSEC(r3) + add r4,r6,r8 + + xor r0,r4,r4 + add r3,r3,r0 + ld r0,CFG_TB_UPDATE_COUNT(r3) + cmpld cr0,r0,r7 /* check if updated */ + bne- 1b + blr + .cfi_endproc +V_FUNCTION_END(__do_get_xsec) Index: linux-work/arch/ppc64/kernel/vdso64/datapage.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/datapage.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,68 @@ +/* + * Access to the shared data page by the vDSO & syscall map + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include + + .text +V_FUNCTION_BEGIN(__get_datapage) + .cfi_startproc + /* We don't want that exposed or overridable as we want other objects + * to be able to bl directly to here + */ + .protected __get_datapage + .hidden __get_datapage + + mflr r0 + .cfi_register lr,r0 + + bcl 20,31,1f + .global __kernel_datapage_offset; +__kernel_datapage_offset: + .long 0 +1: + mflr r3 + mtlr r0 + lwz r0,0(r3) + add r3,r0,r3 + blr + .cfi_endproc +V_FUNCTION_END(__get_datapage) + +/* + * void *__kernel_get_syscall_map(unsigned int *syscall_count) ; + * + * returns a pointer to the syscall map. 
the map is agnostic to the + * size of "long", unlike kernel bitops, it stores bits from top to + * bottom so that memory actually contains a linear bitmap + * check for syscall N by testing bit (0x80000000 >> (N & 0x1f)) of + * 32 bits int at N >> 5. + */ +V_FUNCTION_BEGIN(__kernel_get_syscall_map) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r4,r3 + bl V_LOCAL_FUNC(__get_datapage) + mtlr r12 + addi r3,r3,CFG_SYSCALL_MAP64 + cmpli cr0,r4,0 + beqlr + li r0,__NR_syscalls + stw r0,0(r4) + blr + .cfi_endproc +V_FUNCTION_END(__kernel_get_syscall_map) Index: linux-work/arch/ppc64/kernel/vdso64/sigtramp.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/sigtramp.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,294 @@ +/* + * Signal trampoline for 64 bits processes in a ppc64 kernel for + * use in the vDSO + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), IBM Corp. + * Copyright (C) 2004 Alan Modra (amodra at au.ibm.com)), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include + + .text + +/* The nop here is a hack. The dwarf2 unwind routines subtract 1 from + the return address to get an address in the middle of the presumed + call instruction. Since we don't have a call here, we artifically + extend the range covered by the unwind info by padding before the + real start. */ + nop + .balign 8 +V_FUNCTION_BEGIN(__kernel_sigtramp_rt64) +.Lsigrt_start = . 
- 4 + addi r1, r1, __SIGNAL_FRAMESIZE + li r0,__NR_rt_sigreturn + sc +.Lsigrt_end: +V_FUNCTION_END(__kernel_sigtramp_rt64) +/* The ".balign 8" above and the following zeros mimic the old stack + trampoline layout. The last magic value is the ucontext pointer, + chosen in such a way that older libgcc unwind code returns a zero + for a sigcontext pointer. */ + .long 0,0,0 + .quad 0,-21*8 + +/* Register r1 can be found at offset 8 of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. */ +#define cfa_save \ + .byte 0x0f; /* DW_CFA_def_cfa_expression */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 RSIZE; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ +9: + +/* Register REGNO can be found at offset OFS of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. */ +#define rsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .ifne ofs; \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ + .endif; \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. A pointer to the VMX reg struct is at VREGS in + the pt_regs struct. This macro is for REGNO == 0, and contains + 'subroutines' that the other macros jump to. 
*/ +#define vsave_msr0(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit0 */ \ +2: \ + .byte 0x40; /* DW_OP_lit16 */ \ + .byte 0x1e; /* DW_OP_mul */ \ +3: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x12; /* DW_OP_dup */ \ + .byte 0x23; /* DW_OP_plus_uconst */ \ + .uleb128 33*RSIZE; /* msr offset */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x0c; .long 1 << 25; /* DW_OP_const4u */ \ + .byte 0x1a; /* DW_OP_and */ \ + .byte 0x12; /* DW_OP_dup, ret 0 if bra taken */ \ + .byte 0x30; /* DW_OP_lit0 */ \ + .byte 0x29; /* DW_OP_eq */ \ + .byte 0x28; .short 0x7fff; /* DW_OP_bra to end */ \ + .byte 0x13; /* DW_OP_drop, pop the 0 */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x22; /* DW_OP_plus */ \ + .byte 0x2f; .short 0x7fff; /* DW_OP_skip to end */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. REGNO is 1 thru 31. */ +#define vsave_msr1(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit n */ \ + .byte 0x2f; .short 2b - 9f; /* DW_OP_skip */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset OFS of + the VMX save block. */ +#define vsave_msr2(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x0a; .short ofs; /* DW_OP_const2u */ \ + .byte 0x2f; .short 3b - 9f; /* DW_OP_skip */ \ +9: + +/* VMX register REGNO is at offset OFS of the VMX save area. 
*/ +#define vsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ +9: + +/* This is where the pt_regs pointer can be found on the stack. */ +#define PTREGS 128+168+56 + +/* Size of regs. */ +#define RSIZE 8 + +/* This is the offset of the VMX reg pointer. */ +#define VREGS 48*RSIZE+33*8 + +/* Describe where general purpose regs are saved. */ +#define EH_FRAME_GEN \ + cfa_save; \ + rsave ( 0, 0*RSIZE); \ + rsave ( 2, 2*RSIZE); \ + rsave ( 3, 3*RSIZE); \ + rsave ( 4, 4*RSIZE); \ + rsave ( 5, 5*RSIZE); \ + rsave ( 6, 6*RSIZE); \ + rsave ( 7, 7*RSIZE); \ + rsave ( 8, 8*RSIZE); \ + rsave ( 9, 9*RSIZE); \ + rsave (10, 10*RSIZE); \ + rsave (11, 11*RSIZE); \ + rsave (12, 12*RSIZE); \ + rsave (13, 13*RSIZE); \ + rsave (14, 14*RSIZE); \ + rsave (15, 15*RSIZE); \ + rsave (16, 16*RSIZE); \ + rsave (17, 17*RSIZE); \ + rsave (18, 18*RSIZE); \ + rsave (19, 19*RSIZE); \ + rsave (20, 20*RSIZE); \ + rsave (21, 21*RSIZE); \ + rsave (22, 22*RSIZE); \ + rsave (23, 23*RSIZE); \ + rsave (24, 24*RSIZE); \ + rsave (25, 25*RSIZE); \ + rsave (26, 26*RSIZE); \ + rsave (27, 27*RSIZE); \ + rsave (28, 28*RSIZE); \ + rsave (29, 29*RSIZE); \ + rsave (30, 30*RSIZE); \ + rsave (31, 31*RSIZE); \ + rsave (67, 32*RSIZE); /* ap, used as temp for nip */ \ + rsave (65, 36*RSIZE); /* lr */ \ + rsave (70, 38*RSIZE) /* cr */ + +/* Describe where the FP regs are saved. 
*/ +#define EH_FRAME_FP \ + rsave (32, 48*RSIZE + 0*8); \ + rsave (33, 48*RSIZE + 1*8); \ + rsave (34, 48*RSIZE + 2*8); \ + rsave (35, 48*RSIZE + 3*8); \ + rsave (36, 48*RSIZE + 4*8); \ + rsave (37, 48*RSIZE + 5*8); \ + rsave (38, 48*RSIZE + 6*8); \ + rsave (39, 48*RSIZE + 7*8); \ + rsave (40, 48*RSIZE + 8*8); \ + rsave (41, 48*RSIZE + 9*8); \ + rsave (42, 48*RSIZE + 10*8); \ + rsave (43, 48*RSIZE + 11*8); \ + rsave (44, 48*RSIZE + 12*8); \ + rsave (45, 48*RSIZE + 13*8); \ + rsave (46, 48*RSIZE + 14*8); \ + rsave (47, 48*RSIZE + 15*8); \ + rsave (48, 48*RSIZE + 16*8); \ + rsave (49, 48*RSIZE + 17*8); \ + rsave (50, 48*RSIZE + 18*8); \ + rsave (51, 48*RSIZE + 19*8); \ + rsave (52, 48*RSIZE + 20*8); \ + rsave (53, 48*RSIZE + 21*8); \ + rsave (54, 48*RSIZE + 22*8); \ + rsave (55, 48*RSIZE + 23*8); \ + rsave (56, 48*RSIZE + 24*8); \ + rsave (57, 48*RSIZE + 25*8); \ + rsave (58, 48*RSIZE + 26*8); \ + rsave (59, 48*RSIZE + 27*8); \ + rsave (60, 48*RSIZE + 28*8); \ + rsave (61, 48*RSIZE + 29*8); \ + rsave (62, 48*RSIZE + 30*8); \ + rsave (63, 48*RSIZE + 31*8) + +/* Describe where the VMX regs are saved. 
*/ +#ifdef CONFIG_ALTIVEC +#define EH_FRAME_VMX \ + vsave_msr0 ( 0); \ + vsave_msr1 ( 1); \ + vsave_msr1 ( 2); \ + vsave_msr1 ( 3); \ + vsave_msr1 ( 4); \ + vsave_msr1 ( 5); \ + vsave_msr1 ( 6); \ + vsave_msr1 ( 7); \ + vsave_msr1 ( 8); \ + vsave_msr1 ( 9); \ + vsave_msr1 (10); \ + vsave_msr1 (11); \ + vsave_msr1 (12); \ + vsave_msr1 (13); \ + vsave_msr1 (14); \ + vsave_msr1 (15); \ + vsave_msr1 (16); \ + vsave_msr1 (17); \ + vsave_msr1 (18); \ + vsave_msr1 (19); \ + vsave_msr1 (20); \ + vsave_msr1 (21); \ + vsave_msr1 (22); \ + vsave_msr1 (23); \ + vsave_msr1 (24); \ + vsave_msr1 (25); \ + vsave_msr1 (26); \ + vsave_msr1 (27); \ + vsave_msr1 (28); \ + vsave_msr1 (29); \ + vsave_msr1 (30); \ + vsave_msr1 (31); \ + vsave_msr2 (33, 32*16+12); \ + vsave (32, 33*16) +#else +#define EH_FRAME_VMX +#endif + + .section .eh_frame,"a",@progbits +.Lcie: + .long .Lcie_end - .Lcie_start +.Lcie_start: + .long 0 /* CIE ID */ + .byte 1 /* Version number */ + .string "zR" /* NUL-terminated augmentation string */ + .uleb128 4 /* Code alignment factor */ + .sleb128 -8 /* Data alignment factor */ + .byte 67 /* Return address register column, ap */ + .uleb128 1 /* Augmentation value length */ + .byte 0x14 /* DW_EH_PE_pcrel | DW_EH_PE_udata8. */ + .byte 0x0c,1,0 /* DW_CFA_def_cfa: r1 ofs 0 */ + .balign 8 +.Lcie_end: + + .long .Lfde0_end - .Lfde0_start +.Lfde0_start: + .long .Lfde0_start - .Lcie /* CIE pointer. */ + .quad .Lsigrt_start - . /* PC start, length */ + .quad .Lsigrt_end - .Lsigrt_start + .uleb128 0 /* Augmentation */ + EH_FRAME_GEN + EH_FRAME_FP + EH_FRAME_VMX +# Do we really need to describe the frame at this point? ie. will +# we ever have some call chain that returns somewhere past the addi? +# I don't think so, since gcc doesn't support async signals.
+# .byte 0x41 /* DW_CFA_advance_loc 1*4 */ +#undef PTREGS +#define PTREGS 168+56 +# EH_FRAME_GEN +# EH_FRAME_FP +# EH_FRAME_VMX + .balign 8 +.Lfde0_end: Index: linux-work/arch/ppc64/kernel/vdso64/Makefile =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/Makefile 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,37 @@ +# List of files in the vdso, has to be asm only for now + +src-vdso64 = sigtramp.S gettimeofday.S datapage.S cacheflush.S + +# Build rules + +obj-vdso64 := $(addsuffix .o, $(basename $(src-vdso64))) +targets := $(obj-vdso64) vdso64.so +obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64)) +src-vdso64 := $(addprefix $(src)/, $(src-vdso64)) + +EXTRA_CFLAGS := -shared -s -fno-common -fno-builtin +EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso64.so.1 +EXTRA_AFLAGS := -D__VDSO64__ -s + +obj-y += vdso64_wrapper.o +extra-y += vdso64.lds +CPPFLAGS_vdso64.lds += -P -C -U$(ARCH) + +# Force dependency (incbin is bad) +$(obj)/vdso64_wrapper.o : $(obj)/vdso64.so + +# link rule for the .so file, .lds has to be first +$(obj)/vdso64.so: $(src)/vdso64.lds $(obj-vdso64) + $(call if_changed,vdso64ld) + +# assembly rules for the .S files +$(obj-vdso64): %.o: %.S + $(call if_changed_dep,vdso64as) + +# actual build commands +quiet_cmd_vdso64ld = VDSO64L $@ + cmd_vdso64ld = $(CC) $(c_flags) -Wl,-T $^ -o $@ +quiet_cmd_vdso64as = VDSO64A $@ + cmd_vdso64as = $(CC) $(a_flags) -c -o $@ $< + + Index: linux-work/arch/ppc64/kernel/vdso32/vdso32.lds.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/vdso32.lds.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,111 @@ + +/* + * This is the infamous ld script for the 32 bits vdso + * library + */ +#include + +/* Default link addresses for the vDSOs */ +OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc") 
+OUTPUT_ARCH(powerpc:common) +ENTRY(_start) + +SECTIONS +{ + . = VDSO32_LBASE + SIZEOF_HEADERS; + .hash : { *(.hash) } :text + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + + . = ALIGN (16); + .text : + { + *(.text .stub .text.* .gnu.linkonce.t.*) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + + /* Other stuff is appended to the text segment: */ + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + + .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr + .eh_frame : { KEEP (*(.eh_frame)) } :text + .gcc_except_table : { *(.gcc_except_table) } + .fixup : { *(.fixup) } + + .got ALIGN(4) : { *(.got.plt) *(.got) } + + .dynamic : { *(.dynamic) } :text :dynamic + + _end = .; + __end = .; + PROVIDE (end = .); + + + /* Stabs debugging sections are here too + */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + + /DISCARD/ : { *(.note.GNU-stack) } + /DISCARD/ : { *(.data .data.* .gnu.linkonce.d.* 
.sdata*) }
+ /DISCARD/ : { *(.bss .sbss .dynbss .dynsbss) }
+}
+
+
+PHDRS
+{
+ text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */
+ dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
+ eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */
+}
+
+
+/*
+ * This controls what symbols we export from the DSO.
+ */
+VERSION
+{
+ VDSO_VERSION_STRING {
+ global:
+ __kernel_datapage_offset; /* Has to be there for the kernel to find it */
+ __kernel_get_syscall_map;
+ __kernel_gettimeofday;
+ __kernel_sync_dicache;
+ __kernel_sync_dicache_p5;
+ __kernel_sigtramp32;
+ __kernel_sigtramp_rt32;
+ local: *;
+ };
+}
Index: linux-work/arch/ppc64/kernel/vdso32/cacheflush.S
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso32/cacheflush.S 2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,65 @@
+/*
+ * vDSO provided cache flush routines
+ *
+ * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org),
+ * IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include
+#include
+#include
+#include
+#include
+
+ .text
+
+/*
+ * Default "generic" version of __kernel_sync_dicache.
+ *
+ * void __kernel_sync_dicache(unsigned long start, unsigned long end)
+ *
+ * Flushes the data cache & invalidate the instruction cache for the
+ * provided range [start, end[
+ *
+ * Note: all CPUs supported by this kernel have a 128 bytes cache
+ * line size so we don't have to peek that info from the datapage
+ */
+V_FUNCTION_BEGIN(__kernel_sync_dicache)
+ .cfi_startproc
+ li r5,127
+ andc r6,r3,r5 /* round low to line bdy */
+ subf r8,r6,r4 /* compute length */
+ add r8,r8,r5 /* ensure we get enough */
+ srwi. r8,r8,7 /* compute line count */
+ beqlr /* nothing to do? */
+ mtctr r8
+ mr r3,r6
+1: dcbst 0,r3
+ addi r3,r3,128
+ bdnz 1b
+ sync
+ mtctr r8
+1: icbi 0,r6
+ addi r6,r6,128
+ bdnz 1b
+ isync
+ blr
+ .cfi_endproc
+V_FUNCTION_END(__kernel_sync_dicache)
+
+
+/*
+ * POWER5 version of __kernel_sync_dicache
+ */
+V_FUNCTION_BEGIN(__kernel_sync_dicache_p5)
+ .cfi_startproc
+ sync
+ isync
+ blr
+ .cfi_endproc
+V_FUNCTION_END(__kernel_sync_dicache_p5)
+
Index: linux-work/arch/ppc64/kernel/vdso64/cacheflush.S
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-work/arch/ppc64/kernel/vdso64/cacheflush.S 2005-01-31 16:25:56.000000000 +1100
@@ -0,0 +1,64 @@
+/*
+ * vDSO provided cache flush routines
+ *
+ * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org),
+ * IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include
+#include
+#include
+#include
+#include
+
+ .text
+
+/*
+ * Default "generic" version of __kernel_sync_dicache.
+ *
+ * void __kernel_sync_dicache(unsigned long start, unsigned long end)
+ *
+ * Flushes the data cache & invalidate the instruction cache for the
+ * provided range [start, end[
+ *
+ * Note: all CPUs supported by this kernel have a 128 bytes cache
+ * line size so we don't have to peek that info from the datapage
+ */
+V_FUNCTION_BEGIN(__kernel_sync_dicache)
+ .cfi_startproc
+ li r5,127
+ andc r6,r3,r5 /* round low to line bdy */
+ subf r8,r6,r4 /* compute length */
+ add r8,r8,r5 /* ensure we get enough */
+ srwi. r8,r8,7 /* compute line count */
+ beqlr /* nothing to do? */
+ mtctr r8
+ mr r3,r6
+1: dcbst 0,r3
+ addi r3,r3,128
+ bdnz 1b
+ sync
+ mtctr r8
+1: icbi 0,r6
+ addi r6,r6,128
+ bdnz 1b
+ isync
+ blr
+ .cfi_endproc
+V_FUNCTION_END(__kernel_sync_dicache)
+
+
+/*
+ * POWER5 version of __kernel_sync_dicache
+ */
+V_FUNCTION_BEGIN(__kernel_sync_dicache_p5)
+ .cfi_startproc
+ sync
+ isync
+ blr
+ .cfi_endproc
+V_FUNCTION_END(__kernel_sync_dicache_p5)
Index: linux-work/arch/ppc64/kernel/head.S
===================================================================
--- linux-work.orig/arch/ppc64/kernel/head.S 2005-01-31 16:19:44.000000000 +1100
+++ linux-work/arch/ppc64/kernel/head.S 2005-01-31 16:25:56.000000000 +1100
@@ -54,7 +54,6 @@
  * 0x0100 - 0x2fff : pSeries Interrupt prologs
  * 0x3000 - 0x3fff : Interrupt support
  * 0x4000 - 0x4fff : NACA
- * 0x5000 - 0x5fff : SystemCfg
  * 0x6000 : iSeries and common interrupt prologs
  * 0x9000 - 0x9fff : Initial segment table
  */

From sfr at canb.auug.org.au Mon Jan 31 17:39:44 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Mon, 31 Jan 2005 17:39:44 +1100
Subject: [PATCH] replace last usage of vio dma mapping routines
Message-ID: <20050131173944.7aa7f206.sfr@canb.auug.org.au>

Hi all,

This patch just replaces the last usage of the vio dma mapping routines
with the equivalent generic dma mapping routines.
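The new call sites pass &adapter->vdev->dev, which shows that each VIO device embeds a generic struct device. The generic routines can reach the VIO-specific mapping code from that embedded device, and this is presumably why the conversion is "equivalent". The userspace model below is only an illustration. It assumes the ppc64 generic DMA layer chooses the VIO path by checking the device's bus type, then recovers the vio_dev with container_of. The simplified structs and the names vio_map_single_stub, generic_map_single and demo_conversion are hypothetical. They are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified userspace stand-ins for kernel types.
 * Only the embedding of a generic struct device inside struct vio_dev
 * mirrors the patch (&adapter->vdev->dev); the rest is illustrative.
 */
typedef unsigned long dma_addr_t;

struct bus_type { const char *name; };
static struct bus_type vio_bus_type = { "vio" };

struct device {
	struct bus_type *bus;	/* which bus this device sits on */
};

struct vio_dev {
	unsigned int unit_address;
	struct device dev;	/* generic device embedded in the VIO device */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct vio_dev *to_vio_dev(struct device *d)
{
	return container_of(d, struct vio_dev, dev);
}

/* Hypothetical VIO-specific mapper: returns a fake bus address. */
dma_addr_t vio_map_single_stub(struct vio_dev *vdev, void *cpu_addr, size_t size)
{
	(void)cpu_addr;	/* a real mapper would set up an IOMMU entry here */
	return 0x10000000UL + (dma_addr_t)vdev->unit_address * 0x1000UL + size;
}

/* Hypothetical generic entry point: pick the bus-specific mapper from dev->bus. */
dma_addr_t generic_map_single(struct device *dev, void *cpu_addr, size_t size)
{
	if (dev->bus == &vio_bus_type)
		return vio_map_single_stub(to_vio_dev(dev), cpu_addr, size);
	return 0;	/* unknown bus: treated as an error in this model */
}

/* Old call style (vio_dev) versus new call style (&vdev->dev). */
void demo_conversion(void)
{
	char buf[64];
	struct vio_dev vdev = { .unit_address = 2, .dev = { .bus = &vio_bus_type } };

	dma_addr_t before = vio_map_single_stub(&vdev, buf, sizeof buf);
	dma_addr_t after = generic_map_single(&vdev.dev, buf, sizeof buf);

	assert(before == after);
}
```

In this model both call styles end up in the same bus-specific mapper. The driver uses one portable API, and the bus layer picks the implementation.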
Signed-off-by: Stephen Rothwell

--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruNp linus-bk/drivers/net/ibmveth.c linus-bk-vio.1/drivers/net/ibmveth.c
--- linus-bk/drivers/net/ibmveth.c 2004-12-08 04:06:06.000000000 +1100
+++ linus-bk-vio.1/drivers/net/ibmveth.c 2005-01-31 16:45:28.000000000 +1100
@@ -218,7 +218,8 @@ static void ibmveth_replenish_buffer_poo
  ibmveth_assert(index != IBM_VETH_INVALID_MAP);
  ibmveth_assert(pool->skbuff[index] == NULL);

- dma_addr = vio_map_single(adapter->vdev, skb->data, pool->buff_size, DMA_FROM_DEVICE);
+ dma_addr = dma_map_single(&adapter->vdev->dev, skb->data,
+ pool->buff_size, DMA_FROM_DEVICE);

  pool->free_map[free_index] = IBM_VETH_INVALID_MAP;
  pool->dma_addr[index] = dma_addr;
@@ -238,7 +239,9 @@ static void ibmveth_replenish_buffer_poo
  pool->free_map[free_index] = IBM_VETH_INVALID_MAP;
  pool->skbuff[index] = NULL;
  pool->consumer_index--;
- vio_unmap_single(adapter->vdev, pool->dma_addr[index], pool->buff_size, DMA_FROM_DEVICE);
+ dma_unmap_single(&adapter->vdev->dev,
+ pool->dma_addr[index], pool->buff_size,
+ DMA_FROM_DEVICE);
  dev_kfree_skb_any(skb);
  adapter->replenish_add_buff_failure++;
  break;
@@ -299,7 +302,7 @@ static void ibmveth_free_buffer_pool(str
  for(i = 0; i < pool->size; ++i) {
  struct sk_buff *skb = pool->skbuff[i];
  if(skb) {
- vio_unmap_single(adapter->vdev,
+ dma_unmap_single(&adapter->vdev->dev,
  pool->dma_addr[i],
  pool->buff_size,
  DMA_FROM_DEVICE);
@@ -337,7 +340,7 @@ static void ibmveth_remove_buffer_from_p

  adapter->rx_buff_pool[pool].skbuff[index] = NULL;

- vio_unmap_single(adapter->vdev,
+ dma_unmap_single(&adapter->vdev->dev,
  adapter->rx_buff_pool[pool].dma_addr[index],
  adapter->rx_buff_pool[pool].buff_size,
  DMA_FROM_DEVICE);
@@ -408,7 +411,9 @@ static void ibmveth_cleanup(struct ibmve
  {
  if(adapter->buffer_list_addr != NULL) {
  if(!dma_mapping_error(adapter->buffer_list_dma)) {
- vio_unmap_single(adapter->vdev, adapter->buffer_list_dma, 4096, DMA_BIDIRECTIONAL);
+ dma_unmap_single(&adapter->vdev->dev,
+ adapter->buffer_list_dma, 4096,
+ DMA_BIDIRECTIONAL);
  adapter->buffer_list_dma = DMA_ERROR_CODE;
  }
  free_page((unsigned long)adapter->buffer_list_addr);
@@ -417,7 +422,9 @@ static void ibmveth_cleanup(struct ibmve

  if(adapter->filter_list_addr != NULL) {
  if(!dma_mapping_error(adapter->filter_list_dma)) {
- vio_unmap_single(adapter->vdev, adapter->filter_list_dma, 4096, DMA_BIDIRECTIONAL);
+ dma_unmap_single(&adapter->vdev->dev,
+ adapter->filter_list_dma, 4096,
+ DMA_BIDIRECTIONAL);
  adapter->filter_list_dma = DMA_ERROR_CODE;
  }
  free_page((unsigned long)adapter->filter_list_addr);
@@ -426,7 +433,10 @@ static void ibmveth_cleanup(struct ibmve

  if(adapter->rx_queue.queue_addr != NULL) {
  if(!dma_mapping_error(adapter->rx_queue.queue_dma)) {
- vio_unmap_single(adapter->vdev, adapter->rx_queue.queue_dma, adapter->rx_queue.queue_len, DMA_BIDIRECTIONAL);
+ dma_unmap_single(&adapter->vdev->dev,
+ adapter->rx_queue.queue_dma,
+ adapter->rx_queue.queue_len,
+ DMA_BIDIRECTIONAL);
  adapter->rx_queue.queue_dma = DMA_ERROR_CODE;
  }
  kfree(adapter->rx_queue.queue_addr);
@@ -472,9 +482,13 @@ static int ibmveth_open(struct net_devic
  return -ENOMEM;
  }

- adapter->buffer_list_dma = vio_map_single(adapter->vdev, adapter->buffer_list_addr, 4096, DMA_BIDIRECTIONAL);
- adapter->filter_list_dma = vio_map_single(adapter->vdev, adapter->filter_list_addr, 4096, DMA_BIDIRECTIONAL);
- adapter->rx_queue.queue_dma = vio_map_single(adapter->vdev, adapter->rx_queue.queue_addr, adapter->rx_queue.queue_len, DMA_BIDIRECTIONAL);
+ adapter->buffer_list_dma = dma_map_single(&adapter->vdev->dev,
+ adapter->buffer_list_addr, 4096, DMA_BIDIRECTIONAL);
+ adapter->filter_list_dma = dma_map_single(&adapter->vdev->dev,
+ adapter->filter_list_addr, 4096, DMA_BIDIRECTIONAL);
+ adapter->rx_queue.queue_dma = dma_map_single(&adapter->vdev->dev,
+ adapter->rx_queue.queue_addr,
+ adapter->rx_queue.queue_len, DMA_BIDIRECTIONAL);

  if((dma_mapping_error(adapter->buffer_list_dma) ) ||
  (dma_mapping_error(adapter->filter_list_dma)) ||
@@ -644,7 +658,7 @@ static int ibmveth_start_xmit(struct sk_

  /* map the initial fragment */
  desc[0].fields.length = nfrags ? skb->len - skb->data_len : skb->len;
- desc[0].fields.address = vio_map_single(adapter->vdev, skb->data,
+ desc[0].fields.address = dma_map_single(&adapter->vdev->dev, skb->data,
  desc[0].fields.length, DMA_TO_DEVICE);
  desc[0].fields.valid = 1;

@@ -662,7 +676,7 @@ static int ibmveth_start_xmit(struct sk_
  while(curfrag--) {
  skb_frag_t *frag = &skb_shinfo(skb)->frags[curfrag];
  desc[curfrag+1].fields.address
- = vio_map_single(adapter->vdev,
+ = dma_map_single(&adapter->vdev->dev,
  page_address(frag->page) + frag->page_offset,
  frag->size, DMA_TO_DEVICE);
  desc[curfrag+1].fields.length = frag->size;
@@ -674,7 +688,7 @@ static int ibmveth_start_xmit(struct sk_
  adapter->stats.tx_dropped++;
  /* Free all the mappings we just created */
  while(curfrag < nfrags) {
- vio_unmap_single(adapter->vdev,
+ dma_unmap_single(&adapter->vdev->dev,
  desc[curfrag+1].fields.address,
  desc[curfrag+1].fields.length,
  DMA_TO_DEVICE);
@@ -714,7 +728,9 @@ static int ibmveth_start_xmit(struct sk_
  }

  do {
- vio_unmap_single(adapter->vdev, desc[nfrags].fields.address, desc[nfrags].fields.length, DMA_TO_DEVICE);
+ dma_unmap_single(&adapter->vdev->dev,
+ desc[nfrags].fields.address,
+ desc[nfrags].fields.length, DMA_TO_DEVICE);
  } while(--nfrags >= 0);

  dev_kfree_skb(skb);
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050131/4aa7fa11/attachment.pgp

From olh at suse.de Mon Jan 31 19:52:45 2005
From: olh at suse.de (Olaf Hering)
Date: Mon, 31 Jan 2005 09:52:45 +0100
Subject: IDE oops with 2.6.11rc2
Message-ID: <20050131085245.GA26443@suse.de>

I get this with current Linus tree, on a p630.
Linux version 2.6.11-rc2-pseries64 (olaf at pomegranate) (gcc version 3.3.3 (SuSE Linux)) #4 SMP Mon Jan 31 09:40:20 CET 2005
Kernel command line:
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
W82C105: IDE controller at PCI slot 0000:00:03.1
W82C105: chipset revision 5
W82C105: 100% native mode on irq 102
    ide0: BM-DMA at 0xf040-0xf047<3> CPU: -- Error, unable to allocateW82C105 DMA table(s).
    ide1: BM-DMA at 0xf048-0xf04f<3> CPU: -- Error, unable to allocateW82C105 DMA table(s).
hda: HL-DT-ST CD-ROM GCR-8480B, ATAPI CD/DVD-ROM drive
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA PSERIES
Modules linked in:
NIP: C00000000028E248 XER: 00000000 LR: C000000000284E7C CTR: 0000000000000015
REGS: c0000000041e3620 TRAP: 0300 Not tainted (2.6.11-rc2-pseries64)
MSR: 9000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 24004048
DAR: 0000000000000000 DSISR: 0000000040000000
TASK: c0000001fe7927f0[1] 'swapper' THREAD: c0000000041e0000 CPU: 0
GPR00: C000000000602B40 C0000000041E38A0 C000000000619990 C0000000006CB468
GPR04: 000000000000000C 0000000000000015 C0000000041E39E8 E00000000000F042
GPR08: 0000000000000001 0000000000000000 03000000A0000000 0000000000000000
GPR12: 000000003FD228A8 C000000000491C00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000003A10000 0000000003E3CA10
GPR20: 0000000003E3CA10 BFFFFFFFFC5F0000 0000000000000001 000000000000000C
GPR24: 0000000000000001 0000000000000044 C0000000006CB468 C0000000006CB358
GPR28: C0000000006CB468 0000000000000000 C00000000052F8F0 C0000000041E38A0
NIP [c00000000028e248] .ide_config_drive_speed+0x32c/0x6e0
LR [c000000000284e7c] .config_for_pio+0x1b8/0x240
Call Trace:
[c0000000041e38a0] [0000000040000000] .__start+0x4000000040000000/0x8 (unreliable)
[c0000000041e3970] [c000000000284e7c] .config_for_pio+0x1b8/0x240
[c0000000041e3a30] [c000000000284fb4] .tune_sl82c105+0x2c/0x54
[c0000000041e3ac0] [c0000000002926bc] .probe_hwif+0x9e4/0x9ec
[c0000000041e3b90] [c000000000293414] .probe_hwif_init_with_fixup+0x2c/0xe0
[c0000000041e3c20] [c000000000296e20] .ide_setup_pci_device+0x74/0xd8
[c0000000041e3cc0] [c00000000028495c] .sl82c105_init_one+0x24/0x40
[c0000000041e3d40] [c000000000426ffc] .ide_scan_pcidev+0xb0/0x10c
[c0000000041e3dd0] [c0000000004270a0] .ide_scan_pcibus+0x48/0x120
[c0000000041e3e70] [c000000000426f18] .ide_init+0x80/0xb4
[c0000000041e3f00] [c00000000000c390] .init+0x1d4/0x3f4
[c0000000041e3f90] [c000000000014388] .kernel_thread+0x4c/0x6c
Instruction dump:
887a00dd e8090000 f8410028 60630002 e9690010 e8490008 7c0903a6 4e800421
e8410028 e97a0090 4bfffdd4 e93b06a0 f8410028 60000000 e9690010 <0>Kernel panic - not syncing: Attempted to kill init!

From olh at suse.de Mon Jan 31 20:02:26 2005
From: olh at suse.de (Olaf Hering)
Date: Mon, 31 Jan 2005 10:02:26 +0100
Subject: IDE oops with 2.6.11rc2
In-Reply-To: <20050131085245.GA26443@suse.de>
References: <20050131085245.GA26443@suse.de>
Message-ID: <20050131090226.GA27127@suse.de>

On Mon, Jan 31, Olaf Hering wrote:
>
> I get this with current Linus tree, on a p630.

ppc64-p615-iommu-fix.patch helps