From moilanen at austin.ibm.com Tue Nov 1 08:45:40 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Mon, 31 Oct 2005 15:45:40 -0600
Subject: [PATCH 2/2] Export Physical IO base address
In-Reply-To: <1130540087.29054.128.camel@gaston>
References: <20051028150035.3d1da846.moilanen@austin.ibm.com>
	<20051028150804.73b5cedb.moilanen@austin.ibm.com>
	<1130540087.29054.128.camel@gaston>
Message-ID: <20051031154540.195b0de0.moilanen@austin.ibm.com>

On Sat, 29 Oct 2005 08:54:47 +1000
Benjamin Herrenschmidt wrote:

> On Fri, 2005-10-28 at 15:08 -0500, Jake Moilanen wrote:
> > This patch exports the physical IO base address so drivers can pick it
> > up when using addresses from the device-tree.
>
> Why ? What is your driver exactly trying to do ?

TPM needs to get the base address for IO as an offset into IO space.

This base physical address is stored in the reg property in the
device-node.

To calculate the offset, need to do: TPM_base_phy_addr - io_base_phys.

Thus the need to export this address.

Jake

From arndb at de.ibm.com Tue Nov 1 12:08:38 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:38 -0500
Subject: [patch 2/5] powerpc: create a new arch/powerpc/platforms/cell/smp.c
References: <20051101010836.771791000@localhost>
Message-ID: <20051101011133.300238000@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: cell-smp.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051031/03409828/attachment.txt

From arndb at de.ibm.com Tue Nov 1 12:08:37 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:37 -0500
Subject: [patch 1/5] powerpc: Rename BPA to Cell
References: <20051101010836.771791000@localhost>
Message-ID: <20051101011133.134984000@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: cell-kconfig.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051031/751fab3b/attachment.txt

From arndb at de.ibm.com Tue Nov 1 12:08:36 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:36 -0500
Subject: [patch 0/5] Move Cell stuff to arch/powerpc
Message-ID: <20051101010836.771791000@localhost>

As promised, here is my new patch set moving all Cell stuff
over to arch/powerpc. Please apply.

	Arnd <><

From arndb at de.ibm.com Tue Nov 1 12:08:40 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:40 -0500
Subject: [patch 4/5] powerpc: move mmio_nvram.c over to arch/powerpc
References: <20051101010836.771791000@localhost>
Message-ID: <20051101011133.623411000@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: mmio-nvram.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051031/f47181fa/attachment.txt

From arndb at de.ibm.com Tue Nov 1 12:08:39 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:39 -0500
Subject: [patch 3/5] powerpc: move rtas_fw.c out of platforms/pseries
References: <20051101010836.771791000@localhost>
Message-ID: <20051101011133.463223000@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: rtas-flash.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051031/5894915b/attachment.txt

From arndb at de.ibm.com Tue Nov 1 12:08:41 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:41 -0500
Subject: [patch 5/5] powerpc: move arch/ppc64/kernel/bpa* to arch/powerpc/platforms/cell
References: <20051101010836.771791000@localhost>
Message-ID: <20051101011133.788778000@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: cell-platform.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051031/13827568/attachment.txt

From michael at ellerman.id.au Tue Nov 1 10:26:55 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Tue, 1 Nov 2005 10:26:55 +1100
Subject: [patch 2/5] powerpc: create a new arch/powerpc/platforms/cell/smp.c
In-Reply-To: <20051101011133.300238000@localhost>
References: <20051101010836.771791000@localhost>
	<20051101011133.300238000@localhost>
Message-ID: <200511011026.59266.michael@ellerman.id.au>

On Tue, 1 Nov 2005 12:08, Arnd Bergmann wrote:
> During the conversion to the merge tree, the Cell specific
> SMP initialization was removed from the pSeries code.
>
> This creates a new Cell specific SMP implementation file.
>
> Signed-off-by: Arnd Bergmann
>
> ---
>
>  arch/powerpc/platforms/Makefile | 1
>  arch/powerpc/platforms/cell/Makefile | 1
>  arch/powerpc/platforms/cell/smp.c | 230 ++++++++++++++++++++
>  include/asm-ppc64/smp.h | 1
>  4 files changed, 233 insertions(+)

A lot of your smp routines are identical to the pSeries versions. Wouldn't it
be preferable to only have one implementation?

cheers

-- 
Michael Ellerman
IBM OzLabs

email: michael:ellerman.id.au
inmsg: mpe:jabber.org
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051101/0d649a53/attachment.pgp

From benh at kernel.crashing.org Tue Nov 1 10:27:23 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 01 Nov 2005 10:27:23 +1100
Subject: [PATCH 2/2] Export Physical IO base address
In-Reply-To: <20051031154540.195b0de0.moilanen@austin.ibm.com>
References: <20051028150035.3d1da846.moilanen@austin.ibm.com>
	<20051028150804.73b5cedb.moilanen@austin.ibm.com>
	<1130540087.29054.128.camel@gaston>
	<20051031154540.195b0de0.moilanen@austin.ibm.com>
Message-ID: <1130801243.29054.376.camel@gaston>

On Mon, 2005-10-31 at 15:45 -0600, Jake Moilanen wrote:
> On Sat, 29 Oct 2005 08:54:47 +1000
> Benjamin Herrenschmidt wrote:
>
> > On Fri, 2005-10-28 at 15:08 -0500, Jake Moilanen wrote:
> > > This patch exports the physical IO base address so drivers can pick it
> > > up when using addresses from the device-tree.
> >
> > Why ? What is your driver exactly trying to do ?
>
> TPM needs to get the base address for IO as an offset into IO space.
>
> This base physical address is stored in the reg property in the
> device-node.
>
> To calculate the offset, need to do: TPM_base_phy_addr - io_base_phys.
>
> Thus the need to export this address.

Hrm... that is sooo bogus. If the device-tree exposes a full physical
address in "reg", then just use that with ioremap (ignore the fact that
it's actually IO space).

Additionally, tell the firmware folks to fix their device-tree, this is
all very bogus to me. The TPM device is on the LPC bus right ? Thus it
should appear below the LPC/ISA bridge and thus get proper address
space. It's totally broken to put it anywhere else

Ben.
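
For illustration only, and not taken from any posted patch: a rough
userspace sketch of pulling a full physical address out of a "reg"
property, assuming the property is an array of 32-bit cells with naddrc
address cells and nsizec size cells per entry, and that its length is
given in bytes. The names read_cells(), decode_reg() and struct
reg_range are hypothetical. In a driver the decoded address would then
be handed to ioremap() as suggested above; the sketch stops short of
that because it cannot run outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fold `ncells` 32-bit cells into one 64-bit value,
 * most significant cell first. */
static uint64_t read_cells(const uint32_t *cells, int ncells)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < ncells; i++)
		v = (v << 32) | cells[i];
	return v;
}

/* Hypothetical result type for one (address, size) entry of "reg". */
struct reg_range {
	uint64_t addr;
	uint64_t size;
};

/*
 * Decode entry `idx` of a "reg" property.
 * `reglen_bytes` is the property length in bytes (assumed here), so it is
 * converted to a cell count before being used as a bound.
 * Returns 0 on success, -1 if the entry does not fit in the property.
 */
static int decode_reg(const uint32_t *reg, int reglen_bytes,
		      int naddrc, int nsizec, int idx,
		      struct reg_range *out)
{
	int ncells = reglen_bytes / (int)sizeof(uint32_t);
	int stride = naddrc + nsizec;
	int off;

	if (idx < 0 || naddrc <= 0 || nsizec < 0)
		return -1;
	off = idx * stride;
	if (off + stride > ncells)
		return -1;

	out->addr = read_cells(reg + off, naddrc);
	out->size = read_cells(reg + off + naddrc, nsizec);
	return 0;
}
```

With two address cells {0x000003ff, 0x80000000} and one size cell
{0x10}, decode_reg() yields address 0x3ff80000000 and size 0x10; asking
for a second entry that is not there returns -1.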
From arnd at arndb.de Tue Nov 1 10:50:48 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Tue, 1 Nov 2005 00:50:48 +0100
Subject: [patch 2/5] powerpc: create a new arch/powerpc/platforms/cell/smp.c
In-Reply-To: <200511011026.59266.michael@ellerman.id.au>
References: <20051101010836.771791000@localhost>
	<20051101011133.300238000@localhost>
	<200511011026.59266.michael@ellerman.id.au>
Message-ID: <200511010050.48828.arnd@arndb.de>

On Dinsdag 01 November 2005 00:26, Michael Ellerman wrote:
> A lot of your smp routines are identical to the pSeries versions. Wouldn't it
> be preferable to only have one implementation?

Yes it would. I'm not sure how that would best be done however. Until
2.6.14, we've just used the pSeries implementation, which does not work
any more now that we want to keep the platform stuff in separate
directories.

One idea might be to split out the rtas calls (startup_cpu,
give_timebase, take_timebase) to rtas.c so they can be included by all
chrp-descendants. smp_init_cell() can be further simplified under the
assumption that we're always SMT and never LPAR, although the latter
might change in the future.

	Arnd <><

From benh at kernel.crashing.org Tue Nov 1 11:05:08 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 01 Nov 2005 11:05:08 +1100
Subject: [PATCH] tpm: support PPC64 hardware
In-Reply-To: <1130769479.4882.35.camel@localhost.localdomain>
References: <1130769479.4882.35.camel@localhost.localdomain>
Message-ID: <1130803508.29054.388.camel@gaston>

On Mon, 2005-10-31 at 08:37 -0600, Kylene Jo Hall wrote:
> The TPM is discovered differently on PPC64 because the device must be
> discovered through the device tree in order to open the proper holes in
> the io_page_mask for reading and writing in the low memory space. This
> does not happen automatically like most devices because the tpm is not a
> normal pci device and lives under the root node.
>
> This patch contains the necessary changes to the tpm logic.
>
> This depends on patches submitted by Jake Moilanen (10/28) to allow for
> the opening of holes in the io_page_mask for this device.

Please submit to the appropriate list (linuxppc64-dev at ozlabs.org).

There are some issues with that patch. One, I intend to get rid of the
io_page_mask, so that part at least is gone. I don't like the exporting
of io_base_phys either, it's an ugly hack. Other comments inline.

> +/* Verify this is a 1.1 Atmel TPM */
> +static int atmel_verify_tpm11(void)
> +{
> +	struct device_node * dn;
> +	char *compat;
> +	int compat_len;
> +
> +	dn = find_devices("tpm");

find_devices() is a deprecated interface. Use the of_find_node_* series
and do an of_node_put() once done.

> +	if (!dn)
> +		return 1;
> +
> +	compat = (char *) get_property(dn, "compatible", &compat_len);
> +	if (!compat)
> +		return 1;
> +
> +	if ( strcmp( compat,"AT97SC3201_r") == 0 )
> +		return 0;
> +

Testing the "compatible" property this way is bogus. Use
device_is_compatible(). Or better, use of_find_compatible_node() which
allows you to find by type & compatible in one step.

> +	dn = find_devices("tpm");

Same comment as above. In addition, why do you have to do it twice ? You
should rethink your changes. Only one "probe" should be needed that
retrieves the base addresses.

> +	if (!dn)
> +		return 0;
> +
> +	reg = (unsigned int *) get_property(dn, "reg", &reglen);
> +	naddrc = prom_n_addr_cells(dn);
> +	nsizec = prom_n_size_cells(dn);
> +
> +	for (i = 0; i < reglen; i = i + naddrc + nsizec) {
> +
> +		if (naddrc == 2)
> +			address = ((unsigned long)reg[i] << 32) | reg[i+1];
> +		else
> +			address = reg[i];
> +
> +		address = address - pci_io_base_phys;

That is bogosity. Your address is an ISA IO address, it should be
relative to the parent LPC bus and thus useable as is. It looks like the
firmware folks crapped the device-tree. Please check that with them. If
they decide to stick with a broken device-tree, then you'll have to
consider the address as an MMIO address.
That means you'll have to change the IO accesses of the TPM driver to
use the iomap API, thus making it immune to the IO vs. MMIO distinction.

> +		allow_isa_address(address, address+size-1);

That is going away.

Ben.

From michael at ellerman.id.au Tue Nov 1 11:13:23 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Tue, 1 Nov 2005 11:13:23 +1100
Subject: [patch 2/5] powerpc: create a new arch/powerpc/platforms/cell/smp.c
In-Reply-To: <200511010050.48828.arnd@arndb.de>
References: <20051101010836.771791000@localhost>
	<200511011026.59266.michael@ellerman.id.au>
	<200511010050.48828.arnd@arndb.de>
Message-ID: <200511011113.26609.michael@ellerman.id.au>

On Tue, 1 Nov 2005 10:50, Arnd Bergmann wrote:
> On Dinsdag 01 November 2005 00:26, Michael Ellerman wrote:
> > A lot of your smp routines are identical to the pSeries versions.
> > Wouldn't it be preferable to only have one implementation?
>
> Yes it would. I'm not sure how that would best be done however. Until
> 2.6.14, we've just used the pSeries implementation, which does not work any
> more now that we want to keep the platform stuff in separate directories.
>
> One idea might be to split out the rtas calls (startup_cpu, give_timebase,
> take_timebase) to rtas.c so they can be included by all chrp-descendants.
> smp_init_cell() can be further simplified under the assumption that we're
> always SMT and never LPAR, although the latter might change in the future.

OK, I'm not sure what the best spot is. arch/powerpc/sysdev is apparently
the place for stuff that's not core-kernel but shared between platforms,
although maybe smp ops are core, I dunno.

cheers

-- 
Michael Ellerman
IBM OzLabs

email: michael:ellerman.id.au
inmsg: mpe:jabber.org
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051101/dc38ac71/attachment.pgp

From david at gibson.dropbear.id.au Tue Nov 1 15:30:26 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 1 Nov 2005 15:30:26 +1100
Subject: powerpc: Move naca.h to platforms/iseries
Message-ID: <20051101043026.GB27961@localhost.localdomain>

Paulus, please apply.

These days, the NACA only exists on iSeries. Therefore, this patch
moves naca.h from include/asm-ppc64 to arch/powerpc/platforms/iseries.
There was one file including naca.h outside of platforms/iseries -
arch/ppc64/kernel/udbg_scc.c. However, that's obviously a hangover
from older days. The include is not necessary, so this patch simply
removes it.

Built and booted on iSeries, built for G5 (which uses udbg_scc.o).

Signed-off-by: David Gibson

Index: working-2.6/arch/powerpc/platforms/iseries/lpardata.c
===================================================================
--- working-2.6.orig/arch/powerpc/platforms/iseries/lpardata.c 2005-10-31 15:20:20.000000000 +1100
+++ working-2.6/arch/powerpc/platforms/iseries/lpardata.c 2005-11-01 15:15:57.000000000 +1100
@@ -13,7 +13,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -23,6 +22,7 @@
 #include
 #include
+#include "naca.h"
 #include "vpd_areas.h"
 #include "spcomm_area.h"
 #include "ipl_parms.h"
Index: working-2.6/arch/powerpc/platforms/iseries/naca.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/arch/powerpc/platforms/iseries/naca.h 2005-11-01 15:28:03.000000000 +1100
@@ -0,0 +1,24 @@
+#ifndef _PLATFORMS_ISERIES_NACA_H
+#define _PLATFORMS_ISERIES_NACA_H
+
+/*
+ * c 2001 PPC 64 Team, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free
Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include
+
+struct naca_struct {
+	/* Kernel only data - undefined for user space */
+	void *xItVpdAreas; /* VPD Data 0x00 */
+	void *xRamDisk; /* iSeries ramdisk 0x08 */
+	u64 xRamDiskSize; /* In pages 0x10 */
+};
+
+extern struct naca_struct naca;
+
+#endif /* _PLATFORMS_ISERIES_NACA_H */
Index: working-2.6/arch/powerpc/platforms/iseries/release_data.h
===================================================================
--- working-2.6.orig/arch/powerpc/platforms/iseries/release_data.h 2005-10-31 15:20:20.000000000 +1100
+++ working-2.6/arch/powerpc/platforms/iseries/release_data.h 2005-11-01 15:15:57.000000000 +1100
@@ -24,7 +24,7 @@
  * address of the OS's NACA).
  */
 #include
-#include
+#include "naca.h"
 
 /*
  * When we IPL a secondary partition, we will check if if the
Index: working-2.6/arch/powerpc/platforms/iseries/setup.c
===================================================================
--- working-2.6.orig/arch/powerpc/platforms/iseries/setup.c 2005-10-31 15:44:59.000000000 +1100
+++ working-2.6/arch/powerpc/platforms/iseries/setup.c 2005-11-01 15:15:57.000000000 +1100
@@ -40,7 +40,6 @@
 #include
 #include
-#include
 #include
 #include
 #include
@@ -53,6 +52,7 @@
 #include
 #include
+#include "naca.h"
 #include "setup.h"
 #include "irq.h"
 #include "vpd_areas.h"
Index: working-2.6/arch/ppc64/kernel/udbg_scc.c
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/udbg_scc.c 2005-10-25 11:59:53.000000000 +1000
+++ working-2.6/arch/ppc64/kernel/udbg_scc.c 2005-11-01 15:15:57.000000000 +1100
@@ -12,7 +12,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
Index: working-2.6/include/asm-ppc64/naca.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/naca.h 2005-10-25 11:59:59.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@
-1,24 +0,0 @@
-#ifndef _NACA_H
-#define _NACA_H
-
-/*
- * c 2001 PPC 64 Team, IBM Corp
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include
-
-struct naca_struct {
-	/* Kernel only data - undefined for user space */
-	void *xItVpdAreas; /* VPD Data 0x00 */
-	void *xRamDisk; /* iSeries ramdisk 0x08 */
-	u64 xRamDiskSize; /* In pages 0x10 */
-};
-
-extern struct naca_struct naca;
-
-#endif /* _NACA_H */

-- 
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/people/dgibson

From david at gibson.dropbear.id.au Tue Nov 1 16:53:24 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 1 Nov 2005 16:53:24 +1100
Subject: powerpc: Merge ipcbuf.h
Message-ID: <20051101055324.GA3551@localhost.localdomain>

Paulus, please apply.

This patch merges ppc32 and ppc64 versions of ipcbuf.h. The merge is
essentially trivial, since the structure defined in each version was
already identical. Only wrinkle is that the merged version now
includes linux/types.h in order to get the fixed width integer types.
In fact, the old versions probably should have been including that
anyway, since the file uses various __kernel_*_t types.

Built and booted on G5, built for 32-bit pmac, but not booted, since
the merge tree currently doesn't boot there.
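
As a hedged illustration of why fixed-width types suit the merged
header: with "unsigned long" the two trailing fields would be 32 bits
wide on ppc32 and 64 bits wide on ppc64, whereas a 64-bit type keeps
them the same on both. The userspace sketch below uses uint64_t as a
stand-in for the kernel's u64 and assumes every __kernel_*_t type
involved is 32 bits wide (an assumption of this sketch, not something
stated in the patch). All sketch_* names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel typedefs; 32-bit widths are assumed here. */
typedef int32_t  sketch_key_t;
typedef uint32_t sketch_uid_t;
typedef uint32_t sketch_gid_t;
typedef uint32_t sketch_mode_t;

/* Same field order as the merged struct ipc64_perm in the patch. */
struct sketch_ipc64_perm {
	sketch_key_t  key;
	sketch_uid_t  uid;
	sketch_gid_t  gid;
	sketch_uid_t  cuid;
	sketch_gid_t  cgid;
	sketch_mode_t mode;
	unsigned int  seq;
	unsigned int  pad1;     /* fills up to an 8-byte boundary */
	uint64_t      unused1;  /* spare 64-bit value */
	uint64_t      unused2;  /* spare 64-bit value */
};

/* Returns 1 if the layout is what the header comment describes:
 * eight 32-bit fields followed by two 64-bit fields. */
static int sketch_layout_ok(void)
{
	return offsetof(struct sketch_ipc64_perm, unused1) == 32 &&
	       offsetof(struct sketch_ipc64_perm, unused2) == 40 &&
	       sizeof(struct sketch_ipc64_perm) == 48;
}
```

Under those assumptions sketch_layout_ok() returns 1 whether the code
is built as 32-bit or 64-bit, which is the property a structure shared
between kernel and user space needs.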

Signed-off-by: David Gibson

Index: working-2.6/include/asm-powerpc/ipcbuf.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/include/asm-powerpc/ipcbuf.h 2005-11-01 15:44:01.000000000 +1100
@@ -0,0 +1,34 @@
+#ifndef _ASM_POWERPC_IPCBUF_H
+#define _ASM_POWERPC_IPCBUF_H
+
+/*
+ * The ipc64_perm structure for the powerpc is identical to
+ * kern_ipc_perm as we have always had 32-bit UIDs and GIDs in the
+ * kernel. Note extra padding because this structure is passed back
+ * and forth between kernel and user space. Pad space is left for:
+ *	- 1 32-bit value to fill up for 8-byte alignment
+ *	- 2 miscellaneous 64-bit values
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include
+
+struct ipc64_perm
+{
+	__kernel_key_t key;
+	__kernel_uid_t uid;
+	__kernel_gid_t gid;
+	__kernel_uid_t cuid;
+	__kernel_gid_t cgid;
+	__kernel_mode_t mode;
+	unsigned int seq;
+	unsigned int __pad1;
+	u64 __unused1;
+	u64 __unused2;
+};
+
+#endif /* _ASM_POWERPC_IPCBUF_H */
Index: working-2.6/include/asm-ppc64/ipcbuf.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/ipcbuf.h 2005-10-25 11:59:59.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,28 +0,0 @@
-#ifndef __PPC64_IPCBUF_H__
-#define __PPC64_IPCBUF_H__
-
-/*
- * The ipc64_perm structure for the PPC is identical to kern_ipc_perm
- * as we have always had 32-bit UIDs and GIDs in the kernel.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-struct ipc64_perm
-{
-	__kernel_key_t key;
-	__kernel_uid_t uid;
-	__kernel_gid_t gid;
-	__kernel_uid_t cuid;
-	__kernel_gid_t cgid;
-	__kernel_mode_t mode;
-	unsigned int seq;
-	unsigned int __pad1;
-	unsigned long __unused1;
-	unsigned long __unused2;
-};
-
-#endif /* __PPC64_IPCBUF_H__ */
Index: working-2.6/include/asm-ppc/ipcbuf.h
===================================================================
--- working-2.6.orig/include/asm-ppc/ipcbuf.h 2005-10-25 11:59:59.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,29 +0,0 @@
-#ifndef __PPC_IPCBUF_H__
-#define __PPC_IPCBUF_H__
-
-/*
- * The ipc64_perm structure for PPC architecture.
- * Note extra padding because this structure is passed back and forth
- * between kernel and user space.
- *
- * Pad space is left for:
- * - 1 32-bit value to fill up for 8-byte alignment
- * - 2 miscellaneous 64-bit values (so that this structure matches
- *   PPC64 ipc64_perm)
- */
-
-struct ipc64_perm
-{
-	__kernel_key_t key;
-	__kernel_uid_t uid;
-	__kernel_gid_t gid;
-	__kernel_uid_t cuid;
-	__kernel_gid_t cgid;
-	__kernel_mode_t mode;
-	unsigned long seq;
-	unsigned int __pad2;
-	unsigned long long __unused1;
-	unsigned long long __unused2;
-};
-
-#endif /* __PPC_IPCBUF_H__ */

-- 
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/people/dgibson

From david at gibson.dropbear.id.au Tue Nov 1 17:28:10 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 1 Nov 2005 17:28:10 +1100
Subject: powerpc: Merge bitops.h
In-Reply-To: <20051031064823.GD6622@localhost.localdomain>
References: <20051031064823.GD6622@localhost.localdomain>
Message-ID: <20051101062810.GC3551@localhost.localdomain>

Here's a revised version. This re-introduces the set_bits() function
from ppc64, which I removed because I thought it was unused (it exists
on no other arch).
In fact it is used in the powermac interrupt code (but not on
pSeries).

This seems to be running fine on my G5 (ARCH=powerpc), but it still
hasn't been tested on 32-bit, which should probably happen before
merging.

- We use LARXL/STCXL macros to generate the right (32 or 64 bit)
  instructions, similar to LDL/STL from ppc_asm.h, used in fpu.S

- ppc32 previously used a full "sync" barrier at the end of
  test_and_*_bit(), whereas ppc64 used an "isync". The merged version
  uses "isync", since I believe that's sufficient.

- The ppc64 versions of the minix_*() bitmap functions have changed
  semantics. Previously on ppc64, these functions were big-endian
  (that is, bit 0 was the LSB in the first 64-bit, big-endian word).
  On ppc32 (and x86, for that matter), they were little-endian. As far
  as I can tell, the big-endian usage was simply wrong - I guess
  no-one ever tried to use minixfs on ppc64.

- On ppc32 find_next_bit() and find_next_zero_bit() are no longer
  inline (they were already out-of-line on ppc64).

- For ppc64, sched_find_first_bit() has moved from mmu_context.h to
  the merged bitops. What it was doing in mmu_context.h in the first
  place, I have no idea.

- The fls() function is now implemented using the cntlzw instruction
  on ppc64, instead of generic_fls(), as it already was on ppc32.

- For ARCH=ppc, this patch requires adding arch/powerpc/lib to the
  arch/ppc/Makefile. This in turn requires some changes to
  arch/powerpc/lib/Makefile which didn't correctly handle ARCH=ppc.

Built and running on G5.
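
The fls() item above can be illustrated in portable C. This is only a
sketch, not the code from the patch: it uses the GCC/Clang builtin
__builtin_clz() as a stand-in for the cntlzw instruction, assumes a
32-bit unsigned int, and the names clz32_sketch(), fls_sketch() and
ilog2_sketch() are made up for the example. cntlzw gives 32 for a zero
input, while __builtin_clz(0) is undefined, so zero is handled
explicitly.

```c
#include <assert.h>

/* Count leading zeros of a 32-bit value; 32 for zero, as cntlzw does. */
static int clz32_sketch(unsigned int x)
{
	return x ? __builtin_clz(x) : 32;
}

/* fls: position of the most significant set bit, counting from 1.
 * fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32. */
static int fls_sketch(unsigned int x)
{
	return 32 - clz32_sketch(x);
}

/* ilog2-style variant: bit number of the most significant set bit,
 * counting from 0. Yields -1 for a zero input in this sketch. */
static int ilog2_sketch(unsigned int x)
{
	return 31 - clz32_sketch(x);
}
```

The design point is that a single count-leading-zeros operation
replaces the shift-and-test cascade a generic fls has to perform.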

Signed-off-by: David Gibson

Index: working-2.6/include/asm-ppc/bitops.h
===================================================================
--- working-2.6.orig/include/asm-ppc/bitops.h 2005-10-25 11:59:59.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,460 +0,0 @@
-/*
- * bitops.h: Bit string operations on the ppc
- */
-
-#ifdef __KERNEL__
-#ifndef _PPC_BITOPS_H
-#define _PPC_BITOPS_H
-
-#include
-#include
-#include
-#include
-
-/*
- * The test_and_*_bit operations are taken to imply a memory barrier
- * on SMP systems.
- */
-#ifdef CONFIG_SMP
-#define SMP_WMB "eieio\n"
-#define SMP_MB "\nsync"
-#else
-#define SMP_WMB
-#define SMP_MB
-#endif /* CONFIG_SMP */
-
-static __inline__ void set_bit(int nr, volatile unsigned long * addr)
-{
-	unsigned long old;
-	unsigned long mask = 1 << (nr & 0x1f);
-	unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
-
-	__asm__ __volatile__("\n\
-1:	lwarx	%0,0,%3 \n\
-	or	%0,%0,%2 \n"
-	PPC405_ERR77(0,%3)
-"	stwcx.	%0,0,%3 \n\
-	bne-	1b"
-	: "=&r" (old), "=m" (*p)
-	: "r" (mask), "r" (p), "m" (*p)
-	: "cc" );
-}
-
-/*
- * non-atomic version
- */
-static __inline__ void __set_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = 1 << (nr & 0x1f);
-	unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
-
-	*p |= mask;
-}
-
-/*
- * clear_bit doesn't imply a memory barrier
- */
-#define smp_mb__before_clear_bit() smp_mb()
-#define smp_mb__after_clear_bit() smp_mb()
-
-static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned long old;
-	unsigned long mask = 1 << (nr & 0x1f);
-	unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
-
-	__asm__ __volatile__("\n\
-1:	lwarx	%0,0,%3 \n\
-	andc	%0,%0,%2 \n"
-	PPC405_ERR77(0,%3)
-"	stwcx.
%0,0,%3 \n\
-	bne-	1b"
-	: "=&r" (old), "=m" (*p)
-	: "r" (mask), "r" (p), "m" (*p)
-	: "cc");
-}
-
-/*
- * non-atomic version
- */
-static __inline__ void __clear_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = 1 << (nr & 0x1f);
-	unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
-
-	*p &= ~mask;
-}
-
-static __inline__ void change_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned long old;
-	unsigned long mask = 1 << (nr & 0x1f);
-	unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
-
-	__asm__ __volatile__("\n\
-1:	lwarx	%0,0,%3 \n\
-	xor	%0,%0,%2 \n"
-	PPC405_ERR77(0,%3)
-"	stwcx.	%0,0,%3 \n\
-	bne-	1b"
-	: "=&r" (old), "=m" (*p)
-	: "r" (mask), "r" (p), "m" (*p)
-	: "cc");
-}
-
-/*
- * non-atomic version
- */
-static __inline__ void __change_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = 1 << (nr & 0x1f);
-	unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
-
-	*p ^= mask;
-}
-
-/*
- * test_and_*_bit do imply a memory barrier (?)
- */
-static __inline__ int test_and_set_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned int old, t;
-	unsigned int mask = 1 << (nr & 0x1f);
-	volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5);
-
-	__asm__ __volatile__(SMP_WMB "\n\
-1:	lwarx	%0,0,%4 \n\
-	or	%1,%0,%3 \n"
-	PPC405_ERR77(0,%4)
-"	stwcx.
%1,0,%4 \n\
-	bne	1b"
-	SMP_MB
-	: "=&r" (old), "=&r" (t), "=m" (*p)
-	: "r" (mask), "r" (p), "m" (*p)
-	: "cc", "memory");
-
-	return (old & mask) != 0;
-}
-
-/*
- * non-atomic version
- */
-static __inline__ int __test_and_set_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = 1 << (nr & 0x1f);
-	unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
-	unsigned long old = *p;
-
-	*p = old | mask;
-	return (old & mask) != 0;
-}
-
-static __inline__ int test_and_clear_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned int old, t;
-	unsigned int mask = 1 << (nr & 0x1f);
-	volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5);
-
-	__asm__ __volatile__(SMP_WMB "\n\
-1:	lwarx	%0,0,%4 \n\
-	andc	%1,%0,%3 \n"
-	PPC405_ERR77(0,%4)
-"	stwcx.	%1,0,%4 \n\
-	bne	1b"
-	SMP_MB
-	: "=&r" (old), "=&r" (t), "=m" (*p)
-	: "r" (mask), "r" (p), "m" (*p)
-	: "cc", "memory");
-
-	return (old & mask) != 0;
-}
-
-/*
- * non-atomic version
- */
-static __inline__ int __test_and_clear_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = 1 << (nr & 0x1f);
-	unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
-	unsigned long old = *p;
-
-	*p = old & ~mask;
-	return (old & mask) != 0;
-}
-
-static __inline__ int test_and_change_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned int old, t;
-	unsigned int mask = 1 << (nr & 0x1f);
-	volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5);
-
-	__asm__ __volatile__(SMP_WMB "\n\
-1:	lwarx	%0,0,%4 \n\
-	xor	%1,%0,%3 \n"
-	PPC405_ERR77(0,%4)
-"	stwcx.
%1,0,%4 \n\
-	bne	1b"
-	SMP_MB
-	: "=&r" (old), "=&r" (t), "=m" (*p)
-	: "r" (mask), "r" (p), "m" (*p)
-	: "cc", "memory");
-
-	return (old & mask) != 0;
-}
-
-/*
- * non-atomic version
- */
-static __inline__ int __test_and_change_bit(int nr, volatile unsigned long *addr)
-{
-	unsigned long mask = 1 << (nr & 0x1f);
-	unsigned long *p = ((unsigned long *)addr) + (nr >> 5);
-	unsigned long old = *p;
-
-	*p = old ^ mask;
-	return (old & mask) != 0;
-}
-
-static __inline__ int test_bit(int nr, __const__ volatile unsigned long *addr)
-{
-	return ((addr[nr >> 5] >> (nr & 0x1f)) & 1) != 0;
-}
-
-/* Return the bit position of the most significant 1 bit in a word */
-static __inline__ int __ilog2(unsigned long x)
-{
-	int lz;
-
-	asm ("cntlzw %0,%1" : "=r" (lz) : "r" (x));
-	return 31 - lz;
-}
-
-static __inline__ int ffz(unsigned long x)
-{
-	if ((x = ~x) == 0)
-		return 32;
-	return __ilog2(x & -x);
-}
-
-static inline int __ffs(unsigned long x)
-{
-	return __ilog2(x & -x);
-}
-
-/*
- * ffs: find first bit set. This is defined the same way as
- * the libc and compiler builtin ffs routines, therefore
- * differs in spirit from the above ffz (man ffs).
- */
-static __inline__ int ffs(int x)
-{
-	return __ilog2(x & -x) + 1;
-}
-
-/*
- * fls: find last (most-significant) bit set.
- * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
- */
-static __inline__ int fls(unsigned int x)
-{
-	int lz;
-
-	asm ("cntlzw %0,%1" : "=r" (lz) : "r" (x));
-	return 32 - lz;
-}
-
-/*
- * hweightN: returns the hamming weight (i.e. the number
- * of bits set) of a N-bit word
- */
-
-#define hweight32(x) generic_hweight32(x)
-#define hweight16(x) generic_hweight16(x)
-#define hweight8(x) generic_hweight8(x)
-
-/*
- * Find the first bit set in a 140-bit bitmap.
- * The first 100 bits are unlikely to be set.
- */
-static inline int sched_find_first_bit(const unsigned long *b)
-{
-	if (unlikely(b[0]))
-		return __ffs(b[0]);
-	if (unlikely(b[1]))
-		return __ffs(b[1]) + 32;
-	if (unlikely(b[2]))
-		return __ffs(b[2]) + 64;
-	if (b[3])
-		return __ffs(b[3]) + 96;
-	return __ffs(b[4]) + 128;
-}
-
-/**
- * find_next_bit - find the next set bit in a memory region
- * @addr: The address to base the search on
- * @offset: The bitnumber to start searching at
- * @size: The maximum size to search
- */
-static __inline__ unsigned long find_next_bit(const unsigned long *addr,
-	unsigned long size, unsigned long offset)
-{
-	unsigned int *p = ((unsigned int *) addr) + (offset >> 5);
-	unsigned int result = offset & ~31UL;
-	unsigned int tmp;
-
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset &= 31UL;
-	if (offset) {
-		tmp = *p++;
-		tmp &= ~0UL << offset;
-		if (size < 32)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= 32;
-		result += 32;
-	}
-	while (size >= 32) {
-		if ((tmp = *p++) != 0)
-			goto found_middle;
-		result += 32;
-		size -= 32;
-	}
-	if (!size)
-		return result;
-	tmp = *p;
-
-found_first:
-	tmp &= ~0UL >> (32 - size);
-	if (tmp == 0UL) /* Are any bits set? */
-		return result + size; /* Nope. */
-found_middle:
-	return result + __ffs(tmp);
-}
-
-/**
- * find_first_bit - find the first set bit in a memory region
- * @addr: The address to start the search at
- * @size: The maximum size to search
- *
- * Returns the bit-number of the first set bit, not the number of the byte
- * containing a bit.
- */
-#define find_first_bit(addr, size) \
-	find_next_bit((addr), (size), 0)
-
-/*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
- */ -#define find_first_zero_bit(addr, size) \ - find_next_zero_bit((addr), (size), 0) - -static __inline__ unsigned long find_next_zero_bit(const unsigned long *addr, - unsigned long size, unsigned long offset) -{ - unsigned int * p = ((unsigned int *) addr) + (offset >> 5); - unsigned int result = offset & ~31UL; - unsigned int tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 31UL; - if (offset) { - tmp = *p++; - tmp |= ~0UL >> (32-offset); - if (size < 32) - goto found_first; - if (tmp != ~0U) - goto found_middle; - size -= 32; - result += 32; - } - while (size >= 32) { - if ((tmp = *p++) != ~0U) - goto found_middle; - result += 32; - size -= 32; - } - if (!size) - return result; - tmp = *p; -found_first: - tmp |= ~0UL << size; - if (tmp == ~0UL) /* Are any bits zero? */ - return result + size; /* Nope. */ -found_middle: - return result + ffz(tmp); -} - - -#define ext2_set_bit(nr, addr) __test_and_set_bit((nr) ^ 0x18, (unsigned long *)(addr)) -#define ext2_set_bit_atomic(lock, nr, addr) test_and_set_bit((nr) ^ 0x18, (unsigned long *)(addr)) -#define ext2_clear_bit(nr, addr) __test_and_clear_bit((nr) ^ 0x18, (unsigned long *)(addr)) -#define ext2_clear_bit_atomic(lock, nr, addr) test_and_clear_bit((nr) ^ 0x18, (unsigned long *)(addr)) - -static __inline__ int ext2_test_bit(int nr, __const__ void * addr) -{ - __const__ unsigned char *ADDR = (__const__ unsigned char *) addr; - - return (ADDR[nr >> 3] >> (nr & 7)) & 1; -} - -/* - * This implementation of ext2_find_{first,next}_zero_bit was stolen from - * Linus' asm-alpha/bitops.h and modified for a big-endian machine. 
- */ - -#define ext2_find_first_zero_bit(addr, size) \ - ext2_find_next_zero_bit((addr), (size), 0) - -static __inline__ unsigned long ext2_find_next_zero_bit(const void *addr, - unsigned long size, unsigned long offset) -{ - unsigned int *p = ((unsigned int *) addr) + (offset >> 5); - unsigned int result = offset & ~31UL; - unsigned int tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 31UL; - if (offset) { - tmp = cpu_to_le32p(p++); - tmp |= ~0UL >> (32-offset); - if (size < 32) - goto found_first; - if (tmp != ~0U) - goto found_middle; - size -= 32; - result += 32; - } - while (size >= 32) { - if ((tmp = cpu_to_le32p(p++)) != ~0U) - goto found_middle; - result += 32; - size -= 32; - } - if (!size) - return result; - tmp = cpu_to_le32p(p); -found_first: - tmp |= ~0U << size; - if (tmp == ~0UL) /* Are any bits zero? */ - return result + size; /* Nope. */ -found_middle: - return result + ffz(tmp); -} - -/* Bitmap functions for the minix filesystem. */ -#define minix_test_and_set_bit(nr,addr) ext2_set_bit(nr,addr) -#define minix_set_bit(nr,addr) ((void)ext2_set_bit(nr,addr)) -#define minix_test_and_clear_bit(nr,addr) ext2_clear_bit(nr,addr) -#define minix_test_bit(nr,addr) ext2_test_bit(nr,addr) -#define minix_find_first_zero_bit(addr,size) ext2_find_first_zero_bit(addr,size) - -#endif /* _PPC_BITOPS_H */ -#endif /* __KERNEL__ */ Index: working-2.6/include/asm-ppc64/bitops.h =================================================================== --- working-2.6.orig/include/asm-ppc64/bitops.h 2005-10-31 15:20:22.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,360 +0,0 @@ -/* - * PowerPC64 atomic bit operations. - * Dave Engebretsen, Todd Inglett, Don Reed, Pat McCarthy, Peter Bergner, - * Anton Blanchard - * - * Originally taken from the 32b PPC code. Modified to use 64b values for - * the various counters & memory references. - * - * Bitops are odd when viewed on big-endian systems. 
They were designed - * on little endian so the size of the bitset doesn't matter (low order bytes - * come first) as long as the bit in question is valid. - * - * Bits are "tested" often using the C expression (val & (1< - -/* - * clear_bit doesn't imply a memory barrier - */ -#define smp_mb__before_clear_bit() smp_mb() -#define smp_mb__after_clear_bit() smp_mb() - -static __inline__ int test_bit(unsigned long nr, __const__ volatile unsigned long *addr) -{ - return (1UL & (addr[nr >> 6] >> (nr & 63))); -} - -static __inline__ void set_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( -"1: ldarx %0,0,%3 # set_bit\n\ - or %0,%0,%2\n\ - stdcx. %0,0,%3\n\ - bne- 1b" - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -static __inline__ void clear_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( -"1: ldarx %0,0,%3 # clear_bit\n\ - andc %0,%0,%2\n\ - stdcx. %0,0,%3\n\ - bne- 1b" - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -static __inline__ void change_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( -"1: ldarx %0,0,%3 # change_bit\n\ - xor %0,%0,%2\n\ - stdcx. 
%0,0,%3\n\ - bne- 1b" - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -static __inline__ int test_and_set_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old, t; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( - EIEIO_ON_SMP -"1: ldarx %0,0,%3 # test_and_set_bit\n\ - or %1,%0,%2 \n\ - stdcx. %1,0,%3 \n\ - bne- 1b" - ISYNC_ON_SMP - : "=&r" (old), "=&r" (t) - : "r" (mask), "r" (p) - : "cc", "memory"); - - return (old & mask) != 0; -} - -static __inline__ int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old, t; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( - EIEIO_ON_SMP -"1: ldarx %0,0,%3 # test_and_clear_bit\n\ - andc %1,%0,%2\n\ - stdcx. %1,0,%3\n\ - bne- 1b" - ISYNC_ON_SMP - : "=&r" (old), "=&r" (t) - : "r" (mask), "r" (p) - : "cc", "memory"); - - return (old & mask) != 0; -} - -static __inline__ int test_and_change_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old, t; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( - EIEIO_ON_SMP -"1: ldarx %0,0,%3 # test_and_change_bit\n\ - xor %1,%0,%2\n\ - stdcx. %1,0,%3\n\ - bne- 1b" - ISYNC_ON_SMP - : "=&r" (old), "=&r" (t) - : "r" (mask), "r" (p) - : "cc", "memory"); - - return (old & mask) != 0; -} - -static __inline__ void set_bits(unsigned long mask, unsigned long *addr) -{ - unsigned long old; - - __asm__ __volatile__( -"1: ldarx %0,0,%3 # set_bit\n\ - or %0,%0,%2\n\ - stdcx. 
%0,0,%3\n\ - bne- 1b" - : "=&r" (old), "=m" (*addr) - : "r" (mask), "r" (addr), "m" (*addr) - : "cc"); -} - -/* - * non-atomic versions - */ -static __inline__ void __set_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - *p |= mask; -} - -static __inline__ void __clear_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - *p &= ~mask; -} - -static __inline__ void __change_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - *p ^= mask; -} - -static __inline__ int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - unsigned long old = *p; - - *p = old | mask; - return (old & mask) != 0; -} - -static __inline__ int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - unsigned long old = *p; - - *p = old & ~mask; - return (old & mask) != 0; -} - -static __inline__ int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - unsigned long old = *p; - - *p = old ^ mask; - return (old & mask) != 0; -} - -/* - * Return the zero-based bit position (from RIGHT TO LEFT, 63 -> 0) of the - * most significant (left-most) 1-bit in a double word. - */ -static __inline__ int __ilog2(unsigned long x) -{ - int lz; - - asm ("cntlzd %0,%1" : "=r" (lz) : "r" (x)); - return 63 - lz; -} - -/* - * Determines the bit position of the least significant (rightmost) 0 bit - * in the specified double word. 
The returned bit position will be zero-based, - * starting from the right side (63 - 0). - */ -static __inline__ unsigned long ffz(unsigned long x) -{ - /* no zero exists anywhere in the 8 byte area. */ - if ((x = ~x) == 0) - return 64; - - /* - * Calculate the bit position of the least signficant '1' bit in x - * (since x has been changed this will actually be the least signficant - * '0' bit in * the original x). Note: (x & -x) gives us a mask that - * is the least significant * (RIGHT-most) 1-bit of the value in x. - */ - return __ilog2(x & -x); -} - -static __inline__ int __ffs(unsigned long x) -{ - return __ilog2(x & -x); -} - -/* - * ffs: find first bit set. This is defined the same way as - * the libc and compiler builtin ffs routines, therefore - * differs in spirit from the above ffz (man ffs). - */ -static __inline__ int ffs(int x) -{ - unsigned long i = (unsigned long)x; - return __ilog2(i & -i) + 1; -} - -/* - * fls: find last (most-significant) bit set. - * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32. - */ -#define fls(x) generic_fls(x) - -/* - * hweightN: returns the hamming weight (i.e. 
the number - * of bits set) of a N-bit word - */ -#define hweight64(x) generic_hweight64(x) -#define hweight32(x) generic_hweight32(x) -#define hweight16(x) generic_hweight16(x) -#define hweight8(x) generic_hweight8(x) - -extern unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, unsigned long offset); -#define find_first_zero_bit(addr, size) \ - find_next_zero_bit((addr), (size), 0) - -extern unsigned long find_next_bit(const unsigned long *addr, unsigned long size, unsigned long offset); -#define find_first_bit(addr, size) \ - find_next_bit((addr), (size), 0) - -extern unsigned long find_next_zero_le_bit(const unsigned long *addr, unsigned long size, unsigned long offset); -#define find_first_zero_le_bit(addr, size) \ - find_next_zero_le_bit((addr), (size), 0) - -static __inline__ int test_le_bit(unsigned long nr, __const__ unsigned long * addr) -{ - __const__ unsigned char *ADDR = (__const__ unsigned char *) addr; - return (ADDR[nr >> 3] >> (nr & 7)) & 1; -} - -#define test_and_clear_le_bit(nr, addr) \ - test_and_clear_bit((nr) ^ 0x38, (addr)) -#define test_and_set_le_bit(nr, addr) \ - test_and_set_bit((nr) ^ 0x38, (addr)) - -/* - * non-atomic versions - */ - -#define __set_le_bit(nr, addr) \ - __set_bit((nr) ^ 0x38, (addr)) -#define __clear_le_bit(nr, addr) \ - __clear_bit((nr) ^ 0x38, (addr)) -#define __test_and_clear_le_bit(nr, addr) \ - __test_and_clear_bit((nr) ^ 0x38, (addr)) -#define __test_and_set_le_bit(nr, addr) \ - __test_and_set_bit((nr) ^ 0x38, (addr)) - -#define ext2_set_bit(nr,addr) \ - __test_and_set_le_bit((nr), (unsigned long*)addr) -#define ext2_clear_bit(nr, addr) \ - __test_and_clear_le_bit((nr), (unsigned long*)addr) - -#define ext2_set_bit_atomic(lock, nr, addr) \ - test_and_set_le_bit((nr), (unsigned long*)addr) -#define ext2_clear_bit_atomic(lock, nr, addr) \ - test_and_clear_le_bit((nr), (unsigned long*)addr) - - -#define ext2_test_bit(nr, addr) test_le_bit((nr),(unsigned long*)addr) -#define 
ext2_find_first_zero_bit(addr, size) \ - find_first_zero_le_bit((unsigned long*)addr, size) -#define ext2_find_next_zero_bit(addr, size, off) \ - find_next_zero_le_bit((unsigned long*)addr, size, off) - -#define minix_test_and_set_bit(nr,addr) test_and_set_bit(nr,addr) -#define minix_set_bit(nr,addr) set_bit(nr,addr) -#define minix_test_and_clear_bit(nr,addr) test_and_clear_bit(nr,addr) -#define minix_test_bit(nr,addr) test_bit(nr,addr) -#define minix_find_first_zero_bit(addr,size) find_first_zero_bit(addr,size) - -#endif /* __KERNEL__ */ -#endif /* _PPC64_BITOPS_H */ Index: working-2.6/arch/powerpc/lib/bitops.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/lib/bitops.c 2005-11-01 15:51:56.000000000 +1100 @@ -0,0 +1,150 @@ +#include +#include +#include +#include + +/** + * find_next_bit - find the next set bit in a memory region + * @addr: The address to base the search on + * @offset: The bitnumber to start searching at + * @size: The maximum size to search + */ +unsigned long find_next_bit(const unsigned long *addr, unsigned long size, + unsigned long offset) +{ + const unsigned long *p = addr + BITOP_WORD(offset); + unsigned long result = offset & ~(BITS_PER_LONG-1); + unsigned long tmp; + + if (offset >= size) + return size; + size -= result; + offset %= BITS_PER_LONG; + if (offset) { + tmp = *(p++); + tmp &= (~0UL << offset); + if (size < BITS_PER_LONG) + goto found_first; + if (tmp) + goto found_middle; + size -= BITS_PER_LONG; + result += BITS_PER_LONG; + } + while (size & ~(BITS_PER_LONG-1)) { + if ((tmp = *(p++))) + goto found_middle; + result += BITS_PER_LONG; + size -= BITS_PER_LONG; + } + if (!size) + return result; + tmp = *p; + +found_first: + tmp &= (~0UL >> (BITS_PER_LONG - size)); + if (tmp == 0UL) /* Are any bits set? */ + return result + size; /* Nope.
*/ +found_middle: + return result + __ffs(tmp); +} +EXPORT_SYMBOL(find_next_bit); + +/* + * This implementation of find_{first,next}_zero_bit was stolen from + * Linus' asm-alpha/bitops.h. + */ +unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, + unsigned long offset) +{ + const unsigned long *p = addr + BITOP_WORD(offset); + unsigned long result = offset & ~(BITS_PER_LONG-1); + unsigned long tmp; + + if (offset >= size) + return size; + size -= result; + offset %= BITS_PER_LONG; + if (offset) { + tmp = *(p++); + tmp |= ~0UL >> (BITS_PER_LONG - offset); + if (size < BITS_PER_LONG) + goto found_first; + if (~tmp) + goto found_middle; + size -= BITS_PER_LONG; + result += BITS_PER_LONG; + } + while (size & ~(BITS_PER_LONG-1)) { + if (~(tmp = *(p++))) + goto found_middle; + result += BITS_PER_LONG; + size -= BITS_PER_LONG; + } + if (!size) + return result; + tmp = *p; + +found_first: + tmp |= ~0UL << size; + if (tmp == ~0UL) /* Are any bits zero? */ + return result + size; /* Nope. */ +found_middle: + return result + ffz(tmp); +} +EXPORT_SYMBOL(find_next_zero_bit); + +static inline unsigned int ext2_ilog2(unsigned int x) +{ + int lz; + + asm("cntlzw %0,%1": "=r"(lz):"r"(x)); + return 31 - lz; +} + +static inline unsigned int ext2_ffz(unsigned int x) +{ + u32 rc; + if ((x = ~x) == 0) + return 32; + rc = ext2_ilog2(x & -x); + return rc; +} + +unsigned long find_next_zero_le_bit(const unsigned long *addr, + unsigned long size, unsigned long offset) +{ + const unsigned int *p = ((const unsigned int *)addr) + (offset >> 5); + unsigned int result = offset & ~31; + unsigned int tmp; + + if (offset >= size) + return size; + size -= result; + offset &= 31; + if (offset) { + tmp = cpu_to_le32p(p++); + tmp |= ~0U >> (32 - offset); /* bug or feature ? 
*/ + if (size < 32) + goto found_first; + if (tmp != ~0) + goto found_middle; + size -= 32; + result += 32; + } + while (size >= 32) { + if ((tmp = cpu_to_le32p(p++)) != ~0) + goto found_middle; + result += 32; + size -= 32; + } + if (!size) + return result; + tmp = cpu_to_le32p(p); +found_first: + tmp |= ~0 << size; + if (tmp == ~0) /* Are any bits zero? */ + return result + size; /* Nope. */ +found_middle: + return result + ext2_ffz(tmp); +} +EXPORT_SYMBOL(find_next_zero_le_bit); Index: working-2.6/arch/ppc64/kernel/bitops.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/bitops.c 2005-10-25 11:59:53.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,147 +0,0 @@ -/* - * These are too big to be inlined. - */ - -#include -#include -#include -#include - -unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, - unsigned long offset) -{ - const unsigned long *p = addr + (offset >> 6); - unsigned long result = offset & ~63UL; - unsigned long tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 63UL; - if (offset) { - tmp = *(p++); - tmp |= ~0UL >> (64 - offset); - if (size < 64) - goto found_first; - if (~tmp) - goto found_middle; - size -= 64; - result += 64; - } - while (size & ~63UL) { - if (~(tmp = *(p++))) - goto found_middle; - result += 64; - size -= 64; - } - if (!size) - return result; - tmp = *p; - -found_first: - tmp |= ~0UL << size; - if (tmp == ~0UL) /* Are any bits zero? */ - return result + size; /* Nope. 
*/ -found_middle: - return result + ffz(tmp); -} - -EXPORT_SYMBOL(find_next_zero_bit); - -unsigned long find_next_bit(const unsigned long *addr, unsigned long size, - unsigned long offset) -{ - const unsigned long *p = addr + (offset >> 6); - unsigned long result = offset & ~63UL; - unsigned long tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 63UL; - if (offset) { - tmp = *(p++); - tmp &= (~0UL << offset); - if (size < 64) - goto found_first; - if (tmp) - goto found_middle; - size -= 64; - result += 64; - } - while (size & ~63UL) { - if ((tmp = *(p++))) - goto found_middle; - result += 64; - size -= 64; - } - if (!size) - return result; - tmp = *p; - -found_first: - tmp &= (~0UL >> (64 - size)); - if (tmp == 0UL) /* Are any bits set? */ - return result + size; /* Nope. */ -found_middle: - return result + __ffs(tmp); -} - -EXPORT_SYMBOL(find_next_bit); - -static inline unsigned int ext2_ilog2(unsigned int x) -{ - int lz; - - asm("cntlzw %0,%1": "=r"(lz):"r"(x)); - return 31 - lz; -} - -static inline unsigned int ext2_ffz(unsigned int x) -{ - u32 rc; - if ((x = ~x) == 0) - return 32; - rc = ext2_ilog2(x & -x); - return rc; -} - -unsigned long find_next_zero_le_bit(const unsigned long *addr, unsigned long size, - unsigned long offset) -{ - const unsigned int *p = ((const unsigned int *)addr) + (offset >> 5); - unsigned int result = offset & ~31; - unsigned int tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 31; - if (offset) { - tmp = cpu_to_le32p(p++); - tmp |= ~0U >> (32 - offset); /* bug or feature ? */ - if (size < 32) - goto found_first; - if (tmp != ~0) - goto found_middle; - size -= 32; - result += 32; - } - while (size >= 32) { - if ((tmp = cpu_to_le32p(p++)) != ~0) - goto found_middle; - result += 32; - size -= 32; - } - if (!size) - return result; - tmp = cpu_to_le32p(p); -found_first: - tmp |= ~0 << size; - if (tmp == ~0) /* Are any bits zero? */ - return result + size; /* Nope. 
*/ -found_middle: - return result + ext2_ffz(tmp); -} - -EXPORT_SYMBOL(find_next_zero_le_bit); Index: working-2.6/arch/powerpc/kernel/ppc_ksyms.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/ppc_ksyms.c 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/ppc_ksyms.c 2005-11-01 15:51:56.000000000 +1100 @@ -81,15 +81,6 @@ EXPORT_SYMBOL(ucSystemType); #endif -#if !defined(__INLINE_BITOPS) -EXPORT_SYMBOL(set_bit); -EXPORT_SYMBOL(clear_bit); -EXPORT_SYMBOL(change_bit); -EXPORT_SYMBOL(test_and_set_bit); -EXPORT_SYMBOL(test_and_clear_bit); -EXPORT_SYMBOL(test_and_change_bit); -#endif /* __INLINE_BITOPS */ - EXPORT_SYMBOL(strcpy); EXPORT_SYMBOL(strncpy); EXPORT_SYMBOL(strcat); Index: working-2.6/arch/ppc/kernel/bitops.c =================================================================== --- working-2.6.orig/arch/ppc/kernel/bitops.c 2005-10-25 11:59:53.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,126 +0,0 @@ -/* - * Copyright (C) 1996 Paul Mackerras. - */ - -#include -#include - -/* - * If the bitops are not inlined in bitops.h, they are defined here. - * -- paulus - */ -#if !__INLINE_BITOPS -void set_bit(int nr, volatile void * addr) -{ - unsigned long old; - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%3 \n\ - or %0,%0,%2 \n" - PPC405_ERR77(0,%3) -" stwcx. %0,0,%3 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc" ); -} - -void clear_bit(int nr, volatile void *addr) -{ - unsigned long old; - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%3 \n\ - andc %0,%0,%2 \n" - PPC405_ERR77(0,%3) -" stwcx. 
%0,0,%3 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -void change_bit(int nr, volatile void *addr) -{ - unsigned long old; - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%3 \n\ - xor %0,%0,%2 \n" - PPC405_ERR77(0,%3) -" stwcx. %0,0,%3 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -int test_and_set_bit(int nr, volatile void *addr) -{ - unsigned int old, t; - unsigned int mask = 1 << (nr & 0x1f); - volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%4 \n\ - or %1,%0,%3 \n" - PPC405_ERR77(0,%4) -" stwcx. %1,0,%4 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=&r" (t), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); - - return (old & mask) != 0; -} - -int test_and_clear_bit(int nr, volatile void *addr) -{ - unsigned int old, t; - unsigned int mask = 1 << (nr & 0x1f); - volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%4 \n\ - andc %1,%0,%3 \n" - PPC405_ERR77(0,%4) -" stwcx. %1,0,%4 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=&r" (t), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); - - return (old & mask) != 0; -} - -int test_and_change_bit(int nr, volatile void *addr) -{ - unsigned int old, t; - unsigned int mask = 1 << (nr & 0x1f); - volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%4 \n\ - xor %1,%0,%3 \n" - PPC405_ERR77(0,%4) -" stwcx. 
%1,0,%4 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=&r" (t), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); - - return (old & mask) != 0; -} -#endif /* !__INLINE_BITOPS */ Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/Makefile 2005-11-01 15:51:56.000000000 +1100 @@ -13,7 +13,7 @@ obj-y += irq.o idle.o dma.o \ signal.o \ - align.o bitops.o pacaData.o \ + align.o pacaData.o \ udbg.o ioctl32.o \ rtc.o \ cpu_setup_power4.o \ Index: working-2.6/include/asm-powerpc/bitops.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/bitops.h 2005-11-01 15:51:56.000000000 +1100 @@ -0,0 +1,437 @@ +/* + * PowerPC atomic bit operations. + * + * Merged version by David Gibson . + * Based on ppc64 versions by: Dave Engebretsen, Todd Inglett, Don + * Reed, Pat McCarthy, Peter Bergner, Anton Blanchard. They + * originally took it from the ppc32 code. + * + * Within a word, bits are numbered LSB first. Lot's of places make + * this assumption by directly testing bits with (val & (1<<nr)). + * This can cause confusion for large (> 1 word) bitmaps on a + * big-endian system because, unlike little endian, the number of each + * bit depends on the word size.
+ * + * The bitop functions are defined to work on unsigned longs, so for a + * ppc64 system the bits end up numbered: + * |63..............0|127............64|191...........128|255...........192| + * and on ppc32: + * |31.....0|63....32|95....64|127...96|159..128|191..160|223..192|255..224| + * + * There are a few little-endian macros used mostly for filesystem + * bitmaps, these work on similar bit arrays layouts, but + * byte-oriented: + * |7...0|15...8|23...16|31...24|39...32|47...40|55...48|63...56| + * + * The main difference is that bit 3-5 (64b) or 3-4 (32b) in the bit + * number field needs to be reversed compared to the big-endian bit + * fields. This can be achieved by XOR with 0x38 (64b) or 0x18 (32b). + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef _ASM_POWERPC_BITOPS_H +#define _ASM_POWERPC_BITOPS_H + +#ifdef __KERNEL__ + +#include +#include +#include + +/* + * clear_bit doesn't imply a memory barrier + */ +#define smp_mb__before_clear_bit() smp_mb() +#define smp_mb__after_clear_bit() smp_mb() + +#define BITOP_MASK(nr) (1UL << ((nr) % BITS_PER_LONG)) +#define BITOP_WORD(nr) ((nr) / BITS_PER_LONG) +#define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7) + +#ifdef CONFIG_PPC64 +#define LARXL "ldarx" +#define STCXL "stdcx." +#define CNTLZL "cntlzd" +#else +#define LARXL "lwarx" +#define STCXL "stwcx."
+#define CNTLZL "cntlzw" +#endif + +static __inline__ void set_bit(int nr, volatile unsigned long *addr) +{ + unsigned long old; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( +"1:" LARXL " %0,0,%3 # set_bit\n" + "or %0,%0,%2\n" + PPC405_ERR77(0,%3) + STCXL " %0,0,%3\n" + "bne- 1b" + : "=&r"(old), "=m"(*p) + : "r"(mask), "r"(p), "m"(*p) + : "cc" ); +} + +static __inline__ void clear_bit(int nr, volatile unsigned long *addr) +{ + unsigned long old; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( +"1:" LARXL " %0,0,%3 # set_bit\n" + "andc %0,%0,%2\n" + PPC405_ERR77(0,%3) + STCXL " %0,0,%3\n" + "bne- 1b" + : "=&r"(old), "=m"(*p) + : "r"(mask), "r"(p), "m"(*p) + : "cc" ); +} + +static __inline__ void change_bit(int nr, volatile unsigned long *addr) +{ + unsigned long old; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( +"1:" LARXL " %0,0,%3 # set_bit\n" + "xor %0,%0,%2\n" + PPC405_ERR77(0,%3) + STCXL " %0,0,%3\n" + "bne- 1b" + : "=&r"(old), "=m"(*p) + : "r"(mask), "r"(p), "m"(*p) + : "cc" ); +} + +static __inline__ int test_and_set_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long old, t; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( + EIEIO_ON_SMP +"1:" LARXL " %0,0,%3 # test_and_set_bit\n" + "or %1,%0,%2 \n" + PPC405_ERR77(0,%3) + STCXL " %1,0,%3 \n" + "bne- 1b" + ISYNC_ON_SMP + : "=&r" (old), "=&r" (t) + : "r" (mask), "r" (p) + : "cc", "memory"); + + return (old & mask) != 0; +} + +static __inline__ int test_and_clear_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long old, t; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( + 
EIEIO_ON_SMP +"1:" LARXL " %0,0,%3 # test_and_clear_bit\n" + "andc %1,%0,%2 \n" + PPC405_ERR77(0,%3) + STCXL " %1,0,%3 \n" + "bne- 1b" + ISYNC_ON_SMP + : "=&r" (old), "=&r" (t) + : "r" (mask), "r" (p) + : "cc", "memory"); + + return (old & mask) != 0; +} + +static __inline__ int test_and_change_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long old, t; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( + EIEIO_ON_SMP +"1:" LARXL " %0,0,%3 # test_and_change_bit\n" + "xor %1,%0,%2 \n" + PPC405_ERR77(0,%3) + STCXL " %1,0,%3 \n" + "bne- 1b" + ISYNC_ON_SMP + : "=&r" (old), "=&r" (t) + : "r" (mask), "r" (p) + : "cc", "memory"); + + return (old & mask) != 0; +} + +static __inline__ void set_bits(unsigned long mask, unsigned long *addr) +{ + unsigned long old; + + __asm__ __volatile__( +"1:" LARXL " %0,0,%3 # set_bit\n" + "or %0,%0,%2\n" + STCXL " %0,0,%3\n" + "bne- 1b" + : "=&r" (old), "=m" (*addr) + : "r" (mask), "r" (addr), "m" (*addr) + : "cc"); +} + +/* Non-atomic versions */ +static __inline__ int test_bit(unsigned long nr, + __const__ volatile unsigned long *addr) +{ + return 1UL & (addr[BITOP_WORD(nr)] >> (nr & (BITS_PER_LONG-1))); +} + +static __inline__ void __set_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + *p |= mask; +} + +static __inline__ void __clear_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + *p &= ~mask; +} + +static __inline__ void __change_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + *p ^= mask; +} + +static __inline__ int __test_and_set_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned 
long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + unsigned long old = *p; + + *p = old | mask; + return (old & mask) != 0; +} + +static __inline__ int __test_and_clear_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + unsigned long old = *p; + + *p = old & ~mask; + return (old & mask) != 0; +} + +static __inline__ int __test_and_change_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + unsigned long old = *p; + + *p = old ^ mask; + return (old & mask) != 0; +} + +/* + * Return the zero-based bit position (LE, not IBM bit numbering) of + * the most significant 1-bit in a double word. + */ +static __inline__ int __ilog2(unsigned long x) +{ + int lz; + + asm (CNTLZL " %0,%1" : "=r" (lz) : "r" (x)); + return BITS_PER_LONG - 1 - lz; +} + +/* + * Determines the bit position of the least significant 0 bit in the + * specified double word. The returned bit position will be + * zero-based, starting from the right side (63/31 - 0). + */ +static __inline__ unsigned long ffz(unsigned long x) +{ + /* no zero exists anywhere in the 8 byte area. */ + if ((x = ~x) == 0) + return BITS_PER_LONG; + + /* + * Calculate the bit position of the least signficant '1' bit in x + * (since x has been changed this will actually be the least signficant + * '0' bit in * the original x). Note: (x & -x) gives us a mask that + * is the least significant * (RIGHT-most) 1-bit of the value in x. + */ + return __ilog2(x & -x); +} + +static __inline__ int __ffs(unsigned long x) +{ + return __ilog2(x & -x); +} + +/* + * ffs: find first bit set. This is defined the same way as + * the libc and compiler builtin ffs routines, therefore + * differs in spirit from the above ffz (man ffs). 
+ */ +static __inline__ int ffs(int x) +{ + unsigned long i = (unsigned long)x; + return __ilog2(i & -i) + 1; +} + +/* + * fls: find last (most-significant) bit set. + * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32. + */ +static __inline__ int fls(unsigned int x) +{ + int lz; + + asm ("cntlzw %0,%1" : "=r" (lz) : "r" (x)); + return 32 - lz; +} + +/* + * hweightN: returns the hamming weight (i.e. the number + * of bits set) of a N-bit word + */ +#define hweight64(x) generic_hweight64(x) +#define hweight32(x) generic_hweight32(x) +#define hweight16(x) generic_hweight16(x) +#define hweight8(x) generic_hweight8(x) + +#define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0) +unsigned long find_next_zero_bit(const unsigned long *addr, + unsigned long size, unsigned long offset); +/** + * find_first_bit - find the first set bit in a memory region + * @addr: The address to start the search at + * @size: The maximum size to search + * + * Returns the bit-number of the first set bit, not the number of the byte + * containing a bit. 
+ */ +#define find_first_bit(addr, size) find_next_bit((addr), (size), 0) +unsigned long find_next_bit(const unsigned long *addr, + unsigned long size, unsigned long offset); + +/* Little-endian versions */ + +static __inline__ int test_le_bit(unsigned long nr, + __const__ unsigned long *addr) +{ + __const__ unsigned char *tmp = (__const__ unsigned char *) addr; + return (tmp[nr >> 3] >> (nr & 7)) & 1; +} + +#define __set_le_bit(nr, addr) \ + __set_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) +#define __clear_le_bit(nr, addr) \ + __clear_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) + +#define test_and_set_le_bit(nr, addr) \ + test_and_set_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) +#define test_and_clear_le_bit(nr, addr) \ + test_and_clear_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) + +#define __test_and_set_le_bit(nr, addr) \ + __test_and_set_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) +#define __test_and_clear_le_bit(nr, addr) \ + __test_and_clear_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) + +#define find_first_zero_le_bit(addr, size) find_next_zero_le_bit((addr), (size), 0) +unsigned long find_next_zero_le_bit(const unsigned long *addr, + unsigned long size, unsigned long offset); + +/* Bitmap functions for the ext2 filesystem */ + +#define ext2_set_bit(nr,addr) \ + __test_and_set_le_bit((nr), (unsigned long*)addr) +#define ext2_clear_bit(nr, addr) \ + __test_and_clear_le_bit((nr), (unsigned long*)addr) + +#define ext2_set_bit_atomic(lock, nr, addr) \ + test_and_set_le_bit((nr), (unsigned long*)addr) +#define ext2_clear_bit_atomic(lock, nr, addr) \ + test_and_clear_le_bit((nr), (unsigned long*)addr) + +#define ext2_test_bit(nr, addr) test_le_bit((nr),(unsigned long*)addr) + +#define ext2_find_first_zero_bit(addr, size) \ + find_first_zero_le_bit((unsigned long*)addr, size) +#define ext2_find_next_zero_bit(addr, size, off) \ + find_next_zero_le_bit((unsigned long*)addr, size, off) + +/* Bitmap functions for the minix filesystem. 
*/ + +#define minix_test_and_set_bit(nr,addr) \ + __test_and_set_le_bit(nr, (unsigned long *)addr) +#define minix_set_bit(nr,addr) \ + __set_le_bit(nr, (unsigned long *)addr) +#define minix_test_and_clear_bit(nr,addr) \ + __test_and_clear_le_bit(nr, (unsigned long *)addr) +#define minix_test_bit(nr,addr) \ + test_le_bit(nr, (unsigned long *)addr) + +#define minix_find_first_zero_bit(addr,size) \ + find_first_zero_le_bit((unsigned long *)addr, size) + +/* + * Every architecture must define this function. It's the fastest + * way of searching a 140-bit bitmap where the first 100 bits are + * unlikely to be set. It's guaranteed that at least one of the 140 + * bits is cleared. + */ +static inline int sched_find_first_bit(const unsigned long *b) +{ +#ifdef CONFIG_PPC64 + if (unlikely(b[0])) + return __ffs(b[0]); + if (unlikely(b[1])) + return __ffs(b[1]) + 64; + return __ffs(b[2]) + 128; +#else + if (unlikely(b[0])) + return __ffs(b[0]); + if (unlikely(b[1])) + return __ffs(b[1]) + 32; + if (unlikely(b[2])) + return __ffs(b[2]) + 64; + if (b[3]) + return __ffs(b[3]) + 96; + return __ffs(b[4]) + 128; +#endif +} + +#endif /* __KERNEL__ */ + +#endif /* _ASM_POWERPC_BITOPS_H */ Index: working-2.6/include/asm-ppc64/mmu_context.h =================================================================== --- working-2.6.orig/include/asm-ppc64/mmu_context.h 2005-10-25 11:59:59.000000000 +1000 +++ working-2.6/include/asm-ppc64/mmu_context.h 2005-11-01 15:51:56.000000000 +1100 @@ -16,21 +16,6 @@ * 2 of the License, or (at your option) any later version. */ -/* - * Every architecture must define this function. It's the fastest - * way of searching a 140-bit bitmap where the first 100 bits are - * unlikely to be set. It's guaranteed that at least one of the 140 - * bits is cleared. 
- */ -static inline int sched_find_first_bit(unsigned long *b) -{ - if (unlikely(b[0])) - return __ffs(b[0]); - if (unlikely(b[1])) - return __ffs(b[1]) + 64; - return __ffs(b[2]) + 128; -} - static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) { } Index: working-2.6/arch/ppc/Makefile =================================================================== --- working-2.6.orig/arch/ppc/Makefile 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/arch/ppc/Makefile 2005-11-01 15:51:56.000000000 +1100 @@ -66,7 +66,8 @@ core-y += arch/ppc/kernel/ arch/powerpc/kernel/ \ arch/ppc/platforms/ \ arch/ppc/mm/ arch/ppc/lib/ \ - arch/ppc/syslib/ arch/powerpc/sysdev/ + arch/ppc/syslib/ arch/powerpc/sysdev/ \ + arch/powerpc/lib/ core-$(CONFIG_4xx) += arch/ppc/platforms/4xx/ core-$(CONFIG_83xx) += arch/ppc/platforms/83xx/ core-$(CONFIG_85xx) += arch/ppc/platforms/85xx/ Index: working-2.6/arch/powerpc/lib/Makefile =================================================================== --- working-2.6.orig/arch/powerpc/lib/Makefile 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/arch/powerpc/lib/Makefile 2005-11-01 15:51:56.000000000 +1100 @@ -3,13 +3,14 @@ # ifeq ($(CONFIG_PPC_MERGE),y) -obj-y := string.o +obj-y := string.o strcase.o +obj-$(CONFIG_PPC32) += div64.o copy_32.o checksum_32.o endif -obj-y += strcase.o -obj-$(CONFIG_PPC32) += div64.o copy_32.o checksum_32.o +obj-y += bitops.o obj-$(CONFIG_PPC64) += checksum_64.o copypage_64.o copyuser_64.o \ - memcpy_64.o usercopy_64.o mem_64.o + memcpy_64.o usercopy_64.o mem_64.o \ + strcase.o obj-$(CONFIG_PPC_ISERIES) += e2a.o obj-$(CONFIG_XMON) += sstep.o -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/people/dgibson

From mikey at neuling.org Tue Nov 1 18:14:40 2005
From: mikey at neuling.org (Michael Neuling)
Date: Tue, 1 Nov 2005 18:14:40 +1100 (EST)
Subject: [PATCH 0/3] powerpc: Fix legacy drivers for remove io_page_mask patch
Message-ID: <1130829279.790724.373020147318.qpush@coopers>

These patches are the same as the ones I sent the other day, except updated for the merge tree. The set now also contains an updated version of Anton's original patch, which will apply cleanly to the merge tree.

Anton's patch is necessary for running kexec with some e1000 revisions. The issue we have is that the reset is not being sent to the e1000 correctly, resulting in it still running during the second boot.

The other two patches fix drivers which have issues with the first patch.

Mikey

From mikey at neuling.org Tue Nov 1 18:14:41 2005
From: mikey at neuling.org (Michael Neuling)
Date: Tue, 1 Nov 2005 18:14:41 +1100 (EST)
Subject: [PATCH 1/3] powerpc: Updated remove io_page_mask
In-Reply-To: <1130829279.790724.373020147318.qpush@coopers>
Message-ID: <20051101071441.1C53668662@ozlabs.org>

From: Anton Blanchard

Retransmit of Anton's patch from here:
http://ozlabs.org/pipermail/linuxppc64-dev/2005-May/003922.html

Updated for merge tree.
Signed-off-by: Michael Neuling arch/powerpc/platforms/iseries/pci.c | 3 --- arch/powerpc/platforms/maple/pci.c | 3 --- arch/powerpc/platforms/powermac/pci.c | 3 --- arch/ppc64/kernel/iomap.c | 2 -- arch/ppc64/kernel/pci.c | 30 +++--------------------------- include/asm-ppc64/eeh.h | 15 +++------------ include/asm-ppc64/io.h | 6 ------ 7 files changed, 6 insertions(+), 56 deletions(-) Index: linux-2.6/arch/powerpc/platforms/iseries/pci.c =================================================================== --- linux-2.6.orig/arch/powerpc/platforms/iseries/pci.c 2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/arch/powerpc/platforms/iseries/pci.c 2005-11-01 11:04:27.000000000 +1100 @@ -45,8 +45,6 @@ #include "pci.h" #include "call_pci.h" -extern unsigned long io_page_mask; - /* * Forward declares of prototypes. */ @@ -288,7 +286,6 @@ PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Entry.\n"); iomm_table_initialize(); find_and_init_phbs(); - io_page_mask = -1; PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Exit.\n"); } Index: linux-2.6/arch/powerpc/platforms/maple/pci.c =================================================================== --- linux-2.6.orig/arch/powerpc/platforms/maple/pci.c 2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/arch/powerpc/platforms/maple/pci.c 2005-11-01 11:06:30.000000000 +1100 @@ -455,9 +455,6 @@ /* Tell pci.c to use the common resource allocation mecanism */ pci_probe_only = 0; - - /* Allow all IO */ - io_page_mask = -1; } int maple_pci_get_legacy_ide_irq(struct pci_dev *pdev, int channel) Index: linux-2.6/arch/powerpc/platforms/powermac/pci.c =================================================================== --- linux-2.6.orig/arch/powerpc/platforms/powermac/pci.c 2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/arch/powerpc/platforms/powermac/pci.c 2005-11-01 11:10:07.000000000 +1100 @@ -926,9 +926,6 @@ /* Tell pci.c to not use the common resource allocation mechanism */ pci_probe_only = 1; - /* Allow all IO */ - io_page_mask = -1; - 
#else /* CONFIG_PPC64 */ init_p2pbridge(); fixup_nec_usb2(); Index: linux-2.6/arch/ppc64/kernel/iomap.c =================================================================== --- linux-2.6.orig/arch/ppc64/kernel/iomap.c 2005-11-01 10:30:32.000000000 +1100 +++ linux-2.6/arch/ppc64/kernel/iomap.c 2005-11-01 11:05:01.000000000 +1100 @@ -108,8 +108,6 @@ void __iomem *ioport_map(unsigned long port, unsigned int len) { - if (!_IO_IS_VALID(port)) - return NULL; return (void __iomem *) (port+pci_io_base); } Index: linux-2.6/arch/ppc64/kernel/pci.c =================================================================== --- linux-2.6.orig/arch/ppc64/kernel/pci.c 2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/arch/ppc64/kernel/pci.c 2005-11-01 11:08:44.000000000 +1100 @@ -42,14 +42,6 @@ unsigned long pci_probe_only = 1; unsigned long pci_assign_all_buses = 0; -/* - * legal IO pages under MAX_ISA_PORT. This is to ensure we don't touch - * devices we don't have access to. - */ -unsigned long io_page_mask; - -EXPORT_SYMBOL(io_page_mask); - #ifdef CONFIG_PPC_MULTIPLATFORM static void fixup_resource(struct resource *res, struct pci_dev *dev); static void do_bus_setup(struct pci_bus *bus); @@ -995,8 +987,6 @@ pci_process_ISA_OF_ranges(isa_dn, hose->io_base_phys, hose->io_base_virt); of_node_put(isa_dn); - /* Allow all IO */ - io_page_mask = -1; } } @@ -1132,27 +1122,13 @@ static void __devinit fixup_resource(struct resource *res, struct pci_dev *dev) { struct pci_controller *hose = pci_bus_to_host(dev->bus); - unsigned long start, end, mask, offset; + unsigned long offset; if (res->flags & IORESOURCE_IO) { offset = (unsigned long)hose->io_base_virt - pci_io_base; - start = res->start += offset; - end = res->end += offset; - - /* Need to allow IO access to pages that are in the - ISA range */ - if (start < MAX_ISA_PORT) { - if (end > MAX_ISA_PORT) - end = MAX_ISA_PORT; - - start >>= PAGE_SHIFT; - end >>= PAGE_SHIFT; - - /* get the range of pages for the map */ - mask = ((1 << (end+1)) - 
1) ^ ((1 << start) - 1); - io_page_mask |= mask; - } + res->start += offset; + res->end += offset; } else if (res->flags & IORESOURCE_MEM) { res->start += hose->pci_mem_offset; res->end += hose->pci_mem_offset; Index: linux-2.6/include/asm-ppc64/eeh.h =================================================================== --- linux-2.6.orig/include/asm-ppc64/eeh.h 2005-11-01 10:30:32.000000000 +1100 +++ linux-2.6/include/asm-ppc64/eeh.h 2005-11-01 11:13:05.000000000 +1100 @@ -311,8 +311,6 @@ static inline u8 eeh_inb(unsigned long port) { u8 val; - if (!_IO_IS_VALID(port)) - return ~0; val = in_8((u8 __iomem *)(port+pci_io_base)); if (EEH_POSSIBLE_ERROR(val, u8)) return eeh_check_failure((void __iomem *)(port), val); @@ -321,15 +319,12 @@ static inline void eeh_outb(u8 val, unsigned long port) { - if (_IO_IS_VALID(port)) - out_8((u8 __iomem *)(port+pci_io_base), val); + out_8((u8 __iomem *)(port+pci_io_base), val); } static inline u16 eeh_inw(unsigned long port) { u16 val; - if (!_IO_IS_VALID(port)) - return ~0; val = in_le16((u16 __iomem *)(port+pci_io_base)); if (EEH_POSSIBLE_ERROR(val, u16)) return eeh_check_failure((void __iomem *)(port), val); @@ -338,15 +333,12 @@ static inline void eeh_outw(u16 val, unsigned long port) { - if (_IO_IS_VALID(port)) - out_le16((u16 __iomem *)(port+pci_io_base), val); + out_le16((u16 __iomem *)(port+pci_io_base), val); } static inline u32 eeh_inl(unsigned long port) { u32 val; - if (!_IO_IS_VALID(port)) - return ~0; val = in_le32((u32 __iomem *)(port+pci_io_base)); if (EEH_POSSIBLE_ERROR(val, u32)) return eeh_check_failure((void __iomem *)(port), val); @@ -355,8 +347,7 @@ static inline void eeh_outl(u32 val, unsigned long port) { - if (_IO_IS_VALID(port)) - out_le32((u32 __iomem *)(port+pci_io_base), val); + out_le32((u32 __iomem *)(port+pci_io_base), val); } /* in-string eeh macros */ Index: linux-2.6/include/asm-ppc64/io.h =================================================================== --- linux-2.6.orig/include/asm-ppc64/io.h 
2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/include/asm-ppc64/io.h 2005-11-01 11:13:40.000000000 +1100 @@ -33,12 +33,6 @@ extern unsigned long isa_io_base; extern unsigned long pci_io_base; -extern unsigned long io_page_mask; - -#define MAX_ISA_PORT 0x10000 - -#define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) \ - & io_page_mask) #ifdef CONFIG_PPC_ISERIES /* __raw_* accessors aren't supported on iSeries */ From mikey at neuling.org Tue Nov 1 18:14:41 2005 From: mikey at neuling.org (Michael Neuling) Date: Tue, 1 Nov 2005 18:14:41 +1100 (EST) Subject: [PATCH 2/3] powerpc: Updated parallel port init fix In-Reply-To: <1130829279.790724.373020147318.qpush@coopers> Message-ID: <20051101071441.1B6BB68665@ozlabs.org> Updated for powerpc merge tree. Signed-off-by: Michael Neuling include/asm-powerpc/parport.h | 28 ++++++++++++++++++++++++++-- 1 files changed, 26 insertions(+), 2 deletions(-) Index: linux-2.6/include/asm-powerpc/parport.h =================================================================== --- linux-2.6.orig/include/asm-powerpc/parport.h 2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/include/asm-powerpc/parport.h 2005-11-01 11:35:05.000000000 +1100 @@ -9,10 +9,34 @@ #ifndef _ASM_POWERPC_PARPORT_H #define _ASM_POWERPC_PARPORT_H -static int __devinit parport_pc_find_isa_ports (int autoirq, int autodma); +#include + +extern struct parport *parport_pc_probe_port (unsigned long int base, + unsigned long int base_hi, + int irq, int dma, + struct pci_dev *dev); + static int __devinit parport_pc_find_nonpci_ports (int autoirq, int autodma) { - return parport_pc_find_isa_ports (autoirq, autodma); + struct device_node *np; + u32 *prop; + u32 io1, io2; + int propsize; + int count = 0; + for (np = NULL; (np = of_find_compatible_node(np, + "parallel", + "pnpPNP,400")) != NULL;) { + prop = (u32 *)get_property(np, "reg", &propsize); + if (!prop || propsize > 6*sizeof(u32)) + continue; + io1 = prop[1]; io2 = prop[2]; + prop = (u32 
*)get_property(np, "interrupts", NULL); + if (!prop) + continue; + if (parport_pc_probe_port(io1, io2, prop[0], autodma, NULL) != NULL) + count++; + } + return count; } #endif /* !(_ASM_POWERPC_PARPORT_H) */ From mikey at neuling.org Tue Nov 1 18:14:42 2005 From: mikey at neuling.org (Michael Neuling) Date: Tue, 1 Nov 2005 18:14:42 +1100 (EST) Subject: [PATCH 3/3] powerpc: Updated PC speaker init fix In-Reply-To: <1130829279.790724.373020147318.qpush@coopers> Message-ID: <20051101071442.CD1C66866A@ozlabs.org> Updated for powerpc merge tree. Adds architecture specific init to pcspkr. Signed-off-by: Michael Neuling Acked-by: Paul Mackerras drivers/input/misc/pcspkr.c | 5 +++++ include/asm-powerpc/8253pit.h | 13 +++++++++++++ 2 files changed, 18 insertions(+) Index: linux-2.6/drivers/input/misc/pcspkr.c =================================================================== --- linux-2.6.orig/drivers/input/misc/pcspkr.c 2005-10-31 15:16:39.000000000 +1100 +++ linux-2.6/drivers/input/misc/pcspkr.c 2005-10-31 15:21:13.000000000 +1100 @@ -66,6 +66,11 @@ static int __init pcspkr_init(void) { +#ifdef HAS_PCSPKR_ARCH_INIT + int rc = pcspkr_arch_init(); + if (rc) + return rc; +#endif pcspkr_dev = input_allocate_device(); if (!pcspkr_dev) return -ENOMEM; Index: linux-2.6/include/asm-powerpc/8253pit.h =================================================================== --- linux-2.6.orig/include/asm-powerpc/8253pit.h 2005-10-31 15:02:18.000000000 +1100 +++ linux-2.6/include/asm-powerpc/8253pit.h 2005-10-31 15:20:30.000000000 +1100 @@ -5,6 +5,19 @@ * 8253/8254 Programmable Interval Timer */ +#include + #define PIT_TICK_RATE 1193182UL +#define HAS_PCSPKR_ARCH_INIT + +static inline int pcspkr_arch_init(void) +{ + struct device_node *np; + + np = of_find_compatible_node(NULL, NULL, "pnpPNP,100"); + of_node_put(np); + return np ? 
0 : -ENODEV;
+}
+
 #endif /* _ASM_POWERPC_8253PIT_H */

From paulus at samba.org Tue Nov 1 16:53:37 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 1 Nov 2005 16:53:37 +1100
Subject: Patches for 2.6.15
In-Reply-To: <20051028103041.B15268@cox.net>
References: <17250.8725.358204.62510@cargo.ozlabs.ibm.com>
	<20051028103041.B15268@cox.net>
Message-ID: <17255.737.832440.137041@cargo.ozlabs.ibm.com>

Matt Porter writes:

> Ok, we have a set of 4xx patches that I plan to send to Andrew.
> They are some existing 4xx SoC/board updates as well as a new
> SoC/board. They are obviously mostly confined to the 4xx code paths
> but there's likely conflicts in changes to Makefiles, etc.
>
> Would you prefer these going upstream before or after the
> powerpc-merge pull?

Did you send them yet? Linus has pulled the powerpc-merge tree, as I'm sure you've noticed.

Paul.

From paulus at samba.org Tue Nov 1 16:46:29 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 1 Nov 2005 16:46:29 +1100
Subject: [PATCH] VMX get_user w/ irq disabled
In-Reply-To: <20051028115509.1bb23cb6.moilanen@austin.ibm.com>
References: <20051028115509.1bb23cb6.moilanen@austin.ibm.com>
Message-ID: <17255.309.688169.531174@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> Looks like we have a get_user() call with interrupts disabled. While I
> haven't seen the problem, I believe we have the same hole in mainline.
>
> The patch below fixed the problem on Redhat (rebased at 2.6.14).

The problem is that altivec_assist_exception gets called with interrupts disabled on ppc64. I haven't decided whether to change head_64.S or just do a local_irq_enable() inside altivec_assist_exception().

Paul.

From paulus at samba.org Tue Nov 1 16:57:58 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 1 Nov 2005 16:57:58 +1100
Subject: Patches for 2.6.15
In-Reply-To:
References: <17250.8725.358204.62510@cargo.ozlabs.ibm.com>
Message-ID: <17255.998.459385.399953@cargo.ozlabs.ibm.com>

Kumar Gala writes:

> Can you merge this in:
>
> http://patchwork.ozlabs.org/linuxppc/patch?id=2931

Having the same extern declaration in several C files raises a red flag. Could we have that in a suitable header file instead please?

Paul.

From paulus at samba.org Tue Nov 1 20:55:17 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 1 Nov 2005 20:55:17 +1100
Subject: [patch 2/5] powerpc: create a new arch/powerpc/platforms/cell/smp.c
In-Reply-To: <200511010050.48828.arnd@arndb.de>
References: <20051101010836.771791000@localhost>
	<20051101011133.300238000@localhost>
	<200511011026.59266.michael@ellerman.id.au>
	<200511010050.48828.arnd@arndb.de>
Message-ID: <17255.15237.695554.410362@cargo.ozlabs.ibm.com>

Arnd Bergmann writes:

> On Dinsdag 01 November 2005 00:26, Michael Ellerman wrote:
> > A lot of your smp routines are identical to the pSeries versions. Wouldn't it
> > be preferable to only have one implementation?
>
> Yes it would. I'm not sure how that would best be done however. Until 2.6.14,
> we've just used the pSeries implementation, which does not work any more now
> that we want to keep the platform stuff in separate directories.

I don't mind putting generic rtas stuff in arch/powerpc/kernel.

Paul.

From mporter at kernel.crashing.org Wed Nov 2 00:22:50 2005
From: mporter at kernel.crashing.org (Matt Porter)
Date: Tue, 1 Nov 2005 06:22:50 -0700
Subject: Patches for 2.6.15
In-Reply-To: <17255.737.832440.137041@cargo.ozlabs.ibm.com>; from paulus@samba.org on Tue, Nov 01, 2005 at 04:53:37PM +1100
References: <17250.8725.358204.62510@cargo.ozlabs.ibm.com>
	<20051028103041.B15268@cox.net>
	<17255.737.832440.137041@cargo.ozlabs.ibm.com>
Message-ID: <20051101062250.A28639@cox.net>

On Tue, Nov 01, 2005 at 04:53:37PM +1100, Paul Mackerras wrote:
> Matt Porter writes:
>
> > Ok, we have a set of 4xx patches that I plan to send to Andrew.
> > They are some existing 4xx SoC/board updates as well as a new
> > SoC/board. They are obviously mostly confined to the 4xx code paths
> > but there's likely conflicts in changes to Makefiles, etc.
> >
> > Would you prefer these going upstream before or after the
> > powerpc-merge pull?
>
> Did you send them yet? Linus has pulled the powerpc-merge tree, as
> I'm sure you've noticed.

Yes I did. I saw the merge go into mainline and rebased what was necessary off of that. Andrew now has them queued up for Linus so we are set.

BTW, we're starting to look at merging 4xx to arch/powerpc/ now.

-Matt

From dwmw2 at infradead.org Wed Nov 2 02:54:04 2005
From: dwmw2 at infradead.org (David Woodhouse)
Date: Tue, 01 Nov 2005 15:54:04 +0000
Subject: please pull the powerpc-merge.git tree
In-Reply-To: <17253.39993.502458.390760@cargo.ozlabs.ibm.com>
References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com>
Message-ID: <1130860444.21212.52.camel@hades.cambridge.redhat.com>

On Mon, 2005-10-31 at 15:23 +1100, Paul Mackerras wrote:
> It is now possible to build kernels for powermac, pSeries, iSeries and
> maple with ARCH=powerpc, and for powermac, both 32-bit and 64-bit
> build and run.

Hm. Not entirely in line with my experience. Can you share the configs you used?

Using http://david.woodhou.se/powerpc-merge-32.config it doesn't actually boot on my powerbook. I'll try it on the Pegasos later or tomorrow, where I have a serial console; it dies very early.

Aside from disabling CONFIG_NVRAM because call_rtas() isn't implemented anywhere, I also needed to do this to make that config build:

--- linux-2.6.14/arch/powerpc/kernel/setup-common.c.orig 2005-11-01 10:14:32.000000000 +0000
+++ linux-2.6.14/arch/powerpc/kernel/setup-common.c 2005-11-01 10:15:03.000000000 +0000
@@ -203,11 +203,11 @@ static int show_cpuinfo(struct seq_file
 #ifdef CONFIG_TAU_AVERAGE
 		/* more straightforward, but potentially misleading */
 		seq_printf(m, "temperature \t: %u C (uncalibrated)\n",
-			   cpu_temp(i));
+			   cpu_temp(cpu_id));
 #else
 		/* show the actual temp sensor range */
 		u32 temp;
-		temp = cpu_temp_both(i);
+		temp = cpu_temp_both(cpu_id);
 		seq_printf(m, "temperature \t: %u-%u C (uncalibrated)\n",
 			   temp & 0xff, temp >> 16);
 #endif

--
dwmw2

From dwmw2 at infradead.org Wed Nov 2 03:06:47 2005
From: dwmw2 at infradead.org (David Woodhouse)
Date: Tue, 01 Nov 2005 16:06:47 +0000
Subject: please pull the powerpc-merge.git tree
In-Reply-To: <17253.39993.502458.390760@cargo.ozlabs.ibm.com>
References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com>
Message-ID: <1130861207.21212.66.camel@hades.cambridge.redhat.com>

On Mon, 2005-10-31 at 15:23 +1100, Paul Mackerras wrote:
> It is now possible to build kernels for powermac, pSeries, iSeries and
> maple with ARCH=powerpc, and for powermac, both 32-bit and 64-bit
> build and run.

The ppc64 build (http://david.woodhou.se/powerpc-merge-64.config) fares worse than ppc32 for me -- it doesn't even build.

arch/powerpc/platforms/powermac/pic.c:614: error: 'ppc_cached_irq_mask' undeclared (first use in this function)
arch/powerpc/platforms/powermac/pic.c:620: error: 'pmac_irq_hw' undeclared (first use in this function)
arch/powerpc/platforms/powermac/pic.c:621: error: 'max_real_irqs'
undeclared (first use in this function)
arch/powerpc/platforms/powermac/pic.c:641: warning: implicit declaration of function 'pmac_unmask_irq'

If I leave CONFIG_ADB_PMU enabled (as I think I should since some G5s have it?) I also see this:

drivers/macintosh/via-pmu.c:2410: undefined reference to `.pmac_tweak_clock_spreading'
drivers/macintosh/via-pmu.c:2494: undefined reference to `.set_context'
drivers/macintosh/via-pmu.c:2670: undefined reference to `._nmask_and_or_msr'
drivers/macintosh/via-pmu.c:2592: undefined reference to `.set_context'

If I turn CONFIG_ADB_PMU off, I see this:

arch/powerpc/platforms/powermac/time.c:335: undefined reference to `.pmu_register_sleep_notifier'

I think I'll leave the task of switching the Fedora rawhide kernel to arch/powerpc to another day :)

--
dwmw2

From miltonm at bga.com Wed Nov 2 02:02:08 2005
From: miltonm at bga.com (Milton Miller)
Date: Tue, 1 Nov 2005 09:02:08 -0600
Subject: Patches for 2.6.15
Message-ID:

Paul wrote:
> Kumar Gala writes:
> > http://patchwork.ozlabs.org/linuxppc/patch?id=2931
>
> Having the same extern declaration in several C files raises a red
> flag. Could we have that in a suitable header file instead please?

And the patch header says:
> Having a prototype that uses seq_file without always including
> seq_file.h generates a lot of warnings. This happened when asm/irq.h
> was merged.

Hint: a simple struct seq_file; in the appropriate header should suffice.

milton

From arnd at arndb.de Wed Nov 2 06:13:50 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Tue, 1 Nov 2005 20:13:50 +0100
Subject: powerpc: Merge ipcbuf.h
In-Reply-To: <20051101055324.GA3551@localhost.localdomain>
References: <20051101055324.GA3551@localhost.localdomain>
Message-ID: <200511012013.51443.arnd@arndb.de>

On Dinsdag 01 November 2005 06:53, David Gibson wrote:
> +struct ipc64_perm
> +{
> +       __kernel_key_t  key;
> +       __kernel_uid_t  uid;
> +       __kernel_gid_t  gid;
> +       __kernel_uid_t  cuid;
> +       __kernel_gid_t  cgid;
> +       __kernel_mode_t mode;
> +       unsigned int    seq;
> +       unsigned int    __pad1;
> +       u64             __unused1;
> +       u64             __unused2;
> +};

ipc64_perm is a user visible structure, so you have to use __u64 here instead of u64. Even that does not exist if you build with 32 bit and __STRICT_ANSI__, so it might be better yet to use four __u32 for the unused fields.

Arnd <><

From ingvar at linpro.no Wed Nov 2 09:49:26 2005
From: ingvar at linpro.no (Ingvar Hagelund)
Date: 01 Nov 2005 23:49:26 +0100
Subject: dlpar problem on sles9/openpower
Message-ID:

This is a pure user question.

We have an IBM OpenPower 720 with hypervisor and a set of lpars. Earlier we got dlpar to work, at least a couple of times, but not anymore. Here's an example:

Ran this on the hmc:

~> chhwres -r virtualio -m Server-9124-720-SNXXXXXXX -p mgmt -o a \
   --rsubtype eth -s 11 \
   -a "ieee_virtual_eth=1,port_vlan_id=1,is_trunk=1,\"addl_vlan_ids=600,601\""
HSCL294C DLPAR ADD Virtual I/O resources failed:
HMC adding Virtual I/O ...... HMC Virtual slot DLPAR operation failed. Here are the virtual slot IDs that failed and the reasons for failure:
11 The dynamic logical partitioning operation failed.

On the lpar "mgmt", running sles9 sp2 kernel 2.6.5-7.193-pseries64, we have the magic rpms from IBM installed:

evlog-drv-tmpl-0.8-1
diagela-1.3.0.0-6
lsvpd-0.12.7-1
ppc64-utils-2.5-2
librtas-1.2-1

...
and the following IBM services are running: # lssrc -a Subsystem Group PID Status ctrmc rsct 5011 active IBM.ServiceRM rsct_rm 5103 active IBM.DRM rsct_rm 5110 active IBM.HostRM rsct_rm 5186 active IBM.CSMAgentRM rsct_rm 5217 active IBM.ERRM rsct_rm 5221 active IBM.AuditRM rsct_rm 5266 active ctcas rsct 7705 active IBM.SensorRM rsct_rm 7721 active IBM.FSRM rsct_rm 7722 active IBM.ConfigRM rsct_rm 7723 active The hmc seems to be able to talk to the managed system ~> lshwres -r virtualio --rsubtype eth -m Server-9124-720-SNXXXXXXX \ --level lpar --filter "lpar_names=mgmt" lpar_name=mgmt,lpar_id=1,slot_num=10,state=null,ieee_virtual_eth=1,port_vlan_id=1,"addl_vlan_ids=500,501,502,503",is_trunk=1,is_required=0,mac_addr=AE808000100A ... and to the lpar: ~> lspartition -dlpar <#0> Partition:<1, power0.somewhere.tld, 10.0.0.2> Active:<1>, OS:, DCaps:<0xf>, CmdCaps:<0x1, 0x1>, PinnedMem:<0> So, where can I start digging? Regards, Ingvar -- Many that live deserve death. And some that die deserve life. Can you give it to them? Then do not be too eager to deal out death in judgement. For even the very wise cannot see all ends. Gandalf From tdgarcia at us.ibm.com Wed Nov 2 09:51:24 2005 From: tdgarcia at us.ibm.com (tdgarcia) Date: Tue, 01 Nov 2005 16:51:24 -0600 Subject: lockmeter port for ppc64 Message-ID: <4367F16C.5080003@us.ibm.com> My team and I adapted your lockmeter code to the ppc64 architecture. I am attaching the ppc64 specific code to this email. This code should patch cleanly to the 2.6.13 kernel. diff -Narup linux-2.6.13/arch/ppc64/Kconfig.debug linux-2.6.13-lockmeter/arch/ppc64/Kconfig.debug --- linux-2.6.13/arch/ppc64/Kconfig.debug 2005-08-28 16:41:01.000000000 -0700 +++ linux-2.6.13-lockmeter/arch/ppc64/Kconfig.debug 2005-10-11 07:40:28.000000000 -0700 @@ -19,6 +19,13 @@ config KPROBES for kernel debugging, non-intrusive instrumentation and testing. If in doubt, say "N". 
+config LOCKMETER + bool "Kernel lock metering" + depends on SMP && !PREEMPT + help + Say Y to enable kernel lock metering, which adds overhead to SMP locks, + but allows you to see various statistics using the lockstat command. + config DEBUG_STACK_USAGE bool "Stack utilization instrumentation" depends on DEBUG_KERNEL diff -Narup linux-2.6.13/arch/ppc64/lib/dec_and_lock.c linux-2.6.13-lockmeter/arch/ppc64/lib/dec_and_lock.c --- linux-2.6.13/arch/ppc64/lib/dec_and_lock.c 2005-08-28 16:41:01.000000000 -0700 +++ linux-2.6.13-lockmeter/arch/ppc64/lib/dec_and_lock.c 2005-10-10 13:02:47.000000000 -0700 @@ -28,7 +28,13 @@ */ #ifndef ATOMIC_DEC_AND_LOCK + +#ifndef CONFIG_LOCKMETER +int atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock) +#else int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock) +#endif /* CONFIG_LOCKMETER */ + { int counter; int newcount; @@ -51,5 +57,10 @@ int _atomic_dec_and_lock(atomic_t *atomi return 0; } +#ifndef CONFIG_LOCKMETER +EXPORT_SYMBOL(atomic_dec_and_lock); +#else EXPORT_SYMBOL(_atomic_dec_and_lock); +#endif /* CONFIG_LOCKMETER */ + #endif /* ATOMIC_DEC_AND_LOCK */ diff -Narup linux-2.6.13/include/asm-ppc64/lockmeter.h linux-2.6.13-lockmeter/include/asm-ppc64/lockmeter.h --- linux-2.6.13/include/asm-ppc64/lockmeter.h 1969-12-31 16:00:00.000000000 -0800 +++ linux-2.6.13-lockmeter/include/asm-ppc64/lockmeter.h 2005-10-11 08:48:48.000000000 -0700 @@ -0,0 +1,110 @@ +/* + * Copyright (C) 1999,2000 Silicon Graphics, Inc. + * + * Written by John Hawkes (hawkes at sgi.com) + * Based on klstat.h by Jack Steiner (steiner at sgi.com) + * + * Modified by Ray Bryant (raybry at us.ibm.com) + * Changes Copyright (C) 2000 IBM, Inc. + * Added save of index in spinlock_t to improve efficiency + * of "hold" time reporting for spinlocks. + * Added support for hold time statistics for read and write + * locks. + * Moved machine dependent code here from include/lockmeter.h. 
+ * + * Modified by Tony Garcia (garcia1 at us.ibm.com) + * Ported to Power PC 64 + */ + +#ifndef _PPC64_LOCKMETER_H +#define _PPC64_LOCKMETER_H + + +#include +#include +#include + +#include /* definitions for SPRN_TBRL + SPRN_TBRU, mftb() */ +extern unsigned long ppc_proc_freq; + +#define CPU_CYCLE_FREQUENCY ppc_proc_freq + +#define THIS_CPU_NUMBER smp_processor_id() + +/* + * macros to cache and retrieve an index value inside of a spin lock + * these macros assume that there are less than 65536 simultaneous + * (read mode) holders of a rwlock. Not normally a problem!! + * we also assume that the hash table has less than 65535 entries. + */ +/* + * instrumented spinlock structure -- never used to allocate storage + * only used in macros below to overlay a spinlock_t + */ +typedef struct inst_spinlock_s { + volatile unsigned int lock; + unsigned int index; +} inst_spinlock_t; + +#define PUT_INDEX(lock_ptr,indexv) ((inst_spinlock_t *)(lock_ptr))->index = indexv +#define GET_INDEX(lock_ptr) ((inst_spinlock_t *)(lock_ptr))->index + +/* + * macros to cache and retrieve an index value in a read/write lock + * as well as the cpu where a reader busy period started + * we use the 2nd word (the debug word) for this, so require the + * debug word to be present + */ +/* + * instrumented rwlock structure -- never used to allocate storage + * only used in macros below to overlay a rwlock_t + */ +typedef struct inst_rwlock_s { + volatile signed int lock; + unsigned int index; + unsigned int cpu; +} inst_rwlock_t; + +#define PUT_RWINDEX(rwlock_ptr,indexv) ((inst_rwlock_t *)(rwlock_ptr))->index = indexv +#define GET_RWINDEX(rwlock_ptr) ((inst_rwlock_t *)(rwlock_ptr))->index +#define PUT_RW_CPU(rwlock_ptr,cpuv) ((inst_rwlock_t *)(rwlock_ptr))->cpu = cpuv +#define GET_RW_CPU(rwlock_ptr) ((inst_rwlock_t *)(rwlock_ptr))->cpu + +/* + * return the number of readers for a rwlock_t + */ +#define RWLOCK_READERS(rwlock_ptr) rwlock_readers(rwlock_ptr) + +/* Return number of readers */ 
+extern inline int rwlock_readers(rwlock_t *rwlock_ptr) +{ + signed int tmp = rwlock_ptr->lock; + + if ( tmp > 0 ) + return tmp; + else + return 0; +} + +/* + * return true if rwlock is write locked + * (note that other lock attempts can cause the lock value to be negative) + */ +#define RWLOCK_IS_WRITE_LOCKED(rwlock_ptr) ((signed int)(rwlock_ptr)->lock < 0) +#define RWLOCK_IS_READ_LOCKED(rwlock_ptr) ((signed int)(rwlock_ptr)->lock > 0 ) + +/*Written by Carl L. to get the time base counters on ppc, + rplaces the Intel only call rtds*/ +static inline long get_cycles64 (void) +{ + unsigned long tb; + + /* read the upper and lower 32 bit Time base counter */ + tb = mfspr(SPRN_TBRU); + tb = (tb << 32) | mfspr(SPRN_TBRL); + + return(tb); +} + +#endif /* _PPC64_LOCKMETER_H */ diff -Narup linux-2.6.13/include/asm-ppc64/spinlock.h linux-2.6.13-lockmeter/include/asm-ppc64/spinlock.h --- linux-2.6.13/include/asm-ppc64/spinlock.h 2005-08-28 16:41:01.000000000 -0700 +++ linux-2.6.13-lockmeter/include/asm-ppc64/spinlock.h 2005-10-10 14:04:25.000000000 -0700 @@ -23,6 +23,9 @@ typedef struct { volatile unsigned int lock; +#ifdef CONFIG_LOCKMETER + unsigned int lockmeter_magic; +#endif /* CONFIG_LOCKMETER */ #ifdef CONFIG_PREEMPT unsigned int break_lock; #endif @@ -30,13 +33,20 @@ typedef struct { typedef struct { volatile signed int lock; +#ifdef CONFIG_LOCKMETER + unsigned int index; + unsigned int cpu; +#endif /* CONFIG_LOCKMETER */ #ifdef CONFIG_PREEMPT unsigned int break_lock; #endif } rwlock_t; -#ifdef __KERNEL__ -#define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 } +#ifdef CONFIG_LOCKMETER + #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 } +#else + #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 , 0} +#endif /* CONFIG_LOCKMETER */ #define spin_is_locked(x) ((x)->lock != 0) #define spin_lock_init(x) do { *(x) = SPIN_LOCK_UNLOCKED; } while(0) @@ -144,7 +154,7 @@ static void __inline__ _raw_spin_lock_fl * irq-safe write-lock, but readers can get non-irqsafe * read-locks. 
*/ -#define RW_LOCK_UNLOCKED (rwlock_t) { 0 } +#define RW_LOCK_UNLOCKED (rwlock_t) { 0 , 0 , 0 } #define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0) @@ -157,6 +167,44 @@ static __inline__ void _raw_write_unlock rw->lock = 0; } +#if defined(CONFIG_LOCKMETER) && defined(CONFIG_HAVE_DEC_LOCK) +extern void _metered_spin_lock (spinlock_t *lock, void *caller_pc); +extern void _metered_spin_unlock(spinlock_t *lock); + +/* + * Matches what is in arch/ppc64/lib/dec_and_lock.c, except this one is + * "static inline" so that the spin_lock(), if actually invoked, is charged + * against the real caller, not against the catch-all atomic_dec_and_lock + */ +static inline int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock) +{ + int counter; + int newcount; + + for (;;) { + counter = atomic_read(atomic); + newcount = counter - 1; + if (!newcount) + break; /* do it the slow way */ + + newcount = cmpxchg(&atomic->counter, counter, newcount); + if (newcount == counter) + return 0; + } + + preempt_disable(); + _metered_spin_lock(lock, __builtin_return_address(0)); + if (atomic_dec_and_test(atomic)) + return 1; + _metered_spin_unlock(lock); + preempt_enable(); + + return 0; +} + +#define ATOMIC_DEC_AND_LOCK +#endif /* CONFIG_LOCKMETER and CONFIG_HAVE_DEC_LOCK */ + /* * This returns the old value in the lock + 1, * so we got a read lock if the return value is > 0. @@ -256,5 +304,4 @@ static void __inline__ _raw_write_lock(r } } -#endif /* __KERNEL__ */ #endif /* __ASM_SPINLOCK_H */ From hien at us.ibm.com Wed Nov 2 11:14:42 2005 From: hien at us.ibm.com (Hien Nguyen) Date: Tue, 01 Nov 2005 16:14:42 -0800 Subject: [PATCH] exporting validate_sp Message-ID: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> This patch will export the validate_sp() function (part of dump_stack code). I am a developer for the SystemTap project: http://sourceware.org/systemtap/ The SystemTap runtime includes a function for capturing a stack trace as a string. 
For the ppc64 port, we need to have the validate_sp() function exported so it is accessible to our stack-trace function, which is part of a SystemTap-generated kernel module. This patch should apply to kernel 2.6.14-rc5-mm1. Thanks, Hien. Signed-off-by: Hien Nguyen --- linux-2.6.14-rc5.org/arch/ppc64/kernel/process.c 2005-10-19 23:23:05.000000000 -0700 +++ linux-2.6.14-rc5/arch/ppc64/kernel/process.c 2005-11-01 12:54:23.000000000 -0800 @@ -626,6 +626,7 @@ return 0; } +EXPORT_SYMBOL_GPL(validate_sp); unsigned long get_wchan(struct task_struct *p) { From david at gibson.dropbear.id.au Wed Nov 2 11:06:44 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 2 Nov 2005 11:06:44 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: <1130860444.21212.52.camel@hades.cambridge.redhat.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130860444.21212.52.camel@hades.cambridge.redhat.com> Message-ID: <20051102000644.GB8308@localhost.localdomain> On Tue, Nov 01, 2005 at 03:54:04PM +0000, David Woodhouse wrote: > On Mon, 2005-10-31 at 15:23 +1100, Paul Mackerras wrote: > > It is now possible to build kernels for powermac, pSeries, iSeries and > > maple with ARCH=powerpc, and for powermac, both 32-bit and 64-bit > > build and run. > > Hm. Not entirely in line with my experience. Can you share the configs > you used? I gather paulus doesn't believe in CONFIG_TAU. > Using http://david/woodhou.se/powerpc-merge-32.config it doesn't > actually boot on my powerbook. I'll try it on the Pegasos later or > tomorrow, where I have a serial console; it dies very early. 
> > Aside from disabling CONFIG_NVRAM because call_rtas() isn't implemented > anywhere, I also needed to do this to make that config build: > > --- linux-2.6.14/arch/powerpc/kernel/setup-common.c.orig 2005-11-01 10:14:32.000000000 +0000 > +++ linux-2.6.14/arch/powerpc/kernel/setup-common.c 2005-11-01 10:15:03.000000000 +0000 > @@ -203,11 +203,11 @@ static int show_cpuinfo(struct seq_file > #ifdef CONFIG_TAU_AVERAGE > /* more straightforward, but potentially misleading */ > seq_printf(m, "temperature \t: %u C (uncalibrated)\n", > - cpu_temp(i)); > + cpu_temp(cpu_id)); > #else > /* show the actual temp sensor range */ > u32 temp; > - temp = cpu_temp_both(i); > + temp = cpu_temp_both(cpu_id); > seq_printf(m, "temperature \t: %u-%u C (uncalibrated)\n", > temp & 0xff, temp >> 16); > #endif > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Wed Nov 2 11:44:26 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 2 Nov 2005 11:44:26 +1100 Subject: powerpc: Merge ipcbuf.h In-Reply-To: <200511012013.51443.arnd@arndb.de> References: <20051101055324.GA3551@localhost.localdomain> <200511012013.51443.arnd@arndb.de> Message-ID: <20051102004426.GC8308@localhost.localdomain> On Tue, Nov 01, 2005 at 08:13:50PM +0100, Arnd Bergmann wrote: > On Tuesday 01 November 2005 06:53, David Gibson wrote: > > +struct ipc64_perm > > +{ > > + __kernel_key_t key; > > + __kernel_uid_t uid; > > + __kernel_gid_t gid; > > + __kernel_uid_t cuid; > > + __kernel_gid_t cgid; > > + __kernel_mode_t mode; > > + unsigned int seq; > > + unsigned int __pad1; > > + u64 __unused1; > > + u64 __unused2; > > +}; > > ipc64_perm is a user visible structure, so you have to use > __u64 here instead of u64. 
Even that does not exist if > you build with 32 bit and __STRICT_ANSI__, so it might > be better yet to use four __u32 for the unused fields. Oops. I realised it was user visible, but forgot the wrinkle that the 'uXX' names can't be used there. Here's a patch to correct it. Paulus, please apply. Oops, when merging ipcbuf.h, I forgot that 'u64' can't be used in user-visible headers. This patch corrects the problem, replacing the unused fields with an array of four __u32s. Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/ipcbuf.h =================================================================== --- working-2.6.orig/include/asm-powerpc/ipcbuf.h 2005-11-02 10:41:06.000000000 +1100 +++ working-2.6/include/asm-powerpc/ipcbuf.h 2005-11-02 11:41:36.000000000 +1100 @@ -27,8 +27,7 @@ __kernel_mode_t mode; unsigned int seq; unsigned int __pad1; - u64 __unused1; - u64 __unused2; + __u32 __unused[4]; }; #endif /* _ASM_POWERPC_IPCBUF_H */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From paulus at samba.org Wed Nov 2 12:03:26 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 2 Nov 2005 12:03:26 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: <1130860444.21212.52.camel@hades.cambridge.redhat.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130860444.21212.52.camel@hades.cambridge.redhat.com> Message-ID: <17256.4190.184855.821331@cargo.ozlabs.ibm.com> David Woodhouse writes: > Hm. Not entirely in line with my experience. Can you share the configs > you used? Sure, attached (as a .tar.gz). For 32-bit pmac, you currently have to disable CONFIG_PREP and (I believe) the TAU options. For the 64-bit configs I basically just used the defconfigs in arch/ppc64/configs. > Using http://david/woodhou.se/powerpc-merge-32.config it doesn't > actually boot on my powerbook. 
I'll try it on the Pegasos later or > tomorrow, where I have a serial console; it dies very early. That's probably either the pci quirk that got added to do USB host controller handoff unconditionally on all platforms, and which touches the device without doing pci_enable_device or checking whether MMIO is enabled. A fix has gone into Linus' tree for that. There was also a bug added to the adbhid.c driver which would cause an oops when you pressed a key if you had an ADB keyboard (which powerbooks do). That's also fixed in Linus' tree. > Aside from disabling CONFIG_NVRAM because call_rtas() isn't implemented > anywhere, I also needed to do this to make that config build: > > --- linux-2.6.14/arch/powerpc/kernel/setup-common.c.orig 2005-11-01 10:14:32.000000000 +0000 > +++ linux-2.6.14/arch/powerpc/kernel/setup-common.c 2005-11-01 10:15:03.000000000 +0000 > @@ -203,11 +203,11 @@ static int show_cpuinfo(struct seq_file > #ifdef CONFIG_TAU_AVERAGE > /* more straightforward, but potentially misleading */ > seq_printf(m, "temperature \t: %u C (uncalibrated)\n", > - cpu_temp(i)); > + cpu_temp(cpu_id)); > #else > /* show the actual temp sensor range */ > u32 temp; > - temp = cpu_temp_both(i); > + temp = cpu_temp_both(cpu_id); > seq_printf(m, "temperature \t: %u-%u C (uncalibrated)\n", > temp & 0xff, temp >> 16); > #endif Thanks, I'll put that in. Paul. From paulus at samba.org Wed Nov 2 12:04:07 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 2 Nov 2005 12:04:07 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: <1130860444.21212.52.camel@hades.cambridge.redhat.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130860444.21212.52.camel@hades.cambridge.redhat.com> Message-ID: <17256.4231.124394.723713@cargo.ozlabs.ibm.com> David Woodhouse writes: > Hm. Not entirely in line with my experience. Can you share the configs > you used? Forgot to attach the configs on my previous reply. Paul. 
-------------- next part -------------- A non-text attachment was scrubbed... Name: configs.tar.gz Type: application/octet-stream Size: 19644 bytes Desc: config tarball Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051102/55f01850/attachment.obj From david at gibson.dropbear.id.au Wed Nov 2 13:58:22 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 2 Nov 2005 13:58:22 +1100 Subject: powerpc: Merge futex.h Message-ID: <20051102025822.GB10682@localhost.localdomain> This patch merges the ppc32 and ppc64 versions of futex.h, essentially by taking the ppc64 version as the powerpc version. The old ppc32 version did not implement the futex_atomic_op_inuser() callback (it always returned -ENOSYS), so FUTEX_WAKE_OP would not work on ppc32. In fact the ppc64 version of this function is almost suitable for ppc32 as well - the only change needed is to extend ppc_asm.h with a macro expanding to the right pseudo-op to store a pointer (either ".long" or ".llong"). Built and booted on pSeries. Built for 32-bit powermac. Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/futex.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/futex.h 2005-11-02 13:43:08.000000000 +1100 @@ -0,0 +1,84 @@ +#ifndef _ASM_POWERPC_FUTEX_H +#define _ASM_POWERPC_FUTEX_H + +#ifdef __KERNEL__ + +#include +#include +#include +#include +#include + +#define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \ + __asm__ __volatile ( \ + SYNC_ON_SMP \ +"1: lwarx %0,0,%2\n" \ + insn \ +"2: stwcx. 
%1,0,%2\n" \ + "bne- 1b\n" \ + "li %1,0\n" \ +"3: .section .fixup,\"ax\"\n" \ +"4: li %1,%3\n" \ + "b 3b\n" \ + ".previous\n" \ + ".section __ex_table,\"a\"\n" \ + ".align 3\n" \ + DATAL " 1b,4b,2b,4b\n" \ + ".previous" \ + : "=&r" (oldval), "=&r" (ret) \ + : "b" (uaddr), "i" (-EFAULT), "1" (oparg) \ + : "cr0", "memory") + +static inline int futex_atomic_op_inuser (int encoded_op, int __user *uaddr) +{ + int op = (encoded_op >> 28) & 7; + int cmp = (encoded_op >> 24) & 15; + int oparg = (encoded_op << 8) >> 20; + int cmparg = (encoded_op << 20) >> 20; + int oldval = 0, ret; + if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) + oparg = 1 << oparg; + + if (! access_ok (VERIFY_WRITE, uaddr, sizeof(int))) + return -EFAULT; + + inc_preempt_count(); + + switch (op) { + case FUTEX_OP_SET: + __futex_atomic_op("", ret, oldval, uaddr, oparg); + break; + case FUTEX_OP_ADD: + __futex_atomic_op("add %1,%0,%1\n", ret, oldval, uaddr, oparg); + break; + case FUTEX_OP_OR: + __futex_atomic_op("or %1,%0,%1\n", ret, oldval, uaddr, oparg); + break; + case FUTEX_OP_ANDN: + __futex_atomic_op("andc %1,%0,%1\n", ret, oldval, uaddr, oparg); + break; + case FUTEX_OP_XOR: + __futex_atomic_op("xor %1,%0,%1\n", ret, oldval, uaddr, oparg); + break; + default: + ret = -ENOSYS; + } + + dec_preempt_count(); + + if (!ret) { + switch (cmp) { + case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; + case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; + case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; + case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; + case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; + case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; + default: ret = -ENOSYS; + } + } + return ret; +} + +#endif /* __KERNEL__ */ +#endif /* _ASM_POWERPC_FUTEX_H */ Index: working-2.6/include/asm-ppc/futex.h =================================================================== --- working-2.6.orig/include/asm-ppc/futex.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 
00:00:00.000000000 +0000 @@ -1,53 +0,0 @@ -#ifndef _ASM_FUTEX_H -#define _ASM_FUTEX_H - -#ifdef __KERNEL__ - -#include -#include -#include - -static inline int -futex_atomic_op_inuser (int encoded_op, int __user *uaddr) -{ - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; - int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (! access_ok (VERIFY_WRITE, uaddr, sizeof(int))) - return -EFAULT; - - inc_preempt_count(); - - switch (op) { - case FUTEX_OP_SET: - case FUTEX_OP_ADD: - case FUTEX_OP_OR: - case FUTEX_OP_ANDN: - case FUTEX_OP_XOR: - default: - ret = -ENOSYS; - } - - dec_preempt_count(); - - if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } - return ret; -} - -#endif -#endif Index: working-2.6/include/asm-ppc64/futex.h =================================================================== --- working-2.6.orig/include/asm-ppc64/futex.h 2005-10-31 15:20:22.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,83 +0,0 @@ -#ifndef _ASM_FUTEX_H -#define _ASM_FUTEX_H - -#ifdef __KERNEL__ - -#include -#include -#include -#include - -#define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \ - __asm__ __volatile (SYNC_ON_SMP \ -"1: lwarx %0,0,%2\n" \ - insn \ -"2: stwcx. 
%1,0,%2\n\ - bne- 1b\n\ - li %1,0\n\ -3: .section .fixup,\"ax\"\n\ -4: li %1,%3\n\ - b 3b\n\ - .previous\n\ - .section __ex_table,\"a\"\n\ - .align 3\n\ - .llong 1b,4b,2b,4b\n\ - .previous" \ - : "=&r" (oldval), "=&r" (ret) \ - : "b" (uaddr), "i" (-EFAULT), "1" (oparg) \ - : "cr0", "memory") - -static inline int -futex_atomic_op_inuser (int encoded_op, int __user *uaddr) -{ - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; - int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (! access_ok (VERIFY_WRITE, uaddr, sizeof(int))) - return -EFAULT; - - inc_preempt_count(); - - switch (op) { - case FUTEX_OP_SET: - __futex_atomic_op("", ret, oldval, uaddr, oparg); - break; - case FUTEX_OP_ADD: - __futex_atomic_op("add %1,%0,%1\n", ret, oldval, uaddr, oparg); - break; - case FUTEX_OP_OR: - __futex_atomic_op("or %1,%0,%1\n", ret, oldval, uaddr, oparg); - break; - case FUTEX_OP_ANDN: - __futex_atomic_op("andc %1,%0,%1\n", ret, oldval, uaddr, oparg); - break; - case FUTEX_OP_XOR: - __futex_atomic_op("xor %1,%0,%1\n", ret, oldval, uaddr, oparg); - break; - default: - ret = -ENOSYS; - } - - dec_preempt_count(); - - if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } - return ret; -} - -#endif -#endif Index: working-2.6/include/asm-powerpc/ppc_asm.h =================================================================== --- working-2.6.orig/include/asm-powerpc/ppc_asm.h 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/include/asm-powerpc/ppc_asm.h 2005-11-02 13:48:08.000000000 +1100 @@ 
-506,6 +506,13 @@ #else #define __ASM_CONST(x) x##UL #define ASM_CONST(x) __ASM_CONST(x) + +#ifdef CONFIG_PPC64 +#define DATAL ".llong" +#else +#define DATAL ".long" +#endif + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_PPC_ASM_H */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Wed Nov 2 15:13:20 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 2 Nov 2005 15:13:20 +1100 Subject: powerpc: Move dart.h Message-ID: <20051102041320.GA15666@localhost.localdomain> asm-ppc64/dart.h is included in exactly one place - arch/powerpc/sysdev/u3_iommu.c. This patch, therefore, moves it into arch/powerpc/sysdev. While we're at it, update the #ifndef/#define protecting the include, and the filename in the comments of u3_iommu.c. Built and booted on pSeries and G5, built for ppc32 powermac. Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/sysdev/dart.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/sysdev/dart.h 2005-11-02 14:53:28.000000000 +1100 @@ -0,0 +1,59 @@ +/* + * Copyright (C) 2004 Olof Johansson , IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef _POWERPC_SYSDEV_DART_H +#define _POWERPC_SYSDEV_DART_H + + +/* physical base of DART registers */ +#define DART_BASE 0xf8033000UL + +/* Offset from base to control register */ +#define DARTCNTL 0 +/* Offset from base to exception register */ +#define DARTEXCP 0x10 +/* Offset from base to TLB tag registers */ +#define DARTTAG 0x1000 + + +/* Control Register fields */ + +/* base address of table (pfn) */ +#define DARTCNTL_BASE_MASK 0xfffff +#define DARTCNTL_BASE_SHIFT 12 + +#define DARTCNTL_FLUSHTLB 0x400 +#define DARTCNTL_ENABLE 0x200 + +/* size of table in pages */ +#define DARTCNTL_SIZE_MASK 0x1ff +#define DARTCNTL_SIZE_SHIFT 0 + + +/* DART table fields */ + +#define DARTMAP_VALID 0x80000000 +#define DARTMAP_RPNMASK 0x00ffffff + + +#define DART_PAGE_SHIFT 12 +#define DART_PAGE_SIZE (1 << DART_PAGE_SHIFT) +#define DART_PAGE_FACTOR (PAGE_SHIFT - DART_PAGE_SHIFT) + + +#endif /* _POWERPC_SYSDEV_DART_H */ Index: working-2.6/include/asm-ppc64/dart.h =================================================================== --- working-2.6.orig/include/asm-ppc64/dart.h 2005-10-31 15:20:22.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,59 +0,0 @@ -/* - * Copyright (C) 2004 Olof Johansson , IBM Corporation - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#ifndef _ASM_DART_H -#define _ASM_DART_H - - -/* physical base of DART registers */ -#define DART_BASE 0xf8033000UL - -/* Offset from base to control register */ -#define DARTCNTL 0 -/* Offset from base to exception register */ -#define DARTEXCP 0x10 -/* Offset from base to TLB tag registers */ -#define DARTTAG 0x1000 - - -/* Control Register fields */ - -/* base address of table (pfn) */ -#define DARTCNTL_BASE_MASK 0xfffff -#define DARTCNTL_BASE_SHIFT 12 - -#define DARTCNTL_FLUSHTLB 0x400 -#define DARTCNTL_ENABLE 0x200 - -/* size of table in pages */ -#define DARTCNTL_SIZE_MASK 0x1ff -#define DARTCNTL_SIZE_SHIFT 0 - - -/* DART table fields */ - -#define DARTMAP_VALID 0x80000000 -#define DARTMAP_RPNMASK 0x00ffffff - - -#define DART_PAGE_SHIFT 12 -#define DART_PAGE_SIZE (1 << DART_PAGE_SHIFT) -#define DART_PAGE_FACTOR (PAGE_SHIFT - DART_PAGE_SHIFT) - - -#endif Index: working-2.6/arch/powerpc/sysdev/u3_iommu.c =================================================================== --- working-2.6.orig/arch/powerpc/sysdev/u3_iommu.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/sysdev/u3_iommu.c 2005-11-02 14:54:16.000000000 +1100 @@ -1,5 +1,5 @@ /* - * arch/ppc64/kernel/u3_iommu.c + * arch/powerpc/sysdev/u3_iommu.c * * Copyright (C) 2004 Olof Johansson , IBM Corporation * @@ -44,9 +44,10 @@ #include #include #include -#include #include +#include "dart.h" + extern int iommu_force_on; /* Physical base address and size of the DART table */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/people/dgibson From paulus at samba.org Wed Nov 2 15:43:44 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 2 Nov 2005 15:43:44 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: <1130861207.21212.66.camel@hades.cambridge.redhat.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130861207.21212.66.camel@hades.cambridge.redhat.com> Message-ID: <17256.17408.247708.622755@cargo.ozlabs.ibm.com> David Woodhouse writes: > The ppc64 build (http://david.woodhou.se/powerpc-merge-64.config) fares > worse than ppc32 for me -- it doesn't even build. Those errors were all due to getting powerbook sleep code included because you have CONFIG_PM=y. I have changed things so that that code doesn't get included on a 64-bit build (at least until BenH gets sleep going on the G5 :). I also pulled in Linus' tree, and now drivers/char/tlclk.c fails to build for some reason. I claim that's not my fault, however. :) Paul. From michael at ellerman.id.au Wed Nov 2 18:23:33 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Wed, 2 Nov 2005 18:23:33 +1100 (EST) Subject: [PATCH] powerpc: Fix random memory corruption in merged elf.h Message-ID: <20051102072333.B91D568664@ozlabs.org> The merged version of ELF_CORE_COPY_REGS is basically the PPC64 version, with a memset that came from PPC and a few types abstracted out into #defines. But it's not _quite_ right. The first problem is we calculate the number of registers with: nregs = sizeof(struct pt_regs) / sizeof(ELF_GREG_TYPE) For a 32-bit process on a 64-bit kernel that's bogus because the registers are 64 bits, but ELF_GREG_TYPE is u32, so nregs == 88 which is wrong. The other problem is the memset, which assumes a struct pt_regs is smaller than a struct elf_regs. For a 32-bit process on a 64-bit kernel that's false. 
The fix is to calculate the number of regs using sizeof(unsigned long), which should always be right, and just memset the whole damn thing _before_ copying the registers in. Signed-off-by: Michael Ellerman --- include/asm-powerpc/elf.h | 22 +++++++++++++--------- 1 files changed, 13 insertions(+), 9 deletions(-) Index: kexec/include/asm-powerpc/elf.h =================================================================== --- kexec.orig/include/asm-powerpc/elf.h +++ kexec/include/asm-powerpc/elf.h @@ -178,18 +178,22 @@ typedef elf_vrreg_t elf_vrregset_t32[ELF static inline void ppc_elf_core_copy_regs(elf_gregset_t elf_regs, struct pt_regs *regs) { - int i; - int gprs = sizeof(struct pt_regs)/sizeof(ELF_GREG_TYPE); + int i, nregs; - if (gprs > ELF_NGREG) - gprs = ELF_NGREG; + memset((void *)elf_regs, 0, sizeof(elf_gregset_t)); - for (i=0; i < gprs; i++) + /* Our registers are always unsigned longs, whether we're a 32 bit + * process or 64 bit, on either a 64 bit or 32 bit kernel. + * Don't use ELF_GREG_TYPE here. */ + nregs = sizeof(struct pt_regs) / sizeof(unsigned long); + if (nregs > ELF_NGREG) + nregs = ELF_NGREG; + + for (i = 0; i < nregs; i++) { + /* This will correctly truncate 64 bit registers to 32 bits + * for a 32 bit process on a 64 bit kernel. 
*/ elf_regs[i] = (elf_greg_t)((ELF_GREG_TYPE *)regs)[i]; - - memset((char *)(elf_regs) + sizeof(struct pt_regs), 0, \ - sizeof(elf_gregset_t) - sizeof(struct pt_regs)); - + } } #define ELF_CORE_COPY_REGS(gregs, regs) ppc_elf_core_copy_regs(gregs, regs); From benh at kernel.crashing.org Wed Nov 2 18:23:18 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 02 Nov 2005 18:23:18 +1100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1130915220.20136.14.camel@gaston> References: <1130915220.20136.14.camel@gaston> Message-ID: <1130916198.20136.17.camel@gaston> On Wed, 2005-11-02 at 18:07 +1100, Benjamin Herrenschmidt wrote: > It took a while, but finally, here is the 64K pages support patch for > ppc64. This patch adds a new CONFIG_PPC_64K_PAGES which, when enabled, > changes the kernel base page size to 64K. The resulting kernel still > boots on any hardware. On current machines with 4K pages support only, > the kernel will maintain 16 "subpages" for each 64K page > transparently. > > Note that while real 64K capable HW has been tested, the current patch > will not enable it yet as such hardware is not released yet, and I'm > still verifying with the firmware architects the proper way to get the > information from the newer hypervisors. > > Signed-off-by: Benjamin Herrenschmidt Oh, and since the mailing lists are probably filtering this out due to the patch size, here's a URL where you can find it too: http://gate.crashing.org/~benh/ppc64-64k-pages.diff Ben. 
From dwmw2 at infradead.org Wed Nov 2 18:35:12 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Wed, 02 Nov 2005 07:35:12 +0000 Subject: please pull the powerpc-merge.git tree In-Reply-To: <17256.17408.247708.622755@cargo.ozlabs.ibm.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130861207.21212.66.camel@hades.cambridge.redhat.com> <17256.17408.247708.622755@cargo.ozlabs.ibm.com> Message-ID: <1130916912.10031.143.camel@baythorne.infradead.org> On Wed, 2005-11-02 at 15:43 +1100, Paul Mackerras wrote: > I also pulled in Linus' tree, and now drivers/char/tlclk.c fails to > build for some reason. I claim that's not my fault, however. :) It just needs included. That reminds me -- I needed that in platforms/chrp/pegasos_eth.c too. diff --git a/arch/powerpc/platforms/chrp/pegasos_eth.c b/arch/powerpc/platforms/chrp/pegasos_eth.c --- a/arch/powerpc/platforms/chrp/pegasos_eth.c +++ b/arch/powerpc/platforms/chrp/pegasos_eth.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #define PEGASOS2_MARVELL_REGBASE (0xf1000000) diff --git a/drivers/char/tlclk.c b/drivers/char/tlclk.c --- a/drivers/char/tlclk.c +++ b/drivers/char/tlclk.c @@ -43,6 +43,7 @@ #include #include #include +#include #include /* inb/outb */ #include -- dwmw2 From dwmw2 at infradead.org Wed Nov 2 20:06:19 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Wed, 02 Nov 2005 09:06:19 +0000 Subject: please pull the powerpc-merge.git tree In-Reply-To: <17256.4190.184855.821331@cargo.ozlabs.ibm.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130860444.21212.52.camel@hades.cambridge.redhat.com> <17256.4190.184855.821331@cargo.ozlabs.ibm.com> Message-ID: <1130922379.10031.154.camel@baythorne.infradead.org> On Wed, 2005-11-02 at 12:03 +1100, Paul Mackerras wrote: > That's probably either the pci quirk that got added to do USB host > controller handoff unconditionally on all platforms, and which touches > the device without doing pci_enable_device or 
checking whether MMIO is > enabled. A fix has gone into Linus' tree for that. > > There was also a bug added to the adbhid.c driver which would cause an > oops when you pressed a key if you had an ADB keyboard (which > powerbooks do). That's also fixed in Linus' tree. It was neither of those -- after a few warnings about sleeping in inappropriate contexts it just seems to stop. The Pegasos is a little more informative -- lots of 'hda: lost interrupt' on that. Keyboard seems to work though, and Bogomips calculation -- so maybe it's just PCI interrupts which are missing. I'll poke at it further. I'll also try again on the powerbook today and see if I can get anything more useful out of it. -- dwmw2 From vst at vlnb.net Wed Nov 2 17:51:41 2005 From: vst at vlnb.net (Vladislav Bolkhovitin) Date: Wed, 02 Nov 2005 09:51:41 +0300 Subject: [PATCH 0/3] ibmvscsis scsi target In-Reply-To: <435EE61C.8020404@torque.net> References: <20051017143644.GA9992@cs.umn.edu> <435EE61C.8020404@torque.net> Message-ID: <436861FD.6080300@vlnb.net> Douglas Gilbert wrote: > Dave Boutcher wrote: > >>James, >> >>Here's the ibmvscsis SCSI target submitted for inclusion in 2.4.15. >>This driver meets a couple of akpm's criteria for worthiness, in that >>it's actually been shipping for a while in a distro kernel, and (given >>the posts when I broke compatibility) is being used. >> >>This version is basically the same as the recent RFC version I sent >>out, with a few bug fixes. It addresses a comment from Anton about >>using gratuitously small max_sectors limits, and has a few other >>miscellaneous fixes. >> >>The only other significant comment generated by the RFC was from >>Christoph, and requested that this work be combined with the sgtg work >>that Mike Christie and Tomonori Fujita are working on. I definitely >>will start contributing to that work, and will convert this driver to >>their framework when it becomes complete. 
I would rather not keep >>this driver out of mainline for the amount of time that may take. > > > Dave, > While I'm partial to things that start with "sg...", I > had problems finding that project until I tried "stgt". Doug, Dave, Have you seen SCST (SCSI target mid-layer for Linux) on http://scst.sourceforge.net? It's much more advanced than stgt, and much further along. Vlad From schwab at suse.de Thu Nov 3 00:12:01 2005 From: schwab at suse.de (Andreas Schwab) Date: Wed, 02 Nov 2005 14:12:01 +0100 Subject: powerpc: Merge ipcbuf.h In-Reply-To: <20051102004426.GC8308@localhost.localdomain> (David Gibson's message of "Wed, 2 Nov 2005 11:44:26 +1100") References: <20051101055324.GA3551@localhost.localdomain> <200511012013.51443.arnd@arndb.de> <20051102004426.GC8308@localhost.localdomain> Message-ID: David Gibson writes: > Oops, when merging ipcbuf.h, I forgot that 'u64' can't be used in > user-visible headers. This patch corrects the problem, replacing the > unused fields with an array of four __u32s. > > Signed-off-by: David Gibson > > Index: working-2.6/include/asm-powerpc/ipcbuf.h > =================================================================== > --- working-2.6.orig/include/asm-powerpc/ipcbuf.h 2005-11-02 10:41:06.000000000 +1100 > +++ working-2.6/include/asm-powerpc/ipcbuf.h 2005-11-02 11:41:36.000000000 +1100 > @@ -27,8 +27,7 @@ > __kernel_mode_t mode; > unsigned int seq; > unsigned int __pad1; > - u64 __unused1; > - u64 __unused2; > + __u32 __unused[4]; I think you are changing the alignment of the structure. A u64 has bigger alignment than a u32[2]. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
From johnrose at austin.ibm.com Thu Nov 3 03:29:55 2005 From: johnrose at austin.ibm.com (John Rose) Date: Wed, 02 Nov 2005 10:29:55 -0600 Subject: [PATCH] fix add notifier crashes Message-ID: <1130948995.32348.35.camel@sinatra.austin.ibm.com> Hi Paul- The extraction of PCI stuff from struct device_node left some false assumptions in notifier code. As a result, dynamic add crashes when non-PCI nodes are added. This patch fixes these assumptions. Thanks- John Signed-off-by: John Rose diff -puN arch/ppc64/kernel/pci_dn.c~add_crash_fix arch/ppc64/kernel/pci_dn.c --- 2_6_linus_2/arch/ppc64/kernel/pci_dn.c~add_crash_fix 2005-10-31 10:51:19.000000000 -0600 +++ 2_6_linus_2-johnrose/arch/ppc64/kernel/pci_dn.c 2005-10-31 10:56:47.000000000 -0600 @@ -181,13 +181,14 @@ EXPORT_SYMBOL(fetch_dev_dn); static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node) { struct device_node *np = node; - struct pci_dn *pci; + struct pci_dn *pci = NULL; int err = NOTIFY_OK; switch (action) { case PSERIES_RECONFIG_ADD: pci = np->parent->data; - update_dn_pci_info(np, pci->phb); + if (pci) + update_dn_pci_info(np, pci->phb); break; default: err = NOTIFY_DONE; diff -puN arch/powerpc/platforms/pseries/iommu.c~add_crash_fix arch/powerpc/platforms/pseries/iommu.c --- 2_6_linus_2/arch/powerpc/platforms/pseries/iommu.c~add_crash_fix 2005-10-31 15:19:14.000000000 -0600 +++ 2_6_linus_2-johnrose/arch/powerpc/platforms/pseries/iommu.c 2005-10-31 15:20:44.000000000 -0600 @@ -498,7 +498,7 @@ static int iommu_reconfig_notifier(struc switch (action) { case PSERIES_RECONFIG_REMOVE: - if (pci->iommu_table && + if (pci && pci->iommu_table && get_property(np, "ibm,dma-window", NULL)) iommu_free_table(np); break; _ From johnrose at austin.ibm.com Thu Nov 3 03:40:06 2005 From: johnrose at austin.ibm.com (John Rose) Date: Wed, 02 Nov 2005 10:40:06 -0600 Subject: dlpar problem on sles9/openpower In-Reply-To: References: Message-ID: 
<1130949606.32348.45.camel@sinatra.austin.ibm.com> Hi Ingvar- > On the lpar "mgmt", running sles9 sp2 kernel 2.6.5-7.193-pseries64, we > have the magic rpms from IBM installed: > > evlog-drv-tmpl-0.8-1 > diagela-1.3.0.0-6 > lsvpd-0.12.7-1 > ppc64-utils-2.5-2 > librtas-1.2-1 I assume that you have the rpa-dlpar package as well, since you said DLPAR worked at an earlier point. :) I would check two things. First, check the dmesg for any blurbs around the time of the failure. Second, use "rpttr /var/ct/IW/log/mc/IBM.DRM/trace". The output from this includes much gibberish, but the translated hex strings can include the inputs to the "drmgr" command and any error messages. Check near the bottom of the file. Unfortunately, the HMC annoyingly hides error messages for DLPAR of virtual adapters, so we have to dig here. Good luck- John From dwmw2 at infradead.org Thu Nov 3 03:54:46 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Wed, 02 Nov 2005 16:54:46 +0000 Subject: please pull the powerpc-merge.git tree In-Reply-To: <17256.17408.247708.622755@cargo.ozlabs.ibm.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130861207.21212.66.camel@hades.cambridge.redhat.com> <17256.17408.247708.622755@cargo.ozlabs.ibm.com> Message-ID: <1130950487.21212.89.camel@hades.cambridge.redhat.com> On Wed, 2005-11-02 at 15:43 +1100, Paul Mackerras wrote: > Those errors were all due to getting powerbook sleep code included > because you have CONFIG_PM=y. I have changed things so that that code > doesn't get included on a 64-bit build (at least until BenH gets sleep > going on the G5 :). OK, now the Fedora rawhide kernel builds for ppc64 with arch/powerpc and runs on both my POWER5 and G5 test boxes. I need this if I want nvram support on the G5 though. Should we be using CONFIG_GENERIC_NVRAM on ppc64, and actually allowing the nvram support to be optional? 
--- a/arch/powerpc/platforms/powermac/setup.c +++ b/arch/powerpc/platforms/powermac/setup.c @@ -351,7 +350,7 @@ void __init pmac_setup_arch(void) find_via_pmu(); smu_init(); -#ifdef CONFIG_NVRAM +#if defined(CONFIG_NVRAM) || defined(CONFIG_PPC64) pmac_nvram_init(); #endif -- dwmw2 From hch at lst.de Thu Nov 3 04:40:37 2005 From: hch at lst.de (Christoph Hellwig) Date: Wed, 2 Nov 2005 18:40:37 +0100 Subject: [PATCH] exporting validate_sp In-Reply-To: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> References: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> Message-ID: <20051102174037.GA23650@lst.de> On Tue, Nov 01, 2005 at 04:14:42PM -0800, Hien Nguyen wrote: > This patch will export the validate_sp() function (part of dump_stack > code). > > I am developers for the systemtap project: > http://sourceware.org/systemtap/ > > The SystemTap runtime includes a function for capturing a stack trace as > a string. For the ppc64 port, we need to have the validate_sp() > function exported so it is accessible to our stack-trace function, which > is part of a SystemTap-generated kernel module. > > This patch should apply to kernel 2.6.14-rc5-mm1. NACK. this is not something that should be exported. especiall not for some odd crap that hopefully never will get merged. From hien at us.ibm.com Thu Nov 3 04:55:22 2005 From: hien at us.ibm.com (Hien Nguyen) Date: Wed, 02 Nov 2005 09:55:22 -0800 Subject: [PATCH] exporting validate_sp In-Reply-To: <20051102174037.GA23650@lst.de> References: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> <20051102174037.GA23650@lst.de> Message-ID: <4368FD8A.3090000@us.ibm.com> Christoph Hellwig wrote: > especiall not for >some odd crap that hopefully never will get merged. > > > > I disagree, systemtap is not some odd crap (Redhat, IBM, Intel, Hitachi actively work on this project for a while). And systemtap itself does not try to merge any code to the main kernel. 
From david at gibson.dropbear.id.au Thu Nov 3 10:13:58 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 3 Nov 2005 10:13:58 +1100 Subject: powerpc: Merge ipcbuf.h In-Reply-To: References: <20051101055324.GA3551@localhost.localdomain> <200511012013.51443.arnd@arndb.de> <20051102004426.GC8308@localhost.localdomain> Message-ID: <20051102231358.GA24772@localhost.localdomain> On Wed, Nov 02, 2005 at 02:12:01PM +0100, Andreas Schwab wrote: > David Gibson writes: > > > Oops, when merging ipcbuf.h, I forgot that 'u64' can't be used in > > user-visible headers. This patch corrects the problem, replacing the > > unused fields with an array of four __u32s. > > > > Signed-off-by: David Gibson > > > > Index: working-2.6/include/asm-powerpc/ipcbuf.h > > =================================================================== > > --- working-2.6.orig/include/asm-powerpc/ipcbuf.h 2005-11-02 10:41:06.000000000 +1100 > > +++ working-2.6/include/asm-powerpc/ipcbuf.h 2005-11-02 11:41:36.000000000 +1100 > > @@ -27,8 +27,7 @@ > > __kernel_mode_t mode; > > unsigned int seq; > > unsigned int __pad1; > > - u64 __unused1; > > - u64 __unused2; > > + __u32 __unused[4]; > > I think you are changing the alignment of the structure. A u64 has bigger > alignment than a u32[2]. Bother, so it does. Paulus, please apply. powerpc: Keep fixing merged ipcbuf.h Oops, replacing the two u64s in struct ipc64_perm with __u32s changed the alignment of that structure, which could mess up userspace. Revert to using two unsigned long longs (which is what ppc32 had originally). ppc64 originally had two unsigned longs, but long long is the same size on 64 bit, so this should be ok there too.
Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/ipcbuf.h =================================================================== --- working-2.6.orig/include/asm-powerpc/ipcbuf.h 2005-11-02 15:47:11.000000000 +1100 +++ working-2.6/include/asm-powerpc/ipcbuf.h 2005-11-03 10:10:58.000000000 +1100 @@ -27,7 +27,8 @@ __kernel_mode_t mode; unsigned int seq; unsigned int __pad1; - __u32 __unused[4]; + unsigned long long __unused1; + unsigned long long __unused2; }; #endif /* _ASM_POWERPC_IPCBUF_H */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From hch at lst.de Thu Nov 3 10:30:11 2005 From: hch at lst.de (Christoph Hellwig) Date: Thu, 3 Nov 2005 00:30:11 +0100 Subject: [PATCH] exporting validate_sp In-Reply-To: <4368FD8A.3090000@us.ibm.com> References: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> <20051102174037.GA23650@lst.de> <4368FD8A.3090000@us.ibm.com> Message-ID: <20051102233011.GA29200@lst.de> On Wed, Nov 02, 2005 at 09:55:22AM -0800, Hien Nguyen wrote: > Christoph Hellwig wrote: > > > especiall not for > >some odd crap that hopefully never will get merged. > > > > > > > > > I disagree, systemtap is not some odd crap (Redhat, IBM, Intel, Hitachi > actively work on this project for a while). all these companies are known for producing lots of crap. > And systemtap itself does not try to merge any code to the main kernel. and we're never adding exports for out of tree code. you're out of luck. 
From paulus at samba.org Thu Nov 3 14:16:29 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 3 Nov 2005 14:16:29 +1100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1130916198.20136.17.camel@gaston> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> Message-ID: <17257.33037.210237.986072@cargo.ozlabs.ibm.com> Benjamin Herrenschmidt writes: > It took a while, but finally, here is the 64K pages support patch for > ppc64. This patch adds a new CONFIG_PPC_64K_PAGES which, when enabled, > changes the kernel base page size to 64K. The resulting kernel still > boots on any hardware. On current machines with 4K pages support only, > the kernel will maintain 16 "subpages" for each 64K page > transparently. > > Note that while real 64K capable HW has been tested, the current patch > will not enable it yet as such hardware is not released yet, and I'm > still verifying with the firmware architects the proper to get the > information from the newer hypervisors. > > Signed-off-by: Benjamin Herrenschmidt Acked-by: Paul Mackerras From david at gibson.dropbear.id.au Thu Nov 3 16:26:34 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 3 Nov 2005 16:26:34 +1100 Subject: ppc64: Fix bug in SLB miss handler for hugepages In-Reply-To: <17257.33037.210237.986072@cargo.ozlabs.ibm.com> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <17257.33037.210237.986072@cargo.ozlabs.ibm.com> Message-ID: <20051103052634.GD24772@localhost.localdomain> On Thu, Nov 03, 2005 at 02:16:29PM +1100, Paul Mackerras wrote: > Benjamin Herrenschmidt writes: > > > It took a while, but finally, here is the 64K pages support patch for > > ppc64. This patch adds a new CONFIG_PPC_64K_PAGES which, when enabled, > > changes the kernel base page size to 64K. The resulting kernel still > > boots on any hardware. 
On current machines with 4K pages support only, > > the kernel will maintain 16 "subpages" for each 64K page > > transparently. > > > > Note that while real 64K capable HW has been tested, the current patch > > will not enable it yet as such hardware is not released yet, and I'm > > still verifying with the firmware architects the proper to get the > > information from the newer hypervisors. > > > > Signed-off-by: Benjamin Herrenschmidt > > Acked-by: Paul Mackerras This patch, however, should be applied on top to fix some problems with hugepage (some pre-existing, another introduced by this patch). The patch fixes a bug in the SLB miss handler for hugepages on ppc64 introduced by the dynamic hugepage patch (commit id c594adad5653491813959277fb87a2fef54c4e05) due to a misunderstanding of the srd instruction's behaviour (mea culpa). The problem arises when a 64-bit process maps some hugepages in the low 4GB of the address space (unusual). In this case, as well as the 256M segment in question being marked for hugepages, other segments at 32G intervals will be incorrectly marked for hugepages. In the process, this patch tweaks the semantics of the hugepage bitmaps to be more sensible. Previously, an address below 4G was marked for hugepages if the appropriate segment bit in the "low areas" bitmask was set *or* if the low bit in the "high areas" bitmap was set (which would mark all addresses below 1TB for hugepage). With this patch, any given address is governed by a single bitmap. Addresses below 4GB are marked for hugepage if and only if their bit is set in the "low areas" bitmap (256M granularity). Addresses between 4GB and 1TB are marked for hugepage iff the low bit in the "high areas" bitmap is set. Higher addresses are marked for hugepage iff their bit in the "high areas" bitmap is set (1TB granularity). To avoid conflicts, this patch must be applied on top of BenH's pending patch for 64k base page size [0]. 
As such, this patch also addresses a hugepage problem introduced by that patch. That patch allows hugepages of 1MB in size on hardware which supports it, however, that won't work when using 4k pages (4 level pagetable), because in that case hugepage PTEs are stored at the PMD level, and each PMD entry maps 2MB. This patch simply disallows hugepages in that case (we can do something cleverer to re-enable them some other day). Built, booted, and a handful of hugepage related tests passed on POWER5 LPAR (both ARCH=powerpc and ARCH=ppc64). [0] http://gate.crashing.org/~benh/ppc64-64k-pages.diff Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/mm/slb_low.S =================================================================== --- working-2.6.orig/arch/powerpc/mm/slb_low.S 2005-11-03 14:52:16.000000000 +1100 +++ working-2.6/arch/powerpc/mm/slb_low.S 2005-11-03 14:55:56.000000000 +1100 @@ -80,12 +80,17 @@ BEGIN_FTR_SECTION b 1f END_FTR_SECTION_IFCLR(CPU_FTR_16M_PAGE) + cmpldi r10,16 + + lhz r9,PACALOWHTLBAREAS(r13) + mr r11,r10 + blt 5f + lhz r9,PACAHIGHHTLBAREAS(r13) srdi r11,r10,(HTLB_AREA_SHIFT-SID_SHIFT) - srd r9,r9,r11 - lhz r11,PACALOWHTLBAREAS(r13) - srd r11,r11,r10 - or. r9,r9,r11 + +5: srd r9,r9,r11 + andi. 
r9,r9,1 beq 1f _GLOBAL(slb_miss_user_load_huge) li r11,0 Index: working-2.6/arch/powerpc/mm/hash_utils_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/hash_utils_64.c 2005-11-03 14:52:16.000000000 +1100 +++ working-2.6/arch/powerpc/mm/hash_utils_64.c 2005-11-03 15:40:56.000000000 +1100 @@ -329,12 +329,14 @@ */ if (mmu_psize_defs[MMU_PAGE_16M].shift) mmu_huge_psize = MMU_PAGE_16M; + /* With 4k/4level pagetables, we can't (for now) cope with a + * huge page size < PMD_SIZE */ else if (mmu_psize_defs[MMU_PAGE_1M].shift) mmu_huge_psize = MMU_PAGE_1M; /* Calculate HPAGE_SHIFT and sanity check it */ - if (mmu_psize_defs[mmu_huge_psize].shift > 16 && - mmu_psize_defs[mmu_huge_psize].shift < 28) + if (mmu_psize_defs[mmu_huge_psize].shift > MIN_HUGEPTE_SHIFT && + mmu_psize_defs[mmu_huge_psize].shift < SID_SHIFT) HPAGE_SHIFT = mmu_psize_defs[mmu_huge_psize].shift; else HPAGE_SHIFT = 0; /* No huge pages dude ! */ Index: working-2.6/include/asm-ppc64/pgtable-4k.h =================================================================== --- working-2.6.orig/include/asm-ppc64/pgtable-4k.h 2005-11-03 14:52:16.000000000 +1100 +++ working-2.6/include/asm-ppc64/pgtable-4k.h 2005-11-03 15:38:40.000000000 +1100 @@ -23,6 +23,9 @@ #define PMD_SIZE (1UL << PMD_SHIFT) #define PMD_MASK (~(PMD_SIZE-1)) +/* With 4k base page size, hugepage PTEs go at the PMD level */ +#define MIN_HUGEPTE_SHIFT PMD_SHIFT + /* PUD_SHIFT determines what a third-level page table entry can map */ #define PUD_SHIFT (PMD_SHIFT + PMD_INDEX_SIZE) #define PUD_SIZE (1UL << PUD_SHIFT) Index: working-2.6/include/asm-ppc64/pgtable-64k.h =================================================================== --- working-2.6.orig/include/asm-ppc64/pgtable-64k.h 2005-11-03 14:52:16.000000000 +1100 +++ working-2.6/include/asm-ppc64/pgtable-64k.h 2005-11-03 15:39:07.000000000 +1100 @@ -14,6 +14,9 @@ #define PTRS_PER_PMD (1 << PMD_INDEX_SIZE) #define PTRS_PER_PGD (1 << 
PGD_INDEX_SIZE) +/* With 4k base page size, hugepage PTEs go at the PMD level */ +#define MIN_HUGEPTE_SHIFT PAGE_SHIFT + /* PMD_SHIFT determines what a second-level page table entry can map */ #define PMD_SHIFT (PAGE_SHIFT + PTE_INDEX_SIZE) #define PMD_SIZE (1UL << PMD_SHIFT) Index: working-2.6/arch/powerpc/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/hugetlbpage.c 2005-11-03 14:52:16.000000000 +1100 +++ working-2.6/arch/powerpc/mm/hugetlbpage.c 2005-11-03 15:56:34.000000000 +1100 @@ -212,6 +212,12 @@ BUG_ON(area >= NUM_HIGH_AREAS); + /* Hack, so that each addresses is controlled by exactly one + * of the high or low area bitmaps, the first high area starts + * at 4GB, not 0 */ + if (start == 0) + start = 0x100000000UL; + /* Check no VMAs are in the region */ vma = find_vma(mm, start); if (vma && (vma->vm_start < end)) -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From olof at lixom.net Fri Nov 4 06:49:27 2005 From: olof at lixom.net (Olof Johansson) Date: Thu, 3 Nov 2005 11:49:27 -0800 Subject: [PATCH] POWERPC/PPC64: Fix CONFIG_SMP=n build for ppc64 Message-ID: <20051103194927.GC8515@pb15.lixom.net> Hi, Below is against 2.6.14-git5: --- Two CONFIG_SMP=n build fixes due to missing includes. 
Signed-off-by: Olof Johansson Index: 2.6/arch/ppc64/kernel/sysfs.c =================================================================== --- 2.6.orig/arch/ppc64/kernel/sysfs.c 2005-11-03 10:33:42.000000000 -0800 +++ 2.6/arch/ppc64/kernel/sysfs.c 2005-11-03 10:33:51.000000000 -0800 @@ -20,6 +20,7 @@ #include #include #include +#include static DEFINE_PER_CPU(struct cpu, cpu_devices); Index: 2.6/arch/powerpc/kernel/time.c =================================================================== --- 2.6.orig/arch/powerpc/kernel/time.c 2005-11-03 10:45:43.000000000 -0800 +++ 2.6/arch/powerpc/kernel/time.c 2005-11-03 10:49:52.000000000 -0800 @@ -69,6 +69,7 @@ #include #include #endif +#include /* keep track of when we need to update the rtc */ time_t last_rtc_update; From tim.bird at am.sony.com Fri Nov 4 06:59:31 2005 From: tim.bird at am.sony.com (Tim Bird) Date: Thu, 03 Nov 2005 11:59:31 -0800 Subject: [PATCH] exporting validate_sp In-Reply-To: <20051102233011.GA29200@lst.de> References: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> <20051102174037.GA23650@lst.de> <4368FD8A.3090000@us.ibm.com> <20051102233011.GA29200@lst.de> Message-ID: <436A6C23.2050604@am.sony.com> Christoph Hellwig wrote: > On Wed, Nov 02, 2005 at 09:55:22AM -0800, Hien Nguyen wrote: > >>Christoph Hellwig wrote: >>>especiall not for >>>some odd crap that hopefully never will get merged. >> >>I disagree, systemtap is not some odd crap (Redhat, IBM, Intel, Hitachi >>actively work on this project for a while). > > all these companies are known for producing lots of crap. > >>And systemtap itself does not try to merge any code to the main kernel. > > and we're never adding exports for out of tree code. you're out of > luck. These lines must be directly from Christoph's "How to motivate people" management handbook. Don't worry Hein. Other people (though maybe quieter than Christoph) see value in the SystemTap work. I hope it will continue to be developed and improved. 
-- Tim ============================= Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Electronics ============================= From hien at us.ibm.com Fri Nov 4 07:42:28 2005 From: hien at us.ibm.com (Hien Nguyen) Date: Thu, 03 Nov 2005 12:42:28 -0800 Subject: [PATCH] exporting validate_sp In-Reply-To: <436A6C23.2050604@am.sony.com> References: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> <20051102174037.GA23650@lst.de> <4368FD8A.3090000@us.ibm.com> <20051102233011.GA29200@lst.de> <436A6C23.2050604@am.sony.com> Message-ID: <436A7634.20602@us.ibm.com> Tim Bird wrote: >These lines must be directly from Christoph's >"How to motivate people" management handbook. > >Don't worry Hein. Other people (though maybe quieter than >Christoph) see value in the SystemTap work. I hope it will >continue to be developed and improved. > -- Tim > >============================= >Tim Bird >Architecture Group Chair, CE Linux Forum >Senior Staff Engineer, Sony Electronics >============================= > > > > Thanks for the kind words. Yes, our intention is to make systemtap better, safer. Hien. From linas at austin.ibm.com Fri Nov 4 07:53:31 2005 From: linas at austin.ibm.com (linas) Date: Thu, 3 Nov 2005 14:53:31 -0600 Subject: [PATCH] fix add notifier crashes In-Reply-To: <1130948995.32348.35.camel@sinatra.austin.ibm.com> References: <1130948995.32348.35.camel@sinatra.austin.ibm.com> Message-ID: <20051103205331.GN19593@austin.ibm.com> On Wed, Nov 02, 2005 at 10:29:55AM -0600, John Rose was heard to remark: > Hi Paul- > > The extraction of PCI stuff from struct device_node left some false > assumptions in notifier code. As a result, dynamic add crashes when > non-PCI nodes are added. This patch fixes these assumptions. This is more or less the same as the patch I sent on 4 October. 
It was called "crash-on-pci-slot-add.patch" There's another closely related null ptr deref that is fixed in the patch "rpaphp-crashing.patch" that was the next one in that series. Anyway, other than that, it looks good to me. --linas From david at gibson.dropbear.id.au Fri Nov 4 11:16:53 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Fri, 4 Nov 2005 11:16:53 +1100 Subject: powerpc: Kill ppcdebug Message-ID: <20051104001653.GC29025@localhost.localdomain> The ancient ppcdebug/PPCDBG mechanism is now only used in two places. First, in the hash setup code, one of the bits allows the size of the hash table to be reduced by a factor of 8 - which would be better accomplished with a command line option for that purpose. The other was a bunch of bus walking related messages in the iSeries code, which would seem to be insufficient reason to keep the mechanism. This patch removes the last traces of this mechanism. Built and booted on iSeries and pSeries POWER5 LPAR (ARCH=powerpc). Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/kernel/signal_32.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/signal_32.c 2005-11-04 10:21:12.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/signal_32.c 2005-11-04 10:23:36.000000000 +1100 @@ -44,7 +44,6 @@ #include #ifdef CONFIG_PPC64 #include "ppc32.h" -#include #include #include #else Index: working-2.6/arch/powerpc/mm/init_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/init_64.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/mm/init_64.c 2005-11-04 10:23:20.000000000 +1100 @@ -57,7 +57,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/mm/pgtable_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/pgtable_64.c 2005-10-31 15:44:59.000000000 +1100 +++ 
working-2.6/arch/powerpc/mm/pgtable_64.c 2005-11-04 10:23:20.000000000 +1100 @@ -59,7 +59,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/platforms/iseries/smp.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/smp.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/smp.c 2005-11-04 10:23:20.000000000 +1100 @@ -40,7 +40,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/platforms/pseries/iommu.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/pseries/iommu.c 2005-11-04 10:21:12.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/pseries/iommu.c 2005-11-04 10:23:20.000000000 +1100 @@ -37,7 +37,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/platforms/pseries/lpar.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/pseries/lpar.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/pseries/lpar.c 2005-11-04 10:23:20.000000000 +1100 @@ -31,7 +31,6 @@ #include #include #include -#include #include #include #include @@ -39,6 +38,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) 
udbg_printf(fmt) Index: working-2.6/arch/powerpc/platforms/pseries/ras.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/pseries/ras.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/pseries/ras.c 2005-11-04 10:23:20.000000000 +1100 @@ -48,7 +48,7 @@ #include #include #include -#include +#include static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX]; static DEFINE_SPINLOCK(ras_log_buf_lock); Index: working-2.6/arch/ppc64/kernel/prom.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/prom.c 2005-10-31 15:44:59.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/prom.c 2005-11-04 10:23:20.000000000 +1100 @@ -46,7 +46,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/ppc64/kernel/prom_init.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/prom_init.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/prom_init.c 2005-11-04 10:23:20.000000000 +1100 @@ -44,7 +44,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/sysdev/u3_iommu.c =================================================================== --- working-2.6.orig/arch/powerpc/sysdev/u3_iommu.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/sysdev/u3_iommu.c 2005-11-04 10:23:20.000000000 +1100 @@ -37,7 +37,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/kernel/setup_64.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/setup_64.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/setup_64.c 2005-11-04 10:23:20.000000000 +1100 @@ -41,7 +41,6 @@ #include #include #include -#include #include #include #include @@ -60,6 +59,7 @@ #include #include #include 
+#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -352,12 +352,6 @@ DBG(" -> early_setup()\n"); /* - * Fill the default DBG level (do we want to keep - * that old mecanism around forever ?) - */ - ppcdbg_initialize(); - - /* * Do early initializations using the flattened device * tree, like retreiving the physical memory map or * calculating/retreiving the hash table size @@ -605,7 +599,6 @@ printk("-----------------------------------------------------\n"); printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); - printk("ppc64_debug_switch = 0x%lx\n", ppc64_debug_switch); printk("ppc64_interrupt_controller = 0x%ld\n", ppc64_interrupt_controller); printk("systemcfg = 0x%p\n", systemcfg); printk("systemcfg->platform = 0x%x\n", systemcfg->platform); Index: working-2.6/include/asm-ppc64/ppcdebug.h =================================================================== --- working-2.6.orig/include/asm-ppc64/ppcdebug.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,108 +0,0 @@ -#ifndef __PPCDEBUG_H -#define __PPCDEBUG_H -/******************************************************************** - * Author: Adam Litke, IBM Corp - * (c) 2001 - * - * This file contains definitions and macros for a runtime debugging - * system for ppc64 (This should also work on 32 bit with a few - * adjustments. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - * - ********************************************************************/ - -#include -#include -#include -#include - -#define PPCDBG_BITVAL(X) ((1UL)<<((unsigned long)(X))) - -/* Defined below are the bit positions of various debug flags in the - * ppc64_debug_switch variable. 
- * -- When adding new values, please enter them into trace names below -- - * - * Values 62 & 63 can be used to stress the hardware page table management - * code. They must be set statically, any attempt to change them dynamically - * would be a very bad idea. - */ -#define PPCDBG_MMINIT PPCDBG_BITVAL(0) -#define PPCDBG_MM PPCDBG_BITVAL(1) -#define PPCDBG_SYS32 PPCDBG_BITVAL(2) -#define PPCDBG_SYS32NI PPCDBG_BITVAL(3) -#define PPCDBG_SYS32X PPCDBG_BITVAL(4) -#define PPCDBG_SYS32M PPCDBG_BITVAL(5) -#define PPCDBG_SYS64 PPCDBG_BITVAL(6) -#define PPCDBG_SYS64NI PPCDBG_BITVAL(7) -#define PPCDBG_SYS64X PPCDBG_BITVAL(8) -#define PPCDBG_SIGNAL PPCDBG_BITVAL(9) -#define PPCDBG_SIGNALXMON PPCDBG_BITVAL(10) -#define PPCDBG_BINFMT32 PPCDBG_BITVAL(11) -#define PPCDBG_BINFMT64 PPCDBG_BITVAL(12) -#define PPCDBG_BINFMTXMON PPCDBG_BITVAL(13) -#define PPCDBG_BINFMT_32ADDR PPCDBG_BITVAL(14) -#define PPCDBG_ALIGNFIXUP PPCDBG_BITVAL(15) -#define PPCDBG_TCEINIT PPCDBG_BITVAL(16) -#define PPCDBG_TCE PPCDBG_BITVAL(17) -#define PPCDBG_PHBINIT PPCDBG_BITVAL(18) -#define PPCDBG_SMP PPCDBG_BITVAL(19) -#define PPCDBG_BOOT PPCDBG_BITVAL(20) -#define PPCDBG_BUSWALK PPCDBG_BITVAL(21) -#define PPCDBG_PROM PPCDBG_BITVAL(22) -#define PPCDBG_RTAS PPCDBG_BITVAL(23) -#define PPCDBG_HTABSTRESS PPCDBG_BITVAL(62) -#define PPCDBG_HTABSIZE PPCDBG_BITVAL(63) -#define PPCDBG_NONE (0UL) -#define PPCDBG_ALL (0xffffffffUL) - -/* The default initial value for the debug switch */ -#define PPC_DEBUG_DEFAULT 0 -/* #define PPC_DEBUG_DEFAULT PPCDBG_ALL */ - -#define PPCDBG_NUM_FLAGS 64 - -extern u64 ppc64_debug_switch; - -#ifdef WANT_PPCDBG_TAB -/* A table of debug switch names to allow name lookup in xmon - * (and whoever else wants it. 
- */ -char *trace_names[PPCDBG_NUM_FLAGS] = { - /* Known debug names */ - "mminit", "mm", - "syscall32", "syscall32_ni", "syscall32x", "syscall32m", - "syscall64", "syscall64_ni", "syscall64x", - "signal", "signal_xmon", - "binfmt32", "binfmt64", "binfmt_xmon", "binfmt_32addr", - "alignfixup", "tceinit", "tce", "phb_init", - "smp", "boot", "buswalk", "prom", - "rtas" -}; -#else -extern char *trace_names[64]; -#endif /* WANT_PPCDBG_TAB */ - -#ifdef CONFIG_PPCDBG -/* Macro to conditionally print debug based on debug_switch */ -#define PPCDBG(...) udbg_ppcdbg(__VA_ARGS__) - -/* Macro to conditionally call a debug routine based on debug_switch */ -#define PPCDBGCALL(FLAGS,FUNCTION) ifppcdebug(FLAGS) FUNCTION - -/* Macros to test for debug states */ -#define ifppcdebug(FLAGS) if (udbg_ifdebug(FLAGS)) -#define ppcdebugset(FLAGS) (udbg_ifdebug(FLAGS)) -#define PPCDBG_BINFMT (test_thread_flag(TIF_32BIT) ? PPCDBG_BINFMT32 : PPCDBG_BINFMT64) - -#else -#define PPCDBG(...) do {;} while (0) -#define PPCDBGCALL(FLAGS,FUNCTION) do {;} while (0) -#define ifppcdebug(...) if (0) -#define ppcdebugset(FLAGS) (0) -#endif /* CONFIG_PPCDBG */ - -#endif /*__PPCDEBUG_H */ Index: working-2.6/arch/ppc64/kernel/udbg.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/udbg.c 2005-10-25 11:59:53.000000000 +1000 +++ working-2.6/arch/ppc64/kernel/udbg.c 2005-11-04 10:23:20.000000000 +1100 @@ -10,12 +10,10 @@ */ #include -#define WANT_PPCDBG_TAB /* Only defined here */ #include #include #include #include -#include #include void (*udbg_putc)(unsigned char c); @@ -89,59 +87,6 @@ va_end(args); } -/* PPCDBG stuff */ - -u64 ppc64_debug_switch; - -/* Special print used by PPCDBG() macro */ -void udbg_ppcdbg(unsigned long debug_flags, const char *fmt, ...) 
-{ - unsigned long active_debugs = debug_flags & ppc64_debug_switch; - - if (active_debugs) { - va_list ap; - unsigned char buf[UDBG_BUFSIZE]; - unsigned long i, len = 0; - - for (i=0; i < PPCDBG_NUM_FLAGS; i++) { - if (((1U << i) & active_debugs) && - trace_names[i]) { - len += strlen(trace_names[i]); - udbg_puts(trace_names[i]); - break; - } - } - - snprintf(buf, UDBG_BUFSIZE, " [%s]: ", current->comm); - len += strlen(buf); - udbg_puts(buf); - - while (len < 18) { - udbg_puts(" "); - len++; - } - - va_start(ap, fmt); - vsnprintf(buf, UDBG_BUFSIZE, fmt, ap); - udbg_puts(buf); - va_end(ap); - } -} - -unsigned long udbg_ifdebug(unsigned long flags) -{ - return (flags & ppc64_debug_switch); -} - -/* - * Initialize the PPCDBG state. Called before relocation has been enabled. - */ -void __init ppcdbg_initialize(void) -{ - ppc64_debug_switch = PPC_DEBUG_DEFAULT; /* | PPCDBG_BUSWALK | */ - /* PPCDBG_PHBINIT | PPCDBG_MM | PPCDBG_MMINIT | PPCDBG_TCEINIT | PPCDBG_TCE */; -} - /* * Early boot console based on udbg */ Index: working-2.6/include/asm-ppc64/udbg.h =================================================================== --- working-2.6.orig/include/asm-ppc64/udbg.h 2005-10-31 15:20:22.000000000 +1100 +++ working-2.6/include/asm-ppc64/udbg.h 2005-11-04 10:23:20.000000000 +1100 @@ -23,9 +23,6 @@ extern void register_early_udbg_console(void); extern void udbg_printf(const char *fmt, ...); -extern void udbg_ppcdbg(unsigned long flags, const char *fmt, ...); -extern unsigned long udbg_ifdebug(unsigned long flags); -extern void __init ppcdbg_initialize(void); extern void udbg_init_uart(void __iomem *comport, unsigned int speed); Index: working-2.6/arch/powerpc/mm/hash_utils_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/hash_utils_64.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/mm/hash_utils_64.c 2005-11-04 10:23:20.000000000 +1100 @@ -32,7 +32,6 @@ #include #include -#include #include 
#include #include @@ -194,12 +193,6 @@ htab_size_bytes = get_hashtable_size(); pteg_count = htab_size_bytes >> 7; - /* For debug, make the HTAB 1/8 as big as it normally would be. */ - ifppcdebug(PPCDBG_HTABSIZE) { - pteg_count >>= 3; - htab_size_bytes = pteg_count << 7; - } - htab_hash_mask = pteg_count - 1; if (systemcfg->platform & PLATFORM_LPAR) { Index: working-2.6/arch/powerpc/platforms/iseries/irq.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/irq.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/irq.c 2005-11-04 10:23:20.000000000 +1100 @@ -35,7 +35,6 @@ #include #include -#include #include #include #include @@ -227,8 +226,6 @@ /* Unmask secondary INTA */ mask = 0x80000000; HvCallPci_unmaskInterrupts(bus, subBus, deviceId, mask); - PPCDBG(PPCDBG_BUSWALK, "iSeries_enable_IRQ 0x%02X.%02X.%02X 0x%04X\n", - bus, subBus, deviceId, irq); } /* This is called by iSeries_activate_IRQs */ @@ -310,8 +307,6 @@ /* Mask secondary INTA */ mask = 0x80000000; HvCallPci_maskInterrupts(bus, subBus, deviceId, mask); - PPCDBG(PPCDBG_BUSWALK, "iSeries_disable_IRQ 0x%02X.%02X.%02X 0x%04X\n", - bus, subBus, deviceId, irq); } /* Index: working-2.6/arch/powerpc/platforms/iseries/pci.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/pci.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/pci.c 2005-11-04 10:23:20.000000000 +1100 @@ -32,7 +32,6 @@ #include #include #include -#include #include #include @@ -207,10 +206,6 @@ struct device_node *node; struct pci_dn *pdn; - PPCDBG(PPCDBG_BUSWALK, - "-build_device_node 0x%02X.%02X.%02X Function: %02X\n", - Bus, SubBus, AgentId, Function); - node = kmalloc(sizeof(struct device_node), GFP_KERNEL); if (node == NULL) return NULL; @@ -243,8 +238,6 @@ struct pci_controller *phb; HvBusNumber bus; - PPCDBG(PPCDBG_BUSWALK, 
"find_and_init_phbs Entry\n"); - /* Check all possible buses. */ for (bus = 0; bus < 256; bus++) { int ret = HvCallXm_testBus(bus); @@ -261,9 +254,6 @@ phb->last_busno = bus; phb->ops = &iSeries_pci_ops; - PPCDBG(PPCDBG_BUSWALK, "PCI:Create iSeries pci_controller(%p), Bus: %04X\n", - phb, bus); - /* Find and connect the devices. */ scan_PHB_slots(phb); } @@ -285,11 +275,9 @@ */ void iSeries_pcibios_init(void) { - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Entry.\n"); iomm_table_initialize(); find_and_init_phbs(); io_page_mask = -1; - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Exit.\n"); } /* @@ -301,8 +289,6 @@ struct device_node *node; int DeviceCount = 0; - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_fixup Entry.\n"); - /* Fix up at the device node and pci_dev relationship */ mf_display_src(0xC9000100); @@ -316,9 +302,6 @@ ++DeviceCount; pdev->sysdata = (void *)node; PCI_DN(node)->pcidev = pdev; - PPCDBG(PPCDBG_BUSWALK, - "pdev 0x%p <==> DevNode 0x%p\n", - pdev, node); allocate_device_bars(pdev); iSeries_Device_Information(pdev, DeviceCount); iommu_devnode_init_iSeries(node); @@ -333,13 +316,10 @@ void pcibios_fixup_bus(struct pci_bus *PciBus) { - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_fixup_bus(0x%04X) Entry.\n", - PciBus->number); } void pcibios_fixup_resources(struct pci_dev *pdev) { - PPCDBG(PPCDBG_BUSWALK, "fixup_resources pdev %p\n", pdev); } /* @@ -401,9 +381,6 @@ printk("found device at bus %d idsel %d func %d (AgentId %x)\n", bus, IdSel, Function, AgentId); /* Connect EADs: 0x18.00.12 = 0x00 */ - PPCDBG(PPCDBG_BUSWALK, - "PCI:Connect EADs: 0x%02X.%02X.%02X\n", - bus, SubBus, AgentId); HvRc = HvCallPci_getBusUnitInfo(bus, SubBus, AgentId, iseries_hv_addr(BridgeInfo), sizeof(struct HvCallPci_BridgeInfo)); @@ -414,14 +391,6 @@ BridgeInfo->maxAgents, BridgeInfo->maxSubBusNumber, BridgeInfo->logicalSlotNumber); - PPCDBG(PPCDBG_BUSWALK, - "PCI: BridgeInfo, Type:0x%02X, SubBus:0x%02X, MaxAgents:0x%02X, MaxSubBus: 0x%02X, LSlot: 0x%02X\n", - 
BridgeInfo->busUnitInfo.deviceType, - BridgeInfo->subBusNumber, - BridgeInfo->maxAgents, - BridgeInfo->maxSubBusNumber, - BridgeInfo->logicalSlotNumber); - if (BridgeInfo->busUnitInfo.deviceType == HvCallPci_BridgeDevice) { /* Scan_Bridge_Slot...: 0x18.00.12 */ @@ -454,9 +423,6 @@ /* iSeries_allocate_IRQ.: 0x18.00.12(0xA3) */ Irq = iSeries_allocate_IRQ(Bus, 0, EADsIdSel); - PPCDBG(PPCDBG_BUSWALK, - "PCI:- allocate and assign IRQ 0x%02X.%02X.%02X = 0x%02X\n", - Bus, 0, EADsIdSel, Irq); /* * Connect all functions of any device found. @@ -482,9 +448,6 @@ printk("read vendor ID: %x\n", VendorId); /* FoundDevice: 0x18.28.10 = 0x12AE */ - PPCDBG(PPCDBG_BUSWALK, - "PCI:- FoundDevice: 0x%02X.%02X.%02X = 0x%04X, irq %d\n", - Bus, SubBus, AgentId, VendorId, Irq); HvRc = HvCallPci_configStore8(Bus, SubBus, AgentId, PCI_INTERRUPT_LINE, Irq); if (HvRc != 0) Index: working-2.6/arch/powerpc/platforms/iseries/setup.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/setup.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/setup.c 2005-11-04 10:23:20.000000000 +1100 @@ -71,8 +71,6 @@ #endif /* Function Prototypes */ -extern void ppcdbg_initialize(void); - static void build_iSeries_Memory_Map(void); static void iseries_shared_idle(void); static void iseries_dedicated_idle(void); @@ -309,8 +307,6 @@ ppc64_firmware_features = FW_FEATURE_ISERIES; - ppcdbg_initialize(); - ppc64_interrupt_controller = IC_ISERIES; #if defined(CONFIG_BLK_DEV_INITRD) Index: working-2.6/arch/ppc64/Kconfig.debug =================================================================== --- working-2.6.orig/arch/ppc64/Kconfig.debug 2005-10-25 11:59:53.000000000 +1000 +++ working-2.6/arch/ppc64/Kconfig.debug 2005-11-04 10:23:20.000000000 +1100 @@ -55,10 +55,6 @@ xmon is normally disabled unless booted with 'xmon=on'. Use 'xmon=off' to disable xmon init during runtime. 
-config PPCDBG - bool "Include PPCDBG realtime debugging" - depends on DEBUG_KERNEL - config IRQSTACKS bool "Use separate kernel stacks when processing interrupts" help Index: working-2.6/arch/powerpc/kernel/signal_64.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/signal_64.c 2005-11-04 10:21:12.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/signal_64.c 2005-11-04 10:23:47.000000000 +1100 @@ -33,7 +33,6 @@ #include #include #include -#include #include #include #include -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From linas at linas.org Fri Nov 4 10:59:18 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 17:59:18 -0600 Subject: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Message-ID: <20051103235918.GA25616@mail.gnucash.org> What follows is a long sequence of mostly small patches to implement PCI Error Recovery by adding notification callbacks to the PCI device driver structure, implementing the recovery in 5 device drivers (3 ethernet, two scsi drivers), and adding the actual error detection and recovery code to the ppc64/powerpc arch tree. 
Highlights:
-- Patches 1-14: Misc required ppc64/powerpc cleanup/bugfixes/restructuring
-- Patch 15: Overview documentation
-- Patch 16: changes to include/linux/pci.h
-- Patches 17-26: error detection and recovery for pSeries PCI bridge chips
-- Patches 27-32: recovery patches for ethernet, scsi device drivers
-- Patches 33-42: More misc ppc64-specific changes

Signed-off-by: Linas Vepstas

From michael at ellerman.id.au Fri Nov 4 11:34:26 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Fri, 4 Nov 2005 11:34:26 +1100
Subject: powerpc: Kill ppcdebug
In-Reply-To: <20051104001653.GC29025@localhost.localdomain>
References: <20051104001653.GC29025@localhost.localdomain>
Message-ID: <200511041134.30106.michael@ellerman.id.au>

On Fri, 4 Nov 2005 11:16, David Gibson wrote:
> The ancient ppcdebug/PPCDBG mechanism is now only used in two places.
> First, in the hash setup code, one of the bits allows the size of the
> hash table to be reduced by a factor of 8 - which would be better
> accomplished with a command line option for that purpose. The other
> was a bunch of bus walking related messages in the iSeries code, which
> would seem to be insufficient reason to keep the mechanism.
>
> This patch removes the last traces of this mechanism.

I agree it's pretty ugly, but I thought the concept was at least nice, i.e. runtime-enablable debugging. The current scheme of having to #define DEBUG in a gazillion different files is pretty painful. Oh well.

-- 
Michael Ellerman
IBM OzLabs

email: michael:ellerman.id.au
inmsg: mpe:jabber.org
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051104/204e9ca5/attachment.pgp From linas at austin.ibm.com Fri Nov 4 11:42:26 2005 From: linas at austin.ibm.com (linas) Date: Thu, 3 Nov 2005 18:42:26 -0600 Subject: [PATCH 1/42] ppc64: uniform usage of bus unit id interfaces In-Reply-To: <20051103235918.GA25616@mail.gnucash.org> References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004226.GQ19593@austin.ibm.com> 01-pci-dn-uniformization.patch This patch changes the rtas_pci interface to use the new struct pci_dn structure for two routines that work with pci device nodes. This patch also does some minor janitorial work: it uses some handy macros and cleans up some trailing whitespace in the affected file. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 11:59:11.879644789 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:01:21.403477910 -0600 @@ -71,10 +71,6 @@ * and sent out for processing. */ -/** Bus Unit ID macros; get low and hi 32-bits of the 64-bit BUID */ -#define BUID_HI(buid) ((buid) >> 32) -#define BUID_LO(buid) ((buid) & 0xffffffff) - /* EEH event workqueue setup. 
*/ static DEFINE_SPINLOCK(eeh_eventlist_lock); LIST_HEAD(eeh_eventlist); Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-10-31 11:59:11.880644649 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-10-31 12:01:21.404477769 -0600 @@ -26,6 +26,10 @@ extern struct pci_dev *ppc64_isabridge_dev; /* may be NULL if no ISA bus */ +/** Bus Unit ID macros; get low and hi 32-bits of the 64-bit BUID */ +#define BUID_HI(buid) ((buid) >> 32) +#define BUID_LO(buid) ((buid) & 0xffffffff) + /* PCI device_node operations */ struct device_node; typedef void *(*traverse_func)(struct device_node *me, void *data); Index: linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/rtas_pci.c 2005-10-31 11:59:11.879644789 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c 2005-10-31 12:01:21.407477349 -0600 @@ -5,19 +5,19 @@ * Copyright (C) 2003 Anton Blanchard , IBM * * RTAS specific routines for PCI. - * + * * Based on code from pci.c, chrp_pci.c and pSeries_pci.c * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. - * + * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
- * + * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA @@ -47,7 +47,7 @@ static int ibm_read_pci_config; static int ibm_write_pci_config; -static int config_access_valid(struct pci_dn *dn, int where) +static inline int config_access_valid(struct pci_dn *dn, int where) { if (where < 256) return 1; @@ -72,16 +72,14 @@ return 0; } -static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val) +static int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val) { int returnval = -1; unsigned long buid, addr; int ret; - struct pci_dn *pdn; - if (!dn || !dn->data) + if (!pdn) return PCIBIOS_DEVICE_NOT_FOUND; - pdn = dn->data; if (!config_access_valid(pdn, where)) return PCIBIOS_BAD_REGISTER_NUMBER; @@ -90,7 +88,7 @@ buid = pdn->phb->buid; if (buid) { ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval, - addr, buid >> 32, buid & 0xffffffff, size); + addr, BUID_HI(buid), BUID_LO(buid), size); } else { ret = rtas_call(read_pci_config, 2, 2, &returnval, addr, size); } @@ -100,7 +98,7 @@ return PCIBIOS_DEVICE_NOT_FOUND; if (returnval == EEH_IO_ERROR_VALUE(size) && - eeh_dn_check_failure (dn, NULL)) + eeh_dn_check_failure (pdn->node, NULL)) return PCIBIOS_DEVICE_NOT_FOUND; return PCIBIOS_SUCCESSFUL; @@ -118,23 +116,23 @@ busdn = bus->sysdata; /* must be a phb */ /* Search only direct children of the bus */ - for (dn = busdn->child; dn; dn = dn->sibling) - if (dn->data && PCI_DN(dn)->devfn == devfn + for (dn = busdn->child; dn; dn = dn->sibling) { + struct pci_dn *pdn = PCI_DN(dn); + if (pdn && pdn->devfn == devfn && of_device_available(dn)) - return rtas_read_config(dn, where, size, val); + return rtas_read_config(pdn, where, size, val); + } return PCIBIOS_DEVICE_NOT_FOUND; } -int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +int rtas_write_config(struct pci_dn 
*pdn, int where, int size, u32 val) { unsigned long buid, addr; int ret; - struct pci_dn *pdn; - if (!dn || !dn->data) + if (!pdn) return PCIBIOS_DEVICE_NOT_FOUND; - pdn = dn->data; if (!config_access_valid(pdn, where)) return PCIBIOS_BAD_REGISTER_NUMBER; @@ -142,7 +140,8 @@ (pdn->devfn << 8) | (where & 0xff); buid = pdn->phb->buid; if (buid) { - ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, buid >> 32, buid & 0xffffffff, size, (ulong) val); + ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, + BUID_HI(buid), BUID_LO(buid), size, (ulong) val); } else { ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, (ulong)val); } @@ -165,10 +164,12 @@ busdn = bus->sysdata; /* must be a phb */ /* Search only direct children of the bus */ - for (dn = busdn->child; dn; dn = dn->sibling) - if (dn->data && PCI_DN(dn)->devfn == devfn + for (dn = busdn->child; dn; dn = dn->sibling) { + struct pci_dn *pdn = PCI_DN(dn); + if (pdn && pdn->devfn == devfn && of_device_available(dn)) - return rtas_write_config(dn, where, size, val); + return rtas_write_config(pdn, where, size, val); + } return PCIBIOS_DEVICE_NOT_FOUND; } @@ -221,7 +222,7 @@ /* Python's register file is 1 MB in size. 
*/ chip_regs = ioremap(reg_struct.address & ~(0xfffffUL), 0x100000); - /* + /* * Firmware doesn't always clear this bit which is critical * for good performance - Anton */ @@ -292,7 +293,7 @@ if (bus_range == NULL || len < 2 * sizeof(int)) { return 1; } - + phb->first_busno = bus_range[0]; phb->last_busno = bus_range[1]; From linas at linas.org Fri Nov 4 11:47:50 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:47:50 -0600 Subject: [PATCH 2/42]: ppc64: misc minor cleanup References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004750.GA26782@mail.gnucash.org> 02-eeh-minor-cleanup.patch This patch performs some minor cleanup of the eeh.c file, including: -- trim some trailing whitespace -- remove extraneous #includes -- use the macro PCI_DN uniformly, instead of the void pointer chase. -- typos in comments -- improved debug printk's Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:01:21.403477910 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:06:16.222121166 -0600 @@ -1,32 +1,31 @@ /* * eeh.c * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation - * + * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. - * + * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
- * + * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#include #include #include -#include #include #include #include #include #include #include +#include #include #include #include @@ -49,8 +48,8 @@ * were "empty": all reads return 0xff's and all writes are silently * ignored. EEH slot isolation events can be triggered by parity * errors on the address or data busses (e.g. during posted writes), - * which in turn might be caused by dust, vibration, humidity, - * radioactivity or plain-old failed hardware. + * which in turn might be caused by low voltage on the bus, dust, + * vibration, humidity, radioactivity or plain-old failed hardware. * * Note, however, that one of the leading causes of EEH slot * freeze events are buggy device drivers, buggy device microcode, @@ -256,18 +255,17 @@ dn = pci_device_to_OF_node(dev); if (!dn) { - printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", - pci_name(dev)); + printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); return; } /* Skip any devices for which EEH is not enabled. 
*/ - pdn = dn->data; + pdn = PCI_DN(dn); if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || pdn->eeh_mode & EEH_MODE_NOCHECK) { #ifdef DEBUG - printk(KERN_INFO "PCI: skip building address cache for=%s\n", - pci_name(dev)); + printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", + pci_name(dev), pdn->node->full_name); #endif return; } @@ -410,16 +408,16 @@ * @dn: device node to read * @rets: array to return results in */ -static int read_slot_reset_state(struct device_node *dn, int rets[]) +static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) { int token, outputs; - struct pci_dn *pdn = dn->data; if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { token = ibm_read_slot_reset_state2; outputs = 4; } else { token = ibm_read_slot_reset_state; + rets[2] = 0; /* fake PE Unavailable info */ outputs = 3; } @@ -496,7 +494,7 @@ /** * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xE.... + * @token i/o token, should be address in the form 0xA.... */ static inline unsigned long eeh_token_to_phys(unsigned long token) { @@ -522,7 +520,7 @@ * will query firmware for the EEH status. * * Returns 0 if there has not been an EEH error; otherwise returns - * a non-zero value and queues up a solt isolation event notification. + * a non-zero value and queues up a slot isolation event notification. * * It is safe to call this routine in an interrupt context. */ @@ -542,7 +540,7 @@ if (!dn) return 0; - pdn = dn->data; + pdn = PCI_DN(dn); /* Access to IO BARs might get this far and still not want checking. 
*/ if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) || @@ -562,7 +560,7 @@ atomic_inc(&eeh_fail_count); if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { /* re-read the slot reset state */ - if (read_slot_reset_state(dn, rets) != 0) + if (read_slot_reset_state(pdn, rets) != 0) rets[0] = -1; /* reset state unknown */ eeh_panic(dev, rets[0]); } @@ -576,7 +574,7 @@ * function zero of a multi-function device. * In any case they must share a common PHB. */ - ret = read_slot_reset_state(dn, rets); + ret = read_slot_reset_state(pdn, rets); if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) { __get_cpu_var(false_positives)++; return 0; @@ -635,7 +633,6 @@ * @token i/o token, should be address in the form 0xA.... * @val value, should be all 1's (XXX why do we need this arg??) * - * Check for an eeh failure at the given token address. * Check for an EEH failure at the given token address. Call this * routine if the result of a read was all 0xff's and you want to * find out if this is due to an EEH slot freeze event. This routine @@ -680,7 +677,7 @@ u32 *device_id = (u32 *)get_property(dn, "device-id", NULL); u32 *regs; int enable; - struct pci_dn *pdn = dn->data; + struct pci_dn *pdn = PCI_DN(dn); pdn->eeh_mode = 0; @@ -732,7 +729,7 @@ /* This device doesn't support EEH, but it may have an * EEH parent, in which case we mark it as supported. */ - if (dn->parent && dn->parent->data + if (dn->parent && PCI_DN(dn->parent) && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { /* Parent supports EEH. 
 */ pdn->eeh_mode |= EEH_MODE_SUPPORTED; @@ -745,7 +742,7 @@ dn->full_name); } - return NULL; + return NULL; } /* @@ -793,13 +790,11 @@ for (phb = of_find_node_by_name(NULL, "pci"); phb; phb = of_find_node_by_name(phb, "pci")) { unsigned long buid; - struct pci_dn *pci; buid = get_phb_buid(phb); - if (buid == 0 || phb->data == NULL) + if (buid == 0 || PCI_DN(phb) == NULL) continue; - pci = phb->data; info.buid_lo = BUID_LO(buid); info.buid_hi = BUID_HI(buid); traverse_pci_devices(phb, early_enable_eeh, &info); @@ -828,11 +823,13 @@ struct pci_controller *phb; struct eeh_early_enable_info info; - if (!dn || !dn->data) + if (!dn || !PCI_DN(dn)) return; phb = PCI_DN(dn)->phb; if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none\n"); + printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", + dn->full_name); + dump_stack(); return; }

From linas at linas.org Fri Nov 4 11:48:45 2005
From: linas at linas.org (Linas Vepstas)
Date: Thu, 3 Nov 2005 18:48:45 -0600
Subject: [PATCH 3/42]: ppc64: PCI address cache minor fixes
References: <20051103235918.GA25616@mail.gnucash.org>
Message-ID: <20051104004845.GA26803@mail.gnucash.org>

03-eeh-addr-cache-cleanup.patch

This is a minor patch to clean up a buglet related to the PCI address cache. (The buglet doesn't manifest itself unless there are also bugs elsewhere, which is why it's minor.)

Also:
-- Improved debug printing.
-- Declare some private routines as static -- Adds reference counting to struct pci_dn->pcidev structure Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:07:15.072864803 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:10:23.985360685 -0600 @@ -219,9 +219,9 @@ while (*p) { parent = *p; piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (alo < piar->addr_lo) { + if (ahi < piar->addr_lo) { p = &parent->rb_left; - } else if (ahi > piar->addr_hi) { + } else if (alo > piar->addr_hi) { p = &parent->rb_right; } else { if (dev != piar->pcidev || @@ -240,6 +240,11 @@ piar->pcidev = dev; piar->flags = flags; +#ifdef DEBUG + printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif + rb_link_node(&piar->rb_node, parent, p); rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); @@ -301,7 +306,7 @@ * we maintain a cache of devices that can be quickly searched. * This routine adds a device to that cache. */ -void pci_addr_cache_insert_device(struct pci_dev *dev) +static void pci_addr_cache_insert_device(struct pci_dev *dev) { unsigned long flags; @@ -344,7 +349,7 @@ * the tree multiple times (once per resource). * But so what; device removal doesn't need to be that fast. 
*/ -void pci_addr_cache_remove_device(struct pci_dev *dev) +static void pci_addr_cache_remove_device(struct pci_dev *dev) { unsigned long flags; @@ -366,6 +371,9 @@ { struct pci_dev *dev = NULL; + if (!eeh_subsystem_enabled) + return; + spin_lock_init(&pci_io_addr_cache_root.piar_lock); while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { @@ -837,7 +845,7 @@ info.buid_lo = BUID_LO(phb->buid); early_enable_eeh(dn, &info); } -EXPORT_SYMBOL(eeh_add_device_early); +EXPORT_SYMBOL_GPL(eeh_add_device_early); /** * eeh_add_device_late - perform EEH initialization for the indicated pci device @@ -848,6 +856,8 @@ */ void eeh_add_device_late(struct pci_dev *dev) { + struct device_node *dn; + if (!dev || !eeh_subsystem_enabled) return; @@ -855,9 +865,13 @@ printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev)); #endif + pci_dev_get (dev); + dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->pcidev = dev; + pci_addr_cache_insert_device (dev); } -EXPORT_SYMBOL(eeh_add_device_late); +EXPORT_SYMBOL_GPL(eeh_add_device_late); /** * eeh_remove_device - undo EEH setup for the indicated pci device @@ -868,6 +882,7 @@ */ void eeh_remove_device(struct pci_dev *dev) { + struct device_node *dn; if (!dev || !eeh_subsystem_enabled) return; @@ -876,8 +891,12 @@ printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev)); #endif pci_addr_cache_remove_device(dev); + + dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->pcidev = NULL; + pci_dev_put (dev); } -EXPORT_SYMBOL(eeh_remove_device); +EXPORT_SYMBOL_GPL(eeh_remove_device); static int proc_eeh_show(struct seq_file *m, void *v) { Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-10-31 12:01:21.404477769 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-10-31 12:10:06.152862619 -0600 @@ -39,10 +39,6 @@ void pci_devs_phb_init(void); void pci_devs_phb_init_dynamic(struct pci_controller 
 *phb); -/* PCI address cache management routines */ -void pci_addr_cache_insert_device(struct pci_dev *dev); -void pci_addr_cache_remove_device(struct pci_dev *dev); - /* From rtas_pci.h */ void init_pci_config_tokens (void); unsigned long get_phb_buid (struct device_node *);

From linas at linas.org Fri Nov 4 11:48:52 2005
From: linas at linas.org (Linas Vepstas)
Date: Thu, 3 Nov 2005 18:48:52 -0600
Subject: [PATCH 4/42]: ppc64: PCI error rate statistics
References: <20051103235918.GA25616@mail.gnucash.org>
Message-ID: <20051104004852.GA26811@mail.gnucash.org>

04-eeh-statistics.patch

This minor patch adds some statistics-gathering counters that allow the behaviour of the EEH subsystem to be monitored. While far from perfect, it does provide a rudimentary device that makes understanding of the current state of the system a bit easier.

Signed-off-by: Linas Vepstas

Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:10:23.985360685 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:11:57.134291514 -0600
@@ -102,6 +102,10 @@ static int eeh_error_buf_size; /* System monitoring statistics */ +static DEFINE_PER_CPU(unsigned long, no_device); +static DEFINE_PER_CPU(unsigned long, no_dn); +static DEFINE_PER_CPU(unsigned long, no_cfg_addr); +static DEFINE_PER_CPU(unsigned long, ignored_check); static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); static DEFINE_PER_CPU(unsigned long, false_positives); static DEFINE_PER_CPU(unsigned long, ignored_failures); @@ -493,8 +497,6 @@ notifier_call_chain (&eeh_notifier_chain, EEH_NOTIFY_FREEZE, event); - __get_cpu_var(slot_resets)++; - pci_dev_put(event->dev); kfree(event); } @@ -546,17 +548,24 @@ if (!eeh_subsystem_enabled) return 0; - if (!dn) + if (!dn) { + __get_cpu_var(no_dn)++; return 0; + } pdn = PCI_DN(dn); /* Access to IO BARs might get this far and still not want checking.
*/ if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) || pdn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; +#ifdef DEBUG + printk ("EEH:ignored check for %s %s\n", pci_name (dev), dn->full_name); +#endif return 0; } if (!pdn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; return 0; } @@ -590,6 +599,7 @@ /* prevent repeated reports of this failure */ pdn->eeh_mode |= EEH_MODE_ISOLATED; + __get_cpu_var(slot_resets)++; reset_state = rets[0]; @@ -657,8 +667,10 @@ /* Finding the phys addr + pci device; this is pretty quick. */ addr = eeh_token_to_phys((unsigned long __force) token); dev = pci_get_device_by_addr(addr); - if (!dev) + if (!dev) { + __get_cpu_var(no_device)++; return val; + } dn = pci_device_to_OF_node(dev); eeh_dn_check_failure (dn, dev); @@ -903,12 +915,17 @@ unsigned int cpu; unsigned long ffs = 0, positives = 0, failures = 0; unsigned long resets = 0; + unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; for_each_cpu(cpu) { ffs += per_cpu(total_mmio_ffs, cpu); positives += per_cpu(false_positives, cpu); failures += per_cpu(ignored_failures, cpu); resets += per_cpu(slot_resets, cpu); + no_dev += per_cpu(no_device, cpu); + no_dn += per_cpu(no_dn, cpu); + no_cfg += per_cpu(no_cfg_addr, cpu); + no_check += per_cpu(ignored_check, cpu); } if (0 == eeh_subsystem_enabled) { @@ -916,13 +933,17 @@ seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); } else { seq_printf(m, "EEH Subsystem is enabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n" - "eeh_false_positives=%ld\n" - "eeh_ignored_failures=%ld\n" - "eeh_slot_resets=%ld\n" - "eeh_fail_count=%d\n", - ffs, positives, failures, resets, - eeh_fail_count.counter); + seq_printf(m, + "no device=%ld\n" + "no device node=%ld\n" + "no config address=%ld\n" + "check not wanted=%ld\n" + "eeh_total_mmio_ffs=%ld\n" + "eeh_false_positives=%ld\n" + "eeh_ignored_failures=%ld\n" + "eeh_slot_resets=%ld\n", + no_dev, no_dn, no_cfg, no_check, + ffs, positives, failures, resets); } 
return 0; From linas at linas.org Fri Nov 4 11:49:01 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:01 -0600 Subject: [PATCH 5/42]: ppc64: RTAS error reporting restructuring References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004901.GA26819@mail.gnucash.org> 05-eeh-slot-error-detail.patch This patch encapsulates a section of code that reports the EEH event. The new subroutine can be used in several places to report the error. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:11:57.134291514 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:13:09.282168648 -0600 @@ -397,6 +397,28 @@ /* --------------------------------------------------------------- */ /* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ +void eeh_slot_error_detail (struct pci_dn *pdn, int severity) +{ + unsigned long flags; + int rc; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), + BUID_LO(pdn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); +} + /** * eeh_register_notifier - Register to find out about EEH events. * @nb: notifier block to callback on events @@ -454,9 +476,12 @@ * Since the panic_on_oops sysctl is used to halt the system * in light of potential corruption, we can use it here. 
*/ - if (panic_on_oops) + if (panic_on_oops) { + struct device_node *dn = pci_device_to_OF_node(dev); + eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, pci_name(dev)); + } else { __get_cpu_var(ignored_failures)++; printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", @@ -539,7 +564,7 @@ int ret; int rets[3]; unsigned long flags; - int rc, reset_state; + int reset_state; struct eeh_event *event; struct pci_dn *pdn; @@ -603,20 +628,7 @@ reset_state = rets[0]; - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, pdn->eeh_config_addr, - BUID_HI(pdn->phb->buid), - BUID_LO(pdn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - 1 /* Temporary Error */); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); + eeh_slot_error_detail (pdn, 1 /* Temporary Error */); printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", rets[0], dn->name, dn->full_name); @@ -783,6 +795,8 @@ struct device_node *phb, *np; struct eeh_early_enable_info info; + spin_lock_init(&slot_errbuf_lock); + np = of_find_node_by_path("/rtas"); if (np == NULL) return; From linas at linas.org Fri Nov 4 11:49:15 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:15 -0600 Subject: [PATCH 6/42]: ppc64: avoid PCI error reporting for empty slots References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004915.GA26827@mail.gnucash.org> 06-eeh-empty-slot-error.patch Performing PCI config-space reads to empty PCI slots can lead to reports of "permanent failure" from the firmware. Ignore permanent failures on empty slots. 
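The filtering described above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel code: `struct node`, `enum eeh_verdict` and `classify_slot_state()` are hypothetical names, and the reading of slot state 5 as the firmware's "permanent failure" report is inferred from this patch description rather than from any firmware documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node; only the child link matters. */
struct node {
    struct node *child;
};

enum eeh_verdict { EEH_FALSE_POSITIVE = 0, EEH_REAL_EVENT = 1 };

/*
 * Model of the checks applied to the result of read_slot_reset_state().
 * fw_rc is the return code of the firmware call, rets[0] is the slot
 * state and rets[1] is the "EEH supported" flag, mirroring the patch.
 */
static enum eeh_verdict classify_slot_state(int fw_rc, const int rets[2],
                                            const struct node *dn)
{
    assert(rets != NULL && dn != NULL);

    if (fw_rc != 0)                 /* the call to firmware failed */
        return EEH_FALSE_POSITIVE;
    if (rets[1] != 1)               /* EEH not supported on this device */
        return EEH_FALSE_POSITIVE;
    if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5)
        return EEH_FALSE_POSITIVE;  /* not a state we know how to handle */
    if (rets[0] == 5 && dn->child == NULL)
        return EEH_FALSE_POSITIVE;  /* state 5 on an empty slot: ignore */
    return EEH_REAL_EVENT;
}
```

In this model a slot counts as empty when its node has no children, which is the same test the patch uses. State 5 on a populated slot is still treated as a real event.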
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:13:09.282168648 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:15:26.162962756 -0600 @@ -617,7 +617,32 @@ * In any case they must share a common PHB. */ ret = read_slot_reset_state(pdn, rets); - if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) { + + /* If the call to firmware failed, punt */ + if (ret != 0) { + printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", + ret, dn->full_name); + __get_cpu_var(false_positives)++; + return 0; + } + + /* If EEH is not supported on this device, punt. */ + if (rets[1] != 1) { + printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", + ret, dn->full_name); + __get_cpu_var(false_positives)++; + return 0; + } + + /* If not the kind of error we know about, punt. */ + if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { + __get_cpu_var(false_positives)++; + return 0; + } + + /* Note that config-io to empty slots may fail; + * we recognize empty because they don't have children. */ + if ((rets[0] == 5) && (dn->child == NULL)) { __get_cpu_var(false_positives)++; return 0; } @@ -650,7 +675,7 @@ /* Most EEH events are due to device driver bugs. Having * a stack trace will help the device-driver authors figure * out what happened. So print that out. */ - dump_stack(); + if (rets[0] != 5) dump_stack(); schedule_work(&eeh_event_wq); return 0; From linas at linas.org Fri Nov 4 11:49:23 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:23 -0600 Subject: [PATCH 7/42]: ppc64: serialize reports of PCI errors References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004923.GA26835@mail.gnucash.org> 07-eeh-report-race.patch When a PCI slot is isolated, all PCI functions under that slot are affected. 
If these functions have separate device drivers, the EEH isolation event might be reported multiple times. This patch adds a lock to prevent the racing of such multiple reports. It also marks every device under the slot as having experienced an EEH event, so that multiple reports may be recognized more easily. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:15:26.162962756 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:16:19.766441392 -0600 @@ -96,6 +96,9 @@ static int eeh_subsystem_enabled; +/* Lock to avoid races due to multiple reports of an error */ +static DEFINE_SPINLOCK(confirm_error_lock); + /* Buffer for reporting slot-error-detail rtas calls */ static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; static DEFINE_SPINLOCK(slot_errbuf_lock); @@ -544,6 +547,55 @@ return pa | (token & (PAGE_SIZE-1)); } +/** + * Return the "partitionable endpoint" (pe) under which this device lies + */ +static struct device_node * find_device_pe(struct device_node *dn) +{ + while ((dn->parent) && PCI_DN(dn->parent) && + (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { + dn = dn->parent; + } + return dn; +} + +/** Mark all devices that are peers of this device as failed. + * Mark the device driver too, so that it can see the failure + * immediately; this is critical, since some drivers poll + * status registers in interrupts ... If a driver is polling, + * and the slot is frozen, then the driver can deadlock in + * an interrupt context, which is bad. 
+ */ + +static inline void __eeh_mark_slot (struct device_node *dn) +{ + while (dn) { + PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; + + if (dn->child) + __eeh_mark_slot (dn->child); + dn = dn->sibling; + } +} + +static inline void __eeh_clear_slot (struct device_node *dn) +{ + while (dn) { + PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; + if (dn->child) + __eeh_clear_slot (dn->child); + dn = dn->sibling; + } +} + +static inline void eeh_clear_slot (struct device_node *dn) +{ + unsigned long flags; + spin_lock_irqsave(&confirm_error_lock, flags); + __eeh_clear_slot (dn); + spin_unlock_irqrestore(&confirm_error_lock, flags); +} + /** * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze * @dn device node @@ -567,6 +619,8 @@ int reset_state; struct eeh_event *event; struct pci_dn *pdn; + struct device_node *pe_dn; + int rc = 0; __get_cpu_var(total_mmio_ffs)++; @@ -594,10 +648,14 @@ return 0; } - /* - * If we already have a pending isolation event for this - * slot, we know it's bad already, we don't need to check... + /* If we already have a pending isolation event for this + * slot, we know it's bad already, we don't need to check. + * Do this checking under a lock; as multiple PCI devices + * in one slot might report errors simultaneously, and we + * only want one error recovery routine running. */ + spin_lock_irqsave(&confirm_error_lock, flags); + rc = 1; if (pdn->eeh_mode & EEH_MODE_ISOLATED) { atomic_inc(&eeh_fail_count); if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { @@ -606,7 +664,7 @@ rets[0] = -1; /* reset state unknown */ eeh_panic(dev, rets[0]); } - return 0; + goto dn_unlock; } /* @@ -623,7 +681,8 @@ printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", ret, dn->full_name); __get_cpu_var(false_positives)++; - return 0; + rc = 0; + goto dn_unlock; } /* If EEH is not supported on this device, punt. 
*/ @@ -631,25 +690,33 @@ printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", ret, dn->full_name); __get_cpu_var(false_positives)++; - return 0; + rc = 0; + goto dn_unlock; } /* If not the kind of error we know about, punt. */ if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { __get_cpu_var(false_positives)++; - return 0; + rc = 0; + goto dn_unlock; } /* Note that config-io to empty slots may fail; * we recognize empty because they don't have children. */ if ((rets[0] == 5) && (dn->child == NULL)) { __get_cpu_var(false_positives)++; - return 0; + rc = 0; + goto dn_unlock; } - /* prevent repeated reports of this failure */ - pdn->eeh_mode |= EEH_MODE_ISOLATED; - __get_cpu_var(slot_resets)++; + __get_cpu_var(slot_resets)++; + + /* Avoid repeated reports of this failure, including problems + * with other functions on this device, and functions under + * bridges. */ + pe_dn = find_device_pe (dn); + __eeh_mark_slot (pe_dn); + spin_unlock_irqrestore(&confirm_error_lock, flags); reset_state = rets[0]; @@ -678,10 +745,14 @@ if (rets[0] != 5) dump_stack(); schedule_work(&eeh_event_wq); - return 0; + return 1; + +dn_unlock: + spin_unlock_irqrestore(&confirm_error_lock, flags); + return rc; } -EXPORT_SYMBOL(eeh_dn_check_failure); +EXPORT_SYMBOL_GPL(eeh_dn_check_failure); /** * eeh_check_failure - check if all 1's data is due to EEH slot freeze @@ -820,6 +891,7 @@ struct device_node *phb, *np; struct eeh_early_enable_info info; + spin_lock_init(&confirm_error_lock); spin_lock_init(&slot_errbuf_lock); np = of_find_node_by_path("/rtas"); From linas at linas.org Fri Nov 4 11:49:31 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:31 -0600 Subject: [PATCH 8/42]: ppc64: escape hatch for spinning interrupt deadlocks References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004931.GA26844@mail.gnucash.org> 08-eeh-spin-counter.patch Once an EEH event is triggered, all further I/O to a device is blocked (until reset). 
Bad device drivers may end up spinning in their interrupt handlers, trying to read an interrupt status register that will never change state. This patch moves that spin counter to a per-device structure, and adds some diagnostic prints to help locate the bad driver. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:16:19.766441392 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:18:21.924300428 -0600 @@ -78,14 +78,12 @@ static struct notifier_block *eeh_notifier_chain; -/* - * If a device driver keeps reading an MMIO register in an interrupt +/* If a device driver keeps reading an MMIO register in an interrupt * handler after a slot isolation event has occurred, we assume it * is broken and panic. This sets the threshold for how many read * attempts we allow before panicking. */ -#define EEH_MAX_FAILS 1000 -static atomic_t eeh_fail_count; +#define EEH_MAX_FAILS 100000 /* RTAS tokens */ static int ibm_set_eeh_option; @@ -521,7 +519,6 @@ "%s\n", event->reset_state, pci_name(event->dev)); - atomic_set(&eeh_fail_count, 0); notifier_call_chain (&eeh_notifier_chain, EEH_NOTIFY_FREEZE, event); @@ -657,12 +654,18 @@ spin_lock_irqsave(&confirm_error_lock, flags); rc = 1; if (pdn->eeh_mode & EEH_MODE_ISOLATED) { - atomic_inc(&eeh_fail_count); - if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { + pdn->eeh_check_count ++; + if (pdn->eeh_check_count >= EEH_MAX_FAILS) { + printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n", + pdn->eeh_check_count); + dump_stack(); + /* re-read the slot reset state */ if (read_slot_reset_state(pdn, rets) != 0) rets[0] = -1; /* reset state unknown */ - eeh_panic(dev, rets[0]); + + /* If we are here, then we hit an infinite loop. Stop. 
*/ + panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev)); } goto dn_unlock; } @@ -808,6 +811,8 @@ struct pci_dn *pdn = PCI_DN(dn); pdn->eeh_mode = 0; + pdn->eeh_check_count = 0; + pdn->eeh_freeze_count = 0; if (status && strcmp(status, "ok") != 0) return NULL; /* ignore devices with bad status */ From linas at linas.org Fri Nov 4 11:49:38 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:38 -0600 Subject: [PATCH 9/42]: ppc64: bugfix: crash on PCI hotplug References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004938.GA26852@mail.gnucash.org> 09-hotplug-bugfix.patch In the current 2.6.14-rc2-git6 kernel, performing a Dynamic LPAR Add of a hotplug slot will crash the system, with the following (abbreviated) stack trace: cpu 0x3: Vector: 700 (Program Check) at [c000000053dff7f0] pc: c0000000004f5974: .__alloc_bootmem+0x0/0xb0 lr: c0000000000258a0: .update_dn_pci_info+0x108/0x118 c0000000000257c8 .update_dn_pci_info+0x30/0x118 (unreliable) c0000000000258fc .pci_dn_reconfig_notifier+0x4c/0x64 c000000000060754 .notifier_call_chain+0x68/0x9c The root cause was that __init __alloc_bootmem() was called long after boot had finished, resulting in a crash because this routine is undefined after boot time. The patch below fixes this crash, and adds some docs to clarify the code. p.s. congrats to all for getting slashdotted on this yesterday! 
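The root cause above, choosing an allocator by boot phase, can be modelled in user space roughly as follows. This is a sketch under stated assumptions and not the kernel implementation: `mem_init_done_model`, `boot_alloc()`, `finish_boot()` and `alloc_for_phase()` are hypothetical names, a static arena stands in for bootmem, and `malloc()` stands in for `kmalloc()`. Where the real `__alloc_bootmem()` crashes when called after boot, the model simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BOOT_ARENA_WORDS 64

/* Hypothetical model state: an arena that is only usable during "boot". */
static long boot_arena[BOOT_ARENA_WORDS];
static size_t boot_used_words;
static int boot_arena_retired;
static int mem_init_done_model;  /* stand-in for the kernel's mem_init_done */

/* Bump allocator over the boot arena; hands out long-aligned chunks. */
static void *boot_alloc(size_t n)
{
    size_t words = (n + sizeof(long) - 1) / sizeof(long);

    if (boot_arena_retired || words > BOOT_ARENA_WORDS - boot_used_words)
        return NULL;  /* the kernel would crash here; the model refuses */

    void *p = &boot_arena[boot_used_words];
    boot_used_words += words;
    return p;
}

/* End of the boot phase: the arena is gone, the heap is available. */
static void finish_boot(void)
{
    boot_arena_retired = 1;
    mem_init_done_model = 1;
}

/* Pick the allocator from the global boot state, not from a caller flag. */
static void *alloc_for_phase(size_t n)
{
    return mem_init_done_model ? malloc(n) : boot_alloc(n);
}
```

Keying the choice on global boot state rather than on a per-caller flag (the patch replaces the `phb->is_dynamic` test) means a late caller that forgets to set the flag can no longer reach the boot-time allocator.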
Signed-off-by: Linas Vepstas Mailed to: paulus at samba.org CC: linuxppc64-dev at ozlabs.org, linux-kernel at vger.kernel.org, johnrose at linux.ibm.com On Monday 3 October 2005 revised on 4 October to [PATCH 1/2] ppc64: Crash in DLPAR code on PCI hotplug add Index: linux-2.6.14-git3/arch/ppc64/kernel/pci_dn.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/pci_dn.c 2005-10-31 12:19:03.211506966 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/pci_dn.c 2005-10-31 12:19:47.420303479 -0600 @@ -43,7 +43,7 @@ u32 *regs; struct pci_dn *pdn; - if (phb->is_dynamic) + if (mem_init_done) pdn = kmalloc(sizeof(*pdn), GFP_KERNEL); else pdn = alloc_bootmem(sizeof(*pdn)); @@ -120,6 +120,14 @@ return NULL; } +/** + * pci_devs_phb_init_dynamic - setup pci devices under this PHB + * phb: pci-to-host bridge (top-level bridge connecting to cpu) + * + * This routine is called both during boot, (before the memory + * subsystem is set up, before kmalloc is valid) and during the + * dynamic lpar operation of adding a PHB to a running system. + */ void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb) { struct device_node * dn = (struct device_node *) phb->arch_data; @@ -200,9 +208,14 @@ .notifier_call = pci_dn_reconfig_notifier, }; -/* - * Actually initialize the phbs. - * The buswalk on this phb has not happened yet. +/** + * pci_devs_phb_init - Initialize phbs and pci devs under them. + * + * This routine walks over all phb's (pci-host bridges) on the + * system, and sets up assorted pci-related structures + * (including pci info in the device node structs) for each + * pci device found underneath. This routine runs once, + * early in the boot sequence. 
*/ void __init pci_devs_phb_init(void) { From linas at linas.org Fri Nov 4 11:49:45 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:45 -0600 Subject: [PATCH 10/42]: ppc64: bugfix: don't silently ignore PCI errors References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004945.GA26860@mail.gnucash.org> 10-EEH-enable-bugfix.patch Bugfix: With the current linux-2.6.14-rc2-git6, EEH errors are ignored because their detection requires an unused, uninitialized flag to be set. This patch removes the unused flag. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:54:20.919034814 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:54:48.165215962 -0600 @@ -631,11 +631,12 @@ pdn = PCI_DN(dn); /* Access to IO BARs might get this far and still not want checking. */ - if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) || + if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || pdn->eeh_mode & EEH_MODE_NOCHECK) { __get_cpu_var(ignored_check)++; #ifdef DEBUG - printk ("EEH:ignored check for %s %s\n", pci_name (dev), dn->full_name); + printk ("EEH:ignored check (%x) for %s %s\n", + pdn->eeh_mode, pci_name (dev), dn->full_name); #endif return 0; } Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h 2005-10-31 12:54:20.919034814 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h 2005-10-31 12:54:48.167215682 -0600 @@ -63,7 +63,6 @@ int devfn; /* for pci devices */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; - int eeh_capable; /* from firmware */ int eeh_check_count; /* # times driver ignored error */ int eeh_freeze_count; /* # times this device froze up. 
*/ int eeh_is_bridge; /* device is pci-to-pci bridge */ From linas at linas.org Fri Nov 4 11:49:51 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:51 -0600 Subject: [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64 References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004951.GA26868@mail.gnucash.org> 11-eeh-move-to-powerpc.patch Move arch/ppc64/kernel/eeh.c to arch/powerpc/platforms/pseries/eeh.c No other changes (except for Makefile to build it) Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-11-02 14:29:22.485829789 -0600 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,1093 +0,0 @@ -/* - * eeh.c - * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#undef DEBUG - -/** Overview: - * EEH, or "Extended Error Handling" is a PCI bridge technology for - * dealing with PCI bus errors that can't be dealt with within the - * usual PCI framework, except by check-stopping the CPU. Systems - * that are designed for high-availability/reliability cannot afford - * to crash due to a "mere" PCI error, thus the need for EEH. - * An EEH-capable bridge operates by converting a detected error - * into a "slot freeze", taking the PCI adapter off-line, making - * the slot behave, from the OS'es point of view, as if the slot - * were "empty": all reads return 0xff's and all writes are silently - * ignored. EEH slot isolation events can be triggered by parity - * errors on the address or data busses (e.g. during posted writes), - * which in turn might be caused by low voltage on the bus, dust, - * vibration, humidity, radioactivity or plain-old failed hardware. - * - * Note, however, that one of the leading causes of EEH slot - * freeze events are buggy device drivers, buggy device microcode, - * or buggy device hardware. This is because any attempt by the - * device to bus-master data to a memory address that is not - * assigned to the device will trigger a slot freeze. (The idea - * is to prevent devices-gone-wild from corrupting system memory). - * Buggy hardware/drivers will have a miserable time co-existing - * with EEH. - * - * Ideally, a PCI device driver, when suspecting that an isolation - * event has occured (e.g. 
by reading 0xff's), will then ask EEH - * whether this is the case, and then take appropriate steps to - * reset the PCI slot, the PCI device, and then resume operations. - * However, until that day, the checking is done here, with the - * eeh_check_failure() routine embedded in the MMIO macros. If - * the slot is found to be isolated, an "EEH Event" is synthesized - * and sent out for processing. - */ - -/* EEH event workqueue setup. */ -static DEFINE_SPINLOCK(eeh_eventlist_lock); -LIST_HEAD(eeh_eventlist); -static void eeh_event_handler(void *); -DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL); - -static struct notifier_block *eeh_notifier_chain; - -/* If a device driver keeps reading an MMIO register in an interrupt - * handler after a slot isolation event has occurred, we assume it - * is broken and panic. This sets the threshold for how many read - * attempts we allow before panicking. - */ -#define EEH_MAX_FAILS 100000 - -/* RTAS tokens */ -static int ibm_set_eeh_option; -static int ibm_set_slot_reset; -static int ibm_read_slot_reset_state; -static int ibm_read_slot_reset_state2; -static int ibm_slot_error_detail; - -static int eeh_subsystem_enabled; - -/* Lock to avoid races due to multiple reports of an error */ -static DEFINE_SPINLOCK(confirm_error_lock); - -/* Buffer for reporting slot-error-detail rtas calls */ -static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; -static DEFINE_SPINLOCK(slot_errbuf_lock); -static int eeh_error_buf_size; - -/* System monitoring statistics */ -static DEFINE_PER_CPU(unsigned long, no_device); -static DEFINE_PER_CPU(unsigned long, no_dn); -static DEFINE_PER_CPU(unsigned long, no_cfg_addr); -static DEFINE_PER_CPU(unsigned long, ignored_check); -static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); -static DEFINE_PER_CPU(unsigned long, false_positives); -static DEFINE_PER_CPU(unsigned long, ignored_failures); -static DEFINE_PER_CPU(unsigned long, slot_resets); - -/** - * The pci address cache subsystem. 
This subsystem places - * PCI device address resources into a red-black tree, sorted - * according to the address range, so that given only an i/o - * address, the corresponding PCI device can be **quickly** - * found. It is safe to perform an address lookup in an interrupt - * context; this ability is an important feature. - * - * Currently, the only customer of this code is the EEH subsystem; - * thus, this code has been somewhat tailored to suit EEH better. - * In particular, the cache does *not* hold the addresses of devices - * for which EEH is not enabled. - * - * (Implementation Note: The RB tree seems to be better/faster - * than any hash algo I could think of for this problem, even - * with the penalty of slow pointer chases for d-cache misses). - */ -struct pci_io_addr_range -{ - struct rb_node rb_node; - unsigned long addr_lo; - unsigned long addr_hi; - struct pci_dev *pcidev; - unsigned int flags; -}; - -static struct pci_io_addr_cache -{ - struct rb_root rb_root; - spinlock_t piar_lock; -} pci_io_addr_cache_root; - -static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) -{ - struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; - - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (addr < piar->addr_lo) { - n = n->rb_left; - } else { - if (addr > piar->addr_hi) { - n = n->rb_right; - } else { - pci_dev_get(piar->pcidev); - return piar->pcidev; - } - } - } - - return NULL; -} - -/** - * pci_get_device_by_addr - Get device, given only address - * @addr: mmio (PIO) phys address or i/o port number - * - * Given an mmio phys address, or a port number, find a pci device - * that implements this address. Be sure to pci_dev_put the device - * when finished. I/O port numbers are assumed to be offset - * from zero (that is, they do *not* have pci_io_addr added in). - * It is safe to call this function within an interrupt. 
- */ -static struct pci_dev *pci_get_device_by_addr(unsigned long addr) -{ - struct pci_dev *dev; - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - dev = __pci_get_device_by_addr(addr); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); - return dev; -} - -#ifdef DEBUG -/* - * Handy-dandy debug print routine, does nothing more - * than print out the contents of our addr cache. - */ -static void pci_addr_cache_print(struct pci_io_addr_cache *cache) -{ - struct rb_node *n; - int cnt = 0; - - n = rb_first(&cache->rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", - (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt, - piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); - cnt++; - n = rb_next(n); - } -} -#endif - -/* Insert address range into the rb tree. */ -static struct pci_io_addr_range * -pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, - unsigned long ahi, unsigned int flags) -{ - struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; - struct rb_node *parent = NULL; - struct pci_io_addr_range *piar; - - /* Walk tree, find a place to insert into tree */ - while (*p) { - parent = *p; - piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (ahi < piar->addr_lo) { - p = &parent->rb_left; - } else if (alo > piar->addr_hi) { - p = &parent->rb_right; - } else { - if (dev != piar->pcidev || - alo != piar->addr_lo || ahi != piar->addr_hi) { - printk(KERN_WARNING "PIAR: overlapping address range\n"); - } - return piar; - } - } - piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); - if (!piar) - return NULL; - - piar->addr_lo = alo; - piar->addr_hi = ahi; - piar->pcidev = dev; - piar->flags = flags; - -#ifdef DEBUG - printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", - alo, ahi, pci_name (dev)); -#endif - - 
rb_link_node(&piar->rb_node, parent, p); - rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); - - return piar; -} - -static void __pci_addr_cache_insert_device(struct pci_dev *dev) -{ - struct device_node *dn; - struct pci_dn *pdn; - int i; - int inserted = 0; - - dn = pci_device_to_OF_node(dev); - if (!dn) { - printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); - return; - } - - /* Skip any devices for which EEH is not enabled. */ - pdn = PCI_DN(dn); - if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || - pdn->eeh_mode & EEH_MODE_NOCHECK) { -#ifdef DEBUG - printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", - pci_name(dev), pdn->node->full_name); -#endif - return; - } - - /* The cache holds a reference to the device... */ - pci_dev_get(dev); - - /* Walk resources on this device, poke them into the tree */ - for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { - unsigned long start = pci_resource_start(dev,i); - unsigned long end = pci_resource_end(dev,i); - unsigned int flags = pci_resource_flags(dev,i); - - /* We are interested only bus addresses, not dma or other stuff */ - if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) - continue; - if (start == 0 || ~start == 0 || end == 0 || ~end == 0) - continue; - pci_addr_cache_insert(dev, start, end, flags); - inserted = 1; - } - - /* If there was nothing to add, the cache has no reference... */ - if (!inserted) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_insert_device - Add a device to the address cache - * @dev: PCI device whose I/O addresses we are interested in. - * - * In order to support the fast lookup of devices based on addresses, - * we maintain a cache of devices that can be quickly searched. - * This routine adds a device to that cache. 
- */ -static void pci_addr_cache_insert_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_insert_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) -{ - struct rb_node *n; - int removed = 0; - -restart: - n = rb_first(&pci_io_addr_cache_root.rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (piar->pcidev == dev) { - rb_erase(n, &pci_io_addr_cache_root.rb_root); - removed = 1; - kfree(piar); - goto restart; - } - n = rb_next(n); - } - - /* The cache no longer holds its reference to this device... */ - if (removed) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_remove_device - remove pci device from addr cache - * @dev: device to remove - * - * Remove a device from the addr-cache tree. - * This is potentially expensive, since it will walk - * the tree multiple times (once per resource). - * But so what; device removal doesn't need to be that fast. - */ -static void pci_addr_cache_remove_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_remove_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -/** - * pci_addr_cache_build - Build a cache of I/O addresses - * - * Build a cache of pci i/o addresses. This cache will be used to - * find the pci device that corresponds to a given address. - * This routine scans all pci busses to build the cache. - * Must be run late in boot process, after the pci controllers - * have been scaned for devices (after all device resources are known). 
- */ -void __init pci_addr_cache_build(void) -{ - struct pci_dev *dev = NULL; - - if (!eeh_subsystem_enabled) - return; - - spin_lock_init(&pci_io_addr_cache_root.piar_lock); - - while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { - /* Ignore PCI bridges ( XXX why ??) */ - if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) { - continue; - } - pci_addr_cache_insert_device(dev); - } - -#ifdef DEBUG - /* Verify tree built up above, echo back the list of addrs. */ - pci_addr_cache_print(&pci_io_addr_cache_root); -#endif -} - -/* --------------------------------------------------------------- */ -/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ - -void eeh_slot_error_detail (struct pci_dn *pdn, int severity) -{ - unsigned long flags; - int rc; - - /* Log the error with the rtas logger */ - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, pdn->eeh_config_addr, - BUID_HI(pdn->phb->buid), - BUID_LO(pdn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - severity); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); -} - -/** - * eeh_register_notifier - Register to find out about EEH events. - * @nb: notifier block to callback on events - */ -int eeh_register_notifier(struct notifier_block *nb) -{ - return notifier_chain_register(&eeh_notifier_chain, nb); -} - -/** - * eeh_unregister_notifier - Unregister to an EEH event notifier. 
- * @nb: notifier block to callback on events - */ -int eeh_unregister_notifier(struct notifier_block *nb) -{ - return notifier_chain_unregister(&eeh_notifier_chain, nb); -} - -/** - * read_slot_reset_state - Read the reset state of a device node's slot - * @dn: device node to read - * @rets: array to return results in - */ -static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) -{ - int token, outputs; - - if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { - token = ibm_read_slot_reset_state2; - outputs = 4; - } else { - token = ibm_read_slot_reset_state; - rets[2] = 0; /* fake PE Unavailable info */ - outputs = 3; - } - - return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr, - BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); -} - -/** - * eeh_panic - call panic() for an eeh event that cannot be handled. - * The philosophy of this routine is that it is better to panic and - * halt the OS than it is to risk possible data corruption by - * oblivious device drivers that don't know better. - * - * @dev pci device that had an eeh event - * @reset_state current reset state of the device slot - */ -static void eeh_panic(struct pci_dev *dev, int reset_state) -{ - /* - * XXX We should create a separate sysctl for this. - * - * Since the panic_on_oops sysctl is used to halt the system - * in light of potential corruption, we can use it here. - */ - if (panic_on_oops) { - struct device_node *dn = pci_device_to_OF_node(dev); - eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); - panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, - pci_name(dev)); - } - else { - __get_cpu_var(ignored_failures)++; - printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", - reset_state, pci_name(dev)); - } -} - -/** - * eeh_event_handler - dispatch EEH events. The detection of a frozen - * slot can occur inside an interrupt, where it can be hard to do - * anything about it. 
The goal of this routine is to pull these - * detection events out of the context of the interrupt handler, and - * re-dispatch them for processing at a later time in a normal context. - * - * @dummy - unused - */ -static void eeh_event_handler(void *dummy) -{ - unsigned long flags; - struct eeh_event *event; - - while (1) { - spin_lock_irqsave(&eeh_eventlist_lock, flags); - event = NULL; - if (!list_empty(&eeh_eventlist)) { - event = list_entry(eeh_eventlist.next, struct eeh_event, list); - list_del(&event->list); - } - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - if (event == NULL) - break; - - printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " - "%s\n", event->reset_state, - pci_name(event->dev)); - - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); - - pci_dev_put(event->dev); - kfree(event); - } -} - -/** - * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xA.... - */ -static inline unsigned long eeh_token_to_phys(unsigned long token) -{ - pte_t *ptep; - unsigned long pa; - - ptep = find_linux_pte(init_mm.pgd, token); - if (!ptep) - return token; - pa = pte_pfn(*ptep) << PAGE_SHIFT; - - return pa | (token & (PAGE_SIZE-1)); -} - -/** - * Return the "partitionable endpoint" (pe) under which this device lies - */ -static struct device_node * find_device_pe(struct device_node *dn) -{ - while ((dn->parent) && PCI_DN(dn->parent) && - (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { - dn = dn->parent; - } - return dn; -} - -/** Mark all devices that are peers of this device as failed. - * Mark the device driver too, so that it can see the failure - * immediately; this is critical, since some drivers poll - * status registers in interrupts ... If a driver is polling, - * and the slot is frozen, then the driver can deadlock in - * an interrupt context, which is bad. 
- */ - -static inline void __eeh_mark_slot (struct device_node *dn) -{ - while (dn) { - PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; - - if (dn->child) - __eeh_mark_slot (dn->child); - dn = dn->sibling; - } -} - -static inline void __eeh_clear_slot (struct device_node *dn) -{ - while (dn) { - PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; - if (dn->child) - __eeh_clear_slot (dn->child); - dn = dn->sibling; - } -} - -static inline void eeh_clear_slot (struct device_node *dn) -{ - unsigned long flags; - spin_lock_irqsave(&confirm_error_lock, flags); - __eeh_clear_slot (dn); - spin_unlock_irqrestore(&confirm_error_lock, flags); -} - -/** - * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze - * @dn device node - * @dev pci device, if known - * - * Check for an EEH failure for the given device node. Call this - * routine if the result of a read was all 0xff's and you want to - * find out if this is due to an EEH slot freeze. This routine - * will query firmware for the EEH status. - * - * Returns 0 if there has not been an EEH error; otherwise returns - * a non-zero value and queues up a slot isolation event notification. - * - * It is safe to call this routine in an interrupt context. - */ -int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) -{ - int ret; - int rets[3]; - unsigned long flags; - int reset_state; - struct eeh_event *event; - struct pci_dn *pdn; - struct device_node *pe_dn; - int rc = 0; - - __get_cpu_var(total_mmio_ffs)++; - - if (!eeh_subsystem_enabled) - return 0; - - if (!dn) { - __get_cpu_var(no_dn)++; - return 0; - } - pdn = PCI_DN(dn); - - /* Access to IO BARs might get this far and still not want checking. 
*/ - if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || - pdn->eeh_mode & EEH_MODE_NOCHECK) { - __get_cpu_var(ignored_check)++; -#ifdef DEBUG - printk ("EEH:ignored check (%x) for %s %s\n", - pdn->eeh_mode, pci_name (dev), dn->full_name); -#endif - return 0; - } - - if (!pdn->eeh_config_addr) { - __get_cpu_var(no_cfg_addr)++; - return 0; - } - - /* If we already have a pending isolation event for this - * slot, we know it's bad already, we don't need to check. - * Do this checking under a lock; as multiple PCI devices - * in one slot might report errors simultaneously, and we - * only want one error recovery routine running. - */ - spin_lock_irqsave(&confirm_error_lock, flags); - rc = 1; - if (pdn->eeh_mode & EEH_MODE_ISOLATED) { - pdn->eeh_check_count ++; - if (pdn->eeh_check_count >= EEH_MAX_FAILS) { - printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n", - pdn->eeh_check_count); - dump_stack(); - - /* re-read the slot reset state */ - if (read_slot_reset_state(pdn, rets) != 0) - rets[0] = -1; /* reset state unknown */ - - /* If we are here, then we hit an infinite loop. Stop. */ - panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev)); - } - goto dn_unlock; - } - - /* - * Now test for an EEH failure. This is VERY expensive. - * Note that the eeh_config_addr may be a parent device - * in the case of a device behind a bridge, or it may be - * function zero of a multi-function device. - * In any case they must share a common PHB. - */ - ret = read_slot_reset_state(pdn, rets); - - /* If the call to firmware failed, punt */ - if (ret != 0) { - printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", - ret, dn->full_name); - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - /* If EEH is not supported on this device, punt. 
*/ - if (rets[1] != 1) { - printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", - ret, dn->full_name); - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - /* If not the kind of error we know about, punt. */ - if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - /* Note that config-io to empty slots may fail; - * we recognize empty because they don't have children. */ - if ((rets[0] == 5) && (dn->child == NULL)) { - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - __get_cpu_var(slot_resets)++; - - /* Avoid repeated reports of this failure, including problems - * with other functions on this device, and functions under - * bridges. */ - pe_dn = find_device_pe (dn); - __eeh_mark_slot (pe_dn); - spin_unlock_irqrestore(&confirm_error_lock, flags); - - reset_state = rets[0]; - - eeh_slot_error_detail (pdn, 1 /* Temporary Error */); - - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); - event = kmalloc(sizeof(*event), GFP_ATOMIC); - if (event == NULL) { - eeh_panic(dev, reset_state); - return 1; - } - - event->dev = dev; - event->dn = dn; - event->reset_state = reset_state; - - /* We may or may not be called in an interrupt context */ - spin_lock_irqsave(&eeh_eventlist_lock, flags); - list_add(&event->list, &eeh_eventlist); - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - - /* Most EEH events are due to device driver bugs. Having - * a stack trace will help the device-driver authors figure - * out what happened. So print that out. */ - if (rets[0] != 5) dump_stack(); - schedule_work(&eeh_event_wq); - - return 1; - -dn_unlock: - spin_unlock_irqrestore(&confirm_error_lock, flags); - return rc; -} - -EXPORT_SYMBOL_GPL(eeh_dn_check_failure); - -/** - * eeh_check_failure - check if all 1's data is due to EEH slot freeze - * @token i/o token, should be address in the form 0xA.... 
- * @val value, should be all 1's (XXX why do we need this arg??) - * - * Check for an EEH failure at the given token address. Call this - * routine if the result of a read was all 0xff's and you want to - * find out if this is due to an EEH slot freeze event. This routine - * will query firmware for the EEH status. - * - * Note this routine is safe to call in an interrupt context. - */ -unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) -{ - unsigned long addr; - struct pci_dev *dev; - struct device_node *dn; - - /* Finding the phys addr + pci device; this is pretty quick. */ - addr = eeh_token_to_phys((unsigned long __force) token); - dev = pci_get_device_by_addr(addr); - if (!dev) { - __get_cpu_var(no_device)++; - return val; - } - - dn = pci_device_to_OF_node(dev); - eeh_dn_check_failure (dn, dev); - - pci_dev_put(dev); - return val; -} - -EXPORT_SYMBOL(eeh_check_failure); - -struct eeh_early_enable_info { - unsigned int buid_hi; - unsigned int buid_lo; -}; - -/* Enable eeh for the given device node. */ -static void *early_enable_eeh(struct device_node *dn, void *data) -{ - struct eeh_early_enable_info *info = data; - int ret; - char *status = get_property(dn, "status", NULL); - u32 *class_code = (u32 *)get_property(dn, "class-code", NULL); - u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL); - u32 *device_id = (u32 *)get_property(dn, "device-id", NULL); - u32 *regs; - int enable; - struct pci_dn *pdn = PCI_DN(dn); - - pdn->eeh_mode = 0; - pdn->eeh_check_count = 0; - pdn->eeh_freeze_count = 0; - - if (status && strcmp(status, "ok") != 0) - return NULL; /* ignore devices with bad status */ - - /* Ignore bad nodes. 
*/ - if (!class_code || !vendor_id || !device_id) - return NULL; - - /* There is nothing to check on PCI to ISA bridges */ - if (dn->type && !strcmp(dn->type, "isa")) { - pdn->eeh_mode |= EEH_MODE_NOCHECK; - return NULL; - } - - /* - * Now decide if we are going to "Disable" EEH checking - * for this device. We still run with the EEH hardware active, - * but we won't be checking for ff's. This means a driver - * could return bad data (very bad!), an interrupt handler could - * hang waiting on status bits that won't change, etc. - * But there are a few cases like display devices that make sense. - */ - enable = 1; /* i.e. we will do checking */ - if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY) - enable = 0; - - if (!enable) - pdn->eeh_mode |= EEH_MODE_NOCHECK; - - /* Ok... see if this device supports EEH. Some do, some don't, - * and the only way to find out is to check each and every one. */ - regs = (u32 *)get_property(dn, "reg", NULL); - if (regs) { - /* First register entry is addr (00BBSS00) */ - /* Try to enable eeh */ - ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL, - regs[0], info->buid_hi, info->buid_lo, - EEH_ENABLE); - if (ret == 0) { - eeh_subsystem_enabled = 1; - pdn->eeh_mode |= EEH_MODE_SUPPORTED; - pdn->eeh_config_addr = regs[0]; -#ifdef DEBUG - printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name); -#endif - } else { - - /* This device doesn't support EEH, but it may have an - * EEH parent, in which case we mark it as supported. */ - if (dn->parent && PCI_DN(dn->parent) - && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { - /* Parent supports EEH. */ - pdn->eeh_mode |= EEH_MODE_SUPPORTED; - pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr; - return NULL; - } - } - } else { - printk(KERN_WARNING "EEH: %s: unable to get reg property.\n", - dn->full_name); - } - - return NULL; -} - -/* - * Initialize EEH by trying to enable it for all of the adapters in the system. 
- * As a side effect we can determine here if eeh is supported at all. - * Note that we leave EEH on so failed config cycles won't cause a machine - * check. If a user turns off EEH for a particular adapter they are really - * telling Linux to ignore errors. Some hardware (e.g. POWER5) won't - * grant access to a slot if EEH isn't enabled, and so we always enable - * EEH for all slots/all devices. - * - * The eeh-force-off option disables EEH checking globally, for all slots. - * Even if force-off is set, the EEH hardware is still enabled, so that - * newer systems can boot. - */ -void __init eeh_init(void) -{ - struct device_node *phb, *np; - struct eeh_early_enable_info info; - - spin_lock_init(&confirm_error_lock); - spin_lock_init(&slot_errbuf_lock); - - np = of_find_node_by_path("/rtas"); - if (np == NULL) - return; - - ibm_set_eeh_option = rtas_token("ibm,set-eeh-option"); - ibm_set_slot_reset = rtas_token("ibm,set-slot-reset"); - ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2"); - ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); - ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); - - if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) - return; - - eeh_error_buf_size = rtas_token("rtas-error-log-max"); - if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) { - eeh_error_buf_size = 1024; - } - if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) { - printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated " - "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX); - eeh_error_buf_size = RTAS_ERROR_LOG_MAX; - } - - /* Enable EEH for all adapters. 
Note that eeh requires buid's */ - for (phb = of_find_node_by_name(NULL, "pci"); phb; - phb = of_find_node_by_name(phb, "pci")) { - unsigned long buid; - - buid = get_phb_buid(phb); - if (buid == 0 || PCI_DN(phb) == NULL) - continue; - - info.buid_lo = BUID_LO(buid); - info.buid_hi = BUID_HI(buid); - traverse_pci_devices(phb, early_enable_eeh, &info); - } - - if (eeh_subsystem_enabled) - printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n"); - else - printk(KERN_WARNING "EEH: No capable adapters found\n"); -} - -/** - * eeh_add_device_early - enable EEH for the indicated device_node - * @dn: device node for which to set up EEH - * - * This routine must be used to perform EEH initialization for PCI - * devices that were added after system boot (e.g. hotplug, dlpar). - * This routine must be called before any i/o is performed to the - * adapter (inluding any config-space i/o). - * Whether this actually enables EEH or not for this device depends - * on the CEC architecture, type of the device, on earlier boot - * command-line arguments & etc. - */ -void eeh_add_device_early(struct device_node *dn) -{ - struct pci_controller *phb; - struct eeh_early_enable_info info; - - if (!dn || !PCI_DN(dn)) - return; - phb = PCI_DN(dn)->phb; - if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", - dn->full_name); - dump_stack(); - return; - } - - info.buid_hi = BUID_HI(phb->buid); - info.buid_lo = BUID_LO(phb->buid); - early_enable_eeh(dn, &info); -} -EXPORT_SYMBOL_GPL(eeh_add_device_early); - -/** - * eeh_add_device_late - perform EEH initialization for the indicated pci device - * @dev: pci device for which to set up EEH - * - * This routine must be used to complete EEH initialization for PCI - * devices that were added after system boot (e.g. hotplug, dlpar). 
- */
-void eeh_add_device_late(struct pci_dev *dev)
-{
-	struct device_node *dn;
-
-	if (!dev || !eeh_subsystem_enabled)
-		return;
-
-#ifdef DEBUG
-	printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev));
-#endif
-
-	pci_dev_get (dev);
-	dn = pci_device_to_OF_node(dev);
-	PCI_DN(dn)->pcidev = dev;
-
-	pci_addr_cache_insert_device (dev);
-}
-EXPORT_SYMBOL_GPL(eeh_add_device_late);
-
-/**
- * eeh_remove_device - undo EEH setup for the indicated pci device
- * @dev: pci device to be removed
- *
- * This routine should be when a device is removed from a running
- * system (e.g. by hotplug or dlpar).
- */
-void eeh_remove_device(struct pci_dev *dev)
-{
-	struct device_node *dn;
-	if (!dev || !eeh_subsystem_enabled)
-		return;
-
-	/* Unregister the device with the EEH/PCI address search system */
-#ifdef DEBUG
-	printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev));
-#endif
-	pci_addr_cache_remove_device(dev);
-
-	dn = pci_device_to_OF_node(dev);
-	PCI_DN(dn)->pcidev = NULL;
-	pci_dev_put (dev);
-}
-EXPORT_SYMBOL_GPL(eeh_remove_device);
-
-static int proc_eeh_show(struct seq_file *m, void *v)
-{
-	unsigned int cpu;
-	unsigned long ffs = 0, positives = 0, failures = 0;
-	unsigned long resets = 0;
-	unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;
-
-	for_each_cpu(cpu) {
-		ffs += per_cpu(total_mmio_ffs, cpu);
-		positives += per_cpu(false_positives, cpu);
-		failures += per_cpu(ignored_failures, cpu);
-		resets += per_cpu(slot_resets, cpu);
-		no_dev += per_cpu(no_device, cpu);
-		no_dn += per_cpu(no_dn, cpu);
-		no_cfg += per_cpu(no_cfg_addr, cpu);
-		no_check += per_cpu(ignored_check, cpu);
-	}
-
-	if (0 == eeh_subsystem_enabled) {
-		seq_printf(m, "EEH Subsystem is globally disabled\n");
-		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
-	} else {
-		seq_printf(m, "EEH Subsystem is enabled\n");
-		seq_printf(m,
-			   "no device=%ld\n"
-			   "no device node=%ld\n"
-			   "no config address=%ld\n"
-			   "check not wanted=%ld\n"
-			   "eeh_total_mmio_ffs=%ld\n"
-			   "eeh_false_positives=%ld\n"
-			   "eeh_ignored_failures=%ld\n"
-			   "eeh_slot_resets=%ld\n",
-			   no_dev, no_dn, no_cfg, no_check,
-			   ffs, positives, failures, resets);
-	}
-
-	return 0;
-}
-
-static int proc_eeh_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, proc_eeh_show, NULL);
-}
-
-static struct file_operations proc_eeh_operations = {
-	.open = proc_eeh_open,
-	.read = seq_read,
-	.llseek = seq_lseek,
-	.release = single_release,
-};
-
-static int __init eeh_init_proc(void)
-{
-	struct proc_dir_entry *e;
-
-	if (systemcfg->platform & PLATFORM_PSERIES) {
-		e = create_proc_entry("ppc64/eeh", 0, NULL);
-		if (e)
-			e->proc_fops = &proc_eeh_operations;
-	}
-
-	return 0;
-}
-__initcall(eeh_init_proc);
Index: linux-2.6.14-git3/arch/ppc64/kernel/Makefile
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/Makefile	2005-11-02 14:29:22.485829789 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/Makefile	2005-11-02 14:30:49.805589414 -0600
@@ -35,7 +35,6 @@
 				   bpa_iic.o spider-pic.o
 
 obj-$(CONFIG_KEXEC)		+= machine_kexec.o
-obj-$(CONFIG_EEH)		+= eeh.o
 obj-$(CONFIG_PROC_FS)		+= proc_ppc64.o
 obj-$(CONFIG_RTAS_FLASH)	+= rtas_flash.o
 obj-$(CONFIG_SMP)		+= smp.o
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile	2005-10-31 11:19:47.000000000 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:31:36.150092654 -0600
@@ -3,3 +3,4 @@
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_IBMVIO)		+= vio.o
 obj-$(CONFIG_XICS)		+= xics.o
+obj-$(CONFIG_EEH)		+= eeh.o
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:30:49.790591516 -0600
@@ -0,0 +1,1093 @@
+/*
+ * eeh.c
+ * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#undef DEBUG
+
+/** Overview:
+ * EEH, or "Extended Error Handling" is a PCI bridge technology for
+ * dealing with PCI bus errors that can't be dealt with within the
+ * usual PCI framework, except by check-stopping the CPU. Systems
+ * that are designed for high-availability/reliability cannot afford
+ * to crash due to a "mere" PCI error, thus the need for EEH.
+ * An EEH-capable bridge operates by converting a detected error
+ * into a "slot freeze", taking the PCI adapter off-line, making
+ * the slot behave, from the OS'es point of view, as if the slot
+ * were "empty": all reads return 0xff's and all writes are silently
+ * ignored. EEH slot isolation events can be triggered by parity
+ * errors on the address or data busses (e.g. during posted writes),
+ * which in turn might be caused by low voltage on the bus, dust,
+ * vibration, humidity, radioactivity or plain-old failed hardware.
+ *
+ * Note, however, that one of the leading causes of EEH slot
+ * freeze events are buggy device drivers, buggy device microcode,
+ * or buggy device hardware. This is because any attempt by the
+ * device to bus-master data to a memory address that is not
+ * assigned to the device will trigger a slot freeze. (The idea
+ * is to prevent devices-gone-wild from corrupting system memory).
+ * Buggy hardware/drivers will have a miserable time co-existing
+ * with EEH.
+ *
+ * Ideally, a PCI device driver, when suspecting that an isolation
+ * event has occured (e.g. by reading 0xff's), will then ask EEH
+ * whether this is the case, and then take appropriate steps to
+ * reset the PCI slot, the PCI device, and then resume operations.
+ * However, until that day, the checking is done here, with the
+ * eeh_check_failure() routine embedded in the MMIO macros. If
+ * the slot is found to be isolated, an "EEH Event" is synthesized
+ * and sent out for processing.
+ */
+
+/* EEH event workqueue setup. */
+static DEFINE_SPINLOCK(eeh_eventlist_lock);
+LIST_HEAD(eeh_eventlist);
+static void eeh_event_handler(void *);
+DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL);
+
+static struct notifier_block *eeh_notifier_chain;
+
+/* If a device driver keeps reading an MMIO register in an interrupt
+ * handler after a slot isolation event has occurred, we assume it
+ * is broken and panic. This sets the threshold for how many read
+ * attempts we allow before panicking.
+ */
+#define EEH_MAX_FAILS	100000
+
+/* RTAS tokens */
+static int ibm_set_eeh_option;
+static int ibm_set_slot_reset;
+static int ibm_read_slot_reset_state;
+static int ibm_read_slot_reset_state2;
+static int ibm_slot_error_detail;
+
+static int eeh_subsystem_enabled;
+
+/* Lock to avoid races due to multiple reports of an error */
+static DEFINE_SPINLOCK(confirm_error_lock);
+
+/* Buffer for reporting slot-error-detail rtas calls */
+static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
+static DEFINE_SPINLOCK(slot_errbuf_lock);
+static int eeh_error_buf_size;
+
+/* System monitoring statistics */
+static DEFINE_PER_CPU(unsigned long, no_device);
+static DEFINE_PER_CPU(unsigned long, no_dn);
+static DEFINE_PER_CPU(unsigned long, no_cfg_addr);
+static DEFINE_PER_CPU(unsigned long, ignored_check);
+static DEFINE_PER_CPU(unsigned long, total_mmio_ffs);
+static DEFINE_PER_CPU(unsigned long, false_positives);
+static DEFINE_PER_CPU(unsigned long, ignored_failures);
+static DEFINE_PER_CPU(unsigned long, slot_resets);
+
+/**
+ * The pci address cache subsystem. This subsystem places
+ * PCI device address resources into a red-black tree, sorted
+ * according to the address range, so that given only an i/o
+ * address, the corresponding PCI device can be **quickly**
+ * found. It is safe to perform an address lookup in an interrupt
+ * context; this ability is an important feature.
+ *
+ * Currently, the only customer of this code is the EEH subsystem;
+ * thus, this code has been somewhat tailored to suit EEH better.
+ * In particular, the cache does *not* hold the addresses of devices
+ * for which EEH is not enabled.
+ *
+ * (Implementation Note: The RB tree seems to be better/faster
+ * than any hash algo I could think of for this problem, even
+ * with the penalty of slow pointer chases for d-cache misses).
+ */ +struct pci_io_addr_range +{ + struct rb_node rb_node; + unsigned long addr_lo; + unsigned long addr_hi; + struct pci_dev *pcidev; + unsigned int flags; +}; + +static struct pci_io_addr_cache +{ + struct rb_root rb_root; + spinlock_t piar_lock; +} pci_io_addr_cache_root; + +static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) +{ + struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; + + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + + if (addr < piar->addr_lo) { + n = n->rb_left; + } else { + if (addr > piar->addr_hi) { + n = n->rb_right; + } else { + pci_dev_get(piar->pcidev); + return piar->pcidev; + } + } + } + + return NULL; +} + +/** + * pci_get_device_by_addr - Get device, given only address + * @addr: mmio (PIO) phys address or i/o port number + * + * Given an mmio phys address, or a port number, find a pci device + * that implements this address. Be sure to pci_dev_put the device + * when finished. I/O port numbers are assumed to be offset + * from zero (that is, they do *not* have pci_io_addr added in). + * It is safe to call this function within an interrupt. + */ +static struct pci_dev *pci_get_device_by_addr(unsigned long addr) +{ + struct pci_dev *dev; + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + dev = __pci_get_device_by_addr(addr); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); + return dev; +} + +#ifdef DEBUG +/* + * Handy-dandy debug print routine, does nothing more + * than print out the contents of our addr cache. + */ +static void pci_addr_cache_print(struct pci_io_addr_cache *cache) +{ + struct rb_node *n; + int cnt = 0; + + n = rb_first(&cache->rb_root); + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", + (piar->flags & IORESOURCE_IO) ? 
"i/o" : "mem", cnt, + piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); + cnt++; + n = rb_next(n); + } +} +#endif + +/* Insert address range into the rb tree. */ +static struct pci_io_addr_range * +pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, + unsigned long ahi, unsigned int flags) +{ + struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; + struct rb_node *parent = NULL; + struct pci_io_addr_range *piar; + + /* Walk tree, find a place to insert into tree */ + while (*p) { + parent = *p; + piar = rb_entry(parent, struct pci_io_addr_range, rb_node); + if (ahi < piar->addr_lo) { + p = &parent->rb_left; + } else if (alo > piar->addr_hi) { + p = &parent->rb_right; + } else { + if (dev != piar->pcidev || + alo != piar->addr_lo || ahi != piar->addr_hi) { + printk(KERN_WARNING "PIAR: overlapping address range\n"); + } + return piar; + } + } + piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); + if (!piar) + return NULL; + + piar->addr_lo = alo; + piar->addr_hi = ahi; + piar->pcidev = dev; + piar->flags = flags; + +#ifdef DEBUG + printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif + + rb_link_node(&piar->rb_node, parent, p); + rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); + + return piar; +} + +static void __pci_addr_cache_insert_device(struct pci_dev *dev) +{ + struct device_node *dn; + struct pci_dn *pdn; + int i; + int inserted = 0; + + dn = pci_device_to_OF_node(dev); + if (!dn) { + printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); + return; + } + + /* Skip any devices for which EEH is not enabled. */ + pdn = PCI_DN(dn); + if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || + pdn->eeh_mode & EEH_MODE_NOCHECK) { +#ifdef DEBUG + printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", + pci_name(dev), pdn->node->full_name); +#endif + return; + } + + /* The cache holds a reference to the device... 
*/ + pci_dev_get(dev); + + /* Walk resources on this device, poke them into the tree */ + for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { + unsigned long start = pci_resource_start(dev,i); + unsigned long end = pci_resource_end(dev,i); + unsigned int flags = pci_resource_flags(dev,i); + + /* We are interested only bus addresses, not dma or other stuff */ + if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) + continue; + if (start == 0 || ~start == 0 || end == 0 || ~end == 0) + continue; + pci_addr_cache_insert(dev, start, end, flags); + inserted = 1; + } + + /* If there was nothing to add, the cache has no reference... */ + if (!inserted) + pci_dev_put(dev); +} + +/** + * pci_addr_cache_insert_device - Add a device to the address cache + * @dev: PCI device whose I/O addresses we are interested in. + * + * In order to support the fast lookup of devices based on addresses, + * we maintain a cache of devices that can be quickly searched. + * This routine adds a device to that cache. + */ +static void pci_addr_cache_insert_device(struct pci_dev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + __pci_addr_cache_insert_device(dev); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); +} + +static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) +{ + struct rb_node *n; + int removed = 0; + +restart: + n = rb_first(&pci_io_addr_cache_root.rb_root); + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + + if (piar->pcidev == dev) { + rb_erase(n, &pci_io_addr_cache_root.rb_root); + removed = 1; + kfree(piar); + goto restart; + } + n = rb_next(n); + } + + /* The cache no longer holds its reference to this device... */ + if (removed) + pci_dev_put(dev); +} + +/** + * pci_addr_cache_remove_device - remove pci device from addr cache + * @dev: device to remove + * + * Remove a device from the addr-cache tree. 
+ * This is potentially expensive, since it will walk + * the tree multiple times (once per resource). + * But so what; device removal doesn't need to be that fast. + */ +static void pci_addr_cache_remove_device(struct pci_dev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + __pci_addr_cache_remove_device(dev); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); +} + +/** + * pci_addr_cache_build - Build a cache of I/O addresses + * + * Build a cache of pci i/o addresses. This cache will be used to + * find the pci device that corresponds to a given address. + * This routine scans all pci busses to build the cache. + * Must be run late in boot process, after the pci controllers + * have been scaned for devices (after all device resources are known). + */ +void __init pci_addr_cache_build(void) +{ + struct pci_dev *dev = NULL; + + if (!eeh_subsystem_enabled) + return; + + spin_lock_init(&pci_io_addr_cache_root.piar_lock); + + while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { + /* Ignore PCI bridges ( XXX why ??) */ + if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) { + continue; + } + pci_addr_cache_insert_device(dev); + } + +#ifdef DEBUG + /* Verify tree built up above, echo back the list of addrs. */ + pci_addr_cache_print(&pci_io_addr_cache_root); +#endif +} + +/* --------------------------------------------------------------- */ +/* Above lies the PCI Address Cache. 
Below lies the EEH event infrastructure */ + +void eeh_slot_error_detail (struct pci_dn *pdn, int severity) +{ + unsigned long flags; + int rc; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), + BUID_LO(pdn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); +} + +/** + * eeh_register_notifier - Register to find out about EEH events. + * @nb: notifier block to callback on events + */ +int eeh_register_notifier(struct notifier_block *nb) +{ + return notifier_chain_register(&eeh_notifier_chain, nb); +} + +/** + * eeh_unregister_notifier - Unregister to an EEH event notifier. + * @nb: notifier block to callback on events + */ +int eeh_unregister_notifier(struct notifier_block *nb) +{ + return notifier_chain_unregister(&eeh_notifier_chain, nb); +} + +/** + * read_slot_reset_state - Read the reset state of a device node's slot + * @dn: device node to read + * @rets: array to return results in + */ +static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) +{ + int token, outputs; + + if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { + token = ibm_read_slot_reset_state2; + outputs = 4; + } else { + token = ibm_read_slot_reset_state; + rets[2] = 0; /* fake PE Unavailable info */ + outputs = 3; + } + + return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); +} + +/** + * eeh_panic - call panic() for an eeh event that cannot be handled. + * The philosophy of this routine is that it is better to panic and + * halt the OS than it is to risk possible data corruption by + * oblivious device drivers that don't know better. 
+ * + * @dev pci device that had an eeh event + * @reset_state current reset state of the device slot + */ +static void eeh_panic(struct pci_dev *dev, int reset_state) +{ + /* + * XXX We should create a separate sysctl for this. + * + * Since the panic_on_oops sysctl is used to halt the system + * in light of potential corruption, we can use it here. + */ + if (panic_on_oops) { + struct device_node *dn = pci_device_to_OF_node(dev); + eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); + panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, + pci_name(dev)); + } + else { + __get_cpu_var(ignored_failures)++; + printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", + reset_state, pci_name(dev)); + } +} + +/** + * eeh_event_handler - dispatch EEH events. The detection of a frozen + * slot can occur inside an interrupt, where it can be hard to do + * anything about it. The goal of this routine is to pull these + * detection events out of the context of the interrupt handler, and + * re-dispatch them for processing at a later time in a normal context. + * + * @dummy - unused + */ +static void eeh_event_handler(void *dummy) +{ + unsigned long flags; + struct eeh_event *event; + + while (1) { + spin_lock_irqsave(&eeh_eventlist_lock, flags); + event = NULL; + if (!list_empty(&eeh_eventlist)) { + event = list_entry(eeh_eventlist.next, struct eeh_event, list); + list_del(&event->list); + } + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + if (event == NULL) + break; + + printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " + "%s\n", event->reset_state, + pci_name(event->dev)); + + notifier_call_chain (&eeh_notifier_chain, + EEH_NOTIFY_FREEZE, event); + + pci_dev_put(event->dev); + kfree(event); + } +} + +/** + * eeh_token_to_phys - convert EEH address token to phys address + * @token i/o token, should be address in the form 0xA.... 
+ */ +static inline unsigned long eeh_token_to_phys(unsigned long token) +{ + pte_t *ptep; + unsigned long pa; + + ptep = find_linux_pte(init_mm.pgd, token); + if (!ptep) + return token; + pa = pte_pfn(*ptep) << PAGE_SHIFT; + + return pa | (token & (PAGE_SIZE-1)); +} + +/** + * Return the "partitionable endpoint" (pe) under which this device lies + */ +static struct device_node * find_device_pe(struct device_node *dn) +{ + while ((dn->parent) && PCI_DN(dn->parent) && + (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { + dn = dn->parent; + } + return dn; +} + +/** Mark all devices that are peers of this device as failed. + * Mark the device driver too, so that it can see the failure + * immediately; this is critical, since some drivers poll + * status registers in interrupts ... If a driver is polling, + * and the slot is frozen, then the driver can deadlock in + * an interrupt context, which is bad. + */ + +static inline void __eeh_mark_slot (struct device_node *dn) +{ + while (dn) { + PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; + + if (dn->child) + __eeh_mark_slot (dn->child); + dn = dn->sibling; + } +} + +static inline void __eeh_clear_slot (struct device_node *dn) +{ + while (dn) { + PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; + if (dn->child) + __eeh_clear_slot (dn->child); + dn = dn->sibling; + } +} + +static inline void eeh_clear_slot (struct device_node *dn) +{ + unsigned long flags; + spin_lock_irqsave(&confirm_error_lock, flags); + __eeh_clear_slot (dn); + spin_unlock_irqrestore(&confirm_error_lock, flags); +} + +/** + * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze + * @dn device node + * @dev pci device, if known + * + * Check for an EEH failure for the given device node. Call this + * routine if the result of a read was all 0xff's and you want to + * find out if this is due to an EEH slot freeze. This routine + * will query firmware for the EEH status. 
+ * + * Returns 0 if there has not been an EEH error; otherwise returns + * a non-zero value and queues up a slot isolation event notification. + * + * It is safe to call this routine in an interrupt context. + */ +int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) +{ + int ret; + int rets[3]; + unsigned long flags; + int reset_state; + struct eeh_event *event; + struct pci_dn *pdn; + struct device_node *pe_dn; + int rc = 0; + + __get_cpu_var(total_mmio_ffs)++; + + if (!eeh_subsystem_enabled) + return 0; + + if (!dn) { + __get_cpu_var(no_dn)++; + return 0; + } + pdn = PCI_DN(dn); + + /* Access to IO BARs might get this far and still not want checking. */ + if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || + pdn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; +#ifdef DEBUG + printk ("EEH:ignored check (%x) for %s %s\n", + pdn->eeh_mode, pci_name (dev), dn->full_name); +#endif + return 0; + } + + if (!pdn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; + return 0; + } + + /* If we already have a pending isolation event for this + * slot, we know it's bad already, we don't need to check. + * Do this checking under a lock; as multiple PCI devices + * in one slot might report errors simultaneously, and we + * only want one error recovery routine running. + */ + spin_lock_irqsave(&confirm_error_lock, flags); + rc = 1; + if (pdn->eeh_mode & EEH_MODE_ISOLATED) { + pdn->eeh_check_count ++; + if (pdn->eeh_check_count >= EEH_MAX_FAILS) { + printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n", + pdn->eeh_check_count); + dump_stack(); + + /* re-read the slot reset state */ + if (read_slot_reset_state(pdn, rets) != 0) + rets[0] = -1; /* reset state unknown */ + + /* If we are here, then we hit an infinite loop. Stop. */ + panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev)); + } + goto dn_unlock; + } + + /* + * Now test for an EEH failure. This is VERY expensive. 
+ * Note that the eeh_config_addr may be a parent device + * in the case of a device behind a bridge, or it may be + * function zero of a multi-function device. + * In any case they must share a common PHB. + */ + ret = read_slot_reset_state(pdn, rets); + + /* If the call to firmware failed, punt */ + if (ret != 0) { + printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", + ret, dn->full_name); + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + /* If EEH is not supported on this device, punt. */ + if (rets[1] != 1) { + printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", + ret, dn->full_name); + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + /* If not the kind of error we know about, punt. */ + if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + /* Note that config-io to empty slots may fail; + * we recognize empty because they don't have children. */ + if ((rets[0] == 5) && (dn->child == NULL)) { + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + __get_cpu_var(slot_resets)++; + + /* Avoid repeated reports of this failure, including problems + * with other functions on this device, and functions under + * bridges. 
*/ + pe_dn = find_device_pe (dn); + __eeh_mark_slot (pe_dn); + spin_unlock_irqrestore(&confirm_error_lock, flags); + + reset_state = rets[0]; + + eeh_slot_error_detail (pdn, 1 /* Temporary Error */); + + printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", + rets[0], dn->name, dn->full_name); + event = kmalloc(sizeof(*event), GFP_ATOMIC); + if (event == NULL) { + eeh_panic(dev, reset_state); + return 1; + } + + event->dev = dev; + event->dn = dn; + event->reset_state = reset_state; + + /* We may or may not be called in an interrupt context */ + spin_lock_irqsave(&eeh_eventlist_lock, flags); + list_add(&event->list, &eeh_eventlist); + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + + /* Most EEH events are due to device driver bugs. Having + * a stack trace will help the device-driver authors figure + * out what happened. So print that out. */ + if (rets[0] != 5) dump_stack(); + schedule_work(&eeh_event_wq); + + return 1; + +dn_unlock: + spin_unlock_irqrestore(&confirm_error_lock, flags); + return rc; +} + +EXPORT_SYMBOL_GPL(eeh_dn_check_failure); + +/** + * eeh_check_failure - check if all 1's data is due to EEH slot freeze + * @token i/o token, should be address in the form 0xA.... + * @val value, should be all 1's (XXX why do we need this arg??) + * + * Check for an EEH failure at the given token address. Call this + * routine if the result of a read was all 0xff's and you want to + * find out if this is due to an EEH slot freeze event. This routine + * will query firmware for the EEH status. + * + * Note this routine is safe to call in an interrupt context. + */ +unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) +{ + unsigned long addr; + struct pci_dev *dev; + struct device_node *dn; + + /* Finding the phys addr + pci device; this is pretty quick. 
*/ + addr = eeh_token_to_phys((unsigned long __force) token); + dev = pci_get_device_by_addr(addr); + if (!dev) { + __get_cpu_var(no_device)++; + return val; + } + + dn = pci_device_to_OF_node(dev); + eeh_dn_check_failure (dn, dev); + + pci_dev_put(dev); + return val; +} + +EXPORT_SYMBOL(eeh_check_failure); + +struct eeh_early_enable_info { + unsigned int buid_hi; + unsigned int buid_lo; +}; + +/* Enable eeh for the given device node. */ +static void *early_enable_eeh(struct device_node *dn, void *data) +{ + struct eeh_early_enable_info *info = data; + int ret; + char *status = get_property(dn, "status", NULL); + u32 *class_code = (u32 *)get_property(dn, "class-code", NULL); + u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL); + u32 *device_id = (u32 *)get_property(dn, "device-id", NULL); + u32 *regs; + int enable; + struct pci_dn *pdn = PCI_DN(dn); + + pdn->eeh_mode = 0; + pdn->eeh_check_count = 0; + pdn->eeh_freeze_count = 0; + + if (status && strcmp(status, "ok") != 0) + return NULL; /* ignore devices with bad status */ + + /* Ignore bad nodes. */ + if (!class_code || !vendor_id || !device_id) + return NULL; + + /* There is nothing to check on PCI to ISA bridges */ + if (dn->type && !strcmp(dn->type, "isa")) { + pdn->eeh_mode |= EEH_MODE_NOCHECK; + return NULL; + } + + /* + * Now decide if we are going to "Disable" EEH checking + * for this device. We still run with the EEH hardware active, + * but we won't be checking for ff's. This means a driver + * could return bad data (very bad!), an interrupt handler could + * hang waiting on status bits that won't change, etc. + * But there are a few cases like display devices that make sense. + */ + enable = 1; /* i.e. we will do checking */ + if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY) + enable = 0; + + if (!enable) + pdn->eeh_mode |= EEH_MODE_NOCHECK; + + /* Ok... see if this device supports EEH. Some do, some don't, + * and the only way to find out is to check each and every one. 
*/ + regs = (u32 *)get_property(dn, "reg", NULL); + if (regs) { + /* First register entry is addr (00BBSS00) */ + /* Try to enable eeh */ + ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL, + regs[0], info->buid_hi, info->buid_lo, + EEH_ENABLE); + if (ret == 0) { + eeh_subsystem_enabled = 1; + pdn->eeh_mode |= EEH_MODE_SUPPORTED; + pdn->eeh_config_addr = regs[0]; +#ifdef DEBUG + printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name); +#endif + } else { + + /* This device doesn't support EEH, but it may have an + * EEH parent, in which case we mark it as supported. */ + if (dn->parent && PCI_DN(dn->parent) + && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { + /* Parent supports EEH. */ + pdn->eeh_mode |= EEH_MODE_SUPPORTED; + pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr; + return NULL; + } + } + } else { + printk(KERN_WARNING "EEH: %s: unable to get reg property.\n", + dn->full_name); + } + + return NULL; +} + +/* + * Initialize EEH by trying to enable it for all of the adapters in the system. + * As a side effect we can determine here if eeh is supported at all. + * Note that we leave EEH on so failed config cycles won't cause a machine + * check. If a user turns off EEH for a particular adapter they are really + * telling Linux to ignore errors. Some hardware (e.g. POWER5) won't + * grant access to a slot if EEH isn't enabled, and so we always enable + * EEH for all slots/all devices. + * + * The eeh-force-off option disables EEH checking globally, for all slots. + * Even if force-off is set, the EEH hardware is still enabled, so that + * newer systems can boot. 
+ */ +void __init eeh_init(void) +{ + struct device_node *phb, *np; + struct eeh_early_enable_info info; + + spin_lock_init(&confirm_error_lock); + spin_lock_init(&slot_errbuf_lock); + + np = of_find_node_by_path("/rtas"); + if (np == NULL) + return; + + ibm_set_eeh_option = rtas_token("ibm,set-eeh-option"); + ibm_set_slot_reset = rtas_token("ibm,set-slot-reset"); + ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2"); + ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); + ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); + + if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) + return; + + eeh_error_buf_size = rtas_token("rtas-error-log-max"); + if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) { + eeh_error_buf_size = 1024; + } + if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) { + printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated " + "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX); + eeh_error_buf_size = RTAS_ERROR_LOG_MAX; + } + + /* Enable EEH for all adapters. Note that eeh requires buid's */ + for (phb = of_find_node_by_name(NULL, "pci"); phb; + phb = of_find_node_by_name(phb, "pci")) { + unsigned long buid; + + buid = get_phb_buid(phb); + if (buid == 0 || PCI_DN(phb) == NULL) + continue; + + info.buid_lo = BUID_LO(buid); + info.buid_hi = BUID_HI(buid); + traverse_pci_devices(phb, early_enable_eeh, &info); + } + + if (eeh_subsystem_enabled) + printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n"); + else + printk(KERN_WARNING "EEH: No capable adapters found\n"); +} + +/** + * eeh_add_device_early - enable EEH for the indicated device_node + * @dn: device node for which to set up EEH + * + * This routine must be used to perform EEH initialization for PCI + * devices that were added after system boot (e.g. hotplug, dlpar). + * This routine must be called before any i/o is performed to the + * adapter (inluding any config-space i/o). 
+ * Whether this actually enables EEH or not for this device depends + * on the CEC architecture, type of the device, on earlier boot + * command-line arguments & etc. + */ +void eeh_add_device_early(struct device_node *dn) +{ + struct pci_controller *phb; + struct eeh_early_enable_info info; + + if (!dn || !PCI_DN(dn)) + return; + phb = PCI_DN(dn)->phb; + if (NULL == phb || 0 == phb->buid) { + printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", + dn->full_name); + dump_stack(); + return; + } + + info.buid_hi = BUID_HI(phb->buid); + info.buid_lo = BUID_LO(phb->buid); + early_enable_eeh(dn, &info); +} +EXPORT_SYMBOL_GPL(eeh_add_device_early); + +/** + * eeh_add_device_late - perform EEH initialization for the indicated pci device + * @dev: pci device for which to set up EEH + * + * This routine must be used to complete EEH initialization for PCI + * devices that were added after system boot (e.g. hotplug, dlpar). + */ +void eeh_add_device_late(struct pci_dev *dev) +{ + struct device_node *dn; + + if (!dev || !eeh_subsystem_enabled) + return; + +#ifdef DEBUG + printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev)); +#endif + + pci_dev_get (dev); + dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->pcidev = dev; + + pci_addr_cache_insert_device (dev); +} +EXPORT_SYMBOL_GPL(eeh_add_device_late); + +/** + * eeh_remove_device - undo EEH setup for the indicated pci device + * @dev: pci device to be removed + * + * This routine should be called when a device is removed from a running + * system (e.g. by hotplug or dlpar).
+ */ +void eeh_remove_device(struct pci_dev *dev) +{ + struct device_node *dn; + if (!dev || !eeh_subsystem_enabled) + return; + + /* Unregister the device with the EEH/PCI address search system */ +#ifdef DEBUG + printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev)); +#endif + pci_addr_cache_remove_device(dev); + + dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->pcidev = NULL; + pci_dev_put (dev); +} +EXPORT_SYMBOL_GPL(eeh_remove_device); + +static int proc_eeh_show(struct seq_file *m, void *v) +{ + unsigned int cpu; + unsigned long ffs = 0, positives = 0, failures = 0; + unsigned long resets = 0; + unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; + + for_each_cpu(cpu) { + ffs += per_cpu(total_mmio_ffs, cpu); + positives += per_cpu(false_positives, cpu); + failures += per_cpu(ignored_failures, cpu); + resets += per_cpu(slot_resets, cpu); + no_dev += per_cpu(no_device, cpu); + no_dn += per_cpu(no_dn, cpu); + no_cfg += per_cpu(no_cfg_addr, cpu); + no_check += per_cpu(ignored_check, cpu); + } + + if (0 == eeh_subsystem_enabled) { + seq_printf(m, "EEH Subsystem is globally disabled\n"); + seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); + } else { + seq_printf(m, "EEH Subsystem is enabled\n"); + seq_printf(m, + "no device=%ld\n" + "no device node=%ld\n" + "no config address=%ld\n" + "check not wanted=%ld\n" + "eeh_total_mmio_ffs=%ld\n" + "eeh_false_positives=%ld\n" + "eeh_ignored_failures=%ld\n" + "eeh_slot_resets=%ld\n", + no_dev, no_dn, no_cfg, no_check, + ffs, positives, failures, resets); + } + + return 0; +} + +static int proc_eeh_open(struct inode *inode, struct file *file) +{ + return single_open(file, proc_eeh_show, NULL); +} + +static struct file_operations proc_eeh_operations = { + .open = proc_eeh_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int __init eeh_init_proc(void) +{ + struct proc_dir_entry *e; + + if (systemcfg->platform & PLATFORM_PSERIES) { + e = create_proc_entry("ppc64/eeh", 0, 
NULL); + if (e) + e->proc_fops = &proc_eeh_operations; + } + + return 0; +} +__initcall(eeh_init_proc); From linas at linas.org Fri Nov 4 11:50:04 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:04 -0600 Subject: [PATCH 12/42]: ppc64: PCI error event dispatcher References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005004.GA26878@mail.gnucash.org> 12-eeh-event-dispatcher.patch ppc64: EEH Recovery dispatcher thread This patch adds a mechanism to create recovery threads when an EEH event is received. Since an EEH freeze state may be detected within an interrupt context, we need to get out of the interrupt context before starting recovery. This dispatcher does this in two steps: first, it uses a workqueue to get out, and then launches a kernel thread, so that the recovery routine can sleep for extended periods without upsetting keventd. A kernel thread is created with each EEH event, rather than having one long-running daemon started at boot time. This is because it is anticipated that EEH events will be very rare (very very rare, ideally) and so it's pointless to clutter the process tables with a daemon that will almost never run. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:30:49.790591516 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:32:35.713742506 -0600 @@ -19,7 +19,6 @@ #include #include -#include #include #include #include @@ -27,12 +26,12 @@ #include #include #include +#include #include #include +#include #include -#include #include -#include #undef DEBUG @@ -70,14 +69,6 @@ * and sent out for processing. */ -/* EEH event workqueue setup.
*/ -static DEFINE_SPINLOCK(eeh_eventlist_lock); -LIST_HEAD(eeh_eventlist); -static void eeh_event_handler(void *); -DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL); - -static struct notifier_block *eeh_notifier_chain; - /* If a device driver keeps reading an MMIO register in an interrupt * handler after a slot isolation event has occurred, we assume it * is broken and panic. This sets the threshold for how many read @@ -421,24 +412,6 @@ } /** - * eeh_register_notifier - Register to find out about EEH events. - * @nb: notifier block to callback on events - */ -int eeh_register_notifier(struct notifier_block *nb) -{ - return notifier_chain_register(&eeh_notifier_chain, nb); -} - -/** - * eeh_unregister_notifier - Unregister to an EEH event notifier. - * @nb: notifier block to callback on events - */ -int eeh_unregister_notifier(struct notifier_block *nb) -{ - return notifier_chain_unregister(&eeh_notifier_chain, nb); -} - -/** * read_slot_reset_state - Read the reset state of a device node's slot * @dn: device node to read * @rets: array to return results in @@ -461,73 +434,6 @@ } /** - * eeh_panic - call panic() for an eeh event that cannot be handled. - * The philosophy of this routine is that it is better to panic and - * halt the OS than it is to risk possible data corruption by - * oblivious device drivers that don't know better. - * - * @dev pci device that had an eeh event - * @reset_state current reset state of the device slot - */ -static void eeh_panic(struct pci_dev *dev, int reset_state) -{ - /* - * XXX We should create a separate sysctl for this. - * - * Since the panic_on_oops sysctl is used to halt the system - * in light of potential corruption, we can use it here. 
- */ - if (panic_on_oops) { - struct device_node *dn = pci_device_to_OF_node(dev); - eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); - panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, - pci_name(dev)); - } - else { - __get_cpu_var(ignored_failures)++; - printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", - reset_state, pci_name(dev)); - } -} - -/** - * eeh_event_handler - dispatch EEH events. The detection of a frozen - * slot can occur inside an interrupt, where it can be hard to do - * anything about it. The goal of this routine is to pull these - * detection events out of the context of the interrupt handler, and - * re-dispatch them for processing at a later time in a normal context. - * - * @dummy - unused - */ -static void eeh_event_handler(void *dummy) -{ - unsigned long flags; - struct eeh_event *event; - - while (1) { - spin_lock_irqsave(&eeh_eventlist_lock, flags); - event = NULL; - if (!list_empty(&eeh_eventlist)) { - event = list_entry(eeh_eventlist.next, struct eeh_event, list); - list_del(&event->list); - } - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - if (event == NULL) - break; - - printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " - "%s\n", event->reset_state, - pci_name(event->dev)); - - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); - - pci_dev_put(event->dev); - kfree(event); - } -} - -/** * eeh_token_to_phys - convert EEH address token to phys address * @token i/o token, should be address in the form 0xA.... 
*/ @@ -613,8 +519,6 @@ int ret; int rets[3]; unsigned long flags; - int reset_state; - struct eeh_event *event; struct pci_dn *pdn; struct device_node *pe_dn; int rc = 0; @@ -722,33 +626,12 @@ __eeh_mark_slot (pe_dn); spin_unlock_irqrestore(&confirm_error_lock, flags); - reset_state = rets[0]; - - eeh_slot_error_detail (pdn, 1 /* Temporary Error */); - - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); - event = kmalloc(sizeof(*event), GFP_ATOMIC); - if (event == NULL) { - eeh_panic(dev, reset_state); - return 1; - } - - event->dev = dev; - event->dn = dn; - event->reset_state = reset_state; - - /* We may or may not be called in an interrupt context */ - spin_lock_irqsave(&eeh_eventlist_lock, flags); - list_add(&event->list, &eeh_eventlist); - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - + eeh_send_failure_event (dn, dev, rets[0], rets[2]); + /* Most EEH events are due to device driver bugs. Having * a stack trace will help the device-driver authors figure * out what happened. So print that out. */ if (rets[0] != 5) dump_stack(); - schedule_work(&eeh_event_wq); - return 1; dn_unlock: @@ -793,6 +676,14 @@ EXPORT_SYMBOL(eeh_check_failure); +/* ------------------------------------------------------------- */ +/* The code below deals with enabling EEH for devices during the + * early boot sequence. EEH must be enabled before any PCI probing + * can be done. 
+ */ + +#define EEH_ENABLE 1 + struct eeh_early_enable_info { unsigned int buid_hi; unsigned int buid_lo; @@ -850,8 +741,9 @@ /* First register entry is addr (00BBSS00) */ /* Try to enable eeh */ ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL, - regs[0], info->buid_hi, info->buid_lo, - EEH_ENABLE); + regs[0], info->buid_hi, info->buid_lo, + EEH_ENABLE); + if (ret == 0) { eeh_subsystem_enabled = 1; pdn->eeh_mode |= EEH_MODE_SUPPORTED; Index: linux-2.6.14-git3/include/asm-powerpc/eeh_event.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/include/asm-powerpc/eeh_event.h 2005-11-02 14:32:35.718741805 -0600 @@ -0,0 +1,52 @@ +/* + * eeh_event.h + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Copyright (c) 2005 Linas Vepstas + */ + +#ifndef ASM_PPC64_EEH_EVENT_H +#define ASM_PPC64_EEH_EVENT_H + +/** EEH event -- structure holding pci controller data that describes + * a change in the isolation status of a PCI slot. A pointer + * to this struct is passed as the data pointer in a notify callback. 
+ */ +struct eeh_event { + struct list_head list; + struct device_node *dn; /* struct device node */ + struct pci_dev *dev; /* affected device */ + int state; + int time_unavail; /* milliseconds until device might be available */ +}; + +/** + * eeh_send_failure_event - generate a PCI error event + * @dev pci device + * + * This routine builds a PCI error event which will be delivered + * to all listeners on the peh_notifier_chain. + * + * This routine can be called within an interrupt context; + * the actual event will be delivered in a normal context + * (from a workqueue). + */ +int eeh_send_failure_event (struct device_node *dn, + struct pci_dev *dev, + int reset_state, + int time_unavail); + +#endif /* ASM_PPC64_EEH_EVENT_H */ Index: linux-2.6.14-git3/include/asm-ppc64/eeh.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/eeh.h 2005-11-02 14:29:21.496968403 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/eeh.h 2005-11-02 14:32:35.725740824 -0600 @@ -1,4 +1,4 @@ -/* +/* * eeh.h * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation. * @@ -6,12 +6,12 @@ * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. - * + * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
- * + * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA @@ -27,8 +27,6 @@ struct pci_dev; struct device_node; -struct device_node; -struct notifier_block; #ifdef CONFIG_EEH @@ -37,6 +35,10 @@ #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) +/* Max number of EEH freezes allowed before we consider the device + * to be permanently disabled. */ +#define EEH_MAX_ALLOWED_FREEZES 5 + void __init eeh_init(void); unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val); @@ -59,36 +61,14 @@ * eeh_remove_device - undo EEH setup for the indicated pci device * @dev: pci device to be removed * - * This routine should be when a device is removed from a running - * system (e.g. by hotplug or dlpar). + * This routine should be called when a device is removed from + * a running system (e.g. by hotplug or dlpar). It unregisters + * the PCI device from the EEH subsystem. I/O errors affecting + * this device will no longer be detected after this call; thus, + * i/o errors affecting this slot may leave this device unusable. */ void eeh_remove_device(struct pci_dev *); -#define EEH_DISABLE 0 -#define EEH_ENABLE 1 -#define EEH_RELEASE_LOADSTORE 2 -#define EEH_RELEASE_DMA 3 - -/** - * Notifier event flags. - */ -#define EEH_NOTIFY_FREEZE 1 - -/** EEH event -- structure holding pci slot data that describes - * a change in the isolation status of a PCI slot. A pointer - * to this struct is passed as the data pointer in a notify callback. - */ -struct eeh_event { - struct list_head list; - struct pci_dev *dev; - struct device_node *dn; - int reset_state; -}; - -/** Register to find out about EEH events. */ -int eeh_register_notifier(struct notifier_block *nb); -int eeh_unregister_notifier(struct notifier_block *nb); - /** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. 
* @@ -129,7 +109,7 @@ #define EEH_IO_ERROR_VALUE(size) (-1UL) #endif /* CONFIG_EEH */ -/* +/* * MMIO read/write operations with EEH support. */ static inline u8 eeh_readb(const volatile void __iomem *addr) Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c 2005-11-02 14:32:35.731739983 -0600 @@ -0,0 +1,155 @@ +/* + * eeh_event.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Copyright (c) 2005 Linas Vepstas + */ + +#include +#include +#include + +/** Overview: + * EEH error states may be detected within exception handlers; + * however, the recovery processing needs to occur asynchronously + * in a normal kernel context and not an interrupt context. + * This pair of routines creates an event and queues it onto a + * work-queue, where a worker thread can drive recovery. + */ + +/* EEH event workqueue setup. */ +static spinlock_t eeh_eventlist_lock = SPIN_LOCK_UNLOCKED; +LIST_HEAD(eeh_eventlist); +static void eeh_thread_launcher(void *); +DECLARE_WORK(eeh_event_wq, eeh_thread_launcher, NULL); + +/** + * eeh_panic - call panic() for an eeh event that cannot be handled. 
+ * The philosophy of this routine is that it is better to panic and + * halt the OS than it is to risk possible data corruption by + * oblivious device drivers that don't know better. + * + * @dev pci device that had an eeh event + * @reset_state current reset state of the device slot + */ +static void eeh_panic(struct pci_dev *dev, int reset_state) +{ + /* + * Since the panic_on_oops sysctl is used to halt the system + * in light of potential corruption, we can use it here. + */ + if (panic_on_oops) { + panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, + pci_name(dev)); + } + else { + printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", + reset_state, pci_name(dev)); + } +} + +/** + * eeh_event_handler - dispatch EEH events. The detection of a frozen + * slot can occur inside an interrupt, where it can be hard to do + * anything about it. The goal of this routine is to pull these + * detection events out of the context of the interrupt handler, and + * re-dispatch them for processing at a later time in a normal context. 
+ * + * @dummy - unused + */ +static int eeh_event_handler(void * dummy) +{ + unsigned long flags; + struct eeh_event *event; + + daemonize ("eehd"); + + while (1) { + set_current_state(TASK_INTERRUPTIBLE); + + spin_lock_irqsave(&eeh_eventlist_lock, flags); + event = NULL; + if (!list_empty(&eeh_eventlist)) { + event = list_entry(eeh_eventlist.next, struct eeh_event, list); + list_del(&event->list); + } + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + if (event == NULL) + break; + + printk(KERN_INFO "EEH: Detected PCI bus error on device %s\n", + pci_name(event->dev)); + + eeh_panic (event->dev, event->state); + + kfree(event); + } + + return 0; +} + +/** + * eeh_thread_launcher + * + * @dummy - unused + */ +static void eeh_thread_launcher(void *dummy) +{ + if (kernel_thread(eeh_event_handler, NULL, CLONE_KERNEL) < 0) + printk(KERN_ERR "Failed to start EEH daemon\n"); +} + +/** + * eeh_send_failure_event - generate a PCI error event + * @dev pci device + * + * This routine can be called within an interrupt context; + * the actual event will be delivered in a normal context + * (from a workqueue). 
+ */ +int eeh_send_failure_event (struct device_node *dn, + struct pci_dev *dev, + int state, + int time_unavail) +{ + unsigned long flags; + struct eeh_event *event; + + event = kmalloc(sizeof(*event), GFP_ATOMIC); + if (event == NULL) { + printk (KERN_ERR "EEH: out of memory, event not handled\n"); + return 1; + } + + if (dev) + pci_dev_get(dev); + + event->dn = dn; + event->dev = dev; + event->state = state; + event->time_unavail = time_unavail; + + /* We may or may not be called in an interrupt context */ + spin_lock_irqsave(&eeh_eventlist_lock, flags); + list_add(&event->list, &eeh_eventlist); + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + + schedule_work(&eeh_event_wq); + + return 0; +} + +/********************** END OF FILE ******************************/ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:31:36.150092654 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:32:55.306995693 -0600 @@ -3,4 +3,4 @@ obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o -obj-$(CONFIG_EEH) += eeh.o +obj-$(CONFIG_EEH) += eeh.o eeh_event.o From linas at linas.org Fri Nov 4 11:50:10 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:10 -0600 Subject: [PATCH 13/42]: ppc64: PCI reset support routines References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005010.GA26901@mail.gnucash.org> 13-eeh-recovery-support-routines.patch EEH Recovery support routines This patch adds routines required to help drive the recovery of EEH-frozen slots. The main function is to drive the PCI #RST signal line high for a quarter of a second, and then allow for a second & a half of settle time.
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:29:20.596094683 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:33:42.083437903 -0600 @@ -51,4 +51,18 @@ extern unsigned long pci_assign_all_buses; extern int pci_read_irq_line(struct pci_dev *pci_dev); +/* ---- EEH internal-use-only related routines ---- */ +#ifdef CONFIG_EEH +/** + * rtas_set_slot_reset -- unfreeze a frozen slot + * + * Clear the EEH-frozen condition on a slot. This routine + * does this by asserting the PCI #RST line for 1/8th of + * a second; this routine will sleep while the adapter is + * being reset. + */ +void rtas_set_slot_reset (struct pci_dn *); + +#endif + #endif /* _ASM_POWERPC_PPC_PCI_H */ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:32:35.713742506 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:33:42.096436081 -0600 @@ -17,6 +17,7 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ +#include #include #include #include @@ -677,6 +678,104 @@ EXPORT_SYMBOL(eeh_check_failure); /* ------------------------------------------------------------- */ +/* The code below deals with error recovery */ + +/** Return negative value if a permanent error, else return + * a number of milliseconds to wait until the PCI slot is + * ready to be used. 
+ */ +static int +eeh_slot_availability(struct pci_dn *pdn) +{ + int rc; + int rets[3]; + + rc = read_slot_reset_state(pdn, rets); + + if (rc) return rc; + + if (rets[1] == 0) return -1; /* EEH is not supported */ + if (rets[0] == 0) return 0; /* Oll Korrect */ + if (rets[0] == 5) { + if (rets[2] == 0) return -1; /* permanently unavailable */ + return rets[2]; /* number of millisecs to wait */ + } + return -1; +} + +/** rtas_pci_slot_reset raises/lowers the pci #RST line + * state: 1/0 to raise/lower the #RST + * + * Clear the EEH-frozen condition on a slot. This routine + * asserts the PCI #RST line if the 'state' argument is '1', + * and drops the #RST line if 'state is '0'. This routine is + * safe to call in an interrupt context. + * + */ + +static void +rtas_pci_slot_reset(struct pci_dn *pdn, int state) +{ + int rc; + + BUG_ON (pdn==NULL); + + if (!pdn->phb) { + printk (KERN_WARNING "EEH: in slot reset, device node %s has no phb\n", + pdn->node->full_name); + return; + } + + rc = rtas_call(ibm_set_slot_reset,4,1, NULL, + pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), + BUID_LO(pdn->phb->buid), + state); + if (rc) { + printk (KERN_WARNING "EEH: Unable to reset the failed slot, (%d) #RST=%d dn=%s\n", + rc, state, pdn->node->full_name); + return; + } + + if (state == 0) + eeh_clear_slot (pdn->node->parent->child); +} + +/** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second + * dn -- device node to be reset. + */ + +void +rtas_set_slot_reset(struct pci_dn *pdn) +{ + int i, rc; + + rtas_pci_slot_reset (pdn, 1); + + /* The PCI bus requires that the reset be held high for at least + * a 100 milliseconds. We wait a bit longer 'just in case'. */ + +#define PCI_BUS_RST_HOLD_TIME_MSEC 250 + msleep (PCI_BUS_RST_HOLD_TIME_MSEC); + rtas_pci_slot_reset (pdn, 0); + + /* After a PCI slot has been reset, the PCI Express spec requires + * a 1.5 second idle time for the bus to stabilize, before starting + * up traffic. 
*/ +#define PCI_BUS_SETTLE_TIME_MSEC 1800 + msleep (PCI_BUS_SETTLE_TIME_MSEC); + + /* Now double check with the firmware to make sure the device is + * ready to be used; if not, wait for recovery. */ + for (i=0; i<10; i++) { + rc = eeh_slot_availability (pdn); + if (rc <= 0) break; + + msleep (rc+100); + } +} + +/* ------------------------------------------------------------- */ /* The code below deals with enabling EEH for devices during the * early boot sequence. EEH must be enabled before any PCI probing * can be done. From linas at linas.org Fri Nov 4 11:50:17 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:17 -0600 Subject: [PATCH 14/42]: ppc64: Save & restore of PCI device BARS References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005017.GA26911@mail.gnucash.org> 14-eeh-device-bar-save.patch After a PCI device has been reset, the device BARs and other config space info must be restored to the same state as they were in when the firmware first handed us this device. This will allow the PCI device driver, when restarted, to correctly recognize and set up the device. This patch saves the device config space as early as reasonable after the firmware has handed over the device. The state restore function is intended for use by the EEH recovery routines.
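To make the save/restore idea concrete, here is a small user-space sketch of it. It is not the patch code: struct fake_dev, struct saved_cfg and the sketch_* helpers are made-up stand-ins, and config space is modelled as a plain array instead of going through pci_read_config_dword() and rtas_write_config(). It assumes, as the patch does, that only the BARs, the expansion ROM address, the cache line size, the latency timer and the last header dword get written back. The byte lookup uses shifts, so unlike the BYTE_SWAP macro in the patch it does not depend on host endianness.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_DWORDS          16    /* first 64 bytes of PCI config space */
#define PCI_CACHE_LINE_SIZE 0x0c  /* byte offsets, as in pci_regs.h */
#define PCI_LATENCY_TIMER   0x0d

/* Hypothetical stand-ins for the device and for pci_dn->config_space. */
struct fake_dev  { uint32_t cfg[CFG_DWORDS]; };
struct saved_cfg { uint32_t config_space[CFG_DWORDS]; };

/* Save all 16 dwords, one device at a time (not recursive). */
void sketch_save(const struct fake_dev *dev, struct saved_cfg *s)
{
	assert(dev && s);
	for (int i = 0; i < CFG_DWORDS; i++)
		s->config_space[i] = dev->cfg[i];
}

/* Fetch the saved byte at config offset 'off'. */
uint8_t sketch_saved_byte(const struct saved_cfg *s, int off)
{
	return (uint8_t)(s->config_space[off / 4] >> (8 * (off % 4)));
}

/* Write a single byte of the fake config space. */
static void write_byte(struct fake_dev *dev, int off, uint8_t val)
{
	int shift = 8 * (off % 4);

	dev->cfg[off / 4] &= ~((uint32_t)0xff << shift);
	dev->cfg[off / 4] |= (uint32_t)val << shift;
}

/* Write back only the registers a reset is assumed to have wiped. */
void sketch_restore(struct fake_dev *dev, const struct saved_cfg *s)
{
	for (int i = 4; i < 10; i++)		/* BAR0..BAR5 */
		dev->cfg[i] = s->config_space[i];
	dev->cfg[12] = s->config_space[12];	/* expansion ROM address */
	write_byte(dev, PCI_CACHE_LINE_SIZE,
		   sketch_saved_byte(s, PCI_CACHE_LINE_SIZE));
	write_byte(dev, PCI_LATENCY_TIMER,
		   sketch_saved_byte(s, PCI_LATENCY_TIMER));
	/* max latency, min grant, interrupt pin and line */
	dev->cfg[15] = s->config_space[15];
}
```

The save side copies everything so that nothing is lost, while the restore side is deliberately selective; read-only fields such as the vendor/device ID are never written.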
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:33:42.096436081 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:34:19.926132452 -0600 @@ -77,6 +77,9 @@ */ #define EEH_MAX_FAILS 100000 +/* Misc forward declarations */ +static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn); + /* RTAS tokens */ static int ibm_set_eeh_option; static int ibm_set_slot_reset; @@ -366,6 +369,7 @@ */ void __init pci_addr_cache_build(void) { + struct device_node *dn; struct pci_dev *dev = NULL; if (!eeh_subsystem_enabled) @@ -379,6 +383,10 @@ continue; } pci_addr_cache_insert_device(dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + eeh_save_bars(dev, PCI_DN(dn)); } #ifdef DEBUG @@ -775,6 +783,108 @@ } } +/* ------------------------------------------------------- */ +/** Save and restore of PCI BARs + * + * Although firmware will set up BARs during boot, it doesn't + * set up device BAR's after a device reset, although it will, + * if requested, set up bridge configuration. Thus, we need to + * configure the PCI devices ourselves. + */ + +/** + * __restore_bars - Restore the Base Address Registers + * Loads the PCI configuration space base address registers, + * the expansion ROM base address, the latency timer, and etc. + * from the saved values in the device node.
+ */ +static inline void __restore_bars (struct pci_dn *pdn) +{ + int i; + + if (NULL==pdn->phb) return; + for (i=4; i<10; i++) { + rtas_write_config(pdn, i*4, 4, pdn->config_space[i]); + } + + /* 12 == Expansion ROM Address */ + rtas_write_config(pdn, 12*4, 4, pdn->config_space[12]); + +#define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF)) +#define SAVED_BYTE(OFF) (((u8 *)(pdn->config_space))[BYTE_SWAP(OFF)]) + + rtas_write_config (pdn, PCI_CACHE_LINE_SIZE, 1, + SAVED_BYTE(PCI_CACHE_LINE_SIZE)); + + rtas_write_config (pdn, PCI_LATENCY_TIMER, 1, + SAVED_BYTE(PCI_LATENCY_TIMER)); + + /* max latency, min grant, interrupt pin and line */ + rtas_write_config(pdn, 15*4, 4, pdn->config_space[15]); +} + +/** + * eeh_restore_bars - restore the PCI config space info + * + * This routine performs a recursive walk to the children + * of this device as well. + */ +void eeh_restore_bars(struct pci_dn *pdn) +{ + struct device_node *dn; + if (!pdn) + return; + + if (! pdn->eeh_is_bridge) + __restore_bars (pdn); + + dn = pdn->node->child; + while (dn) { + eeh_restore_bars (PCI_DN(dn)); + dn = dn->sibling; + } +} + +/** + * eeh_save_bars - save device bars + * + * Save the values of the device bars. Unlike the restore + * routine, this routine is *not* recursive. This is because + * PCI devices are added individually; but, for the restore, + * an entire slot is reset at a time.
+ */ +static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn) +{ + int i; + + if (!pdev || !pdn ) + return; + + for (i = 0; i < 16; i++) + pci_read_config_dword(pdev, i * 4, &pdn->config_space[i]); + + if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + pdn->eeh_is_bridge = 1; +} + +void +rtas_configure_bridge(struct pci_dn *pdn) +{ + int token = rtas_token ("ibm,configure-bridge"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,3,1, NULL, + pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), + BUID_LO(pdn->phb->buid)); + if (rc) { + printk (KERN_WARNING "EEH: Unable to configure device bridge (%d) for %s\n", + rc, pdn->node->full_name); + } +} + /* ------------------------------------------------------------- */ /* The code below deals with enabling EEH for devices during the * early boot sequence. EEH must be enabled before any PCI probing @@ -977,6 +1087,7 @@ void eeh_add_device_late(struct pci_dev *dev) { struct device_node *dn; + struct pci_dn *pdn; if (!dev || !eeh_subsystem_enabled) return; @@ -987,9 +1098,11 @@ pci_dev_get (dev); dn = pci_device_to_OF_node(dev); - PCI_DN(dn)->pcidev = dev; + pdn = PCI_DN(dn); + pdn->pcidev = dev; pci_addr_cache_insert_device (dev); + eeh_save_bars(dev, pdn); } EXPORT_SYMBOL_GPL(eeh_add_device_late); Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:33:42.083437903 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:34:19.931131751 -0600 @@ -63,6 +63,29 @@ */ void rtas_set_slot_reset (struct pci_dn *); +/** + * eeh_restore_bars - Restore device configuration info. + * + * A reset of a PCI device will clear out its config space. + * This routine will restore the config space for this + * device, and its children, to values previously obtained + * from the firmware.
+ */ +void eeh_restore_bars(struct pci_dn *); + +/** + * rtas_configure_bridge -- firmware initialization of pci bridge + * + * Ask the firmware to configure all PCI bridge devices + * located behind the indicated node. Required after a + * pci device reset. Does essentially the same thing as + * eeh_restore_bars, but for bridges, and lets firmware + * do the work. + */ +void rtas_configure_bridge(struct pci_dn *); + +int rtas_write_config(struct pci_dn *, int where, int size, u32 val); + #endif #endif /* _ASM_POWERPC_PPC_PCI_H */ From linas at linas.org Fri Nov 4 11:50:26 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:26 -0600 Subject: [PATCH 15/42]: Documentation: PCI Error Recovery References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005026.GA26919@mail.gnucash.org> 215-pci-error-recovery_docs.patch PCI Error Recovery: documentation patch Various PCI bus errors can be signaled by newer PCI controllers. Recovering from those errors requires an infrastructure to notify affected device drivers of the error, and a way of walking through a reset sequence. This patch adds documentation describing the current error recovery proposal. Signed-off-by: Linas Vepstas Documentation/pci-error-recovery.txt | 246 +++++++++++++++++++++++++++++++++++ MAINTAINERS | 7 2 files changed, 253 insertions(+) Index: linux-2.6.14-git3/Documentation/pci-error-recovery.txt =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/Documentation/pci-error-recovery.txt 2005-11-02 14:34:25.663328101 -0600 @@ -0,0 +1,246 @@ + + PCI Error Recovery + ------------------ + May 31, 2005 + + Current document maintainer: + Linas Vepstas + + +Some PCI bus controllers are able to detect certain "hard" PCI errors +on the bus, such as parity errors on the data and address busses, as +well as SERR and PERR errors.
These chipsets are then able to disable +I/O to/from the affected device, so that, for example, a bad DMA +address doesn't end up corrupting system memory. These same chipsets +are also able to reset the affected PCI device, and return it to +working condition. This document describes a generic API for +performing error recovery. + +The core idea is that after a PCI error has been detected, there must +be a way for the kernel to coordinate with all affected device drivers +so that the pci card can be made operational again, possibly after +performing a full electrical #RST of the PCI card. The API below +provides a generic way for device drivers to be notified of PCI +errors, and to be notified of, and respond to, a reset sequence. + +Preliminary sketch of API, cut-n-pasted-n-modified email from +Ben Herrenschmidt, circa 5 April 2005 + +The error recovery API support is exposed to the driver in the form of +a structure of function pointers pointed to by a new field in struct +pci_driver. The absence of this pointer in pci_driver denotes a +"non-aware" driver; behaviour on these is platform dependent. +Platforms like ppc64 can try to simulate pci hotplug remove/add. + +The definition of "pci_error_token" is not covered here. It is based on +Seto's work on the synchronous error detection. We still need to define +functions for extracting information out of an opaque error token. This is +separate from this API. + +This structure has the form: + +struct pci_error_handlers +{ + int (*error_detected)(struct pci_dev *dev, pci_error_token error); + int (*mmio_enabled)(struct pci_dev *dev); + int (*resume)(struct pci_dev *dev); + int (*link_reset)(struct pci_dev *dev); + int (*slot_reset)(struct pci_dev *dev); +}; + +A driver doesn't have to implement all of these callbacks. The +only mandatory one is error_detected(). If a callback is not +implemented, the corresponding feature is considered unsupported.
+For example, if mmio_enabled() and resume() aren't there, then the +driver is assumed not to do any direct recovery and to require +a reset. If link_reset() is not implemented, the card is assumed +not to care about link resets, in which case, if recovery is supported, +the core can try to recover (but not slot_reset() unless it really did +reset the slot). If slot_reset() is not supported, link_reset() can +be called instead on a slot reset. + +At first, the call will always be: + + 1) error_detected() + + Error detected. This is sent once after an error has been detected. At +this point, the device might not be accessible anymore depending on the +platform (the slot will be isolated on ppc64). The driver may already +have "noticed" the error because of a failing IO, but this is the proper +"synchronisation point", that is, it gives a chance to the driver to +clean up, waiting for pending stuff (timers, whatever, etc...) to +complete; it can take semaphores, schedule, etc... everything but touch +the device. Within this function and after it returns, the driver +shouldn't do any new IOs. Called in task context. This is sort of a +"quiesce" point. See note about interrupts at the end of this doc. + + Result codes: + - PCIERR_RESULT_CAN_RECOVER: + Driver returns this if it thinks it might be able to recover + the HW by just banging IOs or if it wants to be given + a chance to extract some diagnostic information (see + below). + - PCIERR_RESULT_NEED_RESET: + Driver returns this if it thinks it can't recover unless the + slot is reset. + - PCIERR_RESULT_DISCONNECT: + Return this if driver thinks it won't recover at all, + (this will detach the driver ? or just leave it + dangling ? to be decided) + +So at this point, we have called error_detected() for all drivers +on the segment that had the error. On ppc64, the slot is isolated. What +happens now typically depends on the result from the drivers.
If all +drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would +re-enable IOs on the slot (or do nothing special if the platform doesn't +isolate slots) and call 2). If not and we can reset slots, we go to 4); +if neither, we have a dead slot. If it's a hotplug slot, we might +"simulate" reset by triggering HW unplug/replug though. + +>>> Current ppc64 implementation assumes that a device driver will +>>> *not* schedule or semaphore in this routine; the current ppc64 +>>> implementation uses one kernel thread to notify all devices; +>>> thus, if one device sleeps/schedules, all devices are affected. +>>> Doing better requires complex multi-threaded logic in the error +>>> recovery implementation (e.g. waiting for all notification threads +>>> to "join" before proceeding with recovery.) This seems excessively +>>> complex and not worth implementing. + +>>> The current ppc64 implementation doesn't much care if the device +>>> attempts i/o at this point, or not. I/O's will fail, returning +>>> a value of 0xff on read, and writes will be dropped. If the device +>>> driver attempts more than 10K I/O's to a frozen adapter, the platform +>>> will assume that the device driver has gone into an infinite loop, +>>> and it will panic the kernel. + + 2) mmio_enabled() + + This is the "early recovery" call. IOs are allowed again, but DMA is +not (hrm... to be discussed, I prefer not), with some restrictions. This +is NOT a callback for the driver to start operations again, only to +peek/poke at the device, extract diagnostic information, if any, and +eventually do things like trigger a device local reset or some such, +but not restart operations. This is sent if all drivers on a segment +agree that they can try to recover and no automatic link reset was +performed by the HW. If the platform can't just re-enable IOs without +a slot reset or a link reset, it doesn't call this callback and goes +directly to 3) or 4).
All IOs should be done _synchronously_ from +within this callback; errors triggered by them will be returned via +the normal pci_check_whatever() api; no new error_detected() callback +will be issued due to an error happening here. However, such an error +might cause IOs to be re-blocked for the whole segment, and thus +invalidate the recovery that other devices on the same segment might +have done, forcing the whole segment into one of the next states, +that is link reset or slot reset. + + Result codes: + - PCIERR_RESULT_RECOVERED + Driver returns this if it thinks the device is fully + functional and thinks it is ready to start + normal driver operations again. There is no + guarantee that the driver will actually be + allowed to proceed, as another driver on the + same segment might have failed and thus triggered a + slot reset on platforms that support it. + + - PCIERR_RESULT_NEED_RESET + Driver returns this if it thinks the device is not + recoverable in its current state and it needs a slot + reset to proceed. + + - PCIERR_RESULT_DISCONNECT + Same as above. Total failure, no recovery even after + reset; driver dead. (To be defined more precisely) + +>>> The current ppc64 implementation does not implement this callback. + + 3) link_reset() + + This is called after the link has been reset. This is typically +a PCI Express specific state at this point and is done whenever a +non-fatal error has been detected that can be "solved" by resetting +the link. This call informs the driver of the reset and the driver +should check if the device appears to be in working condition. +This function acts a bit like 2) mmio_enabled(), in that the driver +is not supposed to restart normal driver I/O operations right away. +Instead, it should just "probe" the device to check its recoverability +status. If all is right, then the core will call resume() once all +drivers have ack'd link_reset().
+ + Result codes: + (identical to mmio_enabled) + +>>> The current ppc64 implementation does not implement this callback. + + 4) slot_reset() + + This is called after the slot has been soft or hard reset by the +platform. A soft reset consists of asserting the adapter #RST line +and then restoring the PCI BARs and PCI configuration header. If the +platform supports PCI hotplug, then it might instead perform a hard +reset by toggling power on the slot off/on. This call gives drivers +the chance to re-initialize the hardware (re-download firmware, etc.), +but drivers shouldn't restart normal I/O processing operations at +this point. (See note about interrupts; interrupts aren't guaranteed +to be delivered until the resume() callback has been called). If all +device drivers report success on this callback, the platform will call +resume() to complete the error handling and let the driver restart +normal I/O processing. + +A driver can still return a critical failure for this function if +it can't get the device operational after reset. If the platform +previously tried a soft reset, it might now try a hard reset (power +cycle) and then call slot_reset() again. If the device still can't +be recovered, there is nothing more that can be done; the platform +will typically report a "permanent failure" in such a case. The +device will be considered "dead" in this case. + + Result codes: + - PCIERR_RESULT_DISCONNECT + Same as above. + +>>> The current ppc64 implementation does not try a power-cycle reset +>>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should. + + 5) resume() + + This is called if all drivers on the segment have returned +PCIERR_RESULT_RECOVERED from one of the 3 previous callbacks. +That basically tells the driver to restart activity, that everything +is back up and running. No result code is taken into account here. If +a new error happens, it will restart a new error handling process. + +That's it. I think this covers all the possibilities.
The way those +callbacks are called is platform policy. A platform with no slot reset +capability for example may want to just "ignore" drivers that can't +recover (disconnect them) and try to let other cards on the same segment +recover. Keep in mind that in most real life cases, though, there will +be only one driver per segment. + +Now, there is a note about interrupts. If you get an interrupt and your +device is dead or has been isolated, there is a problem :) + +After much thinking, I decided to leave that to the platform. That is, +the recovery API only specifies that: + + - There is no guarantee that interrupt delivery can proceed from any +device on the segment starting from the error detection and until the +restart callback is sent, at which point interrupts are expected to be +fully operational. + + - There is no guarantee that interrupt delivery is stopped, that is, a +driver that gets an interrupt after detecting an error, or that detects +an error within the interrupt handler such that it prevents proper +ack'ing of the interrupt (and thus removal of the source) should just +return IRQ_NOTHANDLED. It's up to the platform to deal with that +condition, typically by masking the irq source during the duration of +the error handling. It is expected that the platform "knows" which +interrupts are routed to error-management capable slots and can deal +with temporarily disabling that irq number during error processing (this +isn't terribly complex). That means some IRQ latency for other devices +sharing the interrupt, but there is simply no other way.
High end +platforms aren't supposed to share interrupts between many devices +anyway :) + + +Revised: 31 May 2005 Linas Vepstas Index: linux-2.6.14-git3/MAINTAINERS =================================================================== --- linux-2.6.14-git3.orig/MAINTAINERS 2005-11-02 14:29:19.433257684 -0600 +++ linux-2.6.14-git3/MAINTAINERS 2005-11-02 14:34:25.700322915 -0600 @@ -1885,6 +1885,13 @@ L: linux-abi-devel at lists.sourceforge.net S: Maintained +PCI ERROR RECOVERY +P: Linas Vepstas +M: linas at austin.ibm.com +L: linux-kernel at vger.kernel.org +L: linux-pci at atrey.karlin.mff.cuni.cz +S: Supported + PCI SOUND DRIVERS (ES1370, ES1371 and SONICVIBES) P: Thomas Sailer M: sailer at ife.ee.ethz.ch From linas at linas.org Fri Nov 4 11:50:35 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:35 -0600 Subject: [PATCH 16/42]: PCI: PCI Error reporting callbacks References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005035.GA26929@mail.gnucash.org> 16-pci-error-recovery_header.patch PCI Error Recovery: header file patch Various PCI bus errors can be signaled by newer PCI controllers. Recovering from those errors requires an infrastructure to notify affected device drivers of the error, and a way of walking through a reset sequence. This patch adds a set of callbacks to be used by error recovery routines to notify device drivers of the various stages of recovery. 
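As a rough illustration of how a driver might plug into these callbacks, here is a minimal user-space sketch. The enums and struct pci_error_handlers are copied from this patch; everything else is an assumption made for the example: struct pci_dev is a three-field stand-in, the mydrv_* callbacks belong to an imaginary driver, and sketch_recover() is a made-up, single-device version of what a platform might do. A real platform has to collect results from every driver on the segment and actually reset the slot before calling slot_reset().

```c
#include <assert.h>

/* Stand-in for the kernel's struct pci_dev; illustration only. */
struct pci_dev { int quiesced; int reinitialized; int running; };

enum pci_channel_state {
	pci_channel_io_normal = 0,
	pci_channel_io_frozen = 1,
	pci_channel_io_perm_failure,
};

enum pcierr_result {
	PCIERR_RESULT_NONE = 0,
	PCIERR_RESULT_CAN_RECOVER = 1,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

struct pci_error_handlers {
	int (*error_detected)(struct pci_dev *dev, enum pci_channel_state error);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*link_reset)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
	void (*resume)(struct pci_dev *dev);
};

/* Hypothetical driver: stop all I/O, then ask for a slot reset. */
static int mydrv_error_detected(struct pci_dev *dev, enum pci_channel_state error)
{
	dev->running = 0;
	dev->quiesced = 1;
	if (error == pci_channel_io_perm_failure)
		return PCIERR_RESULT_DISCONNECT;
	return PCIERR_RESULT_NEED_RESET;
}

/* Hypothetical driver: re-initialize hardware, but do not restart I/O. */
static int mydrv_slot_reset(struct pci_dev *dev)
{
	dev->reinitialized = 1;
	return PCIERR_RESULT_RECOVERED;
}

/* Hypothetical driver: normal operation may start again. */
static void mydrv_resume(struct pci_dev *dev)
{
	dev->running = 1;
}

/* mmio_enabled and link_reset are left NULL, i.e. unsupported. */
struct pci_error_handlers mydrv_err_handler = {
	.error_detected = mydrv_error_detected,
	.slot_reset     = mydrv_slot_reset,
	.resume         = mydrv_resume,
};

/* Made-up platform-side sequence for a single device. */
int sketch_recover(struct pci_error_handlers *h, struct pci_dev *dev,
		   enum pci_channel_state state)
{
	int rc;

	assert(dev);
	if (!h || !h->error_detected)
		return PCIERR_RESULT_NONE;

	rc = h->error_detected(dev, state);
	if (rc == PCIERR_RESULT_NEED_RESET && h->slot_reset)
		rc = h->slot_reset(dev);	/* the slot would be reset first */
	if (rc == PCIERR_RESULT_RECOVERED && h->resume)
		h->resume(dev);
	return rc;
}
```

In a real driver the table would be hooked up through the new err_handler field of struct pci_driver.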
Signed-off-by: Linas Vepstas -- include/linux/pci.h | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 49 insertions(+) Index: linux-2.6.14-git3/include/linux/pci.h =================================================================== --- linux-2.6.14-git3.orig/include/linux/pci.h 2005-11-02 14:29:18.856338553 -0600 +++ linux-2.6.14-git3/include/linux/pci.h 2005-11-02 14:34:32.272401512 -0600 @@ -78,6 +78,16 @@ #define PCI_UNKNOWN ((pci_power_t __force) 5) #define PCI_POWER_ERROR ((pci_power_t __force) -1) +/** The pci_channel state describes connectivity between the CPU and + * the pci device. If some PCI bus between here and the pci device + * has crashed or locked up, this info is reflected here. + */ +enum pci_channel_state { + pci_channel_io_normal = 0, /* I/O channel is in normal state */ + pci_channel_io_frozen = 1, /* I/O to channel is blocked */ + pci_channel_io_perm_failure, /* PCI card is dead */ +}; + /* * The pci_dev structure is used to describe PCI devices. */ @@ -110,6 +120,7 @@ this is D0-D3, D0 being fully functional, and D3 being off. */ + enum pci_channel_state error_state; /* current connectivity state */ struct device dev; /* Generic device interface */ /* device is compatible with these IDs */ @@ -232,6 +243,43 @@ unsigned int use_driver_data:1; /* pci_driver->driver_data is used */ }; +/* ---------------------------------------------------------------- */ +/** PCI error recovery infrastructure. If a PCI device driver provides + * a set of callbacks in struct pci_error_handlers, then that device driver + * will be notified of PCI bus errors, and will be driven to recovery + * when an error occurs. + */ + +enum pcierr_result { + PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */ + PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */ + PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset.
*/ + PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ + PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ +}; + +/* PCI bus error event callbacks */ +struct pci_error_handlers +{ + /* PCI bus error detected on this device */ + int (*error_detected)(struct pci_dev *dev, + enum pci_channel_state error); + + /* MMIO has been re-enabled, but not DMA */ + int (*mmio_enabled)(struct pci_dev *dev); + + /* PCI Express link has been reset */ + int (*link_reset)(struct pci_dev *dev); + + /* PCI slot has been reset */ + int (*slot_reset)(struct pci_dev *dev); + + /* Device driver may resume normal operations */ + void (*resume)(struct pci_dev *dev); +}; + +/* ---------------------------------------------------------------- */ + struct module; struct pci_driver { struct list_head node; @@ -245,6 +293,7 @@ int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */ void (*shutdown) (struct pci_dev *dev); + struct pci_error_handlers *err_handler; struct device_driver driver; struct pci_dynids dynids; }; From linas at linas.org Fri Nov 4 11:50:48 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:48 -0600 Subject: [PATCH 17/42]: ppc64: mark failed devices References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005048.GA26970@mail.gnucash.org> 17-eeh-slot-marking-bug.patch A device that experiences a PCI outage may be just one device out of many that were affected. In order to avoid repeated reports of a failure, the entire tree of affected devices should be marked as failed. This patch marks up the entire tree.
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:34:19.926132452 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:35:39.290005477 -0600 @@ -479,32 +479,47 @@ * an interrupt context, which is bad. */ -static inline void __eeh_mark_slot (struct device_node *dn) +static inline void __eeh_mark_slot (struct device_node *dn, int mode_flag) { while (dn) { - PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; + if (PCI_DN(dn)) { + PCI_DN(dn)->eeh_mode |= mode_flag; - if (dn->child) - __eeh_mark_slot (dn->child); + if (dn->child) + __eeh_mark_slot (dn->child, mode_flag); + } dn = dn->sibling; } } -static inline void __eeh_clear_slot (struct device_node *dn) +void eeh_mark_slot (struct device_node *dn, int mode_flag) +{ + dn = find_device_pe (dn); + PCI_DN(dn)->eeh_mode |= mode_flag; + __eeh_mark_slot (dn->child, mode_flag); +} + +static inline void __eeh_clear_slot (struct device_node *dn, int mode_flag) { while (dn) { - PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; - if (dn->child) - __eeh_clear_slot (dn->child); + if (PCI_DN(dn)) { + PCI_DN(dn)->eeh_mode &= ~mode_flag; + PCI_DN(dn)->eeh_check_count = 0; + if (dn->child) + __eeh_clear_slot (dn->child, mode_flag); + } dn = dn->sibling; } } -static inline void eeh_clear_slot (struct device_node *dn) +void eeh_clear_slot (struct device_node *dn, int mode_flag) { unsigned long flags; spin_lock_irqsave(&confirm_error_lock, flags); - __eeh_clear_slot (dn); + dn = find_device_pe (dn); + PCI_DN(dn)->eeh_mode &= ~mode_flag; + PCI_DN(dn)->eeh_check_count = 0; + __eeh_clear_slot (dn->child, mode_flag); spin_unlock_irqrestore(&confirm_error_lock, flags); } @@ -529,7 +544,6 @@ int rets[3]; unsigned long flags; struct pci_dn *pdn; - struct device_node *pe_dn; int rc = 0; __get_cpu_var(total_mmio_ffs)++; @@ -631,8 +645,7 @@ /* Avoid 
repeated reports of this failure, including problems * with other functions on this device, and functions under * bridges. */ - pe_dn = find_device_pe (dn); - __eeh_mark_slot (pe_dn); + eeh_mark_slot (dn, EEH_MODE_ISOLATED); spin_unlock_irqrestore(&confirm_error_lock, flags); eeh_send_failure_event (dn, dev, rets[0], rets[2]); @@ -744,9 +757,6 @@ rc, state, pdn->node->full_name); return; } - - if (state == 0) - eeh_clear_slot (pdn->node->parent->child); } /** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second @@ -765,6 +775,12 @@ #define PCI_BUS_RST_HOLD_TIME_MSEC 250 msleep (PCI_BUS_RST_HOLD_TIME_MSEC); + + /* We might get hit with another EEH freeze as soon as the + * pci slot reset line is dropped. Make sure we don't miss + * these, and clear the flag now. */ + eeh_clear_slot (pdn->node, EEH_MODE_ISOLATED); + rtas_pci_slot_reset (pdn, 0); /* After a PCI slot has been reset, the PCI Express spec requires Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:34:19.931131751 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:35:39.295004776 -0600 @@ -86,6 +86,13 @@ int rtas_write_config(struct pci_dn *, int where, int size, u32 val); +/** + * mark and clear slots: find "partition endpoint" PE and set or + * clear the flags for each subnode of the PE. 
+ */ +void eeh_mark_slot (struct device_node *dn, int mode_flag); +void eeh_clear_slot (struct device_node *dn, int mode_flag); + #endif #endif /* _ASM_POWERPC_PPC_PCI_H */ From linas at linas.org Fri Nov 4 11:51:03 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:51:03 -0600 Subject: [PATCH 18/42]: ppc64: bugfix: crash on dlpar slot add, remove References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005103.GA26983@mail.gnucash.org> 18-crash-on-pci-slot-add.patch This patch fixes a bug related to dlpar slot add. -- The crash is due to the fact that some children of pci nodes are not pci nodes themselves, and thus do not have pci_dn structures. For example: /pci at 800000020000002/pci at 2,3/usb at 1/hub at 1 /pci at 800000020000002/pci at 2,3/usb at 1,1/hub at 1 A typical stack trace: Vector: 300 (Data Access) at [c0000000555637d0] pc: c000000000202a50: .dlpar_add_slot+0x108/0x410 c000000000202e78 .add_slot_store+0x7c/0xac c000000000202da0 .dlpar_attr_store+0x48/0x64 c0000000000f8ee4 .sysfs_write_file+0x100/0x1a0 A similar stack trace is seen for the slot remove. This code has survived testing of adding and removing different slots, 23 times each so far, as of this writing.
Signed-off-by: Linas Vepstas emailed to To: paulus at samba.org Cc: linuxppc64-dev at ozlabs.org, johnrose at linux.ibm.com, linux-kernel at vger.kernel.org Subject: [PATCH 2/2] ppc64: Crash in DLPAR code on remove operation on 4 October 2005 Index: linux-2.6.14-git6/arch/ppc64/kernel/pci_dn.c =================================================================== --- linux-2.6.14-git6.orig/arch/ppc64/kernel/pci_dn.c 2005-11-03 14:15:40.520737607 -0600 +++ linux-2.6.14-git6/arch/ppc64/kernel/pci_dn.c 2005-11-03 14:15:45.182083115 -0600 @@ -194,7 +194,10 @@ switch (action) { case PSERIES_RECONFIG_ADD: - pci = np->parent->data; + pci = PCI_DN(np->parent); + if (!pci) + return NOTIFY_OK; + update_dn_pci_info(np, pci->phb); break; default: Index: linux-2.6.14-git6/arch/powerpc/platforms/pseries/iommu.c =================================================================== --- linux-2.6.14-git6.orig/arch/powerpc/platforms/pseries/iommu.c 2005-11-03 14:14:32.131340002 -0600 +++ linux-2.6.14-git6/arch/powerpc/platforms/pseries/iommu.c 2005-11-03 14:49:42.621970876 -0600 @@ -494,10 +494,13 @@ { int err = NOTIFY_OK; struct device_node *np = node; - struct pci_dn *pci = np->data; + struct pci_dn *pci; switch (action) { case PSERIES_RECONFIG_REMOVE: + pci = PCI_DN(np); + if (!pci) + return NOTIFY_OK; if (pci->iommu_table && get_property(np, "ibm,dma-window", NULL)) iommu_free_table(np); From linas at linas.org Fri Nov 4 11:51:17 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:51:17 -0600 Subject: [PATCH 19/42]: ppc64: bugfix: crash on PHB add References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005117.GA26991@mail.gnucash.org> 19-rpaphp-crashing.patch This patch fixes a bug related to dlpar PHB add, after a PHB removal. -- The crash was due to the PHB not having a pci_dn structure yet, when the phb is being added. This code has survived testing of adding and removing the PHB and all slots underneath it, 17 times so far, as of this writing.
Signed-off-by: Linas Vepstas emailed to To: paulus at samba.org Cc: linuxppc64-dev at ozlabs.org, linux-pci at atrey.karlin.mff.cuni.cz, johnrose at linux.ibm.com, linux-kernel at vger.kernel.org Subject: [PATCH] rpaphp: PCI Hotplug crash on PHB DLPAR add on 4 October 2005 Index: linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:29:02.115685162 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:35:52.800111285 -0600 @@ -306,7 +306,7 @@ { struct pci_controller *phb; - if (PCI_DN(dn)->phb) { + if (PCI_DN(dn) && PCI_DN(dn)->phb) { /* PHB already exists */ return -EINVAL; } From linas at linas.org Fri Nov 4 11:51:31 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:51:31 -0600 Subject: [PATCH 20/42]: ppc64: PCI hotplug common code elimination References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005131.GA27000@mail.gnucash.org> 20-rpaphp-eeh-cleanup.patch This patch moves some code from the rpaphp directory to the ppc64 directory, where it should have been all along. (Among other things, I need it in the ppc64 directory for the PCI error recovery.) Please note that this patch affects TWO maintainers: Paul, after applying the ppc64 part, please ask that GregKH apply the PCI part. It is safe to have the ppc64 part go in first. It would be bad to have the PCI part go in first.
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:35:39.290005477 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:36:41.255317484 -0600 @@ -1093,6 +1093,15 @@ } EXPORT_SYMBOL_GPL(eeh_add_device_early); +void eeh_add_device_tree_early(struct device_node *dn) +{ + struct device_node *sib; + for (sib = dn->child; sib; sib = sib->sibling) + eeh_add_device_tree_early(sib); + eeh_add_device_early(dn); +} +EXPORT_SYMBOL_GPL(eeh_add_device_tree_early); + /** * eeh_add_device_late - perform EEH initialization for the indicated pci device * @dev: pci device for which to set up EEH @@ -1147,6 +1156,23 @@ } EXPORT_SYMBOL_GPL(eeh_remove_device); +void eeh_remove_bus_device(struct pci_dev *dev) +{ + eeh_remove_device(dev); + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + struct pci_bus *bus = dev->subordinate; + struct list_head *ln; + if (!bus) + return; + for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { + struct pci_dev *pdev = pci_dev_b(ln); + if (pdev) + eeh_remove_bus_device(pdev); + } + } +} +EXPORT_SYMBOL_GPL(eeh_remove_bus_device); + static int proc_eeh_show(struct seq_file *m, void *v) { unsigned int cpu; Index: linux-2.6.14-git3/include/asm-ppc64/eeh.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/eeh.h 2005-11-02 14:32:35.725740824 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/eeh.h 2005-11-02 14:36:41.263316362 -0600 @@ -55,6 +55,7 @@ * to finish the eeh setup for this device. */ void eeh_add_device_early(struct device_node *); +void eeh_add_device_tree_early(struct device_node *); void eeh_add_device_late(struct pci_dev *); /** @@ -70,6 +71,15 @@ void eeh_remove_device(struct pci_dev *); /** + * eeh_remove_device_recursive - undo EEH for device & children. 
+ * @dev: pci device to be removed + * + * As above, this removes the device; it also removes child + * pci devices as well. + */ +void eeh_remove_bus_device(struct pci_dev *); + +/** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. * * If this macro yields TRUE, the caller relays to eeh_check_failure() Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:28:58.955128188 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:41.271315241 -0600 @@ -253,17 +253,6 @@ return dev; } -static void enable_eeh(struct device_node *dn) -{ - struct device_node *sib; - - for (sib = dn->child; sib; sib = sib->sibling) - enable_eeh(sib); - eeh_add_device_early(dn); - return; - -} - static void print_slot_pci_funcs(struct pci_bus *bus) { struct device_node *dn; @@ -289,7 +278,7 @@ if (!dn) goto exit; - enable_eeh(dn); + eeh_add_device_tree_early(dn); dev = rpaphp_pci_config_slot(bus); if (!dev) { err("%s: can't find any devices.\n", __FUNCTION__); @@ -303,30 +292,12 @@ } EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter); -static void rpaphp_eeh_remove_bus_device(struct pci_dev *dev) -{ - eeh_remove_device(dev); - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { - struct pci_bus *bus = dev->subordinate; - struct list_head *ln; - if (!bus) - return; - for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { - struct pci_dev *pdev = pci_dev_b(ln); - if (pdev) - rpaphp_eeh_remove_bus_device(pdev); - } - - } - return; -} - int rpaphp_unconfig_pci_adapter(struct pci_bus *bus) { struct pci_dev *dev, *tmp; list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { - rpaphp_eeh_remove_bus_device(dev); + eeh_remove_bus_device(dev); pci_remove_bus_device(dev); } return 0; From linas at linas.org Fri Nov 4 11:51:46 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:51:46 -0600 Subject: 
[PATCH 21/42]: PCI: cleanup/simplify ppc64 PCI hotplug code References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005146.GA27008@mail.gnucash.org> 21-rpaphp-eeh-cleanup.patch This patch cleans up some rpa dlpar code. Basically, rpaphp_config_pci_adapter() was a wrapper routine, which made two calls and wrapped a bunch of verbose no-op code around them. It has been consolidated with the routine it called. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:41.271315241 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:48.081360405 -0600 @@ -221,18 +221,21 @@ rpaphp_pci_config_slot() will configure all devices under the given slot->dn and return the the first pci_dev. *****************************************************************************/ -static struct pci_dev * -rpaphp_pci_config_slot(struct pci_bus *bus) +int +rpaphp_config_pci_adapter(struct pci_bus *bus) { struct device_node *dn = pci_bus_to_OF_node(bus); struct pci_dev *dev = NULL; + int rc = -ENODEV; int slotno; int num; dbg("Enter %s: dn=%s bus=%s\n", __FUNCTION__, dn->full_name, bus->name); if (!dn || !dn->child) - return NULL; + goto exit; + eeh_add_device_tree_early(dn); + slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); /* pci_scan_slot should find all children */ @@ -243,15 +246,23 @@ } if (list_empty(&bus->devices)) { err("%s: No new device found\n", __FUNCTION__); - return NULL; + goto exit; } list_for_each_entry(dev, &bus->devices, bus_list) { if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) rpaphp_pci_config_bridge(dev); } - return dev; + dbg("%s: pci_devs of slot[%s]\n", __FUNCTION__, dn->full_name); + list_for_each_entry (dev, &bus->devices, bus_list) + dbg("\t%s\n", pci_name(dev)); + + rc = 0; +exit: + dbg("Exit %s: rc=%d\n", __FUNCTION__, rc); + return rc; }
+EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter); static void print_slot_pci_funcs(struct pci_bus *bus) { @@ -268,30 +279,6 @@ return; } -int rpaphp_config_pci_adapter(struct pci_bus *bus) -{ - struct device_node *dn = pci_bus_to_OF_node(bus); - struct pci_dev *dev; - int rc = -ENODEV; - - dbg("Entry %s: slot[%s]\n", __FUNCTION__, dn->full_name); - if (!dn) - goto exit; - - eeh_add_device_tree_early(dn); - dev = rpaphp_pci_config_slot(bus); - if (!dev) { - err("%s: can't find any devices.\n", __FUNCTION__); - goto exit; - } - print_slot_pci_funcs(bus); - rc = 0; -exit: - dbg("Exit %s: rc=%d\n", __FUNCTION__, rc); - return rc; -} -EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter); - int rpaphp_unconfig_pci_adapter(struct pci_bus *bus) { struct pci_dev *dev, *tmp; From linas at linas.org Fri Nov 4 11:52:01 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:52:01 -0600 Subject: [PATCH 22/42]: PCI: remove duplicated pci hotplug code References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005201.GA27016@mail.gnucash.org> 22-rpaphp-eliminate-dupe-code.patch The RPAPHP code contains two routines that appear to be gratuitous copies of very similar pci code. In particular, rpaphp_claim_resource ~~ pci_claim_resource rpadlpar_claim_one_bus == pcibios_claim_one_bus This patch removes the rpaphp versions of the code. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:48.081360405 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:51.785840999 -0600 @@ -62,28 +62,6 @@ } EXPORT_SYMBOL_GPL(rpaphp_find_pci_bus); -int rpaphp_claim_resource(struct pci_dev *dev, int resource) -{ - struct resource *res = &dev->resource[resource]; - struct resource *root = pci_find_parent_resource(dev, res); - char *dtype = resource < PCI_BRIDGE_RESOURCES ?
"device" : "bridge"; - int err = -EINVAL; - - if (root != NULL) { - err = request_resource(root, res); - } - - if (err) { - err("PCI: %s region %d of %s %s [%lx:%lx]\n", - root ? "Address space collision on" : - "No parent found for", - resource, dtype, pci_name(dev), res->start, res->end); - } - return err; -} - -EXPORT_SYMBOL_GPL(rpaphp_claim_resource); - static int rpaphp_get_sensor_state(struct slot *slot, int *state) { int rc; @@ -178,7 +156,7 @@ if (r->parent || !r->start || !r->flags) continue; - rpaphp_claim_resource(dev, i); + pci_claim_resource(dev, i); } } } Index: linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:35:52.800111285 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:36:51.793839877 -0600 @@ -112,28 +112,6 @@ return NULL; } -static void rpadlpar_claim_one_bus(struct pci_bus *b) -{ - struct list_head *ld; - struct pci_bus *child_bus; - - for (ld = b->devices.next; ld != &b->devices; ld = ld->next) { - struct pci_dev *dev = pci_dev_b(ld); - int i; - - for (i = 0; i < PCI_NUM_RESOURCES; i++) { - struct resource *r = &dev->resource[i]; - - if (r->parent || !r->start || !r->flags) - continue; - rpaphp_claim_resource(dev, i); - } - } - - list_for_each_entry(child_bus, &b->children, node) - rpadlpar_claim_one_bus(child_bus); -} - static int pci_add_secondary_bus(struct device_node *dn, struct pci_dev *bridge_dev) { @@ -158,7 +136,7 @@ pcibios_fixup_bus(child); /* Claim new bus resources */ - rpadlpar_claim_one_bus(bridge_dev->bus); + pcibios_claim_one_bus(bridge_dev->bus); if (hose->last_busno < child->number) hose->last_busno = child->number; Index: linux-2.6.14-git3/arch/ppc64/kernel/pci.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/pci.c 2005-11-02 14:28:57.119385510 -0600 +++ 
linux-2.6.14-git3/arch/ppc64/kernel/pci.c 2005-11-02 14:36:51.808837774 -0600 @@ -197,7 +197,7 @@ spin_unlock(&hose_spinlock); } -static void __init pcibios_claim_one_bus(struct pci_bus *b) +void __devinit pcibios_claim_one_bus(struct pci_bus *b) { struct pci_dev *dev; struct pci_bus *child_bus; Index: linux-2.6.14-git3/include/asm-ppc64/pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/pci.h 2005-11-02 14:28:57.119385510 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/pci.h 2005-11-02 14:36:51.813837073 -0600 @@ -160,6 +160,8 @@ extern void pcibios_fixup_device_resources(struct pci_dev *dev, struct pci_bus *bus); +extern void pcibios_claim_one_bus(struct pci_bus *b); + extern struct pci_controller *init_phb_dynamic(struct device_node *dn); extern int pci_read_irq_line(struct pci_dev *dev); From linas at linas.org Fri Nov 4 11:52:16 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:52:16 -0600 Subject: [PATCH 23/42]: ppc64: migrate common PCI hotplug code References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005216.GA27025@mail.gnucash.org> 23-rpaphp-migrate.patch This patch moves some pci device add & remove code from the PCI hotplug directory to the arch/ppc64/kernel directory, and cleans it up a tad. The primary reason for this is that the code performs some fairly generic operations that are shared with the PCI error recovery code (living in the arch/ppc64/kernel directory). Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/pci_dlpar.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/pci_dlpar.c 2005-11-02 14:39:24.724396565 -0600 @@ -0,0 +1,174 @@ +/* + * PCI Dynamic LPAR, PCI Hot Plug and PCI EEH recovery code + * for RPA-compliant PPC64 platform. 
+ * Copyright (C) 2003 Linda Xie + * Copyright (C) 2005 International Business Machines + * + * Updates, 2005, John Rose + * Updates, 2005, Linas Vepstas + * + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or + * NON INFRINGEMENT. See the GNU General Public License for more + * details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include + +static struct pci_bus * +find_bus_among_children(struct pci_bus *bus, + struct device_node *dn) +{ + struct pci_bus *child = NULL; + struct list_head *tmp; + struct device_node *busdn; + + busdn = pci_bus_to_OF_node(bus); + if (busdn == dn) + return bus; + + list_for_each(tmp, &bus->children) { + child = find_bus_among_children(pci_bus_b(tmp), dn); + if (child) + break; + }; + return child; +} + +struct pci_bus * +pcibios_find_pci_bus(struct device_node *dn) +{ + struct pci_dn *pdn = dn->data; + + if (!pdn || !pdn->phb || !pdn->phb->bus) + return NULL; + + return find_bus_among_children(pdn->phb->bus, dn); +} + +/** + * pcibios_remove_pci_devices - remove all devices under this bus + * + * Remove all of the PCI devices under this bus both from the + * linux pci device tree, and from the ppc64 EEH address cache. 
+ */ +void +pcibios_remove_pci_devices(struct pci_bus *bus) +{ + struct pci_dev *dev, *tmp; + + list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { + eeh_remove_bus_device(dev); + pci_remove_bus_device(dev); + } +} + +/* Must be called before pci_bus_add_devices */ +static void +pcibios_fixup_new_pci_devices(struct pci_bus *bus, int fix_bus) +{ + struct pci_dev *dev; + + list_for_each_entry(dev, &bus->devices, bus_list) { + /* + * Skip already-present devices (which are on the + * global device list.) + */ + if (list_empty(&dev->global_list)) { + int i; + + /* Need to setup IOMMU tables */ + ppc_md.iommu_dev_setup(dev); + + if(fix_bus) + pcibios_fixup_device_resources(dev, bus); + pci_read_irq_line(dev); + for (i = 0; i < PCI_NUM_RESOURCES; i++) { + struct resource *r = &dev->resource[i]; + + if (r->parent || !r->start || !r->flags) + continue; + pci_claim_resource(dev, i); + } + } + } +} + +static int +pcibios_pci_config_bridge(struct pci_dev *dev) +{ + u8 sec_busno; + struct pci_bus *child_bus; + struct pci_dev *child_dev; + + /* Get busno of downstream bus */ + pci_read_config_byte(dev, PCI_SECONDARY_BUS, &sec_busno); + + /* Add to children of PCI bridge dev->bus */ + child_bus = pci_add_new_bus(dev->bus, dev, sec_busno); + if (!child_bus) { + printk (KERN_ERR "%s: could not add second bus\n", __FUNCTION__); + return -EIO; + } + sprintf(child_bus->name, "PCI Bus #%02x", child_bus->number); + + pci_scan_child_bus(child_bus); + + list_for_each_entry(child_dev, &child_bus->devices, bus_list) { + eeh_add_device_late(child_dev); + } + + /* Fixup new pci devices without touching bus struct */ + pcibios_fixup_new_pci_devices(child_bus, 0); + + /* Make the discovered devices available */ + pci_bus_add_devices(child_bus); + return 0; +} + +/** + * pcibios_add_pci_devices - adds new pci devices to bus + * + * This routine will find and fixup new pci devices under + * the indicated bus. 
This routine presumes that there + * might already be some devices under this bridge, so + * it carefully tries to add only new devices. (And that + * is how this routine differs from other, similar pcibios + * routines.) + */ +void +pcibios_add_pci_devices(struct pci_bus * bus) +{ + int slotno, num; + struct pci_dev *dev; + struct device_node *dn = pci_bus_to_OF_node(bus); + + eeh_add_device_tree_early(dn); + + /* pci_scan_slot should find all children */ + slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); + num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); + if (num) { + pcibios_fixup_new_pci_devices(bus, 1); + pci_bus_add_devices(bus); + } + + list_for_each_entry(dev, &bus->devices, bus_list) { + eeh_add_device_late (dev); + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + pcibios_pci_config_bridge(dev); + } +} Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:51.785840999 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:39:24.730395724 -0600 @@ -32,36 +32,6 @@ #include "../pci.h" /* for pci_add_new_bus */ #include "rpaphp.h" -static struct pci_bus *find_bus_among_children(struct pci_bus *bus, - struct device_node *dn) -{ - struct pci_bus *child = NULL; - struct list_head *tmp; - struct device_node *busdn; - - busdn = pci_bus_to_OF_node(bus); - if (busdn == dn) - return bus; - - list_for_each(tmp, &bus->children) { - child = find_bus_among_children(pci_bus_b(tmp), dn); - if (child) - break; - } - return child; -} - -struct pci_bus *rpaphp_find_pci_bus(struct device_node *dn) -{ - struct pci_dn *pdn = dn->data; - - if (!pdn || !pdn->phb || !pdn->phb->bus) - return NULL; - - return find_bus_among_children(pdn->phb->bus, dn); -} -EXPORT_SYMBOL_GPL(rpaphp_find_pci_bus); - static int rpaphp_get_sensor_state(struct slot *slot, int *state) { int rc; @@ -120,7 +90,7 @@ /* config/unconfig adapter */ 
*value = slot->state; } else { - bus = rpaphp_find_pci_bus(slot->dn); + bus = pcibios_find_pci_bus(slot->dn); if (bus && !list_empty(&bus->devices)) *value = CONFIGURED; else @@ -131,117 +101,6 @@ return rc; } -/* Must be called before pci_bus_add_devices */ -static void -rpaphp_fixup_new_pci_devices(struct pci_bus *bus, int fix_bus) -{ - struct pci_dev *dev; - - list_for_each_entry(dev, &bus->devices, bus_list) { - /* - * Skip already-present devices (which are on the - * global device list.) - */ - if (list_empty(&dev->global_list)) { - int i; - - /* Need to setup IOMMU tables */ - ppc_md.iommu_dev_setup(dev); - - if(fix_bus) - pcibios_fixup_device_resources(dev, bus); - pci_read_irq_line(dev); - for (i = 0; i < PCI_NUM_RESOURCES; i++) { - struct resource *r = &dev->resource[i]; - - if (r->parent || !r->start || !r->flags) - continue; - pci_claim_resource(dev, i); - } - } - } -} - -static int rpaphp_pci_config_bridge(struct pci_dev *dev) -{ - u8 sec_busno; - struct pci_bus *child_bus; - struct pci_dev *child_dev; - - dbg("Enter %s: BRIDGE dev=%s\n", __FUNCTION__, pci_name(dev)); - - /* get busno of downstream bus */ - pci_read_config_byte(dev, PCI_SECONDARY_BUS, &sec_busno); - - /* add to children of PCI bridge dev->bus */ - child_bus = pci_add_new_bus(dev->bus, dev, sec_busno); - if (!child_bus) { - err("%s: could not add second bus\n", __FUNCTION__); - return -EIO; - } - sprintf(child_bus->name, "PCI Bus #%02x", child_bus->number); - /* do pci_scan_child_bus */ - pci_scan_child_bus(child_bus); - - list_for_each_entry(child_dev, &child_bus->devices, bus_list) { - eeh_add_device_late(child_dev); - } - - /* fixup new pci devices without touching bus struct */ - rpaphp_fixup_new_pci_devices(child_bus, 0); - - /* Make the discovered devices available */ - pci_bus_add_devices(child_bus); - return 0; -} - -/***************************************************************************** - rpaphp_pci_config_slot() will configure all devices under the - given slot->dn and 
return the the first pci_dev. - *****************************************************************************/ -int -rpaphp_config_pci_adapter(struct pci_bus *bus) -{ - struct device_node *dn = pci_bus_to_OF_node(bus); - struct pci_dev *dev = NULL; - int rc = -ENODEV; - int slotno; - int num; - - dbg("Enter %s: dn=%s bus=%s\n", __FUNCTION__, dn->full_name, bus->name); - if (!dn || !dn->child) - goto exit; - - eeh_add_device_tree_early(dn); - - slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); - - /* pci_scan_slot should find all children */ - num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); - if (num) { - rpaphp_fixup_new_pci_devices(bus, 1); - pci_bus_add_devices(bus); - } - if (list_empty(&bus->devices)) { - err("%s: No new device found\n", __FUNCTION__); - goto exit; - } - list_for_each_entry(dev, &bus->devices, bus_list) { - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) - rpaphp_pci_config_bridge(dev); - } - - dbg("%s: pci_devs of slot[%s]\n", __FUNCTION__, dn->full_name); - list_for_each_entry (dev, &bus->devices, bus_list) - dbg("\t%s\n", pci_name(dev)); - - rc = 0; -exit: - dbg("Exit %s: rc=%d\n", __FUNCTION__, rc); - return rc; -} -EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter); - static void print_slot_pci_funcs(struct pci_bus *bus) { struct device_node *dn; @@ -257,17 +116,6 @@ return; } -int rpaphp_unconfig_pci_adapter(struct pci_bus *bus) -{ - struct pci_dev *dev, *tmp; - - list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { - eeh_remove_bus_device(dev); - pci_remove_bus_device(dev); - } - return 0; -} - static int setup_pci_hotplug_slot_info(struct slot *slot) { dbg("%s Initilize the PCI slot's hotplug->info structure ...\n", @@ -303,7 +151,7 @@ struct pci_bus *bus; BUG_ON(!dn); - bus = rpaphp_find_pci_bus(dn); + bus = pcibios_find_pci_bus(dn); if (!bus) { err("%s: no pci_bus for dn %s\n", __FUNCTION__, dn->full_name); goto exit_rc; @@ -328,10 +176,7 @@ if (slot->hotplug_slot->info->adapter_status == NOT_CONFIGURED) { dbg("%s CONFIGURING pci adapter 
in slot[%s]\n", __FUNCTION__, slot->name); - if (rpaphp_config_pci_adapter(slot->bus)) { - err("%s: CONFIG pci adapter failed\n", __FUNCTION__); - goto exit_rc; - } + pcibios_add_pci_devices(slot->bus); } else if (slot->hotplug_slot->info->adapter_status != CONFIGURED) { err("%s: slot[%s]'s adapter_status is NOT_VALID.\n", @@ -377,16 +222,10 @@ /* if slot is not empty, enable the adapter */ if (state == PRESENT) { dbg("%s : slot[%s] is occupied.\n", __FUNCTION__, slot->name); - retval = rpaphp_config_pci_adapter(slot->bus); - if (!retval) { - slot->state = CONFIGURED; - dbg("%s: PCI devices in slot[%s] has been configured\n", + pcibios_add_pci_devices(slot->bus); + slot->state = CONFIGURED; + dbg("%s: PCI devices in slot[%s] has been configured\n", __FUNCTION__, slot->name); - } else { - slot->state = NOT_CONFIGURED; - dbg("%s: no pci_dev struct for adapter in slot[%s]\n", - __FUNCTION__, slot->name); - } } else if (state == EMPTY) { dbg("%s : slot[%s] is empty\n", __FUNCTION__, slot->name); slot->state = EMPTY; Index: linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:36:51.793839877 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:39:24.737394743 -0600 @@ -197,9 +197,8 @@ static int dlpar_add_pci_slot(char *drc_name, struct device_node *dn) { struct pci_dev *dev; - int rc; - if (rpaphp_find_pci_bus(dn)) + if (pcibios_find_pci_bus(dn)) return -EINVAL; /* Add pci bus */ @@ -211,12 +210,7 @@ } if (dn->child) { - rc = rpaphp_config_pci_adapter(dev->subordinate); - if (rc < 0) { - printk(KERN_ERR "%s: unable to enable slot %s\n", - __FUNCTION__, drc_name); - return -EIO; - } + pcibios_add_pci_devices(dev->subordinate); } /* Add hotplug slot */ @@ -255,7 +249,7 @@ struct pci_dn *pdn; int rc = 0; - if (!rpaphp_find_pci_bus(dn)) + if (!pcibios_find_pci_bus(dn)) return -EINVAL; slot = 
find_slot(dn); @@ -400,7 +394,7 @@ struct pci_bus *bus; struct slot *slot; - bus = rpaphp_find_pci_bus(dn); + bus = pcibios_find_pci_bus(dn); if (!bus) return -EINVAL; Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_core.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_core.c 2005-11-02 14:28:55.984544585 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_core.c 2005-11-02 14:39:24.744393761 -0600 @@ -426,7 +426,8 @@ dbg("DISABLING SLOT %s\n", slot->name); down(&rpaphp_sem); - retval = rpaphp_unconfig_pci_adapter(slot->bus); + pcibios_remove_pci_devices(slot->bus); + retval = 0; up(&rpaphp_sem); slot->state = NOT_CONFIGURED; info("%s: devices in slot[%s] unconfigured.\n", __FUNCTION__, Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:32:55.306995693 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:40:05.531674439 -0600 @@ -1,6 +1,6 @@ obj-y := pci.o lpar.o hvCall.o nvram.o reconfig.o \ - setup.o iommu.o rtas-fw.o ras.o + setup.o iommu.o rtas-fw.o ras.o pci_dlpar.o obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o -obj-$(CONFIG_EEH) += eeh.o eeh_event.o +obj-$(CONFIG_EEH) += eeh.o eeh_event.o Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h 2005-11-02 14:28:55.984544585 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h 2005-11-02 14:39:24.755392219 -0600 @@ -121,9 +121,18 @@ return bus->sysdata; /* Must be root bus (PHB) */ } +/** Find the bus corresponding to the indicated device node */ +struct pci_bus * pcibios_find_pci_bus(struct device_node *dn); + extern void pci_process_bridge_OF_ranges(struct pci_controller 
*hose, struct device_node *dev, int primary); +/** Remove all of the PCI devices under this bus */ +void pcibios_remove_pci_devices(struct pci_bus *bus); + +/** Discover new pci devices under this bus, and add them */ +void pcibios_add_pci_devices(struct pci_bus * bus); + extern int pcibios_remove_root_bus(struct pci_controller *phb); extern void phbs_remap_io(void); From linas at linas.org Fri Nov 4 11:52:49 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:52:49 -0600 Subject: [PATCH 24/42]: ppc64: PCI Error Recovery: PPC64 core recovery routines References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005249.GA27034@mail.gnucash.org> Various PCI bus errors can be signaled by newer PCI controllers. The core error recovery routines are architecture dependent. This patch adds a recovery infrastructure for the PPC64 pSeries systems. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:36:41.255317484 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:41:18.427452474 -0600 @@ -485,6 +485,11 @@ if (PCI_DN(dn)) { PCI_DN(dn)->eeh_mode |= mode_flag; + /* Mark the pci device driver too */ + struct pci_dev *dev = PCI_DN(dn)->pcidev; + if (dev && dev->driver) + dev->error_state = pci_channel_io_frozen; + if (dn->child) __eeh_mark_slot (dn->child, mode_flag); } @@ -544,6 +549,7 @@ int rets[3]; unsigned long flags; struct pci_dn *pdn; + enum pci_channel_state state; int rc = 0; __get_cpu_var(total_mmio_ffs)++; @@ -648,8 +654,13 @@ eeh_mark_slot (dn, EEH_MODE_ISOLATED); spin_unlock_irqrestore(&confirm_error_lock, flags); - eeh_send_failure_event (dn, dev, rets[0], rets[2]); - + state = pci_channel_io_normal; + if ((rets[0] == 2) || (rets[0] == 4)) + state = pci_channel_io_frozen; + if (rets[0] == 5) + state = 
pci_channel_io_perm_failure; + eeh_send_failure_event (dn, dev, state, rets[2]); + /* Most EEH events are due to device driver bugs. Having * a stack trace will help the device-driver authors figure * out what happened. So print that out. */ @@ -953,8 +964,10 @@ * But there are a few cases like display devices that make sense. */ enable = 1; /* i.e. we will do checking */ +#if 0 if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY) enable = 0; +#endif if (!enable) pdn->eeh_mode |= EEH_MODE_NOCHECK; Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:41:18.435451353 -0600 @@ -0,0 +1,366 @@ +/* + * PCI Error Recovery Driver for RPA-compliant PPC64 platform. + * Copyright (C) 2004, 2005 Linas Vepstas + * + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or + * NON INFRINGEMENT. See the GNU General Public License for more + * details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ * + * Send feedback to + * + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +static inline const char * pcid_name (struct pci_dev *pdev) +{ + if (pdev->dev.driver) + return pdev->dev.driver->name; + return ""; +} + +/** + * Return the "partitionable endpoint" (pe) under which this device lies + */ +static struct device_node * find_device_pe(struct device_node *dn) +{ + while ((dn->parent) && PCI_DN(dn->parent) && + (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { + dn = dn->parent; + } + return dn; +} + + +#ifdef DEBUG +static void print_device_node_tree (struct pci_dn *pdn, int dent) +{ + int i; + if (!pdn) return; + for (i=0;inode->name, pdn->eeh_mode, pdn->eeh_config_addr, + pdn->eeh_pe_config_addr, pdn->node->full_name); + dent += 3; + struct device_node *pc = pdn->node->child; + while (pc) { + print_device_node_tree(PCI_DN(pc), dent); + pc = pc->sibling; + } +} +#endif + +/** + * irq_in_use - return true if this irq is being used + */ +static int irq_in_use(unsigned int irq) +{ + int rc = 0; + unsigned long flags; + struct irq_desc *desc = irq_desc + irq; + + spin_lock_irqsave(&desc->lock, flags); + if (desc->action) + rc = 1; + spin_unlock_irqrestore(&desc->lock, flags); + return rc; +} + +/* ------------------------------------------------------- */ +/** eeh_report_error - report an EEH error to each device, + * collect up and merge the device responses. 
+ */ + +static void eeh_report_error(struct pci_dev *dev, void *userdata) +{ + enum pcierr_result rc, *res = userdata; + struct pci_driver *driver = dev->driver; + + dev->error_state = pci_channel_io_frozen; + + if (!driver) + return; + + if (irq_in_use (dev->irq)) { + struct device_node *dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->eeh_mode |= EEH_MODE_IRQ_DISABLED; + disable_irq_nosync(dev->irq); + } + if (!driver->err_handler) + return; + if (!driver->err_handler->error_detected) + return; + + rc = driver->err_handler->error_detected (dev, pci_channel_io_frozen); + if (*res == PCIERR_RESULT_NONE) *res = rc; + if (*res == PCIERR_RESULT_NEED_RESET) return; + if (*res == PCIERR_RESULT_DISCONNECT && + rc == PCIERR_RESULT_NEED_RESET) *res = rc; +} + +/** eeh_report_reset -- tell this device that the pci slot + * has been reset. + */ + +static void eeh_report_reset(struct pci_dev *dev, void *userdata) +{ + struct pci_driver *driver = dev->driver; + struct device_node *dn = pci_device_to_OF_node(dev); + + if (!driver) + return; + + if ((PCI_DN(dn)->eeh_mode) & EEH_MODE_IRQ_DISABLED) { + PCI_DN(dn)->eeh_mode &= ~EEH_MODE_IRQ_DISABLED; + enable_irq(dev->irq); + } + if (!driver->err_handler) + return; + if (!driver->err_handler->slot_reset) + return; + + driver->err_handler->slot_reset(dev); +} + +static void eeh_report_resume(struct pci_dev *dev, void *userdata) +{ + struct pci_driver *driver = dev->driver; + + dev->error_state = pci_channel_io_normal; + + if (!driver) + return; + if (!driver->err_handler) + return; + if (!driver->err_handler->resume) + return; + + driver->err_handler->resume(dev); +} + +static void eeh_report_failure(struct pci_dev *dev, void *userdata) +{ + struct pci_driver *driver = dev->driver; + + dev->error_state = pci_channel_io_perm_failure; + + if (!driver) + return; + + if (irq_in_use (dev->irq)) { + struct device_node *dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->eeh_mode |= EEH_MODE_IRQ_DISABLED; + disable_irq_nosync(dev->irq); + } + if 
(!driver->err_handler) + return; + if (!driver->err_handler->error_detected) + return; + driver->err_handler->error_detected(dev, pci_channel_io_perm_failure); +} + +/* ------------------------------------------------------- */ +/** + * handle_eeh_events -- reset a PCI device after hard lockup. + * + * pSeries systems will isolate a PCI slot if the PCI-Host + * bridge detects address or data parity errors, DMA's + * occurring to wild addresses (which usually happen due to + * bugs in device drivers or in PCI adapter firmware). + * Slot isolations also occur if #SERR, #PERR or other misc + * PCI-related errors are detected. + * + * Recovery process consists of unplugging the device driver + * (which generated hotplug events to userspace), then issuing + * a PCI #RST to the device, then reconfiguring the PCI config + * space for all bridges & devices under this slot, and then + * finally restarting the device drivers (which cause a second + * set of hotplug events to go out to userspace). + */ + +/** + * eeh_reset_device() -- perform actual reset of a pci slot + * Args: bus: pointer to the pci bus structure corresponding + * to the isolated slot. A non-null value will + * cause all devices under the bus to be removed + * and then re-added. + * pe_dn: pointer to a "Partitionable Endpoint" device node. + * This is the top-level structure on which pci + * bus resets can be performed. + */ + +static void eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus) +{ + if (bus) + pcibios_remove_pci_devices(bus); + + /* Reset the pci controller. (Asserts RST#; resets config space). + * Reconfigure bridges and devices */ + rtas_set_slot_reset(pe_dn); + + /* Walk over all functions on this device */ + rtas_configure_bridge(pe_dn); + eeh_restore_bars(pe_dn); + + /* Give the system 5 seconds to finish running the user-space + * hotplug shutdown scripts, e.g. ifdown for ethernet.
Yes, + * this is a hack, but if we don't do this, and try to bring + * the device up before the scripts have taken it down, + * potentially weird things happen. + */ + if (bus) { + ssleep (5); + pcibios_add_pci_devices(bus); + } +} + +/* The longest amount of time to wait for a pci device + * to come back on line, in seconds. + */ +#define MAX_WAIT_FOR_RECOVERY 15 + +void handle_eeh_events (struct eeh_event *event) +{ + struct device_node *frozen_dn; + struct pci_dn *frozen_pdn; + struct pci_bus *frozen_bus; + int perm_failure = 0; + + frozen_dn = find_device_pe(event->dn); + frozen_bus = pcibios_find_pci_bus(frozen_dn); + + if (!frozen_dn) { + printk(KERN_ERR "EEH: Error: Cannot find partition endpoint for %s\n", + pci_name(event->dev)); + return; + } + + /* There are two different styles for coming up with the PE. + * In the old style, it was the highest EEH-capable device + * which was always an EADS pci bridge. In the new style, + * there might not be any EADS bridges, and even when there are, + * the firmware marks them as "EEH incapable". So another + * two-step is needed to find the pci bus.. */ + if (!frozen_bus) + frozen_bus = pcibios_find_pci_bus (frozen_dn->parent); + + if (!frozen_bus) { + printk(KERN_ERR "EEH: Cannot find PCI bus for %s\n", + frozen_dn->full_name); + return; + } + +#if 0 + /* We may get "permanent failure" messages on empty slots. + * These are false alarms. Empty slots have no child dn. */ + if ((event->state == pci_channel_io_perm_failure) && (frozen_device == NULL)) + return; +#endif + + frozen_pdn = PCI_DN(frozen_dn); + frozen_pdn->eeh_freeze_count++; + + if (frozen_pdn->eeh_freeze_count > EEH_MAX_ALLOWED_FREEZES) + perm_failure = 1; + + /* If the reset state is a '5' and the time to reset is 0 (infinity) + * or is more than 15 seconds, then mark this as a permanent failure.
+ */ + if ((event->state == pci_channel_io_perm_failure) && + ((event->time_unavail <= 0) || + (event->time_unavail > MAX_WAIT_FOR_RECOVERY*1000))) + { + perm_failure = 1; + } + + /* Log the error with the rtas logger. */ + if (perm_failure) { + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards. + */ + printk(KERN_ERR + "EEH: PCI device %s - %s has failed %d times \n" + "and has been permanently disabled. Please try reseating\n" + "this device or replacing it.\n", + pci_name (frozen_pdn->pcidev), + pcid_name(frozen_pdn->pcidev), + frozen_pdn->eeh_freeze_count); + + eeh_slot_error_detail(frozen_pdn, 2 /* Permanent Error */); + + /* Notify all devices that they're about to go down. */ + pci_walk_bus(frozen_bus, eeh_report_failure, 0); + + /* Shut down the device drivers for good. */ + pcibios_remove_pci_devices(frozen_bus); + return; + } + + eeh_slot_error_detail(frozen_pdn, 1 /* Temporary Error */); + printk(KERN_WARNING + "EEH: This PCI device has failed %d times since last reboot: %s - %s\n", + frozen_pdn->eeh_freeze_count, + pci_name (frozen_pdn->pcidev), + pcid_name(frozen_pdn->pcidev)); + + /* Walk the various device drivers attached to this slot through + * a reset sequence, giving each an opportunity to do what it needs + * to accomplish the reset. Each child gets a report of the + * status ... if any child can't handle the reset, then the entire + * slot is dlpar removed and added. + */ + enum pcierr_result result = PCIERR_RESULT_NONE; + pci_walk_bus(frozen_bus, eeh_report_error, &result); + + /* If all device drivers were EEH-unaware, then shut + * down all of the device drivers, and hope they + * go down willingly, without panicking the system.
+ */ + if (result == PCIERR_RESULT_NONE) { + eeh_reset_device(frozen_pdn, frozen_bus); + } + + /* If any device called out for a reset, then reset the slot */ + if (result == PCIERR_RESULT_NEED_RESET) { + eeh_reset_device(frozen_pdn, NULL); + pci_walk_bus(frozen_bus, eeh_report_reset, 0); + } + + /* If all devices reported they can proceed, then re-enable PIO */ + if (result == PCIERR_RESULT_CAN_RECOVER) { + /* XXX Not supported; we brute-force reset the device */ + eeh_reset_device(frozen_pdn, NULL); + pci_walk_bus(frozen_bus, eeh_report_reset, 0); + } + + /* Tell all device drivers that they can resume operations */ + pci_walk_bus(frozen_bus, eeh_report_resume, 0); +} + +/* ---------- end of file ---------- */ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_event.c 2005-11-02 14:32:35.731739983 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c 2005-11-02 14:41:18.440450652 -0600 @@ -21,6 +21,7 @@ #include #include #include +#include /** Overview: * EEH error states may be detected within exception handlers; @@ -37,31 +38,6 @@ DECLARE_WORK(eeh_event_wq, eeh_thread_launcher, NULL); /** - * eeh_panic - call panic() for an eeh event that cannot be handled. - * The philosophy of this routine is that it is better to panic and - * halt the OS than it is to risk possible data corruption by - * oblivious device drivers that don't know better. - * - * @dev pci device that had an eeh event - * @reset_state current reset state of the device slot - */ -static void eeh_panic(struct pci_dev *dev, int reset_state) -{ - /* - * Since the panic_on_oops sysctl is used to halt the system - * in light of potential corruption, we can use it here.
- */ - if (panic_on_oops) { - panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, - pci_name(dev)); - } - else { - printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", - reset_state, pci_name(dev)); - } -} - -/** * eeh_event_handler - dispatch EEH events. The detection of a frozen * slot can occur inside an interrupt, where it can be hard to do * anything about it. The goal of this routine is to pull these @@ -82,10 +58,16 @@ spin_lock_irqsave(&eeh_eventlist_lock, flags); event = NULL; + + /* Unqueue the event, get ready to process. */ if (!list_empty(&eeh_eventlist)) { event = list_entry(eeh_eventlist.next, struct eeh_event, list); list_del(&event->list); } + + if (event) + eeh_mark_slot(event->dn, EEH_MODE_RECOVERING); + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); if (event == NULL) break; @@ -93,8 +75,11 @@ printk(KERN_INFO "EEH: Detected PCI bus error on device %s\n", pci_name(event->dev)); - eeh_panic (event->dev, event->state); + handle_eeh_events(event); + + eeh_clear_slot(event->dn, EEH_MODE_RECOVERING); + pci_dev_put(event->dev); kfree(event); } @@ -122,7 +107,7 @@ */ int eeh_send_failure_event (struct device_node *dn, struct pci_dev *dev, - int state, + enum pci_channel_state state, int time_unavail) { unsigned long flags; Index: linux-2.6.14-git3/include/asm-powerpc/eeh_event.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/eeh_event.h 2005-11-02 14:32:35.718741805 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/eeh_event.h 2005-11-02 14:41:18.444450091 -0600 @@ -29,7 +29,7 @@ struct list_head list; struct device_node *dn; /* struct device node */ struct pci_dev *dev; /* affected device */ - int state; + enum pci_channel_state state; /* PCI bus state for the affected device */ int time_unavail; /* milliseconds until device might be available */ }; @@ -46,7 +46,10 @@ */ int eeh_send_failure_event (struct device_node *dn, struct pci_dev *dev, - int reset_state, 
+ enum pci_channel_state state, int time_unavail); +/* Main recovery function */ +void handle_eeh_events (struct eeh_event *); + #endif /* ASM_PPC64_EEH_EVENT_H */ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:40:05.531674439 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:41:48.393250352 -0600 @@ -3,4 +3,4 @@ obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o -obj-$(CONFIG_EEH) += eeh.o eeh_event.o +obj-$(CONFIG_EEH) += eeh.o eeh_driver.o eeh_event.o Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:35:39.295004776 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:41:18.454448689 -0600 @@ -54,6 +54,15 @@ /* ---- EEH internal-use-only related routines ---- */ #ifdef CONFIG_EEH /** + * eeh_slot_error_detail -- record an EEH error condition to the log + * @severity: 1 if temporary, 2 if permanent failure. + * + * Obtains the EEH error details from the RTAS subsystem, + * and then logs these details with the RTAS error log system. + */ +void eeh_slot_error_detail (struct pci_dn *pdn, int severity); + +/** * rtas_set_slot_reset -- unfreeze a frozen slot * * Clear the EEH-frozen condition on a slot.
This routine Index: linux-2.6.14-git3/include/asm-ppc64/eeh.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/eeh.h 2005-11-02 14:36:41.263316362 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/eeh.h 2005-11-02 14:41:18.461447707 -0600 @@ -31,9 +31,11 @@ #ifdef CONFIG_EEH /* Values for eeh_mode bits in device_node */ -#define EEH_MODE_SUPPORTED (1<<0) -#define EEH_MODE_NOCHECK (1<<1) -#define EEH_MODE_ISOLATED (1<<2) +#define EEH_MODE_SUPPORTED (1<<0) +#define EEH_MODE_NOCHECK (1<<1) +#define EEH_MODE_ISOLATED (1<<2) +#define EEH_MODE_RECOVERING (1<<3) +#define EEH_MODE_IRQ_DISABLED (1<<4) /* Max number of EEH freezes allowed before we consider the device * to be permanently disabled. */ From linas at linas.org Fri Nov 4 11:53:07 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:53:07 -0600 Subject: [PATCH 25/42]: ppc64: Split out PCI address cache to its own file References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005307.GA27041@mail.gnucash.org> 25-pci-address-cache.patch The core EEH file is rather large. This patch splits out a self-contained chunk of it into its own file. This is the chunk that performs the caching and lookup of pci devices based on the i/o addresses of their resources. This code is almost architecture-independent and could be used by any system that wanted to find a pci device based only on the i/o address used by the device.
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:41:48.393250352 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:42:58.323443756 -0600 @@ -3,4 +3,4 @@ obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o -obj-$(CONFIG_EEH) += eeh.o eeh_driver.o eeh_event.o +obj-$(CONFIG_EEH) += eeh.o eeh_cache.o eeh_driver.o eeh_event.o Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:41:18.427452474 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:42:38.986155538 -0600 @@ -77,9 +77,6 @@ */ #define EEH_MAX_FAILS 100000 -/* Misc forward declaraions */ -static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn); - /* RTAS tokens */ static int ibm_set_eeh_option; static int ibm_set_slot_reset; @@ -107,296 +104,8 @@ static DEFINE_PER_CPU(unsigned long, ignored_failures); static DEFINE_PER_CPU(unsigned long, slot_resets); -/** - * The pci address cache subsystem. This subsystem places - * PCI device address resources into a red-black tree, sorted - * according to the address range, so that given only an i/o - * address, the corresponding PCI device can be **quickly** - * found. It is safe to perform an address lookup in an interrupt - * context; this ability is an important feature. - * - * Currently, the only customer of this code is the EEH subsystem; - * thus, this code has been somewhat tailored to suit EEH better. - * In particular, the cache does *not* hold the addresses of devices - * for which EEH is not enabled. 
- * - * (Implementation Note: The RB tree seems to be better/faster - * than any hash algo I could think of for this problem, even - * with the penalty of slow pointer chases for d-cache misses). - */ -struct pci_io_addr_range -{ - struct rb_node rb_node; - unsigned long addr_lo; - unsigned long addr_hi; - struct pci_dev *pcidev; - unsigned int flags; -}; - -static struct pci_io_addr_cache -{ - struct rb_root rb_root; - spinlock_t piar_lock; -} pci_io_addr_cache_root; - -static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) -{ - struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; - - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (addr < piar->addr_lo) { - n = n->rb_left; - } else { - if (addr > piar->addr_hi) { - n = n->rb_right; - } else { - pci_dev_get(piar->pcidev); - return piar->pcidev; - } - } - } - - return NULL; -} - -/** - * pci_get_device_by_addr - Get device, given only address - * @addr: mmio (PIO) phys address or i/o port number - * - * Given an mmio phys address, or a port number, find a pci device - * that implements this address. Be sure to pci_dev_put the device - * when finished. I/O port numbers are assumed to be offset - * from zero (that is, they do *not* have pci_io_addr added in). - * It is safe to call this function within an interrupt. - */ -static struct pci_dev *pci_get_device_by_addr(unsigned long addr) -{ - struct pci_dev *dev; - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - dev = __pci_get_device_by_addr(addr); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); - return dev; -} - -#ifdef DEBUG -/* - * Handy-dandy debug print routine, does nothing more - * than print out the contents of our addr cache. 
- */ -static void pci_addr_cache_print(struct pci_io_addr_cache *cache) -{ - struct rb_node *n; - int cnt = 0; - - n = rb_first(&cache->rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", - (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt, - piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); - cnt++; - n = rb_next(n); - } -} -#endif - -/* Insert address range into the rb tree. */ -static struct pci_io_addr_range * -pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, - unsigned long ahi, unsigned int flags) -{ - struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; - struct rb_node *parent = NULL; - struct pci_io_addr_range *piar; - - /* Walk tree, find a place to insert into tree */ - while (*p) { - parent = *p; - piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (ahi < piar->addr_lo) { - p = &parent->rb_left; - } else if (alo > piar->addr_hi) { - p = &parent->rb_right; - } else { - if (dev != piar->pcidev || - alo != piar->addr_lo || ahi != piar->addr_hi) { - printk(KERN_WARNING "PIAR: overlapping address range\n"); - } - return piar; - } - } - piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); - if (!piar) - return NULL; - - piar->addr_lo = alo; - piar->addr_hi = ahi; - piar->pcidev = dev; - piar->flags = flags; - -#ifdef DEBUG - printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", - alo, ahi, pci_name (dev)); -#endif - - rb_link_node(&piar->rb_node, parent, p); - rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); - - return piar; -} - -static void __pci_addr_cache_insert_device(struct pci_dev *dev) -{ - struct device_node *dn; - struct pci_dn *pdn; - int i; - int inserted = 0; - - dn = pci_device_to_OF_node(dev); - if (!dn) { - printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); - return; - } - - /* Skip any devices for 
which EEH is not enabled. */ - pdn = PCI_DN(dn); - if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || - pdn->eeh_mode & EEH_MODE_NOCHECK) { -#ifdef DEBUG - printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", - pci_name(dev), pdn->node->full_name); -#endif - return; - } - - /* The cache holds a reference to the device... */ - pci_dev_get(dev); - - /* Walk resources on this device, poke them into the tree */ - for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { - unsigned long start = pci_resource_start(dev,i); - unsigned long end = pci_resource_end(dev,i); - unsigned int flags = pci_resource_flags(dev,i); - - /* We are interested only bus addresses, not dma or other stuff */ - if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) - continue; - if (start == 0 || ~start == 0 || end == 0 || ~end == 0) - continue; - pci_addr_cache_insert(dev, start, end, flags); - inserted = 1; - } - - /* If there was nothing to add, the cache has no reference... */ - if (!inserted) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_insert_device - Add a device to the address cache - * @dev: PCI device whose I/O addresses we are interested in. - * - * In order to support the fast lookup of devices based on addresses, - * we maintain a cache of devices that can be quickly searched. - * This routine adds a device to that cache. 
- */ -static void pci_addr_cache_insert_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_insert_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) -{ - struct rb_node *n; - int removed = 0; - -restart: - n = rb_first(&pci_io_addr_cache_root.rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (piar->pcidev == dev) { - rb_erase(n, &pci_io_addr_cache_root.rb_root); - removed = 1; - kfree(piar); - goto restart; - } - n = rb_next(n); - } - - /* The cache no longer holds its reference to this device... */ - if (removed) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_remove_device - remove pci device from addr cache - * @dev: device to remove - * - * Remove a device from the addr-cache tree. - * This is potentially expensive, since it will walk - * the tree multiple times (once per resource). - * But so what; device removal doesn't need to be that fast. - */ -static void pci_addr_cache_remove_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_remove_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -/** - * pci_addr_cache_build - Build a cache of I/O addresses - * - * Build a cache of pci i/o addresses. This cache will be used to - * find the pci device that corresponds to a given address. - * This routine scans all pci busses to build the cache. - * Must be run late in boot process, after the pci controllers - * have been scaned for devices (after all device resources are known). 
- */ -void __init pci_addr_cache_build(void) -{ - struct device_node *dn; - struct pci_dev *dev = NULL; - - if (!eeh_subsystem_enabled) - return; - - spin_lock_init(&pci_io_addr_cache_root.piar_lock); - - while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { - /* Ignore PCI bridges ( XXX why ??) */ - if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) { - continue; - } - pci_addr_cache_insert_device(dev); - - /* Save the BAR's; firmware doesn't restore these after EEH reset */ - dn = pci_device_to_OF_node(dev); - eeh_save_bars(dev, PCI_DN(dn)); - } - -#ifdef DEBUG - /* Verify tree built up above, echo back the list of addrs. */ - pci_addr_cache_print(&pci_io_addr_cache_root); -#endif -} - /* --------------------------------------------------------------- */ -/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ +/* Below lies the EEH event infrastructure */ void eeh_slot_error_detail (struct pci_dn *pdn, int severity) { @@ -880,7 +589,7 @@ * PCI devices are added individuallly; but, for the restore, * an entire slot is reset at a time. */ -static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn) +void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn) { int i; Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-02 14:42:38.994154417 -0600 @@ -0,0 +1,317 @@ +/* + * eeh_cache.c + * PCI address cache; allows the lookup of PCI devices based on I/O address + * + * Copyright (C) 2004 Linas Vepstas IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#undef DEBUG + +/** + * The pci address cache subsystem. This subsystem places + * PCI device address resources into a red-black tree, sorted + * according to the address range, so that given only an i/o + * address, the corresponding PCI device can be **quickly** + * found. It is safe to perform an address lookup in an interrupt + * context; this ability is an important feature. + * + * Currently, the only customer of this code is the EEH subsystem; + * thus, this code has been somewhat tailored to suit EEH better. + * In particular, the cache does *not* hold the addresses of devices + * for which EEH is not enabled. + * + * (Implementation Note: The RB tree seems to be better/faster + * than any hash algo I could think of for this problem, even + * with the penalty of slow pointer chases for d-cache misses). 
+ */ +struct pci_io_addr_range +{ + struct rb_node rb_node; + unsigned long addr_lo; + unsigned long addr_hi; + struct pci_dev *pcidev; + unsigned int flags; +}; + +static struct pci_io_addr_cache +{ + struct rb_root rb_root; + spinlock_t piar_lock; +} pci_io_addr_cache_root; + +static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) +{ + struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; + + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + + if (addr < piar->addr_lo) { + n = n->rb_left; + } else { + if (addr > piar->addr_hi) { + n = n->rb_right; + } else { + pci_dev_get(piar->pcidev); + return piar->pcidev; + } + } + } + + return NULL; +} + +/** + * pci_get_device_by_addr - Get device, given only address + * @addr: mmio (PIO) phys address or i/o port number + * + * Given an mmio phys address, or a port number, find a pci device + * that implements this address. Be sure to pci_dev_put the device + * when finished. I/O port numbers are assumed to be offset + * from zero (that is, they do *not* have pci_io_addr added in). + * It is safe to call this function within an interrupt. + */ +struct pci_dev *pci_get_device_by_addr(unsigned long addr) +{ + struct pci_dev *dev; + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + dev = __pci_get_device_by_addr(addr); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); + return dev; +} + +#ifdef DEBUG +/* + * Handy-dandy debug print routine, does nothing more + * than print out the contents of our addr cache. + */ +static void pci_addr_cache_print(struct pci_io_addr_cache *cache) +{ + struct rb_node *n; + int cnt = 0; + + n = rb_first(&cache->rb_root); + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", + (piar->flags & IORESOURCE_IO) ? 
"i/o" : "mem", cnt, + piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); + cnt++; + n = rb_next(n); + } +} +#endif + +/* Insert address range into the rb tree. */ +static struct pci_io_addr_range * +pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, + unsigned long ahi, unsigned int flags) +{ + struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; + struct rb_node *parent = NULL; + struct pci_io_addr_range *piar; + + /* Walk tree, find a place to insert into tree */ + while (*p) { + parent = *p; + piar = rb_entry(parent, struct pci_io_addr_range, rb_node); + if (ahi < piar->addr_lo) { + p = &parent->rb_left; + } else if (alo > piar->addr_hi) { + p = &parent->rb_right; + } else { + if (dev != piar->pcidev || + alo != piar->addr_lo || ahi != piar->addr_hi) { + printk(KERN_WARNING "PIAR: overlapping address range\n"); + } + return piar; + } + } + piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); + if (!piar) + return NULL; + + piar->addr_lo = alo; + piar->addr_hi = ahi; + piar->pcidev = dev; + piar->flags = flags; + +#ifdef DEBUG + printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif + + rb_link_node(&piar->rb_node, parent, p); + rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); + + return piar; +} + +static void __pci_addr_cache_insert_device(struct pci_dev *dev) +{ + struct device_node *dn; + struct pci_dn *pdn; + int i; + int inserted = 0; + + dn = pci_device_to_OF_node(dev); + if (!dn) { + printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); + return; + } + + /* Skip any devices for which EEH is not enabled. */ + pdn = PCI_DN(dn); + if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || + pdn->eeh_mode & EEH_MODE_NOCHECK) { +#ifdef DEBUG + printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", + pci_name(dev), pdn->node->full_name); +#endif + return; + } + + /* The cache holds a reference to the device... 
*/ + pci_dev_get(dev); + + /* Walk resources on this device, poke them into the tree */ + for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { + unsigned long start = pci_resource_start(dev,i); + unsigned long end = pci_resource_end(dev,i); + unsigned int flags = pci_resource_flags(dev,i); + + /* We are interested only bus addresses, not dma or other stuff */ + if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) + continue; + if (start == 0 || ~start == 0 || end == 0 || ~end == 0) + continue; + pci_addr_cache_insert(dev, start, end, flags); + inserted = 1; + } + + /* If there was nothing to add, the cache has no reference... */ + if (!inserted) + pci_dev_put(dev); +} + +/** + * pci_addr_cache_insert_device - Add a device to the address cache + * @dev: PCI device whose I/O addresses we are interested in. + * + * In order to support the fast lookup of devices based on addresses, + * we maintain a cache of devices that can be quickly searched. + * This routine adds a device to that cache. + */ +void pci_addr_cache_insert_device(struct pci_dev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + __pci_addr_cache_insert_device(dev); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); +} + +static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) +{ + struct rb_node *n; + int removed = 0; + +restart: + n = rb_first(&pci_io_addr_cache_root.rb_root); + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + + if (piar->pcidev == dev) { + rb_erase(n, &pci_io_addr_cache_root.rb_root); + removed = 1; + kfree(piar); + goto restart; + } + n = rb_next(n); + } + + /* The cache no longer holds its reference to this device... */ + if (removed) + pci_dev_put(dev); +} + +/** + * pci_addr_cache_remove_device - remove pci device from addr cache + * @dev: device to remove + * + * Remove a device from the addr-cache tree. 
+ * This is potentially expensive, since it will walk + * the tree multiple times (once per resource). + * But so what; device removal doesn't need to be that fast. + */ +void pci_addr_cache_remove_device(struct pci_dev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + __pci_addr_cache_remove_device(dev); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); +} + +/** + * pci_addr_cache_build - Build a cache of I/O addresses + * + * Build a cache of pci i/o addresses. This cache will be used to + * find the pci device that corresponds to a given address. + * This routine scans all pci busses to build the cache. + * Must be run late in boot process, after the pci controllers + * have been scaned for devices (after all device resources are known). + */ +void __init pci_addr_cache_build(void) +{ + struct device_node *dn; + struct pci_dev *dev = NULL; + + spin_lock_init(&pci_io_addr_cache_root.piar_lock); + + while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { + /* Ignore PCI bridges */ + if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) + continue; + + pci_addr_cache_insert_device(dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + eeh_save_bars(dev, PCI_DN(dn)); + } + +#ifdef DEBUG + /* Verify tree built up above, echo back the list of addrs. 
*/ + pci_addr_cache_print(&pci_io_addr_cache_root); +#endif +} + Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:41:18.454448689 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:42:38.998153856 -0600 @@ -53,6 +53,14 @@ /* ---- EEH internal-use-only related routines ---- */ #ifdef CONFIG_EEH + +void pci_addr_cache_insert_device(struct pci_dev *dev); +void pci_addr_cache_remove_device(struct pci_dev *dev); +void pci_addr_cache_build(void); +struct pci_dev *pci_get_device_by_addr(unsigned long addr); + +void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn); + /** * eeh_slot_error_detail -- record and EEH error condition to the log * @severity: 1 if temporary, 2 if permanent failure. From linas at linas.org Fri Nov 4 11:53:20 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:53:20 -0600 Subject: [PATCH 26/42]: ppc64: Add "partion endpoint" support References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005320.GA27049@mail.gnucash.org> 26-eeh-partition-endpoint.patch New versions of firmware introduce a new method by which the "partition endpoint" (the point at which the PCI bus is cut) is identified. This code adds support for this (mandatory) new feature.
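As a rough illustration of the fallback this patch introduces in rtas_pci_slot_reset(), the selection between the old-style config address and the new partition-endpoint address can be sketched in plain C. The struct and function names below (pdn_sketch, eeh_pick_config_addr) are hypothetical stand-ins, not kernel symbols; only the two field names are taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's struct pci_dn; only the two
 * fields touched by the patch are modelled here. */
struct pdn_sketch {
	int eeh_config_addr;    /* old-style configuration address */
	int eeh_pe_config_addr; /* partition endpoint address, 0 if none */
};

/* Prefer the partition endpoint address when firmware supplied one,
 * otherwise fall back to the old-style address. */
int eeh_pick_config_addr(const struct pdn_sketch *pdn)
{
	int config_addr = pdn->eeh_config_addr;

	if (pdn->eeh_pe_config_addr)
		config_addr = pdn->eeh_pe_config_addr;
	return config_addr;
}
```

Because eeh_pe_config_addr is cleared to 0 before the ibm,get-config-addr-info call, firmware without that call ends up on the old-style address with no extra branching at the call site.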
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:42:38.986155538 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:43:49.212307192 -0600 @@ -83,6 +83,7 @@ static int ibm_read_slot_reset_state; static int ibm_read_slot_reset_state2; static int ibm_slot_error_detail; +static int ibm_get_config_addr_info; static int eeh_subsystem_enabled; @@ -457,6 +458,7 @@ static void rtas_pci_slot_reset(struct pci_dn *pdn, int state) { + int config_addr; int rc; BUG_ON (pdn==NULL); @@ -467,8 +469,13 @@ return; } + /* Use PE configuration address, if present */ + config_addr = pdn->eeh_config_addr; + if (pdn->eeh_pe_config_addr) + config_addr = pdn->eeh_pe_config_addr; + rc = rtas_call(ibm_set_slot_reset,4,1, NULL, - pdn->eeh_config_addr, + config_addr, BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid), state); @@ -695,8 +702,22 @@ eeh_subsystem_enabled = 1; pdn->eeh_mode |= EEH_MODE_SUPPORTED; pdn->eeh_config_addr = regs[0]; + + /* If the newer, better, ibm,get-config-addr-info is supported, + * then use that instead. 
*/ + pdn->eeh_pe_config_addr = 0; + if (ibm_get_config_addr_info != RTAS_UNKNOWN_SERVICE) { + unsigned int rets[2]; + ret = rtas_call (ibm_get_config_addr_info, 4, 2, rets, + pdn->eeh_config_addr, + info->buid_hi, info->buid_lo, + 0); + if (ret == 0) + pdn->eeh_pe_config_addr = rets[0]; + } #ifdef DEBUG - printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name); + printk(KERN_DEBUG "EEH: %s: eeh enabled, config=%x pe_config=%x\n", + dn->full_name, pdn->eeh_config_addr, pdn->eeh_pe_config_addr); #endif } else { @@ -748,6 +769,7 @@ ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2"); ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); + ibm_get_config_addr_info = rtas_token("ibm,get-config-addr-info"); if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) return; Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h 2005-11-02 14:39:24.755392219 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h 2005-11-02 14:43:49.218306351 -0600 @@ -63,6 +63,7 @@ int devfn; /* for pci devices */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; + int eeh_pe_config_addr; /* new-style partition endpoint address */ int eeh_check_count; /* # times driver ignored error */ int eeh_freeze_count; /* # times this device froze up. */ int eeh_is_bridge; /* device is pci-to-pci bridge */ From linas at linas.org Fri Nov 4 11:53:36 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:53:36 -0600 Subject: [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005336.GA27057@mail.gnucash.org> 27-pci-error-recovery_IPR-driver.patch Subject: PCI Error Recovery: IPR SCSI device driver Various PCI bus errors can be signaled by newer PCI controllers. 
This patch adds the PCI error recovery callbacks to the IPR SCSI device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas Signed-off-by: Brian King -- Index: linux-2.6.14-git3/drivers/scsi/ipr.c =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/ipr.c 2005-11-02 14:28:53.284922999 -0600 +++ linux-2.6.14-git3/drivers/scsi/ipr.c 2005-11-02 14:43:52.782806465 -0600 @@ -5328,6 +5328,94 @@ shutdown_type); } +/* --------------- PCI Error Recovery infrastructure ----------- */ +/** If the PCI slot is frozen, hold off all i/o + * activity; then, as soon as the slot is available again, + * initiate an adapter reset. + */ +static int ipr_reset_freeze(struct ipr_cmnd *ipr_cmd) +{ + /* Disallow new interrupts, avoid loop */ + ipr_cmd->ioa_cfg->allow_interrupts = 0; + list_add_tail(&ipr_cmd->queue, &ipr_cmd->ioa_cfg->pending_q); + ipr_cmd->done = ipr_reset_ioa_job; + return IPR_RC_JOB_RETURN; +} + +/** ipr_eeh_frozen -- called when slot has experience PCI bus error. + * This routine is called to tell us that the PCI bus is down. + * Can't do anything here, except put the device driver into a + * holding pattern, waiting for the PCI bus to come back. + */ +static void ipr_eeh_frozen (struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_freeze, IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); +} + +/** ipr_eeh_slot_reset - called when pci slot has been reset. + * + * This routine is called by the pci error recovery recovery + * code after the PCI slot has been reset, just before we + * should resume normal operations. 
+ */ +static int ipr_eeh_slot_reset(struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + // pci_enable_device(pdev); + // pci_set_master(pdev); + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space, + IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); + + return PCIERR_RESULT_RECOVERED; +} + +/** This routine is called when the PCI bus has permanently + * failed. This routine should purge all pending I/O and + * shut down the device driver (close and unload). + */ +static void ipr_eeh_perm_failure(struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + if (ioa_cfg->sdt_state == WAIT_FOR_DUMP) + ioa_cfg->sdt_state = ABORT_DUMP; + ioa_cfg->reset_retries = IPR_NUM_RESET_RELOAD_RETRIES; + ioa_cfg->in_ioa_bringdown = 1; + ipr_initiate_ioa_reset(ioa_cfg, IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); +} + +static int ipr_eeh_error_detected(struct pci_dev *pdev, + enum pci_channel_state state) +{ + switch (state) { + case pci_channel_io_frozen: + ipr_eeh_frozen (pdev); + return PCIERR_RESULT_NEED_RESET; + + case pci_channel_io_perm_failure: + ipr_eeh_perm_failure (pdev); + return PCIERR_RESULT_DISCONNECT; + break; + default: + break; + } + return PCIERR_RESULT_NEED_RESET; +} + +/* ------------- end of PCI Error Recovery suport ----------- */ + /** * ipr_probe_ioa_part2 - Initializes IOAs found in ipr_probe_ioa(..) 
* @ioa_cfg: ioa cfg struct @@ -6065,12 +6153,18 @@ }; MODULE_DEVICE_TABLE(pci, ipr_pci_table); +static struct pci_error_handlers ipr_err_handler = { + .error_detected = ipr_eeh_error_detected, + .slot_reset = ipr_eeh_slot_reset, +}; + static struct pci_driver ipr_driver = { .name = IPR_NAME, .id_table = ipr_pci_table, .probe = ipr_probe, .remove = ipr_remove, .shutdown = ipr_shutdown, + .err_handler = &ipr_err_handler, }; /** From linas at linas.org Fri Nov 4 11:53:46 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:53:46 -0600 Subject: [PATCH 28/42]: SCSI: add PCI error recovery to Symbios dev driver References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005346.GA27066@mail.gnucash.org> Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the Symbios SCSI device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-02 14:28:52.512031337 -0600 +++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-02 14:43:56.084343457 -0600 @@ -686,6 +686,10 @@ if (DEBUG_FLAGS & DEBUG_TINY) printf_debug ("["); + /* Avoid spinloop trying to handle interrupts on frozen device */ + if (np->s.io_state != pci_channel_io_normal) + return IRQ_HANDLED; + spin_lock_irqsave(np->s.host->host_lock, flags); sym_interrupt(np); spin_unlock_irqrestore(np->s.host->host_lock, flags); @@ -759,6 +763,25 @@ */ static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); } +static void sym_eeh_timeout(u_long p) +{ + struct sym_eh_wait *ep = (struct sym_eh_wait *) p; + if (!ep) + return; + complete(&ep->done); +} + +static void sym_eeh_done(struct sym_eh_wait *ep) +{ + if (!ep) + return; + ep->timed_out = 0; + if 
(!del_timer(&ep->timer)) + return; + + complete(&ep->done); +} + /* * Generic method for our eh processing. * The 'op' argument tells what we have to do. @@ -799,6 +822,35 @@ /* Try to proceed the operation we have been asked for */ sts = -1; + + /* We may be in an error condition because the PCI bus + * went down. In this case, we need to wait until the + * PCI bus is reset, the card is reset, and only then + * proceed with the scsi error recovery. We'll wait + * for 15 seconds for this to happen. + */ +#define WAIT_FOR_PCI_RECOVERY 15 + if (np->s.io_state != pci_channel_io_normal) { + struct sym_eh_wait eeh, *eep = &eeh; + np->s.io_reset_wait = eep; + init_completion(&eep->done); + init_timer(&eep->timer); + eep->to_do = SYM_EH_DO_WAIT; + eep->timer.expires = jiffies + (WAIT_FOR_PCI_RECOVERY*HZ); + eep->timer.function = sym_eeh_timeout; + eep->timer.data = (u_long)eep; + eep->timed_out = 1; /* Be pessimistic for once :) */ + add_timer(&eep->timer); + spin_unlock_irq(np->s.host->host_lock); + wait_for_completion(&eep->done); + spin_lock_irq(np->s.host->host_lock); + if (eep->timed_out) { + printk (KERN_ERR "%s: Timed out waiting for PCI reset\n", + sym_name(np)); + } + np->s.io_reset_wait = NULL; + } + switch(op) { case SYM_EH_ABORT: sts = sym_abort_scsiio(np, cmd, 1); @@ -1584,6 +1636,8 @@ np->maxoffs = dev->chip.offset_max; np->maxburst = dev->chip.burst_max; np->myaddr = dev->host_id; + np->s.io_state = pci_channel_io_normal; + np->s.io_reset_wait = NULL; /* * Edit its name. @@ -1916,6 +1970,58 @@ return 1; } +/* ------------- PCI Error Recovery infrastructure -------------- */ +/** sym2_io_error_detected() is called when PCI error is detected */ +static int sym2_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + np->s.io_state = state; + // XXX If slot is permanently frozen, then what? + // Should we scsi_remove_host() maybe ?? + + /* Request a slot slot reset. 
*/ + return PCIERR_RESULT_NEED_RESET; +} + +/** sym2_io_slot_reset is called when the pci bus has been reset. + * Restart the card from scratch. */ +static int sym2_io_slot_reset (struct pci_dev *pdev) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + printk (KERN_INFO "%s: recovering from a PCI slot reset\n", + sym_name(np)); + + if (pci_enable_device(pdev)) + printk (KERN_ERR "%s: device setup failed most egregiously\n", + sym_name(np)); + + pci_set_master(pdev); + enable_irq (pdev->irq); + + /* Perform host reset only on one instance of the card */ + if (0 == PCI_FUNC (pdev->devfn)) + sym_reset_scsi_bus(np, 0); + + return PCIERR_RESULT_RECOVERED; +} + +/** sym2_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + */ +static void sym2_io_resume (struct pci_dev *pdev) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + /* Perform device startup only once for this card. */ + if (0 == PCI_FUNC (pdev->devfn)) + sym_start_up (np, 1); + + np->s.io_state = pci_channel_io_normal; + sym_eeh_done (np->s.io_reset_wait); +} + /* * Driver host template. 
*/ @@ -2169,11 +2275,18 @@ MODULE_DEVICE_TABLE(pci, sym2_id_table); +static struct pci_error_handlers sym2_err_handler = { + .error_detected = sym2_io_error_detected, + .slot_reset = sym2_io_slot_reset, + .resume = sym2_io_resume, +}; + static struct pci_driver sym2_driver = { .name = NAME53C8XX, .id_table = sym2_id_table, .probe = sym2_probe, .remove = __devexit_p(sym2_remove), + .err_handler = &sym2_err_handler, }; static int __init sym2_init(void) Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.h =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_glue.h 2005-11-02 14:28:52.513031197 -0600 +++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.h 2005-11-02 14:43:56.089342756 -0600 @@ -181,6 +181,10 @@ char chip_name[8]; struct pci_dev *device; + /* pci bus i/o state; waiter for clearing of i/o state */ + enum pci_channel_state io_state; + struct sym_eh_wait *io_reset_wait; + struct Scsi_Host *host; void __iomem * ioaddr; /* MMIO kernel io address */ Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_hipd.c =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-11-02 14:28:52.513031197 -0600 +++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-11-02 14:43:56.141335464 -0600 @@ -2809,6 +2809,7 @@ u_char istat, istatc; u_char dstat; u_short sist; + u_int icnt; /* * interrupt on the fly ? @@ -2850,6 +2851,7 @@ sist = 0; dstat = 0; istatc = istat; + icnt = 0; do { if (istatc & SIP) sist |= INW(np, nc_sist); @@ -2857,6 +2859,19 @@ dstat |= INB(np, nc_dstat); istatc = INB(np, nc_istat); istat |= istatc; + + /* Prevent deadlock waiting on a condition that may never clear. */ + /* XXX this is a temporary kludge; the correct to detect + * a PCI bus error would be to use the io_check interfaces + * proposed by Hidetoshi Seto + * Problem with polling like that is the state flag might not + * be set. 
+ */ + icnt ++; + if (100 < icnt) { + if (np->s.device->error_state != pci_channel_io_normal) + return; + } } while (istatc & (SIP|DIP)); if (DEBUG_FLAGS & DEBUG_TINY) From linas at linas.org Fri Nov 4 11:53:53 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:53:53 -0600 Subject: [PATCH 29/42]: ethernet: add PCI error recovery to e100 dev driver References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005353.GA27074@mail.gnucash.org> Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel ethernet e100 device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/drivers/net/e100.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/e100.c 2005-11-02 14:28:51.524169808 -0600 +++ linux-2.6.14-git3/drivers/net/e100.c 2005-11-02 14:43:58.890949857 -0600 @@ -2465,6 +2465,75 @@ } +/* ------------------ PCI Error Recovery infrastructure -------------- */ +/** e100_io_error_detected() is called when PCI error is detected */ +static int e100_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + + /* Same as calling e100_down(netdev_priv(netdev)), but generic */ + netdev->stop(netdev); + + /* Is a detach needed ?? */ + // netif_device_detach(netdev); + + /* Request a slot reset. */ + return PCIERR_RESULT_NEED_RESET; +} + +/** e100_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. 
*/ +static int e100_io_slot_reset(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct nic *nic = netdev_priv(netdev); + + if(pci_enable_device(pdev)) { + printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n"); + return PCIERR_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + /* Only one device per card can do a reset */ + if (0 != PCI_FUNC (pdev->devfn)) + return PCIERR_RESULT_RECOVERED; + + e100_hw_reset(nic); + e100_phy_init(nic); + + if(e100_hw_init(nic)) { + DPRINTK(HW, ERR, "e100_hw_init failed\n"); + return PCIERR_RESULT_DISCONNECT; + } + + return PCIERR_RESULT_RECOVERED; +} + +/** e100_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + */ +static void e100_io_resume(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct nic *nic = netdev_priv(netdev); + + /* ack any pending wake events, disable PME */ + pci_enable_wake(pdev, 0, 0); + + netif_device_attach(netdev); + if(netif_running(netdev)) { + e100_open (netdev); + mod_timer(&nic->watchdog, jiffies); + } +} + +static struct pci_error_handlers e100_err_handler = { + .error_detected = e100_io_error_detected, + .slot_reset = e100_io_slot_reset, + .resume = e100_io_resume, +}; + + static struct pci_driver e100_driver = { .name = DRV_NAME, .id_table = e100_id_table, @@ -2475,6 +2544,7 @@ .resume = e100_resume, #endif .shutdown = e100_shutdown, + .err_handler = &e100_err_handler, }; static int __init e100_init_module(void) From linas at linas.org Fri Nov 4 11:54:04 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:04 -0600 Subject: [PATCH 30/42]: ethernet: add PCI error recovery to e1000 dev driver References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005404.GA27082@mail.gnucash.org> Various PCI bus errors can be signaled by newer PCI controllers. 
This patch adds the PCI error recovery callbacks to the intel gigabit ethernet e1000 device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/drivers/net/e1000/e1000_main.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/e1000/e1000_main.c 2005-11-02 14:28:50.471317390 -0600 +++ linux-2.6.14-git3/drivers/net/e1000/e1000_main.c 2005-11-02 14:44:00.730691851 -0600 @@ -206,6 +206,16 @@ void e1000_rx_schedule(void *data); #endif +static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state); +static int e1000_io_slot_reset(struct pci_dev *pdev); +static void e1000_io_resume(struct pci_dev *pdev); + +static struct pci_error_handlers e1000_err_handler = { + .error_detected = e1000_io_error_detected, + .slot_reset = e1000_io_slot_reset, + .resume = e1000_io_resume, +}; + /* Exported from other modules */ extern void e1000_check_options(struct e1000_adapter *adapter); @@ -218,8 +228,9 @@ /* Power Managment Hooks */ #ifdef CONFIG_PM .suspend = e1000_suspend, - .resume = e1000_resume + .resume = e1000_resume, #endif + .err_handler = &e1000_err_handler, }; MODULE_AUTHOR("Intel Corporation, "); @@ -2937,6 +2948,10 @@ #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF + /* Prevent stats update while adapter is being reset */ + if (adapter->link_speed == 0) + return; + spin_lock_irqsave(&adapter->stats_lock, flags); /* these counters are modified from e1000_adjust_tbi_stats, @@ -4358,4 +4373,88 @@ } #endif +/* --------------- PCI Error Recovery infrastructure ------------ */ +/** e1000_io_error_detected() is called when PCI error is detected */ +static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + + if (netif_running(netdev)) + e1000_down(adapter); + + /* Request a slot slot reset. 
*/ + return PCIERR_RESULT_NEED_RESET; +} + +/** e1000_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. + * Implementation resembles the first-half of the + * e1000_resume routine. + */ +static int e1000_io_slot_reset(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + + if (pci_enable_device(pdev)) { + printk(KERN_ERR "e1000: Cannot re-enable PCI device after reset.\n"); + return PCIERR_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + pci_enable_wake(pdev, 3, 0); + pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */ + + /* Perform card reset only on one instance of the card */ + if(0 != PCI_FUNC (pdev->devfn)) + return PCIERR_RESULT_RECOVERED; + + e1000_reset(adapter); + E1000_WRITE_REG(&adapter->hw, WUS, ~0); + + return PCIERR_RESULT_RECOVERED; +} + +/** e1000_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + * Implementation resembles the second-half of the + * e1000_resume routine. 
+ */ +static void e1000_io_resume(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + uint32_t manc, swsm; + + if(netif_running(netdev)) { + if (e1000_up(adapter)) { + printk("e1000: can't bring device back up after reset\n"); + return; + } + } + + netif_device_attach(netdev); + + if(adapter->hw.mac_type >= e1000_82540 && + adapter->hw.media_type == e1000_media_type_copper) { + manc = E1000_READ_REG(&adapter->hw, MANC); + manc &= ~(E1000_MANC_ARP_EN); + E1000_WRITE_REG(&adapter->hw, MANC, manc); + } + + switch(adapter->hw.mac_type) { + case e1000_82573: + swsm = E1000_READ_REG(&adapter->hw, SWSM); + E1000_WRITE_REG(&adapter->hw, SWSM, + swsm | E1000_SWSM_DRV_LOAD); + break; + default: + break; + } + + if(netif_running(netdev)) + mod_timer(&adapter->watchdog_timer, jiffies); +} + /* e1000_main.c */ From linas at linas.org Fri Nov 4 11:54:11 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:11 -0600 Subject: [PATCH 31/42]: ethernet: add PCI error recovery to ixgb dev driver References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005411.GA27090@mail.gnucash.org> Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel ten-gigabit ethernet ixgb device driver. The patch has been tested, and appears to work well. 
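For readers new to these callbacks, the order in which the three handlers appear to be invoked (error_detected, then slot_reset, then resume) can be modelled in user space. This is only a sketch under that assumption: every sketch_* name is hypothetical, the result codes merely echo the PCIERR_RESULT_* names used in the patches, and the real sequencing is done by the platform error recovery code, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes echoing PCIERR_RESULT_* from the patches; values arbitrary. */
enum sketch_result {
	SKETCH_NEED_RESET,
	SKETCH_RECOVERED,
	SKETCH_DISCONNECT,
};

/* Hypothetical device state, just enough to observe the callbacks. */
struct sketch_dev {
	int running;  /* 1 while the interface is up */
	int reset_ok; /* 1 if re-enabling the device after reset succeeds */
};

struct sketch_err_handlers {
	enum sketch_result (*error_detected)(struct sketch_dev *dev);
	enum sketch_result (*slot_reset)(struct sketch_dev *dev);
	void (*resume)(struct sketch_dev *dev); /* optional, may be NULL */
};

/* Bus error seen: quiesce the device and ask for a slot reset. */
enum sketch_result sketch_error_detected(struct sketch_dev *dev)
{
	dev->running = 0;
	return SKETCH_NEED_RESET;
}

/* Slot has been reset: re-enable the device, or give up on it. */
enum sketch_result sketch_slot_reset(struct sketch_dev *dev)
{
	return dev->reset_ok ? SKETCH_RECOVERED : SKETCH_DISCONNECT;
}

/* Recovery finished: bring the interface back up. */
void sketch_resume(struct sketch_dev *dev)
{
	dev->running = 1;
}

/* Assumed sequencing performed by the recovery core. */
enum sketch_result sketch_recover(const struct sketch_err_handlers *h,
				  struct sketch_dev *dev)
{
	enum sketch_result rc = h->error_detected(dev);

	if (rc == SKETCH_NEED_RESET)
		rc = h->slot_reset(dev);
	if (rc == SKETCH_RECOVERED && h->resume != NULL)
		h->resume(dev);
	return rc;
}

const struct sketch_err_handlers sketch_handlers = {
	.error_detected = sketch_error_detected,
	.slot_reset = sketch_slot_reset,
	.resume = sketch_resume,
};
```

The NULL check on resume reflects that not every driver in this series fills in all three callbacks; the IPR handler table, for example, sets only error_detected and slot_reset.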
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/ixgb/ixgb_main.c 2005-11-02 14:28:49.225492020 -0600 +++ linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c 2005-11-02 14:44:02.380460486 -0600 @@ -132,6 +132,16 @@ static void ixgb_netpoll(struct net_device *dev); #endif +static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state); +static int ixgb_io_slot_reset (struct pci_dev *pdev); +static void ixgb_io_resume (struct pci_dev *pdev); + +static struct pci_error_handlers ixgb_err_handler = { + .error_detected = ixgb_io_error_detected, + .slot_reset = ixgb_io_slot_reset, + .resume = ixgb_io_resume, +}; + /* Exported from other modules */ extern void ixgb_check_options(struct ixgb_adapter *adapter); @@ -141,6 +151,8 @@ .id_table = ixgb_pci_tbl, .probe = ixgb_probe, .remove = __devexit_p(ixgb_remove), + .err_handler = &ixgb_err_handler, + }; MODULE_AUTHOR("Intel Corporation, "); @@ -1654,8 +1666,16 @@ unsigned int i; #endif +#ifdef XXX_CONFIG_IXGB_EEH_RECOVERY + if(unlikely(icr==EEH_IO_ERROR_VALUE(4))) { + if (eeh_slot_is_isolated (adapter->pdev)) + // disable_irq_nosync (adapter->pdev->irq); + return IRQ_NONE; /* Not our interrupt */ + } +#else if(unlikely(!icr)) return IRQ_NONE; /* Not our interrupt */ +#endif /* CONFIG_IXGB_EEH_RECOVERY */ if(unlikely(icr & (IXGB_INT_RXSEQ | IXGB_INT_LSC))) { mod_timer(&adapter->watchdog_timer, jiffies); @@ -2125,4 +2145,70 @@ } #endif +/* -------------- PCI Error Recovery infrastructure ---------------- */ +/** ixgb_io_error_detected() is called when PCI error is detected */ +static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(netif_running(netdev)) + ixgb_down(adapter, TRUE); + + /* Request a slot reset. 
*/ + return PCIERR_RESULT_NEED_RESET; +} + +/** ixgb_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. + * Implementation resembles the first-half of the + * ixgb_resume routine. + */ +static int ixgb_io_slot_reset (struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(pci_enable_device(pdev)) { + printk(KERN_ERR "ixgb: Cannot re-enable PCI device after reset.\n"); + return PCIERR_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + /* Perform card reset only on one instance of the card */ + if (0 != PCI_FUNC (pdev->devfn)) + return PCIERR_RESULT_RECOVERED; + + ixgb_reset(adapter); + + return PCIERR_RESULT_RECOVERED; +} + +/** ixgb_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + * Implementation resembles the second-half of the + * ixgb_resume routine. + */ +static void ixgb_io_resume (struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(netif_running(netdev)) { + if(ixgb_up(adapter)) { + printk ("ixgb: can't bring device back up after reset\n"); + return; + } + } + + netif_device_attach(netdev); + if(netif_running(netdev)) + mod_timer(&adapter->watchdog_timer, jiffies); + + /* Reading all-ff's from the adapter will completely hose + * the counts and statistics. 
So just clear them out */ + memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats)); + ixgb_update_stats(adapter); +} + /* ixgb_main.c */ From linas at linas.org Fri Nov 4 11:54:17 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:17 -0600 Subject: [PATCH 32/42]: RFC: Add compile-time config options References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005417.GA27098@mail.gnucash.org> 32-pci-error-recovery_config-option.patch This OPTIONAL/RFC patch adds ifdef's around the PCI error recovery code in the various device drivers. This patch is "optional" in that it's a little bit messy, but it does solve a little problem. -- The good news: this gives some users (e.g. embedded systems) the option of not compiling in this code, thus making their device drivers a tiny bit smaller. -- The bad news: This also clutters up the drivers with extraneous markup and the config process with yet another config. I don't know if this patch is worth it. Apply or reject, as desired ... It's up to you ... :-) Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/drivers/scsi/ipr.c =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/ipr.c 2005-11-02 14:43:52.782806465 -0600 +++ linux-2.6.14-git3/drivers/scsi/ipr.c 2005-11-02 14:44:04.167209911 -0600 @@ -5329,6 +5329,8 @@ } /* --------------- PCI Error Recovery infrastructure ----------- */ +#ifdef CONFIG_PCIERR_RECOVERY + /** If the PCI slot is frozen, hold off all i/o * activity; then, as soon as the slot is available again, * initiate an adapter reset.
@@ -5414,6 +5416,7 @@ return PCIERR_RESULT_NEED_RESET; } +#endif /* CONFIG_PCIERR_RECOVERY */ /* ------------- end of PCI Error Recovery suport ----------- */ /** @@ -6153,10 +6156,12 @@ }; MODULE_DEVICE_TABLE(pci, ipr_pci_table); +#ifdef CONFIG_PCIERR_RECOVERY static struct pci_error_handlers ipr_err_handler = { .error_detected = ipr_eeh_error_detected, .slot_reset = ipr_eeh_slot_reset, }; +#endif /* CONFIG_PCIERR_RECOVERY */ static struct pci_driver ipr_driver = { .name = IPR_NAME, @@ -6164,7 +6169,9 @@ .probe = ipr_probe, .remove = ipr_remove, .shutdown = ipr_shutdown, +#ifdef CONFIG_PCIERR_RECOVERY .err_handler = &ipr_err_handler, +#endif /* CONFIG_PCIERR_RECOVERY */ }; /** Index: linux-2.6.14-git3/drivers/pci/Kconfig =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/Kconfig 2005-11-02 14:28:48.597580036 -0600 +++ linux-2.6.14-git3/drivers/pci/Kconfig 2005-11-02 14:44:04.172209210 -0600 @@ -13,6 +13,21 @@ If you don't know what to do here, say N. +config PCIERR_RECOVERY + bool "PCI Error Recovery support" + depends on PCI + depends on PPC_PSERIES + default y + help + PCI Error Recovery is a mechanism by which crashed/hung + PCI adapters are automatically detected and rebooted without + otherwise disturbing the operation of the system. Support + for this recovery requires special PCI bridge chips (some + PCI-E chips may have this support) as well as support in + the device drivers (not all device drivers can handle this). + + When in doubt, say Y. 
+ config PCI_LEGACY_PROC bool "Legacy /proc/pci interface" depends on PCI Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-02 14:43:56.084343457 -0600 +++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-02 14:44:04.195205985 -0600 @@ -763,6 +763,7 @@ */ static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); } +#ifdef CONFIG_PCIERR_RECOVERY static void sym_eeh_timeout(u_long p) { struct sym_eh_wait *ep = (struct sym_eh_wait *) p; @@ -781,6 +782,7 @@ complete(&ep->done); } +#endif /* CONFIG_PCIERR_RECOVERY */ /* * Generic method for our eh processing. @@ -823,6 +825,7 @@ /* Try to proceed the operation we have been asked for */ sts = -1; +#ifdef CONFIG_PCIERR_RECOVERY /* We may be in an error condition because the PCI bus * went down. In this case, we need to wait until the * PCI bus is reset, the card is reset, and only then @@ -850,6 +853,7 @@ } np->s.io_reset_wait = NULL; } +#endif /* CONFIG_PCIERR_RECOVERY */ switch(op) { case SYM_EH_ABORT: @@ -1971,6 +1975,7 @@ } /* ------------- PCI Error Recovery infrastructure -------------- */ +#ifdef CONFIG_PCIERR_RECOVERY /** sym2_io_error_detected() is called when PCI error is detected */ static int sym2_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state) { @@ -2021,6 +2026,7 @@ np->s.io_state = pci_channel_io_normal; sym_eeh_done (np->s.io_reset_wait); } +#endif /* CONFIG_PCIERR_RECOVERY */ /* * Driver host template. 
@@ -2275,18 +2281,22 @@ MODULE_DEVICE_TABLE(pci, sym2_id_table); +#ifdef CONFIG_PCIERR_RECOVERY static struct pci_error_handlers sym2_err_handler = { .error_detected = sym2_io_error_detected, .slot_reset = sym2_io_slot_reset, .resume = sym2_io_resume, }; +#endif /* CONFIG_PCIERR_RECOVERY */ static struct pci_driver sym2_driver = { .name = NAME53C8XX, .id_table = sym2_id_table, .probe = sym2_probe, .remove = __devexit_p(sym2_remove), +#ifdef CONFIG_PCIERR_RECOVERY .err_handler = &sym2_err_handler, +#endif /* CONFIG_PCIERR_RECOVERY */ }; static int __init sym2_init(void) Index: linux-2.6.14-git3/drivers/net/e100.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/e100.c 2005-11-02 14:43:58.890949857 -0600 +++ linux-2.6.14-git3/drivers/net/e100.c 2005-11-02 14:44:04.222202199 -0600 @@ -2466,6 +2466,7 @@ /* ------------------ PCI Error Recovery infrastructure -------------- */ +#ifdef CONFIG_PCIERR_RECOVERY /** e100_io_error_detected() is called when PCI error is detected */ static int e100_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state) { @@ -2532,6 +2533,7 @@ .slot_reset = e100_io_slot_reset, .resume = e100_io_resume, }; +#endif /* CONFIG_PCIERR_RECOVERY */ static struct pci_driver e100_driver = { @@ -2544,7 +2546,9 @@ .resume = e100_resume, #endif .shutdown = e100_shutdown, +#ifdef CONFIG_PCIERR_RECOVERY .err_handler = &e100_err_handler, +#endif /* CONFIG_PCIERR_RECOVERY */ }; static int __init e100_init_module(void) Index: linux-2.6.14-git3/drivers/net/e1000/e1000_main.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/e1000/e1000_main.c 2005-11-02 14:44:00.730691851 -0600 +++ linux-2.6.14-git3/drivers/net/e1000/e1000_main.c 2005-11-02 14:44:04.266196029 -0600 @@ -206,6 +206,7 @@ void e1000_rx_schedule(void *data); #endif +#ifdef CONFIG_PCIERR_RECOVERY static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state 
state); static int e1000_io_slot_reset(struct pci_dev *pdev); static void e1000_io_resume(struct pci_dev *pdev); @@ -215,6 +216,7 @@ .slot_reset = e1000_io_slot_reset, .resume = e1000_io_resume, }; +#endif /* CONFIG_PCIERR_RECOVERY */ /* Exported from other modules */ @@ -230,7 +232,9 @@ .suspend = e1000_suspend, .resume = e1000_resume, #endif +#ifdef CONFIG_PCIERR_RECOVERY .err_handler = &e1000_err_handler, +#endif /* CONFIG_PCIERR_RECOVERY */ }; MODULE_AUTHOR("Intel Corporation, "); @@ -4374,6 +4378,7 @@ #endif /* --------------- PCI Error Recovery infrastructure ------------ */ +#ifdef CONFIG_PCIERR_RECOVERY /** e1000_io_error_detected() is called when PCI error is detected */ static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state) { @@ -4456,5 +4461,6 @@ if(netif_running(netdev)) mod_timer(&adapter->watchdog_timer, jiffies); } +#endif /* CONFIG_PCIERR_RECOVERY */ /* e1000_main.c */ Index: linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/ixgb/ixgb_main.c 2005-11-02 14:44:02.380460486 -0600 +++ linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c 2005-11-02 14:44:04.289192804 -0600 @@ -132,6 +132,7 @@ static void ixgb_netpoll(struct net_device *dev); #endif +#ifdef CONFIG_PCIERR_RECOVERY static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state); static int ixgb_io_slot_reset (struct pci_dev *pdev); static void ixgb_io_resume (struct pci_dev *pdev); @@ -141,6 +142,7 @@ .slot_reset = ixgb_io_slot_reset, .resume = ixgb_io_resume, }; +#endif /* CONFIG_PCIERR_RECOVERY */ /* Exported from other modules */ @@ -151,8 +153,9 @@ .id_table = ixgb_pci_tbl, .probe = ixgb_probe, .remove = __devexit_p(ixgb_remove), +#ifdef CONFIG_PCIERR_RECOVERY .err_handler = &ixgb_err_handler, - +#endif /* CONFIG_PCIERR_RECOVERY */ }; MODULE_AUTHOR("Intel Corporation, "); @@ -2146,6 +2149,7 @@ #endif /* -------------- PCI Error Recovery 
infrastructure ---------------- */ +#ifdef CONFIG_PCIERR_RECOVERY /** ixgb_io_error_detected() is called when PCI error is detected */ static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state) { @@ -2210,5 +2214,6 @@ memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats)); ixgb_update_stats(adapter); } +#endif /* CONFIG_PCIERR_RECOVERY */ /* ixgb_main.c */ From linas at linas.org Fri Nov 4 11:54:23 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:23 -0600 Subject: [PATCH 33/42]: ppc64: remove bogus printk References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005423.GA27106@mail.gnucash.org> 233-eeh-buid-fix.patch Remove un-desired warning print from EEH code. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:43:49.212307192 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:45:00.429319560 -0600 @@ -824,12 +824,10 @@ if (!dn || !PCI_DN(dn)) return; phb = PCI_DN(dn)->phb; - if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", - dn->full_name); - dump_stack(); + + /* USB Bus children of PCI devices will not have BUID's */ + if (NULL == phb || 0 == phb->buid) return; - } info.buid_hi = BUID_HI(phb->buid); info.buid_lo = BUID_LO(phb->buid); From linas at linas.org Fri Nov 4 11:54:29 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:29 -0600 Subject: [PATCH 34/42]: ppc64: Remove duplicate code References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005429.GA27114@mail.gnucash.org> 234-eeh-find-pe.patch The find_device_pe() routine is duplicated in two files. Remove one of the two copies, declare the other extern. 
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:41:18.435451353 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:45:43.638259683 -0600 @@ -42,19 +42,6 @@ return ""; } -/** - * Return the "partitionable endpoint" (pe) under which this device lies - */ -static struct device_node * find_device_pe(struct device_node *dn) -{ - while ((dn->parent) && PCI_DN(dn->parent) && - (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { - dn = dn->parent; - } - return dn; -} - - #ifdef DEBUG static void print_device_node_tree (struct pci_dn *pdn, int dent) { Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:45:00.429319560 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:45:43.651257860 -0600 @@ -172,7 +172,7 @@ /** * Return the "partitionable endpoint" (pe) under which this device lies */ -static struct device_node * find_device_pe(struct device_node *dn) +struct device_node * find_device_pe(struct device_node *dn) { while ((dn->parent) && PCI_DN(dn->parent) && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:42:38.998153856 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:45:43.656257159 -0600 @@ -110,6 +110,9 @@ void eeh_mark_slot (struct device_node *dn, int mode_flag); void eeh_clear_slot (struct device_node *dn, int mode_flag); +/* Find the associated "Partiationable Endpoint" PE */ +struct device_node * find_device_pe(struct 
device_node *dn); + #endif #endif /* _ASM_POWERPC_PPC_PCI_H */ From linas at linas.org Fri Nov 4 11:54:34 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:34 -0600 Subject: [PATCH 35/42]: ppc64: bugfix: fill in un-initialzed field References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005434.GA27122@mail.gnucash.org> 235-eeh-set-pcidev-bugfix.patch The pci device field should be initialized to a valid value. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-02 14:42:38.994154417 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-02 14:46:23.687642815 -0600 @@ -307,6 +307,9 @@ /* Save the BAR's; firmware doesn't restore these after EEH reset */ dn = pci_device_to_OF_node(dev); eeh_save_bars(dev, PCI_DN(dn)); + + pci_dev_get (dev); /* matching put is in eeh_remove_device() */ + PCI_DN(dn)->pcidev = dev; } #ifdef DEBUG From linas at linas.org Fri Nov 4 11:54:39 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:39 -0600 Subject: [PATCH 36/42]: ppc64: Use PE configuration address consistently References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005439.GA27130@mail.gnucash.org> 236-eeh-config-addr.patch The PE configuration address wasn't being consistently used in all locations where a config address is called for. This patch adds it to the places it should have appeared in.
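As a rough illustration of the rule this patch applies at each call site (prefer the PE configuration address when one is set, otherwise fall back to the device's own config address), the selection could be factored into a helper along these lines. The struct mock_pci_dn type and the mock_pick_config_addr() name are hypothetical; the patch itself open-codes the test in each function and adds no such helper.

```c
/* Hypothetical cut-down stand-in for struct pci_dn: only the two
 * fields that the patch reads are modelled here. */
struct mock_pci_dn {
	int eeh_config_addr;	/* device's own config address */
	int eeh_pe_config_addr;	/* PE config address, 0 if not present */
};

/* Use the PE configuration address if present, else the device's own. */
int mock_pick_config_addr(const struct mock_pci_dn *pdn)
{
	if (pdn->eeh_pe_config_addr)
		return pdn->eeh_pe_config_addr;
	return pdn->eeh_config_addr;
}
```

A shared helper would keep the three call sites from drifting apart again, at the cost of one more symbol; whether that trade is worthwhile is a matter of taste.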
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:45:43.651257860 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:47:07.798456202 -0600 @@ -110,6 +110,7 @@ void eeh_slot_error_detail (struct pci_dn *pdn, int severity) { + int config_addr; unsigned long flags; int rc; @@ -117,8 +118,13 @@ spin_lock_irqsave(&slot_errbuf_lock, flags); memset(slot_errbuf, 0, eeh_error_buf_size); + /* Use PE configuration address, if present */ + config_addr = pdn->eeh_config_addr; + if (pdn->eeh_pe_config_addr) + config_addr = pdn->eeh_pe_config_addr; + rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, pdn->eeh_config_addr, + 8, 1, NULL, config_addr, BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid), NULL, 0, virt_to_phys(slot_errbuf), @@ -138,6 +144,7 @@ static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) { int token, outputs; + int config_addr; if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { token = ibm_read_slot_reset_state2; @@ -148,7 +155,12 @@ outputs = 3; } - return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr, + /* Use PE configuration address, if present */ + config_addr = pdn->eeh_config_addr; + if (pdn->eeh_pe_config_addr) + config_addr = pdn->eeh_pe_config_addr; + + return rtas_call(token, 3, outputs, rets, config_addr, BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); } @@ -284,7 +296,7 @@ return 0; } - if (!pdn->eeh_config_addr) { + if (!pdn->eeh_config_addr && !pdn->eeh_pe_config_addr) { __get_cpu_var(no_cfg_addr)++; return 0; } @@ -613,13 +625,20 @@ void rtas_configure_bridge(struct pci_dn *pdn) { + int config_addr; int token = rtas_token ("ibm,configure-bridge"); int rc; if (token == RTAS_UNKNOWN_SERVICE) return; + + /* Use PE configuration address, if present */ + config_addr = pdn->eeh_config_addr; + if 
(pdn->eeh_pe_config_addr) + config_addr = pdn->eeh_pe_config_addr; + rc = rtas_call(token,3,1, NULL, - pdn->eeh_config_addr, + config_addr, BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); if (rc) { From linas at linas.org Fri Nov 4 11:54:47 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:47 -0600 Subject: [PATCH 37/42]: ppc64: set up the RTAS token just like the rest of them. References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005447.GA27138@mail.gnucash.org> 237-eeh-bridge-token.patch Minor: the rtas-bridge token should be set up the same way that all the other rtas tokens are set up. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:47:07.798456202 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:47:38.997080468 -0600 @@ -84,6 +84,7 @@ static int ibm_read_slot_reset_state2; static int ibm_slot_error_detail; static int ibm_get_config_addr_info; +static int ibm_configure_bridge; static int eeh_subsystem_enabled; @@ -626,18 +627,14 @@ rtas_configure_bridge(struct pci_dn *pdn) { int config_addr; - int token = rtas_token ("ibm,configure-bridge"); int rc; - if (token == RTAS_UNKNOWN_SERVICE) - return; - /* Use PE configuration address, if present */ config_addr = pdn->eeh_config_addr; if (pdn->eeh_pe_config_addr) config_addr = pdn->eeh_pe_config_addr; - rc = rtas_call(token,3,1, NULL, + rc = rtas_call(ibm_configure_bridge,3,1, NULL, config_addr, BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); @@ -789,6 +786,7 @@ ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); ibm_get_config_addr_info = rtas_token("ibm,get-config-addr-info"); + ibm_configure_bridge = rtas_token ("ibm,configure-bridge"); if (ibm_set_eeh_option ==
RTAS_UNKNOWN_SERVICE) return; From linas at linas.org Fri Nov 4 11:54:54 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:54 -0600 Subject: [PATCH 38/42]: ppc64: Don't continue with PCI Error recovery if slot reset failed. References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005454.GA27146@mail.gnucash.org> 238-eeh-stop-if-reset_failed.patch If the firmware is unable to reset the PCI slot for some reason, then don't attempt any further recovery steps after that point. Instead, mark the device as permanently failed. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:47:38.997080468 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:48:13.093298267 -0600 @@ -450,11 +450,16 @@ if (rc) return rc; if (rets[1] == 0) return -1; /* EEH is not supported */ - if (rets[0] == 0) return 0; /* Oll Korrect */ + if (rets[0] == 0) return 0; /* Oll Korrect */ if (rets[0] == 5) { if (rets[2] == 0) return -1; /* permanently unavailable */ return rets[2]; /* number of millisecs to wait */ } + if (rets[0] == 1) + return 250; + + printk (KERN_ERR "EEH: Slot unavailable: rc=%d, rets=%d %d %d\n", + rc, rets[0], rets[1], rets[2]); return -1; } @@ -501,9 +506,11 @@ /** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second * dn -- device node to be reset. + * + * Return 0 if success, else a non-zero value. */ -void +int rtas_set_slot_reset(struct pci_dn *pdn) { int i, rc; @@ -533,10 +540,21 @@ * ready to be used; if not, wait for recovery. 
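To make the intended control flow a little more concrete, here is a small userspace sketch of a bounded polling loop of the kind this patch introduces: poll the slot up to ten times, stop at once on success or on a hard failure, and report a timeout if the slot never becomes ready. The mock_* names, the probe callback type and the slept_ms counter (which stands in for a real sleep) are assumptions made for this illustration, not kernel interfaces.

```c
/* Hypothetical probe: returns 0 when the slot is usable, a negative value
 * on permanent failure, or a positive number of milliseconds to wait. */
typedef int (*mock_avail_fn)(void *ctx);

/* Sample probe: asks for 250 ms while the counter in ctx is above zero. */
int mock_probe_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 250;
	}
	return 0;
}

/* Sample probe: the slot never comes back. */
int mock_probe_dead(void *ctx)
{
	(void)ctx;
	return -1;
}

/* Returns 0 if the slot recovered, non-zero if it failed or timed out. */
int mock_wait_for_slot(mock_avail_fn avail, void *ctx, int *slept_ms)
{
	int i, rc;

	for (i = 0; i < 10; i++) {
		rc = avail(ctx);
		if (rc == 0)
			return 0;		/* slot is ready */
		if (rc < 0)
			return -1;		/* permanent failure, give up */
		*slept_ms += rc + 100;		/* stands in for msleep(rc + 100) */
	}

	/* One last look; anything other than 0 is reported as a timeout. */
	return avail(ctx);
}
```

The caller only has to test the return value against zero, which is what lets the recovery path stop early and treat the device as permanently failed.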
*/ for (i=0; i<10; i++) { rc = eeh_slot_availability (pdn); - if (rc <= 0) break; + if (rc < 0) + printk (KERN_ERR "EEH: failed (%d) to reset slot %s\n", rc, pdn->node->full_name); + if (rc == 0) + return 0; + if (rc < 0) + return -1; msleep (rc+100); } + + rc = eeh_slot_availability (pdn); + if (rc) + printk (KERN_ERR "EEH: timeout resetting slot %s\n", pdn->node->full_name); + + return rc; } /* ------------------------------------------------------- */ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:45:43.638259683 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:48:13.100297285 -0600 @@ -200,14 +200,18 @@ * bus resets can be performed. */ -static void eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus) +static int eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus) { + int rc; if (bus) pcibios_remove_pci_devices(bus); /* Reset the pci controller. (Asserts RST#; resets config space). - * Reconfigure bridges and devices */ - rtas_set_slot_reset(pe_dn); + * Reconfigure bridges and devices. Don't try to bring the system + * up if the reset failed for some reason. 
*/ + rc = rtas_set_slot_reset(pe_dn); + if (rc) + return rc; /* Walk over all functions on this device */ rtas_configure_bridge(pe_dn); @@ -223,6 +227,8 @@ ssleep (5); pcibios_add_pci_devices(bus); } + + return 0; } /* The longest amount of time to wait for a pci device @@ -235,7 +241,7 @@ struct device_node *frozen_dn; struct pci_dn *frozen_pdn; struct pci_bus *frozen_bus; - int perm_failure = 0; + int rc = 0; frozen_dn = find_device_pe(event->dn); frozen_bus = pcibios_find_pci_bus(frozen_dn); @@ -272,7 +278,7 @@ frozen_pdn->eeh_freeze_count++; if (frozen_pdn->eeh_freeze_count > EEH_MAX_ALLOWED_FREEZES) - perm_failure = 1; + goto hard_fail; /* If the reset state is a '5' and the time to reset is 0 (infinity) * or is more then 15 seconds, then mark this as a permanent failure. @@ -280,34 +286,7 @@ if ((event->state == pci_channel_io_perm_failure) && ((event->time_unavail <= 0) || (event->time_unavail > MAX_WAIT_FOR_RECOVERY*1000))) - { - perm_failure = 1; - } - - /* Log the error with the rtas logger. */ - if (perm_failure) { - /* - * About 90% of all real-life EEH failures in the field - * are due to poorly seated PCI cards. Only 10% or so are - * due to actual, failed cards. - */ - printk(KERN_ERR - "EEH: PCI device %s - %s has failed %d times \n" - "and has been permanently disabled. Please try reseating\n" - "this device or replacing it.\n", - pci_name (frozen_pdn->pcidev), - pcid_name(frozen_pdn->pcidev), - frozen_pdn->eeh_freeze_count); - - eeh_slot_error_detail(frozen_pdn, 2 /* Permanent Error */); - - /* Notify all devices that they're about to go down. */ - pci_walk_bus(frozen_bus, eeh_report_failure, 0); - - /* Shut down the device drivers for good. */ - pcibios_remove_pci_devices(frozen_bus); - return; - } + goto hard_fail; eeh_slot_error_detail(frozen_pdn, 1 /* Temporary Error */); printk(KERN_WARNING @@ -330,24 +309,54 @@ * go down willingly, without panicing the system. 
*/ if (result == PCIERR_RESULT_NONE) { - eeh_reset_device(frozen_pdn, frozen_bus); + rc = eeh_reset_device(frozen_pdn, frozen_bus); + if (rc) + goto hard_fail; } /* If any device called out for a reset, then reset the slot */ if (result == PCIERR_RESULT_NEED_RESET) { - eeh_reset_device(frozen_pdn, NULL); + rc = eeh_reset_device(frozen_pdn, NULL); + if (rc) + goto hard_fail; pci_walk_bus(frozen_bus, eeh_report_reset, 0); } /* If all devices reported they can proceed, the re-enable PIO */ if (result == PCIERR_RESULT_CAN_RECOVER) { /* XXX Not supported; we brute-force reset the device */ - eeh_reset_device(frozen_pdn, NULL); + rc = eeh_reset_device(frozen_pdn, NULL); + if (rc) + goto hard_fail; pci_walk_bus(frozen_bus, eeh_report_reset, 0); } /* Tell all device drivers that they can resume operations */ pci_walk_bus(frozen_bus, eeh_report_resume, 0); + + return; + +hard_fail: + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards. + */ + printk(KERN_ERR + "EEH: PCI device %s - %s has failed %d times \n" + "and has been permanently disabled. Please try reseating\n" + "this device or replacing it.\n", + pci_name (frozen_pdn->pcidev), + pcid_name(frozen_pdn->pcidev), + frozen_pdn->eeh_freeze_count); + + eeh_slot_error_detail(frozen_pdn, 2 /* Permanent Error */); + + /* Notify all devices that they're about to go down. */ + pci_walk_bus(frozen_bus, eeh_report_failure, 0); + + /* Shut down the device drivers for good. 
*/ + pcibios_remove_pci_devices(frozen_bus); } /* ---------- end of file ---------- */ Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:45:43.656257159 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:48:13.104296724 -0600 @@ -77,8 +77,10 @@ * does this by asserting the PCI #RST line for 1/8th of * a second; this routine will sleep while the adapter is * being reset. + * + * Returns a non-zero value if the reset failed. */ -void rtas_set_slot_reset (struct pci_dn *); +int rtas_set_slot_reset (struct pci_dn *); /** * eeh_restore_bars - Restore device configuration info. From linas at linas.org Fri Nov 4 11:55:01 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:55:01 -0600 Subject: [PATCH 39/42]: ppc64: handle multifunction PCI devices properly References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005501.GA27154@mail.gnucash.org> 239-eeh-multifunction-consolidate.patch New-style firmware will often place multiple different functions under a non-EEH-aware parent. However, these devices might share a common PE "partition endpoint" and config address, and thus any EEH events will affect all of the devices in common. This patch makes the effort to find all of these common devices and handle them together.
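The idea of handling all devices that share a PE config address can be sketched in userspace as a walk over the siblings of the affected node. The struct mock_node layout, the restored flag and the function name below are hypothetical stand-ins for the device tree and for the configure-bridge/restore-BARs step; the NULL-parent guard is also an addition made for this sketch.

```c
#include <stddef.h>

/* Hypothetical miniature device-tree node; not the kernel's device_node. */
struct mock_node {
	struct mock_node *parent;
	struct mock_node *child;	/* first child */
	struct mock_node *sibling;	/* next sibling */
	int eeh_pe_config_addr;		/* 0 if no PE config address */
	int restored;			/* set when the node has been handled */
};

/* Handle every sibling sharing pe_dn's PE config address.
 * Returns the number of nodes that were handled. */
int mock_restore_shared_pe(struct mock_node *pe_dn)
{
	struct mock_node *n;
	int handled = 0;

	/* Old-style case: no PE address (or no parent), handle this node only. */
	if (!pe_dn->eeh_pe_config_addr || pe_dn->parent == NULL) {
		pe_dn->restored = 1;
		return 1;
	}

	/* New-style case: start at the parent's first child and visit
	 * every sibling with a matching PE config address. */
	for (n = pe_dn->parent->child; n != NULL; n = n->sibling) {
		if (n->eeh_pe_config_addr == pe_dn->eeh_pe_config_addr) {
			n->restored = 1;
			handled++;
		}
	}
	return handled;
}
```

Siblings with a different PE config address are left untouched, so an unrelated function under the same parent is not disturbed.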
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:48:13.093298267 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:48:44.941831253 -0600 @@ -223,6 +223,11 @@ void eeh_mark_slot (struct device_node *dn, int mode_flag) { dn = find_device_pe (dn); + + /* Back up one, since config addrs might be shared */ + if (PCI_DN(dn) && PCI_DN(dn)->eeh_pe_config_addr) + dn = dn->parent; + PCI_DN(dn)->eeh_mode |= mode_flag; __eeh_mark_slot (dn->child, mode_flag); } @@ -244,7 +249,13 @@ { unsigned long flags; spin_lock_irqsave(&confirm_error_lock, flags); + dn = find_device_pe (dn); + + /* Back up one, since config addrs might be shared */ + if (PCI_DN(dn) && PCI_DN(dn)->eeh_pe_config_addr) + dn = dn->parent; + PCI_DN(dn)->eeh_mode &= ~mode_flag; PCI_DN(dn)->eeh_check_count = 0; __eeh_clear_slot (dn->child, mode_flag); @@ -609,7 +620,7 @@ if (!pdn) return; - if (! 
pdn->eeh_is_bridge) + if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && (!pdn->eeh_is_bridge)) __restore_bars (pdn); dn = pdn->node->child; Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:48:13.100297285 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:48:44.950829991 -0600 @@ -213,9 +213,23 @@ if (rc) return rc; - /* Walk over all functions on this device */ - rtas_configure_bridge(pe_dn); - eeh_restore_bars(pe_dn); + /* New-style config addrs might be shared across multiple devices, + * Walk over all functions on this device */ + if (pe_dn->eeh_pe_config_addr) { + struct device_node *pe = pe_dn->node; + pe = pe->parent->child; + while (pe) { + struct pci_dn *ppe = PCI_DN(pe); + if (pe_dn->eeh_pe_config_addr == ppe->eeh_pe_config_addr) { + rtas_configure_bridge(ppe); + eeh_restore_bars(ppe); + } + pe = pe->sibling; + } + } else { + rtas_configure_bridge(pe_dn); + eeh_restore_bars(pe_dn); + } /* Give the system 5 seconds to finish running the user-space * hotplug shutdown scripts, e.g. ifdown for ethernet. Yes, From linas at linas.org Fri Nov 4 11:55:14 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:55:14 -0600 Subject: [PATCH 40/42]: ppc64: IOMMU: don't ioremap null pointers References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005514.GA27179@mail.gnucash.org> 240-ioremap-null-ptr-test.patch Under highly unusual circumstances, a buggy driver will ask a null ptr to be ioremapped, an operation that currently succeeds but leads to later trouble. Instead, refuse to remap the null pointer.
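A small userspace sketch of the page arithmetic involved may help show what the added test rejects. The MOCK_* macros assume a 4 KB page purely for illustration, and mock_ioremap_check() is a made-up name; the real function goes on to create a mapping, which is not modelled here.

```c
/* Assumed 4 KB page size, for illustration only. */
#define MOCK_PAGE_SIZE		4096UL
#define MOCK_PAGE_MASK		(~(MOCK_PAGE_SIZE - 1))
#define MOCK_PAGE_ALIGN(x)	(((x) + MOCK_PAGE_SIZE - 1) & MOCK_PAGE_MASK)

/* Returns 1 and fills in the page-aligned base and length if the request
 * would go ahead, or 0 if it is refused (zero length or page-zero base). */
int mock_ioremap_check(unsigned long addr, unsigned long size,
		       unsigned long *pa_out, unsigned long *len_out)
{
	unsigned long pa = addr & MOCK_PAGE_MASK;		/* round base down */
	unsigned long len = MOCK_PAGE_ALIGN(addr + size) - pa;	/* round end up */

	if (len == 0 || pa == 0)
		return 0;

	*pa_out = pa;
	*len_out = len;
	return 1;
}
```

Note that because the base is rounded down to a page boundary before the test, any address inside the first page is refused, not only an exact null pointer.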
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/arch/powerpc/mm/pgtable_64.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/mm/pgtable_64.c 2005-11-02 14:59:56.507624778 -0600 +++ linux-2.6.14-git3/arch/powerpc/mm/pgtable_64.c 2005-11-02 15:01:04.284115774 -0600 @@ -185,7 +185,7 @@ pa = addr & PAGE_MASK; size = PAGE_ALIGN(addr + size) - pa; - if (size == 0) + if ((size == 0) || (pa == 0)) return NULL; if (mem_init_done) { From linas at linas.org Fri Nov 4 11:55:19 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:55:19 -0600 Subject: [PATCH 41/42]: ppc64: Save device BARS much earlier in the boot sequence References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005519.GA27189@mail.gnucash.org> 241-eeh-save-bars-earlier.patch Save the PCI device bars *before* any PCI probing is done. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/rtas_pci.c 2005-10-31 12:01:21.000000000 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c 2005-11-02 16:52:48.556202006 -0600 @@ -72,7 +72,7 @@ return 0; } -static int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val) +int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val) { int returnval = -1; unsigned long buid, addr; Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 16:53:29.000000000 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 17:28:14.843073955 -0600 @@ -59,8 +59,6 @@ void pci_addr_cache_build(void); struct pci_dev *pci_get_device_by_addr(unsigned long addr); -void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn); - /** * eeh_slot_error_detail -- record and EEH error 
condition to the log * @severity: 1 if temporary, 2 if permanent failure. @@ -104,6 +102,7 @@ void rtas_configure_bridge(struct pci_dn *); int rtas_write_config(struct pci_dn *, int where, int size, u32 val); +int rtas_read_config(struct pci_dn *, int where, int size, u32 *val); /** * mark and clear slots: find "partition endpoint" PE and set or Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h 2005-11-02 14:43:49.000000000 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h 2005-11-02 17:13:07.358586231 -0600 @@ -58,15 +58,15 @@ struct iommu_table; struct pci_dn { - int busno; /* for pci devices */ - int bussubno; /* for pci devices */ - int devfn; /* for pci devices */ + int busno; /* pci bus number */ + int bussubno; /* pci subordinate bus number */ + int devfn; /* pci device and function number */ + int class_code; /* pci device class */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; int eeh_pe_config_addr; /* new-style partition endpoint address */ int eeh_check_count; /* # times driver ignored error */ int eeh_freeze_count; /* # times this device froze up. 
*/ - int eeh_is_bridge; /* device is pci-to-pci bridge */ int pci_ext_config_space; /* for pci devices */ struct pci_controller *phb; /* for pci devices */ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 16:45:55.000000000 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 18:42:28.243139205 -0600 @@ -106,6 +106,8 @@ static DEFINE_PER_CPU(unsigned long, ignored_failures); static DEFINE_PER_CPU(unsigned long, slot_resets); +#define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE) + /* --------------------------------------------------------------- */ /* Below lies the EEH event infrastructure */ @@ -620,7 +622,7 @@ if (!pdn) return; - if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && (!pdn->eeh_is_bridge)) + if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && !IS_BRIDGE(pdn->class_code)) __restore_bars (pdn); dn = pdn->node->child; @@ -638,18 +640,15 @@ * PCI devices are added individuallly; but, for the restore, * an entire slot is reset at a time. 
*/ -void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn) +static void eeh_save_bars(struct pci_dn *pdn) { int i; - if (!pdev || !pdn ) + if (!pdn ) return; for (i = 0; i < 16; i++) - pci_read_config_dword(pdev, i * 4, &pdn->config_space[i]); - - if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) - pdn->eeh_is_bridge = 1; + rtas_read_config(pdn, i * 4, 4, &pdn->config_space[i]); } void @@ -699,6 +698,7 @@ int enable; struct pci_dn *pdn = PCI_DN(dn); + pdn->class_code = *class_code; pdn->eeh_mode = 0; pdn->eeh_check_count = 0; pdn->eeh_freeze_count = 0; @@ -781,6 +781,7 @@ dn->full_name); } + eeh_save_bars(pdn); return NULL; } @@ -915,7 +916,6 @@ pdn->pcidev = dev; pci_addr_cache_insert_device (dev); - eeh_save_bars(dev, pdn); } EXPORT_SYMBOL_GPL(eeh_add_device_late); Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-02 16:45:55.000000000 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-02 18:40:54.893242771 -0600 @@ -304,10 +304,7 @@ pci_addr_cache_insert_device(dev); - /* Save the BAR's; firmware doesn't restore these after EEH reset */ dn = pci_device_to_OF_node(dev); - eeh_save_bars(dev, PCI_DN(dn)); - pci_dev_get (dev); /* matching put is in eeh_remove_device() */ PCI_DN(dn)->pcidev = dev; } From linas at linas.org Fri Nov 4 11:55:25 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:55:25 -0600 Subject: [PATCH 42/42]: ppc64: get rid of per_cpu counters References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005525.GA27197@mail.gnucash.org> 242-eeh-no-percpu-counters.patch Remove per-cpu counters from the EEH code. These statistics counters are incremented at a very low frequency, and the performance gains of per-cpu variables are negligible.
By contrast, the counters weren't safe against cpu gard operations, and it's not worth the effort to make them so (other than to turn them into plain globals). Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 18:42:28.243139205 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 18:49:24.196716323 -0600 @@ -97,14 +97,14 @@ static int eeh_error_buf_size; /* System monitoring statistics */ -static DEFINE_PER_CPU(unsigned long, no_device); -static DEFINE_PER_CPU(unsigned long, no_dn); -static DEFINE_PER_CPU(unsigned long, no_cfg_addr); -static DEFINE_PER_CPU(unsigned long, ignored_check); -static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); -static DEFINE_PER_CPU(unsigned long, false_positives); -static DEFINE_PER_CPU(unsigned long, ignored_failures); -static DEFINE_PER_CPU(unsigned long, slot_resets); +static unsigned long no_device; +static unsigned long no_dn; +static unsigned long no_cfg_addr; +static unsigned long ignored_check; +static unsigned long total_mmio_ffs; +static unsigned long false_positives; +static unsigned long ignored_failures; +static unsigned long slot_resets; #define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE) @@ -288,13 +288,13 @@ enum pci_channel_state state; int rc = 0; - __get_cpu_var(total_mmio_ffs)++; + total_mmio_ffs++; if (!eeh_subsystem_enabled) return 0; if (!dn) { - __get_cpu_var(no_dn)++; + no_dn++; return 0; } pdn = PCI_DN(dn); @@ -302,7 +302,7 @@ /* Access to IO BARs might get this far and still not want checking.
*/ if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || pdn->eeh_mode & EEH_MODE_NOCHECK) { - __get_cpu_var(ignored_check)++; + ignored_check++; #ifdef DEBUG printk ("EEH:ignored check (%x) for %s %s\n", pdn->eeh_mode, pci_name (dev), dn->full_name); @@ -311,7 +311,7 @@ } if (!pdn->eeh_config_addr && !pdn->eeh_pe_config_addr) { - __get_cpu_var(no_cfg_addr)++; + no_cfg_addr++; return 0; } @@ -353,7 +353,7 @@ if (ret != 0) { printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", ret, dn->full_name); - __get_cpu_var(false_positives)++; + false_positives++; rc = 0; goto dn_unlock; } @@ -362,14 +362,14 @@ if (rets[1] != 1) { printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", ret, dn->full_name); - __get_cpu_var(false_positives)++; + false_positives++; rc = 0; goto dn_unlock; } /* If not the kind of error we know about, punt. */ if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { - __get_cpu_var(false_positives)++; + false_positives++; rc = 0; goto dn_unlock; } @@ -377,12 +377,12 @@ /* Note that config-io to empty slots may fail; * we recognize empty because they don't have children. 
*/ if ((rets[0] == 5) && (dn->child == NULL)) { - __get_cpu_var(false_positives)++; + false_positives++; rc = 0; goto dn_unlock; } - __get_cpu_var(slot_resets)++; + slot_resets++; /* Avoid repeated reports of this failure, including problems * with other functions on this device, and functions under @@ -432,7 +432,7 @@ addr = eeh_token_to_phys((unsigned long __force) token); dev = pci_get_device_by_addr(addr); if (!dev) { - __get_cpu_var(no_device)++; + no_device++; return val; } @@ -963,25 +963,9 @@ static int proc_eeh_show(struct seq_file *m, void *v) { - unsigned int cpu; - unsigned long ffs = 0, positives = 0, failures = 0; - unsigned long resets = 0; - unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; - - for_each_cpu(cpu) { - ffs += per_cpu(total_mmio_ffs, cpu); - positives += per_cpu(false_positives, cpu); - failures += per_cpu(ignored_failures, cpu); - resets += per_cpu(slot_resets, cpu); - no_dev += per_cpu(no_device, cpu); - no_dn += per_cpu(no_dn, cpu); - no_cfg += per_cpu(no_cfg_addr, cpu); - no_check += per_cpu(ignored_check, cpu); - } - if (0 == eeh_subsystem_enabled) { seq_printf(m, "EEH Subsystem is globally disabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); + seq_printf(m, "eeh_total_mmio_ffs=%ld\n", total_mmio_ffs); } else { seq_printf(m, "EEH Subsystem is enabled\n"); seq_printf(m, @@ -993,8 +977,10 @@ "eeh_false_positives=%ld\n" "eeh_ignored_failures=%ld\n" "eeh_slot_resets=%ld\n", - no_dev, no_dn, no_cfg, no_check, - ffs, positives, failures, resets); + no_device, no_dn, no_cfg_addr, + ignored_check, total_mmio_ffs, + false_positives, ignored_failures, + slot_resets); } return 0; From linas at linas.org Fri Nov 4 11:57:35 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:57:35 -0600 Subject: [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64 References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005735.GA27243@mail.gnucash.org> 11-eeh-move-to-powerpc.patch Move 
arch/ppc64/kernel/eeh.c to arch/powerpc/platforms/pseries/eeh.c. No other changes (except for the Makefile to build it). Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-11-02 14:29:22.485829789 -0600 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,1093 +0,0 @@ -/* - * eeh.c - * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#undef DEBUG - -/** Overview: - * EEH, or "Extended Error Handling" is a PCI bridge technology for - * dealing with PCI bus errors that can't be dealt with within the - * usual PCI framework, except by check-stopping the CPU. Systems - * that are designed for high-availability/reliability cannot afford - * to crash due to a "mere" PCI error, thus the need for EEH.
- * An EEH-capable bridge operates by converting a detected error - * into a "slot freeze", taking the PCI adapter off-line, making - * the slot behave, from the OS'es point of view, as if the slot - * were "empty": all reads return 0xff's and all writes are silently - * ignored. EEH slot isolation events can be triggered by parity - * errors on the address or data busses (e.g. during posted writes), - * which in turn might be caused by low voltage on the bus, dust, - * vibration, humidity, radioactivity or plain-old failed hardware. - * - * Note, however, that one of the leading causes of EEH slot - * freeze events are buggy device drivers, buggy device microcode, - * or buggy device hardware. This is because any attempt by the - * device to bus-master data to a memory address that is not - * assigned to the device will trigger a slot freeze. (The idea - * is to prevent devices-gone-wild from corrupting system memory). - * Buggy hardware/drivers will have a miserable time co-existing - * with EEH. - * - * Ideally, a PCI device driver, when suspecting that an isolation - * event has occured (e.g. by reading 0xff's), will then ask EEH - * whether this is the case, and then take appropriate steps to - * reset the PCI slot, the PCI device, and then resume operations. - * However, until that day, the checking is done here, with the - * eeh_check_failure() routine embedded in the MMIO macros. If - * the slot is found to be isolated, an "EEH Event" is synthesized - * and sent out for processing. - */ - -/* EEH event workqueue setup. */ -static DEFINE_SPINLOCK(eeh_eventlist_lock); -LIST_HEAD(eeh_eventlist); -static void eeh_event_handler(void *); -DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL); - -static struct notifier_block *eeh_notifier_chain; - -/* If a device driver keeps reading an MMIO register in an interrupt - * handler after a slot isolation event has occurred, we assume it - * is broken and panic. 
This sets the threshold for how many read - * attempts we allow before panicking. - */ -#define EEH_MAX_FAILS 100000 - -/* RTAS tokens */ -static int ibm_set_eeh_option; -static int ibm_set_slot_reset; -static int ibm_read_slot_reset_state; -static int ibm_read_slot_reset_state2; -static int ibm_slot_error_detail; - -static int eeh_subsystem_enabled; - -/* Lock to avoid races due to multiple reports of an error */ -static DEFINE_SPINLOCK(confirm_error_lock); - -/* Buffer for reporting slot-error-detail rtas calls */ -static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; -static DEFINE_SPINLOCK(slot_errbuf_lock); -static int eeh_error_buf_size; - -/* System monitoring statistics */ -static DEFINE_PER_CPU(unsigned long, no_device); -static DEFINE_PER_CPU(unsigned long, no_dn); -static DEFINE_PER_CPU(unsigned long, no_cfg_addr); -static DEFINE_PER_CPU(unsigned long, ignored_check); -static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); -static DEFINE_PER_CPU(unsigned long, false_positives); -static DEFINE_PER_CPU(unsigned long, ignored_failures); -static DEFINE_PER_CPU(unsigned long, slot_resets); - -/** - * The pci address cache subsystem. This subsystem places - * PCI device address resources into a red-black tree, sorted - * according to the address range, so that given only an i/o - * address, the corresponding PCI device can be **quickly** - * found. It is safe to perform an address lookup in an interrupt - * context; this ability is an important feature. - * - * Currently, the only customer of this code is the EEH subsystem; - * thus, this code has been somewhat tailored to suit EEH better. - * In particular, the cache does *not* hold the addresses of devices - * for which EEH is not enabled. - * - * (Implementation Note: The RB tree seems to be better/faster - * than any hash algo I could think of for this problem, even - * with the penalty of slow pointer chases for d-cache misses). 
- */ -struct pci_io_addr_range -{ - struct rb_node rb_node; - unsigned long addr_lo; - unsigned long addr_hi; - struct pci_dev *pcidev; - unsigned int flags; -}; - -static struct pci_io_addr_cache -{ - struct rb_root rb_root; - spinlock_t piar_lock; -} pci_io_addr_cache_root; - -static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) -{ - struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; - - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (addr < piar->addr_lo) { - n = n->rb_left; - } else { - if (addr > piar->addr_hi) { - n = n->rb_right; - } else { - pci_dev_get(piar->pcidev); - return piar->pcidev; - } - } - } - - return NULL; -} - -/** - * pci_get_device_by_addr - Get device, given only address - * @addr: mmio (PIO) phys address or i/o port number - * - * Given an mmio phys address, or a port number, find a pci device - * that implements this address. Be sure to pci_dev_put the device - * when finished. I/O port numbers are assumed to be offset - * from zero (that is, they do *not* have pci_io_addr added in). - * It is safe to call this function within an interrupt. - */ -static struct pci_dev *pci_get_device_by_addr(unsigned long addr) -{ - struct pci_dev *dev; - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - dev = __pci_get_device_by_addr(addr); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); - return dev; -} - -#ifdef DEBUG -/* - * Handy-dandy debug print routine, does nothing more - * than print out the contents of our addr cache. - */ -static void pci_addr_cache_print(struct pci_io_addr_cache *cache) -{ - struct rb_node *n; - int cnt = 0; - - n = rb_first(&cache->rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", - (piar->flags & IORESOURCE_IO) ? 
"i/o" : "mem", cnt, - piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); - cnt++; - n = rb_next(n); - } -} -#endif - -/* Insert address range into the rb tree. */ -static struct pci_io_addr_range * -pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, - unsigned long ahi, unsigned int flags) -{ - struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; - struct rb_node *parent = NULL; - struct pci_io_addr_range *piar; - - /* Walk tree, find a place to insert into tree */ - while (*p) { - parent = *p; - piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (ahi < piar->addr_lo) { - p = &parent->rb_left; - } else if (alo > piar->addr_hi) { - p = &parent->rb_right; - } else { - if (dev != piar->pcidev || - alo != piar->addr_lo || ahi != piar->addr_hi) { - printk(KERN_WARNING "PIAR: overlapping address range\n"); - } - return piar; - } - } - piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); - if (!piar) - return NULL; - - piar->addr_lo = alo; - piar->addr_hi = ahi; - piar->pcidev = dev; - piar->flags = flags; - -#ifdef DEBUG - printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", - alo, ahi, pci_name (dev)); -#endif - - rb_link_node(&piar->rb_node, parent, p); - rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); - - return piar; -} - -static void __pci_addr_cache_insert_device(struct pci_dev *dev) -{ - struct device_node *dn; - struct pci_dn *pdn; - int i; - int inserted = 0; - - dn = pci_device_to_OF_node(dev); - if (!dn) { - printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); - return; - } - - /* Skip any devices for which EEH is not enabled. */ - pdn = PCI_DN(dn); - if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || - pdn->eeh_mode & EEH_MODE_NOCHECK) { -#ifdef DEBUG - printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", - pci_name(dev), pdn->node->full_name); -#endif - return; - } - - /* The cache holds a reference to the device... 
*/ - pci_dev_get(dev); - - /* Walk resources on this device, poke them into the tree */ - for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { - unsigned long start = pci_resource_start(dev,i); - unsigned long end = pci_resource_end(dev,i); - unsigned int flags = pci_resource_flags(dev,i); - - /* We are interested only bus addresses, not dma or other stuff */ - if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) - continue; - if (start == 0 || ~start == 0 || end == 0 || ~end == 0) - continue; - pci_addr_cache_insert(dev, start, end, flags); - inserted = 1; - } - - /* If there was nothing to add, the cache has no reference... */ - if (!inserted) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_insert_device - Add a device to the address cache - * @dev: PCI device whose I/O addresses we are interested in. - * - * In order to support the fast lookup of devices based on addresses, - * we maintain a cache of devices that can be quickly searched. - * This routine adds a device to that cache. - */ -static void pci_addr_cache_insert_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_insert_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) -{ - struct rb_node *n; - int removed = 0; - -restart: - n = rb_first(&pci_io_addr_cache_root.rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (piar->pcidev == dev) { - rb_erase(n, &pci_io_addr_cache_root.rb_root); - removed = 1; - kfree(piar); - goto restart; - } - n = rb_next(n); - } - - /* The cache no longer holds its reference to this device... */ - if (removed) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_remove_device - remove pci device from addr cache - * @dev: device to remove - * - * Remove a device from the addr-cache tree. 
- * This is potentially expensive, since it will walk - * the tree multiple times (once per resource). - * But so what; device removal doesn't need to be that fast. - */ -static void pci_addr_cache_remove_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_remove_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -/** - * pci_addr_cache_build - Build a cache of I/O addresses - * - * Build a cache of pci i/o addresses. This cache will be used to - * find the pci device that corresponds to a given address. - * This routine scans all pci busses to build the cache. - * Must be run late in boot process, after the pci controllers - * have been scaned for devices (after all device resources are known). - */ -void __init pci_addr_cache_build(void) -{ - struct pci_dev *dev = NULL; - - if (!eeh_subsystem_enabled) - return; - - spin_lock_init(&pci_io_addr_cache_root.piar_lock); - - while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { - /* Ignore PCI bridges ( XXX why ??) */ - if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) { - continue; - } - pci_addr_cache_insert_device(dev); - } - -#ifdef DEBUG - /* Verify tree built up above, echo back the list of addrs. */ - pci_addr_cache_print(&pci_io_addr_cache_root); -#endif -} - -/* --------------------------------------------------------------- */ -/* Above lies the PCI Address Cache. 
Below lies the EEH event infrastructure */ - -void eeh_slot_error_detail (struct pci_dn *pdn, int severity) -{ - unsigned long flags; - int rc; - - /* Log the error with the rtas logger */ - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, pdn->eeh_config_addr, - BUID_HI(pdn->phb->buid), - BUID_LO(pdn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - severity); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); -} - -/** - * eeh_register_notifier - Register to find out about EEH events. - * @nb: notifier block to callback on events - */ -int eeh_register_notifier(struct notifier_block *nb) -{ - return notifier_chain_register(&eeh_notifier_chain, nb); -} - -/** - * eeh_unregister_notifier - Unregister to an EEH event notifier. - * @nb: notifier block to callback on events - */ -int eeh_unregister_notifier(struct notifier_block *nb) -{ - return notifier_chain_unregister(&eeh_notifier_chain, nb); -} - -/** - * read_slot_reset_state - Read the reset state of a device node's slot - * @dn: device node to read - * @rets: array to return results in - */ -static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) -{ - int token, outputs; - - if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { - token = ibm_read_slot_reset_state2; - outputs = 4; - } else { - token = ibm_read_slot_reset_state; - rets[2] = 0; /* fake PE Unavailable info */ - outputs = 3; - } - - return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr, - BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); -} - -/** - * eeh_panic - call panic() for an eeh event that cannot be handled. - * The philosophy of this routine is that it is better to panic and - * halt the OS than it is to risk possible data corruption by - * oblivious device drivers that don't know better. 
- * - * @dev pci device that had an eeh event - * @reset_state current reset state of the device slot - */ -static void eeh_panic(struct pci_dev *dev, int reset_state) -{ - /* - * XXX We should create a separate sysctl for this. - * - * Since the panic_on_oops sysctl is used to halt the system - * in light of potential corruption, we can use it here. - */ - if (panic_on_oops) { - struct device_node *dn = pci_device_to_OF_node(dev); - eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); - panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, - pci_name(dev)); - } - else { - __get_cpu_var(ignored_failures)++; - printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", - reset_state, pci_name(dev)); - } -} - -/** - * eeh_event_handler - dispatch EEH events. The detection of a frozen - * slot can occur inside an interrupt, where it can be hard to do - * anything about it. The goal of this routine is to pull these - * detection events out of the context of the interrupt handler, and - * re-dispatch them for processing at a later time in a normal context. - * - * @dummy - unused - */ -static void eeh_event_handler(void *dummy) -{ - unsigned long flags; - struct eeh_event *event; - - while (1) { - spin_lock_irqsave(&eeh_eventlist_lock, flags); - event = NULL; - if (!list_empty(&eeh_eventlist)) { - event = list_entry(eeh_eventlist.next, struct eeh_event, list); - list_del(&event->list); - } - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - if (event == NULL) - break; - - printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " - "%s\n", event->reset_state, - pci_name(event->dev)); - - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); - - pci_dev_put(event->dev); - kfree(event); - } -} - -/** - * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xA.... 
- */ -static inline unsigned long eeh_token_to_phys(unsigned long token) -{ - pte_t *ptep; - unsigned long pa; - - ptep = find_linux_pte(init_mm.pgd, token); - if (!ptep) - return token; - pa = pte_pfn(*ptep) << PAGE_SHIFT; - - return pa | (token & (PAGE_SIZE-1)); -} - -/** - * Return the "partitionable endpoint" (pe) under which this device lies - */ -static struct device_node * find_device_pe(struct device_node *dn) -{ - while ((dn->parent) && PCI_DN(dn->parent) && - (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { - dn = dn->parent; - } - return dn; -} - -/** Mark all devices that are peers of this device as failed. - * Mark the device driver too, so that it can see the failure - * immediately; this is critical, since some drivers poll - * status registers in interrupts ... If a driver is polling, - * and the slot is frozen, then the driver can deadlock in - * an interrupt context, which is bad. - */ - -static inline void __eeh_mark_slot (struct device_node *dn) -{ - while (dn) { - PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; - - if (dn->child) - __eeh_mark_slot (dn->child); - dn = dn->sibling; - } -} - -static inline void __eeh_clear_slot (struct device_node *dn) -{ - while (dn) { - PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; - if (dn->child) - __eeh_clear_slot (dn->child); - dn = dn->sibling; - } -} - -static inline void eeh_clear_slot (struct device_node *dn) -{ - unsigned long flags; - spin_lock_irqsave(&confirm_error_lock, flags); - __eeh_clear_slot (dn); - spin_unlock_irqrestore(&confirm_error_lock, flags); -} - -/** - * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze - * @dn device node - * @dev pci device, if known - * - * Check for an EEH failure for the given device node. Call this - * routine if the result of a read was all 0xff's and you want to - * find out if this is due to an EEH slot freeze. This routine - * will query firmware for the EEH status. 
- * - * Returns 0 if there has not been an EEH error; otherwise returns - * a non-zero value and queues up a slot isolation event notification. - * - * It is safe to call this routine in an interrupt context. - */ -int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) -{ - int ret; - int rets[3]; - unsigned long flags; - int reset_state; - struct eeh_event *event; - struct pci_dn *pdn; - struct device_node *pe_dn; - int rc = 0; - - __get_cpu_var(total_mmio_ffs)++; - - if (!eeh_subsystem_enabled) - return 0; - - if (!dn) { - __get_cpu_var(no_dn)++; - return 0; - } - pdn = PCI_DN(dn); - - /* Access to IO BARs might get this far and still not want checking. */ - if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || - pdn->eeh_mode & EEH_MODE_NOCHECK) { - __get_cpu_var(ignored_check)++; -#ifdef DEBUG - printk ("EEH:ignored check (%x) for %s %s\n", - pdn->eeh_mode, pci_name (dev), dn->full_name); -#endif - return 0; - } - - if (!pdn->eeh_config_addr) { - __get_cpu_var(no_cfg_addr)++; - return 0; - } - - /* If we already have a pending isolation event for this - * slot, we know it's bad already, we don't need to check. - * Do this checking under a lock; as multiple PCI devices - * in one slot might report errors simultaneously, and we - * only want one error recovery routine running. - */ - spin_lock_irqsave(&confirm_error_lock, flags); - rc = 1; - if (pdn->eeh_mode & EEH_MODE_ISOLATED) { - pdn->eeh_check_count ++; - if (pdn->eeh_check_count >= EEH_MAX_FAILS) { - printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n", - pdn->eeh_check_count); - dump_stack(); - - /* re-read the slot reset state */ - if (read_slot_reset_state(pdn, rets) != 0) - rets[0] = -1; /* reset state unknown */ - - /* If we are here, then we hit an infinite loop. Stop. */ - panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev)); - } - goto dn_unlock; - } - - /* - * Now test for an EEH failure. This is VERY expensive. 
- * Note that the eeh_config_addr may be a parent device - * in the case of a device behind a bridge, or it may be - * function zero of a multi-function device. - * In any case they must share a common PHB. - */ - ret = read_slot_reset_state(pdn, rets); - - /* If the call to firmware failed, punt */ - if (ret != 0) { - printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", - ret, dn->full_name); - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - /* If EEH is not supported on this device, punt. */ - if (rets[1] != 1) { - printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", - ret, dn->full_name); - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - /* If not the kind of error we know about, punt. */ - if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - /* Note that config-io to empty slots may fail; - * we recognize empty because they don't have children. */ - if ((rets[0] == 5) && (dn->child == NULL)) { - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - __get_cpu_var(slot_resets)++; - - /* Avoid repeated reports of this failure, including problems - * with other functions on this device, and functions under - * bridges. 
*/ - pe_dn = find_device_pe (dn); - __eeh_mark_slot (pe_dn); - spin_unlock_irqrestore(&confirm_error_lock, flags); - - reset_state = rets[0]; - - eeh_slot_error_detail (pdn, 1 /* Temporary Error */); - - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); - event = kmalloc(sizeof(*event), GFP_ATOMIC); - if (event == NULL) { - eeh_panic(dev, reset_state); - return 1; - } - - event->dev = dev; - event->dn = dn; - event->reset_state = reset_state; - - /* We may or may not be called in an interrupt context */ - spin_lock_irqsave(&eeh_eventlist_lock, flags); - list_add(&event->list, &eeh_eventlist); - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - - /* Most EEH events are due to device driver bugs. Having - * a stack trace will help the device-driver authors figure - * out what happened. So print that out. */ - if (rets[0] != 5) dump_stack(); - schedule_work(&eeh_event_wq); - - return 1; - -dn_unlock: - spin_unlock_irqrestore(&confirm_error_lock, flags); - return rc; -} - -EXPORT_SYMBOL_GPL(eeh_dn_check_failure); - -/** - * eeh_check_failure - check if all 1's data is due to EEH slot freeze - * @token i/o token, should be address in the form 0xA.... - * @val value, should be all 1's (XXX why do we need this arg??) - * - * Check for an EEH failure at the given token address. Call this - * routine if the result of a read was all 0xff's and you want to - * find out if this is due to an EEH slot freeze event. This routine - * will query firmware for the EEH status. - * - * Note this routine is safe to call in an interrupt context. - */ -unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) -{ - unsigned long addr; - struct pci_dev *dev; - struct device_node *dn; - - /* Finding the phys addr + pci device; this is pretty quick. 
*/ - addr = eeh_token_to_phys((unsigned long __force) token); - dev = pci_get_device_by_addr(addr); - if (!dev) { - __get_cpu_var(no_device)++; - return val; - } - - dn = pci_device_to_OF_node(dev); - eeh_dn_check_failure (dn, dev); - - pci_dev_put(dev); - return val; -} - -EXPORT_SYMBOL(eeh_check_failure); - -struct eeh_early_enable_info { - unsigned int buid_hi; - unsigned int buid_lo; -}; - -/* Enable eeh for the given device node. */ -static void *early_enable_eeh(struct device_node *dn, void *data) -{ - struct eeh_early_enable_info *info = data; - int ret; - char *status = get_property(dn, "status", NULL); - u32 *class_code = (u32 *)get_property(dn, "class-code", NULL); - u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL); - u32 *device_id = (u32 *)get_property(dn, "device-id", NULL); - u32 *regs; - int enable; - struct pci_dn *pdn = PCI_DN(dn); - - pdn->eeh_mode = 0; - pdn->eeh_check_count = 0; - pdn->eeh_freeze_count = 0; - - if (status && strcmp(status, "ok") != 0) - return NULL; /* ignore devices with bad status */ - - /* Ignore bad nodes. */ - if (!class_code || !vendor_id || !device_id) - return NULL; - - /* There is nothing to check on PCI to ISA bridges */ - if (dn->type && !strcmp(dn->type, "isa")) { - pdn->eeh_mode |= EEH_MODE_NOCHECK; - return NULL; - } - - /* - * Now decide if we are going to "Disable" EEH checking - * for this device. We still run with the EEH hardware active, - * but we won't be checking for ff's. This means a driver - * could return bad data (very bad!), an interrupt handler could - * hang waiting on status bits that won't change, etc. - * But there are a few cases like display devices that make sense. - */ - enable = 1; /* i.e. we will do checking */ - if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY) - enable = 0; - - if (!enable) - pdn->eeh_mode |= EEH_MODE_NOCHECK; - - /* Ok... see if this device supports EEH. Some do, some don't, - * and the only way to find out is to check each and every one. 
*/ - regs = (u32 *)get_property(dn, "reg", NULL); - if (regs) { - /* First register entry is addr (00BBSS00) */ - /* Try to enable eeh */ - ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL, - regs[0], info->buid_hi, info->buid_lo, - EEH_ENABLE); - if (ret == 0) { - eeh_subsystem_enabled = 1; - pdn->eeh_mode |= EEH_MODE_SUPPORTED; - pdn->eeh_config_addr = regs[0]; -#ifdef DEBUG - printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name); -#endif - } else { - - /* This device doesn't support EEH, but it may have an - * EEH parent, in which case we mark it as supported. */ - if (dn->parent && PCI_DN(dn->parent) - && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { - /* Parent supports EEH. */ - pdn->eeh_mode |= EEH_MODE_SUPPORTED; - pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr; - return NULL; - } - } - } else { - printk(KERN_WARNING "EEH: %s: unable to get reg property.\n", - dn->full_name); - } - - return NULL; -} - -/* - * Initialize EEH by trying to enable it for all of the adapters in the system. - * As a side effect we can determine here if eeh is supported at all. - * Note that we leave EEH on so failed config cycles won't cause a machine - * check. If a user turns off EEH for a particular adapter they are really - * telling Linux to ignore errors. Some hardware (e.g. POWER5) won't - * grant access to a slot if EEH isn't enabled, and so we always enable - * EEH for all slots/all devices. - * - * The eeh-force-off option disables EEH checking globally, for all slots. - * Even if force-off is set, the EEH hardware is still enabled, so that - * newer systems can boot. 
- */ -void __init eeh_init(void) -{ - struct device_node *phb, *np; - struct eeh_early_enable_info info; - - spin_lock_init(&confirm_error_lock); - spin_lock_init(&slot_errbuf_lock); - - np = of_find_node_by_path("/rtas"); - if (np == NULL) - return; - - ibm_set_eeh_option = rtas_token("ibm,set-eeh-option"); - ibm_set_slot_reset = rtas_token("ibm,set-slot-reset"); - ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2"); - ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); - ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); - - if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) - return; - - eeh_error_buf_size = rtas_token("rtas-error-log-max"); - if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) { - eeh_error_buf_size = 1024; - } - if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) { - printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated " - "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX); - eeh_error_buf_size = RTAS_ERROR_LOG_MAX; - } - - /* Enable EEH for all adapters. Note that eeh requires buid's */ - for (phb = of_find_node_by_name(NULL, "pci"); phb; - phb = of_find_node_by_name(phb, "pci")) { - unsigned long buid; - - buid = get_phb_buid(phb); - if (buid == 0 || PCI_DN(phb) == NULL) - continue; - - info.buid_lo = BUID_LO(buid); - info.buid_hi = BUID_HI(buid); - traverse_pci_devices(phb, early_enable_eeh, &info); - } - - if (eeh_subsystem_enabled) - printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n"); - else - printk(KERN_WARNING "EEH: No capable adapters found\n"); -} - -/** - * eeh_add_device_early - enable EEH for the indicated device_node - * @dn: device node for which to set up EEH - * - * This routine must be used to perform EEH initialization for PCI - * devices that were added after system boot (e.g. hotplug, dlpar). - * This routine must be called before any i/o is performed to the - * adapter (inluding any config-space i/o). 
- * Whether this actually enables EEH or not for this device depends - * on the CEC architecture, type of the device, on earlier boot - * command-line arguments & etc. - */ -void eeh_add_device_early(struct device_node *dn) -{ - struct pci_controller *phb; - struct eeh_early_enable_info info; - - if (!dn || !PCI_DN(dn)) - return; - phb = PCI_DN(dn)->phb; - if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", - dn->full_name); - dump_stack(); - return; - } - - info.buid_hi = BUID_HI(phb->buid); - info.buid_lo = BUID_LO(phb->buid); - early_enable_eeh(dn, &info); -} -EXPORT_SYMBOL_GPL(eeh_add_device_early); - -/** - * eeh_add_device_late - perform EEH initialization for the indicated pci device - * @dev: pci device for which to set up EEH - * - * This routine must be used to complete EEH initialization for PCI - * devices that were added after system boot (e.g. hotplug, dlpar). - */ -void eeh_add_device_late(struct pci_dev *dev) -{ - struct device_node *dn; - - if (!dev || !eeh_subsystem_enabled) - return; - -#ifdef DEBUG - printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev)); -#endif - - pci_dev_get (dev); - dn = pci_device_to_OF_node(dev); - PCI_DN(dn)->pcidev = dev; - - pci_addr_cache_insert_device (dev); -} -EXPORT_SYMBOL_GPL(eeh_add_device_late); - -/** - * eeh_remove_device - undo EEH setup for the indicated pci device - * @dev: pci device to be removed - * - * This routine should be when a device is removed from a running - * system (e.g. by hotplug or dlpar). 
- */ -void eeh_remove_device(struct pci_dev *dev) -{ - struct device_node *dn; - if (!dev || !eeh_subsystem_enabled) - return; - - /* Unregister the device with the EEH/PCI address search system */ -#ifdef DEBUG - printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev)); -#endif - pci_addr_cache_remove_device(dev); - - dn = pci_device_to_OF_node(dev); - PCI_DN(dn)->pcidev = NULL; - pci_dev_put (dev); -} -EXPORT_SYMBOL_GPL(eeh_remove_device); - -static int proc_eeh_show(struct seq_file *m, void *v) -{ - unsigned int cpu; - unsigned long ffs = 0, positives = 0, failures = 0; - unsigned long resets = 0; - unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; - - for_each_cpu(cpu) { - ffs += per_cpu(total_mmio_ffs, cpu); - positives += per_cpu(false_positives, cpu); - failures += per_cpu(ignored_failures, cpu); - resets += per_cpu(slot_resets, cpu); - no_dev += per_cpu(no_device, cpu); - no_dn += per_cpu(no_dn, cpu); - no_cfg += per_cpu(no_cfg_addr, cpu); - no_check += per_cpu(ignored_check, cpu); - } - - if (0 == eeh_subsystem_enabled) { - seq_printf(m, "EEH Subsystem is globally disabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); - } else { - seq_printf(m, "EEH Subsystem is enabled\n"); - seq_printf(m, - "no device=%ld\n" - "no device node=%ld\n" - "no config address=%ld\n" - "check not wanted=%ld\n" - "eeh_total_mmio_ffs=%ld\n" - "eeh_false_positives=%ld\n" - "eeh_ignored_failures=%ld\n" - "eeh_slot_resets=%ld\n", - no_dev, no_dn, no_cfg, no_check, - ffs, positives, failures, resets); - } - - return 0; -} - -static int proc_eeh_open(struct inode *inode, struct file *file) -{ - return single_open(file, proc_eeh_show, NULL); -} - -static struct file_operations proc_eeh_operations = { - .open = proc_eeh_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; - -static int __init eeh_init_proc(void) -{ - struct proc_dir_entry *e; - - if (systemcfg->platform & PLATFORM_PSERIES) { - e = create_proc_entry("ppc64/eeh", 0, 
NULL); - if (e) - e->proc_fops = &proc_eeh_operations; - } - - return 0; -} -__initcall(eeh_init_proc); Index: linux-2.6.14-git3/arch/ppc64/kernel/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/Makefile 2005-11-02 14:29:22.485829789 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/Makefile 2005-11-02 14:30:49.805589414 -0600 @@ -35,7 +35,6 @@ bpa_iic.o spider-pic.o obj-$(CONFIG_KEXEC) += machine_kexec.o -obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o obj-$(CONFIG_SMP) += smp.o Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile 2005-10-31 11:19:47.000000000 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:31:36.150092654 -0600 @@ -3,3 +3,4 @@ obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o +obj-$(CONFIG_EEH) += eeh.o Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:30:49.790591516 -0600 @@ -0,0 +1,1093 @@ +/* + * eeh.c + * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#undef DEBUG + +/** Overview: + * EEH, or "Extended Error Handling" is a PCI bridge technology for + * dealing with PCI bus errors that can't be dealt with within the + * usual PCI framework, except by check-stopping the CPU. Systems + * that are designed for high-availability/reliability cannot afford + * to crash due to a "mere" PCI error, thus the need for EEH. + * An EEH-capable bridge operates by converting a detected error + * into a "slot freeze", taking the PCI adapter off-line, making + * the slot behave, from the OS's point of view, as if the slot + * were "empty": all reads return 0xff's and all writes are silently + * ignored. EEH slot isolation events can be triggered by parity + * errors on the address or data busses (e.g. during posted writes), + * which in turn might be caused by low voltage on the bus, dust, + * vibration, humidity, radioactivity or plain-old failed hardware. + * + * Note, however, that one of the leading causes of EEH slot + * freeze events is buggy device drivers, buggy device microcode, + * or buggy device hardware. This is because any attempt by the + * device to bus-master data to a memory address that is not + * assigned to the device will trigger a slot freeze. (The idea + * is to prevent devices-gone-wild from corrupting system memory). + * Buggy hardware/drivers will have a miserable time co-existing + * with EEH. + * + * Ideally, a PCI device driver, when suspecting that an isolation + * event has occurred (e.g.
by reading 0xff's), will then ask EEH + * whether this is the case, and then take appropriate steps to + * reset the PCI slot, the PCI device, and then resume operations. + * However, until that day, the checking is done here, with the + * eeh_check_failure() routine embedded in the MMIO macros. If + * the slot is found to be isolated, an "EEH Event" is synthesized + * and sent out for processing. + */ + +/* EEH event workqueue setup. */ +static DEFINE_SPINLOCK(eeh_eventlist_lock); +LIST_HEAD(eeh_eventlist); +static void eeh_event_handler(void *); +DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL); + +static struct notifier_block *eeh_notifier_chain; + +/* If a device driver keeps reading an MMIO register in an interrupt + * handler after a slot isolation event has occurred, we assume it + * is broken and panic. This sets the threshold for how many read + * attempts we allow before panicking. + */ +#define EEH_MAX_FAILS 100000 + +/* RTAS tokens */ +static int ibm_set_eeh_option; +static int ibm_set_slot_reset; +static int ibm_read_slot_reset_state; +static int ibm_read_slot_reset_state2; +static int ibm_slot_error_detail; + +static int eeh_subsystem_enabled; + +/* Lock to avoid races due to multiple reports of an error */ +static DEFINE_SPINLOCK(confirm_error_lock); + +/* Buffer for reporting slot-error-detail rtas calls */ +static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; +static DEFINE_SPINLOCK(slot_errbuf_lock); +static int eeh_error_buf_size; + +/* System monitoring statistics */ +static DEFINE_PER_CPU(unsigned long, no_device); +static DEFINE_PER_CPU(unsigned long, no_dn); +static DEFINE_PER_CPU(unsigned long, no_cfg_addr); +static DEFINE_PER_CPU(unsigned long, ignored_check); +static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); +static DEFINE_PER_CPU(unsigned long, false_positives); +static DEFINE_PER_CPU(unsigned long, ignored_failures); +static DEFINE_PER_CPU(unsigned long, slot_resets); + +/** + * The pci address cache subsystem. 
This subsystem places + * PCI device address resources into a red-black tree, sorted + * according to the address range, so that given only an i/o + * address, the corresponding PCI device can be **quickly** + * found. It is safe to perform an address lookup in an interrupt + * context; this ability is an important feature. + * + * Currently, the only customer of this code is the EEH subsystem; + * thus, this code has been somewhat tailored to suit EEH better. + * In particular, the cache does *not* hold the addresses of devices + * for which EEH is not enabled. + * + * (Implementation Note: The RB tree seems to be better/faster + * than any hash algo I could think of for this problem, even + * with the penalty of slow pointer chases for d-cache misses). + */ +struct pci_io_addr_range +{ + struct rb_node rb_node; + unsigned long addr_lo; + unsigned long addr_hi; + struct pci_dev *pcidev; + unsigned int flags; +}; + +static struct pci_io_addr_cache +{ + struct rb_root rb_root; + spinlock_t piar_lock; +} pci_io_addr_cache_root; + +static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) +{ + struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; + + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + + if (addr < piar->addr_lo) { + n = n->rb_left; + } else { + if (addr > piar->addr_hi) { + n = n->rb_right; + } else { + pci_dev_get(piar->pcidev); + return piar->pcidev; + } + } + } + + return NULL; +} + +/** + * pci_get_device_by_addr - Get device, given only address + * @addr: mmio (PIO) phys address or i/o port number + * + * Given an mmio phys address, or a port number, find a pci device + * that implements this address. Be sure to pci_dev_put the device + * when finished. I/O port numbers are assumed to be offset + * from zero (that is, they do *not* have pci_io_addr added in). + * It is safe to call this function within an interrupt. 
+ */ +static struct pci_dev *pci_get_device_by_addr(unsigned long addr) +{ + struct pci_dev *dev; + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + dev = __pci_get_device_by_addr(addr); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); + return dev; +} + +#ifdef DEBUG +/* + * Handy-dandy debug print routine, does nothing more + * than print out the contents of our addr cache. + */ +static void pci_addr_cache_print(struct pci_io_addr_cache *cache) +{ + struct rb_node *n; + int cnt = 0; + + n = rb_first(&cache->rb_root); + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", + (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt, + piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); + cnt++; + n = rb_next(n); + } +} +#endif + +/* Insert address range into the rb tree. */ +static struct pci_io_addr_range * +pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, + unsigned long ahi, unsigned int flags) +{ + struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; + struct rb_node *parent = NULL; + struct pci_io_addr_range *piar; + + /* Walk tree, find a place to insert into tree */ + while (*p) { + parent = *p; + piar = rb_entry(parent, struct pci_io_addr_range, rb_node); + if (ahi < piar->addr_lo) { + p = &parent->rb_left; + } else if (alo > piar->addr_hi) { + p = &parent->rb_right; + } else { + if (dev != piar->pcidev || + alo != piar->addr_lo || ahi != piar->addr_hi) { + printk(KERN_WARNING "PIAR: overlapping address range\n"); + } + return piar; + } + } + piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); + if (!piar) + return NULL; + + piar->addr_lo = alo; + piar->addr_hi = ahi; + piar->pcidev = dev; + piar->flags = flags; + +#ifdef DEBUG + printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif + + 
rb_link_node(&piar->rb_node, parent, p); + rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); + + return piar; +} + +static void __pci_addr_cache_insert_device(struct pci_dev *dev) +{ + struct device_node *dn; + struct pci_dn *pdn; + int i; + int inserted = 0; + + dn = pci_device_to_OF_node(dev); + if (!dn) { + printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); + return; + } + + /* Skip any devices for which EEH is not enabled. */ + pdn = PCI_DN(dn); + if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || + pdn->eeh_mode & EEH_MODE_NOCHECK) { +#ifdef DEBUG + printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", + pci_name(dev), pdn->node->full_name); +#endif + return; + } + + /* The cache holds a reference to the device... */ + pci_dev_get(dev); + + /* Walk resources on this device, poke them into the tree */ + for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { + unsigned long start = pci_resource_start(dev,i); + unsigned long end = pci_resource_end(dev,i); + unsigned int flags = pci_resource_flags(dev,i); + + /* We are interested only in bus addresses, not dma or other stuff */ + if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) + continue; + if (start == 0 || ~start == 0 || end == 0 || ~end == 0) + continue; + pci_addr_cache_insert(dev, start, end, flags); + inserted = 1; + } + + /* If there was nothing to add, the cache has no reference... */ + if (!inserted) + pci_dev_put(dev); +} + +/** + * pci_addr_cache_insert_device - Add a device to the address cache + * @dev: PCI device whose I/O addresses we are interested in. + * + * In order to support the fast lookup of devices based on addresses, + * we maintain a cache of devices that can be quickly searched. + * This routine adds a device to that cache.
+ */ +static void pci_addr_cache_insert_device(struct pci_dev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + __pci_addr_cache_insert_device(dev); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); +} + +static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) +{ + struct rb_node *n; + int removed = 0; + +restart: + n = rb_first(&pci_io_addr_cache_root.rb_root); + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + + if (piar->pcidev == dev) { + rb_erase(n, &pci_io_addr_cache_root.rb_root); + removed = 1; + kfree(piar); + goto restart; + } + n = rb_next(n); + } + + /* The cache no longer holds its reference to this device... */ + if (removed) + pci_dev_put(dev); +} + +/** + * pci_addr_cache_remove_device - remove pci device from addr cache + * @dev: device to remove + * + * Remove a device from the addr-cache tree. + * This is potentially expensive, since it will walk + * the tree multiple times (once per resource). + * But so what; device removal doesn't need to be that fast. + */ +static void pci_addr_cache_remove_device(struct pci_dev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + __pci_addr_cache_remove_device(dev); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); +} + +/** + * pci_addr_cache_build - Build a cache of I/O addresses + * + * Build a cache of pci i/o addresses. This cache will be used to + * find the pci device that corresponds to a given address. + * This routine scans all pci busses to build the cache. + * Must be run late in boot process, after the pci controllers + * have been scanned for devices (after all device resources are known).
+ */ +void __init pci_addr_cache_build(void) +{ + struct pci_dev *dev = NULL; + + if (!eeh_subsystem_enabled) + return; + + spin_lock_init(&pci_io_addr_cache_root.piar_lock); + + while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { + /* Ignore PCI bridges ( XXX why ??) */ + if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) { + continue; + } + pci_addr_cache_insert_device(dev); + } + +#ifdef DEBUG + /* Verify tree built up above, echo back the list of addrs. */ + pci_addr_cache_print(&pci_io_addr_cache_root); +#endif +} + +/* --------------------------------------------------------------- */ +/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ + +void eeh_slot_error_detail (struct pci_dn *pdn, int severity) +{ + unsigned long flags; + int rc; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), + BUID_LO(pdn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); +} + +/** + * eeh_register_notifier - Register to find out about EEH events. + * @nb: notifier block to callback on events + */ +int eeh_register_notifier(struct notifier_block *nb) +{ + return notifier_chain_register(&eeh_notifier_chain, nb); +} + +/** + * eeh_unregister_notifier - Unregister to an EEH event notifier. 
+ * @nb: notifier block to callback on events + */ +int eeh_unregister_notifier(struct notifier_block *nb) +{ + return notifier_chain_unregister(&eeh_notifier_chain, nb); +} + +/** + * read_slot_reset_state - Read the reset state of a device node's slot + * @dn: device node to read + * @rets: array to return results in + */ +static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) +{ + int token, outputs; + + if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { + token = ibm_read_slot_reset_state2; + outputs = 4; + } else { + token = ibm_read_slot_reset_state; + rets[2] = 0; /* fake PE Unavailable info */ + outputs = 3; + } + + return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); +} + +/** + * eeh_panic - call panic() for an eeh event that cannot be handled. + * The philosophy of this routine is that it is better to panic and + * halt the OS than it is to risk possible data corruption by + * oblivious device drivers that don't know better. + * + * @dev pci device that had an eeh event + * @reset_state current reset state of the device slot + */ +static void eeh_panic(struct pci_dev *dev, int reset_state) +{ + /* + * XXX We should create a separate sysctl for this. + * + * Since the panic_on_oops sysctl is used to halt the system + * in light of potential corruption, we can use it here. + */ + if (panic_on_oops) { + struct device_node *dn = pci_device_to_OF_node(dev); + eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); + panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, + pci_name(dev)); + } + else { + __get_cpu_var(ignored_failures)++; + printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", + reset_state, pci_name(dev)); + } +} + +/** + * eeh_event_handler - dispatch EEH events. The detection of a frozen + * slot can occur inside an interrupt, where it can be hard to do + * anything about it. 
The goal of this routine is to pull these + * detection events out of the context of the interrupt handler, and + * re-dispatch them for processing at a later time in a normal context. + * + * @dummy - unused + */ +static void eeh_event_handler(void *dummy) +{ + unsigned long flags; + struct eeh_event *event; + + while (1) { + spin_lock_irqsave(&eeh_eventlist_lock, flags); + event = NULL; + if (!list_empty(&eeh_eventlist)) { + event = list_entry(eeh_eventlist.next, struct eeh_event, list); + list_del(&event->list); + } + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + if (event == NULL) + break; + + printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " + "%s\n", event->reset_state, + pci_name(event->dev)); + + notifier_call_chain (&eeh_notifier_chain, + EEH_NOTIFY_FREEZE, event); + + pci_dev_put(event->dev); + kfree(event); + } +} + +/** + * eeh_token_to_phys - convert EEH address token to phys address + * @token i/o token, should be address in the form 0xA.... + */ +static inline unsigned long eeh_token_to_phys(unsigned long token) +{ + pte_t *ptep; + unsigned long pa; + + ptep = find_linux_pte(init_mm.pgd, token); + if (!ptep) + return token; + pa = pte_pfn(*ptep) << PAGE_SHIFT; + + return pa | (token & (PAGE_SIZE-1)); +} + +/** + * Return the "partitionable endpoint" (pe) under which this device lies + */ +static struct device_node * find_device_pe(struct device_node *dn) +{ + while ((dn->parent) && PCI_DN(dn->parent) && + (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { + dn = dn->parent; + } + return dn; +} + +/** Mark all devices that are peers of this device as failed. + * Mark the device driver too, so that it can see the failure + * immediately; this is critical, since some drivers poll + * status registers in interrupts ... If a driver is polling, + * and the slot is frozen, then the driver can deadlock in + * an interrupt context, which is bad. 
+ */ + +static inline void __eeh_mark_slot (struct device_node *dn) +{ + while (dn) { + PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; + + if (dn->child) + __eeh_mark_slot (dn->child); + dn = dn->sibling; + } +} + +static inline void __eeh_clear_slot (struct device_node *dn) +{ + while (dn) { + PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; + if (dn->child) + __eeh_clear_slot (dn->child); + dn = dn->sibling; + } +} + +static inline void eeh_clear_slot (struct device_node *dn) +{ + unsigned long flags; + spin_lock_irqsave(&confirm_error_lock, flags); + __eeh_clear_slot (dn); + spin_unlock_irqrestore(&confirm_error_lock, flags); +} + +/** + * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze + * @dn device node + * @dev pci device, if known + * + * Check for an EEH failure for the given device node. Call this + * routine if the result of a read was all 0xff's and you want to + * find out if this is due to an EEH slot freeze. This routine + * will query firmware for the EEH status. + * + * Returns 0 if there has not been an EEH error; otherwise returns + * a non-zero value and queues up a slot isolation event notification. + * + * It is safe to call this routine in an interrupt context. + */ +int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) +{ + int ret; + int rets[3]; + unsigned long flags; + int reset_state; + struct eeh_event *event; + struct pci_dn *pdn; + struct device_node *pe_dn; + int rc = 0; + + __get_cpu_var(total_mmio_ffs)++; + + if (!eeh_subsystem_enabled) + return 0; + + if (!dn) { + __get_cpu_var(no_dn)++; + return 0; + } + pdn = PCI_DN(dn); + + /* Access to IO BARs might get this far and still not want checking. 
*/ + if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || + pdn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; +#ifdef DEBUG + printk ("EEH:ignored check (%x) for %s %s\n", + pdn->eeh_mode, pci_name (dev), dn->full_name); +#endif + return 0; + } + + if (!pdn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; + return 0; + } + + /* If we already have a pending isolation event for this + * slot, we know it's bad already, we don't need to check. + * Do this checking under a lock; as multiple PCI devices + * in one slot might report errors simultaneously, and we + * only want one error recovery routine running. + */ + spin_lock_irqsave(&confirm_error_lock, flags); + rc = 1; + if (pdn->eeh_mode & EEH_MODE_ISOLATED) { + pdn->eeh_check_count ++; + if (pdn->eeh_check_count >= EEH_MAX_FAILS) { + printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n", + pdn->eeh_check_count); + dump_stack(); + + /* re-read the slot reset state */ + if (read_slot_reset_state(pdn, rets) != 0) + rets[0] = -1; /* reset state unknown */ + + /* If we are here, then we hit an infinite loop. Stop. */ + panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev)); + } + goto dn_unlock; + } + + /* + * Now test for an EEH failure. This is VERY expensive. + * Note that the eeh_config_addr may be a parent device + * in the case of a device behind a bridge, or it may be + * function zero of a multi-function device. + * In any case they must share a common PHB. + */ + ret = read_slot_reset_state(pdn, rets); + + /* If the call to firmware failed, punt */ + if (ret != 0) { + printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", + ret, dn->full_name); + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + /* If EEH is not supported on this device, punt. 
*/ + if (rets[1] != 1) { + printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", + ret, dn->full_name); + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + /* If not the kind of error we know about, punt. */ + if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + /* Note that config-io to empty slots may fail; + * we recognize empty because they don't have children. */ + if ((rets[0] == 5) && (dn->child == NULL)) { + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + __get_cpu_var(slot_resets)++; + + /* Avoid repeated reports of this failure, including problems + * with other functions on this device, and functions under + * bridges. */ + pe_dn = find_device_pe (dn); + __eeh_mark_slot (pe_dn); + spin_unlock_irqrestore(&confirm_error_lock, flags); + + reset_state = rets[0]; + + eeh_slot_error_detail (pdn, 1 /* Temporary Error */); + + printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", + rets[0], dn->name, dn->full_name); + event = kmalloc(sizeof(*event), GFP_ATOMIC); + if (event == NULL) { + eeh_panic(dev, reset_state); + return 1; + } + + event->dev = dev; + event->dn = dn; + event->reset_state = reset_state; + + /* We may or may not be called in an interrupt context */ + spin_lock_irqsave(&eeh_eventlist_lock, flags); + list_add(&event->list, &eeh_eventlist); + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + + /* Most EEH events are due to device driver bugs. Having + * a stack trace will help the device-driver authors figure + * out what happened. So print that out. */ + if (rets[0] != 5) dump_stack(); + schedule_work(&eeh_event_wq); + + return 1; + +dn_unlock: + spin_unlock_irqrestore(&confirm_error_lock, flags); + return rc; +} + +EXPORT_SYMBOL_GPL(eeh_dn_check_failure); + +/** + * eeh_check_failure - check if all 1's data is due to EEH slot freeze + * @token i/o token, should be address in the form 0xA.... 
+ * @val value, should be all 1's (XXX why do we need this arg??) + * + * Check for an EEH failure at the given token address. Call this + * routine if the result of a read was all 0xff's and you want to + * find out if this is due to an EEH slot freeze event. This routine + * will query firmware for the EEH status. + * + * Note this routine is safe to call in an interrupt context. + */ +unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) +{ + unsigned long addr; + struct pci_dev *dev; + struct device_node *dn; + + /* Finding the phys addr + pci device; this is pretty quick. */ + addr = eeh_token_to_phys((unsigned long __force) token); + dev = pci_get_device_by_addr(addr); + if (!dev) { + __get_cpu_var(no_device)++; + return val; + } + + dn = pci_device_to_OF_node(dev); + eeh_dn_check_failure (dn, dev); + + pci_dev_put(dev); + return val; +} + +EXPORT_SYMBOL(eeh_check_failure); + +struct eeh_early_enable_info { + unsigned int buid_hi; + unsigned int buid_lo; +}; + +/* Enable eeh for the given device node. */ +static void *early_enable_eeh(struct device_node *dn, void *data) +{ + struct eeh_early_enable_info *info = data; + int ret; + char *status = get_property(dn, "status", NULL); + u32 *class_code = (u32 *)get_property(dn, "class-code", NULL); + u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL); + u32 *device_id = (u32 *)get_property(dn, "device-id", NULL); + u32 *regs; + int enable; + struct pci_dn *pdn = PCI_DN(dn); + + pdn->eeh_mode = 0; + pdn->eeh_check_count = 0; + pdn->eeh_freeze_count = 0; + + if (status && strcmp(status, "ok") != 0) + return NULL; /* ignore devices with bad status */ + + /* Ignore bad nodes. 
*/ + if (!class_code || !vendor_id || !device_id) + return NULL; + + /* There is nothing to check on PCI to ISA bridges */ + if (dn->type && !strcmp(dn->type, "isa")) { + pdn->eeh_mode |= EEH_MODE_NOCHECK; + return NULL; + } + + /* + * Now decide if we are going to "Disable" EEH checking + * for this device. We still run with the EEH hardware active, + * but we won't be checking for ff's. This means a driver + * could return bad data (very bad!), an interrupt handler could + * hang waiting on status bits that won't change, etc. + * But there are a few cases like display devices that make sense. + */ + enable = 1; /* i.e. we will do checking */ + if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY) + enable = 0; + + if (!enable) + pdn->eeh_mode |= EEH_MODE_NOCHECK; + + /* Ok... see if this device supports EEH. Some do, some don't, + * and the only way to find out is to check each and every one. */ + regs = (u32 *)get_property(dn, "reg", NULL); + if (regs) { + /* First register entry is addr (00BBSS00) */ + /* Try to enable eeh */ + ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL, + regs[0], info->buid_hi, info->buid_lo, + EEH_ENABLE); + if (ret == 0) { + eeh_subsystem_enabled = 1; + pdn->eeh_mode |= EEH_MODE_SUPPORTED; + pdn->eeh_config_addr = regs[0]; +#ifdef DEBUG + printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name); +#endif + } else { + + /* This device doesn't support EEH, but it may have an + * EEH parent, in which case we mark it as supported. */ + if (dn->parent && PCI_DN(dn->parent) + && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { + /* Parent supports EEH. */ + pdn->eeh_mode |= EEH_MODE_SUPPORTED; + pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr; + return NULL; + } + } + } else { + printk(KERN_WARNING "EEH: %s: unable to get reg property.\n", + dn->full_name); + } + + return NULL; +} + +/* + * Initialize EEH by trying to enable it for all of the adapters in the system. 
+ * As a side effect we can determine here if eeh is supported at all. + * Note that we leave EEH on so failed config cycles won't cause a machine + * check. If a user turns off EEH for a particular adapter they are really + * telling Linux to ignore errors. Some hardware (e.g. POWER5) won't + * grant access to a slot if EEH isn't enabled, and so we always enable + * EEH for all slots/all devices. + * + * The eeh-force-off option disables EEH checking globally, for all slots. + * Even if force-off is set, the EEH hardware is still enabled, so that + * newer systems can boot. + */ +void __init eeh_init(void) +{ + struct device_node *phb, *np; + struct eeh_early_enable_info info; + + spin_lock_init(&confirm_error_lock); + spin_lock_init(&slot_errbuf_lock); + + np = of_find_node_by_path("/rtas"); + if (np == NULL) + return; + + ibm_set_eeh_option = rtas_token("ibm,set-eeh-option"); + ibm_set_slot_reset = rtas_token("ibm,set-slot-reset"); + ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2"); + ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); + ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); + + if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) + return; + + eeh_error_buf_size = rtas_token("rtas-error-log-max"); + if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) { + eeh_error_buf_size = 1024; + } + if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) { + printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated " + "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX); + eeh_error_buf_size = RTAS_ERROR_LOG_MAX; + } + + /* Enable EEH for all adapters. 
Note that eeh requires buid's */
+	for (phb = of_find_node_by_name(NULL, "pci"); phb;
+	     phb = of_find_node_by_name(phb, "pci")) {
+		unsigned long buid;
+
+		buid = get_phb_buid(phb);
+		if (buid == 0 || PCI_DN(phb) == NULL)
+			continue;
+
+		info.buid_lo = BUID_LO(buid);
+		info.buid_hi = BUID_HI(buid);
+		traverse_pci_devices(phb, early_enable_eeh, &info);
+	}
+
+	if (eeh_subsystem_enabled)
+		printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n");
+	else
+		printk(KERN_WARNING "EEH: No capable adapters found\n");
+}
+
+/**
+ * eeh_add_device_early - enable EEH for the indicated device_node
+ * @dn: device node for which to set up EEH
+ *
+ * This routine must be used to perform EEH initialization for PCI
+ * devices that were added after system boot (e.g. hotplug, dlpar).
+ * This routine must be called before any i/o is performed to the
+ * adapter (including any config-space i/o).
+ * Whether this actually enables EEH or not for this device depends
+ * on the CEC architecture, type of the device, on earlier boot
+ * command-line arguments & etc.
+ */
+void eeh_add_device_early(struct device_node *dn)
+{
+	struct pci_controller *phb;
+	struct eeh_early_enable_info info;
+
+	if (!dn || !PCI_DN(dn))
+		return;
+	phb = PCI_DN(dn)->phb;
+	if (NULL == phb || 0 == phb->buid) {
+		printk(KERN_WARNING "EEH: Expected buid but found none for %s\n",
+		       dn->full_name);
+		dump_stack();
+		return;
+	}
+
+	info.buid_hi = BUID_HI(phb->buid);
+	info.buid_lo = BUID_LO(phb->buid);
+	early_enable_eeh(dn, &info);
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_early);
+
+/**
+ * eeh_add_device_late - perform EEH initialization for the indicated pci device
+ * @dev: pci device for which to set up EEH
+ *
+ * This routine must be used to complete EEH initialization for PCI
+ * devices that were added after system boot (e.g. hotplug, dlpar).
+ */
+void eeh_add_device_late(struct pci_dev *dev)
+{
+	struct device_node *dn;
+
+	if (!dev || !eeh_subsystem_enabled)
+		return;
+
+#ifdef DEBUG
+	printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev));
+#endif
+
+	pci_dev_get (dev);
+	dn = pci_device_to_OF_node(dev);
+	PCI_DN(dn)->pcidev = dev;
+
+	pci_addr_cache_insert_device (dev);
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_late);
+
+/**
+ * eeh_remove_device - undo EEH setup for the indicated pci device
+ * @dev: pci device to be removed
+ *
+ * This routine should be called when a device is removed from a running
+ * system (e.g. by hotplug or dlpar).
+ */
+void eeh_remove_device(struct pci_dev *dev)
+{
+	struct device_node *dn;
+	if (!dev || !eeh_subsystem_enabled)
+		return;
+
+	/* Unregister the device with the EEH/PCI address search system */
+#ifdef DEBUG
+	printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev));
+#endif
+	pci_addr_cache_remove_device(dev);
+
+	dn = pci_device_to_OF_node(dev);
+	PCI_DN(dn)->pcidev = NULL;
+	pci_dev_put (dev);
+}
+EXPORT_SYMBOL_GPL(eeh_remove_device);
+
+static int proc_eeh_show(struct seq_file *m, void *v)
+{
+	unsigned int cpu;
+	unsigned long ffs = 0, positives = 0, failures = 0;
+	unsigned long resets = 0;
+	unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;
+
+	for_each_cpu(cpu) {
+		ffs += per_cpu(total_mmio_ffs, cpu);
+		positives += per_cpu(false_positives, cpu);
+		failures += per_cpu(ignored_failures, cpu);
+		resets += per_cpu(slot_resets, cpu);
+		no_dev += per_cpu(no_device, cpu);
+		no_dn += per_cpu(no_dn, cpu);
+		no_cfg += per_cpu(no_cfg_addr, cpu);
+		no_check += per_cpu(ignored_check, cpu);
+	}
+
+	if (0 == eeh_subsystem_enabled) {
+		seq_printf(m, "EEH Subsystem is globally disabled\n");
+		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
+	} else {
+		seq_printf(m, "EEH Subsystem is enabled\n");
+		seq_printf(m,
+				"no device=%ld\n"
+				"no device node=%ld\n"
+				"no config address=%ld\n"
+				"check not wanted=%ld\n"
+				"eeh_total_mmio_ffs=%ld\n"
+				"eeh_false_positives=%ld\n"
+				"eeh_ignored_failures=%ld\n"
+				"eeh_slot_resets=%ld\n",
+				no_dev, no_dn, no_cfg, no_check,
+				ffs, positives, failures, resets);
+	}
+
+	return 0;
+}
+
+static int proc_eeh_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, proc_eeh_show, NULL);
+}
+
+static struct file_operations proc_eeh_operations = {
+	.open = proc_eeh_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static int __init eeh_init_proc(void)
+{
+	struct proc_dir_entry *e;
+
+	if (systemcfg->platform & PLATFORM_PSERIES) {
+		e = create_proc_entry("ppc64/eeh", 0, NULL);
+		if (e)
+			e->proc_fops = &proc_eeh_operations;
+	}
+
+	return 0;
+}
+__initcall(eeh_init_proc);

From jesse.brandeburg at gmail.com Fri Nov 4 12:34:53 2005
From: jesse.brandeburg at gmail.com (Jesse Brandeburg)
Date: Thu, 3 Nov 2005 17:34:53 -0800
Subject: [PATCH 29/42]: ethernet: add PCI error recovery to e100 dev driver
In-Reply-To: <20051104005353.GA27074@mail.gnucash.org>
References: <20051103235918.GA25616@mail.gnucash.org>
	<20051104005353.GA27074@mail.gnucash.org>
Message-ID: <4807377b0511031734gfc23c5fm31050bc8ee47c0c5@mail.gmail.com>

On 11/3/05, Linas Vepstas wrote:
> Various PCI bus errors can be signaled by newer PCI controllers. This
> patch adds the PCI error recovery callbacks to the intel ethernet e100
> device driver. The patch has been tested, and appears to work well.
>
> Signed-off-by: Linas Vepstas
>
> --
> Index: linux-2.6.14-git3/drivers/net/e100.c

I think these patches will be great, on the pseries, but
is there not a compile option that should compile out all this code, i.e.
#ifdef PCI_ERROR_RECOVERY

if the arch doesn't support it?
Jesse

From jesse.brandeburg at gmail.com Fri Nov 4 12:51:22 2005
From: jesse.brandeburg at gmail.com (Jesse Brandeburg)
Date: Thu, 3 Nov 2005 17:51:22 -0800
Subject: [PATCH 29/42]: ethernet: add PCI error recovery to e100 dev driver
In-Reply-To: <4807377b0511031734gfc23c5fm31050bc8ee47c0c5@mail.gmail.com>
References: <20051103235918.GA25616@mail.gnucash.org>
	<20051104005353.GA27074@mail.gnucash.org>
	<4807377b0511031734gfc23c5fm31050bc8ee47c0c5@mail.gmail.com>
Message-ID: <4807377b0511031751r22301fd8q4397dd504a32fa39@mail.gmail.com>

On 11/3/05, Jesse Brandeburg wrote:
> I think these patches will be great, on the pseries, but
> is there not a compile option that should compile out all this code, i.e.
> #ifdef PCI_ERROR_RECOVERY
>
> if the arch doesn't support it?

Uh, i just saw patch 32, never mind.

From david at gibson.dropbear.id.au Fri Nov 4 14:16:10 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Fri, 4 Nov 2005 14:16:10 +1100
Subject: powerpc: Consolidate asm compatibility macros
Message-ID: <20051104031609.GA962@localhost.localdomain>

Paulus, not entirely sure if this is close to what you had in mind for
consolidating the asm macro stuff. Apply if you like...

This patch consolidates macros used to generate assembly for
compatibility across different CPUs or configs. A new header,
asm-powerpc/asm-compat.h contains the main compatibility macros. It
uses some preprocessor magic to make the macros suitable both for use
in .S files, and in inline asm in .c files. Headers (bitops.h,
uaccess.h, atomic.h, bug.h) which had their own such compatibility
macros are changed to use asm-compat.h.

ppc_asm.h is now for use in .S files *only*, and a #error enforces
that. As such, we're a lot more careless about namespace pollution
here than in asm-compat.h.

While we're at it, this patch adds a call to the PPC405_ERR77 macro in
futex.h which should have had it already, but didn't.

Built and booted on pSeries, Maple and iSeries (ARCH=powerpc).
Built for 32-bit powermac (ARCH=powerpc) and Walnut (ARCH=ppc). Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/ppc_asm.h =================================================================== --- working-2.6.orig/include/asm-powerpc/ppc_asm.h 2005-11-03 16:26:58.000000000 +1100 +++ working-2.6/include/asm-powerpc/ppc_asm.h 2005-11-04 14:04:05.000000000 +1100 @@ -6,8 +6,13 @@ #include #include +#include -#ifdef __ASSEMBLY__ +#ifndef __ASSEMBLY__ +#error __FILE__ should only be used in assembler files +#else + +#define SZL (BITS_PER_LONG/8) /* * Macros for storing registers into and loading registers from @@ -184,12 +189,6 @@ oris reg,reg,(label)@h; \ ori reg,reg,(label)@l; -/* operations for longs and pointers */ -#define LDL ld -#define STL std -#define CMPI cmpdi -#define SZL 8 - /* offsets for stack frame layout */ #define LRSAVE 16 @@ -203,12 +202,6 @@ #define OFF(name) name at l -/* operations for longs and pointers */ -#define LDL lwz -#define STL stw -#define CMPI cmpwi -#define SZL 4 - /* offsets for stack frame layout */ #define LRSAVE 4 @@ -266,15 +259,6 @@ #endif -#ifdef CONFIG_IBM405_ERR77 -#define PPC405_ERR77(ra,rb) dcbt ra, rb; -#define PPC405_ERR77_SYNC sync; -#else -#define PPC405_ERR77(ra,rb) -#define PPC405_ERR77_SYNC -#endif - - #ifdef CONFIG_IBM440EP_ERR42 #define PPC440EP_ERR42 isync #else @@ -502,17 +486,6 @@ #define N_SLINE 68 #define N_SO 100 -#define ASM_CONST(x) x -#else - #define __ASM_CONST(x) x##UL - #define ASM_CONST(x) __ASM_CONST(x) - -#ifdef CONFIG_PPC64 -#define DATAL ".llong" -#else -#define DATAL ".long" -#endif - #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_PPC_ASM_H */ Index: working-2.6/arch/powerpc/kernel/fpu.S =================================================================== --- working-2.6.orig/arch/powerpc/kernel/fpu.S 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/fpu.S 2005-11-04 14:04:05.000000000 +1100 @@ -41,7 +41,7 @@ #ifndef CONFIG_SMP LOADBASE(r3, last_task_used_math) 
toreal(r3) - LDL r4,OFF(last_task_used_math)(r3) + PPC_LD r4,OFF(last_task_used_math)(r3) CMPI 0,r4,0 beq 1f toreal(r4) @@ -49,12 +49,12 @@ SAVE_32FPRS(0, r4) mffs fr0 stfd fr0,THREAD_FPSCR(r4) - LDL r5,PT_REGS(r4) + PPC_LD r5,PT_REGS(r4) toreal(r5) - LDL r4,_MSR-STACK_FRAME_OVERHEAD(r5) + PPC_LD r4,_MSR-STACK_FRAME_OVERHEAD(r5) li r10,MSR_FP|MSR_FE0|MSR_FE1 andc r4,r4,r10 /* disable FP for previous task */ - STL r4,_MSR-STACK_FRAME_OVERHEAD(r5) + PPC_ST r4,_MSR-STACK_FRAME_OVERHEAD(r5) 1: #endif /* CONFIG_SMP */ /* enable use of FP after return */ @@ -77,7 +77,7 @@ #ifndef CONFIG_SMP subi r4,r5,THREAD fromreal(r4) - STL r4,OFF(last_task_used_math)(r3) + PPC_ST r4,OFF(last_task_used_math)(r3) #endif /* CONFIG_SMP */ /* restore registers and return */ /* we haven't used ctr or xer or lr */ @@ -100,21 +100,21 @@ CMPI 0,r3,0 beqlr- /* if no previous owner, done */ addi r3,r3,THREAD /* want THREAD of task */ - LDL r5,PT_REGS(r3) + PPC_LD r5,PT_REGS(r3) CMPI 0,r5,0 SAVE_32FPRS(0, r3) mffs fr0 stfd fr0,THREAD_FPSCR(r3) beq 1f - LDL r4,_MSR-STACK_FRAME_OVERHEAD(r5) + PPC_LD r4,_MSR-STACK_FRAME_OVERHEAD(r5) li r3,MSR_FP|MSR_FE0|MSR_FE1 andc r4,r4,r3 /* disable FP for previous task */ - STL r4,_MSR-STACK_FRAME_OVERHEAD(r5) + PPC_ST r4,_MSR-STACK_FRAME_OVERHEAD(r5) 1: #ifndef CONFIG_SMP li r5,0 LOADBASE(r4,last_task_used_math) - STL r5,OFF(last_task_used_math)(r4) + PPC_ST r5,OFF(last_task_used_math)(r4) #endif /* CONFIG_SMP */ blr Index: working-2.6/include/asm-powerpc/bitops.h =================================================================== --- working-2.6.orig/include/asm-powerpc/bitops.h 2005-11-03 16:26:58.000000000 +1100 +++ working-2.6/include/asm-powerpc/bitops.h 2005-11-04 14:04:05.000000000 +1100 @@ -40,6 +40,7 @@ #include #include +#include #include /* @@ -52,16 +53,6 @@ #define BITOP_WORD(nr) ((nr) / BITS_PER_LONG) #define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7) -#ifdef CONFIG_PPC64 -#define LARXL "ldarx" -#define STCXL "stdcx." 
-#define CNTLZL "cntlzd" -#else -#define LARXL "lwarx" -#define STCXL "stwcx." -#define CNTLZL "cntlzw" -#endif - static __inline__ void set_bit(int nr, volatile unsigned long *addr) { unsigned long old; @@ -69,10 +60,10 @@ unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); __asm__ __volatile__( -"1:" LARXL " %0,0,%3 # set_bit\n" +"1:" PPC_LARX "%0,0,%3 # set_bit\n" "or %0,%0,%2\n" PPC405_ERR77(0,%3) - STCXL " %0,0,%3\n" + PPC_STCX "%0,0,%3\n" "bne- 1b" : "=&r"(old), "=m"(*p) : "r"(mask), "r"(p), "m"(*p) @@ -86,10 +77,10 @@ unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); __asm__ __volatile__( -"1:" LARXL " %0,0,%3 # set_bit\n" +"1:" PPC_LARX "%0,0,%3 # clear_bit\n" "andc %0,%0,%2\n" PPC405_ERR77(0,%3) - STCXL " %0,0,%3\n" + PPC_STCX "%0,0,%3\n" "bne- 1b" : "=&r"(old), "=m"(*p) : "r"(mask), "r"(p), "m"(*p) @@ -103,10 +94,10 @@ unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); __asm__ __volatile__( -"1:" LARXL " %0,0,%3 # set_bit\n" +"1:" PPC_LARX "%0,0,%3 # change_bit\n" "xor %0,%0,%2\n" PPC405_ERR77(0,%3) - STCXL " %0,0,%3\n" + PPC_STCX "%0,0,%3\n" "bne- 1b" : "=&r"(old), "=m"(*p) : "r"(mask), "r"(p), "m"(*p) @@ -122,10 +113,10 @@ __asm__ __volatile__( EIEIO_ON_SMP -"1:" LARXL " %0,0,%3 # test_and_set_bit\n" +"1:" PPC_LARX "%0,0,%3 # test_and_set_bit\n" "or %1,%0,%2 \n" PPC405_ERR77(0,%3) - STCXL " %1,0,%3 \n" + PPC_STCX "%1,0,%3 \n" "bne- 1b" ISYNC_ON_SMP : "=&r" (old), "=&r" (t) @@ -144,10 +135,10 @@ __asm__ __volatile__( EIEIO_ON_SMP -"1:" LARXL " %0,0,%3 # test_and_clear_bit\n" +"1:" PPC_LARX "%0,0,%3 # test_and_clear_bit\n" "andc %1,%0,%2 \n" PPC405_ERR77(0,%3) - STCXL " %1,0,%3 \n" + PPC_STCX "%1,0,%3 \n" "bne- 1b" ISYNC_ON_SMP : "=&r" (old), "=&r" (t) @@ -166,10 +157,10 @@ __asm__ __volatile__( EIEIO_ON_SMP -"1:" LARXL " %0,0,%3 # test_and_change_bit\n" +"1:" PPC_LARX "%0,0,%3 # test_and_change_bit\n" "xor %1,%0,%2 \n" PPC405_ERR77(0,%3) - STCXL " %1,0,%3 \n" + PPC_STCX "%1,0,%3 \n" "bne- 1b" ISYNC_ON_SMP : "=&r" (old), 
"=&r" (t) @@ -184,9 +175,9 @@ unsigned long old; __asm__ __volatile__( -"1:" LARXL " %0,0,%3 # set_bit\n" +"1:" PPC_LARX "%0,0,%3 # set_bits\n" "or %0,%0,%2\n" - STCXL " %0,0,%3\n" + PPC_STCX "%0,0,%3\n" "bne- 1b" : "=&r" (old), "=m" (*addr) : "r" (mask), "r" (addr), "m" (*addr) @@ -268,7 +259,7 @@ { int lz; - asm (CNTLZL " %0,%1" : "=r" (lz) : "r" (x)); + asm (PPC_CNTLZ "%0,%1" : "=r" (lz) : "r" (x)); return BITS_PER_LONG - 1 - lz; } Index: working-2.6/include/asm-powerpc/bug.h =================================================================== --- working-2.6.orig/include/asm-powerpc/bug.h 2005-11-03 16:26:58.000000000 +1100 +++ working-2.6/include/asm-powerpc/bug.h 2005-11-04 14:04:05.000000000 +1100 @@ -1,6 +1,7 @@ #ifndef _ASM_POWERPC_BUG_H #define _ASM_POWERPC_BUG_H +#include /* * Define an illegal instr to trap on the bug. * We don't use 0 because that marks the end of a function @@ -11,14 +12,6 @@ #ifndef __ASSEMBLY__ -#ifdef __powerpc64__ -#define BUG_TABLE_ENTRY ".llong" -#define BUG_TRAP_OP "tdnei" -#else -#define BUG_TABLE_ENTRY ".long" -#define BUG_TRAP_OP "twnei" -#endif /* __powerpc64__ */ - struct bug_entry { unsigned long bug_addr; long line; @@ -40,16 +33,16 @@ __asm__ __volatile__( \ "1: twi 31,0,0\n" \ ".section __bug_table,\"a\"\n" \ - "\t"BUG_TABLE_ENTRY" 1b,%0,%1,%2\n" \ + "\t"PPC_LONG" 1b,%0,%1,%2\n" \ ".previous" \ : : "i" (__LINE__), "i" (__FILE__), "i" (__FUNCTION__)); \ } while (0) #define BUG_ON(x) do { \ __asm__ __volatile__( \ - "1: "BUG_TRAP_OP" %0,0\n" \ + "1: "PPC_TNEI" %0,0\n" \ ".section __bug_table,\"a\"\n" \ - "\t"BUG_TABLE_ENTRY" 1b,%1,%2,%3\n" \ + "\t"PPC_LONG" 1b,%1,%2,%3\n" \ ".previous" \ : : "r" ((long)(x)), "i" (__LINE__), \ "i" (__FILE__), "i" (__FUNCTION__)); \ @@ -57,9 +50,9 @@ #define WARN_ON(x) do { \ __asm__ __volatile__( \ - "1: "BUG_TRAP_OP" %0,0\n" \ + "1: "PPC_TNEI" %0,0\n" \ ".section __bug_table,\"a\"\n" \ - "\t"BUG_TABLE_ENTRY" 1b,%1,%2,%3\n" \ + "\t"PPC_LONG" 1b,%1,%2,%3\n" \ ".previous" \ : : "r" 
((long)(x)), \ "i" (__LINE__ + BUG_WARNING_TRAP), \ Index: working-2.6/include/asm-powerpc/futex.h =================================================================== --- working-2.6.orig/include/asm-powerpc/futex.h 2005-11-03 16:26:58.000000000 +1100 +++ working-2.6/include/asm-powerpc/futex.h 2005-11-04 14:04:05.000000000 +1100 @@ -7,13 +7,14 @@ #include #include #include -#include +#include #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \ __asm__ __volatile ( \ SYNC_ON_SMP \ "1: lwarx %0,0,%2\n" \ insn \ + PPC405_ERR77(0, %2) \ "2: stwcx. %1,0,%2\n" \ "bne- 1b\n" \ "li %1,0\n" \ @@ -23,7 +24,7 @@ ".previous\n" \ ".section __ex_table,\"a\"\n" \ ".align 3\n" \ - DATAL " 1b,4b,2b,4b\n" \ + PPC_LONG "1b,4b,2b,4b\n" \ ".previous" \ : "=&r" (oldval), "=&r" (ret) \ : "b" (uaddr), "i" (-EFAULT), "1" (oparg) \ Index: working-2.6/include/asm-powerpc/cputable.h =================================================================== --- working-2.6.orig/include/asm-powerpc/cputable.h 2005-10-31 15:20:22.000000000 +1100 +++ working-2.6/include/asm-powerpc/cputable.h 2005-11-04 14:04:05.000000000 +1100 @@ -2,7 +2,7 @@ #define __ASM_POWERPC_CPUTABLE_H #include -#include /* for ASM_CONST */ +#include #define PPC_FEATURE_32 0x80000000 #define PPC_FEATURE_64 0x40000000 Index: working-2.6/include/asm-ppc64/mmu.h =================================================================== --- working-2.6.orig/include/asm-ppc64/mmu.h 2005-10-31 15:20:22.000000000 +1100 +++ working-2.6/include/asm-ppc64/mmu.h 2005-11-04 14:04:05.000000000 +1100 @@ -14,7 +14,7 @@ #define _PPC64_MMU_H_ #include -#include /* for ASM_CONST */ +#include #include /* Index: working-2.6/include/asm-ppc64/page.h =================================================================== --- working-2.6.orig/include/asm-ppc64/page.h 2005-10-31 15:20:22.000000000 +1100 +++ working-2.6/include/asm-ppc64/page.h 2005-11-04 14:04:05.000000000 +1100 @@ -11,7 +11,7 @@ */ #include -#include /* for ASM_CONST */ +#include /* 
PAGE_SHIFT determines the page size */ #define PAGE_SHIFT 12 Index: working-2.6/include/asm-powerpc/asm-compat.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/asm-compat.h 2005-11-04 14:04:05.000000000 +1100 @@ -0,0 +1,55 @@ +#ifndef _ASM_POWERPC_ASM_COMPAT_H +#define _ASM_POWERPC_ASM_COMPAT_H + +#include +#include + +#ifdef __ASSEMBLY__ +# define stringify_in_c(...) __VA_ARGS__ +# define ASM_CONST(x) x +#else +/* This version of stringify will deal with commas... */ +# define __stringify_in_c(...) #__VA_ARGS__ +# define stringify_in_c(...) __stringify_in_c(__VA_ARGS__) " " +# define __ASM_CONST(x) x##UL +# define ASM_CONST(x) __ASM_CONST(x) +#endif + +#ifdef __powerpc64__ + +/* operations for longs and pointers */ +#define PPC_LD stringify_in_c(ld) +#define PPC_ST stringify_in_c(std) +#define PPC_CMPI stringify_in_c(cmpdi) +#define PPC_LONG stringify_in_c(.llong) +#define PPC_TNEI stringify_in_c(tdnei) +#define PPC_LARX stringify_in_c(ldarx) +#define PPC_STCX stringify_in_c(stdcx.) +#define PPC_CNTLZ stringify_in_c(cntlzd) + +#else /* 32-bit */ + +/* operations for longs and pointers */ +#define PPC_LD stringify_in_c(lwz) +#define PPC_ST stringify_in_c(stw) +#define PPC_CMPI stringify_in_c(cmpwi) +#define PPC_LONG stringify_in_c(.long) +#define PPC_TNEI stringify_in_c(twnei) +#define PPC_LARX stringify_in_c(lwarx) +#define PPC_STCX stringify_in_c(stwcx.) +#define PPC_CNTLZ stringify_in_c(cntlzw) + +#endif + +#ifdef CONFIG_IBM405_ERR77 +/* Erratum #77 on the 405 means we need a sync or dcbt before every + * stwcx. The old ATOMIC_SYNC_FIX covered some but not all of this. 
+ */ +#define PPC405_ERR77(ra,rb) stringify_in_c(dcbt ra, rb;) +#define PPC405_ERR77_SYNC stringify_in_c(sync;) +#else +#define PPC405_ERR77(ra,rb) +#define PPC405_ERR77_SYNC +#endif + +#endif /* _ASM_POWERPC_ASM_COMPAT_H */ Index: working-2.6/arch/powerpc/xmon/setjmp.S =================================================================== --- working-2.6.orig/arch/powerpc/xmon/setjmp.S 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/arch/powerpc/xmon/setjmp.S 2005-11-04 14:04:05.000000000 +1100 @@ -14,61 +14,61 @@ _GLOBAL(xmon_setjmp) mflr r0 - STL r0,0(r3) - STL r1,SZL(r3) - STL r2,2*SZL(r3) + PPC_ST r0,0(r3) + PPC_ST r1,SZL(r3) + PPC_ST r2,2*SZL(r3) mfcr r0 - STL r0,3*SZL(r3) - STL r13,4*SZL(r3) - STL r14,5*SZL(r3) - STL r15,6*SZL(r3) - STL r16,7*SZL(r3) - STL r17,8*SZL(r3) - STL r18,9*SZL(r3) - STL r19,10*SZL(r3) - STL r20,11*SZL(r3) - STL r21,12*SZL(r3) - STL r22,13*SZL(r3) - STL r23,14*SZL(r3) - STL r24,15*SZL(r3) - STL r25,16*SZL(r3) - STL r26,17*SZL(r3) - STL r27,18*SZL(r3) - STL r28,19*SZL(r3) - STL r29,20*SZL(r3) - STL r30,21*SZL(r3) - STL r31,22*SZL(r3) + PPC_ST r0,3*SZL(r3) + PPC_ST r13,4*SZL(r3) + PPC_ST r14,5*SZL(r3) + PPC_ST r15,6*SZL(r3) + PPC_ST r16,7*SZL(r3) + PPC_ST r17,8*SZL(r3) + PPC_ST r18,9*SZL(r3) + PPC_ST r19,10*SZL(r3) + PPC_ST r20,11*SZL(r3) + PPC_ST r21,12*SZL(r3) + PPC_ST r22,13*SZL(r3) + PPC_ST r23,14*SZL(r3) + PPC_ST r24,15*SZL(r3) + PPC_ST r25,16*SZL(r3) + PPC_ST r26,17*SZL(r3) + PPC_ST r27,18*SZL(r3) + PPC_ST r28,19*SZL(r3) + PPC_ST r29,20*SZL(r3) + PPC_ST r30,21*SZL(r3) + PPC_ST r31,22*SZL(r3) li r3,0 blr _GLOBAL(xmon_longjmp) - CMPI r4,0 + PPC_CMPI r4,0 bne 1f li r4,1 -1: LDL r13,4*SZL(r3) - LDL r14,5*SZL(r3) - LDL r15,6*SZL(r3) - LDL r16,7*SZL(r3) - LDL r17,8*SZL(r3) - LDL r18,9*SZL(r3) - LDL r19,10*SZL(r3) - LDL r20,11*SZL(r3) - LDL r21,12*SZL(r3) - LDL r22,13*SZL(r3) - LDL r23,14*SZL(r3) - LDL r24,15*SZL(r3) - LDL r25,16*SZL(r3) - LDL r26,17*SZL(r3) - LDL r27,18*SZL(r3) - LDL r28,19*SZL(r3) - LDL r29,20*SZL(r3) - LDL 
r30,21*SZL(r3) - LDL r31,22*SZL(r3) - LDL r0,3*SZL(r3) +1: PPC_LD r13,4*SZL(r3) + PPC_LD r14,5*SZL(r3) + PPC_LD r15,6*SZL(r3) + PPC_LD r16,7*SZL(r3) + PPC_LD r17,8*SZL(r3) + PPC_LD r18,9*SZL(r3) + PPC_LD r19,10*SZL(r3) + PPC_LD r20,11*SZL(r3) + PPC_LD r21,12*SZL(r3) + PPC_LD r22,13*SZL(r3) + PPC_LD r23,14*SZL(r3) + PPC_LD r24,15*SZL(r3) + PPC_LD r25,16*SZL(r3) + PPC_LD r26,17*SZL(r3) + PPC_LD r27,18*SZL(r3) + PPC_LD r28,19*SZL(r3) + PPC_LD r29,20*SZL(r3) + PPC_LD r30,21*SZL(r3) + PPC_LD r31,22*SZL(r3) + PPC_LD r0,3*SZL(r3) mtcrf 0x38,r0 - LDL r0,0(r3) - LDL r1,SZL(r3) - LDL r2,2*SZL(r3) + PPC_LD r0,0(r3) + PPC_LD r1,SZL(r3) + PPC_LD r2,2*SZL(r3) mtlr r0 mr r3,r4 blr @@ -84,52 +84,52 @@ * different ABIs, though). */ _GLOBAL(xmon_save_regs) - STL r0,0*SZL(r3) - STL r2,2*SZL(r3) - STL r3,3*SZL(r3) - STL r4,4*SZL(r3) - STL r5,5*SZL(r3) - STL r6,6*SZL(r3) - STL r7,7*SZL(r3) - STL r8,8*SZL(r3) - STL r9,9*SZL(r3) - STL r10,10*SZL(r3) - STL r11,11*SZL(r3) - STL r12,12*SZL(r3) - STL r13,13*SZL(r3) - STL r14,14*SZL(r3) - STL r15,15*SZL(r3) - STL r16,16*SZL(r3) - STL r17,17*SZL(r3) - STL r18,18*SZL(r3) - STL r19,19*SZL(r3) - STL r20,20*SZL(r3) - STL r21,21*SZL(r3) - STL r22,22*SZL(r3) - STL r23,23*SZL(r3) - STL r24,24*SZL(r3) - STL r25,25*SZL(r3) - STL r26,26*SZL(r3) - STL r27,27*SZL(r3) - STL r28,28*SZL(r3) - STL r29,29*SZL(r3) - STL r30,30*SZL(r3) - STL r31,31*SZL(r3) + PPC_ST r0,0*SZL(r3) + PPC_ST r2,2*SZL(r3) + PPC_ST r3,3*SZL(r3) + PPC_ST r4,4*SZL(r3) + PPC_ST r5,5*SZL(r3) + PPC_ST r6,6*SZL(r3) + PPC_ST r7,7*SZL(r3) + PPC_ST r8,8*SZL(r3) + PPC_ST r9,9*SZL(r3) + PPC_ST r10,10*SZL(r3) + PPC_ST r11,11*SZL(r3) + PPC_ST r12,12*SZL(r3) + PPC_ST r13,13*SZL(r3) + PPC_ST r14,14*SZL(r3) + PPC_ST r15,15*SZL(r3) + PPC_ST r16,16*SZL(r3) + PPC_ST r17,17*SZL(r3) + PPC_ST r18,18*SZL(r3) + PPC_ST r19,19*SZL(r3) + PPC_ST r20,20*SZL(r3) + PPC_ST r21,21*SZL(r3) + PPC_ST r22,22*SZL(r3) + PPC_ST r23,23*SZL(r3) + PPC_ST r24,24*SZL(r3) + PPC_ST r25,25*SZL(r3) + PPC_ST r26,26*SZL(r3) + PPC_ST 
r27,27*SZL(r3) + PPC_ST r28,28*SZL(r3) + PPC_ST r29,29*SZL(r3) + PPC_ST r30,30*SZL(r3) + PPC_ST r31,31*SZL(r3) /* go up one stack frame for SP */ - LDL r4,0(r1) - STL r4,1*SZL(r3) + PPC_LD r4,0(r1) + PPC_ST r4,1*SZL(r3) /* get caller's LR */ - LDL r0,LRSAVE(r4) - STL r0,_NIP-STACK_FRAME_OVERHEAD(r3) - STL r0,_LINK-STACK_FRAME_OVERHEAD(r3) + PPC_LD r0,LRSAVE(r4) + PPC_ST r0,_NIP-STACK_FRAME_OVERHEAD(r3) + PPC_ST r0,_LINK-STACK_FRAME_OVERHEAD(r3) mfmsr r0 - STL r0,_MSR-STACK_FRAME_OVERHEAD(r3) + PPC_ST r0,_MSR-STACK_FRAME_OVERHEAD(r3) mfctr r0 - STL r0,_CTR-STACK_FRAME_OVERHEAD(r3) + PPC_ST r0,_CTR-STACK_FRAME_OVERHEAD(r3) mfxer r0 - STL r0,_XER-STACK_FRAME_OVERHEAD(r3) + PPC_ST r0,_XER-STACK_FRAME_OVERHEAD(r3) mfcr r0 - STL r0,_CCR-STACK_FRAME_OVERHEAD(r3) + PPC_ST r0,_CCR-STACK_FRAME_OVERHEAD(r3) li r0,0 - STL r0,_TRAP-STACK_FRAME_OVERHEAD(r3) + PPC_ST r0,_TRAP-STACK_FRAME_OVERHEAD(r3) blr Index: working-2.6/include/asm-powerpc/system.h =================================================================== --- working-2.6.orig/include/asm-powerpc/system.h 2005-10-31 15:45:01.000000000 +1100 +++ working-2.6/include/asm-powerpc/system.h 2005-11-04 14:04:05.000000000 +1100 @@ -8,7 +8,6 @@ #include #include -#include #include /* Index: working-2.6/include/asm-powerpc/atomic.h =================================================================== --- working-2.6.orig/include/asm-powerpc/atomic.h 2005-10-31 15:20:22.000000000 +1100 +++ working-2.6/include/asm-powerpc/atomic.h 2005-11-04 14:04:05.000000000 +1100 @@ -9,21 +9,13 @@ #ifdef __KERNEL__ #include +#include #define ATOMIC_INIT(i) { (i) } #define atomic_read(v) ((v)->counter) #define atomic_set(v,i) (((v)->counter) = (i)) -/* Erratum #77 on the 405 means we need a sync or dcbt before every stwcx. - * The old ATOMIC_SYNC_FIX covered some but not all of this. 
- */ -#ifdef CONFIG_IBM405_ERR77 -#define PPC405_ERR77(ra,rb) "dcbt " #ra "," #rb ";" -#else -#define PPC405_ERR77(ra,rb) -#endif - static __inline__ void atomic_add(int a, atomic_t *v) { int t; Index: working-2.6/include/asm-powerpc/uaccess.h =================================================================== --- working-2.6.orig/include/asm-powerpc/uaccess.h 2005-11-03 16:26:58.000000000 +1100 +++ working-2.6/include/asm-powerpc/uaccess.h 2005-11-04 14:04:05.000000000 +1100 @@ -120,14 +120,6 @@ extern long __put_user_bad(void); -#ifdef __powerpc64__ -#define __EX_TABLE_ALIGN "3" -#define __EX_TABLE_TYPE "llong" -#else -#define __EX_TABLE_ALIGN "2" -#define __EX_TABLE_TYPE "long" -#endif - /* * We don't tell gcc that we are accessing memory, but this is OK * because we do not write to any memory gcc knows about, so there @@ -142,11 +134,12 @@ " b 2b\n" \ ".previous\n" \ ".section __ex_table,\"a\"\n" \ - " .align " __EX_TABLE_ALIGN "\n" \ - " ."__EX_TABLE_TYPE" 1b,3b\n" \ + " .balign %5\n" \ + PPC_LONG "1b,3b\n" \ ".previous" \ : "=r" (err) \ - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err)) + : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err),\ + "i"(sizeof(unsigned long))) #ifdef __powerpc64__ #define __put_user_asm2(x, ptr, retval) \ @@ -162,12 +155,13 @@ " b 3b\n" \ ".previous\n" \ ".section __ex_table,\"a\"\n" \ - " .align " __EX_TABLE_ALIGN "\n" \ - " ." __EX_TABLE_TYPE " 1b,4b\n" \ - " ." __EX_TABLE_TYPE " 2b,4b\n" \ + " .balign %5\n" \ + PPC_LONG "1b,4b\n" \ + PPC_LONG "2b,4b\n" \ ".previous" \ : "=r" (err) \ - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err)) + : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err),\ + "i"(sizeof(unsigned long))) #endif /* __powerpc64__ */ #define __put_user_size(x, ptr, size, retval) \ @@ -213,11 +207,12 @@ " b 2b\n" \ ".previous\n" \ ".section __ex_table,\"a\"\n" \ - " .align "__EX_TABLE_ALIGN "\n" \ - " ." 
__EX_TABLE_TYPE " 1b,3b\n" \ + " .balign %5\n" \ + PPC_LONG "1b,3b\n" \ ".previous" \ : "=r" (err), "=r" (x) \ - : "b" (addr), "i" (-EFAULT), "0" (err)) + : "b" (addr), "i" (-EFAULT), "0" (err), \ + "i"(sizeof(unsigned long))) #ifdef __powerpc64__ #define __get_user_asm2(x, addr, err) \ @@ -235,12 +230,13 @@ " b 3b\n" \ ".previous\n" \ ".section __ex_table,\"a\"\n" \ - " .align " __EX_TABLE_ALIGN "\n" \ - " ." __EX_TABLE_TYPE " 1b,4b\n" \ - " ." __EX_TABLE_TYPE " 2b,4b\n" \ + " .balign %5\n" \ + PPC_LONG "1b,4b\n" \ + PPC_LONG "2b,4b\n" \ ".previous" \ : "=r" (err), "=&r" (x) \ - : "b" (addr), "i" (-EFAULT), "0" (err)) + : "b" (addr), "i" (-EFAULT), "0" (err), \ + "i"(sizeof(unsigned long))) #endif /* __powerpc64__ */ #define __get_user_size(x, ptr, size, retval) \ Index: working-2.6/arch/powerpc/platforms/iseries/misc.S =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/misc.S 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/misc.S 2005-11-04 14:04:05.000000000 +1100 @@ -15,6 +15,7 @@ #include #include +#include .text Index: working-2.6/arch/ppc/boot/openfirmware/Makefile =================================================================== --- working-2.6.orig/arch/ppc/boot/openfirmware/Makefile 2005-10-25 11:59:53.000000000 +1000 +++ working-2.6/arch/ppc/boot/openfirmware/Makefile 2005-11-04 14:04:05.000000000 +1100 @@ -80,8 +80,7 @@ $(call if_changed,mknote) -$(obj)/coffcrt0.o: EXTRA_AFLAGS := -traditional -DXCOFF -$(obj)/crt0.o: EXTRA_AFLAGS := -traditional +$(obj)/coffcrt0.o: EXTRA_AFLAGS := -DXCOFF targets += coffcrt0.o crt0.o $(obj)/coffcrt0.o $(obj)/crt0.o: $(common)/crt0.S FORCE $(call if_changed_dep,as_o_S) -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/people/dgibson From geoffrey.levand at am.sony.com Fri Nov 4 14:55:19 2005 From: geoffrey.levand at am.sony.com (Geoff Levand) Date: Thu, 03 Nov 2005 19:55:19 -0800 Subject: pci_resource_end() changed problem with 2.6.14 Message-ID: <436ADBA7.7030706@am.sony.com> I found that the serial port probe code in drivers/serial/8250_pci.c no longer works properly for ppc64 in 2.6.14. It seems the value returned by pci_resource_len() on ppc64 changed from 8 to 16 since 2.6.13. I tested on a PC and pci_resource_len() returns 8 as expected. Any help on where to look for the problem would be appreciated. Here's the code that hits the problem: if (pci_resource_flags(dev, i) & IORESOURCE_IO && pci_resource_len(dev, i) == 8 && And here are some test results: 2.6.13-ppc64: --serial_pci_guess_board flags: 101h, start: 80, end: 87, len: 8 --serial_pci_guess_board found --serial_pci_guess_board flags: 101h, start: 64, end: 71, len: 8 --serial_pci_guess_board found 2.6.14-ppc64: --serial_pci_guess_board flags: 101h, start: 80, end: 95, len: 16 --serial_pci_guess_board not found --serial_pci_guess_board flags: 101h, start: 64, end: 79, len: 16 --serial_pci_guess_board not found 2.6.14-i386: --serial_pci_guess_board flags: 101h, start: 48128, end: 48135, len: 8 --serial_pci_guess_board found --serial_pci_guess_board flags: 101h, start: 46080, end: 46087, len: 8 --serial_pci_guess_board found -Geoff From mikey at neuling.org Fri Nov 4 16:02:11 2005 From: mikey at neuling.org (Michael Neuling) Date: Fri, 4 Nov 2005 16:02:11 +1100 Subject: [PATCH] HVC init race Message-ID: <20051104160211.c66d82f3.mikey@neuling.org> I've been hitting a crash on boot where tty_open is being called before the hvc console driver setup is complete. The patch below fixes this problem. Thanks to benh for his help on this.
Mikey Signed-off-by: Michael Neuling drivers/char/hvc_console.c | 32 ++++++++++++++++++-------------- 1 files changed, 18 insertions(+), 14 deletions(-) Index: linux-2.6/drivers/char/hvc_console.c =================================================================== --- linux-2.6.orig/drivers/char/hvc_console.c +++ linux-2.6/drivers/char/hvc_console.c @@ -823,34 +823,38 @@ * interfaces start to become available. */ int __init hvc_init(void) { + struct tty_driver *drv; + /* We need more than hvc_count adapters due to hotplug additions. */ - hvc_driver = alloc_tty_driver(HVC_ALLOC_TTY_ADAPTERS); - if (!hvc_driver) + drv = alloc_tty_driver(HVC_ALLOC_TTY_ADAPTERS); + if (!drv) return -ENOMEM; - hvc_driver->owner = THIS_MODULE; - hvc_driver->devfs_name = "hvc/"; - hvc_driver->driver_name = "hvc"; - hvc_driver->name = "hvc"; - hvc_driver->major = HVC_MAJOR; - hvc_driver->minor_start = HVC_MINOR; - hvc_driver->type = TTY_DRIVER_TYPE_SYSTEM; - hvc_driver->init_termios = tty_std_termios; - hvc_driver->flags = TTY_DRIVER_REAL_RAW; - tty_set_operations(hvc_driver, &hvc_ops); + drv->owner = THIS_MODULE; + drv->devfs_name = "hvc/"; + drv->driver_name = "hvc"; + drv->name = "hvc"; + drv->major = HVC_MAJOR; + drv->minor_start = HVC_MINOR; + drv->type = TTY_DRIVER_TYPE_SYSTEM; + drv->init_termios = tty_std_termios; + drv->flags = TTY_DRIVER_REAL_RAW; + tty_set_operations(drv, &hvc_ops); /* Always start the kthread because there can be hotplug vty adapters * added later. 
*/ hvc_task = kthread_run(khvcd, NULL, "khvcd"); if (IS_ERR(hvc_task)) { panic("Couldn't create kthread for console.\n"); - put_tty_driver(hvc_driver); + put_tty_driver(drv); return -EIO; } - if (tty_register_driver(hvc_driver)) + if (tty_register_driver(drv)) panic("Couldn't register hvc console driver\n"); + mb(); + hvc_driver = drv; return 0; } module_init(hvc_init); From hollis at penguinppc.org Fri Nov 4 16:10:19 2005 From: hollis at penguinppc.org (Hollis Blanchard) Date: Thu, 3 Nov 2005 23:10:19 -0600 Subject: [PATCH] HVC init race In-Reply-To: <20051104160211.c66d82f3.mikey@neuling.org> References: <20051104160211.c66d82f3.mikey@neuling.org> Message-ID: On Nov 3, 2005, at 11:02 PM, Michael Neuling wrote: > I've been hitting a crash on boot where tty_open is being called > before the > hvc console driver setup is complete. Below patch fixes this problem. What is the race exactly? I guess nothing should be calling into hvc_open before tty_register_driver()...? -Hollis > drivers/char/hvc_console.c | 32 ++++++++++++++++++-------------- > 1 files changed, 18 insertions(+), 14 deletions(-) > > Index: linux-2.6/drivers/char/hvc_console.c > =================================================================== > --- linux-2.6.orig/drivers/char/hvc_console.c > +++ linux-2.6/drivers/char/hvc_console.c > @@ -823,34 +823,38 @@ > * interfaces start to become available. */ > int __init hvc_init(void) > { > + struct tty_driver *drv; > + > /* We need more than hvc_count adapters due to hotplug additions. 
*/ > - hvc_driver = alloc_tty_driver(HVC_ALLOC_TTY_ADAPTERS); > - if (!hvc_driver) > + drv = alloc_tty_driver(HVC_ALLOC_TTY_ADAPTERS); > + if (!drv) > return -ENOMEM; > > - hvc_driver->owner = THIS_MODULE; > - hvc_driver->devfs_name = "hvc/"; > - hvc_driver->driver_name = "hvc"; > - hvc_driver->name = "hvc"; > - hvc_driver->major = HVC_MAJOR; > - hvc_driver->minor_start = HVC_MINOR; > - hvc_driver->type = TTY_DRIVER_TYPE_SYSTEM; > - hvc_driver->init_termios = tty_std_termios; > - hvc_driver->flags = TTY_DRIVER_REAL_RAW; > - tty_set_operations(hvc_driver, &hvc_ops); > + drv->owner = THIS_MODULE; > + drv->devfs_name = "hvc/"; > + drv->driver_name = "hvc"; > + drv->name = "hvc"; > + drv->major = HVC_MAJOR; > + drv->minor_start = HVC_MINOR; > + drv->type = TTY_DRIVER_TYPE_SYSTEM; > + drv->init_termios = tty_std_termios; > + drv->flags = TTY_DRIVER_REAL_RAW; > + tty_set_operations(drv, &hvc_ops); > > /* Always start the kthread because there can be hotplug vty adapters > * added later. */ > hvc_task = kthread_run(khvcd, NULL, "khvcd"); > if (IS_ERR(hvc_task)) { > panic("Couldn't create kthread for console.\n"); > - put_tty_driver(hvc_driver); > + put_tty_driver(drv); > return -EIO; > } > > - if (tty_register_driver(hvc_driver)) > + if (tty_register_driver(drv)) > panic("Couldn't register hvc console driver\n"); > > + mb(); > + hvc_driver = drv; > return 0; > } > module_init(hvc_init); > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/mailman/listinfo/linuxppc64-dev > From galak at kernel.crashing.org Fri Nov 4 16:33:14 2005 From: galak at kernel.crashing.org (Kumar Gala) Date: Thu, 3 Nov 2005 23:33:14 -0600 Subject: powerpc: Consolidate asm compatibility macros In-Reply-To: <20051104031609.GA962@localhost.localdomain> References: <20051104031609.GA962@localhost.localdomain> Message-ID: <7487F450-429B-4836-AF05-DD47B02D5BC1@kernel.crashing.org> David, I hate to be anal, but I think keeping the
'L' is useful in the macro names. PPC_LD -> PPC_LL I read 'PPC_LD' as either "PPC load" or "PPC load double", neither of which is useful. How about "PPC_LL", which I read as "PPC load long". I would propose the following names which at least follow some PPC naming convention: PPC_LL PPC_STL PPC_LLARX PPC_STLCX - kumar On Nov 3, 2005, at 9:16 PM, David Gibson wrote: > Paulus, not entirely sure if this is close to what you had in mind for > consolidating the asm macro stuff. Apply if you like... > > This patch consolidates macros used to generate assembly for > compatibility across different CPUs or configs. A new header, > asm-powerpc/asm-compat.h contains the main compatibility macros. It > uses some preprocessor magic to make the macros suitable both for use > in .S files, and in inline asm in .c files. Headers (bitops.h, > uaccess.h, atomic.h, bug.h) which had their own such compatibility > macros are changed to use asm-compat.h. > > ppc_asm.h is now for use in .S files *only*, and a #error enforces > that. As such, we're a lot more careless about namespace pollution > here than in asm-compat.h. > > While we're at it, this patch adds a call to the PPC405_ERR77 macro in > futex.h which should have had it already, but didn't. > > Built and booted on pSeries, Maple and iSeries (ARCH=powerpc). Built > for 32-bit powermac (ARCH=powerpc) and Walnut (ARCH=ppc). 
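The "preprocessor magic" mentioned in the quoted description can be sketched in plain C roughly as follows. This is only an illustration of the stringify_in_c() idea, not the header itself: the demo_ and DEMO_ names and the DEMO_ASSEMBLY switch are hypothetical stand-ins for the kernel's macros and __ASSEMBLY__, and only the 64-bit spellings are shown.

```c
#include <string.h>

/*
 * Sketch of the stringify_in_c() idea. All demo_/DEMO_ names are
 * hypothetical; DEMO_ASSEMBLY stands in for the kernel's __ASSEMBLY__.
 */
#ifdef DEMO_ASSEMBLY
/* In a .S file the mnemonic is emitted as a bare token. */
# define demo_stringify_in_c(...)   __VA_ARGS__
#else
/* In C the mnemonic becomes a string literal with a trailing space, so it
 * concatenates cleanly with the operand string that follows it. */
# define demo_stringify_inner(...)  #__VA_ARGS__
# define demo_stringify_in_c(...)   demo_stringify_inner(__VA_ARGS__) " "
#endif

/* 64-bit spellings only, to keep the sketch short. */
#define DEMO_LD    demo_stringify_in_c(ld)
#define DEMO_LARX  demo_stringify_in_c(ldarx)

/* Adjacent string literals are merged by the compiler, which is what lets
 * an inline asm template be written as: "1:" DEMO_LARX "%0,0,%3\n" */
static const char *demo_load_template(void)
{
    return DEMO_LD "r4,0(r1)";
}

static const char *demo_larx_template(void)
{
    return "1:" DEMO_LARX "%0,0,%3\n";
}

/* Returns 1 when both templates expand to the expected text. */
static int demo_templates_ok(void)
{
    return strcmp(demo_load_template(), "ld r4,0(r1)") == 0 &&
           strcmp(demo_larx_template(), "1:ldarx %0,0,%3\n") == 0;
}
```

The trailing " " is what allows the operands to be written directly after the macro in C, while the same macro name expands to a bare mnemonic when the file is assembled.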
> > Signed-off-by: David Gibson > > Index: working-2.6/include/asm-powerpc/ppc_asm.h > =================================================================== > --- working-2.6.orig/include/asm-powerpc/ppc_asm.h 2005-11-03 > 16:26:58.000000000 +1100 > +++ working-2.6/include/asm-powerpc/ppc_asm.h 2005-11-04 > 14:04:05.000000000 +1100 > @@ -6,8 +6,13 @@ > > #include > #include > +#include > > -#ifdef __ASSEMBLY__ > +#ifndef __ASSEMBLY__ > +#error __FILE__ should only be used in assembler files > +#else > + > +#define SZL (BITS_PER_LONG/8) > > /* > * Macros for storing registers into and loading registers from > @@ -184,12 +189,6 @@ > oris reg,reg,(label)@h; \ > ori reg,reg,(label)@l; > > -/* operations for longs and pointers */ > -#define LDL ld > -#define STL std > -#define CMPI cmpdi > -#define SZL 8 > - > /* offsets for stack frame layout */ > #define LRSAVE 16 > > @@ -203,12 +202,6 @@ > > #define OFF(name) name at l > > -/* operations for longs and pointers */ > -#define LDL lwz > -#define STL stw > -#define CMPI cmpwi > -#define SZL 4 > - > /* offsets for stack frame layout */ > #define LRSAVE 4 > > @@ -266,15 +259,6 @@ > #endif > > > -#ifdef CONFIG_IBM405_ERR77 > -#define PPC405_ERR77(ra,rb) dcbt ra, rb; > -#define PPC405_ERR77_SYNC sync; > -#else > -#define PPC405_ERR77(ra,rb) > -#define PPC405_ERR77_SYNC > -#endif > - > - > #ifdef CONFIG_IBM440EP_ERR42 > #define PPC440EP_ERR42 isync > #else > @@ -502,17 +486,6 @@ > #define N_SLINE 68 > #define N_SO 100 > > -#define ASM_CONST(x) x > -#else > - #define __ASM_CONST(x) x##UL > - #define ASM_CONST(x) __ASM_CONST(x) > - > -#ifdef CONFIG_PPC64 > -#define DATAL ".llong" > -#else > -#define DATAL ".long" > -#endif > - > #endif /* __ASSEMBLY__ */ > > #endif /* _ASM_POWERPC_PPC_ASM_H */ > Index: working-2.6/arch/powerpc/kernel/fpu.S > =================================================================== > --- working-2.6.orig/arch/powerpc/kernel/fpu.S 2005-10-31 > 15:20:20.000000000 +1100 > +++ 
working-2.6/arch/powerpc/kernel/fpu.S 2005-11-04 > 14:04:05.000000000 +1100 > @@ -41,7 +41,7 @@ > #ifndef CONFIG_SMP > LOADBASE(r3, last_task_used_math) > toreal(r3) > - LDL r4,OFF(last_task_used_math)(r3) > + PPC_LD r4,OFF(last_task_used_math)(r3) > CMPI 0,r4,0 > beq 1f > toreal(r4) > @@ -49,12 +49,12 @@ > SAVE_32FPRS(0, r4) > mffs fr0 > stfd fr0,THREAD_FPSCR(r4) > - LDL r5,PT_REGS(r4) > + PPC_LD r5,PT_REGS(r4) > toreal(r5) > - LDL r4,_MSR-STACK_FRAME_OVERHEAD(r5) > + PPC_LD r4,_MSR-STACK_FRAME_OVERHEAD(r5) > li r10,MSR_FP|MSR_FE0|MSR_FE1 > andc r4,r4,r10 /* disable FP for previous task */ > - STL r4,_MSR-STACK_FRAME_OVERHEAD(r5) > + PPC_ST r4,_MSR-STACK_FRAME_OVERHEAD(r5) > 1: > #endif /* CONFIG_SMP */ > /* enable use of FP after return */ > @@ -77,7 +77,7 @@ > #ifndef CONFIG_SMP > subi r4,r5,THREAD > fromreal(r4) > - STL r4,OFF(last_task_used_math)(r3) > + PPC_ST r4,OFF(last_task_used_math)(r3) > #endif /* CONFIG_SMP */ > /* restore registers and return */ > /* we haven't used ctr or xer or lr */ > @@ -100,21 +100,21 @@ > CMPI 0,r3,0 > beqlr- /* if no previous owner, done */ > addi r3,r3,THREAD /* want THREAD of task */ > - LDL r5,PT_REGS(r3) > + PPC_LD r5,PT_REGS(r3) > CMPI 0,r5,0 > SAVE_32FPRS(0, r3) > mffs fr0 > stfd fr0,THREAD_FPSCR(r3) > beq 1f > - LDL r4,_MSR-STACK_FRAME_OVERHEAD(r5) > + PPC_LD r4,_MSR-STACK_FRAME_OVERHEAD(r5) > li r3,MSR_FP|MSR_FE0|MSR_FE1 > andc r4,r4,r3 /* disable FP for previous task */ > - STL r4,_MSR-STACK_FRAME_OVERHEAD(r5) > + PPC_ST r4,_MSR-STACK_FRAME_OVERHEAD(r5) > 1: > #ifndef CONFIG_SMP > li r5,0 > LOADBASE(r4,last_task_used_math) > - STL r5,OFF(last_task_used_math)(r4) > + PPC_ST r5,OFF(last_task_used_math)(r4) > #endif /* CONFIG_SMP */ > blr > > Index: working-2.6/include/asm-powerpc/bitops.h > =================================================================== > --- working-2.6.orig/include/asm-powerpc/bitops.h 2005-11-03 > 16:26:58.000000000 +1100 > +++ working-2.6/include/asm-powerpc/bitops.h 2005-11-04 > 
14:04:05.000000000 +1100 > @@ -40,6 +40,7 @@ > > #include > #include > +#include > #include > > /* > @@ -52,16 +53,6 @@ > #define BITOP_WORD(nr) ((nr) / BITS_PER_LONG) > #define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7) > > -#ifdef CONFIG_PPC64 > -#define LARXL "ldarx" > -#define STCXL "stdcx." > -#define CNTLZL "cntlzd" > -#else > -#define LARXL "lwarx" > -#define STCXL "stwcx." > -#define CNTLZL "cntlzw" > -#endif > - > static __inline__ void set_bit(int nr, volatile unsigned long *addr) > { > unsigned long old; > @@ -69,10 +60,10 @@ > unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); > > __asm__ __volatile__( > -"1:" LARXL " %0,0,%3 # set_bit\n" > +"1:" PPC_LARX "%0,0,%3 # set_bit\n" > "or %0,%0,%2\n" > PPC405_ERR77(0,%3) > - STCXL " %0,0,%3\n" > + PPC_STCX "%0,0,%3\n" > "bne- 1b" > : "=&r"(old), "=m"(*p) > : "r"(mask), "r"(p), "m"(*p) > @@ -86,10 +77,10 @@ > unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); > > __asm__ __volatile__( > -"1:" LARXL " %0,0,%3 # set_bit\n" > +"1:" PPC_LARX "%0,0,%3 # clear_bit\n" > "andc %0,%0,%2\n" > PPC405_ERR77(0,%3) > - STCXL " %0,0,%3\n" > + PPC_STCX "%0,0,%3\n" > "bne- 1b" > : "=&r"(old), "=m"(*p) > : "r"(mask), "r"(p), "m"(*p) > @@ -103,10 +94,10 @@ > unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); > > __asm__ __volatile__( > -"1:" LARXL " %0,0,%3 # set_bit\n" > +"1:" PPC_LARX "%0,0,%3 # change_bit\n" > "xor %0,%0,%2\n" > PPC405_ERR77(0,%3) > - STCXL " %0,0,%3\n" > + PPC_STCX "%0,0,%3\n" > "bne- 1b" > : "=&r"(old), "=m"(*p) > : "r"(mask), "r"(p), "m"(*p) > @@ -122,10 +113,10 @@ > > __asm__ __volatile__( > EIEIO_ON_SMP > -"1:" LARXL " %0,0,%3 # test_and_set_bit\n" > +"1:" PPC_LARX "%0,0,%3 # test_and_set_bit\n" > "or %1,%0,%2 \n" > PPC405_ERR77(0,%3) > - STCXL " %1,0,%3 \n" > + PPC_STCX "%1,0,%3 \n" > "bne- 1b" > ISYNC_ON_SMP > : "=&r" (old), "=&r" (t) > @@ -144,10 +135,10 @@ > > __asm__ __volatile__( > EIEIO_ON_SMP > -"1:" LARXL " %0,0,%3 # test_and_clear_bit\n" > +"1:" PPC_LARX "%0,0,%3 
# test_and_clear_bit\n" > "andc %1,%0,%2 \n" > PPC405_ERR77(0,%3) > - STCXL " %1,0,%3 \n" > + PPC_STCX "%1,0,%3 \n" > "bne- 1b" > ISYNC_ON_SMP > : "=&r" (old), "=&r" (t) > @@ -166,10 +157,10 @@ > > __asm__ __volatile__( > EIEIO_ON_SMP > -"1:" LARXL " %0,0,%3 # test_and_change_bit\n" > +"1:" PPC_LARX "%0,0,%3 # test_and_change_bit\n" > "xor %1,%0,%2 \n" > PPC405_ERR77(0,%3) > - STCXL " %1,0,%3 \n" > + PPC_STCX "%1,0,%3 \n" > "bne- 1b" > ISYNC_ON_SMP > : "=&r" (old), "=&r" (t) > @@ -184,9 +175,9 @@ > unsigned long old; > > __asm__ __volatile__( > -"1:" LARXL " %0,0,%3 # set_bit\n" > +"1:" PPC_LARX "%0,0,%3 # set_bits\n" > "or %0,%0,%2\n" > - STCXL " %0,0,%3\n" > + PPC_STCX "%0,0,%3\n" > "bne- 1b" > : "=&r" (old), "=m" (*addr) > : "r" (mask), "r" (addr), "m" (*addr) > @@ -268,7 +259,7 @@ > { > int lz; > > - asm (CNTLZL " %0,%1" : "=r" (lz) : "r" (x)); > + asm (PPC_CNTLZ "%0,%1" : "=r" (lz) : "r" (x)); > return BITS_PER_LONG - 1 - lz; > } > > Index: working-2.6/include/asm-powerpc/bug.h > =================================================================== > --- working-2.6.orig/include/asm-powerpc/bug.h 2005-11-03 > 16:26:58.000000000 +1100 > +++ working-2.6/include/asm-powerpc/bug.h 2005-11-04 > 14:04:05.000000000 +1100 > @@ -1,6 +1,7 @@ > #ifndef _ASM_POWERPC_BUG_H > #define _ASM_POWERPC_BUG_H > > +#include > /* > * Define an illegal instr to trap on the bug. 
> * We don't use 0 because that marks the end of a function > @@ -11,14 +12,6 @@ > > #ifndef __ASSEMBLY__ > > -#ifdef __powerpc64__ > -#define BUG_TABLE_ENTRY ".llong" > -#define BUG_TRAP_OP "tdnei" > -#else > -#define BUG_TABLE_ENTRY ".long" > -#define BUG_TRAP_OP "twnei" > -#endif /* __powerpc64__ */ > - > struct bug_entry { > unsigned long bug_addr; > long line; > @@ -40,16 +33,16 @@ > __asm__ __volatile__( \ > "1: twi 31,0,0\n" \ > ".section __bug_table,\"a\"\n" \ > - "\t"BUG_TABLE_ENTRY" 1b,%0,%1,%2\n" \ > + "\t"PPC_LONG" 1b,%0,%1,%2\n" \ > ".previous" \ > : : "i" (__LINE__), "i" (__FILE__), "i" (__FUNCTION__)); \ > } while (0) > > #define BUG_ON(x) do { \ > __asm__ __volatile__( \ > - "1: "BUG_TRAP_OP" %0,0\n" \ > + "1: "PPC_TNEI" %0,0\n" \ > ".section __bug_table,\"a\"\n" \ > - "\t"BUG_TABLE_ENTRY" 1b,%1,%2,%3\n" \ > + "\t"PPC_LONG" 1b,%1,%2,%3\n" \ > ".previous" \ > : : "r" ((long)(x)), "i" (__LINE__), \ > "i" (__FILE__), "i" (__FUNCTION__)); \ > @@ -57,9 +50,9 @@ > > #define WARN_ON(x) do { \ > __asm__ __volatile__( \ > - "1: "BUG_TRAP_OP" %0,0\n" \ > + "1: "PPC_TNEI" %0,0\n" \ > ".section __bug_table,\"a\"\n" \ > - "\t"BUG_TABLE_ENTRY" 1b,%1,%2,%3\n" \ > + "\t"PPC_LONG" 1b,%1,%2,%3\n" \ > ".previous" \ > : : "r" ((long)(x)), \ > "i" (__LINE__ + BUG_WARNING_TRAP), \ > Index: working-2.6/include/asm-powerpc/futex.h > =================================================================== > --- working-2.6.orig/include/asm-powerpc/futex.h 2005-11-03 > 16:26:58.000000000 +1100 > +++ working-2.6/include/asm-powerpc/futex.h 2005-11-04 > 14:04:05.000000000 +1100 > @@ -7,13 +7,14 @@ > #include > #include > #include > -#include > +#include > > #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \ > __asm__ __volatile ( \ > SYNC_ON_SMP \ > "1: lwarx %0,0,%2\n" \ > insn \ > + PPC405_ERR77(0, %2) \ > "2: stwcx. 
%1,0,%2\n" \ > "bne- 1b\n" \ > "li %1,0\n" \ > @@ -23,7 +24,7 @@ > ".previous\n" \ > ".section __ex_table,\"a\"\n" \ > ".align 3\n" \ > - DATAL " 1b,4b,2b,4b\n" \ > + PPC_LONG "1b,4b,2b,4b\n" \ > ".previous" \ > : "=&r" (oldval), "=&r" (ret) \ > : "b" (uaddr), "i" (-EFAULT), "1" (oparg) \ > Index: working-2.6/include/asm-powerpc/cputable.h > =================================================================== > --- working-2.6.orig/include/asm-powerpc/cputable.h 2005-10-31 > 15:20:22.000000000 +1100 > +++ working-2.6/include/asm-powerpc/cputable.h 2005-11-04 > 14:04:05.000000000 +1100 > @@ -2,7 +2,7 @@ > #define __ASM_POWERPC_CPUTABLE_H > > #include > -#include /* for ASM_CONST */ > +#include > > #define PPC_FEATURE_32 0x80000000 > #define PPC_FEATURE_64 0x40000000 > Index: working-2.6/include/asm-ppc64/mmu.h > =================================================================== > --- working-2.6.orig/include/asm-ppc64/mmu.h 2005-10-31 > 15:20:22.000000000 +1100 > +++ working-2.6/include/asm-ppc64/mmu.h 2005-11-04 > 14:04:05.000000000 +1100 > @@ -14,7 +14,7 @@ > #define _PPC64_MMU_H_ > > #include > -#include /* for ASM_CONST */ > +#include > #include > > /* > Index: working-2.6/include/asm-ppc64/page.h > =================================================================== > --- working-2.6.orig/include/asm-ppc64/page.h 2005-10-31 > 15:20:22.000000000 +1100 > +++ working-2.6/include/asm-ppc64/page.h 2005-11-04 > 14:04:05.000000000 +1100 > @@ -11,7 +11,7 @@ > */ > > #include > -#include /* for ASM_CONST */ > +#include > > /* PAGE_SHIFT determines the page size */ > #define PAGE_SHIFT 12 > Index: working-2.6/include/asm-powerpc/asm-compat.h > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ working-2.6/include/asm-powerpc/asm-compat.h 2005-11-04 > 14:04:05.000000000 +1100 > @@ -0,0 +1,55 @@ > +#ifndef _ASM_POWERPC_ASM_COMPAT_H > +#define _ASM_POWERPC_ASM_COMPAT_H > + > +#include > +#include > + 
> +#ifdef __ASSEMBLY__ > +# define stringify_in_c(...) __VA_ARGS__ > +# define ASM_CONST(x) x > +#else > +/* This version of stringify will deal with commas... */ > +# define __stringify_in_c(...) #__VA_ARGS__ > +# define stringify_in_c(...) __stringify_in_c(__VA_ARGS__) " " > +# define __ASM_CONST(x) x##UL > +# define ASM_CONST(x) __ASM_CONST(x) > +#endif > + > +#ifdef __powerpc64__ > + > +/* operations for longs and pointers */ > +#define PPC_LD stringify_in_c(ld) > +#define PPC_ST stringify_in_c(std) > +#define PPC_CMPI stringify_in_c(cmpdi) > +#define PPC_LONG stringify_in_c(.llong) > +#define PPC_TNEI stringify_in_c(tdnei) > +#define PPC_LARX stringify_in_c(ldarx) > +#define PPC_STCX stringify_in_c(stdcx.) > +#define PPC_CNTLZ stringify_in_c(cntlzd) > + > +#else /* 32-bit */ > + > +/* operations for longs and pointers */ > +#define PPC_LD stringify_in_c(lwz) > +#define PPC_ST stringify_in_c(stw) > +#define PPC_CMPI stringify_in_c(cmpwi) > +#define PPC_LONG stringify_in_c(.long) > +#define PPC_TNEI stringify_in_c(twnei) > +#define PPC_LARX stringify_in_c(lwarx) > +#define PPC_STCX stringify_in_c(stwcx.) > +#define PPC_CNTLZ stringify_in_c(cntlzw) > + > +#endif > + > +#ifdef CONFIG_IBM405_ERR77 > +/* Erratum #77 on the 405 means we need a sync or dcbt before every > + * stwcx. The old ATOMIC_SYNC_FIX covered some but not all of this. 
> + */ > +#define PPC405_ERR77(ra,rb) stringify_in_c(dcbt ra, rb;) > +#define PPC405_ERR77_SYNC stringify_in_c(sync;) > +#else > +#define PPC405_ERR77(ra,rb) > +#define PPC405_ERR77_SYNC > +#endif > + > +#endif /* _ASM_POWERPC_ASM_COMPAT_H */ > Index: working-2.6/arch/powerpc/xmon/setjmp.S > =================================================================== > --- working-2.6.orig/arch/powerpc/xmon/setjmp.S 2005-10-31 > 15:20:57.000000000 +1100 > +++ working-2.6/arch/powerpc/xmon/setjmp.S 2005-11-04 > 14:04:05.000000000 +1100 > @@ -14,61 +14,61 @@ > > _GLOBAL(xmon_setjmp) > mflr r0 > - STL r0,0(r3) > - STL r1,SZL(r3) > - STL r2,2*SZL(r3) > + PPC_ST r0,0(r3) > + PPC_ST r1,SZL(r3) > + PPC_ST r2,2*SZL(r3) > mfcr r0 > - STL r0,3*SZL(r3) > - STL r13,4*SZL(r3) > - STL r14,5*SZL(r3) > - STL r15,6*SZL(r3) > - STL r16,7*SZL(r3) > - STL r17,8*SZL(r3) > - STL r18,9*SZL(r3) > - STL r19,10*SZL(r3) > - STL r20,11*SZL(r3) > - STL r21,12*SZL(r3) > - STL r22,13*SZL(r3) > - STL r23,14*SZL(r3) > - STL r24,15*SZL(r3) > - STL r25,16*SZL(r3) > - STL r26,17*SZL(r3) > - STL r27,18*SZL(r3) > - STL r28,19*SZL(r3) > - STL r29,20*SZL(r3) > - STL r30,21*SZL(r3) > - STL r31,22*SZL(r3) > + PPC_ST r0,3*SZL(r3) > + PPC_ST r13,4*SZL(r3) > + PPC_ST r14,5*SZL(r3) > + PPC_ST r15,6*SZL(r3) > + PPC_ST r16,7*SZL(r3) > + PPC_ST r17,8*SZL(r3) > + PPC_ST r18,9*SZL(r3) > + PPC_ST r19,10*SZL(r3) > + PPC_ST r20,11*SZL(r3) > + PPC_ST r21,12*SZL(r3) > + PPC_ST r22,13*SZL(r3) > + PPC_ST r23,14*SZL(r3) > + PPC_ST r24,15*SZL(r3) > + PPC_ST r25,16*SZL(r3) > + PPC_ST r26,17*SZL(r3) > + PPC_ST r27,18*SZL(r3) > + PPC_ST r28,19*SZL(r3) > + PPC_ST r29,20*SZL(r3) > + PPC_ST r30,21*SZL(r3) > + PPC_ST r31,22*SZL(r3) > li r3,0 > blr > > _GLOBAL(xmon_longjmp) > - CMPI r4,0 > + PPC_CMPI r4,0 > bne 1f > li r4,1 > -1: LDL r13,4*SZL(r3) > - LDL r14,5*SZL(r3) > - LDL r15,6*SZL(r3) > - LDL r16,7*SZL(r3) > - LDL r17,8*SZL(r3) > - LDL r18,9*SZL(r3) > - LDL r19,10*SZL(r3) > - LDL r20,11*SZL(r3) > - LDL r21,12*SZL(r3) > - LDL 
r22,13*SZL(r3) > - LDL r23,14*SZL(r3) > - LDL r24,15*SZL(r3) > - LDL r25,16*SZL(r3) > - LDL r26,17*SZL(r3) > - LDL r27,18*SZL(r3) > - LDL r28,19*SZL(r3) > - LDL r29,20*SZL(r3) > - LDL r30,21*SZL(r3) > - LDL r31,22*SZL(r3) > - LDL r0,3*SZL(r3) > +1: PPC_LD r13,4*SZL(r3) > + PPC_LD r14,5*SZL(r3) > + PPC_LD r15,6*SZL(r3) > + PPC_LD r16,7*SZL(r3) > + PPC_LD r17,8*SZL(r3) > + PPC_LD r18,9*SZL(r3) > + PPC_LD r19,10*SZL(r3) > + PPC_LD r20,11*SZL(r3) > + PPC_LD r21,12*SZL(r3) > + PPC_LD r22,13*SZL(r3) > + PPC_LD r23,14*SZL(r3) > + PPC_LD r24,15*SZL(r3) > + PPC_LD r25,16*SZL(r3) > + PPC_LD r26,17*SZL(r3) > + PPC_LD r27,18*SZL(r3) > + PPC_LD r28,19*SZL(r3) > + PPC_LD r29,20*SZL(r3) > + PPC_LD r30,21*SZL(r3) > + PPC_LD r31,22*SZL(r3) > + PPC_LD r0,3*SZL(r3) > mtcrf 0x38,r0 > - LDL r0,0(r3) > - LDL r1,SZL(r3) > - LDL r2,2*SZL(r3) > + PPC_LD r0,0(r3) > + PPC_LD r1,SZL(r3) > + PPC_LD r2,2*SZL(r3) > mtlr r0 > mr r3,r4 > blr > @@ -84,52 +84,52 @@ > * different ABIs, though). > */ > _GLOBAL(xmon_save_regs) > - STL r0,0*SZL(r3) > - STL r2,2*SZL(r3) > - STL r3,3*SZL(r3) > - STL r4,4*SZL(r3) > - STL r5,5*SZL(r3) > - STL r6,6*SZL(r3) > - STL r7,7*SZL(r3) > - STL r8,8*SZL(r3) > - STL r9,9*SZL(r3) > - STL r10,10*SZL(r3) > - STL r11,11*SZL(r3) > - STL r12,12*SZL(r3) > - STL r13,13*SZL(r3) > - STL r14,14*SZL(r3) > - STL r15,15*SZL(r3) > - STL r16,16*SZL(r3) > - STL r17,17*SZL(r3) > - STL r18,18*SZL(r3) > - STL r19,19*SZL(r3) > - STL r20,20*SZL(r3) > - STL r21,21*SZL(r3) > - STL r22,22*SZL(r3) > - STL r23,23*SZL(r3) > - STL r24,24*SZL(r3) > - STL r25,25*SZL(r3) > - STL r26,26*SZL(r3) > - STL r27,27*SZL(r3) > - STL r28,28*SZL(r3) > - STL r29,29*SZL(r3) > - STL r30,30*SZL(r3) > - STL r31,31*SZL(r3) > + PPC_ST r0,0*SZL(r3) > + PPC_ST r2,2*SZL(r3) > + PPC_ST r3,3*SZL(r3) > + PPC_ST r4,4*SZL(r3) > + PPC_ST r5,5*SZL(r3) > + PPC_ST r6,6*SZL(r3) > + PPC_ST r7,7*SZL(r3) > + PPC_ST r8,8*SZL(r3) > + PPC_ST r9,9*SZL(r3) > + PPC_ST r10,10*SZL(r3) > + PPC_ST r11,11*SZL(r3) > + PPC_ST r12,12*SZL(r3) > + 
PPC_ST r13,13*SZL(r3) > + PPC_ST r14,14*SZL(r3) > + PPC_ST r15,15*SZL(r3) > + PPC_ST r16,16*SZL(r3) > + PPC_ST r17,17*SZL(r3) > + PPC_ST r18,18*SZL(r3) > + PPC_ST r19,19*SZL(r3) > + PPC_ST r20,20*SZL(r3) > + PPC_ST r21,21*SZL(r3) > + PPC_ST r22,22*SZL(r3) > + PPC_ST r23,23*SZL(r3) > + PPC_ST r24,24*SZL(r3) > + PPC_ST r25,25*SZL(r3) > + PPC_ST r26,26*SZL(r3) > + PPC_ST r27,27*SZL(r3) > + PPC_ST r28,28*SZL(r3) > + PPC_ST r29,29*SZL(r3) > + PPC_ST r30,30*SZL(r3) > + PPC_ST r31,31*SZL(r3) > /* go up one stack frame for SP */ > - LDL r4,0(r1) > - STL r4,1*SZL(r3) > + PPC_LD r4,0(r1) > + PPC_ST r4,1*SZL(r3) > /* get caller's LR */ > - LDL r0,LRSAVE(r4) > - STL r0,_NIP-STACK_FRAME_OVERHEAD(r3) > - STL r0,_LINK-STACK_FRAME_OVERHEAD(r3) > + PPC_LD r0,LRSAVE(r4) > + PPC_ST r0,_NIP-STACK_FRAME_OVERHEAD(r3) > + PPC_ST r0,_LINK-STACK_FRAME_OVERHEAD(r3) > mfmsr r0 > - STL r0,_MSR-STACK_FRAME_OVERHEAD(r3) > + PPC_ST r0,_MSR-STACK_FRAME_OVERHEAD(r3) > mfctr r0 > - STL r0,_CTR-STACK_FRAME_OVERHEAD(r3) > + PPC_ST r0,_CTR-STACK_FRAME_OVERHEAD(r3) > mfxer r0 > - STL r0,_XER-STACK_FRAME_OVERHEAD(r3) > + PPC_ST r0,_XER-STACK_FRAME_OVERHEAD(r3) > mfcr r0 > - STL r0,_CCR-STACK_FRAME_OVERHEAD(r3) > + PPC_ST r0,_CCR-STACK_FRAME_OVERHEAD(r3) > li r0,0 > - STL r0,_TRAP-STACK_FRAME_OVERHEAD(r3) > + PPC_ST r0,_TRAP-STACK_FRAME_OVERHEAD(r3) > blr > Index: working-2.6/include/asm-powerpc/system.h > =================================================================== > --- working-2.6.orig/include/asm-powerpc/system.h 2005-10-31 > 15:45:01.000000000 +1100 > +++ working-2.6/include/asm-powerpc/system.h 2005-11-04 > 14:04:05.000000000 +1100 > @@ -8,7 +8,6 @@ > #include > > #include > -#include > #include > > /* > Index: working-2.6/include/asm-powerpc/atomic.h > =================================================================== > --- working-2.6.orig/include/asm-powerpc/atomic.h 2005-10-31 > 15:20:22.000000000 +1100 > +++ working-2.6/include/asm-powerpc/atomic.h 2005-11-04 > 14:04:05.000000000 +1100 
> @@ -9,21 +9,13 @@ > > #ifdef __KERNEL__ > #include > +#include > > #define ATOMIC_INIT(i) { (i) } > > #define atomic_read(v) ((v)->counter) > #define atomic_set(v,i) (((v)->counter) = (i)) > > -/* Erratum #77 on the 405 means we need a sync or dcbt before > every stwcx. > - * The old ATOMIC_SYNC_FIX covered some but not all of this. > - */ > -#ifdef CONFIG_IBM405_ERR77 > -#define PPC405_ERR77(ra,rb) "dcbt " #ra "," #rb ";" > -#else > -#define PPC405_ERR77(ra,rb) > -#endif > - > static __inline__ void atomic_add(int a, atomic_t *v) > { > int t; > Index: working-2.6/include/asm-powerpc/uaccess.h > =================================================================== > --- working-2.6.orig/include/asm-powerpc/uaccess.h 2005-11-03 > 16:26:58.000000000 +1100 > +++ working-2.6/include/asm-powerpc/uaccess.h 2005-11-04 > 14:04:05.000000000 +1100 > @@ -120,14 +120,6 @@ > > extern long __put_user_bad(void); > > -#ifdef __powerpc64__ > -#define __EX_TABLE_ALIGN "3" > -#define __EX_TABLE_TYPE "llong" > -#else > -#define __EX_TABLE_ALIGN "2" > -#define __EX_TABLE_TYPE "long" > -#endif > - > /* > * We don't tell gcc that we are accessing memory, but this is OK > * because we do not write to any memory gcc knows about, so there > @@ -142,11 +134,12 @@ > " b 2b\n" \ > ".previous\n" \ > ".section __ex_table,\"a\"\n" \ > - " .align " __EX_TABLE_ALIGN "\n" \ > - " ."__EX_TABLE_TYPE" 1b,3b\n" \ > + " .balign %5\n" \ > + PPC_LONG "1b,3b\n" \ > ".previous" \ > : "=r" (err) \ > - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err)) > + : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err),\ > + "i"(sizeof(unsigned long))) > > #ifdef __powerpc64__ > #define __put_user_asm2(x, ptr, retval) \ > @@ -162,12 +155,13 @@ > " b 3b\n" \ > ".previous\n" \ > ".section __ex_table,\"a\"\n" \ > - " .align " __EX_TABLE_ALIGN "\n" \ > - " ." __EX_TABLE_TYPE " 1b,4b\n" \ > - " ." 
__EX_TABLE_TYPE " 2b,4b\n" \ > + " .balign %5\n" \ > + PPC_LONG "1b,4b\n" \ > + PPC_LONG "2b,4b\n" \ > ".previous" \ > : "=r" (err) \ > - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err)) > + : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err),\ > + "i"(sizeof(unsigned long))) > #endif /* __powerpc64__ */ > > #define __put_user_size(x, ptr, size, retval) \ > @@ -213,11 +207,12 @@ > " b 2b\n" \ > ".previous\n" \ > ".section __ex_table,\"a\"\n" \ > - " .align "__EX_TABLE_ALIGN "\n" \ > - " ." __EX_TABLE_TYPE " 1b,3b\n" \ > + " .balign %5\n" \ > + PPC_LONG "1b,3b\n" \ > ".previous" \ > : "=r" (err), "=r" (x) \ > - : "b" (addr), "i" (-EFAULT), "0" (err)) > + : "b" (addr), "i" (-EFAULT), "0" (err), \ > + "i"(sizeof(unsigned long))) > > #ifdef __powerpc64__ > #define __get_user_asm2(x, addr, err) \ > @@ -235,12 +230,13 @@ > " b 3b\n" \ > ".previous\n" \ > ".section __ex_table,\"a\"\n" \ > - " .align " __EX_TABLE_ALIGN "\n" \ > - " ." __EX_TABLE_TYPE " 1b,4b\n" \ > - " ." __EX_TABLE_TYPE " 2b,4b\n" \ > + " .balign %5\n" \ > + PPC_LONG "1b,4b\n" \ > + PPC_LONG "2b,4b\n" \ > ".previous" \ > : "=r" (err), "=&r" (x) \ > - : "b" (addr), "i" (-EFAULT), "0" (err)) > + : "b" (addr), "i" (-EFAULT), "0" (err), \ > + "i"(sizeof(unsigned long))) > #endif /* __powerpc64__ */ > > #define __get_user_size(x, ptr, size, retval) \ > Index: working-2.6/arch/powerpc/platforms/iseries/misc.S > =================================================================== > --- working-2.6.orig/arch/powerpc/platforms/iseries/misc.S > 2005-10-31 15:20:20.000000000 +1100 > +++ working-2.6/arch/powerpc/platforms/iseries/misc.S 2005-11-04 > 14:04:05.000000000 +1100 > @@ -15,6 +15,7 @@ > > #include > #include > +#include > > .text > > Index: working-2.6/arch/ppc/boot/openfirmware/Makefile > =================================================================== > --- working-2.6.orig/arch/ppc/boot/openfirmware/Makefile 2005-10-25 > 11:59:53.000000000 +1000 > +++ working-2.6/arch/ppc/boot/openfirmware/Makefile 
2005-11-04 > 14:04:05.000000000 +1100 > @@ -80,8 +80,7 @@ > $(call if_changed,mknote) > > > -$(obj)/coffcrt0.o: EXTRA_AFLAGS := -traditional -DXCOFF > -$(obj)/crt0.o: EXTRA_AFLAGS := -traditional > +$(obj)/coffcrt0.o: EXTRA_AFLAGS := -DXCOFF > targets += coffcrt0.o crt0.o > $(obj)/coffcrt0.o $(obj)/crt0.o: $(common)/crt0.S FORCE > $(call if_changed_dep,as_o_S) > > -- > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ > _other_ > | _way_ _around_! > http://www.ozlabs.org/people/dgibson > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/mailman/listinfo/linuxppc64-dev From benh at kernel.crashing.org Fri Nov 4 17:56:09 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 04 Nov 2005 17:56:09 +1100 Subject: pci_resource_end() changed problem with 2.6.14 In-Reply-To: <436ADBA7.7030706@am.sony.com> References: <436ADBA7.7030706@am.sony.com> Message-ID: <1131087370.4680.238.camel@gaston> On Thu, 2005-11-03 at 19:55 -0800, Geoff Levand wrote: > I found that the serial port probe code in drivers/serial/8250_pci.c > no longer works properly for ppc64 in 2.6.14. It seems the value > returned by pci_resource_len() on ppc64 changed from 8 to 16 since > 2.6.13. I tested on a PC and pci_resource_len() returns 8 as > expected. > > Any help on on where to look for the problem would be appreciated. > > Here's the code that hits the problem: > > if (pci_resource_flags(dev, i) & IORESOURCE_IO && > pci_resource_len(dev, i) == 8 && > > And here are some test results: Interesting... What does an lspci -vv shows for the BARs of the PCI card ? Also, what do you have in /proc/device-tree ? What is the machine precisely ? 2.6.14 now uses the OF device-tree to generate the linux PCI tree instead of going directly to PCI probing. 
It's possible that this is causing your problem if, for some reason, the BAR sizing done by OF ends up being different from what the kernel does ... Ben. From benh at kernel.crashing.org Fri Nov 4 17:58:58 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 04 Nov 2005 17:58:58 +1100 Subject: [PATCH] HVC init race In-Reply-To: References: <20051104160211.c66d82f3.mikey@neuling.org> Message-ID: <1131087539.4680.242.camel@gaston> On Thu, 2005-11-03 at 23:10 -0600, Hollis Blanchard wrote: > On Nov 3, 2005, at 11:02 PM, Michael Neuling wrote: > > > I've been hitting a crash on boot where tty_open is being called > > before the > > hvc console driver setup is complete. Below patch fixes this problem. > > What is the race exactly? I guess nothing should be calling into > hvc_open before tty_register_driver()...? Well, in addition to the total lack of locking of the tty code you mean ? :) Seriously, nothing protects tty_register_driver vs. concurrent tty_open()... The race we are hitting here is a bit different though: tty_open(), when asked for the default console driver, will first "ask" the kernel console if it has a matching tty driver. It does that by calling the struct console ->device() callback. The hvc implementation of this callback returns the global hvc_driver. However, we fill that global before initializing its content or registering the tty... kaboom. Ben. From mikey at neuling.org Fri Nov 4 18:02:31 2005 From: mikey at neuling.org (Michael Neuling) Date: Fri, 4 Nov 2005 18:02:31 +1100 Subject: [PATCH] HVC init race In-Reply-To: References: <20051104160211.c66d82f3.mikey@neuling.org> Message-ID: <20051104180231.ca288f01.mikey@neuling.org> > What is the race exactly? I guess nothing should be calling into > hvc_open before tty_register_driver()...? init_dev (from tty_io.c) seems to be racing with hvc_init (from hvc_console.c).
Hence we can hit this section at the start if init_dev: if (driver->flags & TTY_DRIVER_DEVPTS_MEM) { tty = devpts_get_tty(idx); if (tty && driver->subtype == PTY_TYPE_MASTER) tty = tty->link; } else { tty = driver->ttys[idx]; /* Crashing here */ } and driver->flags is good but driver->ttys[idx] is not inited yet. ---- cpu 0x5: Vector: 300 (Data Access) at [c000000039f23690] pc: c0000000002a1dc8: .init_dev+0x158/0x760 lr: c0000000002a1db8: .init_dev+0x148/0x760 sp: c000000039f23910 msr: 9000000000009032 dar: 0 dsisr: 40000000 current = 0xc00000003a4147e0 paca = 0xc0000000005d7800 pid = 17150, comm = modprobe enter ? for help 5:mon> t [c000000039f239f0] c0000000002a2564 .tty_open+0x194/0x480 [c000000039f23ab0] c0000000000ce994 .chrdev_open+0x154/0x2a0 [c000000039f23b60] c0000000000bf280 .__dentry_open+0x140/0x3d0 [c000000039f23c10] c0000000000bf664 .filp_open+0x64/0x80 [c000000039f23d00] c0000000000bf868 .do_sys_open+0x68/0x120 [c000000039f23db0] c0000000000fd3e0 .compat_sys_open+0x10/0x30 [c000000039f23e30] c000000000008900 syscall_exit+0x0/0x18 --- Exception: c01 (System Call) at 000000000ff69810 SP (ff918950) is in userspace 5:mon> Mikey From arnd at arndb.de Sat Nov 5 02:31:16 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 4 Nov 2005 16:31:16 +0100 Subject: [PATCH] powerpc: mem_init crash for sparsemem Message-ID: <200511041631.17237.arnd@arndb.de> I have a Cell blade with some broken memory in the middle of the physical address space and this is correctly detected by the firmware, but not relocated. When I enable CONFIG_SPARSEMEM, the memsections for the nonexistant address space do not get struct page entries allocated, as expected. However, mem_init for the non-NUMA configuration tries to access these pages without first looking if they are there. I'm currently using the hack below to work around that, but I have the feeling that there should be a cleaner solution for this. Please comment. 
Signed-off-by: Arnd Bergmann --- linux-2.6.15-rc.orig/arch/powerpc/mm/mem.c +++ linux-2.6.15-rc/arch/powerpc/mm/mem.c @@ -348,6 +348,9 @@ void __init mem_init(void) #endif for_each_pgdat(pgdat) { for (i = 0; i < pgdat->node_spanned_pages; i++) { + if (!section_has_mem_map(__pfn_to_section + (pgdat->node_start_pfn + i))) + continue; page = pgdat_page_nr(pgdat, i); if (PageReserved(page)) reservedpages++; From johnrose at austin.ibm.com Sat Nov 5 03:20:55 2005 From: johnrose at austin.ibm.com (John Rose) Date: Fri, 04 Nov 2005 10:20:55 -0600 Subject: [PATCH 19/42]: ppc64: bugfix: crash on PHB add In-Reply-To: <20051104005117.GA26991@mail.gnucash.org> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005117.GA26991@mail.gnucash.org> Message-ID: <1131121255.9574.11.camel@sinatra.austin.ibm.com> > This patch fixes a bug related to dlpar PHB add, after a PHB removal. This and patch 18 seem logically separate from the feature. This complicates review and adds to an already large patch set. Could we handle these separately? Thanks- John From linas at austin.ibm.com Sat Nov 5 03:35:57 2005 From: linas at austin.ibm.com (linas) Date: Fri, 4 Nov 2005 10:35:57 -0600 Subject: [PATCH 19/42]: ppc64: bugfix: crash on PHB add In-Reply-To: <1131121255.9574.11.camel@sinatra.austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005117.GA26991@mail.gnucash.org> <1131121255.9574.11.camel@sinatra.austin.ibm.com> Message-ID: <20051104163557.GR19593@austin.ibm.com> On Fri, Nov 04, 2005 at 10:20:55AM -0600, John Rose was heard to remark: > > This patch fixes a bug related to dlpar PHB add, after a PHB removal. > > This and patch 18 seem logically separate from the feature. This > complicates review and adds to an already large patch set. Could we > handle these separately? I sent these in separetely, a month ago, as bug fixes for the dlpar crashes in the pre-2.6.14 kernels, but these were never applied. 
Since they're needed to get EEH to work, I just sent them in again with this set. Yes, I'm aware that the patch you sent yesterday fixes the same bug in almost the same way. What you really want to concentrate on are patches 20 through 23 which mess with the guts of the rpaphp code. But again, these are the same old patches, they have not changed since the submit last month. --linas From geoffrey.levand at am.sony.com Sat Nov 5 05:36:58 2005 From: geoffrey.levand at am.sony.com (Geoff Levand) Date: Fri, 04 Nov 2005 10:36:58 -0800 Subject: pci_resource_end() changed problem with 2.6.14 In-Reply-To: <1131087370.4680.238.camel@gaston> References: <436ADBA7.7030706@am.sony.com> <1131087370.4680.238.camel@gaston> Message-ID: <436BAA4A.5030007@am.sony.com> Benjamin Herrenschmidt wrote: > On Thu, 2005-11-03 at 19:55 -0800, Geoff Levand wrote: > >>I found that the serial port probe code in drivers/serial/8250_pci.c >>no longer works properly for ppc64 in 2.6.14. It seems the value >>returned by pci_resource_len() on ppc64 changed from 8 to 16 since >>2.6.13. I tested on a PC and pci_resource_len() returns 8 as >>expected. >> > Interesting... What does an lspci -vv shows for the BARs of the PCI > card ? Also, what do you have in /proc/device-tree ? What is the > machine precisely ? > > 2.6.14 now uses the OF device-tree to generate the linux PCI tree > instead of going directly to PCI probing. It's possible that this is > causing your problem if for some reason, the BAR sizing done by OF ends > up being different than what the kernel does ... > Sorry, I should have mentioned it, this is on my PowerMac G5 with a generic 8250 serial PCI card (StarTech PCI4S550N). 
Here's what lspci gives me: 0001:05:03.0 Serial controller: NetMos Technology PCI 9845 Multi-I/O Controller (rev 01) (prog-if 02 [16550]) Subsystem: LSI Logic / Symbios Logic 0P4S (4 port 16550A serial card) Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- References: <436ADBA7.7030706@am.sony.com> <1131087370.4680.238.camel@gaston> <436BAA4A.5030007@am.sony.com> Message-ID: <436BB1A3.5000404@am.sony.com> Geoff Levand wrote: > Benjamin Herrenschmidt wrote: > >>On Thu, 2005-11-03 at 19:55 -0800, Geoff Levand wrote: >> >> >>>I found that the serial port probe code in drivers/serial/8250_pci.c >>>no longer works properly for ppc64 in 2.6.14. It seems the value >>>returned by pci_resource_len() on ppc64 changed from 8 to 16 since >>>2.6.13. I tested on a PC and pci_resource_len() returns 8 as >>>expected. >>> >> >>Interesting... What does an lspci -vv shows for the BARs of the PCI >>card ? Also, what do you have in /proc/device-tree ? What is the >>machine precisely ? >> >>2.6.14 now uses the OF device-tree to generate the linux PCI tree >>instead of going directly to PCI probing. It's possible that this is >>causing your problem if for some reason, the BAR sizing done by OF ends >>up being different than what the kernel does ... >> > > > Sorry, I should have mentioned it, this is on my PowerMac G5 with a > generic 8250 serial PCI card (StarTech PCI4S550N). 
Here's what lspci > gives me: > > 0001:05:03.0 Serial controller: NetMos Technology PCI 9845 Multi-I/O Controller (rev 01) (prog-if 02 [16550]) > Subsystem: LSI Logic / Symbios Logic 0P4S (4 port 16550A serial card) > Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- > Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- Interrupt: pin A routed to IRQ 53 > Region 0: I/O ports at f4000050 [size=16] > Region 1: I/O ports at f4000040 [size=16] > Region 2: I/O ports at f4000030 [size=16] > Region 3: I/O ports at f4000020 [size=16] > Region 4: I/O ports at f4000010 [size=16] > Region 5: I/O ports at f4000000 [size=16] > > It could be the change to using the OF device-tree. What's an easy way to > see the size OF has used? > OK, from the OF prompt I can list the properties, and I see assigned-addresses has a length of 16. I think to change the the OF device tree parsing defeats the purpose of the using the OF device tree, so I'll look into making the serial port probe routine more clever. -Geoff From apw at shadowen.org Sat Nov 5 07:18:19 2005 From: apw at shadowen.org (Andy Whitcroft) Date: Fri, 04 Nov 2005 20:18:19 +0000 Subject: [PATCH] powerpc: mem_init crash for sparsemem In-Reply-To: <200511041631.17237.arnd@arndb.de> References: <200511041631.17237.arnd@arndb.de> Message-ID: <436BC20B.9070704@shadowen.org> Arnd Bergmann wrote: > I have a Cell blade with some broken memory in the middle of the > physical address space and this is correctly detected by the > firmware, but not relocated. When I enable CONFIG_SPARSEMEM, > the memsections for the nonexistant address space do not > get struct page entries allocated, as expected. > > However, mem_init for the non-NUMA configuration tries to > access these pages without first looking if they are there. > I'm currently using the hack below to work around that, but > I have the feeling that there should be a cleaner solution > for this. > > Please comment. 
> > Signed-off-by: Arnd Bergmann > > --- linux-2.6.15-rc.orig/arch/powerpc/mm/mem.c > +++ linux-2.6.15-rc/arch/powerpc/mm/mem.c > @@ -348,6 +348,9 @@ void __init mem_init(void) > #endif > for_each_pgdat(pgdat) { > for (i = 0; i < pgdat->node_spanned_pages; i++) { > + if (!section_has_mem_map(__pfn_to_section > + (pgdat->node_start_pfn + i))) > + continue; > page = pgdat_page_nr(pgdat, i); > if (PageReserved(page)) > reservedpages++; Would it not make sense to use pfn_valid(), as that is not sparsemem specific? Not looked at the code in question specifically, but if you can use section_has_mem_map() it should be equivalent: if (!pfn_valid(pgdat->node_start_pfn + i)) continue; Want to spin us a patch and I'll give it some general testing. -apw From kravetz at us.ibm.com Sat Nov 5 07:57:58 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Fri, 4 Nov 2005 12:57:58 -0800 Subject: [PATCH] powerpc: mem_init crash for sparsemem In-Reply-To: <436BC20B.9070704@shadowen.org> References: <200511041631.17237.arnd@arndb.de> <436BC20B.9070704@shadowen.org> Message-ID: <20051104205758.GA5397@w-mikek2.ibm.com> On Fri, Nov 04, 2005 at 08:18:19PM +0000, Andy Whitcroft wrote: > Arnd Bergmann wrote: > > I have a Cell blade with some broken memory in the middle of the > > physical address space and this is correctly detected by the > > firmware, but not relocated. When I enable CONFIG_SPARSEMEM, > > the memsections for the nonexistant address space do not > > get struct page entries allocated, as expected. > > > > However, mem_init for the non-NUMA configuration tries to > > access these pages without first looking if they are there. This earlier statement in mem_init (or at least the comment), num_physpages = max_pfn; /* RAM is assumed contiguous */ may be a cause for concern. I'm pretty sure max_pfn has previously been set based on the value of lmb_end_of_DRAM(). 
My guess is that we are going to report the system as having more memory that it actually does (will not account for the hole(s)). That being said, the pfn_valid() check is still needed here. But, it looks like that code was originally written under the assumption that there were no holes. Can someone 'more in the know' of ppc architecture comment on the ram is contiguous assumption? Is this no longer the case? -- Mike From johnrose at austin.ibm.com Sat Nov 5 08:30:56 2005 From: johnrose at austin.ibm.com (John Rose) Date: Fri, 04 Nov 2005 15:30:56 -0600 Subject: [PATCH] dlpar enable for OF pci probe Message-ID: <1131139856.9574.49.camel@sinatra.austin.ibm.com> This patch contains the arch/ppc64 bits for enabling DLPAR and PCI Hotplug for the new OF-based PCI probe mechanism. This code path is currently broken. Please apply if appropriate. Thanks- John Signed-off-by: John Rose diff -puN arch/ppc64/kernel/rtas_pci.c~base_changes arch/ppc64/kernel/rtas_pci.c --- 2_6_linus/arch/ppc64/kernel/rtas_pci.c~base_changes 2005-11-04 13:54:50.000000000 -0600 +++ 2_6_linus-johnrose/arch/ppc64/kernel/rtas_pci.c 2005-11-04 13:54:50.000000000 -0600 @@ -440,7 +440,6 @@ struct pci_controller * __devinit init_p struct device_node *root = of_find_node_by_path("/"); unsigned int root_size_cells = 0; struct pci_controller *phb; - struct pci_bus *bus; int primary; root_size_cells = prom_n_size_cells(root); @@ -456,10 +455,7 @@ struct pci_controller * __devinit init_p of_node_put(root); pci_devs_phb_init_dynamic(phb); - phb->last_busno = 0xff; - bus = pci_scan_bus(phb->first_busno, phb->ops, phb->arch_data); - phb->bus = bus; - phb->last_busno = bus->subordinate; + scan_phb(phb); return phb; } diff -puN arch/ppc64/kernel/pci.c~base_changes arch/ppc64/kernel/pci.c --- 2_6_linus/arch/ppc64/kernel/pci.c~base_changes 2005-11-04 13:54:50.000000000 -0600 +++ 2_6_linus-johnrose/arch/ppc64/kernel/pci.c 2005-11-04 13:54:50.000000000 -0600 @@ -295,8 +295,8 @@ static void pci_parse_of_addrs(struct 
de } } -static struct pci_dev *of_create_pci_dev(struct device_node *node, - struct pci_bus *bus, int devfn) +struct pci_dev *of_create_pci_dev(struct device_node *node, + struct pci_bus *bus, int devfn) { struct pci_dev *dev; const char *type; @@ -354,10 +354,9 @@ static struct pci_dev *of_create_pci_dev return dev; } +EXPORT_SYMBOL(of_create_pci_dev); -static void of_scan_pci_bridge(struct device_node *node, struct pci_dev *dev); - -static void __devinit of_scan_bus(struct device_node *node, +void __devinit of_scan_bus(struct device_node *node, struct pci_bus *bus) { struct device_node *child = NULL; @@ -381,9 +380,10 @@ static void __devinit of_scan_bus(struct do_bus_setup(bus); } +EXPORT_SYMBOL(of_scan_bus); -static void __devinit of_scan_pci_bridge(struct device_node *node, - struct pci_dev *dev) +void __devinit of_scan_pci_bridge(struct device_node *node, + struct pci_dev *dev) { struct pci_bus *bus; u32 *busrange, *ranges; @@ -464,9 +464,10 @@ static void __devinit of_scan_pci_bridge else if (mode == PCI_PROBE_NORMAL) pci_scan_child_bus(bus); } +EXPORT_SYMBOL(of_scan_pci_bridge); #endif /* CONFIG_PPC_MULTIPLATFORM */ -static void __devinit scan_phb(struct pci_controller *hose) +void __devinit scan_phb(struct pci_controller *hose) { struct pci_bus *bus; struct device_node *node = hose->arch_data; diff -puN include/asm-ppc64/pci.h~base_changes include/asm-ppc64/pci.h --- 2_6_linus/include/asm-ppc64/pci.h~base_changes 2005-11-04 13:54:50.000000000 -0600 +++ 2_6_linus-johnrose/include/asm-ppc64/pci.h 2005-11-04 13:54:50.000000000 -0600 @@ -162,6 +162,14 @@ pcibios_fixup_device_resources(struct pc extern struct pci_controller *init_phb_dynamic(struct device_node *dn); +extern struct pci_dev *of_create_pci_dev(struct device_node *node, + struct pci_bus *bus, int devfn); + +extern void of_scan_pci_bridge(struct device_node *node, + struct pci_dev *dev); + +extern void of_scan_bus(struct device_node *node, struct pci_bus *bus); + extern int pci_read_irq_line(struct 
pci_dev *dev); extern void pcibios_add_platform_entries(struct pci_dev *dev); diff -puN include/asm-powerpc/ppc-pci.h~base_changes include/asm-powerpc/ppc-pci.h --- 2_6_linus/include/asm-powerpc/ppc-pci.h~base_changes 2005-11-04 13:54:50.000000000 -0600 +++ 2_6_linus-johnrose/include/asm-powerpc/ppc-pci.h 2005-11-04 13:54:50.000000000 -0600 @@ -34,6 +34,7 @@ void *traverse_pci_devices(struct device void pci_devs_phb_init(void); void pci_devs_phb_init_dynamic(struct pci_controller *phb); +void __devinit scan_phb(struct pci_controller *hose); /* PCI address cache management routines */ void pci_addr_cache_insert_device(struct pci_dev *dev); _ From arnd at arndb.de Sat Nov 5 08:43:48 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 4 Nov 2005 22:43:48 +0100 Subject: [PATCH] powerpc: mem_init crash for sparsemem In-Reply-To: <436BC20B.9070704@shadowen.org> References: <200511041631.17237.arnd@arndb.de> <436BC20B.9070704@shadowen.org> Message-ID: <200511042243.49661.arnd@arndb.de> On Freedag 04 November 2005 21:18, Andy Whitcroft wrote: > Would it not make sense to use pfn_valid(), as that is not sparsemem > specific? Not looked at the code in question specifically, but if you > can use section_has_mem_map() it should be equivalent: > > if (!pfn_valid(pgdat->node_start_pfn + i)) > continue; > > Want to spin us a patch and I'll give it some general testing. Yes, I guess pfn_valid() is the function I was looking for, thanks for pointing that out. Unfortunately, I don't have access to the machine over the weekend, so I won't be able to test that until Monday.
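The pfn_valid() approach discussed above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the toy_* names, the section size and the NULL-mem_map convention are all hypothetical stand-ins for the real sparsemem structures. It shows a mem_init-style loop that walks every spanned pfn but skips the ones whose section has no struct page array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel's definitions. */
#define TOY_PAGES_PER_SECTION 4UL

struct toy_page {
    bool reserved;
};

struct toy_section {
    struct toy_page *mem_map;   /* NULL models a hole: no struct pages allocated */
};

/* Models pfn_valid(): true only if the section covering pfn has a mem_map. */
static bool toy_pfn_valid(const struct toy_section *sections,
                          size_t nr_sections, unsigned long pfn)
{
    unsigned long idx = pfn / TOY_PAGES_PER_SECTION;

    return idx < nr_sections && sections[idx].mem_map != NULL;
}

/* Counts reserved pages in [start_pfn, start_pfn + spanned), skipping holes. */
static unsigned long toy_count_reserved(const struct toy_section *sections,
                                        size_t nr_sections,
                                        unsigned long start_pfn,
                                        unsigned long spanned)
{
    unsigned long reserved = 0;

    for (unsigned long i = 0; i < spanned; i++) {
        unsigned long pfn = start_pfn + i;
        const struct toy_page *page;

        if (!toy_pfn_valid(sections, nr_sections, pfn))
            continue;   /* no struct page behind this pfn, do not touch it */

        page = &sections[pfn / TOY_PAGES_PER_SECTION]
                    .mem_map[pfn % TOY_PAGES_PER_SECTION];
        if (page->reserved)
            reserved++;
    }
    return reserved;
}
```

In this model the check is done per pfn rather than per section, which mirrors the form of the suggested one-line test; whether that is cheap enough in the real loop is a separate question.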
Arnd <>< From benh at kernel.crashing.org Sat Nov 5 08:46:10 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 05 Nov 2005 08:46:10 +1100 Subject: pci_resource_end() changed problem with 2.6.14 In-Reply-To: <436BAA4A.5030007@am.sony.com> References: <436ADBA7.7030706@am.sony.com> <1131087370.4680.238.camel@gaston> <436BAA4A.5030007@am.sony.com> Message-ID: <1131140771.29195.11.camel@gaston> On Fri, 2005-11-04 at 10:36 -0800, Geoff Levand wrote: > 0001:05:03.0 Serial controller: NetMos Technology PCI 9845 Multi-I/O Controller (rev 01) (prog-if 02 [16550]) > Subsystem: LSI Logic / Symbios Logic 0P4S (4 port 16550A serial card) > Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- > Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- Interrupt: pin A routed to IRQ 53 > Region 0: I/O ports at f4000050 [size=16] > Region 1: I/O ports at f4000040 [size=16] > Region 2: I/O ports at f4000030 [size=16] > Region 3: I/O ports at f4000020 [size=16] > Region 4: I/O ports at f4000010 [size=16] > Region 5: I/O ports at f4000000 [size=16] > > It could be the change to using the OF device-tree. What's an easy way to > see the size OF has used? Find the card in /proc/device-tree and look at "assigned-addresses" Ben. From johnrose at austin.ibm.com Sat Nov 5 08:54:26 2005 From: johnrose at austin.ibm.com (John Rose) Date: Fri, 04 Nov 2005 15:54:26 -0600 Subject: [PATCH 22/42]: PCI: remove duplicted pci hotplug code In-Reply-To: <20051104005201.GA27016@mail.gnucash.org> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005201.GA27016@mail.gnucash.org> Message-ID: <1131141266.9574.60.camel@sinatra.austin.ibm.com> > +extern void pcibios_claim_one_bus(struct pci_bus *b); > + Might need to export this for module use by the kernel. 
Thanks- John From arnd at arndb.de Sat Nov 5 08:59:33 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 4 Nov 2005 22:59:33 +0100 Subject: [PATCH] powerpc: mem_init crash for sparsemem In-Reply-To: <20051104205758.GA5397@w-mikek2.ibm.com> References: <200511041631.17237.arnd@arndb.de> <436BC20B.9070704@shadowen.org> <20051104205758.GA5397@w-mikek2.ibm.com> Message-ID: <200511042259.33880.arnd@arndb.de> On Freedag 04 November 2005 21:57, Mike Kravetz wrote: > This earlier statement in mem_init (or at least the comment), > > num_physpages = max_pfn; /* RAM is assumed contiguous */ > > may be a cause for concern. I'm pretty sure max_pfn has previously > been set based on the value of lmb_end_of_DRAM(). My guess is that we > are going to report the system as having more memory that it actually > does (will not account for the hole(s)). Yes, that's likely to cause trouble later. Unfortunately, there are still multiple places that determine the memory size by different means and save the result in a global variable, so it's hard to get them all right. I'll probably move Cell to use NUMA mode for setups with multiple CPUs (each of which is already SMT), which means we can use the code that we already know handles this correctly, in addition to the option of using NUMA-aware memory allocation and scheduling for the SPUs. > That being said, the pfn_valid() check is still needed here. But, > it looks like that code was originally written under the assumption > that there were no holes. > > Can someone 'more in the know' of ppc architecture comment on the > ram is contiguous assumption? Is this no longer the case? For all I know, the firmware interface can legally declare noncontiguous memory, but that is not done on product level hardware except NUMA. The configuration for SPARSEMEM without NUMA is normally not possible on ppc64; I had to hack Kconfig to allow this in the first place.
Without SPARSEMEM, the noncontiguous memory seems to be handled well except for the size detection. Arnd <>< From linas at austin.ibm.com Sat Nov 5 09:14:00 2005 From: linas at austin.ibm.com (linas) Date: Fri, 4 Nov 2005 16:14:00 -0600 Subject: [PATCH 41/42]: ppc64: Save device BARS much earlier in the boot sequence In-Reply-To: <20051104005519.GA27189@mail.gnucash.org> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005519.GA27189@mail.gnucash.org> Message-ID: <20051104221400.GS19593@austin.ibm.com> Hi, On Thu, Nov 03, 2005 at 06:55:19PM -0600, Linas Vepstas was heard to remark: > Save the PCI device bars *before* any PCI probing is done. After a tiny bit of extra testing, I found one of those forehead-slapping bugs in this patch. Here it is again, with the bug fixed. (So far, all the other tests look good; I've survived 24 hours of several thousand artificial pci errors injected onto ethernet and scsi intermingled with hundreds of pci slot adds/removes.) ------------ 241-eeh-save-bars-earlier.patch Save the PCI device bars *before* any PCI probing is done.
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git6/arch/ppc64/kernel/rtas_pci.c =================================================================== --- linux-2.6.14-git6.orig/arch/ppc64/kernel/rtas_pci.c 2005-11-03 14:46:40.000000000 -0600 +++ linux-2.6.14-git6/arch/ppc64/kernel/rtas_pci.c 2005-11-03 14:50:22.000000000 -0600 @@ -72,7 +72,7 @@ return 0; } -static int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val) +int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val) { int returnval = -1; unsigned long buid, addr; Index: linux-2.6.14-git6/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git6.orig/include/asm-powerpc/ppc-pci.h 2005-11-03 14:50:21.000000000 -0600 +++ linux-2.6.14-git6/include/asm-powerpc/ppc-pci.h 2005-11-03 14:50:22.000000000 -0600 @@ -59,8 +59,6 @@ void pci_addr_cache_build(void); struct pci_dev *pci_get_device_by_addr(unsigned long addr); -void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn); - /** * eeh_slot_error_detail -- record and EEH error condition to the log * @severity: 1 if temporary, 2 if permanent failure. 
@@ -104,6 +102,7 @@ void rtas_configure_bridge(struct pci_dn *); int rtas_write_config(struct pci_dn *, int where, int size, u32 val); +int rtas_read_config(struct pci_dn *, int where, int size, u32 *val); /** * mark and clear slots: find "partition endpoint" PE and set or Index: linux-2.6.14-git6/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-2.6.14-git6.orig/include/asm-ppc64/pci-bridge.h 2005-11-03 14:50:15.000000000 -0600 +++ linux-2.6.14-git6/include/asm-ppc64/pci-bridge.h 2005-11-03 14:50:22.000000000 -0600 @@ -58,15 +58,15 @@ struct iommu_table; struct pci_dn { - int busno; /* for pci devices */ - int bussubno; /* for pci devices */ - int devfn; /* for pci devices */ + int busno; /* pci bus number */ + int bussubno; /* pci subordinate bus number */ + int devfn; /* pci device and function number */ + int class_code; /* pci device class */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; int eeh_pe_config_addr; /* new-style partition endpoint address */ int eeh_check_count; /* # times driver ignored error */ int eeh_freeze_count; /* # times this device froze up. 
*/ - int eeh_is_bridge; /* device is pci-to-pci bridge */ int pci_ext_config_space; /* for pci devices */ struct pci_controller *phb; /* for pci devices */ Index: linux-2.6.14-git6/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git6.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-03 14:50:21.000000000 -0600 +++ linux-2.6.14-git6/arch/powerpc/platforms/pseries/eeh.c 2005-11-04 16:07:29.596059751 -0600 @@ -106,6 +106,8 @@ static DEFINE_PER_CPU(unsigned long, ignored_failures); static DEFINE_PER_CPU(unsigned long, slot_resets); +#define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE) + /* --------------------------------------------------------------- */ /* Below lies the EEH event infrastructure */ @@ -620,7 +622,7 @@ if (!pdn) return; - if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && (!pdn->eeh_is_bridge)) + if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && !IS_BRIDGE(pdn->class_code)) __restore_bars (pdn); dn = pdn->node->child; @@ -638,18 +640,15 @@ * PCI devices are added individuallly; but, for the restore, * an entire slot is reset at a time. 
*/ -void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn) +static void eeh_save_bars(struct pci_dn *pdn) { int i; - if (!pdev || !pdn ) + if (!pdn ) return; for (i = 0; i < 16; i++) - pci_read_config_dword(pdev, i * 4, &pdn->config_space[i]); - - if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) - pdn->eeh_is_bridge = 1; + rtas_read_config(pdn, i * 4, 4, &pdn->config_space[i]); } void @@ -703,6 +702,9 @@ pdn->eeh_check_count = 0; pdn->eeh_freeze_count = 0; + if (class_code) + pdn->class_code = *class_code; + if (status && strcmp(status, "ok") != 0) return NULL; /* ignore devices with bad status */ @@ -781,6 +783,7 @@ dn->full_name); } + eeh_save_bars(pdn); return NULL; } @@ -915,7 +918,6 @@ pdn->pcidev = dev; pci_addr_cache_insert_device (dev); - eeh_save_bars(dev, pdn); } EXPORT_SYMBOL_GPL(eeh_add_device_late); Index: linux-2.6.14-git6/arch/powerpc/platforms/pseries/eeh_cache.c =================================================================== --- linux-2.6.14-git6.orig/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-03 14:50:19.000000000 -0600 +++ linux-2.6.14-git6/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-04 10:22:51.000000000 -0600 @@ -304,10 +304,7 @@ pci_addr_cache_insert_device(dev); - /* Save the BAR's; firmware doesn't restore these after EEH reset */ dn = pci_device_to_OF_node(dev); - eeh_save_bars(dev, PCI_DN(dn)); - pci_dev_get (dev); /* matching put is in eeh_remove_device() */ PCI_DN(dn)->pcidev = dev; } From greg at kroah.com Sat Nov 5 09:14:37 2005 From: greg at kroah.com (Greg KH) Date: Fri, 4 Nov 2005 14:14:37 -0800 Subject: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers In-Reply-To: <20051103235918.GA25616@mail.gnucash.org> References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104221437.GA20004@kroah.com> On Thu, Nov 03, 2005 at 05:59:18PM -0600, Linas Vepstas wrote: > What follows is a long sequence of mostly small patches to implement > PCI Error Recovery by adding notification callbacks to 
the PCI device > driver structure, implementing the recovery in 5 device drivers > (3 ethernet, two scsi drivers), and adding the actual error detection > and recovery code to the ppc64/powerpc arch tree. > > Highlights: > > -- Patches 1-14: Misc required ppc64/powerpc cleanup/bugfixes/restructuring > -- Patch 15: Overview documentation > -- Patch 16: changes to include/linux/pci.h > -- Patches 17-26: error detection and recovery for pSeries PCI bridge chips > -- Patchs 27-32: recovery patches for ethernet, scsi device drivers > -- Patches 33-42: More misc ppc64-specific changes Ok, so at first glance, I only need to pay attention to patches 15, 16, and 27-32? If so, please send the ppc64 specific patches through the ppc64 maintainers, and the rpaphp specific patches through that specific maintainer. Then care to resend the 8 remaining patches to me, so I can stage them in -mm for a while? thanks, greg k-h From holindho at cs.helsinki.fi Sat Nov 5 09:08:21 2005 From: holindho at cs.helsinki.fi (Heikki Lindholm) Date: Sat, 05 Nov 2005 00:08:21 +0200 Subject: [PATCH] Fix G5 UP build of 2.6.14 Message-ID: <436BDBD5.9040009@cs.helsinki.fi> 2.6.14 broke UP build for G5. This is against 2.6.14 release (some were already posted by Olof Johansson). Compiled with g5_defconfig minus CONFIG_SMP. -- Heikki Lindholm -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: 2.6.14-ppc64-build-UP.patch Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051105/7c4e04d9/attachment.txt From kravetz at us.ibm.com Sat Nov 5 10:15:52 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Fri, 4 Nov 2005 15:15:52 -0800 Subject: [PATCH 0/4] Memory Add Fixes for ppc64 Message-ID: <20051104231552.GA25545@w-mikek2.ibm.com> When memory add was merged into mainline in 2.6.14, there were various bits and pieces missing that prevent it from working on ppc64. 
The following patches are against 2.6.14-git7 and address all but one of the know issues. 1) Create hptes for new sections 2) Clear page count before freeing new pages 3) Kludge to add new memory to node 0 4) Ensure probe file is created for memory add via sysfs -- Mike From kravetz at us.ibm.com Sat Nov 5 10:18:00 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Fri, 4 Nov 2005 15:18:00 -0800 Subject: [PATCH 1/4] Memory Add Fixes for ppc64 In-Reply-To: <20051104231552.GA25545@w-mikek2.ibm.com> References: <20051104231552.GA25545@w-mikek2.ibm.com> Message-ID: <20051104231800.GB25545@w-mikek2.ibm.com> Add the create_section_mapping() routine to create hptes for memory sections dynamically added after system boot. Signed-off-by: Mike Kravetz diff -Naupr linux-2.6.14-git7/arch/powerpc/mm/hash_utils_64.c linux-2.6.14-git7.work/arch/powerpc/mm/hash_utils_64.c --- linux-2.6.14-git7/arch/powerpc/mm/hash_utils_64.c 2005-11-04 21:21:05.000000000 +0000 +++ linux-2.6.14-git7.work/arch/powerpc/mm/hash_utils_64.c 2005-11-04 22:05:06.000000000 +0000 @@ -176,6 +176,15 @@ static unsigned long get_hashtable_size( return pteg_count << 7; } +#ifdef CONFIG_MEMORY_HOTPLUG +void create_section_mapping(unsigned long start, unsigned long end) +{ + create_pte_mapping(start, end, + _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX, + cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE ? 
1 : 0); +} +#endif /* CONFIG_MEMORY_HOTPLUG */ + void __init htab_initialize(void) { unsigned long table, htab_size_bytes; diff -Naupr linux-2.6.14-git7/arch/powerpc/mm/mem.c linux-2.6.14-git7.work/arch/powerpc/mm/mem.c --- linux-2.6.14-git7/arch/powerpc/mm/mem.c 2005-11-04 21:21:05.000000000 +0000 +++ linux-2.6.14-git7.work/arch/powerpc/mm/mem.c 2005-11-04 22:05:06.000000000 +0000 @@ -124,6 +124,9 @@ int __devinit add_memory(u64 start, u64 unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; + start += KERNELBASE; + create_section_mapping(start, start + size); + /* this should work for most non-highmem platforms */ zone = pgdata->node_zones; diff -Naupr linux-2.6.14-git7/include/asm-ppc64/sparsemem.h linux-2.6.14-git7.work/include/asm-ppc64/sparsemem.h --- linux-2.6.14-git7/include/asm-ppc64/sparsemem.h 2005-10-28 00:02:08.000000000 +0000 +++ linux-2.6.14-git7.work/include/asm-ppc64/sparsemem.h 2005-11-04 22:05:06.000000000 +0000 @@ -11,6 +11,10 @@ #define MAX_PHYSADDR_BITS 38 #define MAX_PHYSMEM_BITS 36 +#ifdef CONFIG_MEMORY_HOTPLUG +extern void create_section_mapping(unsigned long start, unsigned long end); +#endif /* CONFIG_MEMORY_HOTPLUG */ + #endif /* CONFIG_SPARSEMEM */ #endif /* _ASM_PPC64_SPARSEMEM_H */ From kravetz at us.ibm.com Sat Nov 5 10:19:32 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Fri, 4 Nov 2005 15:19:32 -0800 Subject: [PATCH 2/4] Memory Add Fixes for ppc64 In-Reply-To: <20051104231552.GA25545@w-mikek2.ibm.com> References: <20051104231552.GA25545@w-mikek2.ibm.com> Message-ID: <20051104231932.GC25545@w-mikek2.ibm.com> memmap_init_zone() sets page count to 1. Before 'freeing' the page, we need to clear the count. This is the same that is done on free_all_bootmem_core() for memory discovered at boot time. 
Signed-off-by: Mike Kravetz diff -Naupr linux-2.6.14-git7/arch/powerpc/mm/mem.c linux-2.6.14-git7.work/arch/powerpc/mm/mem.c --- linux-2.6.14-git7/arch/powerpc/mm/mem.c 2005-11-04 21:21:05.000000000 +0000 +++ linux-2.6.14-git7.work/arch/powerpc/mm/mem.c 2005-11-04 22:09:59.000000000 +0000 @@ -107,6 +107,7 @@ EXPORT_SYMBOL(phys_mem_access_prot); void online_page(struct page *page) { ClearPageReserved(page); + set_page_count(page, 0); free_cold_page(page); totalram_pages++; num_physpages++; From kravetz at us.ibm.com Sat Nov 5 10:20:24 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Fri, 4 Nov 2005 15:20:24 -0800 Subject: [PATCH 3/4] Memory Add Fixes for ppc64 In-Reply-To: <20051104231552.GA25545@w-mikek2.ibm.com> References: <20051104231552.GA25545@w-mikek2.ibm.com> Message-ID: <20051104232024.GD25545@w-mikek2.ibm.com> This is a temporary kludge that supports adding all new memory to node 0. I will provide a more complete solution similar to that used for dynamically added CPUs in a few days. 
Signed-off-by: Mike Kravetz diff -Naupr linux-2.6.14-git7/include/asm-ppc64/mmzone.h linux-2.6.14-git7.work/include/asm-ppc64/mmzone.h --- linux-2.6.14-git7/include/asm-ppc64/mmzone.h 2005-11-04 21:21:09.000000000 +0000 +++ linux-2.6.14-git7.work/include/asm-ppc64/mmzone.h 2005-11-04 22:10:44.000000000 +0000 @@ -33,6 +33,9 @@ extern int numa_cpu_lookup_table[]; extern char *numa_memory_lookup_table; extern cpumask_t numa_cpumask_lookup_table[]; extern int nr_cpus_in_node[]; +#ifdef CONFIG_MEMORY_HOTPLUG +extern unsigned long max_pfn; +#endif /* 16MB regions */ #define MEMORY_INCREMENT_SHIFT 24 @@ -45,6 +48,11 @@ static inline int pa_to_nid(unsigned lon { int nid; +#ifdef CONFIG_MEMORY_HOTPLUG + /* kludge hot added sections default to node 0 */ + if (pa >= (max_pfn << PAGE_SHIFT)) + return 0; +#endif nid = numa_memory_lookup_table[pa >> MEMORY_INCREMENT_SHIFT]; #ifdef DEBUG_NUMA From kravetz at us.ibm.com Sat Nov 5 10:21:09 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Fri, 4 Nov 2005 15:21:09 -0800 Subject: [PATCH 4/4] Memory Add Fixes for ppc64 In-Reply-To: <20051104231552.GA25545@w-mikek2.ibm.com> References: <20051104231552.GA25545@w-mikek2.ibm.com> Message-ID: <20051104232109.GE25545@w-mikek2.ibm.com> ppc64 needs a special sysfs probe file for adding new memory. Signed-off-by: Mike Kravetz diff -Naupr linux-2.6.14-git7/arch/ppc64/Kconfig linux-2.6.14-git7.work/arch/ppc64/Kconfig --- linux-2.6.14-git7/arch/ppc64/Kconfig 2005-11-04 21:21:06.000000000 +0000 +++ linux-2.6.14-git7.work/arch/ppc64/Kconfig 2005-11-04 22:11:16.000000000 +0000 @@ -277,6 +277,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID def_bool y depends on NEED_MULTIPLE_NODES +config ARCH_MEMORY_PROBE + def_bool y + depends on MEMORY_HOTPLUG + # Some NUMA nodes have memory ranges that span # other nodes. 
Even though a pfn is valid and # between a node's start and end pfns, it may not From benh at kernel.crashing.org Sat Nov 5 11:04:30 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 05 Nov 2005 11:04:30 +1100 Subject: [PATCH 1/4] Memory Add Fixes for ppc64 In-Reply-To: <20051104231800.GB25545@w-mikek2.ibm.com> References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> Message-ID: <1131149070.29195.41.camel@gaston> On Fri, 2005-11-04 at 15:18 -0800, Mike Kravetz wrote: > Add the create_section_mapping() routine to create hptes for memory > sections dynamically added after system boot. > > Signed-off-by: Mike Kravetz This patch will have to be slightly reworked on top of the 64k pages one. It should be trivial though. Ben. > diff -Naupr linux-2.6.14-git7/arch/powerpc/mm/hash_utils_64.c linux-2.6.14-git7.work/arch/powerpc/mm/hash_utils_64.c > --- linux-2.6.14-git7/arch/powerpc/mm/hash_utils_64.c 2005-11-04 21:21:05.000000000 +0000 > +++ linux-2.6.14-git7.work/arch/powerpc/mm/hash_utils_64.c 2005-11-04 22:05:06.000000000 +0000 > @@ -176,6 +176,15 @@ static unsigned long get_hashtable_size( > return pteg_count << 7; > } > > +#ifdef CONFIG_MEMORY_HOTPLUG > +void create_section_mapping(unsigned long start, unsigned long end) > +{ > + create_pte_mapping(start, end, > + _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX, > + cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE ? 
1 : 0); > +} > +#endif /* CONFIG_MEMORY_HOTPLUG */ > + > void __init htab_initialize(void) > { > unsigned long table, htab_size_bytes; > diff -Naupr linux-2.6.14-git7/arch/powerpc/mm/mem.c linux-2.6.14-git7.work/arch/powerpc/mm/mem.c > --- linux-2.6.14-git7/arch/powerpc/mm/mem.c 2005-11-04 21:21:05.000000000 +0000 > +++ linux-2.6.14-git7.work/arch/powerpc/mm/mem.c 2005-11-04 22:05:06.000000000 +0000 > @@ -124,6 +124,9 @@ int __devinit add_memory(u64 start, u64 > unsigned long start_pfn = start >> PAGE_SHIFT; > unsigned long nr_pages = size >> PAGE_SHIFT; > > + start += KERNELBASE; > + create_section_mapping(start, start + size); > + > /* this should work for most non-highmem platforms */ > zone = pgdata->node_zones; > > diff -Naupr linux-2.6.14-git7/include/asm-ppc64/sparsemem.h linux-2.6.14-git7.work/include/asm-ppc64/sparsemem.h > --- linux-2.6.14-git7/include/asm-ppc64/sparsemem.h 2005-10-28 00:02:08.000000000 +0000 > +++ linux-2.6.14-git7.work/include/asm-ppc64/sparsemem.h 2005-11-04 22:05:06.000000000 +0000 > @@ -11,6 +11,10 @@ > #define MAX_PHYSADDR_BITS 38 > #define MAX_PHYSMEM_BITS 36 > > +#ifdef CONFIG_MEMORY_HOTPLUG > +extern void create_section_mapping(unsigned long start, unsigned long end); > +#endif /* CONFIG_MEMORY_HOTPLUG */ > + > #endif /* CONFIG_SPARSEMEM */ > > #endif /* _ASM_PPC64_SPARSEMEM_H */ From paulus at samba.org Sat Nov 5 11:08:17 2005 From: paulus at samba.org (Paul Mackerras) Date: Sat, 5 Nov 2005 11:08:17 +1100 Subject: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers In-Reply-To: <20051104221437.GA20004@kroah.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104221437.GA20004@kroah.com> Message-ID: <17259.63473.450876.276151@cargo.ozlabs.ibm.com> Greg
KH writes:

> Ok, so at first glance, I only need to pay attention to patches 15, 16,
> and 27-32? If so, please send the ppc64 specific patches through the
> ppc64 maintainers, and the rpaphp specific patches through that specific
> maintainer. Then care to resend the 8 remaining patches to me, so I can
> stage them in -mm for a while?

I'm happy to take care of the ppc64-specific patches.

I would *really* like to see 16 go to Linus as soon as possible, since
everything else depends on it, and since it has very little chance of
breaking any existing code. Would you be OK with sending 16 to Linus
within the next week?

Thanks,
Paul.

From greg at kroah.com Sat Nov 5 11:28:43 2005
From: greg at kroah.com (Greg KH)
Date: Fri, 4 Nov 2005 16:28:43 -0800
Subject: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers
In-Reply-To: <17259.63473.450876.276151@cargo.ozlabs.ibm.com>
References: <20051103235918.GA25616@mail.gnucash.org> <20051104221437.GA20004@kroah.com> <17259.63473.450876.276151@cargo.ozlabs.ibm.com>
Message-ID: <20051105002842.GB22574@kroah.com>

On Sat, Nov 05, 2005 at 11:08:17AM +1100, Paul Mackerras wrote:
> Greg KH writes:
>
> > Ok, so at first glance, I only need to pay attention to patches 15, 16,
> > and 27-32? If so, please send the ppc64 specific patches through the
> > ppc64 maintainers, and the rpaphp specific patches through that specific
> > maintainer. Then care to resend the 8 remaining patches to me, so I can
> > stage them in -mm for a while?
>
> I'm happy to take care of the ppc64-specific patches.
>
> I would *really* like to see 16 go to Linus as soon as possible, since
> everything else depends on it, and since it has very little chance of
> breaking any existing code. Would you be OK with sending 16 to Linus
> within the next week?

Can I take 15, 16, 27-32 now without the ppc64 patches dying without it?

thanks,

greg k-h

From kravetz at us.ibm.com Sat Nov 5 11:35:57 2005
From: kravetz at us.ibm.com (Mike Kravetz)
Date: Fri, 4 Nov 2005 16:35:57 -0800
Subject: [PATCH 1/4] Memory Add Fixes for ppc64
In-Reply-To: <1131149070.29195.41.camel@gaston>
References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> <1131149070.29195.41.camel@gaston>
Message-ID: <20051105003557.GC5397@w-mikek2.ibm.com>

On Sat, Nov 05, 2005 at 11:04:30AM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2005-11-04 at 15:18 -0800, Mike Kravetz wrote:
> > Add the create_section_mapping() routine to create hptes for memory
> > sections dynamically added after system boot.
> >
> > Signed-off-by: Mike Kravetz
>
> This patch will have to be slightly reworked on top of the 64k pages
> one. It should be trivial though.

OK. I'll respin on top of your patch at:

http://gate.crashing.org/~benh/ppc64-64k-pages.diff

Let me know if there is a different version going upstream.

--
Mike

From hch at lst.de Sat Nov 5 11:38:19 2005
From: hch at lst.de (Christoph Hellwig)
Date: Sat, 5 Nov 2005 01:38:19 +0100
Subject: [PATCH] ppc64: 64K pages support
In-Reply-To: <1130916198.20136.17.camel@gaston>
References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston>
Message-ID: <20051105003819.GA11505@lst.de>

So how does the 64k on 4k hardware emulation work? When Hugh did
bigger softpagesize for x86 based on 2.4.x he had to fix drivers all
over to deal with that.

From benh at kernel.crashing.org Sat Nov 5 11:43:07 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 05 Nov 2005 11:43:07 +1100
Subject: [PATCH 1/4] Memory Add Fixes for ppc64
In-Reply-To: <20051105003557.GC5397@w-mikek2.ibm.com>
References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> <1131149070.29195.41.camel@gaston> <20051105003557.GC5397@w-mikek2.ibm.com>
Message-ID: <1131151387.29195.43.camel@gaston>

On Fri, 2005-11-04 at 16:35 -0800, Mike Kravetz wrote:
> On Sat, Nov 05, 2005 at 11:04:30AM +1100, Benjamin Herrenschmidt wrote:
> > On Fri, 2005-11-04 at 15:18 -0800, Mike Kravetz wrote:
> > > Add the create_section_mapping() routine to create hptes for memory
> > > sections dynamically added after system boot.
> > >
> > > Signed-off-by: Mike Kravetz
> >
> > This patch will have to be slightly reworked on top of the 64k pages
> > one. It should be trivial though.
>
> OK. I'll respin on top of your patch at:
>
> http://gate.crashing.org/~benh/ppc64-64k-pages.diff
>
> Let me know if there is a different version going upstream.

I'll check if it still applies after Linus pulls the next round of ppc
updates.

Ben.

From benh at kernel.crashing.org Sat Nov 5 11:44:47 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 05 Nov 2005 11:44:47 +1100
Subject: [PATCH] ppc64: 64K pages support
In-Reply-To: <20051105003819.GA11505@lst.de>
References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051105003819.GA11505@lst.de>
Message-ID: <1131151488.29195.46.camel@gaston>

On Sat, 2005-11-05 at 01:38 +0100, Christoph Hellwig wrote:
> So how does the 64k on 4k hardware emulation work? When Hugh did
> bigger softpagesize for x86 based on 2.4.x he had to fix drivers all
> over to deal with that.

What was the problem with drivers ? On ppc64, it's all hidden in the
arch code. All the kernel sees is a 64k page size.
I extended the PTE to
contain tracking information for the 16 sub pages (HPTE bits & hash
slot index). Sub pages are faulted on demand and flushed all at once,
but it's all transparent to the generic code.

Ben.

From paulus at samba.org Sat Nov 5 11:46:44 2005
From: paulus at samba.org (Paul Mackerras)
Date: Sat, 5 Nov 2005 11:46:44 +1100
Subject: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers
In-Reply-To: <20051105002842.GB22574@kroah.com>
References: <20051103235918.GA25616@mail.gnucash.org> <20051104221437.GA20004@kroah.com> <17259.63473.450876.276151@cargo.ozlabs.ibm.com> <20051105002842.GB22574@kroah.com>
Message-ID: <17260.244.435084.466507@cargo.ozlabs.ibm.com>

Greg KH writes:

> Can I take 15, 16, 27-32 now without the ppc64 patches dying without it?

Sorry, I'm having trouble parsing that. If you mean, will it break
ppc64 if you send those patches to Linus before the ppc64 bits get in,
the answer is no it won't, please send them on.

Thanks,
Paul.

From greg at kroah.com Sat Nov 5 12:28:06 2005
From: greg at kroah.com (Greg KH)
Date: Fri, 4 Nov 2005 17:28:06 -0800
Subject: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers
In-Reply-To: <17260.244.435084.466507@cargo.ozlabs.ibm.com>
References: <20051103235918.GA25616@mail.gnucash.org> <20051104221437.GA20004@kroah.com> <17259.63473.450876.276151@cargo.ozlabs.ibm.com> <20051105002842.GB22574@kroah.com> <17260.244.435084.466507@cargo.ozlabs.ibm.com>
Message-ID: <20051105012806.GA23675@kroah.com>

On Sat, Nov 05, 2005 at 11:46:44AM +1100, Paul Mackerras wrote:
> Greg KH writes:
>
> > Can I take 15, 16, 27-32 now without the ppc64 patches dying without it?
>
> Sorry, I'm having trouble parsing that. If you mean, will it break
> ppc64 if you send those patches to Linus before the ppc64 bits get in,
> the answer is no it won't, please send them on.

Ok, I'll go look at them and see if they can be added to my tree for
testing in -mm before I'll send them to Linus.
thanks, greg k-h From greg at kroah.com Sat Nov 5 17:11:14 2005 From: greg at kroah.com (Greg KH) Date: Fri, 4 Nov 2005 22:11:14 -0800 Subject: [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: <20051104005035.GA26929@mail.gnucash.org> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> Message-ID: <20051105061114.GA27016@kroah.com> On Thu, Nov 03, 2005 at 06:50:35PM -0600, Linas Vepstas wrote: > +/* ---------------------------------------------------------------- */ > +/** PCI error recovery infrastructure. If a PCI device driver provides > + * a set fof callbacks in struct pci_error_handlers, then that device driver > + * will be notified of PCI bus errors, and will be driven to recovery > + * when an error occurs. > + */ > + > +enum pcierr_result { > + PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */ > + PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */ > + PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */ > + PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ > + PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ > +}; No, do not create new types of error or return codes. Use the standard -EFOO values. You can document what they should each return, and mean, but do not create new codes. Also, you create an enum, but yet do not use it in your function callback definition, which means you really didn't want to create it in the first place... I'll add 15 and 16 to my tree for now, so they will show up in -mm, but I want to see updated versions before sending them off to Linus. 
thanks, greg k-h From olh at suse.de Sat Nov 5 23:13:52 2005 From: olh at suse.de (Olaf Hering) Date: Sat, 5 Nov 2005 13:13:52 +0100 Subject: [PATCH] ppc64: add MODALIAS= for vio bus In-Reply-To: <20051030213900.GA22510@suse.de> References: <20051030213900.GA22510@suse.de> Message-ID: <20051105121352.GA8814@suse.de> A non-broken udev would autoload also the drivers for devices on the pseries vio bus, like ibmveth, ibmvscsic and hvsc. This is similar to pci, usb and ieee1394: /lib/modules/`uname -r`/modules.alias alias vio:TvscsiSIBM,v-scsi* ibmvscsic alias vio:TnetworkSIBM,l-lan* ibmveth alias vio:Tserial-serverShvterm2* hvcs /events/debug.00004.pci.add.1394:MODALIAS='pci:v00001014d00000188sv00000000sd00000000bc06sc04i0f' /events/debug.00005.pci.add.1509:MODALIAS='pci:v00008086d00001229sv00001014sd000001FFbc02sc00i00' /events/debug.00026.vio.add.1519:MODALIAS='vio:TserialShvterm1' /events/debug.00027.vio.add.1446:MODALIAS='vio:TvscsiSIBM,v-scsi' /events/debug.00028.vio.add.1451:MODALIAS='vio:TnetworkSIBM,l-lan' modprobe -v vio:TnetworkSIBM,l-lan insmod /lib/modules/2.6.14-20051030_vio-ppc64/kernel/drivers/net/ibmveth.ko Signed-off-by: Olaf Hering arch/powerpc/kernel/vio.c | 27 +++++++++++++++++++++++++++ 1 files changed, 27 insertions(+) Index: linux-2.6.14-olh/arch/powerpc/kernel/vio.c =================================================================== --- linux-2.6.14-olh.orig/arch/powerpc/kernel/vio.c +++ linux-2.6.14-olh/arch/powerpc/kernel/vio.c @@ -21,6 +21,7 @@ #include #include #include +#include static const struct vio_device_id *vio_match_device( const struct vio_device_id *, const struct vio_dev *); @@ -265,7 +266,33 @@ static int vio_bus_match(struct device * return (ids != NULL) && (vio_match_device(ids, vio_dev) != NULL); } +static int vio_hotplug(struct device *dev, char **envp, int num_envp, + char *buffer, int buffer_size) +{ + const struct vio_dev *vio_dev = to_vio_dev(dev); + char *cp; + int length; + + if (!num_envp) + return -ENOMEM; + + if 
(!vio_dev->dev.platform_data) + return -ENODEV; + cp = (char *)get_property(vio_dev->dev.platform_data, "compatible", &length); + if (!cp) + return -ENODEV; + + envp[0] = buffer; + length = scnprintf(buffer, buffer_size, "MODALIAS=vio:T%sS%s", + vio_dev->type, cp); + if (buffer_size - length <= 0) + return -ENOMEM; + envp[1] = NULL; + return 0; +} + struct bus_type vio_bus_type = { .name = "vio", + .hotplug = vio_hotplug, .match = vio_bus_match, }; -- short story of a lazy sysadmin: alias appserv=wotan From airlied at gmail.com Sat Nov 5 17:37:24 2005 From: airlied at gmail.com (Dave Airlie) Date: Sat, 5 Nov 2005 17:37:24 +1100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1131151488.29195.46.camel@gaston> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051105003819.GA11505@lst.de> <1131151488.29195.46.camel@gaston> Message-ID: <21d7e9970511042237p618d6306qb63272a4fa2263ea@mail.gmail.com> > What was the problem with drivers ? On ppc64, it's all hidden in the > arch code. All the kernel sees is a 64k page size. I extended the PTE to > contain tracking informations for the 16 sub pages (HPTE bits & hash > slot index). Sub pages are faulted on demand and flushed all at once, > but it's all transparent to the generic code. > We did that with the VAX port about 5 years ago :-), granted for different reasons.. The VAX has 512 byte hw pages, we had to make a 4K pagesize for the kernel by grouping 8 hw pages together and hiding it all in the arch dir.. granted I don't know if it broke any drivers, we didn't have any... Dave. 
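
The sub-page scheme discussed in this thread (a 64K page backed by 16 hardware 4K pages, or on the VAX a 4K page backed by 8 hardware 512-byte pages) can be illustrated with a minimal C sketch. This is not the code from the 64K pages patch: struct soft_pte, subpage_index(), fault_subpage() and flush_page() are hypothetical names chosen for illustration, and the per-sub-page HPTE bits and hash slot index mentioned above are reduced here to a single valid bit per sub page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from the kernel. */
#define HW_PAGE_SHIFT   12  /* 4K page as seen by the hardware */
#define SOFT_PAGE_SHIFT 16  /* 64K page as seen by generic code */
#define SUBPAGES        (1u << (SOFT_PAGE_SHIFT - HW_PAGE_SHIFT))  /* 16 */

/* Hypothetical software PTE: one "valid" bit per hardware sub page. */
struct soft_pte {
	uint16_t sub_valid;
};

/* Index (0..15) of the hardware sub page that contains addr. */
static unsigned int subpage_index(uint64_t addr)
{
	return (unsigned int)((addr >> HW_PAGE_SHIFT) & (SUBPAGES - 1));
}

/* Fault in only the sub page that was actually touched. */
static void fault_subpage(struct soft_pte *pte, uint64_t addr)
{
	unsigned int idx = subpage_index(addr);

	assert(idx < SUBPAGES);
	pte->sub_valid |= (uint16_t)(1u << idx);
}

/* Flush every sub page of the 64K page in one go. */
static void flush_page(struct soft_pte *pte)
{
	pte->sub_valid = 0;
}
```

With these definitions, touching offsets 0x3abc and 0xf000 inside one 64K page marks sub pages 3 and 15 as valid, and a single flush_page() call clears both, which mirrors the "faulted on demand, flushed all at once" behaviour described in the thread.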
From david at gibson.dropbear.id.au Sun Nov 6 13:55:58 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Sun, 6 Nov 2005 13:55:58 +1100
Subject: powerpc: Consolidate asm compatibility macros
In-Reply-To: <7487F450-429B-4836-AF05-DD47B02D5BC1@kernel.crashing.org>
References: <20051104031609.GA962@localhost.localdomain> <7487F450-429B-4836-AF05-DD47B02D5BC1@kernel.crashing.org>
Message-ID: <20051106025558.GC17292@localhost.localdomain>

On Thu, Nov 03, 2005 at 11:33:14PM -0600, Kumar Gala wrote:
> David,
>
> I hate to be anal, but I think keeping the 'L' is useful in the macro
> names.
>
> PPC_LD -> PPC_LL
>
> I read 'PPC_LD' as either "PPC load" or "PPC load double" neither of
> which is useful. How about "PPC_LL", which I read as "PPC load long".
>
> I would propose the following names which at least follow some PPC
> naming convention:
>
> PPC_LL
> PPC_STL
> PPC_LLARX
> PPC_STLCX

Hrm.. I actually deliberately removed the L, on the grounds that
"long" doesn't necessarily have a consistent meaning between C and
asm. The idea is that all these operations work on the "natural" size
for the arch, which is to say the size of the GPRs, which is to say
the size of a C long.

Up to you, paulus.

--
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/~dgibson

From anton at samba.org Mon Nov 7 03:25:11 2005
From: anton at samba.org (Anton Blanchard)
Date: Mon, 7 Nov 2005 03:25:11 +1100
Subject: powerpc: Kill ppcdebug
In-Reply-To: <200511041134.30106.michael@ellerman.id.au>
References: <20051104001653.GC29025@localhost.localdomain> <200511041134.30106.michael@ellerman.id.au>
Message-ID: <20051106162511.GI12353@krispykreme>

Hi,

> I agree it's pretty ugly, but I thought the concept was at least nice, ie.
> runtime enablable debugging. The current scheme of having to #define DEBUG
> in a gazillion different files is pretty painful.

I found it hard to use because it ended up printing way too much stuff that only the writer of that particular bit of debug code could understand. The PCI code was a case in point, I always went in and wrote my own debug code even though a lot of ppcdebug stuff existed. I agree some runtime debug stuff is useful (numa=debug has been very useful in the past), but Im not sure we need common infrastructure for this. Anton From olof at lixom.net Mon Nov 7 09:04:39 2005 From: olof at lixom.net (Olof Johansson) Date: Sun, 6 Nov 2005 14:04:39 -0800 Subject: [PATCH] powerpc: Nicer printing of address at oops Message-ID: <20051106220439.GA7166@pb15.lixom.net> Hi, Please apply. --- Add nicer printing of faulting address on unresolvable kernel faults. Makes life a little easier for those who don't know how to decode our register contents at oops time. Signed-off-by: Olof Johansson Index: 2.6/arch/powerpc/mm/fault.c =================================================================== --- 2.6.orig/arch/powerpc/mm/fault.c 2005-11-06 12:55:22.000000000 -0800 +++ 2.6/arch/powerpc/mm/fault.c 2005-11-06 14:02:20.000000000 -0800 @@ -389,5 +389,22 @@ void bad_page_fault(struct pt_regs *regs } /* kernel has accessed a bad area */ + + printk(KERN_ALERT "Unable to handle kernel paging request for "); + switch (regs->trap) { + case 0x300: + case 0x380: + printk("data at address 0x%016lx\n", regs->dar); + break; + case 0x400: + case 0x480: + printk("instruction fetch\n"); + break; + default: + printk("unknown fault\n"); + } + printk(KERN_ALERT "Faulting instruction address: 0x%016lx\n", + regs->nip); + die("Kernel access of bad area", regs, sig); } From paulus at samba.org Mon Nov 7 09:44:50 2005 From: paulus at samba.org (Paul Mackerras) Date: Mon, 7 Nov 2005 09:44:50 +1100 Subject: [PATCH] powerpc: Nicer printing of address at oops In-Reply-To: <20051106220439.GA7166@pb15.lixom.net> References: <20051106220439.GA7166@pb15.lixom.net> Message-ID: 
<17262.34658.917350.594965@cargo.ozlabs.ibm.com> Olof Johansson writes: > + printk("data at address 0x%016lx\n", regs->dar); Nice idea, but 16 digits is a bit excessive for 32-bit... Paul. From olof at lixom.net Mon Nov 7 09:54:36 2005 From: olof at lixom.net (Olof Johansson) Date: Sun, 6 Nov 2005 14:54:36 -0800 Subject: [PATCH] powerpc: Nicer printing of address at oops In-Reply-To: <17262.34658.917350.594965@cargo.ozlabs.ibm.com> References: <20051106220439.GA7166@pb15.lixom.net> <17262.34658.917350.594965@cargo.ozlabs.ibm.com> Message-ID: <20051106225435.GB7166@pb15.lixom.net> On Mon, Nov 07, 2005 at 09:44:50AM +1100, Paul Mackerras wrote: > Olof Johansson writes: > > > + printk("data at address 0x%016lx\n", regs->dar); > > Nice idea, but 16 digits is a bit excessive for 32-bit... Ack, it's shared now, I forgot. Thanks. Here's a new patch. I don't like the thought of ifdeffing for it so we'll just have to live with 8 digit padding for small 64-bit pointers. :-) -Olof Add nicer printing of faulting address on unresolvable kernel faults. Makes life a little easier for those who don't know how to decode our register contents at oops time. 
Signed-off-by: Olof Johansson Index: 2.6/arch/powerpc/mm/fault.c =================================================================== --- 2.6.orig/arch/powerpc/mm/fault.c 2005-11-06 12:55:22.000000000 -0800 +++ 2.6/arch/powerpc/mm/fault.c 2005-11-06 14:52:23.000000000 -0800 @@ -389,5 +389,22 @@ void bad_page_fault(struct pt_regs *regs } /* kernel has accessed a bad area */ + + printk(KERN_ALERT "Unable to handle kernel paging request for "); + switch (regs->trap) { + case 0x300: + case 0x380: + printk("data at address 0x%08lx\n", regs->dar); + break; + case 0x400: + case 0x480: + printk("instruction fetch\n"); + break; + default: + printk("unknown fault\n"); + } + printk(KERN_ALERT "Faulting instruction address: 0x%08lx\n", + regs->nip); + die("Kernel access of bad area", regs, sig); } From david at gibson.dropbear.id.au Mon Nov 7 09:49:43 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Mon, 7 Nov 2005 09:49:43 +1100 Subject: powerpc: Kill ppcdebug (revised) In-Reply-To: <20051104001653.GC29025@localhost.localdomain> References: <20051104001653.GC29025@localhost.localdomain> Message-ID: <20051106224943.GA11833@localhost.localdomain> The previous version of the kill ppcdebug patch confliced with Michael Ellerman's plpar_wrappers.h patch. Here's a revised version which fixes the conflict. The ancient ppcdebug/PPCDBG mechanism is now only used in two places. First, in the hash setup code, one of the bits allows the size of the hash table to be reduced by a factor of 8 - which would be better accomplished with a command line option for that purpose. The other was a bunch of bus walking related messages in the iSeries code, which would seem to be insufficient reason to keep the mechanism. This patch removes the last traces of this mechanism. Built and booted on iSeries and pSeries POWER5 LPAR (ARCH=powerpc). 
Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/kernel/signal_32.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/signal_32.c 2005-11-04 14:18:13.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/signal_32.c 2005-11-07 09:41:24.000000000 +1100 @@ -44,7 +44,6 @@ #include #ifdef CONFIG_PPC64 #include "ppc32.h" -#include #include #include #else Index: working-2.6/arch/powerpc/mm/init_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/init_64.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/mm/init_64.c 2005-11-07 09:41:24.000000000 +1100 @@ -57,7 +57,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/mm/pgtable_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/pgtable_64.c 2005-10-31 15:44:59.000000000 +1100 +++ working-2.6/arch/powerpc/mm/pgtable_64.c 2005-11-07 09:41:24.000000000 +1100 @@ -59,7 +59,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/platforms/iseries/smp.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/smp.c 2005-11-04 14:18:13.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/smp.c 2005-11-07 09:41:24.000000000 +1100 @@ -40,7 +40,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/platforms/pseries/iommu.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/pseries/iommu.c 2005-11-07 09:38:01.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/pseries/iommu.c 2005-11-07 09:41:24.000000000 +1100 @@ -37,7 +37,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/platforms/pseries/lpar.c 
=================================================================== --- working-2.6.orig/arch/powerpc/platforms/pseries/lpar.c 2005-11-07 09:38:01.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/pseries/lpar.c 2005-11-07 09:41:35.000000000 +1100 @@ -31,13 +31,13 @@ #include #include #include -#include #include #include #include #include #include #include +#include #include "plpar_wrappers.h" Index: working-2.6/arch/powerpc/platforms/pseries/ras.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/pseries/ras.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/pseries/ras.c 2005-11-07 09:41:24.000000000 +1100 @@ -48,7 +48,7 @@ #include #include #include -#include +#include static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX]; static DEFINE_SPINLOCK(ras_log_buf_lock); Index: working-2.6/arch/ppc64/kernel/prom.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/prom.c 2005-10-31 15:44:59.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/prom.c 2005-11-07 09:41:24.000000000 +1100 @@ -46,7 +46,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/ppc64/kernel/prom_init.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/prom_init.c 2005-11-04 14:18:13.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/prom_init.c 2005-11-07 09:41:24.000000000 +1100 @@ -44,7 +44,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/sysdev/u3_iommu.c =================================================================== --- working-2.6.orig/arch/powerpc/sysdev/u3_iommu.c 2005-11-04 14:18:13.000000000 +1100 +++ working-2.6/arch/powerpc/sysdev/u3_iommu.c 2005-11-07 09:41:24.000000000 +1100 @@ -37,7 +37,6 @@ #include #include #include -#include #include #include #include Index: 
working-2.6/arch/powerpc/kernel/setup_64.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/setup_64.c 2005-11-07 09:38:01.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/setup_64.c 2005-11-07 09:41:24.000000000 +1100 @@ -41,7 +41,6 @@ #include #include #include -#include #include #include #include @@ -60,6 +59,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -244,12 +244,6 @@ DBG(" -> early_setup()\n"); /* - * Fill the default DBG level (do we want to keep - * that old mecanism around forever ?) - */ - ppcdbg_initialize(); - - /* * Do early initializations using the flattened device * tree, like retreiving the physical memory map or * calculating/retreiving the hash table size @@ -516,7 +510,6 @@ printk("-----------------------------------------------------\n"); printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); - printk("ppc64_debug_switch = 0x%lx\n", ppc64_debug_switch); printk("ppc64_interrupt_controller = 0x%ld\n", ppc64_interrupt_controller); printk("systemcfg = 0x%p\n", systemcfg); printk("systemcfg->platform = 0x%x\n", systemcfg->platform); Index: working-2.6/include/asm-ppc64/ppcdebug.h =================================================================== --- working-2.6.orig/include/asm-ppc64/ppcdebug.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,108 +0,0 @@ -#ifndef __PPCDEBUG_H -#define __PPCDEBUG_H -/******************************************************************** - * Author: Adam Litke, IBM Corp - * (c) 2001 - * - * This file contains definitions and macros for a runtime debugging - * system for ppc64 (This should also work on 32 bit with a few - * adjustments. 
- * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - * - ********************************************************************/ - -#include -#include -#include -#include - -#define PPCDBG_BITVAL(X) ((1UL)<<((unsigned long)(X))) - -/* Defined below are the bit positions of various debug flags in the - * ppc64_debug_switch variable. - * -- When adding new values, please enter them into trace names below -- - * - * Values 62 & 63 can be used to stress the hardware page table management - * code. They must be set statically, any attempt to change them dynamically - * would be a very bad idea. - */ -#define PPCDBG_MMINIT PPCDBG_BITVAL(0) -#define PPCDBG_MM PPCDBG_BITVAL(1) -#define PPCDBG_SYS32 PPCDBG_BITVAL(2) -#define PPCDBG_SYS32NI PPCDBG_BITVAL(3) -#define PPCDBG_SYS32X PPCDBG_BITVAL(4) -#define PPCDBG_SYS32M PPCDBG_BITVAL(5) -#define PPCDBG_SYS64 PPCDBG_BITVAL(6) -#define PPCDBG_SYS64NI PPCDBG_BITVAL(7) -#define PPCDBG_SYS64X PPCDBG_BITVAL(8) -#define PPCDBG_SIGNAL PPCDBG_BITVAL(9) -#define PPCDBG_SIGNALXMON PPCDBG_BITVAL(10) -#define PPCDBG_BINFMT32 PPCDBG_BITVAL(11) -#define PPCDBG_BINFMT64 PPCDBG_BITVAL(12) -#define PPCDBG_BINFMTXMON PPCDBG_BITVAL(13) -#define PPCDBG_BINFMT_32ADDR PPCDBG_BITVAL(14) -#define PPCDBG_ALIGNFIXUP PPCDBG_BITVAL(15) -#define PPCDBG_TCEINIT PPCDBG_BITVAL(16) -#define PPCDBG_TCE PPCDBG_BITVAL(17) -#define PPCDBG_PHBINIT PPCDBG_BITVAL(18) -#define PPCDBG_SMP PPCDBG_BITVAL(19) -#define PPCDBG_BOOT PPCDBG_BITVAL(20) -#define PPCDBG_BUSWALK PPCDBG_BITVAL(21) -#define PPCDBG_PROM PPCDBG_BITVAL(22) -#define PPCDBG_RTAS PPCDBG_BITVAL(23) -#define PPCDBG_HTABSTRESS PPCDBG_BITVAL(62) -#define PPCDBG_HTABSIZE PPCDBG_BITVAL(63) -#define PPCDBG_NONE (0UL) -#define PPCDBG_ALL (0xffffffffUL) - -/* The default initial value for the debug switch */ 
-#define PPC_DEBUG_DEFAULT 0 -/* #define PPC_DEBUG_DEFAULT PPCDBG_ALL */ - -#define PPCDBG_NUM_FLAGS 64 - -extern u64 ppc64_debug_switch; - -#ifdef WANT_PPCDBG_TAB -/* A table of debug switch names to allow name lookup in xmon - * (and whoever else wants it. - */ -char *trace_names[PPCDBG_NUM_FLAGS] = { - /* Known debug names */ - "mminit", "mm", - "syscall32", "syscall32_ni", "syscall32x", "syscall32m", - "syscall64", "syscall64_ni", "syscall64x", - "signal", "signal_xmon", - "binfmt32", "binfmt64", "binfmt_xmon", "binfmt_32addr", - "alignfixup", "tceinit", "tce", "phb_init", - "smp", "boot", "buswalk", "prom", - "rtas" -}; -#else -extern char *trace_names[64]; -#endif /* WANT_PPCDBG_TAB */ - -#ifdef CONFIG_PPCDBG -/* Macro to conditionally print debug based on debug_switch */ -#define PPCDBG(...) udbg_ppcdbg(__VA_ARGS__) - -/* Macro to conditionally call a debug routine based on debug_switch */ -#define PPCDBGCALL(FLAGS,FUNCTION) ifppcdebug(FLAGS) FUNCTION - -/* Macros to test for debug states */ -#define ifppcdebug(FLAGS) if (udbg_ifdebug(FLAGS)) -#define ppcdebugset(FLAGS) (udbg_ifdebug(FLAGS)) -#define PPCDBG_BINFMT (test_thread_flag(TIF_32BIT) ? PPCDBG_BINFMT32 : PPCDBG_BINFMT64) - -#else -#define PPCDBG(...) do {;} while (0) -#define PPCDBGCALL(FLAGS,FUNCTION) do {;} while (0) -#define ifppcdebug(...) 
if (0) -#define ppcdebugset(FLAGS) (0) -#endif /* CONFIG_PPCDBG */ - -#endif /*__PPCDEBUG_H */ Index: working-2.6/arch/ppc64/kernel/udbg.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/udbg.c 2005-10-25 11:59:53.000000000 +1000 +++ working-2.6/arch/ppc64/kernel/udbg.c 2005-11-07 09:41:24.000000000 +1100 @@ -10,12 +10,10 @@ */ #include -#define WANT_PPCDBG_TAB /* Only defined here */ #include #include #include #include -#include #include void (*udbg_putc)(unsigned char c); @@ -89,59 +87,6 @@ va_end(args); } -/* PPCDBG stuff */ - -u64 ppc64_debug_switch; - -/* Special print used by PPCDBG() macro */ -void udbg_ppcdbg(unsigned long debug_flags, const char *fmt, ...) -{ - unsigned long active_debugs = debug_flags & ppc64_debug_switch; - - if (active_debugs) { - va_list ap; - unsigned char buf[UDBG_BUFSIZE]; - unsigned long i, len = 0; - - for (i=0; i < PPCDBG_NUM_FLAGS; i++) { - if (((1U << i) & active_debugs) && - trace_names[i]) { - len += strlen(trace_names[i]); - udbg_puts(trace_names[i]); - break; - } - } - - snprintf(buf, UDBG_BUFSIZE, " [%s]: ", current->comm); - len += strlen(buf); - udbg_puts(buf); - - while (len < 18) { - udbg_puts(" "); - len++; - } - - va_start(ap, fmt); - vsnprintf(buf, UDBG_BUFSIZE, fmt, ap); - udbg_puts(buf); - va_end(ap); - } -} - -unsigned long udbg_ifdebug(unsigned long flags) -{ - return (flags & ppc64_debug_switch); -} - -/* - * Initialize the PPCDBG state. Called before relocation has been enabled. 
- */ -void __init ppcdbg_initialize(void) -{ - ppc64_debug_switch = PPC_DEBUG_DEFAULT; /* | PPCDBG_BUSWALK | */ - /* PPCDBG_PHBINIT | PPCDBG_MM | PPCDBG_MMINIT | PPCDBG_TCEINIT | PPCDBG_TCE */; -} - /* * Early boot console based on udbg */ Index: working-2.6/include/asm-ppc64/udbg.h =================================================================== --- working-2.6.orig/include/asm-ppc64/udbg.h 2005-10-31 15:20:22.000000000 +1100 +++ working-2.6/include/asm-ppc64/udbg.h 2005-11-07 09:41:24.000000000 +1100 @@ -23,9 +23,6 @@ extern void register_early_udbg_console(void); extern void udbg_printf(const char *fmt, ...); -extern void udbg_ppcdbg(unsigned long flags, const char *fmt, ...); -extern unsigned long udbg_ifdebug(unsigned long flags); -extern void __init ppcdbg_initialize(void); extern void udbg_init_uart(void __iomem *comport, unsigned int speed); Index: working-2.6/arch/powerpc/mm/hash_utils_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/hash_utils_64.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/mm/hash_utils_64.c 2005-11-07 09:41:24.000000000 +1100 @@ -32,7 +32,6 @@ #include #include -#include #include #include #include @@ -194,12 +193,6 @@ htab_size_bytes = get_hashtable_size(); pteg_count = htab_size_bytes >> 7; - /* For debug, make the HTAB 1/8 as big as it normally would be. 
*/ - ifppcdebug(PPCDBG_HTABSIZE) { - pteg_count >>= 3; - htab_size_bytes = pteg_count << 7; - } - htab_hash_mask = pteg_count - 1; if (systemcfg->platform & PLATFORM_LPAR) { Index: working-2.6/arch/powerpc/platforms/iseries/irq.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/irq.c 2005-11-04 14:18:13.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/irq.c 2005-11-07 09:41:24.000000000 +1100 @@ -35,7 +35,6 @@ #include #include -#include #include #include #include @@ -227,8 +226,6 @@ /* Unmask secondary INTA */ mask = 0x80000000; HvCallPci_unmaskInterrupts(bus, subBus, deviceId, mask); - PPCDBG(PPCDBG_BUSWALK, "iSeries_enable_IRQ 0x%02X.%02X.%02X 0x%04X\n", - bus, subBus, deviceId, irq); } /* This is called by iSeries_activate_IRQs */ @@ -310,8 +307,6 @@ /* Mask secondary INTA */ mask = 0x80000000; HvCallPci_maskInterrupts(bus, subBus, deviceId, mask); - PPCDBG(PPCDBG_BUSWALK, "iSeries_disable_IRQ 0x%02X.%02X.%02X 0x%04X\n", - bus, subBus, deviceId, irq); } /* Index: working-2.6/arch/powerpc/platforms/iseries/pci.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/pci.c 2005-11-04 14:18:13.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/pci.c 2005-11-07 09:41:24.000000000 +1100 @@ -32,7 +32,6 @@ #include #include #include -#include #include #include @@ -207,10 +206,6 @@ struct device_node *node; struct pci_dn *pdn; - PPCDBG(PPCDBG_BUSWALK, - "-build_device_node 0x%02X.%02X.%02X Function: %02X\n", - Bus, SubBus, AgentId, Function); - node = kmalloc(sizeof(struct device_node), GFP_KERNEL); if (node == NULL) return NULL; @@ -243,8 +238,6 @@ struct pci_controller *phb; HvBusNumber bus; - PPCDBG(PPCDBG_BUSWALK, "find_and_init_phbs Entry\n"); - /* Check all possible buses. 
*/ for (bus = 0; bus < 256; bus++) { int ret = HvCallXm_testBus(bus); @@ -261,9 +254,6 @@ phb->last_busno = bus; phb->ops = &iSeries_pci_ops; - PPCDBG(PPCDBG_BUSWALK, "PCI:Create iSeries pci_controller(%p), Bus: %04X\n", - phb, bus); - /* Find and connect the devices. */ scan_PHB_slots(phb); } @@ -285,11 +275,9 @@ */ void iSeries_pcibios_init(void) { - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Entry.\n"); iomm_table_initialize(); find_and_init_phbs(); io_page_mask = -1; - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Exit.\n"); } /* @@ -301,8 +289,6 @@ struct device_node *node; int DeviceCount = 0; - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_fixup Entry.\n"); - /* Fix up at the device node and pci_dev relationship */ mf_display_src(0xC9000100); @@ -316,9 +302,6 @@ ++DeviceCount; pdev->sysdata = (void *)node; PCI_DN(node)->pcidev = pdev; - PPCDBG(PPCDBG_BUSWALK, - "pdev 0x%p <==> DevNode 0x%p\n", - pdev, node); allocate_device_bars(pdev); iSeries_Device_Information(pdev, DeviceCount); iommu_devnode_init_iSeries(node); @@ -333,13 +316,10 @@ void pcibios_fixup_bus(struct pci_bus *PciBus) { - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_fixup_bus(0x%04X) Entry.\n", - PciBus->number); } void pcibios_fixup_resources(struct pci_dev *pdev) { - PPCDBG(PPCDBG_BUSWALK, "fixup_resources pdev %p\n", pdev); } /* @@ -401,9 +381,6 @@ printk("found device at bus %d idsel %d func %d (AgentId %x)\n", bus, IdSel, Function, AgentId); /* Connect EADs: 0x18.00.12 = 0x00 */ - PPCDBG(PPCDBG_BUSWALK, - "PCI:Connect EADs: 0x%02X.%02X.%02X\n", - bus, SubBus, AgentId); HvRc = HvCallPci_getBusUnitInfo(bus, SubBus, AgentId, iseries_hv_addr(BridgeInfo), sizeof(struct HvCallPci_BridgeInfo)); @@ -414,14 +391,6 @@ BridgeInfo->maxAgents, BridgeInfo->maxSubBusNumber, BridgeInfo->logicalSlotNumber); - PPCDBG(PPCDBG_BUSWALK, - "PCI: BridgeInfo, Type:0x%02X, SubBus:0x%02X, MaxAgents:0x%02X, MaxSubBus: 0x%02X, LSlot: 0x%02X\n", - BridgeInfo->busUnitInfo.deviceType, - BridgeInfo->subBusNumber, - 
BridgeInfo->maxAgents, - BridgeInfo->maxSubBusNumber, - BridgeInfo->logicalSlotNumber); - if (BridgeInfo->busUnitInfo.deviceType == HvCallPci_BridgeDevice) { /* Scan_Bridge_Slot...: 0x18.00.12 */ @@ -454,9 +423,6 @@ /* iSeries_allocate_IRQ.: 0x18.00.12(0xA3) */ Irq = iSeries_allocate_IRQ(Bus, 0, EADsIdSel); - PPCDBG(PPCDBG_BUSWALK, - "PCI:- allocate and assign IRQ 0x%02X.%02X.%02X = 0x%02X\n", - Bus, 0, EADsIdSel, Irq); /* * Connect all functions of any device found. @@ -482,9 +448,6 @@ printk("read vendor ID: %x\n", VendorId); /* FoundDevice: 0x18.28.10 = 0x12AE */ - PPCDBG(PPCDBG_BUSWALK, - "PCI:- FoundDevice: 0x%02X.%02X.%02X = 0x%04X, irq %d\n", - Bus, SubBus, AgentId, VendorId, Irq); HvRc = HvCallPci_configStore8(Bus, SubBus, AgentId, PCI_INTERRUPT_LINE, Irq); if (HvRc != 0) Index: working-2.6/arch/powerpc/platforms/iseries/setup.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/setup.c 2005-11-04 14:18:13.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/setup.c 2005-11-07 09:41:24.000000000 +1100 @@ -71,8 +71,6 @@ #endif /* Function Prototypes */ -extern void ppcdbg_initialize(void); - static void build_iSeries_Memory_Map(void); static void iseries_shared_idle(void); static void iseries_dedicated_idle(void); @@ -309,8 +307,6 @@ ppc64_firmware_features = FW_FEATURE_ISERIES; - ppcdbg_initialize(); - ppc64_interrupt_controller = IC_ISERIES; #if defined(CONFIG_BLK_DEV_INITRD) Index: working-2.6/arch/ppc64/Kconfig.debug =================================================================== --- working-2.6.orig/arch/ppc64/Kconfig.debug 2005-10-25 11:59:53.000000000 +1000 +++ working-2.6/arch/ppc64/Kconfig.debug 2005-11-07 09:41:24.000000000 +1100 @@ -55,10 +55,6 @@ xmon is normally disabled unless booted with 'xmon=on'. Use 'xmon=off' to disable xmon init during runtime. 
-config PPCDBG
-	bool "Include PPCDBG realtime debugging"
-	depends on DEBUG_KERNEL
-
 config IRQSTACKS
 	bool "Use separate kernel stacks when processing interrupts"
 	help
Index: working-2.6/arch/powerpc/kernel/signal_64.c
===================================================================
--- working-2.6.orig/arch/powerpc/kernel/signal_64.c 2005-11-04 14:18:13.000000000 +1100
+++ working-2.6/arch/powerpc/kernel/signal_64.c 2005-11-07 09:41:24.000000000 +1100
@@ -33,7 +33,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you. NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

From paulus at samba.org Mon Nov 7 10:25:39 2005
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 7 Nov 2005 10:25:39 +1100
Subject: [PATCH 16/42]: PCI: PCI Error reporting callbacks
In-Reply-To: <20051105061114.GA27016@kroah.com>
References: <20051103235918.GA25616@mail.gnucash.org>
	<20051104005035.GA26929@mail.gnucash.org>
	<20051105061114.GA27016@kroah.com>
Message-ID: <17262.37107.857718.184055@cargo.ozlabs.ibm.com>

Greg KH writes:

> > +enum pcierr_result {
> > +	PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */
> > +	PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
> > +	PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> > +	PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> > +	PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> > +};
>
> No, do not create new types of error or return codes. Use the standard
> -EFOO values. You can document what they should each return, and mean,
> but do not create new codes.

Actually, these are not error or return codes, but rather requested
actions (maybe somewhat misnamed).
We can map them on to -EFOO values but it will be rather strained
(-ECONNRESET for "please reset the slot", anyone? :).

> Also, you create an enum, but yet do not use it in your function
> callback definition, which means you really didn't want to create it in
> the first place...

Yes, they could be #defines.

Paul.

From paulus at samba.org Mon Nov 7 11:59:42 2005
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 7 Nov 2005 11:59:42 +1100
Subject: [PATCH 4/4] Memory Add Fixes for ppc64
In-Reply-To: <20051104232109.GE25545@w-mikek2.ibm.com>
References: <20051104231552.GA25545@w-mikek2.ibm.com>
	<20051104232109.GE25545@w-mikek2.ibm.com>
Message-ID: <17262.42750.810366.294231@cargo.ozlabs.ibm.com>

Mike Kravetz writes:

> ppc64 needs a special sysfs probe file for adding new memory.
>
> Signed-off-by: Mike Kravetz
>
> diff -Naupr linux-2.6.14-git7/arch/ppc64/Kconfig linux-2.6.14-git7.work/arch/ppc64/Kconfig
> --- linux-2.6.14-git7/arch/ppc64/Kconfig 2005-11-04 21:21:06.000000000 +0000
> +++ linux-2.6.14-git7.work/arch/ppc64/Kconfig 2005-11-04 22:11:16.000000000 +0000
> @@ -277,6 +277,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
>  	def_bool y
>  	depends on NEED_MULTIPLE_NODES
>
> +config ARCH_MEMORY_PROBE
> +	def_bool y
> +	depends on MEMORY_HOTPLUG
> +

Does arch/powerpc/Kconfig need a similar fix then?

Paul.

From paulus at samba.org Mon Nov 7 12:05:23 2005
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 7 Nov 2005 12:05:23 +1100
Subject: powerpc: Consolidate asm compatibility macros
In-Reply-To: <20051106025558.GC17292@localhost.localdomain>
References: <20051104031609.GA962@localhost.localdomain>
	<7487F450-429B-4836-AF05-DD47B02D5BC1@kernel.crashing.org>
	<20051106025558.GC17292@localhost.localdomain>
Message-ID: <17262.43091.764598.779557@cargo.ozlabs.ibm.com>

David Gibson writes:

> Hrm.. I actually deliberately removed the L, on the grounds that
> "long" doesn't necessarily have a consistent meaning between C and
> asm.
> The idea is that all these operations work on the "natural" size
> for the arch, which is to say the size of the GPRs, which is to say
> the size of a C long.

Yes, long is the type that (happens to) correspond to a register
width, so I like Kumar's suggestion.

Paul.

From benh at kernel.crashing.org Mon Nov 7 14:22:18 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 07 Nov 2005 14:22:18 +1100
Subject: [PATCH] ppc64: Fix zImage boot
Message-ID: <1131333739.5229.148.camel@gaston>

The zImage wrapper has a bug where it doesn't claim() the memory for the
kernel properly: it forgets to take into account the offset between the
ELF header and the kernel itself. On some machines, like G5s, this
results in a kernel that crashes at boot when clearing the BSS.

Signed-off-by: Benjamin Herrenschmidt

Index: linux-work/arch/ppc64/boot/main.c
===================================================================
--- linux-work.orig/arch/ppc64/boot/main.c 2005-11-01 14:13:53.000000000 +1100
+++ linux-work/arch/ppc64/boot/main.c 2005-11-07 14:20:54.000000000 +1100
@@ -203,8 +203,15 @@
 		if (elf64ph->p_type == PT_LOAD && elf64ph->p_offset != 0)
 			break;
 	}
-	vmlinux.size = (unsigned long)elf64ph->p_filesz;
-	vmlinux.memsize = (unsigned long)elf64ph->p_memsz;
+	vmlinux.size = (unsigned long)elf64ph->p_filesz +
+		(unsigned long)elf64ph->p_offset;
+	/* We need to claim the memsize plus the file offset since gzip
+	 * will expand the header (file offset), then the kernel, then
+	 * possible rubbish we don't care about.
But the kernel bss must
+	 * be claimed (it will be zero'd by the kernel itself)
+	 */
+	vmlinux.memsize = (unsigned long)elf64ph->p_memsz +
+		(unsigned long)elf64ph->p_offset;
 	printf("Allocating 0x%lx bytes for kernel ...\n\r", vmlinux.memsize);
 	vmlinux.addr = try_claim(vmlinux.memsize);
 	if (vmlinux.addr == 0) {

From benh at kernel.crashing.org Mon Nov 7 14:27:33 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 07 Nov 2005 14:27:33 +1100
Subject: [PATCH] ppc64: SMU based macs cpufreq support
Message-ID: <1131334053.5229.151.camel@gaston>

CPU freq support using 970FX powertune facility for iMac G5 and SMU
based single CPU desktop.

Signed-off-by: Benjamin Herrenschmidt

Index: linux-work/arch/ppc64/kernel/misc.S
===================================================================
--- linux-work.orig/arch/ppc64/kernel/misc.S 2005-11-07 11:58:28.000000000 +1100
+++ linux-work/arch/ppc64/kernel/misc.S 2005-11-07 11:58:31.000000000 +1100
@@ -560,7 +560,7 @@
 	isync
 	blr

-	/*
+/*
  * Do an IO access in real mode
  */
 _GLOBAL(real_writeb)
@@ -593,6 +593,76 @@
 #endif /* defined(CONFIG_PPC_PMAC) || defined(CONFIG_PPC_MAPLE) */

 /*
+ * SCOM access functions for 970 (FX only for now)
+ *
+ * unsigned long scom970_read(unsigned int address);
+ * void scom970_write(unsigned int address, unsigned long value);
+ *
+ * The address passed in is the 24 bits register address. This code
+ * is 970 specific and will not check the status bits, so you should
+ * know what you are doing.
+ */
+_GLOBAL(scom970_read)
+	/* interrupts off */
+	mfmsr	r4
+	ori	r0,r4,MSR_EE
+	xori	r0,r0,MSR_EE
+	mtmsrd	r0,1
+
+	/* rotate 24 bits SCOM address 8 bits left and mask out it's low 8 bits
+	 * (including parity).
On current CPUs they must be 0'd,
+	 * and finally or in RW bit
+	 */
+	rlwinm	r3,r3,8,0,15
+	ori	r3,r3,0x8000
+
+	/* do the actual scom read */
+	sync
+	mtspr	SPRN_SCOMC,r3
+	isync
+	mfspr	r3,SPRN_SCOMD
+	isync
+	mfspr	r0,SPRN_SCOMC
+	isync
+
+	/* XXX: fixup result on some buggy 970's (ouch ! we lost a bit, bah
+	 * that's the best we can do). Not implemented yet as we don't use
+	 * the scom on any of the bogus CPUs yet, but may have to be done
+	 * ultimately
+	 */
+
+	/* restore interrupts */
+	mtmsrd	r4,1
+	blr
+
+
+_GLOBAL(scom970_write)
+	/* interrupts off */
+	mfmsr	r5
+	ori	r0,r5,MSR_EE
+	xori	r0,r0,MSR_EE
+	mtmsrd	r0,1
+
+	/* rotate 24 bits SCOM address 8 bits left and mask out it's low 8 bits
+	 * (including parity). On current CPUs they must be 0'd.
+	 */
+
+	rlwinm	r3,r3,8,0,15
+
+	sync
+	mtspr	SPRN_SCOMD,r4 /* write data */
+	isync
+	mtspr	SPRN_SCOMC,r3 /* write command */
+	isync
+	mfspr	3,SPRN_SCOMC
+	isync
+
+	/* restore interrupts */
+	mtmsrd	r5,1
+	blr
+
+
+/*
  * Create a kernel thread
  * kernel_thread(fn, arg, flags)
  */
Index: linux-work/arch/ppc64/Kconfig
===================================================================
--- linux-work.orig/arch/ppc64/Kconfig 2005-11-07 11:58:28.000000000 +1100
+++ linux-work/arch/ppc64/Kconfig 2005-11-07 11:58:31.000000000 +1100
@@ -169,6 +169,16 @@
 	  support. As of this writing the exact hardware interface is
 	  strongly in flux, so no good recommendation can be made.

+source "drivers/cpufreq/Kconfig"
+
+config CPU_FREQ_PMAC64
+	bool "Support for some Apple G5s"
+	depends on CPU_FREQ && PMAC_SMU && PPC64
+	select CPU_FREQ_TABLE
+	help
+	  This adds support for frequency switching on Apple iMac G5,
+	  and some of the more recent desktop G5 machines as well.
+ config IBMVIO depends on PPC_PSERIES || PPC_ISERIES bool Index: linux-work/drivers/macintosh/smu.c =================================================================== --- linux-work.orig/drivers/macintosh/smu.c 2005-11-07 11:58:28.000000000 +1100 +++ linux-work/drivers/macintosh/smu.c 2005-11-07 11:58:31.000000000 +1100 @@ -845,6 +845,18 @@ return 0; } +struct smu_sdbp_header *smu_get_sdb_partition(int id, unsigned int *size) +{ + char pname[32]; + + if (!smu) + return NULL; + + sprintf(pname, "sdb-partition-%02x", id); + return (struct smu_sdbp_header *)get_property(smu->of_node, + pname, size); +} +EXPORT_SYMBOL(smu_get_sdb_partition); /* Index: linux-work/arch/powerpc/platforms/powermac/cpufreq.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/powermac/cpufreq.c 2005-11-07 11:58:28.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,726 +0,0 @@ -/* - * arch/ppc/platforms/pmac_cpufreq.c - * - * Copyright (C) 2002 - 2005 Benjamin Herrenschmidt - * Copyright (C) 2004 John Steele Scott - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * TODO: Need a big cleanup here. Basically, we need to have different - * cpufreq_driver structures for the different type of HW instead of the - * current mess. We also need to better deal with the detection of the - * type of machine. - * - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* WARNING !!! This will cause calibrate_delay() to be called, - * but this is an __init function ! 
So you MUST go edit - * init/main.c to make it non-init before enabling DEBUG_FREQ - */ -#undef DEBUG_FREQ - -/* - * There is a problem with the core cpufreq code on SMP kernels, - * it won't recalculate the Bogomips properly - */ -#ifdef CONFIG_SMP -#warning "WARNING, CPUFREQ not recommended on SMP kernels" -#endif - -extern void low_choose_7447a_dfs(int dfs); -extern void low_choose_750fx_pll(int pll); -extern void low_sleep_handler(void); - -/* - * Currently, PowerMac cpufreq supports only high & low frequencies - * that are set by the firmware - */ -static unsigned int low_freq; -static unsigned int hi_freq; -static unsigned int cur_freq; -static unsigned int sleep_freq; - -/* - * Different models uses different mecanisms to switch the frequency - */ -static int (*set_speed_proc)(int low_speed); -static unsigned int (*get_speed_proc)(void); - -/* - * Some definitions used by the various speedprocs - */ -static u32 voltage_gpio; -static u32 frequency_gpio; -static u32 slew_done_gpio; -static int no_schedule; -static int has_cpu_l2lve; -static int is_pmu_based; - -/* There are only two frequency states for each processor. Values - * are in kHz for the time being. - */ -#define CPUFREQ_HIGH 0 -#define CPUFREQ_LOW 1 - -static struct cpufreq_frequency_table pmac_cpu_freqs[] = { - {CPUFREQ_HIGH, 0}, - {CPUFREQ_LOW, 0}, - {0, CPUFREQ_TABLE_END}, -}; - -static struct freq_attr* pmac_cpu_freqs_attr[] = { - &cpufreq_freq_attr_scaling_available_freqs, - NULL, -}; - -static inline void local_delay(unsigned long ms) -{ - if (no_schedule) - mdelay(ms); - else - msleep(ms); -} - -#ifdef DEBUG_FREQ -static inline void debug_calc_bogomips(void) -{ - /* This will cause a recalc of bogomips and display the - * result. We backup/restore the value to avoid affecting the - * core cpufreq framework's own calculation. 
- */ - extern void calibrate_delay(void); - - unsigned long save_lpj = loops_per_jiffy; - calibrate_delay(); - loops_per_jiffy = save_lpj; -} -#endif /* DEBUG_FREQ */ - -/* Switch CPU speed under 750FX CPU control - */ -static int cpu_750fx_cpu_speed(int low_speed) -{ - u32 hid2; - - if (low_speed == 0) { - /* ramping up, set voltage first */ - pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x05); - /* Make sure we sleep for at least 1ms */ - local_delay(10); - - /* tweak L2 for high voltage */ - if (has_cpu_l2lve) { - hid2 = mfspr(SPRN_HID2); - hid2 &= ~0x2000; - mtspr(SPRN_HID2, hid2); - } - } -#ifdef CONFIG_6xx - low_choose_750fx_pll(low_speed); -#endif - if (low_speed == 1) { - /* tweak L2 for low voltage */ - if (has_cpu_l2lve) { - hid2 = mfspr(SPRN_HID2); - hid2 |= 0x2000; - mtspr(SPRN_HID2, hid2); - } - - /* ramping down, set voltage last */ - pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x04); - local_delay(10); - } - - return 0; -} - -static unsigned int cpu_750fx_get_cpu_speed(void) -{ - if (mfspr(SPRN_HID1) & HID1_PS) - return low_freq; - else - return hi_freq; -} - -/* Switch CPU speed using DFS */ -static int dfs_set_cpu_speed(int low_speed) -{ - if (low_speed == 0) { - /* ramping up, set voltage first */ - pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x05); - /* Make sure we sleep for at least 1ms */ - local_delay(1); - } - - /* set frequency */ -#ifdef CONFIG_6xx - low_choose_7447a_dfs(low_speed); -#endif - udelay(100); - - if (low_speed == 1) { - /* ramping down, set voltage last */ - pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x04); - local_delay(1); - } - - return 0; -} - -static unsigned int dfs_get_cpu_speed(void) -{ - if (mfspr(SPRN_HID1) & HID1_DFS) - return low_freq; - else - return hi_freq; -} - - -/* Switch CPU speed using slewing GPIOs - */ -static int gpios_set_cpu_speed(int low_speed) -{ - int gpio, timeout = 0; - - /* If ramping up, set voltage first */ - if (low_speed == 0) { - 
pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x05); - /* Delay is way too big but it's ok, we schedule */ - local_delay(10); - } - - /* Set frequency */ - gpio = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL, frequency_gpio, 0); - if (low_speed == ((gpio & 0x01) == 0)) - goto skip; - - pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, frequency_gpio, - low_speed ? 0x04 : 0x05); - udelay(200); - do { - if (++timeout > 100) - break; - local_delay(1); - gpio = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL, slew_done_gpio, 0); - } while((gpio & 0x02) == 0); - skip: - /* If ramping down, set voltage last */ - if (low_speed == 1) { - pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x04); - /* Delay is way too big but it's ok, we schedule */ - local_delay(10); - } - -#ifdef DEBUG_FREQ - debug_calc_bogomips(); -#endif - - return 0; -} - -/* Switch CPU speed under PMU control - */ -static int pmu_set_cpu_speed(int low_speed) -{ - struct adb_request req; - unsigned long save_l2cr; - unsigned long save_l3cr; - unsigned int pic_prio; - unsigned long flags; - - preempt_disable(); - -#ifdef DEBUG_FREQ - printk(KERN_DEBUG "HID1, before: %x\n", mfspr(SPRN_HID1)); -#endif - pmu_suspend(); - - /* Disable all interrupt sources on openpic */ - pic_prio = mpic_cpu_get_priority(); - mpic_cpu_set_priority(0xf); - - /* Make sure the decrementer won't interrupt us */ - asm volatile("mtdec %0" : : "r" (0x7fffffff)); - /* Make sure any pending DEC interrupt occuring while we did - * the above didn't re-enable the DEC */ - mb(); - asm volatile("mtdec %0" : : "r" (0x7fffffff)); - - /* We can now disable MSR_EE */ - local_irq_save(flags); - - /* Giveup the FPU & vec */ - enable_kernel_fp(); - -#ifdef CONFIG_ALTIVEC - if (cpu_has_feature(CPU_FTR_ALTIVEC)) - enable_kernel_altivec(); -#endif /* CONFIG_ALTIVEC */ - - /* Save & disable L2 and L3 caches */ - save_l3cr = _get_L3CR(); /* (returns -1 if not available) */ - save_l2cr = _get_L2CR(); /* (returns -1 if not available) */ - - /* 
Send the new speed command. My assumption is that this command - * will cause PLL_CFG[0..3] to be changed next time CPU goes to sleep - */ - pmu_request(&req, NULL, 6, PMU_CPU_SPEED, 'W', 'O', 'O', 'F', low_speed); - while (!req.complete) - pmu_poll(); - - /* Prepare the northbridge for the speed transition */ - pmac_call_feature(PMAC_FTR_SLEEP_STATE,NULL,1,1); - - /* Call low level code to backup CPU state and recover from - * hardware reset - */ - low_sleep_handler(); - - /* Restore the northbridge */ - pmac_call_feature(PMAC_FTR_SLEEP_STATE,NULL,1,0); - - /* Restore L2 cache */ - if (save_l2cr != 0xffffffff && (save_l2cr & L2CR_L2E) != 0) - _set_L2CR(save_l2cr); - /* Restore L3 cache */ - if (save_l3cr != 0xffffffff && (save_l3cr & L3CR_L3E) != 0) - _set_L3CR(save_l3cr); - - /* Restore userland MMU context */ - set_context(current->active_mm->context, current->active_mm->pgd); - -#ifdef DEBUG_FREQ - printk(KERN_DEBUG "HID1, after: %x\n", mfspr(SPRN_HID1)); -#endif - - /* Restore low level PMU operations */ - pmu_unlock(); - - /* Restore decrementer */ - wakeup_decrementer(); - - /* Restore interrupts */ - mpic_cpu_set_priority(pic_prio); - - /* Let interrupts flow again ... */ - local_irq_restore(flags); - -#ifdef DEBUG_FREQ - debug_calc_bogomips(); -#endif - - pmu_resume(); - - preempt_enable(); - - return 0; -} - -static int do_set_cpu_speed(int speed_mode, int notify) -{ - struct cpufreq_freqs freqs; - unsigned long l3cr; - static unsigned long prev_l3cr; - - freqs.old = cur_freq; - freqs.new = (speed_mode == CPUFREQ_HIGH) ? 
hi_freq : low_freq; - freqs.cpu = smp_processor_id(); - - if (freqs.old == freqs.new) - return 0; - - if (notify) - cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE); - if (speed_mode == CPUFREQ_LOW && - cpu_has_feature(CPU_FTR_L3CR)) { - l3cr = _get_L3CR(); - if (l3cr & L3CR_L3E) { - prev_l3cr = l3cr; - _set_L3CR(0); - } - } - set_speed_proc(speed_mode == CPUFREQ_LOW); - if (speed_mode == CPUFREQ_HIGH && - cpu_has_feature(CPU_FTR_L3CR)) { - l3cr = _get_L3CR(); - if ((prev_l3cr & L3CR_L3E) && l3cr != prev_l3cr) - _set_L3CR(prev_l3cr); - } - if (notify) - cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE); - cur_freq = (speed_mode == CPUFREQ_HIGH) ? hi_freq : low_freq; - - return 0; -} - -static unsigned int pmac_cpufreq_get_speed(unsigned int cpu) -{ - return cur_freq; -} - -static int pmac_cpufreq_verify(struct cpufreq_policy *policy) -{ - return cpufreq_frequency_table_verify(policy, pmac_cpu_freqs); -} - -static int pmac_cpufreq_target( struct cpufreq_policy *policy, - unsigned int target_freq, - unsigned int relation) -{ - unsigned int newstate = 0; - - if (cpufreq_frequency_table_target(policy, pmac_cpu_freqs, - target_freq, relation, &newstate)) - return -EINVAL; - - return do_set_cpu_speed(newstate, 1); -} - -unsigned int pmac_get_one_cpufreq(int i) -{ - /* Supports only one CPU for now */ - return (i == 0) ? cur_freq : 0; -} - -static int pmac_cpufreq_cpu_init(struct cpufreq_policy *policy) -{ - if (policy->cpu != 0) - return -ENODEV; - - policy->governor = CPUFREQ_DEFAULT_GOVERNOR; - policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL; - policy->cur = cur_freq; - - cpufreq_frequency_table_get_attr(pmac_cpu_freqs, policy->cpu); - return cpufreq_frequency_table_cpuinfo(policy, pmac_cpu_freqs); -} - -static u32 read_gpio(struct device_node *np) -{ - u32 *reg = (u32 *)get_property(np, "reg", NULL); - u32 offset; - - if (reg == NULL) - return 0; - /* That works for all keylargos but shall be fixed properly - * some day... 
The problem is that it seems we can't rely - * on the "reg" property of the GPIO nodes, they are either - * relative to the base of KeyLargo or to the base of the - * GPIO space, and the device-tree doesn't help. - */ - offset = *reg; - if (offset < KEYLARGO_GPIO_LEVELS0) - offset += KEYLARGO_GPIO_LEVELS0; - return offset; -} - -static int pmac_cpufreq_suspend(struct cpufreq_policy *policy, pm_message_t pmsg) -{ - /* Ok, this could be made a bit smarter, but let's be robust for now. We - * always force a speed change to high speed before sleep, to make sure - * we have appropriate voltage and/or bus speed for the wakeup process, - * and to make sure our loops_per_jiffies are "good enough", that is will - * not cause too short delays if we sleep in low speed and wake in high - * speed.. - */ - no_schedule = 1; - sleep_freq = cur_freq; - if (cur_freq == low_freq && !is_pmu_based) - do_set_cpu_speed(CPUFREQ_HIGH, 0); - return 0; -} - -static int pmac_cpufreq_resume(struct cpufreq_policy *policy) -{ - /* If we resume, first check if we have a get() function */ - if (get_speed_proc) - cur_freq = get_speed_proc(); - else - cur_freq = 0; - - /* We don't, hrm... we don't really know our speed here, best - * is that we force a switch to whatever it was, which is - * probably high speed due to our suspend() routine - */ - do_set_cpu_speed(sleep_freq == low_freq ? 
- CPUFREQ_LOW : CPUFREQ_HIGH, 0); - - no_schedule = 0; - return 0; -} - -static struct cpufreq_driver pmac_cpufreq_driver = { - .verify = pmac_cpufreq_verify, - .target = pmac_cpufreq_target, - .get = pmac_cpufreq_get_speed, - .init = pmac_cpufreq_cpu_init, - .suspend = pmac_cpufreq_suspend, - .resume = pmac_cpufreq_resume, - .flags = CPUFREQ_PM_NO_WARN, - .attr = pmac_cpu_freqs_attr, - .name = "powermac", - .owner = THIS_MODULE, -}; - - -static int pmac_cpufreq_init_MacRISC3(struct device_node *cpunode) -{ - struct device_node *volt_gpio_np = of_find_node_by_name(NULL, - "voltage-gpio"); - struct device_node *freq_gpio_np = of_find_node_by_name(NULL, - "frequency-gpio"); - struct device_node *slew_done_gpio_np = of_find_node_by_name(NULL, - "slewing-done"); - u32 *value; - - /* - * Check to see if it's GPIO driven or PMU only - * - * The way we extract the GPIO address is slightly hackish, but it - * works well enough for now. We need to abstract the whole GPIO - * stuff sooner or later anyway - */ - - if (volt_gpio_np) - voltage_gpio = read_gpio(volt_gpio_np); - if (freq_gpio_np) - frequency_gpio = read_gpio(freq_gpio_np); - if (slew_done_gpio_np) - slew_done_gpio = read_gpio(slew_done_gpio_np); - - /* If we use the frequency GPIOs, calculate the min/max speeds based - * on the bus frequencies - */ - if (frequency_gpio && slew_done_gpio) { - int lenp, rc; - u32 *freqs, *ratio; - - freqs = (u32 *)get_property(cpunode, "bus-frequencies", &lenp); - lenp /= sizeof(u32); - if (freqs == NULL || lenp != 2) { - printk(KERN_ERR "cpufreq: bus-frequencies incorrect or missing\n"); - return 1; - } - ratio = (u32 *)get_property(cpunode, "processor-to-bus-ratio*2", NULL); - if (ratio == NULL) { - printk(KERN_ERR "cpufreq: processor-to-bus-ratio*2 missing\n"); - return 1; - } - - /* Get the min/max bus frequencies */ - low_freq = min(freqs[0], freqs[1]); - hi_freq = max(freqs[0], freqs[1]); - - /* Grrrr.. 
It _seems_ that the device-tree is lying on the low bus - * frequency, it claims it to be around 84Mhz on some models while - * it appears to be approx. 101Mhz on all. Let's hack around here... - * fortunately, we don't need to be too precise - */ - if (low_freq < 98000000) - low_freq = 101000000; - - /* Convert those to CPU core clocks */ - low_freq = (low_freq * (*ratio)) / 2000; - hi_freq = (hi_freq * (*ratio)) / 2000; - - /* Now we get the frequencies, we read the GPIO to see what is out current - * speed - */ - rc = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL, frequency_gpio, 0); - cur_freq = (rc & 0x01) ? hi_freq : low_freq; - - set_speed_proc = gpios_set_cpu_speed; - return 1; - } - - /* If we use the PMU, look for the min & max frequencies in the - * device-tree - */ - value = (u32 *)get_property(cpunode, "min-clock-frequency", NULL); - if (!value) - return 1; - low_freq = (*value) / 1000; - /* The PowerBook G4 12" (PowerBook6,1) has an error in the device-tree - * here */ - if (low_freq < 100000) - low_freq *= 10; - - value = (u32 *)get_property(cpunode, "max-clock-frequency", NULL); - if (!value) - return 1; - hi_freq = (*value) / 1000; - set_speed_proc = pmu_set_cpu_speed; - is_pmu_based = 1; - - return 0; -} - -static int pmac_cpufreq_init_7447A(struct device_node *cpunode) -{ - struct device_node *volt_gpio_np; - - if (get_property(cpunode, "dynamic-power-step", NULL) == NULL) - return 1; - - volt_gpio_np = of_find_node_by_name(NULL, "cpu-vcore-select"); - if (volt_gpio_np) - voltage_gpio = read_gpio(volt_gpio_np); - if (!voltage_gpio){ - printk(KERN_ERR "cpufreq: missing cpu-vcore-select gpio\n"); - return 1; - } - - /* OF only reports the high frequency */ - hi_freq = cur_freq; - low_freq = cur_freq/2; - - /* Read actual frequency from CPU */ - cur_freq = dfs_get_cpu_speed(); - set_speed_proc = dfs_set_cpu_speed; - get_speed_proc = dfs_get_cpu_speed; - - return 0; -} - -static int pmac_cpufreq_init_750FX(struct device_node *cpunode) -{ - struct 
device_node *volt_gpio_np; - u32 pvr, *value; - - if (get_property(cpunode, "dynamic-power-step", NULL) == NULL) - return 1; - - hi_freq = cur_freq; - value = (u32 *)get_property(cpunode, "reduced-clock-frequency", NULL); - if (!value) - return 1; - low_freq = (*value) / 1000; - - volt_gpio_np = of_find_node_by_name(NULL, "cpu-vcore-select"); - if (volt_gpio_np) - voltage_gpio = read_gpio(volt_gpio_np); - - pvr = mfspr(SPRN_PVR); - has_cpu_l2lve = !((pvr & 0xf00) == 0x100); - - set_speed_proc = cpu_750fx_cpu_speed; - get_speed_proc = cpu_750fx_get_cpu_speed; - cur_freq = cpu_750fx_get_cpu_speed(); - - return 0; -} - -/* Currently, we support the following machines: - * - * - Titanium PowerBook 1Ghz (PMU based, 667Mhz & 1Ghz) - * - Titanium PowerBook 800 (PMU based, 667Mhz & 800Mhz) - * - Titanium PowerBook 400 (PMU based, 300Mhz & 400Mhz) - * - Titanium PowerBook 500 (PMU based, 300Mhz & 500Mhz) - * - iBook2 500/600 (PMU based, 400Mhz & 500/600Mhz) - * - iBook2 700 (CPU based, 400Mhz & 700Mhz, support low voltage) - * - Recent MacRISC3 laptops - * - All new machines with 7447A CPUs - */ -static int __init pmac_cpufreq_setup(void) -{ - struct device_node *cpunode; - u32 *value; - - if (strstr(cmd_line, "nocpufreq")) - return 0; - - /* Assume only one CPU */ - cpunode = find_type_devices("cpu"); - if (!cpunode) - goto out; - - /* Get current cpu clock freq */ - value = (u32 *)get_property(cpunode, "clock-frequency", NULL); - if (!value) - goto out; - cur_freq = (*value) / 1000; - - /* Check for 7447A based MacRISC3 */ - if (machine_is_compatible("MacRISC3") && - get_property(cpunode, "dynamic-power-step", NULL) && - PVR_VER(mfspr(SPRN_PVR)) == 0x8003) { - pmac_cpufreq_init_7447A(cpunode); - /* Check for other MacRISC3 machines */ - } else if (machine_is_compatible("PowerBook3,4") || - machine_is_compatible("PowerBook3,5") || - machine_is_compatible("MacRISC3")) { - pmac_cpufreq_init_MacRISC3(cpunode); - /* Else check for iBook2 500/600 */ - } else if 
(machine_is_compatible("PowerBook4,1")) { - hi_freq = cur_freq; - low_freq = 400000; - set_speed_proc = pmu_set_cpu_speed; - is_pmu_based = 1; - } - /* Else check for TiPb 550 */ - else if (machine_is_compatible("PowerBook3,3") && cur_freq == 550000) { - hi_freq = cur_freq; - low_freq = 500000; - set_speed_proc = pmu_set_cpu_speed; - is_pmu_based = 1; - } - /* Else check for TiPb 400 & 500 */ - else if (machine_is_compatible("PowerBook3,2")) { - /* We only know about the 400 MHz and the 500Mhz model - * they both have 300 MHz as low frequency - */ - if (cur_freq < 350000 || cur_freq > 550000) - goto out; - hi_freq = cur_freq; - low_freq = 300000; - set_speed_proc = pmu_set_cpu_speed; - is_pmu_based = 1; - } - /* Else check for 750FX */ - else if (PVR_VER(mfspr(SPRN_PVR)) == 0x7000) - pmac_cpufreq_init_750FX(cpunode); -out: - if (set_speed_proc == NULL) - return -ENODEV; - - pmac_cpu_freqs[CPUFREQ_LOW].frequency = low_freq; - pmac_cpu_freqs[CPUFREQ_HIGH].frequency = hi_freq; - - printk(KERN_INFO "Registering PowerMac CPU frequency driver\n"); - printk(KERN_INFO "Low: %d Mhz, High: %d Mhz, Boot: %d Mhz\n", - low_freq/1000, hi_freq/1000, cur_freq/1000); - - return cpufreq_register_driver(&pmac_cpufreq_driver); -} - -module_init(pmac_cpufreq_setup); - Index: linux-work/arch/powerpc/platforms/powermac/cpufreq_32.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/powerpc/platforms/powermac/cpufreq_32.c 2005-11-07 11:58:31.000000000 +1100 @@ -0,0 +1,727 @@ +/* + * arch/ppc/platforms/pmac_cpufreq.c + * + * Copyright (C) 2002 - 2005 Benjamin Herrenschmidt + * Copyright (C) 2004 John Steele Scott + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * TODO: Need a big cleanup here. 
Basically, we need to have different + * cpufreq_driver structures for the different type of HW instead of the + * current mess. We also need to better deal with the detection of the + * type of machine. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* WARNING !!! This will cause calibrate_delay() to be called, + * but this is an __init function ! So you MUST go edit + * init/main.c to make it non-init before enabling DEBUG_FREQ + */ +#undef DEBUG_FREQ + +/* + * There is a problem with the core cpufreq code on SMP kernels, + * it won't recalculate the Bogomips properly + */ +#ifdef CONFIG_SMP +#warning "WARNING, CPUFREQ not recommended on SMP kernels" +#endif + +extern void low_choose_7447a_dfs(int dfs); +extern void low_choose_750fx_pll(int pll); +extern void low_sleep_handler(void); + +/* + * Currently, PowerMac cpufreq supports only high & low frequencies + * that are set by the firmware + */ +static unsigned int low_freq; +static unsigned int hi_freq; +static unsigned int cur_freq; +static unsigned int sleep_freq; + +/* + * Different models uses different mecanisms to switch the frequency + */ +static int (*set_speed_proc)(int low_speed); +static unsigned int (*get_speed_proc)(void); + +/* + * Some definitions used by the various speedprocs + */ +static u32 voltage_gpio; +static u32 frequency_gpio; +static u32 slew_done_gpio; +static int no_schedule; +static int has_cpu_l2lve; +static int is_pmu_based; + +/* There are only two frequency states for each processor. Values + * are in kHz for the time being. 
+ */ +#define CPUFREQ_HIGH 0 +#define CPUFREQ_LOW 1 + +static struct cpufreq_frequency_table pmac_cpu_freqs[] = { + {CPUFREQ_HIGH, 0}, + {CPUFREQ_LOW, 0}, + {0, CPUFREQ_TABLE_END}, +}; + +static struct freq_attr* pmac_cpu_freqs_attr[] = { + &cpufreq_freq_attr_scaling_available_freqs, + NULL, +}; + +static inline void local_delay(unsigned long ms) +{ + if (no_schedule) + mdelay(ms); + else + msleep(ms); +} + +#ifdef DEBUG_FREQ +static inline void debug_calc_bogomips(void) +{ + /* This will cause a recalc of bogomips and display the + * result. We backup/restore the value to avoid affecting the + * core cpufreq framework's own calculation. + */ + extern void calibrate_delay(void); + + unsigned long save_lpj = loops_per_jiffy; + calibrate_delay(); + loops_per_jiffy = save_lpj; +} +#endif /* DEBUG_FREQ */ + +/* Switch CPU speed under 750FX CPU control + */ +static int cpu_750fx_cpu_speed(int low_speed) +{ + u32 hid2; + + if (low_speed == 0) { + /* ramping up, set voltage first */ + pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x05); + /* Make sure we sleep for at least 1ms */ + local_delay(10); + + /* tweak L2 for high voltage */ + if (has_cpu_l2lve) { + hid2 = mfspr(SPRN_HID2); + hid2 &= ~0x2000; + mtspr(SPRN_HID2, hid2); + } + } +#ifdef CONFIG_6xx + low_choose_750fx_pll(low_speed); +#endif + if (low_speed == 1) { + /* tweak L2 for low voltage */ + if (has_cpu_l2lve) { + hid2 = mfspr(SPRN_HID2); + hid2 |= 0x2000; + mtspr(SPRN_HID2, hid2); + } + + /* ramping down, set voltage last */ + pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x04); + local_delay(10); + } + + return 0; +} + +static unsigned int cpu_750fx_get_cpu_speed(void) +{ + if (mfspr(SPRN_HID1) & HID1_PS) + return low_freq; + else + return hi_freq; +} + +/* Switch CPU speed using DFS */ +static int dfs_set_cpu_speed(int low_speed) +{ + if (low_speed == 0) { + /* ramping up, set voltage first */ + pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x05); + /* Make sure we 
sleep for at least 1ms */ + local_delay(1); + } + + /* set frequency */ +#ifdef CONFIG_6xx + low_choose_7447a_dfs(low_speed); +#endif + udelay(100); + + if (low_speed == 1) { + /* ramping down, set voltage last */ + pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x04); + local_delay(1); + } + + return 0; +} + +static unsigned int dfs_get_cpu_speed(void) +{ + if (mfspr(SPRN_HID1) & HID1_DFS) + return low_freq; + else + return hi_freq; +} + + +/* Switch CPU speed using slewing GPIOs + */ +static int gpios_set_cpu_speed(int low_speed) +{ + int gpio, timeout = 0; + + /* If ramping up, set voltage first */ + if (low_speed == 0) { + pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x05); + /* Delay is way too big but it's ok, we schedule */ + local_delay(10); + } + + /* Set frequency */ + gpio = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL, frequency_gpio, 0); + if (low_speed == ((gpio & 0x01) == 0)) + goto skip; + + pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, frequency_gpio, + low_speed ? 
0x04 : 0x05); + udelay(200); + do { + if (++timeout > 100) + break; + local_delay(1); + gpio = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL, slew_done_gpio, 0); + } while((gpio & 0x02) == 0); + skip: + /* If ramping down, set voltage last */ + if (low_speed == 1) { + pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, voltage_gpio, 0x04); + /* Delay is way too big but it's ok, we schedule */ + local_delay(10); + } + +#ifdef DEBUG_FREQ + debug_calc_bogomips(); +#endif + + return 0; +} + +/* Switch CPU speed under PMU control + */ +static int pmu_set_cpu_speed(int low_speed) +{ + struct adb_request req; + unsigned long save_l2cr; + unsigned long save_l3cr; + unsigned int pic_prio; + unsigned long flags; + + preempt_disable(); + +#ifdef DEBUG_FREQ + printk(KERN_DEBUG "HID1, before: %x\n", mfspr(SPRN_HID1)); +#endif + pmu_suspend(); + + /* Disable all interrupt sources on openpic */ + pic_prio = mpic_cpu_get_priority(); + mpic_cpu_set_priority(0xf); + + /* Make sure the decrementer won't interrupt us */ + asm volatile("mtdec %0" : : "r" (0x7fffffff)); + /* Make sure any pending DEC interrupt occuring while we did + * the above didn't re-enable the DEC */ + mb(); + asm volatile("mtdec %0" : : "r" (0x7fffffff)); + + /* We can now disable MSR_EE */ + local_irq_save(flags); + + /* Giveup the FPU & vec */ + enable_kernel_fp(); + +#ifdef CONFIG_ALTIVEC + if (cpu_has_feature(CPU_FTR_ALTIVEC)) + enable_kernel_altivec(); +#endif /* CONFIG_ALTIVEC */ + + /* Save & disable L2 and L3 caches */ + save_l3cr = _get_L3CR(); /* (returns -1 if not available) */ + save_l2cr = _get_L2CR(); /* (returns -1 if not available) */ + + /* Send the new speed command. 
My assumption is that this command + * will cause PLL_CFG[0..3] to be changed next time CPU goes to sleep + */ + pmu_request(&req, NULL, 6, PMU_CPU_SPEED, 'W', 'O', 'O', 'F', low_speed); + while (!req.complete) + pmu_poll(); + + /* Prepare the northbridge for the speed transition */ + pmac_call_feature(PMAC_FTR_SLEEP_STATE,NULL,1,1); + + /* Call low level code to backup CPU state and recover from + * hardware reset + */ + low_sleep_handler(); + + /* Restore the northbridge */ + pmac_call_feature(PMAC_FTR_SLEEP_STATE,NULL,1,0); + + /* Restore L2 cache */ + if (save_l2cr != 0xffffffff && (save_l2cr & L2CR_L2E) != 0) + _set_L2CR(save_l2cr); + /* Restore L3 cache */ + if (save_l3cr != 0xffffffff && (save_l3cr & L3CR_L3E) != 0) + _set_L3CR(save_l3cr); + + /* Restore userland MMU context */ + set_context(current->active_mm->context, current->active_mm->pgd); + +#ifdef DEBUG_FREQ + printk(KERN_DEBUG "HID1, after: %x\n", mfspr(SPRN_HID1)); +#endif + + /* Restore low level PMU operations */ + pmu_unlock(); + + /* Restore decrementer */ + wakeup_decrementer(); + + /* Restore interrupts */ + mpic_cpu_set_priority(pic_prio); + + /* Let interrupts flow again ... */ + local_irq_restore(flags); + +#ifdef DEBUG_FREQ + debug_calc_bogomips(); +#endif + + pmu_resume(); + + preempt_enable(); + + return 0; +} + +static int do_set_cpu_speed(int speed_mode, int notify) +{ + struct cpufreq_freqs freqs; + unsigned long l3cr; + static unsigned long prev_l3cr; + + freqs.old = cur_freq; + freqs.new = (speed_mode == CPUFREQ_HIGH) ? 
hi_freq : low_freq; + freqs.cpu = smp_processor_id(); + + if (freqs.old == freqs.new) + return 0; + + if (notify) + cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE); + if (speed_mode == CPUFREQ_LOW && + cpu_has_feature(CPU_FTR_L3CR)) { + l3cr = _get_L3CR(); + if (l3cr & L3CR_L3E) { + prev_l3cr = l3cr; + _set_L3CR(0); + } + } + set_speed_proc(speed_mode == CPUFREQ_LOW); + if (speed_mode == CPUFREQ_HIGH && + cpu_has_feature(CPU_FTR_L3CR)) { + l3cr = _get_L3CR(); + if ((prev_l3cr & L3CR_L3E) && l3cr != prev_l3cr) + _set_L3CR(prev_l3cr); + } + if (notify) + cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE); + cur_freq = (speed_mode == CPUFREQ_HIGH) ? hi_freq : low_freq; + + return 0; +} + +static unsigned int pmac_cpufreq_get_speed(unsigned int cpu) +{ + return cur_freq; +} + +static int pmac_cpufreq_verify(struct cpufreq_policy *policy) +{ + return cpufreq_frequency_table_verify(policy, pmac_cpu_freqs); +} + +static int pmac_cpufreq_target( struct cpufreq_policy *policy, + unsigned int target_freq, + unsigned int relation) +{ + unsigned int newstate = 0; + int rc; + + if (cpufreq_frequency_table_target(policy, pmac_cpu_freqs, + target_freq, relation, &newstate)) + return -EINVAL; + + rc = do_set_cpu_speed(newstate, 1); + + ppc_proc_freq = cur_freq * 1000ul; + return rc; +} + +static int pmac_cpufreq_cpu_init(struct cpufreq_policy *policy) +{ + if (policy->cpu != 0) + return -ENODEV; + + policy->governor = CPUFREQ_DEFAULT_GOVERNOR; + policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL; + policy->cur = cur_freq; + + cpufreq_frequency_table_get_attr(pmac_cpu_freqs, policy->cpu); + return cpufreq_frequency_table_cpuinfo(policy, pmac_cpu_freqs); +} + +static u32 read_gpio(struct device_node *np) +{ + u32 *reg = (u32 *)get_property(np, "reg", NULL); + u32 offset; + + if (reg == NULL) + return 0; + /* That works for all keylargos but shall be fixed properly + * some day... 
The problem is that it seems we can't rely + * on the "reg" property of the GPIO nodes, they are either + * relative to the base of KeyLargo or to the base of the + * GPIO space, and the device-tree doesn't help. + */ + offset = *reg; + if (offset < KEYLARGO_GPIO_LEVELS0) + offset += KEYLARGO_GPIO_LEVELS0; + return offset; +} + +static int pmac_cpufreq_suspend(struct cpufreq_policy *policy, pm_message_t pmsg) +{ + /* Ok, this could be made a bit smarter, but let's be robust for now. We + * always force a speed change to high speed before sleep, to make sure + * we have appropriate voltage and/or bus speed for the wakeup process, + * and to make sure our loops_per_jiffies are "good enough", that is will + * not cause too short delays if we sleep in low speed and wake in high + * speed.. + */ + no_schedule = 1; + sleep_freq = cur_freq; + if (cur_freq == low_freq && !is_pmu_based) + do_set_cpu_speed(CPUFREQ_HIGH, 0); + return 0; +} + +static int pmac_cpufreq_resume(struct cpufreq_policy *policy) +{ + /* If we resume, first check if we have a get() function */ + if (get_speed_proc) + cur_freq = get_speed_proc(); + else + cur_freq = 0; + + /* We don't, hrm... we don't really know our speed here, best + * is that we force a switch to whatever it was, which is + * probably high speed due to our suspend() routine + */ + do_set_cpu_speed(sleep_freq == low_freq ? 
+ CPUFREQ_LOW : CPUFREQ_HIGH, 0); + + ppc_proc_freq = cur_freq * 1000ul; + + no_schedule = 0; + return 0; +} + +static struct cpufreq_driver pmac_cpufreq_driver = { + .verify = pmac_cpufreq_verify, + .target = pmac_cpufreq_target, + .get = pmac_cpufreq_get_speed, + .init = pmac_cpufreq_cpu_init, + .suspend = pmac_cpufreq_suspend, + .resume = pmac_cpufreq_resume, + .flags = CPUFREQ_PM_NO_WARN, + .attr = pmac_cpu_freqs_attr, + .name = "powermac", + .owner = THIS_MODULE, +}; + + +static int pmac_cpufreq_init_MacRISC3(struct device_node *cpunode) +{ + struct device_node *volt_gpio_np = of_find_node_by_name(NULL, + "voltage-gpio"); + struct device_node *freq_gpio_np = of_find_node_by_name(NULL, + "frequency-gpio"); + struct device_node *slew_done_gpio_np = of_find_node_by_name(NULL, + "slewing-done"); + u32 *value; + + /* + * Check to see if it's GPIO driven or PMU only + * + * The way we extract the GPIO address is slightly hackish, but it + * works well enough for now. We need to abstract the whole GPIO + * stuff sooner or later anyway + */ + + if (volt_gpio_np) + voltage_gpio = read_gpio(volt_gpio_np); + if (freq_gpio_np) + frequency_gpio = read_gpio(freq_gpio_np); + if (slew_done_gpio_np) + slew_done_gpio = read_gpio(slew_done_gpio_np); + + /* If we use the frequency GPIOs, calculate the min/max speeds based + * on the bus frequencies + */ + if (frequency_gpio && slew_done_gpio) { + int lenp, rc; + u32 *freqs, *ratio; + + freqs = (u32 *)get_property(cpunode, "bus-frequencies", &lenp); + lenp /= sizeof(u32); + if (freqs == NULL || lenp != 2) { + printk(KERN_ERR "cpufreq: bus-frequencies incorrect or missing\n"); + return 1; + } + ratio = (u32 *)get_property(cpunode, "processor-to-bus-ratio*2", NULL); + if (ratio == NULL) { + printk(KERN_ERR "cpufreq: processor-to-bus-ratio*2 missing\n"); + return 1; + } + + /* Get the min/max bus frequencies */ + low_freq = min(freqs[0], freqs[1]); + hi_freq = max(freqs[0], freqs[1]); + + /* Grrrr.. 
It _seems_ that the device-tree is lying on the low bus + * frequency, it claims it to be around 84Mhz on some models while + * it appears to be approx. 101Mhz on all. Let's hack around here... + * fortunately, we don't need to be too precise + */ + if (low_freq < 98000000) + low_freq = 101000000; + + /* Convert those to CPU core clocks */ + low_freq = (low_freq * (*ratio)) / 2000; + hi_freq = (hi_freq * (*ratio)) / 2000; + + /* Now we get the frequencies, we read the GPIO to see what is out current + * speed + */ + rc = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL, frequency_gpio, 0); + cur_freq = (rc & 0x01) ? hi_freq : low_freq; + + set_speed_proc = gpios_set_cpu_speed; + return 1; + } + + /* If we use the PMU, look for the min & max frequencies in the + * device-tree + */ + value = (u32 *)get_property(cpunode, "min-clock-frequency", NULL); + if (!value) + return 1; + low_freq = (*value) / 1000; + /* The PowerBook G4 12" (PowerBook6,1) has an error in the device-tree + * here */ + if (low_freq < 100000) + low_freq *= 10; + + value = (u32 *)get_property(cpunode, "max-clock-frequency", NULL); + if (!value) + return 1; + hi_freq = (*value) / 1000; + set_speed_proc = pmu_set_cpu_speed; + is_pmu_based = 1; + + return 0; +} + +static int pmac_cpufreq_init_7447A(struct device_node *cpunode) +{ + struct device_node *volt_gpio_np; + + if (get_property(cpunode, "dynamic-power-step", NULL) == NULL) + return 1; + + volt_gpio_np = of_find_node_by_name(NULL, "cpu-vcore-select"); + if (volt_gpio_np) + voltage_gpio = read_gpio(volt_gpio_np); + if (!voltage_gpio){ + printk(KERN_ERR "cpufreq: missing cpu-vcore-select gpio\n"); + return 1; + } + + /* OF only reports the high frequency */ + hi_freq = cur_freq; + low_freq = cur_freq/2; + + /* Read actual frequency from CPU */ + cur_freq = dfs_get_cpu_speed(); + set_speed_proc = dfs_set_cpu_speed; + get_speed_proc = dfs_get_cpu_speed; + + return 0; +} + +static int pmac_cpufreq_init_750FX(struct device_node *cpunode) +{ + struct 
device_node *volt_gpio_np; + u32 pvr, *value; + + if (get_property(cpunode, "dynamic-power-step", NULL) == NULL) + return 1; + + hi_freq = cur_freq; + value = (u32 *)get_property(cpunode, "reduced-clock-frequency", NULL); + if (!value) + return 1; + low_freq = (*value) / 1000; + + volt_gpio_np = of_find_node_by_name(NULL, "cpu-vcore-select"); + if (volt_gpio_np) + voltage_gpio = read_gpio(volt_gpio_np); + + pvr = mfspr(SPRN_PVR); + has_cpu_l2lve = !((pvr & 0xf00) == 0x100); + + set_speed_proc = cpu_750fx_cpu_speed; + get_speed_proc = cpu_750fx_get_cpu_speed; + cur_freq = cpu_750fx_get_cpu_speed(); + + return 0; +} + +/* Currently, we support the following machines: + * + * - Titanium PowerBook 1Ghz (PMU based, 667Mhz & 1Ghz) + * - Titanium PowerBook 800 (PMU based, 667Mhz & 800Mhz) + * - Titanium PowerBook 400 (PMU based, 300Mhz & 400Mhz) + * - Titanium PowerBook 500 (PMU based, 300Mhz & 500Mhz) + * - iBook2 500/600 (PMU based, 400Mhz & 500/600Mhz) + * - iBook2 700 (CPU based, 400Mhz & 700Mhz, support low voltage) + * - Recent MacRISC3 laptops + * - All new machines with 7447A CPUs + */ +static int __init pmac_cpufreq_setup(void) +{ + struct device_node *cpunode; + u32 *value; + + if (strstr(cmd_line, "nocpufreq")) + return 0; + + /* Assume only one CPU */ + cpunode = find_type_devices("cpu"); + if (!cpunode) + goto out; + + /* Get current cpu clock freq */ + value = (u32 *)get_property(cpunode, "clock-frequency", NULL); + if (!value) + goto out; + cur_freq = (*value) / 1000; + + /* Check for 7447A based MacRISC3 */ + if (machine_is_compatible("MacRISC3") && + get_property(cpunode, "dynamic-power-step", NULL) && + PVR_VER(mfspr(SPRN_PVR)) == 0x8003) { + pmac_cpufreq_init_7447A(cpunode); + /* Check for other MacRISC3 machines */ + } else if (machine_is_compatible("PowerBook3,4") || + machine_is_compatible("PowerBook3,5") || + machine_is_compatible("MacRISC3")) { + pmac_cpufreq_init_MacRISC3(cpunode); + /* Else check for iBook2 500/600 */ + } else if 
(machine_is_compatible("PowerBook4,1")) { + hi_freq = cur_freq; + low_freq = 400000; + set_speed_proc = pmu_set_cpu_speed; + is_pmu_based = 1; + } + /* Else check for TiPb 550 */ + else if (machine_is_compatible("PowerBook3,3") && cur_freq == 550000) { + hi_freq = cur_freq; + low_freq = 500000; + set_speed_proc = pmu_set_cpu_speed; + is_pmu_based = 1; + } + /* Else check for TiPb 400 & 500 */ + else if (machine_is_compatible("PowerBook3,2")) { + /* We only know about the 400 MHz and the 500Mhz model + * they both have 300 MHz as low frequency + */ + if (cur_freq < 350000 || cur_freq > 550000) + goto out; + hi_freq = cur_freq; + low_freq = 300000; + set_speed_proc = pmu_set_cpu_speed; + is_pmu_based = 1; + } + /* Else check for 750FX */ + else if (PVR_VER(mfspr(SPRN_PVR)) == 0x7000) + pmac_cpufreq_init_750FX(cpunode); +out: + if (set_speed_proc == NULL) + return -ENODEV; + + pmac_cpu_freqs[CPUFREQ_LOW].frequency = low_freq; + pmac_cpu_freqs[CPUFREQ_HIGH].frequency = hi_freq; + ppc_proc_freq = cur_freq * 1000ul; + + printk(KERN_INFO "Registering PowerMac CPU frequency driver\n"); + printk(KERN_INFO "Low: %d Mhz, High: %d Mhz, Boot: %d Mhz\n", + low_freq/1000, hi_freq/1000, cur_freq/1000); + + return cpufreq_register_driver(&pmac_cpufreq_driver); +} + +module_init(pmac_cpufreq_setup); + Index: linux-work/arch/powerpc/platforms/powermac/cpufreq_64.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/powerpc/platforms/powermac/cpufreq_64.c 2005-11-07 12:01:00.000000000 +1100 @@ -0,0 +1,323 @@ +/* + * Copyright (C) 2002 - 2005 Benjamin Herrenschmidt + * and Markus Demleitner + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ * + * This driver adds basic cpufreq support for SMU & 970FX based G5 Macs, + * that is iMac G5 and latest single CPU desktop. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#undef DEBUG + +#ifdef DEBUG +#define DBG(fmt...) printk(fmt) +#else +#define DBG(fmt...) +#endif + +/* see 970FX user manual */ + +#define SCOM_PCR 0x0aa001 /* PCR scom addr */ + +#define PCR_HILO_SELECT 0x80000000U /* 1 = PCR, 0 = PCRH */ +#define PCR_SPEED_FULL 0x00000000U /* 1:1 speed value */ +#define PCR_SPEED_HALF 0x00020000U /* 1:2 speed value */ +#define PCR_SPEED_QUARTER 0x00040000U /* 1:4 speed value */ +#define PCR_SPEED_MASK 0x000e0000U /* speed mask */ +#define PCR_SPEED_SHIFT 17 +#define PCR_FREQ_REQ_VALID 0x00010000U /* freq request valid */ +#define PCR_VOLT_REQ_VALID 0x00008000U /* volt request valid */ +#define PCR_TARGET_TIME_MASK 0x00006000U /* target time */ +#define PCR_STATLAT_MASK 0x00001f00U /* STATLAT value */ +#define PCR_SNOOPLAT_MASK 0x000000f0U /* SNOOPLAT value */ +#define PCR_SNOOPACC_MASK 0x0000000fU /* SNOOPACC value */ + +#define SCOM_PSR 0x408001 /* PSR scom addr */ +/* warning: PSR is a 64 bits register */ +#define PSR_CMD_RECEIVED 0x2000000000000000U /* command received */ +#define PSR_CMD_COMPLETED 0x1000000000000000U /* command completed */ +#define PSR_CUR_SPEED_MASK 0x0300000000000000U /* current speed */ +#define PSR_CUR_SPEED_SHIFT (56) + +/* + * The G5 only supports two frequencies (Quarter speed is not supported) + */ +#define CPUFREQ_HIGH 0 +#define CPUFREQ_LOW 1 + +static struct cpufreq_frequency_table g5_cpu_freqs[] = { + {CPUFREQ_HIGH, 0}, + {CPUFREQ_LOW, 0}, + {0, CPUFREQ_TABLE_END}, +}; + +static struct freq_attr* g5_cpu_freqs_attr[] = { + &cpufreq_freq_attr_scaling_available_freqs, + NULL, +}; + +/* Power mode data is an array of the 32 bits PCR values to use for + * the various frequencies, 
retreived from the device-tree + */ +static u32 *g5_pmode_data; +static int g5_pmode_max; +static int g5_pmode_cur; + +static DECLARE_MUTEX(g5_switch_mutex); + + +static struct smu_sdbp_fvt *g5_fvt_table; /* table of op. points */ +static int g5_fvt_count; /* number of op. points */ +static int g5_fvt_cur; /* current op. point */ + +/* ----------------- real hardware interface */ + +static void g5_switch_volt(int speed_mode) +{ + struct smu_simple_cmd cmd; + + DECLARE_COMPLETION(comp); + smu_queue_simple(&cmd, SMU_CMD_POWER_COMMAND, 8, smu_done_complete, + &comp, 'V', 'S', 'L', 'E', 'W', + 0xff, g5_fvt_cur+1, speed_mode); + wait_for_completion(&comp); +} + +static int g5_switch_freq(int speed_mode) +{ + struct cpufreq_freqs freqs; + int to; + + if (g5_pmode_cur == speed_mode) + return 0; + + down(&g5_switch_mutex); + + freqs.old = g5_cpu_freqs[g5_pmode_cur].frequency; + freqs.new = g5_cpu_freqs[speed_mode].frequency; + freqs.cpu = 0; + + cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE); + + /* If frequency is going up, first ramp up the voltage */ + if (speed_mode < g5_pmode_cur) + g5_switch_volt(speed_mode); + + /* Clear PCR high */ + scom970_write(SCOM_PCR, 0); + /* Clear PCR low */ + scom970_write(SCOM_PCR, PCR_HILO_SELECT | 0); + /* Set PCR low */ + scom970_write(SCOM_PCR, PCR_HILO_SELECT | + g5_pmode_data[speed_mode]); + + /* Wait for completion */ + for (to = 0; to < 10; to++) { + unsigned long psr = scom970_read(SCOM_PSR); + + if ((psr & PSR_CMD_RECEIVED) == 0 && + (((psr >> PSR_CUR_SPEED_SHIFT) ^ + (g5_pmode_data[speed_mode] >> PCR_SPEED_SHIFT)) & 0x3) + == 0) + break; + if (psr & PSR_CMD_COMPLETED) + break; + udelay(100); + } + + /* If frequency is going down, last ramp the voltage */ + if (speed_mode > g5_pmode_cur) + g5_switch_volt(speed_mode); + + g5_pmode_cur = speed_mode; + ppc_proc_freq = g5_cpu_freqs[speed_mode].frequency * 1000ul; + + cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE); + + up(&g5_switch_mutex); + + return 0; +} + +static int 
g5_query_freq(void) +{ + unsigned long psr = scom970_read(SCOM_PSR); + int i; + + for (i = 0; i <= g5_pmode_max; i++) + if ((((psr >> PSR_CUR_SPEED_SHIFT) ^ + (g5_pmode_data[i] >> PCR_SPEED_SHIFT)) & 0x3) == 0) + break; + return i; +} + +/* ----------------- cpufreq bookkeeping */ + +static int g5_cpufreq_verify(struct cpufreq_policy *policy) +{ + return cpufreq_frequency_table_verify(policy, g5_cpu_freqs); +} + +static int g5_cpufreq_target(struct cpufreq_policy *policy, + unsigned int target_freq, unsigned int relation) +{ + unsigned int newstate = 0; + + if (cpufreq_frequency_table_target(policy, g5_cpu_freqs, + target_freq, relation, &newstate)) + return -EINVAL; + + return g5_switch_freq(newstate); +} + +static unsigned int g5_cpufreq_get_speed(unsigned int cpu) +{ + return g5_cpu_freqs[g5_pmode_cur].frequency; +} + +static int g5_cpufreq_cpu_init(struct cpufreq_policy *policy) +{ + if (policy->cpu != 0) + return -ENODEV; + + policy->governor = CPUFREQ_DEFAULT_GOVERNOR; + policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL; + policy->cur = g5_cpu_freqs[g5_query_freq()].frequency; + cpufreq_frequency_table_get_attr(g5_cpu_freqs, policy->cpu); + + return cpufreq_frequency_table_cpuinfo(policy, + g5_cpu_freqs); +} + + +static struct cpufreq_driver g5_cpufreq_driver = { + .name = "powermac", + .owner = THIS_MODULE, + .flags = CPUFREQ_CONST_LOOPS, + .init = g5_cpufreq_cpu_init, + .verify = g5_cpufreq_verify, + .target = g5_cpufreq_target, + .get = g5_cpufreq_get_speed, + .attr = g5_cpu_freqs_attr, +}; + + +static int __init g5_cpufreq_init(void) +{ + struct device_node *cpunode; + unsigned int psize, ssize; + struct smu_sdbp_header *shdr; + unsigned long max_freq; + u32 *valp; + int rc = -ENODEV; + + /* Look for CPU and SMU nodes */ + cpunode = of_find_node_by_type(NULL, "cpu"); + if (!cpunode) { + DBG("No CPU node !\n"); + return -ENODEV; + } + + /* Check 970FX for now */ + valp = (u32 *)get_property(cpunode, "cpu-version", NULL); + if (!valp) { + DBG("No 
cpu-version property !\n"); + goto bail_noprops; + } + if (((*valp) >> 16) != 0x3c) { + DBG("Wrong CPU version: %08x\n", *valp); + goto bail_noprops; + } + + /* Look for the powertune data in the device-tree */ + g5_pmode_data = (u32 *)get_property(cpunode, "power-mode-data",&psize); + if (!g5_pmode_data) { + DBG("No power-mode-data !\n"); + goto bail_noprops; + } + g5_pmode_max = psize / sizeof(u32) - 1; + + /* Look for the FVT table */ + shdr = smu_get_sdb_partition(SMU_SDB_FVT_ID, NULL); + if (!shdr) + goto bail_noprops; + g5_fvt_table = (struct smu_sdbp_fvt *)&shdr[1]; + ssize = (shdr->len * sizeof(u32)) - sizeof(struct smu_sdbp_header); + g5_fvt_count = ssize / sizeof(struct smu_sdbp_fvt); + g5_fvt_cur = 0; + + /* Sanity checking */ + if (g5_fvt_count < 1 || g5_pmode_max < 1) + goto bail_noprops; + + /* + * From what I see, clock-frequency is always the maximal frequency. + * The current driver can not slew sysclk yet, so we really only deal + * with powertune steps for now. We also only implement full freq and + * half freq in this version. So far, I haven't yet seen a machine + * supporting anything else. + */ + valp = (u32 *)get_property(cpunode, "clock-frequency", NULL); + if (!valp) + return -ENODEV; + max_freq = (*valp)/1000; + g5_cpu_freqs[0].frequency = max_freq; + g5_cpu_freqs[1].frequency = max_freq/2; + + /* Check current frequency */ + g5_pmode_cur = g5_query_freq(); + if (g5_pmode_cur > 1) + /* We don't support anything but 1:1 and 1:2, fixup ... */ + g5_pmode_cur = 1; + + /* Force apply current frequency to make sure everything is in + * sync (voltage is right for example). Firmware may leave us with + * a strange setting ... 
+ */ + g5_switch_freq(g5_pmode_cur); + + printk(KERN_INFO "Registering G5 CPU frequency driver\n"); + printk(KERN_INFO "Low: %d Mhz, High: %d Mhz, Cur: %d MHz\n", + g5_cpu_freqs[1].frequency/1000, + g5_cpu_freqs[0].frequency/1000, + g5_cpu_freqs[g5_pmode_cur].frequency/1000); + + rc = cpufreq_register_driver(&g5_cpufreq_driver); + + /* We keep the CPU node on hold... hopefully, Apple G5 don't have + * hotplug CPU with a dynamic device-tree ... + */ + return rc; + + bail_noprops: + of_node_put(cpunode); + + return rc; +} + +module_init(g5_cpufreq_init); + + +MODULE_LICENSE("GPL"); Index: linux-work/include/asm-powerpc/reg.h =================================================================== --- linux-work.orig/include/asm-powerpc/reg.h 2005-11-07 11:58:28.000000000 +1100 +++ linux-work/include/asm-powerpc/reg.h 2005-11-07 11:58:31.000000000 +1100 @@ -396,6 +396,9 @@ #define SPRN_VRSAVE 0x100 /* Vector Register Save Register */ #define SPRN_XER 0x001 /* Fixed Point Exception Register */ +#define SPRN_SCOMC 0x114 /* SCOM Access Control */ +#define SPRN_SCOMD 0x115 /* SCOM Access DATA */ + /* Performance monitor SPRs */ #ifdef CONFIG_PPC64 #define SPRN_MMCR0 795 @@ -594,7 +597,11 @@ mtspr(SPRN_CTRLT, ctrl); } } -#endif + +extern unsigned long scom970_read(unsigned int address); +extern void scom970_write(unsigned int address, unsigned long value); + +#endif /* CONFIG_PPC64 */ #define __get_SP() ({unsigned long sp; \ asm volatile("mr %0,1": "=r" (sp)); sp;}) Index: linux-work/include/asm-powerpc/smu.h =================================================================== --- linux-work.orig/include/asm-powerpc/smu.h 2005-11-07 11:58:28.000000000 +1100 +++ linux-work/include/asm-powerpc/smu.h 2005-11-07 11:58:31.000000000 +1100 @@ -144,7 +144,11 @@ * - lenght 8 ("VSLEWxyz") has 3 additional bytes appended, and is * used to set the voltage slewing point. The SMU replies with "DONE" * I yet have to figure out their exact meaning of those 3 bytes in - * both cases. 
+ * both cases. They seem to be: + * x = processor mask + * y = op. point index + * z = processor freq. step index + * I haven't yet decyphered result codes * */ #define SMU_CMD_POWER_COMMAND 0xaa @@ -333,6 +337,60 @@ #endif /* __KERNEL__ */ + +/* + * - SMU "sdb" partitions informations - + */ + + +/* + * Partition header format + */ +struct smu_sdbp_header { + __u8 id; + __u8 len; + __u8 version; + __u8 flags; +}; + +/* + * 32 bits integers are usually encoded with 2x16 bits swapped, + * this demangles them + */ +#define SMU_U32_MIX(x) ((((x) << 16) & 0xffff0000u) | (((x) >> 16) & 0xffffu)) + +/* This is the definition of the SMU sdb-partition-0x12 table (called + * CPU F/V/T operating points in Darwin). The definition for all those + * SMU tables should be moved to some separate file + */ +#define SMU_SDB_FVT_ID 0x12 + +struct smu_sdbp_fvt { + __u32 sysclk; /* Base SysClk frequency in Hz for + * this operating point + */ + __u8 pad; + __u8 maxtemp; /* Max temp. supported by this + * operating point + */ + + __u16 volts[3]; /* CPU core voltage for the 3 + * PowerTune modes, a mode with + * 0V = not supported. + */ +}; + +#ifdef __KERNEL__ +/* + * This returns the pointer to an SMU "sdb" partition data or NULL + * if not found. The data format is described below + */ +extern struct smu_sdbp_header *smu_get_sdb_partition(int id, + unsigned int *size); + +#endif /* __KERNEL__ */ + + /* * - Userland interface - */ Index: linux-work/arch/powerpc/Kconfig =================================================================== --- linux-work.orig/arch/powerpc/Kconfig 2005-11-07 11:58:28.000000000 +1100 +++ linux-work/arch/powerpc/Kconfig 2005-11-07 11:58:31.000000000 +1100 @@ -404,6 +404,14 @@ this currently includes some models of iBook & Titanium PowerBook. 
+config CPU_FREQ_PMAC64 + bool "Support for some Apple G5s" + depends on CPU_FREQ && PMAC_SMU && PPC64 + select CPU_FREQ_TABLE + help + This adds support for frequency switching on Apple iMac G5, + and some of the more recent desktop G5 machines as well. + config PPC601_SYNC_FIX bool "Workarounds for PPC601 bugs" depends on 6xx && (PPC_PREP || PPC_PMAC) Index: linux-work/arch/powerpc/kernel/misc_64.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/misc_64.S 2005-11-07 11:58:28.000000000 +1100 +++ linux-work/arch/powerpc/kernel/misc_64.S 2005-11-07 11:58:31.000000000 +1100 @@ -604,6 +604,76 @@ #endif /* defined(CONFIG_PPC_PMAC) || defined(CONFIG_PPC_MAPLE) */ /* + * SCOM access functions for 970 (FX only for now) + * + * unsigned long scom970_read(unsigned int address); + * void scom970_write(unsigned int address, unsigned long value); + * + * The address passed in is the 24 bits register address. This code + * is 970 specific and will not check the status bits, so you should + * know what you are doing. + */ +_GLOBAL(scom970_read) + /* interrupts off */ + mfmsr r4 + ori r0,r4,MSR_EE + xori r0,r0,MSR_EE + mtmsrd r0,1 + + /* rotate 24 bits SCOM address 8 bits left and mask out it's low 8 bits + * (including parity). On current CPUs they must be 0'd, + * and finally or in RW bit + */ + rlwinm r3,r3,8,0,15 + ori r3,r3,0x8000 + + /* do the actual scom read */ + sync + mtspr SPRN_SCOMC,r3 + isync + mfspr r3,SPRN_SCOMD + isync + mfspr r0,SPRN_SCOMC + isync + + /* XXX: fixup result on some buggy 970's (ouch ! we lost a bit, bah + * that's the best we can do). 
Not implemented yet as we don't use + * the scom on any of the bogus CPUs yet, but may have to be done + * ultimately + */ + + /* restore interrupts */ + mtmsrd r4,1 + blr + + +_GLOBAL(scom970_write) + /* interrupts off */ + mfmsr r5 + ori r0,r5,MSR_EE + xori r0,r0,MSR_EE + mtmsrd r0,1 + + /* rotate 24 bits SCOM address 8 bits left and mask out it's low 8 bits + * (including parity). On current CPUs they must be 0'd. + */ + + rlwinm r3,r3,8,0,15 + + sync + mtspr SPRN_SCOMD,r4 /* write data */ + isync + mtspr SPRN_SCOMC,r3 /* write command */ + isync + mfspr 3,SPRN_SCOMC + isync + + /* restore interrupts */ + mtmsrd r5,1 + blr + + +/* * Create a kernel thread * kernel_thread(fn, arg, flags) */ Index: linux-work/arch/powerpc/platforms/powermac/Makefile =================================================================== --- linux-work.orig/arch/powerpc/platforms/powermac/Makefile 2005-11-07 11:58:28.000000000 +1100 +++ linux-work/arch/powerpc/platforms/powermac/Makefile 2005-11-07 11:58:31.000000000 +1100 @@ -1,7 +1,8 @@ obj-y += pic.o setup.o time.o feature.o pci.o \ sleep.o low_i2c.o cache.o obj-$(CONFIG_PMAC_BACKLIGHT) += backlight.o -obj-$(CONFIG_CPU_FREQ_PMAC) += cpufreq.o +obj-$(CONFIG_CPU_FREQ_PMAC) += cpufreq_32.o +obj-$(CONFIG_CPU_FREQ_PMAC64) += cpufreq_64.o obj-$(CONFIG_NVRAM) += nvram.o # ppc64 pmac doesn't define CONFIG_NVRAM but needs nvram stuff obj-$(CONFIG_PPC64) += nvram.o Index: linux-work/arch/powerpc/platforms/powermac/setup.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/powermac/setup.c 2005-11-07 11:58:28.000000000 +1100 +++ linux-work/arch/powerpc/platforms/powermac/setup.c 2005-11-07 11:58:31.000000000 +1100 @@ -193,18 +193,6 @@ pmac_newworld ? 
"NewWorld" : "OldWorld"); } -static void pmac_show_percpuinfo(struct seq_file *m, int i) -{ -#ifdef CONFIG_CPU_FREQ_PMAC - extern unsigned int pmac_get_one_cpufreq(int i); - unsigned int freq = pmac_get_one_cpufreq(i); - if (freq != 0) { - seq_printf(m, "clock\t\t: %dMHz\n", freq/1000); - return; - } -#endif /* CONFIG_CPU_FREQ_PMAC */ -} - #ifndef CONFIG_ADB_CUDA int find_via_cuda(void) { @@ -767,7 +755,6 @@ .setup_arch = pmac_setup_arch, .init_early = pmac_init_early, .show_cpuinfo = pmac_show_cpuinfo, - .show_percpuinfo = pmac_show_percpuinfo, .init_IRQ = pmac_pic_init, .get_irq = mpic_get_irq, /* changed later */ .pcibios_fixup = pmac_pcibios_fixup, From benh at kernel.crashing.org Mon Nov 7 14:29:02 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 07 Nov 2005 14:29:02 +1100 Subject: [PATCH] ppc64: SMU partition recovery Message-ID: <1131334142.5229.154.camel@gaston> This patch adds the ability to the SMU driver to recover missing calibration partitions from the SMU chip itself. It also adds some dynamic mecanism to /proc/device-tree so that new properties are visible to userland. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/drivers/macintosh/smu.c =================================================================== --- linux-work.orig/drivers/macintosh/smu.c 2005-11-07 12:45:42.000000000 +1100 +++ linux-work/drivers/macintosh/smu.c 2005-11-07 12:45:44.000000000 +1100 @@ -47,13 +47,13 @@ #include #include -#define VERSION "0.6" +#define VERSION "0.7" #define AUTHOR "(c) 2005 Benjamin Herrenschmidt, IBM Corp." #undef DEBUG_SMU #ifdef DEBUG_SMU -#define DPRINTK(fmt, args...) do { printk(KERN_DEBUG fmt , ##args); } while (0) +#define DPRINTK(fmt, args...) do { udbg_printf(KERN_DEBUG fmt , ##args); } while (0) #else #define DPRINTK(fmt, args...) 
do { } while (0) #endif @@ -92,7 +92,7 @@ * for now, just hard code that */ static struct smu_device *smu; - +static DECLARE_MUTEX(smu_part_access); /* * SMU driver low level stuff @@ -113,9 +113,11 @@ DPRINTK("SMU: starting cmd %x, %d bytes data\n", cmd->cmd, cmd->data_len); - DPRINTK("SMU: data buffer: %02x %02x %02x %02x ...\n", + DPRINTK("SMU: data buffer: %02x %02x %02x %02x %02x %02x %02x %02x\n", ((u8 *)cmd->data_buf)[0], ((u8 *)cmd->data_buf)[1], - ((u8 *)cmd->data_buf)[2], ((u8 *)cmd->data_buf)[3]); + ((u8 *)cmd->data_buf)[2], ((u8 *)cmd->data_buf)[3], + ((u8 *)cmd->data_buf)[4], ((u8 *)cmd->data_buf)[5], + ((u8 *)cmd->data_buf)[6], ((u8 *)cmd->data_buf)[7]); /* Fill the SMU command buffer */ smu->cmd_buf->cmd = cmd->cmd; @@ -440,7 +442,7 @@ EXPORT_SYMBOL(smu_present); -int smu_init (void) +int __init smu_init (void) { struct device_node *np; u32 *data; @@ -845,16 +847,154 @@ return 0; } -struct smu_sdbp_header *smu_get_sdb_partition(int id, unsigned int *size) +/* + * Handling of "partitions" + */ + +static int smu_read_datablock(u8 *dest, unsigned int addr, unsigned int len) +{ + DECLARE_COMPLETION(comp); + unsigned int chunk; + struct smu_cmd cmd; + int rc; + u8 params[8]; + + /* We currently use a chunk size of 0xe. 
We could check the + * SMU firmware version and use bigger sizes though + */ + chunk = 0xe; + + while (len) { + unsigned int clen = min(len, chunk); + + cmd.cmd = SMU_CMD_MISC_ee_COMMAND; + cmd.data_len = 7; + cmd.data_buf = params; + cmd.reply_len = chunk; + cmd.reply_buf = dest; + cmd.done = smu_done_complete; + cmd.misc = &comp; + params[0] = SMU_CMD_MISC_ee_GET_DATABLOCK_REC; + params[1] = 0x4; + *((u32 *)&params[2]) = addr; + params[6] = clen; + + rc = smu_queue_cmd(&cmd); + if (rc) + return rc; + wait_for_completion(&comp); + if (cmd.status != 0) + return rc; + if (cmd.reply_len != clen) { + printk(KERN_DEBUG "SMU: short read in " + "smu_read_datablock, got: %d, want: %d\n", + cmd.reply_len, clen); + return -EIO; + } + len -= clen; + addr += clen; + dest += clen; + } + return 0; +} + +static struct smu_sdbp_header *smu_create_sdb_partition(int id) +{ + DECLARE_COMPLETION(comp); + struct smu_simple_cmd cmd; + unsigned int addr, len, tlen; + struct smu_sdbp_header *hdr; + struct property *prop; + + /* First query the partition info */ + smu_queue_simple(&cmd, SMU_CMD_PARTITION_COMMAND, 2, + smu_done_complete, &comp, + SMU_CMD_PARTITION_LATEST, id); + wait_for_completion(&comp); + + /* Partition doesn't exist (or other error) */ + if (cmd.cmd.status != 0 || cmd.cmd.reply_len != 6) + return NULL; + + /* Fetch address and length from reply */ + addr = *((u16 *)cmd.buffer); + len = cmd.buffer[3] << 2; + /* Calculate total length to allocate, including the 17 bytes + * for "sdb-partition-XX" that we append at the end of the buffer + */ + tlen = sizeof(struct property) + len + 18; + + prop = kcalloc(tlen, 1, GFP_KERNEL); + if (prop == NULL) + return NULL; + hdr = (struct smu_sdbp_header *)(prop + 1); + prop->name = ((char *)prop) + tlen - 18; + sprintf(prop->name, "sdb-partition-%02x", id); + prop->length = len; + prop->value = (unsigned char *)hdr; + prop->next = NULL; + + /* Read the datablock */ + if (smu_read_datablock((u8 *)hdr, addr, len)) { + printk(KERN_DEBUG "SMU: 
datablock read failed while reading " + "partition %02x !\n", id); + goto failure; + } + + /* Got it, check a few things and create the property */ + if (hdr->id != id) { + printk(KERN_DEBUG "SMU: Reading partition %02x and got " + "%02x !\n", id, hdr->id); + goto failure; + } + if (prom_add_property(smu->of_node, prop)) { + printk(KERN_DEBUG "SMU: Failed creating sdb-partition-%02x " + "property !\n", id); + goto failure; + } + + return hdr; + failure: + kfree(prop); + return NULL; +} + +/* Note: Only allowed to return error code in pointers (using ERR_PTR) + * when interruptible is 1 + */ +struct smu_sdbp_header *__smu_get_sdb_partition(int id, unsigned int *size, + int interruptible) { char pname[32]; + struct smu_sdbp_header *part; if (!smu) return NULL; sprintf(pname, "sdb-partition-%02x", id); - return (struct smu_sdbp_header *)get_property(smu->of_node, + + if (interruptible) { + int rc; + rc = down_interruptible(&smu_part_access); + if (rc) + return ERR_PTR(rc); + } else + down(&smu_part_access); + + part = (struct smu_sdbp_header *)get_property(smu->of_node, pname, size); + if (part == NULL) { + part = smu_create_sdb_partition(id); + if (part != NULL && size) + *size = part->len << 2; + } + up(&smu_part_access); + return part; +} + +struct smu_sdbp_header *smu_get_sdb_partition(int id, unsigned int *size) +{ + return __smu_get_sdb_partition(id, size, 0); } EXPORT_SYMBOL(smu_get_sdb_partition); @@ -930,6 +1070,14 @@ else if (hdr.cmdtype == SMU_CMDTYPE_WANTS_EVENTS) { pp->mode = smu_file_events; return 0; + } else if (hdr.cmdtype == SMU_CMDTYPE_GET_PARTITION) { + struct smu_sdbp_header *part; + part = __smu_get_sdb_partition(hdr.cmd, NULL, 1); + if (part == NULL) + return -EINVAL; + else if (IS_ERR(part)) + return PTR_ERR(part); + return 0; } else if (hdr.cmdtype != SMU_CMDTYPE_SMU) return -EINVAL; else if (pp->mode != smu_file_commands) Index: linux-work/arch/ppc64/kernel/prom.c =================================================================== --- 
linux-work.orig/arch/ppc64/kernel/prom.c 2005-11-07 12:45:42.000000000 +1100 +++ linux-work/arch/ppc64/kernel/prom.c 2005-11-07 12:45:44.000000000 +1100 @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -1894,17 +1895,32 @@ EXPORT_SYMBOL(get_property); /* - * Add a property to a node + * Add a property to a node. */ -void +int prom_add_property(struct device_node* np, struct property* prop) { - struct property **next = &np->properties; + struct property **next; prop->next = NULL; - while (*next) + write_lock(&devtree_lock); + next = &np->properties; + while (*next) { + if (strcmp(prop->name, (*next)->name) == 0) { + /* duplicate ! don't insert it */ + write_unlock(&devtree_lock); + return -1; + } next = &(*next)->next; + } *next = prop; + write_unlock(&devtree_lock); + + /* try to add to proc as well if it was initialized */ + if (np->pde) + proc_device_tree_add_prop(np->pde, prop); + + return 0; } #if 0 Index: linux-work/fs/proc/proc_devtree.c =================================================================== --- linux-work.orig/fs/proc/proc_devtree.c 2005-11-07 12:45:42.000000000 +1100 +++ linux-work/fs/proc/proc_devtree.c 2005-11-07 12:45:44.000000000 +1100 @@ -49,6 +49,39 @@ */ /* + * Add a property to a node + */ +static struct proc_dir_entry * +__proc_device_tree_add_prop(struct proc_dir_entry *de, struct property *pp) +{ + struct proc_dir_entry *ent; + + /* + * Unfortunately proc_register puts each new entry + * at the beginning of the list. So we rearrange them. + */ + ent = create_proc_read_entry(pp->name, + strncmp(pp->name, "security-", 9) + ? 
S_IRUGO : S_IRUSR, de, + property_read_proc, pp); + if (ent == NULL) + return NULL; + + if (!strncmp(pp->name, "security-", 9)) + ent->size = 0; /* don't leak number of password chars */ + else + ent->size = pp->length; + + return ent; +} + + +void proc_device_tree_add_prop(struct proc_dir_entry *pde, struct property *prop) +{ + __proc_device_tree_add_prop(pde, prop); +} + +/* * Process a node, adding entries for its children and its properties. */ void proc_device_tree_add_node(struct device_node *np, @@ -57,11 +90,9 @@ struct property *pp; struct proc_dir_entry *ent; struct device_node *child; - struct proc_dir_entry *list = NULL, **lastp; const char *p; set_node_proc_entry(np, de); - lastp = &list; for (child = NULL; (child = of_get_next_child(np, child));) { p = strrchr(child->full_name, '/'); if (!p) @@ -71,9 +102,6 @@ ent = proc_mkdir(p, de); if (ent == 0) break; - *lastp = ent; - ent->next = NULL; - lastp = &ent->next; proc_device_tree_add_node(child, ent); } of_node_put(child); @@ -84,7 +112,7 @@ * properties are quite unimportant for us though, thus we * simply "skip" them here, but we do have to check. */ - for (ent = list; ent != NULL; ent = ent->next) + for (ent = de->subdir; ent != NULL; ent = ent->next) if (!strcmp(ent->name, pp->name)) break; if (ent != NULL) { @@ -94,25 +122,10 @@ continue; } - /* - * Unfortunately proc_register puts each new entry - * at the beginning of the list. So we rearrange them. - */ - ent = create_proc_read_entry(pp->name, - strncmp(pp->name, "security-", 9) - ? 
S_IRUGO : S_IRUSR, de, - property_read_proc, pp); + ent = __proc_device_tree_add_prop(de, pp); if (ent == 0) break; - if (!strncmp(pp->name, "security-", 9)) - ent->size = 0; /* don't leak number of password chars */ - else - ent->size = pp->length; - ent->next = NULL; - *lastp = ent; - lastp = &ent->next; } - de->subdir = list; } /* Index: linux-work/include/asm-ppc/prom.h =================================================================== --- linux-work.orig/include/asm-ppc/prom.h 2005-11-07 12:45:42.000000000 +1100 +++ linux-work/include/asm-ppc/prom.h 2005-11-07 12:45:44.000000000 +1100 @@ -93,7 +93,7 @@ extern int machine_is_compatible(const char *compat); extern unsigned char *get_property(struct device_node *node, const char *name, int *lenp); -extern void prom_add_property(struct device_node* np, struct property* prop); +extern int prom_add_property(struct device_node* np, struct property* prop); extern void prom_get_irq_senses(unsigned char *, int, int); extern int prom_n_addr_cells(struct device_node* np); extern int prom_n_size_cells(struct device_node* np); Index: linux-work/include/asm-ppc64/prom.h =================================================================== --- linux-work.orig/include/asm-ppc64/prom.h 2005-11-07 12:45:42.000000000 +1100 +++ linux-work/include/asm-ppc64/prom.h 2005-11-07 12:45:44.000000000 +1100 @@ -205,6 +205,6 @@ extern int prom_n_size_cells(struct device_node* np); extern int prom_n_intr_cells(struct device_node* np); extern void prom_get_irq_senses(unsigned char *senses, int off, int max); -extern void prom_add_property(struct device_node* np, struct property* prop); +extern int prom_add_property(struct device_node* np, struct property* prop); #endif /* _PPC64_PROM_H */ Index: linux-work/include/linux/proc_fs.h =================================================================== --- linux-work.orig/include/linux/proc_fs.h 2005-11-07 12:45:42.000000000 +1100 +++ linux-work/include/linux/proc_fs.h 2005-11-07 12:45:44.000000000 
+1100 @@ -139,15 +139,12 @@ /* * proc_devtree.c */ +#ifdef CONFIG_PROC_DEVICETREE struct device_node; +struct property; extern void proc_device_tree_init(void); -#ifdef CONFIG_PROC_DEVICETREE extern void proc_device_tree_add_node(struct device_node *, struct proc_dir_entry *); -#else /* !CONFIG_PROC_DEVICETREE */ -static inline void proc_device_tree_add_node(struct device_node *np, struct proc_dir_entry *pde) -{ - return; -} +extern void proc_device_tree_add_prop(struct proc_dir_entry *pde, struct property *prop); #endif /* CONFIG_PROC_DEVICETREE */ extern struct proc_dir_entry *proc_symlink(const char *, Index: linux-work/arch/ppc/syslib/prom.c =================================================================== --- linux-work.orig/arch/ppc/syslib/prom.c 2005-11-07 12:45:42.000000000 +1100 +++ linux-work/arch/ppc/syslib/prom.c 2005-11-07 12:45:44.000000000 +1100 @@ -1165,7 +1165,7 @@ /* * Add a property to a node */ -void +int prom_add_property(struct device_node* np, struct property* prop) { struct property **next = &np->properties; @@ -1174,6 +1174,8 @@ while (*next) next = &(*next)->next; *next = prop; + + return 0; } /* I quickly hacked that one, check against spec ! */ Index: linux-work/include/asm-powerpc/smu.h =================================================================== --- linux-work.orig/include/asm-powerpc/smu.h 2005-11-07 12:45:42.000000000 +1100 +++ linux-work/include/asm-powerpc/smu.h 2005-11-07 12:45:44.000000000 +1100 @@ -20,16 +20,52 @@ /* * Partition info commands * - * I do not know what those are for at this point + * These commands are used to retrieve the sdb-partition-XX data from + * the SMU. The length is always 2. First byte is the subcommand code + * and second byte is the partition ID. + * + * The reply is 6 bytes: + * + * - 0..1 : partition address + * - 2 : a byte containing the partition ID + * - 3 : length (maybe other bits are rest of header ?) 
+ * + * The data must then be obtained with calls to another command: + * SMU_CMD_MISC_ee_GET_DATABLOCK_REC (described below). */ #define SMU_CMD_PARTITION_COMMAND 0x3e +#define SMU_CMD_PARTITION_LATEST 0x01 +#define SMU_CMD_PARTITION_BASE 0x02 +#define SMU_CMD_PARTITION_UPDATE 0x03 /* * Fan control * - * This is a "mux" for fan control commands, first byte is the - * "sub" command. + * This is a "mux" for fan control commands. The command seems to + * act differently based on the number of arguments. With 1 byte + * of argument, it seems to be a query for fan status, setpoint, + * etc..., while with 0xe arguments, we will set the fan speeds. + * + * Queries (1 byte arg): + * --------------------- + * + * arg=0x01: read RPM fans status + * arg=0x02: read RPM fans setpoint + * arg=0x11: read PWM fans status + * arg=0x12: read PWM fans setpoint + * + * The "status" queries return the current speed while the "setpoint" ones + * return the programmed/target speed. It _seems_ that the result is a bit + * mask in the first byte of active/available fans, followed by 6 words (16 + * bits) containing the requested speed. + * + * Setpoint (14 bytes arg): + * ------------------------ + * + * First arg byte is 0 for RPM fans and 0x10 for PWM. Second arg byte is the + * mask of fans affected by the command. 
It is followed by 6 words containing the + * setpoint value for selected fans in the mask (or 0 if mask value is 0) */ #define SMU_CMD_FAN_COMMAND 0x4a @@ -156,6 +192,14 @@ #define SMU_CMD_POWER_SHUTDOWN "SHUTDOWN" #define SMU_CMD_POWER_VOLTAGE_SLEW "VSLEW" +/* + * Read ADC sensors + * + * This command takes one byte of parameter: the sensor ID (or "reg" + * value in the device-tree) and returns a 16-bit value + */ +#define SMU_CMD_READ_ADC 0xd8 + /* Misc commands * * This command seem to be a grab bag of various things @@ -176,6 +220,25 @@ * Misc commands * * This command seem to be a grab bag of various things + * + * SMU_CMD_MISC_ee_GET_DATABLOCK_REC is used, among other things, to + * transfer blocks of data from the SMU. So far, I've deciphered its + * usage to retrieve partition data. In order to do that, you have to + * break your transfer into "chunks" since that command cannot transfer + * more than a chunk at a time. The chunk size used by OF is 0xe bytes, + * but it seems that the Darwin driver will let you do 0x1e bytes if + * your "PMU" version is >= 0x30. You can get the "PMU" version apparently + * either in the last 16 bits of property "smu-version-pmu" or as the 16 + * bytes at offset 1 of "smu-version-info" + * + * For each chunk, the command takes 7 bytes of arguments: + * byte 0: subcommand code (0x02) + * byte 1: 0x04 (always, I don't know what it means, maybe the address + * space to use or some other nicety. 
It's hard coded in OF) + * byte 2..5: SMU address of the chunk (big endian 32 bits) + * byte 6: size to transfer (up to max chunk size) + * + * The data is returned directly */ #define SMU_CMD_MISC_ee_COMMAND 0xee #define SMU_CMD_MISC_ee_GET_DATABLOCK_REC 0x02 @@ -353,21 +416,26 @@ __u8 flags; }; -/* - * 32 bits integers are usually encoded with 2x16 bits swapped, - * this demangles them + + /* + * Demangle 16- and 32-bit integers in some SMU partitions + * (currently, afaik, this concerns only the FVT partition + * (0x12)) */ -#define SMU_U32_MIX(x) ((((x) << 16) & 0xffff0000u) | (((x) >> 16) & 0xffffu)) +#define SMU_U16_MIX(x) le16_to_cpu(x) +#define SMU_U32_MIX(x) ((((x) & 0xff00ff00u) >> 8)|(((x) & 0x00ff00ffu) << 8)) + /* This is the definition of the SMU sdb-partition-0x12 table (called * CPU F/V/T operating points in Darwin). The definition for all those * SMU tables should be moved to some separate file */ -#define SMU_SDB_FVT_ID 0x12 +#define SMU_SDB_FVT_ID 0x12 struct smu_sdbp_fvt { __u32 sysclk; /* Base SysClk frequency in Hz for - * this operating point + * this operating point. Value needs to + * be unmixed with SMU_U32_MIX() */ __u8 pad; __u8 maxtemp; /* Max temp. supported by this @@ -376,10 +444,73 @@ __u16 volts[3]; /* CPU core voltage for the 3 * PowerTune modes, a mode with - * 0V = not supported. + * 0V = not supported. 
Value need + * to be unmixed with SMU_U16_MIX() */ }; +/* This partition contains voltage & current sensor calibration + * informations + */ +#define SMU_SDB_CPUVCP_ID 0x21 + +struct smu_sdbp_cpuvcp { + __u16 volt_scale; /* u4.12 fixed point */ + __s16 volt_offset; /* s4.12 fixed point */ + __u16 curr_scale; /* u4.12 fixed point */ + __s16 curr_offset; /* s4.12 fixed point */ + __s32 power_quads[3]; /* s4.28 fixed point */ +}; + +/* This partition contains CPU thermal diode calibration + */ +#define SMU_SDB_CPUDIODE_ID 0x18 + +struct smu_sdbp_cpudiode { + __u16 m_value; /* u1.15 fixed point */ + __s16 b_value; /* s10.6 fixed point */ + +}; + +/* This partition contains Slots power calibration + */ +#define SMU_SDB_SLOTSPOW_ID 0x78 + +struct smu_sdbp_slotspow { + __u16 pow_scale; /* u4.12 fixed point */ + __s16 pow_offset; /* s4.12 fixed point */ +}; + +/* This partition contains machine specific version information about + * the sensor/control layout + */ +#define SMU_SDB_SENSORTREE_ID 0x25 + +struct smu_sdbp_sensortree { + u8 model_id; + u8 unknown[3]; +}; + +/* This partition contains CPU thermal control PID informations. 
So far + * only single CPU machines have been seen with an SMU, so we assume this + * carries only informations for those + */ +#define SMU_SDB_CPUPIDDATA_ID 0x17 + +struct smu_sdbp_cpupiddata { + u8 unknown1; + u8 target_temp_delta; + u8 unknown2; + u8 history_len; + s16 power_adj; + u16 max_power; + s32 gp,gr,gd; +}; + + +/* Other partitions without known structures */ +#define SMU_SDB_DEBUG_SWITCHES_ID 0x05 + #ifdef __KERNEL__ /* * This returns the pointer to an SMU "sdb" partition data or NULL @@ -423,8 +554,10 @@ __u32 cmdtype; #define SMU_CMDTYPE_SMU 0 /* SMU command */ #define SMU_CMDTYPE_WANTS_EVENTS 1 /* switch fd to events mode */ +#define SMU_CMDTYPE_GET_PARTITION 2 /* retreive an sdb partition */ __u8 cmd; /* SMU command byte */ + __u8 pad[3]; /* padding */ __u32 data_len; /* Lenght of data following */ }; Index: linux-work/arch/powerpc/kernel/prom.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/prom.c 2005-11-07 12:45:42.000000000 +1100 +++ linux-work/arch/powerpc/kernel/prom.c 2005-11-07 12:46:45.000000000 +1100 @@ -1986,14 +1986,29 @@ /* * Add a property to a node */ -void prom_add_property(struct device_node* np, struct property* prop) +int prom_add_property(struct device_node* np, struct property* prop) { - struct property **next = &np->properties; + struct property **next; prop->next = NULL; - while (*next) + write_lock(&devtree_lock); + next = &np->properties; + while (*next) { + if (strcmp(prop->name, (*next)->name) == 0) { + /* duplicate ! don't insert it */ + write_unlock(&devtree_lock); + return -1; + } next = &(*next)->next; + } *next = prop; + write_unlock(&devtree_lock); + + /* try to add to proc as well if it was initialized */ + if (np->pde) + proc_device_tree_add_prop(np->pde, prop); + + return 0; } /* I quickly hacked that one, check against spec ! 
*/ Index: linux-work/include/asm-powerpc/prom.h =================================================================== --- linux-work.orig/include/asm-powerpc/prom.h 2005-11-07 12:45:42.000000000 +1100 +++ linux-work/include/asm-powerpc/prom.h 2005-11-07 12:45:44.000000000 +1100 @@ -195,7 +195,7 @@ extern int prom_n_size_cells(struct device_node* np); extern int prom_n_intr_cells(struct device_node* np); extern void prom_get_irq_senses(unsigned char *senses, int off, int max); -extern void prom_add_property(struct device_node* np, struct property* prop); +extern int prom_add_property(struct device_node* np, struct property* prop); #ifdef CONFIG_PPC32 /* From benh at kernel.crashing.org Mon Nov 7 14:30:28 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 07 Nov 2005 14:30:28 +1100 Subject: [PATCH] ppc64: Thermal control for SMU based machines Message-ID: <1131334230.5229.157.camel@gaston> This adds a new thermal control framework for PowerMac, along with the implementation for PowerMac8,1, PowerMac8,2 (iMac G5 rev 1 and 2), and PowerMac9,1 (latest single CPU desktop). In the future, I expect to move the older G5 thermal control to the new framework as well. 
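A note on the value format, for readers of the patch: sensor and control values are passed around as s32 quantities that appear to be 16.16 fixed point, which is what the FIX32TOPRINT() macro in windfarm.h is meant to display. The following is only a user-space sketch of that arithmetic and is not part of the patch. The names fix32_whole(), fix32_millis() and fix32_format() are hypothetical, and the sketch assumes non-negative values, since the masking in the macro does not give a meaningful fraction for negative ones.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helpers, not part of the patch. They split a 16.16
 * fixed point value the same way FIX32TOPRINT() does:
 *   whole part      = f >> 16
 *   fractional part = ((f & 0xffff) * 1000) >> 16, in thousandths
 * Assumption: f is non-negative.
 */
static int32_t fix32_whole(int32_t f)
{
	return f >> 16;
}

static int32_t fix32_millis(int32_t f)
{
	/* at most 65535 * 1000, which still fits in 32 bits */
	return ((f & 0xffff) * 1000) >> 16;
}

/* Format f into buf as "whole.thousandths"; returns snprintf()'s result */
static int fix32_format(char *buf, size_t len, int32_t f)
{
	return snprintf(buf, len, "%d.%03d",
			(int)fix32_whole(f), (int)fix32_millis(f));
}
```

For instance, 0x00018000 splits into a whole part of 1 and 500 thousandths, so it would be shown as 1.500.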
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/drivers/macintosh/smu.c =================================================================== --- linux-work.orig/drivers/macintosh/smu.c 2005-11-07 13:30:45.000000000 +1100 +++ linux-work/drivers/macintosh/smu.c 2005-11-07 13:30:46.000000000 +1100 @@ -590,6 +590,8 @@ sprintf(name, "smu-i2c-%02x", *reg); of_platform_device_create(np, name, &smu->of_dev->dev); } + if (device_is_compatible(np, "smu-sensors")) + of_platform_device_create(np, "smu-sensors", &smu->of_dev->dev); } } Index: linux-work/drivers/macintosh/Kconfig =================================================================== --- linux-work.orig/drivers/macintosh/Kconfig 2005-11-07 13:29:50.000000000 +1100 +++ linux-work/drivers/macintosh/Kconfig 2005-11-07 13:30:46.000000000 +1100 @@ -169,6 +169,25 @@ This driver provides thermostat and fan control for the desktop G5 machines. +config WINDFARM + tristate "New PowerMac thermal control infrastructure" + +config WINDFARM_PM81 + tristate "Support for thermal management on iMac G5" + depends on WINDFARM && I2C && CPU_FREQ_PMAC64 && PMAC_SMU + select I2C_PMAC_SMU + help + This driver provides thermal control for the iMacG5 + +config WINDFARM_PM91 + tristate "Support for thermal management on PowerMac9,1" + depends on WINDFARM && I2C && CPU_FREQ_PMAC64 && PMAC_SMU + select I2C_PMAC_SMU + help + This driver provides thermal control for the PowerMac9,1 + which is the recent (SMU based) single CPU desktop G5 + + config ANSLCD tristate "Support for ANS LCD display" depends on ADB_CUDA && PPC_PMAC Index: linux-work/drivers/macintosh/Makefile =================================================================== --- linux-work.orig/drivers/macintosh/Makefile 2005-11-07 13:29:50.000000000 +1100 +++ linux-work/drivers/macintosh/Makefile 2005-11-07 13:30:46.000000000 +1100 @@ -26,3 +26,12 @@ obj-$(CONFIG_THERM_PM72) += therm_pm72.o obj-$(CONFIG_THERM_WINDTUNNEL) += therm_windtunnel.o obj-$(CONFIG_THERM_ADT746X) += 
therm_adt746x.o +obj-$(CONFIG_WINDFARM) += windfarm_core.o +obj-$(CONFIG_WINDFARM_PM81) += windfarm_smu_controls.o \ + windfarm_smu_sensors.o \ + windfarm_lm75_sensor.o windfarm_pid.o \ + windfarm_cpufreq_clamp.o windfarm_pm81.o +obj-$(CONFIG_WINDFARM_PM91) += windfarm_smu_controls.o \ + windfarm_smu_sensors.o \ + windfarm_lm75_sensor.o windfarm_pid.o \ + windfarm_cpufreq_clamp.o windfarm_pm91.o Index: linux-work/drivers/macintosh/windfarm.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm.h 2005-11-07 13:30:46.000000000 +1100 @@ -0,0 +1,122 @@ +#ifndef __WINDFARM_H__ +#define __WINDFARM_H__ + +#include +#include +#include +#include + +/* Display a 16.16 fixed point value */ +#define FIX32TOPRINT(f) ((f) >> 16),((((f) & 0xffff) * 1000) >> 16) + +/* + * Control objects + */ + +struct wf_control; + +struct wf_control_ops { + int (*set_value)(struct wf_control *ct, s32 val); + int (*get_value)(struct wf_control *ct, s32 *val); + s32 (*get_min)(struct wf_control *ct); + s32 (*get_max)(struct wf_control *ct); + void (*release)(struct wf_control *ct); + struct module *owner; +}; + +struct wf_control { + struct list_head link; + struct wf_control_ops *ops; + char *name; + int type; + struct kref ref; +}; + +#define WF_CONTROL_TYPE_GENERIC 0 +#define WF_CONTROL_RPM_FAN 1 +#define WF_CONTROL_PWM_FAN 2 + + +/* Note about lifetime rules: wf_register_control() will initialize + * the kref and wf_unregister_control will decrement it, thus the + * object creating/disposing a given control shouldn't assume it + * still exists after wf_unregister_control has been called. 
+ * wf_find_control will inc the refcount for you + */ +extern int wf_register_control(struct wf_control *ct); +extern void wf_unregister_control(struct wf_control *ct); +extern struct wf_control * wf_find_control(const char *name); +extern int wf_get_control(struct wf_control *ct); +extern void wf_put_control(struct wf_control *ct); + +static inline int wf_control_set_max(struct wf_control *ct) +{ + s32 vmax = ct->ops->get_max(ct); + return ct->ops->set_value(ct, vmax); +} + +static inline int wf_control_set_min(struct wf_control *ct) +{ + s32 vmin = ct->ops->get_min(ct); + return ct->ops->set_value(ct, vmin); +} + +/* + * Sensor objects + */ + +struct wf_sensor; + +struct wf_sensor_ops { + int (*get_value)(struct wf_sensor *sr, s32 *val); + void (*release)(struct wf_sensor *sr); + struct module *owner; +}; + +struct wf_sensor { + struct list_head link; + struct wf_sensor_ops *ops; + char *name; + struct kref ref; +}; + +/* Same lifetime rules as controls */ +extern int wf_register_sensor(struct wf_sensor *sr); +extern void wf_unregister_sensor(struct wf_sensor *sr); +extern struct wf_sensor * wf_find_sensor(const char *name); +extern int wf_get_sensor(struct wf_sensor *sr); +extern void wf_put_sensor(struct wf_sensor *sr); + +/* For use by clients. Note that we are a bit racy here since + * notifier_block doesn't have a module owner field. I may fix + * it one day ... + * + * LOCKING NOTE ! + * + * All "events" except WF_EVENT_TICK are called with an internal mutex + * held which will deadlock if you call basically any core routine. + * So don't ! Just take note of the event and do your actual operations + * from the ticker. + * + */ +extern int wf_register_client(struct notifier_block *nb); +extern int wf_unregister_client(struct notifier_block *nb); + +/* Overtemp conditions. 
Those are refcounted */ +extern void wf_set_overtemp(void); +extern void wf_clear_overtemp(void); +extern int wf_is_overtemp(void); + +#define WF_EVENT_NEW_CONTROL 0 /* param is wf_control * */ +#define WF_EVENT_NEW_SENSOR 1 /* param is wf_sensor * */ +#define WF_EVENT_OVERTEMP 2 /* no param */ +#define WF_EVENT_NORMALTEMP 3 /* overtemp condition cleared */ +#define WF_EVENT_TICK 4 /* 1 second tick */ + +/* Note: If that driver gets more broad use, we could replace the + * simplistic overtemp bits with "environmental conditions". That + * could then be used to also notify of things like fan failure, + * case open, battery conditions, ... + */ + +#endif /* __WINDFARM_H__ */ Index: linux-work/drivers/macintosh/windfarm_core.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_core.c 2005-11-07 13:30:46.000000000 +1100 @@ -0,0 +1,427 @@ +/* + * Windfarm PowerMac thermal control. Core + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + * + * This core code tracks the list of sensors & controls, register + * clients, and holds the kernel thread used for control. + * + * TODO: + * + * Add some information about sensor/control type and data format to + * sensors/controls, and have the sysfs attribute stuff be moved + * generically here instead of hard coded in the platform specific + * driver as it is currently + * + * This however requires solving some annoying lifetime issues with + * sysfs which doesn't seem to have lifetime rules for struct attribute, + * I may have to create full features kobjects for every sensor/control + * instead which is a bit of an overkill imho + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" + +#define VERSION "0.2" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) 
printk(args) +#else +#define DBG(args...) do { } while(0) +#endif + +static LIST_HEAD(wf_controls); +static LIST_HEAD(wf_sensors); +static DECLARE_MUTEX(wf_lock); +static struct notifier_block *wf_client_list; +static int wf_client_count; +static unsigned int wf_overtemp; +static unsigned int wf_overtemp_counter; +struct task_struct *wf_thread; + +/* + * Utilities & tick thread + */ + +static inline void wf_notify(int event, void *param) +{ + notifier_call_chain(&wf_client_list, event, param); +} + +int wf_critical_overtemp(void) +{ + static char * critical_overtemp_path = "/sbin/critical_overtemp"; + char *argv[] = { critical_overtemp_path, NULL }; + static char *envp[] = { "HOME=/", + "TERM=linux", + "PATH=/sbin:/usr/sbin:/bin:/usr/bin", + NULL }; + + return call_usermodehelper(critical_overtemp_path, argv, envp, 0); +} +EXPORT_SYMBOL_GPL(wf_critical_overtemp); + +static int wf_thread_func(void *data) +{ + unsigned long next, delay; + + next = jiffies; + + DBG("wf: thread started\n"); + + while(!kthread_should_stop()) { + try_to_freeze(); + + if (time_after_eq(jiffies, next)) { + wf_notify(WF_EVENT_TICK, NULL); + if (wf_overtemp) { + wf_overtemp_counter++; + /* 10 seconds overtemp, notify userland */ + if (wf_overtemp_counter > 10) + wf_critical_overtemp(); + /* 30 seconds, shutdown */ + if (wf_overtemp_counter > 30) { + printk(KERN_ERR "windfarm: Overtemp " + "for more than 30" + " seconds, shutting down\n"); + machine_power_off(); + } + } + next += HZ; + } + + delay = next - jiffies; + if (delay <= HZ) + schedule_timeout_interruptible(delay); + + /* there should be no signal, but oh well */ + if (signal_pending(current)) { + printk(KERN_WARNING "windfarm: thread got sigl !\n"); + break; + } + } + + DBG("wf: thread stopped\n"); + + return 0; +} + +static void wf_start_thread(void) +{ + wf_thread = kthread_run(wf_thread_func, NULL, "kwindfarm"); + if (IS_ERR(wf_thread)) { + printk(KERN_ERR "windfarm: failed to create thread,err %ld\n", + PTR_ERR(wf_thread)); + 
wf_thread = NULL; + } +} + + +static void wf_stop_thread(void) +{ + if (wf_thread) + kthread_stop(wf_thread); + wf_thread = NULL; +} + +/* + * Controls + */ + +static void wf_control_release(struct kref *kref) +{ + struct wf_control *ct = container_of(kref, struct wf_control, ref); + + DBG("wf: Deleting control %s\n", ct->name); + + if (ct->ops && ct->ops->release) + ct->ops->release(ct); + else + kfree(ct); +} + +int wf_register_control(struct wf_control *new_ct) +{ + struct wf_control *ct; + + down(&wf_lock); + list_for_each_entry(ct, &wf_controls, link) { + if (!strcmp(ct->name, new_ct->name)) { + printk(KERN_WARNING "windfarm: trying to register" + " duplicate control %s\n", ct->name); + up(&wf_lock); + return -EEXIST; + } + } + kref_init(&new_ct->ref); + list_add(&new_ct->link, &wf_controls); + + DBG("wf: Registered control %s\n", new_ct->name); + + wf_notify(WF_EVENT_NEW_CONTROL, new_ct); + up(&wf_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(wf_register_control); + +void wf_unregister_control(struct wf_control *ct) +{ + down(&wf_lock); + list_del(&ct->link); + up(&wf_lock); + + DBG("wf: Unregistered control %s\n", ct->name); + + kref_put(&ct->ref, wf_control_release); +} +EXPORT_SYMBOL_GPL(wf_unregister_control); + +struct wf_control * wf_find_control(const char *name) +{ + struct wf_control *ct; + + down(&wf_lock); + list_for_each_entry(ct, &wf_controls, link) { + if (!strcmp(ct->name, name)) { + if (wf_get_control(ct)) + ct = NULL; + up(&wf_lock); + return ct; + } + } + up(&wf_lock); + return NULL; +} +EXPORT_SYMBOL_GPL(wf_find_control); + +int wf_get_control(struct wf_control *ct) +{ + if (!try_module_get(ct->ops->owner)) + return -ENODEV; + kref_get(&ct->ref); + return 0; +} +EXPORT_SYMBOL_GPL(wf_get_control); + +void wf_put_control(struct wf_control *ct) +{ + struct module *mod = ct->ops->owner; + kref_put(&ct->ref, wf_control_release); + module_put(mod); +} +EXPORT_SYMBOL_GPL(wf_put_control); + + +/* + * Sensors + */ + + +static void 
wf_sensor_release(struct kref *kref) +{ + struct wf_sensor *sr = container_of(kref, struct wf_sensor, ref); + + DBG("wf: Deleting sensor %s\n", sr->name); + + if (sr->ops && sr->ops->release) + sr->ops->release(sr); + else + kfree(sr); +} + +int wf_register_sensor(struct wf_sensor *new_sr) +{ + struct wf_sensor *sr; + + down(&wf_lock); + list_for_each_entry(sr, &wf_sensors, link) { + if (!strcmp(sr->name, new_sr->name)) { + printk(KERN_WARNING "windfarm: trying to register" + " duplicate sensor %s\n", sr->name); + up(&wf_lock); + return -EEXIST; + } + } + kref_init(&new_sr->ref); + list_add(&new_sr->link, &wf_sensors); + + DBG("wf: Registered sensor %s\n", new_sr->name); + + wf_notify(WF_EVENT_NEW_SENSOR, new_sr); + up(&wf_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(wf_register_sensor); + +void wf_unregister_sensor(struct wf_sensor *sr) +{ + down(&wf_lock); + list_del(&sr->link); + up(&wf_lock); + + DBG("wf: Unregistered sensor %s\n", sr->name); + + wf_put_sensor(sr); +} +EXPORT_SYMBOL_GPL(wf_unregister_sensor); + +struct wf_sensor * wf_find_sensor(const char *name) +{ + struct wf_sensor *sr; + + down(&wf_lock); + list_for_each_entry(sr, &wf_sensors, link) { + if (!strcmp(sr->name, name)) { + if (wf_get_sensor(sr)) + sr = NULL; + up(&wf_lock); + return sr; + } + } + up(&wf_lock); + return NULL; +} +EXPORT_SYMBOL_GPL(wf_find_sensor); + +int wf_get_sensor(struct wf_sensor *sr) +{ + if (!try_module_get(sr->ops->owner)) + return -ENODEV; + kref_get(&sr->ref); + return 0; +} +EXPORT_SYMBOL_GPL(wf_get_sensor); + +void wf_put_sensor(struct wf_sensor *sr) +{ + struct module *mod = sr->ops->owner; + kref_put(&sr->ref, wf_sensor_release); + module_put(mod); +} +EXPORT_SYMBOL_GPL(wf_put_sensor); + + +/* + * Client & notification + */ + +int wf_register_client(struct notifier_block *nb) +{ + int rc; + struct wf_control *ct; + struct wf_sensor *sr; + + down(&wf_lock); + rc = notifier_chain_register(&wf_client_list, nb); + if (rc != 0) + goto bail; + wf_client_count++; + 
list_for_each_entry(ct, &wf_controls, link) + wf_notify(WF_EVENT_NEW_CONTROL, ct); + list_for_each_entry(sr, &wf_sensors, link) + wf_notify(WF_EVENT_NEW_SENSOR, sr); + if (wf_client_count == 1) + wf_start_thread(); + bail: + up(&wf_lock); + return rc; +} +EXPORT_SYMBOL_GPL(wf_register_client); + +int wf_unregister_client(struct notifier_block *nb) +{ + down(&wf_lock); + notifier_chain_unregister(&wf_client_list, nb); + wf_client_count--; + if (wf_client_count == 0) + wf_stop_thread(); + up(&wf_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(wf_unregister_client); + +void wf_set_overtemp(void) +{ + down(&wf_lock); + wf_overtemp++; + if (wf_overtemp == 1) { + printk(KERN_WARNING "windfarm: Overtemp condition detected !\n"); + wf_overtemp_counter = 0; + wf_notify(WF_EVENT_OVERTEMP, NULL); + } + up(&wf_lock); +} +EXPORT_SYMBOL_GPL(wf_set_overtemp); + +void wf_clear_overtemp(void) +{ + down(&wf_lock); + WARN_ON(wf_overtemp == 0); + if (wf_overtemp == 0) { + up(&wf_lock); + return; + } + wf_overtemp--; + if (wf_overtemp == 0) { + printk(KERN_WARNING "windfarm: Overtemp condition cleared !\n"); + wf_notify(WF_EVENT_NORMALTEMP, NULL); + } + up(&wf_lock); +} +EXPORT_SYMBOL_GPL(wf_clear_overtemp); + +int wf_is_overtemp(void) +{ + return (wf_overtemp != 0); +} +EXPORT_SYMBOL_GPL(wf_is_overtemp); + +static struct platform_device wf_platform_device = { + .name = "windfarm", +}; + +static int __init windfarm_core_init(void) +{ + DBG("wf: core loaded\n"); + + platform_device_register(&wf_platform_device); + return 0; +} + +static void __exit windfarm_core_exit(void) +{ + BUG_ON(wf_client_count != 0); + + DBG("wf: core unloaded\n"); + + platform_device_unregister(&wf_platform_device); +} + + +module_init(windfarm_core_init); +module_exit(windfarm_core_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("Core component of PowerMac thermal control"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_smu_controls.c 
=================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_smu_controls.c 2005-11-07 13:30:46.000000000 +1100 @@ -0,0 +1,274 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" + +#define VERSION "0.3" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) do { } while(0) +#endif + +/* + * SMU fans control object + */ + +static LIST_HEAD(smu_fans); + +struct smu_fan_control { + struct list_head link; + int fan_type; /* 0 = rpm, 1 = pwm */ + u32 reg; /* index in SMU */ + s32 value; /* current value */ + s32 min, max; /* min/max values */ + struct wf_control ctrl; +}; +#define to_smu_fan(c) container_of(c, struct smu_fan_control, ctrl) + +static int smu_set_fan(int pwm, u8 id, u16 value) +{ + struct smu_cmd cmd; + u8 buffer[16]; + DECLARE_COMPLETION(comp); + int rc; + + /* Fill SMU command structure */ + cmd.cmd = SMU_CMD_FAN_COMMAND; + cmd.data_len = 14; + cmd.reply_len = 16; + cmd.data_buf = cmd.reply_buf = buffer; + cmd.status = 0; + cmd.done = smu_done_complete; + cmd.misc = &comp; + + /* Fill argument buffer */ + memset(buffer, 0, 16); + buffer[0] = pwm ? 
0x10 : 0x00; + buffer[1] = 0x01 << id; + *((u16 *)&buffer[2 + id * 2]) = value; + + rc = smu_queue_cmd(&cmd); + if (rc) + return rc; + wait_for_completion(&comp); + return cmd.status; +} + +static void smu_fan_release(struct wf_control *ct) +{ + struct smu_fan_control *fct = to_smu_fan(ct); + + kfree(fct); +} + +static int smu_fan_set(struct wf_control *ct, s32 value) +{ + struct smu_fan_control *fct = to_smu_fan(ct); + + if (value < fct->min) + value = fct->min; + if (value > fct->max) + value = fct->max; + fct->value = value; + + return smu_set_fan(fct->fan_type, fct->reg, value); +} + +static int smu_fan_get(struct wf_control *ct, s32 *value) +{ + struct smu_fan_control *fct = to_smu_fan(ct); + *value = fct->value; /* todo: read from SMU */ + return 0; +} + +static s32 smu_fan_min(struct wf_control *ct) +{ + struct smu_fan_control *fct = to_smu_fan(ct); + return fct->min; +} + +static s32 smu_fan_max(struct wf_control *ct) +{ + struct smu_fan_control *fct = to_smu_fan(ct); + return fct->max; +} + +static struct wf_control_ops smu_fan_ops = { + .set_value = smu_fan_set, + .get_value = smu_fan_get, + .get_min = smu_fan_min, + .get_max = smu_fan_max, + .release = smu_fan_release, + .owner = THIS_MODULE, +}; + +static struct smu_fan_control *smu_fan_create(struct device_node *node, + int pwm_fan) +{ + struct smu_fan_control *fct; + s32 *v; u32 *reg; + char *l; + + fct = kmalloc(sizeof(struct smu_fan_control), GFP_KERNEL); + if (fct == NULL) + return NULL; + fct->ctrl.ops = &smu_fan_ops; + l = (char *)get_property(node, "location", NULL); + if (l == NULL) + goto fail; + + fct->fan_type = pwm_fan; + fct->ctrl.type = pwm_fan ? WF_CONTROL_PWM_FAN : WF_CONTROL_RPM_FAN; + + /* We use the name & location here the same way we do for SMU sensors, + * see the comment in windfarm_smu_sensors.c. The locations are a bit + * less consistent here between the iMac and the desktop models, but + * that is good enough for our needs for now at least. 
+ * + * One problem though is that Apple seem to be inconsistent with case + * and the kernel doesn't have strcasecmp =P + */ + + fct->ctrl.name = NULL; + + /* Names used on desktop models */ + if (!strcmp(l, "Rear Fan 0") || !strcmp(l, "Rear Fan") || + !strcmp(l, "Rear fan 0") || !strcmp(l, "Rear fan")) + fct->ctrl.name = "cpu-rear-fan-0"; + else if (!strcmp(l, "Rear Fan 1") || !strcmp(l, "Rear fan 1")) + fct->ctrl.name = "cpu-rear-fan-1"; + else if (!strcmp(l, "Front Fan 0") || !strcmp(l, "Front Fan") || + !strcmp(l, "Front fan 0") || !strcmp(l, "Front fan")) + fct->ctrl.name = "cpu-front-fan-0"; + else if (!strcmp(l, "Front Fan 1") || !strcmp(l, "Front fan 1")) + fct->ctrl.name = "cpu-front-fan-1"; + else if (!strcmp(l, "Slots Fan") || !strcmp(l, "Slots fan")) + fct->ctrl.name = "slots-fan"; + else if (!strcmp(l, "Drive Bay") || !strcmp(l, "Drive bay")) + fct->ctrl.name = "drive-bay-fan"; + + /* Names used on iMac models */ + if (!strcmp(l, "System Fan") || !strcmp(l, "System fan")) + fct->ctrl.name = "system-fan"; + else if (!strcmp(l, "CPU Fan") || !strcmp(l, "CPU fan")) + fct->ctrl.name = "cpu-fan"; + else if (!strcmp(l, "Hard Drive") || !strcmp(l, "Hard drive")) + fct->ctrl.name = "drive-bay-fan"; + + /* Unrecognized fan, bail out */ + if (fct->ctrl.name == NULL) + goto fail; + + /* Get min & max values*/ + v = (s32 *)get_property(node, "min-value", NULL); + if (v == NULL) + goto fail; + fct->min = *v; + v = (s32 *)get_property(node, "max-value", NULL); + if (v == NULL) + goto fail; + fct->max = *v; + + /* Get "reg" value */ + reg = (u32 *)get_property(node, "reg", NULL); + if (reg == NULL) + goto fail; + fct->reg = *reg; + + if (wf_register_control(&fct->ctrl)) + goto fail; + + return fct; + fail: + kfree(fct); + return NULL; +} + + +static int __init smu_controls_init(void) +{ + struct device_node *smu, *fans, *fan; + + if (!smu_present()) + return -ENODEV; + + smu = of_find_node_by_type(NULL, "smu"); + if (smu == NULL) + return -ENODEV; + + /* Look for 
RPM fans */ + for (fans = NULL; (fans = of_get_next_child(smu, fans)) != NULL;) + if (!strcmp(fans->name, "rpm-fans")) + break; + for (fan = NULL; + fans && (fan = of_get_next_child(fans, fan)) != NULL;) { + struct smu_fan_control *fct; + + fct = smu_fan_create(fan, 0); + if (fct == NULL) { + printk(KERN_WARNING "windfarm: Failed to create SMU " + "RPM fan %s\n", fan->name); + continue; + } + list_add(&fct->link, &smu_fans); + } + of_node_put(fans); + + + /* Look for PWM fans */ + for (fans = NULL; (fans = of_get_next_child(smu, fans)) != NULL;) + if (!strcmp(fans->name, "pwm-fans")) + break; + for (fan = NULL; + fans && (fan = of_get_next_child(fans, fan)) != NULL;) { + struct smu_fan_control *fct; + + fct = smu_fan_create(fan, 1); + if (fct == NULL) { + printk(KERN_WARNING "windfarm: Failed to create SMU " + "PWM fan %s\n", fan->name); + continue; + } + list_add(&fct->link, &smu_fans); + } + of_node_put(fans); + of_node_put(smu); + + return 0; +} + +static void __exit smu_controls_exit(void) +{ + struct smu_fan_control *fct; + + while (!list_empty(&smu_fans)) { + fct = list_entry(smu_fans.next, struct smu_fan_control, link); + list_del(&fct->link); + wf_unregister_control(&fct->ctrl); + } +} + + +module_init(smu_controls_init); +module_exit(smu_controls_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("SMU control objects for PowerMacs thermal control"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_smu_sensors.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_smu_sensors.c 2005-11-07 13:30:46.000000000 +1100 @@ -0,0 +1,471 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" + +#define VERSION "0.2" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) 
printk(args) +#else +#define DBG(args...) do { } while(0) +#endif + +/* + * Various SMU "partitions" calibration objects for which we + * keep pointers here for use by bits & pieces of the driver + */ +static struct smu_sdbp_cpuvcp *cpuvcp; +static int cpuvcp_version; +static struct smu_sdbp_cpudiode *cpudiode; +static struct smu_sdbp_slotspow *slotspow; +static u8 *debugswitches; + +/* + * SMU basic sensors objects + */ + +static LIST_HEAD(smu_ads); + +struct smu_ad_sensor { + struct list_head link; + u32 reg; /* index in SMU */ + struct wf_sensor sens; +}; +#define to_smu_ads(c) container_of(c, struct smu_ad_sensor, sens) + +static void smu_ads_release(struct wf_sensor *sr) +{ + struct smu_ad_sensor *ads = to_smu_ads(sr); + + kfree(ads); +} + +static int smu_read_adc(u8 id, s32 *value) +{ + struct smu_simple_cmd cmd; + DECLARE_COMPLETION(comp); + int rc; + + rc = smu_queue_simple(&cmd, SMU_CMD_READ_ADC, 1, + smu_done_complete, &comp, id); + if (rc) + return rc; + wait_for_completion(&comp); + if (cmd.cmd.status != 0) + return cmd.cmd.status; + if (cmd.cmd.reply_len != 2) { + printk(KERN_ERR "windfarm: read ADC 0x%x returned %d bytes !\n", + id, cmd.cmd.reply_len); + return -EIO; + } + *value = *((u16 *)cmd.buffer); + return 0; +} + +static int smu_cputemp_get(struct wf_sensor *sr, s32 *value) +{ + struct smu_ad_sensor *ads = to_smu_ads(sr); + int rc; + s32 val; + s64 scaled; + + rc = smu_read_adc(ads->reg, &val); + if (rc) { + printk(KERN_ERR "windfarm: read CPU temp failed, err %d\n", + rc); + return rc; + } + + /* Ok, we have to scale & adjust, taking units into account */ + scaled = (s64)(((u64)val) * (u64)cpudiode->m_value); + scaled >>= 3; + scaled += ((s64)cpudiode->b_value) << 9; + *value = (s32)(scaled << 1); + + return 0; +} + +static int smu_cpuamp_get(struct wf_sensor *sr, s32 *value) +{ + struct smu_ad_sensor *ads = to_smu_ads(sr); + s32 val, scaled; + int rc; + + rc = smu_read_adc(ads->reg, &val); + if (rc) { + printk(KERN_ERR "windfarm: read CPU 
current failed, err %d\n", + rc); + return rc; + } + + /* Ok, we have to scale & adjust, taking units into account */ + scaled = (s32)(val * (u32)cpuvcp->curr_scale); + scaled += (s32)cpuvcp->curr_offset; + *value = scaled << 4; + + return 0; +} + +static int smu_cpuvolt_get(struct wf_sensor *sr, s32 *value) +{ + struct smu_ad_sensor *ads = to_smu_ads(sr); + s32 val, scaled; + int rc; + + rc = smu_read_adc(ads->reg, &val); + if (rc) { + printk(KERN_ERR "windfarm: read CPU voltage failed, err %d\n", + rc); + return rc; + } + + /* Ok, we have to scale & adjust, taking units into account */ + scaled = (s32)(val * (u32)cpuvcp->volt_scale); + scaled += (s32)cpuvcp->volt_offset; + *value = scaled << 4; + + return 0; +} + +static int smu_slotspow_get(struct wf_sensor *sr, s32 *value) +{ + struct smu_ad_sensor *ads = to_smu_ads(sr); + s32 val, scaled; + int rc; + + rc = smu_read_adc(ads->reg, &val); + if (rc) { + printk(KERN_ERR "windfarm: read slots power failed, err %d\n", + rc); + return rc; + } + + /* Ok, we have to scale & adjust, taking units into account */ + scaled = (s32)(val * (u32)slotspow->pow_scale); + scaled += (s32)slotspow->pow_offset; + *value = scaled << 4; + + return 0; +} + + +static struct wf_sensor_ops smu_cputemp_ops = { + .get_value = smu_cputemp_get, + .release = smu_ads_release, + .owner = THIS_MODULE, +}; +static struct wf_sensor_ops smu_cpuamp_ops = { + .get_value = smu_cpuamp_get, + .release = smu_ads_release, + .owner = THIS_MODULE, +}; +static struct wf_sensor_ops smu_cpuvolt_ops = { + .get_value = smu_cpuvolt_get, + .release = smu_ads_release, + .owner = THIS_MODULE, +}; +static struct wf_sensor_ops smu_slotspow_ops = { + .get_value = smu_slotspow_get, + .release = smu_ads_release, + .owner = THIS_MODULE, +}; + + +static struct smu_ad_sensor *smu_ads_create(struct device_node *node) +{ + struct smu_ad_sensor *ads; + char *c, *l; + u32 *v; + + ads = kmalloc(sizeof(struct smu_ad_sensor), GFP_KERNEL); + if (ads == NULL) + return NULL; + c = 
(char *)get_property(node, "device_type", NULL); + l = (char *)get_property(node, "location", NULL); + if (c == NULL || l == NULL) + goto fail; + + /* We currently pick the sensors based on the OF name and location + * properties, while Darwin uses the sensor-id's. + * The problem with the IDs is that they are model specific while it + * looks like apple has been doing a reasonably good job at keeping + * the names and locations consistents so I'll stick with the names + * and locations for now. + */ + if (!strcmp(c, "temp-sensor") && + !strcmp(l, "CPU T-Diode")) { + ads->sens.ops = &smu_cputemp_ops; + ads->sens.name = "cpu-temp"; + } else if (!strcmp(c, "current-sensor") && + !strcmp(l, "CPU Current")) { + ads->sens.ops = &smu_cpuamp_ops; + ads->sens.name = "cpu-current"; + } else if (!strcmp(c, "voltage-sensor") && + !strcmp(l, "CPU Voltage")) { + ads->sens.ops = &smu_cpuvolt_ops; + ads->sens.name = "cpu-voltage"; + } else if (!strcmp(c, "power-sensor") && + !strcmp(l, "Slots Power")) { + ads->sens.ops = &smu_slotspow_ops; + ads->sens.name = "slots-power"; + if (slotspow == NULL) { + DBG("wf: slotspow partition (%02x) not found\n", + SMU_SDB_SLOTSPOW_ID); + goto fail; + } + } else + goto fail; + + v = (u32 *)get_property(node, "reg", NULL); + if (v == NULL) + goto fail; + ads->reg = *v; + + if (wf_register_sensor(&ads->sens)) + goto fail; + return ads; + fail: + kfree(ads); + return NULL; +} + +/* + * SMU Power combo sensor object + */ + +struct smu_cpu_power_sensor { + struct list_head link; + struct wf_sensor *volts; + struct wf_sensor *amps; + int fake_volts : 1; + int quadratic : 1; + struct wf_sensor sens; +}; +#define to_smu_cpu_power(c) container_of(c, struct smu_cpu_power_sensor, sens) + +static struct smu_cpu_power_sensor *smu_cpu_power; + +static void smu_cpu_power_release(struct wf_sensor *sr) +{ + struct smu_cpu_power_sensor *pow = to_smu_cpu_power(sr); + + if (pow->volts) + wf_put_sensor(pow->volts); + if (pow->amps) + wf_put_sensor(pow->amps); + 
kfree(pow); +} + +static int smu_cpu_power_get(struct wf_sensor *sr, s32 *value) +{ + struct smu_cpu_power_sensor *pow = to_smu_cpu_power(sr); + s32 volts, amps, power; + u64 tmps, tmpa, tmpb; + int rc; + + rc = pow->amps->ops->get_value(pow->amps, &amps); + if (rc) + return rc; + + if (pow->fake_volts) { + *value = amps * 12 - 0x30000; + return 0; + } + + rc = pow->volts->ops->get_value(pow->volts, &volts); + if (rc) + return rc; + + power = (s32)((((u64)volts) * ((u64)amps)) >> 16); + if (!pow->quadratic) { + *value = power; + return 0; + } + tmps = (((u64)power) * ((u64)power)) >> 16; + tmpa = ((u64)cpuvcp->power_quads[0]) * tmps; + tmpb = ((u64)cpuvcp->power_quads[1]) * ((u64)power); + *value = (tmpa >> 28) + (tmpb >> 28) + (cpuvcp->power_quads[2] >> 12); + + return 0; +} + +static struct wf_sensor_ops smu_cpu_power_ops = { + .get_value = smu_cpu_power_get, + .release = smu_cpu_power_release, + .owner = THIS_MODULE, +}; + + +static struct smu_cpu_power_sensor * +smu_cpu_power_create(struct wf_sensor *volts, struct wf_sensor *amps) +{ + struct smu_cpu_power_sensor *pow; + + pow = kmalloc(sizeof(struct smu_cpu_power_sensor), GFP_KERNEL); + if (pow == NULL) + return NULL; + pow->sens.ops = &smu_cpu_power_ops; + pow->sens.name = "cpu-power"; + + wf_get_sensor(volts); + pow->volts = volts; + wf_get_sensor(amps); + pow->amps = amps; + + /* Some early machines need a faked voltage */ + if (debugswitches && ((*debugswitches) & 0x80)) { + printk(KERN_INFO "windfarm: CPU Power sensor using faked" + " voltage !\n"); + pow->fake_volts = 1; + } else + pow->fake_volts = 0; + + /* Try to use quadratic transforms on PowerMac8,1 and 9,1 for now, + * I yet have to figure out what's up with 8,2 and will have to + * adjust for later, unless we can 100% trust the SDB partition... 
+ */ + if ((machine_is_compatible("PowerMac8,1") || + machine_is_compatible("PowerMac8,2") || + machine_is_compatible("PowerMac9,1")) && + cpuvcp_version >= 2) { + pow->quadratic = 1; + DBG("windfarm: CPU Power using quadratic transform\n"); + } else + pow->quadratic = 0; + + if (wf_register_sensor(&pow->sens)) + goto fail; + return pow; + fail: + kfree(pow); + return NULL; +} + +static int smu_fetch_param_partitions(void) +{ + struct smu_sdbp_header *hdr; + + /* Get CPU voltage/current/power calibration data */ + hdr = smu_get_sdb_partition(SMU_SDB_CPUVCP_ID, NULL); + if (hdr == NULL) { + DBG("wf: cpuvcp partition (%02x) not found\n", + SMU_SDB_CPUVCP_ID); + return -ENODEV; + } + cpuvcp = (struct smu_sdbp_cpuvcp *)&hdr[1]; + /* Keep version around */ + cpuvcp_version = hdr->version; + + /* Get CPU diode calibration data */ + hdr = smu_get_sdb_partition(SMU_SDB_CPUDIODE_ID, NULL); + if (hdr == NULL) { + DBG("wf: cpudiode partition (%02x) not found\n", + SMU_SDB_CPUDIODE_ID); + return -ENODEV; + } + cpudiode = (struct smu_sdbp_cpudiode *)&hdr[1]; + + /* Get slots power calibration data if any */ + hdr = smu_get_sdb_partition(SMU_SDB_SLOTSPOW_ID, NULL); + if (hdr != NULL) + slotspow = (struct smu_sdbp_slotspow *)&hdr[1]; + + /* Get debug switches if any */ + hdr = smu_get_sdb_partition(SMU_SDB_DEBUG_SWITCHES_ID, NULL); + if (hdr != NULL) + debugswitches = (u8 *)&hdr[1]; + + return 0; +} + +static int __init smu_sensors_init(void) +{ + struct device_node *smu, *sensors, *s; + struct smu_ad_sensor *volt_sensor = NULL, *curr_sensor = NULL; + int rc; + + if (!smu_present()) + return -ENODEV; + + /* Get parameters partitions */ + rc = smu_fetch_param_partitions(); + if (rc) + return rc; + + smu = of_find_node_by_type(NULL, "smu"); + if (smu == NULL) + return -ENODEV; + + /* Look for sensors subdir */ + for (sensors = NULL; + (sensors = of_get_next_child(smu, sensors)) != NULL;) + if (!strcmp(sensors->name, "sensors")) + break; + + of_node_put(smu); + + /* Create basic 
sensors */ + for (s = NULL; + sensors && (s = of_get_next_child(sensors, s)) != NULL;) { + struct smu_ad_sensor *ads; + + ads = smu_ads_create(s); + if (ads == NULL) + continue; + list_add(&ads->link, &smu_ads); + /* keep track of cpu voltage & current */ + if (!strcmp(ads->sens.name, "cpu-voltage")) + volt_sensor = ads; + else if (!strcmp(ads->sens.name, "cpu-current")) + curr_sensor = ads; + } + + of_node_put(sensors); + + /* Create CPU power sensor if possible */ + if (volt_sensor && curr_sensor) + smu_cpu_power = smu_cpu_power_create(&volt_sensor->sens, + &curr_sensor->sens); + + return 0; +} + +static void __exit smu_sensors_exit(void) +{ + struct smu_ad_sensor *ads; + + /* dispose of power sensor */ + if (smu_cpu_power) + wf_unregister_sensor(&smu_cpu_power->sens); + + /* dispose of basic sensors */ + while (!list_empty(&smu_ads)) { + ads = list_entry(smu_ads.next, struct smu_ad_sensor, link); + list_del(&ads->link); + wf_unregister_sensor(&ads->sens); + } +} + + +module_init(smu_sensors_init); +module_exit(smu_sensors_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("SMU sensor objects for PowerMacs thermal control"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_lm75_sensor.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_lm75_sensor.c 2005-11-07 13:30:46.000000000 +1100 @@ -0,0 +1,255 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" + +#define VERSION "0.1" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) 
do { } while(0) +#endif + +struct wf_lm75_sensor { + int ds1775 : 1; + int inited : 1; + struct i2c_client i2c; + struct wf_sensor sens; +}; +#define wf_to_lm75(c) container_of(c, struct wf_lm75_sensor, sens) +#define i2c_to_lm75(c) container_of(c, struct wf_lm75_sensor, i2c) + +static int wf_lm75_attach(struct i2c_adapter *adapter); +static int wf_lm75_detach(struct i2c_client *client); + +static struct i2c_driver wf_lm75_driver = { + .owner = THIS_MODULE, + .name = "wf_lm75", + .flags = I2C_DF_NOTIFY, + .attach_adapter = wf_lm75_attach, + .detach_client = wf_lm75_detach, +}; + +static int wf_lm75_get(struct wf_sensor *sr, s32 *value) +{ + struct wf_lm75_sensor *lm = wf_to_lm75(sr); + s32 data; + + if (lm->i2c.adapter == NULL) + return -ENODEV; + + /* Init chip if necessary */ + if (!lm->inited) { + u8 cfg_new, cfg = (u8)i2c_smbus_read_byte_data(&lm->i2c, 1); + + DBG("wf_lm75: Initializing %s, cfg was: %02x\n", + sr->name, cfg); + + /* clear shutdown bit, keep other settings as left by + * the firmware for now + */ + cfg_new = cfg & ~0x01; + i2c_smbus_write_byte_data(&lm->i2c, 1, cfg_new); + lm->inited = 1; + + /* If we just powered it up, let's wait 200 ms */ + msleep(200); + } + + /* Read temperature register */ + data = (s32)le16_to_cpu(i2c_smbus_read_word_data(&lm->i2c, 0)); + data <<= 8; + *value = data; + + return 0; +} + +static void wf_lm75_release(struct wf_sensor *sr) +{ + struct wf_lm75_sensor *lm = wf_to_lm75(sr); + + /* check if client is registered and detach from i2c */ + if (lm->i2c.adapter) { + i2c_detach_client(&lm->i2c); + lm->i2c.adapter = NULL; + } + + kfree(lm); +} + +static struct wf_sensor_ops wf_lm75_ops = { + .get_value = wf_lm75_get, + .release = wf_lm75_release, + .owner = THIS_MODULE, +}; + +static struct wf_lm75_sensor *wf_lm75_create(struct i2c_adapter *adapter, + u8 addr, int ds1775, + const char *loc) +{ + struct wf_lm75_sensor *lm; + + DBG("wf_lm75: creating %s device at address 0x%02x\n", + ds1775 ? 
"ds1775" : "lm75", addr); + + lm = kmalloc(sizeof(struct wf_lm75_sensor), GFP_KERNEL); + if (lm == NULL) + return NULL; + memset(lm, 0, sizeof(struct wf_lm75_sensor)); + + /* Usual rant about sensor names not beeing very consistent in + * the device-tree, oh well ... + * Add more entries below as you deal with more setups + */ + if (!strcmp(loc, "Hard drive") || !strcmp(loc, "DRIVE BAY")) + lm->sens.name = "hd-temp"; + else + goto fail; + + lm->inited = 0; + lm->sens.ops = &wf_lm75_ops; + lm->ds1775 = ds1775; + lm->i2c.addr = (addr >> 1) & 0x7f; + lm->i2c.adapter = adapter; + lm->i2c.driver = &wf_lm75_driver; + strncpy(lm->i2c.name, lm->sens.name, I2C_NAME_SIZE-1); + + if (i2c_attach_client(&lm->i2c)) { + printk(KERN_ERR "windfarm: failed to attach %s %s to i2c\n", + ds1775 ? "ds1775" : "lm75", lm->i2c.name); + goto fail; + } + + if (wf_register_sensor(&lm->sens)) { + i2c_detach_client(&lm->i2c); + goto fail; + } + + return lm; + fail: + kfree(lm); + return NULL; +} + +static int wf_lm75_attach(struct i2c_adapter *adapter) +{ + u8 bus_id; + struct device_node *smu, *bus, *dev; + + /* We currently only deal with LM75's hanging off the SMU + * i2c busses. If we extend that driver to other/older + * machines, we should split this function into SMU-i2c, + * keywest-i2c, PMU-i2c, ... 
+ */ + + DBG("wf_lm75: adapter %s detected\n", adapter->name); + + if (strncmp(adapter->name, "smu-i2c-", 8) != 0) + return 0; + smu = of_find_node_by_type(NULL, "smu"); + if (smu == NULL) + return 0; + + /* Look for the bus in the device-tree */ + bus_id = (u8)simple_strtoul(adapter->name + 8, NULL, 16); + + DBG("wf_lm75: bus ID is %x\n", bus_id); + + /* Look for sensors subdir */ + for (bus = NULL; + (bus = of_get_next_child(smu, bus)) != NULL;) { + u32 *reg; + + if (strcmp(bus->name, "i2c")) + continue; + reg = (u32 *)get_property(bus, "reg", NULL); + if (reg == NULL) + continue; + if (bus_id == *reg) + break; + } + of_node_put(smu); + if (bus == NULL) { + printk(KERN_WARNING "windfarm: SMU i2c bus 0x%x not found" + " in device-tree !\n", bus_id); + return 0; + } + + DBG("wf_lm75: bus found, looking for device...\n"); + + /* Now look for lm75(s) in there */ + for (dev = NULL; + (dev = of_get_next_child(bus, dev)) != NULL;) { + const char *loc = + get_property(dev, "hwsensor-location", NULL); + u32 *reg = (u32 *)get_property(dev, "reg", NULL); + DBG(" dev: %s... 
(loc: %p, reg: %p)\n", dev->name, loc, reg); + if (loc == NULL || reg == NULL) + continue; + /* real lm75 */ + if (device_is_compatible(dev, "lm75")) + wf_lm75_create(adapter, *reg, 0, loc); + /* ds1775 (compatible, better resolution */ + else if (device_is_compatible(dev, "ds1775")) + wf_lm75_create(adapter, *reg, 1, loc); + } + + of_node_put(bus); + + return 0; +} + +static int wf_lm75_detach(struct i2c_client *client) +{ + struct wf_lm75_sensor *lm = i2c_to_lm75(client); + + DBG("wf_lm75: i2c detatch called for %s\n", lm->sens.name); + + /* Mark client detached */ + lm->i2c.adapter = NULL; + + /* release sensor */ + wf_unregister_sensor(&lm->sens); + + return 0; +} + +static int __init wf_lm75_sensor_init(void) +{ + int rc; + + rc = i2c_add_driver(&wf_lm75_driver); + if (rc < 0) + return rc; + return 0; +} + +static void __exit wf_lm75_sensor_exit(void) +{ + i2c_del_driver(&wf_lm75_driver); +} + + +module_init(wf_lm75_sensor_init); +module_exit(wf_lm75_sensor_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("LM75 sensor objects for PowerMacs thermal control"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_pid.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_pid.c 2005-11-07 13:30:46.000000000 +1100 @@ -0,0 +1,146 @@ +/* + * Windfarm PowerMac thermal control. Generic PID helpers + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + */ + +#include +#include +#include +#include +#include +#include + +#include "windfarm_pid.h" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) 
do { } while(0) +#endif + +void wf_pid_init(struct wf_pid_state *st, struct wf_pid_param *param) +{ + memset(st, 0, sizeof(struct wf_pid_state)); + st->param = *param; + st->first = 1; +} +EXPORT_SYMBOL_GPL(wf_pid_init); + +s32 wf_pid_run(struct wf_pid_state *st, s32 new_sample) +{ + s64 error, integ, deriv; + s32 target; + int i, hlen = st->param.history_len; + + /* Calculate error term */ + error = new_sample - st->param.itarget; + + /* Get samples into our history buffer */ + if (st->first) { + for (i = 0; i < hlen; i++) { + st->samples[i] = new_sample; + st->errors[i] = error; + } + st->first = 0; + st->index = 0; + } else { + st->index = (st->index + 1) % hlen; + st->samples[st->index] = new_sample; + st->errors[st->index] = error; + } + + /* Calculate integral term */ + for (i = 0, integ = 0; i < hlen; i++) + integ += st->errors[(st->index + hlen - i) % hlen]; + integ *= st->param.interval; + + /* Calculate derivative term */ + deriv = st->errors[st->index] - + st->errors[(st->index + hlen - 1) % hlen]; + deriv /= st->param.interval; + + /* Calculate target */ + target = (s32)((integ * (s64)st->param.gr + deriv * (s64)st->param.gd + + error * (s64)st->param.gp) >> 36); + if (st->param.additive) + target += st->target; + target = max(target, st->param.min); + target = min(target, st->param.max); + st->target = target; + + return st->target; +} +EXPORT_SYMBOL_GPL(wf_pid_run); + +void wf_cpu_pid_init(struct wf_cpu_pid_state *st, + struct wf_cpu_pid_param *param) +{ + memset(st, 0, sizeof(struct wf_cpu_pid_state)); + st->param = *param; + st->first = 1; +} +EXPORT_SYMBOL_GPL(wf_cpu_pid_init); + +s32 wf_cpu_pid_run(struct wf_cpu_pid_state *st, s32 new_power, s32 new_temp) +{ + s64 error, integ, deriv, prop; + s32 target, sval, adj; + int i, hlen = st->param.history_len; + + /* Calculate error term */ + error = st->param.pmaxadj - new_power; + + /* Get samples into our history buffer */ + if (st->first) { + for (i = 0; i < hlen; i++) { + st->powers[i] = new_power; 
+ st->errors[i] = error; + } + st->temps[0] = st->temps[1] = new_temp; + st->first = 0; + st->index = st->tindex = 0; + } else { + st->index = (st->index + 1) % hlen; + st->powers[st->index] = new_power; + st->errors[st->index] = error; + st->tindex = (st->tindex + 1) % 2; + st->temps[st->tindex] = new_temp; + } + + /* Calculate integral term */ + for (i = 0, integ = 0; i < hlen; i++) + integ += st->errors[(st->index + hlen - i) % hlen]; + integ *= st->param.interval; + integ *= st->param.gr; + sval = st->param.tmax - ((integ >> 20) & 0xffffffff); + adj = min(st->param.ttarget, sval); + + DBG("integ: %lx, sval: %lx, adj: %lx\n", integ, sval, adj); + + /* Calculate derivative term */ + deriv = st->temps[st->tindex] - + st->temps[(st->tindex + 2 - 1) % 2]; + deriv /= st->param.interval; + deriv *= st->param.gd; + + /* Calculate proportional term */ + prop = (new_temp - adj); + prop *= st->param.gp; + + DBG("deriv: %lx, prop: %lx\n", deriv, prop); + + /* Calculate target */ + target = st->target + (s32)((deriv + prop) >> 36); + target = max(target, st->param.min); + target = min(target, st->param.max); + st->target = target; + + return st->target; +} +EXPORT_SYMBOL_GPL(wf_cpu_pid_run); Index: linux-work/drivers/macintosh/windfarm_pid.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_pid.h 2005-11-07 13:30:46.000000000 +1100 @@ -0,0 +1,84 @@ +/* + * Windfarm PowerMac thermal control. Generic PID helpers + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + * + * This is a pair of generic PID helpers that can be used by + * control loops. 
One is the basic PID implementation, the + * other one is more specifically tailored to the loops used + * for CPU control with 2 input sample types (temp and power) + */ + +/* + * *** Simple PID *** + */ + +#define WF_PID_MAX_HISTORY 32 + +/* This parameter array is passed to the PID algorithm. Currently, + * we don't support changing parameters on the fly as it's not needed + * but could be implemented (with necessary adjustment of the history + * buffer + */ +struct wf_pid_param { + int interval; /* Interval between samples in seconds */ + int history_len; /* Size of history buffer */ + int additive; /* 1: target relative to previous value */ + s32 gd, gp, gr; /* PID gains */ + s32 itarget; /* PID input target */ + s32 min,max; /* min and max target values */ +}; + +struct wf_pid_state { + int first; /* first run of the loop */ + int index; /* index of current sample */ + s32 target; /* current target value */ + s32 samples[WF_PID_MAX_HISTORY]; /* samples history buffer */ + s32 errors[WF_PID_MAX_HISTORY]; /* error history buffer */ + + struct wf_pid_param param; +}; + +extern void wf_pid_init(struct wf_pid_state *st, struct wf_pid_param *param); +extern s32 wf_pid_run(struct wf_pid_state *st, s32 sample); + + +/* + * *** CPU PID *** + */ + +#define WF_CPU_PID_MAX_HISTORY 32 + +/* This parameter array is passed to the CPU PID algorithm. 
Currently, + * we don't support changing parameters on the fly as it's not needed + * but could be implemented (with necessary adjustment of the history + * buffer + */ +struct wf_cpu_pid_param { + int interval; /* Interval between samples in seconds */ + int history_len; /* Size of history buffer */ + s32 gd, gp, gr; /* PID gains */ + s32 pmaxadj; /* PID max power adjust */ + s32 ttarget; /* PID input target */ + s32 tmax; /* PID input max */ + s32 min,max; /* min and max target values */ +}; + +struct wf_cpu_pid_state { + int first; /* first run of the loop */ + int index; /* index of current power */ + int tindex; /* index of current temp */ + s32 target; /* current target value */ + s32 powers[WF_PID_MAX_HISTORY]; /* power history buffer */ + s32 errors[WF_PID_MAX_HISTORY]; /* error history buffer */ + s32 temps[2]; /* temp. history buffer */ + + struct wf_cpu_pid_param param; +}; + +extern void wf_cpu_pid_init(struct wf_cpu_pid_state *st, + struct wf_cpu_pid_param *param); +extern s32 wf_cpu_pid_run(struct wf_cpu_pid_state *st, s32 power, s32 temp); Index: linux-work/drivers/macintosh/windfarm_cpufreq_clamp.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_cpufreq_clamp.c 2005-11-07 13:30:46.000000000 +1100 @@ -0,0 +1,105 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" + +#define VERSION "0.3" + +static int clamped; +static struct wf_control *clamp_control; + +static int clamp_notifier_call(struct notifier_block *self, + unsigned long event, void *data) +{ + struct cpufreq_policy *p = data; + unsigned long max_freq; + + if (event != CPUFREQ_ADJUST) + return 0; + + max_freq = clamped ? 
(p->cpuinfo.min_freq) : (p->cpuinfo.max_freq); + cpufreq_verify_within_limits(p, 0, max_freq); + + return 0; +} + +static struct notifier_block clamp_notifier = { + .notifier_call = clamp_notifier_call, +}; + +static int clamp_set(struct wf_control *ct, s32 value) +{ + if (value) + printk(KERN_INFO "windfarm: Clamping CPU frequency to " + "minimum !\n"); + else + printk(KERN_INFO "windfarm: CPU frequency unclamped !\n"); + clamped = value; + cpufreq_update_policy(0); + return 0; +} + +static int clamp_get(struct wf_control *ct, s32 *value) +{ + *value = clamped; + return 0; +} + +static s32 clamp_min(struct wf_control *ct) +{ + return 0; +} + +static s32 clamp_max(struct wf_control *ct) +{ + return 1; +} + +static struct wf_control_ops clamp_ops = { + .set_value = clamp_set, + .get_value = clamp_get, + .get_min = clamp_min, + .get_max = clamp_max, + .owner = THIS_MODULE, +}; + +static int __init wf_cpufreq_clamp_init(void) +{ + struct wf_control *clamp; + + clamp = kmalloc(sizeof(struct wf_control), GFP_KERNEL); + if (clamp == NULL) + return -ENOMEM; + cpufreq_register_notifier(&clamp_notifier, CPUFREQ_POLICY_NOTIFIER); + clamp->ops = &clamp_ops; + clamp->name = "cpufreq-clamp"; + if (wf_register_control(clamp)) + goto fail; + clamp_control = clamp; + return 0; + fail: + kfree(clamp); + return -ENODEV; +} + +static void __exit wf_cpufreq_clamp_exit(void) +{ + if (clamp_control) + wf_unregister_control(clamp_control); +} + + +module_init(wf_cpufreq_clamp_init); +module_exit(wf_cpufreq_clamp_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("CPU frequency clamp for PowerMacs thermal control"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_pm81.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_pm81.c 2005-11-07 13:35:04.000000000 +1100 @@ -0,0 +1,881 @@ +/* + * Windfarm PowerMac thermal control. 
iMac G5 + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + * + * The algorithm used is the PID control algorithm, used the same + * way the published Darwin code does, using the same values that + * are present in the Darwin 8.2 snapshot property lists (note however + * that none of the code has been re-used, it's a complete re-implementation). + * + * The various control loops found in Darwin config file are: + * + * PowerMac8,1 and PowerMac8,2 + * =========================== + * + * System Fans control loop. Different based on models. In addition to the + * usual PID algorithm, the control loop gets 2 additional pairs of linear + * scaling factors (scale/offsets) expressed as 4.12 fixed point values + * (signed offset, unsigned scale) + * + * The targets are modified as follows: + * - the linked control (second control) gets the target value as-is + * (typically the drive fan) + * - the main control (first control) gets the target value scaled with + * the first pair of factors, and is then modified as below + * - the value of the target of the CPU Fan control loop is retrieved, + * scaled with the second pair of factors, and the max of that and + * the scaled target is applied to the main control. 
+ * + * # model_id: 2 + * controls : system-fan, drive-bay-fan + * sensors : hd-temp + * PID params : G_d = 0x15400000 + * G_p = 0x00200000 + * G_r = 0x000002fd + * History = 2 entries + * Input target = 0x3a0000 + * Interval = 5s + * linear-factors : offset = 0xff38 scale = 0x0ccd + * offset = 0x0208 scale = 0x07ae + * + * # model_id: 3 + * controls : system-fan, drive-bay-fan + * sensors : hd-temp + * PID params : G_d = 0x08e00000 + * G_p = 0x00566666 + * G_r = 0x0000072b + * History = 2 entries + * Input target = 0x350000 + * Interval = 5s + * linear-factors : offset = 0xff38 scale = 0x0ccd + * offset = 0x0000 scale = 0x0000 + * + * # model_id: 5 + * controls : system-fan + * sensors : hd-temp + * PID params : G_d = 0x15400000 + * G_p = 0x00233333 + * G_r = 0x000002fd + * History = 2 entries + * Input target = 0x3a0000 + * Interval = 5s + * linear-factors : offset = 0x0000 scale = 0x1000 + * offset = 0x0091 scale = 0x0bae + * + * CPU Fan control loop. The loop is identical for all models. It + * has an additional pair of scaling factors. This is used to scale the + * systems fan control loop target result (the one before it gets scaled + * by the System Fans control loop itself). Then, the max value of the + * calculated target value and system fan value is sent to the fans + * + * controls : cpu-fan + * sensors : cpu-temp cpu-power + * PID params : From SMU sdb partition + * linear-factors : offset = 0xfb50 scale = 0x1000 + * + * CPU Slew control loop. Not implemented. The cpufreq driver in linux is + * completely separate for now, though we could find a way to link it, either + * as a client reacting to overtemp notifications, or directly monitoring + * the CPU temperature + * + * WARNING ! The CPU control loop requires the CPU tmax for the current + * operating point. However, we currently are completely separated from + * the cpufreq driver and thus do not know what the current operating + * point is. 
Fortunately, we also do not have any hardware supporting anything + * but operating point 0 at the moment, thus we just peek that value directly + * from the SDB partition. If we ever end up with actually slewing the system + * clock and thus changing operating points, we'll have to find a way to + * communicate with the CPU freq driver; + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" +#include "windfarm_pid.h" + +#define VERSION "0.4" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) do { } while(0) +#endif + +/* define this to force CPU overtemp to 74 degree, useful for testing + * the overtemp code + */ +#undef HACKED_OVERTEMP + +static int wf_smu_mach_model; /* machine model id */ + +static struct device *wf_smu_dev; + +/* Controls & sensors */ +static struct wf_sensor *sensor_cpu_power; +static struct wf_sensor *sensor_cpu_temp; +static struct wf_sensor *sensor_hd_temp; +static struct wf_control *fan_cpu_main; +static struct wf_control *fan_hd; +static struct wf_control *fan_system; +static struct wf_control *cpufreq_clamp; + +/* Set to kick the control loop into life */ +static int wf_smu_all_controls_ok, wf_smu_all_sensors_ok, wf_smu_started; + +/* Failure handling.. could be nicer */ +#define FAILURE_FAN 0x01 +#define FAILURE_SENSOR 0x02 +#define FAILURE_OVERTEMP 0x04 + +static unsigned int wf_smu_failure_state; +static int wf_smu_readjust, wf_smu_skipping; + +/* + * ****** System Fans Control Loop ****** + * + */ + +/* Parameters for the System Fans control loop. Parameters + * not in this table such as interval, history size, ... + * are common to all versions and thus hard coded for now. 
+ */ +struct wf_smu_sys_fans_param { + int model_id; + s32 itarget; + s32 gd, gp, gr; + + s16 offset0; + u16 scale0; + s16 offset1; + u16 scale1; +}; + +#define WF_SMU_SYS_FANS_INTERVAL 5 +#define WF_SMU_SYS_FANS_HISTORY_SIZE 2 + +/* State data used by the system fans control loop + */ +struct wf_smu_sys_fans_state { + int ticks; + s32 sys_setpoint; + s32 hd_setpoint; + s16 offset0; + u16 scale0; + s16 offset1; + u16 scale1; + struct wf_pid_state pid; +}; + +/* + * Configs for SMU System Fan control loop + */ +static struct wf_smu_sys_fans_param wf_smu_sys_all_params[] = { + /* Model ID 2 */ + { + .model_id = 2, + .itarget = 0x3a0000, + .gd = 0x15400000, + .gp = 0x00200000, + .gr = 0x000002fd, + .offset0 = 0xff38, + .scale0 = 0x0ccd, + .offset1 = 0x0208, + .scale1 = 0x07ae, + }, + /* Model ID 3 */ + { + .model_id = 3, + .itarget = 0x350000, + .gd = 0x08e00000, + .gp = 0x00566666, + .gr = 0x0000072b, + .offset0 = 0xff38, + .scale0 = 0x0ccd, + .offset1 = 0x0000, + .scale1 = 0x0000, + }, + /* Model ID 5 */ + { + .model_id = 5, + .itarget = 0x3a0000, + .gd = 0x15400000, + .gp = 0x00233333, + .gr = 0x000002fd, + .offset0 = 0x0000, + .scale0 = 0x1000, + .offset1 = 0x0091, + .scale1 = 0x0bae, + }, +}; +#define WF_SMU_SYS_FANS_NUM_CONFIGS ARRAY_SIZE(wf_smu_sys_all_params) + +static struct wf_smu_sys_fans_state *wf_smu_sys_fans; + +/* + * ****** CPU Fans Control Loop ****** + * + */ + + +#define WF_SMU_CPU_FANS_INTERVAL 1 +#define WF_SMU_CPU_FANS_MAX_HISTORY 16 +#define WF_SMU_CPU_FANS_SIBLING_SCALE 0x00001000 +#define WF_SMU_CPU_FANS_SIBLING_OFFSET 0xfffffb50 + +/* State data used by the cpu fans control loop + */ +struct wf_smu_cpu_fans_state { + int ticks; + s32 cpu_setpoint; + s32 scale; + s32 offset; + struct wf_cpu_pid_state pid; +}; + +static struct wf_smu_cpu_fans_state *wf_smu_cpu_fans; + + + +/* + * ***** Implementation ***** + * + */ + +static void wf_smu_create_sys_fans(void) +{ + struct wf_smu_sys_fans_param *param = NULL; + struct wf_pid_param pid_param; + int 
i; + + /* First, locate the params for this model */ + for (i = 0; i < WF_SMU_SYS_FANS_NUM_CONFIGS; i++) + if (wf_smu_sys_all_params[i].model_id == wf_smu_mach_model) { + param = &wf_smu_sys_all_params[i]; + break; + } + + /* No params found, put fans to max */ + if (param == NULL) { + printk(KERN_WARNING "windfarm: System fan config not found " + "for this machine model, max fan speed\n"); + goto fail; + } + + /* Alloc & initialize state */ + wf_smu_sys_fans = kmalloc(sizeof(struct wf_smu_sys_fans_state), + GFP_KERNEL); + if (wf_smu_sys_fans == NULL) { + printk(KERN_WARNING "windfarm: Memory allocation error" + " max fan speed\n"); + goto fail; + } + wf_smu_sys_fans->ticks = 1; + wf_smu_sys_fans->scale0 = param->scale0; + wf_smu_sys_fans->offset0 = param->offset0; + wf_smu_sys_fans->scale1 = param->scale1; + wf_smu_sys_fans->offset1 = param->offset1; + + /* Fill PID params */ + pid_param.gd = param->gd; + pid_param.gp = param->gp; + pid_param.gr = param->gr; + pid_param.interval = WF_SMU_SYS_FANS_INTERVAL; + pid_param.history_len = WF_SMU_SYS_FANS_HISTORY_SIZE; + pid_param.itarget = param->itarget; + pid_param.min = fan_system->ops->get_min(fan_system); + pid_param.max = fan_system->ops->get_max(fan_system); + if (fan_hd) { + pid_param.min = + max(pid_param.min,fan_hd->ops->get_min(fan_hd)); + pid_param.max = + min(pid_param.max,fan_hd->ops->get_max(fan_hd)); + } + wf_pid_init(&wf_smu_sys_fans->pid, &pid_param); + + DBG("wf: System Fan control initialized.\n"); + DBG(" itarged=%d.%03d, min=%d RPM, max=%d RPM\n", + FIX32TOPRINT(pid_param.itarget), pid_param.min, pid_param.max); + return; + + fail: + + if (fan_system) + wf_control_set_max(fan_system); + if (fan_hd) + wf_control_set_max(fan_hd); +} + +static void wf_smu_sys_fans_tick(struct wf_smu_sys_fans_state *st) +{ + s32 new_setpoint, temp, scaled, cputarget; + int rc; + + if (--st->ticks != 0) { + if (wf_smu_readjust) + goto readjust; + return; + } + st->ticks = WF_SMU_SYS_FANS_INTERVAL; + + rc = 
sensor_hd_temp->ops->get_value(sensor_hd_temp, &temp); + if (rc) { + printk(KERN_WARNING "windfarm: HD temp sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + DBG("wf_smu: System Fans tick ! HD temp: %d.%03d\n", + FIX32TOPRINT(temp)); + + if (temp > (st->pid.param.itarget + 0x50000)) + wf_smu_failure_state |= FAILURE_OVERTEMP; + + new_setpoint = wf_pid_run(&st->pid, temp); + + DBG("wf_smu: new_setpoint: %d RPM\n", (int)new_setpoint); + + scaled = ((((s64)new_setpoint) * (s64)st->scale0) >> 12) + st->offset0; + + DBG("wf_smu: scaled setpoint: %d RPM\n", (int)scaled); + + cputarget = wf_smu_cpu_fans ? wf_smu_cpu_fans->pid.target : 0; + cputarget = ((((s64)cputarget) * (s64)st->scale1) >> 12) + st->offset1; + scaled = max(scaled, cputarget); + scaled = max(scaled, st->pid.param.min); + scaled = min(scaled, st->pid.param.max); + + DBG("wf_smu: adjusted setpoint: %d RPM\n", (int)scaled); + + if (st->sys_setpoint == scaled && new_setpoint == st->hd_setpoint) + return; + st->sys_setpoint = scaled; + st->hd_setpoint = new_setpoint; + readjust: + if (fan_system && wf_smu_failure_state == 0) { + rc = fan_system->ops->set_value(fan_system, st->sys_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: Sys fan error %d\n", + rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } + if (fan_hd && wf_smu_failure_state == 0) { + rc = fan_hd->ops->set_value(fan_hd, st->hd_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: HD fan error %d\n", + rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } +} + +static void wf_smu_create_cpu_fans(void) +{ + struct wf_cpu_pid_param pid_param; + struct smu_sdbp_header *hdr; + struct smu_sdbp_cpupiddata *piddata; + struct smu_sdbp_fvt *fvt; + s32 tmax, tdelta, maxpow, powadj; + + /* First, locate the PID params in SMU SBD */ + hdr = smu_get_sdb_partition(SMU_SDB_CPUPIDDATA_ID, NULL); + if (hdr == 0) { + printk(KERN_WARNING "windfarm: CPU PID fan config not found " + "max fan speed\n"); + goto fail; + } + 
piddata = (struct smu_sdbp_cpupiddata *)&hdr[1]; + + /* Get the FVT params for operating point 0 (the only supported one + * for now) in order to get tmax + */ + hdr = smu_get_sdb_partition(SMU_SDB_FVT_ID, NULL); + if (hdr) { + fvt = (struct smu_sdbp_fvt *)&hdr[1]; + tmax = ((s32)fvt->maxtemp) << 16; + } else + tmax = 0x5e0000; /* 94 degree default */ + + /* Alloc & initialize state */ + wf_smu_cpu_fans = kmalloc(sizeof(struct wf_smu_cpu_fans_state), + GFP_KERNEL); + if (wf_smu_cpu_fans == NULL) + goto fail; + wf_smu_cpu_fans->ticks = 1; + + wf_smu_cpu_fans->scale = WF_SMU_CPU_FANS_SIBLING_SCALE; + wf_smu_cpu_fans->offset = WF_SMU_CPU_FANS_SIBLING_OFFSET; + + /* Fill PID params */ + pid_param.interval = WF_SMU_CPU_FANS_INTERVAL; + pid_param.history_len = piddata->history_len; + if (pid_param.history_len > WF_CPU_PID_MAX_HISTORY) { + printk(KERN_WARNING "windfarm: History size overflow on " + "CPU control loop (%d)\n", piddata->history_len); + pid_param.history_len = WF_CPU_PID_MAX_HISTORY; + } + pid_param.gd = piddata->gd; + pid_param.gp = piddata->gp; + pid_param.gr = piddata->gr / pid_param.history_len; + + tdelta = ((s32)piddata->target_temp_delta) << 16; + maxpow = ((s32)piddata->max_power) << 16; + powadj = ((s32)piddata->power_adj) << 16; + + pid_param.tmax = tmax; + pid_param.ttarget = tmax - tdelta; + pid_param.pmaxadj = maxpow - powadj; + + pid_param.min = fan_cpu_main->ops->get_min(fan_cpu_main); + pid_param.max = fan_cpu_main->ops->get_max(fan_cpu_main); + + wf_cpu_pid_init(&wf_smu_cpu_fans->pid, &pid_param); + + DBG("wf: CPU Fan control initialized.\n"); + DBG(" ttarged=%d.%03d, tmax=%d.%03d, min=%d RPM, max=%d RPM\n", + FIX32TOPRINT(pid_param.ttarget), FIX32TOPRINT(pid_param.tmax), + pid_param.min, pid_param.max); + + return; + + fail: + printk(KERN_WARNING "windfarm: CPU fan config not found\n" + "for this machine model, max fan speed\n"); + + if (cpufreq_clamp) + wf_control_set_max(cpufreq_clamp); + if (fan_cpu_main) + 
wf_control_set_max(fan_cpu_main); +} + +static void wf_smu_cpu_fans_tick(struct wf_smu_cpu_fans_state *st) +{ + s32 new_setpoint, temp, power, systarget; + int rc; + + if (--st->ticks != 0) { + if (wf_smu_readjust) + goto readjust; + return; + } + st->ticks = WF_SMU_CPU_FANS_INTERVAL; + + rc = sensor_cpu_temp->ops->get_value(sensor_cpu_temp, &temp); + if (rc) { + printk(KERN_WARNING "windfarm: CPU temp sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + rc = sensor_cpu_power->ops->get_value(sensor_cpu_power, &power); + if (rc) { + printk(KERN_WARNING "windfarm: CPU power sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + DBG("wf_smu: CPU Fans tick ! CPU temp: %d.%03d, power: %d.%03d\n", + FIX32TOPRINT(temp), FIX32TOPRINT(power)); + +#ifdef HACKED_OVERTEMP + if (temp > 0x4a0000) + wf_smu_failure_state |= FAILURE_OVERTEMP; +#else + if (temp > st->pid.param.tmax) + wf_smu_failure_state |= FAILURE_OVERTEMP; +#endif + new_setpoint = wf_cpu_pid_run(&st->pid, power, temp); + + DBG("wf_smu: new_setpoint: %d RPM\n", (int)new_setpoint); + + systarget = wf_smu_sys_fans ? 
wf_smu_sys_fans->pid.target : 0; + systarget = ((((s64)systarget) * (s64)st->scale) >> 12) + + st->offset; + new_setpoint = max(new_setpoint, systarget); + new_setpoint = max(new_setpoint, st->pid.param.min); + new_setpoint = min(new_setpoint, st->pid.param.max); + + DBG("wf_smu: adjusted setpoint: %d RPM\n", (int)new_setpoint); + + if (st->cpu_setpoint == new_setpoint) + return; + st->cpu_setpoint = new_setpoint; + readjust: + if (fan_cpu_main && wf_smu_failure_state == 0) { + rc = fan_cpu_main->ops->set_value(fan_cpu_main, + st->cpu_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: CPU main fan" + " error %d\n", rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } +} + + +/* + * ****** Attributes ****** + * + */ + +#define BUILD_SHOW_FUNC_FIX(name, data) \ +static ssize_t show_##name(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + ssize_t r; \ + s32 val = 0; \ + data->ops->get_value(data, &val); \ + r = sprintf(buf, "%d.%03d", FIX32TOPRINT(val)); \ + return r; \ +} \ +static DEVICE_ATTR(name,S_IRUGO,show_##name, NULL); + + +#define BUILD_SHOW_FUNC_INT(name, data) \ +static ssize_t show_##name(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + s32 val = 0; \ + data->ops->get_value(data, &val); \ + return sprintf(buf, "%d", val); \ +} \ +static DEVICE_ATTR(name,S_IRUGO,show_##name, NULL); + +BUILD_SHOW_FUNC_INT(cpu_fan, fan_cpu_main); +BUILD_SHOW_FUNC_INT(sys_fan, fan_system); +BUILD_SHOW_FUNC_INT(hd_fan, fan_hd); + +BUILD_SHOW_FUNC_FIX(cpu_temp, sensor_cpu_temp); +BUILD_SHOW_FUNC_FIX(cpu_power, sensor_cpu_power); +BUILD_SHOW_FUNC_FIX(hd_temp, sensor_hd_temp); + +/* + * ****** Setup / Init / Misc ... 
****** + * + */ + +static void wf_smu_tick(void) +{ + unsigned int last_failure = wf_smu_failure_state; + unsigned int new_failure; + + if (!wf_smu_started) { + DBG("wf: creating control loops !\n"); + wf_smu_create_sys_fans(); + wf_smu_create_cpu_fans(); + wf_smu_started = 1; + } + + /* Skipping ticks */ + if (wf_smu_skipping && --wf_smu_skipping) + return; + + wf_smu_failure_state = 0; + if (wf_smu_sys_fans) + wf_smu_sys_fans_tick(wf_smu_sys_fans); + if (wf_smu_cpu_fans) + wf_smu_cpu_fans_tick(wf_smu_cpu_fans); + + wf_smu_readjust = 0; + new_failure = wf_smu_failure_state & ~last_failure; + + /* If entering failure mode, clamp cpufreq and ramp all + * fans to full speed. + */ + if (wf_smu_failure_state && !last_failure) { + if (cpufreq_clamp) + wf_control_set_max(cpufreq_clamp); + if (fan_system) + wf_control_set_max(fan_system); + if (fan_cpu_main) + wf_control_set_max(fan_cpu_main); + if (fan_hd) + wf_control_set_max(fan_hd); + } + + /* If leaving failure mode, unclamp cpufreq and readjust + * all fans on next iteration + */ + if (!wf_smu_failure_state && last_failure) { + if (cpufreq_clamp) + wf_control_set_min(cpufreq_clamp); + wf_smu_readjust = 1; + } + + /* Overtemp condition detected, notify and start skipping a couple + * ticks to let the temperature go down + */ + if (new_failure & FAILURE_OVERTEMP) { + wf_set_overtemp(); + wf_smu_skipping = 2; + } + + /* We only clear the overtemp condition if overtemp is cleared + * _and_ no other failure is present. 
Since a sensor error will + * clear the overtemp condition (can't measure temperature) at + * the control loop levels, but we don't want to keep it clear + * here in this case + */ + if (new_failure == 0 && last_failure & FAILURE_OVERTEMP) + wf_clear_overtemp(); +} + +static void wf_smu_new_control(struct wf_control *ct) +{ + if (wf_smu_all_controls_ok) + return; + + if (fan_cpu_main == NULL && !strcmp(ct->name, "cpu-fan")) { + if (wf_get_control(ct) == 0) { + fan_cpu_main = ct; + device_create_file(wf_smu_dev, &dev_attr_cpu_fan); + } + } + + if (fan_system == NULL && !strcmp(ct->name, "system-fan")) { + if (wf_get_control(ct) == 0) { + fan_system = ct; + device_create_file(wf_smu_dev, &dev_attr_sys_fan); + } + } + + if (cpufreq_clamp == NULL && !strcmp(ct->name, "cpufreq-clamp")) { + if (wf_get_control(ct) == 0) + cpufreq_clamp = ct; + } + + /* Darwin property list says the HD fan is only for model ID + * 0, 1, 2 and 3 + */ + + if (wf_smu_mach_model > 3) { + if (fan_system && fan_cpu_main && cpufreq_clamp) + wf_smu_all_controls_ok = 1; + return; + } + + if (fan_hd == NULL && !strcmp(ct->name, "drive-bay-fan")) { + if (wf_get_control(ct) == 0) { + fan_hd = ct; + device_create_file(wf_smu_dev, &dev_attr_hd_fan); + } + } + + if (fan_system && fan_hd && fan_cpu_main && cpufreq_clamp) + wf_smu_all_controls_ok = 1; +} + +static void wf_smu_new_sensor(struct wf_sensor *sr) +{ + if (wf_smu_all_sensors_ok) + return; + + if (sensor_cpu_power == NULL && !strcmp(sr->name, "cpu-power")) { + if (wf_get_sensor(sr) == 0) { + sensor_cpu_power = sr; + device_create_file(wf_smu_dev, &dev_attr_cpu_power); + } + } + + if (sensor_cpu_temp == NULL && !strcmp(sr->name, "cpu-temp")) { + if (wf_get_sensor(sr) == 0) { + sensor_cpu_temp = sr; + device_create_file(wf_smu_dev, &dev_attr_cpu_temp); + } + } + + if (sensor_hd_temp == NULL && !strcmp(sr->name, "hd-temp")) { + if (wf_get_sensor(sr) == 0) { + sensor_hd_temp = sr; + device_create_file(wf_smu_dev, &dev_attr_hd_temp); + } + } + + if 
(sensor_cpu_power && sensor_cpu_temp && sensor_hd_temp) + wf_smu_all_sensors_ok = 1; +} + + +static int wf_smu_notify(struct notifier_block *self, + unsigned long event, void *data) +{ + switch(event) { + case WF_EVENT_NEW_CONTROL: + DBG("wf: new control %s detected\n", + ((struct wf_control *)data)->name); + wf_smu_new_control(data); + wf_smu_readjust = 1; + break; + case WF_EVENT_NEW_SENSOR: + DBG("wf: new sensor %s detected\n", + ((struct wf_sensor *)data)->name); + wf_smu_new_sensor(data); + break; + case WF_EVENT_TICK: + if (wf_smu_all_controls_ok && wf_smu_all_sensors_ok) + wf_smu_tick(); + } + + return 0; +} + +static struct notifier_block wf_smu_events = { + .notifier_call = wf_smu_notify, +}; + +static int wf_init_pm(void) +{ + struct smu_sdbp_header *hdr; + + hdr = smu_get_sdb_partition(SMU_SDB_SENSORTREE_ID, NULL); + if (hdr != 0) { + struct smu_sdbp_sensortree *st = + (struct smu_sdbp_sensortree *)&hdr[1]; + wf_smu_mach_model = st->model_id; + } + + printk(KERN_INFO "windfarm: Initializing for iMacG5 model ID %d\n", + wf_smu_mach_model); + + return 0; +} + +static int wf_smu_probe(struct device *ddev) +{ + wf_smu_dev = ddev; + + wf_register_client(&wf_smu_events); + + return 0; +} + +static int wf_smu_remove(struct device *ddev) +{ + wf_unregister_client(&wf_smu_events); + + /* XXX We don't have yet a guarantee that our callback isn't + * in progress when returning from wf_unregister_client, so + * we add an arbitrary delay. I'll have to fix that in the core + */ + msleep(1000); + + /* Release all sensors */ + /* One more crappy race: I don't think we have any guarantee here + * that the attribute callback won't race with the sensor beeing + * disposed of, and I'm not 100% certain what best way to deal + * with that except by adding locks all over... I'll do that + * eventually but heh, who ever rmmod this module anyway ? 
+ */ + if (sensor_cpu_power) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_power); + wf_put_sensor(sensor_cpu_power); + } + if (sensor_cpu_temp) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_temp); + wf_put_sensor(sensor_cpu_temp); + } + if (sensor_hd_temp) { + device_remove_file(wf_smu_dev, &dev_attr_hd_temp); + wf_put_sensor(sensor_hd_temp); + } + + /* Release all controls */ + if (fan_cpu_main) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_fan); + wf_put_control(fan_cpu_main); + } + if (fan_hd) { + device_remove_file(wf_smu_dev, &dev_attr_hd_fan); + wf_put_control(fan_hd); + } + if (fan_system) { + device_remove_file(wf_smu_dev, &dev_attr_sys_fan); + wf_put_control(fan_system); + } + if (cpufreq_clamp) + wf_put_control(cpufreq_clamp); + + /* Destroy control loops state structures */ + if (wf_smu_sys_fans) + kfree(wf_smu_sys_fans); + if (wf_smu_cpu_fans) + kfree(wf_smu_cpu_fans); + + wf_smu_dev = NULL; + + return 0; +} + +static struct device_driver wf_smu_driver = { + .name = "windfarm", + .bus = &platform_bus_type, + .probe = wf_smu_probe, + .remove = wf_smu_remove, +}; + + +static int __init wf_smu_init(void) +{ + int rc = -ENODEV; + + if (machine_is_compatible("PowerMac8,1") || + machine_is_compatible("PowerMac8,2")) + rc = wf_init_pm(); + + if (rc == 0) { +#ifdef MODULE + request_module("windfarm_smu_controls"); + request_module("windfarm_smu_sensors"); + request_module("windfarm_lm75_sensor"); + +#endif /* MODULE */ + driver_register(&wf_smu_driver); + } + + return rc; +} + +static void __exit wf_smu_exit(void) +{ + + driver_unregister(&wf_smu_driver); +} + + +module_init(wf_smu_init); +module_exit(wf_smu_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("Thermal control logic for iMac G5"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_pm91.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 
linux-work/drivers/macintosh/windfarm_pm91.c 2005-11-07 13:35:12.000000000 +1100 @@ -0,0 +1,816 @@ +/* + * Windfarm PowerMac thermal control. SMU based 1 CPU desktop control loops + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + * + * The algorithm used is the PID control algorithm, used the same + * way the published Darwin code does, using the same values that + * are present in the Darwin 8.2 snapshot property lists (note however + * that none of the code has been re-used, it's a complete re-implementation + * + * The various control loops found in Darwin config file are: + * + * PowerMac9,1 + * =========== + * + * Has 3 control loops: CPU fans is similar to PowerMac8,1 (though it doesn't + * try to play with other control loops fans). Drive bay is rather basic PID + * with one sensor and one fan. Slots area is a bit different as the Darwin + * driver is supposed to be capable of working in a special "AGP" mode which + * involves the presence of an AGP sensor and an AGP fan (possibly on the + * AGP card itself). I can't deal with that special mode as I don't have + * access to those additional sensor/fans for now (though ultimately, it would + * be possible to add sensor objects for them) so I'm only implementing the + * basic PCI slot control loop + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" +#include "windfarm_pid.h" + +#define VERSION "0.4" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) 
do { } while(0) +#endif + +/* define this to force CPU overtemp to 74 degree, useful for testing + * the overtemp code + */ +#undef HACKED_OVERTEMP + +static struct device *wf_smu_dev; + +/* Controls & sensors */ +static struct wf_sensor *sensor_cpu_power; +static struct wf_sensor *sensor_cpu_temp; +static struct wf_sensor *sensor_hd_temp; +static struct wf_sensor *sensor_slots_power; +static struct wf_control *fan_cpu_main; +static struct wf_control *fan_cpu_second; +static struct wf_control *fan_cpu_third; +static struct wf_control *fan_hd; +static struct wf_control *fan_slots; +static struct wf_control *cpufreq_clamp; + +/* Set to kick the control loop into life */ +static int wf_smu_all_controls_ok, wf_smu_all_sensors_ok, wf_smu_started; + +/* Failure handling.. could be nicer */ +#define FAILURE_FAN 0x01 +#define FAILURE_SENSOR 0x02 +#define FAILURE_OVERTEMP 0x04 + +static unsigned int wf_smu_failure_state; +static int wf_smu_readjust, wf_smu_skipping; + +/* + * ****** CPU Fans Control Loop ****** + * + */ + + +#define WF_SMU_CPU_FANS_INTERVAL 1 +#define WF_SMU_CPU_FANS_MAX_HISTORY 16 + +/* State data used by the cpu fans control loop + */ +struct wf_smu_cpu_fans_state { + int ticks; + s32 cpu_setpoint; + struct wf_cpu_pid_state pid; +}; + +static struct wf_smu_cpu_fans_state *wf_smu_cpu_fans; + + + +/* + * ****** Drive Fan Control Loop ****** + * + */ + +struct wf_smu_drive_fans_state { + int ticks; + s32 setpoint; + struct wf_pid_state pid; +}; + +static struct wf_smu_drive_fans_state *wf_smu_drive_fans; + +/* + * ****** Slots Fan Control Loop ****** + * + */ + +struct wf_smu_slots_fans_state { + int ticks; + s32 setpoint; + struct wf_pid_state pid; +}; + +static struct wf_smu_slots_fans_state *wf_smu_slots_fans; + +/* + * ***** Implementation ***** + * + */ + + +static void wf_smu_create_cpu_fans(void) +{ + struct wf_cpu_pid_param pid_param; + struct smu_sdbp_header *hdr; + struct smu_sdbp_cpupiddata *piddata; + struct smu_sdbp_fvt *fvt; + s32 tmax, tdelta, 
maxpow, powadj; + + /* First, locate the PID params in SMU SBD */ + hdr = smu_get_sdb_partition(SMU_SDB_CPUPIDDATA_ID, NULL); + if (hdr == 0) { + printk(KERN_WARNING "windfarm: CPU PID fan config not found " + "max fan speed\n"); + goto fail; + } + piddata = (struct smu_sdbp_cpupiddata *)&hdr[1]; + + /* Get the FVT params for operating point 0 (the only supported one + * for now) in order to get tmax + */ + hdr = smu_get_sdb_partition(SMU_SDB_FVT_ID, NULL); + if (hdr) { + fvt = (struct smu_sdbp_fvt *)&hdr[1]; + tmax = ((s32)fvt->maxtemp) << 16; + } else + tmax = 0x5e0000; /* 94 degree default */ + + /* Alloc & initialize state */ + wf_smu_cpu_fans = kmalloc(sizeof(struct wf_smu_cpu_fans_state), + GFP_KERNEL); + if (wf_smu_cpu_fans == NULL) + goto fail; + wf_smu_cpu_fans->ticks = 1; + + /* Fill PID params */ + pid_param.interval = WF_SMU_CPU_FANS_INTERVAL; + pid_param.history_len = piddata->history_len; + if (pid_param.history_len > WF_CPU_PID_MAX_HISTORY) { + printk(KERN_WARNING "windfarm: History size overflow on " + "CPU control loop (%d)\n", piddata->history_len); + pid_param.history_len = WF_CPU_PID_MAX_HISTORY; + } + pid_param.gd = piddata->gd; + pid_param.gp = piddata->gp; + pid_param.gr = piddata->gr / pid_param.history_len; + + tdelta = ((s32)piddata->target_temp_delta) << 16; + maxpow = ((s32)piddata->max_power) << 16; + powadj = ((s32)piddata->power_adj) << 16; + + pid_param.tmax = tmax; + pid_param.ttarget = tmax - tdelta; + pid_param.pmaxadj = maxpow - powadj; + + pid_param.min = fan_cpu_main->ops->get_min(fan_cpu_main); + pid_param.max = fan_cpu_main->ops->get_max(fan_cpu_main); + + wf_cpu_pid_init(&wf_smu_cpu_fans->pid, &pid_param); + + DBG("wf: CPU Fan control initialized.\n"); + DBG(" ttarged=%d.%03d, tmax=%d.%03d, min=%d RPM, max=%d RPM\n", + FIX32TOPRINT(pid_param.ttarget), FIX32TOPRINT(pid_param.tmax), + pid_param.min, pid_param.max); + + return; + + fail: + printk(KERN_WARNING "windfarm: CPU fan config not found\n" + "for this machine model, max 
fan speed\n"); + + if (cpufreq_clamp) + wf_control_set_max(cpufreq_clamp); + if (fan_cpu_main) + wf_control_set_max(fan_cpu_main); +} + +static void wf_smu_cpu_fans_tick(struct wf_smu_cpu_fans_state *st) +{ + s32 new_setpoint, temp, power; + int rc; + + if (--st->ticks != 0) { + if (wf_smu_readjust) + goto readjust; + return; + } + st->ticks = WF_SMU_CPU_FANS_INTERVAL; + + rc = sensor_cpu_temp->ops->get_value(sensor_cpu_temp, &temp); + if (rc) { + printk(KERN_WARNING "windfarm: CPU temp sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + rc = sensor_cpu_power->ops->get_value(sensor_cpu_power, &power); + if (rc) { + printk(KERN_WARNING "windfarm: CPU power sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + DBG("wf_smu: CPU Fans tick ! CPU temp: %d.%03d, power: %d.%03d\n", + FIX32TOPRINT(temp), FIX32TOPRINT(power)); + +#ifdef HACKED_OVERTEMP + if (temp > 0x4a0000) + wf_smu_failure_state |= FAILURE_OVERTEMP; +#else + if (temp > st->pid.param.tmax) + wf_smu_failure_state |= FAILURE_OVERTEMP; +#endif + new_setpoint = wf_cpu_pid_run(&st->pid, power, temp); + + DBG("wf_smu: new_setpoint: %d RPM\n", (int)new_setpoint); + + if (st->cpu_setpoint == new_setpoint) + return; + st->cpu_setpoint = new_setpoint; + readjust: + if (fan_cpu_main && wf_smu_failure_state == 0) { + rc = fan_cpu_main->ops->set_value(fan_cpu_main, + st->cpu_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: CPU main fan" + " error %d\n", rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } + if (fan_cpu_second && wf_smu_failure_state == 0) { + rc = fan_cpu_second->ops->set_value(fan_cpu_second, + st->cpu_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: CPU second fan" + " error %d\n", rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } + if (fan_cpu_third && wf_smu_failure_state == 0) { + rc = fan_cpu_third->ops->set_value(fan_cpu_third, + st->cpu_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: CPU third fan" + " 
error %d\n", rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } +} + +static void wf_smu_create_drive_fans(void) +{ + struct wf_pid_param param = { + .interval = 5, + .history_len = 2, + .gd = 0x01e00000, + .gp = 0x00500000, + .gr = 0x00000000, + .itarget = 0x00200000, + }; + + /* Alloc & initialize state */ + wf_smu_drive_fans = kmalloc(sizeof(struct wf_smu_drive_fans_state), + GFP_KERNEL); + if (wf_smu_drive_fans == NULL) { + printk(KERN_WARNING "windfarm: Memory allocation error" + " max fan speed\n"); + goto fail; + } + wf_smu_drive_fans->ticks = 1; + + /* Fill PID params */ + param.additive = (fan_hd->type == WF_CONTROL_RPM_FAN); + param.min = fan_hd->ops->get_min(fan_hd); + param.max = fan_hd->ops->get_max(fan_hd); + wf_pid_init(&wf_smu_drive_fans->pid, &param); + + DBG("wf: Drive Fan control initialized.\n"); + DBG(" itarget=%d.%03d, min=%d RPM, max=%d RPM\n", + FIX32TOPRINT(param.itarget), param.min, param.max); + return; + + fail: + if (fan_hd) + wf_control_set_max(fan_hd); +} + +static void wf_smu_drive_fans_tick(struct wf_smu_drive_fans_state *st) +{ + s32 new_setpoint, temp; + int rc; + + if (--st->ticks != 0) { + if (wf_smu_readjust) + goto readjust; + return; + } + st->ticks = st->pid.param.interval; + + rc = sensor_hd_temp->ops->get_value(sensor_hd_temp, &temp); + if (rc) { + printk(KERN_WARNING "windfarm: HD temp sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + DBG("wf_smu: Drive Fans tick ! 
HD temp: %d.%03d\n", + FIX32TOPRINT(temp)); + + if (temp > (st->pid.param.itarget + 0x50000)) + wf_smu_failure_state |= FAILURE_OVERTEMP; + + new_setpoint = wf_pid_run(&st->pid, temp); + + DBG("wf_smu: new_setpoint: %d\n", (int)new_setpoint); + + if (st->setpoint == new_setpoint) + return; + st->setpoint = new_setpoint; + readjust: + if (fan_hd && wf_smu_failure_state == 0) { + rc = fan_hd->ops->set_value(fan_hd, st->setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: HD fan error %d\n", + rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } +} + +static void wf_smu_create_slots_fans(void) +{ + struct wf_pid_param param = { + .interval = 1, + .history_len = 8, + .gd = 0x00000000, + .gp = 0x00000000, + .gr = 0x00020000, + .itarget = 0x00000000 + }; + + /* Alloc & initialize state */ + wf_smu_slots_fans = kmalloc(sizeof(struct wf_smu_slots_fans_state), + GFP_KERNEL); + if (wf_smu_slots_fans == NULL) { + printk(KERN_WARNING "windfarm: Memory allocation error" + " max fan speed\n"); + goto fail; + } + wf_smu_slots_fans->ticks = 1; + + /* Fill PID params */ + param.additive = (fan_slots->type == WF_CONTROL_RPM_FAN); + param.min = fan_slots->ops->get_min(fan_slots); + param.max = fan_slots->ops->get_max(fan_slots); + wf_pid_init(&wf_smu_slots_fans->pid, &param); + + DBG("wf: Slots Fan control initialized.\n"); + DBG(" itarget=%d.%03d, min=%d RPM, max=%d RPM\n", + FIX32TOPRINT(param.itarget), param.min, param.max); + return; + + fail: + if (fan_slots) + wf_control_set_max(fan_slots); +} + +static void wf_smu_slots_fans_tick(struct wf_smu_slots_fans_state *st) +{ + s32 new_setpoint, power; + int rc; + + if (--st->ticks != 0) { + if (wf_smu_readjust) + goto readjust; + return; + } + st->ticks = st->pid.param.interval; + + rc = sensor_slots_power->ops->get_value(sensor_slots_power, &power); + if (rc) { + printk(KERN_WARNING "windfarm: Slots power sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + DBG("wf_smu: Slots Fans tick ! 
Slots power: %d.%03d\n", + FIX32TOPRINT(power)); + +#if 0 /* Check what makes a good overtemp condition */ + if (power > (st->pid.param.itarget + 0x50000)) + wf_smu_failure_state |= FAILURE_OVERTEMP; +#endif + + new_setpoint = wf_pid_run(&st->pid, power); + + DBG("wf_smu: new_setpoint: %d\n", (int)new_setpoint); + + if (st->setpoint == new_setpoint) + return; + st->setpoint = new_setpoint; + readjust: + if (fan_slots && wf_smu_failure_state == 0) { + rc = fan_slots->ops->set_value(fan_slots, st->setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: Slots fan error %d\n", + rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } +} + + +/* + * ****** Attributes ****** + * + */ + +#define BUILD_SHOW_FUNC_FIX(name, data) \ +static ssize_t show_##name(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + ssize_t r; \ + s32 val = 0; \ + data->ops->get_value(data, &val); \ + r = sprintf(buf, "%d.%03d", FIX32TOPRINT(val)); \ + return r; \ +} \ +static DEVICE_ATTR(name,S_IRUGO,show_##name, NULL); + + +#define BUILD_SHOW_FUNC_INT(name, data) \ +static ssize_t show_##name(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + s32 val = 0; \ + data->ops->get_value(data, &val); \ + return sprintf(buf, "%d", val); \ +} \ +static DEVICE_ATTR(name,S_IRUGO,show_##name, NULL); + +BUILD_SHOW_FUNC_INT(cpu_fan, fan_cpu_main); +BUILD_SHOW_FUNC_INT(hd_fan, fan_hd); +BUILD_SHOW_FUNC_INT(slots_fan, fan_slots); + +BUILD_SHOW_FUNC_FIX(cpu_temp, sensor_cpu_temp); +BUILD_SHOW_FUNC_FIX(cpu_power, sensor_cpu_power); +BUILD_SHOW_FUNC_FIX(hd_temp, sensor_hd_temp); +BUILD_SHOW_FUNC_FIX(slots_power, sensor_slots_power); + +/* + * ****** Setup / Init / Misc ... 
****** + * + */ + +static void wf_smu_tick(void) +{ + unsigned int last_failure = wf_smu_failure_state; + unsigned int new_failure; + + if (!wf_smu_started) { + DBG("wf: creating control loops !\n"); + wf_smu_create_drive_fans(); + wf_smu_create_slots_fans(); + wf_smu_create_cpu_fans(); + wf_smu_started = 1; + } + + /* Skipping ticks */ + if (wf_smu_skipping && --wf_smu_skipping) + return; + + wf_smu_failure_state = 0; + if (wf_smu_drive_fans) + wf_smu_drive_fans_tick(wf_smu_drive_fans); + if (wf_smu_slots_fans) + wf_smu_slots_fans_tick(wf_smu_slots_fans); + if (wf_smu_cpu_fans) + wf_smu_cpu_fans_tick(wf_smu_cpu_fans); + + wf_smu_readjust = 0; + new_failure = wf_smu_failure_state & ~last_failure; + + /* If entering failure mode, clamp cpufreq and ramp all + * fans to full speed. + */ + if (wf_smu_failure_state && !last_failure) { + if (cpufreq_clamp) + wf_control_set_max(cpufreq_clamp); + if (fan_cpu_main) + wf_control_set_max(fan_cpu_main); + if (fan_cpu_second) + wf_control_set_max(fan_cpu_second); + if (fan_cpu_third) + wf_control_set_max(fan_cpu_third); + if (fan_hd) + wf_control_set_max(fan_hd); + if (fan_slots) + wf_control_set_max(fan_slots); + } + + /* If leaving failure mode, unclamp cpufreq and readjust + * all fans on next iteration + */ + if (!wf_smu_failure_state && last_failure) { + if (cpufreq_clamp) + wf_control_set_min(cpufreq_clamp); + wf_smu_readjust = 1; + } + + /* Overtemp condition detected, notify and start skipping a couple + * ticks to let the temperature go down + */ + if (new_failure & FAILURE_OVERTEMP) { + wf_set_overtemp(); + wf_smu_skipping = 2; + } + + /* We only clear the overtemp condition if overtemp is cleared + * _and_ no other failure is present. 
Since a sensor error will + * clear the overtemp condition (can't measure temperature) at + * the control loop levels, but we don't want to keep it clear + * here in this case + */ + if (new_failure == 0 && last_failure & FAILURE_OVERTEMP) + wf_clear_overtemp(); +} + + +static void wf_smu_new_control(struct wf_control *ct) +{ + if (wf_smu_all_controls_ok) + return; + + if (fan_cpu_main == NULL && !strcmp(ct->name, "cpu-rear-fan-0")) { + if (wf_get_control(ct) == 0) { + fan_cpu_main = ct; + device_create_file(wf_smu_dev, &dev_attr_cpu_fan); + } + } + + if (fan_cpu_second == NULL && !strcmp(ct->name, "cpu-rear-fan-1")) { + if (wf_get_control(ct) == 0) + fan_cpu_second = ct; + } + + if (fan_cpu_third == NULL && !strcmp(ct->name, "cpu-front-fan-0")) { + if (wf_get_control(ct) == 0) + fan_cpu_third = ct; + } + + if (cpufreq_clamp == NULL && !strcmp(ct->name, "cpufreq-clamp")) { + if (wf_get_control(ct) == 0) + cpufreq_clamp = ct; + } + + if (fan_hd == NULL && !strcmp(ct->name, "drive-bay-fan")) { + if (wf_get_control(ct) == 0) { + fan_hd = ct; + device_create_file(wf_smu_dev, &dev_attr_hd_fan); + } + } + + if (fan_slots == NULL && !strcmp(ct->name, "slots-fan")) { + if (wf_get_control(ct) == 0) { + fan_slots = ct; + device_create_file(wf_smu_dev, &dev_attr_slots_fan); + } + } + + if (fan_cpu_main && (fan_cpu_second || fan_cpu_third) && fan_hd && + fan_slots && cpufreq_clamp) + wf_smu_all_controls_ok = 1; +} + +static void wf_smu_new_sensor(struct wf_sensor *sr) +{ + if (wf_smu_all_sensors_ok) + return; + + if (sensor_cpu_power == NULL && !strcmp(sr->name, "cpu-power")) { + if (wf_get_sensor(sr) == 0) { + sensor_cpu_power = sr; + device_create_file(wf_smu_dev, &dev_attr_cpu_power); + } + } + + if (sensor_cpu_temp == NULL && !strcmp(sr->name, "cpu-temp")) { + if (wf_get_sensor(sr) == 0) { + sensor_cpu_temp = sr; + device_create_file(wf_smu_dev, &dev_attr_cpu_temp); + } + } + + if (sensor_hd_temp == NULL && !strcmp(sr->name, "hd-temp")) { + if (wf_get_sensor(sr) == 0) { + 
sensor_hd_temp = sr; + device_create_file(wf_smu_dev, &dev_attr_hd_temp); + } + } + + if (sensor_slots_power == NULL && !strcmp(sr->name, "slots-power")) { + if (wf_get_sensor(sr) == 0) { + sensor_slots_power = sr; + device_create_file(wf_smu_dev, &dev_attr_slots_power); + } + } + + if (sensor_cpu_power && sensor_cpu_temp && + sensor_hd_temp && sensor_slots_power) + wf_smu_all_sensors_ok = 1; +} + + +static int wf_smu_notify(struct notifier_block *self, + unsigned long event, void *data) +{ + switch(event) { + case WF_EVENT_NEW_CONTROL: + DBG("wf: new control %s detected\n", + ((struct wf_control *)data)->name); + wf_smu_new_control(data); + wf_smu_readjust = 1; + break; + case WF_EVENT_NEW_SENSOR: + DBG("wf: new sensor %s detected\n", + ((struct wf_sensor *)data)->name); + wf_smu_new_sensor(data); + break; + case WF_EVENT_TICK: + if (wf_smu_all_controls_ok && wf_smu_all_sensors_ok) + wf_smu_tick(); + } + + return 0; +} + +static struct notifier_block wf_smu_events = { + .notifier_call = wf_smu_notify, +}; + +static int wf_init_pm(void) +{ + printk(KERN_INFO "windfarm: Initializing for Desktop G5 model\n"); + + return 0; +} + +static int wf_smu_probe(struct device *ddev) +{ + wf_smu_dev = ddev; + + wf_register_client(&wf_smu_events); + + return 0; +} + +static int wf_smu_remove(struct device *ddev) +{ + wf_unregister_client(&wf_smu_events); + + /* XXX We don't have yet a guarantee that our callback isn't + * in progress when returning from wf_unregister_client, so + * we add an arbitrary delay. I'll have to fix that in the core + */ + msleep(1000); + + /* Release all sensors */ + /* One more crappy race: I don't think we have any guarantee here + * that the attribute callback won't race with the sensor beeing + * disposed of, and I'm not 100% certain what best way to deal + * with that except by adding locks all over... I'll do that + * eventually but heh, who ever rmmod this module anyway ? 
+ */ + if (sensor_cpu_power) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_power); + wf_put_sensor(sensor_cpu_power); + } + if (sensor_cpu_temp) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_temp); + wf_put_sensor(sensor_cpu_temp); + } + if (sensor_hd_temp) { + device_remove_file(wf_smu_dev, &dev_attr_hd_temp); + wf_put_sensor(sensor_hd_temp); + } + if (sensor_slots_power) { + device_remove_file(wf_smu_dev, &dev_attr_slots_power); + wf_put_sensor(sensor_slots_power); + } + + /* Release all controls */ + if (fan_cpu_main) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_fan); + wf_put_control(fan_cpu_main); + } + if (fan_cpu_second) + wf_put_control(fan_cpu_second); + if (fan_cpu_third) + wf_put_control(fan_cpu_third); + if (fan_hd) { + device_remove_file(wf_smu_dev, &dev_attr_hd_fan); + wf_put_control(fan_hd); + } + if (fan_slots) { + device_remove_file(wf_smu_dev, &dev_attr_slots_fan); + wf_put_control(fan_slots); + } + if (cpufreq_clamp) + wf_put_control(cpufreq_clamp); + + /* Destroy control loops state structures */ + if (wf_smu_slots_fans) + kfree(wf_smu_slots_fans); + if (wf_smu_drive_fans) + kfree(wf_smu_drive_fans); + if (wf_smu_cpu_fans) + kfree(wf_smu_cpu_fans); + + wf_smu_dev = NULL; + + return 0; +} + +static struct device_driver wf_smu_driver = { + .name = "windfarm", + .bus = &platform_bus_type, + .probe = wf_smu_probe, + .remove = wf_smu_remove, +}; + + +static int __init wf_smu_init(void) +{ + int rc = -ENODEV; + + if (machine_is_compatible("PowerMac9,1")) + rc = wf_init_pm(); + + if (rc == 0) { +#ifdef MODULE + request_module("windfarm_smu_controls"); + request_module("windfarm_smu_sensors"); + request_module("windfarm_lm75_sensor"); + +#endif /* MODULE */ + driver_register(&wf_smu_driver); + } + + return rc; +} + +static void __exit wf_smu_exit(void) +{ + + driver_unregister(&wf_smu_driver); +} + + +module_init(wf_smu_init); +module_exit(wf_smu_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("Thermal control logic for 
PowerMac9,1"); +MODULE_LICENSE("GPL"); + From benh at kernel.crashing.org Mon Nov 7 14:32:28 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 07 Nov 2005 14:32:28 +1100 Subject: [PATCH] ppc64: Update g5_defconfig for ARCH=powerpc Message-ID: <1131334348.5229.160.camel@gaston> This patch updates g5_defconfig for ARCH=powerpc in order to add the SMU support & thermal drivers to it, the pmac sound driver (works on some G5s) and replaces rivafb with nvidiafb which works better for the cards found in G5 based machines. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/powerpc/configs/g5_defconfig =================================================================== --- linux-work.orig/arch/powerpc/configs/g5_defconfig 2005-11-07 13:34:24.000000000 +1100 +++ linux-work/arch/powerpc/configs/g5_defconfig 2005-11-07 13:38:08.000000000 +1100 @@ -1,18 +1,32 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.14-rc4 -# Thu Oct 20 08:30:23 2005 +# Linux kernel version: 2.6.14 +# Mon Nov 7 13:37:59 2005 # +CONFIG_PPC64=y CONFIG_64BIT=y +CONFIG_PPC_MERGE=y CONFIG_MMU=y +CONFIG_GENERIC_HARDIRQS=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y -CONFIG_GENERIC_ISA_DMA=y +CONFIG_PPC=y CONFIG_EARLY_PRINTK=y CONFIG_COMPAT=y +CONFIG_SYSVIPC_COMPAT=y CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y -CONFIG_FORCE_MAX_ZONEORDER=13 + +# +# Processor support +# +CONFIG_POWER4_ONLY=y +CONFIG_POWER4=y +CONFIG_PPC_FPU=y +CONFIG_ALTIVEC=y +CONFIG_PPC_STD_MMU=y +CONFIG_SMP=y +CONFIG_NR_CPUS=2 # # Code maturity level options @@ -67,30 +81,60 @@ CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_KMOD=y CONFIG_STOP_MACHINE=y -CONFIG_SYSVIPC_COMPAT=y # # Platform support # -# CONFIG_PPC_ISERIES is not set CONFIG_PPC_MULTIPLATFORM=y +# CONFIG_PPC_ISERIES is not set +# CONFIG_EMBEDDED6xx is not set +# CONFIG_APUS is not set # CONFIG_PPC_PSERIES is not set -# CONFIG_PPC_BPA is not set CONFIG_PPC_PMAC=y 
+CONFIG_PPC_PMAC64=y # CONFIG_PPC_MAPLE is not set -CONFIG_PPC=y -CONFIG_PPC64=y +# CONFIG_PPC_CELL is not set CONFIG_PPC_OF=y -CONFIG_MPIC=y -CONFIG_ALTIVEC=y -CONFIG_KEXEC=y CONFIG_U3_DART=y -CONFIG_PPC_PMAC64=y -CONFIG_BOOTX_TEXT=y -CONFIG_POWER4_ONLY=y +CONFIG_MPIC=y +# CONFIG_PPC_RTAS is not set +# CONFIG_MMIO_NVRAM is not set +# CONFIG_PPC_MPC106 is not set +CONFIG_GENERIC_TBSYNC=y +CONFIG_CPU_FREQ=y +CONFIG_CPU_FREQ_TABLE=y +# CONFIG_CPU_FREQ_DEBUG is not set +CONFIG_CPU_FREQ_STAT=y +# CONFIG_CPU_FREQ_STAT_DETAILS is not set +CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y +# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set +CONFIG_CPU_FREQ_GOV_PERFORMANCE=y +CONFIG_CPU_FREQ_GOV_POWERSAVE=y +CONFIG_CPU_FREQ_GOV_USERSPACE=y +# CONFIG_CPU_FREQ_GOV_ONDEMAND is not set +# CONFIG_CPU_FREQ_GOV_CONSERVATIVE is not set +CONFIG_CPU_FREQ_PMAC64=y +# CONFIG_WANT_EARLY_SERIAL is not set + +# +# Kernel options +# +# CONFIG_HZ_100 is not set +CONFIG_HZ_250=y +# CONFIG_HZ_1000 is not set +CONFIG_HZ=250 +CONFIG_PREEMPT_NONE=y +# CONFIG_PREEMPT_VOLUNTARY is not set +# CONFIG_PREEMPT is not set +# CONFIG_PREEMPT_BKL is not set +CONFIG_BINFMT_ELF=y +# CONFIG_BINFMT_MISC is not set +CONFIG_FORCE_MAX_ZONEORDER=13 CONFIG_IOMMU_VMERGE=y -CONFIG_SMP=y -CONFIG_NR_CPUS=2 +# CONFIG_HOTPLUG_CPU is not set +CONFIG_KEXEC=y +CONFIG_IRQ_ALL_CPUS=y +# CONFIG_NUMA is not set CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ARCH_FLATMEM_ENABLE=y CONFIG_SELECT_MEMORY_MODEL=y @@ -100,28 +144,21 @@ CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set -# CONFIG_NUMA is not set +CONFIG_SPLIT_PTLOCK_CPUS=4 +# CONFIG_PPC_64K_PAGES is not set # CONFIG_SCHED_SMT is not set -CONFIG_PREEMPT_NONE=y -# CONFIG_PREEMPT_VOLUNTARY is not set -# CONFIG_PREEMPT is not set -# CONFIG_PREEMPT_BKL is not set -# CONFIG_HZ_100 is not set -CONFIG_HZ_250=y -# CONFIG_HZ_1000 is not set -CONFIG_HZ=250 -CONFIG_GENERIC_HARDIRQS=y -CONFIG_SECCOMP=y -CONFIG_BINFMT_ELF=y -# CONFIG_BINFMT_MISC is not set -# 
CONFIG_HOTPLUG_CPU is not set CONFIG_PROC_DEVICETREE=y # CONFIG_CMDLINE_BOOL is not set +# CONFIG_PM is not set +CONFIG_SECCOMP=y CONFIG_ISA_DMA_API=y # -# Bus Options +# Bus options # +CONFIG_GENERIC_ISA_DMA=y +# CONFIG_PPC_I8259 is not set +# CONFIG_PPC_INDIRECT_PCI is not set CONFIG_PCI=y CONFIG_PCI_DOMAINS=y CONFIG_PCI_LEGACY_PROC=y @@ -136,6 +173,7 @@ # PCI Hotplug Support # # CONFIG_HOTPLUG_PCI is not set +CONFIG_KERNEL_START=0xc000000000000000 # # Networking @@ -276,6 +314,10 @@ # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set + +# +# QoS and/or fair queueing +# # CONFIG_NET_SCHED is not set CONFIG_NET_CLS_ROUTE=y @@ -348,6 +390,11 @@ CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y +CONFIG_DEFAULT_AS=y +# CONFIG_DEFAULT_DEADLINE is not set +# CONFIG_DEFAULT_CFQ is not set +# CONFIG_DEFAULT_NOOP is not set +CONFIG_DEFAULT_IOSCHED="anticipatory" # CONFIG_ATA_OVER_ETH is not set # @@ -449,6 +496,7 @@ # # SCSI low-level drivers # +# CONFIG_ISCSI_TCP is not set # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_ACARD is not set @@ -465,10 +513,12 @@ # CONFIG_SCSI_ATA_PIIX is not set # CONFIG_SCSI_SATA_MV is not set # CONFIG_SCSI_SATA_NV is not set -# CONFIG_SCSI_SATA_PROMISE is not set +# CONFIG_SCSI_PDC_ADMA is not set # CONFIG_SCSI_SATA_QSTOR is not set +# CONFIG_SCSI_SATA_PROMISE is not set # CONFIG_SCSI_SATA_SX4 is not set # CONFIG_SCSI_SATA_SIL is not set +# CONFIG_SCSI_SATA_SIL24 is not set # CONFIG_SCSI_SATA_SIS is not set # CONFIG_SCSI_SATA_ULI is not set # CONFIG_SCSI_SATA_VIA is not set @@ -567,6 +617,9 @@ CONFIG_ADB_PMU=y CONFIG_PMAC_SMU=y CONFIG_THERM_PM72=y +CONFIG_WINDFARM=y +CONFIG_WINDFARM_PM81=y +CONFIG_WINDFARM_PM91=y # # Network device support @@ -603,6 +656,7 @@ # CONFIG_NET_TULIP is not set # CONFIG_HP100 is not set # CONFIG_NET_PCI is not set +# CONFIG_FEC_8XX is not set # # Ethernet (1000 Mbit) @@ -768,6 +822,7 @@ # TPM devices # # CONFIG_TCG_TPM 
is not set +# CONFIG_TELCLOCK is not set # # I2C support @@ -820,6 +875,7 @@ # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_RTC8564 is not set # CONFIG_SENSORS_MAX6875 is not set +# CONFIG_RTC_X1205_I2C is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set @@ -876,10 +932,9 @@ # CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set # CONFIG_FB_VGA16 is not set -# CONFIG_FB_NVIDIA is not set -CONFIG_FB_RIVA=y -# CONFIG_FB_RIVA_I2C is not set -# CONFIG_FB_RIVA_DEBUG is not set +CONFIG_FB_NVIDIA=y +CONFIG_FB_NVIDIA_I2C=y +# CONFIG_FB_RIVA is not set # CONFIG_FB_MATROX is not set # CONFIG_FB_RADEON_OLD is not set CONFIG_FB_RADEON=y @@ -924,7 +979,96 @@ # # Sound # -# CONFIG_SOUND is not set +CONFIG_SOUND=m + +# +# Advanced Linux Sound Architecture +# +CONFIG_SND=m +CONFIG_SND_TIMER=m +CONFIG_SND_PCM=m +CONFIG_SND_HWDEP=m +CONFIG_SND_RAWMIDI=m +CONFIG_SND_SEQUENCER=m +# CONFIG_SND_SEQ_DUMMY is not set +CONFIG_SND_OSSEMUL=y +CONFIG_SND_MIXER_OSS=m +CONFIG_SND_PCM_OSS=m +CONFIG_SND_SEQUENCER_OSS=y +# CONFIG_SND_VERBOSE_PRINTK is not set +# CONFIG_SND_DEBUG is not set +CONFIG_SND_GENERIC_DRIVER=y + +# +# Generic devices +# +# CONFIG_SND_DUMMY is not set +# CONFIG_SND_VIRMIDI is not set +# CONFIG_SND_MTPAV is not set +# CONFIG_SND_SERIAL_U16550 is not set +# CONFIG_SND_MPU401 is not set + +# +# PCI devices +# +# CONFIG_SND_ALI5451 is not set +# CONFIG_SND_ATIIXP is not set +# CONFIG_SND_ATIIXP_MODEM is not set +# CONFIG_SND_AU8810 is not set +# CONFIG_SND_AU8820 is not set +# CONFIG_SND_AU8830 is not set +# CONFIG_SND_AZT3328 is not set +# CONFIG_SND_BT87X is not set +# CONFIG_SND_CS46XX is not set +# CONFIG_SND_CS4281 is not set +# CONFIG_SND_EMU10K1 is not set +# CONFIG_SND_EMU10K1X is not set +# CONFIG_SND_CA0106 is not set +# CONFIG_SND_KORG1212 is not set +# CONFIG_SND_MIXART is not set +# CONFIG_SND_NM256 is not set +# CONFIG_SND_RME32 is not set +# CONFIG_SND_RME96 is not set +# 
CONFIG_SND_RME9652 is not set +# CONFIG_SND_HDSP is not set +# CONFIG_SND_HDSPM is not set +# CONFIG_SND_TRIDENT is not set +# CONFIG_SND_YMFPCI is not set +# CONFIG_SND_AD1889 is not set +# CONFIG_SND_ALS4000 is not set +# CONFIG_SND_CMIPCI is not set +# CONFIG_SND_ENS1370 is not set +# CONFIG_SND_ENS1371 is not set +# CONFIG_SND_ES1938 is not set +# CONFIG_SND_ES1968 is not set +# CONFIG_SND_MAESTRO3 is not set +# CONFIG_SND_FM801 is not set +# CONFIG_SND_ICE1712 is not set +# CONFIG_SND_ICE1724 is not set +# CONFIG_SND_INTEL8X0 is not set +# CONFIG_SND_INTEL8X0M is not set +# CONFIG_SND_SONICVIBES is not set +# CONFIG_SND_VIA82XX is not set +# CONFIG_SND_VIA82XX_MODEM is not set +# CONFIG_SND_VX222 is not set +# CONFIG_SND_HDA_INTEL is not set + +# +# ALSA PowerMac devices +# +CONFIG_SND_POWERMAC=m +CONFIG_SND_POWERMAC_AUTO_DRC=y + +# +# USB devices +# +CONFIG_SND_USB_AUDIO=m +# CONFIG_SND_USB_USX2Y is not set + +# +# Open Sound System +# +# CONFIG_SOUND_PRIME is not set # # USB support @@ -958,12 +1102,16 @@ # # USB Device Class drivers # -# CONFIG_USB_BLUETOOTH_TTY is not set +# CONFIG_OBSOLETE_OSS_USB_DRIVER is not set CONFIG_USB_ACM=m CONFIG_USB_PRINTER=y # -# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information +# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' +# + +# +# may also be needed; see USB_STORAGE Help for more information # CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_DEBUG is not set @@ -1074,6 +1222,7 @@ CONFIG_USB_SERIAL_KLSI=m CONFIG_USB_SERIAL_KOBIL_SCT=m CONFIG_USB_SERIAL_MCT_U232=m +# CONFIG_USB_SERIAL_NOKIA_DKU2 is not set CONFIG_USB_SERIAL_PL2303=m # CONFIG_USB_SERIAL_HP4X is not set CONFIG_USB_SERIAL_SAFE=m @@ -1311,6 +1460,20 @@ CONFIG_NLS_UTF8=y # +# Library routines +# +CONFIG_CRC_CCITT=m +# CONFIG_CRC16 is not set +CONFIG_CRC32=y +CONFIG_LIBCRC32C=m +CONFIG_ZLIB_INFLATE=y +CONFIG_ZLIB_DEFLATE=m +CONFIG_TEXTSEARCH=y +CONFIG_TEXTSEARCH_KMP=m +CONFIG_TEXTSEARCH_BM=m 
+CONFIG_TEXTSEARCH_FSM=m + +# # Profiling support # CONFIG_PROFILING=y @@ -1331,12 +1494,14 @@ # CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_INFO is not set CONFIG_DEBUG_FS=y +# CONFIG_DEBUG_VM is not set +# CONFIG_RCU_TORTURE_TEST is not set # CONFIG_DEBUG_STACKOVERFLOW is not set # CONFIG_KPROBES is not set # CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUGGER is not set -# CONFIG_PPCDBG is not set CONFIG_IRQSTACKS=y +CONFIG_BOOTX_TEXT=y # # Security options @@ -1376,17 +1541,3 @@ # # Hardware crypto devices # - -# -# Library routines -# -CONFIG_CRC_CCITT=m -# CONFIG_CRC16 is not set -CONFIG_CRC32=y -CONFIG_LIBCRC32C=m -CONFIG_ZLIB_INFLATE=y -CONFIG_ZLIB_DEFLATE=m -CONFIG_TEXTSEARCH=y -CONFIG_TEXTSEARCH_KMP=m -CONFIG_TEXTSEARCH_BM=m -CONFIG_TEXTSEARCH_FSM=m From benh at kernel.crashing.org Mon Nov 7 14:36:21 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 07 Nov 2005 14:36:21 +1100 Subject: [PATCH] ppc64: More U3 device-tree fixes Message-ID: <1131334582.5229.163.camel@gaston> Some more U3 revisions have the missing "interrupts" property in U3, this adds them to the fixup code in prom_init.c Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/powerpc/kernel/prom_init.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/prom_init.c 2005-11-07 14:21:10.000000000 +1100 +++ linux-work/arch/powerpc/kernel/prom_init.c 2005-11-07 14:36:13.000000000 +1100 @@ -1872,7 +1872,7 @@ if (prom_getprop(u3, "device-rev", &u3_rev, sizeof(u3_rev)) == PROM_ERROR) return; - if (u3_rev != 0x35 && u3_rev != 0x37) + if (u3_rev < 0x35 || u3_rev > 0x39) return; /* does it need fixup ? 
*/ if (prom_getproplen(i2c, "interrupts") > 0) Index: linux-work/arch/ppc64/kernel/prom_init.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/prom_init.c 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/ppc64/kernel/prom_init.c 2005-11-07 14:36:17.000000000 +1100 @@ -1825,7 +1825,7 @@ if (prom_getprop(u3, "device-rev", &u3_rev, sizeof(u3_rev)) == PROM_ERROR) return; - if (u3_rev != 0x35 && u3_rev != 0x37) + if (u3_rev < 0x35 || u3_rev > 0x39) return; /* does it need fixup ? */ if (prom_getproplen(i2c, "interrupts") > 0) From olof at lixom.net Mon Nov 7 14:40:08 2005 From: olof at lixom.net (Olof Johansson) Date: Sun, 6 Nov 2005 19:40:08 -0800 Subject: [PATCH] ppc64: SMU based macs cpufreq support In-Reply-To: <1131334053.5229.151.camel@gaston> References: <1131334053.5229.151.camel@gaston> Message-ID: <20051107034008.GC7166@pb15.lixom.net> On Mon, Nov 07, 2005 at 02:27:33PM +1100, Benjamin Herrenschmidt wrote: > CPU freq support using 970FX powertune facility for iMac G5 and SMU > based single CPU desktop. > > Signed-off-by: Benjamin Herrenschmidt > > Index: linux-work/arch/ppc64/kernel/misc.S [...] > +_GLOBAL(scom970_read) [...] > Index: linux-work/arch/powerpc/kernel/misc_64.S [...] > +_GLOBAL(scom970_read) Are they needed in both places? With the move to arch/powerpc, can't the first go? 
-Olof From benh at kernel.crashing.org Mon Nov 7 14:45:04 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 07 Nov 2005 14:45:04 +1100 Subject: [PATCH] ppc64: SMU based macs cpufreq support In-Reply-To: <20051107034008.GC7166@pb15.lixom.net> References: <1131334053.5229.151.camel@gaston> <20051107034008.GC7166@pb15.lixom.net> Message-ID: <1131335105.5229.169.camel@gaston> On Sun, 2005-11-06 at 19:40 -0800, Olof Johansson wrote: > On Mon, Nov 07, 2005 at 02:27:33PM +1100, Benjamin Herrenschmidt wrote: > > CPU freq support using 970FX powertune facility for iMac G5 and SMU > > based single CPU desktop. > > > > Signed-off-by: Benjamin Herrenschmidt > > > > Index: linux-work/arch/ppc64/kernel/misc.S > [...] > > +_GLOBAL(scom970_read) > > [...] > > > Index: linux-work/arch/powerpc/kernel/misc_64.S > [...] > > +_GLOBAL(scom970_read) > > > Are they needed in both places? With the move to arch/powerpc, can't the > first go? It will go when arch/ppc64 is gone. In the meantime, it's needed if you build an ARCH=ppc64 kernel Ben. From olof at lixom.net Mon Nov 7 15:29:31 2005 From: olof at lixom.net (Olof Johansson) Date: Sun, 6 Nov 2005 20:29:31 -0800 Subject: [PATCH] ppc64: SMU based macs cpufreq support In-Reply-To: <1131335105.5229.169.camel@gaston> References: <1131334053.5229.151.camel@gaston> <20051107034008.GC7166@pb15.lixom.net> <1131335105.5229.169.camel@gaston> Message-ID: <20051107042931.GD7166@pb15.lixom.net> On Mon, Nov 07, 2005 at 02:45:04PM +1100, Benjamin Herrenschmidt wrote: > On Sun, 2005-11-06 at 19:40 -0800, Olof Johansson wrote: > > Are they needed in both places? With the move to arch/powerpc, can't the > > first go? > > It will go when arch/ppc64 is gone. In the meantime, it's needed if you > build an ARCH=ppc64 kernel Ok. It just seemed odd to add functionality to ppc64 when the focus is so much on moving everything over. 
-Olof From benh at kernel.crashing.org Mon Nov 7 15:31:10 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 07 Nov 2005 15:31:10 +1100 Subject: [PATCH] ppc64: SMU based macs cpufreq support In-Reply-To: <20051107042931.GD7166@pb15.lixom.net> References: <1131334053.5229.151.camel@gaston> <20051107034008.GC7166@pb15.lixom.net> <1131335105.5229.169.camel@gaston> <20051107042931.GD7166@pb15.lixom.net> Message-ID: <1131337870.11406.0.camel@gaston> On Sun, 2005-11-06 at 20:29 -0800, Olof Johansson wrote: > On Mon, Nov 07, 2005 at 02:45:04PM +1100, Benjamin Herrenschmidt wrote: > > On Sun, 2005-11-06 at 19:40 -0800, Olof Johansson wrote: > > > Are they needed in both places? With the move to arch/powerpc, can't the > > > first go? > > > > It will go when arch/ppc64 is gone. In the meantime, it's needed if you > > build an ARCH=ppc64 kernel > > Ok. It just seemed odd to add functionality to ppc64 when the focus is > so much on moving everything over. Well, I'm adding functionality to the pmac platform :) It needs that support asm; whether it's in ppc64 or powerpc doesn't matter, but I'm not (yet) ready to make pmac unbuildable (or building different features) on ppc64. Ben. From hch at lst.de Mon Nov 7 15:49:23 2005 From: hch at lst.de (Christoph Hellwig) Date: Mon, 7 Nov 2005 05:49:23 +0100 Subject: [PATCH] ppc64: Thermal control for SMU based machiens In-Reply-To: <1131334230.5229.157.camel@gaston> References: <1131334230.5229.157.camel@gaston> Message-ID: <20051107044923.GA16703@lst.de> On Mon, Nov 07, 2005 at 02:30:28PM +1100, Benjamin Herrenschmidt wrote: > This adds a new thermal control framework for PowerMac, along with the > implementation for PowerMac8,1, PowerMac8,2 (iMac G5 rev 1 and 2), and > PowerMac9,1 (latest single CPU desktop). In the future, I expect to move > the older G5 thermal control to the new framework as well. the core code doesn't seem to be mac-specific, could you please move it to common code?
Especially the userland callout should be shared with existing drivers on sparc64 and ia64. I don't request you to move those over yet, I'll work with the maintainer to get there later. What about creating drivers/windfarm/ and move all the files there? > +struct wf_sensor_ops { > + int (*get_value)(struct wf_sensor *sr, s32 *val); > + void (*release)(struct wf_sensor *sr); > + struct module *owner; tabs instead of spaces here :) > +#include not needed anymore, latest mainline includes the CONFIG_ symbols automatically. > +#include not needed. > +int wf_register_client(struct notifier_block *nb) > +{ > + int rc; > + struct wf_control *ct; > + struct wf_sensor *sr; > + > + down(&wf_lock); > + rc = notifier_chain_register(&wf_client_list, nb); > + if (rc != 0) > + goto bail; > + wf_client_count++; > + list_for_each_entry(ct, &wf_controls, link) > + wf_notify(WF_EVENT_NEW_CONTROL, ct); > + list_for_each_entry(sr, &wf_sensors, link) > + wf_notify(WF_EVENT_NEW_SENSOR, sr); > + if (wf_client_count == 1) > + wf_start_thread(); > + bail: > + up(&wf_lock); > + return rc; > +} shouldn't clients get proper methods instead of the notifier_block mess? > Index: linux-work/drivers/macintosh/windfarm_smu_controls.c > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux-work/drivers/macintosh/windfarm_smu_controls.c 2005-11-07 13:30:46.000000000 +1100 > @@ -0,0 +1,274 @@ no copyright notices? 
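For what it's worth, the "proper methods" idea raised above could look roughly like the following. This is only a userspace sketch for illustration: `wf_client_ops`, `wf_client`, `wf_client_register()`, `wf_client_unregister()` and `wf_core_tick()` are hypothetical names that appear neither in the posted patch nor in the kernel, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-client callbacks, standing in for the WF_EVENT_*
 * dispatch a notifier callback would have to do. Illustrative only. */
struct wf_client;

struct wf_client_ops {
	void (*new_control)(struct wf_client *cl, const char *name);
	void (*new_sensor)(struct wf_client *cl, const char *name);
	void (*tick)(struct wf_client *cl);
};

struct wf_client {
	const struct wf_client_ops *ops;
	struct wf_client *next;	/* simple singly linked client list */
	void *priv;		/* client private data */
};

static struct wf_client *wf_clients;
static int wf_client_count;

/* Add a client to the list; returns 0 on success, -1 on bad arguments. */
static int wf_client_register(struct wf_client *cl)
{
	if (!cl || !cl->ops)
		return -1;
	cl->next = wf_clients;
	wf_clients = cl;
	wf_client_count++;
	return 0;
}

/* Remove a client; returns 0 if it was registered, -1 otherwise. */
static int wf_client_unregister(struct wf_client *cl)
{
	struct wf_client **pp;

	for (pp = &wf_clients; *pp; pp = &(*pp)->next) {
		if (*pp == cl) {
			*pp = cl->next;
			cl->next = NULL;
			wf_client_count--;
			return 0;
		}
	}
	return -1;
}

/* Deliver one tick to every client that implements the method. */
static void wf_core_tick(void)
{
	struct wf_client *cl;

	for (cl = wf_clients; cl; cl = cl->next)
		if (cl->ops->tick)
			cl->ops->tick(cl);
}

/* Example client: counts ticks in its private data. */
static void demo_tick(struct wf_client *cl)
{
	(*(int *)cl->priv)++;
}

static const struct wf_client_ops demo_ops = {
	.tick = demo_tick,
};
```

With typed callbacks a client only fills in the methods it cares about and no longer has to cast a `void *` parameter per event; whether that is worth the extra structure compared to the notifier chain is a judgment call.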
From mikey at neuling.org Mon Nov 7 15:51:45 2005 From: mikey at neuling.org (Michael Neuling) Date: Mon, 7 Nov 2005 15:51:45 +1100 Subject: [PATCH 1/2] kexec tools: device tree blob reserve memory map entry cleanup In-Reply-To: <20051026125112.ceddab99.mikey@neuling.org> References: <20051026115202.f2acc73f.mikey@neuling.org> <20051026125112.ceddab99.mikey@neuling.org> Message-ID: <20051107155145.f033ecfd.mikey@neuling.org> Retransmitting this patch as the last version was causing seg faults (due to an excess of brown paper bags in my cubicle). ---- Patch cleans up how the reserve memory map entry for the device tree is modified. Shouldn't change any functionality. Signed-off-by: Michael Neuling 1 files changed, 18 insertions(+), 21 deletions(-) Index: kexec-tools-1.101/kexec/arch/ppc64/kexec-elf-ppc64.c =================================================================== --- kexec-tools-1.101.orig/kexec/arch/ppc64/kexec-elf-ppc64.c +++ kexec-tools-1.101/kexec/arch/ppc64/kexec-elf-ppc64.c @@ -171,10 +171,7 @@ /* Add v2wrap to the current image */ unsigned char *v2wrap_buf = NULL; off_t v2wrap_size = 0; - unsigned int off_len; - unsigned char *seg_buf; - unsigned int rsvmap_len; - unsigned long long *ptr; + unsigned long long *rsvmap_ptr; struct bootblock *bb_ptr; unsigned int devtree_size; @@ -189,23 +186,23 @@ add_buffer(info, v2wrap_buf, v2wrap_size, v2wrap_size, 0, 0, 0xFFFFFFFFFFFFFFFFUL, -1); - /* patch reserve map address for flattened device-tree */ - base_addr = info->segment[(info->nr_segments)-1].mem; - seg_buf = (unsigned char *)info->segment[(info->nr_segments)-1].buf; - seg_buf = seg_buf + 0x100; /* offset to end of v2wrap */ - bb_ptr = (struct bootblock *)seg_buf; - rsvmap_len = bb_ptr->off_dt_struct - bb_ptr->off_mem_rsvmap; - devtree_size = bb_ptr->totalsize; - off_len = sizeof(struct bootblock); - off_len += 7; off_len &= ~7; - seg_buf = seg_buf + off_len; - off_len = rsvmap_len / (2 * sizeof(unsigned long long)); - - ptr = (unsigned long
long *)seg_buf; - ptr = ptr + 2*(off_len-2); - *ptr = base_addr + 0x100; - ptr++; - *ptr = (unsigned long long)devtree_size; + /* patch reserve map address for flattened device-tree + find last entry (both 0) in the reserve mem list. Assume DT + entry is before this one */ + bb_ptr = (struct bootblock *)( + (unsigned char *)info->segment[(info->nr_segments)-1].buf + + 0x100); + rsvmap_ptr = (long long *)( + (unsigned char *)info->segment[(info->nr_segments)-1].buf + + bb_ptr->off_mem_rsvmap + 0x100); + while (*rsvmap_ptr || *(rsvmap_ptr+1)){ + rsvmap_ptr += 2; + } + rsvmap_ptr -= 2; + *rsvmap_ptr = (unsigned long long)( + info->segment[(info->nr_segments)-1].mem + 0x100); + rsvmap_ptr++; + *rsvmap_ptr = (unsigned long long)bb_ptr->totalsize; unsigned int nr_segments; nr_segments = info->nr_segments; From benh at kernel.crashing.org Mon Nov 7 16:05:29 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 07 Nov 2005 16:05:29 +1100 Subject: [PATCH] ppc64: Thermal control for SMU based machiens In-Reply-To: <20051107044923.GA16703@lst.de> References: <1131334230.5229.157.camel@gaston> <20051107044923.GA16703@lst.de> Message-ID: <1131339930.11406.10.camel@gaston> On Mon, 2005-11-07 at 05:49 +0100, Christoph Hellwig wrote: > On Mon, Nov 07, 2005 at 02:30:28PM +1100, Benjamin Herrenschmidt wrote: > > This adds a new thermal control framework for PowerMac, along with the > > implementation for PowerMac8,1, PowerMac8,2 (iMac G5 rev 1 and 2), and > > PowerMac9,1 (latest single CPU desktop). In the future, I expect to move > > the older G5 thermal control to the new framework as well. > > the core code doesn't seem to be mac-specific, could you please move > it to common code? Especially the userland callout should be shared > with existing drivers on sparc64 and ia64. I don't request you to move > those over yet, I'll work with the maintainer to get there later. > > What about creating drivers/windfarm/ and move all the files there? 
I'd like to massage it a bit more before going to common code. I want to port the old thermal control, and maybe a bit smarter overtemp code as well. For now, I'd rather keep it there... I need it in now (users are complaining enough about their machines wanting to be vacuum cleaners) and I have no time this week before the patch gates close to do the other changes I have in mind, so I put that as a target for after 2.6.15. > > +struct wf_sensor_ops { > > + int (*get_value)(struct wf_sensor *sr, s32 *val); > > + void (*release)(struct wf_sensor *sr); > > + struct module *owner; > > tabs instead of spaces here :) oops... oh well...fixing :) > > +#include > > not needed anymore, latest mainline includes the CONFIG_ symbols > automatically. Good to know > > +#include > > not needed. Yah, remaining of some older bits. > > > +int wf_register_client(struct notifier_block *nb) > > +{ > > + int rc; > > + struct wf_control *ct; > > + struct wf_sensor *sr; > > + > > + down(&wf_lock); > > + rc = notifier_chain_register(&wf_client_list, nb); > > + if (rc != 0) > > + goto bail; > > + wf_client_count++; > > + list_for_each_entry(ct, &wf_controls, link) > > + wf_notify(WF_EVENT_NEW_CONTROL, ct); > > + list_for_each_entry(sr, &wf_sensors, link) > > + wf_notify(WF_EVENT_NEW_SENSOR, sr); > > + if (wf_client_count == 1) > > + wf_start_thread(); > > + bail: > > + up(&wf_lock); > > + return rc; > > +} > > shouldn't clients get proper methods instead of the notifier_block mess? It was easier that way for a first implementation. 
That's what I mean by "massaging it a bit more" :) There are some lifetime issues with clients I want to address as well so yes, I think the notifier will go :) The initial intent of the notifier however was to be able to have smarter mechanisms like "meta" sensors that catch sensor creations and create their own sensor (like a power sensor that catches the voltage and current sensors and exposes a combo power sensor) though I ended up not doing it that way. I also wanted a way to broadcast things like overtemp conditions system wide. But then, there is no limit on what a client can be. That is, the main "controller" for a given machine is a client, but anything else could be as well... > > Index: linux-work/drivers/macintosh/windfarm_smu_controls.c > > =================================================================== > > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > > +++ linux-work/drivers/macintosh/windfarm_smu_controls.c 2005-11-07 13:30:46.000000000 +1100 > > @@ -0,0 +1,274 @@ > > no copyright notices? Will fix. New patch on the way... Thanks, Ben. From benh at kernel.crashing.org Mon Nov 7 16:08:17 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 07 Nov 2005 16:08:17 +1100 Subject: [PATCH] ppc64: Thermal control for SMU based machines Message-ID: <1131340098.11406.14.camel@gaston> This adds a new thermal control framework for PowerMac, along with the implementation for PowerMac8,1, PowerMac8,2 (iMac G5 rev 1 and 2), and PowerMac9,1 (latest single CPU desktop). In the future, I expect to move the older G5 thermal control to the new framework as well.
Signed-off-by: Benjamin Herrenschmidt --- New version addressing some of Christoph comments Index: linux-work/drivers/macintosh/smu.c =================================================================== --- linux-work.orig/drivers/macintosh/smu.c 2005-11-07 13:30:45.000000000 +1100 +++ linux-work/drivers/macintosh/smu.c 2005-11-07 13:30:46.000000000 +1100 @@ -590,6 +590,8 @@ sprintf(name, "smu-i2c-%02x", *reg); of_platform_device_create(np, name, &smu->of_dev->dev); } + if (device_is_compatible(np, "smu-sensors")) + of_platform_device_create(np, "smu-sensors", &smu->of_dev->dev); } } Index: linux-work/drivers/macintosh/Kconfig =================================================================== --- linux-work.orig/drivers/macintosh/Kconfig 2005-11-07 13:29:50.000000000 +1100 +++ linux-work/drivers/macintosh/Kconfig 2005-11-07 13:30:46.000000000 +1100 @@ -169,6 +169,25 @@ This driver provides thermostat and fan control for the desktop G5 machines. +config WINDFARM + tristate "New PowerMac thermal control infrastructure" + +config WINDFARM_PM81 + tristate "Support for thermal management on iMac G5" + depends on WINDFARM && I2C && CPU_FREQ_PMAC64 && PMAC_SMU + select I2C_PMAC_SMU + help + This driver provides thermal control for the iMacG5 + +config WINDFARM_PM91 + tristate "Support for thermal management on PowerMac9,1" + depends on WINDFARM && I2C && CPU_FREQ_PMAC64 && PMAC_SMU + select I2C_PMAC_SMU + help + This driver provides thermal control for the PowerMac9,1 + which is the recent (SMU based) single CPU desktop G5 + + config ANSLCD tristate "Support for ANS LCD display" depends on ADB_CUDA && PPC_PMAC Index: linux-work/drivers/macintosh/Makefile =================================================================== --- linux-work.orig/drivers/macintosh/Makefile 2005-11-07 13:29:50.000000000 +1100 +++ linux-work/drivers/macintosh/Makefile 2005-11-07 13:30:46.000000000 +1100 @@ -26,3 +26,12 @@ obj-$(CONFIG_THERM_PM72) += therm_pm72.o obj-$(CONFIG_THERM_WINDTUNNEL) += 
therm_windtunnel.o obj-$(CONFIG_THERM_ADT746X) += therm_adt746x.o +obj-$(CONFIG_WINDFARM) += windfarm_core.o +obj-$(CONFIG_WINDFARM_PM81) += windfarm_smu_controls.o \ + windfarm_smu_sensors.o \ + windfarm_lm75_sensor.o windfarm_pid.o \ + windfarm_cpufreq_clamp.o windfarm_pm81.o +obj-$(CONFIG_WINDFARM_PM91) += windfarm_smu_controls.o \ + windfarm_smu_sensors.o \ + windfarm_lm75_sensor.o windfarm_pid.o \ + windfarm_cpufreq_clamp.o windfarm_pm91.o Index: linux-work/drivers/macintosh/windfarm.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm.h 2005-11-07 16:05:27.000000000 +1100 @@ -0,0 +1,131 @@ +/* + * Windfarm PowerMac thermal control. + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + */ + +#ifndef __WINDFARM_H__ +#define __WINDFARM_H__ + +#include +#include +#include +#include + +/* Display a 16.16 fixed point value */ +#define FIX32TOPRINT(f) ((f) >> 16),((((f) & 0xffff) * 1000) >> 16) + +/* + * Control objects + */ + +struct wf_control; + +struct wf_control_ops { + int (*set_value)(struct wf_control *ct, s32 val); + int (*get_value)(struct wf_control *ct, s32 *val); + s32 (*get_min)(struct wf_control *ct); + s32 (*get_max)(struct wf_control *ct); + void (*release)(struct wf_control *ct); + struct module *owner; +}; + +struct wf_control { + struct list_head link; + struct wf_control_ops *ops; + char *name; + int type; + struct kref ref; +}; + +#define WF_CONTROL_TYPE_GENERIC 0 +#define WF_CONTROL_RPM_FAN 1 +#define WF_CONTROL_PWM_FAN 2 + + +/* Note about lifetime rules: wf_register_control() will initialize + * the kref and wf_unregister_control will decrement it, thus the + * object creating/disposing a given control shouldn't assume it + * still exists after wf_unregister_control has been called. 
+ * wf_find_control will inc the refcount for you + */ +extern int wf_register_control(struct wf_control *ct); +extern void wf_unregister_control(struct wf_control *ct); +extern struct wf_control * wf_find_control(const char *name); +extern int wf_get_control(struct wf_control *ct); +extern void wf_put_control(struct wf_control *ct); + +static inline int wf_control_set_max(struct wf_control *ct) +{ + s32 vmax = ct->ops->get_max(ct); + return ct->ops->set_value(ct, vmax); +} + +static inline int wf_control_set_min(struct wf_control *ct) +{ + s32 vmin = ct->ops->get_min(ct); + return ct->ops->set_value(ct, vmin); +} + +/* + * Sensor objects + */ + +struct wf_sensor; + +struct wf_sensor_ops { + int (*get_value)(struct wf_sensor *sr, s32 *val); + void (*release)(struct wf_sensor *sr); + struct module *owner; +}; + +struct wf_sensor { + struct list_head link; + struct wf_sensor_ops *ops; + char *name; + struct kref ref; +}; + +/* Same lifetime rules as controls */ +extern int wf_register_sensor(struct wf_sensor *sr); +extern void wf_unregister_sensor(struct wf_sensor *sr); +extern struct wf_sensor * wf_find_sensor(const char *name); +extern int wf_get_sensor(struct wf_sensor *sr); +extern void wf_put_sensor(struct wf_sensor *sr); + +/* For use by clients. Note that we are a bit racy here since + * notifier_block doesn't have a module owner field. I may fix + * it one day ... + * + * LOCKING NOTE ! + * + * All "events" except WF_EVENT_TICK are called with an internal mutex + * held which will deadlock if you call basically any core routine. + * So don't ! Just take note of the event and do your actual operations + * from the ticker. + * + */ +extern int wf_register_client(struct notifier_block *nb); +extern int wf_unregister_client(struct notifier_block *nb); + +/* Overtemp conditions. 
Those are refcounted */ +extern void wf_set_overtemp(void); +extern void wf_clear_overtemp(void); +extern int wf_is_overtemp(void); + +#define WF_EVENT_NEW_CONTROL 0 /* param is wf_control * */ +#define WF_EVENT_NEW_SENSOR 1 /* param is wf_sensor * */ +#define WF_EVENT_OVERTEMP 2 /* no param */ +#define WF_EVENT_NORMALTEMP 3 /* overtemp condition cleared */ +#define WF_EVENT_TICK 4 /* 1 second tick */ + +/* Note: If that driver gets more broad use, we could replace the + * simplistic overtemp bits with "environmental conditions". That + * could then be used to also notify of things like fan failure, + * case open, battery conditions, ... + */ + +#endif /* __WINDFARM_H__ */ Index: linux-work/drivers/macintosh/windfarm_core.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_core.c 2005-11-07 16:03:12.000000000 +1100 @@ -0,0 +1,426 @@ +/* + * Windfarm PowerMac thermal control. Core + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + * + * This core code tracks the list of sensors & controls, register + * clients, and holds the kernel thread used for control. + * + * TODO: + * + * Add some information about sensor/control type and data format to + * sensors/controls, and have the sysfs attribute stuff be moved + * generically here instead of hard coded in the platform specific + * driver as it us currently + * + * This however requires solving some annoying lifetime issues with + * sysfs which doesn't seem to have lifetime rules for struct attribute, + * I may have to create full features kobjects for every sensor/control + * instead which is a bit of an overkill imho + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" + +#define VERSION "0.2" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) 
printk(args) +#else +#define DBG(args...) do { } while(0) +#endif + +static LIST_HEAD(wf_controls); +static LIST_HEAD(wf_sensors); +static DECLARE_MUTEX(wf_lock); +static struct notifier_block *wf_client_list; +static int wf_client_count; +static unsigned int wf_overtemp; +static unsigned int wf_overtemp_counter; +struct task_struct *wf_thread; + +/* + * Utilities & tick thread + */ + +static inline void wf_notify(int event, void *param) +{ + notifier_call_chain(&wf_client_list, event, param); +} + +int wf_critical_overtemp(void) +{ + static char * critical_overtemp_path = "/sbin/critical_overtemp"; + char *argv[] = { critical_overtemp_path, NULL }; + static char *envp[] = { "HOME=/", + "TERM=linux", + "PATH=/sbin:/usr/sbin:/bin:/usr/bin", + NULL }; + + return call_usermodehelper(critical_overtemp_path, argv, envp, 0); +} +EXPORT_SYMBOL_GPL(wf_critical_overtemp); + +static int wf_thread_func(void *data) +{ + unsigned long next, delay; + + next = jiffies; + + DBG("wf: thread started\n"); + + while(!kthread_should_stop()) { + try_to_freeze(); + + if (time_after_eq(jiffies, next)) { + wf_notify(WF_EVENT_TICK, NULL); + if (wf_overtemp) { + wf_overtemp_counter++; + /* 10 seconds overtemp, notify userland */ + if (wf_overtemp_counter > 10) + wf_critical_overtemp(); + /* 30 seconds, shutdown */ + if (wf_overtemp_counter > 30) { + printk(KERN_ERR "windfarm: Overtemp " + "for more than 30" + " seconds, shutting down\n"); + machine_power_off(); + } + } + next += HZ; + } + + delay = next - jiffies; + if (delay <= HZ) + schedule_timeout_interruptible(delay); + + /* there should be no signal, but oh well */ + if (signal_pending(current)) { + printk(KERN_WARNING "windfarm: thread got sigl !\n"); + break; + } + } + + DBG("wf: thread stopped\n"); + + return 0; +} + +static void wf_start_thread(void) +{ + wf_thread = kthread_run(wf_thread_func, NULL, "kwindfarm"); + if (IS_ERR(wf_thread)) { + printk(KERN_ERR "windfarm: failed to create thread,err %ld\n", + PTR_ERR(wf_thread)); + 
wf_thread = NULL; + } +} + + +static void wf_stop_thread(void) +{ + if (wf_thread) + kthread_stop(wf_thread); + wf_thread = NULL; +} + +/* + * Controls + */ + +static void wf_control_release(struct kref *kref) +{ + struct wf_control *ct = container_of(kref, struct wf_control, ref); + + DBG("wf: Deleting control %s\n", ct->name); + + if (ct->ops && ct->ops->release) + ct->ops->release(ct); + else + kfree(ct); +} + +int wf_register_control(struct wf_control *new_ct) +{ + struct wf_control *ct; + + down(&wf_lock); + list_for_each_entry(ct, &wf_controls, link) { + if (!strcmp(ct->name, new_ct->name)) { + printk(KERN_WARNING "windfarm: trying to register" + " duplicate control %s\n", ct->name); + up(&wf_lock); + return -EEXIST; + } + } + kref_init(&new_ct->ref); + list_add(&new_ct->link, &wf_controls); + + DBG("wf: Registered control %s\n", new_ct->name); + + wf_notify(WF_EVENT_NEW_CONTROL, new_ct); + up(&wf_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(wf_register_control); + +void wf_unregister_control(struct wf_control *ct) +{ + down(&wf_lock); + list_del(&ct->link); + up(&wf_lock); + + DBG("wf: Unregistered control %s\n", ct->name); + + kref_put(&ct->ref, wf_control_release); +} +EXPORT_SYMBOL_GPL(wf_unregister_control); + +struct wf_control * wf_find_control(const char *name) +{ + struct wf_control *ct; + + down(&wf_lock); + list_for_each_entry(ct, &wf_controls, link) { + if (!strcmp(ct->name, name)) { + if (wf_get_control(ct)) + ct = NULL; + up(&wf_lock); + return ct; + } + } + up(&wf_lock); + return NULL; +} +EXPORT_SYMBOL_GPL(wf_find_control); + +int wf_get_control(struct wf_control *ct) +{ + if (!try_module_get(ct->ops->owner)) + return -ENODEV; + kref_get(&ct->ref); + return 0; +} +EXPORT_SYMBOL_GPL(wf_get_control); + +void wf_put_control(struct wf_control *ct) +{ + struct module *mod = ct->ops->owner; + kref_put(&ct->ref, wf_control_release); + module_put(mod); +} +EXPORT_SYMBOL_GPL(wf_put_control); + + +/* + * Sensors + */ + + +static void 
wf_sensor_release(struct kref *kref) +{ + struct wf_sensor *sr = container_of(kref, struct wf_sensor, ref); + + DBG("wf: Deleting sensor %s\n", sr->name); + + if (sr->ops && sr->ops->release) + sr->ops->release(sr); + else + kfree(sr); +} + +int wf_register_sensor(struct wf_sensor *new_sr) +{ + struct wf_sensor *sr; + + down(&wf_lock); + list_for_each_entry(sr, &wf_sensors, link) { + if (!strcmp(sr->name, new_sr->name)) { + printk(KERN_WARNING "windfarm: trying to register" + " duplicate sensor %s\n", sr->name); + up(&wf_lock); + return -EEXIST; + } + } + kref_init(&new_sr->ref); + list_add(&new_sr->link, &wf_sensors); + + DBG("wf: Registered sensor %s\n", new_sr->name); + + wf_notify(WF_EVENT_NEW_SENSOR, new_sr); + up(&wf_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(wf_register_sensor); + +void wf_unregister_sensor(struct wf_sensor *sr) +{ + down(&wf_lock); + list_del(&sr->link); + up(&wf_lock); + + DBG("wf: Unregistered sensor %s\n", sr->name); + + wf_put_sensor(sr); +} +EXPORT_SYMBOL_GPL(wf_unregister_sensor); + +struct wf_sensor * wf_find_sensor(const char *name) +{ + struct wf_sensor *sr; + + down(&wf_lock); + list_for_each_entry(sr, &wf_sensors, link) { + if (!strcmp(sr->name, name)) { + if (wf_get_sensor(sr)) + sr = NULL; + up(&wf_lock); + return sr; + } + } + up(&wf_lock); + return NULL; +} +EXPORT_SYMBOL_GPL(wf_find_sensor); + +int wf_get_sensor(struct wf_sensor *sr) +{ + if (!try_module_get(sr->ops->owner)) + return -ENODEV; + kref_get(&sr->ref); + return 0; +} +EXPORT_SYMBOL_GPL(wf_get_sensor); + +void wf_put_sensor(struct wf_sensor *sr) +{ + struct module *mod = sr->ops->owner; + kref_put(&sr->ref, wf_sensor_release); + module_put(mod); +} +EXPORT_SYMBOL_GPL(wf_put_sensor); + + +/* + * Client & notification + */ + +int wf_register_client(struct notifier_block *nb) +{ + int rc; + struct wf_control *ct; + struct wf_sensor *sr; + + down(&wf_lock); + rc = notifier_chain_register(&wf_client_list, nb); + if (rc != 0) + goto bail; + wf_client_count++; + 
list_for_each_entry(ct, &wf_controls, link) + wf_notify(WF_EVENT_NEW_CONTROL, ct); + list_for_each_entry(sr, &wf_sensors, link) + wf_notify(WF_EVENT_NEW_SENSOR, sr); + if (wf_client_count == 1) + wf_start_thread(); + bail: + up(&wf_lock); + return rc; +} +EXPORT_SYMBOL_GPL(wf_register_client); + +int wf_unregister_client(struct notifier_block *nb) +{ + down(&wf_lock); + notifier_chain_unregister(&wf_client_list, nb); + wf_client_count--; + if (wf_client_count == 0) + wf_stop_thread(); + up(&wf_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(wf_unregister_client); + +void wf_set_overtemp(void) +{ + down(&wf_lock); + wf_overtemp++; + if (wf_overtemp == 1) { + printk(KERN_WARNING "windfarm: Overtemp condition detected !\n"); + wf_overtemp_counter = 0; + wf_notify(WF_EVENT_OVERTEMP, NULL); + } + up(&wf_lock); +} +EXPORT_SYMBOL_GPL(wf_set_overtemp); + +void wf_clear_overtemp(void) +{ + down(&wf_lock); + WARN_ON(wf_overtemp == 0); + if (wf_overtemp == 0) { + up(&wf_lock); + return; + } + wf_overtemp--; + if (wf_overtemp == 0) { + printk(KERN_WARNING "windfarm: Overtemp condition cleared !\n"); + wf_notify(WF_EVENT_NORMALTEMP, NULL); + } + up(&wf_lock); +} +EXPORT_SYMBOL_GPL(wf_clear_overtemp); + +int wf_is_overtemp(void) +{ + return (wf_overtemp != 0); +} +EXPORT_SYMBOL_GPL(wf_is_overtemp); + +static struct platform_device wf_platform_device = { + .name = "windfarm", +}; + +static int __init windfarm_core_init(void) +{ + DBG("wf: core loaded\n"); + + platform_device_register(&wf_platform_device); + return 0; +} + +static void __exit windfarm_core_exit(void) +{ + BUG_ON(wf_client_count != 0); + + DBG("wf: core unloaded\n"); + + platform_device_unregister(&wf_platform_device); +} + + +module_init(windfarm_core_init); +module_exit(windfarm_core_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("Core component of PowerMac thermal control"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_smu_controls.c 
=================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_smu_controls.c 2005-11-07 16:05:16.000000000 +1100 @@ -0,0 +1,282 @@ +/* + * Windfarm PowerMac thermal control. SMU based controls + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" + +#define VERSION "0.3" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) do { } while(0) +#endif + +/* + * SMU fans control object + */ + +static LIST_HEAD(smu_fans); + +struct smu_fan_control { + struct list_head link; + int fan_type; /* 0 = rpm, 1 = pwm */ + u32 reg; /* index in SMU */ + s32 value; /* current value */ + s32 min, max; /* min/max values */ + struct wf_control ctrl; +}; +#define to_smu_fan(c) container_of(c, struct smu_fan_control, ctrl) + +static int smu_set_fan(int pwm, u8 id, u16 value) +{ + struct smu_cmd cmd; + u8 buffer[16]; + DECLARE_COMPLETION(comp); + int rc; + + /* Fill SMU command structure */ + cmd.cmd = SMU_CMD_FAN_COMMAND; + cmd.data_len = 14; + cmd.reply_len = 16; + cmd.data_buf = cmd.reply_buf = buffer; + cmd.status = 0; + cmd.done = smu_done_complete; + cmd.misc = &comp; + + /* Fill argument buffer */ + memset(buffer, 0, 16); + buffer[0] = pwm ? 
0x10 : 0x00; + buffer[1] = 0x01 << id; + *((u16 *)&buffer[2 + id * 2]) = value; + + rc = smu_queue_cmd(&cmd); + if (rc) + return rc; + wait_for_completion(&comp); + return cmd.status; +} + +static void smu_fan_release(struct wf_control *ct) +{ + struct smu_fan_control *fct = to_smu_fan(ct); + + kfree(fct); +} + +static int smu_fan_set(struct wf_control *ct, s32 value) +{ + struct smu_fan_control *fct = to_smu_fan(ct); + + if (value < fct->min) + value = fct->min; + if (value > fct->max) + value = fct->max; + fct->value = value; + + return smu_set_fan(fct->fan_type, fct->reg, value); +} + +static int smu_fan_get(struct wf_control *ct, s32 *value) +{ + struct smu_fan_control *fct = to_smu_fan(ct); + *value = fct->value; /* todo: read from SMU */ + return 0; +} + +static s32 smu_fan_min(struct wf_control *ct) +{ + struct smu_fan_control *fct = to_smu_fan(ct); + return fct->min; +} + +static s32 smu_fan_max(struct wf_control *ct) +{ + struct smu_fan_control *fct = to_smu_fan(ct); + return fct->max; +} + +static struct wf_control_ops smu_fan_ops = { + .set_value = smu_fan_set, + .get_value = smu_fan_get, + .get_min = smu_fan_min, + .get_max = smu_fan_max, + .release = smu_fan_release, + .owner = THIS_MODULE, +}; + +static struct smu_fan_control *smu_fan_create(struct device_node *node, + int pwm_fan) +{ + struct smu_fan_control *fct; + s32 *v; u32 *reg; + char *l; + + fct = kmalloc(sizeof(struct smu_fan_control), GFP_KERNEL); + if (fct == NULL) + return NULL; + fct->ctrl.ops = &smu_fan_ops; + l = (char *)get_property(node, "location", NULL); + if (l == NULL) + goto fail; + + fct->fan_type = pwm_fan; + fct->ctrl.type = pwm_fan ? WF_CONTROL_PWM_FAN : WF_CONTROL_RPM_FAN; + + /* We use the name & location here the same way we do for SMU sensors, + * see the comment in windfarm_smu_sensors.c. The locations are a bit + * less consistent here between the iMac and the desktop models, but + * that is good enough for our needs for now at least. 
+ * + * One problem though is that Apple seem to be inconsistent with case + * and the kernel doesn't have strcasecmp =P + */ + + fct->ctrl.name = NULL; + + /* Names used on desktop models */ + if (!strcmp(l, "Rear Fan 0") || !strcmp(l, "Rear Fan") || + !strcmp(l, "Rear fan 0") || !strcmp(l, "Rear fan")) + fct->ctrl.name = "cpu-rear-fan-0"; + else if (!strcmp(l, "Rear Fan 1") || !strcmp(l, "Rear fan 1")) + fct->ctrl.name = "cpu-rear-fan-1"; + else if (!strcmp(l, "Front Fan 0") || !strcmp(l, "Front Fan") || + !strcmp(l, "Front fan 0") || !strcmp(l, "Front fan")) + fct->ctrl.name = "cpu-front-fan-0"; + else if (!strcmp(l, "Front Fan 1") || !strcmp(l, "Front fan 1")) + fct->ctrl.name = "cpu-front-fan-1"; + else if (!strcmp(l, "Slots Fan") || !strcmp(l, "Slots fan")) + fct->ctrl.name = "slots-fan"; + else if (!strcmp(l, "Drive Bay") || !strcmp(l, "Drive bay")) + fct->ctrl.name = "drive-bay-fan"; + + /* Names used on iMac models */ + if (!strcmp(l, "System Fan") || !strcmp(l, "System fan")) + fct->ctrl.name = "system-fan"; + else if (!strcmp(l, "CPU Fan") || !strcmp(l, "CPU fan")) + fct->ctrl.name = "cpu-fan"; + else if (!strcmp(l, "Hard Drive") || !strcmp(l, "Hard drive")) + fct->ctrl.name = "drive-bay-fan"; + + /* Unrecognized fan, bail out */ + if (fct->ctrl.name == NULL) + goto fail; + + /* Get min & max values*/ + v = (s32 *)get_property(node, "min-value", NULL); + if (v == NULL) + goto fail; + fct->min = *v; + v = (s32 *)get_property(node, "max-value", NULL); + if (v == NULL) + goto fail; + fct->max = *v; + + /* Get "reg" value */ + reg = (u32 *)get_property(node, "reg", NULL); + if (reg == NULL) + goto fail; + fct->reg = *reg; + + if (wf_register_control(&fct->ctrl)) + goto fail; + + return fct; + fail: + kfree(fct); + return NULL; +} + + +static int __init smu_controls_init(void) +{ + struct device_node *smu, *fans, *fan; + + if (!smu_present()) + return -ENODEV; + + smu = of_find_node_by_type(NULL, "smu"); + if (smu == NULL) + return -ENODEV; + + /* Look for 
RPM fans */ + for (fans = NULL; (fans = of_get_next_child(smu, fans)) != NULL;) + if (!strcmp(fans->name, "rpm-fans")) + break; + for (fan = NULL; + fans && (fan = of_get_next_child(fans, fan)) != NULL;) { + struct smu_fan_control *fct; + + fct = smu_fan_create(fan, 0); + if (fct == NULL) { + printk(KERN_WARNING "windfarm: Failed to create SMU " + "RPM fan %s\n", fan->name); + continue; + } + list_add(&fct->link, &smu_fans); + } + of_node_put(fans); + + + /* Look for PWM fans */ + for (fans = NULL; (fans = of_get_next_child(smu, fans)) != NULL;) + if (!strcmp(fans->name, "pwm-fans")) + break; + for (fan = NULL; + fans && (fan = of_get_next_child(fans, fan)) != NULL;) { + struct smu_fan_control *fct; + + fct = smu_fan_create(fan, 1); + if (fct == NULL) { + printk(KERN_WARNING "windfarm: Failed to create SMU " + "PWM fan %s\n", fan->name); + continue; + } + list_add(&fct->link, &smu_fans); + } + of_node_put(fans); + of_node_put(smu); + + return 0; +} + +static void __exit smu_controls_exit(void) +{ + struct smu_fan_control *fct; + + while (!list_empty(&smu_fans)) { + fct = list_entry(smu_fans.next, struct smu_fan_control, link); + list_del(&fct->link); + wf_unregister_control(&fct->ctrl); + } +} + + +module_init(smu_controls_init); +module_exit(smu_controls_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("SMU control objects for PowerMacs thermal control"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_smu_sensors.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_smu_sensors.c 2005-11-07 16:05:04.000000000 +1100 @@ -0,0 +1,479 @@ +/* + * Windfarm PowerMac thermal control. SMU based sensors + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" + +#define VERSION "0.2" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) do { } while(0) +#endif + +/* + * Various SMU "partitions" calibration objects for which we + * keep pointers here for use by bits & pieces of the driver + */ +static struct smu_sdbp_cpuvcp *cpuvcp; +static int cpuvcp_version; +static struct smu_sdbp_cpudiode *cpudiode; +static struct smu_sdbp_slotspow *slotspow; +static u8 *debugswitches; + +/* + * SMU basic sensors objects + */ + +static LIST_HEAD(smu_ads); + +struct smu_ad_sensor { + struct list_head link; + u32 reg; /* index in SMU */ + struct wf_sensor sens; +}; +#define to_smu_ads(c) container_of(c, struct smu_ad_sensor, sens) + +static void smu_ads_release(struct wf_sensor *sr) +{ + struct smu_ad_sensor *ads = to_smu_ads(sr); + + kfree(ads); +} + +static int smu_read_adc(u8 id, s32 *value) +{ + struct smu_simple_cmd cmd; + DECLARE_COMPLETION(comp); + int rc; + + rc = smu_queue_simple(&cmd, SMU_CMD_READ_ADC, 1, + smu_done_complete, &comp, id); + if (rc) + return rc; + wait_for_completion(&comp); + if (cmd.cmd.status != 0) + return cmd.cmd.status; + if (cmd.cmd.reply_len != 2) { + printk(KERN_ERR "windfarm: read ADC 0x%x returned %d bytes !\n", + id, cmd.cmd.reply_len); + return -EIO; + } + *value = *((u16 *)cmd.buffer); + return 0; +} + +static int smu_cputemp_get(struct wf_sensor *sr, s32 *value) +{ + struct smu_ad_sensor *ads = to_smu_ads(sr); + int rc; + s32 val; + s64 scaled; + + rc = smu_read_adc(ads->reg, &val); + if (rc) { + printk(KERN_ERR "windfarm: read CPU temp failed, err %d\n", + rc); + return rc; + } + + /* Ok, we have to scale & adjust, taking units into account */ + scaled = (s64)(((u64)val) * (u64)cpudiode->m_value); + scaled >>= 3; + scaled += ((s64)cpudiode->b_value) << 9; + *value = (s32)(scaled << 1); + + return 
0; +} + +static int smu_cpuamp_get(struct wf_sensor *sr, s32 *value) +{ + struct smu_ad_sensor *ads = to_smu_ads(sr); + s32 val, scaled; + int rc; + + rc = smu_read_adc(ads->reg, &val); + if (rc) { + printk(KERN_ERR "windfarm: read CPU current failed, err %d\n", + rc); + return rc; + } + + /* Ok, we have to scale & adjust, taking units into account */ + scaled = (s32)(val * (u32)cpuvcp->curr_scale); + scaled += (s32)cpuvcp->curr_offset; + *value = scaled << 4; + + return 0; +} + +static int smu_cpuvolt_get(struct wf_sensor *sr, s32 *value) +{ + struct smu_ad_sensor *ads = to_smu_ads(sr); + s32 val, scaled; + int rc; + + rc = smu_read_adc(ads->reg, &val); + if (rc) { + printk(KERN_ERR "windfarm: read CPU voltage failed, err %d\n", + rc); + return rc; + } + + /* Ok, we have to scale & adjust, taking units into account */ + scaled = (s32)(val * (u32)cpuvcp->volt_scale); + scaled += (s32)cpuvcp->volt_offset; + *value = scaled << 4; + + return 0; +} + +static int smu_slotspow_get(struct wf_sensor *sr, s32 *value) +{ + struct smu_ad_sensor *ads = to_smu_ads(sr); + s32 val, scaled; + int rc; + + rc = smu_read_adc(ads->reg, &val); + if (rc) { + printk(KERN_ERR "windfarm: read slots power failed, err %d\n", + rc); + return rc; + } + + /* Ok, we have to scale & adjust, taking units into account */ + scaled = (s32)(val * (u32)slotspow->pow_scale); + scaled += (s32)slotspow->pow_offset; + *value = scaled << 4; + + return 0; +} + + +static struct wf_sensor_ops smu_cputemp_ops = { + .get_value = smu_cputemp_get, + .release = smu_ads_release, + .owner = THIS_MODULE, +}; +static struct wf_sensor_ops smu_cpuamp_ops = { + .get_value = smu_cpuamp_get, + .release = smu_ads_release, + .owner = THIS_MODULE, +}; +static struct wf_sensor_ops smu_cpuvolt_ops = { + .get_value = smu_cpuvolt_get, + .release = smu_ads_release, + .owner = THIS_MODULE, +}; +static struct wf_sensor_ops smu_slotspow_ops = { + .get_value = smu_slotspow_get, + .release = smu_ads_release, + .owner = THIS_MODULE, +}; 
+ + +static struct smu_ad_sensor *smu_ads_create(struct device_node *node) +{ + struct smu_ad_sensor *ads; + char *c, *l; + u32 *v; + + ads = kmalloc(sizeof(struct smu_ad_sensor), GFP_KERNEL); + if (ads == NULL) + return NULL; + c = (char *)get_property(node, "device_type", NULL); + l = (char *)get_property(node, "location", NULL); + if (c == NULL || l == NULL) + goto fail; + + /* We currently pick the sensors based on the OF name and location + * properties, while Darwin uses the sensor-id's. + * The problem with the IDs is that they are model specific while it + * looks like Apple has been doing a reasonably good job at keeping + * the names and locations consistent so I'll stick with the names + * and locations for now. + */ + if (!strcmp(c, "temp-sensor") && + !strcmp(l, "CPU T-Diode")) { + ads->sens.ops = &smu_cputemp_ops; + ads->sens.name = "cpu-temp"; + } else if (!strcmp(c, "current-sensor") && + !strcmp(l, "CPU Current")) { + ads->sens.ops = &smu_cpuamp_ops; + ads->sens.name = "cpu-current"; + } else if (!strcmp(c, "voltage-sensor") && + !strcmp(l, "CPU Voltage")) { + ads->sens.ops = &smu_cpuvolt_ops; + ads->sens.name = "cpu-voltage"; + } else if (!strcmp(c, "power-sensor") && + !strcmp(l, "Slots Power")) { + ads->sens.ops = &smu_slotspow_ops; + ads->sens.name = "slots-power"; + if (slotspow == NULL) { + DBG("wf: slotspow partition (%02x) not found\n", + SMU_SDB_SLOTSPOW_ID); + goto fail; + } + } else + goto fail; + + v = (u32 *)get_property(node, "reg", NULL); + if (v == NULL) + goto fail; + ads->reg = *v; + + if (wf_register_sensor(&ads->sens)) + goto fail; + return ads; + fail: + kfree(ads); + return NULL; +} + +/* + * SMU Power combo sensor object + */ + +struct smu_cpu_power_sensor { + struct list_head link; + struct wf_sensor *volts; + struct wf_sensor *amps; + int fake_volts : 1; + int quadratic : 1; + struct wf_sensor sens; +}; +#define to_smu_cpu_power(c) container_of(c, struct smu_cpu_power_sensor, sens) + +static struct smu_cpu_power_sensor 
*smu_cpu_power; + +static void smu_cpu_power_release(struct wf_sensor *sr) +{ + struct smu_cpu_power_sensor *pow = to_smu_cpu_power(sr); + + if (pow->volts) + wf_put_sensor(pow->volts); + if (pow->amps) + wf_put_sensor(pow->amps); + kfree(pow); +} + +static int smu_cpu_power_get(struct wf_sensor *sr, s32 *value) +{ + struct smu_cpu_power_sensor *pow = to_smu_cpu_power(sr); + s32 volts, amps, power; + u64 tmps, tmpa, tmpb; + int rc; + + rc = pow->amps->ops->get_value(pow->amps, &amps); + if (rc) + return rc; + + if (pow->fake_volts) { + *value = amps * 12 - 0x30000; + return 0; + } + + rc = pow->volts->ops->get_value(pow->volts, &volts); + if (rc) + return rc; + + power = (s32)((((u64)volts) * ((u64)amps)) >> 16); + if (!pow->quadratic) { + *value = power; + return 0; + } + tmps = (((u64)power) * ((u64)power)) >> 16; + tmpa = ((u64)cpuvcp->power_quads[0]) * tmps; + tmpb = ((u64)cpuvcp->power_quads[1]) * ((u64)power); + *value = (tmpa >> 28) + (tmpb >> 28) + (cpuvcp->power_quads[2] >> 12); + + return 0; +} + +static struct wf_sensor_ops smu_cpu_power_ops = { + .get_value = smu_cpu_power_get, + .release = smu_cpu_power_release, + .owner = THIS_MODULE, +}; + + +static struct smu_cpu_power_sensor * +smu_cpu_power_create(struct wf_sensor *volts, struct wf_sensor *amps) +{ + struct smu_cpu_power_sensor *pow; + + pow = kmalloc(sizeof(struct smu_cpu_power_sensor), GFP_KERNEL); + if (pow == NULL) + return NULL; + pow->sens.ops = &smu_cpu_power_ops; + pow->sens.name = "cpu-power"; + + wf_get_sensor(volts); + pow->volts = volts; + wf_get_sensor(amps); + pow->amps = amps; + + /* Some early machines need a faked voltage */ + if (debugswitches && ((*debugswitches) & 0x80)) { + printk(KERN_INFO "windfarm: CPU Power sensor using faked" + " voltage !\n"); + pow->fake_volts = 1; + } else + pow->fake_volts = 0; + + /* Try to use quadratic transforms on PowerMac8,1 and 9,1 for now, + * I yet have to figure out what's up with 8,2 and will have to + * adjust for later, unless we can 100% 
trust the SDB partition... + */ + if ((machine_is_compatible("PowerMac8,1") || + machine_is_compatible("PowerMac8,2") || + machine_is_compatible("PowerMac9,1")) && + cpuvcp_version >= 2) { + pow->quadratic = 1; + DBG("windfarm: CPU Power using quadratic transform\n"); + } else + pow->quadratic = 0; + + if (wf_register_sensor(&pow->sens)) + goto fail; + return pow; + fail: + kfree(pow); + return NULL; +} + +static int smu_fetch_param_partitions(void) +{ + struct smu_sdbp_header *hdr; + + /* Get CPU voltage/current/power calibration data */ + hdr = smu_get_sdb_partition(SMU_SDB_CPUVCP_ID, NULL); + if (hdr == NULL) { + DBG("wf: cpuvcp partition (%02x) not found\n", + SMU_SDB_CPUVCP_ID); + return -ENODEV; + } + cpuvcp = (struct smu_sdbp_cpuvcp *)&hdr[1]; + /* Keep version around */ + cpuvcp_version = hdr->version; + + /* Get CPU diode calibration data */ + hdr = smu_get_sdb_partition(SMU_SDB_CPUDIODE_ID, NULL); + if (hdr == NULL) { + DBG("wf: cpudiode partition (%02x) not found\n", + SMU_SDB_CPUDIODE_ID); + return -ENODEV; + } + cpudiode = (struct smu_sdbp_cpudiode *)&hdr[1]; + + /* Get slots power calibration data if any */ + hdr = smu_get_sdb_partition(SMU_SDB_SLOTSPOW_ID, NULL); + if (hdr != NULL) + slotspow = (struct smu_sdbp_slotspow *)&hdr[1]; + + /* Get debug switches if any */ + hdr = smu_get_sdb_partition(SMU_SDB_DEBUG_SWITCHES_ID, NULL); + if (hdr != NULL) + debugswitches = (u8 *)&hdr[1]; + + return 0; +} + +static int __init smu_sensors_init(void) +{ + struct device_node *smu, *sensors, *s; + struct smu_ad_sensor *volt_sensor = NULL, *curr_sensor = NULL; + int rc; + + if (!smu_present()) + return -ENODEV; + + /* Get parameters partitions */ + rc = smu_fetch_param_partitions(); + if (rc) + return rc; + + smu = of_find_node_by_type(NULL, "smu"); + if (smu == NULL) + return -ENODEV; + + /* Look for sensors subdir */ + for (sensors = NULL; + (sensors = of_get_next_child(smu, sensors)) != NULL;) + if (!strcmp(sensors->name, "sensors")) + break; + + 
of_node_put(smu); + + /* Create basic sensors */ + for (s = NULL; + sensors && (s = of_get_next_child(sensors, s)) != NULL;) { + struct smu_ad_sensor *ads; + + ads = smu_ads_create(s); + if (ads == NULL) + continue; + list_add(&ads->link, &smu_ads); + /* keep track of cpu voltage & current */ + if (!strcmp(ads->sens.name, "cpu-voltage")) + volt_sensor = ads; + else if (!strcmp(ads->sens.name, "cpu-current")) + curr_sensor = ads; + } + + of_node_put(sensors); + + /* Create CPU power sensor if possible */ + if (volt_sensor && curr_sensor) + smu_cpu_power = smu_cpu_power_create(&volt_sensor->sens, + &curr_sensor->sens); + + return 0; +} + +static void __exit smu_sensors_exit(void) +{ + struct smu_ad_sensor *ads; + + /* dispose of power sensor */ + if (smu_cpu_power) + wf_unregister_sensor(&smu_cpu_power->sens); + + /* dispose of basic sensors */ + while (!list_empty(&smu_ads)) { + ads = list_entry(smu_ads.next, struct smu_ad_sensor, link); + list_del(&ads->link); + wf_unregister_sensor(&ads->sens); + } +} + + +module_init(smu_sensors_init); +module_exit(smu_sensors_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("SMU sensor objects for PowerMacs thermal control"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_lm75_sensor.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_lm75_sensor.c 2005-11-07 16:04:50.000000000 +1100 @@ -0,0 +1,263 @@ +/* + * Windfarm PowerMac thermal control. LM75 sensor + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" + +#define VERSION "0.1" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) 
do { } while(0) +#endif + +struct wf_lm75_sensor { + int ds1775 : 1; + int inited : 1; + struct i2c_client i2c; + struct wf_sensor sens; +}; +#define wf_to_lm75(c) container_of(c, struct wf_lm75_sensor, sens) +#define i2c_to_lm75(c) container_of(c, struct wf_lm75_sensor, i2c) + +static int wf_lm75_attach(struct i2c_adapter *adapter); +static int wf_lm75_detach(struct i2c_client *client); + +static struct i2c_driver wf_lm75_driver = { + .owner = THIS_MODULE, + .name = "wf_lm75", + .flags = I2C_DF_NOTIFY, + .attach_adapter = wf_lm75_attach, + .detach_client = wf_lm75_detach, +}; + +static int wf_lm75_get(struct wf_sensor *sr, s32 *value) +{ + struct wf_lm75_sensor *lm = wf_to_lm75(sr); + s32 data; + + if (lm->i2c.adapter == NULL) + return -ENODEV; + + /* Init chip if necessary */ + if (!lm->inited) { + u8 cfg_new, cfg = (u8)i2c_smbus_read_byte_data(&lm->i2c, 1); + + DBG("wf_lm75: Initializing %s, cfg was: %02x\n", + sr->name, cfg); + + /* clear shutdown bit, keep other settings as left by + * the firmware for now + */ + cfg_new = cfg & ~0x01; + i2c_smbus_write_byte_data(&lm->i2c, 1, cfg_new); + lm->inited = 1; + + /* If we just powered it up, let's wait 200 ms */ + msleep(200); + } + + /* Read temperature register */ + data = (s32)le16_to_cpu(i2c_smbus_read_word_data(&lm->i2c, 0)); + data <<= 8; + *value = data; + + return 0; +} + +static void wf_lm75_release(struct wf_sensor *sr) +{ + struct wf_lm75_sensor *lm = wf_to_lm75(sr); + + /* check if client is registered and detach from i2c */ + if (lm->i2c.adapter) { + i2c_detach_client(&lm->i2c); + lm->i2c.adapter = NULL; + } + + kfree(lm); +} + +static struct wf_sensor_ops wf_lm75_ops = { + .get_value = wf_lm75_get, + .release = wf_lm75_release, + .owner = THIS_MODULE, +}; + +static struct wf_lm75_sensor *wf_lm75_create(struct i2c_adapter *adapter, + u8 addr, int ds1775, + const char *loc) +{ + struct wf_lm75_sensor *lm; + + DBG("wf_lm75: creating %s device at address 0x%02x\n", + ds1775 ? 
"ds1775" : "lm75", addr); + + lm = kmalloc(sizeof(struct wf_lm75_sensor), GFP_KERNEL); + if (lm == NULL) + return NULL; + memset(lm, 0, sizeof(struct wf_lm75_sensor)); + + /* Usual rant about sensor names not beeing very consistent in + * the device-tree, oh well ... + * Add more entries below as you deal with more setups + */ + if (!strcmp(loc, "Hard drive") || !strcmp(loc, "DRIVE BAY")) + lm->sens.name = "hd-temp"; + else + goto fail; + + lm->inited = 0; + lm->sens.ops = &wf_lm75_ops; + lm->ds1775 = ds1775; + lm->i2c.addr = (addr >> 1) & 0x7f; + lm->i2c.adapter = adapter; + lm->i2c.driver = &wf_lm75_driver; + strncpy(lm->i2c.name, lm->sens.name, I2C_NAME_SIZE-1); + + if (i2c_attach_client(&lm->i2c)) { + printk(KERN_ERR "windfarm: failed to attach %s %s to i2c\n", + ds1775 ? "ds1775" : "lm75", lm->i2c.name); + goto fail; + } + + if (wf_register_sensor(&lm->sens)) { + i2c_detach_client(&lm->i2c); + goto fail; + } + + return lm; + fail: + kfree(lm); + return NULL; +} + +static int wf_lm75_attach(struct i2c_adapter *adapter) +{ + u8 bus_id; + struct device_node *smu, *bus, *dev; + + /* We currently only deal with LM75's hanging off the SMU + * i2c busses. If we extend that driver to other/older + * machines, we should split this function into SMU-i2c, + * keywest-i2c, PMU-i2c, ... 
+ */ + + DBG("wf_lm75: adapter %s detected\n", adapter->name); + + if (strncmp(adapter->name, "smu-i2c-", 8) != 0) + return 0; + smu = of_find_node_by_type(NULL, "smu"); + if (smu == NULL) + return 0; + + /* Look for the bus in the device-tree */ + bus_id = (u8)simple_strtoul(adapter->name + 8, NULL, 16); + + DBG("wf_lm75: bus ID is %x\n", bus_id); + + /* Look for sensors subdir */ + for (bus = NULL; + (bus = of_get_next_child(smu, bus)) != NULL;) { + u32 *reg; + + if (strcmp(bus->name, "i2c")) + continue; + reg = (u32 *)get_property(bus, "reg", NULL); + if (reg == NULL) + continue; + if (bus_id == *reg) + break; + } + of_node_put(smu); + if (bus == NULL) { + printk(KERN_WARNING "windfarm: SMU i2c bus 0x%x not found" + " in device-tree !\n", bus_id); + return 0; + } + + DBG("wf_lm75: bus found, looking for device...\n"); + + /* Now look for lm75(s) in there */ + for (dev = NULL; + (dev = of_get_next_child(bus, dev)) != NULL;) { + const char *loc = + get_property(dev, "hwsensor-location", NULL); + u32 *reg = (u32 *)get_property(dev, "reg", NULL); + DBG(" dev: %s... 
(loc: %p, reg: %p)\n", dev->name, loc, reg); + if (loc == NULL || reg == NULL) + continue; + /* real lm75 */ + if (device_is_compatible(dev, "lm75")) + wf_lm75_create(adapter, *reg, 0, loc); + /* ds1775 (compatible, better resolution) */ + else if (device_is_compatible(dev, "ds1775")) + wf_lm75_create(adapter, *reg, 1, loc); + } + + of_node_put(bus); + + return 0; +} + +static int wf_lm75_detach(struct i2c_client *client) +{ + struct wf_lm75_sensor *lm = i2c_to_lm75(client); + + DBG("wf_lm75: i2c detach called for %s\n", lm->sens.name); + + /* Mark client detached */ + lm->i2c.adapter = NULL; + + /* release sensor */ + wf_unregister_sensor(&lm->sens); + + return 0; +} + +static int __init wf_lm75_sensor_init(void) +{ + int rc; + + rc = i2c_add_driver(&wf_lm75_driver); + if (rc < 0) + return rc; + return 0; +} + +static void __exit wf_lm75_sensor_exit(void) +{ + i2c_del_driver(&wf_lm75_driver); +} + + +module_init(wf_lm75_sensor_init); +module_exit(wf_lm75_sensor_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("LM75 sensor objects for PowerMacs thermal control"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_pid.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_pid.c 2005-11-07 16:04:41.000000000 +1100 @@ -0,0 +1,145 @@ +/* + * Windfarm PowerMac thermal control. Generic PID helpers + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + */ + +#include +#include +#include +#include +#include + +#include "windfarm_pid.h" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) 
do { } while(0) +#endif + +void wf_pid_init(struct wf_pid_state *st, struct wf_pid_param *param) +{ + memset(st, 0, sizeof(struct wf_pid_state)); + st->param = *param; + st->first = 1; +} +EXPORT_SYMBOL_GPL(wf_pid_init); + +s32 wf_pid_run(struct wf_pid_state *st, s32 new_sample) +{ + s64 error, integ, deriv; + s32 target; + int i, hlen = st->param.history_len; + + /* Calculate error term */ + error = new_sample - st->param.itarget; + + /* Get samples into our history buffer */ + if (st->first) { + for (i = 0; i < hlen; i++) { + st->samples[i] = new_sample; + st->errors[i] = error; + } + st->first = 0; + st->index = 0; + } else { + st->index = (st->index + 1) % hlen; + st->samples[st->index] = new_sample; + st->errors[st->index] = error; + } + + /* Calculate integral term */ + for (i = 0, integ = 0; i < hlen; i++) + integ += st->errors[(st->index + hlen - i) % hlen]; + integ *= st->param.interval; + + /* Calculate derivative term */ + deriv = st->errors[st->index] - + st->errors[(st->index + hlen - 1) % hlen]; + deriv /= st->param.interval; + + /* Calculate target */ + target = (s32)((integ * (s64)st->param.gr + deriv * (s64)st->param.gd + + error * (s64)st->param.gp) >> 36); + if (st->param.additive) + target += st->target; + target = max(target, st->param.min); + target = min(target, st->param.max); + st->target = target; + + return st->target; +} +EXPORT_SYMBOL_GPL(wf_pid_run); + +void wf_cpu_pid_init(struct wf_cpu_pid_state *st, + struct wf_cpu_pid_param *param) +{ + memset(st, 0, sizeof(struct wf_cpu_pid_state)); + st->param = *param; + st->first = 1; +} +EXPORT_SYMBOL_GPL(wf_cpu_pid_init); + +s32 wf_cpu_pid_run(struct wf_cpu_pid_state *st, s32 new_power, s32 new_temp) +{ + s64 error, integ, deriv, prop; + s32 target, sval, adj; + int i, hlen = st->param.history_len; + + /* Calculate error term */ + error = st->param.pmaxadj - new_power; + + /* Get samples into our history buffer */ + if (st->first) { + for (i = 0; i < hlen; i++) { + st->powers[i] = new_power; 
+ st->errors[i] = error; + } + st->temps[0] = st->temps[1] = new_temp; + st->first = 0; + st->index = st->tindex = 0; + } else { + st->index = (st->index + 1) % hlen; + st->powers[st->index] = new_power; + st->errors[st->index] = error; + st->tindex = (st->tindex + 1) % 2; + st->temps[st->tindex] = new_temp; + } + + /* Calculate integral term */ + for (i = 0, integ = 0; i < hlen; i++) + integ += st->errors[(st->index + hlen - i) % hlen]; + integ *= st->param.interval; + integ *= st->param.gr; + sval = st->param.tmax - ((integ >> 20) & 0xffffffff); + adj = min(st->param.ttarget, sval); + + DBG("integ: %lx, sval: %lx, adj: %lx\n", integ, sval, adj); + + /* Calculate derivative term */ + deriv = st->temps[st->tindex] - + st->temps[(st->tindex + 2 - 1) % 2]; + deriv /= st->param.interval; + deriv *= st->param.gd; + + /* Calculate proportional term */ + prop = (new_temp - adj); + prop *= st->param.gp; + + DBG("deriv: %lx, prop: %lx\n", deriv, prop); + + /* Calculate target */ + target = st->target + (s32)((deriv + prop) >> 36); + target = max(target, st->param.min); + target = min(target, st->param.max); + st->target = target; + + return st->target; +} +EXPORT_SYMBOL_GPL(wf_cpu_pid_run); Index: linux-work/drivers/macintosh/windfarm_pid.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_pid.h 2005-11-07 13:30:46.000000000 +1100 @@ -0,0 +1,84 @@ +/* + * Windfarm PowerMac thermal control. Generic PID helpers + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + * + * This is a pair of generic PID helpers that can be used by + * control loops. 
One is the basic PID implementation, the + * other one is more specifically tailored to the loops used + * for CPU control with 2 input sample types (temp and power) + */ + +/* + * *** Simple PID *** + */ + +#define WF_PID_MAX_HISTORY 32 + +/* This parameter array is passed to the PID algorithm. Currently, + * we don't support changing parameters on the fly as it's not needed + * but could be implemented (with necessary adjustment of the history + * buffer + */ +struct wf_pid_param { + int interval; /* Interval between samples in seconds */ + int history_len; /* Size of history buffer */ + int additive; /* 1: target relative to previous value */ + s32 gd, gp, gr; /* PID gains */ + s32 itarget; /* PID input target */ + s32 min,max; /* min and max target values */ +}; + +struct wf_pid_state { + int first; /* first run of the loop */ + int index; /* index of current sample */ + s32 target; /* current target value */ + s32 samples[WF_PID_MAX_HISTORY]; /* samples history buffer */ + s32 errors[WF_PID_MAX_HISTORY]; /* error history buffer */ + + struct wf_pid_param param; +}; + +extern void wf_pid_init(struct wf_pid_state *st, struct wf_pid_param *param); +extern s32 wf_pid_run(struct wf_pid_state *st, s32 sample); + + +/* + * *** CPU PID *** + */ + +#define WF_CPU_PID_MAX_HISTORY 32 + +/* This parameter array is passed to the CPU PID algorithm. 
Currently, + * we don't support changing parameters on the fly as it's not needed + * but could be implemented (with necessary adjustment of the history + * buffer + */ +struct wf_cpu_pid_param { + int interval; /* Interval between samples in seconds */ + int history_len; /* Size of history buffer */ + s32 gd, gp, gr; /* PID gains */ + s32 pmaxadj; /* PID max power adjust */ + s32 ttarget; /* PID input target */ + s32 tmax; /* PID input max */ + s32 min,max; /* min and max target values */ +}; + +struct wf_cpu_pid_state { + int first; /* first run of the loop */ + int index; /* index of current power */ + int tindex; /* index of current temp */ + s32 target; /* current target value */ + s32 powers[WF_PID_MAX_HISTORY]; /* power history buffer */ + s32 errors[WF_PID_MAX_HISTORY]; /* error history buffer */ + s32 temps[2]; /* temp. history buffer */ + + struct wf_cpu_pid_param param; +}; + +extern void wf_cpu_pid_init(struct wf_cpu_pid_state *st, + struct wf_cpu_pid_param *param); +extern s32 wf_cpu_pid_run(struct wf_cpu_pid_state *st, s32 power, s32 temp); Index: linux-work/drivers/macintosh/windfarm_cpufreq_clamp.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_cpufreq_clamp.c 2005-11-07 13:30:46.000000000 +1100 @@ -0,0 +1,105 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" + +#define VERSION "0.3" + +static int clamped; +static struct wf_control *clamp_control; + +static int clamp_notifier_call(struct notifier_block *self, + unsigned long event, void *data) +{ + struct cpufreq_policy *p = data; + unsigned long max_freq; + + if (event != CPUFREQ_ADJUST) + return 0; + + max_freq = clamped ? 
(p->cpuinfo.min_freq) : (p->cpuinfo.max_freq); + cpufreq_verify_within_limits(p, 0, max_freq); + + return 0; +} + +static struct notifier_block clamp_notifier = { + .notifier_call = clamp_notifier_call, +}; + +static int clamp_set(struct wf_control *ct, s32 value) +{ + if (value) + printk(KERN_INFO "windfarm: Clamping CPU frequency to " + "minimum !\n"); + else + printk(KERN_INFO "windfarm: CPU frequency unclamped !\n"); + clamped = value; + cpufreq_update_policy(0); + return 0; +} + +static int clamp_get(struct wf_control *ct, s32 *value) +{ + *value = clamped; + return 0; +} + +static s32 clamp_min(struct wf_control *ct) +{ + return 0; +} + +static s32 clamp_max(struct wf_control *ct) +{ + return 1; +} + +static struct wf_control_ops clamp_ops = { + .set_value = clamp_set, + .get_value = clamp_get, + .get_min = clamp_min, + .get_max = clamp_max, + .owner = THIS_MODULE, +}; + +static int __init wf_cpufreq_clamp_init(void) +{ + struct wf_control *clamp; + + clamp = kmalloc(sizeof(struct wf_control), GFP_KERNEL); + if (clamp == NULL) + return -ENOMEM; + cpufreq_register_notifier(&clamp_notifier, CPUFREQ_POLICY_NOTIFIER); + clamp->ops = &clamp_ops; + clamp->name = "cpufreq-clamp"; + if (wf_register_control(clamp)) + goto fail; + clamp_control = clamp; + return 0; + fail: + kfree(clamp); + return -ENODEV; +} + +static void __exit wf_cpufreq_clamp_exit(void) +{ + if (clamp_control) + wf_unregister_control(clamp_control); +} + + +module_init(wf_cpufreq_clamp_init); +module_exit(wf_cpufreq_clamp_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("CPU frequency clamp for PowerMacs thermal control"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_pm81.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/drivers/macintosh/windfarm_pm81.c 2005-11-07 16:04:23.000000000 +1100 @@ -0,0 +1,879 @@ +/* + * Windfarm PowerMac thermal control. 
iMac G5 + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the terms of the GNU GPL v2. + * + * The algorithm used is the PID control algorithm, used the same + * way the published Darwin code does, using the same values that + * are present in the Darwin 8.2 snapshot property lists (note however + * that none of the code has been re-used, it's a complete re-implementation) + * + * The various control loops found in Darwin config file are: + * + * PowerMac8,1 and PowerMac8,2 + * =========================== + * + * System Fans control loop. Different based on models. In addition to the + * usual PID algorithm, the control loop gets 2 additional pairs of linear + * scaling factors (scale/offsets) expressed as 4.12 fixed point values + * (signed offset, unsigned scale) + * + * The targets are modified as follows: + * - the linked control (second control) gets the target value as-is + * (typically the drive fan) + * - the main control (first control) gets the target value scaled with + * the first pair of factors, and is then modified as below + * - the value of the target of the CPU Fan control loop is retrieved, + * scaled with the second pair of factors, and the max of that and + * the scaled target is applied to the main control.
+ * + * # model_id: 2 + * controls : system-fan, drive-bay-fan + * sensors : hd-temp + * PID params : G_d = 0x15400000 + * G_p = 0x00200000 + * G_r = 0x000002fd + * History = 2 entries + * Input target = 0x3a0000 + * Interval = 5s + * linear-factors : offset = 0xff38 scale = 0x0ccd + * offset = 0x0208 scale = 0x07ae + * + * # model_id: 3 + * controls : system-fan, drive-bay-fan + * sensors : hd-temp + * PID params : G_d = 0x08e00000 + * G_p = 0x00566666 + * G_r = 0x0000072b + * History = 2 entries + * Input target = 0x350000 + * Interval = 5s + * linear-factors : offset = 0xff38 scale = 0x0ccd + * offset = 0x0000 scale = 0x0000 + * + * # model_id: 5 + * controls : system-fan + * sensors : hd-temp + * PID params : G_d = 0x15400000 + * G_p = 0x00233333 + * G_r = 0x000002fd + * History = 2 entries + * Input target = 0x3a0000 + * Interval = 5s + * linear-factors : offset = 0x0000 scale = 0x1000 + * offset = 0x0091 scale = 0x0bae + * + * CPU Fan control loop. The loop is identical for all models. It + * has an additional pair of scaling factors. This is used to scale the + * system fans control loop target result (the one before it gets scaled + * by the System Fans control loop itself). Then, the max value of the + * calculated target value and system fan value is sent to the fans + * + * controls : cpu-fan + * sensors : cpu-temp cpu-power + * PID params : From SMU sdb partition + * linear-factors : offset = 0xfb50 scale = 0x1000 + * + * CPU Slew control loop. Not implemented. The cpufreq driver in linux is + * completely separate for now, though we could find a way to link it, either + * as a client reacting to overtemp notifications, or directly monitoring + * the CPU temperature + * + * WARNING ! The CPU control loop requires the CPU tmax for the current + * operating point. However, we currently are completely separated from + * the cpufreq driver and thus do not know what the current operating + * point is.
Fortunately, we also do not have any hardware supporting anything + * but operating point 0 at the moment, thus we just peek that value directly + * from the SDB partition. If we ever end up with actually slewing the system + * clock and thus changing operating points, we'll have to find a way to + * communicate with the CPU freq driver; + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" +#include "windfarm_pid.h" + +#define VERSION "0.4" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) do { } while(0) +#endif + +/* define this to force CPU overtemp to 74 degree, useful for testing + * the overtemp code + */ +#undef HACKED_OVERTEMP + +static int wf_smu_mach_model; /* machine model id */ + +static struct device *wf_smu_dev; + +/* Controls & sensors */ +static struct wf_sensor *sensor_cpu_power; +static struct wf_sensor *sensor_cpu_temp; +static struct wf_sensor *sensor_hd_temp; +static struct wf_control *fan_cpu_main; +static struct wf_control *fan_hd; +static struct wf_control *fan_system; +static struct wf_control *cpufreq_clamp; + +/* Set to kick the control loop into life */ +static int wf_smu_all_controls_ok, wf_smu_all_sensors_ok, wf_smu_started; + +/* Failure handling.. could be nicer */ +#define FAILURE_FAN 0x01 +#define FAILURE_SENSOR 0x02 +#define FAILURE_OVERTEMP 0x04 + +static unsigned int wf_smu_failure_state; +static int wf_smu_readjust, wf_smu_skipping; + +/* + * ****** System Fans Control Loop ****** + * + */ + +/* Parameters for the System Fans control loop. Parameters + * not in this table such as interval, history size, ... + * are common to all versions and thus hard coded for now. 
+ */ +struct wf_smu_sys_fans_param { + int model_id; + s32 itarget; + s32 gd, gp, gr; + + s16 offset0; + u16 scale0; + s16 offset1; + u16 scale1; +}; + +#define WF_SMU_SYS_FANS_INTERVAL 5 +#define WF_SMU_SYS_FANS_HISTORY_SIZE 2 + +/* State data used by the system fans control loop + */ +struct wf_smu_sys_fans_state { + int ticks; + s32 sys_setpoint; + s32 hd_setpoint; + s16 offset0; + u16 scale0; + s16 offset1; + u16 scale1; + struct wf_pid_state pid; +}; + +/* + * Configs for SMU System Fan control loop + */ +static struct wf_smu_sys_fans_param wf_smu_sys_all_params[] = { + /* Model ID 2 */ + { + .model_id = 2, + .itarget = 0x3a0000, + .gd = 0x15400000, + .gp = 0x00200000, + .gr = 0x000002fd, + .offset0 = 0xff38, + .scale0 = 0x0ccd, + .offset1 = 0x0208, + .scale1 = 0x07ae, + }, + /* Model ID 3 */ + { + .model_id = 3, + .itarget = 0x350000, + .gd = 0x08e00000, + .gp = 0x00566666, + .gr = 0x0000072b, + .offset0 = 0xff38, + .scale0 = 0x0ccd, + .offset1 = 0x0000, + .scale1 = 0x0000, + }, + /* Model ID 5 */ + { + .model_id = 5, + .itarget = 0x3a0000, + .gd = 0x15400000, + .gp = 0x00233333, + .gr = 0x000002fd, + .offset0 = 0x0000, + .scale0 = 0x1000, + .offset1 = 0x0091, + .scale1 = 0x0bae, + }, +}; +#define WF_SMU_SYS_FANS_NUM_CONFIGS ARRAY_SIZE(wf_smu_sys_all_params) + +static struct wf_smu_sys_fans_state *wf_smu_sys_fans; + +/* + * ****** CPU Fans Control Loop ****** + * + */ + + +#define WF_SMU_CPU_FANS_INTERVAL 1 +#define WF_SMU_CPU_FANS_MAX_HISTORY 16 +#define WF_SMU_CPU_FANS_SIBLING_SCALE 0x00001000 +#define WF_SMU_CPU_FANS_SIBLING_OFFSET 0xfffffb50 + +/* State data used by the cpu fans control loop + */ +struct wf_smu_cpu_fans_state { + int ticks; + s32 cpu_setpoint; + s32 scale; + s32 offset; + struct wf_cpu_pid_state pid; +}; + +static struct wf_smu_cpu_fans_state *wf_smu_cpu_fans; + + + +/* + * ***** Implementation ***** + * + */ + +static void wf_smu_create_sys_fans(void) +{ + struct wf_smu_sys_fans_param *param = NULL; + struct wf_pid_param pid_param; + int
i; + + /* First, locate the params for this model */ + for (i = 0; i < WF_SMU_SYS_FANS_NUM_CONFIGS; i++) + if (wf_smu_sys_all_params[i].model_id == wf_smu_mach_model) { + param = &wf_smu_sys_all_params[i]; + break; + } + + /* No params found, put fans to max */ + if (param == NULL) { + printk(KERN_WARNING "windfarm: System fan config not found " + "for this machine model, max fan speed\n"); + goto fail; + } + + /* Alloc & initialize state */ + wf_smu_sys_fans = kmalloc(sizeof(struct wf_smu_sys_fans_state), + GFP_KERNEL); + if (wf_smu_sys_fans == NULL) { + printk(KERN_WARNING "windfarm: Memory allocation error" + " max fan speed\n"); + goto fail; + } + wf_smu_sys_fans->ticks = 1; + wf_smu_sys_fans->scale0 = param->scale0; + wf_smu_sys_fans->offset0 = param->offset0; + wf_smu_sys_fans->scale1 = param->scale1; + wf_smu_sys_fans->offset1 = param->offset1; + + /* Fill PID params */ + pid_param.gd = param->gd; + pid_param.gp = param->gp; + pid_param.gr = param->gr; + pid_param.interval = WF_SMU_SYS_FANS_INTERVAL; + pid_param.history_len = WF_SMU_SYS_FANS_HISTORY_SIZE; + pid_param.itarget = param->itarget; + pid_param.min = fan_system->ops->get_min(fan_system); + pid_param.max = fan_system->ops->get_max(fan_system); + if (fan_hd) { + pid_param.min = + max(pid_param.min,fan_hd->ops->get_min(fan_hd)); + pid_param.max = + min(pid_param.max,fan_hd->ops->get_max(fan_hd)); + } + wf_pid_init(&wf_smu_sys_fans->pid, &pid_param); + + DBG("wf: System Fan control initialized.\n"); + DBG(" itarged=%d.%03d, min=%d RPM, max=%d RPM\n", + FIX32TOPRINT(pid_param.itarget), pid_param.min, pid_param.max); + return; + + fail: + + if (fan_system) + wf_control_set_max(fan_system); + if (fan_hd) + wf_control_set_max(fan_hd); +} + +static void wf_smu_sys_fans_tick(struct wf_smu_sys_fans_state *st) +{ + s32 new_setpoint, temp, scaled, cputarget; + int rc; + + if (--st->ticks != 0) { + if (wf_smu_readjust) + goto readjust; + return; + } + st->ticks = WF_SMU_SYS_FANS_INTERVAL; + + rc = 
sensor_hd_temp->ops->get_value(sensor_hd_temp, &temp); + if (rc) { + printk(KERN_WARNING "windfarm: HD temp sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + DBG("wf_smu: System Fans tick ! HD temp: %d.%03d\n", + FIX32TOPRINT(temp)); + + if (temp > (st->pid.param.itarget + 0x50000)) + wf_smu_failure_state |= FAILURE_OVERTEMP; + + new_setpoint = wf_pid_run(&st->pid, temp); + + DBG("wf_smu: new_setpoint: %d RPM\n", (int)new_setpoint); + + scaled = ((((s64)new_setpoint) * (s64)st->scale0) >> 12) + st->offset0; + + DBG("wf_smu: scaled setpoint: %d RPM\n", (int)scaled); + + cputarget = wf_smu_cpu_fans ? wf_smu_cpu_fans->pid.target : 0; + cputarget = ((((s64)cputarget) * (s64)st->scale1) >> 12) + st->offset1; + scaled = max(scaled, cputarget); + scaled = max(scaled, st->pid.param.min); + scaled = min(scaled, st->pid.param.max); + + DBG("wf_smu: adjusted setpoint: %d RPM\n", (int)scaled); + + if (st->sys_setpoint == scaled && new_setpoint == st->hd_setpoint) + return; + st->sys_setpoint = scaled; + st->hd_setpoint = new_setpoint; + readjust: + if (fan_system && wf_smu_failure_state == 0) { + rc = fan_system->ops->set_value(fan_system, st->sys_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: Sys fan error %d\n", + rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } + if (fan_hd && wf_smu_failure_state == 0) { + rc = fan_hd->ops->set_value(fan_hd, st->hd_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: HD fan error %d\n", + rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } +} + +static void wf_smu_create_cpu_fans(void) +{ + struct wf_cpu_pid_param pid_param; + struct smu_sdbp_header *hdr; + struct smu_sdbp_cpupiddata *piddata; + struct smu_sdbp_fvt *fvt; + s32 tmax, tdelta, maxpow, powadj; + + /* First, locate the PID params in SMU SBD */ + hdr = smu_get_sdb_partition(SMU_SDB_CPUPIDDATA_ID, NULL); + if (hdr == 0) { + printk(KERN_WARNING "windfarm: CPU PID fan config not found " + "max fan speed\n"); + goto fail; + } + 
piddata = (struct smu_sdbp_cpupiddata *)&hdr[1]; + + /* Get the FVT params for operating point 0 (the only supported one + * for now) in order to get tmax + */ + hdr = smu_get_sdb_partition(SMU_SDB_FVT_ID, NULL); + if (hdr) { + fvt = (struct smu_sdbp_fvt *)&hdr[1]; + tmax = ((s32)fvt->maxtemp) << 16; + } else + tmax = 0x5e0000; /* 94 degree default */ + + /* Alloc & initialize state */ + wf_smu_cpu_fans = kmalloc(sizeof(struct wf_smu_cpu_fans_state), + GFP_KERNEL); + if (wf_smu_cpu_fans == NULL) + goto fail; + wf_smu_cpu_fans->ticks = 1; + + wf_smu_cpu_fans->scale = WF_SMU_CPU_FANS_SIBLING_SCALE; + wf_smu_cpu_fans->offset = WF_SMU_CPU_FANS_SIBLING_OFFSET; + + /* Fill PID params */ + pid_param.interval = WF_SMU_CPU_FANS_INTERVAL; + pid_param.history_len = piddata->history_len; + if (pid_param.history_len > WF_CPU_PID_MAX_HISTORY) { + printk(KERN_WARNING "windfarm: History size overflow on " + "CPU control loop (%d)\n", piddata->history_len); + pid_param.history_len = WF_CPU_PID_MAX_HISTORY; + } + pid_param.gd = piddata->gd; + pid_param.gp = piddata->gp; + pid_param.gr = piddata->gr / pid_param.history_len; + + tdelta = ((s32)piddata->target_temp_delta) << 16; + maxpow = ((s32)piddata->max_power) << 16; + powadj = ((s32)piddata->power_adj) << 16; + + pid_param.tmax = tmax; + pid_param.ttarget = tmax - tdelta; + pid_param.pmaxadj = maxpow - powadj; + + pid_param.min = fan_cpu_main->ops->get_min(fan_cpu_main); + pid_param.max = fan_cpu_main->ops->get_max(fan_cpu_main); + + wf_cpu_pid_init(&wf_smu_cpu_fans->pid, &pid_param); + + DBG("wf: CPU Fan control initialized.\n"); + DBG(" ttarged=%d.%03d, tmax=%d.%03d, min=%d RPM, max=%d RPM\n", + FIX32TOPRINT(pid_param.ttarget), FIX32TOPRINT(pid_param.tmax), + pid_param.min, pid_param.max); + + return; + + fail: + printk(KERN_WARNING "windfarm: CPU fan config not found\n" + "for this machine model, max fan speed\n"); + + if (cpufreq_clamp) + wf_control_set_max(cpufreq_clamp); + if (fan_cpu_main) + 
wf_control_set_max(fan_cpu_main); +} + +static void wf_smu_cpu_fans_tick(struct wf_smu_cpu_fans_state *st) +{ + s32 new_setpoint, temp, power, systarget; + int rc; + + if (--st->ticks != 0) { + if (wf_smu_readjust) + goto readjust; + return; + } + st->ticks = WF_SMU_CPU_FANS_INTERVAL; + + rc = sensor_cpu_temp->ops->get_value(sensor_cpu_temp, &temp); + if (rc) { + printk(KERN_WARNING "windfarm: CPU temp sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + rc = sensor_cpu_power->ops->get_value(sensor_cpu_power, &power); + if (rc) { + printk(KERN_WARNING "windfarm: CPU power sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + DBG("wf_smu: CPU Fans tick ! CPU temp: %d.%03d, power: %d.%03d\n", + FIX32TOPRINT(temp), FIX32TOPRINT(power)); + +#ifdef HACKED_OVERTEMP + if (temp > 0x4a0000) + wf_smu_failure_state |= FAILURE_OVERTEMP; +#else + if (temp > st->pid.param.tmax) + wf_smu_failure_state |= FAILURE_OVERTEMP; +#endif + new_setpoint = wf_cpu_pid_run(&st->pid, power, temp); + + DBG("wf_smu: new_setpoint: %d RPM\n", (int)new_setpoint); + + systarget = wf_smu_sys_fans ? 
wf_smu_sys_fans->pid.target : 0; + systarget = ((((s64)systarget) * (s64)st->scale) >> 12) + + st->offset; + new_setpoint = max(new_setpoint, systarget); + new_setpoint = max(new_setpoint, st->pid.param.min); + new_setpoint = min(new_setpoint, st->pid.param.max); + + DBG("wf_smu: adjusted setpoint: %d RPM\n", (int)new_setpoint); + + if (st->cpu_setpoint == new_setpoint) + return; + st->cpu_setpoint = new_setpoint; + readjust: + if (fan_cpu_main && wf_smu_failure_state == 0) { + rc = fan_cpu_main->ops->set_value(fan_cpu_main, + st->cpu_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: CPU main fan" + " error %d\n", rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } +} + + +/* + * ****** Attributes ****** + * + */ + +#define BUILD_SHOW_FUNC_FIX(name, data) \ +static ssize_t show_##name(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + ssize_t r; \ + s32 val = 0; \ + data->ops->get_value(data, &val); \ + r = sprintf(buf, "%d.%03d", FIX32TOPRINT(val)); \ + return r; \ +} \ +static DEVICE_ATTR(name,S_IRUGO,show_##name, NULL); + + +#define BUILD_SHOW_FUNC_INT(name, data) \ +static ssize_t show_##name(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + s32 val = 0; \ + data->ops->get_value(data, &val); \ + return sprintf(buf, "%d", val); \ +} \ +static DEVICE_ATTR(name,S_IRUGO,show_##name, NULL); + +BUILD_SHOW_FUNC_INT(cpu_fan, fan_cpu_main); +BUILD_SHOW_FUNC_INT(sys_fan, fan_system); +BUILD_SHOW_FUNC_INT(hd_fan, fan_hd); + +BUILD_SHOW_FUNC_FIX(cpu_temp, sensor_cpu_temp); +BUILD_SHOW_FUNC_FIX(cpu_power, sensor_cpu_power); +BUILD_SHOW_FUNC_FIX(hd_temp, sensor_hd_temp); + +/* + * ****** Setup / Init / Misc ... 
****** + * + */ + +static void wf_smu_tick(void) +{ + unsigned int last_failure = wf_smu_failure_state; + unsigned int new_failure; + + if (!wf_smu_started) { + DBG("wf: creating control loops !\n"); + wf_smu_create_sys_fans(); + wf_smu_create_cpu_fans(); + wf_smu_started = 1; + } + + /* Skipping ticks */ + if (wf_smu_skipping && --wf_smu_skipping) + return; + + wf_smu_failure_state = 0; + if (wf_smu_sys_fans) + wf_smu_sys_fans_tick(wf_smu_sys_fans); + if (wf_smu_cpu_fans) + wf_smu_cpu_fans_tick(wf_smu_cpu_fans); + + wf_smu_readjust = 0; + new_failure = wf_smu_failure_state & ~last_failure; + + /* If entering failure mode, clamp cpufreq and ramp all + * fans to full speed. + */ + if (wf_smu_failure_state && !last_failure) { + if (cpufreq_clamp) + wf_control_set_max(cpufreq_clamp); + if (fan_system) + wf_control_set_max(fan_system); + if (fan_cpu_main) + wf_control_set_max(fan_cpu_main); + if (fan_hd) + wf_control_set_max(fan_hd); + } + + /* If leaving failure mode, unclamp cpufreq and readjust + * all fans on next iteration + */ + if (!wf_smu_failure_state && last_failure) { + if (cpufreq_clamp) + wf_control_set_min(cpufreq_clamp); + wf_smu_readjust = 1; + } + + /* Overtemp condition detected, notify and start skipping a couple + * ticks to let the temperature go down + */ + if (new_failure & FAILURE_OVERTEMP) { + wf_set_overtemp(); + wf_smu_skipping = 2; + } + + /* We only clear the overtemp condition if overtemp is cleared + * _and_ no other failure is present. 
Since a sensor error will + * clear the overtemp condition (can't measure temperature) at + * the control loop levels, but we don't want to keep it clear + * here in this case + */ + if (new_failure == 0 && last_failure & FAILURE_OVERTEMP) + wf_clear_overtemp(); +} + +static void wf_smu_new_control(struct wf_control *ct) +{ + if (wf_smu_all_controls_ok) + return; + + if (fan_cpu_main == NULL && !strcmp(ct->name, "cpu-fan")) { + if (wf_get_control(ct) == 0) { + fan_cpu_main = ct; + device_create_file(wf_smu_dev, &dev_attr_cpu_fan); + } + } + + if (fan_system == NULL && !strcmp(ct->name, "system-fan")) { + if (wf_get_control(ct) == 0) { + fan_system = ct; + device_create_file(wf_smu_dev, &dev_attr_sys_fan); + } + } + + if (cpufreq_clamp == NULL && !strcmp(ct->name, "cpufreq-clamp")) { + if (wf_get_control(ct) == 0) + cpufreq_clamp = ct; + } + + /* Darwin property list says the HD fan is only for model ID + * 0, 1, 2 and 3 + */ + + if (wf_smu_mach_model > 3) { + if (fan_system && fan_cpu_main && cpufreq_clamp) + wf_smu_all_controls_ok = 1; + return; + } + + if (fan_hd == NULL && !strcmp(ct->name, "drive-bay-fan")) { + if (wf_get_control(ct) == 0) { + fan_hd = ct; + device_create_file(wf_smu_dev, &dev_attr_hd_fan); + } + } + + if (fan_system && fan_hd && fan_cpu_main && cpufreq_clamp) + wf_smu_all_controls_ok = 1; +} + +static void wf_smu_new_sensor(struct wf_sensor *sr) +{ + if (wf_smu_all_sensors_ok) + return; + + if (sensor_cpu_power == NULL && !strcmp(sr->name, "cpu-power")) { + if (wf_get_sensor(sr) == 0) { + sensor_cpu_power = sr; + device_create_file(wf_smu_dev, &dev_attr_cpu_power); + } + } + + if (sensor_cpu_temp == NULL && !strcmp(sr->name, "cpu-temp")) { + if (wf_get_sensor(sr) == 0) { + sensor_cpu_temp = sr; + device_create_file(wf_smu_dev, &dev_attr_cpu_temp); + } + } + + if (sensor_hd_temp == NULL && !strcmp(sr->name, "hd-temp")) { + if (wf_get_sensor(sr) == 0) { + sensor_hd_temp = sr; + device_create_file(wf_smu_dev, &dev_attr_hd_temp); + } + } + + if 
(sensor_cpu_power && sensor_cpu_temp && sensor_hd_temp) + wf_smu_all_sensors_ok = 1; +} + + +static int wf_smu_notify(struct notifier_block *self, + unsigned long event, void *data) +{ + switch(event) { + case WF_EVENT_NEW_CONTROL: + DBG("wf: new control %s detected\n", + ((struct wf_control *)data)->name); + wf_smu_new_control(data); + wf_smu_readjust = 1; + break; + case WF_EVENT_NEW_SENSOR: + DBG("wf: new sensor %s detected\n", + ((struct wf_sensor *)data)->name); + wf_smu_new_sensor(data); + break; + case WF_EVENT_TICK: + if (wf_smu_all_controls_ok && wf_smu_all_sensors_ok) + wf_smu_tick(); + } + + return 0; +} + +static struct notifier_block wf_smu_events = { + .notifier_call = wf_smu_notify, +}; + +static int wf_init_pm(void) +{ + struct smu_sdbp_header *hdr; + + hdr = smu_get_sdb_partition(SMU_SDB_SENSORTREE_ID, NULL); + if (hdr != 0) { + struct smu_sdbp_sensortree *st = + (struct smu_sdbp_sensortree *)&hdr[1]; + wf_smu_mach_model = st->model_id; + } + + printk(KERN_INFO "windfarm: Initializing for iMacG5 model ID %d\n", + wf_smu_mach_model); + + return 0; +} + +static int wf_smu_probe(struct device *ddev) +{ + wf_smu_dev = ddev; + + wf_register_client(&wf_smu_events); + + return 0; +} + +static int wf_smu_remove(struct device *ddev) +{ + wf_unregister_client(&wf_smu_events); + + /* XXX We don't yet have a guarantee that our callback isn't + * in progress when returning from wf_unregister_client, so + * we add an arbitrary delay. I'll have to fix that in the core + */ + msleep(1000); + + /* Release all sensors */ + /* One more crappy race: I don't think we have any guarantee here + * that the attribute callback won't race with the sensor being + * disposed of, and I'm not 100% certain what best way to deal + * with that except by adding locks all over... I'll do that + * eventually but heh, who ever rmmod this module anyway ?
+ */ + if (sensor_cpu_power) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_power); + wf_put_sensor(sensor_cpu_power); + } + if (sensor_cpu_temp) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_temp); + wf_put_sensor(sensor_cpu_temp); + } + if (sensor_hd_temp) { + device_remove_file(wf_smu_dev, &dev_attr_hd_temp); + wf_put_sensor(sensor_hd_temp); + } + + /* Release all controls */ + if (fan_cpu_main) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_fan); + wf_put_control(fan_cpu_main); + } + if (fan_hd) { + device_remove_file(wf_smu_dev, &dev_attr_hd_fan); + wf_put_control(fan_hd); + } + if (fan_system) { + device_remove_file(wf_smu_dev, &dev_attr_sys_fan); + wf_put_control(fan_system); + } + if (cpufreq_clamp) + wf_put_control(cpufreq_clamp); + + /* Destroy control loops state structures */ + if (wf_smu_sys_fans) + kfree(wf_smu_sys_fans); + if (wf_smu_cpu_fans) + kfree(wf_smu_cpu_fans); + + wf_smu_dev = NULL; + + return 0; +} + +static struct device_driver wf_smu_driver = { + .name = "windfarm", + .bus = &platform_bus_type, + .probe = wf_smu_probe, + .remove = wf_smu_remove, +}; + + +static int __init wf_smu_init(void) +{ + int rc = -ENODEV; + + if (machine_is_compatible("PowerMac8,1") || + machine_is_compatible("PowerMac8,2")) + rc = wf_init_pm(); + + if (rc == 0) { +#ifdef MODULE + request_module("windfarm_smu_controls"); + request_module("windfarm_smu_sensors"); + request_module("windfarm_lm75_sensor"); + +#endif /* MODULE */ + driver_register(&wf_smu_driver); + } + + return rc; +} + +static void __exit wf_smu_exit(void) +{ + + driver_unregister(&wf_smu_driver); +} + + +module_init(wf_smu_init); +module_exit(wf_smu_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("Thermal control logic for iMac G5"); +MODULE_LICENSE("GPL"); + Index: linux-work/drivers/macintosh/windfarm_pm91.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 
linux-work/drivers/macintosh/windfarm_pm91.c 2005-11-07 16:04:13.000000000 +1100 @@ -0,0 +1,814 @@ +/* + * Windfarm PowerMac thermal control. SMU based 1 CPU desktop control loops + * + * (c) Copyright 2005 Benjamin Herrenschmidt, IBM Corp. + * + * + * Released under the term of the GNU GPL v2. + * + * The algorithm used is the PID control algorithm, used the same + * way the published Darwin code does, using the same values that + * are present in the Darwin 8.2 snapshot property lists (note however + * that none of the code has been re-used, it's a complete re-implementation + * + * The various control loops found in Darwin config file are: + * + * PowerMac9,1 + * =========== + * + * Has 3 control loops: CPU fans is similar to PowerMac8,1 (though it doesn't + * try to play with other control loops fans). Drive bay is rather basic PID + * with one sensor and one fan. Slots area is a bit different as the Darwin + * driver is supposed to be capable of working in a special "AGP" mode which + * involves the presence of an AGP sensor and an AGP fan (possibly on the + * AGP card itself). I can't deal with that special mode as I don't have + * access to those additional sensor/fans for now (though ultimately, it would + * be possible to add sensor objects for them) so I'm only implementing the + * basic PCI slot control loop + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "windfarm.h" +#include "windfarm_pid.h" + +#define VERSION "0.4" + +#undef DEBUG + +#ifdef DEBUG +#define DBG(args...) printk(args) +#else +#define DBG(args...) 
do { } while(0) +#endif + +/* define this to force CPU overtemp to 74 degree, useful for testing + * the overtemp code + */ +#undef HACKED_OVERTEMP + +static struct device *wf_smu_dev; + +/* Controls & sensors */ +static struct wf_sensor *sensor_cpu_power; +static struct wf_sensor *sensor_cpu_temp; +static struct wf_sensor *sensor_hd_temp; +static struct wf_sensor *sensor_slots_power; +static struct wf_control *fan_cpu_main; +static struct wf_control *fan_cpu_second; +static struct wf_control *fan_cpu_third; +static struct wf_control *fan_hd; +static struct wf_control *fan_slots; +static struct wf_control *cpufreq_clamp; + +/* Set to kick the control loop into life */ +static int wf_smu_all_controls_ok, wf_smu_all_sensors_ok, wf_smu_started; + +/* Failure handling.. could be nicer */ +#define FAILURE_FAN 0x01 +#define FAILURE_SENSOR 0x02 +#define FAILURE_OVERTEMP 0x04 + +static unsigned int wf_smu_failure_state; +static int wf_smu_readjust, wf_smu_skipping; + +/* + * ****** CPU Fans Control Loop ****** + * + */ + + +#define WF_SMU_CPU_FANS_INTERVAL 1 +#define WF_SMU_CPU_FANS_MAX_HISTORY 16 + +/* State data used by the cpu fans control loop + */ +struct wf_smu_cpu_fans_state { + int ticks; + s32 cpu_setpoint; + struct wf_cpu_pid_state pid; +}; + +static struct wf_smu_cpu_fans_state *wf_smu_cpu_fans; + + + +/* + * ****** Drive Fan Control Loop ****** + * + */ + +struct wf_smu_drive_fans_state { + int ticks; + s32 setpoint; + struct wf_pid_state pid; +}; + +static struct wf_smu_drive_fans_state *wf_smu_drive_fans; + +/* + * ****** Slots Fan Control Loop ****** + * + */ + +struct wf_smu_slots_fans_state { + int ticks; + s32 setpoint; + struct wf_pid_state pid; +}; + +static struct wf_smu_slots_fans_state *wf_smu_slots_fans; + +/* + * ***** Implementation ***** + * + */ + + +static void wf_smu_create_cpu_fans(void) +{ + struct wf_cpu_pid_param pid_param; + struct smu_sdbp_header *hdr; + struct smu_sdbp_cpupiddata *piddata; + struct smu_sdbp_fvt *fvt; + s32 tmax, tdelta, 
maxpow, powadj; + + /* First, locate the PID params in SMU SBD */ + hdr = smu_get_sdb_partition(SMU_SDB_CPUPIDDATA_ID, NULL); + if (hdr == 0) { + printk(KERN_WARNING "windfarm: CPU PID fan config not found " + "max fan speed\n"); + goto fail; + } + piddata = (struct smu_sdbp_cpupiddata *)&hdr[1]; + + /* Get the FVT params for operating point 0 (the only supported one + * for now) in order to get tmax + */ + hdr = smu_get_sdb_partition(SMU_SDB_FVT_ID, NULL); + if (hdr) { + fvt = (struct smu_sdbp_fvt *)&hdr[1]; + tmax = ((s32)fvt->maxtemp) << 16; + } else + tmax = 0x5e0000; /* 94 degree default */ + + /* Alloc & initialize state */ + wf_smu_cpu_fans = kmalloc(sizeof(struct wf_smu_cpu_fans_state), + GFP_KERNEL); + if (wf_smu_cpu_fans == NULL) + goto fail; + wf_smu_cpu_fans->ticks = 1; + + /* Fill PID params */ + pid_param.interval = WF_SMU_CPU_FANS_INTERVAL; + pid_param.history_len = piddata->history_len; + if (pid_param.history_len > WF_CPU_PID_MAX_HISTORY) { + printk(KERN_WARNING "windfarm: History size overflow on " + "CPU control loop (%d)\n", piddata->history_len); + pid_param.history_len = WF_CPU_PID_MAX_HISTORY; + } + pid_param.gd = piddata->gd; + pid_param.gp = piddata->gp; + pid_param.gr = piddata->gr / pid_param.history_len; + + tdelta = ((s32)piddata->target_temp_delta) << 16; + maxpow = ((s32)piddata->max_power) << 16; + powadj = ((s32)piddata->power_adj) << 16; + + pid_param.tmax = tmax; + pid_param.ttarget = tmax - tdelta; + pid_param.pmaxadj = maxpow - powadj; + + pid_param.min = fan_cpu_main->ops->get_min(fan_cpu_main); + pid_param.max = fan_cpu_main->ops->get_max(fan_cpu_main); + + wf_cpu_pid_init(&wf_smu_cpu_fans->pid, &pid_param); + + DBG("wf: CPU Fan control initialized.\n"); + DBG(" ttarged=%d.%03d, tmax=%d.%03d, min=%d RPM, max=%d RPM\n", + FIX32TOPRINT(pid_param.ttarget), FIX32TOPRINT(pid_param.tmax), + pid_param.min, pid_param.max); + + return; + + fail: + printk(KERN_WARNING "windfarm: CPU fan config not found\n" + "for this machine model, max 
fan speed\n"); + + if (cpufreq_clamp) + wf_control_set_max(cpufreq_clamp); + if (fan_cpu_main) + wf_control_set_max(fan_cpu_main); +} + +static void wf_smu_cpu_fans_tick(struct wf_smu_cpu_fans_state *st) +{ + s32 new_setpoint, temp, power; + int rc; + + if (--st->ticks != 0) { + if (wf_smu_readjust) + goto readjust; + return; + } + st->ticks = WF_SMU_CPU_FANS_INTERVAL; + + rc = sensor_cpu_temp->ops->get_value(sensor_cpu_temp, &temp); + if (rc) { + printk(KERN_WARNING "windfarm: CPU temp sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + rc = sensor_cpu_power->ops->get_value(sensor_cpu_power, &power); + if (rc) { + printk(KERN_WARNING "windfarm: CPU power sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + DBG("wf_smu: CPU Fans tick ! CPU temp: %d.%03d, power: %d.%03d\n", + FIX32TOPRINT(temp), FIX32TOPRINT(power)); + +#ifdef HACKED_OVERTEMP + if (temp > 0x4a0000) + wf_smu_failure_state |= FAILURE_OVERTEMP; +#else + if (temp > st->pid.param.tmax) + wf_smu_failure_state |= FAILURE_OVERTEMP; +#endif + new_setpoint = wf_cpu_pid_run(&st->pid, power, temp); + + DBG("wf_smu: new_setpoint: %d RPM\n", (int)new_setpoint); + + if (st->cpu_setpoint == new_setpoint) + return; + st->cpu_setpoint = new_setpoint; + readjust: + if (fan_cpu_main && wf_smu_failure_state == 0) { + rc = fan_cpu_main->ops->set_value(fan_cpu_main, + st->cpu_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: CPU main fan" + " error %d\n", rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } + if (fan_cpu_second && wf_smu_failure_state == 0) { + rc = fan_cpu_second->ops->set_value(fan_cpu_second, + st->cpu_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: CPU second fan" + " error %d\n", rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } + if (fan_cpu_third && wf_smu_failure_state == 0) { + rc = fan_cpu_third->ops->set_value(fan_cpu_third, + st->cpu_setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: CPU third fan" + "
error %d\n", rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } +} + +static void wf_smu_create_drive_fans(void) +{ + struct wf_pid_param param = { + .interval = 5, + .history_len = 2, + .gd = 0x01e00000, + .gp = 0x00500000, + .gr = 0x00000000, + .itarget = 0x00200000, + }; + + /* Alloc & initialize state */ + wf_smu_drive_fans = kmalloc(sizeof(struct wf_smu_drive_fans_state), + GFP_KERNEL); + if (wf_smu_drive_fans == NULL) { + printk(KERN_WARNING "windfarm: Memory allocation error" + " max fan speed\n"); + goto fail; + } + wf_smu_drive_fans->ticks = 1; + + /* Fill PID params */ + param.additive = (fan_hd->type == WF_CONTROL_RPM_FAN); + param.min = fan_hd->ops->get_min(fan_hd); + param.max = fan_hd->ops->get_max(fan_hd); + wf_pid_init(&wf_smu_drive_fans->pid, &param); + + DBG("wf: Drive Fan control initialized.\n"); + DBG(" itarged=%d.%03d, min=%d RPM, max=%d RPM\n", + FIX32TOPRINT(param.itarget), param.min, param.max); + return; + + fail: + if (fan_hd) + wf_control_set_max(fan_hd); +} + +static void wf_smu_drive_fans_tick(struct wf_smu_drive_fans_state *st) +{ + s32 new_setpoint, temp; + int rc; + + if (--st->ticks != 0) { + if (wf_smu_readjust) + goto readjust; + return; + } + st->ticks = st->pid.param.interval; + + rc = sensor_hd_temp->ops->get_value(sensor_hd_temp, &temp); + if (rc) { + printk(KERN_WARNING "windfarm: HD temp sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + DBG("wf_smu: Drive Fans tick ! 
HD temp: %d.%03d\n", + FIX32TOPRINT(temp)); + + if (temp > (st->pid.param.itarget + 0x50000)) + wf_smu_failure_state |= FAILURE_OVERTEMP; + + new_setpoint = wf_pid_run(&st->pid, temp); + + DBG("wf_smu: new_setpoint: %d\n", (int)new_setpoint); + + if (st->setpoint == new_setpoint) + return; + st->setpoint = new_setpoint; + readjust: + if (fan_hd && wf_smu_failure_state == 0) { + rc = fan_hd->ops->set_value(fan_hd, st->setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: HD fan error %d\n", + rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } +} + +static void wf_smu_create_slots_fans(void) +{ + struct wf_pid_param param = { + .interval = 1, + .history_len = 8, + .gd = 0x00000000, + .gp = 0x00000000, + .gr = 0x00020000, + .itarget = 0x00000000 + }; + + /* Alloc & initialize state */ + wf_smu_slots_fans = kmalloc(sizeof(struct wf_smu_slots_fans_state), + GFP_KERNEL); + if (wf_smu_slots_fans == NULL) { + printk(KERN_WARNING "windfarm: Memory allocation error" + " max fan speed\n"); + goto fail; + } + wf_smu_slots_fans->ticks = 1; + + /* Fill PID params */ + param.additive = (fan_slots->type == WF_CONTROL_RPM_FAN); + param.min = fan_slots->ops->get_min(fan_slots); + param.max = fan_slots->ops->get_max(fan_slots); + wf_pid_init(&wf_smu_slots_fans->pid, &param); + + DBG("wf: Slots Fan control initialized.\n"); + DBG(" itarged=%d.%03d, min=%d RPM, max=%d RPM\n", + FIX32TOPRINT(param.itarget), param.min, param.max); + return; + + fail: + if (fan_slots) + wf_control_set_max(fan_slots); +} + +static void wf_smu_slots_fans_tick(struct wf_smu_slots_fans_state *st) +{ + s32 new_setpoint, power; + int rc; + + if (--st->ticks != 0) { + if (wf_smu_readjust) + goto readjust; + return; + } + st->ticks = st->pid.param.interval; + + rc = sensor_slots_power->ops->get_value(sensor_slots_power, &power); + if (rc) { + printk(KERN_WARNING "windfarm: Slots power sensor error %d\n", + rc); + wf_smu_failure_state |= FAILURE_SENSOR; + return; + } + + DBG("wf_smu: Slots Fans tick ! 
Slots power: %d.%03d\n", + FIX32TOPRINT(power)); + +#if 0 /* Check what makes a good overtemp condition */ + if (power > (st->pid.param.itarget + 0x50000)) + wf_smu_failure_state |= FAILURE_OVERTEMP; +#endif + + new_setpoint = wf_pid_run(&st->pid, power); + + DBG("wf_smu: new_setpoint: %d\n", (int)new_setpoint); + + if (st->setpoint == new_setpoint) + return; + st->setpoint = new_setpoint; + readjust: + if (fan_slots && wf_smu_failure_state == 0) { + rc = fan_slots->ops->set_value(fan_slots, st->setpoint); + if (rc) { + printk(KERN_WARNING "windfarm: Slots fan error %d\n", + rc); + wf_smu_failure_state |= FAILURE_FAN; + } + } +} + + +/* + * ****** Attributes ****** + * + */ + +#define BUILD_SHOW_FUNC_FIX(name, data) \ +static ssize_t show_##name(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + ssize_t r; \ + s32 val = 0; \ + data->ops->get_value(data, &val); \ + r = sprintf(buf, "%d.%03d", FIX32TOPRINT(val)); \ + return r; \ +} \ +static DEVICE_ATTR(name,S_IRUGO,show_##name, NULL); + + +#define BUILD_SHOW_FUNC_INT(name, data) \ +static ssize_t show_##name(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + s32 val = 0; \ + data->ops->get_value(data, &val); \ + return sprintf(buf, "%d", val); \ +} \ +static DEVICE_ATTR(name,S_IRUGO,show_##name, NULL); + +BUILD_SHOW_FUNC_INT(cpu_fan, fan_cpu_main); +BUILD_SHOW_FUNC_INT(hd_fan, fan_hd); +BUILD_SHOW_FUNC_INT(slots_fan, fan_slots); + +BUILD_SHOW_FUNC_FIX(cpu_temp, sensor_cpu_temp); +BUILD_SHOW_FUNC_FIX(cpu_power, sensor_cpu_power); +BUILD_SHOW_FUNC_FIX(hd_temp, sensor_hd_temp); +BUILD_SHOW_FUNC_FIX(slots_power, sensor_slots_power); + +/* + * ****** Setup / Init / Misc ... 
****** + * + */ + +static void wf_smu_tick(void) +{ + unsigned int last_failure = wf_smu_failure_state; + unsigned int new_failure; + + if (!wf_smu_started) { + DBG("wf: creating control loops !\n"); + wf_smu_create_drive_fans(); + wf_smu_create_slots_fans(); + wf_smu_create_cpu_fans(); + wf_smu_started = 1; + } + + /* Skipping ticks */ + if (wf_smu_skipping && --wf_smu_skipping) + return; + + wf_smu_failure_state = 0; + if (wf_smu_drive_fans) + wf_smu_drive_fans_tick(wf_smu_drive_fans); + if (wf_smu_slots_fans) + wf_smu_slots_fans_tick(wf_smu_slots_fans); + if (wf_smu_cpu_fans) + wf_smu_cpu_fans_tick(wf_smu_cpu_fans); + + wf_smu_readjust = 0; + new_failure = wf_smu_failure_state & ~last_failure; + + /* If entering failure mode, clamp cpufreq and ramp all + * fans to full speed. + */ + if (wf_smu_failure_state && !last_failure) { + if (cpufreq_clamp) + wf_control_set_max(cpufreq_clamp); + if (fan_cpu_main) + wf_control_set_max(fan_cpu_main); + if (fan_cpu_second) + wf_control_set_max(fan_cpu_second); + if (fan_cpu_third) + wf_control_set_max(fan_cpu_third); + if (fan_hd) + wf_control_set_max(fan_hd); + if (fan_slots) + wf_control_set_max(fan_slots); + } + + /* If leaving failure mode, unclamp cpufreq and readjust + * all fans on next iteration + */ + if (!wf_smu_failure_state && last_failure) { + if (cpufreq_clamp) + wf_control_set_min(cpufreq_clamp); + wf_smu_readjust = 1; + } + + /* Overtemp condition detected, notify and start skipping a couple + * ticks to let the temperature go down + */ + if (new_failure & FAILURE_OVERTEMP) { + wf_set_overtemp(); + wf_smu_skipping = 2; + } + + /* We only clear the overtemp condition if overtemp is cleared + * _and_ no other failure is present. 
Since a sensor error will + * clear the overtemp condition (can't measure temperature) at + * the control loop levels, but we don't want to keep it clear + * here in this case + */ + if (new_failure == 0 && last_failure & FAILURE_OVERTEMP) + wf_clear_overtemp(); +} + + +static void wf_smu_new_control(struct wf_control *ct) +{ + if (wf_smu_all_controls_ok) + return; + + if (fan_cpu_main == NULL && !strcmp(ct->name, "cpu-rear-fan-0")) { + if (wf_get_control(ct) == 0) { + fan_cpu_main = ct; + device_create_file(wf_smu_dev, &dev_attr_cpu_fan); + } + } + + if (fan_cpu_second == NULL && !strcmp(ct->name, "cpu-rear-fan-1")) { + if (wf_get_control(ct) == 0) + fan_cpu_second = ct; + } + + if (fan_cpu_third == NULL && !strcmp(ct->name, "cpu-front-fan-0")) { + if (wf_get_control(ct) == 0) + fan_cpu_third = ct; + } + + if (cpufreq_clamp == NULL && !strcmp(ct->name, "cpufreq-clamp")) { + if (wf_get_control(ct) == 0) + cpufreq_clamp = ct; + } + + if (fan_hd == NULL && !strcmp(ct->name, "drive-bay-fan")) { + if (wf_get_control(ct) == 0) { + fan_hd = ct; + device_create_file(wf_smu_dev, &dev_attr_hd_fan); + } + } + + if (fan_slots == NULL && !strcmp(ct->name, "slots-fan")) { + if (wf_get_control(ct) == 0) { + fan_slots = ct; + device_create_file(wf_smu_dev, &dev_attr_slots_fan); + } + } + + if (fan_cpu_main && (fan_cpu_second || fan_cpu_third) && fan_hd && + fan_slots && cpufreq_clamp) + wf_smu_all_controls_ok = 1; +} + +static void wf_smu_new_sensor(struct wf_sensor *sr) +{ + if (wf_smu_all_sensors_ok) + return; + + if (sensor_cpu_power == NULL && !strcmp(sr->name, "cpu-power")) { + if (wf_get_sensor(sr) == 0) { + sensor_cpu_power = sr; + device_create_file(wf_smu_dev, &dev_attr_cpu_power); + } + } + + if (sensor_cpu_temp == NULL && !strcmp(sr->name, "cpu-temp")) { + if (wf_get_sensor(sr) == 0) { + sensor_cpu_temp = sr; + device_create_file(wf_smu_dev, &dev_attr_cpu_temp); + } + } + + if (sensor_hd_temp == NULL && !strcmp(sr->name, "hd-temp")) { + if (wf_get_sensor(sr) == 0) { + 
sensor_hd_temp = sr; + device_create_file(wf_smu_dev, &dev_attr_hd_temp); + } + } + + if (sensor_slots_power == NULL && !strcmp(sr->name, "slots-power")) { + if (wf_get_sensor(sr) == 0) { + sensor_slots_power = sr; + device_create_file(wf_smu_dev, &dev_attr_slots_power); + } + } + + if (sensor_cpu_power && sensor_cpu_temp && + sensor_hd_temp && sensor_slots_power) + wf_smu_all_sensors_ok = 1; +} + + +static int wf_smu_notify(struct notifier_block *self, + unsigned long event, void *data) +{ + switch(event) { + case WF_EVENT_NEW_CONTROL: + DBG("wf: new control %s detected\n", + ((struct wf_control *)data)->name); + wf_smu_new_control(data); + wf_smu_readjust = 1; + break; + case WF_EVENT_NEW_SENSOR: + DBG("wf: new sensor %s detected\n", + ((struct wf_sensor *)data)->name); + wf_smu_new_sensor(data); + break; + case WF_EVENT_TICK: + if (wf_smu_all_controls_ok && wf_smu_all_sensors_ok) + wf_smu_tick(); + } + + return 0; +} + +static struct notifier_block wf_smu_events = { + .notifier_call = wf_smu_notify, +}; + +static int wf_init_pm(void) +{ + printk(KERN_INFO "windfarm: Initializing for Desktop G5 model\n"); + + return 0; +} + +static int wf_smu_probe(struct device *ddev) +{ + wf_smu_dev = ddev; + + wf_register_client(&wf_smu_events); + + return 0; +} + +static int wf_smu_remove(struct device *ddev) +{ + wf_unregister_client(&wf_smu_events); + + /* XXX We don't have yet a guarantee that our callback isn't + * in progress when returning from wf_unregister_client, so + * we add an arbitrary delay. I'll have to fix that in the core + */ + msleep(1000); + + /* Release all sensors */ + /* One more crappy race: I don't think we have any guarantee here + * that the attribute callback won't race with the sensor beeing + * disposed of, and I'm not 100% certain what best way to deal + * with that except by adding locks all over... I'll do that + * eventually but heh, who ever rmmod this module anyway ? 
+ */ + if (sensor_cpu_power) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_power); + wf_put_sensor(sensor_cpu_power); + } + if (sensor_cpu_temp) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_temp); + wf_put_sensor(sensor_cpu_temp); + } + if (sensor_hd_temp) { + device_remove_file(wf_smu_dev, &dev_attr_hd_temp); + wf_put_sensor(sensor_hd_temp); + } + if (sensor_slots_power) { + device_remove_file(wf_smu_dev, &dev_attr_slots_power); + wf_put_sensor(sensor_slots_power); + } + + /* Release all controls */ + if (fan_cpu_main) { + device_remove_file(wf_smu_dev, &dev_attr_cpu_fan); + wf_put_control(fan_cpu_main); + } + if (fan_cpu_second) + wf_put_control(fan_cpu_second); + if (fan_cpu_third) + wf_put_control(fan_cpu_third); + if (fan_hd) { + device_remove_file(wf_smu_dev, &dev_attr_hd_fan); + wf_put_control(fan_hd); + } + if (fan_slots) { + device_remove_file(wf_smu_dev, &dev_attr_slots_fan); + wf_put_control(fan_slots); + } + if (cpufreq_clamp) + wf_put_control(cpufreq_clamp); + + /* Destroy control loops state structures */ + if (wf_smu_slots_fans) + kfree(wf_smu_slots_fans); + if (wf_smu_drive_fans) + kfree(wf_smu_drive_fans); + if (wf_smu_cpu_fans) + kfree(wf_smu_cpu_fans); + + wf_smu_dev = NULL; + + return 0; +} + +static struct device_driver wf_smu_driver = { + .name = "windfarm", + .bus = &platform_bus_type, + .probe = wf_smu_probe, + .remove = wf_smu_remove, +}; + + +static int __init wf_smu_init(void) +{ + int rc = -ENODEV; + + if (machine_is_compatible("PowerMac9,1")) + rc = wf_init_pm(); + + if (rc == 0) { +#ifdef MODULE + request_module("windfarm_smu_controls"); + request_module("windfarm_smu_sensors"); + request_module("windfarm_lm75_sensor"); + +#endif /* MODULE */ + driver_register(&wf_smu_driver); + } + + return rc; +} + +static void __exit wf_smu_exit(void) +{ + + driver_unregister(&wf_smu_driver); +} + + +module_init(wf_smu_init); +module_exit(wf_smu_exit); + +MODULE_AUTHOR("Benjamin Herrenschmidt "); +MODULE_DESCRIPTION("Thermal control logic for 
PowerMac9,1"); +MODULE_LICENSE("GPL"); + From anton at samba.org Mon Nov 7 17:43:07 2005 From: anton at samba.org (Anton Blanchard) Date: Mon, 7 Nov 2005 17:43:07 +1100 Subject: [PATCH] ppc64: fix Memory: summary line Message-ID: <20051107064306.GK12353@krispykreme> On ppc64 we end up with a negative value for the data size in the memory boot message: Memory: 2035560k/2097152k available (5792k kernel code, 89564k reserved, 18014398509481632k data, 870k bss, 352k init) It turns out the section ordering of the linker script is different on ppc32 and ppc64, so just count data as _edata - _sdata which should work on both. Signed-off-by: Anton Blanchard --- Index: build/arch/powerpc/mm/mem.c =================================================================== --- build.orig/arch/powerpc/mm/mem.c 2005-11-07 16:56:10.000000000 +1100 +++ build/arch/powerpc/mm/mem.c 2005-11-07 17:24:41.000000000 +1100 @@ -358,7 +358,7 @@ } codesize = (unsigned long)&_sdata - (unsigned long)&_stext; - datasize = (unsigned long)&__init_begin - (unsigned long)&_sdata; + datasize = (unsigned long)&_edata - (unsigned long)&_sdata; initsize = (unsigned long)&__init_end - (unsigned long)&__init_begin; bsssize = (unsigned long)&__bss_stop - (unsigned long)&__bss_start; From anton at samba.org Mon Nov 7 18:43:56 2005 From: anton at samba.org (Anton Blanchard) Date: Mon, 7 Nov 2005 18:43:56 +1100 Subject: [PATCH] ppc64: fix oprofile sample bit handling Message-ID: <20051107074356.GL12353@krispykreme> Oprofile was hardwiring the MMCRA sample bit to 1 but on newer cpus (eg POWER5) we want to vary it based on the group being sampled. Add a temporary workaround until people update their oprofile userspace. 
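A minimal user-space sketch of the decision described above, for readers who want to see it outside the kernel tree. The PVR version numbers and the value of MMCRA_SAMPLE_ENABLE are assumptions made for illustration, and mmcra_for_cpu() is a hypothetical helper, not part of the patch:

```c
#include <stdint.h>

/* Assumed PVR version fields, for illustration only. */
enum {
	PV_POWER4  = 0x0035,
	PV_POWER4p = 0x0038,
	PV_970     = 0x0039,
	PV_POWER5  = 0x003A,
	PV_970FX   = 0x003C,
	PV_970MP   = 0x0044
};

/* Assumed bit value for the MMCRA sample enable bit. */
#define MMCRA_SAMPLE_ENABLE 0x1UL

/* Older CPUs need the sample bit forced; newer ones leave it to userspace. */
int mmcra_must_set_sample(uint32_t pvr_version)
{
	switch (pvr_version) {
	case PV_POWER4:
	case PV_POWER4p:
	case PV_970:
	case PV_970FX:
	case PV_970MP:
		return 1;
	default:
		return 0;
	}
}

/*
 * Hypothetical helper: compute the MMCRA value that would be written
 * for a given CPU, starting from the value userspace asked for.
 */
unsigned long mmcra_for_cpu(uint32_t pvr_version, unsigned long user_mmcra)
{
	if (mmcra_must_set_sample(pvr_version))
		user_mmcra |= MMCRA_SAMPLE_ENABLE;
	return user_mmcra;
}
```

With these assumptions, a POWER4 always ends up with the sample bit set, while a POWER5 keeps whatever value oprofile userspace supplied.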
Signed-off-by: Anton Blanchard --- Index: build/arch/powerpc/oprofile/op_model_power4.c =================================================================== --- build.orig/arch/powerpc/oprofile/op_model_power4.c 2005-11-05 20:51:08.000000000 +1100 +++ build/arch/powerpc/oprofile/op_model_power4.c 2005-11-07 18:17:29.000000000 +1100 @@ -17,6 +17,7 @@ #include #include #include +#include #define dbg(args...) @@ -81,6 +82,26 @@ extern void ppc64_enable_pmcs(void); +/* + * Older CPUs require the MMCRA sample bit to be always set, but newer + * CPUs only want it set for some groups. Eventually we will remove all + * knowledge of this bit in the kernel, oprofile userspace should be + * setting it when required. + * + * In order to keep current installations working we force the bit for + * those older CPUs. Once everyone has updated their oprofile userspace we + * can remove this hack. + */ +static inline int mmcra_must_set_sample(void) +{ + if (__is_processor(PV_POWER4) || __is_processor(PV_POWER4p) || + __is_processor(PV_970) || __is_processor(PV_970FX) || + __is_processor(PV_970MP)) + return 1; + + return 0; +} + static void power4_cpu_setup(void *unused) { unsigned int mmcr0 = mmcr0_val; @@ -98,7 +119,8 @@ mtspr(SPRN_MMCR1, mmcr1_val); - mmcra |= MMCRA_SAMPLE_ENABLE; + if (mmcra_must_set_sample()) + mmcra |= MMCRA_SAMPLE_ENABLE; mtspr(SPRN_MMCRA, mmcra); dbg("setup on cpu %d, mmcr0 %lx\n", smp_processor_id(), From anton at samba.org Mon Nov 7 19:05:31 2005 From: anton at samba.org (Anton Blanchard) Date: Mon, 7 Nov 2005 19:05:31 +1100 Subject: [PATCH] ppc64: remove some direct xmon calls Message-ID: <20051107080531.GN12353@krispykreme> Even though we can enable and disable xmon at runtime now, there are a few places in the merge tree that call xmon and xmon_printf directly. In the case below we call die() which will call xmon if it is enabled. Also remove an unnecessary include of xmon.h in smp.c. 
Signed-off-by: Anton Blanchard --- Index: linux-2.6/arch/powerpc/kernel/smp.c =================================================================== --- linux-2.6.orig/arch/powerpc/kernel/smp.c 2005-11-05 17:44:42.000000000 +1100 +++ linux-2.6/arch/powerpc/kernel/smp.c 2005-11-06 14:40:07.000000000 +1100 @@ -40,7 +40,6 @@ #include #include #include -#include #include #include #include Index: linux-2.6/arch/powerpc/kernel/traps.c =================================================================== --- linux-2.6.orig/arch/powerpc/kernel/traps.c 2005-11-05 17:44:42.000000000 +1100 +++ linux-2.6/arch/powerpc/kernel/traps.c 2005-11-06 14:41:15.000000000 +1100 @@ -39,7 +39,6 @@ #include #include #include -#include #include #ifdef CONFIG_PPC32 #include @@ -748,22 +747,12 @@ return 0; if (bug->line & BUG_WARNING_TRAP) { /* this is a WARN_ON rather than BUG/BUG_ON */ -#ifdef CONFIG_XMON - xmon_printf(KERN_ERR "Badness in %s at %s:%ld\n", - bug->function, bug->file, - bug->line & ~BUG_WARNING_TRAP); -#endif /* CONFIG_XMON */ printk(KERN_ERR "Badness in %s at %s:%ld\n", bug->function, bug->file, bug->line & ~BUG_WARNING_TRAP); dump_stack(); return 1; } -#ifdef CONFIG_XMON - xmon_printf(KERN_CRIT "kernel BUG in %s at %s:%ld!\n", - bug->function, bug->file, bug->line); - xmon(regs); -#endif /* CONFIG_XMON */ printk(KERN_CRIT "kernel BUG in %s at %s:%ld!\n", bug->function, bug->file, bug->line); From michael at ellerman.id.au Tue Nov 8 00:06:45 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:06:45 +1100 (EST) Subject: [PATCH 0/12] powerpc: PPC64 Kdump Support Message-ID: <1131368803.622486.677825384736.qpush@concordia> Hi y'all, This is the current stack of patches we have for implementing Kdump on PPC64. These aren't ready for merging yet, but they're close. I wanted to get them out to the list sooner rather than later, so you can all see them. A lot of these have already been via the list, but some haven't. Comments requested! 
These sit on top of a merged version of page.h which no longer works since 64k pages went in, so don't try to run them. I'm hoping to get page.h merged soon, as well as machine_kexec.c and a few other files we touch. cheers From michael at ellerman.id.au Tue Nov 8 00:06:48 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:06:48 +1100 (EST) Subject: [PATCH 1/12] powerpc: Seperate usage of KERNELBASE and PAGE_OFFSET In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130648.430786869D@ozlabs.org> This patch tries to separate usage of KERNELBASE and PAGE_OFFSET. PAGE_OFFSET == 0xC00..00 and always will be. It's the quantity you subtract from a virtual kernel address to get a physical one. KERNELBASE == 0xC00..00 + SOMETHING, where SOMETHING tends to be 0, but might not be. It points to the start of the kernel text + data in virtual memory. arch/powerpc/kernel/entry_64.S | 4 ++-- arch/powerpc/kernel/lparmap.c | 6 +++--- arch/powerpc/mm/hash_utils_64.c | 6 +++--- arch/powerpc/mm/slb.c | 4 ++-- arch/powerpc/mm/slb_low.S | 6 +++--- arch/powerpc/mm/stab.c | 10 +++++----- arch/powerpc/mm/tlb_64.c | 2 +- arch/ppc64/kernel/machine_kexec.c | 5 ++--- 8 files changed, 21 insertions(+), 22 deletions(-) Index: kexec/arch/powerpc/mm/stab.c =================================================================== --- kexec.orig/arch/powerpc/mm/stab.c +++ kexec/arch/powerpc/mm/stab.c @@ -40,7 +40,7 @@ static int make_ste(unsigned long stab, unsigned long entry, group, old_esid, castout_entry, i; unsigned int global_entry; struct stab_entry *ste, *castout_ste; - unsigned long kernel_segment = (esid << SID_SHIFT) >= KERNELBASE; + unsigned long kernel_segment = (esid << SID_SHIFT) >= PAGE_OFFSET; vsid_data = vsid << STE_VSID_SHIFT; esid_data = esid << SID_SHIFT | STE_ESID_KP | STE_ESID_V; @@ -83,7 +83,7 @@ static int make_ste(unsigned long stab, } /* Dont cast out the first kernel segment */ - if ((castout_ste->esid_data & ESID_MASK) 
!= KERNELBASE) + if ((castout_ste->esid_data & ESID_MASK) != PAGE_OFFSET) break; castout_entry = (castout_entry + 1) & 0xf; @@ -248,7 +248,7 @@ void stabs_alloc(void) panic("Unable to allocate segment table for CPU %d.\n", cpu); - newstab += KERNELBASE; + newstab = (unsigned long)__va(newstab); memset((void *)newstab, 0, PAGE_SIZE); @@ -265,13 +265,13 @@ void stabs_alloc(void) */ void stab_initialize(unsigned long stab) { - unsigned long vsid = get_kernel_vsid(KERNELBASE); + unsigned long vsid = get_kernel_vsid(PAGE_OFFSET); if (cpu_has_feature(CPU_FTR_SLB)) { slb_initialize(); } else { asm volatile("isync; slbia; isync":::"memory"); - make_ste(stab, GET_ESID(KERNELBASE), vsid); + make_ste(stab, GET_ESID(PAGE_OFFSET), vsid); /* Order update */ asm volatile("sync":::"memory"); Index: kexec/arch/ppc64/kernel/machine_kexec.c =================================================================== --- kexec.orig/arch/ppc64/kernel/machine_kexec.c +++ kexec/arch/ppc64/kernel/machine_kexec.c @@ -171,9 +171,8 @@ void kexec_copy_flush(struct kimage *ima * including ones that were in place on the original copy */ for (i = 0; i < nr_segments; i++) - flush_icache_range(ranges[i].mem + KERNELBASE, - ranges[i].mem + KERNELBASE + - ranges[i].memsz); + flush_icache_range((unsigned long)__va(ranges[i].mem), + (unsigned long)__va(ranges[i].mem + ranges[i].memsz)); } #ifdef CONFIG_SMP Index: kexec/arch/powerpc/mm/hash_utils_64.c =================================================================== --- kexec.orig/arch/powerpc/mm/hash_utils_64.c +++ kexec/arch/powerpc/mm/hash_utils_64.c @@ -239,7 +239,7 @@ void __init htab_initialize(void) /* create bolted the linear mapping in the hash table */ for (i=0; i < lmb.memory.cnt; i++) { - base = lmb.memory.region[i].base + KERNELBASE; + base = (unsigned long)__va(lmb.memory.region[i].base); size = lmb.memory.region[i].size; DBG("creating mapping for region: %lx : %lx\n", base, size); @@ -276,8 +276,8 @@ void __init htab_initialize(void) * for 
either 4K or 16MB pages. */ if (tce_alloc_start) { - tce_alloc_start += KERNELBASE; - tce_alloc_end += KERNELBASE; + tce_alloc_start = (unsigned long)__va(tce_alloc_start); + tce_alloc_end = (unsigned long)__va(tce_alloc_end); if (base + size >= tce_alloc_start) tce_alloc_start = base + size + 1; Index: kexec/arch/powerpc/mm/slb.c =================================================================== --- kexec.orig/arch/powerpc/mm/slb.c +++ kexec/arch/powerpc/mm/slb.c @@ -55,7 +55,7 @@ static void slb_flush_and_rebolt(void) ksp_flags |= SLB_VSID_L; ksp_esid_data = mk_esid_data(get_paca()->kstack, 2); - if ((ksp_esid_data & ESID_MASK) == KERNELBASE) + if ((ksp_esid_data & ESID_MASK) == PAGE_OFFSET) ksp_esid_data &= ~SLB_ESID_V; /* We need to do this all in asm, so we're sure we don't touch @@ -145,7 +145,7 @@ void slb_initialize(void) asm volatile("isync":::"memory"); asm volatile("slbmte %0,%0"::"r" (0) : "memory"); asm volatile("isync; slbia; isync":::"memory"); - create_slbe(KERNELBASE, flags, 0); + create_slbe(PAGE_OFFSET, flags, 0); create_slbe(VMALLOCBASE, SLB_VSID_KERNEL, 1); /* We don't bolt the stack for the time being - we're in boot, * so the stack is in the bolted segment. 
By the time it goes Index: kexec/arch/powerpc/kernel/entry_64.S =================================================================== --- kexec.orig/arch/powerpc/kernel/entry_64.S +++ kexec/arch/powerpc/kernel/entry_64.S @@ -674,7 +674,7 @@ _GLOBAL(enter_rtas) /* Setup our real return addr */ SET_REG_TO_LABEL(r4,.rtas_return_loc) - SET_REG_TO_CONST(r9,KERNELBASE) + SET_REG_TO_CONST(r9,PAGE_OFFSET) sub r4,r4,r9 mtlr r4 @@ -702,7 +702,7 @@ _GLOBAL(enter_rtas) _STATIC(rtas_return_loc) /* relocation is off at this point */ mfspr r4,SPRN_SPRG3 /* Get PACA */ - SET_REG_TO_CONST(r5, KERNELBASE) + SET_REG_TO_CONST(r5, PAGE_OFFSET) sub r4,r4,r5 /* RELOC the PACA base pointer */ mfmsr r6 Index: kexec/arch/powerpc/mm/slb_low.S =================================================================== --- kexec.orig/arch/powerpc/mm/slb_low.S +++ kexec/arch/powerpc/mm/slb_low.S @@ -66,12 +66,12 @@ _GLOBAL(slb_allocate) srdi r9,r3,60 /* get region */ srdi r3,r3,28 /* get esid */ - cmpldi cr7,r9,0xc /* cmp KERNELBASE for later use */ + cmpldi cr7,r9,0xc /* cmp PAGE_OFFSET for later use */ rldimi r10,r3,28,0 /* r10= ESID<<28 | entry */ oris r10,r10,SLB_ESID_V@h /* r10 |= SLB_ESID_V */ - /* r3 = esid, r10 = esid_data, cr7 = <>KERNELBASE */ + /* r3 = esid, r10 = esid_data, cr7 = <> PAGE_OFFSET */ blt cr7,0f /* user or kernel? 
*/ @@ -114,7 +114,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_16M_PAGE) ld r9,PACACONTEXTID(r13) rldimi r3,r9,USER_ESID_BITS,0 -9: /* r3 = protovsid, r11 = flags, r10 = esid_data, cr7 = <>KERNELBASE */ +9: /* r3 = protovsid, r11 = flags, r10 = esid_data, cr7 = <> PAGE_OFFSET */ ASM_VSID_SCRAMBLE(r3,r9) rldimi r11,r3,SLB_VSID_SHIFT,16 /* combine VSID and flags */ Index: kexec/arch/powerpc/kernel/lparmap.c =================================================================== --- kexec.orig/arch/powerpc/kernel/lparmap.c +++ kexec/arch/powerpc/kernel/lparmap.c @@ -16,8 +16,8 @@ const struct LparMap __attribute__((__se .xSegmentTableOffs = STAB0_PAGE, .xEsids = { - { .xKernelEsid = GET_ESID(KERNELBASE), - .xKernelVsid = KERNEL_VSID(KERNELBASE), }, + { .xKernelEsid = GET_ESID(PAGE_OFFSET), + .xKernelVsid = KERNEL_VSID(PAGE_OFFSET), }, { .xKernelEsid = GET_ESID(VMALLOCBASE), .xKernelVsid = KERNEL_VSID(VMALLOCBASE), }, }, @@ -25,7 +25,7 @@ const struct LparMap __attribute__((__se .xRanges = { { .xPages = HvPagesToMap, .xOffset = 0, - .xVPN = KERNEL_VSID(KERNELBASE) << (SID_SHIFT - PAGE_SHIFT), + .xVPN = KERNEL_VSID(PAGE_OFFSET) << (SID_SHIFT - PAGE_SHIFT), }, }, }; Index: kexec/arch/powerpc/mm/tlb_64.c =================================================================== --- kexec.orig/arch/powerpc/mm/tlb_64.c +++ kexec/arch/powerpc/mm/tlb_64.c @@ -149,7 +149,7 @@ void hpte_update(struct mm_struct *mm, u batch->mm = mm; batch->large = pte_huge(pte); } - if (addr < KERNELBASE) { + if (!is_kernel_addr(addr)) { vsid = get_vsid(mm->context.id, addr); WARN_ON(vsid == 0); } else From michael at ellerman.id.au Tue Nov 8 00:06:50 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:06:50 +1100 (EST) Subject: [PATCH 2/12] powerpc: Add a is_kernel_addr() macro In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130650.D8FF6686BA@ozlabs.org> There's a bunch of code that compares an address with KERNELBASE to see if it's a "kernel 
address", ie. >= KERNELBASE. Replace all of them with an is_kernel_addr() macro that does the same thing. This will save us some pain when we change KERNELBASE, and also makes the code more readable IMHO. arch/powerpc/kernel/prom_init.c | 2 +- arch/powerpc/kernel/setup_64.c | 2 +- arch/powerpc/mm/slb.c | 6 +++--- arch/powerpc/mm/stab.c | 6 +++--- arch/powerpc/oprofile/op_model_power4.c | 4 ++-- arch/powerpc/oprofile/op_model_rs64.c | 3 +-- arch/powerpc/xmon/xmon.c | 4 ++-- include/asm-powerpc/page.h | 6 ++++++ include/asm-ppc64/pgtable.h | 2 +- 9 files changed, 20 insertions(+), 15 deletions(-) Index: kexec/arch/powerpc/mm/stab.c =================================================================== --- kexec.orig/arch/powerpc/mm/stab.c +++ kexec/arch/powerpc/mm/stab.c @@ -122,7 +122,7 @@ static int __ste_allocate(unsigned long unsigned long offset; /* Kernel or user address? */ - if (ea >= KERNELBASE) { + if (is_kernel_addr(ea)) { vsid = get_kernel_vsid(ea); } else { if ((ea >= TASK_SIZE_USER64) || (! mm)) @@ -133,7 +133,7 @@ static int __ste_allocate(unsigned long stab_entry = make_ste(get_paca()->stab_addr, GET_ESID(ea), vsid); - if (ea < KERNELBASE) { + if (!is_kernel_addr(ea)) { offset = __get_cpu_var(stab_cache_ptr); if (offset < NR_STAB_CACHE_ENTRIES) __get_cpu_var(stab_cache[offset++]) = stab_entry; @@ -190,7 +190,7 @@ void switch_stab(struct task_struct *tsk entry++, ste++) { unsigned long ea; ea = ste->esid_data & ESID_MASK; - if (ea < KERNELBASE) { + if (!is_kernel_addr(ea)) { ste->esid_data = 0; } } Index: kexec/arch/powerpc/kernel/prom_init.c =================================================================== --- kexec.orig/arch/powerpc/kernel/prom_init.c +++ kexec/arch/powerpc/kernel/prom_init.c @@ -1917,7 +1917,7 @@ static void __init prom_check_initrd(uns if (r3 && r4 && r4 != 0xdeadbeef) { unsigned long val; - RELOC(prom_initrd_start) = (r3 >= KERNELBASE) ? __pa(r3) : r3; + RELOC(prom_initrd_start) = is_kernel_addr(r3) ? 
__pa(r3) : r3; RELOC(prom_initrd_end) = RELOC(prom_initrd_start) + r4; val = RELOC(prom_initrd_start); Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -421,7 +421,7 @@ static void __init check_for_initrd(void /* If we were passed an initrd, set the ROOT_DEV properly if the values * look sensible. If not, clear initrd reference. */ - if (initrd_start >= KERNELBASE && initrd_end >= KERNELBASE && + if (is_kernel_addr(initrd_start) && is_kernel_addr(initrd_end) && initrd_end > initrd_start) ROOT_DEV = Root_RAM0; else Index: kexec/arch/powerpc/mm/slb.c =================================================================== --- kexec.orig/arch/powerpc/mm/slb.c +++ kexec/arch/powerpc/mm/slb.c @@ -111,14 +111,14 @@ void switch_slb(struct task_struct *tsk, else unmapped_base = TASK_UNMAPPED_BASE_USER64; - if (pc >= KERNELBASE) + if (is_kernel_addr(pc)) return; slb_allocate(pc); if (GET_ESID(pc) == GET_ESID(stack)) return; - if (stack >= KERNELBASE) + if (is_kernel_addr(stack)) return; slb_allocate(stack); @@ -126,7 +126,7 @@ void switch_slb(struct task_struct *tsk, || (GET_ESID(stack) == GET_ESID(unmapped_base))) return; - if (unmapped_base >= KERNELBASE) + if (is_kernel_addr(unmapped_base)) return; slb_allocate(unmapped_base); } Index: kexec/arch/powerpc/oprofile/op_model_power4.c =================================================================== --- kexec.orig/arch/powerpc/oprofile/op_model_power4.c +++ kexec/arch/powerpc/oprofile/op_model_power4.c @@ -232,7 +232,7 @@ static unsigned long get_pc(struct pt_re return (unsigned long)__va(pc); /* Not sure where we were */ - if (pc < KERNELBASE) + if (!is_kernel_addr(pc)) /* function descriptor madness */ return *((unsigned long *)kernel_unknown_bucket); @@ -244,7 +244,7 @@ static int get_kernel(unsigned long pc) int is_kernel; if (!mmcra_has_sihv) { - is_kernel = (pc >= 
KERNELBASE); + is_kernel = is_kernel_addr(pc); } else { unsigned long mmcra = mfspr(SPRN_MMCRA); is_kernel = ((mmcra & MMCRA_SIPR) == 0); Index: kexec/arch/powerpc/xmon/xmon.c =================================================================== --- kexec.orig/arch/powerpc/xmon/xmon.c +++ kexec/arch/powerpc/xmon/xmon.c @@ -1015,7 +1015,7 @@ static long check_bp_loc(unsigned long a unsigned int instr; addr &= ~3; - if (addr < KERNELBASE) { + if (!is_kernel_addr(addr)) { printf("Breakpoints may only be placed at kernel addresses\n"); return 0; } @@ -1066,7 +1066,7 @@ bpt_cmds(void) dabr.address = 0; dabr.enabled = 0; if (scanhex(&dabr.address)) { - if (dabr.address < KERNELBASE) { + if (!is_kernel_addr(dabr.address)) { printf(badaddr); break; } Index: kexec/include/asm-powerpc/page.h =================================================================== --- kexec.orig/include/asm-powerpc/page.h +++ kexec/include/asm-powerpc/page.h @@ -106,6 +106,12 @@ #define KERNELBASE PAGE_OFFSET #define VMALLOCBASE ASM_CONST(0xD000000000000000) +/* + * Don't compare things with KERNELBASE or PAGE_OFFSET to test for + * "kernelness", use is_kernel_addr() - it should do what you want. 
+ */ +#define is_kernel_addr(x) ((x) >= PAGE_OFFSET) + #ifndef __ASSEMBLY__ #ifdef __powerpc64__ Index: kexec/include/asm-ppc64/pgtable.h =================================================================== --- kexec.orig/include/asm-ppc64/pgtable.h +++ kexec/include/asm-ppc64/pgtable.h @@ -212,7 +212,7 @@ static inline pte_t pfn_pte(unsigned lon #define pte_pfn(x) ((unsigned long)((pte_val(x) >> PTE_SHIFT))) #define pte_page(x) pfn_to_page(pte_pfn(x)) -#define pmd_set(pmdp, ptep) ({BUG_ON((u64)ptep < KERNELBASE); pmd_val(*(pmdp)) = (unsigned long)(ptep);}) +#define pmd_set(pmdp, ptep) ({BUG_ON(!is_kernel_addr((u64)ptep)); pmd_val(*(pmdp)) = (unsigned long)(ptep);}) #define pmd_none(pmd) (!pmd_val(pmd)) #define pmd_bad(pmd) (pmd_val(pmd) == 0) #define pmd_present(pmd) (pmd_val(pmd) != 0) Index: kexec/arch/powerpc/oprofile/op_model_rs64.c =================================================================== --- kexec.orig/arch/powerpc/oprofile/op_model_rs64.c +++ kexec/arch/powerpc/oprofile/op_model_rs64.c @@ -178,7 +178,6 @@ static void rs64_handle_interrupt(struct int val; int i; unsigned long pc = mfspr(SPRN_SIAR); - int is_kernel = (pc >= KERNELBASE); /* set the PMM bit (see comment below) */ mtmsrd(mfmsr() | MSR_PMM); @@ -187,7 +186,7 @@ static void rs64_handle_interrupt(struct val = ctr_read(i); if (val < 0) { if (ctr[i].enabled) { - oprofile_add_pc(pc, is_kernel, i); + oprofile_add_pc(pc, is_kernel_addr(pc), i); ctr_write(i, reset_value[i]); } else { ctr_write(i, 0); From michael at ellerman.id.au Tue Nov 8 00:06:52 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:06:52 +1100 (EST) Subject: [PATCH 3/12] powerpc: Add CONFIG_CRASH_DUMP In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130652.B2BE6686BC@ozlabs.org> This patch adds a Kconfig variable, CONFIG_CRASH_DUMP, which configures the built kernel for use as a Kdump kernel. 
Currently "all" this involves is changing the value of KERNELBASE to 32 MB. arch/powerpc/Kconfig | 11 +++++++++++ arch/powerpc/kernel/setup_64.c | 3 +++ include/asm-powerpc/page.h | 10 +++++++++- 3 files changed, 23 insertions(+), 1 deletion(-) Index: kexec/arch/powerpc/Kconfig =================================================================== --- kexec.orig/arch/powerpc/Kconfig +++ kexec/arch/powerpc/Kconfig @@ -379,6 +379,17 @@ config CELL_IIC bool default y +config CRASH_DUMP + bool "kernel crash dumps (EXPERIMENTAL)" + depends on PPC_MULTIPLATFORM + depends on EXPERIMENTAL + help + Build a kernel suitable for use as a kdump capture kernel. + The kernel will be linked at a different address than normal, and + so can only be used for Kdump. + + Don't change this unless you know what you are doing. + config IBMVIO depends on PPC_PSERIES || PPC_ISERIES bool Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -528,6 +528,9 @@ void __init setup_system(void) ppc64_caches.iline_size); printk("htab_address = 0x%p\n", htab_address); printk("htab_hash_mask = 0x%lx\n", htab_hash_mask); +#if PHYSICAL_START > 0 + printk("physical_start = 0x%lx\n", PHYSICAL_START); +#endif printk("-----------------------------------------------------\n"); mm_init_ppc64(); Index: kexec/include/asm-powerpc/page.h =================================================================== --- kexec.orig/include/asm-powerpc/page.h +++ kexec/include/asm-powerpc/page.h @@ -103,7 +103,15 @@ #define PAGE_OFFSET CONFIG_KERNEL_START #endif /* __powerpc64__ */ -#define KERNELBASE PAGE_OFFSET +#ifdef CONFIG_CRASH_DUMP +/* Kdump kernel runs at 32 MB, change at your peril. 
*/ +#define PHYSICAL_START ASM_CONST(0x2000000) +#else +#define PHYSICAL_START ASM_CONST(0x0) +#endif + +#define KERNELBASE (PAGE_OFFSET + PHYSICAL_START) + #define VMALLOCBASE ASM_CONST(0xD000000000000000) /* From michael at ellerman.id.au Tue Nov 8 00:06:55 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:06:55 +1100 (EST) Subject: [PATCH 4/12] powerpc: Create a trampoline for the fwnmi vectors In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130655.2649A686CD@ozlabs.org> The fwnmi vectors can be anywhere < 32 MB, so we need to use a trampoline for them. The kdump kernel will register the trampoline addresses, which will then jump up to the real code above 32 MB. arch/powerpc/kernel/head_64.S | 2 ++ arch/powerpc/platforms/pseries/ras.c | 6 ++---- arch/powerpc/platforms/pseries/setup.c | 17 +++++++++-------- include/asm-powerpc/firmware.h | 6 ++++++ 4 files changed, 19 insertions(+), 12 deletions(-) Index: kexec/arch/powerpc/kernel/head_64.S =================================================================== --- kexec.orig/arch/powerpc/kernel/head_64.S +++ kexec/arch/powerpc/kernel/head_64.S @@ -512,6 +512,7 @@ _GLOBAL(do_stab_bolted_pSeries) * Vectors for the FWNMI option. Share common code. 
*/ .globl system_reset_fwnmi + .align 7 system_reset_fwnmi: HMT_MEDIUM mtspr SPRN_SPRG1,r13 /* save r13 */ @@ -519,6 +520,7 @@ system_reset_fwnmi: EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common) .globl machine_check_fwnmi + .align 7 machine_check_fwnmi: HMT_MEDIUM mtspr SPRN_SPRG1,r13 /* save r13 */ Index: kexec/arch/powerpc/platforms/pseries/setup.c =================================================================== --- kexec.orig/arch/powerpc/platforms/pseries/setup.c +++ kexec/arch/powerpc/platforms/pseries/setup.c @@ -75,8 +75,6 @@ #endif extern void find_udbg_vterm(void); -extern void system_reset_fwnmi(void); /* from head.S */ -extern void machine_check_fwnmi(void); /* from head.S */ extern void generic_find_legacy_serial_ports(u64 *physport, unsigned int *default_speed); @@ -104,18 +102,21 @@ void pSeries_show_cpuinfo(struct seq_fil /* Initialize firmware assisted non-maskable interrupts if * the firmware supports this feature. - * */ static void __init fwnmi_init(void) { - int ret; + unsigned long a1, a2; + int ibm_nmi_register = rtas_token("ibm,nmi-register"); if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE) return; - ret = rtas_call(ibm_nmi_register, 2, 1, NULL, - __pa((unsigned long)system_reset_fwnmi), - __pa((unsigned long)machine_check_fwnmi)); - if (ret == 0) + + /* If the kernel's not linked at zero we point the firmware at low + * addresses anyway, and use a trampoline to get to the real code. 
*/ + a1 = __pa(system_reset_fwnmi) - PHYSICAL_START; + a2 = __pa(machine_check_fwnmi) - PHYSICAL_START; + + if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, a1, a2)) fwnmi_active = 1; } Index: kexec/include/asm-powerpc/firmware.h =================================================================== --- kexec.orig/include/asm-powerpc/firmware.h +++ kexec/include/asm-powerpc/firmware.h @@ -92,6 +92,12 @@ typedef struct { extern firmware_feature_t firmware_features_table[]; #endif +extern void system_reset_fwnmi(void); +extern void machine_check_fwnmi(void); + +/* This is true if we are using the firmware NMI handler (typically LPAR) */ +extern int fwnmi_active; + #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* __ASM_POWERPC_FIRMWARE_H */ Index: kexec/arch/powerpc/platforms/pseries/ras.c =================================================================== --- kexec.orig/arch/powerpc/platforms/pseries/ras.c +++ kexec/arch/powerpc/platforms/pseries/ras.c @@ -49,14 +49,12 @@ #include #include #include +#include static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX]; static DEFINE_SPINLOCK(ras_log_buf_lock); -char mce_data_buf[RTAS_ERROR_LOG_MAX] -; -/* This is true if we are using the firmware NMI handler (typically LPAR) */ -extern int fwnmi_active; +char mce_data_buf[RTAS_ERROR_LOG_MAX]; static int ras_get_sensor_state_token; static int ras_check_exception_token; From michael at ellerman.id.au Tue Nov 8 00:06:57 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:06:57 +1100 (EST) Subject: [PATCH 5/12] powerpc: Reroute interrupts from 0 + offset to PHYSICAL_START + offset In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130657.980E5686CC@ozlabs.org> Regardless of where the kernel's linked we always get interrupts at low addresses. This patch creates a trampoline in the first 3 pages of memory, where interrupts land, and patches those addresses to jump into the real kernel code at PHYSICAL_START. 
We also need to reserve the trampoline code and a bit more in prom.c arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/crash.c | 52 +++++++++++++++++++++++++++++++++++++++++ arch/powerpc/kernel/prom.c | 6 +++- arch/powerpc/kernel/setup_64.c | 5 +++ include/asm-powerpc/kdump.h | 13 ++++++++++ 5 files changed, 77 insertions(+), 2 deletions(-) Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include #include @@ -274,6 +275,10 @@ void __init early_setup(unsigned long dt } ppc_md = **mach; +#ifdef CONFIG_CRASH_DUMP + kdump_setup(); +#endif + DBG("Found, Initializing memory management...\n"); /* Index: kexec/arch/powerpc/kernel/prom.c =================================================================== --- kexec.orig/arch/powerpc/kernel/prom.c +++ kexec/arch/powerpc/kernel/prom.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include #include @@ -1354,11 +1355,14 @@ void __init early_init_devtree(void *par #ifdef CONFIG_PPC64 systemcfg->physicalMemorySize = lmb_phys_mem_size(); #endif - lmb_reserve(0, __pa(klimit)); DBG("Phys. mem: %lx\n", lmb_phys_mem_size()); /* Reserve LMB regions used by kernel, initrd, dt, etc... */ + lmb_reserve(__pa(KERNELBASE), __pa(klimit) - __pa(KERNELBASE)); +#ifdef CONFIG_CRASH_DUMP + lmb_reserve(0, KDUMP_BACKUP_LIMIT); +#endif early_reserve_mem(); DBG("Scanning CPUs ...\n"); Index: kexec/include/asm-powerpc/kdump.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/kdump.h @@ -0,0 +1,13 @@ +#ifndef _PPC64_KDUMP_H +#define _PPC64_KDUMP_H + +/* How many bytes to backup from zero for kdump. The backup limit should + * be greater or equal to the trampoline's end address. 
*/ +#define KDUMP_BACKUP_LIMIT 0x8000 + +#define KDUMP_TRAMPOLINE_START 0x0100 +#define KDUMP_TRAMPOLINE_END 0x3000 + +extern void kdump_setup(void); + +#endif /* __PPC64_KDUMP_H */ Index: kexec/arch/powerpc/kernel/Makefile =================================================================== --- kexec.orig/arch/powerpc/kernel/Makefile +++ kexec/arch/powerpc/kernel/Makefile @@ -11,7 +11,7 @@ CFLAGS_btext.o += -fPIC endif obj-y := semaphore.o cputable.o ptrace.o syscalls.o \ - signal_32.o pmc.o + signal_32.o pmc.o crash.o obj-$(CONFIG_PPC64) += setup_64.o binfmt_elf32.o sys_ppc32.o \ signal_64.o ptrace32.o systbl.o obj-$(CONFIG_ALTIVEC) += vecemu.o vector.o @@ -22,6 +22,7 @@ obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o obj-$(CONFIG_RTAS_PROC) += rtas-proc.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_GENERIC_TBSYNC) += smp-tbsync.o +obj-$(CONFIG_CRASH_DUMP) += crash.o ifeq ($(CONFIG_PPC_MERGE),y) Index: kexec/arch/powerpc/kernel/crash.c =================================================================== --- /dev/null +++ kexec/arch/powerpc/kernel/crash.c @@ -0,0 +1,52 @@ +/* + * Routines for doing kexec-based kdump. + * + * Copyright (C) 2005, IBM Corp. + * + * Created by: Michael Ellerman + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + +#undef DEBUG + +#include +#include +#include + +#ifdef DEBUG +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...) +#endif + +static void __init create_trampoline(unsigned long addr) +{ + /* The maximum range of a single instruction branch, is the current + * instruction's address + (32 MB - 4) bytes. For the trampoline we + * need to branch to current address + 32 MB. So we insert a nop at + * the trampoline address, then the next instruction (+ 4 bytes) + * does a branch to (32 MB - 4). The net effect is that when we + * branch to "addr" we jump to ("addr" + 32 MB). 
Although it requires + * two instructions it doesn't require any registers. + */ + create_instruction(addr, 0x60000000); /* nop */ + create_branch(addr + 4, addr + PHYSICAL_START, 0); +} + +void __init kdump_setup(void) +{ + unsigned long i; + + DBG(" -> kdump_setup()\n"); + + for (i = KDUMP_TRAMPOLINE_START; i < KDUMP_TRAMPOLINE_END; i += 8) { + create_trampoline(i); + } + + create_trampoline(__pa(system_reset_fwnmi) - PHYSICAL_START); + create_trampoline(__pa(machine_check_fwnmi) - PHYSICAL_START); + + DBG(" <- kdump_setup()\n"); +} From michael at ellerman.id.au Tue Nov 8 00:07:00 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:07:00 +1100 (EST) Subject: [PATCH 6/12] powerpc: Fixups for kernel linked at 32 MB In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130700.17D9A686DA@ozlabs.org> There are a few places where we need to fix things up for the kernel to work if it's linked at 32 MB: - platforms/powermac/smp.c To start secondary cpus on pmac we patch the reset vector, which is fine. Except if we're above 32 MB we don't have enough bits for an absolute branch, so it needs to be relative. - kernel/head_64.S - A few branches in the cpu hold code need to load the full target address and do a bctr. - after_prom_start needs to load PHYSICAL_START as the dest address, not 0. - The exception prolog needs to load the low word of the target address, not just the low halfword. - Fixup handling of the initial stab address. - kernel/setup_64.c smp_release_cpus() needs to write 1 to the spinloop flag near 0, not 32 MB. 
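For reference, the "not enough bits" limit above can be sketched in plain C. This is only an illustration, assuming the standard PowerPC I-form branch encoding (primary opcode 18, a 24-bit word-aligned LI field, AA = 0x2, LK = 0x1). The helper names are made up for this example and are not the kernel's create_branch(); the absolute case ignores the sign-extended high address range.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_OP_BRANCH 0x48000000u /* primary opcode 18, I-form */
#define PPC_BRANCH_AA 0x00000002u /* absolute-address bit */
#define PPC_LI_MASK   0x03fffffcu /* 24-bit LI field, shifted left by 2 */

/* Relative branch: displacement must be word aligned and within
 * [-32 MB, +32 MB - 4]. Returns 0 (not a valid branch) if out of range. */
static uint32_t encode_rel_branch(uint64_t from, uint64_t to)
{
	int64_t disp = (int64_t)(to - from);

	if ((disp & 3) || disp < -0x2000000 || disp > 0x1fffffc)
		return 0;
	return PPC_OP_BRANCH | ((uint32_t)disp & PPC_LI_MASK);
}

/* Absolute branch: only low targets up to 32 MB - 4 are handled here.
 * Returns 0 if the target does not fit. */
static uint32_t encode_abs_branch(uint64_t target)
{
	if ((target & 3) || target > 0x1fffffc)
		return 0;
	return PPC_OP_BRANCH | PPC_BRANCH_AA | (uint32_t)target;
}

/* Small self-check mirroring the cases discussed in this series. */
void branch_encoding_examples(void)
{
	/* An absolute branch to a low address fits. */
	assert(encode_abs_branch(0x100) == 0x48000102u);
	/* An absolute branch to 32 MB + 0x100 does not. */
	assert(encode_abs_branch(0x2000100) == 0);
	/* A single relative branch cannot span exactly 32 MB ... */
	assert(encode_rel_branch(0x100, 0x2000100) == 0);
	/* ... but a branch placed 4 bytes later can, hence nop + branch. */
	assert(encode_rel_branch(0x104, 0x2000100) == 0x49fffffcu);
}
```

The last two cases are why the trampoline in 5/12 uses a nop followed by a branch: from addr + 4 the distance to addr + 32 MB is 32 MB - 4, which is the largest forward displacement that fits.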
arch/powerpc/kernel/head_64.S | 30 ++++++++++++++++++++++++------ arch/powerpc/kernel/setup_64.c | 5 ++++- arch/powerpc/platforms/powermac/smp.c | 2 +- include/asm-ppc64/mmu.h | 3 ++- 4 files changed, 31 insertions(+), 9 deletions(-) Index: kexec/arch/powerpc/platforms/powermac/smp.c =================================================================== --- kexec.orig/arch/powerpc/platforms/powermac/smp.c +++ kexec/arch/powerpc/platforms/powermac/smp.c @@ -762,7 +762,7 @@ static void __devinit smp_core99_kick_cp * b __secondary_start_pmac_0 + nr*8 - KERNELBASE */ new_vector = (unsigned long) __secondary_start_pmac_0 + nr * 8; - *vector = 0x48000002 + new_vector - KERNELBASE; + *vector = 0x48000001 + new_vector - (unsigned long)vector; /* flush data cache and inval instruction cache */ flush_icache_range((unsigned long) vector, (unsigned long) vector + 4); Index: kexec/arch/powerpc/kernel/head_64.S =================================================================== --- kexec.orig/arch/powerpc/kernel/head_64.S +++ kexec/arch/powerpc/kernel/head_64.S @@ -155,11 +155,15 @@ _GLOBAL(__secondary_hold) bne 100b #ifdef CONFIG_HMT - b .hmt_init + LOADADDR(r4, .hmt_init) + mtctr r4 + bctr #else #ifdef CONFIG_SMP + LOADADDR(r4, .pSeries_secondary_smp_init) + mtctr r4 mr r3,r24 - b .pSeries_secondary_smp_init + bctr #else BUG_OPCODE #endif @@ -201,6 +205,20 @@ exception_marker: #define EX_DSISR 56 #define EX_CCR 60 +/* + * We're short on space and time in the exception prolog, so we can't use + * the normal LOADADDR macro. Normally we just need the low halfword of the + * address, but for Kdump we need the whole low word. + */ +#ifdef CONFIG_CRASH_DUMP +#define LOAD_HANDLER(reg, label) \ + oris r12,r12,(label)@h; /* virt addr of handler ... */ \ + ori r12,r12,(label)@l; /* .. and the rest */ +#else +#define LOAD_HANDLER(reg, label) \ + ori r12,r12,(label)@l; /* virt addr of handler ... 
*/ +#endif + #define EXCEPTION_PROLOG_PSERIES(area, label) \ mfspr r13,SPRN_SPRG3; /* get paca address into r13 */ \ std r9,area+EX_R9(r13); /* save r9 - r12 */ \ @@ -213,8 +231,8 @@ exception_marker: clrrdi r12,r13,32; /* get high part of &label */ \ mfmsr r10; \ mfspr r11,SPRN_SRR0; /* save SRR0 */ \ - ori r12,r12,(label)@l; /* virt addr of handler */ \ ori r10,r10,MSR_IR|MSR_DR|MSR_RI; \ + LOAD_HANDLER(r12,label) \ mtspr SPRN_SRR0,r12; \ mfspr r12,SPRN_SRR1; /* and SRR1 */ \ mtspr SPRN_SRR1,r10; \ @@ -1205,7 +1223,7 @@ unrecov_slb: * fixed address (the linker can't compute (u64)&initial_stab >> * PAGE_SHIFT). */ - . = STAB0_PHYS_ADDR /* 0x6000 */ + . = STAB0_OFFSET /* 0x6000 */ .globl initial_stab initial_stab: .space 4096 @@ -1410,7 +1428,7 @@ _STATIC(__boot_from_prom) _STATIC(__after_prom_start) /* - * We need to run with __start at physical address 0. + * We need to run with __start at physical address PHYSICAL_START. * This will leave some code in the first 256B of * real memory, which are reserved for software use. * The remainder of the first page is loaded with the fixed @@ -1425,7 +1443,7 @@ _STATIC(__after_prom_start) mr r26,r3 SET_REG_TO_CONST(r27,KERNELBASE) - li r3,0 /* target addr */ + LOADADDR(r3, PHYSICAL_START) /* target addr */ // XXX FIXME: Use phys returned by OF (r30) add r4,r27,r26 /* source addr */ Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -301,6 +301,7 @@ void __init early_setup(unsigned long dt void smp_release_cpus(void) { extern unsigned long __secondary_hold_spinloop; + unsigned long *ptr; DBG(" -> smp_release_cpus()\n"); @@ -311,7 +312,9 @@ void smp_release_cpus(void) * This is useless but harmless on iSeries, secondaries are already * waiting on their paca spinloops. 
*/ - __secondary_hold_spinloop = 1; + ptr = (unsigned long *)((unsigned long)&__secondary_hold_spinloop + - PHYSICAL_START); + *ptr = 1; mb(); DBG(" <- smp_release_cpus()\n"); Index: kexec/include/asm-ppc64/mmu.h =================================================================== --- kexec.orig/include/asm-ppc64/mmu.h +++ kexec/include/asm-ppc64/mmu.h @@ -30,7 +30,8 @@ /* Location of cpu0's segment table */ #define STAB0_PAGE 0x6 -#define STAB0_PHYS_ADDR (STAB0_PAGE<<12) +#define STAB0_OFFSET (STAB0_PAGE << 12) +#define STAB0_PHYS_ADDR (STAB0_OFFSET + PHYSICAL_START) #ifndef __ASSEMBLY__ extern char initial_stab[]; From michael at ellerman.id.au Tue Nov 8 00:07:03 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:07:03 +1100 (EST) Subject: [PATCH 7/12] powerpc: Add basic infrastructure for Kdump In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130703.A84EB686E2@ozlabs.org> Add basic infrastructure for Kdump. FIXME - more explanation needed. Patch-by: Haren Myneni arch/powerpc/kernel/crash.c | 260 +++++++++++++++++++++++++++++++++++++- arch/powerpc/kernel/smp.c | 24 +++ arch/powerpc/kernel/traps.c | 20 ++ arch/ppc64/kernel/machine_kexec.c | 27 +-- include/asm-powerpc/kexec.h | 2 5 files changed, 306 insertions(+), 27 deletions(-) Index: kexec/arch/powerpc/kernel/smp.c =================================================================== --- kexec.orig/arch/powerpc/kernel/smp.c +++ kexec/arch/powerpc/kernel/smp.c @@ -74,6 +74,8 @@ void smp_call_function_interrupt(void); int smt_enabled_at_boot = 1; +static void (*crash_ipi_function_ptr)(struct pt_regs *) = NULL; + #ifdef CONFIG_MPIC int __init smp_mpic_probe(void) { @@ -122,11 +124,16 @@ void smp_message_recv(int msg, struct pt /* XXX Do we have to do this? 
*/ set_need_resched(); break; -#ifdef CONFIG_DEBUGGER +#if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC) case PPC_MSG_DEBUGGER_BREAK: - debugger_ipi(regs); + if (crash_ipi_function_ptr) + crash_ipi_function_ptr(regs); +#ifdef CONFIG_DEBUGGER + else + debugger_ipi(regs); +#endif /* CONFIG_DEBUGGER */ break; -#endif +#endif /* CONFIG_DEBUGGER || CONFIG_KEXEC */ default: printk("SMP %d: smp_message_recv(): unknown msg %d\n", smp_processor_id(), msg); @@ -146,6 +153,17 @@ void smp_send_debugger_break(int cpu) } #endif +#ifdef CONFIG_KEXEC +void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) +{ + crash_ipi_function_ptr = crash_ipi_callback; + if (crash_ipi_callback) { + mb(); + smp_ops->message_pass(MSG_ALL_BUT_SELF, PPC_MSG_DEBUGGER_BREAK); + } +} +#endif + static void stop_this_cpu(void *dummy) { local_irq_disable(); Index: kexec/arch/powerpc/kernel/crash.c =================================================================== --- kexec.orig/arch/powerpc/kernel/crash.c +++ kexec/arch/powerpc/kernel/crash.c @@ -1,16 +1,30 @@ /* - * Routines for doing kexec-based kdump. + * Architecture specific (PPC64) functions for kexec based crash dumps. * - * Copyright (C) 2005, IBM Corp. - * - * Created by: Michael Ellerman + * Created by: Michael Ellerman & Haren Myneni * * This source code is licensed under the GNU General Public License, * Version 2. See the file COPYING for more details. + * */ #undef DEBUG +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include #include #include #include @@ -21,6 +35,226 @@ #define DBG(fmt...) #endif +#ifdef CONFIG_KEXEC +note_buf_t crash_notes[NR_CPUS]; + +/* This keeps a track of which one is crashing cpu. 
*/ +int kexec_crashing_cpu = -1; + +static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data, + size_t data_len) +{ + struct elf_note note; + + note.n_namesz = strlen(name) + 1; + note.n_descsz = data_len; + note.n_type = type; + memcpy(buf, &note, sizeof(note)); + buf += (sizeof(note) +3)/4; + memcpy(buf, name, note.n_namesz); + buf += (note.n_namesz + 3)/4; + memcpy(buf, data, note.n_descsz); + buf += (note.n_descsz + 3)/4; + + return buf; +} + +static void final_note(u32 *buf) +{ + struct elf_note note; + + note.n_namesz = 0; + note.n_descsz = 0; + note.n_type = 0; + memcpy(buf, &note, sizeof(note)); +} + +static void crash_save_this_cpu(struct pt_regs *regs, int cpu) +{ + struct elf_prstatus prstatus; + u32 *buf; + + if ((cpu < 0) || (cpu >= NR_CPUS)) + return; + + /* Using ELF notes here is opportunistic. + * I need a well defined structure format + * for the data I pass, and I need tags + * on the data to indicate what information I have + * squirrelled away. ELF notes happen to provide + * all of that that no need to invent something new. + */ + buf = &crash_notes[cpu][0]; + memset(&prstatus, 0, sizeof(prstatus)); + prstatus.pr_pid = current->pid; + elf_core_copy_regs(&prstatus.pr_reg, regs); + buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus, + sizeof(prstatus)); + final_note(buf); +} + +/* FIXME Merge this with xmon_save_regs ?? 
*/ +static inline void crash_get_current_regs(struct pt_regs *regs) +{ + unsigned long tmp1, tmp2; + + __asm__ __volatile__ ( + "std 0,0(%2)\n" + "std 1,8(%2)\n" + "std 2,16(%2)\n" + "std 3,24(%2)\n" + "std 4,32(%2)\n" + "std 5,40(%2)\n" + "std 6,48(%2)\n" + "std 7,56(%2)\n" + "std 8,64(%2)\n" + "std 9,72(%2)\n" + "std 10,80(%2)\n" + "std 11,88(%2)\n" + "std 12,96(%2)\n" + "std 13,104(%2)\n" + "std 14,112(%2)\n" + "std 15,120(%2)\n" + "std 16,128(%2)\n" + "std 17,136(%2)\n" + "std 18,144(%2)\n" + "std 19,152(%2)\n" + "std 20,160(%2)\n" + "std 21,168(%2)\n" + "std 22,176(%2)\n" + "std 23,184(%2)\n" + "std 24,192(%2)\n" + "std 25,200(%2)\n" + "std 26,208(%2)\n" + "std 27,216(%2)\n" + "std 28,224(%2)\n" + "std 29,232(%2)\n" + "std 30,240(%2)\n" + "std 31,248(%2)\n" + "mfmsr %0\n" + "std %0, 264(%2)\n" + "mfctr %0\n" + "std %0, 280(%2)\n" + "mflr %0\n" + "std %0, 288(%2)\n" + "bl 1f\n" + "1: mflr %1\n" + "std %1, 256(%2)\n" + "mtlr %0\n" + "mfxer %0\n" + "std %0, 296(%2)\n" + : "=&r" (tmp1), "=&r" (tmp2) + : "b" (regs)); +} + +/* We may have saved_regs from where the error came from + * or it is NULL if via a direct panic(). 
+ */ +static void crash_save_self(struct pt_regs *saved_regs) +{ + struct pt_regs regs; + int cpu; + + cpu = smp_processor_id(); + if (saved_regs) + memcpy(&regs, saved_regs, sizeof(*saved_regs)); + else + crash_get_current_regs(&regs); + crash_save_this_cpu(&regs, cpu); +} + +#ifdef CONFIG_SMP +static atomic_t waiting_for_crash_ipi; + +void crash_ipi_callback(struct pt_regs *regs) +{ + int cpu = smp_processor_id(); + + if (cpu == kexec_crashing_cpu) + return; + + if (!cpu_online(cpu)) + return; + + if (ppc_md.kexec_cpu_down) + ppc_md.kexec_cpu_down(1, 1); + + local_irq_disable(); + + crash_save_this_cpu(regs, cpu); + atomic_dec(&waiting_for_crash_ipi); + kexec_smp_wait(); + /* NOTREACHED */ +} + +static void crash_kexec_prepare_cpus(void) +{ + unsigned int msecs; + extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)); + + atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1); + + /* Would it be better to replace the trap vector here? */ + crash_send_ipi(crash_ipi_callback); + smp_wmb(); + + /* + * FIXME: Until we will have the way to stop other CPUSs reliabally, + * the crash CPU will send an IPI and wait for other CPUs to + * respond. If not, proceed the kexec boot even though we failed to + * capture other CPU states. Also, note that kexec boot might fail. + */ + msecs = 1000000; + while ((atomic_read(&waiting_for_crash_ipi) > 0) && (--msecs > 0)) + cpu_relax(); + + /* + * FIX: We can do soft reset such that we get all. Panic CPU + * can longjmp to the previous state such that it does kexec boot. + */ + if (atomic_read(&waiting_for_crash_ipi)) + printk("All CPUS are not reponded to an IPI\n"); + + crash_send_ipi(NULL); +} +#else +static void crash_kexec_prepare_cpus(void) +{ + /* There are no cpus to shootdown */ +} + +#endif + +void machine_crash_shutdown(struct pt_regs *regs) +{ + /* + * This function is only called after the system + * has paniced or is otherwise in a critical state. 
+ * The minimum amount of code to allow a kexec'd kernel + * to run successfully needs to happen here. + * + * In practice this means stopping other cpus in + * an SMP system. + * The kernel is broken so disable interrupts. + */ + local_irq_disable(); + + /* FIXME Why commented out ? */ + //if (ppc_md.kexec_cpu_down) + // ppc_md.kexec_cpu_down(1, 0); + + /* + * Make a note of crashing cpu. Will be used in machine_kexec + * such that another IPI will not be sent. + */ + kexec_crashing_cpu = smp_processor_id(); + crash_kexec_prepare_cpus(); + crash_save_self(regs); +} +#endif /* CONFIG_KEXEC */ + +#ifdef CONFIG_CRASH_DUMP + static void __init create_trampoline(unsigned long addr) { /* The maximum range of a single instruction branch, is the current @@ -50,3 +284,21 @@ void __init kdump_setup(void) DBG(" <- kdump_setup()\n"); } + +static int __init parse_elfcorehdr(char *p) +{ + if (p) + elfcorehdr_addr = memparse(p, &p); + + return 0; +} +__setup("elfcorehdr=", parse_elfcorehdr); + +static int __init parse_savemaxmem(char *p) +{ + if (p) + saved_max_pfn = memparse(p, &p) >> PAGE_SHIFT; +} +__setup("savemaxmem=", parse_savemaxmem); + +#endif /* CONFIG_CRASH_DUMP */ Index: kexec/arch/powerpc/kernel/traps.c =================================================================== --- kexec.orig/arch/powerpc/kernel/traps.c +++ kexec/arch/powerpc/kernel/traps.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -97,7 +98,7 @@ static DEFINE_SPINLOCK(die_lock); int die(const char *str, struct pt_regs *regs, long err) { - static int die_counter; + static int die_counter, crash_dump_start = 0; int nl = 0; if (debugger(regs)) @@ -158,7 +159,22 @@ int die(const char *str, struct pt_regs print_modules(); show_regs(regs); bust_spinlocks(0); - spin_unlock_irq(&die_lock); + + if (!crash_dump_start) { + if (kexec_should_crash(current)) { + crash_dump_start = 1; + spin_unlock_irq(&die_lock); + crash_kexec(regs); + } else + spin_unlock_irq(&die_lock); + } else { + 
spin_unlock_irq(&die_lock); + /* + * Only for soft-reset: Other CPUs will be responded to an IPI + * sent by first kexec CPU. + */ + for(;;); + } if (in_interrupt()) panic("Fatal exception in interrupt"); Index: kexec/arch/ppc64/kernel/machine_kexec.c =================================================================== --- kexec.orig/arch/ppc64/kernel/machine_kexec.c +++ kexec/arch/ppc64/kernel/machine_kexec.c @@ -27,20 +27,6 @@ #define HASH_GROUP_SIZE 0x80 /* size of each hash group, asm/mmu.h */ -/* Have this around till we move it into crash specific file */ -note_buf_t crash_notes[NR_CPUS]; - -/* Dummy for now. Not sure if we need to have a crash shutdown in here - * and if what it will achieve. Letting it be now to compile the code - * in generic kexec environment - */ -void machine_crash_shutdown(struct pt_regs *regs) -{ - /* do nothing right now */ - /* smp_relase_cpus() if we want smp on panic kernel */ - /* cpu_irq_down to isolate us until we are ready */ -} - int machine_kexec_prepare(struct kimage *image) { int i; @@ -283,11 +269,18 @@ extern NORET_TYPE void kexec_sequence(vo /* too late to fail here */ void machine_kexec(struct kimage *image) { - /* prepare control code if any */ - /* shutdown other cpus into our wait loop and quiesce interrupts */ - kexec_prepare_cpus(); + /* + * If the kexec boot is the normal one, need to shutdown other cpus + * into our wait loop and quiesce interrupts. + * Otherwise, in the case of crashed mode (kexec_crashing_cpu >= 0), + * stopping other CPUs and collecting their pt_regs is done before + * using debugger IPI. + */ + + if (kexec_crashing_cpu == -1) + kexec_prepare_cpus(); /* switch to a staticly allocated stack. Based on irq stack code. * XXX: the task struct will likely be invalid once we do the copy! 
Index: kexec/include/asm-powerpc/kexec.h =================================================================== --- kexec.orig/include/asm-powerpc/kexec.h +++ kexec/include/asm-powerpc/kexec.h @@ -32,7 +32,7 @@ #ifndef __ASSEMBLY__ -#define MAX_NOTE_BYTES 1024 +#define MAX_NOTE_BYTES 2048 typedef u32 note_buf_t[MAX_NOTE_BYTES / sizeof(u32)]; extern note_buf_t crash_notes[]; From michael at ellerman.id.au Tue Nov 8 00:07:05 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:07:05 +1100 (EST) Subject: [PATCH 8/12] powerpc: Turn cpu_irq_down into kexec_cpu_down In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130705.B9207686E3@ozlabs.org> We currently have a ppc_md member called cpu_irq_down, which disables IRQs for the cpu in question. The only caller of cpu_irq_down is the kexec code. On pSeries we need to do more than just teardown IRQs at kexec time, so rename the ppc_md member to kexec_cpu_down and expand it. The pSeries code needs to know, and other platforms might too, whether we're doing a crash shutdown (ie. panicking) or a regular kexec, so add a flag for that. 
arch/powerpc/platforms/pseries/setup.c | 26 ++++++++++++++++++++++++-- arch/ppc64/kernel/machine_kexec.c | 12 ++++++------ include/asm-powerpc/machdep.h | 4 +++- 3 files changed, 33 insertions(+), 9 deletions(-) Index: kexec/arch/powerpc/platforms/pseries/setup.c =================================================================== --- kexec.orig/arch/powerpc/platforms/pseries/setup.c +++ kexec/arch/powerpc/platforms/pseries/setup.c @@ -200,14 +200,12 @@ static void __init pSeries_setup_arch(vo if (ppc64_interrupt_controller == IC_OPEN_PIC) { ppc_md.init_IRQ = pSeries_init_mpic; ppc_md.get_irq = mpic_get_irq; - ppc_md.cpu_irq_down = mpic_teardown_this_cpu; /* Allocate the mpic now, so that find_and_init_phbs() can * fill the ISUs */ pSeries_setup_mpic(); } else { ppc_md.init_IRQ = xics_init_IRQ; ppc_md.get_irq = xics_get_irq; - ppc_md.cpu_irq_down = xics_teardown_cpu; } #ifdef CONFIG_SMP @@ -597,6 +595,27 @@ static int pSeries_pci_probe_mode(struct return PCI_PROBE_NORMAL; } +#ifdef CONFIG_KEXEC +static void pseries_kexec_cpu_down(int crash_shutdown, int secondary) +{ + /* Don't risk a hypervisor call if we're crashing */ + if (!crash_shutdown) { + unsigned long vpa = __pa(&get_paca()->lppaca); + + if (unregister_vpa(hard_smp_processor_id(), vpa)) { + printk("VPA deregistration of cpu %u (hw_cpu_id %d) " + "failed\n", smp_processor_id(), + hard_smp_processor_id()); + } + } + + if (ppc64_interrupt_controller == IC_OPEN_PIC) + mpic_teardown_this_cpu(secondary); + else + xics_teardown_cpu(secondary); +} +#endif + struct machdep_calls __initdata pSeries_md = { .probe = pSeries_probe, .setup_arch = pSeries_setup_arch, @@ -619,4 +638,7 @@ struct machdep_calls __initdata pSeries_ .check_legacy_ioport = pSeries_check_legacy_ioport, .system_reset_exception = pSeries_system_reset_exception, .machine_check_exception = pSeries_machine_check_exception, +#ifdef CONFIG_KEXEC + .kexec_cpu_down = pseries_kexec_cpu_down, +#endif }; Index: kexec/arch/ppc64/kernel/machine_kexec.c 
=================================================================== --- kexec.orig/arch/ppc64/kernel/machine_kexec.c +++ kexec/arch/ppc64/kernel/machine_kexec.c @@ -169,8 +169,8 @@ void kexec_copy_flush(struct kimage *ima */ void kexec_smp_down(void *arg) { - if (ppc_md.cpu_irq_down) - ppc_md.cpu_irq_down(1); + if (ppc_md.kexec_cpu_down) + ppc_md.kexec_cpu_down(0, 1); local_irq_disable(); kexec_smp_wait(); @@ -217,8 +217,8 @@ static void kexec_prepare_cpus(void) } /* after we tell the others to go down */ - if (ppc_md.cpu_irq_down) - ppc_md.cpu_irq_down(0); + if (ppc_md.kexec_cpu_down) + ppc_md.kexec_cpu_down(0, 0); put_cpu(); @@ -239,8 +239,8 @@ static void kexec_prepare_cpus(void) * UP to an SMP kernel. */ smp_release_cpus(); - if (ppc_md.cpu_irq_down) - ppc_md.cpu_irq_down(0); + if (ppc_md.kexec_cpu_down) + ppc_md.kexec_cpu_down(0, 0); local_irq_disable(); } Index: kexec/include/asm-powerpc/machdep.h =================================================================== --- kexec.orig/include/asm-powerpc/machdep.h +++ kexec/include/asm-powerpc/machdep.h @@ -91,7 +91,9 @@ struct machdep_calls { void (*init_IRQ)(void); int (*get_irq)(struct pt_regs *); - void (*cpu_irq_down)(int secondary); +#ifdef CONFIG_KEXEC + void (*kexec_cpu_down)(int crash_shutdown, int secondary); +#endif /* PCI stuff */ /* Called after scanning the bus, before allocating resources */ From michael at ellerman.id.au Tue Nov 8 00:07:07 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:07:07 +1100 (EST) Subject: [PATCH 9/12] powerpc: Export htab start/end via device tree In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130707.018F7686E6@ozlabs.org> The userspace kexec-tools need to know the location of the htab on non-lpar machines, as well as the end of the kernel. Export via the device tree. NB. This patch has been updated to use "linux,x" property names. You may need to update your kexec-tools to match. 
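For readers on the userspace side, a minimal sketch of how such a property might be read back. It assumes the property shows up as a file of raw big-endian bytes (for example under /proc/device-tree/chosen/, which is an assumption here, as are both function names); it is not taken from kexec-tools.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode a big-endian property value of up to 8 bytes. */
static uint64_t be_prop_to_u64(const unsigned char *buf, size_t len)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < len && i < 8; i++)
		v = (v << 8) | buf[i];
	return v;
}

/* Read one property file; returns 0 on success, -1 on failure. */
static int read_u64_prop(const char *path, uint64_t *out)
{
	unsigned char buf[8];
	size_t n;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);
	if (n == 0)
		return -1;

	*out = be_prop_to_u64(buf, n);
	return 0;
}
```

On machines without an htab the property is simply absent, so a caller should treat a failed read as "no htab" rather than as an error.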
arch/powerpc/kernel/setup_64.c | 5 +++ arch/ppc64/kernel/machine_kexec.c | 51 ++++++++++++++++++++++++++++++++++++++ include/asm-powerpc/kexec.h | 1 3 files changed, 57 insertions(+) Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -61,6 +61,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -455,6 +456,10 @@ void __init setup_system(void) */ unflatten_device_tree(); +#ifdef CONFIG_KEXEC + kexec_setup(); /* requires unflattened device tree. */ +#endif + /* * Fill the ppc64_caches & systemcfg structures with informations * retreived from the device-tree. Need to be called before Index: kexec/arch/ppc64/kernel/machine_kexec.c =================================================================== --- kexec.orig/arch/ppc64/kernel/machine_kexec.c +++ kexec/arch/ppc64/kernel/machine_kexec.c @@ -296,3 +296,54 @@ void machine_kexec(struct kimage *image) ppc_md.hpte_clear_all); /* NOTREACHED */ } + +/* Values we need to export to the second kernel via the device tree. 
*/ +static unsigned long htab_base, htab_size, kernel_end; + +static struct property htab_base_prop = { + .name = "linux,htab-base", + .length = sizeof(unsigned long), + .value = (unsigned char *)&htab_base, +}; + +static struct property htab_size_prop = { + .name = "linux,htab-size", + .length = sizeof(unsigned long), + .value = (unsigned char *)&htab_size, +}; + +static struct property kernel_end_prop = { + .name = "linux,kernel-end", + .length = sizeof(unsigned long), + .value = (unsigned char *)&kernel_end, +}; + +static void __init export_htab_values(void) +{ + struct device_node *node; + + node = of_find_node_by_path("/chosen"); + if (!node) + return; + + kernel_end = __pa(_end); + prom_add_property(node, &kernel_end_prop); + + /* On machines with no htab htab_address is NULL */ + if (NULL == htab_address) + goto out; + + htab_base = __pa(htab_address); + prom_add_property(node, &htab_base_prop); + + htab_size = 1UL << ppc64_pft_size; + prom_add_property(node, &htab_size_prop); + + out: + of_node_put(node); +} + +void __init kexec_setup(void) +{ + export_htab_values(); +} Index: kexec/include/asm-powerpc/kexec.h =================================================================== --- kexec.orig/include/asm-powerpc/kexec.h +++ kexec/include/asm-powerpc/kexec.h @@ -41,6 +41,7 @@ extern note_buf_t crash_notes[]; extern void kexec_smp_wait(void); /* get and clear naca physid, wait for master to copy new code to 0 */ extern int kexec_crashing_cpu; +extern void __init kexec_setup(void); #else struct kimage; extern void machine_kexec_simple(struct kimage *image); From michael at ellerman.id.au Tue Nov 8 00:07:12 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:07:12 +1100 (EST) Subject: [PATCH 10/12] powerpc: Parse crashkernel= parameter in first kernel In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130712.0C9CA686EA@ozlabs.org> This patch adds code to parse and setup the crash kernel resource in the 
first kernel. FIXME: PPC64 should ignore the @x, we always run at 32 MB. Patch-by: Haren Myneni arch/powerpc/kernel/prom.c | 16 +++++++++++ arch/powerpc/kernel/prom_init.c | 55 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 71 insertions(+) Index: kexec/arch/powerpc/kernel/prom_init.c =================================================================== --- kexec.orig/arch/powerpc/kernel/prom_init.c +++ kexec/arch/powerpc/kernel/prom_init.c @@ -44,6 +44,7 @@ #include #include #include +#include #ifdef CONFIG_LOGO_LINUX_CLUT224 #include @@ -190,6 +191,12 @@ static unsigned long __initdata alloc_bo static unsigned long __initdata rmo_top; static unsigned long __initdata ram_top; +#ifdef CONFIG_KEXEC +static unsigned long __initdata prom_crashk_base; +static unsigned long __initdata prom_crashk_size; +#endif + + static struct mem_map_entry __initdata mem_reserve_map[MEM_RESERVE_MAP_SIZE]; static int __initdata mem_reserve_cnt; @@ -527,6 +534,31 @@ static void __init early_cmdline_parse(v RELOC(prom_memory_limit) = ALIGN(RELOC(prom_memory_limit), 0x1000000); #endif } + +#ifdef CONFIG_KEXEC + /* + * crashkernel=size at addr specifies the location to reserve for + * a crash kernel. 
+ */ + opt = strstr(RELOC(prom_cmd_line), RELOC("crashkernel=")); + if (opt) { + opt += 12; + RELOC(prom_crashk_size) = + prom_memparse(opt, (const char **)&opt); + /* Align to 16 MB == size of large page */ + RELOC(prom_crashk_size) = + ALIGN(RELOC(prom_crashk_size), 0x1000000); + if (*opt == '@') { + opt++; + RELOC(prom_crashk_base) = + prom_memparse(opt, (const char **)&opt); + RELOC(prom_crashk_base) = + ALIGN(RELOC(prom_crashk_base), 0x1000000); + } else + prom_printf("Error in 'crashkernel='\n"); + } +#endif + } #ifdef CONFIG_PPC_PSERIES @@ -948,6 +980,13 @@ static void __init prom_init_mem(void) prom_printf(" alloc_top_hi : %x\n", RELOC(alloc_top_high)); prom_printf(" rmo_top : %x\n", RELOC(rmo_top)); prom_printf(" ram_top : %x\n", RELOC(ram_top)); +#ifdef CONFIG_KEXEC + if (RELOC(prom_crashk_base)) { + prom_printf(" crashk_base : %x\n", RELOC(prom_crashk_base)); + prom_printf(" crashk_size : %x\n", RELOC(prom_crashk_size)); + } +#endif + } @@ -2015,6 +2054,10 @@ unsigned long __init prom_init(unsigned */ prom_init_mem(); +#ifdef CONFIG_KEXEC + if (RELOC(prom_crashk_base)) + reserve_mem(RELOC(prom_crashk_base), RELOC(prom_crashk_size)); +#endif /* * Determine which cpu is actually running right _now_ */ @@ -2069,6 +2112,18 @@ unsigned long __init prom_init(unsigned } #endif +#ifdef CONFIG_KEXEC + if (RELOC(prom_crashk_base)) { + prom_setprop(_prom->chosen, "linux,crashkernel-base", + PTRRELOC(&prom_crashk_base), + sizeof(RELOC(prom_crashk_base))); + prom_setprop(_prom->chosen, "linux,crashkernel-size", + PTRRELOC(&prom_crashk_size), + sizeof(RELOC(prom_crashk_size))); + } +#endif + + /* * Fixup any known bugs in the device-tree */ Index: kexec/arch/powerpc/kernel/prom.c =================================================================== --- kexec.orig/arch/powerpc/kernel/prom.c +++ kexec/arch/powerpc/kernel/prom.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -1221,6 +1222,17 @@ static int __init early_init_dt_scan_cho } #endif 
/* CONFIG_PPC_RTAS */ +#ifdef CONFIG_KEXEC + lprop = (u64*)get_flat_dt_prop(node, "linux,crashkernel-base", NULL); + if (lprop) + crashk_res.start = *lprop; + + lprop = (u64*)get_flat_dt_prop(node, "linux,crashkernel-size", NULL); + if (lprop) + crashk_res.end = crashk_res.start + *lprop - 1; + +#endif + /* break now */ return 1; } @@ -1362,6 +1374,10 @@ void __init early_init_devtree(void *par lmb_reserve(__pa(KERNELBASE), __pa(klimit) - __pa(KERNELBASE)); #ifdef CONFIG_CRASH_DUMP lmb_reserve(0, KDUMP_BACKUP_LIMIT); + if (crashk_res.end > 0) { + lmb.rmo_size = _ALIGN_UP(crashk_res.end - crashk_res.start + 1, + PAGE_SIZE); + } #endif early_reserve_mem(); From michael at ellerman.id.au Tue Nov 8 00:07:13 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:07:13 +1100 (EST) Subject: [PATCH 11/12] powerpc: Add arch-dependant copy_oldmem_page In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130713.DC11A686F6@ozlabs.org> Add arch-dependant copy_oldmem_page. Patch-by: Haren Myneni arch/powerpc/kernel/crash.c | 36 ++++++++++++++++++++++++++++++++++++ include/asm-powerpc/kexec.h | 2 ++ kernel/crash_dump.c | 3 +++ 3 files changed, 41 insertions(+) Index: kexec/arch/powerpc/kernel/crash.c =================================================================== --- kexec.orig/arch/powerpc/kernel/crash.c +++ kexec/arch/powerpc/kernel/crash.c @@ -28,6 +28,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) 
udbg_printf(fmt) @@ -285,6 +286,41 @@ void __init kdump_setup(void) DBG(" <- kdump_setup()\n"); } +/* + * copy_oldmem_page - copy one page from "oldmem" + * @pfn: page frame number to be copied + * @buf: target memory address for the copy; this can be in kernel address + * space or user address space (see @userbuf) + * @csize: number of bytes to copy + * @offset: offset in bytes into the page (based on pfn) to begin the copy + * @userbuf: if set, @buf is in user address space, use copy_to_user(), + * otherwise @buf is in kernel address space, use memcpy(). + * + * Copy a page from "oldmem". For this page, there is no pte mapped + * in the current kernel. We stitch up a pte, similar to kmap_atomic. + */ +ssize_t copy_oldmem_page(unsigned long pfn, char *buf, + size_t csize, unsigned long offset, int userbuf) +{ + void *vaddr; + + if (!csize) + return 0; + + vaddr = (void *)__ioremap(pfn << PAGE_SHIFT, PAGE_SIZE, 0); + + if (userbuf) { + if (copy_to_user(buf, (vaddr + offset), csize)) { + iounmap(vaddr); + return -EFAULT; + } + } else + memcpy(buf, (vaddr + offset), csize); + + iounmap(vaddr); + return csize; +} + static int __init parse_elfcorehdr(char *p) { if (p) Index: kexec/include/asm-powerpc/kexec.h =================================================================== --- kexec.orig/include/asm-powerpc/kexec.h +++ kexec/include/asm-powerpc/kexec.h @@ -30,6 +30,8 @@ #define KEXEC_ARCH KEXEC_ARCH_PPC #endif +#define HAVE_ARCH_COPY_OLDMEM_PAGE + #ifndef __ASSEMBLY__ #define MAX_NOTE_BYTES 2048 Index: kexec/kernel/crash_dump.c =================================================================== --- kexec.orig/kernel/crash_dump.c +++ kexec/kernel/crash_dump.c @@ -14,10 +14,12 @@ #include #include +#include /* Stores the physical address of elf header of crash image. 
*/ unsigned long long elfcorehdr_addr = ELFCORE_ADDR_MAX; +#ifndef HAVE_ARCH_COPY_OLDMEM_PAGE /** * copy_oldmem_page - copy one page from "oldmem" * @pfn: page frame number to be copied @@ -59,3 +61,4 @@ ssize_t copy_oldmem_page(unsigned long p kfree(page); return csize; } +#endif From segher at kernel.crashing.org Tue Nov 8 00:08:38 2005 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Mon, 7 Nov 2005 14:08:38 +0100 Subject: [PATCH] powerpc: Nicer printing of address at oops In-Reply-To: <20051106225435.GB7166@pb15.lixom.net> References: <20051106220439.GA7166@pb15.lixom.net> <17262.34658.917350.594965@cargo.ozlabs.ibm.com> <20051106225435.GB7166@pb15.lixom.net> Message-ID: >>> + printk("data at address 0x%016lx\n", regs->dar); >> >> Nice idea, but 16 digits is a bit excessive for 32-bit... > > Ack, it's shared now, I forgot. Thanks. > > Here's a new patch. I don't like the thought of ifdeffing for it so > we'll > just have to live with 8 digit padding for small 64-bit pointers. :-) You can use "%0*lx". Or just use "%p". Segher From michael at ellerman.id.au Tue Nov 8 00:07:16 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 00:07:16 +1100 (EST) Subject: [PATCH 12/12] powerpc: Add support for "linux, usable-memory" on memory nodes In-Reply-To: <1131368803.622486.677825384736.qpush@concordia> Message-ID: <20051107130716.01308686F7@ozlabs.org> Milton has proposed that we should support a "linux,usable-memory" property on memory nodes which describes, in preference to "reg", the regions of memory Linux should use. This facility is required for kdump, to inform the second kernel which memory it should use. A future patch may also recast mem=x support in terms of these properties. 
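The lookup order described above can be sketched in plain C as follows. The prop_sketch structure and both helpers are hypothetical stand-ins for the kernel's property accessors, used only to show the "prefer linux,usable-memory, fall back to reg" rule.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified property record. */
struct prop_sketch {
	const char *name;
	const void *value;
	int length;
};

/* Linear search for a named property; NULL if it is absent. */
static const void *find_prop(const struct prop_sketch *props, size_t n,
			     const char *name, int *len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(props[i].name, name) == 0) {
			*len = props[i].length;
			return props[i].value;
		}
	}
	return NULL;
}

/* Prefer "linux,usable-memory"; fall back to "reg". */
static const void *memory_ranges(const struct prop_sketch *props, size_t n,
				 int *len)
{
	const void *v = find_prop(props, n, "linux,usable-memory", len);

	if (!v)
		v = find_prop(props, n, "reg", len);
	return v;
}
```

A node carrying only "reg" behaves exactly as before, which is what lets the first kernel and the kdump kernel share the same parsing code.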
Patch-by: Haren Myneni arch/powerpc/kernel/prom.c | 15 ++++++++++----- arch/powerpc/mm/numa.c | 38 +++++++++++++++++++++++++++++--------- 2 files changed, 39 insertions(+), 14 deletions(-) Index: kexec/arch/powerpc/kernel/prom.c =================================================================== --- kexec.orig/arch/powerpc/kernel/prom.c +++ kexec/arch/powerpc/kernel/prom.c @@ -574,7 +574,10 @@ static int __init interpret_root_props(s unsigned int *rp; int rpsize = (naddrc + nsizec) * sizeof(unsigned int); - rp = (unsigned int *) get_property(np, "reg", &l); + rp = (unsigned int *) get_property(np, "linux,usable-memory", &l); + if (rp == NULL) + rp = (unsigned int *) get_property(np, "reg", &l); + if (rp != 0 && l >= rpsize) { i = 0; adr = (struct address_range *) (*mem_start); @@ -1292,10 +1295,12 @@ static int __init early_init_dt_scan_mem if (type == NULL || strcmp(type, "memory") != 0) return 0; - reg = (cell_t *)get_flat_dt_prop(node, "reg", &l); - if (reg == NULL) - return 0; - + reg = (cell_t *)get_flat_dt_prop(node, "linux,usable-memory", &l); + if (reg == NULL) { + reg = (cell_t *)get_flat_dt_prop(node, "reg", &l); + if (reg == NULL) + return 0; + } endp = reg + (l / sizeof(cell_t)); DBG("memory scan node %s, reg size %ld, data: %x %x %x %x,\n", Index: kexec/arch/powerpc/mm/numa.c =================================================================== --- kexec.orig/arch/powerpc/mm/numa.c +++ kexec/arch/powerpc/mm/numa.c @@ -22,6 +22,8 @@ #include #include +#include + static int numa_enabled = 1; static int numa_debug; @@ -376,9 +378,15 @@ static int __init parse_numa_properties( unsigned int *memcell_buf; unsigned int len; - memcell_buf = (unsigned int *)get_property(memory, "reg", &len); - if (!memcell_buf || len <= 0) - continue; + memcell_buf = (unsigned int *)get_property(memory, + "linux,usable-memory", &len); + if (!memcell_buf || len <= 0) { + memcell_buf = + (unsigned int *)get_property(memory, "reg", + &len); + if (!memcell_buf || len <= 0) + 
continue; + } ranges = memory->n_addrs; new_range: @@ -636,9 +644,15 @@ void __init do_init_bootmem(void) unsigned int *memcell_buf; unsigned int len; - memcell_buf = (unsigned int *)get_property(memory, "reg", &len); - if (!memcell_buf || len <= 0) - continue; + memcell_buf = (unsigned int *)get_property(memory, + "linux,usable-memory", &len); + if (!memcell_buf || len <= 0) { + memcell_buf = + (unsigned int *)get_property(memory, + "reg", &len); + if (!memcell_buf || len <= 0) + continue; + } ranges = memory->n_addrs; /* ranges in cell */ new_range: @@ -706,9 +720,15 @@ new_range: unsigned int *memcell_buf; unsigned int len; - memcell_buf = (unsigned int *)get_property(memory, "reg", &len); - if (!memcell_buf || len <= 0) - continue; + memcell_buf = (unsigned int *)get_property(memory, + "linux,usable-memory", &len); + if (!memcell_buf || len <= 0) { + memcell_buf = + (unsigned int *)get_property(memory, + "reg", &len); + if (!memcell_buf || len <= 0) + continue; + } ranges = memory->n_addrs; /* ranges in cell */ new_range2: From haveblue at us.ibm.com Tue Nov 8 00:15:33 2005 From: haveblue at us.ibm.com (Dave Hansen) Date: Mon, 07 Nov 2005 14:15:33 +0100 Subject: [PATCH 9/12] powerpc: Export htab start/end via device tree In-Reply-To: <20051107130707.018F7686E6@ozlabs.org> References: <20051107130707.018F7686E6@ozlabs.org> Message-ID: <1131369333.5976.62.camel@localhost> On Tue, 2005-11-08 at 00:07 +1100, Michael Ellerman wrote: > > +#ifdef CONFIG_KEXEC > + kexec_setup(); /* requires unflattened device tree. */ > +#endif Would this #ifdef be more appropriate in the header where this function's prototype currently resides? 
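One way to read the suggestion above, sketched in user-space C: the header supplies an empty inline stub when the option is off, so the call site needs no #ifdef. CONFIG_KEXEC_SKETCH and the _sketch functions are invented for the example and are not the names in the patch.

```c
/* Uncomment to mimic the option being enabled. */
/* #define CONFIG_KEXEC_SKETCH 1 */

static int setup_calls;	/* counts real setup invocations */

/* This part would live in the header. */
#ifdef CONFIG_KEXEC_SKETCH
static void kexec_setup_sketch(void)
{
	setup_calls++;
}
#else
/* Option off: empty stub that the compiler can drop entirely. */
static inline void kexec_setup_sketch(void)
{
}
#endif

/* Call site stays free of preprocessor conditionals. */
static void setup_system_sketch(void)
{
	kexec_setup_sketch();
}
```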
-- Dave

From haveblue at us.ibm.com Tue Nov 8 00:17:26 2005
From: haveblue at us.ibm.com (Dave Hansen)
Date: Mon, 07 Nov 2005 14:17:26 +0100
Subject: [PATCH 11/12] powerpc: Add arch-dependant copy_oldmem_page
In-Reply-To: <20051107130713.DC11A686F6@ozlabs.org>
References: <20051107130713.DC11A686F6@ozlabs.org>
Message-ID: <1131369446.5976.64.camel@localhost>

On Tue, 2005-11-08 at 00:07 +1100, Michael Ellerman wrote:
>
> --- kexec.orig/include/asm-powerpc/kexec.h
> +++ kexec/include/asm-powerpc/kexec.h
> @@ -30,6 +30,8 @@
>  #define KEXEC_ARCH KEXEC_ARCH_PPC
>  #endif
>
> +#define HAVE_ARCH_COPY_OLDMEM_PAGE
> +
>  #ifndef __ASSEMBLY__

Isn't something like that more properly done in Kconfig? I find it very
hard to track down exactly which CONFIG options it takes to get some of
those #defines to happen. Kconfig makes it much clearer.

-- Dave

From vatsa at in.ibm.com Tue Nov 8 00:22:02 2005
From: vatsa at in.ibm.com (Srivatsa Vaddagiri)
Date: Mon, 7 Nov 2005 18:52:02 +0530
Subject: 2.6.14-mm1 doesnt bootup on PPC64
Message-ID: <20051107132201.GA13514@in.ibm.com>

Hello,
	I am having problems booting 2.6.14-mm1 on a PPC64 box (p630 -
4-way Power4 box). I get this message:

"<3>Badness in smp_call_function at arch/powerpc/kernel/smp.c:202"

The system locks up after that (it doesn't respond to Softreset either).

The message corresponds to the WARN_ON(irqs_disabled()) check in
smp_call_function. Surprisingly, I don't get any backtrace after this
message. I configured the kernel so that xmon is enabled by default,
hoping to break in via Softreset, but Softreset failed to break in too.
I even tried extracting the caller of smp_call_function via
__builtin_return_address(0), but that didn't give me anything either.

Is this a known issue, and does a patch exist to fix it?
--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

From amodra at bigpond.net.au Tue Nov 8 00:46:33 2005
From: amodra at bigpond.net.au (Alan Modra)
Date: Tue, 8 Nov 2005 00:16:33 +1030
Subject: 2.6.14-mm1 doesnt bootup on PPC64
In-Reply-To: <20051107132201.GA13514@in.ibm.com>
References: <20051107132201.GA13514@in.ibm.com>
Message-ID: <20051107134633.GS26395@bubble.grove.modra.org>

On Mon, Nov 07, 2005 at 06:52:02PM +0530, Srivatsa Vaddagiri wrote:
> "<3>Badness in smp_call_function at arch/powerpc/kernel/smp.c:202"
>
> System locks up after that (doesnt respond to Softreset also).

Compiler version?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24644 might be relevant.

--
Alan Modra
IBM OzLabs - Linux Technology Centre

From arnd at arndb.de Tue Nov 8 02:19:04 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Mon, 7 Nov 2005 16:19:04 +0100
Subject: libspe for 2.6.14-rc5 spufs snapshot
In-Reply-To: <200510272312.57007.arnd@arndb.de>
References: <20051028023943.505038000@localhost> <200510272312.57007.arnd@arndb.de>
Message-ID: <200511071619.05376.arnd@arndb.de>

On Dunnersdag 27 Oktober 2005 23:12, Arnd Bergmann wrote:
> This is the current snapshot of Dirk Herrendoerfer's libspe, using
> the spufs interfaces from the patch set. After a series of incompatible
> versions, we will now maintain compatibility with the user interfaces
> in this version, and only serious bug fixes are included before the 1.0
> release.

Ok, it turned out that there were still some fixes needed; the list
follows:

- Bump version to 1.0
- add stubs for {get,set}context and {get,set}affinity calls, as
  they were documented, so they should at least be there.
- use correct name for struct spe_context.
- do not use obsolete definitions for DMA, but instead use
  intrinsics directly to make DMA test case build.
- make event test case build without warnings.
- make simple test case use the new library for building, not
  the installed one.
The patch to the previous version follows, the actual tarball is attached as well. Arnd <>< --- diff -urN libspe-1.0pre1/Makefile libspe-1.0/Makefile --- libspe-1.0pre1/Makefile 2005-10-28 05:04:50.000000000 +0200 +++ libspe-1.0/Makefile 2005-11-04 16:48:28.000000000 +0100 @@ -43,7 +43,7 @@ INSTALL_DIR := ${INSTALL} -d -m 755 MAJOR_VERSION := 1 -MINOR_VERSION := 0pre1 +MINOR_VERSION := 0 libspe_SO := libspe.so.${MAJOR_VERSION}.${MINOR_VERSION} libspe_SONAME := libspe.so.${MAJOR_VERSION} @@ -120,7 +120,7 @@ FULLNAME = $(PACKAGE)-$(MAJOR_VERSION).$(MINOR_VERSION) TARBALL = $(SOURCES)$(FULLNAME).tar.gz SOURCEFILES = $(TARBALL) -RPMBUILD = ppc rpmbuild +RPMBUILD = ppc rpmbuild --target=ppc # ../make.rules contains the local rpm build infrastructure in CVS -include ../make.rules diff -urN libspe-1.0pre1/libspe.h libspe-1.0/libspe.h --- libspe-1.0pre1/libspe.h 2005-10-17 19:19:08.000000000 +0200 +++ libspe-1.0/libspe.h 2005-11-04 16:47:55.000000000 +0100 @@ -37,7 +37,7 @@ /* spe user context */ -struct spu_ucontext { +struct spe_ucontext { unsigned int gprs[128][4]; unsigned int fpcr[4]; unsigned int decr; @@ -108,6 +108,13 @@ extern int spe_group_defaults(int policy, int priority, int spe_events); extern int spe_get_threads(spe_gid_t gid, speid_t *spe_ids); +/* Currently without implementation or support + */ +extern int spe_get_affinity( speid_t speid, unsigned long *mask); +extern int spe_set_affinity(speid_t speid, unsigned long mask); +extern int spe_get_context(speid_t speid, struct spe_ucontext *uc); +extern int spe_set_context(speid_t speid, struct spe_ucontext *uc); + /* APIs for loading SPE images */ extern spe_program_handle_t *spe_open_image(const char *filename); diff -urN libspe-1.0pre1/libspe.spec libspe-1.0/libspe.spec --- libspe-1.0pre1/libspe.spec 2005-10-28 05:05:03.000000000 +0200 +++ libspe-1.0/libspe.spec 2005-11-04 16:48:28.000000000 +0100 @@ -1,5 +1,5 @@ Name: libspe -Version: 1.0pre1 +Version: 1.0 Release: 1 License: LGPL Group: System 
Environment/Base diff -urN libspe-1.0pre1/spe.c libspe-1.0/spe.c --- libspe-1.0pre1/spe.c 2005-10-27 20:44:12.000000000 +0200 +++ libspe-1.0/spe.c 2005-11-07 13:56:31.000000000 +0100 @@ -1467,6 +1467,42 @@ return speid; } +int +spe_get_affinity( speid_t speid, unsigned long *mask) +{ + printf("spe_get_affinity: not implemented in this release.\n"); + + errno=ENOSYS; + return -1; +} + +int +spe_set_affinity(speid_t speid, unsigned long mask) +{ + printf("spe_set_affinity: not implemented in this release.\n"); + + errno=ENOSYS; + return -1; +} + +int +spe_get_context(speid_t speid, struct spe_ucontext *uc) +{ + printf("spe_get_context: not implemented in this release.\n"); + + errno=ENOSYS; + return -1; +} + +int +spe_set_context(speid_t speid, struct spe_ucontext *uc) +{ + printf("spe_set_context: not implemented in this release.\n"); + + errno=ENOSYS; + return -1; +} + /* * mfc.h direct call-ins * @@ -1546,7 +1582,13 @@ struct thread_store *thread_store = spu_ps_addr; int rc; - rc = write(thread_store->fd_sig1, &data, 4); + if (signal_reg == SPE_SIG_NOTIFY_REG_1) + rc = write(thread_store->fd_sig1, &data, 4); + else if (signal_reg == SPE_SIG_NOTIFY_REG_2) + rc = write(thread_store->fd_sig2, &data, 4); + else + return -1; + if (rc == 4) rc = 0; diff -urN libspe-1.0pre1/tests/dma/Makefile libspe-1.0/tests/dma/Makefile --- libspe-1.0pre1/tests/dma/Makefile 2005-10-17 19:19:09.000000000 +0200 +++ libspe-1.0/tests/dma/Makefile 2005-11-07 15:59:54.000000000 +0100 @@ -22,7 +22,7 @@ CTAGS = ctags CFLAGS := -O2 -m32 -Wall -I../.. -I../../include -g -SPECFLAGS := -O2 -Wall -I../../include +SPECFLAGS := -O2 -Wall -I../../include -Wno-main LDFLAGS := -m32 LIBS := -L../.. 
-lspe -lpthread diff -urN libspe-1.0pre1/tests/dma/spe-dma-read.c libspe-1.0/tests/dma/spe-dma-read.c --- libspe-1.0pre1/tests/dma/spe-dma-read.c 2005-10-17 19:19:09.000000000 +0200 +++ libspe-1.0/tests/dma/spe-dma-read.c 2005-11-07 13:55:32.000000000 +0100 @@ -1,4 +1,4 @@ -#include +#include typedef union { unsigned long long ull; @@ -13,9 +13,11 @@ /* Write to specified address */ result = 23; - _set_mfc_tagmask(1); - _read_mfc((void*)&result, argAddress, 4, 0, 0, 0); - _wait_mfc_tags_all(); + + __builtin_si_wrch((22), __builtin_si_from_uint(1)); + spu_mfcdma32((void *) &result, argAddress, 4, 0, + (((0) << 24) | ((0) << 16) | (0x40))); + spu_mfcstat(0x2); /* Done */ return result; diff -urN libspe-1.0pre1/tests/dma/spe-dma-write.c libspe-1.0/tests/dma/spe-dma-write.c --- libspe-1.0pre1/tests/dma/spe-dma-write.c 2005-10-17 19:19:09.000000000 +0200 +++ libspe-1.0/tests/dma/spe-dma-write.c 2005-11-07 13:54:52.000000000 +0100 @@ -1,4 +1,3 @@ -#include #include "spu_intrinsics.h" typedef union { @@ -15,9 +14,10 @@ /* Write to specified address */ result = 42; - _write_mfc(argAddress, &result, 4, 0, 0, 0); - _set_mfc_tagmask(1); - _wait_mfc_tags_all(); + spu_mfcdma32 (&result, argAddress, 4, 0, + (((0) << 24) | ((0) << 16) | (0x20))); + __builtin_si_wrch((22),__builtin_si_from_uint(1)); + spu_mfcstat(0x2); /* Done */ return 0; diff -urN libspe-1.0pre1/tests/event/ppe-start-stop.c libspe-1.0/tests/event/ppe-start-stop.c --- libspe-1.0pre1/tests/event/ppe-start-stop.c 2005-10-25 21:29:17.000000000 +0200 +++ libspe-1.0/tests/event/ppe-start-stop.c 2005-11-07 15:59:54.000000000 +0100 @@ -81,7 +81,7 @@ else { printf("get_event: revents=0x%04x\n",myevent.revents); - printf("get_event: data=0x%04x\n",myevent.data); + printf("get_event: data=0x%04lx\n",myevent.data); printf("get_event: speid=%p\n",myevent.speid); spe_kill(myevent.speid,SIGCONT); @@ -105,7 +105,7 @@ myevent[1].gid = spe_group; myevent[1].events = SPE_EVENT_MAILBOX; - ret = spe_get_event( &myevent,2, 100); + 
ret = spe_get_event(&myevent[0], 2, 100); if (!ret) { @@ -116,11 +116,11 @@ printf("get_event: Got %i events.\n",ret); printf("get_event[0]: revents=0x%04x\n",myevent[0].revents); - printf("get_event[0]: data=0x%04x\n",myevent[0].data); + printf("get_event[0]: data=0x%04lx\n",myevent[0].data); printf("get_event[0]: speid=%p\n",myevent[0].speid); printf("get_event[1]: revents=0x%04x\n",myevent[1].revents); - printf("get_event[1]: data=0x%04x\n",myevent[1].data); + printf("get_event[1]: data=0x%04lx\n",myevent[1].data); printf("get_event[1]: speid=%p\n",myevent[1].speid); if (myevent[0].revents == SPE_EVENT_STOP) diff -urN libspe-1.0pre1/tests/start-stop/Makefile libspe-1.0/tests/start-stop/Makefile --- libspe-1.0pre1/tests/start-stop/Makefile 2005-10-17 19:19:09.000000000 +0200 +++ libspe-1.0/tests/start-stop/Makefile 2005-11-04 16:47:55.000000000 +0100 @@ -21,11 +21,11 @@ SPECC := spu-gcc CTAGS = ctags -CFLAGS := -O2 -m32 -Wall -g +CFLAGS := -O2 -m32 -Wall -g -I../.. SPECFLAGS := -O2 -Wall -LDFLAGS := -m32 -LIBS := -lspe +LDFLAGS := -m32 +LIBS := -L../.. -lspe -lpthread SPE_OBJS := spe-start-stop OBJS := ppe-start-stop -------------- next part -------------- A non-text attachment was scrubbed... Name: libspe-1.0.tar.gz Type: application/x-tgz Size: 34369 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051107/522a0ef6/attachment.bin From olof at lixom.net Tue Nov 8 03:40:25 2005 From: olof at lixom.net (Olof Johansson) Date: Mon, 7 Nov 2005 08:40:25 -0800 Subject: [PATCH] powerpc: Nicer printing of address at oops In-Reply-To: References: <20051106220439.GA7166@pb15.lixom.net> <17262.34658.917350.594965@cargo.ozlabs.ibm.com> <20051106225435.GB7166@pb15.lixom.net> Message-ID: <20051107164025.GE7166@pb15.lixom.net> On Mon, Nov 07, 2005 at 02:08:38PM +0100, Segher Boessenkool wrote: > You can use "%0*lx". Or just use "%p". I hope everyone's happy now. Andrew, please apply. 
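For reference, a small user-space sketch of the "%0*lx" form mentioned above, using snprintf rather than printk. The helper name format_addr is made up. Note that in standard C the argument consumed by '*' must be an int, so the sketch casts the sizeof expression; that cast is an addition here, not something stated in the thread.

```c
#include <stdio.h>

/*
 * Format an address zero-padded to the natural pointer width:
 * 8 hex digits where long is 32 bits, 16 where it is 64 bits.
 */
static int format_addr(char *buf, size_t size, unsigned long addr)
{
	/* The '*' width argument must be an int, hence the cast. */
	return snprintf(buf, size, "0x%0*lx",
			(int)(sizeof(long) * 2), addr);
}
```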
-Olof

---

Add nicer printing of faulting address on unresolvable kernel faults.
Makes life a little easier for those who don't know how to decode our
register contents at oops time.

Signed-off-by: Olof Johansson

Index: 2.6/arch/powerpc/mm/fault.c
===================================================================
--- 2.6.orig/arch/powerpc/mm/fault.c	2005-11-06 19:24:16.000000000 -0800
+++ 2.6/arch/powerpc/mm/fault.c	2005-11-07 08:33:02.000000000 -0800
@@ -389,5 +389,23 @@ void bad_page_fault(struct pt_regs *regs
 	}
 
 	/* kernel has accessed a bad area */
+
+	printk(KERN_ALERT "Unable to handle kernel paging request for ");
+	switch (regs->trap) {
+	case 0x300:
+	case 0x380:
+		printk("data at address 0x%0*lx\n", sizeof(long)*2,
+			regs->dar);
+		break;
+	case 0x400:
+	case 0x480:
+		printk("instruction fetch\n");
+		break;
+	default:
+		printk("unknown fault\n");
+	}
+	printk(KERN_ALERT "Faulting instruction address: 0x%0*lx\n",
+		sizeof(long)*2, regs->nip);
+
 	die("Kernel access of bad area", regs, sig);
 }

From arnd at arndb.de Tue Nov 8 04:11:43 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Mon, 7 Nov 2005 18:11:43 +0100
Subject: [patch 01/13] spufs snapshot against 2.6.14-rc5
In-Reply-To: <20051028025531.577537000@localhost>
References: <20051028023943.505038000@localhost> <20051028025531.577537000@localhost>
Message-ID: <200511071811.44288.arnd@arndb.de>

On Freedag 28 Oktober 2005 04:39, arnd at arndb.de wrote:
> The SPU file system, base
>
First of two updates against this.

---
spufs: Fix oops when spufs module is not loaded

try_module_get returns true when passed a NULL argument, so we first
need to check whether a module is loaded before getting the reference
count.
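The guard described above can be sketched in user-space C as follows. The _sketch types and helpers only mimic the behaviour the message describes (a NULL module counts as a successful get); they are stand-ins for illustration, not the kernel API.

```c
#include <errno.h>
#include <stddef.h>

struct module_sketch {
	int refcount;
};

/* Mimics the behaviour described above: NULL counts as success. */
static int try_get_sketch(struct module_sketch *m)
{
	if (m)
		m->refcount++;
	return 1;
}

static void put_sketch(struct module_sketch *m)
{
	if (m)
		m->refcount--;
}

/* Hypothetical call table filled in when the module loads. */
struct calls_sketch {
	long (*create)(void);
	struct module_sketch *owner;
};

/* Example callback used when a "module" is present. */
static long sample_create(void)
{
	return 7;
}

static long guarded_call(struct calls_sketch *calls)
{
	long ret = -ENOSYS;
	/* Read the owner once and use that copy throughout. */
	struct module_sketch *owner = calls->owner;

	/* Without the owner check, a NULL create() would be called. */
	if (owner && try_get_sketch(owner)) {
		ret = calls->create();
		put_sketch(owner);
	}
	return ret;
}
```

With no owner registered, guarded_call returns -ENOSYS without touching the function pointer, which is the failure mode the fix is after.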

Signed-off-by: Arnd Bergmann

Index: linux-cg/arch/ppc64/kernel/spu_syscalls.c
===================================================================
--- linux-cg.orig/arch/ppc64/kernel/spu_syscalls.c
+++ linux-cg/arch/ppc64/kernel/spu_syscalls.c
@@ -37,11 +37,12 @@ asmlinkage long sys_spu_create_thread(co
 	unsigned int flags, mode_t mode)
 {
 	long ret;
+	struct module *owner = spufs_calls.owner;
 
 	ret = -ENOSYS;
-	if (try_module_get(spufs_calls.owner)) {
+	if (owner && try_module_get(spufs_calls.owner)) {
 		ret = spufs_calls.create_thread(name, flags, mode);
-		module_put(spufs_calls.owner);
+		module_put(owner);
 	}
 	return ret;
 }
@@ -51,16 +52,17 @@ asmlinkage long sys_spu_run(int fd, __u3
 	long ret;
 	struct file *filp;
 	int fput_needed;
+	struct module *owner = spufs_calls.owner;
 
 	ret = -ENOSYS;
-	if (try_module_get(spufs_calls.owner)) {
+	if (owner && try_module_get(owner)) {
 		ret = -EBADF;
 		filp = fget_light(fd, &fput_needed);
 		if (filp) {
 			ret = spufs_calls.spu_run(filp, unpc, ustatus);
 			fput_light(filp, fput_needed);
 		}
-		module_put(spufs_calls.owner);
+		module_put(owner);
 	}
 	return ret;
 }

From arnd at arndb.de Tue Nov 8 04:13:59 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Mon, 7 Nov 2005 18:13:59 +0100
Subject: [patch 01/13] spufs snapshot against 2.6.14-rc5
In-Reply-To: <200511071811.44288.arnd@arndb.de>
References: <20051028023943.505038000@localhost> <20051028025531.577537000@localhost> <200511071811.44288.arnd@arndb.de>
Message-ID: <200511071814.00435.arnd@arndb.de>

On Maandag 07 November 2005 18:11, Arnd Bergmann wrote:
> On Freedag 28 Oktober 2005 04:39, arnd at arndb.de wrote:
> > The SPU file system, base
> >
> First of two updates against this.
>
Second and (hopefully) last update against 2.6.14:

---
Turn off spufs debugging

spufs is rather noisy when debugging is enabled; this turns off the
messages for production use.
Signed-off-by: Arnd Bergmann Index: linux-cg/arch/ppc64/kernel/spu_base.c =================================================================== --- linux-cg.orig/arch/ppc64/kernel/spu_base.c +++ linux-cg/arch/ppc64/kernel/spu_base.c @@ -20,7 +20,7 @@ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ -#define DEBUG 1 +#undef DEBUG #include #include Index: linux-cg/fs/spufs/sched.c =================================================================== --- linux-cg.orig/fs/spufs/sched.c +++ linux-cg/fs/spufs/sched.c @@ -24,7 +24,8 @@ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ -#define DEBUG 1 +#undef DEBUG + #include #include #include From kravetz at us.ibm.com Tue Nov 8 04:39:48 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Mon, 7 Nov 2005 09:39:48 -0800 Subject: [PATCH 4/4] Memory Add Fixes for ppc64 In-Reply-To: <17262.42750.810366.294231@cargo.ozlabs.ibm.com> References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104232109.GE25545@w-mikek2.ibm.com> <17262.42750.810366.294231@cargo.ozlabs.ibm.com> Message-ID: <20051107173948.GB5821@w-mikek2.ibm.com> On Mon, Nov 07, 2005 at 11:59:42AM +1100, Paul Mackerras wrote: > Mike Kravetz writes: > > ppc64 needs a special sysfs probe file for adding new memory. > > Does arch/powerpc/Kconfig need a similar fix then? Yes it does. Sorry, I haven't been paying as much attention to the merge as I should. Here is a new version. ppc64 needs a special sysfs probe file for adding new memory. Signed-off-by: Mike Kravetz diff -Naupr linux-2.6.14-git7/arch/powerpc/Kconfig linux-2.6.14-git7.work/arch/powerpc/Kconfig --- linux-2.6.14-git7/arch/powerpc/Kconfig 2005-11-04 21:21:05.000000000 +0000 +++ linux-2.6.14-git7.work/arch/powerpc/Kconfig 2005-11-07 17:32:45.000000000 +0000 @@ -569,6 +569,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID def_bool y depends on NEED_MULTIPLE_NODES +config ARCH_MEMORY_PROBE + def_bool y + depends on MEMORY_HOTPLUG + # Some NUMA nodes have memory ranges that span # other nodes. 
Even though a pfn is valid and # between a node's start and end pfns, it may not diff -Naupr linux-2.6.14-git7/arch/ppc64/Kconfig linux-2.6.14-git7.work/arch/ppc64/Kconfig --- linux-2.6.14-git7/arch/ppc64/Kconfig 2005-11-04 21:21:06.000000000 +0000 +++ linux-2.6.14-git7.work/arch/ppc64/Kconfig 2005-11-07 17:31:51.000000000 +0000 @@ -277,6 +277,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID def_bool y depends on NEED_MULTIPLE_NODES +config ARCH_MEMORY_PROBE + def_bool y + depends on MEMORY_HOTPLUG + # Some NUMA nodes have memory ranges that span # other nodes. Even though a pfn is valid and # between a node's start and end pfns, it may not From linas at austin.ibm.com Tue Nov 8 04:55:41 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 11:55:41 -0600 Subject: [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: <17262.37107.857718.184055@cargo.ozlabs.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> Message-ID: <20051107175541.GB19593@austin.ibm.com> On Mon, Nov 07, 2005 at 10:25:39AM +1100, Paul Mackerras was heard to remark: > Greg KH writes: > > > > +enum pcierr_result { > > > + PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */ > > > + PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */ > > > + PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */ > > > + PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ > > > + PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ > > > +}; > > > > No, do not create new types of error or return codes. Use the standard > > -EFOO values. You can document what they should each return, and mean, > > but do not create new codes. > > Actually, these are not error or return codes, but rather requested > actions Yes. > (maybe somewhat misnamed). 
As to naming, my mind went blank on coming up with a good name, and
the result was a poor name. I now note that "EDAC" ("Error Detection
and Correction") is now taken. How about "PECS" ("PCI Error Correction
System")? I guess "PCI Error Detection And Recovery System" (PEDERAST)
might have an inappropriate set of connotations.

> We can map them on to -EFOO values
> but it will be rather strained (-ECONNRESET for "please reset the
> slot", anyone? :).

Yes, that would only lead to confusion.

> > Also, you create an enum, but yet do not use it in your function
> > callback definition, which means you really didn't want to create it in
> > the first place...
>
> Yes, they could be #defines.

In one incarnation, they were #defines. The enum was supposed to be
the return value of the error notification callbacks.

I can prepare a new patch: would you prefer:

1) loose typing: #defines and int return value?

2) strong typing: enum and enum return value?

I often prefer strong typing.

And do you want a patch now, or later?

--linas

From greg at kroah.com Tue Nov 8 05:27:27 2005
From: greg at kroah.com (Greg KH)
Date: Mon, 7 Nov 2005 10:27:27 -0800
Subject: [PATCH 16/42]: PCI: PCI Error reporting callbacks
In-Reply-To: <20051107175541.GB19593@austin.ibm.com>
References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com>
Message-ID: <20051107182727.GD18861@kroah.com>

On Mon, Nov 07, 2005 at 11:55:41AM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 10:25:39AM +1100, Paul Mackerras was heard to remark:
> > Greg KH writes:
> >
> > > > +enum pcierr_result {
> > > > + PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */
> > > > + PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
> > > > + PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> > > > + PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> > > > + PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> > > > +};
> > >
> > > No, do not create new types of error or return codes. Use the standard
> > > -EFOO values. You can document what they should each return, and mean,
> > > but do not create new codes.
> >
> > Actually, these are not error or return codes, but rather requested
> > actions
>
> Yes.

Ok, then make them be stronger, and not return an int, as everyone will
get that wrong.

> In one incarnation, they were #defines. The enum was supposed to be
> the return value of the error notification callbacks.
>
> I can prepare a new patch: would you prefer:
>
> 1) loose typing: #defines and int return value?
>
> 2) strong typing: enum and enum return value?

3) really strong typing that sparse can detect.

enums don't really work, as you can get away with using an integer and
the compiler will never complain. Please use a typedef (yeah, I said
typedef) in the way that sparse will catch any bad users of the code.

> I often prefer strong typing.
>
> And do you want a patch now, or later?

Depends on when you want to see this make it into mainline :)

thanks,

greg k-h

From linas at austin.ibm.com Tue Nov 8 05:56:21 2005
From: linas at austin.ibm.com (linas)
Date: Mon, 7 Nov 2005 12:56:21 -0600
Subject: typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks]
In-Reply-To: <20051107182727.GD18861@kroah.com>
References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com>
Message-ID: <20051107185621.GD19593@austin.ibm.com>

On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
>
> 3) really strong typing that sparse can detect.

Am compiling now.
> enums don't really work, as you can get away with using an integer and
> the compiler will never complain. Please use a typedef (yeah, I said
> typedef) in the way that sparse will catch any bad users of the code.

How about typedef'ing structs?

I'm not too clear on what "sparse" can do; however, in the good old days,
gcc allowed you to commit great sins when passing "struct blah *" to
subroutines, whereas it stopped you cold if you tried the same trick
with a typedef'ed "blah_t *". This got me into the habit of turning
all structs into typedefs in my personal projects. Can we expect
something similar for the kernel, and in particular, should we start
typedefing structs now?

(Documentation/CodingStyle doesn't mention typedef at all).

--linas

From greg at kroah.com Tue Nov 8 06:02:45 2005
From: greg at kroah.com (Greg KH)
Date: Mon, 7 Nov 2005 11:02:45 -0800
Subject: typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks]
In-Reply-To: <20051107185621.GD19593@austin.ibm.com>
References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com>
Message-ID: <20051107190245.GA19707@kroah.com>

On Mon, Nov 07, 2005 at 12:56:21PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> >
> > 3) really strong typing that sparse can detect.
>
> Am compiling now.
>
> > enums don't really work, as you can get away with using an integer and
> > the compiler will never complain. Please use a typedef (yeah, I said
> > typedef) in the way that sparse will catch any bad users of the code.
>
> How about typedef'ing structs?

No. Use __bitwise. See the lkml archives for how to do this properly.
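For readers who have not met the idiom, the __bitwise approach mentioned
above can be sketched roughly as follows. This is a minimal illustration
under stated assumptions, not the patch that was posted: the demo_* names
are hypothetical, and the two macros are defined locally the way kernel
headers are understood to define them (active only when sparse sets
__CHECKER__, empty for an ordinary compiler).

```c
/* sparse defines __CHECKER__; an ordinary compiler sees a plain int. */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

/* Hypothetical names for illustration; not taken from any posted patch. */
typedef int __bitwise demo_result_t;

#define DEMO_RESULT_NONE        ((__force demo_result_t) 0)
#define DEMO_RESULT_CAN_RECOVER ((__force demo_result_t) 1)
#define DEMO_RESULT_NEED_RESET  ((__force demo_result_t) 2)

/* A callback that returns the restricted type instead of a bare int. */
static demo_result_t demo_error_detected(int channel_frozen)
{
	if (channel_frozen)
		return DEMO_RESULT_NEED_RESET;
	/*
	 * Writing "return 1;" here would still compile with gcc, but
	 * sparse is expected to complain about mixing a plain integer
	 * with the __bitwise type.
	 */
	return DEMO_RESULT_CAN_RECOVER;
}
```

With a plain enum or int return type, a driver could return any integer
unnoticed; with the restricted typedef, only values cast through __force
should pass a sparse run cleanly.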
> I'm not to clear on what "sparse" can do; however, in the good old days, > gcc allowed you to commit great sins when passing "struct blah *" to > subroutines, whereas it stoped you cold if you tried the same trick > with a typedef'ed "blah_t *". This got me into the habit of turning > all structs into typedefs in my personal projects. Can we expect > something similar for the kernel, and in particular, should we start > typedefing structs now? No, never typedef a struct. That's just wrong. gcc should warn you just the same if you pass the wrong struct pointer (and all of your code builds without warnings, right?) > (Documentation/CodingStyle doesn't mention typedef at all). If it does, it should say not to use it at all :) Except for this case, it's special... thanks, greg k-h From linas at austin.ibm.com Tue Nov 8 06:36:00 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 13:36:00 -0600 Subject: typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] In-Reply-To: <20051107190245.GA19707@kroah.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> Message-ID: <20051107193600.GE19593@austin.ibm.com> On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark: > > > I'm not to clear on what "sparse" can do; however, in the good old days, > > gcc allowed you to commit great sins when passing "struct blah *" to > > subroutines, whereas it stoped you cold if you tried the same trick > > with a typedef'ed "blah_t *". This got me into the habit of turning > > all structs into typedefs in my personal projects. Can we expect > > something similar for the kernel, and in particular, should we start > > typedefing structs now? 
>
> No, never typedef a struct. That's just wrong.

It's a de facto convention for most C-language apps, see, for
example Xlib, gtk and gnome.

Also, "grep typedef include/linux/*" shows that many kernel device
drivers use this convention.

> gcc should warn you
> just the same if you pass the wrong struct pointer

There were many cases where it did not warn (I don't remember
the case of subr calls). I believe this had to do with ANSI-C spec
issues dating to the 1990's; traditional C is weakly typed.

It's not just gcc; anyone who coded for a while eventually discovered
that typedefs were strongly typed, but "struct blah *" were not.

> (and all of your code
> builds without warnings, right?)

:-/ Yes, of course.

--linas

From linas at austin.ibm.com Tue Nov 8 06:57:27 2005
From: linas at austin.ibm.com (linas)
Date: Mon, 7 Nov 2005 13:57:27 -0600
Subject: [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks
In-Reply-To: <20051107182727.GD18861@kroah.com>
References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com>
Message-ID: <20051107195727.GF19593@austin.ibm.com>

On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> 3) really strong typing that sparse can detect.

PCI Error Recovery: header file patch

Change enums and subroutine signatures to be strongly typed, per recent
discussion with GregKH. Also, change the acronym to the more unique,
less generic "PERS" "PCI Error Recovery System".

Greg, Please apply.

Signed-off-by: Linas Vepstas

--
Index: linux-2.6.14-mm1/include/linux/pci.h
===================================================================
--- linux-2.6.14-mm1.orig/include/linux/pci.h 2005-11-07 13:55:28.528843983 -0600
+++ linux-2.6.14-mm1/include/linux/pci.h 2005-11-07 13:55:35.745830682 -0600
@@ -82,11 +82,11 @@
 * the pci device.
If some PCI bus between here and the pci device * has crashed or locked up, this info is reflected here. */ -enum pci_channel_state { +typedef enum { pci_channel_io_normal = 0, /* I/O channel is in normal state */ pci_channel_io_frozen = 1, /* I/O to channel is blocked */ pci_channel_io_perm_failure, /* PCI card is dead */ -}; +} pci_channel_state_t; /* * The pci_dev structure is used to describe PCI devices. @@ -121,7 +121,7 @@ this is D0-D3, D0 being fully functional, and D3 being off. */ - enum pci_channel_state error_state; /* current connectivity state */ + pci_channel_state_t error_state; /* current connectivity state */ struct device dev; /* Generic device interface */ /* device is compatible with these IDs */ @@ -245,35 +245,35 @@ }; /* ---------------------------------------------------------------- */ -/** PCI error recovery infrastructure. If a PCI device driver provides +/** PCI Error Recovery System (PERS). If a PCI device driver provides * a set fof callbacks in struct pci_error_handlers, then that device driver * will be notified of PCI bus errors, and will be driven to recovery * when an error occurs. */ -enum pcierr_result { - PCIERR_RESULT_NONE = 0, /* no result/none/not supported in device driver */ - PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */ - PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */ - PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ - PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ -}; +typedef enum { + PERS_RESULT_NONE = 0, /* no result/none/not supported in device driver */ + PERS_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */ + PERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. 
*/ + PERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ + PERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ +} pers_result_t; /* PCI bus error event callbacks */ struct pci_error_handlers { /* PCI bus error detected on this device */ - int (*error_detected)(struct pci_dev *dev, - enum pci_channel_state error); + pers_result_t (*error_detected)(struct pci_dev *dev, + pci_channel_state_t error); /* MMIO has been re-enabled, but not DMA */ - int (*mmio_enabled)(struct pci_dev *dev); + pers_result_t (*mmio_enabled)(struct pci_dev *dev); /* PCI Express link has been reset */ - int (*link_reset)(struct pci_dev *dev); + pers_result_t (*link_reset)(struct pci_dev *dev); /* PCI slot has been reset */ - int (*slot_reset)(struct pci_dev *dev); + pers_result_t (*slot_reset)(struct pci_dev *dev); /* Device driver may resume normal operations */ void (*resume)(struct pci_dev *dev); From hch at lst.de Tue Nov 8 06:59:43 2005 From: hch at lst.de (Christoph Hellwig) Date: Mon, 7 Nov 2005 20:59:43 +0100 Subject: [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: <20051107195727.GF19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> Message-ID: <20051107195943.GA32566@lst.de> On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas wrote: > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark: > > 3) realy strong typing that sparse can detect. > > > PCI Error Recovery: header file patch > > Change enums and subroutine signatures to be strongly typed, per recent > discussion with GregKH. Also, change the acronym to the more unique, > less generic "PERS" "PCI Error Recovery System". > > Greg, Please apply. 
> > Signed-off-by: Linas Vepstas > > -- > Index: linux-2.6.14-mm1/include/linux/pci.h > =================================================================== > --- linux-2.6.14-mm1.orig/include/linux/pci.h 2005-11-07 13:55:28.528843983 -0600 > +++ linux-2.6.14-mm1/include/linux/pci.h 2005-11-07 13:55:35.745830682 -0600 > @@ -82,11 +82,11 @@ > * the pci device. If some PCI bus between here and the pci device > * has crashed or locked up, this info is reflected here. > */ > -enum pci_channel_state { > +typedef enum { > pci_channel_io_normal = 0, /* I/O channel is in normal state */ > pci_channel_io_frozen = 1, /* I/O to channel is blocked */ > pci_channel_io_perm_failure, /* PCI card is dead */ > -}; > +} pci_channel_state_t; this is not strongly typed, just a completely useless typedef. From greg at kroah.com Tue Nov 8 07:02:57 2005 From: greg at kroah.com (Greg KH) Date: Mon, 7 Nov 2005 12:02:57 -0800 Subject: typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] In-Reply-To: <20051107193600.GE19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> Message-ID: <20051107200257.GA22524@kroah.com> On Mon, Nov 07, 2005 at 01:36:00PM -0600, linas wrote: > On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark: > > > > > I'm not to clear on what "sparse" can do; however, in the good old days, > > > gcc allowed you to commit great sins when passing "struct blah *" to > > > subroutines, whereas it stoped you cold if you tried the same trick > > > with a typedef'ed "blah_t *". This got me into the habit of turning > > > all structs into typedefs in my personal projects. 
Can we expect > > > something similar for the kernel, and in particular, should we start > > > typedefing structs now? > > > > No, never typedef a struct. That's just wrong. > > Its a defacto convention for most C-language apps, see, for > example Xlib, gtk and gnome. The kernel is not those projects. > Also, "grep typedef include/linux/*" shows that many kernel device > drivers use this convention. They are wrong and should be fixed. See my old OLS paper on all about the problems of using typedefs in kernel code. > > gcc should warn you > > just the same if you pass the wrong struct pointer > > There were many cases where it did not warn (I don't remember > the case of subr calls). I beleive this had to do with ANSI-C spec > issues dating to the 1990's; traditional C is weakly typed. > > Its not just gcc; anyoe who coded for a while eventually discovered > that tyedefs where strongly typed, but "struct blah *" were not. Sorry, but you are using a broken compiler if it doesn't complain about this. thanks, greg k-h From greg at kroah.com Tue Nov 8 07:03:52 2005 From: greg at kroah.com (Greg KH) Date: Mon, 7 Nov 2005 12:03:52 -0800 Subject: [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: <20051107195727.GF19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> Message-ID: <20051107200352.GB22524@kroah.com> On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas wrote: > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark: > > 3) realy strong typing that sparse can detect. > > > PCI Error Recovery: header file patch > > Change enums and subroutine signatures to be strongly typed, per recent > discussion with GregKH. 
Also, change the acronym to the more unique, > less generic "PERS" "PCI Error Recovery System". > > Greg, Please apply. > > Signed-off-by: Linas Vepstas > > -- > Index: linux-2.6.14-mm1/include/linux/pci.h > =================================================================== > --- linux-2.6.14-mm1.orig/include/linux/pci.h 2005-11-07 13:55:28.528843983 -0600 > +++ linux-2.6.14-mm1/include/linux/pci.h 2005-11-07 13:55:35.745830682 -0600 > @@ -82,11 +82,11 @@ > * the pci device. If some PCI bus between here and the pci device > * has crashed or locked up, this info is reflected here. > */ > -enum pci_channel_state { > +typedef enum { > pci_channel_io_normal = 0, /* I/O channel is in normal state */ > pci_channel_io_frozen = 1, /* I/O to channel is blocked */ > pci_channel_io_perm_failure, /* PCI card is dead */ > -}; > +} pci_channel_state_t; No, this doesn't help out at all. Please go look at the __bitwise documentation. Good luck, greg k-h From linas at austin.ibm.com Tue Nov 8 07:41:36 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 14:41:36 -0600 Subject: typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] In-Reply-To: <20051107200257.GA22524@kroah.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> Message-ID: <20051107204136.GG19593@austin.ibm.com> On Mon, Nov 07, 2005 at 12:02:57PM -0800, Greg KH was heard to remark: > On Mon, Nov 07, 2005 at 01:36:00PM -0600, linas wrote: > > On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark: > > > > > > No, never typedef a struct. That's just wrong. 
> >
> > It's a de facto convention for most C-language apps, see, for
> > example Xlib, gtk and gnome.
>
> The kernel is not those projects.

!!

> > Also, "grep typedef include/linux/*" shows that many kernel device
> > drivers use this convention.
>
> They are wrong and should be fixed.

What, precisely, is wrong?

> See my old OLS paper on all about the problems of using typedefs in
> kernel code.

Is this on the web somewhere? Google is having trouble finding it.

I understand that old code bases often choke on typedefs; forward
declarations are a big problem. Not to be rude, but choking on
forward decls is often a symptom of poorly-designed code.

> > > gcc should warn you
> > > just the same if you pass the wrong struct pointer
> >
> > There were many cases where it did not warn (I don't remember
> > the case of subr calls). I believe this had to do with ANSI-C spec
> > issues dating to the 1990's; traditional C is weakly typed.
> >
> > It's not just gcc; anyone who coded for a while eventually discovered
> > that typedefs were strongly typed, but "struct blah *" were not.
>
> Sorry, but you are using a broken compiler if it doesn't complain about
> this.

Uhh, gcc? Maybe I've just got more mileage under my wheels. Of all of
the compilers I've used, gcc has always had the strictest checking, and
was the most verbose about warnings. There's a trick that pros use when
they inherit crufty old code: run it through gcc first, and clean it up,
even if the project requires using some other compiler.

I was simply stating a fact about gcc and about standard ANSI-C
type-checking that is "well known" to anyone who's been around the
block. I was not trying to start an argument.
--linas From greg at kroah.com Tue Nov 8 07:46:53 2005 From: greg at kroah.com (Greg KH) Date: Mon, 7 Nov 2005 12:46:53 -0800 Subject: typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] In-Reply-To: <20051107204136.GG19593@austin.ibm.com> References: <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> Message-ID: <20051107204653.GA23705@kroah.com> On Mon, Nov 07, 2005 at 02:41:36PM -0600, linas wrote: > On Mon, Nov 07, 2005 at 12:02:57PM -0800, Greg KH was heard to remark: > > On Mon, Nov 07, 2005 at 01:36:00PM -0600, linas wrote: > > > On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark: > > > > > > > > No, never typedef a struct. That's just wrong. > > > > > > Its a defacto convention for most C-language apps, see, for > > > example Xlib, gtk and gnome. > > > > The kernel is not those projects. > > !! Yeah, anyone who thinks that Xlib is the paradigm for coding style... > > > Also, "grep typedef include/linux/*" shows that many kernel device > > > drivers use this convention. > > > > They are wrong and should be fixed. > > What, precisely, is wrong? > > > See my old OLS paper on all about the problems of using typedefs in > > kernel code. > > Is this on the web somewhere? Google is having trouble finding it. http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_paper/codingstyle.ps and the presentation is at: http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/ > > > > gcc should warn you > > > > just the same if you pass the wrong struct pointer > > > > > > There were many cases where it did not warn (I don't remember > > > the case of subr calls). 
I believe this had to do with ANSI-C spec
> > > issues dating to the 1990's; traditional C is weakly typed.
> > >
> > > It's not just gcc; anyone who coded for a while eventually discovered
> > > that typedefs were strongly typed, but "struct blah *" were not.
> >
> > Sorry, but you are using a broken compiler if it doesn't complain about
> > this.
>
> Uhh, gcc?

Try it in the kernel today. You will get a warning if you pass in a
pointer to a different structure type than it was defined as.

> I was simply stating a fact about gcc and about standard ANSI-C
> type-checking that is "well known" to anyone who's been around the
> block. I was not trying to start an argument.

Then let's end it here...

thanks,

greg k-h

From kravetz at us.ibm.com Tue Nov 8 07:47:43 2005
From: kravetz at us.ibm.com (Mike Kravetz)
Date: Mon, 7 Nov 2005 12:47:43 -0800
Subject: [PATCH 1/4] Memory Add Fixes for ppc64
In-Reply-To: <1131149070.29195.41.camel@gaston>
References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> <1131149070.29195.41.camel@gaston>
Message-ID: <20051107204743.GC5821@w-mikek2.ibm.com>

On Sat, Nov 05, 2005 at 11:04:30AM +1100, Benjamin Herrenschmidt wrote:
> This patch will have to be slightly reworked on top of the 64k pages
> one. It should be trivial though.

Ran into an issue with the interaction of SPARSEMEM and 64k pages.
SPARSEMEM defines the ppc64 section size to be 16MB, which corresponds
to the smallest LMB size. There is a check in the SPARSEMEM code to
ensure that the MAX_ORDER (actually MAX_ORDER-1) block size is not
greater than the section size. Within the Kconfig file, there is this:

# We optimistically allocate largepages from the VM, so make the limit
# large enough (16MB). This badly named config option is actually
# max order + 1
config FORCE_MAX_ZONEORDER
	int
	depends on PPC64
	default "13"

Just curious if we still want to boost MAX_ORDER like this with 64k
pages? Doesn't that make the MAX_ORDER block size 256MB in this case?
Also, not quite sure what happens if memory size (a 16 MB multiple)
does not align with a MAX_ORDER block size (a 256MB multiple in this
case). My 'guess' is that the page allocator would not use it as it
would not fit within the buddy system.

cc'ing SPARSEMEM author Andy Whitcroft.

--
Mike

From benh at kernel.crashing.org Tue Nov 8 08:12:56 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 08 Nov 2005 08:12:56 +1100
Subject: [PATCH 1/4] Memory Add Fixes for ppc64
In-Reply-To: <20051107204743.GC5821@w-mikek2.ibm.com>
References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> <1131149070.29195.41.camel@gaston> <20051107204743.GC5821@w-mikek2.ibm.com>
Message-ID: <1131397976.4652.52.camel@gaston>

> Just curious if we still want to boost MAX_ORDER like this with 64k
> pages? Doesn't that make the MAX_ORDER block size 256MB in this case?
> Also, not quite sure what happens if memory size (a 16 MB multiple)
> does not align with a MAX_ORDER block size (a 256MB multiple in this
> case). My 'guess' is that the page allocator would not use it as it
> would not fit within the buddy system.
>
> cc'ing SPARSEMEM author Andy Whitcroft.

Yes, the MAX_ORDER should be different indeed. But can Kconfig do that?
That is, have the default value be different based on a Kconfig option?
I don't see that... We may have to do things differently here...

Ben.
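The block sizes quoted in this exchange can be checked with a little
arithmetic. The sketch below is only an illustration, assuming (as the
Kconfig comment says) that FORCE_MAX_ZONEORDER is "max order + 1", so the
largest buddy block is PAGE_SIZE << (value - 1). The function names are
hypothetical and do not exist in the kernel.

```c
/*
 * Largest buddy-allocator block in bytes for a given page size and a
 * FORCE_MAX_ZONEORDER-style value ("max order + 1").
 */
static unsigned long long max_block_bytes(unsigned long long page_size,
					  unsigned int zoneorder)
{
	return page_size << (zoneorder - 1);
}

/*
 * Largest zoneorder value whose biggest block still fits in one
 * SPARSEMEM section. Assumes page_size <= section_bytes.
 */
static unsigned int zoneorder_for_section(unsigned long long page_size,
					  unsigned long long section_bytes)
{
	unsigned int zoneorder = 1;

	/* Grow while a block of order 'zoneorder' would still fit. */
	while ((page_size << zoneorder) <= section_bytes)
		zoneorder++;
	return zoneorder;
}
```

With a value of 13, 4k pages give 4096 << 12 = 16MB, matching the section
size, while 64k pages give 65536 << 12 = 256MB, which is the mismatch
described above. Under the same assumption, a value of 9 would bring the
64k-page case back to 16MB (65536 << 8).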
From linas at austin.ibm.com Tue Nov 8 08:21:28 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 15:21:28 -0600 Subject: [PATCH 1/7]: PCI revised (2) [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: <20051107200352.GB22524@kroah.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> <20051107200352.GB22524@kroah.com> Message-ID: <20051107212128.GH19593@austin.ibm.com> On Mon, Nov 07, 2005 at 12:03:52PM -0800, Greg KH was heard to remark: > On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas wrote: > > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark: > > > 3) realy strong typing that sparse can detect. > Please go look at the __bitwise documentation. PCI Error Recovery: header file patch Change enums and subroutine signatures to be strongly typed, per recent discussion with GregKH. Also, change the acronym to the more unique, less generic "PERS" "PCI Error Recovery System". Greg, Please apply. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-mm1/include/linux/pci.h =================================================================== --- linux-2.6.14-mm1.orig/include/linux/pci.h 2005-11-07 13:55:28.528843983 -0600 +++ linux-2.6.14-mm1/include/linux/pci.h 2005-11-07 14:56:04.917367579 -0600 @@ -82,10 +82,12 @@ * the pci device. If some PCI bus between here and the pci device * has crashed or locked up, this info is reflected here. 
*/ +typedef int __bitwise pci_channel_state_t; + enum pci_channel_state { - pci_channel_io_normal = 0, /* I/O channel is in normal state */ - pci_channel_io_frozen = 1, /* I/O to channel is blocked */ - pci_channel_io_perm_failure, /* PCI card is dead */ + pci_channel_io_normal = (__force pci_channel_state_t) 0, /* I/O channel is in normal state */ + pci_channel_io_frozen = (__force pci_channel_state_t) 1, /* I/O to channel is blocked */ + pci_channel_io_perm_failure = (__force pci_channel_state_t) 2, /* PCI card is dead */ }; /* @@ -121,7 +123,7 @@ this is D0-D3, D0 being fully functional, and D3 being off. */ - enum pci_channel_state error_state; /* current connectivity state */ + pci_channel_state_t error_state; /* current connectivity state */ struct device dev; /* Generic device interface */ /* device is compatible with these IDs */ @@ -245,35 +247,37 @@ }; /* ---------------------------------------------------------------- */ -/** PCI error recovery infrastructure. If a PCI device driver provides +/** PCI Error Recovery System (PERS). If a PCI device driver provides * a set fof callbacks in struct pci_error_handlers, then that device driver * will be notified of PCI bus errors, and will be driven to recovery * when an error occurs. */ -enum pcierr_result { - PCIERR_RESULT_NONE = 0, /* no result/none/not supported in device driver */ - PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */ - PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. 
*/ - PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ - PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ +typedef int __bitwise pers_result_t; + +enum pers_result { + PERS_RESULT_NONE = (__force pers_result_t) 0, /* no result/none/not supported in device driver */ + PERS_RESULT_CAN_RECOVER = (__force pers_result_t) 1, /* Device driver can recover without slot reset */ + PERS_RESULT_NEED_RESET = (__force pers_result_t) 2, /* Device driver wants slot to be reset. */ + PERS_RESULT_DISCONNECT = (__force pers_result_t) 3, /* Device has completely failed, is unrecoverable */ + PERS_RESULT_RECOVERED = (__force pers_result_t) 4, /* Device driver is fully recovered and operational */ }; /* PCI bus error event callbacks */ struct pci_error_handlers { /* PCI bus error detected on this device */ - int (*error_detected)(struct pci_dev *dev, - enum pci_channel_state error); + pers_result_t (*error_detected)(struct pci_dev *dev, + pci_channel_state_t error); /* MMIO has been re-enabled, but not DMA */ - int (*mmio_enabled)(struct pci_dev *dev); + pers_result_t (*mmio_enabled)(struct pci_dev *dev); /* PCI Express link has been reset */ - int (*link_reset)(struct pci_dev *dev); + pers_result_t (*link_reset)(struct pci_dev *dev); /* PCI slot has been reset */ - int (*slot_reset)(struct pci_dev *dev); + pers_result_t (*slot_reset)(struct pci_dev *dev); /* Device driver may resume normal operations */ void (*resume)(struct pci_dev *dev); From linas at austin.ibm.com Tue Nov 8 08:30:03 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 15:30:03 -0600 Subject: [PATCH 2/7]: Revised [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver In-Reply-To: <20051107195727.GF19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> 
<20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> Message-ID: <20051107213003.GI19593@austin.ibm.com> On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark: > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark: > > 3) realy strong typing that sparse can detect. Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the IPR SCSI device driver. The patch has been tested, and appears to work well. Please apply. Signed-off-by: Linas Vepstas Signed-off-by: Brian King -- Index: linux-2.6.14-mm1/drivers/scsi/ipr.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/scsi/ipr.c 2005-11-07 13:55:27.986920072 -0600 +++ linux-2.6.14-mm1/drivers/scsi/ipr.c 2005-11-07 15:02:00.639392946 -0600 @@ -5328,6 +5328,94 @@ shutdown_type); } +/* --------------- PCI Error Recovery infrastructure ----------- */ +/** If the PCI slot is frozen, hold off all i/o + * activity; then, as soon as the slot is available again, + * initiate an adapter reset. + */ +static int ipr_reset_freeze(struct ipr_cmnd *ipr_cmd) +{ + /* Disallow new interrupts, avoid loop */ + ipr_cmd->ioa_cfg->allow_interrupts = 0; + list_add_tail(&ipr_cmd->queue, &ipr_cmd->ioa_cfg->pending_q); + ipr_cmd->done = ipr_reset_ioa_job; + return IPR_RC_JOB_RETURN; +} + +/** ipr_eeh_frozen -- called when slot has experience PCI bus error. + * This routine is called to tell us that the PCI bus is down. + * Can't do anything here, except put the device driver into a + * holding pattern, waiting for the PCI bus to come back. 
+ */ +static void ipr_eeh_frozen (struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_freeze, IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); +} + +/** ipr_eeh_slot_reset - called when pci slot has been reset. + * + * This routine is called by the pci error recovery recovery + * code after the PCI slot has been reset, just before we + * should resume normal operations. + */ +static pers_result_t ipr_eeh_slot_reset(struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + // pci_enable_device(pdev); + // pci_set_master(pdev); + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space, + IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); + + return PERS_RESULT_RECOVERED; +} + +/** This routine is called when the PCI bus has permanently + * failed. This routine should purge all pending I/O and + * shut down the device driver (close and unload). 
+ */ +static void ipr_eeh_perm_failure(struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + if (ioa_cfg->sdt_state == WAIT_FOR_DUMP) + ioa_cfg->sdt_state = ABORT_DUMP; + ioa_cfg->reset_retries = IPR_NUM_RESET_RELOAD_RETRIES; + ioa_cfg->in_ioa_bringdown = 1; + ipr_initiate_ioa_reset(ioa_cfg, IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); +} + +static pers_result_t ipr_eeh_error_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + switch (state) { + case pci_channel_io_frozen: + ipr_eeh_frozen (pdev); + return PERS_RESULT_NEED_RESET; + + case pci_channel_io_perm_failure: + ipr_eeh_perm_failure (pdev); + return PERS_RESULT_DISCONNECT; + break; + default: + break; + } + return PERS_RESULT_NEED_RESET; +} + +/* ------------- end of PCI Error Recovery suport ----------- */ + /** * ipr_probe_ioa_part2 - Initializes IOAs found in ipr_probe_ioa(..) 
* @ioa_cfg: ioa cfg struct @@ -6065,12 +6153,18 @@ }; MODULE_DEVICE_TABLE(pci, ipr_pci_table); +static struct pci_error_handlers ipr_err_handler = { + .error_detected = ipr_eeh_error_detected, + .slot_reset = ipr_eeh_slot_reset, +}; + static struct pci_driver ipr_driver = { .name = IPR_NAME, .id_table = ipr_pci_table, .probe = ipr_probe, .remove = ipr_remove, .shutdown = ipr_shutdown, + .err_handler = &ipr_err_handler, }; /** From linas at austin.ibm.com Tue Nov 8 08:31:40 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 15:31:40 -0600 Subject: [PATCH 3/7]: Revised [PATCH 28/42]: SCSI: add PCI error recovery to Symbios dev driver In-Reply-To: <20051107195727.GF19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> Message-ID: <20051107213139.GJ19593@austin.ibm.com> On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark: > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark: > > 3) realy strong typing that sparse can detect. Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the Symbios SCSI device driver. The patch has been tested, and appears to work well. 
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 13:55:26.839081234 -0600 +++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 15:02:08.152337375 -0600 @@ -686,6 +686,10 @@ if (DEBUG_FLAGS & DEBUG_TINY) printf_debug ("["); + /* Avoid spinloop trying to handle interrupts on frozen device */ + if (np->s.io_state != pci_channel_io_normal) + return IRQ_HANDLED; + spin_lock_irqsave(np->s.host->host_lock, flags); sym_interrupt(np); spin_unlock_irqrestore(np->s.host->host_lock, flags); @@ -759,6 +763,25 @@ */ static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); } +static void sym_eeh_timeout(u_long p) +{ + struct sym_eh_wait *ep = (struct sym_eh_wait *) p; + if (!ep) + return; + complete(&ep->done); +} + +static void sym_eeh_done(struct sym_eh_wait *ep) +{ + if (!ep) + return; + ep->timed_out = 0; + if (!del_timer(&ep->timer)) + return; + + complete(&ep->done); +} + /* * Generic method for our eh processing. * The 'op' argument tells what we have to do. @@ -799,6 +822,35 @@ /* Try to proceed the operation we have been asked for */ sts = -1; + + /* We may be in an error condition because the PCI bus + * went down. In this case, we need to wait until the + * PCI bus is reset, the card is reset, and only then + * proceed with the scsi error recovery. We'll wait + * for 15 seconds for this to happen. 
+ */ +#define WAIT_FOR_PCI_RECOVERY 15 + if (np->s.io_state != pci_channel_io_normal) { + struct sym_eh_wait eeh, *eep = &eeh; + np->s.io_reset_wait = eep; + init_completion(&eep->done); + init_timer(&eep->timer); + eep->to_do = SYM_EH_DO_WAIT; + eep->timer.expires = jiffies + (WAIT_FOR_PCI_RECOVERY*HZ); + eep->timer.function = sym_eeh_timeout; + eep->timer.data = (u_long)eep; + eep->timed_out = 1; /* Be pessimistic for once :) */ + add_timer(&eep->timer); + spin_unlock_irq(np->s.host->host_lock); + wait_for_completion(&eep->done); + spin_lock_irq(np->s.host->host_lock); + if (eep->timed_out) { + printk (KERN_ERR "%s: Timed out waiting for PCI reset\n", + sym_name(np)); + } + np->s.io_reset_wait = NULL; + } + switch(op) { case SYM_EH_ABORT: sts = sym_abort_scsiio(np, cmd, 1); @@ -1584,6 +1636,8 @@ np->maxoffs = dev->chip.offset_max; np->maxburst = dev->chip.burst_max; np->myaddr = dev->host_id; + np->s.io_state = pci_channel_io_normal; + np->s.io_reset_wait = NULL; /* * Edit its name. @@ -1916,6 +1970,58 @@ return 1; } +/* ------------- PCI Error Recovery infrastructure -------------- */ +/** sym2_io_error_detected() is called when PCI error is detected */ +static pers_result_t sym2_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + np->s.io_state = state; + // XXX If slot is permanently frozen, then what? + // Should we scsi_remove_host() maybe ?? + + /* Request a slot slot reset. */ + return PERS_RESULT_NEED_RESET; +} + +/** sym2_io_slot_reset is called when the pci bus has been reset. + * Restart the card from scratch. 
*/ +static pers_result_t sym2_io_slot_reset (struct pci_dev *pdev) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + printk (KERN_INFO "%s: recovering from a PCI slot reset\n", + sym_name(np)); + + if (pci_enable_device(pdev)) + printk (KERN_ERR "%s: device setup failed most egregiously\n", + sym_name(np)); + + pci_set_master(pdev); + enable_irq (pdev->irq); + + /* Perform host reset only on one instance of the card */ + if (0 == PCI_FUNC (pdev->devfn)) + sym_reset_scsi_bus(np, 0); + + return PERS_RESULT_RECOVERED; +} + +/** sym2_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + */ +static void sym2_io_resume (struct pci_dev *pdev) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + /* Perform device startup only once for this card. */ + if (0 == PCI_FUNC (pdev->devfn)) + sym_start_up (np, 1); + + np->s.io_state = pci_channel_io_normal; + sym_eeh_done (np->s.io_reset_wait); +} + /* * Driver host template. */ @@ -2169,11 +2275,18 @@ MODULE_DEVICE_TABLE(pci, sym2_id_table); +static struct pci_error_handlers sym2_err_handler = { + .error_detected = sym2_io_error_detected, + .slot_reset = sym2_io_slot_reset, + .resume = sym2_io_resume, +}; + static struct pci_driver sym2_driver = { .name = NAME53C8XX, .id_table = sym2_id_table, .probe = sym2_probe, .remove = __devexit_p(sym2_remove), + .err_handler = &sym2_err_handler, }; static int __init sym2_init(void) Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.h =================================================================== --- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_glue.h 2005-11-07 13:55:26.839081234 -0600 +++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.h 2005-11-07 15:02:08.154337094 -0600 @@ -181,6 +181,10 @@ char chip_name[8]; struct pci_dev *device; + /* pci bus i/o state; waiter for clearing of i/o state */ + pci_channel_state_t io_state; + struct sym_eh_wait *io_reset_wait; + struct Scsi_Host *host; void __iomem * ioaddr; /* 
MMIO kernel io address */ Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_hipd.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-11-07 13:55:26.840081093 -0600 +++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-11-07 15:02:08.162335970 -0600 @@ -2810,6 +2810,7 @@ u_char istat, istatc; u_char dstat; u_short sist; + u_int icnt; /* * interrupt on the fly ? @@ -2851,6 +2852,7 @@ sist = 0; dstat = 0; istatc = istat; + icnt = 0; do { if (istatc & SIP) sist |= INW(np, nc_sist); @@ -2858,6 +2860,19 @@ dstat |= INB(np, nc_dstat); istatc = INB(np, nc_istat); istat |= istatc; + + /* Prevent deadlock waiting on a condition that may never clear. */ + /* XXX this is a temporary kludge; the correct to detect + * a PCI bus error would be to use the io_check interfaces + * proposed by Hidetoshi Seto + * Problem with polling like that is the state flag might not + * be set. + */ + icnt ++; + if (100 < icnt) { + if (np->s.device->error_state != pci_channel_io_normal) + return; + } } while (istatc & (SIP|DIP)); if (DEBUG_FLAGS & DEBUG_TINY) From linas at austin.ibm.com Tue Nov 8 08:34:28 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 15:34:28 -0600 Subject: [PATCH 4/7]: Revised [PATCH 29/42]: ethernet: add PCI error recovery to e100 dev driver In-Reply-To: <20051107195727.GF19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> Message-ID: <20051107213428.GK19593@austin.ibm.com> On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark: > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark: > > 3) realy strong typing that sparse can detect. 
Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel ethernet e100 device driver. The patch has been tested, and appears to work well. Please apply. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-mm1/drivers/net/e100.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/net/e100.c 2005-11-07 13:55:26.363148057 -0600 +++ linux-2.6.14-mm1/drivers/net/e100.c 2005-11-07 15:02:11.120920287 -0600 @@ -2465,6 +2465,75 @@ } +/* ------------------ PCI Error Recovery infrastructure -------------- */ +/** e100_io_error_detected() is called when PCI error is detected */ +static pers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + + /* Same as calling e100_down(netdev_priv(netdev)), but generic */ + netdev->stop(netdev); + + /* Is a detach needed ?? */ + // netif_device_detach(netdev); + + /* Request a slot reset. */ + return PERS_RESULT_NEED_RESET; +} + +/** e100_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. */ +static pers_result_t e100_io_slot_reset(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct nic *nic = netdev_priv(netdev); + + if(pci_enable_device(pdev)) { + printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n"); + return PERS_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + /* Only one device per card can do a reset */ + if (0 != PCI_FUNC (pdev->devfn)) + return PERS_RESULT_RECOVERED; + + e100_hw_reset(nic); + e100_phy_init(nic); + + if(e100_hw_init(nic)) { + DPRINTK(HW, ERR, "e100_hw_init failed\n"); + return PERS_RESULT_DISCONNECT; + } + + return PERS_RESULT_RECOVERED; +} + +/** e100_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. 
+ */ +static void e100_io_resume(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct nic *nic = netdev_priv(netdev); + + /* ack any pending wake events, disable PME */ + pci_enable_wake(pdev, 0, 0); + + netif_device_attach(netdev); + if(netif_running(netdev)) { + e100_open (netdev); + mod_timer(&nic->watchdog, jiffies); + } +} + +static struct pci_error_handlers e100_err_handler = { + .error_detected = e100_io_error_detected, + .slot_reset = e100_io_slot_reset, + .resume = e100_io_resume, +}; + + static struct pci_driver e100_driver = { .name = DRV_NAME, .id_table = e100_id_table, @@ -2475,6 +2544,7 @@ .resume = e100_resume, #endif .shutdown = e100_shutdown, + .err_handler = &e100_err_handler, }; static int __init e100_init_module(void) From linas at austin.ibm.com Tue Nov 8 08:36:04 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 15:36:04 -0600 Subject: [PATCH: 5/7]: Revised: [PATCH 30/42]: ethernet: add PCI error recovery to e1000 dev driver In-Reply-To: <20051107195727.GF19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> Message-ID: <20051107213604.GL19593@austin.ibm.com> On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark: > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark: > > 3) realy strong typing that sparse can detect. Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel gigabit ethernet e1000 device driver. The patch has been tested, and appears to work well. Please apply. 
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/net/e1000/e1000_main.c 2005-11-07 13:55:25.948206317 -0600 +++ linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c 2005-11-07 15:02:12.811682734 -0600 @@ -206,6 +206,16 @@ void e1000_rx_schedule(void *data); #endif +static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state); +static pers_result_t e1000_io_slot_reset(struct pci_dev *pdev); +static void e1000_io_resume(struct pci_dev *pdev); + +static struct pci_error_handlers e1000_err_handler = { + .error_detected = e1000_io_error_detected, + .slot_reset = e1000_io_slot_reset, + .resume = e1000_io_resume, +}; + /* Exported from other modules */ extern void e1000_check_options(struct e1000_adapter *adapter); @@ -218,8 +228,9 @@ /* Power Managment Hooks */ #ifdef CONFIG_PM .suspend = e1000_suspend, - .resume = e1000_resume + .resume = e1000_resume, #endif + .err_handler = &e1000_err_handler, }; MODULE_AUTHOR("Intel Corporation, "); @@ -2938,6 +2949,10 @@ #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF + /* Prevent stats update while adapter is being reset */ + if (adapter->link_speed == 0) + return; + spin_lock_irqsave(&adapter->stats_lock, flags); /* these counters are modified from e1000_adjust_tbi_stats, @@ -4359,4 +4374,88 @@ } #endif +/* --------------- PCI Error Recovery infrastructure ------------ */ +/** e1000_io_error_detected() is called when PCI error is detected */ +static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + + if (netif_running(netdev)) + e1000_down(adapter); + + /* Request a slot slot reset. */ + return PERS_RESULT_NEED_RESET; +} + +/** e1000_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. 
+ * Implementation resembles the first-half of the + * e1000_resume routine. + */ +static pers_result_t e1000_io_slot_reset(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + + if (pci_enable_device(pdev)) { + printk(KERN_ERR "e1000: Cannot re-enable PCI device after reset.\n"); + return PERS_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + pci_enable_wake(pdev, 3, 0); + pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */ + + /* Perform card reset only on one instance of the card */ + if(0 != PCI_FUNC (pdev->devfn)) + return PERS_RESULT_RECOVERED; + + e1000_reset(adapter); + E1000_WRITE_REG(&adapter->hw, WUS, ~0); + + return PERS_RESULT_RECOVERED; +} + +/** e1000_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + * Implementation resembles the second-half of the + * e1000_resume routine. + */ +static void e1000_io_resume(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + uint32_t manc, swsm; + + if(netif_running(netdev)) { + if (e1000_up(adapter)) { + printk("e1000: can't bring device back up after reset\n"); + return; + } + } + + netif_device_attach(netdev); + + if(adapter->hw.mac_type >= e1000_82540 && + adapter->hw.media_type == e1000_media_type_copper) { + manc = E1000_READ_REG(&adapter->hw, MANC); + manc &= ~(E1000_MANC_ARP_EN); + E1000_WRITE_REG(&adapter->hw, MANC, manc); + } + + switch(adapter->hw.mac_type) { + case e1000_82573: + swsm = E1000_READ_REG(&adapter->hw, SWSM); + E1000_WRITE_REG(&adapter->hw, SWSM, + swsm | E1000_SWSM_DRV_LOAD); + break; + default: + break; + } + + if(netif_running(netdev)) + mod_timer(&adapter->watchdog_timer, jiffies); +} + /* e1000_main.c */ From linas at austin.ibm.com Tue Nov 8 08:37:27 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 15:37:27 -0600 Subject: [PATCH 6/7]: Revised [PATCH 31/42]: ethernet: 
add PCI error recovery to ixgb dev driver In-Reply-To: <20051107195727.GF19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> Message-ID: <20051107213727.GM19593@austin.ibm.com> On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark: > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark: > > 3) realy strong typing that sparse can detect. Replace-Subject: PCI Error Recovery: ixgb network device driver Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel ten-gigabit ethernet ixgb device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/net/ixgb/ixgb_main.c 2005-11-07 13:55:25.431278896 -0600 +++ linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c 2005-11-07 15:02:14.779406268 -0600 @@ -132,6 +132,16 @@ static void ixgb_netpoll(struct net_device *dev); #endif +static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state); +static pers_result_t ixgb_io_slot_reset (struct pci_dev *pdev); +static void ixgb_io_resume (struct pci_dev *pdev); + +static struct pci_error_handlers ixgb_err_handler = { + .error_detected = ixgb_io_error_detected, + .slot_reset = ixgb_io_slot_reset, + .resume = ixgb_io_resume, +}; + /* Exported from other modules */ extern void ixgb_check_options(struct ixgb_adapter *adapter); @@ -141,6 +151,8 @@ .id_table = ixgb_pci_tbl, .probe = ixgb_probe, .remove = __devexit_p(ixgb_remove), + .err_handler = &ixgb_err_handler, + }; MODULE_AUTHOR("Intel Corporation, "); @@ 
-1654,8 +1666,16 @@ unsigned int i; #endif +#ifdef XXX_CONFIG_IXGB_EEH_RECOVERY + if(unlikely(icr==EEH_IO_ERROR_VALUE(4))) { + if (eeh_slot_is_isolated (adapter->pdev)) + // disable_irq_nosync (adapter->pdev->irq); + return IRQ_NONE; /* Not our interrupt */ + } +#else if(unlikely(!icr)) return IRQ_NONE; /* Not our interrupt */ +#endif /* CONFIG_IXGB_EEH_RECOVERY */ if(unlikely(icr & (IXGB_INT_RXSEQ | IXGB_INT_LSC))) { mod_timer(&adapter->watchdog_timer, jiffies); @@ -2125,4 +2145,70 @@ } #endif +/* -------------- PCI Error Recovery infrastructure ---------------- */ +/** ixgb_io_error_detected() is called when PCI error is detected */ +static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(netif_running(netdev)) + ixgb_down(adapter, TRUE); + + /* Request a slot reset. */ + return PERS_RESULT_NEED_RESET; +} + +/** ixgb_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. + * Implementation resembles the first-half of the + * ixgb_resume routine. + */ +static pers_result_t ixgb_io_slot_reset (struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(pci_enable_device(pdev)) { + printk(KERN_ERR "ixgb: Cannot re-enable PCI device after reset.\n"); + return PERS_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + /* Perform card reset only on one instance of the card */ + if (0 != PCI_FUNC (pdev->devfn)) + return PERS_RESULT_RECOVERED; + + ixgb_reset(adapter); + + return PERS_RESULT_RECOVERED; +} + +/** ixgb_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + * Implementation resembles the second-half of the + * ixgb_resume routine. 
+ */ +static void ixgb_io_resume (struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(netif_running(netdev)) { + if(ixgb_up(adapter)) { + printk ("ixgb: can't bring device back up after reset\n"); + return; + } + } + + netif_device_attach(netdev); + if(netif_running(netdev)) + mod_timer(&adapter->watchdog_timer, jiffies); + + /* Reading all-ff's from the adapter will completely hose + * the counts and statistics. So just clear them out */ + memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats)); + ixgb_update_stats(adapter); +} + /* ixgb_main.c */ From greg at kroah.com Tue Nov 8 08:37:29 2005 From: greg at kroah.com (Greg KH) Date: Mon, 7 Nov 2005 13:37:29 -0800 Subject: [PATCH 1/7]: PCI revised (2) [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: <20051107212128.GH19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> <20051107200352.GB22524@kroah.com> <20051107212128.GH19593@austin.ibm.com> Message-ID: <20051107213729.GA24700@kroah.com> On Mon, Nov 07, 2005 at 03:21:28PM -0600, linas wrote: > +typedef int __bitwise pci_channel_state_t; Closer but... > enum pci_channel_state { > - pci_channel_io_normal = 0, /* I/O channel is in normal state */ > - pci_channel_io_frozen = 1, /* I/O to channel is blocked */ > - pci_channel_io_perm_failure, /* PCI card is dead */ > + pci_channel_io_normal = (__force pci_channel_state_t) 0, /* I/O channel is in normal state */ > + pci_channel_io_frozen = (__force pci_channel_state_t) 1, /* I/O to channel is blocked */ > + pci_channel_io_perm_failure = (__force pci_channel_state_t) 2, /* PCI card is dead */ > }; You don't have to use an enum anymore, just use a #define. 
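[Editorial sketch, not part of the original message.] The #define form suggested above might look roughly like the following. This is a minimal standalone sketch, not the code that was eventually merged: the `__bitwise` and `__force` stubs stand in for the kernel's own definitions so the fragment compiles outside the kernel tree, and `channel_is_usable()` is a hypothetical helper added only to show the type in use.

```c
/*
 * Stand-ins for the kernel's sparse annotations (assumption: outside the
 * kernel tree these must be supplied by hand). Under sparse (__CHECKER__)
 * they expand to attributes; under a normal compiler they expand to nothing.
 */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

/* Same typedef as in the quoted patch. */
typedef int __bitwise pci_channel_state_t;

/* The three states expressed as #defines instead of enum members. */
#define pci_channel_io_normal       ((__force pci_channel_state_t) 0) /* I/O channel is in normal state */
#define pci_channel_io_frozen       ((__force pci_channel_state_t) 1) /* I/O to channel is blocked */
#define pci_channel_io_perm_failure ((__force pci_channel_state_t) 2) /* PCI card is dead */

/* Hypothetical helper: nonzero only when the channel is in the normal state. */
static int channel_is_usable(pci_channel_state_t state)
{
	return state == pci_channel_io_normal;
}
```

With a plain compiler the annotations vanish and the values behave like ordinary ints; the point of the typedef is that sparse can then flag code that mixes a bare integer with a pci_channel_state_t.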
Sparse developers, I see code in the kernel that does both (__force foo_t) and (foo_t __force). Which one is correct? > +typedef int __bitwise pers_result_t; Ugh, I don't like that name, but I can't think of anything better right now. You should at least keep "pci" at the beginning to make it make more sense to people looking at it for the first time. thanks, greg k-h From linas at austin.ibm.com Tue Nov 8 08:39:54 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 15:39:54 -0600 Subject: [PATCH 7/7]: Revised [PATCH 32/42]: RFC: Add compile-time config options In-Reply-To: <20051107195727.GF19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> Message-ID: <20051107213954.GN19593@austin.ibm.com> On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark: > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark: > > 3) realy strong typing that sparse can detect. This OPTIONAL/RFC patch adds ifdef's around the PCI error recovery code in the various device drivers. This patch is "optional" in that it's a little bit messy, but it does solve a little problem. -- The good news: this gives some users (e.g. embedded systems) the option of not compiling in this code, thus making their device drivers a tiny bit smaller. -- The bad news: This also clutters up the drivers with extraneous markup and the config process with yet another config.
Please apply if you agree with the need for this patch :) Signed-off-by: Linas Vepstas Index: linux-2.6.14-mm1/drivers/scsi/ipr.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/scsi/ipr.c 2005-11-07 15:02:00.639392946 -0600 +++ linux-2.6.14-mm1/drivers/scsi/ipr.c 2005-11-07 15:02:20.029668601 -0600 @@ -5329,6 +5329,8 @@ } /* --------------- PCI Error Recovery infrastructure ----------- */ +#ifdef CONFIG_PCI_ERR_RECOVERY + /** If the PCI slot is frozen, hold off all i/o * activity; then, as soon as the slot is available again, * initiate an adapter reset. @@ -5414,6 +5416,7 @@ return PERS_RESULT_NEED_RESET; } +#endif /* CONFIG_PCI_ERR_RECOVERY */ /* ------------- end of PCI Error Recovery suport ----------- */ /** @@ -6153,10 +6156,12 @@ }; MODULE_DEVICE_TABLE(pci, ipr_pci_table); +#ifdef CONFIG_PCI_ERR_RECOVERY static struct pci_error_handlers ipr_err_handler = { .error_detected = ipr_eeh_error_detected, .slot_reset = ipr_eeh_slot_reset, }; +#endif /* CONFIG_PCI_ERR_RECOVERY */ static struct pci_driver ipr_driver = { .name = IPR_NAME, @@ -6164,7 +6169,9 @@ .probe = ipr_probe, .remove = ipr_remove, .shutdown = ipr_shutdown, +#ifdef CONFIG_PCI_ERR_RECOVERY .err_handler = &ipr_err_handler, +#endif /* CONFIG_PCI_ERR_RECOVERY */ }; /** Index: linux-2.6.14-mm1/drivers/pci/Kconfig =================================================================== --- linux-2.6.14-mm1.orig/drivers/pci/Kconfig 2005-11-07 13:55:23.869498177 -0600 +++ linux-2.6.14-mm1/drivers/pci/Kconfig 2005-11-07 15:02:20.030668460 -0600 @@ -13,6 +13,21 @@ If you don't know what to do here, say N. +config PCI_ERR_RECOVERY + bool "PCI Error Recovery support" + depends on PCI + depends on PPC_PSERIES + default y + help + PCI Error Recovery is a mechanism by which crashed/hung + PCI adapters are automatically detected and rebooted without + otherwise disturbing the operation of the system. 
Support + for this recovery requires special PCI bridge chips (some + PCI-E chips may have this support) as well as support in + the device drivers (not all device drivers can handle this). + + When in doubt, say Y. + config PCI_LEGACY_PROC bool "Legacy /proc/pci interface" depends on PCI Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 15:02:08.152337375 -0600 +++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 15:02:20.034667898 -0600 @@ -763,6 +763,7 @@ */ static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); } +#ifdef CONFIG_PCI_ERR_RECOVERY static void sym_eeh_timeout(u_long p) { struct sym_eh_wait *ep = (struct sym_eh_wait *) p; @@ -781,6 +782,7 @@ complete(&ep->done); } +#endif /* CONFIG_PCI_ERR_RECOVERY */ /* * Generic method for our eh processing. @@ -823,6 +825,7 @@ /* Try to proceed the operation we have been asked for */ sts = -1; +#ifdef CONFIG_PCI_ERR_RECOVERY /* We may be in an error condition because the PCI bus * went down. In this case, we need to wait until the * PCI bus is reset, the card is reset, and only then @@ -850,6 +853,7 @@ } np->s.io_reset_wait = NULL; } +#endif /* CONFIG_PCI_ERR_RECOVERY */ switch(op) { case SYM_EH_ABORT: @@ -1971,6 +1975,7 @@ } /* ------------- PCI Error Recovery infrastructure -------------- */ +#ifdef CONFIG_PCI_ERR_RECOVERY /** sym2_io_error_detected() is called when PCI error is detected */ static pers_result_t sym2_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state) { @@ -2021,6 +2026,7 @@ np->s.io_state = pci_channel_io_normal; sym_eeh_done (np->s.io_reset_wait); } +#endif /* CONFIG_PCI_ERR_RECOVERY */ /* * Driver host template. 
@@ -2275,18 +2281,22 @@ MODULE_DEVICE_TABLE(pci, sym2_id_table); +#ifdef CONFIG_PCI_ERR_RECOVERY static struct pci_error_handlers sym2_err_handler = { .error_detected = sym2_io_error_detected, .slot_reset = sym2_io_slot_reset, .resume = sym2_io_resume, }; +#endif /* CONFIG_PCI_ERR_RECOVERY */ static struct pci_driver sym2_driver = { .name = NAME53C8XX, .id_table = sym2_id_table, .probe = sym2_probe, .remove = __devexit_p(sym2_remove), +#ifdef CONFIG_PCI_ERR_RECOVERY .err_handler = &sym2_err_handler, +#endif /* CONFIG_PCI_ERR_RECOVERY */ }; static int __init sym2_init(void) Index: linux-2.6.14-mm1/drivers/net/e100.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/net/e100.c 2005-11-07 15:02:11.120920287 -0600 +++ linux-2.6.14-mm1/drivers/net/e100.c 2005-11-07 15:02:20.038667336 -0600 @@ -2466,6 +2466,7 @@ /* ------------------ PCI Error Recovery infrastructure -------------- */ +#ifdef CONFIG_PCI_ERR_RECOVERY /** e100_io_error_detected() is called when PCI error is detected */ static pers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) { @@ -2532,6 +2533,7 @@ .slot_reset = e100_io_slot_reset, .resume = e100_io_resume, }; +#endif /* CONFIG_PCI_ERR_RECOVERY */ static struct pci_driver e100_driver = { @@ -2544,7 +2546,9 @@ .resume = e100_resume, #endif .shutdown = e100_shutdown, +#ifdef CONFIG_PCI_ERR_RECOVERY .err_handler = &e100_err_handler, +#endif /* CONFIG_PCI_ERR_RECOVERY */ }; static int __init e100_init_module(void) Index: linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/net/e1000/e1000_main.c 2005-11-07 15:02:12.811682734 -0600 +++ linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c 2005-11-07 15:02:20.071662701 -0600 @@ -206,6 +206,7 @@ void e1000_rx_schedule(void *data); #endif +#ifdef CONFIG_PCI_ERR_RECOVERY static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, 
pci_channel_state_t state); static pers_result_t e1000_io_slot_reset(struct pci_dev *pdev); static void e1000_io_resume(struct pci_dev *pdev); @@ -215,6 +216,7 @@ .slot_reset = e1000_io_slot_reset, .resume = e1000_io_resume, }; +#endif /* CONFIG_PCI_ERR_RECOVERY */ /* Exported from other modules */ @@ -230,7 +232,9 @@ .suspend = e1000_suspend, .resume = e1000_resume, #endif +#ifdef CONFIG_PCI_ERR_RECOVERY .err_handler = &e1000_err_handler, +#endif /* CONFIG_PCI_ERR_RECOVERY */ }; MODULE_AUTHOR("Intel Corporation, "); @@ -4375,6 +4379,7 @@ #endif /* --------------- PCI Error Recovery infrastructure ------------ */ +#ifdef CONFIG_PCI_ERR_RECOVERY /** e1000_io_error_detected() is called when PCI error is detected */ static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) { @@ -4457,5 +4462,6 @@ if(netif_running(netdev)) mod_timer(&adapter->watchdog_timer, jiffies); } +#endif /* CONFIG_PCI_ERR_RECOVERY */ /* e1000_main.c */ Index: linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c =================================================================== --- linux-2.6.14-mm1.orig/drivers/net/ixgb/ixgb_main.c 2005-11-07 15:02:14.779406268 -0600 +++ linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c 2005-11-07 15:02:20.075662139 -0600 @@ -132,6 +132,7 @@ static void ixgb_netpoll(struct net_device *dev); #endif +#ifdef CONFIG_PCI_ERR_RECOVERY static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state); static pers_result_t ixgb_io_slot_reset (struct pci_dev *pdev); static void ixgb_io_resume (struct pci_dev *pdev); @@ -141,6 +142,7 @@ .slot_reset = ixgb_io_slot_reset, .resume = ixgb_io_resume, }; +#endif /* CONFIG_PCI_ERR_RECOVERY */ /* Exported from other modules */ @@ -151,8 +153,9 @@ .id_table = ixgb_pci_tbl, .probe = ixgb_probe, .remove = __devexit_p(ixgb_remove), +#ifdef CONFIG_PCI_ERR_RECOVERY .err_handler = &ixgb_err_handler, - +#endif /* CONFIG_PCI_ERR_RECOVERY */ }; MODULE_AUTHOR("Intel Corporation, "); @@ 
-2146,6 +2149,7 @@ #endif /* -------------- PCI Error Recovery infrastructure ---------------- */ +#ifdef CONFIG_PCI_ERR_RECOVERY /** ixgb_io_error_detected() is called when PCI error is detected */ static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state) { @@ -2210,5 +2214,6 @@ memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats)); ixgb_update_stats(adapter); } +#endif /* CONFIG_PCI_ERR_RECOVERY */ /* ixgb_main.c */ From brking at us.ibm.com Tue Nov 8 08:40:32 2005 From: brking at us.ibm.com (Brian King) Date: Mon, 07 Nov 2005 15:40:32 -0600 Subject: [PATCH 2/7]: Revised [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver In-Reply-To: <20051107213003.GI19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> <20051107213003.GI19593@austin.ibm.com> Message-ID: <436FC9D0.4060803@us.ibm.com> linas wrote: > +/** ipr_eeh_slot_reset - called when pci slot has been reset. > + * > + * This routine is called by the pci error recovery recovery > + * code after the PCI slot has been reset, just before we > + * should resume normal operations. > + */ > +static pers_result_t ipr_eeh_slot_reset(struct pci_dev *pdev) > +{ > + unsigned long flags = 0; > + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); > + > + // pci_enable_device(pdev); > + // pci_set_master(pdev); I assume you want remove these two lines... The pci config space restore in ipr's reset handling should cover them. 
> + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); > + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space, > + IPR_SHUTDOWN_NONE); > + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); > + > + return PERS_RESULT_RECOVERED; > +} -- Brian King eServer Storage I/O IBM Linux Technology Center From kravetz at us.ibm.com Tue Nov 8 08:48:59 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Mon, 7 Nov 2005 13:48:59 -0800 Subject: [PATCH 1/4] Memory Add Fixes for ppc64 In-Reply-To: <1131397976.4652.52.camel@gaston> References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> <1131149070.29195.41.camel@gaston> <20051107204743.GC5821@w-mikek2.ibm.com> <1131397976.4652.52.camel@gaston> Message-ID: <20051107214859.GD5821@w-mikek2.ibm.com> On Tue, Nov 08, 2005 at 08:12:56AM +1100, Benjamin Herrenschmidt wrote: > Yes, the MAX_ORDER should be different indeed. But can Kconfig do that ? > That is have the default value be different based on a Kconfig option ? > I don't see that ... We may have to do things differently here... This seems to be done in other parts of the Kconfig file. Using those as an example, this should keep the MAX_ORDER block size at 16MB. 
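The 16MB figure can be checked by hand. A minimal sketch, assuming the largest buddy block is PAGE_SIZE << (FORCE_MAX_ZONEORDER - 1) (the ppc64 Kconfig comment describes the option as "max order + 1") and assuming base pages are 4KB without PPC_64K_PAGES and 64KB with it. max_block_bytes() is a made-up helper for illustration, not a kernel function:

```c
#include <stdio.h>

/* Hypothetical helper: size in bytes of the largest buddy allocator
 * block, assuming FORCE_MAX_ZONEORDER means "max order + 1". */
unsigned long max_block_bytes(unsigned int page_shift,
			      unsigned int max_zoneorder)
{
	return 1UL << (page_shift + max_zoneorder - 1);
}

/* Print the block size, in MB, for the two configurations in the patch. */
void print_block_sizes(void)
{
	/* 4KB pages (shift 12) with the existing default of 13 */
	printf("4K pages,  zoneorder 13: %lu MB\n",
	       max_block_bytes(12, 13) >> 20);
	/* 64KB pages (shift 16) with the proposed default of 9 */
	printf("64K pages, zoneorder 9:  %lu MB\n",
	       max_block_bytes(16, 9) >> 20);
}
```

Both cases work out to 2^24 bytes, which is why 9 is the value that pairs with 64K pages here.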
Signed-off-by: Mike Kravetz diff -Naupr linux-2.6.14-git7.64k/arch/powerpc/Kconfig linux-2.6.14-git7.64k.work/arch/powerpc/Kconfig --- linux-2.6.14-git7.64k/arch/powerpc/Kconfig 2005-11-07 18:38:50.000000000 +0000 +++ linux-2.6.14-git7.64k.work/arch/powerpc/Kconfig 2005-11-07 21:37:21.000000000 +0000 @@ -463,6 +463,7 @@ source "fs/Kconfig.binfmt" config FORCE_MAX_ZONEORDER int depends on PPC64 + default "9" if PPC_64K_PAGES default "13" config MATH_EMULATION diff -Naupr linux-2.6.14-git7.64k/arch/ppc64/Kconfig linux-2.6.14-git7.64k.work/arch/ppc64/Kconfig --- linux-2.6.14-git7.64k/arch/ppc64/Kconfig 2005-11-07 18:38:50.000000000 +0000 +++ linux-2.6.14-git7.64k.work/arch/ppc64/Kconfig 2005-11-07 21:36:42.000000000 +0000 @@ -56,6 +56,7 @@ config PPC_STD_MMU # max order + 1 config FORCE_MAX_ZONEORDER int + default "9" if PPC_64K_PAGES default "13" source "init/Kconfig" From linas at austin.ibm.com Tue Nov 8 09:03:14 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 16:03:14 -0600 Subject: [PATCH 2/7]: Revised [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver In-Reply-To: <436FC9D0.4060803@us.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> <20051107213003.GI19593@austin.ibm.com> <436FC9D0.4060803@us.ibm.com> Message-ID: <20051107220314.GP19593@austin.ibm.com> On Mon, Nov 07, 2005 at 03:40:32PM -0600, Brian King was heard to remark: > linas wrote: > > +/** ipr_eeh_slot_reset - called when pci slot has been reset. > > + * > > + * This routine is called by the pci error recovery recovery > > + * code after the PCI slot has been reset, just before we > > + * should resume normal operations. 
> > + */
> > +static pers_result_t ipr_eeh_slot_reset(struct pci_dev *pdev)
> > +{
> > +	unsigned long flags = 0;
> > +	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
> > +
> > +	// pci_enable_device(pdev);
> > +	// pci_set_master(pdev);
>
> I assume you want remove these two lines... The pci config space
> restore in ipr's reset handling should cover them.

Yes, I do. It's cruft left over from old test and debug cycles. :(

From paulus at samba.org Tue Nov 8 09:16:37 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 8 Nov 2005 09:16:37 +1100
Subject: [PATCH] powerpc: Nicer printing of address at oops
In-Reply-To: <20051107164025.GE7166@pb15.lixom.net>
References: <20051106220439.GA7166@pb15.lixom.net> <17262.34658.917350.594965@cargo.ozlabs.ibm.com> <20051106225435.GB7166@pb15.lixom.net> <20051107164025.GE7166@pb15.lixom.net>
Message-ID: <17263.53829.671189.936419@cargo.ozlabs.ibm.com>

Olof Johansson writes:

> On Mon, Nov 07, 2005 at 02:08:38PM +0100, Segher Boessenkool wrote:
>
> > You can use "%0*lx". Or just use "%p".
>
> I hope everyone's happy now. Andrew, please apply.

Nak - I was quite happy with the previous version, and I have put it
in my tree, which I'll ask Linus to pull shortly.

Paul.
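For readers unfamiliar with the "%0*lx" form quoted above: the '*' takes the field width from an int argument, so a single format string can zero-pad an address to the native word size. The following is a userspace sketch using standard C snprintf rather than printk. format_addr() is a made-up name, and using two hex digits per byte of unsigned long is an assumption about the intended width, not something taken from the patch under discussion:

```c
#include <stdio.h>

/* Hypothetical helper: format addr as zero-padded hex, with the width
 * derived from the size of unsigned long (8 digits on 32-bit targets,
 * 16 digits on 64-bit targets). Returns what snprintf returns. */
int format_addr(char *buf, size_t len, unsigned long addr)
{
	int width = (int)(2 * sizeof(unsigned long));

	/* '*' consumes an int argument as the minimum field width;
	 * the '0' flag pads with zeros instead of spaces. */
	return snprintf(buf, len, "%0*lx", width, addr);
}
```

"%p" avoids the explicit width argument altogether, at the cost of leaving the exact output format to the implementation.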
From linas at austin.ibm.com Tue Nov 8 09:43:38 2005 From: linas at austin.ibm.com (linas) Date: Mon, 7 Nov 2005 16:43:38 -0600 Subject: [PATCH 1/7]: PCI revised (3) [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: <20051107213729.GA24700@kroah.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> <20051107200352.GB22524@kroah.com> <20051107212128.GH19593@austin.ibm.com> <20051107213729.GA24700@kroah.com> Message-ID: <20051107224338.GQ19593@austin.ibm.com> On Mon, Nov 07, 2005 at 01:37:29PM -0800, Greg KH was heard to remark: > On Mon, Nov 07, 2005 at 03:21:28PM -0600, linas wrote: > > +typedef int __bitwise pci_channel_state_t; > > You don't have to use an enum anymore, just use a #define. Per Linus's remarks about namespace pollution, I've kept the enums. > > +typedef int __bitwise pers_result_t; > > You should at least keep "pci" at the beginning to make it make > more sense to people looking at it for the first time. PCI_ERS and pci_ers, then. I'm feeling like a blinkin' spammer, splatting out all these emails. --linas PCI Error Recovery: header file patch Change enums and subroutine signatures to be strongly typed, per recent discussion with GregKH. Also, change the acronym to the more unique, less generic "PCI-ERS" "PCI Error Recovery System". Please apply. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-mm1/include/linux/pci.h =================================================================== --- linux-2.6.14-mm1.orig/include/linux/pci.h 2005-11-07 13:55:28.000000000 -0600 +++ linux-2.6.14-mm1/include/linux/pci.h 2005-11-07 16:34:29.790592784 -0600 @@ -82,10 +82,12 @@ * the pci device. If some PCI bus between here and the pci device * has crashed or locked up, this info is reflected here. 
*/ +typedef int __bitwise pci_channel_state_t; + enum pci_channel_state { - pci_channel_io_normal = 0, /* I/O channel is in normal state */ - pci_channel_io_frozen = 1, /* I/O to channel is blocked */ - pci_channel_io_perm_failure, /* PCI card is dead */ + pci_channel_io_normal = (__force pci_channel_state_t) 1, /* I/O channel is in normal state */ + pci_channel_io_frozen = (__force pci_channel_state_t) 2, /* I/O to channel is blocked */ + pci_channel_io_perm_failure = (__force pci_channel_state_t) 3, /* PCI card is dead */ }; /* @@ -121,7 +123,7 @@ this is D0-D3, D0 being fully functional, and D3 being off. */ - enum pci_channel_state error_state; /* current connectivity state */ + pci_channel_state_t error_state; /* current connectivity state */ struct device dev; /* Generic device interface */ /* device is compatible with these IDs */ @@ -245,35 +247,46 @@ }; /* ---------------------------------------------------------------- */ -/** PCI error recovery infrastructure. If a PCI device driver provides +/** PCI Error Recovery System (PCI-ERS). If a PCI device driver provides * a set fof callbacks in struct pci_error_handlers, then that device driver * will be notified of PCI bus errors, and will be driven to recovery * when an error occurs. */ -enum pcierr_result { - PCIERR_RESULT_NONE = 0, /* no result/none/not supported in device driver */ - PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */ - PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset. 
*/ - PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ - PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ +typedef int __bitwise pci_ers_result_t; + +enum pci_ers_result { + /* no result/none/not supported in device driver */ + PCI_ERS_RESULT_NONE = (__force pci_ers_result_t) 1, + + /* Device driver can recover without slot reset */ + PCI_ERS_RESULT_CAN_RECOVER = (__force pci_ers_result_t) 2, + + /* Device driver wants slot to be reset. */ + PCI_ERS_RESULT_NEED_RESET = (__force pci_ers_result_t) 3, + + /* Device has completely failed, is unrecoverable */ + PCI_ERS_RESULT_DISCONNECT = (__force pci_ers_result_t) 4, + + /* Device driver is fully recovered and operational */ + PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5, }; /* PCI bus error event callbacks */ struct pci_error_handlers { /* PCI bus error detected on this device */ - int (*error_detected)(struct pci_dev *dev, - enum pci_channel_state error); + pci_ers_result_t (*error_detected)(struct pci_dev *dev, + pci_channel_state_t error); /* MMIO has been re-enabled, but not DMA */ - int (*mmio_enabled)(struct pci_dev *dev); + pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev); /* PCI Express link has been reset */ - int (*link_reset)(struct pci_dev *dev); + pci_ers_result_t (*link_reset)(struct pci_dev *dev); /* PCI slot has been reset */ - int (*slot_reset)(struct pci_dev *dev); + pci_ers_result_t (*slot_reset)(struct pci_dev *dev); /* Device driver may resume normal operations */ void (*resume)(struct pci_dev *dev); From greg at kroah.com Tue Nov 8 09:54:29 2005 From: greg at kroah.com (Greg KH) Date: Mon, 7 Nov 2005 14:54:29 -0800 Subject: [PATCH 1/7]: PCI revised (2) [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: References: <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> 
<20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> <20051107200352.GB22524@kroah.com> <20051107212128.GH19593@austin.ibm.com> <20051107213729.GA24700@kroah.com> Message-ID: <20051107225429.GD27787@kroah.com> On Mon, Nov 07, 2005 at 01:54:35PM -0800, Linus Torvalds wrote: > On Mon, 7 Nov 2005, Greg KH wrote: > > > enum pci_channel_state { > > > - pci_channel_io_normal = 0, /* I/O channel is in normal state */ > > > - pci_channel_io_frozen = 1, /* I/O to channel is blocked */ > > > - pci_channel_io_perm_failure, /* PCI card is dead */ > > > + pci_channel_io_normal = (__force pci_channel_state_t) 0, /* I/O channel is in normal state */ > > > + pci_channel_io_frozen = (__force pci_channel_state_t) 1, /* I/O to channel is blocked */ > > > + pci_channel_io_perm_failure = (__force pci_channel_state_t) 2, /* PCI card is dead */ > > > }; > > > > You don't have to use an enum anymore, just use a #define. > > The enum works fine, though, and has less namespace pollution than a > #define, so sometimes an enum can be preferred. Good point. > > Sparse developers, I see code in the kernel that that does both > > (__force foo_t) and (foo_t __force). Which one is correct? > > sparse doesn't care. Whatever scans better for humans. Attributes like > "force" parse the same way things like "const" and "volatile" parses, and > while most people _tend_ to write "const int", it's not incorrect to write > "int const". Same with "__attribute__((force))", aka __force. Ok, thanks for clearing this up. 
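To make the two spellings concrete, here is a small sketch of the pattern under discussion. The __bitwise and __force definitions below are assumptions, modelled on the kernel's approach of enabling them only when sparse (__CHECKER__) is running. demo_state_t and the other demo_* names are invented for illustration and do not come from the patch:

```c
/* Assumed definitions: only sparse understands these attributes, so
 * they expand to nothing for an ordinary compiler. */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

/* A distinct integer type that sparse refuses to mix with plain int. */
typedef int __bitwise demo_state_t;

enum demo_state {
	/* attribute written before the type name ... */
	demo_io_normal = (__force demo_state_t) 1,
	/* ... or after it; per the reply above, both parse the same way */
	demo_io_frozen = (demo_state_t __force) 2,
};

/* Hypothetical consumer that takes the strongly typed value. */
int demo_is_frozen(demo_state_t state)
{
	return state == (__force demo_state_t) demo_io_frozen;
}
```

With a normal compiler both casts reduce to a plain (demo_state_t) cast, so the choice between the two orderings is purely about readability.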
greg k-h From greg at kroah.com Tue Nov 8 09:53:08 2005 From: greg at kroah.com (Greg KH) Date: Mon, 7 Nov 2005 14:53:08 -0800 Subject: [PATCH 1/7]: PCI revised (3) [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: <20051107224338.GQ19593@austin.ibm.com> References: <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> <20051107200352.GB22524@kroah.com> <20051107212128.GH19593@austin.ibm.com> <20051107213729.GA24700@kroah.com> <20051107224338.GQ19593@austin.ibm.com> Message-ID: <20051107225308.GB27787@kroah.com> On Mon, Nov 07, 2005 at 04:43:38PM -0600, linas wrote: > On Mon, Nov 07, 2005 at 01:37:29PM -0800, Greg KH was heard to remark: > > On Mon, Nov 07, 2005 at 03:21:28PM -0600, linas wrote: > > > +typedef int __bitwise pci_channel_state_t; > > > > You don't have to use an enum anymore, just use a #define. > > Per Linus's remarks about namespace pollution, I've kept the enums. That's fine. > > > +typedef int __bitwise pers_result_t; > > > > You should at least keep "pci" at the beginning to make it make > > more sense to people looking at it for the first time. > > PCI_ERS and pci_ers, then. Sounds good. > I'm feeling like a blinkin' spammer, splatting out all these emails. Care to just resend the whole series over again? No "patch on top of patch" stuff is needed here. 
thanks,

greg k-h

From linas at austin.ibm.com Tue Nov 8 10:19:55 2005
From: linas at austin.ibm.com (linas)
Date: Mon, 7 Nov 2005 17:19:55 -0600
Subject: [PATCH 1/7]: PCI revised (3) [PATCH 16/42]: PCI: PCI Error reporting callbacks
In-Reply-To: <20051107225308.GB27787@kroah.com>
References: <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> <20051107200352.GB22524@kroah.com> <20051107212128.GH19593@austin.ibm.com> <20051107213729.GA24700@kroah.com> <20051107224338.GQ19593@austin.ibm.com> <20051107225308.GB27787@kroah.com>
Message-ID: <20051107231955.GR19593@austin.ibm.com>

On Mon, Nov 07, 2005 at 02:53:08PM -0800, Greg KH was heard to remark:
> > I'm feeling like a blinkin' spammer, splatting out all these emails.
>
> Care to just resend the whole series over again? No "patch on top of
> patch" stuff is needed here.

So that I can avoid that spammin' feelin' ...

I'll send patches against -git10, then, so as to start with a clean
slate; unless you wanted something against -mm1?

"The whole series": do you want all 42 patches? Or just the seven
discussed today?

-----
In the series-of-42, the staging of some of the patches in the middle
requires simultaneous updates to both drivers/pci/hotplug and
arch/powerpc/xxx; otherwise, build breaks result. I am not sure how to
handle that: the obvious solution is to split these up... but that will
probably result in a bigger series, and was not a step I wanted to take
unless someone asked...
--linas

From michael at ellerman.id.au Tue Nov 8 10:32:26 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Tue, 8 Nov 2005 10:32:26 +1100
Subject: [PATCH 9/12] powerpc: Export htab start/end via device tree
In-Reply-To: <1131369333.5976.62.camel@localhost>
References: <20051107130707.018F7686E6@ozlabs.org> <1131369333.5976.62.camel@localhost>
Message-ID: <200511081032.30434.michael@ellerman.id.au>

On Tue, 8 Nov 2005 00:15, Dave Hansen wrote:
> On Tue, 2005-11-08 at 00:07 +1100, Michael Ellerman wrote:
> > +#ifdef CONFIG_KEXEC
> > +	kexec_setup();	/* requires unflattened device tree. */
> > +#endif
>
> Would this #ifdef be more appropriate in the header where this
> function's prototype currently resides?

Do you mean something like this in kexec.h?

#ifdef CONFIG_KEXEC
extern void kexec_setup(void);
#else
static inline void kexec_setup(void) { }
#endif

If so I prefer having the ifdef around the actual call site, because
that makes it clear to someone reading that code that we only call
kexec_setup() when CONFIG_KEXEC is true. With the second method they
have to open kexec.h and notice that with CONFIG_KEXEC=n kexec_setup()
is a no-op.

cheers

-- 
Michael Ellerman
IBM OzLabs

email: michael:ellerman.id.au
inmsg: mpe:jabber.org
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051108/6439df89/attachment.pgp From michael at ellerman.id.au Tue Nov 8 10:48:32 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 8 Nov 2005 10:48:32 +1100 Subject: [PATCH 11/12] powerpc: Add arch-dependant copy_oldmem_page In-Reply-To: <1131369446.5976.64.camel@localhost> References: <20051107130713.DC11A686F6@ozlabs.org> <1131369446.5976.64.camel@localhost> Message-ID: <200511081048.36700.michael@ellerman.id.au> On Tue, 8 Nov 2005 00:17, Dave Hansen wrote: > On Tue, 2005-11-08 at 00:07 +1100, Michael Ellerman wrote: > > --- kexec.orig/include/asm-powerpc/kexec.h > > +++ kexec/include/asm-powerpc/kexec.h > > @@ -30,6 +30,8 @@ > > #define KEXEC_ARCH KEXEC_ARCH_PPC > > #endif > > > > +#define HAVE_ARCH_COPY_OLDMEM_PAGE > > + > > #ifndef __ASSEMBLY__ > > Isn't something like that more properly done in Kconfig? I find it very > hard to trace down exactly what CONFIG options it takes to get some of > those #defines to happen. Kconfig makes it much more clear. That's a bit of a hack at the moment, you might be right that Kconfig would be better. I believe there's some guys working on Kdump for x86-64 that have already got the generic infrastructure for this in place, so we'll wait and see what they've done. cheers -- Michael Ellerman IBM OzLabs email: michael:ellerman.id.au inmsg: mpe:jabber.org wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051108/18b966fb/attachment.pgp From benh at kernel.crashing.org Tue Nov 8 11:21:05 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 08 Nov 2005 11:21:05 +1100 Subject: [PATCH] ppc64: Fix the lazy icache/dcache code for non-RAM pages Message-ID: <1131409266.4652.72.camel@gaston> For some stupid reason I can't explain (brown paper bag is at hand), I removed the check pfn_valid() in the code that does the icache/dcache coherency on POWER4 and later. That causes us to eventually try to access non existing struct page when hashing in IO pages. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/powerpc/mm/hash_utils_64.c =================================================================== --- linux-work.orig/arch/powerpc/mm/hash_utils_64.c 2005-11-08 11:00:17.000000000 +1100 +++ linux-work/arch/powerpc/mm/hash_utils_64.c 2005-11-08 11:06:39.000000000 +1100 @@ -514,6 +514,9 @@ { struct page *page; + if (!pfn_valid(pte_pfn(pte))) + return pp; + page = pte_page(pte); /* page is dirty */ From kravetz at us.ibm.com Tue Nov 8 11:25:48 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Mon, 7 Nov 2005 16:25:48 -0800 Subject: [PATCH 1/4] revised Memory Add Fixes for ppc64 In-Reply-To: <1131149070.29195.41.camel@gaston> References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> <1131149070.29195.41.camel@gaston> Message-ID: <20051108002548.GF5821@w-mikek2.ibm.com> On Sat, Nov 05, 2005 at 11:04:30AM +1100, Benjamin Herrenschmidt wrote: > This patch will have to be slightly reworked on top of the 64k pages > one. It should be trivial though. Here is a new version of the patch on top of 64k page support (actually 2.6.14-git10). One filename also changed due to more merge changes. 
Add the create_section_mapping() routine to create hptes for memory sections dynamically added after system boot. Signed-off-by: Mike Kravetz diff -Naupr linux-2.6.14-git10/arch/powerpc/mm/hash_utils_64.c linux-2.6.14-git10.work/arch/powerpc/mm/hash_utils_64.c --- linux-2.6.14-git10/arch/powerpc/mm/hash_utils_64.c 2005-11-08 00:04:15.784924264 +0000 +++ linux-2.6.14-git10.work/arch/powerpc/mm/hash_utils_64.c 2005-11-08 00:06:46.992964608 +0000 @@ -385,6 +385,15 @@ static unsigned long __init htab_get_tab return pteg_count << 7; } +#ifdef CONFIG_MEMORY_HOTPLUG +void create_section_mapping(unsigned long start, unsigned long end) +{ + BUG_ON(htab_bolt_mapping(start, end, start, + _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_COHERENT | PP_RWXX, + mmu_linear_psize)); +} +#endif /* CONFIG_MEMORY_HOTPLUG */ + void __init htab_initialize(void) { unsigned long table, htab_size_bytes; diff -Naupr linux-2.6.14-git10/arch/powerpc/mm/mem.c linux-2.6.14-git10.work/arch/powerpc/mm/mem.c --- linux-2.6.14-git10/arch/powerpc/mm/mem.c 2005-11-08 00:04:15.798922136 +0000 +++ linux-2.6.14-git10.work/arch/powerpc/mm/mem.c 2005-11-08 00:06:46.993964456 +0000 @@ -127,6 +127,9 @@ int __devinit add_memory(u64 start, u64 unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; + start += KERNELBASE; + create_section_mapping(start, start + size); + /* this should work for most non-highmem platforms */ zone = pgdata->node_zones; diff -Naupr linux-2.6.14-git10/include/asm-powerpc/sparsemem.h linux-2.6.14-git10.work/include/asm-powerpc/sparsemem.h --- linux-2.6.14-git10/include/asm-powerpc/sparsemem.h 2005-11-08 00:04:28.486988472 +0000 +++ linux-2.6.14-git10.work/include/asm-powerpc/sparsemem.h 2005-11-08 00:07:39.138891344 +0000 @@ -11,6 +11,10 @@ #define MAX_PHYSADDR_BITS 38 #define MAX_PHYSMEM_BITS 36 +#ifdef CONFIG_MEMORY_HOTPLUG +extern void create_section_mapping(unsigned long start, unsigned long end); +#endif /* CONFIG_MEMORY_HOTPLUG */ + #endif /* 
CONFIG_SPARSEMEM */ #endif /* _ASM_POWERPC_SPARSEMEM_H */ From benh at kernel.crashing.org Tue Nov 8 11:35:00 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 08 Nov 2005 11:35:00 +1100 Subject: [PATCH 1/4] Memory Add Fixes for ppc64 In-Reply-To: <20051107214859.GD5821@w-mikek2.ibm.com> References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> <1131149070.29195.41.camel@gaston> <20051107204743.GC5821@w-mikek2.ibm.com> <1131397976.4652.52.camel@gaston> <20051107214859.GD5821@w-mikek2.ibm.com> Message-ID: <1131410101.4652.75.camel@gaston> On Mon, 2005-11-07 at 13:48 -0800, Mike Kravetz wrote: > On Tue, Nov 08, 2005 at 08:12:56AM +1100, Benjamin Herrenschmidt wrote: > > Yes, the MAX_ORDER should be different indeed. But can Kconfig do that ? > > That is have the default value be different based on a Kconfig option ? > > I don't see that ... We may have to do things differently here... > > This seems to be done in other parts of the Kconfig file. Using those > as an example, this should keep the MAX_ORDER block size at 16MB. Ok, I verified it does the right thing with Kconfig, thanks. Paul, can you add to the merge tree too ? Ben. From anton at samba.org Tue Nov 8 11:39:01 2005 From: anton at samba.org (Anton Blanchard) Date: Tue, 8 Nov 2005 11:39:01 +1100 Subject: [PATCH 0/4] Memory Add Fixes for ppc64 In-Reply-To: <20051104231552.GA25545@w-mikek2.ibm.com> References: <20051104231552.GA25545@w-mikek2.ibm.com> Message-ID: <20051108003901.GO12353@krispykreme> Hi Mike, > When memory add was merged into mainline in 2.6.14, there were > various bits and pieces missing that prevent it from working on > ppc64. The following patches are against 2.6.14-git7 and address > all but one of the know issues. 
>
> 1) Create hptes for new sections
> 2) Clear page count before freeing new pages
> 3) Kludge to add new memory to node 0
> 4) Ensure probe file is created for memory add via sysfs

I've got a patch that reworks our numa code and it might reject with
your stuff. I'll send them out for review this afternoon.

Anton

From rdunlap at xenotime.net Tue Nov 8 06:04:29 2005
From: rdunlap at xenotime.net (Randy.Dunlap)
Date: Mon, 7 Nov 2005 11:04:29 -0800 (PST)
Subject: typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks]
In-Reply-To: <20051107185621.GD19593@austin.ibm.com>
References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com>
Message-ID:

On Mon, 7 Nov 2005, linas wrote:

> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
>
> > > 3) really strong typing that sparse can detect.
>
> Am compiling now.
>
> > enums don't really work, as you can get away with using an integer and
> > the compiler will never complain. Please use a typedef (yeah, I said
> > typedef) in the way that sparse will catch any bad users of the code.
>
> How about typedef'ing structs?

No no no. (I feel sure that you will get plenty of responses.)

> I'm not too clear on what "sparse" can do; however, in the good old days,
> gcc allowed you to commit great sins when passing "struct blah *" to
> subroutines, whereas it stopped you cold if you tried the same trick
> with a typedef'ed "blah_t *". This got me into the habit of turning
> all structs into typedefs in my personal projects. Can we expect
> something similar for the kernel, and in particular, should we start
> typedefing structs now?

No no no.

> (Documentation/CodingStyle doesn't mention typedef at all).

We can submit patches for that.
Basically (generally) we never want a struct to be typedef-ed. (There may be a couple of exceptions to this.) We do allow a very few basic types to be typedef-ed, as long as the basic type (e.g., pid_t) is also a C language basic type or the typedef is useful for strong type checking. -- ~Randy From torvalds at osdl.org Tue Nov 8 08:54:35 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Mon, 7 Nov 2005 13:54:35 -0800 (PST) Subject: [PATCH 1/7]: PCI revised (2) [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: <20051107213729.GA24700@kroah.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> <20051107200352.GB22524@kroah.com> <20051107212128.GH19593@austin.ibm.com> <20051107213729.GA24700@kroah.com> Message-ID: On Mon, 7 Nov 2005, Greg KH wrote: > > > enum pci_channel_state { > > - pci_channel_io_normal = 0, /* I/O channel is in normal state */ > > - pci_channel_io_frozen = 1, /* I/O to channel is blocked */ > > - pci_channel_io_perm_failure, /* PCI card is dead */ > > + pci_channel_io_normal = (__force pci_channel_state_t) 0, /* I/O channel is in normal state */ > > + pci_channel_io_frozen = (__force pci_channel_state_t) 1, /* I/O to channel is blocked */ > > + pci_channel_io_perm_failure = (__force pci_channel_state_t) 2, /* PCI card is dead */ > > }; > > You don't have to use an enum anymore, just use a #define. The enum works fine, though, and has less namespace pollution than a #define, so sometimes an enum can be preferred. HOWEVER. For sanity, if possible please avoid using the value "0". It's magic for __bitwise, in that a zero is always acceptable as a bitwise thing (which makes sense if you think of bitwise as being about bits: the zero representation is totally independent of any bit ordering). 
So it's better to start counting from 1 if possible.

> Sparse developers, I see code in the kernel that does both
> (__force foo_t) and (foo_t __force). Which one is correct?

sparse doesn't care. Whatever scans better for humans. Attributes like
"force" parse the same way things like "const" and "volatile" parse, and
while most people _tend_ to write "const int", it's not incorrect to write
"int const". Same with "__attribute__((force))", aka __force.

		Linus

From kravetz at us.ibm.com Tue Nov 8 11:48:31 2005
From: kravetz at us.ibm.com (Mike Kravetz)
Date: Mon, 7 Nov 2005 16:48:31 -0800
Subject: [PATCH 0/4] Memory Add Fixes for ppc64
In-Reply-To: <20051108003901.GO12353@krispykreme>
References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051108003901.GO12353@krispykreme>
Message-ID: <20051108004831.GG5821@w-mikek2.ibm.com>

On Tue, Nov 08, 2005 at 11:39:01AM +1100, Anton Blanchard wrote:
> I've got a patch that reworks our numa code and it might reject with
> your stuff. I'll send them out for review this afternoon.

Interesting, in that I was going to start reworking some of the numa code
to make it play nice with hot add. Doubt this patch set will impact your
changes. This set is not very intelligent WRT numa and doesn't really
modify any of the real code.

-- 
Mike

From paulus at samba.org Tue Nov 8 13:07:13 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 8 Nov 2005 13:07:13 +1100
Subject: [PATCH 1/4] revised Memory Add Fixes for ppc64
In-Reply-To: <20051108002548.GF5821@w-mikek2.ibm.com>
References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> <1131149070.29195.41.camel@gaston> <20051108002548.GF5821@w-mikek2.ibm.com>
Message-ID: <17264.2129.341199.334838@cargo.ozlabs.ibm.com>

Mike Kravetz writes:

> Here is a new version of the patch on top of 64k page support (actually
> 2.6.14-git10). One filename also changed due to more merge changes.
So, should I send this on to Linus along with the original 2/4 and 3/4 you posted and the revised 4/4? Paul. From rostedt at goodmis.org Tue Nov 8 12:11:13 2005 From: rostedt at goodmis.org (Steven Rostedt) Date: Mon, 07 Nov 2005 20:11:13 -0500 Subject: typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] In-Reply-To: <20051107204136.GG19593@austin.ibm.com> References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> Message-ID: <1131412273.14381.142.camel@localhost.localdomain> On Mon, 2005-11-07 at 14:41 -0600, linas wrote: > > > > Also, "grep typedef include/linux/*" shows that many kernel device > > > drivers use this convention. > > > > They are wrong and should be fixed. > > What, precisely, is wrong? I can't seem to find it on google, but IIRC Linus stated that he didn't want any more structures defined with typedefs. If it is a structure, simple keep it one, and don't use typedef to get rid of "struct". This was for the simple reason, too many developers were passing structures by value instead of by reference, just because they were using a type that they didn't realize was a structure. And to make things worse, these structures started to get bigger. So in my every day programming, I switched to not typedef structures anymore, and I even found some places that I passed structures by value when it would have been much more efficient by reference. The only exceptions that I can see where you typedef a structure is for use with arch dependent types, like atomic_t. 
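A minimal sketch of the by-value problem described above, assuming a hypothetical struct big_state (the name, the 4096-byte payload and the helper functions are made up for illustration): once the typedef hides the word "struct", the by-value prototype looks as cheap as passing an int.

```c
#include <stddef.h>

/* Hypothetical structure that has grown large over time. */
struct big_state {
	char payload[4096];
	int id;
};

/* The typedef hides that big_state_t is a structure at all. */
typedef struct big_state big_state_t;

/* Looks harmless, but the whole structure is copied on every call. */
int id_by_value(big_state_t s)
{
	return s.id;
}

/* Spelling out "struct" makes the cost visible; only a pointer is passed. */
int id_by_ref(const struct big_state *s)
{
	return s->id;
}

/* Bytes handed to the callee in each style. */
size_t bytes_passed_by_value(void)
{
	return sizeof(struct big_state);
}

size_t bytes_passed_by_ref(void)
{
	return sizeof(struct big_state *);
}
```

Both functions return the same id; the difference is only in how much data is copied per call.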
-- Steve From neilb at suse.de Tue Nov 8 12:18:42 2005 From: neilb at suse.de (Neil Brown) Date: Tue, 8 Nov 2005 12:18:42 +1100 Subject: typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] In-Reply-To: message from Steven Rostedt on Monday November 7 References: <20051103235918.GA25616@mail.gnucash.org> <20051104005035.GA26929@mail.gnucash.org> <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> Message-ID: <17263.64754.79733.651186@cse.unsw.edu.au> On Monday November 7, rostedt at goodmis.org wrote: > > This was for the simple reason, too many developers were passing > structures by value instead of by reference, just because they were > using a type that they didn't realize was a structure. And to make > things worse, these structures started to get bigger. > Another reason for not using typedefs is that if you do, and you want to refer to the structure in some other include file, you have to #include the include file that defines the structure. If you don't use typedefs, you can just say: struct foo; and the compiler will happily wait for the complete definition later (providing it doesn't need the size in the meanwhile). So avoiding typedef means that you can sometimes avoid excess #includes, which means faster compiling. NeilBrown From elbert at us.ibm.com Tue Nov 8 13:15:07 2005 From: elbert at us.ibm.com (Elbert C Hu) Date: Mon, 7 Nov 2005 21:15:07 -0500 Subject: 2.6.14 UP kernel build Message-ID: I was unable to build UP kernel from latest 2.6.14 release. May I ask, is UP kernel still supported? Thanks. E.
Hu From kravetz at us.ibm.com Tue Nov 8 14:02:07 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Mon, 7 Nov 2005 19:02:07 -0800 Subject: [PATCH 1/4] revised Memory Add Fixes for ppc64 In-Reply-To: <17264.2129.341199.334838@cargo.ozlabs.ibm.com> References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> <1131149070.29195.41.camel@gaston> <20051108002548.GF5821@w-mikek2.ibm.com> <17264.2129.341199.334838@cargo.ozlabs.ibm.com> Message-ID: <20051108030207.GA6845@w-mikek2.ibm.com> On Tue, Nov 08, 2005 at 01:07:13PM +1100, Paul Mackerras wrote: > So, should I send this on to Linus along with the original 2/4 and 3/4 > you posted and the revised 4/4? Yes, those should provide basic memory add support for ppc64. Thanks, -- Mike From greg at kroah.com Tue Nov 8 13:43:45 2005 From: greg at kroah.com (Greg KH) Date: Mon, 7 Nov 2005 18:43:45 -0800 Subject: [PATCH 1/7]: PCI revised (3) [PATCH 16/42]: PCI: PCI Error reporting callbacks In-Reply-To: <20051107231955.GR19593@austin.ibm.com> References: <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107195727.GF19593@austin.ibm.com> <20051107200352.GB22524@kroah.com> <20051107212128.GH19593@austin.ibm.com> <20051107213729.GA24700@kroah.com> <20051107224338.GQ19593@austin.ibm.com> <20051107225308.GB27787@kroah.com> <20051107231955.GR19593@austin.ibm.com> Message-ID: <20051108024344.GA538@kroah.com> On Mon, Nov 07, 2005 at 05:19:55PM -0600, linas wrote: > On Mon, Nov 07, 2005 at 02:53:08PM -0800, Greg KH was heard to remark: > > > I'm feeling like a blinkin' spammer, splatting out all these emails. > > > > Care to just resend the whole series over again? No "patch on top of > > patch" stuff is needed here. > > So that I can avoid that spammin' feelin' ... > > I'll send patches against -git10, then, so as to start with a clean > slate; unless you wanted something aginst -mm1? -git10 would be great. 
> "The whole series": do you want all 42 patches? Or just the seven > discussed today? Just the 7 discussed. The others should go to their proper maintainers (which I am not.) > ----- > > In the series-of-42, the staging of some of the patches in the > middle require simultaneous update to both the drivers/pci/hotplug > and the arch/powerpc/xxx; otherwise, build breaks result. I am > not sure how to handle that: the obvious solution is to split these > up... but that will probably result in a bigger series, and was > not a step I wanted to take unless someone asked... The drivers/pci/hotplug/ stuff only touches the rpaphp driver, right? If so, I don't have a problem with Paul/Ben sending those on with the other PPC64 changes to keep everything building properly for your arch. thanks, greg k-h From david at gibson.dropbear.id.au Tue Nov 8 14:41:47 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 8 Nov 2005 14:41:47 +1100 Subject: ppc64: Make hash_preload() and update_mmu_cache() cope with hugepages Message-ID: <20051108034147.GA14336@localhost.localdomain> Paulus, please apply. At present, hash_preload() (and hence update_mmu_cache()) will not work correctly if called on a hugepage address, thus relying on the fact that the hugepage fault paths never call update_mmu_cache(). I'm not 100% sure that's safe now (although I think it is), and it certainly won't be safe for some of the places we want to go with hugepage. Thus, this patch extends hash_preload() to work correctly on hugepage addresses. 
Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/mm/hash_utils_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/hash_utils_64.c 2005-11-08 11:11:29.000000000 +1100 +++ working-2.6/arch/powerpc/mm/hash_utils_64.c 2005-11-08 12:14:09.000000000 +1100 @@ -638,6 +638,7 @@ cpumask_t mask; unsigned long flags; int local = 0; + int huge = in_hugepage_area(mm->context, ea); /* We don't want huge pages prefaulted for now */ @@ -651,9 +652,11 @@ pgdir = mm->pgd; if (pgdir == NULL) return; - ptep = find_linux_pte(pgdir, ea); - if (!ptep) - return; + if (likely(!huge)) { + ptep = find_linux_pte(pgdir, ea); + if (!ptep) + return; + } vsid = get_vsid(mm->context.id, ea); /* Hash it in */ @@ -661,6 +664,9 @@ mask = cpumask_of_cpu(smp_processor_id()); if (cpus_equal(mm->cpu_vm_mask, mask)) local = 1; + if (unlikely(huge)) + hash_huge_page(mm, access, ea, vsid, local); + else #ifndef CONFIG_PPC_64K_PAGES __hash_page_4K(ea, access, vsid, ptep, trap, local); #else -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson From horms at verge.net.au Mon Nov 7 20:03:28 2005 From: horms at verge.net.au (Horms) Date: Mon, 7 Nov 2005 18:03:28 +0900 Subject: SOFTWARE_SUSPEND requires PM on PPC? Message-ID: <20051107090328.GA9756@verge.net.au> Hi, I haven't entirely investigated this, but it seems that CONFIG_PM needs to be set for CONFIG_SOFTWARE_SUSPEND to compile cleanly. For starters, swsusp_arch_suspend needs swsusp_save. Should SOFTWARE_SUSPEND require PM, or should I add some strategic #ifdef CONFIG_PM to the code? This is 2.6.14, but I think it's present in Linus' current git.
arch/ppc/kernel/built-in.o: In function `swsusp_arch_suspend': : undefined reference to `swsusp_save' arch/ppc/kernel/built-in.o: In function `swsusp_arch_resume': : undefined reference to `pagedir_nosave' arch/ppc/kernel/built-in.o: In function `swsusp_arch_resume': : undefined reference to `pagedir_nosave' arch/ppc/platforms/built-in.o: In function `pmac_late_init': : undefined reference to `pm_set_ops' Should SOFTWARE_SUSPEND require PM on PPC, or should the calls to swsusp_save, pagedir_nosave and pm_set_ops be guarded with #ifdef CONFIG_PM ? -- Horms From paulus at samba.org Tue Nov 8 15:05:45 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 8 Nov 2005 15:05:45 +1100 Subject: please pull powerpc-merge.git Message-ID: <17264.9241.956155.64385@cargo.ozlabs.ibm.com> Linus, Please do a pull from git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge.git to get a powerpc update - various bug fixes (including Ben's fix for the oops that Martin Bligh reported), some powermac and 8xx updates, and a bit more merging. Diffstat and shortlog follow. Thanks, Paul.
arch/powerpc/Kconfig | 9 arch/powerpc/configs/g5_defconfig | 261 +++++-- arch/powerpc/kernel/misc_64.S | 70 ++ arch/powerpc/kernel/process.c | 4 arch/powerpc/kernel/prom.c | 21 - arch/powerpc/kernel/prom_init.c | 11 arch/powerpc/kernel/rtas.c | 5 arch/powerpc/kernel/setup-common.c | 40 + arch/powerpc/kernel/setup_32.c | 1 arch/powerpc/kernel/setup_64.c | 46 - arch/powerpc/kernel/signal_32.c | 1 arch/powerpc/kernel/signal_64.c | 1 arch/powerpc/kernel/smp.c | 1 arch/powerpc/kernel/time.c | 5 arch/powerpc/kernel/traps.c | 11 arch/powerpc/lib/locks.c | 1 arch/powerpc/mm/fault.c | 17 arch/powerpc/mm/hash_utils_64.c | 10 arch/powerpc/mm/init_64.c | 1 arch/powerpc/mm/mem.c | 2 arch/powerpc/mm/numa.c | 1 arch/powerpc/mm/pgtable_64.c | 1 arch/powerpc/oprofile/op_model_power4.c | 24 + arch/powerpc/platforms/iseries/irq.c | 5 arch/powerpc/platforms/iseries/pci.c | 37 - arch/powerpc/platforms/iseries/setup.c | 4 arch/powerpc/platforms/iseries/smp.c | 1 arch/powerpc/platforms/powermac/Makefile | 3 arch/powerpc/platforms/powermac/cpufreq_32.c | 15 arch/powerpc/platforms/powermac/cpufreq_64.c | 323 ++++++++ arch/powerpc/platforms/powermac/setup.c | 13 arch/powerpc/platforms/pseries/iommu.c | 2 arch/powerpc/platforms/pseries/lpar.c | 3 arch/powerpc/platforms/pseries/plpar_wrappers.h | 10 arch/powerpc/platforms/pseries/ras.c | 2 arch/powerpc/platforms/pseries/setup.c | 18 arch/powerpc/sysdev/i8259.c | 5 arch/powerpc/sysdev/u3_iommu.c | 1 arch/ppc/kernel/misc.S | 145 +++- arch/ppc/kernel/traps.c | 12 arch/ppc/syslib/m8xx_wdt.c | 13 arch/ppc/syslib/prom.c | 4 arch/ppc/xmon/xmon.c | 5 arch/ppc64/Kconfig | 11 arch/ppc64/Kconfig.debug | 4 arch/ppc64/kernel/idle.c | 1 arch/ppc64/kernel/machine_kexec.c | 1 arch/ppc64/kernel/misc.S | 72 ++ arch/ppc64/kernel/pci.c | 17 arch/ppc64/kernel/prom.c | 25 + arch/ppc64/kernel/prom_init.c | 3 arch/ppc64/kernel/rtas_pci.c | 6 arch/ppc64/kernel/udbg.c | 55 - drivers/block/swim3.c | 20 - drivers/ide/ppc/pmac.c | 9 drivers/macintosh/Kconfig | 19 
drivers/macintosh/Makefile | 9 drivers/macintosh/smu.c | 174 ++++- drivers/macintosh/via-pmu.c | 10 drivers/macintosh/windfarm.h | 131 +++ drivers/macintosh/windfarm_core.c | 426 +++++++++++ drivers/macintosh/windfarm_cpufreq_clamp.c | 105 +++ drivers/macintosh/windfarm_lm75_sensor.c | 263 +++++++ drivers/macintosh/windfarm_pid.c | 145 ++++ drivers/macintosh/windfarm_pid.h | 84 ++ drivers/macintosh/windfarm_pm81.c | 879 +++++++++++++++++++++++ drivers/macintosh/windfarm_pm91.c | 814 +++++++++++++++++++++ drivers/macintosh/windfarm_smu_controls.c | 282 +++++++ drivers/macintosh/windfarm_smu_sensors.c | 479 +++++++++++++ fs/proc/proc_devtree.c | 57 + include/asm-powerpc/ide.h | 29 - include/asm-powerpc/machdep.h | 4 include/asm-powerpc/ppc-pci.h | 1 include/asm-powerpc/prom.h | 2 include/asm-powerpc/reg.h | 9 include/asm-powerpc/smp.h | 4 include/asm-powerpc/smu.h | 199 +++++ include/asm-powerpc/xmon.h | 1 include/asm-ppc/btext.h | 22 - include/asm-ppc/io.h | 12 include/asm-ppc/kgdb.h | 2 include/asm-ppc/prom.h | 2 include/asm-ppc64/ide.h | 30 - include/asm-ppc64/pci.h | 8 include/asm-ppc64/ppcdebug.h | 108 --- include/asm-ppc64/prom.h | 2 include/asm-ppc64/udbg.h | 3 include/linux/proc_fs.h | 9 88 files changed, 5134 insertions(+), 579 deletions(-) rename arch/powerpc/platforms/powermac/{cpufreq.c => cpufreq_32.c} (99%) create mode 100644 arch/powerpc/platforms/powermac/cpufreq_64.c create mode 100644 drivers/macintosh/windfarm.h create mode 100644 drivers/macintosh/windfarm_core.c create mode 100644 drivers/macintosh/windfarm_cpufreq_clamp.c create mode 100644 drivers/macintosh/windfarm_lm75_sensor.c create mode 100644 drivers/macintosh/windfarm_pid.c create mode 100644 drivers/macintosh/windfarm_pid.h create mode 100644 drivers/macintosh/windfarm_pm81.c create mode 100644 drivers/macintosh/windfarm_pm91.c create mode 100644 drivers/macintosh/windfarm_smu_controls.c create mode 100644 drivers/macintosh/windfarm_smu_sensors.c rename include/{asm-ppc/ide.h => 
asm-powerpc/ide.h} (78%) delete mode 100644 include/asm-ppc64/ide.h delete mode 100644 include/asm-ppc64/ppcdebug.h Anton Blanchard: ppc64: fix Memory: summary line ppc64: fix oprofile sample bit handling ppc64: remove some direct xmon calls Benjamin Herrenschmidt: ppc64: SMU based macs cpufreq support ppc64: SMU partition recovery ppc64: Update g5_defconfig for ARCH=powerpc ppc64: More U3 device-tree fixes ppc64: Thermal control for SMU based machines ppc: fix a bunch of warnings ppc: Fix ARCH=ppc build with xmon ppc: Fix PowerBook HD led on ARCH=powerpc ppc64: Fix the lazy icache/dcache code for non-RAM pages David Gibson: powerpc: Kill ppcdebug David Woodhouse: powerpc: Fix i8259 cascade IRQ powerpc: Fix ppc32 initrd John Rose: dlpar enable for OF pci probe Marcelo Tosatti: ppc32 8xx: fix m8xx_wdt accessor macro update ppc32: handle access to non-present IO ports on 8xx Michael Ellerman: powerpc: Make ppc_md.set_dabr non 64-bit specific Mike Kravetz: Memory Add Fixes for ppc64 Olof Johansson: powerpc: Nicer printing of address at oops Paul Mackerras: powerpc: Various UP build fixes Merge ../linux-2.6 Merge ../linux-2.6 powerpc: Fix typo in pmac_cpufreq_resume macintosh: Always export pmu_[un]register_sleep_notifier if CONFIG_PM set powermac: Use a spinlock in swim3.c (floppy driver) instead of cli Stephen Rothwell: powerpc: merge ide.h From david at gibson.dropbear.id.au Tue Nov 8 15:21:26 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 8 Nov 2005 15:21:26 +1100 Subject: ppc64: Make hash_preload() and update_mmu_cache() cope with hugepages In-Reply-To: <20051108034147.GA14336@localhost.localdomain> References: <20051108034147.GA14336@localhost.localdomain> Message-ID: <20051108042126.GF14336@localhost.localdomain> On Tue, Nov 08, 2005 at 02:41:47PM +1100, David Gibson wrote: > Paulus, please apply. 
> > At present, hash_preload() (and hence update_mmu_cache()) will not > work correctly if called on a hugepage address, thus relying on the > fact that the hugepage fault paths never call update_mmu_cache(). I'm > not 100% sure that's safe now (although I think it is), and it > certainly won't be safe for some of the places we want to go with > hugepage. > > Thus, this patch extends hash_preload() to work correctly on hugepage > addresses. Bleh, sorry Paul, don't apply. The patch is slightly wrong and the description is largely wrong. In fact the current version of update_mmu_cache() will not blow up on hugepages, and making it do more is merely an optimization which is probably only important in the case of hugepage COW. I'll resend something which will actually work to implement that optimization if COW is looking promising for merging. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson From vatsa at in.ibm.com Tue Nov 8 17:28:54 2005 From: vatsa at in.ibm.com (Srivatsa Vaddagiri) Date: Tue, 8 Nov 2005 11:58:54 +0530 Subject: 2.6.14-mm1 doesnt bootup on PPC64 In-Reply-To: <20051107134633.GS26395@bubble.grove.modra.org> References: <20051107132201.GA13514@in.ibm.com> <20051107134633.GS26395@bubble.grove.modra.org> Message-ID: <20051108062854.GB13514@in.ibm.com> On Tue, Nov 08, 2005 at 12:16:33AM +1030, Alan Modra wrote: > Compiler version? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24644 > might be relevant. I tried with both gcc 3.4.3 (RHEL 4) and gcc 3.3.3 (SLES 9). Both result in a non-booting kernel. This is with ppc64's 'defconfig' btw. 
-- Thanks and Regards, Srivatsa Vaddagiri, Linux Technology Center, IBM Software Labs, Bangalore, INDIA - 560017 From holindho at cs.helsinki.fi Tue Nov 8 17:37:29 2005 From: holindho at cs.helsinki.fi (Heikki Lindholm) Date: Tue, 08 Nov 2005 08:37:29 +0200 Subject: 2.6.14 UP kernel build In-Reply-To: References: Message-ID: <437047A9.9000609@cs.helsinki.fi> Elbert C Hu wrote: > > > > I was unable to build UP kernel from latest 2.6.14 release. May I ask, is > UP kernel still supported? > Thanks. > > E. Hu At least for Apple's G5, it needs the few added smp.h #includes I sent earlier and to actually work it also needs a patch in prom.c, which sets the boot cpu's cpu_hwid. I suppose the git tree might already work for UP, too. (Don't know about "official" support, though) -- Heikki Lindholm From olh at suse.de Tue Nov 8 23:29:52 2005 From: olh at suse.de (Olaf Hering) Date: Tue, 8 Nov 2005 13:29:52 +0100 Subject: [PATCH] ppc64: SMU partition recovery In-Reply-To: <1131334142.5229.154.camel@gaston> References: <1131334142.5229.154.camel@gaston> Message-ID: <20051108122952.GA18320@suse.de> On Mon, Nov 07, Benjamin Herrenschmidt wrote: > prom_add_property(struct device_node* np, struct property* prop) > { > + /* try to add to proc as well if it was initialized */ > + if (np->pde) > + proc_device_tree_add_prop(np->pde, prop); proc_device_tree_add_prop does not exist on iseries, unless CONFIG_PROC_DEVICETREE is enabled. I did not have the pleasure to boot a recent kernel on iSeries, so I can't say if everyone has now usable content below /proc/device-tree. Perhaps CONFIG_PROC_DEVICETREE should disappear.
-- short story of a lazy sysadmin: alias appserv=wotan From apw at shadowen.org Wed Nov 9 01:51:55 2005 From: apw at shadowen.org (Andy Whitcroft) Date: Tue, 08 Nov 2005 14:51:55 +0000 Subject: [PATCH 1/4] Memory Add Fixes for ppc64 In-Reply-To: <20051107204743.GC5821@w-mikek2.ibm.com> References: <20051104231552.GA25545@w-mikek2.ibm.com> <20051104231800.GB25545@w-mikek2.ibm.com> <1131149070.29195.41.camel@gaston> <20051107204743.GC5821@w-mikek2.ibm.com> Message-ID: <4370BB8B.9060705@shadowen.org> Mike Kravetz wrote: > Just curious if we still want to boost MAX_ORDER like this with 64k > pages? Doesn't that make the MAX_ORDER block size 256MB in this case? > Also, not quite sure what happens if memory size (a 16 MB multiple) > does not align with a MAX_ORDER block size (a 256MB multiple in this > case). My 'guess' is that the page allocator would not use it as it > would not fit within the buddy system. The buddy system and the SPARSEMEM mem_map are separate really. The key limitation is that a MAX_ORDER chunk must fit within the SPARSEMEM block size; it cannot span two blocks. This is because the algorithm by which the buddy system finds buddies for a returning allocation assumes that mem_map is contiguous up to the maximum buddy size (MAX_ORDER); it assumes it can use relative addressing to locate them. The buddy system doesn't really care about the alignment of any of its blocks. The allocator is built empty and all existent pages are freed back to it. If there is a chunk of memory which can never coalesce back to MAX_ORDER it will simply sit lower in the tree 'waiting' for these non-existent buddies and will never merge. It will still be usable.
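The relative addressing mentioned above can be sketched with the classic buddy computation on plain page indices. This is an illustration, not the kernel's code: the function names are hypothetical, and order 12 is used only because 64 KB pages shifted by 12 give the 256 MB figure quoted above.

```c
/* Index of the buddy of the block that starts at page index idx.
 * A block of 2^order pages and its buddy differ only in bit `order`
 * of their starting index, so the buddy is found by pure arithmetic.
 * That is only valid if the page array is contiguous across both. */
unsigned long buddy_index(unsigned long idx, unsigned int order)
{
	return idx ^ (1UL << order);
}

/* Size in bytes of one block of the given order. */
unsigned long long block_bytes(unsigned long page_size, unsigned int order)
{
	return (unsigned long long)page_size << order;
}

/* Largest order an isolated, aligned run of `pages` pages can
 * coalesce to when nothing exists on either side of it. */
unsigned int max_reachable_order(unsigned long pages)
{
	unsigned int order = 0;

	while ((2UL << order) <= pages)
		order++;
	return order;
}
```

Under these assumptions, block_bytes(65536, 12) is 256 MB, while an isolated 16 MB chunk is 256 pages of 64 KB and tops out at order 8: it stays at that level waiting for buddies that never arrive, yet remains allocatable.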
-apw From mporter at kernel.crashing.org Wed Nov 9 03:37:59 2005 From: mporter at kernel.crashing.org (Matt Porter) Date: Tue, 8 Nov 2005 09:37:59 -0700 Subject: [PATCH] ppc32: fix perf_irq extern on e500 In-Reply-To: <20051107190128.68d41294.akpm@osdl.org>; from akpm@osdl.org on Mon, Nov 07, 2005 at 07:01:28PM -0800 References: <20051107124917.C1671@cox.net> <20051107190128.68d41294.akpm@osdl.org> Message-ID: <20051108093759.A26086@cox.net> On Mon, Nov 07, 2005 at 07:01:28PM -0800, Andrew Morton wrote: > Matt Porter wrote: > > > > Add an extern reference to perf_irq on e500. > > #ifdef CONFIG_E500 > > +extern perf_irq_t perf_irq; > > + > > void performance_monitor_exception(struct pt_regs *regs) > > { > > perf_irq(regs); > > extern decls are placed in header files, please. Here's the updated patch, addressing this for ppc64 as well. Paul, would you prefer that this go through the powerpc-merge tree since the updated version modifies arch/powerpc/? -Matt Fixes e500 build and cleans up traps.c by moving perf_irq extern to pmc.h. 
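The header-placement request can be sketched in a single file as follows. The three sections stand in for a shared header, the one defining .c file, and a user; counting_perf_irq, perf_irq_calls and perf_irq_call_count are hypothetical helpers added for the sketch and are not part of the patch.

```c
/* --- what would live in the shared header --- */
struct pt_regs;                               /* forward declaration suffices */
typedef void (*perf_irq_t)(struct pt_regs *);
extern perf_irq_t perf_irq;                   /* declaration only, no storage */

/* --- what would live in exactly one .c file --- */
static int perf_irq_calls;                    /* hypothetical call counter */

static void counting_perf_irq(struct pt_regs *regs)
{
	(void)regs;
	perf_irq_calls++;
}

perf_irq_t perf_irq = counting_perf_irq;      /* the single definition */

/* --- any other file needs only the header's declaration --- */
void performance_monitor_exception(struct pt_regs *regs)
{
	perf_irq(regs);
}

int perf_irq_call_count(void)
{
	return perf_irq_calls;
}
```

Because every user sees the same declaration, a mismatch between the declared and the defined type is caught by the compiler instead of being hidden behind per-file #ifdef'ed externs.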
Signed-off-by: Matt Porter diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 07e5ee4..32cd797 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -898,10 +898,6 @@ void altivec_unavailable_exception(struc die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT); } -#ifdef CONFIG_PPC64 -extern perf_irq_t perf_irq; -#endif - #if defined(CONFIG_PPC64) || defined(CONFIG_E500) void performance_monitor_exception(struct pt_regs *regs) { diff --git a/include/asm-powerpc/pmc.h b/include/asm-powerpc/pmc.h index 2f3c3fc..5f41f3a 100644 --- a/include/asm-powerpc/pmc.h +++ b/include/asm-powerpc/pmc.h @@ -22,6 +22,7 @@ #include typedef void (*perf_irq_t)(struct pt_regs *); +extern perf_irq_t perf_irq; int reserve_pmc_hardware(perf_irq_t new_perf_irq); void release_pmc_hardware(void); From dostrow at gentoo.org Wed Nov 9 06:14:04 2005 From: dostrow at gentoo.org (Daniel Ostrow) Date: Tue, 08 Nov 2005 14:14:04 -0500 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 Message-ID: <1131477245.7855.17.camel@Memoria.anyarch.net> This patch is taken from the arch/sparc64/Makefile almost verbatim. It is really only useful when one is running a pure 32-bit userland (using the powerpc toolchain instead of the powerpc64 one) under a 64-bit kernel. It just checks to make sure that the native CC can use -m64 and if it can't resets CC to the powerpc64 gcc. 
Signed-off-by: Daniel Ostrow -- Daniel Ostrow Gentoo Foundation Board of Trustees Gentoo/{PPC,PPC64,DevRel} dostrow at gentoo.org diff -Naupr powerpc-merge.orig/arch/powerpc/Makefile powerpc-merge/arch/powerpc/Makefile --- powerpc-merge.orig/arch/powerpc/Makefile 2005-11-08 10:18:18.000000000 -0800 +++ powerpc-merge/arch/powerpc/Makefile 2005-11-08 10:20:27.000000000 -0800 @@ -17,6 +17,7 @@ HAS_BIARCH := $(call cc-option-yn, -m32) ifeq ($(CONFIG_PPC64),y) OLDARCH := ppc64 SZ := 64 +CC := $(shell if $(CC) -m64 -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo $(CC); else echo powerpc64-linux-gcc; fi ) # Set default 32 bits cross compilers for vdso and boot wrapper CROSS32_COMPILE ?= diff -Naupr powerpc-merge.orig/arch/ppc64/Makefile powerpc-merge/arch/ppc64/Makefile --- powerpc-merge.orig/arch/ppc64/Makefile 2005-11-08 10:07:06.000000000 -0800 +++ powerpc-merge/arch/ppc64/Makefile 2005-11-08 10:56:56.000000000 -0800 @@ -14,6 +14,7 @@ # KERNELLOAD := 0xc000000000000000 +CC := $(shell if $(CC) -m64 -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo $(CC); else echo powerpc64-linux-gcc; fi ) # Set default 32 bits cross compilers for vdso and boot wrapper CROSS32_COMPILE ?= -------------- next part -------------- An HTML attachment was scrubbed... URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051108/dadcd07d/attachment.htm From linas at austin.ibm.com Wed Nov 9 07:15:32 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 14:15:32 -0600 Subject: 2.6.14-mm1 doesnt bootup on PPC64 In-Reply-To: <20051108062854.GB13514@in.ibm.com> References: <20051107132201.GA13514@in.ibm.com> <20051107134633.GS26395@bubble.grove.modra.org> <20051108062854.GB13514@in.ibm.com> Message-ID: <20051108201532.GV19593@austin.ibm.com> On Tue, Nov 08, 2005 at 11:58:54AM +0530, Srivatsa Vaddagiri was heard to remark: > On Tue, Nov 08, 2005 at 12:16:33AM +1030, Alan Modra wrote: > > Compiler version? 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24644 > > might be relevant. > > I tried with both gcc 3.4.3 (RHEL 4) and gcc 3.3.3 (SLES 9). Both result > in a non-booting kernel. This is with ppc64's 'defconfig' btw. Not the only problem. -- If pci hotplug for ppc64 is enabled, the kernel won't compile, misc symbols missing related to of_pci_whatever() (I don't have the messages in front of me; I notified John Rose) -- w/ hotplug disabled, the kernel boots, but spews a bunch of these: [360420.236575] Brought up 4 CPUs [360420.236578] softlockup thread 3 started up. [360420.238334] NET: Registered protocol family 16 [360420.239497] PCI: Probing PCI hardware <3>Badness in kref_get at lib/kref.c:32 [360420.239622] Badness in kref_get at lib/kref.c:32 [360420.239632] Call Trace: [360420.239640] [C000000002503720] [C00000000002F2AC] .show_stack+0x5c/0x1cc (unreliable) [360420.239665] [C0000000025037D0] [C00000000043D1D4] .program_check_exception+0x34c/0x62c [360420.239687] [C000000002503880] [C000000000004348] program_check_common+0xc8/0x100 [360420.239706] --- Exception: 700 at .kref_get+0xc/0x24 [360420.239721] LR = .kobject_get+0x24/0x40 [360420.239731] [C000000002503B70] [0000000000000000] 0x0 (unreliable) [360420.239749] [C000000002503BF0] [C000000000287CB8] .get_device+0x24/0x40 [360420.239767] [C000000002503C70] [C000000000204458] .pci_dev_get+0x24/0x40 [360420.239785] [C000000002503CF0] [C00000000020038C] .pci_device_add+0x40/0xc4 [360420.239803] [C000000002503D80] [C000000000017420] .of_scan_bus+0x2f8/0x698 [360420.239822] [C000000002503E50] [C0000000004FE14C] .pcibios_init+0x360/0x364 [360420.239842] [C000000002503F00] [C000000000009584] .init+0x1e8/0x44c [360420.239859] [C000000002503F90] [C00000000000A4E4] .kernel_thread+0x4c/0x68 <3>Badness in kref_get at lib/kref.c:32 [360420.240093] Badness in kref_get at lib/kref.c:32 and it later crashes with cpu 0x0: Vector: 300 (Data Access) at [c000000002503770] pc: c00000000006eb30: .add_wait_queue_exclusive+0x4c/0x74 
lr: c00000000006eb10: .add_wait_queue_exclusive+0x2c/0x74 sp: c0000000025039f0 msr: 8000000000001032 dar: 0 dsisr: 42000000 current = 0xc000000073fb87e0 paca = 0xc000000000565000 pid = 1, comm = swapper enter ? for help 0:mon> bt type address 0:mon> t [c0000000025039f0] c0000000000a2f1c .alloc_page_interleave+0x3c/0xb4 (unreliable) [c000000002503a80] c00000000043899c .__down+0x5c/0x11c [c000000002503b50] c00000000028a988 .device_attach+0xd4/0xe8 [c000000002503be0] c000000000289b08 .bus_add_device+0x60/0x1c8 [c000000002503c90] c000000000288110 .device_add+0x140/0x204 [c000000002503d30] c0000000001ff228 .pci_bus_add_device+0x24/0x8c [c000000002503dc0] c0000000001ff2f4 .pci_bus_add_devices+0x64/0x164 [c000000002503e50] c0000000004fdf40 .pcibios_init+0x154/0x364 [c000000002503f00] c000000000009584 .init+0x1e8/0x44c [c000000002503f90] c00000000000a4e4 .kernel_thread+0x4c/0x68 0:mon> I'm punting on this for today; I'm guessing there are some missing or misapplied patches somewhere. --linas From dostrow at gentoo.org Wed Nov 9 07:30:31 2005 From: dostrow at gentoo.org (Daniel Ostrow) Date: Tue, 08 Nov 2005 15:30:31 -0500 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <1131477245.7855.17.camel@Memoria.anyarch.net> References: <1131477245.7855.17.camel@Memoria.anyarch.net> Message-ID: <1131481831.9219.2.camel@Memoria.anyarch.net> Erk...sorry about the 80 character wrapping and the html formated message...excuse me while I kick my stupid mail client at work. 
:( -- Daniel Ostrow Gentoo Foundation Board of Trustees Gentoo/{PPC,PPC64,DevRel} dostrow at gentoo.org From benh at kernel.crashing.org Wed Nov 9 08:07:45 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 09 Nov 2005 08:07:45 +1100 Subject: [PATCH] ppc64: SMU partition recovery In-Reply-To: <20051108122952.GA18320@suse.de> References: <1131334142.5229.154.camel@gaston> <20051108122952.GA18320@suse.de> Message-ID: <1131484066.4652.114.camel@gaston> On Tue, 2005-11-08 at 13:29 +0100, Olaf Hering wrote: > On Mon, Nov 07, Benjamin Herrenschmidt wrote: > > > > prom_add_property(struct device_node* np, struct property* prop) > > { > > > + /* try to add to proc as well if it was initialized */ > > + if (np->pde) > > + proc_device_tree_add_prop(np->pde, prop); > > proc_device_tree_add_prop does not exist on iseries, unless > CONFIG_PROC_DEVICETREE is enabled. I did not have the pleasure to boot a > recent kernel on iSeries, so I cant say if everyone has now usable content > below /proc/device-tee. Perhaps CONFIG_PROC_DEVICETREE should disappear. Yes, thanks. Ben. From sr at denx.de Wed Nov 9 04:38:11 2005 From: sr at denx.de (Stefan Roese) Date: Tue, 8 Nov 2005 18:38:11 +0100 Subject: 440EP FPU support missing In-Reply-To: <20051108093759.A26086@cox.net> References: <20051107124917.C1671@cox.net> <20051107190128.68d41294.akpm@osdl.org> <20051108093759.A26086@cox.net> Message-ID: <200511081838.11236.sr@denx.de> In the current linux version, Bamboo (440EP) won't compile anymore, because of missing fpu support: make uImage ... LD init/built-in.o LD .tmp_vmlinux1 arch/ppc/kernel/head_44x.o(.text+0x868): In function `_start': : undefined reference to `KernelFP' make: *** [.tmp_vmlinux1] Error 1 Somehow arch/ppc/kernel/fpu.S has disappeared. :-( I assume, this happened in the ppc/ppc64 -> powerpc merge. Any thoughts, why this file disappeared and how to solve this problem (just restore the original file)? 
Best regards, Stefan From mporter at kernel.crashing.org Wed Nov 9 09:30:36 2005 From: mporter at kernel.crashing.org (Matt Porter) Date: Tue, 8 Nov 2005 15:30:36 -0700 Subject: 440EP FPU support missing In-Reply-To: <200511081838.11236.sr@denx.de>; from sr@denx.de on Tue, Nov 08, 2005 at 06:38:11PM +0100 References: <20051107124917.C1671@cox.net> <20051107190128.68d41294.akpm@osdl.org> <20051108093759.A26086@cox.net> <200511081838.11236.sr@denx.de> Message-ID: <20051108153036.F27232@cox.net> On Tue, Nov 08, 2005 at 06:38:11PM +0100, Stefan Roese wrote: > In the current linux version, Bamboo (440EP) won't compile anymore, because of > missing fpu support: > > make uImage > ... > LD init/built-in.o > LD .tmp_vmlinux1 > arch/ppc/kernel/head_44x.o(.text+0x868): In function `_start': > : undefined reference to `KernelFP' > make: *** [.tmp_vmlinux1] Error 1 > > Somehow arch/ppc/kernel/fpu.S has disappeared. :-( I assume, this happened in > the ppc/ppc64 -> powerpc merge. Any thoughts, why this file disappeared and > how to solve this problem (just restore the original file)? arch/powerpc/kernel/fpu.S is being used now which doesn't have KernelFP. I don't know why the 44x fpu support wasn't using kernel_fp_unavailable_exception() before but I must have missed that reviewing it. Try this patch. 
-Matt diff --git a/arch/ppc/kernel/head_booke.h b/arch/ppc/kernel/head_booke.h index aeb349b..f3d274c 100644 --- a/arch/ppc/kernel/head_booke.h +++ b/arch/ppc/kernel/head_booke.h @@ -358,6 +358,6 @@ label: NORMAL_EXCEPTION_PROLOG; \ bne load_up_fpu; /* if from user, just load it up */ \ addi r3,r1,STACK_FRAME_OVERHEAD; \ - EXC_XFER_EE_LITE(0x800, KernelFP) + EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception) #endif /* __HEAD_BOOKE_H__ */ From david at gibson.dropbear.id.au Wed Nov 9 09:32:02 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 9 Nov 2005 09:32:02 +1100 Subject: 440EP FPU support missing In-Reply-To: <200511081838.11236.sr@denx.de> References: <20051107124917.C1671@cox.net> <20051107190128.68d41294.akpm@osdl.org> <20051108093759.A26086@cox.net> <200511081838.11236.sr@denx.de> Message-ID: <20051108223202.GA25185@localhost.localdomain> On Tue, Nov 08, 2005 at 06:38:11PM +0100, Stefan Roese wrote: > In the current linux version, Bamboo (440EP) won't compile anymore, because of > missing fpu support: > > make uImage > ... > LD init/built-in.o > LD .tmp_vmlinux1 > arch/ppc/kernel/head_44x.o(.text+0x868): In function `_start': > : undefined reference to `KernelFP' > make: *** [.tmp_vmlinux1] Error 1 > > Somehow arch/ppc/kernel/fpu.S has disappeared. :-( I assume, this happened in > the ppc/ppc64 -> powerpc merge. Any thoughts, why this file disappeared and > how to solve this problem (just restore the original file)? It's just moved to arch/powerpc/kernel/fpu.S. All you should need is to tweak the Makefiles so that it's included in your build. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/~dgibson From mporter at kernel.crashing.org Wed Nov 9 09:46:51 2005 From: mporter at kernel.crashing.org (Matt Porter) Date: Tue, 8 Nov 2005 15:46:51 -0700 Subject: 440EP FPU support missing In-Reply-To: <20051108223202.GA25185@localhost.localdomain>; from david@gibson.dropbear.id.au on Wed, Nov 09, 2005 at 09:32:02AM +1100 References: <20051107124917.C1671@cox.net> <20051107190128.68d41294.akpm@osdl.org> <20051108093759.A26086@cox.net> <200511081838.11236.sr@denx.de> <20051108223202.GA25185@localhost.localdomain> Message-ID: <20051108154651.G27232@cox.net> On Wed, Nov 09, 2005 at 09:32:02AM +1100, David Gibson wrote: > On Tue, Nov 08, 2005 at 06:38:11PM +0100, Stefan Roese wrote: > > In the current linux version, Bamboo (440EP) won't compile anymore, because of > > missing fpu support: > > > > make uImage > > ... > > LD init/built-in.o > > LD .tmp_vmlinux1 > > arch/ppc/kernel/head_44x.o(.text+0x868): In function `_start': > > : undefined reference to `KernelFP' > > make: *** [.tmp_vmlinux1] Error 1 > > > > Somehow arch/ppc/kernel/fpu.S has disappeared. :-( I assume, this happened in > > the ppc/ppc64 -> powerpc merge. Any thoughts, why this file disappeared and > > how to solve this problem (just restore the original file)? > > It's just moved to arch/powerpc/kernel/fpu.S. All you should need is > to tweak the Makefiles so that it's included in your build. Actually, the version in arch/powerpc/kernel/ is older than the version removed from arch/ppc/kernel/. It is already being built...see my other post. -Matt From kravetz at us.ibm.com Wed Nov 9 09:55:59 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Tue, 8 Nov 2005 14:55:59 -0800 Subject: 2.6.14-git10 booting? Message-ID: <20051108225559.GA13668@w-mikek2.ibm.com> Has anyone been able to boot 2.6.14-git10? I can't boot it on my OpenPower 720. 2.6.14-git9 is OK on the same system. All I see is the following and I'm unable to enter xmon. Please wait, loading kernel... 
Elf32 kernel loaded... zImage starting: loaded at 0x004019f4 (sp: 0x019dfbe0) Allocating 0x76be40 bytes for kernel ... gunzipping (0x1c00000 <- 0x4079f4:0x6afd94)...done 0x70eab0 bytes OF stdout device is: /vdevice/vty at 30000000 Hypertas detected, assuming LPAR ! command line: root=/dev/sda4 memory layout at init: memory_limit : 0000000000000000 (16 MB aligned) alloc_bottom : 0000000002380000 alloc_top : 0000000008000000 alloc_top_hi : 0000000180000000 rmo_top : 0000000008000000 ram_top : 0000000180000000 Looking for displays found display : /pci at 800000020000002/pci at 2,6/pci at 1/display at 0, opening ... doneinstantiating rtas at 0x00000000077b3000 ...RTAS Code version: bPF050630 rtas_ram_size = 585000 fixed_base_addr = 77b3000 code_base_addr = 7a4e000 registered vars: name addr size hash align -------------------------------- ------------------ ---- ---- ----- link_rtas_code : 0x00000000077bc580 8 1 0 lpqueue : 0x00000000077eec00 864 1 0 phyp_rpa_block_list_struct_ptr : 0x00000000077fa800 4096 1 0 malloc_info : 0x00000000077bc100 8 3 0 rtas_code_base_ptr : 0x00000000077bc4c0 16 3 0 LPE_OUTSTD : 0x00000000077f1300 78 3 4 ind_count_phyp : 0x00000000077bdf00 8 3 0 glob_lpevent_sent_trace_buf : 0x00000000077d2100 16400 4 0 ind_count_os : 0x00000000077bdfc0 8 4 0 rtas_mem_heap_list_ptr : 0x00000000077bc280 16 5 0 get_ind_receipt : 0x00000000077be200 8 5 0 get_ind_state_receipt : 0x00000000077be380 8 5 0 debug_trace_var_uint64 : 0x00000000077bca00 8 6 0 CPUinjectstate : 0x0000000007808200 4096 6 0 debug_trace_var_int : 0x00000000077bc7c0 8 7 0 glob_rtas_trace_buf : 0x00000000077c2000 65552 7 0 lpqbuffer : 0x00000000077dc000 38400 7 12 perf_tools_corr_token_ptr : 0x00000000077bdb00 8 7 0 pl_receipt : 0x00000000077be9c0 8 7 0 glob_client_interface_ptr : 0x00000000077bec00 24 7 0 RTAS_CMD_TOKEN_VALUE : 0x00000000077bc040 4 9 0 prtas_was_interrupted : 0x00000000077bd8c0 4 9 0 callperf : 0x00000000077fb800 13104 9 0 rtas_mem_heap_ptr : 0x00000000077bc1c0 16 10 0 
pdecrement_semaphore : 0x00000000077bc640 8 10 0 phyp_error_log : 0x00000000077f5800 20480 11 0 ind_bytes_left : 0x00000000077bdd80 8 11 0 hypStopWatch : 0x00000000077da300 4104 14 8 disable_checkpoints : 0x00000000077bcd80 4 14 0 sentLidMgrEvents : 0x00000000077bd0c0 8 14 0 pglob_epow_state : 0x00000000077bd200 4 14 0 rtas_call_buf_trace_orig : 0x0000000007819500 8208 15 0 ind_next_seq_num : 0x00000000077be080 8 16 0 indices_temp_page : 0x0000000007806000 4096 16 0 vpd_parms_set_flag : 0x00000000077be580 4 16 0 pSLOTinjectstate : 0x00000000077be7c0 32 16 0 glob_cpu_inject_error : 0x0000000007809200 512 16 0 debug_trace_var_int64 : 0x00000000077bc940 8 17 0 ack_for_log_sent_to_phyp : 0x00000000077bd740 1 17 0 set_ind_receipt : 0x00000000077be2c0 8 17 0 rtas_gdata_hash_ptr : 0x00000000077bc340 16 18 0 subq_buffer : 0x00000000077e5600 38400 20 0 prtas_in_progress : 0x00000000077bd800 4 20 0 ind_blhandle : 0x00000000077be140 56 20 0 prtas_get_vpd_buff : 0x0000000007807000 8 20 12 fru_data_collected : 0x0000000007808000 405 20 0 glob_errinjct_open_token : 0x00000000077be900 8 21 0 glob_of_reentry_ptr : 0x00000000077becc0 24 21 0 error_log_id : 0x00000000077bd540 4 23 0 glob_rtas_trace_buf_orig : 0x0000000007809400 65552 23 0 rtas_fixed_code_ptr : 0x00000000077bc400 16 24 0 es_rpa_block_list_struct_ptr : 0x00000000077f3000 4096 24 0 vslot_type : 0x00000000077bcac0 4 25 0 glob_lpevent_rcvd_trace_buf : 0x00000000077d6200 16400 25 0 subq_array : 0x00000000077ef000 8960 25 0 phyp_log_index : 0x00000000077bd680 1 25 0 rtas_thread_count : 0x00000000077beb40 8 25 0 glob_pfds_add_buff : 0x0000000007804000 8192 26 12 ind_carry : 0x00000000077be440 64 27 0 debug_trace_var_uint : 0x00000000077bc880 8 28 0 ind_bytes_read : 0x00000000077bde40 8 28 0 platform_dump:llist : 0x00000000077bed80 16 28 0 debug_trace_point : 0x00000000077bc700 8 30 0 last_error_log : 0x00000000077db400 2048 30 0 rpc_buffer : 0x00000000077f2000 4096 30 12 partfw_error_log : 0x00000000077f5000 2048 30 0 
pending_flag : 0x00000000077bea80 8 30 0 hypStopWatch_orig : 0x000000000781b600 4104 30 0 event_scan_data : 0x00000000077f4000 2024 31 12 rtas_call_buf_trace : 0x00000000077ff000 8208 31 12 perf_tools_buff : 0x0000000007802000 88 31 12 nmi_work_buffer : 0x0000000007803000 4096 31 12 get_cfg_mgr_receipt : 0x00000000077bdc40 8 31 0 done 0000000000000000 : boot cpu 0000000000000000 0000000000000002 : starting cpu hw idx 0000000000000002... done 0000000000000004 : starting cpu hw idx 0000000000000004... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x0000000002381000 -> 0x0000000002382313 Device tree struct 0x0000000002383000 -> 0x0000000002394000 Calling quiesce ... returning from prom_init <3>Badness in smp_call_function at arch/powerpc/kernel/smp.c:202 -- Mike From mporter at kernel.crashing.org Wed Nov 9 10:02:00 2005 From: mporter at kernel.crashing.org (Matt Porter) Date: Tue, 8 Nov 2005 16:02:00 -0700 Subject: 440EP FPU support missing In-Reply-To: <1131489174.26096.0.camel@yoda.jdub.homelinux.org>; from jwboyer@jdub.homelinux.org on Tue, Nov 08, 2005 at 04:32:54PM -0600 References: <20051107124917.C1671@cox.net> <20051107190128.68d41294.akpm@osdl.org> <20051108093759.A26086@cox.net> <200511081838.11236.sr@denx.de> <20051108153036.F27232@cox.net> <1131489174.26096.0.camel@yoda.jdub.homelinux.org> Message-ID: <20051108160200.I27232@cox.net> On Tue, Nov 08, 2005 at 04:32:54PM -0600, Josh Boyer wrote: > On Tue, 2005-11-08 at 15:30 -0700, Matt Porter wrote: > > On Tue, Nov 08, 2005 at 06:38:11PM +0100, Stefan Roese wrote: > > > In the current linux version, Bamboo (440EP) won't compile anymore, because of > > > missing fpu support: > > > > > > make uImage > > > ... 
> > > LD init/built-in.o > > > LD .tmp_vmlinux1 > > > arch/ppc/kernel/head_44x.o(.text+0x868): In function `_start': > > > : undefined reference to `KernelFP' > > > make: *** [.tmp_vmlinux1] Error 1 > > > > > > Somehow arch/ppc/kernel/fpu.S has disappeared. :-( I assume, this happened in > > > the ppc/ppc64 -> powerpc merge. Any thoughts, why this file disappeared and > > > how to solve this problem (just restore the original file)? > > > > arch/powerpc/kernel/fpu.S is being used now which doesn't have KernelFP. > > I don't know why the 44x fpu support wasn't using > > kernel_fp_unavailable_exception() before but I must have missed that > > reviewing it. > > > > Try this patch. > > Doesn't this render the 440EP's FPU useless? Does what render the 440EP's FPU useless? The supplied patch? I don't think so, the path should be the same as classic PPC. The patch simply replaces the KernelFP routine that used to be in arch/ppc/kernel/fpu.S (and was removed inadvertently in the arch/powerpc/ merge) with a kernel_fp_unavailable_exception() call which does the equivalent and is shared by others. The exception still loads up the FPU if coming from userspace and only goes down this path when getting an FP unavailable exception from kernel space. -Matt From linas at austin.ibm.com Wed Nov 9 10:01:34 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 17:01:34 -0600 Subject: 2.6.14-git10 booting? In-Reply-To: <20051108225559.GA13668@w-mikek2.ibm.com> References: <20051108225559.GA13668@w-mikek2.ibm.com> Message-ID: <20051108230133.GZ19593@austin.ibm.com> On Tue, Nov 08, 2005 at 02:55:59PM -0800, Mike Kravetz was heard to remark: > Has anyone been able to boot 2.6.14-git10? I can't boot it on my > OpenPower 720. 2.6.14-git9 is OK on the same system. All I see is Been booting all day with 2.6.14-git10 and no problems. 
--linas From linas at austin.ibm.com Wed Nov 9 10:23:27 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 17:23:27 -0600 Subject: typedefs and structs In-Reply-To: <1131412273.14381.142.camel@localhost.localdomain> References: <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> Message-ID: <20051108232327.GA19593@austin.ibm.com> On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark: > On Mon, 2005-11-07 at 14:41 -0600, linas wrote: > > don't use typedef to get rid of "struct". > > This was for the simple reason, too many developers were passing > structures by value instead of by reference, just because they were > using a type that they didn't realize was a structure. That's a rather bizarre mistake to make, since, in order to access a value in such a beast, you have to use a dot . instead of an arrow -> and so it hits you in the face that you passed a value instead of a reference. ---- Off-topic: There's actually a neat little trick in C++ that can help avoid accidentally passing null pointers. One can declare function declarations as: int func (struct blah &v) { v.a ++; return v.b; } The ampersand says "pass argument by reference (so as to get arg passing efficiency) but force coder to write code as if they were passing by value". As a result, it gets difficult to pass null pointers (for reasons similar to the difficulty of passing null pointers in Java (and yes, I loathe Java, sorry to subject you to that)). Anyway, that's a C++ trick only; I wish it was in C so I could experiment more and find out if I like it or hate it. 
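For comparison, a minimal C sketch of the two options C does offer, pass by value and pass by pointer, and of how a typedef can hide which one is being used. The field names a and b follow the C++ snippet above; blah_t, func_by_value, func_by_pointer and demo_calls are hypothetical names made up for illustration, not anything from the kernel.

```c
#include <assert.h>

/* Fields a and b follow the snippet in the message; the rest is hypothetical. */
struct blah {
	int a;
	int b;
};

/* A typedef like this hides the fact that blah_t is a whole structure. */
typedef struct blah blah_t;

/* Pass by value: the whole structure is copied, the callee edits its copy. */
static int func_by_value(blah_t v)
{
	v.a++;
	return v.b;
}

/* Pass by pointer: C has no references, so the caller must write &x
 * and the callee must use ->, which makes the indirection visible. */
static int func_by_pointer(struct blah *v)
{
	v->a++;
	return v->b;
}

/* Calls both variants and returns the caller-visible value of a. */
int demo_calls(void)
{
	struct blah x = { .a = 1, .b = 7 };
	int r1 = func_by_value(x);     /* x.a is unchanged afterwards */
	int r2 = func_by_pointer(&x);  /* x.a is incremented */

	assert(r1 == 7 && r2 == 7);
	return x.a;
}
```

Only the pointer variant changes the caller's copy, and unlike a C++ reference, nothing stops a caller from handing func_by_pointer a NULL.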
--linas From rostedt at goodmis.org Wed Nov 9 10:33:35 2005 From: rostedt at goodmis.org (Steven Rostedt) Date: Tue, 08 Nov 2005 18:33:35 -0500 Subject: typedefs and structs In-Reply-To: <20051108232327.GA19593@austin.ibm.com> References: <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> Message-ID: <1131492815.14381.184.camel@localhost.localdomain> On Tue, 2005-11-08 at 17:23 -0600, linas wrote: > On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark: > > On Mon, 2005-11-07 at 14:41 -0600, linas wrote: > > > > don't use typedef to get rid of "struct". > > > > This was for the simple reason, too many developers were passing > > structures by value instead of by reference, just because they were > > using a type that they didn't realize was a structure. > > That's a rather bizarre mistake to make, since, in order to > access a value in such a beast, you have to use a dot . instead > of an arrow -> and so it hits you in the face that you passed a value > instead of a reference. It happens when you access the variable via macros and other routines that you notice take an address of the variable, so you just pass in the address of the current local variable. > > ---- > Off-topic: There's actually a neat little trick in C++ that can > help avoid accidentally passing null pointers. 
One can declare > function declarations as: > > int func (struct blah &v) { > v.a ++; > return v.b; > } > > The ampersand says "pass argument by reference (so as to get arg passing > efficiency) but force coder to write code as if they were passing by value" > As a result, it gets difficult to pass null pointers (for reasons > similar to the difficulty of passing null pointers in Java (and yes, > I loathe Java, sorry to subject you to that)) Anyway, that's a C++ trick > only; I wish it was in C so I could experiment more and find out if I > like it or hate it. > Actually, the true pass by reference (not by pointer) is one of the things that C++ has, that I wish C had. -- Steve From linas at austin.ibm.com Wed Nov 9 10:36:58 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 17:36:58 -0600 Subject: typedefs and structs In-Reply-To: <17263.64754.79733.651186@cse.unsw.edu.au> References: <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <17263.64754.79733.651186@cse.unsw.edu.au> Message-ID: <20051108233657.GB19593@austin.ibm.com> On Tue, Nov 08, 2005 at 12:18:42PM +1100, Neil Brown was heard to remark: > > Another reason for not using typedefs is that if you do, and you want > to refer to the structure in some other include file, you have to > #include the include file that defines the structure. > If you don't use typedefs, you can just say: > > struct foo; > > and the compiler will happily wait for the complete definition later > (providing it doesn't need the size in the meanwhile). Yes, this is the "forward declaration" problem I was referring to. It's unavoidable if structs have circular references to each other. 
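A minimal sketch of the circular-reference case just described, assuming two made-up structures. struct foo is the name used in the quoted message; struct bar, the field names and link_and_walk are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: an incomplete type is enough to declare pointers. */
struct bar;

struct foo {
	int id;
	struct bar *peer;    /* pointer only, so sizeof(struct bar) is not needed yet */
};

/* The complete definition arrives later and points back at struct foo. */
struct bar {
	int id;
	struct foo *owner;
};

/* Links one foo and one bar into a cycle, walks it, and returns the sum
 * of both ids as seen through the pointers. */
int link_and_walk(void)
{
	struct foo f = { .id = 1, .peer = NULL };
	struct bar b = { .id = 2, .owner = &f };

	f.peer = &b;
	assert(f.peer->owner == &f);
	return f.peer->owner->id + f.peer->id;
}
```

With a typedef-only type name the first declaration would not be possible without including the header that carries the typedef, which is the #include dependency the quoted message points out.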
However, I've learned, by experience, several things by trying to eliminate such forward declarations (and the related #include hell): -- It's really, really hard, and right in the middle, you think, "gosh this is a stupid idea, why am I bothering?" -- When you get done, you think: "wow this new code structure is so insanely better than the old code! The guy who wrote the old code should be hung from a yardarm as an example!" So having a mechanism that prevents coders from declaring "struct foo" whenever they feel like it can be a good thing. Of course, your mileage may vary. --linas From linas at austin.ibm.com Wed Nov 9 10:49:11 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 17:49:11 -0600 Subject: [PATCH 0/7] PCI Error Recovery Message-ID: <20051108234911.GC19593@austin.ibm.com> Greg, The following seven patches implement the PCI error reporting and recovery header and device driver changes as recently discussed, with all requested changes. These are tested and work well. Please apply. Signed-off-by: Linas Vepstas --linas From linas at austin.ibm.com Wed Nov 9 10:53:57 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 17:53:57 -0600 Subject: [PATCH 1/7] PCI Error Recovery: header file patch In-Reply-To: <20051108234911.GC19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> Message-ID: <20051108235357.GD19593@austin.ibm.com> Please apply. -------- PCI Error Recovery: header file patch Various PCI bus errors can be signaled by newer PCI controllers. Recovering from those errors requires an infrastructure to notify affected device drivers of the error, and a way of walking through a reset sequence. This patch adds a set of callbacks to be used by error recovery routines to notify device drivers of the various stages of recovery. 
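The staged notification described above can be sketched in user space roughly as follows. The enum values and the callback names error_detected, slot_reset and resume are taken from the patch in this message; everything else is an assumption made for illustration: struct pci_dev is reduced to a one-field stand-in, plain enums replace the __bitwise typedefs, the demo_* handlers are hypothetical, and recover() is only a guess at how platform code might order the callbacks, not the actual recovery implementation.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins that mirror names from the patch; not kernel code. */
enum pci_channel_state {
	pci_channel_io_normal = 1,
	pci_channel_io_frozen = 2,
	pci_channel_io_perm_failure = 3,
};

enum pci_ers_result {
	PCI_ERS_RESULT_NONE = 1,
	PCI_ERS_RESULT_CAN_RECOVER = 2,
	PCI_ERS_RESULT_NEED_RESET = 3,
	PCI_ERS_RESULT_DISCONNECT = 4,
	PCI_ERS_RESULT_RECOVERED = 5,
};

struct pci_dev {
	int resumed;    /* hypothetical flag, set once the driver resumes */
};

struct pci_error_handlers {
	enum pci_ers_result (*error_detected)(struct pci_dev *dev,
					      enum pci_channel_state error);
	enum pci_ers_result (*slot_reset)(struct pci_dev *dev);
	void (*resume)(struct pci_dev *dev);
};

/* Hypothetical driver: ask for a slot reset unless the card is dead. */
static enum pci_ers_result demo_error_detected(struct pci_dev *dev,
					       enum pci_channel_state state)
{
	(void)dev;
	if (state == pci_channel_io_perm_failure)
		return PCI_ERS_RESULT_DISCONNECT;
	return PCI_ERS_RESULT_NEED_RESET;
}

static enum pci_ers_result demo_slot_reset(struct pci_dev *dev)
{
	(void)dev;      /* a real driver would re-initialize the hardware here */
	return PCI_ERS_RESULT_RECOVERED;
}

static void demo_resume(struct pci_dev *dev)
{
	dev->resumed = 1;
}

static const struct pci_error_handlers demo_handlers = {
	.error_detected = demo_error_detected,
	.slot_reset     = demo_slot_reset,
	.resume         = demo_resume,
};

/* Assumed ordering: report the error, reset the slot if asked, then resume. */
enum pci_ers_result recover(struct pci_dev *dev, enum pci_channel_state state)
{
	const struct pci_error_handlers *h = &demo_handlers;
	enum pci_ers_result res = h->error_detected(dev, state);

	if (res == PCI_ERS_RESULT_NEED_RESET && h->slot_reset != NULL)
		res = h->slot_reset(dev);   /* the platform would reset the slot first */
	if (res == PCI_ERS_RESULT_RECOVERED && h->resume != NULL)
		h->resume(dev);
	return res;
}
```

In this sketch a frozen channel walks through all three callbacks, while a permanent failure stops after error_detected and never reaches resume.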
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git10/include/linux/pci.h =================================================================== --- linux-2.6.14-git10.orig/include/linux/pci.h 2005-11-07 17:24:23.048968436 -0600 +++ linux-2.6.14-git10/include/linux/pci.h 2005-11-07 17:42:46.026024245 -0600 @@ -78,6 +78,23 @@ #define PCI_UNKNOWN ((pci_power_t __force) 5) #define PCI_POWER_ERROR ((pci_power_t __force) -1) +/** The pci_channel state describes connectivity between the CPU and + * the pci device. If some PCI bus between here and the pci device + * has crashed or locked up, this info is reflected here. + */ +typedef int __bitwise pci_channel_state_t; + +enum pci_channel_state { + /* I/O channel is in normal state */ + pci_channel_io_normal = (__force pci_channel_state_t) 1, + + /* I/O to channel is blocked */ + pci_channel_io_frozen = (__force pci_channel_state_t) 2, + + /* PCI card is dead */ + pci_channel_io_perm_failure = (__force pci_channel_state_t) 3, +}; + /* * The pci_dev structure is used to describe PCI devices. */ @@ -110,6 +127,7 @@ this is D0-D3, D0 being fully functional, and D3 being off. */ + pci_channel_state_t error_state; /* current connectivity state */ struct device dev; /* Generic device interface */ /* device is compatible with these IDs */ @@ -232,6 +250,54 @@ unsigned int use_driver_data:1; /* pci_driver->driver_data is used */ }; +/* ---------------------------------------------------------------- */ +/** PCI Error Recovery System (PCI-ERS). If a PCI device driver provides + * a set fof callbacks in struct pci_error_handlers, then that device driver + * will be notified of PCI bus errors, and will be driven to recovery + * when an error occurs. 
+ */ + +typedef int __bitwise pci_ers_result_t; + +enum pci_ers_result { + /* no result/none/not supported in device driver */ + PCI_ERS_RESULT_NONE = (__force pci_ers_result_t) 1, + + /* Device driver can recover without slot reset */ + PCI_ERS_RESULT_CAN_RECOVER = (__force pci_ers_result_t) 2, + + /* Device driver wants slot to be reset. */ + PCI_ERS_RESULT_NEED_RESET = (__force pci_ers_result_t) 3, + + /* Device has completely failed, is unrecoverable */ + PCI_ERS_RESULT_DISCONNECT = (__force pci_ers_result_t) 4, + + /* Device driver is fully recovered and operational */ + PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5, +}; + +/* PCI bus error event callbacks */ +struct pci_error_handlers +{ + /* PCI bus error detected on this device */ + pci_ers_result_t (*error_detected)(struct pci_dev *dev, + enum pci_channel_state error); + + /* MMIO has been re-enabled, but not DMA */ + pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev); + + /* PCI Express link has been reset */ + pci_ers_result_t (*link_reset)(struct pci_dev *dev); + + /* PCI slot has been reset */ + pci_ers_result_t (*slot_reset)(struct pci_dev *dev); + + /* Device driver may resume normal operations */ + void (*resume)(struct pci_dev *dev); +}; + +/* ---------------------------------------------------------------- */ + struct module; struct pci_driver { struct list_head node; @@ -245,6 +311,7 @@ int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */ void (*shutdown) (struct pci_dev *dev); + struct pci_error_handlers *err_handler; struct device_driver driver; struct pci_dynids dynids; }; From linas at austin.ibm.com Wed Nov 9 10:55:48 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 17:55:48 -0600 Subject: [PATCH 2/7] PCI Error Recovery: IPR SCSI device driver In-Reply-To: <20051108234911.GC19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> Message-ID: <20051108235548.GE19593@austin.ibm.com> Please apply. 
------ Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the IPR SCSI device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas Signed-off-by: Brian King -- Index: linux-2.6.14-git10/drivers/scsi/ipr.c =================================================================== --- linux-2.6.14-git10.orig/drivers/scsi/ipr.c 2005-11-07 17:24:13.000000000 -0600 +++ linux-2.6.14-git10/drivers/scsi/ipr.c 2005-11-07 17:44:35.415656790 -0600 @@ -5328,6 +5328,92 @@ shutdown_type); } +/* --------------- PCI Error Recovery infrastructure ----------- */ +/** If the PCI slot is frozen, hold off all i/o + * activity; then, as soon as the slot is available again, + * initiate an adapter reset. + */ +static int ipr_reset_freeze(struct ipr_cmnd *ipr_cmd) +{ + /* Disallow new interrupts, avoid loop */ + ipr_cmd->ioa_cfg->allow_interrupts = 0; + list_add_tail(&ipr_cmd->queue, &ipr_cmd->ioa_cfg->pending_q); + ipr_cmd->done = ipr_reset_ioa_job; + return IPR_RC_JOB_RETURN; +} + +/** ipr_eeh_frozen -- called when slot has experience PCI bus error. + * This routine is called to tell us that the PCI bus is down. + * Can't do anything here, except put the device driver into a + * holding pattern, waiting for the PCI bus to come back. + */ +static void ipr_eeh_frozen (struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_freeze, IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); +} + +/** ipr_eeh_slot_reset - called when pci slot has been reset. + * + * This routine is called by the pci error recovery recovery + * code after the PCI slot has been reset, just before we + * should resume normal operations. 
+ */ +static pci_ers_result_t ipr_eeh_slot_reset(struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space, + IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); + + return PCI_ERS_RESULT_RECOVERED; +} + +/** This routine is called when the PCI bus has permanently + * failed. This routine should purge all pending I/O and + * shut down the device driver (close and unload). + */ +static void ipr_eeh_perm_failure(struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + if (ioa_cfg->sdt_state == WAIT_FOR_DUMP) + ioa_cfg->sdt_state = ABORT_DUMP; + ioa_cfg->reset_retries = IPR_NUM_RESET_RELOAD_RETRIES; + ioa_cfg->in_ioa_bringdown = 1; + ipr_initiate_ioa_reset(ioa_cfg, IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); +} + +static pci_ers_result_t ipr_eeh_error_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + switch (state) { + case pci_channel_io_frozen: + ipr_eeh_frozen (pdev); + return PCI_ERS_RESULT_NEED_RESET; + + case pci_channel_io_perm_failure: + ipr_eeh_perm_failure (pdev); + return PCI_ERS_RESULT_DISCONNECT; + break; + default: + break; + } + return PCI_ERS_RESULT_NEED_RESET; +} + +/* ------------- end of PCI Error Recovery suport ----------- */ + /** * ipr_probe_ioa_part2 - Initializes IOAs found in ipr_probe_ioa(..) 
* @ioa_cfg: ioa cfg struct @@ -6065,12 +6151,18 @@ }; MODULE_DEVICE_TABLE(pci, ipr_pci_table); +static struct pci_error_handlers ipr_err_handler = { + .error_detected = ipr_eeh_error_detected, + .slot_reset = ipr_eeh_slot_reset, +}; + static struct pci_driver ipr_driver = { .name = IPR_NAME, .id_table = ipr_pci_table, .probe = ipr_probe, .remove = ipr_remove, .shutdown = ipr_shutdown, + .err_handler = &ipr_err_handler, }; /** From linas at austin.ibm.com Wed Nov 9 10:57:16 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 17:57:16 -0600 Subject: [PATCH 3/7] PCI Error Recovery: Symbios SCSI device driver In-Reply-To: <20051108234911.GC19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> Message-ID: <20051108235716.GF19593@austin.ibm.com> Please apply. --- Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the Symbios SCSI device driver. The patch has been tested, and appears to work well. 
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git10/drivers/scsi/sym53c8xx_2/sym_glue.c =================================================================== --- linux-2.6.14-git10.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-10-27 19:02:08.000000000 -0500 +++ linux-2.6.14-git10/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 17:44:37.766326553 -0600 @@ -686,6 +686,10 @@ if (DEBUG_FLAGS & DEBUG_TINY) printf_debug ("["); + /* Avoid spinloop trying to handle interrupts on frozen device */ + if (np->s.io_state != pci_channel_io_normal) + return IRQ_HANDLED; + spin_lock_irqsave(np->s.host->host_lock, flags); sym_interrupt(np); spin_unlock_irqrestore(np->s.host->host_lock, flags); @@ -759,6 +763,25 @@ */ static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); } +static void sym_eeh_timeout(u_long p) +{ + struct sym_eh_wait *ep = (struct sym_eh_wait *) p; + if (!ep) + return; + complete(&ep->done); +} + +static void sym_eeh_done(struct sym_eh_wait *ep) +{ + if (!ep) + return; + ep->timed_out = 0; + if (!del_timer(&ep->timer)) + return; + + complete(&ep->done); +} + /* * Generic method for our eh processing. * The 'op' argument tells what we have to do. @@ -799,6 +822,35 @@ /* Try to proceed the operation we have been asked for */ sts = -1; + + /* We may be in an error condition because the PCI bus + * went down. In this case, we need to wait until the + * PCI bus is reset, the card is reset, and only then + * proceed with the scsi error recovery. We'll wait + * for 15 seconds for this to happen. 
+ */ +#define WAIT_FOR_PCI_RECOVERY 15 + if (np->s.io_state != pci_channel_io_normal) { + struct sym_eh_wait eeh, *eep = &eeh; + np->s.io_reset_wait = eep; + init_completion(&eep->done); + init_timer(&eep->timer); + eep->to_do = SYM_EH_DO_WAIT; + eep->timer.expires = jiffies + (WAIT_FOR_PCI_RECOVERY*HZ); + eep->timer.function = sym_eeh_timeout; + eep->timer.data = (u_long)eep; + eep->timed_out = 1; /* Be pessimistic for once :) */ + add_timer(&eep->timer); + spin_unlock_irq(np->s.host->host_lock); + wait_for_completion(&eep->done); + spin_lock_irq(np->s.host->host_lock); + if (eep->timed_out) { + printk (KERN_ERR "%s: Timed out waiting for PCI reset\n", + sym_name(np)); + } + np->s.io_reset_wait = NULL; + } + switch(op) { case SYM_EH_ABORT: sts = sym_abort_scsiio(np, cmd, 1); @@ -1584,6 +1636,8 @@ np->maxoffs = dev->chip.offset_max; np->maxburst = dev->chip.burst_max; np->myaddr = dev->host_id; + np->s.io_state = pci_channel_io_normal; + np->s.io_reset_wait = NULL; /* * Edit its name. @@ -1916,6 +1970,58 @@ return 1; } +/* ------------- PCI Error Recovery infrastructure -------------- */ +/** sym2_io_error_detected() is called when PCI error is detected */ +static pci_ers_result_t sym2_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + np->s.io_state = state; + // XXX If slot is permanently frozen, then what? + // Should we scsi_remove_host() maybe ?? + + /* Request a slot slot reset. */ + return PCI_ERS_RESULT_NEED_RESET; +} + +/** sym2_io_slot_reset is called when the pci bus has been reset. + * Restart the card from scratch. 
*/ +static pci_ers_result_t sym2_io_slot_reset (struct pci_dev *pdev) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + printk (KERN_INFO "%s: recovering from a PCI slot reset\n", + sym_name(np)); + + if (pci_enable_device(pdev)) + printk (KERN_ERR "%s: device setup failed most egregiously\n", + sym_name(np)); + + pci_set_master(pdev); + enable_irq (pdev->irq); + + /* Perform host reset only on one instance of the card */ + if (0 == PCI_FUNC (pdev->devfn)) + sym_reset_scsi_bus(np, 0); + + return PCI_ERS_RESULT_RECOVERED; +} + +/** sym2_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + */ +static void sym2_io_resume (struct pci_dev *pdev) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + /* Perform device startup only once for this card. */ + if (0 == PCI_FUNC (pdev->devfn)) + sym_start_up (np, 1); + + np->s.io_state = pci_channel_io_normal; + sym_eeh_done (np->s.io_reset_wait); +} + /* * Driver host template. */ @@ -2169,11 +2275,18 @@ MODULE_DEVICE_TABLE(pci, sym2_id_table); +static struct pci_error_handlers sym2_err_handler = { + .error_detected = sym2_io_error_detected, + .slot_reset = sym2_io_slot_reset, + .resume = sym2_io_resume, +}; + static struct pci_driver sym2_driver = { .name = NAME53C8XX, .id_table = sym2_id_table, .probe = sym2_probe, .remove = __devexit_p(sym2_remove), + .err_handler = &sym2_err_handler, }; static int __init sym2_init(void) Index: linux-2.6.14-git10/drivers/scsi/sym53c8xx_2/sym_glue.h =================================================================== --- linux-2.6.14-git10.orig/drivers/scsi/sym53c8xx_2/sym_glue.h 2005-10-27 19:02:08.000000000 -0500 +++ linux-2.6.14-git10/drivers/scsi/sym53c8xx_2/sym_glue.h 2005-11-07 17:44:37.768326272 -0600 @@ -181,6 +181,10 @@ char chip_name[8]; struct pci_dev *device; + /* pci bus i/o state; waiter for clearing of i/o state */ + pci_channel_state_t io_state; + struct sym_eh_wait *io_reset_wait; + struct Scsi_Host *host; void __iomem * 
ioaddr; /* MMIO kernel io address */ Index: linux-2.6.14-git10/drivers/scsi/sym53c8xx_2/sym_hipd.c =================================================================== --- linux-2.6.14-git10.orig/drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-11-07 17:24:14.000000000 -0600 +++ linux-2.6.14-git10/drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-11-07 17:44:37.813319951 -0600 @@ -2809,6 +2809,7 @@ u_char istat, istatc; u_char dstat; u_short sist; + u_int icnt; /* * interrupt on the fly ? @@ -2850,6 +2851,7 @@ sist = 0; dstat = 0; istatc = istat; + icnt = 0; do { if (istatc & SIP) sist |= INW(np, nc_sist); @@ -2857,6 +2859,19 @@ dstat |= INB(np, nc_dstat); istatc = INB(np, nc_istat); istat |= istatc; + + /* Prevent deadlock waiting on a condition that may never clear. */ + /* XXX this is a temporary kludge; the correct to detect + * a PCI bus error would be to use the io_check interfaces + * proposed by Hidetoshi Seto + * Problem with polling like that is the state flag might not + * be set. + */ + icnt ++; + if (100 < icnt) { + if (np->s.device->error_state != pci_channel_io_normal) + return; + } } while (istatc & (SIP|DIP)); if (DEBUG_FLAGS & DEBUG_TINY) From linas at austin.ibm.com Wed Nov 9 10:58:46 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 17:58:46 -0600 Subject: [PATCH 4/7] PCI Error Recovery: e100 network device driver In-Reply-To: <20051108234911.GC19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> Message-ID: <20051108235846.GG19593@austin.ibm.com> Please apply. ----- Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel ethernet e100 device driver. The patch has been tested, and appears to work well. 
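For readers new to the callback scheme, here is a rough userspace sketch of the order in which the three callbacks are meant to run. Everything named toy_* or ERS_* is a hypothetical stand-in made up for this illustration and is not kernel API; the real sequencing is decided by the platform's error recovery code, so treat this as a simplified model only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's result codes (assumption). */
enum ers_result { ERS_NEED_RESET, ERS_RECOVERED, ERS_DISCONNECT };

/* Toy device state; not a struct pci_dev. */
struct toy_dev {
	int enabled;     /* 1 once the device has been re-enabled */
	int running;     /* 1 while the interface is passing traffic */
	int fail_enable; /* set to 1 to simulate a failing re-enable */
};

/* Mirrors the shape of the three callbacks used in the patch. */
struct toy_err_handlers {
	enum ers_result (*error_detected)(struct toy_dev *dev);
	enum ers_result (*slot_reset)(struct toy_dev *dev);
	void (*resume)(struct toy_dev *dev);
};

/* Step 1: stop all I/O and ask for a slot reset. */
static enum ers_result toy_error_detected(struct toy_dev *dev)
{
	dev->running = 0;
	dev->enabled = 0;
	return ERS_NEED_RESET;
}

/* Step 2: after the reset, re-enable the device or give up. */
static enum ers_result toy_slot_reset(struct toy_dev *dev)
{
	if (dev->fail_enable)
		return ERS_DISCONNECT;
	dev->enabled = 1;
	return ERS_RECOVERED;
}

/* Step 3: normal operation may resume. */
static void toy_resume(struct toy_dev *dev)
{
	assert(dev->enabled);
	dev->running = 1;
}

static const struct toy_err_handlers toy_handlers = {
	.error_detected = toy_error_detected,
	.slot_reset     = toy_slot_reset,
	.resume         = toy_resume,
};

/* Simplified model of what the recovery core does with the handlers. */
static enum ers_result toy_recover(struct toy_dev *dev,
				   const struct toy_err_handlers *h)
{
	enum ers_result res = h->error_detected(dev);

	if (res == ERS_NEED_RESET)
		res = h->slot_reset(dev);
	if (res == ERS_RECOVERED)
		h->resume(dev);
	return res;
}
```

Calling toy_recover(&dev, &toy_handlers) on a device whose re-enable succeeds leaves it enabled and running again; if the re-enable fails, the device stays down and the disconnect result is returned.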
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git10/drivers/net/e100.c =================================================================== --- linux-2.6.14-git10.orig/drivers/net/e100.c 2005-11-07 17:24:10.000000000 -0600 +++ linux-2.6.14-git10/drivers/net/e100.c 2005-11-07 17:44:42.911603712 -0600 @@ -2465,6 +2465,75 @@ } +/* ------------------ PCI Error Recovery infrastructure -------------- */ +/** e100_io_error_detected() is called when PCI error is detected */ +static pci_ers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + + /* Same as calling e100_down(netdev_priv(netdev)), but generic */ + netdev->stop(netdev); + + /* Is a detach needed ?? */ + // netif_device_detach(netdev); + + /* Request a slot reset. */ + return PCI_ERS_RESULT_NEED_RESET; +} + +/** e100_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. */ +static pci_ers_result_t e100_io_slot_reset(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct nic *nic = netdev_priv(netdev); + + if(pci_enable_device(pdev)) { + printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n"); + return PCI_ERS_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + /* Only one device per card can do a reset */ + if (0 != PCI_FUNC (pdev->devfn)) + return PCI_ERS_RESULT_RECOVERED; + + e100_hw_reset(nic); + e100_phy_init(nic); + + if(e100_hw_init(nic)) { + DPRINTK(HW, ERR, "e100_hw_init failed\n"); + return PCI_ERS_RESULT_DISCONNECT; + } + + return PCI_ERS_RESULT_RECOVERED; +} + +/** e100_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. 
+ */ +static void e100_io_resume(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct nic *nic = netdev_priv(netdev); + + /* ack any pending wake events, disable PME */ + pci_enable_wake(pdev, 0, 0); + + netif_device_attach(netdev); + if(netif_running(netdev)) { + e100_open (netdev); + mod_timer(&nic->watchdog, jiffies); + } +} + +static struct pci_error_handlers e100_err_handler = { + .error_detected = e100_io_error_detected, + .slot_reset = e100_io_slot_reset, + .resume = e100_io_resume, +}; + + static struct pci_driver e100_driver = { .name = DRV_NAME, .id_table = e100_id_table, @@ -2475,6 +2544,7 @@ .resume = e100_resume, #endif .shutdown = e100_shutdown, + .err_handler = &e100_err_handler, }; static int __init e100_init_module(void) From linas at austin.ibm.com Wed Nov 9 11:00:14 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 18:00:14 -0600 Subject: [PATCH 5/7] PCI Error Recovery: e1000 network device driver In-Reply-To: <20051108234911.GC19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> Message-ID: <20051109000014.GH19593@austin.ibm.com> Please apply. ---- Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel gigabit ethernet e1000 device driver. The patch has been tested, and appears to work well. 
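The slot_reset callbacks in this series perform the card-wide reset only from PCI function 0, so that a multi-function card is not reset once per function. A small sketch of that check follows; the TOY_* macros are written out here for illustration and are assumed to mirror the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC encoding (function in the low 3 bits, slot in the next 5), and toy_should_reset_card() is a made-up helper, not something in the patch.

```c
#include <assert.h>

/* Assumed to match the kernel's devfn encoding: slot in bits 7..3, function in bits 2..0. */
#define TOY_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define TOY_PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define TOY_PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Hypothetical helper: returns 1 if this function should do the card-wide reset. */
static int toy_should_reset_card(unsigned int devfn)
{
	assert(devfn <= 0xff);	/* devfn is an 8-bit quantity */
	return TOY_PCI_FUNC(devfn) == 0;
}
```

With this encoding, function 0 of slot 3 would do the reset, while function 1 of the same slot would simply report that it has recovered.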
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git10/drivers/net/e1000/e1000_main.c =================================================================== --- linux-2.6.14-git10.orig/drivers/net/e1000/e1000_main.c 2005-11-07 17:24:10.000000000 -0600 +++ linux-2.6.14-git10/drivers/net/e1000/e1000_main.c 2005-11-07 17:44:45.143290190 -0600 @@ -206,6 +206,16 @@ void e1000_rx_schedule(void *data); #endif +static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state); +static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev); +static void e1000_io_resume(struct pci_dev *pdev); + +static struct pci_error_handlers e1000_err_handler = { + .error_detected = e1000_io_error_detected, + .slot_reset = e1000_io_slot_reset, + .resume = e1000_io_resume, +}; + /* Exported from other modules */ extern void e1000_check_options(struct e1000_adapter *adapter); @@ -218,8 +228,9 @@ /* Power Managment Hooks */ #ifdef CONFIG_PM .suspend = e1000_suspend, - .resume = e1000_resume + .resume = e1000_resume, #endif + .err_handler = &e1000_err_handler, }; MODULE_AUTHOR("Intel Corporation, "); @@ -2937,6 +2948,10 @@ #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF + /* Prevent stats update while adapter is being reset */ + if (adapter->link_speed == 0) + return; + spin_lock_irqsave(&adapter->stats_lock, flags); /* these counters are modified from e1000_adjust_tbi_stats, @@ -4358,4 +4373,88 @@ } #endif +/* --------------- PCI Error Recovery infrastructure ------------ */ +/** e1000_io_error_detected() is called when PCI error is detected */ +static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + + if (netif_running(netdev)) + e1000_down(adapter); + + /* Request a slot slot reset. */ + return PCI_ERS_RESULT_NEED_RESET; +} + +/** e1000_io_slot_reset is called after the pci bus has been reset. 
+ * Restart the card from scratch. + * Implementation resembles the first-half of the + * e1000_resume routine. + */ +static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + + if (pci_enable_device(pdev)) { + printk(KERN_ERR "e1000: Cannot re-enable PCI device after reset.\n"); + return PCI_ERS_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + pci_enable_wake(pdev, 3, 0); + pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */ + + /* Perform card reset only on one instance of the card */ + if(0 != PCI_FUNC (pdev->devfn)) + return PCI_ERS_RESULT_RECOVERED; + + e1000_reset(adapter); + E1000_WRITE_REG(&adapter->hw, WUS, ~0); + + return PCI_ERS_RESULT_RECOVERED; +} + +/** e1000_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + * Implementation resembles the second-half of the + * e1000_resume routine. + */ +static void e1000_io_resume(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + uint32_t manc, swsm; + + if(netif_running(netdev)) { + if (e1000_up(adapter)) { + printk("e1000: can't bring device back up after reset\n"); + return; + } + } + + netif_device_attach(netdev); + + if(adapter->hw.mac_type >= e1000_82540 && + adapter->hw.media_type == e1000_media_type_copper) { + manc = E1000_READ_REG(&adapter->hw, MANC); + manc &= ~(E1000_MANC_ARP_EN); + E1000_WRITE_REG(&adapter->hw, MANC, manc); + } + + switch(adapter->hw.mac_type) { + case e1000_82573: + swsm = E1000_READ_REG(&adapter->hw, SWSM); + E1000_WRITE_REG(&adapter->hw, SWSM, + swsm | E1000_SWSM_DRV_LOAD); + break; + default: + break; + } + + if(netif_running(netdev)) + mod_timer(&adapter->watchdog_timer, jiffies); +} + /* e1000_main.c */ From linas at austin.ibm.com Wed Nov 9 11:01:34 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 18:01:34 -0600 Subject: 
[PATCH 6/7] PCI Error Recovery: ixgb network device driver In-Reply-To: <20051108234911.GC19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> Message-ID: <20051109000134.GI19593@austin.ibm.com> Please apply. ---- Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel ten-gigabit ethernet ixgb device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git10/drivers/net/ixgb/ixgb_main.c =================================================================== --- linux-2.6.14-git10.orig/drivers/net/ixgb/ixgb_main.c 2005-11-07 17:24:11.000000000 -0600 +++ linux-2.6.14-git10/drivers/net/ixgb/ixgb_main.c 2005-11-07 17:44:50.380554424 -0600 @@ -132,6 +132,16 @@ static void ixgb_netpoll(struct net_device *dev); #endif +static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state); +static pci_ers_result_t ixgb_io_slot_reset (struct pci_dev *pdev); +static void ixgb_io_resume (struct pci_dev *pdev); + +static struct pci_error_handlers ixgb_err_handler = { + .error_detected = ixgb_io_error_detected, + .slot_reset = ixgb_io_slot_reset, + .resume = ixgb_io_resume, +}; + /* Exported from other modules */ extern void ixgb_check_options(struct ixgb_adapter *adapter); @@ -141,6 +151,8 @@ .id_table = ixgb_pci_tbl, .probe = ixgb_probe, .remove = __devexit_p(ixgb_remove), + .err_handler = &ixgb_err_handler, + }; MODULE_AUTHOR("Intel Corporation, "); @@ -1654,8 +1666,16 @@ unsigned int i; #endif +#ifdef XXX_CONFIG_IXGB_EEH_RECOVERY + if(unlikely(icr==EEH_IO_ERROR_VALUE(4))) { + if (eeh_slot_is_isolated (adapter->pdev)) + // disable_irq_nosync (adapter->pdev->irq); + return IRQ_NONE; /* Not our interrupt */ + } +#else if(unlikely(!icr)) return IRQ_NONE; /* Not our interrupt */ +#endif /* CONFIG_IXGB_EEH_RECOVERY */ if(unlikely(icr & (IXGB_INT_RXSEQ | IXGB_INT_LSC))) { mod_timer(&adapter->watchdog_timer, 
jiffies); @@ -2125,4 +2145,70 @@ } #endif +/* -------------- PCI Error Recovery infrastructure ---------------- */ +/** ixgb_io_error_detected() is called when PCI error is detected */ +static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(netif_running(netdev)) + ixgb_down(adapter, TRUE); + + /* Request a slot reset. */ + return PCI_ERS_RESULT_NEED_RESET; +} + +/** ixgb_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. + * Implementation resembles the first-half of the + * ixgb_resume routine. + */ +static pci_ers_result_t ixgb_io_slot_reset (struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(pci_enable_device(pdev)) { + printk(KERN_ERR "ixgb: Cannot re-enable PCI device after reset.\n"); + return PCI_ERS_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + /* Perform card reset only on one instance of the card */ + if (0 != PCI_FUNC (pdev->devfn)) + return PCI_ERS_RESULT_RECOVERED; + + ixgb_reset(adapter); + + return PCI_ERS_RESULT_RECOVERED; +} + +/** ixgb_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + * Implementation resembles the second-half of the + * ixgb_resume routine. + */ +static void ixgb_io_resume (struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(netif_running(netdev)) { + if(ixgb_up(adapter)) { + printk ("ixgb: can't bring device back up after reset\n"); + return; + } + } + + netif_device_attach(netdev); + if(netif_running(netdev)) + mod_timer(&adapter->watchdog_timer, jiffies); + + /* Reading all-ff's from the adapter will completely hose + * the counts and statistics. 
So just clear them out */ + memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats)); + ixgb_update_stats(adapter); +} + /* ixgb_main.c */ From linas at austin.ibm.com Wed Nov 9 11:03:29 2005 From: linas at austin.ibm.com (linas) Date: Tue, 8 Nov 2005 18:03:29 -0600 Subject: [PATCH 7/7] PCI Error Recovery: CONFIG_PCI_ERROR_RECOVERY wrappers In-Reply-To: <20051108234911.GC19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> Message-ID: <20051109000329.GJ19593@austin.ibm.com> Please apply. ----- This OPTIONAL/RFC patch adds ifdefs around the PCI error recovery code in the various device drivers. This patch is "optional" in that it's a little bit messy, but it does solve a little problem. -- The good news: this gives some users (e.g. embedded systems) the option of not compiling in this code, thus making their device drivers a tiny bit smaller. -- The bad news: This also clutters up the drivers with extraneous markup and the config process with yet another config. I don't know if this patch is worth it. It's up to you ... :-) Signed-off-by: Linas Vepstas Index: linux-2.6.14-git10/drivers/scsi/ipr.c =================================================================== --- linux-2.6.14-git10.orig/drivers/scsi/ipr.c 2005-11-07 17:44:35.415656790 -0600 +++ linux-2.6.14-git10/drivers/scsi/ipr.c 2005-11-07 17:44:56.315720610 -0600 @@ -5329,6 +5329,8 @@ } /* --------------- PCI Error Recovery infrastructure ----------- */ +#ifdef CONFIG_PCI_ERROR_RECOVERY + /** If the PCI slot is frozen, hold off all i/o * activity; then, as soon as the slot is available again, * initiate an adapter reset.
@@ -5412,6 +5414,7 @@ return PCI_ERS_RESULT_NEED_RESET; } +#endif /* CONFIG_PCI_ERROR_RECOVERY */ /* ------------- end of PCI Error Recovery suport ----------- */ /** @@ -6151,10 +6154,12 @@ }; MODULE_DEVICE_TABLE(pci, ipr_pci_table); +#ifdef CONFIG_PCI_ERROR_RECOVERY static struct pci_error_handlers ipr_err_handler = { .error_detected = ipr_eeh_error_detected, .slot_reset = ipr_eeh_slot_reset, }; +#endif /* CONFIG_PCI_ERROR_RECOVERY */ static struct pci_driver ipr_driver = { .name = IPR_NAME, @@ -6162,7 +6167,9 @@ .probe = ipr_probe, .remove = ipr_remove, .shutdown = ipr_shutdown, +#ifdef CONFIG_PCI_ERROR_RECOVERY .err_handler = &ipr_err_handler, +#endif /* CONFIG_PCI_ERROR_RECOVERY */ }; /** Index: linux-2.6.14-git10/drivers/pci/Kconfig =================================================================== --- linux-2.6.14-git10.orig/drivers/pci/Kconfig 2005-10-27 19:02:08.000000000 -0500 +++ linux-2.6.14-git10/drivers/pci/Kconfig 2005-11-07 17:44:56.327718924 -0600 @@ -13,6 +13,21 @@ If you don't know what to do here, say N. +config PCI_ERROR_RECOVERY + bool "PCI Error Recovery support" + depends on PCI + depends on PPC_PSERIES + default y + help + PCI Error Recovery is a mechanism by which crashed/hung + PCI adapters are automatically detected and rebooted without + otherwise disturbing the operation of the system. Support + for this recovery requires special PCI bridge chips (some + PCI-E chips may have this support) as well as support in + the device drivers (not all device drivers can handle this). + + When in doubt, say Y.
+ config PCI_LEGACY_PROC bool "Legacy /proc/pci interface" depends on PCI Index: linux-2.6.14-git10/drivers/scsi/sym53c8xx_2/sym_glue.c =================================================================== --- linux-2.6.14-git10.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 17:44:37.766326553 -0600 +++ linux-2.6.14-git10/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-07 17:44:56.332718222 -0600 @@ -763,6 +763,7 @@ */ static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); } +#ifdef CONFIG_PCI_ERROR_RECOVERY static void sym_eeh_timeout(u_long p) { struct sym_eh_wait *ep = (struct sym_eh_wait *) p; @@ -781,6 +782,7 @@ complete(&ep->done); } +#endif /* CONFIG_PCI_ERROR_RECOVERY */ /* * Generic method for our eh processing. @@ -823,6 +825,7 @@ /* Try to proceed the operation we have been asked for */ sts = -1; +#ifdef CONFIG_PCI_ERROR_RECOVERY /* We may be in an error condition because the PCI bus * went down. In this case, we need to wait until the * PCI bus is reset, the card is reset, and only then @@ -850,6 +853,7 @@ } np->s.io_reset_wait = NULL; } +#endif /* CONFIG_PCI_ERROR_RECOVERY */ switch(op) { case SYM_EH_ABORT: @@ -1971,6 +1975,7 @@ } /* ------------- PCI Error Recovery infrastructure -------------- */ +#ifdef CONFIG_PCI_ERROR_RECOVERY /** sym2_io_error_detected() is called when PCI error is detected */ static pci_ers_result_t sym2_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state) { @@ -2021,6 +2026,7 @@ np->s.io_state = pci_channel_io_normal; sym_eeh_done (np->s.io_reset_wait); } +#endif /* CONFIG_PCI_ERROR_RECOVERY */ /* * Driver host template. 
@@ -2275,18 +2281,22 @@ MODULE_DEVICE_TABLE(pci, sym2_id_table); +#ifdef CONFIG_PCI_ERROR_RECOVERY static struct pci_error_handlers sym2_err_handler = { .error_detected = sym2_io_error_detected, .slot_reset = sym2_io_slot_reset, .resume = sym2_io_resume, }; +#endif /* CONFIG_PCI_ERROR_RECOVERY */ static struct pci_driver sym2_driver = { .name = NAME53C8XX, .id_table = sym2_id_table, .probe = sym2_probe, .remove = __devexit_p(sym2_remove), +#ifdef CONFIG_PCI_ERROR_RECOVERY .err_handler = &sym2_err_handler, +#endif /* CONFIG_PCI_ERROR_RECOVERY */ }; static int __init sym2_init(void) Index: linux-2.6.14-git10/drivers/net/e100.c =================================================================== --- linux-2.6.14-git10.orig/drivers/net/e100.c 2005-11-07 17:44:42.911603712 -0600 +++ linux-2.6.14-git10/drivers/net/e100.c 2005-11-07 17:44:56.337717520 -0600 @@ -2466,6 +2466,7 @@ /* ------------------ PCI Error Recovery infrastructure -------------- */ +#ifdef CONFIG_PCI_ERROR_RECOVERY /** e100_io_error_detected() is called when PCI error is detected */ static pci_ers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) { @@ -2532,6 +2533,7 @@ .slot_reset = e100_io_slot_reset, .resume = e100_io_resume, }; +#endif /* CONFIG_PCI_ERROR_RECOVERY */ static struct pci_driver e100_driver = { @@ -2544,7 +2546,9 @@ .resume = e100_resume, #endif .shutdown = e100_shutdown, +#ifdef CONFIG_PCI_ERROR_RECOVERY .err_handler = &e100_err_handler, +#endif /* CONFIG_PCI_ERROR_RECOVERY */ }; static int __init e100_init_module(void) Index: linux-2.6.14-git10/drivers/net/e1000/e1000_main.c =================================================================== --- linux-2.6.14-git10.orig/drivers/net/e1000/e1000_main.c 2005-11-07 17:44:45.143290190 -0600 +++ linux-2.6.14-git10/drivers/net/e1000/e1000_main.c 2005-11-07 17:44:56.344716537 -0600 @@ -206,6 +206,7 @@ void e1000_rx_schedule(void *data); #endif +#ifdef CONFIG_PCI_ERROR_RECOVERY static pci_ers_result_t 
e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state); static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev); static void e1000_io_resume(struct pci_dev *pdev); @@ -215,6 +216,7 @@ .slot_reset = e1000_io_slot_reset, .resume = e1000_io_resume, }; +#endif /* CONFIG_PCI_ERROR_RECOVERY */ /* Exported from other modules */ @@ -230,7 +232,9 @@ .suspend = e1000_suspend, .resume = e1000_resume, #endif +#ifdef CONFIG_PCI_ERROR_RECOVERY .err_handler = &e1000_err_handler, +#endif /* CONFIG_PCI_ERROR_RECOVERY */ }; MODULE_AUTHOR("Intel Corporation, "); @@ -4374,6 +4378,7 @@ #endif /* --------------- PCI Error Recovery infrastructure ------------ */ +#ifdef CONFIG_PCI_ERROR_RECOVERY /** e1000_io_error_detected() is called when PCI error is detected */ static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) { @@ -4456,5 +4461,6 @@ if(netif_running(netdev)) mod_timer(&adapter->watchdog_timer, jiffies); } +#endif /* CONFIG_PCI_ERROR_RECOVERY */ /* e1000_main.c */ Index: linux-2.6.14-git10/drivers/net/ixgb/ixgb_main.c =================================================================== --- linux-2.6.14-git10.orig/drivers/net/ixgb/ixgb_main.c 2005-11-07 17:44:50.380554424 -0600 +++ linux-2.6.14-git10/drivers/net/ixgb/ixgb_main.c 2005-11-07 17:44:56.350715694 -0600 @@ -132,6 +132,7 @@ static void ixgb_netpoll(struct net_device *dev); #endif +#ifdef CONFIG_PCI_ERROR_RECOVERY static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state); static pci_ers_result_t ixgb_io_slot_reset (struct pci_dev *pdev); static void ixgb_io_resume (struct pci_dev *pdev); @@ -141,6 +142,7 @@ .slot_reset = ixgb_io_slot_reset, .resume = ixgb_io_resume, }; +#endif /* CONFIG_PCI_ERROR_RECOVERY */ /* Exported from other modules */ @@ -151,8 +153,9 @@ .id_table = ixgb_pci_tbl, .probe = ixgb_probe, .remove = __devexit_p(ixgb_remove), +#ifdef CONFIG_PCI_ERROR_RECOVERY .err_handler = &ixgb_err_handler, - 
+#endif /* CONFIG_PCI_ERROR_RECOVERY */ }; MODULE_AUTHOR("Intel Corporation, "); @@ -2146,6 +2149,7 @@ #endif /* -------------- PCI Error Recovery infrastructure ---------------- */ +#ifdef CONFIG_PCI_ERROR_RECOVERY /** ixgb_io_error_detected() is called when PCI error is detected */ static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state) { @@ -2210,5 +2214,6 @@ memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats)); ixgb_update_stats(adapter); } +#endif /* CONFIG_PCI_ERROR_RECOVERY */ /* ixgb_main.c */ From david at gibson.dropbear.id.au Wed Nov 9 11:21:07 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 9 Nov 2005 11:21:07 +1100 Subject: powerpc: Merge signal.h Message-ID: <20051109002107.GD28271@localhost.localdomain> Having already merged the ppc and ppc64 versions of signal.c, this patch finishes the job by merging signal.h. The two versions were almost identical already. Notable changes: - We use BITS_PER_LONG to correctly size sigset_t. - Remove some unneeded #includes and struct forward declarations. This does mean adding an include to signal_32.c, which relied on the indirect inclusion of sigcontext.h. - As in the ppc64 version, the merged signal.h has prototypes for do_signal() and do_signal32(). Thus remove the extra prototypes from ppc_ksyms.c, which declared them directly. Built and booted on POWER5 LPAR (ARCH=ppc64 and ARCH=powerpc). Built for 32-bit powermac (ARCH=ppc and ARCH=powerpc) and Walnut (ARCH=ppc).
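To illustrate the BITS_PER_LONG point: with 64 signals, the signal set needs one word where long is 64 bits and two words where it is 32 bits. The following is only a userspace sketch of that sizing rule. The TOY_* names and toy_* helpers are invented for the example, and sizeof(unsigned long) * CHAR_BIT is used here as an assumed stand-in for the kernel's BITS_PER_LONG.

```c
#include <assert.h>
#include <limits.h>

#define TOY_NSIG       64
/* Userspace stand-in for BITS_PER_LONG (assumption for this sketch). */
#define TOY_NSIG_BPW   ((int)(sizeof(unsigned long) * CHAR_BIT))
#define TOY_NSIG_WORDS (TOY_NSIG / TOY_NSIG_BPW)

/* Same layout idea as sigset_t in the merged header: an array of longs. */
typedef struct {
	unsigned long sig[TOY_NSIG_WORDS];
} toy_sigset_t;

/* Signal numbers start at 1, so signal N lives in bit N-1. */
static void toy_sigaddset(toy_sigset_t *set, int signo)
{
	int bit = signo - 1;

	assert(signo >= 1 && signo <= TOY_NSIG);
	set->sig[bit / TOY_NSIG_BPW] |= 1UL << (bit % TOY_NSIG_BPW);
}

static int toy_sigismember(const toy_sigset_t *set, int signo)
{
	int bit = signo - 1;

	assert(signo >= 1 && signo <= TOY_NSIG);
	return (set->sig[bit / TOY_NSIG_BPW] >> (bit % TOY_NSIG_BPW)) & 1UL;
}
```

Either way the set holds exactly 64 bits; only the number of words changes, which is why a single definition can serve both the 32-bit and 64-bit builds.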
Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/signal.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/signal.h 2005-11-09 10:31:07.000000000 +1100 @@ -0,0 +1,150 @@ +#ifndef _ASM_POWERPC_SIGNAL_H +#define _ASM_POWERPC_SIGNAL_H + +#include +#include + +#define _NSIG 64 +#define _NSIG_BPW BITS_PER_LONG +#define _NSIG_WORDS (_NSIG / _NSIG_BPW) + +typedef unsigned long old_sigset_t; /* at least 32 bits */ + +typedef struct { + unsigned long sig[_NSIG_WORDS]; +} sigset_t; + +#define SIGHUP 1 +#define SIGINT 2 +#define SIGQUIT 3 +#define SIGILL 4 +#define SIGTRAP 5 +#define SIGABRT 6 +#define SIGIOT 6 +#define SIGBUS 7 +#define SIGFPE 8 +#define SIGKILL 9 +#define SIGUSR1 10 +#define SIGSEGV 11 +#define SIGUSR2 12 +#define SIGPIPE 13 +#define SIGALRM 14 +#define SIGTERM 15 +#define SIGSTKFLT 16 +#define SIGCHLD 17 +#define SIGCONT 18 +#define SIGSTOP 19 +#define SIGTSTP 20 +#define SIGTTIN 21 +#define SIGTTOU 22 +#define SIGURG 23 +#define SIGXCPU 24 +#define SIGXFSZ 25 +#define SIGVTALRM 26 +#define SIGPROF 27 +#define SIGWINCH 28 +#define SIGIO 29 +#define SIGPOLL SIGIO +/* +#define SIGLOST 29 +*/ +#define SIGPWR 30 +#define SIGSYS 31 +#define SIGUNUSED 31 + +/* These should not be considered constants from userland. */ +#define SIGRTMIN 32 +#define SIGRTMAX _NSIG + +/* + * SA_FLAGS values: + * + * SA_ONSTACK is not currently supported, but will allow sigaltstack(2). + * SA_INTERRUPT is a no-op, but left due to historical reasons. Use the + * SA_RESTART flag to get restarting signals (which were the default long ago) + * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop. + * SA_RESETHAND clears the handler when the signal is delivered. + * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies. + * SA_NODEFER prevents the current signal from being masked in the handler. 
+ * + * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single + * Unix names RESETHAND and NODEFER respectively. + */ +#define SA_NOCLDSTOP 0x00000001U +#define SA_NOCLDWAIT 0x00000002U +#define SA_SIGINFO 0x00000004U +#define SA_ONSTACK 0x08000000U +#define SA_RESTART 0x10000000U +#define SA_NODEFER 0x40000000U +#define SA_RESETHAND 0x80000000U + +#define SA_NOMASK SA_NODEFER +#define SA_ONESHOT SA_RESETHAND +#define SA_INTERRUPT 0x20000000u /* dummy -- ignored */ + +#define SA_RESTORER 0x04000000U + +/* + * sigaltstack controls + */ +#define SS_ONSTACK 1 +#define SS_DISABLE 2 + +#define MINSIGSTKSZ 2048 +#define SIGSTKSZ 8192 + +#include + +struct old_sigaction { + __sighandler_t sa_handler; + old_sigset_t sa_mask; + unsigned long sa_flags; + __sigrestore_t sa_restorer; +}; + +struct sigaction { + __sighandler_t sa_handler; + unsigned long sa_flags; + __sigrestore_t sa_restorer; + sigset_t sa_mask; /* mask last for extensibility */ +}; + +struct k_sigaction { + struct sigaction sa; +}; + +typedef struct sigaltstack { + void __user *ss_sp; + int ss_flags; + size_t ss_size; +} stack_t; + +#ifdef __KERNEL__ +struct pt_regs; +extern int do_signal(sigset_t *oldset, struct pt_regs *regs); +extern int do_signal32(sigset_t *oldset, struct pt_regs *regs); +#define ptrace_signal_deliver(regs, cookie) do { } while (0) +#endif /* __KERNEL__ */ + +#ifndef __powerpc64__ +/* + * These are parameters to dbg_sigreturn syscall. They enable or + * disable certain debugging things that can be done from signal + * handlers. The dbg_sigreturn syscall *must* be called from a + * SA_SIGINFO signal so the ucontext can be passed to it. It takes an + * array of struct sig_dbg_op, which has the debug operations to + * perform before returning from the signal. + */ +struct sig_dbg_op { + int dbg_type; + unsigned long dbg_value; +}; + +/* Enable or disable single-stepping. The value sets the state. */ +#define SIG_DBG_SINGLE_STEPPING 1 + +/* Enable or disable branch tracing. 
The value sets the state. */ +#define SIG_DBG_BRANCH_TRACING 2 +#endif /* ! __powerpc64__ */ + +#endif /* _ASM_POWERPC_SIGNAL_H */ Index: working-2.6/include/asm-ppc64/signal.h =================================================================== --- working-2.6.orig/include/asm-ppc64/signal.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,132 +0,0 @@ -#ifndef _ASMPPC64_SIGNAL_H -#define _ASMPPC64_SIGNAL_H - -#include -#include -#include - -/* Avoid too many header ordering problems. */ -struct siginfo; - -#define _NSIG 64 -#define _NSIG_BPW 64 -#define _NSIG_WORDS (_NSIG / _NSIG_BPW) - -typedef unsigned long old_sigset_t; /* at least 32 bits */ - -typedef struct { - unsigned long sig[_NSIG_WORDS]; -} sigset_t; - -#define SIGHUP 1 -#define SIGINT 2 -#define SIGQUIT 3 -#define SIGILL 4 -#define SIGTRAP 5 -#define SIGABRT 6 -#define SIGIOT 6 -#define SIGBUS 7 -#define SIGFPE 8 -#define SIGKILL 9 -#define SIGUSR1 10 -#define SIGSEGV 11 -#define SIGUSR2 12 -#define SIGPIPE 13 -#define SIGALRM 14 -#define SIGTERM 15 -#define SIGSTKFLT 16 -#define SIGCHLD 17 -#define SIGCONT 18 -#define SIGSTOP 19 -#define SIGTSTP 20 -#define SIGTTIN 21 -#define SIGTTOU 22 -#define SIGURG 23 -#define SIGXCPU 24 -#define SIGXFSZ 25 -#define SIGVTALRM 26 -#define SIGPROF 27 -#define SIGWINCH 28 -#define SIGIO 29 -#define SIGPOLL SIGIO -/* -#define SIGLOST 29 -*/ -#define SIGPWR 30 -#define SIGSYS 31 -#define SIGUNUSED 31 - -/* These should not be considered constants from userland. */ -#define SIGRTMIN 32 -#define SIGRTMAX _NSIG - -/* - * SA_FLAGS values: - * - * SA_ONSTACK is not currently supported, but will allow sigaltstack(2). - * SA_INTERRUPT is a no-op, but left due to historical reasons. Use the - * SA_RESTART flag to get restarting signals (which were the default long ago) - * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop. - * SA_RESETHAND clears the handler when the signal is delivered. 
- * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies. - * SA_NODEFER prevents the current signal from being masked in the handler. - * - * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single - * Unix names RESETHAND and NODEFER respectively. - */ -#define SA_NOCLDSTOP 0x00000001u -#define SA_NOCLDWAIT 0x00000002u -#define SA_SIGINFO 0x00000004u -#define SA_ONSTACK 0x08000000u -#define SA_RESTART 0x10000000u -#define SA_NODEFER 0x40000000u -#define SA_RESETHAND 0x80000000u - -#define SA_NOMASK SA_NODEFER -#define SA_ONESHOT SA_RESETHAND -#define SA_INTERRUPT 0x20000000u /* dummy -- ignored */ - -#define SA_RESTORER 0x04000000u - -/* - * sigaltstack controls - */ -#define SS_ONSTACK 1 -#define SS_DISABLE 2 - -#define MINSIGSTKSZ 2048 -#define SIGSTKSZ 8192 - -#include - -struct old_sigaction { - __sighandler_t sa_handler; - old_sigset_t sa_mask; - unsigned long sa_flags; - __sigrestore_t sa_restorer; -}; - -struct sigaction { - __sighandler_t sa_handler; - unsigned long sa_flags; - __sigrestore_t sa_restorer; - sigset_t sa_mask; /* mask last for extensibility */ -}; - -struct k_sigaction { - struct sigaction sa; -}; - -typedef struct sigaltstack { - void __user *ss_sp; - int ss_flags; - size_t ss_size; -} stack_t; - -struct pt_regs; -struct timespec; -extern int do_signal(sigset_t *oldset, struct pt_regs *regs); -extern int do_signal32(sigset_t *oldset, struct pt_regs *regs); -#define ptrace_signal_deliver(regs, cookie) do { } while (0) - -#endif /* _ASMPPC64_SIGNAL_H */ Index: working-2.6/include/asm-ppc/signal.h =================================================================== --- working-2.6.orig/include/asm-ppc/signal.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,153 +0,0 @@ -#ifndef _ASMPPC_SIGNAL_H -#define _ASMPPC_SIGNAL_H - -#ifdef __KERNEL__ -#include -#endif /* __KERNEL__ */ - -/* Avoid too many header ordering problems. 
*/ -struct siginfo; - -/* Most things should be clean enough to redefine this at will, if care - is taken to make libc match. */ - -#define _NSIG 64 -#define _NSIG_BPW 32 -#define _NSIG_WORDS (_NSIG / _NSIG_BPW) - -typedef unsigned long old_sigset_t; /* at least 32 bits */ - -typedef struct { - unsigned long sig[_NSIG_WORDS]; -} sigset_t; - -#define SIGHUP 1 -#define SIGINT 2 -#define SIGQUIT 3 -#define SIGILL 4 -#define SIGTRAP 5 -#define SIGABRT 6 -#define SIGIOT 6 -#define SIGBUS 7 -#define SIGFPE 8 -#define SIGKILL 9 -#define SIGUSR1 10 -#define SIGSEGV 11 -#define SIGUSR2 12 -#define SIGPIPE 13 -#define SIGALRM 14 -#define SIGTERM 15 -#define SIGSTKFLT 16 -#define SIGCHLD 17 -#define SIGCONT 18 -#define SIGSTOP 19 -#define SIGTSTP 20 -#define SIGTTIN 21 -#define SIGTTOU 22 -#define SIGURG 23 -#define SIGXCPU 24 -#define SIGXFSZ 25 -#define SIGVTALRM 26 -#define SIGPROF 27 -#define SIGWINCH 28 -#define SIGIO 29 -#define SIGPOLL SIGIO -/* -#define SIGLOST 29 -*/ -#define SIGPWR 30 -#define SIGSYS 31 -#define SIGUNUSED 31 - -/* These should not be considered constants from userland. */ -#define SIGRTMIN 32 -#define SIGRTMAX _NSIG - -/* - * SA_FLAGS values: - * - * SA_ONSTACK is not currently supported, but will allow sigaltstack(2). - * SA_INTERRUPT is a no-op, but left due to historical reasons. Use the - * SA_RESTART flag to get restarting signals (which were the default long ago) - * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop. - * SA_RESETHAND clears the handler when the signal is delivered. - * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies. - * SA_NODEFER prevents the current signal from being masked in the handler. - * - * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single - * Unix names RESETHAND and NODEFER respectively. 
- */ -#define SA_NOCLDSTOP 0x00000001 -#define SA_NOCLDWAIT 0x00000002 -#define SA_SIGINFO 0x00000004 -#define SA_ONSTACK 0x08000000 -#define SA_RESTART 0x10000000 -#define SA_NODEFER 0x40000000 -#define SA_RESETHAND 0x80000000 - -#define SA_NOMASK SA_NODEFER -#define SA_ONESHOT SA_RESETHAND -#define SA_INTERRUPT 0x20000000 /* dummy -- ignored */ - -#define SA_RESTORER 0x04000000 - -/* - * sigaltstack controls - */ -#define SS_ONSTACK 1 -#define SS_DISABLE 2 - -#define MINSIGSTKSZ 2048 -#define SIGSTKSZ 8192 - -#include - -struct old_sigaction { - __sighandler_t sa_handler; - old_sigset_t sa_mask; - unsigned long sa_flags; - __sigrestore_t sa_restorer; -}; - -struct sigaction { - __sighandler_t sa_handler; - unsigned long sa_flags; - __sigrestore_t sa_restorer; - sigset_t sa_mask; /* mask last for extensibility */ -}; - -struct k_sigaction { - struct sigaction sa; -}; - -typedef struct sigaltstack { - void __user *ss_sp; - int ss_flags; - size_t ss_size; -} stack_t; - -#ifdef __KERNEL__ -#include -#define ptrace_signal_deliver(regs, cookie) do { } while (0) -#endif /* __KERNEL__ */ - -/* - * These are parameters to dbg_sigreturn syscall. They enable or - * disable certain debugging things that can be done from signal - * handlers. The dbg_sigreturn syscall *must* be called from a - * SA_SIGINFO signal so the ucontext can be passed to it. It takes an - * array of struct sig_dbg_op, which has the debug operations to - * perform before returning from the signal. - */ -struct sig_dbg_op { - int dbg_type; - unsigned long dbg_value; -}; - -/* Enable or disable single-stepping. The value sets the state. */ -#define SIG_DBG_SINGLE_STEPPING 1 - -/* Enable or disable branch tracing. The value sets the state. 
*/ -#define SIG_DBG_BRANCH_TRACING 2 - -#endif Index: working-2.6/arch/powerpc/kernel/ppc_ksyms.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/ppc_ksyms.c 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/ppc_ksyms.c 2005-11-09 10:23:02.000000000 +1100 @@ -44,6 +44,7 @@ #include #include #include +#include #ifdef CONFIG_8xx #include @@ -56,7 +57,6 @@ extern void alignment_exception(struct pt_regs *regs); extern void program_check_exception(struct pt_regs *regs); extern void single_step_exception(struct pt_regs *regs); -extern int do_signal(sigset_t *, struct pt_regs *); extern int pmac_newworld; extern int sys_sigreturn(struct pt_regs *regs); Index: working-2.6/arch/ppc/kernel/ppc_ksyms.c =================================================================== --- working-2.6.orig/arch/ppc/kernel/ppc_ksyms.c 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/ppc/kernel/ppc_ksyms.c 2005-11-09 10:23:02.000000000 +1100 @@ -46,6 +46,7 @@ #include #include #include +#include #ifdef CONFIG_8xx #include @@ -57,7 +58,6 @@ extern void alignment_exception(struct pt_regs *regs); extern void program_check_exception(struct pt_regs *regs); extern void single_step_exception(struct pt_regs *regs); -extern int do_signal(sigset_t *, struct pt_regs *); extern int pmac_newworld; extern int sys_sigreturn(struct pt_regs *regs); Index: working-2.6/arch/powerpc/kernel/signal_32.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/signal_32.c 2005-11-08 16:10:59.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/signal_32.c 2005-11-09 10:24:33.000000000 +1100 @@ -42,6 +42,7 @@ #include #include +#include #ifdef CONFIG_PPC64 #include "ppc32.h" #include -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/~dgibson

From linas at austin.ibm.com Wed Nov 9 11:30:48 2005
From: linas at austin.ibm.com (linas)
Date: Tue, 8 Nov 2005 18:30:48 -0600
Subject: typedefs and structs
In-Reply-To:
References: <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com>
Message-ID: <20051109003048.GK19593@austin.ibm.com>

On Tue, Nov 08, 2005 at 06:57:11PM -0500, Kyle Moffett was heard to remark:
> On Nov 8, 2005, at 18:23:27, linas wrote:
> >Off-topic: There's actually a neat little trick in C++ that can
> >help avoid accidentally passing null pointers. One can declare
> >function declarations as:
> >
> >  int func (struct blah &v) {
> >    v.a ++;
> >    return v.b;
> >  }
> >
> >The ampersand says "pass argument by reference (so as to get arg
> >passing efficiency) but force coder to write code as if they were
> >passing by value" As a result, it gets difficult to pass null
> >pointers (for reasons similar to the difficulty of passing null
> >pointers in Java (and yes, I loathe Java, sorry to subject you to
> >that)) Anyway, that's a C++ trick only; I wish it was in C so I
> >could experiment more and find out if I like it or hate it.
>
> That technique tends to cause more problems than it solves. If I
> write the following code:
>
> struct foo the_leftmost_foo = get_leftmost_foo();
> do_some_stuff(the_leftmost_foo);
>
> How do I know what it is going to do?

It depends on how do_some_stuff() was declared. If it's declared as do_some_stuff (struct foo &x) then it will be a pass by reference.
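
A minimal C++ sketch of that distinction follows. The struct `foo`, its fields, and the function names `bump_copy`, `bump_ref` and `demo` are all invented for illustration; none of them come from the thread or from kernel code. The two call sites look the same, and only the declarations decide whether the caller's object is modified.

```cpp
#include <cassert>

// Hypothetical struct, for illustration only.
struct foo {
    int a;
    int b;
};

// Pass by value: the callee works on a private copy.
int bump_copy(struct foo v)
{
    v.a++;              // changes the copy only
    return v.a;
}

// Pass by reference: the callee works on the caller's object.
int bump_ref(struct foo &v)
{
    v.a++;              // changes the caller's object
    return v.a;
}

// Identical-looking calls, different effect on the caller's data.
void demo()
{
    foo f = {1, 5};

    bump_copy(f);
    assert(f.a == 1);   // caller's object untouched

    bump_ref(f);
    assert(f.a == 2);   // caller's object modified
}
```

This is the behaviour both sides of the argument are describing: the reference version can modify the caller's structure exactly as a pointer version would, but nothing at the call site says so.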
> A much better solution is this:
>
> void do_some_stuff(struct foo *the_foo) __attribute__((__nonnull__(1)));

Think of it as "syntactic sugar": the compiler "does the right thing" without all the grungy extra markup such as __attribute__. (Remember that at the dawn of time, C++ was just a bunch of pre-processor markup that did nothing but hide grunge like __attribute__((whatever)) from the programmer. Only later did it become a language. Doing markup like what you're suggesting is only a tiny step away from inventing a new language, especially if you come up with some clever, unobtrusive markup for it.)

> That ensures that the first argument cannot be explicitly passed as
> null,

Well, this misses the point. No one intentionally passes null pointers. It's just that "shit happens".

Pass-by-reference changes your coding style. You tend to alloc on the stack instead of malloc. And then, since it's on the stack, you know it would be very wrong to keep a pointer to it, and so you don't; you design code differently. Usually, you discover you never really needed to hold a pointer to it anyway; you just did so out of some ingrained habit.

And since it's on the stack, you can't leak memory, and you don't need to reference count it. Far fewer mallocs and frees, so errors are less likely there. Better performance, and less memory fragmentation, for what that's worth.

I dunno, I did this once on a larger, year-long project, and rather liked it (I otherwise don't much like C++, since people tend to use it in bad, horrible ways). I won't say this is the greatest coding style in the world, but it does change the way you think about designing code, mostly for the better.
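
The two approaches argued over above can be put side by side in a short sketch. It is only an illustration under stated assumptions: the struct `stats` and the functions `record_hit_ptr`, `record_hit_ref` and `count_hits` are hypothetical names, and `__attribute__((nonnull))` is a GCC/Clang extension rather than standard C++.

```cpp
#include <cassert>

// Hypothetical counter type, for illustration only.
struct stats {
    int hits;
    int misses;
};

// Pointer version, annotated in the way the quoted mail suggests.
// The attribute lets the compiler warn about an argument it can see is
// null at compile time; a null pointer computed at run time still gets
// through.
void record_hit_ptr(struct stats *s) __attribute__((nonnull(1)));
void record_hit_ptr(struct stats *s)
{
    s->hits++;
}

// Reference version: there is no well-defined way to pass "no object".
void record_hit_ref(struct stats &s)
{
    s.hits++;
}

// The object lives on the caller's stack: no malloc, no free, nothing
// to leak, and no reason to keep a pointer to it after returning.
int count_hits()
{
    stats s = {0, 0};

    record_hit_ref(s);
    record_hit_ptr(&s);

    assert(s.hits == 2);
    return s.hits;
}
```

Both functions do the same work here; the difference is in what the caller is able to write, and in the stack-oriented style the reference version tends to encourage.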
--linas

From linas at austin.ibm.com Wed Nov 9 11:42:47 2005
From: linas at austin.ibm.com (linas)
Date: Tue, 8 Nov 2005 18:42:47 -0600
Subject: typedefs and structs
In-Reply-To: <1131495228.12797.67.camel@localhost>
References: <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051108235759.GA28271@localhost.localdomain> <1131495228.12797.67.camel@localhost>
Message-ID: <20051109004247.GL19593@austin.ibm.com>

On Tue, Nov 08, 2005 at 05:13:48PM -0700, Zan Lynx was heard to remark:
> On Wed, 2005-11-09 at 10:57 +1100, David Gibson wrote:
> >
> > I hate it: it obscures the fact that it's a pass-by-reference at the
> > callsite, which is useful information. Although this is, admittedly,
> > the least confusing use of C++ reference types.
>
> I agree with you about that one. It's yet another thing for C
> programmers to have to learn to watch for C++ doing behind your back.

I think you're rushing to judgement on something you've never tried. It fundamentally changes coding style; you'd have to try it on some mid-size project for at least a few months or longer to get into the mindset.

To make it all work, you also have to do other things, like avoiding mallocs and allocating on the stack instead, which forces major changes of style (because of the lifetime of things on the stack). If you don't change style to go with it, then you'll just end up in debug hell, in which case you'd be right: it would be a (very) bad idea.

(Disclaimer: I've moved away from C++ because of all the other opportunities for misuse that it offers and encourages.)
--linas

From linas at austin.ibm.com Wed Nov 9 11:48:08 2005
From: linas at austin.ibm.com (linas)
Date: Tue, 8 Nov 2005 18:48:08 -0600
Subject: typedefs and structs
In-Reply-To:
References: <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com>
Message-ID: <20051109004808.GM19593@austin.ibm.com>

On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was heard to remark:
>
> Yeah, but if you're trying to read that code, you have to go look up
> the declaration to figure out whether it might affect 'foo' or not.
> And if you get it wrong, you get silent data corruption.

No, that is not what "pass by reference" means. You are thinking of "const", maybe, or "pass by value"; this is neither. The arg is not declared const, the subroutine can (and usually will) modify the contents of the structure, and so the caller will be holding a modified structure when the callee returns (just like it would if a pointer was passed).

--linas

From david at gibson.dropbear.id.au Wed Nov 9 13:04:06 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Wed, 9 Nov 2005 13:04:06 +1100
Subject: powerpc: Merge current.h
Message-ID: <20051109020406.GG28271@localhost.localdomain>

This patch merges current.h. This is a one-big-ifdef merge, but both versions are so tiny, I think we can live with it. While we're at it, we get rid of the fairly pointless redirection through get_current() in the ppc64 version.

Built and booted on POWER5 LPAR (ARCH=powerpc & ARCH=ppc64). Built for 32-bit pmac (ARCH=powerpc & ARCH=ppc).

Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/current.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/current.h 2005-11-09 12:53:44.000000000 +1100 @@ -0,0 +1,27 @@ +#ifndef _ASM_POWERPC_CURRENT_H +#define _ASM_POWERPC_CURRENT_H + +/* + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +struct task_struct; + +#ifdef __powerpc64__ +#include + +#define current (get_paca()->__current) + +#else + +/* + * We keep `current' in r2 for speed. + */ +register struct task_struct *current asm ("r2"); + +#endif + +#endif /* _ASM_POWERPC_CURRENT_H */ Index: working-2.6/include/asm-ppc64/current.h =================================================================== --- working-2.6.orig/include/asm-ppc64/current.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,16 +0,0 @@ -#ifndef _PPC64_CURRENT_H -#define _PPC64_CURRENT_H - -#include - -/* - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#define get_current() (get_paca()->__current) -#define current get_current() - -#endif /* !(_PPC64_CURRENT_H) */ Index: working-2.6/include/asm-ppc/current.h =================================================================== --- working-2.6.orig/include/asm-ppc/current.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,11 +0,0 @@ -#ifdef __KERNEL__ -#ifndef _PPC_CURRENT_H -#define _PPC_CURRENT_H - -/* - * We keep `current' in r2 for speed. 
- */
-register struct task_struct *current asm ("r2");
-
-#endif /* !(_PPC_CURRENT_H) */
-#endif /* __KERNEL__ */

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
 | _way_ _around_!
http://www.ozlabs.org/~dgibson

From david at gibson.dropbear.id.au Wed Nov 9 13:38:01 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Wed, 9 Nov 2005 13:38:01 +1100
Subject: powerpc: Move various ppc64 files with no ppc32 equivalent to powerpc
Message-ID: <20051109023801.GI28271@localhost.localdomain>

This patch moves a bunch of files from arch/ppc64 and include/asm-ppc64 which have no equivalents in ppc32 code into arch/powerpc and include/asm-powerpc. The files affected are:
	abs_addr.h
	compat.h
	lppaca.h
	paca.h
	tce.h
	cpu_setup_power4.S
	ioctl32.c
	firmware.c
	pacaData.c

The only changes apart from the move and corresponding Makefile changes are:
	- #ifndef/#define in includes updated to _ASM_POWERPC_ form
	- trailing whitespace removed
	- comments giving full paths removed
	- pacaData.c renamed paca.c to remove studlyCaps
	- Misplaced { moved in lppaca.h

Built and booted on POWER5 LPAR (ARCH=powerpc and ARCH=ppc64), built for 32-bit powermac (ARCH=powerpc).

Signed-off-by: David Gibson

Index: working-2.6/include/asm-powerpc/lppaca.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/include/asm-powerpc/lppaca.h 2005-11-09 13:23:04.000000000 +1100
@@ -0,0 +1,131 @@
+/*
+ * lppaca.h
+ * Copyright (C) 2001 Mike Corrigan IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +#ifndef _ASM_POWERPC_LPPACA_H +#define _ASM_POWERPC_LPPACA_H + +//============================================================================= +// +// This control block contains the data that is shared between the +// hypervisor (PLIC) and the OS. +// +// +//---------------------------------------------------------------------------- +#include + +struct lppaca { +//============================================================================= +// CACHE_LINE_1 0x0000 - 0x007F Contains read-only data +// NOTE: The xDynXyz fields are fields that will be dynamically changed by +// PLIC when preparing to bring a processor online or when dispatching a +// virtual processor! +//============================================================================= + u32 desc; // Eye catcher 0xD397D781 x00-x03 + u16 size; // Size of this struct x04-x05 + u16 reserved1; // Reserved x06-x07 + u16 reserved2:14; // Reserved x08-x09 + u8 shared_proc:1; // Shared processor indicator ... + u8 secondary_thread:1; // Secondary thread indicator ... 
+ volatile u8 dyn_proc_status:8; // Dynamic Status of this proc x0A-x0A + u8 secondary_thread_count; // Secondary thread count x0B-x0B + volatile u16 dyn_hv_phys_proc_index;// Dynamic HV Physical Proc Index0C-x0D + volatile u16 dyn_hv_log_proc_index;// Dynamic HV Logical Proc Indexx0E-x0F + u32 decr_val; // Value for Decr programming x10-x13 + u32 pmc_val; // Value for PMC regs x14-x17 + volatile u32 dyn_hw_node_id; // Dynamic Hardware Node id x18-x1B + volatile u32 dyn_hw_proc_id; // Dynamic Hardware Proc Id x1C-x1F + volatile u32 dyn_pir; // Dynamic ProcIdReg value x20-x23 + u32 dsei_data; // DSEI data x24-x27 + u64 sprg3; // SPRG3 value x28-x2F + u8 reserved3[80]; // Reserved x30-x7F + +//============================================================================= +// CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data +//============================================================================= + // This Dword contains a byte for each type of interrupt that can occur. + // The IPI is a count while the others are just a binary 1 or 0. + union { + u64 any_int; + struct { + u16 reserved; // Reserved - cleared by #mpasmbl + u8 xirr_int; // Indicates xXirrValue is valid or Immed IO + u8 ipi_cnt; // IPI Count + u8 decr_int; // DECR interrupt occurred + u8 pdc_int; // PDC interrupt occurred + u8 quantum_int; // Interrupt quantum reached + u8 old_plic_deferred_ext_int; // Old PLIC has a deferred XIRR pending + } fields; + } int_dword; + + // Whenever any fields in this Dword are set then PLIC will defer the + // processing of external interrupts. Note that PLIC will store the + // XIRR directly into the xXirrValue field so that another XIRR will + // not be presented until this one clears. The layout of the low + // 4-bytes of this Dword is upto SLIC - PLIC just checks whether the + // entire Dword is zero or not. A non-zero value in the low order + // 2-bytes will result in SLIC being granted the highest thread + // priority upon return. 
A 0 will return to SLIC as medium priority. + u64 plic_defer_ints_area; // Entire Dword + + // Used to pass the real SRR0/1 from PLIC to SLIC as well as to + // pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid. + u64 saved_srr0; // Saved SRR0 x10-x17 + u64 saved_srr1; // Saved SRR1 x18-x1F + + // Used to pass parms from the OS to PLIC for SetAsrAndRfid + u64 saved_gpr3; // Saved GPR3 x20-x27 + u64 saved_gpr4; // Saved GPR4 x28-x2F + u64 saved_gpr5; // Saved GPR5 x30-x37 + + u8 reserved4; // Reserved x38-x38 + u8 cpuctls_task_attrs; // Task attributes for cpuctls x39-x39 + u8 fpregs_in_use; // FP regs in use x3A-x3A + u8 pmcregs_in_use; // PMC regs in use x3B-x3B + volatile u32 saved_decr; // Saved Decr Value x3C-x3F + volatile u64 emulated_time_base;// Emulated TB for this thread x40-x47 + volatile u64 cur_plic_latency; // Unaccounted PLIC latency x48-x4F + u64 tot_plic_latency; // Accumulated PLIC latency x50-x57 + u64 wait_state_cycles; // Wait cycles for this proc x58-x5F + u64 end_of_quantum; // TB at end of quantum x60-x67 + u64 pdc_saved_sprg1; // Saved SPRG1 for PMC int x68-x6F + u64 pdc_saved_srr0; // Saved SRR0 for PMC int x70-x77 + volatile u32 virtual_decr; // Virtual DECR for shared procsx78-x7B + u16 slb_count; // # of SLBs to maintain x7C-x7D + u8 idle; // Indicate OS is idle x7E + u8 vmxregs_in_use; // VMX registers in use x7F + + +//============================================================================= +// CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors +//============================================================================= + // This is the yield_count. An "odd" value (low bit on) means that + // the processor is yielded (either because of an OS yield or a PLIC + // preempt). An even value implies that the processor is currently + // executing. + // NOTE: This value will ALWAYS be zero for dedicated processors and + // will NEVER be zero for shared processors (ie, initialized to a 1). 
+ volatile u32 yield_count; // PLIC increments each dispatchx00-x03 + u8 reserved6[124]; // Reserved x04-x7F + +//============================================================================= +// CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data +//============================================================================= + u8 pmc_save_area[256]; // PMC interrupt Area x00-xFF +}; + +#endif /* _ASM_POWERPC_LPPACA_H */ Index: working-2.6/include/asm-powerpc/paca.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/paca.h 2005-11-09 13:23:04.000000000 +1100 @@ -0,0 +1,120 @@ +/* + * include/asm-powerpc/paca.h + * + * This control block defines the PACA which defines the processor + * specific data for each logical processor on the system. + * There are some pointers defined that are utilized by PLIC. + * + * C 2001 PPC 64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#ifndef _ASM_POWERPC_PACA_H +#define _ASM_POWERPC_PACA_H + +#include +#include +#include +#include +#include + +register struct paca_struct *local_paca asm("r13"); +#define get_paca() local_paca + +struct task_struct; + +/* + * Defines the layout of the paca. + * + * This structure is not directly accessed by firmware or the service + * processor except for the first two pointers that point to the + * lppaca area and the ItLpRegSave area for this CPU. Both the + * lppaca and ItLpRegSave objects are currently contained within the + * PACA but they do not need to be. 
+ */ +struct paca_struct { + /* + * Because hw_cpu_id, unlike other paca fields, is accessed + * routinely from other CPUs (from the IRQ code), we stick to + * read-only (after boot) fields in the first cacheline to + * avoid cacheline bouncing. + */ + + /* + * MAGIC: These first two pointers can't be moved - they're + * accessed by the firmware + */ + struct lppaca *lppaca_ptr; /* Pointer to LpPaca for PLIC */ + struct ItLpRegSave *reg_save_ptr; /* Pointer to LpRegSave for PLIC */ + + /* + * MAGIC: the spinlock functions in arch/ppc64/lib/locks.c + * load lock_token and paca_index with a single lwz + * instruction. They must travel together and be properly + * aligned. + */ + u16 lock_token; /* Constant 0x8000, used in locks */ + u16 paca_index; /* Logical processor number */ + + u32 default_decr; /* Default decrementer value */ + u64 kernel_toc; /* Kernel TOC address */ + u64 stab_real; /* Absolute address of segment table */ + u64 stab_addr; /* Virtual address of segment table */ + void *emergency_sp; /* pointer to emergency stack */ + s16 hw_cpu_id; /* Physical processor number */ + u8 cpu_start; /* At startup, processor spins until */ + /* this becomes non-zero. 
*/ + + /* + * Now, starting in cacheline 2, the exception save areas + */ + /* used for most interrupts/exceptions */ + u64 exgen[10] __attribute__((aligned(0x80))); + u64 exmc[10]; /* used for machine checks */ + u64 exslb[10]; /* used for SLB/segment table misses + * on the linear mapping */ +#ifdef CONFIG_PPC_64K_PAGES + pgd_t *pgdir; +#endif /* CONFIG_PPC_64K_PAGES */ + + mm_context_t context; + u16 slb_cache[SLB_CACHE_ENTRIES]; + u16 slb_cache_ptr; + + /* + * then miscellaneous read-write fields + */ + struct task_struct *__current; /* Pointer to current */ + u64 kstack; /* Saved Kernel stack addr */ + u64 stab_rr; /* stab/slb round-robin counter */ + u64 next_jiffy_update_tb; /* TB value for next jiffy update */ + u64 saved_r1; /* r1 save for RTAS calls */ + u64 saved_msr; /* MSR saved here by enter_rtas */ + u8 proc_enabled; /* irq soft-enable flag */ + + /* not yet used */ + u64 exdsi[8]; /* used for linear mapping hash table misses */ + + /* + * iSeries structure which the hypervisor knows about - + * this structure should not cross a page boundary. + * The vpa_init/register_vpa call is now known to fail if the + * lppaca structure crosses a page boundary. + * The lppaca is also used on POWER5 pSeries boxes. + * The lppaca is 640 bytes long, and cannot readily change + * since the hypervisor knows its layout, so a 1kB + * alignment will suffice to ensure that it doesn't + * cross a page boundary. 
+ */ + struct lppaca lppaca __attribute__((__aligned__(0x400))); +#ifdef CONFIG_PPC_ISERIES + struct ItLpRegSave reg_save; +#endif +}; + +extern struct paca_struct paca[]; + +#endif /* _ASM_POWERPC_PACA_H */ Index: working-2.6/include/asm-ppc64/lppaca.h =================================================================== --- working-2.6.orig/include/asm-ppc64/lppaca.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,132 +0,0 @@ -/* - * lppaca.h - * Copyright (C) 2001 Mike Corrigan IBM Corporation - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ -#ifndef _ASM_LPPACA_H -#define _ASM_LPPACA_H - -//============================================================================= -// -// This control block contains the data that is shared between the -// hypervisor (PLIC) and the OS. -// -// -//---------------------------------------------------------------------------- -#include - -struct lppaca -{ -//============================================================================= -// CACHE_LINE_1 0x0000 - 0x007F Contains read-only data -// NOTE: The xDynXyz fields are fields that will be dynamically changed by -// PLIC when preparing to bring a processor online or when dispatching a -// virtual processor! 
-//============================================================================= - u32 desc; // Eye catcher 0xD397D781 x00-x03 - u16 size; // Size of this struct x04-x05 - u16 reserved1; // Reserved x06-x07 - u16 reserved2:14; // Reserved x08-x09 - u8 shared_proc:1; // Shared processor indicator ... - u8 secondary_thread:1; // Secondary thread indicator ... - volatile u8 dyn_proc_status:8; // Dynamic Status of this proc x0A-x0A - u8 secondary_thread_count; // Secondary thread count x0B-x0B - volatile u16 dyn_hv_phys_proc_index;// Dynamic HV Physical Proc Index0C-x0D - volatile u16 dyn_hv_log_proc_index;// Dynamic HV Logical Proc Indexx0E-x0F - u32 decr_val; // Value for Decr programming x10-x13 - u32 pmc_val; // Value for PMC regs x14-x17 - volatile u32 dyn_hw_node_id; // Dynamic Hardware Node id x18-x1B - volatile u32 dyn_hw_proc_id; // Dynamic Hardware Proc Id x1C-x1F - volatile u32 dyn_pir; // Dynamic ProcIdReg value x20-x23 - u32 dsei_data; // DSEI data x24-x27 - u64 sprg3; // SPRG3 value x28-x2F - u8 reserved3[80]; // Reserved x30-x7F - -//============================================================================= -// CACHE_LINE_2 0x0080 - 0x00FF Contains local read-write data -//============================================================================= - // This Dword contains a byte for each type of interrupt that can occur. - // The IPI is a count while the others are just a binary 1 or 0. - union { - u64 any_int; - struct { - u16 reserved; // Reserved - cleared by #mpasmbl - u8 xirr_int; // Indicates xXirrValue is valid or Immed IO - u8 ipi_cnt; // IPI Count - u8 decr_int; // DECR interrupt occurred - u8 pdc_int; // PDC interrupt occurred - u8 quantum_int; // Interrupt quantum reached - u8 old_plic_deferred_ext_int; // Old PLIC has a deferred XIRR pending - } fields; - } int_dword; - - // Whenever any fields in this Dword are set then PLIC will defer the - // processing of external interrupts. 
Note that PLIC will store the - // XIRR directly into the xXirrValue field so that another XIRR will - // not be presented until this one clears. The layout of the low - // 4-bytes of this Dword is upto SLIC - PLIC just checks whether the - // entire Dword is zero or not. A non-zero value in the low order - // 2-bytes will result in SLIC being granted the highest thread - // priority upon return. A 0 will return to SLIC as medium priority. - u64 plic_defer_ints_area; // Entire Dword - - // Used to pass the real SRR0/1 from PLIC to SLIC as well as to - // pass the target SRR0/1 from SLIC to PLIC on a SetAsrAndRfid. - u64 saved_srr0; // Saved SRR0 x10-x17 - u64 saved_srr1; // Saved SRR1 x18-x1F - - // Used to pass parms from the OS to PLIC for SetAsrAndRfid - u64 saved_gpr3; // Saved GPR3 x20-x27 - u64 saved_gpr4; // Saved GPR4 x28-x2F - u64 saved_gpr5; // Saved GPR5 x30-x37 - - u8 reserved4; // Reserved x38-x38 - u8 cpuctls_task_attrs; // Task attributes for cpuctls x39-x39 - u8 fpregs_in_use; // FP regs in use x3A-x3A - u8 pmcregs_in_use; // PMC regs in use x3B-x3B - volatile u32 saved_decr; // Saved Decr Value x3C-x3F - volatile u64 emulated_time_base;// Emulated TB for this thread x40-x47 - volatile u64 cur_plic_latency; // Unaccounted PLIC latency x48-x4F - u64 tot_plic_latency; // Accumulated PLIC latency x50-x57 - u64 wait_state_cycles; // Wait cycles for this proc x58-x5F - u64 end_of_quantum; // TB at end of quantum x60-x67 - u64 pdc_saved_sprg1; // Saved SPRG1 for PMC int x68-x6F - u64 pdc_saved_srr0; // Saved SRR0 for PMC int x70-x77 - volatile u32 virtual_decr; // Virtual DECR for shared procsx78-x7B - u16 slb_count; // # of SLBs to maintain x7C-x7D - u8 idle; // Indicate OS is idle x7E - u8 vmxregs_in_use; // VMX registers in use x7F - - -//============================================================================= -// CACHE_LINE_3 0x0100 - 0x007F: This line is shared with other processors 
-//============================================================================= - // This is the yield_count. An "odd" value (low bit on) means that - // the processor is yielded (either because of an OS yield or a PLIC - // preempt). An even value implies that the processor is currently - // executing. - // NOTE: This value will ALWAYS be zero for dedicated processors and - // will NEVER be zero for shared processors (ie, initialized to a 1). - volatile u32 yield_count; // PLIC increments each dispatchx00-x03 - u8 reserved6[124]; // Reserved x04-x7F - -//============================================================================= -// CACHE_LINE_4-5 0x0100 - 0x01FF Contains PMC interrupt data -//============================================================================= - u8 pmc_save_area[256]; // PMC interrupt Area x00-xFF -}; - -#endif /* _ASM_LPPACA_H */ Index: working-2.6/include/asm-ppc64/paca.h =================================================================== --- working-2.6.orig/include/asm-ppc64/paca.h 2005-11-08 10:57:23.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,121 +0,0 @@ -#ifndef _PPC64_PACA_H -#define _PPC64_PACA_H - -/* - * include/asm-ppc64/paca.h - * - * This control block defines the PACA which defines the processor - * specific data for each logical processor on the system. - * There are some pointers defined that are utilized by PLIC. - * - * C 2001 PPC 64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include -#include -#include -#include -#include - -register struct paca_struct *local_paca asm("r13"); -#define get_paca() local_paca - -struct task_struct; - -/* - * Defines the layout of the paca. 
- * - * This structure is not directly accessed by firmware or the service - * processor except for the first two pointers that point to the - * lppaca area and the ItLpRegSave area for this CPU. Both the - * lppaca and ItLpRegSave objects are currently contained within the - * PACA but they do not need to be. - */ -struct paca_struct { - /* - * Because hw_cpu_id, unlike other paca fields, is accessed - * routinely from other CPUs (from the IRQ code), we stick to - * read-only (after boot) fields in the first cacheline to - * avoid cacheline bouncing. - */ - - /* - * MAGIC: These first two pointers can't be moved - they're - * accessed by the firmware - */ - struct lppaca *lppaca_ptr; /* Pointer to LpPaca for PLIC */ - struct ItLpRegSave *reg_save_ptr; /* Pointer to LpRegSave for PLIC */ - - /* - * MAGIC: the spinlock functions in arch/ppc64/lib/locks.c - * load lock_token and paca_index with a single lwz - * instruction. They must travel together and be properly - * aligned. - */ - u16 lock_token; /* Constant 0x8000, used in locks */ - u16 paca_index; /* Logical processor number */ - - u32 default_decr; /* Default decrementer value */ - u64 kernel_toc; /* Kernel TOC address */ - u64 stab_real; /* Absolute address of segment table */ - u64 stab_addr; /* Virtual address of segment table */ - void *emergency_sp; /* pointer to emergency stack */ - s16 hw_cpu_id; /* Physical processor number */ - u8 cpu_start; /* At startup, processor spins until */ - /* this becomes non-zero. 
*/ - - /* - * Now, starting in cacheline 2, the exception save areas - */ - /* used for most interrupts/exceptions */ - u64 exgen[10] __attribute__((aligned(0x80))); - u64 exmc[10]; /* used for machine checks */ - u64 exslb[10]; /* used for SLB/segment table misses - * on the linear mapping */ -#ifdef CONFIG_PPC_64K_PAGES - pgd_t *pgdir; -#endif /* CONFIG_PPC_64K_PAGES */ - - mm_context_t context; - u16 slb_cache[SLB_CACHE_ENTRIES]; - u16 slb_cache_ptr; - - /* - * then miscellaneous read-write fields - */ - struct task_struct *__current; /* Pointer to current */ - u64 kstack; /* Saved Kernel stack addr */ - u64 stab_rr; /* stab/slb round-robin counter */ - u64 next_jiffy_update_tb; /* TB value for next jiffy update */ - u64 saved_r1; /* r1 save for RTAS calls */ - u64 saved_msr; /* MSR saved here by enter_rtas */ - u8 proc_enabled; /* irq soft-enable flag */ - - /* not yet used */ - u64 exdsi[8]; /* used for linear mapping hash table misses */ - - /* - * iSeries structure which the hypervisor knows about - - * this structure should not cross a page boundary. - * The vpa_init/register_vpa call is now known to fail if the - * lppaca structure crosses a page boundary. - * The lppaca is also used on POWER5 pSeries boxes. - * The lppaca is 640 bytes long, and cannot readily change - * since the hypervisor knows its layout, so a 1kB - * alignment will suffice to ensure that it doesn't - * cross a page boundary. 
- */ - struct lppaca lppaca __attribute__((__aligned__(0x400))); -#ifdef CONFIG_PPC_ISERIES - struct ItLpRegSave reg_save; -#endif -}; - -extern struct paca_struct paca[]; - -#endif /* _PPC64_PACA_H */ Index: working-2.6/include/asm-powerpc/compat.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/compat.h 2005-11-09 13:23:04.000000000 +1100 @@ -0,0 +1,205 @@ +#ifndef _ASM_POWERPC_COMPAT_H +#define _ASM_POWERPC_COMPAT_H +/* + * Architecture specific compatibility types + */ +#include +#include + +#define COMPAT_USER_HZ 100 + +typedef u32 compat_size_t; +typedef s32 compat_ssize_t; +typedef s32 compat_time_t; +typedef s32 compat_clock_t; +typedef s32 compat_pid_t; +typedef u32 __compat_uid_t; +typedef u32 __compat_gid_t; +typedef u32 __compat_uid32_t; +typedef u32 __compat_gid32_t; +typedef u32 compat_mode_t; +typedef u32 compat_ino_t; +typedef u32 compat_dev_t; +typedef s32 compat_off_t; +typedef s64 compat_loff_t; +typedef s16 compat_nlink_t; +typedef u16 compat_ipc_pid_t; +typedef s32 compat_daddr_t; +typedef u32 compat_caddr_t; +typedef __kernel_fsid_t compat_fsid_t; +typedef s32 compat_key_t; +typedef s32 compat_timer_t; + +typedef s32 compat_int_t; +typedef s32 compat_long_t; +typedef u32 compat_uint_t; +typedef u32 compat_ulong_t; + +struct compat_timespec { + compat_time_t tv_sec; + s32 tv_nsec; +}; + +struct compat_timeval { + compat_time_t tv_sec; + s32 tv_usec; +}; + +struct compat_stat { + compat_dev_t st_dev; + compat_ino_t st_ino; + compat_mode_t st_mode; + compat_nlink_t st_nlink; + __compat_uid32_t st_uid; + __compat_gid32_t st_gid; + compat_dev_t st_rdev; + compat_off_t st_size; + compat_off_t st_blksize; + compat_off_t st_blocks; + compat_time_t st_atime; + u32 st_atime_nsec; + compat_time_t st_mtime; + u32 st_mtime_nsec; + compat_time_t st_ctime; + u32 st_ctime_nsec; + u32 __unused4[2]; +}; + +struct compat_flock { + short l_type; + short 
l_whence; + compat_off_t l_start; + compat_off_t l_len; + compat_pid_t l_pid; +}; + +#define F_GETLK64 12 /* using 'struct flock64' */ +#define F_SETLK64 13 +#define F_SETLKW64 14 + +struct compat_flock64 { + short l_type; + short l_whence; + compat_loff_t l_start; + compat_loff_t l_len; + compat_pid_t l_pid; +}; + +struct compat_statfs { + int f_type; + int f_bsize; + int f_blocks; + int f_bfree; + int f_bavail; + int f_files; + int f_ffree; + compat_fsid_t f_fsid; + int f_namelen; /* SunOS ignores this field. */ + int f_frsize; + int f_spare[5]; +}; + +#define COMPAT_RLIM_OLD_INFINITY 0x7fffffff +#define COMPAT_RLIM_INFINITY 0xffffffff + +typedef u32 compat_old_sigset_t; + +#define _COMPAT_NSIG 64 +#define _COMPAT_NSIG_BPW 32 + +typedef u32 compat_sigset_word; + +#define COMPAT_OFF_T_MAX 0x7fffffff +#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL + +/* + * A pointer passed in from user mode. This should not + * be used for syscall parameters, just declare them + * as pointers because the syscall entry code will have + * appropriately comverted them already. + */ +typedef u32 compat_uptr_t; + +static inline void __user *compat_ptr(compat_uptr_t uptr) +{ + return (void __user *)(unsigned long)uptr; +} + +static inline void __user *compat_alloc_user_space(long len) +{ + struct pt_regs *regs = current->thread.regs; + unsigned long usp = regs->gpr[1]; + + /* + * We cant access below the stack pointer in the 32bit ABI and + * can access 288 bytes in the 64bit ABI + */ + if (!(test_thread_flag(TIF_32BIT))) + usp -= 288; + + return (void __user *) (usp - len); +} + +/* + * ipc64_perm is actually 32/64bit clean but since the compat layer refers to + * it we may as well define it. 
+ */ +struct compat_ipc64_perm { + compat_key_t key; + __compat_uid_t uid; + __compat_gid_t gid; + __compat_uid_t cuid; + __compat_gid_t cgid; + compat_mode_t mode; + unsigned int seq; + unsigned int __pad2; + unsigned long __unused1; /* yes they really are 64bit pads */ + unsigned long __unused2; +}; + +struct compat_semid64_ds { + struct compat_ipc64_perm sem_perm; + unsigned int __unused1; + compat_time_t sem_otime; + unsigned int __unused2; + compat_time_t sem_ctime; + compat_ulong_t sem_nsems; + compat_ulong_t __unused3; + compat_ulong_t __unused4; +}; + +struct compat_msqid64_ds { + struct compat_ipc64_perm msg_perm; + unsigned int __unused1; + compat_time_t msg_stime; + unsigned int __unused2; + compat_time_t msg_rtime; + unsigned int __unused3; + compat_time_t msg_ctime; + compat_ulong_t msg_cbytes; + compat_ulong_t msg_qnum; + compat_ulong_t msg_qbytes; + compat_pid_t msg_lspid; + compat_pid_t msg_lrpid; + compat_ulong_t __unused4; + compat_ulong_t __unused5; +}; + +struct compat_shmid64_ds { + struct compat_ipc64_perm shm_perm; + unsigned int __unused1; + compat_time_t shm_atime; + unsigned int __unused2; + compat_time_t shm_dtime; + unsigned int __unused3; + compat_time_t shm_ctime; + unsigned int __unused4; + compat_size_t shm_segsz; + compat_pid_t shm_cpid; + compat_pid_t shm_lpid; + compat_ulong_t shm_nattch; + compat_ulong_t __unused5; + compat_ulong_t __unused6; +}; + +#endif /* _ASM_POWERPC_COMPAT_H */ Index: working-2.6/include/asm-ppc64/compat.h =================================================================== --- working-2.6.orig/include/asm-ppc64/compat.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,205 +0,0 @@ -#ifndef _ASM_PPC64_COMPAT_H -#define _ASM_PPC64_COMPAT_H -/* - * Architecture specific compatibility types - */ -#include -#include - -#define COMPAT_USER_HZ 100 - -typedef u32 compat_size_t; -typedef s32 compat_ssize_t; -typedef s32 compat_time_t; -typedef s32 compat_clock_t; -typedef 
s32 compat_pid_t; -typedef u32 __compat_uid_t; -typedef u32 __compat_gid_t; -typedef u32 __compat_uid32_t; -typedef u32 __compat_gid32_t; -typedef u32 compat_mode_t; -typedef u32 compat_ino_t; -typedef u32 compat_dev_t; -typedef s32 compat_off_t; -typedef s64 compat_loff_t; -typedef s16 compat_nlink_t; -typedef u16 compat_ipc_pid_t; -typedef s32 compat_daddr_t; -typedef u32 compat_caddr_t; -typedef __kernel_fsid_t compat_fsid_t; -typedef s32 compat_key_t; -typedef s32 compat_timer_t; - -typedef s32 compat_int_t; -typedef s32 compat_long_t; -typedef u32 compat_uint_t; -typedef u32 compat_ulong_t; - -struct compat_timespec { - compat_time_t tv_sec; - s32 tv_nsec; -}; - -struct compat_timeval { - compat_time_t tv_sec; - s32 tv_usec; -}; - -struct compat_stat { - compat_dev_t st_dev; - compat_ino_t st_ino; - compat_mode_t st_mode; - compat_nlink_t st_nlink; - __compat_uid32_t st_uid; - __compat_gid32_t st_gid; - compat_dev_t st_rdev; - compat_off_t st_size; - compat_off_t st_blksize; - compat_off_t st_blocks; - compat_time_t st_atime; - u32 st_atime_nsec; - compat_time_t st_mtime; - u32 st_mtime_nsec; - compat_time_t st_ctime; - u32 st_ctime_nsec; - u32 __unused4[2]; -}; - -struct compat_flock { - short l_type; - short l_whence; - compat_off_t l_start; - compat_off_t l_len; - compat_pid_t l_pid; -}; - -#define F_GETLK64 12 /* using 'struct flock64' */ -#define F_SETLK64 13 -#define F_SETLKW64 14 - -struct compat_flock64 { - short l_type; - short l_whence; - compat_loff_t l_start; - compat_loff_t l_len; - compat_pid_t l_pid; -}; - -struct compat_statfs { - int f_type; - int f_bsize; - int f_blocks; - int f_bfree; - int f_bavail; - int f_files; - int f_ffree; - compat_fsid_t f_fsid; - int f_namelen; /* SunOS ignores this field. 
*/ - int f_frsize; - int f_spare[5]; -}; - -#define COMPAT_RLIM_OLD_INFINITY 0x7fffffff -#define COMPAT_RLIM_INFINITY 0xffffffff - -typedef u32 compat_old_sigset_t; - -#define _COMPAT_NSIG 64 -#define _COMPAT_NSIG_BPW 32 - -typedef u32 compat_sigset_word; - -#define COMPAT_OFF_T_MAX 0x7fffffff -#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL - -/* - * A pointer passed in from user mode. This should not - * be used for syscall parameters, just declare them - * as pointers because the syscall entry code will have - * appropriately comverted them already. - */ -typedef u32 compat_uptr_t; - -static inline void __user *compat_ptr(compat_uptr_t uptr) -{ - return (void __user *)(unsigned long)uptr; -} - -static inline void __user *compat_alloc_user_space(long len) -{ - struct pt_regs *regs = current->thread.regs; - unsigned long usp = regs->gpr[1]; - - /* - * We cant access below the stack pointer in the 32bit ABI and - * can access 288 bytes in the 64bit ABI - */ - if (!(test_thread_flag(TIF_32BIT))) - usp -= 288; - - return (void __user *) (usp - len); -} - -/* - * ipc64_perm is actually 32/64bit clean but since the compat layer refers to - * it we may as well define it. 
- */ -struct compat_ipc64_perm { - compat_key_t key; - __compat_uid_t uid; - __compat_gid_t gid; - __compat_uid_t cuid; - __compat_gid_t cgid; - compat_mode_t mode; - unsigned int seq; - unsigned int __pad2; - unsigned long __unused1; /* yes they really are 64bit pads */ - unsigned long __unused2; -}; - -struct compat_semid64_ds { - struct compat_ipc64_perm sem_perm; - unsigned int __unused1; - compat_time_t sem_otime; - unsigned int __unused2; - compat_time_t sem_ctime; - compat_ulong_t sem_nsems; - compat_ulong_t __unused3; - compat_ulong_t __unused4; -}; - -struct compat_msqid64_ds { - struct compat_ipc64_perm msg_perm; - unsigned int __unused1; - compat_time_t msg_stime; - unsigned int __unused2; - compat_time_t msg_rtime; - unsigned int __unused3; - compat_time_t msg_ctime; - compat_ulong_t msg_cbytes; - compat_ulong_t msg_qnum; - compat_ulong_t msg_qbytes; - compat_pid_t msg_lspid; - compat_pid_t msg_lrpid; - compat_ulong_t __unused4; - compat_ulong_t __unused5; -}; - -struct compat_shmid64_ds { - struct compat_ipc64_perm shm_perm; - unsigned int __unused1; - compat_time_t shm_atime; - unsigned int __unused2; - compat_time_t shm_dtime; - unsigned int __unused3; - compat_time_t shm_ctime; - unsigned int __unused4; - compat_size_t shm_segsz; - compat_pid_t shm_cpid; - compat_pid_t shm_lpid; - compat_ulong_t shm_nattch; - compat_ulong_t __unused5; - compat_ulong_t __unused6; -}; - -#endif /* _ASM_PPC64_COMPAT_H */ Index: working-2.6/arch/powerpc/kernel/paca.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/kernel/paca.c 2005-11-09 13:23:04.000000000 +1100 @@ -0,0 +1,143 @@ +/* + * c 2001 PPC 64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include + +#include +#include +#include + +#include +#include +#include + +static union { + struct systemcfg data; + u8 page[PAGE_SIZE]; +} systemcfg_store __attribute__((__section__(".data.page.aligned"))); +struct systemcfg *systemcfg = &systemcfg_store.data; +EXPORT_SYMBOL(systemcfg); + + +/* This symbol is provided by the linker - let it fill in the paca + * field correctly */ +extern unsigned long __toc_start; + +/* The Paca is an array with one entry per processor. Each contains an + * lppaca, which contains the information shared between the + * hypervisor and Linux. Each also contains an ItLpRegSave area which + * is used by the hypervisor to save registers. + * On systems with hardware multi-threading, there are two threads + * per processor. The Paca array must contain an entry for each thread. + * The VPD Areas will give a max logical processors = 2 * max physical + * processors. The processor VPD array needs one entry per physical + * processor (not thread). 
+ */ +#define PACA_INIT_COMMON(number, start, asrr, asrv) \ + .lock_token = 0x8000, \ + .paca_index = (number), /* Paca Index */ \ + .default_decr = 0x00ff0000, /* Initial Decr */ \ + .kernel_toc = (unsigned long)(&__toc_start) + 0x8000UL, \ + .stab_real = (asrr), /* Real pointer to segment table */ \ + .stab_addr = (asrv), /* Virt pointer to segment table */ \ + .cpu_start = (start), /* Processor start */ \ + .hw_cpu_id = 0xffff, \ + .lppaca = { \ + .desc = 0xd397d781, /* "LpPa" */ \ + .size = sizeof(struct lppaca), \ + .dyn_proc_status = 2, \ + .decr_val = 0x00ff0000, \ + .fpregs_in_use = 1, \ + .end_of_quantum = 0xfffffffffffffffful, \ + .slb_count = 64, \ + .vmxregs_in_use = 0, \ + }, \ + +#ifdef CONFIG_PPC_ISERIES +#define PACA_INIT_ISERIES(number) \ + .lppaca_ptr = &paca[number].lppaca, \ + .reg_save_ptr = &paca[number].reg_save, \ + .reg_save = { \ + .xDesc = 0xd397d9e2, /* "LpRS" */ \ + .xSize = sizeof(struct ItLpRegSave) \ + } + +#define PACA_INIT(number) \ +{ \ + PACA_INIT_COMMON(number, 0, 0, 0) \ + PACA_INIT_ISERIES(number) \ +} + +#define BOOTCPU_PACA_INIT(number) \ +{ \ + PACA_INIT_COMMON(number, 1, 0, (u64)&initial_stab) \ + PACA_INIT_ISERIES(number) \ +} + +#else +#define PACA_INIT(number) \ +{ \ + PACA_INIT_COMMON(number, 0, 0, 0) \ +} + +#define BOOTCPU_PACA_INIT(number) \ +{ \ + PACA_INIT_COMMON(number, 1, STAB0_PHYS_ADDR, (u64)&initial_stab) \ +} +#endif + +struct paca_struct paca[] = { + BOOTCPU_PACA_INIT(0), +#if NR_CPUS > 1 + PACA_INIT( 1), PACA_INIT( 2), PACA_INIT( 3), +#if NR_CPUS > 4 + PACA_INIT( 4), PACA_INIT( 5), PACA_INIT( 6), PACA_INIT( 7), +#if NR_CPUS > 8 + PACA_INIT( 8), PACA_INIT( 9), PACA_INIT( 10), PACA_INIT( 11), + PACA_INIT( 12), PACA_INIT( 13), PACA_INIT( 14), PACA_INIT( 15), + PACA_INIT( 16), PACA_INIT( 17), PACA_INIT( 18), PACA_INIT( 19), + PACA_INIT( 20), PACA_INIT( 21), PACA_INIT( 22), PACA_INIT( 23), + PACA_INIT( 24), PACA_INIT( 25), PACA_INIT( 26), PACA_INIT( 27), + PACA_INIT( 28), PACA_INIT( 29), PACA_INIT( 30), 
PACA_INIT( 31), +#if NR_CPUS > 32 + PACA_INIT( 32), PACA_INIT( 33), PACA_INIT( 34), PACA_INIT( 35), + PACA_INIT( 36), PACA_INIT( 37), PACA_INIT( 38), PACA_INIT( 39), + PACA_INIT( 40), PACA_INIT( 41), PACA_INIT( 42), PACA_INIT( 43), + PACA_INIT( 44), PACA_INIT( 45), PACA_INIT( 46), PACA_INIT( 47), + PACA_INIT( 48), PACA_INIT( 49), PACA_INIT( 50), PACA_INIT( 51), + PACA_INIT( 52), PACA_INIT( 53), PACA_INIT( 54), PACA_INIT( 55), + PACA_INIT( 56), PACA_INIT( 57), PACA_INIT( 58), PACA_INIT( 59), + PACA_INIT( 60), PACA_INIT( 61), PACA_INIT( 62), PACA_INIT( 63), +#if NR_CPUS > 64 + PACA_INIT( 64), PACA_INIT( 65), PACA_INIT( 66), PACA_INIT( 67), + PACA_INIT( 68), PACA_INIT( 69), PACA_INIT( 70), PACA_INIT( 71), + PACA_INIT( 72), PACA_INIT( 73), PACA_INIT( 74), PACA_INIT( 75), + PACA_INIT( 76), PACA_INIT( 77), PACA_INIT( 78), PACA_INIT( 79), + PACA_INIT( 80), PACA_INIT( 81), PACA_INIT( 82), PACA_INIT( 83), + PACA_INIT( 84), PACA_INIT( 85), PACA_INIT( 86), PACA_INIT( 87), + PACA_INIT( 88), PACA_INIT( 89), PACA_INIT( 90), PACA_INIT( 91), + PACA_INIT( 92), PACA_INIT( 93), PACA_INIT( 94), PACA_INIT( 95), + PACA_INIT( 96), PACA_INIT( 97), PACA_INIT( 98), PACA_INIT( 99), + PACA_INIT(100), PACA_INIT(101), PACA_INIT(102), PACA_INIT(103), + PACA_INIT(104), PACA_INIT(105), PACA_INIT(106), PACA_INIT(107), + PACA_INIT(108), PACA_INIT(109), PACA_INIT(110), PACA_INIT(111), + PACA_INIT(112), PACA_INIT(113), PACA_INIT(114), PACA_INIT(115), + PACA_INIT(116), PACA_INIT(117), PACA_INIT(118), PACA_INIT(119), + PACA_INIT(120), PACA_INIT(121), PACA_INIT(122), PACA_INIT(123), + PACA_INIT(124), PACA_INIT(125), PACA_INIT(126), PACA_INIT(127), +#endif +#endif +#endif +#endif +#endif +}; +EXPORT_SYMBOL(paca); Index: working-2.6/arch/ppc64/kernel/pacaData.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/pacaData.c 2005-11-08 10:57:14.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,143 +0,0 @@ -/* - * c 2001 PPC 64 Team, 
IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include -#include -#include -#include - -#include -#include -#include - -#include -#include -#include - -static union { - struct systemcfg data; - u8 page[PAGE_SIZE]; -} systemcfg_store __attribute__((__section__(".data.page.aligned"))); -struct systemcfg *systemcfg = &systemcfg_store.data; -EXPORT_SYMBOL(systemcfg); - - -/* This symbol is provided by the linker - let it fill in the paca - * field correctly */ -extern unsigned long __toc_start; - -/* The Paca is an array with one entry per processor. Each contains an - * lppaca, which contains the information shared between the - * hypervisor and Linux. Each also contains an ItLpRegSave area which - * is used by the hypervisor to save registers. - * On systems with hardware multi-threading, there are two threads - * per processor. The Paca array must contain an entry for each thread. - * The VPD Areas will give a max logical processors = 2 * max physical - * processors. The processor VPD array needs one entry per physical - * processor (not thread). 
- */ -#define PACA_INIT_COMMON(number, start, asrr, asrv) \ - .lock_token = 0x8000, \ - .paca_index = (number), /* Paca Index */ \ - .default_decr = 0x00ff0000, /* Initial Decr */ \ - .kernel_toc = (unsigned long)(&__toc_start) + 0x8000UL, \ - .stab_real = (asrr), /* Real pointer to segment table */ \ - .stab_addr = (asrv), /* Virt pointer to segment table */ \ - .cpu_start = (start), /* Processor start */ \ - .hw_cpu_id = 0xffff, \ - .lppaca = { \ - .desc = 0xd397d781, /* "LpPa" */ \ - .size = sizeof(struct lppaca), \ - .dyn_proc_status = 2, \ - .decr_val = 0x00ff0000, \ - .fpregs_in_use = 1, \ - .end_of_quantum = 0xfffffffffffffffful, \ - .slb_count = 64, \ - .vmxregs_in_use = 0, \ - }, \ - -#ifdef CONFIG_PPC_ISERIES -#define PACA_INIT_ISERIES(number) \ - .lppaca_ptr = &paca[number].lppaca, \ - .reg_save_ptr = &paca[number].reg_save, \ - .reg_save = { \ - .xDesc = 0xd397d9e2, /* "LpRS" */ \ - .xSize = sizeof(struct ItLpRegSave) \ - } - -#define PACA_INIT(number) \ -{ \ - PACA_INIT_COMMON(number, 0, 0, 0) \ - PACA_INIT_ISERIES(number) \ -} - -#define BOOTCPU_PACA_INIT(number) \ -{ \ - PACA_INIT_COMMON(number, 1, 0, (u64)&initial_stab) \ - PACA_INIT_ISERIES(number) \ -} - -#else -#define PACA_INIT(number) \ -{ \ - PACA_INIT_COMMON(number, 0, 0, 0) \ -} - -#define BOOTCPU_PACA_INIT(number) \ -{ \ - PACA_INIT_COMMON(number, 1, STAB0_PHYS_ADDR, (u64)&initial_stab) \ -} -#endif - -struct paca_struct paca[] = { - BOOTCPU_PACA_INIT(0), -#if NR_CPUS > 1 - PACA_INIT( 1), PACA_INIT( 2), PACA_INIT( 3), -#if NR_CPUS > 4 - PACA_INIT( 4), PACA_INIT( 5), PACA_INIT( 6), PACA_INIT( 7), -#if NR_CPUS > 8 - PACA_INIT( 8), PACA_INIT( 9), PACA_INIT( 10), PACA_INIT( 11), - PACA_INIT( 12), PACA_INIT( 13), PACA_INIT( 14), PACA_INIT( 15), - PACA_INIT( 16), PACA_INIT( 17), PACA_INIT( 18), PACA_INIT( 19), - PACA_INIT( 20), PACA_INIT( 21), PACA_INIT( 22), PACA_INIT( 23), - PACA_INIT( 24), PACA_INIT( 25), PACA_INIT( 26), PACA_INIT( 27), - PACA_INIT( 28), PACA_INIT( 29), PACA_INIT( 30), 
PACA_INIT( 31), -#if NR_CPUS > 32 - PACA_INIT( 32), PACA_INIT( 33), PACA_INIT( 34), PACA_INIT( 35), - PACA_INIT( 36), PACA_INIT( 37), PACA_INIT( 38), PACA_INIT( 39), - PACA_INIT( 40), PACA_INIT( 41), PACA_INIT( 42), PACA_INIT( 43), - PACA_INIT( 44), PACA_INIT( 45), PACA_INIT( 46), PACA_INIT( 47), - PACA_INIT( 48), PACA_INIT( 49), PACA_INIT( 50), PACA_INIT( 51), - PACA_INIT( 52), PACA_INIT( 53), PACA_INIT( 54), PACA_INIT( 55), - PACA_INIT( 56), PACA_INIT( 57), PACA_INIT( 58), PACA_INIT( 59), - PACA_INIT( 60), PACA_INIT( 61), PACA_INIT( 62), PACA_INIT( 63), -#if NR_CPUS > 64 - PACA_INIT( 64), PACA_INIT( 65), PACA_INIT( 66), PACA_INIT( 67), - PACA_INIT( 68), PACA_INIT( 69), PACA_INIT( 70), PACA_INIT( 71), - PACA_INIT( 72), PACA_INIT( 73), PACA_INIT( 74), PACA_INIT( 75), - PACA_INIT( 76), PACA_INIT( 77), PACA_INIT( 78), PACA_INIT( 79), - PACA_INIT( 80), PACA_INIT( 81), PACA_INIT( 82), PACA_INIT( 83), - PACA_INIT( 84), PACA_INIT( 85), PACA_INIT( 86), PACA_INIT( 87), - PACA_INIT( 88), PACA_INIT( 89), PACA_INIT( 90), PACA_INIT( 91), - PACA_INIT( 92), PACA_INIT( 93), PACA_INIT( 94), PACA_INIT( 95), - PACA_INIT( 96), PACA_INIT( 97), PACA_INIT( 98), PACA_INIT( 99), - PACA_INIT(100), PACA_INIT(101), PACA_INIT(102), PACA_INIT(103), - PACA_INIT(104), PACA_INIT(105), PACA_INIT(106), PACA_INIT(107), - PACA_INIT(108), PACA_INIT(109), PACA_INIT(110), PACA_INIT(111), - PACA_INIT(112), PACA_INIT(113), PACA_INIT(114), PACA_INIT(115), - PACA_INIT(116), PACA_INIT(117), PACA_INIT(118), PACA_INIT(119), - PACA_INIT(120), PACA_INIT(121), PACA_INIT(122), PACA_INIT(123), - PACA_INIT(124), PACA_INIT(125), PACA_INIT(126), PACA_INIT(127), -#endif -#endif -#endif -#endif -#endif -}; -EXPORT_SYMBOL(paca); Index: working-2.6/arch/powerpc/kernel/Makefile =================================================================== --- working-2.6.orig/arch/powerpc/kernel/Makefile 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/Makefile 2005-11-09 13:23:04.000000000 +1100 @@ -4,6 +4,7 
@@ ifeq ($(CONFIG_PPC64),y) EXTRA_CFLAGS += -mno-minimal-toc +CFLAGS_ioctl32.o += -Ifs/ endif ifeq ($(CONFIG_PPC32),y) CFLAGS_prom_init.o += -fPIC @@ -13,7 +14,9 @@ obj-y := semaphore.o cputable.o ptrace.o syscalls.o \ signal_32.o pmc.o obj-$(CONFIG_PPC64) += setup_64.o binfmt_elf32.o sys_ppc32.o \ - signal_64.o ptrace32.o systbl.o + signal_64.o ptrace32.o systbl.o \ + paca.o ioctl32.o cpu_setup_power4.o \ + firmware.o obj-$(CONFIG_ALTIVEC) += vecemu.o vector.o obj-$(CONFIG_POWER4) += idle_power4.o obj-$(CONFIG_PPC_OF) += of_device.o Index: working-2.6/arch/powerpc/kernel/cpu_setup_power4.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/kernel/cpu_setup_power4.S 2005-11-09 13:23:04.000000000 +1100 @@ -0,0 +1,233 @@ +/* + * This file contains low level CPU setup functions. + * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + */ + +#include +#include +#include +#include +#include +#include +#include + +_GLOBAL(__970_cpu_preinit) + /* + * Do nothing if not running in HV mode + */ + mfmsr r0 + rldicl. r0,r0,4,63 + beqlr + + /* + * Deal only with PPC970 and PPC970FX. 
+ */ + mfspr r0,SPRN_PVR + srwi r0,r0,16 + cmpwi r0,0x39 + beq 1f + cmpwi r0,0x3c + beq 1f + cmpwi r0,0x44 + bnelr +1: + + /* Make sure HID4:rm_ci is off before MMU is turned off, that large + * pages are enabled with HID4:61 and clear HID5:DCBZ_size and + * HID5:DCBZ32_ill + */ + li r0,0 + mfspr r3,SPRN_HID4 + rldimi r3,r0,40,23 /* clear bit 23 (rm_ci) */ + rldimi r3,r0,2,61 /* clear bit 61 (lg_pg_en) */ + sync + mtspr SPRN_HID4,r3 + isync + sync + mfspr r3,SPRN_HID5 + rldimi r3,r0,6,56 /* clear bits 56 & 57 (DCBZ*) */ + sync + mtspr SPRN_HID5,r3 + isync + sync + + /* Setup some basic HID1 features */ + mfspr r0,SPRN_HID1 + li r3,0x1200 /* enable i-fetch cacheability */ + sldi r3,r3,44 /* and prefetch */ + or r0,r0,r3 + mtspr SPRN_HID1,r0 + mtspr SPRN_HID1,r0 + isync + + /* Clear HIOR */ + li r0,0 + sync + mtspr SPRN_HIOR,0 /* Clear interrupt prefix */ + isync + blr + +_GLOBAL(__setup_cpu_power4) + blr + +_GLOBAL(__setup_cpu_be) + /* Set large page sizes LP=0: 16MB, LP=1: 64KB */ + addi r3, 0, 0 + ori r3, r3, HID6_LB + sldi r3, r3, 32 + nor r3, r3, r3 + mfspr r4, SPRN_HID6 + and r4, r4, r3 + addi r3, 0, 0x02000 + sldi r3, r3, 32 + or r4, r4, r3 + mtspr SPRN_HID6, r4 + blr + +_GLOBAL(__setup_cpu_ppc970) + mfspr r0,SPRN_HID0 + li r11,5 /* clear DOZE and SLEEP */ + rldimi r0,r11,52,8 /* set NAP and DPM */ + mtspr SPRN_HID0,r0 + mfspr r0,SPRN_HID0 + mfspr r0,SPRN_HID0 + mfspr r0,SPRN_HID0 + mfspr r0,SPRN_HID0 + mfspr r0,SPRN_HID0 + mfspr r0,SPRN_HID0 + sync + isync + blr + +/* Definitions for the table use to save CPU states */ +#define CS_HID0 0 +#define CS_HID1 8 +#define CS_HID4 16 +#define CS_HID5 24 +#define CS_SIZE 32 + + .data + .balign L1_CACHE_BYTES,0 +cpu_state_storage: + .space CS_SIZE + .balign L1_CACHE_BYTES,0 + .text + +/* Called in normal context to backup CPU 0 state. This + * does not include cache settings. This function is also + * called for machine sleep. This does not include the MMU + * setup, BATs, etc... 
but rather the "special" registers + * like HID0, HID1, HID4, etc... + */ +_GLOBAL(__save_cpu_setup) + /* Some CR fields are volatile, we back it up all */ + mfcr r7 + + /* Get storage ptr */ + LOADADDR(r5,cpu_state_storage) + + /* We only deal with 970 for now */ + mfspr r0,SPRN_PVR + srwi r0,r0,16 + cmpwi r0,0x39 + beq 1f + cmpwi r0,0x3c + beq 1f + cmpwi r0,0x44 + bne 2f + +1: /* Save HID0,1,4 and 5 */ + mfspr r3,SPRN_HID0 + std r3,CS_HID0(r5) + mfspr r3,SPRN_HID1 + std r3,CS_HID1(r5) + mfspr r3,SPRN_HID4 + std r3,CS_HID4(r5) + mfspr r3,SPRN_HID5 + std r3,CS_HID5(r5) + +2: + mtcr r7 + blr + +/* Called with no MMU context (typically MSR:IR/DR off) to + * restore CPU state as backed up by the previous + * function. This does not include cache setting + */ +_GLOBAL(__restore_cpu_setup) + /* Get storage ptr (FIXME when using anton reloc as we + * are running with translation disabled here + */ + LOADADDR(r5,cpu_state_storage) + + /* We only deal with 970 for now */ + mfspr r0,SPRN_PVR + srwi r0,r0,16 + cmpwi r0,0x39 + beq 1f + cmpwi r0,0x3c + beq 1f + cmpwi r0,0x44 + bnelr + +1: /* Before accessing memory, we make sure rm_ci is clear */ + li r0,0 + mfspr r3,SPRN_HID4 + rldimi r3,r0,40,23 /* clear bit 23 (rm_ci) */ + sync + mtspr SPRN_HID4,r3 + isync + sync + + /* Clear interrupt prefix */ + li r0,0 + sync + mtspr SPRN_HIOR,0 + isync + + /* Restore HID0 */ + ld r3,CS_HID0(r5) + sync + isync + mtspr SPRN_HID0,r3 + mfspr r3,SPRN_HID0 + mfspr r3,SPRN_HID0 + mfspr r3,SPRN_HID0 + mfspr r3,SPRN_HID0 + mfspr r3,SPRN_HID0 + mfspr r3,SPRN_HID0 + sync + isync + + /* Restore HID1 */ + ld r3,CS_HID1(r5) + sync + isync + mtspr SPRN_HID1,r3 + mtspr SPRN_HID1,r3 + sync + isync + + /* Restore HID4 */ + ld r3,CS_HID4(r5) + sync + isync + mtspr SPRN_HID4,r3 + sync + isync + + /* Restore HID5 */ + ld r3,CS_HID5(r5) + sync + isync + mtspr SPRN_HID5,r3 + sync + isync + blr + Index: working-2.6/arch/powerpc/kernel/ioctl32.c 
=================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/kernel/ioctl32.c 2005-11-09 13:23:04.000000000 +1100 @@ -0,0 +1,49 @@ +/* + * ioctl32.c: Conversion between 32bit and 64bit native ioctls. + * + * Based on sparc64 ioctl32.c by: + * + * Copyright (C) 1997-2000 Jakub Jelinek (jakub at redhat.com) + * Copyright (C) 1998 Eddie C. Dost (ecd at skynet.be) + * + * ppc64 changes: + * + * Copyright (C) 2000 Ken Aaker (kdaaker at rchland.vnet.ibm.com) + * Copyright (C) 2001 Anton Blanchard (antonb at au.ibm.com) + * + * These routines maintain argument size conversion between 32bit and 64bit + * ioctls. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#define INCLUDES +#include "compat_ioctl.c" +#include + +#define CODE +#include "compat_ioctl.c" + +#define HANDLE_IOCTL(cmd,handler) { cmd, (ioctl_trans_handler_t)handler, NULL }, +#define COMPATIBLE_IOCTL(cmd) HANDLE_IOCTL(cmd,sys_ioctl) + +#define IOCTL_TABLE_START \ + struct ioctl_trans ioctl_start[] = { +#define IOCTL_TABLE_END \ + }; + +IOCTL_TABLE_START +#include +#define DECLARES +#include "compat_ioctl.c" + +/* Little p (/dev/rtc, /dev/envctrl, etc.) 
*/ +COMPATIBLE_IOCTL(_IOR('p', 20, int[7])) /* RTCGET */ +COMPATIBLE_IOCTL(_IOW('p', 21, int[7])) /* RTCSET */ + +IOCTL_TABLE_END + +int ioctl_table_size = ARRAY_SIZE(ioctl_start); Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/Makefile 2005-11-09 13:23:04.000000000 +1100 @@ -12,11 +12,10 @@ endif obj-y += irq.o idle.o dma.o \ - align.o pacaData.o \ - udbg.o ioctl32.o \ + align.o \ + udbg.o \ rtc.o \ - cpu_setup_power4.o \ - iommu.o sysfs.o vdso.o firmware.o + iommu.o sysfs.o vdso.o obj-y += vdso32/ vdso64/ pci-obj-$(CONFIG_PPC_MULTIPLATFORM) += pci_dn.o pci_direct_iommu.o @@ -52,8 +51,6 @@ obj-$(CONFIG_KPROBES) += kprobes.o -CFLAGS_ioctl32.o += -Ifs/ - ifneq ($(CONFIG_PPC_MERGE),y) ifeq ($(CONFIG_PPC_ISERIES),y) arch/ppc64/kernel/head.o: arch/powerpc/kernel/lparmap.s Index: working-2.6/arch/ppc64/kernel/cpu_setup_power4.S =================================================================== --- working-2.6.orig/arch/ppc64/kernel/cpu_setup_power4.S 2005-10-25 11:59:53.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,233 +0,0 @@ -/* - * This file contains low level CPU setup functions. - * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - * - */ - -#include -#include -#include -#include -#include -#include -#include - -_GLOBAL(__970_cpu_preinit) - /* - * Do nothing if not running in HV mode - */ - mfmsr r0 - rldicl. r0,r0,4,63 - beqlr - - /* - * Deal only with PPC970 and PPC970FX. 
- */ - mfspr r0,SPRN_PVR - srwi r0,r0,16 - cmpwi r0,0x39 - beq 1f - cmpwi r0,0x3c - beq 1f - cmpwi r0,0x44 - bnelr -1: - - /* Make sure HID4:rm_ci is off before MMU is turned off, that large - * pages are enabled with HID4:61 and clear HID5:DCBZ_size and - * HID5:DCBZ32_ill - */ - li r0,0 - mfspr r3,SPRN_HID4 - rldimi r3,r0,40,23 /* clear bit 23 (rm_ci) */ - rldimi r3,r0,2,61 /* clear bit 61 (lg_pg_en) */ - sync - mtspr SPRN_HID4,r3 - isync - sync - mfspr r3,SPRN_HID5 - rldimi r3,r0,6,56 /* clear bits 56 & 57 (DCBZ*) */ - sync - mtspr SPRN_HID5,r3 - isync - sync - - /* Setup some basic HID1 features */ - mfspr r0,SPRN_HID1 - li r3,0x1200 /* enable i-fetch cacheability */ - sldi r3,r3,44 /* and prefetch */ - or r0,r0,r3 - mtspr SPRN_HID1,r0 - mtspr SPRN_HID1,r0 - isync - - /* Clear HIOR */ - li r0,0 - sync - mtspr SPRN_HIOR,0 /* Clear interrupt prefix */ - isync - blr - -_GLOBAL(__setup_cpu_power4) - blr - -_GLOBAL(__setup_cpu_be) - /* Set large page sizes LP=0: 16MB, LP=1: 64KB */ - addi r3, 0, 0 - ori r3, r3, HID6_LB - sldi r3, r3, 32 - nor r3, r3, r3 - mfspr r4, SPRN_HID6 - and r4, r4, r3 - addi r3, 0, 0x02000 - sldi r3, r3, 32 - or r4, r4, r3 - mtspr SPRN_HID6, r4 - blr - -_GLOBAL(__setup_cpu_ppc970) - mfspr r0,SPRN_HID0 - li r11,5 /* clear DOZE and SLEEP */ - rldimi r0,r11,52,8 /* set NAP and DPM */ - mtspr SPRN_HID0,r0 - mfspr r0,SPRN_HID0 - mfspr r0,SPRN_HID0 - mfspr r0,SPRN_HID0 - mfspr r0,SPRN_HID0 - mfspr r0,SPRN_HID0 - mfspr r0,SPRN_HID0 - sync - isync - blr - -/* Definitions for the table use to save CPU states */ -#define CS_HID0 0 -#define CS_HID1 8 -#define CS_HID4 16 -#define CS_HID5 24 -#define CS_SIZE 32 - - .data - .balign L1_CACHE_BYTES,0 -cpu_state_storage: - .space CS_SIZE - .balign L1_CACHE_BYTES,0 - .text - -/* Called in normal context to backup CPU 0 state. This - * does not include cache settings. This function is also - * called for machine sleep. This does not include the MMU - * setup, BATs, etc... 
but rather the "special" registers - * like HID0, HID1, HID4, etc... - */ -_GLOBAL(__save_cpu_setup) - /* Some CR fields are volatile, we back it up all */ - mfcr r7 - - /* Get storage ptr */ - LOADADDR(r5,cpu_state_storage) - - /* We only deal with 970 for now */ - mfspr r0,SPRN_PVR - srwi r0,r0,16 - cmpwi r0,0x39 - beq 1f - cmpwi r0,0x3c - beq 1f - cmpwi r0,0x44 - bne 2f - -1: /* Save HID0,1,4 and 5 */ - mfspr r3,SPRN_HID0 - std r3,CS_HID0(r5) - mfspr r3,SPRN_HID1 - std r3,CS_HID1(r5) - mfspr r3,SPRN_HID4 - std r3,CS_HID4(r5) - mfspr r3,SPRN_HID5 - std r3,CS_HID5(r5) - -2: - mtcr r7 - blr - -/* Called with no MMU context (typically MSR:IR/DR off) to - * restore CPU state as backed up by the previous - * function. This does not include cache setting - */ -_GLOBAL(__restore_cpu_setup) - /* Get storage ptr (FIXME when using anton reloc as we - * are running with translation disabled here - */ - LOADADDR(r5,cpu_state_storage) - - /* We only deal with 970 for now */ - mfspr r0,SPRN_PVR - srwi r0,r0,16 - cmpwi r0,0x39 - beq 1f - cmpwi r0,0x3c - beq 1f - cmpwi r0,0x44 - bnelr - -1: /* Before accessing memory, we make sure rm_ci is clear */ - li r0,0 - mfspr r3,SPRN_HID4 - rldimi r3,r0,40,23 /* clear bit 23 (rm_ci) */ - sync - mtspr SPRN_HID4,r3 - isync - sync - - /* Clear interrupt prefix */ - li r0,0 - sync - mtspr SPRN_HIOR,0 - isync - - /* Restore HID0 */ - ld r3,CS_HID0(r5) - sync - isync - mtspr SPRN_HID0,r3 - mfspr r3,SPRN_HID0 - mfspr r3,SPRN_HID0 - mfspr r3,SPRN_HID0 - mfspr r3,SPRN_HID0 - mfspr r3,SPRN_HID0 - mfspr r3,SPRN_HID0 - sync - isync - - /* Restore HID1 */ - ld r3,CS_HID1(r5) - sync - isync - mtspr SPRN_HID1,r3 - mtspr SPRN_HID1,r3 - sync - isync - - /* Restore HID4 */ - ld r3,CS_HID4(r5) - sync - isync - mtspr SPRN_HID4,r3 - sync - isync - - /* Restore HID5 */ - ld r3,CS_HID5(r5) - sync - isync - mtspr SPRN_HID5,r3 - sync - isync - blr - Index: working-2.6/arch/ppc64/kernel/ioctl32.c =================================================================== 
--- working-2.6.orig/arch/ppc64/kernel/ioctl32.c 2005-11-08 10:57:14.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,49 +0,0 @@ -/* - * ioctl32.c: Conversion between 32bit and 64bit native ioctls. - * - * Based on sparc64 ioctl32.c by: - * - * Copyright (C) 1997-2000 Jakub Jelinek (jakub at redhat.com) - * Copyright (C) 1998 Eddie C. Dost (ecd at skynet.be) - * - * ppc64 changes: - * - * Copyright (C) 2000 Ken Aaker (kdaaker at rchland.vnet.ibm.com) - * Copyright (C) 2001 Anton Blanchard (antonb at au.ibm.com) - * - * These routines maintain argument size conversion between 32bit and 64bit - * ioctls. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#define INCLUDES -#include "compat_ioctl.c" -#include - -#define CODE -#include "compat_ioctl.c" - -#define HANDLE_IOCTL(cmd,handler) { cmd, (ioctl_trans_handler_t)handler, NULL }, -#define COMPATIBLE_IOCTL(cmd) HANDLE_IOCTL(cmd,sys_ioctl) - -#define IOCTL_TABLE_START \ - struct ioctl_trans ioctl_start[] = { -#define IOCTL_TABLE_END \ - }; - -IOCTL_TABLE_START -#include -#define DECLARES -#include "compat_ioctl.c" - -/* Little p (/dev/rtc, /dev/envctrl, etc.) */ -COMPATIBLE_IOCTL(_IOR('p', 20, int[7])) /* RTCGET */ -COMPATIBLE_IOCTL(_IOW('p', 21, int[7])) /* RTCSET */ - -IOCTL_TABLE_END - -int ioctl_table_size = ARRAY_SIZE(ioctl_start); Index: working-2.6/arch/powerpc/kernel/firmware.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/kernel/firmware.c 2005-11-09 13:27:40.000000000 +1100 @@ -0,0 +1,45 @@ +/* + * Extracted from cputable.c + * + * Copyright (C) 2001 Ben. 
Herrenschmidt (benh at kernel.crashing.org) + * + * Modifications for ppc64: + * Copyright (C) 2003 Dave Engebretsen + * Copyright (C) 2005 Stephen Rothwell, IBM Corporation + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include + +#include + +unsigned long ppc64_firmware_features; + +#ifdef CONFIG_PPC_PSERIES +firmware_feature_t firmware_features_table[FIRMWARE_MAX_FEATURES] = { + {FW_FEATURE_PFT, "hcall-pft"}, + {FW_FEATURE_TCE, "hcall-tce"}, + {FW_FEATURE_SPRG0, "hcall-sprg0"}, + {FW_FEATURE_DABR, "hcall-dabr"}, + {FW_FEATURE_COPY, "hcall-copy"}, + {FW_FEATURE_ASR, "hcall-asr"}, + {FW_FEATURE_DEBUG, "hcall-debug"}, + {FW_FEATURE_PERF, "hcall-perf"}, + {FW_FEATURE_DUMP, "hcall-dump"}, + {FW_FEATURE_INTERRUPT, "hcall-interrupt"}, + {FW_FEATURE_MIGRATE, "hcall-migrate"}, + {FW_FEATURE_PERFMON, "hcall-perfmon"}, + {FW_FEATURE_CRQ, "hcall-crq"}, + {FW_FEATURE_VIO, "hcall-vio"}, + {FW_FEATURE_RDMA, "hcall-rdma"}, + {FW_FEATURE_LLAN, "hcall-lLAN"}, + {FW_FEATURE_BULK, "hcall-bulk"}, + {FW_FEATURE_XDABR, "hcall-xdabr"}, + {FW_FEATURE_MULTITCE, "hcall-multi-tce"}, + {FW_FEATURE_SPLPAR, "hcall-splpar"}, +}; +#endif Index: working-2.6/arch/ppc64/kernel/firmware.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/firmware.c 2005-10-25 11:59:53.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,47 +0,0 @@ -/* - * arch/ppc64/kernel/firmware.c - * - * Extracted from cputable.c - * - * Copyright (C) 2001 Ben. 
Herrenschmidt (benh at kernel.crashing.org) - * - * Modifications for ppc64: - * Copyright (C) 2003 Dave Engebretsen - * Copyright (C) 2005 Stephen Rothwell, IBM Corporation - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include - -#include - -unsigned long ppc64_firmware_features; - -#ifdef CONFIG_PPC_PSERIES -firmware_feature_t firmware_features_table[FIRMWARE_MAX_FEATURES] = { - {FW_FEATURE_PFT, "hcall-pft"}, - {FW_FEATURE_TCE, "hcall-tce"}, - {FW_FEATURE_SPRG0, "hcall-sprg0"}, - {FW_FEATURE_DABR, "hcall-dabr"}, - {FW_FEATURE_COPY, "hcall-copy"}, - {FW_FEATURE_ASR, "hcall-asr"}, - {FW_FEATURE_DEBUG, "hcall-debug"}, - {FW_FEATURE_PERF, "hcall-perf"}, - {FW_FEATURE_DUMP, "hcall-dump"}, - {FW_FEATURE_INTERRUPT, "hcall-interrupt"}, - {FW_FEATURE_MIGRATE, "hcall-migrate"}, - {FW_FEATURE_PERFMON, "hcall-perfmon"}, - {FW_FEATURE_CRQ, "hcall-crq"}, - {FW_FEATURE_VIO, "hcall-vio"}, - {FW_FEATURE_RDMA, "hcall-rdma"}, - {FW_FEATURE_LLAN, "hcall-lLAN"}, - {FW_FEATURE_BULK, "hcall-bulk"}, - {FW_FEATURE_XDABR, "hcall-xdabr"}, - {FW_FEATURE_MULTITCE, "hcall-multi-tce"}, - {FW_FEATURE_SPLPAR, "hcall-splpar"}, -}; -#endif Index: working-2.6/include/asm-powerpc/tce.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/tce.h 2005-11-09 13:23:04.000000000 +1100 @@ -0,0 +1,64 @@ +/* + * Copyright (C) 2001 Mike Corrigan & Dave Engebretsen, IBM Corporation + * Rewrite, cleanup: + * Copyright (C) 2004 Olof Johansson , IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your 
option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef _ASM_POWERPC_TCE_H +#define _ASM_POWERPC_TCE_H + +/* + * Tces come in two formats, one for the virtual bus and a different + * format for PCI + */ +#define TCE_VB 0 +#define TCE_PCI 1 + +/* TCE page size is 4096 bytes (1 << 12) */ + +#define TCE_SHIFT 12 +#define TCE_PAGE_SIZE (1 << TCE_SHIFT) +#define TCE_PAGE_FACTOR (PAGE_SHIFT - TCE_SHIFT) + + +/* tce_entry + * Used by pSeries (SMP) and iSeries/pSeries LPAR, but there it's + * abstracted so layout is irrelevant. + */ +union tce_entry { + unsigned long te_word; + struct { + unsigned int tb_cacheBits :6; /* Cache hash bits - not used */ + unsigned int tb_rsvd :6; + unsigned long tb_rpn :40; /* Real page number */ + unsigned int tb_valid :1; /* Tce is valid (vb only) */ + unsigned int tb_allio :1; /* Tce is valid for all lps (vb only) */ + unsigned int tb_lpindex :8; /* LpIndex for user of TCE (vb only) */ + unsigned int tb_pciwr :1; /* Write allowed (pci only) */ + unsigned int tb_rdwr :1; /* Read allowed (pci), Write allowed (vb) */ + } te_bits; +#define te_cacheBits te_bits.tb_cacheBits +#define te_rpn te_bits.tb_rpn +#define te_valid te_bits.tb_valid +#define te_allio te_bits.tb_allio +#define te_lpindex te_bits.tb_lpindex +#define te_pciwr te_bits.tb_pciwr +#define te_rdwr te_bits.tb_rdwr +}; + + +#endif /* _ASM_POWERPC_TCE_H */ Index: working-2.6/include/asm-ppc64/tce.h =================================================================== --- working-2.6.orig/include/asm-ppc64/tce.h 2005-11-08 
10:57:23.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,64 +0,0 @@ -/* - * Copyright (C) 2001 Mike Corrigan & Dave Engebretsen, IBM Corporation - * Rewrite, cleanup: - * Copyright (C) 2004 Olof Johansson , IBM Corporation - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#ifndef _ASM_TCE_H -#define _ASM_TCE_H - -/* - * Tces come in two formats, one for the virtual bus and a different - * format for PCI - */ -#define TCE_VB 0 -#define TCE_PCI 1 - -/* TCE page size is 4096 bytes (1 << 12) */ - -#define TCE_SHIFT 12 -#define TCE_PAGE_SIZE (1 << TCE_SHIFT) -#define TCE_PAGE_FACTOR (PAGE_SHIFT - TCE_SHIFT) - - -/* tce_entry - * Used by pSeries (SMP) and iSeries/pSeries LPAR, but there it's - * abstracted so layout is irrelevant. 
- */ -union tce_entry { - unsigned long te_word; - struct { - unsigned int tb_cacheBits :6; /* Cache hash bits - not used */ - unsigned int tb_rsvd :6; - unsigned long tb_rpn :40; /* Real page number */ - unsigned int tb_valid :1; /* Tce is valid (vb only) */ - unsigned int tb_allio :1; /* Tce is valid for all lps (vb only) */ - unsigned int tb_lpindex :8; /* LpIndex for user of TCE (vb only) */ - unsigned int tb_pciwr :1; /* Write allowed (pci only) */ - unsigned int tb_rdwr :1; /* Read allowed (pci), Write allowed (vb) */ - } te_bits; -#define te_cacheBits te_bits.tb_cacheBits -#define te_rpn te_bits.tb_rpn -#define te_valid te_bits.tb_valid -#define te_allio te_bits.tb_allio -#define te_lpindex te_bits.tb_lpindex -#define te_pciwr te_bits.tb_pciwr -#define te_rdwr te_bits.tb_rdwr -}; - - -#endif Index: working-2.6/include/asm-powerpc/abs_addr.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/abs_addr.h 2005-11-09 13:23:04.000000000 +1100 @@ -0,0 +1,73 @@ +#ifndef _ASM_POWERPC_ABS_ADDR_H +#define _ASM_POWERPC_ABS_ADDR_H + +#include + +/* + * c 2001 PPC 64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include + +struct mschunks_map { + unsigned long num_chunks; + unsigned long chunk_size; + unsigned long chunk_shift; + unsigned long chunk_mask; + u32 *mapping; +}; + +extern struct mschunks_map mschunks_map; + +/* Chunks are 256 KB */ +#define MSCHUNKS_CHUNK_SHIFT (18) +#define MSCHUNKS_CHUNK_SIZE (1UL << MSCHUNKS_CHUNK_SHIFT) +#define MSCHUNKS_OFFSET_MASK (MSCHUNKS_CHUNK_SIZE - 1) + +static inline unsigned long chunk_to_addr(unsigned long chunk) +{ + return chunk << MSCHUNKS_CHUNK_SHIFT; +} + +static inline unsigned long addr_to_chunk(unsigned long addr) +{ + return addr >> MSCHUNKS_CHUNK_SHIFT; +} + +static inline unsigned long phys_to_abs(unsigned long pa) +{ + unsigned long chunk; + + /* This is a no-op on non-iSeries */ + if (!firmware_has_feature(FW_FEATURE_ISERIES)) + return pa; + + chunk = addr_to_chunk(pa); + + if (chunk < mschunks_map.num_chunks) + chunk = mschunks_map.mapping[chunk]; + + return chunk_to_addr(chunk) + (pa & MSCHUNKS_OFFSET_MASK); +} + +/* Convenience macros */ +#define virt_to_abs(va) phys_to_abs(__pa(va)) +#define abs_to_virt(aa) __va(aa) + +/* + * Converts Virtual Address to Real Address for + * Legacy iSeries Hypervisor calls + */ +#define iseries_hv_addr(virtaddr) \ + (0x8000000000000000 | virt_to_abs(virtaddr)) + +#endif /* _ASM_POWERPC_ABS_ADDR_H */ Index: working-2.6/include/asm-ppc64/abs_addr.h =================================================================== --- working-2.6.orig/include/asm-ppc64/abs_addr.h 2005-11-08 10:57:23.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,73 +0,0 @@ -#ifndef _ABS_ADDR_H -#define _ABS_ADDR_H - -#include - -/* - * c 2001 PPC 64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ - -#include -#include -#include -#include -#include - -struct mschunks_map { - unsigned long num_chunks; - unsigned long chunk_size; - unsigned long chunk_shift; - unsigned long chunk_mask; - u32 *mapping; -}; - -extern struct mschunks_map mschunks_map; - -/* Chunks are 256 KB */ -#define MSCHUNKS_CHUNK_SHIFT (18) -#define MSCHUNKS_CHUNK_SIZE (1UL << MSCHUNKS_CHUNK_SHIFT) -#define MSCHUNKS_OFFSET_MASK (MSCHUNKS_CHUNK_SIZE - 1) - -static inline unsigned long chunk_to_addr(unsigned long chunk) -{ - return chunk << MSCHUNKS_CHUNK_SHIFT; -} - -static inline unsigned long addr_to_chunk(unsigned long addr) -{ - return addr >> MSCHUNKS_CHUNK_SHIFT; -} - -static inline unsigned long phys_to_abs(unsigned long pa) -{ - unsigned long chunk; - - /* This is a no-op on non-iSeries */ - if (!firmware_has_feature(FW_FEATURE_ISERIES)) - return pa; - - chunk = addr_to_chunk(pa); - - if (chunk < mschunks_map.num_chunks) - chunk = mschunks_map.mapping[chunk]; - - return chunk_to_addr(chunk) + (pa & MSCHUNKS_OFFSET_MASK); -} - -/* Convenience macros */ -#define virt_to_abs(va) phys_to_abs(__pa(va)) -#define abs_to_virt(aa) __va(aa) - -/* - * Converts Virtual Address to Real Address for - * Legacy iSeries Hypervisor calls - */ -#define iseries_hv_addr(virtaddr) \ - (0x8000000000000000 | virt_to_abs(virtaddr)) - -#endif /* _ABS_ADDR_H */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson From michael at ellerman.id.au Wed Nov 9 13:55:16 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Wed, 9 Nov 2005 13:55:16 +1100 Subject: powerpc: Move various ppc64 files with no ppc32 equivalent to powerpc In-Reply-To: <20051109023801.GI28271@localhost.localdomain> References: <20051109023801.GI28271@localhost.localdomain> Message-ID: <200511091355.21945.michael@ellerman.id.au> On Wed, 9 Nov 2005 13:38, David Gibson wrote: > ... 
> Index: working-2.6/include/asm-powerpc/lppaca.h > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ working-2.6/include/asm-powerpc/lppaca.h 2005-11-09 13:23:04.000000000 > +1100 @@ -0,0 +1,131 @@ > ... > +//============================================================================= +// > +// This control block contains the data that is shared between the > +// hypervisor (PLIC) and the OS. > +// > +// > +//---------------------------------------------------------------------------- C++ style comments? -- Michael Ellerman IBM OzLabs email: michael:ellerman.id.au inmsg: mpe:jabber.org wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051109/9c03c438/attachment.pgp From jwboyer at jdub.homelinux.org Wed Nov 9 09:32:54 2005 From: jwboyer at jdub.homelinux.org (Josh Boyer) Date: Tue, 08 Nov 2005 16:32:54 -0600 Subject: 440EP FPU support missing In-Reply-To: <20051108153036.F27232@cox.net> References: <20051107124917.C1671@cox.net> <20051107190128.68d41294.akpm@osdl.org> <20051108093759.A26086@cox.net> <200511081838.11236.sr@denx.de> <20051108153036.F27232@cox.net> Message-ID: <1131489174.26096.0.camel@yoda.jdub.homelinux.org> On Tue, 2005-11-08 at 15:30 -0700, Matt Porter wrote: > On Tue, Nov 08, 2005 at 06:38:11PM +0100, Stefan Roese wrote: > > In the current linux version, Bamboo (440EP) won't compile anymore, because of > > missing fpu support: > > > > make uImage > > ... 
> > LD init/built-in.o > > LD .tmp_vmlinux1 > > arch/ppc/kernel/head_44x.o(.text+0x868): In function `_start': > > : undefined reference to `KernelFP' > > make: *** [.tmp_vmlinux1] Error 1 > > > > Somehow arch/ppc/kernel/fpu.S has disappeared. :-( I assume, this happened in > > the ppc/ppc64 -> powerpc merge. Any thoughts, why this file disappeared and > > how to solve this problem (just restore the original file)? > > arch/powerpc/kernel/fpu.S is being used now which doesn't have KernelFP. > I don't know why the 44x fpu support wasn't using > kernel_fp_unavailable_exception() before but I must have missed that > reviewing it. > > Try this patch. Doesn't this render the 440EP's FPU useless? josh From dwg at au1.ibm.com Wed Nov 9 10:57:59 2005 From: dwg at au1.ibm.com (David Gibson) Date: Wed, 9 Nov 2005 10:57:59 +1100 Subject: typedefs and structs In-Reply-To: <20051108232327.GA19593@austin.ibm.com> References: <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> Message-ID: <20051108235759.GA28271@localhost.localdomain> On Tue, Nov 08, 2005 at 05:23:27PM -0600, Linas Vepstas wrote: > On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark: > > On Mon, 2005-11-07 at 14:41 -0600, linas wrote: > > > > don't use typedef to get rid of "struct". > > > > This was for the simple reason, too many developers were passing > > structures by value instead of by reference, just because they were > > using a type that they didn't realize was a structure. > > That's a rather bizarre mistake to make, since, in order to > access a values in such a beast, you have to use a dot . 
instead > of an arrow -> and so it hits you in the face that you passed a value > instead of a reference. > > ---- > Off-topic: There's actually a neat little trick in C++ that can > help avoid accidentally passing null pointers. One can declare > function declarations as: > > int func (struct blah &v) { > v.a ++; > return v.b; > } > > The ampersand says "pass argument by reference (so as to get arg passing > efficiency) but force coder to write code as if they were passing by value" > As a result, it gets difficult to pass null pointers (for reasons > similar to the difficulty of passing null pointers in Java (and yes, > I loathe Java, sorry to subject you to that)) Anyway, that's a C++ trick > only; I wish it was in C so I could experiment more and find out if I > like it or hate it. I hate it: it obscures the fact that it's a pass-by-reference at the callsite, which is useful information. Although this is, admittedly, the least confusing use of C++ reference types. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson From mrmacman_g4 at mac.com Wed Nov 9 10:57:11 2005 From: mrmacman_g4 at mac.com (Kyle Moffett) Date: Tue, 8 Nov 2005 18:57:11 -0500 Subject: typedefs and structs In-Reply-To: <20051108232327.GA19593@austin.ibm.com> References: <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> Message-ID: On Nov 8, 2005, at 18:23:27, linas wrote: > Off-topic: There's actually a neat little trick in C++ that can > help avoid accidentally passing null pointers. 
One can declare > function declarations as: > > int func (struct blah &v) { > v.a ++; > return v.b; > } > > The ampersand says "pass argument by reference (so as to get arg > passing efficiency) but force coder to write code as if they were > passing by value" As a result, it gets difficult to pass null > pointers (for reasons similar to the difficulty of passing null > pointers in Java (and yes, I loathe Java, sorry to subject you to > that)) Anyway, that's a C++ trick only; I wish it was in C so I > could experiment more and find out if I like it or hate it. That technique tends to cause more problems than it solves. If I write the following code: struct foo the_leftmost_foo = get_leftmost_foo(); do_some_stuff(the_leftmost_foo); How do I know what it is going to do? Will it modify the_leftmost_foo, or is it a pass-by-value as it appears? This is just as bad as defining a macro some_macro(foo,bar) that does (foo = bar); it's _really_ hard to tell what it does, especially when you aren't all that familiar with the code. A much better solution is this: void do_some_stuff(struct foo *the_foo) __attribute__((__nonnull__(1))); do_some_stuff(&the_leftmost_foo); That ensures that the first argument cannot be explicitly passed as null, while still being quite obvious to the programmer what it's doing. Cheers, Kyle Moffett -- They _will_ find opposing experts to say it isn't, if you push hard enough the wrong way. Idiots with a PhD aren't hard to buy. 
-- Rob Landley From zlynx at acm.org Wed Nov 9 11:13:48 2005 From: zlynx at acm.org (Zan Lynx) Date: Tue, 08 Nov 2005 17:13:48 -0700 Subject: typedefs and structs In-Reply-To: <20051108235759.GA28271@localhost.localdomain> References: <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051108235759.GA28271@localhost.localdomain> Message-ID: <1131495228.12797.67.camel@localhost> On Wed, 2005-11-09 at 10:57 +1100, David Gibson wrote: > On Tue, Nov 08, 2005 at 05:23:27PM -0600, Linas Vepstas wrote: [snip] > > The ampersand says "pass argument by reference (so as to get arg passing > > efficiency) but force coder to write code as if they were passing by value" > > As a result, it gets difficult to pass null pointers (for reasons > > similar to the difficulty of passing null pointers in Java (and yes, > > I loathe Java, sorry to subject you to that)) Anyway, that's a C++ trick > > only; I wish it was in C so I could experiment more and find out if I > > like it or hate it. > > I hate it: it obscures the fact that it's a pass-by-reference at the > callsite, which is useful information. Although this is, admittedly, > the least confusing use of C++ reference types. I agree with you about that one. It's yet another thing for C programmers to have to learn to watch for C++ doing behind your back. However, it isn't any worse than having an ordinary C pointer to some struct. If the pointer was passed to the current function from above, and you're passing it to another function below, you really don't know what's going to happen to the structure unless you go look. 
Just like the C++ reference, the C pointer doesn't get an address-of operator to remind you. -- Zan Lynx -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051108/8dd7c5d6/attachment.pgp From doug at mcnaught.org Wed Nov 9 11:59:56 2005 From: doug at mcnaught.org (Douglas McNaught) Date: Tue, 08 Nov 2005 19:59:56 -0500 Subject: typedefs and structs In-Reply-To: <20051109004808.GM19593@austin.ibm.com> (linas@austin.ibm.com's message of "Tue, 8 Nov 2005 18:48:08 -0600") References: <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> Message-ID: linas writes: > On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was heard to remark: >> >> Yeah, but if you're trying to read that code, you have to go look up >> the declaration to figure out whether it might affect 'foo' or not. >> And if you get it wrong, you get silent data corruption. > > No, that is not what "pass by reference" means. You are thinking of > "const", maybe, or "pass by value"; this is neither. The arg is not > declared const, the subroutine can (and usually will) modify the contents > of the structure, and so the caller will be holding a modified structure > when the callee returns (just like it would if a pointer was passed). Right. My point is only that it's not clear from looking at the call site whether a struct passed by reference will be modified by the callee (some people pass by reference just for "efficiency"). 
And if the called function modifies the data without the caller's knowledge, it leads to obscure bugs. Whereas if you pass a pointer, it's immediately clear that the called function can modify the pointed-to object. -Doug From doug at mcnaught.org Wed Nov 9 11:37:20 2005 From: doug at mcnaught.org (Douglas McNaught) Date: Tue, 08 Nov 2005 19:37:20 -0500 Subject: typedefs and structs In-Reply-To: <20051109003048.GK19593@austin.ibm.com> (linas@austin.ibm.com's message of "Tue, 8 Nov 2005 18:30:48 -0600") References: <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> Message-ID: linas writes: > On Tue, Nov 08, 2005 at 06:57:11PM -0500, Kyle Moffett was heard to remark: >> That technique tends to cause more problems than it solves. If I >> write the following code: >> >> struct foo the_leftmost_foo = get_leftmost_foo(); >> do_some_stuff(the_leftmost_foo); >> >> How do I know what it is going to do? > > It depends on how do_some_stuff() was declared. If its declared as > > do_some_stuff (struct foo &x) > > then it will be a pass by reference. Yeah, but if you're trying to read that code, you have to go look up the declaration to figure out whether it might affect 'foo' or not. And if you get it wrong, you get silent data corruption. I'd rather pass a pointer explicitly and crash with a segfault if someone passes NULL--at least then it's pellucidly clear what went wrong. 
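A minimal sketch of that difference, assuming a hypothetical struct foo and made-up helper names (none of this is taken from real kernel code): with an explicit pointer, the call site itself shows which call may modify the object, and the GCC/Clang nonnull attribute mentioned earlier in the thread asks the compiler to warn about a literal NULL.

```c
/* Hypothetical structure, for illustration only. */
struct foo {
	int a;
	int b;
};

/*
 * Takes a pointer: the caller has to write bump_foo(&f), so the
 * possible modification is visible at the call site. The nonnull
 * attribute (a GCC/Clang extension) requests a warning when a
 * literal NULL is passed as argument 1.
 */
int bump_foo(struct foo *f) __attribute__((__nonnull__(1)));

int bump_foo(struct foo *f)
{
	f->a++;			/* changes the caller's struct */
	return f->b;
}

/* Takes a copy: the caller's struct cannot be changed from here. */
int peek_foo(struct foo f)
{
	f.a++;			/* changes the local copy only */
	return f.a;
}
```

A caller would write peek_foo(f) for the copy and bump_foo(&f) for the pointer; only the second can leave f changed, and the & says so without a trip to the declaration.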
-Doug From mrmacman_g4 at mac.com Wed Nov 9 12:51:25 2005 From: mrmacman_g4 at mac.com (Kyle Moffett) Date: Tue, 8 Nov 2005 20:51:25 -0500 Subject: typedefs and structs In-Reply-To: <20051109004808.GM19593@austin.ibm.com> References: <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> Message-ID: <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> On Nov 8, 2005, at 19:48:08, linas wrote: > On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was > heard to remark: >> Yeah, but if you're trying to read that code, you have to go look >> up the declaration to figure out whether it might affect 'foo' or >> not. And if you get it wrong, you get silent data corruption. > > No, that is not what "pass by reference" means. You are thinking of > "const", maybe, or "pass by value"; this is neither. The arg is > not declared const, the subroutine can (and usually will) modify > the contents of the structure, and so the caller will be holding a > modified structure when the callee returns (just like it would if a > pointer was passed). Pass by value in C: do_some_stuff(arg1, arg2); Pass by reference in C: do_some_stuff(&arg1, &arg2); This is very obvious what it does. The compiler does type-checks to make sure you don't get it wrong. There are tools to check stack usage of functions too. This is inherently obvious what the code does without looking at a completely different file where the function is defined. Pass by value in C++: do_some_stuff(arg1, arg2); Pass by reference in C++: do_some_stuff(arg1, arg2); This is C++ being clever and hiding stuff from the programmer, which is Not Good(TM) for a kernel. 
C++ may be an excellent language for userspace programmers (I say "may" here because some disagree, including myself), however, many of the features are extremely problematic for a kernel. Cheers, Kyle Moffett -- Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian Kernighan From dtor_core at ameritech.net Wed Nov 9 13:14:51 2005 From: dtor_core at ameritech.net (Dmitry Torokhov) Date: Tue, 8 Nov 2005 21:14:51 -0500 Subject: typedefs and structs In-Reply-To: References: <20051107185621.GD19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> Message-ID: <200511082114.52159.dtor_core@ameritech.net> On Tuesday 08 November 2005 19:59, Douglas McNaught wrote: > linas writes: > > > On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was heard to remark: > >> > >> Yeah, but if you're trying to read that code, you have to go look up > >> the declaration to figure out whether it might affect 'foo' or not. > >> And if you get it wrong, you get silent data corruption. > > > > No, that is not what "pass by reference" means. You are thinking of > > "const", maybe, or "pass by value"; this is neither. The arg is not > > declared const, the subroutine can (and usually will) modify the contents > > of the structure, and so the caller will be holding a modified structure > > when the callee returns (just like it would if a pointer was passed). > > Right. My point is only that it's not clear from looking at the call > site whether a struct passed by reference will be modified by the > callee (some people pass by reference just for "efficiency"). And if > the called function modifies the data without the caller's knowledge, > it leads to obscure bugs. Whereas if you pass a pointer, it's > immediately clear that the called function can modify the pointed-to > object. 
> A structure is almost never passed by value, no matter whether it is C or C++. So both languages require you to either use descriptive naming or look up the declaration/implementation: C: struct str { char buf[1024]; int count; }; struct str s; do_something_with_s(&s); do_something_else_with_s(&s); Which one modifies s? -- Dmitry From david at gibson.dropbear.id.au Wed Nov 9 14:03:57 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 9 Nov 2005 14:03:57 +1100 Subject: powerpc: Move various ppc64 files with no ppc32 equivalent to powerpc In-Reply-To: <200511091355.21945.michael@ellerman.id.au> References: <20051109023801.GI28271@localhost.localdomain> <200511091355.21945.michael@ellerman.id.au> Message-ID: <20051109030357.GJ28271@localhost.localdomain> On Wed, Nov 09, 2005 at 01:55:16PM +1100, Michael Ellerman wrote: > On Wed, 9 Nov 2005 13:38, David Gibson wrote: > > ... > > Index: working-2.6/include/asm-powerpc/lppaca.h > > =================================================================== > > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > > +++ working-2.6/include/asm-powerpc/lppaca.h 2005-11-09 13:23:04.000000000 > > +1100 @@ -0,0 +1,131 @@ > > ... > > +//============================================================================= +// > > +// This control block contains the data that is shared between the > > +// hypervisor (PLIC) and the OS. > > +// > > +// > > +//---------------------------------------------------------------------------- > > C++ style comments? They were already in include/asm-ppc64/lppaca.h. We can get rid of them when we pull the lppaca out of the paca, which should allow us to get rid of lppaca.h, or at least make it local to the arch code. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Digital signature Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051109/362c892a/attachment.pgp From benh at kernel.crashing.org Wed Nov 9 14:07:53 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 09 Nov 2005 14:07:53 +1100 Subject: [PATCH] Merge platform codes Message-ID: <1131505673.24637.37.camel@gaston> This patch merges the _MACH_* and PLATFORM_* codes together, and changes ppc64 to use _machine instead of systemcfg.h. (The latter is also moved to asm-powerpc, it will be renamed & made common in the next patch that merges the vDSO). Later, we'll eventually completely get rid of the platform numbers. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/powerpc/kernel/asm-offsets.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/asm-offsets.c 2005-11-08 11:00:17.000000000 +1100 +++ linux-work/arch/powerpc/kernel/asm-offsets.c 2005-11-09 12:00:55.000000000 +1100 @@ -106,7 +106,6 @@ DEFINE(ICACHEL1LINESIZE, offsetof(struct ppc64_caches, iline_size)); DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_iline_size)); DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, ilines_per_page)); - DEFINE(PLATFORM, offsetof(struct systemcfg, platform)); DEFINE(PLATFORM_LPAR, PLATFORM_LPAR); /* paca */ Index: linux-work/arch/powerpc/kernel/head_64.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/head_64.S 2005-11-08 11:00:17.000000000 +1100 +++ linux-work/arch/powerpc/kernel/head_64.S 2005-11-09 12:00:55.000000000 +1100 @@ -28,7 +28,6 @@ #include #include #include -#include #include #include #include @@ -1735,9 +1734,7 @@ sc /* HvCall_setASR */ #else /* set the ASR */ - ld r3,systemcfg@got(r2) /* r3 = ptr to systemcfg */ - ld r3,0(r3) - lwz r3,PLATFORM(r3) /* r3 = platform flags */ + ld r3,_machine@got(r2) /* r3 =
machine type */ andi. r3,r3,PLATFORM_LPAR /* Test if bit 0 is set (LPAR bit) */ beq 98f /* branch if result is 0 */ mfspr r3,SPRN_PVR @@ -1899,9 +1896,7 @@ /* set the ASR */ ld r3,PACASTABREAL(r13) ori r4,r3,1 /* turn on valid bit */ - ld r3,systemcfg@got(r2) /* r3 = ptr to systemcfg */ - ld r3,0(r3) - lwz r3,PLATFORM(r3) /* r3 = platform flags */ + ld r3,_machine@got(r2) /* r3 = machine type */ andi. r3,r3,PLATFORM_LPAR /* Test if bit 0 is set (LPAR bit) */ beq 98f /* branch if result is 0 */ mfspr r3,SPRN_PVR @@ -1919,9 +1914,7 @@ mtasr r4 /* set the stab location */ 99: /* Set SDR1 (hash table pointer) */ - ld r3,systemcfg@got(r2) /* r3 = ptr to systemcfg */ - ld r3,0(r3) - lwz r3,PLATFORM(r3) /* r3 = platform flags */ + ld r3,_machine@got(r2) /* r3 = machine */ /* Test if bit 0 is set (LPAR bit) */ andi. r3,r3,PLATFORM_LPAR bne 98f /* branch if result is !0 */ Index: linux-work/arch/powerpc/kernel/prom.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/prom.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/kernel/prom.c 2005-11-09 12:00:55.000000000 +1100 @@ -48,9 +48,6 @@ #include #include #include -#ifdef CONFIG_PPC64 -#include -#endif #ifdef DEBUG #define DBG(fmt...)
printk(KERN_ERR fmt) @@ -391,7 +388,7 @@ #ifdef CONFIG_PPC64 /* We offset irq numbers for the u3 MPIC by 128 in PowerMac */ - if (systemcfg->platform == PLATFORM_POWERMAC && ic && ic->parent) { + if (_machine == PLATFORM_POWERMAC && ic && ic->parent) { char *name = get_property(ic->parent, "name", NULL); if (name && !strcmp(name, "u3")) np->intrs[intrcount].line += 128; @@ -1161,13 +1158,9 @@ prop = (u32 *)of_get_flat_dt_prop(node, "linux,platform", NULL); if (prop == NULL) return 0; -#ifdef CONFIG_PPC64 - systemcfg->platform = *prop; -#else -#ifdef CONFIG_PPC_MULTIPLATFORM +#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_MULTIPLATFORM) _machine = *prop; #endif -#endif #ifdef CONFIG_PPC64 /* check if iommu is forced on or off */ @@ -1339,9 +1332,6 @@ of_scan_flat_dt(early_init_dt_scan_memory, NULL); lmb_enforce_memory_limit(memory_limit); lmb_analyze(); -#ifdef CONFIG_PPC64 - systemcfg->physicalMemorySize = lmb_phys_mem_size(); -#endif lmb_reserve(0, __pa(klimit)); DBG("Phys. mem: %lx\n", lmb_phys_mem_size()); @@ -1908,7 +1898,7 @@ /* We don't support that function on PowerMac, at least * not yet */ - if (systemcfg->platform == PLATFORM_POWERMAC) + if (_machine == PLATFORM_POWERMAC) return -ENODEV; /* fix up new node's linux_phandle field */ Index: linux-work/arch/powerpc/kernel/rtas.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/rtas.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/kernel/rtas.c 2005-11-09 12:00:55.000000000 +1100 @@ -29,9 +29,6 @@ #include #include #include -#ifdef CONFIG_PPC64 -#include -#endif struct rtas_t rtas = { .lock = SPIN_LOCK_UNLOCKED @@ -671,7 +668,7 @@ * the stop-self token if any */ #ifdef CONFIG_PPC64 - if (systemcfg->platform == PLATFORM_PSERIES_LPAR) + if (_machine == PLATFORM_PSERIES_LPAR) rtas_region = min(lmb.rmo_size, RTAS_INSTANTIATE_MAX); #endif rtas_rmo_buf = lmb_alloc_base(RTAS_RMOBUF_MAX, PAGE_SIZE, rtas_region); Index: 
linux-work/arch/powerpc/kernel/setup-common.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/setup-common.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/kernel/setup-common.c 2005-11-09 12:00:55.000000000 +1100 @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -510,8 +511,8 @@ * On pSeries LPAR, we need to know how many cpus * could possibly be added to this partition. */ - if (systemcfg->platform == PLATFORM_PSERIES_LPAR && - (dn = of_find_node_by_path("/rtas"))) { + if (_machine == PLATFORM_PSERIES_LPAR && + (dn = of_find_node_by_path("/rtas"))) { int num_addr_cell, num_size_cell, maxcpus; unsigned int *ireg; @@ -555,7 +556,7 @@ cpu_set(cpu ^ 0x1, cpu_sibling_map[cpu]); } - systemcfg->processorCount = num_present_cpus(); + _systemcfg->processorCount = num_present_cpus(); #endif /* CONFIG_PPC64 */ } #endif /* CONFIG_SMP */ Index: linux-work/arch/powerpc/kernel/setup_64.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/setup_64.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/kernel/setup_64.c 2005-11-09 13:25:41.000000000 +1100 @@ -108,6 +108,8 @@ int boot_cpuid_phys = 0; dev_t boot_dev; u64 ppc64_pft_size; +int _machine; +EXPORT_SYMBOL(_machine); struct ppc64_caches ppc64_caches; EXPORT_SYMBOL_GPL(ppc64_caches); @@ -254,11 +256,10 @@ * Iterate all ppc_md structures until we find the proper * one for the current machine type */ - DBG("Probing machine type for platform %x...\n", - systemcfg->platform); + DBG("Probing machine type for platform %x...\n", _machine); for (mach = machines; *mach; mach++) { - if ((*mach)->probe(systemcfg->platform)) + if ((*mach)->probe(_machine)) break; } /* What can we do if we didn't find ? 
*/ @@ -315,7 +316,8 @@ #endif /* CONFIG_SMP || CONFIG_KEXEC */ /* - * Initialize some remaining members of the ppc64_caches and systemcfg structures + * Initialize some remaining members of the ppc64_caches and systemcfg + * structures * (at least until we get rid of them completely). This is mostly some * cache informations about the CPU that will be used by cache flush * routines and/or provided to userland @@ -340,7 +342,7 @@ const char *dc, *ic; /* Then read cache informations */ - if (systemcfg->platform == PLATFORM_POWERMAC) { + if (_machine == PLATFORM_POWERMAC) { dc = "d-cache-block-size"; ic = "i-cache-block-size"; } else { @@ -360,8 +362,8 @@ DBG("Argh, can't find dcache properties ! " "sizep: %p, lsizep: %p\n", sizep, lsizep); - systemcfg->dcache_size = ppc64_caches.dsize = size; - systemcfg->dcache_line_size = + _systemcfg->dcache_size = ppc64_caches.dsize = size; + _systemcfg->dcache_line_size = ppc64_caches.dline_size = lsize; ppc64_caches.log_dline_size = __ilog2(lsize); ppc64_caches.dlines_per_page = PAGE_SIZE / lsize; @@ -378,8 +380,8 @@ DBG("Argh, can't find icache properties ! 
" "sizep: %p, lsizep: %p\n", sizep, lsizep); - systemcfg->icache_size = ppc64_caches.isize = size; - systemcfg->icache_line_size = + _systemcfg->icache_size = ppc64_caches.isize = size; + _systemcfg->icache_line_size = ppc64_caches.iline_size = lsize; ppc64_caches.log_iline_size = __ilog2(lsize); ppc64_caches.ilines_per_page = PAGE_SIZE / lsize; @@ -387,10 +389,12 @@ } /* Add an eye catcher and the systemcfg layout version number */ - strcpy(systemcfg->eye_catcher, "SYSTEMCFG:PPC64"); - systemcfg->version.major = SYSTEMCFG_MAJOR; - systemcfg->version.minor = SYSTEMCFG_MINOR; - systemcfg->processor = mfspr(SPRN_PVR); + strcpy(_systemcfg->eye_catcher, "SYSTEMCFG:PPC64"); + _systemcfg->version.major = SYSTEMCFG_MAJOR; + _systemcfg->version.minor = SYSTEMCFG_MINOR; + _systemcfg->processor = mfspr(SPRN_PVR); + _systemcfg->platform = _machine; + _systemcfg->physicalMemorySize = lmb_phys_mem_size(); DBG(" <- initialize_cache_info()\n"); } @@ -479,10 +483,10 @@ printk("-----------------------------------------------------\n"); printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); printk("ppc64_interrupt_controller = 0x%ld\n", ppc64_interrupt_controller); - printk("systemcfg = 0x%p\n", systemcfg); - printk("systemcfg->platform = 0x%x\n", systemcfg->platform); - printk("systemcfg->processorCount = 0x%lx\n", systemcfg->processorCount); - printk("systemcfg->physicalMemorySize = 0x%lx\n", systemcfg->physicalMemorySize); + printk("systemcfg = 0x%p\n", _systemcfg); + printk("systemcfg->platform = 0x%x\n", _systemcfg->platform); + printk("systemcfg->processorCount = 0x%lx\n", _systemcfg->processorCount); + printk("systemcfg->physicalMemorySize = 0x%lx\n", _systemcfg->physicalMemorySize); printk("ppc64_caches.dcache_line_size = 0x%x\n", ppc64_caches.dline_size); printk("ppc64_caches.icache_line_size = 0x%x\n", @@ -564,12 +568,12 @@ for (i = 0; i < __NR_syscalls; i++) { if (sys_call_table[i*2] != sys_ni_syscall) { count64++; - systemcfg->syscall_map_64[i >> 5] |= + 
_systemcfg->syscall_map_64[i >> 5] |= 0x80000000UL >> (i & 0x1f); } if (sys_call_table[i*2+1] != sys_ni_syscall) { count32++; - systemcfg->syscall_map_32[i >> 5] |= + _systemcfg->syscall_map_32[i >> 5] |= 0x80000000UL >> (i & 0x1f); } } Index: linux-work/arch/powerpc/kernel/smp.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/smp.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/kernel/smp.c 2005-11-09 12:00:55.000000000 +1100 @@ -44,6 +44,7 @@ #include #include #include +#include #ifdef CONFIG_PPC64 #include #endif @@ -368,7 +369,9 @@ if (cpu == boot_cpuid) return -EBUSY; - systemcfg->processorCount--; +#ifdef CONFIG_PPC64 + _systemcfg->processorCount--; +#endif cpu_clear(cpu, cpu_online_map); fixup_irqs(cpu_online_map); return 0; Index: linux-work/arch/powerpc/kernel/sys_ppc32.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/sys_ppc32.c 2005-11-01 14:13:52.000000000 +1100 +++ linux-work/arch/powerpc/kernel/sys_ppc32.c 2005-11-09 12:00:55.000000000 +1100 @@ -52,7 +52,6 @@ #include #include #include -#include #include /* readdir & getdents */ Index: linux-work/arch/powerpc/kernel/time.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/time.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/kernel/time.c 2005-11-09 12:00:55.000000000 +1100 @@ -271,13 +271,13 @@ * tb_to_xs and stamp_xsec values are consistent. If not, then it * loops back and reads them again until this criteria is met. 
*/ - ++(systemcfg->tb_update_count); + ++(_systemcfg->tb_update_count); smp_wmb(); - systemcfg->tb_orig_stamp = new_tb_stamp; - systemcfg->stamp_xsec = new_stamp_xsec; - systemcfg->tb_to_xs = new_tb_to_xs; + _systemcfg->tb_orig_stamp = new_tb_stamp; + _systemcfg->stamp_xsec = new_stamp_xsec; + _systemcfg->tb_to_xs = new_tb_to_xs; smp_wmb(); - ++(systemcfg->tb_update_count); + ++(_systemcfg->tb_update_count); #endif } @@ -357,8 +357,9 @@ do_gtod.tb_ticks_per_sec = tb_ticks_per_sec; tb_to_xs = divres.result_low; do_gtod.varp->tb_to_xs = tb_to_xs; - systemcfg->tb_ticks_per_sec = tb_ticks_per_sec; - systemcfg->tb_to_xs = tb_to_xs; + _systemcfg->tb_ticks_per_sec = + tb_ticks_per_sec; + _systemcfg->tb_to_xs = tb_to_xs; } else { printk( "Titan recalibrate: FAILED (difference > 4 percent)\n" @@ -559,8 +560,8 @@ update_gtod(tb_last_jiffy, new_xsec, do_gtod.varp->tb_to_xs); #ifdef CONFIG_PPC64 - systemcfg->tz_minuteswest = sys_tz.tz_minuteswest; - systemcfg->tz_dsttime = sys_tz.tz_dsttime; + _systemcfg->tz_minuteswest = sys_tz.tz_minuteswest; + _systemcfg->tz_dsttime = sys_tz.tz_dsttime; #endif write_sequnlock_irqrestore(&xtime_lock, flags); @@ -711,11 +712,11 @@ do_gtod.varp->tb_to_xs = tb_to_xs; do_gtod.tb_to_us = tb_to_us; #ifdef CONFIG_PPC64 - systemcfg->tb_orig_stamp = tb_last_jiffy; - systemcfg->tb_update_count = 0; - systemcfg->tb_ticks_per_sec = tb_ticks_per_sec; - systemcfg->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC; - systemcfg->tb_to_xs = tb_to_xs; + _systemcfg->tb_orig_stamp = tb_last_jiffy; + _systemcfg->tb_update_count = 0; + _systemcfg->tb_ticks_per_sec = tb_ticks_per_sec; + _systemcfg->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC; + _systemcfg->tb_to_xs = tb_to_xs; #endif time_freq = 0; Index: linux-work/arch/ppc64/kernel/asm-offsets.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/asm-offsets.c 2005-11-08 11:00:17.000000000 +1100 +++ linux-work/arch/ppc64/kernel/asm-offsets.c 2005-11-09 12:00:55.000000000 
+1100 @@ -74,7 +74,6 @@ DEFINE(ICACHEL1LINESIZE, offsetof(struct ppc64_caches, iline_size)); DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_iline_size)); DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, ilines_per_page)); - DEFINE(PLATFORM, offsetof(struct systemcfg, platform)); DEFINE(PLATFORM_LPAR, PLATFORM_LPAR); /* paca */ Index: linux-work/arch/ppc64/kernel/eeh.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/eeh.c 2005-11-01 14:13:53.000000000 +1100 +++ linux-work/arch/ppc64/kernel/eeh.c 2005-11-09 12:00:55.000000000 +1100 @@ -32,7 +32,6 @@ #include #include #include -#include #include #undef DEBUG @@ -932,11 +931,12 @@ { struct proc_dir_entry *e; - if (systemcfg->platform & PLATFORM_PSERIES) { - e = create_proc_entry("ppc64/eeh", 0, NULL); - if (e) - e->proc_fops = &proc_eeh_operations; - } + if (_machine != PLATFORM_PSERIES && _machine != PLATFORM_PSERIES_LPAR) + return 0; + + e = create_proc_entry("ppc64/eeh", 0, NULL); + if (e) + e->proc_fops = &proc_eeh_operations; return 0; } Index: linux-work/arch/ppc64/kernel/head.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/head.S 2005-11-08 11:00:17.000000000 +1100 +++ linux-work/arch/ppc64/kernel/head.S 2005-11-09 12:00:55.000000000 +1100 @@ -28,7 +28,6 @@ #include #include #include -#include #include #include #include @@ -1735,9 +1734,7 @@ sc /* HvCall_setASR */ #else /* set the ASR */ - ld r3,systemcfg@got(r2) /* r3 = ptr to systemcfg */ - ld r3,0(r3) - lwz r3,PLATFORM(r3) /* r3 = platform flags */ + ld r3,_machine@got(r2) /* r3 = machine type */ andi.
r3,r3,PLATFORM_LPAR /* Test if bit 0 is set (LPAR bit) */ beq 98f /* branch if result is 0 */ mfspr r3,SPRN_PVR @@ -1899,9 +1896,7 @@ /* set the ASR */ ld r3,PACASTABREAL(r13) ori r4,r3,1 /* turn on valid bit */ - ld r3,systemcfg@got(r2) /* r3 = ptr to systemcfg */ - ld r3,0(r3) - lwz r3,PLATFORM(r3) /* r3 = platform flags */ + ld r3,_machine@got(r2) /* r3 = machine type */ andi. r3,r3,PLATFORM_LPAR /* Test if bit 0 is set (LPAR bit) */ beq 98f /* branch if result is 0 */ mfspr r3,SPRN_PVR @@ -1919,9 +1914,7 @@ mtasr r4 /* set the stab location */ 99: /* Set SDR1 (hash table pointer) */ - ld r3,systemcfg@got(r2) /* r3 = ptr to systemcfg */ - ld r3,0(r3) - lwz r3,PLATFORM(r3) /* r3 = platform flags */ + ld r3,_machine@got(r2) /* r3 = machine type */ /* Test if bit 0 is set (LPAR bit) */ andi. r3,r3,PLATFORM_LPAR bne 98f /* branch if result is !0 */ Index: linux-work/arch/ppc64/kernel/idle.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/idle.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/ppc64/kernel/idle.c 2005-11-09 12:00:55.000000000 +1100 @@ -26,7 +26,6 @@ #include #include #include -#include #include #include Index: linux-work/arch/ppc64/kernel/lparcfg.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/lparcfg.c 2005-11-08 11:00:17.000000000 +1100 +++ linux-work/arch/ppc64/kernel/lparcfg.c 2005-11-09 12:00:55.000000000 +1100 @@ -35,6 +35,7 @@ #include #include #include +#include #define MODULE_VERS "1.6" #define MODULE_NAME "lparcfg" @@ -371,7 +372,7 @@ lrdrp = (int *)get_property(rtas_node, "ibm,lrdr-capacity", NULL); if (lrdrp == NULL) { - partition_potential_processors = systemcfg->processorCount; + partition_potential_processors = _systemcfg->processorCount; } else { partition_potential_processors = *(lrdrp + 4); } Index: linux-work/arch/ppc64/kernel/nvram.c
=================================================================== --- linux-work.orig/arch/ppc64/kernel/nvram.c 2005-10-26 12:43:38.000000000 +1000 +++ linux-work/arch/ppc64/kernel/nvram.c 2005-11-09 12:00:55.000000000 +1100 @@ -31,7 +31,6 @@ #include #include #include -#include #undef DEBUG_NVRAM @@ -167,7 +166,7 @@ case IOC_NVRAM_GET_OFFSET: { int part, offset; - if (systemcfg->platform != PLATFORM_POWERMAC) + if (_machine != PLATFORM_POWERMAC) return -EINVAL; if (copy_from_user(&part, (void __user*)arg, sizeof(part)) != 0) return -EFAULT; @@ -450,7 +449,7 @@ * in our nvram, as Apple defined partitions use pretty much * all of the space */ - if (systemcfg->platform == PLATFORM_POWERMAC) + if (_machine == PLATFORM_POWERMAC) return -ENOSPC; /* see if we have an OS partition that meets our needs. Index: linux-work/arch/ppc64/kernel/prom.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/prom.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/ppc64/kernel/prom.c 2005-11-09 12:00:55.000000000 +1100 @@ -318,7 +318,7 @@ } /* We offset irq numbers for the u3 MPIC by 128 in PowerMac */ - if (systemcfg->platform == PLATFORM_POWERMAC && ic && ic->parent) { + if (_machine == PLATFORM_POWERMAC && ic && ic->parent) { char *name = get_property(ic->parent, "name", NULL); if (name && !strcmp(name, "u3")) np->intrs[intrcount].line += 128; @@ -1065,7 +1065,7 @@ prop = (u32 *)of_get_flat_dt_prop(node, "linux,platform", NULL); if (prop == NULL) return 0; - systemcfg->platform = *prop; + _machine = *prop; /* check if iommu is forced on or off */ if (of_get_flat_dt_prop(node, "linux,iommu-off", NULL) != NULL) @@ -1230,11 +1230,8 @@ of_scan_flat_dt(early_init_dt_scan_memory, NULL); lmb_enforce_memory_limit(memory_limit); lmb_analyze(); - systemcfg->physicalMemorySize = lmb_phys_mem_size(); lmb_reserve(0, __pa(klimit)); - DBG("Phys. 
mem: %lx\n", systemcfg->physicalMemorySize); - /* Reserve LMB regions used by kernel, initrd, dt, etc... */ early_reserve_mem(); @@ -1753,7 +1750,7 @@ /* We don't support that function on PowerMac, at least * not yet */ - if (systemcfg->platform == PLATFORM_POWERMAC) + if (_machine == PLATFORM_POWERMAC) return -ENODEV; /* fix up new node's linux_phandle field */ Index: linux-work/include/asm-powerpc/firmware.h =================================================================== --- linux-work.orig/include/asm-powerpc/firmware.h 2005-11-01 14:13:56.000000000 +1100 +++ linux-work/include/asm-powerpc/firmware.h 2005-11-09 13:26:19.000000000 +1100 @@ -43,6 +43,7 @@ #define FW_FEATURE_ISERIES (1UL<<21) enum { +#ifdef CONFIG_PPC64 FW_FEATURE_PSERIES_POSSIBLE = FW_FEATURE_PFT | FW_FEATURE_TCE | FW_FEATURE_SPRG0 | FW_FEATURE_DABR | FW_FEATURE_COPY | FW_FEATURE_ASR | FW_FEATURE_DEBUG | FW_FEATURE_TERM | @@ -70,6 +71,11 @@ FW_FEATURE_ISERIES_ALWAYS & #endif FW_FEATURE_POSSIBLE, + +#else /* CONFIG_PPC64 */ + FW_FEATURE_POSSIBLE = 0, + FW_FEATURE_ALWAYS = 0, +#endif }; /* This is used to identify firmware features which are available Index: linux-work/include/asm-powerpc/processor.h =================================================================== --- linux-work.orig/include/asm-powerpc/processor.h 2005-11-07 10:31:41.000000000 +1100 +++ linux-work/include/asm-powerpc/processor.h 2005-11-09 14:02:41.000000000 +1100 @@ -17,65 +17,71 @@ #include #include #include -#ifdef CONFIG_PPC64 -#include -#endif -#ifdef CONFIG_PPC32 -/* 32-bit platform types */ -/* We only need to define a new _MACH_xxx for machines which are part of - * a configuration which supports more than one type of different machine. - * This is currently limited to CONFIG_PPC_MULTIPLATFORM and CHRP/PReP/PMac. - * -- Tom +/* We do _not_ want to define new machine types at all, those must die + * in favor of using the device-tree + * -- BenH. 
*/ -#define _MACH_prep 0x00000001 -#define _MACH_Pmac 0x00000002 /* pmac or pmac clone (non-chrp) */ -#define _MACH_chrp 0x00000004 /* chrp machine */ -/* see residual.h for these */ +/* Platforms codes (to be obsoleted) */ +#define PLATFORM_PSERIES 0x0100 +#define PLATFORM_PSERIES_LPAR 0x0101 +#define PLATFORM_ISERIES_LPAR 0x0201 +#define PLATFORM_LPAR 0x0001 +#define PLATFORM_POWERMAC 0x0400 +#define PLATFORM_MAPLE 0x0500 +#define PLATFORM_PREP 0x0600 +#define PLATFORM_CHRP 0x0700 +#define PLATFORM_CELL 0x1000 + +/* Compat platform codes for 32 bits */ +#define _MACH_prep PLATFORM_PREP +#define _MACH_Pmac PLATFORM_POWERMAC +#define _MACH_chrp PLATFORM_CHRP + +/* PREP sub-platform types see residual.h for these */ #define _PREP_Motorola 0x01 /* motorola prep */ #define _PREP_Firm 0x02 /* firmworks prep */ #define _PREP_IBM 0x00 /* ibm prep */ #define _PREP_Bull 0x03 /* bull prep */ -/* these are arbitrary */ +/* CHRP sub-platform types. These are arbitrary */ #define _CHRP_Motorola 0x04 /* motorola chrp, the cobra */ #define _CHRP_IBM 0x05 /* IBM chrp, the longtrail and longtrail 2 */ #define _CHRP_Pegasos 0x06 /* Genesi/bplan's Pegasos and Pegasos2 */ +#define platform_is_pseries() (_machine == PLATFORM_PSERIES || \ + _machine == PLATFORM_PSERIES_LPAR) +#define platform_is_lpar() (!!(_machine & PLATFORM_LPAR)) + #ifdef CONFIG_PPC_MULTIPLATFORM extern int _machine; +#ifdef CONFIG_PPC32 + /* what kind of prep workstation we are */ extern int _prep_type; extern int _chrp_type; /* * This is used to identify the board type from a given PReP board - * vendor. Board revision is also made available. + * vendor. Board revision is also made available. 
This will be moved + * elsewhere soon */ extern unsigned char ucSystemType; extern unsigned char ucBoardRev; extern unsigned char ucBoardRevMaj, ucBoardRevMin; -#else + +#endif /* CONFIG_PPC32 */ + +#else /* CONFIG_PPC_MULTIPLATFORM */ #define _machine 0 +#define platform_is_pseries() (0) +#define platform_is_lpar() (0) #endif /* CONFIG_PPC_MULTIPLATFORM */ -#endif /* CONFIG_PPC32 */ -#ifdef CONFIG_PPC64 -/* Platforms supported by PPC64 */ -#define PLATFORM_PSERIES 0x0100 -#define PLATFORM_PSERIES_LPAR 0x0101 -#define PLATFORM_ISERIES_LPAR 0x0201 -#define PLATFORM_LPAR 0x0001 -#define PLATFORM_POWERMAC 0x0400 -#define PLATFORM_MAPLE 0x0500 -#define PLATFORM_CELL 0x1000 -/* Compatibility with drivers coming from PPC32 world */ -#define _machine (systemcfg->platform) -#define _MACH_Pmac PLATFORM_POWERMAC -#endif + /* * Default implementation of macro that returns current Index: linux-work/include/asm-ppc64/systemcfg.h =================================================================== --- linux-work.orig/include/asm-ppc64/systemcfg.h 2005-09-23 12:44:12.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,64 +0,0 @@ -#ifndef _SYSTEMCFG_H -#define _SYSTEMCFG_H - -/* - * Copyright (C) 2002 Peter Bergner , IBM - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -/* Change Activity: - * 2002/09/30 : bergner : Created - * End Change Activity - */ - -/* - * If the major version changes we are incompatible. - * Minor version changes are a hint. 
- */ -#define SYSTEMCFG_MAJOR 1 -#define SYSTEMCFG_MINOR 1 - -#ifndef __ASSEMBLY__ - -#include - -#define SYSCALL_MAP_SIZE ((__NR_syscalls + 31) / 32) - -struct systemcfg { - __u8 eye_catcher[16]; /* Eyecatcher: SYSTEMCFG:PPC64 0x00 */ - struct { /* Systemcfg version numbers */ - __u32 major; /* Major number 0x10 */ - __u32 minor; /* Minor number 0x14 */ - } version; - - __u32 platform; /* Platform flags 0x18 */ - __u32 processor; /* Processor type 0x1C */ - __u64 processorCount; /* # of physical processors 0x20 */ - __u64 physicalMemorySize; /* Size of real memory(B) 0x28 */ - __u64 tb_orig_stamp; /* Timebase at boot 0x30 */ - __u64 tb_ticks_per_sec; /* Timebase tics / sec 0x38 */ - __u64 tb_to_xs; /* Inverse of TB to 2^20 0x40 */ - __u64 stamp_xsec; /* 0x48 */ - __u64 tb_update_count; /* Timebase atomicity ctr 0x50 */ - __u32 tz_minuteswest; /* Minutes west of Greenwich 0x58 */ - __u32 tz_dsttime; /* Type of dst correction 0x5C */ - /* next four are no longer used except to be exported to /proc */ - __u32 dcache_size; /* L1 d-cache size 0x60 */ - __u32 dcache_line_size; /* L1 d-cache line size 0x64 */ - __u32 icache_size; /* L1 i-cache size 0x68 */ - __u32 icache_line_size; /* L1 i-cache line size 0x6C */ - __u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of available syscalls 0x70 */ - __u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of available syscalls */ -}; - -#ifdef __KERNEL__ -extern struct systemcfg *systemcfg; -#endif - -#endif /* __ASSEMBLY__ */ - -#endif /* _SYSTEMCFG_H */ Index: linux-work/arch/powerpc/kernel/prom_init.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/prom_init.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/kernel/prom_init.c 2005-11-09 12:00:55.000000000 +1100 @@ -111,11 +111,6 @@ #define prom_debug(x...) 
#endif -#ifdef CONFIG_PPC32 -#define PLATFORM_POWERMAC _MACH_Pmac -#define PLATFORM_CHRP _MACH_chrp -#endif - typedef u32 prom_arg_t; @@ -1996,7 +1991,8 @@ /* * On pSeries, inform the firmware about our capabilities */ - if (RELOC(of_platform) & PLATFORM_PSERIES) + if (RELOC(of_platform) == PLATFORM_PSERIES || + RELOC(of_platform) == PLATFORM_PSERIES_LPAR) prom_send_capabilities(); #endif Index: linux-work/arch/powerpc/kernel/rtas-proc.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/rtas-proc.c 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/powerpc/kernel/rtas-proc.c 2005-11-09 13:56:02.000000000 +1100 @@ -259,7 +259,7 @@ { struct proc_dir_entry *entry; - if (!(systemcfg->platform & PLATFORM_PSERIES)) + if (_machine != PLATFORM_PSERIES && _machine != PLATFORM_PSERIES_LPAR) return 1; rtas_node = of_find_node_by_name(NULL, "rtas"); Index: linux-work/arch/powerpc/kernel/traps.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/traps.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/kernel/traps.c 2005-11-09 12:00:55.000000000 +1100 @@ -129,7 +129,7 @@ nl = 1; #endif #ifdef CONFIG_PPC64 - switch (systemcfg->platform) { + switch (_machine) { case PLATFORM_PSERIES: printk("PSERIES "); nl = 1; Index: linux-work/arch/powerpc/mm/hash_utils_64.c =================================================================== --- linux-work.orig/arch/powerpc/mm/hash_utils_64.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/mm/hash_utils_64.c 2005-11-09 12:00:55.000000000 +1100 @@ -165,7 +165,7 @@ * normal insert callback here. 
*/ #ifdef CONFIG_PPC_ISERIES - if (systemcfg->platform == PLATFORM_ISERIES_LPAR) + if (_machine == PLATFORM_ISERIES_LPAR) ret = iSeries_hpte_insert(hpteg, va, virt_to_abs(paddr), tmp_mode, @@ -174,7 +174,7 @@ else #endif #ifdef CONFIG_PPC_PSERIES - if (systemcfg->platform & PLATFORM_LPAR) + if (_machine & PLATFORM_LPAR) ret = pSeries_lpar_hpte_insert(hpteg, va, virt_to_abs(paddr), tmp_mode, @@ -293,7 +293,7 @@ * Not in the device-tree, let's fallback on known size * list for 16M capable GP & GR */ - if ((systemcfg->platform != PLATFORM_ISERIES_LPAR) && + if ((_machine != PLATFORM_ISERIES_LPAR) && cpu_has_feature(CPU_FTR_16M_PAGE)) memcpy(mmu_psize_defs, mmu_psize_defaults_gp, sizeof(mmu_psize_defaults_gp)); @@ -364,7 +364,7 @@ static unsigned long __init htab_get_table_size(void) { - unsigned long rnd_mem_size, pteg_count; + unsigned long mem_size, rnd_mem_size, pteg_count; /* If hash size isn't already provided by the platform, we try to * retreive it from the device-tree. If it's not there neither, we @@ -376,8 +376,9 @@ return 1UL << ppc64_pft_size; /* round mem_size up to next power of 2 */ - rnd_mem_size = 1UL << __ilog2(systemcfg->physicalMemorySize); - if (rnd_mem_size < systemcfg->physicalMemorySize) + mem_size = lmb_phys_mem_size(); + rnd_mem_size = 1UL << __ilog2(mem_size); + if (rnd_mem_size < mem_size) rnd_mem_size <<= 1; /* # pages / 2 */ @@ -410,7 +411,7 @@ htab_hash_mask = pteg_count - 1; - if (systemcfg->platform & PLATFORM_LPAR) { + if (_machine & PLATFORM_LPAR) { /* Using a hypervisor which owns the htab */ htab_address = NULL; _SDR1 = 0; Index: linux-work/arch/powerpc/oprofile/op_model_power4.c =================================================================== --- linux-work.orig/arch/powerpc/oprofile/op_model_power4.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/oprofile/op_model_power4.c 2005-11-09 12:00:55.000000000 +1100 @@ -233,8 +233,7 @@ mmcra = mfspr(SPRN_MMCRA); /* Were we in the hypervisor? 
*/ - if ((systemcfg->platform == PLATFORM_PSERIES_LPAR) && - (mmcra & MMCRA_SIHV)) + if (platform_is_lpar() && (mmcra & MMCRA_SIHV)) /* function descriptor madness */ return *((unsigned long *)hypervisor_bucket); Index: linux-work/arch/powerpc/platforms/pseries/iommu.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/pseries/iommu.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/platforms/pseries/iommu.c 2005-11-09 12:00:55.000000000 +1100 @@ -42,7 +42,6 @@ #include #include #include -#include #include #include #include @@ -582,7 +581,7 @@ return; } - if (systemcfg->platform & PLATFORM_LPAR) { + if (platform_is_lpar()) { if (firmware_has_feature(FW_FEATURE_MULTITCE)) { ppc_md.tce_build = tce_buildmulti_pSeriesLP; ppc_md.tce_free = tce_freemulti_pSeriesLP; Index: linux-work/arch/powerpc/platforms/pseries/pci.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/pseries/pci.c 2005-11-01 14:13:53.000000000 +1100 +++ linux-work/arch/powerpc/platforms/pseries/pci.c 2005-11-09 12:00:55.000000000 +1100 @@ -123,7 +123,7 @@ int i; unsigned int reg; - if (!(systemcfg->platform & PLATFORM_PSERIES)) + if (!platform_is_pseries()) return; printk("Using INTC for W82c105 IDE controller.\n"); Index: linux-work/arch/powerpc/platforms/pseries/reconfig.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/pseries/reconfig.c 2005-11-08 11:00:17.000000000 +1100 +++ linux-work/arch/powerpc/platforms/pseries/reconfig.c 2005-11-09 12:00:55.000000000 +1100 @@ -408,7 +408,7 @@ { struct proc_dir_entry *ent; - if (!(systemcfg->platform & PLATFORM_PSERIES)) + if (!platform_is_pseries()) return 0; ent = create_proc_entry("ppc64/ofdt", S_IWUSR, NULL); Index: linux-work/arch/powerpc/platforms/pseries/rtasd.c =================================================================== --- 
linux-work.orig/arch/powerpc/platforms/pseries/rtasd.c 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/powerpc/platforms/pseries/rtasd.c 2005-11-09 12:00:55.000000000 +1100 @@ -482,10 +482,12 @@ { struct proc_dir_entry *entry; - /* No RTAS, only warn if we are on a pSeries box */ + if (!platform_is_pseries()) + return 0; + + /* No RTAS */ if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) { - if (systemcfg->platform & PLATFORM_PSERIES) - printk(KERN_INFO "rtasd: no event-scan on system\n"); + printk(KERN_INFO "rtasd: no event-scan on system\n"); return 1; } Index: linux-work/arch/powerpc/platforms/pseries/setup.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/pseries/setup.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/platforms/pseries/setup.c 2005-11-09 12:00:55.000000000 +1100 @@ -249,7 +249,7 @@ ppc_md.idle_loop = default_idle; } - if (systemcfg->platform & PLATFORM_LPAR) + if (platform_is_lpar()) ppc_md.enable_pmcs = pseries_lpar_enable_pmcs; else ppc_md.enable_pmcs = power4_enable_pmcs; @@ -378,7 +378,7 @@ fw_feature_init(); - if (systemcfg->platform & PLATFORM_LPAR) + if (platform_is_lpar()) hpte_init_lpar(); else { hpte_init_native(); @@ -388,7 +388,7 @@ generic_find_legacy_serial_ports(&physport, &default_speed); - if (systemcfg->platform & PLATFORM_LPAR) + if (platform_is_lpar()) find_udbg_vterm(); else if (physport) { /* Map the uart for udbg. 
*/ @@ -592,7 +592,7 @@ static int pSeries_pci_probe_mode(struct pci_bus *bus) { - if (systemcfg->platform & PLATFORM_LPAR) + if (platform_is_lpar()) return PCI_PROBE_DEVTREE; return PCI_PROBE_NORMAL; } Index: linux-work/arch/powerpc/platforms/pseries/smp.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/pseries/smp.c 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/powerpc/platforms/pseries/smp.c 2005-11-09 12:00:55.000000000 +1100 @@ -46,6 +46,7 @@ #include #include #include +#include #include "plpar_wrappers.h" @@ -96,7 +97,7 @@ int cpu = smp_processor_id(); cpu_clear(cpu, cpu_online_map); - systemcfg->processorCount--; + _systemcfg->processorCount--; /*fix boot_cpuid here*/ if (cpu == boot_cpuid) @@ -441,7 +442,7 @@ smp_ops->cpu_die = pSeries_cpu_die; /* Processors can be added/removed only on LPAR */ - if (systemcfg->platform == PLATFORM_PSERIES_LPAR) + if (platform_is_lpar()) pSeries_reconfig_notifier_register(&pSeries_smp_nb); #endif Index: linux-work/arch/powerpc/platforms/pseries/xics.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/pseries/xics.c 2005-11-01 14:13:53.000000000 +1100 +++ linux-work/arch/powerpc/platforms/pseries/xics.c 2005-11-09 12:00:55.000000000 +1100 @@ -545,7 +545,9 @@ of_node_put(np); } - if (systemcfg->platform == PLATFORM_PSERIES) { + if (platform_is_lpar()) + ops = &pSeriesLP_ops; + else { #ifdef CONFIG_SMP for_each_cpu(i) { int hard_id; @@ -561,8 +563,6 @@ #else xics_per_cpu[0] = ioremap(intr_base, intr_size); #endif /* CONFIG_SMP */ - } else if (systemcfg->platform == PLATFORM_PSERIES_LPAR) { - ops = &pSeriesLP_ops; } xics_8259_pic.enable = i8259_pic.enable; Index: linux-work/arch/ppc64/kernel/pacaData.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pacaData.c 2005-11-08 11:00:17.000000000 +1100 +++ linux-work/arch/ppc64/kernel/pacaData.c 
2005-11-09 13:24:34.000000000 +1100 @@ -15,7 +15,7 @@ #include #include #include - +#include #include #include #include @@ -24,8 +24,7 @@ struct systemcfg data; u8 page[PAGE_SIZE]; } systemcfg_store __attribute__((__section__(".data.page.aligned"))); -struct systemcfg *systemcfg = &systemcfg_store.data; -EXPORT_SYMBOL(systemcfg); +struct systemcfg *_systemcfg = &systemcfg_store.data; /* This symbol is provided by the linker - let it fill in the paca Index: linux-work/arch/ppc64/kernel/proc_ppc64.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/proc_ppc64.c 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/ppc64/kernel/proc_ppc64.c 2005-11-09 12:00:55.000000000 +1100 @@ -53,7 +53,7 @@ if (!root) return 1; - if (!(systemcfg->platform & (PLATFORM_PSERIES | PLATFORM_CELL))) + if (!(platform_is_pseries() || _machine == PLATFORM_CELL)) return 0; if (!proc_mkdir("rtas", root)) @@ -74,7 +74,7 @@ if (!pde) return 1; pde->nlink = 1; - pde->data = systemcfg; + pde->data = _systemcfg; pde->size = PAGE_SIZE; pde->proc_fops = &page_map_fops; Index: linux-work/arch/ppc64/kernel/prom_init.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/prom_init.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/ppc64/kernel/prom_init.c 2005-11-09 12:00:55.000000000 +1100 @@ -1934,7 +1934,8 @@ /* * On pSeries, inform the firmware about our capabilities */ - if (RELOC(of_platform) & PLATFORM_PSERIES) + if (RELOC(of_platform) == PLATFORM_PSERIES || + RELOC(of_platform) == PLATFORM_PSERIES_LPAR) prom_send_capabilities(); /* Index: linux-work/arch/ppc64/kernel/vdso.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/vdso.c 2005-11-01 14:13:53.000000000 +1100 +++ linux-work/arch/ppc64/kernel/vdso.c 2005-11-09 12:00:55.000000000 +1100 @@ -34,6 +34,7 @@ #include #include #include +#include #include #undef DEBUG @@ 
-179,7 +180,7 @@ * Last page is systemcfg. */ if ((vma->vm_end - address) <= PAGE_SIZE) - pg = virt_to_page(systemcfg); + pg = virt_to_page(_systemcfg); else pg = virt_to_page(vbase + offset); @@ -604,7 +605,7 @@ get_page(pg); } - get_page(virt_to_page(systemcfg)); + get_page(virt_to_page(_systemcfg)); } int in_gate_area_no_task(unsigned long addr) Index: linux-work/arch/powerpc/kernel/ppc_ksyms.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/ppc_ksyms.c 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/powerpc/kernel/ppc_ksyms.c 2005-11-09 13:26:09.000000000 +1100 @@ -188,9 +188,6 @@ EXPORT_SYMBOL(cuda_request); EXPORT_SYMBOL(cuda_poll); #endif /* CONFIG_ADB_CUDA */ -#if defined(CONFIG_PPC_MULTIPLATFORM) && defined(CONFIG_PPC32) -EXPORT_SYMBOL(_machine); -#endif #ifdef CONFIG_PPC_PMAC EXPORT_SYMBOL(sys_ctrler); #endif Index: linux-work/arch/powerpc/kernel/setup_32.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/setup_32.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/kernel/setup_32.c 2005-11-09 13:25:32.000000000 +1100 @@ -71,6 +71,7 @@ #ifdef CONFIG_PPC_MULTIPLATFORM int _machine = 0; +EXPORT_SYMBOL(_machine); extern void prep_init(void); extern void pmac_init(void); Index: linux-work/arch/ppc/kernel/ppc_ksyms.c =================================================================== --- linux-work.orig/arch/ppc/kernel/ppc_ksyms.c 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/ppc/kernel/ppc_ksyms.c 2005-11-09 13:25:57.000000000 +1100 @@ -217,9 +217,6 @@ EXPORT_SYMBOL(cuda_request); EXPORT_SYMBOL(cuda_poll); #endif /* CONFIG_ADB_CUDA */ -#ifdef CONFIG_PPC_MULTIPLATFORM -EXPORT_SYMBOL(_machine); -#endif #ifdef CONFIG_PPC_PMAC EXPORT_SYMBOL(sys_ctrler); EXPORT_SYMBOL(pmac_newworld); Index: linux-work/arch/ppc/kernel/setup.c =================================================================== --- 
linux-work.orig/arch/ppc/kernel/setup.c 2005-11-01 14:13:53.000000000 +1100 +++ linux-work/arch/ppc/kernel/setup.c 2005-11-09 13:25:15.000000000 +1100 @@ -76,6 +76,7 @@ #ifdef CONFIG_PPC_MULTIPLATFORM int _machine = 0; +EXPORT_SYMBOL(_machine); extern void prep_init(unsigned long r3, unsigned long r4, unsigned long r5, unsigned long r6, unsigned long r7); Index: linux-work/arch/ppc64/kernel/pci.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pci.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/ppc64/kernel/pci.c 2005-11-09 13:04:41.000000000 +1100 @@ -1277,12 +1277,9 @@ * G5 machines... So when something asks for bus 0 io base * (bus 0 is HT root), we return the AGP one instead. */ -#ifdef CONFIG_PPC_PMAC - if (systemcfg->platform == PLATFORM_POWERMAC && - machine_is_compatible("MacRISC4")) + if (machine_is_compatible("MacRISC4")) if (in_bus == 0) in_bus = 0xf0; -#endif /* CONFIG_PPC_PMAC */ /* That syscall isn't quite compatible with PCI domains, but it's * used on pre-domains setup. We return the first match Index: linux-work/include/asm-powerpc/systemcfg.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/include/asm-powerpc/systemcfg.h 2005-11-09 14:03:02.000000000 +1100 @@ -0,0 +1,64 @@ +#ifndef _SYSTEMCFG_H +#define _SYSTEMCFG_H + +/* + * Copyright (C) 2002 Peter Bergner , IBM + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +/* Change Activity: + * 2002/09/30 : bergner : Created + * End Change Activity + */ + +/* + * If the major version changes we are incompatible. + * Minor version changes are a hint. 
+ */ +#define SYSTEMCFG_MAJOR 1 +#define SYSTEMCFG_MINOR 1 + +#ifndef __ASSEMBLY__ + +#include + +#define SYSCALL_MAP_SIZE ((__NR_syscalls + 31) / 32) + +struct systemcfg { + __u8 eye_catcher[16]; /* Eyecatcher: SYSTEMCFG:PPC64 0x00 */ + struct { /* Systemcfg version numbers */ + __u32 major; /* Major number 0x10 */ + __u32 minor; /* Minor number 0x14 */ + } version; + + __u32 platform; /* Platform flags 0x18 */ + __u32 processor; /* Processor type 0x1C */ + __u64 processorCount; /* # of physical processors 0x20 */ + __u64 physicalMemorySize; /* Size of real memory(B) 0x28 */ + __u64 tb_orig_stamp; /* Timebase at boot 0x30 */ + __u64 tb_ticks_per_sec; /* Timebase tics / sec 0x38 */ + __u64 tb_to_xs; /* Inverse of TB to 2^20 0x40 */ + __u64 stamp_xsec; /* 0x48 */ + __u64 tb_update_count; /* Timebase atomicity ctr 0x50 */ + __u32 tz_minuteswest; /* Minutes west of Greenwich 0x58 */ + __u32 tz_dsttime; /* Type of dst correction 0x5C */ + /* next four are no longer used except to be exported to /proc */ + __u32 dcache_size; /* L1 d-cache size 0x60 */ + __u32 dcache_line_size; /* L1 d-cache line size 0x64 */ + __u32 icache_size; /* L1 i-cache size 0x68 */ + __u32 icache_line_size; /* L1 i-cache line size 0x6C */ + __u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of available syscalls 0x70 */ + __u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of available syscalls */ +}; + +#ifdef __KERNEL__ +extern struct systemcfg *_systemcfg; /* to be renamed */ +#endif + +#endif /* __ASSEMBLY__ */ + +#endif /* _SYSTEMCFG_H */ From michael at ellerman.id.au Wed Nov 9 18:06:21 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Wed, 9 Nov 2005 18:06:21 +1100 (EST) Subject: [PATCH] powerpc: Merge page.h Message-ID: <20051109070621.2B5A7686C9@ozlabs.org> Merge asm-ppc/page.h and asm-ppc64/page.h, redone from scratch after the 64k pages patch went in. Built for PPC (common_defconfig), with ARCH=ppc/powerpc. 
Built and booted on P5 LPAR for PPC64 with ARCH=ppc/powerpc (pseries_defconfig). Signed-off-by: Michael Ellerman --- include/asm-powerpc/page.h | 433 +++++++++++++++++++++++++++++++++++++++++++++ include/asm-ppc/page.h | 173 ----------------- include/asm-ppc64/page.h | 333 ---------------------------------- 3 files changed, 433 insertions(+), 506 deletions(-) Index: kexec/include/asm-powerpc/page.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/page.h @@ -0,0 +1,433 @@ +#ifndef _ASM_POWERPC_PAGE_H +#define _ASM_POWERPC_PAGE_H + +/* + * Copyright (C) 2001,2005 IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + + +#ifdef __KERNEL__ +#include +#include /* for ASM_CONST */ + +#ifdef __powerpc64__ +#define VMALLOCBASE ASM_CONST(0xD000000000000000) + +#define REGION_SIZE 4UL +#define REGION_SHIFT 60UL +#define REGION_MASK (((1UL<<REGION_SIZE)-1UL)<<REGION_SHIFT) +#define VMALLOC_REGION_ID (VMALLOCBASE >> REGION_SHIFT) +#define KERNEL_REGION_ID (KERNELBASE >> REGION_SHIFT) +#define USER_REGION_ID (0UL) +#define REGION_ID(ea) (((unsigned long)(ea)) >> REGION_SHIFT) + +#define PAGE_OFFSET ASM_CONST(0xC000000000000000) +#else +#define PAGE_OFFSET ASM_CONST(CONFIG_KERNEL_START) +#endif + +#define KERNELBASE PAGE_OFFSET + +/* + * We support either 4k or 64k software page size. When using 64k pages + * however, wether we are really supporting 64k pages in HW or not is + * irrelevant to those definitions. We always define HW_PAGE_SHIFT to 12 + * as use of 64k pages remains a linux kernel specific, every notion of + * page number shared with the firmware, TCEs, iommu, etc... still assumes + * a page size of 4096.
+ */ +#ifdef CONFIG_PPC_64K_PAGES +#define PAGE_SHIFT 16 +#else +#define PAGE_SHIFT 12 +#endif + +#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) + +/* + * Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So on PPC32 + * if we assign PAGE_MASK to a long long it gets extended the way want + * (i.e. with 1s in the high bits) + */ +#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) + +#ifdef __powerpc64__ +/* HW_PAGE_SHIFT is always 4k pages */ +#define HW_PAGE_SHIFT 12 +#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) +#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) + +/* PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and + * HW_PAGE_SHIFT, that is 4k pages + */ +#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) + +/* Segment size */ +#define SID_SHIFT 28 +#define SID_MASK 0xfffffffffUL +#define ESID_MASK 0xfffffffff0000000UL +#define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) + +/* Large pages size */ + +#ifndef __ASSEMBLY__ +extern unsigned int HPAGE_SHIFT; +#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) +#define HPAGE_MASK (~(HPAGE_SIZE - 1)) +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) +#endif /* __ASSEMBLY__ */ + +#ifdef CONFIG_HUGETLB_PAGE + + +#define HTLB_AREA_SHIFT 40 +#define HTLB_AREA_SIZE (1UL << HTLB_AREA_SHIFT) +#define GET_HTLB_AREA(x) ((x) >> HTLB_AREA_SHIFT) + +#define LOW_ESID_MASK(addr, len) \ + (((1U << (GET_ESID(addr + len - 1) + 1)) \ + - (1U << GET_ESID(addr))) & 0xffff) + +#define HTLB_AREA_MASK(addr, len) \ + (((1U << (GET_HTLB_AREA(addr + len - 1) + 1)) \ + - (1U << GET_HTLB_AREA(addr))) & 0xffff) + +#define ARCH_HAS_HUGEPAGE_ONLY_RANGE +#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE +#define ARCH_HAS_SETCLEAR_HUGE_PTE + +#define touches_hugepage_low_range(mm, addr, len) \ + (LOW_ESID_MASK((addr), (len)) & (mm)->context.low_htlb_areas) +#define touches_hugepage_high_range(mm, addr, len) \ + (HTLB_AREA_MASK((addr), (len)) & (mm)->context.high_htlb_areas) + +#define __within_hugepage_low_range(addr, len, segmask) \ + 
((LOW_ESID_MASK((addr), (len)) | (segmask)) == (segmask)) +#define within_hugepage_low_range(addr, len) \ + __within_hugepage_low_range((addr), (len), \ + current->mm->context.low_htlb_areas) +#define __within_hugepage_high_range(addr, len, zonemask) \ + ((HTLB_AREA_MASK((addr), (len)) | (zonemask)) == (zonemask)) +#define within_hugepage_high_range(addr, len) \ + __within_hugepage_high_range((addr), (len), \ + current->mm->context.high_htlb_areas) + +#define is_hugepage_only_range(mm, addr, len) \ + (touches_hugepage_high_range((mm), (addr), (len)) || \ + touches_hugepage_low_range((mm), (addr), (len))) +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA + +#define in_hugepage_area(context, addr) \ + (cpu_has_feature(CPU_FTR_16M_PAGE) && \ + ( ((1 << GET_HTLB_AREA(addr)) & (context).high_htlb_areas) || \ + ( ((addr) < 0x100000000L) && \ + ((1 << GET_ESID(addr)) & (context).low_htlb_areas) ) ) ) + +#else /* !CONFIG_HUGETLB_PAGE */ + +#define in_hugepage_area(mm, addr) 0 + +#endif /* !CONFIG_HUGETLB_PAGE */ +#endif /* __powerpc64__ */ + +/* align addr on a size boundary - adjust address up/down if needed */ +#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) +#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) + +/* align addr on a size boundary - adjust address up if needed */ +#define _ALIGN(addr,size) _ALIGN_UP(addr,size) + +/* to align the pointer to the (next) page boundary */ +#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) + +#ifndef __ASSEMBLY__ + +#ifdef __powerpc64__ +#include +#endif + +#ifdef __powerpc64__ +typedef unsigned long pte_basic_t; +#elif defined(CONFIG_PTE_64BIT) +/* Some PPC32 machines have 64-bit PTEs for physical addressing. 
*/ +typedef unsigned long long pte_basic_t; +#define PTE_SHIFT (PAGE_SHIFT - 3) /* 512 ptes per page */ +#define PTE_FMT "%16Lx" +#else +typedef unsigned long pte_basic_t; +#define PTE_SHIFT (PAGE_SHIFT - 2) /* 1024 ptes per page */ +#define PTE_FMT "%.8lx" +#endif + +#undef STRICT_MM_TYPECHECKS + +#ifdef STRICT_MM_TYPECHECKS +/* + * These are used to make use of C type-checking.. + */ + +/* PTE level */ +typedef struct { pte_basic_t pte; } pte_t; +#define pte_val(x) ((x).pte) +#define __pte(x) ((pte_t) { (x) }) + +/* 64k pages additionally define a bigger "real PTE" type that gathers + * the "second half" part of the PTE for pseudo 64k pages */ +#ifdef CONFIG_PPC_64K_PAGES +typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; +#else +typedef struct { pte_t pte; } real_pte_t; +#endif + +/* PMD level */ +typedef struct { unsigned long pmd; } pmd_t; +#define pmd_val(x) ((x).pmd) +#define __pmd(x) ((pmd_t) { (x) }) + +/* PUD level exists only on 4k pages */ +#ifndef CONFIG_PPC_64K_PAGES +typedef struct { unsigned long pud; } pud_t; +#define pud_val(x) ((x).pud) +#define __pud(x) ((pud_t) { (x) }) +#endif + +/* PGD level */ +typedef struct { unsigned long pgd; } pgd_t; +#define pgd_val(x) ((x).pgd) +#define __pgd(x) ((pgd_t) { (x) }) + +/* Page protection bits */ +typedef struct { unsigned long pgprot; } pgprot_t; +#define pgprot_val(x) ((x).pgprot) +#define __pgprot(x) ((pgprot_t) { (x) }) + +#else + +/* + * .. 
while these make it easier on the compiler + */ +typedef pte_basic_t pte_t; +#define pte_val(x) (x) +#define __pte(x) (x) + +#ifdef CONFIG_PPC_64K_PAGES +typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; +#else +typedef unsigned long real_pte_t; +#endif + +typedef unsigned long pmd_t; +#define pmd_val(x) (x) +#define __pmd(x) (x) + +#ifndef CONFIG_PPC_64K_PAGES +typedef unsigned long pud_t; +#define pud_val(x) (x) +#define __pud(x) (x) +#endif + +typedef unsigned long pgd_t; +#define pgd_val(x) (x) +#define __pgd(x) (x) + +typedef unsigned long pgprot_t; +#define pgprot_val(x) (x) +#define __pgprot(x) (x) + +#endif /* !STRICT_MM_TYPECHECKS */ + +struct page; + +#ifdef __powerpc64__ + +static __inline__ void clear_page(void *addr) +{ + unsigned long lines, line_size; + + line_size = ppc64_caches.dline_size; + lines = ppc64_caches.dlines_per_page; + + __asm__ __volatile__( + "mtctr %1 # clear_page\n\ +1: dcbz 0,%0\n\ + add %0,%0,%3\n\ + bdnz+ 1b" + : "=r" (addr) + : "r" (lines), "0" (addr), "r" (line_size) + : "ctr", "memory"); +} + + +extern void copy_4K_page(void *to, void *from); + +#ifdef CONFIG_PPC_64K_PAGES +static inline void copy_page(void *to, void *from) +{ + unsigned int i; + for (i=0; i < (1 << (PAGE_SHIFT - 12)); i++) { + copy_4K_page(to, from); + to += 4096; + from += 4096; + } +} +#else /* CONFIG_PPC_64K_PAGES */ +static inline void copy_page(void *to, void *from) +{ + copy_4K_page(to, from); +} +#endif /* CONFIG_PPC_64K_PAGES */ +#else +extern void clear_pages(void *page, int order); +static inline void clear_page(void *page) { clear_pages(page, 0); } +extern void copy_page(void *to, void *from); +#endif /* __powerpc64__ */ + +extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg); +extern void copy_user_page(void *to, void *from, unsigned long vaddr, + struct page *p); + +#ifndef CONFIG_APUS +#define PPC_MEMSTART 0 +#define PPC_PGSTART 0 +#define PPC_MEMOFFSET PAGE_OFFSET +#else +extern unsigned long ppc_memstart; 
+extern unsigned long ppc_pgstart; +extern unsigned long ppc_memoffset; +#define PPC_MEMSTART ppc_memstart +#define PPC_PGSTART ppc_pgstart +#define PPC_MEMOFFSET ppc_memoffset +#endif + +#if defined(CONFIG_APUS) && !defined(MODULE) +/* map phys->virtual and virtual->phys for RAM pages */ +static inline unsigned long ___pa(unsigned long v) +{ + unsigned long p; + asm volatile ("1: addis %0, %1, %2;" + ".section \".vtop_fixup\",\"aw\";" + ".align 1;" + ".long 1b;" + ".previous;" + : "=r" (p) + : "b" (v), "K" (((-PAGE_OFFSET) >> 16) & 0xffff)); + + return p; +} +static inline void* ___va(unsigned long p) +{ + unsigned long v; + asm volatile ("1: addis %0, %1, %2;" + ".section \".ptov_fixup\",\"aw\";" + ".align 1;" + ".long 1b;" + ".previous;" + : "=r" (v) + : "b" (p), "K" (((PAGE_OFFSET) >> 16) & 0xffff)); + + return (void*) v; +} +#else +#define ___pa(vaddr) ((vaddr) - PPC_MEMOFFSET) +#define ___va(paddr) ((paddr) + PPC_MEMOFFSET) +#endif + +extern int page_is_ram(unsigned long pfn); + +#ifdef __powerpc64__ +extern u64 ppc64_pft_size; /* Log 2 of page table size */ + +/* We do define AT_SYSINFO_EHDR but don't use the gate mecanism */ +#define __HAVE_ARCH_GATE_AREA 1 + +#else +/* Pure 2^n version of get_order */ +extern __inline__ int get_order(unsigned long size) +{ + int lz; + + size = (size-1) >> PAGE_SHIFT; + asm ("cntlzw %0,%1" : "=r" (lz) : "r" (size)); + return 32 - lz; +} +#endif + +#endif /* __ASSEMBLY__ */ + +#define __pa(x) ___pa((unsigned long)(x)) +#define __va(x) ((void *)(___va((unsigned long)(x)))) + +#ifdef CONFIG_FLATMEM +#define page_to_pfn(page) ((unsigned long)((page) - mem_map) + PPC_PGSTART) +#define pfn_to_page(pfn) (mem_map + ((pfn) - PPC_PGSTART)) +#define pfn_valid(pfn) (((pfn) - PPC_PGSTART) < max_mapnr) +#endif + +#ifdef CONFIG_DISCONTIGMEM +#define page_to_pfn(page) discontigmem_page_to_pfn(page) +#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) +#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) +#endif + +#define 
virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) +#define page_to_virt(page) __va(page_to_pfn(page) << PAGE_SHIFT) +#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) + +#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) + +#ifdef MODULE +#define __page_aligned __attribute__((__aligned__(PAGE_SIZE))) +#else +#define __page_aligned \ + __attribute__((__aligned__(PAGE_SIZE), \ + __section__(".data.page_aligned"))) +#endif + +/* + * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, + * and needs to be executable. This means the whole heap ends + * up being executable. + */ +#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#ifdef __powerpc64__ +#define VM_DATA_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) + +/* + * This is the default if a program doesn't have a PT_GNU_STACK + * program header entry. The PPC64 ELF ABI has a non executable stack + * by default, so in the absense of a PT_GNU_STACK program header we + * turn execute permission off. + */ +#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? 
\ + VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) +#else +#define VM_DATA_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS32 +#endif /*__powerpc64__ */ + +#endif /* __KERNEL__ */ + +#ifdef __powerpc64__ +#include +#endif + +#endif /* _ASM_POWERPC_PAGE_H */ Index: kexec/include/asm-ppc/page.h =================================================================== --- kexec.orig/include/asm-ppc/page.h +++ /dev/null @@ -1,173 +0,0 @@ -#ifndef _PPC_PAGE_H -#define _PPC_PAGE_H - -/* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 12 -#define PAGE_SIZE (1UL << PAGE_SHIFT) - -/* - * Subtle: this is an int (not an unsigned long) and so it - * gets extended to 64 bits the way want (i.e. with 1s). -- paulus - */ -#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) - -#ifdef __KERNEL__ -#include - -/* This must match what is in arch/ppc/Makefile */ -#define PAGE_OFFSET CONFIG_KERNEL_START -#define KERNELBASE PAGE_OFFSET - -#ifndef __ASSEMBLY__ - -/* - * The basic type of a PTE - 64 bits for those CPUs with > 32 bit - * physical addressing. For now this just the IBM PPC440. - */ -#ifdef CONFIG_PTE_64BIT -typedef unsigned long long pte_basic_t; -#define PTE_SHIFT (PAGE_SHIFT - 3) /* 512 ptes per page */ -#define PTE_FMT "%16Lx" -#else -typedef unsigned long pte_basic_t; -#define PTE_SHIFT (PAGE_SHIFT - 2) /* 1024 ptes per page */ -#define PTE_FMT "%.8lx" -#endif - -/* align addr on a size boundary - adjust address up/down if needed */ -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) -#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) - -/* align addr on a size boundary - adjust address up if needed */ -#define _ALIGN(addr,size) _ALIGN_UP(addr,size) - -/* to align the pointer to the (next) page boundary */ -#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) - - -#undef STRICT_MM_TYPECHECKS - -#ifdef STRICT_MM_TYPECHECKS -/* - * These are used to make use of C type-checking.. 
- */ -typedef struct { pte_basic_t pte; } pte_t; -typedef struct { unsigned long pmd; } pmd_t; -typedef struct { unsigned long pgd; } pgd_t; -typedef struct { unsigned long pgprot; } pgprot_t; - -#define pte_val(x) ((x).pte) -#define pmd_val(x) ((x).pmd) -#define pgd_val(x) ((x).pgd) -#define pgprot_val(x) ((x).pgprot) - -#define __pte(x) ((pte_t) { (x) } ) -#define __pmd(x) ((pmd_t) { (x) } ) -#define __pgd(x) ((pgd_t) { (x) } ) -#define __pgprot(x) ((pgprot_t) { (x) } ) - -#else -/* - * .. while these make it easier on the compiler - */ -typedef pte_basic_t pte_t; -typedef unsigned long pmd_t; -typedef unsigned long pgd_t; -typedef unsigned long pgprot_t; - -#define pte_val(x) (x) -#define pmd_val(x) (x) -#define pgd_val(x) (x) -#define pgprot_val(x) (x) - -#define __pte(x) (x) -#define __pmd(x) (x) -#define __pgd(x) (x) -#define __pgprot(x) (x) - -#endif - -struct page; -extern void clear_pages(void *page, int order); -static inline void clear_page(void *page) { clear_pages(page, 0); } -extern void copy_page(void *to, void *from); -extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg); -extern void copy_user_page(void *to, void *from, unsigned long vaddr, - struct page *pg); - -#ifndef CONFIG_APUS -#define PPC_MEMSTART 0 -#define PPC_PGSTART 0 -#define PPC_MEMOFFSET PAGE_OFFSET -#else -extern unsigned long ppc_memstart; -extern unsigned long ppc_pgstart; -extern unsigned long ppc_memoffset; -#define PPC_MEMSTART ppc_memstart -#define PPC_PGSTART ppc_pgstart -#define PPC_MEMOFFSET ppc_memoffset -#endif - -#if defined(CONFIG_APUS) && !defined(MODULE) -/* map phys->virtual and virtual->phys for RAM pages */ -static inline unsigned long ___pa(unsigned long v) -{ - unsigned long p; - asm volatile ("1: addis %0, %1, %2;" - ".section \".vtop_fixup\",\"aw\";" - ".align 1;" - ".long 1b;" - ".previous;" - : "=r" (p) - : "b" (v), "K" (((-PAGE_OFFSET) >> 16) & 0xffff)); - - return p; -} -static inline void* ___va(unsigned long p) -{ - unsigned long 
v; - asm volatile ("1: addis %0, %1, %2;" - ".section \".ptov_fixup\",\"aw\";" - ".align 1;" - ".long 1b;" - ".previous;" - : "=r" (v) - : "b" (p), "K" (((PAGE_OFFSET) >> 16) & 0xffff)); - - return (void*) v; -} -#else -#define ___pa(vaddr) ((vaddr)-PPC_MEMOFFSET) -#define ___va(paddr) ((paddr)+PPC_MEMOFFSET) -#endif - -extern int page_is_ram(unsigned long pfn); - -#define __pa(x) ___pa((unsigned long)(x)) -#define __va(x) ((void *)(___va((unsigned long)(x)))) - -#define pfn_to_page(pfn) (mem_map + ((pfn) - PPC_PGSTART)) -#define page_to_pfn(page) ((unsigned long)((page) - mem_map) + PPC_PGSTART) -#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) -#define page_to_virt(page) __va(page_to_pfn(page) << PAGE_SHIFT) - -#define pfn_valid(pfn) (((pfn) - PPC_PGSTART) < max_mapnr) -#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) - -/* Pure 2^n version of get_order */ -extern __inline__ int get_order(unsigned long size) -{ - int lz; - - size = (size-1) >> PAGE_SHIFT; - asm ("cntlzw %0,%1" : "=r" (lz) : "r" (size)); - return 32 - lz; -} - -#endif /* __ASSEMBLY__ */ - -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#endif /* __KERNEL__ */ -#endif /* _PPC_PAGE_H */ Index: kexec/include/asm-ppc64/page.h =================================================================== --- kexec.orig/include/asm-ppc64/page.h +++ /dev/null @@ -1,333 +0,0 @@ -#ifndef _PPC64_PAGE_H -#define _PPC64_PAGE_H - -/* - * Copyright (C) 2001 PPC64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include -#include /* for ASM_CONST */ - -/* - * We support either 4k or 64k software page size. 
When using 64k pages - * however, wether we are really supporting 64k pages in HW or not is - * irrelevant to those definitions. We always define HW_PAGE_SHIFT to 12 - * as use of 64k pages remains a linux kernel specific, every notion of - * page number shared with the firmware, TCEs, iommu, etc... still assumes - * a page size of 4096. - */ -#ifdef CONFIG_PPC_64K_PAGES -#define PAGE_SHIFT 16 -#else -#define PAGE_SHIFT 12 -#endif - -#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) -#define PAGE_MASK (~(PAGE_SIZE-1)) - -/* HW_PAGE_SHIFT is always 4k pages */ -#define HW_PAGE_SHIFT 12 -#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) -#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) - -/* PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and - * HW_PAGE_SHIFT, that is 4k pages - */ -#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) - -/* Segment size */ -#define SID_SHIFT 28 -#define SID_MASK 0xfffffffffUL -#define ESID_MASK 0xfffffffff0000000UL -#define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) - -/* Large pages size */ - -#ifndef __ASSEMBLY__ -extern unsigned int HPAGE_SHIFT; -#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) -#define HPAGE_MASK (~(HPAGE_SIZE - 1)) -#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) -#endif /* __ASSEMBLY__ */ - -#ifdef CONFIG_HUGETLB_PAGE - - -#define HTLB_AREA_SHIFT 40 -#define HTLB_AREA_SIZE (1UL << HTLB_AREA_SHIFT) -#define GET_HTLB_AREA(x) ((x) >> HTLB_AREA_SHIFT) - -#define LOW_ESID_MASK(addr, len) (((1U << (GET_ESID(addr+len-1)+1)) \ - - (1U << GET_ESID(addr))) & 0xffff) -#define HTLB_AREA_MASK(addr, len) (((1U << (GET_HTLB_AREA(addr+len-1)+1)) \ - - (1U << GET_HTLB_AREA(addr))) & 0xffff) - -#define ARCH_HAS_HUGEPAGE_ONLY_RANGE -#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE -#define ARCH_HAS_SETCLEAR_HUGE_PTE - -#define touches_hugepage_low_range(mm, addr, len) \ - (LOW_ESID_MASK((addr), (len)) & (mm)->context.low_htlb_areas) -#define touches_hugepage_high_range(mm, addr, len) \ - (HTLB_AREA_MASK((addr), (len)) & 
(mm)->context.high_htlb_areas) - -#define __within_hugepage_low_range(addr, len, segmask) \ - ((LOW_ESID_MASK((addr), (len)) | (segmask)) == (segmask)) -#define within_hugepage_low_range(addr, len) \ - __within_hugepage_low_range((addr), (len), \ - current->mm->context.low_htlb_areas) -#define __within_hugepage_high_range(addr, len, zonemask) \ - ((HTLB_AREA_MASK((addr), (len)) | (zonemask)) == (zonemask)) -#define within_hugepage_high_range(addr, len) \ - __within_hugepage_high_range((addr), (len), \ - current->mm->context.high_htlb_areas) - -#define is_hugepage_only_range(mm, addr, len) \ - (touches_hugepage_high_range((mm), (addr), (len)) || \ - touches_hugepage_low_range((mm), (addr), (len))) -#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA - -#define in_hugepage_area(context, addr) \ - (cpu_has_feature(CPU_FTR_16M_PAGE) && \ - ( ((1 << GET_HTLB_AREA(addr)) & (context).high_htlb_areas) || \ - ( ((addr) < 0x100000000L) && \ - ((1 << GET_ESID(addr)) & (context).low_htlb_areas) ) ) ) - -#else /* !CONFIG_HUGETLB_PAGE */ - -#define in_hugepage_area(mm, addr) 0 - -#endif /* !CONFIG_HUGETLB_PAGE */ - -/* align addr on a size boundary - adjust address up/down if needed */ -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) -#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) - -/* align addr on a size boundary - adjust address up if needed */ -#define _ALIGN(addr,size) _ALIGN_UP(addr,size) - -/* to align the pointer to the (next) page boundary */ -#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) - -#ifdef __KERNEL__ -#ifndef __ASSEMBLY__ -#include - -#undef STRICT_MM_TYPECHECKS - -#define REGION_SIZE 4UL -#define REGION_SHIFT 60UL -#define REGION_MASK (((1UL<> REGION_SHIFT) -#define KERNEL_REGION_ID (KERNELBASE >> REGION_SHIFT) -#define USER_REGION_ID (0UL) -#define REGION_ID(ea) (((unsigned long)(ea)) >> REGION_SHIFT) - -#define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) - -#ifdef CONFIG_DISCONTIGMEM -#define page_to_pfn(page) 
discontigmem_page_to_pfn(page) -#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) -#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) -#endif -#ifdef CONFIG_FLATMEM -#define pfn_to_page(pfn) (mem_map + (pfn)) -#define page_to_pfn(page) ((unsigned long)((page) - mem_map)) -#define pfn_valid(pfn) ((pfn) < max_mapnr) -#endif - -#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) -#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) - -#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) - -/* - * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, - * and needs to be executable. This means the whole heap ends - * up being executable. - */ -#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_DATA_DEFAULT_FLAGS \ - (test_thread_flag(TIF_32BIT) ? \ - VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) - -/* - * This is the default if a program doesn't have a PT_GNU_STACK - * program header entry. The PPC64 ELF ABI has a non executable stack - * stack by default, so in the absense of a PT_GNU_STACK program header - * we turn execute permission off. - */ -#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_STACK_DEFAULT_FLAGS \ - (test_thread_flag(TIF_32BIT) ? 
\ - VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) - -#endif /* __KERNEL__ */ - -#include - -#endif /* _PPC64_PAGE_H */ From benh at kernel.crashing.org Wed Nov 9 18:14:34 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 09 Nov 2005 18:14:34 +1100 Subject: [PATCH] Merge platform codes (ignore it) In-Reply-To: <1131505673.24637.37.camel@gaston> References: <1131505673.24637.37.camel@gaston> Message-ID: <1131520475.24637.65.camel@gaston> Ignore it, it has bugs and I'm making it better ... new patch tomorrow. Ben. From sr at denx.de Wed Nov 9 20:47:32 2005 From: sr at denx.de (Stefan Roese) Date: Wed, 9 Nov 2005 10:47:32 +0100 Subject: 440EP FPU support missing In-Reply-To: <20051108153036.F27232@cox.net> References: <20051107124917.C1671@cox.net> <200511081838.11236.sr@denx.de> <20051108153036.F27232@cox.net> Message-ID: <200511091047.32662.sr@denx.de> On Tuesday 08 November 2005 23:30, Matt Porter wrote: > On Tue, Nov 08, 2005 at 06:38:11PM +0100, Stefan Roese wrote: > > Somehow arch/ppc/kernel/fpu.S has disappeared. :-( I assume, this > > happened in the ppc/ppc64 -> powerpc merge. Any thoughts, why this file > > disappeared and how to solve this problem (just restore the original > > file)? > > arch/powerpc/kernel/fpu.S is being used now which doesn't have KernelFP. > I don't know why the 44x fpu support wasn't using > kernel_fp_unavailable_exception() before but I must have missed that > reviewing it. > > Try this patch. > > -Matt > > diff --git a/arch/ppc/kernel/head_booke.h b/arch/ppc/kernel/head_booke.h > index aeb349b..f3d274c 100644 > --- a/arch/ppc/kernel/head_booke.h > +++ b/arch/ppc/kernel/head_booke.h > @@ -358,6 +358,6 @@ label: > NORMAL_EXCEPTION_PROLOG; \ > bne load_up_fpu; /* if from user, just load it up */ \ > addi r3,r1,STACK_FRAME_OVERHEAD; \ > - EXC_XFER_EE_LITE(0x800, KernelFP) > + EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception) > > #endif /* __HEAD_BOOKE_H__ */ Thanks Matt. That fixes the problem. 
Please send this patch upstream. Best regards, Stefan From arnd at arndb.de Wed Nov 9 21:39:36 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 9 Nov 2005 11:39:36 +0100 Subject: powerpc: Move various ppc64 files with no ppc32 equivalent to powerpc In-Reply-To: <20051109023801.GI28271@localhost.localdomain> References: <20051109023801.GI28271@localhost.localdomain> Message-ID: <200511091139.37081.arnd@arndb.de> On Middeweken 09 November 2005 03:38, David Gibson wrote: > + > +IOCTL_TABLE_START > +#include > +#define DECLARES > +#include "compat_ioctl.c" > + > +/* Little p (/dev/rtc, /dev/envctrl, etc.) */ > +COMPATIBLE_IOCTL(_IOR('p', 20, int[7])) /* RTCGET */ > +COMPATIBLE_IOCTL(_IOW('p', 21, int[7])) /* RTCSET */ > + > +IOCTL_TABLE_END BTW, since the two RTC ioctls are the only powerpc specific defines left in the tree, I checked where these come from. It turns out that they are only used in a sparc specific SBUS device driver so they never should have been in the ppc64 tree in the first place. Arnd <>< From arnd at arndb.de Wed Nov 9 21:43:48 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 9 Nov 2005 11:43:48 +0100 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <1131477245.7855.17.camel@Memoria.anyarch.net> References: <1131477245.7855.17.camel@Memoria.anyarch.net> Message-ID: <200511091143.48884.arnd@arndb.de> On Dinsdag 08 November 2005 20:14, Daniel Ostrow wrote: > +CC?????:= $(shell if $(CC) -m64 -S -o /dev/null -xc /dev/null >/dev/null > 2>&1; then echo $(CC); else echo powerpc64-linux-gcc; fi ) You should be using $(call cc-option,-m64) for this, like we do in the rest of the Makefile. 
Arnd <>< From bernd at firmix.at Wed Nov 9 20:22:42 2005 From: bernd at firmix.at (Bernd Petrovitsch) Date: Wed, 09 Nov 2005 10:22:42 +0100 Subject: typedefs and structs In-Reply-To: <1131492815.14381.184.camel@localhost.localdomain> References: <20051105061114.GA27016@kroah.com> <17262.37107.857718.184055@cargo.ozlabs.ibm.com> <20051107175541.GB19593@austin.ibm.com> <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <1131492815.14381.184.camel@localhost.localdomain> Message-ID: <1131528162.19171.14.camel@tara.firmix.at> On Tue, 2005-11-08 at 18:33 -0500, Steven Rostedt wrote: > On Tue, 2005-11-08 at 17:23 -0600, linas wrote: > > On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark: > > > On Mon, 2005-11-07 at 14:41 -0600, linas wrote: > > > > > > don't use typedef to get rid of "struct". > > > > > > This was for the simple reason, too many developers were passing > > > structures by value instead of by reference, just because they were > > > using a type that they didn't realize was a structure. > > > > That's a rather bizarre mistake to make, since, in order to > > access a values in such a beast, you have to use a dot . instead > > of an arrow -> and so it hits ou in the face that you passed a value > > instead of a reference. And for every access of a field with a . you also look if it is not a locally declared (small) struct? > It happens when you access the variable via macros and other routines > that you notice that takes and address of the variable, so you just pass > in the address of the current local variable. > > ---- > > Off-topic: There's actually a neat little trick in C++ that can > > help avoid accidentally passing null pointers. 
One can declare And if you want a NULL-pointer equivalent, you declare a defined null_blah object just to have a reference. I've seen that often enough. If you want to avoid accidental NULL pointers, use "splint" or similar tools. Or add a BUG_ON(). > > function declarations as: > > > > int func (sturct blah &v) { > > v.a ++; > > return v.b; > > } > > > > The ampersand says "pass argument by reference (so as to get arg passing > > efficiency) but force coder to write code as if they were passing by value" > > As a result, it gets difficult to pass null pointers (for reasons > > similar to the difficulty of passing null pointers in Java (and yes, See above for NULL-pointer equivalents. > > I loathe Java, sorry to subject you to that)) Anyway, that's a C++ trick > > only; I wish it was in C so I could experiment more and find out if I No, it's not a trick but an ordinary language feature. And no, it was already in several Pascal dialects decades ago. > > like it or hate it. > > Actually, the true pass by reference (not by pointer) is one of the > things that C++ has, that I wish C had. No, you probably don't, because if you forget one of the & in a call chain (or even worse: it is removed for whatever bizarre reason), you might get interesting bugs to hunt. And C++ is also much more creative with temporaries, which are simply thrown away afterwards.
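As a rough illustration of the pass-by-value pitfall raised earlier in this thread (a typedef hiding the fact that an argument is a structure), here is a minimal C sketch. The names blah_t, bump_by_value, bump_by_pointer and typedef_copy_demo are hypothetical, made up for this example; nothing here is taken from kernel code.

```c
#include <assert.h>

/* Hypothetical type: the typedef hides that this is a struct. */
typedef struct {
	int a;
	int b;
} blah_t;

/* By value: the whole struct is copied, the callee changes only its copy. */
static int bump_by_value(blah_t v)
{
	v.a++;
	return v.a;
}

/* By pointer: only an address is passed, the caller's object is changed. */
static int bump_by_pointer(blah_t *v)
{
	v->a++;
	return v->a;
}

/* Returns 1 if the two calls behave as described in the comments above. */
static int typedef_copy_demo(void)
{
	blah_t x = { 1, 2 };

	bump_by_value(x);	/* x.a is still 1 in the caller */
	if (x.a != 1)
		return 0;

	bump_by_pointer(&x);	/* x.a is now 2 in the caller */
	return x.a == 2;
}
```

At the call site, bump_by_value(x) gives no hint that a structure is being copied; with "struct blah" spelled out in the prototype, the cost is at least visible to whoever reads the declaration.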
Bernd -- Firmix Software GmbH http://www.firmix.at/ mobil: +43 664 4416156 fax: +43 1 7890849-55 Embedded Linux Development and Services From bernd at firmix.at Wed Nov 9 20:25:20 2005 From: bernd at firmix.at (Bernd Petrovitsch) Date: Wed, 09 Nov 2005 10:25:20 +0100 Subject: typedefs and structs In-Reply-To: <20051109004247.GL19593@austin.ibm.com> References: <20051107182727.GD18861@kroah.com> <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051108235759.GA28271@localhost.localdomain> <1131495228.12797.67.camel@localhost> <20051109004247.GL19593@austin.ibm.com> Message-ID: <1131528320.19171.17.camel@tara.firmix.at> On Tue, 2005-11-08 at 18:42 -0600, linas wrote: [ C vs C++ ] > It fundamentally changes coding style; you'd have to try it on some > mid-size project for at least a few months or longer to get into the > mindset. To make it all work, you also have to do other things, like > avoid mallocs and allocing on stack, which forces major changes of > style (because of the lifetime of things on stack). If you don't change The lifetime of the stack is AFAIK the same in C and C++. So there can't be a significant difference. > style to go with it, then you'll just end up in debug hell, in which > case you'd be right: it would be a (very) bad idea. > > (Disclaimer: I've moved away from C++ because of all the other > opportunities for misuse that it offers and encourages.) You have those opportunities in all programming languages - in some more (Perl being probably the leader here), in some less (I don't know one). Bernd -- Firmix Software GmbH http://www.firmix.at/ mobil: +43 664 4416156 fax: +43 1 7890849-55 Embedded Linux Development and Services From jamagallon at able.es Wed Nov 9 21:16:40 2005 From: jamagallon at able.es (J.A. 
Magallon) Date: Wed, 9 Nov 2005 11:16:40 +0100 Subject: typedefs and structs In-Reply-To: <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> References: <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> Message-ID: <20051109111640.757f399a@werewolf.auna.net> On Tue, 8 Nov 2005 20:51:25 -0500, Kyle Moffett wrote: > > Pass by value in C: > do_some_stuff(arg1, arg2); > > Pass by reference in C: > do_some_stuff(&arg1, &arg2); > > This is very obvious what it does. The compiler does type-checks to > make sure you don't get it wrong. There are tools to check stack > usage of functions too. This is inherently obvious what the code > does without looking at a completely different file where the > function is defined. > > > Pass by value in C++: > do_some_stuff(arg1, arg2); > > Pass by reference in C++: > do_some_stuff(arg1, arg2); > > This is C++ being clever and hiding stuff from the programmer, which > is Not Good(TM) for a kernel. C++ may be an excellent language for > userspace programmers (I say "may" here because some disagree, > including myself), however, many of the features are extremely > problematic for a kernel. > Why is it not good for a kernel? You want to pass a struct to a function in the best way you can. A reference just passes a pointer instead of copying, but you don't notice it. If you want the function to be able to modify the struct, code it as void do_some_stuff(T& arg1,T& arg2) If you DO NOT want the function to be able to modify the struct, code it as void do_some_stuff(const T& arg1,const T& arg2) This is far better than in C, 
because you get the benefits of passing by reference without the problems of accidental modification of pointer contents. And get rid of arrows -> ;). If the function modifies the struct it should be obvious from its name, and not depend on whether you put an & in the call or not. And you stop worrying about argument-passing methods. The person who programs the function decides and can even change it without you, the user, even noticing. And gcc does nice optimizations when you mix const& and inlining... -- J.A. Magallon \ Software is like sex: werewolf!able!es \ It's better when it's free Mandriva Linux release 2006.1 (Cooker) for i586 Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1)) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051109/8e4a0d1f/attachment.pgp From dostrow at gentoo.org Thu Nov 10 02:56:27 2005 From: dostrow at gentoo.org (Daniel Ostrow) Date: Wed, 9 Nov 2005 10:56:27 -0500 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <200511091143.48884.arnd@arndb.de> References: <1131477245.7855.17.camel@Memoria.anyarch.net> <200511091143.48884.arnd@arndb.de> Message-ID: <20051109155418.GA11895@Memoria.zivexott.local> On 11:43 Wed 09 Nov , Arnd Bergmann wrote: > On Dinsdag 08 November 2005 20:14, Daniel Ostrow wrote: > > +CC?????:= $(shell if $(CC) -m64 -S -o /dev/null -xc /dev/null >/dev/null > > 2>&1; then echo $(CC); else echo powerpc64-linux-gcc; fi ) > > You should be using $(call cc-option,-m64) for this, like we do in the > rest of the Makefile. > > Arnd <>< Is something like the following better? 
Signed-off-by: Daniel Ostrow -- Daniel Ostrow Gentoo Foundation Board of Trustees Gentoo/{PPC,PPC64,DevRel} dostrow at gentoo.org diff -Naupr powerpc-merge.orig/arch/powerpc/Makefile powerpc-merge/arch/powerpc/Makefile --- powerpc-merge.orig/arch/powerpc/Makefile 2005-11-08 10:18:18.000000000 -0800 +++ powerpc-merge/arch/powerpc/Makefile 2005-11-09 07:15:14.000000000 -0800 @@ -18,6 +18,11 @@ ifeq ($(CONFIG_PPC64),y) OLDARCH := ppc64 SZ := 64 +NATIVE64 := $(call cc-option-yn, -m64) +ifeq ($(NATIVE64),n) +CC := powerpc64-linux-gcc +endif + # Set default 32 bits cross compilers for vdso and boot wrapper CROSS32_COMPILE ?= diff -Naupr powerpc-merge.orig/arch/ppc64/Makefile powerpc-merge/arch/ppc64/Makefile --- powerpc-merge.orig/arch/ppc64/Makefile 2005-11-08 10:07:06.000000000 -0800 +++ powerpc-merge/arch/ppc64/Makefile 2005-11-09 07:21:41.000000000 -0800 @@ -15,6 +15,11 @@ KERNELLOAD := 0xc000000000000000 +NATIVE64 := $(call cc-option-yn, -m64) +ifeq ($(NATIVE64),n) +CC := powerpc64-linux-gcc +endif + # Set default 32 bits cross compilers for vdso and boot wrapper CROSS32_COMPILE ?= > From jwboyer at jdub.homelinux.org Thu Nov 10 03:50:33 2005 From: jwboyer at jdub.homelinux.org (Josh Boyer) Date: Wed, 09 Nov 2005 10:50:33 -0600 Subject: 440EP FPU support missing In-Reply-To: <20051108160200.I27232@cox.net> References: <20051107124917.C1671@cox.net> <20051107190128.68d41294.akpm@osdl.org> <20051108093759.A26086@cox.net> <200511081838.11236.sr@denx.de> <20051108153036.F27232@cox.net> <1131489174.26096.0.camel@yoda.jdub.homelinux.org> <20051108160200.I27232@cox.net> Message-ID: <1131555034.4114.0.camel@windu.rchland.ibm.com> On Tue, 2005-11-08 at 16:02 -0700, Matt Porter wrote: > On Tue, Nov 08, 2005 at 04:32:54PM -0600, Josh Boyer wrote: > > On Tue, 2005-11-08 at 15:30 -0700, Matt Porter wrote: > > > On Tue, Nov 08, 2005 at 06:38:11PM +0100, Stefan Roese wrote: > > > > In the current linux version, Bamboo (440EP) won't compile anymore, because of > > > > missing fpu 
support: > > > > > > > > make uImage > > > > ... > > > > LD init/built-in.o > > > > LD .tmp_vmlinux1 > > > > arch/ppc/kernel/head_44x.o(.text+0x868): In function `_start': > > > > : undefined reference to `KernelFP' > > > > make: *** [.tmp_vmlinux1] Error 1 > > > > > > > > Somehow arch/ppc/kernel/fpu.S has disappeared. :-( I assume, this happened in > > > > the ppc/ppc64 -> powerpc merge. Any thoughts, why this file disappeared and > > > > how to solve this problem (just restore the original file)? > > > > > > arch/powerpc/kernel/fpu.S is being used now which doesn't have KernelFP. > > > I don't know why the 44x fpu support wasn't using > > > kernel_fp_unavailable_exception() before but I must have missed that > > > reviewing it. > > > > > > Try this patch. > > > > Doesn't this render the 440EP's FPU useless? > > Does what render the 440EP's FPU useless? The supplied patch? I > don't think so, the path should be the same as classic PPC. > > The patch simply replaces the KernelFP routine that used to be in > arch/ppc/kernel/fpu.S (and was removed inadvertently in the arch/powerpc/ > merge) with a kernel_fp_unavailable_exception() call which does the > equivalent and is shared by others. > > The exception still loads up the fpu is coming from userspace and > only goes down this path when getting an FP unavailable exception from > kernel space. Yes, you're obviously right. I blame my idiocy on lack of coffee. Sorry for the noise. 
josh From arnd at arndb.de Thu Nov 10 03:55:14 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 9 Nov 2005 17:55:14 +0100 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <20051109155418.GA11895@Memoria.zivexott.local> References: <1131477245.7855.17.camel@Memoria.anyarch.net> <200511091143.48884.arnd@arndb.de> <20051109155418.GA11895@Memoria.zivexott.local> Message-ID: <200511091755.15114.arnd@arndb.de> On Middeweken 09 November 2005 16:56, Daniel Ostrow wrote: > On 11:43 Wed 09 Nov , Arnd Bergmann wrote: > > On Dinsdag 08 November 2005 20:14, Daniel Ostrow wrote: > > > +CC?????:= $(shell if $(CC) -m64 -S -o /dev/null -xc /dev/null >/dev/null > > > 2>&1; then echo $(CC); else echo powerpc64-linux-gcc; fi ) > > > > You should be using $(call cc-option,-m64) for this, like we do in the > > rest of the Makefile. > > > > Arnd <>< > > Is something like the following better? I find that much more readable, yes. However, I first misunderstood what you are really trying to do. I think that if you want to use a cross compiler, you normally should set $(CROSS_COMPILE) to powerpc64-linux- instead of only changing $(CC) but not the other tools, right? 
Arnd <>< From dostrow at gentoo.org Thu Nov 10 04:08:29 2005 From: dostrow at gentoo.org (Daniel Ostrow) Date: Wed, 9 Nov 2005 12:08:29 -0500 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <200511091755.15114.arnd@arndb.de> References: <1131477245.7855.17.camel@Memoria.anyarch.net> <200511091143.48884.arnd@arndb.de> <20051109155418.GA11895@Memoria.zivexott.local> <200511091755.15114.arnd@arndb.de> Message-ID: <20051109170829.GB11895@Memoria.zivexott.local> On 17:55 Wed 09 Nov , Arnd Bergmann wrote: > On Middeweken 09 November 2005 16:56, Daniel Ostrow wrote: > > On 11:43 Wed 09 Nov , Arnd Bergmann wrote: > > > On Dinsdag 08 November 2005 20:14, Daniel Ostrow wrote: > > > > +CC?????:= $(shell if $(CC) -m64 -S -o /dev/null -xc /dev/null >/dev/null > > > > 2>&1; then echo $(CC); else echo powerpc64-linux-gcc; fi ) > > > > > > You should be using $(call cc-option,-m64) for this, like we do in the > > > rest of the Makefile. > > > > > > Arnd <>< > > > > Is something like the following better? > > I find that much more readable, yes. > > However, I first misunderstood what you are really trying to do. > I think that if you want to use a cross compiler, you normally > should set $(CROSS_COMPILE) to powerpc64-linux- instead of > only changing $(CC) but not the other tools, right? > > Arnd <>< > Actually yes I like that *much* better...good call :) New diff attached. 
Signed-off-by: Daniel Ostrow diff -Naupr powerpc-merge.orig/arch/powerpc/Makefile powerpc-merge/arch/powerpc/Makefile --- powerpc-merge.orig/arch/powerpc/Makefile 2005-11-08 10:18:18.000000000 -0800 +++ powerpc-merge/arch/powerpc/Makefile 2005-11-09 09:06:28.000000000 -0800 @@ -18,6 +18,11 @@ ifeq ($(CONFIG_PPC64),y) OLDARCH := ppc64 SZ := 64 +NATIVE64 := $(call cc-option-yn, -m64) +ifeq ($(NATIVE64),n) +CROSS_COMPILE := powerpc64-unknown-linux-gnu- +endif + # Set default 32 bits cross compilers for vdso and boot wrapper CROSS32_COMPILE ?= diff -Naupr powerpc-merge.orig/arch/ppc64/Makefile powerpc-merge/arch/ppc64/Makefile --- powerpc-merge.orig/arch/ppc64/Makefile 2005-11-08 10:07:06.000000000 -0800 +++ powerpc-merge/arch/ppc64/Makefile 2005-11-09 09:07:16.000000000 -0800 @@ -15,6 +15,11 @@ KERNELLOAD := 0xc000000000000000 +NATIVE64 := $(call cc-option-yn, -m64) +ifeq ($(NATIVE64),n) +CROSS_COMPILE := powerpc64-unknown-linux-gnu- +endif + # Set default 32 bits cross compilers for vdso and boot wrapper CROSS32_COMPILE ?= From hch at lst.de Thu Nov 10 04:21:25 2005 From: hch at lst.de (Christoph Hellwig) Date: Wed, 9 Nov 2005 18:21:25 +0100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1130916198.20136.17.camel@gaston> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> Message-ID: <20051109172125.GA12861@lst.de> Booting current mainline with 64K pagesize enabled gives me a purple (!) screen early during boot. 
From olof at lixom.net Thu Nov 10 04:29:28 2005 From: olof at lixom.net (Olof Johansson) Date: Wed, 9 Nov 2005 09:29:28 -0800 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <20051109170829.GB11895@Memoria.zivexott.local> References: <1131477245.7855.17.camel@Memoria.anyarch.net> <200511091143.48884.arnd@arndb.de> <20051109155418.GA11895@Memoria.zivexott.local> <200511091755.15114.arnd@arndb.de> <20051109170829.GB11895@Memoria.zivexott.local> Message-ID: <20051109172928.GA7323@pb15.lixom.net> On Wed, Nov 09, 2005 at 12:08:29PM -0500, Daniel Ostrow wrote: > On 17:55 Wed 09 Nov , Arnd Bergmann wrote: > > On Middeweken 09 November 2005 16:56, Daniel Ostrow wrote: > > > On 11:43 Wed 09 Nov , Arnd Bergmann wrote: > > > > On Dinsdag 08 November 2005 20:14, Daniel Ostrow wrote: > > > > > +CC?????:= $(shell if $(CC) -m64 -S -o /dev/null -xc /dev/null >/dev/null > > > > > 2>&1; then echo $(CC); else echo powerpc64-linux-gcc; fi ) > > > > > > > > You should be using $(call cc-option,-m64) for this, like we do in the > > > > rest of the Makefile. > > > > > > > > Arnd <>< > > > > > > Is something like the following better? > > > > I find that much more readable, yes. > > > > However, I first misunderstood what you are really trying to do. > > I think that if you want to use a cross compiler, you normally > > should set $(CROSS_COMPILE) to powerpc64-linux- instead of > > only changing $(CC) but not the other tools, right? > > > > Arnd <>< > > > > Actually yes I like that *much* better...good call :) > > New diff attached. [...] > +NATIVE64 := $(call cc-option-yn, -m64) > +ifeq ($(NATIVE64),n) > +CROSS_COMPILE := powerpc64-unknown-linux-gnu- > +endif > + That one overrides any manual CROSS_COMPILE setting, doesn't it? Also, I know most of the systems I have been on use powerpc64-linux-gcc, not the longer one you specified, it's a better default. 
-Olof From arnd at arndb.de Thu Nov 10 04:32:36 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 9 Nov 2005 18:32:36 +0100 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <20051109170829.GB11895@Memoria.zivexott.local> References: <1131477245.7855.17.camel@Memoria.anyarch.net> <200511091755.15114.arnd@arndb.de> <20051109170829.GB11895@Memoria.zivexott.local> Message-ID: <200511091832.37094.arnd@arndb.de> On Middeweken 09 November 2005 18:08, Daniel Ostrow wrote: > +NATIVE64 := $(call cc-option-yn, -m64) > +ifeq ($(NATIVE64),n) > +CROSS_COMPILE := powerpc64-unknown-linux-gnu- > +endif > + But now you're using powerpc64-unknown-linux-gnu- instead of powerpc64-linux-. I don't know which one is more widely used, but at least the debian packages I'm using have 'powerpc64-linux' only. Arnd <>< From arnd at arndb.de Thu Nov 10 04:36:43 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 9 Nov 2005 18:36:43 +0100 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <20051109172928.GA7323@pb15.lixom.net> References: <1131477245.7855.17.camel@Memoria.anyarch.net> <20051109170829.GB11895@Memoria.zivexott.local> <20051109172928.GA7323@pb15.lixom.net> Message-ID: <200511091836.44242.arnd@arndb.de> On Middeweken 09 November 2005 18:29, Olof Johansson wrote: > That one overrides any manual CROSS_COMPILE setting, doesn't it? > Also, I know most of the systems I have been on use powerpc64-linux-gcc, > not the longer one you specified, it's a better default. > It overrides only the settings supplied with 'CROSS_COMPILE=foo- make', not those from 'make CROSS_COMPILE=foo-'. But you're right, the Makefile should better use 'CROSS_COMPILE ?= ...'. 
Arnd <>< From dostrow at gentoo.org Thu Nov 10 04:43:59 2005 From: dostrow at gentoo.org (Daniel Ostrow) Date: Wed, 9 Nov 2005 12:43:59 -0500 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <200511091832.37094.arnd@arndb.de> References: <1131477245.7855.17.camel@Memoria.anyarch.net> <200511091755.15114.arnd@arndb.de> <20051109170829.GB11895@Memoria.zivexott.local> <200511091832.37094.arnd@arndb.de> Message-ID: <20051109174358.GC11895@Memoria.zivexott.local> On 18:32 Wed 09 Nov , Arnd Bergmann wrote: > On Middeweken 09 November 2005 18:08, Daniel Ostrow wrote: > > > +NATIVE64 := $(call cc-option-yn, -m64) > > +ifeq ($(NATIVE64),n) > > +CROSS_COMPILE := powerpc64-unknown-linux-gnu- > > +endif > > + > > But now you're using powerpc64-unknown-linux-gnu- instead > of powerpc64-linux-. I don't know which one is more widely > used, but at least the debian packages I'm using have > 'powerpc64-linux' only. > > Arnd <>< On Gentoo we use powerpc64-unknown-linux-gnu-* for all the binutils and gcc symlinks. For any cross compilers we also include a powerpc64-linux-gcc symlink, but this is the only one that fits the *-linux mold; none of the binutils ones do the same. Once we figure out the best way to do it (be it powerpc64-unknown-linux-gnu- or powerpc64-linux-), I will send a new diff with the CROSS_COMPILE ?= added in. 
-- Daniel Ostrow Gentoo Foundation Board of Trustees Gentoo/{PPC,PPC64,DevRel} dostrow at gentoo.org From linas at austin.ibm.com Thu Nov 10 06:20:28 2005 From: linas at austin.ibm.com (linas) Date: Wed, 9 Nov 2005 13:20:28 -0600 Subject: typedefs and structs In-Reply-To: References: <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> Message-ID: <20051109192028.GP19593@austin.ibm.com> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark: > On Wed, 9 Nov 2005, J.A. Magallon wrote: > > > void do_some_stuff(T& arg1,T& arg2) > > A diligent C programmer would write this as follows: > void do_some_stuff (struct T * a, struct T * b); > So I don't see C++ winning at all here. I guess the real point that I'd wanted to make, and seems to have gotten lost, was that by avoiding using pointers, you end up designing code in a very different way, and you can find out that often/usually, you don't need structs filled with a zoo of pointers. Minimizing pointers is good: less ref counting is needed, fewer mallocs are needed, fewer locks are needed (because of local/private scope!!), and null pointer deref errors are less likely. There are even performance implications: on modern CPU's there's a very long pipeline to memory (hundreds of cycles for a cache miss! Really! Worse if you have run out of TLB entries!). So walking a long linked list chasing pointers can really really hurt performance. By using refs instead of pointers, it helps you focus on the issue of "do I really need to store this pointer somewhere? Will I really need it later, or can I be done with it now?". I don't know if the idea of "using fewer pointers" can actually be carried out in the kernel. 
For starters, the stack is way too short to be able to put much on it. --linas From linas at austin.ibm.com Thu Nov 10 06:27:43 2005 From: linas at austin.ibm.com (linas) Date: Wed, 9 Nov 2005 13:27:43 -0600 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <200511091832.37094.arnd@arndb.de> References: <1131477245.7855.17.camel@Memoria.anyarch.net> <200511091755.15114.arnd@arndb.de> <20051109170829.GB11895@Memoria.zivexott.local> <200511091832.37094.arnd@arndb.de> Message-ID: <20051109192743.GQ19593@austin.ibm.com> On Wed, Nov 09, 2005 at 06:32:36PM +0100, Arnd Bergmann was heard to remark: > On Middeweken 09 November 2005 18:08, Daniel Ostrow wrote: > > > +NATIVE64 := $(call cc-option-yn, -m64) > > +ifeq ($(NATIVE64),n) > > +CROSS_COMPILE := powerpc64-unknown-linux-gnu- > > +endif > > + > > But now you're using powerpc64-unknown-linux-gnu- instead > of powerpc64-linux-. I don't know which one is more widely > used, but at least the debian packages I'm using have > 'powerpc64-linux' only. SUSE uses (or used to use) powerpc64-linux-, although it's native now. Is Debian not native?
--linas From linas at austin.ibm.com Thu Nov 10 06:38:28 2005 From: linas at austin.ibm.com (linas) Date: Wed, 9 Nov 2005 13:38:28 -0600 Subject: typedefs and structs In-Reply-To: <20051109193625.GA31889@hockin.org> References: <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> <20051109193625.GA31889@hockin.org> Message-ID: <20051109193828.GR19593@austin.ibm.com> On Wed, Nov 09, 2005 at 11:36:25AM -0800, thockin at hockin.org was heard to remark: > On Wed, Nov 09, 2005 at 01:20:28PM -0600, linas wrote: > > I guess the real point that I'd wanted to make, and seems > > to have gotten lost, was that by avoiding using pointers, > > you end up designing code in a very different way, and you > > can find out that often/usually, you don't need structs > > filled with a zoo of pointers. > > Umm, references are implemented as pointers. Instead of a "zoo of > pointers" you have a "zoo of references". No functional difference. Sigh. I think you are confusing references and pointers. By definition you cannot "store a reference"; however, you can "dereference" an object and store a pointer to it. The C programming language conflates these two different ideas; that is why they seem to be "the same thing" to you. > > Minimizing pointers is good: less ref counting is needed, > > fewer mallocs are needed, fewer locks are needed > > (because of local/private scope!!), and null pointer > > deref errors are less likely. > > Not true at all! Which part isn't true? --linas From olh at suse.de Thu Nov 10 06:51:03 2005 From: olh at suse.de (Olaf Hering) Date: Wed, 9 Nov 2005 20:51:03 +0100 Subject: [PATCH] ppc64 boot: remove local initializers Message-ID: <20051109195103.GA31658@suse.de> Remove initialization of local variables. They get all values assigned before use. 
Signed-off-by: Olaf Hering arch/ppc64/boot/addRamDisk.c | 56 +++++++++++++++++++++---------------------- 1 files changed, 28 insertions(+), 28 deletions(-) Index: linux-2.6.14/arch/ppc64/boot/addRamDisk.c =================================================================== --- linux-2.6.14.orig/arch/ppc64/boot/addRamDisk.c +++ linux-2.6.14/arch/ppc64/boot/addRamDisk.c @@ -34,35 +34,35 @@ void death(const char *msg, FILE *fdesc, int main(int argc, char **argv) { char inbuf[4096]; - FILE *ramDisk = NULL; - FILE *sysmap = NULL; - FILE *inputVmlinux = NULL; - FILE *outputVmlinux = NULL; - - unsigned i = 0; - unsigned long ramFileLen = 0; - unsigned long ramLen = 0; - unsigned long roundR = 0; - - unsigned long sysmapFileLen = 0; - unsigned long sysmapLen = 0; - unsigned long sysmapPages = 0; - char* ptr_end = NULL; - unsigned long offset_end = 0; - - unsigned long kernelLen = 0; - unsigned long actualKernelLen = 0; - unsigned long round = 0; - unsigned long roundedKernelLen = 0; - unsigned long ramStartOffs = 0; - unsigned long ramPages = 0; - unsigned long roundedKernelPages = 0; - unsigned long hvReleaseData = 0; + FILE *ramDisk; + FILE *sysmap; + FILE *inputVmlinux; + FILE *outputVmlinux; + + unsigned i; + unsigned long ramFileLen; + unsigned long ramLen; + unsigned long roundR; + + unsigned long sysmapFileLen; + unsigned long sysmapLen; + unsigned long sysmapPages; + char *ptr_end; + unsigned long offset_end; + + unsigned long kernelLen; + unsigned long actualKernelLen; + unsigned long round; + unsigned long roundedKernelLen; + unsigned long ramStartOffs; + unsigned long ramPages; + unsigned long roundedKernelPages; + unsigned long hvReleaseData; u_int32_t eyeCatcher = 0xc8a5d9c4; - unsigned long naca = 0; - unsigned long xRamDisk = 0; - unsigned long xRamDiskSize = 0; - long padPages = 0; + unsigned long naca; + unsigned long xRamDisk; + unsigned long xRamDiskSize; + long padPages; if (argc < 2) { -- short story of a lazy sysadmin: alias appserv=wotan From dostrow 
at gentoo.org Thu Nov 10 06:47:21 2005 From: dostrow at gentoo.org (Daniel Ostrow) Date: Wed, 9 Nov 2005 14:47:21 -0500 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <20051109192743.GQ19593@austin.ibm.com> References: <1131477245.7855.17.camel@Memoria.anyarch.net> <200511091755.15114.arnd@arndb.de> <20051109170829.GB11895@Memoria.zivexott.local> <200511091832.37094.arnd@arndb.de> <20051109192743.GQ19593@austin.ibm.com> Message-ID: <20051109194721.GD11895@Memoria.zivexott.local> On 13:27 Wed 09 Nov , linas wrote: > On Wed, Nov 09, 2005 at 06:32:36PM +0100, Arnd Bergmann was heard to remark: > > On Middeweken 09 November 2005 18:08, Daniel Ostrow wrote: > > > > > +NATIVE64 := $(call cc-option-yn, -m64) > > > +ifeq ($(NATIVE64),n) > > > +CROSS_COMPILE := powerpc64-unknown-linux-gnu- > > > +endif > > > + > > > > But now you're using powerpc64-unknown-linux-gnu- instead > > of powerpc64-linux-. I don't know which one is more widely > > used, but at least the debian packages I'm using have > > 'powerpc64-linux' only. > > SUSE uses (used) powerpc64-linux- although its native now. > Is debian not native? > > --linas > A little insight into the way Gentoo does things...something that is seemingly somewhat non-standard. On the Gentoo side regardless of whether the toolchain is native or not symlinks are created for $CHOST- for binutils and gcc. To that effect Gentoo has three environments for a user to choose on ppc64, a pure 32-bit userland compiled using the powerpc-unknown-linux-gnu CHOST, a pure 64-bit userland compiled using the powerpc64-unknown-linux-gnu CHOST and Gentoo's flavor of multilib which uses the powerpc64-unknown-linux-gnu CHOST and a series of emulation libraries. All three use ARCH=ppc64 at the moment. For the 32-bit userland we compile a 64bit, C only, cross compiler to build the kernel and headers. 
This leaves the user with powerpc-unknown-linux-gnu-gcc (their native compiler, the same binary as /usr/bin/gcc) and powerpc64-unknown-linux-gnu-gcc (their C-only cross compiler), as well as a full set of symlinks for all of the various binutils binaries. In addition, Gentoo creates a powerpc64-linux-gcc symlink for this cross compiler. The only time a symlink of this type gets created is for the C-only cross compilers used by mips, sparc and ppc64 for building kernels under a 32-bit userland. At the moment we have a note in our installation guide telling people to add `alias ppc64make="make CROSS_COMPILE='powerpc64-unknown-linux-gnu-'"` to the bottom of their /etc/profile and use ppc64make to build their kernels. From where I stand this is an ugly-as-sin hack that should be taken care of automatically. I knew, for example, that it was already handled for sparc/sparc64 (which have the same setup, Gentoo userland-wise) in the kernel Makefile (see my original patch; it was just a pull of that check). I agree that setting $(CROSS_COMPILE) is probably a better option; we just need to come up with the most standard syntax (which is looking more and more like it isn't the way Gentoo happens to be doing things at the moment) and go from there. Thanks again for everyone's help :) -- Daniel Ostrow Gentoo Foundation Board of Trustees Gentoo/{PPC,PPC64,DevRel} dostrow at gentoo.org From olh at suse.de Thu Nov 10 06:52:20 2005 From: olh at suse.de (Olaf Hering) Date: Wed, 9 Nov 2005 20:52:20 +0100 Subject: [PATCH] ppc64 boot: remove argv usage In-Reply-To: <20051109195103.GA31658@suse.de> References: <20051109195103.GA31658@suse.de> Message-ID: <20051109195220.GB31658@suse.de> Use a local variable for the input filenames.
Signed-off-by: Olaf Hering arch/ppc64/boot/addRamDisk.c | 32 ++++++++++++++++++-------------- 1 files changed, 18 insertions(+), 14 deletions(-) Index: linux-2.6.14/arch/ppc64/boot/addRamDisk.c =================================================================== --- linux-2.6.14.orig/arch/ppc64/boot/addRamDisk.c +++ linux-2.6.14/arch/ppc64/boot/addRamDisk.c @@ -39,6 +39,7 @@ int main(int argc, char **argv) FILE *inputVmlinux; FILE *outputVmlinux; + char *rd_name, *lx_name, *out_name; unsigned i; unsigned long ramFileLen; unsigned long ramLen; @@ -69,6 +70,7 @@ int main(int argc, char **argv) fprintf(stderr, "Name of RAM disk file missing.\n"); exit(1); } + rd_name = argv[1] if (argc < 3) { fprintf(stderr, "Name of System Map input file is missing.\n"); @@ -79,16 +81,18 @@ int main(int argc, char **argv) fprintf(stderr, "Name of vmlinux file missing.\n"); exit(1); } + lx_name = argv[3]; if (argc < 5) { fprintf(stderr, "Name of vmlinux output file missing.\n"); exit(1); } + out_name = argv[4]; - ramDisk = fopen(argv[1], "r"); + ramDisk = fopen(rd_name, "r"); if ( ! ramDisk ) { - fprintf(stderr, "RAM disk file \"%s\" failed to open.\n", argv[1]); + fprintf(stderr, "RAM disk file \"%s\" failed to open.\n", rd_name); exit(1); } @@ -98,15 +102,15 @@ int main(int argc, char **argv) exit(1); } - inputVmlinux = fopen(argv[3], "r"); + inputVmlinux = fopen(lx_name, "r"); if ( ! inputVmlinux ) { - fprintf(stderr, "vmlinux file \"%s\" failed to open.\n", argv[3]); + fprintf(stderr, "vmlinux file \"%s\" failed to open.\n", lx_name); exit(1); } - outputVmlinux = fopen(argv[4], "w+"); + outputVmlinux = fopen(out_name, "w+"); if ( ! 
outputVmlinux ) { - fprintf(stderr, "output vmlinux file \"%s\" failed to open.\n", argv[4]); + fprintf(stderr, "output vmlinux file \"%s\" failed to open.\n", out_name); exit(1); } @@ -194,7 +198,7 @@ int main(int argc, char **argv) fseek(ramDisk, 0, SEEK_END); ramFileLen = ftell(ramDisk); fseek(ramDisk, 0, SEEK_SET); - printf("%s file size = %ld/0x%lx \n", argv[1], ramFileLen, ramFileLen); + printf("%s file size = %ld/0x%lx \n", rd_name, ramFileLen, ramFileLen); ramLen = ramFileLen; @@ -248,7 +252,7 @@ int main(int argc, char **argv) /* fseek to the hvReleaseData pointer */ fseek(outputVmlinux, ElfHeaderSize + 0x24, SEEK_SET); if (fread(&hvReleaseData, 4, 1, outputVmlinux) != 1) { - death("Could not read hvReleaseData pointer\n", outputVmlinux, argv[4]); + death("Could not read hvReleaseData pointer\n", outputVmlinux, out_name); } hvReleaseData = ntohl(hvReleaseData); /* Convert to native int */ printf("hvReleaseData is at %08x\n", hvReleaseData); @@ -256,11 +260,11 @@ int main(int argc, char **argv) /* fseek to the hvReleaseData */ fseek(outputVmlinux, ElfHeaderSize + hvReleaseData, SEEK_SET); if (fread(inbuf, 0x40, 1, outputVmlinux) != 1) { - death("Could not read hvReleaseData\n", outputVmlinux, argv[4]); + death("Could not read hvReleaseData\n", outputVmlinux, out_name); } /* Check hvReleaseData sanity */ if (memcmp(inbuf, &eyeCatcher, 4) != 0) { - death("hvReleaseData is invalid\n", outputVmlinux, argv[4]); + death("hvReleaseData is invalid\n", outputVmlinux, out_name); } /* Get the naca pointer */ naca = ntohl(*((u_int32_t*) &inbuf[0x0C])) - KERNELBASE; @@ -269,13 +273,13 @@ int main(int argc, char **argv) /* fseek to the naca */ fseek(outputVmlinux, ElfHeaderSize + naca, SEEK_SET); if (fread(inbuf, 0x18, 1, outputVmlinux) != 1) { - death("Could not read naca\n", outputVmlinux, argv[4]); + death("Could not read naca\n", outputVmlinux, out_name); } xRamDisk = ntohl(*((u_int32_t *) &inbuf[0x0c])); xRamDiskSize = ntohl(*((u_int32_t *) &inbuf[0x14])); /* Make 
sure a RAM disk isn't already present */ if ((xRamDisk != 0) || (xRamDiskSize != 0)) { - death("RAM disk is already attached to this kernel\n", outputVmlinux, argv[4]); + death("RAM disk is already attached to this kernel\n", outputVmlinux, out_name); } /* Fill in the values */ *((u_int32_t *) &inbuf[0x0c]) = htonl(ramStartOffs); @@ -285,7 +289,7 @@ int main(int argc, char **argv) fflush(outputVmlinux); fseek(outputVmlinux, ElfHeaderSize + naca, SEEK_SET); if (fwrite(inbuf, 0x18, 1, outputVmlinux) != 1) { - death("Could not write naca\n", outputVmlinux, argv[4]); + death("Could not write naca\n", outputVmlinux, out_name); } printf("Ram Disk of 0x%lx pages is attached to the kernel at offset 0x%08x\n", ramPages, ramStartOffs); @@ -293,7 +297,7 @@ int main(int argc, char **argv) /* Done */ fclose(outputVmlinux); /* Set permission to executable */ - chmod(argv[4], S_IRUSR|S_IWUSR|S_IXUSR|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH); + chmod(out_name, S_IRUSR|S_IWUSR|S_IXUSR|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH); return 0; } -- short story of a lazy sysadmin: alias appserv=wotan From olh at suse.de Thu Nov 10 06:53:43 2005 From: olh at suse.de (Olaf Hering) Date: Wed, 9 Nov 2005 20:53:43 +0100 Subject: [PATCH] ppc64 boot: remove sysmap from required filenames In-Reply-To: <20051109195220.GB31658@suse.de> References: <20051109195103.GA31658@suse.de> <20051109195220.GB31658@suse.de> Message-ID: <20051109195343.GC31658@suse.de> A stripped vmlinux does not contain enough symbols to recreate the System.map. The System.map file is only used to determine the end of the runtime memory size. This is the same value (rounded up to PAGE_SIZE) as ->memsiz in the ELF program header. Also, the target vmlinux.initrd doesnt work in 2.6.14: arch/ppc64/boot/addRamDisk arch/ppc64/boot/ramdisk.image.gz vmlinux.strip arch/ppc64/boot/vmlinux.initrd Name of vmlinux output file missing. 
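The rounding described above (the ELF program header's memory size rounded up to the page size) can be sketched in isolation. This is only an illustration under assumptions: the 4096-byte page size matches what addRamDisk.c hard-codes, `ALIGN_UP` mirrors the `_ALIGN_UP` macro the patch introduces, and the helper names `end_offset` and `pad_pages` are hypothetical and do not appear in the patch.

```c
#include <assert.h>

/* Same rounding idiom as _ALIGN_UP in the patch; size must be a power of two. */
#define ALIGN_UP(addr, size) (((addr) + ((size) - 1)) & ~((size) - 1))

/* Assumed page size for this sketch; addRamDisk.c hard-codes 4096 as well. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical helper: round the segment's memory size up to a page boundary. */
static unsigned long end_offset(unsigned long memsz)
{
	/* The mask trick only works for power-of-two sizes. */
	assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);
	return ALIGN_UP(memsz, SKETCH_PAGE_SIZE);
}

/*
 * Hypothetical helper: number of pad pages between the kernel pages
 * already copied and the start of the RAM disk.
 */
static long pad_pages(unsigned long memsz, unsigned long kernel_pages)
{
	return (long)(end_offset(memsz) / SKETCH_PAGE_SIZE) - (long)kernel_pages;
}
```

With a made-up memory size of 0x5a1234 the end offset becomes 0x5a2000, and a value that is already page-aligned is left unchanged.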
Signed-off-by: Olaf Hering arch/ppc64/boot/addRamDisk.c | 131 ++++++++++++++++++++++--------------------- 1 files changed, 69 insertions(+), 62 deletions(-) Index: linux-2.6.14/arch/ppc64/boot/addRamDisk.c =================================================================== --- linux-2.6.14.orig/arch/ppc64/boot/addRamDisk.c +++ linux-2.6.14/arch/ppc64/boot/addRamDisk.c @@ -5,11 +5,59 @@ #include #include #include +#include #define ElfHeaderSize (64 * 1024) #define ElfPages (ElfHeaderSize / 4096) #define KERNELBASE (0xc000000000000000) +#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) +struct addr_range { + unsigned long long addr; + unsigned long memsize; + unsigned long offset; +}; + +static int check_elf64(void *p, int size, struct addr_range *r) +{ + Elf64_Ehdr *elf64 = p; + Elf64_Phdr *elf64ph; + + if (elf64->e_ident[EI_MAG0] != ELFMAG0 || + elf64->e_ident[EI_MAG1] != ELFMAG1 || + elf64->e_ident[EI_MAG2] != ELFMAG2 || + elf64->e_ident[EI_MAG3] != ELFMAG3 || + elf64->e_ident[EI_CLASS] != ELFCLASS64 || + elf64->e_ident[EI_DATA] != ELFDATA2MSB || + elf64->e_type != ET_EXEC || elf64->e_machine != EM_PPC64) + return 0; + + if ((elf64->e_phoff + sizeof(Elf64_Phdr)) > size) + return 0; + + elf64ph = (Elf64_Phdr *) ((unsigned long)elf64 + + (unsigned long)elf64->e_phoff); + + r->memsize = (unsigned long)elf64ph->p_memsz; + r->offset = (unsigned long)elf64ph->p_offset; + r->addr = (unsigned long long)elf64ph->p_vaddr; + +#ifdef DEBUG + printf("PPC64 ELF file, ph:\n"); + printf("p_type 0x%08x\n", elf64ph->p_type); + printf("p_flags 0x%08x\n", elf64ph->p_flags); + printf("p_offset 0x%016llx\n", elf64ph->p_offset); + printf("p_vaddr 0x%016llx\n", elf64ph->p_vaddr); + printf("p_paddr 0x%016llx\n", elf64ph->p_paddr); + printf("p_filesz 0x%016llx\n", elf64ph->p_filesz); + printf("p_memsz 0x%016llx\n", elf64ph->p_memsz); + printf("p_align 0x%016llx\n", elf64ph->p_align); + printf("... 
skipping 0x%08lx bytes of ELF header\n", + (unsigned long)elf64ph->p_offset); +#endif + + return 64; +} void get4k(FILE *file, char *buf ) { unsigned j; @@ -34,21 +82,17 @@ void death(const char *msg, FILE *fdesc, int main(int argc, char **argv) { char inbuf[4096]; + struct addr_range vmlinux; FILE *ramDisk; - FILE *sysmap; FILE *inputVmlinux; FILE *outputVmlinux; char *rd_name, *lx_name, *out_name; - unsigned i; + + size_t i; unsigned long ramFileLen; unsigned long ramLen; unsigned long roundR; - - unsigned long sysmapFileLen; - unsigned long sysmapLen; - unsigned long sysmapPages; - char *ptr_end; unsigned long offset_end; unsigned long kernelLen; @@ -70,24 +114,19 @@ int main(int argc, char **argv) fprintf(stderr, "Name of RAM disk file missing.\n"); exit(1); } - rd_name = argv[1] + rd_name = argv[1]; if (argc < 3) { - fprintf(stderr, "Name of System Map input file is missing.\n"); - exit(1); - } - - if (argc < 4) { fprintf(stderr, "Name of vmlinux file missing.\n"); exit(1); } - lx_name = argv[3]; + lx_name = argv[2]; - if (argc < 5) { + if (argc < 4) { fprintf(stderr, "Name of vmlinux output file missing.\n"); exit(1); } - out_name = argv[4]; + out_name = argv[3]; ramDisk = fopen(rd_name, "r"); @@ -96,12 +135,6 @@ int main(int argc, char **argv) exit(1); } - sysmap = fopen(argv[2], "r"); - if ( ! sysmap ) { - fprintf(stderr, "System Map file \"%s\" failed to open.\n", argv[2]); - exit(1); - } - inputVmlinux = fopen(lx_name, "r"); if ( ! 
inputVmlinux ) { fprintf(stderr, "vmlinux file \"%s\" failed to open.\n", lx_name); @@ -113,18 +146,24 @@ int main(int argc, char **argv) fprintf(stderr, "output vmlinux file \"%s\" failed to open.\n", out_name); exit(1); } - - - + + i = fread(inbuf, 1, sizeof(inbuf), inputVmlinux); + if (i != sizeof(inbuf)) { + fprintf(stderr, "can not read vmlinux file %s: %u\n", lx_name, i); + exit(1); + } + + i = check_elf64(inbuf, sizeof(inbuf), &vmlinux); + if (i == 0) { + fprintf(stderr, "You must have a linux kernel specified as argv[2]\n"); + exit(1); + } + /* Input Vmlinux file */ fseek(inputVmlinux, 0, SEEK_END); kernelLen = ftell(inputVmlinux); fseek(inputVmlinux, 0, SEEK_SET); printf("kernel file size = %d\n", kernelLen); - if ( kernelLen == 0 ) { - fprintf(stderr, "You must have a linux kernel specified as argv[3]\n"); - exit(1); - } actualKernelLen = kernelLen - ElfHeaderSize; @@ -138,39 +177,7 @@ int main(int argc, char **argv) roundedKernelPages = roundedKernelLen / 4096; printf("Vmlinux pages to copy = %ld/0x%lx \n", roundedKernelPages, roundedKernelPages); - - - /* Input System Map file */ - /* (needs to be processed simply to determine if we need to add pad pages due to the static variables not being included in the vmlinux) */ - fseek(sysmap, 0, SEEK_END); - sysmapFileLen = ftell(sysmap); - fseek(sysmap, 0, SEEK_SET); - printf("%s file size = %ld/0x%lx \n", argv[2], sysmapFileLen, sysmapFileLen); - - sysmapLen = sysmapFileLen; - - roundR = 4096 - (sysmapLen % 4096); - if (roundR) { - printf("Rounding System Map file up to a multiple of 4096, adding %ld/0x%lx \n", roundR, roundR); - sysmapLen += roundR; - } - printf("Rounded System Map size is %ld/0x%lx \n", sysmapLen, sysmapLen); - - /* Process the Sysmap file to determine where _end is */ - sysmapPages = sysmapLen / 4096; - /* read the whole file line by line, expect that it doesn't fail */ - while ( fgets(inbuf, 4096, sysmap) ) ; - /* search for _end in the last page of the system map */ - ptr_end = 
strstr(inbuf, " _end"); - if (!ptr_end) { - fprintf(stderr, "Unable to find _end in the sysmap file \n"); - fprintf(stderr, "inbuf: \n"); - fprintf(stderr, "%s \n", inbuf); - exit(1); - } - printf("Found _end in the last page of the sysmap - backing up 10 characters it looks like %s", ptr_end-10); - /* convert address of _end in system map to hex offset. */ - offset_end = (unsigned int)strtol(ptr_end-10, NULL, 16); + offset_end = _ALIGN_UP(vmlinux.memsize, 4096); /* calc how many pages we need to insert between the vmlinux and the start of the ram disk */ padPages = offset_end/4096 - roundedKernelPages; -- short story of a lazy sysadmin: alias appserv=wotan From olh at suse.de Thu Nov 10 06:54:43 2005 From: olh at suse.de (Olaf Hering) Date: Wed, 9 Nov 2005 20:54:43 +0100 Subject: [PATCH] ppc64 boot: fix compile warnings In-Reply-To: <20051109195343.GC31658@suse.de> References: <20051109195103.GA31658@suse.de> <20051109195220.GB31658@suse.de> <20051109195343.GC31658@suse.de> Message-ID: <20051109195443.GD31658@suse.de> Fix a few compile warnings arch/ppc64/boot/addRamDisk.c:166: warning: int format, long unsigned int arg (arg 2) arch/ppc64/boot/addRamDisk.c:170: warning: int format, long unsigned int arg (arg 2) arch/ppc64/boot/addRamDisk.c:265: warning: unsigned int format, long unsigned int arg (arg 2) arch/ppc64/boot/addRamDisk.c:302: warning: unsigned int format, long unsigned int arg (arg 3) Signed-off-by: Olaf Hering arch/ppc64/boot/addRamDisk.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) Index: linux-2.6.14/arch/ppc64/boot/addRamDisk.c =================================================================== --- linux-2.6.14.orig/arch/ppc64/boot/addRamDisk.c +++ linux-2.6.14/arch/ppc64/boot/addRamDisk.c @@ -163,11 +163,11 @@ int main(int argc, char **argv) fseek(inputVmlinux, 0, SEEK_END); kernelLen = ftell(inputVmlinux); fseek(inputVmlinux, 0, SEEK_SET); - printf("kernel file size = %d\n", kernelLen); + printf("kernel file size = %lu\n", 
kernelLen); actualKernelLen = kernelLen - ElfHeaderSize; - printf("actual kernel length (minus ELF header) = %d\n", actualKernelLen); + printf("actual kernel length (minus ELF header) = %lu\n", actualKernelLen); round = actualKernelLen % 4096; roundedKernelLen = actualKernelLen; @@ -262,7 +262,7 @@ int main(int argc, char **argv) death("Could not read hvReleaseData pointer\n", outputVmlinux, out_name); } hvReleaseData = ntohl(hvReleaseData); /* Convert to native int */ - printf("hvReleaseData is at %08x\n", hvReleaseData); + printf("hvReleaseData is at %08lx\n", hvReleaseData); /* fseek to the hvReleaseData */ fseek(outputVmlinux, ElfHeaderSize + hvReleaseData, SEEK_SET); @@ -298,7 +298,7 @@ int main(int argc, char **argv) if (fwrite(inbuf, 0x18, 1, outputVmlinux) != 1) { death("Could not write naca\n", outputVmlinux, out_name); } - printf("Ram Disk of 0x%lx pages is attached to the kernel at offset 0x%08x\n", + printf("Ram Disk of 0x%lx pages is attached to the kernel at offset 0x%08lx\n", ramPages, ramStartOffs); /* Done */ -- short story of a lazy sysadmin: alias appserv=wotan From linas at austin.ibm.com Thu Nov 10 07:02:13 2005 From: linas at austin.ibm.com (linas) Date: Wed, 9 Nov 2005 14:02:13 -0600 Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64 In-Reply-To: <20051109194721.GD11895@Memoria.zivexott.local> References: <1131477245.7855.17.camel@Memoria.anyarch.net> <200511091755.15114.arnd@arndb.de> <20051109170829.GB11895@Memoria.zivexott.local> <200511091832.37094.arnd@arndb.de> <20051109192743.GQ19593@austin.ibm.com> <20051109194721.GD11895@Memoria.zivexott.local> Message-ID: <20051109200213.GT19593@austin.ibm.com> On Wed, Nov 09, 2005 at 02:47:21PM -0500, Daniel Ostrow was heard to remark: > > > > > > > +NATIVE64 := $(call cc-option-yn, -m64) > > > > +ifeq ($(NATIVE64),n) > > > > +CROSS_COMPILE := powerpc64-unknown-linux-gnu- > > > > +endif > > > > > > But now you're using powerpc64-unknown-linux-gnu- instead > > > of 
powerpc64-linux-. I don't know which one is more widely > > > used, Well, if everyone else has gone "native", then the reasonable default is to pick the most popular "non-native" usage. --linas From kravetz at us.ibm.com Thu Nov 10 07:17:20 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Wed, 9 Nov 2005 12:17:20 -0800 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <20051109172125.GA12861@lst.de> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> Message-ID: <20051109201720.GB5443@w-mikek2.ibm.com> On Wed, Nov 09, 2005 at 06:21:25PM +0100, Christoph Hellwig wrote: > Booting current mainline with 64K pagesize enabled gives me a purple (!) > screen early during boot. I seem to also be having problems with this patch. My OpenPOWER 720 stopped booting with 2.6.14-git10(and later). Just using defconfig. 64k page size NOT enabled. If I back out the 64k page size patch, 2.6.14-git10 boots. I'm trying to get more info but it is painful. It dies before xmon is initialized. I could have sworn that I booted 2.6.14-git7 with the 64k page size patch applied. But, I can't do that now either. Some co-workers have successfully booted other POWER systems with these kernels. So, it must be specific to my hardware/LPAR configuration. -- Mike From benh at kernel.crashing.org Thu Nov 10 07:32:15 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 10 Nov 2005 07:32:15 +1100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <20051109201720.GB5443@w-mikek2.ibm.com> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> <20051109201720.GB5443@w-mikek2.ibm.com> Message-ID: <1131568336.24637.91.camel@gaston> On Wed, 2005-11-09 at 12:17 -0800, Mike Kravetz wrote: > On Wed, Nov 09, 2005 at 06:21:25PM +0100, Christoph Hellwig wrote: > > Booting current mainline with 64K pagesize enabled gives me a purple (!) 
> > screen early during boot. > > I seem to also be having problems with this patch. My OpenPOWER 720 > stopped booting with 2.6.14-git10(and later). Just using defconfig. > 64k page size NOT enabled. If I back out the 64k page size patch, > 2.6.14-git10 boots. I'm trying to get more info but it is painful. > It dies before xmon is initialized. There have been a couple of fixes, try the very latest git. Also, try enabling early debug in arch/ppc64/kernel/setup.c > I could have sworn that I booted 2.6.14-git7 with the 64k page size > patch applied. But, I can't do that now either. > > Some co-workers have successfully booted other POWER systems with these > kernels. So, it must be specific to my hardware/LPAR configuration. Ok, i'll do more tests here too. Ben. From benh at kernel.crashing.org Thu Nov 10 07:36:29 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 10 Nov 2005 07:36:29 +1100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <20051109172125.GA12861@lst.de> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> Message-ID: <1131568589.24637.93.camel@gaston> On Wed, 2005-11-09 at 18:21 +0100, Christoph Hellwig wrote: > Booting current mainline with 64K pagesize enabled gives me a purple (!) > screen early during boot. On the G5 ? Weird... I'll test. Ben. 
From benh at kernel.crashing.org Thu Nov 10 08:08:06 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 10 Nov 2005 08:08:06 +1100 Subject: 2.6.14-mm1 doesnt bootup on PPC64 In-Reply-To: <20051108201532.GV19593@austin.ibm.com> References: <20051107132201.GA13514@in.ibm.com> <20051107134633.GS26395@bubble.grove.modra.org> <20051108062854.GB13514@in.ibm.com> <20051108201532.GV19593@austin.ibm.com> Message-ID: <1131570486.24637.101.camel@gaston> On Tue, 2005-11-08 at 14:15 -0600, linas wrote: > On Tue, Nov 08, 2005 at 11:58:54AM +0530, Srivatsa Vaddagiri was heard to remark: > > On Tue, Nov 08, 2005 at 12:16:33AM +1030, Alan Modra wrote: > > > Compiler version? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24644 > > > might be relevant. > > > > I tried with both gcc 3.4.3 (RHEL 4) and gcc 3.3.3 (SLES 9). Both result > > in a non-booting kernel. This is with ppc64's 'defconfig' btw. > > Not the only problem. > > -- If pci hotplug for ppc64 is enabled, the kernel won't compile, > misc symbols missing related to of_pci_whatever() > > (I don't have the messages in front of me; I notified John Rose) > > -- w/ hotplug disabled, the kernel boots, but spews a bunch of these: Does pci_device_add() in probe.c still start with device_initialize()? If not, then that is the problem. We need to move that to the ppc code, I suppose. Ben.
From schwab at suse.de Thu Nov 10 08:53:13 2005 From: schwab at suse.de (Andreas Schwab) Date: Wed, 09 Nov 2005 22:53:13 +0100 Subject: typedefs and structs In-Reply-To: <20051109203954.GA3539@hockin.org> (thockin@hockin.org's message of "Wed, 9 Nov 2005 12:39:54 -0800") References: <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> <20051109193625.GA31889@hockin.org> <20051109193828.GR19593@austin.ibm.com> <20051109203954.GA3539@hockin.org> Message-ID: thockin at hockin.org writes: > Sigh, That's funny - I've written C++ code which has references as members > of objects. You absolutely *can* store a reference. You can _initialize_, but not _modify_ (reseat) it. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From bernd at firmix.at Thu Nov 10 09:00:50 2005 From: bernd at firmix.at (Bernd Petrovitsch) Date: Wed, 09 Nov 2005 23:00:50 +0100 Subject: typedefs and structs In-Reply-To: References: <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> <20051109193625.GA31889@hockin.org> <20051109193828.GR19593@austin.ibm.com> <20051109203954.GA3539@hockin.org> Message-ID: <1131573650.3258.2.camel@gimli.at.home> On Wed, 2005-11-09 at 22:53 +0100, Andreas Schwab wrote: > thockin at hockin.org writes: > > > Sigh, That's funny - I've written C++ code which has references as members > > of objects. You absolutely *can* store a reference. > > You can _initialize_, but not _modify_ (reseat) it. reset?
As in: ---- snip ---- struct x { struct y * const p; }; ---- snip ---- We assume that no one casts the "const" away. Bernd -- Firmix Software GmbH http://www.firmix.at/ mobil: +43 664 4416156 fax: +43 1 7890849-55 Embedded Linux Development and Services From benh at kernel.crashing.org Thu Nov 10 09:01:32 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 10 Nov 2005 09:01:32 +1100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1131573556.25354.1.camel@localhost.localdomain> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> <20051109201720.GB5443@w-mikek2.ibm.com> <1131568336.24637.91.camel@gaston> <1131573556.25354.1.camel@localhost.localdomain> Message-ID: <1131573693.24637.109.camel@gaston> > I didn't have any luck on 2.6.14-git12 either. > I tried 64k page support on my P570. > > Here are the console messages: What distro do you use in userland ? Some older glibc versions have a bug that cause issues with 64k pages, though it generally happens with login blowing up, not init ... Ben. From paulus at samba.org Thu Nov 10 09:15:11 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 10 Nov 2005 09:15:11 +1100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <20051109172125.GA12861@lst.de> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> Message-ID: <17266.29935.352768.742780@cargo.ozlabs.ibm.com> Christoph Hellwig writes: > Booting current mainline with 64K pagesize enabled gives me a purple (!) > screen early during boot. Cool! Is this on a G5, or what sort of machine? What .config are you using? Paul. 
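Picking up the `struct x { struct y * const p; };` example from the "typedefs and structs" thread above: a const pointer member is probably the closest plain-C spelling of a stored reference that can be initialized but not reseated. The sketch below is only an illustration under assumptions; the `value` field and the `bump` helper are made up for the example and come from nobody's actual code.

```c
#include <assert.h>

/* Made-up payload type for the sketch. */
struct y {
	int value;
};

/* p is a const pointer: it is set once at initialization and never reseated. */
struct x {
	struct y * const p;
};

/*
 * Hypothetical helper: modifies the object p points at.
 * Only the pointer itself is const, not the pointee.
 */
static void bump(struct x *holder)
{
	assert(holder && holder->p);
	holder->p->value += 1;
	/* holder->p = 0;  would be rejected: assignment of read-only member */
}
```

A holder would be set up with an initializer such as `struct x h = { &a };`, after which `h.p` keeps pointing at `a` for the lifetime of `h`, short of someone casting the const away.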
From vlobanov at speakeasy.net Thu Nov 10 03:22:15 2005 From: vlobanov at speakeasy.net (Vadim Lobanov) Date: Wed, 9 Nov 2005 08:22:15 -0800 (PST) Subject: typedefs and structs In-Reply-To: <20051109111640.757f399a@werewolf.auna.net> References: <20051107185621.GD19593@austin.ibm.com> <20051107190245.GA19707@kroah.com> <20051107193600.GE19593@austin.ibm.com> <20051107200257.GA22524@kroah.com> <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> Message-ID: On Wed, 9 Nov 2005, J.A. Magallon wrote: > On Tue, 8 Nov 2005 20:51:25 -0500, Kyle Moffett wrote: > > > > > Pass by value in C: > > do_some_stuff(arg1, arg2); > > > > Pass by reference in C: > > do_some_stuff(&arg1, &arg2); > > > > This is very obvious what it does. The compiler does type-checks to > > make sure you don't get it wrong. There are tools to check stack > > usage of functions too. This is inherently obvious what the code > > does without looking at a completely different file where the > > function is defined. > > > > > > Pass by value in C++: > > do_some_stuff(arg1, arg2); > > > > Pass by reference in C++: > > do_some_stuff(arg1, arg2); > > > > This is C++ being clever and hiding stuff from the programmer, which > > is Not Good(TM) for a kernel. C++ may be an excellent language for > > userspace programmers (I say "may" here because some disagree, > > including myself), however, many of the features are extremely > > problematic for a kernel. > > > > Why is it not good for kernel ? > You want to pass an struct to a function in the best way you can. > Reference just pases a pointer instead of copying, but you don't > realize. 
> If you want the funcion to be able to modify the struct, code it as > > void do_some_stuff(T& arg1,T& arg2) > > If you DO NOT want the funcion to be able to modify the struct, code it as > > void do_some_stuff(const T& arg1,const T& arg2) A diligent C programmer would write this as follows: void do_some_stuff (struct T * a, struct T * b); versus void do_more_stuff (const struct T * a, const struct T * b); So I don't see C++ winning at all here. > This is far better than in C,. because you get the benefits from > reference pass without the problems of accidental modification of > pointer contents. And get rid of arrows -> ;). > > If the function modifies the struct it should be obvious from its name, > not depending if you put an & in the call or not. > And you stop worrying about argument pass methods. I think I'll call this my rule #1: The moment you stop worrying about something is the moment it bites you in the butt. :-) Much firsthand experience. > The person who programs the function decides and can even change it without > you user even noticing. And if the caller is passing in something that's not meant to be modified, then the modification causes much badness. Happens with both languages, too. > And gcc does nice optimizations when you mix const& and inlining... As far as I know, nothing stops GCC from doing the exact same optimizations in the function prototypes given above. > > -- > J.A. 
Magallon \ Software is like sex: > werewolf!able!es \ It's better when it's free > Mandriva Linux release 2006.1 (Cooker) for i586 > Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1)) > -Vadim Lobanov From thockin at hockin.org Thu Nov 10 06:36:25 2005 From: thockin at hockin.org (thockin at hockin.org) Date: Wed, 9 Nov 2005 11:36:25 -0800 Subject: typedefs and structs In-Reply-To: <20051109192028.GP19593@austin.ibm.com> References: <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> Message-ID: <20051109193625.GA31889@hockin.org> On Wed, Nov 09, 2005 at 01:20:28PM -0600, linas wrote: > I guess the real point that I'd wanted to make, and seems > to have gotten lost, was that by avoiding using pointers, > you end up designing code in a very different way, and you > can find out that often/usually, you don't need structs > filled with a zoo of pointers. Umm, references are implemented as pointers. Instead of a "zoo of pointers" you have a "zoo of references". No functional difference. > Minimizing pointers is good: less ref counting is needed, > fewer mallocs are needed, fewer locks are needed > (because of local/private scope!!), and null pointer > deref errors are less likely. Not true at all! If you're storing references you absolutely still need reference counting. Allocating non-trivial things on the stack is a Bad Idea in kernel land.
From thockin at hockin.org Thu Nov 10 07:39:54 2005 From: thockin at hockin.org (thockin at hockin.org) Date: Wed, 9 Nov 2005 12:39:54 -0800 Subject: typedefs and structs In-Reply-To: <20051109193828.GR19593@austin.ibm.com> References: <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> <20051109193625.GA31889@hockin.org> <20051109193828.GR19593@austin.ibm.com> Message-ID: <20051109203954.GA3539@hockin.org> On Wed, Nov 09, 2005 at 01:38:28PM -0600, linas wrote: > On Wed, Nov 09, 2005 at 11:36:25AM -0800, thockin at hockin.org was heard to remark: > > Umm, references are implemented as pointers. Instead of a "zoo of > > pointers" you have a "zoo of references". No functional difference. > > Sigh. > > I think you are confusing references and pointers. By definition > you cannot "store a reference"; however, you can "dereference" > an object and store a pointer to it. Sigh, That's funny - I've written C++ code which has references as members of objects. You absolutely *can* store a reference. References are simply a syntactic simplification to eliminate the different pointer-dereference notation. If they make you think about a problem differently, that's fine, but they are really just pointers in disguise. 
From linux-os at analogic.com Thu Nov 10 07:26:10 2005 From: linux-os at analogic.com (linux-os (Dick Johnson)) Date: Wed, 9 Nov 2005 15:26:10 -0500 Subject: typedefs and structs In-Reply-To: <20051109192028.GP19593@austin.ibm.com> References: <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> Message-ID: On Wed, 9 Nov 2005, linas wrote: > On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark: >> On Wed, 9 Nov 2005, J.A. Magallon wrote: >> >>> void do_some_stuff(T& arg1,T& arg2) >> >> A diligent C programmer would write this as follows: >> void do_some_stuff (struct T * a, struct T * b); >> So I don't see C++ winning at all here. > > I guess the real point that I'd wanted to make, and seems > to have gotten lost, was that by avoiding using pointers, > you end up designing code in a very different way, and you > can find out that often/usually, you don't need structs > filled with a zoo of pointers. > But you can't avoid pointers unless you make your entire program have global scope. That may be great for performance, but a killer if for have any bugs. Procedures that get pointers to variables (including structures) are a way of isolating faults. Without them, you can't test the procedures in a working environment. Also, without pointers, you are severely limited on the kinds of libraries you can share. You certainly wouldn't want to compile an entire C runtime library into your code so that all the buffers have local scope. > Minimizing pointers is good: less ref counting is needed, > fewer mallocs are needed, fewer locks are needed > (because of local/private scope!!), and null pointer > deref errors are less likely. > No. 
Minimizing pointers should not be an objective. Properly using the components of your tool-set should be. This means that you use the correct access mode for various object types. "Correct" depends upon the instant context, not upon some company or personal rule. > There are even performance implications: on modern CPU's > there's a very long pipeline to memory (hundreds of cycles > for a cache miss! Really! Worse if you have run out of > TLB entries!). So walking a long linked list chasing > pointers can really really hurt performance. > Linked lists are some of the necessary elements when one doesn't know ahead of time the number of objects that must be manipulated. They are just programming tools. You use them when they are necessary. The fact that they use pointers to make the links is not relevant. > By using refs instead of pointers, it helps you focus > on the issue of "do I really need to store this pointer > somewhere? Will I really need it later, or can I be done > with it now?". > Huh? References (at the opcode level) are pointers. There is no difference whatsoever. For memory references, you Get: direct, direct+displacement, register_indirect, register_indirect+displacement. There isn't anything else. Some processors let you sum displacements over several registers. Nevertheless, that's all you have. Accessing a variable by reference is an old artifact of FORTRAN. It can be efficient if the architecture is flat and global so the compiler can substitute direct access. In other words, no parameter or pointer actually gets passed to the routine. The compiler just remembers what the parameters actually were and substitutes code to directly access the parameters. Not so in C++. With C++, "reference" is a user-shorthand where the compiler actually accesses variables with pointers. The rules don't prohibit C++ compilers from using FORTRAN- like conventions for passing-by-reference. It's just that nobody seems to do so. 
> I don't know if the idea of "using fewer pointers" can > actually be carried out in the kernel. For starters, > the stack is way too short to be able to put much on it. > > --linas > Cheers, Dick Johnson Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips). Warning : 98.36% of all statistics are fiction. . **************************************************************** The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors at analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them. Thank you. From matthew at wil.cx Thu Nov 10 07:55:14 2005 From: matthew at wil.cx (Matthew Wilcox) Date: Wed, 9 Nov 2005 13:55:14 -0700 Subject: typedefs and structs In-Reply-To: <20051109193828.GR19593@austin.ibm.com> References: <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> <20051109193625.GA31889@hockin.org> <20051109193828.GR19593@austin.ibm.com> Message-ID: <20051109205514.GA1736@parisc-linux.org> On Wed, Nov 09, 2005 at 01:38:28PM -0600, linas wrote: > On Wed, Nov 09, 2005 at 11:36:25AM -0800, thockin at hockin.org was heard to remark: > > On Wed, Nov 09, 2005 at 01:20:28PM -0600, linas wrote: SHUT UP! SHUT UP ALL OF YOU!! Or at least stop cc'ing linux-pci on this stupid wanking. Thanks. 
From vlobanov at speakeasy.net Thu Nov 10 08:43:10 2005 From: vlobanov at speakeasy.net (Vadim Lobanov) Date: Wed, 9 Nov 2005 13:43:10 -0800 (PST) Subject: typedefs and structs In-Reply-To: <20051109192028.GP19593@austin.ibm.com> References: <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> Message-ID: On Wed, 9 Nov 2005, linas wrote: > On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark: > > On Wed, 9 Nov 2005, J.A. Magallon wrote: > > > > > void do_some_stuff(T& arg1,T& arg2) > > > > A diligent C programmer would write this as follows: > > void do_some_stuff (struct T * a, struct T * b); > > So I don't see C++ winning at all here. > > I guess the real point that I'd wanted to make, and seems > to have gotten lost, was that by avoiding using pointers, > you end up designing code in a very different way, and you > can find out that often/usually, you don't need structs > filled with a zoo of pointers. > > Minimizing pointers is good: less ref counting is needed, > fewer mallocs are needed, fewer locks are needed > (because of local/private scope!!), and null pointer > deref errors are less likely. > > There are even performance implications: on modern CPU's > there's a very long pipeline to memory (hundreds of cycles > for a cache miss! Really! Worse if you have run out of > TLB entries!). So walking a long linked list chasing > pointers can really really hurt performance. > > By using refs instead of pointers, it helps you focus > on the issue of "do I really need to store this pointer > somewhere? Will I really need it later, or can I be done > with it now?". 
> > I don't know if the idea of "using fewer pointers" can > actually be carried out in the kernel. For starters, > the stack is way too short to be able to put much on it. I really see the two issues at hand as being very much orthogonal to each other. Namely, you put data on the stack when you need it in the local 'context' only, whereas you put data globally when it needs to be available globally. The C++ references are nothing more than syntactic sugar (and we all know what they say about that and semicolons) for pointers, and so I don't see how they would affect the choices at all. Choosing where the data goes should be done according to the data's lifetime, not the specifics of how functions are declared. > --linas > > -Vadim Lobanov From pbadari at us.ibm.com Thu Nov 10 08:59:16 2005 From: pbadari at us.ibm.com (Badari Pulavarty) Date: Wed, 09 Nov 2005 13:59:16 -0800 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1131568336.24637.91.camel@gaston> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> <20051109201720.GB5443@w-mikek2.ibm.com> <1131568336.24637.91.camel@gaston> Message-ID: <1131573556.25354.1.camel@localhost.localdomain> On Thu, 2005-11-10 at 07:32 +1100, Benjamin Herrenschmidt wrote: > On Wed, 2005-11-09 at 12:17 -0800, Mike Kravetz wrote: > > On Wed, Nov 09, 2005 at 06:21:25PM +0100, Christoph Hellwig wrote: > > > Booting current mainline with 64K pagesize enabled gives me a purple (!) > > > screen early during boot. > > > > I seem to also be having problems with this patch. My OpenPOWER 720 > > stopped booting with 2.6.14-git10(and later). Just using defconfig. > > 64k page size NOT enabled. If I back out the 64k page size patch, > > 2.6.14-git10 boots. I'm trying to get more info but it is painful. > > It dies before xmon is initialized. > > There have been a couple of fixes, try the very latest git. 
Also, try > enabling early debug in arch/ppc64/kernel/setup.c > > > I could have sworn that I booted 2.6.14-git7 with the 64k page size > > patch applied. But, I can't do that now either. > > > > Some co-workers have successfully booted other POWER systems with these > > kernels. So, it must be specific to my hardware/LPAR configuration. > > Ok, i'll do more tests here too. I didn't have any luck on 2.6.14-git12 either. I tried 64k page support on my P570. Here are the console messages: Thanks, Badari -------------- next part -------------- boot: 2614git12 Please wait, loading kernel... Elf32 kernel loaded... zImage starting: loaded at 0x00401a04 (sp: 0x019ffbe0) Allocating 0x845378 bytes for kernel ... gunzipping (0x1c00000 <- 0x407a04:0x63e99d)...done 0x6963d8 bytes OF stdout device is: /vdevice/vty at 30000000 Hypertas detected, assuming LPAR ! command line: root=/dev/sda3 selinux=0 elevator=cfq memory layout at init: memory_limit : 0000000000000000 (16 MB aligned) alloc_bottom : 0000000002460000 alloc_top : 0000000008000000 alloc_top_hi : 00000000ed000000 rmo_top : 0000000008000000 ram_top : 00000000ed000000 Looking for displays instantiating rtas at 0x00000000077c0000 ... done 0000000000000000 : boot cpu 0000000000000000 0000000000000002 : starting cpu hw idx 0000000000000002... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x0000000002470000 -> 0x00000000024711de Device tree struct 0x0000000002480000 -> 0x00000000024a0000 Calling quiesce ... returning from prom_init Page orders: linear mapping = 24, others = 12 Bogus initrd 00000000 00000000 firmware_features = 0x1ffd5f Partition configured for 4 cpus. 
Starting Linux PPC64 #1 SMP Wed Nov 9 10:51:26 PST 2005 ----------------------------------------------------- ppc64_pft_size = 0x1a ppc64_interrupt_controller = 0x2 systemcfg = 0xc000000000618a00 systemcfg->platform = 0x101 systemcfg->processorCount = 0x4 systemcfg->physicalMemorySize = 0xed000000 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x0000000000000000 htab_hash_mask = 0x7ffff ----------------------------------------------------- [boot]0100 MM Init [boot]0100 MM Init Done Linux version 2.6.14-git12 (root at elm3b157) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Wed Nov 9 10:51:26 PST 2005 [boot]0012 Setup Arch Node 0 Memory: 0x0-0xed000000 Syscall map setup, 240 32-bit and 219 64-bit syscalls No ramdisk, default root is /dev/sda2 EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 7168 bytes Using dedicated idle loop [boot]0015 Setup Done Built 1 zonelists Kernel command line: root=/dev/sda3 selinux=0 elevator=cfq [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 131072 bytes) time_init: decrementer frequency = 206.999000 MHz time_init: processor frequency = 1655.992000 MHz Page orders: linear mapping = 24, others = 12 Bogus initrd 00000000 00000000 firmware_features = 0x1ffd5f Partition configured for 4 cpus. 
Starting Linux PPC64 #1 SMP Wed Nov 9 10:51:26 PST 2005 ----------------------------------------------------- ppc64_pft_size = 0x1a ppc64_interrupt_controller = 0x2 systemcfg = 0xc000000000618a00 systemcfg->platform = 0x101 systemcfg->processorCount = 0x4 systemcfg->physicalMemorySize = 0xed000000 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x0000000000000000 htab_hash_mask = 0x7ffff ----------------------------------------------------- [boot]0100 MM Init [boot]0100 MM Init Done Linux version 2.6.14-git12 (root at elm3b157) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Wed Nov 9 10:51:26 PST 2005 [boot]0012 Setup Arch Node 0 Memory: 0x0-0xed000000 Syscall map setup, 240 32-bit and 219 64-bit syscalls No ramdisk, default root is /dev/sda2 EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 7168 bytes Using dedicated idle loop [boot]0015 Setup Done Built 1 zonelists Kernel command line: root=/dev/sda3 selinux=0 elevator=cfq [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 131072 bytes) time_init: decrementer frequency = 206.999000 MHz time_init: processor frequency = 1655.992000 MHz Console: colour dummy device 80x25 Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes) freeing bootmem node 0 Memory: 3851648k/3883008k available (4928k kernel code, 31360k reserved, 1728k data, 1748k bss, 320k init) Security Framework v1.0.0 initialized SELinux: Disabled at boot. Mount-cache hash table entries: 4096 softlockup thread 0 started up. Processor 1 found. softlockup thread 1 started up. Processor 2 found. softlockup thread 2 started up. Processor 3 found. Brought up 4 CPUs softlockup thread 3 started up. 
NET: Registered protocol family 16 PCI: Probing PCI hardware IOMMU table initialized, virtual merging disabled mapping IO 3fe00200000 -> d000080000000000, size: 100000 mapping IO 3fe00700000 -> d000080000100000, size: 100000 PCI: Probing PCI hardware done SCSI subsystem initialized i/pSeries Real Time Clock Driver v1.1 RTAS daemon started audit: initializing netlink socket (disabled) audit(1131572634.103:1): initialized Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 8192 (order 0, 65536 bytes) Initializing Cryptographic API io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) pci_hotplug: PCI Hot Plug PCI Core version: 0.5 rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1 rpaphp: Slot [0001:00:02.4](PCI location=U787E.001.AAA1978-P2-C1) registered rpaphp: Slot [0001:00:02.2](PCI location=U787E.001.AAA1978-P2-C2) registered rpaphp: Slot [0001:00:02.6](PCI location=U787E.001.AAA1978-P2-C3) registered HVSI: registered 0 devices Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled blk_queue_max_sectors: set to minimum 128 Floppy drive(s): fd0 is 2.88M RAMDISK driver initialized: 16 RAM disks of 123456K size 1024 blocksize loop: loaded (max 8 devices) Intel(R) PRO/1000 Network Driver - version 6.1.16-k2-NAPI Copyright (c) 1999-2005 Intel Corporation. 
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx PDC20275: IDE controller at PCI slot 0000:cc:01.0 PDC20275: chipset revision 1 PDC20275: 100% native mode on irq 134 ide2: BM-DMA at 0xdec00-0xdec07, BIOS settings: hde:pio, hdf:pio ide3: BM-DMA at 0xdec08-0xdec0f, BIOS settings: hdg:pio, hdh:pio hde: IBM DROM00205J1 H0, ATAPI CD/DVD-ROM drive ide2 at 0xde400-0xde407,0xddc02 on irq 134 hde: ATAPI 24X DVD-ROM drive, 256kB Cache Uniform CD-ROM driver Revision: 3.20 ipr: IBM Power RAID SCSI Device Driver version: 2.0.14 (May 2, 2005) ipr 0000:d0:01.0: Found IOA with IRQ: 135 ipr 0000:d0:01.0: Starting IOA initialization sequence. ipr 0000:d0:01.0: Adapter firmware version: 020A005E ipr 0000:d0:01.0: IOA initialized. scsi0 : IBM 570B Storage Adapter Vendor: IBM H0 Model: HUS103014FL3800 Rev: RPQF Type: Direct-Access ANSI SCSI revision: 04 Vendor: IBM H0 Model: HUS103014FL3800 Rev: RPQF Type: Direct-Access ANSI SCSI revision: 04 Vendor: IBM Model: VSBPD4E2 U4SCSI Rev: 7134 Type: Enclosure ANSI SCSI revision: 02 scsi: unknown device type 31 Vendor: IBM Model: 570B001 Rev: 0150 Type: Unknown ANSI SCSI revision: 00 SCSI device sda: 286748000 512-byte hdwr sectors (146815 MB) SCSI device sda: drive cache: write through SCSI device sda: 286748000 512-byte hdwr sectors (146815 MB) SCSI device sda: drive cache: write through sda: sda1 sda2 sda3 sd 0:0:5:0: Attached scsi disk sda SCSI device sdb: 286748000 512-byte hdwr sectors (146815 MB) SCSI device sdb: drive cache: write through SCSI device sdb: 286748000 512-byte hdwr sectors (146815 MB) SCSI device sdb: drive cache: write through sdb: unknown partition table sd 0:0:8:0: Attached scsi disk sdb sd 0:0:5:0: Attached scsi generic sg0 type 0 sd 0:0:8:0: Attached scsi generic sg1 type 0 0:0:15:0: Attached scsi generic sg2 
type 13 0:255:255:255: Attached scsi generic sg3 type 31 mice: PS/2 mouse device common for all mice md: md driver 0.90.2 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: bitmap version 3.39 oprofile: using ppc64/power5 performance monitoring. NET: Registered protocol family 2 IP route cache hash table entries: 524288 (order: 6, 4194304 bytes) TCP established hash table entries: 1048576 (order: 8, 16777216 bytes) TCP bind hash table entries: 65536 (order: 4, 1048576 bytes) TCP: Hash tables configured (established 1048576 bind 65536) TCP reno registered TCP bic registered NET: Registered protocol family 1 NET: Registered protocol family 17 NET: Registered protocol family 15 md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. ReiserFS: sda3: found reiserfs format "3.6" with standard journal ReiserFS: sda3: using ordered data mode ReiserFS: sda3: journal params: device sda3, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 ReiserFS: sda3: checking transaction log (sda3) ReiserFS: sda3: Using r5 hash to sort names VFS: Mounted root (reiserfs filesystem) readonly. Freeing unused kernel memory: 320k freed INIT: version 2.85 booting INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! sleeping for 30 seconds. INIT: PANIC: segmentation violation! 
sleeping for 30 seconds. From pbadari at us.ibm.com Thu Nov 10 09:07:31 2005 From: pbadari at us.ibm.com (Badari Pulavarty) Date: Wed, 09 Nov 2005 14:07:31 -0800 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1131573693.24637.109.camel@gaston> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> <20051109201720.GB5443@w-mikek2.ibm.com> <1131568336.24637.91.camel@gaston> <1131573556.25354.1.camel@localhost.localdomain> <1131573693.24637.109.camel@gaston> Message-ID: <1131574051.25354.3.camel@localhost.localdomain> On Thu, 2005-11-10 at 09:01 +1100, Benjamin Herrenschmidt wrote: > > I didn't have any luck on 2.6.14-git12 either. > > I tried 64k page support on my P570. > > > > Here are the console messages: > > What distro do you use in userland ? Some older glibc versions have a > bug that cause issues with 64k pages, though it generally happens with > login blowing up, not init ... SLES9 (could be SLES9 SP1). Thanks, Badari From vlobanov at speakeasy.net Thu Nov 10 09:12:38 2005 From: vlobanov at speakeasy.net (Vadim Lobanov) Date: Wed, 9 Nov 2005 14:12:38 -0800 (PST) Subject: typedefs and structs In-Reply-To: References: <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> Message-ID: On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote: > > On Wed, 9 Nov 2005, linas wrote: > > > On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark: > >> On Wed, 9 Nov 2005, J.A. Magallon wrote: > >> > >>> void do_some_stuff(T& arg1,T& arg2) > >> > >> A diligent C programmer would write this as follows: > >> void do_some_stuff (struct T * a, struct T * b); > >> So I don't see C++ winning at all here. 
> > > > I guess the real point that I'd wanted to make, and seems > > to have gotten lost, was that by avoiding using pointers, > > you end up designing code in a very different way, and you > > can find out that often/usually, you don't need structs > > filled with a zoo of pointers. > > > > But you can't avoid pointers unless you make your entire > program have global scope. That may be great for performance, > but a killer if for have any bugs. Just to extract some useful technical knowledge from the current ongoing "flamewar"... I'm not entirely sure if the above statement regarding performance is correct. Some enlightenment would be appreciated. Suppose you have the following code: int myvar; void foo (void) { printf("%d\n", myvar); bar(); printf("%d\n", myvar); } If bar is declared in _another_ file as void bar (void); then I believe the compiler has to reread the global 'myvar' from memory for the second printf(). However, if the code is as follows: void foo (void) { int myvar = 0; printf("%d\n", myvar); bar(&myvar); printf("%d\n", myvar); } If bar is declared in _another_ file as void bar (const int * var); then I think the compiler can validly cache the value of 'myvar' for the second printf without re-reading it. Correct/incorrect? 
-Vadim Lobanov From linux-os at analogic.com Thu Nov 10 09:37:43 2005 From: linux-os at analogic.com (linux-os (Dick Johnson)) Date: Wed, 9 Nov 2005 17:37:43 -0500 Subject: typedefs and structs In-Reply-To: References: <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> Message-ID: On Wed, 9 Nov 2005, Vadim Lobanov wrote: > On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote: > >> >> On Wed, 9 Nov 2005, linas wrote: >> >>> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark: >>>> On Wed, 9 Nov 2005, J.A. Magallon wrote: >>>> >>>>> void do_some_stuff(T& arg1,T& arg2) >>>> >>>> A diligent C programmer would write this as follows: >>>> void do_some_stuff (struct T * a, struct T * b); >>>> So I don't see C++ winning at all here. >>> >>> I guess the real point that I'd wanted to make, and seems >>> to have gotten lost, was that by avoiding using pointers, >>> you end up designing code in a very different way, and you >>> can find out that often/usually, you don't need structs >>> filled with a zoo of pointers. >>> >> >> But you can't avoid pointers unless you make your entire >> program have global scope. That may be great for performance, >> but a killer if for have any bugs. > > Just to extract some useful technical knowledge from the current ongoing > "flamewar"... > I'm not entirely sure if the above statement regarding performance is > correct. Some enlightenment would be appreciated. 
> > Suppose you have the following code: > int myvar; > void foo (void) { > printf("%d\n", myvar); > bar(); > printf("%d\n", myvar); > } > If bar is declared in _another_ file as > void bar (void); > then I believe the compiler has to reread the global 'myvar' from memory > for the second printf(). > Correct because bar() could have modified (it's global). > However, if the code is as follows: > void foo (void) { > int myvar = 0; > printf("%d\n", myvar); > bar(&myvar); > printf("%d\n", myvar); > } > If bar is declared in _another_ file as > void bar (const int * var); > then I think the compiler can validly cache the value of 'myvar' for the > second printf without re-reading it. Correct/incorrect? > Maybe you tried to trick me by showing the variable was not going to be changed (const *). In that case, the compiler may not re-read the variable. However, it can re-read the variable. A "smart" compiler might just do: write(1, "0\n", 2); ... for the first printf() as well. Such compilers make debugging difficult. > -Vadim Lobanov > Cheers, Dick Johnson Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips). Warning : 98.36% of all statistics are fiction. .
From vlobanov at speakeasy.net Thu Nov 10 09:47:17 2005 From: vlobanov at speakeasy.net (Vadim Lobanov) Date: Wed, 9 Nov 2005 14:47:17 -0800 (PST) Subject: typedefs and structs In-Reply-To: References: <20051107204136.GG19593@austin.ibm.com> <1131412273.14381.142.camel@localhost.localdomain> <20051108232327.GA19593@austin.ibm.com> <20051109003048.GK19593@austin.ibm.com> <20051109004808.GM19593@austin.ibm.com> <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com> <20051109111640.757f399a@werewolf.auna.net> <20051109192028.GP19593@austin.ibm.com> Message-ID: Trimmed linux-pci so as not to annoy those who don't want to listen to all of this. Anyone else who wants off the CC list should yell also. :-) On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote: > > On Wed, 9 Nov 2005, Vadim Lobanov wrote: > > > On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote: > > > >> > >> On Wed, 9 Nov 2005, linas wrote: > >> > >>> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark: > >>>> On Wed, 9 Nov 2005, J.A. Magallon wrote: > >>>> > >>>>> void do_some_stuff(T& arg1,T& arg2) > >>>> > >>>> A diligent C programmer would write this as follows: > >>>> void do_some_stuff (struct T * a, struct T * b); > >>>> So I don't see C++ winning at all here. > >>> > >>> I guess the real point that I'd wanted to make, and seems > >>> to have gotten lost, was that by avoiding using pointers, > >>> you end up designing code in a very different way, and you > >>> can find out that often/usually, you don't need structs > >>> filled with a zoo of pointers. > >>> > >> > >> But you can't avoid pointers unless you make your entire > >> program have global scope. That may be great for performance, > >> but a killer if for have any bugs. > > > > Maybe you tried to trick me by showing the variable was not going > to be changed (const *). In that case, the compiler may not re-read > the variable. However, it can re-read the variable. 
> > A "smart" compiler might just do: write(1, "0\n", 2); > ... for the first printf() as well. Such compilers make > debugging difficult. It wasn't meant to be a trick. In fact, I _want_ the compiler to cache the myvar value in the second case -- I was merely wondering if there was some fact that I overlooked that would prevent such an optimization. The ultimate point of this was to show that globals can actually be slower than locals (imagine the int in the example replaced by a gigantic struct), contrary to the implication of your original statement. :-) > > Cheers, > Dick Johnson > Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips). > Warning : 98.36% of all statistics are fiction. > . > > **************************************************************** > The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors at analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them. > > Thank you. 
>

-Vadim Lobanov

From linas at austin.ibm.com Thu Nov 10 10:29:44 2005
From: linas at austin.ibm.com (linas)
Date: Wed, 9 Nov 2005 17:29:44 -0600
Subject: typedefs and structs
In-Reply-To:
References: <20051108232327.GA19593@austin.ibm.com>
 <20051109003048.GK19593@austin.ibm.com>
 <20051109004808.GM19593@austin.ibm.com>
 <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com>
 <20051109111640.757f399a@werewolf.auna.net>
 <20051109192028.GP19593@austin.ibm.com>
Message-ID: <20051109232944.GV19593@austin.ibm.com>

On Wed, Nov 09, 2005 at 03:26:10PM -0500, linux-os (Dick Johnson) was heard to remark:
>
> On Wed, 9 Nov 2005, linas wrote:
>
> > On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> >> On Wed, 9 Nov 2005, J.A. Magallon wrote:
> >>
> >>> void do_some_stuff(T& arg1,T& arg2)
> >>
> >> A diligent C programmer would write this as follows:
> >> void do_some_stuff (struct T * a, struct T * b);
> >> So I don't see C++ winning at all here.
> >
> > I guess the real point that I'd wanted to make, and seems
> > to have gotten lost, was that by avoiding using pointers,
> > you end up designing code in a very different way, and you
> > can find out that often/usually, you don't need structs
> > filled with a zoo of pointers.
>
> But you can't avoid pointers unless you make your entire
> program have global scope.

I didn't say you can avoid all pointers. I did say that for many
projects, one can often avoid many pointers. And I certainly did not
say that one needs global scope to do so. In fact, I said the opposite.

> Also, without pointers, you are severely limited on the kinds
> of libraries you can share.

I think you don't understand what a reference is. A reference is just
like a pointer, except that the signature is different. It has nothing
to do with the ability to create or use libraries, or to create/use
modular code.
I was trying to say that by focusing on the concept of a "reference"
as opposed to the concept of a "pointer", you can write code that is
*more* modular, not less.

> > Minimizing pointers is good: less ref counting is needed,
> > fewer mallocs are needed, fewer locks are needed
> > (because of local/private scope!!), and null pointer
> > deref errors are less likely.
>
> No. Minimizing pointers should not be an objective.

Why not? I've fixed hundreds of kernel bugs (which you don't see on
this list because my fixes mostly go to the distros or other users)
and nine out of ten of these are null-pointer derefs. Maybe I'm naive
for thinking that "fewer pointers == fewer pointer bugs" but, hey,
it's worth a shot.

> Properly
> using the components of your tool-set should be.

What tool set are you referring to? I am assuming that the code is
100% malleable: that one has complete authority to redesign the way
the system works, from the ground up. If you do not have this freedom,
but are forced to use someone else's tool set, then yes, you are SOL.
Furthermore, I would agree that mixing two different styles of coding
in one project can lead to some nasty, ugly code.

> > By using refs instead of pointers, it helps you focus
> > on the issue of "do I really need to store this pointer
> > somewhere? Will I really need it later, or can I be done
> > with it now?".
>
> Huh? References (at the opcode level) are pointers. There
> is no difference whatsoever.

Yes, that is right. I'm not talking about opcodes. That's not what
the conversation is about. What I am trying to say is that many people
design code in such a way that they need to store lots of pointers in
an assortment of structs. I wanted to emphasize that there are other
ways of designing code, which have smaller needs for pointers (and
that this can be done without losing modularity, testability,
debuggability, and it can be done without resorting to global
variables.)
--linas

From dthompson at linuxnetworx.com Thu Nov 10 09:54:08 2005
From: dthompson at linuxnetworx.com (doug thompson)
Date: Wed, 09 Nov 2005 15:54:08 -0700
Subject: typedefs and structs - trim request
In-Reply-To:
References: <20051107204136.GG19593@austin.ibm.com>
 <1131412273.14381.142.camel@localhost.localdomain>
 <20051108232327.GA19593@austin.ibm.com>
 <20051109003048.GK19593@austin.ibm.com>
 <20051109004808.GM19593@austin.ibm.com>
 <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com>
 <20051109111640.757f399a@werewolf.auna.net>
 <20051109192028.GP19593@austin.ibm.com>
Message-ID: <1131576848.31837.1.camel@logos.linuxnetworx.com>

Yes, trim off bluesmoke mailing list

thanks

doug thompson

> bluesmoke-devel mailing list
> bluesmoke-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bluesmoke-devel

From benh at kernel.crashing.org Thu Nov 10 10:42:32 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 10 Nov 2005 10:42:32 +1100
Subject: [PATCH] ppc64: 64K pages support
In-Reply-To: <20051109172125.GA12861@lst.de>
References: <1130915220.20136.14.camel@gaston>
 <1130916198.20136.17.camel@gaston>
 <20051109172125.GA12861@lst.de>
Message-ID: <1131579752.24637.117.camel@gaston>

On Wed, 2005-11-09 at 18:21 +0100, Christoph Hellwig wrote:
> Booting current mainline with 64K pagesize enabled gives me a purple (!)
> screen early during boot.

Do you use one of the nvidia fbdev's ? What if you disable it ?

(Also, rivafb has some funky bugs on my iMac G5, though nvidiafb works
fine with the latest fixes that are now in -git, but I haven't tried
with 64K pages enabled in the .config yet).

Ben.

From hollis at penguinppc.org Thu Nov 10 10:48:54 2005
From: hollis at penguinppc.org (Hollis Blanchard)
Date: Wed, 9 Nov 2005 17:48:54 -0600
Subject: [PATCH] ppc64 boot: remove argv usage
In-Reply-To: <20051109195220.GB31658@suse.de>
References: <20051109195103.GA31658@suse.de>
 <20051109195220.GB31658@suse.de>
Message-ID:

On Nov 9, 2005, at 1:52 PM, Olaf Hering wrote:

> @@ -69,6 +70,7 @@ int main(int argc, char **argv)
>  		fprintf(stderr, "Name of RAM disk file missing.\n");
>  		exit(1);
>  	}
> +	rd_name = argv[1]
>
>  	if (argc < 3) {
>  		fprintf(stderr, "Name of System Map input file is missing.\n");

A semicolon would probably be useful here. :)

-Hollis

From hollis at penguinppc.org Thu Nov 10 10:52:14 2005
From: hollis at penguinppc.org (Hollis Blanchard)
Date: Wed, 9 Nov 2005 17:52:14 -0600
Subject: [PATCH] ppc64 & powerpc: Check whether the native CC can use -m64
In-Reply-To: <200511091832.37094.arnd@arndb.de>
References: <1131477245.7855.17.camel@Memoria.anyarch.net>
 <200511091755.15114.arnd@arndb.de>
 <20051109170829.GB11895@Memoria.zivexott.local>
 <200511091832.37094.arnd@arndb.de>
Message-ID:

On Nov 9, 2005, at 11:32 AM, Arnd Bergmann wrote:

> On Middeweken 09 November 2005 18:08, Daniel Ostrow wrote:
>
>> +NATIVE64 := $(call cc-option-yn, -m64)
>> +ifeq ($(NATIVE64),n)
>> +CROSS_COMPILE := powerpc64-unknown-linux-gnu-
>> +endif
>> +
>
> But now you're using powerpc64-unknown-linux-gnu- instead
> of powerpc64-linux-. I don't know which one is more widely
> used, but at least the debian packages I'm using have
> 'powerpc64-linux' only.

Crosstool produces the longer name.
-Hollis

From linas at austin.ibm.com Thu Nov 10 11:27:47 2005
From: linas at austin.ibm.com (linas)
Date: Wed, 9 Nov 2005 18:27:47 -0600
Subject: typedefs and structs
In-Reply-To:
References: <20051108232327.GA19593@austin.ibm.com>
 <20051109003048.GK19593@austin.ibm.com>
 <20051109004808.GM19593@austin.ibm.com>
 <19255C96-8B64-4615-A3A7-9E5A850DE398@mac.com>
 <20051109111640.757f399a@werewolf.auna.net>
 <20051109192028.GP19593@austin.ibm.com>
Message-ID: <20051110002746.GW19593@austin.ibm.com>

On Wed, Nov 09, 2005 at 01:43:10PM -0800, Vadim Lobanov was heard to remark:
> On Wed, 9 Nov 2005, linas wrote:
>
> > I guess the real point that I'd wanted to make, and seems
> > to have gotten lost, was that by avoiding using pointers,
> > you end up designing code in a very different way, and you
> > can find out that often/usually, you don't need structs
> > filled with a zoo of pointers.
> >
> > Minimizing pointers is good: less ref counting is needed,
> > fewer mallocs are needed, fewer locks are needed
> > (because of local/private scope!!), and null pointer
> > deref errors are less likely.
> >
> > There are even performance implications: on modern CPU's
> > there's a very long pipeline to memory (hundreds of cycles
> > for a cache miss! Really! Worse if you have run out of
> > TLB entries!). So walking a long linked list chasing
> > pointers can really really hurt performance.
> >
> > By using refs instead of pointers, it helps you focus
> > on the issue of "do I really need to store this pointer
> > somewhere? Will I really need it later, or can I be done
> > with it now?".
> >
> > I don't know if the idea of "using fewer pointers" can
> > actually be carried out in the kernel. For starters,
> > the stack is way too short to be able to put much on it.
>
> I really see the two issues at hand as being very much orthogonal to
> each other.

Yes. I accidentally linked them, see below.
> Namely, you put data on the stack when you need it in the local
> 'context' only, whereas you put data globally when it needs to be
> available globally.

Yes. But there's some flexibility.

> The C++ references are nothing more than syntactic
> sugar (and we all know what they say about that and semicolons) for
> pointers,

Yes.

> and so I don't see how they would affect the choices at all.
> Choosing where the data goes should be done according to the data's
> lifetime, not the specifics of how functions are declared.

My apologies for linking the idea of references to fewer pointers.
They're not linked, except in how I discovered them.

I once had a project (that used threads, so it was "kernel-like", in
that race conditions had to be dealt with). One day, for the hell of
it, I decided to create a struct and keep it on the stack, instead of
mallocing it. Since this struct was accessed only by a few small,
well-defined routines that did not keep any pointers to it, this
worked just fine. And skipping the malloc/free felt good, so I liked
it.

Then I thought that maybe I could push the idea, see how far I could
go. Well, of course, the code was filled with various objects, all of
which *seemed* to be (or seemed to need to be) long-lifetime objects.
And they all stored pointers to one another, since they all needed to
get access to one another at various points, for various reasons.

Well, I really wanted to alloc objects on stack, and so that forced me
to think about how to get rid of pointers (since a pointer to an
object on stack is deadly). And that forced me to think about
lifetime. And some of this thinking was quite hard.

But encouraged by some modest success at first, I found that I was
able to eliminate almost all the pointers, and almost all the mallocs
(maybe several dozen of each, scattered across maybe several dozen
structs). And I was flabbergasted, since the resulting program
actually got smaller in the process, and faster.
And the null-pointer derefs vanished.

Now, maybe this was specific to the project, and can't be replicated
elsewhere. But this was a communications daemon: it basically was a
pool of threads, each thread handling a long-lived, stateful "session"
of requests and responses from some remote server, and so while it's
not the kernel, that's a reasonably complex thing. I'm not crazy
enough to suggest that one could do the same thing in the Linux
kernel, since one probably can't, but now that we're here and all, it
does make me wonder.

FWIW, the two designs of the commo daemon were radically different;
things that were sliced one way got reworked to flow and be handled in
a completely different order. You can't just get rid of pointers with
some trivial restructuring; you have to figure out how not to need
them.

--linas

From david at gibson.dropbear.id.au Thu Nov 10 11:50:16 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Thu, 10 Nov 2005 11:50:16 +1100
Subject: powerpc: Merge cacheflush.h and cache.h
Message-ID: <20051110005016.GB17840@localhost.localdomain>

The ppc32 and ppc64 versions of cacheflush.h were almost identical.
The two versions of cache.h are fairly similar, except for a bunch of
register definitions in the ppc32 version which probably belong better
elsewhere. This patch, therefore, merges both headers. Notable
points:

- there are several functions in cacheflush.h which exist only
  on ppc32 or only on ppc64. These are handled by #ifdef for now, but
  these should probably be consolidated, along with the actual code
  behind them later.
- Confusingly, both ppc32 and ppc64 have a flush_dcache_range(), but
  they're subtly different: it uses dcbf on ppc32 and dcbst on ppc64;
  ppc64 has a flush_inval_dcache_range() which uses dcbf. These too
  should be merged and consolidated later.
- Also flush_dcache_range() was defined in cacheflush.h on ppc64, and
  in cache.h on ppc32.
  In the merged version it's in cacheflush.h
- On ppc32 flush_icache_range() is a normal function from misc.S. On
  ppc64, it was a wrapper, testing a feature bit before calling
  __flush_icache_range() which does the actual flush. This patch
  takes the ppc64 approach, which amounts to no change on ppc32, since
  CPU_FTR_COHERENT_ICACHE will never be set there, but does mean
  renaming flush_icache_range() to __flush_icache_range() in
  arch/ppc/kernel/misc.S and arch/powerpc/kernel/misc_32.S
- The PReP register info from asm-ppc/cache.h has moved to
  arch/ppc/platforms/prep_setup.c
- The 8xx register info from asm-ppc/cache.h has moved to a new
  asm-powerpc/reg_8xx.h, included from reg.h
- flush_dcache_all() was defined on ppc32 (only), but was never called
  (although it was exported). Thus this patch removes it from
  cacheflush.h and from ARCH=powerpc (misc_32.S) entirely. It's left
  in ARCH=ppc for now, with the prototype moved to ppc_ksyms.c.

Built for Walnut (ARCH=ppc), 32-bit multiplatform (pmac, CHRP and PReP
ARCH=ppc, pmac and CHRP ARCH=powerpc). Built and booted on POWER5
LPAR (ARCH=powerpc and ARCH=ppc64). Built for 32-bit powermac
(ARCH=ppc and ARCH=powerpc). Built and booted on G5 (ARCH=powerpc)

Signed-off-by: David Gibson

Index: working-2.6/include/asm-powerpc/cacheflush.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/include/asm-powerpc/cacheflush.h 2005-11-10 11:17:56.000000000 +1100
@@ -0,0 +1,68 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */ +#ifndef _ASM_POWERPC_CACHEFLUSH_H +#define _ASM_POWERPC_CACHEFLUSH_H + +#ifdef __KERNEL__ + +#include +#include + +/* + * No cache flushing is required when address mappings are changed, + * because the caches on PowerPCs are physically addressed. + */ +#define flush_cache_all() do { } while (0) +#define flush_cache_mm(mm) do { } while (0) +#define flush_cache_range(vma, start, end) do { } while (0) +#define flush_cache_page(vma, vmaddr, pfn) do { } while (0) +#define flush_icache_page(vma, page) do { } while (0) +#define flush_cache_vmap(start, end) do { } while (0) +#define flush_cache_vunmap(start, end) do { } while (0) + +extern void flush_dcache_page(struct page *page); +#define flush_dcache_mmap_lock(mapping) do { } while (0) +#define flush_dcache_mmap_unlock(mapping) do { } while (0) + +extern void __flush_icache_range(unsigned long, unsigned long); +static inline void flush_icache_range(unsigned long start, unsigned long stop) +{ + if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) + __flush_icache_range(start, stop); +} + +extern void flush_icache_user_range(struct vm_area_struct *vma, + struct page *page, unsigned long addr, + int len); +extern void __flush_dcache_icache(void *page_va); +extern void flush_dcache_icache_page(struct page *page); +#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE) +extern void __flush_dcache_icache_phys(unsigned long physaddr); +#endif /* CONFIG_PPC32 && !CONFIG_BOOKE */ + +extern void flush_dcache_range(unsigned long start, unsigned long stop); +#ifdef CONFIG_PPC32 +extern void clean_dcache_range(unsigned long start, unsigned long stop); +extern void invalidate_dcache_range(unsigned long start, unsigned long stop); +#endif /* CONFIG_PPC32 */ +#ifdef CONFIG_PPC64 +extern void flush_inval_dcache_range(unsigned long start, unsigned long stop); +extern void flush_dcache_phys_range(unsigned long start, unsigned long stop); +#endif + +#define copy_to_user_page(vma, page, vaddr, dst, src, len) \ + do { \ + memcpy(dst, src, 
len); \ + flush_icache_user_range(vma, page, vaddr, len); \ + } while (0) +#define copy_from_user_page(vma, page, vaddr, dst, src, len) \ + memcpy(dst, src, len) + + +#endif /* __KERNEL__ */ + +#endif /* _ASM_POWERPC_CACHEFLUSH_H */ Index: working-2.6/include/asm-ppc/cache.h =================================================================== --- working-2.6.orig/include/asm-ppc/cache.h 2005-11-08 10:57:23.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,84 +0,0 @@ -/* - * include/asm-ppc/cache.h - */ -#ifdef __KERNEL__ -#ifndef __ARCH_PPC_CACHE_H -#define __ARCH_PPC_CACHE_H - -#include - -/* bytes per L1 cache line */ -#if defined(CONFIG_8xx) || defined(CONFIG_403GCX) -#define L1_CACHE_SHIFT 4 -#define MAX_COPY_PREFETCH 1 -#elif defined(CONFIG_PPC64BRIDGE) -#define L1_CACHE_SHIFT 7 -#define MAX_COPY_PREFETCH 1 -#else -#define L1_CACHE_SHIFT 5 -#define MAX_COPY_PREFETCH 4 -#endif - -#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT) - -#define SMP_CACHE_BYTES L1_CACHE_BYTES -#define L1_CACHE_SHIFT_MAX 7 /* largest L1 which this arch supports */ - -#define L1_CACHE_ALIGN(x) (((x)+(L1_CACHE_BYTES-1))&~(L1_CACHE_BYTES-1)) -#define L1_CACHE_PAGES 8 - -#ifndef __ASSEMBLY__ -extern void clean_dcache_range(unsigned long start, unsigned long stop); -extern void flush_dcache_range(unsigned long start, unsigned long stop); -extern void invalidate_dcache_range(unsigned long start, unsigned long stop); -extern void flush_dcache_all(void); -#endif /* __ASSEMBLY__ */ - -/* prep registers for L2 */ -#define CACHECRBA 0x80000823 /* Cache configuration register address */ -#define L2CACHE_MASK 0x03 /* Mask for 2 L2 Cache bits */ -#define L2CACHE_512KB 0x00 /* 512KB */ -#define L2CACHE_256KB 0x01 /* 256KB */ -#define L2CACHE_1MB 0x02 /* 1MB */ -#define L2CACHE_NONE 0x03 /* NONE */ -#define L2CACHE_PARITY 0x08 /* Mask for L2 Cache Parity Protected bit */ - -#ifdef CONFIG_8xx -/* Cache control on the MPC8xx is provided through some additional - * special purpose 
registers. - */ -#define SPRN_IC_CST 560 /* Instruction cache control/status */ -#define SPRN_IC_ADR 561 /* Address needed for some commands */ -#define SPRN_IC_DAT 562 /* Read-only data register */ -#define SPRN_DC_CST 568 /* Data cache control/status */ -#define SPRN_DC_ADR 569 /* Address needed for some commands */ -#define SPRN_DC_DAT 570 /* Read-only data register */ - -/* Commands. Only the first few are available to the instruction cache. -*/ -#define IDC_ENABLE 0x02000000 /* Cache enable */ -#define IDC_DISABLE 0x04000000 /* Cache disable */ -#define IDC_LDLCK 0x06000000 /* Load and lock */ -#define IDC_UNLINE 0x08000000 /* Unlock line */ -#define IDC_UNALL 0x0a000000 /* Unlock all */ -#define IDC_INVALL 0x0c000000 /* Invalidate all */ - -#define DC_FLINE 0x0e000000 /* Flush data cache line */ -#define DC_SFWT 0x01000000 /* Set forced writethrough mode */ -#define DC_CFWT 0x03000000 /* Clear forced writethrough mode */ -#define DC_SLES 0x05000000 /* Set little endian swap mode */ -#define DC_CLES 0x07000000 /* Clear little endian swap mode */ - -/* Status. 
-*/ -#define IDC_ENABLED 0x80000000 /* Cache is enabled */ -#define IDC_CERR1 0x00200000 /* Cache error 1 */ -#define IDC_CERR2 0x00100000 /* Cache error 2 */ -#define IDC_CERR3 0x00080000 /* Cache error 3 */ - -#define DC_DFWT 0x40000000 /* Data cache is forced write through */ -#define DC_LES 0x20000000 /* Caches are little endian mode */ -#endif /* CONFIG_8xx */ - -#endif -#endif /* __KERNEL__ */ Index: working-2.6/include/asm-ppc/cacheflush.h =================================================================== --- working-2.6.orig/include/asm-ppc/cacheflush.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,49 +0,0 @@ -/* - * include/asm-ppc/cacheflush.h - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ -#ifdef __KERNEL__ -#ifndef _PPC_CACHEFLUSH_H -#define _PPC_CACHEFLUSH_H - -#include - -/* - * No cache flushing is required when address mappings are - * changed, because the caches on PowerPCs are physically - * addressed. -- paulus - * Also, when SMP we use the coherency (M) bit of the - * BATs and PTEs. 
-- Cort - */ -#define flush_cache_all() do { } while (0) -#define flush_cache_mm(mm) do { } while (0) -#define flush_cache_range(vma, a, b) do { } while (0) -#define flush_cache_page(vma, p, pfn) do { } while (0) -#define flush_icache_page(vma, page) do { } while (0) -#define flush_cache_vmap(start, end) do { } while (0) -#define flush_cache_vunmap(start, end) do { } while (0) - -extern void flush_dcache_page(struct page *page); -#define flush_dcache_mmap_lock(mapping) do { } while (0) -#define flush_dcache_mmap_unlock(mapping) do { } while (0) - -extern void flush_icache_range(unsigned long, unsigned long); -extern void flush_icache_user_range(struct vm_area_struct *vma, - struct page *page, unsigned long addr, int len); - -#define copy_to_user_page(vma, page, vaddr, dst, src, len) \ -do { memcpy(dst, src, len); \ - flush_icache_user_range(vma, page, vaddr, len); \ -} while (0) -#define copy_from_user_page(vma, page, vaddr, dst, src, len) \ - memcpy(dst, src, len) - -extern void __flush_dcache_icache(void *page_va); -extern void __flush_dcache_icache_phys(unsigned long physaddr); -extern void flush_dcache_icache_page(struct page *page); -#endif /* _PPC_CACHEFLUSH_H */ -#endif /* __KERNEL__ */ Index: working-2.6/include/asm-ppc64/cacheflush.h =================================================================== --- working-2.6.orig/include/asm-ppc64/cacheflush.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,48 +0,0 @@ -#ifndef _PPC64_CACHEFLUSH_H -#define _PPC64_CACHEFLUSH_H - -#include -#include - -/* - * No cache flushing is required when address mappings are - * changed, because the caches on PowerPCs are physically - * addressed. 
- */ -#define flush_cache_all() do { } while (0) -#define flush_cache_mm(mm) do { } while (0) -#define flush_cache_range(vma, start, end) do { } while (0) -#define flush_cache_page(vma, vmaddr, pfn) do { } while (0) -#define flush_icache_page(vma, page) do { } while (0) -#define flush_cache_vmap(start, end) do { } while (0) -#define flush_cache_vunmap(start, end) do { } while (0) - -extern void flush_dcache_page(struct page *page); -#define flush_dcache_mmap_lock(mapping) do { } while (0) -#define flush_dcache_mmap_unlock(mapping) do { } while (0) - -extern void __flush_icache_range(unsigned long, unsigned long); -extern void flush_icache_user_range(struct vm_area_struct *vma, - struct page *page, unsigned long addr, - int len); - -extern void flush_dcache_range(unsigned long start, unsigned long stop); -extern void flush_dcache_phys_range(unsigned long start, unsigned long stop); -extern void flush_inval_dcache_range(unsigned long start, unsigned long stop); - -#define copy_to_user_page(vma, page, vaddr, dst, src, len) \ -do { memcpy(dst, src, len); \ - flush_icache_user_range(vma, page, vaddr, len); \ -} while (0) -#define copy_from_user_page(vma, page, vaddr, dst, src, len) \ - memcpy(dst, src, len) - -extern void __flush_dcache_icache(void *page_va); - -static inline void flush_icache_range(unsigned long start, unsigned long stop) -{ - if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) - __flush_icache_range(start, stop); -} - -#endif /* _PPC64_CACHEFLUSH_H */ Index: working-2.6/arch/ppc/kernel/misc.S =================================================================== --- working-2.6.orig/arch/ppc/kernel/misc.S 2005-11-08 16:10:59.000000000 +1100 +++ working-2.6/arch/ppc/kernel/misc.S 2005-11-10 11:14:30.000000000 +1100 @@ -497,9 +497,9 @@ * and invalidate the corresponding instruction cache blocks. * This is a no-op on the 601. 
* - * flush_icache_range(unsigned long start, unsigned long stop) + * __flush_icache_range(unsigned long start, unsigned long stop) */ -_GLOBAL(flush_icache_range) +_GLOBAL(__flush_icache_range) BEGIN_FTR_SECTION blr /* for 601, do nothing */ END_FTR_SECTION_IFCLR(CPU_FTR_SPLIT_ID_CACHE) Index: working-2.6/include/asm-powerpc/cache.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/cache.h 2005-11-10 11:14:30.000000000 +1100 @@ -0,0 +1,40 @@ +#ifndef _ASM_POWERPC_CACHE_H +#define _ASM_POWERPC_CACHE_H + +#ifdef __KERNEL__ + +#include + +/* bytes per L1 cache line */ +#if defined(CONFIG_8xx) || defined(CONFIG_403GCX) +#define L1_CACHE_SHIFT 4 +#define MAX_COPY_PREFETCH 1 +#elif defined(CONFIG_PPC32) +#define L1_CACHE_SHIFT 5 +#define MAX_COPY_PREFETCH 4 +#else /* CONFIG_PPC64 */ +#define L1_CACHE_SHIFT 7 +#endif + +#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT) + +#define SMP_CACHE_BYTES L1_CACHE_BYTES +#define L1_CACHE_SHIFT_MAX 7 /* largest L1 which this arch supports */ + +#if defined(__powerpc64__) && !defined(__ASSEMBLY__) +struct ppc64_caches { + u32 dsize; /* L1 d-cache size */ + u32 dline_size; /* L1 d-cache line size */ + u32 log_dline_size; + u32 dlines_per_page; + u32 isize; /* L1 i-cache size */ + u32 iline_size; /* L1 i-cache line size */ + u32 log_iline_size; + u32 ilines_per_page; +}; + +extern struct ppc64_caches ppc64_caches; +#endif /* __powerpc64__ && ! 
__ASSEMBLY__ */ + +#endif /* __KERNEL__ */ +#endif /* _ASM_POWERPC_CACHE_H */ Index: working-2.6/arch/ppc/platforms/prep_setup.c =================================================================== --- working-2.6.orig/arch/ppc/platforms/prep_setup.c 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/ppc/platforms/prep_setup.c 2005-11-10 11:14:30.000000000 +1100 @@ -61,6 +61,15 @@ #include #include +/* prep registers for L2 */ +#define CACHECRBA 0x80000823 /* Cache configuration register address */ +#define L2CACHE_MASK 0x03 /* Mask for 2 L2 Cache bits */ +#define L2CACHE_512KB 0x00 /* 512KB */ +#define L2CACHE_256KB 0x01 /* 256KB */ +#define L2CACHE_1MB 0x02 /* 1MB */ +#define L2CACHE_NONE 0x03 /* NONE */ +#define L2CACHE_PARITY 0x08 /* Mask for L2 Cache Parity Protected bit */ + TODC_ALLOC(); unsigned char ucSystemType; Index: working-2.6/include/asm-powerpc/reg.h =================================================================== --- working-2.6.orig/include/asm-powerpc/reg.h 2005-11-08 16:10:59.000000000 +1100 +++ working-2.6/include/asm-powerpc/reg.h 2005-11-10 11:14:30.000000000 +1100 @@ -16,7 +16,11 @@ /* Pickup Book E specific registers. */ #if defined(CONFIG_BOOKE) || defined(CONFIG_40x) #include -#endif +#endif /* CONFIG_BOOKE || CONFIG_40x */ + +#ifdef CONFIG_8xx +#include +#endif /* CONFIG_8xx */ #define MSR_SF_LG 63 /* Enable 64 bit mode */ #define MSR_ISF_LG 61 /* Interrupt 64b mode valid on 630 */ Index: working-2.6/include/asm-powerpc/reg_8xx.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/reg_8xx.h 2005-11-10 11:14:30.000000000 +1100 @@ -0,0 +1,42 @@ +/* + * Contains register definitions common to PowerPC 8xx CPUs. Notice + */ +#ifndef _ASM_POWERPC_REG_8xx_H +#define _ASM_POWERPC_REG_8xx_H + +/* Cache control on the MPC8xx is provided through some additional + * special purpose registers. 
+ */ +#define SPRN_IC_CST 560 /* Instruction cache control/status */ +#define SPRN_IC_ADR 561 /* Address needed for some commands */ +#define SPRN_IC_DAT 562 /* Read-only data register */ +#define SPRN_DC_CST 568 /* Data cache control/status */ +#define SPRN_DC_ADR 569 /* Address needed for some commands */ +#define SPRN_DC_DAT 570 /* Read-only data register */ + +/* Commands. Only the first few are available to the instruction cache. +*/ +#define IDC_ENABLE 0x02000000 /* Cache enable */ +#define IDC_DISABLE 0x04000000 /* Cache disable */ +#define IDC_LDLCK 0x06000000 /* Load and lock */ +#define IDC_UNLINE 0x08000000 /* Unlock line */ +#define IDC_UNALL 0x0a000000 /* Unlock all */ +#define IDC_INVALL 0x0c000000 /* Invalidate all */ + +#define DC_FLINE 0x0e000000 /* Flush data cache line */ +#define DC_SFWT 0x01000000 /* Set forced writethrough mode */ +#define DC_CFWT 0x03000000 /* Clear forced writethrough mode */ +#define DC_SLES 0x05000000 /* Set little endian swap mode */ +#define DC_CLES 0x07000000 /* Clear little endian swap mode */ + +/* Status. +*/ +#define IDC_ENABLED 0x80000000 /* Cache is enabled */ +#define IDC_CERR1 0x00200000 /* Cache error 1 */ +#define IDC_CERR2 0x00100000 /* Cache error 2 */ +#define IDC_CERR3 0x00080000 /* Cache error 3 */ + +#define DC_DFWT 0x40000000 /* Data cache is forced write through */ +#define DC_LES 0x20000000 /* Caches are little endian mode */ + +#endif /* _ASM_POWERPC_REG_8xx_H */ Index: working-2.6/include/asm-ppc64/cache.h =================================================================== --- working-2.6.orig/include/asm-ppc64/cache.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,36 +0,0 @@ -/* - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ -#ifndef __ARCH_PPC64_CACHE_H -#define __ARCH_PPC64_CACHE_H - -#include - -/* bytes per L1 cache line */ -#define L1_CACHE_SHIFT 7 -#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT) - -#define SMP_CACHE_BYTES L1_CACHE_BYTES -#define L1_CACHE_SHIFT_MAX 7 /* largest L1 which this arch supports */ - -#ifndef __ASSEMBLY__ - -struct ppc64_caches { - u32 dsize; /* L1 d-cache size */ - u32 dline_size; /* L1 d-cache line size */ - u32 log_dline_size; - u32 dlines_per_page; - u32 isize; /* L1 i-cache size */ - u32 iline_size; /* L1 i-cache line size */ - u32 log_iline_size; - u32 ilines_per_page; -}; - -extern struct ppc64_caches ppc64_caches; - -#endif - -#endif Index: working-2.6/arch/powerpc/kernel/misc_32.S =================================================================== --- working-2.6.orig/arch/powerpc/kernel/misc_32.S 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/misc_32.S 2005-11-10 11:14:30.000000000 +1100 @@ -519,7 +519,7 @@ * * flush_icache_range(unsigned long start, unsigned long stop) */ -_GLOBAL(flush_icache_range) +_GLOBAL(__flush_icache_range) BEGIN_FTR_SECTION blr /* for 601, do nothing */ END_FTR_SECTION_IFCLR(CPU_FTR_SPLIT_ID_CACHE) @@ -607,27 +607,6 @@ sync /* wait for dcbi's to get to ram */ blr -#ifdef CONFIG_NOT_COHERENT_CACHE -/* - * 40x cores have 8K or 16K dcache and 32 byte line size. - * 44x has a 32K dcache and 32 byte line size. - * 8xx has 1, 2, 4, 8K variants. - * For now, cover the worst case of the 44x. - * Must be called with external interrupts disabled. - */ -#define CACHE_NWAYS 64 -#define CACHE_NLINES 16 - -_GLOBAL(flush_dcache_all) - li r4, (2 * CACHE_NWAYS * CACHE_NLINES) - mtctr r4 - lis r5, KERNELBASE@h -1: lwz r3, 0(r5) /* Load one word from every line */ - addi r5, r5, L1_CACHE_BYTES - bdnz 1b - blr -#endif /* CONFIG_NOT_COHERENT_CACHE */ - /* * Flush a particular page from the data cache to RAM.
* Note: this is necessary because the instruction cache does *not* Index: working-2.6/arch/ppc/kernel/ppc_ksyms.c =================================================================== --- working-2.6.orig/arch/ppc/kernel/ppc_ksyms.c 2005-11-10 11:14:30.000000000 +1100 +++ working-2.6/arch/ppc/kernel/ppc_ksyms.c 2005-11-10 11:14:30.000000000 +1100 @@ -176,6 +176,7 @@ #endif /* CONFIG_PCI */ #ifdef CONFIG_NOT_COHERENT_CACHE +extern void flush_dcache_all(void); EXPORT_SYMBOL(flush_dcache_all); #endif -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson From david at gibson.dropbear.id.au Thu Nov 10 12:42:17 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 10 Nov 2005 12:42:17 +1100 Subject: powerpc: Move scanlog.c to platforms/pseries Message-ID: <20051110014217.GC17840@localhost.localdomain> scanlog.c is only compiled on pSeries. Thus, this patch moves it to platforms/pseries. Built and booted on pSeries LPAR (ARCH=powerpc and ARCH=ppc64). Built for iSeries (ARCH=powerpc). Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/platforms/pseries/scanlog.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/platforms/pseries/scanlog.c 2005-11-10 11:54:13.000000000 +1100 @@ -0,0 +1,235 @@ +/* + * c 2001 PPC 64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * scan-log-data driver for PPC64 Todd Inglett + * + * When ppc64 hardware fails the service processor dumps internal state + * of the system. After a reboot the operating system can access a dump + * of this data using this driver. 
A dump exists if the device-tree + * /chosen/ibm,scan-log-data property exists. + * + * This driver exports /proc/ppc64/scan-log-dump which can be read. + * The driver supports only sequential reads. + * + * The driver looks at a write to the driver for the single word "reset". + * If given, the driver will reset the scanlog so the platform can free it. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MODULE_VERS "1.0" +#define MODULE_NAME "scanlog" + +/* Status returns from ibm,scan-log-dump */ +#define SCANLOG_COMPLETE 0 +#define SCANLOG_HWERROR -1 +#define SCANLOG_CONTINUE 1 + +#define DEBUG(A...) do { if (scanlog_debug) printk(KERN_ERR "scanlog: " A); } while (0) + +static int scanlog_debug; +static unsigned int ibm_scan_log_dump; /* RTAS token */ +static struct proc_dir_entry *proc_ppc64_scan_log_dump; /* The proc file */ + +static ssize_t scanlog_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + struct inode * inode = file->f_dentry->d_inode; + struct proc_dir_entry *dp; + unsigned int *data; + int status; + unsigned long len, off; + unsigned int wait_time; + + dp = PDE(inode); + data = (unsigned int *)dp->data; + + if (!data) { + printk(KERN_ERR "scanlog: read failed no data\n"); + return -EIO; + } + + if (count > RTAS_DATA_BUF_SIZE) + count = RTAS_DATA_BUF_SIZE; + + if (count < 1024) { + /* This is the min supported by this RTAS call. Rather + * than do all the buffering we insist the user code handle + * larger reads. As long as cp works... 
:) + */ + printk(KERN_ERR "scanlog: cannot perform a small read (%ld)\n", count); + return -EINVAL; + } + + if (!access_ok(VERIFY_WRITE, buf, count)) + return -EFAULT; + + for (;;) { + wait_time = 500; /* default wait if no data */ + spin_lock(&rtas_data_buf_lock); + memcpy(rtas_data_buf, data, RTAS_DATA_BUF_SIZE); + status = rtas_call(ibm_scan_log_dump, 2, 1, NULL, + (u32) __pa(rtas_data_buf), (u32) count); + memcpy(data, rtas_data_buf, RTAS_DATA_BUF_SIZE); + spin_unlock(&rtas_data_buf_lock); + + DEBUG("status=%d, data[0]=%x, data[1]=%x, data[2]=%x\n", + status, data[0], data[1], data[2]); + switch (status) { + case SCANLOG_COMPLETE: + DEBUG("hit eof\n"); + return 0; + case SCANLOG_HWERROR: + DEBUG("hardware error reading scan log data\n"); + return -EIO; + case SCANLOG_CONTINUE: + /* We may or may not have data yet */ + len = data[1]; + off = data[2]; + if (len > 0) { + if (copy_to_user(buf, ((char *)data)+off, len)) + return -EFAULT; + return len; + } + /* Break to sleep default time */ + break; + default: + if (status > 9900 && status <= 9905) { + wait_time = rtas_extended_busy_delay_time(status); + } else { + printk(KERN_ERR "scanlog: unknown error from rtas: %d\n", status); + return -EIO; + } + } + /* Apparently no data yet. Wait and try again. 
*/ + msleep_interruptible(wait_time); + } + /*NOTREACHED*/ +} + +static ssize_t scanlog_write(struct file * file, const char __user * buf, + size_t count, loff_t *ppos) +{ + char stkbuf[20]; + int status; + + if (count > 19) count = 19; + if (copy_from_user (stkbuf, buf, count)) { + return -EFAULT; + } + stkbuf[count] = 0; + + if (buf) { + if (strncmp(stkbuf, "reset", 5) == 0) { + DEBUG("reset scanlog\n"); + status = rtas_call(ibm_scan_log_dump, 2, 1, NULL, 0, 0); + DEBUG("rtas returns %d\n", status); + } else if (strncmp(stkbuf, "debugon", 7) == 0) { + printk(KERN_ERR "scanlog: debug on\n"); + scanlog_debug = 1; + } else if (strncmp(stkbuf, "debugoff", 8) == 0) { + printk(KERN_ERR "scanlog: debug off\n"); + scanlog_debug = 0; + } + } + return count; +} + +static int scanlog_open(struct inode * inode, struct file * file) +{ + struct proc_dir_entry *dp = PDE(inode); + unsigned int *data = (unsigned int *)dp->data; + + if (!data) { + printk(KERN_ERR "scanlog: open failed no data\n"); + return -EIO; + } + + if (data[0] != 0) { + /* This imperfect test stops a second copy of the + * data (or a reset while data is being copied) + */ + return -EBUSY; + } + + data[0] = 0; /* re-init so we restart the scan */ + + return 0; +} + +static int scanlog_release(struct inode * inode, struct file * file) +{ + struct proc_dir_entry *dp = PDE(inode); + unsigned int *data = (unsigned int *)dp->data; + + if (!data) { + printk(KERN_ERR "scanlog: release failed no data\n"); + return -EIO; + } + data[0] = 0; + + return 0; +} + +struct file_operations scanlog_fops = { + .owner = THIS_MODULE, + .read = scanlog_read, + .write = scanlog_write, + .open = scanlog_open, + .release = scanlog_release, +}; + +int __init scanlog_init(void) +{ + struct proc_dir_entry *ent; + + ibm_scan_log_dump = rtas_token("ibm,scan-log-dump"); + if (ibm_scan_log_dump == RTAS_UNKNOWN_SERVICE) { + printk(KERN_ERR "scan-log-dump not implemented on this system\n"); + return -EIO; + } + + ent = 
create_proc_entry("ppc64/rtas/scan-log-dump", S_IRUSR, NULL); + if (ent) { + ent->proc_fops = &scanlog_fops; + /* Ideally we could allocate a buffer < 4G */ + ent->data = kmalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL); + if (!ent->data) { + printk(KERN_ERR "Failed to allocate a buffer\n"); + remove_proc_entry("scan-log-dump", ent->parent); + return -ENOMEM; + } + ((unsigned int *)ent->data)[0] = 0; + } else { + printk(KERN_ERR "Failed to create ppc64/scan-log-dump proc entry\n"); + return -EIO; + } + proc_ppc64_scan_log_dump = ent; + + return 0; +} + +void __exit scanlog_cleanup(void) +{ + if (proc_ppc64_scan_log_dump) { + kfree(proc_ppc64_scan_log_dump->data); + remove_proc_entry("scan-log-dump", proc_ppc64_scan_log_dump->parent); + } +} + +module_init(scanlog_init); +module_exit(scanlog_cleanup); +MODULE_LICENSE("GPL"); Index: working-2.6/arch/ppc64/kernel/scanlog.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/scanlog.c 2005-11-08 11:11:29.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,235 +0,0 @@ -/* - * c 2001 PPC 64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - * - * scan-log-data driver for PPC64 Todd Inglett - * - * When ppc64 hardware fails the service processor dumps internal state - * of the system. After a reboot the operating system can access a dump - * of this data using this driver. A dump exists if the device-tree - * /chosen/ibm,scan-log-data property exists. - * - * This driver exports /proc/ppc64/scan-log-dump which can be read. - * The driver supports only sequential reads. - * - * The driver looks at a write to the driver for the single word "reset". - * If given, the driver will reset the scanlog so the platform can free it. 
- */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#define MODULE_VERS "1.0" -#define MODULE_NAME "scanlog" - -/* Status returns from ibm,scan-log-dump */ -#define SCANLOG_COMPLETE 0 -#define SCANLOG_HWERROR -1 -#define SCANLOG_CONTINUE 1 - -#define DEBUG(A...) do { if (scanlog_debug) printk(KERN_ERR "scanlog: " A); } while (0) - -static int scanlog_debug; -static unsigned int ibm_scan_log_dump; /* RTAS token */ -static struct proc_dir_entry *proc_ppc64_scan_log_dump; /* The proc file */ - -static ssize_t scanlog_read(struct file *file, char __user *buf, - size_t count, loff_t *ppos) -{ - struct inode * inode = file->f_dentry->d_inode; - struct proc_dir_entry *dp; - unsigned int *data; - int status; - unsigned long len, off; - unsigned int wait_time; - - dp = PDE(inode); - data = (unsigned int *)dp->data; - - if (!data) { - printk(KERN_ERR "scanlog: read failed no data\n"); - return -EIO; - } - - if (count > RTAS_DATA_BUF_SIZE) - count = RTAS_DATA_BUF_SIZE; - - if (count < 1024) { - /* This is the min supported by this RTAS call. Rather - * than do all the buffering we insist the user code handle - * larger reads. As long as cp works... 
:) - */ - printk(KERN_ERR "scanlog: cannot perform a small read (%ld)\n", count); - return -EINVAL; - } - - if (!access_ok(VERIFY_WRITE, buf, count)) - return -EFAULT; - - for (;;) { - wait_time = 500; /* default wait if no data */ - spin_lock(&rtas_data_buf_lock); - memcpy(rtas_data_buf, data, RTAS_DATA_BUF_SIZE); - status = rtas_call(ibm_scan_log_dump, 2, 1, NULL, - (u32) __pa(rtas_data_buf), (u32) count); - memcpy(data, rtas_data_buf, RTAS_DATA_BUF_SIZE); - spin_unlock(&rtas_data_buf_lock); - - DEBUG("status=%d, data[0]=%x, data[1]=%x, data[2]=%x\n", - status, data[0], data[1], data[2]); - switch (status) { - case SCANLOG_COMPLETE: - DEBUG("hit eof\n"); - return 0; - case SCANLOG_HWERROR: - DEBUG("hardware error reading scan log data\n"); - return -EIO; - case SCANLOG_CONTINUE: - /* We may or may not have data yet */ - len = data[1]; - off = data[2]; - if (len > 0) { - if (copy_to_user(buf, ((char *)data)+off, len)) - return -EFAULT; - return len; - } - /* Break to sleep default time */ - break; - default: - if (status > 9900 && status <= 9905) { - wait_time = rtas_extended_busy_delay_time(status); - } else { - printk(KERN_ERR "scanlog: unknown error from rtas: %d\n", status); - return -EIO; - } - } - /* Apparently no data yet. Wait and try again. 
*/ - msleep_interruptible(wait_time); - } - /*NOTREACHED*/ -} - -static ssize_t scanlog_write(struct file * file, const char __user * buf, - size_t count, loff_t *ppos) -{ - char stkbuf[20]; - int status; - - if (count > 19) count = 19; - if (copy_from_user (stkbuf, buf, count)) { - return -EFAULT; - } - stkbuf[count] = 0; - - if (buf) { - if (strncmp(stkbuf, "reset", 5) == 0) { - DEBUG("reset scanlog\n"); - status = rtas_call(ibm_scan_log_dump, 2, 1, NULL, 0, 0); - DEBUG("rtas returns %d\n", status); - } else if (strncmp(stkbuf, "debugon", 7) == 0) { - printk(KERN_ERR "scanlog: debug on\n"); - scanlog_debug = 1; - } else if (strncmp(stkbuf, "debugoff", 8) == 0) { - printk(KERN_ERR "scanlog: debug off\n"); - scanlog_debug = 0; - } - } - return count; -} - -static int scanlog_open(struct inode * inode, struct file * file) -{ - struct proc_dir_entry *dp = PDE(inode); - unsigned int *data = (unsigned int *)dp->data; - - if (!data) { - printk(KERN_ERR "scanlog: open failed no data\n"); - return -EIO; - } - - if (data[0] != 0) { - /* This imperfect test stops a second copy of the - * data (or a reset while data is being copied) - */ - return -EBUSY; - } - - data[0] = 0; /* re-init so we restart the scan */ - - return 0; -} - -static int scanlog_release(struct inode * inode, struct file * file) -{ - struct proc_dir_entry *dp = PDE(inode); - unsigned int *data = (unsigned int *)dp->data; - - if (!data) { - printk(KERN_ERR "scanlog: release failed no data\n"); - return -EIO; - } - data[0] = 0; - - return 0; -} - -struct file_operations scanlog_fops = { - .owner = THIS_MODULE, - .read = scanlog_read, - .write = scanlog_write, - .open = scanlog_open, - .release = scanlog_release, -}; - -int __init scanlog_init(void) -{ - struct proc_dir_entry *ent; - - ibm_scan_log_dump = rtas_token("ibm,scan-log-dump"); - if (ibm_scan_log_dump == RTAS_UNKNOWN_SERVICE) { - printk(KERN_ERR "scan-log-dump not implemented on this system\n"); - return -EIO; - } - - ent = 
create_proc_entry("ppc64/rtas/scan-log-dump", S_IRUSR, NULL); - if (ent) { - ent->proc_fops = &scanlog_fops; - /* Ideally we could allocate a buffer < 4G */ - ent->data = kmalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL); - if (!ent->data) { - printk(KERN_ERR "Failed to allocate a buffer\n"); - remove_proc_entry("scan-log-dump", ent->parent); - return -ENOMEM; - } - ((unsigned int *)ent->data)[0] = 0; - } else { - printk(KERN_ERR "Failed to create ppc64/scan-log-dump proc entry\n"); - return -EIO; - } - proc_ppc64_scan_log_dump = ent; - - return 0; -} - -void __exit scanlog_cleanup(void) -{ - if (proc_ppc64_scan_log_dump) { - kfree(proc_ppc64_scan_log_dump->data); - remove_proc_entry("scan-log-dump", proc_ppc64_scan_log_dump->parent); - } -} - -module_init(scanlog_init); -module_exit(scanlog_cleanup); -MODULE_LICENSE("GPL"); Index: working-2.6/arch/powerpc/platforms/pseries/Makefile =================================================================== --- working-2.6.orig/arch/powerpc/platforms/pseries/Makefile 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/pseries/Makefile 2005-11-10 11:55:01.000000000 +1100 @@ -3,3 +3,4 @@ obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o +obj-$(CONFIG_SCANLOG) += scanlog.o Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2005-11-10 11:53:48.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/Makefile 2005-11-10 11:54:43.000000000 +1100 @@ -37,7 +37,6 @@ obj-$(CONFIG_MODULES) += ppc_ksyms.o endif obj-$(CONFIG_PPC_RTAS) += rtas_pci.o -obj-$(CONFIG_SCANLOG) += scanlog.o obj-$(CONFIG_LPARCFG) += lparcfg.o obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o ifneq ($(CONFIG_PPC_MERGE),y) -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/~dgibson

From david at gibson.dropbear.id.au Thu Nov 10 14:31:48 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Thu, 10 Nov 2005 14:31:48 +1100
Subject: powerpc: Move more ppc64 files with no ppc32 equivalent to powerpc
Message-ID: <20051110033148.GL17840@localhost.localdomain>

This patch moves a bunch more files from arch/ppc64 and
include/asm-ppc64 which have no equivalents in ppc32 code into
arch/powerpc and include/asm-powerpc. The files affected are:
	hvcall.h
	proc_ppc64.c
	sysfs.c
	lparcfg.c
	rtas_pci.c

The only changes apart from the move and corresponding Makefile
changes are:
	- #ifndef/#define in includes updated to _ASM_POWERPC_ form
	- trailing whitespace removed
	- comments giving full paths removed

Built and booted on POWER5 LPAR (ARCH=powerpc and ARCH=ppc64), built
for 32-bit powermac (ARCH=powerpc).

Signed-off-by: David Gibson

The patch itself is too big for the list, so grab it from:

http://www.ozlabs.org/~dgibson/home/merge-move-more-ppc64-specifics

-- 
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/~dgibson

From david at gibson.dropbear.id.au Thu Nov 10 15:20:04 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Thu, 10 Nov 2005 15:20:04 +1100
Subject: powerpc: Move more ppc64 files with no ppc32 equivalent to powerpc
In-Reply-To: <20051110033148.GL17840@localhost.localdomain>
References: <20051110033148.GL17840@localhost.localdomain>
Message-ID: <20051110042004.GA26218@localhost.localdomain>

On Thu, Nov 10, 2005 at 02:31:48PM +1100, David Gibson wrote:
> This patch moves a bunch more files from arch/ppc64 and
> include/asm-ppc64 which have no equivalents in ppc32 code into
> arch/powerpc and include/asm-powerpc.
> The files affected are:
> 	hvcall.h
> 	proc_ppc64.c
> 	sysfs.c
> 	lparcfg.c
> 	rtas_pci.c
>
> The only changes apart from the move and corresponding Makefile
> changes are:
> 	- #ifndef/#define in includes updated to _ASM_POWERPC_ form
> 	- trailing whitespace removed
> 	- comments giving full paths removed
>
> Built and booted on POWER5 LPAR (ARCH=powerpc and ARCH=ppc64), built
> for 32-bit powermac (ARCH=powerpc).
>
> Signed-off-by: David Gibson
>
> The patch itself is too big for the list, so grab it from:
>
> http://www.ozlabs.org/~dgibson/home/merge-move-more-ppc64-specifics

I've just updated the patch at this URL with a version which doesn't
conflict with the changes that went into the merge tree in the last
few hours.

-- 
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/~dgibson

From david at gibson.dropbear.id.au Thu Nov 10 16:19:31 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Thu, 10 Nov 2005 16:19:31 +1100
Subject: powerpc: Move hvconsole files to drivers/char
Message-ID: <20051110051931.GA24111@localhost.localdomain>

At present the code for the pSeries hypervisor console is split
between drivers/char and arch/ppc64/kernel for no terribly good
reason. Thus, this patch moves hvconsole.c and hvcserver.c from
arch/ppc64/kernel to drivers/char. That lets us also move hvconsole.h
and hvcserver.h from include/asm-ppc64 to drivers/char.

Built and booted on pSeries LPAR (ARCH=powerpc and ARCH=ppc64).
Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/kernel/hvconsole.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/hvconsole.c 2005-10-25 11:59:53.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,74 +0,0 @@ -/* - * hvconsole.c - * Copyright (C) 2004 Hollis Blanchard, IBM Corporation - * Copyright (C) 2004 IBM Corporation - * - * Additional Author(s): - * Ryan S. Arnold - * - * LPAR console support. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#include -#include -#include -#include - -/** - * hvc_get_chars - retrieve characters from firmware for denoted vterm adatper - * @vtermno: The vtermno or unit_address of the adapter from which to fetch the - * data. - * @buf: The character buffer into which to put the character data fetched from - * firmware. - * @count: not used? 
- */ -int hvc_get_chars(uint32_t vtermno, char *buf, int count) -{ - unsigned long got; - - if (plpar_hcall(H_GET_TERM_CHAR, vtermno, 0, 0, 0, &got, - (unsigned long *)buf, (unsigned long *)buf+1) == H_Success) - return got; - return 0; -} - -EXPORT_SYMBOL(hvc_get_chars); - - -/** - * hvc_put_chars: send characters to firmware for denoted vterm adapter - * @vtermno: The vtermno or unit_address of the adapter from which the data - * originated. - * @buf: The character buffer that contains the character data to send to - * firmware. - * @count: Send this number of characters. - */ -int hvc_put_chars(uint32_t vtermno, const char *buf, int count) -{ - unsigned long *lbuf = (unsigned long *) buf; - long ret; - - ret = plpar_hcall_norets(H_PUT_TERM_CHAR, vtermno, count, lbuf[0], - lbuf[1]); - if (ret == H_Success) - return count; - if (ret == H_Busy) - return 0; - return -EIO; -} - -EXPORT_SYMBOL(hvc_put_chars); Index: working-2.6/arch/ppc64/kernel/hvcserver.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/hvcserver.c 2005-11-08 10:57:14.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,251 +0,0 @@ -/* - * hvcserver.c - * Copyright (C) 2004 Ryan S Arnold, IBM Corporation - * - * PPC64 virtual I/O console server support. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#include -#include -#include -#include - -#include -#include -#include - -#define HVCS_ARCH_VERSION "1.0.0" - -MODULE_AUTHOR("Ryan S. Arnold "); -MODULE_DESCRIPTION("IBM hvcs ppc64 API"); -MODULE_LICENSE("GPL"); -MODULE_VERSION(HVCS_ARCH_VERSION); - -/* - * Convert arch specific return codes into relevant errnos. The hvcs - * functions aren't performance sensitive, so this conversion isn't an - * issue. - */ -int hvcs_convert(long to_convert) -{ - switch (to_convert) { - case H_Success: - return 0; - case H_Parameter: - return -EINVAL; - case H_Hardware: - return -EIO; - case H_Busy: - case H_LongBusyOrder1msec: - case H_LongBusyOrder10msec: - case H_LongBusyOrder100msec: - case H_LongBusyOrder1sec: - case H_LongBusyOrder10sec: - case H_LongBusyOrder100sec: - return -EBUSY; - case H_Function: /* fall through */ - default: - return -EPERM; - } -} - -/** - * hvcs_free_partner_info - free pi allocated by hvcs_get_partner_info - * @head: list_head pointer for an allocated list of partner info structs to - * free. - * - * This function is used to free the partner info list that was returned by - * calling hvcs_get_partner_info(). 
- */ -int hvcs_free_partner_info(struct list_head *head) -{ - struct hvcs_partner_info *pi; - struct list_head *element; - - if (!head) - return -EINVAL; - - while (!list_empty(head)) { - element = head->next; - pi = list_entry(element, struct hvcs_partner_info, node); - list_del(element); - kfree(pi); - } - - return 0; -} -EXPORT_SYMBOL(hvcs_free_partner_info); - -/* Helper function for hvcs_get_partner_info */ -int hvcs_next_partner(uint32_t unit_address, - unsigned long last_p_partition_ID, - unsigned long last_p_unit_address, unsigned long *pi_buff) - -{ - long retval; - retval = plpar_hcall_norets(H_VTERM_PARTNER_INFO, unit_address, - last_p_partition_ID, - last_p_unit_address, virt_to_phys(pi_buff)); - return hvcs_convert(retval); -} - -/** - * hvcs_get_partner_info - Get all of the partner info for a vty-server adapter - * @unit_address: The unit_address of the vty-server adapter for which this - * function is fetching partner info. - * @head: An initialized list_head pointer to an empty list to use to return the - * list of partner info fetched from the hypervisor to the caller. - * @pi_buff: A page sized buffer pre-allocated prior to calling this function - * that is to be used to be used by firmware as an iterator to keep track - * of the partner info retrieval. - * - * This function returns non-zero on success, or if there is no partner info. - * - * The pi_buff is pre-allocated prior to calling this function because this - * function may be called with a spin_lock held and kmalloc of a page is not - * recommended as GFP_ATOMIC. - * - * The first long of this buffer is used to store a partner unit address. The - * second long is used to store a partner partition ID and starting at - * pi_buff[2] is the 79 character Converged Location Code (diff size than the - * unsigned longs, hence the casting mumbo jumbo you see later). 
- * - * Invocation of this function should always be followed by an invocation of - * hvcs_free_partner_info() using a pointer to the SAME list head instance - * that was passed as a parameter to this function. - */ -int hvcs_get_partner_info(uint32_t unit_address, struct list_head *head, - unsigned long *pi_buff) -{ - /* - * Dealt with as longs because of the hcall interface even though the - * values are uint32_t. - */ - unsigned long last_p_partition_ID; - unsigned long last_p_unit_address; - struct hvcs_partner_info *next_partner_info = NULL; - int more = 1; - int retval; - - memset(pi_buff, 0x00, PAGE_SIZE); - /* invalid parameters */ - if (!head || !pi_buff) - return -EINVAL; - - last_p_partition_ID = last_p_unit_address = ~0UL; - INIT_LIST_HEAD(head); - - do { - retval = hvcs_next_partner(unit_address, last_p_partition_ID, - last_p_unit_address, pi_buff); - if (retval) { - /* - * Don't indicate that we've failed if we have - * any list elements. - */ - if (!list_empty(head)) - return 0; - return retval; - } - - last_p_partition_ID = pi_buff[0]; - last_p_unit_address = pi_buff[1]; - - /* This indicates that there are no further partners */ - if (last_p_partition_ID == ~0UL - && last_p_unit_address == ~0UL) - break; - - /* This is a very small struct and will be freed soon in - * hvcs_free_partner_info(). 
*/ - next_partner_info = kmalloc(sizeof(struct hvcs_partner_info), - GFP_ATOMIC); - - if (!next_partner_info) { - printk(KERN_WARNING "HVCONSOLE: kmalloc() failed to" - " allocate partner info struct.\n"); - hvcs_free_partner_info(head); - return -ENOMEM; - } - - next_partner_info->unit_address - = (unsigned int)last_p_unit_address; - next_partner_info->partition_ID - = (unsigned int)last_p_partition_ID; - - /* copy the Null-term char too */ - strncpy(&next_partner_info->location_code[0], - (char *)&pi_buff[2], - strlen((char *)&pi_buff[2]) + 1); - - list_add_tail(&(next_partner_info->node), head); - next_partner_info = NULL; - - } while (more); - - return 0; -} -EXPORT_SYMBOL(hvcs_get_partner_info); - -/** - * hvcs_register_connection - establish a connection between this vty-server and - * a vty. - * @unit_address: The unit address of the vty-server adapter that is to be - * establish a connection. - * @p_partition_ID: The partition ID of the vty adapter that is to be connected. - * @p_unit_address: The unit address of the vty adapter to which the vty-server - * is to be connected. - * - * If this function is called once and -EINVAL is returned it may - * indicate that the partner info needs to be refreshed for the - * target unit address at which point the caller must invoke - * hvcs_get_partner_info() and then call this function again. If, - * for a second time, -EINVAL is returned then it indicates that - * there is probably already a partner connection registered to a - * different vty-server adapter. It is also possible that a second - * -EINVAL may indicate that one of the parms is not valid, for - * instance if the link was removed between the vty-server adapter - * and the vty adapter that you are trying to open. Don't shoot the - * messenger. Firmware implemented it this way. 
- */ -int hvcs_register_connection( uint32_t unit_address, - uint32_t p_partition_ID, uint32_t p_unit_address) -{ - long retval; - retval = plpar_hcall_norets(H_REGISTER_VTERM, unit_address, - p_partition_ID, p_unit_address); - return hvcs_convert(retval); -} -EXPORT_SYMBOL(hvcs_register_connection); - -/** - * hvcs_free_connection - free the connection between a vty-server and vty - * @unit_address: The unit address of the vty-server that is to have its - * connection severed. - * - * This function is used to free the partner connection between a vty-server - * adapter and a vty adapter. - * - * If -EBUSY is returned continue to call this function until 0 is returned. - */ -int hvcs_free_connection(uint32_t unit_address) -{ - long retval; - retval = plpar_hcall_norets(H_FREE_VTERM, unit_address); - return hvcs_convert(retval); -} -EXPORT_SYMBOL(hvcs_free_connection); Index: working-2.6/drivers/char/hvconsole.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/drivers/char/hvconsole.c 2005-11-10 15:51:24.000000000 +1100 @@ -0,0 +1,72 @@ +/* + * hvconsole.c + * Copyright (C) 2004 Hollis Blanchard, IBM Corporation + * Copyright (C) 2004 IBM Corporation + * + * Additional Author(s): + * Ryan S. Arnold + * + * LPAR console support. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include +#include +#include +#include "hvconsole.h" + +/** + * hvc_get_chars - retrieve characters from firmware for denoted vterm adatper + * @vtermno: The vtermno or unit_address of the adapter from which to fetch the + * data. + * @buf: The character buffer into which to put the character data fetched from + * firmware. + * @count: not used? + */ +int hvc_get_chars(uint32_t vtermno, char *buf, int count) +{ + unsigned long got; + + if (plpar_hcall(H_GET_TERM_CHAR, vtermno, 0, 0, 0, &got, + (unsigned long *)buf, (unsigned long *)buf+1) == H_Success) + return got; + return 0; +} +EXPORT_SYMBOL(hvc_get_chars); + + +/** + * hvc_put_chars: send characters to firmware for denoted vterm adapter + * @vtermno: The vtermno or unit_address of the adapter from which the data + * originated. + * @buf: The character buffer that contains the character data to send to + * firmware. + * @count: Send this number of characters. + */ +int hvc_put_chars(uint32_t vtermno, const char *buf, int count) +{ + unsigned long *lbuf = (unsigned long *) buf; + long ret; + + ret = plpar_hcall_norets(H_PUT_TERM_CHAR, vtermno, count, lbuf[0], + lbuf[1]); + if (ret == H_Success) + return count; + if (ret == H_Busy) + return 0; + return -EIO; +} +EXPORT_SYMBOL(hvc_put_chars); Index: working-2.6/drivers/char/hvconsole.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/drivers/char/hvconsole.h 2005-11-10 15:51:24.000000000 +1100 @@ -0,0 +1,49 @@ +/* + * hvconsole.h + * Copyright (C) 2004 Ryan S Arnold, IBM Corporation + * + * LPAR console support. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef _DRIVERS_CHAR_HVCONSOLE_H +#define _DRIVERS_CHAR_HVCONSOLE_H + +/* + * This is the max number of console adapters that can/will be found as + * console devices on first stage console init. Any number beyond this range + * can't be used as a console device but is still a valid tty device. 
+ */ +#define MAX_NR_HVC_CONSOLES 16 + +/* implemented by a low level driver */ +struct hv_ops { + int (*get_chars)(uint32_t vtermno, char *buf, int count); + int (*put_chars)(uint32_t vtermno, const char *buf, int count); +}; +extern int hvc_get_chars(uint32_t vtermno, char *buf, int count); +extern int hvc_put_chars(uint32_t vtermno, const char *buf, int count); + +struct hvc_struct; + +/* Register a vterm and a slot index for use as a console (console_init) */ +extern int hvc_instantiate(uint32_t vtermno, int index, struct hv_ops *ops); +/* register a vterm for hvc tty operation (module_init or hotplug add) */ +extern struct hvc_struct * __devinit hvc_alloc(uint32_t vtermno, int irq, + struct hv_ops *ops); +/* remove a vterm from hvc tty operation (module_exit or hotplug remove) */ +extern int __devexit hvc_remove(struct hvc_struct *hp); +#endif /* _DRIVERS_CHAR_HVCONSOLE_H */ Index: working-2.6/drivers/char/hvcserver.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/drivers/char/hvcserver.c 2005-11-10 15:51:24.000000000 +1100 @@ -0,0 +1,252 @@ +/* + * hvcserver.c + * Copyright (C) 2004 Ryan S Arnold, IBM Corporation + * + * PPC64 virtual I/O console server support. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details.
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include +#include +#include +#include + +#include +#include + +#include "hvcserver.h" + +#define HVCS_ARCH_VERSION "1.0.0" + +MODULE_AUTHOR("Ryan S. Arnold "); +MODULE_DESCRIPTION("IBM hvcs ppc64 API"); +MODULE_LICENSE("GPL"); +MODULE_VERSION(HVCS_ARCH_VERSION); + +/* + * Convert arch specific return codes into relevant errnos. The hvcs + * functions aren't performance sensitive, so this conversion isn't an + * issue. + */ +int hvcs_convert(long to_convert) +{ + switch (to_convert) { + case H_Success: + return 0; + case H_Parameter: + return -EINVAL; + case H_Hardware: + return -EIO; + case H_Busy: + case H_LongBusyOrder1msec: + case H_LongBusyOrder10msec: + case H_LongBusyOrder100msec: + case H_LongBusyOrder1sec: + case H_LongBusyOrder10sec: + case H_LongBusyOrder100sec: + return -EBUSY; + case H_Function: /* fall through */ + default: + return -EPERM; + } +} + +/** + * hvcs_free_partner_info - free pi allocated by hvcs_get_partner_info + * @head: list_head pointer for an allocated list of partner info structs to + * free. + * + * This function is used to free the partner info list that was returned by + * calling hvcs_get_partner_info(). 
+ */ +int hvcs_free_partner_info(struct list_head *head) +{ + struct hvcs_partner_info *pi; + struct list_head *element; + + if (!head) + return -EINVAL; + + while (!list_empty(head)) { + element = head->next; + pi = list_entry(element, struct hvcs_partner_info, node); + list_del(element); + kfree(pi); + } + + return 0; +} +EXPORT_SYMBOL(hvcs_free_partner_info); + +/* Helper function for hvcs_get_partner_info */ +int hvcs_next_partner(uint32_t unit_address, + unsigned long last_p_partition_ID, + unsigned long last_p_unit_address, unsigned long *pi_buff) + +{ + long retval; + retval = plpar_hcall_norets(H_VTERM_PARTNER_INFO, unit_address, + last_p_partition_ID, + last_p_unit_address, virt_to_phys(pi_buff)); + return hvcs_convert(retval); +} + +/** + * hvcs_get_partner_info - Get all of the partner info for a vty-server adapter + * @unit_address: The unit_address of the vty-server adapter for which this + * function is fetching partner info. + * @head: An initialized list_head pointer to an empty list to use to return the + * list of partner info fetched from the hypervisor to the caller. + * @pi_buff: A page sized buffer pre-allocated prior to calling this function + * that is to be used by firmware as an iterator to keep track + * of the partner info retrieval. + * + * This function returns 0 on success, or if there is no partner info. + * + * The pi_buff is pre-allocated prior to calling this function because this + * function may be called with a spin_lock held and kmalloc of a page is not + * recommended as GFP_ATOMIC. + * + * The first long of this buffer is used to store a partner unit address. The + * second long is used to store a partner partition ID and starting at + * pi_buff[2] is the 79 character Converged Location Code (diff size than the + * unsigned longs, hence the casting mumbo jumbo you see later).
+ * + * Invocation of this function should always be followed by an invocation of + * hvcs_free_partner_info() using a pointer to the SAME list head instance + * that was passed as a parameter to this function. + */ +int hvcs_get_partner_info(uint32_t unit_address, struct list_head *head, + unsigned long *pi_buff) +{ + /* + * Dealt with as longs because of the hcall interface even though the + * values are uint32_t. + */ + unsigned long last_p_partition_ID; + unsigned long last_p_unit_address; + struct hvcs_partner_info *next_partner_info = NULL; + int more = 1; + int retval; + + /* invalid parameters */ + if (!head || !pi_buff) + return -EINVAL; + + memset(pi_buff, 0x00, PAGE_SIZE); + last_p_partition_ID = last_p_unit_address = ~0UL; + INIT_LIST_HEAD(head); + + do { + retval = hvcs_next_partner(unit_address, last_p_partition_ID, + last_p_unit_address, pi_buff); + if (retval) { + /* + * Don't indicate that we've failed if we have + * any list elements. + */ + if (!list_empty(head)) + return 0; + return retval; + } + + last_p_partition_ID = pi_buff[0]; + last_p_unit_address = pi_buff[1]; + + /* This indicates that there are no further partners */ + if (last_p_partition_ID == ~0UL + && last_p_unit_address == ~0UL) + break; + + /* This is a very small struct and will be freed soon in + * hvcs_free_partner_info().
*/ + next_partner_info = kmalloc(sizeof(struct hvcs_partner_info), + GFP_ATOMIC); + + if (!next_partner_info) { + printk(KERN_WARNING "HVCONSOLE: kmalloc() failed to" + " allocate partner info struct.\n"); + hvcs_free_partner_info(head); + return -ENOMEM; + } + + next_partner_info->unit_address + = (unsigned int)last_p_unit_address; + next_partner_info->partition_ID + = (unsigned int)last_p_partition_ID; + + /* copy the Null-term char too */ + strncpy(&next_partner_info->location_code[0], + (char *)&pi_buff[2], + strlen((char *)&pi_buff[2]) + 1); + + list_add_tail(&(next_partner_info->node), head); + next_partner_info = NULL; + + } while (more); + + return 0; +} +EXPORT_SYMBOL(hvcs_get_partner_info); + +/** + * hvcs_register_connection - establish a connection between this vty-server and + * a vty. + * @unit_address: The unit address of the vty-server adapter that is to + * establish a connection. + * @p_partition_ID: The partition ID of the vty adapter that is to be connected. + * @p_unit_address: The unit address of the vty adapter to which the vty-server + * is to be connected. + * + * If this function is called once and -EINVAL is returned it may + * indicate that the partner info needs to be refreshed for the + * target unit address at which point the caller must invoke + * hvcs_get_partner_info() and then call this function again. If, + * for a second time, -EINVAL is returned then it indicates that + * there is probably already a partner connection registered to a + * different vty-server adapter. It is also possible that a second + * -EINVAL may indicate that one of the parms is not valid, for + * instance if the link was removed between the vty-server adapter + * and the vty adapter that you are trying to open. Don't shoot the + * messenger. Firmware implemented it this way.
+ */ +int hvcs_register_connection( uint32_t unit_address, + uint32_t p_partition_ID, uint32_t p_unit_address) +{ + long retval; + retval = plpar_hcall_norets(H_REGISTER_VTERM, unit_address, + p_partition_ID, p_unit_address); + return hvcs_convert(retval); +} +EXPORT_SYMBOL(hvcs_register_connection); + +/** + * hvcs_free_connection - free the connection between a vty-server and vty + * @unit_address: The unit address of the vty-server that is to have its + * connection severed. + * + * This function is used to free the partner connection between a vty-server + * adapter and a vty adapter. + * + * If -EBUSY is returned continue to call this function until 0 is returned. + */ +int hvcs_free_connection(uint32_t unit_address) +{ + long retval; + retval = plpar_hcall_norets(H_FREE_VTERM, unit_address); + return hvcs_convert(retval); +} +EXPORT_SYMBOL(hvcs_free_connection); Index: working-2.6/drivers/char/hvcserver.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/drivers/char/hvcserver.h 2005-11-10 15:51:24.000000000 +1100 @@ -0,0 +1,57 @@ +/* + * hvcserver.h + * Copyright (C) 2004 Ryan S Arnold, IBM Corporation + * + * PPC64 virtual I/O console server support. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef _DRIVERS_CHAR_HVCSERVER_H +#define _DRIVERS_CHAR_HVCSERVER_H + +#include + +/* Converged Location Code length */ +#define HVCS_CLC_LENGTH 79 + +/** + * hvcs_partner_info - an element in a list of partner info + * @node: list_head denoting this partner_info struct's position in the list of + * partner info. + * @unit_address: The partner unit address of this entry. + * @partition_ID: The partner partition ID of this entry. + * @location_code: The converged location code of this entry + 1 char for the + * null-term. + * + * This structure outlines the format that partner info is presented to a caller + * of the hvcs partner info fetching functions. These are strung together into + * a list using linux kernel lists. + */ +struct hvcs_partner_info { + struct list_head node; + uint32_t unit_address; + uint32_t partition_ID; + char location_code[HVCS_CLC_LENGTH + 1]; /* CLC + 1 null-term char */ +}; + +extern int hvcs_free_partner_info(struct list_head *head); +extern int hvcs_get_partner_info(uint32_t unit_address, + struct list_head *head, unsigned long *pi_buff); +extern int hvcs_register_connection(uint32_t unit_address, + uint32_t p_partition_ID, uint32_t p_unit_address); +extern int hvcs_free_connection(uint32_t unit_address); + +#endif /* _DRIVERS_CHAR_HVCSERVER_H */ Index: working-2.6/include/asm-ppc64/hvconsole.h =================================================================== --- working-2.6.orig/include/asm-ppc64/hvconsole.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,49 +0,0 @@ -/* - * hvconsole.h - * Copyright (C) 2004 Ryan S Arnold, IBM Corporation - * - * LPAR console support. 
- * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#ifndef _PPC64_HVCONSOLE_H -#define _PPC64_HVCONSOLE_H - -/* - * This is the max number of console adapters that can/will be found as - * console devices on first stage console init. Any number beyond this range - * can't be used as a console device but is still a valid tty device. 
- */ -#define MAX_NR_HVC_CONSOLES 16 - -/* implemented by a low level driver */ -struct hv_ops { - int (*get_chars)(uint32_t vtermno, char *buf, int count); - int (*put_chars)(uint32_t vtermno, const char *buf, int count); -}; -extern int hvc_get_chars(uint32_t vtermno, char *buf, int count); -extern int hvc_put_chars(uint32_t vtermno, const char *buf, int count); - -struct hvc_struct; - -/* Register a vterm and a slot index for use as a console (console_init) */ -extern int hvc_instantiate(uint32_t vtermno, int index, struct hv_ops *ops); -/* register a vterm for hvc tty operation (module_init or hotplug add) */ -extern struct hvc_struct * __devinit hvc_alloc(uint32_t vtermno, int irq, - struct hv_ops *ops); -/* remove a vterm from hvc tty operation (modele_exit or hotplug remove) */ -extern int __devexit hvc_remove(struct hvc_struct *hp); -#endif /* _PPC64_HVCONSOLE_H */ Index: working-2.6/include/asm-ppc64/hvcserver.h =================================================================== --- working-2.6.orig/include/asm-ppc64/hvcserver.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,57 +0,0 @@ -/* - * hvcserver.h - * Copyright (C) 2004 Ryan S Arnold, IBM Corporation - * - * PPC64 virtual I/O console server support. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#ifndef _PPC64_HVCSERVER_H -#define _PPC64_HVCSERVER_H - -#include - -/* Converged Location Code length */ -#define HVCS_CLC_LENGTH 79 - -/** - * hvcs_partner_info - an element in a list of partner info - * @node: list_head denoting this partner_info struct's position in the list of - * partner info. - * @unit_address: The partner unit address of this entry. - * @partition_ID: The partner partition ID of this entry. - * @location_code: The converged location code of this entry + 1 char for the - * null-term. - * - * This structure outlines the format that partner info is presented to a caller - * of the hvcs partner info fetching functions. These are strung together into - * a list using linux kernel lists. - */ -struct hvcs_partner_info { - struct list_head node; - uint32_t unit_address; - uint32_t partition_ID; - char location_code[HVCS_CLC_LENGTH + 1]; /* CLC + 1 null-term char */ -}; - -extern int hvcs_free_partner_info(struct list_head *head); -extern int hvcs_get_partner_info(uint32_t unit_address, - struct list_head *head, unsigned long *pi_buff); -extern int hvcs_register_connection(uint32_t unit_address, - uint32_t p_partition_ID, uint32_t p_unit_address); -extern int hvcs_free_connection(uint32_t unit_address); - -#endif /* _PPC64_HVCSERVER_H */ Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2005-11-10 15:39:54.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/Makefile 2005-11-10 15:51:24.000000000 +1100 @@ -34,11 +34,9 @@ ifneq ($(CONFIG_PPC_MERGE),y) obj-$(CONFIG_MODULES) += ppc_ksyms.o endif -obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o ifneq ($(CONFIG_PPC_MERGE),y) obj-$(CONFIG_BOOTX_TEXT) += btext.o endif 
-obj-$(CONFIG_HVCS) += hvcserver.o obj-$(CONFIG_PPC_PMAC) += udbg_scc.o Index: working-2.6/drivers/char/Makefile =================================================================== --- working-2.6.orig/drivers/char/Makefile 2005-11-08 10:57:15.000000000 +1100 +++ working-2.6/drivers/char/Makefile 2005-11-10 15:51:24.000000000 +1100 @@ -40,13 +40,13 @@ obj-$(CONFIG_AMIGA_BUILTIN_SERIAL) += amiserial.o obj-$(CONFIG_SX) += sx.o generic_serial.o obj-$(CONFIG_RIO) += rio/ generic_serial.o -obj-$(CONFIG_HVC_CONSOLE) += hvc_console.o hvc_vio.o hvsi.o +obj-$(CONFIG_HVC_CONSOLE) += hvc_console.o hvc_vio.o hvsi.o hvconsole.o obj-$(CONFIG_RAW_DRIVER) += raw.o obj-$(CONFIG_SGI_SNSC) += snsc.o snsc_event.o obj-$(CONFIG_MMTIMER) += mmtimer.o obj-$(CONFIG_VIOCONS) += viocons.o obj-$(CONFIG_VIOTAPE) += viotape.o -obj-$(CONFIG_HVCS) += hvcs.o +obj-$(CONFIG_HVCS) += hvcs.o hvcserver.o obj-$(CONFIG_SGI_MBCS) += mbcs.o obj-$(CONFIG_PRINTER) += lp.o Index: working-2.6/drivers/char/hvc_console.c =================================================================== --- working-2.6.orig/drivers/char/hvc_console.c 2005-10-25 11:59:53.000000000 +1000 +++ working-2.6/drivers/char/hvc_console.c 2005-11-10 15:51:24.000000000 +1100 @@ -40,7 +40,7 @@ #include #include #include -#include +#include "hvconsole.h" #define HVC_MAJOR 229 #define HVC_MINOR 0 Index: working-2.6/drivers/char/hvcs.c =================================================================== --- working-2.6.orig/drivers/char/hvcs.c 2005-11-08 10:57:15.000000000 +1100 +++ working-2.6/drivers/char/hvcs.c 2005-11-10 15:51:24.000000000 +1100 @@ -82,11 +82,12 @@ #include #include #include -#include -#include #include #include +#include "hvconsole.h" +#include "hvcserver.h" + /* * 1.3.0 -> 1.3.1 In hvcs_open memset(..,0x00,..) instead of memset(..,0x3F,00). * Removed braces around single statements following conditionals. 
Removed '= Index: working-2.6/drivers/char/hvc_vio.c =================================================================== --- working-2.6.orig/drivers/char/hvc_vio.c 2005-11-08 10:57:15.000000000 +1100 +++ working-2.6/drivers/char/hvc_vio.c 2005-11-10 15:54:11.000000000 +1100 @@ -31,10 +31,11 @@ #include #include -#include #include #include +#include "hvconsole.h" + char hvc_driver_name[] = "hvc_console"; static struct vio_device_id hvc_driver_table[] __devinitdata = { Index: working-2.6/drivers/char/hvsi.c =================================================================== --- working-2.6.orig/drivers/char/hvsi.c 2005-10-25 11:59:53.000000000 +1000 +++ working-2.6/drivers/char/hvsi.c 2005-11-10 15:55:25.000000000 +1100 @@ -45,12 +45,13 @@ #include #include #include -#include #include #include #include #include +#include "hvconsole.h" + #define HVSI_MAJOR 229 #define HVSI_MINOR 128 #define MAX_NR_HVSI_CONSOLES 4 -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson From paulus at samba.org Thu Nov 10 16:33:49 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 10 Nov 2005 16:33:49 +1100 Subject: please pull powerpc-merge.git tree Message-ID: <17266.56253.108263.326409@cargo.ozlabs.ibm.com> Linus, Please do a pull from git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge.git to get another powerpc update. This contains more merging, some fixes from Ben H for various bugs (notably some problems with 64k pages), a set of fixes for the PCI error handling stuff from Linas Vepstas, some memory add fixes from Mike Kravetz, some ppc patches, and various other cleanups and fixes. With the ppc32/64 merge, I think we are at the point of being able to make ppc64 machines default to ARCH=powerpc. 
Also, there is not much left in arch/ppc64, and it would be nice to move the remaining stuff to arch/powerpc and get rid of arch/ppc64 entirely. That would take a couple more days, though - could we have a small extension on the merge window to do that? Thanks, Paul. arch/powerpc/Kconfig | 4 arch/powerpc/kernel/Makefile | 13 - arch/powerpc/kernel/asm-offsets.c | 1 arch/powerpc/kernel/cpu_setup_power4.S | 8 arch/powerpc/kernel/cputable.c | 19 - arch/powerpc/kernel/firmware.c | 2 arch/powerpc/kernel/fpu.S | 24 + arch/powerpc/kernel/head_64.S | 89 ---- arch/powerpc/kernel/ioctl32.c | 4 arch/powerpc/kernel/irq.c | 265 +++++------ arch/powerpc/kernel/lparcfg.c | 13 - arch/powerpc/kernel/misc_32.S | 23 - arch/powerpc/kernel/misc_64.S | 8 arch/powerpc/kernel/paca.c | 7 arch/powerpc/kernel/ppc_ksyms.c | 5 arch/powerpc/kernel/proc_ppc64.c | 12 - arch/powerpc/kernel/prom.c | 35 + arch/powerpc/kernel/prom_init.c | 187 ++++++-- arch/powerpc/kernel/rtas-proc.c | 2 arch/powerpc/kernel/rtas.c | 5 arch/powerpc/kernel/rtas_pci.c | 47 +- arch/powerpc/kernel/setup-common.c | 37 +- arch/powerpc/kernel/setup.h | 6 arch/powerpc/kernel/setup_32.c | 18 - arch/powerpc/kernel/setup_64.c | 93 ++-- arch/powerpc/kernel/signal_32.c | 1 arch/powerpc/kernel/smp.c | 9 arch/powerpc/kernel/sys_ppc32.c | 1 arch/powerpc/kernel/sysfs.c | 2 arch/powerpc/kernel/time.c | 31 + arch/powerpc/kernel/traps.c | 2 arch/powerpc/lib/bitops.c | 2 arch/powerpc/mm/hash_utils_64.c | 38 +- arch/powerpc/mm/init_32.c | 3 arch/powerpc/mm/init_64.c | 20 + arch/powerpc/mm/mem.c | 4 arch/powerpc/mm/pgtable_64.c | 7 arch/powerpc/mm/stab.c | 21 + arch/powerpc/oprofile/op_model_power4.c | 3 arch/powerpc/platforms/chrp/setup.c | 4 arch/powerpc/platforms/iseries/irq.c | 24 + arch/powerpc/platforms/iseries/misc.S | 1 arch/powerpc/platforms/iseries/setup.c | 27 + arch/powerpc/platforms/maple/pci.c | 3 arch/powerpc/platforms/powermac/pci.c | 3 arch/powerpc/platforms/powermac/pic.c | 3 arch/powerpc/platforms/powermac/smp.c | 51 +- 
arch/powerpc/platforms/pseries/Makefile | 2 arch/powerpc/platforms/pseries/eeh.c | 659 ++++++++++++++++++++-------- arch/powerpc/platforms/pseries/eeh_event.c | 155 +++++++ arch/powerpc/platforms/pseries/iommu.c | 3 arch/powerpc/platforms/pseries/pci.c | 3 arch/powerpc/platforms/pseries/reconfig.c | 2 arch/powerpc/platforms/pseries/rtasd.c | 8 arch/powerpc/platforms/pseries/scanlog.c | 0 arch/powerpc/platforms/pseries/setup.c | 8 arch/powerpc/platforms/pseries/smp.c | 5 arch/powerpc/platforms/pseries/xics.c | 7 arch/powerpc/sysdev/u3_iommu.c | 2 arch/powerpc/xmon/Makefile | 2 arch/powerpc/xmon/nonstdio.c | 134 ++++++ arch/powerpc/xmon/nonstdio.h | 28 - arch/powerpc/xmon/setjmp.S | 176 ++++--- arch/powerpc/xmon/start_32.c | 235 +--------- arch/powerpc/xmon/start_64.c | 167 ------- arch/powerpc/xmon/start_8xx.c | 255 ----------- arch/powerpc/xmon/subr_prf.c | 54 -- arch/powerpc/xmon/xmon.c | 50 +- arch/ppc/boot/include/of1275.h | 3 arch/ppc/boot/of1275/Makefile | 2 arch/ppc/boot/of1275/call_prom.c | 74 +++ arch/ppc/boot/of1275/claim.c | 99 +++- arch/ppc/boot/of1275/finddevice.c | 19 - arch/ppc/boot/openfirmware/Makefile | 3 arch/ppc/kernel/Makefile | 5 arch/ppc/kernel/head_booke.h | 2 arch/ppc/kernel/irq.c | 165 ------- arch/ppc/kernel/misc.S | 4 arch/ppc/kernel/ppc_ksyms.c | 7 arch/ppc/kernel/setup.c | 1 arch/ppc/platforms/pmac_pic.c | 3 arch/ppc/platforms/prep_setup.c | 9 arch/ppc64/Kconfig | 4 arch/ppc64/boot/addRamDisk.c | 207 +++++---- arch/ppc64/kernel/Makefile | 16 - arch/ppc64/kernel/asm-offsets.c | 1 arch/ppc64/kernel/head.S | 82 --- arch/ppc64/kernel/idle.c | 1 arch/ppc64/kernel/misc.S | 8 arch/ppc64/kernel/nvram.c | 5 arch/ppc64/kernel/pci.c | 10 arch/ppc64/kernel/pci_dn.c | 21 + arch/ppc64/kernel/prom.c | 9 arch/ppc64/kernel/prom_init.c | 3 arch/ppc64/kernel/vdso.c | 5 drivers/net/fs_enet/fs_enet-main.c | 1 drivers/net/fs_enet/mac-fcc.c | 1 drivers/net/fs_enet/mac-fec.c | 1 drivers/net/fs_enet/mac-scc.c | 1 drivers/pci/hotplug/rpadlpar_core.c | 2 
include/asm-powerpc/abs_addr.h | 6 include/asm-powerpc/asm-compat.h | 55 ++ include/asm-powerpc/atomic.h | 188 ++++++++ include/asm-powerpc/bitops.h | 41 +- include/asm-powerpc/bug.h | 19 - include/asm-powerpc/cache.h | 40 ++ include/asm-powerpc/cacheflush.h | 68 +++ include/asm-powerpc/compat.h | 8 include/asm-powerpc/cputable.h | 6 include/asm-powerpc/current.h | 27 + include/asm-powerpc/eeh_event.h | 52 ++ include/asm-powerpc/firmware.h | 6 include/asm-powerpc/futex.h | 5 include/asm-powerpc/hvcall.h | 14 - include/asm-powerpc/hw_irq.h | 1 include/asm-powerpc/irq.h | 5 include/asm-powerpc/lppaca.h | 9 include/asm-powerpc/paca.h | 15 - include/asm-powerpc/ppc-pci.h | 52 ++ include/asm-powerpc/ppc_asm.h | 39 -- include/asm-powerpc/processor.h | 70 ++- include/asm-powerpc/reg.h | 7 include/asm-powerpc/reg_8xx.h | 42 ++ include/asm-powerpc/signal.h | 41 +- include/asm-powerpc/sparsemem.h | 4 include/asm-powerpc/system.h | 2 include/asm-powerpc/systemcfg.h | 6 include/asm-powerpc/tce.h | 6 include/asm-powerpc/uaccess.h | 40 +- include/asm-powerpc/xmon.h | 1 include/asm-ppc/cache.h | 84 ---- include/asm-ppc/cacheflush.h | 49 -- include/asm-ppc/current.h | 11 include/asm-ppc64/cache.h | 36 -- include/asm-ppc64/cacheflush.h | 48 -- include/asm-ppc64/current.h | 16 - include/asm-ppc64/eeh.h | 46 +- include/asm-ppc64/mmu.h | 6 include/asm-ppc64/mmzone.h | 8 include/asm-ppc64/page.h | 2 include/asm-ppc64/pci-bridge.h | 1 include/asm-ppc64/pgalloc.h | 4 include/asm-ppc64/prom.h | 2 include/asm-ppc64/signal.h | 132 ------ include/asm-ppc64/system.h | 2 145 files changed, 2618 insertions(+), 2620 deletions(-) rename arch/{ppc64/kernel/cpu_setup_power4.S => powerpc/kernel/cpu_setup_power4.S} (99%) rename arch/{ppc64/kernel/firmware.c => powerpc/kernel/firmware.c} (96%) rename arch/{ppc64/kernel/ioctl32.c => powerpc/kernel/ioctl32.c} (96%) rename arch/{ppc64/kernel/irq.c => powerpc/kernel/irq.c} (52%) rename arch/{ppc64/kernel/lparcfg.c => powerpc/kernel/lparcfg.c} (99%) rename 
arch/{ppc64/kernel/pacaData.c => powerpc/kernel/paca.c} (98%) rename arch/{ppc64/kernel/proc_ppc64.c => powerpc/kernel/proc_ppc64.c} (92%) rename arch/{ppc64/kernel/rtas_pci.c => powerpc/kernel/rtas_pci.c} (93%) create mode 100644 arch/powerpc/kernel/setup.h rename arch/{ppc64/kernel/sysfs.c => powerpc/kernel/sysfs.c} (100%) rename arch/{ppc64/kernel/eeh.c => powerpc/platforms/pseries/eeh.c} (52%) create mode 100644 arch/powerpc/platforms/pseries/eeh_event.c rename arch/{ppc64/kernel/scanlog.c => powerpc/platforms/pseries/scanlog.c} (100%) create mode 100644 arch/powerpc/xmon/nonstdio.c delete mode 100644 arch/powerpc/xmon/subr_prf.c create mode 100644 arch/ppc/boot/of1275/call_prom.c delete mode 100644 arch/ppc/kernel/irq.c rename include/{asm-ppc64/abs_addr.h => asm-powerpc/abs_addr.h} (93%) create mode 100644 include/asm-powerpc/asm-compat.h create mode 100644 include/asm-powerpc/cache.h create mode 100644 include/asm-powerpc/cacheflush.h rename include/{asm-ppc64/compat.h => asm-powerpc/compat.h} (97%) create mode 100644 include/asm-powerpc/current.h create mode 100644 include/asm-powerpc/eeh_event.h rename include/{asm-ppc64/hvcall.h => asm-powerpc/hvcall.h} (98%) rename include/{asm-ppc64/lppaca.h => asm-powerpc/lppaca.h} (98%) rename include/{asm-ppc64/paca.h => asm-powerpc/paca.h} (94%) create mode 100644 include/asm-powerpc/reg_8xx.h rename include/{asm-ppc/signal.h => asm-powerpc/signal.h} (77%) rename include/{asm-ppc64/systemcfg.h => asm-powerpc/systemcfg.h} (95%) rename include/{asm-ppc64/tce.h => asm-powerpc/tce.h} (96%) delete mode 100644 include/asm-ppc/cache.h delete mode 100644 include/asm-ppc/cacheflush.h delete mode 100644 include/asm-ppc/current.h delete mode 100644 include/asm-ppc64/cache.h delete mode 100644 include/asm-ppc64/cacheflush.h delete mode 100644 include/asm-ppc64/current.h delete mode 100644 include/asm-ppc64/signal.h Benjamin Herrenschmidt: ppc64: Don't panic when early __ioremap fails powerpc: 64k pages pmd alloc fix powerpc: 
64k pages vs. U3 iommu ppc64: fix PCI IO mapping David Gibson: powerpc: Merge signal.h powerpc: Merge current.h powerpc: Move various ppc64 files with no ppc32 equivalent to powerpc powerpc: Merge cacheflush.h and cache.h powerpc: Move scanlog.c to platforms/pseries powerpc: Consolidate asm compatibility macros powerpc: Move more ppc64 files with no ppc32 equivalent to powerpc linas: ppc64: uniform usage of bus unit id interfaces Linas Vepstas: ppc64: misc minor cleanup ppc64: PCI address cache minor fixes ppc64: PCI error rate statistics ppc64: RTAS error reporting restructuring ppc64: avoid PCI error reporting for empty slots ppc64: serialize reports of PCI errors ppc64: escape hatch for spinning interrupt deadlocks ppc64: bugfix: crash on PCI hotplug ppc64: bugfix: don't silently ignore PCI errors ppc64: move eeh.c to powerpc directory from ppc64 ppc64: PCI error event dispatcher ppc64: PCI reset support routines ppc64: Save & restore of PCI device BARS ppc64: mark failed devices ppc64: bugfix: crash on PHB add Marcelo Tosatti: fs_enet build fix Matt Porter: ppc32: fix ppc44x fpu build Mike Kravetz: revised Memory Add Fixes for ppc64 Memory Add Fixes for ppc64 Memory Add Fixes for ppc64 Memory Add Fixes for ppc64 Olaf Hering: ppc64 boot: remove local initializers ppc64 boot: remove argv usage ppc64 boot: remove sysmap from required filenames ppc64 boot: fix compile warnings Paul Mackerras: powerpc: Fix crash in early boot on some powermacs powerpc: Simplify and clean up the xmon terminal I/O ppc/powerpc: workarounds for old Open Firmware versions powerpc: Fix find_next_bit on 32-bit powerpc: merge code values for identifying platforms powerpc: Fix typo introduced in merging platform codes powerpc: Fix compile error in EEH code with gcc4 powerpc: Fixes for 32-bit powermac SMP powerpc: Fix SMP time initialization problem powerpc: Add user CPU features for POWER4, POWER5, POWER5+ and Cell. 
powerpc: 32-bit fixes for xmon powerpc: Move some extern declarations from C code into headers ppc64: Add declarations to ppc64 headers as well as powerpc headers Stephen Rothwell: powerpc: create kernel/setup.h ppc64: move stack switching up in interrupt processing ppc64: allow iSeries to use IRQSTACKS again ppc64: remove ppc_irq_dispatch_handler powerpc: merge irq.c powerpc: remove some warnings when building iSeries powerpc: implement atomic64_t on ppc64 powerpc: fix iSeries build From akpm at osdl.org Thu Nov 10 17:40:34 2005 From: akpm at osdl.org (Andrew Morton) Date: Wed, 9 Nov 2005 22:40:34 -0800 Subject: [PATCH] ppc64 boot: remove argv usage In-Reply-To: References: <20051109195103.GA31658@suse.de> <20051109195220.GB31658@suse.de> Message-ID: <20051109224034.4da7ad37.akpm@osdl.org> Hollis Blanchard wrote: > > On Nov 9, 2005, at 1:52 PM, Olaf Hering wrote: > > > @@ -69,6 +70,7 @@ int main(int argc, char **argv) > > fprintf(stderr, "Name of RAM disk file missing.\n"); > > exit(1); > > } > > + rd_name = argv[1] > > > > if (argc < 3) { > > fprintf(stderr, "Name of System Map input file is missing.\n"); > > A semicolon would probably be useful here. :) > That sneakily got fixed up in a succeeding patch. I'll sort it out. From michael at ellerman.id.au Fri Nov 11 12:03:09 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Fri, 11 Nov 2005 12:03:09 +1100 (EST) Subject: [PATCH] powerpc: Take 3, merge page.h Message-ID: <20051111010309.5943A68710@ozlabs.org> Merge asm-ppc/page.h and asm-ppc64/page.h, into asm-powerpc/page.h, asm-powerpc/page_32.h and asm-powerpc/page_64.h There's a bit of weirdness in page_32.h, with APUS undef'ing things. I think this is cleaner though than polluting the rest of the code with PPC_MEMOFFSET etc. Built for PPC (common_defconfig), with ARCH=powerpc, mostly built with ARCH=ppc (other things break the build). Built and booted on P5 LPAR for PPC64 with ARCH=ppc/powerpc (pseries_defconfig). 
Mostly built and for iSeries powerpc. Signed-off-by: Michael Ellerman --- include/asm-powerpc/page.h | 173 +++++++++++++++++++++ include/asm-powerpc/page_32.h | 97 ++++++++++++ include/asm-powerpc/page_64.h | 177 ++++++++++++++++++++++ include/asm-ppc/page.h | 173 --------------------- include/asm-ppc64/page.h | 333 ------------------------------------------ 5 files changed, 447 insertions(+), 506 deletions(-) Index: kexec/include/asm-powerpc/page.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/page.h @@ -0,0 +1,173 @@ +#ifndef _ASM_POWERPC_PAGE_H +#define _ASM_POWERPC_PAGE_H + +/* + * Copyright (C) 2001,2005 IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifdef __KERNEL__ +#include +#include /* for ASM_CONST */ + +/* + * On PPC32 page size is 4K. For PPC64 we support either 4K or 64K software + * page size. When using 64K pages however, whether we are really supporting + * 64K pages in HW or not is irrelevant to those definitions. + */ +#ifdef CONFIG_PPC_64K_PAGES +#define PAGE_SHIFT 16 +#else +#define PAGE_SHIFT 12 +#endif + +#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) + +/* + * Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So on PPC32 + * if we assign PAGE_MASK to a long long it gets extended the way want + * (i.e. 
with 1s in the high bits) + */ +#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) + +#define PAGE_OFFSET ASM_CONST(CONFIG_KERNEL_START) +#define KERNELBASE PAGE_OFFSET + +#ifdef CONFIG_DISCONTIGMEM +#define page_to_pfn(page) discontigmem_page_to_pfn(page) +#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) +#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) +#endif + +#ifdef CONFIG_FLATMEM +#define pfn_to_page(pfn) (mem_map + (pfn)) +#define page_to_pfn(page) ((unsigned long)((page) - mem_map)) +#define pfn_valid(pfn) ((pfn) < max_mapnr) +#endif + +#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) +#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) +#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) + +#define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) +#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET) + +/* + * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, + * and needs to be executable. This means the whole heap ends + * up being executable. + */ +#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#ifdef __powerpc64__ +#include +#else +#include +#endif + +#undef STRICT_MM_TYPECHECKS + +#ifdef STRICT_MM_TYPECHECKS +/* These are used to make use of C type-checking. 
*/ + +/* PTE level */ +typedef struct { pte_basic_t pte; } pte_t; +#define pte_val(x) ((x).pte) +#define __pte(x) ((pte_t) { (x) }) + +/* 64k pages additionally define a bigger "real PTE" type that gathers + * the "second half" part of the PTE for pseudo 64k pages + */ +#ifdef CONFIG_PPC_64K_PAGES +typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; +#else +typedef struct { pte_t pte; } real_pte_t; +#endif + +/* PMD level */ +typedef struct { unsigned long pmd; } pmd_t; +#define pmd_val(x) ((x).pmd) +#define __pmd(x) ((pmd_t) { (x) }) + +/* PUD level exusts only on 4k pages */ +#ifndef CONFIG_PPC_64K_PAGES +typedef struct { unsigned long pud; } pud_t; +#define pud_val(x) ((x).pud) +#define __pud(x) ((pud_t) { (x) }) +#endif + +/* PGD level */ +typedef struct { unsigned long pgd; } pgd_t; +#define pgd_val(x) ((x).pgd) +#define __pgd(x) ((pgd_t) { (x) }) + +/* Page protection bits */ +typedef struct { unsigned long pgprot; } pgprot_t; +#define pgprot_val(x) ((x).pgprot) +#define __pgprot(x) ((pgprot_t) { (x) }) + +#else + +/* + * .. 
while these make it easier on the compiler + */ + +typedef pte_basic_t pte_t; +#define pte_val(x) (x) +#define __pte(x) (x) + +#ifdef CONFIG_PPC_64K_PAGES +typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; +#else +typedef unsigned long real_pte_t; +#endif + + +typedef unsigned long pmd_t; +#define pmd_val(x) (x) +#define __pmd(x) (x) + +#ifndef CONFIG_PPC_64K_PAGES +typedef unsigned long pud_t; +#define pud_val(x) (x) +#define __pud(x) (x) +#endif + +typedef unsigned long pgd_t; +#define pgd_val(x) (x) +#define pgprot_val(x) (x) + +typedef unsigned long pgprot_t; +#define __pgd(x) (x) +#define __pgprot(x) (x) + +#endif + + +/* align addr on a size boundary - adjust address up/down if needed */ +#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) +#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) + +/* align addr on a size boundary - adjust address up if needed */ +#define _ALIGN(addr,size) _ALIGN_UP(addr,size) + +/* to align the pointer to the (next) page boundary */ +#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) + +struct page; +extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg); +extern void copy_user_page(void *to, void *from, unsigned long vaddr, + struct page *p); +extern int page_is_ram(unsigned long pfn); + +#endif /* __KERNEL__ */ + +#endif /* _ASM_POWERPC_PAGE_H */ Index: kexec/include/asm-powerpc/page_32.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/page_32.h @@ -0,0 +1,97 @@ +#ifndef _ASM_POWERPC_PAGE_32_H +#define _ASM_POWERPC_PAGE_32_H + +#define VM_DATA_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS32 + +#ifndef __ASSEMBLY__ +/* + * The basic type of a PTE - 64 bits for those CPUs with > 32 bit + * physical addressing. For now this just the IBM PPC440.
+ */ +#ifdef CONFIG_PTE_64BIT +typedef unsigned long long pte_basic_t; +#define PTE_SHIFT (PAGE_SHIFT - 3) /* 512 ptes per page */ +#define PTE_FMT "%16Lx" +#else +typedef unsigned long pte_basic_t; +#define PTE_SHIFT (PAGE_SHIFT - 2) /* 1024 ptes per page */ +#define PTE_FMT "%.8lx" +#endif + +struct page; +extern void clear_pages(void *page, int order); +static inline void clear_page(void *page) { clear_pages(page, 0); } +extern void copy_page(void *to, void *from); + +/* Pure 2^n version of get_order */ +extern __inline__ int get_order(unsigned long size) +{ + int lz; + + size = (size-1) >> PAGE_SHIFT; + asm ("cntlzw %0,%1" : "=r" (lz) : "r" (size)); + return 32 - lz; +} + +#ifndef CONFIG_APUS +#define PPC_MEMSTART 0 +#else /* CONFIG_APUS */ +extern unsigned long ppc_memstart; +extern unsigned long ppc_pgstart; +extern unsigned long ppc_memoffset; +#define PPC_MEMSTART ppc_memstart + +#ifdef MODULE +#define ___pa(vaddr) ((vaddr) - ppc_memoffset) +#define ___va(paddr) ((paddr) + ppc_memoffset) +#else /* !MODULE */ +/* map phys->virtual and virtual->phys for RAM pages */ +static inline unsigned long ___pa(unsigned long v) +{ + unsigned long p; + asm volatile ("1: addis %0, %1, %2;" + ".section \".vtop_fixup\",\"aw\";" + ".align 1;" + ".long 1b;" + ".previous;" + : "=r" (p) + : "b" (v), "K" (((-PAGE_OFFSET) >> 16) & 0xffff)); + + return p; +} +static inline void* ___va(unsigned long p) +{ + unsigned long v; + asm volatile ("1: addis %0, %1, %2;" + ".section \".ptov_fixup\",\"aw\";" + ".align 1;" + ".long 1b;" + ".previous;" + : "=r" (v) + : "b" (p), "K" (((PAGE_OFFSET) >> 16) & 0xffff)); + + return (void*) v; +} +#endif + +/* APUS needs more complicated versions of these macros */ +#undef __pa +#define __pa(x) ___pa((unsigned long)(x)) + +#undef __va +#define __va(x) ((void *)(___va((unsigned long)(x)))) + +#undef pfn_to_page +#define pfn_to_page(pfn) (mem_map + ((pfn) - ppc_pgstart)) + +#undef page_to_pfn +#define page_to_pfn(page) ((unsigned long)((page) - 
mem_map) + ppc_pgstart) + +#undef pfn_valid +#define pfn_valid(pfn) (((pfn) - ppc_pgstart) < max_mapnr) + +#endif /* CONFIG_APUS */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_POWERPC_PAGE_32_H */ Index: kexec/include/asm-powerpc/page_64.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/page_64.h @@ -0,0 +1,177 @@ +#ifndef _ASM_POWERPC_PAGE_64_H +#define _ASM_POWERPC_PAGE_64_H + +/* + * Copyright (C) 2001 PPC64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +/* + * We always define HW_PAGE_SHIFT to 12 as use of 64K pages remains Linux + * specific, every notion of page number shared with the firmware, TCEs, + * iommu, etc... still uses a page size of 4K. + */ +#define HW_PAGE_SHIFT 12 +#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) +#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) + +/* + * PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and + * HW_PAGE_SHIFT, that is 4K pages. 
+ */ +#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) + +#define REGION_SIZE 4UL +#define REGION_SHIFT 60UL +#define REGION_MASK (((1UL<<REGION_SIZE)-1UL)<<REGION_SHIFT) + +#define VMALLOCBASE ASM_CONST(0xD000000000000000) +#define VMALLOC_REGION_ID (VMALLOCBASE >> REGION_SHIFT) +#define KERNEL_REGION_ID (KERNELBASE >> REGION_SHIFT) +#define USER_REGION_ID (0UL) +#define REGION_ID(ea) (((unsigned long)(ea)) >> REGION_SHIFT) + +/* Segment size */ +#define SID_SHIFT 28 +#define SID_MASK 0xfffffffffUL +#define ESID_MASK 0xfffffffff0000000UL +#define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) + +#ifndef __ASSEMBLY__ +#include + +typedef unsigned long pte_basic_t; + +static __inline__ void clear_page(void *addr) +{ + unsigned long lines, line_size; + + line_size = ppc64_caches.dline_size; + lines = ppc64_caches.dlines_per_page; + + __asm__ __volatile__( + "mtctr %1 # clear_page\n\ +1: dcbz 0,%0\n\ + add %0,%0,%3\n\ + bdnz+ 1b" + : "=r" (addr) + : "r" (lines), "0" (addr), "r" (line_size) + : "ctr", "memory"); +} + +extern void copy_4K_page(void *to, void *from); + +#ifdef CONFIG_PPC_64K_PAGES +static inline void copy_page(void *to, void *from) +{ + unsigned int i; + for (i=0; i < (1 << (PAGE_SHIFT - 12)); i++) { + copy_4K_page(to, from); + to += 4096; + from += 4096; + } +} +#else /* CONFIG_PPC_64K_PAGES */ +static inline void copy_page(void *to, void *from) +{ + copy_4K_page(to, from); +} +#endif /* CONFIG_PPC_64K_PAGES */ + +/* Log 2 of page table size */ +extern u64 ppc64_pft_size; + +/* We do define AT_SYSINFO_EHDR but don't use the gate mecanism */ +#define __HAVE_ARCH_GATE_AREA 1 + +/* Large pages size */ +extern unsigned int HPAGE_SHIFT; +#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) +#define HPAGE_MASK (~(HPAGE_SIZE - 1)) +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) + +#endif /* __ASSEMBLY__ */ + +#ifdef CONFIG_HUGETLB_PAGE + +#define HTLB_AREA_SHIFT 40 +#define HTLB_AREA_SIZE (1UL << HTLB_AREA_SHIFT) +#define GET_HTLB_AREA(x) ((x) >> HTLB_AREA_SHIFT) + +#define LOW_ESID_MASK(addr, len) (((1U << (GET_ESID(addr+len-1)+1)) \ + - (1U << GET_ESID(addr))) & 0xffff) +#define
HTLB_AREA_MASK(addr, len) (((1U << (GET_HTLB_AREA(addr+len-1)+1)) \ + - (1U << GET_HTLB_AREA(addr))) & 0xffff) + +#define ARCH_HAS_HUGEPAGE_ONLY_RANGE +#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE +#define ARCH_HAS_SETCLEAR_HUGE_PTE + +#define touches_hugepage_low_range(mm, addr, len) \ + (LOW_ESID_MASK((addr), (len)) & (mm)->context.low_htlb_areas) +#define touches_hugepage_high_range(mm, addr, len) \ + (HTLB_AREA_MASK((addr), (len)) & (mm)->context.high_htlb_areas) + +#define __within_hugepage_low_range(addr, len, segmask) \ + ((LOW_ESID_MASK((addr), (len)) | (segmask)) == (segmask)) +#define within_hugepage_low_range(addr, len) \ + __within_hugepage_low_range((addr), (len), \ + current->mm->context.low_htlb_areas) +#define __within_hugepage_high_range(addr, len, zonemask) \ + ((HTLB_AREA_MASK((addr), (len)) | (zonemask)) == (zonemask)) +#define within_hugepage_high_range(addr, len) \ + __within_hugepage_high_range((addr), (len), \ + current->mm->context.high_htlb_areas) + +#define is_hugepage_only_range(mm, addr, len) \ + (touches_hugepage_high_range((mm), (addr), (len)) || \ + touches_hugepage_low_range((mm), (addr), (len))) +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA + +#define in_hugepage_area(context, addr) \ + (cpu_has_feature(CPU_FTR_16M_PAGE) && \ + ( ((1 << GET_HTLB_AREA(addr)) & (context).high_htlb_areas) || \ + ( ((addr) < 0x100000000L) && \ + ((1 << GET_ESID(addr)) & (context).low_htlb_areas) ) ) ) + +#else /* !CONFIG_HUGETLB_PAGE */ + +#define in_hugepage_area(mm, addr) 0 + +#endif /* !CONFIG_HUGETLB_PAGE */ + +#ifdef MODULE +#define __page_aligned __attribute__((__aligned__(PAGE_SIZE))) +#else +#define __page_aligned \ + __attribute__((__aligned__(PAGE_SIZE), \ + __section__(".data.page_aligned"))) +#endif + +#define VM_DATA_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) + +/* + * This is the default if a program doesn't have a PT_GNU_STACK + * program header entry. 
The PPC64 ELF ABI has a non executable stack + * stack by default, so in the absense of a PT_GNU_STACK program header + * we turn execute permission off. + */ +#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) + +#include + +#endif /* _ASM_POWERPC_PAGE_64_H */ Index: kexec/include/asm-ppc/page.h =================================================================== --- kexec.orig/include/asm-ppc/page.h +++ /dev/null @@ -1,173 +0,0 @@ -#ifndef _PPC_PAGE_H -#define _PPC_PAGE_H - -/* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 12 -#define PAGE_SIZE (1UL << PAGE_SHIFT) - -/* - * Subtle: this is an int (not an unsigned long) and so it - * gets extended to 64 bits the way want (i.e. with 1s). -- paulus - */ -#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) - -#ifdef __KERNEL__ -#include - -/* This must match what is in arch/ppc/Makefile */ -#define PAGE_OFFSET CONFIG_KERNEL_START -#define KERNELBASE PAGE_OFFSET - -#ifndef __ASSEMBLY__ - -/* - * The basic type of a PTE - 64 bits for those CPUs with > 32 bit - * physical addressing. For now this just the IBM PPC440. 
- */ -#ifdef CONFIG_PTE_64BIT -typedef unsigned long long pte_basic_t; -#define PTE_SHIFT (PAGE_SHIFT - 3) /* 512 ptes per page */ -#define PTE_FMT "%16Lx" -#else -typedef unsigned long pte_basic_t; -#define PTE_SHIFT (PAGE_SHIFT - 2) /* 1024 ptes per page */ -#define PTE_FMT "%.8lx" -#endif - -/* align addr on a size boundary - adjust address up/down if needed */ -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) -#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) - -/* align addr on a size boundary - adjust address up if needed */ -#define _ALIGN(addr,size) _ALIGN_UP(addr,size) - -/* to align the pointer to the (next) page boundary */ -#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) - - -#undef STRICT_MM_TYPECHECKS - -#ifdef STRICT_MM_TYPECHECKS -/* - * These are used to make use of C type-checking.. - */ -typedef struct { pte_basic_t pte; } pte_t; -typedef struct { unsigned long pmd; } pmd_t; -typedef struct { unsigned long pgd; } pgd_t; -typedef struct { unsigned long pgprot; } pgprot_t; - -#define pte_val(x) ((x).pte) -#define pmd_val(x) ((x).pmd) -#define pgd_val(x) ((x).pgd) -#define pgprot_val(x) ((x).pgprot) - -#define __pte(x) ((pte_t) { (x) } ) -#define __pmd(x) ((pmd_t) { (x) } ) -#define __pgd(x) ((pgd_t) { (x) } ) -#define __pgprot(x) ((pgprot_t) { (x) } ) - -#else -/* - * .. 
while these make it easier on the compiler - */ -typedef pte_basic_t pte_t; -typedef unsigned long pmd_t; -typedef unsigned long pgd_t; -typedef unsigned long pgprot_t; - -#define pte_val(x) (x) -#define pmd_val(x) (x) -#define pgd_val(x) (x) -#define pgprot_val(x) (x) - -#define __pte(x) (x) -#define __pmd(x) (x) -#define __pgd(x) (x) -#define __pgprot(x) (x) - -#endif - -struct page; -extern void clear_pages(void *page, int order); -static inline void clear_page(void *page) { clear_pages(page, 0); } -extern void copy_page(void *to, void *from); -extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg); -extern void copy_user_page(void *to, void *from, unsigned long vaddr, - struct page *pg); - -#ifndef CONFIG_APUS -#define PPC_MEMSTART 0 -#define PPC_PGSTART 0 -#define PPC_MEMOFFSET PAGE_OFFSET -#else -extern unsigned long ppc_memstart; -extern unsigned long ppc_pgstart; -extern unsigned long ppc_memoffset; -#define PPC_MEMSTART ppc_memstart -#define PPC_PGSTART ppc_pgstart -#define PPC_MEMOFFSET ppc_memoffset -#endif - -#if defined(CONFIG_APUS) && !defined(MODULE) -/* map phys->virtual and virtual->phys for RAM pages */ -static inline unsigned long ___pa(unsigned long v) -{ - unsigned long p; - asm volatile ("1: addis %0, %1, %2;" - ".section \".vtop_fixup\",\"aw\";" - ".align 1;" - ".long 1b;" - ".previous;" - : "=r" (p) - : "b" (v), "K" (((-PAGE_OFFSET) >> 16) & 0xffff)); - - return p; -} -static inline void* ___va(unsigned long p) -{ - unsigned long v; - asm volatile ("1: addis %0, %1, %2;" - ".section \".ptov_fixup\",\"aw\";" - ".align 1;" - ".long 1b;" - ".previous;" - : "=r" (v) - : "b" (p), "K" (((PAGE_OFFSET) >> 16) & 0xffff)); - - return (void*) v; -} -#else -#define ___pa(vaddr) ((vaddr)-PPC_MEMOFFSET) -#define ___va(paddr) ((paddr)+PPC_MEMOFFSET) -#endif - -extern int page_is_ram(unsigned long pfn); - -#define __pa(x) ___pa((unsigned long)(x)) -#define __va(x) ((void *)(___va((unsigned long)(x)))) - -#define pfn_to_page(pfn) 
(mem_map + ((pfn) - PPC_PGSTART)) -#define page_to_pfn(page) ((unsigned long)((page) - mem_map) + PPC_PGSTART) -#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) -#define page_to_virt(page) __va(page_to_pfn(page) << PAGE_SHIFT) - -#define pfn_valid(pfn) (((pfn) - PPC_PGSTART) < max_mapnr) -#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) - -/* Pure 2^n version of get_order */ -extern __inline__ int get_order(unsigned long size) -{ - int lz; - - size = (size-1) >> PAGE_SHIFT; - asm ("cntlzw %0,%1" : "=r" (lz) : "r" (size)); - return 32 - lz; -} - -#endif /* __ASSEMBLY__ */ - -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#endif /* __KERNEL__ */ -#endif /* _PPC_PAGE_H */ Index: kexec/include/asm-ppc64/page.h =================================================================== --- kexec.orig/include/asm-ppc64/page.h +++ /dev/null @@ -1,333 +0,0 @@ -#ifndef _PPC64_PAGE_H -#define _PPC64_PAGE_H - -/* - * Copyright (C) 2001 PPC64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include -#include /* for ASM_CONST */ - -/* - * We support either 4k or 64k software page size. When using 64k pages - * however, wether we are really supporting 64k pages in HW or not is - * irrelevant to those definitions. We always define HW_PAGE_SHIFT to 12 - * as use of 64k pages remains a linux kernel specific, every notion of - * page number shared with the firmware, TCEs, iommu, etc... still assumes - * a page size of 4096. 
- */ -#ifdef CONFIG_PPC_64K_PAGES -#define PAGE_SHIFT 16 -#else -#define PAGE_SHIFT 12 -#endif - -#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) -#define PAGE_MASK (~(PAGE_SIZE-1)) - -/* HW_PAGE_SHIFT is always 4k pages */ -#define HW_PAGE_SHIFT 12 -#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) -#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) - -/* PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and - * HW_PAGE_SHIFT, that is 4k pages - */ -#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) - -/* Segment size */ -#define SID_SHIFT 28 -#define SID_MASK 0xfffffffffUL -#define ESID_MASK 0xfffffffff0000000UL -#define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) - -/* Large pages size */ - -#ifndef __ASSEMBLY__ -extern unsigned int HPAGE_SHIFT; -#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) -#define HPAGE_MASK (~(HPAGE_SIZE - 1)) -#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) -#endif /* __ASSEMBLY__ */ - -#ifdef CONFIG_HUGETLB_PAGE - - -#define HTLB_AREA_SHIFT 40 -#define HTLB_AREA_SIZE (1UL << HTLB_AREA_SHIFT) -#define GET_HTLB_AREA(x) ((x) >> HTLB_AREA_SHIFT) - -#define LOW_ESID_MASK(addr, len) (((1U << (GET_ESID(addr+len-1)+1)) \ - - (1U << GET_ESID(addr))) & 0xffff) -#define HTLB_AREA_MASK(addr, len) (((1U << (GET_HTLB_AREA(addr+len-1)+1)) \ - - (1U << GET_HTLB_AREA(addr))) & 0xffff) - -#define ARCH_HAS_HUGEPAGE_ONLY_RANGE -#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE -#define ARCH_HAS_SETCLEAR_HUGE_PTE - -#define touches_hugepage_low_range(mm, addr, len) \ - (LOW_ESID_MASK((addr), (len)) & (mm)->context.low_htlb_areas) -#define touches_hugepage_high_range(mm, addr, len) \ - (HTLB_AREA_MASK((addr), (len)) & (mm)->context.high_htlb_areas) - -#define __within_hugepage_low_range(addr, len, segmask) \ - ((LOW_ESID_MASK((addr), (len)) | (segmask)) == (segmask)) -#define within_hugepage_low_range(addr, len) \ - __within_hugepage_low_range((addr), (len), \ - current->mm->context.low_htlb_areas) -#define __within_hugepage_high_range(addr, len, zonemask) \ - 
((HTLB_AREA_MASK((addr), (len)) | (zonemask)) == (zonemask)) -#define within_hugepage_high_range(addr, len) \ - __within_hugepage_high_range((addr), (len), \ - current->mm->context.high_htlb_areas) - -#define is_hugepage_only_range(mm, addr, len) \ - (touches_hugepage_high_range((mm), (addr), (len)) || \ - touches_hugepage_low_range((mm), (addr), (len))) -#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA - -#define in_hugepage_area(context, addr) \ - (cpu_has_feature(CPU_FTR_16M_PAGE) && \ - ( ((1 << GET_HTLB_AREA(addr)) & (context).high_htlb_areas) || \ - ( ((addr) < 0x100000000L) && \ - ((1 << GET_ESID(addr)) & (context).low_htlb_areas) ) ) ) - -#else /* !CONFIG_HUGETLB_PAGE */ - -#define in_hugepage_area(mm, addr) 0 - -#endif /* !CONFIG_HUGETLB_PAGE */ - -/* align addr on a size boundary - adjust address up/down if needed */ -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) -#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) - -/* align addr on a size boundary - adjust address up if needed */ -#define _ALIGN(addr,size) _ALIGN_UP(addr,size) - -/* to align the pointer to the (next) page boundary */ -#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) - -#ifdef __KERNEL__ -#ifndef __ASSEMBLY__ -#include - -#undef STRICT_MM_TYPECHECKS - -#define REGION_SIZE 4UL -#define REGION_SHIFT 60UL -#define REGION_MASK (((1UL<<REGION_SIZE)-1UL)<<REGION_SHIFT) - -#define VMALLOCBASE ASM_CONST(0xD000000000000000) -#define VMALLOC_REGION_ID (VMALLOCBASE >> REGION_SHIFT) -#define KERNEL_REGION_ID (KERNELBASE >> REGION_SHIFT) -#define USER_REGION_ID (0UL) -#define REGION_ID(ea) (((unsigned long)(ea)) >> REGION_SHIFT) - -#define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) - -#ifdef CONFIG_DISCONTIGMEM -#define page_to_pfn(page) discontigmem_page_to_pfn(page) -#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) -#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) -#endif -#ifdef CONFIG_FLATMEM -#define pfn_to_page(pfn) (mem_map + (pfn)) -#define page_to_pfn(page) ((unsigned long)((page) - mem_map)) -#define pfn_valid(pfn) ((pfn) < max_mapnr) -#endif - -#define virt_to_page(kaddr)
pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) -#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) - -#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) - -/* - * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, - * and needs to be executable. This means the whole heap ends - * up being executable. - */ -#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_DATA_DEFAULT_FLAGS \ - (test_thread_flag(TIF_32BIT) ? \ - VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) - -/* - * This is the default if a program doesn't have a PT_GNU_STACK - * program header entry. The PPC64 ELF ABI has a non executable stack - * stack by default, so in the absense of a PT_GNU_STACK program header - * we turn execute permission off. - */ -#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_STACK_DEFAULT_FLAGS \ - (test_thread_flag(TIF_32BIT) ? \ - VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) - -#endif /* __KERNEL__ */ - -#include - -#endif /* _PPC64_PAGE_H */ From michael at ellerman.id.au Fri Nov 11 13:09:17 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Fri, 11 Nov 2005 13:09:17 +1100 Subject: [PATCH] powerpc: Take 3, merge page.h - BORKEN In-Reply-To: <20051111010309.5943A68710@ozlabs.org> References: <20051111010309.5943A68710@ozlabs.org> Message-ID: <200511111309.23160.michael@ellerman.id.au> That's missing a quilt ref, new version coming. On Fri, 11 Nov 2005 12:03, Michael Ellerman wrote: > Merge asm-ppc/page.h and asm-ppc64/page.h, into asm-powerpc/page.h, > asm-powerpc/page_32.h and asm-powerpc/page_64.h > > There's a bit of weirdness in page_32.h, with APUS undef'ing things. 
I > think this is cleaner though than polluting the rest of the code with > PPC_MEMOFFSET etc. > > Built for PPC (common_defconfig), with ARCH=powerpc, mostly built with > ARCH=ppc (other things break the build). Built and booted on P5 LPAR for > PPC64 with ARCH=ppc/powerpc (pseries_defconfig). Mostly built and for > iSeries powerpc. > > Signed-off-by: Michael Ellerman > --- > > include/asm-powerpc/page.h | 173 +++++++++++++++++++++ > include/asm-powerpc/page_32.h | 97 ++++++++++++ > include/asm-powerpc/page_64.h | 177 ++++++++++++++++++++++ > include/asm-ppc/page.h | 173 --------------------- > include/asm-ppc64/page.h | 333 > ------------------------------------------ 5 files changed, 447 > insertions(+), 506 deletions(-) > > Index: kexec/include/asm-powerpc/page.h > =================================================================== > --- /dev/null > +++ kexec/include/asm-powerpc/page.h > @@ -0,0 +1,173 @@ > +#ifndef _ASM_POWERPC_PAGE_H > +#define _ASM_POWERPC_PAGE_H > + > +/* > + * Copyright (C) 2001,2005 IBM Corporation. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. > + */ > + > +#ifdef __KERNEL__ > +#include > +#include /* for ASM_CONST */ > + > +/* > + * On PPC32 page size is 4K. For PPC64 we support either 4K or 64K > software + * page size. When using 64K pages however, whether we are really > supporting + * 64K pages in HW or not is irrelevant to those definitions. > + */ > +#ifdef CONFIG_PPC_64K_PAGES > +#define PAGE_SHIFT 16 > +#else > +#define PAGE_SHIFT 12 > +#endif > + > +#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) > + > +/* > + * Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So on PPC32 > + * if we assign PAGE_MASK to a long long it gets extended the way want > + * (i.e. 
with 1s in the high bits) > + */ > +#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) > + > +#define PAGE_OFFSET ASM_CONST(CONFIG_KERNEL_START) > +#define KERNELBASE PAGE_OFFSET > + > +#ifdef CONFIG_DISCONTIGMEM > +#define page_to_pfn(page) discontigmem_page_to_pfn(page) > +#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) > +#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) > +#endif > + > +#ifdef CONFIG_FLATMEM > +#define pfn_to_page(pfn) (mem_map + (pfn)) > +#define page_to_pfn(page) ((unsigned long)((page) - mem_map)) > +#define pfn_valid(pfn) ((pfn) < max_mapnr) > +#endif > + > +#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) > +#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) > +#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) > + > +#define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) > +#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET) > + > +/* > + * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, > + * and needs to be executable. This means the whole heap ends > + * up being executable. > + */ > +#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ > + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) > + > +#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ > + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) > + > +#ifdef __powerpc64__ > +#include > +#else > +#include > +#endif > + > +#undef STRICT_MM_TYPECHECKS > + > +#ifdef STRICT_MM_TYPECHECKS > +/* These are used to make use of C type-checking. 
*/ > + > +/* PTE level */ > +typedef struct { pte_basic_t pte; } pte_t; > +#define pte_val(x) ((x).pte) > +#define __pte(x) ((pte_t) { (x) }) > + > +/* 64k pages additionally define a bigger "real PTE" type that gathers > + * the "second half" part of the PTE for pseudo 64k pages > + */ > +#ifdef CONFIG_PPC_64K_PAGES > +typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; > +#else > +typedef struct { pte_t pte; } real_pte_t; > +#endif > + > +/* PMD level */ > +typedef struct { unsigned long pmd; } pmd_t; > +#define pmd_val(x) ((x).pmd) > +#define __pmd(x) ((pmd_t) { (x) }) > + > +/* PUD level exusts only on 4k pages */ > +#ifndef CONFIG_PPC_64K_PAGES > +typedef struct { unsigned long pud; } pud_t; > +#define pud_val(x) ((x).pud) > +#define __pud(x) ((pud_t) { (x) }) > +#endif > + > +/* PGD level */ > +typedef struct { unsigned long pgd; } pgd_t; > +#define pgd_val(x) ((x).pgd) > +#define __pgd(x) ((pgd_t) { (x) }) > + > +/* Page protection bits */ > +typedef struct { unsigned long pgprot; } pgprot_t; > +#define pgprot_val(x) ((x).pgprot) > +#define __pgprot(x) ((pgprot_t) { (x) }) > + > +#else > + > +/* > + * .. 
while these make it easier on the compiler > + */ > + > +typedef pte_basic_t pte_t; > +#define pte_val(x) (x) > +#define __pte(x) (x) > + > +#ifdef CONFIG_PPC_64K_PAGES > +typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; > +#else > +typedef unsigned long real_pte_t; > +#endif > + > + > +typedef unsigned long pmd_t; > +#define pmd_val(x) (x) > +#define __pmd(x) (x) > + > +#ifndef CONFIG_PPC_64K_PAGES > +typedef unsigned long pud_t; > +#define pud_val(x) (x) > +#define __pud(x) (x) > +#endif > + > +typedef unsigned long pgd_t; > +#define pgd_val(x) (x) > +#define pgprot_val(x) (x) > + > +typedef unsigned long pgprot_t; > +#define __pgd(x) (x) > +#define __pgprot(x) (x) > + > +#endif > + > + > +/* align addr on a size boundary - adjust address up/down if needed */ > +#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) > +#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) > + > +/* align addr on a size boundary - adjust address up if needed */ > +#define _ALIGN(addr,size) _ALIGN_UP(addr,size) > + > +/* to align the pointer to the (next) page boundary */ > +#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) > + > +struct page; > +extern void clear_user_page(void *page, unsigned long vaddr, struct page > *pg); +extern void copy_user_page(void *to, void *from, unsigned long > vaddr, + struct page *p); > +extern int page_is_ram(unsigned long pfn); > + > +#endif /* __KERNEL__ */ > + > +#endif /* _ASM_POWERPC_PAGE_H */ > Index: kexec/include/asm-powerpc/page_32.h > =================================================================== > --- /dev/null > +++ kexec/include/asm-powerpc/page_32.h > @@ -0,0 +1,97 @@ > +#ifndef _ASM_POWERPC_PAGE_32_H > +#define _ASM_POWERPC_PAGE_32_H > + > +#define VM_DATA_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS_32 > + > +#ifndef __ASSEMBLY__ > +/* > + * The basic type of a PTE - 64 bits for those CPUs with > 32 bit > + * physical addressing. For now this just the IBM PPC440. 
> + */ > +#ifdef CONFIG_PTE_64BIT > +typedef unsigned long long pte_basic_t; > +#define PTE_SHIFT (PAGE_SHIFT - 3) /* 512 ptes per page */ > +#define PTE_FMT "%16Lx" > +#else > +typedef unsigned long pte_basic_t; > +#define PTE_SHIFT (PAGE_SHIFT - 2) /* 1024 ptes per page */ > +#define PTE_FMT "%.8lx" > +#endif > + > +struct page; > +extern void clear_pages(void *page, int order); > +static inline void clear_page(void *page) { clear_pages(page, 0); } > +extern void copy_page(void *to, void *from); > + > +/* Pure 2^n version of get_order */ > +extern __inline__ int get_order(unsigned long size) > +{ > + int lz; > + > + size = (size-1) >> PAGE_SHIFT; > + asm ("cntlzw %0,%1" : "=r" (lz) : "r" (size)); > + return 32 - lz; > +} > + > +#ifndef CONFIG_APUS > +#define PPC_MEMSTART 0 > +#else /* CONFIG_APUS */ > +extern unsigned long ppc_memstart; > +extern unsigned long ppc_pgstart; > +extern unsigned long ppc_memoffset; > +#define PPC_MEMSTART ppc_memstart > + > +#ifdef MODULE > +#define ___pa(vaddr) ((vaddr) - ppc_memoffset) > +#define ___va(paddr) ((paddr) + ppc_memoffset) > +#else /* !MODULE */ > +/* map phys->virtual and virtual->phys for RAM pages */ > +static inline unsigned long ___pa(unsigned long v) > +{ > + unsigned long p; > + asm volatile ("1: addis %0, %1, %2;" > + ".section \".vtop_fixup\",\"aw\";" > + ".align 1;" > + ".long 1b;" > + ".previous;" > + : "=r" (p) > + : "b" (v), "K" (((-PAGE_OFFSET) >> 16) & 0xffff)); > + > + return p; > +} > +static inline void* ___va(unsigned long p) > +{ > + unsigned long v; > + asm volatile ("1: addis %0, %1, %2;" > + ".section \".ptov_fixup\",\"aw\";" > + ".align 1;" > + ".long 1b;" > + ".previous;" > + : "=r" (v) > + : "b" (p), "K" (((PAGE_OFFSET) >> 16) & 0xffff)); > + > + return (void*) v; > +} > +#endif > + > +/* APUS needs more complicated versions of these macros */ > +#undef __pa > +#define __pa(x) ___pa((unsigned long)(x)) > + > +#undef __va > +#define __va(x) ((void *)(___va((unsigned long)(x)))) > + > +#undef 
pfn_to_page > +#define pfn_to_page(pfn) (mem_map + ((pfn) - ppc_pgstart)) > + > +#undef page_to_pfn > +#define page_to_pfn(page) ((unsigned long)((page) - mem_map) + > ppc_pgstart) + > +#undef pfn_valid > +#define pfn_valid(pfn) (((pfn) - ppc_pgstart) < max_mapnr) > + > +#endif /* CONFIG_APUS */ > + > +#endif /* __ASSEMBLY__ */ > + > +#endif /* _ASM_POWERPC_PAGE_32_H */ > Index: kexec/include/asm-powerpc/page_64.h > =================================================================== > --- /dev/null > +++ kexec/include/asm-powerpc/page_64.h > @@ -0,0 +1,177 @@ > +#ifndef _ASM_POWERPC_PAGE_64_H > +#define _ASM_POWERPC_PAGE_64_H > + > +/* > + * Copyright (C) 2001 PPC64 Team, IBM Corp > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. > + */ > + > +/* > + * We always define HW_PAGE_SHIFT to 12 as use of 64K pages remains Linux > + * specific, every notion of page number shared with the firmware, TCEs, > + * iommu, etc... still uses a page size of 4K. > + */ > +#define HW_PAGE_SHIFT 12 > +#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) > +#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) > + > +/* > + * PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and > + * HW_PAGE_SHIFT, that is 4K pages. 
> + */ > +#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) > + > +#define REGION_SIZE 4UL > +#define REGION_SHIFT 60UL > +#define REGION_MASK (((1UL<<REGION_SIZE)-1UL)<<REGION_SHIFT) > + > +#define VMALLOCBASE ASM_CONST(0xD000000000000000) > +#define VMALLOC_REGION_ID (VMALLOCBASE >> REGION_SHIFT) > +#define KERNEL_REGION_ID (KERNELBASE >> REGION_SHIFT) > +#define USER_REGION_ID (0UL) > +#define REGION_ID(ea) (((unsigned long)(ea)) >> REGION_SHIFT) > + > +/* Segment size */ > +#define SID_SHIFT 28 > +#define SID_MASK 0xfffffffffUL > +#define ESID_MASK 0xfffffffff0000000UL > +#define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) > + > +#ifndef __ASSEMBLY__ > +#include > + > +typedef unsigned long pte_basic_t; > + > +static __inline__ void clear_page(void *addr) > +{ > + unsigned long lines, line_size; > + > + line_size = ppc64_caches.dline_size; > + lines = ppc64_caches.dlines_per_page; > + > + __asm__ __volatile__( > + "mtctr %1 # clear_page\n\ > +1: dcbz 0,%0\n\ > + add %0,%0,%3\n\ > + bdnz+ 1b" > + : "=r" (addr) > + : "r" (lines), "0" (addr), "r" (line_size) > + : "ctr", "memory"); > +} > + > +extern void copy_4K_page(void *to, void *from); > + > +#ifdef CONFIG_PPC_64K_PAGES > +static inline void copy_page(void *to, void *from) > +{ > + unsigned int i; > + for (i=0; i < (1 << (PAGE_SHIFT - 12)); i++) { > + copy_4K_page(to, from); > + to += 4096; > + from += 4096; > + } > +} > +#else /* CONFIG_PPC_64K_PAGES */ > +static inline void copy_page(void *to, void *from) > +{ > + copy_4K_page(to, from); > +} > +#endif /* CONFIG_PPC_64K_PAGES */ > + > +/* Log 2 of page table size */ > +extern u64 ppc64_pft_size; > + > +/* We do define AT_SYSINFO_EHDR but don't use the gate mecanism */ > +#define __HAVE_ARCH_GATE_AREA 1 > + > +/* Large pages size */ > +extern unsigned int HPAGE_SHIFT; > +#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) > +#define HPAGE_MASK (~(HPAGE_SIZE - 1)) > +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) > + > +#endif /* __ASSEMBLY__ */ > + > +#ifdef CONFIG_HUGETLB_PAGE > + > +#define
HTLB_AREA_SHIFT 40 > +#define HTLB_AREA_SIZE (1UL << HTLB_AREA_SHIFT) > +#define GET_HTLB_AREA(x) ((x) >> HTLB_AREA_SHIFT) > + > +#define LOW_ESID_MASK(addr, len) (((1U << (GET_ESID(addr+len-1)+1)) \ > + - (1U << GET_ESID(addr))) & 0xffff) > +#define HTLB_AREA_MASK(addr, len) (((1U << > (GET_HTLB_AREA(addr+len-1)+1)) \ + - (1U << > GET_HTLB_AREA(addr))) & 0xffff) > + > +#define ARCH_HAS_HUGEPAGE_ONLY_RANGE > +#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE > +#define ARCH_HAS_SETCLEAR_HUGE_PTE > + > +#define touches_hugepage_low_range(mm, addr, len) \ > + (LOW_ESID_MASK((addr), (len)) & (mm)->context.low_htlb_areas) > +#define touches_hugepage_high_range(mm, addr, len) \ > + (HTLB_AREA_MASK((addr), (len)) & (mm)->context.high_htlb_areas) > + > +#define __within_hugepage_low_range(addr, len, segmask) \ > + ((LOW_ESID_MASK((addr), (len)) | (segmask)) == (segmask)) > +#define within_hugepage_low_range(addr, len) \ > + __within_hugepage_low_range((addr), (len), \ > + current->mm->context.low_htlb_areas) > +#define __within_hugepage_high_range(addr, len, zonemask) \ > + ((HTLB_AREA_MASK((addr), (len)) | (zonemask)) == (zonemask)) > +#define within_hugepage_high_range(addr, len) \ > + __within_hugepage_high_range((addr), (len), \ > + current->mm->context.high_htlb_areas) > + > +#define is_hugepage_only_range(mm, addr, len) \ > + (touches_hugepage_high_range((mm), (addr), (len)) || \ > + touches_hugepage_low_range((mm), (addr), (len))) > +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA > + > +#define in_hugepage_area(context, addr) \ > + (cpu_has_feature(CPU_FTR_16M_PAGE) && \ > + ( ((1 << GET_HTLB_AREA(addr)) & (context).high_htlb_areas) || \ > + ( ((addr) < 0x100000000L) && \ > + ((1 << GET_ESID(addr)) & (context).low_htlb_areas) ) ) ) > + > +#else /* !CONFIG_HUGETLB_PAGE */ > + > +#define in_hugepage_area(mm, addr) 0 > + > +#endif /* !CONFIG_HUGETLB_PAGE */ > + > +#ifdef MODULE > +#define __page_aligned __attribute__((__aligned__(PAGE_SIZE))) > +#else > +#define __page_aligned \ > + 
__attribute__((__aligned__(PAGE_SIZE), \ > + __section__(".data.page_aligned"))) > +#endif > + > +#define VM_DATA_DEFAULT_FLAGS \ > + (test_thread_flag(TIF_32BIT) ? \ > + VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) > + > +/* > + * This is the default if a program doesn't have a PT_GNU_STACK > + * program header entry. The PPC64 ELF ABI has a non executable stack > + * stack by default, so in the absense of a PT_GNU_STACK program header > + * we turn execute permission off. > + */ > +#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ > + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) > + > +#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ > + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) > + > +#define VM_STACK_DEFAULT_FLAGS \ > + (test_thread_flag(TIF_32BIT) ? \ > + VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) > + > +#include > + > +#endif /* _ASM_POWERPC_PAGE_64_H */ > Index: kexec/include/asm-ppc/page.h > =================================================================== > --- kexec.orig/include/asm-ppc/page.h > +++ /dev/null > @@ -1,173 +0,0 @@ > -#ifndef _PPC_PAGE_H > -#define _PPC_PAGE_H > - > -/* PAGE_SHIFT determines the page size */ > -#define PAGE_SHIFT 12 > -#define PAGE_SIZE (1UL << PAGE_SHIFT) > - > -/* > - * Subtle: this is an int (not an unsigned long) and so it > - * gets extended to 64 bits the way want (i.e. with 1s). -- paulus > - */ > -#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) > - > -#ifdef __KERNEL__ > -#include > - > -/* This must match what is in arch/ppc/Makefile */ > -#define PAGE_OFFSET CONFIG_KERNEL_START > -#define KERNELBASE PAGE_OFFSET > - > -#ifndef __ASSEMBLY__ > - > -/* > - * The basic type of a PTE - 64 bits for those CPUs with > 32 bit > - * physical addressing. For now this just the IBM PPC440. 
> - */ > -#ifdef CONFIG_PTE_64BIT > -typedef unsigned long long pte_basic_t; > -#define PTE_SHIFT (PAGE_SHIFT - 3) /* 512 ptes per page */ > -#define PTE_FMT "%16Lx" > -#else > -typedef unsigned long pte_basic_t; > -#define PTE_SHIFT (PAGE_SHIFT - 2) /* 1024 ptes per page */ > -#define PTE_FMT "%.8lx" > -#endif > - > -/* align addr on a size boundary - adjust address up/down if needed */ > -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) > -#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) > - > -/* align addr on a size boundary - adjust address up if needed */ > -#define _ALIGN(addr,size) _ALIGN_UP(addr,size) > - > -/* to align the pointer to the (next) page boundary */ > -#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) > - > - > -#undef STRICT_MM_TYPECHECKS > - > -#ifdef STRICT_MM_TYPECHECKS > -/* > - * These are used to make use of C type-checking.. > - */ > -typedef struct { pte_basic_t pte; } pte_t; > -typedef struct { unsigned long pmd; } pmd_t; > -typedef struct { unsigned long pgd; } pgd_t; > -typedef struct { unsigned long pgprot; } pgprot_t; > - > -#define pte_val(x) ((x).pte) > -#define pmd_val(x) ((x).pmd) > -#define pgd_val(x) ((x).pgd) > -#define pgprot_val(x) ((x).pgprot) > - > -#define __pte(x) ((pte_t) { (x) } ) > -#define __pmd(x) ((pmd_t) { (x) } ) > -#define __pgd(x) ((pgd_t) { (x) } ) > -#define __pgprot(x) ((pgprot_t) { (x) } ) > - > -#else > -/* > - * .. 
while these make it easier on the compiler > - */ > -typedef pte_basic_t pte_t; > -typedef unsigned long pmd_t; > -typedef unsigned long pgd_t; > -typedef unsigned long pgprot_t; > - > -#define pte_val(x) (x) > -#define pmd_val(x) (x) > -#define pgd_val(x) (x) > -#define pgprot_val(x) (x) > - > -#define __pte(x) (x) > -#define __pmd(x) (x) > -#define __pgd(x) (x) > -#define __pgprot(x) (x) > - > -#endif > - > -struct page; > -extern void clear_pages(void *page, int order); > -static inline void clear_page(void *page) { clear_pages(page, 0); } > -extern void copy_page(void *to, void *from); > -extern void clear_user_page(void *page, unsigned long vaddr, struct page > *pg); -extern void copy_user_page(void *to, void *from, unsigned long > vaddr, - struct page *pg); > - > -#ifndef CONFIG_APUS > -#define PPC_MEMSTART 0 > -#define PPC_PGSTART 0 > -#define PPC_MEMOFFSET PAGE_OFFSET > -#else > -extern unsigned long ppc_memstart; > -extern unsigned long ppc_pgstart; > -extern unsigned long ppc_memoffset; > -#define PPC_MEMSTART ppc_memstart > -#define PPC_PGSTART ppc_pgstart > -#define PPC_MEMOFFSET ppc_memoffset > -#endif > - > -#if defined(CONFIG_APUS) && !defined(MODULE) > -/* map phys->virtual and virtual->phys for RAM pages */ > -static inline unsigned long ___pa(unsigned long v) > -{ > - unsigned long p; > - asm volatile ("1: addis %0, %1, %2;" > - ".section \".vtop_fixup\",\"aw\";" > - ".align 1;" > - ".long 1b;" > - ".previous;" > - : "=r" (p) > - : "b" (v), "K" (((-PAGE_OFFSET) >> 16) & 0xffff)); > - > - return p; > -} > -static inline void* ___va(unsigned long p) > -{ > - unsigned long v; > - asm volatile ("1: addis %0, %1, %2;" > - ".section \".ptov_fixup\",\"aw\";" > - ".align 1;" > - ".long 1b;" > - ".previous;" > - : "=r" (v) > - : "b" (p), "K" (((PAGE_OFFSET) >> 16) & 0xffff)); > - > - return (void*) v; > -} > -#else > -#define ___pa(vaddr) ((vaddr)-PPC_MEMOFFSET) > -#define ___va(paddr) ((paddr)+PPC_MEMOFFSET) > -#endif > - > -extern int 
page_is_ram(unsigned long pfn); > - > -#define __pa(x) ___pa((unsigned long)(x)) > -#define __va(x) ((void *)(___va((unsigned long)(x)))) > - > -#define pfn_to_page(pfn) (mem_map + ((pfn) - PPC_PGSTART)) > -#define page_to_pfn(page) ((unsigned long)((page) - mem_map) + > PPC_PGSTART) -#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> > PAGE_SHIFT) -#define page_to_virt(page) __va(page_to_pfn(page) << > PAGE_SHIFT) > - > -#define pfn_valid(pfn) (((pfn) - PPC_PGSTART) < max_mapnr) > -#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) > - > -/* Pure 2^n version of get_order */ > -extern __inline__ int get_order(unsigned long size) > -{ > - int lz; > - > - size = (size-1) >> PAGE_SHIFT; > - asm ("cntlzw %0,%1" : "=r" (lz) : "r" (size)); > - return 32 - lz; > -} > - > -#endif /* __ASSEMBLY__ */ > - > -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \ > - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) > - > -#endif /* __KERNEL__ */ > -#endif /* _PPC_PAGE_H */ > Index: kexec/include/asm-ppc64/page.h > =================================================================== > --- kexec.orig/include/asm-ppc64/page.h > +++ /dev/null > @@ -1,333 +0,0 @@ > -#ifndef _PPC64_PAGE_H > -#define _PPC64_PAGE_H > - > -/* > - * Copyright (C) 2001 PPC64 Team, IBM Corp > - * > - * This program is free software; you can redistribute it and/or > - * modify it under the terms of the GNU General Public License > - * as published by the Free Software Foundation; either version > - * 2 of the License, or (at your option) any later version. > - */ > - > -#include > -#include /* for ASM_CONST */ > - > -/* > - * We support either 4k or 64k software page size. When using 64k pages > - * however, wether we are really supporting 64k pages in HW or not is > - * irrelevant to those definitions. We always define HW_PAGE_SHIFT to 12 > - * as use of 64k pages remains a linux kernel specific, every notion of > - * page number shared with the firmware, TCEs, iommu, etc... 
still assumes > - * a page size of 4096. > - */ > -#ifdef CONFIG_PPC_64K_PAGES > -#define PAGE_SHIFT 16 > -#else > -#define PAGE_SHIFT 12 > -#endif > - > -#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) > -#define PAGE_MASK (~(PAGE_SIZE-1)) > - > -/* HW_PAGE_SHIFT is always 4k pages */ > -#define HW_PAGE_SHIFT 12 > -#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) > -#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) > - > -/* PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and > - * HW_PAGE_SHIFT, that is 4k pages > - */ > -#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) > - > -/* Segment size */ > -#define SID_SHIFT 28 > -#define SID_MASK 0xfffffffffUL > -#define ESID_MASK 0xfffffffff0000000UL > -#define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) > - > -/* Large pages size */ > - > -#ifndef __ASSEMBLY__ > -extern unsigned int HPAGE_SHIFT; > -#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) > -#define HPAGE_MASK (~(HPAGE_SIZE - 1)) > -#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) > -#endif /* __ASSEMBLY__ */ > - > -#ifdef CONFIG_HUGETLB_PAGE > - > - > -#define HTLB_AREA_SHIFT 40 > -#define HTLB_AREA_SIZE (1UL << HTLB_AREA_SHIFT) > -#define GET_HTLB_AREA(x) ((x) >> HTLB_AREA_SHIFT) > - > -#define LOW_ESID_MASK(addr, len) (((1U << (GET_ESID(addr+len-1)+1)) \ > - - (1U << GET_ESID(addr))) & 0xffff) > -#define HTLB_AREA_MASK(addr, len) (((1U << > (GET_HTLB_AREA(addr+len-1)+1)) \ - - (1U << > GET_HTLB_AREA(addr))) & 0xffff) - > -#define ARCH_HAS_HUGEPAGE_ONLY_RANGE > -#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE > -#define ARCH_HAS_SETCLEAR_HUGE_PTE > - > -#define touches_hugepage_low_range(mm, addr, len) \ > - (LOW_ESID_MASK((addr), (len)) & (mm)->context.low_htlb_areas) > -#define touches_hugepage_high_range(mm, addr, len) \ > - (HTLB_AREA_MASK((addr), (len)) & (mm)->context.high_htlb_areas) > - > -#define __within_hugepage_low_range(addr, len, segmask) \ > - ((LOW_ESID_MASK((addr), (len)) | (segmask)) == (segmask)) > -#define within_hugepage_low_range(addr, len) 
\ > - __within_hugepage_low_range((addr), (len), \ > - current->mm->context.low_htlb_areas) > -#define __within_hugepage_high_range(addr, len, zonemask) \ > - ((HTLB_AREA_MASK((addr), (len)) | (zonemask)) == (zonemask)) > -#define within_hugepage_high_range(addr, len) \ > - __within_hugepage_high_range((addr), (len), \ > - current->mm->context.high_htlb_areas) > - > -#define is_hugepage_only_range(mm, addr, len) \ > - (touches_hugepage_high_range((mm), (addr), (len)) || \ > - touches_hugepage_low_range((mm), (addr), (len))) > -#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA > - > -#define in_hugepage_area(context, addr) \ > - (cpu_has_feature(CPU_FTR_16M_PAGE) && \ > - ( ((1 << GET_HTLB_AREA(addr)) & (context).high_htlb_areas) || \ > - ( ((addr) < 0x100000000L) && \ > - ((1 << GET_ESID(addr)) & (context).low_htlb_areas) ) ) ) > - > -#else /* !CONFIG_HUGETLB_PAGE */ > - > -#define in_hugepage_area(mm, addr) 0 > - > -#endif /* !CONFIG_HUGETLB_PAGE */ > - > -/* align addr on a size boundary - adjust address up/down if needed */ > -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) > -#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) > - > -/* align addr on a size boundary - adjust address up if needed */ > -#define _ALIGN(addr,size) _ALIGN_UP(addr,size) > - > -/* to align the pointer to the (next) page boundary */ > -#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) > - > -#ifdef __KERNEL__ > -#ifndef __ASSEMBLY__ > -#include > - > -#undef STRICT_MM_TYPECHECKS > - > -#define REGION_SIZE 4UL > -#define REGION_SHIFT 60UL > -#define REGION_MASK (((1UL<<REGION_SIZE)-1UL)<<REGION_SHIFT) > - > -static __inline__ void clear_page(void *addr) > -{ > - unsigned long lines, line_size; > - > - line_size = ppc64_caches.dline_size; > - lines = ppc64_caches.dlines_per_page; > - > - __asm__ __volatile__( > - "mtctr %1 # clear_page\n\ > -1: dcbz 0,%0\n\ > - add %0,%0,%3\n\ > - bdnz+ 1b" > - : "=r" (addr) > - : "r" (lines), "0" (addr), "r" (line_size) > - : "ctr", "memory"); > -} > - > -extern void
copy_4K_page(void *to, void *from); > - > -#ifdef CONFIG_PPC_64K_PAGES > -static inline void copy_page(void *to, void *from) > -{ > - unsigned int i; > - for (i=0; i < (1 << (PAGE_SHIFT - 12)); i++) { > - copy_4K_page(to, from); > - to += 4096; > - from += 4096; > - } > -} > -#else /* CONFIG_PPC_64K_PAGES */ > -static inline void copy_page(void *to, void *from) > -{ > - copy_4K_page(to, from); > -} > -#endif /* CONFIG_PPC_64K_PAGES */ > - > -struct page; > -extern void clear_user_page(void *page, unsigned long vaddr, struct page > *pg); -extern void copy_user_page(void *to, void *from, unsigned long > vaddr, struct page *p); - > -#ifdef STRICT_MM_TYPECHECKS > -/* > - * These are used to make use of C type-checking. > - * Entries in the pte table are 64b, while entries in the pgd & pmd are > 32b. - */ > - > -/* PTE level */ > -typedef struct { unsigned long pte; } pte_t; > -#define pte_val(x) ((x).pte) > -#define __pte(x) ((pte_t) { (x) }) > - > -/* 64k pages additionally define a bigger "real PTE" type that gathers > - * the "second half" part of the PTE for pseudo 64k pages > - */ > -#ifdef CONFIG_PPC_64K_PAGES > -typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; > -#else > -typedef struct { pte_t pte; } real_pte_t; > -#endif > - > -/* PMD level */ > -typedef struct { unsigned long pmd; } pmd_t; > -#define pmd_val(x) ((x).pmd) > -#define __pmd(x) ((pmd_t) { (x) }) > - > -/* PUD level exusts only on 4k pages */ > -#ifndef CONFIG_PPC_64K_PAGES > -typedef struct { unsigned long pud; } pud_t; > -#define pud_val(x) ((x).pud) > -#define __pud(x) ((pud_t) { (x) }) > -#endif > - > -/* PGD level */ > -typedef struct { unsigned long pgd; } pgd_t; > -#define pgd_val(x) ((x).pgd) > -#define __pgd(x) ((pgd_t) { (x) }) > - > -/* Page protection bits */ > -typedef struct { unsigned long pgprot; } pgprot_t; > -#define pgprot_val(x) ((x).pgprot) > -#define __pgprot(x) ((pgprot_t) { (x) }) > - > -#else > - > -/* > - * .. 
while these make it easier on the compiler > - */ > - > -typedef unsigned long pte_t; > -#define pte_val(x) (x) > -#define __pte(x) (x) > - > -#ifdef CONFIG_PPC_64K_PAGES > -typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; > -#else > -typedef unsigned long real_pte_t; > -#endif > - > - > -typedef unsigned long pmd_t; > -#define pmd_val(x) (x) > -#define __pmd(x) (x) > - > -#ifndef CONFIG_PPC_64K_PAGES > -typedef unsigned long pud_t; > -#define pud_val(x) (x) > -#define __pud(x) (x) > -#endif > - > -typedef unsigned long pgd_t; > -#define pgd_val(x) (x) > -#define pgprot_val(x) (x) > - > -typedef unsigned long pgprot_t; > -#define __pgd(x) (x) > -#define __pgprot(x) (x) > - > -#endif > - > -#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET) > - > -extern int page_is_ram(unsigned long pfn); > - > -extern u64 ppc64_pft_size; /* Log 2 of page table size */ > - > -/* We do define AT_SYSINFO_EHDR but don't use the gate mecanism */ > -#define __HAVE_ARCH_GATE_AREA 1 > - > -#endif /* __ASSEMBLY__ */ > - > -#ifdef MODULE > -#define __page_aligned __attribute__((__aligned__(PAGE_SIZE))) > -#else > -#define __page_aligned \ > - __attribute__((__aligned__(PAGE_SIZE), \ > - __section__(".data.page_aligned"))) > -#endif > - > - > -/* This must match the -Ttext linker address */ > -/* Note: tophys & tovirt make assumptions about how */ > -/* KERNELBASE is defined for performance reasons. */ > -/* When KERNELBASE moves, those macros may have */ > -/* to change! 
*/ > -#define PAGE_OFFSET ASM_CONST(0xC000000000000000) > -#define KERNELBASE PAGE_OFFSET > -#define VMALLOCBASE ASM_CONST(0xD000000000000000) > - > -#define VMALLOC_REGION_ID (VMALLOCBASE >> REGION_SHIFT) > -#define KERNEL_REGION_ID (KERNELBASE >> REGION_SHIFT) > -#define USER_REGION_ID (0UL) > -#define REGION_ID(ea) (((unsigned long)(ea)) >> REGION_SHIFT) > - > -#define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) > - > -#ifdef CONFIG_DISCONTIGMEM > -#define page_to_pfn(page) discontigmem_page_to_pfn(page) > -#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) > -#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) > -#endif > -#ifdef CONFIG_FLATMEM > -#define pfn_to_page(pfn) (mem_map + (pfn)) > -#define page_to_pfn(page) ((unsigned long)((page) - mem_map)) > -#define pfn_valid(pfn) ((pfn) < max_mapnr) > -#endif > - > -#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) > -#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) > - > -#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) > - > -/* > - * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, > - * and needs to be executable. This means the whole heap ends > - * up being executable. > - */ > -#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ > - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) > - > -#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ > - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) > - > -#define VM_DATA_DEFAULT_FLAGS \ > - (test_thread_flag(TIF_32BIT) ? \ > - VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) > - > -/* > - * This is the default if a program doesn't have a PT_GNU_STACK > - * program header entry. The PPC64 ELF ABI has a non executable stack > - * stack by default, so in the absense of a PT_GNU_STACK program header > - * we turn execute permission off. 
> - */ > -#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ > - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) > - > -#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ > - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) > - > -#define VM_STACK_DEFAULT_FLAGS \ > - (test_thread_flag(TIF_32BIT) ? \ > - VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) > - > -#endif /* __KERNEL__ */ > - > -#include > - > -#endif /* _PPC64_PAGE_H */ > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/mailman/listinfo/linuxppc64-dev -- Michael Ellerman IBM OzLabs email: michael:ellerman.id.au inmsg: mpe:jabber.org wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person From michael at ellerman.id.au Fri Nov 11 13:09:52 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Fri, 11 Nov 2005 13:09:52 +1100 (EST) Subject: [PATCH] powerpc: Take 3, merge page.h Message-ID: <20051111020952.96F5F68704@ozlabs.org> Merge asm-ppc/page.h and asm-ppc64/page.h, into asm-powerpc/page.h, asm-powerpc/page_32.h and asm-powerpc/page_64.h There's a bit of weirdness in page_32.h, with APUS undef'ing things. I think this is cleaner though than polluting the rest of the code with PPC_MEMOFFSET etc. Built for PPC (common_defconfig), with ARCH=powerpc, mostly built with ARCH=ppc (other things break the build). Built and booted on P5 LPAR for PPC64 with ARCH=ppc/powerpc (pseries_defconfig). Mostly built and for iSeries powerpc.
Signed-off-by: Michael Ellerman --- arch/ppc64/Kconfig | 5 include/asm-powerpc/page.h | 176 ++++++++++++++++++++++ include/asm-powerpc/page_32.h | 97 ++++++++++++ include/asm-powerpc/page_64.h | 177 ++++++++++++++++++++++ include/asm-ppc/page.h | 173 --------------------- include/asm-ppc64/page.h | 333 ------------------------------------------ 6 files changed, 455 insertions(+), 506 deletions(-) Index: kexec/include/asm-powerpc/page.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/page.h @@ -0,0 +1,176 @@ +#ifndef _ASM_POWERPC_PAGE_H +#define _ASM_POWERPC_PAGE_H + +/* + * Copyright (C) 2001,2005 IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifdef __KERNEL__ +#include +#include /* for ASM_CONST */ + +/* + * On PPC32 page size is 4K. For PPC64 we support either 4K or 64K software + * page size. When using 64K pages however, whether we are really supporting + * 64K pages in HW or not is irrelevant to those definitions. + */ +#ifdef CONFIG_PPC_64K_PAGES +#define PAGE_SHIFT 16 +#else +#define PAGE_SHIFT 12 +#endif + +#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) + +/* + * Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So on PPC32 + * if we assign PAGE_MASK to a long long it gets extended the way want + * (i.e. 
with 1s in the high bits) + */ +#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) + +#define PAGE_OFFSET ASM_CONST(CONFIG_KERNEL_START) +#define KERNELBASE PAGE_OFFSET + +#ifdef CONFIG_DISCONTIGMEM +#define page_to_pfn(page) discontigmem_page_to_pfn(page) +#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) +#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) +#endif + +#ifdef CONFIG_FLATMEM +#define pfn_to_page(pfn) (mem_map + (pfn)) +#define page_to_pfn(page) ((unsigned long)((page) - mem_map)) +#define pfn_valid(pfn) ((pfn) < max_mapnr) +#endif + +#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) +#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) +#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) + +#define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) +#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET) + +/* + * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, + * and needs to be executable. This means the whole heap ends + * up being executable. + */ +#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#ifdef __powerpc64__ +#include +#else +#include +#endif + +/* align addr on a size boundary - adjust address up/down if needed */ +#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) +#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) + +/* align addr on a size boundary - adjust address up if needed */ +#define _ALIGN(addr,size) _ALIGN_UP(addr,size) + +/* to align the pointer to the (next) page boundary */ +#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) + +#ifndef __ASSEMBLY__ + +#undef STRICT_MM_TYPECHECKS + +#ifdef STRICT_MM_TYPECHECKS +/* These are used to make use of C type-checking. 
*/ + +/* PTE level */ +typedef struct { pte_basic_t pte; } pte_t; +#define pte_val(x) ((x).pte) +#define __pte(x) ((pte_t) { (x) }) + +/* 64k pages additionally define a bigger "real PTE" type that gathers + * the "second half" part of the PTE for pseudo 64k pages + */ +#ifdef CONFIG_PPC_64K_PAGES +typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; +#else +typedef struct { pte_t pte; } real_pte_t; +#endif + +/* PMD level */ +typedef struct { unsigned long pmd; } pmd_t; +#define pmd_val(x) ((x).pmd) +#define __pmd(x) ((pmd_t) { (x) }) + +/* PUD level exusts only on 4k pages */ +#ifndef CONFIG_PPC_64K_PAGES +typedef struct { unsigned long pud; } pud_t; +#define pud_val(x) ((x).pud) +#define __pud(x) ((pud_t) { (x) }) +#endif + +/* PGD level */ +typedef struct { unsigned long pgd; } pgd_t; +#define pgd_val(x) ((x).pgd) +#define __pgd(x) ((pgd_t) { (x) }) + +/* Page protection bits */ +typedef struct { unsigned long pgprot; } pgprot_t; +#define pgprot_val(x) ((x).pgprot) +#define __pgprot(x) ((pgprot_t) { (x) }) + +#else + +/* + * .. 
while these make it easier on the compiler + */ + +typedef pte_basic_t pte_t; +#define pte_val(x) (x) +#define __pte(x) (x) + +#ifdef CONFIG_PPC_64K_PAGES +typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; +#else +typedef unsigned long real_pte_t; +#endif + + +typedef unsigned long pmd_t; +#define pmd_val(x) (x) +#define __pmd(x) (x) + +#ifndef CONFIG_PPC_64K_PAGES +typedef unsigned long pud_t; +#define pud_val(x) (x) +#define __pud(x) (x) +#endif + +typedef unsigned long pgd_t; +#define pgd_val(x) (x) +#define pgprot_val(x) (x) + +typedef unsigned long pgprot_t; +#define __pgd(x) (x) +#define __pgprot(x) (x) + +#endif + +struct page; +extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg); +extern void copy_user_page(void *to, void *from, unsigned long vaddr, + struct page *p); +extern int page_is_ram(unsigned long pfn); + +#endif /* __ASSEMBLY__ */ + +#endif /* __KERNEL__ */ + +#endif /* _ASM_POWERPC_PAGE_H */ Index: kexec/include/asm-powerpc/page_32.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/page_32.h @@ -0,0 +1,97 @@ +#ifndef _ASM_POWERPC_PAGE_32_H +#define _ASM_POWERPC_PAGE_32_H + +#define VM_DATA_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS32 + +#ifndef __ASSEMBLY__ +/* + * The basic type of a PTE - 64 bits for those CPUs with > 32 bit + * physical addressing. For now this just the IBM PPC440. 
+ */ +#ifdef CONFIG_PTE_64BIT +typedef unsigned long long pte_basic_t; +#define PTE_SHIFT (PAGE_SHIFT - 3) /* 512 ptes per page */ +#define PTE_FMT "%16Lx" +#else +typedef unsigned long pte_basic_t; +#define PTE_SHIFT (PAGE_SHIFT - 2) /* 1024 ptes per page */ +#define PTE_FMT "%.8lx" +#endif + +struct page; +extern void clear_pages(void *page, int order); +static inline void clear_page(void *page) { clear_pages(page, 0); } +extern void copy_page(void *to, void *from); + +/* Pure 2^n version of get_order */ +extern __inline__ int get_order(unsigned long size) +{ + int lz; + + size = (size-1) >> PAGE_SHIFT; + asm ("cntlzw %0,%1" : "=r" (lz) : "r" (size)); + return 32 - lz; +} + +#ifndef CONFIG_APUS +#define PPC_MEMSTART 0 +#else /* CONFIG_APUS */ +extern unsigned long ppc_memstart; +extern unsigned long ppc_pgstart; +extern unsigned long ppc_memoffset; +#define PPC_MEMSTART ppc_memstart + +#ifdef MODULE +#define ___pa(vaddr) ((vaddr) - ppc_memoffset) +#define ___va(paddr) ((paddr) + ppc_memoffset) +#else /* !MODULE */ +/* map phys->virtual and virtual->phys for RAM pages */ +static inline unsigned long ___pa(unsigned long v) +{ + unsigned long p; + asm volatile ("1: addis %0, %1, %2;" + ".section \".vtop_fixup\",\"aw\";" + ".align 1;" + ".long 1b;" + ".previous;" + : "=r" (p) + : "b" (v), "K" (((-PAGE_OFFSET) >> 16) & 0xffff)); + + return p; +} +static inline void* ___va(unsigned long p) +{ + unsigned long v; + asm volatile ("1: addis %0, %1, %2;" + ".section \".ptov_fixup\",\"aw\";" + ".align 1;" + ".long 1b;" + ".previous;" + : "=r" (v) + : "b" (p), "K" (((PAGE_OFFSET) >> 16) & 0xffff)); + + return (void*) v; +} +#endif + +/* APUS needs more complicated versions of these macros */ +#undef __pa +#define __pa(x) ___pa((unsigned long)(x)) + +#undef __va +#define __va(x) ((void *)(___va((unsigned long)(x)))) + +#undef pfn_to_page +#define pfn_to_page(pfn) (mem_map + ((pfn) - ppc_pgstart)) + +#undef page_to_pfn +#define page_to_pfn(page) ((unsigned long)((page) - 
mem_map) + ppc_pgstart) + +#undef pfn_valid +#define pfn_valid(pfn) (((pfn) - ppc_pgstart) < max_mapnr) + +#endif /* CONFIG_APUS */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_POWERPC_PAGE_32_H */ Index: kexec/include/asm-powerpc/page_64.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/page_64.h @@ -0,0 +1,177 @@ +#ifndef _ASM_POWERPC_PAGE_64_H +#define _ASM_POWERPC_PAGE_64_H + +/* + * Copyright (C) 2001 PPC64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +/* + * We always define HW_PAGE_SHIFT to 12 as use of 64K pages remains Linux + * specific, every notion of page number shared with the firmware, TCEs, + * iommu, etc... still uses a page size of 4K. + */ +#define HW_PAGE_SHIFT 12 +#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) +#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) + +/* + * PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and + * HW_PAGE_SHIFT, that is 4K pages. 
+ */ +#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) + +#define REGION_SIZE 4UL +#define REGION_SHIFT 60UL +#define REGION_MASK (((1UL<> REGION_SHIFT) +#define KERNEL_REGION_ID (KERNELBASE >> REGION_SHIFT) +#define USER_REGION_ID (0UL) +#define REGION_ID(ea) (((unsigned long)(ea)) >> REGION_SHIFT) + +/* Segment size */ +#define SID_SHIFT 28 +#define SID_MASK 0xfffffffffUL +#define ESID_MASK 0xfffffffff0000000UL +#define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) + +#ifndef __ASSEMBLY__ +#include + +typedef unsigned long pte_basic_t; + +static __inline__ void clear_page(void *addr) +{ + unsigned long lines, line_size; + + line_size = ppc64_caches.dline_size; + lines = ppc64_caches.dlines_per_page; + + __asm__ __volatile__( + "mtctr %1 # clear_page\n\ +1: dcbz 0,%0\n\ + add %0,%0,%3\n\ + bdnz+ 1b" + : "=r" (addr) + : "r" (lines), "0" (addr), "r" (line_size) + : "ctr", "memory"); +} + +extern void copy_4K_page(void *to, void *from); + +#ifdef CONFIG_PPC_64K_PAGES +static inline void copy_page(void *to, void *from) +{ + unsigned int i; + for (i=0; i < (1 << (PAGE_SHIFT - 12)); i++) { + copy_4K_page(to, from); + to += 4096; + from += 4096; + } +} +#else /* CONFIG_PPC_64K_PAGES */ +static inline void copy_page(void *to, void *from) +{ + copy_4K_page(to, from); +} +#endif /* CONFIG_PPC_64K_PAGES */ + +/* Log 2 of page table size */ +extern u64 ppc64_pft_size; + +/* We do define AT_SYSINFO_EHDR but don't use the gate mecanism */ +#define __HAVE_ARCH_GATE_AREA 1 + +/* Large pages size */ +extern unsigned int HPAGE_SHIFT; +#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) +#define HPAGE_MASK (~(HPAGE_SIZE - 1)) +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) + +#endif /* __ASSEMBLY__ */ + +#ifdef CONFIG_HUGETLB_PAGE + +#define HTLB_AREA_SHIFT 40 +#define HTLB_AREA_SIZE (1UL << HTLB_AREA_SHIFT) +#define GET_HTLB_AREA(x) ((x) >> HTLB_AREA_SHIFT) + +#define LOW_ESID_MASK(addr, len) (((1U << (GET_ESID(addr+len-1)+1)) \ + - (1U << GET_ESID(addr))) & 0xffff) +#define 
HTLB_AREA_MASK(addr, len) (((1U << (GET_HTLB_AREA(addr+len-1)+1)) \ + - (1U << GET_HTLB_AREA(addr))) & 0xffff) + +#define ARCH_HAS_HUGEPAGE_ONLY_RANGE +#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE +#define ARCH_HAS_SETCLEAR_HUGE_PTE + +#define touches_hugepage_low_range(mm, addr, len) \ + (LOW_ESID_MASK((addr), (len)) & (mm)->context.low_htlb_areas) +#define touches_hugepage_high_range(mm, addr, len) \ + (HTLB_AREA_MASK((addr), (len)) & (mm)->context.high_htlb_areas) + +#define __within_hugepage_low_range(addr, len, segmask) \ + ((LOW_ESID_MASK((addr), (len)) | (segmask)) == (segmask)) +#define within_hugepage_low_range(addr, len) \ + __within_hugepage_low_range((addr), (len), \ + current->mm->context.low_htlb_areas) +#define __within_hugepage_high_range(addr, len, zonemask) \ + ((HTLB_AREA_MASK((addr), (len)) | (zonemask)) == (zonemask)) +#define within_hugepage_high_range(addr, len) \ + __within_hugepage_high_range((addr), (len), \ + current->mm->context.high_htlb_areas) + +#define is_hugepage_only_range(mm, addr, len) \ + (touches_hugepage_high_range((mm), (addr), (len)) || \ + touches_hugepage_low_range((mm), (addr), (len))) +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA + +#define in_hugepage_area(context, addr) \ + (cpu_has_feature(CPU_FTR_16M_PAGE) && \ + ( ((1 << GET_HTLB_AREA(addr)) & (context).high_htlb_areas) || \ + ( ((addr) < 0x100000000L) && \ + ((1 << GET_ESID(addr)) & (context).low_htlb_areas) ) ) ) + +#else /* !CONFIG_HUGETLB_PAGE */ + +#define in_hugepage_area(mm, addr) 0 + +#endif /* !CONFIG_HUGETLB_PAGE */ + +#ifdef MODULE +#define __page_aligned __attribute__((__aligned__(PAGE_SIZE))) +#else +#define __page_aligned \ + __attribute__((__aligned__(PAGE_SIZE), \ + __section__(".data.page_aligned"))) +#endif + +#define VM_DATA_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) + +/* + * This is the default if a program doesn't have a PT_GNU_STACK + * program header entry. 
The PPC64 ELF ABI has a non executable stack + * stack by default, so in the absense of a PT_GNU_STACK program header + * we turn execute permission off. + */ +#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) + +#include + +#endif /* _ASM_POWERPC_PAGE_64_H */ Index: kexec/include/asm-ppc/page.h =================================================================== --- kexec.orig/include/asm-ppc/page.h +++ /dev/null @@ -1,173 +0,0 @@ -#ifndef _PPC_PAGE_H -#define _PPC_PAGE_H - -/* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 12 -#define PAGE_SIZE (1UL << PAGE_SHIFT) - -/* - * Subtle: this is an int (not an unsigned long) and so it - * gets extended to 64 bits the way want (i.e. with 1s). -- paulus - */ -#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) - -#ifdef __KERNEL__ -#include - -/* This must match what is in arch/ppc/Makefile */ -#define PAGE_OFFSET CONFIG_KERNEL_START -#define KERNELBASE PAGE_OFFSET - -#ifndef __ASSEMBLY__ - -/* - * The basic type of a PTE - 64 bits for those CPUs with > 32 bit - * physical addressing. For now this just the IBM PPC440. 
- */ -#ifdef CONFIG_PTE_64BIT -typedef unsigned long long pte_basic_t; -#define PTE_SHIFT (PAGE_SHIFT - 3) /* 512 ptes per page */ -#define PTE_FMT "%16Lx" -#else -typedef unsigned long pte_basic_t; -#define PTE_SHIFT (PAGE_SHIFT - 2) /* 1024 ptes per page */ -#define PTE_FMT "%.8lx" -#endif - -/* align addr on a size boundary - adjust address up/down if needed */ -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) -#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) - -/* align addr on a size boundary - adjust address up if needed */ -#define _ALIGN(addr,size) _ALIGN_UP(addr,size) - -/* to align the pointer to the (next) page boundary */ -#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) - - -#undef STRICT_MM_TYPECHECKS - -#ifdef STRICT_MM_TYPECHECKS -/* - * These are used to make use of C type-checking.. - */ -typedef struct { pte_basic_t pte; } pte_t; -typedef struct { unsigned long pmd; } pmd_t; -typedef struct { unsigned long pgd; } pgd_t; -typedef struct { unsigned long pgprot; } pgprot_t; - -#define pte_val(x) ((x).pte) -#define pmd_val(x) ((x).pmd) -#define pgd_val(x) ((x).pgd) -#define pgprot_val(x) ((x).pgprot) - -#define __pte(x) ((pte_t) { (x) } ) -#define __pmd(x) ((pmd_t) { (x) } ) -#define __pgd(x) ((pgd_t) { (x) } ) -#define __pgprot(x) ((pgprot_t) { (x) } ) - -#else -/* - * .. 
while these make it easier on the compiler - */ -typedef pte_basic_t pte_t; -typedef unsigned long pmd_t; -typedef unsigned long pgd_t; -typedef unsigned long pgprot_t; - -#define pte_val(x) (x) -#define pmd_val(x) (x) -#define pgd_val(x) (x) -#define pgprot_val(x) (x) - -#define __pte(x) (x) -#define __pmd(x) (x) -#define __pgd(x) (x) -#define __pgprot(x) (x) - -#endif - -struct page; -extern void clear_pages(void *page, int order); -static inline void clear_page(void *page) { clear_pages(page, 0); } -extern void copy_page(void *to, void *from); -extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg); -extern void copy_user_page(void *to, void *from, unsigned long vaddr, - struct page *pg); - -#ifndef CONFIG_APUS -#define PPC_MEMSTART 0 -#define PPC_PGSTART 0 -#define PPC_MEMOFFSET PAGE_OFFSET -#else -extern unsigned long ppc_memstart; -extern unsigned long ppc_pgstart; -extern unsigned long ppc_memoffset; -#define PPC_MEMSTART ppc_memstart -#define PPC_PGSTART ppc_pgstart -#define PPC_MEMOFFSET ppc_memoffset -#endif - -#if defined(CONFIG_APUS) && !defined(MODULE) -/* map phys->virtual and virtual->phys for RAM pages */ -static inline unsigned long ___pa(unsigned long v) -{ - unsigned long p; - asm volatile ("1: addis %0, %1, %2;" - ".section \".vtop_fixup\",\"aw\";" - ".align 1;" - ".long 1b;" - ".previous;" - : "=r" (p) - : "b" (v), "K" (((-PAGE_OFFSET) >> 16) & 0xffff)); - - return p; -} -static inline void* ___va(unsigned long p) -{ - unsigned long v; - asm volatile ("1: addis %0, %1, %2;" - ".section \".ptov_fixup\",\"aw\";" - ".align 1;" - ".long 1b;" - ".previous;" - : "=r" (v) - : "b" (p), "K" (((PAGE_OFFSET) >> 16) & 0xffff)); - - return (void*) v; -} -#else -#define ___pa(vaddr) ((vaddr)-PPC_MEMOFFSET) -#define ___va(paddr) ((paddr)+PPC_MEMOFFSET) -#endif - -extern int page_is_ram(unsigned long pfn); - -#define __pa(x) ___pa((unsigned long)(x)) -#define __va(x) ((void *)(___va((unsigned long)(x)))) - -#define pfn_to_page(pfn) 
(mem_map + ((pfn) - PPC_PGSTART)) -#define page_to_pfn(page) ((unsigned long)((page) - mem_map) + PPC_PGSTART) -#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) -#define page_to_virt(page) __va(page_to_pfn(page) << PAGE_SHIFT) - -#define pfn_valid(pfn) (((pfn) - PPC_PGSTART) < max_mapnr) -#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) - -/* Pure 2^n version of get_order */ -extern __inline__ int get_order(unsigned long size) -{ - int lz; - - size = (size-1) >> PAGE_SHIFT; - asm ("cntlzw %0,%1" : "=r" (lz) : "r" (size)); - return 32 - lz; -} - -#endif /* __ASSEMBLY__ */ - -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#endif /* __KERNEL__ */ -#endif /* _PPC_PAGE_H */ Index: kexec/include/asm-ppc64/page.h =================================================================== --- kexec.orig/include/asm-ppc64/page.h +++ /dev/null @@ -1,333 +0,0 @@ -#ifndef _PPC64_PAGE_H -#define _PPC64_PAGE_H - -/* - * Copyright (C) 2001 PPC64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include -#include /* for ASM_CONST */ - -/* - * We support either 4k or 64k software page size. When using 64k pages - * however, wether we are really supporting 64k pages in HW or not is - * irrelevant to those definitions. We always define HW_PAGE_SHIFT to 12 - * as use of 64k pages remains a linux kernel specific, every notion of - * page number shared with the firmware, TCEs, iommu, etc... still assumes - * a page size of 4096. 
- */ -#ifdef CONFIG_PPC_64K_PAGES -#define PAGE_SHIFT 16 -#else -#define PAGE_SHIFT 12 -#endif - -#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) -#define PAGE_MASK (~(PAGE_SIZE-1)) - -/* HW_PAGE_SHIFT is always 4k pages */ -#define HW_PAGE_SHIFT 12 -#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) -#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) - -/* PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and - * HW_PAGE_SHIFT, that is 4k pages - */ -#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) - -/* Segment size */ -#define SID_SHIFT 28 -#define SID_MASK 0xfffffffffUL -#define ESID_MASK 0xfffffffff0000000UL -#define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) - -/* Large pages size */ - -#ifndef __ASSEMBLY__ -extern unsigned int HPAGE_SHIFT; -#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) -#define HPAGE_MASK (~(HPAGE_SIZE - 1)) -#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) -#endif /* __ASSEMBLY__ */ - -#ifdef CONFIG_HUGETLB_PAGE - - -#define HTLB_AREA_SHIFT 40 -#define HTLB_AREA_SIZE (1UL << HTLB_AREA_SHIFT) -#define GET_HTLB_AREA(x) ((x) >> HTLB_AREA_SHIFT) - -#define LOW_ESID_MASK(addr, len) (((1U << (GET_ESID(addr+len-1)+1)) \ - - (1U << GET_ESID(addr))) & 0xffff) -#define HTLB_AREA_MASK(addr, len) (((1U << (GET_HTLB_AREA(addr+len-1)+1)) \ - - (1U << GET_HTLB_AREA(addr))) & 0xffff) - -#define ARCH_HAS_HUGEPAGE_ONLY_RANGE -#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE -#define ARCH_HAS_SETCLEAR_HUGE_PTE - -#define touches_hugepage_low_range(mm, addr, len) \ - (LOW_ESID_MASK((addr), (len)) & (mm)->context.low_htlb_areas) -#define touches_hugepage_high_range(mm, addr, len) \ - (HTLB_AREA_MASK((addr), (len)) & (mm)->context.high_htlb_areas) - -#define __within_hugepage_low_range(addr, len, segmask) \ - ((LOW_ESID_MASK((addr), (len)) | (segmask)) == (segmask)) -#define within_hugepage_low_range(addr, len) \ - __within_hugepage_low_range((addr), (len), \ - current->mm->context.low_htlb_areas) -#define __within_hugepage_high_range(addr, len, zonemask) \ - 
((HTLB_AREA_MASK((addr), (len)) | (zonemask)) == (zonemask)) -#define within_hugepage_high_range(addr, len) \ - __within_hugepage_high_range((addr), (len), \ - current->mm->context.high_htlb_areas) - -#define is_hugepage_only_range(mm, addr, len) \ - (touches_hugepage_high_range((mm), (addr), (len)) || \ - touches_hugepage_low_range((mm), (addr), (len))) -#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA - -#define in_hugepage_area(context, addr) \ - (cpu_has_feature(CPU_FTR_16M_PAGE) && \ - ( ((1 << GET_HTLB_AREA(addr)) & (context).high_htlb_areas) || \ - ( ((addr) < 0x100000000L) && \ - ((1 << GET_ESID(addr)) & (context).low_htlb_areas) ) ) ) - -#else /* !CONFIG_HUGETLB_PAGE */ - -#define in_hugepage_area(mm, addr) 0 - -#endif /* !CONFIG_HUGETLB_PAGE */ - -/* align addr on a size boundary - adjust address up/down if needed */ -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) -#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) - -/* align addr on a size boundary - adjust address up if needed */ -#define _ALIGN(addr,size) _ALIGN_UP(addr,size) - -/* to align the pointer to the (next) page boundary */ -#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) - -#ifdef __KERNEL__ -#ifndef __ASSEMBLY__ -#include - -#undef STRICT_MM_TYPECHECKS - -#define REGION_SIZE 4UL -#define REGION_SHIFT 60UL -#define REGION_MASK (((1UL<> REGION_SHIFT) -#define KERNEL_REGION_ID (KERNELBASE >> REGION_SHIFT) -#define USER_REGION_ID (0UL) -#define REGION_ID(ea) (((unsigned long)(ea)) >> REGION_SHIFT) - -#define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) - -#ifdef CONFIG_DISCONTIGMEM -#define page_to_pfn(page) discontigmem_page_to_pfn(page) -#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) -#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) -#endif -#ifdef CONFIG_FLATMEM -#define pfn_to_page(pfn) (mem_map + (pfn)) -#define page_to_pfn(page) ((unsigned long)((page) - mem_map)) -#define pfn_valid(pfn) ((pfn) < max_mapnr) -#endif - -#define virt_to_page(kaddr) 
pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) -#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) - -#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) - -/* - * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, - * and needs to be executable. This means the whole heap ends - * up being executable. - */ -#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_DATA_DEFAULT_FLAGS \ - (test_thread_flag(TIF_32BIT) ? \ - VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) - -/* - * This is the default if a program doesn't have a PT_GNU_STACK - * program header entry. The PPC64 ELF ABI has a non executable stack - * stack by default, so in the absense of a PT_GNU_STACK program header - * we turn execute permission off. - */ -#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_STACK_DEFAULT_FLAGS \ - (test_thread_flag(TIF_32BIT) ? 
\ - VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) - -#endif /* __KERNEL__ */ - -#include - -#endif /* _PPC64_PAGE_H */ Index: kexec/arch/ppc64/Kconfig =================================================================== --- kexec.orig/arch/ppc64/Kconfig +++ kexec/arch/ppc64/Kconfig @@ -291,6 +291,11 @@ config ARCH_SPARSEMEM_ENABLE def_bool y depends on ARCH_DISCONTIGMEM_ENABLE +# Hack to make ARCH=ppc64 work with asm-powerpc/page.h +config KERNEL_START + hex + default "0xc000000000000000" + source "mm/Kconfig" config HAVE_ARCH_EARLY_PFN_TO_NID From anton at samba.org Fri Nov 11 13:56:16 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 11 Nov 2005 13:56:16 +1100 Subject: [PATCH] ppc64: Remove debug boot message Message-ID: <20051111025616.GC14770@krispykreme> We have been printing the raw ppc64_firmware_features during boot. Since we can work it out from the device tree, let's remove it. Signed-off-by: Anton Blanchard Index: build/arch/powerpc/platforms/pseries/setup.c =================================================================== --- build.orig/arch/powerpc/platforms/pseries/setup.c 2005-11-08 15:45:30.000000000 +1100 +++ build/arch/powerpc/platforms/pseries/setup.c 2005-11-08 15:47:03.000000000 +1100 @@ -306,9 +306,7 @@ } of_node_put(dn); - no_rtas: - printk(KERN_INFO "firmware_features = 0x%lx\n", - ppc64_firmware_features); +no_rtas: DBG(" <- fw_feature_init()\n"); } From anton at samba.org Fri Nov 11 13:53:11 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 11 Nov 2005 13:53:11 +1100 Subject: [PATCH] ppc64: Quieten lparcfg Message-ID: <20051111025310.GB14770@krispykreme> If we don't have permission to read some information from the hypervisor, lparcfg outputs a warning on the console. Now that lparcfg is world readable, this is a problem. Don't warn in the case of H_Authority, remove some unnecessary function prototypes and fix whitespace damage in a structure as well. 
Signed-off-by: Anton Blanchard --- Index: build/arch/powerpc/kernel/lparcfg.c =================================================================== --- build.orig/arch/powerpc/kernel/lparcfg.c 2005-11-11 11:36:16.000000000 +1100 +++ build/arch/powerpc/kernel/lparcfg.c 2005-11-11 11:39:23.000000000 +1100 @@ -43,7 +43,7 @@ /* #define LPARCFG_DEBUG */ /* find a better place for this function... */ -void log_plpar_hcall_return(unsigned long rc, char *tag) +static void log_plpar_hcall_return(unsigned long rc, char *tag) { if (rc == 0) /* success, return */ return; @@ -213,11 +213,10 @@ unsigned long dummy; rc = plpar_hcall(H_PIC, 0, 0, 0, 0, pool_idle_time, num_procs, &dummy); - log_plpar_hcall_return(rc, "H_PIC"); + if (rc != H_Authority) + log_plpar_hcall_return(rc, "H_PIC"); } -static unsigned long get_purr(void); - /* Track sum of all purrs across all processors. This is used to further */ /* calculate usage values by different applications */ @@ -319,8 +318,6 @@ kfree(local_buffer); } -static int lparcfg_count_active_processors(void); - /* Return the number of processors in the system. * This function reads through the device tree and counts * the virtual processors, this does not include threads. @@ -548,7 +545,7 @@ retval = -EIO; } - out: +out: kfree(kbuf); return retval; } @@ -561,10 +558,10 @@ } struct file_operations lparcfg_fops = { - .owner = THIS_MODULE, - .read = seq_read, - .open = lparcfg_open, - .release = single_release, + .owner = THIS_MODULE, + .read = seq_read, + .open = lparcfg_open, + .release = single_release, }; int __init lparcfg_init(void) From anton at samba.org Fri Nov 11 14:07:20 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 11 Nov 2005 14:07:20 +1100 Subject: [PATCH] ppc64: Cleanup kprobe assembly Message-ID: <20051111030720.GE14770@krispykreme> The kprobes code is doing ".previous .text". While the assembler doesn't warn at the moment (and it seems to work), it might in the future. 
Signed-off-by: Anton Blanchard --- Index: build/arch/ppc64/kernel/misc.S =================================================================== --- build.orig/arch/ppc64/kernel/misc.S 2005-11-08 15:45:30.000000000 +1100 +++ build/arch/ppc64/kernel/misc.S 2005-11-08 15:54:08.000000000 +1100 @@ -186,7 +186,8 @@ bdnz 2b isync blr - .previous .text + + .text /* * Like above, but only do the D-cache. * From michael at ellerman.id.au Fri Nov 11 14:09:16 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Fri, 11 Nov 2005 14:09:16 +1100 Subject: [PATCH] powerpc: Take 3, merge page.h In-Reply-To: <20051111020952.96F5F68704@ozlabs.org> References: <20051111020952.96F5F68704@ozlabs.org> Message-ID: <200511111409.21213.michael@ellerman.id.au> And that one conflicts with David's asm-compat.h patch, fifth time lucky perhaps? -- Michael Ellerman IBM OzLabs email: michael:ellerman.id.au inmsg: mpe:jabber.org wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051111/1fb2aaba/attachment.pgp From anton at samba.org Fri Nov 11 14:12:26 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 11 Nov 2005 14:12:26 +1100 Subject: [PATCH] ppc64: prep for NUMA sparsemem rework Message-ID: <20051111031226.GF14770@krispykreme> Remove an unused numa define and move a discontigmem specific define inside the relevant ifdef. I will submit a separate patch to remove them from other architectures, but the ppc64 patches to follow depend on this. 
Signed-off-by: Anton Blanchard --- Index: build/include/asm-ppc64/mmzone.h =================================================================== --- build.orig/include/asm-ppc64/mmzone.h 2005-11-11 11:36:17.000000000 +1100 +++ build/include/asm-ppc64/mmzone.h 2005-11-11 11:39:29.000000000 +1100 @@ -66,8 +66,6 @@ return nid; } -#define node_localnr(pfn, nid) ((pfn) - NODE_DATA(nid)->node_start_pfn) - /* * Following are macros that each numa implmentation must define. */ @@ -77,10 +75,7 @@ #ifdef CONFIG_DISCONTIGMEM -/* - * Given a kernel address, find the home node of the underlying memory. - */ -#define kvaddr_to_nid(kaddr) pa_to_nid(__pa(kaddr)) +#define node_localnr(pfn, nid) ((pfn) - NODE_DATA(nid)->node_start_pfn) #define pfn_to_nid(pfn) pa_to_nid((unsigned long)(pfn) << PAGE_SHIFT) From anton at samba.org Fri Nov 11 14:13:20 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 11 Nov 2005 14:13:20 +1100 Subject: [PATCH] ppc64: prep for NUMA sparsemem rework 2 Message-ID: <20051111031320.GG14770@krispykreme> Remove ppc64 specific version of nr_cpus_node and use the generic one provided. 
Signed-off-by: Anton Blanchard --- Index: build/include/asm-powerpc/topology.h =================================================================== --- build.orig/include/asm-powerpc/topology.h 2005-11-11 11:31:38.000000000 +1100 +++ build/include/asm-powerpc/topology.h 2005-11-11 11:40:34.000000000 +1100 @@ -37,8 +37,6 @@ #define pcibus_to_node(node) (-1) #define pcibus_to_cpumask(bus) (cpu_online_map) -#define nr_cpus_node(node) (nr_cpus_in_node[node]) - /* sched_domains SD_NODE_INIT for PPC64 machines */ #define SD_NODE_INIT (struct sched_domain) { \ .span = CPU_MASK_NONE, \ Index: build/arch/powerpc/mm/numa.c =================================================================== --- build.orig/arch/powerpc/mm/numa.c 2005-11-11 11:31:38.000000000 +1100 +++ build/arch/powerpc/mm/numa.c 2005-11-11 11:40:34.000000000 +1100 @@ -38,7 +38,6 @@ ARRAY_INITIALISER}; char *numa_memory_lookup_table; cpumask_t numa_cpumask_lookup_table[MAX_NUMNODES]; -int nr_cpus_in_node[MAX_NUMNODES] = { [0 ... (MAX_NUMNODES -1)] = 0}; struct pglist_data *node_data[MAX_NUMNODES]; bootmem_data_t __initdata plat_node_bdata[MAX_NUMNODES]; @@ -58,14 +57,12 @@ EXPORT_SYMBOL(numa_cpu_lookup_table); EXPORT_SYMBOL(numa_memory_lookup_table); EXPORT_SYMBOL(numa_cpumask_lookup_table); -EXPORT_SYMBOL(nr_cpus_in_node); static inline void map_cpu_to_node(int cpu, int node) { numa_cpu_lookup_table[cpu] = node; if (!(cpu_isset(cpu, numa_cpumask_lookup_table[node]))) { cpu_set(cpu, numa_cpumask_lookup_table[node]); - nr_cpus_in_node[node]++; } } @@ -78,7 +75,6 @@ if (cpu_isset(cpu, numa_cpumask_lookup_table[node])) { cpu_clear(cpu, numa_cpumask_lookup_table[node]); - nr_cpus_in_node[node]--; } else { printk(KERN_ERR "WARNING: cpu %lu not found in node %d\n", cpu, node); Index: build/include/asm-ppc64/mmzone.h =================================================================== --- build.orig/include/asm-ppc64/mmzone.h 2005-11-11 11:39:29.000000000 +1100 +++ build/include/asm-ppc64/mmzone.h 2005-11-11 
11:41:38.000000000 +1100 @@ -32,7 +32,6 @@ extern int numa_cpu_lookup_table[]; extern char *numa_memory_lookup_table; extern cpumask_t numa_cpumask_lookup_table[]; -extern int nr_cpus_in_node[]; #ifdef CONFIG_MEMORY_HOTPLUG extern unsigned long max_pfn; #endif From anton at samba.org Fri Nov 11 14:22:35 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 11 Nov 2005 14:22:35 +1100 Subject: [PATCH] ppc64: Convert NUMA to sparsemem (3) Message-ID: <20051111032234.GH14770@krispykreme> Convert to sparsemem and remove all the discontigmem code in the process. This has a few advantages: - The old numa_memory_lookup_table can go away - All the arch specific discontigmem magic can go away We also remove the triple pass of memory properties and instead create a list of per node extents that we iterate through. A final cleanup would be to change our lmb code to store extents per node, then we can reuse that information in the numa code. Signed-off-by: Anton Blanchard --- arch/powerpc/Kconfig | 11 - arch/powerpc/mm/numa.c | 365 +++++++++++++++++------------------------ arch/ppc64/Kconfig | 11 - include/asm-powerpc/topology.h | 10 - include/asm-ppc64/mmzone.h | 63 ------- include/asm-ppc64/page.h | 5 6 files changed, 168 insertions(+), 295 deletions(-) Index: build/arch/powerpc/mm/numa.c =================================================================== --- build.orig/arch/powerpc/mm/numa.c 2005-11-11 11:40:34.000000000 +1100 +++ build/arch/powerpc/mm/numa.c 2005-11-11 11:41:55.000000000 +1100 @@ -17,9 +17,8 @@ #include #include #include +#include #include -#include -#include #include #include @@ -28,42 +27,113 @@ static int numa_debug; #define dbg(args...) if (numa_debug) { printk(KERN_INFO args); } -#ifdef DEBUG_NUMA -#define ARRAY_INITIALISER -1 -#else -#define ARRAY_INITIALISER 0 -#endif - -int numa_cpu_lookup_table[NR_CPUS] = { [ 0 ... 
(NR_CPUS - 1)] = - ARRAY_INITIALISER}; -char *numa_memory_lookup_table; +int numa_cpu_lookup_table[NR_CPUS]; cpumask_t numa_cpumask_lookup_table[MAX_NUMNODES]; - struct pglist_data *node_data[MAX_NUMNODES]; -bootmem_data_t __initdata plat_node_bdata[MAX_NUMNODES]; + +EXPORT_SYMBOL(numa_cpu_lookup_table); +EXPORT_SYMBOL(numa_cpumask_lookup_table); +EXPORT_SYMBOL(node_data); + +static bootmem_data_t __initdata plat_node_bdata[MAX_NUMNODES]; static int min_common_depth; /* - * We need somewhere to store start/span for each node until we have + * We need somewhere to store start/end/node for each region until we have * allocated the real node_data structures. */ +#define MAX_REGIONS (MAX_LMB_REGIONS*2) static struct { - unsigned long node_start_pfn; - unsigned long node_end_pfn; - unsigned long node_present_pages; -} init_node_data[MAX_NUMNODES] __initdata; + unsigned long start_pfn; + unsigned long end_pfn; + int nid; +} init_node_data[MAX_REGIONS] __initdata; -EXPORT_SYMBOL(node_data); -EXPORT_SYMBOL(numa_cpu_lookup_table); -EXPORT_SYMBOL(numa_memory_lookup_table); -EXPORT_SYMBOL(numa_cpumask_lookup_table); +int __init early_pfn_to_nid(unsigned long pfn) +{ + unsigned int i; + + for (i = 0; init_node_data[i].end_pfn; i++) { + unsigned long start_pfn = init_node_data[i].start_pfn; + unsigned long end_pfn = init_node_data[i].end_pfn; + + if ((start_pfn <= pfn) && (pfn < end_pfn)) + return init_node_data[i].nid; + } + + return -1; +} + +void __init add_region(unsigned int nid, unsigned long start_pfn, + unsigned long pages) +{ + unsigned int i; + + dbg("add_region nid %d start_pfn 0x%lx pages 0x%lx\n", + nid, start_pfn, pages); + + for (i = 0; init_node_data[i].end_pfn; i++) { + if (init_node_data[i].nid != nid) + continue; + if (init_node_data[i].end_pfn == start_pfn) { + init_node_data[i].end_pfn += pages; + return; + } + if (init_node_data[i].start_pfn == (start_pfn + pages)) { + init_node_data[i].start_pfn -= pages; + return; + } + } + + /* + * Leave last entry NULL 
so we dont iterate off the end (we use + * entry.end_pfn to terminate the walk). + */ + if (i >= (MAX_REGIONS - 1)) { + printk(KERN_ERR "WARNING: too many memory regions in " + "numa code, truncating\n"); + return; + } + + init_node_data[i].start_pfn = start_pfn; + init_node_data[i].end_pfn = start_pfn + pages; + init_node_data[i].nid = nid; +} + +/* We assume init_node_data has no overlapping regions */ +void __init get_region(unsigned int nid, unsigned long *start_pfn, + unsigned long *end_pfn, unsigned long *pages_present) +{ + unsigned int i; + + *start_pfn = -1UL; + *end_pfn = *pages_present = 0; + + for (i = 0; init_node_data[i].end_pfn; i++) { + if (init_node_data[i].nid != nid) + continue; + + *pages_present += init_node_data[i].end_pfn - + init_node_data[i].start_pfn; + + if (init_node_data[i].start_pfn < *start_pfn) + *start_pfn = init_node_data[i].start_pfn; + + if (init_node_data[i].end_pfn > *end_pfn) + *end_pfn = init_node_data[i].end_pfn; + } + + /* We didnt find a matching region, return start/end as 0 */ + if (*start_pfn == -1UL) + start_pfn = 0; +} static inline void map_cpu_to_node(int cpu, int node) { numa_cpu_lookup_table[cpu] = node; - if (!(cpu_isset(cpu, numa_cpumask_lookup_table[node]))) { + + if (!(cpu_isset(cpu, numa_cpumask_lookup_table[node]))) cpu_set(cpu, numa_cpumask_lookup_table[node]); - } } #ifdef CONFIG_HOTPLUG_CPU @@ -82,7 +152,7 @@ } #endif /* CONFIG_HOTPLUG_CPU */ -static struct device_node * __devinit find_cpu_node(unsigned int cpu) +static struct device_node *find_cpu_node(unsigned int cpu) { unsigned int hw_cpuid = get_hard_smp_processor_id(cpu); struct device_node *cpu_node = NULL; @@ -209,7 +279,7 @@ return rc; } -static unsigned long read_n_cells(int n, unsigned int **buf) +static unsigned long __init read_n_cells(int n, unsigned int **buf) { unsigned long result = 0; @@ -291,7 +361,8 @@ * or zero. If the returned value of size is 0 the region should be * discarded as it lies wholy above the memory limit. 
*/ -static unsigned long __init numa_enforce_memory_limit(unsigned long start, unsigned long size) +static unsigned long __init numa_enforce_memory_limit(unsigned long start, + unsigned long size) { /* * We use lmb_end_of_DRAM() in here instead of memory_limit because @@ -316,8 +387,7 @@ struct device_node *cpu = NULL; struct device_node *memory = NULL; int addr_cells, size_cells; - int max_domain = 0; - long entries = lmb_end_of_DRAM() >> MEMORY_INCREMENT_SHIFT; + int max_domain; unsigned long i; if (numa_enabled == 0) { @@ -325,13 +395,6 @@ return -1; } - numa_memory_lookup_table = - (char *)abs_to_virt(lmb_alloc(entries * sizeof(char), 1)); - memset(numa_memory_lookup_table, 0, entries * sizeof(char)); - - for (i = 0; i < entries ; i++) - numa_memory_lookup_table[i] = ARRAY_INITIALISER; - min_common_depth = find_min_common_depth(); dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth); @@ -383,9 +446,6 @@ start = read_n_cells(addr_cells, &memcell_buf); size = read_n_cells(size_cells, &memcell_buf); - start = _ALIGN_DOWN(start, MEMORY_INCREMENT); - size = _ALIGN_UP(size, MEMORY_INCREMENT); - numa_domain = of_node_numa_domain(memory); if (numa_domain >= MAX_NUMNODES) { @@ -399,44 +459,15 @@ if (max_domain < numa_domain) max_domain = numa_domain; - if (! (size = numa_enforce_memory_limit(start, size))) { + if (!(size = numa_enforce_memory_limit(start, size))) { if (--ranges) goto new_range; else continue; } - /* - * Initialize new node struct, or add to an existing one. 
- */ - if (init_node_data[numa_domain].node_end_pfn) { - if ((start / PAGE_SIZE) < - init_node_data[numa_domain].node_start_pfn) - init_node_data[numa_domain].node_start_pfn = - start / PAGE_SIZE; - if (((start / PAGE_SIZE) + (size / PAGE_SIZE)) > - init_node_data[numa_domain].node_end_pfn) - init_node_data[numa_domain].node_end_pfn = - (start / PAGE_SIZE) + - (size / PAGE_SIZE); - - init_node_data[numa_domain].node_present_pages += - size / PAGE_SIZE; - } else { - node_set_online(numa_domain); - - init_node_data[numa_domain].node_start_pfn = - start / PAGE_SIZE; - init_node_data[numa_domain].node_end_pfn = - init_node_data[numa_domain].node_start_pfn + - size / PAGE_SIZE; - init_node_data[numa_domain].node_present_pages = - size / PAGE_SIZE; - } - - for (i = start ; i < (start+size); i += MEMORY_INCREMENT) - numa_memory_lookup_table[i >> MEMORY_INCREMENT_SHIFT] = - numa_domain; + add_region(numa_domain, start >> PAGE_SHIFT, + size >> PAGE_SHIFT); if (--ranges) goto new_range; @@ -452,32 +483,15 @@ { unsigned long top_of_ram = lmb_end_of_DRAM(); unsigned long total_ram = lmb_phys_mem_size(); - unsigned long i; printk(KERN_INFO "Top of RAM: 0x%lx, Total RAM: 0x%lx\n", top_of_ram, total_ram); printk(KERN_INFO "Memory hole size: %ldMB\n", (top_of_ram - total_ram) >> 20); - if (!numa_memory_lookup_table) { - long entries = top_of_ram >> MEMORY_INCREMENT_SHIFT; - numa_memory_lookup_table = - (char *)abs_to_virt(lmb_alloc(entries * sizeof(char), 1)); - memset(numa_memory_lookup_table, 0, entries * sizeof(char)); - for (i = 0; i < entries ; i++) - numa_memory_lookup_table[i] = ARRAY_INITIALISER; - } - map_cpu_to_node(boot_cpuid, 0); - + add_region(0, 0, lmb_end_of_DRAM() >> PAGE_SHIFT); node_set_online(0); - - init_node_data[0].node_start_pfn = 0; - init_node_data[0].node_end_pfn = lmb_end_of_DRAM() / PAGE_SIZE; - init_node_data[0].node_present_pages = total_ram / PAGE_SIZE; - - for (i = 0 ; i < top_of_ram; i += MEMORY_INCREMENT) - numa_memory_lookup_table[i >> 
MEMORY_INCREMENT_SHIFT] = 0; } static void __init dump_numa_topology(void) @@ -495,8 +509,9 @@ count = 0; - for (i = 0; i < lmb_end_of_DRAM(); i += MEMORY_INCREMENT) { - if (numa_memory_lookup_table[i >> MEMORY_INCREMENT_SHIFT] == node) { + for (i = 0; i < lmb_end_of_DRAM(); + i += (1 << SECTION_SIZE_BITS)) { + if (early_pfn_to_nid(i >> PAGE_SHIFT) == node) { if (count == 0) printk(" 0x%lx", i); ++count; @@ -521,10 +536,12 @@ * * Returns the physical address of the memory. */ -static unsigned long careful_allocation(int nid, unsigned long size, - unsigned long align, unsigned long end) +static void __init *careful_allocation(int nid, unsigned long size, + unsigned long align, + unsigned long end_pfn) { - unsigned long ret = lmb_alloc_base(size, align, end); + int new_nid; + unsigned long ret = lmb_alloc_base(size, align, end_pfn << PAGE_SHIFT); /* retry over all memory */ if (!ret) @@ -538,28 +555,27 @@ * If the memory came from a previously allocated node, we must * retry with the bootmem allocator. */ - if (pa_to_nid(ret) < nid) { - nid = pa_to_nid(ret); - ret = (unsigned long)__alloc_bootmem_node(NODE_DATA(nid), + new_nid = early_pfn_to_nid(ret >> PAGE_SHIFT); + if (new_nid < nid) { + ret = (unsigned long)__alloc_bootmem_node(NODE_DATA(new_nid), size, align, 0); if (!ret) panic("numa.c: cannot allocate %lu bytes on node %d", - size, nid); + size, new_nid); - ret = virt_to_abs(ret); + ret = __pa(ret); dbg("alloc_bootmem %lx %lx\n", ret, size); } - return ret; + return (void *)ret; } void __init do_init_bootmem(void) { int nid; - int addr_cells, size_cells; - struct device_node *memory = NULL; + unsigned int i; static struct notifier_block ppc64_numa_nb = { .notifier_call = cpu_numa_callback, .priority = 1 /* Must run before sched domains notifier. 
*/ @@ -577,99 +593,66 @@ register_cpu_notifier(&ppc64_numa_nb); for_each_online_node(nid) { - unsigned long start_paddr, end_paddr; - int i; + unsigned long start_pfn, end_pfn, pages_present; unsigned long bootmem_paddr; unsigned long bootmap_pages; - start_paddr = init_node_data[nid].node_start_pfn * PAGE_SIZE; - end_paddr = init_node_data[nid].node_end_pfn * PAGE_SIZE; + get_region(nid, &start_pfn, &end_pfn, &pages_present); /* Allocate the node structure node local if possible */ - NODE_DATA(nid) = (struct pglist_data *)careful_allocation(nid, + NODE_DATA(nid) = careful_allocation(nid, sizeof(struct pglist_data), - SMP_CACHE_BYTES, end_paddr); - NODE_DATA(nid) = abs_to_virt(NODE_DATA(nid)); + SMP_CACHE_BYTES, end_pfn); + NODE_DATA(nid) = __va(NODE_DATA(nid)); memset(NODE_DATA(nid), 0, sizeof(struct pglist_data)); dbg("node %d\n", nid); dbg("NODE_DATA() = %p\n", NODE_DATA(nid)); NODE_DATA(nid)->bdata = &plat_node_bdata[nid]; - NODE_DATA(nid)->node_start_pfn = - init_node_data[nid].node_start_pfn; - NODE_DATA(nid)->node_spanned_pages = - end_paddr - start_paddr; + NODE_DATA(nid)->node_start_pfn = start_pfn; + NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn; if (NODE_DATA(nid)->node_spanned_pages == 0) continue; - dbg("start_paddr = %lx\n", start_paddr); - dbg("end_paddr = %lx\n", end_paddr); + dbg("start_paddr = %lx\n", start_pfn << PAGE_SHIFT); + dbg("end_paddr = %lx\n", end_pfn << PAGE_SHIFT); - bootmap_pages = bootmem_bootmap_pages((end_paddr - start_paddr) >> PAGE_SHIFT); + bootmap_pages = bootmem_bootmap_pages(end_pfn - start_pfn); + bootmem_paddr = (unsigned long)careful_allocation(nid, + bootmap_pages << PAGE_SHIFT, + PAGE_SIZE, end_pfn); + memset(__va(bootmem_paddr), 0, bootmap_pages << PAGE_SHIFT); - bootmem_paddr = careful_allocation(nid, - bootmap_pages << PAGE_SHIFT, - PAGE_SIZE, end_paddr); - memset(abs_to_virt(bootmem_paddr), 0, - bootmap_pages << PAGE_SHIFT); dbg("bootmap_paddr = %lx\n", bootmem_paddr); init_bootmem_node(NODE_DATA(nid), 
bootmem_paddr >> PAGE_SHIFT, - start_paddr >> PAGE_SHIFT, - end_paddr >> PAGE_SHIFT); - - /* - * We need to do another scan of all memory sections to - * associate memory with the correct node. - */ - addr_cells = get_mem_addr_cells(); - size_cells = get_mem_size_cells(); - memory = NULL; - while ((memory = of_find_node_by_type(memory, "memory")) != NULL) { - unsigned long mem_start, mem_size; - int numa_domain, ranges; - unsigned int *memcell_buf; - unsigned int len; - - memcell_buf = (unsigned int *)get_property(memory, "reg", &len); - if (!memcell_buf || len <= 0) - continue; + start_pfn, end_pfn); - ranges = memory->n_addrs; /* ranges in cell */ -new_range: - mem_start = read_n_cells(addr_cells, &memcell_buf); - mem_size = read_n_cells(size_cells, &memcell_buf); - if (numa_enabled) { - numa_domain = of_node_numa_domain(memory); - if (numa_domain >= MAX_NUMNODES) - numa_domain = 0; - } else - numa_domain = 0; + /* Add free regions on this node */ + for (i = 0; init_node_data[i].end_pfn; i++) { + unsigned long start, end; - if (numa_domain != nid) + if (init_node_data[i].nid != nid) continue; - mem_size = numa_enforce_memory_limit(mem_start, mem_size); - if (mem_size) { - dbg("free_bootmem %lx %lx\n", mem_start, mem_size); - free_bootmem_node(NODE_DATA(nid), mem_start, mem_size); - } + start = init_node_data[i].start_pfn << PAGE_SHIFT; + end = init_node_data[i].end_pfn << PAGE_SHIFT; - if (--ranges) /* process all ranges in cell */ - goto new_range; + dbg("free_bootmem %lx %lx\n", start, end - start); + free_bootmem_node(NODE_DATA(nid), start, end - start); } - /* - * Mark reserved regions on this node - */ + /* Mark reserved regions on this node */ for (i = 0; i < lmb.reserved.cnt; i++) { unsigned long physbase = lmb.reserved.region[i].base; unsigned long size = lmb.reserved.region[i].size; + unsigned long start_paddr = start_pfn << PAGE_SHIFT; + unsigned long end_paddr = end_pfn << PAGE_SHIFT; - if (pa_to_nid(physbase) != nid && - pa_to_nid(physbase+size-1) != 
nid) + if (early_pfn_to_nid(physbase >> PAGE_SHIFT) != nid && + early_pfn_to_nid((physbase+size-1) >> PAGE_SHIFT) != nid) continue; if (physbase < end_paddr && @@ -689,46 +672,19 @@ size); } } - /* - * This loop may look famaliar, but we have to do it again - * after marking our reserved memory to mark memory present - * for sparsemem. - */ - addr_cells = get_mem_addr_cells(); - size_cells = get_mem_size_cells(); - memory = NULL; - while ((memory = of_find_node_by_type(memory, "memory")) != NULL) { - unsigned long mem_start, mem_size; - int numa_domain, ranges; - unsigned int *memcell_buf; - unsigned int len; - memcell_buf = (unsigned int *)get_property(memory, "reg", &len); - if (!memcell_buf || len <= 0) - continue; + /* Add regions into sparsemem */ + for (i = 0; init_node_data[i].end_pfn; i++) { + unsigned long start, end; - ranges = memory->n_addrs; /* ranges in cell */ -new_range2: - mem_start = read_n_cells(addr_cells, &memcell_buf); - mem_size = read_n_cells(size_cells, &memcell_buf); - if (numa_enabled) { - numa_domain = of_node_numa_domain(memory); - if (numa_domain >= MAX_NUMNODES) - numa_domain = 0; - } else - numa_domain = 0; - - if (numa_domain != nid) + if (init_node_data[i].nid != nid) continue; - mem_size = numa_enforce_memory_limit(mem_start, mem_size); - memory_present(numa_domain, mem_start >> PAGE_SHIFT, - (mem_start + mem_size) >> PAGE_SHIFT); + start = init_node_data[i].start_pfn; + end = init_node_data[i].end_pfn; - if (--ranges) /* process all ranges in cell */ - goto new_range2; + memory_present(nid, start, end); } - } } @@ -742,21 +698,18 @@ memset(zholes_size, 0, sizeof(zholes_size)); for_each_online_node(nid) { - unsigned long start_pfn; - unsigned long end_pfn; + unsigned long start_pfn, end_pfn, pages_present; - start_pfn = init_node_data[nid].node_start_pfn; - end_pfn = init_node_data[nid].node_end_pfn; + get_region(nid, &start_pfn, &end_pfn, &pages_present); zones_size[ZONE_DMA] = end_pfn - start_pfn; - zholes_size[ZONE_DMA] = 
zones_size[ZONE_DMA] - - init_node_data[nid].node_present_pages; + zholes_size[ZONE_DMA] = zones_size[ZONE_DMA] - pages_present; dbg("free_area_init node %d %lx %lx (hole: %lx)\n", nid, zones_size[ZONE_DMA], start_pfn, zholes_size[ZONE_DMA]); - free_area_init_node(nid, NODE_DATA(nid), zones_size, - start_pfn, zholes_size); + free_area_init_node(nid, NODE_DATA(nid), zones_size, start_pfn, + zholes_size); } } Index: build/include/asm-ppc64/mmzone.h =================================================================== --- build.orig/include/asm-ppc64/mmzone.h 2005-11-11 11:41:38.000000000 +1100 +++ build/include/asm-ppc64/mmzone.h 2005-11-11 11:42:24.000000000 +1100 @@ -8,15 +8,14 @@ #define _ASM_MMZONE_H_ #include -#include -/* generic non-linear memory support: +/* + * generic non-linear memory support: * * 1) we will not split memory into more chunks than will fit into the * flags field of the struct page */ - #ifdef CONFIG_NEED_MULTIPLE_NODES extern struct pglist_data *node_data[]; @@ -30,41 +29,11 @@ */ extern int numa_cpu_lookup_table[]; -extern char *numa_memory_lookup_table; extern cpumask_t numa_cpumask_lookup_table[]; #ifdef CONFIG_MEMORY_HOTPLUG extern unsigned long max_pfn; #endif -/* 16MB regions */ -#define MEMORY_INCREMENT_SHIFT 24 -#define MEMORY_INCREMENT (1UL << MEMORY_INCREMENT_SHIFT) - -/* NUMA debugging, will not work on a DLPAR machine */ -#undef DEBUG_NUMA - -static inline int pa_to_nid(unsigned long pa) -{ - int nid; - -#ifdef CONFIG_MEMORY_HOTPLUG - /* kludge hot added sections default to node 0 */ - if (pa >= (max_pfn << PAGE_SHIFT)) - return 0; -#endif - nid = numa_memory_lookup_table[pa >> MEMORY_INCREMENT_SHIFT]; - -#ifdef DEBUG_NUMA - /* the physical address passed in is not in the map for the system */ - if (nid == -1) { - printk("bad address: %lx\n", pa); - BUG(); - } -#endif - - return nid; -} - /* * Following are macros that each numa implmentation must define. 
*/ @@ -72,36 +41,10 @@ #define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn) #define node_end_pfn(nid) (NODE_DATA(nid)->node_end_pfn) -#ifdef CONFIG_DISCONTIGMEM - -#define node_localnr(pfn, nid) ((pfn) - NODE_DATA(nid)->node_start_pfn) - -#define pfn_to_nid(pfn) pa_to_nid((unsigned long)(pfn) << PAGE_SHIFT) - -/* Written this way to avoid evaluating arguments twice */ -#define discontigmem_pfn_to_page(pfn) \ -({ \ - unsigned long __tmp = pfn; \ - (NODE_DATA(pfn_to_nid(__tmp))->node_mem_map + \ - node_localnr(__tmp, pfn_to_nid(__tmp))); \ -}) - -#define discontigmem_page_to_pfn(p) \ -({ \ - struct page *__tmp = p; \ - (((__tmp) - page_zone(__tmp)->zone_mem_map) + \ - page_zone(__tmp)->zone_start_pfn); \ -}) - -/* XXX fix for discontiguous physical memory */ -#define discontigmem_pfn_valid(pfn) ((pfn) < num_physpages) - -#endif /* CONFIG_DISCONTIGMEM */ - #endif /* CONFIG_NEED_MULTIPLE_NODES */ #ifdef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -#define early_pfn_to_nid(pfn) pa_to_nid(((unsigned long)pfn) << PAGE_SHIFT) +extern int __init early_pfn_to_nid(unsigned long pfn); #endif #endif /* _ASM_MMZONE_H_ */ Index: build/include/asm-powerpc/topology.h =================================================================== --- build.orig/include/asm-powerpc/topology.h 2005-11-11 11:40:34.000000000 +1100 +++ build/include/asm-powerpc/topology.h 2005-11-11 11:41:55.000000000 +1100 @@ -9,15 +9,7 @@ static inline int cpu_to_node(int cpu) { - int node; - - node = numa_cpu_lookup_table[cpu]; - -#ifdef DEBUG_NUMA - BUG_ON(node == -1); -#endif - - return node; + return numa_cpu_lookup_table[cpu]; } #define parent_node(node) (node) Index: build/include/asm-ppc64/page.h =================================================================== --- build.orig/include/asm-ppc64/page.h 2005-11-11 11:36:17.000000000 +1100 +++ build/include/asm-ppc64/page.h 2005-11-11 11:41:55.000000000 +1100 @@ -279,11 +279,6 @@ #define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) -#ifdef 
CONFIG_DISCONTIGMEM -#define page_to_pfn(page) discontigmem_page_to_pfn(page) -#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) -#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) -#endif #ifdef CONFIG_FLATMEM #define pfn_to_page(pfn) (mem_map + (pfn)) #define page_to_pfn(page) ((unsigned long)((page) - mem_map)) Index: build/arch/powerpc/Kconfig =================================================================== --- build.orig/arch/powerpc/Kconfig 2005-11-11 11:36:16.000000000 +1100 +++ build/arch/powerpc/Kconfig 2005-11-11 11:41:55.000000000 +1100 @@ -581,17 +581,12 @@ def_bool y depends on PPC64 && !NUMA -config ARCH_DISCONTIGMEM_ENABLE - def_bool y - depends on SMP && PPC_PSERIES - -config ARCH_DISCONTIGMEM_DEFAULT +config ARCH_SPARSEMEM_ENABLE def_bool y - depends on ARCH_DISCONTIGMEM_ENABLE -config ARCH_SPARSEMEM_ENABLE +config ARCH_SPARSEMEM_DEFAULT def_bool y - depends on ARCH_DISCONTIGMEM_ENABLE + depends on SMP && PPC_PSERIES source "mm/Kconfig" Index: build/arch/ppc64/Kconfig =================================================================== --- build.orig/arch/ppc64/Kconfig 2005-11-11 11:38:51.000000000 +1100 +++ build/arch/ppc64/Kconfig 2005-11-11 11:41:55.000000000 +1100 @@ -279,17 +279,12 @@ def_bool y depends on !NUMA -config ARCH_DISCONTIGMEM_ENABLE - def_bool y - depends on SMP && PPC_PSERIES - -config ARCH_DISCONTIGMEM_DEFAULT +config ARCH_SPARSEMEM_ENABLE def_bool y - depends on ARCH_DISCONTIGMEM_ENABLE -config ARCH_SPARSEMEM_ENABLE +config ARCH_SPARSEMEM_DEFAULT def_bool y - depends on ARCH_DISCONTIGMEM_ENABLE + depends on NUMA source "mm/Kconfig" From michael at ellerman.id.au Fri Nov 11 14:25:24 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Fri, 11 Nov 2005 14:25:24 +1100 (EST) Subject: [PATCH] powerpc: Take 5, merge page.h Message-ID: <20051111032524.5DE5E6871E@ozlabs.org> Merge asm-ppc/page.h and asm-ppc64/page.h, into asm-powerpc/page.h, asm-powerpc/page_32.h and asm-powerpc/page_64.h There's a bit of weirdness in 
page_32.h, with APUS undef'ing things. I think this is cleaner though than polluting the rest of the code with PPC_MEMOFFSET etc. Built for PPC (common_defconfig), with ARCH=powerpc, mostly built with ARCH=ppc (other things break the build). Built and booted on P5 LPAR for PPC64 with ARCH=ppc/powerpc (pseries_defconfig). Mostly built for iSeries powerpc. Signed-off-by: Michael Ellerman --- arch/ppc64/Kconfig | 5 include/asm-powerpc/page.h | 176 ++++++++++++++++++++++ include/asm-powerpc/page_32.h | 97 ++++++++++++ include/asm-powerpc/page_64.h | 177 ++++++++++++++++++++++ include/asm-ppc/page.h | 173 --------------------- include/asm-ppc64/page.h | 333 ------------------------------------------ 6 files changed, 455 insertions(+), 506 deletions(-) Index: kexec/include/asm-powerpc/page.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/page.h @@ -0,0 +1,176 @@ +#ifndef _ASM_POWERPC_PAGE_H +#define _ASM_POWERPC_PAGE_H + +/* + * Copyright (C) 2001,2005 IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifdef __KERNEL__ +#include +#include + +/* + * On PPC32 page size is 4K. For PPC64 we support either 4K or 64K software + * page size. When using 64K pages however, whether we are really supporting + * 64K pages in HW or not is irrelevant to those definitions. + */ +#ifdef CONFIG_PPC_64K_PAGES +#define PAGE_SHIFT 16 +#else +#define PAGE_SHIFT 12 +#endif + +#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) + +/* + * Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So if we + * assign PAGE_MASK to a larger type it gets extended the way we want + * (i.e. 
with 1s in the high bits) + */ +#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) + +#define PAGE_OFFSET ASM_CONST(CONFIG_KERNEL_START) +#define KERNELBASE PAGE_OFFSET + +#ifdef CONFIG_DISCONTIGMEM +#define page_to_pfn(page) discontigmem_page_to_pfn(page) +#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) +#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) +#endif + +#ifdef CONFIG_FLATMEM +#define pfn_to_page(pfn) (mem_map + (pfn)) +#define page_to_pfn(page) ((unsigned long)((page) - mem_map)) +#define pfn_valid(pfn) ((pfn) < max_mapnr) +#endif + +#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) +#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) +#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) + +#define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) +#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET) + +/* + * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, + * and needs to be executable. This means the whole heap ends + * up being executable. + */ +#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#ifdef __powerpc64__ +#include +#else +#include +#endif + +/* align addr on a size boundary - adjust address up/down if needed */ +#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) +#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) + +/* align addr on a size boundary - adjust address up if needed */ +#define _ALIGN(addr,size) _ALIGN_UP(addr,size) + +/* to align the pointer to the (next) page boundary */ +#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) + +#ifndef __ASSEMBLY__ + +#undef STRICT_MM_TYPECHECKS + +#ifdef STRICT_MM_TYPECHECKS +/* These are used to make use of C type-checking. 
*/ + +/* PTE level */ +typedef struct { pte_basic_t pte; } pte_t; +#define pte_val(x) ((x).pte) +#define __pte(x) ((pte_t) { (x) }) + +/* 64k pages additionally define a bigger "real PTE" type that gathers + * the "second half" part of the PTE for pseudo 64k pages + */ +#ifdef CONFIG_PPC_64K_PAGES +typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; +#else +typedef struct { pte_t pte; } real_pte_t; +#endif + +/* PMD level */ +typedef struct { unsigned long pmd; } pmd_t; +#define pmd_val(x) ((x).pmd) +#define __pmd(x) ((pmd_t) { (x) }) + +/* PUD level exusts only on 4k pages */ +#ifndef CONFIG_PPC_64K_PAGES +typedef struct { unsigned long pud; } pud_t; +#define pud_val(x) ((x).pud) +#define __pud(x) ((pud_t) { (x) }) +#endif + +/* PGD level */ +typedef struct { unsigned long pgd; } pgd_t; +#define pgd_val(x) ((x).pgd) +#define __pgd(x) ((pgd_t) { (x) }) + +/* Page protection bits */ +typedef struct { unsigned long pgprot; } pgprot_t; +#define pgprot_val(x) ((x).pgprot) +#define __pgprot(x) ((pgprot_t) { (x) }) + +#else + +/* + * .. 
while these make it easier on the compiler + */ + +typedef pte_basic_t pte_t; +#define pte_val(x) (x) +#define __pte(x) (x) + +#ifdef CONFIG_PPC_64K_PAGES +typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; +#else +typedef unsigned long real_pte_t; +#endif + + +typedef unsigned long pmd_t; +#define pmd_val(x) (x) +#define __pmd(x) (x) + +#ifndef CONFIG_PPC_64K_PAGES +typedef unsigned long pud_t; +#define pud_val(x) (x) +#define __pud(x) (x) +#endif + +typedef unsigned long pgd_t; +#define pgd_val(x) (x) +#define pgprot_val(x) (x) + +typedef unsigned long pgprot_t; +#define __pgd(x) (x) +#define __pgprot(x) (x) + +#endif + +struct page; +extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg); +extern void copy_user_page(void *to, void *from, unsigned long vaddr, + struct page *p); +extern int page_is_ram(unsigned long pfn); + +#endif /* __ASSEMBLY__ */ + +#endif /* __KERNEL__ */ + +#endif /* _ASM_POWERPC_PAGE_H */ Index: kexec/include/asm-powerpc/page_32.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/page_32.h @@ -0,0 +1,97 @@ +#ifndef _ASM_POWERPC_PAGE_32_H +#define _ASM_POWERPC_PAGE_32_H + +#define VM_DATA_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS32 + +#ifndef __ASSEMBLY__ +/* + * The basic type of a PTE - 64 bits for those CPUs with > 32 bit + * physical addressing. For now this just the IBM PPC440. 
+ */ +#ifdef CONFIG_PTE_64BIT +typedef unsigned long long pte_basic_t; +#define PTE_SHIFT (PAGE_SHIFT - 3) /* 512 ptes per page */ +#define PTE_FMT "%16Lx" +#else +typedef unsigned long pte_basic_t; +#define PTE_SHIFT (PAGE_SHIFT - 2) /* 1024 ptes per page */ +#define PTE_FMT "%.8lx" +#endif + +struct page; +extern void clear_pages(void *page, int order); +static inline void clear_page(void *page) { clear_pages(page, 0); } +extern void copy_page(void *to, void *from); + +/* Pure 2^n version of get_order */ +extern __inline__ int get_order(unsigned long size) +{ + int lz; + + size = (size-1) >> PAGE_SHIFT; + asm ("cntlzw %0,%1" : "=r" (lz) : "r" (size)); + return 32 - lz; +} + +#ifndef CONFIG_APUS +#define PPC_MEMSTART 0 +#else /* CONFIG_APUS */ +extern unsigned long ppc_memstart; +extern unsigned long ppc_pgstart; +extern unsigned long ppc_memoffset; +#define PPC_MEMSTART ppc_memstart + +#ifdef MODULE +#define ___pa(vaddr) ((vaddr) - ppc_memoffset) +#define ___va(paddr) ((paddr) + ppc_memoffset) +#else /* !MODULE */ +/* map phys->virtual and virtual->phys for RAM pages */ +static inline unsigned long ___pa(unsigned long v) +{ + unsigned long p; + asm volatile ("1: addis %0, %1, %2;" + ".section \".vtop_fixup\",\"aw\";" + ".align 1;" + ".long 1b;" + ".previous;" + : "=r" (p) + : "b" (v), "K" (((-PAGE_OFFSET) >> 16) & 0xffff)); + + return p; +} +static inline void* ___va(unsigned long p) +{ + unsigned long v; + asm volatile ("1: addis %0, %1, %2;" + ".section \".ptov_fixup\",\"aw\";" + ".align 1;" + ".long 1b;" + ".previous;" + : "=r" (v) + : "b" (p), "K" (((PAGE_OFFSET) >> 16) & 0xffff)); + + return (void*) v; +} +#endif + +/* APUS needs more complicated versions of these macros */ +#undef __pa +#define __pa(x) ___pa((unsigned long)(x)) + +#undef __va +#define __va(x) ((void *)(___va((unsigned long)(x)))) + +#undef pfn_to_page +#define pfn_to_page(pfn) (mem_map + ((pfn) - ppc_pgstart)) + +#undef page_to_pfn +#define page_to_pfn(page) ((unsigned long)((page) - 
mem_map) + ppc_pgstart) + +#undef pfn_valid +#define pfn_valid(pfn) (((pfn) - ppc_pgstart) < max_mapnr) + +#endif /* CONFIG_APUS */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_POWERPC_PAGE_32_H */ Index: kexec/include/asm-powerpc/page_64.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/page_64.h @@ -0,0 +1,177 @@ +#ifndef _ASM_POWERPC_PAGE_64_H +#define _ASM_POWERPC_PAGE_64_H + +/* + * Copyright (C) 2001 PPC64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +/* + * We always define HW_PAGE_SHIFT to 12 as use of 64K pages remains Linux + * specific, every notion of page number shared with the firmware, TCEs, + * iommu, etc... still uses a page size of 4K. + */ +#define HW_PAGE_SHIFT 12 +#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) +#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) + +/* + * PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and + * HW_PAGE_SHIFT, that is 4K pages. 
+ */ +#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) + +#define REGION_SIZE 4UL +#define REGION_SHIFT 60UL +#define REGION_MASK (((1UL<> REGION_SHIFT) +#define KERNEL_REGION_ID (KERNELBASE >> REGION_SHIFT) +#define USER_REGION_ID (0UL) +#define REGION_ID(ea) (((unsigned long)(ea)) >> REGION_SHIFT) + +/* Segment size */ +#define SID_SHIFT 28 +#define SID_MASK 0xfffffffffUL +#define ESID_MASK 0xfffffffff0000000UL +#define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) + +#ifndef __ASSEMBLY__ +#include + +typedef unsigned long pte_basic_t; + +static __inline__ void clear_page(void *addr) +{ + unsigned long lines, line_size; + + line_size = ppc64_caches.dline_size; + lines = ppc64_caches.dlines_per_page; + + __asm__ __volatile__( + "mtctr %1 # clear_page\n\ +1: dcbz 0,%0\n\ + add %0,%0,%3\n\ + bdnz+ 1b" + : "=r" (addr) + : "r" (lines), "0" (addr), "r" (line_size) + : "ctr", "memory"); +} + +extern void copy_4K_page(void *to, void *from); + +#ifdef CONFIG_PPC_64K_PAGES +static inline void copy_page(void *to, void *from) +{ + unsigned int i; + for (i=0; i < (1 << (PAGE_SHIFT - 12)); i++) { + copy_4K_page(to, from); + to += 4096; + from += 4096; + } +} +#else /* CONFIG_PPC_64K_PAGES */ +static inline void copy_page(void *to, void *from) +{ + copy_4K_page(to, from); +} +#endif /* CONFIG_PPC_64K_PAGES */ + +/* Log 2 of page table size */ +extern u64 ppc64_pft_size; + +/* We do define AT_SYSINFO_EHDR but don't use the gate mecanism */ +#define __HAVE_ARCH_GATE_AREA 1 + +/* Large pages size */ +extern unsigned int HPAGE_SHIFT; +#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) +#define HPAGE_MASK (~(HPAGE_SIZE - 1)) +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) + +#endif /* __ASSEMBLY__ */ + +#ifdef CONFIG_HUGETLB_PAGE + +#define HTLB_AREA_SHIFT 40 +#define HTLB_AREA_SIZE (1UL << HTLB_AREA_SHIFT) +#define GET_HTLB_AREA(x) ((x) >> HTLB_AREA_SHIFT) + +#define LOW_ESID_MASK(addr, len) (((1U << (GET_ESID(addr+len-1)+1)) \ + - (1U << GET_ESID(addr))) & 0xffff) +#define 
HTLB_AREA_MASK(addr, len) (((1U << (GET_HTLB_AREA(addr+len-1)+1)) \ + - (1U << GET_HTLB_AREA(addr))) & 0xffff) + +#define ARCH_HAS_HUGEPAGE_ONLY_RANGE +#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE +#define ARCH_HAS_SETCLEAR_HUGE_PTE + +#define touches_hugepage_low_range(mm, addr, len) \ + (LOW_ESID_MASK((addr), (len)) & (mm)->context.low_htlb_areas) +#define touches_hugepage_high_range(mm, addr, len) \ + (HTLB_AREA_MASK((addr), (len)) & (mm)->context.high_htlb_areas) + +#define __within_hugepage_low_range(addr, len, segmask) \ + ((LOW_ESID_MASK((addr), (len)) | (segmask)) == (segmask)) +#define within_hugepage_low_range(addr, len) \ + __within_hugepage_low_range((addr), (len), \ + current->mm->context.low_htlb_areas) +#define __within_hugepage_high_range(addr, len, zonemask) \ + ((HTLB_AREA_MASK((addr), (len)) | (zonemask)) == (zonemask)) +#define within_hugepage_high_range(addr, len) \ + __within_hugepage_high_range((addr), (len), \ + current->mm->context.high_htlb_areas) + +#define is_hugepage_only_range(mm, addr, len) \ + (touches_hugepage_high_range((mm), (addr), (len)) || \ + touches_hugepage_low_range((mm), (addr), (len))) +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA + +#define in_hugepage_area(context, addr) \ + (cpu_has_feature(CPU_FTR_16M_PAGE) && \ + ( ((1 << GET_HTLB_AREA(addr)) & (context).high_htlb_areas) || \ + ( ((addr) < 0x100000000L) && \ + ((1 << GET_ESID(addr)) & (context).low_htlb_areas) ) ) ) + +#else /* !CONFIG_HUGETLB_PAGE */ + +#define in_hugepage_area(mm, addr) 0 + +#endif /* !CONFIG_HUGETLB_PAGE */ + +#ifdef MODULE +#define __page_aligned __attribute__((__aligned__(PAGE_SIZE))) +#else +#define __page_aligned \ + __attribute__((__aligned__(PAGE_SIZE), \ + __section__(".data.page_aligned"))) +#endif + +#define VM_DATA_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) + +/* + * This is the default if a program doesn't have a PT_GNU_STACK + * program header entry. 
The PPC64 ELF ABI has a non executable stack + * stack by default, so in the absense of a PT_GNU_STACK program header + * we turn execute permission off. + */ +#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) + +#include + +#endif /* _ASM_POWERPC_PAGE_64_H */ Index: kexec/include/asm-ppc/page.h =================================================================== --- kexec.orig/include/asm-ppc/page.h +++ /dev/null @@ -1,173 +0,0 @@ -#ifndef _PPC_PAGE_H -#define _PPC_PAGE_H - -/* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 12 -#define PAGE_SIZE (1UL << PAGE_SHIFT) - -/* - * Subtle: this is an int (not an unsigned long) and so it - * gets extended to 64 bits the way want (i.e. with 1s). -- paulus - */ -#define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) - -#ifdef __KERNEL__ -#include - -/* This must match what is in arch/ppc/Makefile */ -#define PAGE_OFFSET CONFIG_KERNEL_START -#define KERNELBASE PAGE_OFFSET - -#ifndef __ASSEMBLY__ - -/* - * The basic type of a PTE - 64 bits for those CPUs with > 32 bit - * physical addressing. For now this just the IBM PPC440. 
- */ -#ifdef CONFIG_PTE_64BIT -typedef unsigned long long pte_basic_t; -#define PTE_SHIFT (PAGE_SHIFT - 3) /* 512 ptes per page */ -#define PTE_FMT "%16Lx" -#else -typedef unsigned long pte_basic_t; -#define PTE_SHIFT (PAGE_SHIFT - 2) /* 1024 ptes per page */ -#define PTE_FMT "%.8lx" -#endif - -/* align addr on a size boundary - adjust address up/down if needed */ -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) -#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) - -/* align addr on a size boundary - adjust address up if needed */ -#define _ALIGN(addr,size) _ALIGN_UP(addr,size) - -/* to align the pointer to the (next) page boundary */ -#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) - - -#undef STRICT_MM_TYPECHECKS - -#ifdef STRICT_MM_TYPECHECKS -/* - * These are used to make use of C type-checking.. - */ -typedef struct { pte_basic_t pte; } pte_t; -typedef struct { unsigned long pmd; } pmd_t; -typedef struct { unsigned long pgd; } pgd_t; -typedef struct { unsigned long pgprot; } pgprot_t; - -#define pte_val(x) ((x).pte) -#define pmd_val(x) ((x).pmd) -#define pgd_val(x) ((x).pgd) -#define pgprot_val(x) ((x).pgprot) - -#define __pte(x) ((pte_t) { (x) } ) -#define __pmd(x) ((pmd_t) { (x) } ) -#define __pgd(x) ((pgd_t) { (x) } ) -#define __pgprot(x) ((pgprot_t) { (x) } ) - -#else -/* - * .. 
while these make it easier on the compiler - */ -typedef pte_basic_t pte_t; -typedef unsigned long pmd_t; -typedef unsigned long pgd_t; -typedef unsigned long pgprot_t; - -#define pte_val(x) (x) -#define pmd_val(x) (x) -#define pgd_val(x) (x) -#define pgprot_val(x) (x) - -#define __pte(x) (x) -#define __pmd(x) (x) -#define __pgd(x) (x) -#define __pgprot(x) (x) - -#endif - -struct page; -extern void clear_pages(void *page, int order); -static inline void clear_page(void *page) { clear_pages(page, 0); } -extern void copy_page(void *to, void *from); -extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg); -extern void copy_user_page(void *to, void *from, unsigned long vaddr, - struct page *pg); - -#ifndef CONFIG_APUS -#define PPC_MEMSTART 0 -#define PPC_PGSTART 0 -#define PPC_MEMOFFSET PAGE_OFFSET -#else -extern unsigned long ppc_memstart; -extern unsigned long ppc_pgstart; -extern unsigned long ppc_memoffset; -#define PPC_MEMSTART ppc_memstart -#define PPC_PGSTART ppc_pgstart -#define PPC_MEMOFFSET ppc_memoffset -#endif - -#if defined(CONFIG_APUS) && !defined(MODULE) -/* map phys->virtual and virtual->phys for RAM pages */ -static inline unsigned long ___pa(unsigned long v) -{ - unsigned long p; - asm volatile ("1: addis %0, %1, %2;" - ".section \".vtop_fixup\",\"aw\";" - ".align 1;" - ".long 1b;" - ".previous;" - : "=r" (p) - : "b" (v), "K" (((-PAGE_OFFSET) >> 16) & 0xffff)); - - return p; -} -static inline void* ___va(unsigned long p) -{ - unsigned long v; - asm volatile ("1: addis %0, %1, %2;" - ".section \".ptov_fixup\",\"aw\";" - ".align 1;" - ".long 1b;" - ".previous;" - : "=r" (v) - : "b" (p), "K" (((PAGE_OFFSET) >> 16) & 0xffff)); - - return (void*) v; -} -#else -#define ___pa(vaddr) ((vaddr)-PPC_MEMOFFSET) -#define ___va(paddr) ((paddr)+PPC_MEMOFFSET) -#endif - -extern int page_is_ram(unsigned long pfn); - -#define __pa(x) ___pa((unsigned long)(x)) -#define __va(x) ((void *)(___va((unsigned long)(x)))) - -#define pfn_to_page(pfn) 
(mem_map + ((pfn) - PPC_PGSTART)) -#define page_to_pfn(page) ((unsigned long)((page) - mem_map) + PPC_PGSTART) -#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) -#define page_to_virt(page) __va(page_to_pfn(page) << PAGE_SHIFT) - -#define pfn_valid(pfn) (((pfn) - PPC_PGSTART) < max_mapnr) -#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) - -/* Pure 2^n version of get_order */ -extern __inline__ int get_order(unsigned long size) -{ - int lz; - - size = (size-1) >> PAGE_SHIFT; - asm ("cntlzw %0,%1" : "=r" (lz) : "r" (size)); - return 32 - lz; -} - -#endif /* __ASSEMBLY__ */ - -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#endif /* __KERNEL__ */ -#endif /* _PPC_PAGE_H */ Index: kexec/include/asm-ppc64/page.h =================================================================== --- kexec.orig/include/asm-ppc64/page.h +++ /dev/null @@ -1,333 +0,0 @@ -#ifndef _PPC64_PAGE_H -#define _PPC64_PAGE_H - -/* - * Copyright (C) 2001 PPC64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include -#include - -/* - * We support either 4k or 64k software page size. When using 64k pages - * however, wether we are really supporting 64k pages in HW or not is - * irrelevant to those definitions. We always define HW_PAGE_SHIFT to 12 - * as use of 64k pages remains a linux kernel specific, every notion of - * page number shared with the firmware, TCEs, iommu, etc... still assumes - * a page size of 4096. 
- */ -#ifdef CONFIG_PPC_64K_PAGES -#define PAGE_SHIFT 16 -#else -#define PAGE_SHIFT 12 -#endif - -#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) -#define PAGE_MASK (~(PAGE_SIZE-1)) - -/* HW_PAGE_SHIFT is always 4k pages */ -#define HW_PAGE_SHIFT 12 -#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) -#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) - -/* PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and - * HW_PAGE_SHIFT, that is 4k pages - */ -#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) - -/* Segment size */ -#define SID_SHIFT 28 -#define SID_MASK 0xfffffffffUL -#define ESID_MASK 0xfffffffff0000000UL -#define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) - -/* Large pages size */ - -#ifndef __ASSEMBLY__ -extern unsigned int HPAGE_SHIFT; -#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) -#define HPAGE_MASK (~(HPAGE_SIZE - 1)) -#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) -#endif /* __ASSEMBLY__ */ - -#ifdef CONFIG_HUGETLB_PAGE - - -#define HTLB_AREA_SHIFT 40 -#define HTLB_AREA_SIZE (1UL << HTLB_AREA_SHIFT) -#define GET_HTLB_AREA(x) ((x) >> HTLB_AREA_SHIFT) - -#define LOW_ESID_MASK(addr, len) (((1U << (GET_ESID(addr+len-1)+1)) \ - - (1U << GET_ESID(addr))) & 0xffff) -#define HTLB_AREA_MASK(addr, len) (((1U << (GET_HTLB_AREA(addr+len-1)+1)) \ - - (1U << GET_HTLB_AREA(addr))) & 0xffff) - -#define ARCH_HAS_HUGEPAGE_ONLY_RANGE -#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE -#define ARCH_HAS_SETCLEAR_HUGE_PTE - -#define touches_hugepage_low_range(mm, addr, len) \ - (LOW_ESID_MASK((addr), (len)) & (mm)->context.low_htlb_areas) -#define touches_hugepage_high_range(mm, addr, len) \ - (HTLB_AREA_MASK((addr), (len)) & (mm)->context.high_htlb_areas) - -#define __within_hugepage_low_range(addr, len, segmask) \ - ((LOW_ESID_MASK((addr), (len)) | (segmask)) == (segmask)) -#define within_hugepage_low_range(addr, len) \ - __within_hugepage_low_range((addr), (len), \ - current->mm->context.low_htlb_areas) -#define __within_hugepage_high_range(addr, len, zonemask) \ - 
((HTLB_AREA_MASK((addr), (len)) | (zonemask)) == (zonemask)) -#define within_hugepage_high_range(addr, len) \ - __within_hugepage_high_range((addr), (len), \ - current->mm->context.high_htlb_areas) - -#define is_hugepage_only_range(mm, addr, len) \ - (touches_hugepage_high_range((mm), (addr), (len)) || \ - touches_hugepage_low_range((mm), (addr), (len))) -#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA - -#define in_hugepage_area(context, addr) \ - (cpu_has_feature(CPU_FTR_16M_PAGE) && \ - ( ((1 << GET_HTLB_AREA(addr)) & (context).high_htlb_areas) || \ - ( ((addr) < 0x100000000L) && \ - ((1 << GET_ESID(addr)) & (context).low_htlb_areas) ) ) ) - -#else /* !CONFIG_HUGETLB_PAGE */ - -#define in_hugepage_area(mm, addr) 0 - -#endif /* !CONFIG_HUGETLB_PAGE */ - -/* align addr on a size boundary - adjust address up/down if needed */ -#define _ALIGN_UP(addr,size) (((addr)+((size)-1))&(~((size)-1))) -#define _ALIGN_DOWN(addr,size) ((addr)&(~((size)-1))) - -/* align addr on a size boundary - adjust address up if needed */ -#define _ALIGN(addr,size) _ALIGN_UP(addr,size) - -/* to align the pointer to the (next) page boundary */ -#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) - -#ifdef __KERNEL__ -#ifndef __ASSEMBLY__ -#include - -#undef STRICT_MM_TYPECHECKS - -#define REGION_SIZE 4UL -#define REGION_SHIFT 60UL -#define REGION_MASK (((1UL<> REGION_SHIFT) -#define KERNEL_REGION_ID (KERNELBASE >> REGION_SHIFT) -#define USER_REGION_ID (0UL) -#define REGION_ID(ea) (((unsigned long)(ea)) >> REGION_SHIFT) - -#define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) - -#ifdef CONFIG_DISCONTIGMEM -#define page_to_pfn(page) discontigmem_page_to_pfn(page) -#define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) -#define pfn_valid(pfn) discontigmem_pfn_valid(pfn) -#endif -#ifdef CONFIG_FLATMEM -#define pfn_to_page(pfn) (mem_map + (pfn)) -#define page_to_pfn(page) ((unsigned long)((page) - mem_map)) -#define pfn_valid(pfn) ((pfn) < max_mapnr) -#endif - -#define virt_to_page(kaddr) 
pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) -#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) - -#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) - -/* - * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, - * and needs to be executable. This means the whole heap ends - * up being executable. - */ -#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_DATA_DEFAULT_FLAGS \ - (test_thread_flag(TIF_32BIT) ? \ - VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) - -/* - * This is the default if a program doesn't have a PT_GNU_STACK - * program header entry. The PPC64 ELF ABI has a non executable stack - * stack by default, so in the absense of a PT_GNU_STACK program header - * we turn execute permission off. - */ -#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) - -#define VM_STACK_DEFAULT_FLAGS \ - (test_thread_flag(TIF_32BIT) ? 
\ - VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) - -#endif /* __KERNEL__ */ - -#include - -#endif /* _PPC64_PAGE_H */ Index: kexec/arch/ppc64/Kconfig =================================================================== --- kexec.orig/arch/ppc64/Kconfig +++ kexec/arch/ppc64/Kconfig @@ -291,6 +291,11 @@ config ARCH_SPARSEMEM_ENABLE def_bool y depends on ARCH_DISCONTIGMEM_ENABLE +# Hack to make ARCH=ppc64 work with asm-powerpc/page.h +config KERNEL_START + hex + default "0xc000000000000000" + source "mm/Kconfig" config HAVE_ARCH_EARLY_PFN_TO_NID From kravetz at us.ibm.com Fri Nov 11 14:49:17 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Thu, 10 Nov 2005 19:49:17 -0800 Subject: [PATCH] ppc64: Convert NUMA to sparsemem (3) In-Reply-To: <20051111032234.GH14770@krispykreme> References: <20051111032234.GH14770@krispykreme> Message-ID: <20051111034916.GA7169@w-mikek2.ibm.com> On Fri, Nov 11, 2005 at 02:22:35PM +1100, Anton Blanchard wrote: > Convert to sparsemem and remove all the discontigmem code in the > process. This has a few advantages: Great!!! One thing I've never been sure about are these definitions in sparsemem.h. /* * SECTION_SIZE_BITS 2^N: how big each section will be * MAX_PHYSADDR_BITS 2^N: how much physical address space we have * MAX_PHYSMEM_BITS 2^N: how much memory we can have in that space */ #define SECTION_SIZE_BITS 24 #define MAX_PHYSADDR_BITS 38 #define MAX_PHYSMEM_BITS 36 Do you know if these are sufficient for all supported hardware today? Thanks, -- Mike From anton at samba.org Fri Nov 11 15:02:03 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 11 Nov 2005 15:02:03 +1100 Subject: [PATCH] ppc64: Increase sparsemem defaults In-Reply-To: <20051111034916.GA7169@w-mikek2.ibm.com> References: <20051111032234.GH14770@krispykreme> <20051111034916.GA7169@w-mikek2.ibm.com> Message-ID: <20051111040203.GJ14770@krispykreme> > One thing I've never been sure about are these definitions in sparsemem.h. 
>
> /*
>  * SECTION_SIZE_BITS 2^N: how big each section will be
>  * MAX_PHYSADDR_BITS 2^N: how much physical address space we have
>  * MAX_PHYSMEM_BITS 2^N: how much memory we can have in that space
>  */
> #define SECTION_SIZE_BITS 24
> #define MAX_PHYSADDR_BITS 38
> #define MAX_PHYSMEM_BITS 36
>
> Do you know if these are sufficient for all supported hardware today?

Thanks for reminding me, they aren't sufficient. We currently sell
machines with 2TB of RAM, and in order to give us room for a few years'
growth let's set it to 16TB.

Signed-off-by: Anton Blanchard
---

Index: build/include/asm-powerpc/sparsemem.h
===================================================================
--- build.orig/include/asm-powerpc/sparsemem.h 2005-11-11 14:56:43.000000000 +1100
+++ build/include/asm-powerpc/sparsemem.h 2005-11-11 14:56:58.000000000 +1100
@@ -8,8 +8,8 @@
  * MAX_PHYSMEM_BITS 2^N: how much memory we can have in that space
  */
 #define SECTION_SIZE_BITS 24
-#define MAX_PHYSADDR_BITS 38
-#define MAX_PHYSMEM_BITS 36
+#define MAX_PHYSADDR_BITS 44
+#define MAX_PHYSMEM_BITS 44
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 extern void create_section_mapping(unsigned long start, unsigned long end);

From david at gibson.dropbear.id.au Fri Nov 11 15:49:46 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Fri, 11 Nov 2005 15:49:46 +1100
Subject: powerpc: Merge serial.h
Message-ID: <20051111044946.GG1821@localhost.localdomain>

This patch merges the ppc32 and (trivial) ppc64 versions of serial.h.
Mostly this is just an #ifdef merge, with the 32/64 ifdef being folded
in with the existing ifdefs for various embedded platforms from the
32-bit version.

Notable changes:
- We fold ppc32's pc_serial.h into serial.h, because there's no clear
  reason for keeping it separate.
- We abolish the SERIAL_DEV_OFFSET macro; no-one was using it.

Built for 32bit multiplatform (ARCH=powerpc), built and booted on
POWER5 LPAR (ARCH=powerpc), built for Walnut (ARCH=ppc).
Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/serial.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/serial.h 2005-11-11 14:01:13.000000000 +1100 @@ -0,0 +1,100 @@ +/* + * PowerPC serial.h + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#ifndef _ASM_POWERPC_SERIAL_H +#define _ASM_POWERPC_SERIAL_H + +#ifdef __KERNEL__ + +#include + +#if defined(CONFIG_EV64260) +#include +#elif defined(CONFIG_CHESTNUT) +#include +#elif defined(CONFIG_GEMINI) +#include +#elif defined(CONFIG_POWERPMC250) +#include +#elif defined(CONFIG_LOPEC) +#include +#elif defined(CONFIG_MVME5100) +#include +#elif defined(CONFIG_PAL4) +#include +#elif defined(CONFIG_PRPMC750) +#include +#elif defined(CONFIG_PRPMC800) +#include +#elif defined(CONFIG_SANDPOINT) +#include +#elif defined(CONFIG_SPRUCE) +#include +#elif defined(CONFIG_4xx) +#include +#elif defined(CONFIG_83xx) +#include +#elif defined(CONFIG_85xx) +#include +#elif defined(CONFIG_RADSTONE_PPC7D) +#include +#else /* Multiplatform, 32 or 64 bit */ + +/* + * This assumes you have a 1.8432 MHz clock for your UART. + * + * It'd be nice if someone built a serial card with a 24.576 MHz + * clock, since the 16550A is capable of handling a top speed of 1.5 + * megabits/second; but this requires the faster clock. + */ + +/* Default baud base if not found in device-tree */ +#define BASE_BAUD ( 1843200 / 16 ) + +#ifdef CONFIG_PPC32 +/* + * XXX Assume for now it has PC-style ISA serial ports. + * This is true for PReP and CHRP at least. + * + * This is basically a copy of include/asm-i386/serial.h. 
+ * It is used on platforms which have an ISA bus and thus are likely + * to have PC-style serial ports at the legacy I/O port addresses. + * It also includes the definitions for the fourport, accent, boca + * and hub6 multiport serial cards, although I have never heard of + * anyone using any of those on a PPC platform. -- paulus + */ + +#ifdef CONFIG_SERIAL_MANY_PORTS +#define RS_TABLE_SIZE 64 +#else +#define RS_TABLE_SIZE 4 +#endif + +/* Standard COM flags (except for COM4, because of the 8514 problem) */ +#ifdef CONFIG_SERIAL_DETECT_IRQ +#define STD_COM_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST | ASYNC_AUTO_IRQ) +#define STD_COM4_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_AUTO_IRQ) +#else +#define STD_COM_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST) +#define STD_COM4_FLAGS ASYNC_BOOT_AUTOCONF +#endif + +#define SERIAL_PORT_DFNS \ + /* UART CLK PORT IRQ FLAGS */ \ + { 0, BASE_BAUD, 0x3F8, 4, STD_COM_FLAGS }, /* ttyS0 */ \ + { 0, BASE_BAUD, 0x2F8, 3, STD_COM_FLAGS }, /* ttyS1 */ \ + { 0, BASE_BAUD, 0x3E8, 4, STD_COM_FLAGS }, /* ttyS2 */ \ + { 0, BASE_BAUD, 0x2E8, 3, STD_COM4_FLAGS }, /* ttyS3 */ + +#endif /* CONFIG_PPC32 */ + +#endif + +#endif /* __KERNEL__ */ + +#endif /* _ASM_POWERPC_SERIAL_H */ Index: working-2.6/include/asm-ppc/pc_serial.h =================================================================== --- working-2.6.orig/include/asm-ppc/pc_serial.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,43 +0,0 @@ -/* - * include/asm-ppc/pc_serial.h - * - * This is basically a copy of include/asm-i386/serial.h. - * It is used on platforms which have an ISA bus and thus are likely - * to have PC-style serial ports at the legacy I/O port addresses. - * It also includes the definitions for the fourport, accent, boca - * and hub6 multiport serial cards, although I have never heard of - * anyone using any of those on a PPC platform. -- paulus - */ - -#include - -/* - * This assumes you have a 1.8432 MHz clock for your UART. 
- * - * It'd be nice if someone built a serial card with a 24.576 MHz - * clock, since the 16550A is capable of handling a top speed of 1.5 - * megabits/second; but this requires the faster clock. - */ -#define BASE_BAUD ( 1843200 / 16 ) - -#ifdef CONFIG_SERIAL_MANY_PORTS -#define RS_TABLE_SIZE 64 -#else -#define RS_TABLE_SIZE 4 -#endif - -/* Standard COM flags (except for COM4, because of the 8514 problem) */ -#ifdef CONFIG_SERIAL_DETECT_IRQ -#define STD_COM_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST | ASYNC_AUTO_IRQ) -#define STD_COM4_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_AUTO_IRQ) -#else -#define STD_COM_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST) -#define STD_COM4_FLAGS ASYNC_BOOT_AUTOCONF -#endif - -#define SERIAL_PORT_DFNS \ - /* UART CLK PORT IRQ FLAGS */ \ - { 0, BASE_BAUD, 0x3F8, 4, STD_COM_FLAGS }, /* ttyS0 */ \ - { 0, BASE_BAUD, 0x2F8, 3, STD_COM_FLAGS }, /* ttyS1 */ \ - { 0, BASE_BAUD, 0x3E8, 4, STD_COM_FLAGS }, /* ttyS2 */ \ - { 0, BASE_BAUD, 0x2E8, 3, STD_COM4_FLAGS }, /* ttyS3 */ Index: working-2.6/include/asm-ppc/serial.h =================================================================== --- working-2.6.orig/include/asm-ppc/serial.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,55 +0,0 @@ -/* - * include/asm-ppc/serial.h - */ - -#ifdef __KERNEL__ -#ifndef __ASM_SERIAL_H__ -#define __ASM_SERIAL_H__ - -#include - -#if defined(CONFIG_EV64260) -#include -#elif defined(CONFIG_CHESTNUT) -#include -#elif defined(CONFIG_GEMINI) -#include -#elif defined(CONFIG_POWERPMC250) -#include -#elif defined(CONFIG_LOPEC) -#include -#elif defined(CONFIG_MVME5100) -#include -#elif defined(CONFIG_PAL4) -#include -#elif defined(CONFIG_PRPMC750) -#include -#elif defined(CONFIG_PRPMC800) -#include -#elif defined(CONFIG_SANDPOINT) -#include -#elif defined(CONFIG_SPRUCE) -#include -#elif defined(CONFIG_4xx) -#include -#elif defined(CONFIG_83xx) -#include -#elif defined(CONFIG_85xx) -#include -#elif defined(CONFIG_RADSTONE_PPC7D) 
-#include -#else - -/* - * XXX Assume for now it has PC-style ISA serial ports. - * This is true for PReP and CHRP at least. - */ -#include - -#if defined(CONFIG_MAC_SERIAL) -#define SERIAL_DEV_OFFSET ((_machine == _MACH_prep || _machine == _MACH_chrp) ? 0 : 2) -#endif - -#endif /* !CONFIG_GEMINI and others */ -#endif /* __ASM_SERIAL_H__ */ -#endif /* __KERNEL__ */ Index: working-2.6/include/asm-ppc64/serial.h =================================================================== --- working-2.6.orig/include/asm-ppc64/serial.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,23 +0,0 @@ -/* - * include/asm-ppc64/serial.h - */ -#ifndef _PPC64_SERIAL_H -#define _PPC64_SERIAL_H - -/* - * This assumes you have a 1.8432 MHz clock for your UART. - * - * It'd be nice if someone built a serial card with a 24.576 MHz - * clock, since the 16550A is capable of handling a top speed of 1.5 - * megabits/second; but this requires the faster clock. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -/* Default baud base if not found in device-tree */ -#define BASE_BAUD ( 1843200 / 16 ) - -#endif /* _PPC64_SERIAL_H */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson From david at gibson.dropbear.id.au Fri Nov 11 16:42:12 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Fri, 11 Nov 2005 16:42:12 +1100 Subject: powerpc: Move udbg code to arch/powerpc Message-ID: <20051111054212.GI1821@localhost.localdomain> Since the udbg code in ppc64 has no ppc32 equivalent, move it straight over into arch/powerpc (and include/asm-powerpc for udbg.h). 
In time, we probably want to meld the various bits and pieces of 32-bit early debugging code into udbg, but for now only include it on CONFIG_PPC64=y builds. The only change during the move is to standardise the protecting #ifdef/#define in udbg.h, and move its banner comment above the initial #ifdef (which seems to be normal practice). Built and booted on POWER5 LPAR (ARCH=powerpc and ARCH=ppc64). Built for 32bit multiplatform (ARCH=powerpc). Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/kernel/Makefile =================================================================== --- working-2.6.orig/arch/powerpc/kernel/Makefile 2005-11-10 15:39:54.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/Makefile 2005-11-11 16:12:58.000000000 +1100 @@ -16,7 +16,7 @@ obj-$(CONFIG_PPC64) += setup_64.o binfmt_elf32.o sys_ppc32.o \ signal_64.o ptrace32.o systbl.o \ paca.o ioctl32.o cpu_setup_power4.o \ - firmware.o sysfs.o + firmware.o sysfs.o udbg.o obj-$(CONFIG_ALTIVEC) += vecemu.o vector.o obj-$(CONFIG_POWER4) += idle_power4.o obj-$(CONFIG_PPC_OF) += of_device.o @@ -29,6 +29,10 @@ obj-$(CONFIG_LPARCFG) += lparcfg.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_GENERIC_TBSYNC) += smp-tbsync.o +obj-$(CONFIG_PPC_PSERIES) += udbg_16550.o +obj-$(CONFIG_PPC_MAPLE) += udbg_16550.o +udbgscc-$(CONFIG_PPC64) := udbg_scc.o +obj-$(CONFIG_PPC_PMAC) += $(udbgscc-y) ifeq ($(CONFIG_PPC_MERGE),y) Index: working-2.6/arch/powerpc/kernel/udbg.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/kernel/udbg.c 2005-11-11 16:08:53.000000000 +1100 @@ -0,0 +1,125 @@ +/* + * polling mode stateless debugging stuff, originally for NS16550 Serial Ports + * + * c 2001 PPC 64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the 
License, or (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include + +void (*udbg_putc)(unsigned char c); +unsigned char (*udbg_getc)(void); +int (*udbg_getc_poll)(void); + +/* udbg library, used by xmon et al */ +void udbg_puts(const char *s) +{ + if (udbg_putc) { + char c; + + if (s && *s != '\0') { + while ((c = *s++) != '\0') + udbg_putc(c); + } + } +#if 0 + else { + printk("%s", s); + } +#endif +} + +int udbg_write(const char *s, int n) +{ + int remain = n; + char c; + + if (!udbg_putc) + return 0; + + if (s && *s != '\0') { + while (((c = *s++) != '\0') && (remain-- > 0)) { + udbg_putc(c); + } + } + + return n - remain; +} + +int udbg_read(char *buf, int buflen) +{ + char c, *p = buf; + int i; + + if (!udbg_getc) + return 0; + + for (i = 0; i < buflen; ++i) { + do { + c = udbg_getc(); + } while (c == 0x11 || c == 0x13); + if (c == 0) + break; + *p++ = c; + } + + return i; +} + +#define UDBG_BUFSIZE 256 +void udbg_printf(const char *fmt, ...) 
+{ + unsigned char buf[UDBG_BUFSIZE]; + va_list args; + + va_start(args, fmt); + vsnprintf(buf, UDBG_BUFSIZE, fmt, args); + udbg_puts(buf); + va_end(args); +} + +/* + * Early boot console based on udbg + */ +static void udbg_console_write(struct console *con, const char *s, + unsigned int n) +{ + udbg_write(s, n); +} + +static struct console udbg_console = { + .name = "udbg", + .write = udbg_console_write, + .flags = CON_PRINTBUFFER, + .index = -1, +}; + +static int early_console_initialized; + +void __init disable_early_printk(void) +{ + if (!early_console_initialized) + return; + unregister_console(&udbg_console); + early_console_initialized = 0; +} + +/* called by setup_system */ +void register_early_udbg_console(void) +{ + early_console_initialized = 1; + register_console(&udbg_console); +} + +#if 0 /* if you want to use this as a regular output console */ +console_initcall(register_udbg_console); +#endif Index: working-2.6/arch/powerpc/kernel/udbg_16550.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/kernel/udbg_16550.c 2005-11-11 16:08:55.000000000 +1100 @@ -0,0 +1,123 @@ +/* + * udbg for for NS16550 compatable serial ports + * + * Copyright (C) 2001-2005 PPC 64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ +#include +#include +#include +#include + +extern u8 real_readb(volatile u8 __iomem *addr); +extern void real_writeb(u8 data, volatile u8 __iomem *addr); + +struct NS16550 { + /* this struct must be packed */ + unsigned char rbr; /* 0 */ + unsigned char ier; /* 1 */ + unsigned char fcr; /* 2 */ + unsigned char lcr; /* 3 */ + unsigned char mcr; /* 4 */ + unsigned char lsr; /* 5 */ + unsigned char msr; /* 6 */ + unsigned char scr; /* 7 */ +}; + +#define thr rbr +#define iir fcr +#define dll rbr +#define dlm ier +#define dlab lcr + +#define LSR_DR 0x01 /* Data ready */ +#define LSR_OE 0x02 /* Overrun */ +#define LSR_PE 0x04 /* Parity error */ +#define LSR_FE 0x08 /* Framing error */ +#define LSR_BI 0x10 /* Break */ +#define LSR_THRE 0x20 /* Xmit holding register empty */ +#define LSR_TEMT 0x40 /* Xmitter empty */ +#define LSR_ERR 0x80 /* Error */ + +static volatile struct NS16550 __iomem *udbg_comport; + +static void udbg_550_putc(unsigned char c) +{ + if (udbg_comport) { + while ((in_8(&udbg_comport->lsr) & LSR_THRE) == 0) + /* wait for idle */; + out_8(&udbg_comport->thr, c); + if (c == '\n') + udbg_550_putc('\r'); + } +} + +static int udbg_550_getc_poll(void) +{ + if (udbg_comport) { + if ((in_8(&udbg_comport->lsr) & LSR_DR) != 0) + return in_8(&udbg_comport->rbr); + else + return -1; + } + return -1; +} + +static unsigned char udbg_550_getc(void) +{ + if (udbg_comport) { + while ((in_8(&udbg_comport->lsr) & LSR_DR) == 0) + /* wait for char */; + return in_8(&udbg_comport->rbr); + } + return 0; +} + +void udbg_init_uart(void __iomem *comport, unsigned int speed) +{ + u16 dll = speed ? 
(115200 / speed) : 12; + + if (comport) { + udbg_comport = (struct NS16550 __iomem *)comport; + out_8(&udbg_comport->lcr, 0x00); + out_8(&udbg_comport->ier, 0xff); + out_8(&udbg_comport->ier, 0x00); + out_8(&udbg_comport->lcr, 0x80); /* Access baud rate */ + out_8(&udbg_comport->dll, dll & 0xff); /* 1 = 115200, 2 = 57600, + 3 = 38400, 12 = 9600 baud */ + out_8(&udbg_comport->dlm, dll >> 8); /* dll >> 8 which should be zero + for fast rates; */ + out_8(&udbg_comport->lcr, 0x03); /* 8 data, 1 stop, no parity */ + out_8(&udbg_comport->mcr, 0x03); /* RTS/DTR */ + out_8(&udbg_comport->fcr ,0x07); /* Clear & enable FIFOs */ + udbg_putc = udbg_550_putc; + udbg_getc = udbg_550_getc; + udbg_getc_poll = udbg_550_getc_poll; + } +} + +#ifdef CONFIG_PPC_MAPLE +void udbg_maple_real_putc(unsigned char c) +{ + if (udbg_comport) { + while ((real_readb(&udbg_comport->lsr) & LSR_THRE) == 0) + /* wait for idle */; + real_writeb(c, &udbg_comport->thr); eieio(); + if (c == '\n') + udbg_maple_real_putc('\r'); + } +} + +void udbg_init_maple_realmode(void) +{ + udbg_comport = (volatile struct NS16550 __iomem *)0xf40003f8; + + udbg_putc = udbg_maple_real_putc; + udbg_getc = NULL; + udbg_getc_poll = NULL; +} +#endif /* CONFIG_PPC_MAPLE */ Index: working-2.6/arch/powerpc/kernel/udbg_scc.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/kernel/udbg_scc.c 2005-11-11 16:08:58.000000000 +1100 @@ -0,0 +1,135 @@ +/* + * udbg for for zilog scc ports as found on Apple PowerMacs + * + * Copyright (C) 2001-2005 PPC 64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ +#include +#include +#include +#include +#include +#include +#include + +extern u8 real_readb(volatile u8 __iomem *addr); +extern void real_writeb(u8 data, volatile u8 __iomem *addr); + +#define SCC_TXRDY 4 +#define SCC_RXRDY 1 + +static volatile u8 __iomem *sccc; +static volatile u8 __iomem *sccd; + +static void udbg_scc_putc(unsigned char c) +{ + if (sccc) { + while ((in_8(sccc) & SCC_TXRDY) == 0) + ; + out_8(sccd, c); + if (c == '\n') + udbg_scc_putc('\r'); + } +} + +static int udbg_scc_getc_poll(void) +{ + if (sccc) { + if ((in_8(sccc) & SCC_RXRDY) != 0) + return in_8(sccd); + else + return -1; + } + return -1; +} + +static unsigned char udbg_scc_getc(void) +{ + if (sccc) { + while ((in_8(sccc) & SCC_RXRDY) == 0) + ; + return in_8(sccd); + } + return 0; +} + +static unsigned char scc_inittab[] = { + 13, 0, /* set baud rate divisor */ + 12, 0, + 14, 1, /* baud rate gen enable, src=rtxc */ + 11, 0x50, /* clocks = br gen */ + 5, 0xea, /* tx 8 bits, assert DTR & RTS */ + 4, 0x46, /* x16 clock, 1 stop */ + 3, 0xc1, /* rx enable, 8 bits */ +}; + +void udbg_init_scc(struct device_node *np) +{ + u32 *reg; + unsigned long addr; + int i, x; + + if (np == NULL) + np = of_find_node_by_name(NULL, "escc"); + if (np == NULL || np->parent == NULL) + return; + + udbg_printf("found SCC...\n"); + /* Get address within mac-io ASIC */ + reg = (u32 *)get_property(np, "reg", NULL); + if (reg == NULL) + return; + addr = reg[0]; + udbg_printf("local addr: %lx\n", addr); + /* Get address of mac-io PCI itself */ + reg = (u32 *)get_property(np->parent, "assigned-addresses", NULL); + if (reg == NULL) + return; + addr += reg[2]; + udbg_printf("final addr: %lx\n", addr); + + /* Setup for 57600 8N1 */ + addr += 0x20; + sccc = (volatile u8 * __iomem) ioremap(addr & PAGE_MASK, PAGE_SIZE) ; + sccc += addr & ~PAGE_MASK; + sccd = sccc + 0x10; + + udbg_printf("ioremap result sccc: %p\n", sccc); + mb(); + + for (i = 20000; i != 0; --i) + x = in_8(sccc); + out_8(sccc, 0x09); /* reset A or B side 
*/ + out_8(sccc, 0xc0); + for (i = 0; i < sizeof(scc_inittab); ++i) + out_8(sccc, scc_inittab[i]); + + udbg_putc = udbg_scc_putc; + udbg_getc = udbg_scc_getc; + udbg_getc_poll = udbg_scc_getc_poll; + + udbg_puts("Hello World !\n"); +} + +static void udbg_real_scc_putc(unsigned char c) +{ + while ((real_readb(sccc) & SCC_TXRDY) == 0) + ; + real_writeb(c, sccd); + if (c == '\n') + udbg_real_scc_putc('\r'); +} + +void udbg_init_pmac_realmode(void) +{ + sccc = (volatile u8 __iomem *)0x80013020ul; + sccd = (volatile u8 __iomem *)0x80013030ul; + + udbg_putc = udbg_real_scc_putc; + udbg_getc = NULL; + udbg_getc_poll = NULL; +} Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2005-11-11 15:53:05.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/Makefile 2005-11-11 16:12:22.000000000 +1100 @@ -13,7 +13,6 @@ obj-y += idle.o dma.o \ align.o \ - udbg.o \ rtc.o \ iommu.o vdso.o obj-y += vdso32/ vdso64/ @@ -27,8 +26,6 @@ obj-$(CONFIG_PPC_MULTIPLATFORM) += prom_init.o endif -obj-$(CONFIG_PPC_PSERIES) += udbg_16550.o - obj-$(CONFIG_KEXEC) += machine_kexec.o obj-$(CONFIG_MODULES) += module.o ifneq ($(CONFIG_PPC_MERGE),y) @@ -38,10 +35,6 @@ obj-$(CONFIG_BOOTX_TEXT) += btext.o endif -obj-$(CONFIG_PPC_PMAC) += udbg_scc.o - -obj-$(CONFIG_PPC_MAPLE) += udbg_16550.o - obj-$(CONFIG_KPROBES) += kprobes.o ifneq ($(CONFIG_PPC_MERGE),y) Index: working-2.6/arch/ppc64/kernel/udbg.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/udbg.c 2005-11-08 16:10:59.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,125 +0,0 @@ -/* - * polling mode stateless debugging stuff, originally for NS16550 Serial Ports - * - * c 2001 PPC 64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software 
Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include -#include -#include -#include -#include -#include - -void (*udbg_putc)(unsigned char c); -unsigned char (*udbg_getc)(void); -int (*udbg_getc_poll)(void); - -/* udbg library, used by xmon et al */ -void udbg_puts(const char *s) -{ - if (udbg_putc) { - char c; - - if (s && *s != '\0') { - while ((c = *s++) != '\0') - udbg_putc(c); - } - } -#if 0 - else { - printk("%s", s); - } -#endif -} - -int udbg_write(const char *s, int n) -{ - int remain = n; - char c; - - if (!udbg_putc) - return 0; - - if (s && *s != '\0') { - while (((c = *s++) != '\0') && (remain-- > 0)) { - udbg_putc(c); - } - } - - return n - remain; -} - -int udbg_read(char *buf, int buflen) -{ - char c, *p = buf; - int i; - - if (!udbg_getc) - return 0; - - for (i = 0; i < buflen; ++i) { - do { - c = udbg_getc(); - } while (c == 0x11 || c == 0x13); - if (c == 0) - break; - *p++ = c; - } - - return i; -} - -#define UDBG_BUFSIZE 256 -void udbg_printf(const char *fmt, ...) 
-{ - unsigned char buf[UDBG_BUFSIZE]; - va_list args; - - va_start(args, fmt); - vsnprintf(buf, UDBG_BUFSIZE, fmt, args); - udbg_puts(buf); - va_end(args); -} - -/* - * Early boot console based on udbg - */ -static void udbg_console_write(struct console *con, const char *s, - unsigned int n) -{ - udbg_write(s, n); -} - -static struct console udbg_console = { - .name = "udbg", - .write = udbg_console_write, - .flags = CON_PRINTBUFFER, - .index = -1, -}; - -static int early_console_initialized; - -void __init disable_early_printk(void) -{ - if (!early_console_initialized) - return; - unregister_console(&udbg_console); - early_console_initialized = 0; -} - -/* called by setup_system */ -void register_early_udbg_console(void) -{ - early_console_initialized = 1; - register_console(&udbg_console); -} - -#if 0 /* if you want to use this as a regular output console */ -console_initcall(register_udbg_console); -#endif Index: working-2.6/arch/ppc64/kernel/udbg_16550.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/udbg_16550.c 2005-10-25 11:59:53.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,123 +0,0 @@ -/* - * udbg for for NS16550 compatable serial ports - * - * Copyright (C) 2001-2005 PPC 64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ -#include -#include -#include -#include - -extern u8 real_readb(volatile u8 __iomem *addr); -extern void real_writeb(u8 data, volatile u8 __iomem *addr); - -struct NS16550 { - /* this struct must be packed */ - unsigned char rbr; /* 0 */ - unsigned char ier; /* 1 */ - unsigned char fcr; /* 2 */ - unsigned char lcr; /* 3 */ - unsigned char mcr; /* 4 */ - unsigned char lsr; /* 5 */ - unsigned char msr; /* 6 */ - unsigned char scr; /* 7 */ -}; - -#define thr rbr -#define iir fcr -#define dll rbr -#define dlm ier -#define dlab lcr - -#define LSR_DR 0x01 /* Data ready */ -#define LSR_OE 0x02 /* Overrun */ -#define LSR_PE 0x04 /* Parity error */ -#define LSR_FE 0x08 /* Framing error */ -#define LSR_BI 0x10 /* Break */ -#define LSR_THRE 0x20 /* Xmit holding register empty */ -#define LSR_TEMT 0x40 /* Xmitter empty */ -#define LSR_ERR 0x80 /* Error */ - -static volatile struct NS16550 __iomem *udbg_comport; - -static void udbg_550_putc(unsigned char c) -{ - if (udbg_comport) { - while ((in_8(&udbg_comport->lsr) & LSR_THRE) == 0) - /* wait for idle */; - out_8(&udbg_comport->thr, c); - if (c == '\n') - udbg_550_putc('\r'); - } -} - -static int udbg_550_getc_poll(void) -{ - if (udbg_comport) { - if ((in_8(&udbg_comport->lsr) & LSR_DR) != 0) - return in_8(&udbg_comport->rbr); - else - return -1; - } - return -1; -} - -static unsigned char udbg_550_getc(void) -{ - if (udbg_comport) { - while ((in_8(&udbg_comport->lsr) & LSR_DR) == 0) - /* wait for char */; - return in_8(&udbg_comport->rbr); - } - return 0; -} - -void udbg_init_uart(void __iomem *comport, unsigned int speed) -{ - u16 dll = speed ? 
(115200 / speed) : 12; - - if (comport) { - udbg_comport = (struct NS16550 __iomem *)comport; - out_8(&udbg_comport->lcr, 0x00); - out_8(&udbg_comport->ier, 0xff); - out_8(&udbg_comport->ier, 0x00); - out_8(&udbg_comport->lcr, 0x80); /* Access baud rate */ - out_8(&udbg_comport->dll, dll & 0xff); /* 1 = 115200, 2 = 57600, - 3 = 38400, 12 = 9600 baud */ - out_8(&udbg_comport->dlm, dll >> 8); /* dll >> 8 which should be zero - for fast rates; */ - out_8(&udbg_comport->lcr, 0x03); /* 8 data, 1 stop, no parity */ - out_8(&udbg_comport->mcr, 0x03); /* RTS/DTR */ - out_8(&udbg_comport->fcr ,0x07); /* Clear & enable FIFOs */ - udbg_putc = udbg_550_putc; - udbg_getc = udbg_550_getc; - udbg_getc_poll = udbg_550_getc_poll; - } -} - -#ifdef CONFIG_PPC_MAPLE -void udbg_maple_real_putc(unsigned char c) -{ - if (udbg_comport) { - while ((real_readb(&udbg_comport->lsr) & LSR_THRE) == 0) - /* wait for idle */; - real_writeb(c, &udbg_comport->thr); eieio(); - if (c == '\n') - udbg_maple_real_putc('\r'); - } -} - -void udbg_init_maple_realmode(void) -{ - udbg_comport = (volatile struct NS16550 __iomem *)0xf40003f8; - - udbg_putc = udbg_maple_real_putc; - udbg_getc = NULL; - udbg_getc_poll = NULL; -} -#endif /* CONFIG_PPC_MAPLE */ Index: working-2.6/arch/ppc64/kernel/udbg_scc.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/udbg_scc.c 2005-11-08 10:57:14.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,135 +0,0 @@ -/* - * udbg for for zilog scc ports as found on Apple PowerMacs - * - * Copyright (C) 2001-2005 PPC 64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ -#include -#include -#include -#include -#include -#include -#include - -extern u8 real_readb(volatile u8 __iomem *addr); -extern void real_writeb(u8 data, volatile u8 __iomem *addr); - -#define SCC_TXRDY 4 -#define SCC_RXRDY 1 - -static volatile u8 __iomem *sccc; -static volatile u8 __iomem *sccd; - -static void udbg_scc_putc(unsigned char c) -{ - if (sccc) { - while ((in_8(sccc) & SCC_TXRDY) == 0) - ; - out_8(sccd, c); - if (c == '\n') - udbg_scc_putc('\r'); - } -} - -static int udbg_scc_getc_poll(void) -{ - if (sccc) { - if ((in_8(sccc) & SCC_RXRDY) != 0) - return in_8(sccd); - else - return -1; - } - return -1; -} - -static unsigned char udbg_scc_getc(void) -{ - if (sccc) { - while ((in_8(sccc) & SCC_RXRDY) == 0) - ; - return in_8(sccd); - } - return 0; -} - -static unsigned char scc_inittab[] = { - 13, 0, /* set baud rate divisor */ - 12, 0, - 14, 1, /* baud rate gen enable, src=rtxc */ - 11, 0x50, /* clocks = br gen */ - 5, 0xea, /* tx 8 bits, assert DTR & RTS */ - 4, 0x46, /* x16 clock, 1 stop */ - 3, 0xc1, /* rx enable, 8 bits */ -}; - -void udbg_init_scc(struct device_node *np) -{ - u32 *reg; - unsigned long addr; - int i, x; - - if (np == NULL) - np = of_find_node_by_name(NULL, "escc"); - if (np == NULL || np->parent == NULL) - return; - - udbg_printf("found SCC...\n"); - /* Get address within mac-io ASIC */ - reg = (u32 *)get_property(np, "reg", NULL); - if (reg == NULL) - return; - addr = reg[0]; - udbg_printf("local addr: %lx\n", addr); - /* Get address of mac-io PCI itself */ - reg = (u32 *)get_property(np->parent, "assigned-addresses", NULL); - if (reg == NULL) - return; - addr += reg[2]; - udbg_printf("final addr: %lx\n", addr); - - /* Setup for 57600 8N1 */ - addr += 0x20; - sccc = (volatile u8 * __iomem) ioremap(addr & PAGE_MASK, PAGE_SIZE) ; - sccc += addr & ~PAGE_MASK; - sccd = sccc + 0x10; - - udbg_printf("ioremap result sccc: %p\n", sccc); - mb(); - - for (i = 20000; i != 0; --i) - x = in_8(sccc); - out_8(sccc, 0x09); /* reset A or B side 
*/ - out_8(sccc, 0xc0); - for (i = 0; i < sizeof(scc_inittab); ++i) - out_8(sccc, scc_inittab[i]); - - udbg_putc = udbg_scc_putc; - udbg_getc = udbg_scc_getc; - udbg_getc_poll = udbg_scc_getc_poll; - - udbg_puts("Hello World !\n"); -} - -static void udbg_real_scc_putc(unsigned char c) -{ - while ((real_readb(sccc) & SCC_TXRDY) == 0) - ; - real_writeb(c, sccd); - if (c == '\n') - udbg_real_scc_putc('\r'); -} - -void udbg_init_pmac_realmode(void) -{ - sccc = (volatile u8 __iomem *)0x80013020ul; - sccd = (volatile u8 __iomem *)0x80013030ul; - - udbg_putc = udbg_real_scc_putc; - udbg_getc = NULL; - udbg_getc_poll = NULL; -} Index: working-2.6/include/asm-powerpc/udbg.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/udbg.h 2005-11-11 16:10:04.000000000 +1100 @@ -0,0 +1,31 @@ +/* + * c 2001 PPC 64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#ifndef _ASM_POWERPC_UDBG_H +#define _ASM_POWERPC_UDBG_H + +#include +#include + +extern void (*udbg_putc)(unsigned char c); +extern unsigned char (*udbg_getc)(void); +extern int (*udbg_getc_poll)(void); + +extern void udbg_puts(const char *s); +extern int udbg_write(const char *s, int n); +extern int udbg_read(char *buf, int buflen); + +extern void register_early_udbg_console(void); +extern void udbg_printf(const char *fmt, ...); + +extern void udbg_init_uart(void __iomem *comport, unsigned int speed); + +struct device_node; +extern void udbg_init_scc(struct device_node *np); +#endif /* _ASM_POWERPC_UDBG_H */ Index: working-2.6/include/asm-ppc64/udbg.h =================================================================== --- working-2.6.orig/include/asm-ppc64/udbg.h 2005-11-08 16:10:59.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,31 +0,0 @@ -#ifndef __UDBG_HDR -#define __UDBG_HDR - -#include -#include - -/* - * c 2001 PPC 64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -extern void (*udbg_putc)(unsigned char c); -extern unsigned char (*udbg_getc)(void); -extern int (*udbg_getc_poll)(void); - -extern void udbg_puts(const char *s); -extern int udbg_write(const char *s, int n); -extern int udbg_read(char *buf, int buflen); - -extern void register_early_udbg_console(void); -extern void udbg_printf(const char *fmt, ...); - -extern void udbg_init_uart(void __iomem *comport, unsigned int speed); - -struct device_node; -extern void udbg_init_scc(struct device_node *np); -#endif -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/~dgibson

From hch at lst.de Fri Nov 11 19:25:13 2005
From: hch at lst.de (Christoph Hellwig)
Date: Fri, 11 Nov 2005 09:25:13 +0100
Subject: [PATCH] powerpc: Take 3, merge page.h
In-Reply-To: <20051111010309.5943A68710@ozlabs.org>
References: <20051111010309.5943A68710@ozlabs.org>
Message-ID: <20051111082513.GA25910@lst.de>

On Fri, Nov 11, 2005 at 12:03:09PM +1100, Michael Ellerman wrote:
> Merge asm-ppc/page.h and asm-ppc64/page.h, into asm-powerpc/page.h,
> asm-powerpc/page_32.h and asm-powerpc/page_64.h
>
> There's a bit of weirdness in page_32.h, with APUS undef'ing things. I think
> this is cleaner though than polluting the rest of the code with PPC_MEMOFFSET
> etc.

I think Roman had patches to clean up various bits of the APUS mess.
Maybe you should merge them first?

From paulus at samba.org Fri Nov 11 20:26:45 2005
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 11 Nov 2005 20:26:45 +1100
Subject: [PATCH] powerpc: Take 3, merge page.h
In-Reply-To: <20051111082513.GA25910@lst.de>
References: <20051111010309.5943A68710@ozlabs.org> <20051111082513.GA25910@lst.de>
Message-ID: <17268.25557.185799.390867@cargo.ozlabs.ibm.com>

Christoph Hellwig writes:

> I think Roman had patches to clean up various bits of the APUS mess.
> Maybe you should merge them first?

I'm considering pulling out all of the hacks that are there to deal
with APUS's physical memory not starting at 0, and making the APUS
guys use the discontigmem stuff to cope with it instead. Though I
guess they would still need some hacks in the assembly code.

Paul.

From benh at kernel.crashing.org Fri Nov 11 21:15:21 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 11 Nov 2005 21:15:21 +1100
Subject: [PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel
Message-ID: <1131704121.24637.223.camel@gaston>

This patch moves the vdso's to arch/powerpc, adds support for the 32
bits vdso to the 32 bits kernel, renames systemcfg (finally !), and adds
some new (still untested) routines to both vdso's: clock_gettime() with
support for CLOCK_REALTIME and CLOCK_MONOTONIC, clock_getres() (same
clocks) and get_tbfreq() for glibc to retrieve the timebase frequency.

Tom, Steve: The implementation of get_tbfreq() I've done for 32 bits
returns a long long (r3, r4) not a long. This is such that if we ever
add support for >4GHz timebases on ppc32, the userland interface won't
have to change.

I have tested gettimeofday() using some glibc patches in both ppc32 and
ppc64 kernels using 32 bits userland (I haven't had a chance to test a
64 bits userland yet, but the implementation didn't change and was
tested earlier). I haven't tested the new functions yet.

Signed-off-by: Benjamin Herrenschmidt

Due to the size of the patch, I haven't posted it to the lists, it's
available at http://gate.crashing.org/~benh/ppc64-vdso-update.diff

Ben.

From paulus at samba.org Fri Nov 11 21:24:22 2005
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 11 Nov 2005 21:24:22 +1100
Subject: powerpc: Merge serial.h
In-Reply-To: <20051111044946.GG1821@localhost.localdomain>
References: <20051111044946.GG1821@localhost.localdomain>
Message-ID: <17268.29014.173703.647955@cargo.ozlabs.ibm.com>

David Gibson writes:

> This patch merges the ppc32 and (trivial) ppc64 versions of serial.h.
> Mostly this is just an #ifdef merge, with the 32/64 ifdef being folded
> in with the existing ifdefs for various embedded platforms from the
> 32-bit version.
> Notable changes:
> - We fold ppc32's pc_serial.h into serial.h, because there's
> no clear reason for keeping it separate.
> - We abolish the SERIAL_DEV_OFFSET macro; no-one was using it.

For include/asm-powerpc, I would rather decree that serial ports have
to be presented in the device tree, and have an almost empty serial.h,
leaving the ifdef mess in include/asm-ppc for now.

Paul.

From paulus at samba.org Fri Nov 11 22:02:11 2005
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 11 Nov 2005 22:02:11 +1100
Subject: powerpc: Move hvconsole files to drivers/char
In-Reply-To: <20051110051931.GA24111@localhost.localdomain>
References: <20051110051931.GA24111@localhost.localdomain>
Message-ID: <17268.31283.493086.39185@cargo.ozlabs.ibm.com>

David Gibson writes:

> At present the code for the pSeries hypervisor console is split
> between drivers/char and arch/ppc64/kernel for no terribly good
> reason.

It's split like that so that the bit that does the hypervisor call is
in the arch code where it should be, and so that it can be replaced by
other get/put character methods on other systems.

> Thus, this patch moves hvconsole.c and hvcserver.c from
> arch/ppc64/kernel to drivers/char. That lets us also move hvconsole.h
> and hvcserver.h from include/asm-ppc64 to drivers/char.

I think arch/powerpc/platforms/pseries sounds like a better place. If
we find more than just pSeries using these interfaces we may have to
think about an arch/powerpc/platforms/papr or something.

Paul.

From arnd at arndb.de Fri Nov 11 23:01:33 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 11 Nov 2005 13:01:33 +0100
Subject: [PATCH] ppc64: Increase sparsemem defaults
In-Reply-To: <20051111040203.GJ14770@krispykreme>
References: <20051111032234.GH14770@krispykreme> <20051111034916.GA7169@w-mikek2.ibm.com> <20051111040203.GJ14770@krispykreme>
Message-ID: <200511111301.35796.arnd@arndb.de>

On Friday 11 November 2005 05:02, Anton Blanchard wrote:
>  #define SECTION_SIZE_BITS
24
> -#define MAX_PHYSADDR_BITS       38
> -#define MAX_PHYSMEM_BITS        36
> +#define MAX_PHYSADDR_BITS       44
> +#define MAX_PHYSMEM_BITS        44

Perfect, one less problem for me. I was getting worried that I might
step on someone's toes with my patch that increases MAX_PHYSADDR_BITS
to 42 for Cell ;-)

Arnd <><

From paulus at samba.org Fri Nov 11 23:12:27 2005
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 11 Nov 2005 23:12:27 +1100
Subject: please pull the powerpc-merge.git tree
Message-ID: <17268.35499.909993.63334@cargo.ozlabs.ibm.com>

Linus,

Please do a pull from

git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge.git

to get a powerpc update. The main thing here is that the VDSO stuff
is now in arch/powerpc, meaning that both 32-bit and 64-bit kernels
export a VDSO to userland. There are also various minor fixes.

Thanks,
Paul.

Anton Blanchard:
      ppc64: Quieten lparcfg
      ppc64: Remove debug boot message
      ppc64: Cleanup kprobe assembly
      ppc64: prep for NUMA sparsemem rework
      ppc64: prep for NUMA sparsemem rework 2
      ppc64: Convert NUMA to sparsemem (3)
      ppc64: Increase sparsemem defaults

Benjamin Herrenschmidt:
      powerpc: Merge vdso's and add vdso support to 32 bits kernel

David Gibson:
      powerpc: Move udbg code to arch/powerpc

David Woodhouse:
      powerpc: remove initrd debug printk

Kumar Gala:
      ppc32: fix PQ2 PCI DMA interrupt handling

Paul Mackerras:
      powerpc: Initialize secondary CPU setup for 32-bit SMP
      powerpc: Fix reading and writing SPRs from xmon on 32-bit
      powerpc: Fix some compile problems with the VDSO stuff

 arch/powerpc/Kconfig | 11
 arch/powerpc/kernel/Makefile | 10
 arch/powerpc/kernel/asm-offsets.c | 45 +-
 arch/powerpc/kernel/head_32.S | 1
 arch/powerpc/kernel/lparcfg.c | 23 -
 arch/powerpc/kernel/paca.c | 7
 arch/powerpc/kernel/proc_ppc64.c | 4
 arch/powerpc/kernel/rtas-proc.c | 1
 arch/powerpc/kernel/setup-common.c | 8
 arch/powerpc/kernel/setup_64.c | 62 --
 arch/powerpc/kernel/signal_32.c | 12
 arch/powerpc/kernel/smp.c | 4
 arch/powerpc/kernel/sysfs.c | 1
 arch/powerpc/kernel/time.c | 40 +
 arch/powerpc/kernel/traps.c | 1
 arch/powerpc/kernel/udbg.c | 0
 arch/powerpc/kernel/udbg_16550.c | 0
 arch/powerpc/kernel/udbg_scc.c | 0
 arch/powerpc/kernel/vdso.c | 746 +++++++++++++++++++++++++++
 arch/powerpc/kernel/vdso32/Makefile | 6
 arch/powerpc/kernel/vdso32/cacheflush.S | 0
 arch/powerpc/kernel/vdso32/datapage.S | 16 +
 arch/powerpc/kernel/vdso32/gettimeofday.S | 315 +++++++++++
 arch/powerpc/kernel/vdso32/note.S | 0
 arch/powerpc/kernel/vdso32/sigtramp.S | 0
 arch/powerpc/kernel/vdso32/vdso32.lds.S | 5
 arch/powerpc/kernel/vdso32/vdso32_wrapper.S | 2
 arch/powerpc/kernel/vdso64/Makefile | 0
 arch/powerpc/kernel/vdso64/cacheflush.S | 0
 arch/powerpc/kernel/vdso64/datapage.S | 16 +
 arch/powerpc/kernel/vdso64/gettimeofday.S | 242 +++++++++
 arch/powerpc/kernel/vdso64/note.S | 0
 arch/powerpc/kernel/vdso64/sigtramp.S | 0
 arch/powerpc/kernel/vdso64/vdso64.lds.S | 5
 arch/powerpc/kernel/vdso64/vdso64_wrapper.S | 2
 arch/powerpc/mm/mem.c | 4
 arch/powerpc/mm/numa.c | 367 ++++++-------
 arch/powerpc/oprofile/op_model_power4.c | 1
 arch/powerpc/platforms/pseries/rtasd.c | 1
 arch/powerpc/platforms/pseries/setup.c | 4
 arch/powerpc/platforms/pseries/smp.c | 4
 arch/powerpc/xmon/xmon.c | 25 +
 arch/ppc/kernel/asm-offsets.c | 28 +
 arch/ppc/syslib/cpm2_pic.c | 2
 arch/ppc64/Kconfig | 11
 arch/ppc64/kernel/Makefile | 10
 arch/ppc64/kernel/misc.S | 3
 arch/ppc64/kernel/vdso32/gettimeofday.S | 140 -----
 arch/ppc64/kernel/vdso64/gettimeofday.S | 91 ---
 include/asm-powerpc/auxvec.h | 2
 include/asm-powerpc/elf.h | 10
 include/asm-powerpc/irq.h | 1
 include/asm-powerpc/processor.h | 2
 include/asm-powerpc/sparsemem.h | 4
 include/asm-powerpc/systemcfg.h | 64 --
 include/asm-powerpc/topology.h | 12
 include/asm-powerpc/udbg.h | 14 -
 include/asm-powerpc/vdso.h | 0
 include/asm-powerpc/vdso_datapage.h | 108 ++++
 include/asm-ppc/page.h | 8
 include/asm-ppc64/mmzone.h | 69 --
 include/asm-ppc64/page.h | 5
 62 files changed, 1789 insertions(+), 786
deletions(-)
 rename arch/{ppc64/kernel/udbg.c => powerpc/kernel/udbg.c} (100%)
 rename arch/{ppc64/kernel/udbg_16550.c => powerpc/kernel/udbg_16550.c} (100%)
 rename arch/{ppc64/kernel/udbg_scc.c => powerpc/kernel/udbg_scc.c} (100%)
 create mode 100644 arch/powerpc/kernel/vdso.c
 rename arch/{ppc64/kernel/vdso32/Makefile => powerpc/kernel/vdso32/Makefile} (92%)
 rename arch/{ppc64/kernel/vdso32/cacheflush.S => powerpc/kernel/vdso32/cacheflush.S} (100%)
 rename arch/{ppc64/kernel/vdso32/datapage.S => powerpc/kernel/vdso32/datapage.S} (94%)
 create mode 100644 arch/powerpc/kernel/vdso32/gettimeofday.S
 rename arch/{ppc64/kernel/vdso32/note.S => powerpc/kernel/vdso32/note.S} (100%)
 rename arch/{ppc64/kernel/vdso32/sigtramp.S => powerpc/kernel/vdso32/sigtramp.S} (100%)
 rename arch/{ppc64/kernel/vdso32/vdso32.lds.S => powerpc/kernel/vdso32/vdso32.lds.S} (97%)
 rename arch/{ppc64/kernel/vdso32/vdso32_wrapper.S => powerpc/kernel/vdso32/vdso32_wrapper.S} (86%)
 rename arch/{ppc64/kernel/vdso64/Makefile => powerpc/kernel/vdso64/Makefile} (100%)
 rename arch/{ppc64/kernel/vdso64/cacheflush.S => powerpc/kernel/vdso64/cacheflush.S} (100%)
 rename arch/{ppc64/kernel/vdso64/datapage.S => powerpc/kernel/vdso64/datapage.S} (96%)
 create mode 100644 arch/powerpc/kernel/vdso64/gettimeofday.S
 rename arch/{ppc64/kernel/vdso64/note.S => powerpc/kernel/vdso64/note.S} (100%)
 rename arch/{ppc64/kernel/vdso64/sigtramp.S => powerpc/kernel/vdso64/sigtramp.S} (100%)
 rename arch/{ppc64/kernel/vdso64/vdso64.lds.S => powerpc/kernel/vdso64/vdso64.lds.S} (98%)
 rename arch/{ppc64/kernel/vdso64/vdso64_wrapper.S => powerpc/kernel/vdso64/vdso64_wrapper.S} (86%)
 delete mode 100644 arch/ppc64/kernel/vdso32/gettimeofday.S
 delete mode 100644 arch/ppc64/kernel/vdso64/gettimeofday.S
 delete mode 100644 include/asm-powerpc/systemcfg.h
 rename include/{asm-ppc64/udbg.h => asm-powerpc/udbg.h} (80%)
 rename include/{asm-ppc64/vdso.h => asm-powerpc/vdso.h} (100%)
 create mode 100644 include/asm-powerpc/vdso_datapage.h

From
zippel at linux-m68k.org Fri Nov 11 21:55:38 2005
From: zippel at linux-m68k.org (Roman Zippel)
Date: Fri, 11 Nov 2005 11:55:38 +0100 (CET)
Subject: [PATCH] powerpc: Take 3, merge page.h
In-Reply-To: <17268.25557.185799.390867@cargo.ozlabs.ibm.com>
References: <20051111010309.5943A68710@ozlabs.org> <20051111082513.GA25910@lst.de> <17268.25557.185799.390867@cargo.ozlabs.ibm.com>
Message-ID:

Hi,

On Fri, 11 Nov 2005, Paul Mackerras wrote:

> I'm considering pulling out all of the hacks that are there to deal
> with APUS's physical memory not starting at 0, and making the APUS
> guys use the discontigmem stuff to cope with it instead.

I don't understand how this will help, discontigmem solves a completely
different problem.

bye, Roman

From michael at ellerman.id.au Sat Nov 12 00:06:05 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Sat, 12 Nov 2005 00:06:05 +1100 (EST)
Subject: [PATCH 0/8] powerpc: Kexec fixups and support for booting at 32MB
Message-ID: <1131714362.882855.591468241381.qpush@concordia>

The first two patches here are needed for kexec, one's fundamental for
getting the official kexec-tools to work, the other's a bug fix.

The rest enable booting (via kexec) to 32MB which we need for Kdump.

These are on top of my merge of page.h

cheers

From michael at ellerman.id.au Sat Nov 12 00:06:05 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Sat, 12 Nov 2005 00:06:05 +1100 (EST)
Subject: [PATCH 1/8] powerpc: Turn cpu_irq_down into kexec_cpu_down
In-Reply-To: <1131714362.882855.591468241381.qpush@concordia>
Message-ID: <20051111130605.E08B868713@ozlabs.org>

We currently have a ppc_md member called cpu_irq_down, which disables
IRQs for the cpu in question. The only caller of cpu_irq_down is the
kexec code.

On pSeries we need to do more than just tear down IRQs at kexec time,
so rename the ppc_md member to kexec_cpu_down and expand it.

The pSeries code needs to know, and other platforms might too, whether
we're doing a crash shutdown (ie.
panicking) or a regular kexec, so add a flag for that. The pSeries implementation of kexec_cpu_down does an unregister VPA call, which tells the Hypervisor to stop writing stuff into our pacas. Without this we can get weird memory corruption bugs when we kexec, caused by the Hypervisor writing into the first kernel's pacas which happens to be somewhere interesting in the second kernel's memory. Signed-off-by: Michael Ellerman --- arch/powerpc/platforms/pseries/setup.c | 26 ++++++++++++++++++++++++-- arch/ppc64/kernel/machine_kexec.c | 12 ++++++------ include/asm-powerpc/machdep.h | 4 +++- 3 files changed, 33 insertions(+), 9 deletions(-) Index: kexec/arch/powerpc/platforms/pseries/setup.c =================================================================== --- kexec.orig/arch/powerpc/platforms/pseries/setup.c +++ kexec/arch/powerpc/platforms/pseries/setup.c @@ -200,14 +200,12 @@ static void __init pSeries_setup_arch(vo if (ppc64_interrupt_controller == IC_OPEN_PIC) { ppc_md.init_IRQ = pSeries_init_mpic; ppc_md.get_irq = mpic_get_irq; - ppc_md.cpu_irq_down = mpic_teardown_this_cpu; /* Allocate the mpic now, so that find_and_init_phbs() can * fill the ISUs */ pSeries_setup_mpic(); } else { ppc_md.init_IRQ = xics_init_IRQ; ppc_md.get_irq = xics_get_irq; - ppc_md.cpu_irq_down = xics_teardown_cpu; } #ifdef CONFIG_SMP @@ -597,6 +595,27 @@ static int pSeries_pci_probe_mode(struct return PCI_PROBE_NORMAL; } +#ifdef CONFIG_KEXEC +static void pseries_kexec_cpu_down(int crash_shutdown, int secondary) +{ + /* Don't risk a hypervisor call if we're crashing */ + if (!crash_shutdown) { + unsigned long vpa = __pa(&get_paca()->lppaca); + + if (unregister_vpa(hard_smp_processor_id(), vpa)) { + printk("VPA deregistration of cpu %u (hw_cpu_id %d) " + "failed\n", smp_processor_id(), + hard_smp_processor_id()); + } + } + + if (ppc64_interrupt_controller == IC_OPEN_PIC) + mpic_teardown_this_cpu(secondary); + else + xics_teardown_cpu(secondary); +} +#endif + struct machdep_calls __initdata 
pSeries_md = { .probe = pSeries_probe, .setup_arch = pSeries_setup_arch, @@ -619,4 +638,7 @@ struct machdep_calls __initdata pSeries_ .check_legacy_ioport = pSeries_check_legacy_ioport, .system_reset_exception = pSeries_system_reset_exception, .machine_check_exception = pSeries_machine_check_exception, +#ifdef CONFIG_KEXEC + .kexec_cpu_down = pseries_kexec_cpu_down, +#endif }; Index: kexec/arch/ppc64/kernel/machine_kexec.c =================================================================== --- kexec.orig/arch/ppc64/kernel/machine_kexec.c +++ kexec/arch/ppc64/kernel/machine_kexec.c @@ -185,8 +185,8 @@ void kexec_copy_flush(struct kimage *ima */ void kexec_smp_down(void *arg) { - if (ppc_md.cpu_irq_down) - ppc_md.cpu_irq_down(1); + if (ppc_md.kexec_cpu_down) + ppc_md.kexec_cpu_down(0, 1); local_irq_disable(); kexec_smp_wait(); @@ -233,8 +233,8 @@ static void kexec_prepare_cpus(void) } /* after we tell the others to go down */ - if (ppc_md.cpu_irq_down) - ppc_md.cpu_irq_down(0); + if (ppc_md.kexec_cpu_down) + ppc_md.kexec_cpu_down(0, 0); put_cpu(); @@ -255,8 +255,8 @@ static void kexec_prepare_cpus(void) * UP to an SMP kernel. 
*/ smp_release_cpus(); - if (ppc_md.cpu_irq_down) - ppc_md.cpu_irq_down(0); + if (ppc_md.kexec_cpu_down) + ppc_md.kexec_cpu_down(0, 0); local_irq_disable(); } Index: kexec/include/asm-powerpc/machdep.h =================================================================== --- kexec.orig/include/asm-powerpc/machdep.h +++ kexec/include/asm-powerpc/machdep.h @@ -93,7 +93,9 @@ struct machdep_calls { void (*init_IRQ)(void); int (*get_irq)(struct pt_regs *); - void (*cpu_irq_down)(int secondary); +#ifdef CONFIG_KEXEC + void (*kexec_cpu_down)(int crash_shutdown, int secondary); +#endif /* PCI stuff */ /* Called after scanning the bus, before allocating resources */ From michael at ellerman.id.au Sat Nov 12 00:06:06 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Sat, 12 Nov 2005 00:06:06 +1100 (EST) Subject: [PATCH 2/8] powerpc: Export htab start/end via device tree In-Reply-To: <1131714362.882855.591468241381.qpush@concordia> Message-ID: <20051111130606.622F668714@ozlabs.org> The userspace kexec-tools need to know the location of the htab on non-lpar machines, as well as the end of the kernel. Export via the device tree. NB. This patch has been updated to use "linux,x" property names. You may need to update your kexec-tools to match. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/setup_64.c | 5 +++ arch/ppc64/kernel/machine_kexec.c | 51 ++++++++++++++++++++++++++++++++++++++ include/asm-powerpc/kexec.h | 1 3 files changed, 57 insertions(+) Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -60,6 +60,7 @@ #include #include #include +#include #include "setup.h" @@ -426,6 +427,10 @@ void __init setup_system(void) */ unflatten_device_tree(); +#ifdef CONFIG_KEXEC + kexec_setup(); /* requires unflattened device tree. 
*/ +#endif + /* * Fill the ppc64_caches & systemcfg structures with informations * retreived from the device-tree. Need to be called before Index: kexec/arch/ppc64/kernel/machine_kexec.c =================================================================== --- kexec.orig/arch/ppc64/kernel/machine_kexec.c +++ kexec/arch/ppc64/kernel/machine_kexec.c @@ -305,3 +305,54 @@ void machine_kexec(struct kimage *image) ppc_md.hpte_clear_all); /* NOTREACHED */ } + +/* Values we need to export to the second kernel via the device tree. */ +static unsigned long htab_base, htab_size, kernel_end; + +static struct property htab_base_prop = { + .name = "linux,htab-base", + .length = sizeof(unsigned long), + .value = (unsigned char *)&htab_base, +}; + +static struct property htab_size_prop = { + .name = "linux,htab-size", + .length = sizeof(unsigned long), + .value = (unsigned char *)&htab_size, +}; + +static struct property kernel_end_prop = { + .name = "linux,kernel-end", + .length = sizeof(unsigned long), + .value = (unsigned char *)&kernel_end, +}; + +static void __init export_htab_values(void) +{ + struct device_node *node; + + node = of_find_node_by_path("/chosen"); + if (!node) + return; + + kernel_end = __pa(_end); + prom_add_property(node, &kernel_end_prop); + + /* On machines with no htab htab_address is NULL */ + if (NULL == htab_address) + goto out; + + htab_base = __pa(htab_address); + prom_add_property(node, &htab_base_prop); + + htab_size = 1UL << ppc64_pft_size; + prom_add_property(node, &htab_size_prop); + + out: + of_node_put(node); +} + +void __init kexec_setup(void) +{ + export_htab_values(); +} Index: kexec/include/asm-powerpc/kexec.h =================================================================== --- kexec.orig/include/asm-powerpc/kexec.h +++ kexec/include/asm-powerpc/kexec.h @@ -40,6 +40,7 @@ extern note_buf_t crash_notes[]; #ifdef __powerpc64__ extern void kexec_smp_wait(void); /* get and clear naca physid, wait for master to copy new code to 0 */ +extern 
void __init kexec_setup(void); #else struct kimage; extern void machine_kexec_simple(struct kimage *image); From michael at ellerman.id.au Sat Nov 12 00:06:06 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Sat, 12 Nov 2005 00:06:06 +1100 (EST) Subject: [PATCH 3/8] powerpc: Add a is_kernel_addr() macro In-Reply-To: <1131714362.882855.591468241381.qpush@concordia> Message-ID: <20051111130606.F3A986870B@ozlabs.org> There's a bunch of code that compares an address with KERNELBASE to see if it's a "kernel address", ie. >= KERNELBASE. The proper test is actually to compare with PAGE_OFFSET, since we're going to change KERNELBASE soon. So replace all of them with an is_kernel_addr() macro that does that. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/prom_init.c | 2 +- arch/powerpc/kernel/setup-common.c | 2 +- arch/powerpc/mm/slb.c | 6 +++--- arch/powerpc/mm/stab.c | 6 +++--- arch/powerpc/mm/tlb_64.c | 2 +- arch/powerpc/oprofile/op_model_power4.c | 4 ++-- arch/powerpc/oprofile/op_model_rs64.c | 3 +-- arch/powerpc/xmon/xmon.c | 4 ++-- include/asm-powerpc/page.h | 6 ++++++ 9 files changed, 20 insertions(+), 15 deletions(-) Index: kexec/arch/powerpc/mm/stab.c =================================================================== --- kexec.orig/arch/powerpc/mm/stab.c +++ kexec/arch/powerpc/mm/stab.c @@ -122,7 +122,7 @@ static int __ste_allocate(unsigned long unsigned long offset; /* Kernel or user address? */ - if (ea >= KERNELBASE) { + if (is_kernel_addr(ea)) { vsid = get_kernel_vsid(ea); } else { if ((ea >= TASK_SIZE_USER64) || (! 
mm)) @@ -133,7 +133,7 @@ static int __ste_allocate(unsigned long stab_entry = make_ste(get_paca()->stab_addr, GET_ESID(ea), vsid); - if (ea < KERNELBASE) { + if (!is_kernel_addr(ea)) { offset = __get_cpu_var(stab_cache_ptr); if (offset < NR_STAB_CACHE_ENTRIES) __get_cpu_var(stab_cache[offset++]) = stab_entry; @@ -190,7 +190,7 @@ void switch_stab(struct task_struct *tsk entry++, ste++) { unsigned long ea; ea = ste->esid_data & ESID_MASK; - if (ea < KERNELBASE) { + if (!is_kernel_addr(ea)) { ste->esid_data = 0; } } Index: kexec/arch/powerpc/kernel/prom_init.c =================================================================== --- kexec.orig/arch/powerpc/kernel/prom_init.c +++ kexec/arch/powerpc/kernel/prom_init.c @@ -1994,7 +1994,7 @@ static void __init prom_check_initrd(uns if (r3 && r4 && r4 != 0xdeadbeef) { unsigned long val; - RELOC(prom_initrd_start) = (r3 >= KERNELBASE) ? __pa(r3) : r3; + RELOC(prom_initrd_start) = is_kernel_addr(r3) ? __pa(r3) : r3; RELOC(prom_initrd_end) = RELOC(prom_initrd_start) + r4; val = RELOC(prom_initrd_start); Index: kexec/arch/powerpc/kernel/setup-common.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup-common.c +++ kexec/arch/powerpc/kernel/setup-common.c @@ -441,7 +441,7 @@ void __init check_for_initrd(void) /* If we were passed an initrd, set the ROOT_DEV properly if the values * look sensible. If not, clear initrd reference. 
*/ - if (initrd_start >= KERNELBASE && initrd_end >= KERNELBASE && + if (is_kernel_addr(initrd_start) && is_kernel_addr(initrd_end) && initrd_end > initrd_start) ROOT_DEV = Root_RAM0; else { Index: kexec/arch/powerpc/mm/slb.c =================================================================== --- kexec.orig/arch/powerpc/mm/slb.c +++ kexec/arch/powerpc/mm/slb.c @@ -134,14 +134,14 @@ void switch_slb(struct task_struct *tsk, else unmapped_base = TASK_UNMAPPED_BASE_USER64; - if (pc >= KERNELBASE) + if (is_kernel_addr(pc)) return; slb_allocate(pc); if (GET_ESID(pc) == GET_ESID(stack)) return; - if (stack >= KERNELBASE) + if (is_kernel_addr(stack)) return; slb_allocate(stack); @@ -149,7 +149,7 @@ void switch_slb(struct task_struct *tsk, || (GET_ESID(stack) == GET_ESID(unmapped_base))) return; - if (unmapped_base >= KERNELBASE) + if (is_kernel_addr(unmapped_base)) return; slb_allocate(unmapped_base); } Index: kexec/arch/powerpc/oprofile/op_model_power4.c =================================================================== --- kexec.orig/arch/powerpc/oprofile/op_model_power4.c +++ kexec/arch/powerpc/oprofile/op_model_power4.c @@ -253,7 +253,7 @@ static unsigned long get_pc(struct pt_re return (unsigned long)__va(pc); /* Not sure where we were */ - if (pc < KERNELBASE) + if (!is_kernel_addr(pc)) /* function descriptor madness */ return *((unsigned long *)kernel_unknown_bucket); @@ -265,7 +265,7 @@ static int get_kernel(unsigned long pc) int is_kernel; if (!mmcra_has_sihv) { - is_kernel = (pc >= KERNELBASE); + is_kernel = is_kernel_addr(pc); } else { unsigned long mmcra = mfspr(SPRN_MMCRA); is_kernel = ((mmcra & MMCRA_SIPR) == 0); Index: kexec/arch/powerpc/xmon/xmon.c =================================================================== --- kexec.orig/arch/powerpc/xmon/xmon.c +++ kexec/arch/powerpc/xmon/xmon.c @@ -1010,7 +1010,7 @@ static long check_bp_loc(unsigned long a unsigned int instr; addr &= ~3; - if (addr < KERNELBASE) { + if (!is_kernel_addr(addr)) { 
printf("Breakpoints may only be placed at kernel addresses\n"); return 0; } @@ -1061,7 +1061,7 @@ bpt_cmds(void) dabr.address = 0; dabr.enabled = 0; if (scanhex(&dabr.address)) { - if (dabr.address < KERNELBASE) { + if (!is_kernel_addr(dabr.address)) { printf(badaddr); break; } Index: kexec/include/asm-powerpc/page.h =================================================================== --- kexec.orig/include/asm-powerpc/page.h +++ kexec/include/asm-powerpc/page.h @@ -83,6 +83,12 @@ /* to align the pointer to the (next) page boundary */ #define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) +/* + * Don't compare things with KERNELBASE or PAGE_OFFSET to test for + * "kernelness", use is_kernel_addr() - it should do what you want. + */ +#define is_kernel_addr(x) ((x) >= PAGE_OFFSET) + #ifndef __ASSEMBLY__ #undef STRICT_MM_TYPECHECKS Index: kexec/arch/powerpc/oprofile/op_model_rs64.c =================================================================== --- kexec.orig/arch/powerpc/oprofile/op_model_rs64.c +++ kexec/arch/powerpc/oprofile/op_model_rs64.c @@ -178,7 +178,6 @@ static void rs64_handle_interrupt(struct int val; int i; unsigned long pc = mfspr(SPRN_SIAR); - int is_kernel = (pc >= KERNELBASE); /* set the PMM bit (see comment below) */ mtmsrd(mfmsr() | MSR_PMM); @@ -187,7 +186,7 @@ static void rs64_handle_interrupt(struct val = ctr_read(i); if (val < 0) { if (ctr[i].enabled) { - oprofile_add_pc(pc, is_kernel, i); + oprofile_add_pc(pc, is_kernel_addr(pc), i); ctr_write(i, reset_value[i]); } else { ctr_write(i, 0); Index: kexec/arch/powerpc/mm/tlb_64.c =================================================================== --- kexec.orig/arch/powerpc/mm/tlb_64.c +++ kexec/arch/powerpc/mm/tlb_64.c @@ -168,7 +168,7 @@ void hpte_update(struct mm_struct *mm, u batch->mm = mm; batch->psize = psize; } - if (addr < KERNELBASE) { + if (!is_kernel_addr(addr)) { vsid = get_vsid(mm->context.id, addr); WARN_ON(vsid == 0); } else From michael at ellerman.id.au Sat Nov 12 00:06:07 2005 From: 
michael at ellerman.id.au (Michael Ellerman) Date: Sat, 12 Nov 2005 00:06:07 +1100 (EST) Subject: [PATCH 4/8] powerpc: Seperate usage of KERNELBASE and PAGE_OFFSET In-Reply-To: <1131714362.882855.591468241381.qpush@concordia> Message-ID: <20051111130607.8524368718@ozlabs.org> This patch separates usage of KERNELBASE and PAGE_OFFSET. I haven't looked at any of the PPC code, if we ever want to support Kdump on PPC we'll have to do another audit, ditto for iSeries. This patch makes PAGE_OFFSET the constant, it'll always be 0xC * 1 gazillion. To get a physical address from a virtual one you subtract PAGE_OFFSET, _not_ KERNELBASE. KERNELBASE is the virtual address of the start of the kernel, it's often the same as PAGE_OFFSET, but _might not be_. If you want to know something's offset from the start of the kernel you should subtract KERNELBASE. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/entry_64.S | 4 ++-- arch/powerpc/kernel/lparmap.c | 6 +++--- arch/powerpc/mm/hash_utils_64.c | 6 +++--- arch/powerpc/mm/slb.c | 4 ++-- arch/powerpc/mm/slb_low.S | 6 +++--- arch/powerpc/mm/stab.c | 10 +++++----- arch/ppc64/kernel/machine_kexec.c | 5 ++--- 7 files changed, 20 insertions(+), 21 deletions(-) Index: kexec/arch/powerpc/mm/stab.c =================================================================== --- kexec.orig/arch/powerpc/mm/stab.c +++ kexec/arch/powerpc/mm/stab.c @@ -40,7 +40,7 @@ static int make_ste(unsigned long stab, unsigned long entry, group, old_esid, castout_entry, i; unsigned int global_entry; struct stab_entry *ste, *castout_ste; - unsigned long kernel_segment = (esid << SID_SHIFT) >= KERNELBASE; + unsigned long kernel_segment = (esid << SID_SHIFT) >= PAGE_OFFSET; vsid_data = vsid << STE_VSID_SHIFT; esid_data = esid << SID_SHIFT | STE_ESID_KP | STE_ESID_V; @@ -83,7 +83,7 @@ static int make_ste(unsigned long stab, } /* Dont cast out the first kernel segment */ - if ((castout_ste->esid_data & ESID_MASK) != KERNELBASE) + if ((castout_ste->esid_data &
ESID_MASK) != PAGE_OFFSET) break; castout_entry = (castout_entry + 1) & 0xf; @@ -251,7 +251,7 @@ void stabs_alloc(void) panic("Unable to allocate segment table for CPU %d.\n", cpu); - newstab += KERNELBASE; + newstab = (unsigned long)__va(newstab); memset((void *)newstab, 0, HW_PAGE_SIZE); @@ -270,11 +270,11 @@ void stabs_alloc(void) */ void stab_initialize(unsigned long stab) { - unsigned long vsid = get_kernel_vsid(KERNELBASE); + unsigned long vsid = get_kernel_vsid(PAGE_OFFSET); unsigned long stabreal; asm volatile("isync; slbia; isync":::"memory"); - make_ste(stab, GET_ESID(KERNELBASE), vsid); + make_ste(stab, GET_ESID(PAGE_OFFSET), vsid); /* Order update */ asm volatile("sync":::"memory"); Index: kexec/arch/ppc64/kernel/machine_kexec.c =================================================================== --- kexec.orig/arch/ppc64/kernel/machine_kexec.c +++ kexec/arch/ppc64/kernel/machine_kexec.c @@ -172,9 +172,8 @@ void kexec_copy_flush(struct kimage *ima * including ones that were in place on the original copy */ for (i = 0; i < nr_segments; i++) - flush_icache_range(ranges[i].mem + KERNELBASE, - ranges[i].mem + KERNELBASE + - ranges[i].memsz); + flush_icache_range((unsigned long)__va(ranges[i].mem), + (unsigned long)__va(ranges[i].mem + ranges[i].memsz)); } #ifdef CONFIG_SMP Index: kexec/arch/powerpc/mm/hash_utils_64.c =================================================================== --- kexec.orig/arch/powerpc/mm/hash_utils_64.c +++ kexec/arch/powerpc/mm/hash_utils_64.c @@ -456,7 +456,7 @@ void __init htab_initialize(void) /* create bolted the linear mapping in the hash table */ for (i=0; i < lmb.memory.cnt; i++) { - base = lmb.memory.region[i].base + KERNELBASE; + base = (unsigned long)__va(lmb.memory.region[i].base); size = lmb.memory.region[i].size; DBG("creating mapping for region: %lx : %lx\n", base, size); @@ -498,8 +498,8 @@ void __init htab_initialize(void) * for either 4K or 16MB pages. 
*/ if (tce_alloc_start) { - tce_alloc_start += KERNELBASE; - tce_alloc_end += KERNELBASE; + tce_alloc_start = (unsigned long)__va(tce_alloc_start); + tce_alloc_end = (unsigned long)__va(tce_alloc_end); if (base + size >= tce_alloc_start) tce_alloc_start = base + size + 1; Index: kexec/arch/powerpc/mm/slb.c =================================================================== --- kexec.orig/arch/powerpc/mm/slb.c +++ kexec/arch/powerpc/mm/slb.c @@ -75,7 +75,7 @@ static void slb_flush_and_rebolt(void) vflags = SLB_VSID_KERNEL | virtual_llp; ksp_esid_data = mk_esid_data(get_paca()->kstack, 2); - if ((ksp_esid_data & ESID_MASK) == KERNELBASE) + if ((ksp_esid_data & ESID_MASK) == PAGE_OFFSET) ksp_esid_data &= ~SLB_ESID_V; /* We need to do this all in asm, so we're sure we don't touch @@ -213,7 +213,7 @@ void slb_initialize(void) asm volatile("isync":::"memory"); asm volatile("slbmte %0,%0"::"r" (0) : "memory"); asm volatile("isync; slbia; isync":::"memory"); - create_slbe(KERNELBASE, lflags, 0); + create_slbe(PAGE_OFFSET, lflags, 0); /* VMALLOC space has 4K pages always for now */ create_slbe(VMALLOCBASE, vflags, 1); Index: kexec/arch/powerpc/kernel/entry_64.S =================================================================== --- kexec.orig/arch/powerpc/kernel/entry_64.S +++ kexec/arch/powerpc/kernel/entry_64.S @@ -674,7 +674,7 @@ _GLOBAL(enter_rtas) /* Setup our real return addr */ SET_REG_TO_LABEL(r4,.rtas_return_loc) - SET_REG_TO_CONST(r9,KERNELBASE) + SET_REG_TO_CONST(r9,PAGE_OFFSET) sub r4,r4,r9 mtlr r4 @@ -702,7 +702,7 @@ _GLOBAL(enter_rtas) _STATIC(rtas_return_loc) /* relocation is off at this point */ mfspr r4,SPRN_SPRG3 /* Get PACA */ - SET_REG_TO_CONST(r5, KERNELBASE) + SET_REG_TO_CONST(r5, PAGE_OFFSET) sub r4,r4,r5 /* RELOC the PACA base pointer */ mfmsr r6 Index: kexec/arch/powerpc/mm/slb_low.S =================================================================== --- kexec.orig/arch/powerpc/mm/slb_low.S +++ kexec/arch/powerpc/mm/slb_low.S @@ -37,9 +37,9 @@ 
_GLOBAL(slb_allocate_realmode) srdi r9,r3,60 /* get region */ srdi r10,r3,28 /* get esid */ - cmpldi cr7,r9,0xc /* cmp KERNELBASE for later use */ + cmpldi cr7,r9,0xc /* cmp PAGE_OFFSET for later use */ - /* r3 = address, r10 = esid, cr7 = <>KERNELBASE */ + /* r3 = address, r10 = esid, cr7 = <> PAGE_OFFSET */ blt cr7,0f /* user or kernel? */ /* kernel address: proto-VSID = ESID */ @@ -166,7 +166,7 @@ _GLOBAL(slb_allocate_user) /* * Finish loading of an SLB entry and return * - * r3 = EA, r10 = proto-VSID, r11 = flags, clobbers r9, cr7 = <>KERNELBASE + * r3 = EA, r10 = proto-VSID, r11 = flags, clobbers r9, cr7 = <> PAGE_OFFSET */ slb_finish_load: ASM_VSID_SCRAMBLE(r10,r9) Index: kexec/arch/powerpc/kernel/lparmap.c =================================================================== --- kexec.orig/arch/powerpc/kernel/lparmap.c +++ kexec/arch/powerpc/kernel/lparmap.c @@ -16,8 +16,8 @@ const struct LparMap __attribute__((__se .xSegmentTableOffs = STAB0_PAGE, .xEsids = { - { .xKernelEsid = GET_ESID(KERNELBASE), - .xKernelVsid = KERNEL_VSID(KERNELBASE), }, + { .xKernelEsid = GET_ESID(PAGE_OFFSET), + .xKernelVsid = KERNEL_VSID(PAGE_OFFSET), }, { .xKernelEsid = GET_ESID(VMALLOCBASE), .xKernelVsid = KERNEL_VSID(VMALLOCBASE), }, }, @@ -25,7 +25,7 @@ const struct LparMap __attribute__((__se .xRanges = { { .xPages = HvPagesToMap, .xOffset = 0, - .xVPN = KERNEL_VSID(KERNELBASE) << (SID_SHIFT - HW_PAGE_SHIFT), + .xVPN = KERNEL_VSID(PAGE_OFFSET) << (SID_SHIFT - HW_PAGE_SHIFT), }, }, }; From michael at ellerman.id.au Sat Nov 12 00:06:08 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Sat, 12 Nov 2005 00:06:08 +1100 (EST) Subject: [PATCH 5/8] powerpc: Add CONFIG_CRASH_DUMP In-Reply-To: <1131714362.882855.591468241381.qpush@concordia> Message-ID: <20051111130608.14CD868719@ozlabs.org> This patch adds a Kconfig variable, CONFIG_CRASH_DUMP, which configures the built kernel for use as a Kdump kernel. 
Currently "all" this involves is changing the value of KERNELBASE to 32 MB. Signed-off-by: Michael Ellerman --- arch/powerpc/Kconfig | 11 +++++++++++ arch/powerpc/kernel/setup_64.c | 3 +++ include/asm-powerpc/page.h | 11 +++++++++-- 3 files changed, 23 insertions(+), 2 deletions(-) Index: kexec/arch/powerpc/Kconfig =================================================================== --- kexec.orig/arch/powerpc/Kconfig +++ kexec/arch/powerpc/Kconfig @@ -379,6 +379,17 @@ config CELL_IIC bool default y +config CRASH_DUMP + bool "kernel crash dumps (EXPERIMENTAL)" + depends on PPC_MULTIPLATFORM + depends on EXPERIMENTAL + help + Build a kernel suitable for use as a kdump capture kernel. + The kernel will be linked at a different address than normal, and + so can only be used for Kdump. + + Don't change this unless you know what you are doing. + config IBMVIO depends on PPC_PSERIES || PPC_ISERIES bool Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -511,6 +511,9 @@ void __init setup_system(void) ppc64_caches.iline_size); printk("htab_address = 0x%p\n", htab_address); printk("htab_hash_mask = 0x%lx\n", htab_hash_mask); +#if PHYSICAL_START > 0 + printk("physical_start = 0x%x\n", PHYSICAL_START); +#endif printk("-----------------------------------------------------\n"); mm_init_ppc64(); Index: kexec/include/asm-powerpc/page.h =================================================================== --- kexec.orig/include/asm-powerpc/page.h +++ kexec/include/asm-powerpc/page.h @@ -34,8 +34,15 @@ */ #define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) +#ifdef CONFIG_CRASH_DUMP +/* Kdump kernel runs at 32 MB, change at your peril. 
*/ +#define PHYSICAL_START 0x2000000 +#else +#define PHYSICAL_START 0x0 +#endif + #define PAGE_OFFSET ASM_CONST(CONFIG_KERNEL_START) -#define KERNELBASE PAGE_OFFSET +#define KERNELBASE (PAGE_OFFSET + PHYSICAL_START) #ifdef CONFIG_DISCONTIGMEM #define page_to_pfn(page) discontigmem_page_to_pfn(page) @@ -53,7 +60,7 @@ #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) -#define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) +#define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET)) #define __pa(x) ((unsigned long)(x) - PAGE_OFFSET) /* From michael at ellerman.id.au Sat Nov 12 00:06:09 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Sat, 12 Nov 2005 00:06:09 +1100 (EST) Subject: [PATCH 6/8] powerpc: Reroute interrupts from 0 + offset to PHYSICAL_START + offset In-Reply-To: <1131714362.882855.591468241381.qpush@concordia> Message-ID: <20051111130609.B2AD068728@ozlabs.org> Regardless of where the kernel's linked we always get interrupts at low addresses. This patch creates a trampoline in the first 3 pages of memory, where interrupts land, and patches those addresses to jump into the real kernel code at PHYSICAL_START. 
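
The trampoline described above can be sketched in user-space C as follows. This is only an illustration under stated assumptions, not code from the patch: encode_rel_branch() and encode_trampoline() are hypothetical helpers (the patch itself uses the kernel's create_instruction() and create_branch()), and EXAMPLE_PHYSICAL_START simply mirrors the 32 MB value used in this series. It shows why each vector needs two instructions: a relative branch reaches at most 32 MB - 4 bytes forward, so a nop is placed first and the branch is issued from the following word.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_INST_NOP            0x60000000u /* ori 0,0,0 */
#define PPC_INST_BRANCH         0x48000000u /* I-form branch, AA=0, LK=0 */
#define EXAMPLE_PHYSICAL_START  0x2000000u  /* 32 MB, assumed for illustration */

/*
 * Encode a relative "b" from 'from' to 'to'.
 * Returns 0 if the displacement does not fit the signed 26-bit,
 * word-aligned field of the I-form branch.
 */
static uint32_t encode_rel_branch(uint64_t from, uint64_t to)
{
	int64_t disp = (int64_t)(to - from);

	if (disp < -0x2000000 || disp > 0x1fffffc || (disp & 3))
		return 0;
	return PPC_INST_BRANCH | ((uint32_t)disp & 0x03fffffcu);
}

/*
 * Fill insns[0..1] with the nop + branch pair for a vector at 'addr'.
 * Returns 0 on success, -1 if the branch cannot be encoded.
 */
static int encode_trampoline(uint64_t addr, uint32_t insns[2])
{
	assert((addr & 3) == 0); /* instructions are word aligned */

	insns[0] = PPC_INST_NOP;
	/* The branch sits at addr + 4, so it only has to span 32 MB - 4. */
	insns[1] = encode_rel_branch(addr + 4, addr + EXAMPLE_PHYSICAL_START);
	return insns[1] ? 0 : -1;
}
```

A single branch placed directly at the vector would have to span exactly 32 MB, which encode_rel_branch() rejects; from the next word the displacement is 32 MB - 4 and fits.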
We also need to reserve the trampoline code and a bit more in prom.c Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/crash.c | 53 +++++++++++++++++++++++++++++++++++++++++ arch/powerpc/kernel/prom.c | 6 +++- arch/powerpc/kernel/setup_64.c | 5 +++ include/asm-powerpc/kdump.h | 13 ++++++++++ 5 files changed, 78 insertions(+), 2 deletions(-) Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include #include @@ -261,6 +262,10 @@ void __init early_setup(unsigned long dt } ppc_md = **mach; +#ifdef CONFIG_CRASH_DUMP + kdump_setup(); +#endif + DBG("Found, Initializing memory management...\n"); /* Index: kexec/arch/powerpc/kernel/prom.c =================================================================== --- kexec.orig/arch/powerpc/kernel/prom.c +++ kexec/arch/powerpc/kernel/prom.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include #include @@ -1335,11 +1336,14 @@ void __init early_init_devtree(void *par of_scan_flat_dt(early_init_dt_scan_memory, NULL); lmb_enforce_memory_limit(memory_limit); lmb_analyze(); - lmb_reserve(0, __pa(klimit)); DBG("Phys. mem: %lx\n", lmb_phys_mem_size()); /* Reserve LMB regions used by kernel, initrd, dt, etc... */ + lmb_reserve(__pa(KERNELBASE), __pa(klimit) - __pa(KERNELBASE)); +#ifdef CONFIG_CRASH_DUMP + lmb_reserve(0, KDUMP_BACKUP_LIMIT); +#endif early_reserve_mem(); DBG("Scanning CPUs ...\n"); Index: kexec/include/asm-powerpc/kdump.h =================================================================== --- /dev/null +++ kexec/include/asm-powerpc/kdump.h @@ -0,0 +1,13 @@ +#ifndef _PPC64_KDUMP_H +#define _PPC64_KDUMP_H + +/* How many bytes to backup from zero for kdump. The backup limit should + * be greater or equal to the trampoline's end address. 
*/ +#define KDUMP_BACKUP_LIMIT 0x8000 + +#define KDUMP_TRAMPOLINE_START 0x0100 +#define KDUMP_TRAMPOLINE_END 0x3000 + +extern void kdump_setup(void); + +#endif /* __PPC64_KDUMP_H */ Index: kexec/arch/powerpc/kernel/Makefile =================================================================== --- kexec.orig/arch/powerpc/kernel/Makefile +++ kexec/arch/powerpc/kernel/Makefile @@ -12,7 +12,7 @@ CFLAGS_btext.o += -fPIC endif obj-y := semaphore.o cputable.o ptrace.o syscalls.o \ - irq.o signal_32.o pmc.o + irq.o signal_32.o pmc.o crash.o obj-$(CONFIG_PPC64) += setup_64.o binfmt_elf32.o sys_ppc32.o \ signal_64.o ptrace32.o systbl.o \ paca.o ioctl32.o cpu_setup_power4.o \ @@ -29,6 +29,7 @@ obj-$(CONFIG_RTAS_PROC) += rtas-proc.o obj-$(CONFIG_LPARCFG) += lparcfg.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_GENERIC_TBSYNC) += smp-tbsync.o +obj-$(CONFIG_CRASH_DUMP) += crash.o ifeq ($(CONFIG_PPC_MERGE),y) Index: kexec/arch/powerpc/kernel/crash.c =================================================================== --- /dev/null +++ kexec/arch/powerpc/kernel/crash.c @@ -0,0 +1,53 @@ +/* + * Routines for doing kexec-based kdump. + * + * Copyright (C) 2005, IBM Corp. + * + * Created by: Michael Ellerman + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + +#undef DEBUG + +#include +#include +#include + +#ifdef DEBUG +#include +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...) +#endif + +static void __init create_trampoline(unsigned long addr) +{ + /* The maximum range of a single instruction branch, is the current + * instruction's address + (32 MB - 4) bytes. For the trampoline we + * need to branch to current address + 32 MB. So we insert a nop at + * the trampoline address, then the next instruction (+ 4 bytes) + * does a branch to (32 MB - 4). The net effect is that when we + * branch to "addr" we jump to ("addr" + 32 MB). 
Although it requires + * two instructions it doesn't require any registers. + */ + create_instruction(addr, 0x60000000); /* nop */ + create_branch(addr + 4, addr + PHYSICAL_START, 0); +} + +void __init kdump_setup(void) +{ + unsigned long i; + + DBG(" -> kdump_setup()\n"); + + for (i = KDUMP_TRAMPOLINE_START; i < KDUMP_TRAMPOLINE_END; i += 8) { + create_trampoline(i); + } + + create_trampoline(__pa(system_reset_fwnmi) - PHYSICAL_START); + create_trampoline(__pa(machine_check_fwnmi) - PHYSICAL_START); + + DBG(" <- kdump_setup()\n"); +} From michael at ellerman.id.au Sat Nov 12 00:06:11 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Sat, 12 Nov 2005 00:06:11 +1100 (EST) Subject: [PATCH 7/8] powerpc: Create a trampoline for the fwnmi vectors In-Reply-To: <1131714362.882855.591468241381.qpush@concordia> Message-ID: <20051111130611.1E1E56871F@ozlabs.org> The fwnmi vectors can be anywhere < 32 MB, so we need to use a trampoline for them. The kdump kernel will register the trampoline addresses, which will then jump up to the real code above 32 MB. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/head_64.S | 2 ++ arch/powerpc/platforms/pseries/ras.c | 6 ++---- arch/powerpc/platforms/pseries/setup.c | 17 +++++++++-------- include/asm-powerpc/firmware.h | 6 ++++++ 4 files changed, 19 insertions(+), 12 deletions(-) Index: kexec/arch/powerpc/kernel/head_64.S =================================================================== --- kexec.orig/arch/powerpc/kernel/head_64.S +++ kexec/arch/powerpc/kernel/head_64.S @@ -553,6 +553,7 @@ slb_miss_user_pseries: * Vectors for the FWNMI option. Share common code. 
*/ .globl system_reset_fwnmi + .align 7 system_reset_fwnmi: HMT_MEDIUM mtspr SPRN_SPRG1,r13 /* save r13 */ @@ -560,6 +561,7 @@ system_reset_fwnmi: EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common) .globl machine_check_fwnmi + .align 7 machine_check_fwnmi: HMT_MEDIUM mtspr SPRN_SPRG1,r13 /* save r13 */ Index: kexec/arch/powerpc/platforms/pseries/setup.c =================================================================== --- kexec.orig/arch/powerpc/platforms/pseries/setup.c +++ kexec/arch/powerpc/platforms/pseries/setup.c @@ -76,8 +76,6 @@ #endif extern void find_udbg_vterm(void); -extern void system_reset_fwnmi(void); /* from head.S */ -extern void machine_check_fwnmi(void); /* from head.S */ extern void generic_find_legacy_serial_ports(u64 *physport, unsigned int *default_speed); @@ -105,18 +103,21 @@ void pSeries_show_cpuinfo(struct seq_fil /* Initialize firmware assisted non-maskable interrupts if * the firmware supports this feature. - * */ static void __init fwnmi_init(void) { - int ret; + unsigned long a1, a2; + int ibm_nmi_register = rtas_token("ibm,nmi-register"); if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE) return; - ret = rtas_call(ibm_nmi_register, 2, 1, NULL, - __pa((unsigned long)system_reset_fwnmi), - __pa((unsigned long)machine_check_fwnmi)); - if (ret == 0) + + /* If the kernel's not linked at zero we point the firmware at low + * addresses anyway, and use a trampoline to get to the real code. 
*/ + a1 = __pa(system_reset_fwnmi) - PHYSICAL_START; + a2 = __pa(machine_check_fwnmi) - PHYSICAL_START; + + if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, a1, a2)) fwnmi_active = 1; } Index: kexec/include/asm-powerpc/firmware.h =================================================================== --- kexec.orig/include/asm-powerpc/firmware.h +++ kexec/include/asm-powerpc/firmware.h @@ -98,6 +98,12 @@ typedef struct { extern firmware_feature_t firmware_features_table[]; #endif +extern void system_reset_fwnmi(void); +extern void machine_check_fwnmi(void); + +/* This is true if we are using the firmware NMI handler (typically LPAR) */ +extern int fwnmi_active; + #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* __ASM_POWERPC_FIRMWARE_H */ Index: kexec/arch/powerpc/platforms/pseries/ras.c =================================================================== --- kexec.orig/arch/powerpc/platforms/pseries/ras.c +++ kexec/arch/powerpc/platforms/pseries/ras.c @@ -49,14 +49,12 @@ #include #include #include +#include static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX]; static DEFINE_SPINLOCK(ras_log_buf_lock); -char mce_data_buf[RTAS_ERROR_LOG_MAX] -; -/* This is true if we are using the firmware NMI handler (typically LPAR) */ -extern int fwnmi_active; +char mce_data_buf[RTAS_ERROR_LOG_MAX]; static int ras_get_sensor_state_token; static int ras_check_exception_token; From michael at ellerman.id.au Sat Nov 12 00:06:13 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Sat, 12 Nov 2005 00:06:13 +1100 (EST) Subject: [PATCH 8/8] powerpc: Fixups for kernel linked at 32 MB In-Reply-To: <1131714362.882855.591468241381.qpush@concordia> Message-ID: <20051111130613.3C6B36871F@ozlabs.org> There's a few places where we need to fix things up for the kernel to work if it's linked at 32MB: - platforms/powermac/smp.c To start secondary cpus on pmac we patch the reset vector, which is fine. 
Except if we're above 32MB we don't have enough bits for an absolute branch, it needs to be relative. - kernel/head_64.S - A few branches in the cpu hold code need to load the full target address and do a bctr. - after_prom_start needs to load PHYSICAL_START as the dest address, not 0. - The exception prolog needs to load the low word of the target address, not just the low halfword. - Fixup handling of the initial stab address. - kernel/setup_64.c smp_release_cpus() needs to write 1 to the spinloop flag near 0, not 32 MB. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/head_64.S | 30 ++++++++++++++++++++++++------ arch/powerpc/kernel/setup_64.c | 5 ++++- arch/powerpc/platforms/powermac/smp.c | 2 +- include/asm-ppc64/mmu.h | 3 ++- 4 files changed, 31 insertions(+), 9 deletions(-) Index: kexec/arch/powerpc/platforms/powermac/smp.c =================================================================== --- kexec.orig/arch/powerpc/platforms/powermac/smp.c +++ kexec/arch/powerpc/platforms/powermac/smp.c @@ -770,7 +770,7 @@ static void __devinit smp_core99_kick_cp * b __secondary_start_pmac_0 + nr*8 - KERNELBASE */ new_vector = (unsigned long) __secondary_start_pmac_0 + nr * 8; - *vector = 0x48000002 + new_vector - KERNELBASE; + *vector = 0x48000001 + new_vector - (unsigned long)vector; /* flush data cache and inval instruction cache */ flush_icache_range((unsigned long) vector, (unsigned long) vector + 4); Index: kexec/arch/powerpc/kernel/head_64.S =================================================================== --- kexec.orig/arch/powerpc/kernel/head_64.S +++ kexec/arch/powerpc/kernel/head_64.S @@ -154,11 +154,15 @@ _GLOBAL(__secondary_hold) bne 100b #ifdef CONFIG_HMT - b .hmt_init + LOADADDR(r4, .hmt_init) + mtctr r4 + bctr #else #ifdef CONFIG_SMP + LOADADDR(r4, .pSeries_secondary_smp_init) + mtctr r4 mr r3,r24 - b .pSeries_secondary_smp_init + bctr #else BUG_OPCODE #endif @@ -200,6 +204,20 @@ exception_marker: #define EX_R3 64 #define EX_LR 72 +/* + * We're
short on space and time in the exception prolog, so we can't use + * the normal LOADADDR macro. Normally we just need the low halfword of the + * address, but for Kdump we need the whole low word. + */ +#ifdef CONFIG_CRASH_DUMP +#define LOAD_HANDLER(reg, label) \ + oris r12,r12,(label)@h; /* virt addr of handler ... */ \ + ori r12,r12,(label)@l; /* .. and the rest */ +#else +#define LOAD_HANDLER(reg, label) \ + ori r12,r12,(label)@l; /* virt addr of handler ... */ +#endif + #define EXCEPTION_PROLOG_PSERIES(area, label) \ mfspr r13,SPRN_SPRG3; /* get paca address into r13 */ \ std r9,area+EX_R9(r13); /* save r9 - r12 */ \ @@ -212,8 +230,8 @@ exception_marker: clrrdi r12,r13,32; /* get high part of &label */ \ mfmsr r10; \ mfspr r11,SPRN_SRR0; /* save SRR0 */ \ - ori r12,r12,(label)@l; /* virt addr of handler */ \ ori r10,r10,MSR_IR|MSR_DR|MSR_RI; \ + LOAD_HANDLER(r12,label) \ mtspr SPRN_SRR0,r12; \ mfspr r12,SPRN_SRR1; /* and SRR1 */ \ mtspr SPRN_SRR1,r10; \ @@ -1347,7 +1365,7 @@ _GLOBAL(do_stab_bolted) * fixed address (the linker can't compute (u64)&initial_stab >> * PAGE_SHIFT). */ - . = STAB0_PHYS_ADDR /* 0x6000 */ + . = STAB0_OFFSET /* 0x6000 */ .globl initial_stab initial_stab: .space 4096 @@ -1552,7 +1570,7 @@ _STATIC(__boot_from_prom) _STATIC(__after_prom_start) /* - * We need to run with __start at physical address 0. + * We need to run with __start at physical address PHYSICAL_START. * This will leave some code in the first 256B of * real memory, which are reserved for software use. 
* The remainder of the first page is loaded with the fixed @@ -1567,7 +1585,7 @@ _STATIC(__after_prom_start) mr r26,r3 SET_REG_TO_CONST(r27,KERNELBASE) - li r3,0 /* target addr */ + LOADADDR(r3, PHYSICAL_START) /* target addr */ // XXX FIXME: Use phys returned by OF (r30) add r4,r27,r26 /* source addr */ Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -315,6 +315,7 @@ void early_setup_secondary(void) void smp_release_cpus(void) { extern unsigned long __secondary_hold_spinloop; + unsigned long *ptr; DBG(" -> smp_release_cpus()\n"); @@ -325,7 +326,9 @@ void smp_release_cpus(void) * This is useless but harmless on iSeries, secondaries are already * waiting on their paca spinloops. */ - __secondary_hold_spinloop = 1; + ptr = (unsigned long *)((unsigned long)&__secondary_hold_spinloop + - PHYSICAL_START); + *ptr = 1; mb(); DBG(" <- smp_release_cpus()\n"); Index: kexec/include/asm-ppc64/mmu.h =================================================================== --- kexec.orig/include/asm-ppc64/mmu.h +++ kexec/include/asm-ppc64/mmu.h @@ -30,7 +30,8 @@ /* Location of cpu0's segment table */ #define STAB0_PAGE 0x6 -#define STAB0_PHYS_ADDR (STAB0_PAGE<<12) +#define STAB0_OFFSET (STAB0_PAGE << 12) +#define STAB0_PHYS_ADDR (STAB0_OFFSET + PHYSICAL_START) #ifndef __ASSEMBLY__ extern char initial_stab[]; From miltonm at bga.com Sat Nov 12 04:17:46 2005 From: miltonm at bga.com (Milton Miller) Date: Fri, 11 Nov 2005 11:17:46 -0600 Subject: [PATCH] powerpc: Take 5, merge page.h Message-ID: On Fri Nov 11 14:25:24 EST 2005, Michael Ellerman wrote: > +#ifndef __ASSEMBLY__ > + > +#undef STRICT_MM_TYPECHECKS > + > +#ifdef STRICT_MM_TYPECHECKS Should this be a debugging config? 
> +/* > + * We always define HW_PAGE_SHIFT to 12 as use of 64K pages remains Linux > + * specific, every notion of page number shared with the firmware, TCEs, > + * iommu, etc... still uses a page size of 4K. > + */ > +#define HW_PAGE_SHIFT 12 > +#define HW_PAGE_SIZE (ASM_CONST(1) << HW_PAGE_SHIFT) > +#define HW_PAGE_MASK (~(HW_PAGE_SIZE-1)) > + > +/* > + * PAGE_FACTOR is the number of bits factor between PAGE_SHIFT and > + * HW_PAGE_SHIFT, that is 4K pages. > + */ > +#define PAGE_FACTOR (PAGE_SHIFT - HW_PAGE_SHIFT) ... > +#ifdef CONFIG_PPC_64K_PAGES > +static inline void copy_page(void *to, void *from) > +{ > + unsigned int i; > + for (i=0; i < (1 << (PAGE_SHIFT - 12)); i++) { > + copy_4K_page(to, from); > + to += 4096; > + from += 4096; > + } > +} > +#else /* CONFIG_PPC_64K_PAGES */ > +static inline void copy_page(void *to, void *from) > +{ > + copy_4K_page(to, from); > +} > +#endif /* CONFIG_PPC_64K_PAGES */ > + Can the compiler optimize the first one to the second? Can we always use the first version? milton From arnd at arndb.de Sat Nov 12 04:53:06 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 11 Nov 2005 18:53:06 +0100 Subject: [PATCH] powerpc: Take 5, merge page.h In-Reply-To: References: Message-ID: <200511111853.07697.arnd@arndb.de> On Freedag 11 November 2005 18:17, Milton Miller wrote: > Can the compiler optimize the first one to the second? Can > we always use the first version? > I tried this out and found that gcc-2.95, 3.2, 3.3 and 3.4 can not, but 4.0 can.
They can all generate the same object code if you do it like void copy_page(void *to, void *from) { unsigned int i; if (PAGE_SHIFT == 12) copy_4K_page(to, from); else for (i=0; i < (1 << (PAGE_SHIFT - 12)); i++) { copy_4K_page(to, from); to += 4096; from += 4096; } } Arnd <>< From torvalds at osdl.org Sat Nov 12 05:51:51 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Fri, 11 Nov 2005 10:51:51 -0800 (PST) Subject: please pull the powerpc-merge.git tree In-Reply-To: <17268.35499.909993.63334@cargo.ozlabs.ibm.com> References: <17268.35499.909993.63334@cargo.ozlabs.ibm.com> Message-ID: On Fri, 11 Nov 2005, Paul Mackerras wrote: > > Please do a pull from > > git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge.git This doesn't work at _all_ for me. I get an immediate arch/ppc64/kernel/asm-offsets.c:38:27: error: asm/systemcfg.h: No such file or directory arch/ppc64/kernel/asm-offsets.c: In function 'main': arch/ppc64/kernel/asm-offsets.c:176: error: invalid use of undefined type 'struct systemcfg' ... for some reason. Linus From galak at kernel.crashing.org Sat Nov 12 06:16:03 2005 From: galak at kernel.crashing.org (Kumar Gala) Date: Fri, 11 Nov 2005 13:16:03 -0600 Subject: [PATCH] powerpc: Take 5, merge page.h In-Reply-To: <200511111853.07697.arnd@arndb.de> References: <200511111853.07697.arnd@arndb.de> Message-ID: <0CA89CD0-099F-411F-BEF3-465E69C2C07A@kernel.crashing.org> On Nov 11, 2005, at 11:53 AM, Arnd Bergmann wrote: > On Freedag 11 November 2005 18:17, Milton Miller wrote: >> Can the compiler optimize the first one to the second? Can >> we always use the first version? >> > > I tried this out and found that gcc-2.95, 3.2, 3.3 and 3.4 > can not, but 4.0 can. 
> > They can all generate the same object code if you do it like > > void copy_page(void *to, void *from) > { > unsigned int i; > if (PAGE_SHIFT == 12) > copy_4K_page(to, from); > else for (i=0; i < (1 << (PAGE_SHIFT - 12)); i++) { > copy_4K_page(to, from); > to += 4096; > from += 4096; > } > } I think it's clearer to leave the original version with the #ifdef since it clearly shows we are doing something for 64K pages that is special. - kumar From torvalds at osdl.org Sat Nov 12 07:13:59 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Fri, 11 Nov 2005 12:13:59 -0800 (PST) Subject: please pull the powerpc-merge.git tree In-Reply-To: References: <17268.35499.909993.63334@cargo.ozlabs.ibm.com> Message-ID: On Fri, 11 Nov 2005, Linus Torvalds wrote: > > This doesn't work at _all_ for me. I get an immediate > > arch/ppc64/kernel/asm-offsets.c:38:27: error: asm/systemcfg.h: No such file or directory > arch/ppc64/kernel/asm-offsets.c: In function 'main': > arch/ppc64/kernel/asm-offsets.c:176: error: invalid use of undefined type 'struct systemcfg' > ... > > for some reason. The reason being that we have both an arch/ppc64/kernel/asm-offsets.c _and_ an arch/powerpc/kernel/asm-offsets.c and it looks like the ppc64 one is bogus. I assume that people are testing with "make ARCH=powerpc" and never noticed that the old (and default) "ARCH=ppc64" doesn't work any more? Could somebody fix this please? Linus From kjhall at us.ibm.com Sat Nov 12 07:06:33 2005 From: kjhall at us.ibm.com (Kylene Jo Hall) Date: Fri, 11 Nov 2005 14:06:33 -0600 Subject: [PATCH 1 of 2] tpm: necessary PPC64 function exports Message-ID: <1131739594.5048.14.camel@localhost.localdomain> Some work is needed in the tpm device driver to discover the TPM out of the device tree rather than based on a set address on Power PPC. This patch exports a couple of functions for the parsing.
Signed-off-by: Kylene Hall --- --- linux-2.6.14/arch/ppc64/kernel/prom.c.orig 2005-11-11 13:50:50.000000000 -0600 +++ linux-2.6.14/arch/ppc64/kernel/prom.c 2005-11-11 13:51:42.000000000 -0600 @@ -1261,6 +1261,7 @@ prom_n_addr_cells(struct device_node* np /* No #address-cells property for the root node, default to 1 */ return 1; } +EXPORT_SYMBOL_GPL(prom_n_addr_cells); int prom_n_size_cells(struct device_node* np) @@ -1276,6 +1277,7 @@ prom_n_size_cells(struct device_node* np /* No #size-cells property for the root node, default to 1 */ return 1; } +EXPORT_SYMBOL_GPL(prom_n_size_cells); /** * Work out the sense (active-low level / active-high edge) From kjhall at us.ibm.com Sat Nov 12 07:06:34 2005 From: kjhall at us.ibm.com (Kylene Jo Hall) Date: Fri, 11 Nov 2005 14:06:34 -0600 Subject: [PATCH 2 of 2] tpm: updates for new hardware Message-ID: <1131739595.5048.15.camel@localhost.localdomain> This is the patch to support TPMs on power ppc hardware. It has been reworked as requested to remove the need for messing with the io page mask by just using ioremap. 
Signed-off-by: Kylene Hall --- diff -uprN --exclude='*.o' --exclude='*.ko' --exclude='.*' --exclude='*~' --exclude='*infineon*' --exclude='*nsc*' --exclude='*mod*' --exclude='*.rej' --exclude='*.orig' linux-2.6.14/drivers/char/tpm/tpm_atmel.c linux-2.6.14-rc4-tpm/drivers/char/tpm/tpm_atmel.c --- linux-2.6.14/drivers/char/tpm/tpm_atmel.c 2005-11-11 11:12:38.000000000 -0600 +++ linux-2.6.14-rc4-tpm/drivers/char/tpm/tpm_atmel.c 2005-11-11 09:42:49.000000000 -0600 @@ -19,14 +19,8 @@ * */ -#include #include "tpm.h" - -/* Atmel definitions */ -enum tpm_atmel_addr { - TPM_ATMEL_BASE_ADDR_LO = 0x08, - TPM_ATMEL_BASE_ADDR_HI = 0x09 -}; +#include "tpm_atmel.h" /* write status bits */ enum tpm_atmel_write_status { @@ -53,13 +47,13 @@ static int tpm_atml_recv(struct tpm_chip return -EIO; for (i = 0; i < 6; i++) { - status = inb(chip->vendor->base + 1); + status = atmel_getb(chip, 1); if ((status & ATML_STATUS_DATA_AVAIL) == 0) { dev_err(chip->dev, "error reading header\n"); return -EIO; } - *buf++ = inb(chip->vendor->base); + *buf++ = atmel_getb(chip, 0); } /* size of the data received */ @@ -70,7 +64,7 @@ static int tpm_atml_recv(struct tpm_chip dev_err(chip->dev, "Recv size(%d) less than available space\n", size); for (; i < size; i++) { /* clear the waiting data anyway */ - status = inb(chip->vendor->base + 1); + status = atmel_getb(chip, 1); if ((status & ATML_STATUS_DATA_AVAIL) == 0) { dev_err(chip->dev, "error reading data\n"); @@ -82,17 +76,17 @@ static int tpm_atml_recv(struct tpm_chip /* read all the data available */ for (; i < size; i++) { - status = inb(chip->vendor->base + 1); + status = atmel_getb(chip, 1); if ((status & ATML_STATUS_DATA_AVAIL) == 0) { dev_err(chip->dev, "error reading data\n"); return -EIO; } - *buf++ = inb(chip->vendor->base); + *buf++ = atmel_getb(chip, 0); } /* make sure data available is gone */ - status = inb(chip->vendor->base + 1); + status = atmel_getb(chip, 1); if (status & ATML_STATUS_DATA_AVAIL) { dev_err(chip->dev, "data available is 
stuck\n"); return -EIO; @@ -108,7 +102,7 @@ static int tpm_atml_send(struct tpm_chip dev_dbg(chip->dev, "tpm_atml_send:\n"); for (i = 0; i < count; i++) { dev_dbg(chip->dev, "%d 0x%x(%d)\n", i, buf[i], buf[i]); - outb(buf[i], chip->vendor->base); + atmel_putb(buf[i], chip, 0); } return count; @@ -116,12 +110,12 @@ static int tpm_atml_send(struct tpm_chip static void tpm_atml_cancel(struct tpm_chip *chip) { - outb(ATML_STATUS_ABORT, chip->vendor->base + 1); + atmel_putb(ATML_STATUS_ABORT, chip, 1); } static u8 tpm_atml_status(struct tpm_chip *chip) { - return inb(chip->vendor->base + 1); + return atmel_getb(chip, 1); } static struct file_operations atmel_ops = { @@ -162,12 +156,16 @@ static struct tpm_vendor_specific tpm_at static struct platform_device *pdev; -static void __devexit tpm_atml_remove(struct device *dev) +static void atml_plat_remove(void) { - struct tpm_chip *chip = dev_get_drvdata(dev); + struct tpm_chip *chip = dev_get_drvdata(&pdev->dev); + if (chip) { - release_region(chip->vendor->base, 2); + if (chip->vendor->have_region) + atmel_release_region(chip->vendor->base, chip->vendor->region_size); + atmel_put_base_addr(chip->vendor); tpm_remove_hardware(chip->dev); + platform_device_unregister(pdev); } } @@ -182,72 +180,40 @@ static struct device_driver atml_drv = { static int __init init_atmel(void) { int rc = 0; - int lo, hi; driver_register(&atml_drv); - lo = tpm_read_index(TPM_ADDR, TPM_ATMEL_BASE_ADDR_LO); - hi = tpm_read_index(TPM_ADDR, TPM_ATMEL_BASE_ADDR_HI); - - tpm_atmel.base = (hi<<8)|lo; - - /* verify that it is an Atmel part */ - if (tpm_read_index(TPM_ADDR, 4) != 'A' || tpm_read_index(TPM_ADDR, 5) != 'T' - || tpm_read_index(TPM_ADDR, 6) != 'M' || tpm_read_index(TPM_ADDR, 7) != 'L') { - return -ENODEV; + if (atmel_get_base_addr(&tpm_atmel) != 0) { + rc = -ENODEV; + goto err_unreg_drv; } - /* verify chip version number is 1.1 */ - if ( (tpm_read_index(TPM_ADDR, 0x00) != 0x01) || - (tpm_read_index(TPM_ADDR, 0x01) != 0x01 )) - return 
-ENODEV; - - pdev = kzalloc(sizeof(struct platform_device), GFP_KERNEL); - if ( !pdev ) - return -ENOMEM; - - pdev->name = "tpm_atmel0"; - pdev->id = -1; - pdev->num_resources = 0; - pdev->dev.release = tpm_atml_remove; - pdev->dev.driver = &atml_drv; + tpm_atmel.have_region = (atmel_request_region( tpm_atmel.base, tpm_atmel.region_size, "tpm_atmel0") == NULL) ? 0 : 1; - if ((rc = platform_device_register(pdev)) < 0) { - kfree(pdev); - pdev = NULL; - return rc; + if (IS_ERR(pdev = platform_device_register_simple("tpm_atmel", -1, NULL, 0 ))) { + rc = PTR_ERR(pdev); + goto err_rel_reg; } - if (request_region(tpm_atmel.base, 2, "tpm_atmel0") == NULL ) { - platform_device_unregister(pdev); - kfree(pdev); - pdev = NULL; - return -EBUSY; - } - - if ((rc = tpm_register_hardware(&pdev->dev, &tpm_atmel)) < 0) { - release_region(tpm_atmel.base, 2); - platform_device_unregister(pdev); - kfree(pdev); - pdev = NULL; - return rc; - } - - dev_info(&pdev->dev, "Atmel TPM 1.1, Base Address: 0x%x\n", - tpm_atmel.base); + if ((rc = tpm_register_hardware(&pdev->dev, &tpm_atmel)) < 0) + goto err_unreg_dev; return 0; + +err_unreg_dev: + platform_device_unregister(pdev); +err_rel_reg: + if (tpm_atmel.have_region) + atmel_release_region(tpm_atmel.base, tpm_atmel.region_size); + atmel_put_base_addr(&tpm_atmel); +err_unreg_drv: + driver_unregister(&atml_drv); + return rc; } static void __exit cleanup_atmel(void) { - if (pdev) { - tpm_atml_remove(&pdev->dev); - platform_device_unregister(pdev); - kfree(pdev); - pdev = NULL; - } - driver_unregister(&atml_drv); + atml_plat_remove(); } module_init(init_atmel); diff -uprN --exclude='*.o' --exclude='*.ko' --exclude='.*' --exclude='*~' --exclude='*infineon*' --exclude='*nsc*' --exclude='*mod*' --exclude='*.rej' --exclude='*.orig' linux-2.6.14/drivers/char/tpm/tpm_atmel.h linux-2.6.14-rc4-tpm/drivers/char/tpm/tpm_atmel.h --- linux-2.6.14/drivers/char/tpm/tpm_atmel.h 1969-12-31 18:00:00.000000000 -0600 +++ 
linux-2.6.14-rc4-tpm/drivers/char/tpm/tpm_atmel.h 2005-11-11 10:46:58.000000000 -0600
@@ -0,0 +1,129 @@
+/*
+ * Copyright (C) 2005 IBM Corporation
+ *
+ * Authors:
+ * Kylene Hall
+ *
+ * Maintained by:
+ *
+ * Device driver for TCG/TCPA TPM (trusted platform module).
+ * Specifications at www.trustedcomputinggroup.org
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ * These difference are required on power because the device must be
+ * discovered through the device tree and iomap must be used to get
+ * around the need for holes in the io_page_mask. This does not happen
+ * automatically because the tpm is not a normal pci device and lives
+ * under the root node.
+ *
+ */
+
+#ifdef CONFIG_PPC64
+#define atmel_getb(chip, offset) readb(chip->vendor->iobase + offset);
+#define atmel_putb(val, chip, offset) writeb(val, chip->vendor->iobase + offset)
+#define atmel_request_region request_mem_region
+#define atmel_release_region release_mem_region
+static inline void atmel_put_base_addr(struct tpm_vendor_specific *vendor)
+{
+	iounmap(vendor->iobase);
+}
+
+static int atmel_get_base_addr(struct tpm_vendor_specific *vendor)
+{
+	struct device_node *dn;
+	unsigned long address, size;
+	unsigned int *reg;
+	int reglen;
+	int naddrc;
+	int nsizec;
+
+	dn = of_find_node_by_name(NULL, "tpm");
+
+	if (!dn)
+		return 1;
+
+	if (!device_is_compatible(dn, "AT97SC3201")) {
+		of_node_put(dn);
+		return 1;
+	}
+
+	reg = (unsigned int *) get_property(dn, "reg", &reglen);
+	naddrc = prom_n_addr_cells(dn);
+	nsizec = prom_n_size_cells(dn);
+
+	of_node_put(dn);
+
+
+	if (naddrc == 2)
+		address = ((unsigned long) reg[0] << 32) | reg[1];
+	else
+		address = reg[0];
+
+	if (nsizec == 2)
+		size =
+		    ((unsigned long) reg[naddrc] << 32) | reg[naddrc + 1];
+	else
+		size = reg[naddrc];
+
+	vendor->base = address;
+
vendor->region_size = size; + vendor->iobase = ioremap(address, size); + return 0; +} +#else +#define atmel_getb(chip, offset) inb(chip->vendor->base + offset) +#define atmel_putb(val, chip, offset) outb(val, chip->vendor->base + offset) +#define atmel_request_region request_region +#define atmel_release_region release_region +/* Atmel definitions */ +enum tpm_atmel_addr { + TPM_ATMEL_BASE_ADDR_LO = 0x08, + TPM_ATMEL_BASE_ADDR_HI = 0x09 +}; + +/* Verify this is a 1.1 Atmel TPM */ +static int atmel_verify_tpm11(void) +{ + + /* verify that it is an Atmel part */ + if (tpm_read_index(TPM_ADDR, 4) != 'A' || + tpm_read_index(TPM_ADDR, 5) != 'T' || + tpm_read_index(TPM_ADDR, 6) != 'M' || + tpm_read_index(TPM_ADDR, 7) != 'L') + return 1; + + /* query chip for its version number */ + if (tpm_read_index(TPM_ADDR, 0x00) != 1 || + tpm_read_index(TPM_ADDR, 0x01) != 1) + return 1; + + /* This is an atmel supported part */ + return 0; +} + +static inline void atmel_put_base_addr(struct tpm_vendor_specific *vendor) +{ +} + +/* Determine where to talk to device */ +static unsigned long atmel_get_base_addr(struct tpm_vendor_specific + *vendor) +{ + int lo, hi; + + if (atmel_verify_tpm11() != 0) + return 1; + + lo = tpm_read_index(TPM_ADDR, TPM_ATMEL_BASE_ADDR_LO); + hi = tpm_read_index(TPM_ADDR, TPM_ATMEL_BASE_ADDR_HI); + + vendor->base = (hi << 8) | lo; + vendor->region_size = 2; + + return 0; +} +#endif diff -uprN --exclude='*.o' --exclude='*.ko' --exclude='.*' --exclude='*~' --exclude='*infineon*' --exclude='*nsc*' --exclude='*mod*' --exclude='*.rej' --exclude='*.orig' linux-2.6.14/drivers/char/tpm/tpm.h linux-2.6.14-rc4-tpm/drivers/char/tpm/tpm.h --- linux-2.6.14/drivers/char/tpm/tpm.h 2005-11-11 11:12:38.000000000 -0600 +++ linux-2.6.14-rc4-tpm/drivers/char/tpm/tpm.h 2005-11-11 09:36:34.000000000 -0600 @@ -50,7 +50,11 @@ struct tpm_vendor_specific { u8 req_complete_mask; u8 req_complete_val; u8 req_canceled; - u16 base; /* TPM base address */ + void __iomem *iobase; /* 
ioremapped address */ + unsigned long base; /* TPM base address */ + + int region_size; + int have_region; int (*recv) (struct tpm_chip *, u8 *, size_t); int (*send) (struct tpm_chip *, u8 *, size_t); From bunk at stusta.de Sat Nov 12 07:36:01 2005 From: bunk at stusta.de (Adrian Bunk) Date: Fri, 11 Nov 2005 21:36:01 +0100 Subject: [2.6 patch] add -Werror-implicit-function-declaration to CFLAGS In-Reply-To: <20051111202005.GQ5376@stusta.de> References: <20051107200336.GH3847@stusta.de> <20051110042857.38b4635b.akpm@osdl.org> <20051111021258.GK5376@stusta.de> <20051110182443.514622ed.akpm@osdl.org> <20051111201849.GP5376@stusta.de> <20051111202005.GQ5376@stusta.de> Message-ID: <20051111203601.GR5376@stusta.de> On Fri, Nov 11, 2005 at 09:20:05PM +0100, Adrian Bunk wrote: > On Fri, Nov 11, 2005 at 09:18:49PM +0100, Adrian Bunk wrote: > >... > > But in this case -Werror-implicit-function-declaration doesn't create > > new compile errors, it only moves compile errors from compile time to > > link or depmod time - which is IMHO not a bad change. > > > > If you really want to keep the status quo, you can still steal the > > following from sparc64: > > extern unsigned long virt_to_bus_not_defined_use_pci_map(volatile void *addr); > > #define virt_to_bus virt_to_bus_not_defined_use_pci_map > > extern unsigned long bus_to_virt_not_defined_use_pci_map(volatile void *addr); > > #define bus_to_virt bus_to_virt_not_defined_use_pci_map > > > > Would a patch to mark the ISA legacy functions as __deprecated be OK? > >... > > Sorry, this were two separate thoughts: > > Would a patch to mark both virt_to_bus/bus_to_virt and the ISA legacy > functions (that cause similar problems) as __deprecated be OK? Rethinking about this: The ISA legacy functions were declared obsolete five years ago. Since there are not many drivers using them, we could simply mark the drivers still using them as BROKEN. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. 
There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From benh at kernel.crashing.org Sat Nov 12 08:15:38 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 12 Nov 2005 08:15:38 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: References: <17268.35499.909993.63334@cargo.ozlabs.ibm.com> Message-ID: <1131743739.24637.237.camel@gaston> > and it looks like the ppc64 one is bogus. > > I assume that people are testing with "make ARCH=powerpc" and never > noticed that the old (and default) "ARCH=ppc64" doesn't work any more? > > Could somebody fix this please? Oops, my fault, I didn't test old ppc64. I'll do a patch fixing ppc64 later today. Note however that we really don't want to continue supporting ARCH=ppc64 in 2.6.15, you should really be using powerpc :) Ben. From akpm at osdl.org Sat Nov 12 08:24:43 2005 From: akpm at osdl.org (Andrew Morton) Date: Fri, 11 Nov 2005 13:24:43 -0800 Subject: [2.6 patch] add -Werror-implicit-function-declaration to CFLAGS In-Reply-To: <20051111201849.GP5376@stusta.de> References: <20051107200336.GH3847@stusta.de> <20051110042857.38b4635b.akpm@osdl.org> <20051111021258.GK5376@stusta.de> <20051110182443.514622ed.akpm@osdl.org> <20051111201849.GP5376@stusta.de> Message-ID: <20051111132443.04061d10.akpm@osdl.org> Adrian Bunk wrote: > > > > > > > > > Sorry, I need to build allmodconfig kernels on wacky architectures (eg > > > > ppc64) and this patch is killing me. > > > > > > Can you send me the list of compile errors so that I can work on fixing > > > them? > > > > > > > No handily, sorry. Missing virt_to_bus() is the typical problem. > > > > But in this case -Werror-implicit-function-declaration doesn't create > new compile errors, it only moves compile errors from compile time to > link or depmod time - which is IMHO not a bad change. It is a quite inconvenient change if you want to get full coverage with `make allmodconfig'. 
Maybe one can do `make -i' and then weed through the noise - I haven't tried. > If you really want to keep the status quo, you can still steal the > following from sparc64: > extern unsigned long virt_to_bus_not_defined_use_pci_map(volatile void *addr); > #define virt_to_bus virt_to_bus_not_defined_use_pci_map > extern unsigned long bus_to_virt_not_defined_use_pci_map(volatile void *addr); > #define bus_to_virt bus_to_virt_not_defined_use_pci_map Maybe. There were some other failures. From torvalds at osdl.org Sat Nov 12 08:30:05 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Fri, 11 Nov 2005 13:30:05 -0800 (PST) Subject: please pull the powerpc-merge.git tree In-Reply-To: <1131743739.24637.237.camel@gaston> References: <17268.35499.909993.63334@cargo.ozlabs.ibm.com> <1131743739.24637.237.camel@gaston> Message-ID: On Sat, 12 Nov 2005, Benjamin Herrenschmidt wrote: > > Oops, my fault, I didn't test old ppc64. I'll do a patch fixing ppc64 > later today. Note however that we really don't want to continue > supporting ARCH=ppc64 in 2.6.15, you should really be using powerpc :) Hey, feel free to make the Makefile turn "ppc64" into "powerpc" and fixing it that way. That's what all the other platforms do: SUBARCH := $(shell uname -m | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \ -e s/arm.*/arm/ -e s/sa110/arm/ \ -e s/s390x/s390/ -e s/parisc64/parisc/ ) but if so, you should get rid of the old non-working ppc64 too. Linus From benh at kernel.crashing.org Sat Nov 12 08:32:30 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 12 Nov 2005 08:32:30 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: References: <17268.35499.909993.63334@cargo.ozlabs.ibm.com> <1131743739.24637.237.camel@gaston> Message-ID: <1131744750.24637.245.camel@gaston> On Fri, 2005-11-11 at 13:30 -0800, Linus Torvalds wrote: > > On Sat, 12 Nov 2005, Benjamin Herrenschmidt wrote: > > > > Oops, my fault, I didn't test old ppc64. 
I'll do a patch fixing ppc64
> > later today. Note however that we really don't want to continue
> > supporting ARCH=ppc64 in 2.6.15, you should really be using powerpc :)
>
> Hey, feel free to make the Makefile turn "ppc64" into "powerpc" and fixing
> it that way. That's what all the other platforms do:
>
> SUBARCH := $(shell uname -m | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \
> 				  -e s/arm.*/arm/ -e s/sa110/arm/ \
> 				  -e s/s390x/s390/ -e s/parisc64/parisc/ )
>
> but if so, you should get rid of the old non-working ppc64 too.

Yes, well, that's what Paul asked you a couple of days ago :) Whether you
will give us a few more days after the official 2 weeks limit to finish
merging the remaining bits that are still in arch/ppc64 and
include/asm-ppc64 and remove arch/ppc64 completely or not :) Hopefully,
we should be able to be done sometime next week.

I'll pop to work later this morning and get a patch for these errors. I'll
let Paul decide what to do with the main Makefile.

Ben

From torvalds at osdl.org Sat Nov 12 09:13:21 2005
From: torvalds at osdl.org (Linus Torvalds)
Date: Fri, 11 Nov 2005 14:13:21 -0800 (PST)
Subject: please pull the powerpc-merge.git tree
In-Reply-To: <1131744750.24637.245.camel@gaston>
References: <17268.35499.909993.63334@cargo.ozlabs.ibm.com> <1131743739.24637.237.camel@gaston> <1131744750.24637.245.camel@gaston>
Message-ID:

On Sat, 12 Nov 2005, Benjamin Herrenschmidt wrote:
>
> Yes, well, that's what Paul asked you a couple of days ago :) Whether you
> will give us a few more days after the official 2 weeks limit to finish
> merging the remaining bits that are still in arch/ppc64 and
> include/asm-ppc64 and remove arch/ppc64 completely or not :) Hopefully,
> we should be able to be done sometime next week.

Well, since right now ppc64 doesn't work at all, I think that's pretty
moot ;/

Since building with "make ARCH=powerpc" _does_ work for me, and since the
biggest section of ppc64 machines by far would likely be G5's and other
machines where that merge is tested, I think the thing to do is just call
it a fait accompli, and just say that ppc64 is dead.

But I was going to release a -rc1 today (unless Andrew), and I do want
things to just "work". So for me the question there is whether I just do
the one-liner Makefile thing to force "ppc64" -> "powerpc", or whether you
can get me a quick patch that gets ppc64 going again.

I don't care terribly which way it goes, and if ppc64 ends up disabled in
-rc1 and dying off entirely during -rc2, that's ok by me.

But if it's enabled in -rc1, then I want it enabled in the final too. I
don't want to kill a sub-arch after -rc1, that would be against the whole
point of the modern definition of -rc1.

So kill it off now, or kill it off after 2.6.15, that's what the choice
boils down to.

Linus

From benh at kernel.crashing.org Sat Nov 12 09:13:28 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 12 Nov 2005 09:13:28 +1100
Subject: [PATCH] glibc vDSO bits for ppc/ppc64 update
In-Reply-To:
References:
Message-ID: <1131747209.24637.255.camel@gaston>

> Additional hp_timer_t is now declared as unsigned long long int
> to match up with benh's get_tbfreq on 32 bit ppc. (ppc64 is already
> 64 bit)

I'm not sure having hp_timer_t be 64 bits on 32-bit libraries is the
right thing to do. I suppose it's a matter of performance tradeoff.

The fact that the vDSO __kernel_get_tbfreq() returns a long long doesn't
force you to use a long long hp_timer_t; the vDSO doesn't have to return
a value of type hp_timer_t :) I did that on the vDSO to avoid having to
change the vDSO interface if we ever have to deal with >4GHz time
sources.

Ben.
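
A minimal sketch of the point above, not taken from either patch: a
32-bit unsigned value tops out just under 4.3 GHz, so a timebase
frequency above that only survives in a 64-bit return type, while the
caller remains free to keep a narrower timer type whenever the value
fits. sketch_get_tbfreq() and narrow_timer_t are hypothetical names
used only for illustration; they are not the kernel's
__kernel_get_tbfreq() or glibc's hp_timer_t, and the 5 GHz figure is
an assumed example value.

```c
#include <stdint.h>

/* Hypothetical narrow timer type: an assumption standing in for a
 * 32-bit hp_timer_t in a 32-bit library. */
typedef uint32_t narrow_timer_t;

/* Hypothetical stand-in for the vDSO call. A real caller would obtain
 * the value from the kernel; here it is a fixed example in Hz. */
static uint64_t sketch_get_tbfreq(void)
{
	return 5000000000ULL;	/* 5 GHz, above the 32-bit limit */
}

/* Returns 1 when freq can be stored in narrow_timer_t unchanged. */
static int fits_in_narrow(uint64_t freq)
{
	return freq <= UINT32_MAX;
}

/* Narrowing conversion: keeps only the low 32 bits of freq. */
static narrow_timer_t narrow_freq(uint64_t freq)
{
	return (narrow_timer_t)freq;
}
```

With these definitions, fits_in_narrow(sketch_get_tbfreq()) is 0, and
narrow_freq() of the same value wraps to 705032704, which is why the
interface itself would need the wider type even if the library-side
timer type stays narrow for slower time sources.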
From bunk at stusta.de Sat Nov 12 07:18:49 2005 From: bunk at stusta.de (Adrian Bunk) Date: Fri, 11 Nov 2005 21:18:49 +0100 Subject: [2.6 patch] add -Werror-implicit-function-declaration to CFLAGS In-Reply-To: <20051110182443.514622ed.akpm@osdl.org> References: <20051107200336.GH3847@stusta.de> <20051110042857.38b4635b.akpm@osdl.org> <20051111021258.GK5376@stusta.de> <20051110182443.514622ed.akpm@osdl.org> Message-ID: <20051111201849.GP5376@stusta.de> On Thu, Nov 10, 2005 at 06:24:43PM -0800, Andrew Morton wrote: > Adrian Bunk wrote: > > > > On Thu, Nov 10, 2005 at 04:28:57AM -0800, Andrew Morton wrote: > > > Adrian Bunk wrote: > > > > > > > > Currently, using an undeclared function gives a compile warning, but it > > > > can lead to a nasty runtime error if the prototype of the function is > > > > different from what gcc guessed. > > > > > > > > With -Werror-implicit-function-declaration, we are getting an immediate > > > > compile error instead. > > > > > > > > There will be some compile errors in cases where compilation previously > > > > worked because the undefined function wasn't called due to gcc dead code > > > > elimination, but in these cases a proper fix doesnt harm. > > > > > > > > > > Sorry, I need to build allmodconfig kernels on wacky architectures (eg > > > ppc64) and this patch is killing me. > > > > Can you send me the list of compile errors so that I can work on fixing > > them? > > > > No handily, sorry. Missing virt_to_bus() is the typical problem. > But in this case -Werror-implicit-function-declaration doesn't create new compile errors, it only moves compile errors from compile time to link or depmod time - which is IMHO not a bad change. 
If you really want to keep the status quo, you can still steal the following from sparc64: extern unsigned long virt_to_bus_not_defined_use_pci_map(volatile void *addr); #define virt_to_bus virt_to_bus_not_defined_use_pci_map extern unsigned long bus_to_virt_not_defined_use_pci_map(volatile void *addr); #define bus_to_virt bus_to_virt_not_defined_use_pci_map Would a patch to mark the ISA legacy functions as __deprecated be OK? This might give some motivation for people to convert drivers and would avoid new code like the recently introduced kexec to use this obsolete API. > The cross-tools at http://developer.osdl.org/dev/plm/cross_compile/ are > quite simple to install. Thanks, I've tried it. Other problems I found until I gave up on compiling: - a problem in sk98lin indirectly corrected by my SkPciWriteCfgDWord() patch - drivers/net/wireless/tiacx/: missing #include 's (see my patch) - this seems to be a real bug cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From bunk at stusta.de Sat Nov 12 07:20:05 2005 From: bunk at stusta.de (Adrian Bunk) Date: Fri, 11 Nov 2005 21:20:05 +0100 Subject: [2.6 patch] add -Werror-implicit-function-declaration to CFLAGS In-Reply-To: <20051111201849.GP5376@stusta.de> References: <20051107200336.GH3847@stusta.de> <20051110042857.38b4635b.akpm@osdl.org> <20051111021258.GK5376@stusta.de> <20051110182443.514622ed.akpm@osdl.org> <20051111201849.GP5376@stusta.de> Message-ID: <20051111202005.GQ5376@stusta.de> On Fri, Nov 11, 2005 at 09:18:49PM +0100, Adrian Bunk wrote: >... > But in this case -Werror-implicit-function-declaration doesn't create > new compile errors, it only moves compile errors from compile time to > link or depmod time - which is IMHO not a bad change. 
> > If you really want to keep the status quo, you can still steal the > following from sparc64: > extern unsigned long virt_to_bus_not_defined_use_pci_map(volatile void *addr); > #define virt_to_bus virt_to_bus_not_defined_use_pci_map > extern unsigned long bus_to_virt_not_defined_use_pci_map(volatile void *addr); > #define bus_to_virt bus_to_virt_not_defined_use_pci_map > > Would a patch to mark the ISA legacy functions as __deprecated be OK? >... Sorry, this were two separate thoughts: Would a patch to mark both virt_to_bus/bus_to_virt and the ISA legacy functions (that cause similar problems) as __deprecated be OK? cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From benh at kernel.crashing.org Sat Nov 12 09:17:34 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 12 Nov 2005 09:17:34 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: References: <17268.35499.909993.63334@cargo.ozlabs.ibm.com> <1131743739.24637.237.camel@gaston> <1131744750.24637.245.camel@gaston> Message-ID: <1131747454.24637.258.camel@gaston> > Well, since right now ppc64 doesn't work at all, I think that's pretty > moot ;/ > > Since building with "make ARCH=powerpc" _does_ work for me, and since the > biggest section of ppc64 machines by far would likely be G5's and other > machines where that merge is tested, I think the thing to do is just call > it a fait accompli, and just say that ppc64 is dead. Yes, but we still have to move some files over from arch/ppc64 and include/asm-ppc64 (even building with ARCH=powerpc does get some files from there, at least until they are completely emptied). It's mostly just moving things over except for a couple of ones that need to be actually merged. Note that all machines supported by ppc64 are now supported by powerpc. 
In fact, all of the platform code has been in arch/powerpc/platforms for some time now. > But I was going to release a -rc1 today (unless Andrew), and I do want > things to just "work". So for me the question there is whether I just do > the one-liner Makefile thing to force "ppc64" -> "powerpc", or whether you > can get me a quick patch that gets ppc64 going again. Ok, just do the one liner in the Makefile then. > I don't care terribly which way it goes, and if ppc64 ends up disabled in > -rc1 and dying off entirely during -rc2, that's ok by me. Ok. > But if it's enabled in -rc1, then I want it enabled in the final too. I > don't want to kill a sub-arch after -rc1, that would be against the whole > point of the modern definition of -rc1. Ok, disable it then. > So kill it off now, or kill it off after 2.6.15, that's what the choice > boils down to. It's dead :) Ben. From tom_gall at vnet.ibm.com Sat Nov 12 09:11:00 2005 From: tom_gall at vnet.ibm.com (Tom Gall) Date: Fri, 11 Nov 2005 16:11:00 -0600 (CST) Subject: [PATCH] glibc vDSO bits for ppc/ppc64 update Message-ID: Greetings, Enclosed is the next revision of the glibc vDSO implementation for ppc/ppc64 that goes hand in hand with benh's vDSO kernel work currently available at http://gate.crashing.org/~benh/ppc64-vdso-update.diff. Implemented are: __vdso_gettimeofdate __vdso_clock_gettime __vdso_clock_getres __vdso_get_tbfreq New to this patch as compared to the last posting is a rework for symbol lookup to use _dl_vdso_weakref which as a nice side effect when vdso symbols aren't found anoying error messages are no longer printed. Additional hp_timer_t is now declared as unsigned long long int to match up with benh's get_tbfreq on 32 bit ppc. (ppc64 is already 64 bit) Appreciate comments, questions and suggestions! 2005-11-11 Steven Munroe Tom Gall * elf/rtld.c (dl_main): Initialize l_local_scope for sysinfo_map. * sysdeps/powerpc/elf/libc-start.c: Move this. 
* sysdeps/unix/sysv/linux/powerpc/libc-start.c: To here. * sysdeps/powerpc/Versions: add __vdso_get_tbfreq symbol * sysdeps/unix/sysv/linux/powerpc/bits/libc-vdso.h: New file. * sysdeps/unix/sysv/linux/powerpc/clock_getres.c: New file. * sysdeps/unix/sysv/linux/powerpc/clock_gettime.c: New file. * sysdeps/unix/sysv/linux/powerpc/dl-vdso.c: New file. * sysdeps/unix/sysv/linux/powerpc/dl-vdso.h: New file. * sysdeps/unix/sysv/linux/powerpc/get_clockfreq.c: use vDSO / format * sysdeps/unix/sysv/linux/powerpc/gettimeofday.c: New file. * sysdeps/unix/sysv/linux/powerpc/Makefile: Add routines += dl-vdso. * sysdeps/powerpc/powerpc32/hp-timing.h: New file. diff -uNr libc/elf/rtld.c libc-41-32/elf/rtld.c --- libc/elf/rtld.c 2005-11-11 11:18:09.389605816 -0500 +++ libc-41-32/elf/rtld.c 2005-11-11 09:31:48.000000000 -0500 @@ -1296,6 +1296,13 @@ elf_get_dynamic_info (l, dyn_temp); _dl_setup_hash (l); l->l_relocated = 1; + /* Initialize l_local_scope to contain just this map. This allows + the use of dl_lookup_symbol_x to resolve symbols within the vdso. + So we create a single entry list pointing to l_real as its only + element */ + + l->l_local_scope[0]->r_nlist = 1; + l->l_local_scope[0]->r_list = &l->l_real; /* Now that we have the info handy, use the DSO image's soname so this object can be looked up by name. Note that we do not diff -uNr libc/sysdeps/powerpc/elf/libc-start.c libc-41-32/sysdeps/powerpc/elf/libc-start.c --- libc/sysdeps/powerpc/elf/libc-start.c 2005-11-11 11:18:20.366667224 -0500 +++ libc-41-32/sysdeps/powerpc/elf/libc-start.c 1969-12-31 19:00:00.000000000 -0500 @@ -1,99 +0,0 @@ -/* Copyright (C) 1998,2000,2001,2002,2003,2004 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. 
- - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA - 02111-1307 USA. */ - -#include -#include -#include -#include -#include - -extern int __cache_line_size; -weak_extern (__cache_line_size) - -/* The main work is done in the generic function. */ -#define LIBC_START_MAIN generic_start_main -#define LIBC_START_DISABLE_INLINE -#define LIBC_START_MAIN_AUXVEC_ARG -#define MAIN_AUXVEC_ARG -#include - - -struct startup_info -{ - void *__unbounded sda_base; - int (*main) (int, char **, char **, void *); - int (*init) (int, char **, char **, void *); - void (*fini) (void); -}; - - -int -/* GKM FIXME: GCC: this should get __BP_ prefix by virtue of the - BPs in the arglist of startup_info.main and startup_info.init. */ -BP_SYM (__libc_start_main) (int argc, char *__unbounded *__unbounded ubp_av, - char *__unbounded *__unbounded ubp_ev, - ElfW(auxv_t) *__unbounded auxvec, - void (*rtld_fini) (void), - struct startup_info *__unbounded stinfo, - char *__unbounded *__unbounded stack_on_entry) -{ -#if __BOUNDED_POINTERS__ - char **argv; -#else -# define argv ubp_av -#endif - - /* the PPC SVR4 ABI says that the top thing on the stack will - be a NULL pointer, so if not we assume that we're being called - as a statically-linked program by Linux... */ - if (*stack_on_entry != NULL) - { - char *__unbounded *__unbounded temp; - /* ...in which case, we have argc as the top thing on the - stack, followed by argv (NULL-terminated), envp (likewise), - and the auxilary vector. 
*/ - /* 32/64-bit agnostic load from stack */ - argc = *(long int *__unbounded) stack_on_entry; - ubp_av = stack_on_entry + 1; - ubp_ev = ubp_av + argc + 1; -#ifdef HAVE_AUX_VECTOR - temp = ubp_ev; - while (*temp != NULL) - ++temp; - auxvec = (ElfW(auxv_t) *)++temp; -#endif - rtld_fini = NULL; - } - - /* Initialize the __cache_line_size variable from the aux vector. */ - for (ElfW(auxv_t) *av = auxvec; av->a_type != AT_NULL; ++av) - switch (av->a_type) - { - case AT_DCACHEBSIZE: - { - int *cls = & __cache_line_size; - if (cls != NULL) - *cls = av->a_un.a_val; - } - break; - } - - return generic_start_main (stinfo->main, argc, ubp_av, auxvec, - stinfo->init, stinfo->fini, rtld_fini, - stack_on_entry); -} diff -uNr libc/sysdeps/powerpc/Versions libc-41-32/sysdeps/powerpc/Versions --- libc/sysdeps/powerpc/Versions 2005-11-11 11:18:20.367667072 -0500 +++ libc-41-32/sysdeps/powerpc/Versions 2005-11-11 09:31:48.000000000 -0500 @@ -13,5 +13,6 @@ GLIBC_PRIVATE { __novmx__libc_longjmp; __novmx__libc_siglongjmp; __vmx__libc_longjmp; __vmx__libc_siglongjmp; + __vdso_get_tbfreq; } } diff -uNr libc/sysdeps/unix/sysv/linux/powerpc/bits/libc-vdso.h libc-41-32/sysdeps/unix/sysv/linux/powerpc/bits/libc-vdso.h --- libc/sysdeps/unix/sysv/linux/powerpc/bits/libc-vdso.h 1969-12-31 19:00:00.000000000 -0500 +++ libc-41-32/sysdeps/unix/sysv/linux/powerpc/bits/libc-vdso.h 2005-11-11 09:31:48.000000000 -0500 @@ -0,0 +1,36 @@ +/* Resolved function pointers to VDSO functions. + Copyright (C) 2005 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, write to the Free + Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA + 02111-1307 USA. */ + + +#ifndef _LIBC_VDSO_H +#define _LIBC_VDSO_H + +#ifdef SHARED + +typedef int (* __vdso_gettimeofday_t)(void *, void *); + +typedef int (* __vdso_clock_gettime_t)(clockid_t, struct timespec *); + +typedef int (* __vdso_clock_getres_t)(clockid_t, struct timespec *); + +typedef hp_timing_t (* __vdso_get_tbfreq_t)(void); + +#endif + +#endif /* _LIBC_VDSO_H */ diff -uNr libc/sysdeps/unix/sysv/linux/powerpc/clock_getres.c libc-41-32/sysdeps/unix/sysv/linux/powerpc/clock_getres.c --- libc/sysdeps/unix/sysv/linux/powerpc/clock_getres.c 1969-12-31 19:00:00.000000000 -0500 +++ libc-41-32/sysdeps/unix/sysv/linux/powerpc/clock_getres.c 2005-11-11 09:31:48.000000000 -0500 @@ -0,0 +1,197 @@ +/* clock_getres -- Get the resolution of a POSIX clockid_t. Linux version. + Copyright (C) 2003, 2004 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details.
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, write to the Free + Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA + 02111-1307 USA. */ + +#include +#include +#include +#include +#include + +#undef __clock_getres +#include + + +#define SYSCALL_GETRES_MONOTONIC \ +#ifdef SHARED \ + if (__vdso_clock_getres== NULL)\ + retval = INLINE_SYSCALL (clock_getres, 2, clock_id, res); \ + else \ + return (__vdso_clock_getres) (clock_id, tp); \ + break + +#define SYSCALL_GETRES_REALTIME \ + retval = INLINE_SYSCALL (clock_getres, 2, clock_id, res); \ + break + +#ifdef __ASSUME_POSIX_TIMERS + +/* This means the REALTIME and MONOTONIC clock are definitely + supported in the kernel. */ +# define SYSDEP_GETRES \ + SYSDEP_GETRES_CPUTIME \ + case CLOCK_REALTIME: \ + SYSCALL_GETRES_REALTIME \ + case CLOCK_MONOTONIC: \ + SYSCALL_GETRES_MONOTONIC + +# define __libc_missing_posix_timers 0 +#elif defined __NR_clock_getres +/* Is the syscall known to exist? */ +extern int __libc_missing_posix_timers attribute_hidden; + +static inline int +maybe_syscall_getres (clockid_t clock_id, struct timespec *res) +{ + int e = EINVAL; + + if (!__libc_missing_posix_timers) + { + INTERNAL_SYSCALL_DECL (err); + int r = INTERNAL_SYSCALL (clock_getres, err, 2, clock_id, res); + if (!INTERNAL_SYSCALL_ERROR_P (r, err)) + return 0; + + e = INTERNAL_SYSCALL_ERRNO (r, err); + if (e == ENOSYS) + { + __libc_missing_posix_timers = 1; + e = EINVAL; + } + } + + return e; +} + +/* The REALTIME and MONOTONIC clock might be available. Try the + syscall first. */ +# define SYSDEP_GETRES \ + SYSDEP_GETRES_CPUTIME \ + case CLOCK_REALTIME: \ + case CLOCK_MONOTONIC: \ + retval = maybe_syscall_getres (clock_id, res); \ + if (retval == 0) \ + break; \ + /* Fallback code. 
*/ \ + if (retval == EINVAL && clock_id == CLOCK_REALTIME) \ + retval = realtime_getres (res); \ + else \ + { \ + __set_errno (retval); \ + retval = -1; \ + } \ + break; +#endif + +#ifdef __NR_clock_getres +/* We handled the REALTIME clock here. */ +# define HANDLED_REALTIME 1 +# define HANDLED_CPUTIME 1 + +# if __ASSUME_POSIX_CPU_TIMERS > 0 + +# define SYSDEP_GETRES_CPU SYSCALL_GETRES +# define SYSDEP_GETRES_CPUTIME /* Default catches them too. */ + +# else + +extern int __libc_missing_posix_cpu_timers attribute_hidden; + +static int +maybe_syscall_getres_cpu (clockid_t clock_id, struct timespec *res) +{ + int e = EINVAL; + + if (!__libc_missing_posix_cpu_timers) + { + INTERNAL_SYSCALL_DECL (err); + int r = INTERNAL_SYSCALL (clock_getres, err, 2, clock_id, res); + if (!INTERNAL_SYSCALL_ERROR_P (r, err)) + return 0; + + e = INTERNAL_SYSCALL_ERRNO (r, err); +# ifndef __ASSUME_POSIX_TIMERS + if (e == ENOSYS) + { + __libc_missing_posix_timers = 1; + __libc_missing_posix_cpu_timers = 1; + e = EINVAL; + } + else +# endif + { + if (e == EINVAL) + { + /* Check whether the kernel supports CPU clocks at all. + If not, record it for the future. */ + r = INTERNAL_SYSCALL (clock_getres, err, 2, + MAKE_PROCESS_CPUCLOCK (0, CPUCLOCK_SCHED), + NULL); + if (INTERNAL_SYSCALL_ERROR_P (r, err)) + __libc_missing_posix_cpu_timers = 1; + } + } + } + + return e; +} + +# define SYSDEP_GETRES_CPU \ + retval = maybe_syscall_getres_cpu (clock_id, res); \ + if (retval == 0) \ + break; \ + if (retval != EINVAL || !__libc_missing_posix_cpu_timers) \ + { \ + __set_errno (retval); \ + retval = -1; \ + break; \ + } \ + retval = -1 /* Otherwise continue on to the HP_TIMING version. */; + +static inline int +maybe_syscall_getres_cputime (clockid_t clock_id, struct timespec *res) +{ + return maybe_syscall_getres_cpu + (clock_id == CLOCK_THREAD_CPUTIME_ID + ? 
MAKE_THREAD_CPUCLOCK (0, CPUCLOCK_SCHED) + : MAKE_PROCESS_CPUCLOCK (0, CPUCLOCK_SCHED), + res); +} + +# define SYSDEP_GETRES_CPUTIME \ + case CLOCK_PROCESS_CPUTIME_ID: \ + case CLOCK_THREAD_CPUTIME_ID: \ + retval = maybe_syscall_getres_cputime (clock_id, res); \ + if (retval == 0) \ + break; \ + if (retval != EINVAL || !__libc_missing_posix_cpu_timers) \ + { \ + __set_errno (retval); \ + retval = -1; \ + break; \ + } \ + retval = hp_timing_getres (res); \ + break; +# if !HP_TIMING_AVAIL +# define hp_timing_getres(res) (__set_errno (EINVAL), -1) +# endif + +# endif +#endif + +#include diff -uNr libc/sysdeps/unix/sysv/linux/powerpc/clock_gettime.c libc-41-32/sysdeps/unix/sysv/linux/powerpc/clock_gettime.c --- libc/sysdeps/unix/sysv/linux/powerpc/clock_gettime.c 1969-12-31 19:00:00.000000000 -0500 +++ libc-41-32/sysdeps/unix/sysv/linux/powerpc/clock_gettime.c 2005-11-11 09:31:48.000000000 -0500 @@ -0,0 +1,196 @@ +/* clock_gettime -- Get current time from a POSIX clockid_t. Linux version. + Copyright (C) 2003, 2004 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, write to the Free + Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA + 02111-1307 USA. 
*/ + +#include +#include +#include +#include +#include + +#undef __clock_gettime +#include + +#define SYSCALL_GETTIME_CLOCK_MONOTONIC \ +#ifdef SHARED \ + if (__vdso_clock_gettime== NULL) \ + retval = INLINE_SYSCALL (clock_gettime, 2, clock_id, tp); \ + else \ + return (__vdso_clock_gettime) (clock_id, tp); \ + break + +#define SYSCALL_GETTIME_CLOCK_REALTIME \ + retval = INLINE_SYSCALL (clock_gettime, 2, clock_id, tp); \ + break + +#ifdef __ASSUME_POSIX_TIMERS + +/* This means the REALTIME and MONOTONIC clock are definitely + supported in the kernel. */ +# define SYSDEP_GETTIME \ + SYSDEP_GETTIME_CPUTIME \ + case CLOCK_REALTIME: \ + SYSCALL_GETTIME_CLOCK_REALTIME \ + case CLOCK_MONOTONIC: \ + SYSCALL_GETTIME_CLOCK_MONOTONIC + +# define __libc_missing_posix_timers 0 +#elif defined __NR_clock_gettime +/* Is the syscall known to exist? */ +int __libc_missing_posix_timers attribute_hidden; + +static inline int +maybe_syscall_gettime (clockid_t clock_id, struct timespec *tp) +{ + int e = EINVAL; + + if (!__libc_missing_posix_timers) + { + INTERNAL_SYSCALL_DECL (err); + int r = INTERNAL_SYSCALL (clock_gettime, err, 2, clock_id, tp); + if (!INTERNAL_SYSCALL_ERROR_P (r, err)) + return 0; + + e = INTERNAL_SYSCALL_ERRNO (r, err); + if (e == ENOSYS) + { + __libc_missing_posix_timers = 1; + e = EINVAL; + } + } + + return e; +} + +/* The REALTIME and MONOTONIC clock might be available. Try the + syscall first. */ +# define SYSDEP_GETTIME \ + SYSDEP_GETTIME_CPUTIME \ + case CLOCK_REALTIME: \ + case CLOCK_MONOTONIC: \ + retval = maybe_syscall_gettime (clock_id, tp); \ + if (retval == 0) \ + break; \ + /* Fallback code. */ \ + if (retval == EINVAL && clock_id == CLOCK_REALTIME) \ + retval = realtime_gettime (tp); \ + else \ + { \ + __set_errno (retval); \ + retval = -1; \ + } \ + break; +#endif + +#ifdef __NR_clock_gettime +/* We handled the REALTIME clock here. 
*/ +# define HANDLED_REALTIME 1 +# define HANDLED_CPUTIME 1 + +# if __ASSUME_POSIX_CPU_TIMERS > 0 + +# define SYSDEP_GETTIME_CPU SYSCALL_GETTIME +# define SYSDEP_GETTIME_CPUTIME /* Default catches them too. */ + +# else + +int __libc_missing_posix_cpu_timers attribute_hidden; + +static int +maybe_syscall_gettime_cpu (clockid_t clock_id, struct timespec *tp) +{ + int e = EINVAL; + + if (!__libc_missing_posix_cpu_timers) + { + INTERNAL_SYSCALL_DECL (err); + int r = INTERNAL_SYSCALL (clock_gettime, err, 2, clock_id, tp); + if (!INTERNAL_SYSCALL_ERROR_P (r, err)) + return 0; + + e = INTERNAL_SYSCALL_ERRNO (r, err); +# ifndef __ASSUME_POSIX_TIMERS + if (e == ENOSYS) + { + __libc_missing_posix_timers = 1; + __libc_missing_posix_cpu_timers = 1; + e = EINVAL; + } + else +# endif + { + if (e == EINVAL) + { + /* Check whether the kernel supports CPU clocks at all. + If not, record it for the future. */ + r = INTERNAL_SYSCALL (clock_getres, err, 2, + MAKE_PROCESS_CPUCLOCK (0, CPUCLOCK_SCHED), + NULL); + if (INTERNAL_SYSCALL_ERROR_P (r, err)) + __libc_missing_posix_cpu_timers = 1; + } + } + } + + return e; +} + +# define SYSDEP_GETTIME_CPU \ + retval = maybe_syscall_gettime_cpu (clock_id, tp); \ + if (retval == 0) \ + break; \ + if (retval != EINVAL || !__libc_missing_posix_cpu_timers) \ + { \ + __set_errno (retval); \ + retval = -1; \ + break; \ + } \ + retval = -1 /* Otherwise continue on to the HP_TIMING version. */; + +static inline int +maybe_syscall_gettime_cputime (clockid_t clock_id, struct timespec *tp) +{ + return maybe_syscall_gettime_cpu + (clock_id == CLOCK_THREAD_CPUTIME_ID + ? 
MAKE_THREAD_CPUCLOCK (0, CPUCLOCK_SCHED) + : MAKE_PROCESS_CPUCLOCK (0, CPUCLOCK_SCHED), + tp); +} + +# define SYSDEP_GETTIME_CPUTIME \ + case CLOCK_PROCESS_CPUTIME_ID: \ + case CLOCK_THREAD_CPUTIME_ID: \ + retval = maybe_syscall_gettime_cputime (clock_id, tp); \ + if (retval == 0) \ + break; \ + if (retval != EINVAL || !__libc_missing_posix_cpu_timers) \ + { \ + __set_errno (retval); \ + retval = -1; \ + break; \ + } \ + retval = hp_timing_gettime (clock_id, tp); \ + break; +# if !HP_TIMING_AVAIL +# define hp_timing_gettime(clock_id, tp) (__set_errno (EINVAL), -1) +# endif + +# endif +#endif + +#include diff -uNr libc/sysdeps/unix/sysv/linux/powerpc/dl-vdso.c libc-41-32/sysdeps/unix/sysv/linux/powerpc/dl-vdso.c --- libc/sysdeps/unix/sysv/linux/powerpc/dl-vdso.c 1969-12-31 19:00:00.000000000 -0500 +++ libc-41-32/sysdeps/unix/sysv/linux/powerpc/dl-vdso.c 2005-11-11 11:09:33.054571056 -0500 @@ -0,0 +1,128 @@ +/* VDSO symbol handling in the ELF dynamic linker. + Copyright (C) 2005 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, write to the Free + Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA + 02111-1307 USA. */ + +#include "config.h" +#include +#include + +/* This function attempts to resolve a symbol in the VDSO map but + simply returns NULL if it is not there. 
This is intended for the + syscall replacement case where the runtime can safely fall back to + the syscall. For example, when running on an older kernel + that does not yet implement the specific function. */ + +void *internal_function +_dl_vdso_weakref (const char *name) +{ + const ElfW (Sym) * ref = NULL; + struct link_map *map = GLRO (dl_sysinfo_map); + void *value = NULL; + lookup_t result; + + if (map != NULL) + { + /* This is tricky because by default _dl_lookup_symbol_x will fail + and exit if the symbol is not found. Specifying a "skip_map" is + one case where _dl_lookup_symbol_x will do what we want, but the + l_local_scope in the dl_sysinfo_map will not give the desired + result. So we create a dummy scope and link-map struct to be + the "skip_map". As the dummy.l_real points back to the real VDSO + map, do_lookup_x can find the symbol if it is there. */ + struct r_scope_elem *symbol_scope[2]; + struct link_map *scope_list[2]; + struct r_scope_elem dummy_scope; + struct link_map dummy; + + symbol_scope[0] = &dummy_scope; + symbol_scope[1] = NULL; + symbol_scope[0]->r_nlist = 2; + symbol_scope[0]->r_list = &scope_list[0]; + scope_list[0] = map->l_local_scope[0]->r_list[0]; + scope_list[1] = &dummy; + dummy.l_real = map->l_real; + + /* Search the scope of the given vdso map. */ + result = GLRO (dl_lookup_symbol_x) (name, map, &ref, + symbol_scope, NULL, 0, 0, &dummy); + + if (ref != NULL) + { + value = DL_SYMBOL_ADDRESS (result, ref); + } + } + + return value; +} + + +void *internal_function +_dl_vdso_sym (const char *name) +{ + const ElfW (Sym) * ref = NULL; + struct link_map *map = GLRO (dl_sysinfo_map); + void *value = NULL; + lookup_t result; + + if (map != NULL) + { + /* Search the scope of the given vdso map.
*/ + result = GLRO (dl_lookup_symbol_x) (name, map, &ref, + map->l_local_scope, + NULL, 0, 0, NULL); + + if (ref != NULL) + { + value = DL_SYMBOL_ADDRESS (result, ref); + } + } + + return value; +} + +void *internal_function +_dl_vdso_vsym (const char *name, const char *version) +{ + const ElfW (Sym) * ref = NULL; + struct link_map *map = GLRO (dl_sysinfo_map); + void *value = NULL; + struct r_found_version vers; + lookup_t result; + + if (map != NULL) + { + /* Compute hash value to the version string. */ + vers.name = version; + vers.hidden = 1; + vers.hash = _dl_elf_hash (version); + /* We don't have a specific file where the symbol can be found. */ + vers.filename = NULL; + + /* Search the scope of the vdso map. */ + result = GLRO (dl_lookup_symbol_x) (name, map, &ref, + map->l_local_scope, + &vers, 0, 0, NULL); + + if (ref != NULL) + { + value = DL_SYMBOL_ADDRESS (result, ref); + + } + } + return value; +} diff -uNr libc/sysdeps/unix/sysv/linux/powerpc/dl-vdso.h libc-41-32/sysdeps/unix/sysv/linux/powerpc/dl-vdso.h --- libc/sysdeps/unix/sysv/linux/powerpc/dl-vdso.h 1969-12-31 19:00:00.000000000 -0500 +++ libc-41-32/sysdeps/unix/sysv/linux/powerpc/dl-vdso.h 2005-11-11 09:31:48.000000000 -0500 @@ -0,0 +1,33 @@ +/* Run-time dynamic linker data structures for loaded ELF shared objects. + Copyright (C) 2005 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, write to the Free + Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA + 02111-1307 USA. */ + +#ifndef _DL_VDSO_H +#define _DL_VDSO_H + +/* Functions for resolving symbols in the VDSO link map. */ + extern void *_dl_vdso_weakref (const char *name) + internal_function attribute_hidden; + + extern void *_dl_vdso_sym (const char *name) + internal_function attribute_hidden; + + extern void *_dl_vdso_vsym (const char *name, const char *version) + internal_function attribute_hidden; + +#endif /* ldsodefs.h */ diff -uNr libc/sysdeps/unix/sysv/linux/powerpc/get_clockfreq.c libc-41-32/sysdeps/unix/sysv/linux/powerpc/get_clockfreq.c --- libc/sysdeps/unix/sysv/linux/powerpc/get_clockfreq.c 2005-11-11 11:18:22.804563648 -0500 +++ libc-41-32/sysdeps/unix/sysv/linux/powerpc/get_clockfreq.c 2005-11-11 09:31:48.000000000 -0500 @@ -22,7 +22,11 @@ #include #include #include +#include +#ifdef SHARED +extern __vdso_get_tbfreq_t __vdso_get_tbfreq; +#endif hp_timing_t __get_clockfreq (void) @@ -38,68 +42,78 @@ if (timebase_freq != 0) return timebase_freq; - int fd = open ("/proc/cpuinfo", O_RDONLY); - if (__builtin_expect (fd != -1, 1)) + /* if we can use the vDSO to obtain the timebase even better */ +#ifdef SHARED + if (__vdso_get_tbfreq != NULL) { - /* The timebase will be in the 1st 1024 bytes for systems with up - to 8 processors. If the first read returns less then 1024 - bytes read, we have the whole cpuinfo and can start the scan. - Otherwise we will have to read more to insure we have the - timebase value in the scan. */ - char buf[1024]; - ssize_t n; - - n = read (fd, buf, sizeof (buf)); - if (n == sizeof (buf)) - { - /* We are here because the 1st read returned exactly sizeof - (buf) bytes. This implies that we are not at EOF and may - not have read the timebase value yet. So we need to read - more bytes until we know we have EOF. 
We copy the lower - half of buf to the upper half and read sizeof (buf)/2 - bytes into the lower half of buf and repeat until we - reach EOF. We can assume that the timebase will be in - the last 512 bytes of cpuinfo, so two 512 byte half_bufs - will be sufficient to contain the timebase and will - handle the case where the timebase spans the half_buf - boundry. */ - const ssize_t half_buf = sizeof (buf) / 2; - while (n >= half_buf) - { - memcpy (buf, buf + half_buf, half_buf); - n = read (fd, buf + half_buf, half_buf); - } - if (n >= 0) - n += half_buf; - } - - if (__builtin_expect (n, 1) > 0) - { - char *mhz = memmem (buf, n, "timebase", 7); - - if (__builtin_expect (mhz != NULL, 1)) - { - char *endp = buf + n; - - /* Search for the beginning of the string. */ - while (mhz < endp && (*mhz < '0' || *mhz > '9') && *mhz != '\n') - ++mhz; - - while (mhz < endp && *mhz != '\n') - { - if (*mhz >= '0' && *mhz <= '9') - { - result *= 10; - result += *mhz - '0'; - } + timebase_freq=(__vdso_get_tbfreq)(); + } + else +#endif + { + int fd = open ("/proc/cpuinfo", O_RDONLY); + if (__builtin_expect (fd != -1, 1)) + { + /* The timebase will be in the 1st 1024 bytes for systems with up + to 8 processors. If the first read returns less then 1024 + bytes read, we have the whole cpuinfo and can start the scan. + Otherwise we will have to read more to insure we have the + timebase value in the scan. */ + char buf[1024]; + ssize_t n; + + n = read (fd, buf, sizeof (buf)); + if (n == sizeof (buf)) + { + /* We are here because the 1st read returned exactly sizeof + (buf) bytes. This implies that we are not at EOF and may + not have read the timebase value yet. So we need to read + more bytes until we know we have EOF. We copy the lower + half of buf to the upper half and read sizeof (buf)/2 + bytes into the lower half of buf and repeat until we + reach EOF. 
We can assume that the timebase will be in + the last 512 bytes of cpuinfo, so two 512 byte half_bufs + will be sufficient to contain the timebase and will + handle the case where the timebase spans the half_buf + boundry. */ + const ssize_t half_buf = sizeof (buf) / 2; + while (n >= half_buf) + { + memcpy (buf, buf + half_buf, half_buf); + n = read (fd, buf + half_buf, half_buf); + } + if (n >= 0) + n += half_buf; + } + + if (__builtin_expect (n, 1) > 0) + { + char *mhz = memmem (buf, n, "timebase", 7); + + if (__builtin_expect (mhz != NULL, 1)) + { + char *endp = buf + n; + + /* Search for the beginning of the string. */ + while (mhz < endp && (*mhz < '0' || *mhz > '9') && *mhz != '\n') ++mhz; - } - } - timebase_freq = result; - } - close (fd); - } + while (mhz < endp && *mhz != '\n') + { + if (*mhz >= '0' && *mhz <= '9') + { + result *= 10; + result += *mhz - '0'; + } + + ++mhz; + } + } + timebase_freq = result; + } + close (fd); + } + } return timebase_freq; } diff -uNr libc/sysdeps/unix/sysv/linux/powerpc/gettimeofday.c libc-41-32/sysdeps/unix/sysv/linux/powerpc/gettimeofday.c --- libc/sysdeps/unix/sysv/linux/powerpc/gettimeofday.c 1969-12-31 19:00:00.000000000 -0500 +++ libc-41-32/sysdeps/unix/sysv/linux/powerpc/gettimeofday.c 2005-11-11 09:31:48.000000000 -0500 @@ -0,0 +1,54 @@ +/* Copyright (C) 2005 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, write to the Free + Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA + 02111-1307 USA. */ + +#include +#include +#include +#include +#include +#include + +#undef __gettimeofday +#include + +/* Get the current time of day and timezone information, + putting it into *TV and *TZ. If TZ is NULL, *TZ is not filled. + Returns 0 on success, -1 on errors. */ + +#ifdef SHARED +extern __vdso_gettimeofday_t __vdso_gettimeofday; +#endif + +int +__gettimeofday (tv, tz) + struct timeval *tv; + struct timezone *tz; +{ +#ifdef SHARED + if (__vdso_gettimeofday == NULL) +#endif + return INLINE_SYSCALL (gettimeofday, 2, CHECK_1 (tv), CHECK_1 (tz)); +#ifdef SHARED + else + { + return (*__vdso_gettimeofday) (tv, tz); + } +#endif +} + +INTDEF (__gettimeofday) weak_alias (__gettimeofday, gettimeofday) diff -uNr libc/sysdeps/unix/sysv/linux/powerpc/libc-start.c libc-41-32/sysdeps/unix/sysv/linux/powerpc/libc-start.c --- libc/sysdeps/unix/sysv/linux/powerpc/libc-start.c 1969-12-31 19:00:00.000000000 -0500 +++ libc-41-32/sysdeps/unix/sysv/linux/powerpc/libc-start.c 2005-11-11 09:31:48.000000000 -0500 @@ -0,0 +1,201 @@ +/* Copyright (C) 1998,2000,2001,2002,2003,2004,2005 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, write to the Free + Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA + 02111-1307 USA. */ + +#include +#include +#include +#include +#include + +extern int __cache_line_size; +weak_extern (__cache_line_size) +/* The main work is done in the generic function. */ +#define LIBC_START_MAIN generic_start_main +#define LIBC_START_DISABLE_INLINE +#define LIBC_START_MAIN_AUXVEC_ARG +#define MAIN_AUXVEC_ARG +#define INIT_MAIN_ARGS +#include + struct startup_info + { + void *__unbounded sda_base; + int (*main) (int, char **, char **, void *); + int (*init) (int, char **, char **, void *); + void (*fini) (void); + }; + + +#ifdef SHARED +#include +#include +#undef __gettimeofday +#undef __clock_gettime +#undef __clock_getres +#include + + __vdso_gettimeofday_t __vdso_gettimeofday; + __vdso_clock_gettime_t __vdso_clock_gettime; + __vdso_clock_getres_t __vdso_clock_getres; + __vdso_get_tbfreq_t __vdso_get_tbfreq; + +#if __WORDSIZE == 64 + typedef struct + { + void *func; + void *toc_ptr; + void *extra; + } _ppc_func_desciptor; + + static struct + { + _ppc_func_desciptor fd_gettimeofday; + _ppc_func_desciptor fd_clock_gettime; + _ppc_func_desciptor fd_clock_getres; + _ppc_func_desciptor fd_get_tbfreq; + } __vdso_descriptor; +#endif + + static inline void _libc_vdso_platform_setup (void) +{ + void *vdso_ref = NULL; + + __vdso_gettimeofday = NULL; + __vdso_clock_gettime = NULL; + __vdso_clock_getres = NULL; + __vdso_get_tbfreq = NULL; + + vdso_ref = _dl_vdso_weakref ("__kernel_gettimeofday"); + +#if __WORDSIZE == 64 + if (vdso_ref != NULL) + { + __vdso_descriptor.fd_gettimeofday.func = vdso_ref; + __vdso_descriptor.fd_gettimeofday.toc_ptr = NULL; + __vdso_descriptor.fd_gettimeofday.extra = NULL; + __vdso_gettimeofday = (__vdso_gettimeofday_t) + & __vdso_descriptor.fd_gettimeofday; + } +#else + __vdso_gettimeofday = (__vdso_gettimeofday_t) vdso_ref; 
+#endif + + vdso_ref = _dl_vdso_weakref ("__kernel_clock_gettime"); + +#if __WORDSIZE == 64 + if (vdso_ref != NULL) + { + __vdso_descriptor.fd_clock_gettime.func = vdso_ref; + __vdso_descriptor.fd_clock_gettime.toc_ptr = NULL; + __vdso_descriptor.fd_clock_gettime.extra = NULL; + __vdso_clock_gettime = (__vdso_clock_gettime_t) + & __vdso_descriptor.fd_clock_gettime; + } +#else + __vdso_clock_gettime = (__vdso_clock_gettime_t) vdso_ref; +#endif + + vdso_ref = _dl_vdso_weakref ("__kernel_clock_getres"); + +#if __WORDSIZE == 64 + if (vdso_ref != NULL) + { + __vdso_descriptor.fd_clock_getres.func = vdso_ref; + __vdso_descriptor.fd_clock_getres.toc_ptr = NULL; + __vdso_descriptor.fd_clock_getres.extra = NULL; + __vdso_clock_getres = (__vdso_clock_getres_t) + & __vdso_descriptor.fd_clock_getres; + } +#else + __vdso_clock_getres = (__vdso_clock_getres_t) vdso_ref; +#endif + + vdso_ref = _dl_vdso_weakref ("__kernel_get_tbfreq"); + +#if __WORDSIZE == 64 + if (vdso_ref != NULL) + { + __vdso_descriptor.fd_get_tbfreq.func = vdso_ref; + __vdso_descriptor.fd_get_tbfreq.toc_ptr = NULL; + __vdso_descriptor.fd_get_tbfreq.extra = NULL; + __vdso_get_tbfreq = (__vdso_get_tbfreq_t) + & __vdso_descriptor.fd_get_tbfreq; + } +#else + __vdso_get_tbfreq = (__vdso_get_tbfreq_t) vdso_ref; +#endif +} +#endif + +int +/* GKM FIXME: GCC: this should get __BP_ prefix by virtue of the + BPs in the arglist of startup_info.main and startup_info.init. */ + BP_SYM (__libc_start_main) (int argc, char *__unbounded * __unbounded ubp_av, + char *__unbounded * __unbounded ubp_ev, + ElfW (auxv_t) * __unbounded auxvec, + void (*rtld_fini) (void), + struct startup_info * __unbounded stinfo, + char *__unbounded * __unbounded stack_on_entry) +{ +#if __BOUNDED_POINTERS__ + char **argv; +#else +# define argv ubp_av +#endif + + /* the PPC SVR4 ABI says that the top thing on the stack will + be a NULL pointer, so if not we assume that we're being called + as a statically-linked program by Linux...
*/ + if (*stack_on_entry != NULL) + { + char *__unbounded * __unbounded temp; + /* ...in which case, we have argc as the top thing on the + stack, followed by argv (NULL-terminated), envp (likewise), + and the auxilary vector. */ + /* 32/64-bit agnostic load from stack */ + argc = *(long int *__unbounded) stack_on_entry; + ubp_av = stack_on_entry + 1; + ubp_ev = ubp_av + argc + 1; +#ifdef HAVE_AUX_VECTOR + temp = ubp_ev; + while (*temp != NULL) + ++temp; + auxvec = (ElfW (auxv_t) *)++ temp; +#endif + rtld_fini = NULL; + } + + /* Initialize the __cache_line_size variable from the aux vector. */ + for (ElfW (auxv_t) * av = auxvec; av->a_type != AT_NULL; ++av) + switch (av->a_type) + { + case AT_DCACHEBSIZE: + { + int *cls = &__cache_line_size; + if (cls != NULL) + *cls = av->a_un.a_val; + } + break; + } +#ifdef SHARED + /* Resolve and initialize function pointers for VDSO functions. */ + _libc_vdso_platform_setup (); +#endif + return generic_start_main (stinfo->main, argc, ubp_av, auxvec, + stinfo->init, stinfo->fini, rtld_fini, + stack_on_entry); +} diff -uNr libc/sysdeps/unix/sysv/linux/powerpc/Makefile libc-41-32/sysdeps/unix/sysv/linux/powerpc/Makefile --- libc/sysdeps/unix/sysv/linux/powerpc/Makefile 2005-11-11 11:18:22.776567904 -0500 +++ libc-41-32/sysdeps/unix/sysv/linux/powerpc/Makefile 2005-11-11 09:31:48.000000000 -0500 @@ -2,3 +2,8 @@ ifeq ($(subdir),rt) librt-routines += rt-sysdep endif + +ifeq ($(subdir),misc) +routines += dl-vdso +endif + --- /dev/null 2005-08-19 11:20:54.270914304 -0400 +++ libc-34-32/sysdeps/powerpc/powerpc32/hp-timing.h 2005-11-11 15:44:16.817619528 -0500 @@ -0,0 +1,83 @@ +/* High precision, low overhead timing functions. Generic version. + Copyright (C) 1998, 2000 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Ulrich Drepper , 1998. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, write to the Free + Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA + 02111-1307 USA. */ + +#ifndef _HP_TIMING_H +#define _HP_TIMING_H 1 + + +/* There are no generic definitions for the times. We could write something + using the `gettimeofday' system call where available but the overhead of + the system call might be too high. + + In case a platform supports timers in the hardware the following macros + and types must be defined: + + - HP_TIMING_AVAIL: test for availability. + + - HP_TIMING_INLINE: this macro is non-zero if the functionality is not + implemented using function calls but instead uses some inlined code + which might simply consist of a few assembler instructions. We have to + know this since we might want to use the macros here in places where we + cannot make function calls. + + - hp_timing_t: This is the type for variables used to store the time + values. + + - HP_TIMING_ZERO: clear `hp_timing_t' object. + + - HP_TIMING_NOW: place timestamp for current time in variable given as + parameter. + + - HP_TIMING_DIFF_INIT: do whatever is necessary to be able to use the + HP_TIMING_DIFF macro. + + - HP_TIMING_DIFF: compute difference between two times and store it + in a third. Source and destination might overlap. + + - HP_TIMING_ACCUM: add time difference to another variable. 
This might + be a bit more complicated to implement for some platforms as the + operation should be thread-safe and 64bit arithmetic on 32bit platforms + is not. + + - HP_TIMING_ACCUM_NT: this is the variant for situations where we know + there are no threads involved. + + - HP_TIMING_PRINT: write decimal representation of the timing value into + the given string. This operation need not be inline even though + HP_TIMING_INLINE is specified. + +*/ + +/* Provide dummy definitions. */ +#define HP_TIMING_AVAIL (0) +#define HP_TIMING_INLINE (0) +typedef unsigned long long int hp_timing_t; +#define HP_TIMING_ZERO(Var) +#define HP_TIMING_NOW(var) +#define HP_TIMING_DIFF_INIT() +#define HP_TIMING_DIFF(Diff, Start, End) +#define HP_TIMING_ACCUM(Sum, Diff) +#define HP_TIMING_ACCUM_NT(Sum, Diff) +#define HP_TIMING_PRINT(Buf, Len, Val) + +/* Since this implementation is not available we tell the user about it. */ +#define HP_TIMING_NONAVAIL 1 + +#endif /* hp-timing.h */ From sjmunroe at us.ibm.com Sat Nov 12 09:32:11 2005 From: sjmunroe at us.ibm.com (Steve Munroe) Date: Fri, 11 Nov 2005 16:32:11 -0600 Subject: [PATCH] glibc vDSO bits for ppc/ppc64 update In-Reply-To: <1131747209.24637.255.camel@gaston> Message-ID: Benjamin Herrenschmidt wrote on 11/11/2005 04:13:28 PM: > > > Additional hp_timer_t is now declared as unsigned long long int > > to match up with benh's get_tbfreq on 32 bit ppc. (ppc64 is already > > 64 bit) > > I'm not sure having hp_timer_t be 64 bits on 32 bits libraries is the > right thing to do. I suppose it's a matter of performances tradeoff. The > fact that the vDSO __kernel_get_tbfreq() returns a long long doesn't > force you to use a long long hp_timer_t, the vDSO doesn't have to return > a value of type hp_timer_t :) > > I did that on the vDSO to avoid having to change the vDSO interface if > we ever have to deal with >4Ghz time sources. 
>

A 64-bit timebase is required to use hp-timing for
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ) and
clock_gettime(CLOCK_THREAD_CPUTIME_ID, ). So 64-bit it is.

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center

From bunk at stusta.de Sat Nov 12 11:33:39 2005
From: bunk at stusta.de (Adrian Bunk)
Date: Sat, 12 Nov 2005 01:33:39 +0100
Subject: [2.6 patch] add -Werror-implicit-function-declaration to CFLAGS
In-Reply-To: <20051111201849.GP5376@stusta.de>
References: <20051107200336.GH3847@stusta.de> <20051110042857.38b4635b.akpm@osdl.org> <20051111021258.GK5376@stusta.de> <20051110182443.514622ed.akpm@osdl.org> <20051111201849.GP5376@stusta.de>
Message-ID: <20051112003339.GT5376@stusta.de>

On Fri, Nov 11, 2005 at 09:18:49PM +0100, Adrian Bunk wrote:
>...
> This might give some motivation for people to convert drivers and would
> avoid new code like the recently introduced kexec to use this obsolete
> API.
>...

/me sits too long in front of the computer

s/kexec//

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

From paulus at samba.org Sat Nov 12 11:35:42 2005
From: paulus at samba.org (Paul Mackerras)
Date: Sat, 12 Nov 2005 11:35:42 +1100
Subject: [PATCH] Make 64-bit PowerPC use ARCH=powerpc by default
Message-ID: <17269.14558.921730.210025@cargo.ozlabs.ibm.com>

Since 64-bit PowerPC compiles and works with ARCH=powerpc for all
supported platforms (with the possible exception of Cell, for which
the support is still pretty immature), make ARCH=powerpc the default
for 64-bit PowerPC machines.

ppc64 is dead. Long live 64-bit powerpc!
Signed-off-by: Paul Mackerras --- diff -urN powerpc-merge/Makefile merge-hack/Makefile --- powerpc-merge/Makefile 2005-11-11 21:16:18.000000000 +1100 +++ merge-hack/Makefile 2005-11-12 11:25:32.000000000 +1100 @@ -168,7 +168,8 @@ SUBARCH := $(shell uname -m | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \ -e s/arm.*/arm/ -e s/sa110/arm/ \ - -e s/s390x/s390/ -e s/parisc64/parisc/ ) + -e s/s390x/s390/ -e s/parisc64/parisc/ \ + -e s/ppc64/powerpc/ ) # Cross compiling and selecting different set of gcc/bin-utils # --------------------------------------------------------------------------- From paulus at samba.org Sat Nov 12 11:39:02 2005 From: paulus at samba.org (Paul Mackerras) Date: Sat, 12 Nov 2005 11:39:02 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: References: <17268.35499.909993.63334@cargo.ozlabs.ibm.com> <1131743739.24637.237.camel@gaston> <1131744750.24637.245.camel@gaston> Message-ID: <17269.14758.346094.697174@cargo.ozlabs.ibm.com> Linus Torvalds writes: > But I was going to release a -rc1 today (unless Andrew), and I do want > things to just "work". So for me the question there is whether I just do > the one-liner Makefile thing to force "ppc64" -> "powerpc", or whether you > can get me a quick patch that gets ppc64 going again. Just sent you the one-liner patch to make ppc64 machines default to ARCH=powerpc. > I don't care terribly which way it goes, and if ppc64 ends up disabled in > -rc1 and dying off entirely during -rc2, that's ok by me. There are still a few files left in arch/ppc64 and include/asm-ppc64. Hopefully we can get them moved over by -rc2. > So kill it off now, or kill it off after 2.6.15, that's what the choice > boils down to. Let's kill it now. Paul. 
From sam at ravnborg.org Sat Nov 12 10:34:23 2005
From: sam at ravnborg.org (Sam Ravnborg)
Date: Sat, 12 Nov 2005 00:34:23 +0100
Subject: [2.6 patch] add -Werror-implicit-function-declaration to CFLAGS
In-Reply-To: <20051111132443.04061d10.akpm@osdl.org>
References: <20051107200336.GH3847@stusta.de> <20051110042857.38b4635b.akpm@osdl.org> <20051111021258.GK5376@stusta.de> <20051110182443.514622ed.akpm@osdl.org> <20051111201849.GP5376@stusta.de> <20051111132443.04061d10.akpm@osdl.org>
Message-ID: <20051111233423.GB28276@mars.ravnborg.org>

On Fri, Nov 11, 2005 at 01:24:43PM -0800, Andrew Morton wrote:
> Adrian Bunk wrote:
> >
> > > > >
> > > > > Sorry, I need to build allmodconfig kernels on wacky architectures (eg
> > > > > ppc64) and this patch is killing me.
> > > >
> > > > Can you send me the list of compile errors so that I can work on fixing
> > > > them?
> > > >
> > >
> > > Not handily, sorry. Missing virt_to_bus() is the typical problem.
> > >
> >
> > But in this case -Werror-implicit-function-declaration doesn't create
> > new compile errors, it only moves compile errors from compile time to
> > link or depmod time - which is IMHO not a bad change.
>
> It is a quite inconvenient change if you want to get full coverage with
> `make allmodconfig'.

Whether it is enabled could be a Kconfig item.
Then you could use the new mechanism in kconfig to disable it for your
allmodconfig builds.

cat allmod.config
CONFIG_CC_ERROR_IMPLICIT_FUNCTION_DECLARATION = 0

That should do the trick, but maybe it is too inconvenient?

Sam

From nicolas-home at the-clickman.org Sun Nov 13 06:05:06 2005
From: nicolas-home at the-clickman.org (Nicolas DENIS)
Date: Sat, 12 Nov 2005 20:05:06 +0100
Subject: Can't boot any linux on my server
Message-ID: <43763CE2.6020707@the-clickman.org>

Hello 64bit people!

It was suggested on the #ppc64 channel that I write here, so that is what I am doing.

I want to install a Gentoo PPC64 Linux system on an old server I got.
This is an old CHRP system, IBM Like hardware manufactured by BULL This is a 2-way RS64-III server known as "BULL Escala T450" 2 GB memory Open Firmware (ok prompt) Try to boot Gentoo CD 2005.1 (try all kernels) ok prompt command to boot : boot cdrom:\ppc\chrp\yaboot Kernel never boot, see the trace I have : Please wait, loading kernel... Elf64 kernel loaded... Loading ramdisk... ramdisk loaded at 01d00000, size: 699 Kbytes can't set args OF stdout device is: /pci at fe0f0020/isa at 1/serial at i3f8 couldn't open /packages/elf-loader command line: memory layout at init: memory_limit : 0000000000000000 (16 MB aligned) alloc_bottom : 0000000001daf000 alloc_top : 0000000040000000 alloc_top_hi : 0000000130000000 rmo_top : 0000000040000000 ram_top : 0000000130000000 Looking for displays found display : /pci at fe0f0020/display at 6, opening ... done opening PHB /pci at fe0f0020/pci at 5... done instantiating rtas at 0x000000003fffa000 ... done 0000000000000000 : boot cpu 0000000000000000 0000000000000001 : starting cpu hw idx 0000000000000001... start-cpu call big-endian for CPU # 1 of module # 0 failed: ffffffffffdc8328 copying OF device tree ... Building dt strings... Building dt structure... (no more; stop here) Request help. AIX System still working on the server for now. Can give any informations you can request Thanks. Nicolas "Clickman" More info you can use : Server info from Open Firmware : ibm,model-class F5 name BULL,Escala T450 model BULL,Escala T450 platform-open-pic 00000000 42000000 ibm,vpd 00 00 00 df 82 08 00 46 50 4c 20 20 20 20 20 90 ibm,loc-code U1.0-P1U1.0-P1-X1U1.0-V1U1.0-P1-X2U1.0-P1-C0U1.0-P1-C1U1.0-P1-M0U1.0-P1-M1U1.0-P1-M2U1.0-P1-M3U1.0-P1-X1U1. 0-P1-X1U1.0-P1-X1U1.0-P1-X1U1.0-P1-X2 system-id XAN F01 4C 001045 clock-frequency 05f5e100 #size-cells 00000002 #address-cells 00000002 device_type chrp ibm,aix-diagnostics Firmware file is here (date is Dec 04, 2000, this is the lastest, really!!) 
: http://www-opensup.bull.com/firm/pega/FW_F0.05.07.bin Readme file for Firmware : http://www-opensup.bull.com/firm/pega/FW_F0.05.07.txt Device tree from OpenFirmware : /BULL,module at fe0f0020 /BULL,local-nvram at 7fea82a6 /pci at fe0f0090 /event-sources /rtas /rom at fff00000 /nvram at ffdf9cc0 /pci at fe0f0020 /os /cpus /mmu /memory at 0 /aliases /options /openprom /chosen /packages /pci at fe0f0090/ethernet at 8 /event-sources/internal-errors /event-sources/epow-events /pci at fe0f0020/display at 6 /pci at fe0f0020/pci at 5 /pci at fe0f0020/interrupt-controller at 4 /pci at fe0f0020/scsi at 3 /pci at fe0f0020/scsi at 2 /pci at fe0f0020/isa at 1 /pci at fe0f0020/scsi at 3/tape /pci at fe0f0020/scsi at 3/disk /pci at fe0f0020/scsi at 2/tape /pci at fe0f0020/scsi at 2/disk /pci at fe0f0020/isa at 1/8042 at i60 /pci at fe0f0020/isa at 1/fdc at i3f0 /pci at fe0f0020/isa at 1/parallel at i378 /pci at fe0f0020/isa at 1/reserved at i62 /pci at fe0f0020/isa at 1/serial at i3e8 /pci at fe0f0020/isa at 1/serial at i2f8 /pci at fe0f0020/isa at 1/serial at i3f8 /pci at fe0f0020/isa at 1/nvram at i74 /pci at fe0f0020/isa at 1/nvram at mdc0000 /pci at fe0f0020/isa at 1/rtc at i70 /pci at fe0f0020/isa at 1/BULL,power-status at i800 /pci at fe0f0020/isa at 1/BULL,bump at i801 /pci at fe0f0020/isa at 1/timer at i40 /pci at fe0f0020/isa at 1/interrupt-controller at i20 /pci at fe0f0020/isa at 1/dma-controller at i00 /pci at fe0f0020/isa at 1/8042 at i60/mouse at 1 /pci at fe0f0020/isa at 1/8042 at i60/keyboard at 0 /os/aix /cpus/PowerPC,RS64-III at 1 /cpus/PowerPC,RS64-III at 0 /cpus/PowerPC,RS64-III at 1/l2-cache /cpus/PowerPC,RS64-III at 0/l2-cache /memory at 0/IBM,memory-module at 1f /memory at 0/IBM,memory-module at 1b /memory at 0/IBM,memory-module at 17 /memory at 0/IBM,memory-module at f /memory at 0/IBM,memory-module at 1e /memory at 0/IBM,memory-module at 1a /memory at 0/IBM,memory-module at 16 /memory at 0/IBM,memory-module at e /memory at 0/IBM,memory-module at 1d 
/memory at 0/IBM,memory-module at 19 /memory at 0/IBM,memory-module at 15 /memory at 0/IBM,memory-module at d /memory at 0/IBM,memory-module at 1c /memory at 0/IBM,memory-module at 18 /memory at 0/IBM,memory-module at 14 /memory at 0/IBM,memory-module at c /openprom/client-services /packages/tape-label /packages/BULL,mirror /packages/BULL,local-nvram /packages/isa /packages/pci /packages/disk-label /packages/iso9660-file-system /packages/fat-file-system /packages/obp-tftp /packages/deblocker /packages/stringio /packages/terminal-emulator Aliases set on OpenFirmware : screen /pci at fe0f0020/display at 6 keyboard /pci at fe0f0020/isa at 1/8042/keyboard fdc /pci at fe0f0020/isa at 1/fdc floppy /pci at fe0f0020/isa at 1/fdc mirror /pci at fe0f0020/isa at 1/BULL,mirror com3 /pci at fe0f0020/isa at 1/serial at i3e8 com2 /pci at fe0f0020/isa at 1/serial at i2f8 com1 /pci at fe0f0020/isa at 1/serial at i3f8 cdromdev /pci at fe0f0020/scsi at 3/disk at 6 cdrom /pci at fe0f0020/scsi at 3/disk at 6:\ppc\BULL_chrp\bootfile.exe tape /pci at fe0f0020/scsi at 3/tape at 5:fixed aixtape /pci at fe0f0020/scsi at 3/tape at 5:fixed mouse /pci at fe0f0020/isa/8042/mouse aixdisk /pci at fe0f0020/scsi at 2/disk at 0,0 net /pci at fe0f0090/ethernet at 8 scsi1 /pci at fe0f0020/scsi at 2 scsi /pci at fe0f0020/scsi at 3 diskr /pci at fe0f0090/DPT,scsis at b/scsi at 1/disk at 0,0 disk /pci at fe0f0020/scsi at 2/disk at 0,0 aix /os/aix SCSI Chain : /pci at fe0f0020/scsi at 3 Target 0 Unit 0 Disk SEAGATE ST39103LC 3127 Target 1 Unit 0 Disk SEAGATE ST318203LC 3256 Target 2 Unit 0 Disk SEAGATE ST318203LC 3256 Target 5 Unit 0 Removable Tape SONY SDT-9000 0600 Target 6 Unit 0 Removable Read Only device Toshiba CD-ROM XM-6201TA1D08 Target 9 Unit 0 Disk SEAGATE ST318406LC 010A Target a Unit 0 Disk SEAGATE ST318406LC 010A /pci at fe0f0020/scsi at 2 Target 0 Unit 0 Disk SEAGATE ST39204LC 0006 Target 1 Unit 0 Disk SEAGATE ST39173WC 6244 Target 2 Unit 0 Disk SEAGATE ST32171WC 0852 Target 8 Unit 0 Disk 
SEAGATE ST318406LC 010A
Target 9 Unit 0 Disk SEAGATE ST318203LC 0500
Target a Unit 0 Disk SEAGATE ST318405LC 0105

From benh at kernel.crashing.org Sun Nov 13 11:27:39 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 13 Nov 2005 11:27:39 +1100
Subject: [PATCH] powerpc: vdso fixes
Message-ID: <1131841660.5504.5.camel@gaston>

This fixes various errors in the new functions added to the vDSOs.
I've been able to test the 32-bit version, and I get results consistent
with the corresponding syscalls.

There is still a question about get_tbfreq() though. It currently
returns the value that the kernel keeps in tb_ticks_per_sec. This value
is obtained from the timebase at boot, but it's truncated to HZ
precision:

	tb_ticks_per_jiffy = ppc_tb_freq / HZ;
	tb_ticks_per_sec = tb_ticks_per_jiffy * HZ;

It is also modified later on by the ppc_adjtimex() code. This is
different from the value exposed in /proc/cpuinfo for the timebase,
which is a straight copy of ppc_tb_freq, the unmodified calibration
value obtained at boot.

The questions at this point are: Is that "rounding" to HZ done by the
kernel correct? And should the vDSO return this value that gets
adjusted, the fixed initial calibration value, or both? In the latter
case, should I add an argument to get_tbfreq() or add a separate
function?

In the meantime, please apply this patch, as it fixes a few annoying
bugs in the vDSO implementation currently in -rc1.
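
[Editorial sketch, not part of the original message.] The truncation to HZ precision described above can be illustrated in a few lines of C. The function names `hz_rounded_tb_freq` and `hz_rounding_loss`, and the sample values used in the note below the code, are assumptions made for illustration; only the two-step integer computation is taken from the message.

```c
#include <stdint.h>

/*
 * Mirrors the two-step computation quoted in the message:
 *   tb_ticks_per_jiffy = ppc_tb_freq / HZ;
 *   tb_ticks_per_sec   = tb_ticks_per_jiffy * HZ;
 * The integer division drops the remainder, so the result is the
 * largest multiple of hz that does not exceed tb_freq.
 * (hz_rounded_tb_freq is a hypothetical name, not a kernel function.)
 */
static uint64_t hz_rounded_tb_freq(uint64_t tb_freq, unsigned int hz)
{
    uint64_t ticks_per_jiffy = tb_freq / hz;   /* truncating division */
    return ticks_per_jiffy * hz;
}

/* Ticks per second lost to the truncation; equal to tb_freq % hz. */
static uint64_t hz_rounding_loss(uint64_t tb_freq, unsigned int hz)
{
    return tb_freq - hz_rounded_tb_freq(tb_freq, hz);
}
```

As an assumed example, a timebase of 33333333 ticks per second with HZ = 250 gives 33333250, which is 83 ticks per second lower than the calibration value. A frequency that is an exact multiple of HZ is left unchanged.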
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/powerpc/kernel/asm-offsets.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/asm-offsets.c 2005-11-13 10:34:07.000000000 +1100 +++ linux-work/arch/powerpc/kernel/asm-offsets.c 2005-11-13 10:34:50.000000000 +1100 @@ -275,8 +275,8 @@ #else DEFINE(TVAL32_TV_SEC, offsetof(struct timeval, tv_sec)); DEFINE(TVAL32_TV_USEC, offsetof(struct timeval, tv_usec)); - DEFINE(TSPEC32_TV_SEC, offsetof(struct timespec, tv_sec)); - DEFINE(TSPEC32_TV_NSEC, offsetof(struct timespec, tv_nsec)); + DEFINE(TSPC32_TV_SEC, offsetof(struct timespec, tv_sec)); + DEFINE(TSPC32_TV_NSEC, offsetof(struct timespec, tv_nsec)); #endif /* timeval/timezone offsets for use by vdso */ DEFINE(TZONE_TZ_MINWEST, offsetof(struct timezone, tz_minuteswest)); Index: linux-work/arch/powerpc/kernel/vdso32/gettimeofday.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso32/gettimeofday.S 2005-11-13 10:34:07.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso32/gettimeofday.S 2005-11-13 10:55:55.000000000 +1100 @@ -83,7 +83,7 @@ /* Check for supported clock IDs */ cmpli cr0,r3,CLOCK_REALTIME cmpli cr1,r3,CLOCK_MONOTONIC - cror cr0,cr0,cr1 + cror cr0*4+eq,cr0*4+eq,cr1*4+eq bne cr0,99f mflr r12 /* r12 saves lr */ @@ -91,7 +91,7 @@ mr r10,r3 /* r10 saves id */ mr r11,r4 /* r11 saves tp */ bl __get_datapage at local /* get data page */ - mr r9, r3 /* datapage ptr in r9 */ + mr r9,r3 /* datapage ptr in r9 */ beq cr1,50f /* if monotonic -> jump there */ /* @@ -210,7 +210,7 @@ /* Check for supported clock IDs */ cmpwi cr0,r3,CLOCK_REALTIME cmpwi cr1,r3,CLOCK_MONOTONIC - cror cr0,cr0,cr1 + cror cr0*4+eq,cr0*4+eq,cr1*4+eq bne cr0,99f li r3,0 Index: linux-work/arch/powerpc/kernel/vdso64/gettimeofday.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso64/gettimeofday.S 2005-11-13 
10:34:07.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso64/gettimeofday.S 2005-11-13 10:56:08.000000000 +1100 @@ -68,7 +68,7 @@ /* Check for supported clock IDs */ cmpwi cr0,r3,CLOCK_REALTIME cmpwi cr1,r3,CLOCK_MONOTONIC - cror cr0,cr0,cr1 + cror cr0*4+eq,cr0*4+eq,cr1*4+eq bne cr0,99f mflr r12 /* r12 saves lr */ @@ -181,7 +181,7 @@ /* Check for supported clock IDs */ cmpwi cr0,r3,CLOCK_REALTIME cmpwi cr1,r3,CLOCK_MONOTONIC - cror cr0,cr0,cr1 + cror cr0*4+eq,cr0*4+eq,cr1*4+eq bne cr0,99f li r3,0 Index: linux-work/arch/powerpc/kernel/vdso32/datapage.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso32/datapage.S 2005-11-12 08:27:18.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso32/datapage.S 2005-11-13 11:06:39.000000000 +1100 @@ -77,8 +77,9 @@ mflr r12 .cfi_register lr,r12 bl __get_datapage at local - lwz r3,CFG_TB_TICKS_PER_SEC(r3) lwz r4,(CFG_TB_TICKS_PER_SEC + 4)(r3) + lwz r3,CFG_TB_TICKS_PER_SEC(r3) mtlr r12 + blr .cfi_endproc V_FUNCTION_END(__kernel_get_tbfreq) Index: linux-work/arch/powerpc/kernel/vdso64/datapage.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso64/datapage.S 2005-11-12 08:27:19.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso64/datapage.S 2005-11-13 10:58:45.000000000 +1100 @@ -80,5 +80,6 @@ bl V_LOCAL_FUNC(__get_datapage) ld r3,CFG_TB_TICKS_PER_SEC(r3) mtlr r12 + blr .cfi_endproc V_FUNCTION_END(__kernel_get_tbfreq) From gb at clozure.com Sun Nov 13 19:06:45 2005 From: gb at clozure.com (Gary Byers) Date: Sun, 13 Nov 2005 01:06:45 -0700 (MST) Subject: FPSCR and 64-bit signal handlers Message-ID: <20051113001733.H53336@clozure.com> Hi. In the 2.6.14 kernel.org tree, the function setup_sigcontext() (in .../arch/ppc64/signal.c) contains (inter alia): flush_fp_to_thread(current); /* Make sure signal doesn't get spurrious FP exceptions */ current->thread.fpscr = 0; [...] 
err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
[...]

There seem to be a couple of unfortunate consequences of these things
happening in this order:

1) The FPSCR image in current->thread.fpscr is zeroed before the thread's
FP context is copied out to the userspace sigcontext; this means that
the handler will run with the FPSCR set to 0 (which is arguably good),
but it also means that the handler does not have access to the FPSCR
value at the time of the exception. This also means that - unless all
signal handlers are aware of this issue and somehow work around it -
a thread's FPSCR value may be changed from non-zero to zero on return
from a handler from any signal handler whose context was established
by this function.

(I'd think that changing the order in which the copyout and the zeroing
of the FPSCR occur would fix this.)

2) The assembly-language function flush_fp_to_thread() has a
side-effect on the f0/fr0 register (it's used to access the value of
the FPSCR and store it in the thread context after it and the other 31
fp registers have been saved there.) There seem to be execution paths
in which flush_fp_to_thread() is called at least once between the time
that an exception is detected and the call above in setup_sigcontext()
(it's called in parse_fpe(), in .../arch/ppc64/traps.c); on any call
other than the first, the value of f0/fr0 saved in the current thread
context will be incorrect (it'll contain some of the bits that were in
the FPSCR on the most recent previous call.) This means both that
signal handlers that care about the value of f0/fr0 will get incorrect
information and also that the value of f0/fr0 can change unexpectedly
on return from any signal handler whose context was established by
this function if there was more than one call to flush_fp_to_thread()
on the execution path.

(I'd guess that the best fix would be to have flush_fp_to_thread()
be a little more careful about preserving the value of f0/fr0, perhaps
by reloading it after it'd been used to access and save the FPSCR.)

I don't know how many programs are affected by these problems, but
as I understand them these are both serious problems.

Gary Byers
gb at clozure.com
http://www.clozure.com

From benh at kernel.crashing.org Mon Nov 14 10:23:59 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 14 Nov 2005 10:23:59 +1100
Subject: FPSCR and 64-bit signal handlers
In-Reply-To: <20051113001733.H53336@clozure.com>
References: <20051113001733.H53336@clozure.com>
Message-ID: <1131924239.5504.89.camel@gaston>

> 1) The FPSCR image in current->thread.fpscr is zeroed before the thread's
> FP context is copied out to the userspace sigcontext; this means that
> the handler will run with the FPSCR set to 0 (which is arguably good),
> but it also means that the handler does not have access to the FPSCR
> value at the time of the exception. This also means that - unless all
> signal handlers are aware of this issue and somehow work around it -
> a thread's FPSCR value may be changed from non-zero to zero on return
> from a handler from any signal handler whose context was established
> by this function.

Yes, there does indeed seem to be a bug; fpscr should be cleared after we
copy the FP registers.

> (I'd think that changing the order in which the copyout and the zeroing
> of the FPSCR occur would fix this.)

Yes.

> 2) The assembly-language function flush_fp_to_thread() has a
> side-effect on the f0/fr0 register (it's used to access the value of
> the FPSCR and store it in the thread context after it and the other 31
> fp registers have been saved there.)
There seem to be execution paths
> in which flush_fp_to_thread() is called at least once between the time
> that an exception is detected and the call above in setup_sigcontext()
> (it's called in parse_fpe(), in .../arch/ppc64/traps.c); on any call
> other than the first, the value of f0/fr0 saved in the current thread
> context will be incorrect (it'll contain some of the bits that were in
> the FPSCR on the most recent previous call.) This means both that
> signal handlers that care about the value of f0/fr0 will get incorrect
> information and also that the value of f0/fr0 can change unexpectedly
> on return from any signal handler whose context was established by
> this function if there was more than one call to flush_fp_to_thread()
> on the execution path.

Have you seen any occurrence of the above? flush_fp_to_thread() called
more than once should do nothing since the first call should clear the
MSR_FP bit of the MSR copy in the thread structure, causing further
calls to do nothing.

> (I'd guess that the best fix would be to have flush_fp_to_thread()
> be a little more careful about preserving the value of f0/fr0, perhaps
> by reloading it after it'd been used to access and save the FPSCR.)
>
> I don't know how many programs are affected by these problems, but
> as I understand them these are both serious problems.

Ben.

From benh at kernel.crashing.org Mon Nov 14 11:02:20 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 14 Nov 2005 11:02:20 +1100
Subject: [PATCH] powerpc: Always rebuild arch/powerpc/include/asm symlink
Message-ID: <1131926541.5504.94.camel@gaston>

This patch uses a FORCE dependency on the arch/powerpc/include/asm
symlink so that it always gets rebuilt, thus avoiding all sorts of funny
errors if the .config is changed between 32 and 64 bits.
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/powerpc/Makefile =================================================================== --- linux-work.orig/arch/powerpc/Makefile 2005-11-14 11:02:20.000000000 +1100 +++ linux-work/arch/powerpc/Makefile 2005-11-14 11:03:03.000000000 +1100 @@ -187,7 +187,7 @@ # Temporary hack until we have migrated to asm-powerpc include/asm: arch/$(ARCH)/include/asm -arch/$(ARCH)/include/asm: +arch/$(ARCH)/include/asm: FORCE $(Q)if [ ! -d arch/$(ARCH)/include ]; then mkdir -p arch/$(ARCH)/include; fi $(Q)ln -fsn $(srctree)/include/asm-$(OLDARCH) arch/$(ARCH)/include/asm From gb at clozure.com Mon Nov 14 12:01:41 2005 From: gb at clozure.com (Gary Byers) Date: Sun, 13 Nov 2005 18:01:41 -0700 (MST) Subject: FPSCR and 64-bit signal handlers In-Reply-To: <1131924239.5504.89.camel@gaston> References: <20051113001733.H53336@clozure.com> <1131924239.5504.89.camel@gaston> Message-ID: <20051113170133.F10034@clozure.com> On Mon, 14 Nov 2005, Benjamin Herrenschmidt wrote: > >> 2) The assembly-language function flush_fp_to_thread() has a >> side-effect on the f0/fr0 register (it's used to access the value of >> the FPSCR and store it in the thread context after it and the other 31 >> fp registers have been saved there.) There seem to be execution paths >> in which flush_fp_to_thread() is called at least once between the time >> that an exception is detected and the call above in setup_sigcontext() >> (it's called in parse_fpe(), in .../arch/ppc64/traps.c); on any call >> other than the first, the value of f0/fr0 saved in the current thread >> context will be incorrect (it'll contain some of the bits that were in >> the FPSCR on the most recent previous call.) 
This means both that
>> signal handlers that care about the value of f0/fr0 will get incorrect
>> information and also that the value of f0/fr0 can change unexpectedly
>> on return from any signal handler whose context was established by
>> this function if there was more than one call to flush_fp_to_thread()
>> on the execution path.
>
> Have you seen any occurrence of the above? flush_fp_to_thread() called
> more than once should do nothing since the first call should clear the
> MSR_FP bit of the MSR copy in the thread structure, causing further
> calls to do nothing.

I must have been confused about this; I can't reproduce any case where
f0 gets clobbered as a result of calling flush_fp_to_thread() more than
once.

>
> Ben.
>
>
>

From benh at kernel.crashing.org Mon Nov 14 14:55:58 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 14 Nov 2005 14:55:58 +1100
Subject: [PATCH] powerpc: vdso fixes (take #2)
Message-ID: <1131940559.5504.118.camel@gaston>

This fixes various errors in the new functions added to the vDSOs.
I've now verified all functions on both the 32-bit and 64-bit vDSOs.

It also fixes a sign-extension bug in getting the initial time of day at
boot that could cause the monotonic clock value to be completely bogus
for 64-bit applications (with either the vDSO or the syscall) on
powermacs.
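
[Editorial sketch, not part of the original message.] The sign-extension problem mentioned above can be modelled in plain C. This is a sketch under stated assumptions, not the kernel code: the function names and the constant `RTC_OFFSET_SKETCH` are hypothetical, the offset value (seconds from the 1904 Mac epoch to 1970) is assumed, and explicit casts are used to model the widening so the example does not depend on shifting into the sign bit of an `int`.

```c
#include <stdint.h>

/* Seconds between the 1904 Mac epoch and the 1970 Unix epoch.
 * Assumed value for illustration; the patch only refers to RTC_OFFSET. */
#define RTC_OFFSET_SKETCH 2082844800ULL

/* Assemble four big-endian reply bytes into an unsigned 32-bit value. */
static uint32_t rtc_raw(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Models the old behaviour: the 32-bit sum is treated as a signed int
 * and then widened to 64 bits, so bit 31 is copied into the upper half.
 * The uint32_t -> int32_t conversion is implementation-defined and wraps
 * on common compilers such as gcc. */
static uint64_t rtc_time_sign_extended(const unsigned char b[4])
{
    int32_t as_signed = (int32_t)rtc_raw(b);
    uint64_t now = (uint64_t)(int64_t)as_signed;   /* sign extension */
    return now - RTC_OFFSET_SKETCH;
}

/* Models the fix: keep the value in an unsigned 32-bit variable and
 * zero-extend it before subtracting the offset. */
static uint64_t rtc_time_zero_extended(const unsigned char b[4])
{
    uint32_t now = rtc_raw(b);
    return (uint64_t)now - RTC_OFFSET_SKETCH;
}
```

With the assumed reply bytes `BF 9D BB 4F` (bit 31 set), the zero-extended version yields 1131940559, while the sign-extended version is larger by 0xFFFFFFFF00000000. On a 32-bit kernel the two paths would agree, which is consistent with the message saying only 64-bit applications were affected.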
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/powerpc/kernel/asm-offsets.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/asm-offsets.c 2005-11-14 11:06:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/asm-offsets.c 2005-11-14 13:09:57.000000000 +1100 @@ -270,13 +270,15 @@ DEFINE(TVAL64_TV_USEC, offsetof(struct timeval, tv_usec)); DEFINE(TVAL32_TV_SEC, offsetof(struct compat_timeval, tv_sec)); DEFINE(TVAL32_TV_USEC, offsetof(struct compat_timeval, tv_usec)); + DEFINE(TSPC64_TV_SEC, offsetof(struct timespec, tv_sec)); + DEFINE(TSPC64_TV_NSEC, offsetof(struct timespec, tv_nsec)); DEFINE(TSPC32_TV_SEC, offsetof(struct compat_timespec, tv_sec)); DEFINE(TSPC32_TV_NSEC, offsetof(struct compat_timespec, tv_nsec)); #else DEFINE(TVAL32_TV_SEC, offsetof(struct timeval, tv_sec)); DEFINE(TVAL32_TV_USEC, offsetof(struct timeval, tv_usec)); - DEFINE(TSPEC32_TV_SEC, offsetof(struct timespec, tv_sec)); - DEFINE(TSPEC32_TV_NSEC, offsetof(struct timespec, tv_nsec)); + DEFINE(TSPC32_TV_SEC, offsetof(struct timespec, tv_sec)); + DEFINE(TSPC32_TV_NSEC, offsetof(struct timespec, tv_nsec)); #endif /* timeval/timezone offsets for use by vdso */ DEFINE(TZONE_TZ_MINWEST, offsetof(struct timezone, tz_minuteswest)); Index: linux-work/arch/powerpc/kernel/vdso32/gettimeofday.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso32/gettimeofday.S 2005-11-14 11:06:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso32/gettimeofday.S 2005-11-14 12:06:27.000000000 +1100 @@ -83,7 +83,7 @@ /* Check for supported clock IDs */ cmpli cr0,r3,CLOCK_REALTIME cmpli cr1,r3,CLOCK_MONOTONIC - cror cr0,cr0,cr1 + cror cr0*4+eq,cr0*4+eq,cr1*4+eq bne cr0,99f mflr r12 /* r12 saves lr */ @@ -91,7 +91,7 @@ mr r10,r3 /* r10 saves id */ mr r11,r4 /* r11 saves tp */ bl __get_datapage at local /* get data page */ - mr r9, r3 /* datapage ptr in r9 */ + mr r9,r3 /* datapage ptr 
in r9 */ beq cr1,50f /* if monotonic -> jump there */ /* @@ -173,10 +173,14 @@ add r4,r4,r7 lis r5,NSEC_PER_SEC at h ori r5,r5,NSEC_PER_SEC at l - cmpli cr0,r4,r5 + cmpl cr0,r4,r5 + cmpli cr1,r4,0 blt 1f subf r4,r5,r4 addi r3,r3,1 +1: bge cr1,1f + addi r3,r3,-1 + add r4,r4,r5 1: stw r3,TSPC32_TV_SEC(r11) stw r4,TSPC32_TV_NSEC(r11) @@ -210,7 +214,7 @@ /* Check for supported clock IDs */ cmpwi cr0,r3,CLOCK_REALTIME cmpwi cr1,r3,CLOCK_MONOTONIC - cror cr0,cr0,cr1 + cror cr0*4+eq,cr0*4+eq,cr1*4+eq bne cr0,99f li r3,0 Index: linux-work/arch/powerpc/kernel/vdso64/gettimeofday.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso64/gettimeofday.S 2005-11-14 11:06:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso64/gettimeofday.S 2005-11-14 14:38:51.000000000 +1100 @@ -1,4 +1,5 @@ -/* + + /* * Userland implementation of gettimeofday() for 64 bits processes in a * ppc64 kernel for use in the vDSO * @@ -68,7 +69,7 @@ /* Check for supported clock IDs */ cmpwi cr0,r3,CLOCK_REALTIME cmpwi cr1,r3,CLOCK_MONOTONIC - cror cr0,cr0,cr1 + cror cr0*4+eq,cr0*4+eq,cr1*4+eq bne cr0,99f mflr r12 /* r12 saves lr */ @@ -84,16 +85,17 @@ bl V_LOCAL_FUNC(__do_get_xsec) /* get xsec from tb & kernel */ - lis r7,0x3b9a /* r7 = 1000000000 = NSEC_PER_SEC */ - ori r7,r7,0xca00 + lis r7,15 /* r7 = 1000000 = USEC_PER_SEC */ + ori r7,r7,16960 rldicl r5,r4,44,20 /* r5 = sec = xsec / XSEC_PER_SEC */ rldicr r6,r5,20,43 /* r6 = sec * XSEC_PER_SEC */ std r5,TSPC64_TV_SEC(r11) /* store sec in tv */ subf r0,r6,r4 /* r0 = xsec = (xsec - r6) */ - mulld r0,r0,r7 /* nsec = (xsec * NSEC_PER_SEC) / + mulld r0,r0,r7 /* usec = (xsec * USEC_PER_SEC) / * XSEC_PER_SEC */ rldicl r0,r0,44,20 + mulli r0,r0,1000 /* nsec = usec * 1000 */ std r0,TSPC64_TV_NSEC(r11) /* store nsec in tp */ mtlr r12 @@ -106,15 +108,16 @@ 50: bl V_LOCAL_FUNC(__do_get_xsec) /* get xsec from tb & kernel */ - lis r7,0x3b9a /* r7 = 1000000000 = NSEC_PER_SEC */ - ori r7,r7,0xca00 + 
lis r7,15 /* r7 = 1000000 = USEC_PER_SEC */ + ori r7,r7,16960 rldicl r5,r4,44,20 /* r5 = sec = xsec / XSEC_PER_SEC */ rldicr r6,r5,20,43 /* r6 = sec * XSEC_PER_SEC */ subf r0,r6,r4 /* r0 = xsec = (xsec - r6) */ - mulld r0,r0,r7 /* nsec = (xsec * NSEC_PER_SEC) / + mulld r0,r0,r7 /* usec = (xsec * USEC_PER_SEC) / * XSEC_PER_SEC */ rldicl r6,r0,44,20 + mulli r6,r6,1000 /* nsec = usec * 1000 */ /* now we must fixup using wall to monotonic. We need to snapshot * that value and do the counter trick again. Fortunately, we still @@ -123,8 +126,8 @@ * can be used */ - lwz r4,WTOM_CLOCK_SEC(r9) - lwz r7,WTOM_CLOCK_NSEC(r9) + lwa r4,WTOM_CLOCK_SEC(r3) + lwa r7,WTOM_CLOCK_NSEC(r3) /* We now have our result in r4,r7. We create a fake dependency * on that result and re-check the counter @@ -144,10 +147,14 @@ add r7,r7,r6 lis r9,NSEC_PER_SEC at h ori r9,r9,NSEC_PER_SEC at l - cmpli cr0,r7,r9 + cmpl cr0,r7,r9 + cmpli cr1,r7,0 blt 1f subf r7,r9,r7 addi r4,r4,1 +1: bge cr1,1f + addi r4,r4,-1 + add r7,r7,r9 1: std r4,TSPC64_TV_SEC(r11) std r7,TSPC64_TV_NSEC(r11) @@ -181,7 +188,7 @@ /* Check for supported clock IDs */ cmpwi cr0,r3,CLOCK_REALTIME cmpwi cr1,r3,CLOCK_MONOTONIC - cror cr0,cr0,cr1 + cror cr0*4+eq,cr0*4+eq,cr1*4+eq bne cr0,99f li r3,0 Index: linux-work/arch/powerpc/kernel/vdso32/datapage.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso32/datapage.S 2005-11-14 11:06:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso32/datapage.S 2005-11-14 11:07:11.000000000 +1100 @@ -77,8 +77,9 @@ mflr r12 .cfi_register lr,r12 bl __get_datapage at local - lwz r3,CFG_TB_TICKS_PER_SEC(r3) lwz r4,(CFG_TB_TICKS_PER_SEC + 4)(r3) + lwz r3,CFG_TB_TICKS_PER_SEC(r3) mtlr r12 + blr .cfi_endproc V_FUNCTION_END(__kernel_get_tbfreq) Index: linux-work/arch/powerpc/kernel/vdso64/datapage.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso64/datapage.S 2005-11-14 
11:06:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso64/datapage.S 2005-11-14 11:07:11.000000000 +1100 @@ -80,5 +80,6 @@ bl V_LOCAL_FUNC(__get_datapage) ld r3,CFG_TB_TICKS_PER_SEC(r3) mtlr r12 + blr .cfi_endproc V_FUNCTION_END(__kernel_get_tbfreq) Index: linux-work/include/asm-powerpc/vdso_datapage.h =================================================================== --- linux-work.orig/include/asm-powerpc/vdso_datapage.h 2005-11-14 10:42:00.000000000 +1100 +++ linux-work/include/asm-powerpc/vdso_datapage.h 2005-11-14 11:52:12.000000000 +1100 @@ -73,7 +73,7 @@ /* those additional ones don't have to be located anywhere * special as they were not part of the original systemcfg */ - __s64 wtom_clock_sec; /* Wall to monotonic clock */ + __s32 wtom_clock_sec; /* Wall to monotonic clock */ __s32 wtom_clock_nsec; __u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of syscalls */ __u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */ Index: linux-work/arch/powerpc/platforms/powermac/time.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/powermac/time.c 2005-11-01 14:13:53.000000000 +1100 +++ linux-work/arch/powerpc/platforms/powermac/time.c 2005-11-14 14:28:10.000000000 +1100 @@ -102,7 +102,7 @@ static unsigned long cuda_get_time(void) { struct adb_request req; - unsigned long now; + unsigned int now; if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0) return 0; @@ -113,7 +113,7 @@ req.reply_len); now = (req.reply[3] << 24) + (req.reply[4] << 16) + (req.reply[5] << 8) + req.reply[6]; - return now - RTC_OFFSET; + return ((unsigned long)now) - RTC_OFFSET; } #define cuda_get_rtc_time(tm) to_rtc_time(cuda_get_time(), (tm)) @@ -146,7 +146,7 @@ static unsigned long pmu_get_time(void) { struct adb_request req; - unsigned long now; + unsigned int now; if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0) return 0; @@ -156,7 +156,7 @@ req.reply_len); now = (req.reply[0] << 24) + (req.reply[1] << 
16) + (req.reply[2] << 8) + req.reply[3]; - return now - RTC_OFFSET; + return ((unsigned long)now) - RTC_OFFSET; } #define pmu_get_rtc_time(tm) to_rtc_time(pmu_get_time(), (tm)) From benh at kernel.crashing.org Mon Nov 14 15:49:48 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 14 Nov 2005 15:49:48 +1100 Subject: [PATCH] powerpc: kill ppc64 rtc.c, use genrtc instead Message-ID: <1131943789.5504.137.camel@gaston> This moves the RTAS RTC callbacks to rtas-rtc.c in arch/powerpc/kernel and kills the rest of arch/ppc64/kernel/rtc.c, which was just a duplicate of the genrtc functionality. Also enable the build of genrtc for CONFIG_PPC64 (it just works, as we already have the required callbacks) and enable it in all defconfigs. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/powerpc/configs/pseries_defconfig =================================================================== --- linux-work.orig/arch/powerpc/configs/pseries_defconfig 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/powerpc/configs/pseries_defconfig 2005-11-14 15:27:15.000000000 +1100 @@ -1,18 +1,33 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.14-rc4 -# Thu Oct 20 08:32:17 2005 +# Linux kernel version: 2.6.15-rc1 +# Mon Nov 14 15:27:00 2005 # +CONFIG_PPC64=y CONFIG_64BIT=y +CONFIG_PPC_MERGE=y CONFIG_MMU=y +CONFIG_GENERIC_HARDIRQS=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y -CONFIG_GENERIC_ISA_DMA=y +CONFIG_PPC=y CONFIG_EARLY_PRINTK=y CONFIG_COMPAT=y +CONFIG_SYSVIPC_COMPAT=y CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y -CONFIG_FORCE_MAX_ZONEORDER=13 + +# +# Processor support +# +# CONFIG_POWER4_ONLY is not set +CONFIG_POWER3=y +CONFIG_POWER4=y +CONFIG_PPC_FPU=y +CONFIG_ALTIVEC=y +CONFIG_PPC_STD_MMU=y +CONFIG_SMP=y +CONFIG_NR_CPUS=128 # # Code maturity level options @@ -68,75 +83,103 @@ CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_KMOD=y CONFIG_STOP_MACHINE=y -CONFIG_SYSVIPC_COMPAT=y + 
+# +# Block layer +# + +# +# IO Schedulers +# +CONFIG_IOSCHED_NOOP=y +CONFIG_IOSCHED_AS=y +CONFIG_IOSCHED_DEADLINE=y +CONFIG_IOSCHED_CFQ=y +CONFIG_DEFAULT_AS=y +# CONFIG_DEFAULT_DEADLINE is not set +# CONFIG_DEFAULT_CFQ is not set +# CONFIG_DEFAULT_NOOP is not set +CONFIG_DEFAULT_IOSCHED="anticipatory" # # Platform support # -# CONFIG_PPC_ISERIES is not set CONFIG_PPC_MULTIPLATFORM=y +# CONFIG_PPC_ISERIES is not set +# CONFIG_EMBEDDED6xx is not set +# CONFIG_APUS is not set CONFIG_PPC_PSERIES=y -# CONFIG_PPC_BPA is not set # CONFIG_PPC_PMAC is not set # CONFIG_PPC_MAPLE is not set -CONFIG_PPC=y -CONFIG_PPC64=y +# CONFIG_PPC_CELL is not set CONFIG_PPC_OF=y CONFIG_XICS=y +# CONFIG_U3_DART is not set CONFIG_MPIC=y -CONFIG_ALTIVEC=y -CONFIG_PPC_SPLPAR=y -CONFIG_KEXEC=y +CONFIG_PPC_RTAS=y +CONFIG_RTAS_ERROR_LOGGING=y +CONFIG_RTAS_PROC=y +CONFIG_RTAS_FLASH=m +# CONFIG_MMIO_NVRAM is not set CONFIG_IBMVIO=y -# CONFIG_U3_DART is not set -# CONFIG_BOOTX_TEXT is not set -# CONFIG_POWER4_ONLY is not set +# CONFIG_PPC_MPC106 is not set +# CONFIG_GENERIC_TBSYNC is not set +# CONFIG_CPU_FREQ is not set +# CONFIG_WANT_EARLY_SERIAL is not set + +# +# Kernel options +# +# CONFIG_HZ_100 is not set +CONFIG_HZ_250=y +# CONFIG_HZ_1000 is not set +CONFIG_HZ=250 +CONFIG_PREEMPT_NONE=y +# CONFIG_PREEMPT_VOLUNTARY is not set +# CONFIG_PREEMPT is not set +# CONFIG_PREEMPT_BKL is not set +CONFIG_BINFMT_ELF=y +# CONFIG_BINFMT_MISC is not set +CONFIG_FORCE_MAX_ZONEORDER=13 CONFIG_IOMMU_VMERGE=y -CONFIG_SMP=y -CONFIG_NR_CPUS=128 +CONFIG_HOTPLUG_CPU=y +CONFIG_KEXEC=y +# CONFIG_IRQ_ALL_CPUS is not set +CONFIG_PPC_SPLPAR=y +CONFIG_EEH=y +CONFIG_SCANLOG=m +CONFIG_LPARCFG=y +CONFIG_NUMA=y CONFIG_ARCH_SELECT_MEMORY_MODEL=y -CONFIG_ARCH_FLATMEM_ENABLE=y -CONFIG_ARCH_DISCONTIGMEM_ENABLE=y -CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y CONFIG_ARCH_SPARSEMEM_ENABLE=y +CONFIG_ARCH_SPARSEMEM_DEFAULT=y CONFIG_SELECT_MEMORY_MODEL=y # CONFIG_FLATMEM_MANUAL is not set -CONFIG_DISCONTIGMEM_MANUAL=y -# 
CONFIG_SPARSEMEM_MANUAL is not set -CONFIG_DISCONTIGMEM=y -CONFIG_FLAT_NODE_MEM_MAP=y +# CONFIG_DISCONTIGMEM_MANUAL is not set +CONFIG_SPARSEMEM_MANUAL=y +CONFIG_SPARSEMEM=y CONFIG_NEED_MULTIPLE_NODES=y +CONFIG_HAVE_MEMORY_PRESENT=y # CONFIG_SPARSEMEM_STATIC is not set +CONFIG_SPARSEMEM_EXTREME=y +# CONFIG_MEMORY_HOTPLUG is not set +CONFIG_SPLIT_PTLOCK_CPUS=4096 CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y CONFIG_NODES_SPAN_OTHER_NODES=y -CONFIG_NUMA=y +# CONFIG_PPC_64K_PAGES is not set CONFIG_SCHED_SMT=y -CONFIG_PREEMPT_NONE=y -# CONFIG_PREEMPT_VOLUNTARY is not set -# CONFIG_PREEMPT is not set -# CONFIG_PREEMPT_BKL is not set -# CONFIG_HZ_100 is not set -CONFIG_HZ_250=y -# CONFIG_HZ_1000 is not set -CONFIG_HZ=250 -CONFIG_EEH=y -CONFIG_GENERIC_HARDIRQS=y -CONFIG_PPC_RTAS=y -CONFIG_RTAS_PROC=y -CONFIG_RTAS_FLASH=m -CONFIG_SCANLOG=m -CONFIG_LPARCFG=y -CONFIG_SECCOMP=y -CONFIG_BINFMT_ELF=y -# CONFIG_BINFMT_MISC is not set -CONFIG_HOTPLUG_CPU=y CONFIG_PROC_DEVICETREE=y # CONFIG_CMDLINE_BOOL is not set +# CONFIG_PM is not set +CONFIG_SECCOMP=y CONFIG_ISA_DMA_API=y # -# Bus Options +# Bus options # +CONFIG_GENERIC_ISA_DMA=y +CONFIG_PPC_I8259=y +# CONFIG_PPC_INDIRECT_PCI is not set CONFIG_PCI=y CONFIG_PCI_DOMAINS=y CONFIG_PCI_LEGACY_PROC=y @@ -156,6 +199,7 @@ # CONFIG_HOTPLUG_PCI_SHPC is not set CONFIG_HOTPLUG_PCI_RPA=m CONFIG_HOTPLUG_PCI_RPA_DLPAR=m +CONFIG_KERNEL_START=0xc000000000000000 # # Networking @@ -197,6 +241,10 @@ # CONFIG_IPV6 is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set + +# +# Core Netfilter Configuration +# CONFIG_NETFILTER_NETLINK=y CONFIG_NETFILTER_NETLINK_QUEUE=m CONFIG_NETFILTER_NETLINK_LOG=m @@ -299,6 +347,10 @@ # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set + +# +# QoS and/or fair queueing +# # CONFIG_NET_SCHED is not set CONFIG_NET_CLS_ROUTE=y @@ -368,14 +420,6 @@ CONFIG_BLK_DEV_RAM_SIZE=65536 CONFIG_BLK_DEV_INITRD=y # CONFIG_CDROM_PKTCDVD is not set - -# -# IO Schedulers -# -CONFIG_IOSCHED_NOOP=y 
-CONFIG_IOSCHED_AS=y -CONFIG_IOSCHED_DEADLINE=y -CONFIG_IOSCHED_CFQ=y # CONFIG_ATA_OVER_ETH is not set # @@ -473,6 +517,7 @@ # # SCSI low-level drivers # +# CONFIG_ISCSI_TCP is not set # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_ACARD is not set @@ -559,6 +604,7 @@ # # Macintosh device drivers # +# CONFIG_WINDFARM is not set # # Network device support @@ -645,7 +691,6 @@ # CONFIG_IXGB_NAPI is not set CONFIG_S2IO=m # CONFIG_S2IO_NAPI is not set -# CONFIG_2BUFF_MODE is not set # # Token Ring devices @@ -674,6 +719,7 @@ CONFIG_PPP_SYNC_TTY=m CONFIG_PPP_DEFLATE=m CONFIG_PPP_BSDCOMP=m +# CONFIG_PPP_MPPE is not set CONFIG_PPPOE=m # CONFIG_SLIP is not set # CONFIG_NET_FC is not set @@ -784,6 +830,8 @@ # # CONFIG_WATCHDOG is not set # CONFIG_RTC is not set +CONFIG_GEN_RTC=y +# CONFIG_GEN_RTC_X is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set @@ -801,6 +849,7 @@ # TPM devices # # CONFIG_TCG_TPM is not set +# CONFIG_TELCLOCK is not set # # I2C support @@ -852,6 +901,7 @@ # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_RTC8564 is not set # CONFIG_SENSORS_MAX6875 is not set +# CONFIG_RTC_X1205_I2C is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set @@ -893,7 +943,6 @@ CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y -CONFIG_FB_SOFT_CURSOR=y CONFIG_FB_MACMODES=y CONFIG_FB_MODE_HELPERS=y CONFIG_FB_TILEBLITTING=y @@ -905,6 +954,7 @@ # CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set # CONFIG_FB_VGA16 is not set +# CONFIG_FB_S1D13XXX is not set # CONFIG_FB_NVIDIA is not set # CONFIG_FB_RIVA is not set CONFIG_FB_MATROX=y @@ -927,7 +977,6 @@ # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_CYBLA is not set # CONFIG_FB_TRIDENT is not set -# CONFIG_FB_S1D13XXX is not set # CONFIG_FB_VIRTUAL is not set # @@ -936,6 +985,7 @@ # CONFIG_VGA_CONSOLE is not set CONFIG_DUMMY_CONSOLE=y 
CONFIG_FRAMEBUFFER_CONSOLE=y +# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set # CONFIG_FONTS is not set CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y @@ -990,12 +1040,15 @@ # # USB Device Class drivers # -# CONFIG_USB_BLUETOOTH_TTY is not set # CONFIG_USB_ACM is not set # CONFIG_USB_PRINTER is not set # -# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information +# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' +# + +# +# may also be needed; see USB_STORAGE Help for more information # CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_DEBUG is not set @@ -1106,6 +1159,7 @@ # CONFIG_INFINIBAND_MTHCA_DEBUG is not set CONFIG_INFINIBAND_IPOIB=m # CONFIG_INFINIBAND_IPOIB_DEBUG is not set +# CONFIG_INFINIBAND_SRP is not set # # SN Devices @@ -1288,10 +1342,25 @@ # CONFIG_NLS_UTF8 is not set # -# Profiling support +# Library routines +# +CONFIG_CRC_CCITT=m +# CONFIG_CRC16 is not set +CONFIG_CRC32=y +CONFIG_LIBCRC32C=m +CONFIG_ZLIB_INFLATE=y +CONFIG_ZLIB_DEFLATE=m +CONFIG_TEXTSEARCH=y +CONFIG_TEXTSEARCH_KMP=m +CONFIG_TEXTSEARCH_BM=m +CONFIG_TEXTSEARCH_FSM=m + +# +# Instrumentation Support # CONFIG_PROFILING=y CONFIG_OPROFILE=y +# CONFIG_KPROBES is not set # # Kernel hacking @@ -1308,14 +1377,15 @@ # CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_INFO is not set CONFIG_DEBUG_FS=y +# CONFIG_DEBUG_VM is not set +# CONFIG_RCU_TORTURE_TEST is not set CONFIG_DEBUG_STACKOVERFLOW=y -# CONFIG_KPROBES is not set CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUGGER=y CONFIG_XMON=y CONFIG_XMON_DEFAULT=y -# CONFIG_PPCDBG is not set CONFIG_IRQSTACKS=y +# CONFIG_BOOTX_TEXT is not set # # Security options @@ -1355,17 +1425,3 @@ # # Hardware crypto devices # - -# -# Library routines -# -CONFIG_CRC_CCITT=m -# CONFIG_CRC16 is not set -CONFIG_CRC32=y -CONFIG_LIBCRC32C=m -CONFIG_ZLIB_INFLATE=y -CONFIG_ZLIB_DEFLATE=m -CONFIG_TEXTSEARCH=y -CONFIG_TEXTSEARCH_KMP=m -CONFIG_TEXTSEARCH_BM=m -CONFIG_TEXTSEARCH_FSM=m Index: 
linux-work/arch/powerpc/kernel/Makefile =================================================================== --- linux-work.orig/arch/powerpc/kernel/Makefile 2005-11-14 10:41:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/Makefile 2005-11-14 15:17:57.000000000 +1100 @@ -25,7 +25,7 @@ procfs-$(CONFIG_PPC64) := proc_ppc64.o obj-$(CONFIG_PROC_FS) += $(procfs-y) rtaspci-$(CONFIG_PPC64) := rtas_pci.o -obj-$(CONFIG_PPC_RTAS) += rtas.o $(rtaspci-y) +obj-$(CONFIG_PPC_RTAS) += rtas.o rtas-rtc.o $(rtaspci-y) obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o obj-$(CONFIG_RTAS_PROC) += rtas-proc.o obj-$(CONFIG_LPARCFG) += lparcfg.o Index: linux-work/arch/powerpc/kernel/rtas-rtc.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/powerpc/kernel/rtas-rtc.c 2005-11-14 15:44:49.000000000 +1100 @@ -0,0 +1,105 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +#define MAX_RTC_WAIT 5000 /* 5 sec */ +#define RTAS_CLOCK_BUSY (-2) +unsigned long __init rtas_get_boot_time(void) +{ + int ret[8]; + int error, wait_time; + unsigned long max_wait_tb; + + max_wait_tb = get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT; + do { + error = rtas_call(rtas_token("get-time-of-day"), 0, 8, ret); + if (error == RTAS_CLOCK_BUSY || rtas_is_extended_busy(error)) { + wait_time = rtas_extended_busy_delay_time(error); + /* This is boot time so we spin. */ + udelay(wait_time*1000); + error = RTAS_CLOCK_BUSY; + } + } while (error == RTAS_CLOCK_BUSY && (get_tb() < max_wait_tb)); + + if (error != 0 && printk_ratelimit()) { + printk(KERN_WARNING "error: reading the clock failed (%d)\n", + error); + return 0; + } + + return mktime(ret[0], ret[1], ret[2], ret[3], ret[4], ret[5]); +} + +/* NOTE: get_rtc_time will get an error if executed in interrupt context + * and if a delay is needed to read the clock. In this case we just + * silently return without updating rtc_tm. 
+ */ +void rtas_get_rtc_time(struct rtc_time *rtc_tm) +{ + int ret[8]; + int error, wait_time; + unsigned long max_wait_tb; + + max_wait_tb = get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT; + do { + error = rtas_call(rtas_token("get-time-of-day"), 0, 8, ret); + if (error == RTAS_CLOCK_BUSY || rtas_is_extended_busy(error)) { + if (in_interrupt() && printk_ratelimit()) { + memset(rtc_tm, 0, sizeof(struct rtc_time)); + printk(KERN_WARNING "error: reading clock" + " would delay interrupt\n"); + return; /* delay not allowed */ + } + wait_time = rtas_extended_busy_delay_time(error); + msleep(wait_time); + error = RTAS_CLOCK_BUSY; + } + } while (error == RTAS_CLOCK_BUSY && (get_tb() < max_wait_tb)); + + if (error != 0 && printk_ratelimit()) { + printk(KERN_WARNING "error: reading the clock failed (%d)\n", + error); + return; + } + + rtc_tm->tm_sec = ret[5]; + rtc_tm->tm_min = ret[4]; + rtc_tm->tm_hour = ret[3]; + rtc_tm->tm_mday = ret[2]; + rtc_tm->tm_mon = ret[1] - 1; + rtc_tm->tm_year = ret[0] - 1900; +} + +int rtas_set_rtc_time(struct rtc_time *tm) +{ + int error, wait_time; + unsigned long max_wait_tb; + + max_wait_tb = get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT; + do { + error = rtas_call(rtas_token("set-time-of-day"), 7, 1, NULL, + tm->tm_year + 1900, tm->tm_mon + 1, + tm->tm_mday, tm->tm_hour, tm->tm_min, + tm->tm_sec, 0); + if (error == RTAS_CLOCK_BUSY || rtas_is_extended_busy(error)) { + if (in_interrupt()) + return 1; /* probably decrementer */ + wait_time = rtas_extended_busy_delay_time(error); + msleep(wait_time); + error = RTAS_CLOCK_BUSY; + } + } while (error == RTAS_CLOCK_BUSY && (get_tb() < max_wait_tb)); + + if (error != 0 && printk_ratelimit()) + printk(KERN_WARNING "error: setting the clock failed (%d)\n", + error); + + return 0; +} Index: linux-work/arch/ppc64/kernel/Makefile =================================================================== --- linux-work.orig/arch/ppc64/kernel/Makefile 2005-11-14 10:41:58.000000000 +1100 +++ 
linux-work/arch/ppc64/kernel/Makefile 2005-11-14 15:20:05.000000000 +1100 @@ -13,7 +13,6 @@ obj-y += idle.o dma.o \ align.o \ - rtc.o \ iommu.o pci-obj-$(CONFIG_PPC_MULTIPLATFORM) += pci_dn.o pci_direct_iommu.o Index: linux-work/arch/ppc64/kernel/rtc.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/rtc.c 2005-11-01 14:13:53.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,358 +0,0 @@ -/* - * Real Time Clock interface for PPC64. - * - * Based on rtc.c by Paul Gortmaker - * - * This driver allows use of the real time clock - * from user space. It exports the /dev/rtc - * interface supporting various ioctl() and also the - * /proc/driver/rtc pseudo-file for status information. - * - * Interface does not support RTC interrupts nor an alarm. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - * - * 1.0 Mike Corrigan: IBM iSeries rtc support - * 1.1 Dave Engebretsen: IBM pSeries rtc support - */ - -#define RTC_VERSION "1.1" - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include - -#include - -/* - * We sponge a minor off of the misc major. No need slurping - * up another valuable major dev number for this. If you add - * an ioctl, make sure you don't conflict with SPARC's RTC - * ioctls. 
- */ - -static ssize_t rtc_read(struct file *file, char __user *buf, - size_t count, loff_t *ppos); - -static int rtc_ioctl(struct inode *inode, struct file *file, - unsigned int cmd, unsigned long arg); - -static int rtc_read_proc(char *page, char **start, off_t off, - int count, int *eof, void *data); - -/* - * If this driver ever becomes modularised, it will be really nice - * to make the epoch retain its value across module reload... - */ - -static unsigned long epoch = 1900; /* year corresponding to 0x00 */ - -static const unsigned char days_in_mo[] = -{0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}; - -/* - * Now all the various file operations that we export. - */ - -static ssize_t rtc_read(struct file *file, char __user *buf, - size_t count, loff_t *ppos) -{ - return -EIO; -} - -static int rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd, - unsigned long arg) -{ - struct rtc_time wtime; - - switch (cmd) { - case RTC_RD_TIME: /* Read the time/date from RTC */ - { - memset(&wtime, 0, sizeof(struct rtc_time)); - ppc_md.get_rtc_time(&wtime); - break; - } - case RTC_SET_TIME: /* Set the RTC */ - { - struct rtc_time rtc_tm; - unsigned char mon, day, hrs, min, sec, leap_yr; - unsigned int yrs; - - if (!capable(CAP_SYS_TIME)) - return -EACCES; - - if (copy_from_user(&rtc_tm, (struct rtc_time __user *)arg, - sizeof(struct rtc_time))) - return -EFAULT; - - yrs = rtc_tm.tm_year; - mon = rtc_tm.tm_mon + 1; /* tm_mon starts at zero */ - day = rtc_tm.tm_mday; - hrs = rtc_tm.tm_hour; - min = rtc_tm.tm_min; - sec = rtc_tm.tm_sec; - - if (yrs < 70) - return -EINVAL; - - leap_yr = ((!(yrs % 4) && (yrs % 100)) || !(yrs % 400)); - - if ((mon > 12) || (day == 0)) - return -EINVAL; - - if (day > (days_in_mo[mon] + ((mon == 2) && leap_yr))) - return -EINVAL; - - if ((hrs >= 24) || (min >= 60) || (sec >= 60)) - return -EINVAL; - - if ( yrs > 169 ) - return -EINVAL; - - ppc_md.set_rtc_time(&rtc_tm); - - return 0; - } - case RTC_EPOCH_READ: /* Read the epoch. 
*/ - { - return put_user (epoch, (unsigned long __user *)arg); - } - case RTC_EPOCH_SET: /* Set the epoch. */ - { - /* - * There were no RTC clocks before 1900. - */ - if (arg < 1900) - return -EINVAL; - - if (!capable(CAP_SYS_TIME)) - return -EACCES; - - epoch = arg; - return 0; - } - default: - return -EINVAL; - } - return copy_to_user((void __user *)arg, &wtime, sizeof wtime) ? -EFAULT : 0; -} - -static int rtc_open(struct inode *inode, struct file *file) -{ - nonseekable_open(inode, file); - return 0; -} - -static int rtc_release(struct inode *inode, struct file *file) -{ - return 0; -} - -/* - * The various file operations we support. - */ -static struct file_operations rtc_fops = { - .owner = THIS_MODULE, - .llseek = no_llseek, - .read = rtc_read, - .ioctl = rtc_ioctl, - .open = rtc_open, - .release = rtc_release, -}; - -static struct miscdevice rtc_dev = { - .minor = RTC_MINOR, - .name = "rtc", - .fops = &rtc_fops -}; - -static int __init rtc_init(void) -{ - int retval; - - retval = misc_register(&rtc_dev); - if(retval < 0) - return retval; - -#ifdef CONFIG_PROC_FS - if (create_proc_read_entry("driver/rtc", 0, NULL, rtc_read_proc, NULL) - == NULL) { - misc_deregister(&rtc_dev); - return -ENOMEM; - } -#endif - - printk(KERN_INFO "i/pSeries Real Time Clock Driver v" RTC_VERSION "\n"); - - return 0; -} - -static void __exit rtc_exit (void) -{ - remove_proc_entry ("driver/rtc", NULL); - misc_deregister(&rtc_dev); -} - -module_init(rtc_init); -module_exit(rtc_exit); - -/* - * Info exported via "/proc/driver/rtc". - */ - -static int rtc_proc_output (char *buf) -{ - - char *p; - struct rtc_time tm; - - p = buf; - - ppc_md.get_rtc_time(&tm); - - /* - * There is no way to tell if the luser has the RTC set for local - * time or for Universal Standard Time (GMT). Probably local though. 
- */ - p += sprintf(p, - "rtc_time\t: %02d:%02d:%02d\n" - "rtc_date\t: %04d-%02d-%02d\n" - "rtc_epoch\t: %04lu\n", - tm.tm_hour, tm.tm_min, tm.tm_sec, - tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, epoch); - - p += sprintf(p, - "DST_enable\t: no\n" - "BCD\t\t: yes\n" - "24hr\t\t: yes\n" ); - - return p - buf; -} - -static int rtc_read_proc(char *page, char **start, off_t off, - int count, int *eof, void *data) -{ - int len = rtc_proc_output (page); - if (len <= off+count) *eof = 1; - *start = page + off; - len -= off; - if (len>count) len = count; - if (len<0) len = 0; - return len; -} - -#ifdef CONFIG_PPC_RTAS -#define MAX_RTC_WAIT 5000 /* 5 sec */ -#define RTAS_CLOCK_BUSY (-2) -unsigned long rtas_get_boot_time(void) -{ - int ret[8]; - int error, wait_time; - unsigned long max_wait_tb; - - max_wait_tb = __get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT; - do { - error = rtas_call(rtas_token("get-time-of-day"), 0, 8, ret); - if (error == RTAS_CLOCK_BUSY || rtas_is_extended_busy(error)) { - wait_time = rtas_extended_busy_delay_time(error); - /* This is boot time so we spin. */ - udelay(wait_time*1000); - error = RTAS_CLOCK_BUSY; - } - } while (error == RTAS_CLOCK_BUSY && (__get_tb() < max_wait_tb)); - - if (error != 0 && printk_ratelimit()) { - printk(KERN_WARNING "error: reading the clock failed (%d)\n", - error); - return 0; - } - - return mktime(ret[0], ret[1], ret[2], ret[3], ret[4], ret[5]); -} - -/* NOTE: get_rtc_time will get an error if executed in interrupt context - * and if a delay is needed to read the clock. In this case we just - * silently return without updating rtc_tm. 
- */ -void rtas_get_rtc_time(struct rtc_time *rtc_tm) -{ - int ret[8]; - int error, wait_time; - unsigned long max_wait_tb; - - max_wait_tb = __get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT; - do { - error = rtas_call(rtas_token("get-time-of-day"), 0, 8, ret); - if (error == RTAS_CLOCK_BUSY || rtas_is_extended_busy(error)) { - if (in_interrupt() && printk_ratelimit()) { - printk(KERN_WARNING "error: reading clock would delay interrupt\n"); - return; /* delay not allowed */ - } - wait_time = rtas_extended_busy_delay_time(error); - msleep_interruptible(wait_time); - error = RTAS_CLOCK_BUSY; - } - } while (error == RTAS_CLOCK_BUSY && (__get_tb() < max_wait_tb)); - - if (error != 0 && printk_ratelimit()) { - printk(KERN_WARNING "error: reading the clock failed (%d)\n", - error); - return; - } - - rtc_tm->tm_sec = ret[5]; - rtc_tm->tm_min = ret[4]; - rtc_tm->tm_hour = ret[3]; - rtc_tm->tm_mday = ret[2]; - rtc_tm->tm_mon = ret[1] - 1; - rtc_tm->tm_year = ret[0] - 1900; -} - -int rtas_set_rtc_time(struct rtc_time *tm) -{ - int error, wait_time; - unsigned long max_wait_tb; - - max_wait_tb = __get_tb() + tb_ticks_per_usec * 1000 * MAX_RTC_WAIT; - do { - error = rtas_call(rtas_token("set-time-of-day"), 7, 1, NULL, - tm->tm_year + 1900, tm->tm_mon + 1, - tm->tm_mday, tm->tm_hour, tm->tm_min, - tm->tm_sec, 0); - if (error == RTAS_CLOCK_BUSY || rtas_is_extended_busy(error)) { - if (in_interrupt()) - return 1; /* probably decrementer */ - wait_time = rtas_extended_busy_delay_time(error); - msleep_interruptible(wait_time); - error = RTAS_CLOCK_BUSY; - } - } while (error == RTAS_CLOCK_BUSY && (__get_tb() < max_wait_tb)); - - if (error != 0 && printk_ratelimit()) - printk(KERN_WARNING "error: setting the clock failed (%d)\n", - error); - - return 0; -} -#endif Index: linux-work/drivers/char/Kconfig =================================================================== --- linux-work.orig/drivers/char/Kconfig 2005-11-01 14:13:53.000000000 +1100 +++ linux-work/drivers/char/Kconfig 
2005-11-14 15:26:45.000000000 +1100 @@ -735,7 +735,7 @@ config GEN_RTC tristate "Generic /dev/rtc emulation" - depends on RTC!=y && !IA64 && !ARM && !PPC64 && !M32R && !SPARC32 && !SPARC64 + depends on RTC!=y && !IA64 && !ARM && !M32R && !SPARC32 && !SPARC64 ---help--- If you say Y here and create a character special file /dev/rtc with major number 10 and minor number 135 using mknod ("man mknod"), you Index: linux-work/arch/powerpc/platforms/powermac/time.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/powermac/time.c 2005-11-14 14:28:10.000000000 +1100 +++ linux-work/arch/powerpc/platforms/powermac/time.c 2005-11-14 15:40:08.000000000 +1100 @@ -199,7 +199,7 @@ #define smu_set_rtc_time(tm, spin) 0 #endif -unsigned long pmac_get_boot_time(void) +unsigned long __init pmac_get_boot_time(void) { /* Get the time from the RTC, used only at boot time */ switch (sys_ctrler) { From benh at kernel.crashing.org Mon Nov 14 19:00:36 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 14 Nov 2005 19:00:36 +1100 Subject: [PATCH] powerpc: Merge align.c Message-ID: <1131955237.5504.148.camel@gaston> Needs testing! This patch merges align.c; the result isn't quite what was in ppc64 nor what was in ppc32 :) It should implement all the functionality of both, though. Kumar, since you played with that in the past, I suppose you have some test cases for verifying that it works properly before I dig out the 601 machine? :) Since it's likely that I won't be able to test every scenario, code inspection is very welcome. 
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/powerpc/kernel/Makefile =================================================================== --- linux-work.orig/arch/powerpc/kernel/Makefile 2005-11-14 15:17:57.000000000 +1100 +++ linux-work/arch/powerpc/kernel/Makefile 2005-11-14 17:18:14.000000000 +1100 @@ -12,7 +12,7 @@ endif obj-y := semaphore.o cputable.o ptrace.o syscalls.o \ - irq.o signal_32.o pmc.o vdso.o + irq.o align.o signal_32.o pmc.o vdso.o obj-y += vdso32/ obj-$(CONFIG_PPC64) += setup_64.o binfmt_elf32.o sys_ppc32.o \ signal_64.o ptrace32.o systbl.o \ Index: linux-work/arch/powerpc/kernel/align.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/powerpc/kernel/align.c 2005-11-14 18:41:22.000000000 +1100 @@ -0,0 +1,513 @@ +/* align.c - handle alignment exceptions for the Power PC. + * + * Copyright (c) 1996 Paul Mackerras + * Copyright (c) 1998-1999 TiVo, Inc. + * PowerPC 403GCX modifications. + * Copyright (c) 1999 Grant Erickson + * PowerPC 403GCX/405GP modifications. + * Copyright (c) 2001-2002 PPC64 team, IBM Corp + * 64-bit and Power4 support + * Copyright (c) 2005 Benjamin Herrenschmidt, IBM Corp + * + * Merge ppc32 and ppc64 implementations + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +struct aligninfo { + unsigned char len; + unsigned char flags; +}; + +#define IS_XFORM(inst) (((inst) >> 26) == 31) +#define IS_DSFORM(inst) (((inst) >> 26) >= 56) + +#define INVALID { 0, 0 } + +#define LD 1 /* load */ +#define ST 2 /* store */ +#define SE 4 /* sign-extend value */ +#define F 8 /* to/from fp regs */ +#define U 0x10 /* update index register */ +#define M 0x20 /* multiple load/store */ +#define SW 0x40 /* byte swap int or ... */ +#define S 0x40 /* ... single-precision fp */ +#define SX 0x40 /* byte count in XER */ +#define HARD 0x80 /* string, stwcx. */ + +#define DCBZ 0x5f /* 8xx/82xx dcbz faults when cache not enabled */ + +#define SWAP(a, b) (t = (a), (a) = (b), (b) = t) + +/* + * The PowerPC stores certain bits of the instruction that caused the + * alignment exception in the DSISR register. This array maps those + * bits to information about the operand length and what the + * instruction would do. 
+ */ +static struct aligninfo aligninfo[128] = { + { 4, LD }, /* 00 0 0000: lwz / lwarx */ + INVALID, /* 00 0 0001 */ + { 4, ST }, /* 00 0 0010: stw */ + INVALID, /* 00 0 0011 */ + { 2, LD }, /* 00 0 0100: lhz */ + { 2, LD+SE }, /* 00 0 0101: lha */ + { 2, ST }, /* 00 0 0110: sth */ + { 4, LD+M }, /* 00 0 0111: lmw */ + { 4, LD+F+S }, /* 00 0 1000: lfs */ + { 8, LD+F }, /* 00 0 1001: lfd */ + { 4, ST+F+S }, /* 00 0 1010: stfs */ + { 8, ST+F }, /* 00 0 1011: stfd */ + INVALID, /* 00 0 1100 */ + { 8, LD }, /* 00 0 1101: ld/ldu/lwa */ + INVALID, /* 00 0 1110 */ + { 8, ST }, /* 00 0 1111: std/stdu */ + { 4, LD+U }, /* 00 1 0000: lwzu */ + INVALID, /* 00 1 0001 */ + { 4, ST+U }, /* 00 1 0010: stwu */ + INVALID, /* 00 1 0011 */ + { 2, LD+U }, /* 00 1 0100: lhzu */ + { 2, LD+SE+U }, /* 00 1 0101: lhau */ + { 2, ST+U }, /* 00 1 0110: sthu */ + { 4, ST+M }, /* 00 1 0111: stmw */ + { 4, LD+F+S+U }, /* 00 1 1000: lfsu */ + { 8, LD+F+U }, /* 00 1 1001: lfdu */ + { 4, ST+F+S+U }, /* 00 1 1010: stfsu */ + { 8, ST+F+U }, /* 00 1 1011: stfdu */ + INVALID, /* 00 1 1100 */ + INVALID, /* 00 1 1101 */ + INVALID, /* 00 1 1110 */ + INVALID, /* 00 1 1111 */ + { 8, LD }, /* 01 0 0000: ldx */ + INVALID, /* 01 0 0001 */ + { 8, ST }, /* 01 0 0010: stdx */ + INVALID, /* 01 0 0011 */ + INVALID, /* 01 0 0100 */ + { 4, LD+SE }, /* 01 0 0101: lwax */ + INVALID, /* 01 0 0110 */ + INVALID, /* 01 0 0111 */ + { 4, LD+M+HARD+SX }, /* 01 0 1000: lswx */ + { 4, LD+M+HARD }, /* 01 0 1001: lswi */ + { 4, ST+M+HARD+SX }, /* 01 0 1010: stswx */ + { 4, ST+M+HARD }, /* 01 0 1011: stswi */ + INVALID, /* 01 0 1100 */ + { 8, LD+U }, /* 01 0 1101: ldu */ + INVALID, /* 01 0 1110 */ + { 8, ST+U }, /* 01 0 1111: stdu */ + { 8, LD+U }, /* 01 1 0000: ldux */ + INVALID, /* 01 1 0001 */ + { 8, ST+U }, /* 01 1 0010: stdux */ + INVALID, /* 01 1 0011 */ + INVALID, /* 01 1 0100 */ + { 4, LD+SE+U }, /* 01 1 0101: lwaux */ + INVALID, /* 01 1 0110 */ + INVALID, /* 01 1 0111 */ + INVALID, /* 01 1 1000 */ + INVALID, /* 01 1 1001 
*/ + INVALID, /* 01 1 1010 */ + INVALID, /* 01 1 1011 */ + INVALID, /* 01 1 1100 */ + INVALID, /* 01 1 1101 */ + INVALID, /* 01 1 1110 */ + INVALID, /* 01 1 1111 */ + INVALID, /* 10 0 0000 */ + INVALID, /* 10 0 0001 */ + INVALID, /* 10 0 0010: stwcx. */ + INVALID, /* 10 0 0011 */ + INVALID, /* 10 0 0100 */ + INVALID, /* 10 0 0101 */ + INVALID, /* 10 0 0110 */ + INVALID, /* 10 0 0111 */ + { 4, LD+SW }, /* 10 0 1000: lwbrx */ + INVALID, /* 10 0 1001 */ + { 4, ST+SW }, /* 10 0 1010: stwbrx */ + INVALID, /* 10 0 1011 */ + { 2, LD+SW }, /* 10 0 1100: lhbrx */ + { 4, LD+SE }, /* 10 0 1101 lwa */ + { 2, ST+SW }, /* 10 0 1110: sthbrx */ + INVALID, /* 10 0 1111 */ + INVALID, /* 10 1 0000 */ + INVALID, /* 10 1 0001 */ + INVALID, /* 10 1 0010 */ + INVALID, /* 10 1 0011 */ + INVALID, /* 10 1 0100 */ + INVALID, /* 10 1 0101 */ + INVALID, /* 10 1 0110 */ + INVALID, /* 10 1 0111 */ + INVALID, /* 10 1 1000 */ + INVALID, /* 10 1 1001 */ + INVALID, /* 10 1 1010 */ + INVALID, /* 10 1 1011 */ + INVALID, /* 10 1 1100 */ + INVALID, /* 10 1 1101 */ + INVALID, /* 10 1 1110 */ + { 0, ST+HARD }, /* 10 1 1111: dcbz */ + { 4, LD }, /* 11 0 0000: lwzx */ + INVALID, /* 11 0 0001 */ + { 4, ST }, /* 11 0 0010: stwx */ + INVALID, /* 11 0 0011 */ + { 2, LD }, /* 11 0 0100: lhzx */ + { 2, LD+SE }, /* 11 0 0101: lhax */ + { 2, ST }, /* 11 0 0110: sthx */ + INVALID, /* 11 0 0111 */ + { 4, LD+F+S }, /* 11 0 1000: lfsx */ + { 8, LD+F }, /* 11 0 1001: lfdx */ + { 4, ST+F+S }, /* 11 0 1010: stfsx */ + { 8, ST+F }, /* 11 0 1011: stfdx */ + INVALID, /* 11 0 1100 */ + { 8, LD+M }, /* 11 0 1101: lmd */ + INVALID, /* 11 0 1110 */ + { 8, ST+M }, /* 11 0 1111: stmd */ + { 4, LD+U }, /* 11 1 0000: lwzux */ + INVALID, /* 11 1 0001 */ + { 4, ST+U }, /* 11 1 0010: stwux */ + INVALID, /* 11 1 0011 */ + { 2, LD+U }, /* 11 1 0100: lhzux */ + { 2, LD+SE+U }, /* 11 1 0101: lhaux */ + { 2, ST+U }, /* 11 1 0110: sthux */ + INVALID, /* 11 1 0111 */ + { 4, LD+F+S+U }, /* 11 1 1000: lfsux */ + { 8, LD+F+U }, /* 11 1 1001: 
lfdux */ + { 4, ST+F+S+U }, /* 11 1 1010: stfsux */ + { 8, ST+F+U }, /* 11 1 1011: stfdux */ + INVALID, /* 11 1 1100 */ + INVALID, /* 11 1 1101 */ + INVALID, /* 11 1 1110 */ + INVALID, /* 11 1 1111 */ +}; + +/* + * Create a DSISR value from the instruction + */ +static inline unsigned make_dsisr(unsigned instr) +{ + unsigned dsisr; + + + /* bits 6:15 --> 22:31 */ + dsisr = (instr & 0x03ff0000) >> 16; + + if ( IS_XFORM(instr) ) { + /* bits 29:30 --> 15:16 */ + dsisr |= (instr & 0x00000006) << 14; + /* bit 25 --> 17 */ + dsisr |= (instr & 0x00000040) << 8; + /* bits 21:24 --> 18:21 */ + dsisr |= (instr & 0x00000780) << 3; + } + else { + /* bit 5 --> 17 */ + dsisr |= (instr & 0x04000000) >> 12; + /* bits 1: 4 --> 18:21 */ + dsisr |= (instr & 0x78000000) >> 17; + /* bits 30:31 --> 12:13 */ + if ( IS_DSFORM(instr) ) + dsisr |= (instr & 0x00000003) << 18; + } + + return dsisr; +} + +/* + * The dcbz (data cache block zero) instruction + * gives an alignment fault if used on non-cacheable + * memory. We handle the fault mainly for the + * case when we are running with the cache disabled + * for debugging. 
+ */ +static int emulate_dcbz(struct pt_regs *regs, unsigned char __user *addr) +{ + long __user *p; + int i, size; + +#ifdef __powerpc64__ + size = ppc64_caches.dline_size; +#else + size = L1_CACHE_BYTES; +#endif + p = (long __user *) (regs->dar & -size); + if (user_mode(regs) && !access_ok(VERIFY_WRITE, p, size)) + return -EFAULT; + for (i = 0; i < size / sizeof(long); ++i) + if (__put_user(0, p+i)) + return -EFAULT; + return 1; +} + +/* + * Emulate load & store multiple instructions + */ +static int emulate_multiple(struct pt_regs *regs, unsigned char __user *addr, + unsigned int reg, unsigned int nb, + unsigned int flags, unsigned int instr) +{ + unsigned char *rptr; + int nb0, i; + + /* + * We do not try to emulate 8 bytes multiple as they aren't really + * available in our operating environments and we don't try to + * emulate multiples operations in kernel land as they should never + * be used/generated there at least not on unaligned boundaries + */ + if (unlikely((nb > 4) || !user_mode(regs))) + return 0; + + /* lmw, stmw, lswi/x, stswi/x */ + nb0 = 0; + if (flags & HARD) { + if (flags & SX) { + nb = regs->xer & 127; + if (nb == 0) + return 1; + } else { + if (__get_user(instr, + (unsigned int __user *)regs->nip)) + return -EFAULT; + nb = (instr >> 11) & 0x1f; + if (nb == 0) + nb = 32; + } + if (nb + reg * 4 > 128) { + nb0 = nb + reg * 4 - 128; + nb = 128 - reg * 4; + } + } else { + /* lwm, stmw */ + nb = (32 - reg) * 4; + } + + if (!access_ok((flags & ST ? 
VERIFY_WRITE: VERIFY_READ), addr, nb+nb0)) + return -EFAULT; /* bad address */ + + rptr = (unsigned char *) &regs->gpr[reg]; + if (flags & LD) { + for (i = 0; i < nb; ++i) + if (__get_user(rptr[i], addr + i)) + return -EFAULT; + if (nb0 > 0) { + rptr = (unsigned char *) &regs->gpr[0]; + addr += nb; + for (i = 0; i < nb0; ++i) + if (__get_user(rptr[i], addr + i)) + return -EFAULT; + } + for (; (i & 3) != 0; ++i) + rptr[i] = 0; + } else { + for (i = 0; i < nb; ++i) + if (__put_user(rptr[i], addr + i)) + return -EFAULT; + if (nb0 > 0) { + rptr = (unsigned char *) &regs->gpr[0]; + addr += nb; + for (i = 0; i < nb0; ++i) + if (__put_user(rptr[i], addr + i)) + return -EFAULT; + } + } + return 1; +} + + +/* + * Called on alignment exception. Attempts to fixup + * + * Return 1 on success + * Return 0 if unable to handle the interrupt + * Return -EFAULT if data address is bad + */ + +int fix_alignment(struct pt_regs *regs) +{ + unsigned int instr, nb, flags; + unsigned int reg, areg; + unsigned int dsisr; + unsigned char __user *addr; + unsigned char __user *p; + int ret, t; + union { + long ll; + double dd; + unsigned char v[8]; + struct { + unsigned hi32; + int low32; + } x32; + struct { + unsigned char hi48[6]; + short low16; + } x16; + } data; + + /* + * We require a complete register set, if not, then our assembly + * is broken + */ + CHECK_FULL_REGS(regs); + + dsisr = regs->dsisr; + + /* Some processors don't provide us with a DSISR we can use here, + * let's make one up from the instruction + */ + if (cpu_has_feature(CPU_FTR_NODSISRALIGN)) { + unsigned int real_instr; + if (unlikely(__get_user(real_instr, + (unsigned int __user *)regs->nip))) + return -EFAULT; + dsisr = make_dsisr(real_instr); + } + + /* extract the operation and registers from the dsisr */ + reg = (dsisr >> 5) & 0x1f; /* source/dest register */ + areg = dsisr & 0x1f; /* register to update */ + instr = (dsisr >> 10) & 0x7f; + instr |= (dsisr >> 13) & 0x60; + + /* Lookup the operation in our table */ + nb =
aligninfo[instr].len; + flags = aligninfo[instr].flags; + + /* DAR has the operand effective address */ + addr = (unsigned char __user *)regs->dar; + + /* A size of 0 indicates an instruction we don't support, with + * the exception of DCBZ which is handled as a special case here + */ + if (instr == DCBZ) + return emulate_dcbz(regs, addr); + if (unlikely(nb == 0)) + return 0; + + /* Load/Store Multiple instructions are handled in their own + * function + */ + if (flags & M) + return emulate_multiple(regs, addr, reg, nb, flags, instr); + + /* Verify the address of the operand */ + if (unlikely(user_mode(regs) && + !access_ok((flags & ST ? VERIFY_WRITE : VERIFY_READ), + addr, nb))) + return -EFAULT; + + /* Force the fprs into the save area so we can reference them */ + if (flags & F) { + /* userland only */ + if (unlikely(!user_mode(regs))) + return 0; + flush_fp_to_thread(current); + } + + /* If we are loading, get the data from user space, else + * get it from register values + */ + if (flags & LD) { + data.ll = 0; + ret = 0; + p = addr; + switch (nb) { + case 8: + ret |= __get_user(data.v[0], p++); + ret |= __get_user(data.v[1], p++); + ret |= __get_user(data.v[2], p++); + ret |= __get_user(data.v[3], p++); + case 4: + ret |= __get_user(data.v[4], p++); + ret |= __get_user(data.v[5], p++); + case 2: + ret |= __get_user(data.v[6], p++); + ret |= __get_user(data.v[7], p++); + if (unlikely(ret)) + return -EFAULT; + } + } else if (flags & F) + data.dd = current->thread.fpr[reg]; + else + data.ll = regs->gpr[reg]; + + /* Perform other misc operations like sign extension, byteswap, + * or floating point single precision conversion + */ + switch (flags & ~U) { + case LD+SE: /* sign extend */ + if ( nb == 2 ) + data.ll = data.x16.low16; + else /* nb must be 4 */ + data.ll = data.x32.low32; + break; + case LD+S: /* byte-swap */ + case ST+S: + if (nb == 2) { + SWAP(data.v[6], data.v[7]); + } else { + SWAP(data.v[4], data.v[7]); + SWAP(data.v[5], data.v[6]); + } + break; + + 
/* Single-precision FP load and store require conversions... */ + case LD+F+S: +#ifdef CONFIG_PPC_FPU + preempt_disable(); + enable_kernel_fp(); + cvt_fd((float *)&data.v[4], &data.dd, &current->thread); + preempt_enable(); +#else + return 0; +#endif + break; + case ST+F+S: +#ifdef CONFIG_PPC_FPU + preempt_disable(); + enable_kernel_fp(); + cvt_df(&data.dd, (float *)&data.v[4], &current->thread); + preempt_enable(); +#else + return 0; +#endif + break; + } + + /* Store result to memory or update registers */ + if (flags & ST) { + ret = 0; + p = addr; + switch (nb) { + case 8: + ret |= __put_user(data.v[0], p++); + ret |= __put_user(data.v[1], p++); + ret |= __put_user(data.v[2], p++); + ret |= __put_user(data.v[3], p++); + case 4: + ret |= __put_user(data.v[4], p++); + ret |= __put_user(data.v[5], p++); + case 2: + ret |= __put_user(data.v[6], p++); + ret |= __put_user(data.v[7], p++); + } + if (unlikely(ret)) + return -EFAULT; + } else if (flags & F) + current->thread.fpr[reg] = data.dd; + else + regs->gpr[reg] = data.ll; + + /* Update RA as needed */ + if (flags & U) + regs->gpr[areg] = regs->dar; + + return 1; +} Index: linux-work/arch/ppc/kernel/Makefile =================================================================== --- linux-work.orig/arch/ppc/kernel/Makefile 2005-11-11 10:14:48.000000000 +1100 +++ linux-work/arch/ppc/kernel/Makefile 2005-11-14 18:42:30.000000000 +1100 @@ -13,7 +13,7 @@ extra-y += vmlinux.lds obj-y := entry.o traps.o idle.o time.o misc.o \ - process.o align.o \ + process.o \ setup.o \ ppc_htab.o obj-$(CONFIG_6xx) += l2cr.o cpu_setup_6xx.o Index: linux-work/arch/ppc64/kernel/Makefile =================================================================== --- linux-work.orig/arch/ppc64/kernel/Makefile 2005-11-14 15:20:05.000000000 +1100 +++ linux-work/arch/ppc64/kernel/Makefile 2005-11-14 18:42:12.000000000 +1100 @@ -11,9 +11,7 @@ endif -obj-y += idle.o dma.o \ - align.o \ - iommu.o +obj-y += idle.o dma.o iommu.o pci-obj-$(CONFIG_PPC_MULTIPLATFORM) +=
pci_dn.o pci_direct_iommu.o Index: linux-work/include/asm-powerpc/cputable.h =================================================================== --- linux-work.orig/include/asm-powerpc/cputable.h 2005-11-11 10:14:49.000000000 +1100 +++ linux-work/include/asm-powerpc/cputable.h 2005-11-14 18:33:42.000000000 +1100 @@ -90,6 +90,7 @@ #define CPU_FTR_NEED_COHERENT ASM_CONST(0x0000000000020000) #define CPU_FTR_NO_BTIC ASM_CONST(0x0000000000040000) #define CPU_FTR_BIG_PHYS ASM_CONST(0x0000000000080000) +#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0000000000100000) #ifdef __powerpc64__ /* Add the 64b processor unique features in the top half of the word */ @@ -97,7 +98,6 @@ #define CPU_FTR_16M_PAGE ASM_CONST(0x0000000200000000) #define CPU_FTR_TLBIEL ASM_CONST(0x0000000400000000) #define CPU_FTR_NOEXECUTE ASM_CONST(0x0000000800000000) -#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0000001000000000) #define CPU_FTR_IABR ASM_CONST(0x0000002000000000) #define CPU_FTR_MMCRA ASM_CONST(0x0000004000000000) #define CPU_FTR_CTRL ASM_CONST(0x0000008000000000) @@ -113,7 +113,6 @@ #define CPU_FTR_16M_PAGE ASM_CONST(0x0) #define CPU_FTR_TLBIEL ASM_CONST(0x0) #define CPU_FTR_NOEXECUTE ASM_CONST(0x0) -#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0) #define CPU_FTR_IABR ASM_CONST(0x0) #define CPU_FTR_MMCRA ASM_CONST(0x0) #define CPU_FTR_CTRL ASM_CONST(0x0) @@ -273,18 +272,21 @@ CPU_FTRS_POWER3_32 = CPU_FTR_COMMON | CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE, CPU_FTRS_POWER4_32 = CPU_FTR_COMMON | CPU_FTR_SPLIT_ID_CACHE | - CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE, + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_NODSISRALIGN, CPU_FTRS_970_32 = CPU_FTR_COMMON | CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_ALTIVEC_COMP | - CPU_FTR_MAYBE_CAN_NAP, + CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_NODSISRALIGN, CPU_FTRS_8XX = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, - CPU_FTRS_40X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, - CPU_FTRS_44X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, - 
CPU_FTRS_E200 = CPU_FTR_USE_TB, - CPU_FTRS_E500 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, + CPU_FTRS_40X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_NODSISRALIGN, + CPU_FTRS_44X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_NODSISRALIGN, + CPU_FTRS_E200 = CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN, + CPU_FTRS_E500 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_NODSISRALIGN, CPU_FTRS_E500_2 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | - CPU_FTR_BIG_PHYS, - CPU_FTRS_GENERIC_32 = CPU_FTR_COMMON, + CPU_FTR_BIG_PHYS | CPU_FTR_NODSISRALIGN, + CPU_FTRS_GENERIC_32 = CPU_FTR_COMMON | CPU_FTR_NODSISRALIGN, #ifdef __powerpc64__ CPU_FTRS_POWER3 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR, From benh at kernel.crashing.org Mon Nov 14 19:16:40 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 14 Nov 2005 19:16:40 +1100 Subject: [PATCH] GX bus support Message-ID: <1131956200.5504.157.camel@gaston> From: Heiko J Schick This patch adds the necessary core bus support used by device drivers that sit on the IBM GX bus on modern pSeries machines like the Galaxy infiniband for example. It provides transparent DMA ops (the low level driver works with virtual addresses directly) along with a simple bus layer using the Open Firmware matching routines. Signed-off-by: Heiko J Schick Signed-off-by: Benjamin Herrenschmidt --- You'll probably have to hack the path to arch/ppc64/kernel/dma.c in the patch before applying if you already moved that to arch/powerpc. It was not yet moved in my tree so I didn't "fix" the patch. Please send along with your next batch to Linus, it's been ready in time but some mailer breakage conspiracy prevented me from "polishing" it in time for -rc1.
Index: linux-work/arch/ppc64/kernel/dma.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/dma.c 2005-11-14 19:08:12.000000000 +1100 +++ linux-work/arch/ppc64/kernel/dma.c 2005-11-14 19:08:14.000000000 +1100 @@ -10,6 +10,7 @@ /* Include the busses we support */ #include #include +#include #include #include @@ -23,6 +24,10 @@ if (dev->bus == &vio_bus_type) return &vio_dma_ops; #endif +#ifdef CONFIG_IBMEBUS + if (dev->bus == &ebus_bus_type) + return &ebus_dma_ops; +#endif return NULL; } @@ -46,7 +51,11 @@ #ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return -EIO; -#endif /* CONFIG_IBMVIO */ +#endif +#ifdef CONFIG_IBMEBUS + if (dev->bus == &ebus_bus_type) + return -EIO; +#endif BUG(); return 0; } Index: linux-work/arch/powerpc/platforms/pseries/ebus.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/powerpc/platforms/pseries/ebus.c 2005-11-14 19:11:08.000000000 +1100 @@ -0,0 +1,372 @@ +/* + * IBM PowerPC eBus Infrastructure Support. + * + * Copyright (c) 2005 IBM Corporation + * Heiko J Schick + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include +#include + +static struct ebus_dev ebus_bus_device = { /* fake "parent" device */ + .name = ebus_bus_device.ofdev.dev.bus_id, + .ofdev.dev.bus_id = "ebus", + .ofdev.dev.bus = &ebus_bus_type, +}; + +static void *ebus_alloc_coherent(struct device *dev, + size_t size, + dma_addr_t *dma_handle, + unsigned int __nocast flag) +{ + return NULL; +} + +static void ebus_free_coherent(struct device *dev, + size_t size, void *vaddr, dma_addr_t dma_handle) +{ + return; +} + +static dma_addr_t ebus_map_single(struct device *dev, + void *ptr, + size_t size, + enum dma_data_direction direction) +{ + return (dma_addr_t)(ptr); +} + +static void ebus_unmap_single(struct device *dev, + dma_addr_t dma_addr, + size_t size, enum dma_data_direction direction) +{ + return; +} + +static int ebus_map_sg(struct device *dev, + struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ + int i; + + for (i = 0; i < nents; i++) { + sg[i].dma_address = (dma_addr_t)page_address(sg[i].page) + + sg[i].offset; + } + + return nents; +} + +static void ebus_unmap_sg(struct device *dev, + struct scatterlist *sg, + int nents, enum 
dma_data_direction direction) +{ + return; +} + +static int ebus_dma_supported(struct device *dev, u64 mask) +{ + return 1; +} + +struct dma_mapping_ops ebus_dma_ops = { + .alloc_coherent = ebus_alloc_coherent, + .free_coherent = ebus_free_coherent, + .map_single = ebus_map_single, + .unmap_single = ebus_unmap_single, + .map_sg = ebus_map_sg, + .unmap_sg = ebus_unmap_sg, + .dma_supported = ebus_dma_supported, +}; + +static int ebus_bus_probe(struct device *dev) +{ + struct ebus_dev *ebusdev = to_ebus_dev(dev); + struct ebus_driver *ebusdrv = to_ebus_driver(dev->driver); + const struct of_device_id *id; + int error = -ENODEV; + + if (!ebusdrv->probe) + return error; + + id = of_match_device(ebusdrv->id_table, &ebusdev->ofdev); + if (id) { + error = ebusdrv->probe(ebusdev, id); + } + + return error; +} + +static int ebus_bus_remove(struct device *dev) +{ + struct ebus_dev *ebusdev = to_ebus_dev(dev); + struct ebus_driver *ebusdrv = to_ebus_driver(dev->driver); + + if (ebusdrv->remove) { + return ebusdrv->remove(ebusdev); + } + + return 1; +} + +static void __devinit ebus_dev_release(struct device *dev) +{ + of_node_put(dev->platform_data); + kfree(to_ebus_dev(dev)); +} + +static ssize_t ebusdev_show_name(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%s\n", to_ebus_dev(dev)->name); +} +static DEVICE_ATTR(name, S_IRUSR | S_IRGRP | S_IROTH, ebusdev_show_name, NULL); + +static struct ebus_dev* __devinit ebus_register_device_common( + struct ebus_dev *ebusdev, char *name) +{ + ebusdev->name = name; + ebusdev->ofdev.dev.parent = &ebus_bus_device.ofdev.dev; + ebusdev->ofdev.dev.bus = &ebus_bus_type; + ebusdev->ofdev.dev.release = ebus_dev_release; + + if (of_device_register(&ebusdev->ofdev) != 0) { + printk(KERN_ERR "%s: failed to register device %s\n", + __FUNCTION__, ebusdev->ofdev.dev.bus_id); + return NULL; + } + + device_create_file(&ebusdev->ofdev.dev, &dev_attr_name); + + return ebusdev; +} + +struct ebus_dev* __devinit 
ebus_register_device_node(struct device_node *dn) +{ + struct ebus_dev *ebusdev; + char *loc_code; + int length; + + loc_code = (char *)get_property(dn, "ibm,loc-code", NULL); + if (!loc_code) { + printk(KERN_WARNING "%s: node %s missing 'ibm,loc-code'\n", + __FUNCTION__, dn->name ? dn->name : ""); + return NULL; + } + + if (strlen(loc_code) == 0) { + printk(KERN_WARNING "%s: 'ibm,loc-code' is invalid\n", + __FUNCTION__); + return NULL; + } + + ebusdev = kmalloc(sizeof(struct ebus_dev), GFP_KERNEL); + if (!ebusdev) { + return NULL; + } + memset(ebusdev, 0, sizeof(struct ebus_dev)); + + ebusdev->ofdev.node = of_node_get(dn); + + length = strlen(loc_code); + strncpy(ebusdev->ofdev.dev.bus_id, loc_code + + (strlen(loc_code) - min(length, BUS_ID_SIZE)), BUS_ID_SIZE); + + /* register with generic device framework */ + if (ebus_register_device_common(ebusdev, dn->name) == NULL) { + kfree(ebusdev); + return NULL; + } + + return ebusdev; +} + +static void probe_bus(char* name) +{ + struct device_node *dn = NULL; + + while ((dn = of_find_node_by_name(dn, name))) { + ebus_register_device_node(dn); + } + + of_node_put(dn); +} + +static int ebus_unregister_device(struct device *dev) +{ + device_remove_file(dev, &dev_attr_name); + of_device_unregister(to_of_device(dev)); + + return 0; +} + +static int ebus_match_helper(struct device *dev, void *data) +{ + if (strcmp((char*)data, to_ebus_dev(dev)->name) == 0) + return 1; + + return 0; +} + +int ebus_register_driver(struct ebus_driver *ebusdrv) +{ + struct of_device_id *idt; + struct device *dev; + + ebusdrv->driver.name = ebusdrv->name; + ebusdrv->driver.bus = &ebus_bus_type; + ebusdrv->driver.probe = ebus_bus_probe; + ebusdrv->driver.remove = ebus_bus_remove; + + /* check if a driver for that device name is already loaded */ + idt = ebusdrv->id_table; + while (strlen(idt->name) > 0) { + dev = bus_find_device(&ebus_bus_type, NULL, (void*)idt->name, + ebus_match_helper); + if (dev) { + printk(KERN_ERR + "%s: driver for device 
name %s already loaded\n", + __FUNCTION__, idt->name); + return -EPERM; + } + idt++; + } + + idt = ebusdrv->id_table; + while (strlen(idt->name) > 0) { + probe_bus(idt->name); + idt++; + } + + return driver_register(&ebusdrv->driver); +} +EXPORT_SYMBOL(ebus_register_driver); + +void ebus_unregister_driver(struct ebus_driver *ebusdrv) +{ + struct of_device_id *idt; + struct device *dev; + + driver_unregister(&ebusdrv->driver); + + idt = ebusdrv->id_table; + while (strlen(idt->name) > 0) { + while ((dev = bus_find_device(&ebus_bus_type, NULL, + (void*)idt->name, + ebus_match_helper))) { + ebus_unregister_device(dev); + } + idt++; + + } +} +EXPORT_SYMBOL(ebus_unregister_driver); + +int ebus_request_irq(u32 ist, + irqreturn_t (*handler)(int, void*, struct pt_regs *), + unsigned long irq_flags, const char * devname, + void *dev_id) +{ + unsigned int irq = virt_irq_create_mapping(ist); + + if (irq == NO_IRQ) + return -EINVAL; + + irq = irq_offset_up(irq); + + return request_irq(irq, handler, + irq_flags, devname, dev_id); +} +EXPORT_SYMBOL(ebus_request_irq); + +void ebus_free_irq(u32 ist, void *dev_id) +{ + unsigned int irq = virt_irq_create_mapping(ist); + + irq = irq_offset_up(irq); + free_irq(irq, dev_id); + + return; +} +EXPORT_SYMBOL(ebus_free_irq); + +static int ebus_bus_match(struct device *dev, struct device_driver *drv) +{ + const struct ebus_dev *ebus_dev = to_ebus_dev(dev); + struct ebus_driver *ebus_drv = to_ebus_driver(drv); + const struct of_device_id *ids = ebus_drv->id_table; + const struct of_device_id *found_id; + + if (!ids) + return 0; + + found_id = of_match_device(ids, &ebus_dev->ofdev); + if (found_id) + return 1; + + return 0; +} + +struct bus_type ebus_bus_type = { + .name = "ebus", + .match = ebus_bus_match, +}; +EXPORT_SYMBOL(ebus_bus_type); + +static int __init ebus_bus_init(void) +{ + int err; + + printk(KERN_INFO "eBus Device Driver\n"); + + err = bus_register(&ebus_bus_type); + if (err) { + printk(KERN_ERR "failed to register eBus\n"); + 
return err; + } + + err = device_register(&ebus_bus_device.ofdev.dev); + if (err) { + printk(KERN_WARNING "%s: device_register returned %i\n", + __FUNCTION__, err); + return err; + } + + return 0; +} +__initcall(ebus_bus_init); Index: linux-work/include/asm-powerpc/ebus.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/include/asm-powerpc/ebus.h 2005-11-14 19:08:14.000000000 +1100 @@ -0,0 +1,87 @@ +/* + * IBM PowerPC eBus Infrastructure Support. + * + * Copyright (c) 2005 IBM Corporation + * Heiko J Schick + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _ASM_EBUS_H +#define _ASM_EBUS_H + +#include +#include +#include +#include + +extern struct dma_mapping_ops ebus_dma_ops; +extern struct bus_type ebus_bus_type; + +struct ebus_dev { + char *name; + u64 unit_address; + struct of_device ofdev; +}; + +struct ebus_driver { + struct list_head node; + char *name; + struct of_device_id *id_table; + int (*probe) (struct ebus_dev *dev, const struct of_device_id *id); + int (*remove) (struct ebus_dev *dev); + unsigned long driver_data; + + struct device_driver driver; +}; + +int ebus_register_driver(struct ebus_driver *ebusdrv); +void ebus_unregister_driver(struct ebus_driver *ebusdrv); + +int ebus_request_irq(u32 ist, + irqreturn_t (*handler)(int, void*, struct pt_regs *), + unsigned long irq_flags, const char * devname, + void *dev_id); + +void ebus_free_irq(u32 ist, void *dev_id); + +static inline struct ebus_driver *to_ebus_driver(struct device_driver *drv) +{ + return container_of(drv, struct ebus_driver, driver); +} + +static inline struct ebus_dev *to_ebus_dev(struct device *dev) +{ + return container_of(dev, struct ebus_dev, ofdev.dev); +} + + +#endif /* _ASM_EBUS_H */ Index: linux-work/arch/powerpc/Kconfig =================================================================== --- linux-work.orig/arch/powerpc/Kconfig 2005-11-14 19:08:12.000000000 +1100 +++ linux-work/arch/powerpc/Kconfig 2005-11-14 19:08:14.000000000 +1100 @@ -384,6 +384,13 @@ bool default y +config 
IBMEBUS + depends on PPC_PSERIES + bool "Support for GX bus based adapters" + default y + help + Bus device driver for GX bus based adapters. + config PPC_MPC106 bool default n Index: linux-work/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-work.orig/arch/powerpc/platforms/pseries/Makefile 2005-11-14 19:08:12.000000000 +1100 +++ linux-work/arch/powerpc/platforms/pseries/Makefile 2005-11-14 19:08:14.000000000 +1100 @@ -4,4 +4,5 @@ obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o obj-$(CONFIG_SCANLOG) += scanlog.o -obj-$(CONFIG_EEH) += eeh.o eeh_event.o +obj-$(CONFIG_EEH) += eeh.o eeh_event.o +obj-$(CONFIG_IBMEBUS) += ebus.o From aswathavijay at gmail.com Mon Nov 14 20:48:47 2005 From: aswathavijay at gmail.com (Vijayakumar Ramalingam) Date: Mon, 14 Nov 2005 15:18:47 +0530 Subject: Steps in upgrading Linux kernel 2.6.7 to 64 bit kernel Message-ID: <5f87992f0511140148o4317d522tdd3c2014705b1c56@mail.gmail.com> Hi all, I want to build a 64 bit ppc kernel from ppc-linux kernel 2.6.7 I want to know how it can be done & what else to be taken care? Thanks Vijay From aswathavijay at gmail.com Mon Nov 14 20:55:56 2005 From: aswathavijay at gmail.com (Vijayakumar Ramalingam) Date: Mon, 14 Nov 2005 15:25:56 +0530 Subject: Steps in upgrading Linux kernel 2.6.7 to 64 bit kernel In-Reply-To: <5f87992f0511140148o4317d522tdd3c2014705b1c56@mail.gmail.com> References: <5f87992f0511140148o4317d522tdd3c2014705b1c56@mail.gmail.com> Message-ID: <5f87992f0511140155w31d6ab7fnb58d260d7b206567@mail.gmail.com> Hi all, I want to build a 64 bit ppc kernel from 32 bit ppc-linux kernel 2.6.7 I want to know how it can be done & what else to be taken care?
On 11/14/05, Vijayakumar Ramalingam wrote: > > Hi all, > I want to build a 64 bit ppc kernel from ppc-linux kernel 2.6.7 > I want to know how it can be done & what else to be taken care? > Thanks > Vijay > From paulus at samba.org Mon Nov 14 22:21:58 2005 From: paulus at samba.org (Paul Mackerras) Date: Mon, 14 Nov 2005 22:21:58 +1100 Subject: please pull powerpc-merge.git Message-ID: <17272.29526.477339.230738@cargo.ozlabs.ibm.com> Linus, Please do another pull from git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge.git I have moved over most of include/asm-ppc64 to include/asm-powerpc and almost everything from arch/ppc64 to arch/powerpc. We should have arch/ppc64 and include/asm-ppc64 empty in a few days. There are various other small fixes in there including some VDSO fixes from Ben. Thanks, Paul. Benjamin Herrenschmidt: powerpc: Always rebuild arch/powerpc/include/asm symlink powerpc: vdso fixes (take #2) powerpc: kill ppc64 rtc.c, use genrtc instead Heiko J Schick: powerpc: GX bus support on pSeries machines Michael Ellerman: powerpc: Merge page.h powerpc: Turn cpu_irq_down into kexec_cpu_down powerpc: Export htab start/end via device tree Paul Mackerras: powerpc: Move a bunch of ppc64 headers to include/asm-powerpc powerpc: Move most remaining ppc64 files over to arch/powerpc powerpc: Export a couple of prom functions powerpc: Mark PREP and embedded as broken for now powerpc: Fix 32-bit compile: PPC_MEMSTART was undeclared powerpc: Fix clearing of the FPSCR when invoking a signal handler powerpc: Remove an extraneous and incorrect declaration of pmac_nvram_init. powerpc: Remove __init from a function used in suspend/resume.
Stephen Rothwell: powerpc: make iSeries use generic virtual irq mapping powerpc: have only one definition of __irq_offset_value powerpc: iSeries build fixes arch/powerpc/Kconfig | 12 arch/powerpc/Makefile | 2 arch/powerpc/configs/pseries_defconfig | 206 ++- arch/powerpc/kernel/Makefile | 20 arch/powerpc/kernel/asm-offsets.c | 6 arch/powerpc/kernel/dma_64.c | 11 arch/powerpc/kernel/iomap.c | 0 arch/powerpc/kernel/iommu.c | 0 arch/powerpc/kernel/irq.c | 9 arch/powerpc/kernel/kprobes.c | 0 arch/powerpc/kernel/lparcfg.c | 51 - arch/powerpc/kernel/machine_kexec_64.c | 63 + arch/powerpc/kernel/module_64.c | 0 arch/powerpc/kernel/pci_64.c | 0 arch/powerpc/kernel/pci_direct_iommu.c | 0 arch/powerpc/kernel/pci_dn.c | 0 arch/powerpc/kernel/pci_iommu.c | 0 arch/powerpc/kernel/prom.c | 2 arch/powerpc/kernel/rtas-rtc.c | 105 + arch/powerpc/kernel/setup_32.c | 4 arch/powerpc/kernel/setup_64.c | 5 arch/powerpc/kernel/signal_32.c | 7 arch/powerpc/kernel/signal_64.c | 6 arch/powerpc/kernel/vdso32/datapage.S | 3 arch/powerpc/kernel/vdso32/gettimeofday.S | 12 arch/powerpc/kernel/vdso64/datapage.S | 1 arch/powerpc/kernel/vdso64/gettimeofday.S | 31 arch/powerpc/platforms/iseries/irq.c | 25 arch/powerpc/platforms/iseries/setup.c | 6 arch/powerpc/platforms/powermac/time.c | 9 arch/powerpc/platforms/pseries/Makefile | 6 arch/powerpc/platforms/pseries/ebus.c | 372 +++++ arch/powerpc/platforms/pseries/hvconsole.c | 0 arch/powerpc/platforms/pseries/hvcserver.c | 0 arch/powerpc/platforms/pseries/setup.c | 26 arch/ppc64/Kconfig | 520 ------- arch/ppc64/kernel/Makefile | 41 - arch/ppc64/kernel/asm-offsets.c | 195 --- arch/ppc64/kernel/btext.c | 792 ----------- arch/ppc64/kernel/head.S | 2007 --------------------------- arch/ppc64/kernel/misc.S | 940 ------------- arch/ppc64/kernel/ppc_ksyms.c | 76 - arch/ppc64/kernel/prom.c | 1956 --------------------------- arch/ppc64/kernel/prom_init.c | 2051 ---------------------------- arch/ppc64/kernel/rtc.c | 358 ----- arch/ppc64/kernel/semaphore.c | 136 
-- arch/ppc64/kernel/vdso.c | 625 --------- arch/ppc64/kernel/vmlinux.lds.S | 151 -- arch/ppc64/xmon/privinst.h | 64 - drivers/char/Kconfig | 2 include/asm-powerpc/btext.h | 0 include/asm-powerpc/delay.h | 19 include/asm-powerpc/ebus.h | 87 + include/asm-powerpc/eeh.h | 0 include/asm-powerpc/floppy.h | 25 include/asm-powerpc/hvconsole.h | 0 include/asm-powerpc/hvcserver.h | 0 include/asm-powerpc/kexec.h | 1 include/asm-powerpc/machdep.h | 4 include/asm-powerpc/nvram.h | 17 include/asm-powerpc/page.h | 179 ++ include/asm-powerpc/page_32.h | 40 + include/asm-powerpc/page_64.h | 174 ++ include/asm-powerpc/serial.h | 18 include/asm-powerpc/vdso_datapage.h | 2 include/asm-ppc/nvram.h | 73 - include/asm-ppc64/page.h | 328 ---- include/asm-ppc64/prom.h | 220 --- include/asm-ppc64/serial.h | 23 include/asm-ppc64/system.h | 310 ---- 70 files changed, 1356 insertions(+), 11078 deletions(-) rename arch/{ppc64/kernel/dma.c => powerpc/kernel/dma_64.c} (97%) rename arch/{ppc64/kernel/iomap.c => powerpc/kernel/iomap.c} (100%) rename arch/{ppc64/kernel/iommu.c => powerpc/kernel/iommu.c} (100%) rename arch/{ppc64/kernel/kprobes.c => powerpc/kernel/kprobes.c} (100%) rename arch/{ppc64/kernel/machine_kexec.c => powerpc/kernel/machine_kexec_64.c} (84%) rename arch/{ppc64/kernel/module.c => powerpc/kernel/module_64.c} (100%) rename arch/{ppc64/kernel/pci.c => powerpc/kernel/pci_64.c} (100%) rename arch/{ppc64/kernel/pci_direct_iommu.c => powerpc/kernel/pci_direct_iommu.c} (100%) rename arch/{ppc64/kernel/pci_dn.c => powerpc/kernel/pci_dn.c} (100%) rename arch/{ppc64/kernel/pci_iommu.c => powerpc/kernel/pci_iommu.c} (100%) create mode 100644 arch/powerpc/kernel/rtas-rtc.c create mode 100644 arch/powerpc/platforms/pseries/ebus.c rename arch/{ppc64/kernel/hvconsole.c => powerpc/platforms/pseries/hvconsole.c} (100%) rename arch/{ppc64/kernel/hvcserver.c => powerpc/platforms/pseries/hvcserver.c} (100%) delete mode 100644 arch/ppc64/Kconfig delete mode 100644 arch/ppc64/kernel/asm-offsets.c 
delete mode 100644 arch/ppc64/kernel/btext.c delete mode 100644 arch/ppc64/kernel/head.S delete mode 100644 arch/ppc64/kernel/misc.S delete mode 100644 arch/ppc64/kernel/ppc_ksyms.c delete mode 100644 arch/ppc64/kernel/prom.c delete mode 100644 arch/ppc64/kernel/prom_init.c delete mode 100644 arch/ppc64/kernel/rtc.c delete mode 100644 arch/ppc64/kernel/semaphore.c delete mode 100644 arch/ppc64/kernel/vdso.c delete mode 100644 arch/ppc64/kernel/vmlinux.lds.S delete mode 100644 arch/ppc64/xmon/privinst.h rename include/{asm-ppc64/btext.h => asm-powerpc/btext.h} (100%) rename include/{asm-ppc64/delay.h => asm-powerpc/delay.h} (71%) create mode 100644 include/asm-powerpc/ebus.h rename include/{asm-ppc64/eeh.h => asm-powerpc/eeh.h} (100%) rename include/{asm-ppc64/floppy.h => asm-powerpc/floppy.h} (90%) rename include/{asm-ppc64/hvconsole.h => asm-powerpc/hvconsole.h} (100%) rename include/{asm-ppc64/hvcserver.h => asm-powerpc/hvcserver.h} (100%) rename include/{asm-ppc64/nvram.h => asm-powerpc/nvram.h} (84%) create mode 100644 include/asm-powerpc/page.h create mode 100644 include/asm-powerpc/page_32.h create mode 100644 include/asm-powerpc/page_64.h create mode 100644 include/asm-powerpc/serial.h delete mode 100644 include/asm-ppc/nvram.h delete mode 100644 include/asm-ppc64/page.h delete mode 100644 include/asm-ppc64/prom.h delete mode 100644 include/asm-ppc64/serial.h delete mode 100644 include/asm-ppc64/system.h From hch at lst.de Mon Nov 14 22:56:47 2005 From: hch at lst.de (Christoph Hellwig) Date: Mon, 14 Nov 2005 12:56:47 +0100 Subject: [PATCH] GX bus support In-Reply-To: <1131956200.5504.157.camel@gaston> References: <1131956200.5504.157.camel@gaston> Message-ID: <20051114115647.GA6056@lst.de> On Mon, Nov 14, 2005 at 07:16:40PM +1100, Benjamin Herrenschmidt wrote: > This patch adds the necessary core bus support used by device drivers > that sit on the IBM GX bus on modern pSeries machines like the Galaxy > infiniband for example. 
It provide transparent DMA ops (the low level > driver works with virtual addresses directly) along with a simple bus > layer using the Open Firmware matching routines. Why is this called ebus in the source when you call it GX bus here? This is especially confusing as we already support an 'ebus' on the sparc port. > @@ -0,0 +1,372 @@ > +/* > + * IBM PowerPC eBus Infrastructure Support. > + * > + * Copyright (c) 2005 IBM Corporation > + * Heiko J Schick > + * > + * All rights reserved. > + * > + * This source code is distributed under a dual license of GPL v2.0 and OpenIB > + * BSD. > + * > + * OpenIB BSD License > + * > + * Redistribution and use in source and binary forms, with or without folks, can we please have everything under arch/powerpc/ just licensed under plain GPL? Especially as this is obviously a derived work of the kernel and could hardly be used anywhere else. > +static void *ebus_alloc_coherent(struct device *dev, > + size_t size, > + dma_addr_t *dma_handle, > + unsigned int __nocast flag) > +{ > + return NULL; this should be a kmalloc for consistency's sake. > +struct ebus_dev* __devinit ebus_register_device_node(struct device_node *dn) wrong placement of the '*' From hch at lst.de Mon Nov 14 22:57:52 2005 From: hch at lst.de (Christoph Hellwig) Date: Mon, 14 Nov 2005 12:57:52 +0100 Subject: please pull powerpc-merge.git In-Reply-To: <17272.29526.477339.230738@cargo.ozlabs.ibm.com> References: <17272.29526.477339.230738@cargo.ozlabs.ibm.com> Message-ID: <20051114115752.GB6056@lst.de> On Mon, Nov 14, 2005 at 10:21:58PM +1100, Paul Mackerras wrote: > Heiko J Schick: > powerpc: GX bus support on pSeries machines hey, please wait a while with this. it's a bit half-baked and you don't really expect people to review it in the enormous timespan of four hours, do you? 
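On the ebus_alloc_coherent() comment in the GX bus review above: the quoted patch's ebus_map_single() hands back the kernel virtual address as the DMA handle, so a kmalloc-backed allocator could plausibly return the same pointer in both roles. What follows is only a userspace sketch of that idea, not the driver's code: malloc() stands in for kmalloc(), and the names model_dma_addr_t, model_alloc_coherent and model_free_coherent are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's dma_addr_t. */
typedef uintptr_t model_dma_addr_t;

/*
 * Allocate a buffer and report its "bus address". On a bus where the
 * device works with virtual addresses directly, the handle is simply
 * the pointer value itself (assumption taken from ebus_map_single()).
 */
void *model_alloc_coherent(size_t size, model_dma_addr_t *dma_handle)
{
	void *vaddr;

	assert(dma_handle != NULL);

	vaddr = malloc(size);	/* kmalloc(size, flag) in kernel code */
	if (!vaddr)
		return NULL;

	*dma_handle = (model_dma_addr_t)vaddr;
	return vaddr;
}

/* Release a buffer obtained from model_alloc_coherent(). */
void model_free_coherent(void *vaddr)
{
	free(vaddr);		/* kfree(vaddr) in kernel code */
}
```

A caller would pass the returned handle to the device and keep the pointer for CPU access; in this model the two values are identical.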
From michael at ellerman.id.au Mon Nov 14 23:35:00 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Mon, 14 Nov 2005 23:35:00 +1100 (EST) Subject: [PATCH] powerpc: Merge kexec Message-ID: <20051114123500.178B468724@ozlabs.org> This patch merges, to some extent, the PPC32 and PPC64 kexec implementations. We adopt the PPC32 approach of having ppc_md callbacks for the kexec functions. The current PPC64 implementation becomes the "default" implementation for PPC64 which platforms can select if they need no special treatment. I've added these default callbacks to pseries/maple/cell/powermac; this means iSeries no longer supports kexec - but it never worked anyway. I've renamed PPC32's machine_kexec_simple to default_machine_kexec, in line with PPC64. Judging by the comments it might be better named machine_kexec_non_of, or something, but at the moment it's the only implementation for PPC32 so it's the "default". Kexec requires machine_shutdown(), which is in machine_kexec.c on PPC32, but we already have it in setup-common.c on powerpc. All this does is call ppc_md.nvram_sync, which only powermac implements, so instead make machine_shutdown a ppc_md member and have it call core99_nvram_sync directly on powermac. I've also stuck relocate_kernel.S into misc_32.S for powerpc. Built for ARCH=ppc, and 32 & 64 bit ARCH=powerpc, with KEXEC=y/n. Booted on P5 LPAR and successfully kexec'ed. Should apply on top of 493f25ef4087395891c99fcfe2c72e62e293e89f. 
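The callback approach described above can be modelled outside the kernel. The sketch below is an illustration only, with hypothetical names (model_kimage, model_machdep_calls, model_md): a generic entry point dispatches through an optional per-platform function pointer and fails with -ENOSYS when the platform installed none, which is the shape of machine_kexec_prepare() in the patch that follows.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kimage. */
struct model_kimage {
	unsigned long start;
};

/* Hypothetical stand-in for the kexec part of struct machdep_calls. */
struct model_machdep_calls {
	int (*machine_kexec_prepare)(struct model_kimage *image);
};

/* One global table, filled in by the platform setup code. */
struct model_machdep_calls model_md;

/* Generic entry point: use the platform hook if present, else fail. */
int model_kexec_prepare(struct model_kimage *image)
{
	if (model_md.machine_kexec_prepare)
		return model_md.machine_kexec_prepare(image);
	return -ENOSYS;
}

/* "Default" implementation a platform may select if it needs nothing special. */
int model_default_kexec_prepare(struct model_kimage *image)
{
	(void)image;	/* nothing to prepare in this model */
	return 0;
}
```

A platform that leaves the pointer NULL gets -ENOSYS from the generic function; one that assigns model_default_kexec_prepare opts in to the default behaviour.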
Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/Makefile | 5 - arch/powerpc/kernel/machine_kexec.c | 67 ++++++++++++++++++ arch/powerpc/kernel/machine_kexec_32.c | 65 ++++++++++++++++++ arch/powerpc/kernel/machine_kexec_64.c | 25 ------- arch/powerpc/kernel/misc_32.S | 113 ++++++++++++++++++++++++++++++++ arch/powerpc/kernel/setup-common.c | 4 - arch/powerpc/platforms/cell/setup.c | 5 + arch/powerpc/platforms/maple/setup.c | 5 + arch/powerpc/platforms/powermac/nvram.c | 1 arch/powerpc/platforms/powermac/setup.c | 5 + arch/powerpc/platforms/pseries/setup.c | 3 arch/ppc/kernel/Makefile | 1 include/asm-powerpc/kexec.h | 7 + include/asm-powerpc/machdep.h | 8 +- 14 files changed, 281 insertions(+), 33 deletions(-) Index: kexec/include/asm-powerpc/machdep.h =================================================================== --- kexec.orig/include/asm-powerpc/machdep.h +++ kexec/include/asm-powerpc/machdep.h @@ -27,6 +27,9 @@ struct device_node; struct iommu_table; struct rtc_time; struct file; +#ifdef CONFIG_KEXEC +struct kimage; +#endif #ifdef CONFIG_SMP struct smp_ops_t { @@ -207,14 +210,14 @@ struct machdep_calls { /* this is for modules, since _machine can be a define -- Cort */ int ppc_machine; +#endif /* CONFIG_PPC32 */ -#ifdef CONFIG_KEXEC /* Called to shutdown machine specific hardware not already controlled * by other drivers. - * XXX Should we move this one out of kexec scope? */ void (*machine_shutdown)(void); +#ifdef CONFIG_KEXEC /* Called to do the minimal shutdown needed to run a kexec'd kernel * to run successfully. * XXX Should we move this one out of kexec scope? 
@@ -237,7 +240,6 @@ struct machdep_calls { */ void (*machine_kexec)(struct kimage *image); #endif /* CONFIG_KEXEC */ -#endif /* CONFIG_PPC32 */ }; extern void default_idle(void); Index: kexec/arch/powerpc/kernel/machine_kexec.c =================================================================== --- /dev/null +++ kexec/arch/powerpc/kernel/machine_kexec.c @@ -0,0 +1,67 @@ +/* + * Code to handle transition of Linux booting another kernel. + * + * Copyright (C) 2002-2003 Eric Biederman + * GameCube/ppc32 port Copyright (C) 2004 Albert Herranz + * Copyright (C) 2005 IBM Corporation. + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + +#include +#include +#include +#include + +/* + * Provide a dummy crash_notes definition until crash dump is implemented. + * This prevents breakage of crash_notes attribute in kernel/ksysfs.c. + */ +note_buf_t crash_notes[NR_CPUS]; + +void machine_crash_shutdown(struct pt_regs *regs) +{ + if (ppc_md.machine_crash_shutdown) + ppc_md.machine_crash_shutdown(); +} + +/* + * Do what every setup is needed on image and the + * reboot code buffer to allow us to avoid allocations + * later. + */ +int machine_kexec_prepare(struct kimage *image) +{ + if (ppc_md.machine_kexec_prepare) + return ppc_md.machine_kexec_prepare(image); + /* + * Fail if platform doesn't provide its own machine_kexec_prepare + * implementation. + */ + return -ENOSYS; +} + +void machine_kexec_cleanup(struct kimage *image) +{ + if (ppc_md.machine_kexec_cleanup) + ppc_md.machine_kexec_cleanup(image); +} + +/* + * Do not allocate memory (or fail in any way) in machine_kexec(). + * We are past the point of no return, committed to rebooting now. + */ +NORET_TYPE void machine_kexec(struct kimage *image) +{ + if (ppc_md.machine_kexec) + ppc_md.machine_kexec(image); + else { + /* + * Fall back to normal restart if platform doesn't provide + * its own kexec function, and user insist to kexec... 
+ */ + machine_restart(NULL); + } + for(;;); +} Index: kexec/arch/powerpc/kernel/machine_kexec_32.c =================================================================== --- /dev/null +++ kexec/arch/powerpc/kernel/machine_kexec_32.c @@ -0,0 +1,65 @@ +/* + * PPC32 code to handle Linux booting another kernel. + * + * Copyright (C) 2002-2003 Eric Biederman + * GameCube/ppc32 port Copyright (C) 2004 Albert Herranz + * Copyright (C) 2005 IBM Corporation. + * + * This source code is licensed under the GNU General Public License, + * Version 2. See the file COPYING for more details. + */ + +#include +#include +#include +#include +#include +#include + +typedef NORET_TYPE void (*relocate_new_kernel_t)( + unsigned long indirection_page, + unsigned long reboot_code_buffer, + unsigned long start_address) ATTRIB_NORET; + +/* + * This is a generic machine_kexec function suitable at least for + * non-OpenFirmware embedded platforms. + * It merely copies the image relocation code to the control page and + * jumps to it. + * A platform specific function may just call this one. 
+ */ +void default_machine_kexec(struct kimage *image) +{ + const extern unsigned char relocate_new_kernel[]; + const extern unsigned int relocate_new_kernel_size; + unsigned long page_list; + unsigned long reboot_code_buffer, reboot_code_buffer_phys; + relocate_new_kernel_t rnk; + + /* Interrupts aren't acceptable while we reboot */ + local_irq_disable(); + + page_list = image->head; + + /* we need both effective and real address here */ + reboot_code_buffer = + (unsigned long)page_address(image->control_code_page); + reboot_code_buffer_phys = virt_to_phys((void *)reboot_code_buffer); + + /* copy our kernel relocation code to the control code page */ + memcpy((void *)reboot_code_buffer, relocate_new_kernel, + relocate_new_kernel_size); + + flush_icache_range(reboot_code_buffer, + reboot_code_buffer + KEXEC_CONTROL_CODE_SIZE); + printk(KERN_INFO "Bye!\n"); + + /* now call it */ + rnk = (relocate_new_kernel_t) reboot_code_buffer; + (*rnk)(page_list, reboot_code_buffer_phys, image->start); +} + +int default_machine_kexec_prepare(struct kimage *image) +{ + return 0; +} Index: kexec/arch/powerpc/platforms/cell/setup.c =================================================================== --- kexec.orig/arch/powerpc/platforms/cell/setup.c +++ kexec/arch/powerpc/platforms/cell/setup.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -138,4 +139,8 @@ struct machdep_calls __initdata cell_md .set_rtc_time = rtas_set_rtc_time, .calibrate_decr = generic_calibrate_decr, .progress = cell_progress, +#ifdef CONFIG_KEXEC + .machine_kexec = default_machine_kexec, + .machine_kexec_prepare = default_machine_kexec_prepare, +#endif }; Index: kexec/arch/powerpc/platforms/maple/setup.c =================================================================== --- kexec.orig/arch/powerpc/platforms/maple/setup.c +++ kexec/arch/powerpc/platforms/maple/setup.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include #include @@ -292,4 +293,8 @@ 
struct machdep_calls __initdata maple_md .calibrate_decr = generic_calibrate_decr, .progress = maple_progress, .idle_loop = native_idle, +#ifdef CONFIG_KEXEC + .machine_kexec = default_machine_kexec, + .machine_kexec_prepare = default_machine_kexec_prepare, +#endif }; Index: kexec/arch/powerpc/platforms/powermac/setup.c =================================================================== --- kexec.orig/arch/powerpc/platforms/powermac/setup.c +++ kexec/arch/powerpc/platforms/powermac/setup.c @@ -60,6 +60,7 @@ #include #include #include +#include #include #include #include @@ -773,7 +774,11 @@ struct machdep_calls __initdata pmac_md .pci_probe_mode = pmac_probe_mode, .idle_loop = native_idle, .enable_pmcs = power4_enable_pmcs, +#ifdef CONFIG_KEXEC + .machine_kexec = default_machine_kexec, + .machine_kexec_prepare = default_machine_kexec_prepare, #endif +#endif /* CONFIG_PPC64 */ #ifdef CONFIG_PPC32 .pcibios_enable_device_hook = pmac_pci_enable_device_hook, .pcibios_after_init = pmac_pcibios_after_init, Index: kexec/arch/powerpc/platforms/pseries/setup.c =================================================================== --- kexec.orig/arch/powerpc/platforms/pseries/setup.c +++ kexec/arch/powerpc/platforms/pseries/setup.c @@ -56,6 +56,7 @@ #include #include #include +#include #include #include #include "xics.h" @@ -638,5 +639,7 @@ struct machdep_calls __initdata pSeries_ .machine_check_exception = pSeries_machine_check_exception, #ifdef CONFIG_KEXEC .kexec_cpu_down = pseries_kexec_cpu_down, + .machine_kexec = default_machine_kexec, + .machine_kexec_prepare = default_machine_kexec_prepare, #endif }; Index: kexec/include/asm-powerpc/kexec.h =================================================================== --- kexec.orig/include/asm-powerpc/kexec.h +++ kexec/include/asm-powerpc/kexec.h @@ -41,10 +41,11 @@ extern note_buf_t crash_notes[]; extern void kexec_smp_wait(void); /* get and clear naca physid, wait for master to copy new code to 0 */ extern void __init 
kexec_setup(void); -#else -struct kimage; -extern void machine_kexec_simple(struct kimage *image); #endif +struct kimage; +extern void default_machine_kexec(struct kimage *image); +extern int default_machine_kexec_prepare(struct kimage *image); + #endif /* ! __ASSEMBLY__ */ #endif /* _ASM_POWERPC_KEXEC_H */ Index: kexec/arch/powerpc/kernel/Makefile =================================================================== --- kexec.orig/arch/powerpc/kernel/Makefile +++ kexec/arch/powerpc/kernel/Makefile @@ -64,8 +64,9 @@ pci64-$(CONFIG_PPC64) += pci_64.o pci_d pci_direct_iommu.o iomap.o obj-$(CONFIG_PCI) += $(pci64-y) -kexec64-$(CONFIG_PPC64) += machine_kexec_64.o -obj-$(CONFIG_KEXEC) += $(kexec64-y) +kexec-$(CONFIG_PPC64) := machine_kexec_64.o +kexec-$(CONFIG_PPC32) := machine_kexec_32.o +obj-$(CONFIG_KEXEC) += machine_kexec.o $(kexec-y) ifeq ($(CONFIG_PPC_ISERIES),y) $(obj)/head_64.o: $(obj)/lparmap.s Index: kexec/arch/powerpc/kernel/setup-common.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup-common.c +++ kexec/arch/powerpc/kernel/setup-common.c @@ -92,8 +92,8 @@ EXPORT_SYMBOL(ppc_do_canonicalize_irqs); /* also used by kexec */ void machine_shutdown(void) { - if (ppc_md.nvram_sync) - ppc_md.nvram_sync(); + if (ppc_md.machine_shutdown) + ppc_md.machine_shutdown(); } void machine_restart(char *cmd) Index: kexec/arch/powerpc/platforms/powermac/nvram.c =================================================================== --- kexec.orig/arch/powerpc/platforms/powermac/nvram.c +++ kexec/arch/powerpc/platforms/powermac/nvram.c @@ -549,6 +549,7 @@ static int __init core99_nvram_setup(str ppc_md.nvram_write = core99_nvram_write; ppc_md.nvram_size = core99_nvram_size; ppc_md.nvram_sync = core99_nvram_sync; + ppc_md.machine_shutdown = core99_nvram_sync; /* * Maybe we could be smarter here though making an exclusive list * of known flash chips is a bit nasty as older OF didn't provide us Index: 
kexec/arch/powerpc/kernel/misc_32.S =================================================================== --- kexec.orig/arch/powerpc/kernel/misc_32.S +++ kexec/arch/powerpc/kernel/misc_32.S @@ -5,6 +5,10 @@ * Largely rewritten by Cort Dougan (cort at cs.nmt.edu) * and Paul Mackerras. * + * kexec bits: + * Copyright (C) 2002-2003 Eric Biederman + * GameCube/ppc32 port Copyright (C) 2004 Albert Herranz + * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version @@ -24,6 +28,8 @@ #include #include #include +#include +#include .text @@ -1014,3 +1020,110 @@ _GLOBAL(execve) */ _GLOBAL(__main) blr + +#ifdef CONFIG_KEXEC + /* + * Must be relocatable PIC code callable as a C function. + */ + .globl relocate_new_kernel +relocate_new_kernel: + /* r3 = page_list */ + /* r4 = reboot_code_buffer */ + /* r5 = start_address */ + + li r0, 0 + + /* + * Set Machine Status Register to a known status, + * switch the MMU off and jump to 1: in a single step. + */ + + mr r8, r0 + ori r8, r8, MSR_RI|MSR_ME + mtspr SPRN_SRR1, r8 + addi r8, r4, 1f - relocate_new_kernel + mtspr SPRN_SRR0, r8 + sync + rfi + +1: + /* from this point address translation is turned off */ + /* and interrupts are disabled */ + + /* set a new stack at the bottom of our page... */ + /* (not really needed now) */ + addi r1, r4, KEXEC_CONTROL_CODE_SIZE - 8 /* for LR Save+Back Chain */ + stw r0, 0(r1) + + /* Do the copies */ + li r6, 0 /* checksum */ + mr r0, r3 + b 1f + +0: /* top, read another word for the indirection page */ + lwzu r0, 4(r3) + +1: + /* is it a destination page? (r8) */ + rlwinm. r7, r0, 0, 31, 31 /* IND_DESTINATION (1<<0) */ + beq 2f + + rlwinm r8, r0, 0, 0, 19 /* clear kexec flags, page align */ + b 0b + +2: /* is it an indirection page? (r3) */ + rlwinm. 
r7, r0, 0, 30, 30 /* IND_INDIRECTION (1<<1) */ + beq 2f + + rlwinm r3, r0, 0, 0, 19 /* clear kexec flags, page align */ + subi r3, r3, 4 + b 0b + +2: /* are we done? */ + rlwinm. r7, r0, 0, 29, 29 /* IND_DONE (1<<2) */ + beq 2f + b 3f + +2: /* is it a source page? (r9) */ + rlwinm. r7, r0, 0, 28, 28 /* IND_SOURCE (1<<3) */ + beq 0b + + rlwinm r9, r0, 0, 0, 19 /* clear kexec flags, page align */ + + li r7, PAGE_SIZE / 4 + mtctr r7 + subi r9, r9, 4 + subi r8, r8, 4 +9: + lwzu r0, 4(r9) /* do the copy */ + xor r6, r6, r0 + stwu r0, 4(r8) + dcbst 0, r8 + sync + icbi 0, r8 + bdnz 9b + + addi r9, r9, 4 + addi r8, r8, 4 + b 0b + +3: + + /* To be certain of avoiding problems with self-modifying code + * execute a serializing instruction here. + */ + isync + sync + + /* jump to the entry point, usually the setup routine */ + mtlr r5 + blrl + +1: b 1b + +relocate_new_kernel_end: + + .globl relocate_new_kernel_size +relocate_new_kernel_size: + .long relocate_new_kernel_end - relocate_new_kernel +#endif Index: kexec/arch/ppc/kernel/Makefile =================================================================== --- kexec.orig/arch/ppc/kernel/Makefile +++ kexec/arch/ppc/kernel/Makefile @@ -49,5 +49,4 @@ obj-$(CONFIG_TAU) += temp.o ifndef CONFIG_E200 obj-$(CONFIG_FSL_BOOKE) += perfmon_fsl_booke.o endif -obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o endif Index: kexec/arch/powerpc/kernel/machine_kexec_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/machine_kexec_64.c +++ kexec/arch/powerpc/kernel/machine_kexec_64.c @@ -1,5 +1,5 @@ /* - * machine_kexec.c - handle transition of Linux booting another kernel + * PPC64 code to handle Linux booting another kernel. * * Copyright (C) 2004-2005, IBM Corp. * @@ -28,21 +28,7 @@ #define HASH_GROUP_SIZE 0x80 /* size of each hash group, asm/mmu.h */ -/* Have this around till we move it into crash specific file */ -note_buf_t crash_notes[NR_CPUS]; - -/* Dummy for now. 
Not sure if we need to have a crash shutdown in here - * and if what it will achieve. Letting it be now to compile the code - * in generic kexec environment - */ -void machine_crash_shutdown(struct pt_regs *regs) -{ - /* do nothing right now */ - /* smp_relase_cpus() if we want smp on panic kernel */ - /* cpu_irq_down to isolate us until we are ready */ -} - -int machine_kexec_prepare(struct kimage *image) +int default_machine_kexec_prepare(struct kimage *image) { int i; unsigned long begin, end; /* limits of segment */ @@ -111,11 +97,6 @@ int machine_kexec_prepare(struct kimage return 0; } -void machine_kexec_cleanup(struct kimage *image) -{ - /* we do nothing in prepare that needs to be undone */ -} - #define IND_FLAGS (IND_DESTINATION | IND_INDIRECTION | IND_DONE | IND_SOURCE) static void copy_segments(unsigned long ind) @@ -283,7 +264,7 @@ extern NORET_TYPE void kexec_sequence(vo void (*clear_all)(void)) ATTRIB_NORET; /* too late to fail here */ -void machine_kexec(struct kimage *image) +void default_machine_kexec(struct kimage *image) { /* prepare control code if any */ From linas at austin.ibm.com Tue Nov 15 02:52:59 2005 From: linas at austin.ibm.com (linas) Date: Mon, 14 Nov 2005 09:52:59 -0600 Subject: Steps in upgrading Linux kernel 2.6.7 to 64 bit kernel In-Reply-To: <5f87992f0511140155w31d6ab7fnb58d260d7b206567@mail.gmail.com> References: <5f87992f0511140148o4317d522tdd3c2014705b1c56@mail.gmail.com> <5f87992f0511140155w31d6ab7fnb58d260d7b206567@mail.gmail.com> Message-ID: <20051114155259.GD19593@austin.ibm.com> On Mon, Nov 14, 2005 at 03:25:56PM +0530, Vijayakumar Ramalingam was heard to remark: > Hi all, > I want to build a 64 bit ppc kernel from 32 bit ppc-linux kernel 2.6.7 > I want to know how it can be done & what else to be taken care? Wouldn't it make more sense to build a 64-bit kernel from the 64-bit sources from a recent kernel? for example, 2.6.14 ?? 
--linas From miltonm at bga.com Tue Nov 15 04:20:38 2005 From: miltonm at bga.com (Milton Miller) Date: Mon, 14 Nov 2005 11:20:38 -0600 Subject: [PATCH] GX bus support Message-ID: <64a740f448eef69f738213f7ebfc0584@bga.com> On Mon Nov 14 19:16:40 EST 2005, Benjamin Herrenschmidt sent > This patch adds the necessary core bus support used by device drivers > that sit on the IBM GX bus on modern pSeries machines like the Galaxy > infiniband for example. It provide transparent DMA ops (the low level > driver works with virtual addresses directly) along with a simple bus > layer using the Open Firmware matching routines. > Index: linux-work/arch/ppc64/kernel/dma.c > =================================================================== > --- linux-work.orig/arch/ppc64/kernel/dma.c 2005-11-14 > 19:08:12.000000000 +1100 > +++ linux-work/arch/ppc64/kernel/dma.c 2005-11-14 19:08:14.000000000 > +1100 > @@ -10,6 +10,7 @@ > /* Include the busses we support */ > #include > #include > +#include > #include > #include > > @@ -23,6 +24,10 @@ > if (dev->bus == &vio_bus_type) > return &vio_dma_ops; > #endif > +#ifdef CONFIG_IBMEBUS > + if (dev->bus == &ebus_bus_type) > + return &ebus_dma_ops; > +#endif > return NULL; > } > > @@ -46,7 +51,11 @@ > #ifdef CONFIG_IBMVIO > if (dev->bus == &vio_bus_type) > return -EIO; > -#endif /* CONFIG_IBMVIO */ > +#endif > +#ifdef CONFIG_IBMEBUS > + if (dev->bus == &ebus_bus_type) > + return -EIO; > +#endif > BUG(); > return 0; > } > Index: linux-work/arch/powerpc/platforms/pseries/ebus.c > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux-work/arch/powerpc/platforms/pseries/ebus.c 2005-11-14 > 19:11:08.000000000 +1100 > @@ -0,0 +1,372 @@ > +/* > + * IBM PowerPC eBus Infrastructure Support. > + * > + * Copyright (c) 2005 IBM Corporation > + * Heiko J Schick > + * > + * All rights reserved. 
> + * > + * This source code is distributed under a dual license of GPL v2.0 > and OpenIB > + * BSD. > + * > + * OpenIB BSD License > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > are met: > + * > + * Redistributions of source code must retain the above copyright > notice, this > + * list of conditions and the following disclaimer. > + * > + * Redistributions in binary form must reproduce the above copyright > notice, > + * this list of conditions and the following disclaimer in the > documentation > + * and/or other materials > + * provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND > CONTRIBUTORS "AS IS" > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR > CONTRIBUTORS BE > + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT > OF > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR > + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > LIABILITY, WHETHER > + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR > OTHERWISE) > + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF > ADVISED OF THE > + * POSSIBILITY OF SUCH DAMAGE. 
> + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +static struct ebus_dev ebus_bus_device = { /* fake "parent" device */ > + .name = ebus_bus_device.ofdev.dev.bus_id, > + .ofdev.dev.bus_id = "ebus", > + .ofdev.dev.bus = &ebus_bus_type, > +}; > + > +static void *ebus_alloc_coherent(struct device *dev, > + size_t size, > + dma_addr_t *dma_handle, > + unsigned int __nocast flag) > +{ > + return NULL; > +} > + > +static void ebus_free_coherent(struct device *dev, > + size_t size, void *vaddr, dma_addr_t > dma_handle) > +{ > + return; > +} > + > +static dma_addr_t ebus_map_single(struct device *dev, > + void *ptr, > + size_t size, > + enum dma_data_direction direction) > +{ > + return (dma_addr_t)(ptr); > +} > + > +static void ebus_unmap_single(struct device *dev, > + dma_addr_t dma_addr, > + size_t size, enum dma_data_direction > direction) > +{ > + return; > +} > + > +static int ebus_map_sg(struct device *dev, > + struct scatterlist *sg, > + int nents, enum dma_data_direction direction) > +{ > + int i; > + > + for (i = 0; i < nents; i++) { > + sg[i].dma_address = > (dma_addr_t)page_address(sg[i].page) > + + sg[i].offset; > + } > + > + return nents; > +} > + > +static void ebus_unmap_sg(struct device *dev, > + struct scatterlist *sg, > + int nents, enum dma_data_direction direction) > +{ > + return; > +} > + > +static int ebus_dma_supported(struct device *dev, u64 mask) > +{ > + return 1; > +} > + > +struct dma_mapping_ops ebus_dma_ops = { > + .alloc_coherent = ebus_alloc_coherent, > + .free_coherent = ebus_free_coherent, > + .map_single = ebus_map_single, > + .unmap_single = ebus_unmap_single, > + .map_sg = ebus_map_sg, > + .unmap_sg = ebus_unmap_sg, > + .dma_supported = ebus_dma_supported, > +}; > + > +static int ebus_bus_probe(struct device *dev) > +{ > + struct ebus_dev *ebusdev = to_ebus_dev(dev); > + struct ebus_driver *ebusdrv = to_ebus_driver(dev->driver); > + const struct of_device_id *id; > + int error = 
-ENODEV; > + > + if (!ebusdrv->probe) > + return error; > + > + id = of_match_device(ebusdrv->id_table, &ebusdev->ofdev); > + if (id) { > + error = ebusdrv->probe(ebusdev, id); > + } > + > + return error; > +} > + > +static int ebus_bus_remove(struct device *dev) > +{ > + struct ebus_dev *ebusdev = to_ebus_dev(dev); > + struct ebus_driver *ebusdrv = to_ebus_driver(dev->driver); > + > + if (ebusdrv->remove) { > + return ebusdrv->remove(ebusdev); > + } > + > + return 1; > +} > + what does the 1 mean? just return 0. (hmm... bus remove should probably be void) > +static void __devinit ebus_dev_release(struct device *dev) > +{ > + of_node_put(dev->platform_data); to_ebus_dev(dev)->ofdev.node ? buses don't own platform_data > + kfree(to_ebus_dev(dev)); > +} > + > +static ssize_t ebusdev_show_name(struct device *dev, > + struct device_attribute *attr, char > *buf) > +{ > + return sprintf(buf, "%s\n", to_ebus_dev(dev)->name); > +} > +static DEVICE_ATTR(name, S_IRUSR | S_IRGRP | S_IROTH, > ebusdev_show_name, NULL); We need this because the id can be truncated I guess. We should also create a devspec file with the devicetree path like we do for pci devices. Then use the list of attributes helper. > + > +static struct ebus_dev* __devinit ebus_register_device_common( > + struct ebus_dev *ebusdev, char *name) > +{ > + ebusdev->name = name; > + ebusdev->ofdev.dev.parent = &ebus_bus_device.ofdev.dev; > + ebusdev->ofdev.dev.bus = &ebus_bus_type; > + ebusdev->ofdev.dev.release = ebus_dev_release; > + > + if (of_device_register(&ebusdev->ofdev) != 0) { wait, calling into another bus's functions? I thought we were a bus driver. If that is a helper it should be clearly marked, not be hidden in the middle of another bus's implementation and registration function. Looking at that function, checking if the linux,device is NULL or non-existent will give us use-once too. 
> + printk(KERN_ERR "%s: failed to register device %s\n", > + __FUNCTION__, ebusdev->ofdev.dev.bus_id); > + return NULL; > + } > + > + device_create_file(&ebusdev->ofdev.dev, &dev_attr_name); > + > + return ebusdev; > +} > + > +struct ebus_dev* __devinit ebus_register_device_node(struct > device_node *dn) > +{ > + struct ebus_dev *ebusdev; > + char *loc_code; > + int length; > + > + loc_code = (char *)get_property(dn, "ibm,loc-code", NULL); > + if (!loc_code) { > + printk(KERN_WARNING "%s: node %s missing > 'ibm,loc-code'\n", > + __FUNCTION__, dn->name ? dn->name : > ""); > + return NULL; > + } > + > + if (strlen(loc_code) == 0) { > + printk(KERN_WARNING "%s: 'ibm,loc-code' is invalid\n", > + __FUNCTION__); > + return NULL; > + } > + > + ebusdev = kmalloc(sizeof(struct ebus_dev), GFP_KERNEL); > + if (!ebusdev) { > + return NULL; > + } > + memset(ebusdev, 0, sizeof(struct ebus_dev)); > + > + ebusdev->ofdev.node = of_node_get(dn); > + > + length = strlen(loc_code); > + strncpy(ebusdev->ofdev.dev.bus_id, loc_code > + + (strlen(loc_code) - min(length, BUS_ID_SIZE)), > BUS_ID_SIZE); length - min(length, BUS_ID_SIZE) Hmm.. wonder if there are bugs if we actually use BUS_ID_SIZE characters? 
Maybe that min should be BUS_ID_SIZE-1 > + > + /* register with generic device framework */ > + if (ebus_register_device_common(ebusdev, dn->name) == NULL) { > leaking of_node reference > + kfree(ebusdev); > + return NULL; > + } > + > + return ebusdev; > +} > + > +static void probe_bus(char* name) > +{ > + struct device_node *dn = NULL; > + > + while ((dn = of_find_node_by_name(dn, name))) { > + ebus_register_device_node(dn); > + } > + > + of_node_put(dn); > +} > + > +static int ebus_unregister_device(struct device *dev) > +{ > + device_remove_file(dev, &dev_attr_name); > + of_device_unregister(to_of_device(dev)); > + > + return 0; > +} > + > +static int ebus_match_helper(struct device *dev, void *data) > +{ > + if (strcmp((char*)data, to_ebus_dev(dev)->name) == 0) > + return 1; So we add and remove on strncmp name instead of of_match_device like we bind the driver? > + > + return 0; > +} > + > +int ebus_register_driver(struct ebus_driver *ebusdrv) > +{ > + struct of_device_id *idt; > + struct device *dev; > + > + ebusdrv->driver.name = ebusdrv->name; > + ebusdrv->driver.bus = &ebus_bus_type; > + ebusdrv->driver.probe = ebus_bus_probe; > + ebusdrv->driver.remove = ebus_bus_remove; > + > + /* check if a driver for that device name is already loaded */ > + idt = ebusdrv->id_table; > + while (strlen(idt->name) > 0) { > + dev = bus_find_device(&ebus_bus_type, NULL, > (void*)idt->name, > + ebus_match_helper); > + if (dev) { > + printk(KERN_ERR > + "%s: driver for device name %s already > loaded\n", > + __FUNCTION__, idt->name); > + return -EPERM; > + } > + idt++; > + } > + > + idt = ebusdrv->id_table; > + while (strlen(idt->name) > 0) { > + probe_bus(idt->name); > + idt++; > + } Let's separate out this magic device discovery into a separate function. It is a logically separate concept. We really shouldn't need to verify the bus ids are unique, because the device registration will fail if the device shows up twice (kobject registration conflict). 
However, unloding either driver would remove the device from the matched driver... > + > + return driver_register(&ebusdrv->driver); > +} > +EXPORT_SYMBOL(ebus_register_driver); > + > +void ebus_unregister_driver(struct ebus_driver *ebusdrv) > +{ > + struct of_device_id *idt; > + struct device *dev; > + > + driver_unregister(&ebusdrv->driver); > + > + idt = ebusdrv->id_table; > + while (strlen(idt->name) > 0) { > + while ((dev = bus_find_device(&ebus_bus_type, NULL, > + (void*)idt->name, > + ebus_match_helper))) { > + ebus_unregister_device(dev); > + } > + idt++; > + > + } Again, device removal hidden in driver unregister ... seperate function please. Oh, and an extra blank line. > +} > +EXPORT_SYMBOL(ebus_unregister_driver); > + > +int ebus_request_irq(u32 ist, > + irqreturn_t (*handler)(int, void*, struct pt_regs > *), > + unsigned long irq_flags, const char * devname, > + void *dev_id) > +{ > + unsigned int irq = virt_irq_create_mapping(ist); > + > + if (irq == NO_IRQ) > + return -EINVAL; > + > + irq = irq_offset_up(irq); > + > + return request_irq(irq, handler, > + irq_flags, devname, dev_id); should we not at least pass in the ebus device, even if we don't check the irq requested against it today? > +} > +EXPORT_SYMBOL(ebus_request_irq); > + > +void ebus_free_irq(u32 ist, void *dev_id) > +{ > + unsigned int irq = virt_irq_create_mapping(ist); free_irq calls create_mapping ??? And doesn't check for NO_IRQ? how about we store the "linux" irq in ebusdevice->irq? 
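
A minimal userspace sketch of that last suggestion, under loud
assumptions: everything prefixed mock_ and every name ending in _sketch
is hypothetical and only stands in for virt_irq_create_mapping,
request_irq and free_irq so that the example compiles outside the
kernel. The irq_offset_up step and the handler arguments are left out.
The point is only that the request path caches the mapped irq in the
device, so the free path neither creates a second mapping nor frees an
irq that was never requested.

```c
#include <assert.h>

/* Userspace stand-ins for kernel interfaces (hypothetical, not kernel API). */
#define NO_IRQ      0u
#define MOCK_EINVAL 22

static unsigned int mock_mappings_created;  /* how often a mapping was made */
static unsigned int mock_irqs_requested;    /* irqs currently held */

static unsigned int mock_create_mapping(unsigned int ist)
{
	if (ist == 0)
		return NO_IRQ;          /* pretend source 0 cannot be mapped */
	mock_mappings_created++;
	return ist + 16;                /* arbitrary "linux" irq number */
}

static int mock_request_irq(unsigned int irq)
{
	(void)irq;
	mock_irqs_requested++;
	return 0;
}

static void mock_free_irq(unsigned int irq)
{
	(void)irq;
	mock_irqs_requested--;
}

/* Hypothetical device structure carrying the cached irq. */
struct ebus_dev_sketch {
	unsigned int irq;               /* NO_IRQ while nothing is held */
};

static int ebus_request_irq_sketch(struct ebus_dev_sketch *dev,
				   unsigned int ist)
{
	unsigned int irq = mock_create_mapping(ist);
	int rc;

	if (irq == NO_IRQ)
		return -MOCK_EINVAL;

	rc = mock_request_irq(irq);
	if (rc == 0)
		dev->irq = irq;         /* remember it for the free path */
	return rc;
}

static void ebus_free_irq_sketch(struct ebus_dev_sketch *dev)
{
	if (dev->irq == NO_IRQ)
		return;                 /* nothing was requested */
	mock_free_irq(dev->irq);        /* no second create_mapping call */
	dev->irq = NO_IRQ;
}
```

Passing the device into both calls also covers the earlier remark about
handing the ebus device to ebus_request_irq, even if nothing checks the
requested irq against it yet.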
> + > + irq = irq_offset_up(irq); > + free_irq(irq, dev_id); > + > + return; > +} > +EXPORT_SYMBOL(ebus_free_irq); > + > +static int ebus_bus_match(struct device *dev, struct device_driver > *drv) > +{ > + const struct ebus_dev *ebus_dev = to_ebus_dev(dev); > + struct ebus_driver *ebus_drv = to_ebus_driver(drv); > + const struct of_device_id *ids = ebus_drv->id_table; > + const struct of_device_id *found_id; > + > + if (!ids) > + return 0; > + > + found_id = of_match_device(ids, &ebus_dev->ofdev); > + if (found_id) > + return 1; > + > + return 0; > +} > + > +struct bus_type ebus_bus_type = { > + .name = "ebus", > + .match = ebus_bus_match, > +}; > +EXPORT_SYMBOL(ebus_bus_type); > + > +static int __init ebus_bus_init(void) > +{ > + int err; > + > + printk(KERN_INFO "eBus Device Driver\n"); > + > + err = bus_register(&ebus_bus_type); > + if (err) { > + printk(KERN_ERR "failed to register eBus\n"); > + return err; > + } > + > + err = device_register(&ebus_bus_device.ofdev.dev); > + if (err) { > + printk(KERN_WARNING "%s: device_register returned > %i\n", > + __FUNCTION__, err); > + return err; > + } If the device register fails we need to unregister the bus. > + > + return 0; > +} > +__initcall(ebus_bus_init); > Index: linux-work/include/asm-powerpc/ebus.h > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux-work/include/asm-powerpc/ebus.h 2005-11-14 > 19:08:14.000000000 +1100 > @@ -0,0 +1,87 @@ > +/* > + * IBM PowerPC eBus Infrastructure Support. > + * > + * Copyright (c) 2005 IBM Corporation > + * Heiko J Schick > + * > + * All rights reserved. > + * > + * This source code is distributed under a dual license of GPL v2.0 > and OpenIB > + * BSD. 
> + * > + * OpenIB BSD License > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > are met: > + * > + * Redistributions of source code must retain the above copyright > notice, this > + * list of conditions and the following disclaimer. > + * > + * Redistributions in binary form must reproduce the above copyright > notice, > + * this list of conditions and the following disclaimer in the > documentation > + * and/or other materials > + * provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND > CONTRIBUTORS "AS IS" > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR > CONTRIBUTORS BE > + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT > OF > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR > + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > LIABILITY, WHETHER > + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR > OTHERWISE) > + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF > ADVISED OF THE > + * POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#ifndef _ASM_EBUS_H > +#define _ASM_EBUS_H > + > +#include > +#include > +#include > +#include > + > +extern struct dma_mapping_ops ebus_dma_ops; > +extern struct bus_type ebus_bus_type; > + > +struct ebus_dev { > + char *name; > + u64 unit_address; > + struct of_device ofdev; > +}; unit_address isn't used anywhere ... 
> + > +struct ebus_driver { > + struct list_head node; > + char *name; > + struct of_device_id *id_table; > + int (*probe) (struct ebus_dev *dev, const struct of_device_id > *id); > + int (*remove) (struct ebus_dev *dev); > + unsigned long driver_data; Usually we would use the generic device's driver_data field. Oh, and add ebus_set_driver_data / ebus_get_driver_data > + > + struct device_driver driver; > +}; > + > +int ebus_register_driver(struct ebus_driver *ebusdrv); > +void ebus_unregister_driver(struct ebus_driver *ebusdrv); > + > +int ebus_request_irq(u32 ist, > + irqreturn_t (*handler)(int, void*, struct pt_regs > *), > + unsigned long irq_flags, const char * devname, > + void *dev_id); > + > +void ebus_free_irq(u32 ist, void *dev_id); > + > +static inline struct ebus_driver *to_ebus_driver(struct device_driver > *drv) > +{ > + return container_of(drv, struct ebus_driver, driver); > +} > + > +static inline struct ebus_dev *to_ebus_dev(struct device *dev) > +{ > + return container_of(dev, struct ebus_dev, ofdev.dev); > +} > + > + > +#endif /* _ASM_EBUS_H */ > Index: linux-work/arch/powerpc/Kconfig > =================================================================== > --- linux-work.orig/arch/powerpc/Kconfig 2005-11-14 > 19:08:12.000000000 +1100 > +++ linux-work/arch/powerpc/Kconfig 2005-11-14 19:08:14.000000000 > +1100 > @@ -384,6 +384,13 @@ > bool > default y > > +config IBMEBUS > + depends on PPC_PSERIES > + bool "Support for GX bus based adapters" > + default y > + help > + Bus device driver for GX bus based adapters. 
> +
>  config PPC_MPC106
>  	bool
>  	default n
> Index: linux-work/arch/powerpc/platforms/pseries/Makefile
> ===================================================================
> --- linux-work.orig/arch/powerpc/platforms/pseries/Makefile	2005-11-14 19:08:12.000000000 +1100
> +++ linux-work/arch/powerpc/platforms/pseries/Makefile	2005-11-14 19:08:14.000000000 +1100
> @@ -4,4 +4,5 @@
>  obj-$(CONFIG_IBMVIO)	+= vio.o
>  obj-$(CONFIG_XICS)	+= xics.o
>  obj-$(CONFIG_SCANLOG)	+= scanlog.o
> -obj-$(CONFIG_EEH)	+= eeh.o eeh_event.o
> +obj-$(CONFIG_EEH)	+= eeh.o eeh_event.o
> +obj-$(CONFIG_IBMEBUS)	+= ebus.o

From miltonm at bga.com  Tue Nov 15 05:23:06 2005
From: miltonm at bga.com (Milton Miller)
Date: Mon, 14 Nov 2005 12:23:06 -0600
Subject: [PATCH 0/8] powerpc: Kexec fixups and support for booting at 32MB
Message-ID: <3e8a578e38ed908cfa7ad1e55d1de681@bga.com>

I do have comments on this series, but have been slow to write them
down.

PS: Please copy me on kexec/kdump ppc patches so I can reply with
references. I read the lists from the web.

Quickly:

[PATCH 1/8] powerpc: Turn cpu_irq_down into kexec_cpu_down

(1) I was hoping to share code with cpu dlpar remove

(2) If we need a separate vpa call fine, but don't put an interrupt
controller compare in a function, we may need a separate pointer.
Since we won't (aren't likely to?) have vpa and mpic, I would settle
today for having vpa call xics if it was near the setup function.

[PATCH 5/8] powerpc: Add CONFIG_CRASH_DUMP

The __va change should be in
[PATCH 4/8] powerpc: Separate usage of KERNELBASE and PAGE_OFFSET

And should this not be last, since following patches are required to
get the kernel to work again? What, you need PHYSICAL_START for them?
Well, just #define it 0 for a bit in patch 4.

[PATCH 6/8] powerpc: Reroute interrupts from 0 + offset to
PHYSICAL_START + offset

The following should be in user space / device tree:
+#ifdef CONFIG_CRASH_DUMP
+	lmb_reserve(0, KDUMP_BACKUP_LIMIT);
+#endif

[PATCH 7/8] powerpc: Create a trampoline for the fwnmi vectors

I totally disagree with this one, especially reregistering with the low
address in the trampoline. The registration should be at the new
address. And a1, a2 are very generic names.

[PATCH 8/8] powerpc: Fixups for kernel linked at 32 MB

(1) powermac smp.c -- use create_branch
(2) The secondary hold code could be done as a 64 bit load in the first
0x100 bytes vs LOADADDR
(3) Why did you move LOAD_HANDLER down one instruction? It would seem
not to help optimization

milton

From haren at us.ibm.com  Tue Nov 15 06:01:35 2005
From: haren at us.ibm.com (Haren Myneni)
Date: Mon, 14 Nov 2005 11:01:35 -0800
Subject: [PATCH 0/8] powerpc: Kexec fixups and support for booting at 32MB
In-Reply-To: <3e8a578e38ed908cfa7ad1e55d1de681@bga.com>
References: <3e8a578e38ed908cfa7ad1e55d1de681@bga.com>
Message-ID: <4378DF0F.6000101@us.ibm.com>

Milton Miller wrote:

> I do have comments on this series, but have been slow to write them
> down.
>
> PS: Please copy me on kexec/kdump ppc patches so I can reply with
> references. I read the lists from the web.

Sure, will do that when we post kdump patches again.

>
>
> The following should be in user space / device tree:
> +#ifdef CONFIG_CRASH_DUMP
> +	lmb_reserve(0, KDUMP_BACKUP_LIMIT);
> +#endif
>

We are also thinking of reserving this region in kexec-tools. But the
question is what if the user wants to do an OF boot at 32MB. With the
current change as it is, we can achieve this. I am not sure of the use
myself, but just in case ...

Thanks
Haren

From becky.bruce at freescale.com  Tue Nov 15 06:53:19 2005
From: becky.bruce at freescale.com (Becky Bruce)
Date: Mon, 14 Nov 2005 13:53:19 -0600
Subject: [PATCH] powerpc: Merge align.c
In-Reply-To: <1131955237.5504.148.camel@gaston>
References: <1131955237.5504.148.camel@gaston>
Message-ID:

Ben,

I talked to Kumar about this a little bit (I had started a merge of
this file, but got distracted!) and he doesn't have any test cases.
I'll put something together and test this out on some of the 32-bit
systems I have here in my lab. It won't be complete, but it will be
something.......

Cheers,
B

On Nov 14, 2005, at 2:00 AM, Benjamin Herrenschmidt wrote:

> Need testing !!!
>
> This patch merges align.c, the result isn't quite what was in ppc64 nor
> what was in ppc32 :) It should implement all the functionalities of both
> though. Kumar, since you played with that in the past, I suppose you
> have some test cases for verifying that it works properly before I dig
> out the 601 machine ? :)
>
> Since it's likely that I won't be able to test all scenarios, code
> inspection is much welcome.
>
> Signed-off-by: Benjamin Herrenschmidt
>
>
> Index: linux-work/arch/powerpc/kernel/Makefile
> ===================================================================
> --- linux-work.orig/arch/powerpc/kernel/Makefile	2005-11-14 15:17:57.000000000 +1100
> +++ linux-work/arch/powerpc/kernel/Makefile	2005-11-14 17:18:14.000000000 +1100
> @@ -12,7 +12,7 @@
>  endif
>  
>  obj-y				:= semaphore.o cputable.o ptrace.o syscalls.o \
> -				   irq.o signal_32.o pmc.o vdso.o
> +				   irq.o align.o signal_32.o pmc.o vdso.o
>  obj-y				+= vdso32/
>  obj-$(CONFIG_PPC64)		+= setup_64.o binfmt_elf32.o sys_ppc32.o \
>  				   
signal_64.o ptrace32.o systbl.o \ > Index: linux-work/arch/powerpc/kernel/align.c > =================================================================== > --- /dev/null?? 1970-01-01 00:00:00.000000000 +0000 > +++ linux-work/arch/powerpc/kernel/align.c????? 2005-11-14 > 18:41:22.000000000 +1100 > @@ -0,0 +1,513 @@ > +/* align.c - handle alignment exceptions for the Power PC. > + * > + * Copyright (c) 1996 Paul Mackerras > + * Copyright (c) 1998-1999 TiVo, Inc. > + *?? PowerPC 403GCX modifications. > + * Copyright (c) 1999 Grant Erickson > + *?? PowerPC 403GCX/405GP modifications. > + * Copyright (c) 2001-2002 PPC64 team, IBM Corp > + *?? 64-bit and Power4 support > + * Copyright (c) 2005 Benjamin Herrenschmidt, IBM Corp > + *??????????????????? > + *?? Merge ppc32 and ppc64 implementations > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct aligninfo { > +?????? unsigned char len; > +?????? unsigned char flags; > +}; > + > +#define IS_XFORM(inst) (((inst) >> 26) == 31) > +#define IS_DSFORM(inst)??????? (((inst) >> 26) >= 56) > + > +#define INVALID??????? { 0, 0 } > + > +#define LD???? 1?????? /* load */ > +#define ST???? 2?????? /* store */ > +#define??????? SE????? 4?????? /* sign-extend value */ > +#define F????? 8?????? /* to/from fp regs */ > +#define U????? 0x10??? /* update index register */ > +#define M????? 0x20??? /* multiple load/store */ > +#define SW???? 0x40??? /* byte swap int or ... */ > +#define S????? 0x40??? /* ... single-precision fp */ > +#define SX???? 0x40??? /* byte count in XER */ > +#define HARD?? 0x80??? /* string, stwcx. */ > + > +#define DCBZ?? 0x5f??? 
/* 8xx/82xx dcbz faults when cache not enabled > */ > + > +#define SWAP(a, b)???? (t = (a), (a) = (b), (b) = t) > + > +/* > + * The PowerPC stores certain bits of the instruction that caused the > + * alignment exception in the DSISR register.? This array maps those > + * bits to information about the operand length and what the > + * instruction would do. > + */ > +static struct aligninfo aligninfo[128] = { > +?????? { 4, LD },????? ??????? /* 00 0 0000: lwz / lwarx */ > +?????? INVALID,??????? ??????? /* 00 0 0001 */ > +?????? { 4, ST },????? ??????? /* 00 0 0010: stw */ > +?????? INVALID,??????? ??????? /* 00 0 0011 */ > +?????? { 2, LD },????? ??????? /* 00 0 0100: lhz */ > +?????? { 2, LD+SE },?? ??????? /* 00 0 0101: lha */ > +?????? { 2, ST },????? ??????? /* 00 0 0110: sth */ > +?????? { 4, LD+M },??? ??????? /* 00 0 0111: lmw */ > +?????? { 4, LD+F+S },? ??????? /* 00 0 1000: lfs */ > +?????? { 8, LD+F },??? ??????? /* 00 0 1001: lfd */ > +?????? { 4, ST+F+S },? ??????? /* 00 0 1010: stfs */ > +?????? { 8, ST+F },??? ??????? /* 00 0 1011: stfd */ > +?????? INVALID,??????? ??????? /* 00 0 1100 */ > +?????? { 8, LD },????? ??????? /* 00 0 1101: ld/ldu/lwa */ > +?????? INVALID,??????? ??????? /* 00 0 1110 */ > +?????? { 8, ST },????? ??????? /* 00 0 1111: std/stdu */ > +?????? { 4, LD+U },??? ??????? /* 00 1 0000: lwzu */ > +?????? INVALID,??????? ??????? /* 00 1 0001 */ > +?????? { 4, ST+U },??? ??????? /* 00 1 0010: stwu */ > +?????? INVALID,??????? ??????? /* 00 1 0011 */ > +?????? { 2, LD+U },??? ??????? /* 00 1 0100: lhzu */ > +?????? { 2, LD+SE+U }, ??????? /* 00 1 0101: lhau */ > +?????? { 2, ST+U },??? ??????? /* 00 1 0110: sthu */ > +?????? { 4, ST+M },??? ??????? /* 00 1 0111: stmw */ > +?????? { 4, LD+F+S+U },??????? /* 00 1 1000: lfsu */ > +?????? { 8, LD+F+U },? ??????? /* 00 1 1001: lfdu */ > +?????? { 4, ST+F+S+U },??????? /* 00 1 1010: stfsu */ > +?????? { 8, ST+F+U },? ??????? /* 00 1 1011: stfdu */ > +?????? INVALID,??????? ??????? 
/* 00 1 1100 */ > +?????? INVALID,??????? ??????? /* 00 1 1101 */ > +?????? INVALID,??????? ??????? /* 00 1 1110 */ > +?????? INVALID,??????? ??????? /* 00 1 1111 */ > +?????? { 8, LD },????? ??????? /* 01 0 0000: ldx */ > +?????? INVALID,??????? ??????? /* 01 0 0001 */ > +?????? { 8, ST },????? ??????? /* 01 0 0010: stdx */ > +?????? INVALID,??????? ??????? /* 01 0 0011 */ > +?????? INVALID,??????? ??????? /* 01 0 0100 */ > +?????? { 4, LD+SE },?? ??????? /* 01 0 0101: lwax */ > +?????? INVALID,??????? ??????? /* 01 0 0110 */ > +?????? INVALID,??????? ??????? /* 01 0 0111 */ > +?????? { 4, LD+M+HARD+SX },??? /* 01 0 1000: lswx */ > +?????? { 4, LD+M+HARD },?????? /* 01 0 1001: lswi */ > +?????? { 4, ST+M+HARD+SX },??? /* 01 0 1010: stswx */ > +?????? { 4, ST+M+HARD },?????? /* 01 0 1011: stswi */ > +?????? INVALID,??????? ??????? /* 01 0 1100 */ > +?????? { 8, LD+U },??? ??????? /* 01 0 1101: ldu */ > +?????? INVALID,??????? ??????? /* 01 0 1110 */ > +?????? { 8, ST+U },??? ??????? /* 01 0 1111: stdu */ > +?????? { 8, LD+U },??? ??????? /* 01 1 0000: ldux */ > +?????? INVALID,??????? ??????? /* 01 1 0001 */ > +?????? { 8, ST+U },??? ??????? /* 01 1 0010: stdux */ > +?????? INVALID,??????? ??????? /* 01 1 0011 */ > +?????? INVALID,??????? ??????? /* 01 1 0100 */ > +?????? { 4, LD+SE+U }, ??????? /* 01 1 0101: lwaux */ > +?????? INVALID,??????? ??????? /* 01 1 0110 */ > +?????? INVALID,??????? ??????? /* 01 1 0111 */ > +?????? INVALID,??????? ??????? /* 01 1 1000 */ > +?????? INVALID,??????? ??????? /* 01 1 1001 */ > +?????? INVALID,??????? ??????? /* 01 1 1010 */ > +?????? INVALID,??????? ??????? /* 01 1 1011 */ > +?????? INVALID,??????? ??????? /* 01 1 1100 */ > +?????? INVALID,??????? ??????? /* 01 1 1101 */ > +?????? INVALID,??????? ??????? /* 01 1 1110 */ > +?????? INVALID,??????? ??????? /* 01 1 1111 */ > +?????? INVALID,??????? ??????? /* 10 0 0000 */ > +?????? INVALID,??????? ??????? /* 10 0 0001 */ > +?????? INVALID,??????? ??????? /* 10 0 0010: stwcx. 
*/ > +?????? INVALID,??????? ??????? /* 10 0 0011 */ > +?????? INVALID,??????? ??????? /* 10 0 0100 */ > +?????? INVALID,??????? ??????? /* 10 0 0101 */ > +?????? INVALID,??????? ??????? /* 10 0 0110 */ > +?????? INVALID,??????? ??????? /* 10 0 0111 */ > +?????? { 4, LD+SW },?? ??????? /* 10 0 1000: lwbrx */ > +?????? INVALID,??????? ??????? /* 10 0 1001 */ > +?????? { 4, ST+SW },?? ??????? /* 10 0 1010: stwbrx */ > +?????? INVALID,??????? ??????? /* 10 0 1011 */ > +?????? { 2, LD+SW },?? ??????? /* 10 0 1100: lhbrx */ > +?????? { 4, LD+SE },?? ??????? /* 10 0 1101? lwa */ > +?????? { 2, ST+SW },?? ??????? /* 10 0 1110: sthbrx */ > +?????? INVALID,??????? ??????? /* 10 0 1111 */ > +?????? INVALID,??????? ??????? /* 10 1 0000 */ > +?????? INVALID,??????? ??????? /* 10 1 0001 */ > +?????? INVALID,??????? ??????? /* 10 1 0010 */ > +?????? INVALID,??????? ??????? /* 10 1 0011 */ > +?????? INVALID,??????? ??????? /* 10 1 0100 */ > +?????? INVALID,??????? ??????? /* 10 1 0101 */ > +?????? INVALID,??????? ??????? /* 10 1 0110 */ > +?????? INVALID,??????? ??????? /* 10 1 0111 */ > +?????? INVALID,??????? ??????? /* 10 1 1000 */ > +?????? INVALID,??????? ??????? /* 10 1 1001 */ > +?????? INVALID,??????? ??????? /* 10 1 1010 */ > +?????? INVALID,??????? ??????? /* 10 1 1011 */ > +?????? INVALID,??????? ??????? /* 10 1 1100 */ > +?????? INVALID,??????? ??????? /* 10 1 1101 */ > +?????? INVALID,??????? ??????? /* 10 1 1110 */ > +?????? { 0, ST+HARD }, ??????? /* 10 1 1111: dcbz */ > +?????? { 4, LD },????? ??????? /* 11 0 0000: lwzx */ > +?????? INVALID,??????? ??????? /* 11 0 0001 */ > +?????? { 4, ST },????? ??????? /* 11 0 0010: stwx */ > +?????? INVALID,??????? ??????? /* 11 0 0011 */ > +?????? { 2, LD },????? ??????? /* 11 0 0100: lhzx */ > +?????? { 2, LD+SE },?? ??????? /* 11 0 0101: lhax */ > +?????? { 2, ST },????? ??????? /* 11 0 0110: sthx */ > +?????? INVALID,??????? ??????? /* 11 0 0111 */ > +?????? { 4, LD+F+S },? ??????? /* 11 0 1000: lfsx */ > +?????? 
{ 8, LD+F },??? ??????? /* 11 0 1001: lfdx */ > +?????? { 4, ST+F+S },? ??????? /* 11 0 1010: stfsx */ > +?????? { 8, ST+F },??? ??????? /* 11 0 1011: stfdx */ > +?????? INVALID,??????? ??????? /* 11 0 1100 */ > +?????? { 8, LD+M },??? ??????? /* 11 0 1101: lmd */ > +?????? INVALID,??????? ??????? /* 11 0 1110 */ > +?????? { 8, ST+M },??? ??????? /* 11 0 1111: stmd */ > +?????? { 4, LD+U },??? ??????? /* 11 1 0000: lwzux */ > +?????? INVALID,??????? ??????? /* 11 1 0001 */ > +?????? { 4, ST+U },??? ??????? /* 11 1 0010: stwux */ > +?????? INVALID,??????? ??????? /* 11 1 0011 */ > +?????? { 2, LD+U },??? ??????? /* 11 1 0100: lhzux */ > +?????? { 2, LD+SE+U }, ??????? /* 11 1 0101: lhaux */ > +?????? { 2, ST+U },??? ??????? /* 11 1 0110: sthux */ > +?????? INVALID,??????? ??????? /* 11 1 0111 */ > +?????? { 4, LD+F+S+U },??????? /* 11 1 1000: lfsux */ > +?????? { 8, LD+F+U },? ??????? /* 11 1 1001: lfdux */ > +?????? { 4, ST+F+S+U },??????? /* 11 1 1010: stfsux */ > +?????? { 8, ST+F+U },? ??????? /* 11 1 1011: stfdux */ > +?????? INVALID,??????? ??????? /* 11 1 1100 */ > +?????? INVALID,??????? ??????? /* 11 1 1101 */ > +?????? INVALID,??????? ??????? /* 11 1 1110 */ > +?????? INVALID,??????? ??????? /* 11 1 1111 */ > +}; > + > +/* > + * Create a DSISR value from the instruction > + */ > +static inline unsigned make_dsisr(unsigned instr) > +{ > +?????? unsigned dsisr; > + > + > +?????? /* bits? 6:15 --> 22:31 */ > +?????? dsisr = (instr & 0x03ff0000) >> 16; > + > +?????? if ( IS_XFORM(instr) ) { > +?????? ??????? /* bits 29:30 --> 15:16 */ > +?????? ??????? dsisr |= (instr & 0x00000006) << 14; > +?????? ??????? /* bit???? 25 -->??? 17 */ > +?????? ??????? dsisr |= (instr & 0x00000040) << 8; > +?????? ??????? /* bits 21:24 --> 18:21 */ > +?????? ??????? dsisr |= (instr & 0x00000780) << 3; > +?????? } > +?????? else { > +?????? ??????? /* bit????? 5 -->??? 17 */ > +?????? ??????? dsisr |= (instr & 0x04000000) >> 12; > +?????? ??????? /* bits? 
1: 4 --> 18:21 */ > +?????? ??????? dsisr |= (instr & 0x78000000) >> 17; > +?????? ??????? /* bits 30:31 --> 12:13 */ > +?????? ??????? if ( IS_DSFORM(instr) ) > +?????? ??????? ??????? dsisr |= (instr & 0x00000003) << 18; > +?????? } > + > +?????? return dsisr; > +} > + > +/* > + * The dcbz (data cache block zero) instruction > + * gives an alignment fault if used on non-cacheable > + * memory.? We handle the fault mainly for the > + * case when we are running with the cache disabled > + * for debugging. > + */ > +static int emulate_dcbz(struct pt_regs *regs, unsigned char __user > *addr) > +{ > +?????? long __user *p; > +?????? int i, size; > + > +#ifdef __powerpc64__ > +?????? size = ppc64_caches.dline_size; > +#else > +?????? size = L1_CACHE_BYTES; > +#endif > +?????? p = (long __user *) (regs->dar & -size); > +?????? if (user_mode(regs) && !access_ok(VERIFY_WRITE, p, size)) > +?????? ??????? return -EFAULT; > +?????? for (i = 0; i < size / sizeof(long); ++i) > +?????? ??????? if (__put_user(0, p+i)) > +?????? ??????? ??????? return -EFAULT; > +?????? return 1; > +} > + > +/* > + * Emulate load & store multiple instructions > + */ > +static int emulate_multiple(struct pt_regs *regs, unsigned char > __user *addr, > +?????? ??????? ??????? ??? unsigned int reg, unsigned int nb, > +?????? ??????? ??????? ??? unsigned int flags, unsigned int instr) > +{ > +?????? unsigned char *rptr; > +?????? int nb0, i; > + > +?????? /* > +??????? * We do not try to emulate 8 bytes multiple as they aren't > really > +??????? * available in our operating environments and we don't try to > +??????? * emulate multiples operations in kernel land as they should > never > +??????? * be used/generated there at least not on unaligned boundaries > +??????? */ > +?????? if (unlikely((nb > 4) || !user_mode(regs))) > +?????? ??????? return 0; > + > +?????? /* lmw, stmw, lswi/x, stswi/x */ > +?????? nb0 = 0; > +?????? if (flags & HARD) { > +?????? ??????? if (flags & SX) { > +?????? ??????? 
??????? nb = regs->xer & 127; > +?????? ??????? ??????? if (nb == 0) > +?????? ??????? ??????? ??????? return 1; > +?????? ??????? } else { > +?????? ??????? ??????? if (__get_user(instr, > +?????? ??????? ??????? ??????? ?????? (unsigned int __user > *)regs->nip)) > +?????? ??????? ??????? ??????? return -EFAULT; > +?????? ??????? ??????? nb = (instr >> 11) & 0x1f; > +?????? ??????? ??????? if (nb == 0) > +?????? ??????? ??????? ??????? nb = 32; > +?????? ??????? } > +?????? ??????? if (nb + reg * 4 > 128) { > +?????? ??????? ??????? nb0 = nb + reg * 4 - 128; > +?????? ??????? ??????? nb = 128 - reg * 4; > +?????? ??????? } > +?????? } else { > +?????? ??????? /* lwm, stmw */ > +?????? ??????? nb = (32 - reg) * 4; > +?????? } > + > +?????? if (!access_ok((flags & ST ? VERIFY_WRITE: VERIFY_READ), addr, > nb+nb0)) > +?????? ??????? return -EFAULT; /* bad address */ > + > +?????? rptr = (unsigned char *) ®s->gpr[reg]; > +?????? if (flags & LD) { > +?????? ??????? for (i = 0; i < nb; ++i) > +?????? ??????? ??????? if (__get_user(rptr[i], addr + i)) > +?????? ??????? ??????? ??????? return -EFAULT; > +?????? ??????? if (nb0 > 0) { > +?????? ??????? ??????? rptr = (unsigned char *) ®s->gpr[0]; > +?????? ??????? ??????? addr += nb; > +?????? ??????? ??????? for (i = 0; i < nb0; ++i) > +?????? ??????? ??????? ??????? if (__get_user(rptr[i], addr + i)) > +?????? ??????? ??????? ??????? ??????? return -EFAULT; > +?????? ??????? } > +?????? ??????? for (; (i & 3) != 0; ++i) > +?????? ??????? ??????? rptr[i] = 0; > +?????? } else { > +?????? ??????? for (i = 0; i < nb; ++i) > +?????? ??????? ??????? if (__put_user(rptr[i], addr + i)) > +?????? ??????? ??????? ??????? return -EFAULT; > +?????? ??????? if (nb0 > 0) { > +?????? ??????? ??????? rptr = (unsigned char *) ®s->gpr[0]; > +?????? ??????? ??????? addr += nb; > +?????? ??????? ??????? for (i = 0; i < nb0; ++i) > +?????? ??????? ??????? ??????? if (__put_user(rptr[i], addr + i)) > +?????? ??????? ??????? ??????? ??????? 
return -EFAULT; > +?????? ??????? } > +?????? } > +?????? return 1; > +} > + > + > +/* > + * Called on alignment exception. Attempts to fixup > + * > + * Return 1 on success > + * Return 0 if unable to handle the interrupt > + * Return -EFAULT if data address is bad > + */ > + > +int fix_alignment(struct pt_regs *regs) > +{ > +?????? unsigned int instr, nb, flags; > +?????? unsigned int reg, areg; > +?????? unsigned int dsisr; > +?????? unsigned char __user *addr; > +?????? unsigned char __user *p; > +?????? int ret, t; > +?????? union { > +?????? ??????? long ll; > +?????? ??????? double dd; > +?????? ??????? unsigned char v[8]; > +?????? ??????? struct { > +?????? ??????? ??????? unsigned hi32; > +?????? ??????? ??????? int????? low32; > +?????? ??????? } x32; > +?????? ??????? struct { > +?????? ??????? ??????? unsigned char hi48[6]; > +?????? ??????? ??????? short?? ????? low16; > +?????? ??????? } x16; > +?????? } data; > + > +?????? /* > +??????? * We require a complete register set, if not, then our > assembly > +??????? * is broken > +??????? */ > +?????? CHECK_FULL_REGS(regs); > + > +?????? dsisr = regs->dsisr; > + > +?????? /* Some processors don't provide us with a DSISR we can use > here, > +??????? * let's make one up from the instruction > +??????? */ > +?????? if (cpu_has_feature(CPU_FTR_NODSISRALIGN)) { > +?????? ??????? unsigned int real_instr; > +?????? ??????? if (unlikely(__get_user(real_instr, > +?????? ??????? ??????? ??????? ??????? (unsigned int __user > *)regs->nip))) > +?????? ??????? ??????? return -EFAULT; > +?????? ??????? dsisr = make_dsisr(real_instr); > +?????? } > + > +?????? /* extract the operation and registers from the dsisr */ > +?????? reg = (dsisr >> 5) & 0x1f;????? /* source/dest register */ > +?????? areg = dsisr & 0x1f;??? ??????? /* register to update */ > +?????? instr = (dsisr >> 10) & 0x7f; > +?????? instr |= (dsisr >> 13) & 0x60; > + > +?????? /* Lookup the operation in our table */ > +?????? 
nb = aligninfo[instr].len; > +?????? flags = aligninfo[instr].flags; > + > +?????? /* DAR has the operand effective address */ > +?????? addr = (unsigned char __user *)regs->dar; > + > +?????? /* A size of 0 indicates an instruction we don't support, with > +??????? * the exception of DCBZ which is handled as a special case > here > +??????? */ > +?????? if (instr == DCBZ) > +?????? ??????? return emulate_dcbz(regs, addr); > +?????? if (unlikely(nb == 0)) > +?????? ??????? return 0; > + > +?????? /* Load/Store Multiple instructions are handled in their own > +??????? * function > +??????? */ > +?????? if (flags & M) > +?????? ??????? return emulate_multiple(regs, addr, reg, nb, flags, > instr); > + > +?????? /* Verify the address of the operand */ > +?????? if (unlikely(user_mode(regs) && > +?????? ??????? ???? !access_ok((flags & ST ? VERIFY_WRITE : > VERIFY_READ), > +?????? ??????? ??????? ??????? addr, nb))) > +?????? ??????? return -EFAULT; > + > +?????? /* Force the fprs into the save area so we can reference them > */ > +?????? if (flags & F) { > +?????? ??????? /* userland only */ > +?????? ??????? if (unlikely(!user_mode(regs))) > +?????? ??????? ??????? return 0; > +?????? ??????? flush_fp_to_thread(current); > +?????? } > + > +?????? /* If we are loading, get the data from user space, else > +??????? * get it from register values > +??????? */ > +?????? if (flags & LD) { > +?????? ??????? data.ll = 0; > +?????? ??????? ret = 0; > +?????? ??????? p = addr; > +?????? ??????? switch (nb) { > +?????? ??????? case 8: > +?????? ??????? ??????? ret |= __get_user(data.v[0], p++); > +?????? ??????? ??????? ret |= __get_user(data.v[1], p++); > +?????? ??????? ??????? ret |= __get_user(data.v[2], p++); > +?????? ??????? ??????? ret |= __get_user(data.v[3], p++); > +?????? ??????? case 4: > +?????? ??????? ??????? ret |= __get_user(data.v[4], p++); > +?????? ??????? ??????? ret |= __get_user(data.v[5], p++); > +?????? ??????? case 2: > +?????? ??????? ??????? 
ret |= __get_user(data.v[6], p++); > + ret |= __get_user(data.v[7], p++); > + if (unlikely(ret)) > + return -EFAULT; > + } > + } else if (flags & F) > + data.dd = current->thread.fpr[reg]; > + else > + data.ll = regs->gpr[reg]; > + > + /* Perform other misc operations like sign extension, byteswap, > + * or floating point single precision conversion > + */ > + switch (flags & ~U) { > + case LD+SE: /* sign extend */ > + if ( nb == 2 ) > + data.ll = data.x16.low16; > + else /* nb must be 4 */ > + data.ll = data.x32.low32; > + break; > + case LD+S: /* byte-swap */ > + case ST+S: > + if (nb == 2) { > + SWAP(data.v[6], data.v[7]); > + } else { > + SWAP(data.v[4], data.v[7]); > + SWAP(data.v[5], data.v[6]); > + } > + break; > + > + /* Single-precision FP load and store require conversions... */ > + case LD+F+S: > +#ifdef CONFIG_PPC_FPU > + preempt_disable(); > + enable_kernel_fp(); > + cvt_fd((float *)&data.v[4], &data.dd, > &current->thread); > + preempt_enable(); > +#else > + return 0; > +#endif > + break; > + case ST+F+S: > +#ifdef CONFIG_PPC_FPU > + preempt_disable(); > + enable_kernel_fp(); > + cvt_df(&data.dd, (float *)&data.v[4], > &current->thread); > + preempt_enable(); > +#else > + return 0; > +#endif > + break; > + } > + > + /* Store result to memory or update registers */ > + if (flags & ST) { > + ret = 0; > + p = addr; > +
switch (nb) { > + case 8: > + ret |= __put_user(data.v[0], p++); > + ret |= __put_user(data.v[1], p++); > + ret |= __put_user(data.v[2], p++); > + ret |= __put_user(data.v[3], p++); > + case 4: > + ret |= __put_user(data.v[4], p++); > + ret |= __put_user(data.v[5], p++); > + case 2: > + ret |= __put_user(data.v[6], p++); > + ret |= __put_user(data.v[7], p++); > + } > + if (unlikely(ret)) > + return -EFAULT; > + } else if (flags & F) > + current->thread.fpr[reg] = data.dd; > + else > + regs->gpr[reg] = data.ll; > + > + /* Update RA as needed */ > + if (flags & U) > + regs->gpr[areg] = regs->dar; > + > + return 1; > +} > Index: linux-work/arch/ppc/kernel/Makefile > =================================================================== > --- linux-work.orig/arch/ppc/kernel/Makefile 2005-11-11 > 10:14:48.000000000 +1100 > +++ linux-work/arch/ppc/kernel/Makefile 2005-11-14 18:42:30.000000000 > +1100 > @@ -13,7 +13,7 @@ > extra-y += vmlinux.lds > > obj-y := entry.o traps.o idle.o time.o > misc.o \ > - process.o align.o \ > + process.o \ > setup.o \ > ppc_htab.o > obj-$(CONFIG_6xx) += l2cr.o cpu_setup_6xx.o > Index: linux-work/arch/ppc64/kernel/Makefile > =================================================================== > --- linux-work.orig/arch/ppc64/kernel/Makefile 2005-11-14 > 15:20:05.000000000 +1100 > +++ linux-work/arch/ppc64/kernel/Makefile
2005-11-14 > 18:42:12.000000000 +1100 > @@ -11,9 +11,7 @@ > > endif > > -obj-y += idle.o dma.o \ > - align.o \ > - iommu.o > +obj-y += idle.o dma.o iommu.o > > pci-obj-$(CONFIG_PPC_MULTIPLATFORM) += pci_dn.o pci_direct_iommu.o > > Index: linux-work/include/asm-powerpc/cputable.h > =================================================================== > --- linux-work.orig/include/asm-powerpc/cputable.h 2005-11-11 > 10:14:49.000000000 +1100 > +++ linux-work/include/asm-powerpc/cputable.h 2005-11-14 > 18:33:42.000000000 +1100 > @@ -90,6 +90,7 @@ > #define CPU_FTR_NEED_COHERENT ASM_CONST(0x0000000000020000) > #define CPU_FTR_NO_BTIC > ASM_CONST(0x0000000000040000) > #define CPU_FTR_BIG_PHYS ASM_CONST(0x0000000000080000) > +#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0000000000100000) > > #ifdef __powerpc64__ > /* Add the 64b processor unique features in the top half of the word > */ > @@ -97,7 +98,6 @@ > #define CPU_FTR_16M_PAGE ASM_CONST(0x0000000200000000) > #define CPU_FTR_TLBIEL > ASM_CONST(0x0000000400000000) > #define CPU_FTR_NOEXECUTE ASM_CONST(0x0000000800000000) > -#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0000001000000000) > #define CPU_FTR_IABR ASM_CONST(0x0000002000000000) > #define CPU_FTR_MMCRA > ASM_CONST(0x0000004000000000) > #define CPU_FTR_CTRL ASM_CONST(0x0000008000000000) > @@ -113,7 +113,6 @@ > #define CPU_FTR_16M_PAGE ASM_CONST(0x0) > #define CPU_FTR_TLBIEL ASM_CONST(0x0) > #define CPU_FTR_NOEXECUTE ASM_CONST(0x0) > -#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0) > #define CPU_FTR_IABR ASM_CONST(0x0) > #define CPU_FTR_MMCRA ASM_CONST(0x0) > #define CPU_FTR_CTRL
ASM_CONST(0x0) > @@ -273,18 +272,21 @@ > CPU_FTRS_POWER3_32 = CPU_FTR_COMMON | CPU_FTR_SPLIT_ID_CACHE | > CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE, > CPU_FTRS_POWER4_32 = CPU_FTR_COMMON | CPU_FTR_SPLIT_ID_CACHE | > - CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE, > + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_NODSISRALIGN, > CPU_FTRS_970_32 = CPU_FTR_COMMON | CPU_FTR_SPLIT_ID_CACHE | > CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_ALTIVEC_COMP > | > - CPU_FTR_MAYBE_CAN_NAP, > + CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_NODSISRALIGN, > CPU_FTRS_8XX = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, > - CPU_FTRS_40X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, > - CPU_FTRS_44X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, > - CPU_FTRS_E200 = CPU_FTR_USE_TB, > - CPU_FTRS_E500 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, > + CPU_FTRS_40X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | > + CPU_FTR_NODSISRALIGN, > + CPU_FTRS_44X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | > + CPU_FTR_NODSISRALIGN, > + CPU_FTRS_E200 = CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN, > + CPU_FTRS_E500 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | > + CPU_FTR_NODSISRALIGN, > CPU_FTRS_E500_2 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | > - CPU_FTR_BIG_PHYS, > - CPU_FTRS_GENERIC_32 = CPU_FTR_COMMON, > + CPU_FTR_BIG_PHYS | CPU_FTR_NODSISRALIGN, > + CPU_FTRS_GENERIC_32 = CPU_FTR_COMMON | CPU_FTR_NODSISRALIGN, > #ifdef __powerpc64__ > CPU_FTRS_POWER3 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | >
CPU_FTR_HPTE_TABLE | CPU_FTR_IABR, > > > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev at ozlabs.org > https://ozlabs.org/mailman/listinfo/linuxppc-dev From benh at kernel.crashing.org Tue Nov 15 07:55:18 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 15 Nov 2005 07:55:18 +1100 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: References: <1131955237.5504.148.camel@gaston> Message-ID: <1132001719.5504.204.camel@gaston> On Mon, 2005-11-14 at 13:53 -0600, Becky Bruce wrote: > Ben, > > I talked to Kumar about this a little bit (I had started a merge of > this file, but got distracted!) and he doesn't have any test cases. > I'll put something together and test this out on some of the 32-bit > systems I have here in my lab. It won't be complete, but it will be > something....... Thanks, Ben. From paulus at samba.org Tue Nov 15 08:43:17 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 15 Nov 2005 08:43:17 +1100 Subject: please pull powerpc-merge.git In-Reply-To: <20051114115752.GB6056@lst.de> References: <17272.29526.477339.230738@cargo.ozlabs.ibm.com> <20051114115752.GB6056@lst.de> Message-ID: <17273.1269.366017.519396@cargo.ozlabs.ibm.com> Christoph Hellwig writes: > hey, please wait a while with this. it's a bit half-baked and you > don't really expect people to review it in the enormous timespan of > four hours, do you? *shrug* It doesn't affect any existing stuff, and I thought it looked OK. Paul. From dwmw2 at infradead.org Tue Nov 15 10:00:09 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Mon, 14 Nov 2005 23:00:09 +0000 Subject: [PATCH] powerpc: Merge kexec In-Reply-To: <20051114123500.178B468724@ozlabs.org> References: <20051114123500.178B468724@ozlabs.org> Message-ID: <1132009209.28963.17.camel@baythorne.infradead.org> On Mon, 2005-11-14 at 23:35 +1100, Michael Ellerman wrote: > +#ifdef CONFIG_KEXEC > +struct kimage; > +#endif No need for this to be inside an #ifdef.
-- dwmw2 From galak at gate.crashing.org Tue Nov 15 10:02:39 2005 From: galak at gate.crashing.org (Kumar Gala) Date: Mon, 14 Nov 2005 17:02:39 -0600 (CST) Subject: asm-ppc/page.h vs asm-powerpc/page.h (fwd) Message-ID: ---------- Forwarded message ---------- Date: Mon, 14 Nov 2005 17:01:43 -0600 (CST) From: Kumar Gala To: Paul Mackerras Cc: michael at ellerman.id.au, linuxppc-dev at ozlabs.org, linuxppc64 at gate.crashing.org Subject: asm-ppc/page.h vs asm-powerpc/page.h Guys, what's going on here. Why haven't we removed asm-ppc/page.h? When I build ARCH=powerpc I get asm-powerpc/page.h, which doesn't build on 85xx since page_to_virt is missing. Any reason we are keeping around asm-ppc/page.h and causing this confusion? - kumar From paulus at samba.org Tue Nov 15 10:17:49 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 15 Nov 2005 10:17:49 +1100 Subject: asm-ppc/page.h vs asm-powerpc/page.h (fwd) In-Reply-To: References: Message-ID: <17273.6941.128698.895628@cargo.ozlabs.ibm.com> Kumar Gala writes: > Guys, what's going on here. > > Why haven't we removed asm-ppc/page.h? I decided that all the APUS special-case stuff was too ugly to live, and that there had to be a better way, so I took it out of the merged page_32.h. Because of that, and because I didn't want to take the risk of breaking any other existing ARCH=ppc ports, I left the existing asm-ppc/page.h in the tree. > When I build ARCH=powerpc I get asm-powerpc/page.h, which doesn't build on > 85xx since page_to_virt is missing. Ah... well, send me a patch to add page_to_virt. :) Paul.
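For readers following the thread: page_to_virt() turns a struct page pointer into the kernel virtual address of the page it describes, going through the page frame number and the linear mapping. What follows is a minimal userspace sketch of that arithmetic, not the kernel's definitions. It assumes 4 KiB pages, a linear mapping that starts at 0xC0000000, and a flat mem_map array; the constants, the NUM_PAGES limit, and the lower-case va()/pa() helpers are illustrative assumptions, and addresses are plain unsigned long values here rather than pointers.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT   12             /* assumption: 4 KiB pages */
#define PAGE_OFFSET  0xC0000000UL   /* assumption: start of the linear mapping */
#define NUM_PAGES    16             /* assumption: toy amount of "RAM" */

struct page { int unused; };

/* Toy flat memory map: entry N describes page frame N. */
static struct page mem_map[NUM_PAGES];

/* Page frame number of a page descriptor: its index in mem_map. */
static unsigned long page_to_pfn(const struct page *page)
{
	ptrdiff_t pfn = page - mem_map;

	assert(pfn >= 0 && pfn < NUM_PAGES);
	return (unsigned long)pfn;
}

/* Page descriptor for a page frame number. */
static struct page *pfn_to_page(unsigned long pfn)
{
	assert(pfn < NUM_PAGES);
	return &mem_map[pfn];
}

/* Stand-ins for __va()/__pa() under a simple linear mapping. */
static unsigned long va(unsigned long paddr) { return paddr + PAGE_OFFSET; }
static unsigned long pa(unsigned long vaddr) { return vaddr - PAGE_OFFSET; }

/* page -> pfn -> physical address -> virtual address */
static unsigned long page_to_virt(const struct page *page)
{
	return va(page_to_pfn(page) << PAGE_SHIFT);
}

/* virtual address -> physical address -> pfn -> page */
static struct page *virt_to_page(unsigned long kaddr)
{
	return pfn_to_page(pa(kaddr) >> PAGE_SHIFT);
}
```

Under these assumptions, page_to_virt(&mem_map[3]) yields 0xC0003000, and virt_to_page() of any address inside that page maps back to &mem_map[3], so the two helpers are inverses at page granularity.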
From galak at gate.crashing.org Tue Nov 15 10:21:44 2005 From: galak at gate.crashing.org (Kumar Gala) Date: Mon, 14 Nov 2005 17:21:44 -0600 (CST) Subject: [PATCH] powerpc: put page page_to_virt for Book-e processors Message-ID: Book-E processors use page_to_virt since we have to always translate. Signed-off-by: Kumar Gala --- commit 62ac3a10f41d7d300cf82846348f65eab0dbfd40 tree 039f1c393bd77fb8c6daf1c8ebd883c9a72be6e2 parent ed24c128ba54329d142c4d4c7c5e05cec6065b08 author Kumar Gala Mon, 14 Nov 2005 17:22:35 -0600 committer Kumar Gala Mon, 14 Nov 2005 17:22:35 -0600 include/asm-powerpc/page.h | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/asm-powerpc/page.h b/include/asm-powerpc/page.h index 18c1e5e..e34a2ba 100644 --- a/include/asm-powerpc/page.h +++ b/include/asm-powerpc/page.h @@ -53,6 +53,7 @@ #endif #define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT) +#define page_to_virt(page) __va(page_to_pfn(page) << PAGE_SHIFT) #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) From kravetz at us.ibm.com Tue Nov 15 11:12:49 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Mon, 14 Nov 2005 16:12:49 -0800 Subject: [PATCH] Remove SPAN_OTHER_NODES config definition Message-ID: <20051115001249.GA16196@w-mikek2.ibm.com> The config option SPAN_OTHER_NODES was created so that we could make pSeries numa layouts work within the DISCONTIG memory model. Now that DISCONTIG has been replaced by SPARSEMEM, we can eliminate this option. I'll be sending a separate patch to Andrew to remove the arch independent code as pSeries was the only arch that needed this. 
Signed-off-by: Mike Kravetz diff -Naupr linux-2.6.15-rc1-git2/arch/powerpc/Kconfig linux-2.6.15-rc1-git2.work/arch/powerpc/Kconfig --- linux-2.6.15-rc1-git2/arch/powerpc/Kconfig 2005-11-12 01:43:36.000000000 +0000 +++ linux-2.6.15-rc1-git2.work/arch/powerpc/Kconfig 2005-11-14 22:47:31.000000000 +0000 @@ -598,19 +598,6 @@ config ARCH_MEMORY_PROBE def_bool y depends on MEMORY_HOTPLUG -# Some NUMA nodes have memory ranges that span -# other nodes. Even though a pfn is valid and -# between a node's start and end pfns, it may not -# reside on that node. -# -# This is a relatively temporary hack that should -# be able to go away when sparsemem is fully in -# place - -config NODES_SPAN_OTHER_NODES - def_bool y - depends on NEED_MULTIPLE_NODES - config PPC_64K_PAGES bool "64k page size" depends on PPC64 diff -Naupr linux-2.6.15-rc1-git2/arch/powerpc/configs/pseries_defconfig linux-2.6.15-rc1-git2.work/arch/powerpc/configs/pseries_defconfig --- linux-2.6.15-rc1-git2/arch/powerpc/configs/pseries_defconfig 2005-11-12 01:43:36.000000000 +0000 +++ linux-2.6.15-rc1-git2.work/arch/powerpc/configs/pseries_defconfig 2005-11-14 22:47:05.000000000 +0000 @@ -108,7 +108,6 @@ CONFIG_FLAT_NODE_MEM_MAP=y CONFIG_NEED_MULTIPLE_NODES=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y -CONFIG_NODES_SPAN_OTHER_NODES=y CONFIG_NUMA=y CONFIG_SCHED_SMT=y CONFIG_PREEMPT_NONE=y From benh at kernel.crashing.org Tue Nov 15 11:17:07 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 15 Nov 2005 11:17:07 +1100 Subject: [PATCH 2 of 2] tpm: updates for new hardware In-Reply-To: <200511141710.41230.bjorn.helgaas@hp.com> References: <1131739595.5048.15.camel@localhost.localdomain> <200511141710.41230.bjorn.helgaas@hp.com> Message-ID: <1132013827.6094.18.camel@gaston> > Why don't you use ioread8() instead of defining atmel_getb()? > > You'd still need something PPC64-specific to initialize the iomem cookie, > but the accessors would go away. 
> > Unfortunately, ioread8() and associated interfaces aren't mentioned > under Documentation/, but there are some hints in lib/iomap.c. Yes, I was about to reply the same thing :) Ben. From paulus at samba.org Tue Nov 15 12:10:56 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 15 Nov 2005 12:10:56 +1100 Subject: please pull powerpc-merge.git Message-ID: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> Linus, Please do another pull from git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge.git This is the same as the last lot I asked you to pull except that I have taken out the "GX bus support on pSeries machines" patch that needs a bit more work. Apart from various small bugfixes, including some VDSO fixes from Ben, this moves most of the remaining ppc64 stuff over to arch/powerpc and include/asm-powerpc. Thanks, Paul. Benjamin Herrenschmidt: powerpc: Always rebuild arch/powerpc/include/asm symlink powerpc: vdso fixes (take #2) powerpc: kill ppc64 rtc.c, use genrtc instead Michael Ellerman: powerpc: Merge page.h powerpc: Turn cpu_irq_down into kexec_cpu_down powerpc: Export htab start/end via device tree Paul Mackerras: powerpc: Move a bunch of ppc64 headers to include/asm-powerpc powerpc: Move most remaining ppc64 files over to arch/powerpc powerpc: Export a couple of prom functions powerpc: Mark PREP and embedded as broken for now powerpc: Fix 32-bit compile: PPC_MEMSTART was undeclared powerpc: Fix clearing of the FPSCR when invoking a signal handler powerpc: Remove an extraneous and incorrect declaration of pmac_nvram_init. powerpc: Remove __init from a function used in suspend/resume. 
Stephen Rothwell: powerpc: make iSeries use generic virtual irq mapping powerpc: have only one definition of __irq_offset_value powerpc: iSeries build fixes arch/powerpc/Kconfig | 5 arch/powerpc/Makefile | 2 arch/powerpc/configs/pseries_defconfig | 206 ++- arch/powerpc/kernel/Makefile | 20 arch/powerpc/kernel/asm-offsets.c | 6 arch/powerpc/kernel/dma_64.c | 0 arch/powerpc/kernel/iomap.c | 0 arch/powerpc/kernel/iommu.c | 0 arch/powerpc/kernel/irq.c | 9 arch/powerpc/kernel/kprobes.c | 0 arch/powerpc/kernel/lparcfg.c | 51 - arch/powerpc/kernel/machine_kexec_64.c | 63 + arch/powerpc/kernel/module_64.c | 0 arch/powerpc/kernel/pci_64.c | 0 arch/powerpc/kernel/pci_direct_iommu.c | 0 arch/powerpc/kernel/pci_dn.c | 0 arch/powerpc/kernel/pci_iommu.c | 0 arch/powerpc/kernel/prom.c | 2 arch/powerpc/kernel/rtas-rtc.c | 105 + arch/powerpc/kernel/setup_32.c | 4 arch/powerpc/kernel/setup_64.c | 5 arch/powerpc/kernel/signal_32.c | 7 arch/powerpc/kernel/signal_64.c | 6 arch/powerpc/kernel/vdso32/datapage.S | 3 arch/powerpc/kernel/vdso32/gettimeofday.S | 12 arch/powerpc/kernel/vdso64/datapage.S | 1 arch/powerpc/kernel/vdso64/gettimeofday.S | 31 arch/powerpc/platforms/iseries/irq.c | 25 arch/powerpc/platforms/iseries/setup.c | 6 arch/powerpc/platforms/powermac/time.c | 9 arch/powerpc/platforms/pseries/Makefile | 5 arch/powerpc/platforms/pseries/hvconsole.c | 0 arch/powerpc/platforms/pseries/hvcserver.c | 0 arch/powerpc/platforms/pseries/setup.c | 26 arch/ppc64/Kconfig | 520 ------- arch/ppc64/kernel/Makefile | 41 - arch/ppc64/kernel/asm-offsets.c | 195 --- arch/ppc64/kernel/btext.c | 792 ----------- arch/ppc64/kernel/head.S | 2007 --------------------------- arch/ppc64/kernel/misc.S | 940 ------------- arch/ppc64/kernel/ppc_ksyms.c | 76 - arch/ppc64/kernel/prom.c | 1956 --------------------------- arch/ppc64/kernel/prom_init.c | 2051 ---------------------------- arch/ppc64/kernel/rtc.c | 358 ----- arch/ppc64/kernel/semaphore.c | 136 -- arch/ppc64/kernel/vdso.c | 625 --------- 
arch/ppc64/kernel/vmlinux.lds.S | 151 -- arch/ppc64/xmon/privinst.h | 64 - drivers/char/Kconfig | 2 include/asm-powerpc/btext.h | 0 include/asm-powerpc/delay.h | 19 include/asm-powerpc/eeh.h | 0 include/asm-powerpc/floppy.h | 25 include/asm-powerpc/hvconsole.h | 0 include/asm-powerpc/hvcserver.h | 0 include/asm-powerpc/kexec.h | 1 include/asm-powerpc/machdep.h | 4 include/asm-powerpc/nvram.h | 17 include/asm-powerpc/page.h | 179 ++ include/asm-powerpc/page_32.h | 40 + include/asm-powerpc/page_64.h | 174 ++ include/asm-powerpc/serial.h | 18 include/asm-powerpc/vdso_datapage.h | 2 include/asm-ppc/nvram.h | 73 - include/asm-ppc64/page.h | 328 ---- include/asm-ppc64/prom.h | 220 --- include/asm-ppc64/serial.h | 23 include/asm-ppc64/system.h | 310 ---- 68 files changed, 879 insertions(+), 11077 deletions(-) rename arch/{ppc64/kernel/dma.c => powerpc/kernel/dma_64.c} (100%) rename arch/{ppc64/kernel/iomap.c => powerpc/kernel/iomap.c} (100%) rename arch/{ppc64/kernel/iommu.c => powerpc/kernel/iommu.c} (100%) rename arch/{ppc64/kernel/kprobes.c => powerpc/kernel/kprobes.c} (100%) rename arch/{ppc64/kernel/machine_kexec.c => powerpc/kernel/machine_kexec_64.c} (84%) rename arch/{ppc64/kernel/module.c => powerpc/kernel/module_64.c} (100%) rename arch/{ppc64/kernel/pci.c => powerpc/kernel/pci_64.c} (100%) rename arch/{ppc64/kernel/pci_direct_iommu.c => powerpc/kernel/pci_direct_iommu.c} (100%) rename arch/{ppc64/kernel/pci_dn.c => powerpc/kernel/pci_dn.c} (100%) rename arch/{ppc64/kernel/pci_iommu.c => powerpc/kernel/pci_iommu.c} (100%) create mode 100644 arch/powerpc/kernel/rtas-rtc.c rename arch/{ppc64/kernel/hvconsole.c => powerpc/platforms/pseries/hvconsole.c} (100%) rename arch/{ppc64/kernel/hvcserver.c => powerpc/platforms/pseries/hvcserver.c} (100%) delete mode 100644 arch/ppc64/Kconfig delete mode 100644 arch/ppc64/kernel/asm-offsets.c delete mode 100644 arch/ppc64/kernel/btext.c delete mode 100644 arch/ppc64/kernel/head.S delete mode 100644 arch/ppc64/kernel/misc.S 
delete mode 100644 arch/ppc64/kernel/ppc_ksyms.c delete mode 100644 arch/ppc64/kernel/prom.c delete mode 100644 arch/ppc64/kernel/prom_init.c delete mode 100644 arch/ppc64/kernel/rtc.c delete mode 100644 arch/ppc64/kernel/semaphore.c delete mode 100644 arch/ppc64/kernel/vdso.c delete mode 100644 arch/ppc64/kernel/vmlinux.lds.S delete mode 100644 arch/ppc64/xmon/privinst.h rename include/{asm-ppc64/btext.h => asm-powerpc/btext.h} (100%) rename include/{asm-ppc64/delay.h => asm-powerpc/delay.h} (71%) rename include/{asm-ppc64/eeh.h => asm-powerpc/eeh.h} (100%) rename include/{asm-ppc64/floppy.h => asm-powerpc/floppy.h} (90%) rename include/{asm-ppc64/hvconsole.h => asm-powerpc/hvconsole.h} (100%) rename include/{asm-ppc64/hvcserver.h => asm-powerpc/hvcserver.h} (100%) rename include/{asm-ppc64/nvram.h => asm-powerpc/nvram.h} (84%) create mode 100644 include/asm-powerpc/page.h create mode 100644 include/asm-powerpc/page_32.h create mode 100644 include/asm-powerpc/page_64.h create mode 100644 include/asm-powerpc/serial.h delete mode 100644 include/asm-ppc/nvram.h delete mode 100644 include/asm-ppc64/page.h delete mode 100644 include/asm-ppc64/prom.h delete mode 100644 include/asm-ppc64/serial.h delete mode 100644 include/asm-ppc64/system.h From bjorn.helgaas at hp.com Tue Nov 15 11:10:41 2005 From: bjorn.helgaas at hp.com (Bjorn Helgaas) Date: Mon, 14 Nov 2005 17:10:41 -0700 Subject: [PATCH 2 of 2] tpm: updates for new hardware In-Reply-To: <1131739595.5048.15.camel@localhost.localdomain> References: <1131739595.5048.15.camel@localhost.localdomain> Message-ID: <200511141710.41230.bjorn.helgaas@hp.com> On Friday 11 November 2005 1:06 pm, Kylene Jo Hall wrote: > +#ifdef CONFIG_PPC64 > +#define atmel_getb(chip, offset) readb(chip->vendor->iobase + offset); > +#define atmel_putb(val, chip, offset) writeb(val, chip->vendor->iobase + offset) > ... 
> +#else > +#define atmel_getb(chip, offset) inb(chip->vendor->base + offset) > +#define atmel_putb(val, chip, offset) outb(val, chip->vendor->base + offset) Why don't you use ioread8() instead of defining atmel_getb()? You'd still need something PPC64-specific to initialize the iomem cookie, but the accessors would go away. Unfortunately, ioread8() and associated interfaces aren't mentioned under Documentation/, but there are some hints in lib/iomap.c. From benh at kernel.crashing.org Tue Nov 15 14:34:24 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 15 Nov 2005 14:34:24 +1100 Subject: [PATCH] powerpc: Merge align.c (#2) Message-ID: <1132025664.6094.47.camel@gaston> Needs testing! This patch merges align.c; the result is not quite what was in ppc64, nor what was in ppc32 :) It should implement all the functionality of both, though. Kumar, since you have played with this in the past, I suppose you have some test cases for verifying that it works properly before I dig out the 601 machine? :) Since I likely won't be able to test every scenario, code inspection is very welcome.
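As an aid to that inspection, the DSISR synthesis used further down in the patch (for CPUs flagged CPU_FTR_NODSISRALIGN) can be exercised in userspace. The following is a minimal sketch, not the kernel code itself: it copies the bit shuffling of make_dsisr() and the table-index extraction from fix_alignment() onto fixed-width types. The helper names dsisr_to_index(), dsisr_reg(), dsisr_areg() and self_check() are made up for illustration, and the instruction words are hand-assembled from the standard D-form and X-form encodings.

```c
#include <assert.h>
#include <stdint.h>

#define IS_XFORM(inst)  (((inst) >> 26) == 31)
#define IS_DSFORM(inst) (((inst) >> 26) >= 56)

/* Same bit shuffling as make_dsisr() in the patch, on uint32_t. */
static uint32_t make_dsisr(uint32_t instr)
{
	uint32_t dsisr;

	/* RT/RS and RA fields: instruction bits 6:15 --> DSISR bits 22:31 */
	dsisr = (instr & 0x03ff0000) >> 16;

	if (IS_XFORM(instr)) {
		/* pieces of the extended opcode */
		dsisr |= (instr & 0x00000006) << 14;
		dsisr |= (instr & 0x00000040) << 8;
		dsisr |= (instr & 0x00000780) << 3;
	} else {
		/* pieces of the primary opcode */
		dsisr |= (instr & 0x04000000) >> 12;
		dsisr |= (instr & 0x78000000) >> 17;
		if (IS_DSFORM(instr))
			dsisr |= (instr & 0x00000003) << 18;
	}
	return dsisr;
}

/* Index into the 128-entry aligninfo[] table, as in fix_alignment(). */
static unsigned int dsisr_to_index(uint32_t dsisr)
{
	unsigned int idx = (dsisr >> 10) & 0x7f;

	idx |= (dsisr >> 13) & 0x60;
	return idx;
}

/* Source/destination register and register to update. */
static unsigned int dsisr_reg(uint32_t dsisr)  { return (dsisr >> 5) & 0x1f; }
static unsigned int dsisr_areg(uint32_t dsisr) { return dsisr & 0x1f; }

/* Two hand-assembled instructions checked against the table comments. */
static void self_check(void)
{
	/* lhzu r5,2(r6): D-form, opcode 41 --> slot "00 1 0100: lhzu" */
	uint32_t d = make_dsisr(0xA4A60002u);
	/* lwzx r3,r4,r5: X-form, opcode 31, XO 23 --> slot "11 0 0000: lwzx" */
	uint32_t x = make_dsisr(0x7C64282Eu);

	assert(dsisr_to_index(d) == 0x14);
	assert(dsisr_reg(d) == 5);
	assert(dsisr_areg(d) == 6);

	assert(dsisr_to_index(x) == 0x60);
	assert(dsisr_reg(x) == 3);
	assert(dsisr_areg(x) == 4);
}
```

Calling self_check() from a small test program is enough to confirm that a synthesized DSISR lands on the expected aligninfo[] slot for these two cases; it says nothing about the string, multiple, or floating-point paths, which still need real hardware.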
Signed-off-by: Benjamin Herrenschmidt --- No difference, just rebased on current -git Index: linux-work/arch/powerpc/kernel/Makefile =================================================================== --- linux-work.orig/arch/powerpc/kernel/Makefile 2005-11-15 13:31:57.000000000 +1100 +++ linux-work/arch/powerpc/kernel/Makefile 2005-11-15 14:31:22.000000000 +1100 @@ -12,7 +12,7 @@ endif obj-y := semaphore.o cputable.o ptrace.o syscalls.o \ - irq.o signal_32.o pmc.o vdso.o + irq.o align.o signal_32.o pmc.o vdso.o obj-y += vdso32/ obj-$(CONFIG_PPC64) += setup_64.o binfmt_elf32.o sys_ppc32.o \ signal_64.o ptrace32.o systbl.o \ Index: linux-work/arch/powerpc/kernel/align.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/powerpc/kernel/align.c 2005-11-15 14:31:22.000000000 +1100 @@ -0,0 +1,513 @@ +/* align.c - handle alignment exceptions for the Power PC. + * + * Copyright (c) 1996 Paul Mackerras + * Copyright (c) 1998-1999 TiVo, Inc. + * PowerPC 403GCX modifications. + * Copyright (c) 1999 Grant Erickson + * PowerPC 403GCX/405GP modifications. + * Copyright (c) 2001-2002 PPC64 team, IBM Corp + * 64-bit and Power4 support + * Copyright (c) 2005 Benjamin Herrenschmidt, IBM Corp + * + * Merge ppc32 and ppc64 implementations + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +struct aligninfo { + unsigned char len; + unsigned char flags; +}; + +#define IS_XFORM(inst) (((inst) >> 26) == 31) +#define IS_DSFORM(inst) (((inst) >> 26) >= 56) + +#define INVALID { 0, 0 } + +#define LD 1 /* load */ +#define ST 2 /* store */ +#define SE 4 /* sign-extend value */ +#define F 8 /* to/from fp regs */ +#define U 0x10 /* update index register */ +#define M 0x20 /* multiple load/store */ +#define SW 0x40 /* byte swap int or ... */ +#define S 0x40 /* ... single-precision fp */ +#define SX 0x40 /* byte count in XER */ +#define HARD 0x80 /* string, stwcx. */ + +#define DCBZ 0x5f /* 8xx/82xx dcbz faults when cache not enabled */ + +#define SWAP(a, b) (t = (a), (a) = (b), (b) = t) + +/* + * The PowerPC stores certain bits of the instruction that caused the + * alignment exception in the DSISR register. This array maps those + * bits to information about the operand length and what the + * instruction would do. 
+ */ +static struct aligninfo aligninfo[128] = { + { 4, LD }, /* 00 0 0000: lwz / lwarx */ + INVALID, /* 00 0 0001 */ + { 4, ST }, /* 00 0 0010: stw */ + INVALID, /* 00 0 0011 */ + { 2, LD }, /* 00 0 0100: lhz */ + { 2, LD+SE }, /* 00 0 0101: lha */ + { 2, ST }, /* 00 0 0110: sth */ + { 4, LD+M }, /* 00 0 0111: lmw */ + { 4, LD+F+S }, /* 00 0 1000: lfs */ + { 8, LD+F }, /* 00 0 1001: lfd */ + { 4, ST+F+S }, /* 00 0 1010: stfs */ + { 8, ST+F }, /* 00 0 1011: stfd */ + INVALID, /* 00 0 1100 */ + { 8, LD }, /* 00 0 1101: ld/ldu/lwa */ + INVALID, /* 00 0 1110 */ + { 8, ST }, /* 00 0 1111: std/stdu */ + { 4, LD+U }, /* 00 1 0000: lwzu */ + INVALID, /* 00 1 0001 */ + { 4, ST+U }, /* 00 1 0010: stwu */ + INVALID, /* 00 1 0011 */ + { 2, LD+U }, /* 00 1 0100: lhzu */ + { 2, LD+SE+U }, /* 00 1 0101: lhau */ + { 2, ST+U }, /* 00 1 0110: sthu */ + { 4, ST+M }, /* 00 1 0111: stmw */ + { 4, LD+F+S+U }, /* 00 1 1000: lfsu */ + { 8, LD+F+U }, /* 00 1 1001: lfdu */ + { 4, ST+F+S+U }, /* 00 1 1010: stfsu */ + { 8, ST+F+U }, /* 00 1 1011: stfdu */ + INVALID, /* 00 1 1100 */ + INVALID, /* 00 1 1101 */ + INVALID, /* 00 1 1110 */ + INVALID, /* 00 1 1111 */ + { 8, LD }, /* 01 0 0000: ldx */ + INVALID, /* 01 0 0001 */ + { 8, ST }, /* 01 0 0010: stdx */ + INVALID, /* 01 0 0011 */ + INVALID, /* 01 0 0100 */ + { 4, LD+SE }, /* 01 0 0101: lwax */ + INVALID, /* 01 0 0110 */ + INVALID, /* 01 0 0111 */ + { 4, LD+M+HARD+SX }, /* 01 0 1000: lswx */ + { 4, LD+M+HARD }, /* 01 0 1001: lswi */ + { 4, ST+M+HARD+SX }, /* 01 0 1010: stswx */ + { 4, ST+M+HARD }, /* 01 0 1011: stswi */ + INVALID, /* 01 0 1100 */ + { 8, LD+U }, /* 01 0 1101: ldu */ + INVALID, /* 01 0 1110 */ + { 8, ST+U }, /* 01 0 1111: stdu */ + { 8, LD+U }, /* 01 1 0000: ldux */ + INVALID, /* 01 1 0001 */ + { 8, ST+U }, /* 01 1 0010: stdux */ + INVALID, /* 01 1 0011 */ + INVALID, /* 01 1 0100 */ + { 4, LD+SE+U }, /* 01 1 0101: lwaux */ + INVALID, /* 01 1 0110 */ + INVALID, /* 01 1 0111 */ + INVALID, /* 01 1 1000 */ + INVALID, /* 01 1 1001 
*/ + INVALID, /* 01 1 1010 */ + INVALID, /* 01 1 1011 */ + INVALID, /* 01 1 1100 */ + INVALID, /* 01 1 1101 */ + INVALID, /* 01 1 1110 */ + INVALID, /* 01 1 1111 */ + INVALID, /* 10 0 0000 */ + INVALID, /* 10 0 0001 */ + INVALID, /* 10 0 0010: stwcx. */ + INVALID, /* 10 0 0011 */ + INVALID, /* 10 0 0100 */ + INVALID, /* 10 0 0101 */ + INVALID, /* 10 0 0110 */ + INVALID, /* 10 0 0111 */ + { 4, LD+SW }, /* 10 0 1000: lwbrx */ + INVALID, /* 10 0 1001 */ + { 4, ST+SW }, /* 10 0 1010: stwbrx */ + INVALID, /* 10 0 1011 */ + { 2, LD+SW }, /* 10 0 1100: lhbrx */ + { 4, LD+SE }, /* 10 0 1101 lwa */ + { 2, ST+SW }, /* 10 0 1110: sthbrx */ + INVALID, /* 10 0 1111 */ + INVALID, /* 10 1 0000 */ + INVALID, /* 10 1 0001 */ + INVALID, /* 10 1 0010 */ + INVALID, /* 10 1 0011 */ + INVALID, /* 10 1 0100 */ + INVALID, /* 10 1 0101 */ + INVALID, /* 10 1 0110 */ + INVALID, /* 10 1 0111 */ + INVALID, /* 10 1 1000 */ + INVALID, /* 10 1 1001 */ + INVALID, /* 10 1 1010 */ + INVALID, /* 10 1 1011 */ + INVALID, /* 10 1 1100 */ + INVALID, /* 10 1 1101 */ + INVALID, /* 10 1 1110 */ + { 0, ST+HARD }, /* 10 1 1111: dcbz */ + { 4, LD }, /* 11 0 0000: lwzx */ + INVALID, /* 11 0 0001 */ + { 4, ST }, /* 11 0 0010: stwx */ + INVALID, /* 11 0 0011 */ + { 2, LD }, /* 11 0 0100: lhzx */ + { 2, LD+SE }, /* 11 0 0101: lhax */ + { 2, ST }, /* 11 0 0110: sthx */ + INVALID, /* 11 0 0111 */ + { 4, LD+F+S }, /* 11 0 1000: lfsx */ + { 8, LD+F }, /* 11 0 1001: lfdx */ + { 4, ST+F+S }, /* 11 0 1010: stfsx */ + { 8, ST+F }, /* 11 0 1011: stfdx */ + INVALID, /* 11 0 1100 */ + { 8, LD+M }, /* 11 0 1101: lmd */ + INVALID, /* 11 0 1110 */ + { 8, ST+M }, /* 11 0 1111: stmd */ + { 4, LD+U }, /* 11 1 0000: lwzux */ + INVALID, /* 11 1 0001 */ + { 4, ST+U }, /* 11 1 0010: stwux */ + INVALID, /* 11 1 0011 */ + { 2, LD+U }, /* 11 1 0100: lhzux */ + { 2, LD+SE+U }, /* 11 1 0101: lhaux */ + { 2, ST+U }, /* 11 1 0110: sthux */ + INVALID, /* 11 1 0111 */ + { 4, LD+F+S+U }, /* 11 1 1000: lfsux */ + { 8, LD+F+U }, /* 11 1 1001: 
lfdux */ + { 4, ST+F+S+U }, /* 11 1 1010: stfsux */ + { 8, ST+F+U }, /* 11 1 1011: stfdux */ + INVALID, /* 11 1 1100 */ + INVALID, /* 11 1 1101 */ + INVALID, /* 11 1 1110 */ + INVALID, /* 11 1 1111 */ +}; + +/* + * Create a DSISR value from the instruction + */ +static inline unsigned make_dsisr(unsigned instr) +{ + unsigned dsisr; + + + /* bits 6:15 --> 22:31 */ + dsisr = (instr & 0x03ff0000) >> 16; + + if ( IS_XFORM(instr) ) { + /* bits 29:30 --> 15:16 */ + dsisr |= (instr & 0x00000006) << 14; + /* bit 25 --> 17 */ + dsisr |= (instr & 0x00000040) << 8; + /* bits 21:24 --> 18:21 */ + dsisr |= (instr & 0x00000780) << 3; + } + else { + /* bit 5 --> 17 */ + dsisr |= (instr & 0x04000000) >> 12; + /* bits 1: 4 --> 18:21 */ + dsisr |= (instr & 0x78000000) >> 17; + /* bits 30:31 --> 12:13 */ + if ( IS_DSFORM(instr) ) + dsisr |= (instr & 0x00000003) << 18; + } + + return dsisr; +} + +/* + * The dcbz (data cache block zero) instruction + * gives an alignment fault if used on non-cacheable + * memory. We handle the fault mainly for the + * case when we are running with the cache disabled + * for debugging. 
+ */ +static int emulate_dcbz(struct pt_regs *regs, unsigned char __user *addr) +{ + long __user *p; + int i, size; + +#ifdef __powerpc64__ + size = ppc64_caches.dline_size; +#else + size = L1_CACHE_BYTES; +#endif + p = (long __user *) (regs->dar & -size); + if (user_mode(regs) && !access_ok(VERIFY_WRITE, p, size)) + return -EFAULT; + for (i = 0; i < size / sizeof(long); ++i) + if (__put_user(0, p+i)) + return -EFAULT; + return 1; +} + +/* + * Emulate load & store multiple instructions + */ +static int emulate_multiple(struct pt_regs *regs, unsigned char __user *addr, + unsigned int reg, unsigned int nb, + unsigned int flags, unsigned int instr) +{ + unsigned char *rptr; + int nb0, i; + + /* + * We do not try to emulate 8 bytes multiple as they aren't really + * available in our operating environments and we don't try to + * emulate multiples operations in kernel land as they should never + * be used/generated there at least not on unaligned boundaries + */ + if (unlikely((nb > 4) || !user_mode(regs))) + return 0; + + /* lmw, stmw, lswi/x, stswi/x */ + nb0 = 0; + if (flags & HARD) { + if (flags & SX) { + nb = regs->xer & 127; + if (nb == 0) + return 1; + } else { + if (__get_user(instr, + (unsigned int __user *)regs->nip)) + return -EFAULT; + nb = (instr >> 11) & 0x1f; + if (nb == 0) + nb = 32; + } + if (nb + reg * 4 > 128) { + nb0 = nb + reg * 4 - 128; + nb = 128 - reg * 4; + } + } else { + /* lwm, stmw */ + nb = (32 - reg) * 4; + } + + if (!access_ok((flags & ST ? 
VERIFY_WRITE: VERIFY_READ), addr, nb+nb0)) + return -EFAULT; /* bad address */ + + rptr = (unsigned char *) &regs->gpr[reg]; + if (flags & LD) { + for (i = 0; i < nb; ++i) + if (__get_user(rptr[i], addr + i)) + return -EFAULT; + if (nb0 > 0) { + rptr = (unsigned char *) &regs->gpr[0]; + addr += nb; + for (i = 0; i < nb0; ++i) + if (__get_user(rptr[i], addr + i)) + return -EFAULT; + } + for (; (i & 3) != 0; ++i) + rptr[i] = 0; + } else { + for (i = 0; i < nb; ++i) + if (__put_user(rptr[i], addr + i)) + return -EFAULT; + if (nb0 > 0) { + rptr = (unsigned char *) &regs->gpr[0]; + addr += nb; + for (i = 0; i < nb0; ++i) + if (__put_user(rptr[i], addr + i)) + return -EFAULT; + } + } + return 1; +} + + +/* + * Called on alignment exception. Attempts to fixup + * + * Return 1 on success + * Return 0 if unable to handle the interrupt + * Return -EFAULT if data address is bad + */ + +int fix_alignment(struct pt_regs *regs) +{ + unsigned int instr, nb, flags; + unsigned int reg, areg; + unsigned int dsisr; + unsigned char __user *addr; + unsigned char __user *p; + int ret, t; + union { + long ll; + double dd; + unsigned char v[8]; + struct { + unsigned hi32; + int low32; + } x32; + struct { + unsigned char hi48[6]; + short low16; + } x16; + } data; + + /* + * We require a complete register set, if not, then our assembly + * is broken + */ + CHECK_FULL_REGS(regs); + + dsisr = regs->dsisr; + + /* Some processors don't provide us with a DSISR we can use here, + * let's make one up from the instruction + */ + if (cpu_has_feature(CPU_FTR_NODSISRALIGN)) { + unsigned int real_instr; + if (unlikely(__get_user(real_instr, + (unsigned int __user *)regs->nip))) + return -EFAULT; + dsisr = make_dsisr(real_instr); + } + + /* extract the operation and registers from the dsisr */ + reg = (dsisr >> 5) & 0x1f; /* source/dest register */ + areg = dsisr & 0x1f; /* register to update */ + instr = (dsisr >> 10) & 0x7f; + instr |= (dsisr >> 13) & 0x60; + + /* Lookup the operation in our table */ + nb =
aligninfo[instr].len; + flags = aligninfo[instr].flags; + + /* DAR has the operand effective address */ + addr = (unsigned char __user *)regs->dar; + + /* A size of 0 indicates an instruction we don't support, with + * the exception of DCBZ which is handled as a special case here + */ + if (instr == DCBZ) + return emulate_dcbz(regs, addr); + if (unlikely(nb == 0)) + return 0; + + /* Load/Store Multiple instructions are handled in their own + * function + */ + if (flags & M) + return emulate_multiple(regs, addr, reg, nb, flags, instr); + + /* Verify the address of the operand */ + if (unlikely(user_mode(regs) && + !access_ok((flags & ST ? VERIFY_WRITE : VERIFY_READ), + addr, nb))) + return -EFAULT; + + /* Force the fprs into the save area so we can reference them */ + if (flags & F) { + /* userland only */ + if (unlikely(!user_mode(regs))) + return 0; + flush_fp_to_thread(current); + } + + /* If we are loading, get the data from user space, else + * get it from register values + */ + if (flags & LD) { + data.ll = 0; + ret = 0; + p = addr; + switch (nb) { + case 8: + ret |= __get_user(data.v[0], p++); + ret |= __get_user(data.v[1], p++); + ret |= __get_user(data.v[2], p++); + ret |= __get_user(data.v[3], p++); + case 4: + ret |= __get_user(data.v[4], p++); + ret |= __get_user(data.v[5], p++); + case 2: + ret |= __get_user(data.v[6], p++); + ret |= __get_user(data.v[7], p++); + if (unlikely(ret)) + return -EFAULT; + } + } else if (flags & F) + data.dd = current->thread.fpr[reg]; + else + data.ll = regs->gpr[reg]; + + /* Perform other misc operations like sign extension, byteswap, + * or floating point single precision conversion + */ + switch (flags & ~U) { + case LD+SE: /* sign extend */ + if ( nb == 2 ) + data.ll = data.x16.low16; + else /* nb must be 4 */ + data.ll = data.x32.low32; + break; + case LD+S: /* byte-swap */ + case ST+S: + if (nb == 2) { + SWAP(data.v[6], data.v[7]); + } else { + SWAP(data.v[4], data.v[7]); + SWAP(data.v[5], data.v[6]); + } + break; + + 
/* Single-precision FP load and store require conversions... */ + case LD+F+S: +#ifdef CONFIG_PPC_FPU + preempt_disable(); + enable_kernel_fp(); + cvt_fd((float *)&data.v[4], &data.dd, &current->thread); + preempt_enable(); +#else + return 0; +#endif + break; + case ST+F+S: +#ifdef CONFIG_PPC_FPU + preempt_disable(); + enable_kernel_fp(); + cvt_df(&data.dd, (float *)&data.v[4], &current->thread); + preempt_enable(); +#else + return 0; +#endif + break; + } + + /* Store result to memory or update registers */ + if (flags & ST) { + ret = 0; + p = addr; + switch (nb) { + case 8: + ret |= __put_user(data.v[0], p++); + ret |= __put_user(data.v[1], p++); + ret |= __put_user(data.v[2], p++); + ret |= __put_user(data.v[3], p++); + case 4: + ret |= __put_user(data.v[4], p++); + ret |= __put_user(data.v[5], p++); + case 2: + ret |= __put_user(data.v[6], p++); + ret |= __put_user(data.v[7], p++); + } + if (unlikely(ret)) + return -EFAULT; + } else if (flags & F) + current->thread.fpr[reg] = data.dd; + else + regs->gpr[reg] = data.ll; + + /* Update RA as needed */ + if (flags & U) + regs->gpr[areg] = regs->dar; + + return 1; +} Index: linux-work/arch/ppc/kernel/Makefile =================================================================== --- linux-work.orig/arch/ppc/kernel/Makefile 2005-11-11 10:14:48.000000000 +1100 +++ linux-work/arch/ppc/kernel/Makefile 2005-11-15 14:31:22.000000000 +1100 @@ -13,7 +13,7 @@ extra-y += vmlinux.lds obj-y := entry.o traps.o idle.o time.o misc.o \ - process.o align.o \ + process.o \ setup.o \ ppc_htab.o obj-$(CONFIG_6xx) += l2cr.o cpu_setup_6xx.o Index: linux-work/arch/ppc64/kernel/Makefile =================================================================== --- linux-work.orig/arch/ppc64/kernel/Makefile 2005-11-15 14:30:34.000000000 +1100 +++ linux-work/arch/ppc64/kernel/Makefile 2005-11-15 14:31:37.000000000 +1100 @@ -2,6 +2,6 @@ # Makefile for the linux ppc64 kernel.
# -obj-y += idle.o align.o +obj-y += idle.o obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o Index: linux-work/include/asm-powerpc/cputable.h =================================================================== --- linux-work.orig/include/asm-powerpc/cputable.h 2005-11-11 10:14:49.000000000 +1100 +++ linux-work/include/asm-powerpc/cputable.h 2005-11-15 14:31:22.000000000 +1100 @@ -90,6 +90,7 @@ #define CPU_FTR_NEED_COHERENT ASM_CONST(0x0000000000020000) #define CPU_FTR_NO_BTIC ASM_CONST(0x0000000000040000) #define CPU_FTR_BIG_PHYS ASM_CONST(0x0000000000080000) +#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0000000000100000) #ifdef __powerpc64__ /* Add the 64b processor unique features in the top half of the word */ @@ -97,7 +98,6 @@ #define CPU_FTR_16M_PAGE ASM_CONST(0x0000000200000000) #define CPU_FTR_TLBIEL ASM_CONST(0x0000000400000000) #define CPU_FTR_NOEXECUTE ASM_CONST(0x0000000800000000) -#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0000001000000000) #define CPU_FTR_IABR ASM_CONST(0x0000002000000000) #define CPU_FTR_MMCRA ASM_CONST(0x0000004000000000) #define CPU_FTR_CTRL ASM_CONST(0x0000008000000000) @@ -113,7 +113,6 @@ #define CPU_FTR_16M_PAGE ASM_CONST(0x0) #define CPU_FTR_TLBIEL ASM_CONST(0x0) #define CPU_FTR_NOEXECUTE ASM_CONST(0x0) -#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0) #define CPU_FTR_IABR ASM_CONST(0x0) #define CPU_FTR_MMCRA ASM_CONST(0x0) #define CPU_FTR_CTRL ASM_CONST(0x0) @@ -273,18 +272,21 @@ CPU_FTRS_POWER3_32 = CPU_FTR_COMMON | CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE, CPU_FTRS_POWER4_32 = CPU_FTR_COMMON | CPU_FTR_SPLIT_ID_CACHE | - CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE, + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_NODSISRALIGN, CPU_FTRS_970_32 = CPU_FTR_COMMON | CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_ALTIVEC_COMP | - CPU_FTR_MAYBE_CAN_NAP, + CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_NODSISRALIGN, CPU_FTRS_8XX = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, - CPU_FTRS_40X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, - 
CPU_FTRS_44X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, - CPU_FTRS_E200 = CPU_FTR_USE_TB, - CPU_FTRS_E500 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB, + CPU_FTRS_40X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_NODSISRALIGN, + CPU_FTRS_44X = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_NODSISRALIGN, + CPU_FTRS_E200 = CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN, + CPU_FTRS_E500 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_NODSISRALIGN, CPU_FTRS_E500_2 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | - CPU_FTR_BIG_PHYS, - CPU_FTRS_GENERIC_32 = CPU_FTR_COMMON, + CPU_FTR_BIG_PHYS | CPU_FTR_NODSISRALIGN, + CPU_FTRS_GENERIC_32 = CPU_FTR_COMMON | CPU_FTR_NODSISRALIGN, #ifdef __powerpc64__ CPU_FTRS_POWER3 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR, Index: linux-work/arch/ppc64/kernel/align.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/align.c 2005-11-15 14:30:34.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,396 +0,0 @@ -/* align.c - handle alignment exceptions for the Power PC. - * - * Copyright (c) 1996 Paul Mackerras - * Copyright (c) 1998-1999 TiVo, Inc. - * PowerPC 403GCX modifications. - * Copyright (c) 1999 Grant Erickson - * PowerPC 403GCX/405GP modifications. - * Copyright (c) 2001-2002 PPC64 team, IBM Corp - * 64-bit and Power4 support - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ - -#include -#include -#include -#include -#include -#include -#include - -struct aligninfo { - unsigned char len; - unsigned char flags; -}; - -#define IS_XFORM(inst) (((inst) >> 26) == 31) -#define IS_DSFORM(inst) (((inst) >> 26) >= 56) - -#define INVALID { 0, 0 } - -#define LD 1 /* load */ -#define ST 2 /* store */ -#define SE 4 /* sign-extend value */ -#define F 8 /* to/from fp regs */ -#define U 0x10 /* update index register */ -#define M 0x20 /* multiple load/store */ -#define SW 0x40 /* byte swap */ - -#define DCBZ 0x5f /* 8xx/82xx dcbz faults when cache not enabled */ - -/* - * The PowerPC stores certain bits of the instruction that caused the - * alignment exception in the DSISR register. This array maps those - * bits to information about the operand length and what the - * instruction would do. - */ -static struct aligninfo aligninfo[128] = { - { 4, LD }, /* 00 0 0000: lwz / lwarx */ - INVALID, /* 00 0 0001 */ - { 4, ST }, /* 00 0 0010: stw */ - INVALID, /* 00 0 0011 */ - { 2, LD }, /* 00 0 0100: lhz */ - { 2, LD+SE }, /* 00 0 0101: lha */ - { 2, ST }, /* 00 0 0110: sth */ - { 4, LD+M }, /* 00 0 0111: lmw */ - { 4, LD+F }, /* 00 0 1000: lfs */ - { 8, LD+F }, /* 00 0 1001: lfd */ - { 4, ST+F }, /* 00 0 1010: stfs */ - { 8, ST+F }, /* 00 0 1011: stfd */ - INVALID, /* 00 0 1100 */ - { 8, LD }, /* 00 0 1101: ld */ - INVALID, /* 00 0 1110 */ - { 8, ST }, /* 00 0 1111: std */ - { 4, LD+U }, /* 00 1 0000: lwzu */ - INVALID, /* 00 1 0001 */ - { 4, ST+U }, /* 00 1 0010: stwu */ - INVALID, /* 00 1 0011 */ - { 2, LD+U }, /* 00 1 0100: lhzu */ - { 2, LD+SE+U }, /* 00 1 0101: lhau */ - { 2, ST+U }, /* 00 1 0110: sthu */ - { 4, ST+M }, /* 00 1 0111: stmw */ - { 4, LD+F+U }, /* 00 1 1000: lfsu */ - { 8, LD+F+U }, /* 00 1 1001: lfdu */ - { 4, ST+F+U }, /* 00 1 1010: stfsu */ - { 8, ST+F+U }, /* 00 1 1011: stfdu */ - INVALID, /* 00 1 1100 */ - INVALID, /* 00 1 1101 */ - INVALID, /* 00 1 1110 */ - INVALID, /* 00 1 1111 */ - { 8, LD }, /* 01 0 0000: ldx */ - INVALID, 
/* 01 0 0001 */ - { 8, ST }, /* 01 0 0010: stdx */ - INVALID, /* 01 0 0011 */ - INVALID, /* 01 0 0100 */ - { 4, LD+SE }, /* 01 0 0101: lwax */ - INVALID, /* 01 0 0110 */ - INVALID, /* 01 0 0111 */ - { 0, LD }, /* 01 0 1000: lswx */ - { 0, LD }, /* 01 0 1001: lswi */ - { 0, ST }, /* 01 0 1010: stswx */ - { 0, ST }, /* 01 0 1011: stswi */ - INVALID, /* 01 0 1100 */ - { 8, LD+U }, /* 01 0 1101: ldu */ - INVALID, /* 01 0 1110 */ - { 8, ST+U }, /* 01 0 1111: stdu */ - { 8, LD+U }, /* 01 1 0000: ldux */ - INVALID, /* 01 1 0001 */ - { 8, ST+U }, /* 01 1 0010: stdux */ - INVALID, /* 01 1 0011 */ - INVALID, /* 01 1 0100 */ - { 4, LD+SE+U }, /* 01 1 0101: lwaux */ - INVALID, /* 01 1 0110 */ - INVALID, /* 01 1 0111 */ - INVALID, /* 01 1 1000 */ - INVALID, /* 01 1 1001 */ - INVALID, /* 01 1 1010 */ - INVALID, /* 01 1 1011 */ - INVALID, /* 01 1 1100 */ - INVALID, /* 01 1 1101 */ - INVALID, /* 01 1 1110 */ - INVALID, /* 01 1 1111 */ - INVALID, /* 10 0 0000 */ - INVALID, /* 10 0 0001 */ - { 0, ST }, /* 10 0 0010: stwcx. 
*/ - INVALID, /* 10 0 0011 */ - INVALID, /* 10 0 0100 */ - INVALID, /* 10 0 0101 */ - INVALID, /* 10 0 0110 */ - INVALID, /* 10 0 0111 */ - { 4, LD+SW }, /* 10 0 1000: lwbrx */ - INVALID, /* 10 0 1001 */ - { 4, ST+SW }, /* 10 0 1010: stwbrx */ - INVALID, /* 10 0 1011 */ - { 2, LD+SW }, /* 10 0 1100: lhbrx */ - { 4, LD+SE }, /* 10 0 1101 lwa */ - { 2, ST+SW }, /* 10 0 1110: sthbrx */ - INVALID, /* 10 0 1111 */ - INVALID, /* 10 1 0000 */ - INVALID, /* 10 1 0001 */ - INVALID, /* 10 1 0010 */ - INVALID, /* 10 1 0011 */ - INVALID, /* 10 1 0100 */ - INVALID, /* 10 1 0101 */ - INVALID, /* 10 1 0110 */ - INVALID, /* 10 1 0111 */ - INVALID, /* 10 1 1000 */ - INVALID, /* 10 1 1001 */ - INVALID, /* 10 1 1010 */ - INVALID, /* 10 1 1011 */ - INVALID, /* 10 1 1100 */ - INVALID, /* 10 1 1101 */ - INVALID, /* 10 1 1110 */ - { L1_CACHE_BYTES, ST }, /* 10 1 1111: dcbz */ - { 4, LD }, /* 11 0 0000: lwzx */ - INVALID, /* 11 0 0001 */ - { 4, ST }, /* 11 0 0010: stwx */ - INVALID, /* 11 0 0011 */ - { 2, LD }, /* 11 0 0100: lhzx */ - { 2, LD+SE }, /* 11 0 0101: lhax */ - { 2, ST }, /* 11 0 0110: sthx */ - INVALID, /* 11 0 0111 */ - { 4, LD+F }, /* 11 0 1000: lfsx */ - { 8, LD+F }, /* 11 0 1001: lfdx */ - { 4, ST+F }, /* 11 0 1010: stfsx */ - { 8, ST+F }, /* 11 0 1011: stfdx */ - INVALID, /* 11 0 1100 */ - { 8, LD+M }, /* 11 0 1101: lmd */ - INVALID, /* 11 0 1110 */ - { 8, ST+M }, /* 11 0 1111: stmd */ - { 4, LD+U }, /* 11 1 0000: lwzux */ - INVALID, /* 11 1 0001 */ - { 4, ST+U }, /* 11 1 0010: stwux */ - INVALID, /* 11 1 0011 */ - { 2, LD+U }, /* 11 1 0100: lhzux */ - { 2, LD+SE+U }, /* 11 1 0101: lhaux */ - { 2, ST+U }, /* 11 1 0110: sthux */ - INVALID, /* 11 1 0111 */ - { 4, LD+F+U }, /* 11 1 1000: lfsux */ - { 8, LD+F+U }, /* 11 1 1001: lfdux */ - { 4, ST+F+U }, /* 11 1 1010: stfsux */ - { 8, ST+F+U }, /* 11 1 1011: stfdux */ - INVALID, /* 11 1 1100 */ - INVALID, /* 11 1 1101 */ - INVALID, /* 11 1 1110 */ - INVALID, /* 11 1 1111 */ -}; - -#define SWAP(a, b) (t = (a), (a) = (b), (b) = 
t) - -static inline unsigned make_dsisr(unsigned instr) -{ - unsigned dsisr; - - /* create a DSISR value from the instruction */ - dsisr = (instr & 0x03ff0000) >> 16; /* bits 6:15 --> 22:31 */ - - if ( IS_XFORM(instr) ) { - dsisr |= (instr & 0x00000006) << 14; /* bits 29:30 --> 15:16 */ - dsisr |= (instr & 0x00000040) << 8; /* bit 25 --> 17 */ - dsisr |= (instr & 0x00000780) << 3; /* bits 21:24 --> 18:21 */ - } - else { - dsisr |= (instr & 0x04000000) >> 12; /* bit 5 --> 17 */ - dsisr |= (instr & 0x78000000) >> 17; /* bits 1: 4 --> 18:21 */ - if ( IS_DSFORM(instr) ) { - dsisr |= (instr & 0x00000003) << 18; /* bits 30:31 --> 12:13 */ - } - } - - return dsisr; -} - -int -fix_alignment(struct pt_regs *regs) -{ - unsigned int instr, nb, flags; - int t; - unsigned long reg, areg; - unsigned long i; - int ret; - unsigned dsisr; - unsigned char __user *addr; - unsigned char __user *p; - unsigned long __user *lp; - union { - long ll; - double dd; - unsigned char v[8]; - struct { - unsigned hi32; - int low32; - } x32; - struct { - unsigned char hi48[6]; - short low16; - } x16; - } data; - - /* - * Return 1 on success - * Return 0 if unable to handle the interrupt - * Return -EFAULT if data address is bad - */ - - dsisr = regs->dsisr; - - if (cpu_has_feature(CPU_FTR_NODSISRALIGN)) { - unsigned int real_instr; - if (__get_user(real_instr, (unsigned int __user *)regs->nip)) - return 0; - dsisr = make_dsisr(real_instr); - } - - /* extract the operation and registers from the dsisr */ - reg = (dsisr >> 5) & 0x1f; /* source/dest register */ - areg = dsisr & 0x1f; /* register to update */ - instr = (dsisr >> 10) & 0x7f; - instr |= (dsisr >> 13) & 0x60; - - /* Lookup the operation in our table */ - nb = aligninfo[instr].len; - flags = aligninfo[instr].flags; - - /* DAR has the operand effective address */ - addr = (unsigned char __user *)regs->dar; - - /* A size of 0 indicates an instruction we don't support */ - /* we also don't support the multiples (lmw, stmw, lmd, stmd) */ - if 
((nb == 0) || (flags & M)) - return 0; /* too hard or invalid instruction */ - - /* - * Special handling for dcbz - * dcbz may give an alignment exception for accesses to caching inhibited - * storage - */ - if (instr == DCBZ) - addr = (unsigned char __user *) ((unsigned long)addr & -L1_CACHE_BYTES); - - /* Verify the address of the operand */ - if (user_mode(regs)) { - if (!access_ok((flags & ST? VERIFY_WRITE: VERIFY_READ), addr, nb)) - return -EFAULT; /* bad address */ - } - - /* Force the fprs into the save area so we can reference them */ - if (flags & F) { - if (!user_mode(regs)) - return 0; - flush_fp_to_thread(current); - } - - /* If we are loading, get the data from user space */ - if (flags & LD) { - data.ll = 0; - ret = 0; - p = addr; - switch (nb) { - case 8: - ret |= __get_user(data.v[0], p++); - ret |= __get_user(data.v[1], p++); - ret |= __get_user(data.v[2], p++); - ret |= __get_user(data.v[3], p++); - case 4: - ret |= __get_user(data.v[4], p++); - ret |= __get_user(data.v[5], p++); - case 2: - ret |= __get_user(data.v[6], p++); - ret |= __get_user(data.v[7], p++); - if (ret) - return -EFAULT; - } - } - - /* If we are storing, get the data from the saved gpr or fpr */ - if (flags & ST) { - if (flags & F) { - if (nb == 4) { - /* Doing stfs, have to convert to single */ - preempt_disable(); - enable_kernel_fp(); - cvt_df(&current->thread.fpr[reg], (float *)&data.v[4], &current->thread); - disable_kernel_fp(); - preempt_enable(); - } - else - data.dd = current->thread.fpr[reg]; - } - else - data.ll = regs->gpr[reg]; - } - - /* Swap bytes as needed */ - if (flags & SW) { - if (nb == 2) - SWAP(data.v[6], data.v[7]); - else { /* nb must be 4 */ - SWAP(data.v[4], data.v[7]); - SWAP(data.v[5], data.v[6]); - } - } - - /* Sign extend as needed */ - if (flags & SE) { - if ( nb == 2 ) - data.ll = data.x16.low16; - else /* nb must be 4 */ - data.ll = data.x32.low32; - } - - /* If we are loading, move the data to the gpr or fpr */ - if (flags & LD) { - if (flags & F) { - if
(nb == 4) { - /* Doing lfs, have to convert to double */ - preempt_disable(); - enable_kernel_fp(); - cvt_fd((float *)&data.v[4], &current->thread.fpr[reg], &current->thread); - disable_kernel_fp(); - preempt_enable(); - } - else - current->thread.fpr[reg] = data.dd; - } - else - regs->gpr[reg] = data.ll; - } - - /* If we are storing, copy the data to the user */ - if (flags & ST) { - ret = 0; - p = addr; - switch (nb) { - case 128: /* Special case - must be dcbz */ - lp = (unsigned long __user *)p; - for (i = 0; i < L1_CACHE_BYTES / sizeof(long); ++i) - ret |= __put_user(0, lp++); - break; - case 8: - ret |= __put_user(data.v[0], p++); - ret |= __put_user(data.v[1], p++); - ret |= __put_user(data.v[2], p++); - ret |= __put_user(data.v[3], p++); - case 4: - ret |= __put_user(data.v[4], p++); - ret |= __put_user(data.v[5], p++); - case 2: - ret |= __put_user(data.v[6], p++); - ret |= __put_user(data.v[7], p++); - } - if (ret) - return -EFAULT; - } - - /* Update RA as needed */ - if (flags & U) { - regs->gpr[areg] = regs->dar; - } - - return 1; -} - From benh at kernel.crashing.org Tue Nov 15 14:40:25 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 15 Nov 2005 14:40:25 +1100 Subject: [PATCH] powerpc: update defconfigs Message-ID: <1132026026.6094.50.camel@gaston> My patch moving ppc64 RTC to genrtc was supposed to update all defconfigs, but for some reason, the patch actually posted only had the pseries one... ouch. This patch properly updates all defconfigs.
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/powerpc/configs/cell_defconfig =================================================================== --- linux-work.orig/arch/powerpc/configs/cell_defconfig 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/powerpc/configs/cell_defconfig 2005-11-15 14:36:26.000000000 +1100 @@ -1,18 +1,33 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.14-rc4 -# Thu Oct 20 08:29:10 2005 +# Linux kernel version: 2.6.15-rc1 +# Tue Nov 15 14:36:20 2005 # +CONFIG_PPC64=y CONFIG_64BIT=y +CONFIG_PPC_MERGE=y CONFIG_MMU=y +CONFIG_GENERIC_HARDIRQS=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y -CONFIG_GENERIC_ISA_DMA=y +CONFIG_PPC=y CONFIG_EARLY_PRINTK=y CONFIG_COMPAT=y +CONFIG_SYSVIPC_COMPAT=y CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y -CONFIG_FORCE_MAX_ZONEORDER=13 + +# +# Processor support +# +# CONFIG_POWER4_ONLY is not set +CONFIG_POWER3=y +CONFIG_POWER4=y +CONFIG_PPC_FPU=y +CONFIG_ALTIVEC=y +CONFIG_PPC_STD_MMU=y +CONFIG_SMP=y +CONFIG_NR_CPUS=4 # # Code maturity level options @@ -66,31 +81,69 @@ # CONFIG_MODULE_SRCVERSION_ALL is not set # CONFIG_KMOD is not set CONFIG_STOP_MACHINE=y -CONFIG_SYSVIPC_COMPAT=y + +# +# Block layer +# + +# +# IO Schedulers +# +CONFIG_IOSCHED_NOOP=y +CONFIG_IOSCHED_AS=y +CONFIG_IOSCHED_DEADLINE=y +CONFIG_IOSCHED_CFQ=y +CONFIG_DEFAULT_AS=y +# CONFIG_DEFAULT_DEADLINE is not set +# CONFIG_DEFAULT_CFQ is not set +# CONFIG_DEFAULT_NOOP is not set +CONFIG_DEFAULT_IOSCHED="anticipatory" # # Platform support # -# CONFIG_PPC_ISERIES is not set CONFIG_PPC_MULTIPLATFORM=y +# CONFIG_PPC_ISERIES is not set +# CONFIG_EMBEDDED6xx is not set +# CONFIG_APUS is not set # CONFIG_PPC_PSERIES is not set -CONFIG_PPC_BPA=y # CONFIG_PPC_PMAC is not set # CONFIG_PPC_MAPLE is not set -CONFIG_PPC=y -CONFIG_PPC64=y +CONFIG_PPC_CELL=y CONFIG_PPC_OF=y -CONFIG_BPA_IIC=y -CONFIG_ALTIVEC=y -CONFIG_KEXEC=y # CONFIG_U3_DART is not set -# 
CONFIG_BOOTX_TEXT is not set -# CONFIG_POWER4_ONLY is not set +CONFIG_PPC_RTAS=y +# CONFIG_RTAS_ERROR_LOGGING is not set +CONFIG_RTAS_PROC=y +CONFIG_RTAS_FLASH=y +CONFIG_MMIO_NVRAM=y +CONFIG_CELL_IIC=y +# CONFIG_PPC_MPC106 is not set +# CONFIG_GENERIC_TBSYNC is not set +# CONFIG_CPU_FREQ is not set +# CONFIG_WANT_EARLY_SERIAL is not set + +# +# Kernel options +# +# CONFIG_HZ_100 is not set +CONFIG_HZ_250=y +# CONFIG_HZ_1000 is not set +CONFIG_HZ=250 +CONFIG_PREEMPT_NONE=y +# CONFIG_PREEMPT_VOLUNTARY is not set +# CONFIG_PREEMPT is not set +CONFIG_PREEMPT_BKL=y +CONFIG_BINFMT_ELF=y +# CONFIG_BINFMT_MISC is not set +CONFIG_FORCE_MAX_ZONEORDER=13 # CONFIG_IOMMU_VMERGE is not set -CONFIG_SMP=y -CONFIG_NR_CPUS=4 +CONFIG_KEXEC=y +CONFIG_IRQ_ALL_CPUS=y +# CONFIG_NUMA is not set CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ARCH_FLATMEM_ENABLE=y +CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_SELECT_MEMORY_MODEL=y CONFIG_FLATMEM_MANUAL=y # CONFIG_DISCONTIGMEM_MANUAL is not set @@ -98,30 +151,21 @@ CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set -# CONFIG_NUMA is not set +CONFIG_SPLIT_PTLOCK_CPUS=4096 +# CONFIG_PPC_64K_PAGES is not set CONFIG_SCHED_SMT=y -CONFIG_PREEMPT_NONE=y -# CONFIG_PREEMPT_VOLUNTARY is not set -# CONFIG_PREEMPT is not set -CONFIG_PREEMPT_BKL=y -# CONFIG_HZ_100 is not set -CONFIG_HZ_250=y -# CONFIG_HZ_1000 is not set -CONFIG_HZ=250 -CONFIG_GENERIC_HARDIRQS=y -CONFIG_PPC_RTAS=y -CONFIG_RTAS_PROC=y -CONFIG_RTAS_FLASH=y -CONFIG_SECCOMP=y -CONFIG_BINFMT_ELF=y -# CONFIG_BINFMT_MISC is not set CONFIG_PROC_DEVICETREE=y # CONFIG_CMDLINE_BOOL is not set +# CONFIG_PM is not set +CONFIG_SECCOMP=y CONFIG_ISA_DMA_API=y # -# Bus Options +# Bus options # +CONFIG_GENERIC_ISA_DMA=y +# CONFIG_PPC_I8259 is not set +# CONFIG_PPC_INDIRECT_PCI is not set CONFIG_PCI=y CONFIG_PCI_DOMAINS=y CONFIG_PCI_LEGACY_PROC=y @@ -136,6 +180,7 @@ # PCI Hotplug Support # # CONFIG_HOTPLUG_PCI is not set +CONFIG_KERNEL_START=0xc000000000000000 # # Networking @@ -183,6 
+228,10 @@ CONFIG_IPV6_TUNNEL=m CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set + +# +# Core Netfilter Configuration +# # CONFIG_NETFILTER_NETLINK is not set # @@ -284,6 +333,10 @@ # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set + +# +# QoS and/or fair queueing +# # CONFIG_NET_SCHED is not set CONFIG_NET_CLS_ROUTE=y @@ -345,14 +398,6 @@ CONFIG_BLK_DEV_RAM_SIZE=131072 CONFIG_BLK_DEV_INITRD=y # CONFIG_CDROM_PKTCDVD is not set - -# -# IO Schedulers -# -CONFIG_IOSCHED_NOOP=y -CONFIG_IOSCHED_AS=y -CONFIG_IOSCHED_DEADLINE=y -CONFIG_IOSCHED_CFQ=y # CONFIG_ATA_OVER_ETH is not set # @@ -442,6 +487,7 @@ # # Macintosh device drivers # +# CONFIG_WINDFARM is not set # # Network device support @@ -495,7 +541,6 @@ # CONFIG_SK98LIN is not set # CONFIG_TIGON3 is not set # CONFIG_BNX2 is not set -# CONFIG_SPIDER_NET is not set # CONFIG_MV643XX_ETH is not set # @@ -625,7 +670,7 @@ # Watchdog Device Drivers # # CONFIG_SOFT_WATCHDOG is not set -CONFIG_WATCHDOG_RTAS=y +# CONFIG_WATCHDOG_RTAS is not set # # PCI-based Watchdog Cards @@ -633,6 +678,8 @@ # CONFIG_PCIPCWATCHDOG is not set # CONFIG_WDTPCI is not set # CONFIG_RTC is not set +CONFIG_GEN_RTC=y +# CONFIG_GEN_RTC_X is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set @@ -649,6 +696,7 @@ # TPM devices # # CONFIG_TCG_TPM is not set +# CONFIG_TELCLOCK is not set # # I2C support @@ -699,6 +747,7 @@ # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_RTC8564 is not set # CONFIG_SENSORS_MAX6875 is not set +# CONFIG_RTC_X1205_I2C is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set @@ -757,6 +806,10 @@ # CONFIG_USB is not set # +# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' +# + +# # USB Gadget Support # # CONFIG_USB_GADGET is not set @@ -943,9 +996,24 @@ # CONFIG_NLS_UTF8 is not set # -# Profiling support +# Library routines +# +# CONFIG_CRC_CCITT is not set +# CONFIG_CRC16 is 
not set +CONFIG_CRC32=y +# CONFIG_LIBCRC32C is not set +CONFIG_ZLIB_INFLATE=m +CONFIG_ZLIB_DEFLATE=m +CONFIG_TEXTSEARCH=y +CONFIG_TEXTSEARCH_KMP=m +CONFIG_TEXTSEARCH_BM=m +CONFIG_TEXTSEARCH_FSM=m + +# +# Instrumentation Support # # CONFIG_PROFILING is not set +# CONFIG_KPROBES is not set # # Kernel hacking @@ -962,13 +1030,14 @@ # CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_INFO is not set CONFIG_DEBUG_FS=y +# CONFIG_DEBUG_VM is not set +# CONFIG_RCU_TORTURE_TEST is not set # CONFIG_DEBUG_STACKOVERFLOW is not set -# CONFIG_KPROBES is not set # CONFIG_DEBUG_STACK_USAGE is not set CONFIG_DEBUGGER=y # CONFIG_XMON is not set -# CONFIG_PPCDBG is not set CONFIG_IRQSTACKS=y +# CONFIG_BOOTX_TEXT is not set # # Security options @@ -1008,17 +1077,3 @@ # # Hardware crypto devices # - -# -# Library routines -# -# CONFIG_CRC_CCITT is not set -# CONFIG_CRC16 is not set -CONFIG_CRC32=y -# CONFIG_LIBCRC32C is not set -CONFIG_ZLIB_INFLATE=m -CONFIG_ZLIB_DEFLATE=m -CONFIG_TEXTSEARCH=y -CONFIG_TEXTSEARCH_KMP=m -CONFIG_TEXTSEARCH_BM=m -CONFIG_TEXTSEARCH_FSM=m Index: linux-work/arch/powerpc/configs/g5_defconfig =================================================================== --- linux-work.orig/arch/powerpc/configs/g5_defconfig 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/configs/g5_defconfig 2005-11-15 14:39:28.000000000 +1100 @@ -1,7 +1,7 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.14 -# Mon Nov 7 13:37:59 2005 +# Linux kernel version: 2.6.15-rc1 +# Tue Nov 15 14:39:20 2005 # CONFIG_PPC64=y CONFIG_64BIT=y @@ -83,6 +83,23 @@ CONFIG_STOP_MACHINE=y # +# Block layer +# + +# +# IO Schedulers +# +CONFIG_IOSCHED_NOOP=y +CONFIG_IOSCHED_AS=y +CONFIG_IOSCHED_DEADLINE=y +CONFIG_IOSCHED_CFQ=y +CONFIG_DEFAULT_AS=y +# CONFIG_DEFAULT_DEADLINE is not set +# CONFIG_DEFAULT_CFQ is not set +# CONFIG_DEFAULT_NOOP is not set +CONFIG_DEFAULT_IOSCHED="anticipatory" + +# # Platform support # CONFIG_PPC_MULTIPLATFORM=y @@ -137,6 +154,7 @@ # 
CONFIG_NUMA is not set CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ARCH_FLATMEM_ENABLE=y +CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_SELECT_MEMORY_MODEL=y CONFIG_FLATMEM_MANUAL=y # CONFIG_DISCONTIGMEM_MANUAL is not set @@ -144,7 +162,7 @@ CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set -CONFIG_SPLIT_PTLOCK_CPUS=4 +CONFIG_SPLIT_PTLOCK_CPUS=4096 # CONFIG_PPC_64K_PAGES is not set # CONFIG_SCHED_SMT is not set CONFIG_PROC_DEVICETREE=y @@ -215,6 +233,10 @@ # CONFIG_IPV6 is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set + +# +# Core Netfilter Configuration +# # CONFIG_NETFILTER_NETLINK is not set # @@ -382,19 +404,6 @@ CONFIG_CDROM_PKTCDVD=m CONFIG_CDROM_PKTCDVD_BUFFERS=8 # CONFIG_CDROM_PKTCDVD_WCACHE is not set - -# -# IO Schedulers -# -CONFIG_IOSCHED_NOOP=y -CONFIG_IOSCHED_AS=y -CONFIG_IOSCHED_DEADLINE=y -CONFIG_IOSCHED_CFQ=y -CONFIG_DEFAULT_AS=y -# CONFIG_DEFAULT_DEADLINE is not set -# CONFIG_DEFAULT_CFQ is not set -# CONFIG_DEFAULT_NOOP is not set -CONFIG_DEFAULT_IOSCHED="anticipatory" # CONFIG_ATA_OVER_ETH is not set # @@ -656,7 +665,6 @@ # CONFIG_NET_TULIP is not set # CONFIG_HP100 is not set # CONFIG_NET_PCI is not set -# CONFIG_FEC_8XX is not set # # Ethernet (1000 Mbit) @@ -710,6 +718,7 @@ CONFIG_PPP_SYNC_TTY=m CONFIG_PPP_DEFLATE=m CONFIG_PPP_BSDCOMP=m +# CONFIG_PPP_MPPE is not set CONFIG_PPPOE=m # CONFIG_SLIP is not set # CONFIG_NET_FC is not set @@ -804,6 +813,8 @@ # # CONFIG_WATCHDOG is not set # CONFIG_RTC is not set +CONFIG_GEN_RTC=y +# CONFIG_GEN_RTC_X is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set @@ -917,7 +928,6 @@ CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y -CONFIG_FB_SOFT_CURSOR=y CONFIG_FB_MACMODES=y CONFIG_FB_MODE_HELPERS=y CONFIG_FB_TILEBLITTING=y @@ -932,6 +942,7 @@ # CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set # CONFIG_FB_VGA16 is not set +# CONFIG_FB_S1D13XXX is not set CONFIG_FB_NVIDIA=y CONFIG_FB_NVIDIA_I2C=y # 
CONFIG_FB_RIVA is not set @@ -950,7 +961,6 @@ # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_CYBLA is not set # CONFIG_FB_TRIDENT is not set -# CONFIG_FB_S1D13XXX is not set # CONFIG_FB_VIRTUAL is not set # @@ -959,6 +969,7 @@ # CONFIG_VGA_CONSOLE is not set CONFIG_DUMMY_CONSOLE=y CONFIG_FRAMEBUFFER_CONSOLE=y +# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set # CONFIG_FONTS is not set CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y @@ -1474,10 +1485,11 @@ CONFIG_TEXTSEARCH_FSM=m # -# Profiling support +# Instrumentation Support # CONFIG_PROFILING=y CONFIG_OPROFILE=y +# CONFIG_KPROBES is not set # # Kernel hacking @@ -1497,7 +1509,6 @@ # CONFIG_DEBUG_VM is not set # CONFIG_RCU_TORTURE_TEST is not set # CONFIG_DEBUG_STACKOVERFLOW is not set -# CONFIG_KPROBES is not set # CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUGGER is not set CONFIG_IRQSTACKS=y Index: linux-work/arch/powerpc/configs/iseries_defconfig =================================================================== --- linux-work.orig/arch/powerpc/configs/iseries_defconfig 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/powerpc/configs/iseries_defconfig 2005-11-15 14:38:16.000000000 +1100 @@ -1,18 +1,33 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.14-rc4 -# Thu Oct 20 08:30:56 2005 +# Linux kernel version: 2.6.15-rc1 +# Tue Nov 15 14:38:09 2005 # +CONFIG_PPC64=y CONFIG_64BIT=y +CONFIG_PPC_MERGE=y CONFIG_MMU=y +CONFIG_GENERIC_HARDIRQS=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y -CONFIG_GENERIC_ISA_DMA=y +CONFIG_PPC=y CONFIG_EARLY_PRINTK=y CONFIG_COMPAT=y +CONFIG_SYSVIPC_COMPAT=y CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y -CONFIG_FORCE_MAX_ZONEORDER=13 + +# +# Processor support +# +# CONFIG_POWER4_ONLY is not set +CONFIG_POWER3=y +CONFIG_POWER4=y +CONFIG_PPC_FPU=y +# CONFIG_ALTIVEC is not set +CONFIG_PPC_STD_MMU=y +CONFIG_SMP=y +CONFIG_NR_CPUS=32 # # Code maturity level options @@ -68,22 +83,60 @@ CONFIG_MODULE_SRCVERSION_ALL=y 
CONFIG_KMOD=y CONFIG_STOP_MACHINE=y -CONFIG_SYSVIPC_COMPAT=y + +# +# Block layer +# + +# +# IO Schedulers +# +CONFIG_IOSCHED_NOOP=y +CONFIG_IOSCHED_AS=y +CONFIG_IOSCHED_DEADLINE=y +CONFIG_IOSCHED_CFQ=y +CONFIG_DEFAULT_AS=y +# CONFIG_DEFAULT_DEADLINE is not set +# CONFIG_DEFAULT_CFQ is not set +# CONFIG_DEFAULT_NOOP is not set +CONFIG_DEFAULT_IOSCHED="anticipatory" # # Platform support # -CONFIG_PPC_ISERIES=y # CONFIG_PPC_MULTIPLATFORM is not set -CONFIG_PPC=y -CONFIG_PPC64=y +CONFIG_PPC_ISERIES=y +# CONFIG_EMBEDDED6xx is not set +# CONFIG_APUS is not set +# CONFIG_PPC_RTAS is not set +# CONFIG_MMIO_NVRAM is not set CONFIG_IBMVIO=y -# CONFIG_POWER4_ONLY is not set +# CONFIG_PPC_MPC106 is not set +# CONFIG_GENERIC_TBSYNC is not set +# CONFIG_CPU_FREQ is not set +# CONFIG_WANT_EARLY_SERIAL is not set + +# +# Kernel options +# +# CONFIG_HZ_100 is not set +CONFIG_HZ_250=y +# CONFIG_HZ_1000 is not set +CONFIG_HZ=250 +CONFIG_PREEMPT_NONE=y +# CONFIG_PREEMPT_VOLUNTARY is not set +# CONFIG_PREEMPT is not set +# CONFIG_PREEMPT_BKL is not set +CONFIG_BINFMT_ELF=y +# CONFIG_BINFMT_MISC is not set +CONFIG_FORCE_MAX_ZONEORDER=13 CONFIG_IOMMU_VMERGE=y -CONFIG_SMP=y -CONFIG_NR_CPUS=32 +CONFIG_IRQ_ALL_CPUS=y +CONFIG_LPARCFG=y +# CONFIG_NUMA is not set CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ARCH_FLATMEM_ENABLE=y +CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_SELECT_MEMORY_MODEL=y CONFIG_FLATMEM_MANUAL=y # CONFIG_DISCONTIGMEM_MANUAL is not set @@ -91,26 +144,20 @@ CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set -# CONFIG_NUMA is not set +CONFIG_SPLIT_PTLOCK_CPUS=4096 +# CONFIG_PPC_64K_PAGES is not set # CONFIG_SCHED_SMT is not set -CONFIG_PREEMPT_NONE=y -# CONFIG_PREEMPT_VOLUNTARY is not set -# CONFIG_PREEMPT is not set -# CONFIG_PREEMPT_BKL is not set -# CONFIG_HZ_100 is not set -CONFIG_HZ_250=y -# CONFIG_HZ_1000 is not set -CONFIG_HZ=250 -CONFIG_GENERIC_HARDIRQS=y -CONFIG_LPARCFG=y +CONFIG_PROC_DEVICETREE=y +# CONFIG_PM is not set CONFIG_SECCOMP=y 
-CONFIG_BINFMT_ELF=y -# CONFIG_BINFMT_MISC is not set CONFIG_ISA_DMA_API=y # -# Bus Options +# Bus options # +CONFIG_GENERIC_ISA_DMA=y +# CONFIG_PPC_I8259 is not set +# CONFIG_PPC_INDIRECT_PCI is not set CONFIG_PCI=y CONFIG_PCI_DOMAINS=y CONFIG_PCI_LEGACY_PROC=y @@ -125,6 +172,7 @@ # PCI Hotplug Support # # CONFIG_HOTPLUG_PCI is not set +CONFIG_KERNEL_START=0xc000000000000000 # # Networking @@ -166,6 +214,10 @@ # CONFIG_IPV6 is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set + +# +# Core Netfilter Configuration +# # CONFIG_NETFILTER_NETLINK is not set # @@ -265,6 +317,10 @@ # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set + +# +# QoS and/or fair queueing +# # CONFIG_NET_SCHED is not set CONFIG_NET_CLS_ROUTE=y @@ -326,14 +382,6 @@ CONFIG_BLK_DEV_RAM_SIZE=65536 CONFIG_BLK_DEV_INITRD=y # CONFIG_CDROM_PKTCDVD is not set - -# -# IO Schedulers -# -CONFIG_IOSCHED_NOOP=y -CONFIG_IOSCHED_AS=y -CONFIG_IOSCHED_DEADLINE=y -CONFIG_IOSCHED_CFQ=y # CONFIG_ATA_OVER_ETH is not set # @@ -377,6 +425,7 @@ # # SCSI low-level drivers # +# CONFIG_ISCSI_TCP is not set # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_ACARD is not set @@ -454,6 +503,7 @@ # # Macintosh device drivers # +# CONFIG_WINDFARM is not set # # Network device support @@ -561,6 +611,7 @@ CONFIG_PPP_SYNC_TTY=m CONFIG_PPP_DEFLATE=m CONFIG_PPP_BSDCOMP=m +# CONFIG_PPP_MPPE is not set CONFIG_PPPOE=m # CONFIG_SLIP is not set # CONFIG_NET_FC is not set @@ -643,6 +694,8 @@ # # CONFIG_WATCHDOG is not set # CONFIG_RTC is not set +CONFIG_GEN_RTC=y +# CONFIG_GEN_RTC_X is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set @@ -660,6 +713,7 @@ # TPM devices # # CONFIG_TCG_TPM is not set +# CONFIG_TELCLOCK is not set # # I2C support @@ -713,6 +767,10 @@ # CONFIG_USB is not set # +# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' +# + +# # USB Gadget Support # # CONFIG_USB_GADGET is not set @@ 
-917,10 +975,25 @@ CONFIG_VIOPATH=y # -# Profiling support +# Library routines +# +CONFIG_CRC_CCITT=m +# CONFIG_CRC16 is not set +CONFIG_CRC32=y +CONFIG_LIBCRC32C=m +CONFIG_ZLIB_INFLATE=y +CONFIG_ZLIB_DEFLATE=m +CONFIG_TEXTSEARCH=y +CONFIG_TEXTSEARCH_KMP=m +CONFIG_TEXTSEARCH_BM=m +CONFIG_TEXTSEARCH_FSM=m + +# +# Instrumentation Support # CONFIG_PROFILING=y CONFIG_OPROFILE=y +# CONFIG_KPROBES is not set # # Kernel hacking @@ -937,11 +1010,11 @@ # CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_INFO is not set CONFIG_DEBUG_FS=y +# CONFIG_DEBUG_VM is not set +# CONFIG_RCU_TORTURE_TEST is not set CONFIG_DEBUG_STACKOVERFLOW=y -# CONFIG_KPROBES is not set CONFIG_DEBUG_STACK_USAGE=y # CONFIG_DEBUGGER is not set -# CONFIG_PPCDBG is not set CONFIG_IRQSTACKS=y # @@ -982,17 +1055,3 @@ # # Hardware crypto devices # - -# -# Library routines -# -CONFIG_CRC_CCITT=m -# CONFIG_CRC16 is not set -CONFIG_CRC32=y -CONFIG_LIBCRC32C=m -CONFIG_ZLIB_INFLATE=y -CONFIG_ZLIB_DEFLATE=m -CONFIG_TEXTSEARCH=y -CONFIG_TEXTSEARCH_KMP=m -CONFIG_TEXTSEARCH_BM=m -CONFIG_TEXTSEARCH_FSM=m Index: linux-work/arch/powerpc/configs/maple_defconfig =================================================================== --- linux-work.orig/arch/powerpc/configs/maple_defconfig 2005-11-07 10:31:39.000000000 +1100 +++ linux-work/arch/powerpc/configs/maple_defconfig 2005-11-15 14:39:04.000000000 +1100 @@ -1,18 +1,32 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.14-rc4 -# Thu Oct 20 08:31:24 2005 +# Linux kernel version: 2.6.15-rc1 +# Tue Nov 15 14:38:58 2005 # +CONFIG_PPC64=y CONFIG_64BIT=y +CONFIG_PPC_MERGE=y CONFIG_MMU=y +CONFIG_GENERIC_HARDIRQS=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y -CONFIG_GENERIC_ISA_DMA=y +CONFIG_PPC=y CONFIG_EARLY_PRINTK=y CONFIG_COMPAT=y +CONFIG_SYSVIPC_COMPAT=y CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y -CONFIG_FORCE_MAX_ZONEORDER=13 + +# +# Processor support +# +CONFIG_POWER4_ONLY=y +CONFIG_POWER4=y 
+CONFIG_PPC_FPU=y +# CONFIG_ALTIVEC is not set +CONFIG_PPC_STD_MMU=y +CONFIG_SMP=y +CONFIG_NR_CPUS=2 # # Code maturity level options @@ -67,32 +81,67 @@ CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_KMOD=y CONFIG_STOP_MACHINE=y -CONFIG_SYSVIPC_COMPAT=y + +# +# Block layer +# + +# +# IO Schedulers +# +CONFIG_IOSCHED_NOOP=y +CONFIG_IOSCHED_AS=y +CONFIG_IOSCHED_DEADLINE=y +CONFIG_IOSCHED_CFQ=y +CONFIG_DEFAULT_AS=y +# CONFIG_DEFAULT_DEADLINE is not set +# CONFIG_DEFAULT_CFQ is not set +# CONFIG_DEFAULT_NOOP is not set +CONFIG_DEFAULT_IOSCHED="anticipatory" # # Platform support # -# CONFIG_PPC_ISERIES is not set CONFIG_PPC_MULTIPLATFORM=y +# CONFIG_PPC_ISERIES is not set +# CONFIG_EMBEDDED6xx is not set +# CONFIG_APUS is not set # CONFIG_PPC_PSERIES is not set -# CONFIG_PPC_BPA is not set # CONFIG_PPC_PMAC is not set CONFIG_PPC_MAPLE=y -CONFIG_PPC=y -CONFIG_PPC64=y +# CONFIG_PPC_CELL is not set CONFIG_PPC_OF=y -CONFIG_MPIC=y -# CONFIG_ALTIVEC is not set -CONFIG_KEXEC=y CONFIG_U3_DART=y +CONFIG_MPIC=y +# CONFIG_PPC_RTAS is not set +# CONFIG_MMIO_NVRAM is not set CONFIG_MPIC_BROKEN_U3=y -CONFIG_BOOTX_TEXT=y -CONFIG_POWER4_ONLY=y +# CONFIG_PPC_MPC106 is not set +CONFIG_GENERIC_TBSYNC=y +# CONFIG_CPU_FREQ is not set +# CONFIG_WANT_EARLY_SERIAL is not set + +# +# Kernel options +# +# CONFIG_HZ_100 is not set +CONFIG_HZ_250=y +# CONFIG_HZ_1000 is not set +CONFIG_HZ=250 +CONFIG_PREEMPT_NONE=y +# CONFIG_PREEMPT_VOLUNTARY is not set +# CONFIG_PREEMPT is not set +# CONFIG_PREEMPT_BKL is not set +CONFIG_BINFMT_ELF=y +# CONFIG_BINFMT_MISC is not set +CONFIG_FORCE_MAX_ZONEORDER=13 CONFIG_IOMMU_VMERGE=y -CONFIG_SMP=y -CONFIG_NR_CPUS=2 +CONFIG_KEXEC=y +CONFIG_IRQ_ALL_CPUS=y +# CONFIG_NUMA is not set CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ARCH_FLATMEM_ENABLE=y +CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_SELECT_MEMORY_MODEL=y CONFIG_FLATMEM_MANUAL=y # CONFIG_DISCONTIGMEM_MANUAL is not set @@ -100,27 +149,21 @@ CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set -# 
CONFIG_NUMA is not set +CONFIG_SPLIT_PTLOCK_CPUS=4096 +# CONFIG_PPC_64K_PAGES is not set # CONFIG_SCHED_SMT is not set -CONFIG_PREEMPT_NONE=y -# CONFIG_PREEMPT_VOLUNTARY is not set -# CONFIG_PREEMPT is not set -# CONFIG_PREEMPT_BKL is not set -# CONFIG_HZ_100 is not set -CONFIG_HZ_250=y -# CONFIG_HZ_1000 is not set -CONFIG_HZ=250 -CONFIG_GENERIC_HARDIRQS=y -CONFIG_SECCOMP=y -CONFIG_BINFMT_ELF=y -# CONFIG_BINFMT_MISC is not set CONFIG_PROC_DEVICETREE=y # CONFIG_CMDLINE_BOOL is not set +# CONFIG_PM is not set +CONFIG_SECCOMP=y CONFIG_ISA_DMA_API=y # -# Bus Options +# Bus options # +CONFIG_GENERIC_ISA_DMA=y +# CONFIG_PPC_I8259 is not set +# CONFIG_PPC_INDIRECT_PCI is not set CONFIG_PCI=y CONFIG_PCI_DOMAINS=y CONFIG_PCI_LEGACY_PROC=y @@ -135,6 +178,7 @@ # PCI Hotplug Support # # CONFIG_HOTPLUG_PCI is not set +CONFIG_KERNEL_START=0xc000000000000000 # # Networking @@ -193,6 +237,10 @@ # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set + +# +# QoS and/or fair queueing +# # CONFIG_NET_SCHED is not set # CONFIG_NET_CLS_ROUTE is not set @@ -254,14 +302,6 @@ CONFIG_BLK_DEV_RAM_SIZE=8192 # CONFIG_BLK_DEV_INITRD is not set # CONFIG_CDROM_PKTCDVD is not set - -# -# IO Schedulers -# -CONFIG_IOSCHED_NOOP=y -CONFIG_IOSCHED_AS=y -CONFIG_IOSCHED_DEADLINE=y -CONFIG_IOSCHED_CFQ=y # CONFIG_ATA_OVER_ETH is not set # @@ -351,6 +391,7 @@ # # Macintosh device drivers # +# CONFIG_WINDFARM is not set # # Network device support @@ -533,6 +574,8 @@ # # CONFIG_WATCHDOG is not set # CONFIG_RTC is not set +CONFIG_GEN_RTC=y +# CONFIG_GEN_RTC_X is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set @@ -549,6 +592,7 @@ # TPM devices # # CONFIG_TCG_TPM is not set +# CONFIG_TELCLOCK is not set # # I2C support @@ -599,6 +643,7 @@ # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_RTC8564 is not set # CONFIG_SENSORS_MAX6875 is not set +# CONFIG_RTC_X1205_I2C is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is 
not set # CONFIG_I2C_DEBUG_BUS is not set @@ -681,12 +726,15 @@ # # USB Device Class drivers # -# CONFIG_USB_BLUETOOTH_TTY is not set # CONFIG_USB_ACM is not set # CONFIG_USB_PRINTER is not set # -# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information +# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' +# + +# +# may also be needed; see USB_STORAGE Help for more information # # CONFIG_USB_STORAGE is not set @@ -776,6 +824,7 @@ # CONFIG_USB_SERIAL_KLSI is not set # CONFIG_USB_SERIAL_KOBIL_SCT is not set # CONFIG_USB_SERIAL_MCT_U232 is not set +# CONFIG_USB_SERIAL_NOKIA_DKU2 is not set # CONFIG_USB_SERIAL_PL2303 is not set # CONFIG_USB_SERIAL_HP4X is not set # CONFIG_USB_SERIAL_SAFE is not set @@ -985,9 +1034,19 @@ CONFIG_NLS_UTF8=y # -# Profiling support +# Library routines +# +CONFIG_CRC_CCITT=y +# CONFIG_CRC16 is not set +CONFIG_CRC32=y +# CONFIG_LIBCRC32C is not set +CONFIG_ZLIB_INFLATE=y + +# +# Instrumentation Support # # CONFIG_PROFILING is not set +# CONFIG_KPROBES is not set # # Kernel hacking @@ -1004,14 +1063,15 @@ # CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_INFO is not set CONFIG_DEBUG_FS=y +# CONFIG_DEBUG_VM is not set +# CONFIG_RCU_TORTURE_TEST is not set CONFIG_DEBUG_STACKOVERFLOW=y -# CONFIG_KPROBES is not set CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUGGER=y CONFIG_XMON=y CONFIG_XMON_DEFAULT=y -# CONFIG_PPCDBG is not set # CONFIG_IRQSTACKS is not set +CONFIG_BOOTX_TEXT=y # # Security options @@ -1051,12 +1111,3 @@ # # Hardware crypto devices # - -# -# Library routines -# -CONFIG_CRC_CCITT=y -# CONFIG_CRC16 is not set -CONFIG_CRC32=y -# CONFIG_LIBCRC32C is not set -CONFIG_ZLIB_INFLATE=y Index: linux-work/arch/powerpc/configs/pseries_defconfig =================================================================== --- linux-work.orig/arch/powerpc/configs/pseries_defconfig 2005-11-15 13:31:57.000000000 +1100 +++ linux-work/arch/powerpc/configs/pseries_defconfig 2005-11-15 
14:37:00.000000000 +1100 @@ -1,7 +1,7 @@ # # Automatically generated make config: don't edit # Linux kernel version: 2.6.15-rc1 -# Mon Nov 14 15:27:00 2005 +# Tue Nov 15 14:36:55 2005 # CONFIG_PPC64=y CONFIG_64BIT=y @@ -144,7 +144,7 @@ CONFIG_IOMMU_VMERGE=y CONFIG_HOTPLUG_CPU=y CONFIG_KEXEC=y -# CONFIG_IRQ_ALL_CPUS is not set +CONFIG_IRQ_ALL_CPUS=y CONFIG_PPC_SPLPAR=y CONFIG_EEH=y CONFIG_SCANLOG=m From michael at ellerman.id.au Tue Nov 15 14:49:22 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 15 Nov 2005 14:49:22 +1100 (EST) Subject: [PATCH] powerpc: Fixup debugging in lmb.c Message-ID: <20051115034922.2870468742@ozlabs.org> Somewhere we lost the include of udbg.h in lmb.c. While we're there, add a DBG macro like every other file has and use it in lmb_dump_all(). Signed-off-by: Michael Ellerman --- arch/powerpc/mm/lmb.c | 33 ++++++++++++++++++--------------- 1 files changed, 18 insertions(+), 15 deletions(-) Index: kexec/arch/powerpc/mm/lmb.c =================================================================== --- kexec.orig/arch/powerpc/mm/lmb.c +++ kexec/arch/powerpc/mm/lmb.c @@ -22,35 +22,38 @@ #include "mmu_decl.h" /* for __max_low_memory */ #endif -struct lmb lmb; - #undef DEBUG +#ifdef DEBUG +#include <asm/udbg.h> +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...) 
+#endif + +struct lmb lmb; + void lmb_dump_all(void) { #ifdef DEBUG unsigned long i; - udbg_printf("lmb_dump_all:\n"); - udbg_printf(" memory.cnt = 0x%lx\n", - lmb.memory.cnt); - udbg_printf(" memory.size = 0x%lx\n", - lmb.memory.size); + DBG("lmb_dump_all:\n"); + DBG(" memory.cnt = 0x%lx\n", lmb.memory.cnt); + DBG(" memory.size = 0x%lx\n", lmb.memory.size); for (i=0; i < lmb.memory.cnt ;i++) { - udbg_printf(" memory.region[0x%x].base = 0x%lx\n", + DBG(" memory.region[0x%x].base = 0x%lx\n", i, lmb.memory.region[i].base); - udbg_printf(" .size = 0x%lx\n", + DBG(" .size = 0x%lx\n", lmb.memory.region[i].size); } - udbg_printf("\n reserved.cnt = 0x%lx\n", - lmb.reserved.cnt); - udbg_printf(" reserved.size = 0x%lx\n", - lmb.reserved.size); + DBG("\n reserved.cnt = 0x%lx\n", lmb.reserved.cnt); + DBG(" reserved.size = 0x%lx\n", lmb.reserved.size); for (i=0; i < lmb.reserved.cnt ;i++) { - udbg_printf(" reserved.region[0x%x].base = 0x%lx\n", + DBG(" reserved.region[0x%x].base = 0x%lx\n", i, lmb.reserved.region[i].base); - udbg_printf(" .size = 0x%lx\n", + DBG(" .size = 0x%lx\n", lmb.reserved.region[i].size); } #endif /* DEBUG */ From michael at ellerman.id.au Tue Nov 15 15:16:38 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Tue, 15 Nov 2005 15:16:38 +1100 (EST) Subject: [PATCH] powerpc: More debugging fixups Message-ID: <20051115041638.76F2168746@ozlabs.org> Add a few more missing includes of udbg.h Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/pci_64.c | 2 +- arch/powerpc/kernel/setup-common.c | 1 + arch/powerpc/kernel/smp.c | 7 ++++--- arch/powerpc/platforms/pseries/smp.c | 1 + 4 files changed, 7 insertions(+), 4 deletions(-) Index: kexec/arch/powerpc/kernel/smp.c =================================================================== --- kexec.orig/arch/powerpc/kernel/smp.c +++ kexec/arch/powerpc/kernel/smp.c @@ -49,15 +49,16 @@ #include #endif -int smp_hw_index[NR_CPUS]; -struct thread_info *secondary_ti; - #ifdef DEBUG +#include <asm/udbg.h> #define
DBG(fmt...) udbg_printf(fmt) #else #define DBG(fmt...) #endif +int smp_hw_index[NR_CPUS]; +struct thread_info *secondary_ti; + cpumask_t cpu_possible_map = CPU_MASK_NONE; cpumask_t cpu_online_map = CPU_MASK_NONE; cpumask_t cpu_sibling_map[NR_CPUS] = { [0 ... NR_CPUS-1] = CPU_MASK_NONE }; Index: kexec/arch/powerpc/platforms/pseries/smp.c =================================================================== --- kexec.orig/arch/powerpc/platforms/pseries/smp.c +++ kexec/arch/powerpc/platforms/pseries/smp.c @@ -51,6 +51,7 @@ #include "plpar_wrappers.h" #ifdef DEBUG +#include <asm/udbg.h> #define DBG(fmt...) udbg_printf(fmt) #else #define DBG(fmt...) Index: kexec/arch/powerpc/kernel/pci_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/pci_64.c +++ kexec/arch/powerpc/kernel/pci_64.c @@ -30,10 +30,10 @@ #include #include #include -#include <asm/udbg.h> #include #ifdef DEBUG +#include <asm/udbg.h> #define DBG(fmt...) udbg_printf(fmt) #else #define DBG(fmt...) Index: kexec/arch/powerpc/kernel/setup-common.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup-common.c +++ kexec/arch/powerpc/kernel/setup-common.c @@ -59,6 +59,7 @@ #define DEBUG 1 #ifdef DEBUG +#include <asm/udbg.h> #define DBG(fmt...) udbg_printf(fmt) #else #define DBG(fmt...) From torvalds at osdl.org Tue Nov 15 15:20:12 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Mon, 14 Nov 2005 20:20:12 -0800 (PST) Subject: please pull powerpc-merge.git In-Reply-To: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> Message-ID: On Tue, 15 Nov 2005, Paul Mackerras wrote: > > git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge.git Hey, I just got an oops.
This was with a plain 2.6.15-rc1 kernel (so before the merge): Unable to handle kernel paging request for data at address 0xc0000000ff000000 Faulting instruction address: 0xc000000000031200 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2 NUMA POWERMAC Modules linked in: autofs NIP: C000000000031200 LR: C000000000097B90 CTR: 0000000000000020 REGS: c0000001561f38d0 TRAP: 0300 Not tainted (2.6.15-rc1-g508862e4) MSR: 9000000000009032 CR: 88004448 XER: 00000000 DAR: C0000000FF000000, DSISR: 0000000042010000 TASK = c000000105185040[6398] 'git-fsck-object' THREAD: c0000001561f0000 CPU: 0 GPR00: 0000000000000080 C0000001561F3B50 C0000000006CD7E0 C0000000FF000000 GPR04: 00000000F2113000 C0000000040E6000 C0000000005BA400 9000000000009032 GPR08: C00000000FFFA800 C0000000006CED38 C0000000006D1CA8 0000000000000020 GPR12: 0000000048004424 C0000000005BA400 0000000010020000 0000000010010000 GPR16: 0000000000044836 00000000004B1EC2 0000000000000000 0000000000000000 GPR20: C0000001791DCC80 C00000017BE63700 C000000145D66700 00000000F2113000 GPR24: 0000000002000000 0000000000000000 0000000000000898 C0000000040E6000 GPR28: C00000013914E898 C00000013914E000 C0000000005D7830 0000000000000000 NIP [C000000000031200] .clear_user_page+0x10/0x60 LR [C000000000097B90] .__handle_mm_fault+0xda0/0xf10 Call Trace: [C0000001561F3B50] [C000000000097B24] .__handle_mm_fault+0xd34/0xf10 (unreliable) [C0000001561F3C60] [C00000000049729C] .do_page_fault+0x4ec/0x7f0 [C0000001561F3E30] [C000000000004760] .handle_page_fault+0x20/0x54 Instruction dump: 4d820020 7c0018a8 7c004878 7c0019ad 40c2fff4 4e800020 60000000 60000000 e922a3e0 8169000c 80090004 7d6903a6 <7c001fec> 7c630214 4320fff8 e922a3d8 the worrisome thing is that it happened while running git-fsck-objects.. Any ideas? I'm really bad at decoding ppc64 oopses, so ... (There are other reports of VM-induced problems on -rc1, this is probably not ppc64-related). 
Linus From paulus at samba.org Tue Nov 15 15:40:14 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 15 Nov 2005 15:40:14 +1100 Subject: please pull powerpc-merge.git In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> Message-ID: <17273.26286.464586.872800@cargo.ozlabs.ibm.com> Linus Torvalds writes: > Hey, I just got an oops. > > This was with a plain 2.6.15-rc1 kernel (so before the merge): > > Unable to handle kernel paging request for data at address 0xc0000000ff000000 How much RAM do you have? That address is in the I/O hole (from 2G to 4G). > NIP [C000000000031200] .clear_user_page+0x10/0x60 ... > [C0000001561F3B50] [C000000000097B24] .__handle_mm_fault+0xd34/0xf10 (unreliable) > [C0000001561F3C60] [C00000000049729C] .do_page_fault+0x4ec/0x7f0 > [C0000001561F3E30] [C000000000004760] .handle_page_fault+0x20/0x54 That looks like just an ordinary page fault that got a bad address back from alloc_page_vma in alloc_zeroed_user_highpage. Why, I have no idea. > (There are other reports of VM-induced problems on -rc1, this is probably > not ppc64-related). Looks that way to me... Paul. From benh at kernel.crashing.org Tue Nov 15 16:05:33 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 15 Nov 2005 16:05:33 +1100 Subject: [PATCH] powerpc: pci_64 fixes & cleanups Message-ID: <1132031134.23979.2.camel@gaston> I discovered that in some cases (PowerMac for example) we wouldn't properly map the PCI IO space on recent kernels. In addition, the code for initializing PCI host bridges was scattered all over the place with some duplication between platforms. This patch fixes the problem and does a small cleanup by creating a pcibios_alloc_controller() in pci_64.c that is similar to the one in pci_32.c (just takes an additional device node argument) that takes care of all the grunt allocation and initialisation work. It should work for both boot time and dynamically allocated PHBs. 
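The allocation pattern described above (one entry point that picks the boot-time or the runtime allocator, and remembers which one it used so the free path does the right thing) can be sketched in plain userspace C. This is only an illustration under stated assumptions: every `sketch_*` name is invented here, the static pool merely stands in for bootmem, `malloc()` stands in for kmalloc(), and none of it is the kernel code from the patch.

```c
/* Userspace sketch only; none of the sketch_* names are kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel's mem_init_done flag: 0 while "booting". */
static int sketch_mem_init_done;

struct sketch_phb {
	void *arch_data;    /* owning device node, may be NULL */
	int is_dynamic;     /* nonzero if the memory came from malloc() */
	int global_number;  /* number handed out at setup time */
};

/* Stand-in for bootmem: a small static pool that is never freed. */
static struct sketch_phb sketch_boot_pool[4];
static size_t sketch_boot_used;
static int sketch_next_number;

static struct sketch_phb *sketch_boot_alloc(void)
{
	size_t slots = sizeof(sketch_boot_pool) / sizeof(sketch_boot_pool[0]);

	if (sketch_boot_used >= slots)
		return NULL;
	return &sketch_boot_pool[sketch_boot_used++];
}

/* Single entry point for both early-boot and runtime callers. */
static struct sketch_phb *sketch_alloc_controller(void *node)
{
	struct sketch_phb *phb;

	if (sketch_mem_init_done)
		phb = malloc(sizeof(*phb));
	else
		phb = sketch_boot_alloc();
	if (phb == NULL)
		return NULL;

	/* Common initialisation, independent of where the memory came from. */
	memset(phb, 0, sizeof(*phb));
	phb->global_number = sketch_next_number++;
	phb->arch_data = node;
	phb->is_dynamic = sketch_mem_init_done;
	return phb;
}

/* Only heap-backed controllers go back to the allocator. */
static void sketch_free_controller(struct sketch_phb *phb)
{
	assert(phb != NULL);
	if (phb->is_dynamic)
		free(phb);
}
```

A caller would allocate with `sketch_alloc_controller(node)` both before and after flipping `sketch_mem_init_done`, and release with `sketch_free_controller()` without needing to know which allocator was used.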
Signed-off-by: Benjamin Herrenschmidt --- John, can you give it a spin with pci hotplug and see if I didn't break anything there ? Thanks ! Index: linux-work/arch/powerpc/kernel/rtas_pci.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/rtas_pci.c 2005-11-11 10:14:48.000000000 +1100 +++ linux-work/arch/powerpc/kernel/rtas_pci.c 2005-11-15 14:58:09.000000000 +1100 @@ -304,75 +304,18 @@ struct pci_controller *phb, unsigned int addr_size_words) { - pci_setup_pci_controller(phb); - if (is_python(dev)) python_countermeasures(dev, addr_size_words); if (phb_set_bus_ranges(dev, phb)) return 1; - phb->arch_data = dev; phb->ops = &rtas_pci_ops; phb->buid = get_phb_buid(dev); return 0; } -static void __devinit add_linux_pci_domain(struct device_node *dev, - struct pci_controller *phb, - struct property *of_prop) -{ - memset(of_prop, 0, sizeof(struct property)); - of_prop->name = "linux,pci-domain"; - of_prop->length = sizeof(phb->global_number); - of_prop->value = (unsigned char *)&of_prop[1]; - memcpy(of_prop->value, &phb->global_number, sizeof(phb->global_number)); - prom_add_property(dev, of_prop); -} - -static struct pci_controller * __init alloc_phb(struct device_node *dev, - unsigned int addr_size_words) -{ - struct pci_controller *phb; - struct property *of_prop; - - phb = alloc_bootmem(sizeof(struct pci_controller)); - if (phb == NULL) - return NULL; - - of_prop = alloc_bootmem(sizeof(struct property) + - sizeof(phb->global_number)); - if (!of_prop) - return NULL; - - if (setup_phb(dev, phb, addr_size_words)) - return NULL; - - add_linux_pci_domain(dev, phb, of_prop); - - return phb; -} - -static struct pci_controller * __devinit alloc_phb_dynamic(struct device_node *dev, unsigned int addr_size_words) -{ - struct pci_controller *phb; - - phb = (struct pci_controller *)kmalloc(sizeof(struct pci_controller), - GFP_KERNEL); - if (phb == NULL) - return NULL; - - if (setup_phb(dev, phb, addr_size_words)) - return 
NULL; - - phb->is_dynamic = 1; - - /* TODO: linux,pci-domain? */ - - return phb; -} - unsigned long __init find_and_init_phbs(void) { struct device_node *node; @@ -397,10 +340,10 @@ if (node->type == NULL || strcmp(node->type, "pci") != 0) continue; - phb = alloc_phb(node, root_size_cells); + phb = pcibios_alloc_controller(node); if (!phb) continue; - + setup_phb(node, phb, root_size_cells); pci_process_bridge_OF_ranges(phb, node, 0); pci_setup_phb_io(phb, index == 0); #ifdef CONFIG_PPC_PSERIES @@ -446,10 +389,10 @@ root_size_cells = prom_n_size_cells(root); primary = list_empty(&hose_list); - phb = alloc_phb_dynamic(dn, root_size_cells); + phb = pcibios_alloc_controller(dn); if (!phb) return NULL; - + setup_phb(dn, phb, root_size_cells); pci_process_bridge_OF_ranges(phb, dn, primary); pci_setup_phb_io_dynamic(phb, primary); @@ -505,8 +448,7 @@ } list_del(&phb->list_node); - if (phb->is_dynamic) - kfree(phb); + pcibios_free_controller(phb); return 0; } Index: linux-work/arch/powerpc/platforms/iseries/pci.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/iseries/pci.c 2005-11-09 11:49:03.000000000 +1100 +++ linux-work/arch/powerpc/platforms/iseries/pci.c 2005-11-15 14:41:29.000000000 +1100 @@ -244,10 +244,9 @@ if (ret == 0) { printk("bus %d appears to exist\n", bus); - phb = (struct pci_controller *)kmalloc(sizeof(struct pci_controller), GFP_KERNEL); + phb = pcibios_alloc_controller(NULL); if (phb == NULL) return -ENOMEM; - pci_setup_pci_controller(phb); phb->pci_mem_offset = phb->local_number = bus; phb->first_busno = bus; Index: linux-work/arch/powerpc/platforms/maple/pci.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/maple/pci.c 2005-11-11 10:14:48.000000000 +1100 +++ linux-work/arch/powerpc/platforms/maple/pci.c 2005-11-15 14:41:29.000000000 +1100 @@ -326,26 +326,12 @@ dev->full_name); } - hose = alloc_bootmem(sizeof(struct 
pci_controller)); + hose = pcibios_alloc_controller(dev); if (hose == NULL) return -ENOMEM; - pci_setup_pci_controller(hose); - - hose->arch_data = dev; hose->first_busno = bus_range ? bus_range[0] : 0; hose->last_busno = bus_range ? bus_range[1] : 0xff; - of_prop = alloc_bootmem(sizeof(struct property) + - sizeof(hose->global_number)); - if (of_prop) { - memset(of_prop, 0, sizeof(struct property)); - of_prop->name = "linux,pci-domain"; - of_prop->length = sizeof(hose->global_number); - of_prop->value = (unsigned char *)&of_prop[1]; - memcpy(of_prop->value, &hose->global_number, sizeof(hose->global_number)); - prom_add_property(dev, of_prop); - } - disp_name = NULL; if (device_is_compatible(dev, "u3-agp")) { setup_u3_agp(hose); Index: linux-work/arch/powerpc/platforms/powermac/pci.c =================================================================== --- linux-work.orig/arch/powerpc/platforms/powermac/pci.c 2005-11-11 10:14:48.000000000 +1100 +++ linux-work/arch/powerpc/platforms/powermac/pci.c 2005-11-15 14:41:29.000000000 +1100 @@ -640,15 +640,16 @@ * the reg address cell, we shall fix that by killing struct * reg_property and using some accessor functions instead */ - hose->cfg_data = (volatile unsigned char *)ioremap(0xf2000000, 0x02000000); + hose->cfg_data = (volatile unsigned char *)ioremap(0xf2000000, + 0x02000000); /* - * /ht node doesn't expose a "ranges" property, so we "remove" regions that - * have been allocated to AGP. So far, this version of the code doesn't assign - * any of the 0xfxxxxxxx "fine" memory regions to /ht. - * We need to fix that sooner or later by either parsing all child "ranges" - * properties or figuring out the U3 address space decoding logic and - * then read its configuration register (if any). + * /ht node doesn't expose a "ranges" property, so we "remove" + * regions that have been allocated to AGP. So far, this version of + * the code doesn't assign any of the 0xfxxxxxxx "fine" memory regions + * to /ht. 
We need to fix that sooner or later by either parsing all + * child "ranges" properties or figuring out the U3 address space + * decoding logic and then read its configuration register (if any). */ hose->io_base_phys = 0xf4000000; hose->pci_io_size = 0x00400000; @@ -671,10 +672,10 @@ return; } - /* We "remove" the AGP resources from the resources allocated to HT, that - * is we create "holes". However, that code does assumptions that so far - * happen to be true (cross fingers...), typically that resources in the - * AGP node are properly ordered + /* We "remove" the AGP resources from the resources allocated to HT, + * that is we create "holes". However, that code does assumptions + * that so far happen to be true (cross fingers...), typically that + * resources in the AGP node are properly ordered */ cur = 0; for (i=0; i<3; i++) { @@ -684,23 +685,30 @@ /* We don't care about "fine" resources */ if (res->start >= 0xf0000000) continue; - /* Check if it's just a matter of "shrinking" us in one direction */ + /* Check if it's just a matter of "shrinking" us in one + * direction + */ if (hose->mem_resources[cur].start == res->start) { DBG("U3/HT: shrink start of %d, %08lx -> %08lx\n", - cur, hose->mem_resources[cur].start, res->end + 1); + cur, hose->mem_resources[cur].start, + res->end + 1); hose->mem_resources[cur].start = res->end + 1; continue; } if (hose->mem_resources[cur].end == res->end) { DBG("U3/HT: shrink end of %d, %08lx -> %08lx\n", - cur, hose->mem_resources[cur].end, res->start - 1); + cur, hose->mem_resources[cur].end, + res->start - 1); hose->mem_resources[cur].end = res->start - 1; continue; } /* No, it's not the case, we need a hole */ if (cur == 2) { - /* not enough resources for a hole, we drop part of the range */ - printk(KERN_WARNING "Running out of resources for /ht host !\n"); + /* not enough resources for a hole, we drop part + * of the range + */ + printk(KERN_WARNING "Running out of resources" + " for /ht host !\n"); 
hose->mem_resources[cur].end = res->start - 1; continue; } @@ -714,17 +722,6 @@ hose->mem_resources[cur-1].end = res->start - 1; } } - -/* XXX this needs to be converged between ppc32 and ppc64... */ -static struct pci_controller * __init pcibios_alloc_controller(void) -{ - struct pci_controller *hose; - - hose = alloc_bootmem(sizeof(struct pci_controller)); - if (hose) - pci_setup_pci_controller(hose); - return hose; -} #endif /* @@ -756,11 +753,16 @@ #endif bus_range = (int *) get_property(dev, "bus-range", &len); if (bus_range == NULL || len < 2 * sizeof(int)) { - printk(KERN_WARNING "Can't get bus-range for %s, assume bus 0\n", - dev->full_name); + printk(KERN_WARNING "Can't get bus-range for %s, assume" + " bus 0\n", dev->full_name); } + /* XXX Different prototypes, to be merged */ +#ifdef CONFIG_PPC64 + hose = pcibios_alloc_controller(dev); +#else hose = pcibios_alloc_controller(); +#endif if (!hose) return -ENOMEM; hose->arch_data = dev; @@ -768,7 +770,7 @@ hose->last_busno = bus_range ? bus_range[1] : 0xff; disp_name = NULL; -#ifdef CONFIG_POWER4 +#ifdef CONFIG_PPC64 if (device_is_compatible(dev, "u3-agp")) { setup_u3_agp(hose); disp_name = "U3-AGP"; Index: linux-work/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-work.orig/include/asm-ppc64/pci-bridge.h 2005-11-11 10:14:49.000000000 +1100 +++ linux-work/include/asm-ppc64/pci-bridge.h 2005-11-15 14:59:23.000000000 +1100 @@ -61,12 +61,14 @@ int busno; /* for pci devices */ int bussubno; /* for pci devices */ int devfn; /* for pci devices */ + +#ifdef CONFIG_PPC_PSERIES int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; int eeh_check_count; /* # times driver ignored error */ int eeh_freeze_count; /* # times this device froze up. 
*/ int eeh_is_bridge; /* device is pci-to-pci bridge */ - +#endif int pci_ext_config_space; /* for pci devices */ struct pci_controller *phb; /* for pci devices */ struct iommu_table *iommu_table; /* for phb's or bridges */ @@ -74,9 +76,9 @@ struct device_node *node; /* back-pointer to the device_node */ #ifdef CONFIG_PPC_ISERIES struct list_head Device_List; - int Irq; /* Assigned IRQ */ - int Flags; /* Possible flags(disable/bist)*/ - u8 LogicalSlot; /* Hv Slot Index for Tces */ + int Irq; /* Assigned IRQ */ + int Flags; /* Possible flags(disable/bist)*/ + u8 LogicalSlot; /* Hv Slot Index for Tces */ #endif u32 config_space[16]; /* saved PCI config space */ }; @@ -136,6 +138,10 @@ return PCI_DN(busdn)->phb; } +extern struct pci_controller * +pcibios_alloc_controller(struct device_node *dev); +extern void pcibios_free_controller(struct pci_controller *phb); + /* Return values for ppc_md.pci_probe_mode function */ #define PCI_PROBE_NONE -1 /* Don't look at this bus at all */ #define PCI_PROBE_NORMAL 0 /* Do normal PCI probing */ Index: linux-work/arch/powerpc/kernel/pci_64.c =================================================================== --- linux-work.orig/arch/powerpc/kernel/pci_64.c 2005-11-15 13:31:57.000000000 +1100 +++ linux-work/arch/powerpc/kernel/pci_64.c 2005-11-15 15:53:07.000000000 +1100 @@ -187,7 +187,7 @@ /* * pci_controller(phb) initialized common variables. 
*/ -void __devinit pci_setup_pci_controller(struct pci_controller *hose) +static void __devinit pci_setup_pci_controller(struct pci_controller *hose) { memset(hose, 0, sizeof(struct pci_controller)); @@ -197,6 +197,65 @@ spin_unlock(&hose_spinlock); } +static void add_linux_pci_domain(struct device_node *dev, + struct pci_controller *phb) +{ + struct property *of_prop; + unsigned int size; + + of_prop = (struct property *) + get_property(dev, "linux,pci-domain", &size); + if (of_prop != NULL) + return; + WARN_ON(of_prop && size < sizeof(int)); + if (of_prop && size < sizeof(int)) + of_prop = NULL; + size = sizeof(struct property) + sizeof(int); + if (of_prop == NULL) { + if (mem_init_done) + of_prop = kmalloc(size, GFP_KERNEL); + else + of_prop = alloc_bootmem(size); + } + memset(of_prop, 0, sizeof(struct property)); + of_prop->name = "linux,pci-domain"; + of_prop->length = sizeof(int); + of_prop->value = (unsigned char *)&of_prop[1]; + *((int *)of_prop->value) = phb->global_number; + prom_add_property(dev, of_prop); +} + +struct pci_controller * pcibios_alloc_controller(struct device_node *dev) +{ + struct pci_controller *phb; + + if (mem_init_done) + phb = kmalloc(sizeof(struct pci_controller), GFP_KERNEL); + else + phb = alloc_bootmem(sizeof (struct pci_controller)); + if (phb == NULL) + return NULL; + pci_setup_pci_controller(phb); + phb->arch_data = dev; + phb->is_dynamic = mem_init_done; + if (dev) + add_linux_pci_domain(dev, phb); + return phb; +} + +void pcibios_free_controller(struct pci_controller *phb) +{ + if (phb->arch_data) { + struct device_node *np = phb->arch_data; + int *domain = (int *)get_property(np, + "linux,pci-domain", NULL); + if (domain) + *domain = -1; + } + if (phb->is_dynamic) + kfree(phb); +} + static void __init pcibios_claim_one_bus(struct pci_bus *b) { struct pci_dev *dev; @@ -907,9 +966,10 @@ * (size depending on dev->n_addr_cells) * cells 4+5 or 5+6: the size of the range */ - rlen = 0; - hose->io_base_phys = 0; ranges = (unsigned 
int *) get_property(dev, "ranges", &rlen); + if (ranges == NULL) + return; + hose->io_base_phys = 0; while ((rlen -= np * sizeof(unsigned int)) >= 0) { res = NULL; pci_space = ranges[0]; @@ -1107,6 +1167,8 @@ if (get_bus_io_range(bus, &start_phys, &start_virt, &size)) return 1; + if (start_phys == 0) + return 1; printk("mapping IO %lx -> %lx, size: %lx\n", start_phys, start_virt, size); if (__ioremap_explicit(start_phys, start_virt, size, _PAGE_NO_CACHE | _PAGE_GUARDED)) Index: linux-work/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-work.orig/include/asm-powerpc/ppc-pci.h 2005-11-11 10:14:49.000000000 +1100 +++ linux-work/include/asm-powerpc/ppc-pci.h 2005-11-15 14:57:24.000000000 +1100 @@ -14,7 +14,6 @@ extern unsigned long isa_io_base; -extern void pci_setup_pci_controller(struct pci_controller *hose); extern void pci_setup_phb_io(struct pci_controller *hose, int primary); extern void pci_setup_phb_io_dynamic(struct pci_controller *hose, int primary); From becky.bruce at freescale.com Tue Nov 15 16:10:08 2005 From: becky.bruce at freescale.com (Becky Bruce) Date: Mon, 14 Nov 2005 23:10:08 -0600 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <1132001719.5504.204.camel@gaston> References: <1132001719.5504.204.camel@gaston> Message-ID: <269d7972781989e47cc114f8e2124b80@freescale.com> Ben, I've just done some basic testing of lmw/stmw, lwz/stw, lhx/sth, lfs/stfs, and lfd/stfd misaligned across a doubleword boundary, and everything looks good so far. I'll check out the byte reversals and a few other forms tomorrow. Cheers, B On Nov 14, 2005, at 2:55 PM, Benjamin Herrenschmidt wrote: > On Mon, 2005-11-14 at 13:53 -0600, Becky Bruce wrote: > > Ben, > > > > I talked to Kumar about this a little bit (I had started a merge of > > this file, but got distracted!) and he doesn't have any test cases. > > I'll put something together and test this out on some of the 32-bit > > systems I have here in my lab.
It won't be complete, but it will be > > something....... > > Thanks, > Ben. > From torvalds at osdl.org Tue Nov 15 16:27:20 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Mon, 14 Nov 2005 21:27:20 -0800 (PST) Subject: ppc64 oops.. In-Reply-To: <17273.26286.464586.872800@cargo.ozlabs.ibm.com> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> Message-ID: On Tue, 15 Nov 2005, Paul Mackerras wrote: > > How much RAM do you have? That address is in the I/O hole (from 2G to > 4G). Hmm. I _thought_ I had just 2GB (possibly 4GB) in this machine, but the bootup says ... [boot]0100 MM Init IO Hole assumed to be 80000000 -> ffffffff [boot]0100 MM Init Done Linux version 2.6.15-rc1-g4060994c (torvalds at g5.osdl.org) (gcc version 4.0.1 200.. [boot]0012 Setup Arch Top of RAM: 0x180000000, Total RAM: 0x100000000 Memory hole size: 2048MB ... On node 0 totalpages: 1572864 DMA zone: 1572864 pages, LIFO batch:64 DMA32 zone: 0 pages, LIFO batch:2 Normal zone: 0 pages, LIFO batch:2 HighMem zone: 0 pages, LIFO batch:2 (I'm now running a newer kernel that has a DMA32 zone, I wasn't running that when the oops happened). Which looks like it thinks I have 6GB. That's what "free" thinks too. Cool. I just got 4GB extra memory without even opening the machine! Magic kernel. 
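The numbers in the boot log above can be cross-checked by hand. The following is a minimal sketch, not kernel code: the helper names are made up for illustration, and the 4 KiB page size is an assumption (it is the value that makes the logged page count come out exactly).

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, which is what the logged page count implies. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hole size in MB, computed the way the boot message computes it:
 * (top of RAM - total RAM) >> 20. */
static uint64_t hole_size_mb(uint64_t top_of_ram, uint64_t total_ram)
{
	return (top_of_ram - total_ram) >> 20;
}

/* Number of whole pages contained in 'bytes' bytes. */
static uint64_t bytes_to_pages(uint64_t bytes)
{
	return bytes / SKETCH_PAGE_SIZE;
}
```

With Top of RAM 0x180000000 and Total RAM 0x100000000 the hole is 2048 MB, and "totalpages: 1572864" matches the page count for the full 0 to 6GB span rather than for the 4GB of real memory (which would be 1048576 pages). That is consistent with the hole being counted as if it were RAM.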
And I just found out how I can instantly crash the kernel again:

    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char * buf = malloc(1024*1024*1024);
        memset(buf, 0, 1024*1024*1024);
        sleep(100);
    }

I run two of those programs, and on the second one I get an oops again:

    Unable to handle kernel paging request for data at address 0xc0000000ff000000
    Faulting instruction address: 0xc000000000030800
    Oops: Kernel access of bad area, sig: 11 [#1]
    SMP NR_CPUS=2 NUMA POWERMAC
    Modules linked in: autofs
    NIP: C000000000030800 LR: C0000000000971F0 CTR: 0000000000000020
    REGS: c0000001023a38d0 TRAP: 0300 Not tainted (2.6.15-rc1-g4060994c)
    MSR: 9000000000009032 CR: 88000448 XER: 00000000
    DAR: C0000000FF000000, DSISR: 0000000042010000
    TASK = c00000015af957c0[19554] 'a.out' THREAD: c0000001023a0000 CPU: 1
    GPR00: 0000000000000080 C0000001023A3B50 C0000000006C8EF0 C0000000FF000000
    GPR04: 00000000BADB9000 C0000000040E6000 C0000000005B6C00 9000000000009032
    GPR08: C00000017BFB8A00 C0000000006CAD30 C0000000006CDCA0 0000000000000020
    GPR12: 0000000088000442 C0000000005B6C00 0000000000000000 000000001016D918
    GPR16: 00000000100D0000 0000000000000000 00000000100D0000 0000000000000000
    GPR20: C00000007EC566B0 C00000017BC13980 C00000016F836590 00000000BADB9000
    GPR24: 0000000002000000 0000000000000000 0000000000000DC8 C0000000040E6000
    GPR28: C000000006D08DC8 C000000006D08000 C0000000005D2EB8 0000000000000000
    NIP [C000000000030800] .clear_user_page+0x10/0x60
    LR [C0000000000971F0] .__handle_mm_fault+0xda0/0xf10
    Call Trace:
    [C0000001023A3B50] [C000000000097184] .__handle_mm_fault+0xd34/0xf10 (unreliable)
    [C0000001023A3C60] [C000000000496D3C] .do_page_fault+0x4ec/0x7f0
    [C0000001023A3E30] [C000000000004760] .handle_page_fault+0x20/0x54
    Instruction dump:
    4d820020 7c0018a8 7c004878 7c0019ad 40c2fff4 4e800020 60000000 60000000
    e922a810 8169000c 80090004 7d6903a6 <7c001fec> 7c630214 4320fff8 e922a808

ie it seems to have set up the mem_map[] to point all the way down from
6GB to 0, and then when I've used up the two high GB
of memory (the _real_ memory in this machine) it starts allocating memory that it doesn't have, and that it doesn't have TLB mappings for. > > (There are other reports of VM-induced problems on -rc1, this is probably > > not ppc64-related). > > Looks that way to me... No, looks like a ppc64 memory setup bug, although it's quite possibly brought on by the PageReserved() removal in the VM layer. Andrew, Nick, Hugh, I really think that removing that "PageReserved()" test from the page freeing functions was a mistake. I think I'm going to add it back in. I bet this happens on all the other architectures too. The bootup has marked pages reserved, and then frees them all. It used to be that the VM just silently skipped the reserved pages, now it will add them to the free lists.. Linus From benh at kernel.crashing.org Tue Nov 15 16:35:09 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 15 Nov 2005 16:35:09 +1100 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <269d7972781989e47cc114f8e2124b80@freescale.com> References: <1132001719.5504.204.camel@gaston> <269d7972781989e47cc114f8e2124b80@freescale.com> Message-ID: <1132032910.23979.6.camel@gaston> On Mon, 2005-11-14 at 23:10 -0600, Becky Bruce wrote: > Ben, > > I've just done some basic testing of lmw/stmw, lwz/stw, lhx/sth, > lfs/stfs, and lfd/stfd misaligned across a doubleword boundary, and > everything looks good so far. I'll check out the byte reversals and a > few other forms tomorrow. Excellent, thanks ! BTW. Make sure you test these on CPUs that actually trap on misaligned accesses :) Best is probably to do the misaligned access across a page boundary, that's what most CPUs can do. Ben. From akpm at osdl.org Tue Nov 15 16:41:58 2005 From: akpm at osdl.org (Andrew Morton) Date: Mon, 14 Nov 2005 21:41:58 -0800 Subject: ppc64 oops.. 
In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> Message-ID: <20051114214158.580883b3.akpm@osdl.org> Linus Torvalds wrote: > > Andrew, Nick, Hugh, I really think that removing that "PageReserved()" > test from the page freeing functions was a mistake. I think I'm going to > add it back in. It's worth seeing if that fixes it. I'm hankering to back out the whole thing - it just has too many problems for now. But Hugh's busily working on things and I thought it best to leave it a couple of days until he has a verdict. From torvalds at osdl.org Tue Nov 15 16:46:31 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Mon, 14 Nov 2005 21:46:31 -0800 (PST) Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> Message-ID: On Mon, 14 Nov 2005, Linus Torvalds wrote: > > Hmm. I _thought_ I had just 2GB (possibly 4GB) in this machine, Yeah. 4GB. My other G5 has just 2GB (I had to upgrade myself, I think it came with 512M from Apple). So it seems like it's just the IO hole in the middle that got magically "added" as memory, with real memory at 0-2GB and 4GB-6GB. Linus From paulus at samba.org Tue Nov 15 16:52:14 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 15 Nov 2005 16:52:14 +1100 Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> Message-ID: <17273.30606.694749.166420@cargo.ozlabs.ibm.com> Linus Torvalds writes: > Hmm. I _thought_ I had just 2GB (possibly 4GB) in this machine, but the > bootup says > > ... > [boot]0100 MM Init > IO Hole assumed to be 80000000 -> ffffffff > [boot]0100 MM Init Done > Linux version 2.6.15-rc1-g4060994c (torvalds at g5.osdl.org) (gcc version 4.0.1 200.. 
> [boot]0012 Setup Arch > Top of RAM: 0x180000000, Total RAM: 0x100000000 > Memory hole size: 2048MB That says you have 4GB, 2GB at 0 and 2GB at 4G, with a 2G hole in between. > ie it seems to have set up the mem_map[] to point all the way down from > 6GB to 0, and then when I've used up the two high GB of memory (the _real_ > memory in this machine) it starts allocating memory that it doesn't have, > and that it doesn't have TLB mappings for. Yep, looks that way. I wonder why it hits 0xc0000000ff000000 before (e.g.) 0xc0000000fffff000? > No, looks like a ppc64 memory setup bug, altough it's quite possibly > brought on by the PageReserved() removal in the VM layer. > > Andrew, Nick, Hugh, I really think that removing that "PageReserved()" > test from the page freeing functions was a mistake. I think I'm going to > add it back in. > > I bet this happens on all the other architectures too. The bootup has > marked pages reserved, and then frees them all. It used to be that the VM > just silently skipped the reserved pages, now it will add them to the free > lists.. Hmmm, the lmb structure that do_init_bootmem uses should have two entries, and we should be doing free_bootmem(0, 2G) and free_bootmem(4G, 2G), which I would have thought would do the right thing. A printk in there (arch/powerpc/mm/mem.c) would tell us... Paul. From torvalds at osdl.org Tue Nov 15 17:17:01 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Mon, 14 Nov 2005 22:17:01 -0800 (PST) Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> Message-ID: On Mon, 14 Nov 2005, Linus Torvalds wrote: > > Andrew, Nick, Hugh, I really think that removing that "PageReserved()" > test from the page freeing functions was a mistake. I think I'm going to > add it back in. No, it's not that simple. 
I think it's still related, but the bootmem code shouldn't have cared
about PG_reserved anyway, so my simplistic theory was incorrect. Maybe
it's a ppc64 thing after all.

If push comes to shove, I'll bisect it, but now I'm turning in for the
day.

        Linus

From benh at kernel.crashing.org Tue Nov 15 17:21:12 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 15 Nov 2005 17:21:12 +1100
Subject: [PATCH] powerpc: Make the vDSO functions set error code
Message-ID: <1132035672.23979.14.camel@gaston>

The vDSO functions should have the same calling convention as a syscall.
Unfortunately, they currently don't set the cr0.so bit which is used to
indicate an error. This patch makes them clear this bit unconditionally
since all functions currently succeed. The syscall fallback done by some
of them will eventually override this if the syscall fails.

This also changes the symbol version of all vdso exports to make sure
glibc can differentiate between old and fixed calls for existing ones
like __kernel_gettimeofday.

Signed-off-by: Benjamin Herrenschmidt
---

Tom, Steve: You'll have to write a wrapper macro to call the vdso similar
to the syscall one, that does something like:

    mtctr %0     <--- function address
    bctrl
    mfcr  %1     <--- error indication in cr0.so

With appropriate clobbers, similar to the syscall macro (the vDSO clobbers
all volatile registers: r0, r3 ... r12, cr0, cr1 and XER).

The advantage of doing so is that you don't have to create fake
descriptors for ppc64 and thus avoid useless TOC reloads, you can even
have a single macro that works on both 32 and 64 bits.

Index: linux-work/arch/powerpc/kernel/vdso32/cacheflush.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso32/cacheflush.S 2005-11-14 10:41:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso32/cacheflush.S 2005-11-15 17:11:51.000000000 +1100 @@ -30,6 +30,7 @@ */ V_FUNCTION_BEGIN(__kernel_sync_dicache) .cfi_startproc + crclr cr0*4+so li r5,127 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -58,6 +59,7 @@ */ V_FUNCTION_BEGIN(__kernel_sync_dicache_p5) .cfi_startproc + crclr cr0*4+so sync isync li r3,0 Index: linux-work/arch/powerpc/kernel/vdso32/datapage.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso32/datapage.S 2005-11-15 13:31:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso32/datapage.S 2005-11-15 17:10:46.000000000 +1100 @@ -52,6 +52,7 @@ */ V_FUNCTION_BEGIN(__kernel_get_syscall_map) .cfi_startproc + crclr cr0*4+so mflr r12 .cfi_register lr,r12 @@ -74,6 +75,7 @@ */ V_FUNCTION_BEGIN(__kernel_get_tbfreq) .cfi_startproc + crclr cr0*4+so mflr r12 .cfi_register lr,r12 bl __get_datapage at local Index: linux-work/arch/powerpc/kernel/vdso32/gettimeofday.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso32/gettimeofday.S 2005-11-15 13:31:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso32/gettimeofday.S 2005-11-15 17:10:10.000000000 +1100 @@ -26,6 +26,7 @@ */ V_FUNCTION_BEGIN(__kernel_gettimeofday) .cfi_startproc + crclr cr0*4+so mflr r12 .cfi_register lr,r12 @@ -80,6 +81,7 @@ */ V_FUNCTION_BEGIN(__kernel_clock_gettime) .cfi_startproc + crclr cr0*4+so /* Check for supported clock IDs */ cmpli cr0,r3,CLOCK_REALTIME cmpli cr1,r3,CLOCK_MONOTONIC @@ -211,6 +213,7 @@ */ V_FUNCTION_BEGIN(__kernel_clock_getres) .cfi_startproc + crclr cr0*4+so /* Check for supported clock IDs */ cmpwi cr0,r3,CLOCK_REALTIME cmpwi 
cr1,r3,CLOCK_MONOTONIC Index: linux-work/arch/powerpc/kernel/vdso64/cacheflush.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso64/cacheflush.S 2005-11-14 10:41:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso64/cacheflush.S 2005-11-15 17:12:53.000000000 +1100 @@ -30,6 +30,7 @@ */ V_FUNCTION_BEGIN(__kernel_sync_dicache) .cfi_startproc + crclr cr0*4+so li r5,127 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -58,6 +59,7 @@ */ V_FUNCTION_BEGIN(__kernel_sync_dicache_p5) .cfi_startproc + crclr cr0*4+so sync isync li r3,0 Index: linux-work/arch/powerpc/kernel/vdso64/datapage.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso64/datapage.S 2005-11-15 13:31:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso64/datapage.S 2005-11-15 17:12:39.000000000 +1100 @@ -52,6 +52,7 @@ */ V_FUNCTION_BEGIN(__kernel_get_syscall_map) .cfi_startproc + crclr cr0*4+so mflr r12 .cfi_register lr,r12 @@ -75,6 +76,7 @@ */ V_FUNCTION_BEGIN(__kernel_get_tbfreq) .cfi_startproc + crclr cr0*4+so mflr r12 .cfi_register lr,r12 bl V_LOCAL_FUNC(__get_datapage) Index: linux-work/arch/powerpc/kernel/vdso64/gettimeofday.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso64/gettimeofday.S 2005-11-15 13:31:58.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso64/gettimeofday.S 2005-11-15 17:12:29.000000000 +1100 @@ -27,6 +27,7 @@ */ V_FUNCTION_BEGIN(__kernel_gettimeofday) .cfi_startproc + crclr cr0*4+so mflr r12 .cfi_register lr,r12 @@ -66,6 +67,7 @@ */ V_FUNCTION_BEGIN(__kernel_clock_gettime) .cfi_startproc + crclr cr0*4+so /* Check for supported clock IDs */ cmpwi cr0,r3,CLOCK_REALTIME cmpwi cr1,r3,CLOCK_MONOTONIC @@ -185,6 +187,7 @@ */ V_FUNCTION_BEGIN(__kernel_clock_getres) .cfi_startproc + crclr cr0*4+so /* Check for supported clock IDs */ cmpwi 
cr0,r3,CLOCK_REALTIME cmpwi cr1,r3,CLOCK_MONOTONIC Index: linux-work/include/asm-powerpc/vdso.h =================================================================== --- linux-work.orig/include/asm-powerpc/vdso.h 2005-11-14 10:42:00.000000000 +1100 +++ linux-work/include/asm-powerpc/vdso.h 2005-11-15 17:17:20.000000000 +1100 @@ -11,7 +11,7 @@ #define VDSO32_MBASE VDSO32_LBASE #define VDSO64_MBASE VDSO64_LBASE -#define VDSO_VERSION_STRING LINUX_2.6.12 +#define VDSO_VERSION_STRING LINUX_2.6.15 /* Define if 64 bits VDSO has procedure descriptors */ #undef VDS64_HAS_DESCRIPTORS From torvalds at osdl.org Tue Nov 15 17:22:20 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Mon, 14 Nov 2005 22:22:20 -0800 (PST) Subject: ppc64 oops.. In-Reply-To: <20051114214158.580883b3.akpm@osdl.org> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <20051114214158.580883b3.akpm@osdl.org> Message-ID: On Mon, 14 Nov 2005, Andrew Morton wrote: > > It's worth seeing if that fixes it. The trivial patch didn't seem to fix it, and looking at the bootmem code I decided that maybe it shouldn't have ;) But I didn't undo all of it, so maybe it should have worked after all, and I just didn't do everything (I didn't do page_is_buddy() etc, I just added them back into the page freeing paths). > I'm hankering to back out the whole thing - it just has too many > problems for now. But Hugh's busily working on things and I thought it > best to leave it a couple of days until he has a verdict. Yeah. I'll give up for today, and look at it tomorrow. I think getting rid of PageReserved() was the right thing from a page table traversal standpoint: every time I see a code-path that removes a check for PageReserved() and replaces it with a VM_IO check or similar, it looks fine. I just think that people went too far in thinking that PageReserved itself was somehow wrong. 
It's great for the Zero page, and it's great for kernel pages in general (and things like the ISA hole on x86). Linus From torvalds at osdl.org Tue Nov 15 17:30:17 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Mon, 14 Nov 2005 22:30:17 -0800 (PST) Subject: ppc64 oops.. In-Reply-To: <17273.30606.694749.166420@cargo.ozlabs.ibm.com> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> Message-ID: On Tue, 15 Nov 2005, Paul Mackerras wrote: > > Yep, looks that way. I wonder why it hits 0xc0000000ff000000 before > (e.g.) 0xc0000000fffff000? I think it just puts the lists within a buddy size in reverse order (actually, it builds them up in ascending order, but then just pops them off, so it pops the high blocks first). But then when it splits a buddy block, it takes the first part of it and releases the rest in the lower order lists. > Hmmm, the lmb structure that do_init_bootmem uses should have two > entries, and we should be doing free_bootmem(0, 2G) and > free_bootmem(4G, 2G), which I would have thought would do the right > thing. A printk in there (arch/powerpc/mm/mem.c) would tell us... I'm just about to boot something that added some printk's to mm/bootmem.c that should be equivalent. But then I'm really turning in. Linus From torvalds at osdl.org Tue Nov 15 17:40:09 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Mon, 14 Nov 2005 22:40:09 -0800 (PST) Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> Message-ID: On Mon, 14 Nov 2005, Linus Torvalds wrote: > > I'm just about to boot something that added some printk's to mm/bootmem.c > that should be equivalent. But then I'm really turning in. 
Ok, looks like a ppc64 bug:

    Top of RAM: 0x180000000, Total RAM: 0x100000000
    Memory hole size: 2048MB
    Freeing bootmem 0,6442450944
    Reserving bootmem 0, 7589888
    Reserving bootmem 30408704, 1433600
    Reserving bootmem 34988032, 262144
    Reserving bootmem 268427264, 8192
    Reserving bootmem 2130702336, 16781312
    Reserving bootmem 6375018496, 196608
    Reserving bootmem 6375216128, 7048
    Reserving bootmem 6375223296, 4096
    Reserving bootmem 6375229016, 67221928

That's the trace from mm/bootmem.c.

So the ppc64 boot code adds one 6GB region, not two 2GB ones.

        Linus

From benh at kernel.crashing.org Tue Nov 15 18:01:00 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 15 Nov 2005 18:01:00 +1100
Subject: ppc64 oops..
In-Reply-To:
References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com>
Message-ID: <1132038060.5646.2.camel@gaston>

>
> No, looks like a ppc64 memory setup bug, although it's quite possibly
> brought on by the PageReserved() removal in the VM layer.
>
> Andrew, Nick, Hugh, I really think that removing that "PageReserved()"
> test from the page freeing functions was a mistake. I think I'm going to
> add it back in.
>
> I bet this happens on all the other architectures too. The bootup has
> marked pages reserved, and then frees them all. It used to be that the VM
> just silently skipped the reserved pages, now it will add them to the free
> lists..

Well, there are two interesting things here: One is the fact that the
kernel doesn't seem to have gotten the right value for the top of RAM
(can you send me a tarball of your /proc/device-tree so I can see what
open firmware exactly tells us please ?) and the other one is that we
tried to manipulate the IO hole as if it was memory (on this machine,
the region between 2G and 4G is IO).

Also, can you send me the entire dmesg so I can see what our early setup
code thinks about the physical memory ?

Ben.

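The decimal byte counts in the mm/bootmem.c trace quoted in this thread are easier to judge once compared against the IO hole reported at boot (80000000 -> ffffffff). The sketch below is an illustration only: `struct range` and both helpers are hypothetical names, not kernel APIs, and the ranges are half-open [base, base + size).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open byte range: [base, base + size). */
struct range {
	uint64_t base;
	uint64_t size;
};

/* True if the two ranges share at least one byte. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* Number of bytes of 'a' that fall inside 'b' (0 if disjoint). */
static uint64_t overlap_bytes(struct range a, struct range b)
{
	uint64_t start = a.base > b.base ? a.base : b.base;
	uint64_t end_a = a.base + a.size;
	uint64_t end_b = b.base + b.size;
	uint64_t end = end_a < end_b ? end_a : end_b;

	return end > start ? end - start : 0;
}

/* Values taken from the boot log and trace in this thread. */
static const struct range io_hole = { 0x80000000ULL, 0x80000000ULL };
static const struct range freed_observed = { 0, 6442450944ULL };
static const struct range freed_expected[2] = {
	{ 0x0ULL,         0x80000000ULL },	/* RAM at 0-2GB   */
	{ 0x100000000ULL, 0x80000000ULL },	/* RAM at 4GB-6GB */
};
```

The single freed range (0, 6442450944) is 0x180000000 bytes and covers the whole 2GB hole, while the two expected ranges do not touch it. The reservation at 2130702336 with length 16781312 ends exactly at 0x80000000, so it sits just below the hole rather than inside it.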
From benh at kernel.crashing.org Tue Nov 15 18:07:12 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 15 Nov 2005 18:07:12 +1100 Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> Message-ID: <1132038433.5646.9.camel@gaston> On Mon, 2005-11-14 at 21:46 -0800, Linus Torvalds wrote: > > On Mon, 14 Nov 2005, Linus Torvalds wrote: > > > > Hmm. I _thought_ I had just 2GB (possibly 4GB) in this machine, > > Yeah. 4GB. My other G5 has just 2GB (I had to upgrade myself, I think it > came with 512M from Apple). > > So it seems like it's just the IO hole in the middle that got magically > "added" as memory, with real memory at 0-2GB and 4GB-6GB. Yes, that would be the problem. The hole was not properly accounted for. When we setup the zones, we do pass the hole size (but the zone stuff doesn't seem to have a way to tell _where_ the hole is). I'm not sure how the memory init is supposed to work any more, that area has had too many changes that I didn't follow. What we do is to set a bootmem map for the whole size (6GB here) and then free_bootmem() on the actual RAM chunks in there, and memory_present() too. free_area_init_node() is called with a zone size of 6GB but a hole size of 2GB. Ben. From benh at kernel.crashing.org Tue Nov 15 18:11:37 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 15 Nov 2005 18:11:37 +1100 Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> Message-ID: <1132038697.5646.13.camel@gaston> On Mon, 2005-11-14 at 22:40 -0800, Linus Torvalds wrote: > > On Mon, 14 Nov 2005, Linus Torvalds wrote: > > > > I'm just about to boot something that added some printk's to mm/bootmem.c > > that should be equivalent. But then I'm really turning in. 
> > Ok, looks like a ppc64 bug: > > Top of RAM: 0x180000000, Total RAM: 0x100000000 > Memory hole size: 2048MB > Freeing bootmem 0,6442450944 Yes, that looks very wrong... You can try to add some debug output to early_init_dt_scan_memory() in arch/powerpc/kernel/prom.c to see what it does. Anyway, I'll have a look myself on the G5 here today, as soon as I'm done chasing the bug that breaks ARCH=ppc on some PowerMacs. Ben. From benh at kernel.crashing.org Tue Nov 15 18:50:22 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 15 Nov 2005 18:50:22 +1100 Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> Message-ID: <1132041022.5646.33.camel@gaston> On Mon, 2005-11-14 at 22:40 -0800, Linus Torvalds wrote: > > On Mon, 14 Nov 2005, Linus Torvalds wrote: > > > > I'm just about to boot something that added some printk's to mm/bootmem.c > > that should be equivalent. But then I'm really turning in. > > Ok, looks like a ppc64 bug: > > Top of RAM: 0x180000000, Total RAM: 0x100000000 > Memory hole size: 2048MB > Freeing bootmem 0,6442450944 > Reserving bootmem 0, 7589888 > Reserving bootmem 30408704, 1433600 > Reserving bootmem 34988032, 262144 > Reserving bootmem 268427264, 8192 > Reserving bootmem 2130702336, 16781312 > Reserving bootmem 6375018496, 196608 > Reserving bootmem 6375216128, 7048 > Reserving bootmem 6375223296, 4096 > Reserving bootmem 6375229016, 67221928 > > That's the trace from mm/bootmem.c. > > So the ppc64 boot code adds one 6GB region, not two 2GB ones. 
>

Mine, which has 3.5Gb does:

    free_bootmem_core(0, 80000000)
    free_bootmem_core(100000000, 60000000)
    reserve_bootmem_core(0, 702000)
    reserve_bootmem_core(1d06000, 40000)
    reserve_bootmem_core(ffee000, 12000)
    reserve_bootmem_core(7efff000, 1001000)
    reserve_bootmem_core(15bfb7000, 2c000)
    reserve_bootmem_core(15bfe3700, 888)
    reserve_bootmem_core(15bfe4000, 1000)
    reserve_bootmem_core(15bfe55c0, 401aa40)

Which looks correct.

However, I just noticed there is some big bogosity in CONFIG_NUMA,
arch/powerpc/mm/numa.c:

    static void __init setup_nonnuma(void)
    {
        unsigned long top_of_ram = lmb_end_of_DRAM();
        unsigned long total_ram = lmb_phys_mem_size();

        printk(KERN_INFO "Top of RAM: 0x%lx, Total RAM: 0x%lx\n",
               top_of_ram, total_ram);
        printk(KERN_INFO "Memory hole size: %ldMB\n",
               (top_of_ram - total_ram) >> 20);

        map_cpu_to_node(boot_cpuid, 0);
        add_region(0, 0, lmb_end_of_DRAM() >> PAGE_SHIFT);
        node_set_online(0);
    }

That is absolute junk. It totally ignores the IO hole and will trigger
exactly what you mentioned.

However, for that code to be reached, you need to have both:

 1) CONFIG_NUMA
 2) numa=off on the command line

Is this the case ?

I'll try to catch the NUMA folks to fix that crap ... I'm not exactly
sure what is the best way to proceed, probably when numa is disabled, we
should still go through all the nodes, but adding all the regions to the
same kernel-side node.

Ben.

From torvalds at osdl.org Tue Nov 15 19:01:17 2005
From: torvalds at osdl.org (Linus Torvalds)
Date: Tue, 15 Nov 2005 00:01:17 -0800 (PST)
Subject: ppc64 oops..
In-Reply-To: <1132041022.5646.33.camel@gaston> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> Message-ID: On Tue, 15 Nov 2005, Benjamin Herrenschmidt wrote: > > However, I just noticed there is some big bogosity in CONFIG_NUMA, > arch/powerpc/mm/numa.c: I do indeed have CONFIG_NUMA enabled for some totally unknown reason. > However, for that code to be reached, you need to have both: > > 1) CONFIG_NUMA Yes. > 2) numa=off on the command line Nope. Linus From benh at kernel.crashing.org Tue Nov 15 19:07:12 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 15 Nov 2005 19:07:12 +1100 Subject: ppc64 oops.. In-Reply-To: <1132041022.5646.33.camel@gaston> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> Message-ID: <1132042033.5646.39.camel@gaston> On Tue, 2005-11-15 at 18:50 +1100, Benjamin Herrenschmidt wrote: > 1) CONFIG_NUMA > 2) numa=off on the command line In fact, you don't need that, CONFIG_NUMA is enough ... provided that it builds at all, which needs CONFIG_SPARSEMEM (my previous test didn't show the bug because the kernel build actually failed due to the lack of CONFIG_SPARSEMEM and I didn't notice). I suppose we should fix Kconfig there ... In fact, sparsemem makes sense to use on a G5 even without CONFIG_NUMA, provided that I add proper support, so that the IO hole doesn't get struct page created ... I'll try to come up with a fix tomorrow, in the meantime, disable CONFIG_NUMA. Ben. 
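To make the setup_nonnuma() problem discussed in this thread concrete, here is a small userspace model. It is only a sketch under stated assumptions: `struct mem_region`, `pages_single_span()` and `pages_per_region()` are hypothetical names that stand in for the kernel's LMB table and add_region() bookkeeping, and a 4 KiB page size is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12

/* Hypothetical stand-in for one entry of the physical memory table. */
struct mem_region {
	uint64_t base;	/* physical start address in bytes */
	uint64_t size;	/* length in bytes */
};

/*
 * Models registering a single span from 0 to the end of the highest
 * region: every page below the top of RAM is counted, holes included.
 */
static uint64_t pages_single_span(const struct mem_region *r, size_t n)
{
	uint64_t top = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = r[i].base + r[i].size;

		if (end > top)
			top = end;
	}
	return top >> MODEL_PAGE_SHIFT;
}

/*
 * Models registering each region separately: only pages that are
 * backed by a region are counted, so holes are skipped.
 */
static uint64_t pages_per_region(const struct mem_region *r, size_t n)
{
	uint64_t pages = 0;

	for (size_t i = 0; i < n; i++)
		pages += r[i].size >> MODEL_PAGE_SHIFT;
	return pages;
}
```

For the 4GB machine in this thread (2GB at 0 and 2GB at 4GB), the single-span count is 1572864 pages, which includes the 2GB IO hole, while the per-region count is 1048576 pages. The first figure is the one that shows up as "totalpages" in the boot log quoted earlier.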
From hch at lst.de Tue Nov 15 20:20:42 2005 From: hch at lst.de (Christoph Hellwig) Date: Tue, 15 Nov 2005 10:20:42 +0100 Subject: [PATCH] powerpc: put page page_to_virt for Book-e processors In-Reply-To: References: Message-ID: <20051115092042.GA2060@lst.de> On Mon, Nov 14, 2005 at 05:21:44PM -0600, Kumar Gala wrote: > Book-E processors use page_to_virt since we have to always translate. Why can't you use the proper page_address() macro? From paulus at samba.org Tue Nov 15 22:01:34 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 15 Nov 2005 22:01:34 +1100 Subject: ppc64 oops.. In-Reply-To: <1132041022.5646.33.camel@gaston> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> Message-ID: <17273.49166.405599.953724@cargo.ozlabs.ibm.com> Benjamin Herrenschmidt writes: > However, for that code to be reached, you need to have both: > > 1) CONFIG_NUMA > 2) numa=off on the command line No, we'll also call setup_nonnuma() if the device tree doesn't have a /rtas node or if the /rtas node doesn't have a ibm,associativity-reference-points property. Which would be the case on the G5. Paul. From paulus at samba.org Tue Nov 15 22:08:18 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 15 Nov 2005 22:08:18 +1100 Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> Message-ID: <17273.49570.830903.559730@cargo.ozlabs.ibm.com> Linus Torvalds writes: > I do indeed have CONFIG_NUMA enabled for some totally unknown reason. I normally don't have CONFIG_NUMA on my G5 kernels, which is why I didn't notice it. I would think the patch below ought to fix it. 
But on my G5 (with 2.5GB of memory) I get an oops in mem_init due to pgdat_page_nr apparently returning 0x1c00000. When I turned on CONFIG_FLATMEM instead of CONFIG_SPARSEMEM, it wouldn't even compile or link. Grumble. Evidently I'm going to have to get my head around the sparsemem stuff. As Ben says, the best thing is probably just to turn off NUMA for now. Paul. diff -urN powerpc-merge/arch/powerpc/mm/numa.c merge-hack/arch/powerpc/mm/numa.c --- powerpc-merge/arch/powerpc/mm/numa.c 2005-11-14 10:35:09.000000000 +1100 +++ merge-hack/arch/powerpc/mm/numa.c 2005-11-15 21:58:26.000000000 +1100 @@ -483,6 +483,7 @@ { unsigned long top_of_ram = lmb_end_of_DRAM(); unsigned long total_ram = lmb_phys_mem_size(); + unsigned int i; printk(KERN_INFO "Top of RAM: 0x%lx, Total RAM: 0x%lx\n", top_of_ram, total_ram); @@ -490,7 +491,9 @@ (top_of_ram - total_ram) >> 20); map_cpu_to_node(boot_cpuid, 0); - add_region(0, 0, lmb_end_of_DRAM() >> PAGE_SHIFT); + for (i = 0; i < lmb.memory.cnt; ++i) + add_region(0, lmb.memory.region[i].base >> PAGE_SHIFT, + lmb_size_pages(&lmb.memory, i)); node_set_online(0); } From nickpiggin at yahoo.com.au Tue Nov 15 19:25:50 2005 From: nickpiggin at yahoo.com.au (Nick Piggin) Date: Tue, 15 Nov 2005 19:25:50 +1100 Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <20051114214158.580883b3.akpm@osdl.org> Message-ID: <43799B8E.3050600@yahoo.com.au> Linus Torvalds wrote: >>I'm hankering to back out the whole thing - it just has too many >>problems for now. But Hugh's busily working on things and I thought it >>best to leave it a couple of days until he has a verdict. > Been away for a couple of days and I'm still not caught up with things. Luckily this thing is looking more like a ppc64 bug, however I'll be watching this space to see if I can be of any use... > > Yeah. I'll give up for today, and look at it tomorrow. 
> > I think getting rid of PageReserved() was the right thing from a page > table traversal standpoint: every time I see a code-path that removes a > check for PageReserved() and replaces it with a VM_IO check or similar, it > looks fine. > > I just think that people went too far in thinking that PageReserved itself > was somehow wrong. It's great for the Zero page, and it's great for kernel > pages in general (and things like the ISA hole on x86). > I think it was really weird and conducive of bugs that PageReserved for some ungodly reason would turn put_page into a noop. I'm sure it was a hack when it was put in, and it IMO it was always a hack. Not to mention other side effects like preventing the page from being saved by swsusp, or exempting its user mappings from rmap accounting. I really don't think we've missed PG_reserved. The ZERO_PAGE accounting thing may be a problem, but that problem didn't come about due to removal of PageReserved, but rather the concurrent removal of ZERO_PAGE special casing we had there - it can be reinstated (and a solution for 2.6.15 won't be difficult). If we end up needing something like say, PG_zero for per-node zero pages then I would not be against the flag at all, so long as it had well defined semantics. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com From arnd at arndb.de Wed Nov 16 07:53:47 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Tue, 15 Nov 2005 15:53:47 -0500 Subject: [PATCH 0/5] SPU file system for 2.6.15-rc-mm Message-ID: <20051115205347.395355000@localhost> I'd like to have the SPU file system included in the -mm kernel to get broader review and eventually have it merged in 2.6.16. This version now puts all the spufs files under arch/powerpc/platforms/cell/spufs instead of fs/spufs, since it is really specific to that platform. 
The interface has now stabilized on a set of files per logical SPU that can be accessed with read/write and sometimes poll or mmap, but not ioctl, as well as two new system calls, spu_run and spu_create. As discussed, the system call numbers that I am using here conflict with those from the perfmon patches, so I'm moving the perfmon syscall range to start at 280. This means that the first patch gets to be applied on top of perfmon2-reserve-system-calls-reserve-spu-slots.patch. Arnd <>< -- From arnd at arndb.de Wed Nov 16 07:53:51 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Tue, 15 Nov 2005 15:53:51 -0500 Subject: [PATCH 4/5] spufs: add spu-side context switch code References: <20051115205347.395355000@localhost> Message-ID: <20051115210408.933936000@localhost> An embedded and charset-unspecified text was scrubbed... Name: spufs-generated-files.diff Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051115/d474de00/attachment.txt From arnd at arndb.de Wed Nov 16 07:53:50 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Tue, 15 Nov 2005 15:53:50 -0500 Subject: [PATCH 3/5] kernel-side context switch code for spufs References: <20051115205347.395355000@localhost> Message-ID: <20051115210408.777988000@localhost> An embedded and charset-unspecified text was scrubbed... Name: spufs-context-part2-4.diff Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051115/2f6966d6/attachment.txt From arnd at arndb.de Wed Nov 16 07:53:52 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Tue, 15 Nov 2005 15:53:52 -0500 Subject: [PATCH 5/5] spufs: cooperative scheduler support References: <20051115205347.395355000@localhost> Message-ID: <20051115210409.136237000@localhost> An embedded and charset-unspecified text was scrubbed...
Name: spufs-scheduler-2.diff Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051115/3fd5c459/attachment.txt From arnd at arndb.de Wed Nov 16 07:53:49 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Tue, 15 Nov 2005 15:53:49 -0500 Subject: [PATCH 2/5] spufs: switchable spu contexts References: <20051115205347.395355000@localhost> Message-ID: <20051115210408.533647000@localhost> An embedded and charset-unspecified text was scrubbed... Name: spufs-context-4.diff Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051115/e023fd4f/attachment.txt From arnd at arndb.de Wed Nov 16 07:53:48 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Tue, 15 Nov 2005 15:53:48 -0500 Subject: [PATCH 1/5] spufs: The SPU file system, base References: <20051115205347.395355000@localhost> Message-ID: <20051115210408.327453000@localhost> An embedded and charset-unspecified text was scrubbed... Name: spufs-12.diff Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051115/f4345aaa/attachment.txt From torvalds at osdl.org Wed Nov 16 03:19:30 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Tue, 15 Nov 2005 08:19:30 -0800 (PST) Subject: ppc64 oops.. In-Reply-To: <1132042033.5646.39.camel@gaston> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> Message-ID: On Tue, 15 Nov 2005, Benjamin Herrenschmidt wrote: > > I'll try to come up with a fix tomorrow, in the meantime, disable > CONFIG_NUMA. I'd love to, but then it never boots at all, and stops after "Setup done". I'll now try with "flatmem", although on powerpc, sparsemem seems to be the default, and I bet that's the cause: config ARCH_SPARSEMEM_DEFAULT def_bool y depends on SMP && PPC_PSERIES just because I had PSERIES enabled (which is _also_ the default). 
So nobody has clearly ever tested either NUMA or SPARSEMEM, yet they are both enabled by default. Tssk. Linus From galak at gate.crashing.org Wed Nov 16 03:45:01 2005 From: galak at gate.crashing.org (Kumar Gala) Date: Tue, 15 Nov 2005 10:45:01 -0600 (CST) Subject: [PATCH] powerpc: put page page_to_virt for Book-e processors In-Reply-To: <20051115092042.GA2060@lst.de> Message-ID: On Tue, 15 Nov 2005, Christoph Hellwig wrote: > On Mon, Nov 14, 2005 at 05:21:44PM -0600, Kumar Gala wrote: > > Book-E processors use page_to_virt since we have to always translate. > > Why can't you use the proper page_address() macro? What I really want is lowmem_page_address(), since the only current ppc user of page_to_virt() in include/asm-ppc/pgalloc.h is pmd_populate() for Book-E. We have to have pages in lowmem. - kumar From linas at austin.ibm.com Tue Nov 15 08:47:03 2005 From: linas at austin.ibm.com (linas) Date: Mon, 14 Nov 2005 15:47:03 -0600 Subject: [PATCH 0/7] PCI Error Recovery In-Reply-To: <20051108234911.GC19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> Message-ID: <20051114214703.GG19593@austin.ibm.com> On Tue, Nov 08, 2005 at 05:49:11PM -0600, linas was heard to remark: > > Following seven patches implement the PCI error reporting and recovery > header and device driver changes as recently discussed, w/all requested > changes & etc. These are tested and work well. Please apply. These patches don't seem to be in either linux-2.6.15-rc1-git2 or linux-2.6.15-mm2 Is there something else I need to do, besides nag?
--linas From greg at kroah.com Wed Nov 16 03:49:01 2005 From: greg at kroah.com (Greg KH) Date: Tue, 15 Nov 2005 08:49:01 -0800 Subject: [PATCH 0/7] PCI Error Recovery In-Reply-To: <20051114214703.GG19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> <20051114214703.GG19593@austin.ibm.com> Message-ID: <20051115164901.GA12968@kroah.com> On Mon, Nov 14, 2005 at 03:47:03PM -0600, linas wrote: > On Tue, Nov 08, 2005 at 05:49:11PM -0600, linas was heard to remark: > > > > Following seven patches implement the PCI error reporting and recovery > > header and device driver changes as recently discussed, w/all requested > > changes & etc. These are tested and work well. Please apply. > > These patches don't seem to be in either linux-2.6.15-rc1-git2 or linux-2.6.15-mm2 > > Is there something else I need to do, besides nag? Address the issue that was brought up on lkml with them? thanks, greg k-h From galak at gate.crashing.org Wed Nov 16 04:03:24 2005 From: galak at gate.crashing.org (Kumar Gala) Date: Tue, 15 Nov 2005 11:03:24 -0600 (CST) Subject: [PATCH] powerpc: replace page_to_virt() with lowmem_page_address() for Book-E Message-ID: page_to_virt and lowmem_page_address provide equivalent functionality, so use the more standard lowmem_page_address. This also addresses a build issue in ARCH=powerpc, since page_to_virt() has been removed from include/asm-powerpc/page.h Signed-off-by: Kumar Gala --- commit bd544f8830a563d50d58970c9360379efc555641 tree 887891cdf18bef998f5b5a5ffff9b879ff3591db parent 1b5b521b837ee0a9366a0750d915e09566229be4 author Kumar Gala Tue, 15 Nov 2005 11:03:34 -0600 committer Kumar Gala Tue, 15 Nov 2005 11:03:34 -0600 include/asm-ppc/pgalloc.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/include/asm-ppc/pgalloc.h b/include/asm-ppc/pgalloc.h index 931b6de..bdefd1c 100644 --- a/include/asm-ppc/pgalloc.h +++ b/include/asm-ppc/pgalloc.h @@ -28,7 +28,7 @@ extern void pgd_free(pgd_t *pgd); #define
pmd_populate_kernel(mm, pmd, pte) \ (pmd_val(*(pmd)) = (unsigned long)pte | _PMD_PRESENT) #define pmd_populate(mm, pmd, pte) \ - (pmd_val(*(pmd)) = (unsigned long)page_to_virt(pte) | _PMD_PRESENT) + (pmd_val(*(pmd)) = (unsigned long)lowmem_page_address(pte) | _PMD_PRESENT) #endif extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr); From torvalds at osdl.org Wed Nov 16 04:06:46 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Tue, 15 Nov 2005 09:06:46 -0800 (PST) Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> Message-ID: On Tue, 15 Nov 2005, Linus Torvalds wrote: > > I'll now try with "flatmem", although on powerpc, sparsemem seems to be > the default, and I bet that's the cause Indeed. My machine now has 4GB, and it all works again. So this was a purely ppc64 bug, not a VM one. Linus From hugh at veritas.com Wed Nov 16 04:21:06 2005 From: hugh at veritas.com (Hugh Dickins) Date: Tue, 15 Nov 2005 17:21:06 +0000 (GMT) Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> Message-ID: On Tue, 15 Nov 2005, Linus Torvalds wrote: > > So this was a purely ppc64 bug, not a VM one. But you're right to be concerned about the loss of that PG_reserved safety net for initial kernel memory. Like Nick, I would like to do away with it in the long term; but the kind of bugs we're seeing at the moment, whether or not they turn out to be caused by those changes, we do want the safety net back, with a warning to tell when it's being used (and not clearing PG_reserved in bad_page, but preventing the page being reused). 
After a couple of silent releases we can discuss whether to throw the net away; but it's been silently working for so long, we've no idea how much might be relying on it. Hugh From anton at samba.org Wed Nov 16 04:43:32 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 16 Nov 2005 04:43:32 +1100 Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> Message-ID: <20051115174332.GA9632@krispykreme> Hi, > I'd love to, but then it never boots at all, and stops after "Setup done". > > I'll now try with "flatmem", although on powerpc, sparsemem seems to be > the default, and I bet that's the cause: > > config ARCH_SPARSEMEM_DEFAULT > def_bool y > depends on SMP && PPC_PSERIES > > just because I had PSERIES enabled (which is _also_ the default). > > So nobody has clearly ever tested either NUMA or SPARSEMEM, yet they are > both enabled by default. Tssk. All options (flatmem, sparsemem and NUMA sparsemem) were tested on pseries; I don't have access to a G5 in Austin, unfortunately. Anton From linas at austin.ibm.com Wed Nov 16 04:59:34 2005 From: linas at austin.ibm.com (linas) Date: Tue, 15 Nov 2005 11:59:34 -0600 Subject: [PATCH 0/7] PCI Error Recovery In-Reply-To: <20051115164901.GA12968@kroah.com> References: <20051108234911.GC19593@austin.ibm.com> <20051114214703.GG19593@austin.ibm.com> <20051115164901.GA12968@kroah.com> Message-ID: <20051115175934.GO19593@austin.ibm.com> On Tue, Nov 15, 2005 at 08:49:01AM -0800, Greg KH was heard to remark: > On Mon, Nov 14, 2005 at 03:47:03PM -0600, linas wrote: > > On Tue, Nov 08, 2005 at 05:49:11PM -0600, linas was heard to remark: > > > > > > Following seven patches implement the PCI error reporting and recovery > > > header and device driver changes as recently discussed, w/all requested > > > changes & etc.
These are tested and work well. Please apply. > > > > These patches don't seem to be in either linux-2.6.15-rc1-git2 or linux-2.6.15-mm2 > > > > Is there something else I need to do, besides nag? > > Address the issue that was brought up on lkml with them? ? I'm sorry, I'm crawling the archives, and can't find any threads that haven't already been addressed in the final patchset. --linas From dwmw2 at infradead.org Wed Nov 16 05:52:18 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Tue, 15 Nov 2005 18:52:18 +0000 Subject: [PATCH] syscall entry/exit revamp Message-ID: <1132080738.21643.28.camel@hades.cambridge.redhat.com> This cleanup patch speeds up the null syscall path on ppc64 by about 3%, and brings the ppc32 and ppc64 code slightly closer together. The ppc64 code was checking current_thread_info()->flags twice in the syscall exit path; once for TIF_SYSCALL_T_OR_A before disabling interrupts, and then again for TIF_SIGPENDING|TIF_NEED_RESCHED etc after disabling interrupts. Now we do the same as ppc32 -- check the flags only once in the fast path, and re-enable interrupts if necessary in the ptrace case. The patch abolishes the 'syscall_noerror' member of struct thread_info and replaces it with a TIF_NOERROR bit in the flags, which is handled in the slow path. This shortens the syscall entry code, which no longer needs to clear syscall_noerror. The patch adds a TIF_SAVE_NVGPRS flag which causes the syscall exit slow path to save the non-volatile GPRs into a signal frame. This removes the need for the assembly wrappers around sys_sigsuspend(), sys_rt_sigsuspend(), et al which existed solely to save those registers in advance. It also means I don't have to add new wrappers for ppoll() and pselect(), which is what I was supposed to be doing when I got distracted into this...
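[Editorial sketch, not part of the original message.] The flag handling described above can be modelled roughly in C. This is only an illustration under stated assumptions: the real logic is PowerPC assembly in entry_32.S and entry_64.S, and the names struct regs_model, syscall_exit_model, LAST_ERRNO (with an assumed value of 516) and CR0_SO are hypothetical stand-ins rather than kernel definitions. The bit numbers for TIF_RESTOREALL and TIF_NOERROR are the ones the patch adds to thread_info.h.

```c
#include <stdint.h>

/* Bit numbers as added to thread_info.h by this patch. */
#define TIF_RESTOREALL 12
#define TIF_NOERROR    14

/* Assumed stand-ins for illustration, not kernel definitions. */
#define LAST_ERRNO 516          /* assumed value of _LAST_ERRNO */
#define CR0_SO     0x10000000u  /* SO bit in CR0, tested by userspace */

/* Hypothetical, cut-down view of the saved user registers. */
struct regs_model {
	unsigned long gpr3; /* syscall return value seen by userspace */
	uint32_t ccr;       /* saved condition register */
};

/*
 * Model of the exit-path decision:
 *  - TIF_RESTOREALL: leave both r3 and CR exactly as saved;
 *  - a return value in the errno range with TIF_NOERROR clear is
 *    negated and the SO bit is set;
 *  - otherwise the return value is stored unchanged.
 */
void syscall_exit_model(struct regs_model *regs, long ret,
			unsigned long flags)
{
	if (flags & (1UL << TIF_RESTOREALL))
		return;

	if ((unsigned long)ret >= (unsigned long)(-LAST_ERRNO) &&
	    !(flags & (1UL << TIF_NOERROR))) {
		ret = -ret;
		regs->ccr |= CR0_SO;
	}
	regs->gpr3 = (unsigned long)ret;
}
```

In this model a handler returning -4 (EINTR) with no flags set leaves 4 in gpr3 with SO set, which matches the values the signal_64.c hunk writes by hand in syscall_restart(); the same return with TIF_NOERROR set is passed through untouched.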
Finally, it unifies the ppc64 and ppc32 methods of handling syscall exit directly into a signal handler (as required by sigsuspend et al) by introducing a TIF_RESTOREALL flag which causes _all_ the registers to be reloaded from the pt_regs by taking the ret_from_exception path, instead of the normal syscall exit path which stomps on the callee-saved GPRs. It appears to pass an LTP test run on ppc64, and passes basic testing on ppc32 too. Brief tests of ptrace functionality with strace and gdb also appear OK. I wouldn't send it to Linus for 2.6.15 just yet though :) Signed-off-by: David Woodhouse diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 91538d2..3bf89d1 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -92,9 +92,9 @@ int main(void) DEFINE(TI_FLAGS, offsetof(struct thread_info, flags)); DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count)); - DEFINE(TI_SC_NOERR, offsetof(struct thread_info, syscall_noerror)); -#ifdef CONFIG_PPC32 + DEFINE(TI_SIGFRAME, offsetof(struct thread_info, nvgprs_frame)); DEFINE(TI_TASK, offsetof(struct thread_info, task)); +#ifdef CONFIG_PPC32 DEFINE(TI_EXECDOMAIN, offsetof(struct thread_info, exec_domain)); DEFINE(TI_CPU, offsetof(struct thread_info, cpu)); #endif /* CONFIG_PPC32 */ diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 2e99ae4..8fed953 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -200,8 +200,6 @@ _GLOBAL(DoSyscall) bl do_show_syscall #endif /* SHOW_SYSCALLS */ rlwinm r10,r1,0,0,(31-THREAD_SHIFT) /* current_thread_info() */ - li r11,0 - stb r11,TI_SC_NOERR(r10) lwz r11,TI_FLAGS(r10) andi. 
r11,r11,_TIF_SYSCALL_T_OR_A bne- syscall_dotrace @@ -222,25 +220,21 @@ ret_from_syscall: bl do_show_syscall_exit #endif mr r6,r3 - li r11,-_LAST_ERRNO - cmplw 0,r3,r11 rlwinm r12,r1,0,0,(31-THREAD_SHIFT) /* current_thread_info() */ - blt+ 30f - lbz r11,TI_SC_NOERR(r12) - cmpwi r11,0 - bne 30f - neg r3,r3 - lwz r10,_CCR(r1) /* Set SO bit in CR */ - oris r10,r10,0x1000 - stw r10,_CCR(r1) - /* disable interrupts so current_thread_info()->flags can't change */ -30: LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */ + LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */ SYNC MTMSRD(r10) lwz r9,TI_FLAGS(r12) - andi. r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SIGPENDING|_TIF_NEED_RESCHED) + li r8,-_LAST_ERRNO + andi. r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SIGPENDING|_TIF_NEED_RESCHED|_TIF_RESTOREALL) bne- syscall_exit_work + cmplw 0,r3,r8 + blt+ syscall_exit_cont + lwz r11,_CCR(r1) /* Load CR */ + neg r3,r3 + oris r11,r11,0x1000 /* Set SO bit in CR */ + stw r11,_CCR(r1) syscall_exit_cont: #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) /* If the process has its own DBCR0 value, load it up. The single @@ -292,46 +286,113 @@ syscall_dotrace: b syscall_dotrace_cont syscall_exit_work: - stw r6,RESULT(r1) /* Save result */ + andi. r0,r9,_TIF_RESTOREALL + bne- 2f + cmplw 0,r3,r8 + blt+ 1f + andi. r0,r9,_TIF_NOERROR + bne- 1f + lwz r11,_CCR(r1) /* Load CR */ + neg r3,r3 + oris r11,r11,0x1000 /* Set SO bit in CR */ + stw r11,_CCR(r1) + +1: stw r6,RESULT(r1) /* Save result */ stw r3,GPR3(r1) /* Update return value */ - andi. r0,r9,_TIF_SYSCALL_T_OR_A - beq 5f - ori r10,r10,MSR_EE - SYNC - MTMSRD(r10) /* re-enable interrupts */ +2: andi. r0,r9,(_TIF_PERSYSCALL_MASK) + beq 4f + + /* Clear per-syscall TIF flags if any are set, but _leave_ + _TIF_SAVE_NVGPRS set in r9 since we haven't dealt with that + yet. */ + + li r11,_TIF_PERSYSCALL_MASK + addi r12,r12,TI_FLAGS +3: lwarx r8,0,r12 + andc r8,r8,r11 +#ifdef CONFIG_IBM405_ERR77 + dcbt 0,r12 +#endif + stwcx. 
r8,0,r12 + bne- 3b + subi r12,r12,TI_FLAGS + +4: /* Anything which requires enabling interrupts? */ + andi. r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP|_TIF_SAVE_NVGPRS) + beq 7f + + /* Save NVGPRS if they're not saved already */ lwz r4,_TRAP(r1) andi. r4,r4,1 - beq 4f + beq 5f SAVE_NVGPRS(r1) li r4,0xc00 stw r4,_TRAP(r1) -4: + + /* Re-enable interrupts */ +5: ori r10,r10,MSR_EE + SYNC + MTMSRD(r10) + + andi. r0,r9,_TIF_SAVE_NVGPRS + bne save_user_nvgprs + +save_user_nvgprs_cont: + andi. r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP) + beq 7f + addi r3,r1,STACK_FRAME_OVERHEAD bl do_syscall_trace_leave REST_NVGPRS(r1) -2: - lwz r3,GPR3(r1) + +6: lwz r3,GPR3(r1) LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */ SYNC MTMSRD(r10) /* disable interrupts again */ rlwinm r12,r1,0,0,(31-THREAD_SHIFT) /* current_thread_info() */ lwz r9,TI_FLAGS(r12) -5: +7: andi. r0,r9,_TIF_NEED_RESCHED - bne 1f + bne 8f lwz r5,_MSR(r1) andi. r5,r5,MSR_PR - beq syscall_exit_cont + beq ret_from_except andi. 
r0,r9,_TIF_SIGPENDING - beq syscall_exit_cont + beq ret_from_except b do_user_signal -1: +8: ori r10,r10,MSR_EE SYNC MTMSRD(r10) /* re-enable interrupts */ bl schedule - b 2b + b 6b + +save_user_nvgprs: + ld r8,TI_SIGFRAME(r12) +.macro savewords start, end + 1: stw \start,4*(\start)(r8) + .section __ex_table,"a" + .align 2 + .long 1b,save_user_nvgprs_fault + .previous + .if \end - \start + savewords "(\start+1)",\end + .endif +.endm + savewords 14,31 + b save_user_nvgprs_cont + + +save_user_nvgprs_fault: + li r3,11 /* SIGSEGV */ + ld r4,TI_TASK(r12) + bl force_sigsegv + + rlwinm r12,r1,0,0,(31-THREAD_SHIFT) /* current_thread_info() */ + ld r9,TI_FLAGS(r12) + b save_user_nvgprs_cont + #ifdef SHOW_SYSCALLS do_show_syscall: #ifdef SHOW_SYSCALLS_TASK @@ -401,28 +462,10 @@ show_syscalls_task: #endif /* SHOW_SYSCALLS */ /* - * The sigsuspend and rt_sigsuspend system calls can call do_signal - * and thus put the process into the stopped state where we might - * want to examine its user state with ptrace. Therefore we need - * to save all the nonvolatile registers (r13 - r31) before calling - * the C code. + * The fork/clone functions need to copy the full register set into + * the child process. Therefore we need to save all the nonvolatile + * registers (r13 - r31) before calling the C code. 
*/ - .globl ppc_sigsuspend -ppc_sigsuspend: - SAVE_NVGPRS(r1) - lwz r0,_TRAP(r1) - rlwinm r0,r0,0,0,30 /* clear LSB to indicate full */ - stw r0,_TRAP(r1) /* register set saved */ - b sys_sigsuspend - - .globl ppc_rt_sigsuspend -ppc_rt_sigsuspend: - SAVE_NVGPRS(r1) - lwz r0,_TRAP(r1) - rlwinm r0,r0,0,0,30 - stw r0,_TRAP(r1) - b sys_rt_sigsuspend - .globl ppc_fork ppc_fork: SAVE_NVGPRS(r1) @@ -447,14 +490,6 @@ ppc_clone: stw r0,_TRAP(r1) /* register set saved */ b sys_clone - .globl ppc_swapcontext -ppc_swapcontext: - SAVE_NVGPRS(r1) - lwz r0,_TRAP(r1) - rlwinm r0,r0,0,0,30 /* clear LSB to indicate full */ - stw r0,_TRAP(r1) /* register set saved */ - b sys_swapcontext - /* * Top-level page fault handling. * This is in assembler because if do_page_fault tells us that @@ -626,16 +661,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_601) .long ret_from_except #endif - .globl sigreturn_exit -sigreturn_exit: - subi r1,r3,STACK_FRAME_OVERHEAD - rlwinm r12,r1,0,0,(31-THREAD_SHIFT) /* current_thread_info() */ - lwz r9,TI_FLAGS(r12) - andi. r0,r9,_TIF_SYSCALL_T_OR_A - beq+ ret_from_except_full - bl do_syscall_trace_leave - /* fall through */ - .globl ret_from_except_full ret_from_except_full: REST_NVGPRS(r1) @@ -658,7 +683,7 @@ user_exc_return: /* r10 contains MSR_KE /* Check current_thread_info()->flags */ rlwinm r9,r1,0,0,(31-THREAD_SHIFT) lwz r9,TI_FLAGS(r9) - andi. r0,r9,(_TIF_SIGPENDING|_TIF_NEED_RESCHED) + andi. r0,r9,(_TIF_SIGPENDING|_TIF_NEED_RESCHED|_TIF_RESTOREALL) bne do_work restore_user: diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 2d22bf0..83b9edf 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -113,9 +113,7 @@ system_call_common: addi r9,r1,STACK_FRAME_OVERHEAD #endif clrrdi r11,r1,THREAD_SHIFT - li r12,0 ld r10,TI_FLAGS(r11) - stb r12,TI_SC_NOERR(r11) andi. 
r11,r10,_TIF_SYSCALL_T_OR_A bne- syscall_dotrace syscall_dotrace_cont: @@ -144,24 +142,12 @@ system_call: /* label this so stack tr bctrl /* Call handler */ syscall_exit: + std r3,RESULT(r1) #ifdef SHOW_SYSCALLS - std r3,GPR3(r1) bl .do_show_syscall_exit - ld r3,GPR3(r1) + ld r3,RESULT(r1) #endif - std r3,RESULT(r1) - ld r5,_CCR(r1) - li r10,-_LAST_ERRNO - cmpld r3,r10 clrrdi r12,r1,THREAD_SHIFT - bge- syscall_error -syscall_error_cont: - - /* check for syscall tracing or audit */ - ld r9,TI_FLAGS(r12) - andi. r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP) - bne- syscall_exit_trace -syscall_exit_trace_cont: /* disable interrupts so current_thread_info()->flags can't change, and so that we don't get interrupted after loading SRR0/1. */ @@ -173,8 +159,13 @@ syscall_exit_trace_cont: rotldi r10,r10,16 mtmsrd r10,1 ld r9,TI_FLAGS(r12) - andi. r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SIGPENDING|_TIF_NEED_RESCHED) + li r11,-_LAST_ERRNO + andi. r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP|_TIF_SIGPENDING|_TIF_NEED_RESCHED|_TIF_RESTOREALL|_TIF_SAVE_NVGPRS|_TIF_NOERROR) bne- syscall_exit_work + cmpld r3,r11 + ld r5,_CCR(r1) + bge- syscall_error +syscall_error_cont: ld r7,_NIP(r1) stdcx. r0,0,r1 /* to clear the reservation */ andi. r6,r8,MSR_PR @@ -193,21 +184,12 @@ syscall_exit_trace_cont: rfid b . /* prevent speculative execution */ -syscall_enosys: - li r3,-ENOSYS - std r3,RESULT(r1) - clrrdi r12,r1,THREAD_SHIFT - ld r5,_CCR(r1) - -syscall_error: - lbz r11,TI_SC_NOERR(r12) - cmpwi 0,r11,0 - bne- syscall_error_cont - neg r3,r3 +syscall_error: oris r5,r5,0x1000 /* Set SO bit in CR */ + neg r3,r3 std r5,_CCR(r1) b syscall_error_cont - + /* Traced system call support */ syscall_dotrace: bl .save_nvgprs @@ -225,21 +207,69 @@ syscall_dotrace: ld r10,TI_FLAGS(r10) b syscall_dotrace_cont -syscall_exit_trace: - std r3,GPR3(r1) - bl .save_nvgprs +syscall_enosys: + li r3,-ENOSYS + b syscall_exit + +syscall_exit_work: + /* If TIF_RESTOREALL is set, don't scribble on either r3 or ccr. 
+ If TIF_NOERROR is set, just save r3 as it is. */ + + andi. r0,r9,_TIF_RESTOREALL + bne- 2f + cmpld r3,r11 /* r10 is -LAST_ERRNO */ + blt+ 1f + andi. r0,r9,_TIF_NOERROR + bne- 1f + ld r5,_CCR(r1) + neg r3,r3 + oris r5,r5,0x1000 /* Set SO bit in CR */ + std r5,_CCR(r1) +1: std r3,GPR3(r1) +2: andi. r0,r9,(_TIF_PERSYSCALL_MASK) + beq 4f + + /* Clear per-syscall TIF flags if any are set, but _leave_ + _TIF_SAVE_NVGPRS set in r9 since we haven't dealt with that + yet. */ + + li r11,_TIF_PERSYSCALL_MASK + addi r12,r12,TI_FLAGS +3: ldarx r10,0,r12 + andc r10,r10,r11 + stdcx. r10,0,r12 + bne- 3b + subi r12,r12,TI_FLAGS + +4: bl save_nvgprs + /* Anything else left to do? */ + andi. r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP|_TIF_SAVE_NVGPRS) + beq .ret_from_except_lite + + /* Re-enable interrupts */ + mfmsr r10 + ori r10,r10,MSR_EE + mtmsrd r10,1 + + andi. r0,r9,_TIF_SAVE_NVGPRS + bne save_user_nvgprs + + /* If tracing, re-enable interrupts and do it */ +save_user_nvgprs_cont: + andi. r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP) + beq 5f + addi r3,r1,STACK_FRAME_OVERHEAD bl .do_syscall_trace_leave REST_NVGPRS(r1) - ld r3,GPR3(r1) - ld r5,_CCR(r1) clrrdi r12,r1,THREAD_SHIFT - b syscall_exit_trace_cont -/* Stuff to do on exit from a system call. */ -syscall_exit_work: - std r3,GPR3(r1) - std r5,_CCR(r1) + /* Disable interrupts again and handle other work if any */ +5: mfmsr r10 + rldicl r10,r10,48,1 + rotldi r10,r10,16 + mtmsrd r10,1 + b .ret_from_except_lite /* Save non-volatile GPRs, if not already saved. */ @@ -252,6 +282,52 @@ _GLOBAL(save_nvgprs) std r0,_TRAP(r1) blr + +save_user_nvgprs: + ld r10,TI_SIGFRAME(r12) + andi. 
r0,r9,_TIF_32BIT + beq- save_user_nvgprs_64 + + /* 32-bit save to userspace */ + +.macro savewords start, end + 1: stw \start,4*(\start)(r10) + .section __ex_table,"a" + .align 3 + .llong 1b,save_user_nvgprs_fault + .previous + .if \end - \start + savewords "(\start+1)",\end + .endif +.endm + savewords 14,31 + b save_user_nvgprs_cont + +save_user_nvgprs_64: + /* 64-bit save to userspace */ + +.macro savelongs start, end + 1: std \start,8*(\start)(r10) + .section __ex_table,"a" + .align 3 + .llong 1b,save_user_nvgprs_fault + .previous + .if \end - \start + savelongs "(\start+1)",\end + .endif +.endm + savelongs 14,31 + b save_user_nvgprs_cont + +save_user_nvgprs_fault: + li r3,11 /* SIGSEGV */ + ld r4,TI_TASK(r12) + bl .force_sigsegv + + clrrdi r12,r1,THREAD_SHIFT + ld r9,TI_FLAGS(r12) + b save_user_nvgprs_cont + /* * The sigsuspend and rt_sigsuspend system calls can call do_signal * and thus put the process into the stopped state where we might @@ -260,35 +336,6 @@ _GLOBAL(save_nvgprs) * the C code. Similarly, fork, vfork and clone need the full * register state on the stack so that it can be copied to the child. */ -_GLOBAL(ppc32_sigsuspend) - bl .save_nvgprs - bl .compat_sys_sigsuspend - b 70f - -_GLOBAL(ppc64_rt_sigsuspend) - bl .save_nvgprs - bl .sys_rt_sigsuspend - b 70f - -_GLOBAL(ppc32_rt_sigsuspend) - bl .save_nvgprs - bl .compat_sys_rt_sigsuspend -70: cmpdi 0,r3,0 - /* If it returned an error, we need to return via syscall_exit to set - the SO bit in cr0 and potentially stop for ptrace. */ - bne syscall_exit - /* If sigsuspend() returns zero, we are going into a signal handler. We - may need to call audit_syscall_exit() to mark the exit from sigsuspend() */ -#ifdef CONFIG_AUDITSYSCALL - ld r3,PACACURRENT(r13) - ld r4,AUDITCONTEXT(r3) - cmpdi 0,r4,0 - beq .ret_from_except /* No audit_context: Leave immediately. 
*/ - li r4, 2 /* AUDITSC_FAILURE */ - li r5,-4 /* It's always -EINTR */ - bl .audit_syscall_exit -#endif - b .ret_from_except _GLOBAL(ppc_fork) bl .save_nvgprs @@ -305,37 +352,6 @@ _GLOBAL(ppc_clone) bl .sys_clone b syscall_exit -_GLOBAL(ppc32_swapcontext) - bl .save_nvgprs - bl .compat_sys_swapcontext - b 80f - -_GLOBAL(ppc64_swapcontext) - bl .save_nvgprs - bl .sys_swapcontext - b 80f - -_GLOBAL(ppc32_sigreturn) - bl .compat_sys_sigreturn - b 80f - -_GLOBAL(ppc32_rt_sigreturn) - bl .compat_sys_rt_sigreturn - b 80f - -_GLOBAL(ppc64_rt_sigreturn) - bl .sys_rt_sigreturn - -80: cmpdi 0,r3,0 - blt syscall_exit - clrrdi r4,r1,THREAD_SHIFT - ld r4,TI_FLAGS(r4) - andi. r4,r4,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP) - beq+ 81f - addi r3,r1,STACK_FRAME_OVERHEAD - bl .do_syscall_trace_leave -81: b .ret_from_except - _GLOBAL(ret_from_fork) bl .schedule_tail REST_NVGPRS(r1) diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index 5a2eba6..c9d0275 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -76,7 +76,6 @@ * registers from *regs. This is what we need * to do when a signal has been delivered. 
*/ -#define sigreturn_exit(regs) return 0 #define GP_REGS_SIZE min(sizeof(elf_gregset_t32), sizeof(struct pt_regs32)) #undef __SIGNAL_FRAMESIZE @@ -156,9 +155,17 @@ static inline int save_general_regs(stru elf_greg_t64 *gregs = (elf_greg_t64 *)regs; int i; - for (i = 0; i <= PT_RESULT; i ++) + if (!FULL_REGS(regs)) { + set_thread_flag(TIF_SAVE_NVGPRS); + current_thread_info()->nvgprs_frame = frame->mc_gregs; + } + + for (i = 0; i <= PT_RESULT; i ++) { + if (i == 14 && !FULL_REGS(regs)) + i = 32; if (__put_user((unsigned int)gregs[i], &frame->mc_gregs[i])) return -EFAULT; + } return 0; } @@ -179,8 +186,6 @@ static inline int restore_general_regs(s #else /* CONFIG_PPC64 */ -extern void sigreturn_exit(struct pt_regs *); - #define GP_REGS_SIZE min(sizeof(elf_gregset_t), sizeof(struct pt_regs)) static inline int put_sigset_t(sigset_t __user *uset, sigset_t *set) @@ -256,8 +261,10 @@ long sys_sigsuspend(old_sigset_t mask, i while (1) { current->state = TASK_INTERRUPTIBLE; schedule(); - if (do_signal(&saveset, regs)) - sigreturn_exit(regs); + if (do_signal(&saveset, regs)) { + set_thread_flag(TIF_RESTOREALL); + return 0; + } } } @@ -292,8 +299,10 @@ long sys_rt_sigsuspend( while (1) { current->state = TASK_INTERRUPTIBLE; schedule(); - if (do_signal(&saveset, regs)) - sigreturn_exit(regs); + if (do_signal(&saveset, regs)) { + set_thread_flag(TIF_RESTOREALL); + return 0; + } } } @@ -391,9 +400,6 @@ struct rt_sigframe { static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame, int sigret) { -#ifdef CONFIG_PPC32 - CHECK_FULL_REGS(regs); -#endif /* Make sure floating point registers are stored in regs */ flush_fp_to_thread(current); @@ -828,12 +834,6 @@ static int handle_rt_signal(unsigned lon regs->gpr[6] = (unsigned long) rt_sf; regs->nip = (unsigned long) ka->sa.sa_handler; regs->trap = 0; -#ifdef CONFIG_PPC64 - regs->result = 0; - - if (test_thread_flag(TIF_SINGLESTEP)) - ptrace_notify(SIGTRAP); -#endif return 1; badframe: @@ -911,8 +911,8 @@ long 
sys_swapcontext(struct ucontext __u */ if (do_setcontext(new_ctx, regs, 0)) do_exit(SIGSEGV); - sigreturn_exit(regs); - /* doesn't actually return back to here */ + + set_thread_flag(TIF_RESTOREALL); return 0; } @@ -945,12 +945,11 @@ long sys_rt_sigreturn(int r3, int r4, in * nobody does any... */ compat_sys_sigaltstack((u32)(u64)&rt_sf->uc.uc_stack, 0, 0, 0, 0, 0, regs); - return (int)regs->result; #else do_sigaltstack(&rt_sf->uc.uc_stack, NULL, regs->gpr[1]); - sigreturn_exit(regs); /* doesn't return here */ - return 0; #endif + set_thread_flag(TIF_RESTOREALL); + return 0; bad: force_sig(SIGSEGV, current); @@ -1041,9 +1040,7 @@ int sys_debug_setcontext(struct ucontext */ do_sigaltstack(&ctx->uc_stack, NULL, regs->gpr[1]); - sigreturn_exit(regs); - /* doesn't actually return back to here */ - + set_thread_flag(TIF_RESTOREALL); out: return 0; } @@ -1107,12 +1104,6 @@ static int handle_signal(unsigned long s regs->gpr[4] = (unsigned long) sc; regs->nip = (unsigned long) ka->sa.sa_handler; regs->trap = 0; -#ifdef CONFIG_PPC64 - regs->result = 0; - - if (test_thread_flag(TIF_SINGLESTEP)) - ptrace_notify(SIGTRAP); -#endif return 1; @@ -1160,12 +1151,8 @@ long sys_sigreturn(int r3, int r4, int r || restore_user_regs(regs, sr, 1)) goto badframe; -#ifdef CONFIG_PPC64 - return (int)regs->result; -#else - sigreturn_exit(regs); /* doesn't return */ + set_thread_flag(TIF_RESTOREALL); return 0; -#endif badframe: force_sig(SIGSEGV, current); diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 1decf27..5462bef 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -96,8 +96,10 @@ long sys_rt_sigsuspend(sigset_t __user * while (1) { current->state = TASK_INTERRUPTIBLE; schedule(); - if (do_signal(&saveset, regs)) + if (do_signal(&saveset, regs)) { + set_thread_flag(TIF_RESTOREALL); return 0; + } } } @@ -152,6 +154,14 @@ static long setup_sigcontext(struct sigc err |= __put_user(0, &sc->v_regs); #endif /* 
CONFIG_ALTIVEC */ err |= __put_user(&sc->gp_regs, &sc->regs); + if (!FULL_REGS(regs)) { + /* Zero out the unsaved GPRs to avoid information + leak, and set TIF_SAVE_NVGPRS to ensure that the + registers do actually get saved later. */ + memset(&regs->gpr[14], 0, 18 * sizeof(unsigned long)); + set_thread_flag(TIF_SAVE_NVGPRS); + current_thread_info()->nvgprs_frame = &sc->gp_regs; + } err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE); err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE); err |= __put_user(signr, &sc->signal); @@ -340,6 +350,7 @@ int sys_swapcontext(struct ucontext __us do_exit(SIGSEGV); /* This returns like rt_sigreturn */ + set_thread_flag(TIF_RESTOREALL); return 0; } @@ -372,7 +383,8 @@ int sys_rt_sigreturn(unsigned long r3, u */ do_sigaltstack(&uc->uc_stack, NULL, regs->gpr[1]); - return regs->result; + set_thread_flag(TIF_RESTOREALL); + return 0; badframe: #if DEBUG_SIG @@ -454,9 +466,6 @@ static int setup_rt_frame(int signr, str if (err) goto badframe; - if (test_thread_flag(TIF_SINGLESTEP)) - ptrace_notify(SIGTRAP); - return 1; badframe: @@ -502,6 +511,8 @@ static inline void syscall_restart(struc * we only get here if there is a handler, we dont restart.
*/ regs->result = -EINTR; + regs->gpr[3] = EINTR; + regs->ccr |= 0x10000000; break; case -ERESTARTSYS: /* ERESTARTSYS means to restart the syscall if there is no @@ -509,6 +520,8 @@ static inline void syscall_restart(struc */ if (!(ka->sa.sa_flags & SA_RESTART)) { regs->result = -EINTR; + regs->gpr[3] = EINTR; + regs->ccr |= 0x10000000; break; } /* fallthrough */ diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S index 65eaea9..4bb3650 100644 --- a/arch/powerpc/kernel/systbl.S +++ b/arch/powerpc/kernel/systbl.S @@ -113,7 +113,7 @@ SYSCALL(sgetmask) COMPAT_SYS(ssetmask) SYSCALL(setreuid) SYSCALL(setregid) -SYSX(sys_ni_syscall,ppc32_sigsuspend,ppc_sigsuspend) +SYS32ONLY(sigsuspend) COMPAT_SYS(sigpending) COMPAT_SYS(sethostname) COMPAT_SYS(setrlimit) @@ -160,7 +160,7 @@ SYSCALL(swapoff) COMPAT_SYS(sysinfo) COMPAT_SYS(ipc) SYSCALL(fsync) -SYSX(sys_ni_syscall,ppc32_sigreturn,sys_sigreturn) +SYS32ONLY(sigreturn) PPC_SYS(clone) COMPAT_SYS(setdomainname) PPC_SYS(newuname) @@ -213,13 +213,13 @@ COMPAT_SYS(nfsservctl) SYSCALL(setresgid) SYSCALL(getresgid) COMPAT_SYS(prctl) -SYSX(ppc64_rt_sigreturn,ppc32_rt_sigreturn,sys_rt_sigreturn) +COMPAT_SYS(rt_sigreturn) COMPAT_SYS(rt_sigaction) COMPAT_SYS(rt_sigprocmask) COMPAT_SYS(rt_sigpending) COMPAT_SYS(rt_sigtimedwait) COMPAT_SYS(rt_sigqueueinfo) -SYSX(ppc64_rt_sigsuspend,ppc32_rt_sigsuspend,ppc_rt_sigsuspend) +COMPAT_SYS(rt_sigsuspend) COMPAT_SYS(pread64) COMPAT_SYS(pwrite64) SYSCALL(chown) @@ -290,7 +290,7 @@ COMPAT_SYS(clock_settime) COMPAT_SYS(clock_gettime) COMPAT_SYS(clock_getres) COMPAT_SYS(clock_nanosleep) -SYSX(ppc64_swapcontext,ppc32_swapcontext,ppc_swapcontext) +COMPAT_SYS(swapcontext) COMPAT_SYS(tgkill) COMPAT_SYS(utimes) COMPAT_SYS(statfs64) diff --git a/include/asm-powerpc/ptrace.h b/include/asm-powerpc/ptrace.h index 1f7ecdb..9c550b3 100644 --- a/include/asm-powerpc/ptrace.h +++ b/include/asm-powerpc/ptrace.h @@ -87,7 +87,7 @@ extern unsigned long profile_pc(struct p #define 
force_successful_syscall_return() \ do { \ - current_thread_info()->syscall_noerror = 1; \ + set_thread_flag(TIF_NOERROR); \ } while(0) /* diff --git a/include/asm-powerpc/thread_info.h b/include/asm-powerpc/thread_info.h index e525f49..ac1e80e 100644 --- a/include/asm-powerpc/thread_info.h +++ b/include/asm-powerpc/thread_info.h @@ -37,8 +37,7 @@ struct thread_info { int preempt_count; /* 0 => preemptable, <0 => BUG */ struct restart_block restart_block; - /* set by force_successful_syscall_return */ - unsigned char syscall_noerror; + void *nvgprs_frame; /* low level flags - has atomic operations done on it */ unsigned long flags ____cacheline_aligned_in_smp; }; @@ -123,6 +122,9 @@ static inline struct thread_info *curren #define TIF_SINGLESTEP 9 /* singlestepping active */ #define TIF_MEMDIE 10 #define TIF_SECCOMP 11 /* secure computing */ +#define TIF_RESTOREALL 12 /* Restore all regs (implies NOERROR) */ +#define TIF_SAVE_NVGPRS 13 /* Save r14-r31 in signal frame */ +#define TIF_NOERROR 14 /* Force successful syscall return */ /* as above, but as bit values */ #define _TIF_SYSCALL_TRACE (1< References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> Message-ID: <17274.18790.902934.836437@cargo.ozlabs.ibm.com> Linus Torvalds writes: > Indeed. My machine now has 4GB, and it all works again. > > So this was a purely ppc64 bug, not a VM one. So non-NUMA flatmem works but sparsemem doesn't, on a machine with a hole in memory. I'll have a look at it. Could you send me your original .config? Paul. From torvalds at osdl.org Wed Nov 16 08:02:41 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Tue, 15 Nov 2005 13:02:41 -0800 (PST) Subject: ppc64 oops.. 
In-Reply-To: <17274.18790.902934.836437@cargo.ozlabs.ibm.com> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> <17274.18790.902934.836437@cargo.ozlabs.ibm.com> Message-ID: On Wed, 16 Nov 2005, Paul Mackerras wrote: > > So non-NUMA flatmem works but sparsemem doesn't, on a machine with a > hole in memory. I'll have a look at it. > > Could you send me your original .config? Gaah, I overwrote it very much to never use it again by mistake. Here's my current config, where the only real difference is - I got rid of PSERIES, NUMA and SPARSEMEM (getting rid of PSERIES gets rid of a few drivers too, but they wouldn't be an issue for me anyway). It really was mainly a result of "make oldconfig" with new options defaulting to whatever their defaults are in their Kconfig files. I think the rule is that we should default pretty much everything to "no", not "yes". If somebody doesn't know what it is, and hasn't used it before it should be "no". 
Linus -------------- next part -------------- # # Automatically generated make config: don't edit # Linux kernel version: 2.6.15-rc1 # Tue Nov 15 09:05:42 2005 # CONFIG_PPC64=y CONFIG_64BIT=y CONFIG_PPC_MERGE=y CONFIG_MMU=y CONFIG_GENERIC_HARDIRQS=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_PPC=y CONFIG_EARLY_PRINTK=y CONFIG_COMPAT=y CONFIG_SYSVIPC_COMPAT=y CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y # # Processor support # CONFIG_POWER4_ONLY=y CONFIG_POWER4=y CONFIG_PPC_FPU=y CONFIG_ALTIVEC=y CONFIG_PPC_STD_MMU=y CONFIG_SMP=y CONFIG_NR_CPUS=2 # # Code maturity level options # CONFIG_EXPERIMENTAL=y CONFIG_CLEAN_COMPILE=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 # # General setup # CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_SWAP=y CONFIG_SYSVIPC=y # CONFIG_POSIX_MQUEUE is not set # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_SYSCTL=y # CONFIG_AUDIT is not set CONFIG_HOTPLUG=y CONFIG_KOBJECT_UEVENT=y # CONFIG_IKCONFIG is not set # CONFIG_CPUSETS is not set CONFIG_INITRAMFS_SOURCE="" # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_CC_ALIGN_FUNCTIONS=0 CONFIG_CC_ALIGN_LABELS=0 CONFIG_CC_ALIGN_LOOPS=0 CONFIG_CC_ALIGN_JUMPS=0 # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # # Loadable module support # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_OBSOLETE_MODPARM=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set # CONFIG_KMOD is not set CONFIG_STOP_MACHINE=y # # Block layer # # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y CONFIG_DEFAULT_AS=y # CONFIG_DEFAULT_DEADLINE is not set # CONFIG_DEFAULT_CFQ is not set # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="anticipatory" # # Platform support # 
CONFIG_PPC_MULTIPLATFORM=y # CONFIG_PPC_ISERIES is not set # CONFIG_EMBEDDED6xx is not set # CONFIG_APUS is not set # CONFIG_PPC_PSERIES is not set CONFIG_PPC_PMAC=y CONFIG_PPC_PMAC64=y # CONFIG_PPC_MAPLE is not set # CONFIG_PPC_CELL is not set CONFIG_PPC_OF=y CONFIG_U3_DART=y CONFIG_MPIC=y # CONFIG_PPC_RTAS is not set # CONFIG_MMIO_NVRAM is not set # CONFIG_PPC_MPC106 is not set CONFIG_GENERIC_TBSYNC=y # CONFIG_CPU_FREQ is not set # CONFIG_WANT_EARLY_SERIAL is not set # # Kernel options # # CONFIG_HZ_100 is not set CONFIG_HZ_250=y # CONFIG_HZ_1000 is not set CONFIG_HZ=250 CONFIG_PREEMPT_NONE=y # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT is not set CONFIG_PREEMPT_BKL=y CONFIG_BINFMT_ELF=y # CONFIG_BINFMT_MISC is not set CONFIG_FORCE_MAX_ZONEORDER=13 # CONFIG_IOMMU_VMERGE is not set # CONFIG_HOTPLUG_CPU is not set # CONFIG_KEXEC is not set # CONFIG_IRQ_ALL_CPUS is not set # CONFIG_NUMA is not set CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ARCH_FLATMEM_ENABLE=y CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_SELECT_MEMORY_MODEL=y CONFIG_FLATMEM_MANUAL=y # CONFIG_DISCONTIGMEM_MANUAL is not set # CONFIG_SPARSEMEM_MANUAL is not set CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_SPLIT_PTLOCK_CPUS=4096 # CONFIG_PPC_64K_PAGES is not set # CONFIG_SCHED_SMT is not set CONFIG_PROC_DEVICETREE=y # CONFIG_CMDLINE_BOOL is not set # CONFIG_PM is not set # CONFIG_SECCOMP is not set CONFIG_ISA_DMA_API=y # # Bus options # CONFIG_GENERIC_ISA_DMA=y # CONFIG_PPC_I8259 is not set # CONFIG_PPC_INDIRECT_PCI is not set CONFIG_PCI=y CONFIG_PCI_DOMAINS=y CONFIG_PCI_LEGACY_PROC=y # CONFIG_PCI_DEBUG is not set # # PCCARD (PCMCIA/CardBus) support # # CONFIG_PCCARD is not set # # PCI Hotplug Support # # CONFIG_HOTPLUG_PCI is not set CONFIG_KERNEL_START=0xc000000000000000 # # Networking # CONFIG_NET=y # # Networking options # CONFIG_PACKET=y # CONFIG_PACKET_MMAP is not set CONFIG_UNIX=y CONFIG_XFRM=y # CONFIG_XFRM_USER is not set CONFIG_NET_KEY=m 
CONFIG_INET=y CONFIG_IP_MULTICAST=y # CONFIG_IP_ADVANCED_ROUTER is not set CONFIG_IP_FIB_HASH=y # CONFIG_IP_PNP is not set CONFIG_NET_IPIP=y # CONFIG_NET_IPGRE is not set # CONFIG_IP_MROUTE is not set # CONFIG_ARPD is not set CONFIG_SYN_COOKIES=y CONFIG_INET_AH=m CONFIG_INET_ESP=m CONFIG_INET_IPCOMP=m CONFIG_INET_TUNNEL=y # CONFIG_INET_DIAG is not set # CONFIG_TCP_CONG_ADVANCED is not set CONFIG_TCP_CONG_BIC=y # # IP: Virtual Server Configuration # # CONFIG_IP_VS is not set # CONFIG_IPV6 is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set # # Core Netfilter Configuration # # CONFIG_NETFILTER_NETLINK is not set # # IP: Netfilter Configuration # CONFIG_IP_NF_CONNTRACK=y # CONFIG_IP_NF_CT_ACCT is not set # CONFIG_IP_NF_CONNTRACK_MARK is not set # CONFIG_IP_NF_CONNTRACK_EVENTS is not set # CONFIG_IP_NF_CT_PROTO_SCTP is not set # CONFIG_IP_NF_FTP is not set # CONFIG_IP_NF_IRC is not set # CONFIG_IP_NF_NETBIOS_NS is not set # CONFIG_IP_NF_TFTP is not set # CONFIG_IP_NF_AMANDA is not set # CONFIG_IP_NF_PPTP is not set CONFIG_IP_NF_QUEUE=y CONFIG_IP_NF_IPTABLES=y CONFIG_IP_NF_MATCH_LIMIT=y CONFIG_IP_NF_MATCH_IPRANGE=y CONFIG_IP_NF_MATCH_MAC=y CONFIG_IP_NF_MATCH_PKTTYPE=y CONFIG_IP_NF_MATCH_MARK=y CONFIG_IP_NF_MATCH_MULTIPORT=y CONFIG_IP_NF_MATCH_TOS=y CONFIG_IP_NF_MATCH_RECENT=y CONFIG_IP_NF_MATCH_ECN=y CONFIG_IP_NF_MATCH_DSCP=y CONFIG_IP_NF_MATCH_AH_ESP=y CONFIG_IP_NF_MATCH_LENGTH=y CONFIG_IP_NF_MATCH_TTL=y CONFIG_IP_NF_MATCH_TCPMSS=y CONFIG_IP_NF_MATCH_HELPER=y CONFIG_IP_NF_MATCH_STATE=y CONFIG_IP_NF_MATCH_CONNTRACK=y CONFIG_IP_NF_MATCH_OWNER=y # CONFIG_IP_NF_MATCH_ADDRTYPE is not set # CONFIG_IP_NF_MATCH_REALM is not set # CONFIG_IP_NF_MATCH_SCTP is not set # CONFIG_IP_NF_MATCH_DCCP is not set # CONFIG_IP_NF_MATCH_COMMENT is not set # CONFIG_IP_NF_MATCH_HASHLIMIT is not set # CONFIG_IP_NF_MATCH_STRING is not set CONFIG_IP_NF_FILTER=y CONFIG_IP_NF_TARGET_REJECT=y CONFIG_IP_NF_TARGET_LOG=y CONFIG_IP_NF_TARGET_ULOG=y CONFIG_IP_NF_TARGET_TCPMSS=y # 
CONFIG_IP_NF_TARGET_NFQUEUE is not set CONFIG_IP_NF_NAT=y CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IP_NF_TARGET_MASQUERADE=y CONFIG_IP_NF_TARGET_REDIRECT=y CONFIG_IP_NF_TARGET_NETMAP=y CONFIG_IP_NF_TARGET_SAME=y # CONFIG_IP_NF_NAT_SNMP_BASIC is not set CONFIG_IP_NF_MANGLE=y CONFIG_IP_NF_TARGET_TOS=y CONFIG_IP_NF_TARGET_ECN=y CONFIG_IP_NF_TARGET_DSCP=y CONFIG_IP_NF_TARGET_MARK=y CONFIG_IP_NF_TARGET_CLASSIFY=y # CONFIG_IP_NF_TARGET_TTL is not set # CONFIG_IP_NF_RAW is not set CONFIG_IP_NF_ARPTABLES=y CONFIG_IP_NF_ARPFILTER=y CONFIG_IP_NF_ARP_MANGLE=y # # DCCP Configuration (EXPERIMENTAL) # # CONFIG_IP_DCCP is not set # # SCTP Configuration (EXPERIMENTAL) # # CONFIG_IP_SCTP is not set # CONFIG_ATM is not set # CONFIG_BRIDGE is not set # CONFIG_VLAN_8021Q is not set # CONFIG_DECNET is not set CONFIG_LLC=y # CONFIG_LLC2 is not set # CONFIG_IPX is not set # CONFIG_ATALK is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set # # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set # CONFIG_NET_CLS_ROUTE is not set # # Network testing # # CONFIG_NET_PKTGEN is not set # CONFIG_HAMRADIO is not set # CONFIG_IRDA is not set # CONFIG_BT is not set # CONFIG_IEEE80211 is not set # # Device Drivers # # # Generic Driver Options # CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y # CONFIG_DEBUG_DRIVER is not set # # Connector - unified userspace <-> kernelspace linker # # CONFIG_CONNECTOR is not set # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # # CONFIG_PARPORT is not set # # Plug and Play support # # # Block devices # # CONFIG_BLK_DEV_FD is not set # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set # CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y # CONFIG_BLK_DEV_CRYPTOLOOP is not set CONFIG_BLK_DEV_NBD=m # CONFIG_BLK_DEV_SX8 is not 
set # CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_COUNT=16 CONFIG_BLK_DEV_RAM_SIZE=8192 CONFIG_BLK_DEV_INITRD=y # CONFIG_CDROM_PKTCDVD is not set CONFIG_ATA_OVER_ETH=m # # ATA/ATAPI/MFM/RLL support # CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set CONFIG_BLK_DEV_IDEDISK=y # CONFIG_IDEDISK_MULTI_MODE is not set CONFIG_BLK_DEV_IDECD=y CONFIG_BLK_DEV_IDETAPE=y CONFIG_BLK_DEV_IDEFLOPPY=y # CONFIG_BLK_DEV_IDESCSI is not set # CONFIG_IDE_TASK_IOCTL is not set # # IDE chipset support/bugfixes # CONFIG_IDE_GENERIC=y CONFIG_BLK_DEV_IDEPCI=y # CONFIG_IDEPCI_SHARE_IRQ is not set # CONFIG_BLK_DEV_OFFBOARD is not set # CONFIG_BLK_DEV_GENERIC is not set # CONFIG_BLK_DEV_OPTI621 is not set # CONFIG_BLK_DEV_SL82C105 is not set CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set # CONFIG_BLK_DEV_AEC62XX is not set # CONFIG_BLK_DEV_ALI15X3 is not set # CONFIG_BLK_DEV_AMD74XX is not set # CONFIG_BLK_DEV_CMD64X is not set # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5520 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_HPT34X is not set # CONFIG_BLK_DEV_HPT366 is not set # CONFIG_BLK_DEV_SC1200 is not set # CONFIG_BLK_DEV_PIIX is not set # CONFIG_BLK_DEV_IT821X is not set # CONFIG_BLK_DEV_NS87415 is not set # CONFIG_BLK_DEV_PDC202XX_OLD is not set # CONFIG_BLK_DEV_PDC202XX_NEW is not set # CONFIG_BLK_DEV_SVWKS is not set # CONFIG_BLK_DEV_SIIMAGE is not set # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set # CONFIG_BLK_DEV_VIA82CXXX is not set CONFIG_BLK_DEV_IDE_PMAC=y CONFIG_BLK_DEV_IDE_PMAC_ATA100FIRST=y CONFIG_BLK_DEV_IDEDMA_PMAC=y # CONFIG_BLK_DEV_IDE_PMAC_BLINK is not set # CONFIG_IDE_ARM is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # 
SCSI device support # # CONFIG_RAID_ATTRS is not set CONFIG_SCSI=y CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=y CONFIG_CHR_DEV_ST=y # CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y # CONFIG_CHR_DEV_SCH is not set # # Some SCSI devices (e.g. CD jukebox) support multiple LUNs # CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_LOGGING is not set # # SCSI Transport Attributes # CONFIG_SCSI_SPI_ATTRS=y CONFIG_SCSI_FC_ATTRS=y # CONFIG_SCSI_ISCSI_ATTRS is not set # CONFIG_SCSI_SAS_ATTRS is not set # # SCSI low-level drivers # # CONFIG_ISCSI_TCP is not set # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AACRAID is not set # CONFIG_SCSI_AIC7XXX is not set # CONFIG_SCSI_AIC7XXX_OLD is not set # CONFIG_SCSI_AIC79XX is not set # CONFIG_MEGARAID_NEWGEN is not set # CONFIG_MEGARAID_LEGACY is not set # CONFIG_MEGARAID_SAS is not set CONFIG_SCSI_SATA=y # CONFIG_SCSI_SATA_AHCI is not set CONFIG_SCSI_SATA_SVW=y # CONFIG_SCSI_ATA_PIIX is not set # CONFIG_SCSI_SATA_MV is not set # CONFIG_SCSI_SATA_NV is not set # CONFIG_SCSI_PDC_ADMA is not set # CONFIG_SCSI_SATA_QSTOR is not set # CONFIG_SCSI_SATA_PROMISE is not set # CONFIG_SCSI_SATA_SX4 is not set # CONFIG_SCSI_SATA_SIL is not set # CONFIG_SCSI_SATA_SIL24 is not set # CONFIG_SCSI_SATA_SIS is not set # CONFIG_SCSI_SATA_ULI is not set # CONFIG_SCSI_SATA_VIA is not set # CONFIG_SCSI_SATA_VITESSE is not set # CONFIG_SCSI_BUSLOGIC is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set CONFIG_SCSI_SYM53C8XX_2=y CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0 CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16 CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64 # CONFIG_SCSI_SYM53C8XX_IOMAPPED is not set # 
CONFIG_SCSI_IPR is not set # CONFIG_SCSI_QLOGIC_FC is not set # CONFIG_SCSI_QLOGIC_1280 is not set CONFIG_SCSI_QLA2XXX=y # CONFIG_SCSI_QLA21XX is not set # CONFIG_SCSI_QLA22XX is not set # CONFIG_SCSI_QLA2300 is not set # CONFIG_SCSI_QLA2322 is not set # CONFIG_SCSI_QLA6312 is not set # CONFIG_SCSI_QLA24XX is not set # CONFIG_SCSI_LPFC is not set # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set # CONFIG_SCSI_DEBUG is not set # # Multi-device support (RAID and LVM) # CONFIG_MD=y CONFIG_BLK_DEV_MD=y CONFIG_MD_LINEAR=y CONFIG_MD_RAID0=y CONFIG_MD_RAID1=y # CONFIG_MD_RAID10 is not set CONFIG_MD_RAID5=y # CONFIG_MD_RAID6 is not set # CONFIG_MD_MULTIPATH is not set # CONFIG_MD_FAULTY is not set CONFIG_BLK_DEV_DM=y # CONFIG_DM_CRYPT is not set # CONFIG_DM_SNAPSHOT is not set # CONFIG_DM_MIRROR is not set # CONFIG_DM_ZERO is not set # CONFIG_DM_MULTIPATH is not set # # Fusion MPT device support # # CONFIG_FUSION is not set # CONFIG_FUSION_SPI is not set # CONFIG_FUSION_FC is not set # CONFIG_FUSION_SAS is not set # # IEEE 1394 (FireWire) support # CONFIG_IEEE1394=y # # Subsystem Options # # CONFIG_IEEE1394_VERBOSEDEBUG is not set CONFIG_IEEE1394_OUI_DB=y CONFIG_IEEE1394_EXTRA_CONFIG_ROMS=y CONFIG_IEEE1394_CONFIG_ROM_IP1394=y # CONFIG_IEEE1394_EXPORT_FULL_API is not set # # Device Drivers # # CONFIG_IEEE1394_PCILYNX is not set CONFIG_IEEE1394_OHCI1394=y # # Protocol Drivers # CONFIG_IEEE1394_VIDEO1394=m CONFIG_IEEE1394_SBP2=m # CONFIG_IEEE1394_SBP2_PHYS_DMA is not set CONFIG_IEEE1394_ETH1394=m CONFIG_IEEE1394_DV1394=m CONFIG_IEEE1394_RAWIO=y # CONFIG_IEEE1394_CMP is not set # # I2O device support # # CONFIG_I2O is not set # # Macintosh device drivers # CONFIG_ADB_PMU=y CONFIG_PMAC_SMU=y CONFIG_THERM_PM72=y # CONFIG_WINDFARM is not set # # Network device support # CONFIG_NETDEVICES=y CONFIG_DUMMY=m CONFIG_BONDING=m # CONFIG_EQUALIZER is not set CONFIG_TUN=m # # ARCnet devices # # CONFIG_ARCNET is not set # # PHY device support # # CONFIG_PHYLIB is not set # # 
Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=y # CONFIG_HAPPYMEAL is not set CONFIG_SUNGEM=y # CONFIG_CASSINI is not set # CONFIG_NET_VENDOR_3COM is not set # # Tulip family network device support # # CONFIG_NET_TULIP is not set # CONFIG_HP100 is not set # CONFIG_NET_PCI is not set # # Ethernet (1000 Mbit) # # CONFIG_ACENIC is not set # CONFIG_DL2K is not set # CONFIG_E1000 is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SIS190 is not set # CONFIG_SKGE is not set # CONFIG_SK98LIN is not set # CONFIG_TIGON3 is not set # CONFIG_BNX2 is not set # CONFIG_MV643XX_ETH is not set # # Ethernet (10000 Mbit) # # CONFIG_CHELSIO_T1 is not set # CONFIG_IXGB is not set # CONFIG_S2IO is not set # # Token Ring devices # CONFIG_TR=y CONFIG_IBMOL=y # CONFIG_3C359 is not set # CONFIG_TMS380TR is not set # # Wireless LAN (non-hamradio) # # CONFIG_NET_RADIO is not set # # Wan interfaces # # CONFIG_WAN is not set # CONFIG_FDDI is not set # CONFIG_HIPPI is not set CONFIG_PPP=m # CONFIG_PPP_MULTILINK is not set # CONFIG_PPP_FILTER is not set CONFIG_PPP_ASYNC=m CONFIG_PPP_SYNC_TTY=m CONFIG_PPP_DEFLATE=m CONFIG_PPP_BSDCOMP=m # CONFIG_PPP_MPPE is not set CONFIG_PPPOE=m # CONFIG_SLIP is not set # CONFIG_NET_FC is not set # CONFIG_SHAPER is not set # CONFIG_NETCONSOLE is not set # CONFIG_NETPOLL is not set # CONFIG_NET_POLL_CONTROLLER is not set # # ISDN subsystem # # CONFIG_ISDN is not set # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 CONFIG_INPUT_JOYDEV=m # CONFIG_INPUT_TSDEV is not set CONFIG_INPUT_EVDEV=y # CONFIG_INPUT_EVBUG is not set # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y # CONFIG_KEYBOARD_ATKBD is not set # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_LKKBD is not set # 
CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set CONFIG_INPUT_MOUSE=y # CONFIG_MOUSE_PS2 is not set # CONFIG_MOUSE_SERIAL is not set # CONFIG_MOUSE_VSXXXAA is not set # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TOUCHSCREEN is not set # CONFIG_INPUT_MISC is not set # # Hardware I/O ports # CONFIG_SERIO=y # CONFIG_SERIO_I8042 is not set # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_PCIPS2 is not set # CONFIG_SERIO_RAW is not set # CONFIG_GAMEPORT is not set # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y # CONFIG_SERIAL_NONSTANDARD is not set # # Serial drivers # # CONFIG_SERIAL_8250 is not set # # Non-8250 serial port support # # CONFIG_SERIAL_PMACZILOG is not set # CONFIG_SERIAL_JSM is not set CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 # # IPMI # # CONFIG_IPMI_HANDLER is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set # CONFIG_RTC is not set # CONFIG_GEN_RTC is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set # # Ftape, the floppy tape device driver # CONFIG_AGP=y CONFIG_AGP_UNINORTH=y # CONFIG_DRM is not set CONFIG_RAW_DRIVER=y CONFIG_MAX_RAW_DEVS=256 # CONFIG_HANGCHECK_TIMER is not set # # TPM devices # # CONFIG_TCG_TPM is not set # CONFIG_TELCLOCK is not set # # I2C support # CONFIG_I2C=y CONFIG_I2C_CHARDEV=y # # I2C Algorithms # CONFIG_I2C_ALGOBIT=y # CONFIG_I2C_ALGOPCF is not set # CONFIG_I2C_ALGOPCA is not set # # I2C Hardware Bus support # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI1563 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set # CONFIG_I2C_AMD8111 is not set # CONFIG_I2C_I801 is not set # CONFIG_I2C_I810 is not set # CONFIG_I2C_PIIX4 is not set CONFIG_I2C_KEYWEST=y CONFIG_I2C_PMAC_SMU=y # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT_LIGHT is not set # CONFIG_I2C_PROSAVAGE is not set # CONFIG_I2C_SAVAGE4 is not set # CONFIG_SCx200_ACB is not set # CONFIG_I2C_SIS5595 is not set # 
CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_STUB is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set # CONFIG_I2C_PCA_ISA is not set # # Miscellaneous I2C Chip support # # CONFIG_SENSORS_DS1337 is not set # CONFIG_SENSORS_DS1374 is not set # CONFIG_SENSORS_EEPROM is not set # CONFIG_SENSORS_PCF8574 is not set # CONFIG_SENSORS_PCA9539 is not set # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_RTC8564 is not set # CONFIG_SENSORS_MAX6875 is not set # CONFIG_RTC_X1205_I2C is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # Dallas's 1-wire bus # # CONFIG_W1 is not set # # Hardware Monitoring support # CONFIG_HWMON=y # CONFIG_HWMON_VID is not set # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set # CONFIG_SENSORS_ADM1026 is not set # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ADM9240 is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_ATXP1 is not set # CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_FSCPOS is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_GL520SM is not set # CONFIG_SENSORS_IT87 is not set # CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set # CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_LM92 is not set # CONFIG_SENSORS_MAX1619 is not set # CONFIG_SENSORS_PC87360 is not set # CONFIG_SENSORS_SIS5595 is not set # CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_SMSC47B397 is not set # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_W83781D is not set # CONFIG_SENSORS_W83792D is not set # CONFIG_SENSORS_W83L785TS is not set # CONFIG_SENSORS_W83627HF is not set # 
CONFIG_SENSORS_W83627EHF is not set # CONFIG_HWMON_DEBUG_CHIP is not set # # Misc devices # # # Multimedia Capabilities Port drivers # # # Multimedia devices # # CONFIG_VIDEO_DEV is not set # # Digital Video Broadcasting Devices # # CONFIG_DVB is not set # # Graphics support # CONFIG_FB=y CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y CONFIG_FB_MACMODES=y CONFIG_FB_MODE_HELPERS=y # CONFIG_FB_TILEBLITTING is not set # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set # CONFIG_FB_CYBER2000 is not set CONFIG_FB_OF=y # CONFIG_FB_CONTROL is not set # CONFIG_FB_PLATINUM is not set # CONFIG_FB_VALKYRIE is not set # CONFIG_FB_CT65550 is not set # CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set # CONFIG_FB_VGA16 is not set # CONFIG_FB_S1D13XXX is not set # CONFIG_FB_NVIDIA is not set CONFIG_FB_RIVA=y # CONFIG_FB_RIVA_I2C is not set # CONFIG_FB_RIVA_DEBUG is not set # CONFIG_FB_MATROX is not set # CONFIG_FB_RADEON_OLD is not set CONFIG_FB_RADEON=y CONFIG_FB_RADEON_I2C=y # CONFIG_FB_RADEON_DEBUG is not set # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set # CONFIG_FB_SAVAGE is not set # CONFIG_FB_SIS is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set # CONFIG_FB_3DFX is not set # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_CYBLA is not set # CONFIG_FB_TRIDENT is not set # CONFIG_FB_VIRTUAL is not set # # Console display driver support # # CONFIG_VGA_CONSOLE is not set CONFIG_DUMMY_CONSOLE=y CONFIG_FRAMEBUFFER_CONSOLE=y # CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set # CONFIG_FONTS is not set CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y # # Logo configuration # CONFIG_LOGO=y CONFIG_LOGO_LINUX_MONO=y CONFIG_LOGO_LINUX_VGA16=y CONFIG_LOGO_LINUX_CLUT224=y # CONFIG_BACKLIGHT_LCD_SUPPORT is not set # # Sound # CONFIG_SOUND=y # # Advanced Linux Sound Architecture # CONFIG_SND=y CONFIG_SND_TIMER=y CONFIG_SND_PCM=y CONFIG_SND_SEQUENCER=y # CONFIG_SND_SEQ_DUMMY is not set CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=y 
CONFIG_SND_PCM_OSS=y CONFIG_SND_SEQUENCER_OSS=y # CONFIG_SND_VERBOSE_PRINTK is not set # CONFIG_SND_DEBUG is not set CONFIG_SND_GENERIC_DRIVER=y # # Generic devices # # CONFIG_SND_DUMMY is not set # CONFIG_SND_VIRMIDI is not set # CONFIG_SND_MTPAV is not set # CONFIG_SND_SERIAL_U16550 is not set # CONFIG_SND_MPU401 is not set # # PCI devices # # CONFIG_SND_ALI5451 is not set # CONFIG_SND_ATIIXP is not set # CONFIG_SND_ATIIXP_MODEM is not set # CONFIG_SND_AU8810 is not set # CONFIG_SND_AU8820 is not set # CONFIG_SND_AU8830 is not set # CONFIG_SND_AZT3328 is not set # CONFIG_SND_BT87X is not set # CONFIG_SND_CS46XX is not set # CONFIG_SND_CS4281 is not set # CONFIG_SND_EMU10K1 is not set # CONFIG_SND_EMU10K1X is not set # CONFIG_SND_CA0106 is not set # CONFIG_SND_KORG1212 is not set # CONFIG_SND_MIXART is not set # CONFIG_SND_NM256 is not set # CONFIG_SND_RME32 is not set # CONFIG_SND_RME96 is not set # CONFIG_SND_RME9652 is not set # CONFIG_SND_HDSP is not set # CONFIG_SND_HDSPM is not set # CONFIG_SND_TRIDENT is not set # CONFIG_SND_YMFPCI is not set # CONFIG_SND_AD1889 is not set # CONFIG_SND_ALS4000 is not set # CONFIG_SND_CMIPCI is not set # CONFIG_SND_ENS1370 is not set # CONFIG_SND_ENS1371 is not set # CONFIG_SND_ES1938 is not set # CONFIG_SND_ES1968 is not set # CONFIG_SND_MAESTRO3 is not set # CONFIG_SND_FM801 is not set # CONFIG_SND_ICE1712 is not set # CONFIG_SND_ICE1724 is not set # CONFIG_SND_INTEL8X0 is not set # CONFIG_SND_INTEL8X0M is not set # CONFIG_SND_SONICVIBES is not set # CONFIG_SND_VIA82XX is not set # CONFIG_SND_VIA82XX_MODEM is not set # CONFIG_SND_VX222 is not set # CONFIG_SND_HDA_INTEL is not set # # ALSA PowerMac devices # CONFIG_SND_POWERMAC=y CONFIG_SND_POWERMAC_AUTO_DRC=y # # USB devices # # CONFIG_SND_USB_AUDIO is not set # CONFIG_SND_USB_USX2Y is not set # # Open Sound System # # CONFIG_SOUND_PRIME is not set # # USB support # CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y CONFIG_USB=y # CONFIG_USB_DEBUG is not set # # 
Miscellaneous USB options # CONFIG_USB_DEVICEFS=y # CONFIG_USB_BANDWIDTH is not set # CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_OTG is not set # # USB Host Controller Drivers # CONFIG_USB_EHCI_HCD=y # CONFIG_USB_EHCI_SPLIT_ISO is not set # CONFIG_USB_EHCI_ROOT_HUB_TT is not set # CONFIG_USB_ISP116X_HCD is not set CONFIG_USB_OHCI_HCD=y # CONFIG_USB_OHCI_BIG_ENDIAN is not set CONFIG_USB_OHCI_LITTLE_ENDIAN=y # CONFIG_USB_UHCI_HCD is not set # CONFIG_USB_SL811_HCD is not set # # USB Device Class drivers # # CONFIG_OBSOLETE_OSS_USB_DRIVER is not set CONFIG_USB_ACM=m CONFIG_USB_PRINTER=y # # NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' # # # may also be needed; see USB_STORAGE Help for more information # CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_DEBUG is not set CONFIG_USB_STORAGE_DATAFAB=y CONFIG_USB_STORAGE_FREECOM=y CONFIG_USB_STORAGE_ISD200=y CONFIG_USB_STORAGE_DPCM=y # CONFIG_USB_STORAGE_USBAT is not set CONFIG_USB_STORAGE_SDDR09=y CONFIG_USB_STORAGE_SDDR55=y CONFIG_USB_STORAGE_JUMPSHOT=y # CONFIG_USB_STORAGE_ONETOUCH is not set # # USB Input Devices # CONFIG_USB_HID=y CONFIG_USB_HIDINPUT=y CONFIG_HID_FF=y CONFIG_HID_PID=y CONFIG_LOGITECH_FF=y CONFIG_THRUSTMASTER_FF=y CONFIG_USB_HIDDEV=y # CONFIG_USB_AIPTEK is not set # CONFIG_USB_WACOM is not set # CONFIG_USB_ACECAD is not set # CONFIG_USB_KBTAB is not set # CONFIG_USB_POWERMATE is not set # CONFIG_USB_MTOUCH is not set # CONFIG_USB_ITMTOUCH is not set # CONFIG_USB_EGALAX is not set # CONFIG_USB_YEALINK is not set # CONFIG_USB_XPAD is not set # CONFIG_USB_ATI_REMOTE is not set # CONFIG_USB_KEYSPAN_REMOTE is not set # CONFIG_USB_APPLETOUCH is not set # # USB Imaging devices # # CONFIG_USB_MDC800 is not set # CONFIG_USB_MICROTEK is not set # # USB Multimedia devices # # CONFIG_USB_DABUSB is not set # # Video4Linux support is needed for USB Multimedia device support # # # USB Network Adapters # # CONFIG_USB_CATC is not set # CONFIG_USB_KAWETH is not set # CONFIG_USB_PEGASUS is not set # 
CONFIG_USB_RTL8150 is not set CONFIG_USB_USBNET=m # CONFIG_USB_NET_AX8817X is not set CONFIG_USB_NET_CDCETHER=m # CONFIG_USB_NET_GL620A is not set # CONFIG_USB_NET_NET1080 is not set # CONFIG_USB_NET_PLUSB is not set # CONFIG_USB_NET_RNDIS_HOST is not set # CONFIG_USB_NET_CDC_SUBSET is not set # CONFIG_USB_NET_ZAURUS is not set # CONFIG_USB_MON is not set # # USB port drivers # # # USB Serial Converter support # CONFIG_USB_SERIAL=m CONFIG_USB_SERIAL_GENERIC=y # CONFIG_USB_SERIAL_AIRPRIME is not set CONFIG_USB_SERIAL_BELKIN=m CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m # CONFIG_USB_SERIAL_CP2101 is not set # CONFIG_USB_SERIAL_CYPRESS_M8 is not set CONFIG_USB_SERIAL_EMPEG=m CONFIG_USB_SERIAL_FTDI_SIO=m CONFIG_USB_SERIAL_VISOR=m CONFIG_USB_SERIAL_IPAQ=m CONFIG_USB_SERIAL_IR=m CONFIG_USB_SERIAL_EDGEPORT=m CONFIG_USB_SERIAL_EDGEPORT_TI=m # CONFIG_USB_SERIAL_GARMIN is not set # CONFIG_USB_SERIAL_IPW is not set CONFIG_USB_SERIAL_KEYSPAN_PDA=m CONFIG_USB_SERIAL_KEYSPAN=m CONFIG_USB_SERIAL_KEYSPAN_MPR=y CONFIG_USB_SERIAL_KEYSPAN_USA28=y CONFIG_USB_SERIAL_KEYSPAN_USA28X=y CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y CONFIG_USB_SERIAL_KEYSPAN_USA19=y CONFIG_USB_SERIAL_KEYSPAN_USA18X=y CONFIG_USB_SERIAL_KEYSPAN_USA19W=y CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y CONFIG_USB_SERIAL_KEYSPAN_USA49W=y CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y CONFIG_USB_SERIAL_KLSI=m CONFIG_USB_SERIAL_KOBIL_SCT=m CONFIG_USB_SERIAL_MCT_U232=m # CONFIG_USB_SERIAL_NOKIA_DKU2 is not set CONFIG_USB_SERIAL_PL2303=m # CONFIG_USB_SERIAL_HP4X is not set CONFIG_USB_SERIAL_SAFE=m CONFIG_USB_SERIAL_SAFE_PADDED=y # CONFIG_USB_SERIAL_TI is not set CONFIG_USB_SERIAL_CYBERJACK=m CONFIG_USB_SERIAL_XIRCOM=m CONFIG_USB_SERIAL_OMNINET=m CONFIG_USB_EZUSB=y # # USB Miscellaneous drivers # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set # CONFIG_USB_AUERSWALD is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set # 
CONFIG_USB_LED is not set # CONFIG_USB_CYTHERM is not set # CONFIG_USB_PHIDGETKIT is not set # CONFIG_USB_PHIDGETSERVO is not set # CONFIG_USB_IDMOUSE is not set # CONFIG_USB_SISUSBVGA is not set # CONFIG_USB_LD is not set # CONFIG_USB_TEST is not set # # USB DSL modem support # # # USB Gadget Support # # CONFIG_USB_GADGET is not set # # MMC/SD Card support # # CONFIG_MMC is not set # # InfiniBand support # # CONFIG_INFINIBAND is not set # # SN Devices # # # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y # CONFIG_EXT2_FS_SECURITY is not set # CONFIG_EXT2_FS_XIP is not set CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y # CONFIG_EXT3_FS_SECURITY is not set CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set CONFIG_INOTIFY=y # CONFIG_QUOTA is not set CONFIG_DNOTIFY=y CONFIG_AUTOFS_FS=m # CONFIG_AUTOFS4_FS is not set # CONFIG_FUSE_FS is not set # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y # CONFIG_JOLIET is not set # CONFIG_ZISOFS is not set CONFIG_UDF_FS=m CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems # CONFIG_FAT_FS=y CONFIG_MSDOS_FS=y CONFIG_VFAT_FS=y CONFIG_FAT_DEFAULT_CODEPAGE=437 CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y CONFIG_TMPFS=y # CONFIG_HUGETLBFS is not set # CONFIG_HUGETLB_PAGE is not set CONFIG_RAMFS=y # CONFIG_RELAYFS_FS is not set # # Miscellaneous filesystems # # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set # CONFIG_HFS_FS is not set # CONFIG_HFSPLUS_FS is not set # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set # CONFIG_CRAMFS is not set # CONFIG_VXFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set 
# # Network File Systems # CONFIG_NFS_FS=y CONFIG_NFS_V3=y # CONFIG_NFS_V3_ACL is not set CONFIG_NFS_V4=y # CONFIG_NFS_DIRECTIO is not set CONFIG_NFSD=y CONFIG_NFSD_V3=y # CONFIG_NFSD_V3_ACL is not set CONFIG_NFSD_V4=y CONFIG_NFSD_TCP=y CONFIG_LOCKD=y CONFIG_LOCKD_V4=y CONFIG_EXPORTFS=y CONFIG_NFS_COMMON=y CONFIG_SUNRPC=y CONFIG_SUNRPC_GSS=y CONFIG_RPCSEC_GSS_KRB5=y # CONFIG_RPCSEC_GSS_SPKM3 is not set # CONFIG_SMB_FS is not set CONFIG_CIFS=m # CONFIG_CIFS_STATS is not set # CONFIG_CIFS_XATTR is not set # CONFIG_CIFS_EXPERIMENTAL is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set # CONFIG_9P_FS is not set # # Partition Types # CONFIG_PARTITION_ADVANCED=y # CONFIG_ACORN_PARTITION is not set # CONFIG_OSF_PARTITION is not set # CONFIG_AMIGA_PARTITION is not set # CONFIG_ATARI_PARTITION is not set CONFIG_MAC_PARTITION=y CONFIG_MSDOS_PARTITION=y # CONFIG_BSD_DISKLABEL is not set # CONFIG_MINIX_SUBPARTITION is not set # CONFIG_SOLARIS_X86_PARTITION is not set # CONFIG_UNIXWARE_DISKLABEL is not set # CONFIG_LDM_PARTITION is not set # CONFIG_SGI_PARTITION is not set # CONFIG_ULTRIX_PARTITION is not set # CONFIG_SUN_PARTITION is not set # CONFIG_EFI_PARTITION is not set # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" # CONFIG_NLS_CODEPAGE_437 is not set # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set # CONFIG_NLS_CODEPAGE_855 is not set # CONFIG_NLS_CODEPAGE_857 is not set # CONFIG_NLS_CODEPAGE_860 is not set # CONFIG_NLS_CODEPAGE_861 is not set # CONFIG_NLS_CODEPAGE_862 is not set # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set # CONFIG_NLS_CODEPAGE_865 is not set # CONFIG_NLS_CODEPAGE_866 is not set # CONFIG_NLS_CODEPAGE_869 is not set # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set # CONFIG_NLS_CODEPAGE_932 is not set # CONFIG_NLS_CODEPAGE_949 is not set 
# CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set # CONFIG_NLS_ASCII is not set # CONFIG_NLS_ISO8859_1 is not set # CONFIG_NLS_ISO8859_2 is not set # CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set # CONFIG_NLS_ISO8859_5 is not set # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set # CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set # CONFIG_NLS_ISO8859_15 is not set # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set # CONFIG_NLS_UTF8 is not set # # Library routines # CONFIG_CRC_CCITT=m # CONFIG_CRC16 is not set CONFIG_CRC32=y # CONFIG_LIBCRC32C is not set CONFIG_ZLIB_INFLATE=m CONFIG_ZLIB_DEFLATE=m # # Instrumentation Support # CONFIG_PROFILING=y CONFIG_OPROFILE=y # CONFIG_KPROBES is not set # # Kernel hacking # # CONFIG_PRINTK_TIME is not set CONFIG_DEBUG_KERNEL=y CONFIG_MAGIC_SYSRQ=y CONFIG_LOG_BUF_SHIFT=17 CONFIG_DETECT_SOFTLOCKUP=y # CONFIG_SCHEDSTATS is not set # CONFIG_DEBUG_SLAB is not set # CONFIG_DEBUG_SPINLOCK is not set # CONFIG_DEBUG_SPINLOCK_SLEEP is not set # CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_INFO is not set # CONFIG_DEBUG_FS is not set # CONFIG_DEBUG_VM is not set # CONFIG_RCU_TORTURE_TEST is not set # CONFIG_DEBUG_STACKOVERFLOW is not set # CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUGGER is not set # CONFIG_IRQSTACKS is not set CONFIG_BOOTX_TEXT=y # # Security options # # CONFIG_KEYS is not set # CONFIG_SECURITY is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_NULL=m CONFIG_CRYPTO_MD4=m CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=m CONFIG_CRYPTO_SHA256=m CONFIG_CRYPTO_SHA512=m # CONFIG_CRYPTO_WP512 is not set # CONFIG_CRYPTO_TGR192 is not set CONFIG_CRYPTO_DES=y CONFIG_CRYPTO_BLOWFISH=m CONFIG_CRYPTO_TWOFISH=m CONFIG_CRYPTO_SERPENT=m # CONFIG_CRYPTO_AES is not set CONFIG_CRYPTO_CAST5=m 
CONFIG_CRYPTO_CAST6=m # CONFIG_CRYPTO_TEA is not set # CONFIG_CRYPTO_ARC4 is not set # CONFIG_CRYPTO_KHAZAD is not set # CONFIG_CRYPTO_ANUBIS is not set CONFIG_CRYPTO_DEFLATE=m # CONFIG_CRYPTO_MICHAEL_MIC is not set # CONFIG_CRYPTO_CRC32C is not set CONFIG_CRYPTO_TEST=m # # Hardware crypto devices # From benh at kernel.crashing.org Wed Nov 16 08:05:27 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 16 Nov 2005 08:05:27 +1100 Subject: ppc64 oops.. In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> Message-ID: <1132088728.5646.48.camel@gaston> > I'd love to, but then it never boots at all, and stops after "Setup done". > > I'll now try with "flatmem", although on powerpc, sparsemem seems to be > the default, and I bet that's the cause: > > config ARCH_SPARSEMEM_DEFAULT > def_bool y > depends on SMP && PPC_PSERIES > > just because I had PSERIES enabled (which is _also_ the default). > > So nobody has clearly ever tested either NUMA nor SPARSEMEM, yet they are > both enabled by default. Tssk. Heh, I use the g5_defconfig on G5s which doesn't have any of these... NUMA and SPARSEMEM hopefully work on pseries, but yah, there have been some recent breakage that we'll fix, it should be made to work on g5 too of course. Ben. From benh at kernel.crashing.org Wed Nov 16 08:07:21 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 16 Nov 2005 08:07:21 +1100 Subject: ppc64 oops.. 
In-Reply-To: <20051115174332.GA9632@krispykreme> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> <20051115174332.GA9632@krispykreme> Message-ID: <1132088841.5646.49.camel@gaston> On Wed, 2005-11-16 at 04:43 +1100, Anton Blanchard wrote: > Hi, > > > I'd love to, but then it never boots at all, and stops after "Setup done". > > > > I'll now try with "flatmem", although on powerpc, sparsemem seems to be > > the default, and I bet that's the cause: > > > > config ARCH_SPARSEMEM_DEFAULT > > def_bool y > > depends on SMP && PPC_PSERIES > > > > just because I had PSERIES enabled (which is _also_ the default). > > > > So nobody has clearly ever tested either NUMA nor SPARSEMEM, yet they are > > both enabled by default. Tssk. > > All options (flatmem, sparsemem and NUMA sparsemem) were tested on > pseries, I don't have access to a g5 in Austin unfortunately. Well, it probably also breaks pSeries without NUMA and with a memory hole :) Ben. From anton at samba.org Wed Nov 16 08:10:02 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 16 Nov 2005 08:10:02 +1100 Subject: ppc64 oops.. In-Reply-To: <1132088841.5646.49.camel@gaston> References: <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> <20051115174332.GA9632@krispykreme> <1132088841.5646.49.camel@gaston> Message-ID: <20051115211002.GB30500@krispykreme> > Well, it probably also breaks pSeries without NUMA and with a memory > hole :) Which is nothing in the last 4 years :) Anton From benh at kernel.crashing.org Wed Nov 16 08:22:21 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 16 Nov 2005 08:22:21 +1100 Subject: ppc64 oops..
In-Reply-To: <20051115211002.GB30500@krispykreme> References: <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> <20051115174332.GA9632@krispykreme> <1132088841.5646.49.camel@gaston> <20051115211002.GB30500@krispykreme> Message-ID: <1132089742.5646.51.camel@gaston> On Wed, 2005-11-16 at 08:10 +1100, Anton Blanchard wrote: > > Well, it probably also breaks pSeries without NUMA and with a memory > > hole :) > > Which is nothing in the last 4 years :) I know you miss your POWER3 :) Ben. From cfriesen at nortel.com Wed Nov 16 08:16:57 2005 From: cfriesen at nortel.com (Christopher Friesen) Date: Tue, 15 Nov 2005 15:16:57 -0600 Subject: modify the cache-inhibit and guard bits from userspace? Message-ID: <437A5049.9090308@nortel.com> We're running a dual-970 blade, based on a modified 2.6.10. We have an application that does lots of random data fetches over a fairly large data set (a few GB) contained entirely in RAM, and the performance guys think that we may be spending time in unnecessary hardware prefetches and would like me to provide them a mechanism to individually specify the cache-inhibited and guard bits from userspace so that they can try to fine-tune their performance. What's the most logical way for me to do this? Do I extend mprotect() to support additional flags? Has anyone done this before? I didn't find anything in google. Currently the guard bit seems to only be used for ioremap() and in __pci_mmap_set_pgprot() if the memory doesn't support write combining. 
Thanks, Chris From paulus at samba.org Wed Nov 16 08:56:12 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 16 Nov 2005 08:56:12 +1100 Subject: [PATCH 0/7] PCI Error Recovery In-Reply-To: <20051115175934.GO19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> <20051114214703.GG19593@austin.ibm.com> <20051115164901.GA12968@kroah.com> <20051115175934.GO19593@austin.ibm.com> Message-ID: <17274.22908.867547.10612@cargo.ozlabs.ibm.com> linas writes: > ? I'm sorry, I'm crawling the archives, and can't find any threads > that haven't already been addressed in the final patchset. I think someone wanted you to make the bitwise thing an unsigned int rather than an int. I don't remember any other changes being requested; if someone did want something, hopefully they'll chime in and remind us. :) Paul. From zarniwhoop at ntlworld.com Wed Nov 16 05:54:48 2005 From: zarniwhoop at ntlworld.com (Ken Moffat) Date: Tue, 15 Nov 2005 18:54:48 +0000 (GMT) Subject: G5 (SMU) loss of keyboard/mouse Message-ID: My latest toy is one of the last of the single processor G5s (PowerMac9,1). It works well enough to install ubuntu or fedora, but once I boot the installed system I lose keyboard and mouse input in a couple of minutes. With a graphical desktop login I lose them as soon as the graphical login appears, probably because there is a sound associated with that (found that from a bug filed in ubuntu, but it seems to be generic, certainly the fedora install has the same problem if I try to test a sound). I also lose it if I try tab-completion in bash. I've occasionally seen ide error messages, as if the disk is no longer responding. Running a non-graphical installation helps a bit (I had over 10 minutes uptime on one occasion), as does upgrading to a cross-compiled 2.6.14.2 (working network - thanks for that advice, Ben - and tab-completion doesn't always hang).
I suspect this might be an unfortunate combination of options in .config, so I'll start by asking if anybody has one of these machines running reliably ? Ken -- the first time as tragedy, the second time as farce From arnd at arndb.de Wed Nov 16 09:38:53 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Tue, 15 Nov 2005 23:38:53 +0100 Subject: modify the cache-inhibit and guard bits from userspace? In-Reply-To: <437A5049.9090308@nortel.com> References: <437A5049.9090308@nortel.com> Message-ID: <200511152338.53238.arnd@arndb.de> On Tuesday 15 November 2005 22:16, Christopher Friesen wrote: > What's the most logical way for me to do this? Do I extend mprotect() > to support additional flags? > > Has anyone done this before? I didn't find anything in google. > Currently the guard bit seems to only be used for ioremap() and in > __pci_mmap_set_pgprot() if the memory doesn't support write combining. I have seen an earlier patch that modifies madvise to do this, which seems a little saner than mprotect, although they can probably both be implemented in a similar way. Alternatively, you could write a new file system similar to hugetlbfs and set the cache-inhibit bit in its mmap function. Arnd <>< From cfriesen at nortel.com Wed Nov 16 10:15:58 2005 From: cfriesen at nortel.com (Christopher Friesen) Date: Tue, 15 Nov 2005 17:15:58 -0600 Subject: modify the cache-inhibit and guard bits from userspace? In-Reply-To: <200511152338.53238.arnd@arndb.de> References: <437A5049.9090308@nortel.com> <200511152338.53238.arnd@arndb.de> Message-ID: <437A6C2E.1020807@nortel.com> Arnd Bergmann wrote: > I have seen an earlier patch that modifies madvise to do this, which seems > a little saner than mprotect, although they can probably both be implemented > in a similar way. Ah, that would make sense. It does fit the intent of the function a bit better. > Alternatively, you could write a new file system similar to hugetlbfs and set > the cache-inhibit bit in its mmap function.
Also a possibility. I think the madvise method is a bit cleaner for the apps. Chris From benh at kernel.crashing.org Wed Nov 16 10:20:01 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 16 Nov 2005 10:20:01 +1100 Subject: G5 (SMU) loss of keyboard/mouse In-Reply-To: References: Message-ID: <1132096801.5646.59.camel@gaston> On Tue, 2005-11-15 at 18:54 +0000, Ken Moffat wrote: > My latest toy is one of the last of the single processor G5s > (PowerMac9,1). It works well enough to install ubuntu or fedora, but > once I boot the installed system I lose keyboard and mouse input in a > couple of minutes. > > With a graphical desktop login I lose them as soon as the graphical > login appears, probably because there is a sound associated with that > (found that from a bug filed in ubuntu, but it seems to be generic, > certainly the fedora install has the same problem if I try to test a > sound). I also lose it if I try tab-completion in bash. I've > occasionally seen ide error messages, as if the disk is no longer > responding. > > Running a non-graphical installation helps a bit (I had over 10 minutes > uptime on one occasion), as does upgrading to a cross-compiled 2.6.14.2 > (working network - thanks for that advice, Ben - and tab-completion > doesn't always hang). > > I suspect this might be an unfortunate combination of options in > .config, so I'll start by asking if anybody has one of these machines > running reliably ? The problems you are reporting look strangely similar to what was reported by an iMac G5 rev 2 (PowerMac8,2) user. Strangely, the PowerMac8,1 that I have here (iMac G5 rev 1) seems to be rock solid. Is the kernel trying to load the ALSA driver ? Does it help not loading it at all ? Did you try not using the CD/DVD-ROM drive ? For example, putting the files from the CD on a MacOS partition or on a remote machine and doing an HTTP install ?
Also make sure nothing is trying to load a known broken driver like parport_pc (CUPS tend to do that) Ben. From paulus at samba.org Wed Nov 16 11:43:26 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 16 Nov 2005 11:43:26 +1100 Subject: [PATCH] powerpc: Fix sparsemem with memory holes [was Re: ppc64 oops..] In-Reply-To: References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> <17274.18790.902934.836437@cargo.ozlabs.ibm.com> Message-ID: <17274.32942.46039.118367@cargo.ozlabs.ibm.com> This patch should fix the crashes we have been seeing on 64-bit powerpc systems with a memory hole when sparsemem is enabled. I'd appreciate it if people who know more about NUMA and sparsemem than me could look over it. There were two bugs. The first was that if NUMA was enabled but there was no NUMA information for the machine, the setup_nonnuma() function was adding a single region, assuming memory was contiguous. The second was that the loops in mem_init() and show_mem() assumed that all pages within the span of a pgdat were valid (had a valid struct page). I also fixed the incorrect setting of num_physpages that Mike Kravetz pointed out. 
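As an illustration of the second bug described above, here is a minimal user-space sketch in C, not kernel code. The struct mem_region type and the toy_pfn_valid() and count_present_pages() helpers are hypothetical stand-ins invented for this example; in particular, the real pfn_valid() under sparsemem works at section granularity rather than by scanning a region list. The point is only that a walk over a node's spanned pages has to skip page frame numbers that fall into a hole.

```c
#include <stddef.h>

/* Hypothetical stand-in for one memory region, in units of pages. */
struct mem_region {
	unsigned long base_pfn;
	unsigned long nr_pages;
};

/* Toy pfn_valid(): true only if the pfn lies inside some region. */
static int toy_pfn_valid(const struct mem_region *r, size_t n,
			 unsigned long pfn)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= r[i].base_pfn &&
		    pfn - r[i].base_pfn < r[i].nr_pages)
			return 1;
	return 0;
}

/* Walk the node's span and count only pfns that have backing memory. */
static unsigned long count_present_pages(const struct mem_region *r, size_t n,
					 unsigned long start_pfn,
					 unsigned long spanned_pages)
{
	unsigned long total = 0;

	for (unsigned long i = 0; i < spanned_pages; i++) {
		if (!toy_pfn_valid(r, n, start_pfn + i))
			continue;	/* hole: no struct page to look at */
		total++;
	}
	return total;
}
```

With two 4-page regions at pfn 0 and pfn 8, the span is 12 pages but only 8 of them are present, which is also why the amount of RAM cannot be taken from the highest pfn when memory is not contiguous.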
Signed-off-by: Paul Mackerras --- diff -urN powerpc-merge/arch/powerpc/mm/mem.c merge-hack/arch/powerpc/mm/mem.c --- powerpc-merge/arch/powerpc/mm/mem.c 2005-11-14 10:35:09.000000000 +1100 +++ merge-hack/arch/powerpc/mm/mem.c 2005-11-16 11:35:52.000000000 +1100 @@ -200,6 +200,8 @@ unsigned long flags; pgdat_resize_lock(pgdat, &flags); for (i = 0; i < pgdat->node_spanned_pages; i++) { + if (!pfn_valid(pgdat->node_start_pfn + i)) + continue; page = pgdat_page_nr(pgdat, i); total++; if (PageHighMem(page)) @@ -336,7 +338,7 @@ struct page *page; unsigned long reservedpages = 0, codesize, initsize, datasize, bsssize; - num_physpages = max_pfn; /* RAM is assumed contiguous */ + num_physpages = lmb.memory.size >> PAGE_SHIFT; high_memory = (void *) __va(max_low_pfn * PAGE_SIZE); #ifdef CONFIG_NEED_MULTIPLE_NODES @@ -348,11 +350,13 @@ } } #else - max_mapnr = num_physpages; + max_mapnr = max_pfn; totalram_pages += free_all_bootmem(); #endif for_each_pgdat(pgdat) { for (i = 0; i < pgdat->node_spanned_pages; i++) { + if (!pfn_valid(pgdat->node_start_pfn + i)) + continue; page = pgdat_page_nr(pgdat, i); if (PageReserved(page)) reservedpages++; diff -urN powerpc-merge/arch/powerpc/mm/numa.c merge-hack/arch/powerpc/mm/numa.c --- powerpc-merge/arch/powerpc/mm/numa.c 2005-11-14 10:35:09.000000000 +1100 +++ merge-hack/arch/powerpc/mm/numa.c 2005-11-15 21:58:26.000000000 +1100 @@ -483,6 +483,7 @@ { unsigned long top_of_ram = lmb_end_of_DRAM(); unsigned long total_ram = lmb_phys_mem_size(); + unsigned int i; printk(KERN_INFO "Top of RAM: 0x%lx, Total RAM: 0x%lx\n", top_of_ram, total_ram); @@ -490,7 +491,9 @@ (top_of_ram - total_ram) >> 20); map_cpu_to_node(boot_cpuid, 0); - add_region(0, 0, lmb_end_of_DRAM() >> PAGE_SHIFT); + for (i = 0; i < lmb.memory.cnt; ++i) + add_region(0, lmb.memory.region[i].base >> PAGE_SHIFT, + lmb_size_pages(&lmb.memory, i)); node_set_online(0); } From linas at austin.ibm.com Wed Nov 16 11:54:29 2005 From: linas at austin.ibm.com (linas) Date: Tue, 15 Nov 
2005 18:54:29 -0600 Subject: [PATCH 0/7] PCI Error Recovery In-Reply-To: <17274.22908.867547.10612@cargo.ozlabs.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> <20051114214703.GG19593@austin.ibm.com> <20051115164901.GA12968@kroah.com> <20051115175934.GO19593@austin.ibm.com> <17274.22908.867547.10612@cargo.ozlabs.ibm.com> Message-ID: <20051116005429.GP19593@austin.ibm.com> On Wed, Nov 16, 2005 at 08:56:12AM +1100, Paul Mackerras was heard to remark: > linas writes: > > > ? I'm sorry, I'm crawling the archives, and can't find any threads > > that haven't already been addressed in the final patchset. > > I think someone wanted you to make the bitwise thing an unsigned int > rather than an int. Oh right. I replied off-list. Teach me to go off list. --linas From paulus at samba.org Wed Nov 16 12:07:32 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 16 Nov 2005 12:07:32 +1100 Subject: ppc64 oops.. In-Reply-To: <43799B8E.3050600@yahoo.com.au> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <20051114214158.580883b3.akpm@osdl.org> <43799B8E.3050600@yahoo.com.au> Message-ID: <17274.34388.555865.416007@cargo.ozlabs.ibm.com> Nick Piggin writes: > I really don't think we've missed PG_reserved. The ZERO_PAGE accounting > thing may be a problem, but that problem didn't come about due to > removal of PageReserved, but rather the concurrent removal of ZERO_PAGE > special casing we had there - it can be reinstated (and a solution for > 2.6.15 won't be difficult). Not that I'm any sort of a VM expert, but it seems to me that we need some sort of way to mark things like the hashed page table on PowerPC as being "special" memory that is there, but that the VM system should just completely ignore. That's what I thought PG_reserved was for, and IMHO it's useful for that. For sure we should eliminate the abuses of PG_reserved, but I don't see that that means we should eliminate PG_reserved itself. Paul. 
From akpm at osdl.org Wed Nov 16 12:41:45 2005 From: akpm at osdl.org (Andrew Morton) Date: Tue, 15 Nov 2005 17:41:45 -0800 Subject: [PATCH 1/5] spufs: The SPU file system, base In-Reply-To: <20051115210408.327453000@localhost> References: <20051115205347.395355000@localhost> <20051115210408.327453000@localhost> Message-ID: <20051115174145.70f37501.akpm@osdl.org> Arnd Bergmann wrote: > > This is the current version of the spu file system, used > for driving SPEs on the Cell Broadband Engine. +EXPORT_SYMBOL_GPL(hash_page); +EXPORT_SYMBOL(spu_alloc); +EXPORT_SYMBOL(spu_free); +EXPORT_SYMBOL(spu_run); +EXPORT_SYMBOL(spu_ibox_read); +EXPORT_SYMBOL(spu_wbox_write); +EXPORT_SYMBOL_GPL(register_spu_syscalls); +EXPORT_SYMBOL_GPL(unregister_spu_syscalls); -EXPORT_SYMBOL_GPL(__handle_mm_fault); /* For MOL */ +EXPORT_SYMBOL_GPL(__handle_mm_fault); A strange mixture of GPL and non-GPL. What's the thinking here? From nickpiggin at yahoo.com.au Wed Nov 16 12:49:57 2005 From: nickpiggin at yahoo.com.au (Nick Piggin) Date: Wed, 16 Nov 2005 12:49:57 +1100 Subject: ppc64 oops.. In-Reply-To: <17274.34388.555865.416007@cargo.ozlabs.ibm.com> References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com> <17273.26286.464586.872800@cargo.ozlabs.ibm.com> <20051114214158.580883b3.akpm@osdl.org> <43799B8E.3050600@yahoo.com.au> <17274.34388.555865.416007@cargo.ozlabs.ibm.com> Message-ID: <437A9045.6070903@yahoo.com.au> Paul Mackerras wrote: > Nick Piggin writes: > > >>I really don't think we've missed PG_reserved. The ZERO_PAGE accounting >>thing may be a problem, but that problem didn't come about due to >>removal of PageReserved, but rather the concurrent removal of ZERO_PAGE >>special casing we had there - it can be reinstated (and a solution for >>2.6.15 won't be difficult). 
> > > Not that I'm any sort of a VM expert, but it seems to me that we need > some sort of way to mark things like the hashed page table on PowerPC > as being "special" memory that is there, but that the VM system should > just completely ignore. That's what I thought PG_reserved was for, > and IMHO it's useful for that. For sure we should eliminate the > abuses of PG_reserved, but I don't see that that means we should > eliminate PG_reserved itself. > Well, a page is introduced to the VM system in one of two ways really. Either it gets put into the page allocator as a free page, or it gets returned from ->nopage or otherwise mapped into a user mapping. The first case is arch specific, but sure PG_reserved may come in handy to track these pages. swsusp for example uses this flag, and in that case I think its usage is valid. In the second case, the VM can't completely ignore the page. It is cleaner to specify treatment of pages through this mapping by using properties of the mapping rather than pages that might be in it (ie. VM_RESERVED rather than PG_reserved). -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com From becky.bruce at freescale.com Wed Nov 16 13:19:58 2005 From: becky.bruce at freescale.com (Becky Bruce) Date: Tue, 15 Nov 2005 20:19:58 -0600 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <1132032910.23979.6.camel@gaston> References: <1132032910.23979.6.camel@gaston> Message-ID: <00eecfdbd5bccc7b293d847033121eee@freescale.com> Ben, Yeah, I clearly shouldn't run testcases at 11pm, because I got in a rush and only confirmed that lmw/stmw were actually taking the exception. Those 2 are working beautifully. To test the others, I need to run on a different board which, of course, isn't bootable at the moment. As soon as I can get that up and running, I'll try some of the other cases and let you know how it goes...... 
BTW, based on the pile of docs I have here, I think the list of alignment-exception-causing events on FSL's current parts (603, 603e, 750, 74x, 74xx, e500) is: - lmw/stmw (all procs, non-word aligned) - single and double precision floating point ld/st ops (non-E500, non data size aligned) - dcbz to WT or CI memory (all procs) - dcbz with cache disabled (all procs but 603e?) - misaligned little endian accesses (603e) - lwarx/stwcx (all procs) - multiple/string with LE set (750, 603e, 7450, 7400) - eciwx/ecowx (750, 7450, 7400) - a couple of others related to vector processing If anybody knows offhand of something missing there, let me know. Cheers, B On Nov 14, 2005, at 11:35 PM, Benjamin Herrenschmidt wrote: > On Mon, 2005-11-14 at 23:10 -0600, Becky Bruce wrote: > > Ben, > > > > I've just done some basic testing of lmw/stmw, lwz/stw, lhx/sth, > > lfs/stfs, and lfd/stfd misaligned across a doubleword boundary, and > > everything looks good so far. I'll check out the byte reversals and a > > few other forms tomorrow. > > Excellent, thanks ! BTW. Make sure you test these on CPUs that actually > trap on misaligned accesses :) Best is probably to do the misaligned > access across a page boundary, that's what most CPUs can do. > > Ben. > From benh at kernel.crashing.org Wed Nov 16 13:34:49 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 16 Nov 2005 13:34:49 +1100 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <00eecfdbd5bccc7b293d847033121eee@freescale.com> References: <1132032910.23979.6.camel@gaston> <00eecfdbd5bccc7b293d847033121eee@freescale.com> Message-ID: <1132108490.5646.67.camel@gaston> On Tue, 2005-11-15 at 20:19 -0600, Becky Bruce wrote: > Ben, > > Yeah, I clearly shouldn't run testcases at 11pm, because I got in a > rush and only confirmed that lmw/stmw were actually taking the > exception. Those 2 are working beautifully.
To test the others, I > need to run on a different board which, of course, isn't bootable at > the moment. As soon as I can get that up and running, I'll try some of > the other cases and let you know how it goes...... > > BTW, based on the pile of docs I have here, I think the list of > alignment-exception-causing events on FSL's current parts (603, 603e, > 750, 74x, 74xx, e500) is: > > - lmw/stmw (all procs, non-word aligned) > - single and double precision floating point ld/st ops (non-E500, non > data size aligned) > - dcbz to WT or CI memory (all procs) > - dcbz with cache disabled (all procs but 603e?) > - misaligned little endian accesses (603e) > - lwarx/stwcx (all procs) > - multiple/string with LE set (750, 603e, 7450, 7400) > - eciwx/ecowx (750, 7450, 7400) > - a couple of others related to vector processing > > If anybody knows offhand of something missing there, let me know. What about lwz/stw crossing page boundaries ? Is this handled in HW ? Ben. From benh at kernel.crashing.org Wed Nov 16 13:54:32 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 16 Nov 2005 13:54:32 +1100 Subject: [PATCH] powerpc: Make the vDSO functions set error code (#2) Message-ID: <1132109673.5646.72.camel@gaston> The vDSO functions should have the same calling convention as a syscall. Unfortunately, they currently don't set the cr0.so bit, which is used to indicate an error. This patch makes them clear this bit unconditionally since all functions currently succeed. The syscall fallback done by some of them will eventually override this if the syscall fails. This also changes the symbol version of all vdso exports to make sure glibc can differentiate between old and fixed calls for existing ones like __kernel_gettimeofday.
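For readers unfamiliar with the convention the description refers to, here is a rough sketch in portable C of how a caller might interpret the two values it gets back: the r3 return value and a copy of the condition register as produced by mfcr. The fold_result() helper and the CR0_SO_MASK name are hypothetical, chosen for this example only; the bit position follows from cr0 occupying the top four bits of the 32-bit CR image, with SO being the last of those four.

```c
/* cr0.so as it appears in a 32-bit CR image read with mfcr. */
#define CR0_SO_MASK (1UL << 28)

/*
 * Hypothetical helper: fold the (r3, cr) pair a caller gets back into
 * the usual "negative errno on failure" form.
 */
static long fold_result(long r3, unsigned long cr)
{
	if (cr & CR0_SO_MASK)
		return -r3;	/* error: r3 holds a positive errno */
	return r3;		/* success: r3 is the return value */
}
```

If a vDSO function left cr0.so in whatever state the caller's earlier code produced, a check like this could report a spurious failure, which is presumably why the bit needs to be cleared explicitly on the success paths.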
Signed-off-by: Benjamin Herrenschmidt --- Tom, Steve: You'll have to write a wrapper macro to call the vdso similar to the syscall one, that does something like: mtctr %0 <--- function address bctrl mfcr %1 <--- error indication in cr0.so With appropriate clobbers, similar to the syscall macro (the vDSO clobbers all volatile registers: r0, r3 ... r12, cr0, cr1 and XER). The advantage of doing so is that you don't have to create fake descriptors for ppc64 and thus avoid useless TOC reloads; you can even have a single macro that works on both 32 and 64 bits. Index: linux-work/arch/powerpc/kernel/vdso32/cacheflush.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso32/cacheflush.S 2005-11-16 13:39:00.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso32/cacheflush.S 2005-11-16 13:52:05.000000000 +1100 @@ -35,6 +35,7 @@ subf r8,r6,r4 /* compute length */ add r8,r8,r5 /* ensure we get enough */ srwi. r8,r8,7 /* compute line count */ + crclr cr0*4+so beqlr /* nothing to do?
*/ mtctr r8 mr r3,r6 @@ -58,6 +59,7 @@ */ V_FUNCTION_BEGIN(__kernel_sync_dicache_p5) .cfi_startproc + crclr cr0*4+so sync isync li r3,0 Index: linux-work/arch/powerpc/kernel/vdso32/datapage.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso32/datapage.S 2005-11-16 13:39:00.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso32/datapage.S 2005-11-16 13:51:35.000000000 +1100 @@ -54,7 +54,6 @@ .cfi_startproc mflr r12 .cfi_register lr,r12 - mr r4,r3 bl __get_datapage@local mtlr r12 @@ -63,6 +62,7 @@ beqlr li r0,__NR_syscalls stw r0,0(r4) + crclr cr0*4+so blr .cfi_endproc V_FUNCTION_END(__kernel_get_syscall_map) @@ -80,6 +80,7 @@ lwz r4,(CFG_TB_TICKS_PER_SEC + 4)(r3) lwz r3,CFG_TB_TICKS_PER_SEC(r3) mtlr r12 + crclr cr0*4+so blr .cfi_endproc V_FUNCTION_END(__kernel_get_tbfreq) Index: linux-work/arch/powerpc/kernel/vdso32/gettimeofday.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso32/gettimeofday.S 2005-11-16 13:39:00.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso32/gettimeofday.S 2005-11-16 13:50:35.000000000 +1100 @@ -59,6 +59,7 @@ stw r5,TZONE_TZ_DSTTIME(r11) 1: mtlr r12 + crclr cr0*4+so li r3,0 blr @@ -117,6 +118,7 @@ mulli r5,r5,1000 stw r5,TSPC32_TV_NSEC(r11) mtlr r12 + crclr cr0*4+so li r3,0 blr @@ -185,6 +187,7 @@ stw r4,TSPC32_TV_NSEC(r11) mtlr r12 + crclr cr0*4+so li r3,0 blr @@ -219,6 +222,7 @@ li r3,0 cmpli cr0,r4,0 + crclr cr0*4+so beqlr lis r5,CLOCK_REALTIME_RES@h ori r5,r5,CLOCK_REALTIME_RES@l Index: linux-work/arch/powerpc/kernel/vdso64/cacheflush.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso64/cacheflush.S 2005-11-16 13:39:00.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso64/cacheflush.S 2005-11-16 13:53:48.000000000 +1100 @@ -35,6 +35,7 @@ subf r8,r6,r4 /* compute length */ add r8,r8,r5 /* ensure we get enough */ srwi.
r8,r8,7 /* compute line count */ + crclr cr0*4+so beqlr /* nothing to do? */ mtctr r8 mr r3,r6 @@ -58,6 +59,7 @@ */ V_FUNCTION_BEGIN(__kernel_sync_dicache_p5) .cfi_startproc + crclr cr0*4+so sync isync li r3,0 Index: linux-work/arch/powerpc/kernel/vdso64/datapage.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso64/datapage.S 2005-11-16 13:39:00.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso64/datapage.S 2005-11-16 13:53:36.000000000 +1100 @@ -54,12 +54,12 @@ .cfi_startproc mflr r12 .cfi_register lr,r12 - mr r4,r3 bl V_LOCAL_FUNC(__get_datapage) mtlr r12 addi r3,r3,CFG_SYSCALL_MAP64 cmpli cr0,r4,0 + crclr cr0*4+so beqlr li r0,__NR_syscalls stw r0,0(r4) @@ -80,6 +80,7 @@ bl V_LOCAL_FUNC(__get_datapage) ld r3,CFG_TB_TICKS_PER_SEC(r3) mtlr r12 + crclr cr0*4+so blr .cfi_endproc V_FUNCTION_END(__kernel_get_tbfreq) Index: linux-work/arch/powerpc/kernel/vdso64/gettimeofday.S =================================================================== --- linux-work.orig/arch/powerpc/kernel/vdso64/gettimeofday.S 2005-11-16 13:39:00.000000000 +1100 +++ linux-work/arch/powerpc/kernel/vdso64/gettimeofday.S 2005-11-16 13:53:12.000000000 +1100 @@ -52,6 +52,7 @@ stw r4,TZONE_TZ_MINWEST(r10) stw r5,TZONE_TZ_DSTTIME(r10) 1: mtlr r12 + crclr cr0*4+so li r3,0 /* always success */ blr .cfi_endproc @@ -99,6 +100,7 @@ std r0,TSPC64_TV_NSEC(r11) /* store nsec in tp */ mtlr r12 + crclr cr0*4+so li r3,0 blr @@ -159,6 +161,7 @@ std r7,TSPC64_TV_NSEC(r11) mtlr r12 + crclr cr0*4+so li r3,0 blr @@ -193,6 +196,7 @@ li r3,0 cmpli cr0,r4,0 + crclr cr0*4+so beqlr lis r5,CLOCK_REALTIME_RES at h ori r5,r5,CLOCK_REALTIME_RES at l Index: linux-work/include/asm-powerpc/vdso.h =================================================================== --- linux-work.orig/include/asm-powerpc/vdso.h 2005-11-16 13:39:00.000000000 +1100 +++ linux-work/include/asm-powerpc/vdso.h 2005-11-16 13:42:22.000000000 +1100 @@ -11,7 +11,7 @@ #define VDSO32_MBASE 
VDSO32_LBASE
 #define VDSO64_MBASE	VDSO64_LBASE
-#define VDSO_VERSION_STRING	LINUX_2.6.12
+#define VDSO_VERSION_STRING	LINUX_2.6.15
 /* Define if 64 bits VDSO has procedure descriptors */
 #undef VDS64_HAS_DESCRIPTORS

From becky.bruce at freescale.com Wed Nov 16 14:23:13 2005
From: becky.bruce at freescale.com (Becky Bruce)
Date: Tue, 15 Nov 2005 21:23:13 -0600
Subject: [PATCH] powerpc: Merge align.c
In-Reply-To: <1132108490.5646.67.camel@gaston>
References: <1132108490.5646.67.camel@gaston>
Message-ID: <4ad202b87fa52d954e645b05fb45ca13@freescale.com>

On Nov 15, 2005, at 8:34 PM, Benjamin Herrenschmidt wrote:

> >
> > BTW, Based on the pile of docs I have here, I think the list of
> > alignment-exception-causing events on FSL's current parts (603, 603e,
> > 750, 74x, 74xx, e500) is:
> >
> > - lmw/stmw (all procs, non-word aligned)
> > - single and double precision floating point ld/st ops (non-E500, non
> >   data size aligned)
> > - dcbz to WT or CI memory (all procs)
> > - dcbz with cache disabled (all procs but 603e?)
> > - misaligned little endian accesses (603e)
> > - lwarx/stwcx (all procs)
> > - multiple/string with LE set (750, 603e, 7450, 7400)
> > - eciwx/ecowx (750, 7450, 7400)
> > - a couple of others related to vector processing
> >
> > If anybody knows offhand of something missing there, let me know.
>
> What about lwz/stw crossing page boundaries ? Is this handled in HW ?
>
> Ben.

Apparently so, much to my surprise - I ran the testcase with those
instructions misaligned across a page boundary last night and got no
alignment exception. I was surprised, and asked my husband about it (he
worked on the load/store units for a bunch of our parts), and he says
these guys never cause an exception for any of FSL's current parts as
far as he knows. This is supported by our documentation as well - the
only place I see these listed is on 603e, where they can cause an
exception if the page is mapped little endian.

-B From david at gibson.dropbear.id.au Wed Nov 16 15:28:55 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 16 Nov 2005 15:28:55 +1100 Subject: powerpc: Remove imalloc.h Message-ID: <20051116042855.GC13985@localhost.localdomain> asm-ppc64/imalloc.h is only included from files in arch/powerpc/mm. We already have a header for mm local definitions, arch/powerpc/mm/mmu_decl.h. Thus, this patch moves the contents of imalloc.h into mmu_decl.h. The only exception are the definitions of PHBS_IO_BASE, IMALLOC_BASE and IMALLOC_END. Those are moved into pgtable.h, next to similar definitions of VMALLOC_START and VMALLOC_SIZE. Built for multiplatform 32bit and 64bit (ARCH=powerpc). Signed-off-by: David Gibson Index: working-2.6/include/asm-ppc64/imalloc.h =================================================================== --- working-2.6.orig/include/asm-ppc64/imalloc.h 2005-11-16 14:16:22.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,26 +0,0 @@ -#ifndef _PPC64_IMALLOC_H -#define _PPC64_IMALLOC_H - -/* - * Define the address range of the imalloc VM area. 
- */ -#define PHBS_IO_BASE VMALLOC_END -#define IMALLOC_BASE (PHBS_IO_BASE + 0x80000000ul) /* Reserve 2 gigs for PHBs */ -#define IMALLOC_END (VMALLOC_START + PGTABLE_RANGE) - - -/* imalloc region types */ -#define IM_REGION_UNUSED 0x1 -#define IM_REGION_SUBSET 0x2 -#define IM_REGION_EXISTS 0x4 -#define IM_REGION_OVERLAP 0x8 -#define IM_REGION_SUPERSET 0x10 - -extern struct vm_struct * im_get_free_area(unsigned long size); -extern struct vm_struct * im_get_area(unsigned long v_addr, unsigned long size, - int region_type); -extern void im_free(void *addr); - -extern unsigned long ioremap_bot; - -#endif /* _PPC64_IMALLOC_H */ Index: working-2.6/include/asm-ppc64/pgtable.h =================================================================== --- working-2.6.orig/include/asm-ppc64/pgtable.h 2005-11-16 14:16:22.000000000 +1100 +++ working-2.6/include/asm-ppc64/pgtable.h 2005-11-16 14:16:35.000000000 +1100 @@ -47,6 +47,13 @@ #define VMALLOC_END (VMALLOC_START + VMALLOC_SIZE) /* + * Define the address range of the imalloc VM area. + */ +#define PHBS_IO_BASE VMALLOC_END +#define IMALLOC_BASE (PHBS_IO_BASE + 0x80000000ul) /* Reserve 2 gigs for PHBs */ +#define IMALLOC_END (VMALLOC_START + PGTABLE_RANGE) + +/* * Common bits in a linux-style PTE. These match the bits in the * (hardware-defined) PowerPC PTE as closely as possible. 
Additional * bits may be defined in pgtable-*.h Index: working-2.6/arch/powerpc/mm/mmu_decl.h =================================================================== --- working-2.6.orig/arch/powerpc/mm/mmu_decl.h 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/powerpc/mm/mmu_decl.h 2005-11-16 14:17:20.000000000 +1100 @@ -33,7 +33,6 @@ extern int __map_without_bats; extern unsigned long ioremap_base; -extern unsigned long ioremap_bot; extern unsigned int rtas_data, rtas_size; extern PTE *Hash, *Hash_end; @@ -42,6 +41,7 @@ extern unsigned int num_tlbcam_entries; #endif +extern unsigned long ioremap_bot; extern unsigned long __max_low_memory; extern unsigned long __initial_memory_limit; extern unsigned long total_memory; @@ -84,4 +84,16 @@ else _tlbie(va); } +#else /* CONFIG_PPC64 */ +/* imalloc region types */ +#define IM_REGION_UNUSED 0x1 +#define IM_REGION_SUBSET 0x2 +#define IM_REGION_EXISTS 0x4 +#define IM_REGION_OVERLAP 0x8 +#define IM_REGION_SUPERSET 0x10 + +extern struct vm_struct * im_get_free_area(unsigned long size); +extern struct vm_struct * im_get_area(unsigned long v_addr, unsigned long size, + int region_type); +extern void im_free(void *addr); #endif Index: working-2.6/arch/powerpc/mm/imalloc.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/imalloc.c 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/powerpc/mm/imalloc.c 2005-11-16 14:48:14.000000000 +1100 @@ -14,9 +14,10 @@ #include #include #include -#include #include +#include "mmu_decl.h" + static DECLARE_MUTEX(imlist_sem); struct vm_struct * imlist = NULL; Index: working-2.6/arch/powerpc/mm/pgtable_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/pgtable_64.c 2005-11-10 15:05:55.000000000 +1100 +++ working-2.6/arch/powerpc/mm/pgtable_64.c 2005-11-16 14:48:07.000000000 +1100 @@ -64,7 +64,8 @@ #include #include #include -#include + +#include "mmu_decl.h" unsigned long 
ioremap_bot = IMALLOC_BASE; static unsigned long phbs_io_bot = PHBS_IO_BASE; -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson From paulus at samba.org Wed Nov 16 15:33:02 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 16 Nov 2005 15:33:02 +1100 Subject: please pull powerpc-merge.git Message-ID: <17274.46718.214718.675981@cargo.ozlabs.ibm.com> Linus, Please do another pull from git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge.git I have moved arch/ppc64/boot over to arch/powerpc/boot, so if you use a zImage, you will have to use arch/powerpc/boot/zImage.vmode rather than arch/ppc64/boot/zImage.vmode (for a G5). I assume you probably boot a vmlinux using yaboot, though. Other than that, there is a collection of minor build and bug fixes. I haven't included the NUMA/sparsemem fix yet (still waiting for comments on the patch). Thanks, Paul. 
Benjamin Herrenschmidt: powerpc: update defconfigs powerpc: pci_64 fixes & cleanups ppc: Fix boot with yaboot with ARCH=ppc ppc: Fix build with CONFIG_CHRP not set powerpc: Make the vDSO functions set error code (#2) Guido Guenther: PowerBook 6,1: headphone not detected after suspend Kumar Gala: powerpc: replace page_to_virt() with lowmem_page_address() for Book-E Marcelo Tosatti: ppc32 8xx: update_mmu_cache() needs unconditional tlbie Michael Ellerman: powerpc: Fixup debugging in lmb.c powerpc: More debugging fixups Olof Johansson: powerpc: add new powerbooks to feature table Paul Mackerras: powerpc: Move ppc64 boot wrapper code over to arch/powerpc arch/powerpc/Makefile | 25 +--- arch/powerpc/boot/Makefile | 5 - arch/powerpc/boot/README | 0 arch/powerpc/boot/addRamDisk.c | 0 arch/powerpc/boot/addnote.c | 0 arch/powerpc/boot/crt0.S | 0 arch/powerpc/boot/div64.S | 0 arch/powerpc/boot/elf.h | 0 arch/powerpc/boot/install.sh | 0 arch/powerpc/boot/main.c | 122 +++++++++++++------- arch/powerpc/boot/page.h | 0 arch/powerpc/boot/ppc_asm.h | 0 arch/powerpc/boot/prom.c | 0 arch/powerpc/boot/prom.h | 0 arch/powerpc/boot/stdio.h | 0 arch/powerpc/boot/string.S | 0 arch/powerpc/boot/string.h | 0 arch/powerpc/boot/zImage.lds | 0 arch/powerpc/configs/cell_defconfig | 175 +++++++++++++++++++---------- arch/powerpc/configs/g5_defconfig | 53 +++++---- arch/powerpc/configs/iseries_defconfig | 159 ++++++++++++++++++-------- arch/powerpc/configs/maple_defconfig | 155 +++++++++++++++++--------- arch/powerpc/configs/pseries_defconfig | 4 - arch/powerpc/kernel/pci_64.c | 70 +++++++++++- arch/powerpc/kernel/rtas_pci.c | 68 +---------- arch/powerpc/kernel/setup-common.c | 1 arch/powerpc/kernel/smp.c | 7 + arch/powerpc/kernel/vdso32/cacheflush.S | 2 arch/powerpc/kernel/vdso32/datapage.S | 3 arch/powerpc/kernel/vdso32/gettimeofday.S | 4 + arch/powerpc/kernel/vdso64/cacheflush.S | 2 arch/powerpc/kernel/vdso64/datapage.S | 3 arch/powerpc/kernel/vdso64/gettimeofday.S | 4 + arch/powerpc/mm/lmb.c 
| 33 +++-- arch/powerpc/platforms/iseries/pci.c | 3 arch/powerpc/platforms/maple/pci.c | 16 --- arch/powerpc/platforms/powermac/feature.c | 8 + arch/powerpc/platforms/powermac/pci.c | 62 +++++----- arch/powerpc/platforms/pseries/smp.c | 1 arch/ppc/kernel/setup.c | 14 ++ arch/ppc/mm/init.c | 23 ++-- arch/ppc/xmon/start.c | 5 + include/asm-powerpc/ppc-pci.h | 1 include/asm-powerpc/vdso.h | 2 include/asm-ppc/pgalloc.h | 2 include/asm-ppc64/pci-bridge.h | 14 ++ sound/ppc/tumbler.c | 8 + 47 files changed, 652 insertions(+), 402 deletions(-) rename arch/{ppc64/boot/Makefile => powerpc/boot/Makefile} (100%) rename arch/{ppc64/boot/README => powerpc/boot/README} (100%) rename arch/{ppc64/boot/addRamDisk.c => powerpc/boot/addRamDisk.c} (100%) rename arch/{ppc64/boot/addnote.c => powerpc/boot/addnote.c} (100%) rename arch/{ppc64/boot/crt0.S => powerpc/boot/crt0.S} (100%) rename arch/{ppc64/boot/div64.S => powerpc/boot/div64.S} (100%) rename arch/{ppc64/boot/elf.h => powerpc/boot/elf.h} (100%) rename arch/{ppc64/boot/install.sh => powerpc/boot/install.sh} (100%) rename arch/{ppc64/boot/main.c => powerpc/boot/main.c} (76%) rename arch/{ppc64/boot/page.h => powerpc/boot/page.h} (100%) rename arch/{ppc64/boot/ppc_asm.h => powerpc/boot/ppc_asm.h} (100%) rename arch/{ppc64/boot/prom.c => powerpc/boot/prom.c} (100%) rename arch/{ppc64/boot/prom.h => powerpc/boot/prom.h} (100%) rename arch/{ppc64/boot/stdio.h => powerpc/boot/stdio.h} (100%) rename arch/{ppc64/boot/string.S => powerpc/boot/string.S} (100%) rename arch/{ppc64/boot/string.h => powerpc/boot/string.h} (100%) rename arch/{ppc64/boot/zImage.lds => powerpc/boot/zImage.lds} (100%) From david at gibson.dropbear.id.au Wed Nov 16 15:43:48 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 16 Nov 2005 15:43:48 +1100 Subject: powerpc: Remove imalloc.h In-Reply-To: <20051116042855.GC13985@localhost.localdomain> References: <20051116042855.GC13985@localhost.localdomain> Message-ID: 
<20051116044348.GE13985@localhost.localdomain> Crud. Forgot a quilt refresh. Here's a correct version of the patch. powerpc: Remove imalloc.h asm-ppc64/imalloc.h is only included from files in arch/powerpc/mm. We already have a header for mm local definitions, arch/powerpc/mm/mmu_decl.h. Thus, this patch moves the contents of imalloc.h into mmu_decl.h. The only exception are the definitions of PHBS_IO_BASE, IMALLOC_BASE and IMALLOC_END. Those are moved into pgtable.h, next to similar definitions of VMALLOC_START and VMALLOC_SIZE. Built for multiplatform 32bit and 64bit (ARCH=powerpc). Signed-off-by: David Gibson Index: working-2.6/include/asm-ppc64/imalloc.h =================================================================== --- working-2.6.orig/include/asm-ppc64/imalloc.h 2005-11-16 14:16:22.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,26 +0,0 @@ -#ifndef _PPC64_IMALLOC_H -#define _PPC64_IMALLOC_H - -/* - * Define the address range of the imalloc VM area. - */ -#define PHBS_IO_BASE VMALLOC_END -#define IMALLOC_BASE (PHBS_IO_BASE + 0x80000000ul) /* Reserve 2 gigs for PHBs */ -#define IMALLOC_END (VMALLOC_START + PGTABLE_RANGE) - - -/* imalloc region types */ -#define IM_REGION_UNUSED 0x1 -#define IM_REGION_SUBSET 0x2 -#define IM_REGION_EXISTS 0x4 -#define IM_REGION_OVERLAP 0x8 -#define IM_REGION_SUPERSET 0x10 - -extern struct vm_struct * im_get_free_area(unsigned long size); -extern struct vm_struct * im_get_area(unsigned long v_addr, unsigned long size, - int region_type); -extern void im_free(void *addr); - -extern unsigned long ioremap_bot; - -#endif /* _PPC64_IMALLOC_H */ Index: working-2.6/include/asm-ppc64/pgtable.h =================================================================== --- working-2.6.orig/include/asm-ppc64/pgtable.h 2005-11-16 14:16:22.000000000 +1100 +++ working-2.6/include/asm-ppc64/pgtable.h 2005-11-16 14:16:35.000000000 +1100 @@ -47,6 +47,13 @@ #define VMALLOC_END (VMALLOC_START + VMALLOC_SIZE) /* + * Define the 
address range of the imalloc VM area. + */ +#define PHBS_IO_BASE VMALLOC_END +#define IMALLOC_BASE (PHBS_IO_BASE + 0x80000000ul) /* Reserve 2 gigs for PHBs */ +#define IMALLOC_END (VMALLOC_START + PGTABLE_RANGE) + +/* * Common bits in a linux-style PTE. These match the bits in the * (hardware-defined) PowerPC PTE as closely as possible. Additional * bits may be defined in pgtable-*.h Index: working-2.6/arch/powerpc/mm/mmu_decl.h =================================================================== --- working-2.6.orig/arch/powerpc/mm/mmu_decl.h 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/powerpc/mm/mmu_decl.h 2005-11-16 14:17:20.000000000 +1100 @@ -33,7 +33,6 @@ extern int __map_without_bats; extern unsigned long ioremap_base; -extern unsigned long ioremap_bot; extern unsigned int rtas_data, rtas_size; extern PTE *Hash, *Hash_end; @@ -42,6 +41,7 @@ extern unsigned int num_tlbcam_entries; #endif +extern unsigned long ioremap_bot; extern unsigned long __max_low_memory; extern unsigned long __initial_memory_limit; extern unsigned long total_memory; @@ -84,4 +84,16 @@ else _tlbie(va); } +#else /* CONFIG_PPC64 */ +/* imalloc region types */ +#define IM_REGION_UNUSED 0x1 +#define IM_REGION_SUBSET 0x2 +#define IM_REGION_EXISTS 0x4 +#define IM_REGION_OVERLAP 0x8 +#define IM_REGION_SUPERSET 0x10 + +extern struct vm_struct * im_get_free_area(unsigned long size); +extern struct vm_struct * im_get_area(unsigned long v_addr, unsigned long size, + int region_type); +extern void im_free(void *addr); #endif Index: working-2.6/arch/powerpc/mm/imalloc.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/imalloc.c 2005-11-08 10:57:14.000000000 +1100 +++ working-2.6/arch/powerpc/mm/imalloc.c 2005-11-16 14:48:14.000000000 +1100 @@ -14,9 +14,10 @@ #include #include #include -#include #include +#include "mmu_decl.h" + static DECLARE_MUTEX(imlist_sem); struct vm_struct * imlist = NULL; Index: 
working-2.6/arch/powerpc/mm/pgtable_64.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/pgtable_64.c	2005-11-10 15:05:55.000000000 +1100
+++ working-2.6/arch/powerpc/mm/pgtable_64.c	2005-11-16 14:48:07.000000000 +1100
@@ -64,7 +64,8 @@
 #include
 #include
 #include
-#include
+
+#include "mmu_decl.h"
 unsigned long ioremap_bot = IMALLOC_BASE;
 static unsigned long phbs_io_bot = PHBS_IO_BASE;
Index: working-2.6/arch/powerpc/mm/init_64.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/init_64.c	2005-11-11 11:16:51.000000000 +1100
+++ working-2.6/arch/powerpc/mm/init_64.c	2005-11-16 14:48:39.000000000 +1100
@@ -64,7 +64,8 @@
 #include
 #include
 #include
-#include
+
+#include "mmu_decl.h"
 #ifdef DEBUG
 #define DBG(fmt...) printk(fmt)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

From benh at kernel.crashing.org Wed Nov 16 16:00:35 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 16 Nov 2005 16:00:35 +1100
Subject: [PATCH] powerpc: Merge align.c
In-Reply-To: <02de724e66fe23fd23a3635c8b6f049f@embeddededge.com>
References: <1132032910.23979.6.camel@gaston>
	<00eecfdbd5bccc7b293d847033121eee@freescale.com>
	<1132108490.5646.67.camel@gaston>
	<02de724e66fe23fd23a3635c8b6f049f@embeddededge.com>
Message-ID: <1132117236.5646.75.camel@gaston>

On Tue, 2005-11-15 at 23:26 -0500, Dan Malek wrote:
> On Nov 15, 2005, at 9:34 PM, Benjamin Herrenschmidt wrote:
>
> > What about lwz/stw crossing page boundaries ? Is this handled in HW ?
>
> Yep. All of these hardware alignment support features on
> the Freescale processors are the reasons they are used
> so extensively in data communication processing (where
> unaligned data can sometimes occur).
All of the load/store
> alignment issues are handled in the cache subsystem, so
> to the external world all you really see are cache line
> operations. In the event of uncached data operations, you
> get the performance penalty of two bus accesses, where
> some of the data is discarded.

Oh well, I suppose I'll have to dig out paulus' 601 based mac :)

Becky, can you send me a copy of your testcase ?

Ben.

From hugh at veritas.com Wed Nov 16 16:05:05 2005
From: hugh at veritas.com (Hugh Dickins)
Date: Wed, 16 Nov 2005 05:05:05 +0000 (GMT)
Subject: ppc64 oops..
In-Reply-To: <17274.34388.555865.416007@cargo.ozlabs.ibm.com>
References: <17273.13728.450935.223836@cargo.ozlabs.ibm.com>
	<17273.26286.464586.872800@cargo.ozlabs.ibm.com>
	<20051114214158.580883b3.akpm@osdl.org>
	<43799B8E.3050600@yahoo.com.au>
	<17274.34388.555865.416007@cargo.ozlabs.ibm.com>
Message-ID:

On Wed, 16 Nov 2005, Paul Mackerras wrote:
>
> Not that I'm any sort of a VM expert, but it seems to me that we need
> some sort of way to mark things like the hashed page table on PowerPC
> as being "special" memory that is there, but that the VM system should
> just completely ignore.

At any time, most memory is in use for one purpose or another, and should
not be interfered with by anything other than the subsystem that owns it
at that point. Please explain what's so "special" about your hashed page
table.

Do you mean that that memory is set aside at initialization time, and
it's a big disaster if it were mistakenly freed, or something else? Or
it's not RAM? Excuse my ignorance.

Hugh

From dan at embeddededge.com Wed Nov 16 15:26:45 2005
From: dan at embeddededge.com (Dan Malek)
Date: Tue, 15 Nov 2005 23:26:45 -0500
Subject: [PATCH] powerpc: Merge align.c
In-Reply-To: <1132108490.5646.67.camel@gaston>
References: <1132032910.23979.6.camel@gaston>
	<00eecfdbd5bccc7b293d847033121eee@freescale.com>
	<1132108490.5646.67.camel@gaston>
Message-ID: <02de724e66fe23fd23a3635c8b6f049f@embeddededge.com>

On Nov 15, 2005, at 9:34 PM, Benjamin Herrenschmidt wrote:

> What about lwz/stw crossing page boundaries ? Is this handled in HW ?

Yep. All of these hardware alignment support features on
the Freescale processors are the reasons they are used
so extensively in data communication processing (where
unaligned data can sometimes occur). All of the load/store
alignment issues are handled in the cache subsystem, so
to the external world all you really see are cache line
operations. In the event of uncached data operations, you
get the performance penalty of two bus accesses, where
some of the data is discarded.

	-- Dan

From paulus at samba.org Wed Nov 16 16:15:53 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 16 Nov 2005 16:15:53 +1100
Subject: [PATCH 1/5] spufs: The SPU file system, base
In-Reply-To: <20051115210408.327453000@localhost>
References: <20051115205347.395355000@localhost>
	<20051115210408.327453000@localhost>
Message-ID: <17274.49289.583486.477211@cargo.ozlabs.ibm.com>

Arnd Bergmann writes:

> --- linux-2.6.15-rc.orig/arch/ppc/kernel/ppc_ksyms.c
> +++ linux-2.6.15-rc/arch/ppc/kernel/ppc_ksyms.c
> @@ -311,7 +311,6 @@ EXPORT_SYMBOL(__res);
>
>  EXPORT_SYMBOL(next_mmu_context);
>  EXPORT_SYMBOL(set_context);
> -EXPORT_SYMBOL_GPL(__handle_mm_fault); /* For MOL */

Why? What have you got against MOL? :)

Paul.

From akpm at osdl.org Wed Nov 16 16:26:38 2005
From: akpm at osdl.org (Andrew Morton)
Date: Tue, 15 Nov 2005 21:26:38 -0800
Subject: [PATCH 1/5] spufs: The SPU file system, base
In-Reply-To: <17274.49289.583486.477211@cargo.ozlabs.ibm.com>
References: <20051115205347.395355000@localhost>
	<20051115210408.327453000@localhost>
	<17274.49289.583486.477211@cargo.ozlabs.ibm.com>
Message-ID: <20051115212638.5dca4a66.akpm@osdl.org>

Paul Mackerras wrote:
>
> Arnd Bergmann writes:
>
> > --- linux-2.6.15-rc.orig/arch/ppc/kernel/ppc_ksyms.c
> > +++ linux-2.6.15-rc/arch/ppc/kernel/ppc_ksyms.c
> > @@ -311,7 +311,6 @@ EXPORT_SYMBOL(__res);
> >
> >  EXPORT_SYMBOL(next_mmu_context);
> >  EXPORT_SYMBOL(set_context);
> > -EXPORT_SYMBOL_GPL(__handle_mm_fault); /* For MOL */
>
> Why? What have you got against MOL? :)
>

The export was moved to mm/memory.c. No explanation why though...

From dan at embeddededge.com Wed Nov 16 16:35:30 2005
From: dan at embeddededge.com (Dan Malek)
Date: Wed, 16 Nov 2005 00:35:30 -0500
Subject: [PATCH] powerpc: Merge align.c
In-Reply-To: <1132117236.5646.75.camel@gaston>
References: <1132032910.23979.6.camel@gaston>
	<00eecfdbd5bccc7b293d847033121eee@freescale.com>
	<1132108490.5646.67.camel@gaston>
	<02de724e66fe23fd23a3635c8b6f049f@embeddededge.com>
	<1132117236.5646.75.camel@gaston>
Message-ID: <324befe15b30886e86b0110c30334c8f@embeddededge.com>

On Nov 16, 2005, at 12:00 AM, Benjamin Herrenschmidt wrote:

> Oh well, I suppose I'll have to dig out paulus' 601 based mac :)

If we don't have any contemporary processors that need
this solution, can we just put it aside until someone
has hardware that requires it?

Thanks.

	-- Dan

From paulus at samba.org Wed Nov 16 16:48:35 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 16 Nov 2005 16:48:35 +1100
Subject: powerpc.git tree now on kernel.org
Message-ID: <17274.51251.591579.8751@cargo.ozlabs.ibm.com>

I have just created a git tree for ppc/powerpc patches that are
candidates for 2.6.16.
The tree is at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git

Currently, besides the patches that are in the powerpc-merge.git tree,
there are the following patches in there:

Adrian Bunk:
      PPC_PREP: remove unneeded exports

Benjamin Herrenschmidt:
      powerpc: Merge align.c (#2)

David Gibson:
      powerpc: Remove imalloc.h

David Woodhouse:
      syscall entry/exit revamp

Kumar Gala:
      powerpc: moved ipic code to arch/powerpc

Michael Ellerman:
      powerpc: Merge kexec

Mike Kravetz:
      Remove SPAN_OTHER_NODES config definition

Although some of these patches may go to Linus before 2.6.15 is
released, I won't be asking Linus to pull this tree directly, since it
is likely to get a bit messy as patches are updated. I expect that
Andrew Morton will pull this tree periodically and include it in his
-mm releases.

Paul.

From benh at kernel.crashing.org Wed Nov 16 17:13:45 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 16 Nov 2005 17:13:45 +1100
Subject: [PATCH] powerpc: Merge align.c
In-Reply-To: <324befe15b30886e86b0110c30334c8f@embeddededge.com>
References: <1132032910.23979.6.camel@gaston>
	<00eecfdbd5bccc7b293d847033121eee@freescale.com>
	<1132108490.5646.67.camel@gaston>
	<02de724e66fe23fd23a3635c8b6f049f@embeddededge.com>
	<1132117236.5646.75.camel@gaston>
	<324befe15b30886e86b0110c30334c8f@embeddededge.com>
Message-ID: <1132121625.5646.77.camel@gaston>

On Wed, 2005-11-16 at 00:35 -0500, Dan Malek wrote:
> On Nov 16, 2005, at 12:00 AM, Benjamin Herrenschmidt wrote:
>
> > Oh well, I suppose I'll have to dig out paulus' 601 based mac :)
>
> If we don't have any contemporary processors that need
> this solution, can we just put it aside until someone
> has hardware that requires it?

I do not want to break existing functionality with the merged file.
For now, though, I suppose the merged file will only apply to
ARCH=powerpc, so I can keep the old align.c in arch/ppc/kernel until it
has been properly tested on old machines.

Ben.

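A userspace check of the cross-page case discussed in this thread (a
misaligned lwz that straddles a page boundary) could look roughly like
the sketch below. It is not Becky's testcase, which is not reproduced in
this archive. The names load_word() and test_cross_page_load() are made
up for illustration, and the lwz inline asm is only used when building
for PowerPC; other targets fall back to memcpy() so the sketch still
compiles.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: read one 32-bit word from a possibly misaligned address. */
static uint32_t load_word(const void *p)
{
	uint32_t v;
#if defined(__powerpc__)
	/*
	 * Force a single lwz so that the hardware (or the kernel's
	 * alignment handler) sees exactly one misaligned access.
	 * The "b" constraint keeps the base address out of r0.
	 */
	__asm__ volatile("lwz %0,0(%1)" : "=r" (v) : "b" (p) : "memory");
#else
	/* Assumption: on other targets a plain byte copy is good enough. */
	memcpy(&v, p, sizeof(v));
#endif
	return v;
}

/*
 * Hypothetical test: returns 0 if a word straddling a page boundary
 * reads back correctly, 1 on a mismatch, -1 if the setup failed.
 */
int test_cross_page_load(void)
{
	static const unsigned char pattern[4] = { 0x11, 0x22, 0x33, 0x44 };
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *buf, *p;
	uint32_t got, want;

	if (page <= 0)
		return -1;

	/* Two adjacent pages; the word under test spans both of them. */
	buf = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Two bytes before the boundary: misaligned and crossing the page. */
	p = buf + page - 2;
	assert(((uintptr_t)p & 3) != 0);

	memcpy(p, pattern, sizeof(pattern));	/* touches both pages */
	memcpy(&want, pattern, sizeof(want));	/* same bytes, native order */
	got = load_word(p);

	munmap(buf, 2 * (size_t)page);
	return got == want ? 0 : 1;
}
```

A return of 0 only says the load produced the right value; it does not
say whether the hardware handled the access itself or the kernel's
alignment handler emulated it. Telling those apart would need some other
signal, which this sketch does not attempt.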
From benh at kernel.crashing.org Wed Nov 16 17:53:09 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 16 Nov 2005 17:53:09 +1100
Subject: please pull powerpc-merge.git
In-Reply-To: <17274.46718.214718.675981@cargo.ozlabs.ibm.com>
References: <17274.46718.214718.675981@cargo.ozlabs.ibm.com>
Message-ID: <1132123990.5646.83.camel@gaston>

On Wed, 2005-11-16 at 15:33 +1100, Paul Mackerras wrote:
> Linus,
>
> Please do another pull from
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge.git
>
> I have moved arch/ppc64/boot over to arch/powerpc/boot, so if you use
> a zImage, you will have to use arch/powerpc/boot/zImage.vmode rather
> than arch/ppc64/boot/zImage.vmode (for a G5). I assume you probably
> boot a vmlinux using yaboot, though.

Linus (and others on the list), in case that interests you, this
zImage.vmode thing can be booted directly from Open Firmware. More
specifically, it can be _netbooted_, which is very handy for testing
kernels, especially when you have 2 G5s :)

To netboot from OF, the simplest way is to type:

	boot enet:server_ip,filename

(server_ip is the numeric address of the tftp server, filename the file
to request on that server). OF will automatically acquire an address for
itself via DHCP.

You can make it boot that way by default by doing

	setenv boot-device enet:server_ip,filename

(You can usually back off by using the "option" key at boot to pick
your boot partition, and then from linux, re-do a "ybin" or "mkofboot"
to restore the yaboot "boot-device").

OF can also automatically pick the server and filename via DHCP.
However, Apple hacked that badly: it only works if your DHCP server has
been modified to send some Apple-specific extensions (they claim they
did that to avoid confusing users, go figure), so don't bother with
that.

> Other than that, there is a collection of minor build and bug fixes.
> I haven't included the NUMA/sparsemem fix yet (still waiting for
> comments on the patch).

It got in anyway :)

Ben.

From hch at lst.de Wed Nov 16 19:24:56 2005
From: hch at lst.de (Christoph Hellwig)
Date: Wed, 16 Nov 2005 09:24:56 +0100
Subject: powerpc.git tree now on kernel.org
In-Reply-To: <17274.51251.591579.8751@cargo.ozlabs.ibm.com>
References: <17274.51251.591579.8751@cargo.ozlabs.ibm.com>
Message-ID: <20051116082456.GA24802@lst.de>

On Wed, Nov 16, 2005 at 04:48:35PM +1100, Paul Mackerras wrote:
> Although some of these patches may go to Linus before 2.6.15 is
> released, I won't be asking Linus to pull this tree directly, since it
> is likely to get a bit messy as patches are updated. I expect that
> Andrew Morton will pull this tree periodically and include it in his
> -mm releases.

Please make sure at least all patches required to kill arch/ppc64 and
include/asm-ppc64 go to Linus. If 2.6.15 is released with the include3
hack, that would mean endless pain for people building external modules
and the distributions trying to support that.

From paulus at samba.org Wed Nov 16 19:31:09 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 16 Nov 2005 19:31:09 +1100
Subject: powerpc.git tree now on kernel.org
In-Reply-To: <20051116082456.GA24802@lst.de>
References: <17274.51251.591579.8751@cargo.ozlabs.ibm.com>
	<20051116082456.GA24802@lst.de>
Message-ID: <17274.61005.845301.27856@cargo.ozlabs.ibm.com>

Christoph Hellwig writes:

> Please make sure at least all patches required to kill arch/ppc64 and
> include/asm-ppc64 go to Linus. If 2.6.15 released with the include3
> hack that would mean endless pain to people building external modules
> and the distributions trying to support that.

Yes, that is my intention. For 2.6.15, at this stage, some of the
header files might end up looking like:

#ifdef CONFIG_PPC32
#include
#else
/* contents of asm-ppc64/foo.h */
#endif

if it looks like doing a proper merge will require too many changes.
But at least it gets the files out of include/asm-ppc64.

Paul.

From dwmw2 at infradead.org Thu Nov 17 00:38:24 2005
From: dwmw2 at infradead.org (David Woodhouse)
Date: Wed, 16 Nov 2005 13:38:24 +0000
Subject: [PATCH] Avoid use of uninitialised spinlock in EEH.
In-Reply-To: <200509221446.56228.arnd@arndb.de>
References: <1127322900.28995.149.camel@hades.cambridge.redhat.com>
	<200509221446.56228.arnd@arndb.de>
Message-ID: <1132148305.21643.58.camel@hades.cambridge.redhat.com>

On Thu, 2005-09-22 at 14:46 +0200, Arnd Bergmann wrote:
> How about making that a firmware feature bit? That would avoid the
> extra global exported symbol and be slightly more efficient for
> the pSeries-only configuration.

But ppc64_firmware_features isn't exported yet either. Also, it wouldn't
be a lot more efficient, since we set FW_FEATURE_PSERIES_ALWAYS to zero,
so it would never have the effect of optimising out the check anyway.

If we were to attempt to fix that, we'd have to handle other platforms;
we'd have to take into account FW_FEATURE_{PMAC,MAPLE,CELL}_ALWAYS when
setting FW_FEATURE_ALWAYS. And even if we did that, we'd still probably
want to set the FW_FEATURE_EEH bit _only_ if EEH-capable devices were
found, rather than having it in FW_FEATURE_PSERIES_ALWAYS. So we still
wouldn't optimise out the check.

Paulus, the reason your G5 doesn't die with CONFIG_EEH set is because
you haven't got spinlock debugging. Try turning that on too :)

-- 
dwmw2

From arnd at arndb.de Thu Nov 17 01:26:40 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Wed, 16 Nov 2005 15:26:40 +0100
Subject: [PATCH] spufs: Make all exports GPL-only
In-Reply-To: <20051115174145.70f37501.akpm@osdl.org>
References: <20051115205347.395355000@localhost>
	<20051115210408.327453000@localhost>
	<20051115174145.70f37501.akpm@osdl.org>
Message-ID: <200511161526.42655.arnd@arndb.de>

This changes all exported symbols of spufs to EXPORT_SYMBOL_GPL.
The spu_ibox_read/spu_wbox_write symbols are not exported any more when the scheduler patch is applied. Signed-off-by: Arnd Bergmann --- On Wednesday 16 November 2005 02:41, Andrew Morton wrote: > +EXPORT_SYMBOL_GPL(hash_page); > +EXPORT_SYMBOL(spu_alloc); > +EXPORT_SYMBOL(spu_free); > +EXPORT_SYMBOL(spu_run); > +EXPORT_SYMBOL(spu_ibox_read); > +EXPORT_SYMBOL(spu_wbox_write); > +EXPORT_SYMBOL_GPL(register_spu_syscalls); > +EXPORT_SYMBOL_GPL(unregister_spu_syscalls); > -EXPORT_SYMBOL_GPL(__handle_mm_fault); /* For MOL */ > +EXPORT_SYMBOL_GPL(__handle_mm_fault); > > A strange mixture of GPL and non-GPL. What's the thinking here? Lack of thinking ;-) At first, I had everything as EXPORT_SYMBOL. Everything that was added in the last few months was EXPORT_SYMBOL_GPL. Index: linux-2.6.15-rc/arch/powerpc/platforms/cell/spu_base.c =================================================================== --- linux-2.6.15-rc.orig/arch/powerpc/platforms/cell/spu_base.c +++ linux-2.6.15-rc/arch/powerpc/platforms/cell/spu_base.c @@ -399,7 +399,7 @@ struct spu *spu_alloc(void) return spu; } -EXPORT_SYMBOL(spu_alloc); +EXPORT_SYMBOL_GPL(spu_alloc); void spu_free(struct spu *spu) { @@ -407,7 +407,7 @@ void spu_free(struct spu *spu) list_add_tail(&spu->list, &spu_list); up(&spu_mutex); } -EXPORT_SYMBOL(spu_free); +EXPORT_SYMBOL_GPL(spu_free); static int spu_handle_mm_fault(struct spu *spu) { @@ -576,7 +576,7 @@ int spu_run(struct spu *spu) return ret; } -EXPORT_SYMBOL(spu_run); +EXPORT_SYMBOL_GPL(spu_run); static void __iomem * __init map_spe_prop(struct device_node *n, const char *name) From arndb at de.ibm.com Thu Nov 17 01:38:16 2005 From: arndb at de.ibm.com (Arnd Bergmann) Date: Wed, 16 Nov 2005 15:38:16 +0100 Subject: [PATCH 1/5] spufs: The SPU file system, base In-Reply-To: <20051115212638.5dca4a66.akpm@osdl.org> References: <20051115205347.395355000@localhost> <17274.49289.583486.477211@cargo.ozlabs.ibm.com> <20051115212638.5dca4a66.akpm@osdl.org> Message-ID:
<200511161538.17507.arndb@de.ibm.com> On Wednesday 16 November 2005 06:26, Andrew Morton wrote: > > > > Why? What have you got against MOL? :) > > > > The export was moved to mm/memory.c. No explanation why though... > Sorry about the lack of explanation. There was a short discussion about this in August, see http://lkml.org/lkml/2005/8/8/205 : On Mon, 8 Aug 2005 11:42:03 -0700 (PDT), Linus Torvalds wrote: > I don't see any reason not to make it global if there are two > architectures that need it. Especially as long as it's marked GPL-only so > that people don't start misusing it. The __handle_mm_fault symbol is used by spu_base.ko because the DMA page fault handler calls handle_mm_fault. Of course at the point where ppc_ksyms.c gets merged into arch/powerpc, there would again only be one architecture needing it... Arnd <>< From galak at kernel.crashing.org Thu Nov 17 02:15:09 2005 From: galak at kernel.crashing.org (Kumar Gala) Date: Wed, 16 Nov 2005 09:15:09 -0600 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <20051116093609.GA26269@iram.es> References: <1132032910.23979.6.camel@gaston> <00eecfdbd5bccc7b293d847033121eee@freescale.com> <20051116093609.GA26269@iram.es> Message-ID: <43D0A21D-89BC-4EFE-BA2A-94760BA32276@kernel.crashing.org> On Nov 16, 2005, at 3:36 AM, Gabriel Paubert wrote: > On Tue, Nov 15, 2005 at 08:19:58PM -0600, Becky Bruce wrote: >> Ben, >> >> Yeah, I clearly shouldn't run testcases at 11pm, because I got in a >> rush and only confirmed that lmw/stmw were actually taking the >> exception. Those 2 are working beautifully. To test the others, I >> need to run on a different board which, of course, isn't bootable at >> the moment. As soon as I can get that up and running, I'll try >> some of >> the other cases and let you know how it goes......
>> >> BTW, Based on the pile of docs I have here, I think the list of >> alignment-exception-causing events on FSL's current parts (603, 603e, >> 750, 74x, 74xx, e500) is: > > The 603 is still in production? And is the upcoming 8641 exactly > the same as the 74xx series in this respect? 603 is used in all 82xx/83xx processors from Freescale. The 8641 is the same core as 7448. >> - single and double precision floating point ld/st ops (non-E500, non >> data size aligned) > > Hmm, you can load a double from any 4 byte aligned address AFAIR. This is only because every processor handles the misalignment for you. Its completely valid for someone to build a PPC that has an alignment exception in this case. >> - dcbz to WT or CI memory (all procs) >> - dcbz with cache disabled (all procs but 603e?) >> - misaligned little endian accesses (603e) > > I understand that you mention it for completeness since we > don't care about LE mode AFAICT. But I believe that there > were some differences between 603 and 603e in this area. > > However we do care about byte reversal instructions, which > probably believe like the corresponding normal instruction > (i.e., lwbrx has the same rules as lwzx, etc.) > >> - lwarx/stwcx (all procs) > > And ldarx/stdcx. on 64 bit, but these ones should not > be emulated. So it's easy ;-) > >> - multiple/string with LE set (750, 603e, 7450, 7400) > > Again LE mode is probably irrelevant. Agree with that. We dont support LE on classic. >> - eciwx/ecowx (750, 7450, 7400) > > Have these instructions ever been used for something > under Linux? I dont believe so. >> - a couple of others related to vector processing > > Which ones? The Altivec load and store instructions > simply mask the low order bits AFAIR. SPE misalignment is something to look at. >> If anybody knows offhand of something missing there, let me know. > > Nothing, but did you check when crossing a segment (256MB) boundary. 
> I seem to remember that some processors performed misaligned > load/store across pages but not across segments. - kumar From paubert at iram.es Wed Nov 16 20:36:09 2005 From: paubert at iram.es (Gabriel Paubert) Date: Wed, 16 Nov 2005 10:36:09 +0100 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <00eecfdbd5bccc7b293d847033121eee@freescale.com> References: <1132032910.23979.6.camel@gaston> <00eecfdbd5bccc7b293d847033121eee@freescale.com> Message-ID: <20051116093609.GA26269@iram.es> On Tue, Nov 15, 2005 at 08:19:58PM -0600, Becky Bruce wrote: > Ben, > > Yeah, I clearly shouldn't run testcases at 11pm, because I got in a > rush and only confirmed that lmw/stmw were actually taking the > exception. Those 2 are working beautifully. To test the others, I > need to run on a different board which, of course, isn't bootable at > the moment. As soon as I can get that up and running, I'll try some of > the other cases and let you know how it goes...... > > BTW, Based on the pile of docs I have here, I think the list of > alignment-exception-causing events on FSL's current parts (603, 603e, > 750, 74x, 74xx, e500) is: The 603 is still in production? And is the upcoming 8641 exactly the same as the 74xx series in this respect? > > - lmw/stmw (all procs, non-word aligned) Do we really want to emulate these instructions? Their purpose is to minimize code size in functions prologue and epilogue. If you hit an alignment execption with lwm/stmw, your stack is probably misaligned for some stupid reason or bug (back chain pointer corrrupted because of some buffer overflow comes to mind, and you want to know ASAP). > - single and double precision floating point ld/st ops (non-E500, non > data size aligned) Hmm, you can load a double from any 4 byte aligned address AFAIR. > - dcbz to WT or CI memory (all procs) > - dcbz with cache disabled (all procs but 603e?) 
> - misaligned little endian accesses (603e) I understand that you mention it for completeness since we don't care about LE mode AFAICT. But I believe that there were some differences between 603 and 603e in this area. However we do care about byte reversal instructions, which probably believe like the corresponding normal instruction (i.e., lwbrx has the same rules as lwzx, etc.) > - lwarx/stwcx (all procs) And ldarx/stdcx. on 64 bit, but these ones should not be emulated. So it's easy ;-) > - multiple/string with LE set (750, 603e, 7450, 7400) Again LE mode is probably irrelevant. > - eciwx/ecowx (750, 7450, 7400) Have these instructions ever been used for something under Linux? > - a couple of others related to vector processing Which ones? The Altivec load and store instructions simply mask the low order bits AFAIR. > If anybody knows offhand of something missing there, let me know. Nothing, but did you check when crossing a segment (256MB) boundary. I seem to remember that some processors performed misaligned load/store across pages but not across segments. Regards, Gabriel From becky.bruce at freescale.com Thu Nov 17 03:31:59 2005 From: becky.bruce at freescale.com (Becky Bruce) Date: Wed, 16 Nov 2005 10:31:59 -0600 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <43D0A21D-89BC-4EFE-BA2A-94760BA32276@kernel.crashing.org> References: <43D0A21D-89BC-4EFE-BA2A-94760BA32276@kernel.crashing.org> Message-ID: <28076a8ba1e55469c74b0677a289fd0b@freescale.com> > > > > The 603 is still in production? And is the upcoming 8641 exactly > > the same as the 74xx series in this respect? > > 603 is used in all 82xx/83xx processors from Freescale. The 8641 is? > the same core as 7448. 
The differences between 603 and 603e wrt alignment exceptions, as far as I can tell, are: - 603 does not take exception on misaligned LE accesses except for strings and multiples - 603 takes an alignment exception on ecowx/eciwx, 603e does not - 603 generates an alignment exception when a ld/st crosses a segment boundary and the T bit is different in the 2 segments I should have listed these out above, sorry! > > >> - single and double precision floating point ld/st ops (non-E500, > non > >> data size aligned) > > > > Hmm, you can load a double from any 4 byte aligned address AFAIR. > > This is only because every processor handles the misalignment for > you. Its completely valid for someone to build a PPC that has an > alignment exception in this case. You're right, I should have said "word-aligned", not "data size aligned". While a load of a doubleword from a word aligned address is considered misaligned by the hardware, it doesn't generate an exception in any parts we have now that I know of. > > However we do care about byte reversal instructions, which > > probably believe like the corresponding normal instruction > > (i.e., lwbrx has the same rules as lwzx, etc.) Yep, they would work the same way, which for all of FSL's current parts would mean no exception. > > > >> - lwarx/stwcx (all procs) > > > > And ldarx/stdcx. on 64 bit, but these ones should not > > be emulated. So it's easy ;-) > > > >> - multiple/string with LE set (750, 603e, 7450, 7400) > > > > Again LE mode is probably irrelevant. > > Agree with that. We dont support LE on classic. Yep. Just listed for completeness. > > > >> - eciwx/ecowx (750, 7450, 7400) > > > > Have these instructions ever been used for something > > under Linux? > > I dont believe so. These guys are legacy - I don't think anyone uses them, and the alignment exception doesn't (and, IMHO, shouldn't) care about them at all. They're just listed here for completeness.
> > >> - a couple of others related to vector processing > > > > Which ones? The Altivec load and store instructions > > simply mask the low order bits AFAIR. > > SPE misalignment is something to look at. I'll look into it when I have a moment to breathe...... There are 2 conditions here that aren't currently handled (from the manual): - SPFP and SPE instructions are not aligned on a natural boundary (defined by the size of the data element being accessed) - physical address of certain evld/st instructions is not aligned on a 64-bit boundary. > > >> If anybody knows offhand of something missing there, let me know. > > > > Nothing, but did you check when crossing a segment (256MB) boundary. > > I seem to remember that some processors performed misaligned > > load/store across pages but not across segments. As far as I can tell, the only one that cares about segment boundaries is 603 (604, 604e, and 601 may care, but I don't consider those "current", and I don't have any working hardware). And it only takes an exception if there's a difference in the T-bit across the segments. Cheers! -B From zarniwhoop at ntlworld.com Thu Nov 17 04:11:37 2005 From: zarniwhoop at ntlworld.com (Ken Moffat) Date: Wed, 16 Nov 2005 17:11:37 +0000 (GMT) Subject: G5 (SMU) loss of keyboard/mouse In-Reply-To: <1132096801.5646.59.camel@gaston> References: <1132096801.5646.59.camel@gaston> Message-ID: On Wed, 16 Nov 2005, Benjamin Herrenschmidt wrote: > > The problems you are reporting look strangely similar to what was > reported by a iMac G5 rev 2 (PowerMac8,2) user. Strangely, the > PowerMac8,1 that I have here (iMac G5 rev 1) seem to be rock solid. > > Is the kernel trying to load the alsa driver ? Does it help not loading > it at all ? > Indeed, it does. 
Last night I had the box locked (monolithic kernel cross-compiled and put onto ubuntu 'server' install, static ip brought up with ifconfig), ok until I keyed <:> in vim at which point it slowly flooded the screen with ':' characters and was no longer responding to pings. That was repeatable. Today, recompiled the kernel without sound and it was up for over an hour on the same ubuntu system, no problems. Now, I've applied this kernel to fedora and it looks good (up for 30 minutes so far, dhcp worked, replying via ssh to my server). So, very usable without sound. I'll make yet another install of something I'm more comfortable with (cross-LFS, with latest udev) in a day or two, then I can play with the post-2.6.14 changes (fans, cpufreq). Thanks again. Ken -- das eine Mal als Tragödie, das andere Mal als Farce From avolkov at varma-el.com Thu Nov 17 03:54:31 2005 From: avolkov at varma-el.com (Andrey Volkov) Date: Wed, 16 Nov 2005 19:54:31 +0300 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <4ad202b87fa52d954e645b05fb45ca13@freescale.com> References: <1132108490.5646.67.camel@gaston> <4ad202b87fa52d954e645b05fb45ca13@freescale.com> Message-ID: <437B6447.8010203@varma-el.com> Becky Bruce wrote: > On Nov 15, 2005, at 8:34 PM, Benjamin Herrenschmidt wrote: > >> > >> > BTW, Based on the pile of docs I have here, I think the list of >> > alignment-exception-causing events on FSL's current parts (603, 603e, >> > 750, 74x, 74xx, e500) is: >> > >> > - lmw/stmw (all procs, non-word aligned) >> > - single and double precision floating point ld/st ops (non-E500, non >> > data size aligned) >> > - dcbz to WT or CI memory (all procs) >> > - dcbz with cache disabled (all procs but 603e?)
>> > - misaligned little endian accesses (603e) >> > - lwarx/stwcx (all procs) >> > - multiple/string with LE set (750, 603e, 7450, 7400) >> > - eciwx/ecowx (750, 7450, 7400) >> > - a couple of others related to vector processing >> > >> > If anybody knows offhand of something missing there, let me know. >> >> What about lwz/stw cropssing page boundaries ? Is this handled in HW ? >> >> Ben. > > > Apparently so, much to my surprise - I ran the testcase with those > instructions misaligned across a page boundary last night and got no > alignment exception. I was surprised, and asked my husband about it (he > worked on the load/store units for a bunch of our parts), and he says > these guys never cause an exception for any of FSL's current parts as > far as he knows. This is supported by our documentation as well - the > only place I see these listed is on 603e, where they can cause an > exception if the page is mapped little endian. > Try this for 603e (BE): memcpy(xxxx3, xxxx0, 8); I get invalid behavior (0 in second dword) on MPC5200 for external flash access. -- Regards Andrey Volkov From dan at embeddededge.com Thu Nov 17 06:20:43 2005 From: dan at embeddededge.com (Dan Malek) Date: Wed, 16 Nov 2005 14:20:43 -0500 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <43D0A21D-89BC-4EFE-BA2A-94760BA32276@kernel.crashing.org> References: <1132032910.23979.6.camel@gaston> <00eecfdbd5bccc7b293d847033121eee@freescale.com> <20051116093609.GA26269@iram.es> <43D0A21D-89BC-4EFE-BA2A-94760BA32276@kernel.crashing.org> Message-ID: <755b1bfb034aebb5de36dc0594e08ec6@embeddededge.com> On Nov 16, 2005, at 10:15 AM, Kumar Gala wrote: > 603 is used in all 82xx/83xx processors from Freescale. The 8641 is > the same core as 7448. The 82xx uses G2_LE, and 83xx is e300, which are similar to the old 603 but do have some subtle improvements that make them better cores. 
-- Dan From dan at embeddededge.com Thu Nov 17 06:24:00 2005 From: dan at embeddededge.com (Dan Malek) Date: Wed, 16 Nov 2005 14:24:00 -0500 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <28076a8ba1e55469c74b0677a289fd0b@freescale.com> References: <43D0A21D-89BC-4EFE-BA2A-94760BA32276@kernel.crashing.org> <28076a8ba1e55469c74b0677a289fd0b@freescale.com> Message-ID: <952fa47f1def7ef38b756a586cf783ea@embeddededge.com> On Nov 16, 2005, at 11:31 AM, Becky Bruce wrote: > As far as I can tell, the only one that cares about segment boundaries > is 603 Why would 603 care about segment boundaries? I couldn't find any documentation old enough that indicated such a thing :-) Thanks. -- Dan From paubert at iram.es Thu Nov 17 06:45:53 2005 From: paubert at iram.es (Gabriel Paubert) Date: Wed, 16 Nov 2005 20:45:53 +0100 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <755b1bfb034aebb5de36dc0594e08ec6@embeddededge.com> References: <1132032910.23979.6.camel@gaston> <00eecfdbd5bccc7b293d847033121eee@freescale.com> <20051116093609.GA26269@iram.es> <43D0A21D-89BC-4EFE-BA2A-94760BA32276@kernel.crashing.org> <755b1bfb034aebb5de36dc0594e08ec6@embeddededge.com> Message-ID: <20051116194553.GA23679@iram.es> On Wed, Nov 16, 2005 at 02:20:43PM -0500, Dan Malek wrote: > > On Nov 16, 2005, at 10:15 AM, Kumar Gala wrote: > > >603 is used in all 82xx/83xx processors from Freescale. The 8641 is > >the same core as 7448. > > The 82xx uses G2_LE, and 83xx is e300, which are > similar to the old 603 but do have some subtle > improvements that make them better cores. I originally asked because I believed that these cores are actually closer to the 603e than to the original 603. But take this with a pinch of salt, I might be wrong. 
Gabriel From dan at embeddededge.com Thu Nov 17 07:36:40 2005 From: dan at embeddededge.com (Dan Malek) Date: Wed, 16 Nov 2005 15:36:40 -0500 Subject: [PATCH] powerpc: Merge align.c In-Reply-To: <20051116194553.GA23679@iram.es> References: <1132032910.23979.6.camel@gaston> <00eecfdbd5bccc7b293d847033121eee@freescale.com> <20051116093609.GA26269@iram.es> <43D0A21D-89BC-4EFE-BA2A-94760BA32276@kernel.crashing.org> <755b1bfb034aebb5de36dc0594e08ec6@embeddededge.com> <20051116194553.GA23679@iram.es> Message-ID: <6ebdb3aa72b262f31fd89bfc747c9d81@embeddededge.com> On Nov 16, 2005, at 2:45 PM, Gabriel Paubert wrote: > I originally asked because I believed that these cores are > actually closer to the 603e than to the original 603. That's correct. In fact, I think the original 8260 and perhaps the 5200 were 603e cores. As I mentioned, the newer ones are subtly different, but better than the 603e ;-) Thanks. -- Dan From benh at kernel.crashing.org Thu Nov 17 07:58:08 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 17 Nov 2005 07:58:08 +1100 Subject: G5 (SMU) loss of keyboard/mouse In-Reply-To: References: <1132096801.5646.59.camel@gaston> Message-ID: <1132174689.5646.110.camel@gaston> > Indeed, it does. Last night I had the box locked (monolithic kernel > cross-compiled and put onto ubuntu 'server' install, static ip > brought up with ifconfig), ok until I keyed <:> in vim at which > point it slowly flooded the screen with ':' characters and was no longer > responding to pings. That was repeatable. > > Today, recompiled the kernel without sound and it was up for over an > hour on the same ubuntu system, no problems. Now, I've applied this > kernel to fedora and it looks good (up for 30 minutes so far, dhcp > worked, replying via ssh to my server). > > So, very usable without sound. 
I'll make yet another install of > something I'm more comfortable with (cross-LFS, with latest udev) in a > day or two, then I can play with the post-2.6.14 changes (fans, > cpufreq). Thanks again. This is strange as the sound driver isn't expected to initialize at all on this model since it doesn't recognize it. Or maybe alsa was trying to load a bogus module? Also, be careful with parport, remove it from your /lib/modules. Things like CPUS tend to cause it to load and it will render your kernel unstable. Ben. From linas at austin.ibm.com Thu Nov 17 08:41:16 2005 From: linas at austin.ibm.com (linas) Date: Wed, 16 Nov 2005 15:41:16 -0600 Subject: [PATCH] Avoid use of uninitialised spinlock in EEH. In-Reply-To: <1132148305.21643.58.camel@hades.cambridge.redhat.com> References: <1127322900.28995.149.camel@hades.cambridge.redhat.com> <200509221446.56228.arnd@arndb.de> <1132148305.21643.58.camel@hades.cambridge.redhat.com> Message-ID: <20051116214116.GV19593@austin.ibm.com> Hi, It seems that I missed the beginning of this conversation; let me jump in anyway. On Wed, Nov 16, 2005 at 01:38:24PM +0000, David Woodhouse was heard to remark: > If we were to attempt to fix that, we'd have to handle other platforms; > we'd have to take into account FW_FEATURE_{PMAC,MAPLE,CELL}_ALWAYS when > setting FW_FEATURE_ALWAYS. And even if we did that, we'd still probably > want to set the FW_FEATURE_EEH bit _only_ if EEH-capable devices were > found, There already is a global flag, computed at boot time, that indicates if there's EEH hardware on the system: it's called "eeh_subsystem_enabled". Starting with power5, if there is EEH hardware, then it *must* be turned on, else the hardware won't work. Thus the flag could be renamed to "eeh_hardware_is_present". > Paulus, the reason your G5 doesn't die with CONFIG_EEH set is because > you haven't got spinlock debugging. Try turning that on too :) ? What is the specific problem?
EEH uses very few spinlocks; I'm having trouble imagining what the problem is. --linas From paulus at samba.org Thu Nov 17 09:14:16 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 17 Nov 2005 09:14:16 +1100 Subject: [PATCH] powerpc: Merge align.c (#2) In-Reply-To: <1132025664.6094.47.camel@gaston> References: <1132025664.6094.47.camel@gaston> Message-ID: <17275.44856.347629.443417@cargo.ozlabs.ibm.com> Benjamin Herrenschmidt writes: > Since it's likely that I won't be able to test all scenario, code > inspection is much welcome. I think you need this patch on top... Paul. diff -urN powerpc/arch/powerpc/kernel/align.c merge-hack/arch/powerpc/kernel/align.c --- powerpc/arch/powerpc/kernel/align.c 2005-11-17 09:05:04.000000000 +1100 +++ merge-hack/arch/powerpc/kernel/align.c 2005-11-17 09:06:29.000000000 +1100 @@ -198,21 +198,20 @@ /* bits 6:15 --> 22:31 */ dsisr = (instr & 0x03ff0000) >> 16; - if ( IS_XFORM(instr) ) { + if (IS_XFORM(instr)) { /* bits 29:30 --> 15:16 */ dsisr |= (instr & 0x00000006) << 14; /* bit 25 --> 17 */ dsisr |= (instr & 0x00000040) << 8; /* bits 21:24 --> 18:21 */ dsisr |= (instr & 0x00000780) << 3; - } - else { + } else { /* bit 5 --> 17 */ dsisr |= (instr & 0x04000000) >> 12; /* bits 1: 4 --> 18:21 */ dsisr |= (instr & 0x78000000) >> 17; /* bits 30:31 --> 12:13 */ - if ( IS_DSFORM(instr) ) + if (IS_DSFORM(instr)) dsisr |= (instr & 0x00000003) << 18; } @@ -247,13 +246,22 @@ /* * Emulate load & store multiple instructions + * On 64-bit machines, these instructions only affect/use the + * bottom 4 bytes of each register, and the loads clear the + * top 4 bytes of the affected register. 
*/ +#ifdef CONFIG_PPC64 +#define REG_BYTE(rp, i) *((u8 *)((rp) + ((i) >> 2)) + ((i) & 3) + 4) +#else +#define REG_BYTE(rp, i) *((u8 *)(rp) + (i)) +#endif + static int emulate_multiple(struct pt_regs *regs, unsigned char __user *addr, unsigned int reg, unsigned int nb, unsigned int flags, unsigned int instr) { - unsigned char *rptr; - int nb0, i; + unsigned long *rptr; + unsigned int nb0, i; /* * We do not try to emulate 8 bytes multiple as they aren't really @@ -291,29 +299,38 @@ if (!access_ok((flags & ST ? VERIFY_WRITE: VERIFY_READ), addr, nb+nb0)) return -EFAULT; /* bad address */ - rptr = (unsigned char *) &regs->gpr[reg]; + rptr = &regs->gpr[reg]; if (flags & LD) { + /* + * This zeroes the top 4 bytes of the affected registers + * in 64-bit mode, and also zeroes out any remaining + * bytes of the last register for lsw*. + */ + memset(rptr, 0, ((nb + 3) / 4) * sizeof(unsigned long)); + if (nb0 > 0) + memset(&regs->gpr[0], 0, + ((nb0 + 3) / 4) * sizeof(unsigned long)); + for (i = 0; i < nb; ++i) - if (__get_user(rptr[i], addr + i)) + if (__get_user(REG_BYTE(rptr, i), addr + i)) return -EFAULT; if (nb0 > 0) { - rptr = (unsigned char *) &regs->gpr[0]; + rptr = &regs->gpr[0]; addr += nb; for (i = 0; i < nb0; ++i) - if (__get_user(rptr[i], addr + i)) + if (__get_user(REG_BYTE(rptr, i), addr + i)) return -EFAULT; } - for (; (i & 3) != 0; ++i) - rptr[i] = 0; + } else { for (i = 0; i < nb; ++i) - if (__put_user(rptr[i], addr + i)) + if (__put_user(REG_BYTE(rptr, i), addr + i)) return -EFAULT; if (nb0 > 0) { - rptr = (unsigned char *) &regs->gpr[0]; + rptr = &regs->gpr[0]; addr += nb; for (i = 0; i < nb0; ++i) - if (__put_user(rptr[i], addr + i)) + if (__put_user(REG_BYTE(rptr, i), addr + i)) return -EFAULT; } } @@ -338,7 +355,7 @@ unsigned char __user *p; int ret, t; union { - long ll; + u64 ll; double dd; unsigned char v[8]; struct { From zarniwhoop at ntlworld.com Thu Nov 17 09:15:17 2005 From: zarniwhoop at ntlworld.com (Ken Moffat) Date: Wed, 16 Nov 2005 22:15:17 +0000 (GMT) Subject:
G5 (SMU) loss of keyboard/mouse In-Reply-To: <1132174689.5646.110.camel@gaston> References: <1132096801.5646.59.camel@gaston> <1132174689.5646.110.camel@gaston> Message-ID: On Thu, 17 Nov 2005, Benjamin Herrenschmidt wrote: > > This is strange as the sound driver isn't expected to initialize at all > on this model since it doesn't recognize it. Or maybe alsa was trying to > load a bogus module ? > On the versions of 2.6.14.2 that I've built for this, there are no modules. The distro kernels certainly seem to have loaded modules for alsa. The differences between my 2.6.14.2 which dies, and the one which works fine are _only_ (a) all g5 sound options built in | no sound, and (b) default keyboard | add AT keyboard in the input device support (the box is on a KVM switch). Oh, and my configs are based on the g5_defconfig, but changed to all built in, and using the nvidia fb. Looking in the logs, the bad 2.6.14.2 has kernel: ALSA device list: #0: PowerMac AWACS Rev 0 lspci output attached, in case it isn't as expected. > Also, be careful with parport, remove it from your /lib/modules. Things > like CPUS tend to cause it to load and it will render your kernel > unstable. Checked for parport in my .config after your first reply, but it isn't there. Ken -- das eine Mal als Tragödie, das andere Mal als Farce -------------- next part -------------- A non-text attachment was scrubbed...
Name: lspci.bz2 Type: application/octet-stream Size: 1783 bytes Desc: Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051116/4139cf42/attachment.obj From benh at kernel.crashing.org Thu Nov 17 09:18:04 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 17 Nov 2005 09:18:04 +1100 Subject: G5 (SMU) loss of keyboard/mouse In-Reply-To: References: <1132096801.5646.59.camel@gaston> <1132174689.5646.110.camel@gaston> Message-ID: <1132179484.20016.0.camel@gaston> > kernel: ALSA device list: > #0: PowerMac AWACS Rev 0 Ok, there is indeed no AWACS in there, it may be wrecking things to have the driver tap it like that. I'll have a look. Thanks. Ben. From dwmw2 at infradead.org Thu Nov 17 09:22:26 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Wed, 16 Nov 2005 22:22:26 +0000 Subject: [PATCH] Avoid use of uninitialised spinlock in EEH. In-Reply-To: <20051116214116.GV19593@austin.ibm.com> References: <1127322900.28995.149.camel@hades.cambridge.redhat.com> <200509221446.56228.arnd@arndb.de> <1132148305.21643.58.camel@hades.cambridge.redhat.com> <20051116214116.GV19593@austin.ibm.com> Message-ID: <1132179746.28963.82.camel@baythorne.infradead.org> On Wed, 2005-11-16 at 15:41 -0600, linas wrote: > It seems that I missed the begining of this conversation; I've attached the mail which started it. > There already is a global flag, computed at boot time, that indicates if > there's EEH hardware on the system: its called "eeh_subsystem_enabled". Yes, that's what my original patch used, although it had to export it. > > Paulus, the reason your G5 doesn't die with CONFIG_EEH set is because > > you haven't got spinlock debugging. Try turning that on too :) > > ? What is the specific problem? EEH uses very few spinlocks; I'm having > trouble imagining what the problem is. The problem is that eeh_init() is never called, because we're not running on a pSeries. 
But eeh_check_failure() is still called, and tries to lock a spinlock which was never initialised. -- dwmw2 -------------- next part -------------- An embedded message was scrubbed... From: David Woodhouse Subject: [PATCH] Avoid use of uninitialised spinlock in EEH. Date: Wed, 21 Sep 2005 18:14:59 +0100 Size: 4403 Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051116/82946d2a/attachment.eml From geoffrey.levand at am.sony.com Thu Nov 17 09:36:38 2005 From: geoffrey.levand at am.sony.com (Geoff Levand) Date: Wed, 16 Nov 2005 14:36:38 -0800 Subject: please pull powerpc-merge.git In-Reply-To: <1132123990.5646.83.camel@gaston> References: <17274.46718.214718.675981@cargo.ozlabs.ibm.com> <1132123990.5646.83.camel@gaston> Message-ID: <437BB476.9000907@am.sony.com> Benjamin Herrenschmidt wrote: > Linus (and others on the list), in case that interests you, this > zImage.vmode thing can be booted directly from open firmware. More > specifically, it can be _netbooted_ which is very handy for testing > kernels, especially when you have 2 G5s :) > > To netboot from OF, the simplest way to do so is to type: > > boot enet:server_ip,filename > > (server_ip is the numeric address of the tftp server, filename the file > to request on that server). > > OF can also automatically pick the server and filename via DHCP, > however, Apple hacked badly there, it only works if your DHCP server has > been modified to send some apple-specific extensions. (They claim they > did that to avoid confusing users, go figure) so don't bother with that. FYI, using the stock dhcpd in Fedora Core 4 (on a i386 PC) OF can get the IP address of the TFTP server from DHCP, but can't pickup the TFTP image path, so that needs to be put into the OF command. Note that you'll need to use a back slash in path names. 
Here's what I use in boot-device: enet:-1,\g5\boot\zImage.vmode Here's a dhcpd.conf entry: host g5 { hardware ethernet xxxxxxxxxxxxx; fixed-address 192.168.1.15; server-name "192.168.1.10"; next-server 192.168.1.10; filename "\g5\boot\zImage.vmode"; option host-name "g5"; option root-path "192.168.1.10:/target/g5"; } -Geoff From kjhall at us.ibm.com Thu Nov 17 09:48:16 2005 From: kjhall at us.ibm.com (Kylene Jo Hall) Date: Wed, 16 Nov 2005 16:48:16 -0600 Subject: [PATCH 2 of 2] tpm: updates for new hardware In-Reply-To: <200511141710.41230.bjorn.helgaas@hp.com> References: <1131739595.5048.15.camel@localhost.localdomain> <200511141710.41230.bjorn.helgaas@hp.com> Message-ID: <1132181296.4872.12.camel@localhost.localdomain> Patch to use ioread8 and iowrite8 as suggested. Signed-off-by: Kylene Hall On Mon, 2005-11-14 at 17:10 -0700, Bjorn Helgaas wrote: > On Friday 11 November 2005 1:06 pm, Kylene Jo Hall wrote: > > +#ifdef CONFIG_PPC64 > > +#define atmel_getb(chip, offset) readb(chip->vendor->iobase + offset); > > +#define atmel_putb(val, chip, offset) writeb(val, chip->vendor->iobase + offset) > > ... > > +#else > > +#define atmel_getb(chip, offset) inb(chip->vendor->base + offset) > > +#define atmel_putb(val, chip, offset) outb(val, chip->vendor->base + offset) > > Why don't you use ioread8() instead of defining atmel_getb()? > > You'd still need something PPC64-specific to initialize the iomem cookie, > but the accessors would go away. > > Unfortunately, ioread8() and associated interfaces aren't mentioned > under Documentation/, but there are some hints in lib/iomap.c. 
> --- --- linux-2.6.15-rc1/drivers/char/tpm/tpm_atmel.c 2005-11-16 16:02:31.000000000 +0100 +++ linux-2.6.15-rc1-git4/drivers/char/tpm/tpm_atmel.c 2005-11-16 16:34:32.000000000 +0100 @@ -47,13 +47,12 @@ static int tpm_atml_recv(struct tpm_chip return -EIO; for (i = 0; i < 6; i++) { - status = atmel_getb(chip, 1); + status = ioread8(chip->vendor->iobase + 1); if ((status & ATML_STATUS_DATA_AVAIL) == 0) { - dev_err(chip->dev, - "error reading header\n"); + dev_err(chip->dev, "error reading header\n"); return -EIO; } - *buf++ = atmel_getb(chip, 0); + *buf++ = ioread8(chip->vendor->iobase); } /* size of the data received */ @@ -64,10 +63,9 @@ static int tpm_atml_recv(struct tpm_chip dev_err(chip->dev, "Recv size(%d) less than available space\n", size); for (; i < size; i++) { /* clear the waiting data anyway */ - status = atmel_getb(chip, 1); + status = ioread8(chip->vendor->iobase + 1); if ((status & ATML_STATUS_DATA_AVAIL) == 0) { - dev_err(chip->dev, - "error reading data\n"); + dev_err(chip->dev, "error reading data\n"); return -EIO; } } @@ -76,17 +74,17 @@ static int tpm_atml_recv(struct tpm_chip /* read all the data available */ for (; i < size; i++) { - status = atmel_getb(chip, 1); + status = ioread8(chip->vendor->iobase + 1); if ((status & ATML_STATUS_DATA_AVAIL) == 0) { - dev_err(chip->dev, - "error reading data\n"); + dev_err(chip->dev, "error reading data\n"); return -EIO; } - *buf++ = atmel_getb(chip, 0); + *buf++ = ioread8(chip->vendor->iobase); } /* make sure data available is gone */ - status = atmel_getb(chip, 1); + status = ioread8(chip->vendor->iobase + 1); + if (status & ATML_STATUS_DATA_AVAIL) { dev_err(chip->dev, "data available is stuck\n"); return -EIO; @@ -102,7 +100,7 @@ static int tpm_atml_send(struct tpm_chip dev_dbg(chip->dev, "tpm_atml_send:\n"); for (i = 0; i < count; i++) { dev_dbg(chip->dev, "%d 0x%x(%d)\n", i, buf[i], buf[i]); - atmel_putb(buf[i], chip, 0); + iowrite8(buf[i], chip->vendor->iobase); } return count; @@ -110,12 +108,12 @@ 
static int tpm_atml_send(struct tpm_chip static void tpm_atml_cancel(struct tpm_chip *chip) { - atmel_putb(ATML_STATUS_ABORT, chip, 1); + iowrite8(ATML_STATUS_ABORT, chip->vendor->iobase + 1); } static u8 tpm_atml_status(struct tpm_chip *chip) { - return atmel_getb(chip, 1); + return ioread8(chip->vendor->iobase + 1); } static struct file_operations atmel_ops = { @@ -162,7 +160,8 @@ static void atml_plat_remove(void) if (chip) { if (chip->vendor->have_region) - atmel_release_region(chip->vendor->base, chip->vendor->region_size); + atmel_release_region(chip->vendor->base, + chip->vendor->region_size); atmel_put_base_addr(chip->vendor); tpm_remove_hardware(chip->dev); platform_device_unregister(pdev); @@ -183,14 +182,19 @@ static int __init init_atmel(void) driver_register(&atml_drv); - if (atmel_get_base_addr(&tpm_atmel) != 0) { + if ((tpm_atmel.iobase = atmel_get_base_addr(&tpm_atmel)) == NULL) { rc = -ENODEV; goto err_unreg_drv; } - tpm_atmel.have_region = (atmel_request_region( tpm_atmel.base, tpm_atmel.region_size, "tpm_atmel0") == NULL) ? 0 : 1; - - if (IS_ERR(pdev = platform_device_register_simple("tpm_atmel", -1, NULL, 0 ))) { + tpm_atmel.have_region = + (atmel_request_region + (tpm_atmel.base, tpm_atmel.region_size, + "tpm_atmel0") == NULL) ? 
0 : 1; + + if (IS_ERR + (pdev = + platform_device_register_simple("tpm_atmel", -1, NULL, 0))) { rc = PTR_ERR(pdev); goto err_rel_reg; } @@ -202,9 +206,10 @@ static int __init init_atmel(void) err_unreg_dev: platform_device_unregister(pdev); err_rel_reg: - if (tpm_atmel.have_region) - atmel_release_region(tpm_atmel.base, tpm_atmel.region_size); atmel_put_base_addr(&tpm_atmel); + if (tpm_atmel.have_region) + atmel_release_region(tpm_atmel.base, + tpm_atmel.region_size); err_unreg_drv: driver_unregister(&atml_drv); return rc; --- linux-2.6.15-rc1/drivers/char/tpm/tpm_atmel.h 2005-11-16 16:02:31.000000000 +0100 +++ linux-2.6.15-rc1-git4/drivers/char/tpm/tpm_atmel.h 2005-11-16 15:43:26.000000000 +0100 @@ -27,12 +27,14 @@ #define atmel_putb(val, chip, offset) writeb(val, chip->vendor->iobase + offset) #define atmel_request_region request_mem_region #define atmel_release_region release_mem_region -static inline void atmel_put_base_addr(struct tpm_vendor_specific *vendor) + +static inline void atmel_put_base_addr(struct tpm_vendor_specific + *vendor) { iounmap(vendor->iobase); } -static int atmel_get_base_addr(struct tpm_vendor_specific *vendor) +static void __iomem * atmel_get_base_addr(struct tpm_vendor_specific *vendor) { struct device_node *dn; unsigned long address, size; @@ -44,11 +46,11 @@ static int atmel_get_base_addr(struct tp dn = of_find_node_by_name(NULL, "tpm"); if (!dn) - return 1; + return NULL; if (!device_is_compatible(dn, "AT97SC3201")) { of_node_put(dn); - return 1; + return NULL; } reg = (unsigned int *) get_property(dn, "reg", &reglen); @@ -71,8 +73,7 @@ static int atmel_get_base_addr(struct tp vendor->base = address; vendor->region_size = size; - vendor->iobase = ioremap(address, size); - return 0; + return ioremap(vendor->base, vendor->region_size); } #else #define atmel_getb(chip, offset) inb(chip->vendor->base + offset) @@ -105,18 +106,19 @@ static int atmel_verify_tpm11(void) return 0; } -static inline void atmel_put_base_addr(struct
tpm_vendor_specific *vendor) +static inline void atmel_put_base_addr(struct tpm_vendor_specific + *vendor) { } /* Determine where to talk to device */ -static unsigned long atmel_get_base_addr(struct tpm_vendor_specific +static void __iomem * atmel_get_base_addr(struct tpm_vendor_specific *vendor) { int lo, hi; if (atmel_verify_tpm11() != 0) - return 1; + return NULL; lo = tpm_read_index(TPM_ADDR, TPM_ATMEL_BASE_ADDR_LO); hi = tpm_read_index(TPM_ADDR, TPM_ATMEL_BASE_ADDR_HI); @@ -124,6 +126,6 @@ static unsigned long atmel_get_base_addr vendor->base = (hi << 8) | lo; vendor->region_size = 2; - return 0; + return ioport_map(vendor->base, vendor->region_size); } #endif From linas at austin.ibm.com Thu Nov 17 10:04:05 2005 From: linas at austin.ibm.com (linas) Date: Wed, 16 Nov 2005 17:04:05 -0600 Subject: [PATCH] Avoid use of uninitialised spinlock in EEH. In-Reply-To: <1132179746.28963.82.camel@baythorne.infradead.org> References: <1127322900.28995.149.camel@hades.cambridge.redhat.com> <200509221446.56228.arnd@arndb.de> <1132148305.21643.58.camel@hades.cambridge.redhat.com> <20051116214116.GV19593@austin.ibm.com> <1132179746.28963.82.camel@baythorne.infradead.org> Message-ID: <20051116230405.GX19593@austin.ibm.com> On Wed, Nov 16, 2005 at 10:22:26PM +0000, David Woodhouse was heard to remark: > > The problem is that eeh_init() is never called, because we're not > running on a pSeries. But eeh_check_failure() is still called, and tries > to lock a spinlock which was never initialised. Ah. Well, its a nice simple patch that seems to not have been applied to 2.6.15-rc1-git2. So ... my only objection is that there is a teeny amount of performance loss due to one more check, and the fact that the global will chew up yet another cacheline. 
You can regain most of that performance hit by reordering, so that instead of > -#define EEH_POSSIBLE_ERROR(val, type) ((val) == (type)~0) > +#define EEH_POSSIBLE_ERROR(val, type) (eeh_subsystem_enabled && (val) == (type)~0) one has #define EEH_POSSIBLE_ERROR(val, type) (((val) == (type)~0) && eeh_subsystem_enabled) so that the cacheline containing eeh_subsystem_enabled is pulled in only if val==all ff's. (val will be in a register anyway, so less loss there). --linas From olh at suse.de Thu Nov 17 10:08:20 2005 From: olh at suse.de (Olaf Hering) Date: Thu, 17 Nov 2005 00:08:20 +0100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1131574051.25354.3.camel@localhost.localdomain> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> <20051109201720.GB5443@w-mikek2.ibm.com> <1131568336.24637.91.camel@gaston> <1131573556.25354.1.camel@localhost.localdomain> <1131573693.24637.109.camel@gaston> <1131574051.25354.3.camel@localhost.localdomain> Message-ID: <20051116230820.GA29068@suse.de> On Wed, Nov 09, Badari Pulavarty wrote: > On Thu, 2005-11-10 at 09:01 +1100, Benjamin Herrenschmidt wrote: > > > I didn't have any luck on 2.6.14-git12 either. > > > I tried 64k page support on my P570. > > > > > > Here are the console messages: > > > > What distro do you use in userland ? Some older glibc versions have a > > bug that cause issues with 64k pages, though it generally happens with > > login blowing up, not init ... > > SLES9 (could be SLES9 SP1). Can you double check? rpm -qi glibc | head should be enough. Would be bad if SP2 or SP3 does not work with 64k. 
-- short story of a lazy sysadmin: alias appserv=wotan From linas at austin.ibm.com Thu Nov 17 10:10:41 2005 From: linas at austin.ibm.com (linas) Date: Wed, 16 Nov 2005 17:10:41 -0600 Subject: [PATCH 1/7] PCI Error Recovery: header file patch In-Reply-To: <20051108235357.GD19593@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> <20051108235357.GD19593@austin.ibm.com> Message-ID: <20051116231041.GA16057@austin.ibm.com> Greg, Please apply. This has been modified to use unsigned ints per discussion. --linas -------- PCI Error Recovery: header file patch Various PCI bus errors can be signaled by newer PCI controllers. Recovering from those errors requires an infrastructure to notify affected device drivers of the error, and a way of walking through a reset sequence. This patch adds a set of callbacks to be used by error recovery routines to notify device drivers of the various stages of recovery. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git10/include/linux/pci.h =================================================================== --- linux-2.6.14-git10.orig/include/linux/pci.h 2005-11-07 17:24:23.048968436 -0600 +++ linux-2.6.14-git10/include/linux/pci.h 2005-11-07 17:42:46.026024245 -0600 @@ -78,6 +78,23 @@ #define PCI_UNKNOWN ((pci_power_t __force) 5) #define PCI_POWER_ERROR ((pci_power_t __force) -1) +/** The pci_channel state describes connectivity between the CPU and + * the pci device. If some PCI bus between here and the pci device + * has crashed or locked up, this info is reflected here. + */ +typedef unsigned int __bitwise pci_channel_state_t; + +enum pci_channel_state { + /* I/O channel is in normal state */ + pci_channel_io_normal = (__force pci_channel_state_t) 1, + + /* I/O to channel is blocked */ + pci_channel_io_frozen = (__force pci_channel_state_t) 2, + + /* PCI card is dead */ + pci_channel_io_perm_failure = (__force pci_channel_state_t) 3, +}; + /* * The pci_dev structure is used to describe PCI devices.
*/ @@ -110,6 +127,7 @@ this is D0-D3, D0 being fully functional, and D3 being off. */ + pci_channel_state_t error_state; /* current connectivity state */ struct device dev; /* Generic device interface */ /* device is compatible with these IDs */ @@ -232,6 +250,54 @@ unsigned int use_driver_data:1; /* pci_driver->driver_data is used */ }; +/* ---------------------------------------------------------------- */ +/** PCI Error Recovery System (PCI-ERS). If a PCI device driver provides + * a set fof callbacks in struct pci_error_handlers, then that device driver + * will be notified of PCI bus errors, and will be driven to recovery + * when an error occurs. + */ + +typedef unsigned int __bitwise pci_ers_result_t; + +enum pci_ers_result { + /* no result/none/not supported in device driver */ + PCI_ERS_RESULT_NONE = (__force pci_ers_result_t) 1, + + /* Device driver can recover without slot reset */ + PCI_ERS_RESULT_CAN_RECOVER = (__force pci_ers_result_t) 2, + + /* Device driver wants slot to be reset. 
*/ + PCI_ERS_RESULT_NEED_RESET = (__force pci_ers_result_t) 3, + + /* Device has completely failed, is unrecoverable */ + PCI_ERS_RESULT_DISCONNECT = (__force pci_ers_result_t) 4, + + /* Device driver is fully recovered and operational */ + PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5, +}; + +/* PCI bus error event callbacks */ +struct pci_error_handlers +{ + /* PCI bus error detected on this device */ + pci_ers_result_t (*error_detected)(struct pci_dev *dev, + enum pci_channel_state error); + + /* MMIO has been re-enabled, but not DMA */ + pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev); + + /* PCI Express link has been reset */ + pci_ers_result_t (*link_reset)(struct pci_dev *dev); + + /* PCI slot has been reset */ + pci_ers_result_t (*slot_reset)(struct pci_dev *dev); + + /* Device driver may resume normal operations */ + void (*resume)(struct pci_dev *dev); +}; + +/* ---------------------------------------------------------------- */ + struct module; struct pci_driver { struct list_head node; @@ -245,6 +311,7 @@ int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */ void (*shutdown) (struct pci_dev *dev); + struct pci_error_handlers *err_handler; struct device_driver driver; struct pci_dynids dynids; }; _______________________________________________ From pbadari at us.ibm.com Thu Nov 17 10:16:42 2005 From: pbadari at us.ibm.com (Badari Pulavarty) Date: Wed, 16 Nov 2005 15:16:42 -0800 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <20051116230820.GA29068@suse.de> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> <20051109201720.GB5443@w-mikek2.ibm.com> <1131568336.24637.91.camel@gaston> <1131573556.25354.1.camel@localhost.localdomain> <1131573693.24637.109.camel@gaston> <1131574051.25354.3.camel@localhost.localdomain> <20051116230820.GA29068@suse.de> Message-ID: <1132183002.24066.90.camel@localhost.localdomain> On Thu, 2005-11-17 at 
00:08 +0100, Olaf Hering wrote: > On Wed, Nov 09, Badari Pulavarty wrote: > > > On Thu, 2005-11-10 at 09:01 +1100, Benjamin Herrenschmidt wrote: > > > > I didn't have any luck on 2.6.14-git12 either. > > > > I tried 64k page support on my P570. > > > > > > > > Here are the console messages: > > > > > > What distro do you use in userland ? Some older glibc versions have a > > > bug that cause issues with 64k pages, though it generally happens with > > > login blowing up, not init ... > > > > SLES9 (could be SLES9 SP1). > > Can you double check? rpm -qi glibc | head should be enough. > Would be bad if SP2 or SP3 does not work with 64k. > I think I am using SLES9. Planning to update to SP3. # rpm -qi glibc | head Name : glibc Relocations: (not relocatable) Version : 2.3.3 Vendor: SuSE Linux AG, Nuernberg, Germany Release : 98.28 Build Date: Wed Jun 30 15:55:45 2004 Install date: Wed Jul 6 17:24:44 2005 Build Host: gooseberry.suse.de Group : System/Libraries Source RPM: glibc-2.3.3-98.28.src.rpm Size : 6161800 License: GPL, LGPL Signature : DSA/SHA1, Wed Jun 30 16:00:21 2004, Key ID a84edae89c800aca Packager : http://www.suse.de/feedback URL : http://www.gnu.org/software/libc/libc.html Summary : The standard shared libraries (from the GNU C Library) Thanks, Badari From olh at suse.de Thu Nov 17 10:27:20 2005 From: olh at suse.de (Olaf Hering) Date: Thu, 17 Nov 2005 00:27:20 +0100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1132183002.24066.90.camel@localhost.localdomain> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> <20051109201720.GB5443@w-mikek2.ibm.com> <1131568336.24637.91.camel@gaston> <1131573556.25354.1.camel@localhost.localdomain> <1131573693.24637.109.camel@gaston> <1131574051.25354.3.camel@localhost.localdomain> <20051116230820.GA29068@suse.de> <1132183002.24066.90.camel@localhost.localdomain> Message-ID: <20051116232720.GA29512@suse.de> On Wed, Nov 16, Badari Pulavarty wrote: > I 
think I am using SLES9. Planning to update to SP3. > > # rpm -qi glibc | head > Name : glibc Relocations: (not > relocatable) > Version : 2.3.3 Vendor: SuSE Linux AG, > Nuernberg, Germany > Release : 98.28 Build Date: Wed Jun 30 > 15:55:45 2004 The release number indicates the GA glibc.spec was used, but the build date indicates its slightly older than SLES9 GA. -- short story of a lazy sysadmin: alias appserv=wotan From dwmw2 at infradead.org Thu Nov 17 10:34:24 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Wed, 16 Nov 2005 23:34:24 +0000 Subject: [PATCH] Avoid use of uninitialised spinlock in EEH. In-Reply-To: <20051116230405.GX19593@austin.ibm.com> References: <1127322900.28995.149.camel@hades.cambridge.redhat.com> <200509221446.56228.arnd@arndb.de> <1132148305.21643.58.camel@hades.cambridge.redhat.com> <20051116214116.GV19593@austin.ibm.com> <1132179746.28963.82.camel@baythorne.infradead.org> <20051116230405.GX19593@austin.ibm.com> Message-ID: <1132184064.28963.84.camel@baythorne.infradead.org> On Wed, 2005-11-16 at 17:04 -0600, linas wrote: > You can regain most of that performance hit by reordering, > so that instead of > > > -#define EEH_POSSIBLE_ERROR(val, type) ((val) == (type)~0) > > +#define EEH_POSSIBLE_ERROR(val, type) (eeh_subsystem_enabled && (val) == (type)~0) > > one has > > #define EEH_POSSIBLE_ERROR(val, type) (((val) == (type)~0) && > eeh_subsystem_enabled) Agreed. I actually thought precisely the same thing as I glanced at the attached email again when I sent it. Do you want me to submit a patch like that? -- dwmw2 From linas at austin.ibm.com Thu Nov 17 10:57:44 2005 From: linas at austin.ibm.com (linas) Date: Wed, 16 Nov 2005 17:57:44 -0600 Subject: [PATCH] Avoid use of uninitialised spinlock in EEH. 
In-Reply-To: <1132184064.28963.84.camel@baythorne.infradead.org> References: <1127322900.28995.149.camel@hades.cambridge.redhat.com> <200509221446.56228.arnd@arndb.de> <1132148305.21643.58.camel@hades.cambridge.redhat.com> <20051116214116.GV19593@austin.ibm.com> <1132179746.28963.82.camel@baythorne.infradead.org> <20051116230405.GX19593@austin.ibm.com> <1132184064.28963.84.camel@baythorne.infradead.org> Message-ID: <20051116235744.GZ19593@austin.ibm.com> On Wed, Nov 16, 2005 at 11:34:24PM +0000, David Woodhouse was heard to remark: > On Wed, 2005-11-16 at 17:04 -0600, linas wrote: > > You can regain most of that performance hit by reordering, > > so that instead of > > > > > -#define EEH_POSSIBLE_ERROR(val, type) ((val) == (type)~0) > > > +#define EEH_POSSIBLE_ERROR(val, type) (eeh_subsystem_enabled && (val) == (type)~0) > > > > one has > > > > #define EEH_POSSIBLE_ERROR(val, type) (((val) == (type)~0) && > > eeh_subsystem_enabled) > > Agreed. I actually thought precisely the same thing as I glanced at the > attached email again when I sent it. Do you want me to submit a patch > like that? Well, I would then sign-off-by/approve-by whatever; and then its up to anyone else who might object. --linas From michael at ellerman.id.au Thu Nov 17 11:07:12 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 17 Nov 2005 11:07:12 +1100 Subject: [PATCH 0/8] powerpc: Kexec fixups and support for booting at 32MB In-Reply-To: <3e8a578e38ed908cfa7ad1e55d1de681@bga.com> References: <3e8a578e38ed908cfa7ad1e55d1de681@bga.com> Message-ID: <200511171107.13501.michael@ellerman.id.au> On Tue, 15 Nov 2005 05:23, Milton Miller wrote: > I do have comments on this series, but have been slow to write them > down. > > PS: Please copy me on kexec/kdump ppc patches so I can reply with > references. I read the lists from the web. Sure. > [PATCH 1/8] powerpc: Turn cpu_irq_down into kexec_cpu_down Paul has already merged this so you can fix it up later if you want. 
It fixes bugs so we need it now rather than later. > [PATCH 5/8] powerpc: Add CONFIG_CRASH_DUMP > the __va change should be in [PATCH 4/8] powerpc: Seperate usage of > KERNELBASE and PAGE_OFFSET Done. > And should this not be last, since following patches are required > to get the kernel to work again? What, you need PHYISCAL_START > for them? well just #define it 0 for a bit in patch 4. Well I guess. Although it seems like overkill given that the kernel is fine unless you turn on CONFIG_CRASH_DUMP. I guess I'll reorder them. > [PATCH 6/8] powerpc: Reroute interrupts from 0 + offset to > PHYSICAL_START + offset > > The following should be in user space / device tree: > +#ifdef CONFIG_CRASH_DUMP > + lmb_reserve(0, KDUMP_BACKUP_LIMIT); > +#endif I disagree. It's a PPC64 implementation detail that we have to fiddle with stuff at 0, as far as userspace is concerned the kdump kernel is at 32 MB and up. If we require kexec-tools to do this, we'll have to keep the shape of head.S in sync with kexec-tools. > [PATCH 7/8] powerpc: Create a trampoline for the fwnmi vectors > I totally disagree with this one, espically reregitering with > the low address in the trampoline. The registration should be at > the new address. And a1, a2 are very generic names. I'm not sure which bit you disagree with? We have to use a trampoline, the addresses we pass to firmware must be < 32MB (see PAPR). I've changed the names. > [PATCH 8/8] powerpc: Fixups for kernel linked at 32 MB > (1) powermac smp.c -- use create_branch Done. > (2) The secondary hold code could be done as a 64 bit load in the > first 0x100 bytes vs LOADADDR Hmmm, not sure what you mean? > (3) Why did you move LOAD_HANDLER down one instruction? It would > seem not to help optimization Yep, fixed, that was a hang over from a previous version. 
cheers -- Michael Ellerman IBM OzLabs email: michael:ellerman.id.au inmsg: mpe:jabber.org wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person From pbadari at us.ibm.com Thu Nov 17 11:33:55 2005 From: pbadari at us.ibm.com (Badari Pulavarty) Date: Wed, 16 Nov 2005 16:33:55 -0800 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> <20051109201720.GB5443@w-mikek2.ibm.com> <1131568336.24637.91.camel@gaston> <1131573556.25354.1.camel@localhost.localdomain> <1131573693.24637.109.camel@gaston> <1131574051.25354.3.camel@localhost.localdomain> <20051116230820.GA29068@suse.de> <1132183002.24066.90.camel@localhost.localdomain> Message-ID: <1132187635.24066.99.camel@localhost.localdomain> On Wed, 2005-11-16 at 17:57 -0600, Sonny Rao wrote: > On 11/16/05, Badari Pulavarty wrote: > On Thu, 2005-11-17 at 00:08 +0100, Olaf Hering wrote: > > On Wed, Nov 09, Badari Pulavarty wrote: > > > > > On Thu, 2005-11-10 at 09:01 +1100, Benjamin Herrenschmidt > wrote: > > > > > I didn't have any luck on 2.6.14-git12 either. > > > > > I tried 64k page support on my P570. > > > > > > > > > > Here are the console messages: > > > > > > > > What distro do you use in userland ? Some older glibc > versions have a > > > > bug that cause issues with 64k pages, though it > generally happens with > > > > login blowing up, not init ... > > > > > > SLES9 (could be SLES9 SP1). > > > > Can you double check? rpm -qi glibc | head should be > enough. > > Would be bad if SP2 or SP3 does not work with 64k. > > > > I think I am using SLES9. Planning to update to SP3. > > > Badari, the problem is with your toolchain.. > the binutils in SLES9 is too old (even in SP3) > > The issue is that it cannot align something (the zero page I think) to > 64kb . 
> > SLES9 SP3 has "GNU ld version 2.15.90.0.1.1 20040303 (SuSE Linux)" > > But I have to use binutils 2.15.94 to make a 64kb kernel boot > properly > (I can give you the package offline if you need) Thank you, Sonny. I updated my binutils package and the 64k pagesize kernel works fine for me (at least it booted fine). Thanks, Badari From dwmw2 at infradead.org Thu Nov 17 11:44:03 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Thu, 17 Nov 2005 00:44:03 +0000 Subject: [PATCH] Avoid use of uninitialised spinlock in EEH. In-Reply-To: <1132179746.28963.82.camel@baythorne.infradead.org> References: <1127322900.28995.149.camel@hades.cambridge.redhat.com> <200509221446.56228.arnd@arndb.de> <1132148305.21643.58.camel@hades.cambridge.redhat.com> <20051116214116.GV19593@austin.ibm.com> <1132179746.28963.82.camel@baythorne.infradead.org> Message-ID: <1132188243.28963.91.camel@baythorne.infradead.org> If the kernel supports both G5 and pSeries, and CONFIG_EEH is enabled, eeh_init() is (quite reasonably) never called when we boot on a G5. Yet eeh_check_failure() still gets called. We should avoid doing that if !eeh_subsystem_enabled. Signed-off-by: David Woodhouse --- linux-2.6.13/include/asm-powerpc/eeh.h~ 2005-09-21 16:36:23.000000000 +0100 +++ linux-2.6.13/include/asm-powerpc/eeh.h 2005-09-21 17:41:51.000000000 +0100 @@ -32,6 +32,8 @@ struct notifier_block; #ifdef CONFIG_EEH +extern int eeh_subsystem_enabled; + /* Values for eeh_mode bits in device_node */ #define EEH_MODE_SUPPORTED (1<<0) #define EEH_MODE_NOCHECK (1<<1) @@ -95,7 +97,7 @@ int eeh_unregister_notifier(struct notif * If this macro yields TRUE, the caller relays to eeh_check_failure() * which does further tests out of line.
*/ -#define EEH_POSSIBLE_ERROR(val, type) ((val) == (type)~0) +#define EEH_POSSIBLE_ERROR(val, type) ((val) == (type)~0 && eeh_subsystem_enabled) /* * Reads from a device which has been isolated by EEH will return --- linux-2.6.13/arch/powerpc/platforms/pseries/eeh.c~ 2005-09-21 16:35:49.000000000 +0100 +++ linux-2.6.13/arch/powerpc/platforms/pseries/eeh.c 2005-09-21 17:40:41.000000000 +0100 @@ -99,7 +99,8 @@ static int ibm_read_slot_reset_state; static int ibm_read_slot_reset_state2; static int ibm_slot_error_detail; -static int eeh_subsystem_enabled; +int eeh_subsystem_enabled; +EXPORT_SYMBOL(eeh_subsystem_enabled); /* Buffer for reporting slot-error-detail rtas calls */ static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; -- dwmw2 From schwab at suse.de Thu Nov 17 12:32:45 2005 From: schwab at suse.de (Andreas Schwab) Date: Thu, 17 Nov 2005 02:32:45 +0100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <20051116232720.GA29512@suse.de> (Olaf Hering's message of "Thu, 17 Nov 2005 00:27:20 +0100") References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> <20051109201720.GB5443@w-mikek2.ibm.com> <1131568336.24637.91.camel@gaston> <1131573556.25354.1.camel@localhost.localdomain> <1131573693.24637.109.camel@gaston> <1131574051.25354.3.camel@localhost.localdomain> <20051116230820.GA29068@suse.de> <1132183002.24066.90.camel@localhost.localdomain> <20051116232720.GA29512@suse.de> Message-ID: Olaf Hering writes: > On Wed, Nov 16, Badari Pulavarty wrote: > >> I think I am using SLES9. Planning to update to SP3. >> >> # rpm -qi glibc | head >> Name : glibc Relocations: (not >> relocatable) >> Version : 2.3.3 Vendor: SuSE Linux AG, >> Nuernberg, Germany >> Release : 98.28 Build Date: Wed Jun 30 >> 15:55:45 2004 > > The release number indicates the GA glibc.spec was used, but the > build date indicates its slightly older than SLES9 GA. Build date is local time (timezone has been chopped off here). Andreas. 
-- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From sonny.rao at gmail.com Thu Nov 17 10:57:40 2005 From: sonny.rao at gmail.com (Sonny Rao) Date: Wed, 16 Nov 2005 17:57:40 -0600 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1132183002.24066.90.camel@localhost.localdomain> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <20051109172125.GA12861@lst.de> <20051109201720.GB5443@w-mikek2.ibm.com> <1131568336.24637.91.camel@gaston> <1131573556.25354.1.camel@localhost.localdomain> <1131573693.24637.109.camel@gaston> <1131574051.25354.3.camel@localhost.localdomain> <20051116230820.GA29068@suse.de> <1132183002.24066.90.camel@localhost.localdomain> Message-ID: On 11/16/05, Badari Pulavarty wrote: > > On Thu, 2005-11-17 at 00:08 +0100, Olaf Hering wrote: > > On Wed, Nov 09, Badari Pulavarty wrote: > > > > > On Thu, 2005-11-10 at 09:01 +1100, Benjamin Herrenschmidt wrote: > > > > > I didn't have any luck on 2.6.14-git12 either. > > > > > I tried 64k page support on my P570. > > > > > > > > > > Here are the console messages: > > > > > > > > What distro do you use in userland ? Some older glibc versions have > a > > > > bug that cause issues with 64k pages, though it generally happens > with > > > > login blowing up, not init ... > > > > > > SLES9 (could be SLES9 SP1). > > > > Can you double check? rpm -qi glibc | head should be enough. > > Would be bad if SP2 or SP3 does not work with 64k. > > > > I think I am using SLES9. Planning to update to SP3. Badari, the problem is with your toolchain.. the binutils in SLES9 is too old (even in SP3) The issue is that it cannot align something (the zero page I think) to 64kb .
SLES9 SP3 has "GNU ld version 2.15.90.0.1.1 20040303 (SuSE Linux)" But I have to use binutils 2.15.94 to make a 64kb kernel boot properly (I can give you the package offline if you need) Hope it helps. Sonny -------------- next part -------------- An HTML attachment was scrubbed... URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051116/56915cd6/attachment.htm From benh at kernel.crashing.org Thu Nov 17 13:34:57 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 17 Nov 2005 13:34:57 +1100 Subject: [PATCH] powerpc: Workaround for offb on 64 bits platforms Message-ID: <1132194898.20016.8.camel@gaston> This fixes a problem with offb not parsing addresses properly on 64 bits machines, and thus crashing at boot. The problem is worked around by locating the matching PCI device and using the properly relocated PCI base addresses instead of misparsing the Open Firmware properties. This fixes a crash at boot on MAUI among others. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/drivers/video/offb.c =================================================================== --- linux-work.orig/drivers/video/offb.c 2005-11-08 11:00:19.000000000 +1100 +++ linux-work/drivers/video/offb.c 2005-11-15 16:19:14.000000000 +1100 @@ -325,8 +325,8 @@ int *pp, i; unsigned int len; int width = 640, height = 480, depth = 8, pitch; - unsigned *up; - unsigned long address; + unsigned int rsize, *up; + unsigned long address = 0; if ((pp = (int *) get_property(dp, "depth", &len)) != NULL && len == sizeof(int)) @@ -344,10 +344,40 @@ pitch = 0x1000; } else pitch = width; - if ((up = (unsigned *) get_property(dp, "address", &len)) != NULL - && len == sizeof(unsigned)) + + rsize = (unsigned long)pitch * (unsigned long)height * + (unsigned long)(depth / 8); + + /* Try to match device to a PCI device in order to get a properly + * translated address rather then trying to decode the open firmware + * stuff in various incorrect ways + */ +#ifdef CONFIG_PCI + /* 
First try to locate the PCI device if any */ + { + struct pci_dev *pdev = NULL; + + for_each_pci_dev(pdev) { + if (dp == pci_device_to_OF_node(pdev)) + break; + } + if (pdev) { + for (i = 0; i < 6 && address == 0; i++) { + if ((pci_resource_flags(pdev, i) & + IORESOURCE_MEM) && + (pci_resource_len(pdev, i) >= rsize)) + address = pci_resource_start(pdev, i); + } + pci_dev_put(pdev); + } + } +#endif /* CONFIG_PCI */ + + if (address == 0 && + (up = (unsigned *) get_property(dp, "address", &len)) != NULL && + len == sizeof(unsigned)) address = (u_long) * up; - else { + if (address == 0) { for (i = 0; i < dp->n_addrs; ++i) if (dp->addrs[i].size >= pitch * height * depth / 8) From michael at ellerman.id.au Thu Nov 17 14:14:30 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 17 Nov 2005 14:14:30 +1100 (EST) Subject: [PATCH] powerpc: Early debugging support for iSeries Message-ID: <20051117031430.2911568748@ozlabs.org> Connect iSeries up to the standard early debugging infrastructure. To actually use this you need to define the iSeries version of EARLY_DEBUG_INIT in setup_64.c, and hit Ctrl-x Ctrl-x on your console to dump the Hypervisor console buffer. 
Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/setup_64.c | 14 +++++++++----- arch/powerpc/platforms/iseries/setup.c | 18 +++++++++++++++--- 2 files changed, 24 insertions(+), 8 deletions(-) Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -74,22 +74,26 @@ * but your kernel will not boot on anything else if you do so */ -/* This one is for use on LPAR machines that support an HVC console - * on vterm 0 - */ +/* For use on LPAR machines that support an HVC console on vterm 0 */ extern void udbg_init_debug_lpar(void); -/* This one is for use on Apple G5 machines - */ + +/* This one is for use on Apple G5 machines */ extern void udbg_init_pmac_realmode(void); + /* That's RTAS panel debug */ extern void call_rtas_display_status_delay(unsigned char c); + /* Here's maple real mode debug */ extern void udbg_init_maple_realmode(void); +/* For iSeries - hit Ctrl-x Ctrl-x to see the output */ +extern void udbg_init_iseries(void); + #define EARLY_DEBUG_INIT() do {} while(0) #if 0 #define EARLY_DEBUG_INIT() udbg_init_debug_lpar() +#define EARLY_DEBUG_INIT() udbg_init_iseries() #define EARLY_DEBUG_INIT() udbg_init_maple_realmode() #define EARLY_DEBUG_INIT() udbg_init_pmac_realmode() #define EARLY_DEBUG_INIT() \ Index: kexec/arch/powerpc/platforms/iseries/setup.c =================================================================== --- kexec.orig/arch/powerpc/platforms/iseries/setup.c +++ kexec/arch/powerpc/platforms/iseries/setup.c @@ -52,6 +52,7 @@ #include #include #include +#include #include "naca.h" #include "setup.h" @@ -62,10 +63,8 @@ #include "call_sm.h" #include "call_hpt.h" -extern void hvlog(char *fmt, ...); - #ifdef DEBUG -#define DBG(fmt...) hvlog(fmt) +#define DBG(fmt...) udbg_printf(fmt) #else #define DBG(fmt...) 
#endif @@ -994,3 +993,16 @@ static int __init early_parsemem(char *p return 0; } early_param("mem", early_parsemem); + +static void hvputc(unsigned char c) +{ + if (c == '\n') + hvputc('\r'); + + HvCall_writeLogBuffer(&c, 1); +} + +void udbg_init_iseries(void) +{ + udbg_putc = hvputc; +} From olof at lixom.net Thu Nov 17 14:24:27 2005 From: olof at lixom.net (Olof Johansson) Date: Wed, 16 Nov 2005 19:24:27 -0800 Subject: [PATCH] powerpc: Early debugging support for iSeries In-Reply-To: <20051117031430.2911568748@ozlabs.org> References: <20051117031430.2911568748@ozlabs.org> Message-ID: <20051117032427.GA6585@pb15.lixom.net> On Thu, Nov 17, 2005 at 02:14:30PM +1100, Michael Ellerman wrote: > Index: kexec/arch/powerpc/platforms/iseries/setup.c > =================================================================== > --- kexec.orig/arch/powerpc/platforms/iseries/setup.c > +++ kexec/arch/powerpc/platforms/iseries/setup.c > @@ -52,6 +52,7 @@ > #include > #include > #include > +#include > > #include "naca.h" > #include "setup.h" > @@ -62,10 +63,8 @@ > #include "call_sm.h" > #include "call_hpt.h" > > -extern void hvlog(char *fmt, ...); > - You can make hvlog() in viocons.c static now too, this was the only external user of it. -Olof From sfr at canb.auug.org.au Thu Nov 17 14:31:39 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Thu, 17 Nov 2005 14:31:39 +1100 Subject: [PATCH] powerpc: Early debugging support for iSeries In-Reply-To: <20051117031430.2911568748@ozlabs.org> References: <20051117031430.2911568748@ozlabs.org> Message-ID: <20051117143139.33e985d8.sfr@canb.auug.org.au> On Thu, 17 Nov 2005 14:14:30 +1100 (EST) Michael Ellerman wrote: > > Connect iSeries up to the standard early debugging infrastructure. > > To actually use this you need to define the iSeries version of > EARLY_DEBUG_INIT in setup_64.c, and hit Ctrl-x Ctrl-x on your console to dump > the Hypervisor console buffer. > > Signed-off-by: Michael Ellerman Looks good to me. 
Acked-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051117/0907b956/attachment.pgp From dingrui at cn.ibm.com Thu Nov 17 19:27:47 2005 From: dingrui at cn.ibm.com (Rui Ding) Date: Thu, 17 Nov 2005 16:27:47 +0800 Subject: DIO defect with ppc64. Message-ID: Hi all, I have found a possible defect with DIO. Recently I ran a DIO test on the ppc64 platform. System memory and swap space were gradually eaten up, and eventually the system crashed because it ran out of memory. If we terminate the test case before the system goes down, memory and swap never return to normal, even after many hours have passed. It looks like the kernel never frees the memory and swap space that the DIO operations occupied. //////////////////////////////// // Test Environment //////////////////////////////// 2.6.9-22.EL,ppc64, i*86, powerpc //////////////////////////////// // Test Case //////////////////////////////// fsx-linux fsx-linux is a test suite provided by LTP. http://ltp.sourceforge.net/ It seems that the kernel never returns the DIO buffers. Has anyone else met this problem? Or is there a patch available for this bug? Regards Ding Rui(Rickey) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051117/eb765495/attachment.htm From michael at ellerman.id.au Thu Nov 17 20:41:49 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 17 Nov 2005 20:41:49 +1100 (EST) Subject: [PATCH] powerpc: Make early debugging configurable via Kconfig Message-ID: <20051117094149.157E36877B@ozlabs.org> This patch adds Kconfig entries to control the early debugging options, currently in setup_64.c.
I was kind of keen to add these without them being exposed in the menus, as they're for "advanced users only", but that doesn't seem to work - anyone know if that's possible in Kconfig? Doing this via Kconfig means you can have one source tree, which is buildable for multiple platforms - and you can enable the correct early debug option for each platform via .config. I made udbg_early_init() a static inline 'cause otherwise GCC is too daft to optimise it away when debugging is off. Now that we have udbg_init_rtas() we can make call_rtas_display_status* static. Signed-off-by: Michael Ellerman --- arch/powerpc/Kconfig.debug | 42 +++++++++++++++++++++++++++++++++++++++++ arch/powerpc/kernel/rtas.c | 10 +++++++-- arch/powerpc/kernel/setup_64.c | 38 +------------------------------------ arch/powerpc/kernel/udbg.c | 7 +++--- include/asm-powerpc/rtas.h | 1 include/asm-powerpc/udbg.h | 41 +++++++++++++++++++++++++++++++++++++++- 6 files changed, 96 insertions(+), 43 deletions(-) Index: kexec/arch/powerpc/Kconfig.debug =================================================================== --- kexec.orig/arch/powerpc/Kconfig.debug +++ kexec/arch/powerpc/Kconfig.debug @@ -115,4 +115,46 @@ config PPC_OCP depends on IBM_OCP || XILINX_OCP default y +choice + prompt "Early debugging (dangerous)" + bool + optional + help + Enable early debugging. Careful, if you enable debugging for the + wrong type of machine your kernel _will not boot_. + +config PPC_EARLY_DEBUG_LPAR + bool "LPAR HV Console" + depends on PPC_PSERIES + help + Select this to enable early debugging for a machine with a HVC + console on vterm 0. + +config PPC_EARLY_DEBUG_G5 + bool "Apple G5" + depends on PPC_PMAC64 + help + Select this to enable early debugging for Apple G5 machines. + +config PPC_EARLY_DEBUG_RTAS + bool "RTAS Panel" + depends on PPC_RTAS + help + Select this to enable early debugging via the RTAS panel.
+ +config PPC_EARLY_DEBUG_MAPLE + bool "Maple real mode" + depends on PPC_MAPLE + help + Select this to enable early debugging for Maple. + +config PPC_EARLY_DEBUG_ISERIES + bool "iSeries HV Console" + depends on PPC_ISERIES + help + Select this to enable early debugging for legacy iSeries. You need + to hit "Ctrl-x Ctrl-x" to see the messages on the console. + +endchoice + endmenu Index: kexec/arch/powerpc/kernel/rtas.c =================================================================== --- kexec.orig/arch/powerpc/kernel/rtas.c +++ kexec/arch/powerpc/kernel/rtas.c @@ -29,6 +29,7 @@ #include #include #include +#include struct rtas_t rtas = { .lock = SPIN_LOCK_UNLOCKED @@ -52,7 +53,7 @@ EXPORT_SYMBOL(rtas_flash_term_hook); * are designed only for very early low-level debugging, which * is why the token is hard-coded to 10. */ -void call_rtas_display_status(unsigned char c) +static void call_rtas_display_status(unsigned char c) { struct rtas_args *args = &rtas.args; unsigned long s; @@ -72,7 +73,7 @@ void call_rtas_display_status(unsigned c spin_unlock_irqrestore(&rtas.lock, s); } -void call_rtas_display_status_delay(unsigned char c) +static void call_rtas_display_status_delay(unsigned char c) { static int pending_newline = 0; /* did last write end with unprinted newline? */ static int width = 16; @@ -96,6 +97,11 @@ void call_rtas_display_status_delay(unsi } } +void udbg_init_rtas(void) +{ + udbg_putc = call_rtas_display_status_delay; +} + void rtas_progress(char *s, unsigned short hex) { struct device_node *root; Index: kexec/arch/powerpc/kernel/setup_64.c =================================================================== --- kexec.orig/arch/powerpc/kernel/setup_64.c +++ kexec/arch/powerpc/kernel/setup_64.c @@ -69,37 +69,6 @@ #define DBG(fmt...) #endif -/* - * Here are some early debugging facilities. 
You can enable one - * but your kernel will not boot on anything else if you do so - */ - -/* For use on LPAR machines that support an HVC console on vterm 0 */ -extern void udbg_init_debug_lpar(void); - -/* This one is for use on Apple G5 machines */ -extern void udbg_init_pmac_realmode(void); - -/* That's RTAS panel debug */ -extern void call_rtas_display_status_delay(unsigned char c); - -/* Here's maple real mode debug */ -extern void udbg_init_maple_realmode(void); - -/* For iSeries - hit Ctrl-x Ctrl-x to see the output */ -extern void udbg_init_iseries(void); - -#define EARLY_DEBUG_INIT() do {} while(0) - -#if 0 -#define EARLY_DEBUG_INIT() udbg_init_debug_lpar() -#define EARLY_DEBUG_INIT() udbg_init_iseries() -#define EARLY_DEBUG_INIT() udbg_init_maple_realmode() -#define EARLY_DEBUG_INIT() udbg_init_pmac_realmode() -#define EARLY_DEBUG_INIT() \ - do { udbg_putc = call_rtas_display_status_delay; } while(0) -#endif - int have_of = 1; int boot_cpuid = 0; int boot_cpuid_phys = 0; @@ -232,11 +201,8 @@ void __init early_setup(unsigned long dt struct paca_struct *lpaca = get_paca(); static struct machdep_calls **mach; - /* - * Enable early debugging if any specified (see top of - * this file) - */ - EARLY_DEBUG_INIT(); + /* Enable early debugging if any specified (see udbg.h) */ + udbg_early_init(); DBG(" -> early_setup()\n"); Index: kexec/arch/powerpc/kernel/udbg.c =================================================================== --- kexec.orig/arch/powerpc/kernel/udbg.c +++ kexec/arch/powerpc/kernel/udbg.c @@ -1,7 +1,8 @@ /* - * polling mode stateless debugging stuff, originally for NS16550 Serial Ports + * Polling mode stateless debugging stuff, originally for NS16550 Serial Ports. + * Used for early debugging and by xmon et al. * - * c 2001 PPC 64 Team, IBM Corp + * (c) 2001,2005 IBM Corporation. 
* * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -15,12 +16,12 @@ #include #include #include +#include void (*udbg_putc)(unsigned char c); unsigned char (*udbg_getc)(void); int (*udbg_getc_poll)(void); -/* udbg library, used by xmon et al */ void udbg_puts(const char *s) { if (udbg_putc) { Index: kexec/include/asm-powerpc/udbg.h =================================================================== --- kexec.orig/include/asm-powerpc/udbg.h +++ kexec/include/asm-powerpc/udbg.h @@ -1,5 +1,5 @@ /* - * c 2001 PPC 64 Team, IBM Corp + * (c) 2001,2005 IBM Corporation. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -28,4 +28,43 @@ extern void udbg_init_uart(void __iomem struct device_node; extern void udbg_init_scc(struct device_node *np); +extern void udbg_init_debug_lpar(void); +extern void udbg_init_pmac_realmode(void); +extern void udbg_init_maple_realmode(void); +extern void udbg_init_iseries(void); +extern void udbg_init_rtas(void); + +/* + * Early debugging facilities. You can enable _one_ of these, but if you do so + * your kernel _will not boot_ on anything else. Be careful. 
+ */ +static inline void udbg_early_init(void) +{ +#if defined(CONFIG_PPC_EARLY_DEBUG_LPAR) + + /* For LPAR machines that have an HVC console on vterm 0 */ + udbg_init_debug_lpar(); + +#elif defined(CONFIG_PPC_EARLY_DEBUG_G5) + + /* For use on Apple G5 machines */ + udbg_init_pmac_realmode(); + +#elif defined(CONFIG_PPC_EARLY_DEBUG_RTAS) + + /* RTAS panel debug */ + udbg_init_rtas(); + +#elif defined(CONFIG_PPC_EARLY_DEBUG_MAPLE) + + /* Maple real mode debug */ + udbg_init_maple_realmode(); + +#elif defined(CONFIG_PPC_EARLY_DEBUG_ISERIES) + + /* For iSeries - hit Ctrl-x Ctrl-x to see the output */ + udbg_init_iseries(); +#endif +} + #endif /* _ASM_POWERPC_UDBG_H */ Index: kexec/include/asm-powerpc/rtas.h =================================================================== --- kexec.orig/include/asm-powerpc/rtas.h +++ kexec/include/asm-powerpc/rtas.h @@ -160,7 +160,6 @@ extern struct rtas_t rtas; extern void enter_rtas(unsigned long); extern int rtas_token(const char *service); extern int rtas_call(int token, int, int, int *, ...); -extern void call_rtas_display_status(unsigned char); extern void rtas_restart(char *cmd); extern void rtas_power_off(void); extern void rtas_halt(void); From dwmw2 at infradead.org Thu Nov 17 20:42:29 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Thu, 17 Nov 2005 09:42:29 +0000 Subject: [PATCH] Disable CONFIG_RTC on ppc64. Message-ID: <1132220549.28963.122.camel@baythorne.infradead.org> The CONFIG_RTC option no longer does what it used to do. It used to provide RTC functionality on ppc64 machines; now it just causes an infinite loop in rtc_get_rtc_time() instead. Disable it forcibly and we can use CONFIG_GEN_RTC instead, which works fine. 
Signed-off-by: David Woodhouse diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig index 970f70d..7491f4e 100644 --- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -687,7 +687,7 @@ config NVRAM config RTC tristate "Enhanced Real Time Clock Support" - depends on !PPC32 && !PARISC && !IA64 && !M68K + depends on !PPC32 && !PPC64 && !PARISC && !IA64 && !M68K ---help--- If you say Y here and create a character special file /dev/rtc with major number 10 and minor number 135 using mknod ("man mknod"), you -- dwmw2 From paulus at samba.org Thu Nov 17 22:04:34 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 17 Nov 2005 22:04:34 +1100 Subject: [PATCH] Disable CONFIG_RTC on ppc64. In-Reply-To: <1132220549.28963.122.camel@baythorne.infradead.org> References: <1132220549.28963.122.camel@baythorne.infradead.org> Message-ID: <17276.25538.214271.296128@cargo.ozlabs.ibm.com> David Woodhouse writes: > The CONFIG_RTC option no longer does what it used to do. It used to > provide RTC functionality on ppc64 machines; now it just causes an It did? Not on partitioned systems, presumably. Anyway, ppc64 had its own /dev/misc/rtc implementation in arch/ppc64/kernel/rtc.c. Which, as you point out, has been dropped in favour of using drivers/char/genrtc.c. > + depends on !PPC32 && !PPC64 && !PARISC && !IA64 && !M68K How about just depends on !PPC && !PARISC && ... ? Paul. From aswathavijay at gmail.com Thu Nov 17 23:38:45 2005 From: aswathavijay at gmail.com (Vijayakumar Ramalingam) Date: Thu, 17 Nov 2005 18:08:45 +0530 Subject: How to compile gcc for 64 bit Power PC Message-ID: <5f87992f0511170438v468d79b7l28e9f4024426f57c@mail.gmail.com> Hi all, I want to compile gcc 3.4.3 on 32-bit x86 redhat Linux system, for 64 bit Power PC. Can you give some pointers for cross compiling. 
I tried, but I am facing the problem, in gcc folder, For Configuring i have mentioned like this ./configure --target=powerpc-*-eabi --host=i686-pc-linux-gnu \ --enable-shared \ --enable-64-bit-bfd \ --prefix=/usr/src/redhat/BUILD/toolchain/src/gcc1 \ - --with-local-prefix=/usr/src/redhat/BUILD/toolchain/src/gcc1/local \ --with-sysroot=/ and i have done make following are the error messages. TARGET_CPU_DEFAULT="" \ HEADERS="auto-host.h ansidecl.h" DEFINES="" \ /bin/sh ../../../gcc/mkconfig.sh bconfig.h bconfig.h is unchanged ./genmodes -h > tmp-modes.h /bin/sh: line 1: ./genmodes: cannot execute binary file Thanks in advance, I feel that i will get reply for this. Thanks & Regards, Vijay -------------- next part -------------- An HTML attachment was scrubbed... URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051117/794bbc2b/attachment.htm From dwmw2 at infradead.org Fri Nov 18 00:36:17 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Thu, 17 Nov 2005 13:36:17 +0000 Subject: [PATCH] Disable CONFIG_RTC on ppc64. In-Reply-To: <17276.25538.214271.296128@cargo.ozlabs.ibm.com> References: <1132220549.28963.122.camel@baythorne.infradead.org> <17276.25538.214271.296128@cargo.ozlabs.ibm.com> Message-ID: <1132234577.28963.155.camel@baythorne.infradead.org> On Thu, 2005-11-17 at 22:04 +1100, Paul Mackerras wrote: > It did? Not on partitioned systems, presumably. Anyway, ppc64 had > its own /dev/misc/rtc implementation in arch/ppc64/kernel/rtc.c. > Which, as you point out, has been dropped in favour of using > drivers/char/genrtc.c. The CONFIG_RTC option used to select the ppc64-specific implementation didn't it? > > + depends on !PPC32 && !PPC64 && !PARISC && !IA64 && !M68K > > How about just depends on !PPC && !PARISC && ... ? Is there a CONFIG_PPC? 
-- dwmw2 From haveblue at us.ibm.com Fri Nov 18 01:30:07 2005 From: haveblue at us.ibm.com (Dave Hansen) Date: Thu, 17 Nov 2005 15:30:07 +0100 Subject: How to compile gcc for 64 bit Power PC In-Reply-To: <5f87992f0511170438v468d79b7l28e9f4024426f57c@mail.gmail.com> References: <5f87992f0511170438v468d79b7l28e9f4024426f57c@mail.gmail.com> Message-ID: <1132237807.5834.72.camel@localhost> On Thu, 2005-11-17 at 18:08 +0530, Vijayakumar Ramalingam wrote: > Hi all, > I want to compile gcc 3.4.3 on 32-bit x86 redhat Linux system, > for 64 bit Power PC. Can you give some pointers for cross compiling. > I tried, but I am facing the problem, in gcc folder, OSDL has some precompiled ones already: http://developer.osdl.org/dev/plm/cross_compile/ I use them all the time. -- Dave From segher at kernel.crashing.org Fri Nov 18 01:48:47 2005 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Thu, 17 Nov 2005 15:48:47 +0100 Subject: How to compile gcc for 64 bit Power PC In-Reply-To: <5f87992f0511170438v468d79b7l28e9f4024426f57c@mail.gmail.com> References: <5f87992f0511170438v468d79b7l28e9f4024426f57c@mail.gmail.com> Message-ID: > For Configuring i have mentioned like this > ./configure --target=powerpc-*-eabi > --host=i686-pc-linux-gnu \ > --enable-shared \ > --enable-64-bit-bfd \ > --prefix=/usr/src/redhat/BUILD/toolchain/src/gcc1 \ > - --with-local-prefix=/usr/src/redhat/BUILD/toolchain/src/gcc1/local \ > --with-sysroot=/ --build=i686-pc-linux-gnu You are going to run into more problems... much easier to use some precompiled thing, as Dave points out.
Segher From hollis at penguinppc.org Fri Nov 18 02:27:13 2005 From: hollis at penguinppc.org (Hollis Blanchard) Date: Thu, 17 Nov 2005 09:27:13 -0600 Subject: How to compile gcc for 64 bit Power PC In-Reply-To: <5f87992f0511170438v468d79b7l28e9f4024426f57c@mail.gmail.com> References: <5f87992f0511170438v468d79b7l28e9f4024426f57c@mail.gmail.com> Message-ID: <7a6ca36a1f1ada05f7850801f4b120b1@penguinppc.org> On Nov 17, 2005, at 6:38 AM, Vijayakumar Ramalingam wrote: > Hi all, > I want to compile gcc 3.4.3 on 32-bit x86 redhat Linux system, > for 64 bit Power PC. Can you give some pointers for cross compiling. I always use crosstool: http://penguinppc.org/dev/crosstool.php http://kegel.com/crosstool If you need that particular version of gcc, you can edit the (small) build config and uncomment the appropriate line. -Hollis From olof at lixom.net Fri Nov 18 02:51:01 2005 From: olof at lixom.net (Olof Johansson) Date: Thu, 17 Nov 2005 07:51:01 -0800 Subject: DIO defect with ppc64. In-Reply-To: References: Message-ID: <20051117155101.GA28050@pb15.lixom.net> Hi, On Thu, Nov 17, 2005 at 04:27:47PM +0800, Rui Ding wrote: > Recently I did a DIO test based on ppc64 paltform. > The system memory and swap space was eat up gradually .At last the > system crashed down because out of memory . [...] > //////////////////////////////// > // Test Environment > //////////////////////////////// > > 2.6.9-22.EL,ppc64, i*86, powerpc Sounds like you should be talking to RedHat instead of emailing this list. Or, let us know when you have reproduced with a mainline kernel. -Olof From apw at shadowen.org Fri Nov 18 04:00:31 2005 From: apw at shadowen.org (Andy Whitcroft) Date: Thu, 17 Nov 2005 17:00:31 +0000 Subject: [PATCH] ppc64 need HPAGE_SHIFT when huge pages disabled Message-ID: <20051117170031.GA30223@shadowen.org> With the new powerpc architecture we don't seem to be able to disable huge pages anymore.
mm/built-in.o(.toc1+0xae0): undefined reference to `HPAGE_SHIFT' make: *** [.tmp_vmlinux1] Error 1 We seem to need to define HPAGE_SHIFT to something when HUGETLB_PAGE isn't defined. This patch defines it to 0 when we have no support. How does this look? Against 2.6.15-rc1-mm1. Signed-off-by: Andy Whitcroft --- diff -upN reference/include/asm-powerpc/page_64.h current/include/asm-powerpc/page_64.h --- reference/include/asm-powerpc/page_64.h +++ current/include/asm-powerpc/page_64.h @@ -86,7 +86,11 @@ static inline void copy_page(void *to, v extern u64 ppc64_pft_size; /* Large pages size */ +#ifdef CONFIG_HUGETLB_PAGE extern unsigned int HPAGE_SHIFT; +#else +#define HPAGE_SHIFT 0 +#endif #define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) #define HPAGE_MASK (~(HPAGE_SIZE - 1)) #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) From akpm at osdl.org Fri Nov 18 08:11:20 2005 From: akpm at osdl.org (Andrew Morton) Date: Thu, 17 Nov 2005 13:11:20 -0800 Subject: [PATCH] ppc64 need HPAGE_SHIFT when huge pages disabled In-Reply-To: <20051117170031.GA30223@shadowen.org> References: <20051117170031.GA30223@shadowen.org> Message-ID: <20051117131120.5a61b9ba.akpm@osdl.org> Andy Whitcroft wrote: > > With the new powerpc architecture we don't seem to be able to disable > huge pages anymore. > > mm/built-in.o(.toc1+0xae0): undefined reference to `HPAGE_SHIFT' > make: *** [.tmp_vmlinux1] Error 1 > > We seem to need to define HPAGE_SHIFT to something when HUGETLB_PAGE isn't > defined. This patch defines it to 0 when we have no support. > Yes, i386 defines HPAGE_SHIFT always. 
> > Signed-off-by: Andy Whitcroft > --- > diff -upN reference/include/asm-powerpc/page_64.h current/include/asm-powerpc/page_64.h > --- reference/include/asm-powerpc/page_64.h > +++ current/include/asm-powerpc/page_64.h > @@ -86,7 +86,11 @@ static inline void copy_page(void *to, v > extern u64 ppc64_pft_size; > > /* Large pages size */ > +#ifdef CONFIG_HUGETLB_PAGE > extern unsigned int HPAGE_SHIFT; > +#else > +#define HPAGE_SHIFT 0 > +#endif > #define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) > #define HPAGE_MASK (~(HPAGE_SIZE - 1)) > #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) I think this change will cause a compile warning in mm/memory.c: if (unlikely(is_vm_hugetlb_page(vma))) { unmap_hugepage_range(vma, start, end); zap_work -= (end - start) / (HPAGE_SIZE / PAGE_SIZE); This code will be removed by the compiler. But before that happens, we're doing a divide by zero and the compiler will whine. So I'd suggest that you set the !CONFIG_HUGETLB_PAGE value of HPAGE_SHIFT to PAGE_SHIFT, not to zero. I'll make that change locally.. From segher at kernel.crashing.org Fri Nov 18 08:22:14 2005 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Thu, 17 Nov 2005 22:22:14 +0100 Subject: [PATCH] Maple: request I/O resource. Message-ID: Against 2.6.15-rc1-git5. Segher [Also attached as I'm sure my mailer will screw it up]. Reserve the Maple RTC I/O resource. Needed now we use genrtc. 
Signed-off-by: Segher Boessenkool -- Index: linux-2.6.15-rc1/arch/powerpc/platforms/maple/time.c =================================================================== --- linux-2.6.15-rc1.orig/arch/powerpc/platforms/maple/time.c 2005-11-17 22:04:31.197832600 +0100 +++ linux-2.6.15-rc1/arch/powerpc/platforms/maple/time.c 2005-11-17 22:05:37.402754864 +0100 @@ -158,6 +158,11 @@ return 0; } +static struct resource rtc_iores = { + .name = "rtc", + .flags = IORESOURCE_BUSY, +}; + unsigned long __init maple_get_boot_time(void) { struct rtc_time tm; @@ -172,7 +177,11 @@ printk(KERN_INFO "Maple: No device node for RTC, assuming " "legacy address (0x%x)\n", maple_rtc_addr); } - + + rtc_iores.start = maple_rtc_addr; + rtc_iores.end = maple_rtc_addr + 7; + request_resource(&ioport_resource, &rtc_iores); + maple_get_rtc_time(&tm); return mktime(tm.tm_year+1900, tm.tm_mon+1, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec); -------------- next part -------------- A non-text attachment was scrubbed... Name: patch-maple-time-resource Type: application/octet-stream Size: 1049 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051117/795d95eb/attachment.obj From sonny at burdell.org Fri Nov 18 10:01:59 2005 From: sonny at burdell.org (Sonny Rao) Date: Thu, 17 Nov 2005 18:01:59 -0500 Subject: ppc64 oops.. 
In-Reply-To: <1132089742.5646.51.camel@gaston> References: <17273.30606.694749.166420@cargo.ozlabs.ibm.com> <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> <20051115174332.GA9632@krispykreme> <1132088841.5646.49.camel@gaston> <20051115211002.GB30500@krispykreme> <1132089742.5646.51.camel@gaston> Message-ID: <20051117230159.GA9996@kevlar.burdell.org> On Wed, Nov 16, 2005 at 08:22:21AM +1100, Benjamin Herrenschmidt wrote: > On Wed, 2005-11-16 at 08:10 +1100, Anton Blanchard wrote: > > > Well, it probably also breaks pSeries without NUMA and with a memory > > > hole :) > > > > Which is nothing in the last 4 years :) > > I know you miss your POWER3 :) Is it? I'm fairly certain POWER4+ in LPAR mode doesn't export NUMA topology. From anton at samba.org Fri Nov 18 10:25:52 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 18 Nov 2005 10:25:52 +1100 Subject: ppc64 oops.. In-Reply-To: <20051117230159.GA9996@kevlar.burdell.org> References: <1132041022.5646.33.camel@gaston> <1132042033.5646.39.camel@gaston> <20051115174332.GA9632@krispykreme> <1132088841.5646.49.camel@gaston> <20051115211002.GB30500@krispykreme> <1132089742.5646.51.camel@gaston> <20051117230159.GA9996@kevlar.burdell.org> Message-ID: <20051117232552.GB19128@krispykreme> > Is it? I'm fairly certain POWER4+ in LPAR mode doesn't export NUMA > topology. But it doesn't have an IO hole (which this bug required). Anton From greg at kroah.com Fri Nov 18 10:44:06 2005 From: greg at kroah.com (Greg KH) Date: Thu, 17 Nov 2005 15:44:06 -0800 Subject: [PATCH 1/7] PCI Error Recovery: header file patch In-Reply-To: <20051116231041.GA16057@austin.ibm.com> References: <20051108234911.GC19593@austin.ibm.com> <20051108235357.GD19593@austin.ibm.com> <20051116231041.GA16057@austin.ibm.com> Message-ID: <20051117234406.GA10573@kroah.com> On Wed, Nov 16, 2005 at 05:10:41PM -0600, linas wrote: > > Greg, Please apply. This has been modified to use unsigned int's > per discussion.
Ok, I've added this one now, and dropped the previous two I had. Can you bounce me the other 6 patches in the series, I dropped them from my inbox a while ago. thanks, greg k-h From paulus at samba.org Fri Nov 18 13:10:41 2005 From: paulus at samba.org (Paul Mackerras) Date: Fri, 18 Nov 2005 13:10:41 +1100 Subject: [PATCH] Disable CONFIG_RTC on ppc64. In-Reply-To: <1132234577.28963.155.camel@baythorne.infradead.org> References: <1132220549.28963.122.camel@baythorne.infradead.org> <17276.25538.214271.296128@cargo.ozlabs.ibm.com> <1132234577.28963.155.camel@baythorne.infradead.org> Message-ID: <17277.14369.423233.267480@cargo.ozlabs.ibm.com> David Woodhouse writes: > Is there a CONFIG_PPC? Yes. From paulus at samba.org Fri Nov 18 17:17:55 2005 From: paulus at samba.org (Paul Mackerras) Date: Fri, 18 Nov 2005 17:17:55 +1100 Subject: please pull powerpc-merge.git Message-ID: <17277.29203.840712.364366@cargo.ozlabs.ibm.com> Linus, Please do a pull from git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge.git With this, arch/ppc64 is now empty. There is still stuff in include/asm-ppc64; that will get moved out shortly. Apart from that there are a few other fixes as listed in the shortlog below. Thanks, Paul. Benjamin Herrenschmidt: powerpc: Workaround for offb on 64 bits platforms powerpc: merge align.c David Woodhouse: Avoid use of uninitialised spinlock in EEH. 
Kumar Gala: ppc: Fix MPC83xx device table ppc: Fix warnings related to seq_file Michael Ellerman: powerpc: Fix typo in topology.h Mike Kravetz: Remove SPAN_OTHER_NODES config definition Nick Piggin: powerpc: Fix database regression due to scheduler changes Olaf Hering: ppc boot: replace string labels with numbers Paul Mackerras: powerpc: Fix delay functions for 601 processors powerpc: Move remaining .c files from arch/ppc64 to arch/powerpc powerpc: Fix compile error on pSeries arising from delay.h changes powerpc: time-of-day fixes for 32-bit CHRP systems powerpc: Fix a couple of compile warnings for 32-bit compiles powerpc: Move defconfig over and remove remaining arch/ppc64 files offb: Fix compile error on ppc32 systems Segher Boessenkool: powerpc: Maple: request I/O resource. arch/powerpc/Kconfig | 13 arch/powerpc/Makefile | 9 arch/powerpc/boot/crt0.S | 23 arch/powerpc/configs/ppc64_defconfig | 346 +++++- arch/powerpc/configs/pseries_defconfig | 1 arch/powerpc/kernel/Makefile | 7 arch/powerpc/kernel/align.c | 530 ++++++++++ arch/powerpc/kernel/idle_64.c | 0 arch/powerpc/kernel/misc_32.S | 8 arch/powerpc/kernel/nvram_64.c | 0 arch/powerpc/kernel/rtas-rtc.c | 6 arch/powerpc/kernel/time.c | 28 + arch/powerpc/platforms/chrp/setup.c | 11 arch/powerpc/platforms/chrp/smp.c | 1 arch/powerpc/platforms/chrp/time.c | 3 arch/powerpc/platforms/maple/time.c | 11 arch/powerpc/platforms/pseries/eeh.c | 3 arch/powerpc/platforms/pseries/setup.c | 4 arch/ppc/kernel/Makefile | 4 arch/ppc/kernel/align.c | 410 -------- arch/ppc/kernel/pci.c | 1 arch/ppc/platforms/85xx/mpc85xx_ads_common.h | 2 arch/ppc/platforms/85xx/stx_gp3.h | 2 arch/ppc/syslib/mpc83xx_sys.c | 28 - arch/ppc64/Kconfig.debug | 65 - arch/ppc64/Makefile | 142 --- arch/ppc64/configs/bpa_defconfig | 1024 ------------------- arch/ppc64/configs/g5_defconfig | 1392 -------------------------- arch/ppc64/configs/iSeries_defconfig | 998 ------------------- arch/ppc64/configs/maple_defconfig | 1062 -------------------- 
arch/ppc64/configs/pSeries_defconfig | 1371 -------------------------- arch/ppc64/kernel/Makefile | 7 arch/ppc64/kernel/align.c | 396 ------- drivers/video/offb.c | 41 + include/asm-powerpc/cputable.h | 22 include/asm-powerpc/delay.h | 40 - include/asm-powerpc/eeh.h | 4 include/asm-powerpc/topology.h | 4 38 files changed, 936 insertions(+), 7083 deletions(-) rename arch/{ppc64/defconfig => powerpc/configs/ppc64_defconfig} (90%) create mode 100644 arch/powerpc/kernel/align.c rename arch/{ppc64/kernel/idle.c => powerpc/kernel/idle_64.c} (100%) rename arch/{ppc64/kernel/nvram.c => powerpc/kernel/nvram_64.c} (100%) delete mode 100644 arch/ppc/kernel/align.c delete mode 100644 arch/ppc64/Kconfig.debug delete mode 100644 arch/ppc64/Makefile delete mode 100644 arch/ppc64/configs/bpa_defconfig delete mode 100644 arch/ppc64/configs/g5_defconfig delete mode 100644 arch/ppc64/configs/iSeries_defconfig delete mode 100644 arch/ppc64/configs/maple_defconfig delete mode 100644 arch/ppc64/configs/pSeries_defconfig delete mode 100644 arch/ppc64/kernel/Makefile delete mode 100644 arch/ppc64/kernel/align.c From dwmw2 at infradead.org Fri Nov 18 23:15:33 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Fri, 18 Nov 2005 12:15:33 +0000 Subject: [PATCH] ppc64 syscall_exit_work: call the save_nvgprs function, not its descriptor. In-Reply-To: <1132080738.21643.28.camel@hades.cambridge.redhat.com> References: <1132080738.21643.28.camel@hades.cambridge.redhat.com> Message-ID: <1132316133.3642.43.camel@baythorne.infradead.org> On Tue, 2005-11-15 at 18:52 +0000, David Woodhouse wrote: > This cleanup patch speeds up the null syscall path on ppc64 by about 3%, > and brings the ppc32 and ppc64 code slightly closer together. Needs this unless your binutils, like mine, are clever enough to notice my stupidity and fix it up automatically... Spotted by Paul. 
Signed-off-by: David Woodhouse diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 83b9edf..2eb6f54 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -241,7 +241,7 @@ syscall_exit_work: bne- 3b subi r12,r12,TI_FLAGS -4: bl save_nvgprs +4: bl .save_nvgprs /* Anything else left to do? */ andi. r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP|_TIF_SAVE_NVGPRS) beq .ret_from_except_lite -- dwmw2 From linas at austin.ibm.com Sat Nov 19 09:14:56 2005 From: linas at austin.ibm.com (linas) Date: Fri, 18 Nov 2005 16:14:56 -0600 Subject: [PATCH] Avoid use of uninitialised spinlock in EEH. In-Reply-To: <1132188243.28963.91.camel@baythorne.infradead.org> References: <1127322900.28995.149.camel@hades.cambridge.redhat.com> <200509221446.56228.arnd@arndb.de> <1132148305.21643.58.camel@hades.cambridge.redhat.com> <20051116214116.GV19593@austin.ibm.com> <1132179746.28963.82.camel@baythorne.infradead.org> <1132188243.28963.91.camel@baythorne.infradead.org> Message-ID: <20051118221456.GA9060@austin.ibm.com> Hi Paul, This looks like an entirely reasonable patch to me. Please push if you agree. --linas On Thu, Nov 17, 2005 at 12:44:03AM +0000, David Woodhouse was heard to remark: If the kernel supports both G5 and pSeries, and CONFIG_EEH is enabled, eeh_init() is (quite reasonably) never called when we boot on a G5. Yet eeh_check_failure() still gets called. We should avoid doing that if !eeh_subsystem_enabled. 
Signed-off-by: David Woodhouse
Acked-by: Linas Vepstas

--- linux-2.6.13/include/asm-powerpc/eeh.h~	2005-09-21 16:36:23.000000000 +0100
+++ linux-2.6.13/include/asm-powerpc/eeh.h	2005-09-21 17:41:51.000000000 +0100
@@ -32,6 +32,8 @@ struct notifier_block;
 
 #ifdef CONFIG_EEH
 
+extern int eeh_subsystem_enabled;
+
 /* Values for eeh_mode bits in device_node */
 #define EEH_MODE_SUPPORTED	(1<<0)
 #define EEH_MODE_NOCHECK	(1<<1)
@@ -95,7 +97,7 @@ int eeh_unregister_notifier(struct notif
  * If this macro yields TRUE, the caller relays to eeh_check_failure()
  * which does further tests out of line.
  */
-#define EEH_POSSIBLE_ERROR(val, type)	((val) == (type)~0)
+#define EEH_POSSIBLE_ERROR(val, type)	((val) == (type)~0 && eeh_subsystem_enabled)
 
 /*
  * Reads from a device which has been isolated by EEH will return
--- linux-2.6.13/arch/powerpc/platforms/pseries/eeh.c~	2005-09-21 16:35:49.000000000 +0100
+++ linux-2.6.13/arch/powerpc/platforms/pseries/eeh.c	2005-09-21 17:40:41.000000000 +0100
@@ -99,7 +99,8 @@ static int ibm_read_slot_reset_state;
 static int ibm_read_slot_reset_state2;
 static int ibm_slot_error_detail;
 
-static int eeh_subsystem_enabled;
+int eeh_subsystem_enabled;
+EXPORT_SYMBOL(eeh_subsystem_enabled);
 
 /* Buffer for reporting slot-error-detail rtas calls */
 static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];

-- 
dwmw2

From paulus at samba.org Sat Nov 19 12:53:20 2005
From: paulus at samba.org (Paul Mackerras)
Date: Sat, 19 Nov 2005 12:53:20 +1100
Subject: [PATCH] Avoid use of uninitialised spinlock in EEH.
In-Reply-To: <20051118221456.GA9060@austin.ibm.com>
References: <1127322900.28995.149.camel@hades.cambridge.redhat.com> <200509221446.56228.arnd@arndb.de> <1132148305.21643.58.camel@hades.cambridge.redhat.com> <20051116214116.GV19593@austin.ibm.com> <1132179746.28963.82.camel@baythorne.infradead.org> <1132188243.28963.91.camel@baythorne.infradead.org> <20051118221456.GA9060@austin.ibm.com>
Message-ID: <17278.34192.390577.646345@cargo.ozlabs.ibm.com>

linas writes:

It's in Linus' tree now.

Paul.

From miltonm at bga.com Sat Nov 19 17:44:58 2005
From: miltonm at bga.com (Milton Miller)
Date: Sat, 19 Nov 2005 00:44:58 -0600
Subject: [PATCH 0/8] powerpc: Kexec fixups and support for booting at 32MB
In-Reply-To: <200511171107.13501.michael@ellerman.id.au>
References: <3e8a578e38ed908cfa7ad1e55d1de681@bga.com> <200511171107.13501.michael@ellerman.id.au>
Message-ID: <5ec7a4d43949d07e34e669455ba9bfd3@bga.com>

On Nov 16, 2005, at 6:07 PM, Michael Ellerman wrote:

> On Tue, 15 Nov 2005 05:23, Milton Miller wrote:
>> PS: Please copy me on kexec/kdump ppc patches so I can reply with
>> references. I read the lists from the web.
>> [PATCH 1/8] powerpc: Turn cpu_irq_down into kexec_cpu_down
>
> Paul has already merged this so you can fix it up later if you want. It fixes
> bugs so we need it now rather than later.

Ok so one for the list (along with ppc32 kexec comments)

> [PATCH 5/8] powerpc: Add CONFIG_CRASH_DUMP
>> [PATCH 6/8] powerpc: Reroute interrupts from 0 + offset to
>> PHYSICAL_START + offset
>>
>> The following should be in user space / device tree:
>> +#ifdef CONFIG_CRASH_DUMP
>> +	lmb_reserve(0, KDUMP_BACKUP_LIMIT);
>> +#endif
>
> I disagree. It's a PPC64 implementation detail that we have to fiddle with
> stuff at 0, as far as userspace is concerned the kdump kernel is at 32 MB and
> up. If we require kexec-tools to do this, we'll have to keep the shape of
> head.S in sync with kexec-tools.
It's a PowerPC thing that we fiddle with the first 3 pages and that is
well defined. It is a kdump thing that the kernel is at 32M, so it
should reserve 0-32M if it wants that reserved. The kernel can touch
reserved memory.

This is based on a discussion I had with Haren that the memory given to
the kernel would be 0-crash_kernel_end but the 0-32M would be reserved.

I guess there is a correctness issue if the first 3 pages (4, since we
need fwnmi) are not reserved, but reserving beyond that to
KDUMP_BACKUP_LIMIT is arbitrary. In other words, in a pure first-boot
high-kernel case, these pages could be used as normal pages. The
hands-off part is because it is dumping the previous kernel.

> [PATCH 7/8] powerpc: Create a trampoline for the fwnmi vectors
>> I totally disagree with this one, especially re-registering with
>> the low address in the trampoline. The registration should be at
>> the new address. And a1, a2 are very generic names.
>
> I'm not sure which bit you disagree with? We have to use a trampoline, the
> addresses we pass to firmware must be < 32MB (see PAPR).
> I've changed the names.

I missed the requirement, and thought this was a "register at the same
place as last time" patch. I'll withdraw the objection, although a
comment noting this requirement would be nice.

>> (2) The secondary hold code could be done as a 64 bit load in the
>> first 0x100 bytes vs LOADADDR
>
> Hmmm, not sure what you mean?

Alternative implementation option: vs LOAD_ADDR(label) do t: .llong